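We will learn computational methods (algorithms and data structures) for analyzing DNA sequencing data. We will learn a little about DNA, genomics, and how DNA sequencing is used. We will use Python to implement key algorithms and data structures and to analyze real genomes and DNA sequencing datasets.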

From the course by Johns Hopkins University

Algorithms for DNA Sequencing

From the lesson

Edit distance, assembly, overlaps

This week we finish our discussion of read alignment by learning about algorithms that solve both the edit distance problem and related biosequence analysis problems, like global and local alignment.

- Ben Langmead, PhD, Assistant Professor, Computer Science
- Jacob Pritt, Department of Computer Science

In this practical, we'll extend the overlap function that we wrote in the last practical and use it to create a naive overlap map for a set of reads. I've already pasted in the function we wrote that calculates the overlap between two strings, a and b.
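The overlap function itself is not shown in this transcript. As a minimal sketch, assuming it takes two strings plus a minimum length (the parameter name `min_length` and its default of 3 are assumptions) and returns the length of the longest suffix of `a` that matches a prefix of `b`, or 0 if no such overlap exists, it might look like this:

```python
def overlap(a, b, min_length=3):
    """Return the length of the longest suffix of a that matches a prefix
    of b and is at least min_length long; return 0 if there is none.
    Sketch only: signature and default value are assumptions."""
    start = 0  # begin searching at the left end of a
    while True:
        # Look for the first min_length characters of b inside a
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0  # no further occurrences, so no overlap
        # Candidate found: does the rest of a match the start of b?
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1  # move just past the previous candidate


print(overlap('TTACGT', 'CGTACCGT'))  # suffix 'CGT' matches prefix 'CGT'
print(overlap('TTACGT', 'GTACCGT'))   # 'GTA' never occurs in a
```

Searching left to right means the first candidate that passes the check is also the longest overlap, so the loop can return immediately.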

Another function we're going to use here is permutations, which is in Python's itertools module, so I'm going to start by saying from itertools import permutations. Just to give you an example of what this does, I'm going to print out the permutations of a list. I'll give it the list 1, 2, 3, and I'm going to give it the length of the permutations that I want. I'll start by just saying 1. This prints out all the permutations of size 1 drawn from that list, so in this case they can be 1, 2, or 3.

I can get all the permutations of size 2, and this will get me all of the pairs, which are ordered. Similarly, I can ask for all the permutations of size 3, and this gives me every ordering of the full list.
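The calls described above can be sketched as follows. The variable names are illustrative; `permutations` returns an iterator, so each result is wrapped in `list` before printing.

```python
from itertools import permutations

items = [1, 2, 3]

# Size 1: each element on its own, as a 1-tuple
singles = list(permutations(items, 1))
print(singles)

# Size 2: every ordered pair, so (1, 2) and (2, 1) both appear
pairs = list(permutations(items, 2))
print(pairs)

# Size 3: every ordering of the whole list
triples = list(permutations(items, 3))
print(triples)
```

Because the pairs are ordered, a list of n items yields n * (n - 1) pairs, which is what makes this suitable for checking overlaps in both directions.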

So now let's write our function naive_overlap. We'll take a set of reads and a minimum overlap length of k. I'm going to store my overlap map as a dictionary. Now, for each pair of reads a, b in permutations of reads, 2, which gets me all the ordered pairs of reads in that set, I'm going to calculate the overlap length using the function above. If that length is greater than zero, I add the tuple a, b to the dictionary as a key and give it a value of that overlap length. And I'll just return overlaps.
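A minimal sketch of the function as described, not the lecture's exact code. The `overlap` helper is an assumed implementation, included only so the block runs on its own, and the three reads are hypothetical examples.

```python
from itertools import permutations


def overlap(a, b, min_length=3):
    """Assumed helper: length of the longest suffix of a matching a prefix
    of b that is at least min_length long, or 0 if there is none."""
    start = 0
    while True:
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def naive_overlap(reads, k):
    """Map each ordered pair (a, b) with an overlap of at least k
    to the length of that overlap."""
    overlaps = {}
    for a, b in permutations(reads, 2):  # all ordered pairs of reads
        olen = overlap(a, b, min_length=k)
        if olen > 0:
            overlaps[(a, b)] = olen
    return overlaps


# Hypothetical reads chosen for illustration
reads = ['ACGGATGATC', 'GATCAAGT', 'TTCACGGA']
result = naive_overlap(reads, 3)
print(result)
```

This approach is called naive because it compares every ordered pair of reads, so the number of overlap calls grows quadratically with the number of reads.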

So now let's do this for a simple set of reads. I'll give it a minimum length of 3, and let me try to get something that will work. There we go. So in this example, two pairs of reads overlap: the first read overlaps the second one, and the third read overlaps the first one.
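Since the keys are ordered pairs, direction matters when reading the result. The sketch below uses a hypothetical overlap map of the shape described (keys are (a, b) tuples, values are overlap lengths) and checks that each entry means "a suffix of a equals a prefix of b". The reads and the `describe` helper are illustrative assumptions.

```python
# Hypothetical overlap map: (a, b) -> overlap length
overlaps = {
    ('ACGGATGATC', 'GATCAAGT'): 4,
    ('TTCACGGA', 'ACGGATGATC'): 5,
}


def describe(overlap_map):
    """Return one human-readable line per overlapping pair."""
    lines = []
    for (a, b), olen in overlap_map.items():
        shared = a[-olen:]            # suffix of a
        assert shared == b[:olen]     # must equal the prefix of b
        lines.append(f"{a} -> {b}: {olen} bases ({shared})")
    return lines


for line in describe(overlaps):
    print(line)

# The reverse direction is a different key and is absent here
print(('GATCAAGT', 'ACGGATGATC') in overlaps)
```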

