Welcome back to Peking University MOOC: "Bioinformatics: Introduction and Methods". I am Ge Gao from the Center of Bioinformatics, Peking University. Let's continue with our second topic: Sequence Alignment. Last time we discussed how to construct a pairwise sequence alignment using dynamic programming. In this unit, we will continue on this topic. As the last figure in the previous unit shows, we traced back from the lower-right corner of the dynamic programming matrix and ended at the upper-left corner (0,0). In other words, we aligned all the residues of the two sequences from end to end. This is called a global sequence alignment. The method is known as the Needleman-Wunsch algorithm, developed by Saul Needleman and Christian Wunsch in Chicago and published in 1970. Given a user-specified scoring function, this algorithm can efficiently find the optimal alignment of any two sequences, and it was widely used in early research on protein sequence analysis. However, as more and more sequences became known, the limitations of aligning two sequences globally were noticed. First, with more and more proteins sequenced, researchers found that two functionally related proteins might differ substantially over their full lengths yet still share similar, functionally important domains. The sequence fragment of such a functional domain may be highly conserved across different proteins in the same protein family, and may determine the biological function. But such fragments cannot be found by global alignment algorithms. Second, the discovery of introns in the 1970s required DNA sequence alignment algorithms to handle large deletions and interspersed conserved fragments (exons) and variable fragments (introns). Therefore, since the early 1980s people began to realize that new methods were needed to find the most similar sequence fragments between two input sequences. We need to do "local alignment".
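To make the global alignment concrete, here is a minimal sketch of the Needleman-Wunsch dynamic programming recursion in Python. This is an illustration, not the exact formulation shown in the lecture slides; the scoring values (match +2, mismatch -1, gap -5) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-5):
    """Global alignment of sequences a and b (illustrative scoring)."""
    m, n = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):           # first column: gaps in b
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, n + 1):           # first row: gaps in a
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,     # gap in b
                          F[i][j - 1] + gap)     # gap in a
    # Trace back from the lower-right corner (m, n) to (0, 0).
    out_a, out_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return F[m][n], ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

Note that the traceback always runs from (m, n) all the way back to (0, 0): every residue of both sequences appears in the alignment, which is exactly what makes it global.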
In 1981, physicist Temple Smith and mathematician Michael Waterman jointly published a four-page paper in the Journal of Molecular Biology. They modified the Needleman-Wunsch algorithm, resulting in a local alignment algorithm that was later called the Smith-Waterman algorithm. This paper has been cited more than 8,000 times so far, making it one of the most highly cited papers in bioinformatics and even in the life sciences. We're pleased to announce that we have invited Dr. Waterman here to film an interview on the co-development of the historic Smith-Waterman algorithm by himself and Dr. Smith 30 years ago. Now let's take a look at this algorithm. This is the Needleman-Wunsch global alignment algorithm discussed earlier. The Smith-Waterman algorithm is made by adding a zero here; in other words, the score of the best alignment in each recursion is lower-bounded by 0. We will demonstrate it with the same example we used last time. Nothing is changed except for an additional line in the recursive formula. We can see that all these cells are zero instead of negative numbers. This is because even though the gap penalty is -5, the modified recursive formula lower-bounds the score at 0. Now let's move on to the cells at the center. Here we can see more differences. First, we get two independent paths, which is different from the previous global alignment. The reason is that the modified recursive formula assigns a cell a score of 0 if the scores from all three possible "sources" are below zero. This stops the previous recursion paths. Thus, the start and end of the backtracking will not necessarily be at the lower-right and upper-left corners. In fact, we can see that the backtracking paths are local, and the final optimal alignment is also local.
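The modification just described — adding a zero to the recursion and tracing back from the highest-scoring cell until a zero is reached — can be sketched as follows. The gap penalty of -5 matches the lecture's example; the match and mismatch values (+2, -1) are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-5):
    """Local alignment of sequences a and b (illustrative scoring)."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                       # the added zero: "reset"
                          H[i - 1][j - 1] + s,     # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a 0 is reached,
    # so the alignment can start and end anywhere in the matrix.
    out_a, out_b = [], []
    i, j = best_pos
    while H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return best, ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

Compared with the global version, only two things change: the extra 0 in the max, which stops negative-scoring paths, and the traceback, which starts at the best cell rather than the lower-right corner and ends at the first 0 rather than at (0, 0).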
And we can get not only the optimal but also the sub-optimal local alignments. This is different from global alignment. Now let's summarize. By introducing a lower bound of 0 on the score, the algorithm provides a "reset" mechanism whenever we encounter regions with large sequence differences. This allows us to find local similarities. It also shows how flexible dynamic programming can be. So here comes a question: which are the "correct", or the "better", alignments among these four different alignments? I will leave this as this unit's question for you to think about and discuss with other students and the TAs in the forum. Thank you!