Sequence Alignment Algorithms


  • 8/3/2019 Sequence Alignment Algorithms

    Developing Pairwise Sequence Alignment Algorithms

    Dr. Nancy Warter-Perez

    Outline
    • Overview of global and local alignment
    • References for sequence alignment algorithms
    • Discussion of Needleman-Wunsch iterative approach to global alignment
    • Discussion of Smith-Waterman recursive approach to local alignment
    • Discussion of how LCS Algorithm can be extended for
        • Global alignment (Needleman-Wunsch)
        • Local alignment (Smith-Waterman)
        • Affine gap penalties
    • Group assignments for project

    Overview of Pairwise Sequence Alignment
    • Dynamic Programming
        • Applied to optimization problems
        • Useful when
            • Problem can be recursively divided into sub-problems
            • Sub-problems are not independent
    • Needleman-Wunsch is a global alignment technique that uses an iterative algorithm and no gap penalty (could extend to fixed gap penalty).
    • Smith-Waterman is a local alignment technique that uses a recursive algorithm and can use alternative gap penalties (such as affine). Smith-Waterman's algorithm is an extension of the Longest Common Subsequence (LCS) problem and can be generalized to solve both local and global alignment.
    • Note: Needleman-Wunsch is usually used to refer to global alignment regardless of the algorithm used.

    Project References
    • http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
    • Computational Molecular Biology: An Algorithmic Approach, Pavel Pevzner
    • Introduction to Computational Biology: Maps, Sequences, and Genomes, Michael Waterman
    • Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield

    Classic Papers
    • Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. (http://www.cs.umd.edu/class/spring2003/cmsc838t/papers/needlemanandwunsch1970.pdf)
    • Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981. (http://www.cmb.usc.edu/papers/msw_papers/msw-042.pdf)

    Needleman-Wunsch (1 of 3)

    Match = 1

    Mismatch = 0

    Gap = 0

    Needleman-Wunsch (2 of 3)

    Needleman-Wunsch (3 of 3)
    From page 446:

    It is apparent that the above array operation can begin at any of a number of points along the borders of the array, which is equivalent to a comparison of N-terminal residues or C-terminal residues only. As long as the appropriate rules for pathways are followed, the maximum match will be the same. The cells of the array which contributed to the maximum match, may be determined by recording the origin of the number that was added to each cell when the array was operated upon.
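The idea of recording the origin of each cell's value can be sketched in Python. This is a minimal illustration rather than the paper's original procedure: it fills the array cell by cell with the scoring from slide 1 of 3 (match = 1, mismatch = 0, gap = 0), and the function name `nw_traceback` and the tie-breaking order (diagonal, then up, then left) are assumptions made for this example.

```python
def nw_traceback(v, w, match=1, mismatch=0, gap=0):
    """Global alignment that records which neighbor produced each cell's score."""
    n, m = len(v), len(w)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    origin = [[None] * (m + 1) for _ in range(n + 1)]
    # Borders: the only way to reach them is a run of gaps.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
        origin[i][0] = "up"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
        origin[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            candidates = [
                (score[i - 1][j - 1] + sub, "diag"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            # max() keeps the first maximal candidate, so ties prefer "diag".
            score[i][j], origin[i][j] = max(candidates, key=lambda c: c[0])
    # Follow the recorded origins from the last cell back to the first.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = origin[i][j]
        if move == "diag":
            top.append(v[i - 1]); bottom.append(w[j - 1]); i -= 1; j -= 1
        elif move == "up":
            top.append(v[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

With this scoring, the returned score equals the number of matching columns in the returned alignment, for example when calling `nw_traceback("GATTACA", "GCATGCU")`.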

    Smith-Waterman (1 of 3)
    Algorithm

    The two molecular sequences will be $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$. A similarity $s(a, b)$ is given between sequence elements $a$ and $b$. Deletions of length $k$ are given weight $W_k$. To find pairs of segments with high degrees of similarity, we set up a matrix $H$. First set

    \[ H_{k0} = H_{0l} = 0 \quad \text{for } 0 \le k \le n \text{ and } 0 \le l \le m. \]

    Smith-Waterman (2 of 3)
    The formula for $H_{ij}$ follows by considering the possibilities for ending the segments at any $a_i$ and $b_j$.

    (1) If $a_i$ and $b_j$ are associated, the similarity is $H_{i-1,j-1} + s(a_i, b_j)$.

    (2) If $a_i$ is at the end of a deletion of length $k$, the similarity is $H_{i-k,j} - W_k$.

    (3) If $b_j$ is at the end of a deletion of length $l$, the similarity is $H_{i,j-l} - W_l$. (The paper prints "length 1" here, which is a typo.)

    (4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to $a_i$ and $b_j$.
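The four cases above can be sketched as a direct matrix fill. This is a minimal, unoptimized illustration: the name `sw_matrix`, the callable parameters `s` and `W`, and the example parameter values in the usage note are assumptions for this sketch. Because cases (2) and (3) scan every deletion length, the running time is cubic in the sequence lengths.

```python
def sw_matrix(a, b, s, W):
    """Fill H using the four cases; s(x, y) scores a pair, W(k) weights a deletion of length k."""
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at zero.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])             # case (1)
            del_a = max(H[i - k][j] - W(k) for k in range(1, i + 1))    # case (2)
            del_b = max(H[i][j - l] - W(l) for l in range(1, j + 1))    # case (3)
            H[i][j] = max(diag, del_a, del_b, 0.0)                      # case (4)
    return H
```

As a usage example with illustrative parameters, `sw_matrix("ACGT", "ACGT", lambda x, y: 1.0 if x == y else -1/3, lambda k: 1.0 + k / 3)` yields a matrix whose largest entry is 4.0, at the end of the four matching pairs.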

    Smith-Waterman (3 of 3)
    The pair of segments with maximum similarity is found by first locating the maximum element of $H$. The other matrix elements leading to this maximum value are then sequentially determined with a traceback procedure ending with an element of $H$ equal to zero. This procedure identifies the segments as well as produces the corresponding alignment. The pair of segments with the next best similarity is found by applying the traceback procedure to the second largest element of $H$ not associated with the first traceback.
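The traceback described above can be sketched as follows. To keep the example short it assumes a fixed penalty per gap position instead of the general weights $W_k$; the name `local_align` and the default scores (match = 2, mismatch = -1, gap = 1) are illustrative assumptions, and only the best pair of segments is reported, not the next-best ones.

```python
def local_align(a, b, match=2, mismatch=-1, gap=1):
    """Best local alignment: fill H, locate its maximum, trace back until a zero."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:  # remember the first cell holding the maximum
                best, bi, bj = H[i][j], i, j
    top, bottom = [], []
    i, j = bi, bj
    # A positive cell always equals one of its three non-zero candidates.
    while H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For instance, `local_align("TTTACGTTT", "GGGACGGGG")` picks out the shared segment `ACG` and ignores the dissimilar flanks.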

    LCS Problem (cont.)
    Similarity score

    \[ s_{i,j} = \max \begin{cases} s_{i-1,j} \\ s_{i,j-1} \\ s_{i-1,j-1} + 1, & \text{if } v_i = w_j \end{cases} \]
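A minimal Python sketch of this recurrence, computing only the length of the longest common subsequence of two strings `v` and `w`. The function name `lcs_length` and the zero-filled first row and column are assumptions of the sketch.

```python
def lcs_length(v, w):
    """Length of the longest common subsequence of v and w."""
    n, m = len(v), len(w)
    # s[i][j] is the LCS length of the prefixes v[:i] and w[:j].
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j], s[i][j - 1])
            if v[i - 1] == w[j - 1]:
                s[i][j] = max(s[i][j], s[i - 1][j - 1] + 1)
    return s[n][m]
```

For example, `lcs_length("ATCTGAT", "TGCATA")` returns 4, corresponding to the common subsequence `TCTA`.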

    Extend LCS to Global Alignment

    \[ s_{i,j} = \max \begin{cases} s_{i-1,j} + H(v_i, -) \\ s_{i,j-1} + H(-, w_j) \\ s_{i-1,j-1} + H(v_i, w_j) \end{cases} \]

    • $H(v_i, -) = H(-, w_j) = -V$, where $V$ is the fixed gap penalty
    • $H(v_i, w_j)$ = score for match or mismatch; can be fixed, or taken from PAM or BLOSUM
    • Modify LCS and PRINT-LCS algorithms to support global alignment (On board discussion)
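One way the LCS fill might be modified for global alignment is sketched below; it is not the on-board solution. The name `global_score` is hypothetical, `H` is passed in as a two-argument scoring function for aligned residues, and the border initialization (each leading gap position costs `V`) is an assumption, since the slide does not spell it out.

```python
def global_score(v, w, H, V):
    """Optimal global alignment score with substitution scores H and fixed gap penalty V."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed borders: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] - V
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] - V
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(
                s[i - 1][j] - V,                            # v_i aligned with a gap
                s[i][j - 1] - V,                            # w_j aligned with a gap
                s[i - 1][j - 1] + H(v[i - 1], w[j - 1]),    # v_i aligned with w_j
            )
    return s[n][m]
```

With an illustrative scoring of +1 for a match, -1 for a mismatch, and `V = 2`, aligning `"ACGT"` with `"AGT"` scores 1: three matches minus one gap.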

    Extend to Local Alignment

    \[ s_{i,j} = \max \begin{cases} 0 & \text{(no negative scores)} \\ s_{i-1,j} + H(v_i, -) \\ s_{i,j-1} + H(-, w_j) \\ s_{i-1,j-1} + H(v_i, w_j) \end{cases} \]

    • $H(v_i, -) = H(-, w_j) = -V$, where $V$ is the fixed gap penalty
    • $H(v_i, w_j)$ = score for match or mismatch; can be fixed, or taken from PAM or BLOSUM

    Discussion on adding affine gap penalties
    • Affine gap penalty
        • Score for a gap of length $x$: $-(V + Wx)$
        • Where
            • $V > 0$ is the insert gap penalty
            • $W > 0$ is the extend gap penalty
    • On board example from http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
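As a quick arithmetic check of the formula, take the illustrative values $V = 5$ and $W = 1$ (assumed here, not taken from the slides). One gap of length 3 is then penalized far less than three separate gaps of length 1, which is the behavior an affine penalty is meant to produce:

\[
-(V + W \cdot 3) = -(5 + 3) = -8
\qquad \text{versus} \qquad
3 \cdot \bigl(-(V + W \cdot 1)\bigr) = 3 \cdot (-6) = -18
\]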

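    Alignment with Gap Penalties
    Can apply to global or local (w/ zero) algorithms

    \[ s^{\downarrow}_{i,j} = \max \begin{cases} s^{\downarrow}_{i-1,j} - W \\ s_{i-1,j} - (V + W) \end{cases} \]

    \[ s^{\rightarrow}_{i,j} = \max \begin{cases} s^{\rightarrow}_{i,j-1} - W \\ s_{i,j-1} - (V + W) \end{cases} \]

    \[ s_{i,j} = \max \begin{cases} s_{i-1,j-1} + H(v_i, w_j) \\ s^{\downarrow}_{i,j} \\ s^{\rightarrow}_{i,j} \end{cases} \]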
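A minimal sketch of the three-matrix scheme for the global case, assuming a gap of length x scores -(V + Wx). The names `affine_global_score`, `down`, and `right` are hypothetical, as is the border initialization, in which a leading gap of length i costs V + W·i. Unreachable states are set to negative infinity so they never win a `max`.

```python
NEG_INF = float("-inf")

def affine_global_score(v, w, H, V, W):
    """Global alignment score with affine gaps: opening costs V + W, each extension costs W."""
    n, m = len(v), len(w)
    s = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    down = [[NEG_INF] * (m + 1) for _ in range(n + 1)]    # alignments ending with v_i against a gap
    right = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # alignments ending with w_j against a gap
    s[0][0] = 0
    for i in range(1, n + 1):
        down[i][0] = -(V + W * i)
        s[i][0] = down[i][0]
    for j in range(1, m + 1):
        right[0][j] = -(V + W * j)
        s[0][j] = right[0][j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap (cost W) or open a new one (cost V + W).
            down[i][j] = max(down[i - 1][j] - W, s[i - 1][j] - (V + W))
            right[i][j] = max(right[i][j - 1] - W, s[i][j - 1] - (V + W))
            s[i][j] = max(s[i - 1][j - 1] + H(v[i - 1], w[j - 1]), down[i][j], right[i][j])
    return s[n][m]
```

With +1 for a match, -1 for a mismatch, `V = 2`, and `W = 1`, aligning `"AAACCC"` with `"AAA"` scores -2: three matches minus one gap of length 3, which costs 2 + 3 = 5. Adding 0 to the final `max` and zero-initializing the borders would give the local variant.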

    Project Teams and Presentation Assignments
    • Base Project (Global Alignment): Shwe and Leighton
    • Extension 1 (Ends-Free Global Alignment): Ehsanul and Water Tree
    • Extension 2 (Local Alignment): Scott and Brian
    • Extension 3 (Affine Gap Penalty): Charlyn and David
    • Extension 4 (Database): Daniel and Ashley
    • Extension 5 (Space Efficient Algorithm): Kendra and Qing

    Workshop
    • Meet with your group and develop a high-level algorithm for the overall structure of your program
    • Identify the modules, functions (including parameters), and global variables
    • Determine who is responsible for each module
    • Devise a development timeline and a testing strategy