![Page 1: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/1.jpg)
Pairwise Alignment
How do we tell whether two sequences are similar?
BIO520 Bioinformatics Jim Lund
Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can out of 5.2, 5.4
![Page 2: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/2.jpg)
Pairwise alignment
• DNA:DNA
• polypeptide:polypeptide
The BASIC Sequence Analysis Operation
![Page 3: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/3.jpg)
Alignments
• Pairwise sequence alignments
–One-to-One
–One-to-Database
• Multiple sequence alignments
–Many-to-Many
![Page 4: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/4.jpg)
Origins of Sequence Similarity
• Homology
– common evolutionary descent
• Chance
– Short similar segments are very common.
• Similarity in function
– Convergence (very rare)
![Page 5: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/5.jpg)
![Page 6: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/6.jpg)
Visual sequence comparison: Dotplot
![Page 7: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/7.jpg)
Visual sequence comparison: Filtered dotplot
4 bp window, 75% identity cutoff
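The filtered dotplot can be sketched in a few lines. This is a minimal illustration, not the program used to draw the slides: the function name `dotplot` and the text output are assumptions, while the 4-base window and 75% identity cutoff come from the slide.

```python
def dotplot(seq_x: str, seq_y: str, window: int = 4, min_identity: float = 0.75):
    """Return the rows of a filtered dotplot as strings.

    A '*' at (row j, column i) means the windows starting at seq_x[i] and
    seq_y[j] are identical in at least min_identity of their positions.
    """
    rows = []
    for j in range(len(seq_y) - window + 1):
        row = []
        for i in range(len(seq_x) - window + 1):
            matches = sum(
                a == b for a, b in zip(seq_x[i:i + window], seq_y[j:j + window])
            )
            row.append("*" if matches / window >= min_identity else ".")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    # A sequence compared with a tandem copy of itself shows two diagonals.
    for line in dotplot("GAACAATGAACAAT", "GAACAAT"):
        print(line)
```

Raising the window size or the cutoff removes more of the short chance matches and leaves the true diagonals.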
![Page 8: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/8.jpg)
Visual sequence comparison: Dotplot
4 bp window, 75% identity cutoff
![Page 9: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/9.jpg)
Dotplots of sequence rearrangements
![Page 10: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/10.jpg)
Assessing similarity
GAACAAT
|||||||    7/7 or 100%
GAACAAT

GAACAAT
    |      1/7 or 14%
  GAACAAT
Which is BETTER?
How do we SCORE?
![Page 11: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/11.jpg)
Similarity
GAACAAT
|||||||    7/7 or 100%
GAACAAT

GAACAAT
||| |||    6/7 or 86%
GAATAAT
MISMATCH
![Page 12: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/12.jpg)
Mismatches
GAACAAT
||| |||    6/7 or 86%
GAATAAT

GAACAAT
||| |||    6/7 or 86%
GAAGAAT
![Page 13: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/13.jpg)
Terminal Mismatch
     GAACAATttttt
     ||| |||        6/7 or 86%
aaaccGAATAAT
![Page 14: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/14.jpg)
INDELS
GAAgCAAT
||| ||||   7/7 or 100%
GAA*CAAT
![Page 15: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/15.jpg)
Indels, cont’d
GAAgCAAT
||| ||||
GAA*CAAT

GAAggggCAAT
|||    ||||
GAA****CAAT
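The counting used on these slides (matches over compared positions, with indel columns skipped) can be written as a short function. This is a sketch under assumptions: the name `identity` is made up here, gaps are taken to be `*` or `-`, and lowercase inserted bases are compared case-insensitively.

```python
def identity(top: str, bottom: str, gap_chars: str = "*-"):
    """Count identical positions in two aligned strings of equal length.

    Columns where either sequence has a gap character are skipped, so an
    indel does not lower the identity (which is why gaps need a penalty).
    Returns (matches, compared_positions).
    """
    matches = compared = 0
    for a, b in zip(top, bottom):
        if a in gap_chars or b in gap_chars:
            continue  # indel column: not counted either way
        compared += 1
        matches += a.upper() == b.upper()
    return matches, compared


if __name__ == "__main__":
    m, n = identity("GAACAAT", "GAATAAT")
    print(f"{m}/{n} = {100 * m / n:.0f}%")
```

Because gaps are free under this measure, it cannot be used on its own to choose between alignments.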
![Page 16: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/16.jpg)
Similarity Scoring
Common Method:
• Terminal mismatches (0)
• Match score (1)
• Mismatch penalty (-3)
• Gap penalty (-1)
• Gap extension penalty (-1)
DNA Defaults
![Page 17: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/17.jpg)
DNA Scoring
GGGGGGAGAA
|||||*|*||        8(1)+2(-3)=2
GGGGGAAAAAGGGGG

GGGGGGAGAA--GGG
|||||*|*||  |||   11(1)+2(-3)+1(-1)+1(-1)=3
GGGGGAAAAAGGGGG
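The two totals can be checked with a small scoring function. This is only a sketch: `score_alignment` is a hypothetical helper, gaps are written as `-`, and the defaults are the DNA values from the Similarity Scoring slide (match 1, mismatch -3, first gap position -1, each further gap position -1, terminal overhang 0).

```python
from itertools import zip_longest


def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -3,
                    gap_open: int = -1, gap_extend: int = -1) -> int:
    """Score two already-aligned sequences column by column."""
    score = 0
    in_gap = False
    for a, b in zip_longest(top, bottom):
        if a is None or b is None:
            # One sequence has ended: terminal overhang scores 0.
            in_gap = False
            continue
        if a == "-" or b == "-":
            # First gap column pays gap_open, later ones pay gap_extend.
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


if __name__ == "__main__":
    print(score_alignment("GGGGGGAGAA", "GGGGGAAAAAGGGGG"))
    print(score_alignment("GGGGGGAGAA--GGG", "GGGGGAAAAAGGGGG"))
```

The sketch does not distinguish a gap in the top sequence from one in the bottom sequence, which is enough for examples like these.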
![Page 18: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/18.jpg)
Absurdity of Low Gap Penalty
GATCGCTACGCTCAGC
 A.C.C..C..T
Perfect similarity, every time!
![Page 19: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/19.jpg)
Sequence alignment algorithms
• Local alignment
– Smith-Waterman
• Global alignment
– Needleman-Wunsch
![Page 20: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/20.jpg)
Alignment Programs
• Local alignment (Smith-Waterman)
– BLAST (simplified Smith-Waterman)
– FASTA (simplified Smith-Waterman)
– BESTFIT (GCG program)
• Global alignment (Needleman-Wunsch)
– GAP
![Page 21: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/21.jpg)
Local vs. global alignment
10 gaggc 15
   |||||
 3 gaggc 7

1 gggggaaaaagtggccccc 19
  || |||| ||
1 gggggttttttttgtggtttcc 22
Global alignment: alignment of the full length of the sequences
Local alignment: alignment of regions of substantial similarity
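The difference between the two definitions shows up in the dynamic programming recurrences. The sketch below computes only the best score, not the alignment itself, and makes simplifying assumptions: a single linear gap cost of -1 per position (no separate opening penalty), the DNA match/mismatch values from the scoring slide, and a hypothetical function name `align_score`.

```python
def align_score(a: str, b: str, local: bool = False,
                match: int = 1, mismatch: int = -3, gap: int = -1) -> int:
    """Best alignment score: Needleman-Wunsch (global) or Smith-Waterman (local)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are paid for in the first row and column.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, table[i - 1][j] + gap, table[i][j - 1] + gap)
            if local:
                # Local: never go below zero, and remember the best cell anywhere.
                cell = max(cell, 0)
                best = max(best, cell)
            table[i][j] = cell
    return best if local else table[-1][-1]


if __name__ == "__main__":
    x, y = "ttgaggctt", "ccgaggccc"
    print("local :", align_score(x, y, local=True))
    print("global:", align_score(x, y))
```

The local score rewards only the shared core, while the global score also pays for the unrelated flanks.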
![Page 22: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/22.jpg)
Local vs. global alignment
![Page 23: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/23.jpg)
BLAST Algorithm
Look for a local alignment, a High Scoring Segment Pair (HSP)
• Find words (length W) in query and subject with score > T.
• Extend the local alignment until the score falls X below its maximum.
• Keep High Scoring Segment Pairs (HSPs) with scores > S.
• Find multiple HSPs per query if present
• Expectation value (E value) using Karlin-Altschul stats
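The seed-and-extend idea can be illustrated with a toy version. This is a heavily simplified sketch, not BLAST itself: seeds are exact words of length W (real BLAST scores neighborhood words against a threshold T), extension is ungapped, and the names `extend` and `find_hsps` and the default values are assumptions made for the example.

```python
def extend(query, subject, qi, si, step, match, mismatch, x_drop):
    """Extend without gaps in one direction (step = +1 or -1).

    Stops once the running score has fallen more than x_drop below the
    best score seen. Returns (best_score_gain, length_at_best).
    """
    score = best = best_len = length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += match if query[qi] == subject[si] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break
        qi += step
        si += step
    return best, best_len


def find_hsps(query, subject, w=4, x_drop=5, min_score=6, match=1, mismatch=-3):
    """Return sorted (score, query_start, subject_start, length) tuples."""
    # Index every word of length w in the query.
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hsps = set()
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            right, right_len = extend(query, subject, i + w, j + w, 1,
                                      match, mismatch, x_drop)
            left, left_len = extend(query, subject, i - 1, j - 1, -1,
                                    match, mismatch, x_drop)
            score = w * match + left + right
            if score >= min_score:  # plays the role of the cutoff S
                hsps.add((score, i - left_len, j - left_len,
                          left_len + w + right_len))
    return sorted(hsps, reverse=True)


if __name__ == "__main__":
    print(find_hsps("ttgaggctt", "ccgaggccc", w=4, min_score=5))
```

Overlapping seeds that extend to the same segment collapse into one HSP because the results are kept in a set.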
![Page 24: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/24.jpg)
BLAST statistical significance: assessing the likelihood a match occurs by chance
Karlin-Altschul statistic:
E = k m N exp(-Lambda S)
m = Size of query sequence
N = Size of database
k = Search space scaling parameter
Lambda = scoring scaling parameter
S = BLAST HSP score
Low E -> good match
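The formula is easy to evaluate directly. In the sketch below the function name `expect_value` is hypothetical, and the values used for k and Lambda are placeholders chosen only to show the behaviour; real values depend on the scoring system and are not given on the slide.

```python
import math


def expect_value(score: float, query_len: int, db_len: int,
                 k: float, lam: float) -> float:
    """Karlin-Altschul expectation: E = k * m * N * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)


if __name__ == "__main__":
    # Placeholder parameters (assumed for illustration only).
    k, lam = 0.1, 0.3
    for s in (20, 40, 60):
        e = expect_value(s, query_len=500, db_len=1_000_000, k=k, lam=lam)
        print(f"S = {s:3d}  E = {e:.3g}")
```

E falls exponentially as the score rises and grows in proportion to both query length and database size, so the same score is less convincing in a larger database.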
![Page 25: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/25.jpg)
BLAST statistical significance:
Rule of thumb for a good match:
• Nucleotide match
– E < 1e-6
– Identity > 70%
• Protein match
– E < 1e-3
– Identity > 25%
![Page 26: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/26.jpg)
Protein Similarity Scoring
• Identity - Easy
• WEAK Alignments
• Chemical Similarity
– L vs I, K vs R…
• Evolutionary Similarity
– How do proteins evolve?
– How do we infer similarities?
![Page 27: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/27.jpg)
BLOSUM62
     C   S   T   P   A   G   N   D
C    9  -1  -1  -3   0  -3  -3  -3
S   -1   4   1  -1   1   0   1   0
T   -1   1   4   1  -1   1   0   1
P   -3  -1   1   7  -1  -2  -1  -1
A    0   1  -1  -1   4   0  -1  -2
G   -3   0   1  -2   0   6  -2  -1
N   -3   1   0  -2  -2   0   6   1
D   -3   0   1  -1  -2  -1   1   6
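A substitution matrix is used by summing one table lookup per aligned column. The sketch below does this for an ungapped pair of peptides; it uses only a six-residue subset (C, S, P, A, G, D) of the excerpt above, chosen for brevity, and the names `SCORES` and `score_ungapped` are assumptions made for the example.

```python
RESIDUES = "CSPAGD"
ROWS = [
    [9, -1, -3, 0, -3, -3],   # C
    [-1, 4, -1, 1, 0, 0],     # S
    [-3, -1, 7, -1, -2, -1],  # P
    [0, 1, -1, 4, 0, -2],     # A
    [-3, 0, -2, 0, 6, -1],    # G
    [-3, 0, -1, -2, -1, 6],   # D
]
# Dictionary keyed by residue pair, e.g. SCORES[("S", "A")] == 1.
SCORES = {
    (a, b): ROWS[i][j]
    for i, a in enumerate(RESIDUES)
    for j, b in enumerate(RESIDUES)
}


def score_ungapped(top: str, bottom: str) -> int:
    """Sum the substitution scores of two equal-length, ungapped peptides."""
    if len(top) != len(bottom):
        raise ValueError("sequences must have the same length")
    return sum(SCORES[(a, b)] for a, b in zip(top, bottom))


if __name__ == "__main__":
    print(score_ungapped("CSPAGD", "CSPAGD"))  # identities only
    print(score_ungapped("SAG", "AGD"))        # conservative and neutral swaps
```

Identities do not all score the same: a conserved C or P counts for more than a conserved S or A.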
![Page 28: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/28.jpg)
Single-base evolution changes the encoded AA
CAU=H
CAC=H CGU=R UAU=Y
CAA=Q CCU=P GAU=D
CAG=Q CUU=L AAU=N
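The nine codons in this table are exactly the single-base neighbours of CAU, which can be enumerated from the standard genetic code. This is a small sketch; the compact table string is the usual UCAG-ordered encoding of the standard code, and the function name `single_base_neighbors` is made up for the example.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_base_neighbors(codon: str) -> dict:
    """Map every codon one substitution away from `codon` to its amino acid."""
    neighbors = {}
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                neighbors[mutant] = CODON_TABLE[mutant]
    return neighbors


if __name__ == "__main__":
    print("CAU =", CODON_TABLE["CAU"])
    for mutant, amino in single_base_neighbors("CAU").items():
        print(f"{mutant}={amino}")
```

Only the CAC change leaves the amino acid unchanged, and it is a third-position change.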
![Page 29: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/29.jpg)
Substitution Matrices
Two main classes:
• PAM-Dayhoff
• BLOSUM-Henikoff
![Page 30: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/30.jpg)
PAM-Dayhoff
• Built from closely related proteins, substitutions constrained by evolution and function
• “accepted” by evolution (Point Accepted Mutation=PAM)
• 1 PAM::1% divergence
• PAM120=closely related proteins
• PAM250=divergent proteins
![Page 31: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/31.jpg)
BLOSUM-Henikoff&Henikoff
• Built from ungapped alignments in proteins: “BLOCKS”
• Merge blocks at given % similar to one sequence
• Calculate “target” frequencies
• BLOSUM62=62% similar blocks
– good general purpose
• BLOSUM30
– Detects weak similarities, used for distantly related proteins
![Page 32: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/32.jpg)
BLOSUM62
     C   S   T   P   A   G   N   D
C    9  -1  -1  -3   0  -3  -3  -3
S   -1   4   1  -1   1   0   1   0
T   -1   1   4   1  -1   1   0   1
P   -3  -1   1   7  -1  -2  -1  -1
A    0   1  -1  -1   4   0  -1  -2
G   -3   0   1  -2   0   6  -2  -1
N   -3   1   0  -2  -2   0   6   1
D   -3   0   1  -1  -2  -1   1   6
![Page 33: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/33.jpg)
Gapped alignments
• No general theory for significance of matches!!
• G+L(n)
– G = gap opening penalty, L = gap extension penalty, n = gap length
– indel mutations rare
– variation in gap length “easy”, G > L
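A quick calculation shows why this form of penalty prefers one long gap to several short ones. The sketch reads L(n) as L times n, which is an assumption, and the default values 10 and 1 are placeholders rather than the defaults of any particular program.

```python
def gap_cost(n: int, open_penalty: float = 10.0, extend_penalty: float = 1.0) -> float:
    """Affine gap cost G + L*n for a single gap spanning n positions."""
    if n < 1:
        raise ValueError("gap length must be at least 1")
    return open_penalty + extend_penalty * n


if __name__ == "__main__":
    print("one gap of length 4 :", gap_cost(4))
    print("four gaps of length 1:", 4 * gap_cost(1))
```

With G larger than L, opening a new gap is the expensive event and lengthening an existing gap is cheap, which matches the two sub-points above.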
![Page 34: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/34.jpg)
Real Alignments
![Page 35: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/35.jpg)
Phylogeny
![Page 36: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/36.jpg)
  1 MGLSDGEWQLVLNAWGKVEADVAGHGQEVLIRLFTGHPETLEKFDKFKHL 50
    ||||||||||||| |||||||||||||||||||| |||||||||||||||
  1 MGLSDGEWQLVLNVWGKVEADVAGHGQEVLIRLFKGHPETLEKFDKFKHL 50

 51 KTEAEMKASEDLKKHGNTVLTALGGILKKKGHHEAEVKHLAESHANKHKI 100
    |.| ||||||||||||||||||||||||||||||||.  ||:||| ||||
 51 KSEDEMKASEDLKKHGNTVLTALGGILKKKGHHEAELTPLAQSHATKHKI 100

101 PVKYLEFISDAIIHVLHAKHPSDFGADAQAAMSKALELFRNDMAAQYKVL 150
    |||||||||:||| || .||| ||||||| |||||||||||||||.|| |
101 PVKYLEFISEAIIQVLQSKHPGDFGADAQGAMSKALELFRNDMAAKYKEL 150

151 GFHG 154
    || |
151 GFQG 154
Cow-to-Pig Protein
![Page 37: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/37.jpg)
Cow-to-Pig cDNA
  1 CAGCTGTCGGAGACAGACACCCAGTCAGTCCCGCCCTTGTTCTTTTTCTC 50
           | ||| |||    ||    |  ||||| |||| ||| ||||||
  1 .......CAGAGCCAGGACACCCAGTACGCCCGCACTTGCTCTGTTTCTC 43

 51 TTCTTCAGACTGCGCCATGGGGCTCAGCGACGGGGAATGGCAGTTGGTGC 100
    |||| ||||||| |||||||||||||||||||||||||||||| ||||||
 44 TTCTGCAGACTGTGCCATGGGGCTCAGCGACGGGGAATGGCAGCTGGTGC 93

101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150
    |||| | |||||||||||||||||||||||||||||||||||||||||||
 94 TGAACGTCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 143

151 GTCCTCATCAGGCTCTTCACAGGTCATCCCGAGACCCTGGAGAAATTTGA 200
    ||||||||||||||||| |  ||||| |||||||||||||||||||||||
144 GTCCTCATCAGGCTCTTTAAGGGTCACCCCGAGACCCTGGAGAAATTTGA 193

201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250
    |||||| |||||||||||| |||||| ||||||||||||||| |||||||
194 CAAGTTTAAGCACCTGAAGTCAGAGGATGAGATGAAGGCCTCTGAGGACC 243
80% Identity (88% at aa!)
![Page 38: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/38.jpg)
DNA similarity reflects polypeptide similarity
101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150
    |||| | |||||||||||||||||||||||||||||||||||||||||||
 94 TGAACGTCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 143

501 CCAGTACAAGGTGCTGGGCTTCCATGGCTAAGCCCCACCCCTGTGCCCCT 550
    | ||||||||| |||||||||||| ||||||||||| |    |  || |
494 CAAGTACAAGGAGCTGGGCTTCCAGGGCTAAGCCCCCCAGACGCCCCTCA 543
![Page 39: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/39.jpg)
Coding vs Non-coding Regions
451 CAGGCTGCCATGAGCAAGGCCCTGGAACTGTTCCGGAATGACATGGCTGC 500
    ||||  ||||||||||||||||||||||| |||||||| |||||||| ||
444 CAGGGAGCCATGAGCAAGGCCCTGGAACTCTTCCGGAACGACATGGCGGC 493

501 CCAGTACAAGGTGCTGGGCTTCCATGGCTAAGCCCCACCCCTGTGCCCCT 550
    | ||||||||| |||||||||||| ||||||||||| |    |  || |
494 CAAGTACAAGGAGCTGGGCTTCCAGGGCTAAGCCCCCCAGACGCCCCTCA 543

551 CAC.CCCACCCACCTGGG...........CAGGGTGGGCGGGGACTGAAT 588
    | | |||| |||| ||||           |  || ||| |||  |||||
544 CCCACCCATCCACTTGGGCCAGGGCCCCCCGCGGAGGGTGGGCGCTGAAG 593

589 CCCAAGTAGTTATAGGGTTTGCTTCTGAGTGTGTGCTTTGTTTAGGAGAG 638
    | |  |||| | |||||||||||||||||||| ||||||||| | |||||
594 CTCCTGTAGCTGTAGGGTTTGCTTCTGAGTGT.TGCTTTGTTCATGAGAG 642

639 GTGGGTGGAAGAGGTGGATGGGTTAGGGGTGGAGG............... 673
    |||||||| ||||||||| ||| | | ||||| ||
643 GTGGGTGGGAGAGGTGGAGGGGCTGGTGGTGGTGGTGGGGGGGTGTTCAG 692
90% in coding (70% in non-coding)
![Page 40: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/40.jpg)
Third Base of Codon is Hypervariable
201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250
    ||||||*||||||||||||*||||||*|||||||||||||||*|||||||
194 CAAGTTTAAGCACCTGAAGTCAGAGGATGAGATGAAGGCCTCTGAGGACC 243

251 TGAAGAAGCATGGCAACACGGTGCTCACGGCCCTGGGGGGTATCCTGAAG 300
    ||||||||||*||||||||||||||*||*|||||||||||*|||||*|||
244 TGAAGAAGCACGGCAACACGGTGCTGACTGCCCTGGGGGGCATCCTTAAG 293
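The pattern can be tallied by assigning each mismatch to a codon position. The sketch below is an illustration under assumptions: the function name is hypothetical, the alignment must be ungapped, and the coding sequence is taken to start at cow position 67 (where ATG GGG CTC AGC, encoding MGLS, appears in the Cow-to-Pig cDNA alignment).

```python
from collections import Counter


def mismatches_by_codon_position(top: str, bottom: str,
                                 top_start: int, cds_start: int) -> Counter:
    """Count mismatches at codon positions 1, 2 and 3.

    top_start is the coordinate of the first base of `top`; cds_start is
    the coordinate of the first base of the start codon (same numbering).
    """
    counts = Counter()
    for offset, (a, b) in enumerate(zip(top, bottom)):
        if a != b:
            codon_position = (top_start + offset - cds_start) % 3 + 1
            counts[codon_position] += 1
    return counts


if __name__ == "__main__":
    cow = "TGAAGAAGCATGGCAACACGGTGCTCACGGCCCTGGGGGGTATCCTGAAG"  # cow 251-300
    pig = "TGAAGAAGCACGGCAACACGGTGCTGACTGCCCTGGGGGGCATCCTTAAG"  # pig 244-293
    print(dict(mismatches_by_codon_position(cow, pig, top_start=251, cds_start=67)))
```

Third-position changes are often silent, which is why they accumulate faster than changes at the first two positions.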
![Page 41: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/41.jpg)
Cow-to-Fish Protein
  1 MGLSDGEWQLVLNAWGKVEADVAGHGQEVLIRLFTGHPETLEKFDKFKHL 50
          :. :||  || .||| | ||  || |||| |||||. | ||  :
  1 ....MADFDMVLKCWGPMEADHATHGSLVLTRLFTEHPETLKLFPKFAGI 46

 51 KTEAEMKASEDLKKHGNTVLTALGGILKKKGHHEAEVKHLAESHANKHKI 100
        ::     .  || |||  || :|| :| | | .| |. ||| ||||
 47 .AHGDLAGDAGVSAHGATVLNKLGDLLKARGAHAALLKPLSSSHATKHKI 95

101 PVKYLEFISDAIIHVLHAKHPSDFGADAQAAMSKALELFRNDMAAQYKVL 150
    |:   . |.: |  |:  |   |  |  | |:   : :   || | || |
 96 PIINFKLIAEVIGKVMEEKAGLD..AAGQTALRNVMAIIITDMEADYKEL 143

151 GFHG 154
    ||
144 GFTE 147
42% identity, 51% similarity
![Page 42: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/42.jpg)
Cow-to-Fish DNA
 32 .ACAGGACATTTTACTACTCTGCAGATAATGGCTGACTTTGACATGGTAC 80
      |   | | |   | |    ||  |     |  || |   |  |||| |
 51 TTCTTCAGACTGCGCCATGGGGCTCAGCGACGGGGAATGGCAGTTGGTGC 100

 81 TGAAGTGCTGGGGTCCAATGGAGGCGGACCACGCAACCCACGGGAGTCTG 130
    ||||   ||||||     ||||||| ||   ||||  ||| |||     |
101 TGAATGCCTGGGGGAAGGTGGAGGCTGATGTCGCAGGCCATGGGCAGGAG 150

131 GTGCTGACCCGTTTATTCACAGAGCACCCAGAAACCCTAAAGTTATTCCC 180
    || || | | |  | |||||||  || || || |||||  ||  |||
151 GTCCTCATCAGGCTCTTCACAGGTCATCCCGAGACCCTGGAGAAATTTGA 200

181 CAAGTTTGCTGGC...ATCGCCCATGGGGACCTGGCCGGGGATGCAGGTG 227
    ||||||      |   |   |  | |  ||  ||   |     |  |
201 CAAGTTCAAGCACCTGAAGACAGAGGCTGAGATGAAGGCCTCCGAGGACC 250
48% similarity
![Page 43: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/43.jpg)
Protein vs. DNA Alignments
• Polypeptide similarity > DNA
• Coding DNA > Non-coding
• 3rd base of codon hypervariable
• Moderate distance: poor DNA similarity
![Page 44: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/44.jpg)
Rules of Thumb
• DNA-DNA similarities
– 50% significant if “long”
– E < 1e-6, 70% identity
• Protein-protein similarities
– 80% end-end: same structure, same function
– 30% over domain, similar function, structure overall similar
– 15-30% “twilight zone”
– Short, strong match…could be a “motif”
![Page 45: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/45.jpg)
Basic BLAST Family
• BLASTN
– DNA to DNA database
• BLASTP
– protein to protein database
• TBLASTN
– protein to DNA database (translated)
• BLASTX
– DNA (translated) to protein database
• TBLASTX
– DNA (translated) to DNA database (translated)
![Page 46: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/46.jpg)
DNA Databases
• nr (non-redundantish merge of Genbank, EMBL, etc…)
– EXCLUDES HTGS0,1,2, EST, GSS, STS, PAT, WGS
• est (expressed sequence tags)
• htgs (high throughput genome seq.)
• gss (genome survey sequence)
• vector, yeast, ecoli, mito
• chromosome (complete genomes)
• And more
http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases
![Page 47: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/47.jpg)
Protein Databases
• nr (non-redundant Swiss-prot, PIR, PRF, PDB, Genbank CDS)
• swissprot
• ecoli, yeast, fly
• month
• And more
![Page 48: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/48.jpg)
BLAST Input
• Program
• Database
• Options - see more
• Sequence
– FASTA
– gi or accession#
![Page 49: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/49.jpg)
BLAST Options
• Algorithm and output options
– # descriptions, # alignments returned
– Probability cutoff
– Strand
• Alignment parameters
– Scoring Matrix
• PAM30, PAM70, BLOSUM45, BLOSUM62, BLOSUM80
– Filter (low complexity) PPPPP->XXXXX
![Page 50: Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch 4.1-4.7, Ch 5.1, get what you can.](https://reader035.fdocuments.in/reader035/viewer/2022062800/56649de55503460f94adcfa3/html5/thumbnails/50.jpg)
Extended BLAST Family
• Gapped Blast (default)
• PSI-Blast (Position-specific iterated blast)
– “self” generated scoring matrix
• PHI BLAST (motif plus BLAST)
• BLAST2 client (align two seqs)
• megablast (genomic sequence)
• rpsblast (search for domains)