7/27/2019 03 Comparison.ppt
Sequence Comparison
BINF3010/9010
Homology and similarity
Homology: sequences are homologous if they are evolutionarily related, i.e. they share a common ancestor through evolution
Similarity: looking alike; not an evolutionary concept
Homology and similarity
Homology is not a quantity: two sequences are either homologous or not homologous
  e.g., it is incorrect to refer to two sequences as being 50% homologous
Similarity can be quantified
  e.g., two sequences can be 50% similar, 80% similar, etc.
Homology and similarity
Computational methods recognise and measure similarity
High similarity is supporting evidence to infer homology
Types of homology
Orthologs: genes/proteins descended from a common ancestor through a speciation event
Paralogs: genes/proteins related to each other through a gene duplication event
Evolution through mutations
SPAMEGGANDSPAM -> (substitutions) -> SPATEGGANDSPAM
SPATEGGANDSPAM -> (insertions) -> 1 SPLATEGGANDSPAM
SPATEGGANDSPAM -> (deletions) -> 2 SPAGANDSPAM
Visualising the process
Dot matrix plots (dotplots)
Alignments
Dot matrix plot
[Figure: dot matrix plot of sequence 1 (SPLATEGGANDSPAM) against sequence 2 (SPAGANDSPAM)]
Dot matrix plots
Dot matrix plot: Principle
[Figure: dot plots of AAGTTCAGTAGGCATTTAAGCG (horizontal) against AGTACCGTTCC (vertical); a dot marks each position where the words starting there match. Panels: Word size = 1, Word size = 2, Word size = 3]
[Figure: dot plot of AAGTTCAGTAGGCATTTAAGCG (horizontal) against AGTACCGTTCC (vertical). Word size = 3, Threshold = 2: a dot is drawn when at least 2 of the 3 positions in a word match]
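The word size and threshold settings can be illustrated with a short sketch. This is a minimal illustration, not the method of any particular dot plot program: the function names `dotplot` and `render` are hypothetical, and the two sequences are the ones read off the plots above.

```python
def dotplot(seq_x, seq_y, word=1, threshold=None):
    """Return the set of (row, col) cells that receive a dot.

    A dot is placed at (i, j) when the words of length `word` starting at
    seq_y[i] and seq_x[j] agree in at least `threshold` positions.
    """
    if threshold is None:
        threshold = word  # default: the whole word must match
    dots = set()
    for i in range(len(seq_y) - word + 1):
        for j in range(len(seq_x) - word + 1):
            matches = sum(a == b for a, b in zip(seq_y[i:i + word], seq_x[j:j + word]))
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the dot plot as text, one row per residue of seq_y."""
    lines = ["  " + seq_x]
    for i, residue in enumerate(seq_y):
        row = "".join("*" if (i, j) in dots else " " for j in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


horizontal = "AAGTTCAGTAGGCATTTAAGCG"
vertical = "AGTACCGTTCC"
for word, threshold in [(1, 1), (2, 2), (3, 3), (3, 2)]:
    print(f"Word size = {word}, Threshold = {threshold}")
    print(render(horizontal, vertical, dotplot(horizontal, vertical, word, threshold)))
```

Larger word sizes remove isolated chance matches; lowering the threshold below the word size lets imperfect words through again.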
[Figure: dot plots at four settings: Window = 30, Stringency = 9; Window = 20, Stringency = 9; Window = 30, Stringency = 14; Window = 20, Stringency = 13]
Dot matrix plot: repeats
[Figure: dot matrix plot of sequence 1 (SPLATEGGANDSPAM) against sequence 2 (SPAGANDSPAM)]
Repeat detection
[Figure: dot plot of TFIIIA vs TFIIIA]
Sequence alignment
Unaligned:
1 SPLATEGGANDSPAM
2 SPAGANDSPAM
Aligned:
1 SPLATEGGANDSPAM
  || |   ||||||||
2 SP-A---GANDSPAM
Global vs Local Alignment
Global alignment:
  1 ....AUAUCUUUAAUUUAAUGGUAAAAUAUUAGAAUACGAAUCUAAUUAU 46
        ||||  || |   ||    ||    || || |    |  | || ||
  1 UGGUAUAUAGUUUAAACAAAACGAAUGAUUUCGACUCAUUAAAUUAUGAU 50

 47 AUAGGUUCAAAUCCUAUAAGAUAUUCCA 74
    |    |    | |    |
 51 AAUCAUAUUUACCAACCA.......... 68
Local alignment:
 44 UAUAUAGGUUCAA 56
    ||||||| || ||
  4 UAUAUAGUUUAAA 16
Global: align the whole of the two sequences together
Local: align only the region of best similarity
Which alignment is correct?
1 SPLATEGGANDSPAM
  || |   ||||||||
2 SP-A---GANDSPAM
2 insertion/deletions

1 SPLATEGGANDSPAM
  ||     ||||||||
2 SPA----GANDSPAM
1 indel, 1 substitution

1 SPLATEGGANDSPAM
  ||     ||||||||
2 SP----AGANDSPAM
1 indel, 1 substitution

1 SPLATEGGANDSPAM
     |   ||||||||
2 -SPA---GANDSPAM
2 indels, 2 substitutions
Which alignment is optimal?
Select a scoring system for alignments
  Assign values to matches, mismatches and gaps
Sum up the values over the whole alignment
  $\text{Alignment score} = \sum \text{Score}_{\text{match}} - \sum \text{Score}_{\text{gap}}$
The optimal alignment is the one with the highest score
For example:
Match: +2   Mismatch: -1   Gap: 5 (subtracted once per gap, whatever its length)

1 SPLATEGGANDSPAM
  || |   ||||||||
2 SP-A---GANDSPAM
S = (11*2) + (0*-1) - (2*5) = 12

1 SPLATEGGANDSPAM
  ||x    ||||||||
2 SPA----GANDSPAM
S = (10*2) + (1*-1) - (1*5) = 14

1 SPLATEGGANDSPAM
  ||    x||||||||
2 SP----AGANDSPAM
S = (10*2) + (1*-1) - (1*5) = 14

1 SPLATEGGANDSPAM
   xx|   ||||||||
2 -SPA---GANDSPAM
S = (9*2) + (2*-1) - (2*5) = 6
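The four scores above can be reproduced with a small scoring function. This is a sketch under the assumption, taken from the worked numbers, that the gap value is charged once per run of gap characters; the name `alignment_score` is hypothetical.

```python
def alignment_score(row1, row2, match=2, mismatch=-1, gap=5):
    """Score a pairwise alignment given as two equal-length gapped strings.

    Each run of '-' in either row counts as one gap and costs `gap` once,
    regardless of its length (the convention used in the example above).
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    previous = 0  # 0 = no gap, 1 = gap in row1, 2 = gap in row2
    for a, b in zip(row1, row2):
        current = 1 if a == "-" else 2 if b == "-" else 0
        if current and current != previous:
            gaps += 1  # a new gap starts in this column
        elif current == 0:
            if a == b:
                matches += 1
            else:
                mismatches += 1
        previous = current
    return matches * match + mismatches * mismatch - gaps * gap


reference = "SPLATEGGANDSPAM"
for candidate in ["SP-A---GANDSPAM", "SPA----GANDSPAM", "SP----AGANDSPAM", "-SPA---GANDSPAM"]:
    print(candidate, alignment_score(reference, candidate))
```

Under this scoring system the second and third alignments tie for the highest score, so the optimum need not be unique.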
Algorithms
Global alignment: Needleman-Wunsch; Sellers
Local alignment: Smith-Waterman
Note that the optimal alignment is not necessarily the correct biological alignment.
However, it is usually impossible to know the correct evolutionary alignment.
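A minimal sketch of the two dynamic programming recurrences, returning only the optimal score. It assumes a simple linear gap cost charged per gapped residue (`gap=2` is a hypothetical value), which differs from the per-gap cost used in the worked example, so the numbers are not directly comparable. The function names are illustrative.

```python
def needleman_wunsch_score(s, t, match=2, mismatch=-1, gap=2):
    """Optimal global alignment score (linear gap cost per residue)."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = -gap * i  # leading gap in t
    for j in range(1, cols):
        score[0][j] = -gap * j  # leading gap in s
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] - gap, score[i][j - 1] - gap)
    return score[-1][-1]


def smith_waterman_score(s, t, match=2, mismatch=-1, gap=2):
    """Optimal local alignment score: cells never drop below zero."""
    rows, cols = len(s) + 1, len(t) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            score[i][j] = max(0, diagonal, score[i - 1][j] - gap, score[i][j - 1] - gap)
            best = max(best, score[i][j])
    return best


print(needleman_wunsch_score("SPLATEGGANDSPAM", "SPAGANDSPAM"))
print(smith_waterman_score("SPLATEGGANDSPAM", "SPAGANDSPAM"))
```

The only differences between the two are the initialisation of the first row and column and the floor at zero, which lets a local alignment start and end anywhere.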
Structure alignment
                   10        20        30        40        50        60
           ....*....|....*....|....*....|....*....|....*....|....*....|
4HHB_A   1 ~VLSPADKTNVKAAWGKVgaHAGEYGAEALERMFLSFPTTKTYFPHFDls~~~~~~hGSA  53
2HHB_B   1 vHLTPEEKSAVTALWGKV~~NVDEVGGEALGRLLVVYPWTQRFFESFGdlstpdavmGNP  58

                   70        80        90       100       110       120
           ....*....|....*....|....*....|....*....|....*....|....*....|
4HHB_A  54 QVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHL 113
2HHB_B  59 KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF 118

                  130       140
           ....*....|....*....|....*...
4HHB_A 114 PAEFTPAVHASLDKFLASVSTVLTSKYR 141
2HHB_B 119 GKEFTPPVQAAYQKVVAGVANALAHKYH 146
Scoring systems
Matches and mismatches: substitution mutations
Gaps: insertions and deletions
DNA sequence alignment
768 TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG 813
    ||     ||   ||  |  | ||| |  |||| |||||    |||  |||
 87 TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG 135

814 AGTGTTTGAGCCTCTGTTTGTGTGTAATTGAGTGTGCATGTGTGGGAGTG 863
    |  | |              |  |||||| |   ||||  | || |   |
136 AAGGATC.............TCAGTAATTAATCATGCACCTATGTGGCGG 172

864 AAATTGTGGAATGTGTATGCTCATAGCACTGAGTGAAAATAAAAGATTGT 913
    ||| | ||| || || |||   |   |||||||||  ||   |||||| |
173 AAA.TATGGGATATGCATGTCGA...CACTGAGTG..AAGGCAAGATTAT 216
     A   T   G   C
A    5  -4  -4  -4
T   -4   5  -4  -4
G   -4  -4   5  -4
C   -4  -4  -4   5
DNA scoring matrix used in EMBOSS
Section of EMBOSS data file EDNAFULL
Protein Sequence Alignment
TPKRREAEDLQVGQVLGGPLQLLE...SLQKRGIVEQCCT
 ||:|: |:      |:|||::|:      |||||||||
YPKKRDMEQ......LSGPLDMLQQEYQKMKRGIVEQCCH
Key: | = identical, : = similar, blank = different
Protein Comparison: Scoring Matrix
          A    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S    T    V    W    Y
        Ala  Cys  Asp  Glu  Phe  Gly  His  Ile  Lys  Leu  Met  Asn  Pro  Gln  Arg  Ser  Thr  Val  Trp  Tyr
Ala A   0.8  0.0 -0.4 -0.2 -0.4  0.0 -0.4 -0.2 -0.2 -0.2 -0.2 -0.4 -0.2 -0.2 -0.2  0.2  0.0  0.0 -0.6 -0.4
Cys C        1.8 -0.6 -0.8 -0.4 -0.6 -0.6 -0.2 -0.6 -0.2 -0.2 -0.6 -0.6 -0.6 -0.6 -0.2 -0.2 -0.2 -0.4 -0.4
Asp D             1.2  0.4 -0.6 -0.2 -0.2 -0.6 -0.2 -0.8 -0.6  0.2 -0.2  0.0 -0.4  0.0 -0.2 -0.6 -0.8 -0.6
Glu E                  1.0 -0.6 -0.4  0.0 -0.6  0.2 -0.6 -0.4  0.0 -0.2  0.4  0.0  0.0 -0.2 -0.4 -0.6 -0.4
Phe F                       1.2 -0.6 -0.2  0.0 -0.6  0.0  0.0 -0.6 -0.8 -0.6 -0.6 -0.4 -0.4 -0.2  0.2  0.6
Gly G                            1.2 -0.4 -0.8 -0.4 -0.8 -0.6  0.0 -0.4 -0.4 -0.4  0.0 -0.4 -0.6 -0.4 -0.6
His H                                 1.6 -0.6 -0.2 -0.6 -0.4  0.2 -0.4  0.0  0.0 -0.2 -0.4 -0.6 -0.4  0.4
Ile I                                      0.8 -0.6  0.4  0.2 -0.6 -0.6 -0.6 -0.6 -0.4 -0.2  0.6 -0.6 -0.2
Lys K                                           1.0 -0.4 -0.2  0.0 -0.2  0.2  0.4  0.0 -0.2 -0.4 -0.6 -0.4
Leu L                                                0.8  0.4 -0.6 -0.6 -0.4 -0.4 -0.4 -0.2  0.2 -0.4 -0.2
Met M                                                     1.0 -0.4 -0.4  0.0 -0.2 -0.2 -0.2  0.2 -0.2 -0.2
Asn N                                                          1.2 -0.4  0.0  0.0  0.2  0.0 -0.6 -0.8 -0.4
Pro P                                                               1.4 -0.2 -0.4 -0.2 -0.2 -0.4 -0.8 -0.6
Gln Q                                                                    1.0  0.2  0.0 -0.2 -0.4 -0.4 -0.2
Arg R                                                                         1.0 -0.2 -0.2 -0.6 -0.6 -0.4
Ser S                                                                              0.8  0.2 -0.4 -0.6 -0.4
Thr T                                                                                   1.0  0.0 -0.4 -0.4
Val V                                                                                        0.8 -0.6 -0.2
Trp W                                                                                             2.2  0.4
Tyr Y                                                                                                  1.4
BLOSUM62 Matrix (upper triangle; values shown are the standard integer scores divided by 5)
First principles amino acid substitution matrices
Identity matrix
  Perfect match: positive score
  Any mismatch: negative score
Genetic score matrix
  Based on the average number of nucleotide changes needed to mutate one amino acid into another
  e.g. K (AAA, AAG) to N (AAC, AAU) has a higher score than K (AAA, AAG) to D (GAU, GAC)
Chemical properties matrices
  e.g. K (basic) to R (basic) has a higher score than K (basic) to F (aromatic) or K to E (acidic)
Identity matrix example
D   +1
E   -1  +1
Q   -1  -1  +1
H   -1  -1  -1  +1
V   -1  -1  -1  -1  +1
F   -1  -1  -1  -1  -1  +1
W   -1  -1  -1  -1  -1  -1  +1
     D   E   Q   H   V   F   W
Data-based matrices
Calculated from amino acid frequencies in known homologous sequences
PAM family of matrices
BLOSUM family of matrices
Perform better than first principle matrices (which are still useful for some specialised applications)
BLOSUM matrices
[Figure: BLOSUM 62]
BLOSUM matrices
Henikoff and Henikoff, 1992
Blocks Substitution Matrix
Based on the BLOCKS database
Currently the most widely used matrix family
Most commonly used matrices: BLOSUM62 and BLOSUM55
BLOCKS database
BLOCKS are ungapped multiple sequence alignments based on the SWISS-PROT database and the PROSITE protein family database
All the sequences from SWISS-PROT belonging to a PROSITE family are aligned together, to create local ungapped alignments characteristic of the protein family
BLOCK example
ID   Mn_catalase; BLOCK
AC   IPB007760A; distance from previous block=(3
DE   Manganese containing catalase
BL   HIL; width=14; seqs=49; 99.5%=727; strengt
CTJC_BACSU|Q45538 (  67) HLEMIATMVYKLTK  12
GS80_BACSU|P80878 (  69) HVEMIATMIARLLE  14
YDHU_BACSU|O05513 (   4) HGNLITDLLDNLLL  25
O69145            (  70) HMEIVAETINLLNG  64
Q9KDZ2            ( 136) SGNLIFDLLHNYFL  34
Q9KAU6            (  69) HVEMLATMIARLLD  16
Q9I1T0            (  68) HLEIIGSIVGMLNK  20
Q97JE8            (  68) HLEIVGSIVRQLSR  50
MCAT_CLOAB|Q97FE0 ( 124) TGDIVADLLSNIAS  73
Q8Z7E1            (  68) HLEIIGSLVGMLNK  17
Q8YY54            (  69) HIEMLATMIAHLLD  27
Q8YSJ5            (  68) HLEMVGKLIEAHTK  36
From BLOCKS to BLOSUM
1. Count the number of amino acid pairs observed in each column of each block and calculate the observed frequency of each pair
2. Calculate the expected frequency of each pair (based on the frequency of individual amino acids)
3. Calculate the log ratio (typically log2)
1. Count number of observed pairs and calculate frequencies
Block (6 sequences, 4 columns):
DADA
AAAE
AAEE
AADA
AAEE
AADE
There are $4 \times \binom{6}{2} = 60$ aligned pairs of amino acids in the block
Aligned pair (xy)   Proportion of times observed ($o_{xy}$)
A to A              26/60
A to D              8/60
A to E              10/60
D to D              3/60
D to E              6/60
E to E              7/60
General case for step 1
For each pair of amino acids x and y,
$n_{xy}$ = number of times x and y are in the same column of a block
$o_{xy}$ = observed proportion of aligned pair xy
$$o_{xy} = \frac{n_{xy}}{\sum_{u \le v} n_{uv}}$$
2. Calculate the expected frequency of each pair
Block (6 sequences, 4 columns):
DADA
AAAE
AAEE
AADA
AAEE
AADE
Amino acid (x)   Proportion in block ($p_x$)
A                14/24
D                4/24
E                6/24
Amino acid pair (xy)   Expected proportion ($e_{xy}$)
A to A                 (14/24)^2 = 196/576
A to D                 2 (14/24) (4/24) = 112/576
A to E                 2 (14/24) (6/24) = 168/576
D to D                 (4/24)^2 = 16/576
D to E                 2 (4/24) (6/24) = 48/576
E to E                 (6/24)^2 = 36/576
General case for step 2
Expected proportion of amino acid pair xy in a random block of the same amino acid composition:
$$e_{xy} = \begin{cases} 2 p_x p_y & \text{if } x \ne y \\ p_x p_y & \text{if } x = y \end{cases}$$
3. Calculate the log ratio
$$\text{Matrix entry} = 2 \log_2\left(\frac{o_{xy}}{e_{xy}}\right) \quad \text{(rounded to nearest integer)}$$
Aligned pair (xy)   $o_{xy}$   $e_{xy}$   $2\log_2(o_{xy}/e_{xy})$
A to A              26/60      196/576     0.70
A to D              8/60       112/576    -1.09
A to E              10/60      168/576    -1.61
D to D              3/60       16/576      1.70
D to E              6/60       48/576      0.53
E to E              7/60       36/576      1.80
Final matrix
     A   D   E
A    1  -1  -2
D   -1   2   1
E   -2   1   2
The $2\log_2$ transformation means that the matrix is in half-bits
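The three steps can be checked numerically on the example block. This is a sketch for this toy block only: it does no sequence clustering, and a pair that is never observed would need special handling because its log ratio is undefined. The helper name `blosum_entry` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

# The six aligned sequences of the example block (4 columns).
block = ["DADA", "AAAE", "AAEE", "AADA", "AAEE", "AADE"]

# Step 1: count every unordered pair of residues within each column.
pair_counts = Counter()
for column in zip(*block):
    for x, y in combinations(column, 2):
        pair_counts[tuple(sorted((x, y)))] += 1
total_pairs = sum(pair_counts.values())

# Step 2: residue proportions p_x across the whole block.
residue_counts = Counter("".join(block))
total_residues = sum(residue_counts.values())


def blosum_entry(x, y):
    """Unrounded matrix entry 2*log2(o_xy / e_xy), in half-bits."""
    observed = pair_counts[tuple(sorted((x, y)))] / total_pairs
    p_x = residue_counts[x] / total_residues
    p_y = residue_counts[y] / total_residues
    expected = p_x * p_y if x == y else 2 * p_x * p_y
    return 2 * log2(observed / expected)  # step 3


for x, y in [("A", "A"), ("A", "D"), ("A", "E"), ("D", "D"), ("D", "E"), ("E", "E")]:
    print(f"{x} to {y}: {blosum_entry(x, y):6.2f} -> {round(blosum_entry(x, y))}")
```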
BLOSUM family
Problem: counting every amino acid in the block can lead to an over-representation of amino acid changes found in closely related sequences
Solution: cluster sequences closer than a set % identity, and average their contribution so that the whole cluster counts as one sequence
This gives rise to a family of matrices, depending on the % identity threshold
Example block:
VSLHL
ELTRS
EWTRS
EISRS
ELCRT
[Figure annotations: brackets marking the sequences that are 80% identical and 60% identical]
                                                    n_EE   n_VE
No clustering (BLOSUM100)                             6      4
Clustering sequences with 80% identity (BLOSUM80)     3      3
Clustering sequences with 60% identity (BLOSUM60)     2      2
PAM matrices
[Figure: PAM120]
PAM matrices
PAM: Point (Percent) Accepted Mutation
Schwartz and Dayhoff, 1978
Also known as MDM78 (mutation data matrix) or Dayhoff matrix
Empirical matrix based on an evolutionary model
Based on a small number of families of closely related proteins (>85% identity), so that sequences can be aligned unambiguously by hand
Since the changes observed between these sequences did not affect the function of the protein, these are accepted mutations
1. Align the sequences by hand
2. Order the sequences using parsimony
hbb_ornan LSELHCDKLH VDPENFNRLG NVLIVVLARH FSKDFSPEVQ AAWQKLVSGV
hbb_tacac LSELHCDKLH VDPENFNRLG NVLVVVLARH FSKEFTPEAQ AAWQKLVSGV
hbe_ponpy LSELHCDKLH VDPENFKLLG NVMVIILATH FGKEFTPEVQ AAWQKLVSAV
hbb_speci LSELHCDKLH VDPENFKLLG NMIVIVMAHH LGKDFTPEAQ AAFQKVVAGV
hbb_speto LSELHCDKLH VDPENFKLLG NMIVIVMAHH LGKDFTPEAQ AAFQKVVAGV
hbb_equhe LSELHCDKLH VDPENFRLLG NVLVVVLARH FGKDFTPELQ ASYQKVVAGV
3. Count the number of times each amino acid changes to each other one
e.g. F changing to L
hbb_ornan LSELHCDKLH VDPENFNRLG NVLIVVLARH FSKDFSPEVQ AAWQKLVSGV
hbb_tacac LSELHCDKLH VDPENFNRLG NVLVVVLARH FSKEFTPEAQ AAWQKLVSGV
hbe_ponpy LSELHCDKLH VDPENFKLLG NVMVIILATH FGKEFTPEVQ AAWQKLVSAV
hbb_speci LSELHCDKLH VDPENFKLLG NMIVIVMAHH LGKDFTPEAQ AAFQKVVAGV
hbb_speto LSELHCDKLH VDPENFKLLG NMIVIVMAHH LGKDFTPEAQ AAFQKVVAGV
hbb_equhe LSELHCDKLH VDPENFRLLG NVLVVVLARH FGKDFTPELQ ASYQKVVAGV
[Figure: parsimony tree showing the residue (F or L) at each leaf and internal node]
1 F-L change ($N_{FL} = 1$)
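4. Calculate probability for each amino acid mutating to each other amino acid
For each pair of amino acids i and j, the frequency of change $f_{ij}$ is:
$$f_{ij} = \frac{N_{ij}}{\sum_k N_{ik}}$$
For $i \ne j$, the probability of change $p_{ij}$ is:
$$p_{ij} = c f_{ij} \quad \text{and} \quad p_{ii} = 1 - c \sum_{j \ne i} f_{ij}$$
where c is a positive scaling constant chosen so that each $p_{ii} > 0$.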
Probability matrix
The resulting probability matrix allows modelling the evolution of protein sequences as a Markov process: the probability of an amino acid mutating to another one depends only on the amino acid currently at that position, not on its earlier history
A   pAA
C   pAC  pCC
D   pAD  pCD  pDD
E   pAE  pCE  pDE  pEE
    A    C    D    E
PAM1
The constant c is chosen so that the expected number of amino acid changes after one round of applying the probabilities is 1 in 100 amino acids
The resulting probability matrix is the PAM1 probability matrix, giving the probability that an amino acid will mutate to another over an amount of evolutionary time such that 1% of amino acids mutate
Expected proportion of mutated amino acids (with $p_i$ the frequency of amino acid i):
$$\sum_i p_i \sum_{j \ne i} p_{ij} = c \sum_i p_i \sum_{j \ne i} f_{ij} = 0.01$$
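The scaling step can be sketched on a toy three-letter alphabet. Everything numeric here is hypothetical: the count matrix `N`, the residue frequencies `freq`, and the choice to store unchanged positions on the diagonal of `N` so that each row sum runs over all k.

```python
# Hypothetical counts N[i][j] of residue i observed aligned with residue j
# (toy alphabet A, D, E; the diagonal holds unchanged positions).
residues = ["A", "D", "E"]
N = [
    [900, 40, 60],
    [40, 450, 10],
    [60, 10, 430],
]
# Hypothetical residue frequencies p_i (they sum to 1).
freq = [0.5, 0.25, 0.25]
size = len(residues)

# f_ij = N_ij / sum_k N_ik
f = [[N[i][j] / sum(N[i]) for j in range(size)] for i in range(size)]

# Choose c so that sum_i p_i * sum_{j != i} c * f_ij equals 0.01.
unscaled = sum(freq[i] * sum(f[i][j] for j in range(size) if j != i) for i in range(size))
c = 0.01 / unscaled

# Off-diagonal entries are c * f_ij; each diagonal entry makes its row sum to 1.
pam1 = [[c * f[i][j] if i != j else 0.0 for j in range(size)] for i in range(size)]
for i in range(size):
    pam1[i][i] = 1.0 - sum(pam1[i][j] for j in range(size) if j != i)

expected_mutated = sum(freq[i] * (1.0 - pam1[i][i]) for i in range(size))
print(f"c = {c:.4f}, expected mutated fraction = {expected_mutated:.4f}")
```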
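5. PAMN
Because the probability matrix is Markov, it is possible to calculate probability matrices for longer evolutionary times by multiplying the matrix by itself n times
e.g. PAM2 probability matrix:
$$\begin{pmatrix} p_{AA} & p_{AC} & p_{AD} & \cdots \\ p_{CA} & p_{CC} & p_{CD} & \cdots \\ p_{DA} & p_{DC} & p_{DD} & \cdots \\ \cdots & \cdots & \cdots & \cdots \end{pmatrix} \begin{pmatrix} p_{AA} & p_{AC} & p_{AD} & \cdots \\ p_{CA} & p_{CC} & p_{CD} & \cdots \\ p_{DA} & p_{DC} & p_{DD} & \cdots \\ \cdots & \cdots & \cdots & \cdots \end{pmatrix}$$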
PAMN
e.g. a PAM250 matrix represents a 250% level of evolutionary change (250 accepted mutations per 100 residues; a position can change more than once)
e.g. PAM120, PAM80, PAM60 matrices could be used for aligning sequences which are approximately 40%, 50% and 60% similar, respectively
PAM250 has been shown preferable for distantly related proteins of 14-27% similarity
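A sketch of the extrapolation from PAM1 to PAMN by repeated matrix multiplication. The 3x3 matrix `toy_pam1` is hypothetical (a real PAM1 matrix is 20x20), and the helper names `mat_mul` and `pam_n` are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def pam_n(pam1, n):
    """Return the PAMn probability matrix, i.e. pam1 raised to the power n."""
    result = pam1
    for _ in range(n - 1):
        result = mat_mul(result, pam1)
    return result


# Hypothetical PAM1-like matrix: each row sums to 1 and about 1% of residues change.
toy_pam1 = [
    [0.990, 0.006, 0.004],
    [0.003, 0.990, 0.007],
    [0.005, 0.005, 0.990],
]

pam2 = pam_n(toy_pam1, 2)
pam250 = pam_n(toy_pam1, 250)
print("PAM2   diagonal:", [round(pam2[i][i], 6) for i in range(3)])
print("PAM250 diagonal:", [round(pam250[i][i], 3) for i in range(3)])
```

Each row of the result still sums to 1, and the diagonal shrinks as n grows, reflecting that fewer residues remain unchanged over longer evolutionary times.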
Detecting evolutionary relationships
[Figure: evolutionary tree on a time axis (300 million years, 200 million years, 100 million years, Today), with branches labelled PAM100 (four times) and PAM200 (twice)]
6. PAM log odds matrices
Rather than use probabilities, it is more convenient to use log odds matrices
If $p_{ij}$ is an entry in the PAMN probability matrix, the corresponding entry in the PAMN log odds matrix is:
$$C \log\left(\frac{p_{ij}}{q_i q_j}\right)$$
where C is a positive constant and $q_i$ and $q_j$ are the respective observed frequencies of amino acids i and j in the sequences
Interpreted as the ratio of the probability that the substitution represents an authentic evolutionary change to the probability that it occurred due to random events of no biological significance.
PAM matrices - summary
Family of substitution matrices corresponding to different levels of evolutionary time
Based on sound evolutionary principles
Distances for long periods of evolutionary history extrapolated from shorter times (assumption!)
Based on a relatively small data set (mainly globular proteins)
BLOSUM vs PAM
PAM                                      BLOSUM
Built from an evolutionary model based   Built directly from blocks of aligned
on closely related proteins              protein segments covering a wide range
                                         of evolutionary time

Extrapolation from closely related       No extrapolation
sequences

Built from a small number of complete    Built from a large number of sequence
sequences                                segments
BLOSUM vs PAM (cont.)
PAM                                      BLOSUM
PAMn matrices with low n are better      BLOSUMn matrices with low n are better
suited to closely related sequences      suited to highly divergent sequences

Uses phylogenetic tree to avoid          Uses clustering of related sequences
over-representing closely related        and direct counting of amino acid
sequences                                changes

Commonly used as log odds matrix         Commonly used as log odds matrix
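BLOSUM vs PAM
Counting changes
BLOSUM: direct counts; A-B count = 4
PAM: counts from an evolutionary model; A-B count = 2
[Figure: the same aligned residues (A and B) counted pairwise for BLOSUM and along an evolutionary tree for PAM]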
Gap penalties I
Rationale:
  Gaps arise through insertion/deletion events, which do not happen one residue at a time.
Gap creation penalty:
  Penalty for creating a new gap
  Typically relatively high, to prevent too many gaps in the alignment
Gap extension (length) penalty:
  Penalty for extending an existing gap
  Typically relatively small, so that a small difference in gap length will not affect the penalty for this gap, but not so small as to result in very long gaps.
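The two penalties combine into what is usually called an affine gap cost. The sketch below assumes the convention penalty = creation + extension x length; some programs instead charge the extension only from the second gapped residue onward, so treat the exact formula as an assumption. The function name is hypothetical.

```python
def affine_gap_penalty(length, creation, extension):
    """Penalty for one gap of `length` residues (assumed convention:
    creation + extension * length; a zero-length gap costs nothing)."""
    if length <= 0:
        return 0.0
    return creation + extension * length


# One gap of five residues versus five separate one-residue gaps,
# using a creation penalty of 5 and an extension penalty of 0.1.
one_long_gap = affine_gap_penalty(5, creation=5, extension=0.1)
five_short_gaps = 5 * affine_gap_penalty(1, creation=5, extension=0.1)
print(f"one gap of length 5: {one_long_gap:.1f}")
print(f"five gaps of length 1: {five_short_gaps:.1f}")
```

With a high creation penalty, a single long gap is far cheaper than many short ones, which matches the rationale that indel events span several residues at once.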
Gap penalties II
Alignment of human α and β hemoglobin chains
Gap penalty = 1, Gap extension penalty = 0.1
  1 V.LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLSH.....GSA
    | |.|.:|..|.| ||||  :.:| |:|||:|::: :| |. :|. | |||      |.:
  1 VHLTPEEKSAVTALWGKV..NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNP

 54 QVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHL
    .||:||||| :|:.:::||:|::...:..||:||..||:||| ||:||::.|:..|| |:
 59 KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF

114 PAEFTPAVHASLDKFLASVSTVLTSKYR 141
    . ||||:|:|..:|.:|:|...|. ||:
119 GKEFTPPVQAAYQKVVAGVANALAHKYH 146
Gap penalties III
Alignment of human α and β hemoglobin chains
Gap penalty = 5, Gap extension penalty = 0.1
  2 LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF......DLSHGSAQV
    |.|.:|..|.| ||||  :.:| |:|||:|::: :| |. :|. |      |   |.:.|
  3 LTPEEKSAVTALWGKV..NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV

 56 KGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPA
    |:||||| :|:.:::||:|::...:..||:||..||:||| ||:||::.|:..|| |:.
 61 KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK

116 EFTPAVHASLDKFLASVSTVLTSKYR 141
    ||||:|:|..:|.:|:|...|. ||:
121 EFTPPVQAAYQKVVAGVANALAHKYH 146
The twilight zone
[Figure: twilight zone plot; labelled regions: True positives, False negatives]
Rost, B. Protein Eng. 1999 12:85-94; doi:10.1093/protein/12.2.85
Measuring alignment quality
Alignment score
  Relative to random alignment?
Percentage identity
Percentage similarity
Evolutionary distance
  In its simplest form, 1 - % identity
  Several methods available to correct for multiple substitutions
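Percentage identity and the simplest distance can be computed directly from an aligned pair. This sketch assumes the denominator is the full alignment length, gap columns included; other conventions divide by the shorter sequence or by the number of ungapped columns and give different values. The function names are hypothetical.

```python
def percent_identity(row1, row2, gap="-"):
    """Percentage of alignment columns holding two identical residues.

    Assumed convention: the denominator is the alignment length,
    so gap columns count as non-identical.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    identical = sum(a == b and a != gap for a, b in zip(row1, row2))
    return 100.0 * identical / len(row1)


def simple_distance(row1, row2):
    """Simplest evolutionary distance: 1 minus the fractional identity
    (no correction for multiple substitutions at the same position)."""
    return 1.0 - percent_identity(row1, row2) / 100.0


identity = percent_identity("SPLATEGGANDSPAM", "SP-A---GANDSPAM")
distance = simple_distance("SPLATEGGANDSPAM", "SP-A---GANDSPAM")
print(f"identity = {identity:.1f}%, distance = {distance:.3f}")
```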
Something to think about
Why do we add the scores together?