Sequence motifs, information content, logos, and HMM’s
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS, TECHNICAL UNIVERSITY OF DENMARK, DTU
Sequence motifs, information content, logos, and HMM’s
Morten Nielsen,
CBS, BioCentrum, DTU
Outline
• Multiple alignments and sequence motifs
• Weight matrices and consensus sequence
  – Sequence weighting
  – Low (pseudo) counts
• Information content
  – Sequence logos
  – Mutual information
• Example from the real world
• HMM’s and profile HMM’s
  – TMHMM (trans-membrane protein)
  – Gene finding
• Links to HMM packages
Multiple alignment and sequence motifs
• Core
• Consensus sequence
• Weight matrices
• Problems
  – Sequence weights
  – Low counts

----------MLEFVVEADLPGIKA--------
----------MLEFVVEFALPGIKA--------
----------MLEFVVEFDLPGIAA--------
-------------YLQDSDPDSFQD--------
---GSDTITLPCRMKQFINMWQE----------
---RNQEERLLADLMQNYDPNLR----------
-------YDPNLRPAERDSDVVNVSLK------
----------NVSLKLTLTNLISLNEREEA---
----EREEALTTNVWIEMQWCDYR---------
----------WCDYRLRWDPRDYEGLWVLR---
--LWVLRVPSTMVWRPDIVLEN-----------
------------IVLENNVDGVFEVALYCNVL-
-------------YCNVLVSPDGCIYWLPPAIF
---------PPAIFRSACSISVTYFPFDW----
             *********

Consensus: FVVEFDLPG
Sequence weighting 1 - Clustering
----------MLEFVVEADLPGIKA--------
----------MLEFVVEFALPGIKA--------
----------MLEFVVEFDLPGIAA--------
-------------YLQDSDPDSFQD--------
---GSDTITLPCRMKQFINMWQE----------
---RNQEERLLADLMQNYDPNLR----------
-------YDPNLRPAERDSDVVNVSLK------
----------NVSLKLTLTNLISLNEREEA---
----EREEALTTNVWIEMQWCDYR---------
----------WCDYRLRWDPRDYEGLWVLR---
--LWVLRVPSTMVWRPDIVLEN-----------
------------IVLENNVDGVFEVALYCNVL-
-------------YCNVLVSPDGCIYWLPPAIF
---------PPAIFRSACSISVTYFPFDW----
             *********

Rows 1-3: homologous sequences, weight = 1/n (1/3 each)
Consensus sequence: YRQELDPLV
Previous: FVVEFDLPG
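The effect of down-weighting the three homologous sequences can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the slide: the function name `consensus` is made up here, and ties between equally weighted residues are broken by first occurrence, which is an assumption. Because several columns contain ties, the weighted result need not agree with the slide at every position.

```python
from collections import defaultdict

# The 14 nine-residue cores (the starred columns of the alignment).
CORES = [
    "FVVEADLPG", "FVVEFALPG", "FVVEFDLPG", "YLQDSDPDS", "MKQFINMWQ",
    "LMQNYDPNL", "PAERDSDVV", "LKLTLTNLI", "VWIEMQWCD", "YRLRWDPRD",
    "WRPDIVLEN", "VLENNVDGV", "YCNVLVSPD", "FRSACSISV",
]


def consensus(seqs, weights=None):
    """Return the residue with the largest summed weight in each column."""
    if weights is None:
        weights = [1.0] * len(seqs)
    result = []
    for col in range(len(seqs[0])):
        totals = defaultdict(float)
        for seq, w in zip(seqs, weights):
            totals[seq[col]] += w
        # max() keeps the first residue seen when totals tie (assumption).
        result.append(max(totals, key=totals.get))
    return "".join(result)


unweighted = consensus(CORES)
# Give each of the three homologous sequences a weight of 1/3.
clustered = consensus(CORES, [1 / 3] * 3 + [1.0] * 11)
print(unweighted, clustered)
```

Without weights the three near-identical sequences dominate the consensus; with clustering weights the first positions switch to residues supported by the other sequences.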
Sequence weighting 2 - (Henikoff & Henikoff)

Sequence   w
FVVEADLPG  0.37
FVVEFALPG  0.43
FVVEFDLPG  0.32
YLQDSDPDS  0.59
MKQFINMWQ  0.90
LMQNYDPNL  0.68
PAERDSDVV  0.75
LKLTLTNLI  0.85
VWIEMQWCD  0.84
YRLRWDPRD  0.51
WRPDIVLEN  0.71
VLENNVDGV  0.59
YCNVLVSPD  0.71
FRSACSISV  0.75

• $w'_{aa} = 1/(r s)$
• r: number of different amino acids in a column
• s: number of occurrences of the amino acid in that column
• Normalize so that $\sum w_{aa} = 1$ for each column
• Sequence weight is the sum of $w_{aa}$ over the columns

Example, first column:
F: r=7 (FYMLPVW), s=4, w' = 1/28, w = 0.055
Y: s=3, w' = 1/21, w = 0.073
M, P, W: s=1, w' = 1/7, w = 0.218
L, V: s=2, w' = 1/14, w = 0.109
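The weighting scheme above can be sketched as follows. The function name `henikoff_weights` is chosen here for illustration; the code adds up 1/(r s) over the columns of each sequence and applies no further scaling, which is an assumption about how the slide values were obtained.

```python
from collections import Counter

# The 14 nine-residue cores listed in the weight table.
CORES = [
    "FVVEADLPG", "FVVEFALPG", "FVVEFDLPG", "YLQDSDPDS", "MKQFINMWQ",
    "LMQNYDPNL", "PAERDSDVV", "LKLTLTNLI", "VWIEMQWCD", "YRLRWDPRD",
    "WRPDIVLEN", "VLENNVDGV", "YCNVLVSPD", "FRSACSISV",
]


def henikoff_weights(seqs):
    """Sum 1/(r*s) over all columns for every sequence."""
    weights = [0.0] * len(seqs)
    for col in range(len(seqs[0])):
        column = [seq[col] for seq in seqs]
        counts = Counter(column)
        r = len(counts)                  # different residues in the column
        for k, aa in enumerate(column):
            s = counts[aa]               # occurrences of this residue
            weights[k] += 1.0 / (r * s)
    return weights


weights = henikoff_weights(CORES)
for seq, w in zip(CORES, weights):
    print(seq, round(w, 2))
```

Each column contributes a total of 1 across all sequences, so the weights add up to the number of columns (9 here).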
Low count correction
--------MLEFVVEADLPGIKA--------
--------MLEFVVEFALPGIKA--------
--------MLEFVVEFDLPGIAA--------
-----------YLQDSDPDSFQD--------
-GSDTITLPCRMKQFINMWQE----------
-RNQEERLLADLMQNYDPNLR----------
-----YDPNLRPAERDSDVVNVSLK------
--------NVSLKLTLTNLISLNEREEA---
--EREEALTTNVWIEMQWCDYR---------
--------WCDYRLRWDPRDYEGLWVLR---
LWVLRVPSTMVWRPDIVLEN-----------
----------IVLENNVDGVFEVALYCNVL-
-----------YCNVLVSPDGCIYWLPPAIF
-------PPAIFRSACSISVTYFPFDW----
           *********
• Limited number of data
• Poor sampling of sequence space
• I is not found at position P1. Does this mean that I is forbidden?
• No! Use Blosum matrix to estimate pseudo frequency of I
P1 = first position of the core
Low count correction using Blosum matrices
# I L V
L 0.1154 0.3755 0.0962
V 0.1646 0.1303 0.2689
Blosum62 substitution frequencies
• Every time, for instance, L or V is observed, I is also likely to occur
• Estimate low (pseudo) count correction using this approach
• As more data are included the pseudo count correction becomes less important

$N_L = 2$, $N_V = 2$, $N_{eff} = 12$ => $f_I = (2 \cdot 0.1154 + 2 \cdot 0.1646)/12 = 0.05$
$p_I^* = (N_{eff} \cdot p_I + \beta \cdot f_I)/(N_{eff} + \beta) = (12 \cdot 0 + 10 \cdot 0.05)/(12 + 10) = 0.02$
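The arithmetic above can be reproduced with a short sketch. The function names and the parameter name `beta` (the weight on the pseudo frequency, 10 in the slide's numbers) are labels chosen here, not taken from the slide.

```python
# Blosum62 substitution frequencies quoted above: how often I is seen
# when L or V is observed.
SUBST_TO_I = {"L": 0.1154, "V": 0.1646}


def pseudo_frequency(counts, subst, n_eff):
    """Pseudo frequency of the target residue from observed neighbour counts."""
    return sum(n * subst[aa] for aa, n in counts.items()) / n_eff


def corrected_frequency(p_obs, f_pseudo, n_eff, beta):
    """Mix the observed frequency with the pseudo frequency."""
    return (n_eff * p_obs + beta * f_pseudo) / (n_eff + beta)


f_i = pseudo_frequency({"L": 2, "V": 2}, SUBST_TO_I, n_eff=12)
p_i = corrected_frequency(p_obs=0.0, f_pseudo=f_i, n_eff=12, beta=10)
print(round(f_i, 2), round(p_i, 2))
```

As `n_eff` grows relative to `beta`, the corrected frequency approaches the observed one, which is the sense in which the correction becomes less important with more data.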
Information content
• Information and entropy
  – Conserved amino acid regions contain a high degree of information (high order == low entropy)
  – Variable amino acid regions contain a low degree of information (low order == high entropy)
• Shannon information: $D = \log_2 N + \sum_i p_i \log_2 p_i$ (for proteins N=20, for DNA N=4)
• Conserved residue: $p_A = 1$, $p_{i \neq A} = 0$, $D = \log_2 N$ (= 4.3 for proteins)
• Variable region: $p_A = 0.05$, $p_C = 0.05$, ..., $D = 0$
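The two limiting cases can be checked numerically with a small sketch. The function name `information_content` is hypothetical; terms with p = 0 are skipped, following the usual convention that 0 log 0 = 0.

```python
import math


def information_content(probs, alphabet_size=20):
    """D = log2(N) + sum_i p_i log2(p_i), skipping zero-probability terms."""
    entropy_term = sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(alphabet_size) + entropy_term


conserved = [1.0] + [0.0] * 19   # one residue always present
variable = [0.05] * 20           # all 20 residues equally likely

d_conserved = information_content(conserved)
d_variable = information_content(variable)
print(round(d_conserved, 2), abs(round(d_variable, 2)))
```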
Sequence logo
• Height of a column equal to D
• Relative height of a letter is pA
• Highly useful tool to visualize sequence motifs
[Figure: sequence logo for MHC class II, built from 10 sequences, with a high information position marked]
http://www.cbs.dtu.dk/~gorodkin/appl/plogo.html
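The two rules for drawing a logo column can be sketched as below. The column frequencies are hypothetical values chosen for illustration, and `logo_heights` is a name made up here; this is not the code behind the logo server linked above.

```python
import math


def logo_heights(freqs, alphabet_size=20):
    """Letter heights for one logo column: p_a times the column height D."""
    d = math.log2(alphabet_size) + sum(
        p * math.log2(p) for p in freqs.values() if p > 0
    )
    return {aa: p * d for aa, p in freqs.items()}


# Hypothetical column: L in half of the sequences, M and V in a quarter each.
column = {"L": 0.5, "M": 0.25, "V": 0.25}
heights = logo_heights(column)
for aa, h in sorted(heights.items(), key=lambda item: -item[1]):
    print(aa, round(h, 2))
```

The letter heights add up to the column height D, so a conserved column is both tall and dominated by one letter.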
More on logos
• Information content: $D = \sum_i p_i \log_2 (p_i/q_i)$
• Shannon, $q_i = 1/N = 0.05$:
  $D = \sum_i p_i \log_2 p_i - \sum_i p_i \log_2 (1/N) = \log_2 N + \sum_i p_i \log_2 p_i$
• Kullback-Leibler, $q_i$ = background frequency
  – V/L/A more frequent than for instance C/H/W

Frequency matrix
  A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
  2  1  1  1  1  1  1  1  1  4 16  1  6 15  7  1  2  7 18 13
  8 19  1  1  7  2  2  2  1  3 15 13  6  2  1  2  2  7  1  8
  3  2  7  2  1 17 13  2  1  8 14  3  1  1  7  7  2  0  1  8
  8 13 13 14  1  2 13  2  1  2  3  3  1  7  1  3  7  0  1  7
  4  1  7  7  7  1  2  2  1 13 15  2  6  6  1  7  2  7  7  4
  5  2  8 23  1  6  3  2  1  3  3  2  1  1  1 13  8  0  1 18
  2  1  7 13  1  1  2  2  1  8 14  2  6  1 20  7  2  7  1  3
  3  7  7  8  7  1  7  8  1  2  8  2  1  1 13  7  2  7  1  7
  3  2  7 19  1  6  2  8  1  9  9  2  1  1  1  7  2  0  1 18
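A short sketch shows that the general form with a uniform background gives the same number as the Shannon form. The row of percentages is the first row of the frequency matrix as read here (an assumption about the garbled source), and `kl_information` is a hypothetical name.

```python
import math


def kl_information(p, q):
    """D = sum_i p_i log2(p_i / q_i), skipping zero-probability terms."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# First row of the frequency matrix in percent, column order
# A R N D C Q E G H I L K M F P S T W Y V.
row = [2, 1, 1, 1, 1, 1, 1, 1, 1, 4, 16, 1, 6, 15, 7, 1, 2, 7, 18, 13]
p = [x / sum(row) for x in row]

uniform = [1 / 20] * 20
d_uniform = kl_information(p, uniform)
d_shannon = math.log2(20) + sum(pi * math.log2(pi) for pi in p)
print(round(d_uniform, 3), round(d_shannon, 3))
```

Replacing `uniform` with measured background frequencies gives the Kullback-Leibler variant, where a frequent residue such as L contributes less than an equally frequent rare residue such as W.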
Mutual information
$I(i,j) = \sum_{aa_i} \sum_{aa_j} P(aa_i, aa_j) \log [ P(aa_i, aa_j) / (P(aa_i) P(aa_j)) ]$

P(G1) = 2/9 = 0.22, ..., P(V6) = 4/9 = 0.44, ...
P(G1,V6) = 2/9 = 0.22, P(G1)*P(V6) = 8/81 = 0.10
log(0.22/0.10) > 0

Peptides (P1 and P6 are the first and sixth residue of each peptide):
ALWGFFPVA
ILKEPVHGV
ILGFVFTLT
LLFGYPVYV
GLSPTVWLS
YMNGTMSQV
GILGFVFTL
WLSLLVPFV
FLPSDFFPS
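The counts quoted above, and the full double sum, can be reproduced with the following sketch. The function name is hypothetical and the logarithm is taken to base 2, which is an assumption since the slide only writes "log".

```python
import math
from collections import Counter

PEPTIDES = [
    "ALWGFFPVA", "ILKEPVHGV", "ILGFVFTLT", "LLFGYPVYV", "GLSPTVWLS",
    "YMNGTMSQV", "GILGFVFTL", "WLSLLVPFV", "FLPSDFFPS",
]


def mutual_information(peptides, i, j):
    """Sum of P(a,b) * log2(P(a,b) / (P(a) P(b))) over observed pairs."""
    n = len(peptides)
    count_i = Counter(p[i] for p in peptides)
    count_j = Counter(p[j] for p in peptides)
    count_ij = Counter((p[i], p[j]) for p in peptides)
    total = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        total += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return total


n_g1 = sum(p[0] == "G" for p in PEPTIDES)
n_v6 = sum(p[5] == "V" for p in PEPTIDES)
n_g1_v6 = sum(p[0] == "G" and p[5] == "V" for p in PEPTIDES)
mi_1_6 = mutual_information(PEPTIDES, 0, 5)
print(n_g1, n_v6, n_g1_v6, round(mi_1_6, 2))
```

With only nine peptides the estimate is dominated by sampling noise, so a positive value here should be read as an illustration of the formula rather than as evidence of coupling.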
Mutual information
[Figure: mutual information for 313 binding peptides and for 313 random peptides]
Mutual information at anchor position is low
• Mutual information between anchor positions 2 and 9 and other residues is low
  – At pos 2 we know that L, M, T, V and I are the most frequent amino acids
  – At pos 9, V, L, I and A are the most frequent
  – 313 peptides (Rammensee + Buus)
• P(L2) = 0.51, P(V9) = 0.48, P(L2,V9) = 0.23
• P(L2,V9)/(P(L2)*P(V9)) = 0.23/0.24 = 1.0
• Knowing that we have L at position 2 does not tell us which one of V, L or I is placed at position 9 => NO mutual information
Weight matrices
• Estimate amino acid frequencies from the alignment, including sequence weighting and pseudo counts
• A weight matrix is then given as $W_{ij} = \log(p_{ij}/q_j)$
• Here i is a position in the motif and j an amino acid; $q_j$ is the background frequency of amino acid j
• W is an L x 20 matrix, where L is the motif length
• Score a sequence against the weight matrix by looking up and adding L values from the matrix
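Building and using such a matrix can be sketched as follows. Several simplifications are assumptions made for brevity: a flat pseudo count stands in for the Blosum-based correction, no sequence weighting is applied, the background is uniform (q = 1/20), and the logarithm is base 2. The peptide set is the 14 cores from the alignment slides.

```python
import math

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
CORES = [
    "FVVEADLPG", "FVVEFALPG", "FVVEFDLPG", "YLQDSDPDS", "MKQFINMWQ",
    "LMQNYDPNL", "PAERDSDVV", "LKLTLTNLI", "VWIEMQWCD", "YRLRWDPRD",
    "WRPDIVLEN", "VLENNVDGV", "YCNVLVSPD", "FRSACSISV",
]


def weight_matrix(seqs, pseudo=1.0):
    """Return one {amino acid: log2(p/q)} row per motif position."""
    n = len(seqs)
    q = 1.0 / len(AMINO_ACIDS)          # uniform background (assumption)
    matrix = []
    for col in range(len(seqs[0])):
        column = [seq[col] for seq in seqs]
        row = {}
        for aa in AMINO_ACIDS:
            # Flat pseudo count keeps p above zero for unseen residues.
            p = (column.count(aa) + pseudo * q) / (n + pseudo)
            row[aa] = math.log2(p / q)
        matrix.append(row)
    return matrix


def score(matrix, peptide):
    """Look up and add one value per motif position."""
    return sum(row[aa] for row, aa in zip(matrix, peptide))


W = weight_matrix(CORES)
print(round(score(W, "FVVEFDLPG"), 2), round(score(W, "YLQDSDPDS"), 2))
```

Because each position contributes independently, the highest-scoring peptide is simply the one that takes the most frequent residue in every column.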
Example from real life
• 10 peptides from MHCpep database
• Bind to the MHC complex
• Relevant for immune system recognition
• Estimate sequence motif and weight matrix
• Evaluate on 528 peptides
ALAKAAAAM
ALAKAAAAN
ALAKAAAAR
ALAKAAAAT
ALAKAAAAV
GMNERPILT
GILGFVFTM
TLNAWVKVV
KLNEPVLLL
AVVPFIVSV
Example (cont.)
• Raw sequence counting
  – No sequence weighting
  – No pseudo count
  – Prediction accuracy 0.45
• Sequence weighting
  – No pseudo count
  – Prediction accuracy 0.5
Example (cont.)
• Sequence weighting and pseudo count
  – Prediction accuracy 0.60
• Motif found on all data (485)
  – Prediction accuracy 0.79
Hidden Markov Models
• Weight matrices do not deal with insertions and deletions
• In alignments, this is done in an ad hoc manner by optimizing two gap penalties, one for opening a gap and one for extending it
• An HMM is a natural framework in which insertions and deletions are dealt with explicitly
HMM (a simple example)
ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC
• Example from A. Krogh
• Core region defines the number of states in the HMM (red)
• Insertion and deletion statistics are derived from the non-core part of the alignment (blue)
Core of alignment
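HMM construction
[Figure: state diagram of the HMM built from the alignment below. Each state lists emission probabilities for A, C, G and T (such as 0.8 and 0.2), and the arrows carry transition probabilities (1.0, 0.6 and 0.4).]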
ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC
• 5 matches: A, 2xC, T, G
• 5 transitions in gap region
  – C out, G out
  – A-C, C-T, T out
  – Out transition 3/5
  – Stay transition 2/5

ACA---ATG: 0.8 x 1 x 0.8 x 1 x 0.8 x 0.4 x 1 x 0.8 x 1 x 0.2 = 3.3 x 10^-2
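The product above can be recomputed from the alignment with a small sketch. The emission tables are plain column frequencies from the five aligned sequences (no pseudo counts), and the transition values follow the 3/5 and 2/5 counts listed above. The fixed layout of three core columns, three gap columns and three core columns, and all names in the code, are assumptions made for this example.

```python
# Emission probabilities of the six core (match) states, from column counts.
MATCH = [
    {"A": 0.8, "T": 0.2},
    {"C": 0.8, "G": 0.2},
    {"A": 0.8, "C": 0.2},
    {"A": 1.0},
    {"T": 0.8, "G": 0.2},
    {"C": 0.8, "G": 0.2},
]
# Insert state: the gap region holds A, 2xC, T, G.
INSERT = {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2}
ENTER, SKIP = 0.6, 0.4    # core state 3 -> insert state / -> core state 4
STAY, LEAVE = 0.4, 0.6    # insert -> insert / insert -> core state 4


def path_probability(aligned):
    """Probability of a 9-column aligned row such as 'ACAC--AGC'."""
    head = aligned[:3]
    gap = aligned[3:6].replace("-", "")
    tail = aligned[6:]
    prob = 1.0
    for state, base in zip(MATCH[:3], head):
        prob *= state.get(base, 0.0)
    if gap:
        prob *= ENTER
        for k, base in enumerate(gap):
            prob *= INSERT[base]
            if k < len(gap) - 1:
                prob *= STAY
        prob *= LEAVE
    else:
        prob *= SKIP
    for state, base in zip(MATCH[3:], tail):
        prob *= state.get(base, 0.0)
    return prob


for row in ["ACA---ATG", "TCAACTATC", "ACAC--AGC", "TGCT--AGG"]:
    print(row, path_probability(row))
```

Transitions with probability 1 are left out of the product since they do not change it.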
Align sequence to HMM
ACA---ATG  0.8x1x0.8x1x0.8x0.4x1x0.8x1x0.2 = 3.3 x 10^-2
TCAACTATC  0.2x1x0.8x1x0.8x0.6x0.2x0.4x0.4x0.4x0.2x0.6x1x1x0.8x1x0.8 = 0.0075 x 10^-2
ACAC--AGC  = 1.2 x 10^-2
AGA---ATC  = 3.3 x 10^-2
ACCG--ATC  = 0.59 x 10^-2
Consensus:
ACAC--ATC  = 4.7 x 10^-2
ACA---ATC  = 13.1 x 10^-2
Exceptional:
TGCT--AGG  = 0.0023 x 10^-2
Align sequence to HMM - Null model
• Score depends strongly on length
• Null model is a random model. For length L the score is $0.25^L$
• Log-odds score for sequence S: $\log(P(S)/0.25^L)$

ACA---ATG = 4.9
TCAACTATC = 3.0
ACAC--AGC = 5.3
AGA---ATC = 4.9
ACCG--ATC = 4.6
Consensus:
ACAC--ATC = 6.7
ACA---ATC = 6.3
Exceptional:
TGCT--AGG = -0.97
Note!
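The log-odds scores can be recomputed from the rounded probabilities with the sketch below. The natural logarithm is used because it reproduces the listed values; that choice, and the function name, are inferred here rather than stated on the slide. Inputs are the rounded probabilities, so the last digit can differ from the slide.

```python
import math


def log_odds(prob, length, null_per_base=0.25):
    """ln( P(S) / 0.25^L ) for a sequence of L emitted bases."""
    return math.log(prob / null_per_base ** length)


# (sequence, probability under the HMM, number of bases without gaps)
examples = [
    ("ACA---ATG", 3.3e-2, 6),
    ("TCAACTATC", 0.0075e-2, 9),
    ("ACA---ATC", 13.1e-2, 6),
]
for name, prob, length in examples:
    print(name, round(log_odds(prob, length), 1))
```

Dividing by the null model removes most of the length penalty: the nine-base sequence, whose raw probability is several hundred times smaller than that of the six-base ones, still ends up with a clearly positive score.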
HMM’s and weight matrices
• In the case of un-gapped alignments HMM’s become simple weight matrices
• It still might be useful to use an HMM tool package to estimate a weight matrix
  – Sequence weighting
  – Pseudo counts
Profile HMM’s
• Alignments based on conventional scoring matrices (BLOSUM62) score all positions in a sequence in an equal manner
• Some positions are highly conserved, some are highly flexible (more than what is described in the BLOSUM matrix)
• Profile HMM’s are ideally suited to describe such position-specific variations
Example: Sequence profiles
• Alignment of 1PLC._ to 1GYC.A
• Blast e-value > 1000
• Profile alignment
  – Align 1PLC._ against Swiss-prot
  – Make position specific weight matrix from alignment
  – Use this matrix to align 1PLC._ against 1GYC.A
• E-value > 10^-22. Rmsd=3.3
Example continued
 Score = 97.1 bits (241), Expect = 9e-22
 Identities = 13/107 (12%), Positives = 27/107 (25%), Gaps = 17/107 (15%)

Query: 3  VLLGADDGSLAFVPSEFSISPGEKI------VFKNNAGFPHNIVFDEDSIPSGVDASKIS 56
          V+ G F + G++ N+ + +G + +
Sbjct: 26 VVNG------VFPSPLITGKKGDRFQLNVVDTLTNHTMLKSTSIHWHGFFQAGTNWADGP 79

Query: 57 MSEEDLLNAKGETFEVAL---SNKGEYSFYCSP--HQGAGMVGKVTV 98
          A G +F G + ++ G+ G V
Sbjct: 80 AFVNQCPIASGHSFLYDFHVPDQAGTFWYHSHLSTQYCDGLRGPFVV 126
Rmsd=3.3
Profile HMM’s
EM55_HUMAN WWQGRVEGSSKESAGLIPSPELQEWRVASMAQSAP--SEAPSCSPFGKKKK-YKDKYLAK
CSKP_HUMAN WWQGKLENSKNGTAGLIPSPELQEWRVACIAMEKTKQEQQASCTWFGKKKKQYKDKYLAK
KAPB_MOUSE -----PENLLIDHQGYIQVTDFGFAKRVKG------------------------------
NRC2_NEUCR -----PENILLHQSGHIMLSDFDLSKQSDPGGKPTMIIGKNGTSTSSLPTIDTKSCIANF

EM55_HUMAN HSSIFDQLDVVSYEEVVRLPAFKRKTLVLIGASGVGRSHIKNALLSQNPEKFVYPVPYTT
CSKP_HUMAN HNAVFDQLDLVTYEEVVKLPAFKRKTLVLLGAHGVGRRHIKNTLITKHPDRFAYPIPHTT
KAPB_MOUSE RTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSG
NRC2_NEUCR RTNSFVGTEEYIAPEVIKGSGHTSAVDWWTLGILIYEMLYGTTPFKGKNRNATFANILRE

EM55_HUMAN RPPRKSEEDGKEYHFISTEEMTRNISANEFLEFGSYQGNMFGTKFETVHQIHKQNKIAIL
CSKP_HUMAN RPPKKDEENGKNYYFVSHDQMMQDISNNEYLEYGSHEDAMYGTKLETIRKIHEQGLIAIL
KAPB_MOUSE KVRFPSHF-----SSDLKDLLRNLLQVDLTKRFGNLKNGVSDIKTHKWFATTDWIAIYQR
NRC2_NEUCR DIPFPDHAGAPQISNLCKSLIRKLLIKDENRRLG-ARAGASDIKTHPFFRTTQWALI--R

EM55_HUMAN NNGVDETLKKLQEAFDQACSSPQWVPVSWVY
CSKP_HUMAN NNEIDETIRHLEEAVELVCTAPQWVPVSWVY
KAPB_MOUSE EKCGKEFCEF---------------------
NRC2_NEUCR ENAVDPFEEFNSVTLHHDGDEEYHSDAYEKR

Regions marked in the figure: Insertion, Deletion, Conserved
Profile HMM’s
All M/D pairs must be visited once
TMHMM (trans-membrane HMM)
(Sonnhammer, von Heijne, and Krogh)
Model TM length distribution. Power of HMM. Difficult in alignment.
Combination of HMM’s - Gene finding
x cccxxxxxxxxATGccc cccTAAxxxxxxxx
Inter-genic region
Region around start codon
Coding region
Region around stop codon
Start codon
Stop codon
HMM packages
• HMMER (http://hmmer.wustl.edu/)
  – S.R. Eddy, WashU St. Louis. Freely available.
• SAM (http://www.cse.ucsc.edu/research/compbio/sam.html)
  – R. Hughey, K. Karplus, A. Krogh, D. Haussler and others, UC Santa Cruz. Freely available to academia, nominal license fee for commercial users.
• META-MEME (http://metameme.sdsc.edu/)
  – William Noble Grundy, UC San Diego. Freely available. Combines features of PSSM search and profile HMM search.
• NET-ID, HMMpro (http://www.netid.com/html/hmmpro.html)
  – Freely available to academia, nominal license fee for commercial users.
  – Allows HMM architecture construction.
Simple Hmmer command
hmmbuild --gapmax 0.0 --fast A2.hmmer A2.fsa
hmmbuild - build a hidden Markov model from an alignment
HMMER 2.2g (August 2001)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Alignment file: A2.fsa
File format: a2m
Search algorithm configuration: Multiple domain (hmmls)
Model construction strategy: Fast/ad hoc (gapmax 0.0)
Null model used: (default)
Sequence weighting method: G/S/C tree weights
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Alignment: #1
Number of sequences: 232
Number of columns: 9
Determining effective sequence number ... done. [192]
Weighting sequences heuristically ... done.
Constructing model architecture ... done.
Converting counts to probabilities ... done.
Setting model name, etc. ... done. [A2.fasta]
Constructed a profile HMM (length 9)
Average score: -6.42 bits
Minimum score: -15.47 bits
Maximum score: -0.84 bits
Std. deviation: 2.72 bits

>HLA-A.0201 16 Example_for_Ligand
SLLPAIVEL
>HLA-A.0201 16 Example_for_Ligand
YLLPAIVHI
>HLA-A.0201 16 Example_for_Ligand
TLWVDPYEV
>HLA-A.0201 16 Example_for_Ligand
SXPSGGXGV
>HLA-A.0201 16 Example_for_Ligand
GLVPFLVSV