Protein threading using context specific alignment potential ismb-2013
![Page 1: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/1.jpg)
Protein Threading Using Context-Specific Alignment Potential
Sheng Wang
http://raptorx.uchicago.edu
Toyota Technological Institute at Chicago
Joint work with Jianzhu Ma, Feng Zhao and Jinbo Xu
ISMB 2013 Jul 22, ICC Berlin, Germany
![Page 2: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/2.jpg)
Outline
• Where we are @ template-based modeling
• What’s our work
• What’s the problem
• What’s our solution
• Welcome to our server
![Page 3: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/3.jpg)
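Template-based Modeling (or, Threading)
• Observation
– ~50,000 non-redundant structures in PDB
– ~1,200 unique structure folds (SCOP)
• Methodology
– Use known structures to predict a new one
[Figure: the query sequence is aligned against each template sequence in the database]
Query sequence:    DDVYILDQAEEG
Template sequence: DE-FIVD-PDEH
Query sequence:    DDVYILDQAEEG
Template sequence: SPCKR---ADEG
Query sequence:    DDVYILDQAEEG
Template sequence: E--IFVDQADDS
Query sequence:    DDVYILDQAEEG
Template sequence: NMCVFGQWERTY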
![Page 4: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/4.jpg)
Template-based Modeling Procedures
• Easy: similar sequences → similar structures
– Sequence-based methods, e.g., BLAST, FASTA
– Works only for close homologs (>70% sequence identity)
• Medium: similar profiles → similar structures
– A protein profile is a matrix that represents a multiple sequence alignment of similar proteins
– Profile-based methods, e.g., PSI-BLAST, HMMER, HHpred
– Works for relatively remote homologs (>40% sequence identity)
• Challenge: dissimilar profiles → similar structures
– Add structural or context-specific information to sequence/profile-based methods
– Threading methods, e.g., MUSTER, RAPTOR, CS-BLAST
– Works for distant homologs (<40% sequence identity)
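The identity thresholds above depend on how sequence identity is measured on an alignment. The following is a minimal Python sketch, assuming identity is the number of identical columns divided by the number of columns without a gap (one of several conventions in use). The function name `sequence_identity` is illustrative and not part of any tool named on the slides; the example pair is the first query/template alignment from the threading slide.

```python
def sequence_identity(query_row, template_row, gap="-"):
    """Fraction of identical residues among gap-free alignment columns.

    Assumption: identity = identical columns / columns without a gap.
    Other conventions (e.g. dividing by the shorter sequence length) exist.
    """
    if len(query_row) != len(template_row):
        raise ValueError("aligned rows must have the same length")
    identical = 0
    compared = 0
    for q, t in zip(query_row, template_row):
        if q == gap or t == gap:
            continue  # columns with a gap are not counted
        compared += 1
        if q == t:
            identical += 1
    return identical / compared if compared else 0.0


identity = sequence_identity("DDVYILDQAEEG", "DE-FIVD-PDEH")
print(f"{identity:.0%}")
```

Under this convention the example pair shares 4 identical residues over 10 gap-free columns.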
![Page 5: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/5.jpg)
Our Work
• CNFpred: Transforms the template-sequence alignment problem into a machine learning problem that calculates the alignment’s probability.
• DeepAlign: Prepares high-quality structural alignments as training data.
• CNF model: A combined machine learning model that incorporates a Conditional Random Field (CRF) and a Neural Network (NN).
![Page 6: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/6.jpg)
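Protein Alignment Model
[Figure: alignment matrix with the sequence S A L R Q on one axis and the template L P L S E on the other; the alignment path passes through the states listed below]
Template: L P L S  - E
Sequence: S A - L  R Q
States:   M M Is M It M
• Match state (M)
• Insertion at sequence (Is)
• Insertion at template (It)
The structural alignments generated by DeepAlign are used as training data.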
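To make the state labelling concrete, here is a small Python sketch that turns a pairwise alignment into the M / Is / It sequence. The mapping of gaps to states (a gap in the sequence row gives Is, a gap in the template row gives It) is an assumption read off the example alignment on this slide, and the function name `alignment_states` is illustrative.

```python
def alignment_states(template_row, sequence_row, gap="-"):
    """Label each alignment column as M, Is, or It.

    Assumption (inferred from the slide's example): a column with a gap in
    the sequence row is labelled Is, and a column with a gap in the template
    row is labelled It.
    """
    if len(template_row) != len(sequence_row):
        raise ValueError("aligned rows must have the same length")
    states = []
    for t, s in zip(template_row, sequence_row):
        if t == gap and s == gap:
            raise ValueError("a column cannot be a gap in both rows")
        if s == gap:
            states.append("Is")
        elif t == gap:
            states.append("It")
        else:
            states.append("M")
    return states


print(alignment_states("LPLS-E", "SA-LRQ"))
```

Training a model on DeepAlign output would start from label sequences of this kind, one per structural alignment.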
![Page 7: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/7.jpg)
DeepAlign for Structure Alignment
• evolutionary information
• local sub-structure similarity
• angular similarity for hydrogen bonding

$$\mathrm{Score}(i,j) = \big(\max(0, \mathrm{BLOSUM}(i,j)) + \mathrm{CLESUM}(i,j)\big) \times v(i,j) \times d(i,j)$$

The slide annotates the parts of this score as local similarity and global similarity.
• BLOSUM is the local amino acid substitution matrix
• CLESUM is the local sub-structure substitution matrix
• v(i,j) measures the angular similarity for hydrogen bonding
• d(i,j) measures the spatial proximity of two aligned residues
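The scoring formula can be written as a few lines of Python. This is only a sketch of the arithmetic: the four inputs are passed in as plain numbers, and how DeepAlign actually looks up BLOSUM and CLESUM entries or computes v(i,j) and d(i,j) is not shown on the slide, so the function name and signature are hypothetical.

```python
def deepalign_score(blosum_ij, clesum_ij, v_ij, d_ij):
    """Score(i,j) = (max(0, BLOSUM(i,j)) + CLESUM(i,j)) * v(i,j) * d(i,j).

    All four arguments are assumed to be precomputed numbers for one
    residue pair (i, j); this function only combines them.
    """
    return (max(0, blosum_ij) + clesum_ij) * v_ij * d_ij


# A negative BLOSUM entry is clipped to zero before CLESUM is added.
print(deepalign_score(blosum_ij=-2, clesum_ij=3, v_ij=0.5, d_ij=0.8))
```

Clipping the BLOSUM term at zero means an unfavourable amino acid substitution does not penalise a pair whose local sub-structures agree.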
![Page 8: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/8.jpg)
CNF-based Alignment Model
Given an alignment $A = \{a_1, a_2, \ldots, a_L\}$ with $a_i \in \{M, I_t, I_s\}$ between sequence S and template T, define a conditional probability

$$P(A \mid S, T) = \frac{\exp\left(\sum_i E(a_i, a_{i-1}, S, T)\right)}{Z(S, T)}$$

where
• E: a context-specific neural network estimating the log-likelihood of a state transition
• Z(S,T): normalization factor
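The conditional probability can be illustrated with a toy forward recursion that computes Z(S,T) over all alignments. This is a sketch under stated assumptions, not the CNFpred implementation: `toy_E` is a hypothetical stand-in for the neural network E, the `START` pseudo-state is added for the first column, and the convention that Is consumes a template residue while It consumes a sequence residue is read off the example alignment on the earlier slide.

```python
import math

STATES = ("M", "Is", "It")
# (template residues consumed, sequence residues consumed) per state.
# Assumption: inferred from the example alignment on the model slide.
STEP = {"M": (1, 1), "Is": (1, 0), "It": (0, 1)}


def toy_E(state, prev_state, i, j):
    """Hypothetical stand-in for the neural network E(a_i, a_{i-1}, S, T).

    i, j are the template/sequence positions reached after taking `state`;
    a real model would use them to look up context-specific features.
    """
    score = 1.0 if state == "M" else -0.5
    if state == prev_state:
        score += 0.2
    return score


def logsumexp(values):
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    top = max(finite)
    return top + math.log(sum(math.exp(v - top) for v in finite))


def log_partition(n_t, n_s, E=toy_E):
    """log Z(S,T) by a forward recursion over all state paths."""
    F = {(0, 0, "START"): 0.0}
    for i in range(n_t + 1):
        for j in range(n_s + 1):
            for s in STATES:
                pi, pj = i - STEP[s][0], j - STEP[s][1]
                if pi < 0 or pj < 0:
                    continue  # state s cannot end at cell (i, j)
                prevs = ("START",) if (pi, pj) == (0, 0) else STATES
                F[(i, j, s)] = logsumexp(
                    [F.get((pi, pj, p), -math.inf) + E(s, p, i, j) for p in prevs]
                )
    return logsumexp([F.get((n_t, n_s, s), -math.inf) for s in STATES])


def alignment_probability(alignment, n_t, n_s, E=toy_E):
    """P(A|S,T) = exp(sum_i E(a_i, a_{i-1}, S, T)) / Z(S,T)."""
    i = j = 0
    prev, score = "START", 0.0
    for s in alignment:
        i, j = i + STEP[s][0], j + STEP[s][1]
        score += E(s, prev, i, j)
        prev = s
    if (i, j) != (n_t, n_s):
        raise ValueError("alignment does not cover both proteins")
    return math.exp(score - log_partition(n_t, n_s, E))


# The example alignment from the model slide: 5 template and 5 sequence residues.
print(alignment_probability(("M", "M", "Is", "M", "It", "M"), 5, 5))
```

Because Z(S,T) sums over every possible state path, the probabilities of all alignments of a given protein pair add up to one.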
![Page 9: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/9.jpg)
Comprehensive Features
Sequence S: MTYKLILN--GKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Template T: MTYKLILNSTVRTKSDTVTDAVP---ADKICSFAQQLPWEREWSF--
• How similar the two aligned residues are: E_AA
• How similar the query’s sequence and profile are to the template’s profile: E_sp, E_pp
• How similar the template’s secondary structure is to the sequence’s predicted secondary structure (3-class and 8-class): E_ss3, E_ss8
• How similar the query’s solvent accessibility is to the template’s solvent accessibility: E_sa
• For disordered regions, no structure information is used: E_diso
The total scoring function is a non-linear combination of these features:
E(a_i, a_{i-1}, E_AA, E_sp, E_pp, E_diso, E_ss3, E_ss8, E_sa)
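One way to picture a non-linear combination of these features is a small feed-forward network with one set of weights per state transition. The sketch below is purely illustrative: the hidden-layer size, the tanh activation, and the randomly drawn weights are assumptions, not the trained CNFpred model, and the names `make_toy_network` and `transition_score` are hypothetical.

```python
import math
import random

FEATURES = ("E_AA", "E_sp", "E_pp", "E_diso", "E_ss3", "E_ss8", "E_sa")
STATES = ("M", "Is", "It")


def make_toy_network(n_hidden=4, seed=0):
    """Random (untrained) weights, one small network per state transition."""
    rng = random.Random(seed)
    net = {}
    for state in STATES:
        for prev in STATES:
            hidden = [[rng.uniform(-1, 1) for _ in FEATURES] for _ in range(n_hidden)]
            output = [rng.uniform(-1, 1) for _ in range(n_hidden)]
            net[(state, prev)] = (hidden, output)
    return net


def transition_score(net, state, prev_state, features):
    """Non-linear score for one transition from a dict of feature values."""
    hidden, output = net[(state, prev_state)]
    x = [features[name] for name in FEATURES]
    # Hidden layer: tanh of a weighted sum of the features (assumed activation).
    activations = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in hidden]
    # Output: weighted sum of the hidden activations.
    return sum(o * h for o, h in zip(output, activations))


net = make_toy_network()
example = {name: 0.5 for name in FEATURES}
print(transition_score(net, "M", "M", example))
```

In a trained model the weights would be fitted on the DeepAlign structural alignments so that the scores behave as log-likelihoods of state transitions.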
![Page 10: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/10.jpg)
What’s the problem?
• The model describes only the alignment probability, not a log-odds potential relative to a background.
• It incorporates only local information and lacks global information.
![Page 11: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/11.jpg)
Our solution
Propose a protein alignment potential
• With an elaborately designed reference state.
• Can be generalized to sequence-sequence, sequence-structure, and structure-structure alignment.
Incorporate both local and global terms
• For the local term, the CNFpred potential is applied.
• For the global term, the EPAD potential is employed.
![Page 12: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/12.jpg)
Protein alignment potential
Given two amino acids a and b, their mutation potential (the BLOSUM62 potential) is defined as follows.

$$u(a, b) = \log\frac{P(a, b)}{P_{ref}(a, b)} = \log\frac{P(a, b)}{P(a)\,P(b)}$$

Similarly, given one alignment A between sequence S and template T, we define the potential of A (the alignment potential) as follows.

$$u(A \mid S, T) = \log\frac{P(A \mid S, T)}{P_{ref}(A)} = \log\frac{P(A \mid S, T)}{\sqrt[N]{\prod_{i=1}^{N} P(A \mid x, y)}}$$

Here x and y are two random proteins with the same lengths as S and T, respectively, and the product runs over N such random pairs.

Assumption: the alignment maximizing the potential is the optimal one.
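The mutation potential is an ordinary log-odds score, which the following Python sketch computes from a toy table of pair probabilities. The two-letter alphabet and the probabilities are made up for illustration, the table is assumed to be symmetric and complete, and natural logarithms are used without the scaling and rounding applied to published BLOSUM matrices.

```python
import math


def log_odds_potential(joint):
    """u(a,b) = log( P(a,b) / (P(a) P(b)) ) for every pair in `joint`.

    `joint` maps ordered pairs (a, b) to probabilities. Assumption: the
    table is symmetric and lists every ordered pair, so the marginal P(a)
    can be taken as the sum over the first position.
    """
    marginal = {}
    for (a, _b), p in joint.items():
        marginal[a] = marginal.get(a, 0.0) + p
    return {
        (a, b): math.log(p / (marginal[a] * marginal[b]))
        for (a, b), p in joint.items()
    }


# Toy two-letter alphabet: identical pairs are seen more often than chance.
toy_joint = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
u = log_odds_potential(toy_joint)
print(u[("A", "A")], u[("A", "B")])
```

Pairs observed more often than the reference state predicts get a positive potential, and pairs observed less often get a negative one; the alignment potential applies the same idea to whole alignments.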
![Page 13: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/13.jpg)
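Protein alignment potential
The alignment probability given sequence S and template T can be modeled as follows:

$$P(A \mid S, T) = \frac{\exp\big(F(A \mid S, T) + G(A \mid S, T)\big)}{Z(S, T)}$$

where $F(A \mid S, T)$ is the local term, $G(A \mid S, T)$ is the global term, and the partition function sums the unnormalized weight over all alignments:

$$Z(S, T) = \sum_{A} \exp\big(F(A \mid S, T) + G(A \mid S, T)\big)$$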
![Page 14: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/14.jpg)
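Protein alignment potential

$$u(A \mid S, T) = \log\frac{P(A \mid S, T)}{\sqrt[N]{\prod_{i=1}^{N} P(A \mid x, y)}}$$

$$= \log\frac{\exp\big(F(A \mid S, T) + G(A \mid S, T)\big) / Z(S, T)}{\sqrt[N]{\prod_{i=1}^{N} \exp\big(F(A \mid x, y) + G(A \mid x, y)\big) / Z(x, y)}}$$

$$= F(A \mid S, T) - \mathrm{EXP}_{x,y}\, F(A \mid x, y) + G(A \mid S, T) - \mathrm{EXP}_{x,y}\, G(A \mid x, y) + c(S, T)$$

• The $\mathrm{EXP}_{x,y}$ terms are expected scores, which can be calculated in advance by sampling.
• $c(S, T) = \mathrm{EXP}_{x,y} \log Z(x, y) - \log Z(S, T)$ collects the partition-function terms and is independent of any specific alignment.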
![Page 15: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/15.jpg)
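Model the local potential
From CNFpred, we use a context-specific linear-chain model:

$$F(A \mid S, T) = \sum_i E(a_i, a_{i-1}, S, T)$$

The local potential is defined as

$$U_{local}(A \mid S, T) = F(A \mid S, T) - \mathrm{EXP}_{x,y}\, F(A \mid x, y)$$

The expectation term can be calculated by uniformly sampling a few thousand protein pairs, so the local potential is

$$U_{local}(A \mid S, T) = \sum_i \big(E(a_i, a_{i-1}, S, T) - E(a_i, a_{i-1})\big)$$

where $E(a_i, a_{i-1})$ without S and T is the sampled expected score of that state transition.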
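The sampling step can be sketched in Python as follows. This is a deliberately simplified illustration: `toy_E` is a hypothetical scoring function that only looks at the two aligned residues, and the reference scores are estimated from uniformly drawn random residue pairs rather than from sampled protein pairs as on the slide.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ("M", "Is", "It")


def toy_E(state, prev_state, s_res, t_res):
    """Hypothetical stand-in for E(a_i, a_{i-1}, S, T)."""
    if state == "M":
        score = 1.0 if s_res == t_res else -1.0
    else:
        score = -0.5  # gap states ignore the residues in this toy model
    if state == prev_state:
        score += 0.2
    return score


def reference_scores(E, n_samples=2000, seed=0):
    """Estimate the expected E for each (state, previous state) by sampling."""
    rng = random.Random(seed)
    ref = {}
    for state in STATES:
        for prev in STATES + ("START",):
            total = 0.0
            for _ in range(n_samples):
                x, y = rng.choice(AMINO_ACIDS), rng.choice(AMINO_ACIDS)
                total += E(state, prev, x, y)
            ref[(state, prev)] = total / n_samples
    return ref


def local_potential(seq_row, tmpl_row, E, ref, gap="-"):
    """U_local = sum_i ( E(a_i, a_{i-1}, S, T) - E_ref(a_i, a_{i-1}) )."""
    total, prev = 0.0, "START"
    for s_res, t_res in zip(seq_row, tmpl_row):
        if s_res == gap:
            state = "Is"
        elif t_res == gap:
            state = "It"
        else:
            state = "M"
        total += E(state, prev, s_res, t_res) - ref[(state, prev)]
        prev = state
    return total


ref = reference_scores(toy_E)
u_same = local_potential("ACDE", "ACDE", toy_E, ref)
u_diff = local_potential("ACDE", "WWWW", toy_E, ref)
print(u_same, u_diff)
```

Subtracting the reference means a column only contributes positively when it scores better than a random pairing would, which is what separates the potential from the raw alignment score.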
![Page 16: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/16.jpg)
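What’s the difference between maximizing on probability and maximizing on potential?
• Maximize on probability: the alignment is long but less informative and prone to false positives. Good for building models.
• Maximize on potential: the alignment is short but relevant and highly significant. Good for ranking templates.
[Figure: two alignment matrices, one per objective, each with Template and Sequence axes]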
![Page 17: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/17.jpg)
Model the global potential
From EPAD, we use a context-specific distance-dependent model:

$$G(A \mid S, T) = \sum_{i,j} \log P(d^{T}_{ij} \mid s_i, s_j)$$

The global potential is defined as

$$U_{global}(A \mid S, T) = G(A \mid S, T) - \mathrm{EXP}_{x,y}\, G(A \mid x, y)$$

The expectation term can be calculated by uniformly sampling a few thousand residue pairs from templates, so the global potential is

$$U_{global}(A \mid S, T) = \sum_{i,j} \big(\log P(d^{T}_{ij} \mid s_i, s_j) - \log P(d^{T}_{ij})\big)$$
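The global potential can be sketched in Python as a sum over pairs of matched columns. Everything numeric here is an assumption for illustration: distances are binned in 1 Å steps up to 15 Å, the background is uniform over the bins, and `toy_cond_prob` is a made-up conditional distribution that only distinguishes hydrophobic pairs, whereas EPAD derives its probabilities from richer sequence context. Summing over i < j is also an assumption, since the slide does not state the index range.

```python
import math

N_BINS = 16  # bins 0..14 cover 0-15 A in 1 A steps; bin 15 holds longer distances
HYDROPHOBIC = set("AVILMFWC")


def distance_bin(p, q, width=1.0):
    return min(int(math.dist(p, q) // width), N_BINS - 1)


def toy_background_prob(d_bin):
    """Hypothetical reference state: uniform over distance bins."""
    return 1.0 / N_BINS


def toy_cond_prob(d_bin, res_i, res_j):
    """Hypothetical P(d | s_i, s_j): hydrophobic pairs favour short distances."""
    if res_i in HYDROPHOBIC and res_j in HYDROPHOBIC:
        return 0.1 if d_bin < 8 else 0.025  # 8 * 0.1 + 8 * 0.025 = 1
    return 1.0 / N_BINS


def global_potential(matches, seq, tmpl_coords,
                     cond_prob=toy_cond_prob, background_prob=toy_background_prob):
    """U_global = sum_{i<j} [ log P(d_ij^T | s_i, s_j) - log P(d_ij^T) ].

    `matches` lists (sequence index, template index) for matched columns;
    the distance d_ij^T is measured between the aligned template residues.
    """
    total = 0.0
    for a in range(len(matches)):
        for b in range(a + 1, len(matches)):
            (si, ti), (sj, tj) = matches[a], matches[b]
            d = distance_bin(tmpl_coords[ti], tmpl_coords[tj])
            total += math.log(cond_prob(d, seq[si], seq[sj]))
            total -= math.log(background_prob(d))
    return total


coords = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0), 2: (20.0, 0.0, 0.0)}
u_global = global_potential([(0, 0), (1, 1), (2, 2)], "LLK", coords)
print(u_global)
```

In this toy case only the close hydrophobic pair contributes, because the other two pairs are scored exactly as the background would score them.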
![Page 18: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/18.jpg)
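What’s global information given an alignment?

$$G(A \mid S, T) = \sum_{i,j} \log P(d^{T}_{ij} \mid s_i, s_j)$$

[Figure: residues $s_i$ and $s_j$ of Sequence S are aligned to positions i and j of Template T, which are separated by the distance $d^{T}_{ij}$]
If the alignment is good, the distance between a pair of sequence residues should match well the distance between their aligned template residues.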
![Page 19: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/19.jpg)
Result on 1000 × 6000
CNFpred (local + global potential) compared to:
• HHpred
• CNFpred (local potential)
![Page 20: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/20.jpg)
Welcome to our server http://raptorx.uchicago.edu/
Binding
Contact
![Page 21: Protein threading using context specific alignment potential ismb-2013](https://reader033.fdocuments.in/reader033/viewer/2022052522/554fb605b4c9057b298b53fa/html5/thumbnails/21.jpg)
Thank you
Jinbo Xu
Feng Zhao
Jianzhu Ma
National Institutes of Health (R01GM0897532)
National Science Foundation (DBI-0960390)
NSF CAREER award CCF-1149811
Alfred P. Sloan Research Fellowship