PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman...
![Page 1: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/1.jpg)
PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry
Database Search
Laxman Yetukuri
T-61.6070: Modeling of Proteomics Data
![Page 2: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/2.jpg)
Outline
Motivation
Basics: MS and MS/MS for Protein Identification
Computational Framework of Database Search
Scoring Algorithms
PepHMM
MOWSE
Results
Summary
![Page 3: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/3.jpg)
Motivation
Proteomics studies are dynamic and context-sensitive
Speed and accuracy of omics-driven methods
High-throughput MS-based approaches
Real analysis starts with protein identification
Protein identification is challenging
The heart of a protein identification algorithm is its scoring function
![Page 4: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/4.jpg)
Protein Identification Is Challenging
Sample Contamination
Imperfect Fragmentation
Post-translational Modifications
Low signal-to-noise ratio
Machine errors
![Page 5: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/5.jpg)
Basics: MS and MS/MS for protein Identification
Trypsin Digest
Mass Spectrometry
Liquid Chromatography
Precursor selection + collision-induced dissociation (CID)
MS/MS
![Page 6: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/6.jpg)
Nesvizhskii and Aebersold, Drug Discovery Today, 2004, 9, 173-181
Computational Problem
![Page 7: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/7.jpg)
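Peptide Fragmentation: b & y ions
[Diagram: peptide backbone -HN-CH(R_i)-CO-NH-CH(R_{i+1})-CO-NH-; cleavage of successive peptide bonds gives the N-terminal fragments b_i and b_{i+1} and the complementary C-terminal fragments y_{n-i} and y_{n-i-1}]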
![Page 8: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/8.jpg)
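Peptide Fragmentation: b & y ions …
Example peptide SGFLEEDELK (m/z values as labelled on the slide)
b ions: S 88, G 145, F 292, L 405, E 534, E 663, D 778, E 907, L 1020, K 1166
y ions: 147, 260, 389, 504, 633, 762, 875, 1022, 1080, 1166
[Figure: MS/MS spectrum, % intensity (0 to 100) against m/z (250 to 1000), with peaks labelled y2 to y9 and b3 to b9]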
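As a rough check of the b and y ion ladder on this slide, the singly charged m/z values can be recomputed from monoisotopic residue masses. This is a minimal sketch: the peptide SGFLEEDELK is read off the residue labels, the function name `by_ions` is illustrative only, and the slide appears to quote nominal masses, so agreement is expected only to within about 1 Da.

```python
# Monoisotopic residue masses (Da) for the residues of the example peptide.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "F": 147.06841,
}
PROTON = 1.00728   # mass of a proton (Da)
WATER = 18.01056   # mass of H2O (Da)

def by_ions(peptide):
    """Return (b, y): singly charged m/z lists for b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(masses)):
        # b_i: first i residues plus one proton.
        b.append(sum(masses[:i]) + PROTON)
        # y_i: last i residues plus water plus one proton.
        y.append(sum(masses[-i:]) + WATER + PROTON)
    return b, y

b, y = by_ions("SGFLEEDELK")
for i, (bi, yi) in enumerate(zip(b, y), start=1):
    print(f"b{i} = {bi:8.2f}   y{i} = {yi:8.2f}")
```

The last value in both series on the slide, 1166, corresponds to the intact protonated peptide rather than to a fragment.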
![Page 9: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/9.jpg)
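Peptide Fragmentation with other ions
[Diagram: peptide backbone -HN-CH(R_i)-CO-NH-CH(R_{i+1})-CO-NH-; in addition to b_i / y_{n-i} and b_{i+1} / y_{n-i-1}, cleavage at the neighbouring backbone bonds gives the complementary pairs a_i / x_{n-i} and c_i / z_{n-i}]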
![Page 10: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/10.jpg)
Peptide Identification
Two main methods for tandem MS:
De novo interpretation
Sequence database search
![Page 11: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/11.jpg)
De Novo Interpretation
[Figure: MS/MS spectrum, % intensity (0 to 100) against m/z (250 to 1000); the gaps between neighbouring peaks are annotated with the residues S, G, F, L, E, D and K]
![Page 12: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/12.jpg)
Sequence Database Search
Widely used approach
Compares peptides from a protein sequence database with
experimental spectra
The scoring function summarises the comparison
Critical for any search engine
Score each peptide against spectrum
Cross correlation (SEQUEST)
MOWSE scoring and its extensions (MASCOT)
Probabilistic scoring systems (OMSSA, OLAV, ProbID, …)
PepHMM is an HMM-based probabilistic scoring function
![Page 13: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/13.jpg)
Computational Framework for pepHMM
MSDB based peptide extraction
Hypothetical spectrum generation
b, y, y-H2O, b-H2O, b2+ and y2+
Computing probabilistic scores
Initial classification: match, missing or noise
Compute pepHMM scores (discussed later)
Compute Z-score
Compute E-score
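The initial classification step listed above can be sketched as follows. This is only an illustration under stated assumptions: the function `classify`, the 0.5 Da tolerance and the toy peak list are hypothetical and not taken from the PepHMM implementation. Each predicted ion is matched to its closest observed peak if that peak lies within the tolerance; unmatched predictions are "missing" and unmatched observed peaks are "noise".

```python
def classify(theoretical, observed, tol=0.5):
    """Split predicted ions into matched/missing and observed peaks into noise.

    theoretical: dict mapping ion label -> predicted m/z
    observed:    list of (m/z, intensity) peaks
    tol:         match tolerance in Da (assumed value, for illustration)
    """
    matches, missing, used = {}, [], set()
    for label, mz in theoretical.items():
        # Index of the observed peak closest to this predicted ion.
        best = min(range(len(observed)),
                   key=lambda k: abs(observed[k][0] - mz), default=None)
        if best is not None and abs(observed[best][0] - mz) <= tol:
            peak_mz, intensity = observed[best]
            # Observation (T, I): mass deviation and peak intensity.
            matches[label] = (peak_mz - mz, intensity)
            used.add(best)
        else:
            missing.append(label)
    # Observed peaks that explain no predicted ion are treated as noise.
    noise = [p for k, p in enumerate(observed) if k not in used]
    return matches, missing, noise

# Toy data with made-up values.
theoretical = {"b2": 145.06, "b3": 292.13, "y1": 147.11, "y2": 260.20}
observed = [(145.10, 30.0), (200.00, 5.0), (260.15, 80.0), (293.00, 12.0)]
matches, missing, noise = classify(theoretical, observed)
print(matches, missing, noise)
```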
![Page 14: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/14.jpg)
Contents of pepHMM Model
PepHMM combines the information on correlation among
the ions, peak intensity and match tolerance
Input – sets of matches, missing and noise
Model is based on b and y ions
Each match is associated with an observation (T, I): match tolerance T and peak intensity I
Observation state = observed (T, I)
Hidden state = true assignment of the observations
![Page 15: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/15.jpg)
Model Structure
Four possible assignments corresponding to four hidden states
![Page 16: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/16.jpg)
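Model Computation
Goal: find the highest-scoring peptide p in the database D for the spectrum s:
$$\max_{p \in D} \Pr(s, p \mid \theta)$$
Let a path $\pi = \pi_1 \pi_2 \cdots \pi_n$ in the HMM represent a configuration of states. The probability of the path is
$$\Pr(s, p, \pi \mid \theta) = \Pr(M, \pi \mid \theta) = a_{0,\pi_1} \prod_{i=1}^{n} e_{\pi_i}(i)\, a_{\pi_i,\pi_{i+1}} \times (\text{noise term})^{\#\text{noise}}$$
[The symbol of the per-peak noise term and the Greek entries of the five-component parameter tuple $\theta$ are not legible in the slide text]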
![Page 17: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/17.jpg)
Model Computation…
Considering all possible paths:
$$\Pr(s, p \mid \theta) = \sum_{\pi} \Pr(M, \pi \mid \theta)$$
Forward algorithm: probability of all possible paths from the first position to state v at position i
$$f_v^{(i)} = \Pr(M_1, \ldots, M_i,\ \pi_i = v \mid \theta)$$
$$f_v^{(i+1)} = e_v(i+1) \sum_{u} f_u^{(i)}\, a_{u,v}$$
$$\Pr(M \mid \theta) = \sum_{v} f_v^{(n)}$$
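The forward recursion described on this slide can be illustrated with a generic implementation, checked against explicit enumeration of every path. This is a sketch under assumptions: the two-state toy model and all numbers are made up (PepHMM itself uses four hidden states), the names `forward` and `brute_force` are illustrative, and transitions into an end state are omitted for brevity.

```python
from itertools import product

def forward(a0, a, e):
    """Forward algorithm for a small HMM.

    a0[v]   : probability of starting in state v
    a[u][v] : transition probability from state u to state v
    e[i][v] : emission probability of the observation at position i in state v
    Returns the probability of the observations summed over all state paths.
    """
    n_states = len(a0)
    # Initialisation: first position.
    f = [a0[v] * e[0][v] for v in range(n_states)]
    # Recursion: f_v(i+1) = e_v(i+1) * sum_u f_u(i) * a[u][v].
    for i in range(1, len(e)):
        f = [e[i][v] * sum(f[u] * a[u][v] for u in range(n_states))
             for v in range(n_states)]
    # Termination: sum over the final states.
    return sum(f)

def brute_force(a0, a, e):
    """Same quantity by enumerating every path (exponential cost)."""
    total = 0.0
    for path in product(range(len(a0)), repeat=len(e)):
        p = a0[path[0]] * e[0][path[0]]
        for i in range(1, len(e)):
            p *= a[path[i - 1]][path[i]] * e[i][path[i]]
        total += p
    return total

# Toy two-state example with made-up numbers.
a0 = [0.6, 0.4]
a = [[0.7, 0.3], [0.2, 0.8]]
e = [[0.5, 0.1], [0.4, 0.3], [0.2, 0.9]]
print(forward(a0, a, e), brute_force(a0, a, e))
```

The recursion costs on the order of n times the square of the number of states, whereas enumeration grows exponentially with n, which is why the forward algorithm is used to sum over all paths.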
![Page 18: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/18.jpg)
Emission Probabilities
Probability of observing (T_b, I_b) and (T_y, I_y) for state 1 at position i:
$$e_1(i) = \Pr\big((T_b, I_b), (T_y, I_y)\big) = \Pr\big((T_b, I_b)\big)\,\Pr\big((T_y, I_y)\big) = \Pr(T_b)\,\Pr(I_b)\,\Pr(T_y)\,\Pr(I_y)$$
$\Pr(T_b), \Pr(T_y)$: normal distribution
$\Pr(I_b), \Pr(I_y)$: exponential distribution
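The factorised emission probability for the state in which both the b and the y ion are matched can be sketched as a product of two normal densities (match tolerances) and two exponential densities (intensities). The parameter values `mu`, `sigma` and `lam` below are placeholders chosen for illustration, not estimates from the study, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def exponential_pdf(x, lam):
    """Density of an exponential distribution with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def emission_both_matched(t_b, i_b, t_y, i_y, mu=0.0, sigma=0.2, lam=0.05):
    """e_1(i) = Pr(T_b) Pr(I_b) Pr(T_y) Pr(I_y), assuming independence.

    mu, sigma, lam are made-up placeholder parameters.
    """
    return (normal_pdf(t_b, mu, sigma) * exponential_pdf(i_b, lam)
            * normal_pdf(t_y, mu, sigma) * exponential_pdf(i_y, lam))

# Example: small mass deviations, moderate intensities (toy values).
print(emission_both_matched(t_b=0.04, i_b=30.0, t_y=-0.05, i_y=80.0))
```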
![Page 19: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/19.jpg)
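MOWSE Scoring System
$$m_{i,j} = \frac{f_{i,j}}{\max\limits_{i\ \text{in column}\ j} f_{i,j}}$$
$$\text{Score} = \frac{50{,}000}{M_{\text{Prot}} \times \prod_{n} m_{i,j}}$$
Where
m_{i,j}: elements of the MOWSE frequency matrix, obtained by normalising the raw frequencies f_{i,j} by the column maximum
M_Prot: molecular mass of the candidate protein
The MOWSE algorithm is implemented in the MASCOT software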
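A small numerical sketch of the MOWSE score may help: each column of the frequency matrix is divided by its maximum, and the score is 50,000 divided by the protein mass times the product of the normalised entries of the matched peptides. The 3 x 2 matrix, the matched rows and the protein mass below are made-up toy values, and the function names are illustrative only.

```python
import math

def normalise_columns(freq):
    """m[i][j] = f[i][j] / (maximum of column j)."""
    col_max = [max(col) for col in zip(*freq)]
    return [[f / col_max[j] for j, f in enumerate(row)] for row in freq]

def mowse_score(m, matched_rows, column, protein_mass):
    """Score = 50,000 / (M_prot * product of m[i][column] over matched rows)."""
    prod = math.prod(m[i][column] for i in matched_rows)
    return 50_000 / (protein_mass * prod)

# Toy frequency matrix: rows are peptide-mass bins, columns protein-mass bins.
freq = [[10, 4],
        [5, 8],
        [2, 2]]
m = normalise_columns(freq)
score = mowse_score(m, matched_rows=[1, 2], column=0, protein_mass=25_000.0)
print(m, score)
```

Matches in rarely populated cells have small m values and therefore raise the score more than matches in common cells. A zero frequency would cause a division by zero here, so a real implementation needs to guard against that.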
![Page 20: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/20.jpg)
Data Sets
ISB data set:
1. Mixtures A and B of 18 different proteins, differing in modifications and relative amounts
2. Analysed using SEQUEST and other in-house software
3. Data set is curated
4. Final data set with charge 2+ for trypsin digestion contains 857 spectra
5. 5-fold cross-validation by random selection: training set of 687 spectra, testing set of 170 spectra
6. EM algorithm is used for estimating parameters
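The random selection of a training and a testing set with the sizes quoted above could be done as in the following sketch. The function `random_split` and the seed are assumptions for illustration; the source does not describe how the selection was implemented.

```python
import random

def random_split(n_spectra, n_test, seed=0):
    """Randomly split spectrum indices into (train, test) index lists."""
    rng = random.Random(seed)      # fixed seed for reproducibility
    indices = list(range(n_spectra))
    rng.shuffle(indices)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test

# Sizes from the slide: 857 spectra, 170 held out for testing.
train, test = random_split(857, 170)
print(len(train), len(test))
```

Repeating the call with different seeds gives independent random selections of the two sets.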
![Page 21: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/21.jpg)
Results: Distributions of Ions
[Figure panels: b and y ions; noise; match tolerance; parameter estimates]
![Page 22: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/22.jpg)
Comparative Studies
Data set selection was repeated 10 times to draw both the training and the test data set
Parameter estimates take similar values in each group
A prediction is considered correct if the correct peptide receives the highest score
![Page 23: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/23.jpg)
Independent Data Set
A.Y’s Lab: a second, independent data set used for comparison with other tools such as SEQUEST and MASCOT
Size of data set = 20,980 spectra
![Page 24: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/24.jpg)
False/True Positive Rates
![Page 25: PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Laxman Yetukuri T-61.6070: Modeling of Proteomics Data.](https://reader035.fdocuments.in/reader035/viewer/2022062314/56649ef65503460f94c09cb5/html5/thumbnails/25.jpg)
Summary
Developed a probabilistic scoring function, PepHMM, for improving protein identification
PepHMM outperforms other tools such as MASCOT, with a low false positive rate (always?)
Can it handle ion types other than b and y ions?
Post-translational modifications still need to be handled