Improving the Reliability of Peptide Identification by Tandem Mass Spectrometry
Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center
Mass Spectrometry for Proteomics
• Measure mass of many (bio)molecules simultaneously
  • High bandwidth
• Mass is an intrinsic property of all (bio)molecules
  • No prior knowledge required
Mass Spectrometer
Sample → Ionizer → Mass Analyzer → Detector
• Ionizer: MALDI, Electrospray Ionization (ESI)
• Mass Analyzer: Time-of-Flight (TOF), Quadrupole, Ion Trap
• Detector: Electron Multiplier (EM)
High Bandwidth
[Figure: example mass spectrum, % intensity (0 to 100) vs. m/z (250 to 1000)]
Mass is fundamental!
Mass Spectrometry for Proteomics
• Measure mass of many molecules simultaneously
  • ...but not too many, abundance bias
• Mass is an intrinsic property of all (bio)molecules
  • ...but need a reference to compare to
Mass Spectrometry for Proteomics
• Mass spectrometry has been around since the turn of the 20th century...
  • ...why is MS-based proteomics so new?
• Ionization methods
  • MALDI, Electrospray
• Protein chemistry & automation
  • Chromatography, Gels, Computers
• Protein sequence databases
  • A reference for comparison
Sample Preparation for Peptide Identification
Enzymatic Digest and Fractionation
Single Stage MS
[Figure: MS spectrum (m/z axis)]
Tandem Mass Spectrometry (MS/MS)
• Precursor selection + collision-induced dissociation (CID)
[Figure: MS spectrum with the selected precursor, and the resulting MS/MS spectrum (m/z axes)]
The big picture...
• MS/MS spectra provide evidence for the amino-acid sequence of functional proteins.
• Key concepts:
  • Spectrum acquisition is unbiased
  • Direct observation of amino-acid sequence
  • Sensitive to minor sequence variation
  • Observed peptides represent folded proteins
Peptide Identification
• For each (likely) peptide sequence
  1. Compute fragment masses
  2. Compare with spectrum
  3. Retain those that match well
• Peptide sequences from protein sequence databases
  • Swiss-Prot, IPI, NCBI’s nr, ...
• Automated, high-throughput peptide identification in complex mixtures
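The three numbered steps above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: it assumes unmodified peptides, singly charged b- and y-ions, and a simple shared-peak count as the match score. The function names (`fragment_masses`, `match_count`, `identify`) and the 0.5 Da tolerance are hypothetical choices made for the example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of the charge-carrying proton


def fragment_masses(peptide):
    """Step 1: singly charged b- and y-ion m/z values (assumed ion types)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)          # b-ion: N-terminal prefix
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y-ion: C-terminal suffix
    return ions


def match_count(peptide, spectrum_mz, tol=0.5):
    """Step 2: number of theoretical fragments with an observed peak within tol."""
    return sum(
        any(abs(frag - mz) <= tol for mz in spectrum_mz)
        for frag in fragment_masses(peptide)
    )


def identify(candidates, spectrum_mz, min_matches=3):
    """Step 3: keep candidates that match well, best first."""
    scored = [(match_count(p, spectrum_mz), p) for p in candidates]
    return sorted(((s, p) for s, p in scored if s >= min_matches), reverse=True)
```

Because leucine and isoleucine have the same mass, a score of this kind cannot tell them apart. Real search engines also use the precursor mass, charge states, and peak intensities, which this sketch ignores.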
Peptide Identification, but...
• What about novel peptides?
  • Search compressed ESTs (C3, PepSeqDB)
• What about peak intensity?
  • Spectral matching using HMMs (HMMatch)
• Which identifications are correct?
  • Unsupervised, model-free, result combiner with false discovery rate estimation
Why don’t we see more novel peptides?
• Tandem mass spectrometry doesn’t discriminate against novel peptides...
...but protein sequence databases do!
• Searching traditional protein sequence databases biases the results towards well-understood protein isoforms!
What goes missing?
• Known coding SNPs
• Novel coding mutations
• Alternative splicing isoforms
• Alternative translation start-sites
• Microexons
• Alternative translation frames
Why should we care?
• Alternative splicing is the norm!
  • Only 20-25K human genes
  • Each gene makes many proteins
• Proteins have clinical implications
  • Biomarker discovery
• Evidence for SNPs and alternative splicing stops with transcription
  • Genomic assays, ESTs, mRNA sequence
  • Little hard evidence for translation start site
Novel Splice Isoform
• Human Jurkat leukemia cell-line
  • Lipid-raft extraction protocol, targeting T cells
  • von Haller, et al. MCP 2003.
• LIME1 gene:
  • LCK interacting transmembrane adaptor 1
• LCK gene:
  • Leukocyte-specific protein tyrosine kinase
  • Proto-oncogene
  • Chromosomal aberration involving LCK in leukemias.
• Multiple significant peptide identifications
Novel Mutation
• HUPO Plasma Proteome Project
  • Pooled samples from 10 male & 10 female healthy Chinese subjects
  • Plasma/EDTA sample protocol
  • Li, et al. Proteomics 2005. (Lab 29)
• TTR gene
  • Transthyretin (pre-albumin)
  • Defects in TTR are a cause of amyloidosis.
  • Familial amyloidotic polyneuropathy
    • late-onset, dominant inheritance
Ala2→Pro associated with familial amyloid polyneuropathy
Searching ESTs
• Proposed long ago:
  • Yates, Eng, and McCormack; Anal Chem, ’95.
• Now:
  • Protein sequences are sufficient for protein identification
  • Computationally expensive/infeasible
  • Difficult to interpret
• Make EST searching feasible for routine searching to discover novel peptides.
Searching Expressed Sequence Tags (ESTs)
Pros
• No introns!
• Primary splicing evidence for annotation pipelines
• Evidence for dbSNP
• Often derived from clinical cancer samples
Cons
• No frame
• Large (8Gb)
• “Untrusted” by annotation pipelines
• Highly redundant
• Nucleotide error rate ~ 1%
Compressed EST Peptide Sequence Database
• For all ESTs mapped to a UniGene gene:
  • Six-frame translation
  • Eliminate ORFs < 30 amino-acids
  • Eliminate amino-acid 30-mers observed once
  • Compress to C2 FASTA database
• Complete, Correct for amino-acid 30-mers
• Gene-centric peptide sequence database:
  • Size: < 3% of naïve enumeration, 20774 FASTA entries
  • Running time: ~ 1% of naïve enumeration search
  • E-values: ~ 2% of naïve enumeration search results
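The first three filtering steps (six-frame translation, dropping short ORFs, dropping 30-mers seen only once) can be illustrated with a small Python sketch. It makes several assumptions that the slides do not state: an "ORF" is taken to be any stop-to-stop stretch of a translated frame, "observed once" is read as "seen in only one EST", and the standard genetic code is used. The compression step itself is not shown, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in the usual TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate one reading frame; unknown codons (e.g. containing N) become X."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def six_frame_orfs(dna, min_len=30):
    """Stop-to-stop stretches of at least min_len residues in all six frames."""
    orfs = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            for orf in translate(strand, frame).split("*"):
                if len(orf) >= min_len and "X" not in orf:
                    orfs.append(orf)
    return orfs


def repeated_kmers(ests, k=30, min_len=30):
    """Amino-acid k-mers supported by at least two ESTs (assumed reading of 'observed once')."""
    counts = Counter()
    for est in ests:
        seen = set()
        for orf in six_frame_orfs(est.upper(), min_len):
            seen.update(orf[i:i + k] for i in range(len(orf) - k + 1))
        counts.update(seen)  # each EST votes at most once per k-mer
    return {kmer for kmer, n in counts.items() if n >= 2}
```

Requiring support from two ESTs is one plausible way to suppress the roughly 1% nucleotide error rate: a sequencing error is unlikely to produce the same spurious 30-mer in two independent reads.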
PepSeq FASTA Databases
• Organisms:
  • HUMAN, MOUSE, RAT, ZEBRA FISH
• Peptide Evidence:
  • Genbank mRNA, EST, HTC
  • RefSeq mRNA, Proteins
  • Swiss-Prot/TrEMBL, EMBL, VEGA, H-Inv, IPI Proteins
  • Swiss-Prot variants
  • Swiss-Prot signal peptide & init. Met removal
• Single FASTA entry per Gene
Spectral Matching for Peptide Identification
• Detection vs. identification
  • Increased sensitivity & specificity
  • No novel peptides!
• NIST GC/MS Spectral Library
  • Identifies small molecules
  • 100,000’s of (consensus) spectra
  • Bundled/Sold with many instruments
  • “Dot-product” spectral comparison
  • Current project: Peptide MS/MS
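A "dot-product" comparison can be sketched as the cosine similarity between two binned spectra. This is a generic illustration, not the NIST implementation: the 1.0 m/z bin width, the lack of intensity weighting, and the names `bin_spectrum` and `dot_product_score` are assumptions made for the example.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks falling into each m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def dot_product_score(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product: 1.0 for identical spectra, 0.0 for no shared bins."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

Because the score is normalized, it depends on relative peak intensities rather than absolute signal, which is why a library spectrum can be compared against a query acquired at a different abundance.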
NIST MS Search: Peptides
Peptide DLATVYVDVLK
Protein Families
Peptide DLATVYVDVLK
Hidden Markov Models for Spectral Matching
• Capture statistical variation and consensus in peak intensity
  • Only need 10 spectra to build a model
• Capture semantics of peaks
  • Extrapolate model to other peptides
• Good specificity with superior sensitivity for peptide detection
  • Assign 1000’s of additional spectra (p-value < 10^-5)
Hidden Markov Model
• States: Ion, Delete, Insert
• (m/z,int) pair emitted by ion & insert states
The devil in the details
• Intensity normalization
• Discretize (m/z,int) pairs
• Viterbi distance as score
• Compute p-value using “random” spectra
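Three of these details (normalization, discretization, and the p-value) can be sketched without the HMM itself. The following Python is an assumption-laden illustration rather than the HMMatch procedure: intensities are scaled to the base peak, (m/z, int) pairs are mapped to integer bins with a hypothetical number of intensity levels, and the p-value is the empirical fraction of "random" spectra scoring at least as well, treating a smaller Viterbi distance as a better match.

```python
def normalize(peaks):
    """Scale intensities so the most intense (base) peak equals 1.0."""
    if not peaks:
        return []
    top = max(intensity for _, intensity in peaks)
    return [(mz, intensity / top) for mz, intensity in peaks]


def discretize(peaks, mz_width=1.0, levels=4):
    """Map each (m/z, int) pair to an (m/z bin, intensity level) symbol."""
    symbols = []
    for mz, rel in normalize(peaks):
        level = min(int(rel * levels), levels - 1)  # base peak lands in the top level
        symbols.append((int(mz // mz_width), level))
    return symbols


def empirical_p_value(score, null_scores):
    """Fraction of random-spectrum scores at least as good as the observed score.

    Assumes a distance-like score (smaller is better); the +1 terms keep the
    estimate away from exactly zero when no random spectrum scores as well.
    """
    hits = sum(1 for s in null_scores if s <= score)
    return (hits + 1) / (len(null_scores) + 1)
```

With this estimator the smallest reportable p-value is 1 / (N + 1) for N random spectra, so reaching very small p-values requires either a large random sample or a fitted distribution.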
Random Spectra
• Uniform sample of (m/z,int)
• Permutation (m/z) of true spectra peaks
• M/z distribution between true spectra and uniform sample (parameter)
[Figure: histogram of # of spectra vs. Viterbi score for random, true, and false spectra]
HMM Peptide Identification Results – DLATV
[Figure, panel "DLAT (viterbi)": histogram of # of spectra (0 to 240) vs. Viterbi distance (bins 0-10 through 280-290)]
[Figure, panel "DLAT (-logP)": histogram of # of spectra (0 to 100) vs. -log(p-value) (bins 0-1 through 18-19, plus inf)]
Series in both panels: True_test(0.0001), True_test(other), False_test(0.0001), False_test(other)
Spectral Matching of Peptide Variants
DFLAGGVAAAISK
DFLAGGIAAAISK
HMM model extrapolation
Mascot Search Results
Peptide Identification Results
• Search engines always provide an answer
• Current search engines:
  • Hard to determine “good” scores
  • Significance estimates are unreliable
• Need better methods!
Common Algorithmic Framework
• Pre-process experimental spectra
• Filter peptide candidates
• Score match between peptides and spectra
• Rank peptides and assign
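The four stages of this framework can be wired together as a small pipeline. The sketch below is schematic and does not describe Mascot, X!Tandem, or OMSSA: keeping the top-N peaks, filtering by precursor mass, and counting shared peaks are stand-ins chosen for illustration, and the candidate record layout (`peptide`, `mass`, `fragments`) is hypothetical.

```python
def preprocess(peaks, top_n=50):
    """Pre-process: keep the top_n most intense peaks, returned in m/z order."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    return sorted(strongest)


def filter_candidates(candidates, precursor_mass, tol=2.0):
    """Filter: keep peptides whose mass is within tol of the precursor mass."""
    return [c for c in candidates if abs(c["mass"] - precursor_mass) <= tol]


def score(candidate, peaks, tol=0.5):
    """Score: number of theoretical fragments with an observed peak within tol."""
    return sum(
        any(abs(mz - frag) <= tol for mz, _ in peaks)
        for frag in candidate["fragments"]
    )


def identify(peaks, precursor_mass, candidates):
    """Rank: return (peptide, score) pairs, best-scoring candidate first."""
    peaks = preprocess(peaks)
    scored = [
        (c["peptide"], score(c, peaks))
        for c in filter_candidates(candidates, precursor_mass)
    ]
    return sorted(scored, key=lambda ps: ps[1], reverse=True)
```

Search engines differ mainly in how they implement each stage, which is one reason their scores are not directly comparable.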
Comparison of search engines
• No single score is comprehensive
• Search engines disagree
• Many spectra lack confident peptide assignment
[Figure: Venn diagram of spectra assigned by OMSSA, X!Tandem, and Mascot; region labels 4%, 10%, 2%, 5%, 9%, 69%, 2%]
Lots of published solutions!
• Treat search engines as black-boxes
• Apply supervised machine learning to results
  • Use multiple match metrics
• Combine/refine using multiple search engines
  • Agreement suggests correctness
• Use empirical significance estimates
  • “Decoy” databases (FDR)
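The decoy idea can be made concrete with a short Python sketch. It assumes each peptide-spectrum match is a `(score, is_decoy)` pair where higher scores are better, and estimates FDR at a threshold as the number of decoy hits divided by the number of target hits. That is one common convention among several; the function names are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimated FDR among matches scoring at or above threshold (decoys / targets)."""
    targets = sum(1 for s, is_decoy in psms if s >= threshold and not is_decoy)
    decoys = sum(1 for s, is_decoy in psms if s >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def threshold_for_fdr(psms, max_fdr=0.01):
    """Lowest score threshold whose estimated FDR does not exceed max_fdr."""
    best = None
    for t in sorted({s for s, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, t) <= max_fdr:
            best = t  # keep lowering the threshold while the FDR stays acceptable
    return best
```

The estimate relies on decoy matches being as likely as incorrect target matches, which is why decoys are usually built by reversing or shuffling the target sequences.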
PepArML
• Peptide identification arbiter by machine learning
• Unifies these ideas within a model-free, combining machine learning framework
• Unsupervised training procedure
PepArML Overview
• Unify Tandem, Mascot, and OMSSA results
[Figure: X!Tandem, Mascot, OMSSA, and other search results feed into PepArML, which separates identified from unidentified spectra]
Voting Heuristic Combiner
• Choose peptide ID with the most votes
  • Use best FDR as confidence
• Break ties (single votes) using FDR
• Strawman for comparison
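The voting rule is simple enough to write down directly. This sketch assumes each search engine reports one `(peptide, fdr)` pair per spectrum and that a lower FDR means higher confidence; the input layout and the name `vote` are hypothetical.

```python
from collections import defaultdict


def vote(engine_results):
    """Combine per-engine identifications for one spectrum.

    engine_results maps engine name -> (peptide, fdr). Returns the peptide with
    the most votes and its best (lowest) FDR; ties are broken by best FDR.
    """
    if not engine_results:
        return None
    votes = defaultdict(list)
    for peptide, fdr in engine_results.values():
        votes[peptide].append(fdr)
    peptide, fdrs = max(votes.items(), key=lambda kv: (len(kv[1]), -min(kv[1])))
    return peptide, min(fdrs)
```

When every engine proposes a different peptide, each has a single vote and the rule reduces to picking the identification with the lowest FDR.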
Dataset construction
[Figure: Search Tools (X!Tandem, Mascot, OMSSA, Other) → Extract Features → Machine Learning]
• Extracted features: spectra compare, matched ions, peak intensity, mass delta, # of missed cleavages, peptide length, Tandem score, Mascot score, OMSSA score
Dataset construction
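• Build feature vectors: one row per (spectrum, peptide) pair, with Tandem, Mascot, and OMSSA feature columns and a true/false (T/F) label
  • $(S_1, P_1)$: T
  • $(S_1, P_2)$: F
  • $(S_2, P_1)$: T
  • ...
  • $(S_n, P_m)$: T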
Dataset construction
• Synthetic protein mixtures provide ground truth
• C8
  • 8 standard proteins (Calibrant Biosystems)
  • 4594 MS/MS spectra (LTQ)
  • 618 (11.2%) true positives
• S17
  • 17 standard proteins (Sashimi Repository)
  • 1389 MS/MS spectra (Q-TOF)
  • 354 (25.4%) true positives
• AURUM
  • 364 standard proteins (AURUM 1.0)
  • 7508 MS/MS spectra (MALDI-TOF-TOF)
  • 3775 (50.3%) true positives
Machine learning improves single search engines (S17)
Multiple search engines are better than single search engines (S17)
Feature Evaluation
Application to Real Data
• How well do these models generalize?
• Different instruments
  • Spectral characteristics change scores
• Search parameters
  • Different parameters change score values
• Supervised learning requires
  • (Synthetic) experimental data from every instrument
  • Search results from available search engines
  • Training/models for all parameters x search engine sets x instruments
Model Generalization
Rescuing Machine Learning
• Train a new machine-learning model for every dataset!
  • Generalization not required
  • No predetermined search engines, parameters, instruments, features
• Perhaps we can “guess” the true proteins
  • Most proteins not in doubt
  • Machine learning can tolerate imperfect labels
Unsupervised Learning
• Heuristic selection of “true” proteins
  • Train classifier, predict true peptide IDs
• Update “true” proteins
  • Heuristic selection of “true” proteins from classifier predictions
• Iterate until convergence
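The loop above can be sketched in Python with deliberately simple stand-ins. This is not the PepArML procedure. It assumes a protein counts as "true" when at least two distinct peptides predicted true map to it, and it replaces the classifier with a one-dimensional score cutoff placed midway between the class means. The record layout (`protein`, `peptide`, `score`) and all function names are hypothetical.

```python
def select_proteins(psms, predicted_true, min_peptides=2):
    """Heuristic: proteins supported by enough distinct predicted-true peptides."""
    support = {}
    for psm, ok in zip(psms, predicted_true):
        if ok:
            support.setdefault(psm["protein"], set()).add(psm["peptide"])
    return {prot for prot, peps in support.items() if len(peps) >= min_peptides}


def train_cutoff(scores, labels):
    """Stand-in classifier: cutoff midway between the mean scores of the two classes."""
    pos = [s for s, label in zip(scores, labels) if label]
    neg = [s for s, label in zip(scores, labels) if not label]
    if not pos:
        return float("inf")   # nothing labeled true: accept nothing
    if not neg:
        return float("-inf")  # nothing labeled false: accept everything
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def iterate(psms, initial_cutoff, max_rounds=20):
    """Alternate between labeling by 'true' proteins and retraining, until stable."""
    scores = [p["score"] for p in psms]
    predicted = [s >= initial_cutoff for s in scores]
    proteins = select_proteins(psms, predicted)
    for _ in range(max_rounds):
        labels = [p["protein"] in proteins for p in psms]  # imperfect labels
        cutoff = train_cutoff(scores, labels)
        predicted = [s >= cutoff for s in scores]
        updated = select_proteins(psms, predicted)
        if updated == proteins:
            break  # converged: the "true" protein set no longer changes
        proteins = updated
    return proteins, predicted
```

The labels are knowingly imperfect: every peptide ID assigned to a "true" protein is labeled true, including low-scoring ones, and the retrained model is expected to tolerate that noise.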
Unsupervised Learning Performance
Unsupervised Learning Convergence
Conclusions
• Proteomics can inform genome annotation
  • Eukaryotic and prokaryotic
  • Functional vs silencing variants
• Peptides identify more than just proteins
  • Untapped source of disease biomarkers
• Computational inference can make a substantial impact in proteomics
• Compressed peptide sequence databases make routine EST searching feasible
• HMMatch spectral matching improves identification performance for familiar peptides
• The unsupervised, model-free, combining PepArML framework solves the peptide identification interpretation problem
Acknowledgements
• Chau-Wen Tseng, Xue Wu
  • UMCP Computer Science
• Catherine Fenselau
  • UMCP Biochemistry
• Cheng Lee
  • Calibrant Biosystems
• PeptideAtlas, HUPO PPP, X!Tandem
• Funding: NIH/NCI, USDA/ARS