[Figure: MS scan with precursors labeled 1st through 10th selected for MS2; axis: Relative Intensity; timeline labels: Fill Times, Scan Times]
shotgun sequencing
Slide 2
MS/MS Spectrum Protein Database spectral matching
Slide 3
time shotgun sequencing
Slide 4
[Figure: MS1 and MS2 scans plotted against time]
shotgun sequencing
Slide 5
LTQ Orbitrap base peak chromatogram
37 min LC-MS/MS run time
6186 MS/MS spectra
2308 peptide IDs (false-positive rate 1%)
287 protein IDs
Server, single CPU: 6000 spectra x 10 s/spectrum = about 16 CPU hours of search time
Server, 20 nodes (parallel CPUs): 16 hours / 20 = 0.8 hours
distributed spectral matching
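The search-time arithmetic on this slide can be checked with a short sketch. It assumes perfectly even scaling across nodes (no scheduling or I/O overhead), which is an idealization, and the function name `search_hours` is hypothetical.

```python
def search_hours(n_spectra, seconds_per_spectrum, n_cpus=1):
    """Wall-clock hours for a database search, assuming ideal parallel scaling."""
    cpu_seconds = n_spectra * seconds_per_spectrum
    return cpu_seconds / 3600 / n_cpus


single = search_hours(6000, 10)        # one CPU
cluster = search_hours(6000, 10, 20)   # 20 nodes working in parallel
print(f"single CPU: {single:.1f} h, 20 nodes: {cluster:.2f} h")
```

The exact figures are slightly above the slide's rounded values of 16 hours and 0.8 hours.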
Slide 6
XCorr: goodness of fit between the experimental MS/MS spectrum and the theoretical b and y ions of peptides in the database
dCn: fractional difference between the highest XCorr and the next-highest XCorr
SEQUEST
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
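The dCn definition above can be written out as a small sketch. The function name `delta_cn` and the example scores are hypothetical, and returning 0.0 when there is no runner-up is an arbitrary choice made for this illustration.

```python
def delta_cn(xcorrs):
    """Fractional difference between the best and second-best XCorr."""
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        # Assumption for this sketch: no runner-up or a non-positive
        # top score yields 0.0.
        return 0.0
    return (ranked[0] - ranked[1]) / ranked[0]


scores = [3.2, 4.0, 1.0]   # hypothetical XCorr values for one spectrum
dcn = delta_cn(scores)     # (4.0 - 3.2) / 4.0
print(round(dcn, 3))
```

A large dCn means the top-ranked peptide stands well clear of the next-best candidate.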
Slide 7
[Figure: MS1 and MS2 scans plotted against time]
5000 - 25000 MS2 spectra = all MS2 in LC run
SEQUEST
Slide 8
all MS2 in LC run
raw: all MS2 = 1 file
dta: 1 MS2 = 1 file (all MS2 = ~10000 files)
dta 1: 501.000 (precursor m/z), +2 (charge state), MS2 array
dta 2: 1001.500 (precursor m/z), +3 (charge state), MS2 array
SEQUEST
Slide 9
all MS2 in LC run: dta 1, 2, 3 ... 10000
human IPI database: 61236 proteins
For each dta (e.g., precursor 1000.000 +/- 1 Da or 3000.000 +/- 1 Da), repeat 3,250,000 times:
digest to next peptide: MSQVQVQVQNPSAALSGSQILNK
calculate peptide mass: 2426.258812
compare with precursor: not a candidate
if candidate, calculate theoretical spectrum, correlate, score & return
Total work: 1 dta = 3,250,000 times; 2 dta = 2 x 3,250,000 times; 3 dta = 3 x 3,250,000 times; 10000 dta = 10000 x 3,250,000 times
SEQUEST
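The "calculate peptide mass, compare with precursor" step can be sketched as follows. This is a minimal illustration, not SEQUEST's actual code: it uses standard monoisotopic residue masses, assumes an unmodified peptide, and treats the precursor values as neutral masses. The names `peptide_mass` and `is_candidate` are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one H2O is added for the peptide termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def is_candidate(sequence, precursor_mass, tolerance_da=1.0):
    """True if the peptide mass lies within +/- tolerance of the precursor."""
    return abs(peptide_mass(sequence) - precursor_mass) <= tolerance_da


peptide = "MSQVQVQVQNPSAALSGSQILNK"
mass = peptide_mass(peptide)
print(round(mass, 3))
print(is_candidate(peptide, 1000.000), is_candidate(peptide, 3000.000))
```

The computed mass agrees with the slide's 2426.258812 to within rounding of the residue masses, and the peptide is rejected for both example precursors, so no theoretical spectrum would be generated for it.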
Slide 10
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
[Figure: theoretical candidate spectrum, experimental peptide spectrum, correlation spectrum]
Slide 11
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
correlation spectrum
Slide 12
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
correlation spectrum
Slide 13
Yates J.R. 3rd et al., J Am Soc Mass Spectrom 5:976-89 (1994)
[Figure: correlation spectrum]
similarity scoring: XCorr score
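How a correlation spectrum becomes a single XCorr score can be sketched in simplified form. The version below follows the commonly described definition (correlation at zero offset minus the mean correlation over offsets from -75 to +75) and assumes both spectra are already binned into equal-length intensity lists. The preprocessing and normalization steps of the cited paper are omitted, so treat this as an illustration rather than the SEQUEST implementation.

```python
def xcorr(experimental, theoretical, max_shift=75):
    """Simplified cross-correlation score for two binned spectra."""
    n = len(experimental)

    def corr(shift):
        # Dot product of the experimental spectrum with the theoretical
        # spectrum displaced by `shift` bins.
        total = 0.0
        for i in range(n):
            j = i + shift
            if 0 <= j < n:
                total += experimental[i] * theoretical[j]
        return total

    background = [corr(s) for s in range(-max_shift, max_shift + 1) if s != 0]
    return corr(0) - sum(background) / len(background)


def binned(peaks, size=200):
    """Hypothetical helper: unit-intensity peaks at the given bin indices."""
    spectrum = [0.0] * size
    for p in peaks:
        spectrum[p] = 1.0
    return spectrum


observed = binned([50, 100, 150])
good = xcorr(observed, binned([50, 100, 150]))   # same peak positions
bad = xcorr(observed, binned([20, 60, 130]))     # unrelated peak positions
print(good > bad)
```

Subtracting the off-center background penalizes candidates that correlate equally well at any offset, which is what random matches tend to do.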
The Results: Distinguishing Right from Wrong
In large proteomics data sets (for which manual data inspection is impossible), how can we distinguish between correct and incorrect peptide assignments?
Use decoy sequences to distract non-peptidic, non-uniquely matchable, or otherwise unmatchable spectra into a search space that is known a priori to be incorrect.
Use the frequency of decoy sequences among total sequences to estimate the overall frequency of wrong answers (False Positive Rate).
Adjust filtering criteria to achieve a ~1% False Positive Rate.
Slide 19
Decoy Sequences? A Reversed Database!
We generate decoy sequences by reversing each protein sequence in a given database, such that the resultant in silico digest contains nonsense peptides, then append the reversed database to the end of the forward database.
Decoy references are labeled with #.
Database searching with SEQUEST occurs from top to bottom.
When a spectrum matches a decoy reference, a wrong match was equally likely to have landed on a non-decoy sequence, so each decoy hit implies about one undetected wrong forward hit. So our FPR is (# of decoys) x 2 / total matches.
[Arrow label: SEARCHING]
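The FPR formula on this slide translates directly into code. In this minimal sketch the reference names are made up, and decoys are recognized by the `#` label described above.

```python
def estimate_fpr(references, decoy_label="#"):
    """FPR = (# of decoys) x 2 / total matches."""
    total = len(references)
    if total == 0:
        return 0.0
    decoys = sum(1 for ref in references if ref.startswith(decoy_label))
    return 2 * decoys / total


# Hypothetical filtered result list: 98 forward hits and 2 decoy hits.
matches = ["IPI001"] * 98 + ["#IPI777"] * 2
fpr = estimate_fpr(matches)
print(f"{fpr:.0%}")
```

In practice the score filters would be tightened or loosened until this estimate reaches the desired level, such as the ~1% mentioned earlier.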
Slide 20
Target/Decoy Database Searching
Forward database: 1. MAGFA SHTRP
Reversed database: 1. PRTHS AFGAM
Composite database (F + R), searched with SEQUEST
Right: F (100%)
Wrong (random): F or R (50% each)
Filter (scoring, mass accuracy, etc.)
Generate final list
Estimate FP rate from 2 x Rev (i.e., 4%)
Labels: Known FP, Unknown FP
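Building the composite database can be sketched as below. The toy entry joins the two sequence fragments shown on the slide into one short string purely for illustration, and `build_composite` is a hypothetical name. The `#` prefix follows the decoy labeling convention described earlier.

```python
def build_composite(forward):
    """Return the forward entries followed by their reversed decoys.

    `forward` maps a reference name to a protein sequence. Python dicts
    keep insertion order, so the decoys end up appended after all
    forward entries.
    """
    composite = dict(forward)
    for ref, seq in forward.items():
        composite["#" + ref] = seq[::-1]   # reversed sequence, labeled with #
    return composite


forward_db = {"1": "MAGFASHTRP"}   # toy sequence assembled from the slide's fragments
composite_db = build_composite(forward_db)
for ref, seq in composite_db.items():
    print(ref, seq)
```

Reversal keeps the amino acid composition and length of every protein, so the decoy half has the same size as the forward half while its digest yields nonsense peptides.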
High Mass Accuracy
Precision of mass errors between observed and actual m/z
LTQ Orbitrap & LTQ FT
0.1 0.4 ppm
LTQ FT (SIM), AGC target 50,000 to avoid space-charge effects
Olsen et al. (2004) Mol. Cell. Proteomics 3, 608
-0.2 1.0 ppm
Haas et al. (2006) Mol. Cell. Proteomics 5, 1326
Mass Accuracy in Proteomics: Performance is related to the width of the distribution, not the average error
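One way to read the statement about width versus average error: a constant offset can be subtracted after the fact, but a wide spread forces a wide tolerance window. The sketch below computes ppm errors and summarizes two made-up error distributions. The data values and function names are hypothetical.

```python
from statistics import mean, stdev


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize(errors_ppm):
    """Return (average error, width) of a set of ppm errors."""
    return mean(errors_ppm), stdev(errors_ppm)


# Hypothetical error sets: one offset but narrow, one centered but wide.
offset_narrow = [1.8, 2.0, 2.2, 1.9, 2.1]
centered_wide = [-3.0, 1.5, 0.0, 3.0, -1.5]

print(round(ppm_error(1000.001, 1000.0), 3))
print(summarize(offset_narrow))
print(summarize(centered_wide))
```

After subtracting its 2 ppm mean, the first set fits in a window well under 1 ppm, while the second set needs several ppm despite its zero average.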
Slide 23
MMA: True Positives and False Positives
[Figure: PSM number vs. MMA, centered at 0; series: True Positives (TP), False Positives (FP)]
False positives are distributed evenly across MMA space
Slide 24
MS/MS vs MMA: Precision vs Sensitivity
[Figure: MMA axis centered at 0]
MS/MS criteria are strong precision filters; they require TP / FP separation for sensitivity.
MMA criteria are weak precision filters; they assist MS/MS criteria in improving sensitivity.
Slide 25
Distracting Wrong from Right: MMA
[Figure: two panels of True Positives and False Positives vs. MMA, centered at 0; labels: Extended Search Space, Search Space Filtered]
Slide 26
Mass Accuracy: Another dimension of selectivity
[Figure: dCn vs. XCorr scatter plots; panels: Forward Sequences; Forward + Reverse, Tryptic Search +/- 2 Da; Tryptic Search +/- 2 Da with 5 ppm filter]
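The "search wide, then filter narrow" idea from this slide can be sketched as a post-search filter: candidates are matched within +/- 2 Da, and only matches within 5 ppm are kept. The PSM tuples below are invented for illustration, and `within_ppm` is a hypothetical helper.

```python
def within_ppm(observed_mass, theoretical_mass, tolerance_ppm=5.0):
    """True if the absolute mass error is at most `tolerance_ppm`."""
    error_ppm = abs(observed_mass - theoretical_mass) / theoretical_mass * 1e6
    return error_ppm <= tolerance_ppm


# Hypothetical PSMs from a +/- 2 Da search: (reference, observed, theoretical).
psms = [
    ("IPI001", 1500.7510, 1500.7500),    # under 1 ppm
    ("#IPI042", 1501.6200, 1500.7500),   # decoy, hundreds of ppm off
    ("IPI003", 2000.0050, 2000.0000),    # a few ppm
    ("IPI004", 1999.2000, 2000.0000),    # hundreds of ppm off
]

kept = [psm for psm in psms if within_ppm(psm[1], psm[2])]
print([psm[0] for psm in kept])
```

Because wrong matches spread across the whole +/- 2 Da window, a narrow ppm filter removes most of them while keeping nearly all correct matches.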
phosphorylation site localization GFDSNQpTWR or GFDpSNQTWR?
Beausoleil et al., Nat. Biotechnol, 2006
Slide 36
phosphorylation site localization Beausoleil et al., Nat.
Biotechnol, 2006
Slide 37
phosphorylation site localization Taus et al., JPR, 2011
Slide 38
phosphorylation false localization rate (FLR)
Chalkley & Clauser, MCP, 2012; Baker et al., MCP, 2011
use non-native phosphoacceptors as decoys
Ser + Thr (human proteome): 14.1%
Pro + Glu (human proteome): 14.5%
allow search engine / localization assessment tools to consider pP and pE as true-negative decoys
calculate dataset FLR based on frequency of pP + pE decoys
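A rough sketch of the decoy-residue idea follows. It assumes that wrong localizations fall on Ser/Thr and on Pro/Glu in proportion to how common those residues are, so decoy counts are scaled by the two frequencies quoted on the slide. That scaling is an assumption of this sketch and not necessarily the exact formula of the cited papers. Tyr is ignored here, and `estimate_flr` is a hypothetical name.

```python
SER_THR_FREQ = 0.141   # Ser + Thr frequency in the human proteome (from the slide)
PRO_GLU_FREQ = 0.145   # Pro + Glu frequency in the human proteome (from the slide)


def estimate_flr(localized_residues):
    """Estimate the false localization rate from pP / pE decoy sites.

    `localized_residues` holds one letter per reported phosphosite,
    giving the residue the site was assigned to.
    """
    decoys = sum(1 for aa in localized_residues if aa in ("P", "E"))
    targets = sum(1 for aa in localized_residues if aa in ("S", "T"))
    if targets == 0:
        return 0.0
    # Assumption: each decoy site implies SER_THR_FREQ / PRO_GLU_FREQ
    # wrongly localized Ser/Thr sites.
    expected_false_targets = decoys * SER_THR_FREQ / PRO_GLU_FREQ
    return expected_false_targets / targets


# Hypothetical dataset: 97 sites on S/T and 3 sites on the decoy residues.
sites = list("S" * 70 + "T" * 27 + "P" * 2 + "E" * 1)
flr = estimate_flr(sites)
print(f"{flr:.1%}")
```

Because Pro + Glu and Ser + Thr occur at nearly the same frequency, the scaling factor is close to 1 and the decoy count is almost a direct estimate of the number of wrong Ser/Thr localizations.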