CSE182-L12
description
Transcript of CSE182-L12
![Page 1: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/1.jpg)
CSE182
CSE182-L12
Mass SpectrometryPeptide identification
![Page 2: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/2.jpg)
CSE182
All atoms have isotopes
• Isotopes of atoms– O16,18, C-12,13, S32,34….– Each isotope has a frequency of occurrence
• If a molecule (peptide) has a single copy of C-13, that will shift its peak by 1 Da
• With multiple copies of a peptide, we have a distribution of intensities over a range of masses (Isotopic profile).
• How can you compute the isotopic profile of a peak?
![Page 3: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/3.jpg)
CSE182
Isotope Calculation
• Denote:– Nc : number of carbon atoms in the peptide
– Pc : probability of occurrence of C-13 (~1%)
– Then
€
Pr[Peak at M] =NC
0 ⎛ ⎝ ⎜
⎞ ⎠ ⎟pc
0 1− pc( )NC
Pr[Peak at M +1] =NC
1 ⎛ ⎝ ⎜
⎞ ⎠ ⎟pc
1 1− pc( )NC −1
+1
Nc=50
+1
Nc=200
![Page 4: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/4.jpg)
CSE182
Isotope Calculation Example • Suppose we consider Nitrogen, and Carbon• NN: number of Nitrogen atoms• PN: probability of occurrence of N-15• Pr(peak at M)• Pr(peak at M+1)?• Pr(peak at M+2)?
€
Pr[Peak at M] =NC
0 ⎛ ⎝ ⎜
⎞ ⎠ ⎟pc
0 1− pc( )NC NN
0 ⎛ ⎝ ⎜
⎞ ⎠ ⎟pN
0 1− pN( )NN
Pr[Peak at M +1] =NC
1 ⎛ ⎝ ⎜
⎞ ⎠ ⎟pc
1 1− pc( )NC −1 NN
0 ⎛ ⎝ ⎜
⎞ ⎠ ⎟pN
0 1− pN( )NN
+NC
0 ⎛ ⎝ ⎜
⎞ ⎠ ⎟pc
0 1− pc( )NC NN
1 ⎛ ⎝ ⎜
⎞ ⎠ ⎟pN
1 1− pN( )NN −1
How do we generalize? How can we handle Oxygen (O-16,18)?
![Page 5: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/5.jpg)
CSE182
General isotope computation
• Definition:– Let pi,a be the abundance of the isotope with
mass i Da above the least mass– Ex: P0,C : abundance of C-12, P2,O: O-18 etc.– Let Na denote the number of atome of
amino-acid a in the sample.
• Goal: compute the heights of the isotopic peaks. Specifically, compute Pi= Prob{M+i}, for i=0,1,2…
![Page 6: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/6.jpg)
CSE182
Characteristic polynomial
• We define the characteristic polynomial of a peptide as follows:
•
(x) is a concise representation of the isotope profile
€
φ(x) = P0 + P1x + P2x 2 + P3x 3 +K
![Page 7: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/7.jpg)
CSE182
Characteristic polynomial computation
• Suppose carbon was the only atom with an isotope C-13.
€
φ(x) = P0 + P1x + P2x 2 + P3x 3 +K
=NC
0
⎛
⎝ ⎜
⎞
⎠ ⎟p0,c
0 1− p0,c( )NC +
NC
1
⎛
⎝ ⎜
⎞
⎠ ⎟p0,c 1− p0,c( )
NC −1x
= (p0,c + p1,c x)NC
![Page 8: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/8.jpg)
CSE182
General isotope computation
• Definition:– Let pi,a be the abundance of the isotope with mass i Da
above the least mass– Ex: P0,C : abundance of C-12, P2,O: O-18 etc.
• Characteristic polynomial
• Prob{M+i}: coefficient of xi in (x) (a binomial convolution)
€
φ(x) = p0,a + p1,a x + p2,a x 2 +L( )a
∏Na
![Page 9: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/9.jpg)
CSE182
Isotopic Profile Application
• In DxMS, hydrogen atoms are exchanged with deuterium• The rate of exchange indicates how buried the peptide is (in
folded state)• Consider the observed characteristic polynomial of the isotope
profile t1, t2, at various time points. Then
• The estimates of p1,H can be obtained by a deconvolution
• Such estimates at various time points should give the rate of incorporation of Deuterium, and therefore, the accessibility.
€
φt2(x) = φt1(x)(p0,H + p1,H )N H
Not in Syllabus
![Page 10: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/10.jpg)
CSE182
Quiz
How can you determine the charge on a peptide?
Difference between the first and second isotope peak is 1/Z
Proposal: Given a mass, predict a composition, and the isotopic profile Do a ‘goodness of fit’ test to isolate the peaks corresponding to the isotope Compute the difference
![Page 11: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/11.jpg)
CSE182
Ion mass computations
• Amino-acids are linked into peptide chains, by forming peptide bonds
• Residue mass– Res.Mass(aa) =
Mol.Mass(aa)-18– (loss of water)
![Page 12: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/12.jpg)
CSE182
Peptide chains
• MolMass(SGFAL) = resM(S)+…res(L)+18
![Page 13: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/13.jpg)
CSE182
M/Z values for b/y-ions
• Singly charged b-ion = ResMass(prefix) + 1
• Singly charged y-ion= ResMass(suffix)+18+1
• What if the ions have higher units of charge?
RNH+
3-CH-CO-NH-CH-COOH R
RNH+
3-CH-CO-NH-CH-CO R
H+ RNH2-CH-CO-………-NH-CH-COOH R
Ionized Peptide
![Page 14: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/14.jpg)
CSE182
De novo interpretation
• Given a spectrum (a collection of b-y ions), compute the peptide that generated the spectrum.
• A database of peptides is not given!• Useful?
– Many genomes have not been sequenced– Tagging/filtering– PTMs
![Page 15: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/15.jpg)
CSE182
De Novo Interpretation: Example
S G E K0 88 145 274 402 b-ions
420 333 276 147 0 y-ions
b
y
y
2
100 500400300200
M/Z
b
1
1
2
Ion Offsetsb=P+1y=S+19=M-P+19
![Page 16: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/16.jpg)
CSE182
Computing possible prefixes
• We know the parent mass M=401.• Consider a mass value 88• Assume that it is a b-ion, or a y-ion• If b-ion, it corresponds to a prefix of the peptide with
residue mass 88-1 = 87.• If y-ion, y=M-P+19.
– Therefore the prefix has mass • P=M-y+19= 401-88+19=332
• Compute all possible Prefix Residue Masses (PRM) for all ions.
![Page 17: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/17.jpg)
CSE182
Putative Prefix Masses
Prefix Mass
M=401 b y88 87 332145 144 275147 146 273276 275 144
S G E K0 87 144 273 401
• Only a subset of the prefix masses are correct.
• The correct mass values form a ladder of amino-acid residues
![Page 18: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/18.jpg)
CSE182
Spectral Graph
• Each prefix residue mass (PRM) corresponds to a node.
• Two nodes are connected by an edge if the mass difference is a residue mass.
• A path in the graph is a de novo interpretation of the spectrum
87 144G
![Page 19: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/19.jpg)
CSE182
Spectral Graph
• Each peak, when assigned to a prefix/suffix ion type generates a unique prefix residue mass.
• Spectral graph: – Each node u defines a putative prefix residue M(u).– (u,v) in E if M(v)-M(u) is the residue mass of an a.a. (tag)
or 0.– Paths in the spectral graph correspond to a interpretation
300100
401
200
0
S G E K
27387 146144 275 332
![Page 20: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/20.jpg)
CSE182
Re-defining de novo interpretation
• Find a subset of nodes in spectral graph s.t.– 0, M are included– Each peak contributes at most one node (interpretation)(*)– Each adjacent pair (when sorted by mass) is connected by an
edge (valid residue mass)– An appropriate objective function (ex: the number of peaks
interpreted) is maximized
300100
401
200
0
S G E K
27387 146144 275 332
87 144G
![Page 21: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/21.jpg)
CSE182
Two problems
• Too many nodes.– Only a small fraction are correspond to b/y ions (leading to
true PRMs) (learning problem)• Multiple Interpretations
– Even if the b/y ions were correctly predicted, each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem).
– In general, the forbidden pairs problem is NP-hard
300100
401
200
0
S G E K
27387 146144 275 332
![Page 22: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/22.jpg)
CSE182
Too many nodes
• We will use other properties to decide if a peak is a b-y peak or not.
• For now, assume that (u) is a score function for a peak u being a b-y ion.
![Page 23: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/23.jpg)
CSE182
Multiple Interpretation
• Each peak generates multiple possibilities, only one of which is correct. We need to find a path that uses each peak only once (algorithmic problem).
• In general, the forbidden pairs problem is NP-hard
• However, The b,y ions have a special non-interleaving property
• Consider pairs (b1,y1), (b2,y2)– If (b1 < b2), then y1 > y2
![Page 24: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/24.jpg)
CSE182
Non-Intersecting Forbidden pairs
300100 4002000
S G E K• If we consider only b,y ions, ‘forbidden’ node pairs are non-
intersecting, • The de novo problem can be solved efficiently using a dynamic
programming technique.
87 332
![Page 25: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/25.jpg)
CSE182
The forbidden pairs method
• Sort the PRMs according to increasing mass values.• For each node u, f(u) represents the forbidden pair• Let m(u) denote the mass value of the PRM.• Let (u) denote the score of u• Objective: Find a path of maximum score with no
forbidden pairs.
300100 4002000 87 332
u f(u)
![Page 26: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/26.jpg)
CSE182
D.P. for forbidden pairs
• Consider all pairs u,v– m[u] <= M/2, m[v] >M/2
• Define S(u,v) as the best score of a forbidden pair path from – 0->u, and v->M
• Is it sufficient to compute S(u,v) for all u,v?
300100 4002000 87 332
u v
![Page 27: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/27.jpg)
CSE182
D.P. for forbidden pairs
• Note that the best interpretation is given by
€
max((u,v )∈E ) S(u,v)
300100 4002000 87 332
u v
![Page 28: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/28.jpg)
CSE182
End of L12
![Page 29: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/29.jpg)
CSE182
D.P. for forbidden pairs
• Note that we have one of two cases.1. Either u > f(v) (and f(u) < v)2. Or, u < f(v) (and f(u) > v)
• Case 1.– Extend u, do not touch f(v)
300100 4002000
uf(v) v
€
S(u,v) = max(u':(u',u)∈E
u'≠ f (v ))S(u',v) + δ(u')
![Page 30: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/30.jpg)
CSE182
The complete algorithm
for all u /*increasing mass values from 0 to M/2 */for all v /*decreasing mass values from M to M/2 */
if (u < f[v])
else if (u > f[v])
If (u,v)E /*maxI is the score of the best interpretation*/maxI = max {maxI,S[u,v]}
€
S[u,v] = max (w,u)∈E
w≠ f (v )
⎛
⎝ ⎜
⎞
⎠ ⎟S[w,v] + δ(w)
€
S[u,v] = max (v,w )∈E
w≠ f (u)
⎛
⎝ ⎜
⎞
⎠ ⎟S[u,w] + δ(w)
![Page 31: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/31.jpg)
CSE182
De Novo: Second issue
• Given only b,y ions, a forbidden pairs path will solve the problem.
• However, recall that there are MANY other ion types.– Typical length of peptide: 15– Typical # peaks? 50-150?– #b/y ions?– Most ions are “Other”
• a ions, neutral losses, isotopic peaks….
![Page 32: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/32.jpg)
CSE182
De novo: Weighting nodes in Spectrum Graph
• Factors determining if the ion is b or y– Intensity (A large fraction of the most intense peaks are b
or y)– Support ions– Isotopic peaks
![Page 33: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/33.jpg)
CSE182
De novo: Weighting nodes
• A probabilistic network to model support ions (Pepnovo)
![Page 34: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/34.jpg)
CSE182
De Novo Interpretation Summary
• The main challenge is to separate b/y ions from everything else (weighting nodes), and separating the prefix ions from the suffix ions (Forbidden Pairs).
• As always, the abstract idea must be supplemented with many details.
– Noise peaks, incomplete fragmentation– In reality, a PRM is first scored on its likelihood of being correct, and the
forbidden pair method is applied subsequently.
• In spite of these algorithms, de novo identification remains an error-prone process. When the peptide is in the database, db search is the method of choice.
![Page 35: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/35.jpg)
CSE182
The dynamic nature of the cell
• The proteome of the cell is changing
• Various extra-cellular, and other signals activate pathways of proteins.
• A key mechanism of protein activation is PT modification
• These pathways may lead to other genes being switched on or off
• Mass Spectrometry is key to probing the proteome
![Page 36: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/36.jpg)
CSE182
What happens to the spectrum upon modification?
• Consider the peptide MSTYER.
• Either S,T, or Y (one or more) can be phosphorylated
• Upon phosphorylation, the b-, and y-ions shift in a characteristic fashion. Can you determine where the modification has occurred?
11
654325432
If T is phosphorylated, b3, b4, b5, b6, and y4, y5, y6 will shift
![Page 37: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/37.jpg)
CSE182
Effect of PT modifications on identification
• The shifts do not affect de novo interpretation too much. Why?
• Database matching algorithms are affected, and must be changed.
• Given a candidate peptide, and a spectrum, can you identify the sites of modifications
![Page 38: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/38.jpg)
CSE182
Db matching in the presence of modifications
• Consider MSTYER• The number of modifications can be obtained by the difference
in parent mass.• If 1 phoshphorylation, we have 3 possibilities:
– MS*TYER– MST*YER– MSTY*ER
• Which of these is the best match to the spectrum?• If 2 phosphorylations occurred, we would have 6 possibilities.
Can you compute more efficiently?
![Page 39: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/39.jpg)
CSE182
Scoring spectra in the presence of modification
• Can we predict the sites of the modification?• A simple trick can let us predict the modification sites?• Consider the peptide ASTYER. The peptide may have 0,1, or 2
phosphorylation events. The difference of the parent mass will give us the number of phosphorylation events. Assume it is 1.
• Create a table with the number of b,y ions matched at each breakage point assuming 0, or 1 modifications
• Arrows determine the possible paths. Note that there are only 2 downward arrows. The max scoring path determines the phosphorylated residue
A S T Y E R
0
1
![Page 40: CSE182-L12](https://reader034.fdocuments.in/reader034/viewer/2022051821/568152d6550346895dc0f10f/html5/thumbnails/40.jpg)
CSE182
Modifications
• Modifications significantly increase the time of search.
• The algorithm speeds it up somewhat, but is still expensive