PEAKS: De Novo Sequencing using MS/MS spectra Bin Ma, U. Western Ontario, Canada Kaizhong Zhang,U....
Posted: 15-Jan-2016
PEAKS: De Novo Sequencing using MS/MS spectra
Bin Ma, U. Western Ontario, Canada
Kaizhong Zhang, U. Western Ontario, Canada
Chengzhi Liang, Bioinformatics Solutions Inc. Canada
Outline
• Background – Tandem Mass Spectrometry
• De novo sequencing – Problem Definition and Algorithm
• Software implementation – PEAKS
• Future work
Background
• Humans have about 100,000 different proteins. Because of post-translational modifications, each protein can exist in many different versions.
• Many diseases are closely related to abnormal proteins or to abnormal protein expression levels.
• Given a tissue, identifying the proteins (and their modified versions) in it is a fundamental problem for drug design.
Proteins and Peptides
• A protein is a sequence of amino acids of 20 different types.
– A protein is a string over an alphabet of size 20.
• A peptide is a substring of a protein.
• The 20 amino acids have 19 distinct masses.
– I and L have the same mass and cannot (or only with difficulty) be distinguished by MS/MS.
– Regard them as the same letter.
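To make the "19 distinct masses" statement concrete, here is a minimal Python sketch using standard monoisotopic residue masses (the mass table is general knowledge, not taken from the slides; `RESIDUE_MASS` and `distinct_masses` are illustrative names). It groups residues whose masses agree to four decimals.

```python
# Standard monoisotopic residue masses in Da (not taken from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def distinct_masses(masses, ndigits=4):
    """Group amino acids whose masses agree after rounding to `ndigits` decimals."""
    groups = {}
    for aa, mass in masses.items():
        groups.setdefault(round(mass, ndigits), []).append(aa)
    return groups


groups = distinct_masses(RESIDUE_MASS)
print(len(groups))                   # 19
print(groups[round(113.08406, 4)])   # ['L', 'I']
# At integer (nominal) resolution Q and K collide as well, leaving 18 masses.
print(len({round(m) for m in RESIDUE_MASS.values()}))  # 18
```

The last line is a side note worth keeping in mind for the later sketches, which use integer masses: at nominal resolution Q and K (both 128) also become indistinguishable.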
Tandem Mass Spectrometry
• MS/MS is the only reliable way for protein identification.
Figure: tissue → fraction → gel → protein (…VITK | GTDIMNEMR | SMW…) → peptide
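The slides do not say how the protein is cut into peptides. Assuming the common choice of trypsin (which cleaves after K or R unless the next residue is P), the cut points in the figure can be reproduced with a small sketch. `tryptic_digest` is a hypothetical helper, and its input simply joins the fragments visible in the figure.

```python
import re


def tryptic_digest(protein: str):
    """Hypothetical helper: cut after K or R unless the next residue is P."""
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if p]  # drop the empty piece left by a C-terminal K/R


print(tryptic_digest("VITKGTDIMNEMRSMW"))  # ['VITK', 'GTDIMNEMR', 'SMW']
```

Splitting on a zero-width pattern requires Python 3.7 or later.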
Figure: peptide sequence LGSSEVEQVQLVVDGVK → tandem mass spectrometer → MS/MS spectrum → de novo sequencing (or database search) → LGSSEVEQVQLVVDGVK
How Does a Peptide Fragment?
For a peptide A1A2A3A4:
$m(y_1) = 19 + m(A_4)$, $m(y_2) = 19 + m(A_4) + m(A_3)$, $m(y_3) = 19 + m(A_4) + m(A_3) + m(A_2)$
$m(b_1) = 1 + m(A_1)$, $m(b_2) = 1 + m(A_1) + m(A_2)$, $m(b_3) = 1 + m(A_1) + m(A_2) + m(A_3)$
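The two ion ladders follow directly from these formulas. A minimal sketch, assuming nominal (integer) residue masses and the offsets 1 (b-ions) and 19 (y-ions) used on the slide; `NOMINAL`, `b_ions` and `y_ions` are illustrative names.

```python
# Nominal residue masses (Da); I is folded into L as the slides suggest.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
           "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def b_ions(peptide):
    """b_i = 1 + m(A_1) + ... + m(A_i), for i = 1..n-1."""
    out, total = [], 1
    for aa in peptide[:-1]:
        total += NOMINAL[aa]
        out.append(total)
    return out


def y_ions(peptide):
    """y_i = 19 + m(A_n) + ... + m(A_{n-i+1}), for i = 1..n-1."""
    out, total = [], 19
    for aa in reversed(peptide[1:]):
        total += NOMINAL[aa]
        out.append(total)
    return out


print(b_ions("LGEV"))  # [114, 171, 300]
print(y_ions("LGEV"))  # [118, 247, 304]
```

Because each cleavage yields one b-ion and one y-ion, these offsets give b_i + y_{n-i} = m(P) + 20; for LGEV (m(P) = 398) every complementary pair sums to 418.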
Matching Sequence with Spectrum
De Novo Sequencing
• For any peptide P = a1…an, $m(P) = \sum_{i} m(a_i)$.
• De novo sequencing
– Given a spectrum and a mass value m, compute a sequence P such that m(P) = m and the matching score score(P) is maximized.
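The slides leave score(P) abstract. As one hypothetical choice for illustration only, the sketch below scores a candidate by the total intensity of peaks lying within a tolerance of one of its b- or y-ions (nominal masses, tolerance 0.5 Da assumed). The scoring PEAKS actually uses is richer, as the implementation notes later indicate.

```python
# Nominal residue masses (Da); I is folded into L as the slides suggest.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
           "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def peptide_mass(peptide):
    """m(P) = sum of the residue masses m(a_i)."""
    return sum(NOMINAL[aa] for aa in peptide)


def ion_masses(peptide):
    """b-ion (1 + prefix mass) and y-ion (19 + suffix mass) for every cleavage."""
    total, prefix, ions = peptide_mass(peptide), 0, []
    for aa in peptide[:-1]:
        prefix += NOMINAL[aa]
        ions.append(1 + prefix)             # b-ion of this prefix
        ions.append(19 + total - prefix)    # complementary y-ion
    return ions


def score(peptide, spectrum, tol=0.5):
    """Hypothetical score: summed intensity of peaks within tol of some b/y ion."""
    ions = ion_masses(peptide)
    return sum(intensity for mz, intensity in spectrum
               if any(abs(mz - ion) <= tol for ion in ions))


spectrum = [(114.0, 10.0), (171.0, 5.0), (304.0, 8.0), (250.0, 3.0)]
print(peptide_mass("LGEV"), score("LGEV", spectrum))  # 398 23.0
```

De novo sequencing then asks for the P with peptide_mass(P) == m that maximizes score(P, spectrum); enumerating all such P is exponential, which is what the dynamic programming below avoids.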
A Simpler Case – Only Y-ions
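Y-ions Determined by a Suffix
• Each y-ion (y1, y2, y3, …) is determined by a suffix (its mass is 19 plus the suffix mass), so a score score(Q) can be defined for a suffix Q:
$DP(u) = \max_{Q:\ m(Q) = u} score(Q)$
• For example, $score(LVR) = score(VR) + f(u)$ with u = m(LVR), where f(u) scores the y-ion of mass 19 + u. Hence
$DP(u) = \max_{a} DP(u - m(a)) + f(u)$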
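The recurrence can be run directly over integer masses. A minimal sketch, assuming nominal residue masses, a 0.5 Da tolerance, and a hypothetical f(u) that sums the intensity of peaks near the y-ion mass 19 + u (the slides do not define f). `LETTERS` and `y_only_denovo` are illustrative names.

```python
# One representative letter per nominal mass (I folded into L, Q folded into K).
LETTERS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "M": 131,
           "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def y_only_denovo(spectrum, m, tol=0.5):
    """Best sequence of mass m under DP(u) = max_a DP(u - m(a)) + f(u)."""

    def f(u):
        # Hypothetical f: intensity of peaks near the y-ion mass 19 + u.
        return sum(i for mz, i in spectrum if abs(mz - (19 + u)) <= tol)

    neg = float("-inf")
    dp = [neg] * (m + 1)        # dp[u] = best score of a suffix with mass u
    back = [None] * (m + 1)     # N-terminal letter of that best suffix
    dp[0] = 0.0
    for u in range(1, m + 1):
        fu = f(u)
        for aa, am in LETTERS.items():
            if am <= u and dp[u - am] > neg and dp[u - am] + fu > dp[u]:
                dp[u], back[u] = dp[u - am] + fu, aa
    if dp[m] == neg:
        return None, neg        # no sequence has mass exactly m
    # Backtracking: peel off the stored N-terminal letter until mass 0.
    seq, u = [], m
    while u > 0:
        seq.append(back[u])
        u -= LETTERS[back[u]]
    return "".join(seq), dp[m]


# y-ions of LGEV: 19 + m(V), 19 + m(EV), 19 + m(GEV).
print(y_only_denovo([(118, 1.0), (247, 1.0), (304, 1.0)], 398))  # ('LGEV', 3.0)
```

The table has m + 1 entries and each is filled by trying every letter, so the running time is O(m·|Σ|) instead of exponential.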
Counting Both y and b ions
Strategies
• Consider a prefix R and a suffix Q simultaneously.
• Consider only those pairs (R, Q) that satisfy a nice property, which we call “chummy”.
• Chummy pairs allow:
– the score of a chummy pair to be computed recursively from a smaller chummy pair;
– a series of chummy pairs to grow to the optimal solution.
Dynamic Programming
• Combining Lemmas A and B, we can compute
$DP(u, v) = \max_{(R, Q)\ \text{chummy},\ m(R) = u,\ m(Q) = v} score(R, Q)$
• Suppose (R, Q) is the pair maximizing DP(u, v) under the condition m(R) + m(Q) + m(a) = m. Then RaQ is the optimal peptide.
PEAKS – The Software
Comparison of PEAKS and Lutefisk
Red = Correct

| Sample | m/z | z | Correct sequence | PEAKS (de novo) | Comments | Lutefisk (de novo) |
|---|---|---|---|---|---|---|
| MALDI MS/MS, BSA | 927.4 | 1 | YLYEIAR | YLYEIAR | correct | [276.14]EY[184.08]R |
| MALDI MS/MS, BSA | 1439.7 | 1 | RHPEYAVSVLLR | GVLMVDVPPADNGR | Wrong (?) | No results |
| MALDI MS/MS, BSA | 1479.8 | 1 | LGEYGFQNALIVR | LWYGFQNALIVR | correct | No results |
| MALDI MS/MS, BSA | 1639.8 | 1 | KVPQVSTPTLVEVSR | RAPKVPQVSTPTLVEVSR | correct | No results |
| ESI MS/MS, Cyt-c | 482.7 | 2 | EDLIAYLK | EDLIAYLK | correct | [357.15]LAYLK |
| ESI MS/MS, Cyt-c | 584.8 | 2 | TGPNLHGLFGR | TGPNLHGLFGR | correct | TGPNLHGLFGR |
| ESI MS/MS, Cyt-c | 589.3 | 1 | GDVEK | VDVEK | V = Ac-G | VDVEK |
| ESI MS/MS, Cyt-c | 634.4 | 1 | IFVQK | IFVQK | correct | IFVQK |
| ESI MS/MS, Cyt-c | 678.3 | 1 | YIPGTK | YIPGTK | correct | YIPGTK |
| ESI MS/MS, Cyt-c | 728.8 | 2 | TGQAPGFSYTDANK | TGQAPGFSYTDANK | correct | [199.10]SAPGF[250.09]TWNK |
| ESI MS/MS, Cyt-c | 779.4 | 1 | MIFAGIK | MIFAGIK | correct | [244.12]FAGLK |
| ESI MS/MS, Cyt-c | 792.9 | 2 | KTGQAPGFSYTDAMK | KTGAGAPGFSYTDAMK | almost | [229.15]QGAPGAYQNHANK |
| ESI MS/MS, Cyt-c | 817.3 | 2 | IFVQKCAQCHTVEK | QFVTHMACCHTVEK | partial | [257.08][218.08][GP][260.08][HM]TVEK |
| Apo-Myoglobin | 662.3 | 1 | ASEDLK | ASEDLK | correct | [244.07]SALK |
| Apo-Myoglobin | 689.9 | 2 | HGTVVLTALGGILK | HGTVVLTALGGILK | correct | HGTVVLTALG[170.1]LK |
| Apo-Myoglobin | 748.4 | 1 | ALELFR | ALELFR | correct | [184.12]ELFR |
| Apo-Myoglobin | 803.9 | 2 | VEADIAGHGQEVLIR | LDADIAGHGQEVLIR | almost | No results |
| Apo-Myoglobin | 908.4 | 2 | GLSDGEWQQVLNVWGK | GLSDGEWQQVLNVWGK | correct | [170.11]SG[244.07]WQQVLNVWGK |
| Apo-Myoglobin | 943.2 | 2 | YLEFISDAIIHVLHSK | YLEFISDAIIHVLHSK | correct | [276.1]EFLSD[184.12]LHVLHSK |
Users
Implementation Particulars
• More accurate scoring:
– sum of the logarithmic intensities
– many other ion types
– coexisting ions, e.g., x2, y2, z2
• Deconvolution – converting multiply-charged peaks to singly-charged ones
• Recalibration – compressing/stretching the spectrum to correct calibration error
• Noise reduction
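The slides only name these steps. The sketch below illustrates three of them under stated assumptions: deconvolution uses the standard relation between an [M+zH]^z+ ion and its [M+H]+ counterpart (proton mass 1.007276 Da); recalibration is modeled as a hypothetical linear stretch/shift fitted to reference peaks by least squares; and the log-intensity score uses log1p so that weak peaks never contribute negatively (our choice). None of these function names come from PEAKS.

```python
import math

PROTON = 1.007276  # proton mass in Da


def to_singly_charged(mz, z):
    """Deconvolution: m/z of an [M+zH]^z+ peak -> m/z of the [M+H]+ ion."""
    return z * mz - (z - 1) * PROTON


def fit_linear_recalibration(observed, expected):
    """Hypothetical recalibration: least-squares fit of expected ~ a * observed + b.

    a > 1 stretches the m/z axis, a < 1 compresses it; b shifts it.
    """
    n = len(observed)
    mean_x, mean_y = sum(observed) / n, sum(expected) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, expected))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def log_intensity_score(intensities):
    """Sum of logarithmic intensities of matched peaks (log1p is our choice)."""
    return sum(math.log1p(i) for i in intensities)


print(round(to_singly_charged(500.5, 2), 6))  # 999.992724
a, b = fit_linear_recalibration([100.0, 200.0, 300.0], [100.1, 200.2, 300.3])
print(round(a, 6))  # 1.001
```

A fitted pair (a, b) would then be applied to every peak as a * mz + b before scoring; for z = 1, `to_singly_charged` returns the input unchanged.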
Acknowledgement
• Bin Ma, Kaizhong Zhang were supported by NSERC.
• Chengzhi Liang was supported by BSI.
• Thanks to the BSI development team for developing the software.
Tandem Mass Spectrometer
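Figure: precursor ions (e.g., MPSER, SG…, PAK+) pass through a mass analyzer and are fragmented (e.g., PAK+ into P and AK, or PA and K, with the charge on either fragment); the fragment ions pass through a second mass analyzer to the ion detector, and de novo sequencing reads the peptide from the resulting spectrum.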
Algorithm Sandwich
• DP(0, 0) = 0; DP(u, v) = $-\infty$ for (u, v) ≠ (0, 0);
• for u from 0 to m/2, for v from $\max(0,\ u - \max_a m(a))$ to $u + \max_a m(a)$, and for each a in Σ do
– if u < v then $DP(u + m(a), v) = \max\{DP(u + m(a), v),\ DP(u, v) + f(u + m(a), v)\}$
– else $DP(u, v + m(a)) = \max\{DP(u, v + m(a)),\ DP(u, v) + g(u, v + m(a))\}$
• find u, v, a such that u + v + m(a) = m and DP(u, v) is maximized;
• backtracking.
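The sandwich recurrence can be tried on a toy spectrum. This is a minimal sketch, not the PEAKS implementation, under several assumptions: nominal masses; hypothetical f and g that score each new cleavage by its b-ion and its complementary y-ion (possible because the precursor mass m is known), without the peak-overlap handling the real f(u, v) and g(u, v) presumably perform; states processed in order of increasing u + v instead of the slide's loop bounds; and the "lighter side" test made on ion masses (1 + u versus 19 + v), which is our reading of "if u < v" and is consistent with the chummy examples. `sandwich_denovo` is an illustrative name.

```python
from collections import defaultdict

# One representative letter per nominal mass (I folded into L, Q folded into K).
LETTERS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "M": 131,
           "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def sandwich_denovo(spectrum, m, tol=0.5):
    """Grow a prefix R and a suffix Q together; return (R + a + Q, score)."""

    def intensity(x):
        return sum(i for mz, i in spectrum if abs(mz - x) <= tol)

    # Hypothetical f, g: score the new cleavage by its b-ion and its y-ion.
    def f(u, v):
        return intensity(1 + u) + intensity(19 + m - u)

    def g(u, v):
        return intensity(19 + v) + intensity(1 + m - v)

    # layers[s] maps a state (u, v) with u + v == s to (DP value, back-pointer).
    layers = defaultdict(dict)
    layers[0][(0, 0)] = (0.0, None)
    best_score, best_end = float("-inf"), None
    for s in range(m + 1):                  # increasing u + v: predecessors are final
        for (u, v), (score, _) in layers[s].items():
            for aa, am in LETTERS.items():
                if s + am == m:             # aa is the middle letter of R aa Q
                    if score > best_score:
                        best_score, best_end = score, (u, v, aa)
                    continue
                if s + am > m:
                    continue
                if 1 + u < 19 + v:          # b-ion side is lighter: extend R
                    nxt, gain = (u + am, v), f(u + am, v)
                else:                       # otherwise extend Q on its left
                    nxt, gain = (u, v + am), g(u, v + am)
                layer = layers[s + am]
                if nxt not in layer or score + gain > layer[nxt][0]:
                    layer[nxt] = (score + gain, (u, v, aa))
    if best_end is None:
        return None, best_score
    # Backtrack to (0, 0), collecting prefix and suffix letters separately.
    u, v, middle = best_end
    prefix, suffix = [], []
    while (u, v) != (0, 0):
        pu, pv, aa = layers[u + v][(u, v)][1]
        (prefix if pu != u else suffix).append(aa)
        u, v = pu, pv
    return "".join(reversed(prefix)) + middle + "".join(suffix), best_score


spectrum = [(mz, 1.0) for mz in (114, 171, 300, 118, 247, 304)]  # b/y ions of LGEV
print(sandwich_denovo(spectrum, 398))  # ('LGEV', 6.0)
```

Because the next move depends only on (u, v), each peptide corresponds to exactly one path of states, and every cleavage site is scored once, either when the prefix grows or when the suffix grows. Unlike the y-only version, both ion types now contribute.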
Dynamic Programming
1. for u from 0 to m: $DP(u) = \max_{a} DP(u - m(a)) + f(u)$
2. backtracking
Dynamic Programming
$DP(u, v) = \max_{R \text{ is a prefix},\ Q \text{ is a suffix},\ m(R) = u,\ m(Q) = v} score(R, Q)$
• We hope that DP(u, v) with u + v = m gives the optimal prefix and suffix.
• The optimal solution can then be obtained by concatenating the prefix and the suffix.
Chummy Pairs
• Two strings Ra and bQ (a and b are letters) are called a chummy pair iff either of the following is true:
(C1) $1 + m(R) < 19 + m(bQ) \le 1 + m(Ra)$
(C2) $19 + m(Q) \le 1 + m(Ra) < 19 + m(bQ)$
• Examples: (LGE, LVR) satisfies (C2); (LGE, VR) and (LGE, R) satisfy (C1); (LG, VR) is not chummy.
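The two conditions can be checked mechanically. A minimal sketch, assuming nominal masses and this strictness of the inequalities (our reconstruction, since only the order of the terms survives): C1 is 1 + m(R) < 19 + m(bQ) ≤ 1 + m(Ra), and C2 is 19 + m(Q) ≤ 1 + m(Ra) < 19 + m(bQ). `chummy` is an illustrative name; it reproduces the four examples on the slide.

```python
# Nominal residue masses (Da); I is folded into L as the slides suggest.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
           "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def mass(s):
    """Nominal mass of a string of residues (0 for the empty string)."""
    return sum(NOMINAL[c] for c in s)


def chummy(ra, bq):
    """Return 'C1', 'C2' or None for the non-empty strings Ra and bQ."""
    b_prev, b_last = 1 + mass(ra[:-1]), 1 + mass(ra)   # b-ion masses of R and Ra
    y_prev, y_last = 19 + mass(bq[1:]), 19 + mass(bq)  # y-ion masses of Q and bQ
    if b_prev < y_last <= b_last:
        return "C1"
    if y_prev <= b_last < y_last:
        return "C2"
    return None


for pair in [("LGE", "LVR"), ("LGE", "VR"), ("LGE", "R"), ("LG", "VR")]:
    print(pair, chummy(*pair))
# ('LGE', 'LVR') C2
# ('LGE', 'VR') C1
# ('LGE', 'R') C1
# ('LG', 'VR') None
```

Read as ion masses, both conditions say the same thing: the heaviest ion of one string falls between the two heaviest ions of the other, so the b- and y-ion ladders interleave.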
Chummy pairs
• Lemma A – Suppose Ra and bQ are a chummy pair, with u = m(Ra) and v = m(bQ).
– If (C1) is true, $score(Ra, bQ) = score(R, bQ) + f(u, v)$.
– If (C2) is true, $score(Ra, bQ) = score(Ra, Q) + g(u, v)$.
Chummy Pairs
• Lemma B – Let P be the optimal solution. Then there is a chummy pair (R, Q) and a letter a such that P = RaQ. Also, there is a chummy pair series $(R_1, Q_1), \ldots, (R_n, Q_n) = (R, Q)$.
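Lemma B can be illustrated constructively. The sketch below is our own construction, not necessarily the paper's proof. It peels letters off both ends of a peptide, always extending the side whose latest ion is lighter (1 + m(R) versus 19 + m(Q), with ties going to the suffix), and checks that every intermediate pair is chummy under the same assumed inequalities: C1 is 1 + m(R) < 19 + m(bQ) ≤ 1 + m(Ra), and C2 is 19 + m(Q) ≤ 1 + m(Ra) < 19 + m(bQ). `chummy_series` is an illustrative name.

```python
# Nominal residue masses (Da); I is folded into L as the slides suggest.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
           "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}


def mass(s):
    return sum(NOMINAL[c] for c in s)


def chummy(ra, bq):
    """(C1) or (C2) for non-empty Ra and bQ, with the assumed strictness."""
    b_prev, b_last = 1 + mass(ra[:-1]), 1 + mass(ra)
    y_prev, y_last = 19 + mass(bq[1:]), 19 + mass(bq)
    return b_prev < y_last <= b_last or y_prev <= b_last < y_last


def chummy_series(peptide):
    """Grow R from the left and Q from the right, extending the lighter ion side.

    Returns the pairs (R_i, Q_i) with both parts non-empty; exactly one letter
    (the middle letter a of P = RaQ) is left over at the end.
    """
    i, j = 0, len(peptide)            # R = peptide[:i], Q = peptide[j:]
    series = []
    while j - i > 1:
        if 1 + mass(peptide[:i]) < 19 + mass(peptide[j:]):
            i += 1                    # extend the prefix R by one letter
        else:
            j -= 1                    # extend the suffix Q by one letter
        if i > 0 and j < len(peptide):
            series.append((peptide[:i], peptide[j:]))
    return series


series = chummy_series("LGSSEVEQVQLVVDGVK")
print(all(chummy(r, q) for r, q in series))   # True
print(len(series[-1][0]) + len(series[-1][1]))  # 16: one letter is left as a
```

Each pair in the series adds one letter to its predecessor, so by Lemma A its score follows from the previous pair's score. This is what lets the DP over (u, v) reach the optimal RaQ.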