Tag-based Blind Identification of PTMs with Point Process Model 1 Chunmei Liu, 2 Bo Yan, 1 Yinglei...


Tag-based Blind Identification of PTMs with Point Process Model

Chunmei Liu (1), Bo Yan (2), Yinglei Song (1), Ying Xu (2), Liming Cai (1)

(1) Dept. of Computer Science, (2) Dept. of Biochemistry and Molecular Biology, University of Georgia, USA


Tandem mass spectra of peptides

e.g., MS/MS of GLSDGEWQQVLNVWGK (www.ionsource.com)


Tandem mass spectra of peptides

(a tutorial from www.cmb.usc.edu)

$Mass_{b_1} + Mass_{y_8} = Mass_{total}$
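The complementarity of b- and y-ions can be checked on a toy peptide. The following is a minimal sketch, assuming nominal integer residue masses and ignoring the constant offsets (water, proton) that real b- and y-ions carry. The helper name `ion_ladders` and the peptide AMRL are illustrative choices, not part of the tutorial cited on the slide.

```python
# Nominal (integer) residue masses in Da; a simplification for illustration.
RESIDUE_MASS = {"A": 71, "M": 131, "R": 156, "L": 113}


def ion_ladders(peptide):
    """Return (b, y, total) where b[i-1] is the mass of the first i residues
    and y[i-1] is the mass of the remaining n-i residues."""
    masses = [RESIDUE_MASS[r] for r in peptide]
    total = sum(masses)
    b = [sum(masses[:i]) for i in range(1, len(masses))]
    y = [sum(masses[i:]) for i in range(1, len(masses))]
    return b, y, total


b, y, total = ion_ladders("AMRL")
# Each prefix mass plus the complementary suffix mass gives the total mass.
for b_mass, y_mass in zip(b, y):
    print(b_mass, "+", y_mass, "=", b_mass + y_mass)
```

For a peptide of 9 residues the same relation pairs b1 with y8, as on the slide.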


Peptide sequencing

De novo sequencing: directly infer the target peptide from its MS data

  [Fernandez et al 1992; Dancik et al 1999; Chen et al 2001; Searle et al 2001; Ma et al 2003; Liu et al 2006]

  sensitive to MS data quality; noise and missing peaks make it difficult

DB search based: compare the target MS with theoretical MS in a peptide database

  [Eng et al 1994; Perkins et al 1999]

  slow; the target may not be in the database, or may have been modified after translation


Post-translational modification (PTM)


Identifying PTMs in peptide sequencing

Assume a limited set of modification types and model them as pseudo amino acids [Yates et al 1995; Wilkins et al 1999; Tanner et al 2005]

  regular sequencing tools can apply

  may erroneously process PTMs of unknown types

Blind identification (unlimited modification types)

  spectral alignment based (difficult) [Pevzner et al 2000; Tsur et al 2005; Yan et al 2006]

  de novo sequencing dependent [Han et al 2005]


This work

DB search based PTM identification (point process model, Yan et al 2006)

  Yan et al 2006 is comparable to Tsur et al 2005

  Both are the best blind PTM identification programs

  Yan et al 2006: faster, hits of homologs

Peptide tag-based filtering of database

Graph-theoretic approach to generate tags


Our approach details

Input: an experimental spectrum

Output: a peptide sequence and possible PTMs

Steps:

  1. Construct an extended spectrum graph, find all maximum weighted anti-symmetric paths, and take these paths as tags

  2. Construct a DFA from the tags and use it to filter the peptide database to obtain candidate peptides

  3. Apply the point process model to the candidates to identify the peptide and potential PTMs by maximizing the spectral alignment score


Spectrum graph

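[Figure: a tandem mass spectrum (intensity vs. m/z) whose peaks 1, 2, 3, ..., 2n-1, 2n are b- or y-ions, and the spectrum graph built from it, running from a source vertex through the peak vertices to a sink vertex]

Two vertices are connected by a directed edge if their mass difference is the mass of a single amino acid.

De novo sequencing corresponds to finding a longest directed anti-symmetric path from source to sink [Dancik et al 1999, etc.]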


Extended spectrum graph [Liu et al 2006]

Assume an MS/MS spectrum S of a peptide P is a set of mass peaks $\{x_1, x_2, \ldots, x_{2k}\}$, with $x_j > x_i$ if $j > i$; the parent mass is M.

[Figure: the vertices $0, x_1, \ldots, x_i, \ldots, x_{2k-i+1}, \ldots, x_{2k-1}, x_{2k}, M$ laid out in order of mass]

If $x_j - x_i$ is the mass of a single amino acid, connect the corresponding vertices with a directed edge.

Connect each pair of complementary vertices $x_i$ and $x_{2k-i+1}$ with a non-directed edge.


Extended spectrum graph

[Figure: (a) a spectrum (intensity vs. mass/charge, axis from 0 to 471) with peaks at 71, 113, 202, 269, 358 and 400; (b) the extended spectrum graph on these peaks, with edges labeled A, M, R, L]

parent mass = 471

Peptide: AMRL/LRMA
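The example can be reproduced in a few lines. This is a minimal sketch, assuming nominal integer residue masses (A = 71, M = 131, R = 156, L = 113, which are consistent with the peaks on the slide) and exact mass matching. The function names are hypothetical, and the path search is a brute-force depth-first enumeration, not the tree decomposition based dynamic programming used in this work.

```python
RESIDUE_MASS = {"A": 71, "M": 131, "R": 156, "L": 113}  # nominal masses (assumption)


def extended_spectrum_graph(peaks, parent_mass, residue_mass=RESIDUE_MASS):
    """Directed edges join vertices whose mass difference is one residue;
    non-directed edges join complementary peaks x_i and x_(2k-i+1)."""
    xs = sorted(peaks)
    nodes = [0] + xs + [parent_mass]
    mass_to_residue = {m: r for r, m in residue_mass.items()}
    directed = [(u, v, mass_to_residue[v - u])
                for i, u in enumerate(nodes) for v in nodes[i + 1:]
                if v - u in mass_to_residue]
    undirected = [(xs[i], xs[-1 - i]) for i in range(len(xs) // 2)]
    return directed, undirected


def antisymmetric_paths(directed, undirected, source, sink):
    """Enumerate source-to-sink paths using at most one vertex of each
    complementary pair; return the residue string spelled by each path."""
    succ = {}
    for u, v, residue in directed:
        succ.setdefault(u, []).append((v, residue))
    partner = {}
    for a, b in undirected:
        partner[a], partner[b] = b, a
    results = []

    def walk(node, used, label):
        if node == sink:
            results.append(label)
            return
        for v, residue in succ.get(node, []):
            if partner.get(v) in used:
                continue  # the complementary vertex is already on the path
            walk(v, used | {v}, label + residue)

    walk(source, frozenset(), "")
    return results


directed, undirected = extended_spectrum_graph([71, 113, 202, 269, 358, 400], 471)
print(undirected)  # [(71, 400), (113, 358), (202, 269)]
print(antisymmetric_paths(directed, undirected, 0, 471))  # ['AMRL', 'LRMA']
```

Each complementary pair sums to the parent mass 471, and the two anti-symmetric paths spell the peptide read in either direction.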


Tag selection for the target peptide

Tag: a short sequence of amino acids

Previous work: PepNovo [Frank et al 2005]

apply de novo sequencing algorithms first, and identify tags from the sequenced peptide

Advantage: effective

Disadvantages: the presence of noise, missing peaks, and PTMs makes it hard to improve the effectiveness; slow


Tag selection for the target peptide

In this work:

construct an extended spectrum graph (mixed graph) from the target spectrum

tree-decompose the graph

use dynamic programming to find all maximum weighted anti-symmetric paths

advantages: fast and effective, tolerating noise and missing peaks


Graph Tree Decomposition

[Figure: a graph on the vertices a to h and a tree decomposition of it; each node of the tree is a bag of graph vertices]


Properties of tree decomposition

1. Each vertex is contained in at least one bag


2. For any edge {g, f}: there is a bag containing both g and f


3. For every vertex c: the bags that contain c form a connected subtree


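Tree Width

Tree width of a tree decomposition: $\max_{i \in I} |bag_i| - 1$

Tree width of a graph: minimum width over all tree decompositions of the graph

[Figure: two tree decompositions of the same graph; one whose bags hold at most three vertices (tree width = 2), and one with the two bags {a, b, c, d, e} and {a, c, f, g, h} (tree width = 4)]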
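The three properties and the width definition can be checked mechanically. The following is a minimal sketch on a small hypothetical graph (a 4-cycle, not the graph in the slide figures); the function names are illustrative, and `tree_edges` is assumed to already describe a tree over the bag identifiers.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """bags: {bag_id: set of graph vertices}; tree_edges: pairs of bag ids."""
    # Property 1: each vertex is contained in at least one bag.
    if not all(any(v in bag for bag in bags.values()) for v in vertices):
        return False
    # Property 2: for every edge there is a bag containing both endpoints.
    if not all(any(u in bag and v in bag for bag in bags.values())
               for u, v in edges):
        return False
    # Property 3: the bags containing a vertex form a connected subtree.
    adjacent = {bag_id: set() for bag_id in bags}
    for a, b in tree_edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    for v in vertices:
        holders = {bag_id for bag_id, bag in bags.items() if v in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for neighbor in adjacent[current] & holders:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if seen != holders:
            return False
    return True


def width(bags):
    """Width of a tree decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# Hypothetical example: the cycle a-b-c-d-a.
vertices = "abcd"
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
good = {1: {"a", "b", "c"}, 2: {"a", "c", "d"}}
bad = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "a"}}  # 'a' skips bag 2

print(is_tree_decomposition(vertices, edges, good, [(1, 2)]), width(good))
print(is_tree_decomposition(vertices, edges, bad, [(1, 2), (2, 3)]))
```

The second decomposition covers every vertex and edge but fails the third property, because the bags holding `a` are not connected in the tree.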


Tree bags are separators

Internal tree bags in a tree decomposition are separators of the graph

This allows efficient dynamic programming


Dynamic programming

A table is maintained for each bag

Compute tables bottom up

Each table contains partial optimal solutions; the root table contains the optimal one

Time complexity: $O(6^t n^2)$

[Figure: a tree decomposition with bags numbered 1 to 6]


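Dynamic programming (cont'd)

[Figure: a bag $X_i$ = {a, b, c} with child bags $X_j$ = {a, d, b} and $X_k$ = {b, e, c}. The table for $X_i$ has columns a, b, c, V, L; the tables for $X_j$ and $X_k$ have columns a, d, b, $c_1$, $c_2$, V, L and b, e, c, $c_1$, $c_2$, V, L. Tables are filled bottom-up.]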


Score scheme and reliability of sequence tags

Assign the score scheme [Dancik et al 1999] as weights to the edges in spectrum graphs

Overall reliability of a tag $t_i$ = $w_1 r_1(t_i) + w_2 r_2(t_i)$

  $r_1(t_i)$: reliability computed from the normalized weights of $t_i$'s edges

  $r_2(t_i)$: reliability computed from the autocorrelation score [Liu et al 2005]

Refer to the paper for details
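As an illustration of how the combined reliability could be used to rank candidate tags, here is a minimal sketch. The weights, the reliability values and the function name `rank_tags` are hypothetical; the slides do not give the values of $w_1$ and $w_2$ or how $r_1$ and $r_2$ are computed.

```python
def rank_tags(candidates, w1=0.5, w2=0.5):
    """candidates: iterable of (tag, r1, r2) triples.
    Returns the tags ordered by w1*r1 + w2*r2, most reliable first."""
    scored = [(w1 * r1 + w2 * r2, tag) for tag, r1, r2 in candidates]
    return [tag for _, tag in sorted(scored, reverse=True)]


# Hypothetical reliabilities for three candidate tags.
candidates = [("AMR", 0.9, 0.5), ("MRL", 0.6, 0.6), ("LRM", 0.2, 0.9)]
print(rank_tags(candidates))                 # equal weights
print(rank_tags(candidates, w1=0.0, w2=1.0)) # autocorrelation score only
```

Changing the weights changes which tags reach the top of the list, which is why the choice of $w_1$ and $w_2$ matters for the Top-k results.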


PTM identification with point process blind search

Find a set of PTMs to maximize the spectral alignment

Can identify all possible PTMs through one round of cross-correlation calculation

Computation time is independent of the number of PTMs


PTM identification with point process blind search

Treat a spectrum and the theoretical spectrum of a candidate peptide as a point process:

$$x(t) = \sum_{i=1}^{N} \delta(t - t_i)$$

where $\{t_i\}$ is a set of mass locations with N peaks, and $\delta$ is the Kronecker delta function.

Assume there are K PTMs; then $\{t_i\}$ can be clustered into K+1 groups:

$$x(t) = \sum_{k=0}^{K} x_k(t) = \sum_{k=0}^{K} \sum_{i=1}^{N_k} \delta(t - t_i^{(k)})$$


PTM identification with point process blind search

When a PTM happens, a shift occurs to $x_k(t)$ to produce $y_k(t)$:

$$y_k(t) = x_k(t + \Delta_k) = \sum_{i=1}^{N_k} \delta(t + \Delta_k - t_i^{(k)})$$

Use $C[\cdot]$ to denote the total number of non-zero values in a point process:

$$C[x_k(t)] = N_k, \qquad C[x(t)] = \sum_{k=0}^{K} N_k$$

$$c_{xy}(\tau) \equiv C[x(t)\, y(t - \tau)]$$


PTM identification with point process blind search

For K = 1, $\Delta$ represents the mass of a possible PTM. We report the top candidate with a $\Delta$ satisfying $|PW_{exp} - PW - \Delta| \le$ tolerance and with the maximum $c_{xy}(0) + c_{xy}(\Delta)$.
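The single-PTM case can be illustrated on toy data. This is a minimal sketch, assuming integer peak locations and exact coincidence (real spectra would need a mass tolerance), and leaving out the parent mass constraint. The function names and the peak lists are hypothetical. One pass over all peak pairs yields $c_{xy}(\tau)$ for every shift $\tau$ at once.

```python
from collections import Counter


def cross_correlation(x_peaks, y_peaks):
    """Return {tau: c_xy(tau)}: the number of locations t where x has a peak
    at t and y has a peak at t - tau."""
    return Counter(tx - ty for tx in set(x_peaks) for ty in set(y_peaks))


def best_single_shift(x_peaks, y_peaks):
    """Pick the non-zero shift delta maximizing c_xy(0) + c_xy(delta).
    Ties are broken toward the smaller |delta| (an arbitrary choice here)."""
    c = cross_correlation(x_peaks, y_peaks)
    candidates = [tau for tau in c if tau != 0]
    if not candidates:
        return None, c[0]
    delta = max(candidates, key=lambda tau: (c[tau], -abs(tau)))
    return delta, c[0] + c[delta]


# Toy data: y_0 = x_0 for the peaks {57, 128}; the group {321, 468, 555}
# of x follows y_1(t) = x_1(t + 80), so its peaks in y sit 80 lower.
x = [57, 128, 321, 468, 555]
y = [57, 128, 241, 388, 475]
print(best_single_shift(x, y))  # (80, 5)
```

The two unshifted peaks are counted by $c_{xy}(0)$ and the three shifted peaks by $c_{xy}(80)$, so all five peaks are explained by a single shift of 80.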


Evaluations

Datasets

  2657 annotated yeast ion trap tandem mass spectra from OPD (Prince et al, 2004), having relatively low mass resolutions

  2620 modified spectra, each with one artificially added PTM (Yan et al, 2006)

Experiments

  Sequence tag generation

  Database search via DFA based model

  Blind PTM identification


Performance in tag selection

Spectra    Tag length  Algorithm  Top 1  Top 3  Top 5  Top 10  Top 25  Time(s)
w/o PTM    3           Ours       75.8   89.1   94.6   96.9    98.1    0.33
w/o PTM    3           PepNovo    75.8   90.1   94.6   96.8    98.8    3.62
w/o PTM    4           Ours       65.3   80.5   88.7   93.6    96.4    0.34
w/o PTM    4           PepNovo    65.5   81.0   86.6   92.3    95.3    3.69
w/o PTM    5           Ours       56.4   72.8   78.3   85.1    89.8    0.33
w/o PTM    5           PepNovo    58.4   71.3   77.6   84.0    88.9    3.83
w/o PTM    6           Ours       50.2   62.3   66.9   76.6    82.4    0.34
w/o PTM    6           PepNovo    49.7   61.5   67.8   75.0    81.8    4.27
with PTM   3           Ours       68.1   84.8   90.3   94.8    97.1    0.32
with PTM   3           PepNovo    62.8   83.7   89.7   84.9    97.8    3.59
with PTM   4           Ours       53.5   71.2   78.6   84.8    90.0    0.32
with PTM   4           PepNovo    51.1   71.7   79.3   85.8    91.4    3.64

Columns: percentages of spectra that have at least one correct tag in top 1, 3, 5, 10, 25. Comparisons based on the sequencing results by SEQUEST [Eng et al 1994]


Performance in tag selection (cont’d)

Time complexity of the tag selection depends on the tree width t of spectrum graphs: $O(6^t n^2)$

About 90% of such graphs have tree width not exceeding 6

More than 10 times faster than PepNovo [Frank et al 2005]


Database search for PTM identification

Construct a DFA from the selected sequence tags and use it to filter a peptide database

Only a small portion of the peptides remains

The point process model is then applied to the remaining peptides to identify the peptide and potential PTMs
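The filtering step can be sketched with an Aho-Corasick style automaton, which is one common way to match a set of tags in a single pass over each database peptide. This is an assumption for illustration; the slides do not specify how the DFA is constructed. The function names, tags and peptides below are hypothetical.

```python
from collections import deque


def build_automaton(tags):
    """Build goto tables, failure links and accepting flags for a tag set."""
    goto, accept = [{}], [False]
    for tag in tags:
        state = 0
        for ch in tag:
            if ch not in goto[state]:
                goto.append({})
                accept.append(False)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        accept[state] = True
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also accepts if its longest proper suffix state does.
            accept[nxt] = accept[nxt] or accept[fail[nxt]]
            queue.append(nxt)
    return goto, fail, accept


def contains_tag(peptide, automaton):
    """True if any tag occurs as a substring of the peptide."""
    goto, fail, accept = automaton
    state = 0
    for ch in peptide:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if accept[state]:
            return True
    return False


def filter_database(peptides, tags):
    """Keep only the peptides that contain at least one tag."""
    automaton = build_automaton(tags)
    return [p for p in peptides if contains_tag(p, automaton)]


database = ["GAMRLK", "GLSDGEWQQVLNVWGK", "KLRMAG"]
print(filter_database(database, ["AMR", "MRL"]))  # ['GAMRLK']
```

Each peptide is scanned once regardless of the number of tags, which is what makes filtering cheap compared with running the point process model on the whole database.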


Performance in PTM identification

Tag length      Top 1  Top 2  Top 3  Top 4  Top 5  Filtration ratio  T(s)
3               76.69  86.01  89.29  90.70  91.62  0.0167            263
4               74.98  80.77  81.71  82.17  84.40  0.0014            34
W/O filtration  60.38  72.33  76.64  79.16  81.17  -                 3843

Columns: cumulative percentages of search results capturing the target peptides exactly in Top i; T is the total time for all 2620 experimental spectra. Comparisons with Yan et al 2006, which does not employ filtration.


Summary

A new graph-theoretic approach for peptide tag selection

  effective and efficient

Combined with the point process model to sequence peptides and identify PTMs

  effective and efficient

More tests are needed (e.g. two PTMs)

Tree decomposition based approaches have not been fully exploited (e.g., improving tag selection effectiveness)


Acknowledgement

CS@UGA: Chunmei Liu, Yinglei Song

BMB@UGA: Bo Yan, Ying Xu

NSF, NIH