Laboratorio3 - Modalità...


Page 1: Laboratorio3 - Modalità compatibilità · puccini.chimica.uniba.it/.../Esercitazioni/Laboratorio3.pdf · 2019-06-07 · Microsoft PowerPoint - Laboratorio3 - Modalità compatibilità

Identification of proteins from PMF data sets: Protein Prospector MS-Fit

http://prospector.ucsf.edu/prospector/mshome.htm


Identification of proteins from PMF data sets: MASCOT PMF

http://www.matrixscience.com/search_form_select.html


MASCOT assigns the best protein sequence for a PMF on a probabilistic basis, giving a score to each candidate sequence:


A score is calculated for each candidate protein using the MOWSE (MOlecular Weight SEarch, 1993) algorithm:

Frequency factors matrix

Columns: protein MW classes (10 kDa-spaced)
Rows: peptide m/z ratio classes (100 Da-spaced)

        1          2          ...    j
1       f_{1,1}    f_{1,2}    ...    f_{1,j}
2       f_{2,1}    f_{2,2}    ...    f_{2,j}
...
i       f_{i,1}    f_{i,2}    ...    f_{i,j}

Proteins in the database are digested in silico one after another, and each experimental m/z ratio is compared with those of the virtual hydrolytic peptides. Whenever a match is observed, the corresponding factor m_{i,j} is included in the product Π_n m_{i,j}, where n is the number of matching peptides.

Once the scores for all the candidate proteins have been calculated, the software evaluates their statistical distribution.
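The scoring steps above can be sketched as follows. This is only an illustration under stated assumptions: the normalization of each MW column to its largest frequency and the final form 50000 / (product × protein MW) are taken from the commonly cited description of the original MOWSE algorithm, not from these slides, and they do not represent MASCOT's probabilistic implementation. All function names and example masses are hypothetical.

```python
from collections import defaultdict
from math import prod

PEPTIDE_BIN = 100.0      # width of a peptide mass class, Da
PROTEIN_BIN = 10_000.0   # width of a protein MW class, Da


def build_factor_matrix(digests):
    """digests: list of (protein_mw, [peptide masses]) from in silico digestion.

    Returns factors[j][i]: frequency of peptide class i within protein MW
    class j, normalized to the largest frequency of that column (assumption).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for protein_mw, peptides in digests:
        j = int(protein_mw // PROTEIN_BIN)
        for mass in peptides:
            counts[j][int(mass // PEPTIDE_BIN)] += 1
    factors = {}
    for j, column in counts.items():
        total = sum(column.values())
        freqs = {i: c / total for i, c in column.items()}
        largest = max(freqs.values())
        factors[j] = {i: f / largest for i, f in freqs.items()}
    return factors


def mowse_like_score(protein_mw, peptides, experimental, factors, tol=0.5):
    """Multiply the factors of all matching peptides, then invert (assumed form)."""
    j = int(protein_mw // PROTEIN_BIN)
    matched = [m for m in peptides
               if any(abs(m - e) <= tol for e in experimental)]
    if not matched:
        return 0.0
    product = prod(factors[j][int(m // PEPTIDE_BIN)] for m in matched)
    return 50_000.0 / (product * protein_mw)


# Hypothetical two-protein database
digests = [(15_000.0, [850.4, 860.1, 1200.6]), (12_000.0, [855.0, 1500.7])]
factors = build_factor_matrix(digests)
rare = mowse_like_score(12_000.0, [855.0, 1500.7], [1500.7], factors)
common = mowse_like_score(12_000.0, [855.0, 1500.7], [855.0], factors)
print(rare, common)
```

In this toy database a match in a sparsely populated peptide class contributes a small factor and therefore raises the score more than a match in a crowded class.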


A typical MASCOT PMF output shows the evaluated distribution:


A specific entry can be considered significant if the probability that it occurs by chance is lower than a user-defined threshold. MASCOT users usually adopt a 5% threshold.

The green region in the distribution plot encloses a 95% probability, so only an entry located outside that region can be considered significant under this criterion.

When less stringent tolerance values are adopted, many entries can reach relatively high scores, which changes the distribution (note that the highest score changes too):

Mass tolerance from 0.1 to 1 u

Mass tolerance from 1 to 2.5 u

The best score is progressively lowered and finally falls below the threshold, at which point it is no longer statistically significant.
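A small sketch of why a wider mass tolerance inflates the scores of unrelated entries: with a looser window, peptides of a protein that is not in the sample start to "match" experimental values by chance. The mass lists below are hypothetical and serve only to show the counting.

```python
def count_matches(experimental, theoretical, tolerance):
    """Number of experimental masses with at least one theoretical mass
    within +/- tolerance (in u)."""
    return sum(
        any(abs(e - t) <= tolerance for t in theoretical)
        for e in experimental
    )


# Hypothetical experimental PMF values and peptides of an unrelated protein
experimental = [842.51, 1045.56, 1179.60, 1475.79, 2211.10]
unrelated = [841.90, 1047.30, 1180.40, 1473.60, 2209.00]

for tol in (0.1, 1.0, 2.5):
    print(tol, count_matches(experimental, unrelated, tol))
```

The number of chance matches grows with the tolerance, so more database entries collect high scores and the top entry stands out less from the distribution.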


Identification of peptides from MS/MS data sets: Protein Prospector MS-Tag

Tolerances on precursor (parent) and product (frag) ions m/z ratios


Restricting the database search to a specific taxonomy, if known, significantly reduces the number of candidate peptides.


High taxonomic specificity


The analyst can also select the types of instrument ions, i.e., the product ions that can reasonably be used to explain peptide MS/MS spectra, in accordance with the fragmentation patterns typically observed with his/her instrument.

Relative abundances of product ions can be entered in the Data Paste Area, along with their m/z values and, if different from 1, their charge states. Abundances are essential for ranking the candidate sequences by Matched Intensity.


Candidate peptides are listed by MS-Tag according to the number of experimental m/z ratios that do not match the predicted ones.


Detailed results available in the MS-Tag output allow a check of the spectral coverage (matched intensity) provided by matching ions:


A different approach to database search from MS/MS data: MASCOT MS/MS Ions Search


The software considers all the database proteins (or those within a specific MW range, if the user provides one), then generates all the possible peptide sequences for each protein.

The generated sequences can be digestion peptides, if an enzyme has been specified, or sub-sequences of the protein, if the "None" option has been chosen (this option is not available in the public version).

The latter option is meant for sample matrices that contain peptides whose origin (enzymatic or chemical) is not known.

When no enzyme is specified, the processing time can be significantly longer: for a protein with N amino acid residues, about N/10 tryptic peptides can be expected, whereas the possible sub-sequences number about N(N+1)/2.

For N = 200 (a typical value for a protein with MW close to 20000 u), there are about 20 tryptic peptides, whereas the possible sub-sequences are more than 20000!
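The two counts can be reproduced with a short sketch. The digestion routine uses the usual trypsin rule (cleavage after K or R, except when the next residue is P) and ignores missed cleavages; the function names are illustrative and not part of MASCOT.

```python
def tryptic_peptides(sequence):
    """In silico tryptic digest: cleave after K or R unless followed by P.
    Missed cleavages are not considered."""
    peptides, start = [], 0
    for k, residue in enumerate(sequence):
        next_residue = sequence[k + 1] if k + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:k + 1])
            start = k + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def count_subsequences(n):
    """Number of contiguous sub-sequences of a chain with n residues."""
    return n * (n + 1) // 2


print(tryptic_peptides("MAIPPKKNQ"))
print(200 // 10, count_subsequences(200))
```

For N = 200 the no-enzyme search space is about a thousand times larger than the tryptic one, which is where the longer processing time comes from.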

Key points in MASCOT MS/MS Ions Search


MASCOT MS/MS Ions Search builds a frequency factors matrix by fragmenting in silico all the possible peptides arising from the proteins in the database and by grouping the resulting product ions into m/z ratio classes:

Frequency factors matrix

Columns: peptide MW classes
Rows: peptide product ion m/z ratio classes

        1          2          ...    j
1       f_{1,1}    f_{1,2}    ...    f_{1,j}
2       f_{2,1}    f_{2,2}    ...    f_{2,j}
...
i       f_{i,1}    f_{i,2}    ...    f_{i,j}

Ion score in MASCOT MS/MS Ions Search

A specific peptide is then fragmented in silico, and a factor m_{i,j} is assigned to each product ion that matches an experimental one, according to its location in the matrix. An ion score is then obtained.


The output report from MS/MS Ions Search shows the statistical treatment for each candidate peptide: the distribution of ion scores and their position with respect to the significance threshold:

In this example the -10 log10(P) threshold is well beyond the maximum ion score obtained.
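The relation between a probability P and the -10 log10(P) scale used for the threshold can be sketched as below. The first two functions are plain arithmetic. The third one, which divides the significance level by the number of candidates tested, is an assumption about how such a threshold could be derived and is not stated in the slides.

```python
import math


def score_from_probability(p):
    """Map a probability of a random match onto the -10*log10(P) scale."""
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Inverse mapping: score back to probability."""
    return 10.0 ** (-score / 10.0)


def significance_threshold(alpha, n_candidates):
    """Assumed form: score needed so that fewer than `alpha` random matches
    are expected among `n_candidates` tested candidates."""
    return score_from_probability(alpha / n_candidates)


print(round(score_from_probability(0.05), 2))
print(round(significance_threshold(0.05, 1000), 2))
```

On this scale every factor of ten in probability corresponds to ten score units, so a larger set of candidates pushes the threshold upward.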


The next part of the output shows the ion score of a candidate peptide, its amino acid sequence and further information.

In any case, the proteins to which that peptide belongs are also indicated.

Several MS/MS data sets can be loaded into MS/MS Ions Search at the same time. In this case, the ion scores of all the matching peptides that belong to the same protein are summed to give a Protein Score.
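The summation described above amounts to a simple grouping step. The sketch below assumes each peptide hit is available as a (protein, ion score) pair; the protein names and scores are hypothetical.

```python
from collections import defaultdict


def protein_scores(peptide_hits):
    """Sum the ion scores of all matching peptides assigned to each protein."""
    totals = defaultdict(float)
    for protein, ion_score in peptide_hits:
        totals[protein] += ion_score
    return dict(totals)


# Hypothetical peptide hits from three MS/MS data sets
hits = [("protein_A", 42.0), ("protein_B", 18.5), ("protein_A", 31.0)]
print(protein_scores(hits))
```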


The MS/MS Ion Search output also includes the MS/MS spectral coverage:

the * superscript indicates NH3 loss
the 0 superscript indicates H2O loss


[Figure: MS/MS spectrum of the peptide with m/z 1109.5. x-axis: m/z (300 to 1100); y-axis: relative abundance (0 to 100). Labeled peaks: 325.9, 373.1, 415.7, 464.3, 471.3, 486.7, 542.0, 548.1, 570.2, 622.9, 641.5, 717.5, 761.3, 788.3, 792.5, 817.1, 863.3, 887.2, 917.6, 964.1, 1006.5, 1034.2, 1086.3.]

To keep ion noise out of the MS/MS input set, the user should:

- raise the fragmentation extent;
- select only m/z ratios with significant abundances (i.e., by setting an intensity threshold).

(m/z, rel. ab.) list

1086.3    7.04
1034.2    28.96
1006.5    5.96
887.2     50.43
863.3     24.22
792.5     100
788.3     8.22
...
471.3     7.54
325.9     9.05

Appropriate selection of MS/MS data before database search
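A minimal sketch of the intensity-threshold selection, applied to the (m/z, relative abundance) pairs listed above. The 10% cut-off is an arbitrary illustrative value, not one recommended in the slides, and the entries elided from the list are simply absent here.

```python
def select_peaks(peaks, min_relative_abundance):
    """Keep only the (m/z, relative abundance) pairs at or above the threshold."""
    return [(mz, ab) for mz, ab in peaks if ab >= min_relative_abundance]


# Pairs taken from the (m/z, rel. ab.) list; elided entries are not included
peaks = [
    (1086.3, 7.04), (1034.2, 28.96), (1006.5, 5.96), (887.2, 50.43),
    (863.3, 24.22), (792.5, 100.0), (788.3, 8.22), (471.3, 7.54),
    (325.9, 9.05),
]

selected = select_peaks(peaks, 10.0)  # hypothetical 10% threshold
for mz, ab in selected:
    print(mz, ab)
```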


Candidate sequences (MS-Tag) and sum of abundances for matching ions:

MAIPPKKNQ: 1.66 (bovine κ-casein)
SDLHPICNK: 1.59
APEEELNPK: 1.48
ANIYNATFL: 1.42
TPVDRVPDQ: 1.29
VGSLTTGYTQ: 0.51

Controversial cases: peptides with similar MS/MS spectral coverages

[Figure: Crescenza cheese extract, full-scan ESI-MS chromatogram (base peak). x-axis: retention time / min (0 to 45); y-axis: relative abundance (0 to 100). Labeled peaks at 4.86, 29.84, 34.83, 36.96 and 38.83 min.]

[Figure: MS/MS spectrum of the precursor ion at m/z 1026.5. x-axis: m/z (300 to 1000); y-axis: relative abundance (0 to 100). Labeled peaks: 323.4, 389.3, 451.3, 547.5, 614.5, 694.4, 711.3, 748.2, 766.5, 783.3, 824.5, 862.3, 880.6, 1009.4, 1026.5. Assignments shown for both MAIPPKKNQ and SDLHPICNK: y7+, b8+, b8+ - H2O, b7+, y6+, y6+ - NH3. Assignments shown for MAIPPKKNQ only: y5+, y3+. Precursor-related ions: [M+H]+, [M+H]+ - NH3.]
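The overlap between the two candidates can be checked by computing their singly charged b and y ions. The sketch below uses standard monoisotopic residue masses and the usual b/y definitions; it is an independent illustration, not the calculation performed by MS-Tag or MS-Product, and the function names are hypothetical.

```python
# Monoisotopic residue masses (u) for the residues occurring in the two peptides
MONO = {
    "A": 71.03711, "S": 87.03203, "P": 97.05276, "C": 103.00919,
    "L": 113.08406, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "M": 131.04049, "H": 137.05891,
}
WATER = 18.010565
PROTON = 1.007276


def b_ions(sequence):
    """m/z of singly charged b1 .. b(n-1) ions."""
    total, ions = 0.0, []
    for residue in sequence[:-1]:
        total += MONO[residue]
        ions.append(total + PROTON)
    return ions


def y_ions(sequence):
    """m/z of singly charged y1 .. y(n-1) ions."""
    total, ions = 0.0, []
    for residue in reversed(sequence[1:]):
        total += MONO[residue]
        ions.append(total + WATER + PROTON)
    return ions


for peptide in ("MAIPPKKNQ", "SDLHPICNK"):
    b, y = b_ions(peptide), y_ions(peptide)
    print(peptide, "b8 %.2f  y7 %.2f  y6 %.2f" % (b[7], y[6], y[5]))
```

The y6+ ions of the two sequences differ by only a few hundredths of a mass unit, which is consistent with both candidates explaining the peak at m/z 711.3 at the resolution of the spectrum.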


Only MS3 on the most abundant product ion (m/z 711.3) can establish the correct peptide sequence:

[Figure: MS3 spectrum of the product ion at m/z 711.3, the y6+ ion of either candidate (+PPKKNQ or +HPICNK), with structure drawings of the +PPKKNQ, PK+ and [PK(CO)NH3]+ fragments. x-axis: m/z (200 to 700); y-axis: relative abundance (0 to 100). Labeled peaks: 226.1, 243.1, 323.4, 372.2, 389.3, 451.4, 468.5, 517.2, 547.3, 565.4, 582.3, 614.6, 676.3, 694.4, 711.3. Ion labels shown: y6+ - NH3, y6+ - NH3 - H2O, y5+, y4+, y3+, y3+ - NH3, b5+, b5+ - H2O, b4+, b3+, c5+, c4+, PK+, [PK(CO)NH3]+.]


The Protein Prospector MS-Product software


Identification of peptides arising from proteins not listed in proteomic databases: De Novo Sequencing

1. Input: experimental MW and MW tolerance.
2. Estimate of the number of amino acid residues in the sequence.
3. Construction of possible sequences, based on combinatorial calculus.
4. Pre-selection of candidate sequences.
5. In silico fragmentation (b, y, a, c, x, z ion series).
6. Evaluation of matching between experimental and predicted fragmentation patterns. Input: MS/MS experimental data set and tolerance on product ion m/z ratios.
7. Final set of candidate peptides.
8. Validation.
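The first stages of this workflow (length estimate, combinatorial construction, pre-selection by MW) can be sketched on a toy scale. The code enumerates amino acid compositions rather than ordered sequences, takes an average residue mass of 110 u as an assumption, and searches one residue on either side of the estimated length; real de novo tools are far more elaborate, and all names here are hypothetical.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (u); I is omitted because it is isobaric with L
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def estimate_length(peptide_mw, average_residue_mass=110.0):
    """Rough residue count from the peptide MW (average mass is an assumption)."""
    return max(1, round((peptide_mw - WATER) / average_residue_mass))


def candidate_compositions(peptide_mw, mw_tolerance):
    """Compositions (residues in alphabetical order) whose mass fits the MW."""
    target = peptide_mw - WATER
    centre = estimate_length(peptide_mw)
    hits = []
    for length in range(max(1, centre - 1), centre + 2):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), length):
            if abs(sum(RESIDUE_MASS[r] for r in combo) - target) <= mw_tolerance:
                hits.append("".join(combo))
    return hits


# Neutral monoisotopic mass of a tripeptide containing G, A and S
mw = RESIDUE_MASS["G"] + RESIDUE_MASS["A"] + RESIDUE_MASS["S"] + WATER
candidates = candidate_compositions(mw, 0.01)
print(candidates)
```

Even for a tripeptide more than one composition fits the measured MW, which is why the later steps, in silico fragmentation and comparison with the MS/MS data set, are needed to narrow the list down.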