Laboratorio3 - Modalità...
Identification of proteins from PMF data sets: Protein Prospector MS-Fit
http://prospector.ucsf.edu/prospector/mshome.htm
Identification of proteins from PMF data sets: MASCOT PMF
http://www.matrixscience.com/search_form_select.html
The best protein sequence corresponding to a PMF is assigned by MASCOT on a probabilistic basis, by giving a score to each candidate sequence:
A score is calculated for each candidate protein using the MOWSE (MOlecular Weight SEarch, 1993) algorithm:
Frequency factors matrix
Columns: protein MW classes (10 kDa-spaced), j = 1, 2, ...
Rows: peptide m/z ratio classes (100 Da-spaced), i = 1, 2, ...

        1       2       ...     j
1       f1,1    f1,2    ...     f1,j
2       f2,1    f2,2    ...     f2,j
...
i       fi,1    fi,2    ...     fi,j
Proteins in the database are sequentially digested in silico and each of the experimental m/z ratios is compared with those of the virtual hydrolytic peptides. When a match is observed, the corresponding factor $m_{i,j}$ is included in the product $\prod_n m_{i,j}$ (n being the number of matching peptides).
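The lookup-and-multiply step described above can be sketched in a few lines of Python. This is only an illustration: the class widths (100 Da and 10 kDa) come from the slide, whereas the function names, the dictionary layout and all numerical factors are hypothetical. How the product is converted into the final reported score is not covered here.

```python
from math import prod

PEPTIDE_CLASS_WIDTH = 100.0      # peptide classes are 100 Da wide (from the text)
PROTEIN_CLASS_WIDTH = 10_000.0   # protein MW classes are 10 kDa wide (from the text)

def class_index(mass, width):
    """Return the 0-based class (bin) index a mass falls into."""
    return int(mass // width)

def factor_product(matched_peptide_masses, protein_mw, factors):
    """Multiply the factors m[i, j] of all matching peptides.

    `factors` is a hypothetical lookup {(i, j): m_ij}; real values would be
    derived from the database and are not reproduced here.
    """
    j = class_index(protein_mw, PROTEIN_CLASS_WIDTH)
    return prod(
        factors[(class_index(mass, PEPTIDE_CLASS_WIDTH), j)]
        for mass in matched_peptide_masses
    )

# Made-up factors for a protein falling in the 20-30 kDa class (j = 2)
toy_factors = {(8, 2): 0.5, (10, 2): 0.25, (15, 2): 0.1}
product = factor_product([842.5, 1045.6, 1530.7], 23_500.0, toy_factors)
print(product)
```

Each matching peptide contributes one factor, so n in the product equals the length of the list of matched masses.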
Once the scores for all the candidate proteins have been calculated, their statistical distribution is evaluated by the software.
A typical MASCOT PMF output shows the evaluated distribution:
A specific entry can be considered significant if the probability of its occurrence is lower than a user-defined threshold. Usually a 5% threshold is used by MASCOT users.
The green-colored region in the distribution plot encloses a 95% probability, thus only an entry located outside that region can be considered significant, according to the criterion adopted.
When less stringent tolerance values are adopted, many entries can have relatively high score values, thus changing the distribution (note that the highest score changes too):
Mass tolerance from 0.1 to 1 u
Mass tolerance from 1 to 2.5 u
The best score is progressively lowered and finally falls below the threshold, thus becoming not statistically significant.
Identification of peptides from MS/MS data sets: Protein Prospector MS-Tag
Tolerances on precursor (parent) and product (frag) ions m/z ratios
Restricting the database search to a specific taxonomy, if known, significantly reduces the number of candidate peptides.
High taxonomic specificity
The types of instrument ions, i.e., of product ions that can reasonably be considered to explain peptide MS/MS spectra, can also be selected by the analyst, in accordance with the fragmentation patterns typically observed with his/her instrument.
Relative abundances of product ions can be entered into the Data Paste Area, along with their m/z values and, if different from 1, their charge states. Their presence is essential for ranking the candidate sequences by Matched Intensity.
Candidate peptides are listed by MS-Tag according to the number of experimental m/z ratios not matching the predicted ones.
Detailed results available in the MS-Tag output allow a check of the spectral coverage (matched intensity) provided by the matching ions.
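The two quantities mentioned above, the number of unmatched experimental m/z ratios and the matched intensity, can be illustrated with a short Python sketch. It is not MS-Tag's actual implementation: the function name, the 0.5 u tolerance, the peak lists and the definition of matched intensity as a percentage of the total abundance are all assumptions made for the example.

```python
def match_peaks(experimental, predicted_mz, tolerance=0.5):
    """Compare experimental peaks with predicted product-ion m/z values.

    experimental: list of (m/z, relative abundance) pairs.
    predicted_mz: list of predicted m/z values for one candidate sequence.
    tolerance:    product-ion tolerance in u (hypothetical default).

    Returns (number of unmatched experimental peaks,
             matched intensity as a percentage of the total abundance).
    """
    matched = [
        (mz, abundance)
        for mz, abundance in experimental
        if any(abs(mz - p) <= tolerance for p in predicted_mz)
    ]
    total = sum(abundance for _, abundance in experimental)
    matched_intensity = 100.0 * sum(a for _, a in matched) / total
    return len(experimental) - len(matched), matched_intensity

# Made-up data: three experimental peaks, two of them explained by the candidate
peaks = [(711.3, 100.0), (694.4, 40.0), (500.0, 10.0)]
unmatched, intensity = match_peaks(peaks, [711.4, 694.4, 583.3])
print(unmatched, round(intensity, 1))
```

A candidate with few unmatched peaks but a low matched intensity explains mostly minor signals, which is why both figures are worth checking.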
A different approach to database search from MS/MS data: MASCOT MS/MS Ions Search
The software considers all the database proteins (or those included in a specific MW range, if provided by the user), then all the possible peptide sequences are generated for each protein.
The generated sequences can be proteolytic peptides, if an enzyme has been specified, or sub-sequences of the protein, if the “None” option has been chosen (the option is not available on the public version).
The latter option applies to sample matrices in which peptides are present but their origin (enzymatic or chemical) is not known.
When no enzyme is specified the processing time can be significantly longer: for a protein having N amino acid residues, about N/10 tryptic peptides can be expected, whereas the possible sub-sequences are about N(N+1)/2.
For N = 200 (a typical value for a protein with MW close to 20000 u) the tryptic peptides are about 20, whereas the possible sub-sequences are more than 20000!
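The arithmetic behind these two estimates is easy to check. The sketch below uses only the two expressions given in the text, N/10 and N(N+1)/2; the function names are introduced here for illustration.

```python
def estimated_tryptic_peptides(n_residues):
    """Rough estimate from the text: about one tryptic peptide every 10 residues."""
    return n_residues // 10

def contiguous_subsequences(n_residues):
    """Number of contiguous sub-sequences of a chain of N residues: N(N+1)/2."""
    return n_residues * (n_residues + 1) // 2

n = 200
print(estimated_tryptic_peptides(n), contiguous_subsequences(n))
```

For N = 200 the second count is about a thousand times larger than the first, and the gap widens with N because the number of sub-sequences grows quadratically.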
Key points in MASCOT MS/MS Ions Search
A frequency factors matrix is built by MASCOT MS/MS Ions Search by fragmenting in silico all the possible peptides arising from the proteins in the database and by grouping the resulting product ions into m/z ratio classes:
Frequency factors matrix
Columns: peptide MW classes, j = 1, 2, ...
Rows: peptide product ion m/z ratio classes, i = 1, 2, ...

        1       2       ...     j
1       f1,1    f1,2    ...     f1,j
2       f2,1    f2,2    ...     f2,j
...
i       fi,1    fi,2    ...     fi,j
Ion score in MASCOT MS/MS Ions Search
A specific peptide is then fragmented in silico and a factor $m_{i,j}$ is assigned to each product ion matching an experimental one, according to its location in the matrix. An ion score is then obtained.
The output report from MS/MS Ions Search shows the statistical evaluation for each candidate peptide, i.e., the distribution of ion scores and their position with respect to the significance threshold:
In this example the -10 log10(P) threshold lies well above the maximum ion score obtained, so none of the candidates is significant.
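Since the threshold is expressed on the -10 log10(P) scale, it can help to convert between a probability and the corresponding score. The sketch below shows only this conversion; the function names are hypothetical, and the way MASCOT derives the threshold for a given search is not reproduced here.

```python
import math

def score_from_probability(p):
    """Score on the -10*log10(P) scale for a probability P (0 < P <= 1)."""
    return -10.0 * math.log10(p)

def probability_from_score(score):
    """Inverse conversion: probability corresponding to a given score."""
    return 10.0 ** (-score / 10.0)

print(round(score_from_probability(0.05), 2))
print(probability_from_score(20.0))
```

On this scale, every 10 score units correspond to a tenfold decrease in probability.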
The subsequent part of the output shows the ion score of a candidate peptide, its amino acid sequence and further information.
In any case, the proteins to which that peptide belongs are also indicated.
Actually, several MS/MS data sets can be loaded simultaneously into MS/MS Ions Search. In this case, the ion scores of all the matching peptides referred to the same protein are summed to give a Protein Score.
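The summation described above can be sketched as follows. This is a plain sum, as stated in the text, with hypothetical protein identifiers and made-up ion scores; any further correction the software may apply is not considered.

```python
from collections import defaultdict

def protein_scores(peptide_hits):
    """Sum the ion scores of the matching peptides assigned to each protein.

    peptide_hits: iterable of (protein identifier, ion score) pairs.
    """
    totals = defaultdict(float)
    for protein_id, ion_score in peptide_hits:
        totals[protein_id] += ion_score
    return dict(totals)

# Made-up ion scores for peptides assigned to two hypothetical proteins
hits = [("protein_A", 42.0), ("protein_A", 31.5), ("protein_B", 18.0)]
scores = protein_scores(hits)
print(scores)
```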
The MS/MS Ion Search output also includes the MS/MS spectral coverage:
The * superscript indicates NH3 loss; the 0 superscript indicates H2O loss.
[Figure: MS/MS spectrum of the peptide with m/z 1109.5. x-axis: m/z (300 to 1100); y-axis: relative abundance (0 to 100). Labelled peaks (m/z): 325.9, 373.1, 415.7, 464.3, 471.3, 486.7, 542.0, 548.1, 570.2, 622.9, 641.5, 717.5, 761.3, 788.3, 792.5, 817.1, 863.3, 887.2, 917.6, 964.1, 1006.5, 1034.2, 1086.3]
In order to avoid the contribution of ionic noise to the MS/MS input set, the user should:
raise the fragmentation extent
select only m/z ratios corresponding to significant abundances (i.e., fixing an intensity threshold)
(m/z, rel. ab.) list
1086.3    7.04
1034.2    28.96
1006.5    5.96
887.2     50.43
863.3     24.22
792.5     100
788.3     8.22
...
471.3     7.54
325.9     9.05
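Applying an intensity threshold to such a list is a one-line filter. The sketch below uses only the entries visible in the list above (the elided rows are left out), and the 10% threshold is an arbitrary value chosen for illustration, not one recommended by the source.

```python
def filter_by_abundance(peaks, threshold):
    """Keep only the (m/z, relative abundance) pairs at or above the threshold."""
    return [(mz, abundance) for mz, abundance in peaks if abundance >= threshold]

# Entries visible in the (m/z, rel. ab.) list; elided rows are not reconstructed
peak_list = [
    (1086.3, 7.04), (1034.2, 28.96), (1006.5, 5.96), (887.2, 50.43),
    (863.3, 24.22), (792.5, 100.0), (788.3, 8.22), (471.3, 7.54),
    (325.9, 9.05),
]

selected = filter_by_abundance(peak_list, threshold=10.0)  # hypothetical 10% cut-off
print(selected)
```

With this cut-off only four of the nine listed signals would be submitted to the database search.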
Appropriate selection of MS/MS data before database search
Candidate sequences (MS-Tag) and sum of abundances for matching ions:
MAIPPKKNQ     1.66   (bovine κ-casein)
SDLHPICNK     1.59
APEEELNPK     1.48
ANIYNATFL     1.42
TPVDRVPDQ     1.29
VGSLTTGYTQ    0.51
Controversial cases: peptides with similar MS/MS spectral coverages
[Figure: Crescenza cheese extract, full scan ESI-MS chromatogram (base peak). x-axis: retention time / min (0 to 45); y-axis: relative abundance (0 to 100). Labelled peaks at 4.86, 29.84, 34.83, 36.96 and 38.83 min]
[Figure: MS/MS spectrum of the ion at m/z 1026.5 (MS/MS 1026.5 >). x-axis: m/z (300 to 1000); y-axis: relative abundance (0 to 100). Labelled peaks (m/z): 323.4, 389.3, 451.3, 547.5, 614.5, 694.4, 711.3, 748.2, 766.5, 783.3, 824.5, 862.3, 880.6, 1009.4, 1026.5. Assignments given for both MAIPPKKNQ and SDLHPICNK: y7+, y6+, y6+ - NH3, b8+, b8+ - H2O, b7+. Assignments given for MAIPPKKNQ only: y5+, y3+. Also labelled: [M+H]+ and [M+H]+ - NH3]
Only MS3 on the most abundant product ion (m/z 711.3) is able to clarify the correct peptide sequence:
[Scheme: chemical structures; recoverable labels: +PPKKNQ, KKNQ, PK+, [PK(CO)NH3]+]
[Figure: MS3 spectrum of the product ion at m/z 711.3, with candidate precursors +PPKKNQ (y6+) and +HPICNK (y6+). x-axis: m/z (200 to 700); y-axis: relative abundance (0 to 100). Labelled peaks (m/z): 226.1, 243.1, 323.4, 372.2, 389.3, 451.4, 468.5, 517.2, 547.3, 565.4, 582.3, 614.6, 676.3, 694.4, 711.3. Ion labels: y6+ - NH3, y6+ - NH3 - H2O, y5+, y4+, y3+, y3+ - NH3, b3+, b4+, b5+, b5+ - H2O, c4+, c5+, PK+, [PK(CO)NH3]+; the b4+, b5+, c4+ and c5+ labels appear twice in the original figure]
The Protein Prospector MS-Product software
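MS-Product calculates the theoretical product ions of a given peptide sequence. The same kind of calculation can be sketched in Python for the two candidate sequences discussed above. This is a minimal illustration, not the program's implementation: it covers only singly charged b and y ions, uses monoisotopic residue masses, and assumes an unmodified cysteine; the function names are hypothetical.

```python
# Monoisotopic residue masses (u) for the residues occurring in the two candidates
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "H": 137.05891,
    "I": 113.08406, "K": 128.09496, "L": 113.08406, "M": 131.04049,
    "N": 114.04293, "P": 97.05276, "Q": 128.05858, "S": 87.03203,
}
WATER = 18.01056    # H2O
PROTON = 1.00728    # H+

def b_ion(sequence, k):
    """m/z of the singly charged b ion containing the first k residues."""
    return sum(RESIDUE_MASS[r] for r in sequence[:k]) + PROTON

def y_ion(sequence, k):
    """m/z of the singly charged y ion containing the last k residues."""
    return sum(RESIDUE_MASS[r] for r in sequence[-k:]) + WATER + PROTON

def protonated_mass(sequence):
    """m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[r] for r in sequence) + WATER + PROTON

for seq in ("MAIPPKKNQ", "SDLHPICNK"):
    print(seq,
          round(protonated_mass(seq), 2),
          round(y_ion(seq, 6), 2),
          round(y_ion(seq, 5), 2))
```

Under these assumptions the two sequences give almost the same [M+H]+ and y6+ values, whereas their y5+ ions differ by about 40 u, which is consistent with the need for additional fragmentation data to tell them apart.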
Identification of peptides arising from proteins not listed in proteomic databases: De Novo Sequencing
[Flowchart: de novo sequencing workflow]
Inputs: experimental MW; MW tolerance; MS/MS experimental data set; tolerance on product ion m/z ratios
Steps shown in the flowchart:
Estimate of the number of amino acid residues in the sequence
Construction of possible sequences, based on combinatorial calculus
Pre-selection of candidate sequences
In silico fragmentation (b, y, a, c, x, z ion series)
Evaluation of matching between experimental and predicted fragmentation patterns
Final set of candidate peptides; validation