PROTEOMICS De novo sequence prediction for: nsi78_11.1803.1806.2.dta SequenceAbsoluteRelative...

Post on 21-Dec-2015


PROTEOMICS

De novo sequence prediction for: nsi78_11.1803.1806.2.dta

Sequence            Absolute Probability   Relative Probability
CRGSVNFP[PL]FK      3.9%                   36.3%
CRGSVN[DE][PL]FK    2.3%                   24.7%
CRGSVPFN[PN]FK      6.1%                   17.2%
CRGSV[SR]D[PL]FK    3.1%                   6.5%
CRGSVPFNWGDK        <0.1%                  2.7%

Genomics: DNA (Gene)
    ↓ Transcription
Transcriptomics: RNA
    ↓ Translation
Proteomics: PROTEIN
    ↓ Enzymatic reaction
Metabolomics: METABOLITE

Functional Genomics

The “omics” nomenclature…

Gen-, Transcript-, Prote-, Metabol- + ~ome = Sequence of a complete set of Genes, Transcripts, Proteins, Metabolites

Gen-, Prote- + ~omics = Analysis of the Genome, Proteome

A few definitions…

Current -omics

The proteome is defined as the set of all expressed proteins in a cell, tissue or organism (Wilkins et al., 1997).

Proteomics can be defined as the systematic analysis of proteins for their identity, quantity and function.

Proteome                           Genome
Dynamic                            Static
No amplification possible          Amplification possible
Heterogeneous molecules            Homogeneous molecules
Large variability of the amount    No variability of the amount

Complexity of the proteome

Applications of Proteomics

• Mining: identification of proteins (catalog the proteins)
• Quantitative proteomics: defining the relative or absolute amount of a protein
• Protein-expression profile: identification of proteins in a particular state of the organism
• Protein-network mapping: protein interactions in living systems
• Mapping of protein modifications: how and where proteins are modified.

Protein classes for analysis

• Membrane

• Soluble proteins

• Organelle-specific

• Chromosome-associated

• Phosphorylated

• Glycosylated

• Multi-protein complexes

General flow for proteomics analysis

SEPARATION

IDENTIFICATION

Debora Frigi Rodrigues
You do your experiment, then extract the protein to obtain a protein mixture, which you separate in two dimensions (usually the first dimension is by protein charge and the second dimension by protein mass). The separation can be done in a gel, as shown here, but also by liquid chromatography and/or mass spectrometry.

Current Proteomics Technologies

• Proteome profiling/separation
  – 2D SDS PAGE (two-dimensional sodium dodecyl sulphate polyacrylamide gel electrophoresis)
  – 2-D LC/LC (LC = liquid chromatography)
  – 2-D LC/MS (MS = mass spectrometry)

• Protein identification
  – Peptide mass fingerprint
  – Tandem mass spectrometry (MS/MS)

• Quantitative proteomics
  – ICAT (isotope-coded affinity tag)
  – SILAC (stable isotope labeling by amino acids in cell culture)

The first dimension (separation by isoelectric focusing)
- gel with an immobilised pH gradient
- electric current causes charged proteins to move until they reach their isoelectric point (the pH in the gradient at which their net charge is 0)

2D-PAGE gel

Isoelectric point (pI)

• Separation by charge:

[Figure: stable pH gradient, pH 4 to 10]

High pH: protein is negatively charged

Low pH: protein is positively charged

At the isoelectric point the protein has no net charge and therefore no longer migrates in the electric field.
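The charge behaviour described above can be sketched numerically. The following is a minimal illustration, not a validated pI predictor: it assumes approximate textbook pKa values (published scales differ by a few tenths of a pH unit), treats every ionisable group independently with the Henderson-Hasselbalch relation, and finds the pH of zero net charge by bisection. The function names are hypothetical.

```python
# Assumed, approximate pKa values for ionisable groups (illustrative only;
# different published scales give slightly different numbers).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_TERM": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_TERM": 2.0}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    groups = list(sequence) + ["N_TERM", "C_TERM"]
    charge = 0.0
    for group in groups:
        if group in POSITIVE_PKA:
            # Fraction of the basic group that is protonated (+1).
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[group]))
        elif group in NEGATIVE_PKA:
            # Fraction of the acidic group that is deprotonated (-1).
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[group] - ph))
    return charge


def isoelectric_point(sequence: str, lo: float = 0.0, hi: float = 14.0) -> float:
    """pH at which the net charge is zero, found by bisection.

    Works because the net charge decreases monotonically as pH rises.
    """
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid  # negatively charged: pI lies at lower pH
    return (lo + hi) / 2.0
```

In this model a lysine-rich peptide comes out with a basic pI and an aspartate-rich peptide with an acidic one, which is why the two would focus at opposite ends of the pH gradient.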


The second dimension (separation by mass)
- pH gel strip is loaded onto an SDS gel
- SDS denatures and linearises the protein (to make movement solely dependent on mass, not shape)

2D-SDS PAGE gel


2D-gel technique example

Peng, J. and Gygi, S.P. (2001) Proteomics: the move to mixtures. J. Mass Spectrom., 36, 1083-1091.

Some limitations of 2DE:

• Limited dynamic range of detection: bias towards highly abundant proteins

• Co-migration of proteins

• Poor separation of some proteins
  – Basic proteins (pI > 10)
  – Hydrophobic proteins
  – Small and large proteins (< 10 kDa; > 150 kDa)

Methods for protein identification

Mass Spectrometry (MS) Stages

• Introduce sample to the instrument
• Generate ions in the gas phase
• Separate ions on the basis of differences in m/z with a mass analyzer
• Detect ions

[Diagram: components of a mass spectrometer]

Samples
HPLC
Ionisation Method (MALDI, ESI)
Mass Analyser
Detector
Data System
Vacuum System

Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature, 422, 198-207.

Mass spectrometers used in proteomic research

Principles of MALDI-TOF Mass Spectrometry

Mann, M., Hendrickson, R.C. and Pandey, A. (2001) Analysis of proteins and proteomes by mass spectrometry. Annu Rev Biochem, 70, 437-473.

Electro-spray ionisation

ESI

M + RH+ → MH+ + R (in solution)
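The protonation reaction above determines what the mass analyser actually measures: an ion carrying z protons appears at (M + z × proton mass) / z. The sketch below illustrates that arithmetic, assuming ions are charged only by added protons; the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a molecule of mass M carrying `charge` extra protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass(mz_value: float, charge: int) -> float:
    """Invert the relation: recover M from an observed m/z and its charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz_value - PROTON_MASS)


def charge_series(mass: float, max_charge: int = 3) -> list:
    """m/z values at which the same molecule appears for charges 1..max_charge."""
    return [mz(mass, z) for z in range(1, max_charge + 1)]
```

For example, `charge_series(1250.0)` lists the positions where a 1250 Da peptide would show up as a 1+, 2+ and 3+ ion.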


Protein identification by Peptide Mass fingerprint

• Use MS to measure the masses of proteolytic peptide fragments.

• Identification is done by matching the measured peptide masses to corresponding peptide masses from protein or nucleotide sequence databases.
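The matching step described above can be sketched as follows. This is a simplified illustration, not a real search engine: it assumes monoisotopic residue masses, a basic trypsin rule (cleave after K or R unless followed by P, no missed cleavages), neutral peptide masses, and a fixed mass tolerance. The function names and scoring by simple match count are illustrative choices.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein: str) -> list:
    """Cut after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of a peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def count_matches(observed: list, protein: str, tol: float = 0.2) -> int:
    """Number of observed masses explained by the protein's tryptic peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def best_match(observed: list, database: dict, tol: float = 0.2) -> str:
    """Name of the database entry that explains the most observed masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol))
```

Here `database` is a plain dictionary mapping entry names to sequences. Real tools add missed cleavages, modifications and statistical scoring, which is one reason a plain count like this can give the ambiguous results listed among the disadvantages.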

Mass spectrometry – method of separating molecules based on mass/charge ratio

Compare peptide m/z with protein databases

e.g. MALDI-TOF

(trypsin)

Mass spectrometry (MS)

Protein Identification by MS

Sample: Spot removed from gel → Fragmented using trypsin → Spectrum of fragments generated

Library: Database of sequences (e.g. SwissProt) → Artificially trypsinated → Artificial spectra built

MATCH

Advantages vs. Disadvantages

Advantages:

• Determination of MW

• High-throughput capability

• Relatively low costs

Disadvantages:

• Ambiguous results difficult to interpret

• Requires sequence databases for analysis

Limitations can be overcome by peptide sequencing using tandem mass spectrometry

How does protein sequencing work?

• Use tandem MS: two mass analyzers in series with a collision cell in between

• Collision cell: a region where the ions collide with a gas (He, Ne, Ar), resulting in fragmentation of the ion

• Fragmentation of the peptides in the collision cell occurs in a predictable fashion, mainly at the peptide bonds (also phosphoester bonds)

• The resulting daughter ions have masses that are consistent with known molecular weights of dipeptides, tripeptides, tetrapeptides…

Ser-Glu-Leu-Ile-Arg-Trp

Collision Cell

Ser-Glu-Leu-Ile-Arg

Ser-Glu-Leu

Ser-Glu-Leu-Ile

Etc…
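The fragment ladder above can be turned into numbers. The sketch below assumes singly charged fragments, monoisotopic residue masses, and cleavage only at peptide bonds; it lists the N-terminal (b-type) and C-terminal (y-type) fragments of Ser-Glu-Leu-Ile-Arg-Trp and then reads residues back from the mass differences between neighbouring fragments. Function names are illustrative.

```python
# Monoisotopic residue masses (Da) for the residues of the example peptide.
RESIDUE_MASS = {
    "Ser": 87.03203, "Glu": 129.04259, "Leu": 113.08406,
    "Ile": 113.08406, "Arg": 156.10111, "Trp": 186.07931,
}
PROTON = 1.007276
WATER = 18.01056
PEPTIDE = ["Ser", "Glu", "Leu", "Ile", "Arg", "Trp"]


def b_ions(residues: list) -> list:
    """Singly charged N-terminal fragments b1 .. b(n-1) as (label, m/z)."""
    ions, running = [], 0.0
    for i, name in enumerate(residues[:-1], start=1):
        running += RESIDUE_MASS[name]
        ions.append(("-".join(residues[:i]), running + PROTON))
    return ions


def y_ions(residues: list) -> list:
    """Singly charged C-terminal fragments y1 .. y(n-1) as (label, m/z)."""
    ions, running = [], WATER
    for i, name in enumerate(reversed(residues[1:]), start=1):
        running += RESIDUE_MASS[name]
        ions.append(("-".join(residues[-i:]), running + PROTON))
    return ions


def read_ladder(mz_values: list, tol: float = 0.01) -> list:
    """For each step between neighbouring fragments, list residues whose
    mass equals the m/z difference within the tolerance."""
    steps = []
    for low, high in zip(mz_values, mz_values[1:]):
        gap = high - low
        steps.append([n for n, m in RESIDUE_MASS.items() if abs(m - gap) <= tol])
    return steps
```

Calling `read_ladder([m for _, m in b_ions(PEPTIDE)])` returns two candidates for the third and fourth steps, because Leu and Ile have identical masses and cannot be told apart from the mass difference alone.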

Peng, J. and Gygi, S.P. (2001) Proteomics: the move to mixtures. J Mass Spectrom, 36, 1083-1091.

Schematic of a quadrupole TOF instrument

After traversing a countercurrent gas stream (curtain gas), the ions enter the vacuum system and are focused into the first quadrupole section (q0). They can be mass-separated in Q1 and dissociated in q2. Ions enter the time-of-flight analyzer through a grid and are pulsed into the reflector and onto the detector, where they are recorded. There are 14,000 pulsing events per second.

Mann, M., Hendrickson, R.C. and Pandey, A. (2001) Analysis of proteins and proteomes by mass spectrometry. Annu Rev Biochem, 70, 437-473.

Peptide Fragmentation

Isolates individual peptide fragments for 2nd mass spec – can obtain peptide sequence

Compare peptide sequence with protein databases

(trypsin)

Tandem Mass Spectrometry

Advantages vs. Disadvantages

Advantages:

• Determination of MW and amino acid sequence

• Detection of posttranslational modifications

• High-throughput capability

Disadvantages:

• High capital costs

• Requires sequence databases for analysis

Coupling of LC and tandem MS

Tryptic digested proteins → LC (75 µm RP) → 200 nL to MS → Ion trap MS

Peptide:
1. MW
2. Sequence
3. Modification

Polypeptides enter the column in the mobile phase…

…the hydrophobic “foot” of the polypeptides adsorbs to the hydrophobic (non-polar) surface of the reverse-phase material (stationary phase), where they remain until…

…the organic modifier concentration rises to the critical concentration and desorbs the polypeptides

Reverse Phase column

Data acquired - Chromatogram

[Figure: two chromatogram traces for RS_Contest_04. Upper trace: TIC MS, NL: 2.83E9. Lower trace: Base Peak m/z = 400.0-2000.0, + c Full ms [400.00-2000.00], NL: 4.22E8. x-axis: Time (min), 0 to 200; y-axis: relative abundance, 0 to 100.]

Triple Play

[Figure: base peak chromatogram for RS_Contest_04 (m/z = 400.0-2000.0, + c Full ms [400.00-2000.00], NL: 1.14E7), RT: 120.99 - 124.07, with traces at m/z = 626.6, 852.3, 872.5, 865.0, 684.0, 774.5 and 1046.1. Scan numbers marked on the traces: 4504, 4507, 4513, 4516, 4519, 4528. x-axis: Time (min); y-axis: Relative Abundance.]

Triple Play Dynamic Exclusion

[Figure: three spectra, Scan 4501, Scan 4502 and Scan 4503. x-axis: m/z; y-axis: Relative Abundance.
1: + c Full ms [400.00-2000.00]; labelled peaks include 626.3, 835.5 and 982.4.
2: + d Z ms [622.30-632.30]; peaks at 626.1, 626.6, 627.1 and 627.7.
3: + c d Full ms2 626.28@35.00 [160.00-1890.00]; labelled peaks include 479.4, 535.8, 828.2 and 957.3.]
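The narrow-range scans in these figures show closely spaced peaks (626.1, 626.6, 627.1, 627.7 in one window; 852.2, 853.1 in the other). One common use of such a scan is to read the charge state from the isotope spacing, since neighbouring isotope peaks of a z-charged ion lie about 1/z apart. The sketch below illustrates that arithmetic under stated assumptions: the peaks belong to one isotope cluster, the lowest peak is the monoisotopic one, and charge comes from protons only. Function names are hypothetical.

```python
ISOTOPE_SPACING = 1.00335  # Da, approximate 13C - 12C mass difference
PROTON_MASS = 1.007276     # Da


def charge_from_isotopes(peaks: list) -> int:
    """Estimate charge from the mean m/z gap between isotope peaks."""
    ordered = sorted(peaks)
    if len(ordered) < 2:
        raise ValueError("need at least two isotope peaks")
    mean_gap = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    if mean_gap <= 0:
        raise ValueError("peaks must have distinct m/z values")
    return max(1, round(ISOTOPE_SPACING / mean_gap))


def monoisotopic_mass(peaks: list) -> float:
    """Neutral mass, taking the lowest peak as the monoisotopic ion."""
    charge = charge_from_isotopes(peaks)
    return charge * (min(peaks) - PROTON_MASS)
```

With the peak lists quoted above, the first cluster has a spacing of about 0.5 and is assigned charge 2, while the second has a spacing of about 0.9 and is assigned charge 1.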

Triple Play Dynamic Exclusion

[Figure: three spectra, Scan 4504, Scan 4505 and Scan 4506. x-axis: m/z; y-axis: Relative Abundance.
1: + c Full ms [400.00-2000.00]; labelled peaks include 626.3, 835.0, 852.2 and 982.6.
2: + d Z ms [848.00-858.00]; peaks at 852.2 and 853.1.
3: + c d Full ms2 852.26@35.00 [220.00-2000.00]; labelled peaks include 471.0, 721.2 and 1261.0.]

2D - LC/MS

Peng, J. and Gygi, S.P. (2001) Proteomics: the move to mixtures. J Mass Spectrom, 36, 1083-1091.

Multidimensional Protein Identification Technology (MudPIT).

Whitelegge JP (2002) Plant proteomics: BLASTing out of a MudPIT. Proc Natl Acad Sci U S A 99: 11564-6.

Koller A, Washburn MP, Lange BM, Andon NL, Deciu C, Haynes PA, Hays L, Schieltz D, Ulaszek R, Wei J, Wolters D, Yates JR, 3rd (2002) Proteomic survey of metabolic pathways in rice. Proc Natl Acad Sci U S A.