MBP1001 Advanced Cell Biology 2010 Proteomics and Mass Spectrometry Brian Raught...

MBP1001

Advanced Cell Biology 2010

Proteomics and Mass Spectrometry

Brian Raught

Proteomics is an extremely powerful and broadly applicable technology

can be used to identify e.g. low stoichiometry PTMs, components of protein complexes, or to characterize all protein components in an organelle, tissue or organism

the key - but poorly understood - technology in this process is mass spectrometry-based peptide sequencing

today’s lecture will provide a brief overview of the approach, followed by some examples of its utility

First step - sample preparation

the goal - simplify

depending upon the goal of your experiment, you will isolate large or small numbers of proteins for analysis

you may subject your protein population to one or more fractionation steps, e.g.

1D SDS-PAGE

2D gel electrophoresis

strong cation exchange liquid chromatography

newer technologies - free flow electrophoresis

you will then convert your protein sample to peptides

Why are peptides (and not proteins) sequenced?

top-down approaches can identify intact proteins, but...

proteins can be difficult to handle, and all proteins in your sample may not be soluble under the same conditions (e.g. membrane-spanning proteins vs DNA binding prots)

proteins are often significantly processed and modified, resulting in many different isoforms, making identification difficult

ion trap mass spectrometers are most efficient at obtaining sequence info from peptides up to ~40aa in length – ID of prots via peptides is bottom-up proteomics

Proteases are used to convert proteins to peptides

trypsin

stable and very active, cleaves on the carboxy-terminal side of K and R residues (except when modified or followed by a P)

results in information rich, easily interpretable peptide fragment spectra

other commonly used proteases: LysC, AspN, GluC

sequence non-specific proteases are generally avoided, since they divide the peptide signal into multiple overlapping species, and thereby generate unnecessarily complex peptide mixtures
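
The trypsin rule above (cut after K or R, but not before P) can be sketched in a few lines of Python. This is a minimal illustration only: the function name `tryptic_digest` and the example sequence are made up here, and modified residues and missed cleavages are ignored.

```python
def tryptic_digest(sequence):
    """Cut C-terminal to K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever is left after the last cleavage site is the C-terminal peptide.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence: the K at position 3 is followed by P, so it is not cut.
print(tryptic_digest("AAKPGGRTTKLL"))  # ['AAKPGGR', 'TTK', 'LL']
```

Every peptide except the last one ends in K or R, which is part of why tryptic peptides carry at least two positive charges at low pH (the amino terminus plus the C-terminal K or R).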

How are peptides introduced into the mass spectrometer?

1. liquid chromatography (LC) directly coupled (in-line) with MS (LC-MS), introduced via electrospray (ESI)

2. peptides spotted onto metal surface, released into the MS via controlled laser shots (MALDI)

LC-MS

peptides are loaded onto an extremely small (50-150 µm) reversed-phase (silica particles coated with C18) column, and eluted directly into the machine by a gradient of increasing organic solvent (water - acetonitrile, with a small amount of acid – pH~2)

100-400nl/min flow rates (nanoflow)

separated according to hydrophobicity (standard 1-2hr runs)

eluted into the MS in a very small volume, and therefore at high concentrations

In most MS applications, peptides are positively charged, via the application of a high voltage (~2kV) to the buffer in the LC column

some amino acids (e.g. K, R, H), as well as the peptide amino terminus, are positively charged at low pH – so most peptides (esp. tryptic peptides) are multiply charged

charge is critical - the MS optics manipulate only charged ions, whereas uncharged peptides are “invisible”
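
Because the instrument measures m/z rather than mass, the same peptide appears at a different position for each charge state. A small sketch of the conversion, assuming positive-mode ions that carry one extra proton per charge (the function names and the 1000 Da example are illustrative only):

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(neutral_mass, charge):
    """m/z of a peptide carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON) / charge


def neutral_mass(mz_value, charge):
    """Invert the relationship: recover the uncharged peptide mass."""
    return charge * (mz_value - PROTON)


# A hypothetical 1000 Da peptide at charge states 1+, 2+ and 3+.
for z in (1, 2, 3):
    print(z, round(mz(1000.0, z), 4))
```

The doubly charged form of a 1000 Da peptide shows up near m/z 501, not 500, because each charge adds the mass of a proton.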

LC column ends in a very fine needle (~5microns); since the HPLC system is under pressure, and an electrical charge is applied, this results in a fine spray of droplets emanating from the tip containing charged peptides – electrospray ionization (soft ionization = Nobel prize)

Positively charged peptides are guided into the machine by a strong charge potential (and vacuum)

peptides first enter a small heated tube - as the fine droplets containing the peptides traverse the length of the tube, the buffer is rapidly evaporated

as the concentration of positively charged peptides increases in smaller and smaller droplets, they begin to repel one another, resulting in a series of Coulombic explosions

end result - individual positively charged peptides in the gas phase are ready for manipulation and measurement

So what is in a mass spectrometer, anyway?

think of it as a series of boxes, connected to each other via a pipe - each box has the ability to trap and release peptides, some boxes can also smash your peptides

at the end of the pipe sits a peptide counter (detector)

[Figure: three boxes (1, 2, 3) connected in series, ending at the detector]

Step 1

peptides enter the first chamber (Q1), where they are trapped (until the trap is full)

typical ion traps (Paul trap) use a combination of static DC and RF oscillating AC electric fields to move and manipulate the charged molecules

to characterize the contents of the trap, a small amount of the peptides (~10%) is released to the detector

this process is called the parent ion, precursor, or MS scan, and yields the m/z and intensity of all of the peptides in the first chamber at that moment

readout is expressed as intensity of signal (number of counts) for a given mass (actually m/z or mass/charge)

a parent ion (MS) scan

[Figure: parent ion (MS) spectrum, ion intensity vs. m/z; one peak is marked "select for fragmentation"]

Step 2

collision induced dissociation

a process whereby a (mostly) pure population of a single peptide (actually a small m/z window) is ejected to a second chamber (the collision cell), and mixed with an inert gas

as energy is applied to the isolated peptide population, they collide with the gas particles, and fragment – luckily for us, most of the time peptides fragment at peptide (amide) bonds between amino acids

add just enough energy to the collision cell such that an individual peptide fragments just once

the resulting mixed population of peptide fragments is then analyzed to give a product ion, tandem or MS/MS spectrum

a real CID spectrum

While dependent upon the particular goal of your analysis, the MS is usually programmed to conduct a single MS scan followed by several MS/MS scans

MS/MS scans are usually conducted on the x most abundant peptides (m/z), where x is 1-20

1 MS followed by 4-20 MS/MS scans (depending upon the instrument) is typical

Step 3

The ion trap is emptied, refilled, and the process repeated - the entire MS-MS/MS cycle takes 1-4 seconds and is thus repeated thousands of times per MS analysis

typical LC-MS run is 1-2 hrs

average ~10,000 MS/MS per hour for a complex sample
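
The "x most abundant peptides" rule and the cycle arithmetic can be sketched as follows. This is a simplified picture, not how any particular instrument's control software works; the scan values, the 3 second cycle and the 8 MS/MS scans per cycle are assumptions chosen to fall inside the ranges quoted above.

```python
def select_precursors(ms_scan, top_n=4):
    """Pick the m/z values of the top_n most intense peaks in one MS scan."""
    ranked = sorted(ms_scan, key=lambda peak: peak[1], reverse=True)
    return [peak_mz for peak_mz, _intensity in ranked[:top_n]]


def msms_per_hour(cycle_seconds, msms_per_cycle):
    """Number of MS/MS scans acquired in one hour of LC-MS time."""
    cycles = 3600 / cycle_seconds
    return cycles * msms_per_cycle


# Hypothetical MS scan as (m/z, intensity) pairs.
scan = [(445.12, 1.0e4), (512.77, 8.5e5), (623.31, 2.2e5),
        (701.40, 9.1e5), (856.95, 4.0e4)]

print(select_precursors(scan, top_n=3))  # [701.4, 512.77, 623.31]
print(msms_per_hour(3.0, 8))             # 9600.0
```

With these assumed settings the estimate lands close to the ~10,000 MS/MS per hour figure.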

How does the MS/MS give you sequence information?

the most common and informative fragment ions are generated by fragmentation of the amide bonds between amino acids

b-ions if charge is retained by the amino-terminal fragment

y-ions if charge is retained by carboxy-terminal fragment

the differences in mass between the peptide fragments can be used to reconstruct the sequence of the original (parent) peptide (this is called de novo sequencing)
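
A sketch of how b- and y-ion masses follow from the sequence, and how a mass difference maps back to a residue. The residue masses are standard monoisotopic values; the function names and the tripeptide GAK are illustrative, and only singly charged fragments are considered.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_ions(peptide):
    """Singly charged N-terminal fragments b1, b2, ..."""
    return [sum(MONO[aa] for aa in peptide[:i]) + PROTON
            for i in range(1, len(peptide))]


def y_ions(peptide):
    """Singly charged C-terminal fragments y1, y2, ..."""
    return [sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON
            for i in range(1, len(peptide))]


def residues_for(delta, tol=0.01):
    """Residues whose mass matches a difference between adjacent fragments."""
    return [aa for aa, mass in MONO.items() if abs(mass - delta) <= tol]


b = b_ions("GAK")
y = y_ions("GAK")
print([round(m, 4) for m in b])    # [58.0287, 129.0658]
print([round(m, 4) for m in y])    # [147.1128, 218.1499]
print(residues_for(b[1] - b[0]))   # ['A']
print(residues_for(113.08406))     # ['L', 'I']
```

The last line shows one limit of reading sequence from mass differences: leucine and isoleucine have the same mass and cannot be told apart this way.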

but fragmentation pattern matching is used more often (we will talk about this later)

a real CID spectrum

getting your sequence – most of the time, we use database searching

a user-defined protein database is subjected to in-silico digestion with the appropriate protease(s) to generate a list of all possible peptides

a theoretical fragmentation pattern is then generated for each peptide

parent ion mass (MS) and fragmentation data (MS/MS) from your analysis are compared to the theoretical data to find the best match

matches may then be subjected to statistical analysis to determine the quality of the ID (p-value)
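
The database search steps above can be reduced to a toy example: filter candidates by parent ion mass, then count how many theoretical fragments are found in the observed MS/MS spectrum. Real search engines use more sophisticated scoring and statistics; the shared-peak count, the tolerances, the function names and the three candidate tripeptides below are assumptions for illustration.

```python
def count_matches(observed, theoretical, tol=0.5):
    """Number of theoretical fragment m/z values seen in the observed spectrum."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def best_match(precursor_mass, observed, candidates, mass_tol=1.0, frag_tol=0.5):
    """candidates: list of (peptide, neutral mass, theoretical fragment m/z list)."""
    best = None
    for peptide, mass, fragments in candidates:
        if abs(mass - precursor_mass) > mass_tol:
            continue  # parent ion mass does not fit, skip this candidate
        score = count_matches(observed, fragments, frag_tol)
        if best is None or score > best[1]:
            best = (peptide, score)
    return best


# Theoretical singly charged b/y ions for three hypothetical candidates.
candidates = [
    ("GGK", 260.1484, [58.0287, 115.0502, 147.1128, 204.1343]),
    ("AGK", 274.1641, [72.0444, 129.0658, 147.1128, 204.1343]),
    ("GAK", 274.1641, [58.0287, 129.0658, 147.1128, 218.1499]),
]
observed_fragments = [58.03, 129.07, 147.11, 218.15]

print(best_match(274.16, observed_fragments, candidates))  # ('GAK', 4)
```

GGK is rejected on parent mass alone, and AGK matches only two of its four fragments, so GAK is the best match.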

spectral matching is also becoming more popular

millions of spectra have been generated and searched already

can keep these spectra in a library, then search for the best match to our newly generated spectra in the library

advantages – can identify “messier” spectra, and is very fast

disadvantages – if your peptide of interest has not been observed before, it won’t be in the library, and libraries may not be compatible between different machine types
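
One common way to compare a new spectrum to library spectra is a normalized dot product (cosine similarity) on binned peaks. The sketch below assumes that approach; the 1 m/z bin width, the function names and the two-entry library are made up for illustration.

```python
import math


def binned(spectrum, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into m/z bins."""
    vector = {}
    for peak_mz, intensity in spectrum:
        key = round(peak_mz / bin_width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector


def cosine(spectrum_a, spectrum_b):
    """Normalized dot product: 1.0 for identical spectra, 0.0 for no overlap."""
    va, vb = binned(spectrum_a), binned(spectrum_b)
    dot = sum(va[k] * vb[k] for k in va if k in vb)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def search_library(query, library):
    """Return the name of the library spectrum most similar to the query."""
    return max(library.items(), key=lambda item: cosine(query, item[1]))[0]


# Hypothetical library of previously identified spectra.
library = {
    "GAK": [(58.03, 20), (129.07, 100), (147.11, 80), (218.15, 40)],
    "AGK": [(72.04, 20), (129.07, 100), (147.11, 80), (204.13, 40)],
}
query = [(58.0, 25), (129.1, 90), (147.1, 85), (218.2, 35)]

print(search_library(query, library))  # GAK
```

Because the comparison uses peak intensities as well as positions, a library match can still succeed on spectra that are too messy for a database search. It also shows the main limit: a peptide that is not in `library` can never be returned.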

Real spectral matching

Mass spectrometry identification of proteins

[Figure: workflow - protein, proteolytic digestion, peptides, LC separation (time, min), peptide selection and fragmentation (m/z), database searching, peptide identification, protein identification]

putting it all together

identification of peptides tells you which proteins were in your sample in the first place

can identify hundreds of proteins in a single MS run

can identify thousands of proteins in multiple MS runs of fractionated samples

questions?

take a break

MBP 1001 Lecture Part 2

Okay, so I understand how to identify peptides - and therefore proteins - so what?

i.e. what can proteomics do for you?

some typical proteomics goals:

global protein analysis

protein machines

protein-protein interactions

PTMs

quantitation

global protein analysis

goal - identification of every protein in a cell, tissue or organism

- can compare state A to state B

e.g. growth conditions, developmental stages, +/- hormone, mitogen or stress

normal vs. disease state?

typically involves extensive upstream protein (or peptide) fractionation

however, some issues:

dynamic range (MS vs serum?)

massive amounts of machine, computer, and analysis time

what proteins are present in each organelle?

protein-protein interactions

most cellular processes are carried out by multiprotein complexes (think transcription, translation, mRNA splicing, proteasomal degradation)

to know your friends is to know you: interacting partners provide invaluable insight into understanding protein function and regulation

interacting partners also change in response to signaling events, providing further clues to function

signaling or metabolic pathways function in a stepwise fashion - understanding how these pathways are structurally connected

tagged protein/MS analysis - general

[Figure: workflow - tag on protein of interest, expression in relevant cell/tissue, isolation, sample fractionation (optional), MS identification]

epitope tagging

short AA sequence recognized by Ab - FLAG, HA, GluGlu, etc.

metal binding - 6xHis

calcium binding - CaM

other strong bimolecular interactions: biotin/avidin, GST/glutathione, chitinBP/chitin, MBP/maltose

TAP (tandem affinity purification) consists of two protein tags, usually separated by a protease cleavage site

*how might a tag affect protein-protein interactions?

*pros/cons of different tag types?

tandem affinity purification (TAP) strategy

1 express POI as a fusion with 2 peptide tags

[Figure: ProtA and CaMBP tags, with a TEV site, fused to the protein of interest, which is bound to its interacting partners]

2 bind to IgG matrix, cleave with Tobacco Etch Virus (TEV) protease

TAP tag strategy (step 2)

3 bind to calmodulin (CaM) matrix

4 elute (EDTA)

5 identify co-purifying proteins

large-scale tagging projects

good:

pull down multiprotein complexes, providing a more realistic picture of interactions

possible to see interactions that are dependent upon PTMs

can do this type of analysis in relevant organism/cell/tissue

not so good:

lots of non-specific interactions; with sepharose, tags, or due to overexpression

detection of low abundance proteins may require scale-up

*how might you deal with these problems?

several large-scale tagging/MS projects now published have identified thousands of novel protein-protein interactions

other problems with large-scale techniques?

all of these techniques are biased toward proteins of higher abundance

-many low stoichiometry interactions may be missed

-usually conducted under a single condition, may miss very interesting regulated interactions

large-scale take-home messages

large-scale prot-prot interaction techniques are extremely valuable for obtaining a snapshot in time, and under a given set of environmental/developmental conditions

this knowledge is extremely valuable - connects formerly unconnected pathways and processes

provides an overview of how protein machines are built and interact with each other

however

-not much fine detail in these studies, much of the data uncorroborated by other methods

-if you are interested in a particular protein, protein machine, or biochemical pathway, present large-scale data will likely be unsatisfactory

-for these types of questions, more focused studies are required

IPs and tagged proteins

high density prot-prot interaction networks

small-scale quantitative proteomics

directed studies

[Figure: gel comparing control and experimental lanes; size markers at 116, 97, 66 and 45 kD]

samples are cleaned up until maximal difference between sample and control is achieved:

*pros/cons?

classical IP analysis of protein complexes

weak interactors are lost

lots of background

extensive optimization required

conditions vary for each sample

specificity of Ab?

what kind of control(s)?

what does my protein do?

generating a high-density interaction map

you have found an interesting protein of unknown function

what does it do?

protein phosphatase 2A (PP2A or PPP2)

[Figure: PPP2 trimer - regulatory (B), catalytic (C) and adapter (A) subunits]

major Ser/Thr phosphatase in mammalian cells

PPP2 functions in most cases as a trimeric complex

conserved from yeast to human

numerous regulatory subunits (B) thought to confer substrate specificity

additional human PP2A-related phosphatases

[Figure: PPP2 regulatory (B), catalytic (C) and adapter (A) subunits, shown beside the PPP4 catalytic (C) and PPP6 catalytic (C) subunits, whose partners are marked "??"]

two additional phosphatases highly related to PPP2C

PPP4C is 67% identical to PPP2C

PPP6C is 58% identical to PPP2C

molecular organization of PPP4 and PPP6 was unknown

who do PPP4 and PPP6 talk to?

Generating a human protein interaction network

Clone protein of interest into a TAP-tag vector

Stably express TAP-tagged proteins in human 293 cells

Harvest cells, and affinity-purify recombinant proteins, as well as associated proteins

Identify all proteins in the complex by mass spectrometry

Obtain the cDNA for each protein identified

[Figure: cycle of the steps above, with a network of proteins labeled A to I]

high density data via iterative TAP-tagging reveals mutually exclusive and cooperative interactions in the PPP2 module

[Figure: PPP2 module - PPP2C, PPP2R1, PPP2R2, IGBP1 and PPP2R5, with three complexes numbered 1, 2, 3]

PTMs

PTMs commonly identified using MS

phosphorylation

ubiquitylation

glycosylation

methylation

acetylation

hundreds of others…

identified primarily via a mass shift of a particular amino acid
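
A sketch of how a mass shift points to a modification. The shifts listed are standard monoisotopic values for a few common PTMs; the function name, the tolerance and the 1000 Da example peptide are assumptions for illustration.

```python
# Monoisotopic mass added to a residue by some common PTMs, in Da.
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    "ubiquitylation (GlyGly remnant left after trypsin)": 114.04293,
}


def explain_shift(observed_mass, unmodified_mass, tol=0.01):
    """List the PTMs whose mass shift matches observed - unmodified."""
    delta = observed_mass - unmodified_mass
    return [name for name, shift in PTM_SHIFTS.items()
            if abs(delta - shift) <= tol]


# Hypothetical peptide of 1000.0 Da observed about 80 Da heavier than expected.
print(explain_shift(1079.9663, 1000.0))  # ['phosphorylation']
```

In a real MS/MS spectrum the same shift appears on every fragment ion that contains the modified residue, which is what localizes the PTM to a particular amino acid.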

Reading a CID spectrum

i. unmodified peptide

ii. phosphopeptide

iii. sumoylated peptide

enrichment of phosphopeptides

IMAC

immunocapture

chemical capture

affinity chromatography

identification of a Ub conjugation site

quantitation and mass spectrometry

two primary methods

spectral counting - characterizing the number of spectra observed for a given protein, in relation to other proteins, or between samples

stable isotopes (e.g. 13C, 15N) - incorporation of stable isotopes into peptides does not alter biochemical properties (e.g. chromatography is unaffected) but changes the mass of the peptide - this, of course, is a property that the MS can see
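
A sketch of what the MS "sees" for a light/heavy peptide pair. It assumes a label that swaps six 12C atoms for 13C, as an example; the function names, the m/z value and the intensities are illustrative.

```python
C13_MINUS_C12 = 1.003355  # mass difference between 13C and 12C in Da


def heavy_mz(light_mz, charge, n_c13):
    """m/z of the heavy partner when n_c13 carbons are replaced by 13C."""
    return light_mz + n_c13 * C13_MINUS_C12 / charge


def heavy_to_light_ratio(light_intensity, heavy_intensity):
    """Relative abundance of the heavy-labeled sample vs. the light one."""
    return heavy_intensity / light_intensity


# Hypothetical doubly charged peptide at m/z 500.0 with six 13C atoms.
print(round(heavy_mz(500.0, charge=2, n_c13=6), 4))  # 503.0101
print(heavy_to_light_ratio(2000.0, 1000.0))          # 0.5
```

The spacing between the pair shrinks with charge: a ~6 Da label gives a ~3 m/z gap for a 2+ peptide. Since the two forms co-elute, their intensities in the same MS scan can be compared directly.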

quantitative proteomics with stable isotopes

[Figure: isotope-coded “light” and “heavy” peptides go through separation (LC); the MS scan (intensity vs. m/z) shows the pair offset by the isotopic mass difference and is used for quantitation, since intensity is proportional to peptide abundance; the MS/MS scan (intensity vs. m/z) is used for identification]

spectral counting in a series of AP-MS analyses

protein A was tagged and isolated, sample subjected to LC-MS/MS

data

protein   condition 1   condition 2   protein B knockout
A         684           599           620
B         131           157           0
C         176           10            204
D         34            0             0

what can you get from this data?
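
One way to start on this question is to express each prey's spectral counts relative to the bait (protein A), so that differences in pull-down efficiency between samples are partly corrected. The counts are taken from the slide; normalizing to the bait is an assumption about how one might proceed, not the only valid approach.

```python
counts = {
    "A": {"condition 1": 684, "condition 2": 599, "protein B knockout": 620},
    "B": {"condition 1": 131, "condition 2": 157, "protein B knockout": 0},
    "C": {"condition 1": 176, "condition 2": 10, "protein B knockout": 204},
    "D": {"condition 1": 34, "condition 2": 0, "protein B knockout": 0},
}


def normalize_to_bait(table, bait="A"):
    """Divide every spectral count by the bait's count in the same sample."""
    return {
        protein: {sample: n / table[bait][sample] for sample, n in row.items()}
        for protein, row in table.items()
    }


normalized = normalize_to_bait(counts)
for protein, row in normalized.items():
    print(protein, {sample: round(value, 3) for sample, value in row.items()})
```

The normalized values make the patterns easier to discuss: C drops sharply in condition 2 but not in the B knockout, while D is not detected in condition 2 or when B is absent.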

isotopic labeling strategies

[Figure: two workflows side by side. Metabolic labeling: SILAC - steps shown are cells grown in “light” SILAC and cells grown in “heavy” SILAC, lysis, affinity purification, proteolytic digestion, fractionation, LC-MS/MS. Chemical labeling: ICAT - steps shown are lysis, affinity purification, labeling with “light” ICAT and labeling with “heavy” ICAT, proteolytic digestion, isolation of ICAT-labeled peptides, fractionation, LC-MS/MS]

absolute quantitation

what if you would like to know absolute levels of your protein/peptide?

e.g. determine stoichiometries of various proteins in protein complexes?

AQUA – peptides synthesized with stable isotopes, to use as internal standards

spiked into sample, and used to quantify endogenous peptide by comparing ion intensities

can be made with standard PTMs
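
The arithmetic behind an internal standard is a simple ratio: a known amount of heavy peptide is spiked in, and the endogenous amount is scaled by the light/heavy intensity ratio. The function name and all numbers below are hypothetical.

```python
def absolute_amount(endogenous_intensity, standard_intensity, standard_fmol):
    """Endogenous peptide amount, from a known spiked-in heavy standard."""
    return standard_fmol * endogenous_intensity / standard_intensity


# Hypothetical example: 50 fmol of heavy standard spiked in for each peptide.
protein_x = absolute_amount(3.0e6, 1.5e6, 50.0)  # endogenous signal is 2x the standard
protein_y = absolute_amount(7.5e5, 1.5e6, 50.0)  # endogenous signal is 0.5x the standard

print(protein_x, protein_y)   # 100.0 25.0
print(protein_x / protein_y)  # 4.0
```

The last line is the kind of number needed for stoichiometry: in this made-up case, four copies of protein X per copy of protein Y in the complex.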

iTRAQ

[Figure: time course (0, 30, 60 and 120 min) with four labels (iTRAQ 114, 115, 116, 117); complexes of proteins A, B and C change in composition over time; workflow - treat cells, isolate complex, proteolyze, iTRAQ label, combine, quantitate and identify]
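
iTRAQ labels are isobaric, so the combined samples are told apart by reporter ions (m/z 114 to 117) in the MS/MS spectrum. A sketch of turning reporter intensities into a relative time course, assuming each label was used for one time point in the order shown; the channel-to-time mapping and the intensities are made up for illustration.

```python
# Assumed mapping of reporter channel to time point (min).
CHANNEL_TO_MINUTES = {114: 0, 115: 30, 116: 60, 117: 120}


def relative_to_reference(reporters, reference=114):
    """Express each reporter ion intensity relative to the reference channel."""
    ref = reporters[reference]
    return {channel: intensity / ref
            for channel, intensity in sorted(reporters.items())}


# Hypothetical reporter ion intensities from one MS/MS spectrum.
reporters = {114: 1000.0, 115: 1500.0, 116: 2000.0, 117: 500.0}

for channel, ratio in relative_to_reference(reporters).items():
    print(CHANNEL_TO_MINUTES[channel], "min:", ratio)
```

Here the same MS/MS spectrum both identifies the peptide (from its fragment ions) and quantifies it across all four samples (from the reporter ions).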
