Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Data Collection and Analysis for High Throughput Quantitative Proteomics: Current Status and Challenges Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington email: [email protected]


Page 1: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Data Collection and Analysis for High Throughput Quantitative Proteomics:

Current Status and Challenges

Ruedi Aebersold, Ph.D.
Institute for Systems Biology
Seattle, Washington
email: [email protected]

Page 2: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Proteomics:

The systematic (quantitative) analysis of the proteins expressed in a cell at a time

Proteome as database:
• Enumerate all the components of a proteome
• Proteome analyzed once

Proteomics as biological or clinical assay:
• Detect dynamic changes in proteome following external or internal perturbations
• Proteome analyzed multiple (infinite) times

Page 3: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Protein Identification Strategy

[Figure: protein mixture, digested to peptides; 1D, 2D, 3D peptide separation (chromatogram, time 12-16 min); tandem mass spectrometry with Q1, Q2 (collision cell) and Q3; tandem mass spectrum (m/z 200-1200). Stages are labeled I, II and III; in stage III the acquired spectrum is compared with theoretical spectra by correlative sequence database searching, giving the protein identification.]
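The comparison of theoretical and acquired spectra in correlative sequence database searching can be sketched roughly as follows. This is a minimal illustration and not the scoring used by any particular search engine: it computes singly charged b- and y-ion m/z values for a candidate peptide from standard monoisotopic residue masses and counts how many of them fall within a tolerance of the acquired peaks. The function names, the tolerance and the peak list are assumptions made for illustration, and the cysteine is treated as unmodified.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as singly charged m/z values.

    b_ions runs b1..b(n-1); y_ions runs y(n-1)..y1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


def count_matches(peptide, acquired_mz, tol=0.5):
    """Count theoretical fragments that lie within tol of any acquired peak."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(
        any(abs(t - mz) <= tol for mz in acquired_mz) for t in theoretical
    )


# Hypothetical acquired peak list for the peptide shown on a later slide.
matches = count_matches("EACDPLR", [175.12, 130.05, 900.0])
```

A real search engine ranks every candidate peptide in the database whose mass matches the precursor and uses a more elaborate similarity measure than a simple peak count.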

Page 4: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Accurate Quantitation Using Isotope Dilution

Sample 1: incorporate stable light isotope
Sample 2 (reference): incorporate stable heavy isotope

Combine samples

Analyze by mass spectrometer

• h/l analytes are chemically identical → identical specific signal in MS
• Ratio of h/l signals indicates ratio of analytes
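As a rough sketch of the last point, the heavy/light signal ratio of each peptide pair can be turned into a relative abundance, and several peptide ratios from the same protein can be averaged. The geometric mean used here is an assumption chosen for illustration; it is not a description of how any specific quantitation tool combines ratios, and the function names and intensities are hypothetical.

```python
import math


def heavy_to_light_ratio(light_intensity, heavy_intensity):
    """Ratio of heavy to light signal for one peptide pair."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity


def protein_ratio(peptide_pairs):
    """Geometric mean of per-peptide h/l ratios.

    peptide_pairs: list of (light_intensity, heavy_intensity) tuples.
    """
    logs = [math.log(heavy_to_light_ratio(l, h)) for l, h in peptide_pairs]
    return math.exp(sum(logs) / len(logs))


# Two hypothetical peptide pairs from one protein, both twice as intense in heavy.
ratio = protein_ratio([(100.0, 200.0), (50.0, 100.0)])
```

Averaging in log space treats a 2-fold increase and a 2-fold decrease symmetrically, which is why it is a common choice for ratio data.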

Page 5: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Isotope Coded Affinity Tags (ICAT)

Heavy reagent: d8-ICAT (X = deuterium)
Light reagent: d0-ICAT (X = hydrogen)

[Structure diagram: biotin tag, linker (heavy or light) carrying eight positions X, thiol-reactive group]

Detection of Cys-containing peptides and accurate quantification using stable isotope dilution
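The d0/d8 reagent pair implies a fixed mass difference between the light and heavy forms of a labeled peptide, which is what lets the two forms be recognized as a pair in the mass spectrum. The following sketch computes the expected m/z spacing from the atomic masses of hydrogen and deuterium; the function name and its parameters are assumptions for illustration.

```python
MASS_H = 1.00782503  # monoisotopic mass of hydrogen (Da)
MASS_D = 2.01410178  # mass of deuterium (Da)


def icat_pair_spacing(charge, n_cys=1, n_labels=8):
    """Expected m/z spacing between d0- and d8-labeled forms of a peptide.

    charge:   charge state of the peptide ion
    n_cys:    number of labeled cysteines in the peptide
    n_labels: hydrogens replaced by deuterium per reagent (8 for d8-ICAT)
    """
    delta_mass = n_cys * n_labels * (MASS_D - MASS_H)
    return delta_mass / charge


spacing_1plus = icat_pair_spacing(1)  # about 8.05 m/z
spacing_2plus = icat_pair_spacing(2)  # about 4.03 m/z
```

A peptide with two labeled cysteines would show twice the spacing at the same charge state.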

Page 6: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Quantitative proteomics by isotope labeling-LC-MS/MS

• Mixture 1 and Mixture 2: isotope-label
• Combine and proteolyze
• Optional fractionation
• Avidin affinity enrichment
• Quantitation and protein identification

[Figure: MS spectrum (m/z 550-580, relative intensity 0-100) showing a light/heavy peptide pair; MS/MS spectrum (m/z 200-800, relative intensity 0-100) identifying NH2-EACDPLR-COOH]

Compatible with any separation/fractionation method at protein/peptide level.

Page 7: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Stable Isotope Labeling Strategies

PROTEIN LABELING
• Metabolic stable isotope labeling
• Isotope tagging by chemical reaction
• Stable isotope incorporation via enzyme reaction

DATA COLLECTION
Mass spectrometry

DATA ANALYSIS
[Diagram: three parallel workflows with Digest and Label steps, each ending in an intensity vs. m/z spectrum]

Page 8: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Quantitative Proteomics Technology

Protein identification: Automated peptide tandem mass spectrometry of complex peptide mixtures

Protein quantification: Isotope dilution

Selective chemical reactions: reduction of sample complexity; selective analyte isolation

Results

Identification of proteins in sample and quantitative profiles

Page 9: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington


Current capacity: ~1000 proteins per day/instrument
Total yeast lysate: ~2000 proteins identified and quantified

Page 10: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington


In 1991, all the world’s labs combined had identified just about 2000 genes

Page 11: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Current Limitations (and Potential Solutions)

• The efficiency problem

• The validation problem

• The biological inference problem

Page 12: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Standard Method for Complex Peptide Mixture Analysis

Cation Exchange → RP-HPLC → ESI-MS/MS

Page 13: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Proteome Analysis: The Analytical Challenges

Yeast Proteome
• Expected number of ORFs: 6118
• Expected number of tryptic peptides: ~350,000
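An estimate of the number of tryptic peptides can be obtained by digesting each predicted protein sequence in silico. The sketch below applies the usual trypsin rule (cleave after K or R, but not when the next residue is P); the function name and the `missed_cleavages` parameter are assumptions for illustration, and the slide does not say how its own figure was derived.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return tryptic peptides of a protein sequence.

    Cleaves C-terminal to K or R unless the following residue is P.
    With missed_cleavages > 0, peptides spanning up to that many
    uncleaved sites are also returned.
    """
    # Positions at which the chain is cut (index of the residue after the cut).
    sites = [
        i + 1
        for i in range(len(sequence) - 1)
        if sequence[i] in "KR" and sequence[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


# Toy sequence: the R followed by P is not cleaved.
peptides = tryptic_digest("AKRPLKGR")
```

Summing `len(tryptic_digest(seq))` over all ORF translations, usually with a minimum length filter, gives a proteome-wide count.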

Page 14: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Synchronous Timepoint Samples Compared to Reference Sample

Timepoint Samples from Yeast Cells Synchronously Transiting the Cell Cycle

Asynchronous Reference Sample

Page 15: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Data Summary

        T0     T30    T60    T90    T120
T0      678
T30     320    998
T60     342    555    1006
T90     340    604    571    1243
T120    319    626    587    684    1047

[Figure: unlabeled counts 1648, 1523, 1448, 1713, 1229, 1095, 1184, 1112, 892, 1055, 1140, 921, 1051, 871, 960]

• 2735/6562 proteins quantified across all timepoints (42%)
• 696 proteins quantified in every experiment
• 1513 proteins quantified in at least one timepoint
• 34,400 peptides quantified on average per timepoint
• >1 million mass spectra collected

Page 16: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Features: 2720

Pep3D: Xiao-jun Li et al. submitted

Page 17: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

CIDs: 1633

Features: 2720

Page 18: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Features: 2720

CIDs: 1633

IDs: 363

ID/CID: 22%

ID/feature: 13%
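The two percentages follow directly from the three counts on the slide. A short worked calculation, with variable names chosen for illustration:

```python
features = 2720  # peptide features detected in the LC-MS map
cids = 1633      # features selected for fragmentation (CID)
ids = 363        # fragmentation spectra leading to an identification

cid_per_feature = cids / features  # share of features that were fragmented
id_per_cid = ids / cids            # share of CID spectra that were identified
id_per_feature = ids / features    # share of features that were identified

summary = {
    "CID/feature": round(100 * cid_per_feature),
    "ID/CID": round(100 * id_per_cid),
    "ID/feature": round(100 * id_per_feature),
}
```

On these numbers about 60% of the features were fragmented, and 22% of those spectra yielded an identification, giving the 13% overall yield shown.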

Page 19: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Possible Solutions

• Better separation technology

• Selective peptide isolation

• Smart precursor ion selection

Page 20: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

[Chart: Number of peptides identified in each FFE fraction (average overlap: 29%). x-axis: fraction number (1-27); y-axis: number of peptides (0-400). Series: peptides overlapping with the previous fraction; peptides unique to the fraction.]

[Chart: Number of peptides identified in each SCX fraction (average overlap: 52%). x-axis: fraction number (1-27); y-axis: number of peptides (0-700). Series: peptides overlapping with the previous fraction; peptides unique to the fraction.]

• Tryptic yeast digest separated by FFE-IEX or SAX
• 30 fractions collected and analyzed by capLC-MS/MS
• Overlap: same peptide identified in adjacent fractions
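The overlap between adjacent fractions can be computed from the sets of peptides identified in each fraction. In this sketch the overlap of a fraction is taken as the share of its peptides that were also identified in the previous fraction; that choice of denominator, the function name and the toy data are assumptions for illustration.

```python
def adjacent_overlap(fractions):
    """Overlap of each fraction with the one before it.

    fractions: list of sets of peptide sequences, in elution order.
    Returns one value per fraction from the second onward: the share of
    that fraction's peptides also found in the previous fraction.
    """
    overlaps = []
    for previous, current in zip(fractions, fractions[1:]):
        shared = len(current & previous)
        overlaps.append(shared / len(current) if current else 0.0)
    return overlaps


# Three hypothetical fractions.
overlaps = adjacent_overlap([{"AAK", "LLR"}, {"LLR", "GGK"}, {"TTR"}])
average_overlap = sum(overlaps) / len(overlaps)
```

A lower average overlap means each peptide is confined to fewer fractions, so less instrument time is spent sequencing the same peptide again.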

Page 21: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

[Chart: Peptide overlap in SCX. x-axis: number of fractions one peptide is distributed to (1-26); y-axis: number of peptides (0-1400).]

[Chart: Peptide overlap in FFE. x-axis: number of fractions one peptide is distributed to (1-26); y-axis: number of peptides (0-2800).]

Chart annotations: 92%; 68%

Page 22: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Possible Solutions

• Better separation technology
• Selective peptide isolation
  – Zhang H, et al. Curr. Op. Chem. Biol. (2004) 8: 66-75.
  – Aebersold R. Nature (2003) 422(6928): 115-6.
• Smart precursor ion selection
  – Griffin T, et al. Anal Chem. (2003) 75: 867-74.
  – Griffin et al. J Am Soc Mass Spectrom. (2001) 12: 1238-46.

Page 23: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Summary: Efficiency Problem

• Only a (small) subset of peptides present is identified

• Current separation strategies do not have sufficient resolving power

• MS/MS of every peptide in every experiment is a bottleneck of current MS-based proteomics

• LC-ESI MS/MS wastes a high fraction of MS/MS cycles sequencing precursor ions that do not lead to a positive identification

• Most positive identifications are not informative in profiling experiments

• Smart precursor ion selection is required

Page 24: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Current Limitations (and Potential Solutions)

• The efficiency problem

• The validation problem

• The biological inference problem

Page 25: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Protein Identification by MS/MS

protein sample → peptide mixture → MS/MS spectra → peptide identifications → protein identifications

[Diagram labels: A B C D; A B C]

Page 26: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Protein Identification by MS/MS

protein sample → peptide mixture → MS/MS spectra → peptide identifications → protein identifications

[Diagram labels: A B C D; A B C]

Levels: MS/MS spectrum level; peptide level; protein level

Database search tools:
- Sequest
- Mascot
- SpectrumMill
- Etc.

Page 27: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

OUTPUT FROM SEARCH ALGORITHM

sort by search score

Page 28: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Threshold Model

sort by search score

threshold: results above are taken as “correct”, results below as incorrect

SEQUEST: Xcorr > 2.0, ΔCn > 0.1

MASCOT: Score > 47
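The threshold model amounts to a fixed cutoff on the search scores. A minimal sketch using the example cutoffs from the slide follows; the function names and the sample scores are hypothetical.

```python
def passes_sequest_threshold(xcorr, delta_cn, min_xcorr=2.0, min_delta_cn=0.1):
    """Accept a SEQUEST result when both scores exceed their cutoffs."""
    return xcorr > min_xcorr and delta_cn > min_delta_cn


def passes_mascot_threshold(score, min_score=47):
    """Accept a MASCOT result when its score exceeds the cutoff."""
    return score > min_score


# Hypothetical search results as (Xcorr, delta Cn) pairs.
results = [(4.5, 0.30), (3.4, 0.05), (1.3, 0.20), (2.2, 0.15)]
accepted = [r for r in results if passes_sequest_threshold(*r)]
```

Every accepted result is treated as equally "correct", which is why a threshold alone gives no measure of confidence for an individual identification.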

Page 29: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Difficulty Interpreting Protein Identifications based on MS/MS

• Different search score thresholds used to filter data

• Unknown and variable false positive error rates

• No reliable measures of confidence

Page 30: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Statistical Model

entire dataset: for each MS/MS spectrum, the best match and its database search score

Spectrum      Peptide   Score
Spectrum 1    LGEYGH    4.5
Spectrum 2    FQSEEQ    3.4
Spectrum 3    FLYQE     1.3
…             …         …
Spectrum N    EIQKKF    2.2

Page 31: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Statistical Model

entire dataset:

Spectrum      Peptide   Score   Probability
Spectrum 1    LGEYGH    4.5     1.0
Spectrum 2    FQSEEQ    3.4     0.97
Spectrum 3    FLYQE     1.3     0.01
…             …         …       …
Spectrum N    EIQKKF    2.2     0.3

EM mixture model algorithm learns the most likely distributions among correct and incorrect peptide assignments given the observed data (unsupervised learning)

[Figure: score distributions of incorrect and correct assignments, with the point p=0.5 marked]
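The idea of learning the "correct" and "incorrect" score distributions from the data can be sketched with a two-component mixture fitted by expectation-maximization. This is a deliberately simplified illustration: both components are assumed Gaussian and a single score is used, which is not necessarily how the model described in the talk is built. The function names, the starting values and the toy scores are all assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_component(scores, n_iter=200, min_sigma=0.05):
    """Fit 'incorrect' (low) and 'correct' (high) Gaussians to the scores.

    Returns, for each score, the posterior probability of being correct.
    """
    lo, hi = min(scores), max(scores)
    mu = [lo, hi]  # component 0 = incorrect, component 1 = correct
    sigma = [max((hi - lo) / 4.0, min_sigma)] * 2
    prior_correct = 0.5
    posteriors = [0.5] * len(scores)
    for _ in range(n_iter):
        # E-step: posterior probability that each assignment is correct.
        posteriors = []
        for x in scores:
            wrong = (1.0 - prior_correct) * normal_pdf(x, mu[0], sigma[0])
            right = prior_correct * normal_pdf(x, mu[1], sigma[1])
            total = wrong + right
            posteriors.append(right / total if total > 0 else 0.5)
        # M-step: re-estimate each component from the weighted scores.
        weights = [[1.0 - p for p in posteriors], posteriors]
        for k in (0, 1):
            w_sum = max(sum(weights[k]), 1e-12)
            mu[k] = sum(w * x for w, x in zip(weights[k], scores)) / w_sum
            var = sum(
                w * (x - mu[k]) ** 2 for w, x in zip(weights[k], scores)
            ) / w_sum
            sigma[k] = max(math.sqrt(var), min_sigma)
        prior_correct = min(max(sum(posteriors) / len(scores), 1e-6), 1 - 1e-6)
    return posteriors


# Toy scores with a low-scoring and a high-scoring group.
scores = [0.8, 1.0, 1.1, 1.3, 1.2, 0.9, 1.4, 3.4, 3.9, 4.5, 4.1, 3.6]
probabilities = em_two_component(scores)
```

No assignment is labeled in advance; the probabilities come only from the shape of the observed score distribution, which is what makes the approach unsupervised.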

Page 32: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Threshold Model: Bad Discrimination and Inconsistency

Sensitivity: fraction of all correct results passing filter

Error Rate: fraction of all results passing filter that are incorrect

[Plot: sensitivity vs. error rate, with the ideal spot marked and SEQUEST thresholds (from literature) shown as points]

test data: A. Keller et al. OMICS 6(2), 207 (2002)
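Both quantities can be computed for any filter once a dataset with known correct and incorrect assignments is available. The sketch below follows the two definitions on the slide; the function name, the filter and the labeled results are hypothetical.

```python
def sensitivity_and_error_rate(results, passes):
    """Evaluate a filter on results with known correctness.

    results: list of (score, is_correct) tuples
    passes:  function taking a score and returning True if it passes
    """
    passing = [r for r in results if passes(r[0])]
    total_correct = sum(1 for _, ok in results if ok)
    correct_passing = sum(1 for _, ok in passing if ok)
    # Sensitivity: fraction of all correct results passing the filter.
    sensitivity = correct_passing / total_correct if total_correct else 0.0
    # Error rate: fraction of passing results that are incorrect.
    error_rate = (
        (len(passing) - correct_passing) / len(passing) if passing else 0.0
    )
    return sensitivity, error_rate


# Hypothetical labeled results and a score cutoff of 2.0.
labeled = [(4.5, True), (3.4, True), (2.2, False), (1.3, False), (1.9, True)]
sens, err = sensitivity_and_error_rate(labeled, lambda score: score > 2.0)
```

The ideal spot on the plot corresponds to a sensitivity of 1 with an error rate of 0.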

Page 33: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Discriminating Power of Peptide Prophet

Sensitivity: fraction of all correct results passing filter

Error Rate: fraction of all results passing filter that are incorrect

[Plot: sensitivity vs. error rate, with the ideal spot marked, SEQUEST thresholds (from literature) shown as points, and the probability model shown as a curve]

Improved discrimination: more identifications (for the same error rate). Keller et al. Anal. Chem. 2003

Page 34: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

>sp|P02754|LACB_BOVIN BETA-LACTOGLOBULIN PRECURSOR (BETA-LG) (ALLERGEN BOS D 5) - Bos taurus (Bovine).

MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKIDALNENKLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI

TPEVDDEALEK : p = 0.96

TPEVDDEALEKFDK : p = 0.96

KPTPEGDLEILLQK : p = 0.83

LSFNPTQLEEQCHI : p = 0.65
LSFNPTQLEEQCHI : p = 0.76

sp|P02754|LACB_BOVIN Probability = ???

ProteinProphet™ software combines probabilities of peptides assigned to MS/MS spectra to compute accurate probabilities that corresponding proteins are present

Protein Identification

Nesvizhskii et al. Anal Chem. (2003) 75: 4646-58.
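One simple way to combine peptide probabilities into a protein probability is to ask how likely it is that at least one of the peptide assignments is correct. The sketch below uses that rule, keeping the highest probability when the same peptide was assigned to several spectra. It is a naive illustration under an independence assumption and leaves out any further adjustments the software makes; the function name is hypothetical.

```python
def protein_probability(peptide_probs):
    """Probability that at least one distinct peptide assignment is correct.

    peptide_probs: list of (peptide_sequence, probability) tuples; repeated
    sequences are collapsed to their highest probability.
    """
    best = {}
    for peptide, p in peptide_probs:
        best[peptide] = max(p, best.get(peptide, 0.0))
    prob_all_wrong = 1.0
    for p in best.values():
        prob_all_wrong *= 1.0 - p
    return 1.0 - prob_all_wrong


# Peptide probabilities shown on the slide for beta-lactoglobulin.
lacb = protein_probability([
    ("TPEVDDEALEK", 0.96),
    ("TPEVDDEALEKFDK", 0.96),
    ("KPTPEGDLEILLQK", 0.83),
    ("LSFNPTQLEEQCHI", 0.65),
    ("LSFNPTQLEEQCHI", 0.76),
])
```

With four distinct, well-supported peptides the combined probability is very close to 1; a protein supported by a single peptide simply inherits that peptide's probability.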

Page 35: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Issues for Protein Identification

• Many peptides are present in more than a single database protein entry

ProteinProphet apportions such peptides among all corresponding proteins to derive simplest list of proteins that explain observed peptides

• Peptides corresponding to ‘single-hit’ proteins are less likely to be correct than those corresponding to ‘multi-hit’ proteins

ProteinProphet learns by how much peptide probabilities should be adjusted to reflect this protein grouping information

Page 36: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Amplification of False Positive Error Rate from Peptide to Protein Level

In the sample (enriched for ‘multi-hit’ proteins):
• Prot A: Peptide 1 (+), Peptide 2 (+)
• Prot B: Peptide 3 (+), Peptide 4 (+), Peptide 5 (+)

Not in the sample (enriched for ‘single hits’):
• Five proteins with one peptide each: Peptide 6, Peptide 7, Peptide 8, Peptide 9, Peptide 10

5 correct (+)

Peptide Level: 50% False Positives

Protein Level: 71% False Positives
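The two percentages can be reproduced by counting. The sketch assumes the slide's example is read as two proteins in the sample supported by five correct peptides, plus five single-hit proteins each supported by one incorrect peptide; the function and variable names are hypothetical.

```python
def false_positive_rates(protein_to_peptides, correct_peptides):
    """Return (peptide-level, protein-level) false positive rates.

    A protein counts as a false positive when none of its peptides is correct.
    """
    all_peptides = {p for peps in protein_to_peptides.values() for p in peps}
    wrong_peptides = all_peptides - correct_peptides
    peptide_fp = len(wrong_peptides) / len(all_peptides)
    wrong_proteins = [
        protein
        for protein, peps in protein_to_peptides.items()
        if not set(peps) & correct_peptides
    ]
    protein_fp = len(wrong_proteins) / len(protein_to_peptides)
    return peptide_fp, protein_fp


# Two multi-hit proteins and five single-hit proteins, as read from the slide.
assignments = {
    "Prot A": [1, 2],
    "Prot B": [3, 4, 5],
    "Prot C": [6], "Prot D": [7], "Prot E": [8], "Prot F": [9], "Prot G": [10],
}
peptide_fp, protein_fp = false_positive_rates(assignments, {1, 2, 3, 4, 5})
```

Correct peptides cluster on a few true proteins while each incorrect peptide tends to point to a different protein, so 5 of 10 wrong peptides become 5 of 7 wrong proteins.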

Page 37: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Serum Protein Identifications from Large-scale (~375 run) Experiment

Data Filter                  # ids   # non-single hits   # single-hits
Publ. Threshold model #1     2257    359                 1898
Publ. Threshold model #2     2742    441                 2301
ProteinProphet, p ≥ 0.5      713     511                 202    (predicted error rate: 7%)

Reference: H. Zhang et al., in prep

Page 38: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Consistency of Manual Validation of SEQUEST Search Results

[Chart: search results by manual authenticator. Categories: correct validation; incorrect validation; validation withheld.]

Page 39: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Data Analysis Pipeline

Tasks for a proteomic analysis pipeline, with tools:
• Suitable input: mzXML
• Peptide assignment: COMET, ProbID
• Validation: Peptide Prophet
• Protein assignment: Protein Prophet
• Quantitation: ASAPRatio
• Interpretation: SBEAMS, Cytoscape

Page 40: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Data Analysis Summary:

• Processing of data collected from different platforms, samples, experiments, operators requires transparent methods to score data

• Publication and relational database analysis require consistently scored data

• Tools assigning probability-based scores are essential

• Openly accessible, transparent (OS) tools bring in new talent and lead to community-improved tools

Nesvizhskii and Aebersold (2004) Drug Discov Today. 9: 173-81
http://www.proteomecenter.org/software.php

Page 41: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Current Limitations (and Potential Solutions)

• The efficiency problem

• The validation problem

• The biological inference problem

Page 42: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

[Workflow: IFN-treated and mock-treated samples; ICAT label (C12 and C13); combined C12/C13 sample; HPLC-MS/MS]

Wei Yan et al

Page 43: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Name | Cellular pathway | Probability | ASAPRatio Mean | ASAPRatio Std.
GSA7: ubiquitin activating enzyme E1-like protein | ubiquitination and protein degradation | 0.98 | 2.844 | 0.663
NMI: N-myc (and STAT) interactor | signaling pathway; transcription; apoptosis | 0.57 | 2.875 | 0.138
ETFA: electron-transfer-flavoprotein, alpha polypeptide (glutaric aciduria II) | electron transfer | 0.45 | 2.890 | 0.484
M96: likely ortholog of mouse metal response element binding transcription factor 2 | transcription | 0.98 | 2.923 | 0.390
LOC151636: rhysin 2 | cytoskeleton and intracellular transport? | 1 | 2.975 | 0.231
KNS2: kinesin 2 60/70kDa | cytoskeleton and intracellular transport | 1 | 3.140 | 0.335
ACACA: acetyl-Coenzyme A carboxylase alpha | lipid and fatty acid metabolism | 1 | 3.351 | 0.259
CABC1: chaperone, ABC1 activity of bc1 complex like (S. pombe) | chaperone and protein folding | 0.99 | 3.533 | 1.659
NUDT2: nudix (nucleoside diphosphate linked moiety X)-type motif 2 | nucleobase, nucleoside, nucleotide and nucleic acid metabolism | 0.98 | 3.684 | 0.224
KIAA1276: KIAA1276 protein | unknown and ESIs | 0.62 | 3.815 | 0.058
PRKR protein kinase, interferon-inducible double stranded RNA dependent | translation and ribosomal protein; anti-viral response | 1 | 3.963 | 0.659
EEF1A protein [Fragment] | translation and ribosomal protein; GTP binding | 1 (1) (1) | 4.164 (2.741) (2.2) | 1.284 (0.195) (0.394)
CD7: CD7 antigen (p41) | immune response | 0.57 | 4.523 | 2.204
PLCD1: phospholipase C, delta 1 | signaling pathway; lipid metabolism | 0.69 | 4.569 | 0.116
MTP: microsomal triglyceride transfer protein (large polypeptide, 88kDa) | lipid and fatty acid metabolism | 0.97 | 4.748 | 0.751
G1P2: interferon, alpha-inducible protein (clone IFI-15K) | cytoskeleton and intracellular transport | 0.98 | 4.858 | 0.661
GNB1: guanine nucleotide binding protein (G protein), beta polypeptide 1 | G-protein coupled receptor and G-protein signaling | 1 | 5.845 | 0.133
FLJ32915: hypothetical protein FLJ32915 | unknown | 0.73 | 6.054 | 4.883
IFIT4: interferon-induced protein with tetratricopeptide repeats 4 | unknown | 1 | 6.230 | 0.794
AHCY: S-adenosylhomocysteine hydrolase | one-carbon compound metabolism | 0.93 | 6.279 | 0.936
CSA_PPIase: PEPTIDYL PROLYL CIS TRANS ISOMERASE | chaperone and protein folding | 1 | 8.104 | 1.070
ADRM1: adhesion regulating molecule 1 | adhesion molecule and extracellular matrix protein | 0.79 | 9.508 | 1.043
PASK PAS domain containing serine/threonine kinase | signaling pathway | 0.49 | 12.006 | 1.024
GPR111: G protein-coupled receptor 111 | G-protein coupled receptor and G-protein signaling | 0.63 | 21.270 | 4.741
IFIT1: interferon-induced protein with tetratricopeptide repeats 1 | unknown and ESIs | 0.48 | 9999 | -1
UBE2L6: ubiquitin-conjugating enzyme E2L 6 | ubiquitination and protein degradation | 0.57 | 9999 | -1
DNAH11: dynein, axonemal, heavy polypeptide 11 | motor protein complex | 0.94 | 9999 | -1

Name | Cellular pathway | Probability | ASAPRatio Mean | ASAPRatio Std.
HIC1: hypermethylated in cancer 1 | transcription suppression | 0.94 | 0.000 | -1.250
MIG-6: Gene 33/Mig-6 | signaling pathway | 0.99 | 0.000 | -1.250
KRT6: keratin 6 | cytoskeleton and intracellular transport | 1 | 0.003 | 0.008
Hypothetical protein FLJ21140 | unknown | 0.71 | 0.043 | 0.064
G2AN: alpha glucosidase II alpha subunit | carbohydrate metabolism | 1 | 0.198 | 0.033
Ig lambda chain C regions | immune response | 0.98 | 0.265 | 0.110
FASN: fatty acid synthase | lipid and fatty acid metabolism | 0.99 | 0.304 | 0.100
WNT9A: wingless-type MMTV integration site family, member 9A | signaling pathway | 0.99 | 0.316 | 0.091
PPGB: protective protein for beta-galactosidase (galactosialidosis) | chaperone and protein folding; protein protection | 1 | 0.323 | 0.084
OAT: ornithine aminotransferase (gyrate atrophy) | amino acid and peptide metabolism | 0.98 | 0.331 | 0.018
MAPRE1: microtubule-associated protein, RP/EB family, member 1 | cytoskeleton and intracellular transport | 0.98 | 0.339 | 0.016
CPSF4: cleavage and polyadenylation specific factor 4, 30kDa | RNA splicing and processing | 0.96 | 0.378 | 0.154
TUBA6: tubulin alpha 6 | cytoskeleton and intracellular transport; GTP binding | 1 | 0.383 | 0.165
FXR2: fragile X mental retardation, autosomal homolog 2 | RNA binding and ribosomal association | 0.95 | 0.391 | 0.074
FACL4: fatty-acid-Coenzyme A ligase, long-chain 4 | lipid and fatty acid metabolism | 0.98 | 0.416 | 0.081
KIAA0007: KIAA0007 protein | unknown | 1 | 0.426 | 0.014
SRRM2: serine/arginine repetitive matrix 2 | RNA splicing and processing | 0.82 | 0.434 | 0.224
GPS1: G protein pathway suppressor 1 | G-protein coupled receptor and G-protein signaling | 0.98 | 0.455 | 0.138
TRA1: tumor rejection antigen (gp96) 1 | chaperone and protein folding | 1 | 0.452 | 0.165
SARDH: sarcosine dehydrogenase | electron transfer | 0.98 | 0.484 | 0.008
KRT10: keratin 10 (epidermolytic hyperkeratosis; keratosis palmaris et plantaris) | cytoskeleton and intracellular transport | 0.97 | 0.495 | 0.055
SPK: symplekin | unknown | 1 | 0.496 | 0.029
MGC3207: hypothetical protein MGC3207 | translation and ribosomal protein | 0.61 | 0.499 | 0.071

             P ≥ 0.9   P ≥ 0.4
P3           671       748
S100         270       330
P100         523       590
Sum          1464      1668
Unique ID    1113      1272

54 IFN-induced proteins (2-fold)

15 previously reported

39 novel

23 IFN-repressed proteins (0.5-fold)
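The induced and repressed groups are defined by fold-change cutoffs on the IFN/mock abundance ratio. A minimal sketch of that classification follows, using two mean ratios from the slide's table (PRKR 3.963, FASN 0.304) and the value 1.398 from the pathway slide; treating the cutoffs as inclusive and the function name are assumptions.

```python
def classify_by_ratio(ratio, induced_cutoff=2.0, repressed_cutoff=0.5):
    """Label a protein from its IFN/mock abundance ratio."""
    if ratio >= induced_cutoff:
        return "induced"
    if ratio <= repressed_cutoff:
        return "repressed"
    return "unchanged"


ratios = {"PRKR": 3.963, "FASN": 0.304, "example": 1.398}
labels = {name: classify_by_ratio(r) for name, r in ratios.items()}
```

In practice the ratio's standard deviation and the identification probability would also be considered before calling a protein regulated.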

Page 44: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Lots of data - what does it mean?

Page 45: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

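Interferon (IFN) Pathway

Katze et al (2002) 2: 675

[Pathway figure annotated with IFN / Mock abundance ratios. Pathway members shown: PKR, 2’,5’-OAS, Mx, ADAR, IRFs, MHC.]

• PKR: 3.963 ± 0.659
• β-2-microglobulin (MHC I): 2.768 ± 0.583
• IFI-30 (MHC II): 2.219 ± 0.183
• Other annotations in the figure: 2.215 ± 0.079; 2.460 ± 0.076; 2.359 ± 0.149; 1.398 ± 0.118; Not identified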

Page 46: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

GO Analysis of Interferon regulated proteins

[Figure: GO hierarchy spanning GO levels 3-12. Terms shown: Physiological process; Response to stress; Response to external stimulus; Pathogenesis; Metabolism; Death; Cell growth and/or maintenance; Cellular defense response; Defense response; Immune response; Cell death; Transport; Cell organization; Cell growth; Cytoplasm organization; Nuclear organization; Fatty acid metabolism; Amino acid metabolism; Nitrogen metabolism; DNA metabolism; Catabolism]

Page 47: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Islands of intense knowledge in ocean of unknown

[Figure labels: Hormone responses; Cell motility; Energy metabolism; Transcription]

Page 48: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Charting the path between landmarks

[Figure labels: Hormone responses; Cell mobility; Energy metabolism; Transcription; Unassigned observations]

Page 49: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Walking down the interaction map

[Diagram: interaction map with nodes A, B, C, D, E, F, G, H, I]

Page 50: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

First round of TAP-tagging: Identification of IGBP1 and TIP41 interactors

• Baits: IGBP1, TIP41
• Catalytic subunits, PP2A-type phosphatases: PPP2CA, PPP2CB, PPP4C, PPP6C
• CCT complex: TCP1, CCT2, CCT3, CCT4, CCT5, CCT6A, CCT7, CCT8
• Uncharacterized proteins: PPP6R2A*, PPP6R1*, PPP4R2*

Anne-Claude Gingras

Page 51: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Human phosphatase-interaction network: Segregation into functional modules

[Network figure. Phosphatases: PP2A a, PP2B, PP2C, PP4C, PP6C. Module labels: Centrosome; Meiosis. Exit from mitosis; Actin cytoskeleton. G1 S transition.]

Page 52: Ruedi Aebersold, Ph.D. Institute for Systems Biology Seattle, Washington

Acknowledgements

Separation strategies: Hookeun Lee, Eugene Yi, Mingliang Yi

Abundance dependent MS/MS: Tim Griffin, Chris Lock (Sciex)

Software development and statistical models: Eric Deutsch, Xiao-Jun Li, Jimmy Eng, Alex Nesvizhskii, Andy Keller, Benno Schwikowski, Patrick Pedrioli, Ning Zhang

Inference of biological function: Wei Yan, Anne-Claude Gingras, Cytoscape project (www.cytoscape.org)

Funding: NIH (NCI, NCRR, NIDA, NHLBI), Merck, ABI