Proteomics Informatics (BMSC-GA 4437)
Course Director
David Fenyö
Contact information
http://fenyolab.org/presentations/Proteomics_Informatics_2014/
Proteomics Informatics – Learning Objectives
Be able to analyze proteomics data sets and understand the limitations of the results.
Proteomics Informatics – Syllabus
Week 1 Overview of proteomics (1/28/2014 at 4 pm in TRB 718)
Week 2 Overview of mass spectrometry (2/4/2014 at 4 pm in TRB 718)
Week 3 Analysis of mass spectra: signal processing, peak finding, and isotope clusters (2/11/2014 at 4 pm in TRB 119)
Week 4 Protein identification I: searching protein sequence collections and significance testing (2/18/2014 at 4 pm in TRB 718)
Week 5 Protein identification II: de novo sequencing (2/25/2014 at 4 pm in TRB 718)
Week 6 Databases, data repositories and standardization (3/4/2014 at 4 pm in TRB 718)
Week 7 Proteogenomics (3/11/2014 at 4 pm in TRB 718)
Week 8 Protein quantitation I: Overview (3/18/2014 at 4 pm in TRB 718)
Week 9 Protein quantitation II: Targeted (3/25/2014 at 4 pm in TRB 718)
Week 10 Protein characterization I: post-translational modifications (4/1/2014 at 4 pm in TRB 718)
Week 11 Protein characterization II: Protein interactions (4/10/2014 at 4 pm in TRB 718)
Week 12 Molecular Signatures (4/17/2014 at 4 pm in TRB 718)
Week 13 Presentations of projects (4/22/2014 at 4 pm in TRB 718)
Proteomics Informatics – Overview of Proteomics (Week 1)
• Why proteomics?
• Bioinformatics
• Overview of the course
Motivating Example: Protein Regulation
Geiger et al., “Proteomic changes resulting from gene copy number variations in cancer cells”, PLoS Genet. 2010 Sep 2;6(9). pii: e1001090.
Motivating Example: Protein Complexes
Alber et al., Nature 2007
Motivating Example: Signaling
Choudhary & Mann, Nature Reviews Molecular Cell Biology 2010
Bioinformatics
[Figure: Biological System, Samples, Measurements, Raw Data, Information; connecting steps: Experimental Design, Data Analysis]
Mass Spectrometry Based Proteomics
[Figure: workflow. Lysis, Fractionation, Digestion, Mass spectrometry (MS); then Peak Finding, Charge determination, De-isotoping, Integrating Peaks, Searching; result: Identified and Quantified Proteins]
Proteomics Informatics – Overview of Mass spectrometry (Week 2)
[Figure: MS instrument. Ion Source, Mass Analyzer, Detector; output spectrum of intensity vs. mass/charge]
[Figure: MS/MS instrument. Ion Source, Mass Analyzer 1, Fragmentation, Mass Analyzer 2, Detector; b and y fragment ions]
[Figure: LC-MS/MS. LC, Ion Source, Mass Analyzer 1, Fragmentation, Mass Analyzer 2, Detector; a series of spectra (intensity vs. mass/charge) recorded over Time]
Proteomics Informatics – Analysis of mass spectra: signal processing, peak finding, and isotope clusters (Week 3)
[Figure: mass spectrum, Intensity vs. m/z]
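Charge determination from an isotope cluster can be illustrated with a short sketch. It relies on the fact that neighboring isotope peaks are separated by roughly 1.00335/z on the m/z axis (the 13C/12C mass difference divided by the charge z). The function names and the example peak list are hypothetical and not taken from the course material.

```python
C13_C12_DELTA = 1.00335  # Da, mass difference between 13C and 12C
PROTON = 1.007276        # Da, mass of a proton


def estimate_charge(mz_peaks, max_charge=6):
    """Estimate the charge state from the m/z values of one isotope cluster."""
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        raise ValueError("need at least two isotope peaks")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    z = round(C13_C12_DELTA / mean_spacing)
    # Clamp to a plausible range (assumed upper limit).
    return max(1, min(max_charge, z))


def neutral_mass(mz, z):
    """Convert the m/z of an [M+zH]z+ ion to the neutral mass M."""
    return (mz - PROTON) * z


# Hypothetical isotope cluster with 0.5 m/z spacing.
cluster = [762.40, 762.90, 763.40]
z = estimate_charge(cluster)
mass = neutral_mass(cluster[0], z)
print(z, round(mass, 3))
```

Averaging the spacings is only one simple choice; real peak lists usually also need noise filtering and a check that the intensities look like an isotope envelope.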
Proteomics Informatics – Protein identification I: searching protein sequence collections and significance testing (Week 4)
[Figure: workflow. Lysis, Fractionation, Digestion, LC-MS, MS/MS]
Searching the MS/MS data against a Sequence DB:
• Pick Protein
• Pick Peptide
• All Fragment Masses
• Compare, Score, Test Significance
• Repeat for all peptides
• Repeat for all proteins
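The search loop on this slide (pick protein, pick peptide, calculate all fragment masses, compare and score, repeat) can be sketched as follows. This is a minimal illustration under several assumptions that are not from the course: tryptic digestion without missed cleavages, singly charged b and y ions only, and a score that simply counts matched fragments. Significance testing is not implemented. The two-protein database and the spectrum are synthetic.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
WATER = 18.0106   # Da
PROTON = 1.00728  # Da


def tryptic_peptides(protein, min_len=5):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def fragment_mz(peptide):
    """m/z of all singly charged b and y ions."""
    masses = [MONO[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(peptide))]
    y = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(peptide))]
    return b + y


def score(peptide, observed, tol=0.5):
    """Count theoretical fragments that have an observed peak within tol."""
    return sum(any(abs(t - o) <= tol for o in observed)
               for t in fragment_mz(peptide))


def search(observed, precursor_mass, database, tol=0.5):
    """Return (score, protein name, peptide) of the best match, or None."""
    best = None
    for name, sequence in database.items():       # pick protein
        for pep in tryptic_peptides(sequence):    # pick peptide
            if abs(peptide_mass(pep) - precursor_mass) > tol:
                continue
            s = score(pep, observed, tol)         # compare and score
            if best is None or s > best[0]:
                best = (s, name, pep)
    return best


# Hypothetical database and a synthetic, noise-free spectrum.
database = {"protA": "MKTAYIAKQRELDEEFK", "protB": "GASPVTCLINDQK"}
target = "ELDEEFK"
best = search(fragment_mz(target), peptide_mass(target), database)
print(best)
```

Counting matched peaks is the simplest possible score; it is used here only to keep the loop structure visible.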
Proteomics Informatics – Protein identification II: de novo sequencing (Week 5)
[Figure: MS/MS spectrum, % Relative Abundance (0 to 100) vs. m/z (250 to 1000); label [M+2H]2+; labeled peaks: 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080]
Mass Differences
Amino acid masses (residue masses in Da)
1-letter code  3-letter code  Chemical formula  Monoisotopic  Average
A              Ala            C3H5ON            71.0371       71.0788
R              Arg            C6H12ON4          156.101       156.188
N              Asn            C4H6O2N2          114.043       114.104
D              Asp            C4H5O3N           115.027       115.089
C              Cys            C3H5ONS           103.009       103.139
E              Glu            C5H7O3N           129.043       129.116
Q              Gln            C5H8O2N2          128.059       128.131
G              Gly            C2H3ON            57.0215       57.0519
H              His            C6H7ON3           137.059       137.141
I              Ile            C6H11ON           113.084       113.159
L              Leu            C6H11ON           113.084       113.159
K              Lys            C6H12ON2          128.095       128.174
M              Met            C5H9ONS           131.04        131.193
F              Phe            C9H9ON            147.068       147.177
P              Pro            C5H7ON            97.0528       97.1167
S              Ser            C3H5O2N           87.032        87.0782
T              Thr            C4H7O2N           101.048       101.105
W              Trp            C11H10ON2         186.079       186.213
Y              Tyr            C9H9O2N           163.063       163.176
V              Val            C5H9ON            99.0684       99.1326
Sequences consistent with spectrum
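The idea of reading a sequence from mass differences can be sketched in a few lines: the difference between consecutive peaks of one fragment ion series is compared with the monoisotopic residue masses. The peak list below uses nominal values read from the example spectrum; treating these seven peaks as a single ion ladder is an assumption made for illustration, as are the function name and the 0.5 Da tolerance.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}


def read_ladder(peaks, tol=0.5):
    """Map each gap between consecutive peaks to the residues it could be.

    Returns one string per gap: matching residues joined by "/",
    or "?" if no single residue fits within the tolerance.
    """
    peaks = sorted(peaks)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        matches = [aa for aa, m in MONO.items() if abs(m - delta) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


# Nominal peak positions, assumed to belong to one ion series.
ladder = [260, 389, 504, 633, 762, 875, 1022]
calls = read_ladder(ladder)
print(calls)
```

Isoleucine and leucine have identical masses, so the sketch reports them together. Whether the calls read from the N-terminus or the C-terminus depends on which ion series the ladder belongs to, which the sketch does not decide.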
Proteomics Informatics – Databases, data repositories and standardization (Week 6)
Most proteins show very reproducible peptide patterns
[Figure: Query Spectrum compared with the Best match in GPMDB and the Second best match in GPMDB]
Proteomics Informatics – Proteogenomics (Week 7)
[Figure: construction of a Tumor Specific Protein DB, with the Reference Human Database (Ensembl)]
• Non-Tumor Sample: Genome sequencing; Identify germline variants
• Tumor Sample: Genome sequencing and RNA-Seq; Identify alternative splicing, somatic variants and novel expression
[Figure: event types. Variants (TCGAGAGCTG repeated five times, then TCGATAGCTG); Alt. Splicing (Exon 1, Exon 2, Exon 3); Novel Expression (Exon 1, Exon X, Exon 2); Fusion Genes (Gene X Exon 1, Gene X Exon 2, Gene Y Exon 1, Gene Y Exon 2; Gene X, Gene Y)]
Kelly Ruggles
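Proteomics Informatics – Protein quantitation I: Overview (Week 8)
[Figure: workflow. Lysis, Fractionation, Digestion, LC-MS, MS, with a factor p attached to each step]
Indices: Sample i, Protein j, Peptide k
C_{ij}: amount of protein j in sample i
I_{ik}: intensity of peptide k in sample i
Factors (superscripts follow the workflow steps): p^{L}_{ij}, p^{Pr}_{ij}, p^{D}_{ijk}, p^{Pep}_{ik}, p^{LC}_{ik}, p^{MS}_{ik}
$$I_{ik} = C_{ij}\, p^{L}_{ij}\, p^{Pr}_{ij}\, p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}$$
$$C_{ij} = \frac{I_{ik}}{p^{L}_{ij}\, p^{Pr}_{ij}\, p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}}$$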
Assumption: the product $p^{L}_{ij}\, p^{Pr}_{ij}\, p^{D}_{ijk}\, p^{Pep}_{ik}\, p^{LC}_{ik}\, p^{MS}_{ik}$ is constant for all samples. For two samples $i_m$ and $i_n$ this gives
$$\frac{C_{i_m j}}{C_{i_n j}} = \frac{I_{i_m j}}{I_{i_n j}}$$
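A small numerical sketch shows why the assumption matters: if every peptide has the same overall factor p in both samples, the unknown factor cancels in the intensity ratio and the protein ratio is recovered. The peptide names, factor values, and amounts are hypothetical, and taking the median over peptides is one possible way to combine them, not necessarily the method used in the course.

```python
from statistics import median


def protein_ratio(intensities_m, intensities_n):
    """Median peptide intensity ratio between sample m and sample n.

    Both arguments map peptide -> intensity for peptides of one protein.
    """
    shared = [k for k in intensities_m
              if k in intensities_n and intensities_n[k] > 0]
    if not shared:
        raise ValueError("no shared peptides with nonzero intensity")
    return median(intensities_m[k] / intensities_n[k] for k in shared)


# Hypothetical overall factors per peptide (product of all p terms),
# assumed to be identical in both samples.
p = {"pepA": 0.02, "pepB": 0.3, "pepC": 0.001}

# Hypothetical protein amounts in the two samples.
c_m, c_n = 10.0, 4.0

# Simulated intensities: I = C * p for each peptide.
i_m = {k: c_m * f for k, f in p.items()}
i_n = {k: c_n * f for k, f in p.items()}

ratio = protein_ratio(i_m, i_n)
print(ratio)
```

The individual intensities differ by orders of magnitude between peptides, yet each ratio equals c_m / c_n because p cancels.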
Proteomics Informatics – Protein quantitation II: Targeted (Week 9)
[Figure: workflow. Lysis, Fractionation, Digestion, LC-MS, MS]
Shotgun proteomics: Data Dependent Acquisition (DDA)
1. Records m/z
2. Selects peptides based on abundance and fragments them (MS/MS)
3. Protein database search for peptide identification
Targeted MS: uses a predefined set of peptides
1. Select precursor ion (MS)
2. Precursor fragmentation (MS/MS)
3. Use precursor-fragment pairs for identification
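The difference in precursor selection between the two acquisition modes can be sketched as follows: data dependent acquisition picks the most abundant ions seen in a survey scan, while a targeted method only looks for m/z values on a predefined list. The function names, the tolerance, and the example scan are hypothetical; real instruments add further logic that is not modeled here.

```python
def dda_select(ms1_peaks, top_n=3):
    """Pick the top_n most intense precursors from (m/z, intensity) pairs."""
    ranked = sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


def targeted_select(ms1_peaks, target_mzs, tol=0.01):
    """Pick only precursors whose m/z is on the predefined target list."""
    return [mz for mz, _ in ms1_peaks
            if any(abs(mz - target) <= tol for target in target_mzs)]


# Hypothetical survey scan: (m/z, intensity).
scan = [(400.2, 1e5), (512.3, 8e6), (633.8, 2e4), (762.4, 3e6), (875.5, 5e5)]

dda = dda_select(scan, top_n=2)
targeted = targeted_select(scan, target_mzs=[633.8])
print(dda, targeted)
```

In this example the low-abundance ion at m/z 633.8 is never chosen by the abundance-based selection but is picked up when it is on the target list.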
Proteomics Informatics – Protein characterization I: post-translational modifications (Week 10)
Peptide with two possible modification sites
[Figure: MS/MS spectrum (Intensity vs. m/z); Matching]
Which assignment does the data support?
1, 1 or 2, or 1 and 2?
[Figure: AB, AC, D, EF; Digestion; Mass spectrometry; Identification]
Proteomics Informatics – Protein Characterization II: protein interactions (Week 11)
Proteomics Informatics – Molecular Signatures (Week 12)
Proteomics Informatics – Presentations of projects (Week 13)
Select a published data set that has been made public and reanalyze it.
Highlighted data sets: http://www.thegpm.org/
10 min presentations