Lecture-8 Introduction to Proteomics Huseyin Tombuloglu, Phd GBE423 Genomics & Proteomics.
Clinical Proteomics: From Research To Practice...Proteomics vs Genomics Proteins actually do the...
Transcript of Clinical Proteomics: From Research To Practice...Proteomics vs Genomics Proteins actually do the...
Clinical Proteomics: From Research to Practice
Eric Fung, MD, PhDVice President of Medical and Clinical Affairs
Outline of presentation
Proteomics defined
A summary of proteomics technologies
Focus on biomarkers to diagnostics
Practical aspects of a clinical proteomics study
Proteome - defined
All the proteins expressed by a genome.
“Functional Proteome” = all the proteins produced by a specific cell in a single time frame.
Can be defined as the total protein complement of a system (be that an organelle/cell/tissue/organism).
The genome is for the most part static within a generation, whereas the proteome is more dynamic: cell cycle, development, response to environment, etc
Why is the proteome important?
It is the proteins within the cell that:Provide structure
Produce energy
Allow communication
Allow movement
Allow reproduction
Proteins provide the structural and functional framework of cellular life
Proteomics, defined
The study of the expression, structure and function of proteins,and the interactions between proteins.
Where and when are proteins expressed? Abundance?
Protein modifications and activities
Interactions: Protein-protein, protein-DNA, protein-small molecule, etc
Protein structure
It represents the protein counterpart to the analysis of gene function.
Initial goal was to rapidly identify all the proteins expressed by a cell or tissue – a goal that has yet to be achieved for any species
Why is proteomics important?
Identification of proteins changed in disease conditions
Identification of pathogenic mechanisms
Promise in novel drug discovery via analysis of clinically relevant molecular events
Contributes to understanding of gene function
Why Proteomics?
Same genome
Different proteome
Proteomics vs Genomics
Proteins actually do the work of the cell
DNA/RNA analysis cannot predict the amount of a gene product made (if and when)
RNA quantitation does not always reflect corresponding protein levels
Genomics cannot predict post-translational modifications and the effects thereof
Post-translational modification is extensive: so far more than 200 different types of modifications have been reported! How does modification alter protein function?
~30,000 human genes yields 1,200,000+ protein variants?!
Multiple proteins can be obtained from each gene (alternative splicing)
38,016 possible isoforms! How many are produced and functional?
Proteomic Methodologies
Analysis of protein expression patterns 2-D gel electrophoresis
Protein separation – mass spectrometry
SELDI, MALDI, ICAT, LC(n)-MS(n), ESI
Analysis of protein sequence information
Analysis of protein structure/function relationships
Bioinformatics/statistics
Affinity approachesProtein arrays
Genetic methods – tagging every proteinPrototype – Yeast two-hybrid assay
Components of Human Serum
“Unbiased” proteomics approaches
Mocellin et al, TMM, 2004
Studying the proteomePatient sample
extraction method(i.e. lipid solubilisation)
comparison methods to identify changed proteins in gels
protein extraction from gels (spot cutting)
protein fragmentation and mass spec.
separation methods(2D gels)
matching spectrum to virtual spectrum for protein identification
pITwo-dimensional gel
MW
2-D DIGE
Courtesy of Amersham
ProteinChip® technology
FractionateQ-HyperD, CM-HyperD, IMAC-HyperCel,
HIC, DEAE, MEP, IDM
Study Design ProfileQ10, CM10, IMAC30, H50, RS100
Benign (50)
Control
(30)
Benign (90)Control
(49)
Stage
III/IV (2)
Stage I/II (35)Benign (26)
Control 63
Stage
III/IV (103)
Stage I/II (20)
Stage I/II (35)
Ca (41)
Other Ca 2 (20)
Control
(41)
Other Ca 1 (20)
Other Ca 3 (20)
IndependentValidation
CrossComparison
CandidateMarkers
MultivariateModels
Protein ID
Independent Validation by Immunoassay
Multivariate
Model Derivatio
n
0 0.1 0 .2 0 .3 0. 4 0 .5 0. 6 0 .7 0. 8 0 .9 10
0. 1
0. 2
0. 3
0. 4
0. 5
0. 6
0. 7
0. 8
0. 9
1ROC cur ve, are a=0. 94 327 , std = 0 .09 49 73 , a lph a= 2. 79 1, be ta= 0.4 71 54
1 - Spe cific ity
Se
nsiti
vity
Discovery 1
Discovery 2
Collect and PreprocessBaseline subtract, Normalize
Calibrate
Characterize and IdentifyPeptide Mass Fingerprinting, MS/MS
Univariate, Multivariate analysis to Prioritize Biomarkers
CiphergenExpress Heirarchical Clustering, PCA; BPS
25000 50000 75000
05
101520
ProteinChip® Three dimensions of Resolution
ProteinChip® System, Series 4000
Protein Microarrays
High-Throughput!
Quantitative Protein Microarrays
• requires previous knowledge about proteins to be studied and that appropriate antibodies exist
• protein complexes can obscure results
MacBeath 2003
Protein Function Microarrays
• Proteins are arrayed on slide and assayed for…• Protein-Antibody interactions• Protein-protein interactions• Protein-small molecule interactions and complexes• Activity assays (eg. phosphorylation, protease activity etc)
Protein Function Microarrays
ResultsCalmodulin: identified 6 known and 33 new calmodulin binding proteins
Phosphoinositide binding: 35% of identified proteins are uncharacterized
Zhu et al, 2001
InflammationInflammation
NeovesselsNeovessels
HyperplasiaHyperplasia
NerveNerve
StromaStroma
NormalNormal
HighHigh--grade PINgrade PIN
WellWell--differentiated differentiated carcinomacarcinoma
ModeratelyModerately--differentiated differentiated carcinomacarcinoma
PoorlyPoorly--differentiateddifferentiatedcarcinomacarcinoma
LowLow--gradegrade PINPIN
TISSUE MICROENVIRONMENT
Nature 2001
Invented: Dr. Lance A. LiottaScience’ 96, 97, 98
Before LCM
After LCM
Case study: Prostate normal epithelium (human)
Liotta et al. (2003) Cancer Cell
PY-ERK TOTAL ERK2 ALONEo
Use of Novel Protein Array Technology:Signal Pathway Profiling in Human Breast
Cancer Biopsy Specimens
Ongoing work: cluster analysis with 135 phospho-specific endpoints, allnormalized to the self protein for true signal pathway profiling
Normal/Normal (reduction mammoplasty)
Coupling Laser Capture Microdissection With True Signal Pathway Profiling
Normal Premalignant Invasive
38 cases Invasive
NCI
The next revolution: Molecular Profiling: Individualized Therapy
Microdissection
Patient
Biopsy
Signal NetworkProfileGene Microarray
EGFR signature, 452 genes in 74 cases
EGFR Pathway
Choose combination therapy tailored to pathogenic defect
Monitor success of therapy
Rationale basis for revising therapy following recurrence
Protein Microarray
Phosphorylated states of signal pathway proteins
The Need & The Opportunity
NONE12,71060,240Urinary
NONE12,69035,710Kidney
NONE13,33014,250Esophagus
AFP 14,27018,920Liver
CA-12516,09025,580Ovary
NONE19,41054,370Non-Hodgkin Lymphoma
PSA (tot, free,cmpx)29,900230,110Prostate
Lab TestsDeathsNew CasesType
NONE160,440173,770Lung
NONE12,48018,400Brain
CA-19-931,27031,860Pancreas
CA15-3, CA 27-2940,580217,440Breast
CEA, Pre-gen 2656,730106,370Colon
Leading Cancer cases and deaths in the U.S. alone, 2004
Current Tests Sensitivity and Specificity
Breast Cancer Tumor Marker
97%62%CA-15-3
Pancreatic CancerNot establishedNot establishedCA-19-9
Monitoring Ovarian Cancer
97%35-55%CA-125
Varies with onset of MI96%86%Troponin T- 10 hour
Risk assessment for cervical cancer
85%95%HPV DNA
Breast Cancer Screening
90-95%*75%Mammogram
Cervical Cancer Screening
94-98%77-87%PAP Smear (ThinPrep)
Prostate Cancer Screening (4ng/ml cutoff)
35%65%PSA
IndicationSpecificitySensitivityTest
*Varies with Radiologist Reading
Historical Failure of Protein Diagnostics
Protein Microarrays “AI” Bioinformatic Tools Mass Spec
LCM
Courtesy: Gordon Whiteley, SAIC
Proteomic spectra
Pathologic signaturesubset of modified proteins
Patterns of Proteomic Information in SERUM
Perfused Tissue
How can we discover diagnostic proteomic patterns even without knowing the identity of the proteins ahead of time?
Ciphergen’s Proteomic Diagnostics Process
Interacting proteinsModified proteins
Pathway Discovery
Interaction Discovery Mapping ™ methods
Protein identification
SELDI Assisted Purification & MS
DiscoveryExpression Difference Mapping™ methods
Clinical developmentAssay development Diagnostic assay
Increasing medical value
Human Acute Phase ProteinsHost response proteins ID’ed as markers
Albumin (mg/ml)
Apoliprotein family (100 ug/ml)
Transthyretin (50 ug/ml)
Haptoglobin
Vitamin D Binding Protein
Inter alpha-trypsin inhibitor 4
Alpha-1 antichymotrypsin
Serum amyloid A
Complement factors
CCL18 (ug/ml)
Fibrinogen
Vitronectin V10
Transport proteins
Protease inhibitors
Immune system
Clotting
Specific Diagnostic Fragments
Disease Processes
Host Proteins
Pancreatic Cancer
Ovarian Cancer
Diabetes
Inter Alpha Trypsin Inhibitor
4
(ITIH4)
nvhsgstffkyylqgakipkpeasfspr
mnfrpgvlssrqlglpgppdvpdhaayhpfr
srqlglgppdvpdhaayhpfr
Three basic rulesRule #1. Know what you’re looking for.
Broad questions require more samples to have clinical utility
Minimize sample variability
Control for all relevant conditions
Rule #2: Avoid systematic biases.
Pre-analytical biases
Analytical biases
Rule #3: Don’t misuse statistics.
Feature selection
Independent validation
How many samples do I need?
Formula based on univariate analysis
N=2[ ]^2
zα: z score associated with type I error (p value)
zβ: z score associated with the power of the study
σ: standard deviation
µ1 and µ2: mean of the two groups
Also depends on nature of samplesCell culture models < animal models < human studies
More confounding variables (sex, comorbidities, smoking, etc) require more samples.
21)(
µµσβα
−− zz
Prevalence and study design
50
Cancer A
505050
Cancer BBenign diseaseHealthy controls
Prevalence of Cancer A is 50/200 = 25%
Prevalence of disease X in most studies is almost always higher than it is in the population
Predictive power of a diagnostic test in a study is usually artificially better than it will be in a real world population
The two languages of clinical proteomics
Clinicalese
Clinical question
Clinical trial design
Clinical specificity, sensitivity
Positive/negative predictive value
Analyticalese
Precision
Accuracy
Dynamic range
Analytical specificity, sensitivity
The clinical question
Clear-cut clinical question
Unmet clinical need
The marker will affect patient management and patient outcome
Know the desired results
Controls just as important as disease samples
Acquire the samples
Minimize pre-analytical biases
Multi-institutional roster of collaborators
Associate with clinical trials
Retrospective first, prospective when possible
Controls just as important as disease samples
Get enough samples to have a statistically meaningful result
Sample types
Body fluids
Serum, plasma
Cerebrospinal fluid
Urine
Fine needle aspirate, other cytology
Biopsy specimens
Tissue
Laser-capture microdissection
Processing the sample
Fractionation is keyAnion exchange
Subcellular fractionation
Other – Imac, phospho, etc.
Homogenization conditionsSonication
Dounce
Polytron
Bead-beater
Lysis conditionsDetergent
Buffer constituents
Multivariate analysis
Physicians and biologists employ multivariate analysis routinely
Advantages over univariate analysis
Accommodates biological variation
Accommodates systematic variation
Eliminates dependence on single analyte
Many types of multivariate analysis
Types of multivariate analysis
Unsupervised learning
No a priori “knowledge” of groups
Can discover new groups/subgroups
Generally not useful as diagnostic algorithm
Examples: PCA, heirarchal clustering, k-means clustering
Supervised learning
Class assignments required for input
Output is a classification (diagnostic) algorithm
Examples: classification trees (BPS), support vector machines, neural networks
T Test Review
The relationship between CV and t value
3.87100%2
7.7550%2
15.4925%2
38.7310%2
t valueCVC*Fold Change
CT XX /
*: CV of group control; n= 30
The relationship between fold change and t value
2.9020%1.15
4.8420%1.25
9.6820%1.5
19.3620%2
t valueCVC*Fold Change
CT XX /
*: CV of group control; n= 30
Multivariate analysis
Multivariate algorithms
Unsupervised learning
Supervised learning
Feature selection
Univariate analysis (easy but not as powerful)
Multivariate analysis (more powerful but requires more expertise)
Use multivariate techniques to find optimal combination of features
Independent validation of fixed classification algorithms
Always report validation results
How to build a model
Divide discovery data into training and testing sets
Feature selection: reduce from 1000s to <10 important variables
Different methods of feature selection can lead to slightly different ranks of features but most important features will be common
Further reduce number of features by calculating correlation, keeping non-correlated features
Use selected features to create classification algorithms
Bootstrapping/cross-validation help describe robustness of algorithms
Choose the best algorithm and apply to validation data set
From model to diagnostic test
A fixed laboratory protocol is used to generate the data to train, test, and validate the model
Diagnostic assay measures discrete, individual components
Laboratory report provides values for each components
Measuring and reporting values for individual components gives confidence to physician
Value for individual components applied to classification model
Laboratory report provides computed output from classification model
A reference range or cutoff value is provided
Samples with values outside reference range (or above/below cutoff value) are flagged for further clinical workup