Clinical Proteomics: From Research To Practice...Proteomics vs Genomics Proteins actually do the...

Clinical Proteomics: From Research to Practice

Eric Fung, MD, PhDVice President of Medical and Clinical Affairs

[email protected]

Outline of presentation

Proteomics defined

A summary of proteomics technologies

Focus on biomarkers to diagnostics

Practical aspects of a clinical proteomics study

Proteome - defined

All the proteins expressed by a genome.

“Functional Proteome” = all the proteins produced by a specific cell in a single time frame.

Can be defined as the total protein complement of a system (be that an organelle/cell/tissue/organism).

The genome is for the most part static within a generation, whereas the proteome is more dynamic: cell cycle, development, response to environment, etc

Why is the proteome important?

It is the proteins within the cell that:Provide structure

Produce energy

Allow communication

Allow movement

Allow reproduction

Proteins provide the structural and functional framework of cellular life

Proteomics, defined

The study of the expression, structure and function of proteins,and the interactions between proteins.

Where and when are proteins expressed? Abundance?

Protein modifications and activities

Interactions: Protein-protein, protein-DNA, protein-small molecule, etc

Protein structure

It represents the protein counterpart to the analysis of gene function.

Initial goal was to rapidly identify all the proteins expressed by a cell or tissue – a goal that has yet to be achieved for any species

Why is proteomics important?

Identification of proteins changed in disease conditions

Identification of pathogenic mechanisms

Promise in novel drug discovery via analysis of clinically relevant molecular events

Contributes to understanding of gene function

Why Proteomics?

Same genome

Different proteome

Proteomics vs Genomics

Proteins actually do the work of the cell

DNA/RNA analysis cannot predict the amount of a gene product made (if and when)

RNA quantitation does not always reflect corresponding protein levels

Genomics cannot predict post-translational modifications and the effects thereof

Post-translational modification is extensive: so far more than 200 different types of modifications have been reported! How does modification alter protein function?

~30,000 human genes yields 1,200,000+ protein variants?!

Multiple proteins can be obtained from each gene (alternative splicing)

38,016 possible isoforms! How many are produced and functional?

Proteomic Methodologies

Analysis of protein expression patterns 2-D gel electrophoresis

Protein separation – mass spectrometry

SELDI, MALDI, ICAT, LC(n)-MS(n), ESI

Analysis of protein sequence information

Analysis of protein structure/function relationships

Bioinformatics/statistics

Affinity approachesProtein arrays

Genetic methods – tagging every proteinPrototype – Yeast two-hybrid assay

Components of Human Serum

“Unbiased” proteomics approaches

Mocellin et al, TMM, 2004

Studying the proteomePatient sample

extraction method(i.e. lipid solubilisation)

comparison methods to identify changed proteins in gels

protein extraction from gels (spot cutting)

protein fragmentation and mass spec.

separation methods(2D gels)

matching spectrum to virtual spectrum for protein identification

pITwo-dimensional gel

MW

2-D DIGE

Courtesy of Amersham

ProteinChip® technology

FractionateQ-HyperD, CM-HyperD, IMAC-HyperCel,

HIC, DEAE, MEP, IDM

Study Design ProfileQ10, CM10, IMAC30, H50, RS100

Benign (50)

Control

(30)

Benign (90)Control

(49)

Stage

III/IV (2)

Stage I/II (35)Benign (26)

Control 63

Stage

III/IV (103)

Stage I/II (20)

Stage I/II (35)

Ca (41)

Other Ca 2 (20)

Control

(41)

Other Ca 1 (20)

Other Ca 3 (20)

IndependentValidation

CrossComparison

CandidateMarkers

MultivariateModels

Protein ID

Independent Validation by Immunoassay

Multivariate

Model Derivatio

n

0 0.1 0 .2 0 .3 0. 4 0 .5 0. 6 0 .7 0. 8 0 .9 10

0. 1

0. 2

0. 3

0. 4

0. 5

0. 6

0. 7

0. 8

0. 9

1ROC cur ve, are a=0. 94 327 , std = 0 .09 49 73 , a lph a= 2. 79 1, be ta= 0.4 71 54

1 - Spe cific ity

Se

nsiti

vity

Discovery 1

Discovery 2

Collect and PreprocessBaseline subtract, Normalize

Calibrate

Characterize and IdentifyPeptide Mass Fingerprinting, MS/MS

Univariate, Multivariate analysis to Prioritize Biomarkers

CiphergenExpress Heirarchical Clustering, PCA; BPS

25000 50000 75000

05

101520

ProteinChip® Three dimensions of Resolution

ProteinChip® System, Series 4000

Protein Microarrays

High-Throughput!

Quantitative Protein Microarrays

• requires previous knowledge about proteins to be studied and that appropriate antibodies exist

• protein complexes can obscure results

MacBeath 2003

Protein Function Microarrays

• Proteins are arrayed on slide and assayed for…• Protein-Antibody interactions• Protein-protein interactions• Protein-small molecule interactions and complexes• Activity assays (eg. phosphorylation, protease activity etc)

Protein Function Microarrays

ResultsCalmodulin: identified 6 known and 33 new calmodulin binding proteins

Phosphoinositide binding: 35% of identified proteins are uncharacterized

Zhu et al, 2001

InflammationInflammation

NeovesselsNeovessels

HyperplasiaHyperplasia

NerveNerve

StromaStroma

NormalNormal

HighHigh--grade PINgrade PIN

WellWell--differentiated differentiated carcinomacarcinoma

ModeratelyModerately--differentiated differentiated carcinomacarcinoma

PoorlyPoorly--differentiateddifferentiatedcarcinomacarcinoma

LowLow--gradegrade PINPIN

TISSUE MICROENVIRONMENT

Nature 2001

Invented: Dr. Lance A. LiottaScience’ 96, 97, 98

Before LCM

After LCM

Case study: Prostate normal epithelium (human)

Liotta et al. (2003) Cancer Cell

PY-ERK TOTAL ERK2 ALONEo

Use of Novel Protein Array Technology:Signal Pathway Profiling in Human Breast

Cancer Biopsy Specimens

Ongoing work: cluster analysis with 135 phospho-specific endpoints, allnormalized to the self protein for true signal pathway profiling

Normal/Normal (reduction mammoplasty)

Coupling Laser Capture Microdissection With True Signal Pathway Profiling

Normal Premalignant Invasive

38 cases Invasive

NCI

The next revolution: Molecular Profiling: Individualized Therapy

Microdissection

Patient

Biopsy

Signal NetworkProfileGene Microarray

EGFR signature, 452 genes in 74 cases

EGFR Pathway

Choose combination therapy tailored to pathogenic defect

Monitor success of therapy

Rationale basis for revising therapy following recurrence

Protein Microarray

Phosphorylated states of signal pathway proteins

The Need & The Opportunity

NONE12,71060,240Urinary

NONE12,69035,710Kidney

NONE13,33014,250Esophagus

AFP 14,27018,920Liver

CA-12516,09025,580Ovary

NONE19,41054,370Non-Hodgkin Lymphoma

PSA (tot, free,cmpx)29,900230,110Prostate

Lab TestsDeathsNew CasesType

NONE160,440173,770Lung

NONE12,48018,400Brain

CA-19-931,27031,860Pancreas

CA15-3, CA 27-2940,580217,440Breast

CEA, Pre-gen 2656,730106,370Colon

Leading Cancer cases and deaths in the U.S. alone, 2004

Current Tests Sensitivity and Specificity

Breast Cancer Tumor Marker

97%62%CA-15-3

Pancreatic CancerNot establishedNot establishedCA-19-9

Monitoring Ovarian Cancer

97%35-55%CA-125

Varies with onset of MI96%86%Troponin T- 10 hour

Risk assessment for cervical cancer

85%95%HPV DNA

Breast Cancer Screening

90-95%*75%Mammogram

Cervical Cancer Screening

94-98%77-87%PAP Smear (ThinPrep)

Prostate Cancer Screening (4ng/ml cutoff)

35%65%PSA

IndicationSpecificitySensitivityTest

*Varies with Radiologist Reading

Historical Failure of Protein Diagnostics

Protein Microarrays “AI” Bioinformatic Tools Mass Spec

LCM

Courtesy: Gordon Whiteley, SAIC

Proteomic spectra

Pathologic signaturesubset of modified proteins

Patterns of Proteomic Information in SERUM

Perfused Tissue

How can we discover diagnostic proteomic patterns even without knowing the identity of the proteins ahead of time?

Ciphergen’s Proteomic Diagnostics Process

Interacting proteinsModified proteins

Pathway Discovery

Interaction Discovery Mapping ™ methods

Protein identification

SELDI Assisted Purification & MS

DiscoveryExpression Difference Mapping™ methods

Clinical developmentAssay development Diagnostic assay

Increasing medical value

Human Acute Phase ProteinsHost response proteins ID’ed as markers

Albumin (mg/ml)

Apoliprotein family (100 ug/ml)

Transthyretin (50 ug/ml)

Haptoglobin

Vitamin D Binding Protein

Inter alpha-trypsin inhibitor 4

Alpha-1 antichymotrypsin

Serum amyloid A

Complement factors

CCL18 (ug/ml)

Fibrinogen

Vitronectin V10

Transport proteins

Protease inhibitors

Immune system

Clotting

Specific Diagnostic Fragments

Disease Processes

Host Proteins

Pancreatic Cancer

Ovarian Cancer

Diabetes

Inter Alpha Trypsin Inhibitor

4

(ITIH4)

nvhsgstffkyylqgakipkpeasfspr

mnfrpgvlssrqlglpgppdvpdhaayhpfr

srqlglgppdvpdhaayhpfr

Three basic rulesRule #1. Know what you’re looking for.

Broad questions require more samples to have clinical utility

Minimize sample variability

Control for all relevant conditions

Rule #2: Avoid systematic biases.

Pre-analytical biases

Analytical biases

Rule #3: Don’t misuse statistics.

Feature selection

Independent validation

How many samples do I need?

Formula based on univariate analysis

N=2[ ]^2

zα: z score associated with type I error (p value)

zβ: z score associated with the power of the study

σ: standard deviation

µ1 and µ2: mean of the two groups

Also depends on nature of samplesCell culture models < animal models < human studies

More confounding variables (sex, comorbidities, smoking, etc) require more samples.

21)(

µµσβα

−− zz

Prevalence and study design

50

Cancer A

505050

Cancer BBenign diseaseHealthy controls

Prevalence of Cancer A is 50/200 = 25%

Prevalence of disease X in most studies is almost always higher than it is in the population

Predictive power of a diagnostic test in a study is usually artificially better than it will be in a real world population

The two languages of clinical proteomics

Clinicalese

Clinical question

Clinical trial design

Clinical specificity, sensitivity

Positive/negative predictive value

Analyticalese

Precision

Accuracy

Dynamic range

Analytical specificity, sensitivity

The clinical question

Clear-cut clinical question

Unmet clinical need

The marker will affect patient management and patient outcome

Know the desired results

Controls just as important as disease samples

Acquire the samples

Minimize pre-analytical biases

Multi-institutional roster of collaborators

Associate with clinical trials

Retrospective first, prospective when possible

Controls just as important as disease samples

Get enough samples to have a statistically meaningful result

Sample types

Body fluids

Serum, plasma

Cerebrospinal fluid

Urine

Fine needle aspirate, other cytology

Biopsy specimens

Tissue

Laser-capture microdissection

Processing the sample

Fractionation is keyAnion exchange

Subcellular fractionation

Other – Imac, phospho, etc.

Homogenization conditionsSonication

Dounce

Polytron

Bead-beater

Lysis conditionsDetergent

Buffer constituents

Multivariate analysis

Physicians and biologists employ multivariate analysis routinely

Advantages over univariate analysis

Accommodates biological variation

Accommodates systematic variation

Eliminates dependence on single analyte

Many types of multivariate analysis

Types of multivariate analysis

Unsupervised learning

No a priori “knowledge” of groups

Can discover new groups/subgroups

Generally not useful as diagnostic algorithm

Examples: PCA, heirarchal clustering, k-means clustering

Supervised learning

Class assignments required for input

Output is a classification (diagnostic) algorithm

Examples: classification trees (BPS), support vector machines, neural networks

T Test Review

The relationship between CV and t value

3.87100%2

7.7550%2

15.4925%2

38.7310%2

t valueCVC*Fold Change

CT XX /

*: CV of group control; n= 30

The relationship between fold change and t value

2.9020%1.15

4.8420%1.25

9.6820%1.5

19.3620%2

t valueCVC*Fold Change

CT XX /

*: CV of group control; n= 30

Multivariate analysis

Multivariate algorithms

Unsupervised learning

Supervised learning

Feature selection

Univariate analysis (easy but not as powerful)

Multivariate analysis (more powerful but requires more expertise)

Use multivariate techniques to find optimal combination of features

Independent validation of fixed classification algorithms

Always report validation results

How to build a model

Divide discovery data into training and testing sets

Feature selection: reduce from 1000s to <10 important variables

Different methods of feature selection can lead to slightly different ranks of features but most important features will be common

Further reduce number of features by calculating correlation, keeping non-correlated features

Use selected features to create classification algorithms

Bootstrapping/cross-validation help describe robustness of algorithms

Choose the best algorithm and apply to validation data set

From model to diagnostic test

A fixed laboratory protocol is used to generate the data to train, test, and validate the model

Diagnostic assay measures discrete, individual components

Laboratory report provides values for each components

Measuring and reporting values for individual components gives confidence to physician

Value for individual components applied to classification model

Laboratory report provides computed output from classification model

A reference range or cutoff value is provided

Samples with values outside reference range (or above/below cutoff value) are flagged for further clinical workup

Clinical Proteomics: From Research To Practice...Proteomics vs Genomics Proteins actually do the...

Documents

Transcript of Clinical Proteomics: From Research To Practice...Proteomics vs Genomics Proteins actually do the...