Protein Mutations and Pathways in Cancer: Toward Modular & Combinatorial Therapy

Less war !! more science !!

Chris Sander, Computational & Systems Biology, Memorial Sloan-Kettering Cancer Center, New York

International Conference on Bioinformatics, Asia-Pacific Bioinformatics Network


Cancer

Simplicity of phenotype

Diversity of implementation

Modular therapy!

Combinatorial therapy!

Cancer Genomics

Functional consequences of somatic mutations

Molecular alterations in pathway context

Toward Combinatorial Therapy

Combinatorial Perturbations & Network Models

Information Infrastructure

Pathway Commons & Author Fact Deposition

Function of Protein Mutations: Boris Reva, Jenya Antipin, Alyosha Stukalov

Cancer Genomics: Nikolaus (Niki) Schultz, Barry Taylor, Ethan Cerami, Nick Socci, John Major, Sam Singer, Marc Ladanyi, Cameron Brennan, Matt Meyerson, Jordi Barretina & TCGA community

PathwayCommons.org: Emek Demir, Gary Bader (Toronto), Ethan Cerami, Ben Gross, Robert Hoffmann, Ken Fukuda, bioPAX community

The Cancer Genome Atlas (TCGA)

DNA copy number

Gene expression: mRNA (exon-level), miRNA

DNA sequencing: currently 1300 genes; soon 6000 or all genes

DNA methylation

Genomic rearrangements

Proteomics

Sample processing

Clinical annotation

Data storage and distribution

Integrative analysis

Next: lung squamous, kidney, breast, colon

Glioblastoma Multiforme – Aggressive Disease

DNA copy number, DNA methylation, mRNA expression, miRNA expression, mutations, clinical data

DNA copy number alterations in GBM and ovarian cancer

[Figure: copy-number profiles, panels labeled OV and GBM]

In ovarian cancer, copy number correlates with expression for more than half of the copy-number-altered genes

Cancer Genomics

Functional Consequences of Protein Mutations

[Figure: prediction workflow from input mutation to output probability]

Input: mutation in coding region

allele 1 … GCC ATC CCG …
           ALA ILE/MET PRO
allele 2 … GCC AAC CCG …

Public databases: Superfamily, PDB, PFAM, SCOP, NCBI, ENSEMBL, Reactome

Context: protein family, 3D structure, 3D complex, pathway

Protein family alignment (excerpt):

Q9SD07 RLHIGGL
Q44861 RLLIGRV
Q61EI4 RLLIGRV
Q7PY36 RLFIGKI
Q4I4E0 RLWMDQI
Q58SJ9 RLLIGRV
O01811 RLFIGKI
P55159 RLLIGRV

Evidence terms: specificity & conservation (Psc); correlation between interacting residues (Pcr); protein stability (Pps); protein-protein interactions (Pppi)

Output: Probability(Disruptive/Non-disruptive) = f(Psc, Pcr, Pps, Pppi)

Somatic mutations in cancer: What are the functional consequences?

Assessing the functional consequences of mutations: EGFR_human

Variant  Annotation                          Top Spec/Cons  Probability (%)
G719S    in lung cancer; somatic mutation    Yes            99
G724S    in lung cancer                      Yes            100
E734K    in lung cancer                                     74
L747F    in lung cancer                                     98
R748P    in lung cancer                                     99
Q787R    in lung cancer                      Yes            73
T790M    in lung cancer                      Yes            98
L833V    in lung cancer                      Yes            96
V834L    in lung cancer                                     98
L858R    in lung cancer; somatic mutation                   100
L861Q    in lung cancer                                     99
G873E    in lung cancer                      Yes            78
R962G    in dbSNP:17337451                                  100
D761Y    in lung cancer, MSKCC                              96

CEO algorithm: Combinatorial Entropy Optimization

Boris Reva, MSKCC

Defining subfamilies and specificity residues

Input (unsorted alignment):

GEKQESSSSYEPKEEFAQCVLL
GESLEEASVNGPFQYFYTVECL
GESSEVAAQNVPMLWFYQRHVM
GEQVESSESQEPHEEFYQIRTL
WESKEENAVNVPHQKFFTVLTM
KETNEVPWFKKPMREFYSAWGL
EEQSESAESQQPEEPFYQILEL
GEKNEVEAFKLPFREFYSVQRV
HERVESAASNVPMETFYQIAEL
WEEKEEFAVYIPLQPFLTFGRL
RECHEVKAQYVPMLEFYQVKPW
GETNEEEAFNVPRRVFFSVSNL
GESPEENFVNVPHQYFYTVEPM
TENPEVELFKVPFRVFFSLSHY
SGWKEELAVNQPVQEFETFEIE
GEASEVEHQNVPHLKFYQEGPP
REAQESQASNVPMETFYQVRTL

Output (after clustering into sub-families 1 to 4, with specificity residues and conserved residues marked):

SGWKEELAVNQPVQEFETFEIE
WEEKEEFAVYIPLQPFLTFGRL
GESPEENFVNVPHQYFYTVEPM
GESLEEASVNGPFQYFYTVECL
WESKEENAVNVPHQKFFTVLTM
TENPEEELFKVPFRVFFSLSHY
KETNEEPWFKKPMREFYSAWGL
GETNEEEAFNVPRRVFFSVSNL
GEKNEEEAFKLPFREFYSVQRV
EEQSESAESQQPEEPFYQILEL
GEQVESSESQEPHEEFYQIRTL
GEKQESSSSYEPKEEFAQCVLL
REAQESQASNVPMETFYQVRTL
HERVESAASNVPMETFYQIAEL
RECHEVKAQYVPMLEFYQVKPW
GESSEVAAQNVPMLWFYQRHVM
GEASEVEHQNVPHLKFYQEGPP

Minimize contrast function = difference between entropies of ordered and disordered clusters of sequences of the same size

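S: entropy of the ordered clusters; S': entropy of the disordered clusters

[Figure: example clusterings with S-S' = 0, S-S' = -9, S-S' = -3.5, S-S' = -7.5]

Q: How can one achieve the most distinctive (= informative) separation of sequences into clusters?

Goal: S-S' -> min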

Combinatorial entropy measure of specificity patterns

Optimization problem: form clusters (subfamilies) of sequences so as to minimize the combinatorial entropy difference

$$\Delta S_0 = \sum_i \left( S_i - \tilde{S}_i \right).$$

For each column $i$ of the alignment one computes the combinatorial entropy

$$S_i = \sum_k \ln \frac{N_k!}{\prod_{\alpha=1,\dots,20} N_{\alpha,i,k}!}$$

and the reference entropy

$$\tilde{S}_i = \sum_k \ln \frac{N_k!}{\prod_{\alpha=1,\dots,20} \tilde{N}_{\alpha,i,k}!}, \qquad \tilde{N}_{\alpha,i,k} = N_k P_{\alpha,i}, \qquad P_{\alpha,i} = \sum_k N_{\alpha,i,k} \Big/ \sum_k N_k .$$

$N_k$ is the number of sequences in cluster (subfamily) $k$; $N_{\alpha,i,k}$ is the number of residues of type $\alpha$ in column $i$ of cluster $k$.

The entropy difference $S_i - \tilde{S}_i$, summed over all columns $i$, is a measure of the deviation of a given sequence clustering from random. This difference is minimal when each cluster has its own distinct type of residues.
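The entropy difference described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the CEO implementation: the function names `log_factorial` and `entropy_difference` are hypothetical, and handling the non-integer reference counts through the gamma function is an assumption made here so that the factorial is defined.

```python
from collections import Counter
from math import lgamma, log


def log_factorial(x):
    """ln(x!) via the gamma function, so non-integer x is allowed (assumption)."""
    return lgamma(x + 1.0)


def entropy_difference(clusters):
    """Sum over columns i of S_i minus the reference entropy.

    clusters: list of clusters, each a list of equal-length aligned sequences.
    """
    n_cols = len(clusters[0][0])
    n_total = sum(len(cluster) for cluster in clusters)
    delta = 0.0
    for i in range(n_cols):
        # Residue counts in column i pooled over all clusters (gives P_{alpha,i}).
        pooled = Counter(seq[i] for cluster in clusters for seq in cluster)
        for cluster in clusters:
            n_k = len(cluster)
            observed = Counter(seq[i] for seq in cluster)
            s_i = log_factorial(n_k) - sum(
                log_factorial(n) for n in observed.values())
            # Reference counts: N_k * P_{alpha,i}, the "random" expectation.
            s_ref = log_factorial(n_k) - sum(
                log_factorial(n_k * count / n_total) for count in pooled.values())
            delta += s_i - s_ref
    return delta


# Toy example: the same four sequences, clustered two ways.
ordered = [["AA", "AA"], ["CC", "CC"]]   # each cluster has its own residue type
mixed = [["AA", "CC"], ["AA", "CC"]]     # clusters look like the pooled alignment
print(entropy_difference(ordered), entropy_difference(mixed))
```

In the toy example the ordered clustering gives a negative difference and the mixed clustering gives zero, matching the statement that the difference is minimal when each cluster has its own residue type.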

Specificity residues: high contrast. Globally conserved residues: low contrast.

[Figure: contrast entropy difference (y-axis, -400 to 0) versus rank of residue position (x-axis, 0 to 270) for a family of 390 protein kinases, with the specificity region and the conserved region marked]


OMA - Online Mutation Analysis

www.cbio.mskcc.org/cancergenomics

www.proteinfunction.org

Functional implications of cancer mutations at the protein level

ERBB2 mutations

L49H no alignment data available

C311R strong functional impact, conserved residue

N319D likely functional impact, conserved and specificity residue

E321G likely functional impact, specificity residue

D326G likely functional impact, specificity residue - binding site?

C334S strong functional impact, conserved residue in S-S bridge

V750E strong functional impact, strongly conserved residue

V777A unlikely functional

NF1 mutations

V1308E strong functional impact, buried residue

R1412S strong functional impact

D1849N no alignment data available

A2336T likely functional impact, specificity residue

Examples of mutations predicted as functional by OMA

D326G in ERBB2 - Tyrosine kinase-type cell surface receptor HER2

likely functional impact

specificity residue with conserved neighbors; may be part of a binding site

[Figure: structure view, D->G]

C334S in ERBB2 - Tyrosine kinase-type cell surface receptor HER2

strong functional impact

conserved residue

mutation eliminates SS bridge C334-C338

[Figure: structure view with C334 and C338 labeled]

Cancer Genomics

Molecular alterations in pathway context

TCGA glioblastoma: DNA copy number changes and mutations

[Figure legend: mutated, deleted, amplified]

Glioblastoma copy number alterations: Which events are functional, which are passengers?

RAE: recurrence, amplitude, extent

RAE: Barry Taylor, Nick Socci, Chris Sander, PLoS ONE 2008

Analyzing genetic alterations in pathway context

Combining molecular profiles and prior biological knowledge

“GBM pathway”, based on: Genes Dev. 2007 Nov 1;21(21):2683-710.

[Figure: pathway diagram overlaid in turn with copy number data for samples 2 to 8, then mRNA expression, methylation, and mutation data for sample 8, and mutation data for sample 3]

Mapping molecular alterations in 200 glioblastoma samples onto biological pathways

Goal: determine oncogenic programs

[Figure: pathway diagram with per-gene alteration frequencies; the placement of labels next to genes is not preserved in this transcript]

RTK/RAS/PI-3K signaling: altered in 85%
Genes: EGFR, ERBB2, PDGFRA, MET, RAS, NF-1, PI-3K Class I, PTEN, AKT, FOXO

P53 signaling: altered in 86%
Genes: CDKN2A (ARF), MDM2, MDM4, TP53

RB signaling: altered in 77%
Genes: CDKN2A (INK4A), CDKN2B, CDKN2C, CDK4, RB1

Other labels: proliferation; activated oncogenes; senescence; apoptosis; G1/S progression

Alteration labels, in transcript order: mutation, amplification in 46%; mutation in 7%; amplification in 14%; amplification in 3%; homozygous deletion in 51%; homozygous deletion in 48%; homozygous deletion in 2%; homozygous deletion in 49%; amplification in 13%; amplification in 5%; mutation, deletion in 35%; amplification in 17%; deletion, mutation in 11%; mutation in 2%; amplification in 2%; mutation in 2%; mutation, amplification in 24%; mutation, deletion in 17%; mutation, deletion in 33%

Cancer program by sub-networks

The Cancer Genome Atlas Pilot Project (2006-2008): ~200 cases of glioblastoma multiforme brain tumors

Automate module analysis (make it objective)

Key: capture biological knowledge in computable form

Facilitate creation and communication of pathway data

Aggregate pathway data in the public domain

Provide easy access for pathway analysis

http://www.pathwaycommons.org

Community Process !

bioPAX

Algorithm(s) to detect altered modules in cancer

glioblastoma – altered modules

PI3K module change in subtypes ?

whole proteome/genome sequencing will lead to more complete module map

Network pharmacology

Toward Combinatorial Therapy

Simple Models from Complex Data

CoPIA – Nelander et al., 2008

Perturbation Cell Biology – CoPIA

Sven Nelander

Peter Gennemark & Wei Qing Wang

Bjoern Nilsson, Christine Pratilas, QingBai She

Neal Rosen

Sven Nelander

http://cbio.mskcc.org/copia/

Nelander, Sander et al., Molecular Systems Biology, 2008

Reality Abstraction / Model

Application

Therapy

Experiment: dual drug perturbation of the MCF7 cancer cell line @ MSKCC

Wei Qing Wang, Sven Nelander & Rosen Lab 2007-2008

Mathematical Model: System Simulation by Bounded ODEs (like a Hopfield network)

Mean Field Model for Combinatorial Perturbation

linear:

$$\frac{dx_i}{dt} = \Big( \sum_j W_{ij} x_j \Big) - \alpha_i x_i + P_i$$

non-linear:

$$\frac{dx_i}{dt} = \beta_i f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$

Transfer function f(…): a simple but effective non-linear device to capture cooperative effects (epistasis, synergy, antagonism)
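The non-linear model can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the CoPIA code: the function name `simulate` is hypothetical, `tanh` is assumed as the transfer function f, and forward Euler is assumed as the integrator. The symbols W, alpha, beta, and P follow the slide.

```python
import math


def simulate(W, alpha, beta, P, x0=None, dt=0.01, steps=5000):
    """Integrate dx_i/dt = beta_i * f(sum_j W_ij x_j + P_i) - alpha_i * x_i.

    f is assumed to be tanh; integration is plain forward Euler.
    Returns the state vector after `steps` steps.
    """
    n = len(W)
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(steps):
        # Net input to each node: weighted sum of other nodes plus perturbation.
        drive = [sum(W[i][j] * x[j] for j in range(n)) + P[i] for i in range(n)]
        x = [x[i] + dt * (beta[i] * math.tanh(drive[i]) - alpha[i] * x[i])
             for i in range(n)]
    return x


# Toy network: node 0 is perturbed and inhibits node 1.
W = [[0.0, 0.0],
     [-2.0, 0.0]]
state = simulate(W, alpha=[1.0, 1.0], beta=[1.0, 1.0], P=[1.0, 0.0])
print(state)
```

Because f is bounded, each x_i stays within beta_i/alpha_i in magnitude, which is the sense in which the ODEs are bounded.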

Optimize the network model

Minimize the discrepancy between prediction and experiment while keeping the model simple!

$$E = E_{SSQ} + E_{STRUCT}$$

$E_{SSQ}$: sum-of-squares error between prediction and experiment; $E_{STRUCT}$: structural complexity

$$\frac{dx_i}{dt} = \beta_i f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$
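A small sketch of the two-term error may help make the trade-off concrete. The slide does not specify how structural complexity is measured, so this example assumes a penalty proportional to the number of non-zero weights; the function name `model_error` and the parameter `lam` are hypothetical.

```python
def model_error(predicted, observed, W, lam=0.1):
    """E = E_SSQ + E_STRUCT for one candidate network.

    E_SSQ: sum of squared differences between predicted and observed values.
    E_STRUCT: assumed here to be lam times the number of non-zero weights in W.
    """
    e_ssq = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    e_struct = lam * sum(1 for row in W for w in row if w != 0.0)
    return e_ssq + e_struct


# One prediction is off by 1.0 and the network has a single interaction.
error = model_error([1.0, 2.0], [1.0, 1.0], [[0.0, 0.5], [0.0, 0.0]], lam=0.1)
print(error)
```

With a penalty of this kind, adding an interaction is only worthwhile if it reduces the squared error by more than `lam`.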

Optimization algorithm, examples: explore alternative network structures

movie by Evan Molinelli

Best network model deduced from dual drug perturbation in MCF7 cancer cell lines

Dual drug perturbation in MCF7 cancer cell line @ MSKCC

Does the model work? Leave out one drug combo at a time, compute best model, predict & compare with experiment

$$\frac{dx_i}{dt} = \beta_i f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$

Set of Best Network Model(s)

Network structure deduced from dual drug perturbation in MCF7 cancer cell lines

$$\frac{dx_i}{dt} = \beta_i f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$

Power of CoPIA network models (CoPIA = Combinatorial Perturbation Analysis)

Capture …

multiple perturbation

epistasis (synergy/antagonism)

feedback loops

time-dependent processes

modification of prior knowledge

CoPIA Network Models - Applications

design combination therapy

refine pathway models

identify drug targets

predict outcomes

Extend & complete existing network models !

Cancer Genomics

Functional consequences of somatic mutations

Molecular alterations in pathway context

Toward Combinatorial Therapy

Combinatorial Perturbations & Network Models

Information Infrastructure

Pathway Commons & Author Fact Deposition

Integrate Pathway Information

Facilitate creation and communication of pathway data

Aggregate pathway data in the public domain

Provide easy access for pathway analysis

http://www.pathwaycommons.org

Community Process !

bioPAX

http://iHOP-net.org

Genes & compounds & interactions from millions of abstracts, instantly

Robert Hoffmann, Benjamin Gross, Chris Sander. iHop-net.org version 2 released 6 Dec 2006

Factoids: digital abstracts to databases

As authors submit a paper they deposit structured facts

to a public database

How to get rich biological knowledge into a computable form

Postdocs wanted

Sander Group – Computational & Systems Biology @ MSKCC in NYC

Upper East Side Tri-I Campus: Sloan Kettering, Cornell Weill, Rockefeller

Cancer genomics (dry)

Network pharmacology (wet)

We pause for station identification…

Summary

Toward Combinatorial Therapy: use multiple perturbation experiments to build predictive network models

Cancer Genomics: the active sub-pathway model of cancer biology

Pathway Commons: one-stop-shop access to pathway information using the bioPAX common language

Cytoscape, bioPAX & Pathway Commons: Emek Demir, Ethan Cerami, Ben Gross, Robert Hoffmann, Ken Fukuda, bioPAX community, Gary Bader

Perturbational cell biology: Sven Nelander, Wei Qing Wang, Peter Gennemark, Neal Rosen, Christine Pratilas

Small RNAs: Doron Betel, Rob Sheridan, Christina Leslie, Debora Marks, Tom Tuschl, Eric Kandel

Protein Families & Combinatorial Entropy: Boris Reva, Jenya Antipin

Cancer Genomics: Nikolaus Schultz, Barry Taylor, Boris Reva, J Antipin, A Stukalov, John Major, Nick Socci, Sam Singer, Marc Ladanyi, Matt Meyerson, Jordi Barretina

TGFα • 6000 gene RNAi screen: Nikolaus Schultz, Dina Marenstein, Joan Massague, Hakim Djaballah

Support: Bioinformatics Core in the Computational Biology Center at MSKCC

Optimization algorithm 1: outer loop, explore alternative network structures

$$\frac{dx_i}{dt} = \beta_i f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$

occasionally climb uphill in Monte Carlo fashion

[Figure labels: model, error]
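The idea of exploring network structures while occasionally accepting an uphill move can be sketched with a Metropolis-style acceptance rule. This is a toy illustration under stated assumptions, not the optimizer used in CoPIA: the function name `metropolis_search`, the edge-toggle proposal, the `temperature` parameter, and the toy error function are all hypothetical.

```python
import math
import random


def metropolis_search(error, n, steps=2000, temperature=0.05, seed=0):
    """Search over sets of directed edges (i, j) among n nodes.

    error: callable mapping a frozenset of edges to a non-negative float.
    Each step toggles one random edge; downhill moves are always accepted,
    uphill moves are accepted with probability exp(-increase / temperature).
    Returns the best structure seen and its error.
    """
    rng = random.Random(seed)
    edges = frozenset()
    current = error(edges)
    best, best_err = edges, current
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue  # no self-interactions in this toy version
        proposal = edges ^ {(i, j)}  # add the edge if absent, remove if present
        e = error(proposal)
        if e <= current or rng.random() < math.exp(-(e - current) / temperature):
            edges, current = proposal, e
            if e < best_err:
                best, best_err = proposal, e
    return best, best_err


# Toy error: number of edges that differ from a known target structure.
target = frozenset({(0, 1), (1, 2)})
best, best_err = metropolis_search(lambda e: float(len(e ^ target)), n=3)
print(sorted(best), best_err)
```

In a real setting the error callable would fit the weights for the proposed structure (the inner loop) and return the combined prediction and complexity error.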

Optimization algorithm 2: inner loop, optimize parameters along a trajectory in solution space

$$\frac{dx_i}{dt} = \beta_i f\Big( \sum_j W_{ij} x_j + P_i \Big) - \alpha_i x_i$$

adjust weights

[Figure labels: model, error, explicit]

result: optimal model parameters $W_{ij}$