Expression signatures as biomarkers: solving combinatorial problems with gene networks Andrey...

Post on 19-Dec-2015


Expression signatures as biomarkers: solving combinatorial problems with gene networks

Andrey Alexeyenko

Department of Medical Epidemiology and Biostatistics, Karolinska Institute

FunCoup is a data integration framework to discover functional coupling in eukaryotic proteomes with data from model organisms

[Figure: is mouse gene A coupled to mouse gene B? Find orthologs in human, fly, rat, and yeast and use their high-throughput evidence.]

Andrey Alexeyenko and Erik L.L. Sonnhammer. Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Research. Published in Advance February 25, 2009

FunCoup

• Each piece of data is evaluated
• Data FROM many eukaryotes (7)
• Practical maximum of data sources (>50)
• Predicted networks FOR a number of eukaryotes (10…)
• Organism-specific efficient and robust Bayesian frameworks
• Orthology-based information transfer and phylogenetic profiling
• Networks predicted for different types of functional coupling (metabolic, signaling etc.)

http://FunCoup.sbc.su.se

FunCoup was queried for any links between members of the TGFβ pathway (left blue circle) and genes that recur in known cancer pathways (members of at least 7 out of 18 groups; right blue circle). MAPK1 and MAPK3 belonged to both categories.

TGFβ <-> cancer pathway cross-talk

FunCoup: recapitulation of known cancer pathways

Figure 5 from: The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008 Sep 4. [Epub ahead of print]

The same genes submitted to FunCoup. No TCGA data were used. Outgoing links are not shown.

Single molecular markers are (often) far from perfect. Combinations (signatures) should perform better.

The problem:

How to select optimal combinations?

[Figure labels: outcome, optimal treatment, severity/urgency, etc.]

Biomarker discovery in network context

The idea:

Construct multi-gene predictors with regard to network context

• Reduce the computational complexity
• Make marker sets biologically sound

Accounting for network context is taking either:
a) network neighbors, or
b) genes at remote network positions
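The two options can be illustrated with a minimal sketch. The toy adjacency list, the gene labels, and the function names below are hypothetical; they are not FunCoup data or a FunCoup API, and the distance threshold is an arbitrary choice.

```python
from collections import deque

# Toy undirected network (hypothetical gene labels, not FunCoup data).
TOY_NETWORK = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g4"},
    "g3": {"g1"},
    "g4": {"g2", "g5"},
    "g5": {"g4"},
}

def distances_from(network, seed):
    """Breadth-first search: shortest path length from seed to each reachable gene."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nb in network[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def neighbors(network, seed):
    """Option (a): genes directly linked to the seed."""
    return sorted(network[seed])

def remote_genes(network, seed, min_distance=3):
    """Option (b): genes at least min_distance steps away from the seed."""
    dist = distances_from(network, seed)
    return sorted(g for g, d in dist.items() if d >= min_distance)

print(neighbors(TOY_NETWORK, "g1"))
print(remote_genes(TOY_NETWORK, "g1"))
```

Either function can be used to restrict which candidate probes are allowed to join a signature that already contains the seed gene.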

“Rotterdam” dataset (Wang et al., 2005): 286 patients

Expression: ~22000 probes

Clinical data:

• Estrogen receptor status: +/–
• Lymph node status: all negative
• Relapse: yes/no and time (days)

Procedure

Individual probe p-values (~22000): estrogen receptor-specific ability to predict relapse

Select most significant probes (1000): candidate members for marker signatures

Compile set of probes: N probes at a time (e.g. N=20 or N=50)

1. Split data: 75% to train, 25% to test.
2. Produce a linear regression equation (weight terms step-wise, reward for performance, penalize for complexity) on the train sub-set.
3. Apply the equation to the test set to predict outcome (relapse yes/no).
4. Record the specificity/sensitivity (Type I/II error rates) as ROC curve.

Repeat m times

RELAPSE = γ_1 g_1 + γ_2 g_2 + γ_3 g_3 + … + γ_N g_N
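A minimal sketch of the split/fit/test loop on synthetic data. It is not the original implementation: it assumes ordinary least squares in place of the step-wise weighting with a complexity penalty, adds an intercept term that the equation does not show, and summarizes each ROC curve by its area. The names `roc_auc` and `evaluate_signature` and all data are hypothetical.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Fraction of (relapse, no-relapse) pairs ranked correctly; ties count 0.5.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def evaluate_signature(X, y, m=20, train_frac=0.75, seed=0):
    """Repeat m times: split 75/25, fit a linear equation on train, score on test."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    aucs = []
    for _ in range(m):
        order = rng.permutation(n)
        train, test = order[:n_train], order[n_train:]
        if y[test].min() == y[test].max():
            continue  # test split lacks one outcome class; AUC undefined
        # Assumption: plain least squares with intercept, no step-wise selection.
        A = np.column_stack([np.ones(len(train)), X[train]])
        gamma, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ gamma
        aucs.append(roc_auc(y[test], pred))
    return np.array(aucs)

# Synthetic stand-in for expression data: 200 "patients", 5 "probes".
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
noise = rng.normal(scale=0.5, size=200)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(float)

aucs = evaluate_signature(X, y, m=20)
print(aucs.mean())
```

Running the same function on many candidate probe sets and comparing the resulting AUC distributions gives a way to rank signatures.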

Test X randomly retrieved sets

Take the best ones

Account for the network context

Candidate signature in the network

Biomarker candidates

Ready signature in the network

RELAPSE = γ_1 EIF3S9 + γ_2 CRHR1 + γ_3 LYN + … + γ_N KCNA5

Testing “top”, “free”, and “network” approaches

[Histogram, estrogen receptor status: positive. X axis: quality of prognosis relapse/no relapse (area under ROC curve), 90% to 97%. Y axis: frequency. Series: netw, free; "Top" marked.]

[Histogram, estrogen receptor status: negative. X axis: quality of prognosis relapse/no relapse (area under ROC curve), 93% to 99%. Y axis: frequency. Series: netw, free; "Top" marked.]

Signature involves genes mutated in cancer

Tumour tcga-02-0114-01a-01w

Cancer individuality: each tumor is unique in its molecular state and set of mutated/disordered genes

Partial correlations: a way to get rid of spurious links

[Figure: example links labeled with correlations 0.7, 0.6, and 0.4]
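A small worked example of a first-order partial correlation. Which of the three values belongs to which pair of genes is not recoverable from the slide; the assignment below (0.4 for the x–y link, 0.7 and 0.6 for the links to a third gene z) is an assumption made only for illustration.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Assumed assignment of the three values.
r = partial_correlation(r_xy=0.4, r_xz=0.7, r_yz=0.6)
print(round(r, 3))
```

Under this assignment the partial correlation is close to zero, so the x–y link would be explained by the shared partner z and could be dropped as spurious.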

Cancer individuality via network view

Functional coupling:

• transcription <-> transcription
• transcription <-> methylation
• methylation <-> methylation
• mutation <-> methylation
• mutation <-> transcription
• mutation <-> mutation

+ mutated gene

FunCoup is a framework for biomarker discovery:

• Markers can be discovered and presented in the network dimension.
• Choice of data types to incorporate is unlimited, from metabolite profiling to patient phenotypes.

Useful features:

• Web-based resource ready for further expansion and for presenting new research results in an interactome perspective.
• Cross-species network comparison of human and model organisms.
• Efficient query system to retrieve network environments of interest.

http://FunCoup.sbc.su.se

Thank you for your attention!

Decomposing biological context

[Figure: link classes Common, Developmental, and Dioxin-enabled; panel values rPLC = 0.88, rPLC = 0.95, rPLC = 0.76]

ANOVA (Analysis Of VAriance):

Look at F-ratios: signal of interest / residual (“error”) variance
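A minimal sketch of the ratio for the simplest case. The slide does not specify the ANOVA design, so a one-way layout is assumed here, with the between-group mean square as the "signal of interest" and the within-group mean square as the residual variance. The function name and the toy measurements are hypothetical.

```python
import numpy as np

def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group mean square / within-group mean square."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    # Signal: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Residual ("error"): spread of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy expression values for one gene under three conditions.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f_ratio(toy_groups))
```

A large ratio means the differences between conditions dominate the residual scatter within conditions.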

Accounting for edge features: dioxin-enabled vs. dioxin-sensitive links

Andrey Alexeyenko, Deena M Wassenberg, Edward K Lobenhofer, Jerry Yen, Erik LL Sonnhammer, Elwood Linney, Joel N Meyer. Transcriptional response to dioxin in the interactome of developing zebrafish. Submitted.
