Network and pathway analysis in systems biology - Melissa Davis

Network and pathway analysis in systems biology. Dr. Melissa Davis, The University of Queensland, Institute for Molecular Bioscience


Page 1: Network and pathway analysis in systems biology - Melissa Davis


Page 2: Network and pathway analysis in systems biology - Melissa Davis
Page 3: Network and pathway analysis in systems biology - Melissa Davis

• Regulated by complex cellular circuitry
– Extra- and intra-cellular signals are transacted through networks of interacting molecules
– Changes in cellular signalling result in the activation or repression of programs of gene expression

• Protein interactions, metabolic network, signalling pathways, gene regulatory networks

Source: Cell, Volume 144, Issue 5, Pages 646-674

Page 4: Network and pathway analysis in systems biology - Melissa Davis

Interpret similarities in large cohorts
– Statistics of feature selection
– Interpretation of single ‘omics results
– Discovery of biomarkers

Interpret individual differences for patient-specific treatment
– Robust n=1 analysis methods
– Interpret multiple ‘omics results simultaneously
– Discover diagnostic features

Reductionist biology
– List of molecules implicated in condition
– Selection of molecule of interest
– Hypothesis generation
– Experiment to determine role of molecule

Systems biology
– Networks, pathways implicated in condition
– Identify perturbed or deregulated systems
– Hypothesis generation
– Experiment to determine responses of the system

Page 5: Network and pathway analysis in systems biology - Melissa Davis

So…

• We need new methods for data interpretation

• We need better knowledge bases
– Context specificity
– Molecular resolution
– Biological complexity

• All this requires knowledge engineering:

• Computational biology can enrich biological context, improve molecular resolution and capture missing biological complexity

• New informatic methods in cancer research that enable comparative systems analysis for individuals

Page 6: Network and pathway analysis in systems biology - Melissa Davis

Computational biology of molecular interaction networks

Page 7: Network and pathway analysis in systems biology - Melissa Davis

Protein interaction networks

• Advantages
– Increasing coverage
– Powerful insights
– Increasing quality
– Visualisation

• Disadvantages
– Inadequate metadata
– Poor molecular resolution
– Aggregated conditions
– Flattened, generic PPI network
– Little evidence of the biological specificity
– Meaning of interactions missing

Page 8: Network and pathway analysis in systems biology - Melissa Davis

Thakur et al., (1997)

• Protein-protein interactions are useful
– Understand subcellular localisation (Chin et al., 2009)
– Perform comparative mammalian systems analysis (Chin, Davis and Ragan, 2009)
– Interpret proteomics data in prostate cancer (Inder et al., 2011; Inder, Davis and Hill, 2012)

• PPI data are usually assigned to a reference protein, or even gene

• Previously characterised the impact of alternative splicing on subcellular localisation (Davis et al. 2006)

• Little or no isoform specificity exists in most PPI datasets

• Does alternative splicing generate protein isoforms that have different interaction potential?

Guo and Qui, (2011)

Page 9: Network and pathway analysis in systems biology - Melissa Davis

Davis et al., Mol. BioSyst., 2012

Buljan et al., Mol. Cell, 2012

Ellis et al., Mol. Cell, 2012

Alternative splicing of domains rewires protein-protein interactions

Tissue-specific exons enriched for disordered regions favoring binding

Cassette exons regulated by neural specific splicing regulator modulate PPIs

Observations are not tissue specific, no analysis of disordered regions

Do not address splicing of protein domains explicitly, use exons not isoforms as unit of analysis, limited expression data

Page 10: Network and pathway analysis in systems biology - Melissa Davis

Rewiring the dynamic interactome

Davis et al., Mol. BioSyst., 2012, 8, 2054-2066

8860 transcriptional units (genes) with both alternative isoforms and protein interaction domains:

3DID: Stein et al. (2010)

H-Invitational + Fantom 3

PPI: Shin et al. (2009)

Page 11: Network and pathway analysis in systems biology - Melissa Davis

What is happening with the interactions?
– 1787 genes involved in known interactions

1287 215 644

• STAT1 (isoform cannot bind CREBBP)

• AKT1 (isoform retains kinase domains but loses PH)

• PTPN11 (isoform lacks an SH2 domain and cannot bind JAK2)

• GRB2 (isoform GRB3-3 participates in distinct interactions and signalling)

Page 12: Network and pathway analysis in systems biology - Melissa Davis

Current work: Tissue specific interactions

• Updated the DDI data and PPI data

• Illumina Bodymap 2.0: RNA-seq data produced using the HiSeq 2000 (2010)

• 16 phenotypically normal tissues: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells

[Figure: analysis workflow. Labels: 16 human tissues RNA-seq mapped (two replicates); diagnostic features; AlexaSeq; Cufflinks; expressed?]

Page 13: Network and pathway analysis in systems biology - Melissa Davis

PPI network: 14528 interactions

+ Protein interaction domains (3DID): 2622

+ Isoform domain annotation (Ensembl): 151664

(69255 Isoform Interactions) (3627 Gene Interactions)

Example complexes shown: Ribosome Complex, Neurotransmitter Complex
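
The inputs listed on this slide (a gene-level PPI network, 3DID domain-domain interactions, and per-isoform domain annotation) can be combined into isoform-level interactions. The following is a minimal Python sketch of one way to do that, assuming an isoform pair is kept when at least one domain pair across the two isoforms is a known domain-domain interaction. The retention rule and all gene and isoform identifiers are hypothetical illustrations, not the published pipeline.

```python
from itertools import product

# Hypothetical toy inputs; every identifier here is illustrative only.
# Gene-level PPI edges.
gene_ppi = [("GENE_A", "GENE_B")]

# Domain-domain interactions (DDI), stored as unordered pairs.
ddi = {frozenset(("SH2", "Pkinase_Tyr"))}

# Domain annotation per isoform, grouped by gene.
isoform_domains = {
    "GENE_A": {"A-201": {"SH2", "SH3"}, "A-202": {"SH3"}},  # A-202 lacks SH2
    "GENE_B": {"B-201": {"Pkinase_Tyr"}},
}


def supported(domains_x, domains_y, ddi_pairs):
    """True if any domain pair across the two isoforms is a known DDI."""
    return any(frozenset((dx, dy)) in ddi_pairs
               for dx, dy in product(domains_x, domains_y))


def isoform_interactions(ppi, iso_domains, ddi_pairs):
    """Expand each gene-level edge into the isoform pairs a DDI can explain."""
    edges = []
    for gene_x, gene_y in ppi:
        pairs = product(iso_domains.get(gene_x, {}).items(),
                        iso_domains.get(gene_y, {}).items())
        for (iso_x, dom_x), (iso_y, dom_y) in pairs:
            if supported(dom_x, dom_y, ddi_pairs):
                edges.append((iso_x, iso_y))
    return edges


iso_edges = isoform_interactions(gene_ppi, isoform_domains, ddi)
print(iso_edges)  # only the SH2-containing isoform of GENE_A keeps the edge
```

In this toy case the isoform that has lost its SH2 domain drops out of the network, which is the kind of rewiring the preceding slides describe for PTPN11 and JAK2.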

Page 14: Network and pathway analysis in systems biology - Melissa Davis

Domain and process results

Domain          Count
Pkinase         393
Pkinase_Tyr     389
SH2             217
Ras             183
7tm_1           164

GO Biological Process (variable genes)       q-value
Intracellular signaling cascade              6.78e-42
Response to organic substance                6.26e-31
Positive regulation of molecular function    7.98e-29
Positive regulation of catalytic activity    7.76e-27
Regulation of apoptosis                      3.55e-25

Information regarding genes with variable domain architecture within the maximal PID network. (a) Five most common domain classes present in the variable genes, showing an enrichment for signalling domains. (b) GO biological process enrichment scores for the same genes.

Page 15: Network and pathway analysis in systems biology - Melissa Davis

Pathway level analysis

Page 16: Network and pathway analysis in systems biology - Melissa Davis

• Protein isoforms functionally diverse

• Interaction network is rewired by splicing of interaction domains

• Identify interaction networks for specific tissues

• Isoform variability: emerging theme of opposing function

• Very strong enrichment for signalling proteins

• Part of normal phenotypic diversity

BUT also has a role in cancer and disease:
– Isoforms of Gli1 (from Shh pathway)
– MST1R (RON) isoforms (upstream of MAPK pathway)
– P53 isoforms with dominant negative effect
– Switch to developmentally restricted isoforms
– Transcript variants and protein isoforms as potential diagnostic and therapeutic targets

Page 17: Network and pathway analysis in systems biology - Melissa Davis
Page 18: Network and pathway analysis in systems biology - Melissa Davis

Modelling information flow in pathways

• Pathways contain a richer representation of biological information than PPI networks

• Mechanistic models are desirable -> hypothesis generation

• Kinetic parameters aren’t available for all reactions

• Does network topology contain sufficient information for predicting system-level responses?

HIF1A

VHL

CUL2

Page 19: Network and pathway analysis in systems biology - Melissa Davis

Topology matters

Grouping of molecules into sets breaks connectivity and eliminates real crosstalk

Page 20: Network and pathway analysis in systems biology - Melissa Davis

Representation of multi-cellular interactions

Page 21: Network and pathway analysis in systems biology - Melissa Davis

Is there such a thing as a pathway at all?

• Our concept of a pathway as a linear series of events is largely a fiction

• Signalling proteins may be active in many pathways

• More correct to think of this as a network

[Figure labels: AKT1; EGFR, FGFR, NGF, PDGF, SCF-KIT, ERBB2, GPCR]

Immune system, Membrane trafficking, Gene expression, Hemostasis, Apoptosis, and Metabolism

Page 22: Network and pathway analysis in systems biology - Melissa Davis

PATHLOGIC-S

• Extract the reaction network from REACTOME (BioPAX L3 model)

• Convert to a Boolean logical model of the signal transduction network

• No parameterisation

• Enumerate the capabilities of the network

Fearnley et al., 2012, PLOS One
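
To make the idea of an unparameterised Boolean model concrete, here is a minimal Python sketch. It is not the PATHLOGIC-S implementation and does not read BioPAX. It assumes each reaction fires when all of its inputs are present and none of its inhibitors are, and it computes everything the network can produce from a starting set of molecules. The reaction list and molecule names are hypothetical.

```python
# Hypothetical reaction network; names are placeholders, not Reactome entities.
# Inhibitors are assumed to be fixed inputs that no reaction produces, so the
# result does not depend on the order in which reactions are visited.
REACTIONS = [
    {"inputs": {"ligand", "receptor"}, "inhibitors": set(),
     "outputs": {"active_receptor"}},
    {"inputs": {"active_receptor", "adaptor"}, "inhibitors": set(),
     "outputs": {"complex"}},
    {"inputs": {"complex", "kinase"}, "inhibitors": {"phosphatase"},
     "outputs": {"active_kinase"}},
]


def reachable(initial, reactions):
    """Return every molecule that can be present given the initial set."""
    present = set(initial)
    changed = True
    while changed:  # repeat until no reaction adds anything new
        changed = False
        for rxn in reactions:
            fires = (rxn["inputs"] <= present
                     and not (rxn["inhibitors"] & present))
            if fires and not rxn["outputs"] <= present:
                present |= rxn["outputs"]
                changed = True
    return present


full = reachable({"ligand", "receptor", "adaptor", "kinase"}, REACTIONS)
knockout = reachable({"ligand", "receptor", "kinase"}, REACTIONS)  # no adaptor
print("active_kinase" in full, "active_kinase" in knockout)  # True False
```

Removing a molecule from the starting set, as in the `knockout` call, is one simple way to ask what a deletion does to downstream signalling without any kinetic parameters.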

Page 23: Network and pathway analysis in systems biology - Melissa Davis

Comparison to phosphoproteomic data

Novel differential signalling predicted

Observed differential signalling predicted

Observed molecules accurately predicted

Observed molecules inaccurately predicted

Observed molecules fail to map

• Adapt the PATHLOGIC system to model different experimental states

• Evaluate predictions against published experimental results

• Performance depends on pathway coverage

• Good sensitivity

• Small true negative datasets -> difficult to calculate accurate specificity

• Novel, mechanistic predictions

Steen et al., 2002; Osinalde et al., 2011
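
The point about small true-negative sets can be illustrated with a short Python sketch of how sensitivity and specificity might be computed when comparing predictions with phosphoproteomic observations. The function, the molecule names, and the toy data are hypothetical. Molecules observed in the experiment but absent from the model are counted separately as unmapped.

```python
def sensitivity_specificity(predicted, observed):
    """Compare two dicts mapping molecule -> bool (signalling changed?).

    Returns (sensitivity, specificity, unmapped), where unmapped counts
    observed molecules that the model does not contain.
    """
    shared = predicted.keys() & observed.keys()
    tp = sum(predicted[m] and observed[m] for m in shared)
    tn = sum(not predicted[m] and not observed[m] for m in shared)
    fp = sum(predicted[m] and not observed[m] for m in shared)
    fn = sum(not predicted[m] and observed[m] for m in shared)
    sens = tp / (tp + fn) if tp + fn else None
    spec = tn / (tn + fp) if tn + fp else None
    return sens, spec, len(observed) - len(shared)


# Hypothetical toy data: only two observed negatives (p3, p4).
predicted = {"p1": True, "p2": True, "p3": False, "p4": True}
observed = {"p1": True, "p2": True, "p3": False, "p4": False, "p5": True}

sens, spec, unmapped = sensitivity_specificity(predicted, observed)
print(sens, spec, unmapped)  # 1.0 0.5 1
```

With only two negatives in the toy data, a single false positive moves specificity from 1.0 to 0.5, which is why a small true-negative set makes the estimate unreliable.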

Page 24: Network and pathway analysis in systems biology - Melissa Davis

• EGFR signalling

• Gold-standard map (Oda et al., MSB, 1: 2005.0010) compared with representation in Reactome
– Overlap (red)
– Equivalency (purple)
– Greens (present in different pathways)
– White (not present)
– Limited crosstalk found that was not captured in the gold standard

Bauer-Mehren et al., (2009) MSB, 5:290

Page 25: Network and pathway analysis in systems biology - Melissa Davis

EGF Signalling results

• Validation data: EGF phospho-proteomic experiment characterising downstream activity resulting from EGF stimulation (Steen et al., 2002)

• Model two conditions:
– Without EGF present -> EGF and related molecules switched off
– In presence of EGF -> EGF as an input is switched on, and related molecules are set as undetermined

• Correct predictions for 6 positive and 3 negative results

• 38 additional proteins are predicted to have altered signalling
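
One way to handle "undetermined" inputs when modelling the two conditions is three-valued logic. The Python sketch below is an assumption about how that could look, not a description of how PATHLOGIC does it. It uses `None` for undetermined, evaluates each node as an AND of its upstream nodes, and compares the two conditions. The rules and node names are hypothetical, and the receptor is assumed to be known present in the stimulated condition.

```python
def kleene_and(values):
    """Three-valued AND: False dominates, then None (undetermined), then True."""
    values = list(values)
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True


# Hypothetical rules: node -> upstream nodes that must all be on.
# Listed in topological order so one pass is enough.
RULES = {
    "receptor_active": ["EGF", "receptor"],
    "adaptor_bound": ["receptor_active", "adaptor"],
    "kinase_active": ["adaptor_bound"],
}


def simulate(inputs, rules):
    """Evaluate every rule once; nodes missing from the state are undetermined."""
    state = dict(inputs)
    for node, upstream in rules.items():
        state[node] = kleene_and(state.get(u) for u in upstream)
    return state


without_egf = simulate({"EGF": False, "receptor": False, "adaptor": False}, RULES)
with_egf = simulate({"EGF": True, "receptor": True, "adaptor": None}, RULES)

# Definite changes: determined in both conditions and different.
definite = sorted(n for n in RULES
                  if None not in (without_egf[n], with_egf[n])
                  and without_egf[n] != with_egf[n])
# Possible changes: undetermined in at least one condition.
possible = sorted(n for n in RULES
                  if without_egf[n] is None or with_egf[n] is None)
print(definite, possible)
```

Separating definite from possible changes keeps the undetermined inputs visible in the output instead of silently treating them as on or off.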

Page 26: Network and pathway analysis in systems biology - Melissa Davis

• Advanced generation pathway analysis techniques use pathway topology and rich biological information attached to interactions

• Screen for the consequences of mutations, knock-outs, altered connectivity

• Identify the effects of drugs targeting specific molecules or pathways

• Models are untrained and unfitted – more accurate models in specific cells or tissues to improve predictive power

Page 27: Network and pathway analysis in systems biology - Melissa Davis

• Using CNV data from pancreatic cancer to identify patient-specific gene deletions and simulate the effects on signal transduction using PATHLOGIC

• Bone-specific interactions and secreted factors to identify candidate systems implicated in prostate cancer metastasis to bone

• Simulations on DNA repair pathways to identify synthetic-lethal genes in breast cancer

• Protein interactions to develop network-based biomarkers in medulloblastoma

Applications in cancer research

Page 28: Network and pathway analysis in systems biology - Melissa Davis

Cancer heterogeneity

• Recent work in a number of cancers has characterised genetic heterogeneity within tumours
– Subsection or single cell sequencing coupled with phylogenetic inference to infer clonal populations

• What can we tell about heterogeneity in existing data by using new informatic approaches?

• Not interested in what is the same between tumours, but what is different

• mCOPA: analysis of heterogeneous features in cancer expression data (Wang, Taciroglu, Maetschke, Nelson, Ragan and Davis, J.Clin.Bioinf. 2012)
– identifies over- and under-expressed outliers in individual tumour samples

Page 29: Network and pathway analysis in systems biology - Melissa Davis

Why are outliers interesting?

• Tumours have diverse molecular characteristics

• Not all interesting genes have a biomarker-like profile

• Need a statistical method to detect outliers in gene expression data

Tomlins et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 2005, 310:644-648.

• mCOPA – stand-alone method for detection of over- and under-expressed outliers

• COPA transformation
– COPA Score = (score – median)/median absolute deviation

• Improved outlier detection

• Filters
– Fold change calculation
– No normal samples contain outliers for the feature of interest

• Generation of outlier feature list
– Over-expressed and under-expressed feature list for each individual sample
• -1 (under-expressed outlier)
• 0 (not an outlier)
• 1 (over-expressed outlier)
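
The transformation and the -1/0/1 coding can be sketched in a few lines of Python. This is an illustration, not the mCOPA code. It assumes the scaling uses the median absolute deviation, as in the original COPA of Tomlins et al., a fixed score cutoff (mCOPA's actual thresholds may differ), and it applies only the "no outliers among normal samples" filter, leaving out the fold-change filter. All values are hypothetical.

```python
from statistics import median


def copa_scores(values):
    """Median-centre the values and scale by the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; feature has no spread")
    return [(v - med) / mad for v in values]


def call_outliers(tumour, normal, cutoff=3.0):
    """Code each tumour sample as 1 (over), -1 (under) or 0 (not an outlier).

    The cutoff is an assumed value. If any normal sample is itself an
    outlier, the whole feature is discarded (all zeros).
    """
    scores = copa_scores(tumour + normal)
    calls = [1 if s >= cutoff else -1 if s <= -cutoff else 0 for s in scores]
    tumour_calls = calls[:len(tumour)]
    normal_calls = calls[len(tumour):]
    if any(normal_calls):
        return [0] * len(tumour)
    return tumour_calls


# Hypothetical expression values for one gene.
tumour = [50, 52, 49, 120, 51, 10]
normal = [50, 51, 48]
print(call_outliers(tumour, normal))  # [0, 0, 0, 1, 0, -1]
```

Running this per gene gives, for each tumour, a list of over- and under-expressed outlier features that can be carried into later steps.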

Page 30: Network and pathway analysis in systems biology - Melissa Davis

Applications: Unsupervised clustering with mCOPA

Data and sample annotation from Tomlins et al. (2007)

Page 31: Network and pathway analysis in systems biology - Melissa Davis

Feature selection and clustering analysis methodology

1. Select features from 12 datasets based on one of three methods
– Variance (top 1000 most variable genes)
– Differential expression (p value < 0.01)
– Outliers

2. Apply different clustering algorithms on each feature set

3. Compare resulting clusters to clinical annotation and generate Rand index

4. Evaluate performance on clinically defined cancer subtypes
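
Step 3 compares a clustering with the clinical annotation. A minimal Python sketch of the plain (unadjusted) Rand index follows. Whether the original analysis used the plain or the adjusted form is not stated here, so treat the choice as an assumption. The labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two partitions agree.

    A pair agrees when both partitions put the two samples together,
    or both put them apart. Needs at least two samples.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


# Hypothetical cluster assignments and clinical subtypes for six tumours.
clusters = [0, 0, 1, 1, 2, 2]
clinical = ["A", "A", "B", "B", "B", "B"]
ri = rand_index(clusters, clinical)
print(round(ri, 3))  # 0.733
```

A value of 1.0 means the clustering reproduces the clinical subtypes exactly; here the clustering splits subtype B in two, so some pairs disagree.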

[Workflow figure: DE analysis, Variance analysis, mCOPA analysis -> PAM, K-means -> Sil, CH -> Dataset annotation -> RAND calculation -> Evaluation]

mCOPA features produce the best clustering in 7 cases, compared with 2 for DE and 3 when using the original COPA method

Page 32: Network and pathway analysis in systems biology - Melissa Davis

Feature selection approaches

• Distinct biology

– Usually, minimal overlap between Variable, Differentially Expressed and Outlier genes

– Functional analysis reveals distinct functions and processes for selected genes

GO analysis

mCOPA Outlier Genes
– UP: Cell cycle, cell division
– DOWN: Apoptosis, positive regulation of kinase cascade, and signalling

Differentially Expressed Genes
– UP: Cell adhesion, Wnt and Cadherin signalling
– DOWN: Oxidative metabolism, Cholesterol metabolism

Page 33: Network and pathway analysis in systems biology - Melissa Davis

Can we use under-expressed outliers to identify tumour suppressors?

mCOPA analysis: 223 under-expressed outliers

Gene Ontology Database: 727 cell cycle regulators

12 potential tumour suppressors:

RBL2, CDK6, TP63, BIRC2

SON, PAFAH1B1

PDCD4, RBBP8, DBC1

FZR1, CDC14B, HEXIM1

Figure categories: known prostate cancer tumour suppressors; potential new prostate cancer tumour suppressors; known cancer tumour suppressors; potential novel tumour suppressors

Cancer Gene Index: http://wiki.nci.nih.gov/display/cageneindex

Page 34: Network and pathway analysis in systems biology - Melissa Davis

Evidence for novel tumour suppressors

FZR1 (Degrades positive regulators of cell cycle, prevents entry into mitosis following DNA damage.)

TCGA Prostate CNV:

Also
• Significant loss in TCGA: Ovarian, Lung, Gastric, Endometrium, Breast
• Expression: Significantly under-expressed in 46 experiments
• Under-expressed outlier in 17% of cancer experiments

Also
• Significant loss in TCGA: Breast, Ovarian, Renal, Lung, Endometrium, but not Prostate
• Expression: Significantly under-expressed in 84 experiments
• Under-expressed outlier in 22% of cancer experiments

CDC14B (Regulates the G2 DNA damage checkpoint following DNA damage.)

TCGA Ovarian CNV: TCGA Ovarian CNV:

HEXIM1 (transcriptional regulator via RNA Polymerase II transcription inhibition.)

Also
• Significant loss in TCGA: Colorectal, Ovarian, Breast, Prostate and Endometrium
• Expression: Significantly under-expressed in 101 experiments
• Under-expressed outlier in 11% of cancer experiments

Page 35: Network and pathway analysis in systems biology - Melissa Davis

Pathway analysis for individuals

• Most outliers are present in only one sample

• We can treat the set of outliers for a given sample as input to a pathway analysis for each tumour

• Some pathways affected only in a single patient

• Some pathways show disruption in multiple tumours
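
A per-tumour pathway analysis of this kind could, for example, use a hypergeometric over-representation test on each sample's outlier set. The Python sketch below is one such illustration. The choice of test, the gene identifiers, the pathway, and the background size are all assumptions made for the example, not the method used in the talk.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_pvalues(outliers_by_sample, pathway, background_size):
    """One p-value per tumour for a single pathway gene set."""
    return {
        sample: hypergeom_pvalue(len(genes & pathway), len(genes),
                                 len(pathway), background_size)
        for sample, genes in outliers_by_sample.items()
    }


# Hypothetical data: 20 background genes, a 5-gene pathway, two tumours.
pathway = {"g0", "g1", "g2", "g3", "g4"}
outliers_by_sample = {
    "tumour_1": {"g0", "g1", "g2", "g10"},  # 3 of 4 outliers in the pathway
    "tumour_2": {"g11", "g12"},             # none in the pathway
}
pvals = pathway_pvalues(outliers_by_sample, pathway, background_size=20)
print(pvals)
```

In the toy data the pathway is over-represented among the outliers of one tumour only, matching the case of a pathway affected in a single patient. In practice the p-values would also need correction for testing many pathways and samples.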

Page 36: Network and pathway analysis in systems biology - Melissa Davis

• Using outlier profiles to understand heterogeneity in multi-focal prostate cancer

• Exploring outlier profiles to improve gene regulatory network inference

• Tumour-specific outliers as input to pathway modelling and simulation

Applications in cancer research

Page 37: Network and pathway analysis in systems biology - Melissa Davis
Page 38: Network and pathway analysis in systems biology - Melissa Davis

• Systems biology needs to move beyond simple networks to representations that are rich in biology

• Formal, machine-readable mechanistic models of biological knowledge

• Knowledge-based analytical methods will enable n=1 scale analysis

• Computational analyses can generate networks rich in biology and with great predictive power

Data can be generated by machines, but to generate knowledge from data, we need to start with what we know

Page 39: Network and pathway analysis in systems biology - Melissa Davis

Acknowledgements

The Institute for Molecular Bioscience: Mark Ragan, Rohan Teasdale, Sean Grimmond, and Brandon Wainwright

UQ: Lars Nielsen, Michelle Hill, and Nicholas Saunders

QIMR: Nicole Cloonan

QUT: Colleen Nelson

Stefan Maetschke (UQ), Chenwei Wang (QUT)

Current students: David Wood, Liam Fearnley, Josha Inglis, Akash Boda

Previous students: ChangJin Shin, Piyush Mathamshettiwar

Ning Jing, Alperen Taciroglu, Anna-Belle Beau, Yoann Glougen, Chang Liu, Anh Phuong Le

[grant number DP110103384]