USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14...

60
USING NETWORKS TO UNDERSTAND THE GENOTYPE-PHENOTYPE CONNECTION John Quackenbush Dana-Farber Cancer Institute Harvard TH Chan School of Public Health

Transcript of USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14...

Page 1: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

USING NETWORKS

TO UNDERSTAND THE

GENOTYPE-PHENOTYPE

CONNECTION

John Quackenbush

Dana-Farber Cancer Institute

Harvard TH Chan School of Public Health

Page 2: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Essentially, all models are wrong,

but some are useful.

– George E. Box

Page 3: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

How we do Network Analysis

Ge

ne

s

Conditions

Expression data

(Phenotype I)

Ge

ne

s

Conditions

Expression data

(Phenotype II)

Infer

phenotype-specific

network

Infer

phenotype-specific

network

Compare Network

Topologies

Analyze Network

Topology

and Structure●

●●●●●●

●●

●●

●●

●●

●●●●●

●●●●

●● ●●

SNP Degree Distribution

Degree

Fre

qu

en

cy

10−4

10−3

10−2

10−1

100

100

101

102

●●●●●●●●●●●

●●●

●●●

●●●●●●●●●

●●●●●

●●●●

●●●

●●●

●●●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●●●

●●●

●●

●●●

●●●●●●●●●●●●●● ●● ●●●

Gene Degree Distribution

Degree

10−3

10−2

10−1

100

100

101

102

Simultaneously

compare differential

structure

and expression

+

Page 4: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

• There is no single “right” network

• The structure of the network matters and network

structure often changes between states.

• We have to move from asking “Is the network right?”

to asking “Is the network useful?”

• The real question is “Does a network model inform

our understanding of biology?”

Starting Assumptions

Page 5: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

The Methodological Zoo

LIONESS

CONDOR

MONSTER

ALPACA

GRN

mRNA

GRN GRN GRN GRN GRN

Node Communities

& Core-scores

GRNpairs of

networks

single

network

single-sample networks

Data Integration

& Network

Modelling

Comparative

Network

Analysis

prioritize SNPs

eQTLs

Public Data Resources

Public Data

Gene

Expression

Genetic

Profiles

edgeQTLs

(NEW!)

Differential

Node

Communities

Differential TFs

Network

Algorithm

Input

Data

Algorithm

Output

Hypothesis

Test

Legend

integrate

variants

model

covariates

Edge Communities

(set of edges perturbed by SNPs)

PANDA – Passing Attributes

between Networks for Data

Assimilation

SPIDER– Seeding PANDA

Iterations Determining

Epigenetic Regulation

PUMA – PANDA Using Micro-

RNA Associations

CONDOR – COmplex Network

Description Of Regulators

LIONESS – Linear Interpolation

to Obtain Network Estimates

for Single Samples

ALPACA – ALtered Partitions

Across Community

Architectures

MONSTER – MOdeling Network

State Transitions from

Expression and Regulatory

data

Algorithm Key

PPI

StringDB

Proteomics?

PANDA

TF-

gene

Epigenetic

data?

yes

miRNA-

gene

PUMA

FIMO/motif-scan TargetScan

SPIDER

no

Pre

lim

inary

Data

&

GR

N c

on

str

ucti

on

Page 6: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Question 1:

Can we solve the

“GWAS Puzzle”?

Page 7: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Genome Wide Association Studies (GWAS)

© Gibson & Muse, A Primer of Genome Science

Page 8: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

697 SNPs explain 20% of height

~2,000 SNPs explain 21% of height

~3,700 SNPs explain 24% of height

~9,500 SNPs explain 29% of height

Page 9: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

97 SNPs explain 2.7% of BMI

All common SNPs may explain 20% of BMI

Do we give up on GWAS, fine map everything,

or think differently?

Page 10: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Rare Variants = Dust

…large-scale sequencing does not support the idea that lower-frequency

variants have a major role in predisposition to type 2 diabetes.

Page 11: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

eQTL AnalysisExpression Quantitative Trait Locus Analysis (eQTL

Analysis) uses genome-wide data on genetic variants

(SNPs) together with gene expression data

Treat gene expression as a quantitative trait

Ask, “Which SNPs are correlated with the degree of

gene expression?”

Most people concentrate on cis-acting SNPs

What about trans-acting SNPs?

Page 12: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

cis-eQTL Analysis

Page 13: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

trans-eQTL Analysis

Page 14: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

eQTL Networks: A simple idea

John Platig

• Perform a “standard eQTL” analysis:

Y = β0 + β1 ADD + ε

where Y is the quantitative trait and ADD is

the allele dosage of a genotype.

Representing eQTLs as a network and

analyzing its structure should provide insight

in the complex interactions that drive

disease.

Page 15: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Many strong eQTLs are found near the target

gene. But what about multiple SNPs that are

correlated with multiple genes?

Genes

SNPs

eQTL Networks: A simple idea

Can a network of SNP-

gene associations

(cis and trans) inform

the functional roles of

these SNPs?

Page 16: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

The Result: A Hairball

Some random hairball I grabbed. I was too lazy to make one.

Page 17: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Results: COPD

●●●●●●

●●

●●

●●

●●

●●●●●

●●●●

●● ●●

SNP Degree Distribution

Degree

Fre

qu

en

cy

10−4

10−3

10−2

10−1

100

100

101

102

●●●●●●●●●●●

●●●

●●●

●●●●●●●●●

●●●●●

●●●●

●●●

●●●

●●●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●●

●●●●

●●●

●●

●●●

●●●●●●●●●●●●●● ●● ●●●

Gene Degree Distribution

Degree

10−3

10−2

10−1

100

100

101

102

~30,000 SNPs and ~3,400 Genes

Degree – number of links per node

Page 18: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

What about GWAS SNPs?

John Platig

● ●

● ●●

●●●

● ●

● ● ●

Degree

Fre

qu

en

cy

SNP Degree Distribution

10−

41

0−

310−

21

0−

110

0

100

101

102

● ●

● ●

GWAS SNPs

All SNPs

● ●

● ●●

●●●

● ●

● ● ●

Degree

Fre

qu

en

cy

SNP Degree Distribution

10−

41

0−

310−

21

0−

110

0

100

101

102

● ●

● ●

GWAS SNPs

All SNPs

The “hubs”

are a

GWAS desert!

Page 19: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Can we use this network to identify

groups of SNPs and genes that play

functional roles in the cell?

Try clustering the nodes into “communities” based on the

network structure

Page 20: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Newman 2006 (PNAS)

• Community structure algorithms group nodes

such that the number of links within a community

is higher than expected by chance

• Formally, they assign nodes to communities such

that the modularity, Q, is optimized

Fraction of network

links in community i

Fraction of

links expected

by chance

Communities are groups of highly intra-connected nodes

Page 21: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Communities in COPD eQTL networks

Page 22: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

• We identified 52 communities, with Q = 0.79 (out of 1)

• Of 34 communities in the giant connected component, 11 are

enriched for genes with coherent functions (GO Terms;

P<5x10-4)

Communities in COPD eQTL networks

Page 23: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Communities in COPD eQTL networks

- Immune response

- Stress response

- T cell stimulation

- Microtubule organization

- Cell cycle

- Centrosome

- Chromatin Assembly

- DNA conformation

change

- Nucleosome

assembly

Page 24: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Identifying community cores

Newman 2006 (PNAS)

• Score each SNP by its contribution to the modularity of its

community

• Do these “core scores” reflect known biology?

1

32

Page 25: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

What about COPD GWAS SNPs?

• Use a meta-analysis by Cho et. al. and consider

34 COPD GWAS SNPs (FDR < 0.05)

Cho, Michael H., et al. "Risk loci for chronic obstructive pulmonary disease: a

genome-wide association study and meta-analysis." The Lancet Respiratory

Medicine 2.3 (2014): 214-225.

Page 26: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Core Scores for COPD GWAS SNPs

The median core score for the 34 FDR-significant GWAS SNPs is

20.3 times higher than the median for non-significant SNPs

Page 27: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

First in lung tissue/COPD

Page 28: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

How general is this?

Page 29: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Now in thirteen tissues

Page 30: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

GTEx: A big sandbox

Page 31: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

eQTL networks are highly modular

Page 32: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

GWAS SNPs are cores, but not hubs

Page 33: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

GWAS SNPs not Hubs—in every tissue

Page 34: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Core SNPs are more likely to be functionally annotated

Page 35: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Core SNPs are different from Hubs:

Roadmap Epigenomics Project

Global Hubs

Nongenic Enhancers

Polycomb Repressed

Regions

Data from 8 tissues

Local Hubs (Core SNPs)

Tissue-specific active

chromatin

Page 36: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

• The SNPs that are global hubs are not GWAS hits—

meaning that they are not linked to diseases or traits.

• The SNPs and genes group into communities that

share function—a family of SNPs regulate a function.

• Disease-associated (GWAS) SNPs map to communities

whose genes have functions that make biological

sense.

• “Core” SNPs are far more likely to be disease SNPs.

• Tissue-specific functions are in tissue-specific

communities with tissue-specific genes organized

around SNPs in tissue-specific open chromatin.

What does this tell us?

Page 37: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Question 2:

Can we model

gene regulatory processes?

Page 38: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Integrative Network Inference:

PANDA

Page 39: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Regulation of Transcription

Specific transcription factors

promoterregulatory

sequences

Page 40: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

A Simple Idea: Message PassingTranscription Factor

Downstream Target

The TF is Responsible for

communicating with its Target

The Target must be Available

to respond to the TF

Kimberly Glass, GC Yuan

Page 41: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

PPI0 Co-regulation0

Network1

Responsibility Availability

Network0

TF-Motif Scan

Co-regulation1PPI1

Glass et. al. “Passing Messages Between Biological Networks to Refine Predicted Interactions.” PLoS One. 2013 May 31;8(5):e64832.Code and related material available on sourceforge: http://sourceforge.net/projects/panda-net/

Message-Passing Networks: PANDA

Page 42: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Subtypes of Ovarian Cancer

Page 43: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Kimberly Glass, GC Yuan

Gen

es

Condition

s

Expression data

(Angiogenic)

Gen

es

Condition

s

Expression data

(Non-angiogenic)

Co

mp

are

/Iden

tify D

iffere

nces

Network for

Angiogenic Subtype

Network for

Non-angiogenic Subtype

PANDA: Integrative Network Models

Page 44: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Kimberly Glass, GC Yuan

Page 45: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Kimberly Glass, GC Yuan

Page 46: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Kimberly Glass, GC Yuan

Page 47: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Inner ring: key TFs

Colored by Edge

Enrichment (A or N)

Outer ring: genes

Colored by Differential

Expression (A or N)

Interring Connections

Colored by

Subnetwork (A or N)

Ticks – genes

annotated to

“angiogenesis” in GO,

Page 48: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

TF1 TF2 sig. # Class

ARID3A PRRX2 1.16E-23 244 A+

ARID3A SOX5 1.01E-14 155 A+

PRRX2 SOX5 3.83E-12 157 A+

ARNT MZF1 5.83E-23 92 N-

AHR ARNT 6.13E-16 382 N-

ETS1 MZF1 9.08E-16 148 N-

Co

-reg

ula

tory

TF

Pa

irs

Complex Regulatory Patterns Emerge

Kimberly Glass, GC Yuan

Page 49: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Kimberly Glass, GC Yuan

Regulatory Patterns suggest Therapies

Page 50: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

More application papers coming….

Page 51: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

More application papers coming….

Page 52: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Question 3:

Can we move beyond

THE Network?

Page 53: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Reconstructing Gene Regulatory

NetworksGene Expression Data

Genes

Multiple Samples

Network Reconstruction

One Network

Pearson Correlation, Mutual

Information, CLR, PANDA, etc.

Marieke Kuijjer, Matt Tung, Kimbie Glass

We generally estimate “Aggregate” Networks.

Page 54: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Gene Expression Data

Genes

Multiple Samples

Reconstructing Gene Regulatory

Networks

Network Reconstruction

One Network

Represents information

from all the input

samples

Linear Interpolation to Obtain Network

Estimates for Single Samples

(LIONESS)

+ + + =

Pearson Correlation, Mutual

Information, CLR, PANDA, etc.

Page 55: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Sample q’s contribution to e(α)

Network e(α)

Representing contributions

from all samples

Reconstruct

network

Network e(α-q)

Representing contributions

from all samples except q

Reconstruct

network

e(α) e(α-q)

1 2 3 4 5 6 … q … Ns

Samples

Genes

X1 2 3 4 5 6 … q … Ns

Genes

Samples

Single-Sample Networks (LIONESS)

-

𝑁𝑠 𝑒𝑖𝑗(α)

− 𝑒𝑖𝑗α−𝑞

+ 𝑒𝑖𝑗α−𝑞

= 𝑒𝑖𝑗(𝑞)

scale

factor

Ns( - )e(α-q) e(q)

Sample q’s

network

Network

estimated

without

sample q

+ =

Page 56: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

• Data includes 48 total expression arrays taken over a time-

course (every 5 min) on synchronized yeast cells (~2 cell

cycles)

• Includes technical replicates (Cy3/Cy5 and Cy5/Cy3)

• Estimate single-sample networks using data for each replicate.

Replicate 1 (24 Samples 24 Networks)

Replicate 2 (24 Samples 24 Networks)

A Quick Test Using Yeast Cell Cycle Data

Re

p1

Re

p2

Rep1 Rep2 0.5

0.8Correlation in Networks

Re

p1

Re

p2

Rep1 Rep2-0.8

0.8Correlation in Expression

Page 57: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Validation and Insight

Marieke Kuijjer, Matt Tung, Kimbie Glass

Page 58: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

good

prognosis

poor

prognosis

• Single-sample networks for TCGA

glioblastoma patients

• 3yr survival to define good and poor

prognosis

• LIMMA analysis using network

“edge Z-scores”

• Analysis points to important roles

for FOS-JUN and NFkB

• Gene degree differences identify

mitosis and immune-related genes

Significant

differential targeting

(gene degree

differences),

GSEA, FDR<0.01

higher targeting in

good

higher targeting in

poor

Glioblastoma

network signatures

Page 59: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Before I came here I was confused

about this subject.

After listening to your lecture,

I am still confused but at a higher level.

- Enrico Fermi, (1901-1954)

Page 60: USING NETWORKS TO UNDERSTAND THE GENOTYPE …...ARID3A PRRX2 1.16E-23 244 A+ ARID3A SOX5 1.01E-14 155 A+ PRRX2 SOX5 3.83E-12 157 A+ ARNT MZF1 5.83E-23 92 N-AHR ARNT 6.13E-16 382 N-ETS1

Gene Expression Team

Fieda Abderazzaq

Aedin Culhane

Jessica Mar

Renee Rubio

University of Queensland

Christine Wells

Lizzy Mason

<[email protected]>

AcknowledgmentsStudents and Postdocs

Joseph Barry

Joey (Cho-Yi) Chen

Marieke Kuijjer

Camila Lopes-Ramos

Zachary McCaw

Megha Padi

Joseph Paulson

John Platig

Daniel Schlauch

Daphne Tsoucas

DFCI Radiology

Hugo Aerts

Administrative Support

Nicole Trotman

Center for Cancer

Computational Biology

Fieda Abderazzaq

Stas Alekseev

Nicole Flanagan

Ed Harms

Lev Kuznetsov

Brian Lawney

Antony Partensky

John Quackenbush

Renee Rubio

Yaoyu E. Wang

http://cccb.dfci.harvard.edu

http://compbio.dfci.harvard.edu

CDNM, Brigham and Women’s Hospital

Peter Castaldi

Michael Cho

Dawn DeMeo

Kimberly Glass

Ed Silverman

Xiaobo Zhou

Alumni

Martin Aryee

Stefan Bentink

Kimberly Glass

Benjamin Haibe-Kains

Kaveh Maghsoudi

Jess Mar

Melissa Merritt

Alejandro Qiuiroz

J. Fah Sathirapongsasuti