Gene Set Enrichment Analysis

17
Gene Set Enrichment Analysis Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

description

Gene Set Enrichment Analysis. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. A quick review. Gene expression profiling Which molecular processes/functions are involved in a certain phenotype (e.g., disease, stress response , etc.) - PowerPoint PPT Presentation

Transcript of Gene Set Enrichment Analysis

Page 1: Gene Set Enrichment Analysis

Gene SetEnrichment Analysis

Genome 559: Introduction to Statistical and Computational Genomics

Elhanan Borenstein

Page 2: Gene Set Enrichment Analysis

Gene expression profiling Which molecular processes/functions

are involved in a certain phenotype (e.g., disease, stress response, etc.)

The Gene Ontology (GO) Project Provides shared vocabulary/annotation GO terms are linked in a complex

structure Enrichment analysis:

Find the “most” differentially expressed genes Identify functional annotations that are over-represented Modified Fisher's exact test

A quick review

Page 3: Gene Set Enrichment Analysis

A quick review:Modified Fisher's exact test

Differentially expressed (DE) genes/balls

4 out of 810 out of 50

2 out of 8 2 out of 84 out of 81 out of 8 2 out of 8 5 out of 8 3 out of 8

Null model: the 8 genes/balls are selected randomly

So, if you have 50 balls, 10 of them are blue, and you pick 8 balls randomly, what is the probability that k of them are blue?

Do I have a surprisingly high number of blue genes?

Genes/balls

Page 4: Gene Set Enrichment Analysis

A quick review:Modified Fisher's exact test

m=50, mt=10, n=8

Hypergeometric distribution

So … do I have a surprisinglyhigh number of blue genes?

Can such high numbers (4 or above) occur by change?

What is the probability of getting at least 4 blue genes in the null model?

P(σt >=4)

Prob

abili

ty

k0 1 2 3 4 5 6 7 8

0.15

0.30

0

Page 5: Gene Set Enrichment Analysis

Enrichment AnalysisClassA ClassB

Gen

es ra

nked

by

expr

essi

on c

orre

latio

n to

Cla

ss A

Cutoff

Biologicalfunction?

Page 6: Gene Set Enrichment Analysis

Enrichment AnalysisClassA ClassB

Gen

es ra

nked

by

expr

essi

on c

orre

latio

n to

Cla

ss A

Cutoff

Biologicalfunction?

2 /

10

Func

tion

1(e

.g.,

met

abol

ism

)

5 /

11

Func

tion

2(e

.g.,

sign

alin

g)

3 /

10

Func

tion

3(e

.g.,

regu

latio

n)

Page 7: Gene Set Enrichment Analysis

After correcting for multiple hypotheses testing, no individual gene may meet the threshold due to noise.

Alternatively, one may be left with a long list of significant genes without any unifying biological theme.

The cutoff value is often arbitrary!

We are really examining only a handful of genes, totally ignoringmuch of the data

Problems with cutoff-based analysis

Page 8: Gene Set Enrichment Analysis

MIT, Broad Institute V 2.0 available since Jan 2007

Gene Set Enrichment Analysis

(Subramanian et al. PNAS. 2005.)

Page 9: Gene Set Enrichment Analysis

Calculates a score for the enrichment of a entire set of genes rather than single genes!

Does not require setting a cutoff!

Identifies the set of relevant genes as part of the analysis!

Provides a more robust statistical framework!

GSEA key features

Page 10: Gene Set Enrichment Analysis

Gene Set Enrichment AnalysisClassA ClassB

Gen

es ra

nked

by

expr

essi

on c

orre

latio

n to

Cla

ss A

Cutoff

Biologicalfunction?

2 /

10

5 /

11

3 /

10

Func

tion

1(e

.g.,

met

abol

ism

)

Func

tion

2(e

.g.,

sign

alin

g)

Func

tion

3(e

.g.,

regu

latio

n)

Page 11: Gene Set Enrichment Analysis

Gene Set Enrichment AnalysisClassA ClassB

Gen

es ra

nked

by

expr

essi

on c

orre

latio

n to

Cla

ss A

Running sum:Increase when gene is in set

Decrease otherwise

Func

tion

1(e

.g.,

met

abol

ism

)

Func

tion

2(e

.g.,

sign

alin

g)

Func

tion

3(e

.g.,

regu

latio

n)

Page 12: Gene Set Enrichment Analysis

Gene Set Enrichment Analysis

What would you expect if the hits were randomly distributed?

What would you expect if most of the hits cluster at the top of the list?

Page 13: Gene Set Enrichment Analysis

Gene Set Enrichment Analysis

Genes within functional set

(hits)

Running sum

Enrichment score (ES) = max deviation from 0

Leading Edge genes

Page 14: Gene Set Enrichment Analysis

Gene Set Enrichment AnalysisLow ES (evenly distributed)ES = 0.43 ES = -0.45

Page 15: Gene Set Enrichment Analysis

Ducray et al. Molecular Cancer 2008 7:41

Gene Set Enrichment Analysis

Page 16: Gene Set Enrichment Analysis

1. Calculation of an enrichment score(ES) for each functional category

2. Estimation of significance level of the ES An empirical permutation test

Phenotype labels are shuffled and the ES for this functional set is recomputed. Repeat 1000 times.

Generating a null distribution

3. Adjustment for multiple hypotheses testing Necessary if comparing multiple gene sets (i.e.,functions)

Computes FDR (false discovery rate)

GSEA Steps

Page 17: Gene Set Enrichment Analysis