GREAT improves functional interpretation of cis-regulatory regions

Cory Y McLean (1), Dave Bristor (1,2), Michael Hiller (2), Shoa L Clarke (3), Bruce T Schaar (2), Craig B Lowe (4), Aaron M Wenger (1), Gill Bejerano (1,2)

Departments of (1) Computer Science, (2) Developmental Biology, (3) Genetics, Stanford University

(4) Department of Biomolecular Engineering, University of California Santa Cruz

ChIP-seq identifies functional non-coding regions of the genome

Gene-based enrichment tools are inappropriate to analyze cis-regulatory elements

• Variation in gene distribution makes genes with no others nearby more likely to be selected than genes in clusters [2,3]

• Shown are results from sets scattered randomly throughout the genome, associating regions with the nearest genes within 1 Mb
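
The bias described in the bullets above can be illustrated with a small simulation. This is a minimal sketch, not the analysis behind the poster figure: the toy genome, the gene coordinates, and the function name `nearest_gene_counts` are all hypothetical, and each random region is reduced to a single position that is assigned to its one nearest gene if that gene lies within 1 Mb.

```python
import bisect
import random

def nearest_gene_counts(gene_positions, genome_length, n_regions, max_dist, seed=0):
    """Count how often each gene is the nearest gene to a random position."""
    rng = random.Random(seed)
    genes = sorted(gene_positions)
    counts = {g: 0 for g in genes}
    for _ in range(n_regions):
        pos = rng.randrange(genome_length)
        i = bisect.bisect_left(genes, pos)
        # Only the neighbours on either side of pos can be the nearest gene.
        candidates = genes[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda g: abs(g - pos))
        if abs(nearest - pos) <= max_dist:
            counts[nearest] += 1
    return counts

# Hypothetical toy genome: a five-gene cluster plus one isolated gene.
cluster = [1_000_000 + 10_000 * k for k in range(5)]
isolated = 6_000_000
counts = nearest_gene_counts(cluster + [isolated], genome_length=10_000_000,
                             n_regions=10_000, max_dist=1_000_000)
print("isolated gene:", counts[isolated])
print("clustered genes:", [counts[g] for g in cluster])
```

In this toy setup the isolated gene collects every random position within 1 Mb on either side, while genes inside the cluster split a narrow window between them, so a gene-based test would see the isolated gene "selected" far more often by chance alone.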

1. Fields, S., Science, 2007

2. Lowe et al., Proc. Natl. Acad. Sci., 2007

3. Taher, L. & Ovcharenko, I., Bioinformatics, 2009

GREAT supports many ontologies in both human and mouse

• Human: NCBI Build 36.1 (hg18)

• Mouse: NCBI Build 37 (mm9)

• Twenty supported ontologies span Gene Ontology, pathways, gene expression, regulatory motifs, phenotypes and human disease, and gene families

Example: GREAT infers many functions of Serum Response Factor (SRF) from its binding profile

• ChIP-seq identified SRF binding profile in human [5]

• Gene-based enrichment analysis identified only general terms as highly enriched [5]

• GREAT detects enrichments for specific functions of SRF

GREAT is easy to navigate and provides detailed information

• GREAT accurately assesses statistical enrichments of cis-regulatory sequences such as those generated by ChIP-seq, open chromatin, comparative genomics, etc.

• GREAT supports 20 diverse ontologies for both human and mouse

• Application to multiple transcription-associated factors in a variety of contexts shows both detailed enrichment for known functions and potential avenues for further investigation of the assayed factors [4]

• Online tool available at http://great.stanford.edu

• ChIP-seq peaks identify cis-regulatory elements of interest (transcription factor binding sites, methylation domains, etc.)

• Identified regions work in cis to affect expression of nearby genes

http://great.stanford.edu

Input: BED regions of interest

Advanced options: Alter association rules between genomic regions and genes

Output: Ontology Term Enrichments
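
The input named above is a set of regions in BED format. As a minimal sketch of what such input looks like to a program, the parser below reads the three required BED columns (chromosome, 0-based start, exclusive end) and skips header lines. The names `Region` and `parse_bed` are hypothetical and are not part of GREAT.

```python
from typing import Iterable, List, NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse the first three columns of BED-formatted lines."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if end < start:
            raise ValueError(f"end before start in BED line: {line!r}")
        regions.append(Region(chrom, start, end))
    return regions

# Hypothetical example input with one header line and two peaks.
example = ["track name=peaks", "chr1\t100\t200\tpeak1", "", "chr2 5 10"]
regions = parse_bed(example)
```

Extra BED columns such as a peak name or score are ignored here, since only the coordinates matter for associating regions with genes.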

Summary

4. McLean, C.Y. et al., Nat. Biotechnol., 2010

5. Valouev, A. et al., Nat. Methods, 2008

6. Kent, W. et al., Genome Res., 2002

Gene-based GO Enrichments of SRF Peaks [5]

GREAT Ontology Enrichments of SRF Peaks

[adapted from ref. 1]

Genomic regions annotated as “actin cytoskeleton”

Seamless integration with UCSC Genome Browser [6] visualization tools

Ontology                Term                                      Binomial P-value
GO: Cellular Component  actin cytoskeleton                        6.91 × 10^-9
GO: Cellular Component  cortical cytoskeleton                     4.03 × 10^-6
GO: Mol. Func.          actin binding                             5.21 × 10^-5
TF Targets              Targets of SRF                            4.97 × 10^-76
TF Targets              Targets of YY1                            1.45 × 10^-6
TF Targets              Targets of E2F4 and p130                  4.73 × 10^-3
Promoter Motifs         SRF variants                              4.54 × 10^-28 to 4.19 × 10^-12
Promoter Motifs         GABPA/GABPB                               4.20 × 10^-9
Promoter Motifs         Motif NGGGACTTTCCA                        1.02 × 10^-4
Promoter Motifs         EGR1                                      1.71 × 10^-4
Pathway Commons         TRAIL signaling                           2.37 × 10^-7
Pathway Commons         Class I PI3K signaling                    9.92 × 10^-7
TreeFam                 FOSL2 / JDP2 / FOS / FOSL1 / FOSB / ATF3  9.66 × 10^-9

Hypothesis: location, location, function, co-regulator (×6), pathway, pathway, gene family
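
The "Binomial P-value" column can be read as a one-sided binomial tail probability. The sketch below assumes the formulation in which each input region is one trial, a trial succeeds when the region falls in the regulatory domain of a gene annotated with the term, and the success probability is the fraction of the genome covered by such domains. The function name `binomial_tail` and the numbers in the example are hypothetical and are not taken from the SRF analysis.

```python
import math

def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)

    def log_pmf(i):
        # log of C(n, i) * p^i * (1 - p)^(n - i), computed via lgamma to
        # avoid overflow for large n.
        log_choose = (math.lgamma(n + 1) - math.lgamma(i + 1)
                      - math.lgamma(n - i + 1))
        return log_choose + i * log_p + (n - i) * log_q

    return math.fsum(math.exp(log_pmf(i)) for i in range(k, n + 1))

# Hypothetical numbers: 1000 input regions, 50 of which hit a term whose
# regulatory domains cover 1% of the genome (10 hits expected by chance).
p_value = binomial_tail(n=1000, k=50, p=0.01)
print(p_value)
```

Because the test counts regions rather than genes, a term whose genes sit in large gene-poor domains gets a correspondingly larger `p`, which is how this kind of test accounts for the gene-distribution bias.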

Legend: Novel testable hypothesis; Positive control; Known from literature*

1. Current methods ignore distal binding

2. Including distal binding using gene-based tools results in bias

3. Distal binding events comprise a large fraction of all binding events

Details page for “actin cytoskeleton”

Genes annotated as “actin cytoskeleton” with associated genomic regions nearby

Genomic regions annotated as “actin cytoskeleton”

Frame holding http://www.geneontology.org definition of “actin cytoskeleton”

All input genomic regions

Experimental support: Miano et al 2007 (×3); Natesan & Gilman 1995; Novel (×4); Bertolotto et al 2000; Poser et al 2000; Chai & Tarnawski 2002

20 ontologies

Term statistics

Display filters

Data export options

Multiple hypothesis correction options
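
Because every term in every ontology is tested, raw P-values need correcting for multiple hypotheses. The poster does not say which corrections the interface offers, so the sketch below assumes two common choices, Bonferroni and Benjamini-Hochberg, purely for illustration. The function names and the example P-values are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that adjusted
    # values never increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values for three tested terms.
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected fraction of false positives among the reported terms.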

H: human, M: mouse

• SRF, NRSF, GABP from [4]

• Stat3, p300 in ESC from Chen et al., 2008

• p300 in other tissues from Visel et al., 2009

The Genomic Regions Enrichment of Annotations Tool (GREAT) [4] accurately analyzes cis-regulatory element enrichments

* Known interactions typically implicate a small subset of genes that GREAT identifies as potentially related to the processes of interest.

Version 1.2 interface

Open API allows submission of data from other tools as well

1. Crosslink proteins to DNA and lyse cells

2. Fragment chromatin, add a protein-specific antibody and purify protein-DNA complexes

3. Reverse crosslinks, sequence isolated DNA and map reads to genome