

GREAT improves functional interpretation of cis-regulatory regions

Cory Y McLean 1, Dave Bristor 1,2, Michael Hiller 2, Shoa L Clarke 3, Bruce T Schaar 2, Craig B Lowe 4, Aaron M Wenger 1, Gill Bejerano 1,2

Departments of 1. Computer Science, 2. Developmental Biology, 3. Genetics, Stanford University

4. Department of Biomolecular Engineering, University of California Santa Cruz

ChIP-seq identifies functional non-coding regions of the genome

Gene-based enrichment tools are inappropriate to analyze cis-regulatory elements

• Variation in gene distribution makes genes with no others nearby more likely to be selected than genes in clusters [2,3]
• Shown are results from sets scattered randomly throughout the genome, associating regions with the nearest genes within 1 Mb
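The bias described above can be illustrated with a small simulation. This is a toy sketch, not the analysis behind the poster's figure: the chromosome length, the gene positions, and the function names are all made up for illustration. It scatters random positions along one chromosome and assigns each to the nearest transcription start site (TSS) within 1 Mb.

```python
import bisect
import random

def nearest_gene(pos, tss_sorted, max_dist=1_000_000):
    """Return the index of the nearest TSS within max_dist of pos, or None.

    Assumes tss_sorted is a non-empty, sorted list of TSS coordinates.
    """
    i = bisect.bisect_left(tss_sorted, pos)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_sorted)]
    best = min(candidates, key=lambda j: abs(tss_sorted[j] - pos))
    return best if abs(tss_sorted[best] - pos) <= max_dist else None

def simulate(tss_sorted, chrom_len, n_regions, seed=0):
    """Count how many random positions are assigned to each gene."""
    rng = random.Random(seed)
    hits = [0] * len(tss_sorted)
    for _ in range(n_regions):
        gene = nearest_gene(rng.randrange(chrom_len), tss_sorted)
        if gene is not None:
            hits[gene] += 1
    return hits

# Hypothetical 10 Mb chromosome: five clustered genes 10 kb apart,
# plus one isolated gene with no neighbours within several Mb.
tss = [1_000_000, 1_010_000, 1_020_000, 1_030_000, 1_040_000, 6_000_000]
hits = simulate(tss, chrom_len=10_000_000, n_regions=20_000)

for position, count in zip(tss, hits):
    print(f"TSS at {position:>9,}: {count} random regions assigned")
```

In this toy layout the isolated gene captures positions from a 2 Mb window, while a gene in the middle of the cluster captures only about 10 kb, so the isolated gene is selected far more often even though the positions are random.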

1. Fields, S., Science, 2007

2. Lowe et al., Proc. Natl. Acad. Sci., 2007

3. Taher, L. & Ovcharenko, I., Bioinformatics, 2009

GREAT supports many ontologies in both human and mouse

• Human: NCBI Build 36.1 (hg18)
• Mouse: NCBI Build 37 (mm9)
• Twenty supported ontologies span Gene Ontology, pathways, gene expression, regulatory motifs, phenotypes and human disease, and gene families

Example: GREAT infers many functions of Serum Response Factor (SRF) from its binding profile

• ChIP-seq identified SRF binding profile in human [5]
• Gene-based enrichment analysis identified only general terms as highly enriched [5]
• GREAT detects enrichments for specific functions of SRF

Summary

• GREAT accurately assesses statistical enrichments of cis-regulatory sequences such as those generated by ChIP-seq, open chromatin, comparative genomics, etc.
• GREAT supports 20 diverse ontologies for both human and mouse
• Application to multiple transcription-associated factors in a variety of contexts shows both detailed enrichment for known functions and potential avenues for further investigation of the assayed factors [4]
• Online tool available at http://great.stanford.edu

GREAT is easy to navigate and provides detailed information

• ChIP-seq peaks identify cis-regulatory elements of interest (transcription factor binding sites, methylation domains, etc.)
• Identified regions work in cis to affect expression of nearby genes

http://great.stanford.edu
Input: BED regions of interest
Advanced options: Alter association rules between genomic regions and genes
Output: Ontology Term Enrichments
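To make the input and the idea of an association rule concrete, here is a minimal sketch that reads BED-formatted regions and links each one to genes. It is not GREAT's implementation or its default rule: the gene names, coordinates, and both rules shown (nearest TSS only, or every TSS within a distance cutoff) are assumptions chosen for illustration.

```python
def parse_bed(text):
    """Parse the first three BED columns (chrom, start, end).

    BED coordinates are 0-based and half-open; header lines are skipped.
    """
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions

def associate(regions, genes, max_dist=1_000_000, nearest_only=True):
    """Map each region to gene names whose TSS lies within max_dist.

    genes is a list of (name, chrom, tss) tuples. Distance is measured
    from the region midpoint. With nearest_only=True only the closest
    gene is kept; otherwise all genes in range are returned, closest first.
    """
    result = {}
    for chrom, start, end in regions:
        mid = (start + end) // 2
        in_range = sorted(
            (abs(tss - mid), name)
            for name, gene_chrom, tss in genes
            if gene_chrom == chrom and abs(tss - mid) <= max_dist
        )
        names = [name for _, name in in_range]
        result[(chrom, start, end)] = names[:1] if nearest_only else names
    return result

# Hypothetical input: three peaks and three genes.
bed_text = (
    "track name=peaks\n"
    "chr1\t1000\t1200\n"
    "chr1\t5000000\t5000300\n"
    "chr2\t100\t400\n"
)
genes = [("geneA", "chr1", 2_000), ("geneB", "chr1", 900_000), ("geneC", "chr1", 4_500_000)]

regions = parse_bed(bed_text)
nearest = associate(regions, genes, nearest_only=True)
all_within = associate(regions, genes, nearest_only=False)
```

Switching `nearest_only` or `max_dist` changes which genes, and therefore which ontology terms, a region can contribute to, which is why the association rule is exposed as an option.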

4. McLean, C.Y. et al., Nat. Biotechnol., 2010

5. Valouev, A. et al., Nat. Methods, 2008

6. Kent, W. et al., Genome Res., 2002

Gene-based GO Enrichments of SRF Peaks [5]

[adapted from ref. 1]

1. Current methods ignore distal binding
2. Including distal binding using gene-based tools results in bias
3. Distal binding events comprise a large fraction of all binding events

Genomic regions annotated as “actin cytoskeleton”

Seamless integration with UCSC Genome Browser [6] visualization tools

Details page for “actin cytoskeleton”

• Genes annotated as “actin cytoskeleton” with associated genomic regions nearby
• Genomic regions annotated as “actin cytoskeleton”
• Frame holding http://www.geneontology.org definition of “actin cytoskeleton”
• All input genomic regions

GREAT Ontology Enrichments of SRF Peaks

| Ontology | Term | Binomial P-value | Experimental support |
|---|---|---|---|
| GO: Cellular Component | actin cytoskeleton | 6.91 x 10^-9 | Miano et al 2007 |
| GO: Cellular Component | cortical cytoskeleton | 4.03 x 10^-6 | Miano et al 2007 |
| GO: Mol.Func. | actin binding | 5.21 x 10^-5 | Miano et al 2007 |
| TF Targets | Targets of SRF | 4.97 x 10^-76 | Positive control |
| TF Targets | Targets of YY1 | 1.45 x 10^-6 | Natesan & Gilman 1995 |
| TF Targets | Targets of E2F4 and p130 | 4.73 x 10^-3 | Novel |
| Promoter Motifs | SRF variants | 4.54 x 10^-28 to 4.19 x 10^-12 | Positive control |
| Promoter Motifs | GABPA/GABPB | 4.20 x 10^-9 | Novel |
| Promoter Motifs | Motif NGGGACTTTCCA | 1.02 x 10^-4 | Novel |
| Promoter Motifs | EGR1 | 1.71 x 10^-4 | Novel |
| Pathway Commons | TRAIL signaling | 2.37 x 10^-7 | Bertolotto et al 2000 |
| Pathway Commons | Class I PI3K signaling | 9.92 x 10^-7 | Poser et al 2000 |
| TreeFam | FOSL2 / JDP2 / FOS / FOSL1 / FOSB / ATF3 | 9.66 x 10^-9 | Chai & Tarnawski 2002 |

Hypothesis types: location (GO: Cellular Component), function (GO: Mol.Func.), co-regulator (TF Targets, Promoter Motifs), pathway (Pathway Commons), gene family (TreeFam)

Legend: Novel testable hypothesis; Positive control; Known from literature*
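The Binomial P-value column can be read as an upper-tail binomial probability over genomic regions. The sketch below assumes the test described in ref. 4: with n input regions, a fraction p of the genome associated with an ontology term, and k regions falling in that fraction, the P-value is the probability of seeing k or more hits by chance. The function name and the example numbers are hypothetical and do not reproduce any value in the table.

```python
import math

def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_terms = []
    for i in range(k, n + 1):
        # log of C(n, i) via log-gamma avoids overflow for large n
        log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        log_terms.append(log_comb + i * log_p + (n - i) * log_q)
    peak = max(log_terms)
    total = math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)
    return min(1.0, total)

# Hypothetical example: 1,000 input regions, a term whose associated
# domains cover 2% of the genome, and 60 regions landing in them.
p_value = binomial_upper_tail(k=60, n=1000, p=0.02)
print(f"binomial P-value: {p_value:.3g}")
```

Because the test counts regions rather than genes, a term whose genes have large regulatory domains gets a proportionally larger p, which is how this kind of test accounts for the uneven gene distribution noted earlier.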

• 20 ontologies
• Term statistics
• Display filters
• Data export options
• Multiple hypothesis correction options
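The poster does not say which multiple hypothesis corrections are offered, so the sketch below simply shows two widely used ones, Bonferroni and Benjamini-Hochberg, as an assumption about what such options typically cover. The P-values in the example are invented.

```python
def bonferroni(pvals):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each adjusted value is
    # the minimum of p * m / rank over this rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P-values for four ontology terms.
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected fraction of false positives among reported terms, which is often preferable when thousands of ontology terms are tested at once.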

H: human, M: mouse

• SRF, NRSF, GABP from [4]
• Stat3, p300 in ESC from Chen et al., 2008
• p300 in other tissues from Visel et al., 2009

The Genomic Regions Enrichment of Annotations Tool (GREAT) [4] accurately analyzes cis-regulatory element enrichments

* Known interactions typically implicate a small subset of genes that GREAT identifies as potentially related to the processes of interest.

Version 1.2 interface

Open API allows submission of data from other tools as well

• Crosslink proteins to DNA and lyse cells
• Fragment chromatin, add a protein-specific antibody and purify protein-DNA complexes
• Reverse crosslinks, sequence isolated DNA and map reads to genome