ENCODE project: brief summary of main findings

Post on 22-Nov-2014

A brief summary of the ENCODE project and its main findings: the most important publications for cancer researchers and how to make use of the ENCODE data.

ENCODE: Encyclopedia of DNA Elements

Outline

What and who is ENCODE

Key ENCODE topics and most important papers for our research

ENCODE data – make use of the encyclopedia…

Maté Ongenaert

What and who is ENCODE

Main aims, funding and the institutions/labs behind the 200 M $

Who?

- International consortium
- Funded by NHGRI (National Human Genome Research Institute): 200 million dollars
- Main collaborators (for human data):
  - Broad Institute (ChIP-seq)
  - HudsonAlpha Institute for Biotechnology (methylation)
  - Sanger Institute (RNA-seq)
  - Duke University (DNase)
  - Yale University (Pol II)
  - EBI (data analysis)

Main aims

“Build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active”

What’s so hot… It has been running for years?

- Pilot project on 1% of the genome: launched in 2003, results published in 2007
- 2007-2012: since then, introduction of new technologies
- Higher throughput, genome-wide
- Many more samples and different tissues (different ‘tiers’, see later)
- Better data analysis and integration

Worldwide press attention

Worldwide press attention… and criticisms

- “Popular” media focus on the “junk DNA” aspect
- The authors also claim in their press release that > 80% of the genome is ‘biologically active’ (versus ‘may be involved in regulation in one way or another’, versus junk DNA)
- ENCODE reveals for the first time many of the factors of the very complex switchboard controlling expression / …

30 (!) research papers published in three journals at the same time

Key ENCODE topics

Main ENCODE topics and selection of most important papers

Key topics

- Transcription factor binding motifs
- Chromatin patterns at transcription factor binding sites
- Characterization of intergenic regions and gene definitions
- RNA and chromatin modification patterns around promoters
- Epigenetic regulation of RNA processing
- Non-coding RNA characterization
- DNA methylation
- Enhancer discovery and characterization
- 3D connections across the genome
- Characterization of network topology
- Machine learning approaches to genomics
- Impact of functional information on understanding variation
- Impact of evolutionary selection on functional regions

Main paper

- 95% of the genome lies within 8 kilobases (kb) of a DNA–protein interaction
- Classifying the genome into seven chromatin states indicates an initial set of 399,124 regions with enhancer-like features and 70,292 regions with promoter-like features
- It is possible to correlate quantitatively RNA sequence production and processing with both chromatin marks and transcription factor binding at promoters, indicating that promoter functionality can explain most of the variation in RNA expression
- Single nucleotide polymorphisms (SNPs) associated with disease by GWAS are enriched within non-coding functional elements, with a majority residing in or near ENCODE-defined regions that are outside of protein-coding genes. In many cases, the disease phenotypes can be associated with a specific cell type or transcription factor
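
The "95% within 8 kb" figure is a coverage statistic. As a rough illustration only (not the ENCODE pipeline), the sketch below computes the fraction of a toy chromosome lying within a window of any binding site. The function name `fraction_within` and the toy coordinates are hypothetical, and binding sites are assumed to be single 0-based positions.

```python
def fraction_within(sites, chrom_length, window):
    """Fraction of bases in [0, chrom_length) within `window` bp of any site."""
    # Expand each site into a half-open interval, clipped to the chromosome.
    intervals = sorted(
        (max(0, s - window), min(chrom_length, s + window + 1)) for s in sites
    )
    covered = 0
    cur_start, cur_end = None, None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: count it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / chrom_length


# Toy data (assumed): a 100 kb chromosome, three sites, an 8 kb window.
toy_sites = [10_000, 20_000, 90_000]
frac = fraction_within(toy_sites, 100_000, 8_000)
print(f"{frac:.2%} of the toy chromosome lies within 8 kb of a site")
```

Merging overlapping windows before summing avoids counting the same base twice when sites lie close together.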

Techniques used:

- RNA-seq
- ChIP-seq
- DNase-seq
- DNA methylation arrays and bisulfite sequencing
- FAIRE-seq

Cell types:

- Tier 1: three cell lines (K562, GM12878, H1 hESC)
- Tier 2: cell line panel (HeLa-S3, HepG2, HUVEC)
- Tier 3: all other cell types

Total: 1640 datasets / 147 different cell types

Expression – chromatin state

Expression – transcription factors

Chromatin state patterns at transcription-factor binding sites

Co-association between transcription factors (K562)

Insight into genomic variation – allele-specific variation

Overlap SNPs with regulatory elements

Overlap SNPs with regulatory elements and ‘open’ chromatin
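
Overlapping SNPs with regulatory elements comes down to an interval lookup. The sketch below is a minimal version, assuming a single chromosome, 0-based half-open element coordinates and made-up toy data. The helper names `merge_intervals` and `snps_in_elements` are hypothetical and not part of any ENCODE tool.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def snps_in_elements(snp_positions, elements):
    """Return the SNP positions that fall inside any element."""
    merged = merge_intervals(elements)
    starts = [start for start, _ in merged]
    hits = []
    for pos in snp_positions:
        # Index of the last merged element starting at or before the SNP.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            hits.append(pos)
    return hits


# Toy data (assumed): three elements and six SNP positions on one chromosome.
toy_elements = [(100, 200), (150, 300), (1000, 1100)]
toy_snps = [50, 120, 299, 300, 1050, 2000]
hits = snps_in_elements(toy_snps, toy_elements)
print(f"{len(hits)} of {len(toy_snps)} toy SNPs overlap an element: {hits}")
```

For real genome-wide data the same lookup would be run per chromosome. An enrichment claim would additionally need a comparison against matched background SNPs, which this sketch does not attempt.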

Other important papers to us

Accessible chromatin landscape

- DNase I treatment
- Combined analysis with TFs and H3K4me3
- Identification of “accessible” chromatin regions

Accessible chromatin landscape – location of accessible regions

Accessible chromatin landscape – association with ChIP-seq and TFs

Accessible chromatin landscape – novel transcripts

Landscape of transcription

RNA-seq

Get a grip on what is transcribed, including novel transcripts and RNAs

Landscape of transcription – nucleolar fraction vs. whole cell

Long-range interaction of promoters

5C mapping (chromatin interaction mapping technology)

Long-range interactions of promoter regions

Transcriptional regulation

ChIP-seq versus expression detection

Predict transcriptional regulation
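
As a toy illustration of predicting expression from binding data, the sketch below fits a one-variable least-squares line relating a ChIP-seq signal at promoters to log expression. This is deliberately much simpler than the models in the ENCODE papers; the function name `fit_line` and all numbers are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept, plus R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R squared equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Toy data (assumed): binding signal at four promoters and log expression.
binding_signal = [1.0, 2.0, 3.0, 4.0]
log_expression = [2.1, 3.9, 6.2, 7.8]

slope, intercept, r_squared = fit_line(binding_signal, log_expression)
predicted = slope * 5.0 + intercept  # prediction for a new promoter with signal 5.0
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.3f}")
```

R squared is the share of expression variance the predictor accounts for, the kind of quantity behind the statement that promoter functionality can explain most of the variation in RNA expression.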

Transcriptional regulation – predict transcription

Transcriptional regulation – expression prediction

Transcriptional regulation – TFs predict location of histone modifications

Transcriptional regulation – model

Cell-type-specific gene expression from open chromatin regions

Cell-type specific TF binding

SNPs in regulatory regions

TF binding – interactions

TF binding – cell-type specificity

Classification of genomic regions

ENCODE data

Data availability

All data is available, from raw data to final processed data

For end users:
- Tracks in the UCSC browser with the desired level of detail: visualize tracks and explore genomic context

For end users and bio-IT:
- In the UCSC “Table browser” and other UCSC tools: export genomic information, including processed data

For advanced users and bio-IT:
- Raw data and semi-processed data in GEO and others
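
Processed regions exported from the Table browser are often saved as BED: tab-separated text with chromosome, 0-based start and exclusive end in the first three columns. A minimal sketch of reading such an export follows, assuming a plain BED file. The `read_bed` helper and the toy track are hypothetical; a real export would be read from a downloaded file rather than an in-memory string.

```python
import io
from collections import namedtuple

BedRecord = namedtuple("BedRecord", "chrom start end name")


def read_bed(handle):
    """Parse BED lines, skipping blank lines and track/browser/comment headers."""
    records = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # The name column (4th) is optional in BED.
        name = fields[3] if len(fields) > 3 else ""
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]), name))
    return records


# Toy export (assumed content), standing in for a file saved from the browser.
toy_export = io.StringIO(
    'track name="toy_peaks"\n'
    "chr1\t1000\t1500\tpeak1\n"
    "chr1\t3000\t3200\tpeak2\n"
    "chr2\t500\t900\tpeak3\n"
)
peaks = read_bed(toy_export)
total_bp = sum(r.end - r.start for r in peaks)
print(f"{len(peaks)} regions covering {total_bp} bp")
```

Because BED ends are exclusive, the length of a region is simply `end - start`.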

Tracks in the UCSC browser with desired level of detail

Tracks in the UCSC table browser

Raw data
