Computational analysis of the ENCODE datasets and other related epigenetic explorations
Ved Topkar
Harvard College class of 2016
Gunawardena Lab, Harvard Medical School
Department of Systems Biology
13 August 2013
Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 1 / 1
Introduction
Presentation Goals
FULL understanding of the material discussed
Ask questions along the way!
Outline
1. Molecular biology in a jiffy
2. A case study
   - Hypothesis formulation
   - Analyzing data
3. More examples
Molecular Biology Essentials
The Cell
The Central Dogma
Transcriptional Regulation
Transcriptional Access
Background
Epigenetics and Gene Expression
Features beyond the DNA base-pair sequence itself affect gene expression
The Question
Analyze the ENCODE dataset
ENCODE (Overview)
Overview:
National Human Genome Research Institute: Encyclopedia of DNA Elements (ENCODE)
A post-Human Genome Project (HGP) effort of nearly 600 collaborating labs
The Data Set
Data Types
Raw signals
Peak-calling outputs derived from the raw signals (e.g. PeakSeq results)
Relatively coarse-grained peak data
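Peak calls such as these are commonly distributed in BED-derived formats (for example ENCODE's narrowPeak). A minimal sketch of reading them, assuming tab-separated records whose first three columns are chromosome, 0-based start, and exclusive end; `Peak` and `parse_peaks` are hypothetical names, and all other columns are ignored:

```python
from typing import Iterable, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def parse_peaks(lines: Iterable[str]) -> list[Peak]:
    """Parse BED-style peak lines, keeping only chrom/start/end.

    Assumes the BED convention for the first three tab-separated columns.
    Blank lines and 'track', 'browser', or '#' header lines are skipped.
    """
    peaks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        peaks.append(Peak(fields[0], int(fields[1]), int(fields[2])))
    return peaks
```

In practice the lines would come from an opened (possibly gzip-decompressed) peak file.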
A simple example: TFBS (transcription factor binding sites)
The Game Plan
Can we reduce transcription factor binding landscapes to a small set of categories?
Scan across the genome for promoters
Bin promoters appropriately
Score TF binding at each promoter
Perform clustering analysis
RefSeq
Overview:
Curated database of genes
New versions released about as frequently as Firefox
Includes pseudogenes, haplotype variations, and predicted genes
Defining a promoter?
Only upstream of the transcription start site (TSS)?
Very distant regulatory regions?
Intronic regulation?
Regulatory elements downstream of transcription termination?
Chosen definition: 1000 bp upstream of the TSS
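A sketch of the 1000 bp definition, assuming transcript coordinates in the 0-based, half-open convention of UCSC's refGene table (txStart, txEnd). On the minus strand the TSS sits at the right-hand end of the transcript, so "upstream" means larger coordinates. `promoter_interval` is a hypothetical helper, and excluding the TSS base itself is a choice made here, not something the slides specify:

```python
def promoter_interval(chrom, strand, tx_start, tx_end, upstream=1000):
    """Return (chrom, start, end) covering `upstream` bp immediately upstream of the TSS.

    Coordinates are 0-based, half-open. On '+' the TSS is tx_start; on '-'
    the TSS is tx_end - 1, so the upstream region starts at tx_end.
    Starts are clipped at 0; clipping at the chromosome end is not handled.
    """
    if strand == "+":
        start, end = tx_start - upstream, tx_start
    elif strand == "-":
        start, end = tx_end, tx_end + upstream
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return chrom, max(start, 0), end
```

For example, a minus-strand transcript spanning [5000, 9000) gets the promoter [9000, 10000).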
Binning
How do we quantify TF binding at each promoter?
Break promoter regions into bins for a finer-grained metric?
Weight bins as a function of their position?
Chosen approach: a single, unweighted 1000 bp bin of binding-peak counts
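With one unweighted 1000 bp bin, scoring reduces to counting the peaks that overlap each promoter. A minimal sketch, assuming peaks and promoters are (chrom, start, end) tuples in half-open coordinates; `count_overlaps` and `binding_matrix` are hypothetical names, and TF labels are placeholders:

```python
def count_overlaps(promoter, peaks):
    """Count peaks overlapping one promoter (a single, unweighted bin).

    Half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
    """
    chrom, p_start, p_end = promoter
    return sum(1 for c, s, e in peaks if c == chrom and s < p_end and p_start < e)


def binding_matrix(promoters, peaks_by_tf):
    """Build a promoter x TF matrix of overlap counts.

    peaks_by_tf maps a TF label to its list of (chrom, start, end) peaks.
    Returns the sorted TF labels (column order) and one row per promoter.
    """
    tfs = sorted(peaks_by_tf)
    rows = [[count_overlaps(p, peaks_by_tf[tf]) for tf in tfs] for p in promoters]
    return tfs, rows
```

Each row is then one promoter's binding profile across TFs, one natural input for the clustering step.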
A simple example: Survey of Promoters
Forward vs. Backward Strand
TF Binding Frequency
Histogram of promoter binding
Promoter/TFBS intersections
Computational Efficiency
This was an exercise in program optimization
The original algorithm took about 5 days; the optimized, parallelized algorithm took just a few hours
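The slides do not say how the speedup was achieved. One standard optimization for this kind of overlap counting, offered as an assumption rather than the author's method, replaces the scan over every (promoter, peak) pair with per-chromosome sorted arrays and binary search via Python's `bisect` module: after an O(T log T) index build, each promoter costs O(log T). Function names are hypothetical; the naive version is kept as a reference to check against:

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def count_overlaps_naive(promoter, peaks):
    """Reference version: scan every peak for every promoter (O(P * T) overall)."""
    chrom, start, end = promoter
    return sum(1 for c, s, e in peaks if c == chrom and s < end and start < e)


def build_index(peaks):
    """Sort peak starts and peak ends separately, per chromosome (done once)."""
    starts, ends = defaultdict(list), defaultdict(list)
    for chrom, s, e in peaks:
        starts[chrom].append(s)
        ends[chrom].append(e)
    for chrom in starts:
        starts[chrom].sort()
        ends[chrom].sort()
    return dict(starts), dict(ends)


def count_overlaps_fast(promoter, index):
    """Count peaks [s, e) overlapping promoter [start, end) with two binary searches.

    Overlap means s < end and e > start. Any peak with e <= start also has
    s < e <= start < end, so overlaps = #{s < end} - #{e <= start}.
    Assumes non-empty intervals (s < e and start < end).
    """
    chrom, start, end = promoter
    starts, ends = index
    if chrom not in starts:
        return 0
    return bisect_left(starts[chrom], end) - bisect_right(ends[chrom], start)
```

Parallelizing over chromosomes or TFs would be a separate, complementary step, since each index and its queries are independent.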
Clustering
Unsupervised grouping of data such that elements within a group are similar to one another and dissimilar from elements in other groups
A simple example: Clustering analysis
Clustering
Minkowski distance (p = 2 gives the Euclidean distance):
$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
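The formula translates directly into code. A minimal sketch in plain Python (`minkowski` is a hypothetical name); requiring p ≥ 1 keeps it a true metric:

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance (sum_i |x_i - y_i|^p)^(1/p).

    p = 2 gives the Euclidean distance, p = 1 the Manhattan distance.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```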
Clustering
Single linkage clustering:
$$D(X, Y) = \min_{x \in X,\, y \in Y} d(x, y)$$
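A minimal sketch of the single-linkage cluster distance, assuming points are coordinate tuples and using Euclidean distance (`math.dist`, Python 3.8+) as the default d; `single_linkage` is a hypothetical name:

```python
import math
from itertools import product


def single_linkage(X, Y, d=math.dist):
    """D(X, Y) = min of d(x, y) over all x in X, y in Y (the nearest pair)."""
    return min(d(x, y) for x, y in product(X, Y))
```

Because only the nearest pair counts, `single_linkage([(0, 0), (1, 0), (2, 0)], [(3, 0), (4, 0), (5, 0)])` returns 1.0 even though the farthest pair is 5 apart; this is what lets single linkage chain long runs of points together.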
Clustering (single linkage exhibiting chaining)
Clustering
Complete linkage clustering:
$$D(X, Y) = \max_{x \in X,\, y \in Y} d(x, y)$$
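A naive agglomerative clustering loop that accepts either linkage, sketched under the assumption of small inputs: it recomputes every cluster-pair distance on each merge, so it scales roughly cubically or worse. `complete_linkage` and `agglomerate` are hypothetical names; libraries such as SciPy's `scipy.cluster.hierarchy.linkage` provide efficient implementations of both methods:

```python
import math
from itertools import combinations, product


def complete_linkage(X, Y, d=math.dist):
    """D(X, Y) = max of d(x, y) over all x in X, y in Y (the farthest pair)."""
    return max(d(x, y) for x, y in product(X, Y))


def agglomerate(points, k, linkage=complete_linkage):
    """Start from singleton clusters and repeatedly merge the closest pair
    (under `linkage`) until k clusters remain. For illustration only."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # combinations yields i < j, so deleting index j after merging is safe
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

Passing a single-linkage function instead of `complete_linkage` switches the method without changing the loop.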
Clustering (complete linkage)
Computational Efficiency
Pairwise distance calculations require a LOT of RAM: the number of pairwise distances grows quadratically with the number of promoters
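Back-of-the-envelope arithmetic shows why. Storing only the upper triangle (the condensed form returned by, for example, SciPy's `pdist`) needs n(n − 1)/2 entries; a full square matrix needs n². The sketch assumes 8-byte (64-bit) floats, `distance_matrix_bytes` is a hypothetical name, and n = 30,000 below is a hypothetical promoter count, not a figure from this analysis:

```python
def distance_matrix_bytes(n, bytes_per_entry=8, condensed=True):
    """Memory needed to store all pairwise distances among n items.

    Condensed (upper triangle, no diagonal): n * (n - 1) / 2 entries.
    Full square matrix: n * n entries. 8 bytes assumes 64-bit floats.
    """
    entries = n * (n - 1) // 2 if condensed else n * n
    return entries * bytes_per_entry
```

For n = 30,000 this gives about 3.6 GB condensed and 7.2 GB as a full square matrix, before any working copies the clustering code makes.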
2-way clustering heatmap
2-way clustering heatmap (with random data arrays)
A simple example: Roadmap Dataset Analysis
Histone Modifications (Roadmap Dataset)
A quick correlation test
R = 0.042
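The slide does not state which coefficient R is or which two variables were compared. Assuming it is Pearson's correlation, a value of 0.042 indicates little to no linear association. A minimal sketch of the computation (`pearson_r` is a hypothetical name; Python 3.10+ also ships `statistics.correlation`):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```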
Conclusions
There are numerous methods of promoter binning that prove useful for complexity reduction
Unsupervised clustering of preprocessed promoter data yields results with biological significance
Next Steps
Incorporating nucleosome enrichment, methylation, and histone modification data in a more meaningful way
Further refining the cluster analysis pipeline to uncover new biology
A simple example: End
Thanks!
Jeremy Gunawardena and the HMS Department of Systems Biology!
My collaborator Tobias Ahsendorf!
PRISE and Greg Llacer!
The lovely PRISE staff!
This audience!