Page 1: Computational analysis of the ENCODE datasetsvcp.med.harvard.edu/papers/poster-ved-topkar-PRISE.pdf · Computational analysis of the ENCODE datasets and other related epigenetic explorations

Computational analysis of the ENCODE datasets and other related epigenetic explorations

Ved Topkar

Harvard College class of 2016

Gunawardena Lab

Harvard Medical School

Department of Systems Biology

13 August 2013

Ved Topkar (Harvard College) Epigenetic analysis at promoters 13 August 2013 1 / 1

Introduction

Presentation Goals

FULL understanding of discussed material

Ask questions along the way!

Introduction

Outline

1. Molecular biology in a jiffy

2. A case study

   Hypothesis formulation

   Analyzing data

3. More examples

Molecular Biology Essentials

The Cell

Molecular Biology Essentials

The Central Dogma

Molecular Biology Essentials

Transcriptional Regulation

Molecular Biology Essentials

Transcriptional Access

Background

Epigenetics and Gene Expression

More than just the base pairs in DNA matters for gene expression

Background

The Question

Analyze the ENCODE dataset

Background

ENCODE (Overview)

Overview

National Human Genome Research Institute (NHGRI): Encyclopedia of DNA Elements (ENCODE)

Nearly 600 collaborating labs, following the Human Genome Project (HGP)

Background

The Data Set

Background

Data Types

Raw signals

Peak-calling outputs from the raw signals (e.g. PeakSeq results)

Relatively coarse-grained peak data

A simple example TFBS

The Game Plan

Can we reduce transcription factor binding landscapes into categories?

Scan across genome, looking for promoters

Bin promoters appropriately

Score binding at each promoter

Clustering analysis

A simple example TFBS

RefSeq

Overview

Curated database of genes

New versions released as frequently as Firefox

Includes pseudogenes, haplotype variations, and predicted genes

A simple example TFBS

Defining a promoter?

Only upstream of the transcription start site (TSS)?

Very distant regulatory regions?

Intronic regulation?

Post-termination regulatory elements?

Working definition: 1000 bp upstream of the TSS
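
One way to make a 1000 bp upstream definition concrete is sketched below. It is an illustration rather than the code behind these slides: the function name `promoter_window`, the 0-based half-open coordinate convention, and the clamping at the chromosome start are all assumptions. What it shows is that "upstream" depends on the strand, so the window lies at lower coordinates for a forward-strand gene and at higher coordinates for a reverse-strand gene.

```python
def promoter_window(tss, strand, upstream=1000):
    """Return the half-open (start, end) window upstream of a TSS.

    Assumes 0-based coordinates, with `tss` the index of the first
    transcribed base. The window excludes the TSS base itself.
    """
    if strand == "+":
        # Upstream of a forward-strand gene means lower coordinates.
        return (max(0, tss - upstream), tss)
    if strand == "-":
        # Upstream of a reverse-strand gene means higher coordinates.
        return (tss + 1, tss + 1 + upstream)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


# Hypothetical TSS positions, chosen only to show the strand handling.
print(promoter_window(5000, "+"))
print(promoter_window(5000, "-"))
```

The clamp with `max(0, ...)` simply keeps the window on the chromosome when a gene starts within 1000 bp of its beginning.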

A simple example TFBS

Binning

How do we quantitatively analyze TF presence at promoters?

Break promoter regions into bins for a finer-grained metric?

Weight bins as a function of their position?

Working choice: a single, unweighted 1000 bp bin of counts
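
The single-bin option can be sketched as follows. This is a minimal illustration, not the pipeline used for the talk: the names `count_overlaps` and `score_promoter`, the half-open `(start, end)` interval convention, and the toy peak coordinates are assumptions. Each promoter becomes one vector holding one unweighted peak count per transcription factor, which is the kind of row a later clustering step could consume.

```python
def count_overlaps(window, peaks):
    """Count peaks sharing at least one base with `window`.

    Intervals are half-open (start, end) pairs on the same chromosome.
    """
    w_start, w_end = window
    return sum(1 for p_start, p_end in peaks if p_start < w_end and p_end > w_start)


def score_promoter(window, peaks_by_tf):
    """Return one unweighted count per TF, in sorted TF-name order."""
    return [count_overlaps(window, peaks_by_tf[tf]) for tf in sorted(peaks_by_tf)]


# Toy data (hypothetical coordinates, not taken from ENCODE).
peaks_by_tf = {
    "TF_A": [(4100, 4300), (4950, 5050), (9000, 9200)],
    "TF_B": [(3000, 3500)],
}
promoter = (4000, 5000)  # a single 1000 bp bin
print(score_promoter(promoter, peaks_by_tf))
```

Position-dependent weights or several smaller bins would replace the plain count with a weighted sum or a longer vector; the single bin keeps every promoter at one number per TF.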

A simple example Survey of Promoters

Forward vs. Backward Strand

A simple example Survey of Promoters

TF Binding Frequency

A simple example Survey of Promoters

Histogram of promoter binding

A simple example Survey of Promoters

Promoter/TFBS intersections

A simple example Survey of Promoters

Computational Efficiency

This was an exercise in program optimization

Original algorithm took about 5 days; the optimized, parallelized algorithm took just a few hours
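
The slides do not say which optimizations were applied. As one illustration of the kind of change that can produce a speed-up of this order, the sketch below replaces a scan over every peak for every promoter with two binary searches over pre-sorted peak coordinates. The function names and the half-open interval convention are assumptions, and this is a generic technique rather than the author's algorithm.

```python
from bisect import bisect_left, bisect_right


def build_index(peaks):
    """Pre-sort peak starts and ends once, in O(n log n)."""
    starts = sorted(start for start, _ in peaks)
    ends = sorted(end for _, end in peaks)
    return starts, ends


def count_overlaps_fast(window, index):
    """Count peaks overlapping a half-open window in O(log n).

    A peak overlaps (w_start, w_end) unless it starts at or after w_end
    or ends at or before w_start. For non-empty intervals those two
    groups are disjoint, so:
    overlaps = (#peaks with start < w_end) - (#peaks with end <= w_start)
    """
    starts, ends = index
    w_start, w_end = window
    return bisect_left(starts, w_end) - bisect_right(ends, w_start)


# Toy peaks (hypothetical coordinates).
peaks = [(4100, 4300), (4950, 5050), (9000, 9200), (3000, 3500)]
index = build_index(peaks)
print(count_overlaps_fast((4000, 5000), index))
```

Because each promoter is scored independently of the others, the per-promoter calls could also be split across worker processes, which is one plausible route to the parallelization mentioned on the slide.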

A simple example Survey of Promoters

Clustering

Unsupervised grouping of data such that elements within a group are similar to each other and dissimilar from elements in other groups

A simple example Clustering analysis

Clustering

Minkowski Distance (p = 2 → Euclidean Distance)

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
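
A direct transcription of the Minkowski formula into code may help. The function name `minkowski` and the example vectors are illustrative assumptions; the sketch shows how p = 1 and p = 2 give the Manhattan and Euclidean distances from the same expression.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


x, y = (0, 0), (3, 4)
print(minkowski(x, y, p=1))  # Manhattan distance
print(minkowski(x, y, p=2))  # Euclidean distance
```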

A simple example Clustering analysis

Clustering

Single Linkage Clustering

$$D(X, Y) = \min_{x \in X,\; y \in Y} d(x, y)$$
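
The minimum-distance rule above, usually called single linkage, can be written in a few lines. The function names and the toy 2-D points are assumptions made for illustration.

```python
def single_linkage(X, Y, d):
    """Distance between clusters X and Y: the closest cross-cluster pair."""
    return min(d(x, y) for x in X for y in Y)


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


# Toy clusters (hypothetical points).
X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(2.0, 0.0), (9.0, 0.0)]
print(single_linkage(X, Y, euclidean))
```

Only the closest pair counts, so the far-away point `(9.0, 0.0)` has no influence on the result. That is why a single bridging point can be enough to merge two otherwise distant groups.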

A simple example Clustering analysis

Clustering (single linkage exhibiting chaining)

A simple example Clustering analysis

Clustering

Complete Linkage Clustering

$$D(X, Y) = \max_{x \in X,\; y \in Y} d(x, y)$$
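
As a sketch of how the maximum-distance criterion drives agglomerative clustering, the code below repeatedly merges the two clusters with the smallest complete-linkage distance until `k` clusters remain. The function names, the 1-D toy data, and the stopping rule are assumptions, and the brute-force loop is for illustration only; it is not the implementation used for the figures.

```python
def complete_linkage(X, Y, d):
    """Distance between clusters X and Y: the farthest cross-cluster pair."""
    return max(d(x, y) for x in X for y in Y)


def agglomerate(points, k, d):
    """Merge the closest pair of clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: complete_linkage(clusters[ab[0]], clusters[ab[1]], d),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def abs_diff(x, y):
    """Distance between two numbers on a line."""
    return abs(x - y)


# Toy 1-D data (hypothetical values) forming two well-separated groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
print(agglomerate(points, 2, abs_diff))
```

Because the farthest pair sets the cluster distance, a merge is only cheap when every member of one cluster is close to every member of the other, which tends to produce compact groups.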

A simple example Clustering analysis

Clustering (complete linkage)

A simple example Clustering analysis

Computational Efficiency

Pairwise distance calculations require a LOT of RAM
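
A back-of-the-envelope estimate shows why. For n promoters there are n(n - 1)/2 distinct pairs, so the memory needed grows quadratically with n. The promoter counts below are hypothetical round numbers, and the 8 bytes per distance assumes 64-bit floats; neither figure comes from the slides.

```python
def condensed_distance_bytes(n, bytes_per_value=8):
    """Bytes needed to store every distinct pairwise distance once."""
    return n * (n - 1) // 2 * bytes_per_value


for n in (1_000, 10_000, 50_000):  # hypothetical promoter counts
    gib = condensed_distance_bytes(n) / 2**30
    print(f"{n:>6} promoters -> {gib:8.3f} GiB")
```

Storing the full square matrix instead of one value per pair roughly doubles these figures.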

A simple example Clustering analysis

2-way clustering heatmap

A simple example Clustering analysis

2-way clustering heatmap (with random data arrays)

A simple example Roadmap Dataset Analysis

Histone Modifications (Roadmap Dataset)

A simple example Roadmap Dataset Analysis

A quick correlation test

R = 0.042
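
The slide does not state which quantities were compared or which coefficient was used. Assuming R denotes the Pearson correlation coefficient, a value of 0.042 would indicate almost no linear relationship. The sketch below computes Pearson's r from its definition; the function name and the toy vectors are illustrative and are not the Roadmap data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy vectors (made up for illustration).
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear
print(pearson_r([1, 2, 3, 4], [3, 1, 4, 2]))  # no linear relationship
```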

A simple example Roadmap Dataset Analysis

Conclusions

Numerous methods of promoter binning prove useful for complexity reduction

Unsupervised clustering of preprocessed promoter data yields results with biological significance

A simple example Roadmap Dataset Analysis

Next Steps

Incorporation of nucleosome enrichment, methylation, and histone modification data in a more meaningful way

Further refinement of the cluster analysis pipeline to uncover new biology

A simple example End

Thanks!

Jeremy Gunawardena and the HMS Department of Systems Biology!

My collaborator Tobias Ahsendorf!

PRISE and Greg Llacer!

The lovely PRISE staff!

This audience!
