NGS Sequence Analysis for Regulation and Epigenomics Timothy Bailey Winter School in Mathematical and Computational Biology July 2, 2013


Page 1: NGS Sequence Analysis for Regulation and Epigenomics - Bioinformaticsbioinformatics.org.au/wp-content/uploads/ws13/sites/3/... · 2013-07-25 · NGS Sequence Analysis for Regulation

NGS Sequence Analysis for Regulation and Epigenomics

Timothy Bailey

Winter School in Mathematical and Computational Biology

July 2, 2013

Page 2:

NGS Analysis and Transcriptional Regulation

•  RNA-seq
   – Measuring transcription levels (gene expression)
   – Detecting RNA regulators (e.g., miRNA)

•  ChIP-seq
   – Chromatin modifications
   – Binding of transcription factor proteins

Page 3:

Talk Overview

I.  Transcriptional Regulation 101
II.  ChIP-seq 101
III.  Analyzing ChIP-seq data
IV.  Combining ChIP-seq and RNA-seq

Page 4:

Part I: Basic Transcriptional Regulation

Source: Steven Chu

Page 5:

Transcription Factors

•  Mammalian transcription is controlled (in part) by about 1400 DNA-binding transcription factor (TF) proteins.

•  These proteins control transcription in two main ways:
   – Directly, by promoting (or preventing) the assembly of the pre-initiation complex.
   – Indirectly, by modifying chromatin.

Page 6:

BASAL TRANSCRIPTION:

• The pre-initiation complex assembles at the core promoter.

• This results in only low levels of transcription because the interaction is unstable.

[Figure: DNA with the core promoter (TATA, INR); label: +]

Page 7:

PROXIMAL PROMOTER:

• The proximal promoter extends upstream of the promoter.

• It contains binding sites for repressor and activator transcription factors.

[Figure: DNA with the proximal promoter and the core promoter (TATA, INR)]

Page 8:

ACTIVATORS:

• Some transcription factors (“activators”) stabilize the transcriptional machinery when they bind to sites in the proximal promoter.

• This increases transcription.

[Figure: DNA with the proximal promoter and the core promoter (TATA, INR); label: + +]

Page 9:

REPRESSORS:

• Some factors do not stabilize the transcriptional machinery.

• Their binding can block binding by co-factors and activators.

• This reduces transcription.

[Figure: DNA with the proximal promoter and the core promoter (TATA, INR); labels: +++ and + +]

Page 10:

ENHANCER REGIONS:

• Groups of binding sites located upstream or downstream of a promoter.

• Often very distant: 1000s of base pairs.

[Figure: DNA with an enhancer region (1-100 Kb), the proximal promoter and the core promoter (TATA, INR); label: + +]

Page 11:

ENHANCER REGIONS:

• Activator and repressor transcription factors compete to occupy enhancer regions.

• DNA looping brings factors into contact with transcriptional machinery.

• Bound activators increase transcription.

[Figure: DNA with an enhancer region, the proximal promoter and the core promoter (TATA, INR), drawn without and with looping; labels: + and +++]

Page 12:

Chromatin modification by TFs:

• Example: Histone Acetyltransferases (HATs) acetylate histones.

• Tissue-specific transcription factors can bind to HATs, causing chromatin to open.

• This can increase transcription.

[Figure: DNA with an enhancer region, the proximal promoter and the core promoter (TATA, INR); factors labeled “Specific”, “General” and “HAT”; labels: +++ and + +]

Page 13:

Part II: ChIP-seq Overview

Source: Steven Chu

Page 14:

ChIP-seq

•  Chromatin ImmunoPrecipitation followed by high-throughput sequencing.

•  TF binding sites (“punctate peaks”)

•  Chromatin mods (“broad peaks”)

Page 15:

Steps in ChIP-seq

•  Cross-link proteins to DNA

•  Fragment chromatin

•  Immunoprecipitate with antibody to protein

•  Size-select and ligate

•  Amplify

•  Sequence

[Figure: ChIP-seq workflow; label: Cross-link]

Page 16:

What can I learn from ChIP-seq?

•  What chromatin regions are marked as active promoters or enhancers?

•  Where is my TF bound?

•  What is its DNA-binding motif?

•  What genes might it regulate?

Page 17:

Part III: Analyzing ChIP-seq Data

Source: Steven Chu

Page 18:

Analyzing TF ChIP-seq Data

•  Key messages of this talk:
   – Use controls!
   – Validate your data at each step.
   – But this is Science! What could possibly go wrong…?

Page 19:

Things that can go wrong in ChIP-seq…

1.  Low affinity antibody
2.  Non-specific antibody
3.  Contamination
4.  Poor choice of peak calling algorithm (or parameters)
… etc.

Page 20:

Steps in ChIP-seq Data Analysis

1.  Mapping: where do the sequence “tags” map to the genome?

2.  Peak Calling: where are the regions of significant tag concentration?

3.  Motif Discovery: what is the binding motif?

4.  Location Analysis: where are the peaks w/respect to genes, promoters, introns etc?

Page 21:

1) Mapping ChIP-seq Tags

•  Tags: ChIP-seq produces a pool of “tags” (~100bp)

•  Tag Count: measure of enrichment of region

•  Negative Control: “input DNA” tag count

Tallack et al., Genome Res., 2010

Page 22:

Do the mapped tags make sense?

•  Each ~100 bp tag is the 5’ end of a DNA fragment.

•  But DNA is double-stranded so there are tags from both strands.

•  We expect pairs of clusters of tags on opposite strands, separated by the fragment length.

Wilbanks and Facciotti, PLoS One, 2010

Page 23:

Strand Cross Correlation Analysis (SCCA)

•  If we shift the anti-sense tags left by the (average) fragment length, we should see maximum correlation between the reads on the two strands.

Kharchenko et al., Nature Biotechnology, 2009
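The shift-and-correlate idea on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool: it assumes per-base counts of tag 5’ ends for each strand are already available as plain lists (`plus`, `minus`, both hypothetical names), and it uses a plain Pearson correlation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def strand_cross_correlation(plus, minus, max_shift):
    """Correlate plus-strand counts with minus-strand counts shifted left by 0..max_shift."""
    cc = {}
    for shift in range(max_shift + 1):
        # Position i on the plus strand is compared with position i + shift on the minus strand.
        x = plus[:len(plus) - shift]
        y = minus[shift:]
        cc[shift] = pearson(x, y)
    return cc

# Toy data (assumption): three binding events, fragment length 30 bp.
plus = [0] * 200
minus = [0] * 200
for site in (40, 95, 160):
    plus[site] += 5        # 5' ends of plus-strand tags
    minus[site + 30] += 5  # 5' ends of minus-strand tags, one fragment length away

cc = strand_cross_correlation(plus, minus, max_shift=60)
best_shift = max(cc, key=cc.get)
print(best_shift)
```

On real data the profile is noisy, so the shift with the highest correlation is only an estimate of the average fragment length.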

Page 24:

SCCA often shows two maxima

•  Fragment-length peak at average fragment length (as we expected)

•  Read-length peak at average read length (due to variable and dispersed mappability of genomic positions)

[Figure: cross-correlation plot with the read-length peak and the fragment-length peak labeled]

Landt S G et al. Genome Res. 2012;22:1813-1831

Page 25:

Quality control 1: SCCA identifies failed ChIP-seq

Landt S G et al. Genome Res. 2012;22:1813-1831

ENCODE Guidelines:

•  Normalized Strand Correlation, NSC > 1.05

•  Relative Strand Correlation, RSC > 0.8

•  https://code.google.com/p/phantompeakqualtools
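A small sketch of how the two scores can be derived from a cross-correlation profile. The slide gives only the thresholds; the formulas below follow the definitions as I understand them from Landt et al. (NSC: fragment-length correlation over the minimum correlation; RSC: fragment-length correlation minus minimum, over read-length correlation minus minimum), so treat them as an assumption. The profile values are made up.

```python
def nsc_rsc(cc, fragment_shift, read_length_shift):
    """Compute (NSC, RSC) from a dict mapping shift -> cross-correlation."""
    background = min(cc.values())        # minimum correlation over all shifts
    frag_peak = cc[fragment_shift]       # fragment-length peak
    read_peak = cc[read_length_shift]    # read-length peak
    nsc = frag_peak / background
    rsc = (frag_peak - background) / (read_peak - background)
    return nsc, rsc

# Hypothetical profile: read length 36 bp, fragment length 150 bp.
profile = {0: 0.10, 36: 0.12, 150: 0.15, 400: 0.10}
nsc, rsc = nsc_rsc(profile, fragment_shift=150, read_length_shift=36)
passes = nsc > 1.05 and rsc > 0.8        # thresholds from the slide
print(round(nsc, 2), round(rsc, 2), passes)
```

The sketch does not guard against a zero minimum or a read-length peak equal to the minimum; real profiles would need those checks.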

Page 26:

2) ChIP-seq Peak Calling

•  Peak callers combine overlapping tags to get the “peak height”.

•  Often, strand information and shifting is used to combine tags on opposite strands.

•  Fold-enrichment (tag count / control tag count) is usually used as the criterion for declaring a peak.

Wilbanks and Facciotti, PLoS One, 2010
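The fold-enrichment criterion can be written out as below. The ratio itself is from the slide; the library-size scaling and the pseudocount are assumptions added here so that the example behaves sensibly when the control has no tags, and individual peak callers differ in how they handle both.

```python
def fold_enrichment(chip_tags, control_tags, chip_total, control_total, pseudocount=1.0):
    """Tag count / control tag count for one region.

    chip_total and control_total are the library sizes; the control count is
    rescaled to the ChIP library size before taking the ratio (assumption).
    """
    scale = chip_total / control_total
    return (chip_tags + pseudocount) / (control_tags * scale + pseudocount)

# Hypothetical region: 99 ChIP tags vs 9 input tags, equal library sizes.
fe = fold_enrichment(chip_tags=99, control_tags=9,
                     chip_total=1_000_000, control_total=1_000_000)
print(fe)
```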

Page 27:

Some ChIP-seq peak callers use SCCA

Bailey et al., PLoS Comp Bio, in press.

[Figure: two of the peak callers shown are marked “Uses SCCA”]

Page 28:

Sanity checks: Are your peaks reasonable?

•  Width: TF ChIP-seq peaks should be relatively short (< 300bp) compared to histone modification peaks.
   –  Are your peaks too wide?

•  Number: Is the number of TF ChIP-seq peaks reasonable?
   –  Some key TFs bind ~30,000 sites but your TF probably binds far fewer (~1000?)

•  Location: Do your peaks co-occur with histone marks and genes that your TF regulates?
   –  Examine some peaks using the UCSC genome browser and ENCODE histone tracks

Page 29:

Quality control 2: Fraction of Reads in Peaks (FRiP)

•  Only a fraction of reads typically fall within ChIP-seq peaks.

•  ENCODE guideline: FRiP > 1%

•  Caveat: A lower FRiP threshold may be appropriate if there are very few peaks.

Landt S G et al. Genome Res. 2012;22:1813-1831
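FRiP is a simple ratio, and a minimal sketch makes the definition concrete. The data layout is an assumption: each read is reduced to a single `(chrom, position)` pair and each peak is a half-open `(chrom, start, end)` interval. The linear scan is fine for a toy example but far too slow for real read counts.

```python
def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak interval."""
    in_peaks = sum(
        1
        for chrom, pos in read_positions
        if any(c == chrom and start <= pos < end for c, start, end in peaks)
    )
    return in_peaks / len(read_positions)

# Hypothetical data: four reads, one peak.
reads = [("chr1", 150), ("chr1", 900), ("chr2", 150), ("chr1", 5000)]
peaks = [("chr1", 100, 400)]
score = frip(reads, peaks)
print(score, score > 0.01)   # compare with the 1% guideline on the slide
```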

Page 30:

How many of my peaks are “real”?

•  Irreproducible Discovery Rate (IDR) compares the ranks of peaks from two biological replicates.
   – Rank peaks by significance (p-value or q-value)
   – Reproducible discoveries (peaks) should have similar ranks between replicates.

•  ENCODE: reports peaks at 1% IDR

•  https://sites.google.com/site/anshulkundaje/projects/idr
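The ranking step can be illustrated with a toy comparison of two replicates. This is not the IDR procedure itself, which fits a statistical model to the paired ranks; the sketch only ranks shared peaks in each replicate and summarises agreement with a Spearman correlation. It assumes peaks have already been matched between replicates under a common ID, that scores are -log10 p-values (higher is more significant), and that there are no ties.

```python
def ranks(scores):
    """Map each peak ID to its rank (1 = most significant)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {peak: r for r, peak in enumerate(order, start=1)}

def rank_pairs(rep1, rep2):
    """Ranks of the peaks found in both replicates."""
    shared = sorted(set(rep1) & set(rep2))
    r1 = ranks({p: rep1[p] for p in shared})
    r2 = ranks({p: rep2[p] for p in shared})
    return [(p, r1[p], r2[p]) for p in shared]

def spearman(pairs):
    """Spearman rank correlation (valid when there are no tied ranks)."""
    n = len(pairs)
    d2 = sum((a - b) ** 2 for _, a, b in pairs)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical -log10(p) scores for four matched peaks.
rep1 = {"peakA": 50, "peakB": 30, "peakC": 10, "peakD": 5}
rep2 = {"peakA": 45, "peakB": 35, "peakC": 4, "peakD": 8}
pairs = rank_pairs(rep1, rep2)
rho = spearman(pairs)
print(pairs, rho)
```

The strong peaks keep their ranks while the weak ones swap, which is the pattern IDR exploits to separate reproducible from irreproducible peaks.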

Page 31:

Quality control 3: IDR identifies failed ChIP-seq

Landt S G et al. Genome Res. 2012;22:1813-1831

[Figure: two panels labeled “High Reproducibility” and “Low Reproducibility”]

Page 32:

3) Motif Discovery & Enrichment Analysis

•  If your TF binds DNA directly (and sequence-specifically), Motif Discovery should find its binding motif.

•  The DNA-binding motif of your TF should be centrally enriched in the peaks, and Central Motif Enrichment Analysis (CMEA) should find it.

Page 33:

Caveats in ChIP-seq Motif Analysis

•  Peak regions may contain other TF motifs due to looping.

•  The binding of the ChIP-ed factor “X” may be indirect.

•  ChIP-ed motif might be weak due to assisted binding.

Farnham, Nature Reviews Genetics, 2009

Page 34:

TF Binding Motif Discovery

•  ChIP-seq provides extremely rich data for inferring the DNA-binding affinity of the ChIP-ed transcription factor.

•  In principle, discovering the motif is simple.

•  ChIP-seq peaks tend to be within +/- 50bp of the bound factor.

•  So we just examine the peak regions for enriched patterns.
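As a rough illustration of “examining the peak regions for enriched patterns”, the sketch below counts k-mers in peak sequences and compares them with a background set. It is a toy, not what MEME or DREME do: it ignores reverse complements, uses exact words rather than motifs, assumes equally sized foreground and background sets, and the sequences are invented.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Number of sequences that contain each k-mer at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts

def enriched_kmers(peak_seqs, background_seqs, k, pseudocount=1.0):
    """k-mers from the peaks, sorted by (peak count / background count)."""
    fg = kmer_counts(peak_seqs, k)
    bg = kmer_counts(background_seqs, k)
    scores = {kmer: (fg[kmer] + pseudocount) / (bg[kmer] + pseudocount) for kmer in fg}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical short peak regions that all contain TGGCA, plus background.
peak_seqs = ["AATGGCAT", "CCTGGCAA", "GTTGGCAC"]
background_seqs = ["AATCGTAT", "CCAGTCAA", "GTACGGAC"]
top_kmer, top_score = enriched_kmers(peak_seqs, background_seqs, k=5)[0]
print(top_kmer, top_score)
```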

Page 35:

MEME Suite tools for ChIP-seq motif discovery and enrichment

•  The MEME Suite (http://meme.nbcr.net) contains several motif discovery and enrichment algorithms appropriate for ChIP-seq data analysis.

–  Discovery & Enrichment: MEME-ChIP

–  Discovery: MEME, DREME, GLAM2

–  Enrichment: CentriMo, AME

Page 36:

Example: Motif discovery in NFIC ChIP-seq data

•  Pjanic et al. predicted 39,807 ChIP-seq peaks in NFIC ChIP-seq data.

•  They do not report using motif discovery on these peaks.

•  We used MEME-ChIP, which runs both MEME and DREME, to perform motif discovery on the 100-bp NFIC ChIP-seq peak regions.

Machanick & Bailey, Bioinformatics, 2011

Page 37:

Motif discovery fails in the (original) NFIC dataset

•  An NFIC motif is known from in vitro data, based on only 16 sites.

•  MEME and DREME fail to find this motif in the NFIC data.

•  But so do the other algorithms we tried: Amadeus, peak-motifs, Trawler and Weeder.

Page 38:

The problem: poor peak calling!

•  We applied a different ChIP-seq peak calling algorithm (ChIP-peak) which predicts only 700 peaks (rather than 40,000).

•  MEME discovers the NFI-family binding motif in this new set of peaks.

Page 39:

Central Motif Enrichment Analysis: CentriMo

•  CentriMo searches for known motifs whose sites are most centrally enriched in the ChIP-seq regions.

•  Use 500bp regions centered on each ChIP-seq peak.

[Figure: 500-bp ChIP-seq regions with a central window; W = 120, L = 500; S = number of “successes” = 4, T = number of “trials” = 5]

[Figure: “site-probability” curve. x-axis: position of best site in sequence (-250 to 250); y-axis: probability (0 to 0.0045). Motif logo: MA0119.1. Legend:
ATGGCG   p=9.9e-081, w=89, n=3495
CRSAGC   p=9.2e-040, w=207, n=15406
CHGSAGC  p=2.3e-038, w=138, n=11787
MTGCGCA  p=1.4e-033, w=250, n=2184
CDCCKCC  p=2.6e-028, w=266, n=11544]

Bailey et al, NAR, 2012
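The successes/trials picture on this slide suggests a binomial test, and the sketch below works through the slide's numbers (S = 4, T = 5, W = 120, L = 500). It is a simplification made for illustration: the chance that a best site lands in the central window is taken as W / L, ignoring motif width, and no correction is made for trying several window widths. CentriMo's actual computation is described in Bailey et al., NAR, 2012.

```python
from math import comb

def binomial_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

def central_enrichment_pvalue(successes, trials, window_width, region_length):
    """p-value for seeing this many best sites inside the central window.

    Null hypothesis (assumption): best sites are uniformly distributed, so
    each one falls in the window with probability window_width / region_length.
    """
    p_central = window_width / region_length
    return binomial_tail(successes, trials, p_central)

# Numbers from the slide's schematic.
p_value = central_enrichment_pvalue(successes=4, trials=5,
                                    window_width=120, region_length=500)
print(round(p_value, 4))
```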

Page 40:

Central Motif Enrichment confirms the known NFIC motif, even in the original peaks

•  NFIC motif is most centrally enriched of 862 JASPAR and UniPROBE motifs (p = 10^-31).

•  However, standard motif enrichment algorithms do not show the NFIC motif as the most enriched motif.

[Figure: site-probability curves. x-axis: position of best site in sequence (-250 to 250); y-axis: probability (0 to 0.003). Motif logo: MA0119.1 (NFIC). Legend:
MA0119.1  p=2.4e-031, w=295, n=5409
MA0244.1  p=4.6e-015, w=381, n=39398
MA0161.1  p=7.3e-015, w=329, n=39356
MA0099.1  p=5.5e-014, w=343, n=34267
MA0406.1  p=8.1e-012, w=323, n=31383]

Page 41:

Quality control 4: CMEA identifies failed ChIP-seq

[Figure, panel 1: Successful KLF1 ChIP-seq (Tallack et al., Genome Res, 2010). Site-probability curves; x-axis: position of best site in sequence (-250 to 250); y-axis: probability. Motif logo: MA0039.2 (KLF4). Legend:
MA0039.2      p=4.4e-066, w=111, n=712
Klf7_primary  p=6.9e-056, w=103, n=676
MA0140.1      p=1.5e-048, w=177, n=693
MA0035.2      p=2.4e-040, w=194, n=756]

[Figure, panel 2: Failed KLF1 ChIP-seq (Pilon et al., Blood, 2011). Site-probability curve; x-axis: position of best site in sequence (-250 to 250); y-axis: probability. Motif logo: MA0039.2 (KLF4). Legend:
MA0039.2  p=7.2e-001, w=365, n=11404 (p = 0.7)]

Page 42:

New motif databases

•  In vitro motifs are especially useful for verifying that your ChIP-seq worked.

•  They are independent of the motifs found by motif discovery in your ChIP-seq data.
   – UniPROBE: 386 mouse TF motifs from protein-binding microarrays.
   – Jolma et al., Cell, 2013: 738 human and mouse TF motifs from SELEX

Page 43:

4) Location Analysis

•  Counts how often TF binding sites are in, say, promoters, intergenic or intragenic regions.

Farnham, Nature Reviews Genetics, 2009

Page 44:

Example: Predicting Target Genes

•  TF binding sites in promoters probably are regulatory.

•  “Nearest TSS” rule is often used to assign binding sites to target genes.

•  But distal sites may regulate some other gene via chromatin looping.

Farnham, Nature Reviews Genetics, 2009
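The “nearest TSS” rule is easy to state in code. The sketch below handles one chromosome, ignores strand, and uses invented gene names and coordinates; as the slide warns, the gene it returns is only a putative target.

```python
from bisect import bisect_left

def nearest_tss(peak_pos, tss_list):
    """Return (gene, signed distance) for the TSS closest to peak_pos.

    tss_list must be a list of (position, gene) tuples sorted by position,
    all on the same chromosome as the peak.
    """
    positions = [pos for pos, _ in tss_list]
    i = bisect_left(positions, peak_pos)
    # The nearest TSS is either the one just before or the one just after the peak.
    candidates = tss_list[max(i - 1, 0):i + 1]
    pos, gene = min(candidates, key=lambda t: abs(t[0] - peak_pos))
    return gene, peak_pos - pos

# Hypothetical annotation and peak summits.
tss_list = [(1000, "geneA"), (5000, "geneB"), (20000, "geneC")]
proximal = nearest_tss(4800, tss_list)    # 200 bp before the geneB TSS
distal = nearest_tss(12000, tss_list)     # 7 kb from geneB, 8 kb from geneC
print(proximal, distal)
```

Collecting the distances over all peaks gives the kind of histogram used to separate promoter-proximal from distal binding.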

Page 45:

Klf1 binding near TSSs

•  Histogram of distances from Klf1 ChIP-seq peak to the nearest TSS.

•  KLF1 has a population of binding sites in promoters (small hump on left), but most are distal.

Tallack et al, Genome Res, 2010

Page 46:

Motif Spacing Analysis finds co-factor motifs and TF complexes

Page 47:

Part IV: Combining ChIP-seq and RNA-seq

Source: Steven Chu

Page 48:

Identification of KLF1 target genes using RNA-seq

•  Input: 3 x Klf1-/- libraries and 3 x Klf1+/+ libraries

•  Analysis: CuffDiff with RefSeq.gtf (gene definition set)

•  Result: 690 KLF1 “Activated” genes and 118 KLF1 “Repressed” genes, at Bonferroni corrected p-val < 0.05 and > 1.5 fold change (KO vs WT)

[Figure: bar chart of mRNA-seq FPKM (0 to 1000) for E2f2 and E2f4, marked **, with qRT-PCR validation]

Tallack et al, Genome Res, 2012
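The thresholds on this slide (Bonferroni-corrected p < 0.05 and more than 1.5-fold change) can be applied as in the sketch below. It is not CuffDiff and does no statistical testing of its own: it assumes a table of per-gene mean FPKM in wild type and knockout plus a raw p-value is already available, that all FPKM values are positive, and that “Activated” means expression drops in the knockout. Gene names and numbers are invented.

```python
def classify_genes(results, alpha=0.05, min_fold=1.5):
    """Split genes into (activated, repressed) lists.

    results: list of (gene, wt_fpkm, ko_fpkm, raw_p_value) tuples.
    """
    n_tests = len(results)
    activated, repressed = [], []
    for gene, wt, ko, p in results:
        p_adjusted = min(p * n_tests, 1.0)   # Bonferroni correction
        if p_adjusted >= alpha:
            continue
        if wt / ko > min_fold:
            activated.append(gene)           # lower in knockout: activated by the TF
        elif ko / wt > min_fold:
            repressed.append(gene)           # higher in knockout: repressed by the TF
    return activated, repressed

# Hypothetical differential-expression table.
results = [
    ("geneA", 100.0, 20.0, 1e-6),   # 5-fold down in KO
    ("geneB", 10.0, 40.0, 1e-5),    # 4-fold up in KO
    ("geneC", 50.0, 45.0, 1e-6),    # significant but small change
    ("geneD", 80.0, 10.0, 0.02),    # large change, fails after correction
]
activated, repressed = classify_genes(results)
print(activated, repressed)
```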

Page 49:

The KLF1 Transcriptome

Tallack et al, Genome Res, 2012

Page 50:

KLF1 is a (direct) Activator

The distance from KLF1 ChIP-seq peaks to the nearest TSS (putative target gene) is less for “Activated” genes than for “Repressed” genes.

Tallack et al, Genome Res, 2012

Page 51:

Final reminders

•  Check your data at each step!
   – Read mapping
      •  Strand Cross Correlation Analysis (SCCA)
   – Peak calling
      •  Fraction of Reads in Peaks (FRiP)
      •  Irreproducible Discovery Rate (IDR) analysis
   – Motif discovery / enrichment analysis
      •  De novo motif found?
      •  In vitro motif centrally enriched?

Page 52:

   

Acknowledgements

The MEME Suite
•  Tom Whitington
•  Philip Machanick
•  James Johnson
•  Martin Frith
•  William Noble
•  Charles Grant
•  Shobhit Gupta

KLF Project
•  Michael Tallack
•  Tom Whitington
•  Andrew Perkins
•  Sean Grimmond
•  Brooke Gardiner
•  Ehsan Nourbakhsh
•  Nicole Cloonan
•  Elanor Wainwright
•  Janelle Keys
•  Wai Shan Yuen