April 2008
Jianhong Ou, Jun Yu, Haibo Liu and Lihua Julie Zhu
Integrated Analysis of ChIP-seq/ChIP-chip Using ChIPpeakAnno, GeneNetworkBuilder and trackViewer
Bioconductor Annual Meeting, Boston, July 28th 2017
Outline
• Introduction to the ChIP-seq and ChIP-chip analysis workflow
• ChIPpeakAnno
• GeneNetworkBuilder
• trackViewer
• Demo
HIGH-THROUGHPUT IDENTIFICATION OF DNA BINDING SITES
• ChIP-seq: ChIP followed by high-throughput sequencing
• ChIP-chip: ChIP followed by genome tiling array analysis
Cross-link with formaldehyde
Fragment DNA
Add specific antibody to immunoprecipitate
Reverse cross-link, purify and amplify DNA
High-throughput sequencing (ChIP-seq)
Hybridize to DNA microarray after DNA labeling (ChIP-chip)
FASTQ sequence file
@HWI-ST570:42:D0PJJACXX:4:1101:1436:2323 1:N:0:
CATGGATCGGAAGAGGGAANTCATCTTTGGCCCGGTGTTTCGTCCTTTCC
+
CCCFFFFFHHHHHHHIJGG#2AEGHIGJIJJJJJJ?FFHJGIGHIIIJIJ
Adapted from: Zhu LJ. Integrative analysis of ChIP-chip and ChIP-seq dataset. Methods Mol Biol. 2013; 1067:105-24.
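The FASTQ record shown on this slide can be read with a few lines of Python. The sketch below is illustrative only: it assumes the record splits into the usual four lines (header, sequence, "+" separator, quality) with an empty index field after "1:N:0:", and that qualities use the Phred+33 encoding. The names parse_fastq and phred_scores are hypothetical helpers, not part of any package named in the talk.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    header: str    # header line without the leading "@"
    sequence: str  # base calls
    quality: str   # one quality character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (no wrapped sequences)."""
    stripped = (line.rstrip("\r\n") for line in lines)
    while True:
        header = next(stripped, None)
        if header is None:
            return
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(stripped, None)
        separator = next(stripped, None)
        quality = next(stripped, None)
        if sequence is None or separator is None or quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert quality characters to integer scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


# The record from the slide, split into four lines (an assumption).
slide_record = [
    "@HWI-ST570:42:D0PJJACXX:4:1101:1436:2323 1:N:0:",
    "CATGGATCGGAAGAGGGAANTCATCTTTGGCCCGGTGTTTCGTCCTTTCC",
    "+",
    "CCCFFFFFHHHHHHHIJGG#2AEGHIGJIJJJJJJ?FFHJGIGHIIIJIJ",
]
records = list(parse_fastq(slide_record))
scores = phred_scores(records[0].quality)
print(records[0].header, len(records[0].sequence), min(scores))
```

Under these assumptions, the uncalled base "N" lines up with the lowest quality character "#", which is what one would expect from a base the sequencer could not call.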
ANALYSIS WORKFLOW
Adapted from: Zhu LJ. Integrative analysis of ChIP-chip and ChIP-seq dataset. Methods Mol Biol. 2013; 1067:105-24.
CHIPPEAKANNO
• Batch annotate enriched peaks from:
  • ChIP-seq
  • ChIP-chip
  • PAS-seq (Poly(A) Site Sequencing)
  • Cap Analysis of Gene Expression (CAGE)
  • Any experiment resulting in a large number of enriched genomic regions
FUNCTIONALITY
• Find the nearest gene for each peak in a set and plot the distribution of peaks around features.
• Find all genes within a given distance of the peaks.
• Identify enriched Gene Ontology (GO) terms and pathways among the genes adjacent to the peaks.
• Label peaks with any annotation of interest:
  • a dataset from the literature
  • CpG islands
  • conserved elements
  • histone modification marks
• Determine the significance of overlap, and draw Venn diagrams to visualize its extent, for:
  • binding sites among replicates
  • binding sites among transcription factors within a complex
  • binding sites among different experiments, such as yours and those in the literature
• Retrieve genomic sequences flanking putative binding sites for motif discovery, cloning or PCR amplification.
• Find peaks with bi-directional promoters, with summary statistics.
• Summarize motif occurrence in peaks.
• Assess reproducibility with the Irreproducible Discovery Rate (IDR).
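To make "find the nearest gene for each peak" concrete, here is a minimal Python sketch of the idea. It is not ChIPpeakAnno's implementation: the Gene type, the helper names and the toy coordinates are all hypothetical, and the distance convention (peak center to TSS, negative meaning upstream in the gene's own orientation) is an assumption made for illustration.

```python
from bisect import bisect_left
from typing import Dict, List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def tss(gene: Gene) -> int:
    """Transcription start site: start on '+', end on '-'."""
    return gene.start if gene.strand == "+" else gene.end


def build_tss_index(genes: List[Gene]) -> Dict[str, List[Tuple[int, Gene]]]:
    """Group genes by chromosome and sort them by TSS position."""
    index: Dict[str, List[Tuple[int, Gene]]] = {}
    for gene in genes:
        index.setdefault(gene.chrom, []).append((tss(gene), gene))
    for entries in index.values():
        entries.sort(key=lambda entry: entry[0])
    return index


def nearest_tss(index, chrom: str, peak_start: int,
                peak_end: int) -> Optional[Tuple[str, int]]:
    """Return (gene name, signed distance from peak center to TSS)."""
    entries = index.get(chrom)
    if not entries:
        return None
    center = (peak_start + peak_end) // 2
    positions = [position for position, _ in entries]
    i = bisect_left(positions, center)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = entries[max(i - 1, 0):i + 1]
    position, gene = min(candidates, key=lambda e: abs(e[0] - center))
    distance = center - position
    if gene.strand == "-":
        distance = -distance  # negative = upstream in gene orientation
    return gene.name, distance


# Toy annotation and peaks (made-up coordinates).
toy_genes = [
    Gene("geneA", "chrI", 1000, 2000, "+"),
    Gene("geneB", "chrI", 5000, 6000, "-"),
]
toy_index = build_tss_index(toy_genes)
print(nearest_tss(toy_index, "chrI", 800, 900))
print(nearest_tss(toy_index, "chrI", 6100, 6300))
```

Collecting the signed distances for all peaks gives the values one would plot as a histogram of distance to the nearest TSS.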
DAF-12 EXAMPLE DATASET
• ChIP-chip peaks were downloaded from GEO at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28350 (Hochbaum, Zhang et al. 2011, PLoS Genet 7(7): e1002179)
• Expression microarray results were taken from Fisher and Lithgow 2006, Aging Cell 5(2): 127-138.
OVERLAP ANALYSIS AND DISTRIBUTION OF PEAKS AROUND TSS
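[Figure: Venn diagram of peak overlap among Replicate1, Replicate2 and Replicate3. Counts shown in the figure: 156, 932, 12, 323, 4, 148, 1, 1424]
[Figure: histogram of peak distance to the nearest TSS. x-axis: Distance To Nearest TSS (-10000 to 10000); y-axis: Frequency (0 to 25)]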
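One common way to ask whether two replicates overlap more than expected by chance is a hypergeometric test. The sketch below illustrates that idea in plain Python under stated assumptions: peaks are (chromosome, start, end) tuples with made-up coordinates, `total` is an assumed number of candidate binding regions in the genome, and the function names are hypothetical. It is not the procedure used to produce the figures on this slide.

```python
from math import comb
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), inclusive


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks share a chromosome and at least one position."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_overlapping(peaks_a: List[Peak], peaks_b: List[Peak]) -> int:
    """Number of peaks in peaks_a that overlap any peak in peaks_b."""
    return sum(any(overlaps(a, b) for b in peaks_b) for a in peaks_a)


def hypergeom_upper_tail(k: int, n_a: int, n_b: int, total: int) -> float:
    """P(X >= k) when n_b regions are drawn from `total`, n_a of them marked."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, i) * comb(total - n_a, n_b - i) for i in range(k, upper + 1)
    )
    return favourable / comb(total, n_b)


# Toy replicates (made-up coordinates).
replicate1 = [("chrI", 100, 200), ("chrI", 500, 600), ("chrII", 50, 150)]
replicate2 = [("chrI", 150, 250), ("chrII", 400, 500), ("chrII", 100, 120)]

shared = count_overlapping(replicate1, replicate2)
# total=1000 is an assumed count of candidate regions, chosen for illustration.
p_value = hypergeom_upper_tail(shared, len(replicate1), len(replicate2), 1000)
print(shared, p_value)
```

The result depends strongly on the assumed `total`, so that number should be chosen and reported deliberately rather than left at a default.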
DISTRIBUTION OF DAF-12-BINDING SITES
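[Figure: binding sites by position relative to the nearest feature. Categories: downstream, includeFeature, inside, overlapEnd, overlapStart, upstream]
[Figure: bar chart of % binding sites (y-axis, 0 to 70) by chromosome region (x-axis): Exon, Intron, 5' UTR, 3' UTR, Proximal Promoter, Immediate Downstream, Enhancer]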
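The position categories in the first chart (upstream, downstream, inside, includeFeature, overlapStart, overlapEnd) can be illustrated with a small classifier. The definitions below are an assumed reading of those labels for a single peak and a single feature with inclusive coordinates; the function name and the toy coordinates are hypothetical, and the exact rules used by the package may differ.

```python
from collections import Counter


def classify_peak(peak_start: int, peak_end: int,
                  feat_start: int, feat_end: int, strand: str = "+") -> str:
    """Label a peak by its position relative to one feature.

    Labels follow the feature's orientation, so for a '-' strand
    feature the genomic left side is "downstream".
    """
    plus = strand == "+"
    if peak_end < feat_start:      # entirely left of the feature
        return "upstream" if plus else "downstream"
    if peak_start > feat_end:      # entirely right of the feature
        return "downstream" if plus else "upstream"
    if peak_start >= feat_start and peak_end <= feat_end:
        return "inside"            # peak lies within the feature
    if peak_start <= feat_start and peak_end >= feat_end:
        return "includeFeature"    # peak spans the whole feature
    if peak_start < feat_start:    # partial overlap on the left edge
        return "overlapStart" if plus else "overlapEnd"
    return "overlapEnd" if plus else "overlapStart"


# Toy peaks against one '+' strand feature at 100..200 (made-up numbers).
toy_peaks = [(50, 80), (90, 120), (120, 150), (90, 250), (180, 250), (210, 250)]
labels = [classify_peak(start, end, 100, 200) for start, end in toy_peaks]
percentages = {
    label: 100.0 * count / len(labels)
    for label, count in Counter(labels).items()
}
print(labels)
print(percentages)
```

Tallying the labels over all peaks, as in the last lines, gives the per-category percentages that a chart of this kind displays.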
TOP MOTIFS
Each motif (m1 to m5) is shown as a forward and a reverse-complement (RC) logo, followed by three matching known motifs.

Motif  Match        +/-  Value
m1     EWSR1-FLI1   -    2.2641e-03
m1     IRF1         -    2.33e-02
m1     SPIB         -    2.3803e-02
m2     NR2F1        +    1.6075e-02
m2     SOX10        +    2.4823e-02
m2     HNF4A        -    2.7161e-02
m3     RREB1        -    5.2626e-03
m3     Egr1         +    1.1045e-02
m3     Foxd3        +    1.9366e-02
m4     SPIB         -    2.0655e-02
m4     Tal1::Gata1  -    6.646e-02
m4     SOX10        +    6.9122e-02
m5     Pax4         -    4.2833e-03
m5     Sox5         -    1.6952e-02
m5     NKX3-1       -    2.2411e-02
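Because motifs are reported in both forward and reverse-complement (RC) orientation, counting motif occurrences in peak sequences usually means scanning both strands. The following Python sketch shows one simple way to do that with an IUPAC consensus string. The consensus and sequence used here are made up for illustration, the function names are hypothetical, and real motif scanning normally uses position weight matrices rather than exact consensus matching.

```python
import re
from typing import Tuple

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA or IUPAC consensus string."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def count_sites(consensus: str, sequence: str) -> int:
    """Count (possibly overlapping) matches of a consensus on one strand."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead makes overlapping matches count separately.
    return len(re.findall("(?=" + pattern + ")", sequence.upper()))


def count_both_strands(consensus: str, sequence: str) -> Tuple[int, int]:
    """Return (forward hits, reverse-complement hits).

    Palindromic motifs are counted on both strands, so the two numbers
    should not simply be added for such motifs.
    """
    forward = count_sites(consensus, sequence)
    reverse = count_sites(reverse_complement(consensus), sequence)
    return forward, reverse


# Made-up consensus and peak sequence, for illustration only.
print(count_both_strands("GATA", "GATATATC"))
print(count_both_strands("RGTTCA", "AAGGTTCATTTGAACTAA"))
```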
DAF-12 REGULATORY NETWORK
[Figure: DAF-12 regulatory network. Node labels:]
C02F5.5, mir-46, C33D9.9, C01B10.4, mir-52, mir-53, C17A2.4, mir-228, F01D5.1, srd-16, mbf-1, mir-74, mir-229, Y37D8A.5, lsy-2, mir-75, C50F4.9, mir-73, hpd-1, F32A5.4, C08E8.3, K10C2.3, Y69E1A.5, mir-76, htas-1, Y59E9AL.3, dod-24, R10H10.3, sss-1, mir-240, K09H11.7, pos-1, mir-43, F55G11.7, T01D3.6, cdh-5, T05E12.6, mir-788, F57G8.7, C33F10.1, F01D5.3, K03H1.12, scl-27, mir-2, mir-242, bre-1, mir-63, nhr-86, hpo-28, Y106G6A.4, F35H8.4, msp-51, clec-66, mir-72, F44A2.5, ZK1248.17, msp-76, mir-239.1, mir-87, B0507.10, T11B7.5, pho-1, clec-52, fli-1, T05C3.6, nspd-10, Y54G11A.14, F21C10.11, Y49G5A.1, F36D1.4, K10D3.6, C27F2.7, nspd-1, ceh-45, meg-2, col-172, zip-2, tbc-7, mir-230, wago-9, col-19, mir-1, clec-4, grd-5, gst-11, dyc-1, nhr-20, T22B7.7, F35E12.5, mir-243, mir-44, mig-5, mir-80, nhr-42, hlh-30, mir-40, C24B9.3, ilys-2, mir-58, Y44A6D.1, Y75B8A.23, gst-27, cyp-14A5, col-20, C53A5.2, mir-34, far-3, msp-38, Y67A6A.1, W10C8.5, nhr-10, atf-6, W01B6.4, ces-2, mir-237, lin-4, nhr-1, daf-12, grd-10, C12D5.4, F09F7.4, mir-84, sre-13, lnp-1, msp-56, mir-261, pho-12, T01B11.2, C36C9.1, mir-42, msp-142, let-7, mir-795, mir-37, ugt-6, F11G11.4, ssq-2, msp-63, C27D6.3, mir-51, clec-76, msp-50, fut-1, mir-39, Y40C5A.1, mir-355, pdi-2, mir-38, daf-3, mir-41, lin-13, mir-241, mir-35, mir-36, scd-1, C32H11.4, K12H4.7
REFERENCES
• Zhu LJ, Gazin C, Lawson ND, Pagès H, Lin SM, Lapointe DS, Green MR. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics. 2010 May 11; 11:237. PMID: 20459804.
• Zhu LJ. Integrative analysis of ChIP-chip and ChIP-seq dataset. Methods Mol Biol. 2013; 1067:105-24. PMID: 23975789.