Python meetup 2014
Unveiling Epigenetic Regulation with
Next Generation Sequencing (NGS) and Python
Ying Liu
Weill Cornell Medical College
The New York Python Meetup, 05-29-2014
About Me
• PhD candidate, Weill Cornell Medical College
• Major Area: Stem cell epigenomics, Computational Biology
• Graduation: Fall 2014
• Email: [email protected]
• LinkedIn: https://www.linkedin.com/pub/ying-liu/b/669/605
Generate Pluripotent Stem Cells from Mature Adult Cells
2012 Nobel Prize in Physiology or Medicine
[Diagram: adult cells → express pluripotent stem cell specific genes (4 genes) → reprogram for > 20 days (thousands of genes change expression) → induced pluripotent stem (iPS) cells]
Limitation
• Reprogramming efficiency: 0.01 - 0.1%
• Molecular mechanism is not fully understood
Human Genome Project
• Human genome: ~ 3 billion DNA base pairs
• First sequence draft: 2001
• Complete sequence: 2003
Nature 454, 711-715
Epigenetic Regulation
• Epigenetics: the study of heritable changes in gene activity that are NOT caused by changes in the DNA sequence
• One of the major epigenetic regulators: histone proteins
[Figure labels: DNA, histone proteins, gene expression]
My project: Histone X, which is enriched at actively expressed genes
Project
Determine the function of histone X in initiating the reprogramming of adult cells to iPS cells.
Experiment
• Collect cells at the beginning of reprogramming (Day 0, 3, 6, 10) and after reprogramming (iPS);
• Map genome-wide histone X localization with Next Generation Sequencing (NGS);
• Analyze the dynamic changes in genome-wide histone X localization with Python programs and frameworks.
Genome-wide Analysis of Epigenetic Regulation
• Next generation DNA sequencing (Illumina, Inc.)
• Align DNA sequences to chromosomes
• Display in genome browser (by gene)
• Computational analysis (by genome). Tools: Python, R, etc.
Histone X Enriches Near Stem Cell Specific Genes At the Beginning of Cell Reprogramming
[Figure: tracks along a chromosome (scale bar: 10 kb) for histone X and K27me3 at Day 0, 3, 6, and 10, near the Pou5f1 and Nanog genes]
Genome browser (IGV)
Alignment output (BED format)
chr1 3000062 3000113 HWI-1KL117_0134:6:2101:14893:19331#ACAGTG/A..GTG. 37 +
chr1 3000113 3000164 HWI-1KL117_0134:6:2302:6790:10626#ACAGTG/A..GT.. 37 +
chr1 3000146 3000197 HWI-1KL117_0134:6:2303:8145:108924#ACAGTG/A..GT.. 37 -
chr1 3000154 3000205 HWI-1KL117_0134:6:2202:14995:109690#ACAGTG/A..GT.. 37 -
chr1 3000241 3000292 HWI-1KL117_0134:6:1304:12589:77263#ACAGTG/A..GT.. 25 -
chr1 3000311 3000362 HWI-1KL117_0134:6:1101:17212:111473#ACAGTG/A..GT.. 37 -
chr1 3000334 3000385 HWI-1KL117_0134:6:2308:10385:78074#ACAGTG/A..GT.. 25 -
chr1 3000385 3000436 HWI-1KL117_0134:6:2102:20734:102615#ACAGTG/A..GG.. 37 +
chr1 3000498 3000549 HWI-1KL117_0134:6:1203:3146:72739#ACAGTG/A..GTG. 37 -
chr1 3000538 3000589 HWI-1KL117_0134:6:1101:1921:57017#ACAGTG/A..GT.. 37 +
Columns: Chrom, Start, End, Read name, Score, Strand
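The BED records above can be read with a few lines of plain Python. The following is a minimal sketch, not code from the talk: the names `BedRead`, `parse_bed_line`, and `count_reads_per_bin`, and the 100 bp bin size, are illustrative assumptions. It splits each record on whitespace (BED files are normally tab-separated) and tallies read starts per genomic bin, a simple first look at where reads pile up.

```python
from collections import Counter, namedtuple

# Field names follow the standard six-column BED layout. The type and
# function names here are hypothetical, chosen only for this sketch.
BedRead = namedtuple("BedRead", "chrom start end name score strand")


def parse_bed_line(line):
    """Split one six-column BED record into typed fields."""
    chrom, start, end, name, score, strand = line.split()[:6]
    return BedRead(chrom, int(start), int(end), name, int(score), strand)


def count_reads_per_bin(reads, bin_size=100):
    """Count reads per (chrom, bin index), binning by read start position."""
    counts = Counter()
    for read in reads:
        counts[(read.chrom, read.start // bin_size)] += 1
    return counts


# Three records copied from the alignment output shown on the slide.
example_lines = [
    "chr1 3000062 3000113 HWI-1KL117_0134:6:2101:14893:19331#ACAGTG/A..GTG. 37 +",
    "chr1 3000113 3000164 HWI-1KL117_0134:6:2302:6790:10626#ACAGTG/A..GT.. 37 +",
    "chr1 3000146 3000197 HWI-1KL117_0134:6:2303:8145:108924#ACAGTG/A..GT.. 37 -",
]

reads = [parse_bed_line(line) for line in example_lines]
bin_counts = count_reads_per_bin(reads)

for (chrom, bin_index), n in sorted(bin_counts.items()):
    print(chrom, bin_index * 100, n)
```

For a real data set with tens of millions of reads, the same parsing would be applied line by line while streaming the file rather than building a list in memory.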
Computational Pipeline for Genome-wide DNA Sequence Analysis
Bardet AF, Stark A, Nature Protocols, 2012
Alignment
• BWA
• Picard
• Samtools
Analysis (Python, Perl)
• MACS, Cistrome (X. Shirley Liu Lab)
• ChIPseeqer (Olivier Elemento Lab)
Peak Identification with Python Program:
Model-based Analysis of ChIP-Seq (MACS)
Zhang Y, Liu XS, et al. Genome Biology 2008
Feng J, Liu XS, et al. Nature Protocols 2012
Requirement: ~3 GB of RAM and 1.5 h per data set with 30 million DNA sequence reads.
d: estimated DNA fragment size
[Figure: forward-strand (5' to 3') and reverse-strand (3' to 5') reads, offset by d]
• Read distribution: Poisson distribution
• Use dynamic λ_local to capture local biases in the genome:
  λ_local = max(λ_BG, [λ_region, λ_1k], λ_5k, λ_10k)
  λ_BG: constant estimated from the genome background
  λ_region: estimated from the candidate region
  λ_1k, λ_5k, λ_10k: estimated from 1 kb, 5 kb, and 10 kb local windows in the control
• p-value: default threshold is 10^-5
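The Poisson test described above can be illustrated in a few lines. This is a rough sketch under stated assumptions, not the MACS source code: the function names are hypothetical, the rate estimates and read count are made-up numbers, and every supplied window estimate is simply included in the maximum (the slide does not say how the bracketed terms are treated). It picks the largest rate as λ_local, then computes the upper-tail Poisson probability of seeing at least the observed number of reads.

```python
import math


def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), evaluated in log space for stability."""
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # The tail is large here, so 1 - lower sum loses no real precision.
        return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))
    # For k > lam the terms shrink monotonically; sum until negligible.
    total, i, term = 0.0, k, poisson_pmf(k, lam)
    while term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return total


def local_lambda(lam_bg, window_lambdas):
    """Largest of the background rate and the local window estimates."""
    return max(lam_bg, *window_lambdas)


# Hypothetical rate estimates (expected reads in the candidate region).
lam_bg = 2.0
window_lambdas = {"1k": 3.5, "5k": 5.0, "10k": 4.0}
lam_local = local_lambda(lam_bg, window_lambdas.values())

observed_reads = 20  # hypothetical read count in the candidate region
p_value = poisson_upper_tail(observed_reads, lam_local)
is_peak = p_value < 1e-5  # default threshold quoted on the slide

print(lam_local, p_value, is_peak)
```

Taking the maximum makes the test conservative: a region that sits in a locally read-rich part of the control needs more reads before it is called a peak.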
Galaxy / Cistrome
MACS integrated web-based application: http://cistrome.org/ap/
ChIPseeqer
• Graphical User Interface
• Command-line
http://physiology.med.cornell.edu/faculty/elemento/lab/chipseq.shtml
Histone X Enriches At Stem Cell Specific Gene Promoters Prior to Gene Expression Activation
[Figure: heatmaps of Expression and Histone X (scale: L to H) at Day 0, 3, 6, 10, iPS, and E, with genes grouped as "Expression Change" or "Expression Stable"]
Genes shown: Pou5f1, Sox2, Cdh1, Cldn3, Jag2, Zbtb32, Elf3, Msh6, Lefty1, Piwil2, Notch4, Tjp3, Fbxo15, Cldn6, Foxh1, Zp3, Fgf15, Nodal, Tdgf1, Gdf3, Nanog, Fgf4, Dppa3
Gene Ontology Analysis
[Figure: heatmap of Gene Ontology term enrichment (scale: 5 = enrichment, -5 = depletion) across columns X0 to X6, labeled by Expression (Active, Stable) and Group (-, a, b, c, a, b, c)]
GO terms shown:
• Embryonic placenta development, GO:0001892
• Stem cell maintenance, GO:0019827
• Response to nutrient, GO:0007584
• Cell-cell signaling, GO:0007267
• DNA metabolic process, GO:0006259
• DNA recombination, GO:0006310
• Formation of primary germ layer, GO:0001704
• Chromosome organization, GO:0051276
• Mesoderm development, GO:0007498
• Cell fate commitment, GO:0045165
• Stem cell differentiation, GO:0048863
• Blastocyst formation, GO:0001825
• Meiosis, GO:0007126
• Sexual reproduction, GO:0019953
• Thyroid hormone metabolic process, GO:0042403
• Cellular response to abiotic stimulus, GO:0071214
Groups:
a. Histone X enriched during Day 0 to 10
b. Histone X enriched in iPS (after Day 10)
c. Histone X not enriched
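One common way to score enrichment or depletion of a GO term in a gene group is a hypergeometric test. The sketch below is an assumption for illustration only: the talk does not state how its heatmap scores were computed, the function names are hypothetical, and the gene counts are made up. It reports a signed log10 p-value, positive when the term is over-represented in the group and negative when under-represented. It requires Python 3.8 or later for `math.comb`.

```python
from math import comb, log10


def hypergeom_sum(lo, hi, total, annotated, group):
    """P(lo <= X <= hi), where X is the number of annotated genes in a
    random group of `group` genes drawn from `total` genes, of which
    `annotated` carry the GO term."""
    ways = sum(
        comb(annotated, i) * comb(total - annotated, group - i)
        for i in range(lo, hi + 1)
    )
    return ways / comb(total, group)


def signed_enrichment_score(overlap, total, annotated, group):
    """Signed -log10 p-value: > 0 for enrichment, < 0 for depletion."""
    upper = min(annotated, group)
    p_over = hypergeom_sum(overlap, upper, total, annotated, group)
    p_under = hypergeom_sum(0, overlap, total, annotated, group)
    if p_over <= p_under:
        return -log10(max(p_over, 1e-300))
    return log10(max(p_under, 1e-300))


# Hypothetical counts: 20000 genes in the genome, 100 annotated with the
# term, 200 genes in the group, 10 of which carry the term (about 1 expected).
score_enriched = signed_enrichment_score(10, 20000, 100, 200)

# Hypothetical depleted case: 2000 annotated genes, 500 in the group,
# only 20 carrying the term (about 50 expected).
score_depleted = signed_enrichment_score(20, 20000, 2000, 500)

print(score_enriched, score_depleted)
```

In practice the scores for many terms and groups would be corrected for multiple testing before being plotted.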
Generate Pluripotent Stem Cells from Mature Adult Cells
Limitation
• Reprogramming efficiency: 0.01 - 0.1%
• Molecular mechanism is not fully understood
Our genome-wide analysis suggests:
Histone X participates in stem cell gene activation at the early stage of adult cell reprogramming.
Acknowledgement
Thesis advisors
Dr. Shahin Rafii (Weill Cornell Medical College)
Dr. C. David Allis (Rockefeller Univ.)
Collaborators
Dr. Olivier Elemento (Weill Cornell Medical College)
Dr. Eugenia Giannopoulou (Hospital for Special Surgery)