Chromatin Structure & Dynamics Victor Jin Department of Biomedical Informatics The Ohio State...

Post on 04-Jan-2016


Chromatin Structure & Dynamics

Victor Jin, Department of Biomedical Informatics, The Ohio State University

Chromatin

Walther Flemming first used the term chromatin in 1882. At that time, Flemming assumed that within the nucleus there was some kind of nuclear scaffold.

Chromatin is the complex of DNA and protein that makes up chromosomes.

Chromatin structure: DNA wraps around histone octamers to form nucleosomes, giving a “beads on a string” structure.

In non-dividing cells there are two types of chromatin: euchromatin and heterochromatin.

Chromatin Fibers

30 nm chromatin fiber

11 nm (beads)

Chromatin as seen in the electron microscope. (source: Alberts et al., Molecular Biology of The Cell, 3rd Edition)

Nucleosome

The basic repeating unit of chromatin.

It is made up of five histone proteins: H2A, H2B, H3 and H4 as core histones, and H1 as a linker histone.

It provides the lowest level of compaction of double-stranded DNA in the cell nucleus.

It is often associated with the regulation of transcription.

(Figure labels: H2A, H2B, H3, H4)

1974: Roger Kornberg, who won the Nobel Prize in 2006, discovers the nucleosome.

Core histones are highly conserved proteins. They share a structural motif called the histone fold (three α helices connected by two loops) and each carries an N-terminal tail.

Histone Octamer

Core histones pair up as dimers, each containing 3 regions of interaction with dsDNA; H3 and H4 further assemble into tetramers. The histone octamer organizes 146 bp of DNA in 1.65 superhelical turns: 48 nm of DNA packaged in a disc of 6 x 11 nm.

(Figure dimension labels: 6 nm x 11 nm)

Nucleosome Assembly In Vitro

4 core histones + 1 naked DNA template at 4 °C and 2 M salt concentration. Source: Dyer et al., Methods in Enzymology (2004), 375:23-44.

DNA compaction in a human cell nucleus

(Figure scale labels: 1 bp (0.3 nm); 11 nm; 30 nm; 10,000 nm)
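The packaging figures quoted for the histone octamer (146 bp, 48 nm, an 11 nm disc) can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration: the rise of 0.33 nm per base pair is an assumed value for B-form DNA (the scale label on this slide rounds it to 0.3 nm), and the function names are hypothetical.

```python
# Back-of-the-envelope check of the nucleosome packaging figures.
# RISE_NM_PER_BP is an assumed value for B-form DNA, not taken from the slides.
RISE_NM_PER_BP = 0.33
CORE_DNA_BP = 146          # DNA organized by one histone octamer
DISC_DIAMETER_NM = 11.0    # diameter of the nucleosome disc


def dna_length_nm(n_bp, rise=RISE_NM_PER_BP):
    """Contour length of n_bp base pairs of extended DNA, in nm."""
    return n_bp * rise


def compaction_ratio(n_bp, packaged_nm, rise=RISE_NM_PER_BP):
    """Extended DNA length divided by the length it occupies once packaged."""
    return dna_length_nm(n_bp, rise) / packaged_nm


length = dna_length_nm(CORE_DNA_BP)
ratio = compaction_ratio(CORE_DNA_BP, DISC_DIAMETER_NM)
print(f"{length:.1f} nm of DNA, ~{ratio:.1f}-fold compaction")
```

With the assumed rise, 146 bp comes to roughly 48 nm, so wrapping it into an 11 nm disc shortens the DNA about four-fold along its length.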

The N-terminal tails protrude from the core

Histone Modifications

(Figure labels: Me, P, Ub, Su, Ac, Me)

Acetylation

Methylation

Ubiquitination

Sumoylation

Phosphorylation

‘Histone Code’

Acetylation of Lysines

Acetylation of the lysines at the N terminus of histones removes positive charges, thereby reducing the affinity between histones and DNA.

This makes it easier for RNA polymerase and transcription factors to access the promoter region.

Histone acetylation enhances transcription while histone deacetylation represses transcription.

Methylation of Arginines and Lysines

Arginine can be methylated to form mono-methyl, symmetrical di-methyl and asymmetrical di-methylarginine.

Lysine can be methylated to form mono-methyl, di-methyl and tri-methyllysine.

Methylation of Histone H3-K27

(Figure labels: K27; PC, DNMT, SUZ12, HDAC, EED, EZH2)

Functional Consequences of Histone Modification

Establishing global chromatin environments, such as euchromatin, heterochromatin and bivalent domains in embryonic stem cells (ESCs).

Orchestrating DNA-based processes such as transcription.

Euchromatin

A lightly packed form of chromatin; Gene-rich; At chromosome arms; Associated with active transcription.

Heterochromatin

A tightly packed form of chromatin; At centromeres and telomeres; Contains repetitious sequences; Gene-poor; Associated with repressed transcription.

Bivalent Domains

Poised state. The chromatin of embryonic stem cells has “bivalent” domains with marks of both gene activation and repression. In these domains, the tail of histone protein H3 has a methyl group attached to lysine 4 (K4) that is activating and a methyl group at lysine 27 (K27) that is repressive (above). This contradictory state may keep the genes silenced but poised to activate if needed. When the cell differentiates (right), only one tag or the other remains, depending on whether the gene is expressed or not.

DNA Methylation

(Figure: DNA methyltransferase transfers a methyl group from S-adenosylmethionine to deoxycytosine, producing 5-methylcytosine.)

CpG Islands

CpG island: a cluster of CpG residues often found near gene promoters (at least 200 bp and with a GC percentage that is greater than 50% and with an observed/expected CpG ratio that is greater than 0.6).
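The three criteria above (length, GC percentage, observed/expected CpG ratio) can be sketched in a few lines. This is a minimal illustration and not a production CpG island finder: the function names are hypothetical, and the observed/expected ratio uses the commonly cited form (number of CpG x N) / (number of C x number of G), which is an assumption since the slide does not spell the formula out.

```python
def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: (C * G) / N
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return n, gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three thresholds quoted on the slide to a single sequence."""
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 100))  # long, GC-rich, CpG-dense
print(looks_like_cpg_island("AT" * 100))  # no G or C at all
```

In practice the test would be applied to sliding windows along a chromosome rather than to one pre-cut sequence; the thresholds are left as parameters because different annotations use slightly different cut-offs.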

~29,000 CpG islands in the human genome (~60% of all genes are associated with CpG islands)

Most CpG islands are unmethylated in normal cells.

Chromatin modifications

Mark                                      Transcriptionally relevant sites             Biological Role
Methylated cytosine (meC)                 CpG islands                                  Transcriptional Repression
Acetylated lysine (Kac)                   H3 (9,14,18,56), H4 (5,8,13,16), H2A, H2B    Transcriptional Activation
Phosphorylated serine/threonine (S/Tph)   H3 (3,10,28), H2A, H2B                       Transcriptional Activation
Methylated arginine (Rme)                 H3 (17,23), H4 (3)                           Transcriptional Activation
Methylated lysine (Kme)                   H3 (4,36,79)                                 Transcriptional Activation
                                          H3 (9,27), H4 (20)                           Transcriptional Repression
Ubiquitylated lysine (Kub)                H2B (123/120)                                Transcriptional Activation
                                          H2A (119)                                    Transcriptional Repression
Sumoylated lysine (Ksu)                   H2B (6/7), H2A (126)                         Transcriptional Repression

Genome-wide Distribution Pattern of Histone Modification Associated with Transcription

Source: Li et al., Cell (Review, 2007), 128:707-719

ChIP-chip

Step 1: Rapid fixation of cells chemically cross-links DNA binding proteins to their genomic targets in vivo.

Step 2: Cell lysis releases the DNA-protein complexes, and sonication fragments the DNA.

Step 3: Immunoprecipitation (IP) purifies the protein-DNA fragments, with specificity dictated by antibody choice.

Step 4: Hydrolysis reverses the cross-links within the released DNA fragments.

Step 5: PCR amplification of ChIP DNA

Step 6: Confirm enrichment by PCR amplification of a known binding-site region for that protein, using either conventional PCR followed by agarose gel electrophoresis or quantitative PCR.

Step 7: Labeling pool of protein-DNA fragments.

Step 8: Hybridization of DNA onto microarrays featuring 60-mer oligonucleotide probes.

Major types of array platforms

NimbleGen Arrays: tiling arrays, promoter arrays, whole genome arrays. (http://www.nimblegen.com/products/chip/index.html)

Agilent Arrays: promoter arrays, whole genome arrays. (http://www.chem.agilent.com/Scripts/Phome.asp)

Affymetrix Arrays: tiling arrays, Chr21,22 arrays, whole genome arrays. (http://www.affymetrix.com/index.affx)

Measurement of intensity of probes on the array

The hybridized arrays were scanned on an Axon GenePix 4000B scanner (Axon Instruments Inc.) at wavelengths of 532 nm for the control (Cy3) and 635 nm for each experimental sample (Cy5). Data points were extracted from the scanned images using the NimbleScan 2.0 program (NimbleGen Systems, Inc.). Each of the N pairs of probe signals was normalized by converting it into a scaled log ratio using the following formula:

S_i = log2(Cy5(i) / Cy3(i))
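As a minimal sketch of the log-ratio formula, the snippet below converts paired Cy5 (experimental) and Cy3 (control) intensities into per-probe log2 ratios. The function name and the example intensities are hypothetical, and any additional scaling or centering step applied by the array software is not specified on the slide and is not attempted here.

```python
import math


def log_ratios(cy5, cy3):
    """Return S_i = log2(Cy5(i) / Cy3(i)) for paired probe intensities."""
    if len(cy5) != len(cy3):
        raise ValueError("Cy5 and Cy3 must contain the same number of probes")
    ratios = []
    for experimental, control in zip(cy5, cy3):
        if experimental <= 0 or control <= 0:
            raise ValueError("intensities must be positive to take a log ratio")
        ratios.append(math.log2(experimental / control))
    return ratios


# Made-up intensities for three probes (illustration only).
cy5 = [400.0, 100.0, 50.0]
cy3 = [100.0, 100.0, 100.0]
print(log_ratios(cy5, cy3))  # [2.0, 0.0, -1.0]
```

A positive value means the probe is enriched in the ChIP sample relative to the control, zero means no difference, and a negative value means depletion.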

Antibody Validation

• Confirming on a known target

• Different antibodies to same factor

• Antibodies to different family members

• siRNA-ChIP

• Antibodies to two components of a complex

• Antibodies to an enzyme/modification pair

Confirming on a known target

Comparison of biological replicates and antibodies to different E2Fs

Loss of E2F6 ChIP signal after siRNA knockdown of E2F6

(Figure panels: Promoter 1, Promoter 2)

Reproducibility of promoter arrays using biological replicates

(Figure panels: top 1000 overlap, shown for two comparisons; H3me3K27 in Ntera2 cells; 500 kb region of chromosome 6; 500 kb region of chromosome 1)

Amount of Sample Per ChIP

Number of cells   Chromatin input   ChIP output
1 x 10^7          200 µg            150 ng
1 x 10^6          20 µg             10 ng
5 x 10^5          10 µg             1.3 ng
1 x 10^5          2 µg              300 pg
1 x 10^4          200 ng            30 pg
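To compare rows of this table on a common scale, the ChIP output can be expressed as a percentage of the chromatin input. The snippet below is a small arithmetic sketch: the values are transcribed from the table and converted to nanograms, and the function name is hypothetical.

```python
# (number of cells, chromatin input in ng, ChIP output in ng),
# transcribed from the table and converted to a single unit.
SAMPLES = [
    (1e7, 200_000.0, 150.0),
    (1e6, 20_000.0, 10.0),
    (5e5, 10_000.0, 1.3),
    (1e5, 2_000.0, 0.3),
    (1e4, 200.0, 0.03),
]


def percent_recovery(input_ng, output_ng):
    """ChIP output as a percentage of the chromatin input."""
    return 100.0 * output_ng / input_ng


for cells, input_ng, output_ng in SAMPLES:
    recovery = percent_recovery(input_ng, output_ng)
    print(f"{cells:.0e} cells: {recovery:.3f}% of input recovered")
```

In every row the recovered DNA is well under 0.1% of the input, which is why the smallest samples yield only picograms of material.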

Miniaturization

• Standard ChIP Protocol (1 x 10^7 cells; WGA2)
    • Promoter Arrays
    • Genome Tiling Arrays

• MicroChIP Protocol (10,000-100,000 cells; WGA4)
    • Promoter Arrays
    • Genome Tiling Arrays

Reproducibility of MicroChIP Protocol

Peak calling programs

• Moving average method by Keles et al. (2004).

• A Hidden Markov Model (HMM) approach by Li et al. (2005).

• TileMap by Ji and Wong (2005), using moving averages or an HMM to account for information from adjacent probes.

• PMT by Chung et al. (2007), which integrates a physical model to correct for probe-specific behavior.

• ChIPmix by Martin-Magniette et al. (2008), based on a linear regression mixture model.
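Several of the programs listed here smooth the signal over adjacent probes before calling peaks. The snippet below is a generic sliding-window sketch of that idea, not the published method of Keles et al. or TileMap: the function names, the window size and the threshold are all hypothetical choices made for illustration.

```python
def moving_average(values, window):
    """Centered moving average; windows are truncated at the array edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def call_peaks(log_ratios, window=3, threshold=1.0):
    """Return inclusive (start, end) probe indices of runs whose smoothed
    signal is at or above the threshold."""
    smoothed = moving_average(log_ratios, window)
    peaks = []
    start = None
    for i, value in enumerate(smoothed):
        if value >= threshold and start is None:
            start = i                      # a run begins
        elif value < threshold and start is not None:
            peaks.append((start, i - 1))   # the run ended at the previous probe
            start = None
    if start is not None:                  # run extends to the last probe
        peaks.append((start, len(smoothed) - 1))
    return peaks


# Made-up log ratios for ten consecutive probes (illustration only).
signal = [0.1, 0.0, 0.2, 2.0, 2.5, 2.2, 0.1, 0.0, -0.1, 0.0]
print(call_peaks(signal))  # [(3, 5)]
```

Averaging over neighbors suppresses single noisy probes, at the cost of blurring peak boundaries by roughly half a window on each side.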

Spike-ins comparison

• Mixtures of human genomic DNA and “spike-ins” comprised of nearly 100 human sequences at various concentrations were hybridized to four tiling array platforms by eight independent groups.

• Ref: Johnson et al., “Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets”, Genome Research, 18: 393-403, 2008.

Programs in Spike-ins comparison

• MAT: Model-based Analysis of Tiling arrays first standardizes each individual Affymetrix tiling array by modeling the effect of each probe’s 25-mer sequence and genome copy number on its signal.

• TAS: Affymetrix Tiling Array Software first uses quantile normalization to normalize probes on all the arrays. Then a Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is applied across 500 bp sliding windows to identify windows where the spike-in probes have higher signals than the control probes.

• Weighted Average (WA): To detect enriched regions, the significance of the ratios of a contiguous set of probes defining a region is judged by comparing a score based on their weighted average to the distribution of scores of all sets of probes taken in windows of the same predefined size (500 bp in this case).

• TAMAL: The algorithm proceeds in two basic steps. First, peaks are found using TAMALPAIS. Then, the enrichment is estimated within the peak by using the maxfour approach described in Krig et al. (2007, J Biol Chem 282:9703). Bieda et al. (2006) describe four levels of stringency, called L1, L2, L3 and L4, with L1 being the most stringent set of detection parameters and L4 the least stringent.

• Mpeak: The model-based Mpeak method is used to identify peaks in ChIP-on-chip data.

• Wavelet: The algorithm uses a wavelet transform of the signals from the red and green channels of the tiling array. The approximation coefficients of the wavelet transform give a clear intensity and length-scale separation between the background signal and the signal coming from regions of biochemical activity.
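The windowed rank test described for TAS can be sketched in plain Python. This is an illustration of a one-sided Mann-Whitney U test applied in sliding windows, not the TAS implementation: the function names, the window handling and the significance cut-off are hypothetical, and the p-value uses a normal approximation without tie or continuity correction.

```python
import math


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_greater(treatment, control):
    """One-sided test that treatment values tend to exceed control values.
    Returns (U, p), with p from the normal approximation."""
    n1, n2 = len(treatment), len(control)
    ranks = midranks(list(treatment) + list(control))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return u, p


def scan_windows(positions, treatment, control, window_bp=500, alpha=0.01):
    """Test the probes within window_bp / 2 of each probe position and
    return the center positions of windows with p < alpha."""
    hits = []
    for center in positions:
        idx = [i for i, pos in enumerate(positions)
               if abs(pos - center) <= window_bp / 2]
        _, p = mann_whitney_greater([treatment[i] for i in idx],
                                    [control[i] for i in idx])
        if p < alpha:
            hits.append(center)
    return hits
```

The scan is quadratic in the number of probes, which is acceptable for a sketch but would need an indexed window for real tiling arrays. With only a handful of probes per window the normal approximation is rough, so the resulting p-values are best read as a ranking score.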