ChIP-chip Data

Posted on 23-Jan-2016


DNA-binding proteins

• Constitutive proteins (mostly histones)
  – Organize DNA
  – Regulate access to DNA
  – Carry many modifications
    • Acetylation, methylation, …

• Sporadic proteins (e.g. transcription factors)
  – Mediate docking of the transcription apparatus
  – Modify histones
  – Methylate DNA

Histones

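Histones are an ancient family of proteins that serve as the scaffold for DNA

Four types of histones (H2A, H2B, H3, H4) assemble in pairs to form the histone octamer at the core of each nucleosome

About 147 bp of DNA is wrapped roughly 1.7 times (often rounded to "twice") around each histone octamer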

Histones and Modifications

Histone tails protrude from the nucleosome core and contact the DNA; these tails can be chemically modified

Nucleosomes can stay loosely spaced or pack tightly; tight packing compacts the DNA and limits access to it

Transcription Factors

• General – help to set up transcription of many genes

• Specific – draw in general factors or RNA Pol II to specific genes

TATA-binding protein (TBP)

DNA Methylation

Adding a methyl group to cytosine (5-methylcytosine)

Cytosine methylation is copied onto the new strand during DNA replication, so it is passed on to daughter cells

Chromatin Immunoprecipitation (ChIP)

Tiling Array

• One probe every n base pairs over some length of chromosome
  – Interrupted by repeat regions, where unique probes cannot be designed

• Promoter array: each (known) promoter is tiled
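A minimal sketch of how probe positions on a tiling array might be laid out, assuming a fixed step of `step` bp and a hypothetical list of masked repeat intervals. The function name `tile_positions` and the half-open interval format are illustrative assumptions, not any vendor's design format.

```python
from typing import List, Tuple


def tile_positions(start: int, end: int, step: int,
                   repeats: List[Tuple[int, int]]) -> List[int]:
    """Return probe start positions every `step` bp in [start, end),
    skipping positions that fall inside a masked repeat.

    `repeats` holds half-open (repeat_start, repeat_end) intervals.
    Simplification: only the probe's start is checked; a real design
    would check that the whole probe avoids the repeat.
    """
    positions = []
    for pos in range(start, end, step):
        # Leave a gap wherever the probe would land in a repeat.
        if any(r_start <= pos < r_end for r_start, r_end in repeats):
            continue
        positions.append(pos)
    return positions


# Hypothetical example: 1,000 bp tiled every 100 bp, repeat at 300-500.
print(tile_positions(0, 1000, 100, repeats=[(300, 500)]))
# [0, 100, 200, 500, 600, 700, 800, 900]
```

A promoter array would apply the same idea only to windows around each known promoter instead of a contiguous stretch of chromosome.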

An Affymetrix tiling design

What the data look like

Figure: Histone acetylation on 15 samples over one promoter (raw). x axis: chromosome position (1,206,600 to 1,207,400); y axis: lr(ecog1.h3k9), from -2 to 4.

Multiple Promoters

Figure: signal along a stretch of chromosome spanning several promoters. x axis: chromosome position (10,120,000 to 10,135,000); y axis: log.R - log.G (red minus green channel log intensity), from -4 to 2.
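The plotted quantity, log R minus log G, is a per-probe difference of log intensities between the two channels of a two-color array. A minimal sketch, assuming background-corrected positive intensities, the ChIP sample in the red channel, and base-2 logs (the plot does not state the base); `log_ratios` is a hypothetical helper.

```python
import math
from typing import List


def log_ratios(red: List[float], green: List[float]) -> List[float]:
    """Per-probe log ratio M = log2(R) - log2(G).

    Assumes both channels are background-corrected and strictly positive.
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    return [math.log2(r) - math.log2(g) for r, g in zip(red, green)]


# Hypothetical intensities for three probes (ChIP = red, input = green).
print(log_ratios([1024.0, 256.0, 64.0], [256.0, 256.0, 256.0]))
# [2.0, 0.0, -2.0]
```

Positive values mean the probe is brighter in the (assumed) ChIP channel than in the reference channel.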

Normalized by Medians

Figure: the same chromosome region after median normalization. x axis: chromosome position (10,120,000 to 10,135,000); y axis: normalized value, from -2 to 3.
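One plausible reading of "normalized by medians" is per-array median centering: subtract each sample's median log ratio so that the bulk of probes, presumably unbound, sit near 0. The sketch below assumes that reading; `median_center` is a hypothetical helper.

```python
import statistics
from typing import List


def median_center(samples: List[List[float]]) -> List[List[float]]:
    """Subtract each sample's median log ratio from all of its probes.

    Assumption: normalization is per sample (per array), so a constant
    shift between arrays is removed while within-array shape is kept.
    """
    centered = []
    for values in samples:
        med = statistics.median(values)
        centered.append([v - med for v in values])
    return centered


# Two hypothetical arrays that differ only by a constant offset.
arrays = [[0.5, 1.0, 1.5, 4.0], [-1.0, -0.5, 0.0, 2.5]]
print(median_center(arrays))
# [[-0.75, -0.25, 0.25, 2.75], [-0.75, -0.25, 0.25, 2.75]]
```

The median is preferred over the mean here because a few strongly enriched probes would pull the mean upward.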

Methods and Issues

• Normalization
  – Different enrichment ratios
  – Different probe thermodynamics
  – Dye and probe bias

• Estimation
  – Categorical or continuous?
  – Individual values are noisy

• For TF binding: where is the peak?

Figure: median-normalized signal. x axis: chromosome position (10,120,000 to 10,135,000); y axis: normalized value, from -2 to 3.

Normalization

• Basic idea: compensate for technical variables

• Technical differences should affect different probes differently

• Try to estimate what part of the signal can be attributed to technical factors

• Easiest variable to access: probe sequence
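Since probe sequence is the easiest technical variable to access, sequence-based normalization can be illustrated by regressing log intensity on a sequence feature and keeping the residuals. This sketch uses GC count as the only covariate, which is an assumption for illustration (MAT's own probe model is more detailed); `gc_count` and `sequence_normalize` are hypothetical helpers.

```python
from typing import List


def gc_count(seq: str) -> int:
    """Number of G or C bases in a probe sequence."""
    return sum(base in "GCgc" for base in seq)


def sequence_normalize(seqs: List[str], log_intensity: List[float]) -> List[float]:
    """Remove the part of each probe's log intensity explained by GC count.

    Hypothetical one-covariate model: log_intensity ~ a + b * gc_count,
    fitted by ordinary least squares; the residuals are returned.
    """
    x = [gc_count(s) for s in seqs]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(log_intensity) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, log_intensity))
    b = sxy / sxx if sxx > 0 else 0.0  # slope (0 if GC count never varies)
    a = mean_y - b * mean_x            # intercept
    return [yi - (a + b * xi) for xi, yi in zip(x, log_intensity)]


# Hypothetical probes with GC counts 0, 1, 2, 3; the third is extra bright.
resid = sequence_normalize(["AAAT", "GAAT", "GCAT", "GCGT"], [5.0, 5.5, 7.0, 6.5])
print([round(r, 3) for r in resid])
# [-0.1, -0.2, 0.7, -0.4]
```

After removing the GC trend, the third probe stands out as the one whose brightness is not explained by its sequence.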

MAT (Model-based Analysis of Tiling arrays)

• One-color Affymetrix array
  – Needs a separate array for comparison

• Normalizes for probe thermodynamics and enrichment ratio

• Estimation by a (robust) moving average
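A robust moving average can be sketched as a trimmed mean over all probes within a fixed genomic window, so that a single outlying probe cannot drag its neighbours up. The half-width and trim fraction used here are illustrative assumptions, not MAT's published defaults, and `robust_moving_average` is a hypothetical helper (quadratic time, fine for a sketch).

```python
from typing import List


def trimmed_mean(values: List[float], trim: float) -> float:
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped from each end
    kept = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return sum(kept) / len(kept)


def robust_moving_average(positions: List[int], scores: List[float],
                          half_width: int, trim: float = 0.1) -> List[float]:
    """Replace each probe's score by the trimmed mean of all probe scores
    within +/- half_width bp of that probe."""
    smoothed = []
    for p in positions:
        window = [s for q, s in zip(positions, scores) if abs(q - p) <= half_width]
        smoothed.append(trimmed_mean(window, trim))
    return smoothed


# Hypothetical probes every 100 bp with one isolated outlier at 300 bp.
positions = list(range(0, 800, 100))
scores = [0, 0, 0, 10, 0, 0, 0, 0]
print(robust_moving_average(positions, scores, half_width=200, trim=0.25))
# [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

With a plain (untrimmed) mean, the same spike would raise the smoothed values of the probes at 100 to 500 bp to between 2.0 and 2.5.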

Normalized Data – Rare Event

Normalized Data – Common Event

Estimation

• Try to build an intelligent moving average

• Not all neighbors will be similar

• A typical TF binding site is about 8 bp
  – Pol II may spread wider

• A typical fragment is 100 to 200 bp

• So features closer together than about 200 bp cannot be resolved

Figure: Pol II binding on a 100 bp grid. x axis: chromosome position (10,120,000 to 10,135,000); y axis: normalized value, from -2 to 3.
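One way to make the moving average "intelligent" is to weight neighbours by genomic distance instead of treating a fixed window equally, with a bandwidth on the order of the fragment length, since probes much farther apart than a fragment are unlikely to report the same binding event. The sketch uses a Gaussian kernel, a swapped-in illustrative technique rather than necessarily the one the slides have in mind; it accounts only for distance, not for neighbours whose signal is dissimilar. The 150 bp bandwidth and the helper names are assumptions. The peak is taken as the probe with the highest smoothed score.

```python
import math
from typing import List, Tuple


def kernel_smooth(positions: List[int], scores: List[float],
                  bandwidth: float) -> List[float]:
    """Gaussian-kernel weighted average of probe scores.

    Weight for a neighbour at genomic distance d (bp) is
    exp(-d^2 / (2 * bandwidth^2)); the probe itself has weight 1.
    """
    smoothed = []
    for p in positions:
        weights = [math.exp(-((q - p) ** 2) / (2.0 * bandwidth ** 2)) for q in positions]
        total = sum(weights)
        smoothed.append(sum(w * s for w, s in zip(weights, scores)) / total)
    return smoothed


def peak_position(positions: List[int], scores: List[float],
                  bandwidth: float) -> Tuple[int, float]:
    """Return (position, smoothed score) of the highest smoothed probe."""
    smoothed = kernel_smooth(positions, scores, bandwidth)
    best = max(range(len(positions)), key=lambda i: smoothed[i])
    return positions[best], smoothed[best]


# Hypothetical 100 bp grid: a binding bump centred near 500 bp, plus a
# noisy probe at 400 bp that is the single highest raw value.
positions = list(range(0, 1100, 100))
scores = [0, 0, 0, 1, 3.2, 3, 2, 1, 0, 0, 0]
pos, _ = peak_position(positions, scores, bandwidth=150.0)
print(pos)  # 500
```

The raw maximum sits at 400 bp, but after smoothing the peak moves to 500 bp, where the surrounding probes support it.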

TileMap

• Ignores normalization

• 'Shrinkage' estimator of variance
  – Improves individual probe scores

• Smooths noise by a moving average
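A sketch of variance shrinkage in the style of a moderated t statistic: each probe's pooled sample variance is averaged with a prior variance, weighted by degrees of freedom, so a probe whose few replicates happen to agree closely does not get an inflated score. This limma-style formula is an illustration and not necessarily TileMap's exact estimator; `prior_var` and `prior_df` are treated as known here, whereas in practice they would be estimated from all probes on the array.

```python
import math
import statistics
from typing import List


def moderated_t(chip: List[float], control: List[float],
                prior_var: float, prior_df: float) -> float:
    """Two-sample t-like score using a shrunken pooled variance.

        s2_shrunk = (d0 * s0^2 + d * s2) / (d0 + d)

    where s2 is the pooled sample variance with d = n1 + n2 - 2 degrees
    of freedom, s0^2 = prior_var and d0 = prior_df (assumed known).
    With prior_df = 0 this reduces to the ordinary pooled t statistic.
    """
    n1, n2 = len(chip), len(control)
    d = n1 + n2 - 2
    s2 = ((n1 - 1) * statistics.variance(chip) +
          (n2 - 1) * statistics.variance(control)) / d
    s2_shrunk = (prior_df * prior_var + d * s2) / (prior_df + d)
    diff = statistics.mean(chip) - statistics.mean(control)
    return diff / math.sqrt(s2_shrunk * (1.0 / n1 + 1.0 / n2))


# Hypothetical probe with two replicates per group that agree by chance.
chip, control = [2.0, 2.01], [0.0, 0.01]
print(moderated_t(chip, control, prior_var=0.25, prior_df=0))  # about 283
print(moderated_t(chip, control, prior_var=0.25, prior_df=4))  # about 4.9
```

Shrinkage leaves probes with many consistent replicates mostly unchanged but stops a tiny, chance variance estimate from dominating the ranking.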