Chromatin Immuno-precipitation (CHIP)-chip Analysis 11/07/07.

Post on 16-Dec-2015

Experimental Protocol

• Step 1: crosslink (fix protein to DNA)

• Step 2: sonication (break DNA)

• Step 3: immuno-precipitation (pull down the target protein with a specific antibody)

• Step 4: hybridization (hybridize input and pulled-down DNA on a microarray)

Kim and Ren 2007

Intergenic microarray

• Array probes are PCR products of intergenic regions.

• Binding signal is represented by a single probe.

ChIP-array

• Regions consistently enriched in repeated ChIP-arrays are selected as the TF binding targets

• Usually hundreds of targets, each ~1000 bases long

• We want to know the precise binding site (e.g. 10 bases)

[Figure: TF target]

Tiling arrays

• Microarray probes are oligonucleotide sequences with regular spacing covering a whole genomic region.

[Figure: probes along a chromosome]

Tiling Array Data

Each TF binding signal is represented by multiple probes.

Need more sophisticated statistical tools.

Kim and Ren 2007

Methods

• Moving average t-test (Keles et al. 2004)

• HMM (Li et al. 2005; Yuan et al. 2005)

• Tilemap (Ji and Wong 2005)

• MAT (Johnson et al. 2006)

Keles’ method

• Calculate a two-sample t-statistic at each probe $i$, where $Y_2$ is the ChIP signal and $Y_1$ is the input signal:

$$T_{i,n} = \frac{\bar{Y}_{2,i} - \bar{Y}_{1,i}}{\sqrt{\hat{\sigma}^2_{1,i}/n_1 + \hat{\sigma}^2_{2,i}/n_2}}$$

• Moving average scan-statistic over a window of $w$ probes:

$$T^*_{i,n} = \frac{1}{w} \sum_{h=i}^{i+w-1} T_{h,n}$$

Keles et al. 2004
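The two statistics on this slide can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function names and the toy data are assumptions, and the variance is the ordinary sample variance of the replicates.

```python
import math
from statistics import mean, variance

def two_sample_t(chip, control):
    """Two-sample t-statistic for one probe.
    chip: replicate log-intensities of the ChIP signal (Y2).
    control: replicate log-intensities of the input signal (Y1)."""
    n2, n1 = len(chip), len(control)
    se = math.sqrt(variance(chip) / n2 + variance(control) / n1)
    return (mean(chip) - mean(control)) / se

def moving_average(t_stats, w):
    """Scan statistic: average of w consecutive probe-level t-statistics."""
    return [sum(t_stats[i:i + w]) / w for i in range(len(t_stats) - w + 1)]

# Toy data (hypothetical): 6 probes, 3 replicates each; probes 2-4 are enriched.
chip = [[0.1, -0.2, 0.0], [0.2, 0.1, -0.1], [2.1, 1.8, 2.4],
        [2.5, 2.2, 2.0], [1.9, 2.3, 2.2], [0.0, 0.3, -0.2]]
control = [[0.0, 0.1, -0.1], [0.1, -0.1, 0.2], [0.1, 0.0, -0.2],
           [-0.1, 0.2, 0.1], [0.2, 0.0, 0.1], [0.1, -0.1, 0.0]]

t_stats = [two_sample_t(c, k) for c, k in zip(chip, control)]
scan = moving_average(t_stats, w=3)
best_window = scan.index(max(scan))  # start index of the strongest window
```

In this toy example the strongest window is the one that starts at probe 2, which is where the enriched probes were placed.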

Multiple hypothesis testing

• Multiple hypothesis testing needs to be considered to control false positive error rates.

• What is the null distribution of this statistic?

$$T^*_{i,n} = \frac{1}{w} \sum_{h=i}^{i+w-1} T_{h,n}$$

• Assume $T_{h,n}$ has a t-distribution.

• Approximate $T^*_{i,n}$ by a normal distribution.

• Alternatively, a resampling method can be used to estimate the null distribution.
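One possible resampling scheme is a label permutation, sketched below. It is an assumption of this example rather than the procedure from the lecture: ChIP and input labels are shuffled (the same shuffle for every probe, so that neighboring probes stay aligned), and the maximum scan statistic is recorded for each shuffle. The function names, the window size, and the simulated data are hypothetical.

```python
import math
import random
from statistics import mean, variance

def scan_stats(chip, control, w):
    """Moving average (window w) of per-probe two-sample t-statistics."""
    t = []
    for y2, y1 in zip(chip, control):
        se = math.sqrt(variance(y2) / len(y2) + variance(y1) / len(y1))
        t.append((mean(y2) - mean(y1)) / se)
    return [sum(t[i:i + w]) / w for i in range(len(t) - w + 1)]

def permutation_null_max(chip, control, w, n_perm=200, seed=0):
    """Sorted null values of the maximum scan statistic, obtained by
    shuffling ChIP/input labels (same shuffle applied to every probe)."""
    rng = random.Random(seed)
    n2 = len(chip[0])
    n_total = n2 + len(control[0])
    null_max = []
    for _ in range(n_perm):
        order = list(range(n_total))
        rng.shuffle(order)
        perm_chip, perm_control = [], []
        for y2, y1 in zip(chip, control):
            pooled = y2 + y1
            shuffled = [pooled[j] for j in order]
            perm_chip.append(shuffled[:n2])
            perm_control.append(shuffled[n2:])
        null_max.append(max(scan_stats(perm_chip, perm_control, w)))
    return sorted(null_max)

# Simulated data with no real binding: 50 probes, 3 replicates per condition.
rng = random.Random(1)
chip = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
control = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]

null_max = permutation_null_max(chip, control, w=5)
cutoff = null_max[int(0.95 * len(null_max))]  # approximate 95th percentile
```

Thresholding on the maximum statistic controls the chance of any false call across the region, which is stricter than controlling a false discovery rate. With only three replicates per condition there are few distinct label assignments, so the null is coarse.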

Tilemap

Improves on Keles’ method in the following ways:

• Use a more robust test statistic

• Estimate the null distribution without prior assumptions.

Ji and Wong 2005

Step 1: calculating a t-like test statistic

• Model: log-intensity, indexed by probe, condition, and replicate

• Pooling data: two samples; multiple samples

• Want to have a robust estimate of variance.

• Estimation of the variance by variance shrinkage (shrinkage factor)
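The idea of a t-like statistic with a shrunken variance can be sketched as follows. This is not TileMap's estimator: the shrinkage factor is passed in as a plain parameter (`shrink`, a hypothetical name) instead of being estimated from the data, and each probe's pooled variance is simply pulled toward the average variance over all probes.

```python
import math
from statistics import mean, variance

def shrunken_t(cond1, cond2, shrink):
    """t-like statistics with probe variances shrunk toward their average.
    cond1, cond2: per-probe lists of replicate log-intensities.
    shrink: assumed shrinkage factor in [0, 1]; 0 means no shrinkage."""
    k1, k2 = len(cond1[0]), len(cond2[0])
    # Pooled within-condition variance for each probe.
    s2 = [((k1 - 1) * variance(a) + (k2 - 1) * variance(b)) / (k1 + k2 - 2)
          for a, b in zip(cond1, cond2)]
    s2_bar = mean(s2)
    stats = []
    for a, b, v in zip(cond1, cond2, s2):
        v_shrunk = (1 - shrink) * v + shrink * s2_bar
        stats.append((mean(a) - mean(b)) / math.sqrt(v_shrunk * (1 / k1 + 1 / k2)))
    return stats

# Probe 0: tiny difference but, by chance, almost no replicate variance.
# Probe 1: large difference with ordinary replicate variance.
cond1 = [[1.00, 1.01], [3.0, 2.0]]
cond2 = [[0.90, 0.91], [0.2, 0.8]]

plain = shrunken_t(cond1, cond2, shrink=0.0)
shrunk = shrunken_t(cond1, cond2, shrink=0.5)
```

Without shrinkage, probe 0 gets the larger statistic purely because its variance estimate happens to be tiny. With shrinkage, probe 1 ranks first, which is the robustness the slide asks for.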

Step 2: Merging data

• Moving average

• Alternatively use Hidden Markov Model

Step 3: control FDR

Goal: To find null and signal distributions

Idea: assume a mixture model. This is unidentifiable!

A clever trick: look for g0 and g1

How to find g0 and g1

• To get g1, can we select probes with highest t-score?

• Why or why not?

• Idea: signals at neighboring probes are correlated, whereas noises are not (hopefully!)

• First select probes that have the highest t-score $t_i$.

• Use their downstream value $t_{i+1}$ to estimate g1.

• Use same trick to estimate g0.
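The neighbor trick can be sketched as below. The selection rules are assumptions of this example: g1 is sampled from the downstream neighbors of the probes with the highest t-scores, and g0 from the downstream neighbors of the probes with the lowest t-scores. The fractions `top_frac` and `low_frac` and the simulated data are hypothetical.

```python
import random
from statistics import mean

def neighbor_samples(t, top_frac=0.05, low_frac=0.5):
    """Return (g0_sample, g1_sample): t-scores one probe downstream of the
    selected probes. The selected probes themselves are not used, because
    selecting on t_i biases t_i but not its neighbor t_{i+1}."""
    n = len(t) - 1                      # the last probe has no downstream neighbor
    order = sorted(range(n), key=lambda i: t[i])
    n_top = max(1, int(top_frac * n))
    n_low = max(1, int(low_frac * n))
    g0 = [t[i + 1] for i in order[:n_low]]
    g1 = [t[i + 1] for i in order[-n_top:]]
    return g0, g1

# Simulated t-scores: noise everywhere, plus 10 bound regions of 10 probes each.
rng = random.Random(0)
t = [rng.gauss(0, 1) for _ in range(2000)]
for start in range(100, 2000, 200):
    for i in range(start, start + 10):
        t[i] += 3.0

g0, g1 = neighbor_samples(t)
```

Because signal spans several adjacent probes, the neighbors of high-scoring probes are enriched for signal, while the neighbors of low-scoring probes look like noise. Both samples are still mixtures, only with different proportions.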

Step 3: Unbalanced mixture subtraction

Additional assumption: $f_0(t) = g_0(t)$

With this assumption, $\pi_0$ is estimated by fitting

$$\hat{\pi}_0 = \frac{\int \left(h(t) - f_1(t)\right)\left(g_0(t) - f_1(t)\right) dt}{\int \left(g_0(t) - f_1(t)\right)^2 dt}$$
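The fitting step can be read as a least-squares problem. The following is only a sketch under assumptions the transcript does not state: $h(t)$ is taken to be the density of all observed test statistics, $f_0$ and $f_1$ the null and signal densities, and $\pi_0$ the proportion of null probes.

```latex
% Assumed mixture model and the additional assumption f_0 = g_0:
h(t) = \pi_0 f_0(t) + (1 - \pi_0) f_1(t), \qquad f_0(t) = g_0(t)
\;\Longrightarrow\;
h(t) - f_1(t) = \pi_0 \bigl( g_0(t) - f_1(t) \bigr)

% Least-squares fit of the single unknown \pi_0:
\hat{\pi}_0
  = \arg\min_{\pi} \int \bigl[ h(t) - f_1(t) - \pi \bigl( g_0(t) - f_1(t) \bigr) \bigr]^2 dt
  = \frac{\int \bigl( h(t) - f_1(t) \bigr) \bigl( g_0(t) - f_1(t) \bigr)\, dt}
         {\int \bigl( g_0(t) - f_1(t) \bigr)^2 dt}
```

The closed form follows from setting the derivative with respect to $\pi$ to zero, as in ordinary regression through the origin.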

False discovery rate (FDR)

Determine TF binding sites at an FDR cutoff
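Once a null sample and a null proportion are available, choosing a cutoff can be sketched as follows. The estimator used here, null proportion times null tail probability divided by observed tail probability, is a common empirical choice and an assumption of this example, as are the function names, `pi0=0.9`, and the simulated scores.

```python
import random

def fdr_at(cutoff, observed, null_sample, pi0):
    """Estimated FDR when every probe with score >= cutoff is called bound."""
    called = sum(s >= cutoff for s in observed) / len(observed)
    if called == 0:
        return 0.0
    null_tail = sum(s >= cutoff for s in null_sample) / len(null_sample)
    return min(1.0, pi0 * null_tail / called)

def choose_cutoff(observed, null_sample, pi0, alpha=0.05):
    """Smallest observed score whose estimated FDR is at most alpha."""
    for c in sorted(set(observed)):
        if fdr_at(c, observed, null_sample, pi0) <= alpha:
            return c
    return None

# Simulated scores: 900 null probes and 100 bound probes.
rng = random.Random(0)
observed = ([rng.gauss(0, 1) for _ in range(900)]
            + [rng.gauss(4, 1) for _ in range(100)])
null_sample = [rng.gauss(0, 1) for _ in range(5000)]

cutoff = choose_cutoff(observed, null_sample, pi0=0.9)
estimated_fdr = fdr_at(cutoff, observed, null_sample, pi0=0.9)
```

The estimated FDR is not guaranteed to decrease monotonically with the cutoff, so a production version would probably smooth it or enforce monotonicity first.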

How to find g0 and g1

• Idea: signals at neighboring probes are correlated, whereas noises are not (hopefully!)

• First select probes that have the highest t-score $t_i$.

• Use their downstream value $t_{i+1}$ to estimate g1.

• Use same trick to estimate g0.

Memory problem!

Example: analysis of cMyc binding data

Comparison of models

Simulation results

MAT

Basic Idea:

• Baseline level correction

• Standardize probe intensity with respect to the expected baseline value

(Johnson et al. 2006)

MAT

• How to estimate the baseline values?

[Figure: estimated nucleotide effect (A, C)]

MAT

• Standardization

$$t_i = \frac{\log(PM_i) - \hat{m}_i}{s_{i,\,\text{affinity bin}}}$$

$$\text{MATscore}(\text{region}) = \sqrt{n_p}\; TM(t \text{ values in region})$$

(X.S. Liu)
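The two formulas above translate directly into code. In this sketch TM is read as a trimmed mean and $n_p$ as the number of probes in the region. The 10% trimming fraction, the function names, and the example numbers are assumptions, and the baseline $\hat{m}_i$ and the affinity-bin standard deviation are taken as given inputs.

```python
import math

def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values.
    Assumes trim < 0.5 so that at least one value is kept."""
    xs = sorted(values)
    k = int(trim * len(xs))
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return sum(kept) / len(kept)

def standardize(log_pm, baseline, bin_sd):
    """t_i = (log(PM_i) - m_hat_i) / s, with s taken from the probe's affinity bin."""
    return (log_pm - baseline) / bin_sd

def mat_score(t_values, trim=0.1):
    """sqrt(n_p) * TM(t values in region), n_p = number of probes in the region."""
    return math.sqrt(len(t_values)) * trimmed_mean(t_values, trim)

# Hypothetical region of 10 probes with two outlying probes.
t_region = [2.0] * 8 + [-50.0, 50.0]
score = mat_score(t_region)          # outliers are trimmed before averaging
t_one = standardize(9.0, 7.0, 0.5)   # one probe, 2 log units above baseline
```

The trimmed mean keeps a single aberrant probe from dominating the region score, and the square-root factor rewards regions supported by more probes.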

Reading List

• Keles et al. 2004: developed a multiple hypothesis testing method for tiling array analysis

• Ji and Wong 2005: Tilemap; improved over Keles et al.’s method

• Johnson et al. 2006: MAT; showed that baseline adjustment improves signal detection.