Page 1: Tom Kepler Santa Fe Institute Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression kepler@santafe.edu.

Tom Kepler
Santa Fe Institute

Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression

kepler@santafe.edu


Rat mesothelioma cells: control

Rat mesothelioma cells: treated with KBrO2

Normalization

Method to be improved:

1. Assume that some genes will not change under the treatment under investigation.

2. Identify these core genes in advance of the experiment.

3. Normalize all genes against these genes, assuming they do not change.

Normalization

New Method:

1. Assume that some genes will not change under the treatment under investigation.

2. Choose these core genes arbitrarily.

3. Normalize (provisionally) all genes against these genes, assuming they do not change.

4. Determine which genes do not change under this normalization.

5. Make this set the new core. If this core differs from the previous core, go to 3. Else, done.
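The loop in steps 2 to 5 can be sketched for the simplest case of two arrays. This is a minimal illustration, not the author's implementation: the function name `self_consistent_core`, the single additive offset on the log scale, and the fixed `threshold` that decides which genes count as unchanged are all assumptions made here.

```python
from statistics import mean


def self_consistent_core(log_a, log_b, threshold=0.5, max_iter=100):
    """Iterate steps 2-5 for two arrays of log intensities.

    threshold (hypothetical): a gene counts as unchanged when its
    normalized log difference is at most this large in absolute value.
    Returns (offset, core), where core is a set of gene indices.
    """
    n = len(log_a)
    core = set(range(n))  # step 2: arbitrary initial core (here: all genes)
    offset = 0.0
    for _ in range(max_iter):
        # step 3: normalize provisionally so the core shows no mean change
        offset = mean(log_b[k] - log_a[k] for k in core)
        diffs = [log_b[k] - log_a[k] - offset for k in range(n)]
        # step 4: genes that do not change under this normalization
        new_core = {k for k in range(n) if abs(diffs[k]) <= threshold}
        if not new_core:
            raise ValueError("no unchanged genes at this threshold")
        # step 5: stop when the core reproduces itself, else repeat
        if new_core == core:
            break
        core = new_core
    return offset, core
```

Starting from all genes as the core is only one possible arbitrary choice; the point of the procedure is that the final core should not depend strongly on where it starts.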

 

Error Model

$$I = c\,[\mathrm{mRNA}]$$

I = spot intensity
[mRNA] = concentration of specific mRNA
c = normalization constant

With error:

$$I = c\,[\mathrm{mRNA}]\,\eta$$

$\eta$ = lognormal multiplicative error

With indices:

$$I_{ijk} = c_{ij}\,[\mathrm{mRNA}]_{ik}\,\eta_{ijk}$$

index 1, i: treatment group
index 2, j: replicate within treatment
index 3, k: spot (gene)

$$Y_{ijk} = \xi_k + \delta_{ik} + \alpha_{ij} + \varepsilon_{ijk}$$

$Y$ = log spot intensity
$\xi$ = mean log concentration of specific mRNA
$\delta$ = treatment effect (conc. specific mRNA)
$\alpha$ = normalization constant
$\varepsilon$ = normal additive error

index 1, i: treatment group
index 2, j: replicate within treatment
index 3, k: spot (gene)
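The step from the multiplicative error model to the log-scale model is a single logarithm. The worked equation below is a sketch; the Greek symbol names ($\eta$ for the lognormal multiplicative error, $\varepsilon$ for the normal additive error, $\alpha$, $\xi$, $\delta$ for normalization, mean log concentration, and treatment effect) are notation assumed here for illustration.

```latex
\begin{align*}
I_{ijk} &= c_{ij}\,[\mathrm{mRNA}]_{ik}\,\eta_{ijk} \\
\log I_{ijk} &= \log c_{ij} + \log [\mathrm{mRNA}]_{ik} + \log \eta_{ijk} \\
Y_{ijk} &= \alpha_{ij} + (\xi_k + \delta_{ik}) + \varepsilon_{ijk},
\end{align*}
% with Y_{ijk} = \log I_{ijk}, \alpha_{ij} = \log c_{ij},
% \xi_k + \delta_{ik} = \log [\mathrm{mRNA}]_{ik}, and
% \varepsilon_{ijk} = \log \eta_{ijk}, which is normal because \eta_{ijk} is lognormal.
```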

Identifiability constraints:

Model:

$$Y_{ijk} = \xi_k + \delta_{ik} + \alpha_{ij} + \varepsilon_{ijk}$$

Estimate by ordinary least squares:

$$x_k = \bar{Y}_{\cdot\cdot k} - \bar{Y}_{\cdot\cdot\cdot}$$

$$a_{ij} = \bar{Y}_{ij\cdot}$$

$$d_{ik} = \bar{Y}_{i\cdot k} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot k} + \bar{Y}_{\cdot\cdot\cdot}$$
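As a rough illustration of the least-squares step, the sketch below computes estimates from averages of the log intensities. It assumes a balanced design and one particular set of constraints ($\sum_k \xi_k = 0$, $\sum_k \delta_{ik} = 0$ for each $i$, $\sum_i \delta_{ik} = 0$ for each $k$); these constraints, the function name `ols_estimates`, and the nested-list layout `Y[i][j][k]` are assumptions made for this example.

```python
from statistics import mean


def ols_estimates(Y):
    """Y[i][j][k]: log intensity for treatment i, replicate j, gene k.

    Assumes a balanced design (same replicate count in every treatment).
    Returns (x, a, d): x[k] gene means, a[i][j] normalization terms,
    d[i][k] treatment effects.
    """
    I, J, K = len(Y), len(Y[0]), len(Y[0][0])
    grand = mean(Y[i][j][k] for i in range(I) for j in range(J) for k in range(K))
    gene_mean = [mean(Y[i][j][k] for i in range(I) for j in range(J))
                 for k in range(K)]
    treat_mean = [mean(Y[i][j][k] for j in range(J) for k in range(K))
                  for i in range(I)]
    treat_gene = [[mean(Y[i][j][k] for j in range(J)) for k in range(K)]
                  for i in range(I)]
    # gene mean relative to the grand mean
    x = [gene_mean[k] - grand for k in range(K)]
    # per-array mean log intensity
    a = [[mean(Y[i][j]) for j in range(J)] for i in range(I)]
    # treatment-by-gene interaction
    d = [[treat_gene[i][k] - treat_mean[i] - gene_mean[k] + grand
          for k in range(K)] for i in range(I)]
    return x, a, d
```

On noise-free data built from parameters that satisfy the assumed constraints, the function returns those parameters exactly, which is a convenient sanity check.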

But note: cannot identify between the normalization $\alpha_{ij}$ and the treatment effect $\delta_{ik}$: adding a constant to every $\delta_{ik}$ in treatment group $i$ and subtracting it from every $\alpha_{ij}$ in that group leaves $Y_{ijk}$ unchanged.

Self-consistency:

The weight $w_k(\cdot)$ is small if the $k$th gene is judged to be changed, and close to one if it is judged to be unchanged.

Procedure is iterative.
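A soft-weight version of the iteration can be sketched as follows. The form of the weight function is not given here, so the Gaussian-shaped `weight`, its `scale` parameter, and the reduction to a single offset between two arrays are hypothetical choices for illustration only.

```python
import math


def weight(d, scale=1.0):
    """Hypothetical weight: near 1 for small |d|, near 0 for large |d|."""
    return math.exp(-0.5 * (d / scale) ** 2)


def weighted_offset(diffs, scale=1.0, tol=1e-10, max_iter=200):
    """Find c such that sum_k w(diffs[k] - c) * (diffs[k] - c) = 0.

    diffs[k] is the log-intensity difference of gene k between two arrays.
    Each pass recomputes the weights from the current residuals and then
    updates c as the weighted mean, so changed genes lose influence.
    """
    c = sum(diffs) / len(diffs)  # start from the unweighted mean
    for _ in range(max_iter):
        w = [weight(x - c, scale) for x in diffs]
        c_new = sum(wk * x for wk, x in zip(w, diffs)) / sum(w)
        if abs(c_new - c) < tol:
            return c_new
        c = c_new
    return c
```

With nine unchanged genes and one strongly changed gene, the unweighted mean is pulled toward the outlier, while the iterated estimate settles on the offset shared by the unchanged genes.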

[Figure: log intensity, array 2 versus log intensity, array 1; both axes from -2 to 6]


Failure of Model


Generalized Model

The normalization $\alpha_{ij}(\xi_k)$ and the heteroscedasticity function $\sigma_{ij}(\xi_k)$ are slowly varying functions of the intensity, $\xi$.

Estimate by Local Regression

Local Regression

[Figure: data]

Predict value at x = 50: weight the data points by their distance from x = 50, then fit a linear regression


Predict whole function similarly
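A minimal sketch of local regression: to predict at one point, weight the nearby data and fit a straight line; to predict the whole function, repeat at every point of a grid. The tricube weight, the fixed `bandwidth`, and the function names are assumptions chosen for illustration, not necessarily the settings used for these slides.

```python
def local_linear(xs, ys, x0, bandwidth):
    """Predict y at x0 from a weighted straight-line fit.

    Points within `bandwidth` of x0 get tricube weights; others get zero.
    """
    w = []
    for x in xs:
        u = abs(x - x0) / bandwidth
        w.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
    sw = sum(w)
    if sw == 0:
        raise ValueError("no data within the bandwidth of x0")
    # weighted means of x and y
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    # weighted least-squares slope
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


def local_curve(xs, ys, grid, bandwidth):
    """Predict the whole function by repeating the local fit on a grid."""
    return [local_linear(xs, ys, g, bandwidth) for g in grid]
```

For example, `local_linear(xs, ys, 50, 20)` predicts the value at x = 50 from the points between 30 and 70. A larger bandwidth gives a smoother curve at the cost of more bias where the true function bends.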


Compare to known true function

Simulation-based Validation

1. Reproduce observed bias.

Simulation-based Validation

2. Reproduce observed heteroscedasticity.

Test based on z statistic:

$$z_k = \frac{d_{1k} - d_{2k}}{s_k \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
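The test statistic for gene k is the difference between the two estimated treatment effects divided by its standard error, z = (d1 - d2) / (s * sqrt(1/n1 + 1/n2)). The sketch below computes it along with a two-sided p-value. The standard normal reference distribution, the function names, and the reading of `s` as the estimated standard deviation for the gene and `n1`, `n2` as replicate counts are assumptions of this example.

```python
import math


def z_statistic(d1, d2, s, n1, n2):
    """z for one gene: effect difference over its standard error."""
    return (d1 - d2) / (s * math.sqrt(1.0 / n1 + 1.0 / n2))


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A gene would be called changed when `two_sided_p(z)` falls below the chosen significance level.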

Choice of significance level $\alpha$: expected number of false positives:

$$E(\text{false positives}) = \alpha N$$

But minimum detectable difference increases as $\alpha$ gets smaller

alpha     E(fp)    min diff    min ratio
0.05      250      0.916       2.5
0.01      50       1.09        3
0.001     5        1.29        3.6
0.0001    0.5      1.61        5
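Two columns of the table follow from simple arithmetic, sketched below. The spot count N = 5000 is not stated; it is inferred here from E(fp) = 250 at a significance level of 0.05. Reading "min diff" as a natural-log difference, so that "min ratio" is its exponential, is likewise an assumption that happens to match the listed values.

```python
import math

N_SPOTS = 5000  # assumed: inferred from 250 expected false positives at 0.05


def expected_false_positives(alpha, n_spots=N_SPOTS):
    """Expected number of false positives when every spot is tested at level alpha."""
    return alpha * n_spots


def fold_ratio(log_diff):
    """Convert a natural-log difference into a fold-change ratio."""
    return math.exp(log_diff)
```

The minimum detectable differences themselves depend on the estimated variances and are not reproduced by this sketch.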

Validation of method against simulated data

3. Hypothesis testing: simulated from stated model

[Figure: labels “proportion changed spots”, “-fold change”, “bias”]

“rate false pos.” = mean observed / expected


Simulated data: mis-specified model — multiplicative + additive noise
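One way to generate data of this kind is sketched below: intensities carry a lognormal multiplicative error plus a normal additive error, and setting the additive part to zero recovers the purely multiplicative model. All parameter values and function names are hypothetical, and the simulation actually used for these slides may differ.

```python
import math
import random
from statistics import stdev


def simulate_intensity(conc, rng, c=1.0, sigma_mult=0.2, sigma_add=0.0):
    """One spot intensity: c * conc * (lognormal error) + (additive error)."""
    mult_err = math.exp(rng.gauss(0.0, sigma_mult))
    add_err = rng.gauss(0.0, sigma_add)  # sigma_add = 0 gives the stated model
    return c * conc * mult_err + add_err


def log_intensity_sd(conc, sigma_add, n=2000, seed=0):
    """Standard deviation of log intensity at one concentration."""
    rng = random.Random(seed)
    values = [simulate_intensity(conc, rng, sigma_add=sigma_add) for _ in range(n)]
    # additive noise can push an intensity to zero or below; drop those spots
    logs = [math.log(v) for v in values if v > 0]
    return stdev(logs)
```

With additive noise present, `log_intensity_sd` is larger at low concentration than at high concentration, so the log-scale variance depends on intensity; without it, the spread is the same at every concentration.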

Validation of method against simulated data

4. Hypothesis testing: simulated from “wrong” model: additive + multiplicative noise.

[Figure: labels “proportion changed spots”, “-fold change”, “bias”]

Acknowledgments

Lynn Crosby, North Carolina State University

Kevin Morgan, Strategic Toxicological Sciences, GlaxoWellcome

Santa Fe Institute

www.santafe.edu

postdoctoral fellowships available (apply before the end of the year)

kepler@santafe.edu