Previous Lecture: Bioimage Informatics Agullo-Pascual E, Reid DA, Keegan S, Sidhu M, Fenyö D,...

Previous Lecture: Bioimage Informatics Agullo-Pascual E, Reid DA, Keegan S, Sidhu M, Fenyö D, Rothenberg E, Delmar M, "Super-resolution fluorescence microscopy of the cardiac connexome reveals plakophilin-2 inside the connexin43 plaque", Cardiovasc Res. 2013

Page 1

Previous Lecture: Bioimage Informatics

Agullo-Pascual E, Reid DA, Keegan S, Sidhu M, Fenyö D, Rothenberg E, Delmar M, "Super-resolution fluorescence microscopy of the cardiac connexome reveals plakophilin-2 inside the connexin43 plaque", Cardiovasc Res. 2013

Page 2

Introduction to Biostatistics and Bioinformatics

Experimental Design

This Lecture

Page 3

Experimental Design

Experimental Design by Christine Ambrosino (www.hawaii.edu/fishlab/Nearside.htm)

Page 4

Experimental Design

Overcoming the threats that chance and bias pose to the validity of conclusions.

Page 5

Experimental Design

[Figure: inputs pass through a process to produce outputs; the process is affected by controllable factors and by uncontrollable factors.]

Page 6

Experimental Design

• Recognition and statement of the problem (e.g., testing a specific hypothesis or open-ended discovery).

• Selecting a response variable.

• Choosing controllable factors and their ranges.

• Listing uncontrollable factors and estimating their effect.

• Choosing the experimental design.

• Performing the experiment.

• Statistical analysis of the data.

• Designing the next experiment based on the results.

Page 7

Exploring the Parameter Space

[Figure: one factor at a time (score vs. Factor 1, score vs. Factor 2, score vs. Factor 3), compared with a 2-factor factorial design (4 experiments) and a 3-factor factorial design (8 experiments).]

k-factor factorial design: k factors require $2^k$ experiments.

For example, 7 factors: 128 experiments; 10 factors: 1,024 experiments.
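To make the $2^k$ count concrete, the Python sketch below enumerates every run of a two-level full factorial design. It assumes each factor is coded -1 (low) and +1 (high); the function name `full_factorial` is illustrative, not from the lecture.

```python
from itertools import product


def full_factorial(k):
    """Return all 2**k runs of a two-level, k-factor factorial design.

    Each run is a tuple of k levels, coded -1 (low) or +1 (high).
    """
    return list(product((-1, +1), repeat=k))


if __name__ == "__main__":
    # The 2-factor design from the figure: 4 experiments.
    for run in full_factorial(2):
        print(run)
    # The number of experiments grows as 2**k.
    for k in (2, 3, 7, 10):
        print(k, "factors:", len(full_factorial(k)), "experiments")
```

Because every combination of levels is run, interactions between factors can be estimated, which a one-factor-at-a-time scan cannot do; the price is the exponential growth in the number of experiments.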

Page 8

Randomization

• Most statistical methods assume that the observations are independently distributed random variables. Randomization usually makes this assumption valid.

• Randomization guards against unknown and uncontrolled factors.

• Randomize with respect to analysis order, location, material, etc.

[Figure: measurements plotted against order of measurement, not randomized (p = 0.19) and randomized (p = 0.32), with no change in sensitivity during measurement.]

Page 9

Randomization

[Figure: measurements plotted against order of measurement. No change in sensitivity during measurement: not randomized p = 0.19, randomized p = 0.32. Change in sensitivity during measurement: not randomized p = $5.7\times10^{-6}$, randomized p = 0.20. Standard deviations shown in the panels: 0.8, 0.8; 0.7, 0.9; 1.8, 1.3.]
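The drift scenario on these slides can be illustrated with a small Python sketch. It assumes a hypothetical linear drift in sensitivity (each successive measurement is offset by a fixed amount) and two groups labelled "A" and "B"; `randomized_run_order` and `drift_bias` are illustrative names. Measuring all of A before all of B turns the drift into an apparent group difference, while a random run order spreads the drift over both groups.

```python
import random
from statistics import fmean


def randomized_run_order(samples, seed=None):
    """Return a shuffled copy of `samples` to use as the measurement order."""
    order = list(samples)
    random.Random(seed).shuffle(order)
    return order


def drift_bias(order, drift_per_run=0.1):
    """Mean drift offset seen by group "A" minus that seen by group "B".

    Hypothetical drift model: the i-th measurement in `order` is shifted
    by i * drift_per_run, regardless of which sample is measured.
    """
    offsets = {"A": [], "B": []}
    for i, group in enumerate(order):
        offsets[group].append(i * drift_per_run)
    return fmean(offsets["A"]) - fmean(offsets["B"])


if __name__ == "__main__":
    blocked = ["A"] * 10 + ["B"] * 10      # all of group A first, then all of B
    shuffled = randomized_run_order(blocked, seed=1)
    print("run order:", "".join(shuffled))
    print("not randomized, drift bias:", round(drift_bias(blocked), 3))
    print("randomized,     drift bias:", round(drift_bias(shuffled), 3))
```

With randomization the drift no longer shifts one group systematically relative to the other, but it does not disappear: it shows up as extra within-group variability instead.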

Page 10

Blocking

Blocking is used to control for known and controllable factors.

Randomized Complete Block Design: minimizing the effect of variability associated with, e.g., location, operator, plant, batch, or time.

Instrument 1   Instrument 2   Instrument 3   Instrument 4
Sample 3       Sample 3       Sample 2       Sample 1
Sample 1       Sample 4       Sample 1       Sample 4
Sample 4       Sample 2       Sample 3       Sample 2
Sample 2       Sample 1       Sample 4       Sample 3

The Latin Square Design: minimizing the effect of variability associated with two independent factors. The rows and columns represent two restrictions on randomization.

              Instrument 1   Instrument 2   Instrument 3   Instrument 4
Operator 1    Sample 1       Sample 2       Sample 3       Sample 4
Operator 2    Sample 2       Sample 3       Sample 4       Sample 1
Operator 3    Sample 4       Sample 1       Sample 2       Sample 3
Operator 4    Sample 3       Sample 4       Sample 1       Sample 2
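Both blocking designs can be generated programmatically. The Python sketch below is one simple way to do it, under stated assumptions: in the randomized complete block design each block (here, an instrument) analyzes every sample once, in an independently shuffled order, and the Latin square is built as a cyclic square whose rows and columns are then shuffled. The function names are hypothetical.

```python
import random


def randomized_complete_block(samples, blocks, seed=None):
    """Every block (e.g. instrument) analyzes every sample exactly once,
    in its own random order."""
    rng = random.Random(seed)
    design = {}
    for block in blocks:
        order = list(samples)
        rng.shuffle(order)
        design[block] = order
    return design


def latin_square(samples, seed=None):
    """Each sample appears once in every row (e.g. operator) and once in
    every column (e.g. instrument).

    Built as a cyclic square, then randomized by shuffling whole rows and
    whole columns, which preserves the Latin property.
    """
    n = len(samples)
    rng = random.Random(seed)
    square = [[samples[(r + c) % n] for c in range(n)] for r in range(n)]
    rng.shuffle(square)                  # permute rows
    cols = list(range(n))
    rng.shuffle(cols)                    # permute columns
    return [[row[c] for c in cols] for row in square]


if __name__ == "__main__":
    samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4"]
    instruments = ["Instrument 1", "Instrument 2", "Instrument 3", "Instrument 4"]
    for instrument, order in randomized_complete_block(samples, instruments, seed=0).items():
        print(instrument, order)
    for operator, row in enumerate(latin_square(samples, seed=0), start=1):
        print(f"Operator {operator}:", row)
```

Shuffling the rows and columns of a cyclic square keeps every sample once per row and once per column, but it does not sample uniformly from all possible Latin squares; that is a simplification of this sketch.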

Page 11

Replication

Replication is needed to estimate the variance in the measurements.

• Technical replicates (repeat measurements).

• Process replicates.

• Biological replicates.

Page 12

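Uncertainty in Determining the Mean

[Figure: spread of the estimated mean for four distribution shapes (complex, normal, skewed, long tails) at increasing sample sizes: n = 3, 10, and 100 for three of them, and n = 10, 100, and 1000 for the fourth.]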
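The narrowing of the distribution of the mean with increasing n can be reproduced by simulation. In this Python sketch the "normal" shape is a standard normal and the "skewed" shape is a lognormal distribution; both choices, the number of repeats, and the helper names are assumptions, since the slide does not give the underlying distributions.

```python
import random
import statistics


def spread_of_sample_means(draw, n, repeats=2000, seed=0):
    """Standard deviation of the sample mean across `repeats` simulated
    experiments, each averaging `n` values produced by `draw(rng)`."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repeats)]
    return statistics.stdev(means)


# Hypothetical stand-ins for two of the shapes in the figure.
DISTRIBUTIONS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "skewed": lambda rng: rng.lognormvariate(0.0, 1.0),
}

if __name__ == "__main__":
    for name, draw in DISTRIBUTIONS.items():
        for n in (3, 10, 100):
            print(f"{name:7s} n={n:4d}  sd of mean = {spread_of_sample_means(draw, n):.3f}")
```

For the standard normal, the printed spread should track $1/\sqrt{n}$ (about 0.58, 0.32, and 0.10); the skewed distribution narrows at the same rate relative to its own standard deviation, but its sample means stay visibly asymmetric at small n.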

Page 13

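Standard Error of the Mean

Sample: $x_1, x_2, \ldots, x_n$

Mean: $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Variance: $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

Standard Error of the Mean: $$\mathrm{s.e.m.} = \frac{s}{\sqrt{n}}$$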
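A minimal Python sketch of the standard error of the mean, s.e.m. = s/√n, assuming s is the sample standard deviation computed with the n − 1 denominator (which is what `statistics.stdev` uses). The example data are made up.

```python
import math
import statistics


def standard_error_of_mean(xs):
    """s.e.m. = s / sqrt(n), with s the sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(xs) / math.sqrt(n)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]                 # hypothetical measurements
    print("mean    :", statistics.fmean(data))
    print("variance:", statistics.variance(data))   # n - 1 denominator
    print("s.e.m.  :", standard_error_of_mean(data))
```

For these data the mean is 5, the sample variance is 32/7 ≈ 4.57, and the s.e.m. is about 0.76. Quadrupling the number of observations would roughly halve the s.e.m.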

Page 14

Analytical Measurements

[Figure: measured concentration vs. theoretical concentration.]

Page 15

A Few Characteristics of Analytical Measurements

Accuracy: Closeness of agreement between a test result and an accepted reference value.

Precision: Closeness of agreement between independent test results.

Robustness: Test precision given small, deliberate changes in test conditions (e.g., preanalytic delays, variations in storage temperature).

Lower limit of detection: The lowest amount of analyte that is statistically distinguishable from background or a negative control.

Limit of quantification: The lowest and highest concentrations of analyte that can be quantitatively determined with suitable precision and accuracy.

Linearity: The ability of the test to return values that are directly proportional to the concentration of the analyte in the sample.
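The lecture does not give formulas for these quantities, so the Python sketch below uses two common conventions as assumptions: a detection limit set at the mean of blank (negative-control) signals plus three standard deviations, and linearity summarized by a least-squares line of measured versus theoretical concentration with its coefficient of determination r². Function names and example values are hypothetical.

```python
import statistics


def limit_of_detection(blank_signals, k=3.0):
    """Signal level above background: mean(blanks) + k * sd(blanks).

    k = 3 is a widely used convention and an assumption here; the lecture
    does not prescribe a specific rule.
    """
    return statistics.fmean(blank_signals) + k * statistics.stdev(blank_signals)


def linearity(theoretical, measured):
    """Least-squares fit measured = slope * theoretical + intercept.

    Returns (slope, intercept, r_squared). An intercept near 0 and an
    r_squared near 1 are consistent with a response directly proportional
    to concentration.
    """
    mx = statistics.fmean(theoretical)
    my = statistics.fmean(measured)
    sxx = sum((x - mx) ** 2 for x in theoretical)
    syy = sum((y - my) ** 2 for y in measured)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theoretical, measured))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical blank readings and calibration series.
    blanks = [0.9, 1.1, 1.0, 1.2, 0.8]
    print("LOD signal:", round(limit_of_detection(blanks), 3))
    print(linearity([1, 2, 5, 10, 20], [1.1, 2.0, 4.8, 10.3, 19.6]))
```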

Page 16

Precision and Accuracy

[Figure: two panels of measured concentration vs. theoretical concentration.]

Page 17

Treatment  Gradient Length  Date              Laboratory  Patient
Before     3h               2010/07/02 13:08  1           6
Before     3h               2010/07/02 19:15  1           11
Before     3h               2010/07/04 18:19  1           4
Before     3h               2010/07/05 00:26  1           10
Before     3h               2010/07/11 05:29  1           16
Before     3h               2010/07/11 08:33  1           17
Before     3h               2010/07/11 14:39  1           19
Before     3h               2010/07/11 20:46  1           29
Before     3h               2010/07/19 00:12  1           20
Before     3h               2010/07/19 09:22  1           53
Before     3h               2010/07/19 12:26  1           58
Before     3h               2010/07/19 15:29  1           61
Before     3h               2010/07/25 09:17  1           35
Before     3h               2010/07/25 12:20  1           39
After      1h               2011/02/20 10:49  1           4
After      1h               2011/02/20 13:57  1           6
After      1h               2011/02/20 17:05  1           11
After      1h               2011/03/04 14:07  2           15
After      1h               2011/03/04 15:47  2           16
After      1h               2011/03/04 17:06  2           17
After      1h               2011/03/04 18:25  2           19
After      1h               2011/03/04 19:44  2           20
After      1h               2011/03/04 21:03  2           29
After      1h               2011/03/05 02:19  2           35
After      1h               2011/03/05 03:39  2           39
After      1h               2011/03/05 04:57  2           53
After      1h               2011/03/07 00:35  2           65
After      1h               2011/03/07 02:51  2           58

Before     3h               2011/04/16 20:43  1           11
After      3h               2011/04/21 04:54  1           10
After      3h               2011/04/21 11:00  1           15
After      1h               2011/04/22 08:20  1           17
After      1h               2011/04/23 09:03  1           65

Before     3h               2011/04/23 21:20  1           20

An example of bad experimental design: the treatment groups are confounded with gradient length, date, and laboratory (every "Before" run used a 3h gradient in laboratory 1, while most "After" runs used a 1h gradient and many were run in laboratory 2).
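One quick way to expose the problem in this table is to cross-tabulate the treatment against each of the other design factors. The Python sketch below re-encodes the table's runs as (treatment, gradient length, laboratory) tuples, dropping dates and patient IDs; `cross_tab` is an illustrative helper. An empty cell, such as no "Before" run with a 1h gradient or in laboratory 2, is a sign that the treatment effect is confounded with that factor.

```python
from collections import Counter

# (treatment, gradient length, laboratory) for each run in the table,
# in table order; dates and patient IDs are dropped.
RUNS = (
    [("Before", "3h", 1)] * 14
    + [("After", "1h", 1)] * 3
    + [("After", "1h", 2)] * 11
    + [("Before", "3h", 1), ("After", "3h", 1), ("After", "3h", 1),
       ("After", "1h", 1), ("After", "1h", 1), ("Before", "3h", 1)]
)


def cross_tab(runs, factor_index):
    """Count runs for every (treatment, factor level) combination."""
    return Counter((run[0], run[factor_index]) for run in runs)


if __name__ == "__main__":
    for name, idx in (("gradient length", 1), ("laboratory", 2)):
        print(name, dict(cross_tab(RUNS, idx)))
```

For this table the gradient-length cross-tab gives 16 "Before" runs at 3h and none at 1h, against 16 "After" runs at 1h and 2 at 3h; all 16 "Before" runs are in laboratory 1.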

Page 18

A proteomics example – no replicates

Page 19

A proteomics example – three replicates

[Figure: log2 standard deviation vs. log2 average spectrum count, and log2 spectrum count vs. log2 spectrum count ratio, comparing no replicates with three replicates.]

Page 20

Testing multiple hypotheses

• Is the concentration of calcium/calmodulin-dependent protein kinase type II different between the two samples?

• Which protein concentrations are different between the two samples?

p = $2\times10^{-6}$ (uncorrected)

The p-value needs to be corrected to take into account that we perform many tests.

Bonferroni correction: multiply the p-value by the number of tests performed (n): $p_{\mathrm{corr}} = p_{\mathrm{uncorr}} \times n$

In this case 3685 proteins are identified, so the Bonferroni-corrected p-value for calcium/calmodulin-dependent protein kinase type II is $p_{\mathrm{corr}} = 2\times10^{-6} \times 3685 \approx 0.007$.
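The Bonferroni correction is a one-liner. This Python sketch reproduces the calcium/calmodulin-dependent protein kinase type II example; capping the result at 1 is a common convention added here (a probability cannot exceed 1), not part of the formula on the slide.

```python
def bonferroni(p_uncorrected, n_tests):
    """Bonferroni-corrected p-value, p_corr = p_uncorr * n, capped at 1.0."""
    return min(1.0, p_uncorrected * n_tests)


if __name__ == "__main__":
    # Calcium/calmodulin-dependent protein kinase type II, 3685 proteins tested.
    print(f"{bonferroni(2e-6, 3685):.4f}")   # 0.0074
```

The exact product is 0.00737, which the slide rounds to 0.007.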

Page 21

Testing multiple hypotheses

The p-value distribution is uniform when testing differences between samples from the same distribution.

[Figure: histograms of p-values (0 to 1) vs. number of tests, for comparisons of two samples of size 10 drawn from the same normal distribution; 100, 1,000, and 10,000 tests.]
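The uniform distribution of p-values under the null hypothesis is easy to check by simulation. To keep the sketch standard-library only, the Python code below swaps in a two-sample z-test with known σ = 1 instead of the t-test one would normally use; that substitution, the seed, and the helper names are assumptions. With every null hypothesis true, about 5% of the p-values fall below 0.05 and each histogram bin of width 0.1 receives roughly a tenth of the tests.

```python
import math
import random
import statistics


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming the population
    standard deviation `sigma` is known (a z-test, not a t-test)."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))      # = 2 * (1 - Phi(|z|))


def null_p_values(n_tests, sample_size=10, seed=0):
    """p-values for n_tests comparisons where both samples come from the
    same normal distribution, so every null hypothesis is true."""
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(sample_size)]

    return [z_test_p_value(draw(), draw()) for _ in range(n_tests)]


if __name__ == "__main__":
    for n_tests in (100, 1_000, 10_000):
        counts = [0] * 10                          # histogram bins of width 0.1
        for p in null_p_values(n_tests):
            counts[min(int(p * 10), 9)] += 1
        print(n_tests, counts)
```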

Page 22

Testing multiple hypotheses

The p-value distribution is uniform when testing differences between samples from the same distribution.

Normal distribution, sample size = 10; 30 tests from a distribution with a different mean ($\mu_1 - \mu_2 \gg \sigma$).

[Figure: histograms of p-values (0 to 1) vs. number of tests for 100, 1,000, and 10,000 tests.]

Page 23

Testing multiple hypotheses

Controlling the False Discovery Rate (FDR)

Normal distribution, sample size = 10; 30 tests from a distribution with a different mean ($\mu_1 - \mu_2 \gg \sigma$).

[Figure: false discovery rate (0 to 1) vs. p-value for 100, 1,000, and 10,000 tests.]
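One standard way to control the FDR when only the p-values are available is the Benjamini-Hochberg step-up procedure. The slide does not say which method produced its curves, so treat this Python sketch as an illustration of FDR control in general rather than the lecture's method; the function name is illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure, which controls the expected FDR at level q for independent tests."""
    m = len(p_values)
    ranked = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    k_max = 0
    for rank, i in enumerate(ranked, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank                                   # largest rank that passes
    return set(ranked[:k_max])


if __name__ == "__main__":
    p = [0.02, 0.025, 0.04]
    print(sorted(benjamini_hochberg(p)))   # [0, 1, 2]
```

In the example the smallest p-value (0.02) fails its own threshold (0.05/3), yet all three hypotheses are rejected because the largest passing rank is 3; a Bonferroni correction at the same level would reject none of them (0.02 × 3 = 0.06).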

Page 24

Testing multiple hypotheses

False Discovery Rate (FDR) and False Negative Rate (FNR)

Normal distribution, sample size = 10; 100 tests, 30 of them from a distribution with a different mean.

[Figure: false discovery rate and false negative rate (0 to 1) vs. p-value for three effect sizes: $\mu_1 - \mu_2 = 2\sigma$, $\mu_1 - \mu_2 = \sigma$, and $\mu_1 - \mu_2 = \sigma/2$.]
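When the truth is known, as in these simulations, the false discovery rate and the false negative rate at any p-value cutoff can be counted directly. The Python sketch below assumes FNR is defined as the fraction of truly different tests that are not called significant (false negatives over all true effects); the toy p-values, labels, and function name are hypothetical.

```python
def error_rates(p_values, is_null, threshold):
    """Empirical (FDR, FNR) at a p-value threshold when the truth is known.

    FDR = false discoveries / all discoveries
    FNR = missed true effects / all true effects
    """
    called = [p <= threshold for p in p_values]
    n_called = sum(called)
    false_disc = sum(c and null for c, null in zip(called, is_null))
    n_effects = sum(not null for null in is_null)
    missed = sum((not c) and (not null) for c, null in zip(called, is_null))
    fdr = false_disc / n_called if n_called else 0.0
    fnr = missed / n_effects if n_effects else 0.0
    return fdr, fnr


if __name__ == "__main__":
    p_values = [0.001, 0.01, 0.02, 0.3, 0.04]
    is_null = [False, False, True, True, False]   # True = no real difference
    for threshold in (0.015, 0.05):
        print(threshold, error_rates(p_values, is_null, threshold))
```

Raising the cutoff can only lower (or keep) the FNR while it tends to raise the FDR; the smaller the difference between the means, the harder it is to keep both low at once.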

Page 25

Sampling – Gaussian Peak

[Figure: intensity vs. retention time.]

Page 26

Sampling – Gaussian Peak

[Figure: two panels of a Gaussian peak sampled with 3 points (x axis 0.8 to 1); acquisition time = 0.05 s; both panels annotated "5%".]

Page 27

Sampling – Gaussian Peak

[Figure: thresholds (90%) vs. number of points (1 to 10).]

Page 28

Definition of a molecular signature

FDA calls them “in vitro diagnostic multivariate assays”

A molecular signature is a computational or mathematical model that links high-dimensional molecular information to a phenotype or another response variable of interest.

Page 29

Uses of molecular signatures

1. Models of disease phenotype/clinical outcome
   • Diagnosis
   • Prognosis, long-term disease management
   • Personalized treatment (drug selection, titration)

2. Biomarkers for diagnosis or outcome prediction
   • Make the above tasks resource-efficient and easy to use in clinical practice

3. Discovery of structure and mechanisms (regulatory/interaction networks, pathways, sub-types)
   • Leads for potential new drug candidates

Page 30

Example of a molecular signature

Page 31

Oncotype DX Breast Cancer Assay

• Developed by Genomic Health (www.genomichealth.com)

• 21-gene signature to predict whether a woman with localized, ER+ breast cancer is at risk of relapse

• Independently validated in thousands of patients

• More than 100,000 tests performed so far

• The test costs $4,175

• Not FDA approved, but covered by most insurers, including Medicare

• Sales reached $170M in 2010 and, growing at a compound annual rate, are projected to hit $300M by 2015.

Page 32

Improved Survival and Cost Savings

In a 2005 economic analysis of recurrence in LN-, ER+ patients receiving tamoxifen, Hornberger et al. performed a cost-utility analysis using a decision analytic model. Using this model, the Recurrence Score result was predicted, on average, to increase quality-adjusted survival by 16.3 years and to reduce overall costs by $155,128.

In a 2-million-member plan, approximately 773 women are eligible for the test. If half of them receive the test, then, given the high and increasing cost of adjuvant chemotherapy, supportive care, and management of adverse events, use of the Oncotype DX assay is estimated to save approximately $1,930 per woman tested (given an aggregate 34% reduction in chemotherapy use).
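As a rough, illustrative check of scale (assuming, as stated, that half of the 773 eligible women are tested and that the $1,930 saving applies to each woman tested), the plan-wide saving would be about:

$$\frac{773}{2} \times \$1{,}930 = 386.5 \times \$1{,}930 \approx \$746{,}000$$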

Page 33

EF Petricoin III, AM Ardekani, BA Hitt, PJ Levine, VA Fusaro, SM Steinberg, GB Mills, C Simone, DA Fishman, EC Kohn, LA Liotta, "Use of proteomic patterns in serum to identify ovarian cancer", Lancet 359 (2002) 572–77

Page 34

Check E., Proteomics and cancer: running before we can walk? Nature. 2004 Jun 3;429(6991):496-7.

Page 35

Example: OvaCheck

• Developed by Correlogic (www.correlogic.com)

• Blood test for the early detection of epithelial ovarian cancer

• Failed to obtain FDA approval

• Looks for subtle changes in patterns among the tens of thousands of proteins, protein fragments, and metabolites in the blood

• Signature developed by a genetic algorithm

• Significant artifacts in data collection and analysis called the validity of the signature into question:
   - Results are not reproducible
   - Data were collected differently for different groups of patients

http://www.nature.com/nature/journal/v429/n6991/full/429496a.html

Page 36

Main ingredients for developing a molecular signature

Page 37

Baseline Characteristics

DF Ransohoff, "Bias as a threat to the validity of cancer molecular-marker research", Nat Rev Cancer 5 (2005) 142-9.

Page 38

How to Address Bias

DF Ransohoff, "Bias as a threat to the validity of cancer molecular-marker research", Nat Rev Cancer 5 (2005) 142-9.

Page 39

Experimental Design - Summary

• Chance and bias are threats to the conclusions drawn from experiments

• Controllable and uncontrollable factors

• Randomization to guard against unknown and uncontrolled factors

• Replication (technical, process, and biological replicates) is used to estimate error in measurement and yields a more precise estimate.

• Blocking to control for known and controllable factors

• Multiple testing

• Molecular markers

Page 40

Experimental Design - Summary

• Use your domain knowledge: using a designed experiment is not a substitute for thinking about the problem.

• Keep the design and analysis as simple as possible.

• Recognize the difference between practical and statistical significance.

• Design iterative experiments.

Page 41

Next Lecture: Machine Learning