Post on 31-Dec-2015
Some views on microarray experimental design
Rainer Breitling
Molecular Plant Science Group & Bioinformatics Research Centre
University of Glasgow, Scotland, UK
Personal Background
• University of Glasgow, Scotland, UK
• Molecular Plant Sciences Group
• Bioinformatics Research Centre
• Functional Genomics Facility
Some common questions in microarray experimental design
• How many arrays will I need?
• Should I pool my samples?
• Which arrays should I choose?
• Which samples should I put together on one array?
Why are microarrays special?
• produce large amounts of data instantaneously
• can look for unexpected effects
• are still quite expensive
Consequence: experiments are almost never repeated, so careful design is necessary before you start
How many replicates?
• as many as possible
Statistics says: The more replicates, the better your estimate of expression (the improvement is asymptotic, so the first few replicates you add have by far the strongest effect)
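The diminishing return from extra replicates can be illustrated with the standard error of a mean, which shrinks with the square root of the number of replicates. A minimal Python sketch, assuming independent replicates with a common standard deviation (the function name `standard_error` and the example value 0.5 are illustrative assumptions, not taken from the slides):

```python
import math


def standard_error(sigma, n_replicates):
    """Standard error of the mean of n independent replicates."""
    if n_replicates < 1:
        raise ValueError("need at least one replicate")
    return sigma / math.sqrt(n_replicates)


# Assumed log-ratio standard deviation of 0.5, purely for illustration.
for n in (1, 2, 3, 5, 10, 20):
    print(n, round(standard_error(0.5, n), 3))
```

Going from 1 to 5 replicates reduces the standard error much more than going from 5 to 20, which is the sense in which the gain levels off.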
How many replicates?
$$ n = \frac{4\,(z_{1-\alpha/2} + z_{1-\beta})^{2}}{(\delta/\sigma)^{2}} $$
• α: significance level (probability of a false positive, FP)
• 1-β: power to detect differences (probability of detecting a true positive, TP)
• σ: standard deviation of the log-ratios
• δ: detectable difference between class mean log-ratios
• z: percentile of the standard normal distribution
• n: required number of arrays (reference design)
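The sample-size calculation can be tried out numerically. The sketch below assumes the usual form of the formula, n = 4 (z_{1-α/2} + z_{1-β})² / (δ/σ)², and rounds up to whole arrays. The function name `required_arrays` and the example parameter values are illustrative assumptions, not values from the slides.

```python
import math
from statistics import NormalDist


def required_arrays(alpha, power, sigma, delta):
    """Arrays needed under n = 4 (z_{1-alpha/2} + z_{1-beta})^2 / (delta/sigma)^2.

    alpha: significance level; power: 1 - beta;
    sigma: standard deviation of the log-ratios;
    delta: detectable difference between class mean log-ratios.
    """
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    n = 4 * (z(1 - alpha / 2) + z(power)) ** 2 / (delta / sigma) ** 2
    return math.ceil(n)


# Hypothetical inputs: a two-fold change (delta = 1 on a log2 scale),
# sigma = 0.5, a strict per-gene alpha and high power.
print(required_arrays(alpha=0.001, power=0.95, sigma=0.5, delta=1.0))
```

Because n scales with (σ/δ)², halving the noise or doubling the detectable difference cuts the required number of arrays by a factor of four.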
How many replicates?
• Five
Experience shows: For most common experiments you get a reasonable list of differentially expressed genes with 5 replicates
How many replicates?
• Three
One to convince yourself, one to convince your boss, one just in case...
How many replicates?
• It depends on
– the quality of the sample
– the magnitude of the expected effect
– the experimental design
– the method of analysis
The quality of the sample
• smaller samples (single cells) are noisier than large samples (tissue homogenates)
• cell cultures are less noisy than patient biopsies
• sample pooling can decrease noise – if individual variation is not of interest
The magnitude of the effect
• Microarrays are very sensitive
• To keep effects small:
– use early time points, gentle stimuli
– never compare dogs and donuts
• if you get a list of 2000 genes that are significantly changed, your experiment failed!
The magnitude of the effect
• some problematic cases:
– stably transfected cell lines (are they still the same cells?)
– knock-out organisms (even the same tissue can be different)
– local changes may be diluted; cell isolation will increase noise
The experimental design
• Three major options:
– reference design (flexible)
– balanced block design (efficient)
– loop design (elegant)
The experimental design
• loop designs can save samples...
• ...but they can cause interpretation nightmares in more complex cases (use them for large studies, if you have a full-time statistician on the team)
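[Figure: loop design, in which samples A, B, C and D are hybridized against each other in a cycle, next to a reference design, in which each of A, B, C and D is hybridized against a common reference R]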
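The difference between the reference and loop layouts is easy to see when the arrays are written out as pairs of samples. This is a minimal sketch, assuming two-colour arrays with one sample per channel; the function names and the tuple representation are illustrative assumptions, and dye assignment is deliberately left out.

```python
def reference_design(samples, reference="R"):
    """One array per sample, each hybridized against the common reference."""
    return [(sample, reference) for sample in samples]


def loop_design(samples):
    """One array per sample, each hybridized against the next one in the cycle."""
    n = len(samples)
    if n < 3:
        raise ValueError("a loop needs at least three samples")
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


samples = ["A", "B", "C", "D"]
print(reference_design(samples))  # every array contains R
print(loop_design(samples))       # every sample appears on two arrays
```

In the loop layout every channel carries a sample of interest, whereas in the reference layout half of the channels are spent on R. The price is that a comparison such as A versus C is only available indirectly, through B or D.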
The method of analysis
• Golub et al. (1999) data set
• 38 leukemia patient bone marrow samples, hybridized individually to Affymetrix microarrays
• Differential expression between two leukemia types was examined, using random subsets of the complete dataset
The method of analysis
[Figure: iterative Group Analysis (iGA). Gene Ontology classes listed for each time point (0h, 9.5h, 11.5h, 13.5h, 15.5h, 18.5h, 20.5h), e.g. 6099 - tricarboxylic acid cycle, 3773 - heat shock protein activity, 6950 - response to stress, 5749 - respiratory chain complex II (sensu Eukarya), 6121 - oxidative phosphorylation, succinate to ubiquinone, 4129 - cytochrome c oxidase activity, 6097 - glyoxylate cycle, 8645 - hexose transport]
[Figure: Graph-based iterative Group Analysis (GiGA). Graph with groups labelled glyoxylate cycle, citrate (TCA) cycle, oxidative phosphorylation (complex V), respiratory chain complex III and respiratory chain complex II]
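The slides name iGA without describing how it scores a functional class. As a rough orientation, the published method ranks all genes by differential expression and asks, for each class, how surprising it is to find its members so close to the top of the list, using a hypergeometric tail probability minimized over the class members. The following is a simplified sketch under that assumption, not the author's implementation; the function names and the toy ranks are hypothetical.

```python
from math import comb


def hypergeom_tail(total, class_size, drawn, hits):
    """P(at least `hits` class members among `drawn` genes picked at random)."""
    upper = min(class_size, drawn)
    favourable = sum(
        comb(class_size, i) * comb(total - class_size, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(total, drawn)


def iga_score(total_genes, member_ranks):
    """Smallest tail probability over the class members (ranks are 1-based)."""
    ranks = sorted(member_ranks)
    if not ranks:
        raise ValueError("the class has no members on the array")
    class_size = len(ranks)
    # At the position of the k-th class member, k members lie within the
    # top `rank` genes; keep the most surprising of these cut-offs.
    return min(
        hypergeom_tail(total_genes, class_size, rank, hits)
        for hits, rank in enumerate(ranks, start=1)
    )


# Toy example: 10 genes in total, one class whose two members rank 1st and 2nd.
print(iga_score(10, [1, 2]))
```

A class whose members cluster at the top of the ranked list gets a small score even if no single gene passes a per-gene significance threshold, which is why this kind of analysis can work with few replicates.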
What is a good replicate?
The experiment your competitor at the other side of the globe would do to see if your results are reproducible
Vary “all” parameters – challenge your results
Prepare new samples, from new cultures, using new buffers and new graduate students
Remember to produce matched controls
What is a “bad” replicate?
• technical replicates (i.e. hybridizing the same sample repeatedly)
• dye-swapping experiments (usually gene-specific dye bias is not a big issue, and dye balancing is more efficient anyway)
• pooled samples, hybridized repeatedly
• the same preparation, only labelled twice
Should samples be pooled?
• most samples are already pooled – they come from multiple cells
• pool to increase amount of mRNA, but only as much as necessary
• prepare independent pools to assess variation
• problems: bias, “contamination”, outliers, information loss...
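The trade-off in pooling can be made concrete with a simple variance-components model. This is a sketch under stated assumptions: each individual contributes equally to the pool, biological and technical variation are independent, and the pool measures the average of its members. The function name `pooled_variance` and the example numbers are illustrative, not from the slides.

```python
def pooled_variance(biological_var, technical_var, pool_size):
    """Variance of one measurement of a pool of `pool_size` individuals.

    Pooling averages out biological variation but leaves technical
    (labelling, hybridization) variation untouched.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return biological_var / pool_size + technical_var


# Hypothetical variances on the log-ratio scale.
for k in (1, 2, 4, 8):
    print(k, pooled_variance(biological_var=1.0, technical_var=0.25, pool_size=k))
```

The measured variance approaches the technical floor as the pool grows, but a single pool gives no estimate of variation at all, which is why independent pools are still needed.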
Which arrays are the best?
• Standard arrays: compare and exchange data easily
• Whole-genome arrays: detect unexpected effects, increase confidence
• Single-color arrays (Affymetrix GeneChip): for more complex comparisons
• Annotated arrays
Further reading
• Dobbin, Shih & Simon (2003) J. Natl. Cancer Inst. 95: 1362.
• Yang & Speed (2002) Nature Rev. Genet. 3: 579.
• Breitling (2004) http://www.brc.dcs.gla.ac.uk/~rb106x/microarray_tips.htm
Contact
Rainer Breitling
Bioinformatics Research Centre
Davidson Building A416
R.Breitling@bio.gla.ac.uk
http://www.brc.dcs.gla.ac.uk/~rb106x