Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going...
-
Upload
brook-atkinson -
Category
Documents
-
view
217 -
download
0
Transcript of Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going...
![Page 1: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/1.jpg)
Gene expressionGene expression
Statistics 246, Week 3, 2002
![Page 2: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/2.jpg)
Thesis:Thesis: the analysis of gene the analysis of gene expression data is going to be big expression data is going to be big
in 21st century statisticsin 21st century statistics
Many different technologies, including
High-density nylon membrane arrays
Serial analysis of gene expression (SAGE)
Short oligonucleotide arrays (Affymetrix)
Long oligo arrays (Agilent)
Fibre optic arrays (Illumina)
cDNA arrays (Brown/Botstein)*
![Page 3: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/3.jpg)
1995 1996 1997 1998 1999 2000 2001
0
100
200
300
400
500
600
(projected)
Year
Num
ber
of
papers
Total microarray articles indexed in Medline
![Page 4: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/4.jpg)
Common themes themes
• Parallel approach to collection of very large amounts of data (by biological standards)
• Sophisticated instrumentation, requires some understanding
• Systematic features of the data are at least as important as the random ones
• Often more like industrial process than single investigator lab research
• Integration of many data types: clinical, genetic, molecular…..databases
![Page 5: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/5.jpg)
Biological backgroundBiological background
G T A A T C C T C | | | | | | | | | C A T T A G G A G
DNA
G U A A U C C
RNA polymerase
mRNA
Transcription
![Page 6: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/6.jpg)
Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell.
Measuring protein might be better, but is currently harder.
![Page 7: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/7.jpg)
Reverse transcriptionReverse transcriptionClone cDNA strands, complementary to the mRNA
G U A A U C C U C
Reverse transcriptase
mRNA
cDNA
C A T T A G G A G C A T T A G G A G C A T T A G G A G C A T T A G G A G
T T A G G A G
C A T T A G G A G C A T T A G G A G C A T T A G G A G
C A T T A G G A G
C A T T A G G A G
![Page 8: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/8.jpg)
cDNA microarray experimentscDNA microarray experiments
mRNA levels compared in many different contexts
Different tissues, same organism (brain v. liver) Same tissue, same organism (ttt v. ctl, tumor v. non-tumor) Same tissue, different organisms (wt v. ko, tg, or mutant)
Time course experiments (effect of ttt, development)
Other special designs (e.g. to detect spatial patterns).
![Page 9: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/9.jpg)
cDNA microarrayscDNA microarrays
cDNA clones
![Page 10: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/10.jpg)
cDNA microarrayscDNA microarraysCompare the genetic expression in two samples of cells
cDNA from one gene on each spot
SAMPLES
cDNA labelled red/green
e.g. treatment / control
normal / tumor tissue
![Page 11: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/11.jpg)
HYBRIDIZE
Add equal amounts of labelled cDNA samples to microarray.
SCAN
Laser Detector
![Page 12: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/12.jpg)
Biological questionDifferentially expressed genesSample class prediction etc.
Testing
Biological verification and interpretation
Microarray experiment
Estimation
Experimental design
Image analysis
Normalization
Clustering Discrimination
R, G
16-bit TIFF files
(Rfg, Rbg), (Gfg, Gbg)
![Page 13: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/13.jpg)
Some statistical questionsSome statistical questions
Image analysis: addressing, segmenting, quantifying Normalisation: within and between slides
Quality: of images, of spots, of (log) ratios
Which genes are (relatively) up/down regulated?
Assigning p-values to tests/confidence to results.
![Page 14: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/14.jpg)
Some statistical questions, ctdSome statistical questions, ctd
Planning of experiments: design, sample size
Discrimination and allocation of samples
Clustering, classification: of samples, of genes
Selection of genes relevant to any given analysis
Analysis of time course, factorial and other special experiments…..…...& much more.
![Page 15: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/15.jpg)
Some bioinformatic questionsSome bioinformatic questions
Connecting spots to databases, e.g. to sequence, structure, and pathway databases
Discovering short sequences regulating sets of genes: direct and inverse methods
Relating expression profiles to structure and function, e.g. protein localisation
Identifying novel biochemical or signalling pathways, ………..and much more.
![Page 16: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/16.jpg)
Part of the image of one channel false-coloured on a white (v. high) red (high) through yellow and green (medium) to blue (low) and black scale
![Page 17: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/17.jpg)
Does one size fit all?
![Page 18: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/18.jpg)
Segmentation: limitation of the Segmentation: limitation of the fixed circle methodfixed circle method
SRG Fixed Circle
Inside the boundary is spot (foreground), outside is not.
![Page 19: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/19.jpg)
Some local backgroundsSome local backgrounds
We use something different again: a smaller, less variable value.
Single channelgrey scale
![Page 20: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/20.jpg)
Quantification of expressionQuantification of expression
For each spot on the slide we calculate
Red intensity = Rfg - Rbg
fg = foreground, bg = background, and
Green intensity = Gfg - Gbg
and combine them in the log (base 2) ratio
Log2( Red intensity / Green intensity)
![Page 21: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/21.jpg)
Gene Expression DataGene Expression Data On p genes for n slides: p is O(10,000), n is O(10-100), but growing,
Genes
Slides
Gene expression level of gene 5 in slide 4
= Log2( Red intensity / Green intensity)
slide 1 slide 2 slide 3 slide 4 slide 5 …
1 0.46 0.30 0.80 1.51 0.90 ...2 -0.10 0.49 0.24 0.06 0.46 ...3 0.15 0.74 0.04 0.10 0.20 ...4 -0.45 -1.03 -0.79 -0.56 -0.32 ...5 -0.06 1.06 1.35 1.09 -1.09 ...
These values are conventionally displayed on a red (>0) yellow (0) green (<0) scale.
![Page 22: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/22.jpg)
![Page 23: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/23.jpg)
The red/green ratios can be spatially biasedThe red/green ratios can be spatially biased
• .Top 2.5%of ratios red, bottom 2.5% of ratios green
![Page 24: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/24.jpg)
The red/green ratios can be intensity-biased
M = log2R/G
= log2R - log2G
= (log2R + log2G )/2
Values should scatter about zero.
![Page 25: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/25.jpg)
Yellow: GAPDH, tubulin Light blue: MSP pool / titration
Orange: Schadt-Wong rank invariant set Red line: lowess smooth
Normalization: how we “fix” the previous problem
The curved line becomes the new zero line
![Page 26: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/26.jpg)
Normalizing: before2
0-2
-4
6 8 10 12 14 16
M
![Page 27: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/27.jpg)
Normalizing: after2
0-2
-4
M n
orm
alis
ed
6 8 10 12 14 16
![Page 28: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/28.jpg)
Olfactory Epithelium
VomeroNasal Organ
Main (Auxiliary)Olfactory Bulb
From Buck (2000)
From a study of the mouse olfactory system
![Page 29: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/29.jpg)
Axonal connectivity between the nose and the mouse olfactory bulb
>2M, ~1,800 types
Two principles: “zone-to-zone projection”, and “glomerular convergence”
Neocortex
![Page 30: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/30.jpg)
Of interest: the hardwiring of the Of interest: the hardwiring of the vertebrate olfactory systemvertebrate olfactory system
• Expression of a specific odorant receptor gene by an olfactory neuron.
• Targeting and convergence of like axons to specific glomeruli in the olfactory bulb.
![Page 31: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/31.jpg)
The biological question in this caseThe biological question in this case
Are there genes with spatially restricted expression patterns within
the olfactory bulb?
![Page 32: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/32.jpg)
![Page 33: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/33.jpg)
Layout of the cDNA MicroarraysLayout of the cDNA Microarrays
• Sequence verified mouse cDNAs• 19,200 spots in two print groups of 9,600 each
– 4 x 4 grid, each with 25 x24 spots– Controls on the first 2 rows of each grid.
77
pg1 pg2
![Page 34: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/34.jpg)
Design: How We Sliced Up the BulbDesign: How We Sliced Up the Bulb
A
P D
V
M
L
![Page 35: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/35.jpg)
Design: Two Ways to Do the Design: Two Ways to Do the ComparisonsComparisons
Goal: 3-D representation of gene expression
P
D
MA
V
L
R
Compare all samples to a common reference sample (e.g., whole bulb)
P
D
MA
V
L
Multiple direct comparisons between different samples (no common reference)
![Page 36: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/36.jpg)
An Important Aspect of Our DesignAn Important Aspect of Our Design
Different ways of estimating the same contrast:
e.g. A compared to P
Direct = A-P
Indirect = A-M + (M-P) or
A-D + (D-P) or
-(L-A) - (P-L)
How do we combine these?
LL
PPVV
DD
MM
AA
![Page 37: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/37.jpg)
Analysis using a linear model
Define a matrix X so that E(M)=X
Use least squares estimates for A-L, P-L, D-L, V-L, M-LIn practice, we use robust regression.
Estimates for other estimable contrasts follow in the usual way.
€
E
m1
m2
M
mn
⎛
⎝
⎜ ⎜ ⎜
⎞
⎠
⎟ ⎟ ⎟ =
0 0 0 −1 1
−1 0 0 0 0
M O M
−1 1 0 0 0
⎛
⎝
⎜ ⎜ ⎜
⎞
⎠
⎟ ⎟ ⎟ •
A−L
P −L
D−L
V −L
M−L
⎛
⎝
⎜ ⎜ ⎜ ⎜
⎞
⎠
⎟ ⎟ ⎟ ⎟
ˆ = X' X( )−1X'M
![Page 38: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/38.jpg)
The Olfactory Bulb ExperimentsThe Olfactory Bulb Experiments completed so farcompleted so far
![Page 39: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/39.jpg)
Contrasts & PatternsContrasts & Patterns Because of the connectivity of our experiment, we can estimate
all 15 different pairwise comparisons directly and/or indirectly.
For every gene we thus have a pattern based on the 15 pairwise comparisons.
Gene #15,228
![Page 40: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/40.jpg)
Contrasts & patterns:another wayContrasts & patterns:another way Instead of estimating pairwise comparisons between each of the six
effects, we can come closer to estimating the effects themselves by doing so subject to the standard zero sum constraint (6 parameters, 5 d.f.).
What we estimate for A, say, subject to this constraint, is in reality an estimate of
A - 1/6(A + P + D + V + M + L).
This set of parameter estimates gives results similar to, but better than, the ones we would have obtained had we carried out the experiments with whole-bulb reference tissue.
In effect we have created the whole-bulb reference in silico.
![Page 41: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/41.jpg)
Alternative pattern representationAlternative pattern representation
Gene #15,228 once again.
![Page 42: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/42.jpg)
Reconstruction of the Bulb as a Cube:Reconstruction of the Bulb as a Cube:Expression of Gene # 15,228Expression of Gene # 15,228
ExpressionLevel
High
Low
![Page 43: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/43.jpg)
Patterns, More Globally...Patterns, More Globally...
1. Find the genes whose expression fits specific, predefined patterns.
2. Perform cluster analysis - see what expression patterns emerge.
Can we identify genes with interesting patterns of expression across the bulb?
Two approaches:
![Page 44: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/44.jpg)
Clustering procedureClustering procedure
Start with a sets of genes exhibiting some minimal level of differential expression across the bulb; here ~650 were chosen from all 15 contrasts.
Carry out hierarchical clustering, building a dendrogram: Mahalanobis distance and Ward agglomeration (minimum variance) were used.
Now consider all clusters of 2 or more genes in the tree. Singles are added separately.
Measure the heterogeneity h of a cluster by calculating the 15 SDs
across the cluster of each of the pairwise effects, and taking the largest.
Choose a score s (see plots) and take all maximal disjoint clusters with
h < s. Here we used s = 0.46 and obtained 16 clusters.
![Page 45: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/45.jpg)
Plots guiding choice of clusters of genes
Cluster heterogeneity h (max of 15 SDs)
Number ofclusters(patterns)
Number of genes
![Page 46: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/46.jpg)
Red :genes chosenBlue:controls
15 p/w effects
PA DA VA
LA DP VPLA MP
MA
LP VD MD
LA LV LMMV
LD
![Page 47: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/47.jpg)
![Page 48: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/48.jpg)
The 16 groups systematically arranged (6 point representation)
![Page 49: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/49.jpg)
![Page 50: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/50.jpg)
Validation of Gene # 15,228 Expression Validation of Gene # 15,228 Expression Pattern by RNA Pattern by RNA In SituIn Situ Hybridization Hybridization
gluR
CTX
MOB
AOB
#15,228
CTXAOB
MOB
![Page 51: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/51.jpg)
Gene 15,228: another in situ view
![Page 52: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/52.jpg)
384(group 3)
D
V
L M
![Page 53: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/53.jpg)
3-dimension reconstruction from in-situ data
15,228
5,291
8,496
384
![Page 54: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/54.jpg)
Are the pattens we found real?Are the pattens we found real?
Here’s how we attempted to show that the answer is a qualified yes.
Each cluster average (pattern) has a ‘strength’ we can measure by its root-mean-square (RMS). The n=16 clusters we found have an average
RMS of av= 0.3. Both n and av being strongly determined by our heterogeneity cut-off score of s=0.46.
Now consider randomizing the labels (e.g. P-A) on our hybridizations and repeating the entire analysis, keeping the cut-off score at 0.46. We typically get fewer, “weaker” patterns, with less contrast in the red-green patchwork. One such is on the next page.
500 independent random relabellings had a mean av value of 0.18, an SD of 0.07 and a max av value of 0.26, cf. 0.3 in our data. Our clusters are definitely ‘non-random’ in some sense.
![Page 55: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/55.jpg)
Random
Real
![Page 56: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/56.jpg)
ProblemProblem
We later tried all this with a different set of data, one which made use of reference mRNA had generally lower S/N, and where the inveestigator sought fewer interesting patterns.
We found that the patterns the previous method discovered were similarly quite distinct in av values from those in randomly labelled hybs, but this time, the av values were ‘significantly’ lower than random. It all depends where you are on the curve.
![Page 57: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/57.jpg)
Where next?Where next?
I feel that we need a new idea. The previous one doesn’t seem to have worked. Or did it?
Just clustering and taking averages seems too easy….
But maybe clustering is all there is to patterns, once we have decided on the appropriate and context dependent profile to cluster, and selected the genes, but I keep wondering…
![Page 58: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/58.jpg)
Some statistical research stimulated Some statistical research stimulated by microarray data analysisby microarray data analysis
Experimental design : Churchill & Kerr
Image analysis: Zuzan & West, ….
Data visualization: Carr et al
Estimation: Ideker et al, ….
Multiple testing: Westfall & Young , Storey, ….
Discriminant analysis: Golub et al,…
Clustering: Hastie & Tibshirani, Van der Laan, Fridlyand & Dudoit, ….
Empirical Bayes: Efron et al, Newton et al,…. Multiplicative models: Li &Wong
Multivariate analysis: Alter et al
Genetic networks: D’Haeseleer et al and more
![Page 59: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/59.jpg)
In closing:In closing: The pervasiveness of The pervasiveness of microarray technologymicroarray technology
and the statistical problems that go with it
Hybridization of target DNA or RNA to large numbers of probes attached to a solid support in a microarray format has a much wider applicability.
All such applications have their own statistical problems. Here are two relating to the previous lectures.
![Page 60: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/60.jpg)
Meiosis data in which all exchanges are precisely located (from microarrays)
Figure courtesy of J Derisi
Yeast
![Page 61: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/61.jpg)
Predicted exon Predicted exon
Exon Arrays can validate Exon Predictions Exon Arrays can validate Exon Predictions and assemble Gene Structuresand assemble Gene Structures
One or more Probes per Predicted Exon
• Verify predicted exons on a genome-wide scale.• Group exons into genes via co-regulation.
This and the next slide courtesy of Rosetta
![Page 62: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/62.jpg)
Tiling arrays can identify exons and Tiling arrays can identify exons and refine gene structuresrefine gene structures
Oligonucleotides60 bp in length “60-mers”
10 bp steps
Predicted exon Predicted exon
![Page 63: Gene expression Statistics 246, Week 3, 2002. Thesis: the analysis of gene expression data is going to be big in 21st century statistics Many different.](https://reader035.fdocuments.in/reader035/viewer/2022062423/5697bfac1a28abf838c9ba75/html5/thumbnails/63.jpg)
AcknowledgmentsAcknowledgmentsStatistical collaboratorsStatistical collaboratorsYee Hwa Yang (Berkeley)Yee Hwa Yang (Berkeley)Sandrine Dudoit (Berkeley)Sandrine Dudoit (Berkeley)Ingrid Lönnstedt (Uppsala)Natalie Thorne (WEHI)Natalie Thorne (WEHI)Mauro Delorenzi (WEHI)
CSIRO Image Analysis GroupMichael BuckleyMichael BuckleyRyan Lagerstorm
WEHIGlenn BegleySuzie GrantRob Good
PMCIChuang Fong Kong
Ngai Lab (Berkeley)Cynthia DugganJonathan ScolnickDave Lin Vivian Peng Percy LuuElva DiazJohn Ngai
LBNLMatt Callow
RIKEN Genomic Sciences CenterRIKEN Genomic Sciences CenterYasushi OkazakiYoshihide Hayashizaki