Brad Windle, Ph.D.
628-1956, [email protected]
Unsupervised Learning and Microarrays
Web Site: http://www.people.vcu.edu/~bwindle
Follow the link to Courses, then the lecture for this class
Gene Expression Profiling
Unsupervised Learning
Cluster Analysis and Applications
A good review of microarray data analysis is: Quackenbush J. Computational analysis of microarray data. Nat Rev Genet 2001 Jun;2(6):418-427
Reductionism versus Systems Approach
Why generate global analyses, as opposed to picking one gene or protein and hoping you get lucky and it turns out to have great significance to the big picture or to mankind’s health?
Normalizing Data
Northern blot
For normalizing samples, you would divide experimental values by the mean of the values thought to be constant through the samples
Sample values are typically normalized by dividing by the mean of the reference values or the mean of all values
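A minimal sketch of the sample normalization described above, assuming one sample is a list of intensities. The function name `normalize_sample` and all numbers are hypothetical and serve only as illustration.

```python
from statistics import mean

def normalize_sample(values, reference_values=None):
    """Divide each value in a sample by the mean of the reference values.

    If no reference values are given, the mean of all values is used.
    """
    ref = reference_values if reference_values is not None else values
    scale = mean(ref)
    if scale == 0:
        raise ValueError("reference mean is zero; cannot normalize")
    return [v / scale for v in values]

# Hypothetical intensities for one sample (illustration only).
sample = [200.0, 50.0, 400.0, 150.0]
# Hypothetical control values assumed constant across samples.
controls = [100.0, 100.0]

by_reference = normalize_sample(sample, controls)  # scaled by the control mean
by_all = normalize_sample(sample)                  # scaled by the sample's own mean
```

Dividing by a reference mean keeps samples comparable when the controls really are constant; dividing by the mean of all values is the fallback when no such controls exist.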
What about normalizing gene values across all the samples?
Rationale for normalizing samples does not apply to genes
One strategy is to subtract the mean (mean centering).
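A minimal sketch of mean centering, assuming each gene is a list of expression values across samples. The helper `mean_center` and the two example genes (one expressed near 100, one near 10) are hypothetical.

```python
from statistics import mean

def mean_center(gene_values):
    """Subtract the gene's own mean from each of its values."""
    m = mean(gene_values)
    return [v - m for v in gene_values]

# Hypothetical genes with the same relative pattern at different levels.
high_gene = [90.0, 100.0, 110.0]
low_gene = [9.0, 10.0, 11.0]

centered_high = mean_center(high_gene)
centered_low = mean_center(low_gene)
```

Both genes end up centered on zero, but the highly expressed gene still has the larger spread, since subtraction does not rescale the values.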
Log transformation
raw value: .01 1 10 100
log10 value: -2 0 1 2
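A short sketch of the log transformation, using the raw values from the scale above. Base 10 is assumed here because it matches that scale; base 2 is also common for expression ratios. The function name is hypothetical.

```python
import math

def log10_transform(values):
    """Replace each positive value with its base-10 logarithm."""
    return [math.log10(v) for v in values]

ratios = [0.01, 1, 10, 100]
logged = log10_transform(ratios)

# A 10-fold decrease and a 10-fold increase become symmetric around zero.
down, up = math.log10(0.1), math.log10(10)
```

After the transform, equal fold changes up and down sit at equal distances from zero, which is what a distance-based clustering method needs.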
Gene to Gene Variability
Cluster Analysis
Goal: put items (genes) together in clusters based on similarity of expression across various conditions, either similarity of absolute expression levels or overall similarity in pattern
item X Y Z
1 1 1.5 1
2 1.2 1.3 1.5
3 1.4 3.2 4.0
4 5.1 3.5 2.1
$$d = \sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2 + (Z_1 - Z_2)^2}$$
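Euclidean distance matrix for items 1-4 (the values shown match distances computed from X and Y only):
item 1 2 3 4
1 0 0.28 1.75 4.56
2 0.28 0 1.91 4.48
3 1.75 1.91 0 3.71
4 4.56 4.48 3.71 0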
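A minimal sketch of the Euclidean distance calculation on the four items from the table. The helper names and the `dims` argument are hypothetical; `dims` is included because restricting the calculation to X and Y gives the 0.28, 1.75, 4.56 values of the matrix, while using all three coordinates gives larger distances.

```python
import math

# Items 1-4 with their X, Y, Z values, taken from the table.
items = {
    1: (1.0, 1.5, 1.0),
    2: (1.2, 1.3, 1.5),
    3: (1.4, 3.2, 4.0),
    4: (5.1, 3.5, 2.1),
}

def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(points, dims=None):
    """Pairwise Euclidean distances; dims optionally limits the coordinates used."""
    keys = sorted(points)

    def pick(p):
        return p if dims is None else tuple(p[i] for i in dims)

    return [[euclidean(pick(points[i]), pick(points[j])) for j in keys] for i in keys]

d_xy = distance_matrix(items, dims=(0, 1))  # X and Y only
d_xyz = distance_matrix(items)              # X, Y and Z

for row in d_xy:
    print(" ".join(f"{v:.2f}" for v in row))
```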
Pearson correlation:
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
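Pearson correlation matrix (r) for items 1-4:
item 1 2 3 4
1 1.00 -0.19 0.22 -0.04
2 -0.19 1.00 0.92 -0.97
3 0.22 0.92 1.00 -0.98
4 -0.04 -0.97 -0.98 1.00

Distance matrix from the correlations (d = 1 - r):
item 1 2 3 4
1 0.00 1.19 0.78 1.04
2 1.19 0.00 0.08 1.97
3 0.78 0.08 0.00 1.98
4 1.04 1.97 1.98 0.00

Converting correlation to distance (r ranges from -1 to +1):
$d = 1 - r$, range 0 to 2
$d = 1 - |r|$, range 0 to 1
$d = 1 - r^2$, range 0 to 1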
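A minimal sketch of the Pearson calculation and the three correlation-to-distance conversions, applied to the X, Y, Z values of items 1-4. The function names are hypothetical; the data come from the item table.

```python
import math

def pearson(x, y):
    """Pearson r from the sums of X, Y, XY, X^2 and Y^2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

def correlation_distances(r):
    """The three conversions from correlation to distance."""
    return {"1 - r": 1 - r, "1 - |r|": 1 - abs(r), "1 - r^2": 1 - r * r}

# X, Y, Z values for items 1-4.
profiles = [
    (1.0, 1.5, 1.0),
    (1.2, 1.3, 1.5),
    (1.4, 3.2, 4.0),
    (5.1, 3.5, 2.1),
]

r = [[pearson(a, b) for b in profiles] for a in profiles]
d = [[1 - value for value in row] for row in r]
```

With `1 - r`, strongly anti-correlated items such as 3 and 4 are far apart. With `1 - |r|` or `1 - r^2` they would be close together, so the choice depends on whether opposite patterns should count as similar.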
Hierarchical Clustering
Divisive (top-down splitting) or Agglomerative/Aggregative (bottom-up merging)
Clustering Methods
[Figure: two dendrograms of items A, B, C, D. Branch labels in the first: .1, .12, .15, .15, .6, .6; in the second: .1, .12, .2, .3, .2, .6]
Cluster Linkage Methods
Nearest Neighbor or Single Linkage
Furthest Neighbor or Complete Linkage
Average Neighbors or Average Linkage
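A minimal sketch of agglomerative clustering with the three linkage methods listed above. It assumes Euclidean distance over X, Y and Z for items 1-4. The function names are hypothetical, and the naive search over all cluster pairs is only meant for small examples.

```python
import math
from itertools import combinations

# Items 1-4 with their X, Y, Z values.
points = {
    "1": (1.0, 1.5, 1.0),
    "2": (1.2, 1.3, 1.5),
    "3": (1.4, 3.2, 4.0),
    "4": (5.1, 3.5, 2.1),
}
dist = {a: {b: math.dist(points[a], points[b]) for b in points} for a in points}

def cluster_distance(c1, c2, dist, linkage):
    """Distance between two clusters under the chosen linkage method."""
    pair_d = [dist[a][b] for a in c1 for b in c2]
    if linkage == "single":      # nearest neighbor
        return min(pair_d)
    if linkage == "complete":    # furthest neighbor
        return max(pair_d)
    if linkage == "average":     # mean of all pairwise distances
        return sum(pair_d) / len(pair_d)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(labels, dist, linkage="average"):
    """Repeatedly merge the two closest clusters; return (a, b, height) per merge."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], dist, linkage),
        )
        height = cluster_distance(a, b, dist, linkage)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a + b)
        merges.append((a, b, height))
    return merges

single = agglomerate(points, dist, "single")
complete = agglomerate(points, dist, "complete")
average = agglomerate(points, dist, "average")
```

All three methods join items 1 and 2 first; they differ in the height at which item 3 and then item 4 join, which is what changes the shape of the tree.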
2N-1
K-Means Clustering and its relative, Self-Organizing Maps (SOM)
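A minimal sketch of k-means clustering on items 1-4 (X, Y, Z), assuming k = 2 and using items 1 and 4 as starting centroids. The starting centroids and the function name are hypothetical choices for illustration; self-organizing maps are not covered by this sketch.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return assignments, centroids

points = [
    (1.0, 1.5, 1.0),
    (1.2, 1.3, 1.5),
    (1.4, 3.2, 4.0),
    (5.1, 3.5, 2.1),
]
assignments, centroids = kmeans(points, [points[0], points[3]])
```

Unlike hierarchical clustering, the number of clusters is fixed in advance, and the result can depend on the starting centroids.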
Ranking Order Clustering
Cluster Playground 3
Applications of Gene Expression Profiling and Cluster Analysis
Tissue or Tumor Classification
Gene Classification
Drug Classification
Drug Target Identification
B-Cell Lymphoma (Nature 403, 503-511, 2000)
Indistinguishable by histology
Yet half responded well to therapy and half did not
Were there differences in gene expression that correlated with drug response?
Gene expression profiles showed that half the lymphomas were of germinal center (GC) B-cell lineage and the other half of activated B-cell lineage
A subset of genes predicts therapeutic outcome
Gene Expression Profiling of Yeast Mutants and Drugs (Cell 102, 109–126, 2000)
Mutants: M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18
Drugs: D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15 D16 D17 D18
Mutant Drug
M4 D17
Erg2 Dyclonine
Human sigma receptor
Validation of cdc28 Kinase Target Inhibition (Science 281, 533-538, 1998)
[Figure: columns cdc28-, D1, D2; gene groups: Cdc28-regulated genes, Phosphate metabolism genes]
D1 and D2: nucleotide analogs that block cdc28p
Pho85
Drug 1 2 3 4 5
Cells A B C D E
-2 -1 0 -1 .01 1 -1.5 2 0 -.5 .4 0 1 1 .2 0 .7 2 1 .9 1 0 -.5 .5 -.8
COMPARE: Clustering Drugs Based on Cell Line Sensitivities
Nature Genetics 24: 236-244, 2000
[Figure: cluster tree, leaf labels in order] T1 T1 T1 T1 T1 T2 T2 A7 A7 T2 A7 A7 A7 A7 A7 A7 A7 T1 T1 T1 T1 T1
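A minimal sketch of comparing drugs by their sensitivity profiles across cell lines, assuming similarity is measured by Pearson correlation. The drug names, the profile values and the function `rank_by_similarity` are all hypothetical; this illustrates the idea and does not reproduce the COMPARE program itself.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical sensitivity profiles over five cell lines (A to E).
sensitivity = {
    "drug_1": [-2.0, -1.0, 0.0, 1.0, 2.0],
    "drug_2": [-1.8, -1.1, 0.2, 0.9, 1.7],
    "drug_3": [1.0, 0.5, 0.0, -0.5, -1.0],
    "drug_4": [0.3, -0.2, 0.1, -0.3, 0.2],
}

def rank_by_similarity(seed, profiles):
    """Other drugs sorted from most to least correlated with the seed drug."""
    scores = [
        (name, pearson(profiles[seed], values))
        for name, values in profiles.items()
        if name != seed
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

ranked = rank_by_similarity("drug_1", sensitivity)
```

Drugs whose profiles rise and fall across the same cell lines rank highest, which is the basis for grouping drugs that may share a mechanism.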
Profiling
Gene Expression
Protein Expression
Misc Data
SNPs
Methylation
Drug Structure
Protein Structure
Cell State
Disease
Drug Response
Metabolitics, Structural, Genomic
Clustering NCI 60 Cancer Cell Lines (Nature Genetics 24: 227-238)
6165 Genes
9 Types of Tissues/Tumors
Breast, CNS, Colon, Leukemia, Lung, Melanoma, Ovarian, Prostate, Renal
Filtering Data
Filter out genes with little variation using the program Cluster, based on standard deviation (SD) cutoffs
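A minimal sketch of an SD-based filter, assuming expression values are stored per gene across samples. The gene names, values, cutoff and function name are hypothetical, and the code does not reproduce the interface of the Cluster program.

```python
from statistics import stdev

def filter_by_sd(expression, min_sd):
    """Keep only genes whose sample standard deviation is at least min_sd."""
    return {
        gene: values
        for gene, values in expression.items()
        if stdev(values) >= min_sd
    }

# Hypothetical log-transformed expression values across four samples.
expression = {
    "geneA": [0.1, -0.1, 0.0, 0.05],
    "geneB": [2.0, -1.5, 0.5, -1.0],
    "geneC": [0.0, 0.0, 0.0, 0.0],
}

kept = filter_by_sd(expression, min_sd=1.0)
```

Genes that barely change across samples carry little information for clustering, so removing them reduces noise and computation.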