Diagnosis of multiple cancer types by shrunken centroids of gene expression
![Page 1: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/1.jpg)
Course: 550.635 Topics in Bioinformatics
Presenter: Ting Yang
Teacher: Professor Geman
By Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu
![Page 2: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/2.jpg)
Nearest Centroid Classification
Example: small round blue cell tumors of childhood
• 63 training samples, 25 testing samples
• 4 classes: BL, EWS, NB, RMS
• Figure 1
• Nearest centroid classification
• Disadvantage
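As a rough illustration of plain nearest centroid classification, the sketch below computes one centroid per class (the per-gene mean) and assigns a new sample to the closest one. The toy data, the function names `class_centroids` and `nearest_centroid`, and the use of unstandardized squared Euclidean distance are assumptions made for this example, not details taken from the slides.

```python
def class_centroids(X, y):
    """Return {label: centroid}; each centroid is the per-gene mean of that class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


def nearest_centroid(x, centroids):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda k: sq_dist(x, centroids[k]))


# Toy data invented for illustration: 4 samples, 3 genes, 2 classes.
X = [[1.0, 0.0, 5.0], [1.2, 0.1, 5.1], [3.0, 0.0, 5.0], [3.2, 0.1, 4.9]]
y = ["A", "A", "B", "B"]
centroids = class_centroids(X, y)
print(nearest_centroid([1.1, 0.0, 5.0], centroids))
```

Every gene contributes to the distance here, including genes that carry no class information.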
![Page 3: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/3.jpg)
![Page 4: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/4.jpg)
Nearest Shrunken Centroids
• A modification of the nearest centroid method
• Idea: standardize each class centroid by the within-class standard deviation of each gene, then shrink each class centroid toward the overall centroid.
![Page 5: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/5.jpg)
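Details:

$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \,(s_i + s_0)}, \qquad m_k = \sqrt{\frac{1}{n_k} - \frac{1}{n}}$$

• $\bar{x}_{ik}$: mean expression value in class k for gene i
• $\bar{x}_i$: ith component of the overall centroid
• $s_i$: pooled within-class standard deviation for gene i
• $d_{ik}$: t-statistic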
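The statistic for gene i in class k divides the difference between the class mean and the overall mean by m_k (s_i + s_0). A minimal sketch of that computation follows, under several assumptions: the pooled variance uses n - K degrees of freedom, s_0 defaults to the median of the s_i, and m_k is taken as sqrt(1/n_k - 1/n), the choice that makes m_k s_i the standard error of the numerator (the sign under the root did not survive extraction). The function name `t_statistics` and the toy data are invented for illustration.

```python
import math
from statistics import median


def t_statistics(X, y, s0=None):
    """Return (d, s, m): d[k][i] is the statistic for gene i in class k."""
    n, p = len(X), len(X[0])
    members = {}
    for row, label in zip(X, y):
        members.setdefault(label, []).append(row)
    K = len(members)
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    centroid = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
                for k, rows in members.items()}
    # Pooled within-class standard deviation per gene (assumed n - K d.o.f.).
    s = []
    for i in range(p):
        ss = sum((r[i] - centroid[k][i]) ** 2
                 for k, rows in members.items() for r in rows)
        s.append(math.sqrt(ss / (n - K)))
    if s0 is None:
        s0 = median(s)  # assumed default for the offset s_0
    # Assumed sign: minus, so that m_k * s_i is the standard error of the numerator.
    m = {k: math.sqrt(1.0 / len(rows) - 1.0 / n) for k, rows in members.items()}
    d = {k: [(centroid[k][i] - overall[i]) / (m[k] * (s[i] + s0)) for i in range(p)]
         for k in members}
    return d, s, m


# Toy data invented for illustration: only gene 0 separates the classes.
X = [[1.0, 0.0, 5.0], [1.2, 0.1, 5.1], [3.0, 0.0, 5.0], [3.2, 0.1, 4.9]]
y = ["A", "A", "B", "B"]
d, s, m = t_statistics(X, y)
for label in d:
    print(label, [round(v, 2) for v in d[label]])
```

In this toy set the statistic for gene 0 is far larger in absolute value than for the other two genes, which is the behaviour the method relies on.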
![Page 6: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/6.jpg)
t-statistic:

$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \,(s_i + s_0)}$$
• It measures the standardized difference between the mean expression of gene i in class k and its mean over all classes combined.
• Idea: a gene that discriminates one class from the rest will have a statistic with a large absolute value for that class.
![Page 7: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/7.jpg)
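• Shrink each $d_{ik}$ toward zero by a threshold $\Delta$ to eliminate the genes that do not provide sufficient information.
• ‘De-noising’ step

$$d'_{ik} = \operatorname{sign}(d_{ik})\,\bigl(|d_{ik}| - \Delta\bigr)_+$$

where $(t)_+ = t$ if $t > 0$ and zero otherwise.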
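The shrinkage step is soft thresholding: each statistic is moved toward zero by a fixed amount and set to exactly zero if its absolute value is below that amount. A small sketch, with the function name `soft_threshold`, the sample values, and the threshold of 1.0 chosen only for illustration:

```python
def soft_threshold(d, delta):
    """Return sign(d) * max(|d| - delta, 0)."""
    magnitude = max(abs(d) - delta, 0.0)
    return magnitude if d >= 0 else -magnitude


# Hypothetical statistics for four genes in one class.
d_k = [-5.4, 0.3, 0.9, 2.0]
shrunk = [soft_threshold(v, 1.0) for v in d_k]

# Genes whose shrunken statistic is nonzero still contribute to this class.
active = [i for i, v in enumerate(shrunk) if v != 0.0]
print(shrunk, active)
```

A gene whose shrunken statistic is zero for every class no longer distinguishes any class and drops out of the classifier.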
![Page 8: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/8.jpg)
Choosing the amount of shrinkage
• The shrinkage amount is allowed to vary over a wide range.
• 10-fold cross-validation: choose the shrinkage amount with the smallest error rate.
• Divide the set of samples at random into 10 parts of equal size, with the classes distributed proportionally among the 10 parts.
• Fit the model on 90% of the samples, then predict the class labels of the remaining 10% (the test samples).
• Repeat 10 times, once per part, and add the errors together to get the overall error.
• Figure 2
• Figure 1
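The cross-validation procedure above can be sketched as follows. This is a compact, illustrative implementation rather than the authors' code: the helper names (`fit`, `predict`, `stratified_folds`, `cv_errors`), the deterministic toy data, the candidate shrinkage values, the median choice for s_0, and the minus sign in m_k are all assumptions.

```python
import math
import random
from statistics import median


def fit(X, y, delta):
    """Fit shrunken centroids for one shrinkage amount delta."""
    n, p = len(X), len(X[0])
    rows = {}
    for r, lab in zip(X, y):
        rows.setdefault(lab, []).append(r)
    overall = [sum(r[i] for r in X) / n for i in range(p)]
    cent = {k: [sum(r[i] for r in v) / len(v) for i in range(p)] for k, v in rows.items()}
    s = [math.sqrt(sum((r[i] - cent[k][i]) ** 2 for k, v in rows.items() for r in v)
                   / (n - len(rows))) for i in range(p)]
    s0 = median(s)  # assumed choice of s_0
    model = {"scale": [si + s0 for si in s], "cent": {}, "logprior": {}}
    for k, v in rows.items():
        mk = math.sqrt(1 / len(v) - 1 / n)  # assumed sign under the root
        shrunk = []
        for i in range(p):
            d = (cent[k][i] - overall[i]) / (mk * (s[i] + s0))
            d = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            shrunk.append(overall[i] + mk * (s[i] + s0) * d)
        model["cent"][k] = shrunk
        model["logprior"][k] = math.log(len(v) / n)
    return model


def predict(model, x):
    """Pick the class with the smallest standardized distance, corrected by the prior."""
    def score(k):
        dist = sum(((xi - ci) / sc) ** 2
                   for xi, ci, sc in zip(x, model["cent"][k], model["scale"]))
        return dist - 2 * model["logprior"][k]
    return min(model["cent"], key=score)


def stratified_folds(y, n_folds, rng):
    """Split sample indices into folds, spreading each class evenly across folds."""
    by_class = {}
    for idx, lab in enumerate(y):
        by_class.setdefault(lab, []).append(idx)
    folds, pos = [[] for _ in range(n_folds)], 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for idx in idxs:
            folds[pos % n_folds].append(idx)
            pos += 1
    return folds


def cv_errors(X, y, deltas, n_folds=10, seed=0):
    """Return {delta: total number of misclassified held-out samples}."""
    folds = stratified_folds(y, n_folds, random.Random(seed))
    errors = {}
    for delta in deltas:
        wrong = 0
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], delta)
            wrong += sum(predict(model, X[i]) != y[i] for i in held_out)
        errors[delta] = wrong
    return errors


# Deterministic toy data invented for illustration: only gene 0 separates the classes.
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
X, y = [], []
for j in range(20):
    X.append([0.0 + offsets[j % 5], 1.0 + offsets[(2 * j) % 5], 1.0 + offsets[(3 * j) % 5]])
    y.append("A")
for j in range(20):
    X.append([3.0 + offsets[j % 5], 1.0 + offsets[(3 * j) % 5], 1.0 + offsets[(2 * j) % 5]])
    y.append("B")

errors = cv_errors(X, y, deltas=[0.0, 2.0, 100.0])
print(errors)
```

With a very large shrinkage amount every class centroid collapses onto the overall centroid, so the classifier can no longer separate the classes and the error count rises; the useful range lies below that point.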
![Page 9: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/9.jpg)
![Page 10: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/10.jpg)
![Page 11: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/11.jpg)
More Figures
• Figure 3
• Figure 4
![Page 12: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/12.jpg)
![Page 13: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/13.jpg)
Classification
• A new sample is classified by comparing its expression profile with each shrunken centroid, over those 43 active genes.
• Distance function: a standardized squared distance to each shrunken centroid, corrected by the class prior probabilities.
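To make the role of the prior concrete, the sketch below scores a sample against each shrunken centroid using a squared distance standardized per gene, minus twice the log of the class prior, and picks the class with the smallest score. The exact form of the score, the names `discriminant_score` and `classify`, and the numbers are assumptions for illustration.

```python
import math


def discriminant_score(x, centroid, scale, prior):
    """Standardized squared distance to a centroid, minus 2 * log(prior)."""
    dist = sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, centroid, scale))
    return dist - 2.0 * math.log(prior)


def classify(x, centroids, scale, priors):
    """Return the class with the smallest discriminant score."""
    return min(centroids,
               key=lambda k: discriminant_score(x, centroids[k], scale, priors[k]))


# Hypothetical shrunken centroids for two genes; gene 1 is inactive (same in both classes).
centroids = {"A": [0.0, 1.0], "B": [2.0, 1.0]}
scale = [1.0, 1.0]  # stands in for s_i + s_0

# A sample halfway between the centroids goes to the class with the larger prior.
print(classify([1.0, 1.0], centroids, scale, {"A": 0.8, "B": 0.2}))
# With equal priors, distance alone decides.
print(classify([1.5, 1.0], centroids, scale, {"A": 0.5, "B": 0.5}))
```

Gene 1 adds the same amount to both scores, which is why only the active genes affect the decision.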
![Page 14: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/14.jpg)
Statistical details:
• t-statistic:

$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \,(s_i + s_0)}$$

• Estimates of the class probabilities (Figure 5)
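One way to turn per-class discriminant scores into class probability estimates is to weight each class by exp(-score / 2) and normalize, by analogy with Gaussian linear discriminant analysis. The sketch below assumes that form; the function name `class_probabilities` and the example scores are hypothetical.

```python
import math


def class_probabilities(scores):
    """Map {class: score} to probabilities proportional to exp(-score / 2)."""
    best = min(scores.values())  # subtract the minimum for numerical stability
    weights = {k: math.exp(-(v - best) / 2.0) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical discriminant scores for one sample (smaller means closer).
probs = class_probabilities({"BL": 9.0, "EWS": 1.0, "NB": 7.0, "RMS": 3.0})
for label, p in probs.items():
    print(label, round(p, 3))
```

The class with the smallest score receives the largest probability, and the gap between the top two probabilities gives a rough sense of how confident the call is.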
![Page 15: Diagnosis of multiple cancer types by shrunken centroids of gene expression](https://reader035.fdocuments.in/reader035/viewer/2022062721/56813615550346895d9d8b0f/html5/thumbnails/15.jpg)