Gene expression profiling identifies molecular subtypes of gliomas
Ruty Shai, Tao Shi, Thomas J Kremen, Steve Horvath, Linda M Liau, Timothy F Cloughesy, Paul S Mischel* and Stanley F Nelson
Presented by Stephanie Tsung
Outline
Descriptions of Data
Statistical Methods
Multidimensional Scaling Plot
Hierarchical Clustering
K-means Clustering
Gene Filtering/Selection
Predictor Comparison
Conclusion / Future Work
Background
Brain tumors can be classified by tumor origin, cell type of origin, tumor site, etc.
Tumor classification has been critical in treatment selection and outcome prediction. However, current classification methods are still far from perfect.
As a new technology, DNA microarrays have been introduced to cancer classification on the basis of gene expression levels.
Background: Cancer Classification
Cancer classification can be divided into two challenges: class discovery and class prediction.
Class discovery refers to defining previously unrecognized tumor subtypes.
Class prediction refers to the assignment of particular tumor samples to already-defined classes.
Objectives
1. To test whether gene expression measurements can be used to classify different brain tumors;
2. To determine sets of significant genes that distinguish brain tumors of different pathological types, grades and survival times;
3. To validate the selected informative genes in brain tumor classification and prediction.
Data and Pre-Processing
Affymetrix HG-U95Av2 chips
12,555 genes and 42 samples in total
Tumor types (#): N(7) O(3) D(18) A(2) AA(3) P(9)
Data pre-processing:
1. Each tumor was examined by a neuropathologist and dissected into two portions: one for tissue diagnosis and one for RNA extraction.
2. Normalization and model-based expression indices in dChip.
Q. Are the global transcriptional signatures of the different pathologic subtypes of gliomas molecularly distinct?
Multidimensional Scaling Plot (MDS Plot)
To uncover the hidden structure of data.
D(N) -> D(2)
Dimension reduction technique
12,555-dimensional space to low-dimensional Euclidean space
Explain observed similarities and dissimilarities between objects, such as correlation, Euclidean distance, etc.
R: cmd1 <- cmdscale(dist(dat1[,1:30]), k=2, eig=TRUE)
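The R call on this slide can be tried on simulated data. The sketch below is only an illustration: the matrix `expr`, its size and the sample names are made up, and it assumes probesets are in rows and samples in columns. Because `dist()` measures distances between rows, the matrix is transposed so that the samples, not the probesets, are the objects being compared.

```r
# Simulated stand-in for an expression matrix (hypothetical data, not the study's).
set.seed(1)
n_genes   <- 200
n_samples <- 12
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)
colnames(expr) <- paste0("S", seq_len(n_samples))

# dist() compares ROWS, so transpose to get sample-to-sample Euclidean distances.
d <- dist(t(expr))

# Classical MDS into k = 2 dimensions; eig = TRUE also returns the eigenvalues.
mds    <- cmdscale(d, k = 2, eig = TRUE)
coords <- mds$points   # n_samples x 2 matrix of plotting coordinates

# Draw the MDS plot only in an interactive session.
if (interactive()) {
  plot(coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
  text(coords, labels = colnames(expr))
}
```

Each sample becomes one point in the plane, and samples with similar expression profiles land close together.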
MDS Plot
Figure 1. (a) Multidimensional scaling plot of all 42 tissue samples plotted in two-dimensional space using expression values from all 12,555 probesets.
Hierarchical Clustering
1. Evaluate all pairwise distances between objects
2. Look for the pair with the shortest distance
3. Construct a ‘new obj’ by averaging the two objects
4. Evaluate the distance from the ‘new obj’ to all other objects and go to Step 2
R: h1 <- hclust(dist(x), method="average")
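The four steps and the R call above can be sketched on simulated data. The matrix `x`, the two shifted groups and the choice to cut the tree into two clusters are assumptions made for illustration. Note that `method = "average"` merges clusters by the mean of all pairwise distances between their members, which is close to, but not literally, the "average the two objects" description in step 3.

```r
# Hypothetical data: 10 samples (rows) x 50 genes (columns), in two shifted groups.
set.seed(2)
x <- rbind(matrix(rnorm(5 * 50, mean = 0), nrow = 5),
           matrix(rnorm(5 * 50, mean = 3), nrow = 5))
rownames(x) <- paste0("S", 1:10)

# Steps 1-4: pairwise distances, then repeated merging of the closest pair.
h1 <- hclust(dist(x), method = "average")

# Cut the dendrogram into two groups and tabulate the group sizes.
groups <- cutree(h1, k = 2)
print(table(groups))

# The dendrogram itself can be drawn in an interactive session.
if (interactive()) plot(h1)
```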
Figure 1. (b) The same 42 tissue samples were grouped into hierarchical clusters. Tissue samples are color-coded.
Hierarchical Clustering
[Dendrogram with clusters labeled I, II, III and IV]
I & II: P=0.00006, Fisher’s exact test
III & IV: P=0.00001
Fisher’s Exact Test
Sample   w/ characteristic   w/o characteristic   Total
1        A                   B                    A+B
2        C                   D                    C+D
Total    A+C                 B+D                  N
The two-tailed probability: .326 + .007+ .093 + .163 + .019 = .608
Ex. 55
8 7
H0: The proportion with the characteristic of interest is the same in the two groups.
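A short R sketch of the test on a 2 x 2 table laid out as above (A, B, C, D). The counts are invented for illustration and are not taken from the study or from the slide's example. With the margins fixed, the probability of a table is hypergeometric, and the two-tailed P-value adds up the probabilities of all tables that are no more likely than the observed one.

```r
# Hypothetical counts: rows = groups 1 and 2, columns = with / without the characteristic.
tab <- matrix(c(9, 1,
                2, 8),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("1", "2"),
                              Characteristic = c("with", "without")))

ft <- fisher.test(tab)   # two-sided by default
print(ft$p.value)

# The same P-value by hand from the hypergeometric distribution.
A <- tab[1, 1]; B <- tab[1, 2]; C <- tab[2, 1]; D <- tab[2, 2]
p_obs   <- dhyper(A, A + C, B + D, A + B)            # probability of the observed table
support <- max(0, (A + B) - (B + D)):min(A + B, A + C)  # possible values of cell A
probs   <- dhyper(support, A + C, B + D, A + B)
p_two   <- sum(probs[probs <= p_obs * (1 + 1e-7)])   # tables no more likely than observed
```

A small P-value is evidence against H0, that is, evidence that the proportions differ between the two groups.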
Q. Can we uncover these subtypes without prior knowledge?
i.e. How many categories of gliomas are suggested by the gene expression data?
K-means Clustering
To find a K-partition of the observations that minimizes the within-cluster sum of squares (WSS)
The number of clusters, k, needs to be pre-specified.
Tibshirani prediction strength can be used to determine the optimal k.
R: cl1 <- kmeans(x, 3)
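A minimal sketch of the `kmeans()` call on simulated data. The three groups in `x` are made up, and `nstart = 25` is an added choice: k-means starts from random centers, so several restarts make the result less dependent on one unlucky start.

```r
# Hypothetical data: 60 observations in 2 dimensions, drawn around three centers.
set.seed(3)
x <- rbind(matrix(rnorm(20 * 2, mean = 0),  ncol = 2),
           matrix(rnorm(20 * 2, mean = 5),  ncol = 2),
           matrix(rnorm(20 * 2, mean = 10), ncol = 2))

# k must be chosen in advance; here k = 3, with 25 random restarts.
cl1 <- kmeans(x, centers = 3, nstart = 25)

print(cl1$cluster)       # cluster label of each observation
print(cl1$withinss)      # within-cluster sum of squares, one value per cluster
print(cl1$tot.withinss)  # total WSS, the quantity k-means tries to minimize
```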
Figure 2. Grouping of tumors. All tumor samples were plotted by multidimensional scaling using all 12,555 probesets. We performed nonhierarchical K-means clustering (Kaufman and Rousseeuw, 1990).
Gene Filtering/Selection
To find the interesting genes that are differentially expressed in 6 two-group comparisons
Using the top 30 genes based on t-test
170 most differentially expressed genes using t-test
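For one two-group comparison, t-test filtering might look like the sketch below. Everything here is an assumption made for illustration: the simulated matrix `expr` (genes in rows, samples in columns), the group labels, the artificially shifted genes, and the use of R's default Welch t-test, since the slides do not say which variant was applied.

```r
# Hypothetical data: 500 genes x 12 samples, 6 samples per group.
set.seed(4)
n_genes <- 500
expr  <- matrix(rnorm(n_genes * 12), nrow = n_genes)
group <- factor(rep(c("A", "B"), each = 6))

# Make the first 10 genes differentially expressed so there is something to find.
expr[1:10, group == "B"] <- expr[1:10, group == "B"] + 6

# One t-statistic per gene (row), comparing group A with group B.
t_stat <- apply(expr, 1, function(g) {
  unname(t.test(g[group == "A"], g[group == "B"])$statistic)
})

# Rank genes by absolute t-statistic and keep the top 30.
top30 <- order(abs(t_stat), decreasing = TRUE)[1:30]
print(top30)
```

With six pairwise comparisons, the same ranking would be repeated for each pair of groups and the selected genes pooled.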
Predictor Comparison
Compare the performance of predictors: Gene Vote
Leave-one-out cross-validation error rates were calculated.
For a given method and sample size, n, a classifier is generated using (n - 1) cases and tested on the single remaining case.
This is repeated n times, each time designing a classifier by leaving one out.
Thus, each case in the sample is used as a test case, and each time nearly all the cases are used to design a classifier.
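The leave-one-out procedure can be sketched as a loop. The classifier below is a plain nearest-centroid rule used as a stand-in, not the Gene Vote predictor from the slides, and the data, class labels and function name `predict_nearest_centroid` are hypothetical.

```r
# Hypothetical data: 20 samples (rows) x 30 genes (columns), two classes.
set.seed(5)
n <- 20
p <- 30
x <- matrix(rnorm(n * p), nrow = n)
y <- factor(rep(c("A", "B"), each = n / 2))
x[y == "B", ] <- x[y == "B", ] + 1.5   # shift class B so the classes differ

# Stand-in classifier: assign a sample to the class with the nearest mean profile.
predict_nearest_centroid <- function(train_x, train_y, new_x) {
  centroids <- sapply(levels(train_y), function(cl) {
    colMeans(train_x[train_y == cl, , drop = FALSE])
  })                                    # p x (number of classes) matrix
  d2 <- colSums((centroids - new_x)^2)  # squared distance to each class centroid
  names(which.min(d2))
}

# Leave-one-out: train on n - 1 cases, test on the one left out, repeat n times.
pred <- character(n)
for (i in seq_len(n)) {
  pred[i] <- predict_nearest_centroid(x[-i, , drop = FALSE], y[-i], x[i, ])
}

loocv_error <- mean(pred != as.character(y))
print(loocv_error)
```

If genes are selected before classification, the selection step should be repeated inside the loop on the n - 1 training cases; selecting genes once on all n cases tends to make the error estimate too optimistic.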
Table 1.
Using 170 filtered genes based on t-test
Table 2.
Conclusion
Performed MDS plots and K-means clustering analysis and found evidence for three clusters: glioblastomas, lower grade astrocytomas, and oligodendrogliomas (p<0.00001).
A relatively small number of genes can be used to distinguish between molecular subtypes.
Subsets of gliomas can potentially be used for patient stratification and to identify potential targets for treatment.
Future Directions
Construct predictors using different gene selection methods.
Validate the selected genes with new tumor samples.
…………
K=3 gave us the best prediction power
Number of clusters (K)           1      2      3      4      5
Tibshirani prediction strength   1.000  0.766  0.881  0.501  0.510
[Plot: Tibshirani prediction strength (y-axis, 0.5 to 1.0) against number of clusters K (x-axis, 1 to 5)]
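The idea behind prediction strength can be sketched in base R. This is a simplified illustration, not the code behind the numbers above. The function name `prediction_strength`, the repeated random half splits, the simulated data and the handling of single-member clusters are all assumptions. For each k, the data are split in two, both halves are clustered, and the test half is also labeled by its nearest training centroid. The score is the worst-case share of point pairs in a test cluster that the training centroids keep together.

```r
# Simplified prediction strength for k-means (hypothetical helper, k >= 2).
prediction_strength <- function(x, k, n_splits = 10) {
  n  <- nrow(x)
  ps <- numeric(n_splits)
  for (s in seq_len(n_splits)) {
    test_idx <- sample(n, floor(n / 2))
    train <- x[-test_idx, , drop = FALSE]
    test  <- x[test_idx, , drop = FALSE]
    km_train <- kmeans(train, centers = k, nstart = 10)
    km_test  <- kmeans(test,  centers = k, nstart = 10)

    # Squared distance from every test point to every training centroid.
    d2 <- sapply(seq_len(k), function(j) {
      rowSums(sweep(test, 2, km_train$centers[j, ])^2)
    })
    d2 <- matrix(d2, nrow = nrow(test))
    by_train <- max.col(-d2, ties.method = "first")  # nearest training centroid

    # Per test cluster: share of pairs that the training centroids keep together.
    agree <- sapply(seq_len(k), function(j) {
      members <- which(km_test$cluster == j)
      m <- length(members)
      if (m < 2) return(1)  # assumption: a single-member cluster counts as stable
      same <- outer(by_train[members], by_train[members], "==")
      (sum(same) - m) / (m * (m - 1))
    })
    ps[s] <- min(agree)
  }
  mean(ps)
}

# Hypothetical data with three well-separated groups.
set.seed(6)
x <- rbind(matrix(rnorm(30 * 2, mean = 0),  ncol = 2),
           matrix(rnorm(30 * 2, mean = 5),  ncol = 2),
           matrix(rnorm(30 * 2, mean = 10), ncol = 2))

ps_by_k <- sapply(2:4, function(k) prediction_strength(x, k))
names(ps_by_k) <- paste0("k=", 2:4)
print(round(ps_by_k, 3))
```

For k = 1 the prediction strength is 1 by definition, so the usual reading is to pick the largest k whose score stays high.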
Statistical problems in response-based classification
Identification of new or unknown classes: unsupervised learning
Classification into known classes: supervised learning
Identification of “best” predictor variables: variable selection, e.g. marker genes in microarray data (gene voting, hierarchical clustering)