Gene expression profiling identifies molecular subtypes of gliomas

Ruty Shai, Tao Shi, Thomas J Kremen, Steve Horvath, Linda M Liau, Timothy F Cloughesy, Paul S Mischel* and Stanley F Nelson

Presented by Stephanie Tsung

Outline

Descriptions of Data
Statistical Methods
    Multidimensional Scaling Plot
    Hierarchical Clustering
    K-means Clustering
    Gene Filtering/Selection
    Predictor Comparison
Conclusion/Future Work

Background

Brain tumors can be classified by tumor origin, cell type of origin, tumor site, etc.

Tumor classification has been critical in treatment selection and outcome prediction. However, current classification methods are still far from perfect.

As a new technology, DNA microarrays have been introduced to cancer classification on the basis of gene expression levels.

Background: Cancer Classification

Cancer classification can be divided into two challenges: class discovery and class prediction.

Class discovery refers to defining previously unrecognized tumor subtypes.

Class prediction refers to the assignment of particular tumor samples to already-defined classes.

Objectives

1. To test whether gene expression measurements can be used to classify different brain tumors;

2. To determine sets of significant genes that distinguish brain tumors of different pathological types, grades and survival times;

3. To validate the selected informative genes in brain tumor classification and prediction.

Data and Pre-Processing

Affymetrix HG-U95Av2 chips
12,555 genes and 42 samples in total
Tumor types (#): N(7) O(3) D(18) A(2) AA(3) P(9)

Data pre-processing:

1. Each tumor was examined by a neuropathologist and dissected into two portions: one for tissue diagnosis and one for RNA extraction.

2. Normalization and model-based expression indices in dChip.

Q. Are the global transcriptional signatures of the different pathologic subtypes of gliomas molecularly distinct?

Multidimensional Scaling Plot (MDS Plot)

To uncover the hidden structure of data.

D(N) -> D(2)
Dimension reduction technique
From 12,555-dimensional space to a low-dimensional Euclidean space
Explains observed similarities and dissimilarities between objects, measured by e.g. correlation or Euclidean distance.

R: cmd1 <- cmdscale(dist(dat1[, 1:30]), k = 2, eig = TRUE)
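The cmdscale() call above can be tried on simulated data. The following is a minimal sketch, not the analysis from the paper: the matrix `expr`, the three groups, and the shifted genes are all made up for illustration. Note that dist() computes distances between rows, so samples are assumed to be in rows here.

```r
# Simulated stand-in for an expression matrix (hypothetical data):
# 30 samples in rows, 200 genes in columns.
set.seed(1)
group <- factor(rep(c("A", "B", "C"), each = 10))
expr <- matrix(rnorm(30 * 200), nrow = 30)

# Shift the first 20 genes in groups B and C so the groups differ.
expr[group == "B", 1:20] <- expr[group == "B", 1:20] + 3
expr[group == "C", 1:20] <- expr[group == "C", 1:20] - 3

# Classical MDS on Euclidean distances between samples (rows).
d <- dist(expr)
mds <- cmdscale(d, k = 2, eig = TRUE)

# mds$points holds the 2-D coordinates, one row per sample.
plot(mds$points, col = as.integer(group), pch = 19,
     xlab = "Coordinate 1", ylab = "Coordinate 2")
```

With real data, the group labels would come from the pathology diagnosis and would be used only to color the points, not to compute the coordinates.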

MDS Plot

Figure 1. (a) Multidimensional scaling plot of all 42 tissue samples plotted in two-dimensional space using expression values from all 12 555 probesets.

Hierarchical Clustering

1. Evaluate all pairwise distances between objects

2. Look for the pair with the shortest distance

3. Construct a ‘new object’ by averaging the two objects

4. Evaluate the distance from the ‘new object’ to all other objects and go to Step 2

R: h1 <- hclust(dist(x), method = "average")
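The steps above can be sketched on simulated data as follows. The matrix `x` and its two built-in groups are hypothetical. One caveat: method = "average" in hclust() averages the pairwise distances between clusters (UPGMA), which is close to, but not the same as, averaging the objects themselves as step 3 describes.

```r
# Hypothetical data: 20 samples (rows) x 50 features, in two shifted groups.
set.seed(2)
x <- rbind(matrix(rnorm(10 * 50, mean = 0), nrow = 10),
           matrix(rnorm(10 * 50, mean = 4), nrow = 10))
rownames(x) <- paste0("s", 1:20)

# Agglomerative clustering with average linkage on Euclidean distances.
h1 <- hclust(dist(x), method = "average")

# Cut the tree into two clusters and count the members of each.
clusters <- cutree(h1, k = 2)
table(clusters)

# Draw the dendrogram.
plot(h1)
```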

Figure 1. (b) The same 42 tissue samples were grouped into hierarchical clusters. Tissue samples are color-coded.

Hierarchical Clustering

[Figure: dendrogram with clusters labeled I, II, III and IV]

I & II: P = 0.00006, Fisher’s exact test

III & IV: P = 0.00001

Fisher’s Exact Test

Sample   w/ charact.   w/o charact.   Total
1        A             B              A+B
2        C             D              C+D
Total    A+C           B+D            N

The two-tailed probability: .326 + .007 + .093 + .163 + .019 = .608

Ex. 55

8 7

H0: The proportion of interest does not differ between the two groups.
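The two-tailed sum on this slide can be reproduced in R. The example table itself did not survive in the transcript, so the table below is an assumption: a 2x2 table with row totals 5 and 10 and column totals 8 and 7, which yields exactly the listed probabilities. Treat it as a hypothetical reconstruction, not the slide's own numbers.

```r
# Hypothetical 2x2 table (assumed, not taken from the slide):
# row totals 5 and 10, column totals 8 and 7.
tab <- matrix(c(2, 6, 3, 4), nrow = 2,
              dimnames = list(Sample = c("1", "2"),
                              Characteristic = c("with", "without")))

# With the margins fixed, cell A follows a hypergeometric distribution.
a <- 0:5
p <- dhyper(a, m = 8, n = 7, k = 5)
round(p, 3)

# Two-tailed p-value: sum the probabilities of all tables that are
# no more likely than the observed one (A = 2).
p_obs <- p[a == 2]
p_two_sided <- sum(p[p <= p_obs * (1 + 1e-7)])

# The built-in test gives the same value.
ft <- fisher.test(tab)
c(manual = p_two_sided, fisher.test = ft$p.value)
```

Only one possible table (A = 3) is more likely than the observed one, so every other table contributes to the two-tailed sum.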

Q. Can we uncover these subtypes without prior knowledge?

i.e. How many categories of gliomas are suggested by the gene expression data?

K-means Clustering

To find a K-partition of the observations that minimizes the within-cluster sum of squares (WSS), summed over all clusters.

The number of clusters, k, needs to be pre-specified.

Tibshirani prediction strength can be used to determine the optimal k.

R: cl1 <- kmeans(x, 3)
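A minimal sketch of the kmeans() call on simulated data, showing where the within-cluster sums of squares appear in the result. The data `x` are hypothetical; `nstart = 25` is an addition not on the slide, used here because k-means depends on its random starting centers.

```r
# Hypothetical data: 45 points in 2-D drawn around three centers.
set.seed(3)
x <- rbind(matrix(rnorm(15 * 2, mean = 0), ncol = 2),
           matrix(rnorm(15 * 2, mean = 5), ncol = 2),
           cbind(rnorm(15, mean = 0), rnorm(15, mean = 5)))

# K-means with k = 3; several random starts reduce the risk of a poor optimum.
cl1 <- kmeans(x, centers = 3, nstart = 25)

cl1$cluster       # cluster assignment of each observation
cl1$withinss      # WSS of each cluster
cl1$tot.withinss  # total WSS, the quantity k-means tries to minimize
```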

Figure 2. Grouping of tumors. All tumor samples were plotted by multidimensional scaling using all 12 555 probesets. We performed nonhierarchical K-means clustering (Kaufman and Rousseeuw, 1990).

Gene Filtering/Selection

To find the interesting genes that are differentially expressed in 6 two-group comparisons

Using the top 30 genes from each comparison, ranked by t-test

170 most differentially expressed genes using the t-test
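One two-group comparison of this kind can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the matrix `expr` is simulated, genes are placed in rows, and ranking by the absolute Welch t statistic is one plausible reading of "top 30 genes based on t-test".

```r
# Hypothetical data: 500 genes (rows) x 20 samples (columns), two groups of 10.
set.seed(4)
n_genes <- 500
group <- factor(rep(c("g1", "g2"), each = 10))
expr <- matrix(rnorm(n_genes * 20), nrow = n_genes)

# Make the first 30 genes truly differentially expressed.
expr[1:30, group == "g2"] <- expr[1:30, group == "g2"] + 3

# Welch t statistic for each gene, comparing g1 against g2.
t_stat <- apply(expr, 1, function(g) {
  t.test(g[group == "g1"], g[group == "g2"])$statistic
})

# Indices of the 30 genes with the largest absolute t statistic.
top30 <- order(abs(t_stat), decreasing = TRUE)[1:30]
```

Repeating this for each of the 6 comparisons and taking the union of the selected genes gives a combined list; overlaps between comparisons would make that list shorter than 6 x 30.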

Predictor Comparison

Compare the performance of predictors: Gene Vote

Leave-one-out cross-validation error rates were calculated. For a given method and sample size, n, a classifier is generated using (n - 1) cases and tested on the single remaining case. This is repeated n times, each time designing a classifier by leaving one case out. Thus, each case in the sample is used as a test case, and each time nearly all the cases are used to design a classifier.
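The leave-one-out procedure described above can be sketched as a simple loop. The classifier used here is a nearest-centroid rule, swapped in for brevity; it is not the Gene Vote predictor named on the slide, and the data and the helper `predict_centroid` are hypothetical.

```r
# Hypothetical data: 24 samples (rows) x 40 genes, two classes of 12.
set.seed(5)
n <- 24
y <- factor(rep(c("g1", "g2"), each = n / 2))
x <- matrix(rnorm(n * 40), nrow = n)
x[y == "g2", 1:10] <- x[y == "g2", 1:10] + 2

# Stand-in classifier: assign a sample to the class with the nearest mean profile.
predict_centroid <- function(train_x, train_y, new_x) {
  # One column of gene means per class.
  centroids <- sapply(levels(train_y), function(k) {
    colMeans(train_x[train_y == k, , drop = FALSE])
  })
  d2 <- colSums((centroids - new_x)^2)  # squared distance to each centroid
  names(which.min(d2))
}

# Leave-one-out: train on n - 1 cases, test on the one left out, n times.
pred <- character(n)
for (i in seq_len(n)) {
  pred[i] <- predict_centroid(x[-i, , drop = FALSE], y[-i], x[i, ])
}

loocv_error <- mean(pred != as.character(y))
loocv_error
```

If genes are selected from the data first, the selection step should be repeated inside the loop on the (n - 1) training cases; otherwise the error estimate tends to be optimistic.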

Table 1.

Using 170 filtered genes based on t-test

Table 2.

Conclusion

Performed MDS plots and K-means clustering analysis and found evidence for three clusters: glioblastomas, lower grade astrocytomas, and oligodendrogliomas (p<0.00001).

A relatively small number of genes can be used to distinguish between molecular subtypes.

Subsets of gliomas can potentially be used for patient stratification and as potential targets for treatment.

Future Directions

Construct predictors using different gene selection methods.

Validate the selected genes with new tumor samples.

……

K=3 gave us the best prediction strength for K > 1

Number of clusters (K)            1      2      3      4      5
Tibshirani prediction strength    1.000  0.766  0.881  0.501  0.510

[Figure: Tibshirani prediction strength (y-axis, 0.5 to 1.0) plotted against K (x-axis, 1 to 5)]
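The idea behind prediction strength can be sketched as follows. This is a simplified reading of the method, not the code used for the table above: the function `prediction_strength`, the random half splits, the number of splits, and the simulated data are all assumptions made for illustration. For each split, the test half is clustered on its own, and each test cluster is scored by the fraction of its point pairs that the training centroids also place together; the worst cluster gives the score for that split.

```r
prediction_strength <- function(x, k, n_splits = 20) {
  if (k == 1) return(1)  # a single cluster is always reproduced
  n <- nrow(x)
  ps <- numeric(n_splits)
  for (s in seq_len(n_splits)) {
    idx <- sample(n, floor(n / 2))
    train <- x[idx, , drop = FALSE]
    test <- x[-idx, , drop = FALSE]
    km_train <- kmeans(train, centers = k, nstart = 10)
    km_test <- kmeans(test, centers = k, nstart = 10)

    # Assign each test point to its nearest training centroid.
    d2 <- sapply(seq_len(k), function(j) {
      rowSums(sweep(test, 2, km_train$centers[j, ])^2)
    })
    by_train <- max.col(-d2, ties.method = "first")

    # Per test cluster: share of pairs that the training centroids keep together.
    per_cluster <- sapply(seq_len(k), function(j) {
      members <- by_train[km_test$cluster == j]
      m <- length(members)
      if (m < 2) return(NA_real_)  # sketch convention: skip singleton clusters
      (sum(outer(members, members, "==")) - m) / (m * (m - 1))
    })
    ps[s] <- min(per_cluster, na.rm = TRUE)
  }
  mean(ps)
}

# Hypothetical data: 45 points in 2-D around three well-separated centers.
set.seed(6)
x <- rbind(matrix(rnorm(15 * 2, mean = 0), ncol = 2),
           matrix(rnorm(15 * 2, mean = 8), ncol = 2),
           cbind(rnorm(15, mean = 0), rnorm(15, mean = 8)))

ps <- sapply(1:5, function(k) prediction_strength(x, k))
round(ps, 3)
```

A common way to read the result is to pick the largest k whose prediction strength stays high, which on the slide's table points to k = 3.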

Statistical problems in response-based classification

Identification of new or unknown classes: unsupervised learning

Classification into known classes: supervised learning

Identification of “best” predictor variables: variable selection, e.g. marker genes in microarray data (gene voting, hierarchical clustering)
