Classification of cancerous and non cancerous tissues
![Page 1: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/1.jpg)
Cancerous Tissue Classification (Using Microarray Gene Expression)
Meenal Goyal, Pankhuri Goyal
![Page 2: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/2.jpg)
Background
● Decoding gene expression is an important active research area in molecular biology and bioinformatics.
● Microarray technology is used to measure gene expression levels in different cells.
● Applications:
  ○ Tissue classification (cancer vs. non-cancer)
  ○ Identifying novel targets for drug design
  ○ Extracting and analysing patterns
![Page 3: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/3.jpg)
Problem
● Binary classification of cancerous and normal tissue.
● Investigate feature selection and classification (supervised and unsupervised) algorithms.
● Detecting cancer in its early stages improves diagnosis, prognosis, and treatment planning.
● Challenges:
  ○ High dimension of the input features
  ○ Limited number of tissue samples
![Page 4: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/4.jpg)
Dataset
● GSE3 (renal clear cell carcinoma):
  ○ Modality: numeric
  ○ Number of features: 36,864 genes
  ○ Number of samples: 81 cancerous and 90 normal
● High-dimensional feature space, not sparse.
● Cell (i, j) holds the expression level of gene j in tissue i.
![Page 5: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/5.jpg)
Classification Pipeline
GSE3 -> Feature Selection -> Supervised / Unsupervised Learning -> Model (GSE3) -> Resulting error rate and accuracy
Feature Selection:
1. T-Test
2. Volcano Plot
3. mRMR
4. PCA
5. Weighted k-means (Fisher weights)
Supervised Learning (KNN, SVM, Boosting)
Unsupervised Learning (K-means, hierarchical clustering)
![Page 6: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/6.jpg)
Feature Selection Methods
![Page 7: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/7.jpg)
T-Test
● T scores:
● Null hypothesis: both classes have equal means.
● P-value: the probability of an observation at least this extreme if the null hypothesis is true.
● Features with p-values <= 0.01 are selected.
● GSE3 data: 916 features selected.
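The selection rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide's t-score formula is not preserved, so Welch's unequal-variance statistic is assumed here, and the p-values use a normal approximation to the t distribution (reasonable for sample counts like GSE3's, but not necessarily what the authors did). The function names and the synthetic data are hypothetical.

```python
import math
import numpy as np

def welch_t_scores(X, y):
    """Per-gene Welch t statistic between cancer (y == 1) and normal (y == 0)."""
    a, b = X[y == 1], X[y == 0]
    standard_error = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / standard_error

def approx_two_sided_p(t_scores):
    """Two-sided p-values from a normal approximation (an assumption, see above)."""
    return np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_scores])

def select_by_t_test(X, y, alpha=0.01):
    """Indices of genes whose p-value is at or below alpha."""
    return np.flatnonzero(approx_two_sided_p(welch_t_scores(X, y)) <= alpha)

# Synthetic stand-in for the expression matrix: rows are tissues, columns are genes.
rng = np.random.default_rng(0)
y = np.array([1] * 40 + [0] * 40)
X = rng.normal(size=(80, 50))
X[y == 1, :5] += 3.0  # only the first five genes truly differ between classes
selected = select_by_t_test(X, y)
```

With a 0.01 threshold and tens of thousands of genes, some selected genes will be false positives, which may be one reason the deck also tries stricter filters.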
![Page 8: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/8.jpg)
Volcano Plot
● Dataset: GSE3
● P-value threshold: < 0.01
● Fold-change threshold: 2
● Features extracted: 492
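A volcano-plot filter keeps genes that pass both a significance cut-off and an effect-size cut-off. The sketch below assumes positive, non-log intensities, so that fold change is the ratio of class means, and reuses a normal approximation for the p-values. Both are assumptions, and `volcano_select` is a hypothetical helper rather than the authors' code.

```python
import math
import numpy as np

def volcano_select(X, y, alpha=0.01, min_fold=2.0):
    """Keep genes with p < alpha and at least min_fold change in either direction."""
    a, b = X[y == 1], X[y == 0]
    log2_fc = np.log2(a.mean(axis=0) / b.mean(axis=0))  # assumes positive intensities
    standard_error = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / standard_error
    # Normal approximation to the two-sided t-test p-value (assumption).
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])
    keep = (p < alpha) & (np.abs(log2_fc) >= math.log2(min_fold))
    return np.flatnonzero(keep), log2_fc, p

# Synthetic intensities centred on 100 so that ratios are well defined.
rng = np.random.default_rng(1)
y = np.array([1] * 30 + [0] * 30)
X = rng.normal(loc=100.0, scale=5.0, size=(60, 30))
X[y == 1, 0] *= 3.0  # strongly up-regulated in cancer
X[y == 1, 1] /= 3.0  # strongly down-regulated in cancer
X[y == 1, 2] *= 1.2  # significant, but below the fold-change cut-off
selected, log2_fc, p = volcano_select(X, y)
```

Gene 2 illustrates why this filter is stricter than the t-test alone: it is statistically significant but changes by far less than a factor of two.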
![Page 9: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/9.jpg)
Minimum redundancy-maximum relevance (MRMR)
● F-test value is defined by
● The top 20 features are selected by F-test score.
● The remaining 130 features are extracted using the linear incremental search algorithm (MRMR-FDM).
● Total features selected for the GSE3 data: 150
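The incremental search can be illustrated with a small greedy loop. The slides name the MRMR-FDM variant but do not show its scoring rule, so this sketch swaps in a simpler criterion, the F-score divided by the mean absolute correlation with the genes already chosen. The names `f_scores`, `mrmr_select`, and `n_seed` are hypothetical.

```python
import numpy as np

def f_scores(X, y):
    """One-way ANOVA F statistic of each gene with respect to the class labels."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (between / (len(classes) - 1)) / (within / (len(X) - len(classes)))

def mrmr_select(X, y, n_select, n_seed=1):
    """Seed with the n_seed highest F-scores, then add genes one at a time."""
    relevance = f_scores(X, y)
    # Full correlation matrix: fine for a toy example, too large for all 36,864 genes.
    abs_corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(i) for i in np.argsort(relevance)[::-1][:n_seed]]
    while len(selected) < n_select:
        redundancy = abs_corr[:, selected].mean(axis=1)
        score = relevance / redundancy  # quotient criterion (an assumption, not FDM)
        score[selected] = -np.inf       # never pick the same gene twice
        selected.append(int(np.argmax(score)))
    return selected

# Toy data: two near-duplicate informative genes and one independent informative gene.
rng = np.random.default_rng(2)
y = np.array([1] * 200 + [0] * 200)
gene_a = rng.normal(size=400) + 2.0 * y
gene_a_copy = gene_a + 0.01 * rng.normal(size=400)
gene_b = rng.normal(size=400) + 2.0 * y
X = np.column_stack([gene_a, gene_a_copy, gene_b])
selected = mrmr_select(X, y, n_select=2, n_seed=1)
```

On the toy data the near-duplicate pair should not both be chosen, which is the point of the redundancy term. Under these assumptions, the slide's setting would correspond to `n_seed=20` and `n_select=150`.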
![Page 10: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/10.jpg)
PCA
● The top 3000 dimensions for GSE3, ranked by a two-sample t-test, are passed to PCA.
● Features selected: 170
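As a rough illustration of the projection step, PCA can be computed from the singular value decomposition of the mean-centred matrix. The helper name `pca_transform` and the random data are hypothetical. One observation, offered as a guess rather than a statement about the authors' method: a centred matrix of n samples has at most n - 1 non-trivial components, which would be consistent with 170 components for the 171 GSE3 samples.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project samples onto the leading principal components via SVD."""
    X_centred = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    explained_ratio = (S ** 2) / (S ** 2).sum()      # variance share per component
    return scores, explained_ratio[:n_components]

# 20 samples and 100 genes: at most 19 components carry variance after centring.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
scores, explained = pca_transform(X, n_components=19)
```

In a train/test setting the mean and components would normally be estimated on the training split only and then applied to the test split.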
![Page 11: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/11.jpg)
Weighted-kMeans (using Fisher Weights)
● The top 10,000 features from the two-sample t-test are used for Fisher analysis.
● Fisher score: $F(w) = \dfrac{(u_1 - u_2)^2}{s_1^2 + s_2^2}$, where $u_c$ and $s_c^2$ are the mean and variance of feature $w$ in class $c$.
● Weighted k-means is applied on the feature space using the Fisher values as weights.
● The centroid of each cluster is selected as a desired feature.
● Total features for the GSE3 dataset: 200
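One way to read the steps above: treat every gene as a point whose coordinates are its expression values across samples, cluster those points, and let each Fisher-weighted cluster centroid become one new feature. The sketch below follows that reading. The random initialisation, iteration count, and function names are assumptions, and the slides do not say whether the centroid itself or the gene nearest to it was used.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-gene Fisher score: squared mean difference over summed class variances."""
    a, b = X[y == 1], X[y == 0]
    return (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1))

def weighted_kmeans_features(X, weights, k, n_iter=50, seed=0):
    """Cluster genes; return Fisher-weighted centroids as k new features."""
    genes = X.T  # shape (n_genes, n_samples): one row per gene
    rng = np.random.default_rng(seed)
    centroids = genes[rng.choice(len(genes), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every gene to every centroid.
        d2 = ((genes ** 2).sum(axis=1)[:, None]
              - 2.0 * genes @ centroids.T
              + (centroids ** 2).sum(axis=1)[None, :])
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = labels == j
            if members.any():  # an empty cluster keeps its previous centroid
                centroids[j] = np.average(genes[members], axis=0, weights=weights[members])
    return centroids.T, labels  # (n_samples, k) feature matrix, cluster of each gene

# Synthetic data: 30 tissues, 40 genes, the first 10 genes shifted in cancer.
rng = np.random.default_rng(3)
y = np.array([1] * 15 + [0] * 15)
X = rng.normal(size=(30, 40))
X[y == 1, :10] += 2.0
weights = fisher_scores(X, y)
new_X, labels = weighted_kmeans_features(X, weights, k=4)
```

Under this reading, the slide's setting would be `k=200` applied to the 10,000 pre-filtered genes.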
![Page 12: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/12.jpg)
Classification Algorithms
![Page 13: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/13.jpg)
K-nearest neighbours (k-NN)
● Test / Train data divided using:
  ○ Holdout -> test : train = 1 : 1
  ○ Kfold -> k = 5, test : train = 1 : 5
● K parameter varied from k=1 to 10.
● Distance metric: Euclidean
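The holdout experiment can be sketched as follows: split the samples 1 : 1, classify each test sample by majority vote among its k nearest training samples under Euclidean distance, and record the error for k = 1 to 10. The data are synthetic and `knn_predict` is a hypothetical helper. Sending ties to class 0 is an arbitrary choice made here, not something the slides specify.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training samples (labels are 0 or 1)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest training samples
    votes = y_train[nearest]                 # shape (n_test, k)
    return (votes.mean(axis=1) > 0.5).astype(int)  # ties go to class 0

# Two well-separated synthetic classes with five features each.
rng = np.random.default_rng(4)
n = 120
y = np.array([1] * (n // 2) + [0] * (n // 2))
X = rng.normal(size=(n, 5)) + 3.0 * y[:, None]

# Holdout split with test : train = 1 : 1.
order = rng.permutation(n)
train, test = order[: n // 2], order[n // 2:]

error_by_k = {}
for k in range(1, 11):
    pred = knn_predict(X[train], y[train], X[test], k)
    error_by_k[k] = float(np.mean(pred != y[test]))
```

Plotting `error_by_k` for each feature-selection method would give a curve of the kind described on the next slide.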
![Page 14: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/14.jpg)
KNN misclassification error rate plotted for all feature selection methods
![Page 15: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/15.jpg)
Support Vector Machine (SVM)
● Test / Train data divided using:
  ○ Holdout -> test : train = 0.2
  ○ Kfold -> k = 5, test : train = 1 : 5
● Kernel functions used:
  ○ Linear
  ○ Polynomial: order = 2
  ○ Radial
● c parameter varied from 0.01 to 0.3 (for the linear kernel, holdout method).
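To make the role of the c parameter concrete, here is a minimal linear soft-margin SVM. A plain subgradient-descent loop stands in for whatever library solver the authors used, so treat it as a sketch: the learning rate, epoch count, data, and function names are all assumptions, and the holdout setting is read here as a 20% test fraction. Larger c penalises margin violations more heavily relative to the norm of w.

```python
import numpy as np

def train_linear_svm(X, y, C=0.1, n_epochs=200, lr=0.01):
    """Minimise 0.5 * ||w||^2 + C * sum(hinge losses); labels y must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(n_epochs):
        margins = y * (X @ w + b)
        violators = margins < 1  # samples inside the margin or misclassified
        grad_w = w - C * (y[violators, None] * X[violators]).sum(axis=0)
        grad_b = -C * y[violators].sum()
        step = lr / np.sqrt(1.0 + epoch)  # slowly decaying step size
        w -= step * grad_w
        b -= step * grad_b
    return w, b

def svm_predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

# Synthetic two-class data with labels in {-1, +1}.
rng = np.random.default_rng(5)
y = np.array([1] * 50 + [-1] * 50)
X = rng.normal(size=(100, 5)) + 2.0 * y[:, None]

# Holdout split with 20% of the samples held out for testing.
order = rng.permutation(100)
test, train = order[:20], order[20:]

error_by_c = {}
for C in np.linspace(0.01, 0.3, 5):
    w, b = train_linear_svm(X[train], y[train], C=C)
    error_by_c[float(C)] = float(np.mean(svm_predict(X[test], w, b) != y[test]))
```

The polynomial and radial kernels need a kernelised solver, which is beyond this sketch.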
![Page 16: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/16.jpg)
Misclassification error rate vs c-parameter for all feature selection methods. (Linear kernel)
![Page 17: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/17.jpg)
Accuracy matrix for SVM
Panels: T-Test, Volcano Plot, MRMR, PCA, Weighted-kMeans (using Fisher weights)
Best accuracy is observed with the linear kernel in all cases.
![Page 18: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/18.jpg)
AdaBoost
● Test / Train divided using Holdout with ratio 1 : 1.
● Weak learner = Decision Tree
● Number of weak learners used = 100
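A compact AdaBoost sketch with these settings follows. The slides only say "Decision Tree", so depth-1 trees (decision stumps) are assumed here as the weak learner. The data are synthetic and all function names are hypothetical.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Depth-1 tree: predict +polarity above the threshold, -polarity otherwise."""
    return polarity * np.where(X[:, feature] > threshold, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            err = w[np.where(X[:, j] > thr, 1, -1) != y].sum()
            # Flipping the polarity turns an error of err into 1 - err.
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if e < best[0]:
                    best = (e, j, thr, polarity)
    return best

def adaboost_fit(X, y, n_learners=100):
    """Labels y must be -1 or +1. Returns a list of (alpha, feature, threshold, polarity)."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_learners):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1.0 - err) / err)  # vote weight of this stump
        w *= np.exp(-alpha * y * stump_predict(X, j, thr, pol))  # up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def adaboost_predict(X, ensemble):
    total = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(total >= 0, 1, -1)

# Synthetic data and a 1 : 1 holdout split.
rng = np.random.default_rng(6)
y = np.array([1] * 60 + [-1] * 60)
X = rng.normal(size=(120, 5)) + 2.0 * y[:, None]
order = rng.permutation(120)
train, test = order[:60], order[60:]
ensemble = adaboost_fit(X[train], y[train], n_learners=100)
test_error = float(np.mean(adaboost_predict(X[test], ensemble) != y[test]))
```

The exhaustive threshold search is fine for a few hundred selected features but would be slow on the full gene set.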
![Page 19: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/19.jpg)
K-Means
● Test / Train set divided as:
  ○ Holdout -> test : train = 1 : 1
  ○ Kfold -> k = 5, test : train = 1 : 5
● K parameter varied from k=1 to 5.
Objective function
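The objective function itself is missing from the transcript. Assuming the slide showed the standard k-means objective, the within-cluster sum of squared distances, it would read:

$$J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$$

Here $C_j$ is the set of samples assigned to cluster $j$ and $\mu_j$ is that cluster's centroid. The symbols are chosen for this note and may differ from the original slide.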
![Page 20: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/20.jpg)
Misclassification error vs k for all feature selection methods.
![Page 21: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/21.jpg)
Hierarchical Clustering
● Some cancer types can contain an arbitrary number of subtypes, and it is usually unknown how many or which subtypes a specific cancer has.
● Green, black, and red in the heat maps indicate low, medium, and high expression of the corresponding gene in the sample.
● Accuracy is lower than with the other algorithms.
![Page 22: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/22.jpg)
Heat map panels: T-test, Volcano Plot, MRMR, PCA, Weighted kMeans
![Page 23: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/23.jpg)
References
● http://cs229.stanford.edu/proj2012/ChenPopicLiu-CancerousTissueClassificationUsingMicroarrayGeneExpression.pdf
● http://www.sciencedirect.com/science/article/pii/S1532046411000037
● http://in.mathworks.com/help/bioinfo/ug/exploring-gene-expression-data.html
● http://arxiv.org/pdf/1103.3434.pdf
![Page 24: Classification of cancerous and non cancerous tissues](https://reader030.fdocuments.in/reader030/viewer/2022013115/559e0d3c1a28abb3308b4706/html5/thumbnails/24.jpg)
Thank you