Machine Learning Applications to Translational
Research: Titbits from NCI Cancer Center Projects
Srisairam Achuthan, PhD
Center for Informatics, City of Hope
Oct 3rd, 2018
CI4CC Fall 2018 Symposium, New Orleans
Outline
• SPIRIT – Scientific Analytics
• Case Studies
  • Stem Cell Therapy
  • Epigenetics (DNA Methylation)
  • Radiology and Pathology Image Analysis
• Summary
Acknowledgements
• Eugene and Ruth Roberts Summer Academy Interns
  • Michelle Tran (MD/PhD Candidate, Icahn School of Medicine, Mount Sinai)
  • Kelsang Donyo (Bachelor’s Degree, Statistics, Harvard University)
  • Arin Jayasekara (USC Neuroscience, Tulane University School of Medicine)
  • Eric Jiang (Bachelor’s Degree, Data Science, UC San Diego)
  • Joseph Wong (Bachelor’s Degree, Computer Science, UC Santa Barbara)
  • Sid Rumalla (Master’s Degree, Public Health, UT Austin)
• Beckman Research Institute, City of Hope
  • Prof. Karen Aboody, Dr. Lucy Ghoda, Prof. Michael Barish: Stem Cells Project
  • Prof. Rama Natarajan, Chuo Zhen (Nancy): Epigenetics Project
  • Prof. Joyce Niland
• Center for Informatics, City of Hope
  • Karthik Seetharamu, Allen Mao, Lawrence Love
  • Zahra (Nasim) Eftekhari, Lorenzo Rossi (Data Science)
• Dr. Ajay Shah, Executive Director, Bristol Myers Squibb
• Sorena Nadaf, Chief Informatics Officer, Center for Informatics, City of Hope
Motivation
• Significant effort is spent when scientific analytical methods
are applied to biomedical problems through independent,
one-off deployments of computational pipelines.
• SPIRIT-SA is a comprehensive scientific analytics platform that
is part of SPIRIT [1] (Software Platform for Integrated Research
Information and Transformation). It also provides a rule-based
approach that simplifies data visualization and machine learning
model selection.
[1] Achuthan S, Chang M, Shah A. SPIRIT-ML: A machine learning platform for deriving knowledge from
biomedical datasets. 11th International Conference, DILS. Los Angeles, CA, USA: Springer; 2015.
SPIRIT – Scientific Analytics (SA)
SPIRIT – SA Workflow
SPIRIT – SA Dataflow
Recommendation Rules
Visualization Rules
[2] Shneiderman, B. The eyes have it: A task by data type taxonomy for information visualizations. Proceedings
of IEEE Symposium on Visual Languages, Boulder, CO; 1996.
[3] Tableau: https://www.tableau.com/sites/default/files/media/which_chart_v6_final_0.pdf
[4] https://eazybi.com/blog/data_visualization_and_chart_types/
Machine Learning Rules
[5] Scikit-learn: http://scikit-learn.org/stable/tutorial/machine_learning_map/
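A rule-based model recommender of the kind referenced in [5] can be sketched as a small decision function. This is an illustrative sketch only: the `DatasetProfile` fields and the `recommend_estimators` function are hypothetical names, and the thresholds loosely follow the public scikit-learn "choosing the right estimator" flowchart rather than the actual SPIRIT-SA rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetProfile:
    """Hypothetical summary of a dataset; field names are illustrative only."""
    n_samples: int
    has_labels: bool
    target_is_categorical: bool
    is_text: bool = False


def recommend_estimators(profile: DatasetProfile) -> List[str]:
    """Return candidate scikit-learn estimator families for a dataset.

    The branches loosely mirror the scikit-learn flowchart; they are an
    assumption for illustration, not the rules used by SPIRIT-SA.
    """
    if profile.n_samples < 50:
        return ["collect more data"]
    if not profile.has_labels:
        # Clustering branch, assuming the number of clusters is known.
        return ["KMeans"] if profile.n_samples < 10_000 else ["MiniBatchKMeans"]
    if profile.target_is_categorical:
        # Classification branch.
        if profile.n_samples >= 100_000:
            return ["SGDClassifier"]
        if profile.is_text:
            return ["LinearSVC", "MultinomialNB"]
        return ["LinearSVC", "KNeighborsClassifier", "SVC", "ensemble classifiers"]
    # Regression branch.
    if profile.n_samples >= 100_000:
        return ["SGDRegressor"]
    return ["Lasso", "ElasticNet", "Ridge", "SVR"]


# Example: a labeled two-class dataset with 961 observations.
profile = DatasetProfile(n_samples=961, has_labels=True, target_is_categorical=True)
print(recommend_estimators(profile))
```

Encoding the rules as plain conditionals keeps each recommendation traceable to a single branch, which is the main appeal of a rule-based approach over a learned selector.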
Example : Mammographic Mass Data
• Discrimination of benign and malignant mammographic masses
based on BI-RADS attributes and the patient’s age [6].
• Dataset description
  Number of observations: 961
  Number of features: 5
  Missing values: present
  Outcome (class): Severity
    Benign: 516 instances
    Malignant: 445 instances
[6] UCI Machine Learning Repository : http://archive.ics.uci.edu/ml/datasets/mammographic+mass
VisiRule for Distribution Visualization
[7] VisiRule : http://www.lpa.co.uk/vsr.htm
Data Visualization : Tableau
• Data visualization based on distribution [6]:
  Distribution of Severity as a function of Age
  Distribution of Features
VisiRule for Supervised Learning
SPIRIT-ML Implementation
• Predictive Model Building: 961 observations with 5 numeric features
• Data imputation was applied in this case (missing values
replaced with the median).
Missing Data Analysis
Default parameters
Attribute Selection and Normalization
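The median imputation and normalization steps named above can be sketched as follows. The slides do not say which normalization was used, so min-max scaling is an assumption here, and the helper names and toy values are hypothetical (they are not taken from the UCI data).

```python
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly to the [0, 1] range (assumed normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Toy "age" column with missing entries (made-up values).
age = [67, None, 43, 58, None, 28, 74]
age_imputed = impute_median(age)
age_scaled = min_max_scale(age_imputed)
print(age_imputed)
print(age_scaled)
```

In a real pipeline the median and the min/max would be computed on the training portion only and then reused for the test portion, so that no information leaks from held-out data.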
Training Data Results
• Predictive Model Building: 70% of data used for training
Accuracy, Performance Measures and Feature Ranking based on training data
Most important features (top 3) obtained by consensus polling of features: BI-RADS
Assessment, Age and Margin
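One way to read "consensus polling of features" is as rank aggregation across several models. The sketch below uses a Borda count for that purpose; this choice, the function name, the model names, and the per-model rankings are all assumptions made up for illustration, not the method or results from the slides.

```python
from collections import defaultdict


def consensus_ranking(rankings):
    """Aggregate per-model feature rankings with a Borda count.

    `rankings` maps a model name to its features ordered from most to
    least important. A feature earns more points the higher it ranks.
    """
    scores = defaultdict(int)
    for ordered_features in rankings.values():
        n = len(ordered_features)
        for position, feature in enumerate(ordered_features):
            scores[feature] += n - 1 - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Made-up rankings from three hypothetical models.
toy_rankings = {
    "model_a": ["BI-RADS", "Age", "Margin", "Shape", "Density"],
    "model_b": ["Age", "BI-RADS", "Shape", "Margin", "Density"],
    "model_c": ["BI-RADS", "Margin", "Age", "Density", "Shape"],
}
print(consensus_ranking(toy_rankings)[:3])
```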
Test Data Results
• Predictive Model Building: 15% of data used for testing
Accuracy, Performance Measures and Feature Ranking based on test data
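The 70% training and 15% testing proportions can be sketched as a shuffled index split. The slides do not say how the remaining 15% was used, so it is returned here simply as a remainder (for example, for validation); the function name and seed are hypothetical.

```python
import random


def split_indices(n_rows, train_frac=0.70, test_frac=0.15, seed=0):
    """Shuffle row indices and split them into train, test, and remainder."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * n_rows)
    n_test = int(test_frac * n_rows)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    remainder = indices[n_train + n_test:]  # assumption: e.g. a validation set
    return train, test, remainder


# 961 is the number of observations in the mammographic mass example.
train, test, remainder = split_indices(961)
print(len(train), len(test), len(remainder))
```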
• Case Study 1 : Stem Cells Project
In collaboration with Prof. Karen Aboody’s Lab @COH
Stem Cell Project
Determine all experimentally observed
factors that may be influencing the
neural stem cells’ coverage of tumor
sites.
[8] Metz MZ, Gutova M, Lacey SF, Abramyants Y, Vo T, Gilchrist M, Tirughana R, Ghoda LY, Barish ME, Brown
CE, Najbauer J, Potter PM, Portnow J, Synold TW, Aboody KS. Neural stem cell mediated delivery of
irinotecan-activating carboxylesterases to glioma: Implications for clinical use. Stem Cells Transl Med. 2013
Dec;2(12):983-92.
Experimental Factors
Summary of ML Results
• Results are based on training data, with % of tumor coverage as the
class variable with 2 classes (covered vs. not covered).
Most Influential Factors
• Using consensus analysis, we find that the factors influencing
tumor coverage the most are:
  Tumor_NumberCell
  LogOne
  LogTwo
  Tumor_Age
  Sex
Test Data Results
Cluster Analysis
[Figure labels: Threshold, Mice]
• Case Study 2 : DNA Methylation Project
In collaboration with Prof. Rama Natarajan’s Lab @COH
Epigenetics
• Epigenetics is the study of heritable changes in gene
function during cell replication that do not involve any
change in the underlying DNA sequence [9].
• Epigenetic mechanisms such as DNA methylation (DNAm) vary
at specific genomic locations in human diseases such as cancer
and diabetes.
• DNAm is known to change the DNA and chromatin structure
in cancer and diabetic patients relative to healthy subjects.
[9] Armstrong, L. Epigenetics. Garland Science, New York (2014).
DNA Methylation
Adapted from Illumina data sheet
Applications of DNAm
• DNAm datasets generated with Infinium HumanMethylation 27K
BeadChips and obtained from GEO [10] are used to validate our approach.
• The dataset contains DNA methylation profiles across approximately
27,000 CpGs in smear cells from the uterine cervix (liquid-based
cytology samples), obtained from 48 women. All women tested positive
for the human papilloma virus (HPV+). Of the 48 samples, 24 were
cytologically normal while the other 24 exhibited morphological
transformation (cervical intraepithelial neoplasia of grade 2 or
higher, CIN2+).
[10] Teschendorff, A. E. and Widschwendter, M. Differential variability improves the identification of cancer risk
markers in DNA methylation studies profiling precursor cancer lesions. Bioinformatics, 28:11, 1487-94 (2012).
DNAm Results
• Bartlett’s test was applied to identify CpGs (features) whose
variance differed between control and case patients (GSE37020).
• This reduced the number of dimensions to just over 22% of the
original number of measured features (6114 of 27578 CpGs). The
size of the reduction depends on the variance across patient
samples [11].
| DNA Analysis Step | Initial # of CpGs | Final # of CpGs | Number of Groups |
| --- | --- | --- | --- |
| Variance Filtering (Bartlett’s test) | 27578 | 6114 | 2 |
[11] Zhuang, J, Widschwendter, M. and Teschendorff, A. E. A comparison of feature selection and classification
methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformatics, 13:59, 1-14
(2012).
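For a single CpG, the variance-filtering step can be sketched with Bartlett's test on two groups. This is a minimal standard-library sketch under stated assumptions: the function name and the toy methylation values are made up, and the significance cutoff actually used in the project is not given in the slides.

```python
import math
from statistics import variance


def bartlett_two_groups(group_a, group_b):
    """Bartlett's test for equal variances between two groups.

    Returns (statistic, p_value). With two groups the statistic follows
    a chi-square distribution with 1 degree of freedom under the null,
    so the p-value can be written with the complementary error function.
    Groups with zero variance are not handled (log of zero).
    """
    groups = [group_a, group_b]
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variances (n - 1)
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    statistic = max(numerator / correction, 0.0)  # guard against rounding below zero
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up methylation values for one CpG (not taken from GSE37020).
normal = [0.10, 0.11, 0.09, 0.10, 0.12, 0.10]
cases = [0.05, 0.30, 0.12, 0.45, 0.08, 0.25]
statistic, p_value = bartlett_two_groups(normal, cases)
print(statistic, p_value)

# Fraction of CpGs retained, using the counts from the table.
retained = 6114 / 27578
print(f"{retained:.1%}")
```

Applied per CpG, the p-values would then be thresholded (possibly after multiple-testing correction) to keep only the differentially variable features.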
DNAm Results Contd.
• To refine the features, we use the variable importance option in
H2O’s deep learning, which ranks all the features used to build
the deep learning model.
• The variable importance method (Gedeon) considers the
weights connecting the input features to the first two hidden
layers.
| DNA Analysis Step | Initial # of CpGs | Final # of CpGs | Number of Groups |
| --- | --- | --- | --- |
| Classification Analysis (multiple refinements) | 6114 | 368 | 2 |
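The idea behind a Gedeon-style importance score can be sketched as follows: normalize the absolute weights feeding each unit, then propagate each input's share through the first two hidden layers. This is one reading of the magnitude-based measure and has not been checked against H2O's implementation; the function names and toy weights are hypothetical.

```python
def column_normalize(weights):
    """Divide each |weight| by the sum of |weights| feeding the same target unit.

    `weights[i][j]` is the weight from source unit i to target unit j.
    Target units whose incoming weights are all zero are not handled.
    """
    n_targets = len(weights[0])
    totals = [sum(abs(row[j]) for row in weights) for j in range(n_targets)]
    return [[abs(row[j]) / totals[j] for j in range(n_targets)] for row in weights]


def gedeon_importance(w_input_hidden1, w_hidden1_hidden2):
    """Score each input by its weight share carried into the second hidden layer."""
    p1 = column_normalize(w_input_hidden1)
    p2 = column_normalize(w_hidden1_hidden2)
    raw = []
    for input_row in p1:
        # Sum over hidden-1 units j and hidden-2 units k of P1[i][j] * P2[j][k].
        raw.append(sum(share * sum(p2[j]) for j, share in enumerate(input_row)))
    scale = sum(raw)
    return [r / scale for r in raw]  # normalized to sum to 1 (a choice made here)


# Toy network: 3 inputs, 2 units in each of the first two hidden layers.
w1 = [[0.9, -0.8], [0.1, 0.1], [0.0, -0.1]]
w2 = [[0.5, -0.5], [0.5, 0.5]]
importance = gedeon_importance(w1, w2)
print(importance)
```

Ranking features by such a score and retraining on the top-ranked subset is one way the repeated refinement from 6114 to 368 CpGs could proceed.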
H2O-ML Results
Accuracy: Training & Cross Validation
Genes & Chromosomes: Top 10 CpGs
| Top Features after Refinement | CHR | Gene Name |
| --- | --- | --- |
| cg00025138 | 14 | MAP3K9 |
| cg00080012 | 11 | EED |
| cg00112517 | 17 | PPP1R1B |
| cg00212549 | 3 | SEMA5B |
| cg00234616 | 2 | TLX2 |
| cg00480356 | 15 | HYPK |
| cg00489401 | 5 | FLT4 |
| cg00501366 | 17 | ALOX12B |
| cg00509670 | 14 | PAX9 |
| cg00579393 | 2 | REG1B |
• Case Study 3 : Image-based detection use cases
  Radiology Images
  Pathology Images
In collaboration with Dr. Ammar Chaudhry & Dr. Raju Pillai @COH
Deep Learning Models
Example of Deep Learning Model Architecture with Convolutional Neural Network Layers
Example of Transfer Learning
[12] Simonyan, K and Zisserman, A, “Very Deep Convolutional Networks for Large-Scale Image Recognition”,
arXiv:1409.1556
[13] Kermany, D; Goldbaum, M, et al., “Identifying Medical Diagnoses and Treatable Diseases by Image-Based
Deep Learning”, Cell, 172 (5), 1122-1131 (2018).
Classification of Radiology Images
• Chest X-ray images of pediatric patients (1-5 years old),
Guangzhou Women and Children’s Medical Center,
Guangzhou [14].
Image classes: Normal, Viral Pneumonia, Bacterial Pneumonia
[14] Kermany, D; Zhang, K and Goldbaum, M, “Labeled Optical Coherence Tomography (OCT) and Chest X-Ray
Images for Classification”, Mendeley Data, v2, http://dx.doi.org/10.17632/rscbjbr9sj.2
Radiology Images – Results
• Trained a deep neural network [VGG16 CNN (minus the top
layer) + Dense (256 neurons) + Dropout (0.5) + Dense (3
neurons)] with 3600 X-ray images and validated it with 300
images. A total of 623 images were held out as the testing dataset.
Model accuracy on the training and validation datasets
Confusion matrix and metrics on the testing dataset
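The architecture described above could be written in Keras roughly as follows. This is a sketch under stated assumptions, not the code used for the reported results: the 224x224 input size, the Flatten layer, the activations, the frozen base, and the optimizer are all assumptions, and `weights=None` is used only so the sketch runs without downloading pretrained weights.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout, Flatten

# Transfer learning would normally load pretrained weights here
# (e.g. weights="imagenet"); weights=None keeps the sketch offline.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # assumption: convolutional base frozen during training

inputs = Input(shape=(224, 224, 3))       # assumed input size
x = base(inputs, training=False)
x = Flatten()(x)                          # assumption: flatten before the dense head
x = Dense(256, activation="relu")(x)      # activation is an assumption
x = Dropout(0.5)(x)
outputs = Dense(3, activation="softmax")(x)  # normal / viral / bacterial pneumonia

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

Freezing the convolutional base means only the small dense head is trained, which suits a dataset of a few thousand images.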
Classification of Pathology Images
[15] https://github.com/Narasimha1997/Blood-Cell-type-identification-using-CNN-classifier
Pathology Images – Results
• Trained a deep neural network [customized CNN + Dense (128
neurons) + Dropout (0.5) + Dense (4 neurons)] with 4800
pathology images and validated it with 960 images. A total of 400
images were held out as the testing dataset.
• Server with an NVIDIA Quadro P5000 GPU
Confusion matrix and metrics on the testing dataset
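Metrics of the kind reported alongside a confusion matrix can be derived from the matrix itself. The sketch below computes accuracy and per-class precision, recall, and F1 for a four-class problem; the function name and the counts are made up for illustration and are not the results from the slides.

```python
def per_class_metrics(confusion):
    """Compute accuracy and per-class precision, recall, and F1.

    `confusion[i][j]` is the number of samples of true class i that
    were predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        predicted_c = sum(confusion[r][c] for r in range(n))  # column sum
        actual_c = sum(confusion[c])                          # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return correct / total, metrics


# Made-up confusion matrix for four classes, 10 samples per class.
toy_confusion = [
    [8, 1, 1, 0],
    [0, 9, 0, 1],
    [2, 0, 7, 1],
    [0, 0, 1, 9],
]
accuracy, metrics = per_class_metrics(toy_confusion)
print(accuracy)
for class_index, m in enumerate(metrics):
    print(class_index, m)
```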
Quantitative Image Analysis in Digital Pathology
[16] Coudray, N; Ocampo, PS; et al., “Classification and Mutation Prediction from Non-Small Cell Lung
Cancer Histopathology Images using Deep Learning”, Nature Medicine (Sept. 2018),
https://doi.org/10.1038/s41591-018-0177-5.
Summary
• SPIRIT-SA is an end-to-end scientific analytics platform
developed to provide a seamless user experience for analyzing
biomedical datasets.
• A key feature of the platform is its rule-based approach for
selecting visualization techniques and machine learning
algorithms appropriate for a given dataset.
• This guided approach to choosing visualization methods and
machine learning models makes prediction and trend analysis
as seamless as possible for users in translational research
projects.