Radiomics and Deep Learning for Lung Cancer Screening
Wookjin Choi, PhD
Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065
KOCSEA Technical Symposium 2017, Nov 11, Las Vegas, NV
Lung Cancer Screening
• Early detection of lung cancer by LDCT can reduce mortality
  – LDCT dramatically increases the number of indeterminate pulmonary nodules (PNs)
• Known features correlated with PN malignancy
  – Size, growth rate
  – Calcification, enhancement, solidity → texture features
  – Boundary margins (spiculation, lobulation) → shape and appearance features
Benign pattern of calcification
Malignant nodules
Benign nodules
Images from radiologyassistant.nl, AJR Am J Roentgenol. 2003 May;180(5):1255-63, and AJR Am J Roentgenol. 2002 May;178(5):1053-7.
Lung cancer screening competitions
LUNGx Challenge 2015
• SPIE, AAPM, and NCI
• 10 cases for calibration set with outcome
• 60 cases for test set without outcome
• Location of nodules
• Radiomics
• 11 teams
Kaggle Data Science Bowl 2017
• Booz | Allen | Hamilton and Kaggle
• Stage 1 – 1595 cases with outcome
• Stage 2 – 506 cases without outcome
• Only images and outcome
• Deep learning
• Prizes total $1,000,000
• 1,972 teams
Radiomics
Hugo J. W. L. Aerts et al., Nature Communications 5, Article number: 4006, June 2014
A high-throughput quantitative image analysis
• Lambin, et al. 2012. Eur J Cancer 48: 441-6.
• The automatic extraction of a large number of image features from medical images
• Hypothesis: these image features could capture additional information not currently used that has prognostic value
Data set
A subset of LIDC-IDRI from TCIA
• Multi-institution data
• Four radiologists detected and contoured PNs
• Consensus contour: generated by STAPLE using 2 or more contours of PN
• Biopsy-proven ground truth or 2 years of stable PN
• 36 benign and 43 malignant cases, 7 missing contours (5 benign and 2 malignant)
• 72 cases evaluated (31 benign and 41 malignant cases)
LIDC-IDRI: Lung Image Database Consortium image collection, TCIA: The Cancer Imaging Archive, STAPLE: Simultaneous Truth and Performance Level Estimation
Data from LIDC-IDRI. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX
Group | # Pts
Total | 1,010
Having diagnosis data | 157
Primary cancer | 43
  biopsy-proven | 42
  progression | 1
Benign | 36
  biopsy-proven | 7
  2 yrs of stable PN | 26
  progression | 3
Metastatic cancer or unknown | 78
ACR Lung-RADS
Summary of Lung-RADS categorization for baseline screening
Category | Baseline Screening | Malignancy
1 | No PNs; PNs with calcification | Negative, <1% chance of malignancy
2 | Solid/part-solid: <6 mm; GGN: <20 mm | Benign appearance, <1% chance of malignancy
3 | Solid: ≥6 to <8 mm; Part-solid: ≥6 mm with solid component <6 mm; GGN: ≥20 mm | Probably benign, 1-2% chance of malignancy
4A | Solid: ≥8 to <15 mm; Part-solid: ≥8 mm with solid component ≥6 and <8 mm | Suspicious, 5-15% chance of malignancy
4B | Solid: ≥15 mm; Part-solid: solid component ≥8 mm | >15% chance of malignancy
4X | Category 3 or 4 PNs with suspicious features (e.g. enlarged lymph nodes) or suspicious imaging findings (e.g. spiculation) | >15% chance of malignancy
ACR: American College of Radiology, Lung-RADS: Lung CT Screening Reporting and Data System
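The size thresholds in the Lung-RADS summary can be written as a small lookup function. This is a minimal sketch that encodes only the rows shown on the slide. The function name, argument names and nodule-type labels are hypothetical, and part-solid nodules of at least 6 mm are graded by solid-component size alone, which is a simplifying assumption for cases the summary does not spell out.

```python
def lung_rads_baseline(nodule_type, size_mm=0.0, solid_mm=0.0, suspicious=False):
    """Return a baseline Lung-RADS category string for a single nodule.

    nodule_type: "none", "calcified", "solid", "part-solid" or "ggn" (labels are illustrative).
    size_mm:     overall nodule diameter in mm.
    solid_mm:    diameter of the solid component (part-solid nodules only).
    suspicious:  additional suspicious features; upgrades category 3 or 4 to 4X.
    """
    if nodule_type in ("none", "calcified"):
        return "1"
    if nodule_type == "ggn":
        category = "2" if size_mm < 20 else "3"
    elif nodule_type == "solid":
        if size_mm < 6:
            category = "2"
        elif size_mm < 8:
            category = "3"
        elif size_mm < 15:
            category = "4A"
        else:
            category = "4B"
    elif nodule_type == "part-solid":
        if size_mm < 6:
            category = "2"
        elif solid_mm < 6:      # assumption: graded by solid component from here on
            category = "3"
        elif solid_mm < 8:
            category = "4A"
        else:
            category = "4B"
    else:
        raise ValueError(f"unknown nodule type: {nodule_type!r}")
    if suspicious and category in ("3", "4A", "4B"):
        return "4X"
    return category


print(lung_rads_baseline("solid", size_mm=7))                   # category for a 7 mm solid nodule
print(lung_rads_baseline("part-solid", size_mm=10, solid_mm=7))
```

A rule table like this uses size and nodule type only, which is the baseline that the radiomics model in the following slides is compared against.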
Radiomics for Lung Cancer Screening
• Radiomic features from 3D volume and 2D axial slice with largest area (n=103)
– Shape: 40 features (3D: 26 and 2D: 14)
– Texture: 36 features (GLCM: 16 and GLRM: 20)
– Intensity: 18 features (3D: 9 and 2D: 9)
– Shape+Intensity: 9 features, shape features weighted by intensity using image moment (3D: 5 and 2D: 4)
GLCM: gray level co-occurrence matrix, GLRM: gray level run-length matrix
[Figure: example shape features (3D and 2D), texture features (GLCM and GLRM), and intensity features]
Prediction model
• Distinctive features (n=50)
  – Hierarchical clustering using Pearson correlation
  – 9 shape, 26 texture, 8 intensity, and 7 shape+intensity features
  – 15 significant features after Bonferroni correction
• SVM classification coupled with LASSO feature selection
  – Selected 10 most important features by 10-fold CV of the LASSO
  – Radial basis function kernel (γ = 0.001 and C = 64)
  – 10 times 10-fold CV
SVM: Support vector machine, LASSO: Least absolute shrinkage and selection operator, CV: Cross validation
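The SVM-LASSO setup described above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' code. The data are synthetic (72 cases, 50 features), the hierarchical clustering step is omitted, and nesting the LASSO selection inside each CV fold is a design choice made here to avoid selection bias. Only the RBF kernel, γ = 0.001, C = 64, the 10 selected features and the repeated 10-fold CV come from the slide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_svm_lasso(n_features=10, gamma=0.001, C=64):
    """Standardize, keep the n_features largest LASSO coefficients, then fit an RBF SVM."""
    selector = SelectFromModel(
        LassoCV(cv=10, max_iter=10000),   # 10-fold CV picks the LASSO penalty
        max_features=n_features,
        threshold=-np.inf,                # rank by |coefficient| only
    )
    return make_pipeline(StandardScaler(), selector, SVC(kernel="rbf", gamma=gamma, C=C))


def repeated_cv_accuracy(model, X, y, n_splits=10, n_repeats=10, seed=0):
    """Mean and SD of per-repeat accuracy over repeated stratified k-fold CV."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    per_repeat = scores.reshape(n_repeats, n_splits).mean(axis=1)
    return per_repeat.mean(), per_repeat.std()


# Synthetic stand-in for the 72-case, 50-feature radiomics table (not the real data).
X, y = make_classification(n_samples=72, n_features=50, n_informative=5, random_state=0)
mean_acc, sd_acc = repeated_cv_accuracy(build_svm_lasso(), X, y, n_repeats=2)
print(f"accuracy: {mean_acc:.3f} +/- {sd_acc:.3f}")
```

Setting `n_repeats=10` would match the 10 times 10-fold scheme; two repeats are used here only to keep the example fast.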
Performance of the SVM-LASSO model with increasing number of features in the 10×10-fold CV
CV: Cross validation, SVM: Support Vector Machine
Performance of the SVM-LASSO models using the two important features and compared with Lung-RADS
Prediction Model | CV scheme | Sensitivity | Specificity | Accuracy | AUC | # of Features
Lung-RADS | - | 73.3% | 70.4% | 72.2% | 0.74 | 4
SVM-LASSO | 10×10-fold | 87.9±2.5% | 78.2±1.6% | 83.7±1.7% | 0.86±0.01 | 2
SVM-LASSO | 20×5-fold | 86.0±3.3% | 75.9±3.9% | 81.6±2.6% | 0.85±0.02 | 2
SVM-LASSO | 50×2-fold | 83.4±4.9% | 71.9±8.8% | 78.5±5.1% | 0.84±0.03 | 2
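The columns in the table are standard confusion-matrix quantities, reported as mean ± SD across CV repeats. The sketch below shows that calculation. The counts are hypothetical and chosen only to illustrate the arithmetic; they are not the study's results.

```python
from statistics import mean, stdev


def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy, with malignant as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def summarize(repeats):
    """Mean and sample SD of each metric across CV repeats."""
    return {k: (mean(r[k] for r in repeats), stdev(r[k] for r in repeats))
            for k in repeats[0]}


# Hypothetical confusion counts from three CV repeats (illustrative only).
runs = [
    screening_metrics(tp=36, fn=5, tn=24, fp=7),
    screening_metrics(tp=35, fn=6, tn=25, fp=6),
    screening_metrics(tp=37, fn=4, tn=24, fp=7),
]
for name, (m, s) in summarize(runs).items():
    print(f"{name}: {100 * m:.1f} +/- {100 * s:.1f}%")
```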
• BB_AP
  – Highly correlated with the axial longest diameter and its perpendicular diameter (r = 0.96; larger values indicate more malignant)
• SD_IDM
  – Directional variation of local homogeneity (smaller values indicate more malignant)
BB: Bounding Box, AP: Anterior-Posterior, SD: Standard Deviation, IDM: Inverse Difference Moment
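The two selected features can be sketched with NumPy under explicit assumptions. BB_AP is taken to be the axis-aligned bounding-box extent of the nodule mask along the anterior-posterior axis. SD_IDM is taken to be the standard deviation, over the 13 unique 3D neighbor directions, of the GLCM inverse difference moment, sum of p(i,j) / (1 + (i-j)^2). The array axis order (SI, AP, LR), the voxel spacing, the number of gray levels and the function names are hypothetical, and the image is random rather than patient data.

```python
import itertools
import numpy as np


def bounding_box_mm(mask, spacing_mm):
    """Axis-aligned bounding-box extent of a binary mask, in mm per axis."""
    idx = np.argwhere(mask)
    extent_vox = idx.max(axis=0) - idx.min(axis=0) + 1
    return extent_vox * np.asarray(spacing_mm, dtype=float)


def idm(image, mask, offset, levels):
    """Inverse difference moment of the symmetric GLCM for one voxel offset."""
    glcm = np.zeros((levels, levels))
    shape = np.array(image.shape)
    for p in np.argwhere(mask):
        q = p + offset
        if np.all(q >= 0) and np.all(q < shape) and mask[tuple(q)]:
            i, j = image[tuple(p)], image[tuple(q)]
            glcm[i, j] += 1
            glcm[j, i] += 1          # count each pair in both orders
    total = glcm.sum()
    if total == 0:
        raise ValueError("no voxel pairs inside the mask for this offset")
    glcm /= total
    i, j = np.indices(glcm.shape)
    return float((glcm / (1.0 + (i - j) ** 2)).sum())


# 13 unique directions: one from each pair of opposite neighbor offsets.
offsets = [np.array(o) for o in itertools.product((-1, 0, 1), repeat=3) if o > (0, 0, 0)]

rng = np.random.default_rng(0)
image = rng.integers(0, 8, size=(12, 12, 12))   # already quantized to 8 gray levels
mask = np.zeros(image.shape, dtype=bool)
mask[3:9, 2:10, 4:8] = True                      # toy "nodule" region

bb = bounding_box_mm(mask, spacing_mm=(2.5, 0.7, 0.7))   # axes assumed (SI, AP, LR)
bb_ap = bb[1]
idm_values = [idm(image, mask, o, levels=8) for o in offsets]
mean_idm, sd_idm = float(np.mean(idm_values)), float(np.std(idm_values))
print(f"BB_AP = {bb_ap:.1f} mm, Mean_IDM = {mean_idm:.3f}, SD_IDM = {sd_idm:.3f}")
```

IDM equals 1 for a perfectly homogeneous region and falls as neighboring gray levels differ, so a small SD_IDM means the local homogeneity is similar in every direction.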
Scatter plot of the two features and the classification curve by the SVM-LASSO model
Cases misclassified by Lung-RADS but correctly classified by the SVM-LASSO model
BB: Bounding Box, SD: Standard Deviation, AP: Anterior-Posterior, SI: Superior-Inferior, IDM: Inverse Difference Moment
Scale bar is 10 mm. Spiculation: 1 (no) to 5 (marked) scale
BB_AP 10 mm
Benign: IDM_LR: 0.172, IDM_AP: 0.182, IDM_SI: 0.284, Mean_IDM: 0.174, SD_IDM: 0.033
Malignant: IDM_LR: 0.116, IDM_AP: 0.136, IDM_SI: 0.138, Mean_IDM: 0.111, SD_IDM: 0.014
[Figure: panels a-f, axial (LR-AP), sagittal (AP-SI), and coronal (LR-SI) views]
BB_AP 17 mm
Benign: IDM_LR: 0.276, IDM_AP: 0.316, IDM_SI: 0.210, Mean_IDM: 0.220, SD_IDM: 0.030
Malignant: IDM_LR: 0.234, IDM_AP: 0.215, IDM_SI: 0.236, Mean_IDM: 0.203, SD_IDM: 0.020
[Figure: panels a-f, axial (LR-AP), sagittal (AP-SI), and coronal (LR-SI) views]
Comparison with recent models
Study | Dataset | Model description
Hawkins et al. (2016) | Baseline CT scans of 261 pts in NLST; biopsy-proven ground truth or 2 years of stable PN | 23 RIDER stable radiomic features; random forest classifier; 10×10-fold CV
Ma et al. (2016) | LIDC 72 pts; biopsy-proven ground truth or 2 years of stable PN | 583 radiomic features; random forest classifier; 10-fold CV
Buty et al. (2016) | LIDC 2054 PNs; ground truth by radiologist's assessment | Spherical harmonics (100, 150, and 400 shape features) and AlexNet (4096 appearance features); random forest classifier; 10-fold CV
Kumar et al. (2015) | LIDC 97 pts, including metastatic tumors; biopsy-proven ground truth or 2 years of stable PN | Deep convolutional neural network model (5000 features); 10-fold CV
Proposed | LIDC 72 pts; biopsy-proven ground truth or 2 years of stable PN | 2 important features; LASSO feature selection and SVM classification; 10×10-fold CV
Comparison with recent models
Study | Sensitivity | Specificity | Accuracy | AUC
Hawkins et al. (2016) | 51.7% | 92.9% | 80.0% | 0.83
Ma et al. (2016) | 80.0% | 85.5% | 82.7% | -
Buty et al. (2016) | - | - | 82.4% | -
Kumar et al. (2015) | 79.1% | 76.1% | 77.5% | -
Proposed | 87.9% | 78.2% | 83.7% | 0.86
• A large number of features relative to the number of patients
  – May cause model overfitting
• No discussion of how the selected features might have contributed to the prediction of malignancy
• Deep learning needs a large amount of training data to avoid overfitting, and the benefit of transfer learning is questionable
Deep Learning
One of the greatest breakthroughs in recent years
Convolutional Neural Networks (CNN)
Large annotated patient datasets are available for deep learning
Data Science Bowl 2017
• Two 3D Fully Convolutional Neural Network models
  – Nodule Detection network
    • Trained using LIDC-IDRI database, 883 cases
  – Nodule Classification network
    • Transfer learning applied using the Nodule Detection model and trained using 2/3 of Kaggle dataset, 1063 cases
Nodule detection network
Layer | Kernel | Stride | #Filter
I | - | - | 1
C1 | 5 | 1 | 8
C2 | 5 | 1 | 16
M1 | 2 | 2 | 16
C3 | 5 | 1 | 16
C4 | 5 | 1 | 32
M2 | 2 | 2 | 32
FC1 | 2 | 1 | 256
FC2 | 1 | 1 | 1
Nodule classification network
Layer | Kernel | Stride | #Filter
I | - | - | 1
C1 | 5 | 1 | 16
C2 | 5 | 1 | 16
M1 | 2 | 2 | 16
C3 | 5 | 1 | 32
C4 | 5 | 1 | 32
M2 | 2 | 2 | 32
FC1 | 2 | 1 | 256
FC2 | 1 | 1 | 64
FC3 | 1 | 1 | 1
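The layer tables can be checked with simple shape arithmetic. The sketch below assumes cubic kernels, unpadded ("valid") convolutions, max pooling for the M layers, FC layers implemented as convolutions, and one bias per filter. None of these details are stated on the slide. Under those assumptions a 32-voxel cubic patch shrinks to a single output voxel at FC1, so 32 is used as a hypothetical input size.

```python
# (name, kind, kernel, stride, filters), copied from the layer tables
DETECTION = [
    ("C1", "conv", 5, 1, 8), ("C2", "conv", 5, 1, 16), ("M1", "pool", 2, 2, 16),
    ("C3", "conv", 5, 1, 16), ("C4", "conv", 5, 1, 32), ("M2", "pool", 2, 2, 32),
    ("FC1", "conv", 2, 1, 256), ("FC2", "conv", 1, 1, 1),
]
CLASSIFICATION = [
    ("C1", "conv", 5, 1, 16), ("C2", "conv", 5, 1, 16), ("M1", "pool", 2, 2, 16),
    ("C3", "conv", 5, 1, 32), ("C4", "conv", 5, 1, 32), ("M2", "pool", 2, 2, 32),
    ("FC1", "conv", 2, 1, 256), ("FC2", "conv", 1, 1, 64), ("FC3", "conv", 1, 1, 1),
]


def trace(layers, size, in_channels=1):
    """Per-layer (name, output size per axis, channels, parameter count)."""
    rows, channels = [], in_channels
    for name, kind, kernel, stride, filters in layers:
        size = (size - kernel) // stride + 1          # valid (unpadded) output size
        params = (kernel ** 3 * channels + 1) * filters if kind == "conv" else 0
        channels = filters
        rows.append((name, size, channels, params))
    return rows


for label, net in (("detection", DETECTION), ("classification", CLASSIFICATION)):
    rows = trace(net, size=32)                        # 32 is an assumed patch size
    print(label, [r[1] for r in rows], "params:", sum(r[3] for r in rows))
```

Because every layer is convolutional, the same weights can slide over a full CT volume and produce a map of outputs rather than one score per patch, which is the usual reason for a fully convolutional design.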
Trained filters – Nodule Detection
[Figure: trained filters of the 1st, 2nd, 3rd, and 4th layers]
3-fold cross-validation: sensitivity 95.1% at 5 false positives per scan
Trained filters – Nodule Classification
[Figure: trained filters of the 1st, 2nd, 3rd, and 4th layers]
3-fold cross-validation: accuracy 67.4%
Failed to win the award
Ranked 99th out of 1972 teams (Top 6%, Bronze medal)
Log loss score
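The competition ranked teams by log loss (binary cross-entropy) on the predicted cancer probability. A minimal sketch of the metric follows. The clipping constant `eps` and the example values are illustrative assumptions.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


labels = [1, 0, 0, 1]                              # hypothetical outcomes
cautious = log_loss(labels, [0.6, 0.4, 0.3, 0.4])
overconfident = log_loss(labels, [0.99, 0.01, 0.01, 0.01])   # last case confidently wrong
print(f"cautious: {cautious:.3f}, overconfident: {overconfident:.3f}")
```

A single confident wrong prediction dominates the score, so a model that overfits and becomes overconfident can drop sharply on unseen test cases.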
Top 10 teams
• https://datasciencebowl.com/2017algorithms/
Overfitting?
Public leaderboard – about 1% of test data
Private leaderboard – about 99% of test data
Future Works
• Candidate feature approach
  – Quantification of spiculated or lobulated margins
  – Calcification, attachment, solidity and cavitation of PNs
• Integrate plasma biomarkers in the SVM-LASSO model
  – Difficult to diagnose small PNs, 50% accuracy when PN size < 15 mm
  – Combining plasma biomarkers with clinical variables and image features (AUC = 0.95)
Jiang et al. Int J Cancer. 2017. [published online ahead of print 2017/06/06].
Spiculation Quantification based on Shape Analysis
Mean and Gaussian curvature Spherical parameterization
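For reference, the standard differential-geometry definitions of the two curvature measures named here, in terms of the principal curvatures \kappa_1 and \kappa_2 at a surface point. How the slides combine them into a spiculation score is not stated in the transcript, so only the textbook definitions are given.

```latex
H = \frac{\kappa_1 + \kappa_2}{2}, \qquad K = \kappa_1 \, \kappa_2
```

Here H is the mean curvature and K is the Gaussian curvature. K is positive at peak-like or pit-like points and negative at saddle points.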
Quantifying tumor morphological change with Jacobian map
Conclusion
• Developed an SVM-LASSO model and a deep learning model to predict malignancy of indeterminate PNs
• Deep learning is feasible for lung cancer screening but still needs more data
• A multicenter clinical trial in a large population is required
Acknowledgements
Memorial Sloan Kettering Cancer Center
– Wei Lu, PhD
– Sadegh Riyahi, PhD
– Jung Hun Oh, PhD
– Saad Nadeem, PhD
– James G. Mechalakos, PhD
– Joseph O. Deasy, PhD
– Andreas Rimner, MD
– Chia-ju Liu, MD
– Prasad Adusumilli, MD
– Wolfgang Weber, MD
University of Maryland School of Medicine
– Howard Zhang, PhD
– Feng Jiang, MD, PhD
– Wengen Chen, MD, PhD
– Charles White, MD
NIH/NCI Grant R01 CA172638 and NIH/NCI Cancer Center Support Grant P30 CA008748
Thank you!
Q & A
ROC curve analysis on the best model of SVM-LASSO and Lung-RADS
The box plots show the difference between benign and malignant PNs for the selected features (BB_AP and SD_IDM) and the largest diameter. P-values were obtained by the Wilcoxon rank sum test and adjusted using Bonferroni correction
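The test described in this caption can be sketched in plain Python. The version below uses the normal approximation to the Wilcoxon rank-sum statistic with midranks for ties and no tie or continuity correction, then applies a Bonferroni adjustment. The feature values are made up, and the use of 50 tests (the number of distinctive features) as the Bonferroni factor is an assumption.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, midranks for ties)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2                  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p_values, n_tests=None):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


# Hypothetical BB_AP values in mm (not study data).
benign = [6.1, 7.4, 8.0, 8.8, 9.5, 10.2, 11.0, 12.3]
malignant = [11.5, 13.0, 14.2, 15.8, 16.4, 17.9, 19.3, 21.0]
p_raw = rank_sum_p(benign, malignant)
p_adj = bonferroni([p_raw], n_tests=50)[0]
print(f"raw p = {p_raw:.5f}, Bonferroni-adjusted p = {p_adj:.5f}")
```

For real analyses with small groups or many ties, an exact or tie-corrected implementation from a statistics library would be the safer choice.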
Significant features
Rank Feature name Type P-value AUC Correlation
1 BB_AP Shape 0.00070 0.81 +
2 BB_SI Shape 0.0012 0.80 +
3 SD_IDM Texture 0.0018 0.79 -
4 Weighted Principal Moments2 Shape+Intensity 0.0022 0.78 +
5 Grey Level Nonuniformity Texture 0.0026 0.78 +
6 Oriented BB_SI Shape 0.0027 0.79 +
7 Weighted Principal Moments3 Shape+Intensity 0.0030 0.78 +
8 Low Grey Level Run Emphasis Texture 0.0031 0.78 -
9 SD Run Length Nonuniformity Texture 0.0033 0.78 +
10 SD Low Grey Level Run Emphasis Texture 0.017 0.75 -
11 Correlation Texture 0.018 0.75 +
12 IDM Texture 0.020 0.75 +
13 SD Long Run Emphasis Texture 0.024 0.75 -
14 Long Run Low Grey Level Emphasis Texture 0.028 0.75 -
15 Inertia Texture 0.035 0.74 -