Radiomics and Deep Learning for Lung Cancer Screening


Wookjin Choi, PhD, Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065

KOCSEA Technical Symposium 2017, Nov 11, Las Vegas, NV

Lung Cancer Screening

• Early detection of lung cancer by low-dose CT (LDCT) can reduce mortality
– LDCT dramatically increases the number of indeterminate pulmonary nodules (PNs)

• Known features correlated with PN malignancy
– Size, growth rate
– Calcification, enhancement, solidity → texture features
– Boundary margins (spiculation, lobulation) → shape and appearance features

[Figure: benign patterns of calcification, malignant nodules, and benign nodules. Images from radiologyassistant.nl, AJR Am J Roentgenol. 2003 May;180(5):1255-63, and AJR Am J Roentgenol. 2002 May;178(5):1053-7.]

Lung cancer screening competitions

LUNGx Challenge 2015

• SPIE, AAPM, and NCI
• 10 cases for calibration set with outcome
• 60 cases for test set without outcome
• Nodule locations provided
• Radiomics
• 11 teams

Kaggle Data Science Bowl 2017

• Booz Allen Hamilton and Kaggle
• Stage 1 – 1595 cases with outcome
• Stage 2 – 506 cases without outcome
• Only images and outcomes provided
• Deep learning
• Prizes total $1,000,000
• 1,972 teams

https://wiki.cancerimagingarchive.net/display/Public/LUNGx+SPIE-AAPM-NCI+Lung+Nodule+Classification+Challenge

https://www.kaggle.com/c/data-science-bowl-2017

Radiomics

A high-throughput quantitative image analysis (Lambin et al. 2012. Eur J Cancer 48: 441-6.)

• The automatic extraction of a large number of image features from medical images
• Hypothesis: these image features could capture additional information, not currently used, that has prognostic value

Hugo J. W. L. Aerts et al., Nature Communications 5, Article number: 4006, June 2014

Data set

• A subset of LIDC-IDRI from TCIA
• Multi-institution data
• Four radiologists detected and contoured PNs
• Consensus contour: generated by STAPLE using 2 or more contours of a PN
• Ground truth: biopsy-proven or 2 years of stable PN
• 36 benign and 43 malignant cases; 7 cases with missing contours (5 benign and 2 malignant)
• 72 cases evaluated (31 benign and 41 malignant)

LIDC-IDRI: Lung Image Database Consortium image collection, TCIA: The Cancer Imaging Archive, STAPLE: Simultaneous Truth and Performance Level Estimation. Data from LIDC-IDRI. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX

# Pts
Total: 1,010
Having diagnosis data: 157
– Primary cancer: 43 (biopsy-proven 42, progression 1)
– Benign: 36 (biopsy-proven 7, 2 yrs of stable PN 26, progression 3)
– Metastatic cancer or unknown: 78
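The consensus-contour step can be illustrated with a minimal binary STAPLE sketch: an expectation-maximization loop that alternates between estimating the probability that each voxel truly belongs to the PN and estimating each radiologist's sensitivity and specificity. This is not the implementation used to build the data set; the global foreground prior, the initial values of 0.9, the clipping constant, and the function name `staple_binary` are assumptions for illustration.

```python
import numpy as np


def staple_binary(segmentations, n_iter=100, tol=1e-8, eps=1e-6):
    """Simplified binary STAPLE via expectation-maximization.

    segmentations: (n_raters, n_voxels) array of 0/1 labels.
    Returns the consensus mask, the per-voxel foreground probability W,
    and each rater's estimated sensitivity p and specificity q.
    """
    D = np.asarray(segmentations, dtype=float)
    n_raters = D.shape[0]
    f = D.mean()                 # assumed global foreground prior
    p = np.full(n_raters, 0.9)   # assumed initial sensitivities
    q = np.full(n_raters, 0.9)   # assumed initial specificities
    for _ in range(n_iter):
        # E-step: probability that each voxel truly belongs to the PN
        a = f * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - f) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        W = a / (a + b)
        # M-step: re-estimate each rater's sensitivity and specificity
        p_new = np.clip(D @ W / W.sum(), eps, 1 - eps)
        q_new = np.clip((1 - D) @ (1 - W) / (1 - W).sum(), eps, 1 - eps)
        done = max(np.abs(p_new - p).max(), np.abs(q_new - q).max()) < tol
        p, q = p_new, q_new
        if done:
            break
    return W >= 0.5, W, p, q


# Toy example: three hypothetical raters contouring a 1D profile of 8 voxels
raters = [[0, 0, 1, 1, 1, 1, 0, 0],
          [0, 0, 1, 1, 1, 0, 0, 0],
          [0, 1, 1, 1, 1, 1, 0, 0]]
consensus, W, p, q = staple_binary(raters)
```

Unlike a simple majority vote, STAPLE weights each rater by their estimated reliability, so a rater who systematically under-contours (the second one here) ends up with a lower estimated sensitivity.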

ACR Lung-RADS

Summary of Lung-RADS categorization for baseline screening

Category | Baseline screening | Assessment | Malignancy
1 | No PNs; PNs with calcification | Negative | <1% chance of malignancy
2 | Solid/part-solid: <6 mm; GGN: <20 mm | Benign appearance | <1% chance of malignancy
3 | Solid: ≥6 to <8 mm; part-solid: ≥6 mm with solid component <6 mm; GGN: ≥20 mm | Probably benign | 1-2% chance of malignancy
4A | Solid: ≥8 to <15 mm; part-solid: ≥8 mm with solid component ≥6 and <8 mm | Suspicious | 5-15% chance of malignancy
4B | Solid: ≥15 mm; part-solid: solid component ≥8 mm | Suspicious | >15% chance of malignancy
4X | Category 3 or 4 PNs with suspicious features (e.g., enlarged lymph nodes) or suspicious imaging findings (e.g., spiculation) | Suspicious | >15% chance of malignancy

ACR: American College of Radiology, Lung-RADS: Lung CT Screening Reporting and Data System, GGN: ground-glass nodule
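The baseline categorization in the table can be written as a small rule function. This is a simplified sketch that follows the table as transcribed: it ignores new or growing nodules and other Lung-RADS modifiers, treats any part-solid PN with a 6-8 mm solid component as 4A, and uses hypothetical names (`lung_rads_baseline`, a `suspicious` flag standing in for the 4X findings).

```python
def lung_rads_baseline(nodule_type=None, diameter_mm=0.0, solid_mm=0.0,
                       calcified=False, suspicious=False):
    """Map a baseline-screening PN to a Lung-RADS category (simplified sketch).

    nodule_type: None (no PN), "solid", "part-solid", or "ggn".
    solid_mm: diameter of the solid component (part-solid PNs only).
    """
    # Category 1: no PN, or a PN with (benign) calcification
    if nodule_type is None or calcified:
        return "1"
    if nodule_type == "solid":
        if diameter_mm < 6:
            cat = "2"
        elif diameter_mm < 8:
            cat = "3"
        elif diameter_mm < 15:
            cat = "4A"
        else:
            cat = "4B"
    elif nodule_type == "part-solid":
        if diameter_mm < 6:
            cat = "2"
        elif solid_mm < 6:
            cat = "3"
        elif solid_mm < 8:
            cat = "4A"  # assumption: any 6-8 mm solid component maps to 4A
        else:
            cat = "4B"
    elif nodule_type == "ggn":
        cat = "2" if diameter_mm < 20 else "3"
    else:
        raise ValueError(f"unknown nodule type: {nodule_type}")
    # 4X: category 3 or 4 PNs with additional suspicious findings
    if suspicious and cat in ("3", "4A", "4B"):
        cat = "4X"
    return cat


print(lung_rads_baseline("solid", 10))                   # 4A
print(lung_rads_baseline("part-solid", 12, solid_mm=4))  # 3
```

Written this way, it is easy to see that the category relies on only a handful of inputs (PN type, size, solid component, and suspicious findings), which is what the radiomic model below is compared against.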

Radiomics for Lung Cancer Screening

• Radiomic features from the 3D volume and from the 2D axial slice with the largest area (n=103)
– Shape: 40 features (3D: 26 and 2D: 14)
– Texture: 36 features (GLCM: 16 and GLRM: 20)
– Intensity: 18 features (3D: 9 and 2D: 9)
– Shape+Intensity: 9 features, shape features weighted by intensity using image moments (3D: 5 and 2D: 4)

GLCM: gray level co-occurrence matrix, GLRM: gray level run-length matrix

[Figure: examples of shape features (3D and 2D), texture features (GLCM and GLRM), and intensity features]
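Two building blocks of this feature set, selecting the 2D axial slice with the largest PN area and measuring axis-aligned bounding-box extents, can be sketched with NumPy. This assumes a binary PN mask stored in (SI, AP, LR) axis order with known voxel spacing in mm; the function names are hypothetical, and only bounding-box extents are shown, not the full set of 40 shape features.

```python
import numpy as np


def largest_axial_slice(mask):
    """Index of the axial slice with the largest PN area (axis 0 assumed SI)."""
    areas = mask.reshape(mask.shape[0], -1).sum(axis=1)
    return int(np.argmax(areas))


def bounding_box_mm(mask, spacing):
    """Axis-aligned bounding-box extents (mm) of a 3D binary mask.

    spacing: voxel size per axis in mm, in the same (SI, AP, LR) order as mask.
    """
    extents = []
    for axis, sp in enumerate(spacing):
        other_axes = tuple(a for a in range(mask.ndim) if a != axis)
        idx = np.flatnonzero(mask.any(axis=other_axes))  # occupied positions
        extents.append(float((idx[-1] - idx[0] + 1) * sp))
    return dict(zip(("BB_SI", "BB_AP", "BB_LR"), extents))


# Hypothetical PN: a box that is widest on axial slice 4
mask = np.zeros((8, 12, 12), dtype=bool)
mask[2:6, 3:10, 4:8] = True
mask[4, 2:11, 3:9] = True
slice_idx = largest_axial_slice(mask)
bb = bounding_box_mm(mask, spacing=(2.5, 0.7, 0.7))
```

Physical spacing matters here: CT voxels are usually anisotropic (thicker along SI), so extents are converted to mm before they can be compared across scans.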

Prediction model

• Distinctive features (n=50)
– Hierarchical clustering using Pearson correlation
– 9 shape, 26 texture, 8 intensity, and 7 shape+intensity features
– 15 significant features after Bonferroni correction
• SVM classification coupled with LASSO feature selection
– Selected the 10 most important features by 10-fold CV of the LASSO
– Radial basis function kernel (γ = 0.001 and C = 64)
– 10 times 10-fold CV

SVM: Support vector machine, LASSO: Least absolute shrinkage and selection operator, CV: Cross validation
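The redundancy-reduction step (hierarchical clustering with Pearson correlation) can be sketched with SciPy. This is a hypothetical illustration, not the authors' code: it assumes a distance of 1 - |r| between features, average linkage, a fixed number of clusters, and keeps the feature with the highest mean |r| to its cluster mates as each cluster's representative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def distinctive_features(X, n_clusters):
    """Cluster correlated features and keep one representative per cluster.

    X: (n_patients, n_features). Returns sorted column indices to keep.
    """
    r = np.corrcoef(X, rowvar=False)          # feature-by-feature Pearson r
    dist = np.clip(1.0 - np.abs(r), 0.0, None)  # |r| so anti-correlated = redundant
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    keep = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # representative: most correlated, on average, with its cluster mates
        score = np.abs(r[np.ix_(members, members)]).mean(axis=1)
        keep.append(int(members[np.argmax(score)]))
    return sorted(keep)


# Synthetic example: six features built from three independent signals
rng = np.random.default_rng(0)
b = rng.normal(size=(200, 3))
X = np.column_stack([b[:, 0], 2 * b[:, 0], b[:, 1], -b[:, 1], b[:, 2], b[:, 2]])
X = X + 0.05 * rng.normal(size=X.shape)
keep = distinctive_features(X, n_clusters=3)
```

Using |r| rather than r treats a feature and its negation as redundant, which matters for texture features that often come in inverse pairs (e.g., homogeneity and contrast).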

Performance of the SVM-LASSO model

[Figure: performance of the SVM-LASSO model with an increasing number of features in the 10×10-fold CV, and performance using the two important features compared with Lung-RADS]

CV: Cross validation, SVM: Support Vector Machine

Performance of the SVM-LASSO models

Prediction model | Sensitivity | Specificity | Accuracy | AUC | # of features
Lung-RADS | 73.3% | 70.4% | 72.2% | 0.74 | 4
SVM-LASSO, 10×10-fold CV | 87.9±2.5% | 78.2±1.6% | 83.7±1.7% | 0.86±0.01 | 2
SVM-LASSO, 20×5-fold CV | 86.0±3.3% | 75.9±3.9% | 81.6±2.6% | 0.85±0.02 | 2
SVM-LASSO, 50×2-fold CV | 83.4±4.9% | 71.9±8.8% | 78.5±5.1% | 0.84±0.03 | 2
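The repeated cross-validation behind these rows can be sketched with scikit-learn. The RBF kernel with γ = 0.001 and C = 64 and the two selected features come from the slides; everything else is an assumption: standardization, ranking features by |LASSO coefficient| with labels coded 0/1, the LASSO penalty `alpha`, pooling out-of-fold scores for the AUC, and reporting mean ± SD across repetitions. The function name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def svm_lasso_repeated_cv(X, y, n_splits=10, n_repeats=10, n_features=2,
                          alpha=0.01, gamma=0.001, C=64):
    """Repeated stratified k-fold CV of an RBF-SVM on LASSO-ranked features.

    The scaler and the LASSO ranking are refit inside every training fold so
    that test patients never influence feature selection.
    Returns {metric: (mean, SD)} computed across repetitions.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    per_repeat = []
    for r in range(n_repeats):
        pred = np.zeros_like(y)
        score = np.zeros(len(y))
        folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=r)
        for train, test in folds.split(X, y):
            scaler = StandardScaler().fit(X[train])
            Xtr, Xte = scaler.transform(X[train]), scaler.transform(X[test])
            # Rank features by |LASSO coefficient| and keep the top n_features
            coef = Lasso(alpha=alpha).fit(Xtr, y[train]).coef_
            top = np.argsort(-np.abs(coef))[:n_features]
            svm = SVC(kernel="rbf", gamma=gamma, C=C).fit(Xtr[:, top], y[train])
            pred[test] = svm.predict(Xte[:, top])
            score[test] = svm.decision_function(Xte[:, top])
        per_repeat.append([
            np.mean(pred[y == 1] == 1),  # sensitivity
            np.mean(pred[y == 0] == 0),  # specificity
            np.mean(pred == y),          # accuracy
            roc_auc_score(y, score),     # AUC of pooled out-of-fold scores
        ])
    per_repeat = np.array(per_repeat)
    names = ["sensitivity", "specificity", "accuracy", "auc"]
    return {n: (m, s) for n, m, s in
            zip(names, per_repeat.mean(axis=0), per_repeat.std(axis=0))}
```

Setting `(n_splits, n_repeats)` to (10, 10), (5, 20), or (2, 50) mirrors the three SVM-LASSO rows. Each setting uses 100 train/test splits, but with fewer folds each model is trained on a smaller share of the patients.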

• BB_AP
– Highly correlated with the axial longest diameter and its perpendicular diameter (r = 0.96); larger → more malignant
• SD_IDM
– Directional variation of local homogeneity; smaller → more malignant

BB: Bounding Box, AP: Anterior-Posterior, SD: Standard Deviation, IDM: Inverse Difference Moment
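SD_IDM can be illustrated by computing a gray level co-occurrence matrix (GLCM) along each axis, taking its inverse difference moment IDM = Σ P(i,j) / (1 + (i - j)²), and then the mean and SD of the directional values. This NumPy sketch is hypothetical: it assumes (SI, AP, LR) axis order, 8 gray levels, only the three axis-aligned directions, and the population SD. The authors' exact direction set and quantization are not stated in the slides.

```python
import numpy as np

DIRECTIONS = {"LR": (0, 0, 1), "AP": (0, 1, 0), "SI": (1, 0, 0)}  # (SI, AP, LR) axes


def quantize(volume, mask, n_levels=8):
    """Linearly rescale intensities inside the mask to integer levels 0..n_levels-1."""
    vals = volume[mask]
    lo, hi = vals.min(), vals.max()
    q = np.floor((volume - lo) / (hi - lo + 1e-12) * n_levels).astype(int)
    return np.clip(q, 0, n_levels - 1)


def glcm(levels, mask, offset, n_levels=8):
    """Symmetric, normalized GLCM for one voxel offset, restricted to the mask."""
    P = np.zeros((n_levels, n_levels))
    # Pair each voxel x with x + offset; the slices keep both inside the array
    src = tuple(slice(max(0, -d), s - max(0, d)) for d, s in zip(offset, levels.shape))
    dst = tuple(slice(max(0, d), s - max(0, -d)) for d, s in zip(offset, levels.shape))
    both = mask[src] & mask[dst]
    np.add.at(P, (levels[src][both], levels[dst][both]), 1)
    P = P + P.T  # count each pair in both directions
    return P / P.sum()


def idm(P):
    """Inverse difference moment (local homogeneity)."""
    i, j = np.indices(P.shape)
    return float((P / (1.0 + (i - j) ** 2)).sum())


def directional_idm(volume, mask, n_levels=8):
    """Per-direction IDM plus their mean (Mean_IDM) and SD (SD_IDM)."""
    levels = quantize(volume, mask, n_levels)
    idms = {name: idm(glcm(levels, mask, d, n_levels)) for name, d in DIRECTIONS.items()}
    values = np.array(list(idms.values()))
    return idms, values.mean(), values.std()
```

For example, a volume whose intensity changes only along LR is perfectly homogeneous along AP and SI (IDM = 1) but not along LR. The directional IDMs therefore differ, and SD_IDM is greater than zero.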

[Figure: scatter plot of the two features and the classification curve of the SVM-LASSO model]

Cases misclassified by Lung-RADS but correctly classified by the SVM-LASSO model

BB: Bounding Box, SD: Standard Deviation, AP: Anterior-Posterior, SI: Superior-Inferior, LR: Left-Right, IDM: Inverse Difference Moment. Scale bar is 10 mm. Spiculation is rated on a 1 (no) to 5 (marked) scale.

[Figure: benign and malignant PNs with BB_AP of 10 mm in axial, sagittal, and coronal views (panels a–f, with LR, AP, and SI directions marked). Benign: IDM_LR 0.172, IDM_AP 0.182, IDM_SI 0.284, Mean_IDM 0.174, SD_IDM 0.033. Malignant: IDM_LR 0.116, IDM_AP 0.136, IDM_SI 0.138.]
[Figure: benign and malignant PNs with BB_AP of 17 mm in axial, sagittal, and coronal views (panels a–f, with LR, AP, and SI directions marked). Benign: IDM_LR 0.276, IDM_AP 0.316, IDM_SI 0.210. Malignant: IDM_LR 0.234, IDM_AP 0.215, IDM_SI 0.236.]

Comparison with recent models

Study | Dataset | Ground truth | Model description
Hawkins et al. (2016) | Baseline CT scans of 261 pts in NLST | Biopsy-proven or 2 years of stable PN | 23 RIDER stable radiomic features; random forest classifier; 10×10-fold CV
Ma et al. (2016) | LIDC, 72 pts | Biopsy-proven or 2 years of stable PN | 583 radiomic features; random forest classifier; 10-fold CV
Buty et al. (2016) | LIDC, 2054 PNs | Radiologists' assessment | Spherical harmonics (100, 150, and 400 shape features) and AlexNet (4096 appearance features); random forest classifier; 10-fold CV
Kumar et al. (2015) | LIDC, 97 pts, including metastatic tumors | Biopsy-proven or 2 years of stable PN | Deep convolutional neural network model (5000 features); 10-fold CV
Proposed | LIDC, 72 pts | Biopsy-proven or 2 years of stable PN | 2 important features; LASSO feature selection and SVM classification; 10×10-fold CV

Comparison with recent models

Study | Sensitivity | Specificity | Accuracy | AUC
Hawkins et al. (2016) | 51.7% | 92.9% | 80.0% | 0.83
Ma et al. (2016) | 80.0% | 85.5% | 82.7% | –
Buty et al. (2016) | – | – | 82.4% | –
Kumar et al. (2015) | 79.1% | 76.1% | 77.5% | –
Proposed | 87.9% | 78.2% | 83.7% | 0.86

• A large number of features relative to the number of patients
– May cause model overfitting
• No discussion of how the selected features might have contributed to the prediction of malignancy
• Deep learning needs large amounts of training data to avoid overfitting, and the value of transfer learning is questionable

Deep Learning

One of the greatest breakthroughs in recent years

Convolutional Neural Networks (CNN)

Very large annotated patient datasets are available for deep learning

Data Science Bowl 2017

• Two 3D fully convolutional neural network models
– Nodule detection network: trained using the LIDC-IDRI database, 883 cases
– Nodule classification network: transfer learning from the nodule detection model, then trained using 2/3 of the Kaggle dataset, 1063 cases

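Nodule detection network

Layer | Kernel | Stride | # Filters
I (input) | – | – | 1
C1 | 5 | 1 | 8
C2 | 5 | 1 | 16
M1 | 2 | 2 | 16
C3 | 5 | 1 | 16
C4 | 5 | 1 | 32
M2 | 2 | 2 | 32
FC1 | 2 | 1 | 256
FC2 | 1 | 1 | 1

Nodule classification network

Layer | Kernel | Stride | # Filters
I (input) | – | – | 1
C1 | 5 | 1 | 16
C2 | 5 | 1 | 16
M1 | 2 | 2 | 16
C3 | 5 | 1 | 32
C4 | 5 | 1 | 32
M2 | 2 | 2 | 32
FC1 | 2 | 1 | 256
FC2 | 1 | 1 | 64
FC3 | 1 | 1 | 1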
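Because both networks are fully convolutional, the layer tables alone determine how large a patch one output unit sees. The short calculation below assumes that none of the operations use padding, that all kernels are cubic, that every non-pooling layer has a bias, and that the FC layers are implemented as convolutions. The slides do not state any of these details, and the helper names are hypothetical.

```python
# Layer tables transcribed from the slide: (name, kernel, stride, filters)
DETECTION = [("C1", 5, 1, 8), ("C2", 5, 1, 16), ("M1", 2, 2, 16),
             ("C3", 5, 1, 16), ("C4", 5, 1, 32), ("M2", 2, 2, 32),
             ("FC1", 2, 1, 256), ("FC2", 1, 1, 1)]
CLASSIFICATION = [("C1", 5, 1, 16), ("C2", 5, 1, 16), ("M1", 2, 2, 16),
                  ("C3", 5, 1, 32), ("C4", 5, 1, 32), ("M2", 2, 2, 32),
                  ("FC1", 2, 1, 256), ("FC2", 1, 1, 64), ("FC3", 1, 1, 1)]


def output_size(n, layers):
    """Output size per axis for an n-voxel input, assuming no padding."""
    for _, k, s, _ in layers:
        n = (n - k) // s + 1
    return n


def receptive_field(layers):
    """Input voxels per axis that influence one output unit."""
    rf, jump = 1, 1
    for _, k, s, _ in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1) input steps
        jump *= s             # strides multiply the step between outputs
    return rf


def n_params(layers, in_channels=1):
    """Weights + biases of the 3D conv/FC layers; pooling (M*) has none."""
    total, c = 0, in_channels
    for name, k, _, f in layers:
        if not name.startswith("M"):
            total += c * f * k ** 3 + f
        c = f
    return total


print(output_size(32, DETECTION), output_size(64, DETECTION))  # 1 9
print(receptive_field(DETECTION))                              # 32
```

Under these assumptions, a 32-voxel cube maps to a single detection score, while a 64-voxel cube yields a 9×9×9 score map in one forward pass, which is the practical advantage of a fully convolutional design for scanning whole CT volumes.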

Trained filters – Nodule Detection

[Figure: trained filters of the 1st, 2nd, 3rd, and 4th layers]

3-fold cross-validation: sensitivity 95.1% with 5 false positives per scan

Trained filters – Nodule Classification

[Figure: trained filters of the 1st, 2nd, 3rd, and 4th layers]

3-fold cross-validation: accuracy 67.4%

Did not win a prize

Ranked 99th out of 1,972 teams (top 6%, bronze medal)

[Figure: leaderboard log loss scores]
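The leaderboard ranks submissions by log loss on the predicted per-patient cancer probabilities. Below is a minimal Python sketch of binary log loss. The clipping bound of 1e-15 and the function name are assumptions and not necessarily what Kaggle used.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: -(1/N) * sum(y*log(p) + (1-y)*log(1-p)).

    Predictions are clipped to [eps, 1 - eps] so that a single confident
    mistake gives a large but finite penalty.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


print(log_loss([1, 0], [0.9, 0.2]))  # ≈ 0.164
```

Because the penalty grows without bound as a wrong prediction approaches certainty, well-calibrated but hedged probabilities typically score better than overconfident ones. This also makes the metric sensitive to overfitting on a small public test split.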

Top 10 teams

• https://datasciencebowl.com/2017algorithms/


Overfitting?

Public leaderboard – about 1% of test data
Private leaderboard – about 99% of test data

Future Work

• Candidate feature approach
– Quantification of spiculated or lobulated margins
– Calcification, attachment, solidity, and cavitation of PNs
• Integrate plasma biomarkers into the SVM-LASSO model
– Small PNs are difficult to diagnose: 50% accuracy when PN size < 15 mm
– Combining plasma biomarkers with clinical variables and image features (AUC = 0.95)

Jiang et al. Int J Cancer. 2017. [published online ahead of print 2017/06/06].

Spiculation Quantification Based on Shape Analysis

[Figure: mean and Gaussian curvature; spherical parameterization]

Quantifying tumor morphological change with Jacobian map
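A Jacobian map is usually the voxel-wise determinant of the Jacobian of a deformation x → x + u(x), for example one obtained by deformable registration of baseline and follow-up scans. Values above 1 indicate local expansion and values below 1 indicate shrinkage. The NumPy sketch below is a hypothetical illustration, not the authors' pipeline. It assumes the displacement field `disp` is stored as (component, SI, AP, LR) with components in the same axis order and units as `spacing`.

```python
import numpy as np


def jacobian_determinant(disp, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise det(I + grad u) for a displacement field of shape (3, Z, Y, X)."""
    # grads[i][j] = d u_i / d x_j, by central finite differences
    grads = [np.gradient(disp[i], *spacing) for i in range(3)]
    J = np.empty((3, 3) + disp.shape[1:])
    for i in range(3):
        for j in range(3):
            J[i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    # Move the 3x3 matrix axes last so np.linalg.det works voxel-wise
    return np.linalg.det(np.moveaxis(J, (0, 1), (-2, -1)))


# Uniform 10% expansion along every axis: u(x) = 0.1 * x
z, y, x = np.meshgrid(np.arange(6.0), np.arange(6.0), np.arange(6.0), indexing="ij")
J = jacobian_determinant(0.1 * np.stack([z, y, x]))
```

In this example every voxel has determinant 1.1³ = 1.331, which corresponds to a 33.1% local volume increase. A growing tumor would show a localized region of such values.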


Conclusion

• Developed an SVM-LASSO model and a deep learning model to predict the malignancy of indeterminate PNs
• Deep learning is feasible for lung cancer screening but still needs more data
• A multicenter clinical trial in a large population is required

Acknowledgements

Memorial Sloan Kettering Cancer Center

– Wei Lu, PhD

– Sadegh Riyahi, PhD

– Jung Hun Oh, PhD

– Saad Nadeem, PhD

– James G. Mechalakos, PhD

– Joseph O. Deasy, PhD

– Andreas Rimner, MD

– Chia-ju Liu, MD

– Prasad Adusumilli, MD

– Wolfgang Weber, MD

University of Maryland School of Medicine

– Howard Zhang, PhD

– Feng Jiang, MD, PhD

– Wengen Chen, MD, PhD

– Charles White, MD


NIH/NCI Grant R01 CA172638 and NIH/NCI Cancer Center Support Grant P30 CA008748

Thank you!

Q & A


[Figure: ROC curve analysis of the best SVM-LASSO model and Lung-RADS]

The box plots show the difference between benign and malignant PNs for the selected features (BB_AP and SD_IDM) and the largest diameter. P-values were obtained by the Wilcoxon rank sum test and adjusted using Bonferroni correction.

Significant features

Rank | Feature name | Type | P-value | AUC | Correlation
1 | BB_AP | Shape | 0.00070 | 0.81 | +
2 | BB_SI | Shape | 0.0012 | 0.80 | +
3 | SD_IDM | Texture | 0.0018 | 0.79 | -
4 | Weighted Principal Moments2 | Shape+Intensity | 0.0022 | 0.78 | +
5 | Grey Level Nonuniformity | Texture | 0.0026 | 0.78 | +
6 | Oriented BB_SI | Shape | 0.0027 | 0.79 | +
7 | Weighted Principal Moments3 | Shape+Intensity | 0.0030 | 0.78 | +
8 | Low Grey Level Run Emphasis | Texture | 0.0031 | 0.78 | -
9 | SD Run Length Nonuniformity | Texture | 0.0033 | 0.78 | +
10 | SD Low Grey Level Run Emphasis | Texture | 0.017 | 0.75 | -
11 | Correlation | Texture | 0.018 | 0.75 | +
12 | IDM | Texture | 0.020 | 0.75 | +
13 | SD Long Run Emphasis | Texture | 0.024 | 0.75 | -
14 | Long Run Low Grey Level Emphasis | Texture | 0.028 | 0.75 | -
15 | Inertia | Texture | 0.035 | 0.74 | -
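A per-feature ranking like this one can be sketched with SciPy: a Wilcoxon rank-sum p-value for malignant versus benign PNs, a Bonferroni adjustment (p multiplied by the number of features tested, capped at 1), and a single-feature AUC computed as the Mann-Whitney U statistic divided by n₁·n₀. The AUC is reported as at least 0.5 with the direction recorded as the correlation sign. This convention and the function name are assumptions about how such a table is built, not the authors' code.

```python
import numpy as np
from scipy.stats import rankdata, ranksums


def rank_features(X, y, names):
    """Rank features by Bonferroni-adjusted Wilcoxon rank-sum p-value.

    X: (n_patients, n_features); y: 1 = malignant, 0 = benign.
    Returns (name, adjusted p, AUC, correlation sign) sorted by adjusted p.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n_tests = X.shape[1]
    rows = []
    for k, name in enumerate(names):
        mal, ben = X[y == 1, k], X[y == 0, k]
        p = ranksums(mal, ben).pvalue
        # AUC = Mann-Whitney U / (n1 * n0), from mid-ranks (handles ties)
        ranks = rankdata(np.concatenate([mal, ben]))
        u = ranks[: len(mal)].sum() - len(mal) * (len(mal) + 1) / 2
        auc = u / (len(mal) * len(ben))
        sign = "+" if auc >= 0.5 else "-"  # '+': higher values in malignant PNs
        rows.append((name, min(p * n_tests, 1.0), max(auc, 1 - auc), sign))
    return sorted(rows, key=lambda row: row[1])
```

The same rank statistic drives both the test and the AUC, which is why features with similar p-values in the table also have similar AUCs.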

