
Kidney and Kidney-tumor Segmentation UsingCascaded V-Nets

Mohammad Arafat Hussain

BiSICL, University of British Columbia, Vancouver, BC, [email protected]

Abstract. Kidney cancer is the seventh most common cancer worldwide, accounting for an estimated 140,000 global deaths annually. Kidney segmentation in volumetric medical images plays an important role in clinical diagnosis, radiotherapy planning, interventional guidance and patient follow-up; however, to our knowledge, there is no automatic kidney-tumor segmentation method in the literature. In this paper, we address the challenge of simultaneous semantic segmentation of kidney and tumor by adopting a cascaded V-Net framework. The first V-Net in our pipeline produces a region of interest around the probable location of the kidney and tumor, which facilitates the removal of the unwanted region in the CT volume. The second set of V-Nets is trained separately for the kidney and tumor, producing the kidney and tumor masks, respectively. The final segmentation is achieved by combining the kidney and tumor masks. Our method is trained and validated on 190 and 20 patient scans, respectively, accessed from the 2019 Kidney Tumor Segmentation Challenge database. We achieved a validation accuracy in terms of the Sørensen Dice coefficient of about 97%.

1 Introduction

Kidney cancer is the 7th most common cancer in men and 10th most common cancer in women [1], accounting for an estimated 140,000 global deaths annually [2]. The natural growth pattern varies across kidney cancers, which has led to the development of different prognostic models for the assessment of patient-wise risk [3]. Kidney segmentation in medical images plays an important role in clinical diagnosis, radiotherapy planning, interventional guidance and patient follow-up [4]; however, to our knowledge, there is no automatic kidney-tumor segmentation method for 3D volumetric medical images (e.g., computed tomography (CT), magnetic resonance (MR), etc.).

Before machine-learning approaches came into wide use for kidney segmentation, a number of traditional methods were proposed in the literature that made use of image thresholding, graph cuts, level sets, active contours, multi-atlas image registration, template deformation, etc. For example, Yan et al. [5] proposed a simple intensity thresholding-based method, which is often inaccurate and was limited to 2D. Other intensity-based methods have used graph cuts [6] and active contours/level sets [7]. But these methods are


sensitive to the choice of parameters [8], which often need to be tweaked for different images. In addition, the graph cuts [6] and level sets-based [7] methods are prone to leaking through weak anatomical boundaries in the image, and often require considerable computation [8]. The methods proposed by Lin et al. [9] and Yang et al. [10] rely extensively on prior knowledge of kidney shapes. However, building a realistic model of kidney shape variability and balancing the influence of the model on the resulting segmentation are non-trivial tasks.

To overcome the aforementioned limitations of the traditional methods, a number of kidney segmentation methods have been proposed based on manual-feature-engineering-based supervised learning. Cuingnet et al. [11] used a classification forest to generate a kidney spatial probability map and then deformed an ellipsoidal template to approximate the probability map and generate the segmentation. Because of this restrictive template-based approach, it is likely to fail for kidneys having abnormal shapes due to disease progression and/or internal tumors. Therefore, crucially, [11] did not include the truncated kidneys (16% of their data) in their evaluation. Even then, their proposed method did not correctly detect/segment about 20% of left and 20% of right kidneys, and failed for another 10% of left and 10% of right kidneys in their evaluation data set. Glocker et al. [12] used a joint classification-regression forest scheme to segment different abdominal organs, but their approach suffers from leaking, especially for kidneys, as evident in their results.

Avoiding complex manual feature engineering, supervised deep learning using convolutional neural networks (CNN) has exploded in popularity for automatic feature learning, classification, as well as localization and dense labelling. Thong et al. [4] showed promising kidney segmentation performance using a CNN; however, it was designed only for 2D contrast-enhanced CT slices. To facilitate 3D segmentation of organs including the kidney, a number of 3D neural network approaches have been proposed recently. For example, Kakeya et al. [13] used a judgment-assisted probabilistic atlas to generate probability maps for eight different organ locations and then trained eight 3D U-Nets [14], one per organ, for segmentation. Chen et al. [15] and Roth et al. [16] used two 3D U-Nets [14] in a cascaded fashion for organ segmentation, where the first 3D U-Net produces the region of interest (ROI) to reduce the search space, and the second 3D U-Net produces the organ segmentation. Gibson et al. [17] proposed a variant of the V-Net [18], namely DenseVNet, for multi-organ segmentation including the left kidney. These 3D segmentation approaches showed promise in kidney segmentation; however, they were not tested on the kidney-tumor segmentation task.

In this work, we address the challenge of simultaneous semantic segmentation of kidney and kidney-tumor by using a cascaded V-Net architecture. The first V-Net, namely ROI-V-Net, produces an ROI around the probable location of the kidney and tumor, while the second set of V-Nets, namely Kidney-V-Net and Tumor-V-Net, are trained in parallel on the ROI data to produce the kidney and tumor segmentations separately. These kidney and tumor segmentations are then joined together by comparing the probabilities of each voxel being kidney, tumor, or background.


2 Materials and Methods

2.1 Data

We used CT scans of 300 patients from the 2019 Kidney Tumor Segmentation Challenge database [19]. These patients underwent radical nephrectomy or partial nephrectomy at the University of Minnesota between 2010 and mid-2018 to excise a renal tumor. Ground truth semantic segmentations for arterial-phase abdominal CT scans of the 300 unique kidney cancer patients were performed by medical students under the supervision of an expert radiologist. Out of the 300 patient scans, we used 190 and 20 scans for model training and validation, respectively. The remaining 90 scans were used for objective model evaluation. Although the voxel spacing among the datasets was variable, the challenge database made available a uniformly interpolated version of the same datasets, where the voxel spacing was set to 3 mm, 0.78162497 mm and 0.78162497 mm in the axial, coronal and sagittal directions, respectively. In this work, we used these interpolated scans.
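Although we use the challenge's interpolated scans directly, the relationship between voxel spacing and grid size is worth making concrete. A minimal sketch (the function name is ours, not from the paper):

```python
def resampled_shape(shape, old_spacing, new_spacing):
    """Voxel count needed to cover the same physical extent
    after changing the voxel spacing (rounded per axis)."""
    return tuple(round(n * old / new)
                 for n, old, new in zip(shape, old_spacing, new_spacing))

# A 512x512 in-plane slice at 0.78162497 mm, resampled to the 0.45 mm
# grid used later during training, grows to roughly 889x889 voxels.
print(resampled_shape((512, 512), (0.78162497, 0.78162497), (0.45, 0.45)))
```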

2.2 Semantic Segmentation of Kidney and Tumor

Kidney ROI Generation Using ROI-V-Net: We use a V-Net (Fig. 2) architecture, namely ROI-V-Net, to roughly segment the kidney and tumor semantically in order to produce an ROI around the kidney and tumor. In this way, we can discard the unwanted region in the CT volume. The input to this V-Net is a 128×128×128-voxel 1-channel 3D patch. The V-Net has 4 levels and uses 1, 2, 3, and 3 convolution operations per level on the compression side (i.e., left side), respectively, and 3, 3, 2, and 1 convolutions on the decompression side (i.e., right side). The output layer produces 2-channel 3D predictions of size 128×128×128 voxels, one for the background and one for the kidney and tumor. These 2-channel predictions are fed to a softmax operator. We use parametric rectified linear unit (PReLU) activation throughout the network.
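The two activation choices above — PReLU inside the network and a softmax over the 2-channel output — can be sketched in NumPy (the slope value 0.25 is a common PReLU initialization, not a value reported in the paper; in practice the slope is learned per channel):

```python
import numpy as np

def prelu(x, alpha=0.25):
    """Parametric ReLU: identity for positive inputs,
    slope alpha (learned in practice) for negative inputs."""
    return np.where(x > 0, x, alpha * x)

def softmax(logits, axis=0):
    """Softmax across the channel axis of the 2-channel prediction volume."""
    z = logits - logits.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy 2-channel prediction: channel 0 = background, channel 1 = kidney+tumor.
logits = np.stack([np.array([[-1.0, 2.0]]), np.array([[1.0, -2.0]])])
probs = softmax(logits, axis=0)  # per-voxel class probabilities summing to 1
```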

Once the ROI-V-Net is trained, we use this model on the test data to generate the kidney+tumor predictions, i.e., a single joint kidney+tumor mask. Note that at this phase, we do not distinguish between the kidney and tumor. We first divide the prediction volume in half along the coronal direction in order to separate the left and right kidneys. Then we project the 3D prediction into 2D distribution maps by summing the mask along the axial and coronal directions. We apply median filtering with window sizes of 10×25 pixels and 25×25 pixels on the coronal and axial projections, respectively. Then, from the midpoint (xm, ym, zm) of the kidney and tumor distribution, we select ROIs from the left and right half CT scans (Fig. 1) of dimension [max(1, xm−127), min(xm+128, xf)]×[max(1, ym−127), min(ym+128, ⌊yf/2⌋)]×[max(1, zm−127), min(zm+128, ⌊zf/2⌋)], where xf, yf, and zf are the dimensions of the interrogated CT scan, and max(a, b) and min(a, b) take the maximum and minimum of a and b, respectively. If there is no kidney present, or the ROI-V-Net fails to detect any kidney mask, then we consider the corresponding medially divided whole CT scan as the ROI. We also record the ROI locations inside the actual image volume for later use.
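The per-axis clamping used in the ROI formula above can be sketched as follows (1-indexed bounds as in the paper; the helper name is ours):

```python
def roi_bounds(mid, extent, size=256):
    """1-indexed (start, end) of a `size`-voxel window centered near `mid`,
    clipped to an axis of length `extent`; for size 256 this mirrors
    start = max(1, mid - 127), end = min(mid + 128, extent)."""
    half = size // 2
    return max(1, mid - (half - 1)), min(mid + half, extent)

# Near the volume boundary the window is truncated rather than shifted:
print(roi_bounds(mid=40, extent=512))   # (1, 168)
print(roi_bounds(mid=300, extent=512))  # (173, 428)
```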

Fig. 1. The schematic diagram of our cascaded V-Net architecture for the semantic segmentation of kidney and tumor. Full CT volumes with manually generated ROIs train the ROI-V-Net, Kidney-V-Net and Tumor-V-Net; at test time, automatically generated ROIs from the ROI-V-Net feed the Kidney-V-Net and Tumor-V-Net, whose kidney and tumor predictions are combined into the final prediction.

Semantic Segmentation of Kidney and Tumor: We use another two V-Nets (Fig. 2) of the same architecture as the ROI-V-Net, namely Kidney-V-Net and Tumor-V-Net, to segment the kidney and tumor, respectively. These two V-Nets are trained using manually generated ROIs around the kidney and tumor. The input to both of these V-Nets is a 128×128×128-voxel 1-channel 3D patch. These V-Nets also have 4 levels, with 1, 2, 3, and 3 convolution operations per level on the compression side (i.e., left side), respectively, and 3, 3, 2, and 1 convolutions on the decompression side (i.e., right side). The output layer produces 2-channel 3D predictions of size 128×128×128 voxels, one for the background and one for the kidney (by Kidney-V-Net) or tumor (by Tumor-V-Net). These 2-channel predictions are fed to a softmax operator. Here also, we use parametric rectified linear unit (PReLU) activation throughout the network.

After the Kidney-V-Net and Tumor-V-Net are trained, we use these models on the test-data ROI, produced by the ROI-V-Net, to generate the kidney and tumor predictions, respectively. Then we overlay the tumor mask on the kidney mask. Finally, we use the previously recorded ROI location information to reconstruct the prediction map of size equal to the interrogated image volume, with value 0 for background, 1 for kidney, and 2 for tumor.
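The overlay-and-relabel step can be sketched in NumPy, using the label convention stated above (0 background, 1 kidney, 2 tumor; where both binary masks fire, the tumor label wins):

```python
import numpy as np

def combine_masks(kidney_mask, tumor_mask):
    """Overlay the binary tumor mask on the binary kidney mask:
    0 = background, 1 = kidney, 2 = tumor (tumor overrides kidney)."""
    labels = np.zeros_like(kidney_mask, dtype=np.uint8)
    labels[kidney_mask > 0] = 1
    labels[tumor_mask > 0] = 2
    return labels

kidney = np.array([0, 1, 1, 0])
tumor = np.array([0, 0, 1, 1])
print(combine_masks(kidney, tumor))  # [0 1 2 2]
```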

Training: We trained our networks by minimizing the Sørensen Dice loss between the ground truth and predicted labels, defined as:

L = 1 − (2 ∑_i^N p_i g_i) / (∑_i^N p_i + ∑_i^N g_i),   (1)

where N is the total number of voxels in an interrogated image volume, pi is the predicted binary voxel value, and gi is the ground truth binary voxel value.

Fig. 2. The V-Net architecture used in this work. Image credit [20].

The base learning rate was set to 0.001 and the weight decay was set to 0.01. We used the stochastic gradient descent optimizer with momentum set to 0.9. Before training, we normalized each dataset using statistical normalization with σ = 0.25. We also resampled the datasets using linear interpolation so that each voxel has a dimension of 0.45 mm×0.45 mm×0.45 mm. We also added random noise to the training data for augmentation and used random cropping to generate 3D image patches during training. The training batch size per iteration was set to 1. Our V-Net is implemented in TensorFlow and adapted from the repository in [21]. Training was performed on a workstation with an Intel 4.0 GHz Core-i7 processor, an Nvidia GeForce Titan Xp GPU with 12 GB of VRAM, and 32 GB of RAM.
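Two pieces of this training recipe can be sketched in NumPy: the Dice loss of Eq. (1) and the intensity normalization. Note that the epsilon guards and the reading of "statistical normalization with σ = 0.25" as rescaling each zero-meaned scan to standard deviation 0.25 are our assumptions:

```python
import numpy as np

def dice_loss(p, g, eps=1e-7):
    """Soft Soerensen Dice loss of Eq. (1): 1 - 2*sum(p*g)/(sum(p) + sum(g)).
    eps guards against division by zero on empty masks (our addition)."""
    p = p.astype(np.float64).ravel()
    g = g.astype(np.float64).ravel()
    return 1.0 - 2.0 * np.sum(p * g) / (np.sum(p) + np.sum(g) + eps)

def normalize(volume, target_sigma=0.25):
    """Zero-mean the scan and rescale its standard deviation to target_sigma
    (our interpretation of 'statistical normalization with sigma = 0.25')."""
    v = volume.astype(np.float64)
    return (v - v.mean()) / (v.std() + 1e-8) * target_sigma

print(dice_loss(np.array([1, 1, 0]), np.array([1, 1, 0])))  # ~0.0, perfect overlap
print(dice_loss(np.array([1, 0]), np.array([0, 1])))        # 1.0, no overlap
```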


Fig. 3. Graphs showing Sørensen Dice coefficient values vs. training iterations for (a) Kidney-V-Net and (b) Tumor-V-Net. Each panel plots the training and validation Sørensen Dice coefficients (0.5–1.0) over up to 10,000 iterations.

3 Results

In Fig. 3, we show the Sørensen Dice coefficients with respect to the training iterations. We see in Fig. 3(a) for the Kidney-V-Net that after about 8,500 iterations (i.e., ∼47 epochs), the Sørensen Dice coefficient for the validation data was about 0.9650. We also see in Fig. 3(b) for the Tumor-V-Net that after around 10,000 iterations (i.e., ∼56 epochs), the Sørensen Dice coefficient for the validation data was about 0.9860. The average Sørensen Dice coefficient combining the kidney and tumor segmentations for the 20 validation patient cases is about 0.9755.
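The combined validation figure quoted above is consistent with the mean of the two per-network validation Dice scores (whether the paper averages per case or per network is not stated; this is only an arithmetic check):

```python
kidney_dice, tumor_dice = 0.9650, 0.9860
# Mean of the two validation Dice coefficients reported in Fig. 3.
print(round((kidney_dice + tumor_dice) / 2, 4))  # 0.9755
```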

4 Conclusions

In this work, we proposed a cascaded V-Net framework for simultaneous semantic segmentation of kidney and tumor. The first V-Net in our pipeline produced an ROI around the probable location of the kidney and tumor, facilitating the removal of the unwanted region in the CT volume, while the second set of V-Nets produced the kidney and tumor segmentations. Our validation results, with a Sørensen Dice coefficient of about 97%, show robust segmentation performance.


Acknowledgement: We thank NVIDIA Corporation for supporting our research through their GPU Grant Program by donating the GeForce Titan Xp.

References

1. Siegel, R.L., Miller, K.D., Jemal, A.: Cancer statistics, 2016. CA: A Cancer Journal for Clinicians 66(1) (2016) 7–30

2. Ding, J., Xing, Z., Jiang, Z., Chen, J., Pan, L., Qiu, J., Xing, W.: CT-based radiomic model predicts high grade of clear cell renal cell carcinoma. European Journal of Radiology 103 (2018) 51–56

3. Escudier, B., Porta, C., Schmidinger, M., Rioux-Leclercq, N., Bex, A., Khoo, V., Gruenvald, V., Horwich, A.: Renal cell carcinoma: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Annals of Oncology 27(suppl 5) (2016) v58–v68

4. Thong, W., Kadoury, S., Piche, N., Pal, C.J.: Convolutional networks for kidney segmentation in contrast-enhanced CT scans. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 6(3) (2018) 277–282

5. Yan, G., Wang, B.: An automatic kidney segmentation from abdominal CT images. In: 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems. Volume 1., IEEE (2010) 280–284

6. Li, X., Chen, X., Yao, J., Zhang, X., Tian, J.: Renal cortex segmentation using optimal surface search with novel graph construction. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2011) 387–394

7. Zhang, Y., Matuszewski, B.J., Shark, L.K., Moore, C.J.: Medical image segmentation using new hybrid level-set method. In: 2008 Fifth International Conference BioMedical Visualization: Information Visualization in Medical and Biomedical Informatics, IEEE (2008) 71–76

8. Zhen, X., Wang, Z., Islam, A., Bhaduri, M., Chan, I., Li, S.: Multi-scale deep networks and regression forests for direct bi-ventricular volume estimation. Medical Image Analysis 30 (2016) 120–129

9. Lin, D.T., Lei, C.C., Hung, S.W.: Computer-aided kidney segmentation on abdominal CT images. IEEE Transactions on Information Technology in Biomedicine 10(1) (2006) 59–65

10. Yang, G., Gu, J., Chen, Y., Liu, W., Tang, L., Shu, H., Toumoulin, C.: Automatic kidney segmentation in CT images based on multi-atlas image registration. In: 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE (2014) 5538–5541

11. Cuingnet, R., Prevost, R., Lesage, D., Cohen, L.D., Mory, B., Ardon, R.: Automatic detection and segmentation of kidneys in 3D CT images using random forests. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2012) 66–74

12. Glocker, B., Pauly, O., Konukoglu, E., Criminisi, A.: Joint classification-regression forests for spatially structured multi-object segmentation. In: European Conference on Computer Vision, Springer (2012) 870–881

13. Kakeya, H., Okada, T., Oshiro, Y.: 3D U-JAPA-Net: Mixture of convolutional networks for abdominal multi-organ CT segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2018) 426–433

14. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer (2016) 424–432

15. Chen, S., Roth, H., Dorn, S., May, M., Cavallaro, A., Lell, M.M., Kachelrieß, M., Oda, H., Mori, K., Maier, A.: Towards automatic abdominal multi-organ segmentation in dual energy CT using cascaded 3D fully convolutional network. arXiv preprint arXiv:1710.05379 (2017)

16. Roth, H.R., Oda, H., Zhou, X., Shimizu, N., Yang, Y., Hayashi, Y., Oda, M., Fujiwara, M., Misawa, K., Mori, K.: An application of cascaded 3D fully convolutional networks for medical image segmentation. Computerized Medical Imaging and Graphics 66 (2018) 90–99

17. Gibson, E., Giganti, F., Hu, Y., Bonmati, E., Bandula, S., Gurusamy, K., Davidson, B., Pereira, S.P., Clarkson, M.J., Barratt, D.C.: Automatic multi-organ segmentation on abdominal CT with dense V-networks. IEEE Transactions on Medical Imaging 37(8) (2018) 1822–1834

18. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), IEEE (2016) 565–571

19. Heller, N., Sathianathen, N., Kalapara, A., Walczak, E., Moore, K., Kaluzniak, H., Rosenberg, J., Blake, P., Rengel, Z., Oestreich, M., Dean, J., Tradewell, M., Shah, A., Tejpaul, R., Edgerton, Z., Peterson, M., Raza, S., Regmi, S., Papanikolopoulos, N., Weight, C.: The KiTS19 challenge data: 300 kidney tumor cases with clinical context, CT semantic segmentations, and surgical outcomes (2019)

20. Monteiro, M.: VNet-Tensorflow: Tensorflow implementation of the V-Net architecture for medical imaging segmentation. https://github.com/MiguelMonteiro/VNet-Tensorflow (2018)

21. Ko, J.K.: Implementation of V-Net in tensorflow for medical image segmentation. https://github.com/jackyko1991/vnet-tensorflow (2018)