
Page 1: Final Project Report - CAE Usershomepages.cae.wisc.edu/~ece539/fall13/project/Sarbortova_rpt.pdfSummary Cervical cancer is one of the most common cancers but also one of the most preventable

Final Project Report

Detection of Cervical Cancer in Pap Smear Images

Hana Sarbortova

COMPSCI/ECE/ME 539 Introduction to Artificial Neural Networks and Fuzzy Systems

University of Wisconsin - Madison

December 17, 2013

Summary

Cervical cancer is one of the most common cancers but also one of the most preventable. Regular Pap smear tests can uncover pre-cancerous changes in cervical cells, so that treatment can begin before cancer fully develops. Automatic analysis of a Pap smear has to deal with various problems such as cell occlusion and cell type variability. Some research focused on specific parts of this problem has been done, and some methods, mostly semi-automatic, are also used in practice. This project focuses on classification based on features extracted from Pap smear images.

The source code is not intended to be released.


1 Introduction

Cervical cancer is the second most common cancer affecting women worldwide, but at the same time it is one of the most preventable and treatable cancers. Since the most common form of cervical cancer starts with pre-cancerous changes and develops very slowly, up to 90% of cervical cancers may be prevented if cell changes are detected and treated early [1].

Early detection is undertaken using a Pap smear. There are two types of Pap smears, Conventional and ThinPrep, which differ in the way they are obtained. The conventional method uses a cytobrush, while ThinPrep uses a broom-like device. The advantage of ThinPrep is that it contains fewer contaminants and reduces clumping, which makes seeing unobstructed cells much easier [2]. ThinPrep is widely used in the most developed countries, and samples obtained by this method are used in this project. Each sample is usually stained before microscopic examination. The most common method, and also the method used for the samples in this project, is Haematoxylin and Eosin (H&E) staining [3]. Staining helps to differentiate cell components but has no real significance for recognition. The color of a cell indicates its age, which is not significant for cancer identification [4]; therefore all samples are converted to grayscale images.

Although cervical cancer can be prevented, Pap smear images must be evaluated properly in order to achieve that. Detection errors can be caused by inappropriate smear thickness, which causes cells to overlap, or by unwanted particles in the smear. Also, a diagnosis made by cytotechnologists and cytopathologists may be faulty if the number of cancerous cells is small or if their experience is insufficient. Automatic detection can help to increase cancer cell awareness and diagnostic objectivity while decreasing testing cost at the same time.

The cancer detection process consists of image preprocessing, segmentation, feature extraction and classification. These can be very challenging problems due to the variability of cells within a single sample and to their clumping and occlusion. There are two types of cervical cells, squamous cells (exocervix) and glandular cells (endocervix). Cervical cancer usually arises from the first type and is then called squamous cell carcinoma [5]. Additionally, white blood cells (neutrophils), metaplastic cells, yeast strands, cell debris or bacteria can appear in samples. All of these can cluster in arbitrary ways, which makes even segmentation very difficult, as we are mostly interested only in separating squamous cell cytoplasm and nuclei.

2 Related work

Many research papers focused on segmentation, classification, or both have been published over the past 30 years. However, the majority of them try to solve very specific problems while working with very restricted datasets. Segmentation usually focuses on localisation of the nucleus and does not deal with overlapping cytoplasm. Proposed classification methods usually deal with already separated cervical cells and, again, do not consider overlap. Very few methods that consider a realistic Pap smear have been published recently.

Segmentation. The most popular and common choices for the segmentation task in the literature are automatic thresholding, morphological operations, and active contour models. Bamford and Lovell [14] segmented the nucleus using an active contour model that was estimated by using dynamic programming to find the boundary with the minimum cost within a bounded space around the darkest point in the image. Wu et al. [15] used a parametric cost function with an elliptical shape assumption for the region of interest. Yang-Mao et al. [16] applied automatic thresholding to the image gradient in order to identify the edge pixels corresponding to nucleus and cytoplasm boundaries. Tsai et al. [17] improved this method by replacing the thresholding step with k-means clustering into two partitions. Harandi et al. [18] identified the cervical cell boundaries by using the active contour algorithm and then used thresholding to identify the nucleus within each cell. The cytoplasm corresponding to each nucleus was identified by a separate active contour. Plissiti et al. [1] detected the locations of nuclei centroids in Pap smear images by using the local minima of the image gradient, eliminated the candidate centroids that were too close to each other, and used a support vector machine (SVM) classifier for the final selection of points using color values in square neighbourhoods. In [19], they used the detected centroids as markers in marker-based watershed segmentation to find the nuclei boundaries, and eliminated the false-positive regions by using a binary SVM classifier with shape, texture, and intensity features.

Most of the described methods focus only on nuclei segmentation [14] [15] [1] [19], which usually cannot be used directly for classification because the cytoplasm area has to be considered too. Gençtav et al. [6] focus on correct identification of the individual nuclei in the presence of overlapping cells, while assuming that the overlapping cytoplasm area is shared by different cells in the rest of the analysis. However, they cannot segment the individual cytoplasm of each cell under overlap. The first step in the segmentation process proposed in [6] separates the cell regions from the background using morphological operations and automatic thresholding that can handle varying staining and illumination levels. The second step builds a hierarchical segmentation tree by using a multi-scale watershed segmentation procedure, and automatically selects the regions that maximize a joint measure of homogeneity and circularity, with the goal of identifying the nuclei at different scales. The third step finalizes the separation of nuclei from cytoplasm within the segmented cell regions by using a binary classifier.

Classification. The classification can either determine a cell to be normal or abnormal, or assign it one of various levels of dysplasia. Automatic [7] [8] [9] and semi-automatic [10] [11] methods have been proposed to discriminate normal cells from abnormal (dysplastic) cells. However, the state of squamous cell dysplasia can be described more specifically, see Figure 1, as has been considered in [12] [6] [13]. The authors of [12] and [13] classified cervical cells into normal, LSIL (low-grade squamous intraepithelial lesion) and HSIL (high-grade squamous intraepithelial lesion) classes, but they did not distinguish between types of normal cells. In contrast, [6] distinguished 7 different types, including 3 normal (Superficial squamous, Intermediate squamous, Columnar) and 4 abnormal (Mild dysplasia, Moderate dysplasia, Severe dysplasia, Carcinoma in situ). As it is crucial to determine whether a patient has to be treated or not, this project will consider only normal or abnormal cervical cells.


Published methods for cervical cancer classification work with cervical cell features that are extracted either manually by a human expert [22] [23] or automatically [6] [21]. Furthermore, most of the classification methods (feature extractors) work with single-cell images, in which the cytoplasm area can be computed relatively easily [21]. The most important features that have to be extracted are nucleus and cytoplasm area, nucleus and cytoplasm brightness/minima/maxima, and nucleus roundness. Many methods make use of Artificial Neural Networks (ANN) [22] [23]. Mat-Isa et al. [22] developed a Hybrid Multilayer Perceptron, which obtains better results than a classical ANN.

A lot of research work has been done, but the majority focuses only on specific areas of the problem and on limited datasets. The methods are usually not applicable in the full chain of operations that is needed to achieve a full analysis from an image of a whole Pap smear. An open question is the analysis of occluded cells, especially determining the area of a cervical cell under occlusion.

Figure 1: Possible states of squamous cell dysplasia


3 Method proposal

3.1 Cancerous (abnormal) cell characteristics

Abnormal and normal cervical cells differ in several significant features that are necessary for distinguishing between them. The most important characteristic is the nucleus-cytoplasm area ratio. Abnormal cells have larger nuclei and a much smaller area of cytoplasm than normal cervical cells. Abnormal cells also tend to have a more ellipsoidal shape, while normal cells are more or less round. On the other hand, abnormal cells usually have a more rounded cytoplasm, but this rule does not necessarily hold for cancerous cells with heavily developed dysplasia. Abnormal cell nuclei have darker intensity values and a much more pronounced structural pattern. Also, abnormal cells tend to be clustered in Pap smear images. Some examples of cancerous and normal cells are shown in Figure 2 and Figure 3, respectively.

Figure 2: Examples of abnormal cervical cells


Figure 3: Examples of normal cervical cells

3.2 Feature extraction

Pap smear test images are first converted from color to grayscale. A median filter is then applied in order to remove small-scale noise while preserving edge sharpness.
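The preprocessing described above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's code: the luminance weights, the 3x3 window and the edge-replicating border handling are assumptions, since the report does not state them, and the function names `to_gray` and `median_filter` are hypothetical.

```python
from statistics import median


def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to grayscale using common luminance weights (assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def median_filter(img, radius=1):
    """Median filter with a (2*radius+1)^2 window; border pixels are replicated (assumed)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            out[y][x] = median(window)
    return out
```

On an image with a sharp vertical edge and one isolated bright pixel, the isolated pixel is replaced by its neighbourhood median while the edge stays where it was, which is the property the text relies on.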

3.2.1 Segmentation

Segmentation of full Pap smear images consists of two major steps. The first is segmentation of regions of cells and clusters of cells, i.e. background removal. The second is segmentation of cells within the previously located regions, i.e. nuclei segmentation. The hardest part, even for the human eye, is segmentation of cell cytoplasm within cell clusters. We consider only an estimate of its area.

Background removal. The background consists of pixels that have relatively uniform intensity values. Usually, they form the largest part of the image. These two observations naturally lead to the idea that the highest peak in the image histogram corresponds to the background, see Figure 4. In this project, the region segmentation has been done using a global thresholding method. The threshold has been estimated from a Gaussian-smoothed histogram; the sought threshold lies at the first saddle from the highest peak towards the lower values.
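One possible reading of this threshold selection is sketched below. The smoothing width `sigma`, the clamped histogram ends, and the interpretation of "first saddle" as the first local minimum reached when walking from the highest peak towards darker values are all assumptions; the function names are hypothetical.

```python
import math


def gaussian_smooth(hist, sigma=2.0):
    """Smooth a histogram with a normalised Gaussian kernel (ends are clamped)."""
    radius = int(3 * sigma)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    n = len(hist)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += weight * hist[idx]
        smoothed.append(acc)
    return smoothed


def background_threshold(hist, sigma=2.0):
    """Walk from the highest (background) peak towards lower intensities
    and stop at the first point where the smoothed histogram stops falling."""
    s = gaussian_smooth(hist, sigma)
    t = max(range(len(s)), key=s.__getitem__)  # background peak
    while t > 0 and s[t - 1] < s[t]:
        t -= 1
    return t
```

Pixels with intensity above the returned value would then be treated as background, and darker pixels as cell regions.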

Figure 4: Gaussian smoothed histogram of Pap smear image

Nuclei extraction. Nuclei are significantly darker than the surrounding cytoplasm; however, a global threshold within a region generally cannot be used. Nevertheless, a nucleus usually belongs to an area of local maxima. The segmentation used in this project works with the maxima of four single-direction gradient values. Four single-direction gradient images are computed: leftward horizontal, rightward horizontal, upward vertical and downward vertical. Each pixel of the Pap smear is then assigned a class according to the maximum value of those four gradient images (or no class if the variance between the gradient values is not large enough). Connected areas with the same label can be seen as nodes of a graph; directed edges exist only between certain pairs of classes (e.g. there is a directed edge between leftward and downward, downward and rightward, etc.). Therefore, nuclei can be seen as four-node cycles in a directed graph. This locates the nuclei; however, the nuclei edges have to be estimated more precisely. To do that, a minimum path in an all-direction gradient image around the located area is found.
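The per-pixel labelling step can be illustrated with the sketch below. It covers only the assignment of direction classes, not the graph construction or the minimum-path refinement. The one-sided difference definitions, the clipping of negative gradients to zero (so that only transitions from dark to bright count), and the `min_variance` value are assumptions made for the illustration.

```python
from statistics import pvariance

LEFT, RIGHT, UP, DOWN = "L", "R", "U", "D"


def directional_labels(img, min_variance=25.0):
    """Label each pixel with the direction of its strongest one-sided gradient,
    or None when the four gradient values do not vary enough."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    labels = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            grads = {
                LEFT: max(0, px(y, x - 1) - c),
                RIGHT: max(0, px(y, x + 1) - c),
                UP: max(0, px(y - 1, x) - c),
                DOWN: max(0, px(y + 1, x) - c),
            }
            if pvariance(list(grads.values())) >= min_variance:
                labels[y][x] = max(grads, key=grads.get)
    return labels
```

For a dark blob on a bright background, the inner boundary pixels receive the labels L, U, R and D on the left, top, right and bottom sides respectively, which gives the ring of four labelled areas that the graph step interprets as a cycle.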


Cytoplasm estimation. The cytoplasm area can be determined precisely for single cells; however, an estimate has to be used for cell clusters. The area of the corresponding Voronoi diagram cell is used as an upper bound, and the distance to the nearest Voronoi edge gives the lower bound (specifically, the area of a circle with a radius equal to half of that distance).
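A small sketch of these two bounds is given below, assuming that the Voronoi sites are the nucleus centroids and that the diagram is restricted to the pixels of one segmented cell region. The Voronoi cell area is approximated by counting the region pixels closest to each centroid, and the distance from a site to its nearest Voronoi edge is taken as half the distance to the nearest neighbouring site. The function name and argument layout are hypothetical.

```python
import math


def cytoplasm_bounds(centroids, mask):
    """Return (lower, upper) area bounds per nucleus.

    centroids: list of (y, x) nucleus centroids (assumed Voronoi sites)
    mask: 2D list of booleans marking the cell-cluster region
    """
    # Upper bound: number of region pixels nearest to each centroid.
    upper = [0] * len(centroids)
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                nearest = min(
                    range(len(centroids)),
                    key=lambda i: (centroids[i][0] - y) ** 2 + (centroids[i][1] - x) ** 2,
                )
                upper[nearest] += 1

    # Lower bound: circle whose radius is half the distance to the nearest Voronoi edge.
    lower = []
    for i, (cy, cx) in enumerate(centroids):
        dists = [math.hypot(cy - oy, cx - ox)
                 for j, (oy, ox) in enumerate(centroids) if j != i]
        if dists:
            edge_distance = min(dists) / 2.0
            lower.append(math.pi * (edge_distance / 2.0) ** 2)
        else:
            lower.append(float(upper[i]))  # single cell: the region area is used directly
    return lower, upper
```

Note that in this sketch the upper bound still includes the nucleus pixels themselves; subtracting the nucleus area would be a natural refinement.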

3.2.2 Extracted features

Several features describing shape and structure have been used to construct the feature vector for classification.

Shape descriptor [5 features]. A circle is centred at the nucleus centroid. The number of pixels belonging to the nucleus is counted along rays going from the centroid in fixed directions (at 15-degree steps). The differences between the values of consecutive rays are computed, and finally a histogram of 5 bins is computed from these differences. This description is invariant to intensity changes and rotation.
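The following sketch shows one way such a ray-based descriptor could be computed from a binary nucleus mask. Several details are assumptions, since the report does not specify them: rays are sampled at integer radii with nearest-pixel rounding, absolute differences are taken cyclically between neighbouring rays, the bins are equally wide over the range 0 to `radius`, and the histogram is normalised.

```python
import math


def shape_descriptor(mask, centroid, radius, step_deg=15, bins=5):
    """Histogram of differences between nucleus pixel counts on neighbouring rays."""
    cy, cx = centroid
    h, w = len(mask), len(mask[0])
    counts = []
    for k in range(360 // step_deg):
        angle = math.radians(k * step_deg)
        n = 0
        for r in range(1, radius + 1):
            y = round(cy + r * math.sin(angle))
            x = round(cx + r * math.cos(angle))
            if 0 <= y < h and 0 <= x < w and mask[y][x]:
                n += 1
        counts.append(n)

    # Cyclic differences, so that the first ray is compared with the last one.
    diffs = [abs(counts[i] - counts[i - 1]) for i in range(len(counts))]

    hist = [0] * bins
    bin_width = (radius + 1) / bins
    for d in diffs:
        hist[min(int(d / bin_width), bins - 1)] += 1
    return [c / len(diffs) for c in hist]
```

Because the differences are taken cyclically and only their histogram is kept, rotating the nucleus by a multiple of the angular step leaves the descriptor unchanged. A round nucleus concentrates all mass in the first bin, while an elongated one spreads it into higher bins.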

Structure descriptor [7 features]. An area around the nucleus centroid is taken in order to analyse the nucleus structure. The variance within small sub-areas is computed, and a histogram is constructed from all the resulting values. This description is invariant to intensity changes.
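A possible form of this descriptor is sketched below. The window half-size, the 3x3 block size, the variance range `var_max` and the use of 7 bins are assumptions chosen for illustration; the report only states that variances of small areas are collected into a histogram.

```python
from statistics import pvariance


def structure_descriptor(img, centroid, half_size=6, block=3, bins=7, var_max=1000.0):
    """Normalised histogram of block-wise intensity variances around the centroid."""
    cy, cx = centroid
    h, w = len(img), len(img[0])
    variances = []
    for y0 in range(cy - half_size, cy + half_size, block):
        for x0 in range(cx - half_size, cx + half_size, block):
            values = [
                img[y][x]
                for y in range(y0, y0 + block)
                for x in range(x0, x0 + block)
                if 0 <= y < h and 0 <= x < w
            ]
            if len(values) > 1:
                variances.append(pvariance(values))

    hist = [0] * bins
    for v in variances:
        # Variances above var_max fall into the last bin.
        hist[min(int(v / var_max * bins), bins - 1)] += 1
    total = len(variances) or 1
    return [c / total for c in hist]
```

Since variance does not change when a constant is added to every pixel, the descriptor is unaffected by a uniform brightness shift, which matches the invariance claimed in the text.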

Nucleus shape features [7 features]
- Nucleus area
- Nucleus perimeter
- Major axis length
- Minor axis length
- Roundness
- Estimated cytoplasm area
- Nucleus-cytoplasm area ratio
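The seven listed quantities could be computed from a binary nucleus mask roughly as follows. The definitions are assumptions, because the report does not give formulas: the perimeter is counted as the number of exposed pixel edges, the axis lengths come from the ellipse with the same second central moments, and roundness is taken as 4 * area / (pi * major_axis^2), one of several definitions in common use.

```python
import math


def nucleus_shape_features(mask, cytoplasm_area):
    """Compute the seven nucleus shape features from a binary mask (assumed definitions)."""
    pts = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    inside = set(pts)
    area = len(pts)

    # Perimeter: pixel edges that border a non-nucleus pixel (4-connectivity).
    perimeter = sum(
        (y + dy, x + dx) not in inside
        for y, x in pts
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )

    # Second central moments of the pixel coordinates.
    my = sum(y for y, _ in pts) / area
    mx = sum(x for _, x in pts) / area
    syy = sum((y - my) ** 2 for y, _ in pts) / area
    sxx = sum((x - mx) ** 2 for _, x in pts) / area
    sxy = sum((y - my) * (x - mx) for y, x in pts) / area

    # Axis lengths of the ellipse with the same second moments.
    common = math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)
    major = 4 * math.sqrt((sxx + syy + common) / 2)
    minor = 4 * math.sqrt(max((sxx + syy - common) / 2, 0.0))

    return {
        "area": area,
        "perimeter": perimeter,
        "major_axis": major,
        "minor_axis": minor,
        "roundness": 4 * area / (math.pi * major ** 2),
        "cytoplasm_area": cytoplasm_area,
        "nc_ratio": area / cytoplasm_area,
    }
```

The `cytoplasm_area` argument is expected to come from the cytoplasm estimate; it is passed through unchanged and used for the nucleus-cytoplasm area ratio.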

3.3 Classification

The data used for training and testing consist of feature vectors with 19 features each. The classification classes are cancerous cell and normal cell. The features were chosen so that the types of normal cells do not have to be distinguished. The best classification result has been obtained using a feedforward Artificial Neural Network. The Matlab Neural Network Toolbox has been used to train and test the network. The best network had 20 hidden-layer neurons. Cross-validation has been used for more reliable training and testing.
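To make the network shape concrete, the sketch below builds an untrained 19-20-1 feedforward network and a simple k-fold index split. It is not the Matlab toolbox code used in the project: the tanh hidden activation, the sigmoid output, the random initialisation range and the number of folds are assumptions, and no training step is shown.

```python
import math
import random


def init_network(n_in=19, n_hidden=20, seed=0):
    """Random weights for one hidden layer and a single output unit."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(x, net):
    """Return a score in (0, 1); values near 1 are read as 'cancerous cell'."""
    w1, b1, w2, b2 = net
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    z = sum(w * hv for w, hv in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))


def k_fold_indices(n, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

With 19 inputs, 20 hidden neurons and one output, such a network has 19 * 20 + 20 + 20 + 1 = 421 trainable parameters. In each cross-validation round one fold would be held out for testing and the remaining folds used for training.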

4 Results

4.1 Dataset

Full Pap smear images have been used to obtain the training and test data. The dataset consists of 40 labelled images, see Figure 5. There were 10 cancerous and 30 normal Pap smears in the dataset. A Pap smear is considered to be cancerous (abnormal) if at least one cell is cancerous.
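The image-level rule stated above amounts to a logical OR over the cell-level decisions. A minimal sketch, with a hypothetical function name and boolean per-cell predictions as the assumed input format:

```python
def smear_label(cell_predictions):
    """Label a whole smear from per-cell predictions (True = cancerous cell)."""
    return "abnormal" if any(cell_predictions) else "normal"
```

One consequence of this rule is that a single false-positive cell is enough to flag an otherwise normal smear as abnormal.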

Figure 5: Example of dataset labelling (cancer image)

4.2 Classification results

Segmentation yielded 548 nuclei, of which 31 belonged to cancerous cells and the remainder to any other cells (not necessarily only normal cervical cells). 37 images were successfully classified. The result on cell classification was 79%. Due to the variability of cell clusters, a much bigger dataset would be needed in order to get better results. A bigger dataset would also help with the preliminary analysis and feature selection.

References

[1] M. E. Plissiti, C. Nikou, A. Charchanti, Automated detection of cell nuclei in Pap smear images using morphological reconstruction and clustering, IEEE Transactions on Information Technology in Biomedicine 15(2) (2011) 233-241.

[2] S. Rogers, Collection of specimens for conventional & thinprep pap tests, hpv tests, & gc/ct [http://www.frhg.org/documents/Lab Manuals/Collection-of-Specimens-for-Conventional-and-Thin-Prep-Pap-Tests,-HPV-Tests,-and-GC-CT-Tests.pdf]

[3] The histology guide, University of Leeds [http://www.cancer.org/cancer/cervicalcancer/index]

[4] Haematoxylin eosin (H&E) staining [http://protocolsonline.com/histology/dyes-and-stains/haematoxylin-eosin-he-staining/]

[5] Pathology of an abnormal Pap Smear [http://wdavidstinsonmd.com/Pap%20test.htm]

[6] A. Gençtav, S. Aksoy, S. Önder, Unsupervised segmentation and classification of cervical cell images, Pattern Recognition 45 (2012) 4151-4168.

[7] M. E. Plissiti, C. Nikou, A. Charchanti, Automated detection of cell nuclei in Pap smear images using morphological reconstruction and clustering, IEEE Transactions on Information Technology in Biomedicine 15(2) (2011) 233-241.

[8] P. Sobrevilla, E. Montseny, F. Vaschetto, E. Lerma, Fuzzy-based analysis of microscopic color cervical pap smear images: Nuclei detection, Computational Intelligence and Applications 9(3) (2010) 187-206.


[9] N. M. Harandi, S. Sadri, N. A. Moghaddam, R. Amirfattahi, An automated method for segmentation of epithelial cervical cell images of ThinPrep, Journal of Medical Systems 34 (2010) 1043-1058.

[10] S. N. Sulaiman, N. A. M. Isa, N. H. Othman, Semi-automated pseudo colour features extraction technique for cervical cancers pap smear images, Knowledge-based and Intelligent Engineering Systems 15 (2011) 131-143.

[11] C. Bergmeir, M. García Silvente, J. M. Benítez, Segmentation of cervical cell nuclei in high-resolution microscopic images: A new algorithm and a web-based software framework, Computer Methods and Programs in Biomedicine 107(3) (2012) 497-512.

[12] B. Sokouti, S. Haghipour, A. D. Tabrizi, A framework for diagnosing cervical cancer disease based on feedforward MLP neural network and ThinPrep histopathological cell image features, Neural Computing and Applications (2012) DOI 10.1007/s00521-012-1220-y

[13] N. A. Mat-Isa, M. Y. Mashor, N. H. Othman, An automated cervical pre-cancerous diagnostic system, Artificial Intelligence in Medicine 42 (2008) 1-11.

[14] P. Bamford, B. Lovell, Unsupervised cell nucleus segmentation with active contours, Signal Processing 71(2) (1998) 203-213.

[15] H.-S. Wu, J. Barba, J. Gil, A parametric fitting algorithm for segmentation of cell images, IEEE Transactions on Biomedical Engineering 45(3) (1998) 400-408.

[16] S.-F. Yang-Mao, Y.-K. Chan, Y.-P. Chu, Edge enhancement nucleus and cytoplasm contour detector of cervical smear images, IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics 38(2) (2008) 353-366.

[17] M.-H. Tsai, Y.-K. Chan, Z.-Z. Lin, S.-F. Yang-Mao, P.-C. Huang, Nucleus and cytoplasm contour detector of cervical smear image, Pattern Recognition Letters 29(9) (2008) 1441-1453.


[18] N. M. Harandi, S. Sadri, N. A. Moghaddam, R. Amirfattahi, An automated method for segmentation of epithelial cervical cells in images of ThinPrep, Journal of Medical Systems 34(6) (2010) 1043-1058.

[19] M. E. Plissiti, C. Nikou, A. Charchanti, Combining shape, texture and intensity features for cell nuclei extraction in Pap smear images, Pattern Recognition Letters 32(6) (2011) 838-853.

[20] K. Li, Z. Lu, W. Liu, J. Yin, Cytoplasm and nucleus segmentation in cervical smear images using Radiating GVF snake, Pattern Recognition 45(4) (2012) 1255-1264.

[21] Y. Chen, P. Huang, et al., Semi-Automatic segmentation and classification of Pap smear Cells, IEEE Journal of Biomedical and Health Informatics (2013) PP

[22] R. Ashfaq, B. Solares, M. Saboorian, Detection of endocervical component by Papnet system on negative cervical smears, Diagnostic Cytopathology 15(2) (1996) 121-124.

[23] C. Balas, A novel optical imaging method for the early detection, quantitative grading and mapping of cancerous and precancerous lesions of cervix, IEEE Transactions on Biomedical Engineering 48(1) (2001) 96-104.
