AUTOMATIC MITOSIS DETECTION IN BREAST HISTOPATHOLOGY IMAGES
USING KNN CLASSIFIER
G. Usha (1),
K. Narasimman (2),
T. Shanmuganathan (3),
M. Thalaimalaichamy (4)
(1) Department of ECE, SRC, SASTRA Deemed University, Kumbakonam, Tamilnadu, India
(2) Department of ECE, School of EEE, SASTRA Deemed University, Thanjavur, Tamilnadu, India
(3) Department of ECE, Hindustan Institute of Technology and Science, Chennai, Tamilnadu, India
(4) Department of ECE, SRC, SASTRA Deemed University, Kumbakonam, Tamilnadu, India
Abstract: Mitoses are very hard to detect, yet the mitotic count is an important factor in the grading
of breast cancer. Mitosis is a process in which the nucleus of a cell undergoes a series of
transformations. In addition, different image areas contain different tissue types, which vary
widely in appearance. Pixel classifiers have been used to solve many detection problems, but
typically problems in which the objects to be detected have a relatively obvious appearance. Here,
a KNN classifier is used to detect mitotic candidates among the contour-segmented nuclei regions.
The technique applies stain normalization to reduce the complexity of segmenting exact nuclei
boundaries in large clinical images. The algorithm provides improved performance, with an
average F-score of 99.09% on the mitosis dataset.
Keywords: H & E stained images, Reinhard stain normalization, K-means clustering, KNN
1. Introduction
Mitotic count is one of the most important prognostic factors in breast cancer grading, as it
is a key element in the assessment of a tumour. Mitotic nuclei usually appear as hyperchromatic
objects without a clear nuclear membrane in H & E stained breast histopathology images [1].
Fig. 1 shows the five main phases, namely interphase, prophase, metaphase, anaphase and
telophase. The shape of the nucleus differs from stage to stage; however, each should be counted
as a single mitosis, since the stages are not separate cells. Because mitotic cells are rare, small
and highly varied in shape, detecting them is time-consuming and extremely difficult. In addition,
irregular illumination, non-uniform staining and the presence of lymphocyte nuclei make
detection more difficult [2].
International Journal of Pure and Applied Mathematics, Volume 119, No. 18, 2018, 2795-2805, ISSN: 1314-3395 (on-line version), url: http://www.acadpubl.eu/hub/ Special Issue http://www.acadpubl.eu/hub/
In this paper, pre-processing is done using the Reinhard stain normalization technique.
The input images are H & E stained: haematoxylin stains cell nuclei blue, and eosin stains
proteins red, pink or orange. For segmentation, the K-means clustering algorithm separates the
region of interest from the background by partitioning the data into a given number of clusters;
K is the number of selected centroids and hence the number of clusters. The selected clustered
image is then classified with a KNN classifier.
2. Literature Survey
Detecting mitoses in H & E stained breast cancer slides is tedious, because mitotic figures
are small and vary widely in shape. Mitoses are also easily confused with other artefacts present
in the image [3]. The Krill Herd algorithm was proposed for solving optimization tasks. It
simulates the herding behaviour of krill individuals, and the minimum distances of individual
krill from the highest density of the herd form the objective function for krill movement [4]. The
number of cells undergoing mitosis plays a vital role in the classification system. Because manual
counting is difficult, a computer-assisted system can produce more precise and accurate
results [5]. Multi-threshold image analysis has been implemented in the detection process to
optimize detection [6]; in the segmentation of biomedical images, multiple thresholds are applied
so that no cell is missed.
3. Methods
Five steps are involved in classifying the input image: image acquisition, pre-processing,
segmentation, feature extraction and performance analysis. The acquired input image is
pre-processed by stain normalization, segmented using K-means clustering and finally classified
using a KNN classifier.
4. Preprocessing
Image pre-processing enhances the image. It consists of three major steps: filtering noise in the
input image, edge detection to separate the required object from the unwanted background, and
binary image conversion (converting the pixel values of the image into zeros and ones).
Fig. 1. Samples of mitotic cells in five mitotic phases.
The technique used for pre-processing is Reinhard stain normalization (Fig. 2). Haematoxylin is
a dark blue or violet stain that is basic in nature and binds to basophilic substances such as
DNA/RNA, which are acidic. Eosin is a pink or red stain that is acidic in nature and binds to
acidophilic substances such as arginine-rich proteins; it colours cytoplasm red and red blood
cells cherry red [9]. Haemalum is a complex formed from aluminium and haematein. It stains cell
nuclei blue, and counterstaining with an aqueous or alcoholic eosin solution colours eosinophilic
structures such as proteins in shades of red, pink and orange. Nuclear staining by haemalum
results from a chemical reaction between the dye and cellular components [7].
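The section names Reinhard stain normalization without giving the transform. The following is a minimal pure-Python sketch of the standard Reinhard colour transfer, not the authors' implementation: each pixel is converted to the lαβ colour space, every channel is shifted and scaled so that its mean and standard deviation match those of a reference (target) image, and the result is converted back to RGB. It assumes images are flat lists of (R, G, B) tuples in the 0-255 range; the function names are hypothetical, and the conversion matrices are the rounded values published by Reinhard et al. (2001).

```python
import math

# RGB -> LMS and LMS -> RGB matrices (rounded values from Reinhard et al., 2001).
RGB2LMS = [[0.3811, 0.5783, 0.0402],
           [0.1967, 0.7244, 0.0782],
           [0.0241, 0.1288, 0.8444]]
LMS2RGB = [[4.4679, -3.5873, 0.1193],
           [-1.2186, 2.3809, -0.1624],
           [0.0497, -0.2439, 1.2045]]


def _matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def rgb_to_lab(rgb):
    """Convert one RGB pixel (0-255) to Reinhard's l-alpha-beta space."""
    # Clamp LMS away from zero so log10 is defined for black pixels.
    L, M, S = (math.log10(max(c, 1e-6)) for c in _matvec(RGB2LMS, rgb))
    return [(L + M + S) / math.sqrt(3),
            (L + M - 2 * S) / math.sqrt(6),
            (L - M) / math.sqrt(2)]


def lab_to_rgb(lab):
    """Invert rgb_to_lab and clip the result to the 0-255 range."""
    x = lab[0] / math.sqrt(3)
    y = lab[1] / math.sqrt(6)
    z = lab[2] / math.sqrt(2)
    log_lms = [x + y + z, x + y - z, x - 2 * y]
    rgb = _matvec(LMS2RGB, [10 ** v for v in log_lms])
    return [min(255.0, max(0.0, c)) for c in rgb]


def _stats(pixels):
    """Per-channel mean and population standard deviation."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    stds = [math.sqrt(sum((p[c] - means[c]) ** 2 for p in pixels) / n)
            for c in range(3)]
    return means, stds


def reinhard_normalise(source, target):
    """Recolour `source` so its l-alpha-beta statistics match `target`."""
    src = [rgb_to_lab(p) for p in source]
    (m_s, s_s), (m_t, s_t) = _stats(src), _stats([rgb_to_lab(p) for p in target])
    out = []
    for p in src:
        # Standardise each channel, then rescale to the target mean and std.
        q = [(p[c] - m_s[c]) / (s_s[c] or 1.0) * s_t[c] + m_t[c] for c in range(3)]
        out.append(lab_to_rgb(q))
    return out
```

Because the published RGB/LMS matrices are rounded, the round trip through lαβ is only approximate (within a couple of grey levels), so an image normalized to itself comes back close to, but not exactly, unchanged.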
5. Segmentation
The accuracy of the mitotic count depends on the pre-processing, segmentation and
classification procedures. Stain normalization makes it possible to differentiate cell nuclei from
other cell structures. Segmentation follows; in the existing system it was performed using the
Krill Herd Algorithm (KHA). In the early stages of breast cancer, or of any other cancer, the
nuclear membrane usually vanishes, so the background and the nuclei cannot be differentiated
easily: no valid threshold (a dividing line between background and nucleus) can be found. In the
KHA approach, the coloured image of the infected tissue is therefore first converted to a binary
image of 0s and 1s (a black-and-white image), because processing a colour image is complex and
time-consuming. In this binary image the selected pixels are set to 1 and all other pixels to 0.
The result is a mask image that specifies the centroids of the nuclei regions.
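To make the mask-and-centroid step concrete, here is a minimal pure-Python sketch under stated assumptions: the image is already greyscale (a list of rows of intensities), a single hypothetical threshold marks dark, haematoxylin-rich pixels as 1, and 4-connected regions of 1s stand in for nuclei. It does not reproduce the KHA threshold search; it only shows what a binary mask and its region centroids look like. The function names are illustrative.

```python
from collections import deque


def binary_mask(gray, threshold):
    """Set pixels darker than `threshold` to 1 (candidate nuclei), the rest to 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def region_centroids(mask):
    """Return the (row, col) centroid of each 4-connected region of 1s."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects every pixel of this region.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            n = len(pixels)
            centroids.append((sum(p[0] for p in pixels) / n,
                              sum(p[1] for p in pixels) / n))
    return centroids
```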
Three thresholds are then selected to separate nuclei from cytoplasm, background stroma and
vacuoles. The resulting bi-level image is a mask that provides the initial outline for segmenting
nuclei with exact boundaries using the LACM. The Localized Active Contour Model (LACM) is a
general computer-vision framework for delineating an object contour in a noisy image, used in
applications such as segmentation and shape recognition. It is an energy-minimizing curve (a
spline, i.e. a curve connecting specified points) that is pulled towards object contours and
deforms to fit them. This completes the description of the existing system. In the proposed
system, the K-means algorithm is used instead of KHA and LACM: in LACM, energy minimization
causes fine features along the contour to be ignored, and KHA is inefficient on large datasets
and slow.
K-means is a clustering algorithm; here it is used to segment the selected area from the
background, after pre-processing has improved the quality of the image. K-means partitions the
given data set into a certain number of clusters (groups of similar items). How are the clusters
obtained? Take as an example an image of 100×100 pixels, and select a part of it that has
10×10 pixels; these 10×10 pixels are 100 data points. In K-means, K is the number of randomly
selected centroids, and hence the number of clusters. Let K = 2, so two centroids, C1 and C2,
are selected at random on the 100×100 pixel image. Next, the distance from each of the 100 data
points to C1 and to C2 is calculated, and each point is assigned to the nearer centroid. Suppose
36 pixels (a 6×6 block) are closer to C1 and the remaining 64 pixels are closer to C2.
The next step is to calculate the mean (average) of the pixels assigned to C1 and of those
assigned to C2. These averages become the new centroids. The process is then repeated: the
distances of the data points from the new centroids are calculated, the points are reassigned,
and so on. After three to four iterations the process can be stopped, because any new centroids
lie very close to the previous ones.
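The iteration described above (random centroids, nearest-centroid assignment, mean update, repeat until the centroids settle) is Lloyd's algorithm for K-means. A minimal pure-Python sketch follows, assuming each pixel is a tuple of feature values (one grey level, or an R, G, B triple); the function name and the `seed` parameter are illustrative, and an empty cluster simply keeps its previous centroid.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Step 1: pick k data points at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Step 3: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its old centroid
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

In the worked example, `points` would hold the 100 pixels of the 10×10 patch and `k=2`; with K = 4, as in the project, the same call returns four centroids and a label per pixel.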
Two clusters are extracted from the normalized image during segmentation. In our project K was
set to 4, so there are four randomly selected centroids and, correspondingly, four clustered
images. The images containing each group of similar pixels are generated with the built-in
repetition-matrix function (repmat) in MATLAB. This is how segmentation is done with the
K-means algorithm.
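MATLAB's `repmat` replicates an array; a common use at this step is to copy a two-dimensional cluster-label map across the three colour channels so that every pixel outside a given cluster can be blanked. A minimal Python analogue is sketched below, assuming the image is a list of rows of (R, G, B) tuples and that the K-means labels arrive as a flat, row-major list; `to_rows` and `cluster_images` are hypothetical helper names.

```python
def to_rows(flat_labels, width):
    """Reshape a flat, row-major list of labels into rows of `width` items."""
    return [flat_labels[i:i + width] for i in range(0, len(flat_labels), width)]


def cluster_images(image, labels, k):
    """Split an RGB image into k images, one per K-means cluster.

    `image` is a list of rows of (R, G, B) tuples and `labels` a same-shaped
    list of rows of cluster indices. Pixels outside cluster j are set to black.
    """
    black = (0, 0, 0)
    return [[[px if lab == j else black for px, lab in zip(img_row, lab_row)]
             for img_row, lab_row in zip(image, labels)]
            for j in range(k)]
```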
5.1 Advantages of K-means Algorithm
The K-means algorithm, together with the repetition matrix, achieves high processing speed. It
handles large data collections efficiently, and accurate boundaries can be identified using
K-means clustering.
6. Nuclei Classification
The classification phase consists of three stages:
• Feature computation
• Feature selection
• Decision fusion of individual classifiers using the KNN classifier framework
7. Feature Computation
Cells undergoing mitosis exhibit variations in texture, shape and size at different stages.
Fig. 3(a) shows an example input image and Fig. 3(b) a zoomed view of a selected segmented
region. Intensity-based, shape-based and texture-based features are extracted from the
segmented nucleus patch shown in Fig. 3(c). The intensity-based features are Median (M),
Variance (V), Kurtosis (K) and Skewness (S). The shape-based features are Area (A), Perimeter
(P) and Solidity (SL), and they are used together with thirteen Haralick texture features [3,7].
The Gray Level Co-occurrence Matrix (GLCM) describes how often pairs of pixels with specific
values occur at a given spatial offset in an image. A GLCM can be estimated for any direction;
here adjacency is taken horizontally (0°), vertically (90°) and along 45° and 135°, and the
texture features are computed in each of these four directions. Averaging over the four
directions gives thirteen texture features: Autocorrelation, Contrast (C), Correlation (CR), Sum
of Squares (SoS), Inverse Difference Moment (IDM), Sum Average (SA), Dissimilarity, Energy,
Entropy (E), Difference Variance (DV), Difference Entropy (DE), Information Measure of
Correlation (IMoC) and Cluster Tendency. The final feature vector therefore contains 20 features:
the 13 direction-averaged texture features together with the four intensity-based and three
shape-based features.
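The GLCM computation can be sketched in pure Python as follows; this is an illustration, not the authors' code. It assumes the nucleus patch is already quantised to integer grey levels in `range(levels)`, builds a symmetric, normalised GLCM at distance 1 for each of the four directions (the row/column offsets used for 45° and 135° are an assumed convention), and computes only five of the thirteen listed features before averaging over the directions: Contrast, Energy (taken here as the angular second moment), Inverse Difference Moment, Entropy and Correlation. Function names are hypothetical.

```python
import math

# Row/column offsets for 0, 45, 90 and 135 degrees at distance 1 (assumed convention).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(img, offset, levels):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset."""
    dr, dc = offset
    p = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                i, j = img[r][c], img[r2][c2]
                p[i][j] += 1  # count the pair in both orders (symmetric GLCM)
                p[j][i] += 1
    total = sum(map(sum, p))
    return [[v / total for v in row] for row in p] if total else p


def haralick_subset(p):
    """Five Haralick-style features of one normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        # Convention for a constant patch, where the correlation is undefined.
        "correlation": cov / (sd_i * sd_j) if sd_i and sd_j else 1.0,
    }


def texture_features(img, levels):
    """Average each feature over the four directions."""
    per_dir = [haralick_subset(glcm(img, off, levels)) for off in OFFSETS.values()]
    return {name: sum(d[name] for d in per_dir) / len(per_dir) for name in per_dir[0]}
```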
Table 1 – Feature Extraction
Feature type       Feature name                                  Dimension
Intensity based    Median, Variance, Kurtosis and Skewness       4
Shape based        Area, Perimeter and Solidity                  3
Texture based      Haralick features                             13
8. Feature Selection
The classifier subset evaluator selects a small subset of features that gives the best
discriminant information; the subset of features with the highest score is taken as the best
feature subset. The segmented image is then classified with a KNN classifier. The dataset taken
from the RCC is used as the training set, and the features of the input image form the test set.
Normalizing every feature to a uniform range removes the differences in dynamic range between
the features. The normalized value $N'$ is given by

$$N' = \frac{N - N_{\min}}{N_{\max} - N_{\min}} \qquad (1)$$

where $N$ is the actual feature value, and $N_{\min}$ and $N_{\max}$ represent the minimum and
maximum feature values.
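Equation (1) applied column by column can be sketched as below. It assumes the minimum and maximum are taken from the training features and reused for test samples, which the text does not state; constant columns are mapped to 0 to avoid division by zero. The names are hypothetical.

```python
def fit_min_max(rows):
    """Column-wise minimum and maximum of a feature matrix (list of rows)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_normalise(rows, mins, maxs):
    """Apply N' = (N - N_min) / (N_max - N_min) to every feature column."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, mins, maxs)]
            for row in rows]
```

Test values outside the training range are not clipped, so they can fall below 0 or above 1.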
9. Decision Fusion Using KNN Classifier
The KNN classifier assigns the test samples to groups based on the grouping of the training set.
The test set and the training set must have the same number of columns. Group is a vector whose
distinct values define the grouping of the rows of the training set. By default the classifier
uses the majority rule: a sample point is assigned to the class to which the majority of its k
nearest neighbours belong. The algorithm returns the row of the training-set matrix that matches
the test sample, and the classification is made on the basis of this row number. Fig. 4 compares
the results of the proposed KNN classifier with those of other classifiers.
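The majority rule described above can be sketched in pure Python. This is an illustration, not MATLAB's classifier: it assumes Euclidean distance and breaks voting ties in favour of the class of the nearest tied neighbour, and both choices are assumptions. It returns the predicted group together with the indices of the nearest training rows, mirroring the text's description of matching rows of the training-set matrix. The function name is hypothetical.

```python
import math
from collections import Counter


def knn_classify(sample, training, groups, k=3):
    """Assign `sample` to the majority class among its k nearest training rows.

    `training` rows and `sample` must have the same number of columns;
    `groups[i]` is the class label of training row i.
    """
    if any(len(row) != len(sample) for row in training):
        raise ValueError("sample and training rows need the same number of columns")
    # Sort training row indices by Euclidean distance to the sample.
    order = sorted(range(len(training)), key=lambda i: math.dist(sample, training[i]))
    nearest = order[:k]
    votes = Counter(groups[i] for i in nearest)
    top = max(votes.values())
    # Walk neighbours from nearest outwards; the first class with the top vote wins.
    for i in nearest:
        if votes[groups[i]] == top:
            return groups[i], nearest
```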
10. Experimental results and discussion
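A detected mitosis is considered correct if it lies within 8 μm of the centroid of a ground-truth
mitosis. The well-known validation measures precision, recall (sensitivity) and F-score are used;
Fig. 5 shows the performance graph of precision, recall and F-score.

$$\text{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}} \times 100 \qquad (2)$$

$$\text{F-score} = \frac{2 \times \text{Sensitivity} \times \text{Precision}}{\text{Sensitivity} + \text{Precision}} \times 100 \qquad (3)$$

where $N_{TP}$ is the number of true positives (correctly detected mitoses) and $N_{FP}$ the
number of false positives (wrongly detected mitoses). Sensitivity (recall) is
$N_{TP}/(N_{TP} + N_{FN})$, with $N_{FN}$ the number of false negatives (missed mitoses); in (3),
sensitivity and precision are taken as fractions rather than percentages.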
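The evaluation protocol can be sketched as follows, assuming detections and ground-truth centroids are given in micrometres (pixel coordinates would first be scaled by the scanner's pixel size). Matching is greedy: each detection takes the nearest still-unmatched ground-truth mitosis if it lies within the 8 μm radius, so an optimal one-to-one assignment could differ slightly. Scores are returned as fractions; multiply by 100 for the percentages of (2) and (3). The function names are hypothetical.

```python
import math


def match_detections(detections, ground_truth, radius):
    """Greedily match detections to ground-truth centroids within `radius`.

    Each ground-truth mitosis is matched at most once. Coordinates and radius
    must share a unit (e.g. micrometres). Returns (TP, FP, FN) counts.
    """
    unmatched = list(ground_truth)
    tp = 0
    for d in detections:
        if not unmatched:
            break  # remaining detections can only be false positives
        nearest = min(unmatched, key=lambda g: math.dist(d, g))
        if math.dist(d, nearest) <= radius:
            unmatched.remove(nearest)
            tp += 1
    return tp, len(detections) - tp, len(unmatched)


def f_score(precision, recall):
    """Harmonic mean of precision and recall, as in (3) without the factor 100."""
    s = precision + recall
    return 2 * precision * recall / s if s else 0.0


def scores(tp, fp, fn):
    """Precision (2), sensitivity/recall and F-score (3), as fractions."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)
```

As a consistency check, substituting the reported precision (0.98198198198198) and recall (1) into `f_score` gives approximately 0.990909, the F-score reported in the results.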
Fig. 5. Performance graph
[Bar chart: Sensitivity, Precision and F-score (0-120 scale) for the RF, MV-MCS, DBN-MCS and KNN classifiers]
Fig. 4. Comparative results of the proposed KNN classifier and other classifiers
The values obtained are:
Precision= 0.98198198198198
Recall = 1
F-Score = 0.990909090909091
11. Conclusion
The paper proposes an accurate framework for the segmentation and classification of mitotic
nuclei in H & E stained breast images. Mitotic count is an important factor in grading breast
cancer, but mitoses are very hard to detect and to distinguish from non-mitotic nuclei. The
proposed technique first uses Reinhard stain normalization to reduce the segmentation
complexity. The remaining segmentation complexity is handled effectively by the K-means
clustering algorithm, and a KNN classifier is used to categorize the image. Sequential feature
selection and feature normalization help to improve the classifier's performance. The proposed
technique is evaluated on a publicly available standard dataset. Compared with existing
techniques, the proposed framework achieves better performance, and its high sensitivity makes
it more practical for clinical applications.
References
[1] M. M. Dundar et al., "Computerized classification of intraductal breast lesions using
histopathological images," IEEE Trans. Biomed. Eng., vol. 58, no. 7, pp. 1977–1984, Jul. 2011.
[2] C. D. Malon and E. Cosatto, "Classification of mitotic figures with convolutional neural
networks and seeded blob features," J. Pathol. Inform., vol. 4, no. 1, p. 9, 2013.
[3] L. Roux et al., "Mitosis detection in breast cancer histological images: An ICPR 2012
contest," J. Pathol. Inform., vol. 4, no. 1, p. 8, 2013.
[4] A. H. Gandomi and A. H. Alavi, "Krill herd: A new bio-inspired optimization algorithm,"
Commun. Nonlinear Sci. Numer. Simul., vol. 17, no. 12, pp. 4831–4845, Dec. 2012.
[5] F. B. Tek et al., "Mitosis detection using generic features and an ensemble of cascade
adaboosts," J. Pathol. Inform., vol. 4, no. 1, p. 12, 2013.
[6] Sabina and Nidhi, "Adaptive multi thresholding for breast cancer stem cell detection: A review,"
Dept. of ECE, UIET, Panjab University, Chandigarh, India.
[7] H. Chen, Q. Dou, X. Wang, J. Qin, and P. A. Heng, "Mitosis detection in breast cancer
histology images via deep cascaded networks," in Proc. 30th AAAI Conf. Artif. Intell., 2016,
pp. 1160–1166.
[8] (2014). MITOS, ICPR 2014 Contest, IPAL UMI CNRS Lab Std. [Online]. Available:
http://ipal.cnrs.fr/ICPR2014.
[9] J. Savithri and H. Inbarani, "Comparative analysis of K-means, PSO-K-means, and hybrid
PSO-genetic K-means for gene expression data," International Journal of Innovations in
Scientific and Engineering Research, vol. 1, no. 1, pp. 43–50, 2014.