[IEEE 2014 IEEE Region 10 Symposium - Kuala Lumpur, Malaysia (2014.4.14-2014.4.16)] 2014 IEEE REGION...


HEp-2 Cell Classification Using Multilevel Wavelet Decomposition

Ranveer Katyal
Dept. of Communication and Computer Engineering
The LNM Institute of Information Technology
Jaipur, India
Email: [email protected]

Manohar Kuse
Electronics and Computer Engineering Dept.
Hong Kong University of Science and Technology
Hong Kong
Email: [email protected]

Subrat Kumar Dash
Dept. of Computer Science and Engineering
The LNM Institute of Information Technology
Jaipur, India
Email: [email protected]

Abstract—The analysis of anti-nuclear antibodies in HEp-2 cells by Indirect Immunofluorescence (IIF) is considered a powerful, sensitive, and comprehensive test for auto-antibody analysis in autoimmune diseases. The aim of this study is to explore the use of wavelet texture analysis for automated categorization of auto-antibodies into one of the six categories of immunofluorescent staining. Gray level co-occurrence matrix (GLCM) features were extracted over sub-bands obtained from multi-level wavelet decomposition. In this study, an attempt is also made to investigate the effect of different wavelet bases and their superiority over spatial domain features on the classification task at hand. A qualitative as well as quantitative comparison is made between GLCM features in the wavelet domain and in the spatial domain. The Discrete Meyer wavelet has been found to be the most discriminating for this classification task.

Keywords—HEp-2 Cell Classification, Multi-level Wavelet Decomposition, Wavelet Texture Representation

I. INTRODUCTION

The American Autoimmune Related Diseases Association (AARDA) has reported that about 50 million Americans suffer from an autoimmune disease of one kind or another [1]. There are over 80 diseases caused by autoimmunity. Autoimmune connective tissue disorders (CTDs) are among the most common autoimmune diseases. Detection of antinuclear antibodies (ANAs) is a common marker in patients with suspected CTD. Such antinuclear antibodies are best diagnosed by indirect immunofluorescence (IIF) of biopsy specimens. Although more than thirty different nuclear and cytoplasmic patterns of immunofluorescent staining could be identified [2], in the literature they are classified into one of six groups [3], samples of which are shown in Fig. 1.

The aim of this study is to develop a method based on automatic processing of IIF images for categorization of auto-antibodies into one of the six categories of immunofluorescent staining from the segmented images of the HEp-2 cells. Texture is an important descriptor in the classification of these images into the six pattern classes. In addition to the standard co-occurrence based texture descriptors, wavelets have also found application in texture discrimination. The first reported texture classification study using wavelets was by Carter [4]. Wavelet features have also found applications in medical imaging. Qureshi et al. [5] developed discriminant wavelet features and reported that they significantly outperform local binary patterns (LBP) for classification of meningioma subtype histopathology images.

Fig. 1: Staining Patterns in the HEp-2 images [3]. Panels: Homogeneous, Coarse Speckled, Cytoplasmatic, Fine Speckled, Centromere, Nucleolar.

II. LITERATURE REVIEW

In the last few years some research groups have proposed different algorithms for the analysis of IIF images. An early attempt by Perner et al. [6] was on data mining for patterns in HEp-2 cell images. Although there were efforts by a few other research groups on the analysis of IIF images, the validation of these proposed methods has been carried out on small and private datasets. Hence the results are not reproducible and cannot be compared.

In an effort to standardize evaluation and allow a fair comparison, the HEp-2 Cells Classification contest [7] was hosted by the 21st International Conference on Pattern Recognition (ICPR). The aim of this contest was to bring together researchers interested in performance evaluation of algorithms for IIF image analysis on a common dataset.

Several authors took part in this contest and made use of this dataset. Strandmark et al. [8] presented a classification method based on random forests that achieved an accuracy of about 97.4%. A classification scheme using morphological and textural features was proposed by Theodorakopoulos et al. [9], which achieved an overall accuracy of 95.9%. Ghosh et al. [10] proposed a method using shape and texture based features and obtained the final classification using a multi-class SVM, achieving an accuracy of 91.3%. Ersoy et al. [11] used a Share-boost classifier and a feature set comprising texture and ARST-HOG based features to achieve an overall accuracy of 92.8%.

III. DATA DESCRIPTION

The samples of HEp-2 images used in this study were acquired by a fluorescence microscope (40-fold magnification) coupled with a 50 W mercury vapor lamp and with a digital camera (SLIM system by Das srl). The camera has a CCD with square pixels of 6.45 μm. The acquired images have been stored at a resolution of 1388×1038 pixels and a color depth of 24 bits [12].

The dataset consisted of 14 IIF images based on HEp-2 substrate, contributing a total of 721 cells. The cell images were manually segmented by cropping the bounding box of the cell. These cells were distributed among the following six categories of staining patterns: (i) Centromere: 208 (29%) (ii) Coarse Speckled: 109 (15%) (iii) Cytoplasmatic: 58 (8%) (iv) Fine Speckled: 94 (13%) (v) Homogeneous: 150 (21%) (vi) Nucleolar: 102 (14%). Examples of each category are depicted in Fig. 1. The annotations for the dataset were made and verified by an expert specialized in immunology. The annotated dataset was made available by the HEp-2 Cells Classification contest [7], which was hosted by the 21st International Conference on Pattern Recognition (ICPR).

IV. PROPOSED METHOD

In this section we propose a method for automatic classification of pre-segmented HEp-2 images. The proposed method consists of two steps:

1) Feature Extraction: Extraction of wavelet domain features.

2) Classification: Classification using a trained Neural Network framework.

The rest of this section gives details on the proposed method.

A two-dimensional wavelet decomposition was carried out for up to 3 levels on the gray-scale image of each of the 721 cell images. For comparison purposes, wavelets from six families were used in our study: Haar, Discrete Meyer, Daubechies, Coiflets, Symlets and Biorthogonal. The wavelet decomposition resulted in an approximation sub-band (cA) and horizontal (cH), vertical (cV), and diagonal (cD) detail sub-bands at each level.
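The decomposition step can be illustrated with a minimal pure-Python sketch. It uses the Haar wavelet only because its filters are simple enough to write out by hand; the other families studied here (including Discrete Meyer) would normally come from a wavelet library. The function names `haar_step` and `wavedec2_haar` are hypothetical, the input is assumed to have even dimensions at every level, and sign conventions for the detail sub-bands vary between implementations.

```python
def haar_step(img):
    """One level of 2-D Haar analysis on a list-of-lists image.

    Assumes even height and width. Returns (cA, cH, cV, cD), each half the
    size of the input in both directions.
    """
    rows, cols = len(img), len(img[0])
    cA, cH, cV, cD = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            p, q = img[r][c], img[r][c + 1]          # top-left, top-right
            s, t = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            a_row.append((p + q + s + t) / 2.0)  # approximation (low-pass both ways)
            h_row.append((p + q - s - t) / 2.0)  # horizontal detail (row difference)
            v_row.append((p - q + s - t) / 2.0)  # vertical detail (column difference)
            d_row.append((p - q - s + t) / 2.0)  # diagonal detail
        cA.append(a_row)
        cH.append(h_row)
        cV.append(v_row)
        cD.append(d_row)
    return cA, cH, cV, cD


def wavedec2_haar(img, levels=3):
    """Multi-level decomposition: returns (final cA, [(cH, cV, cD) per level])."""
    details = []
    approx = img
    for _ in range(levels):
        approx, cH, cV, cD = haar_step(approx)
        details.append((cH, cV, cD))
    return approx, details


# Toy 8x8 "image"; three levels give 3 x 3 = 9 detail sub-bands.
image = [[float((r * 8 + c) % 7) for c in range(8)] for r in range(8)]
cA3, detail_bands = wavedec2_haar(image, levels=3)
```

Each level halves the sub-band size, so an 8×8 input yields detail sub-bands of size 4×4, 2×2 and 1×1.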

Nineteen GLCM texture features were extracted from the gray-scale images of the cells. The same nineteen GLCM features were also evaluated for each of the three detail sub-bands, cH, cV, and cD, obtained by wavelet decomposition of the input gray-scale cell image for up to three levels, as shown in Fig. 2 and described in equation 2. For wavelet decomposition up to three levels there were 9 detail sub-bands (3 in each level). This resulted in a total of 190 features (9×19 = 171 for the detail sub-bands and 19 for the gray-scale image) for each of the 721 sample patterns available.
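The feature count above can be checked with a short sketch that lays out one label per feature. The naming scheme (`gray_f1`, `cH1_f1`, and so on) and the ordering of the vector are hypothetical and are not taken from the paper; only the counts come from the text.

```python
LEVELS = 3                        # decomposition depth used in the text
DETAIL_BANDS = ("cH", "cV", "cD")  # detail sub-bands per level
N_GLCM_FEATURES = 19               # GLCM features per image or sub-band


def feature_names(levels=LEVELS, bands=DETAIL_BANDS, n_features=N_GLCM_FEATURES):
    """Return one label per entry of the combined feature vector."""
    # 19 features from the gray-scale image itself.
    names = [f"gray_f{i}" for i in range(1, n_features + 1)]
    # 19 features from each detail sub-band at each level.
    for k in range(1, levels + 1):
        for band in bands:
            names += [f"{band}{k}_f{i}" for i in range(1, n_features + 1)]
    return names


names = feature_names()
print(len(names))  # 19 + 3 * 3 * 19 = 190
```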

Fig. 2: Flow chart demonstrating feature extraction process

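The Gray Level Co-occurrence Matrix G_ij(Δx, Δy) for an image I is characterized by (Δx, Δy), the offsets in the x and y directions, respectively. Its entry (i, j) counts, over all pixel positions (p, q), the co-occurrences of gray level i at (p, q) and gray level j at the offset position:

\[
G_{ij}(\Delta x, \Delta y) = \sum_{p}\sum_{q}
\begin{cases}
1 & \text{if } I(p,q) = i \text{ and } I(p+\Delta x,\, q+\Delta y) = j \\
0 & \text{otherwise}
\end{cases}
\tag{1}
\]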

The Gray Level Co-occurrence Matrices were computed for four offset values (Δx = k, Δy = k, for k = 1, 2, 3, 4) and for four directions (θ = 0°, 45°, 90° and 135°). The average of these eight values was used to form a set of 19 features [13].
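As a rough illustration of equation (1), the sketch below counts co-occurrences for a given offset on a small integer-valued image. It assumes that p indexes columns (x) and q indexes rows (y), that pairs falling outside the image are skipped, and that the input has already been quantized to integer gray levels; how the wavelet coefficients were quantized is not stated in the text. The helper `direction_offsets` uses a common convention for mapping the four angles to offsets at distance k, which is also an assumption.

```python
def glcm(img, dx, dy, levels):
    """Count G[i][j]: gray level i at (p, q) and level j at (p + dx, q + dy)."""
    rows, cols = len(img), len(img[0])
    G = [[0] * levels for _ in range(levels)]
    for q in range(rows):        # q: row index (y direction)
        for p in range(cols):    # p: column index (x direction)
            p2, q2 = p + dx, q + dy
            if 0 <= p2 < cols and 0 <= q2 < rows:  # skip pairs leaving the image
                G[img[q][p]][img[q2][p2]] += 1
    return G


def normalize(G):
    """Turn counts into joint probabilities that sum to 1."""
    total = sum(sum(row) for row in G)
    return [[v / total for v in row] for row in G]


def direction_offsets(k):
    """(dx, dy) for each angle at distance k; rows grow downward, so 'up' is dy = -k."""
    return {0: (k, 0), 45: (k, -k), 90: (0, -k), 135: (-k, -k)}


toy = [[0, 0, 1],
       [1, 2, 2],
       [0, 1, 2]]
G_0deg = glcm(toy, *direction_offsets(1)[0], levels=3)
P_0deg = normalize(G_0deg)
```

For the toy image, the 0° matrix at distance 1 has six pairs in total: one (0,0), two (0,1), two (1,2) and one (2,2).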

\[
[f_1, f_2, \dots, f_{19}] = \mathrm{GLCM}[C_i]
\tag{2}
\]

where C_i ∈ {cH_k, cV_k, cD_k} for each level k = 1, 2, 3 of the wavelet decomposition, for each sample image.

The nineteen GLCM texture features derived from the gray level co-occurrence matrix G_ij(Δx, Δy) [14], [13] are: Autocorrelation, Contrast, Correlation, Cluster Prominence, Cluster Shade, Dissimilarity, Energy, Entropy, Homogeneity, Maximum probability, Variance, Sum average, Sum variance, Sum entropy, Difference variance, Difference entropy, Information measure of correlation, Normalized inverse difference, Normalized inverse difference moment.
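A few of the listed features can be sketched directly from a normalized co-occurrence matrix. The formulas below are common textbook forms and are assumptions rather than the exact definitions used in this study; in particular, definitions of Energy (sum of squares versus its square root) and Homogeneity (squared versus absolute difference in the denominator) differ between toolboxes. The function name `glcm_features` is hypothetical.

```python
import math


def glcm_features(P):
    """Compute six texture features from a normalized GLCM P (entries sum to 1)."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    return {
        # Weighted squared gray-level difference between pixel pairs.
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        # Weighted absolute gray-level difference.
        "dissimilarity": sum(abs(i - j) * p for i, j, p in cells),
        # Sum of squared probabilities (angular second moment form).
        "energy": sum(p ** 2 for _, _, p in cells),
        # Shannon entropy with natural logarithm; zero entries contribute nothing.
        "entropy": -sum(p * math.log(p) for _, _, p in cells if p > 0),
        # Inverse difference moment form of homogeneity.
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "max_probability": max(p for _, _, p in cells),
    }


# All mass on the diagonal: neighbouring pixels always share a gray level.
uniform_texture = glcm_features([[0.5, 0.0], [0.0, 0.5]])
# All mass off the diagonal: neighbouring pixels always differ by one level.
alternating_texture = glcm_features([[0.0, 0.5], [0.5, 0.0]])
```

The two toy matrices show the intended behaviour: contrast is 0 for the diagonal case and 1 for the off-diagonal case, while homogeneity drops from 1 to 0.5.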

Feed-forward Neural Networks are a popular technique in classification because of their low dependency on domain-specific knowledge. The entire dataset consisting of 721 patterns was divided into a training dataset (505 patterns, i.e. 70%), a validation dataset (108 patterns, i.e. 15%) and a test dataset (108 patterns, i.e. 15%). A two-layer feed-forward Neural Network classifier with 35 sigmoid hidden neurons was trained using the training dataset. The sets were sampled randomly, and multiple runs of the neural network were made for evaluation.
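The random 70/15/15 partition described above can be sketched as follows. The function name `split_indices`, the use of a fixed seed, and the rounding rule are assumptions made for illustration; the text only gives the resulting set sizes.

```python
import random


def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train / validation / test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # seeded so the split is repeatable
    n_train = round(n * train_frac)    # 721 * 0.70 -> 505
    n_val = round(n * val_frac)        # 721 * 0.15 -> 108
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]       # remainder: 721 - 505 - 108 = 108
    return train, val, test


train_idx, val_idx, test_idx = split_indices(721)
print(len(train_idx), len(val_idx), len(test_idx))  # 505 108 108
```

Repeating the call with different seeds gives the independent random splits needed for multiple evaluation runs.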

V. RESULTS AND DISCUSSION

This section gives a qualitative as well as a quantitative evaluation of the features used for classification of HEp-2 cell images. Chernoff faces [15] were used to show qualitatively that the GLCM texture descriptors in the wavelet domain are much more discriminating than the standard GLCM texture descriptors in the spatial domain. Measures such as sensitivity, accuracy and F-measure were evaluated on randomly sampled test datasets to demonstrate quantitatively that wavelet domain features have better discriminating power than spatial domain features. Furthermore, a comparison is made with other works that have been evaluated on the same dataset.

Fig. 3: Chernoff Faces representation of (a) wavelet domain features and (b) spatial domain features

In the Chernoff face representation, the individual parts of a human face such as eyes, ears, mouth and nose represent values of the features by their shape, size, placement and orientation [15]. A visual qualitative inspection of Fig. 3 shows that the faces drawn from wavelet domain features are more dissimilar from one another than those drawn from spatial domain features. Thus, qualitatively, the wavelet domain offers more discriminant texture descriptors.

As mentioned earlier, texture features were evaluated from a multi-level wavelet decomposition up to 3 levels using various wavelet families. Mojsilovic et al. [16] studied the impact of the choice of wavelet basis on texture classification. They concluded that symmetry, degree of shift variance and number of vanishing moments play an important role in the choice of a more discriminant wavelet basis. Since the Daubechies family of filters is non-symmetric, these filters exhibit a large degree of shift variance, whereas Haar, Symlets and Coiflets are not perfectly symmetrical. Orthogonality of the filters plays an important role in the presence of noise, since orthogonality guarantees symmetry between the frequency responses of the low-pass and high-pass filters. Bi-orthogonal wavelets may therefore not give better results in the presence of noise, as in our case. The Discrete Meyer basis, on the other hand, is symmetric, has a fair number of vanishing moments and has a lower degree of shift variance, which makes it an optimal choice for the task at hand. This was verified by experimenting with the other wavelet bases, and the results obtained support this claim.

In this study, tests have been performed using wavelets from the following families: Discrete Meyer, Haar, Daubechies, Coiflet, Symlets and Bi-orthogonal. The classification accuracies on the test datasets, along with their deviations, are presented for the different wavelets in Table III. The entire dataset consisting of 721 patterns was randomly divided into a training dataset (70%), a validation dataset (15%) and a test dataset (15%). Ten such trials were run to produce Table III. The overall accuracy is reported as the mean of the accuracies obtained in each trial, along with the standard deviation of the corresponding measure. As explained earlier, Discrete Meyer is an optimal choice of basis, and the experimental data also show that the filters of the Discrete Meyer wavelet produced the classifier with the best performance.
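Reporting a result as mean ± standard deviation over repeated trials can be sketched as below. The accuracy values are made-up placeholders, not results from this study, and the use of the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption, since the text does not say which was used.

```python
import statistics


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-trial accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Hypothetical per-trial test accuracies in percent (illustrative only).
trial_accuracies = [90.0, 92.0, 94.0]
mean_acc, std_acc = summarize(trial_accuracies)
print(f"{mean_acc:.2f}±{std_acc:.2f}")  # 92.00±2.00
```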

Tests were also done to check that over-fitting does not occur, using a smaller training dataset and a larger testing dataset. As shown in Table I, even with a training set of 30% of the total available data, an accuracy of 92.7% is obtained on the testing set. Overall classification accuracies over the testing and training datasets for different training set sizes are also listed in Table I. It can also be observed that the testing and training accuracies have similar values, which further strengthens the argument that over-fitting is not occurring in our case. Thus we conclude that the classifier is general enough to perform well on an unknown dataset.

TABLE I: Accuracy values in percentages over the testing and training sets for varying training set sizes. Training set sizes are percentages of the total dataset. (OA = Overall Accuracy)

Training set size (%)   OA for testing set   OA for training set
30                      92.7±0.75            99.5±0.27
40                      94.1±1.37            99.2±0.27
50                      91.4±2.80            99.4±0.31
60                      94.3±1.55            99.5±0.24

A confusion matrix was evaluated for the test dataset in each of the ten trials using Discrete Meyer wavelet features. Accuracy, average sensitivity, average specificity, average precision, and average F1-score are presented in Table II. An initial comparison was done using only the GLCM features in the spatial domain. An average classification accuracy of 68.9% was obtained, with an average precision of 73.62%. The F-measure was found to be 70.99%. This demonstrates, with quantitative measures, the superiority of GLCM features in the wavelet domain over those in the spatial domain (see Table II). In addition, results from other authors who have evaluated their schemes on the same dataset are also presented. Although the proposed scheme is not the best performing scheme on the dataset, the authors are of the opinion that wavelet domain texture descriptors are more general than the descriptors used by other authors, and expect that this would lead to higher performance on unknown datasets. Evaluation of these methods with larger and more diverse datasets is required for a better comparison in terms of robustness and generalization.

TABLE II: Accuracy (Acc), average sensitivity (SN), average specificity (SP), average precision (Pr), and average F1-score (in percentages) for features evaluated by Discrete Meyer wavelet decomposition, standard GLCM in the spatial domain, and various authors who used the same dataset.

Method                        Acc          SN           SP           Pr           F1
Wavelet Domain Features       95.38±2.36   94.16±1.00   91.79±1.25   92.81±1.19   93.42±1.10
Spatial Domain Features       68.9±3.08    71.78±2.15   72.11±2.93   73.62±4.71   70.99±3.81
Strandmark et al. [8]         97.4         97.46        99.47        97.10        97.27
Theodorakopoulos et al. [9]   95.9         93.92        98.74        93.87        93.77
Ghosh et al. [10]             91.13        89.79        98.16        91.30        90.42
Ersoy et al. [11]             92.8         92.8         98.56        93.08        92.83

TABLE III: Overall classification accuracy values for staining pattern classification using various wavelet families on randomly sampled test datasets.

Wavelet            Overall Accuracy (%)
Daubechies 6       89.28±2.14
Discrete Meyer     95.38±2.36
Coiflet 1          86.96±2.52
Haar               89.14±1.96
Symlets 6          88.90±1.12
Biorthogonal 2.2   87.40±1.92
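The summary measures of the kind reported in Table II can be derived from a multi-class confusion matrix by treating each class in turn as "positive" against the rest. The sketch below is one common way to do this; the function names are hypothetical, the class-wise averaging is a plain unweighted (macro) mean, and the confusion matrix is a made-up two-class example. Whether the study averaged in exactly this way is not stated in the text.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c missed
        fp = sum(cm[r][c] for r in range(n)) - tp  # others labelled as c
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        results.append({"sensitivity": sens, "specificity": spec,
                        "precision": prec, "f1": f1})
    return results


def summarize_confusion(cm):
    """Overall accuracy plus unweighted class averages of the other measures."""
    per_class = per_class_metrics(cm)
    total = sum(sum(row) for row in cm)
    summary = {"accuracy": sum(cm[c][c] for c in range(len(cm))) / total}
    for key in ("sensitivity", "specificity", "precision", "f1"):
        summary[key] = sum(m[key] for m in per_class) / len(per_class)
    return summary


# Hypothetical two-class confusion matrix (illustrative only).
example_cm = [[8, 2],
              [1, 9]]
example_summary = summarize_confusion(example_cm)
example_per_class = per_class_metrics(example_cm)
```

For the six staining patterns the same functions apply unchanged to a 6×6 matrix, one per trial, with the per-trial summaries then averaged across trials.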

VI. CONCLUSION AND FUTURE WORK

In this work, an attempt is made at classification of immunofluorescent staining patterns in HEp-2 cell Indirect Immunofluorescence (IIF) images. Texture descriptors based on the gray level co-occurrence matrix (GLCM), applied to sub-bands obtained from multi-level wavelet decomposition, are proposed. It has been shown experimentally, with qualitative and quantitative measures, that GLCM features in the wavelet domain have more discriminating power than texture descriptors in the spatial domain. To achieve a better classification, a wavelet basis should be selected which is symmetric, exhibits small shift variance and has a sufficiently large number of vanishing moments. Accordingly, the Discrete Meyer wavelet was found to be the most discriminating for the classification task at hand. Although a basic comparison is made between the proposed scheme and schemes by other authors, evaluation with larger and more diverse datasets is required for a better view in terms of robustness and generalization.

It is further hypothesized that the use of dimensionality reduction techniques, the use of wavelet packets and optimal selection shall further aid in better texture description and classification. These hypotheses remain to be validated. Finally, a large-scale, multi-site collaborative study may prove to be a vehicle in the development of reliable and robust computerized tools for classification of HEp-2 cell images to be used in clinical practice.

REFERENCES

[1] “The American Autoimmune Related Diseases Association.” [Online]. Available: http://www.aarda.org/autoimmune-information/autoimmune-statistics/

[2] D. Solomon, A. Kavanaugh, and P. Schur, “Evidence-based guidelines for the use of immunologic tests: Antinuclear antibody testing,” Arthritis Care & Research, vol. 47, no. 4, pp. 434–444, 2002.

[3] U. Sack, S. Knoechner, H. Warschkau, U. Pigla, F. Emmrich, and M. Kamprad, “Computer-assisted classification of HEp-2 immunofluorescence patterns in autoimmune diagnostics,” Autoimmunity Reviews, vol. 2, no. 5, pp. 298–304, 2003.

[4] P. Carter, “Texture discrimination using wavelets,” in Proceedings of SPIE, vol. 1567, 1991, p. 432.

[5] H. Qureshi, O. Sertel, N. Rajpoot, R. Wilson, and M. Gurcan, “Adaptive discriminant wavelet packet transform and local binary patterns for meningioma subtype classification,” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2008, pp. 196–204, 2008.

[6] P. Perner, H. Perner, and B. Muller, “Mining knowledge for HEp-2 cell image classification,” Artificial Intelligence in Medicine, vol. 26, no. 1-2, pp. 161–173, 2002.

[7] P. Foggia, G. Percannella, P. Soda, and M. Vento, “Benchmarking HEp-2 cells classification methods,” 2013.

[8] P. Strandmark, J. Ulen, and F. Kahl, “HEp-2 staining pattern classification,” in Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, pp. 33–36.

[9] I. Theodorakopoulos, D. Kastaniotis, G. Economou, and S. Fotopoulos, “HEp-2 cells classification via fusion of morphological and textural features,” in Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on. IEEE, 2012, pp. 689–694.

[10] S. Ghosh and V. Chaudhary, “Feature analysis for automatic classification of HEp-2 florescence patterns: Computer-aided diagnosis of auto-immune diseases,” in Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, pp. 174–177.

[11] I. Ersoy, F. Bunyak, J. Peng, and K. Palaniappan, “HEp-2 cell classification in IIF images using shareboost,” in Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, pp. 3362–3365.

[12] P. Soda, A. Rigon, A. Afeltra et al., “Automatic acquisition of immunofluorescence images: Algorithms and evaluation,” in Computer-Based Medical Systems, 2006. CBMS 2006. 19th IEEE International Symposium on. IEEE, 2006, pp. 386–390.

[13] R. Haralick, K. Shanmugam, and I. Dinstein, “Textural features for image classification,” Systems, Man and Cybernetics, IEEE Transactions on, vol. 3, no. 6, pp. 610–621, 1973.

[14] L. Davis, S. Johns, and J. Aggarwal, “Texture analysis using generalized co-occurrence matrices,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, no. 3, pp. 251–259, 1979.

[15] H. Chernoff, “The use of faces to represent points in k-dimensional space graphically,” Journal of the American Statistical Association, pp. 361–368, 1973.

[16] A. Mojsilovic, M. Popovic, and D. Rackov, “On the selection of an optimal wavelet basis for texture characterization,” Image Processing, IEEE Transactions on, vol. 9, no. 12, pp. 2043–2050, 2000.

2014 IEEE Region 10 Symposium

978-1-4799-2027-3/14/$31.00 ©2014 IEEE 150