Unsupervised Mammograms Segmentation - CASlibrary.utia.cas.cz/separaty/2008/RO/haindl... · The...

4
Unsupervised Mammograms Segmentation Michal Haindl Stanislav Mikeˇ s Institute of Information Theory and Automation of the ASCR, 182 08 Prague, Czech Republic {haindl,xaos}@utia.cz Abstract We present a multiscale unsupervised segmenter for automatic detection of potentially cancerous regions of interest containing fibroglandular tissue in digital screening mammography. The mammogram tissue tex- tures are locally represented by four causal multispec- tral random field models recursively evaluated for each pixel and several scales. The segmentation part of the algorithm is based on the underlying Gaussian mixture model and starts with an over segmented initial estima- tion which is adaptively modified until the optimal num- ber of homogeneous mammogram segments is reached. The performance of the presented method is verified on the Digital Database for Screening Mammography (DDSM) from the University of South Florida as well as extensively tested on the Prague Texture Segmenta- tion Benchmark and compares favourably with several alternative unsupervised texture segmentation methods. 1. Introduction Breast cancer is the leading cause of death [17, 15] among all cancers for middle-aged women in most de- veloped countries. Thus a significant effort is cur- rently focused on cancer prevention and early detection which can substantially reduce the mortality rate. X-ray screening mammography is the most frequented method for breast cancer early detection although not without problems [15] such as rather large minimum detectable tumor size, higher mammogram sensitivity for older women or radiation exposition. Automatic mammo- gram analysis is still difficult task due to wide variation of breast anatomy, nevertheless a computer-aided diag- nosis system can successfully assist a radiologist, and can be used as a second opinion. The first step in a such system is detection of suspicious potentially cancerous regions of interest . Several approaches to detect these regions of interest (ROI) were published [17], mostly based on supervised learning. We propose an unsuper- vised segmentation method for fast automatic mammo- gram segmentation into the regions of interest (ROI) us- ing a statistical random field based texture representa- tion. The presented method detects the fibroglandular tissue regions from either craniocaudal (CC) or medi- olateral oblique (MLO) views and thus can help focus a radiologist to this most important breast region. Spa- tial interaction models and especially Markov random fields-based models are increasingly popular for texture representation [10, 16, 5], etc. Several researchers dealt with the difficult problem of unsupervised segmentation using these models see for example [12, 14, 1], or [7], which is also addressed in this paper. 2. Breast Detector The method starts with automatic breast area detec- tion because it can be easily computed and simplifies the subsequent fibroglandular tissue region detection. This is performed using simple histogram thresholding with an automatically selected threshold. In this step the method also recognizes several label areas on a mam- mogram. We compute their areas and all but the largest one are discarded and merged with the background. In this stage the algorithm also decides the breast orienta- tion (left or right) on the mammogram. Fig. 1 -breast mask illustrates the resulting detected breast area (in in- verted grey levels). The following detection of regions of interest is performed only in the breast region ignor- ing the background area set in the mask template. 3. Breast Tissue Texture Model Our method segments pseudo-colour multiresolution mammograms each created from the original greyscale mammogram and its two nonlinear gamma transforma- tions. We assume to down-sample input image Y into M =3 different resolutions Y (m) =ιm Y with sam- pling factors ι m m =1,...,M identical for both 978-1-4244-2175-6/08/$25.00 ©2008 IEEE

Transcript of Unsupervised Mammograms Segmentation - CASlibrary.utia.cas.cz/separaty/2008/RO/haindl... · The...

Page 1: Unsupervised Mammograms Segmentation - CASlibrary.utia.cas.cz/separaty/2008/RO/haindl... · The mammogram tissue tex- ... Rough scale pixels parameters are simply mapped to the corresponding

Unsupervised Mammograms Segmentation

Michal Haindl Stanislav MikesInstitute of Information Theory and Automation

of the ASCR, 182 08 Prague, Czech Republichaindl,[email protected]

Abstract

We present a multiscale unsupervised segmenter forautomatic detection of potentially cancerous regionsof interest containing fibroglandular tissue in digitalscreening mammography. The mammogram tissue tex-tures are locally represented by four causal multispec-tral random field models recursively evaluated for eachpixel and several scales. The segmentation part of thealgorithm is based on the underlying Gaussian mixturemodel and starts with an over segmented initial estima-tion which is adaptively modified until the optimal num-ber of homogeneous mammogram segments is reached.The performance of the presented method is verifiedon the Digital Database for Screening Mammography(DDSM) from the University of South Florida as wellas extensively tested on the Prague Texture Segmenta-tion Benchmark and compares favourably with severalalternative unsupervised texture segmentation methods.

1. Introduction

Breast cancer is the leading cause of death [17, 15]among all cancers for middle-aged women in most de-veloped countries. Thus a significant effort is cur-rently focused on cancer prevention and early detectionwhich can substantially reduce the mortality rate. X-rayscreening mammography is the most frequented methodfor breast cancer early detection although not withoutproblems [15] such as rather large minimum detectabletumor size, higher mammogram sensitivity for olderwomen or radiation exposition. Automatic mammo-gram analysis is still difficult task due to wide variationof breast anatomy, nevertheless a computer-aided diag-nosis system can successfully assist a radiologist, andcan be used as a second opinion. The first step in a suchsystem is detection of suspicious potentially cancerousregions of interest . Several approaches to detect theseregions of interest (ROI) were published [17], mostly

based on supervised learning. We propose an unsuper-vised segmentation method for fast automatic mammo-gram segmentation into the regions of interest (ROI) us-ing a statistical random field based texture representa-tion. The presented method detects the fibroglandulartissue regions from either craniocaudal (CC) or medi-olateral oblique (MLO) views and thus can help focusa radiologist to this most important breast region. Spa-tial interaction models and especially Markov randomfields-based models are increasingly popular for texturerepresentation [10, 16, 5], etc. Several researchers dealtwith the difficult problem of unsupervised segmentationusing these models see for example [12, 14, 1], or [7],which is also addressed in this paper.

2. Breast Detector

The method starts with automatic breast area detec-tion because it can be easily computed and simplifiesthe subsequent fibroglandular tissue region detection.This is performed using simple histogram thresholdingwith an automatically selected threshold. In this step themethod also recognizes several label areas on a mam-mogram. We compute their areas and all but the largestone are discarded and merged with the background. Inthis stage the algorithm also decides the breast orienta-tion (left or right) on the mammogram. Fig. 1 -breastmask illustrates the resulting detected breast area (in in-verted grey levels). The following detection of regionsof interest is performed only in the breast region ignor-ing the background area set in the mask template.

3. Breast Tissue Texture Model

Our method segments pseudo-colour multiresolutionmammograms each created from the original greyscalemammogram and its two nonlinear gamma transforma-tions. We assume to down-sample input image Y intoM = 3 different resolutions Y (m) =↓ιm Y with sam-pling factors ιm m = 1, . . . ,M identical for both

978-1-4244-2175-6/08/$25.00 ©2008 IEEE

Page 2: Unsupervised Mammograms Segmentation - CASlibrary.utia.cas.cz/separaty/2008/RO/haindl... · The mammogram tissue tex- ... Rough scale pixels parameters are simply mapped to the corresponding

B-3056-1 right CC breast mask

segmentation regions borders

Figure 1. Normal right breast mammo-gram (patient age 58, but with a cancer-ous lesion in the left breast), the detectedbreast area, segmentation result and de-tected regions of interest, respectively.

directions and Y (1) = Y . Local texture for each pixelY

(m)r is represented using the 3D CAR model param-

eter space Θ(m)r . The concept of decision fusion [11]

for high-performance pattern recognition is well knownand widely accepted in the area of supervised classi-fication where (often very diverse) classification tech-nologies, each providing complementary sources of in-formation about class membership, can be integratedto provide more accurate, robust and reliable classifi-cation decisions than the single classifier applications.The proposed method circumvents the problem of mul-tiple unsupervised segmenters combination [6] by fus-ing multiple-processed measurements into a single seg-menter feature vector.

Smooth pseudo-colour mammogram textures requirethree dimensional models for adequate representation.We assume that single multi spectral texture can be lo-cally modeled using a 3D simultaneous causal autore-gressive random field model (CAR). This model can

be expressed as a stationary causal uncorrelated noisedriven 3D autoregressive process [8]:

Yr = γXr + er , (1)

where γ = [A1, . . . , Aη] is the 3 × 3η parame-ter matrix, er is a white Gaussian noise vector withzero mean and a constant but unknown variance, Xr

is a corresponding vector of the contextual neighboursYr−s and r, r − 1, . . . is a chosen direction of move-ment on the image index lattice I . η = card(Icr)where Icr is a causal neighborhood index set (e.g.Icr = (r1, r2 − 1), (r1 − 1, r2)). The optimal neigh-bourhood (Icr ) as well as the Bayesian parameters esti-mation of a CAR model can be found analytically un-der few additional and acceptable assumptions using theBayesian approach [8]. The recursive Bayesian param-eter estimation of the CAR model is [8]:

γTr−1 = γTr−2 +V −1x(r−2)Xr−1(Yr−1 − γr−2Xr−1)T

(1 +XTr−1V

−1x(r−2)Xr−1)

,

where Vx(r−1) =∑r−1k=1XkX

Tk + Vx(0). Each matrix

contains local estimations of the CAR model param-eters. These models have identical contextual neigh-bourhood Icr but they differ in their major movementdirection (top-down, bottom-up, rightward, leftward).The local texture for each pixel and M resolutionsα1, . . . , αM is represented by four parametric matricest, b, r, l e.g. γ

i,αjr for i ∈ t, b, r, l, j = 1, . . . ,M

which are subsequently compressed using the localPCA (for computational efficiency) into γi,αjr . Singleresolution compressed parameters are composed intoM parametric matrices:

γαj Tr = γt,αjr , γb,αjr , γr,αjr , γl,αjr T j = 1, . . . ,M .

The parametric space γαj is subsequently smooth out,rearranged into a vector and its dimensionality is re-duced using the PCA feature extraction (γαj ). Finallywe add the average local spectral values ζ

αjr to the

resulting feature vector:

Θr = [γα1r , ζα1

r , . . . , γαMr , ζαMr ]T . (2)

Rough scale pixels parameters are simply mapped to thecorresponding fine scale locations.

4. Texture Parametric Space Segmentation

Multi-spectral, multiresolution texture segmentationis done by clustering in the combined CAR models pa-rameter space Θ defined on the lattice I where Θr is

Page 3: Unsupervised Mammograms Segmentation - CASlibrary.utia.cas.cz/separaty/2008/RO/haindl... · The mammogram tissue tex- ... Rough scale pixels parameters are simply mapped to the corresponding

the modified parameter vector (2) computed for the lat-tice location r. We assume that this parametric spacecan be represented using the Gaussian mixture model(GM) with diagonal covariance matrices due to the pre-vious CAR parametric space decorrelation. The Gaus-sian mixture model for CAR parametric representationis as follows:

p(Θr) =K∑i=1

pi p(Θr | νi,Σi) , (3)

p(Θr | νi,Σi) =|Σi|−

12

(2π)d2e−

(Θr−νi)TΣ−1

i(Θr−νi)

2 .(4)

The mixture model equations (3),(4) are solved usinga modified EM algorithm. The algorithm is initializedusing νi,Σi statistics estimated from the correspond-ing regions obtained by regular division of the inputdetected breast area. An alternative initialization canbe random choice of these statistics. For each possiblecouple of regions the Kullback Leibler divergence

D (p(Θr | νi,Σi) || p(Θr | νj ,Σj)) =∫Ω

p(Θr | νi,Σi) log(p(Θr | νi,Σi)p(Θr | νj ,Σj)

)dΘr (5)

is evaluated and the most similar regions, i.e.,

i, j = arg mink,l

D (p(Θr | νl,Σl) || p(Θr | νk,Σk))

are merged together in each step. This initializationresults in Kini subimages and recomputed statisticsνi,Σi . Kini > K where K is the optimal numberof textured segments to be found by the algorithm. Twosteps of the EM algorithm are repeating after the initial-ization. The components with smaller weights than afixed threshold (pj < 0.01

Kini) are eliminated. For every

pair of components we estimate their Kullback Leiblerdivergence (5). From the most similar couple, the com-ponent with the weight smaller than the threshold ismerged to its stronger partner and all statistics are ac-tualized using the EM algorithm. The algorithm stopswhen either the likelihood function has negligible in-crease (Lt − Lt−1 < 0.01) or the maximum iterationnumber threshold is reached.

The parametric vectors representing texture mosaicpixels are assigned to the clusters according to the high-est component probabilities, i.e., Yr is assigned to thecluster ωj∗ if

πr,j∗ = maxj∑s∈Ir

ws p(Θr−s | νj ,Σj) ,

where ws are fixed distance-based weights, Ir is arectangular neighbourhood and πr,j∗ > πthre (other-wise the pixel is unclassified). The area of single clus-ter blobs is evaluated in the post-processing thematic

B-3056-1 left MLO segmentation ground truth

C-0016-1 right CC segmentation ground truth

Figure 2. Cancerous mammograms (pa-tients age 58 (top) and 80 (bottom)), ra-diologist associated ground truth and de-tected regions of interest using the multi-ple segmenter approach, respectively.

map filtration step. Regions with similar statistics aremerged. Thematic map blobs with area smaller than agiven threshold are attached to its neighbour with thehighest similarity value. Finally, regions which havegrey level mean value difference from the median meanvalue (over the same type of digitized mammograms)of cancerous ground truth regions larger than a speci-fied threshold are eliminated.

5. Experimental Results

The algorithm was tested on mammograms fromthe Digital Database for Screening Mammography(DDSM) from the University of South Florida [9]. Thisdatabase contains 2620 four view (left and right cran-iocaudal (CC) and mediolateral oblique (MLO)) mam-mograms in different resolutions. Single mammogramscases are divided into normal, benign, benign withoutcallback volumes and cancer. All our experiments aredone with three resolutions (M = 3) using samplingfactors ι1 = 2, ι2 = 4, ι3 = 8 and the causal neigh-bourhood with fourteen neighbours (η = 14). Fig.2-top show left MLO mammogram of a patient age58 with detected malignant asymmetric lesion and the

Page 4: Unsupervised Mammograms Segmentation - CASlibrary.utia.cas.cz/separaty/2008/RO/haindl... · The mammogram tissue tex- ... Rough scale pixels parameters are simply mapped to the corresponding

right CC mammogram (Fig. 2-bottom) of a patient age80 with detected irregular, spiculated malignant lesiontype. The segmenter correctly found the region of inter-est with the cancer lesion on both mammograms. Thedetected region of interest results Figs. 1-2 demon-strate very good region segmentation and low over-segmentation properties of our method. The generalsegmentation part of our method (without mammogra-phy specific steps) was also successfully numericallycompared [6] with several alternative algorithms JSEG[4], Blobworld [2], GMRF-GM [7] and Edison [3].These algorithms on the Prague Texture Benchmark[7, 13] performed steadily worse as can be seen in the[6] or on the benchmark web (http://mosaic.utia.cas.cz).For all the 27 benchmark criteria our method is eitherthe best one or the next best with marginal differencefrom the best one. Resulting ROI segmentation re-sults are promising however comparison with other al-gorithms is difficult because of lack of sound experi-mental evaluation results in the field of screening mam-mography segmentation.

6. Conclusions

We proposed the efficient method for completely au-tomatic unsupervised detection of mammogram fibrog-landular tissue regions of interest. This method is basedon the underlying 3D CAR and GM texture models. Al-though our algorithm uses Markovian models it is fastdue to robust recursive models parameter estimationand therefore it is much faster than usual Markovian ap-proaches which often require time consuming iterativeMarkov chain Monte Carlo methods to estimate theirparameters. Usual drawback of segmentation methodsare many application dependent parameters to be ex-perimentally estimated. Our method requires only acontextual neighbourhood selection and two additionalthresholds. The algorithm’s performance is favourablydemonstrated on the extensive benchmark tests on largescreening mammography database and also natural tex-ture mosaics Prague benchmark ([6, 13]), where it out-performs several alternative segmenters.

Acknowledgements

This research was supported by the EC projectno. FP6-507752 MUSCLE and partially by grantsNo.A2075302, 1ET400750407, 1M0572 and 2C06019.

References

[1] P. Andrey and P. Tarroux. Unsupervised segmenta-tion of markov random field modeled textured images

using selectionist relaxation. IEEE Trans. on PAMI,20(3):252–262, March 1998.

[2] C. Carson, M. Thomas, S. Belongie, J. M. Hellerstein,and J. Malik. Blobworld: A system for region-based im-age indexing and retrieval. In Third International Con-ference on Visual Information Systems. Springer, 1999.

[3] C. Christoudias, B. Georgescu, and P. Meer. Syner-gism in low level vision. In R. Kasturi, D. Laurendeau,and C. Suen, editors, Proceedings of the 16th Int. Conf.on Pattern Recognition, volume 4, pages 150–155, LosAlamitos, August 2002. IEEE Computer Society.

[4] Y. Deng and B. Manjunath. Unsupervised segmenta-tion of color-texture regions in images and video. IEEETrans. on PAMI, 23(8):800–810, August 2001.

[5] M. Haindl. Texture synthesis. CWI Quarterly,4(4):305–331, December 1991.

[6] M. Haindl and S. Mikes. Unsupervised texture segmen-tation using multiple segmenters strategy. In M. Haindl,J. Kittler, and F. Roli, editors, MCS 2007, volume 4472of LNCS, pages 210–219. Springer, 2007.

[7] M. Haindl and S. Mikes. Model-based texture segmen-tation. Lecture Notes in Computer Science, (3212):306– 313, 2004.

[8] M. Haindl and S. Simberova. Theory & Applicationsof Image Analysis, chapter A Multispectral Image LineReconstruction Method, pages 306–315. World Scien-tific Publishing Co., Singapore, 1992.

[9] M. Heath, K. Bowyer, D. Kopans, R. Moore, andP. Kegelmeyer. The digital database for screening mam-mography. In Proc. of the 5th Int. Workshop on DigitalMammography. Medical Physics Publishing, 2000.

[10] R. Kashyap. Image models. In K. F. T.Y. Young, editor,Handbook of Pattern Recognition and Image Process-ing. Academic Press, New York, 1986.

[11] J. Kittler, A. Hojjatoleslami, and T. Windeatt. Weight-ing factors in multiple expert fusion. In Proc. BMVC,pages 41–50. BMVA, BMVA, 1997.

[12] B. Manjunath and R. Chellapa. Unsupervised texturesegmentation using markov random field models. IEEETrans. on PAMI, 13:478–482, 1991.

[13] S. Mikes and M. Haindl. Prague texture segmentationdata generator and benchmark. ERCIM News, (64):67–68, 2006.

[14] D. Panjwani and G. Healey. Markov random field mod-els for unsupervised segmentation of textured color im-ages. IEEE Transactions on Pattern Analysis and Ma-chine Intelligence, 17(10):939–954, 1995.

[15] H. Qi and N. A. Diakides. Thermal infrared imagingin early breast cancer detection - a survey of recent re-search. In 25th Annual Int. Conference of the IEEEEMBS, pages 448–452, 2003.

[16] T. R. Reed and J. M. H. du Buf. A review of re-cent texture segmentation and feature extraction tech-niques. CVGIP–Image Understanding, 57(3):359–372,May 1993.

[17] T. Tweed and S. Miguet. Automatic detection of re-gions of interest in mammographies based on a com-bined analysis of texture and histogram. In ICPR, vol-ume 02, pages 448–452, Los Alamitos, CA, USA, 2002.IEEE Computer Society.