[IEEE 2010 17th Iranian Conference Of Biomedical Engineering (ICBME) - Isfahan, Iran...


Page 1: [IEEE 2010 17th Iranian Conference Of Biomedical Engineering (ICBME) - Isfahan, Iran (2010.11.3-2010.11.4)] 2010 17th Iranian Conference of Biomedical Engineering (ICBME) - Comparison

Comparison Evaluation of Three Brain MRI Segmentation Methods in Software Tools

Keyvan Kasiri1, Mohammad Javad Dehghani1, Kamran Kazemi1,2, Mohammad Sadegh Helfroush1, Shaghayegh Kafshgari1

1 Dept. of Electrical and Electronics Engineering, Shiraz University of Technology, Shiraz, Iran 2 GRAMFC, Faculty of Medicine, University of Picardie Jules Verne, Amiens, France

{k.kasiri, dehghani, kazemi, ms_helfroush, sh.kafshgari}@sutech.ac.ir

Abstract— Accuracy of automated segmentation methods is of great importance in brain image analysis. In this paper, a comparison evaluation of three common software packages in brain image segmentation is presented. Performance analysis of segmentation methods integrated with the latest versions of SPM, FSL and BrainSuite is considered. In this comparison study, quantitative and qualitative assessment of segmentation algorithms is performed for both simulated and real data. Results obtained in this paper can be utilized to assist the users for selecting the appropriate software for the application of brain tissue segmentation.

Keywords-accuracy, brain, evaluation, magnetic resonance imaging (MRI), segmentation

I. INTRODUCTION

Magnetic resonance imaging (MRI) is a noninvasive imaging technique that is often used for anatomical assessment of human brain structures. Owing to its outstanding soft-tissue contrast, high resolution and noninvasive nature, MRI plays an important role in the detection of neurodegenerative diseases. In brain MRI, segmentation of brain tissues is an important first step for numerous applications. Quantitative and qualitative studies of anatomical brain tissues and structures that have distinctive structural or functional properties usually rely on accurate segmentation of the brain [1,2]. Manual segmentation of brain tissues is reliable, but difficult and time-consuming. Moreover, it is subject to large intra- and inter-observer variability, which reduces the credibility of the segmentation analysis. Therefore, there is strong demand for reliable, reproducible, accurate and robust automated segmentation of brain MR images as a prerequisite for comprehensive brain analysis.

Several approaches have been proposed in the literature for automatic segmentation of cerebral MR images into different tissues [1-4]. As stated in [4], segmentation methods can be grouped into classification-based, region-based, contour-based, atlas-based and learning-based categories. Thresholding, statistical classification methods such as Markov random field (MRF) and expectation maximization (EM), and clustering methods such as fuzzy c-means (FCM) are considered classification-based methods. Region-based methods include region growing, split-and-merge and watershed. Edge-based methods and methods based on deformable templates or active contours belong to the contour-based category.

At present, a few software packages are widely used in neuroimaging analysis. These packages usually contain a set of skull stripping, bias field correction and automated segmentation routines. Three of them are examined in this paper: the SPM8 toolbox [5], written and developed at the Wellcome Trust Centre for Neuroimaging, University College London, UK; the FSL version 4.1 software package [6], written by the Analysis Group, FMRIB, Oxford, UK; and BrainSuite version 9.01 [7], written by Dr. David W. Shattuck. This study provides a quantitative and qualitative assessment of the segmentation algorithms in these packages. Different investigations on the evaluation of brain segmentation techniques have been presented previously. In this study, we employ the latest versions of the most widely used packages for segmenting the brain into white matter (WM), grey matter (GM) and cerebrospinal fluid (CSF), and perform a comprehensive evaluation on both simulated and real MR data with commonly used evaluation metrics.

The rest of this paper is organized as follows. In the following section, materials and methods including simulated and real MR data, segmentation techniques, and evaluation measurements are presented. Section III provides detailed simulation results and discussion. Finally, we will conclude the paper in section IV.

II. MATERIALS AND METHODS

A. Data

In order to evaluate the segmentation methods integrated in the software packages presented above, they are tested on both simulated and real MR images.

1) BrainWeb Simulated Datasets

The simulated 3D MR images (181 × 217 × 181 voxels of 1 mm³ isotropic resolution) are provided by the BrainWeb Simulated Brain Database from McGill University. Realistic simulations of MRI acquisition with different noise and intensity non-uniformity levels are available in this database. The simulated MR data are generated from an anatomical model of a normal brain. In addition, the database provides a ground truth volume of the classification results.

Proceedings of the 17th Iranian Conference of Biomedical Engineering (ICBME2010), 3-4 November 2010
978-1-4244-7484-4/10/$26.00© 2010 IEEE

2) IBSR Real Datasets

The second dataset used in this evaluation study consists of 3D real MRI brain data from 20 normal subjects provided by the Center for Morphometric Analysis at Massachusetts General Hospital. The real MR brain images and their manual segmentations by experts are available in these datasets. The 20 datasets involve different levels of difficulty, such as low contrast scans, relatively smaller brain volumes and considerable intensity non-uniformity. This makes it possible to evaluate the segmentation methods under varying conditions of contrast, signal-to-noise ratio and contrast-to-noise ratio, shape complexity, brain volume and intensity non-uniformity.

B. Segmentation Methods

1) SPM8

Statistical Parametric Mapping (SPM) is a package that works as a suite of Matlab functions and subroutines implementing statistical methods for the analysis of brain imaging data. The segmentation process implemented in SPM is based on an integrated model in which tissue segmentation, intensity and spatial normalization, and bias correction are all performed within the same mixture-of-Gaussians model. SPM employs the Expectation-Maximization (EM) algorithm to obtain the optimum parameters of the tissue probabilistic models. Initial creation of tissue probability maps is performed prior to alignment of brain voxels with ICBM/MNI space. Registration of the input image with the tissue probability maps is also required to provide the prior probability of each tissue class for tissue classification. Bayes' rule is employed to produce the posterior probability of each tissue class. In SPM, classification is probabilistic in the sense that each voxel is assigned a probability of belonging to each class. In previous versions of SPM, the whole procedure of registration and segmentation was performed in a circular way. In SPM version 8, the entire procedure is converted into a single unified generative model. The recent version of SPM also takes advantage of improved models and a more robust initial affine transformation for registration, an extended set of tissue probability maps, a different treatment of the mixing proportions and the capability of working with multi-spectral data.

In this study, all of the segmentations by SPM8 are performed using the default templates (a modified version of the ICBM Tissue Probabilistic Atlas, available at http://www.loni.ucla.edu/ICBM/ICBM_Probabilistic.html) and the default parameters for this version. SPM8 segmentation results are probability maps with voxel values between 0 and 255. When generating binary images, each voxel is assigned to the tissue class that has the largest probability at that voxel. Making the class membership decision in this way prevents voxels in the border region between two classes from being classified as members of both.
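The maximum-probability rule described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study: the function name `hard_labels`, the flattened-list representation of the maps and the background cut-off `min_prob` are all assumptions introduced here.

```python
from typing import Dict, List, Sequence


def hard_labels(prob_maps: Dict[str, Sequence[int]],
                min_prob: int = 1) -> Dict[str, List[int]]:
    """Convert per-tissue probability maps (0..255) into exclusive binary masks.

    prob_maps maps a tissue name to a flattened list of voxel probabilities.
    min_prob is a hypothetical background cut-off, not a setting from the paper.
    """
    tissues = list(prob_maps)
    n_voxels = len(prob_maps[tissues[0]])
    if any(len(prob_maps[t]) != n_voxels for t in tissues):
        raise ValueError("all probability maps must have the same length")

    masks = {t: [0] * n_voxels for t in tissues}
    for i in range(n_voxels):
        # Pick the tissue with the largest probability at this voxel.
        # On an exact tie, max() keeps the first tissue in dict order.
        best = max(tissues, key=lambda t: prob_maps[t][i])
        if prob_maps[best][i] >= min_prob:
            masks[best][i] = 1
    return masks


# Toy example with four voxels (values are illustrative, not real data).
maps = {
    "GM": [200, 100, 0, 128],
    "WM": [55, 155, 0, 127],
    "CSF": [0, 0, 0, 0],
}
masks = hard_labels(maps)
```

Because each voxel is given to at most one class, the resulting masks never overlap, which is the property the binarization step relies on.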

2) FSL

FMRIB's Software Library (FSL), an integrated software package made by the FMRIB Analysis Group, is one of the most widely used libraries for neuroimage analysis [6]. The package is composed of different modules for functional, structural and diffusion MRI, such as BET for brain extraction, FAST for tissue segmentation and FLIRT for linear registration. FAST, FMRIB's Automated Segmentation Tool, is a module for segmenting the brain into different tissue types while correcting the bias field. In addition, with FAST's advanced options, the input image is first registered to standard space and standard tissue probability maps (from the MNI152 dataset) are then used to estimate the initial parameters of the tissue classes. The segmentation routine used in this tool is based on a Hidden Markov Random Field (HMRF) model optimized by the Expectation-Maximization algorithm [8]. The fully automated segmentation process produces a bias field-corrected version of the input image and a probabilistic and/or partial volume tissue segmentation.

In this paper, FSL version 4.1 is employed for the whole process of registration, skull stripping and CSF extraction. In the first step, the brain is extracted using FSL's Brain Extraction Tool (BET) [9], which automatically eliminates all non-brain tissue. Then the FAST tool, using probability maps with its default settings, is used to segment the brain into the three tissue classes of WM, GM and CSF while performing bias correction.

3) BrainSuite

BrainSuite [7] is the updated version of BSE (Brain Surface Extraction software) and is specifically designed for cortical surface extraction. BrainSuite is an integrated package that can be used for soft tissue, skull and scalp segmentation and for surface analysis and visualization. The BrainSuite package provides a multi-stage, user-friendly sequence of brain surface extraction using the BSE technique [10], bias field correction, voxel classification, cerebrum labeling and surface generation.

In this paper, most of the parameters in skull stripping and bias correction are left at their defaults. For tissue classification, BrainSuite provides a partial volume tissue classification by default. In this paper, the three-class option is selected to segment the brain into WM, GM and CSF.

C. Evaluation Metrics

For the quantitative evaluation of the segmentation results, some of the common measures are chosen to facilitate direct comparisons of segmentation accuracy. The algorithms are assessed by comparing the automatically obtained results to manual segmentation for each of the three tissues of CSF, GM and WM. The manually labeled images of each tissue are used as the reference and the results of segmentation approach are converted into binary images of the same voxel resolution and image dimensions as the reference image. A number of commonly used measures in evaluation of segmentation are computed for each segmentation result. In the following description of these measures, the segmentation result is termed as S and the manual standard (reference image) R.

1) Spatial Overlaps:

The Dice coefficient (or similarity index) D and the Jaccard coefficient J are two measures that represent the spatial overlap between two binary images. Both are commonly used; as ratios they range between 0 (no overlap) and 1 (perfect agreement), and they are expressed here as percentages. These metrics are defined as follows:

$$D = \frac{2\,|R \cap S|}{|R| + |S|} \times 100, \qquad J = \frac{|R \cap S|}{|R \cup S|} \times 100 \tag{1}$$
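As a consistency check that follows directly from the standard definitions of the two coefficients (it is not stated in the paper), the Dice coefficient D and the Jaccard coefficient J computed on the same pair of images are tied together. Writing both as ratios in [0, 1], with a = |R ∩ S| > 0 as shorthand introduced here:

```latex
\begin{aligned}
D &= \frac{2a}{|R| + |S|}
  \;\Longrightarrow\; |R| + |S| = \frac{2a}{D}, \\
J &= \frac{a}{|R \cup S|}
   = \frac{a}{|R| + |S| - a}
   = \frac{a}{\frac{2a}{D} - a}
   = \frac{D}{2 - D}.
\end{aligned}
```

For example, D = 0.9339 gives J = 0.9339 / 1.0661 ≈ 0.876, in line with the SPM8 CSF entries of Table I (93.39% and 87.6%). The relation is exact for a single volume; for mean values averaged over several volumes it holds only approximately.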

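2) Sensitivity and specificity

Sensitivity is the true positive fraction (TPF) and specificity is the true negative fraction (TNF). With TP, FP, TN and FN denoting the numbers of true positive, false positive, true negative and false negative voxels of S with respect to R, they are defined as follows:

$$TPF = \frac{TP}{TP + FN}, \qquad TNF = \frac{TN}{TN + FP} \tag{2}$$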
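The four measures can be computed from a pair of binary images by counting voxels, as in the minimal sketch below. It is an illustration under stated assumptions rather than the evaluation code used in the study: the function name `overlap_metrics`, the flattened 0/1 list representation, and the choice to scale TPF and TNF to percent (to match how the tables report them) are all introduced here.

```python
from typing import Dict, Sequence


def overlap_metrics(seg: Sequence[int], ref: Sequence[int]) -> Dict[str, float]:
    """Return Dice, Jaccard, TPF and TNF (all in percent) for two binary masks.

    seg is the segmentation result S and ref the reference R, both given as
    flattened sequences of 0/1 values of equal length.
    """
    if len(seg) != len(ref):
        raise ValueError("seg and ref must have the same number of voxels")

    tp = sum(1 for s, r in zip(seg, ref) if s and r)          # in S and in R
    fp = sum(1 for s, r in zip(seg, ref) if s and not r)      # in S only
    fn = sum(1 for s, r in zip(seg, ref) if not s and r)      # in R only
    tn = sum(1 for s, r in zip(seg, ref) if not s and not r)  # in neither

    def pct(num: int, den: int) -> float:
        # An empty denominator makes the measure undefined; report NaN.
        return 100.0 * num / den if den else float("nan")

    return {
        "dice": pct(2 * tp, 2 * tp + fp + fn),  # |S| + |R| = 2TP + FP + FN
        "jaccard": pct(tp, tp + fp + fn),       # |S ∪ R| = TP + FP + FN
        "tpf": pct(tp, tp + fn),                # sensitivity
        "tnf": pct(tn, tn + fp),                # specificity
    }


# Toy example with eight voxels (illustrative values, not real data).
seg = [1, 1, 0, 0, 1, 0, 0, 0]
ref = [1, 0, 0, 0, 1, 1, 0, 0]
m = overlap_metrics(seg, ref)
```

In this toy case TP = 2, FP = 1, FN = 1 and TN = 4, so the Jaccard coefficient is 50% and the TNF is 80%. Since background voxels usually dominate a brain volume, TN is large and TNF tends to stay close to 100% even when the overlap measures differ considerably.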

III. RESULTS AND DISCUSSION

We evaluated the three segmentation techniques both quantitatively and qualitatively on the synthetic BrainWeb and real IBSR MRI data.

As mentioned before, in FSL software package, skull stripping is performed using BET algorithm. Also, tissue classification is carried out while automatically correcting the bias field. In BrainSuite, one can use the benefit of skull stripping using BSE, bias field correction and partial volume tissue segmentation in a row.

In these experiments, we try to reduce the undesirable effects of errors introduced by the skull-stripping modules BET and BSE. Hence, the skull-stripping parameters in each software package are selected such that the resulting outputs mainly reflect the segmentation performance. A single set of parameters is used for all inputs so as not to lose the generality of the whole procedure.

A. Simulated Data Tests

In this experiment, T1-weighted volumes with different levels of noise and intensity non-uniformity are used. Noise and non-uniformity levels of 0%-0%, 1%-0%, 3%-0%, 3%-20% and 3%-40% are examined. Table 1 reports the mean and standard deviation (STD) of the metrics over these tests.

In the FSL package, the BET procedure is run with the default settings. In BrainSuite, skull stripping is performed with an erosion size of 2 and an edge constant of 0.62, and the other parameters are left at their default settings. In addition, the options to remove the brainstem and dilate the final mask are enabled in skull stripping.

As shown in Table 1, SPM8 shows a significant superiority over the other two methods in CSF extraction. This can be attributed to the ability of SPM to remove the skull more accurately than the other two methods. Apart from its weakness in skull stripping compared to SPM8, BrainSuite performs more accurately in GM and WM segmentation than the other two. It can also be inferred that performing a unified procedure for skull stripping and tissue segmentation in SPM8 leads to better performance. Errors in skull stripping using BET and BSE in FSL and BrainSuite may have undesirable effects on the classification routine.

As two visual examples, the 92nd sagittal and 74th axial slices of the segmentation results for the T1-weighted images with noise and non-uniformity levels of 1%-0% and 3%-40% are shown in Fig. 1. These samples are selected to allow a qualitative comparison between the methods. As observed in this figure, the BSE method integrated in BrainSuite, with the predetermined settings, cannot perform an accurate skull stripping. In other words, a single set of parameters does not work satisfactorily for all inputs. Therefore, on average, there is a significant error in labeling voxels as CSF. This is evident in the difference in skull stripping between the two tests shown in Fig. 1.

Considering the regions surrounded by the red and yellow circles, the accuracy of each method in separating GM from CSF in the 92nd sagittal slice and from WM in the 74th axial slice is highlighted. In this case, BrainSuite performs a more accurate segmentation of WM and GM than the FSL and SPM techniques, and FSL presents the weakest results in distinguishing between WM and GM. The results in Table 1 also confirm that the FSL-FAST segmentation technique, specifically in segmenting WM and GM, is not as accurate and reliable as SPM and BrainSuite, while SPM and BrainSuite lead to somewhat similar accuracy in WM-GM segmentation. As a result, in WM and GM segmentation BrainSuite is the best choice, followed by SPM and FSL, respectively. Moreover, in extracting CSF from other brain tissues, SPM performs far better than FSL and BrainSuite.

B. Real Data Tests

We applied the three methods to segment the T1-weighted brain scans of the 20 normal IBSR subjects. In this section, default settings are again selected for skull stripping, bias field correction and tissue segmentation in the FSL package. In BrainSuite, skull stripping is performed with only one diffusion iteration. The other parameters of skull stripping and bias correction are left at their default settings. In addition, brainstem removal and final mask dilation are enabled.

Results obtained on the 20 subjects of the IBSR database are given in Table 2. As shown, BrainSuite offers generally better results than the other two methods. Moreover, the variation around the average in the BrainSuite results is relatively smaller than for SPM8. In other words, the results in this table indicate that BrainSuite behaves more robustly for inputs with different noise and non-uniformity levels. In the FSL-FAST segmentation results, a large variation around the average value of the metrics is evident. Overall, the experiments on real MR data confirm the results obtained for the simulated BrainWeb datasets: BrainSuite ranks first in WM and GM segmentation, followed by SPM8 and FSL-FAST.

IV. CONCLUSION

In this paper, a comprehensive comparison evaluation of three of the most widely used neuroimage analysis software packages was presented. SPM8, FSL 4.1 and BrainSuite 9.1 were examined for segmenting the brain into GM, WM and CSF. As the results of experiments on simulated and real datasets confirm, BrainSuite presents superior performance in the classification of WM and GM compared to the other two methods. Besides its high accuracy, BrainSuite shows high robustness in performing tissue classification under different conditions of noise and non-uniformity. One should note that brain tissue classification in an integrated package relies heavily on the skull-stripping module. As shown in the results, SPM8, with a unified procedure of skull stripping and tissue segmentation, presents markedly higher accuracy in distinguishing CSF from other tissues.

V. ACKNOWLEDGMENT

This work was supported in part by Shiraz University of Technology. We would like to acknowledge the Wellcome Department of Imaging Neuroscience, the FMRIB Analysis Group at the University of Oxford and the Laboratory of Neuro Imaging at UCLA for providing the SPM, FSL and BrainSuite software. The synthetic MR data were provided by the McConnell Brain Imaging Center of the Montreal Neurological Institute, McGill University, and are available at http://www.bic.mni.mcgill.ca/brainweb/. The 20 normal MR brain data sets and their manual segmentations were provided by the Center for Morphometric Analysis at Massachusetts General Hospital and are available at http://www.cma.mgh.harvard.edu/ibsr/.

REFERENCES

[1] D. Pham, C. Xu, J. Prince, “Current methods in medical image segmentation,” Annual Review of Biomedical Engineering, vol. 2, pp. 315–337, 2000.

[2] M. Sonka, J. M. Fitzpatrick, Handbook of Medical Imaging. SPIE, 2000.

[3] J. C. Bezdek, L. O. Hall, L. P. Clarke, “Review of MRI segmentation techniques using pattern recognition,” Medical Physics, vol. 20, no. 4, pp. 1033–1048, 1993.

[4] A. Wee-Chung Liew, H. Yan, “Current methods in the automatic tissue segmentation of 3D magnetic resonance brain images,” Current Medical Imaging Reviews, 2006.

[5] J. Ashburner, K. J. Friston, J. Poline, et al., “Spatial registration and normalization of images,” Human Brain Mapping, vol. 2, pp. 165–189, 1995.

[6] S. M. Smith, M. Jenkinson, M. W. Woolrich, et al., “Advances in functional and structural MR image analysis and implementation as FSL,” NeuroImage, vol. 23(S1), pp. 208–219, 2004.

[7] D. W. Shattuck, R. M. Leahy, “BrainSuite: An automated cortical surface identification tool,” Medical Image Analysis, vol. 6, pp. 129–142, 2002.

[8] Y. Zhang, M. Brady, S. Smith, “Segmentation of brain MR images through a hidden Markov random field model and the expectation maximization algorithm,” IEEE Trans. on Medical Imaging, vol. 20, no. 1, pp. 45–57, 2001.

[9] S. Smith, “Fast robust automated brain extraction,” Human Brain Mapping, vol. 17, pp. 143–155, 2002.

[10] D. W. Shattuck, S. R. Sandor-Leahy, K. A. Schaper, et al., “Magnetic resonance image tissue classification using a partial volume model,” NeuroImage, vol. 13, no. 5, pp. 856–876, May 2001.

Figure 1. Visual comparison of the three methods (columns: SPM, FSL, BrainSuite). (a) 92nd sagittal slice; (b) 74th axial slice; (c) ground truth for the 74th axial (left) and 92nd sagittal (right) slices. In (a) and (b), the first rows correspond to 3% noise and 40% non-uniformity and the second rows to 1% noise and 0% non-uniformity.

TABLE I. AVERAGE PERFORMANCE OF BRAINWEB VOLUMES

Method      Metric    CSF Mean  CSF STD   GM Mean  GM STD   WM Mean  WM STD
SPM8        Dice %    93.39     0.56      93.55    0.7      95.12    0.95
SPM8        J %       87.6      0.99      87.89    1.24     90.7     1.72
SPM8        TPF %     97.91     0.52      92.81    1.63     93.37    2.96
SPM8        TNF %     99.33     0.05      99.18    0.31     99.69    0.2
FSL FAST    Dice %    71.22     1.98      89.46    0.63     94.70    1.39
FSL FAST    J %       55.33     2.39      80.93    1.03     89.95    2.49
FSL FAST    TPF %     73.92     3.89      86.54    1.36     91.53    3.77
FSL FAST    TNF %     98.09     0.07      98.99    0.42     96.82    0.17
BrainSuite  Dice %    72.37     0.35      94.52    1.46     96.23    1.52
BrainSuite  J %       56.70     0.43      89.65    2.63     92.78    2.84
BrainSuite  TPF %     64.58     0.43      93.60    1.87     96.60    0.95
BrainSuite  TNF %     99.21     0.04      99.35    0.15     99.56    0.23

TABLE II. AVERAGE PERFORMANCE OF IBSR VOLUMES

Method      Metric    GM Mean  GM STD   WM Mean  WM STD
SPM8        Dice %    79.33    2.90     81.75    4.79
SPM8        J %       65.84    3.94     69.37    6.30
SPM8        TPF %     73.44    4.47     79.74    4.76
SPM8        TNF %     99.21    0.19     99.42    0.26
FSL-FAST    Dice %    75.64    5.48     77.00    9.88
FSL-FAST    J %       61.11    6.91     63.43    10.81
FSL-FAST    TPF %     65.10    6.98     87.80    15.37
FSL-FAST    TNF %     99.89    0.23     99.48    0.27
BrainSuite  Dice %    79.90    3.15     83.53    2.15
BrainSuite  J %       67.07    2.75     70.10    2.45
BrainSuite  TPF %     75.14    3.04     85.63    3.45
BrainSuite  TNF %     99.44    0.29     99.56    0.28