Research Article Pulmonary Nodule Detection Model...

Research ArticlePulmonary Nodule Detection Model Based on SVM andCT Image Feature-Level Fusion with Rough Sets

Tao Zhou1 Huiling Lu1 Junjie Zhang1 and Hongbin Shi2

1School of Science Ningxia Medical University Ningxia Yinchuan 750004 China2Department of Urology The Generel Hospital of Ningxia Medical University Ningxia Yinchuan 750004 China

Correspondence should be addressed to Tao Zhou zhoutaonxmu126com

Received 4 May 2016 Revised 23 July 2016 Accepted 2 August 2016

Academic Editor Zexuan Ji

Copyright copy 2016 Tao Zhou et alThis is an open access article distributed under theCreativeCommonsAttribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

In order to improve the detection accuracy of pulmonary nodules in CT image considering two problems of pulmonary nodulesdetection model including unreasonable feature structure and nontightness of feature representation a pulmonary nodulesdetection algorithm is proposed based on SVM and CT image feature-level fusion with rough sets Firstly CT images of pulmonarynodule are analyzed and 42-dimensional feature components are extracted including six new 3-dimensional features proposed bythis paper and others 2-dimensional and 3-dimensional features Secondly these features are reduced for five times with rough setbased on feature-level fusionThirdly a grid optimization model is used to optimize the kernel function of support vector machine(SVM) which is used as a classifier to identify pulmonary nodules Finally lung CT images of 70 patients with pulmonary nodulesare collected as the original samples which are used to verify the effectiveness and stability of the proposed model by four groupsrsquocomparative experimentsThe experimental results show that the effectiveness and stability of the proposed model based on roughset feature-level fusion are improved in some degrees

1 Introduction

Lung cancer is a malignant tumor with the highest morbidityand mortality rate in the world posing a serious threat tohuman life and health [1 2]The ability to estimate the risk oflung cancer is important in two common clinical models [3]pulmonary nodules management and risk prediction modelIdentification of early symptomatic in lung cancer is veryimportant to improve early survival and reduce emergencypresentations Early detection is the most popular method toimprove the effectiveness of the treatment of patients withlung cancer Since pulmonary nodules are the early form oflung cancer [4] the detection of pulmonary nodules playsa critical role in the early diagnosis and treatment of lungcancer Recent advances in computed tomography (CT) havea progressively increased spatial resolution and decreasedacquisition timesmaking it possible for high resolutionmul-tiangle 3-dimensional isotropic image of the whole lung tobe acquired in less than 10 seconds This has expanded capa-bilities for the early detection of small pulmonary nodules [4]

It is believed that early detection of lung cancer will resultin earlier treatment at lower stages of the disease therebyimproving the 5-year survival rate which has remainedrelatively constant at 15 for the last 30 years However withthe wide application of CT in the lung imaging the issues ofCT data overloading and subjective interpretation of imagesresult in a high clinical misdiagnosis rate [5]

Computer-Aided Diagnosis (CAD) systems provided abeneficial support and enhance the diagnostic accuracyCAD is capable of performing the preliminary screen of thevast amounts of CT image and marking suspicious lesionsthereby helping radiologists to carry out the quadratic dis-crimination to reduce the workload and improve the accu-racy rate of cancer diagnosis [6 7]

Pulmonary nodule detection technology is one of the hottopics in the field of CAD in recent years For example ROIsegment is a key problem Xia et al [8] using local variationalGaussianmixturemodels to segment brainMRI image Basedon Learning Local Variational GaussianMixtureModels seg-mentation of breast ultrasound images are discussed by Xian

Hindawi Publishing CorporationBioMed Research InternationalVolume 2016 Article ID 8052436 13 pageshttpdxdoiorg10115520168052436

2 BioMed Research International

[9 10] and Santos et al [11] segment the lung parenchymabased on region growing algorithm Magalhaes Barros Nettoet al [12] use growing neural gas (GNG) to segment thelung parenchyma the obtained pulmonary nodules are thenseparated from tissues containing blood vessels and bronchiaccording to the 3D distance transform and finally SVM isused to carry out the effective identification of pulmonarynodules with shape and texture features Ye et al [13] firstlysegment and extract region of interest (ROI) with fuzzythreshold in combinationwith Gaussianmatrix mean curva-ture and Hessian matrix then choose the local shape infor-mation and local intensity dispersion as the feature expres-sion of ROI and finally use theweighted SVM for recognitionof pulmonary nodules Tan et al [14] segment pulmonarynodules based on the blood vessels and nodule enhancementfilter proposed by Li et al [15] then locate the clustering cen-ter of pulmonary nodules based on the divergence calculatedby Gaussian template and achieve ROI extraction and finallyuse the classifier based on genetic model artificial neuralnetwork (ANN) and SVM for comparative analysis of thedetection effectiveness of pulmonary nodules Cascio et al[16] use regional growthmodel andmorphological operationto extract the ROI firstly then reconstruct B-spline surfacebased on 3D spring model in order to extract the related 3Dgray features and shape features and detect the pulmonarynodules using ANN Although the above literature exploresthe methods of detecting pulmonary nodules overall theseare still two disadvantages of these methods in featurestructure design and feature set expression as follows

(1) When extracting and quantifying feature for ROI thefeature structure design is irrational reflected by thefact that the combination of global features and localfeatures and the combination of two-dimensional andthree-dimensional features are not fully considered

(2) When fusing feature data the compactness of featureexpression is a difficult problem Therefore featureredundancy is usually not eliminated Moreover thefeature-level fusion method without prior knowledgeis rarely used

Rough set theory was developed by Zdzislaw Pawlak inthe early 1980s and can be regarded as a new mathematicaltool for feature selection feature extraction and decision rulegeneration without prior knowledge Rough sets provide themechanism to find the minimal set of attributes required toclassify the training samples This minimal set of attributes iscalled reduct and contains the same knowledge as the originalset of attributes in a given information system Thereforereducts can be used to obtain different classifiers Wang et al[17] present a framework for a systematic study of the roughset theory Various views and interpretations of the theoryand different approaches to study the theory are discussedThe relationships between the rough sets and other theoriessuch as fuzzy sets evidence theory granular computing for-mal concept analysis and knowledge spaces are examinedCost of disease prediction and diagnosis can be reduced byapplying machine learning and data mining methods Dis-ease prediction and decision-making play a significant role in

medical diagnosis Udhaya Kumar and Hannah Inbarani [18]put forward a novel neighborhood rough set classificationapproach to deal with medical datasets Experimental resultof the proposed classification algorithm is compared withother existing approaches such as rough set 119870th-nearestneighbor support vector machine BP NN and multilayerperceptron to conclude that the proposed approach is acheaperway for disease prediction anddecision-making Fea-ture Selection (FS) is a solution that involves finding a subsetof prominent features to improve predictive accuracy andto remove the redundant features Thus the learning modelreceives a concise structure without forfeiting the predictiveaccuracy built by using only the selected prominent featuresTherefore nowadays FS is an essential part of knowledge dis-covery Inbarani et al [19] proposed new supervised featureselection methods based on hybridization of Particle SwarmOptimization (PSO) PSO based Relative Reduct (PSO-RR)and PSO based Quick Reduct (PSO-QR) presented for thediseases diagnosis in order to seek to investigate the utilityof a computer-aided diagnosis in the task of differentiatingmalignant nodules frombenign nodules based on single thin-section CT image data In Shah et al [20] CT images of soli-tary pulmonary nodules were contouredmanually on a singlerepresentative slice by a thoracic radiologist Two separatecontours were created for each nodule one including only thesolid portion of the nodule and one including any ground-glass components For each contour 75 features were calcu-lated that measured the attenuation shape and texture of thenoduleThese features were then input into a feature selectionstep and four different classifiers to determine if the diagnosiscould be predicted from the feature vector Hassanien [21]discuss a hybrid scheme that combines the advantages offuzzy sets and rough sets in conjunction with statisticalfeature extraction techniques An application of breast cancerimaging has been chosen and hybridization scheme havebeen applied to see their ability and accuracy to classify thebreast cancer images into two outcomes cancer or noncancer

Based on the above reasons a pulmonary nodule detec-tion model based on rough set (RS) feature-level fusion andSVM is proposed in this paper To overcome the first afore-mentioned disadvantage the shape feature intensity featureand texture feature are extracted For shape feature three new3-dimensional features namely External Spherical Volume(ESV) Surface-Center Distance Standard Deviation (SCD-STD) and External Rectangle Cross Line Distance (ERCLD)are proposed For intensity feature three new 3-dimensionalfeatures namely intensity gradient (from inside to outside)Laplace Divergence Mean (LDM) and Laplace DivergenceDistance (LDD) are proposed Regarding feature descriptiontwo-dimensional texture feature three-dimensional shapefeature and intensity feature are used for quantificationWithregard to the second aforementioned disadvantage rough setfeature-level fusion is adopted since it can fully retain theproperties of the features without prior knowledge Finallya grid optimizationmodel is employed to optimize the kernelfunction of support vector machine (SVM) which is used toconduct the recognition and detection of pulmonary nodulesIn order to verify the validity and stability advantages of

BioMed Research International 3

the model four groups of comparative experiments are per-formed in this paper that is model validation experimentsbefore and after rough set reduction model stability experi-ments before and after rough set reduction validation exper-iments of the superiority of the rough set feature-level fusionmodel and comparative experiments with other pulmonarynodule detection models to compare the performance Theexperimental results show that the method proposed in thispaper can improve to a certain extent the rationality of fea-ture structure and compactness of feature expression therebyimproving the detection accuracy of pulmonary nodules

2 Related Theory

The description of ROI features is determined by bothits comprehensiveness (features cannot be ldquoobservedrdquo withldquomultiperspectiverdquo approach if the features amount is toolittle) and the accuracy of characterization (more quantizedvalues diverged from the real information will cause a lowfeature discrimination) A large number of noise informationsets will reduce the ROI feature extraction accuracy and affectthe final results of detection Therefore for comprehensiveand accurate expression of the morphological structure ofROI and local features six new 3-dimensional features areproposed based on the analysis of ROI for lung CT imageThese new 3-dimensional features are used to qualitativelyanalyze and quantitatively characterize the lesions from 2-dimensional and 3-dimensional perspectives in combinationwith other shape features intensity features texture features

21 Pulmonary Nodules Features in CT Image

211 Shape Characteristics Shape characteristics analyze thespatial distribution of gray values by computing local featuresat each point in the image Shape feature is the most intuitivevisual feature which can be used to describe themainmedicalsigns of CT image of pulmonary nodule ROI such as nodulesign lobulation sign spinous process sign vacuole sign andspicule sign from the perspectives of geometric shape edgeroughness and topology structure In this paper the extractedcomponents of the shape features mainly include perimeterarea volume roundness rectangularity elongation Eulernumber Harris Hu moment ESV SCDSTD and ERCLDHere some features are given [22]

(1) Area

119878 =

119873

sum

119909=1

119872

sum

119910=1

119891 (119909 119910) (1)

where 119891(119909 119910) is the pixels of the target and119872 and119873 are thelength and width respectively

(2) Perimeter

119862 =

119872

sum

119894=1

119873

sum

119895=1

119901 (119894 119895) (2)

where 119901(119894 119895) is the pixels of the target edge and119872 and119873 arethe length and width respectively

(3) Circularity

1198770=1198622

4120587119878 (3)

Circularity describes object shape that is close to thedegree of circular where 119878 is the area of the target region and119862 is circumference of the target region 0 lt 119877

0lt 1 and 119877

0

value reflects the complexity of the measurement boundarythe shape is more complex and the 119877

0value is more smaller

(4) Rectangularity

119877 =119878

(119867 lowast119882) (4)

where 119878 is the area of the target region and119867 and119882 are thelength and width respectively

(5) Elongation

119864 =min (119867119882)max (119867119882)

(5)

Elongation can distinguish different shapes of the images(such as circle square ellipse thin and long and short andwide) where119867 and119882 are the length and width respectively

(6) Euler Number

119864 = 119862 minus 119867 (6)

where 119862 is the number of connection parts and 119867 is thenumber of holes

(7) External Spherical Volume (ESV) ESV is the ratio of eachROI 119860

119894(maximum diameter is dim(119860

119894)) to the External

Spherical Volume VS(119860119894) extracted from three-dimensional

CT image which reflects the similarity between the regionand the sphere as shown in Figure 1(b)

Volume (VS (119860119894)) =

4

3times 120587 times (

dim2)

3

1198641(119860119894) =

Volume (119860119894)

Volume (VS (119860119894))

(7)

(8) Surface-Center Distance STandard Deviation (SCDSTD)SCDSTD is the coordinate distance standard deviation ofeach individual element 119862(119878

119894) and regional center 119862cen(119860 119894)

from the surface of each ROI its value also describes thesimilarity with sphere of ROI If the value is 0 119864

2(119860119894) is

a standard sphere With the increase in 1198642(119860119894) value the

magnitude of the deviation from the sphere in the regionincreases as shown in Figure 1(c)

1198642(119860119894) = std(

1003817100381710038171003817119862 (119878119894) minus 119862cen (119860 119894)1003817100381710038171003817

mean (1003817100381710038171003817119862 (119878119894) minus 119862cen (119860 119894)1003817100381710038171003817)) (8)

(9) External Rectangle Cross Line Distance (ERCLD) ERCLDis the distance from center voxel119862cen(119860 119894) of ROI to the center


Y

Z

X

(a) Coordinate system

Ai

O

Dim

(Ai )

VS(Ai)

(b) ESV

C(Si)

Ccen(Ai)

(c) SCDSTD

Ccen(Ai)

Ccen(Li)

(d) ERCLD

Figure 1 Three-dimensional character sketch

dim(119871119894) (119894 = 1 2 12) of its 12 intersecting lines which

may indicate that the regional voxel is evenly distributed inthe rectangular body as shown in Figure 1(d)

1198643 (119860) =

1003817100381710038171003817mean (119862cen (119860 119894) minus 119862cen (dim (119871 119894)))1003817100381710038171003817

mean (119862cen (dim (119871 119894))) (9)

212 Hu Moment Characteristics Moments and the relatedinvariants have been extensively analyzed to characterize thepatterns in images The moment invariants are independentof position size and orientation but also independent ofparallel projection Hu [23] was the first person to prove thecentral moment invariants The central geometric momentinvariants are derived based upon algebraic invariantsincluding six absolute orthogonal invariants and one skeworthogonal invariant The moment invariants have beenproved to be the adequatemeasures for tracing image patternsabout the images translation scaling and rotation

Hu moment invariants define seven values computed bynormalizing central moments through order three whichare invariant to object scale position and orientation and alarge number of papers that have significant contribution to

the application of Hu moment Two-dimensional momentsof a digitally sampled 119872 lowast 119873 image that has gray function119891(119909 119910) (119909 = 1 2 119872 119910 = 1 2 119873) are given as

119872119901119902=

119872

sum

119909=1

119873

sum

119910=1

119909119901119910119902119891 (119909 119910) 119901 119902 = 1 2 3 (10)

The moments 119891(119909 119910) translated by an amount (119886 119887) aredefined as

119909 =11989810

11989800

119910 =11989801

11989800

120583119901119902=

119872

sum

119909=1

119873

sum

119910=1

(119909 minus 119909)119901(119910 minus 119910)

119902119891 (119909 119910)

119901 119902 = 1 2 3

(11)


When a scaling normalization is applied the centralmoments change as

120578119901119902=

120583119901119902

120583120574

00

120574 = ((119901 + 119902)

2) + 1 (12)

In terms of the central moments the seven moments aregiven as

1198621= 12057820+ 12057802

1198622= (12057820minus 12057802)2+ 41205782

11

1198623= (12057830minus 312057812)2+ (312057821minus 12057803)2

1198624= (12057830+ 12057812)2+ (12057803+ 12057821)2

1198625= (12057830minus 312057812) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

21minus 12057803)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

1198626= (12057820minus 12057802) [(12057830+ 12057812)2minus (12057803+ 12057821)2]

+ 412057811(12057830+ 12057812) (12057821+ 12057803)

1198627= (312057821minus 12057803) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

12minus 12057830)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

(13)

Hu 7-moment invariants varywidely in order to compareusing logarithmic function to compress data and hence theactual invariants moment features are 1198621015840

119870

119862119870=10038161003816100381610038161003816log10

100381610038161003816100381610038161198621015840

119870

10038161003816100381610038161003816

10038161003816100381610038161003816119870 = 1 2 7 (14)

The amended moment invariant features possess transla-tion invariance rotational invariance and scale invariance

213 TextureCharacteristics Tamura texture features Tamuratexture based on human visual perception in psychologicalresearch are proposed by Tamura in 1978 Six components ofTamura texture feature correspond with 6 properties in psy-chology three of themare coarseness contrast anddirection-ality which have the good application value in the texturesynthesis image recognition and so on

Texture is the gray distribution which appears repeatedlyin the space position so there are some relationships betweentwo pixels at some distance from each other in image spacecalled gray spatial correlation properties in gray imageGLCM is a common method by studying the relevant rela-tionship of gray image

214 Intensity Features Gray statistical feature is a quantita-tivemethod to describe the basic features of two-dimensionalimage region it is called intensity feature from three-dimen-sional perspective [16] In this paper the extracted com-ponents of intensity features include the mean intensity

intensity variance maximum and minimum intensity dif-ference skewness kurtosis intensity gradient (from insideto outside) Laplace Divergence Mean (LDM) and LaplaceDivergence Distance (LDD)

(1) Intensity Gradient (from Inside to Outside) For ROI 119860119894

with the voxel 119878119894volume greater than 0 morphological

erosion processing is performed continuously and the ratio ofthe mean of the excluded area of each erosion processing tothe mean of the last operation (initial value is 0) is calculateduntil the ratio is zero Consider the following equation where119899 is the number of operations

1198644(119860119894) =

119870

119899 (15)

(2) Laplace Divergence Mean (LDM) According to theLaplacian convolution results with the original CT imageit is found that the nodule surrounding area with smallergray value difference has a significant different divergenceTherefore calculation of Laplace divergence is helpful todistinguish pulmonary nodules from interfering impurities

1198645(119860119894) = mean (119860

119894times La) (16)

(3) Laplace Divergence Distance (LDD) The differencebetween the maximum and minimum values of the Laplacedivergence values is used to describe the range of regionaldivergence

1198646(119860119894) = max (119860

119894times La) minusmin (119860

119894times La) (17)

Table 1 shows the feature set of 42 features based on theabove feature description of ROI To facilitate subsequenttests features are numbered in the order as showed inTable 1 that is the shape features are numbered fs1ndashfs18 theintensity features are numbered fi1ndashfi8 and texture featuresare numbered ft1ndashft16 respectively

22 Rough Set and Attribute Reduction Rough set theory(RST) proposed by Pawlak in 1982 is one of the effectivemathematical tools for processing fuzzy and uncertaintyknowledge Nowadays RST has been applied to a varietyof fields such as artificial intelligence data mining patternrecognition and knowledge discovery Rough set is foundedon the assumption that with every object of the universeof discourse some knowledge is associated Objects char-acterized by the same information are similar in view ofthe available information about them The indiscernibilityrelation generated in this way is the mathematical basis ofrough set theory Any set of all indiscernible objects are calledan elementary set and form a basic granule of knowledgeabout the universe Any union of some elementary sets isreferred to as a crisp set otherwise the set is rough set

Definition 1 An information system 119878 is a quadruple 119878 =(119880 119860 119881 119891) where 119880 is a nonempty and finite set of objects119860 is a nonempty and finite set of attributes 119881 fl ⋃119881

119886with

119881119886being the domain of attribute 119886 and 119891 is an information


Table 1 ROI feature set

Feature type Feature vectors Dimensionality

Shapefeatures (fs)

Perimeter area volumeroundness rectangularity lengthEulerrsquos number ESV SCDSTDERCLD Hu moment

18

Intensityfeatures (fi)

Mean intensity intensitystandard variancemaximum-minimum intensitydifference value of varianceskewness kurtosis intensitygradient (from inside to outside)LDM LDD

8

Texturefeatures (ft)

Tamura texture features(contrast direction roughness)GLCM (angular second momentmoment of inertia torque deficitsum mean variance sumvariance difference varianceentropy sum entropy differentialentropy information measurecorrelation coefficient maximumcorrelation coefficient)

16

function such that119891(119909 119886) isin 119881119886for every 119909 isin 119880 and every 119886 isin

119860 A decision system is an information system (119880 119862cup119863119881 119891)with119862cap119863 = B where119862 and119863 are called the conditional anddecision attribute sets respectively

For a subset 119875 of 119860 let us define the correspondingequivalence relation as

IND (119875) = (119909 119910) isin 119880 times 119880 | 119891 (119909 119886)

= 119891 (119910 119886) for any 119886 isin 119875(18)

and denote the equivalence class of IND(119875) which containsthe object 119909 isin 119880 by [119909]

119875 that is

[119909]119875 = 119910 isin 119880 | (119909 119910) isin IND (119875) (19)

The factor set of all equivalence classes of IND(119875) isdenoted by 119880119875 that is 119880119875 = [119909]

119875| 119909 isin 119880

As well known attribute reduction is one of the keyissues in RST It is performed in information systems bymeans of the notion of a reduct based on a specializationof the notion of independence due to Marczewski Up tonow much attention has been paid to this issue and manydifferent methods of attribute reduction have been proposedfor decision systems For example the reduction approachesare respectively based on partition discernibility matrixconditional information entropy positive region and antcolony optimization approach

Definition 2 Let 119878 = (119880 119860 119881 119891) be an information systemand 119875 sube 119860 For a subset119883 of 119880 119877

119875(119883) = 119909 isin 119880 | [119909]

119875sube 119883

and 119877119875(119883) = 119909 isin 119880 | [119909]119875cap 119883 = B are called 119875-lower and

119875-upper approximations of119883 respectively

H1

H2

H

r

r

Figure 2 Optimal hyper plane

Definition 3 Let 119878 = (119880 119860 119881 119891) be an information systemand let 119875 and 119876 be two subsets of 119860 Then POS

119875(119876) =

⋃119883isin119880119876

119877119875(119883) is called 119875-positive region of 119876 where 119877

119875(119883)

is the 119875-lower approximation of 119883

Definition 4 Let 119878 = (119880 119860 119881 119891) be a decision system 119886 isin 119862and 119875 sube 119862 If POS

119862(119876) = POS

119862119886(119876) 119886 is said to be 119863-

dispensable in 119862 otherwise 119886 is said to be 119863-indispensablein119862 The set of all the119863-indispensable attributes is called thecore of 119878 and denoted by Core(119878) Furthermore if POS

119875(119876) =

POS119862(119876) and each of the attributes of 119875 is 119863-indispensable

then 119875 is called a reduct of 119878

23 SVM and Its Optimization SVM is a pattern recognitionmethod developed from statistical learning theory based onthe idea of structural risk minimization principle In the caseof ensuring classification accuracy SVM can improve thegeneralization ability of the learning machine by maximizingthe classification interval The biggest advantage of SVMis that it overcomes the overlearning and high dimensionboth of which lead to computational complexity and localextremum problems A reliable classification model basedon SVM is urgently needed for the study of hospitalizationexpenses of patients with gastric cancer

SVM deals with linearly separable data (Figure 2) theassumption is that there are data sets 119878 = 119909

1 119909

119899 and data

marker 119866 = 1199101 119910

119899 where 119909

119894is the input space vector of

the data sample and 119910119894records the category of the sample

The aim of SVM is to find an optimal hyper plane 119867 toseparate these two samples andmake the largest intervalTheoptimal hyper plane119867 is expressed as

119908119879119909 + 119887 = 0 (20)

where 119908 is the weight vector and 119887 is the thresholdThis problem is transformed into the optimal problem of

119908 and 119887

min119908119887

119903 (119908) =1

2119908

119910119894 ((119908 sdot 119909) + 119887) ge 1 119894 = 1 119899

(21)


In order to simplify the formula the Lagrange dual isintroduced to meet the requirements of KKT (Karush-Kuhn-Tucker) The objective function is transformed into

min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 120572

119894ge 0 119894 = 1 2 119899

(22)

As for the linearly inseparable data the penalty parameter119862 and relaxation variable 120585 are introduced in the constraintcondition thus the generalization ability of SVM is increasedand the function is transformed into

min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 0 le 120572

119894le 119862

(23)

where 119862 is the artificial setting parameter According to thepractical experience the bigger 119862 the greater separationinterval At the same time it will increase the risk ofgeneralization

The final classification function is

119891 (119909) = sgn(119899

sum

119894=1

119886lowast

119894119910119894(119909119894sdot 119909)) + 119887

lowast (24)

For nonlinear classification data SVM transforms theminto linearly separable data in a high-dimensional space vianonlinear mapping of kernel function and the optimal hyperplane is found in high-dimensional space The kernel func-tion which meets the mercer kernel condition correspondingto the transvection of a spatial transformation is used torealize the nonlinear transformation of linear classification

The corresponding kernel function is defined as

119870(119909119894 119909) = (120593 (119909

119894) 120593 (119909)) (25)

At this point the final classification function is

119891 (119909) = sgn(119899

sum

119894=1

119886119894119910119894119870(119909119894sdot 119909)) + 119887 (26)

Penalty factor 119862 and parameter 119892 of the kernel functionplay an extremely important role in the performance of SVMclassification In order to obtain the optimal classificationresults grid optimization model is used for optimization inthis paper In grid optimization model the parameters to besearched are expressed in the form of grids in a certain spaceand the optimal parameters are selected by traversing all thegrids Therefore grid optimization model has the advantagesof simplicity convenience good stability and easiness to getthe global optimal solution [24] In the learning process ofSVM 10-fold cross-validation is used to calculate the kernel

function parameters and penalty coefficient with the optimalclassification performance which are then applied to theSVM classifier for recognition and detection of pulmonarynodules Finally sensitivity specificity accuracy and pro-cessing time are used as indexes to evaluate the detection ofrelevant experiments

3 Pulmonary Nodule Detection Model

In this paper CT images of 70 cases of patients with pul-monary nodules are used The images are firstly segmented[7] to three different types of pulmonary nodules (solitarypulmonary nodules or SPN vascular adhesion pulmonarynodules or VAPN and pleural adhesion pulmonary nodulesor PAPN) which are marked by radiologists as well as alarge number of nonnodular areas including blood vesselsbones and alveoli Forty-two feature components charac-terizing ROI are extracted from the 2-dimensional and 3-dimensional perspectives including six new 3-dimensionalfeatures proposed in this paper They are composed of 18shape features 8 intensity features and 16 texture featuresThe extracted feature set (identified as the FS) is discretizedand normalized Feature-level fusion of the improved featuredata is performed for five times using rough set model (sincethe reduction of rough set feature subset is not unique in thispaper the extracted feature sets are reduced for five timesand are identified as RS1 RS2 RS3 RS4 and RS5) Featuresubset RS1 is used for comparative experiment Finally SVMparameters are optimized using grid optimization modeland the improved SVM is used in the following four setsof comparative experiments comparative analysis of theeffectiveness and stability of classification before and afterrough set reduction of features comparative analysis of therecognition performance before and after feature-level fusionbased on rough set or PCA comparative analysis of therecognition performance of our proposed method and othermethods Based on the above views we present a flow chartof pulmonary nodule detection model as shown in Figure 3

4 Results and Discussion

41 Experimental Environments In this paper the hardwareand software environments are as follows

Software Environments Windows 7 OS the Matlab R2014bImageJ 148 u and LibSVM

Hardware Environments Intel Core i5 4670-34GHz 80GBof memory and 500GB hard disk

Experimental Data CT images of 70 cases of patients withpulmonary nodules are collected as experimental sampleswhich are marked by radiologists with a size of 512 times 512and a thickness of 2mm They are composed of 2232 CTimages from 38 cases of patients with solitary pulmonarynodules (SPN) 17 cases of patients with vascular adhesionpulmonary nodules (VAPN) and 15 cases of patients withpleural adhesion pulmonary nodules (PAPN) respectively


ROI

Pulmonarynodule

Otherareas

Blood vesselsbone alveolar

Gridoptimization

methodSVM

Featureextraction

Featurereduction

Classification result

Feature set

RS1 RS2 RS3 RS4 RS5

Rough set

Image segmentation

Shapefeatures

Perimeter area volume roundrectangular length Eulers number ESV

SCDSTD Hu moment ERCLD

Mean intensity intensity standardvariance maximum-minimum intensitydifference value of variance skewness

kurtosis the intensity gradient (by)outside introversion Laplace divergence

Tamura texture features (contrastorientation degree roughness)

GLCM (angular second moment moment ofinertia the deficit sum mean variance sumvariance difference variance entropy andentropy differential entropy information

measure correlation coefficient maximumcorrelation coefficient)

Intensityfeatures

Texturefeatures

CT

SPN VAPN PAPN

Normalized

Discretization

Figure 3 Flow chart of pulmonary nodule detection model

Figure 4 shows the representatives of each type of pulmonarynodules and the corresponding segmentation results

In this paper 42-dimensional features of 70 markedpulmonary nodular areas and 70 randomly selected nonn-odular areas are extracted Table 2 shows the 42-dimensionalfeature values of the lung nodular and nonnodular areasshape features are identified as the fs intensity features areidentified as the fi and texture features are identified as the ftIn order to intuitively understand the distribution of differentfeature values and the discrimination comparison externalsphere volume (ESV) ratio and the standard deviation of

surface-center distance (SCD) are calculated and plotted asbox diagram as shown in Figure 5

42 Feature-Level Fusion Based on Rough Set In order toavoid the attribute value of small range of values dominatedby that of large range of values and reduce the complexityof the statistical computation process the extracted featuresets are firstly preprocessed by normalizing data with biggerdifference and linearly mapping the data to [0 1] Thepreprocessed feature data are then fused for five times usingrough set model The fusion results are shown in Table 3


(a) Original image of SPN (b) Segment result of SPN (c) Original image of VAPN

(d) Segment result of VAPN (e) Original image of PAPN (f) Segment result of PAPN

Figure 4 Pulmonary nodule segmentation results

Table 2 Feature values of pulmonary nodular areas and nonnodular areas

Shape features (fs) Intensity features (fi) Texture features (ft)Nodular areas Nonnodular areas Nodular areas Nonnodular areas Nodular areas Nonnodular areas95 78 5906 910987 83104 54016159 128 1406 44872 12041 125216284 178 05956 minus039568 04303 0006706517 0211 27348 18669 07709 0727506961 21587 551865 143481 07169 0986503529 07778 05 1 08059 538940 1 139598 206044 01942 0048703186 10295 729905 3546389 07708 0727300686 10197 08059 5349800042 00458 35042 5097100021 00295 06514 0845300013 00268 00971 0614300005 00011 44033 8218620 1 00691 5006114 9 minus05785 minus0424505356 05571 2307 3223903072 050178801738 0207122


+0Nodular areas Nonnodular areas

02

04

06

08

1

12

14

(a) Boxplot of ESV

+

+

+

01

015

02

025

03

035

04

045

05

055

06

Nodular areas Nonnodular areas(b) Boxplot of SCDSTD

Figure 5 Pulmonary nodule area and the pulmonary nodules boxplot ldquo+rdquo refers to upper and lower bounders of ESV value and SCDSTDvalue

Table 3 Feature reduction based on rough sets

Featuresubset Reduction results Dimensionality

RS1fs4 fs16 fs17 fs18 fi2 fi4 fi6 fi7fi8 ft2 ft4 ft5 ft6 ft7 ft8 ft9ft10 ft11 ft13 ft14 ft15 ft16

21

RS2fs4 fs9 fs16 fs18 fi1 fi2 fi5 ft2ft5 ft6 ft8 ft9 ft10 ft11 ft12 ft13

ft1517

RS3fs9 fs17 fs18 fi1 fi2 fi5 fi7 fi8ft2 ft6 ft7 ft8 ft9 ft10 ft11 ft12

ft14 ft15 ft1619


ft14 ft15 ft1619

RS5fs9 fs16 fs17 fs18 fi1 fi2 fi4 fi5fi7 fi8 ft2 ft5 ft6 ft7 ft8 ft9 ft10

ft12 ft15 ft1620

43 Pulmonary Nodule Detection withSVM Based on Grid Optimization

431 The Model Effectiveness Experiment Tenfold cross-validation is used to calculate the accuracy sensitivity speci-ficity and processing time of classification before and afterrough set reduction (RS1(70 times 21) obtained from experimentone is used as the data set after reduction) and the recogni-tion performance of classifier is compared before and afterreduction The results are shown in Table 4

Experimental results show that pulmonary noduledetection accuracy is increased significantly after feature-level fusion with a decrease in the missed diagnosis ratereflected by the increased sensitivity and the misdiagnosisrate reflected by the increased specificity The processingtime is also shorter after reduction These results indicatethat the feature-level fusion of the extracted feature set with

42 dimensionalities based on rough set model is effectivewhich not only improves the compactness of the feature set(to eliminate redundancy and low degree of differentiationfeatures component) but also corrects the abnormal data ofthe feature set thereby further improving the performance ofpulmonary nodule detection Table 5 shows the effectivenessof the five rough set reduction subsets

432 The Model Stability Experiment The feature data ofpulmonary nodules are tested with RS1(70lowast21) as the datasetfor classification for five rounds with a different ratio oftraining set over testing set of 5020 4030 3535 3535 or2050 Each round of test is carried out with a randomlyselected ratio of training set over testing set and themeanof 10test results is used as the corresponding accuracy sensitivityspecificity and running time of the model The results areshown in Table 6

The experimental results show that with the decreasein the ratio of training set over testing set the decrease inthe classification accuracy of feature subset after rough setreduction is not obvious whereas that of feature set beforerough set reduction is fluctuating to certain extent (Figure 6is more intuitive)These results indicate that the classificationstability of the feature level fusion model based on rough setis higher and is less susceptible to the interference of sampledata Table 7 shows the stability of 5 groups feature subset afterrough set reduction

433 The Superiority of Feature-Level Fusion Model Basedon Rough Set Since PCA is a well-developed model char-acterized by simple calculation and easy programming ithas become the preferred dimension reduction method formost of the feature-level fusion model in order to analyzecomparatively two types of feature-level fusions In this paperPCA-based feature-level fusion of the extracted feature sets isperformed at the same time and the tenfold cross-validationresults are shown in Table 8 Figure 7 shows the classificationperformance of the two types of feature-level fusion methods


Table 4 Statistics of effectiveness before and after rough set reduction

Serial number Accuracy () Sensibility () Specificity () Processing time (s)

Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051

After reduction (Rs1)

1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930

Mean 9917 100 9833 04571Increase after reduction 453 714 19 0148

Table 5 Effectiveness of rough set reduction subsets

Subset Averageaccuracy ()

Averagesensitivity ()

Averagespecificity ()

Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672

(feature subset RS1 fromTable 3 is used and the running timeis 100 times actual time)

Experimental results show that various performanceindicators of the feature-level fusion model based on roughset are better than those based on PCA indicating that therough set is more suitable than PCA to eliminate redundantinformation

434 Comparison with Other Pulmonary Nodule DetectionMethods Pulmonary nodule detection accuracy and FalsePositives per scan (FPs) are used as the evaluation indexesof pulmonary nodule detection methods to compare andanalyze the method proposed in this paper and other five

9293949596979899

100101

5020 4030 3535 3040 2050

Before reductionAfter reduction

Figure 6 Comparative results of feature subsets before and afterrough set reduction

detectionmethods of pulmonary nodules (the optimal detec-tion accuracy is used for all detection methods) The resultsare shown in Table 9 (Pr private database L LIDC)

Experimental results show that the proposed method issuperior to the other pulmonary nodule detection methodsto a certain extent indicating that this method not onlyimproves the comprehensiveness and accuracy of the featuredescription of ROI by supplementing and improving the


Table 6 Stability statistics of rough set reduction subsets

Trainingsettesting set Accuracy () Sensitivity () Specificity () Running time (s)

Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583

Table 7 Classification performance of rough set reduction subset

SubsetAverageaccuracy

()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620

Table 8 Classification performance of feature reduction based onPCA

Serialnumber

Accuracy()

Sensitivity()

Specificity()

10 timesrunningtime (s)

1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490

feature components but also improves the firmness of thefeature set by integrating the concept of feature-level fusionbased on rough set to exclude the redundant features and datawith irregular information thereby improving the overallpulmonary nodule detection performance

Table 9 Comparison of the performance of different lung noduledetection methods

Author Database Nodulenumbers

Accuracy() FPs

Santos et al[11] L 260 884 117

MagalhaesBarros Nettoet al [12]

L 48 9065 0138

Ye et al [13] Pr 220 902 82Tan et al [14] L 172 875 4Cascio et al[16] L 148 97 61

Our method Pr 70 9917 047

Rough set

Accuracy Sensitivity Specificity Running time

PCA

020406080

100120

Figure 7 Comparison of two feature-level fusion models

5 Conclusions

In this paper the research status quo of pulmonary noduledetection methods is analyzed and a pulmonary noduledetection model is proposed based on rough set basedfeature-level fusion To address the issues that the featuredescription is insufficient and the characterization is inac-curate in the process of feature extraction six new 3Dfeatures in combination with other 2D and 3D features are


proposed to extract and quantify the feature information ofROI in this model A rough set based feature-level fusionis employed to reduce the dimensionality of the feature setssince there is redundant information in the extracted high-dimensional features In addition a grid optimization modelis adopted to optimize the SVM kernel function whichis used as the classifier for detection and recognition ofpulmonary nodule Finally the pulmonary nodule detectionperformance of the proposed method is verified with fourgroups of comparative experimentsThe experimental resultsshow that the proposed pulmonary nodule detectionmethodbased on rough set based feature-level fusion is effective withthe classification accuracy that can basicallymeet the require-ments of medical imaging for the detection of pulmonarynodules and therefore is of great value for the detection ofpulmonary nodules and auxiliary diagnosis of lung cancer

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

The work is partially supported by National Natural ScienceFoundation of China (Grant No 61561040) Natural ScienceFoundation of Ningxia (Grant No NZ16067) and ScientificResearch Fund of Ningxia Education Department (Grant NoNGY2016084)

References

[1] S P Riaz M Luchtenborg V H Coupland J Spicer M DPeake and H Moslashller ldquoTrends in incidence of small cell lungcancer and all lung cancerrdquo Lung Cancer vol 75 no 3 pp 280ndash284 2012

[2] J Ferlay I Soerjomataram R Dikshit et al ldquoCancer incidenceand mortality worldwide sources methods and major patternsin GLOBOCAN 2012rdquo International Journal of Cancer vol 136no 5 pp E359ndashE386 2015

[3] D R Baldwin ldquoPrediction of risk of lung cancer in populationsand in pulmonary nodules significant progress to drive changesin paradigmsrdquo Lung Cancer vol 89 no 1 pp 1ndash3 2015

[4] S Diciotti G Picozzi M Falchini MMascalchi N Villari andG Valli ldquo3-D segmentation algorithm of small lung nodules inspiral CT imagesrdquo IEEETransactions on Information Technologyin Biomedicine vol 12 no 1 pp 7ndash19 2008

[5] S Theodoridis and K Koutroumbas Pattern Recognition TheAssociated Press 4th edition 2010

[6] B Sahiner H-P Chan L M Hadjiiski et al ldquoEffect of CADon radiologistsrsquo detection of lung nodules on thoracic CT scansanalysis of an observer performance study by nodule sizerdquoAcademic Radiology vol 16 no 12 pp 1518ndash1530 2009

[7] Y Xia S Lu LWen S EberlM Fulham andDD Feng ldquoAuto-mated identification of dementia using FDG-PET imagingrdquoBioMed Research International vol 2014 Article ID 421743 8pages 2014

[8] Y Xia Z Ji and Y Zhang ldquoBrain MRI image segmentationbased on learning local variational Gaussian mixture modelsrdquoNeurocomputing vol 204 pp 189ndash197 2016

[9] M Xian Y Zhang and H D Cheng ldquoFully automatic segmen-tation of breast ultrasound images based on breast characteris-tics in space and frequency domainsrdquo Pattern Recognition vol48 no 2 pp 485ndash497 2015

[10] M Xian Y Zhang H-D Cheng F Xu and J Ding ldquoNeutro-connectedness cutrdquo IEEE Transactions on Image Processing vol25 no 10 pp 4691ndash4703 2016

[11] A M Santos A O de Carvalho Filho A C Silva et alldquoAutomatic detection of small lung nodules in 3D CT datausing Gaussian mixture models Tsallis entropy and SVMrdquoEngineering Applications of Artificial Intelligence vol 36 pp 27ndash39 2014

[12] S Magalhaes Barros Netto A Corrca Silva R AcatauassuNunes and M Gattass ldquoAutomatic segmentation of lungnodules with growing neural gas and support vector machinerdquoComputers in Biology andMedicine vol 42 no 11 pp 1110ndash11212012

[13] X Ye X Lin J Dehmeshki G Slabaugh and G BeddoeldquoShape-based computer-aided detection of lung nodules in tho-racic CT imagesrdquo IEEE Transactions on Biomedical Engineeringvol 56 no 7 pp 1810ndash1820 2009

[14] M Tan R Deklerck B Jansen M Bister and J Cornelis ldquoAnovel computer-aided lung nodule detection system for CTimagesrdquoMedical Physics vol 38 no 10 pp 5630ndash5645 2011

[15] Q Li F Li and K Doi ldquoComputerized Detection of LungNodules in Thin-Section CT Images by Use of SelectiveEnhancement Filters and an Automated Rule-Based ClassifierrdquoAcademic Radiology vol 15 no 2 pp 165ndash175 2008

[16] D Cascio R Magro F Fauci M Iacomi and G RasoldquoAutomatic detection of lung nodules in CT datasets basedon stable 3D mass-spring modelsrdquo Computers in Biology andMedicine vol 42 no 11 pp 1098ndash1109 2012

[17] G Y Wang Y Y Yao and H Yu ldquoA survey on rough set theoryand applicationsrdquo Chinese Journal of Computers vol 32 no 7pp 1229ndash1246 2009

[18] S Udhaya Kumar and H Hannah Inbarani ldquoA novel neigh-borhood rough set based classification approach for medicaldiagnosisrdquo Procedia Computer Science vol 47 pp 351ndash359 2015

[19] H H Inbarani A T Azar and G Jothi ldquoSupervised hybridfeature selection based on PSO and rough sets for medicaldiagnosisrdquo Computer Methods and Programs in Biomedicinevol 113 no 1 pp 175ndash185 2014

[20] S K Shah M F McNitt-Gray S R Rogers et al ldquoComputer-aided diagnosis of the solitary pulmonary nodulerdquo AcademicRadiology vol 12 no 5 pp 570ndash575 2005

[21] A Hassanien ldquoFuzzy rough sets hybrid scheme for breastcancer detectionrdquo Image and Vision Computing vol 25 no 2pp 172ndash183 2007

[22] Z Guo Y Li Y Wang S Liu T Lei and Y Fan ldquoA method ofeffective text extraction for complex video scenerdquoMathematicalProblems in Engineering vol 2016 Article ID 2187647 11 pages2016

[23] M-K Hu ldquoVisual pattern recognition by moment invariantsrdquoIRE Transactions on Information Theory vol 8 no 2 pp 179ndash187 1962

[24] J Chorowski J Wang and J M Zurada ldquoReview and per-formance comparison of SVM- and ELM-based classifiersrdquoNeurocomputing vol 128 pp 507ndash516 2014

Submit your manuscripts athttpwwwhindawicom

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014


MEDIATORSINFLAMMATION

of


Behavioural Neurology

EndocrinologyInternational Journal of



Disease Markers


BioMed Research International

OncologyJournal of



Oxidative Medicine and Cellular Longevity


PPAR Research

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Immunology ResearchHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

ObesityJournal of



Computational and Mathematical Methods in Medicine

OphthalmologyJournal of


Diabetes ResearchJournal of



Research and TreatmentAIDS


Gastroenterology Research and Practice


Parkinsonrsquos Disease

Evidence-Based Complementary and Alternative Medicine

Volume 2014Hindawi Publishing Corporationhttpwwwhindawicom


[9 10] and Santos et al [11] segment the lung parenchymabased on region growing algorithm Magalhaes Barros Nettoet al [12] use growing neural gas (GNG) to segment thelung parenchyma the obtained pulmonary nodules are thenseparated from tissues containing blood vessels and bronchiaccording to the 3D distance transform and finally SVM isused to carry out the effective identification of pulmonarynodules with shape and texture features Ye et al [13] firstlysegment and extract region of interest (ROI) with fuzzythreshold in combinationwith Gaussianmatrix mean curva-ture and Hessian matrix then choose the local shape infor-mation and local intensity dispersion as the feature expres-sion of ROI and finally use theweighted SVM for recognitionof pulmonary nodules Tan et al [14] segment pulmonarynodules based on the blood vessels and nodule enhancementfilter proposed by Li et al [15] then locate the clustering cen-ter of pulmonary nodules based on the divergence calculatedby Gaussian template and achieve ROI extraction and finallyuse the classifier based on genetic model artificial neuralnetwork (ANN) and SVM for comparative analysis of thedetection effectiveness of pulmonary nodules Cascio et al[16] use regional growthmodel andmorphological operationto extract the ROI firstly then reconstruct B-spline surfacebased on 3D spring model in order to extract the related 3Dgray features and shape features and detect the pulmonarynodules using ANN Although the above literature exploresthe methods of detecting pulmonary nodules overall theseare still two disadvantages of these methods in featurestructure design and feature set expression as follows

(1) When extracting and quantifying feature for ROI thefeature structure design is irrational reflected by thefact that the combination of global features and localfeatures and the combination of two-dimensional andthree-dimensional features are not fully considered

(2) When fusing feature data the compactness of featureexpression is a difficult problem Therefore featureredundancy is usually not eliminated Moreover thefeature-level fusion method without prior knowledgeis rarely used

Rough set theory was developed by Zdzislaw Pawlak inthe early 1980s and can be regarded as a new mathematicaltool for feature selection feature extraction and decision rulegeneration without prior knowledge Rough sets provide themechanism to find the minimal set of attributes required toclassify the training samples This minimal set of attributes iscalled reduct and contains the same knowledge as the originalset of attributes in a given information system Thereforereducts can be used to obtain different classifiers Wang et al[17] present a framework for a systematic study of the roughset theory Various views and interpretations of the theoryand different approaches to study the theory are discussedThe relationships between the rough sets and other theoriessuch as fuzzy sets evidence theory granular computing for-mal concept analysis and knowledge spaces are examinedCost of disease prediction and diagnosis can be reduced byapplying machine learning and data mining methods Dis-ease prediction and decision-making play a significant role in

medical diagnosis Udhaya Kumar and Hannah Inbarani [18]put forward a novel neighborhood rough set classificationapproach to deal with medical datasets Experimental resultof the proposed classification algorithm is compared withother existing approaches such as rough set 119870th-nearestneighbor support vector machine BP NN and multilayerperceptron to conclude that the proposed approach is acheaperway for disease prediction anddecision-making Fea-ture Selection (FS) is a solution that involves finding a subsetof prominent features to improve predictive accuracy andto remove the redundant features Thus the learning modelreceives a concise structure without forfeiting the predictiveaccuracy built by using only the selected prominent featuresTherefore nowadays FS is an essential part of knowledge dis-covery Inbarani et al [19] proposed new supervised featureselection methods based on hybridization of Particle SwarmOptimization (PSO) PSO based Relative Reduct (PSO-RR)and PSO based Quick Reduct (PSO-QR) presented for thediseases diagnosis in order to seek to investigate the utilityof a computer-aided diagnosis in the task of differentiatingmalignant nodules frombenign nodules based on single thin-section CT image data In Shah et al [20] CT images of soli-tary pulmonary nodules were contouredmanually on a singlerepresentative slice by a thoracic radiologist Two separatecontours were created for each nodule one including only thesolid portion of the nodule and one including any ground-glass components For each contour 75 features were calcu-lated that measured the attenuation shape and texture of thenoduleThese features were then input into a feature selectionstep and four different classifiers to determine if the diagnosiscould be predicted from the feature vector Hassanien [21]discuss a hybrid scheme that combines the advantages offuzzy sets and rough sets in conjunction with statisticalfeature extraction techniques An application of breast cancerimaging has been chosen and hybridization scheme havebeen applied to see their ability and accuracy to classify thebreast cancer images into two outcomes cancer or noncancer

Based on the above reasons a pulmonary nodule detec-tion model based on rough set (RS) feature-level fusion andSVM is proposed in this paper To overcome the first afore-mentioned disadvantage the shape feature intensity featureand texture feature are extracted For shape feature three new3-dimensional features namely External Spherical Volume(ESV) Surface-Center Distance Standard Deviation (SCD-STD) and External Rectangle Cross Line Distance (ERCLD)are proposed For intensity feature three new 3-dimensionalfeatures namely intensity gradient (from inside to outside)Laplace Divergence Mean (LDM) and Laplace DivergenceDistance (LDD) are proposed Regarding feature descriptiontwo-dimensional texture feature three-dimensional shapefeature and intensity feature are used for quantificationWithregard to the second aforementioned disadvantage rough setfeature-level fusion is adopted since it can fully retain theproperties of the features without prior knowledge Finallya grid optimizationmodel is employed to optimize the kernelfunction of support vector machine (SVM) which is used toconduct the recognition and detection of pulmonary nodulesIn order to verify the validity and stability advantages of



2 Related Theory




(1) Area

119878 =

119873

sum

119909=1

119872

sum

119910=1

119891 (119909 119910) (1)


(2) Perimeter

119862 =

119872

sum

119894=1

119873

sum

119895=1

119901 (119894 119895) (2)


(3) Circularity

1198770=1198622

4120587119878 (3)


0lt 1 and 119877

0



(4) Rectangularity

119877 =119878

(119867 lowast119882) (4)


(5) Elongation

119864 =min (119867119882)max (119867119882)

(5)


(6) Euler Number

119864 = 119862 minus 119867 (6)







Volume (VS (119860119894)) =

4


dim2)

3

1198641(119860119894) =

Volume (119860119894)

Volume (VS (119860119894))

(7)




2(119860119894) is



1198642(119860119894) = std(

1003817100381710038171003817119862 (119878119894) minus 119862cen (119860 119894)1003817100381710038171003817

mean (1003817100381710038171003817119862 (119878119894) minus 119862cen (119860 119894)1003817100381710038171003817)) (8)



Y

Z

X


Ai

O

Dim

(Ai )

VS(Ai)

(b) ESV

C(Si)

Ccen(Ai)

(c) SCDSTD

Ccen(Ai)

Ccen(Li)

(d) ERCLD




1198643 (119860) =


mean (119862cen (dim (119871 119894))) (9)




119872119901119902=

119872

sum

119909=1

119873

sum

119910=1

119909119901119910119902119891 (119909 119910) 119901 119902 = 1 2 3 (10)


119909 =11989810

11989800

119910 =11989801

11989800

120583119901119902=

119872

sum

119909=1

119873

sum

119910=1

(119909 minus 119909)119901(119910 minus 119910)

119902119891 (119909 119910)

119901 119902 = 1 2 3

(11)



120578119901119902=

120583119901119902

120583120574

00

120574 = ((119901 + 119902)

2) + 1 (12)


1198621= 12057820+ 12057802

1198622= (12057820minus 12057802)2+ 41205782

11

1198623= (12057830minus 312057812)2+ (312057821minus 12057803)2

1198624= (12057830+ 12057812)2+ (12057803+ 12057821)2

1198625= (12057830minus 312057812) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

21minus 12057803)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

1198626= (12057820minus 12057802) [(12057830+ 12057812)2minus (12057803+ 12057821)2]

+ 412057811(12057830+ 12057812) (12057821+ 12057803)

1198627= (312057821minus 12057803) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

12minus 12057830)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

(13)


119870

119862119870=10038161003816100381610038161003816log10

100381610038161003816100381610038161198621015840

119870

10038161003816100381610038161003816

10038161003816100381610038161003816119870 = 1 2 7 (14)









1198644(119860119894) =

119870

119899 (15)


1198645(119860119894) = mean (119860

119894times La) (16)


1198646(119860119894) = max (119860


119894times La) (17)




119886with





Shapefeatures (fs)


18



8



16




IND (119875) = (119909 119910) isin 119880 times 119880 | 119891 (119909 119886)

= 119891 (119910 119886) for any 119886 isin 119875(18)


119875 that is

[119909]119875 = 119910 isin 119880 | (119909 119910) isin IND (119875) (19)


119875| 119909 isin 119880



119875(119883) = 119909 isin 119880 | [119909]

119875sube 119883



H1

H2

H

r

r



119875(119876) =

⋃119883isin119880119876


119875(119883)



119862(119876) = POS

119862119886(119876) 119886 is said to be 119863-


119875(119876) =





1 119909

119899 and data

marker 119866 = 1199101 119910

119899 where 119909




119908119879119909 + 119887 = 0 (20)


119908 and 119887

min119908119887

119903 (119908) =1

2119908

119910119894 ((119908 sdot 119909) + 119887) ge 1 119894 = 1 119899

(21)



min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 120572

119894ge 0 119894 = 1 2 119899

(22)


min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 0 le 120572

119894le 119862

(23)



119891 (119909) = sgn(119899

sum

119894=1

119886lowast

119894119910119894(119909119894sdot 119909)) + 119887

lowast (24)



119870(119909119894 119909) = (120593 (119909

119894) 120593 (119909)) (25)


119891 (119909) = sgn(119899

sum

119894=1

119886119894119910119894119870(119909119894sdot 119909)) + 119887 (26)











ROI

Pulmonarynodule

Otherareas


Gridoptimization

methodSVM

Featureextraction

Featurereduction


Feature set

RS1 RS2 RS3 RS4 RS5

Rough set

Image segmentation

Shapefeatures








Intensityfeatures

Texturefeatures

CT

SPN VAPN PAPN

Normalized

Discretization














02

04

06

08

1

12

14

(a) Boxplot of ESV

+

+

+

01

015

02

025

03

035

04

045

05

055

06






21


ft1517


ft14 ft15 ft1619


ft14 ft15 ft1619


ft12 ft15 ft1620











Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051


1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930






Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672




9293949596979899

100101

5020 4030 3535 3040 2050








Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of


















2 Related Theory




(1) Area

119878 =

119873

sum

119909=1

119872

sum

119910=1

119891 (119909 119910) (1)


(2) Perimeter

119862 =

119872

sum

119894=1

119873

sum

119895=1

119901 (119894 119895) (2)


(3) Circularity

1198770=1198622

4120587119878 (3)


0lt 1 and 119877

0



(4) Rectangularity

119877 =119878

(119867 lowast119882) (4)


(5) Elongation

119864 =min (119867119882)max (119867119882)

(5)


(6) Euler Number

119864 = 119862 minus 119867 (6)







Volume (VS (119860119894)) =

4


dim2)

3

1198641(119860119894) =

Volume (119860119894)

Volume (VS (119860119894))

(7)




2(119860119894) is



1198642(119860119894) = std(

1003817100381710038171003817119862 (119878119894) minus 119862cen (119860 119894)1003817100381710038171003817

mean (1003817100381710038171003817119862 (119878119894) minus 119862cen (119860 119894)1003817100381710038171003817)) (8)



Y

Z

X


Ai

O

Dim

(Ai )

VS(Ai)

(b) ESV

C(Si)

Ccen(Ai)

(c) SCDSTD

Ccen(Ai)

Ccen(Li)

(d) ERCLD




1198643 (119860) =


mean (119862cen (dim (119871 119894))) (9)




119872119901119902=

119872

sum

119909=1

119873

sum

119910=1

119909119901119910119902119891 (119909 119910) 119901 119902 = 1 2 3 (10)


119909 =11989810

11989800

119910 =11989801

11989800

120583119901119902=

119872

sum

119909=1

119873

sum

119910=1

(119909 minus 119909)119901(119910 minus 119910)

119902119891 (119909 119910)

119901 119902 = 1 2 3

(11)



120578119901119902=

120583119901119902

120583120574

00

120574 = ((119901 + 119902)

2) + 1 (12)


1198621= 12057820+ 12057802

1198622= (12057820minus 12057802)2+ 41205782

11

1198623= (12057830minus 312057812)2+ (312057821minus 12057803)2

1198624= (12057830+ 12057812)2+ (12057803+ 12057821)2

1198625= (12057830minus 312057812) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

21minus 12057803)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

1198626= (12057820minus 12057802) [(12057830+ 12057812)2minus (12057803+ 12057821)2]

+ 412057811(12057830+ 12057812) (12057821+ 12057803)

1198627= (312057821minus 12057803) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

12minus 12057830)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

(13)


119870

119862119870=10038161003816100381610038161003816log10

100381610038161003816100381610038161198621015840

119870

10038161003816100381610038161003816

10038161003816100381610038161003816119870 = 1 2 7 (14)









1198644(119860119894) =

119870

119899 (15)


1198645(119860119894) = mean (119860

119894times La) (16)


1198646(119860119894) = max (119860


119894times La) (17)




119886with





Shapefeatures (fs)


18



8



16




IND (119875) = (119909 119910) isin 119880 times 119880 | 119891 (119909 119886)

= 119891 (119910 119886) for any 119886 isin 119875(18)


119875 that is

[119909]119875 = 119910 isin 119880 | (119909 119910) isin IND (119875) (19)


119875| 119909 isin 119880



119875(119883) = 119909 isin 119880 | [119909]

119875sube 119883



H1

H2

H

r

r



119875(119876) =

⋃119883isin119880119876


119875(119883)



119862(119876) = POS

119862119886(119876) 119886 is said to be 119863-


119875(119876) =





1 119909

119899 and data

marker 119866 = 1199101 119910

119899 where 119909




119908119879119909 + 119887 = 0 (20)


119908 and 119887

min119908119887

119903 (119908) =1

2119908

119910119894 ((119908 sdot 119909) + 119887) ge 1 119894 = 1 119899

(21)



min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 120572

119894ge 0 119894 = 1 2 119899

(22)


min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 0 le 120572

119894le 119862

(23)



119891 (119909) = sgn(119899

sum

119894=1

119886lowast

119894119910119894(119909119894sdot 119909)) + 119887

lowast (24)



119870(119909119894 119909) = (120593 (119909

119894) 120593 (119909)) (25)


119891 (119909) = sgn(119899

sum

119894=1

119886119894119910119894119870(119909119894sdot 119909)) + 119887 (26)











ROI

Pulmonarynodule

Otherareas


Gridoptimization

methodSVM

Featureextraction

Featurereduction


Feature set

RS1 RS2 RS3 RS4 RS5

Rough set

Image segmentation

Shapefeatures








Intensityfeatures

Texturefeatures

CT

SPN VAPN PAPN

Normalized

Discretization














02

04

06

08

1

12

14

(a) Boxplot of ESV

+

+

+

01

015

02

025

03

035

04

045

05

055

06






21


ft1517


ft14 ft15 ft1619


ft14 ft15 ft1619


ft12 ft15 ft1620











Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051


1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930






Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672




9293949596979899

100101

5020 4030 3535 3040 2050








Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of

















Y

Z

X


Ai

O

Dim

(Ai )

VS(Ai)

(b) ESV

C(Si)

Ccen(Ai)

(c) SCDSTD

Ccen(Ai)

Ccen(Li)

(d) ERCLD




1198643 (119860) =


mean (119862cen (dim (119871 119894))) (9)




119872119901119902=

119872

sum

119909=1

119873

sum

119910=1

119909119901119910119902119891 (119909 119910) 119901 119902 = 1 2 3 (10)


119909 =11989810

11989800

119910 =11989801

11989800

120583119901119902=

119872

sum

119909=1

119873

sum

119910=1

(119909 minus 119909)119901(119910 minus 119910)

119902119891 (119909 119910)

119901 119902 = 1 2 3

(11)



120578119901119902=

120583119901119902

120583120574

00

120574 = ((119901 + 119902)

2) + 1 (12)


1198621= 12057820+ 12057802

1198622= (12057820minus 12057802)2+ 41205782

11

1198623= (12057830minus 312057812)2+ (312057821minus 12057803)2

1198624= (12057830+ 12057812)2+ (12057803+ 12057821)2

1198625= (12057830minus 312057812) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

21minus 12057803)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

1198626= (12057820minus 12057802) [(12057830+ 12057812)2minus (12057803+ 12057821)2]

+ 412057811(12057830+ 12057812) (12057821+ 12057803)

1198627= (312057821minus 12057803) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

12minus 12057830)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

(13)


119870

119862119870=10038161003816100381610038161003816log10

100381610038161003816100381610038161198621015840

119870

10038161003816100381610038161003816

10038161003816100381610038161003816119870 = 1 2 7 (14)









1198644(119860119894) =

119870

119899 (15)


1198645(119860119894) = mean (119860

119894times La) (16)


1198646(119860119894) = max (119860


119894times La) (17)




119886with





Shapefeatures (fs)


18



8



16




IND (119875) = (119909 119910) isin 119880 times 119880 | 119891 (119909 119886)

= 119891 (119910 119886) for any 119886 isin 119875(18)


119875 that is

[119909]119875 = 119910 isin 119880 | (119909 119910) isin IND (119875) (19)


119875| 119909 isin 119880



119875(119883) = 119909 isin 119880 | [119909]

119875sube 119883



H1

H2

H

r

r



119875(119876) =

⋃119883isin119880119876


119875(119883)



119862(119876) = POS

119862119886(119876) 119886 is said to be 119863-


119875(119876) =





1 119909

119899 and data

marker 119866 = 1199101 119910

119899 where 119909




119908119879119909 + 119887 = 0 (20)


119908 and 119887

min119908119887

119903 (119908) =1

2119908

119910119894 ((119908 sdot 119909) + 119887) ge 1 119894 = 1 119899

(21)



min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 120572

119894ge 0 119894 = 1 2 119899

(22)


min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 0 le 120572

119894le 119862

(23)



119891 (119909) = sgn(119899

sum

119894=1

119886lowast

119894119910119894(119909119894sdot 119909)) + 119887

lowast (24)



119870(119909119894 119909) = (120593 (119909

119894) 120593 (119909)) (25)


119891 (119909) = sgn(119899

sum

119894=1

119886119894119910119894119870(119909119894sdot 119909)) + 119887 (26)











ROI

Pulmonarynodule

Otherareas


Gridoptimization

methodSVM

Featureextraction

Featurereduction


Feature set

RS1 RS2 RS3 RS4 RS5

Rough set

Image segmentation

Shapefeatures








Intensityfeatures

Texturefeatures

CT

SPN VAPN PAPN

Normalized

Discretization














02

04

06

08

1

12

14

(a) Boxplot of ESV

+

+

+

01

015

02

025

03

035

04

045

05

055

06






21


ft1517


ft14 ft15 ft1619


ft14 ft15 ft1619


ft12 ft15 ft1620











Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051


1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930






Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672




9293949596979899

100101

5020 4030 3535 3040 2050








Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of


















120578119901119902=

120583119901119902

120583120574

00

120574 = ((119901 + 119902)

2) + 1 (12)


1198621= 12057820+ 12057802

1198622= (12057820minus 12057802)2+ 41205782

11

1198623= (12057830minus 312057812)2+ (312057821minus 12057803)2

1198624= (12057830+ 12057812)2+ (12057803+ 12057821)2

1198625= (12057830minus 312057812) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

21minus 12057803)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

1198626= (12057820minus 12057802) [(12057830+ 12057812)2minus (12057803+ 12057821)2]

+ 412057811(12057830+ 12057812) (12057821+ 12057803)

1198627= (312057821minus 12057803) (12057830+ 12057812)

sdot [(12057830+ 12057812)2minus 3 (120578

03+ 12057821)2] + (3120578

12minus 12057830)

sdot (12057821+ 12057803) [3 (120578

30+ 12057812)2minus (12057803+ 12057821)2]

(13)


119870

119862119870=10038161003816100381610038161003816log10

100381610038161003816100381610038161198621015840

119870

10038161003816100381610038161003816

10038161003816100381610038161003816119870 = 1 2 7 (14)









1198644(119860119894) =

119870

119899 (15)


1198645(119860119894) = mean (119860

119894times La) (16)


1198646(119860119894) = max (119860


119894times La) (17)




119886with





Shapefeatures (fs)


18



8



16




IND (119875) = (119909 119910) isin 119880 times 119880 | 119891 (119909 119886)

= 119891 (119910 119886) for any 119886 isin 119875(18)


119875 that is

[119909]119875 = 119910 isin 119880 | (119909 119910) isin IND (119875) (19)


119875| 119909 isin 119880



119875(119883) = 119909 isin 119880 | [119909]

119875sube 119883



H1

H2

H

r

r



119875(119876) =

⋃119883isin119880119876


119875(119883)



119862(119876) = POS

119862119886(119876) 119886 is said to be 119863-


119875(119876) =





1 119909

119899 and data

marker 119866 = 1199101 119910

119899 where 119909




119908119879119909 + 119887 = 0 (20)


119908 and 119887

min119908119887

119903 (119908) =1

2119908

119910119894 ((119908 sdot 119909) + 119887) ge 1 119894 = 1 119899

(21)



min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 120572

119894ge 0 119894 = 1 2 119899

(22)


min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 0 le 120572

119894le 119862

(23)



119891 (119909) = sgn(119899

sum

119894=1

119886lowast

119894119910119894(119909119894sdot 119909)) + 119887

lowast (24)



119870(119909119894 119909) = (120593 (119909

119894) 120593 (119909)) (25)


119891 (119909) = sgn(119899

sum

119894=1

119886119894119910119894119870(119909119894sdot 119909)) + 119887 (26)











ROI

Pulmonarynodule

Otherareas


Gridoptimization

methodSVM

Featureextraction

Featurereduction


Feature set

RS1 RS2 RS3 RS4 RS5

Rough set

Image segmentation

Shapefeatures








Intensityfeatures

Texturefeatures

CT

SPN VAPN PAPN

Normalized

Discretization














02

04

06

08

1

12

14

(a) Boxplot of ESV

+

+

+

01

015

02

025

03

035

04

045

05

055

06






21


ft1517


ft14 ft15 ft1619


ft14 ft15 ft1619


ft12 ft15 ft1620











Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051


1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930






Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672




9293949596979899

100101

5020 4030 3535 3040 2050








Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of



















Shapefeatures (fs)


18



8



16




IND (119875) = (119909 119910) isin 119880 times 119880 | 119891 (119909 119886)

= 119891 (119910 119886) for any 119886 isin 119875(18)


119875 that is

[119909]119875 = 119910 isin 119880 | (119909 119910) isin IND (119875) (19)


119875| 119909 isin 119880



119875(119883) = 119909 isin 119880 | [119909]

119875sube 119883



H1

H2

H

r

r



119875(119876) =

⋃119883isin119880119876


119875(119883)



119862(119876) = POS

119862119886(119876) 119886 is said to be 119863-


119875(119876) =





1 119909

119899 and data

marker 119866 = 1199101 119910

119899 where 119909




119908119879119909 + 119887 = 0 (20)


119908 and 119887

min119908119887

119903 (119908) =1

2119908

119910119894 ((119908 sdot 119909) + 119887) ge 1 119894 = 1 119899

(21)



min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 120572

119894ge 0 119894 = 1 2 119899

(22)


min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 0 le 120572

119894le 119862

(23)



119891 (119909) = sgn(119899

sum

119894=1

119886lowast

119894119910119894(119909119894sdot 119909)) + 119887

lowast (24)



119870(119909119894 119909) = (120593 (119909

119894) 120593 (119909)) (25)


119891 (119909) = sgn(119899

sum

119894=1

119886119894119910119894119870(119909119894sdot 119909)) + 119887 (26)











ROI

Pulmonarynodule

Otherareas


Gridoptimization

methodSVM

Featureextraction

Featurereduction


Feature set

RS1 RS2 RS3 RS4 RS5

Rough set

Image segmentation

Shapefeatures








Intensityfeatures

Texturefeatures

CT

SPN VAPN PAPN

Normalized

Discretization














02

04

06

08

1

12

14

(a) Boxplot of ESV

+

+

+

01

015

02

025

03

035

04

045

05

055

06






21


ft1517


ft14 ft15 ft1619


ft14 ft15 ft1619


ft12 ft15 ft1620











Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051


1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930






Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672




9293949596979899

100101

5020 4030 3535 3040 2050








Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of


















min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 120572

119894ge 0 119894 = 1 2 119899

(22)


min120572

1

2

119899

sum

119894=1

119899

sum

119895=1

119910119894119910119895120572119894120572119895(119909119894sdot 119909119895) minus

119899

sum

119895=1

120572119895

st119899

sum

119894=1

119910119894120572119894= 0 0 le 120572

119894le 119862

(23)



119891 (119909) = sgn(119899

sum

119894=1

119886lowast

119894119910119894(119909119894sdot 119909)) + 119887

lowast (24)



119870(119909119894 119909) = (120593 (119909

119894) 120593 (119909)) (25)


119891 (119909) = sgn(119899

sum

119894=1

119886119894119910119894119870(119909119894sdot 119909)) + 119887 (26)











ROI

Pulmonarynodule

Otherareas


Gridoptimization

methodSVM

Featureextraction

Featurereduction


Feature set

RS1 RS2 RS3 RS4 RS5

Rough set

Image segmentation

Shapefeatures








Intensityfeatures

Texturefeatures

CT

SPN VAPN PAPN

Normalized

Discretization














02

04

06

08

1

12

14

(a) Boxplot of ESV

+

+

+

01

015

02

025

03

035

04

045

05

055

06






21


ft1517


ft14 ft15 ft1619


ft14 ft15 ft1619


ft12 ft15 ft1620











Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051


1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930






Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672




9293949596979899

100101

5020 4030 3535 3040 2050








Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of

















ROI

Pulmonarynodule

Otherareas


Gridoptimization

methodSVM

Featureextraction

Featurereduction


Feature set

RS1 RS2 RS3 RS4 RS5

Rough set

Image segmentation

Shapefeatures








Intensityfeatures

Texturefeatures

CT

SPN VAPN PAPN

Normalized

Discretization














02

04

06

08

1

12

14

(a) Boxplot of ESV

+

+

+

01

015

02

025

03

035

04

045

05

055

06






21


ft1517


ft14 ft15 ft1619


ft14 ft15 ft1619


ft12 ft15 ft1620











Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051


1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930






Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672




9293949596979899

100101

5020 4030 3535 3040 2050








Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of
























02

04

06

08

1

12

14

(a) Boxplot of ESV

+

+

+

01

015

02

025

03

035

04

045

05

055

06






21


ft1517


ft14 ft15 ft1619


ft14 ft15 ft1619


ft12 ft15 ft1620











Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051


1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930






Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672




9293949596979899

100101

5020 4030 3535 3040 2050








Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of



















Before reduction

1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300

Mean 9464 9286 9643 06051


1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930






Processingtime (s)

RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672




9293949596979899

100101

5020 4030 3535 3040 2050








Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of



















Before fusion

5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154

After fusion (Rs1)

5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583



()

Averagesensitivity

()

Averagespecificity

()

Runningtime (s)

RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620


Serialnumber

Accuracy()

Sensitivity()

Specificity()


1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490




Accuracy() FPs

Santos et al[11] L 260 884 117


L 48 9065 0138



Rough set


PCA

020406080

100120


5 Conclusions




Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of


















Competing Interests


Acknowledgments


References






























of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of





















of






Disease Markers



OncologyJournal of





PPAR Research



Journal of

ObesityJournal of
















Research Article Pulmonary Nodule Detection Model...

Documents

Transcript of Research Article Pulmonary Nodule Detection Model...