Applied Soft Computing 15 (2014) 1–20
Semi-supervised change detection using modified self-organizing feature map neural network

Susmita Ghosh a, Moumita Roy a, Ashish Ghosh b,∗

a Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
b Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India
Article info

Article history:
Received 11 May 2012
Received in revised form 6 August 2013
Accepted 24 September 2013
Available online 18 October 2013

Keywords:
Semi-supervised learning
Change detection
Fuzzy set
Self-organizing feature map

Abstract

In the present article, semi-supervised learning is integrated with an unsupervised context-sensitive change detection technique based on the modified self-organizing feature map (MSOFM) network. In the proposed methodology, training of the MSOFM network is initially performed using only a few labeled patterns. Thereafter, the membership values, in both the classes, for each unlabeled pattern are determined using the concept of fuzzy set theory. The soft class label for each of the unlabeled patterns is then estimated using the membership values of its K nearest neighbors. Here, training of the network using the unlabeled patterns along with a few labeled patterns is carried out iteratively. A heuristic method has been suggested to select some patterns from the unlabeled ones for training. To check the effectiveness of the proposed methodology, experiments are conducted on three multi-temporal and multi-spectral data sets. Performance of the proposed work is compared with that of two unsupervised techniques, a supervised technique and two semi-supervised techniques. Results are also statistically validated using the paired t-test. The proposed method produced promising results.
1. Introduction
Change detection is a process of detecting temporal effects of multi-temporal images [1–3]. This process is used for finding out changes in a land cover over time by analyzing remotely sensed images of a geographical area captured at different time instants. The changes can occur due to natural hazards (e.g., disasters, earthquakes), urban growth, or deforestation. Change detection is one of the most challenging tasks in the field of pattern recognition and machine learning [4]. There are various applications of change detection like land use change analysis [5,6], monitoring urban growth [7,8], burned area identification [9], etc.
Change detection can be viewed as an image segmentation problem, where two groups of pixels are to be formed, one for the changed class and the other for the unchanged one. The process of change detection can be broadly classified into two categories: supervised [10–12] and unsupervised [13–21]. Supervised techniques have certain advantages: they can explicitly recognize the kinds of changes that occurred and are robust to the different atmospheric and light conditions of the acquisition dates. Various methodologies exist in the literature to carry out supervised change detection, e.g., the post classification method [1,11,22], the direct multi-date

∗ Corresponding author. Tel.: +91 33 2575 3110/3100; fax: +91 33 2578 3357.
E-mail addresses: [email protected] (S. Ghosh), [email protected] (M. Roy), [email protected] (A. Ghosh).

1568-4946/$ – see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.asoc.2013.09.010
classification method [1], and kernel based methods [12]. Despite these advantages, the applicability of supervised methods to change detection is poor due to the mandatory requirement of a sufficient amount of ground truth information, which is expensive, hard and monotonous to collect. On the contrary, in the unsupervised approach [13–20], there is no need for additional information like ground truth. Due to the scarcity of labeled patterns, unsupervised techniques often seem to be compulsory for change detection. Generally, three consecutive steps are followed for unsupervised change detection: image preprocessing, image comparison and image analysis [1]. Images of the same geographical area captured at different time instants constitute the input of the change detection process. In the preprocessing step, these images are made compatible by operations like radiometric and geometric corrections, co-registration and noise reduction [1]. After preprocessing, image comparison is carried out, pixel by pixel, to generate a difference image (DI) which is used for change detection. There are various methods for generating the DI, like univariate image differencing, change vector analysis (CVA) and image ratioing [1]. In the present work, the CVA technique [1] is used for creation of the DI. The unsupervised change detection process can be of two types: context insensitive [1,15] and context sensitive [13,14,16–19]. Histogram thresholding [1,15] is the simplest unsupervised context insensitive change detection method; its main disadvantage is that it does not consider the spatial correlation between neighboring pixels in the decision process. To overcome this difficulty, context sensitive methods using Markov random fields (MRF) [16,17] have been developed. These techniques also suffer from
certain difficulties, like the requirement of selecting a proper model for the statistical distribution of the changed and unchanged class pixels. On the contrary, change detection methodologies based on neural networks are free from such limitations. Work along this direction has already been carried out employing neural networks for change detection, using both supervised and unsupervised learning [13,14,18,20].
In change detection, a situation may occur where the categorical information of a few labeled patterns can be collected easily by the experts. If the number of these labeled patterns is small, then this information may not be sufficient for developing supervised methods. In such a scenario, the knowledge of the labeled patterns, though not much, may remain completely unutilized if an unsupervised approach is considered. Under this circumstance, a semi-supervised approach [23–25] can be opted for instead of unsupervised or supervised ones. Semi-supervision uses a small amount of labeled patterns with abundant unlabeled ones for learning, and integrates the merits of both supervised and unsupervised strategies to make full utilization of the collected patterns [23,24]. Semi-supervision has been used successfully for improving the performance of clustering and classification [26–34] when an insufficient amount of labeled data is present. Semi-supervised learning using neural networks has been explored in various domains [35–38]. Though research has been carried out using the multilayer perceptron (MLP) for change detection [39] in a semi-supervised classification framework, there is no such application of a neural network using a semi-supervised clustering approach for the change detection problem. This motivated us to pursue the present study using neural networks to improve the performance of the change detection process.
In one of our earlier works, the self-organizing feature map (SOFM) network [40,41] was modified (named the modified self-organizing feature map (MSOFM) [42]) and was used for unsupervised context sensitive change detection [14]. In the proposed methodology, semi-supervised learning is incorporated within the said MSOFM framework [14]. The network architecture considered is similar to the one used in [14]. The network consists of two layers: input and output. For each feature of the input pattern, there is a neuron in the input layer. The output layer is two dimensional, and each (i, j)th neuron in the output layer represents the (i, j)th pixel position in the difference image (DI). Here, we have a few labeled patterns; so, some neurons in the output layer correspond to these labeled patterns (labeled neurons), while others correspond to unlabeled patterns (unlabeled neurons). There is a weighted connection between each neuron in the output layer and all the neurons in the input layer. In the present work, the connection weights are initialized differently for labeled and unlabeled neurons. The weight vectors for unlabeled neurons are initialized randomly in [0, 1]. The weight vectors for labeled neurons are initialized with the normalized feature values of the corresponding labeled patterns (to introduce the effect of supervision). To normalize the feature values of the input patterns to [0, 1], a mapping function (Eq. (2)) is used.

At the onset, the network is trained by the labeled patterns only. Then, the unlabeled patterns are passed through the network and the membership values of the unlabeled patterns for the changed and the unchanged classes are calculated (from the trained network) depending on some pre-fixed threshold value. If the similarity measure between an unlabeled pattern and the weight vector of the corresponding neuron in the output layer is greater than the said threshold, then the membership value of that unlabeled pattern in the changed class will be more than that in the unchanged class, and vice versa. A method is also suggested for computing the membership values of the unlabeled patterns for both the classes. In [14], a correlation based and an energy based technique were used for selecting suitable thresholds. In the proposed methodology, the threshold selection process is the same as the one used in [14]. Thereafter, the soft class label (or, target value) of each
of the unlabeled patterns is updated using the membership values of the K nearest neighbors [39] of the corresponding pattern. After each training step, the unlabeled patterns which are more likely to belong to the changed class are selected, and the MSOFM network is retrained by considering the labeled patterns along with these selected unlabeled patterns. Thus, learning of the MSOFM network and modification of the soft class labels of the unlabeled patterns are continued iteratively until a given convergence criterion is satisfied or the number of training steps exceeds a certain value.
To assess the effectiveness of the proposed method, experiments are carried out on three multi-temporal and multi-spectral data sets of the Mexico area, the Island of Sardinia and the southern part of the Peloponnesian Peninsula, Greece, and the results are compared with those of the existing unsupervised method based on MSOFM [14], a robust fuzzy clustering technique [43], a supervised method based on MLP [41], a semi-supervised technique based on MLP [39] and the constrained k-means algorithm [44] (a semi-supervised clustering algorithm).
The rest of the article is organized into five sections. Section 2 describes the methodology of the proposed semi-supervised change detection technique. A description of the data sets used to carry out the investigation is provided in Section 3. In Section 4, implementation details and experimental results are discussed. The conclusion is drawn in Section 5. The performance measures used for the investigation are concisely explained in Appendix A.
2. Proposed methodology for semi-supervised change detection
In some of our earlier works, we have developed different change detection techniques [13,14,19,43] in an unsupervised framework. In [13,14], context-sensitive change detection techniques were proposed using unsupervised learning based neural networks, i.e., a Hopfield-type neural network and a modified self-organizing map neural network. Various fuzzy clustering techniques (i.e., fuzzy c-means and Gustafson–Kessel clustering) are used for unsupervised change detection in [19]. These fuzzy clustering based change detection techniques are further improved by incorporating local information in [43]. We have also developed a semi-supervised change detection technique in [39] by modifying the learning of a supervised neural network (i.e., the multilayer perceptron) in such a way that it can utilize the abundant unlabeled patterns along with a few labeled patterns during learning. As already mentioned, no research work has been carried out in this direction using an unsupervised neural network when a few labeled patterns are available. In the present work, the modified self-organizing map neural network is integrated with the concept of semi-supervised learning for better change detection. A detailed description of the proposed change detection technique is presented in the subsequent sections.
2.1. Generation of input pattern
The difference image D = {lmn, 1 ≤ m ≤ p, 1 ≤ n ≤ q} is produced by the CVA technique [1] from two co-registered and radiometrically corrected γ-spectral band images Y1 and Y2, each of size p × q, of the same geographical area captured at different times T1 and T2. Here, the gray value of the difference image D at spatial position (m, n), denoted as lmn, is calculated as

$$l_{mn} = \mathrm{int}\left(\sqrt{\sum_{\alpha=1}^{\gamma}\left(l_{mn}^{\alpha}(Y_1) - l_{mn}^{\alpha}(Y_2)\right)^{2}}\right), \qquad (1)$$

where $l_{mn}^{\alpha}(Y_1)$ and $l_{mn}^{\alpha}(Y_2)$ are the gray values of the pixels at the spatial position (m, n) in the αth band of the images Y1 and Y2, respectively.
Fig. 1. Kohonen's model of self-organizing feature map with two-dimensional output layer.
From the difference image D, the input pattern for a particular pixel position is generated by considering the gray value of the said pixel as well as those of its neighboring ones, to exploit (spatial) contextual information from the neighbors. In the present methodology, the 2nd order neighborhood system is used. Here, each input pattern consists of nine features: one gray value of its own and eight gray values from its neighbors.

The y-dimensional input pattern of the (m, n)th pixel position of the DI is denoted by $\vec{X}_{mn} = [x_{mn,1}, x_{mn,2}, \ldots, x_{mn,y}]$. Here, a mapping algorithm is used to normalize the feature values of the input pattern to [0, 1]. The ith feature value (i = 1, 2, . . ., y) of the y-dimensional input pattern $\vec{X}_{mn}$ is normalized as

$$x_{mn,i} = \frac{x_{mn,i} - c_{\min}}{c_{\max} - c_{\min}}, \qquad (2)$$

where $c_{\max}$ and $c_{\min}$, respectively, are the maximum and the minimum gray values of the DI.
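The pattern generation and the normalization of Eq. (2) can be sketched as follows (names are ours; the paper does not state how border pixels are handled, so edge replication below is an assumption):

```python
import numpy as np

def input_patterns(di):
    """Build the 9-feature input pattern for every pixel of the
    difference image DI (its own gray value plus the eight values of
    its 2nd order neighborhood), normalized to [0, 1] by Eq. (2).
    Border pixels are handled by edge replication (our assumption).
    """
    c_min, c_max = di.min(), di.max()
    norm = (di - c_min) / float(c_max - c_min)
    padded = np.pad(norm, 1, mode="edge")
    p, q = di.shape
    feats = np.empty((p, q, 9))
    k = 0
    for dm in (-1, 0, 1):
        for dn in (-1, 0, 1):
            # Feature k is the normalized gray value at offset (dm, dn).
            feats[:, :, k] = padded[1 + dm:1 + dm + p, 1 + dn:1 + dn + q]
            k += 1
    return feats
```

Feature index 4 (offset (0, 0)) is the pixel's own normalized gray value.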
2.2. Modified self-organizing feature map (MSOFM) [42]
The self-organizing feature map (SOFM) network [40,41] uses the concept of competitive learning. It has two layers: input and output. The output layer is two dimensional (see Fig. 1). There is a weighted connection between each neuron in the output layer and all the neurons of the input layer. The y-dimensional weight vector between the (m, n)th neuron of the output layer and all the input neurons is represented by $\vec{W}_{mn} = [w_{mn,1}, w_{mn,2}, \ldots, w_{mn,y}]$. The neurons in the output layer compete among themselves. The SOFM is trained iteratively, and it gradually generates a topological map of the input patterns. The SOFM follows three steps during learning: competition, co-operation and weight update (learning). In the competition step, the similarity measure between a given input pattern $\vec{X}_{mn}$ and the weight vectors of all the output neurons is computed. Then, the (i, j)th output neuron for which the similarity measure is maximum is selected as the winner. Let $h_{kl,ij}(itr)$ be the topological neighborhood function between the winning neuron (i, j) and its topological neighbor (k, l) at iteration number itr; this function shrinks after each iteration. It can be of any form, e.g., Gaussian or rectangular. The weight update for the (k, l)th output neuron is performed as

$$\vec{W}_{kl}(itr + 1) = \vec{W}_{kl}(itr) + h_{kl,ij}(itr)\,\eta(itr)\,(\vec{X}_{mn} - \vec{W}_{kl}(itr)), \qquad (3)$$

where $\eta(itr)$ denotes the learning rate in the itrth iteration; it decreases as itr increases. The weight vectors of the winning neuron and its neighborhood neurons gradually move towards the input pattern under consideration.
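The three steps above, together with the update of Eq. (3), can be sketched as follows. The Gaussian neighborhood and the Euclidean-distance winner criterion are illustrative choices only (the text merely requires maximum similarity and allows other neighborhood forms):

```python
import numpy as np

def sofm_step(weights, x, eta, sigma):
    """One competitive-learning step of Kohonen's SOFM.

    weights: (rows, cols, y) weight vectors of the output grid,
             updated in place by Eq. (3).
    x:       (y,) input pattern.
    eta:     learning rate for this iteration.
    sigma:   width of the Gaussian topological neighborhood h.
    Returns the grid index (i, j) of the winning neuron.
    """
    # Compete: the winner minimizes the Euclidean distance to x
    # (equivalently, maximizes similarity).
    dist = np.linalg.norm(weights - x, axis=2)
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    # Co-operate: Gaussian neighborhood centered on the winner.
    rows, cols = np.meshgrid(np.arange(weights.shape[0]),
                             np.arange(weights.shape[1]), indexing="ij")
    h = np.exp(-((rows - i) ** 2 + (cols - j) ** 2) / (2.0 * sigma ** 2))
    # Learn: move the weights towards the input pattern (Eq. (3)).
    weights += eta * h[:, :, None] * (x - weights)
    return (i, j)
```

Shrinking `sigma` and `eta` across iterations reproduces the gradual narrowing of the neighborhood described above.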
As mentioned earlier, the modified SOFM (MSOFM) network[42] was used for unsupervised context sensitive change detection[14]. In the present work, a similar MSOFM network architecture isused. Like SOFM, in the MSOFM network [14,42] the output layer istwo dimensional and there is a representative neuron correspond-ing to each pixel position of DI. The number of neurons in the inputlayer is the same as the number of features of the input pattern.There is also a weighted connection between each neuron in theoutput layer and all the neurons in the input layer.
In [14], the input pattern for every pixel position in the DI is passed through the MSOFM network. Thereafter, the similarity measure between the given input pattern $\vec{X}_{mn}$ and the weight vector $\vec{W}_{mn}$ of the (m, n)th output neuron is computed. If the similarity is more than a pre-fixed threshold, then the concerned output neuron is the winner, and the weight update is performed for that neuron and its neighbors using Eq. (3). In the SOFM network, the same input pattern is applied to all the output neurons for selecting the winning neuron. On the other hand, in the MSOFM network, different inputs are given to different output neurons, and the selection of the winner (neuron) is done depending on a pre-defined threshold value. A correlation based and an energy based method are also suggested to select suitable thresholds.
2.3. Labeled pattern collection and weight initialization
Semi-supervised learning of the MSOFM network requires a small amount of labeled patterns. The labeled patterns can be collected in many ways. In the present technique, for experimental purposes, labeled patterns are picked up from the ground truth for both the classes in equal percentage.

After collecting the labeled patterns, the weight initializations for the labeled and the unlabeled neurons are done differently. If the class label of the (m, n)th pixel of the DI is known, then the weight vector for the (m, n)th output neuron, denoted as $\vec{W}_{mn}$, is initialized with the normalized feature values of the corresponding labeled pattern, whereas the weight vectors of the others are initialized randomly in [0, 1].
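The initialization of this section might look as follows (a sketch; the dict-based interface and the fixed RNG seed are our own choices):

```python
import numpy as np

def init_weights(shape_pq, y, labeled):
    """Initialize the MSOFM connection weights.

    shape_pq: (p, q) size of the output grid / difference image.
    y:        number of features per input pattern.
    labeled:  dict mapping (m, n) pixel positions to the normalized
              feature vector of the corresponding labeled pattern.
    Unlabeled neurons get random weights in [0, 1]; labeled neurons
    copy their pattern's features to introduce supervision.
    """
    rng = np.random.default_rng(0)
    w = rng.random((shape_pq[0], shape_pq[1], y))
    for (m, n), x in labeled.items():
        w[m, n] = x  # supervision: copy the labeled pattern's features
    return w
```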
2.4. Learning by a small amount of labeled patterns
During training, the input patterns $\vec{X}_{mn}$ are passed to the MSOFM network consecutively. Each time, the dot product d(m, n) between $\vec{X}_{mn}$ and $\vec{W}_{mn}$ is calculated as

$$d(m, n) = \vec{X}_{mn} \cdot \vec{W}_{mn} = \sum_{k=1}^{y} x_{mn,k}\, w_{mn,k}. \qquad (4)$$
At the beginning of the training phase, the connection weights of the network are updated in the following manner using the labeled patterns only. If the class label of the (m, n)th pixel position is known, then the weight vector $\vec{W}_{ij}$ of every neighboring unlabeled neuron (defined by $h_{ij,mn}(\cdot)$) of the (m, n)th output neuron is updated using Eq. (3). Through this updating, the weight vector is gradually shifted towards the given input pattern. This is done because input patterns of the same class have similar feature values, and the neighboring pixels of a given input pattern have a high probability of belonging to the same class as the input pattern. In the proposed method, the weight updating process, using labeled patterns, brings their neighboring unlabeled pixels to their respective classes, if they originally belong to the same
class. On the other hand, the weight vector of a labeled neuron is initialized with its own feature values and is not updated during the entire learning process; otherwise, during learning, the weight vector might move closer to a class to which the labeled pattern does not originally belong. Learning using the labeled patterns is continued iteratively until convergence. To check convergence, the output of the MSOFM network, O, at each iteration itr is calculated as

$$O = \sum_{d(m,n) \ge \theta} d(m, n), \qquad (5)$$
where θ is a pre-defined threshold value. The network converges for any value of θ (the proof is given in [42]). In the present work, the same threshold selection techniques (the correlation maximization criterion and the energy based criterion) are used as in the existing unsupervised change detection method [14]. Weight updating is performed until the difference between the output O in two consecutive iterations is less than δ, where δ is a small positive quantity. The value of θ lies within [0, 1]. The components of the weight vector $\vec{W}_{mn}$ are normalized in the following way so that the dot product d(m, n) lies in [0, 1]:

$$w_{mn,k} = \frac{w_{mn,k}}{\sum_{k=1}^{y} w_{mn,k}}. \qquad (6)$$
After each epoch, the learning rate η(itr) and the size of the topological neighborhood h(itr) are decreased.
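Eqs. (5) and (6) translate directly into code; a sketch (function names ours):

```python
import numpy as np

def network_output(d, theta):
    """Eq. (5): sum of the dot products d(m, n) that reach theta."""
    return d[d >= theta].sum()

def normalize_weights(w):
    """Eq. (6): scale each weight vector so its components sum to 1.
    With input features in [0, 1], this keeps the dot product
    d(m, n) of Eq. (4) in [0, 1]."""
    return w / w.sum(axis=-1, keepdims=True)
```

Convergence is then checked by comparing `network_output` across consecutive iterations against the small quantity δ.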
2.5. Computation of soft class label of the unlabeled patterns
After each training step, the unlabeled patterns are presented to the network and their soft class labels are calculated using the concept of fuzzy set theory. Let us consider that there exist two fuzzy sets: one for the changed class and the other for the unchanged one. The membership values of each unlabeled pattern in both the classes can then be determined. For each (i, j)th unlabeled pattern, d(i, j) is computed by Eq. (4); if d(i, j) ≥ θ, then the (i, j)th pattern is more likely to belong to the changed class than to the unchanged one; otherwise it is from the unchanged class. Let μ(i, j) = [μ1(i, j), μ2(i, j)] be the membership value of the (i, j)th unlabeled pattern, where μ1(i, j) and μ2(i, j) are the membership values of the (i, j)th pattern in the unchanged class and the changed class, respectively. These values can be calculated as

$$[\mu_1(i,j),\, \mu_2(i,j)] = \begin{cases} [\min(d(i,j),\, 1 - d(i,j)),\ \max(d(i,j),\, 1 - d(i,j))], & \text{if } d(i,j) \ge \theta \\ [\max(d(i,j),\, 1 - d(i,j)),\ \min(d(i,j),\, 1 - d(i,j))], & \text{otherwise.} \end{cases} \qquad (7)$$

After that, the target value (or, soft class label) of each unlabeled pattern is updated using the K-nearest neighbor technique in the same way as was done in [39]. For each unlabeled pattern, its K nearest neighbors are determined. To search for the K nearest neighbors, instead of using all the patterns, we consider only those which lie within a window around that unlabeled pattern. This is done to reduce the time required for searching. Let M be the set of K nearest neighbors of the (i, j)th unlabeled pattern. Now, the target value t(i, j) = [t1(i, j), t2(i, j)] of the (i, j)th unlabeled pattern is estimated as

$$t(i, j) = \left[\frac{\sum_{\vec{X}_{sl} \in M} \mu_1(s, l)}{K},\ \frac{\sum_{\vec{X}_{sl} \in M} \mu_2(s, l)}{K}\right]. \qquad (8)$$

It is to be noted that for the labeled patterns, both t(i, j) and μ(i, j) are either [1, 0] or [0, 1].
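Eqs. (7) and (8) for a single pattern can be sketched as follows (names ours; the windowed neighbor search is omitted for brevity):

```python
import numpy as np

def membership(d, theta):
    """Eq. (7): fuzzy membership [mu1 (unchanged), mu2 (changed)] of
    an unlabeled pattern whose similarity measure is d = d(i, j)."""
    lo, hi = min(d, 1 - d), max(d, 1 - d)
    # d >= theta: more likely changed, so mu2 gets the larger value.
    return [lo, hi] if d >= theta else [hi, lo]

def soft_label(neigh_mu):
    """Eq. (8): target value of a pattern as the per-class average of
    the membership values of its K nearest neighbors (the rows of
    neigh_mu, a K x 2 array-like)."""
    return np.asarray(neigh_mu).mean(axis=0)
```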
2.6. Iterative learning process
Initially, learning considers only the labeled patterns, and the soft class labels of the unlabeled patterns are obtained by Eqs. (7) and (8). Then, the unlabeled patterns for which the estimated target value in the changed class is greater than that in the unchanged class are selected for training the MSOFM network again. The process of training the MSOFM using the labeled patterns and the selected unlabeled patterns continues until convergence. Training of the network and re-estimation of the soft class labels of the unlabeled patterns using Eqs. (7) and (8) are continued iteratively until the network is stabilized. The stability of the network (for a DI of size p × q) is checked by computing the sum of square error, ε, after each training step as

$$\varepsilon = \sum_{i=1}^{p} \sum_{j=1}^{q} \sum_{k=1}^{2} \left(\mu_k(i, j) - t_k(i, j)\right)^2. \qquad (9)$$
Learning is continued until the difference of the error ε between two consecutive training steps is less than a small positive quantity, or the number of training steps exceeds a certain number. After convergence, hard class labels are assigned to the unlabeled patterns depending on their target values. The algorithmic representation of the proposed methodology is given in Table 1.
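The stability check of Eq. (9) and the stopping rule, sketched (function names and the tolerance value are our own choices):

```python
import numpy as np

def sum_square_error(mu, t):
    """Eq. (9): sum of squared differences between the membership
    values mu and the target values t over all p x q patterns and
    both classes (arrays of shape (p, q, 2))."""
    return ((mu - t) ** 2).sum()

def converged(err_prev, err_curr, tol=1e-4):
    """Stop when the error changes by less than a small positive
    quantity between consecutive training steps."""
    return abs(err_prev - err_curr) < tol
```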
3. Description of data sets
To evaluate the effectiveness of the proposed methodology, experiments are carried out on three multi-temporal remotely sensed images corresponding to the geographical areas of Mexico, Sardinia Island of Italy, and Greece.
3.1. Data set related to Mexico area [13,14,39]
This data set consists of two multi-spectral images of the Landsat-7 satellite captured by the Landsat Enhanced Thematic Mapper Plus (ETM+) sensor over an area of Mexico on 18th April 2000 and 20th May 2002. From the entire available Landsat scene, a section of 512 × 512 pixels has been selected as the test site. A fire destroyed a large portion of the vegetation in the considered region between the two acquisition dates. Initially, we performed some trials in order to determine the most effective spectral bands for detecting the burnt area in the considered data set. On the basis of the results of these trials, band 4 is observed to be the most effective for locating the burnt area. Fig. 2(a) and (b) shows the band 4 images corresponding to April 2000 and May 2002, respectively. The difference image (Fig. 2(c)), created from spectral band 4 using the CVA technique, is used for further analysis. For evaluation of the proposed approach, a reference map (Fig. 2(d)) was used. The reference map contains 25,599 changed and 236,545 unchanged pixels.
3.2. Data set related to Sardinia Island, Italy [13,14,39]
Two multi-spectral images were acquired by the Landsat Thematic Mapper (TM) sensor of the Landsat-5 satellite in September 1995 and July 1996. The test site of 412 × 300 pixels of a scene includes the lake Mulargia on the Island of Sardinia (Italy). The water level of the lake increased (see the lower central part of the image) between the two acquisition dates. Fig. 3(a) and (b), respectively, shows the 1995 and 1996 images of band 4. We applied the CVA technique on spectral
Table 1
Algorithmic representation of the proposed work.
Step 1: Pick up a few labeled patterns from the reference map.
Step 2: Initialize connection weights of the MSOFM network.For the output neuron corresponding to each of the labeled patterns, initialize weights using the feature values of the corresponding pattern.For the output neuron corresponding to each of the unlabeled patterns, initialize weights randomly in [0, 1].
Step 3: Update the network weight vector for the output neuron corresponding to each of the unlabeled patterns using labeled patterns only.
Step 4: Calculate the membership value (μ) of the unlabeled patterns, by passing them through the network, using the similarity measure (d) and the pre-fixed threshold value (θ):
if d ≥ θ,
μ in the changed class = max[d, (1 − d)];
μ in the unchanged class = min[d, (1 − d)];
else
μ in the changed class = min[d, (1 − d)];
μ in the unchanged class = max[d, (1 − d)].
Step 5: Assign the target value of each unlabeled pattern using the membership values of its K nearest neighbors.
Step 6: For the next training step, select those unlabeled patterns for which the estimated target value in the changed class is greater than that in the unchanged one.
Step 7: Update the network weight vector for the output neuron corresponding to each of the unlabeled patterns using the labeled patterns as well as the selected unlabeled patterns.
Step 8: Repeat Steps 4, 5, 6 and 7 until convergence. At convergence, go to Step 9.
Step 9: Assign a hard class label to each of the unlabeled patterns.

bands 1, 2, 4, and 5 of the two multi-temporal images to generate the difference image (Fig. 3(c)), as elementary experiments show that the above channels contain useful information on the changes of the water body. In the reference map (Fig. 3(d)), 7480 changed and 116,120 unchanged pixels were identified.

Fig. 2. Images of Mexico area. (a) Band 4 image acquired in April 2000, (b) band 4 image acquired in May 2002, (c) corresponding difference image generated by CVA technique, and (d) a reference map of the changed area.
Fig. 3. Images of Sardinia Island, Italy. (a) Band 4 image acquired in September 1995, (b) band 4 image acquired in July 1996, (c) difference image generated by CVA technique using bands 1, 2, 4, and 5, and (d) a reference map of the changed area.

3.3. Data set related to the Peloponnesian Peninsula, Greece

This data set is composed of two images captured by a passive multi-spectral scanner installed on a satellite (i.e., the Wide Field Sensor (WiFS) mounted on board the IRS-P3 satellite) over the southern part of the Peloponnesian Peninsula, Greece, in April 1998 and September 1998. From the entire available WiFS scene, a section of 492 × 492 pixels has been selected as the test site. Fig. 4(a) and (b) shows the respective images for the NIR band (i.e., the near-infrared spectral channel). Various wildfires destroyed a large portion of the vegetation in the said area between the two acquisition dates. Fig. 4(c) and (d), respectively, shows the corresponding difference image and the reference map, which are obtained by the same process as used in the case of the previously mentioned data sets. The reference map contains 5197 changed and 236,867 unchanged pixels.

Fig. 4. Images of the Peloponnesian Peninsula, Greece. (a) NIR band of the IRS-P3 WiFS image acquired in April 1998, (b) NIR band of the IRS-P3 WiFS image acquired in September 1998, (c) corresponding difference image generated by CVA technique, and (d) a reference map of the changed area.
Table 2
Results obtained by the unsupervised change detection technique using MSOFM on Mexico data set.
Threshold Max/Min MA FA OE Avg. OE Avg. Micro F1 Avg. Macro F1 Avg. Kappa Avg. PE
0.216 (optimal) Min 1366 1618 2984 2991.7 0.9677 0.9677 0.9355 0.0114Max 1366 1634 3000 (4.360046) (0.000044) (0.000044) (0.000089) (0.000017)
0.183 (energy based) Min 556 3004 3560 3570.9 0.9635 0.9629 0.9258 0.01362Max 556 3024 3580 (6.42573) (0.00006) (0.000063) (0.000125) (0.000025)
0.232 (correlation based) Min 1987 1210 3197 3208.5 0.9648 0.9648 0.9296 0.0122Max 1989 1227 3216 (6.086871) (0.000064) (0.000063) (0.000127) (0.000023)
Table 3
Results obtained by a supervised change detection technique using MLP on Mexico data set.
Training patterns Max/Min MA FA OE Avg. OE Avg. Micro F1 Avg. Macro F1 Avg. Kappa Avg. PE
0.1% Min 1345 1429 2774 3086.4 0.9663 0.9662 0.9275 0.0117Max 2268 1139 3407 (186.277857) (0.002266) (0.00234) (0.004676) (0.000711)
0.5% Min 1269 1406 2675 2834.3 0.9695 0.9694 0.915 0.0108Max 875 2192 3067 (95.736148) (0.000899) (0.000933) (0.001731) (0.000365)
1% Min 1208 1420 2628 2727.1 0.9703 0.9703 0.8946 0.0104Max 913 2034 2947 (87.813951) (0.000838) (0.000864) (0.001477) (0.000335)
Table 4
Results obtained by constrained k-means algorithm on Mexico data set.
Training patterns Min/Max MA FA OE Avg. OE Avg. Micro F1 Avg. Macro F1 Avg. Kappa Avg. PE
0.1% Min 3099 664 3763 3767.5 0.9581 0.9574 0.9148 0.0143Max 3106 665 3771 (2.539685) (0.000029) (0.000031) (0.000061) (0.00001)
0.5% Min 3077 661 3738 3743.6 0.9584 0.9576 0.9154 0.0142Max 3081 664 3745 (3.104835) (0.000035) (0.000036) (0.000072) (0.000012)
1% Min 3063 657 3720 3727 0.9585 0.9578 0.9157 0.0142Max 3073 662 3735 (3.464102) (0.000039) (0.00004) (0.000081) (0.000013)
Table 5
Results obtained by using robust fuzzy c-means and semi-supervised MLP on Mexico data set.
Techniques used Min/Max MA FA OE Avg. OE Avg. Micro F1 Avg. Macro F1 Avg. Kappa Avg. PE
RFCM Min 1795 1068 2863 2863 0.9687 0.9686 0.9372 0.0109Max 1795 1068 2863 (0) (0) (0) (0) (0)
Semi-supervised MLP Min 2660 706 3366 3388.1 0.9625 0.9620 0.9240 0.0129Max 2771 661 3432 (22.38504) (0.00026) (0.000283) (0.000564) (0.000085)
Table 6
Results obtained by the proposed semi-supervised technique on Mexico data set.
Threshold Training patterns Max/Min MA FA OE Avg. OE Avg. Micro F1 Avg. Macro F1 Avg. Kappa Avg. PE
0.216 (optimal) 0.1% Min 1512 1218 2730 2741.4 0.9701 0.9701 0.9403 0.0104Max 1516 1239 2755 (6.666333) (0.000071) (0.000071) (0.000141) (0.000025)
0.5% Min 1470 1235 2705 2723.7 0.9702 0.9702 0.9404 0.0104Max 1502 1232 2734 (8.331266) (0.000096) (0.000096) (0.000193) (0.000032)
1% Min 1478 1201 2679 2700.8 0.9703 0.9703 0.9406 0.010407Max 1483 1237 2720 (12.432216) (0.000135) (0.000135) (0.00027) (0.000048)
0.183 (energy based) 0.1% Min 697 2174 2871 2884.1 0.9697 0.9695 0.939 0.011Max 710 2195 2896 (6.284107) (0.000062) (0.000064) (0.000127) (0.000024)
0.5% Min 683 2175 2858 2868.2 0.9698 0.9695 0.9391 0.0109Max 692 2189 2881 (8.459314) (0.000089) (0.000089) (0.000178) (0.000032)
1% Min 688 2142 2830 2854.4 0.9698 0.9695 0.9391 0.0109Max 687 2197 2884 (13.821722) (0.000137) (0.000141) (0.000281) (0.000053)
0.232 (correlation based) 0.1% Min 2143 920 3063 3080.7 0.966 0.9659 0.9318 0.0117Max 2148 945 3093 (9.327915) (0.000102) (0.000102) (0.000203) (0.000036)
0.5% Min 2098 931 3029 3054.1 0.9662 0.966 0.9321 0.0117Max 2137 946 3083 (16.585837) (0.000186) (0.000188) (0.000375) (0.000064)
1% Min 2070 929 2999 30Max 2111 940 3051 (1
24.6 0.9664 0.9662 0.9324 0.01164.779716) (0.000166) (0.000167) (0.000335) (0.000057)
8 S. Ghosh et al. / Applied Soft Computing 15 (2014) 1–20
Table 7. Results obtained by the unsupervised change detection technique using MSOFM on Sardinia data set.

Threshold                  Min/Max  MA    FA    OE    Avg. OE     Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
0.368 (optimal)            Min      1070  574   1644  1649.2      0.9397         0.9394         0.8789      0.0133
                           Max      1076  579   1655  (3.37046)   (0.000126)     (0.000127)     (0.000254)  (0.000027)
0.356 (energy based)       Min      915   766   1681  1685.7      0.9395         0.9394         0.8789      0.0136
                           Max      914   776   1690  (2.45153)   (0.000081)     (0.00008)      (0.00016)   (0.00002)
0.337 (correlation based)  Min      704   1181  1885  1892.3      0.9349         0.9346         0.8693      0.0153
                           Max      702   1196  1898  (3.606938)  (0.000104)     (0.000108)     (0.000216)  (0.000029)
Table 8. Results obtained by a supervised change detection technique using MLP on Sardinia data set.

Training patterns  Min/Max  MA    FA    OE    Avg. OE       Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
0.1%               Min      1161  504   1665  1969.4        0.9277         0.9268         0.8472      0.0159
                   Max      1231  1591  2822  (376.058559)  (0.012885)     (0.01306)      (0.025651)  (0.003043)
0.5%               Min      1057  659   1716  2035          0.9258         0.9249         0.8198      0.0164
                   Max      1672  641   2313  (177.15925)   (0.006511)     (0.006851)     (0.013029)  (0.001433)
1%                 Min      1050  578   1628  1815.3        0.9327         0.9321         0.8057      0.0146
                   Max      1534  409   1943  (88.293884)   (0.003542)     (0.003847)     (0.007149)  (0.000714)
Table 9. Results obtained by constrained k-means algorithm on Sardinia data set.

Training patterns  Min/Max  MA   FA    OE    Avg. OE     Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
0.1%               Min      637  1876  2513  2514.8      0.9184         0.9169         0.8339      0.0203
                   Max      637  1881  2518  (1.4)       (0.000039)     (0.000041)     (0.000082)  (0.000011)
0.5%               Min      635  1858  2493  2501.8      0.9188         0.9173         0.8347      0.0202
                   Max      634  1876  2510  (5.87875)   (0.000146)     (0.000161)     (0.000321)  (0.000047)
1%                 Min      631  1839  2470  2480.4      0.9194         0.9179         0.8359      0.02
                   Max      628  1866  2494  (6.696268)  (0.000168)     (0.000185)     (0.000369)  (0.000054)
Table 10. Results obtained by using robust fuzzy c-means and semi-supervised MLP on Sardinia data set.

Techniques used      Min/Max  MA    FA    OE    Avg. OE      Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
RFCM                 Min      606   1576  2182  2182         0.9278         0.9268         0.8536      0.0176
                     Max      606   1576  2182  (0)          (0)            (0)            (0)         (0)
Semi-supervised MLP  Min      1369  279   1648  1669.2       0.9378         0.9360         0.8721      0.0135
                     Max      1450  246   1696  (11.496086)  (0.000468)     (0.000559)     (0.001115)  (0.000093)
Table 11. Results obtained by the proposed semi-supervised technique on Sardinia data set.

Threshold                  Training patterns  Min/Max  MA    FA   OE    Avg. OE      Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
0.368 (optimal)            0.1%               Min      1172  371  1543  1555.5       0.9424         0.9416         0.8832      0.0125
                                              Max      1225  355  1580  (10.052363)  (0.000404)     (0.000442)     (0.000883)  (0.000081)
                           0.5%               Min      1155  359  1514  1540.2       0.9428         0.9419         0.8839      0.0125
                                              Max      1166  399  1565  (12.237647)  (0.000456)     (0.000455)     (0.00091)   (0.000099)
                           1%                 Min      1143  355  1498  1516         0.9434         0.9426         0.8852      0.0123
                                              Max      1183  357  1540  (16.321765)  (0.000624)     (0.000634)     (0.001268)  (0.000133)
0.356 (energy based)       0.1%               Min      999   496  1495  1502.5       0.945          0.9447         0.8894      0.0121
                                              Max      1002  513  1515  (7.003571)   (0.000247)     (0.000242)     (0.000483)  (0.000056)
                           0.5%               Min      976   493  1469  1485.6       0.9454         0.9451         0.89        0.012
                                              Max      991   511  1502  (12.682271)  (0.000473)     (0.000476)     (0.000952)  (0.000103)
                           1%                 Min      967   473  1440  1459.9       0.9461         0.9458         0.8916      0.0119
                                              Max      989   490  1479  (12.332477)  (0.000464)     (0.000469)     (0.000937)  (0.000101)
0.337 (correlation based)  0.1%               Min      811   826  1637  1656.2       0.9411         0.9411         0.8822      0.0134
                                              Max      825   847  1672  (9.537295)   (0.000334)     (0.000334)     (0.000669)  (0.000077)
                           0.5%               Min      785   841  1626  1635.7       0.9416         0.9416         0.8832      0.0133
                                              Max      805   850  1655  (7.253275)   (0.000253)     (0.000253)     (0.000506)  (0.000059)
                           1%                 Min      794   787  1581  1613.7       0.942          0.942          0.8841      0.0131
                                              Max      818   835  1653  (18.649665)  (0.000653)     (0.000653)     (0.001305)  (0.000153)
Table 12. Results obtained by the unsupervised change detection technique using MSOFM on Greece data set.

Threshold                  Min/Max  MA    FA    OE    Avg. OE     Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
0.425 (optimal)            Min      2202  728   2930  2935.4      0.8381         0.8324         0.665       0.0121
                           Max      2203  741   2944  (4.103657)  (0.000197)     (0.000158)     (0.000317)  (0.000017)
0.397 (energy based)       Min      1709  1564  3273  3285.2      0.8364         0.8364         0.6729      0.01357
                           Max      1709  1582  3291  (5.095096)  (0.000177)     (0.000174)     (0.000349)  (0.000021)
0.438 (correlation based)  Min      2510  528   3038  3045.3      0.8274         0.8159         0.6322      0.0125
                           Max      2510  541   3051  (4.450843)  (0.000243)     (0.000173)     (0.000348)  (0.000018)
Table 13. Results obtained by a supervised change detection technique using MLP on Greece data set.

Training patterns  Min/Max  MA    FA    OE    Avg. OE       Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
0.1%               Min      2167  990   3127  3924.2        0.8033         0.7989         0.5929      0.0162
                   Max      2292  3380  5672  (816.001201)  (0.027801)     (0.027682)     (0.052822)  (0.003371)
0.5%               Min      2570  464   3034  3386.3        0.8238         0.8203         0.6103      0.0139
                   Max      1564  2201  3765  (241.102903)  (0.004244)     (0.006501)     (0.011776)  (0.000996)
1%                 Min      1948  973   2921  3071          0.8344         0.8291         0.6002      0.0126
                   Max      1575  2024  3599  (196.981217)  (0.008858)     (0.015223)     (0.021243)  (0.000814)
Fig. 5. Change detection maps obtained for Mexico data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) a reference map for the changed area, (f) using semi-supervised MLP, and (g) using robust fuzzy c-means algorithm.
Fig. 6. Error maps obtained for Mexico data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) using semi-supervised MLP, and (f) using robust fuzzy c-means algorithm.
Table 14. Results obtained by constrained k-means algorithm on Greece data set.

Training patterns  Min/Max  MA  FA      OE      Avg. OE      Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
0.1%               Min      37  67,295  67,332  67,358.3     0.6583         0.4835         0.0966      0.2782
                   Max      37  67,339  67,376  (13.842326)  (0.000014)     (0.000032)     (0.000025)  (0.000057)
0.5%               Min      37  66,798  66,835  66,860.4     0.6588         0.4847         0.0975      0.2762
                   Max      37  66,885  66,922  (28.228355)  (0.000032)     (0.000065)     (0.000053)  (0.000117)
1%                 Min      39  65,871  65,910  65,948.1     0.6596         0.4867         0.0992      0.2724
                   Max      39  65,939  65,978  (23.257042)  (0.000028)     (0.000054)     (0.000044)  (0.000096)

Table 15. Results obtained by using robust fuzzy c-means and semi-supervised MLP on Greece data set.

Techniques used      Min/Max  MA  FA      OE      Avg. OE        Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
RFCM                 Min      37  66,963  67,000  67,000         0.6587         0.4844         0.0973      0.276786
                     Max      37  66,963  67,000  (0)            (0)            (0)            (0)         (0)
Semi-supervised MLP  Min      39  68,110  68,149  70,572.5       0.6552         0.4763         0.0911      0.2915
                     Max      33  74,571  74,604  (1798.824352)  (0.001671)     (0.004005)     (0.002918)  (0.007431)

4. Experimental results and discussion

As mentioned in Section 1, to investigate the effectiveness of the proposed semi-supervised technique, experiments are conducted on three different multi-temporal and multi-spectral data sets, and the results obtained using the proposed methodology are compared with those of the unsupervised approach using MSOFM [14], a robust fuzzy c-means clustering technique (RFCM) [43], a supervised method using MLP, a semi-supervised change detection technique using MLP [39], and the constrained k-means algorithm [44] (a semi-supervised clustering algorithm).

To implement the proposed algorithm, during training of the MSOFM network the learning rate η in each iteration 'itr' is computed as η(itr) = 1/(1 + itr), which ensures that its value lies within the range 0 to 1 (i.e., 0 < η ≤ 1). The topological neighborhood h(itr) was considered to be a rectangular window with initial size 11 × 11; after each epoch the window size was gradually reduced until it attained a size of 3 × 3, and thereafter its size was kept constant till convergence. For each unlabeled pattern, to search for its K nearest
Table 16. Results obtained by the proposed semi-supervised technique on Greece data set.

Threshold                  Training patterns  Min/Max  MA    FA    OE    Avg. OE      Avg. Micro F1  Avg. Macro F1  Avg. Kappa  Avg. PE
0.425 (optimal)            0.1%               Min      2373  405   2778  2791.8       0.8429         0.8315         0.6633      0.0115
                                              Max      2376  425   2801  (6.939741)   (0.000415)     (0.000412)     (0.000823)  (0.000029)
                           0.5%               Min      2344  420   2744  2769.6       0.8437         0.8322         0.6648      0.0114
                                              Max      2376  417   2793  (14.914423)  (0.000948)     (0.001051)     (0.002096)  (0.000062)
                           1%                 Min      2310  399   2709  2744.7       0.8444         0.833          0.6663      0.0114
                                              Max      2347  422   2769  (15.152888)  (0.00095)      (0.001005)     (0.002007)  (0.000063)
0.397 (energy based)       0.1%               Min      1763  1143  2906  2915.7       0.8484         0.8475         0.6951      0.012
                                              Max      1781  1161  2942  (9.571311)   (0.000478)     (0.000474)     (0.000948)  (0.00004)
                           0.5%               Min      1750  1098  2848  2886.3       0.8492         0.8483         0.6966      0.0119
                                              Max      1753  1159  2912  (18.177184)  (0.000848)     (0.000826)     (0.001652)  (0.000075)
                           1%                 Min      1751  1093  2844  2865.3       0.8492         0.8482         0.6966      0.0119
                                              Max      1766  1116  2882  (12.99269)   (0.000683)     (0.000704)     (0.001408)  (0.000054)
0.438 (correlation based)  0.1%               Min      2721  282   3003  3013.3       0.8269         0.8072         0.615       0.0124
                                              Max      2730  295   3025  (8.124654)   (0.000539)     (0.000559)     (0.001114)  (0.000034)
                           0.5%               Min      2687  269   2956  2985.6       0.828          0.8084         0.6174      0.0123
                                              Max      2704  312   3016  (19.458674)  (0.001293)     (0.001325)     (0.002642)  (0.000081)
                           1%                 Min      2639  270   2909  2948.9       0.8295         0.8103         0.6212      0.0123
                                              Max      2673  297   2970  (19.94718)   (0.001333)     (0.001404)     (0.002798)  (0.000083)
Fig. 7. Change detection maps obtained for Sardinia data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) a reference map for the changed area, (f) using semi-supervised MLP, and (g) using robust fuzzy c-means algorithm.

neighbors, although we experimented with different window sizes and K values, the window size was finally taken as 51 × 51 and the value of K was fixed at 8. The same threshold calculation methods (optimal, correlation based and energy based) are adopted in our experiments as were employed in [14]. Weight initialization of the network connections (for both MSOFM and MLP) and the training (labeled) patterns used are different for different simulations. For experimentation, three different percentages of training patterns (0.1%, 0.5%, and 1%) are considered and 10 simulations are conducted. At the beginning of the training phase, the labeled patterns are obtained from the reference map and a target value is assigned to each labeled pattern depending on its class label. The target value of each of the training patterns is fixed to its class label while testing.

Fig. 8. Error maps obtained for Sardinia data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) using semi-supervised MLP, and (f) using robust fuzzy c-means algorithm.

To assess the effectiveness of the proposed methodology, various performance measuring indices are considered in our investigation: the number of missed alarms (MA), the number of false alarms (FA), the overall error (OE), micro averaged F1 measure (MicroF1), macro averaged F1 measure (MacroF1), Kappa measure (Kappa) and error probability (PE). Except for missed alarms and false alarms, the average (Avg.) and standard deviation (written in brackets in the tables) values (over 10 simulations) of all other performance measuring indices are considered for comparative analysis. The best results (denoted by 'Min' in the tables) and the worst results (denoted by 'Max' in the tables) for MA, FA and OE, considering all the simulations, are also provided in the tables. Results for the Mexico, Sardinia and Greece data sets are put in Tables 2–6, Tables 7–11, and Tables 12–16, respectively. Results obtained using the unsupervised MSOFM, the supervised MLP, the constrained k-means algorithm, the robust fuzzy clustering algorithm together with semi-supervised MLP, and the proposed semi-supervised technique for the Mexico data set are given in Tables 2, 3, 4, 5 and 6, respectively. The corresponding results for the Sardinia data set are depicted in Tables 7–11, and for the Greece data set the results are put in Tables 12–16.
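The training schedule used for the MSOFM network can be sketched as follows. The learning-rate formula and the 11 × 11 to 3 × 3 window limits come from the description above; the shrink rate of two per epoch is an illustrative assumption, since the text says only that the window is "gradually reduced":

```python
# Sketch of the MSOFM training schedule described above.
# Assumption: the neighborhood window shrinks by 2 per epoch; the paper
# fixes only the initial (11 x 11) and final (3 x 3) sizes.

def learning_rate(itr):
    """eta(itr) = 1 / (1 + itr), so 0 < eta <= 1 for every iteration itr >= 0."""
    return 1.0 / (1.0 + itr)

def neighborhood_size(epoch, initial=11, final=3):
    """Side length of the rectangular topological neighborhood: reduced each
    epoch until it reaches 3 x 3, then held constant until convergence."""
    return max(final, initial - 2 * epoch)

# epoch 0 -> eta = 1.0, window 11 x 11; epoch 4 onwards -> window 3 x 3
schedule = [(e, learning_rate(e), neighborhood_size(e)) for e in range(6)]
```

The decreasing learning rate and shrinking neighborhood follow the usual Kohonen-style cooling, which is what lets the map first order itself globally and then fine-tune locally.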
From Tables 2 and 6, it is noticed that for the Mexico data set, the proposed semi-supervised method (considering all the percentages of training patterns) outperforms the corresponding unsupervised version in most of the cases except the cases of missed alarms. It has also been observed that for the proposed strategy the average values of almost all the measuring indices are significantly better than those of the corresponding unsupervised method, but the standard deviations are a little higher. This might be due to the fact that different training patterns are used for different simulations in the semi-supervised technique. By comparing the standard deviation values of Tables 2, 3 and 6, it has been found that for all the performance measures the MLP based supervised technique produces much higher values than those obtained using the unsupervised and semi-supervised approaches. This may be due to the unavailability of a sufficient number of training samples to carry out any supervised method, and it might be a typical example of a real life scenario. From Tables 3 and 6, it is also seen that the maximum overall error (worst case) over 10 simulations using the proposed approach is lower than that of the corresponding supervised method for the cases of optimal and energy based threshold, whereas the minimum overall error (best case) using the proposed strategy is not better in most of the cases for the Mexico data set. This may be due to the fact that a supervised framework with good representative labeled patterns (covering the underlying pattern distribution properly) can obtain better results than the corresponding semi-supervised approach. But such good training patterns may not be available in most of the cases, especially when there is a circumstance of inadequacy of labeled patterns. This is also justified by the attainment of higher standard deviation values for the supervised method over the unsupervised and semi-supervised ones. It has also been found that for the proposed method, out of a total of 18 cases (considering different threshold values with different percentages of training patterns used for both maximum and minimum overall error), missed alarms and false alarms are less in 8 cases and 12 cases, respectively. For the Kappa measure, in all cases the results obtained using the semi-supervised method are better than those of the corresponding supervised method. It has also been observed that out of a total of 9 cases (considering different threshold values with different percentages of training patterns), average values of overall error, micro averaged F1 measure, macro averaged F1 measure and error probability are more in 5 cases, 4 cases, 4 cases and 5 cases, respectively. From Tables 4 and 6, it has been observed that the proposed technique is better than the constrained k-means algorithm in terms of all the measuring indices except the case of false alarms. By analyzing the results in Tables 5 and 6, it has been found that the proposed technique using the optimal threshold (considering all three different percentages of training patterns) outperforms the robust fuzzy clustering (RFCM) technique in all the cases except the case of false alarms. From Tables 5 and 6, it is also seen that the proposed approach using all three different threshold values is significantly better than semi-supervised MLP in most of the cases.

Fig. 9. Change detection maps obtained for Greece data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern), (e) a reference map for the changed area, (f) using semi-supervised MLP, and (g) using robust fuzzy c-means algorithm.
Comparative analysis of the results of the unsupervised (Table 7) and the proposed semi-supervised (Table 11) techniques for the Sardinia data set reveals similar findings as for the Mexico data set. By analyzing the results depicted in Tables 8 and 11, it is observed that the proposed methodology is always better than the supervised approach in terms of maximum, minimum and average overall error, micro averaged F1 measure, macro averaged F1 measure, Kappa measure, error probability and standard deviation. In the cases of missed alarms and false alarms, it is seen that out of 18 cases (considering different threshold values along with different percentages of training patterns in terms of maximum and minimum overall error) the proposed strategy is better in 15 cases and 12 cases, respectively. From Tables 9 and 11, it is noticed that the proposed technique is better than the constrained k-means algorithm in terms of all the measuring indices except the case of missed alarms. By comparing the results displayed in Tables 10 and 11, it has been found that the proposed approach considering all threshold values is significantly better than the robust fuzzy c-means clustering in most of the cases except the cases of missed alarms and standard deviation. It has also been noticed that the proposed semi-supervised approach is better suited for change detection than the semi-supervised approach using MLP in terms of almost all the measuring indices.

Fig. 10. Error maps obtained for Greece data set: (a) using an unsupervised technique based on optimal threshold, (b) using MLP based supervised technique (with 0.1% training pattern), (c) using constrained k-means algorithm (with 0.1% training pattern), and (d) using the proposed semi-supervised technique based on optimal threshold (with 0.1% training pattern).

From Tables 12 and 16 it is seen that for the Greece data set the proposed methodology provides better results than the corresponding unsupervised version in terms of false alarms, overall error and error probability. It has also been observed that the semi-supervised approach is not able to improve the performance for the case of missed alarms. These results also corroborate our findings for the other two data sets. It is also noticed that the performance of the proposed method in terms of micro averaged F1 measure is better than the corresponding unsupervised methodology considering the optimal and the energy based threshold values. In the case of the correlation based threshold with 0.5% and 1% training patterns (but not with 0.1% training patterns), the micro averaged F1 measure provided better results than those obtained using the unsupervised method. Among the three different threshold selection techniques used in the present article, for the Kappa measure and the macro averaged F1 measure the proposed technique produces better results than the unsupervised version only for the case of the energy based threshold. By comparing the results of the supervised and the semi-supervised approaches for the Greece data set (Tables 13 and 16), it has been found that the proposed method is always better than the supervised method in terms of overall error, Kappa measure and error probability. It has also been noticed that for the proposed method, out of a total of 18 cases (considering different threshold values with different percentages of training patterns used for both maximum and minimum overall error), missed alarms and false alarms are less in 5 cases and 15 cases, respectively. Out of a total of 9 cases (considering different threshold values with different percentages of training patterns), average values of micro averaged F1 measure and macro averaged F1 measure are higher in 8 cases and 7 cases, respectively. From Tables 14 and 16, it is noticed that the proposed technique is better than the constrained k-means algorithm in terms of all the measuring indices except the case of missed alarms. From Tables 15 and 16, it has been found that the results obtained using the proposed approach for all three thresholding techniques are significantly better than those obtained using the robust fuzzy clustering approach and the semi-supervised MLP in almost all cases. For the Greece data set, the performance of the semi-supervised MLP [39] and the robust fuzzy c-means algorithm [43] is noticeably worse. From this observation, it can be concluded that the semi-supervised MLP and RFCM are not robust across all the data sets used for experimentation, whereas the proposed semi-supervised approach performed well for all of them.
From Tables 6, 11 and 16 (considering the results obtained using the semi-supervised approach under different conditions), it is also observed that the performance measurement indices mostly attain better values with an increase in the percentage of training patterns. Robustness of the proposed methodology (as evident from the standard deviation) is slightly worse than that of the unsupervised approach but far better than that of the supervised technique. To sum up, considering the results obtained for 10 different performance measures with different data sets, different thresholding techniques and different percentages of training patterns (wherever applicable), the proposed semi-supervised methodology has an edge over the unsupervised as well as supervised ones when a small number of labeled patterns is available.
Fig. 11. Graphs of the error-rate for Mexico data set obtained using: (a) unsupervised MSOFM (optimal threshold), (b) fuzzy c-means algorithm incorporating local information (i.e. RFCM), (c) supervised MLP (with 0.5% training pattern), (d) semi-supervised MLP, (e) constrained k-means algorithm (with 0.5% training pattern), and (f) the proposed semi-supervised technique MSOFM (with optimal threshold and 0.5% training pattern).

To test the statistical significance of the results (in terms of Kappa measure), the paired t-test [45] has been performed comparing the proposed semi-supervised approach with the other unsupervised, supervised and semi-supervised methods at the 5% level of significance, and the results of the t-test in terms of p-score are reported in Table 17. For typical illustration, we have considered the results (over 10 simulations) obtained using the optimal threshold and 0.5% training patterns. Statistically significant results in terms of p-score of the paired t-test (at the 5% level of significance) are marked in bold in Table 17. By analyzing the results, in all the cases significant improvement has been found for the proposed method as compared to the other methods.

For visual illustration, the change detection maps corresponding to the minimum overall error (obtained over 10 simulations) using the unsupervised method, the supervised method, the constrained k-means algorithm and the proposed semi-supervised method are displayed in the figures. The change detection maps obtained using these four approaches for the Mexico, Sardinia and Greece data sets are shown, respectively, in Figs. 5, 7 and 9. The error maps (highlighting the difference between the change detection map and the reference map) are also displayed in Figs. 6, 8 and 10. It has been observed that the change detection maps obtained using the proposed method are more accurate (closer in resemblance to the reference map) in all cases. From the maps, it is clearly visible that erroneous classification of unchanged areas as changed ones (i.e., false alarms) has been significantly reduced by the proposed methodology. However, it failed to detect some of the small and scattered changed areas (i.e., missed alarms), where the pixels are on the boundary or the area is surrounded by a vast amount of unchanged regions. This is obviously due to the neighboring effect.

Graphs of the error-rate obtained using different unsupervised, supervised and semi-supervised techniques on the three data sets are
Fig. 12. Graphs of the error-rate for Sardinia data set obtained using: (a) unsupervised MSOFM (optimal threshold), (b) fuzzy c-means algorithm incorporating local information (i.e. RFCM), (c) supervised MLP (with 0.5% training pattern), (d) semi-supervised MLP, (e) constrained k-means algorithm (with 0.5% training pattern), and (f) the proposed semi-supervised technique MSOFM (with optimal threshold and 0.5% training pattern).

displayed in Figs. 11–13. For typical illustration, we have considered the results obtained using the optimal threshold and 0.5% training patterns. From the graphs, the rate of change in terms of overall error with increasing epoch or training step has been noticed in a particular simulation in which the minimum overall error is obtained. In the figures, it has been noticed for all the data sets that, after initial fluctuations, the overall error becomes either steady or decreases with increasing epochs (or training steps) in

Table 17. Results of paired t-test performed with the proposed semi-supervised technique versus the other unsupervised, supervised and semi-supervised methods in terms of p-score.

Data set used  Proposed vs. unsupervised MSOFM  Proposed vs. RFCM  Proposed vs. MLP  Proposed vs. semi-MLP  Proposed vs. constrained k-means
Mexico         1.1221 × 10−23                   1.4471 × 10−21     0.0262            7.5308 × 10−25         1.9845 × 10−36
Sardinia       2.9941 × 10−13                   2.2788 × 10−26     1.5046 × 10−11    9.7339 × 10−16         1.2720 × 10−29
Greece         0.0176                           1.7321 × 10−42     2.0630 × 10−5     2.0507 × 10−38         1.7321 × 10−42
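The paired t-test behind Table 17 can be reproduced in outline as below. The per-simulation Kappa values here are made up for illustration (the paper reports only summary statistics); only the test procedure itself is taken from the text:

```python
# Paired t-test on Kappa values from 10 simulations, as used for Table 17.
# The listed Kappa values are illustrative, not the paper's actual data.
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = a - b,
    with n - 1 degrees of freedom."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n))

kappa_proposed = [0.9403, 0.9401, 0.9405, 0.9398, 0.9404,
                  0.9402, 0.9400, 0.9406, 0.9399, 0.9403]
kappa_unsup    = [0.9150, 0.9148, 0.9153, 0.9149, 0.9151,
                  0.9147, 0.9152, 0.9150, 0.9149, 0.9151]

t = paired_t_statistic(kappa_proposed, kappa_unsup)
# Compare |t| against the critical value t_{0.05, 9} ≈ 2.262 (two-sided, 5% level).
```

A paired (rather than unpaired) test is appropriate here because each pair of Kappa values comes from the same simulation run, i.e., the same random initialization and training patterns.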
Fig. 13. Graphs of the error-rate for Greece data set obtained using: (a) unsupervised MSOFM (optimal threshold), (b) fuzzy c-means algorithm incorporating local information (i.e. RFCM), (c) supervised MLP (with 0.5% training pattern), (d) semi-supervised MLP, (e) constrained k-means algorithm (with 0.5% training pattern), and (f) the proposed semi-supervised technique MSOFM (with optimal threshold and 0.5% training pattern).
case of most of the techniques except the case of constrained k-means.

5. Conclusion

In this paper, an attempt has been made to improve the performance of change detection of remotely sensed images under the scarcity of labeled patterns by exploiting the self-organizing capacity of Kohonen's neural network integrated with semi-supervision. Here, semi-supervised learning is employed by taking a few labeled patterns into consideration. Iterative learning of the MSOFM network is done using both the labeled patterns and the selected unlabeled patterns. A heuristic technique is also suggested for collecting the unlabeled patterns.

Experiments are carried out on three multi-temporal and multi-spectral data sets to confirm the effectiveness of the proposed technique. From the results, it has been found that the proposed semi-supervised approach is better suited for change detection than unsupervised and supervised methods when only a small amount of labeled patterns is available. Like other semi-supervised methods, the technique has the drawback of requiring more computational time.

Acknowledgments

The authors would like to thank the reviewers for their thorough and constructive comments which helped to enhance the quality of the article. The authors are also grateful to the Department of Science and Technology (DST), Government of India and the University of Trento, Italy, the sponsors of the ITPAR program, and Prof. L. Bruzzone for providing the data. Moumita Roy is grateful to the Council of Scientific & Industrial Research (CSIR), India for providing her a Senior Research Fellowship [No. 09/096(0684)2k11-EMR-I].

Appendix A. Performance measures

The detailed descriptions of the different performance measuring indices used for evaluation purposes are given below.

Fig. 14. Mathematical representation of confusion matrix.

A.1. Missed alarms, false alarms and overall error

The number of missed alarms is calculated by comparing the obtained change detection map with the reference map. It is equal to the number of pixels wrongly predicted to be in the unchanged category, i.e., changed ones identified as unchanged ones. The number of false alarms is the reverse situation: it is the number of unchanged pixels classified as changed ones, i.e., the number of pixels wrongly predicted to be in the changed category. The overall error is the total error of the change detection process, computed as the sum of the number of missed alarms and the number of false alarms. Our objective is to minimize this value.
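The counts defined in A.1 can be sketched as follows; the tiny binary maps (1 = changed, 0 = unchanged) are made up for illustration:

```python
# Missed alarms (MA), false alarms (FA) and overall error (OE) obtained by
# comparing a change-detection map with the reference map (A.1).

def error_counts(detected, reference):
    """Return (MA, FA, OE) for flattened binary maps of equal length."""
    ma = sum(1 for d, r in zip(detected, reference) if r == 1 and d == 0)  # changed missed
    fa = sum(1 for d, r in zip(detected, reference) if r == 0 and d == 1)  # unchanged flagged
    return ma, fa, ma + fa  # overall error = MA + FA

detected  = [1, 0, 0, 1, 1, 0, 0, 1]
reference = [1, 1, 0, 0, 1, 0, 1, 1]
ma, fa, oe = error_counts(detected, reference)  # ma=2, fa=1, oe=3
```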
A.2. Macro averaged F1 measure

The macro averaged F1 measure [46], also called macro F1, is computed by averaging the F1 score of each category. The F1 score of each category is computed from precision and recall. The precision of category i, denoted as p_i, is

p_i = (patterns correctly classified into category i) / (patterns classified into category i),   (A.1)

and the recall of category i, r_i, is defined as

r_i = (patterns correctly classified into category i) / (patterns truly present in category i).   (A.2)

Then, the F1 score of category i, (F1)_i, is computed as the harmonic mean of precision and recall, i.e.,

(F1)_i = (2 × p_i × r_i) / (p_i + r_i).   (A.3)

The F1 measure gives equal importance to both precision and recall. After that, macro F1 is calculated as the global mean of the per-category F1 scores:

Macro averaged F1 = (1/C) × Σ_{i=1}^{C} (F1)_i,   (A.4)

where C represents the number of categories (classes). Macro averaged F1 gives equal weightage to each category and its value lies between 0 and 1; a value close to 1 denotes better classification.
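Eqs. (A.1)–(A.4) can be sketched as below; the toy labels are made up for illustration:

```python
# Macro averaged F1 per Eqs. (A.1)-(A.4): per-category precision, recall
# and F1, then the plain average over the C categories.

def macro_f1(predicted, actual, categories):
    f1_scores = []
    for c in categories:
        classified = sum(1 for p in predicted if p == c)            # classified into c
        present = sum(1 for a in actual if a == c)                  # truly present in c
        correct = sum(1 for p, a in zip(predicted, actual) if p == a == c)
        p_i = correct / classified if classified else 0.0           # Eq. (A.1)
        r_i = correct / present if present else 0.0                 # Eq. (A.2)
        f1 = 2 * p_i * r_i / (p_i + r_i) if p_i + r_i else 0.0      # Eq. (A.3)
        f1_scores.append(f1)
    return sum(f1_scores) / len(categories)                         # Eq. (A.4)

pred = [0, 0, 1, 1, 1, 0]
true = [0, 1, 1, 1, 0, 0]
score = macro_f1(pred, true, categories=[0, 1])  # 2/3 for this toy example
```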
A.3. Micro averaged F1 measure

The micro averaged F1 measure [46] is calculated by using a global contingency table, whose cell values are obtained by summing up the corresponding cell values of the per-category contingency tables. The micro averaged F1 measure gives equal weightage to each sample and is defined as:

Micro averaged F1 = [2 × ((1/C) Σ_{i=1}^{C} p_i) × ((1/C) Σ_{i=1}^{C} r_i)] / [((1/C) Σ_{i=1}^{C} p_i) + ((1/C) Σ_{i=1}^{C} r_i)],   (A.5)

where C is the number of categories. The micro averaged F1 measure is also called micro F1. Like macro F1, the value of this measure lies between 0 and 1; the closer the value of micro averaged F1 to 1, the better the classification.
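Eq. (A.5) amounts to taking the F1 of the mean per-category precision and recall; it can be sketched as below, with made-up per-category values:

```python
# Micro averaged F1 per Eq. (A.5): F1 of the averaged per-category
# precision and recall.

def micro_f1(precisions, recalls):
    c = len(precisions)
    p_bar = sum(precisions) / c   # (1/C) * sum of p_i
    r_bar = sum(recalls) / c      # (1/C) * sum of r_i
    return 2 * p_bar * r_bar / (p_bar + r_bar)

# Two categories with made-up precision/recall values.
score = micro_f1([0.9, 0.7], [0.8, 0.6])
```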
A.4. Kappa measure

The Kappa measure [47] is calculated using the confusion matrix. The confusion matrix (Fig. 14) is a C × C matrix in which M samples are classified into C categories. The M samples are distributed over the C² cells: each sample is assigned to one of the C categories in the classification map (usually the rows) and, independently, to one of the same categories in the reference map (usually the columns). Let m_ij denote the number of samples classified into category i (i = 1, 2, ..., C) in the classification map and category j (j = 1, 2, ..., C) in the reference map.

Let m_{i+} = Σ_{j=1}^{C} m_ij be the number of samples classified into category i in the classification map, and m_{+j} = Σ_{i=1}^{C} m_ij be the number of samples classified into category j in the reference map.
Let P_ij denote the proportion of the samples in the (i, j)th cell, corresponding to m_ij in the confusion matrix; in other words, P_ij = m_ij/M.

Let P_{i+} and P_{+j} be defined as:

P_{i+} = Σ_{j=1}^{C} P_ij, and P_{+j} = Σ_{i=1}^{C} P_ij.

Then, the actual agreement P_o between the reference judgment and the classifier judgment is computed as P_o = Σ_{i=1}^{C} P_ii, and the chance agreement P_c between them is calculated as:

P_c = Σ_{i=1}^{C} P_{i+} P_{+i}.

The Kappa measure is defined as k = (P_o − P_c)/(1 − P_c). For computational purposes, it can be rewritten in terms of the raw counts as:

k̂ = (M Σ_{i=1}^{C} m_ii − Σ_{i=1}^{C} m_{i+} m_{+i}) / (M² − Σ_{i=1}^{C} m_{i+} m_{+i}). (A.6)
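The computational form (A.6) can be evaluated directly from the raw confusion matrix, as in the sketch below (the function name `kappa` is illustrative):

```python
def kappa(m):
    """Kappa from a C x C confusion matrix m, Eq. (A.6).
    Rows follow the classification map, columns the reference map."""
    C = len(m)
    M = sum(sum(row) for row in m)         # total number of samples
    diag = sum(m[i][i] for i in range(C))  # sum of the diagonal counts m_ii
    # chance term: sum over i of m_{i+} (row sum) times m_{+i} (column sum)
    chance = sum(sum(m[i]) * sum(m[j][i] for j in range(C)) for i in range(C))
    return (M * diag - chance) / (M * M - chance)
```

A perfectly diagonal matrix yields k̂ = 1, while agreement no better than chance yields k̂ = 0.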
A.5. Error probability

The error probability (PE) is computed using the probability of missed alarms (PM), the probability of false alarms (PF), the a-priori probability of the changed pixels (PO) and the a-priori probability of the unchanged pixels (PL). PE is calculated as

PE = PO × PM + PL × PF. (A.7)

PF is defined as the ratio between the number of false alarms and the total number of unchanged pixels, and PM as the ratio between the number of missed alarms and the total number of changed pixels. The probability PO is estimated as the ratio between the number of changed pixels in the reference map and the total number of image pixels; similarly, PL is calculated as the ratio between the number of unchanged pixels in the reference map and the total number of image pixels. The lesser the value of the error probability (PE), the better the classification.
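From the pixel counts of the reference map, PE can be sketched as below (a hypothetical helper, assuming each a-priori probability is paired with the conditional error rate of its own class, consistent with the definitions above); under that pairing the expression reduces to the total number of misclassified pixels divided by the total number of image pixels:

```python
def error_probability(n_missed, n_false, n_changed, n_unchanged):
    """Error probability PE, Eq. (A.7), from raw pixel counts.
    n_changed and n_unchanged are counted in the reference map."""
    total = n_changed + n_unchanged
    p_m = n_missed / n_changed    # missed alarms over changed pixels
    p_f = n_false / n_unchanged   # false alarms over unchanged pixels
    p_o = n_changed / total       # a-priori probability of change
    p_l = n_unchanged / total     # a-priori probability of no change
    return p_o * p_m + p_l * p_f
```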
References
[1] A. Singh, Digital change detection techniques using remotely-sensed data, International Journal of Remote Sensing 10 (6) (1989) 989–1003.
[2] M.J. Canty, Image Analysis, Classification and Change Detection in Remote Sensing, CRC Press/Taylor & Francis, Boca Raton, 2006.
[3] R.J. Radke, S. Andra, O. Al-Kofahi, B. Roysam, Image change detection algorithms: a systematic survey, IEEE Transactions on Image Processing 14 (3) (2005) 294–307.
[4] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, USA, 2006.
[5] Q. Zhang, J. Wang, X. Peng, P. Gong, P. Shi, Urban built-up land change detection with road density and spectral information from multi-temporal Landsat TM data, International Journal of Remote Sensing 23 (15) (2002) 3057–3078.
[6] R. Manonmani, G.M.D. Suganya, Remote sensing and GIS application in change detection study in urban zone using multi temporal satellite, International Journal of Geomatics and Geosciences 1 (1) (2010) 60–65.
[7] K.R. Merril, L. Jiajun, A comparison of four algorithms for change detection in an urban environment, Remote Sensing of Environment 63 (2) (1998) 95–100.
[8] M.M. Yagoub, Monitoring of urban growth of a desert city through remote sensing: Al-Ain, UAE, between 1976 and 2000, International Journal of Remote Sensing 25 (6) (2004) 1063–1076.
[9] L. Bruzzone, D.F. Prieto, An adaptive parcel-based technique for unsupervised change detection, International Journal of Remote Sensing 21 (4) (2000) 817–822.
[10] F. Yuan, K.E. Sawaya, B.C. Loeffelholz, M.E. Bauer, Land cover classification and change analysis of Twin Cities (Minnesota) Metropolitan Area by multitemporal Landsat remote sensing, Remote Sensing of Environment 98 (2005) 317–328.
[11] G.M. Foody, Monitoring the magnitude of land-cover change around the southern limits of the Sahara, Photogrammetric Engineering and Remote Sensing 67 (2001) 841–847.
[12] G. Camps-Valls, L. Gómez-Chova, J. Munoz-Mari, J.L. Rojo-Álvarez, M. Martinez-Ramón, Kernel-based framework for multitemporal and multisource remote sensing data classification and change detection, IEEE Transactions on Geoscience & Remote Sensing 46 (6) (2008) 1822–1835.
[13] S. Ghosh, L. Bruzzone, S. Patra, F. Bovolo, A. Ghosh, A context-sensitive technique for unsupervised change detection based on Hopfield-type neural networks, IEEE Transactions on Geoscience & Remote Sensing 45 (3) (2007) 778–789.
[14] S. Ghosh, S. Patra, A. Ghosh, An unsupervised context-sensitive change detection technique based on modified self-organizing feature map neural network, International Journal of Approximate Reasoning 50 (1) (2009) 37–50.
[15] F. Melgani, G. Moser, S.B. Serpico, Unsupervised change detection methods for remote sensing images, Optical Engineering 41 (12) (2002) 3288–3297.
[16] D. Liu, K. Song, J.R.G. Townshend, P. Gong, Using local transition probability models in Markov random fields for forest change detection, Remote Sensing of Environment 112 (5) (2008) 2222–2231.
[17] T. Kasetkasem, P.K. Varshney, An image change detection algorithm based on Markov random field models, IEEE Transactions on Geoscience & Remote Sensing 40 (8) (2002) 1815–1823.
[18] X. Liu, Urban change detection based on an artificial neural network, International Journal of Remote Sensing 23 (12) (2002) 2513–2518.
[19] A. Ghosh, N.S. Mishra, S. Ghosh, Fuzzy clustering algorithms for unsupervised change detection in remote sensing images, Information Sciences 181 (4) (2011) 699–715.
[20] G. Pajares, A Hopfield neural network for image change detection, IEEE Transactions on Neural Networks 17 (5) (2006) 1250–1264.
[21] Y. Bazi, F. Melgani, L. Bruzzone, G. Vernazza, A genetic expectation-maximization method for unsupervised change detection in multitemporal SAR imagery, International Journal of Remote Sensing 30 (24) (2009) 6591–6610.
[22] J.R. Jensen, Introductory Digital Image Processing: A Remote Sensing Perspective, Prentice Hall, New Jersey, 2005.
[23] X. Zhu, Semi-supervised Learning Literature Survey, Computer Sciences TR1530, University of Wisconsin, Madison, 2008.
[24] O. Chapelle, B. Schölkopf, A. Zien, Semi-supervised Learning, MIT Press, Cambridge, 2006.
[25] T. Lange, M.H.C. Law, A.K. Jain, J.M. Buhmann, Learning with constrained and unlabelled data, in: Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 731–738.
[26] S. Basu, M. Bilenko, R.J. Mooney, Comparing and unifying search-based and similarity-based approaches to semi-supervised clustering, in: Proceedings 20th International Conference on Machine Learning (ICML-2003), Washington, DC, 2003, pp. 42–49.
[27] K. Wagstaff, C. Cardie, S. Rogers, S. Schroedl, Constrained K-means clustering with background knowledge, in: Proceedings 18th International Conference on Machine Learning, Williamstown, MA, USA, 2001, pp. 577–584.
[28] D.-Y. Yeung, H. Chang, A kernel approach for semisupervised metric learning, IEEE Transactions on Neural Networks 18 (1) (2007) 141–149.
[29] C. Hou, F. Nie, F. Wang, C. Zhang, Y. Wu, Semisupervised learning using negative labels, IEEE Transactions on Neural Networks 22 (3) (2011) 420–432.
[30] H. Chen, L. Li, J. Peng, Error bounds of multi-graph regularized semi-supervised classification, Information Sciences 179 (12) (2009) 1960–1969.
[31] C.F. Gao, X.J. Wu, A new semi-supervised clustering algorithm with pairwise constraints by competitive agglomeration, Applied Soft Computing 11 (8) (2011) 5281–5291.
[32] C.-C. Chang, H.-K. Pao, Y.-J. Lee, An RSVM based two-teachers-one-student semi-supervised learning algorithm, Neural Networks 25 (2012) 57–69.
[33] K. Chen, S. Wang, Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (1) (2011) 129–143.
[34] N. Kumar, K. Kummamuru, Semisupervised clustering with metric learning using relative comparisons, IEEE Transactions on Knowledge and Data Engineering 20 (4) (2008) 496–503.
[35] A. Verikas, A. Gelzinis, K. Malmqvist, Using unlabelled data to train a multilayer perceptron, Neural Processing Letters 14 (3) (2001) 179–201.
[36] Y. Kamiya, T. Ishii, S. Furao, O. Hasegawa, An online semi-supervised clustering algorithm based on a self-organizing incremental neural network, in: Proceedings International Joint Conference on Neural Networks, Orlando, FL, USA, 2007.
[37] F. Ratle, G. Camps-Valls, J. Weston, Semisupervised neural networks for efficient hyperspectral image classification, IEEE Transactions on Geoscience & Remote Sensing 48 (5) (2010) 2271–2282.
[38] X. Zenglin, I. King, M.-T. Lyu, J. Rong, Discriminative semi-supervised feature selection via manifold regularization, IEEE Transactions on Neural Networks 21 (7) (2010) 1033–1047.
[39] S. Patra, S. Ghosh, A. Ghosh, Change detection of remote sensing images with semi-supervised multilayer perceptron, Fundamenta Informaticae 84 (2008) 429–442.
[40] T. Kohonen, Self-Organizing Maps, 2nd edition, Springer, Berlin, 1997.
[41] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall of India, New Delhi, 2007.
[42] A. Ghosh, S.K. Pal, Neural network, self-organization and object extraction, Pattern Recognition Letters 13 (5) (1992) 387–397.
[43] N.S. Mishra, S. Ghosh, A. Ghosh, Fuzzy clustering algorithms incorporating local information for change detection in remotely sensed images, Applied Soft Computing 12 (8) (2012) 2683–2692.
[44] S. Basu, A. Banerjee, R. Mooney, Semi-supervised clustering by seeding, in: Proceedings 19th International Conference on Machine Learning (ICML-2002), Sydney, Australia, 2002, pp. 19–26.
[45] E. Kreyszig, Introductory Mathematical Statistics: Principles and Methods, John Wiley & Sons, New York, 1970.
[46] A. Halder, A. Ghosh, S. Ghosh, Aggregation pheromone density based pattern classification, Fundamenta Informaticae 92 (4) (2009) 345–362.
[47] R.G. Congalton, K. Green, Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, 2nd edition, CRC Press/Taylor & Francis Group, Boca Raton, 2009.