
PEER-REVIEWED ARTICLE

Classification of Remotely Sensed Data by an Artificial Neural Network: Issues Related to Training Data Characteristics

Giles M. Foody, Mary B. McCulloch, and William B. Yates

Abstract

Artificial neural networks have considerable potential for the classification of remotely sensed data. In this paper a feed-forward artificial neural network using a variant of the back-propagation learning algorithm was used to classify agricultural crops from synthetic aperture radar data. The performance of the classification, in terms of classification accuracy, was assessed relative to a conventional statistical classifier, a discriminant analysis. Classifications of training data sets showed that the artificial neural network appears able to characterize classes better than the discriminant analysis, with accuracies of up to 98 percent observed. This better characterization of the training data need not, however, translate into a significantly more accurate classification of an independent testing set. The results of a series of classifications are presented which show that in general markedly higher classification accuracies may be obtained from the artificial neural network, except when a priori information on class occurrence is incorporated into the discriminant analysis, when the classification performance was similar to that of the artificial neural network. These and other issues were analyzed further with reference to classifications of synthetic data sets. The results illustrate the dependency of the two classification techniques on representative training samples and normally distributed data.

Introduction

Supervised image classification is one of the most widely used procedures in the analysis of digital remotely sensed data. A wide range of supervised classifiers are available but all share a common objective: to allocate each case of unknown class membership to a pre-defined class on the basis of its spectral properties. The accuracy with which a remotely sensed image may be classified is dependent on many factors. In addition to the properties of the remotely sensed data set, the latter include the characteristics of the training data (Swain, 1978; Schneider, 1980; Campbell, 1981; Foody, 1988) and the nature of the classifier (Mather, 1987a; Thomas et al., 1987; Schalkoff, 1992). These latter issues are often important because widely used conventional statistical classifiers are based on a range of assumptions about the data. For instance, the maximum-likelihood classification, one of the most widely used image classification techniques, assumes that the data for each class display a Gaussian normal distribution. This assumption is often untenable with remotely sensed data, where classes may display a range of distributions. Furthermore, because class membership is unknown (it is the objective of the classification to define it), correction for non-normality is impossible. Even if the assumptions regarding the data distribution are satisfied, the performance of the classification will be dependent on the quality of the training data used. In order to form a representative training set for an individual class, it is generally recommended that the size of the training sample should be at least 30 times the number of features (e.g., wavebands) for each class (Swain, 1978; Mather, 1987a), yet such a training set is frequently not acquired.

G. M. Foody and M. B. McCulloch are with the Department of Geography, University College of Swansea, Singleton Park, Swansea SA2 8PP, United Kingdom.
W. B. Yates is with the Department of Mathematics and Computer Science, University College of Swansea, Singleton Park, Swansea SA2 8PP, United Kingdom.

To avoid some of the problems associated with the maximum-likelihood classification, users may adopt more robust techniques. For instance, discriminant analysis is in many ways similar to the maximum-likelihood classification but is less sensitive to deviations from normality and other assumed conditions (Tom and Miller, 1984). Other alternative procedures which make no assumption about the statistical distribution of the data may also be used. A wide range of techniques are available, including non-parametric classifiers (Skidmore and Turner, 1988) and fuzzy classifiers (Kent and Mardia, 1988; Key et al., 1989; Wang, 1990). Recently, attention has focused on artificial neural network techniques as a distribution-free approach to image classification (e.g., Benediktsson et al., 1990; Liu and Xiao, 1991).

Artificial neural networks have a wide range of applications (Davalo and Naim, 1991) and have been used to classify remotely sensed data to accuracies that are generally comparable to or higher than those derived from conventional statistical classifications (Hepner et al., 1990; Medina and Vasquez, 1991; Short, 1991). In essence, an artificial neural network may be considered to comprise a relatively

Photogrammetric Engineering & Remote Sensing, Vol. 61, No. 4, April 1995, pp. 391-401.

0099-1112/95/6104-391$3.00/0
© 1995 American Society for Photogrammetry and Remote Sensing



Figure 1. A typical artificial neural network unit. The output of the unit is a function of the net input to it, which is a composite of the input strengths (x) and the connecting weights (w).

large number of simple interconnected neurons or units that work in parallel to categorise input data into output classes (Hepner et al., 1990; Schalkoff, 1992). The interconnections between the units are weighted and, with the input data, these weights determine the level of activation of a unit in the network, which in turn influences the level of activation of other units in the network and ultimately determines the network outputs (Figure 1). The magnitude of the weights is determined by an iterative training procedure in which the network repeatedly tries to learn the correct output for each of the training samples. The procedure involves modifying the weights between units until the artificial neural network is able to characterize the training data accurately. Conventionally, a backpropagation learning algorithm has been employed in which the error between the network output and desired output is minimized (Davalo and Naim, 1991; Schalkoff, 1992). Once trained, the artificial neural network may then be used to determine class membership for other data. The network architecture parameters influence the performance of an artificial neural network (Benediktsson et al., 1990; Hepner et al., 1990; Lee et al., 1990) but may be difficult to define and are typically determined subjectively. Once defined, however, the artificial neural network is more robust than conventional statistical classifiers (Hepner et al., 1990). Artificial neural networks are also more tolerant to noise and missing data (Hepner et al., 1990), can adapt over time (Short, 1991), weight the importance of the data in the classification (Benediktsson et al., 1990), and, once trained, can be more efficient computationally than conventional classifiers, especially if run on a parallel processing system (Lee et al., 1990).

As with the conventional statistical classifications, the characteristics of the training data will influence classification performance. The size of the training set, for instance, may be important in determining the relative accuracy of classifications derived from artificial neural networks and conventional statistical image classifications (Benediktsson et al., 1990; Hepner et al., 1990). It may be possible for artificial neural networks to classify a remotely sensed data set accurately with a smaller training set than is required for a conventional classification (Hepner et al., 1990) and, if so, then substantial advantages are afforded to the user: the ability to use small training sets and the absence of data distribution assumptions. This is important because, if the remotely sensed data satisfy the assumptions of conventional statistical classifiers, then no significant difference in classification accuracy is usually observed between conventional and neural network classifications (Ryan et al., 1991). Without any further advantage over conventional statistical classifiers, artificial neural networks would therefore only be


attractive for the classification of data which showed markedly non-normal distributions. In this paper we evaluate the performance of an artificial neural network against a discriminant analysis with particular attention to the characteristics of the training data used in the classification.

Data

The classifiers were evaluated using two data sets: first, X-band HH-polarized synthetic aperture radar (SAR) data for a region of flat agricultural land centered on Feltwell, United Kingdom (Figure 2), and second, a simulated data set. The latter data set will be discussed later in the paper.

The SAR data were acquired on four dates during the 1986 growing season for an approximately 100-km² test site by the VARAN-S SAR as part of the European AgriSAR campaign (Figure 3). These data suffer from a range of radiometric distortions (Quegan et al., 1991) and, consequently, required preprocessing. The major radiometric distortions were reduced by deleting duplicated lines and radiometric balancing of the imagery, and the tone of the images was standardized by a simple inter-image regression (Foody et al., 1989). Because these data were to be generalized to produce a nominal-level land-cover map, and the temporal dimension of the data was not being utilized further, more detailed corrections were not considered necessary. The analyses aimed to classify crop type on a per-field basis (Pedley and Curran, 1991) using image tone (DN) as the discriminating variable. The field mean DN from each of the four images was estimated from a sample of at least 10,000 pixels, which were located away from the field edges to avoid boundary effects, for a total of 158 fields (Foody et al., 1989). These fields had been planted to one of seven crop classes: winter wheat (42), spring barley (42), sugar beet (42), potatoes (12), carrots (7), spring wheat (5), and grass (8).

Figure 2. Location of the Feltwell test site in the United Kingdom.

Figure 3. SAR image (uncorrected) of the test site recorded on 28 June 1986.
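The inter-image regression used to standardize image tone can be sketched as a simple least-squares fit of one image's DN values against another's; the function below is an illustrative assumption, not the paper's actual procedure.

```python
def regression_standardize(reference, target):
    """Standardize the tone of `target` to `reference` by simple linear
    regression of paired DN values (an illustrative sketch of inter-image
    regression; the paper's actual procedure may differ)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_t = sum(target) / n
    # ordinary least-squares slope and intercept of reference on target
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(target, reference))
    var = sum((t - mean_t) ** 2 for t in target)
    slope = cov / var
    intercept = mean_r - slope * mean_t
    return [slope * t + intercept for t in target]

print(regression_standardize([10, 12, 14], [0, 1, 2]))  # → [10.0, 12.0, 14.0]
```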

The Classifiers

The two classification techniques used in the analyses were a discriminant analysis (Klecka, 1980) and a feedforward artificial neural network using a variant of the backpropagation learning algorithm (Yates et al., 1993). The discriminant analysis, like the maximum-likelihood classification, allocates each case to the class with which it has the highest a posteriori probability of membership (Klecka, 1980; Tom and Miller, 1984), and has been widely used in the classification of remotely sensed data. Although the discriminant analysis is a relatively robust technique (Tom and Miller, 1984), an artificial neural network may be a more attractive approach to classification when the data show a marked deviation from normality.
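The allocation rule described above, assigning each case to the class with the highest a posteriori probability, might be sketched as follows for a single feature with Gaussian class densities; the class names, parameter values, and function are illustrative only, not taken from the paper.

```python
import math

def allocate(x, class_stats, priors):
    """Allocate a case to the class with the highest a posteriori probability,
    assuming (for illustration) one feature and Gaussian class densities.

    class_stats : {class_name: (mean, std)}
    priors      : {class_name: prior probability of occurrence}
    """
    def log_posterior(c):
        mean, std = class_stats[c]
        # log Gaussian density, up to a constant common to all classes
        log_density = -math.log(std) - 0.5 * ((x - mean) / std) ** 2
        return math.log(priors[c]) + log_density
    return max(class_stats, key=log_posterior)

stats = {"wheat": (10.0, 2.0), "beet": (20.0, 2.0)}
print(allocate(14.0, stats, {"wheat": 0.5, "beet": 0.5}))    # → wheat
# a strongly skewed prior can overturn the density-based allocation:
print(allocate(14.0, stats, {"wheat": 0.05, "beet": 0.95}))  # → beet
```

The second call illustrates the effect, discussed later in the paper, of incorporating a priori information on class occurrence into the discriminant analysis.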

An artificial neural network is constructed from a set of processing units interconnected by weighted channels according to some architecture. Each unit consists of a number of input channels, an activation function, and an output channel. Signals impinging on a unit's inputs are multiplied by the channel's weight and are summed to derive the net input to that unit. The net input (net) is then transformed by the activation function (f) to produce an output for the unit (Figure 1). This may be expressed as

net = Σ_i x_i w_i

output = f(net)

where x_i is the magnitude of the ith input and w_i is the weight of the interconnection channel.
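The two expressions above can be sketched in code; the function name and example values are illustrative, and tanh is used as the default activation since that is the function adopted later in the paper.

```python
import math

def unit_output(x, w, f=math.tanh):
    """Output of a single network unit: the activation of its net input.

    x : input signals x_i
    w : interconnection weights w_i (same length as x)
    f : activation function applied to the net input
    """
    net = sum(xi * wi for xi, wi in zip(x, w))  # net = Σ_i x_i w_i
    return f(net)                               # output = f(net)

print(unit_output([1.0, 0.5], [0.2, -0.4]))  # net = 0.0, tanh(0.0) → 0.0
```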


In order to apply the backpropagation learning algorithm, the activation function must be continuous, differentiable, and non-linear. Frequently, a sigmoid function is employed (Rumelhart et al., 1986): i.e.,

f: R → [0, 1]

f(net) = 1 / (1 + e^(−λ·net))

where λ is a gain parameter which is often, and is in this paper, set to 1 (Schalkoff, 1992). The activation function employed here is symmetric, and such functions have been shown to improve learning algorithm performance over the sigmoid function (Stornetta and Huberman, 1987). Specifically, the hyperbolic tangent function, tanh, was used. In this paper, a fully connected layered feedforward architecture with one hidden layer and an external bias unit was used (Figure 4). The units of the network are grouped into layers, with the output of a layer feeding forward to the next layer. The output of the network as a whole is simply the activation of the units in the final layer.
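A minimal sketch of the two activation functions discussed, the sigmoid with gain λ and the symmetric tanh actually used (function names are illustrative):

```python
import math

def sigmoid(net, gain=1.0):
    # f: R → [0, 1]; the gain parameter λ is set to 1 in the paper
    return 1.0 / (1.0 + math.exp(-gain * net))

def symmetric_activation(net):
    # tanh maps R → [-1, 1] and is symmetric about the origin,
    # unlike the sigmoid
    return math.tanh(net)

print(sigmoid(0.0))               # → 0.5
print(symmetric_activation(0.0))  # → 0.0
```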

The backpropagation learning algorithm (Rumelhart et al., 1986) has been widely used in pattern recognition applications of artificial neural networks; it iteratively minimizes an error function over the network outputs and a set of target outputs taken from a training data set. The process continues until the error value converges to a (possibly local) minimum. Conventionally, the error function is given as

E = (1/2) Σ_i (T_i − O_i)²

where T_i is the ith component of the target output vector (T_1, ..., T_n) for the training set and O_i is the corresponding component of the output vector from the network for the given training set. On each iteration backpropagation recursively computes the gradient or change in error with respect to each weight in the network, ∂E/∂w, and these values are used to modify the weights. Adding a fraction of the negative gradient to each weight is equivalent to performing a steepest descent minimization of the error function with respect to each weight in the network (Rumelhart et al., 1986). Because standard backpropagation tends to be slow and does not scale up to larger problems well, faster variants of backpropagation are widely used. The algorithm employed in this paper is Fahlman's (1988) Quickprop algorithm with Almeida and Silva's (1990) adaptive learning rate. In essence, Quickprop is a second-order gradient descent technique based on Newton's method (Fahlman, 1988). The algorithm recursively computes the error derivative with respect to each weight in the network as in conventional backpropagation (Rumelhart et al., 1986). However, for each weight, Quickprop uses a copy of the previous gradient and weight update, in conjunction with the current error gradient, to determine a parabola of error versus weight strength. By employing the quadratic update formula,

Δw(t) = [∂E/∂w(t) / (∂E/∂w(t−1) − ∂E/∂w(t))] · Δw(t−1)

where t indicates the iteration number, and other heuristics, Quickprop tries to jump to the minimum of the parabola. Although the derived parabola is only an approximation to the true error surface seen by each weight, the procedure when applied iteratively has been shown to be more effective than standard backpropagation (Fahlman, 1988). An extension proposed by Almeida and Silva (1990) employs an individual learning rate for each weight in the network. These learning rates are adapted during training according to the heuristic that, if the previous and current gradients have the same sign, that is, direction, the local learning rate is increased. If the gradients have different signs, the learning rate is decreased. Such a heuristic allows each learning rate to tune itself during training and has been shown to reduce zigzagging on the error surface and therefore learning times (Almeida and Silva, 1990). The network architecture and learning algorithm parameters used were tuned following a number of learning trials. The architecture selected contained as few weights as possible while still allowing significant learning to occur. Such an approach not only decreases learning algorithm run times but also increases the artificial neural network's potential for generalization (Baum, 1990). A full exposition of the algorithm employed here is beyond the scope of this article but may be found in Yates et al. (1994).

Figure 4. Basic structure of the 4:3:7 artificial neural network used: a fully connected layered feedforward network with four input units, three hidden units, and seven output units (one per crop class: sugar beet, potatoes, winter wheat, spring barley, carrots, grass, and spring wheat), with data flowing forward through the layers. For clarity, some of the network's connections have been omitted.
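A simplified sketch of the Quickprop update formula and the Almeida and Silva learning-rate heuristic described above; the fallback step, the rate factors, and the function names are illustrative assumptions, and Fahlman's full algorithm includes additional heuristics omitted here.

```python
def quickprop_step(grad, prev_grad, prev_dw, fallback_rate=0.01):
    """One Quickprop weight update from the quadratic formula (sketch).

    grad      : current error gradient, dE/dw(t)
    prev_grad : previous gradient, dE/dw(t-1)
    prev_dw   : previous weight change, dw(t-1)
    """
    denom = prev_grad - grad
    if denom == 0.0 or prev_dw == 0.0:
        # the fitted parabola is undefined; fall back to gradient descent
        return -fallback_rate * grad
    # jump towards the minimum of the parabola through the two gradients
    return (grad / denom) * prev_dw

def adapt_learning_rate(rate, grad, prev_grad, up=1.1, down=0.5):
    """Almeida and Silva's per-weight heuristic (sketch): same gradient
    sign -> increase the local rate; opposite sign -> decrease it."""
    if grad * prev_grad > 0:    # same direction: accelerate
        return rate * up
    if grad * prev_grad < 0:    # zigzag detected: decelerate
        return rate * down
    return rate

print(quickprop_step(1.0, 3.0, 0.5))  # → 0.25
```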

Classification Accuracy Assessment and Comparison

A statement of classification accuracy is required for the evaluation of classification performance. There are many approaches for the assessment of image classification accuracy and, because there are no general rules which can be followed to define an appropriate classification accuracy technique (Congalton, 1991), one simple, easy to interpret, and widely used measure, the percentage correct allocation, was used. However, the classification confusion matrices are also presented, enabling the calculation of other measures of classification accuracy if desired (Story and Congalton, 1986; Foody, 1992). For clarity, the main diagonal of each classification confusion matrix, which illustrates the correct class allocations, is highlighted.
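The percentage correct allocation is simply the sum of the confusion matrix's main diagonal over the total number of cases; a sketch with a hypothetical three-class matrix:

```python
def percentage_correct(confusion):
    """Percentage correct allocation: sum of the confusion matrix's main
    diagonal (the correct allocations) over the total number of cases."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total

# hypothetical three-class matrix (rows = actual class, columns = predicted)
m = [[28, 1, 1],
     [2, 27, 1],
     [0, 2, 28]]
print(round(percentage_correct(m), 2))  # → 92.22
```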

The significance of the difference in classification accuracy observed between the artificial neural network and discriminant analysis classifications was assessed, on the assumption that the samples may be considered independent, with the difference between proportions test (Freund and Williams, 1959; Clark and Hoskins, 1986).
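A sketch of the difference between proportions test, here as a standard pooled two-proportion z statistic; this common formulation may differ in detail from that of Freund and Williams (1959).

```python
import math

def diff_proportions_z(correct1, n1, correct2, n2):
    """z statistic for the difference between two independent proportions,
    e.g. the accuracies of two classifiers on independent samples."""
    p1, p2 = correct1 / n1, correct2 / n2
    p = (correct1 + correct2) / (n1 + n2)            # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error
    return (p1 - p2) / se

# |z| > 1.96 indicates a significant difference at the 95 percent level
print(diff_proportions_z(50, 100, 50, 100))  # → 0.0
```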

Results and Discussion

For the classifications of the SAR data, a three-layer artificial neural network architecture was used, comprising four input units, three hidden units, seven output units, and a bias unit (Figure 4). The SAR data were divided into training and testing sets for a series of classifications. For the initial set of analyses, the training set comprised 122 fields drawn from all seven classes. The testing set, however, comprised 36 fields of only the three most predominant crop types found in this region: spring barley, sugar beet, and winter wheat. To assess the level of inter-class separability and illustrate the relative ability of the artificial neural network and discriminant analysis to identify or "learn" the characteristic appearance of a class, the accuracy with which both could classify the training data was assessed. The results showed that a significantly higher classification accuracy was derived from the artificial neural network classification than from the discriminant analysis (Figure 5). This result indicates that the artificial neural network is able to learn class appearance more accurately than does the discriminant analysis.

Once trained, both classifiers were then used to classify the independent testing set. As expected, the accuracies of the classifications of the independent testing data sets were lower than those derived from the training classifications (Swain, 1978). As with the training classification, the artificial neural network was able to classify the data to a significantly higher accuracy than the discriminant analysis (Figure 5).

The results above were derived under the assumption that each class had an equal a priori probability of occurrence. Often this is not the case in remote sensing and, for the data set under investigation, there was a marked range in class occurrence, with three classes covering most of the test site. If this information on class occurrence is known before the analysis, it may be used to increase classification accuracy (Strahler, 1980). Incorporating prior probabilities, defined in direct proportion to the number of fields in each class in the training set, into the discriminant analysis resulted in an increase in classification accuracy (Figure 6). It is evident, however, that the classifications derived from the artificial neural network were slightly, but not significantly, more accurate than those from the discriminant analysis even with the inclusion of prior information for a training classification. With the classification of the testing data, however, the discriminant analysis with prior information was able to classify the data to the same accuracy as the artificial neural network. It was also apparent that the distribution of class allocation evident in the confusion matrices derived from the discriminant analysis with a priori information was similar to that from the artificial neural network. These results, therefore, appear to indicate that the artificial neural network is superior to the discriminant analysis in terms of characterizing class appearance and so indicating a high potential for accurate inter-class discrimination. This may not, however, always translate into a more accurate classification of an independent testing set if prior information on class occurrence is available to the discriminant analysis. The use of prior information within an artificial neural network is more



Figure 5. Confusion matrices for the classifications of the SAR data by a discriminant analysis (DA) with equal prior probabilities of class occurrence (priors = equal) assumed and by the artificial neural network (ANN). Note that the training set comprised seven classes and the testing set three classes. The class codes are: SpB - Spring barley; SuB - Sugar beet; WW - Winter wheat; SW - Spring wheat; P - Potatoes; C - Carrots; G - Grass.

Figure 6. Confusion matrices for the classifications of the SAR data by a discriminant analysis (DA) with a priori information of class occurrence. Prior probabilities were defined in direct proportion to the size of each class in the training set (priors = size).

problematic and is the focus of on-going research, with present emphasis on the effect of training set size on the opportunity to learn class appearance (Foody et al., 1992b; 1993).

Another important issue is the reaction of the classification to cases that are members of an untrained class. Frequently, in a supervised image classification, the image will contain classes for which a training set has not been acquired. For instance, an image used for a crop classification may contain classes of no interest to the user, such as forest, water, or urban land, for which training sets were not defined. Cases of these untrained classes must, however, be allocated to one of the trained classes by the classification. The effect of untrained classes on the performance of the two classifiers was assessed for a data set in which the training set comprised 90 cases of the three most predominant classes but was used to classify a testing set containing 68 cases of all seven classes (Figure 7). As with the previous classifications, the artificial neural network classified the training data to a higher accuracy than did the discriminant analysis. It was evident, however, that the discriminant analysis classified the testing set to a marginally, but not statistically significant, higher accuracy than the artificial neural network, and inspection of Figure 7 shows that the difference in classification accuracy was not caused by cases of the untrained classes. The value of the classifications could be improved if untrained classes were identifiable. By outputting the relative strength of class membership in addition to the most likely class of membership, it may be possible to identify cases that may be members of untrained classes. Measures of the strength of class membership can be derived from a range of classifiers and used to improve the value of the classification (e.g., Foody et al., 1992b). The outputs of an artificial neural network classifier, computing over the real numbers, may be interpreted as such a measure of class membership, with the convention that the output with the largest value is taken as the predicted class of membership. The potential of this measure for identifying members of untrained classes was assessed with reference to the strengths of class membership derived from a previous classification, the training classification of 122 cases drawn from all seven classes. Analysis of the relative strength of membership each case displayed to each class revealed that a case typically showed a high strength of membership to its allocated class and a low strength to others (Figure 8). This indicated that the artificial neural network provided a fairly hard allocation, like the discriminant analysis (Table 1), and that this measure of the strength of class membership may not be useful for the identification of cases of untrained classes.

Figure 7. Confusion matrices for the classifications of SAR data by discriminant analysis (DA) and artificial neural network (ANN). Note that here the training set comprised three classes but the testing set contained seven classes, four of which were untrained classes.
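The convention described, taking the largest real-valued output as the predicted class, can be sketched as follows, with an illustrative threshold for flagging weak maxima as possible members of untrained classes (the threshold value is an assumption, not from the paper):

```python
def allocate_with_strength(outputs, classes, threshold=0.5):
    """Interpret real-valued network outputs as strengths of class
    membership: the largest output gives the predicted class; a weak
    maximum may flag a possible member of an untrained class."""
    strongest = max(range(len(outputs)), key=lambda i: outputs[i])
    flagged = outputs[strongest] < threshold  # weak maximum: possibly untrained
    return classes[strongest], flagged

print(allocate_with_strength([0.1, 0.9, 0.2], ["SpB", "SuB", "WW"]))
# → ('SuB', False)
```

As the paper notes, however, the network's allocations were fairly hard, so in practice weak maxima of this kind were rare.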

The ability of the classifiers to accurately determine class membership for an independent testing set will be dependent on the quality of the description of class appearance generated in the training stage. Two key issues related to this are the representativeness of the training data and their statistical distribution. While the latter should only affect the performance of the discriminant analysis, because the artificial neural network is a distribution-free technique (Medina and Vasquez, 1991), it is essential for all classifiers that the training data are representative of the class from which they are drawn if they are to be of value in determining class membership for cases of previously unknown class. The effects of data distribution and representativeness were assessed in a series of classifications using a simulated data set. The latter comprised four classes, A, B, C, and D, each with 100 cases. Each class was generated with a normal distribution, the characteristics of which are given in Table 2; normality was assessed by the Kolmogorov-Smirnov test. The data for each class were then split into training and testing sets comprising 33 and 67 cases, respectively. The architecture of the artificial neural network was modified from that shown in Figure 4 to one with one input, three hidden, and four output units, a 1:3:4 structure.
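A data set of this form can be generated and checked as below. The class means and standard deviations approximate the values legible in Table 2 of the source; the random seed and the use of scipy's Kolmogorov-Smirnov routine are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # assumed seed, for reproducibility

# Class means and standard deviations approximating Table 2.
params = {"A": (60.74, 4.49), "B": (74.17, 14.77),
          "C": (90.77, 10.05), "D": (120.73, 4.98)}

train, test, pvalues = {}, {}, {}
for name, (mu, sd) in params.items():
    x = rng.normal(mu, sd, size=100)             # 100 cases per class
    # Kolmogorov-Smirnov test of the standardized sample against
    # a standard normal distribution.
    _, pvalues[name] = stats.kstest((x - x.mean()) / x.std(), "norm")
    train[name], test[name] = x[:33], x[33:]     # 33 training, 67 testing
```

A large Kolmogorov-Smirnov p-value is consistent with the normality the simulation intends; the classes are then carried forward as 33-case training and 67-case testing splits.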

As with the analyses of the SAR data, the accuracy with which the training and testing data could be classified was assessed for all classifications. In the first set of analyses, the training data for each class were sampled to ensure that they were representative of the class from which they were drawn (Table 3); representativeness was assessed by the Mann-Whitney U test. Classifications of the training and testing data by the discriminant analysis and artificial neural network were found to be of comparable accuracy, although the artificial neural network classifications were marginally more accurate (Figure 9). Keeping the size of the training and testing sets the same but taking a non-representative sample (Table 4) to form the training set, the classifications were repeated; here an unrepresentative sample is considered to be one drawn from the population but having a mean that is dissimilar to that of the population at the 95 percent level of confidence. Again, the performance of both classifiers was fairly similar, but there was a marked difference in the accuracy with which the training and testing sets were classified (Figure 10). This result is to be expected because the training set for each class would not be representative of that class in the testing set. It was apparent that both classifiers were affected similarly by unrepresentative training data, and users may therefore need to exercise care in selecting training sites and ensure, for instance, that atypical cases are excluded from training sets through the application of a training set purification procedure (Mather, 1987b; Arai, 1992). It may be possible to increase the accuracy of the artificial neural network classification by including a "noise" term which lessens the sensitivity of the classification to the exact characteristics of the training data set (Holmström and Koistinen, 1992).
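The representativeness check can be sketched as follows. The population parameters, the seed, and the way the unrepresentative sample is constructed (taking the largest cases so that the sample mean shifts away from the population mean) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)                    # assumed seed
population = rng.normal(60.74, 4.49, size=100)    # one simulated class

# Representative sample: 33 cases drawn at random from the class.
representative = rng.choice(population, size=33, replace=False)

# Unrepresentative sample: the 33 largest cases, so the sample mean
# is shifted away from the population mean (an assumed construction).
unrepresentative = np.sort(population)[-33:]

# Two-sided Mann-Whitney U tests of each sample against the class.
_, p_rep = mannwhitneyu(representative, population, alternative="two-sided")
_, p_unrep = mannwhitneyu(unrepresentative, population, alternative="two-sided")
# A sample is treated as unrepresentative when p < 0.05, i.e., when
# it is dissimilar at the 95 percent level of confidence.
```

The shifted sample yields a very small p-value and would be rejected as a training set, while the random subsample does not.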

Thus far, the simulated data for each class had displayed a normal distribution. To assess the effect of data non-normality on the relative performance of the classifiers, the data for each class were deliberately skewed through the application of power functions (Table 5). These non-normal data were then used in a set of classifications using representative and non-representative training samples as before.
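A sketch of this skewing step is given below, assuming a cubic power function and a rescaling constant; the exact functions of Table 5 are not legible in the source, so these are illustrative choices only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)                   # assumed seed
a = rng.normal(60.74, 4.49, size=100)            # normally distributed class A

# Apply a power function to skew the class; the cube and the
# rescaling constant are assumptions standing in for Table 5.
w = (a ** 3) / 1.0e3                             # skewed "class W"

# Cubing the strictly positive values stretches the upper tail,
# so the skewness statistic increases relative to the original data.
skew_before = stats.skew(a)
skew_after = stats.skew(w)
```

The transformed values remain monotonically related to the originals, so class separability changes mainly through the altered overlap of the stretched distributions, which is the effect the paper attributes to classes W and Z.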

With a representative sample of the non-normally distributed data used for training both classifiers, the accuracy with which the artificial neural network and discriminant analysis could classify the training and testing data was assessed. The classifications were less accurate than the com-

PE&RS


PEER-REVIEWED ARTICLE

Figure 8. Variations in the strength of class membership derived from the artificial neural network for all cases to each class. Derived from the training classification of 122 cases of seven classes. (The figure plots the distribution of the strength of membership for each of the seven classes: spring barley, sugar beet, winter wheat, spring wheat, potatoes, carrots, and grass. The plotted values are not reproducible from the scan.)


Table 1. A posteriori probabilities of class membership derived from the training classification of 122 cases of seven classes by the discriminant analysis. Note that, as expected, the classification is relatively hard.

(The table reports, for each of the seven classes, the mean, minimum, and maximum a posteriori probability of membership to the actual class for correctly allocated cases, to the actual and to the predicted class for incorrectly allocated cases, and the number of cases allocated incorrectly. The individual values are not reliably legible in the scan.)

Table 2. Basic description of the statistical distribution of the simulated data sets.

          Mean     Standard deviation
Class A    60.74     4.49
Class B    74.17    14.77
Class C    90.77    10.05
Class D   120.73     4.98

Figure 9. Confusion matrices for the classifications of the simulated data by the discriminant analysis (DA) and artificial neural network (ANN). The data sets were distributed normally and representative training sets were used. (Legible accuracy figures: DA training 76.52 percent; ANN training 78.79 percent; ANN testing 78.36 percent. The matrix entries are not reliably legible in the scan.)

Table 3. Mean and standard deviation (Std Dev) of the training (representative sample) and testing data used in the classifications on which Figure 9 is based.

              Training              Testing
          Mean     Std Dev      Mean     Std Dev
Class A    60.32     4.28       60.95     4.61
Class B    73.13    14.03       74.69    14.31
Class C    89.95     9.73       91.09    10.26
Class D   119.19     4.90      120.30     5.05

parable classifications derived using normally distributed data (Figures 9 and 11). This result appears to have arisen as a function of increased overlap between classes W and Z as a result of the deliberate skewing of the distributions, rather than as a result of the distributions being non-normal. As with the classifications using normally distributed data with a representative training set, the accuracies of the training and testing classifications were similar for both the artificial neural network and discriminant analysis classifications, although the accuracy of the artificial neural network classifications was again marginally higher.

The accuracy of classifications using a non-representative training set of non-normally distributed data was found to be higher from the artificial neural network than from the discriminant analysis. For both classifiers, there was also a difference in the accuracy of the training and testing classifications, although the magnitude was similar for both techniques (Figure 12).

The results of the classifications of the simulated data set therefore appear to show that the artificial neural network classifies the data as or more accurately than the discriminant analysis. It seems that the ability of the two classifiers to characterize classes, as indicated by the training classifications, is similar, except where the data have a non-normal distribution. In the latter situation, the artificial neural network classified the data to a higher accuracy. Also, there was only an appreciable difference between the accuracy of training and testing classifications when an unrepresentative training set was used. These results, especially given the difficulties in defining a truly representative training set and in assessing and correcting for non-normality in the data, indicate that artificial neural networks have considerable potential for the classification of remotely sensed data, especially as the statistical distribution of the data for each class is often unknown and the analysis is often based on a small training set.

The undoubted advantages of the artificial neural network based classifications over the discriminant analysis in certain circumstances must, however, be seen in the light of their limitations. Among the latter is included the largely subjective manner in which the network architecture is de-


Table 4. Mean and standard deviation (Std Dev) of the training (non-representative sample) and testing data used in the classifications on which Figure 10 is based.

              Training              Testing
          Mean     Std Dev      Mean     Std Dev
Class A    63.32     4.53       59.20     3.35
Class B    83.47    13.49       69.60    12.1[?]
Class C    97.86    10.21       87.19     7.93
Class D   123.57     4.52      118.90     4.09

([?] marks a digit that is not legible in the scan.)

Table 5. Characteristics of the skewed simulated data set. Classes W, X, Y, and Z all have skewed distributions and are non-normally distributed at the 90 percent level of confidence.

(The table lists, for each input class, the power function applied, the new class it produces, and the new class mean: A maps to W, B to X, C to Y, and D to Z. The specific power functions and new class means are not reliably legible in the scan.)

Figure 10. Confusion matrices for the classifications of the simulated data by the discriminant analysis (DA) and artificial neural network (ANN). The data sets were distributed normally and non-representative training sets were used. (Legible accuracy figures: DA training 80.30 percent and testing 57.54 percent; ANN training 81.82 percent and testing 67.16 percent. The matrix entries are not reliably legible in the scan.)

fined. Furthermore, the artificial neural network is computationally more demanding than the discriminant analysis in the training stage of the classification; in the analyses performed here, the training stage of the artificial neural network classifications involved some 500 iterations.
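The iterative training the paper refers to can be illustrated with a minimal sketch of a 1:3:4 feed-forward network (the one-input, three-hidden, four-output structure used for the simulated data) trained by error backpropagation on squared error. The learning rate, sigmoid units, weight initialization, and toy training cases are assumptions, not the study's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1:3:4 structure: one input, three hidden, four output units.
W1 = rng.normal(scale=0.5, size=(1, 3)); b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(3, 4)); b2 = np.zeros(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, lr=0.5):
    """One forward and backward pass for a single case; `x` has
    shape (1,), `target` is a one-hot vector of length 4.
    Returns the squared error for the case."""
    global W1, b1, W2, b2
    h = sigmoid(x @ W1 + b1)                  # hidden activations
    o = sigmoid(h @ W2 + b2)                  # output activations
    # Backpropagate the squared-error gradient through the sigmoids.
    delta_o = (o - target) * o * (1 - o)
    delta_h = (delta_o @ W2.T) * h * (1 - h)
    W2 -= lr * np.outer(h, delta_o); b2 -= lr * delta_o
    W1 -= lr * np.outer(x, delta_h); b1 -= lr * delta_h
    return ((o - target) ** 2).sum()

# Roughly 500 iterations over a toy two-case training set, echoing
# the iteration count the paper reports.
data = [(np.array([0.1]), np.eye(4)[0]), (np.array([0.9]), np.eye(4)[3])]
first = sum(train_step(x, t) for x, t in data)
for _ in range(499):
    last = sum(train_step(x, t) for x, t in data)
```

Each iteration updates every weight, which is why training cost grows with network size and iteration count, the computational burden the paper contrasts with the one-shot fitting of the discriminant analysis.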

Conclusions

From classifications of SAR data and simulated data sets by an artificial neural network and discriminant analysis, four main conclusions may be drawn:

• The artificial neural network consistently provided a higher training classification accuracy than did the discriminant analysis, indicating that it is more able to accurately characterize class appearance. The differences, however, were only significant if the data were non-normally distributed.


• The better training classification performance of the artificial neural network does not always imply a better ability to classify an independent testing set. The results of a series of classifications showed that higher classification accuracies were derived from the artificial neural network but the differences were at times insignificant, particularly when a priori information was available to the discriminant analysis. The use of a priori information in an artificial neural network is currently under further investigation.

• Non-representative training data lead, as expected, to significant differences between training and testing classification accuracies, and the effect was fairly similar for both the artificial neural network and discriminant analysis classifications.

Figure 11. Confusion matrices for the classifications of the simulated data by the discriminant analysis (DA) and artificial neural network (ANN). The data sets were distributed non-normally and representative training sets were used. (Legible accuracy figures: DA training 55.30 percent and testing 55.97 percent; ANN training 60.61 percent and testing 62.31 percent. The matrix entries are not reliably legible in the scan.)

Figure 12. Confusion matrices for the classifications of the simulated data by the discriminant analysis (DA) and artificial neural network (ANN). The data sets were distributed non-normally and non-representative training sets were used. (Legible accuracy figures: DA training 61.36 percent and testing 49.25 percent; ANN 65.15 and 55.97 percent. The matrix entries are not reliably legible in the scan.)


• The artificial neural network classifications were as or more accurate than those derived from the discriminant analysis, except for one classification when the accuracy was slightly and insignificantly lower. Furthermore, unlike conventional statistical classifiers, the artificial neural network is not based on often untenable assumptions about the data. Consequently, artificial neural network techniques are likely to have considerable potential for the classification of remotely sensed data.

Acknowledgments

We are grateful to the Commission of the European Communities for the provision of the SAR and ground data and the anonymous referees for their comments. MBM is supported by NERC research studentship GT4/90/TLS/60 and WBY by SERC research studentship BgB 00 829. An earlier version of this paper was presented at the Remote Sensing Society's annual conference in 1992 at Dundee University.

References

Almeida, L. B., and F. M. Silva, 1990. Acceleration techniques for the backpropagation algorithm, Lecture Notes in Computer Science (L. B. Almeida and C. J. Wellekens, editors), Neural Networks EURASIP Workshop, 412:110-119.

Arai, K., 1992. A supervised Thematic Mapper classification with a purification of training samples, International Journal of Remote Sensing, 13:2039-2049.

Baum, E. B., 1990. When are k-nearest neighbour and backpropagation accurate for feasible sized sets of examples? Lecture Notes in Computer Science (L. B. Almeida and C. J. Wellekens, editors), Neural Networks EURASIP Workshop, 412:2-25.

Benediktsson, J. A., P. H. Swain, and O. K. Ersoy, 1990. Neural network approaches versus statistical methods in classification of multisource remote sensing data, IEEE Transactions on Geoscience and Remote Sensing, 28:540-552.

Campbell, J. B., 1981. Spatial correlation effects upon the accuracy of supervised classification of land cover, Photogrammetric Engineering & Remote Sensing, 47:355-363.

Clark, W. A. V., and P. L. Hosking, 1986. Statistical Methods for Geographers, Wiley, New York.

Congalton, R. G., 1991. A review of assessing the accuracy of classifications of remotely sensed data, Remote Sensing of Environment, 37:35-46.

Davalo, E., and P. Naim, 1991. Neural Networks, Macmillan, Basingstoke.

Fahlman, S. E., 1988. Faster-learning variations of back-propagation: an empirical study, Proceedings of the 1988 Connectionist Models Summer School (Carnegie Mellon University, 17-26 June), Morgan Kaufmann, San Mateo, pp. 38-51.

Foody, G. M., 1988. The effects of viewing geometry on image classification, International Journal of Remote Sensing, 9:1909-1915.

Foody, G. M., 1992. Classification accuracy assessment: some alternatives to the kappa coefficient for nominal and ordinal level classifications, Remote Sensing - From Research to Operation, Remote Sensing Society, Nottingham, pp. 529-538.

Foody, G. M., P. J. Curran, G. B. Groom, and D. C. Munro, 1989. Multi-temporal airborne synthetic aperture radar data for crop classification, Geocarto International, 4:19-29.

Foody, G. M., N. A. Campbell, N. M. Trodd, and T. F. Wood, 1992a. Derivation and applications of probabilistic measures of class membership from the maximum-likelihood classification, Photogrammetric Engineering & Remote Sensing, 58:1335-1341.

Foody, G. M., M. B. McCulloch, and W. B. Yates, 1992b. An assessment of an artificial neural network for image classification, Remote Sensing - From Research to Operation, Remote Sensing Society, Nottingham, pp. -507.

Foody, G. M., M. B. McCulloch, and W. B. Yates, 1993. Training set size and the accuracy of artificial neural network and statistical image classification, Towards Operational Applications, Remote Sensing Society, Nottingham, pp. 211-218.

Freund, J. E., and F. J. Williams, 1985. Modern Business Statistics, Pitman, London.

Hepner, G. F., T. Logan, N. Ritter, and N. Bryant, 1990. Artificial neural network classification using a minimal training set: comparison to conventional supervised classification, Photogrammetric Engineering & Remote Sensing, 56:469-473.

Holmström, L., and P. Koistinen, 1992. Using additive noise in back-propagation training, IEEE Transactions on Neural Networks, 3:24-38.

Kent, J. T., and K. V. Mardia, 1988. Spatial classification using fuzzy membership models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 10:659-671.

Key, J. R., J. A. Maslanik, and R. G. Barry, 1989. Cloud classification from satellite data using a fuzzy sets algorithm: a polar example, International Journal of Remote Sensing, 10:1823-1842.

Klecka, W. R., 1980. Discriminant Analysis, Sage Publications, Beverly Hills.

Lee, J., R. C. Weger, S. K. Sengupta, and R. M. Welch, 1990. A neural network approach to cloud classification, IEEE Transactions on Geoscience and Remote Sensing, 28:846-855.

Liu, Z. K., and J. Y. Xiao, 1991. Classification of remotely-sensed image data using artificial neural networks, International Journal of Remote Sensing, 12:2433-2438.

Mather, P. M., 1987a. Computer Processing of Remotely Sensed Images, Wiley, Chichester.

Mather, P. M., 1987b. Preprocessing of training data for multispectral image classification, Advances in Digital Image Processing, Remote Sensing Society, Nottingham, pp. 111-120.

Medina, F. J., and R. Vasquez, 1991. Use of simulated neural networks for aerial image classification, ACSM-ASPRS Annual Convention, ACSM and ASPRS, Bethesda, Maryland, pp. 268-274.

Pedley, M. I., and P. J. Curran, 1991. Per-field classification: an example using SPOT HRV imagery, International Journal of Remote Sensing, 12:2181-2192.

Quegan, S., C. C. F. Yanasse, H. de Groof, P. N. Churchill, and A. Sieber, 1991. The radiometric quality of AgriSAR data, International Journal of Remote Sensing, 12:277-302.

Rumelhart, D. E., G. E. Hinton, and R. J. Williams, 1986. Learning internal representations by error propagation, Parallel Distributed Processing: Explorations in the Microstructure of Cognition (D. E. Rumelhart and J. L. McClelland, editors), MIT Press, Cambridge, Massachusetts, pp. 318-362.

Ryan, T. W., P. J. Sementilli, P. Yuen, and B. R. Hunt, 1991. Extraction of shoreline features by neural nets and image processing, Photogrammetric Engineering & Remote Sensing, 57:947-955.

Schalkoff, R., 1992. Pattern Recognition: Statistical, Structural and Neural Approaches, Wiley, New York.

Schneider, S., 1980. Interpretation of satellite imagery for determination of land use data, International Journal of Remote Sensing, 1:85-89.

Short, N., 1991. A real-time expert system and neural network for the classification of remotely sensed data, ACSM-ASPRS Annual Convention, ACSM-ASPRS, Bethesda, Maryland, pp. 406-418.

Skidmore, A. K., and B. J. Turner, 1988. Forest mapping accuracies are improved using a supervised nonparametric classifier, Photogrammetric Engineering & Remote Sensing, 54:1415-1421.

Stornetta, W. S., and B. A. Huberman, 1987. An improved three-layer backpropagation algorithm, IEEE First International Conference on Neural Networks, San Diego, 3:632-644.

Story, M., and R. G. Congalton, 1986. Accuracy assessment: a user's perspective, Photogrammetric Engineering & Remote Sensing, 52:397-399.

Strahler, A. H., 1980. The use of prior probabilities in maximum likelihood classification of remotely sensed data, Remote Sensing of Environment, 10:135-163.

Swain, P. H., 1978. Fundamentals of pattern recognition, Remote Sensing: The Quantitative Approach (P. H. Swain and S. M. Davis, editors), McGraw-Hill, New York, pp. 136-187.

Thomas, I. L., V. M. Benning, and N. P. Ching, 1987. Classification of Remotely Sensed Images, Adam Hilger, Bristol.

Tom, C. H., and L. D. Miller, 1984. An automated land-use mapping comparison of the Bayesian maximum likelihood and linear discriminant analysis algorithms, Photogrammetric Engineering & Remote Sensing, 50:193-207.

Wang, F., 1990. Fuzzy supervised classification of remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, 28:194-201.

Yates, W. B., J. V. Tucker, and B. C. Thompson, 1994. Artificial Neural Networks as Synchronous Concurrent Algorithms: Algebraic Specification of a Feed Forward Backpropagation Network, University of Wales, Technical Report (in press).

(Received 6 November 1992; accepted 17 March 1993; revised 16 June 1993)

Giles M. Foody
Giles M. Foody graduated from the University of Sheffield with a B.Sc. degree in geography in 1983 and remained at Sheffield to complete a Ph.D. in remote sensing in 1986. He is currently Lecturer in topographic science and from October 1994 Senior Lecturer in geography at the University College of Swansea (University of Wales).

Mary B. McCulloch
Mary B. McCulloch graduated with a B.A. degree in geography from Saint David's University College (University of Wales) in 1986 and then completed an M.Sc. in Environmental Management at the University of Stirling. She is currently completing a Ph.D. focused on land cover classification from imaging radar data.

William B. Yates
William B. Yates received a B.Sc. degree in computer science from the University College of Swansea (University of Wales) in 1989 and is currently completing a Ph.D. degree in computer science at Swansea on the formal specification and correctness of neurocomputers and their applications.
