REMOTE SENSING CLASSIFICATION.
ALGORITHMS ANALYSIS APPLIED TO
LAND COVER CHANGE.
GERMAN ALBA
Seminar
Master in Emergency Early Warning and Response Space Applications. Mario Gulich
Institute, CONAE. Argentina
October, 2014
Contents
1. Introduction
2. Classification Algorithms: Categorization
   2.1. Supervised and Unsupervised
   2.2. Parametric and Non-parametric
   2.3. Per-pixel and Subpixel classifiers and hard and soft classifiers
3. Maximum likelihood
4. Decision Tree
5. Support Vector Machine
6. Comparison of SVM, MLC and DT performance
7. Fuzzy Classification
   7.1. Fuzzy sets and fuzzy logic theory
   7.2. Soft classifiers
8. Land cover change detection
9. Conclusions
10. References
1. Introduction
Remote-sensing research focusing on image classification has long attracted the
attention of the remote-sensing community because classification results are the basis for many
environmental and socioeconomic applications. However, classifying remotely sensed data into a
thematic map remains a challenge because many factors, such as the complexity of the landscape in a
study area, selected remotely sensed data, and image-processing and classification approaches, may
affect the success of a classification (D. LU and Q. WENG, 2007).
Remote-sensing classification is a complex process and requires consideration of many factors.
The major steps of image classification may include determination of a suitable classification system,
selection of training samples, image preprocessing, feature extraction, selection of suitable
classification approaches, post-classification processing, and accuracy assessment. The user’s need,
scale of the study area, economic condition, and analyst’s skills are important factors influencing the
selection of remotely sensed data, the design of the classification procedure, and the quality of the
classification results (D. LU and Q. WENG, 2007).
This report will focus on the image classification process itself, exploring four approaches: Maximum Likelihood, Decision Tree, Support Vector Machine, and fuzzy classification theory, together with some examples.
2. Classification Algorithms: Categorization
Classification algorithms can be grouped into several categories, depending on the criteria on which we focus.
2.1. Supervised and Unsupervised
The first big separation is between supervised and unsupervised classification algorithms, which depends on whether training samples are needed or not.
Supervised classification is the most widely used because of its accuracy, but field information is needed. Land cover classes are defined, and sufficient reference data has to be available and used as training samples. The signatures generated from the training samples are then used to train the classifier to classify the spectral data into a thematic map (D. LU and Q. WENG, 2007). Some examples of this kind of approach are: maximum likelihood, minimum distance, artificial neural network, and decision tree classifiers. Later on, maximum likelihood and decision tree will be explained.
Unsupervised classification is also widely used as a first step to understand the spectral response of the different covers in a satellite image. Clustering-based algorithms are used to partition the spectral
image into a number of spectral classes based on the statistical information inherent in the image. No
prior definitions of the classes are used. The analyst is responsible for labeling and merging the
spectral classes into meaningful classes (D. LU and Q. WENG, 2007). ISODATA and K-means are the
most used algorithms for this.
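As a minimal sketch of how a clustering-based unsupervised classifier partitions pixel spectra, the following Python/NumPy example implements a basic K-means loop on synthetic two-cover data; the band values and cluster count are illustrative assumptions, not from the text:

```python
import numpy as np

# Synthetic 3-band pixel spectra for two spectrally distinct covers
# (band values are illustrative, not taken from any real sensor).
rng = np.random.default_rng(0)
pixels = np.vstack([
    rng.normal([50, 60, 40], 5, size=(100, 3)),
    rng.normal([120, 110, 90], 5, size=(100, 3)),
])

def kmeans(data, k, n_iter=20, seed=0):
    """Minimal K-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest spectral centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean spectrum of its cluster.
        centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels

centroids, labels = kmeans(pixels, k=2)
# The algorithm only produces spectral classes; the analyst must still
# label and merge them into meaningful land-cover classes.
```

Note that, as the text says, the output is purely spectral: the cluster ids carry no thematic meaning until the analyst labels them.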
2.2. Parametric and Non-parametric
Another important criterion for categorization is parametric versus non-parametric classifiers. This distinction depends on whether parameters such as the mean vector and covariance matrix are used or not.
In parametric classifiers, a Gaussian distribution is assumed. The parameters (e.g. mean vector and covariance matrix) are often generated from training samples. When the landscape is complex, parametric
classifiers often produce ‘noisy’ results. Another major drawback is that it is difficult to integrate
ancillary data, spatial and contextual attributes, and non-statistical information into a classification
procedure (D. LU and Q. WENG, 2007). Examples of this are Maximum likelihood and linear
discriminant analysis.
In non-parametric classifiers, no assumption about the data distribution is required. They do not employ statistical parameters to calculate class separation and are especially suitable for incorporation of non-remote-sensing data into a classification procedure (D. LU and Q. WENG, 2007). Examples are artificial
neural network, decision tree classifier, evidential reasoning, support vector machine, expert system.
Later on, decision tree and support vector machine will be explained.
2.3. Per-pixel and Subpixel classifiers and hard and soft classifiers
This separation mainly distinguishes classical classification from fuzzy classification. It depends on which kind of pixel information is used and whether the output is a definitive decision about the land cover class or not. Per-pixel classifiers, which include most traditional classifiers, typically develop a signature by combining the spectra of all training-set pixels for a given feature. The resulting signature contains the contributions of all materials present in the training-set pixels, ignoring the mixed-pixel problem. Related to these are the hard classifiers: they make a definitive decision about the land cover class of each pixel, allocating it to a single class. Area estimation by hard classification may produce large errors, especially from coarse spatial resolution data, due to the mixed-pixel problem. Most classifiers are examples of this, such as maximum likelihood, minimum distance, artificial neural network, decision tree, and support vector machine (D. LU and Q. WENG, 2007).
In subpixel classifiers, the spectral value of each pixel is assumed to be a linear or non-linear
combination of defined pure materials (or end members), providing proportional membership of each
pixel to each end member (D. LU and Q. WENG, 2007). Soft classifiers are similar: for each pixel, a measure of the degree of similarity to every class is provided. Soft classification generates more information and potentially a more accurate result, especially for coarse spatial resolution data. This is how fuzzy classifiers work in general, such as fuzzy-set classifiers, subpixel classifiers, and spectral mixture analysis. Later on, fuzzy set logic will be explained.
3. Maximum likelihood
The maximum likelihood classification algorithm is one of the well-known parametric classifiers used for supervised classification.
The maximum-likelihood classifier is a parametric classifier that relies on the second- order
statistics of a Gaussian probability density function (pdf) model for each class. It is often used as a
reference for classifier comparison because, if the class pdf's are indeed Gaussian, it is the optimal
classifier (Paola J. D., Schowengerdt R. A, 1995). The basic discriminant function for each class is

$$g_i(X) = P(\omega_i)\, p(X \mid \omega_i) = P(\omega_i)\, \frac{1}{(2\pi)^{n/2}\,|\Sigma_i|^{1/2}} \exp\!\left[-\tfrac{1}{2}(X - U_i)^T \Sigma_i^{-1} (X - U_i)\right]$$

where $n$ is the number of bands, $X$ is the data vector, $U_i$ is the mean vector of class $i$, $\Sigma_i$ is the covariance matrix of class $i$, and $P(\omega_i)$ is the a priori probability of class $i$.
The values of the mean vector $U_i$ and the covariance matrix $\Sigma_i$ are estimated from the training data by the unbiased estimators

$$U_i = \frac{1}{P_i} \sum_{X \in \omega_i} X, \qquad \Sigma_i = \frac{1}{P_i - 1} \sum_{X \in \omega_i} (X - U_i)(X - U_i)^T$$

where $P_i$ is the number of training patterns in class $i$. Note that in order for the inverse of the covariance matrix to be calculated, $P_i$ must be at least one greater than the number of image bands. The discriminant function can be reduced, by taking the natural log and discarding the constant $(2\pi)^{n/2}$ term, to

$$g_i(X) = \ln P(\omega_i) - \tfrac{1}{2}\ln|\Sigma_i| - \tfrac{1}{2}(X - U_i)^T \Sigma_i^{-1} (X - U_i)$$

If the a priori probabilities are assumed to be equal, the first term is a constant and can be ignored. The second term is a constant for each class. This leaves only the third term to be calculated for each pixel during classification. The discriminant $g_i(X)$ is calculated for each class, and the class with the highest value is selected for the final classification map (Paola J. D., Schowengerdt R. A, 1995).
The advantage of the MLC as a parametric classifier is that it takes into account the variance-covariance within the class distributions, and for normally distributed data the MLC performs better than the other known parametric classifiers (Erdas, 1999). However, for data with a non-normal distribution, the results may be unsatisfactory (Otukei, Blaschke, 2010).
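The reduced discriminant described above can be sketched in a few lines of Python/NumPy. The two-band training data below are synthetic and purely illustrative, and equal priors are assumed, so the $\ln P(\omega_i)$ term is dropped:

```python
import numpy as np

def train_mlc(samples_per_class):
    """Estimate U_i and Sigma_i for each class from its training samples."""
    stats = []
    for X in samples_per_class:
        U = X.mean(axis=0)                  # mean vector U_i
        S = np.cov(X, rowvar=False)         # unbiased covariance (P_i - 1 divisor)
        stats.append((U, np.linalg.inv(S), np.log(np.linalg.det(S))))
    return stats

def classify_mlc(stats, x):
    """Pick the class maximizing the reduced log-discriminant
    (equal priors assumed, so the ln P(w_i) term is dropped)."""
    scores = [-0.5 * log_det - 0.5 * (x - U) @ S_inv @ (x - U)
              for U, S_inv, log_det in stats]
    return int(np.argmax(scores))

# Synthetic two-band training data for two classes (values illustrative).
rng = np.random.default_rng(1)
class_a = rng.normal([40, 30], 3, size=(50, 2))   # e.g. a "water"-like class
class_b = rng.normal([90, 110], 5, size=(50, 2))  # e.g. a "forest"-like class
stats = train_mlc([class_a, class_b])
print(classify_mlc(stats, np.array([42.0, 31.0])))  # near class_a's mean -> 0
```

Note that each class here has 50 training patterns for 2 bands, respecting the requirement that $P_i$ exceed the number of bands so that $\Sigma_i$ is invertible.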
4. Decision Tree
A decision tree classifier is a non-parametric classifier that does not require any a priori
statistical assumptions to be made regarding the distribution of data. The process of building the
decision tree is presented in Quinlan (1993). The basic structure of the decision tree however, consists
of one root node, a number of internal nodes and finally a set of terminal nodes. The data is
recursively divided down the decision tree according to the defined classification framework. At each
node, a decision rule is required, and this can be implemented using a splitting test often of the form

$$a^T x > c$$

where $x$ represents the measurement vector on the $n$ selected features, $a$ is a vector of linear discriminant coefficients, and $c$ is the decision threshold; in univariate decision trees, $a$ selects a single feature, so the test reduces to a threshold on one measurement (Brodley and Utgoff, 1992).
In this framework, a data set is classified by sequentially sub-dividing it according to the
decision framework defined by the tree, and a class label is assigned to each observation according to
the leaf node into which the observation falls (Friedl M. A. , Brodley C.E.,1997).
The DTs are known to produce results of higher accuracies in comparison to traditional
approaches such as the ‘‘box’’ and ‘‘minimum distance to means’’ classifiers but the performance of
DTs can be affected by a number of factors including: pruning and boosting methods used and
decision thresholds (Mahesh and Mather, 2003).
Decision trees have several advantages over traditional supervised classification procedures
used in remote sensing such as maximum likelihood classification. In particular, decision trees are
strictly non-parametric and do not require any assumptions regarding the distributions of the input data.
In addition, they handle non-linear relationships between features and classes, allow for missing values,
and are capable of handling both numeric and categorical inputs in a natural fashion (Hampson and
Volper, 1986; Fayyad and Irani, 1992a). Finally, decision trees have significant intuitive appeal because
the classification structure is explicit and therefore easily interpretable (Friedl M. A. , Brodley
C.E.,1997).
Numerous tree construction approaches have been developed over the past thirty or so years.
For classification problems that utilize data sets that are both well understood and well behaved,
classification trees may be defined solely on analyst expertise. However, this procedure is difficult to
implement in practice because the exact values of thresholds can vary substantially in both time and
space, and are therefore difficult to specify based on user knowledge alone (Friedl M. A. , Brodley
C.E.,1997).
More commonly, the splits defined at each internal node of a decision tree are estimated from
training data using a statistical procedure. The specific techniques that are used for this work are called
“learning algorithms”, which have been developed within the machine learning and pattern recognition
communities. They require high quality training data from which relationships among the features and
classes present within the data are “learned". Therefore, a set of training samples representative of the
population to be classified must be available to construct an accurate decision tree (Friedl M. A. ,
Brodley C.E.,1997).
A classic example of this approach is the classification and regression tree (CART) model
described by Breiman et al. (1984). In CART a tree-structured decision space is estimated by
recursively splitting the data at each node based on a statistical test that increases the homogeneity of
the training data in the resulting descendant nodes. The basic decision tree classification model
described by CART was tested using remotely sensed data (Hansen, et al. 1996). The results from this
work showed that the decision tree performed comparably to a maximum likelihood classifier in terms
of classification accuracy for the data set examined, and that decision trees have significant advantages
for feature selection and handling of disparate data types and missing data (Friedl M. A. , Brodley
C.E.,1997).
A key step in any decision tree estimation problem is to correct the tree for overfitting by pruning it back. Conventionally, a tree is grown such that all training observations are correctly classified (i.e., training classification accuracy = 100%). If the training data contain errors, then overfitting the tree to the data in this manner can lead to poor performance on unseen data. Therefore, the tree must be pruned back to reduce classification errors when data outside of the training set are to be classified. Quinlan (1987) and Mingers (1989) describe common methods for pruning trees.
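To make the root/internal/leaf structure concrete, here is a hypothetical analyst-defined tree written as plain Python; the NDVI/NDWI features and the thresholds are invented for illustration only, not taken from the text:

```python
# Hypothetical analyst-defined decision tree; the NDVI/NDWI features and
# the thresholds are invented for illustration, not taken from the text.
def classify_pixel(ndvi, ndwi):
    if ndwi > 0.3:           # root node: splitting test on NDWI
        return "water"       # terminal (leaf) node
    if ndvi > 0.5:           # internal node: splitting test on NDVI
        return "forest"
    if ndvi > 0.2:           # internal node
        return "grassland"
    return "bare soil"       # terminal node

print(classify_pixel(ndvi=0.7, ndwi=0.0))  # forest
```

Each observation is routed down the tree by the sequence of splitting tests until it reaches a leaf, whose label it receives; the explicit structure is what makes decision trees so easy to interpret.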
5. Support Vector Machine
Support vector machines (SVMs) are a set of related learning algorithms used for classification and regression. Like the DT classifiers, SVMs are also non-parametric classifiers. The theory of the SVM was originally proposed by Vapnik and Chervonenkis (1971) and later discussed in detail by Vapnik (1999). The success of the SVM depends on how well the process is trained. The easiest way to train the SVM is by using linearly separable classes. According to Osuna et al. (1997), if the training data with $k$ samples are represented as $\{X_i, y_i\}$, $i = 1, \ldots, k$, where $X_i \in R^N$ is a point in $N$-dimensional space and $y_i \in \{-1, +1\}$ is a class label, then these classes are considered linearly separable if there exists a vector $W$ perpendicular to the linear hyperplane (which determines the direction of the discriminating plane) and a scalar $b$ showing the offset of the discriminating hyperplane from the origin. For the two classes, i.e. class 1 represented as -1 and class 2 represented as +1, two hyperplanes can be used to discriminate the data points in the respective classes (Otukei, Blaschke, 2010). These are expressed as

$$W \cdot X_i + b \geq +1 \quad \text{for } y_i = +1$$
$$W \cdot X_i + b \leq -1 \quad \text{for } y_i = -1$$
The two hyper-planes are selected so as not only to maximize the distance between the two
given classes but also not to include any points between them. The overall goal is to find out in which
class the new data points fall (Otukei, Blaschke, 2010). Overall, the SVMs are reported to produce
results of higher accuracies compared with the traditional approaches but the outcome depends on: the
kernel used, the choice of parameters for the chosen kernel, and the method used to generate the SVM
(Huang et al., 2002).
The inductive principle behind SVM is structural risk minimization (SRM). According to Vapnik (1995), the risk of a learning machine, $R$, is bounded by the sum of the empirical risk estimated from training samples, $R_{emp}$, and a confidence interval. The strategy of SRM is to keep the empirical risk fixed and to minimize the confidence interval, or to maximize the margin between a separating hyperplane and the closest data points. A separating hyperplane refers to a plane in a multi-dimensional space that separates the data samples of two classes. The optimal separating hyperplane is the separating hyperplane that maximizes the margin from the closest data points to the plane.
Let the training data of two separable classes with $k$ samples be represented by $(x_1, y_1), \ldots, (x_k, y_k)$, where $x \in R^n$ is an $n$-dimensional feature vector and $y \in \{+1, -1\}$ is the class label. Suppose the two classes can be separated by two hyperplanes parallel to the optimal hyperplane:

$$w \cdot x_i + b \geq +1 \quad \text{for } y_i = +1, \qquad w \cdot x_i + b \leq -1 \quad \text{for } y_i = -1$$

[Figure: the optimal separating hyperplane between (a) separable samples and (b) non-separable data samples.]
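The separability condition expressed by the two parallel hyperplanes can be checked numerically. In the sketch below, the data, the weight vector $W$ and the offset $b$ are hand-picked for illustration, not produced by an actual SVM optimizer:

```python
import numpy as np

# Synthetic linearly separable samples in a 2-D feature space.
X = np.array([[1.0, 1.0], [2.0, 1.5], [4.0, 4.0], [5.0, 4.5]])
y = np.array([-1, -1, 1, 1])

# Hand-picked candidate hyperplane W.x + b = 0 (not from an SVM solver).
W = np.array([1.0, 1.0])
b = -5.5

def margins(X, y, W, b):
    """y_i (W.x_i + b): when every value is >= 1, the two hyperplanes
    W.x + b = +1 and W.x + b = -1 separate the classes with no points
    falling between them."""
    return y * (X @ W + b)

m = margins(X, y, W, b)
assert (m >= 1).all()  # both classes satisfy the hyperplane conditions
```

Training an actual SVM amounts to searching for the $(W, b)$ that satisfies these constraints while maximizing the margin between the two hyperplanes.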
Training data selection is one of the major factors determining to what degree the classification
rules can be generalized to unseen samples (Paola and Schowengerdt,1995). A previous study showed
that this factor could be more important for obtaining accurate classifications than the selection of
classification algorithms (Hixson et al. 1980).
With data sizes fixed, training pixels can be selected in many ways. A commonly used sampling
method is to identify and label small patches of homogeneous pixels in an image (Campbell 1996).
However, adjacent pixels tend to be spatially correlated or have similar values (Campbell 1981).
Training samples collected this way underestimate the spectral variability of each class and are likely to give degraded classifications (Gong and Howarth 1990). A simple method to minimize the effect of spatial correlation is random sampling (Campbell 1996). Two strategies for this are equal sample rate (ESR), in which a fixed percentage of pixels is randomly sampled from each class as training data, and equal sample size (ESS), in which a fixed number of pixels is randomly sampled from each class as training data (C. Huang et al., 2002). The study performed by C. Huang et al. (2002) shows that for most training cases slightly higher accuracies were achieved when the training samples were selected using the ESR method. Considering that the ESR method has the disadvantage of undersampling or even totally missing rare classes, the sampling rate of very rare classes should be increased when this method is employed.
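The ESR and ESS strategies described above can be sketched as follows; the class map and the sampling settings are illustrative assumptions:

```python
import numpy as np

def sample_esr(labels, rate, rng):
    """Equal sample rate: a fixed percentage of pixels from each class."""
    idx = []
    for c in np.unique(labels):
        pool = np.flatnonzero(labels == c)
        n = max(1, int(round(rate * len(pool))))  # keep at least one rare-class pixel
        idx.append(rng.choice(pool, n, replace=False))
    return np.concatenate(idx)

def sample_ess(labels, n_per_class, rng):
    """Equal sample size: a fixed number of pixels from each class."""
    idx = []
    for c in np.unique(labels):
        pool = np.flatnonzero(labels == c)
        idx.append(rng.choice(pool, min(n_per_class, len(pool)), replace=False))
    return np.concatenate(idx)

# Illustrative class map: class 0 is common, class 1 is rare.
rng = np.random.default_rng(0)
labels = np.array([0] * 900 + [1] * 100)
esr = sample_esr(labels, rate=0.1, rng=rng)        # 90 pixels of class 0, 10 of class 1
ess = sample_ess(labels, n_per_class=50, rng=rng)  # 50 pixels of each class
```

The contrast is visible in the rare class: ESR yields only 10 of its pixels, while ESS draws 50, which is why the text recommends raising the sampling rate of very rare classes when ESR is used.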
Furthermore, the minimum number of samples for adequately training an algorithm may depend
on the algorithm concerned, the number of input variables, the method used to select the training
samples, and the size and spatial variability of the study area (C. Huang et al., 2002).
Also the kernel function plays a major role in locating complex decision boundaries between
classes. By mapping the input data into a high-dimensional space, the kernel function converts non-
linear boundaries in the original data space into linear ones in the high-dimensional space, which can
then be located using an optimization algorithm. Therefore the selection of kernel function and
appropriate values for corresponding kernel parameters, referred to as kernel configuration, may affect
the performance of the SVM (C. Huang et al., 2002). The parameter to be predefined for using the
polynomial kernels is the polynomial order p. Rapid increases in computing time as p increases limited
experiments with higher p values. In general, the linear kernel ( p=1) performed worse than nonlinear
kernels, which is expected because boundaries between many classes are more likely to be non-linear.
Previous studies suggest that the polynomial order p has different impacts on kernel performance when different numbers of input variables are used (C. Huang et al., 2002). With large numbers of input variables, complex nonlinear decision boundaries can still be mapped into linear ones using relatively low-order polynomial kernels. However, if a data set has only a few variables, it is necessary to try high-order polynomial kernels in order to achieve optimal performance with a polynomial SVM (C. Huang et al., 2002).
Summarizing, the training of the SVM is affected by training data size, kernel parameter setting
and class separability. Generally, when the training data size is doubled, the training time would be
more than doubled. Training the SVM to classify two highly mixed classes could take several times
longer than training it to classify two separable classes. As expected, increases in training data size
generally led to improved performances (C. Huang et al., 2002).
6. Comparison of SVM, MLC and DT performance
In C. Huang et al. (2002), the SVM was compared to three other popular classifiers, including
the maximum likelihood classifier (MLC), neural network classifiers (NNC) and decision tree
classifiers (DTC). Below, some conclusions of this comparison are cited, excluding the NNC analysis.
The MLC had lower accuracies than the non-parametric algorithms. The SVM was more
accurate than DTC in 22 out of 24 training cases. The higher accuracies of the SVM should be
attributed to its ability to locate an optimal separating hyperplane. Statistically, the optimal separating
hyperplane found by the SVM algorithm should be generalized to unseen samples with fewer errors
than any other separating hyperplane that might be found by other classifiers. However, the SVM had less success in transforming non-linear class boundaries in a very low-dimensional space into linear ones in a high-dimensional space.
In terms of algorithm stability, the SVM gave more stable overall accuracies than the other three
algorithms except when trained using 6% pixels with three variables. Of the other three algorithms,
DTC gave slightly more stable overall accuracies than the MLC, whose accuracies varied over wide ranges. In terms of training speed, the MLC and DTC were much faster than the SVM. The SVM
was affected by training data size, kernel parameter setting and class separability.
The training speeds of the classifiers were substantially different. In all training cases, training the MLC and DTC did not take more than a few minutes on a SUN Ultra 2 workstation, while training the SVM took from hours to days.
7. Fuzzy Classification
Fuzzy systems are an alternative to classical notions of set membership and logic that have their origins in ancient Greek philosophy (Brule, 1985).
7.1 Fuzzy sets and fuzzy logic theory
The fuzzy set framework introduces vagueness, with the aim of reducing complexity, by
eliminating the sharp boundary dividing the members of a class from non-members. In some
situations, these sharp boundaries may be arbitrary, or powerless, as they cannot capture the semantic
flexibility inherent in complex categories. The grades of membership correspond to the degree of
compatibility with the concepts represented by the class concerned: the direct evaluation of grades
with adequate measures is a significant stage for subsequent decision- making processes (McBratney et
al., 1997).
In a formal definition of a fuzzy set, it is presupposed that X = {x} is a finite set (or space) of points, which could be elements, objects or properties; a fuzzy subset A of X is defined by a membership function $\mu_A(x)$ in the ordered pairs:

$$A = \{(x, \mu_A(x))\}, \quad x \in X \qquad (1)$$

In plain language, a fuzzy subset is defined by the membership function defining the membership grades of fuzzy objects in the ordered pairs consisting of the objects and their membership grades. The relation $\mu_A(x)$ is therefore termed a membership function (MF) defining the grade of membership of x (the object) in A, and $x \in X$ indicates that x is an object of, or is contained in, X. For all A, $\mu_A(x)$ takes on values between and including 0 and 1. In practice, X = {x1, x2, . . . , xn} and Eq. (1) is written as:

$$A = \mu_A(x_1)/x_1 + \mu_A(x_2)/x_2 + \ldots + \mu_A(x_n)/x_n$$

where the + is used in the set-theoretic sense. If $\mu_A(x_i) = 0$, then $x_i$ is omitted (McBratney et al., 1997).
A fuzzy membership function (FMF) is thus an expression defining the grade of membership of x in A. In other words, it is a function that maps the fuzzy subset A to a membership value between and including 0 and 1. In contrast to the characteristic function in conventional set theory, which implies that an individual object either belongs to a subset or does not at all, $\chi_{A'}: X \rightarrow \{0, 1\}$ where A' is the non-fuzzy equivalent of fuzzy subset A, the FMF of x in A is expressed as:

$$\mu_A: X \rightarrow [0, 1]$$

which associates with each element $x \in X$ its grade of membership $\mu_A(x) \in [0, 1]$. Thus $\mu_A(x) = 0$ means that x does not belong to the subset A, $\mu_A(x) = 1$ indicates that x fully belongs, and $0 < \mu_A(x) < 1$ means that x belongs to some degree; partial membership is therefore possible (McBratney et al., 1997).
Fuzzy logic therefore uses 'soft' linguistic variables (e.g. deep, sandy, steep, etc.) which are defined by a continuous range of truth values, or FMFs, in the interval [0,1], instead of the strict binary (TRUE or FALSE) decisions and assignments, as is the case with Boolean logic. The linguistic input-output associations, when combined with an inference procedure, constitute fuzzy rule-based systems (McBratney et al., 1997).
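A membership function for a 'soft' linguistic variable such as deep can be sketched as a simple trapezoid; the depth breakpoints below are hypothetical, chosen only to illustrate partial membership:

```python
def trapezoidal_mf(x, a, b, c, d):
    """Trapezoidal fuzzy membership function: 0 outside [a, d],
    1 on [b, c], with linear ramps in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# 'deep' as a soft linguistic variable for water depth in metres
# (the breakpoints are hypothetical).
def deep(depth):
    return trapezoidal_mf(depth, 2.0, 5.0, 50.0, 60.0)

print(deep(1.0), deep(3.5), deep(10.0))  # 0.0 0.5 1.0
```

A depth of 3.5 m belongs to deep with grade 0.5: neither fully in nor fully out, which is exactly the partial membership a crisp characteristic function cannot express.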
7.2 Soft classifiers
Rule-based expert systems are often applied to classification problems in various application
fields, like fault detection, biology, and medicine. Fuzzy logic can improve such classification and
decision support systems by using fuzzy sets to define overlapping class definitions. The application of
fuzzy if- then rules also improves the interpretability of the results and provides more insight into the
classifier structure and decision making process (Roubos J.A. et al., 2003).
The automated construction of fuzzy classification rules from data has been approached by
different techniques. In Roubos J.A. et al. (2003), different examples of various approaches are listed,
like neuro-fuzzy methods in D. Nauck, R. Kruse, (1999) and S. Mitra, Y. Hayashi (2000), genetic-
algorithm based rule selection in H. Ishibuchi, T. Nakashima (1999), and fuzzy clustering in
combination with other methods such as fuzzy relations in M. Setnes, R. Babuska (1999) and genetic
algorithm (GA) optimization in M. Setnes, J.A. Roubos (2000).
In the field of classification problems, we often encounter classes with very different proportions of patterns: some classes with a high percentage of patterns and others with a low percentage. These problems are called "classification problems with imbalanced data-sets". This occurs when the number of instances of one class is much lower than the number of instances of the other classes. Most classifiers generally perform poorly on imbalanced data-sets because they are designed to minimize the global error rate, and in this manner they tend to classify almost all instances as negative (i.e., as the majority class). Here resides the main problem for imbalanced data-sets, because the minority class may be the most important one, since it can define the concept of interest, while the other class(es) represent(s) the counterpart of that concept. For this type of data-set, fuzzy classification methods have been proposed to improve the results, for example in Fernández et al. (2007).
In addition, various approaches may be used to derive a soft classifier. These approaches are based on specific uncertainty representation frameworks, such as the fuzzy set theory explained before. In addition to the use of specific representation frameworks, the output of "hard" classifiers, such as the maximum likelihood classifier and the multilayer perceptron, can be softened to derive measures of the strength of class membership (Schowengerdt, 1996; Wilkinson, 1996).
The most common solutions adopt a fuzzy set framework (Pedrycz, 1990; Binaghi and
Rampini, 1993; Ishibuchi et al., 1993). The apparatus of the fuzzy set theory serves as a natural
framework for modeling the gradual transition from membership to non-membership in intrinsically
vague classes.
Finally, after the production of soft results, a hardening process is sometimes performed to obtain final crisp assignments to classes. This is done by applying appropriate ranking procedures and decision rules based on the inherent uncertainty and the total amount of information dormant within the data.
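A common hardening rule is the maximum-membership decision; as a minimal sketch with made-up membership grades:

```python
import numpy as np

# Made-up soft output: per-pixel membership grades for three classes.
memberships = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.15, 0.75],
    [0.40, 0.35, 0.25],
])

# Hardening by the maximum-membership rule: each pixel gets the class
# with the highest grade, yielding a crisp (hard) classification.
hard = memberships.argmax(axis=1)
print(hard)  # [0 2 0]
```

Note how the third pixel, with grades 0.40/0.35/0.25, is forced into a single class even though its memberships are nearly tied: exactly the information that is discarded when soft results are hardened.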
However, despite the sizable achievements obtained, the use of soft classifiers is still limited by
the lack of well-assessed and adequate methods for the evaluation of the accuracy of their outputs, an
element of primary concern, which must be considered an integral part of the overall classification
procedure. In Binaghi et al. (1999), a method for assessment of soft classification is proposed.
8. Land cover change detection
Change detection is the process of identifying differences in the state of an object, a surface, or
a process by observing it at different times. Methods of change detection in remote sensing typically
analyze sequential images of the same area, and involve the detection and display of the change in the
image space (Abuelgasim A. A. et al., 1999).
The underlying assumption in using remotely sensed data for change detection is that changes
in the land-cover result in significant differences in the remote sensing measurements between two or
more dates. In addition, these differences must be larger or somehow distinguishable from other
changes in the images due to changing atmospheric conditions, seasons, illumination conditions, and
sensor calibration (Abuelgasim A. A. et al., 1999).
For these kinds of studies, classification algorithms are used to identify the changes in classified classes. Multiple options are available, and the selection of approaches and resources depends on the purposes of the study. While some comparisons of algorithm performance have been published, there are no generally accepted criteria for selecting the most appropriate classification algorithm for a given set of circumstances (DeFries and Cheung-Wai Chan, 2000).
A simple taxonomy of land cover change might start with separation of land cover changes that
are continuous versus categorical. In continuous land cover changes, there is a change in the amount or
concentration of some attribute of the landscape that can be continuously measured (Abuelgasim A. A.
et al., 1999). An example might be change in a forest attribute like forest cover or basal area or leaf
area index. In this context, the goal of change detection would be to measure the degree of change in an
amount or concentration through time (Abuelgasim A. A. et al., 1999).
The second type of change is categorical, in which the changes in time are between land cover
or land use categories. A simple example in this context is deforestation, in which areas that were once forest are no longer forested. Urbanization, expansion of agriculture, or reforestation are other examples (Abuelgasim A. A. et al., 1999).
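A categorical (post-classification) change analysis reduces to cross-tabulating the two class maps; the tiny maps below are invented for illustration:

```python
import numpy as np

# Invented classified maps of the same pixels at two dates
# (0 = forest, 1 = agriculture).
date1 = np.array([0, 0, 0, 1, 1, 0])
date2 = np.array([0, 1, 1, 1, 1, 0])

# Cross-tabulate the two maps: rows are the date-1 class, columns the
# date-2 class; off-diagonal cells count pixels that changed category.
n_classes = 2
change_matrix = np.zeros((n_classes, n_classes), dtype=int)
for c1, c2 in zip(date1, date2):
    change_matrix[c1, c2] += 1

print(change_matrix)  # the cell [0, 1] counts forest-to-agriculture pixels
```

Here the off-diagonal cell in the forest row counts the deforestation-style conversions, while the diagonal cells hold the pixels whose category did not change.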
The second type of change determination is sensitive to slight quantitative changes in the input
vector introduced by noise that lead to changes in the most likely class. The problem is particularly
acute for change determinations between classes that are spectrally similar. The approach that has been
used in recent literature involves examining the output signal and a comparison of the magnitude of
likelihood values or fuzzy membership values between classes (Abuelgasim A. A. et al., 1999). Using
maximum likelihood, this approach involves examination of the distribution of likelihood values for the
various classes (Foody et al., 1992).
Techniques for extracting land cover information are very difficult to generalize, but the effort is to automate them to the degree possible in order to process these large volumes of data. In addition, the
techniques need to be objective, reproducible, and feasible to implement within available resources. For
example, there are a lot of international efforts to characterize the extent of forest cover globally from
satellite data at repeated intervals over time. This task can only realistically be achieved through
techniques that minimize time-consuming human interpretation and maximize automated procedures
for data analysis (DeFries and Cheung-Wai Chan, 2000).
Comparing the classifiers in terms of performance, all of them are affected by the selection of
training samples. The initial trend of improved classification accuracy for all classifiers as training
data size increased underlines the need for adequate training samples in land cover classification.
Feature selection is another factor affecting classification accuracy: substantial increases in accuracy
are normally achieved when as much information as possible is used in deriving the land cover
classification from satellite images.
The literature on this kind of study contains a vast number of publications describing very
different techniques; see, for example, DeFries and Cheung-Wai Chan (2000), DeFries et al. (1998),
and Rogan et al. (2002). For fuzzy techniques applied to this topic, see Lizarazo (2012), Fisher (2010),
and Gopal et al. (1999). For implementation strategy and more information about approaches to these
complex phenomena, see Lambin et al. (1999). For classification accuracy assessment, see Foody (2002).
9.Conclusions
Remote sensing classification is a very broad topic, and many techniques are available in the
different software packages. The choice among them depends on the analyst and on the specific
purpose of the study. Still, some general observations can be made about the main problems that
have to be solved, mostly regarding training areas and training the algorithms.
SVMs are a very promising classification approach, but they are slower and more difficult to
train to good results. Simpler algorithms such as MLC are very useful and make it easier to obtain
results, but they have problems with data that are not normally distributed. DTs are widely used in
land cover change studies, and a large body of literature reports that they are very useful for this purpose.
To take advantage of their respective strengths and weaknesses, a combination of techniques is,
in my perspective, a good approach. More complex analyses can be made with fuzzy systems, which
can be a good second step in understanding these complex processes.
10.References
Abuelgasim, A. A., Ross, W. D., Gopal, S., Woodcock, C. E. 1999. Change Detection Using
Adaptive Fuzzy Neural Networks: Environmental Damage Assessment after the Gulf War. Remote
Sensing of Environment 70, 208–223.
Binaghi, E., Rampini, A. 1993. Fuzzy decision making in the classification of multisource
remote sensing data. Optical Engineering 6, 1193–1203.
Breiman, L., Friedman, J. H., Olshen, R. A., Stone, C. J. 1984. Classification and Regression
Trees. Belmont, CA: Wadsworth International Group, 358 pages.
Brodley, C. E., Utgoff, P. E. 1992. Multivariate versus univariate decision trees. Technical
Report 92-8. University of Massachusetts, Amherst, MA, USA.
Brule, F. J. 1985. Fuzzy systems: a tutorial. Internet newsgroup comp.ai;
http://www.quadralay.com/www/Fuzzy/tutorial.html.
Campbell, J. B. 1981. Spatial correlation effects upon accuracy of supervised classification
of land cover. Photogrammetric Engineering and Remote Sensing 47, 355–363.
Campbell, J. B. 1996. Introduction to Remote Sensing. New York: The Guilford Press.
Carpenter, G., Gopal, S., Martens, S., Woodcock, C. 1999. Evaluation of mixture
estimation methods for vegetation mapping. Technical Report CAS/CNS-97-014, Boston
University. Remote Sensing of Environment, in press.
DeFries, R. S., Hansen, M., Townshend, J. R. G., Sohlberg, R. 1998. Global land cover
classifications at 8 km spatial resolution: the use of training data derived from Landsat imagery in
decision tree classifiers. International Journal of Remote Sensing 19(16), 3141–3168.
DeFries, R. S., Cheung-Wai Chan, J. 2000. Multiple Criteria for Evaluating
Machine Learning Algorithms for Land Cover Classification from Satellite Data. Remote Sensing
of Environment 74, 503–515.
Erdas Inc. 1999. Erdas Field Guide. Erdas Inc., Atlanta, Georgia.
Fernández, A., García, S., del Jesus, M. J., Herrera, F. 2007. A study of the behaviour of
linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets.
Available online at www.sciencedirect.com.
Fisher, P. F. 2010. Remote sensing of land cover classes as type 2 fuzzy sets. Remote Sensing
of Environment 114, 309–321.
Foody, G. M., Campbell, N. A., Trodd, N. M., Wood, T. F. 1992. Derivation and applications
of probabilistic measures of class membership from maximum likelihood classification.
Photogrammetric Engineering and Remote Sensing 58, 1335–1341.
Foody, G. M. 2002. Status of land cover classification accuracy assessment. Remote Sensing
of Environment 80, 185–201.
Friedl, M. A., Brodley, C. E. 1997. Decision Tree Classification of Land Cover From Remotely
Sensed Data. Remote Sensing of Environment 61, 399–409.
Gong, P., Howarth, P. J. 1990. An assessment of some factors influencing multispectral
land-cover classification. Photogrammetric Engineering and Remote Sensing 56, 597–603.
Gopal, S., Woodcock, C. E., Strahler, A. H. 1999. Fuzzy Neural Network Classification of
Global Land Cover from a 1° AVHRR Data Set. Remote Sensing of Environment 67, 230–243.
Hansen, M., Dubayah, R., DeFries, R. 1996. Classification trees: An alternative to
traditional land cover classifiers. International Journal of Remote Sensing 17, 1075–1081.
Hixson, M., Scholz, D., Fuhs, N., Akiyama, T. 1980. Evaluation of several schemes for
classification of remotely sensed data. Photogrammetric Engineering and Remote Sensing 46,
1547–1553.
Huang, C., Davis, L. S., Townshend, J. R. G. 2002. An assessment of support vector machines
for land cover classification. International Journal of Remote Sensing 23, 725–749.
Ishibuchi, H., Nakashima, T. 1999. Voting in fuzzy rule-based systems for pattern classification
problems. Fuzzy Sets and Systems 103, 223–238.
Ishibuchi, H., Nozaki, K., Tanaka, H. 1993. Efficient fuzzy partition of pattern space for
classification problems. Fuzzy Sets and Systems 59, 295–304.
Lizarazo, I. 2012. Quantitative land cover change analysis using fuzzy segmentation.
International Journal of Applied Earth Observation and Geoinformation 15, 16–27.
Lu, D., Weng, Q. 2007. A survey of image classification methods and techniques for improving
classification performance. International Journal of Remote Sensing 28(5), 823–870.
McBratney, A. B., Odeh, I. O. A. 1997. Application of fuzzy sets in soil science: fuzzy logic,
fuzzy measurements and fuzzy decisions. Geoderma 77, 85–113.
Mingers, J. 1989. An empirical comparison of pruning methods for decision tree induction.
Machine Learning 4, 227–243.
Mitra, S., Hayashi, Y. 2000. Neuro-fuzzy rule generation: Survey in soft computing framework.
IEEE Transactions on Neural Networks 11, 748–768.
Nauck, D., Kruse, R. 1999. Obtaining interpretable fuzzy classification rules from medical data.
Artificial Intelligence in Medicine 16, 149–169.
Osuna, E. E., Freund, R., et al. 1997. Support Vector Machines: Training and Applications. A.I.
Memo No. 1602, C.B.C.L. Paper No. 144. Massachusetts Institute of Technology, Artificial
Intelligence Laboratory, Massachusetts.
Otukei, J. R., Blaschke, T. 2010. Land cover change assessment using decision trees, support
vector machines and maximum likelihood classification algorithms. International Journal of Applied
Earth Observation and Geoinformation 12S, S27–S31.
Pal, M., Mather, P. M. 2003. An assessment of the effectiveness of the decision tree
method for land cover classification. Remote Sensing of Environment 86, 554–565.
Paola, J. D., Schowengerdt, R. A. 1995. A Detailed Comparison of Backpropagation Neural
Network and Maximum-Likelihood Classifiers for Urban Land Use Classification. IEEE Transactions
on Geoscience and Remote Sensing 33(4).
Paola, J. D., Schowengerdt, R. A. 1995. A review and analysis of backpropagation neural
networks for classification of remotely sensed multi-spectral imagery. International Journal of
Remote Sensing 16, 3033–3058.
Pedrycz, W. 1990. Fuzzy sets in pattern recognition: methodology and methods. Pattern
Recognition 23, 121–146.
Quinlan, J. R. 1987. Simplifying decision trees. International Journal of Man-Machine Studies
27, 221–234.
Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers,
San Mateo.
Rogan, J., Franklin, J., Roberts, D. A. 2002. A comparison of methods for monitoring
multitemporal vegetation change using Thematic Mapper imagery. Remote Sensing of Environment
80, 143–156.
Roubos, J. A., Setnes, M., Abonyi, J. 2003. Learning fuzzy classification rules from labeled
data. Information Sciences 150, 77–93.
Schowengerdt, R. A. 1996. On the estimation of spatial-spectral mixing with classifier
likelihood functions. Pattern Recognition Letters 17, 1379–1387.
Setnes, M., Roubos, J. A. 2000. GA-fuzzy modeling and classification: Complexity and
performance. IEEE Transactions on Fuzzy Systems 8, 509–522.
Setnes, M., Babuska, R. 1999. Fuzzy relational classifier trained by fuzzy clustering. IEEE
Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 29, 619–625.
Vapnik, V. N. 1999. An overview of statistical learning theory. IEEE Transactions on Neural
Networks 10, 988–999.
Vapnik, V. N., Chervonenkis, A. Y. 1971. On the uniform convergence of the relative
frequencies of events to their probabilities. Theory of Probability and its Applications 17, 264–280.
Wilkinson, G. G. 1996. Classification algorithms: where next? In: Binaghi, E., Brivio, P. A.,
Rampini, A. (Eds.), Soft Computing in Remote Sensing Data Analysis. Series in Remote Sensing,
Vol. 1, pp. 93–100.