[IEEE 2011 Annual IEEE India Conference (INDICON) - Hyderabad, India (2011.12.16-2011.12.18)] 2011...



Hierarchical artificial immune system for crop stage classification

J. Senthilnath (1), S.N. Omkar (1, a), V. Mani (1)

(1) Evolutionary Computations Lab, Department of Aerospace Engineering, Indian Institute of Science, Bangalore, India

(a) [email protected]

Nitin Karnwal (2, b)

(2) Department of Instrumentation and Control Engineering, National Institute of Technology, Trichy, India

(b) [email protected]

Abstract: This paper presents a new hierarchical clustering algorithm for crop stage classification using a hyperspectral satellite image. Among the many uses of remote sensing, one important application is crop stage classification. Modern commercial imaging satellites produce large volumes of imagery and therefore offer greater opportunities for automated image analysis. Hence, we propose an unsupervised algorithm, the Hierarchical Artificial Immune System (HAIS), which consists of two steps: splitting the cluster centers and merging them. The high dimensionality of the data is reduced using Principal Component Analysis (PCA). The classification results are compared with those of the K-means and Artificial Immune System (AIS) algorithms. From the results obtained, we conclude that the proposed hierarchical clustering algorithm achieves higher classification efficiency than both.

Keywords- Crop Stage classification; Hierarchical Artificial Immune System; Principal Component Analysis

I. INTRODUCTION

With the advent of high-resolution sensors and high-speed data processing devices, the use of hyperspectral images for land resource estimation has gained considerable attention [1]. Hyperspectral image data has been used for applications such as target detection [2], material identification and mapping [3], identifying surface properties [4], crop classification [5, 6], crop stage identification [7] and many others. One of the most significant issues with hyperspectral images is the computational burden caused by the high dimensionality of the data. Various dimensionality reduction techniques have been used to overcome this problem.

Principal Component Analysis (PCA) [8] is one of the most widely used dimensionality reduction techniques. PCA retains the directions of maximum variance, which are the eigenvectors of the data covariance matrix associated with the largest eigenvalues. It identifies patterns in data and expresses the data in a way that highlights their similarities and differences. Since patterns are hard to find in high-dimensional data, where graphical representation is not available, PCA is a powerful tool for analyzing such data. Its other main advantage is that, once these patterns are found, the data can be compressed by reducing the number of dimensions without much loss of information.

One of the applications of hyperspectral images is crop stage classification [7]. A change in the growth stage of a crop is characterized by changes in leaf pigments such as chlorophyll and carotenoids. These changes in turn affect the reflectance spectrum of the crop at different wavelengths, which is used to extract stage information. The crop reflectance spectrum is found to be characteristic in the red edge region, i.e. wavelengths in the range 680-780 nm, which indicates that dimensionality reduction is inherently possible for crop data. This fact was used in [7] to attempt crop stage classification from the intensity information of the bands in the red edge region.

For the purpose of crop stage classification, various image classification techniques have been tried and tested [9]. Artificial neural networks (ANN) have been used for this purpose and have been shown to have satisfactory classification efficiency [10]. The drawback of ANN is that, when applied to the full dimension of the data set, it becomes computationally complex [11].

In this paper, we propose a Hierarchical Artificial Immune System (HAIS) for the crop stage problem. The proposed algorithm uses splitting and merging techniques that combine parametric and non-parametric methods. Hierarchical clustering constructs a hierarchy of clusters by splitting a large cluster into smaller ones and merging smaller clusters into their nearest centroid. For splitting the centers, the Artificial Immune System (non-parametric) algorithm is used, and for merging the centers into their respective classes, K-means (parametric) is used. The major hurdle is to obtain combinations of clusters that split the data set and merge it efficiently into the respective groups. The proposed algorithm overcomes these difficulties. The performance measure used is classification efficiency.

A hyperspectral EO-1 Hyperion satellite image of the region around Meerut district in Uttar Pradesh, India is used for crop stage classification and to demonstrate the performance of the proposed HAIS algorithm.
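The PCA reduction described in this section can be illustrated with a small sketch. The paper does not say how PCA was implemented, so the following is only one possible approach under stated assumptions: it uses power iteration with deflation (a swapped-in technique, chosen to keep the example dependency-free) to find the leading eigenvectors of the covariance matrix. All function names and the toy data are hypothetical.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-band (column) mean from every sample (row)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def top_eigenvector(matrix, iters=500, seed=0):
    """Power iteration: (eigenvalue, unit eigenvector) for the largest eigenvalue."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # matrix is numerically zero, nothing left to extract
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
    eigenvalue = sum(v[a] * mv[a] for a in range(d))  # Rayleigh quotient
    return eigenvalue, v


def pca(rows, n_components):
    """Return (components, projected data) for the leading principal components."""
    centered = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        lam, v = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the direction just found before the next search.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return components, projected


# Toy data: 4 samples with 3 "bands" (made-up numbers), reduced to 2 dimensions.
toy_rows = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 5.9, 0.6], [4.0, 8.0, 0.5]]
toy_components, toy_projected = pca(toy_rows, 2)
```

In the study itself the same idea is applied at a larger scale, to 352 samples with 225 bands each.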

II. HIERARCHICAL ARTIFICIAL IMMUNE SYSTEM

The immune system is a complex functional system of cells, tissues and organs whose fundamental role is to protect the body from antigens (Ag) with the help of antibodies (Ab). Among its various principles [12], two are significant in facilitating the generation of B-cells that bind the antigens: Clonal Selection Theory and Immune Network Theory. Details of these principles are given in Sections A and B, and Section C gives a step-by-step description of the algorithm.

A. Clonal Selection Theory

This theory describes how an immune response is mounted when a non-self antigenic pattern is recognized by an antibody [13]. When an antigen is detected, the antibodies with the best affinity for the antigen proliferate by cloning and become memory cells. The affinity between an antibody and an antigen is the distance between them, which can be measured with a similarity metric such as the Euclidean distance. In terms of Euclidean distance, the affinity measurement for the antigens and the antibodies is given by Eq. 1 [13].

$$J(u, x) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1,\, j \neq i}^{n} u_{ij} \lVert X_j - C_i \rVert \tag{1}$$

where $u_{ij}$, which takes values between 0 and 1, indicates the extent to which the vector $X_j$ (j = 1, 2, ..., n) belongs to the group $C_i$ (i = 1, 2, ..., c). Here the vectors X represent the antigens and the groups C represent the generated antibodies. During each grouping, the n antigens are grouped into the different $C_i$ (i = 1, 2, ..., c). Based on the strength of affinity, $u_{ij}$ can be defined as follows:

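$$u_{ij} = \begin{cases} 1 & \text{if } \lVert X_j - C_i \rVert \le \lVert X_j - C_k \rVert \text{ for each } k \neq i \\ 0 & \text{otherwise} \end{cases} \tag{2}$$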
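As an illustration of Eqs. 1 and 2, the following minimal sketch assigns each antigen to its nearest antibody and sums the resulting Euclidean distances. It rests on assumptions: membership is read as nearest-antibody assignment, ties go to the lowest antibody index, and the j ≠ i restriction under the inner sum of Eq. 1 is not applied because antigens and antibodies are indexed over different sets here. The function names and toy points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def hard_membership(antigens, antibodies):
    """u[i][j] = 1 if antibody i is the closest one to antigen j, else 0."""
    c, n = len(antibodies), len(antigens)
    u = [[0] * n for _ in range(c)]
    for j, x in enumerate(antigens):
        dists = [euclidean(x, ab) for ab in antibodies]
        i_best = dists.index(min(dists))  # assumption: ties go to the lowest index
        u[i_best][j] = 1
    return u


def objective(antigens, antibodies, u):
    """J = sum over i and j of u[i][j] * ||X_j - C_i||."""
    return sum(u[i][j] * euclidean(antigens[j], antibodies[i])
               for i in range(len(antibodies))
               for j in range(len(antigens)))


# Toy example: three antigens (data points) and two antibodies (cluster centers).
antigens = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
antibodies = [(0.0, 0.0), (10.0, 10.0)]
u = hard_membership(antigens, antibodies)
J = objective(antigens, antibodies, u)
```

Each column of `u` contains exactly one 1, so every antigen contributes only its distance to its own group.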

Antibodies are cloned according to Eq. 3 [14].

$$n = \sum_{i=1}^{n} \operatorname{round}(\beta \cdot M) \tag{3}$$

where β is a multiplication factor, M is the affinity between the antigen and the generated antibodies, and round(·) is the operator that rounds its argument to the nearest integer. These cloned cells undergo a hypermutation process that increases their affinity with the antigens. The cells with increased affinity are selected and survive to enter the memory pool, while the remaining cells are eventually removed (suppression). The mutations depend on the affinity to the antigen: the highest-affinity clones experience the lowest mutation rates, whereas the lowest-affinity clones have high mutation rates. This process of increasing affinity is known as affinity maturation. The mutation of the n clones is performed according to the following equation:

$$\text{Mutated\_clone}_i = n_i + \text{Mutation\_rate} \cdot N(0, 1) \tag{4}$$

where i lies between 1 and n, N(0,1) is a Gaussian random variable with zero mean and unit standard deviation, and Mutation_rate is a user-entered parameter [14]. When a body has successfully defended against an antigen, memory cells remain circulating throughout the body for very long periods of time. When the immune system is later exposed to the same type of antigen (or a similar one), these memory cells are activated and give a better and more efficient response.

Such a strategy ensures both high speed and accuracy. This scheme is an intrinsic part of a reinforcement learning strategy [15], in which the system continuously improves its capability to perform its task.
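The cloning and mutation rules of Eqs. 3 and 4 can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the affinity scores are hypothetical values in which higher means a better match (the mapping from Euclidean distance to the affinity M is not specified in the text), the Gaussian noise is drawn independently for each coordinate, and the mutation rate is a fixed user-supplied value as in Eq. 4.

```python
import random


def clone_counts(affinities, beta):
    """Number of clones per selected antibody: round(beta * M), as in Eq. 3."""
    return [round(beta * m) for m in affinities]


def mutate(clone, mutation_rate, rng):
    """Eq. 4 applied per coordinate: clone + mutation_rate * N(0, 1)."""
    return [x + mutation_rate * rng.gauss(0.0, 1.0) for x in clone]


rng = random.Random(0)                      # fixed seed so the sketch is repeatable
selected = [[0.2, 0.4], [0.6, 0.1], [0.9, 0.9]]   # hypothetical selected antibodies
affinities = [0.91, 0.52, 0.13]             # hypothetical scores, higher is better
counts = clone_counts(affinities, beta=10)  # higher affinity gives more clones

# Build the clone population and mutate every clone.
mutated = []
for antibody, k in zip(selected, counts):
    for _ in range(k):
        mutated.append(mutate(antibody, mutation_rate=0.05, rng=rng))

total_clones = sum(counts)
```

A natural extension, not shown here, is to make `mutation_rate` smaller for higher-affinity clones, which is the behavior the text describes for affinity maturation.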

B. Immune Network Theory

The Immune Network Theory [16] states that the immune system involves not only the interactions of antibodies with antigens but also those of antibodies with other antibodies. The immune cells respond positively to the recognition of an antigen and negatively to the recognition of another antibody. A positive response results in cell proliferation, cell activation and antibody secretion (called Network Activation), while a negative response leads to tolerance and suppression of redundant antibodies (called Network Suppression). Both responses continue until equilibrium is reached, at which point there is a network of antibodies and antigens. Thus, on the basis of the strength of the antigenic affinity and through the processes of antibody optimization and antibody suppression, optimal antibodies (cluster centers) are split from the input data set. These cluster centers are then merged using the K-means algorithm. This kind of clustering approach is called an agglomerative technique. The optimal cluster centers generated are used to initialize K-means for agglomerative clustering. The objective function to be minimized is the sum of squared errors:

$$J(K) = \sum_{k=1}^{K} \sum_{i \in C_k} (x_i - c_k)^2 \tag{5}$$

where $c_k$ is the center of cluster k, given as $c_k = \frac{1}{n_k} \sum_{i \in C_k} x_i$, and $n_k$ is the number of data points in cluster k, given as $n_k = |C_k|$. This objective function is minimized directly to obtain the cluster centers. Each cluster is then labeled with the class to which the majority of its data points belong (the voting method). To estimate the effectiveness and efficiency of the clustering algorithm, we require class labels for the data points. After agglomerative clustering is carried out, we need to group the clusters and evaluate the performance.
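To make Eq. 5 and the voting method concrete, the sketch below computes the sum of squared errors for a given partition and labels each cluster by majority vote. It is a minimal illustration under assumptions: the squared term is read as the squared Euclidean distance between a point and its cluster mean, and the function names, toy points and stage labels are hypothetical.

```python
from collections import Counter


def centroid(points):
    """Cluster center c_k: the coordinate-wise mean of the cluster's points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sum_squared_error(clusters):
    """J(K): squared distance of every point to its own cluster center, summed."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)
    return total


def vote_label(true_labels):
    """Voting method: the class label held by most points in the cluster."""
    return Counter(true_labels).most_common(1)[0][0]


# Toy partition with two clusters, plus known class labels for their points.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0)]]
labels_per_cluster = [["stage1", "stage1", "stage2"], ["stage3"]]

sse = sum_squared_error(clusters)
cluster_labels = [vote_label(labels) for labels in labels_per_cluster]
```

Several clusters can receive the same label, which is how many small clusters end up grouped into a few classes.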

C. Description of the Algorithm

The aiNet algorithm [17] employs both the clonal selection principle and immune network theory. Each antigen represents a data point from the input data set, and each antibody represents a cluster representative. In this study, cluster splitting is done by an antibody network based on clonal selection, affinity maturation and immune network theory; the resulting centers are then passed to the K-means algorithm, and the voting method is used to assign the data to their respective classes. The implementation of HAIS based on artificial immune system principles is as follows:

1. Randomly create a matrix M of Abs.

2. For each Ag, do

2.1 Calculate the affinity between the Ag and each Ab from M according to Eq. 1.

2.2 Select the n highest-affinity Abs.

2.3 Proliferate (clone) these n selected Abs in proportion to their antigenic affinities, using Eq. 3: the higher the affinity, the larger the number of clones.

2.4 Mutate this set of clones towards the Ag at a rate inversely proportional to their affinities: the higher the affinity, the lower the mutation. The mutation of clones is done according to Eq. 4.

2.5 Determine the affinity between the Ag and each mutated clone.

2.6 Re-select nc % of the highest-affinity Abs (clones) and create a partial memory matrix Mp.

2.7 Eliminate those Abs with affinity inferior to the death threshold (σd), yielding a reduction in the memory matrix Mp.

2.8 Calculate the network affinity between the Abs in the matrix Mp.

2.9 Clonal Suppression: eliminate those Abs having affinity lower than the suppression threshold (σs).

2.10 Concatenate the total antibody matrix with the resultant network matrix.

3. Determine the network affinity among all memory antibodies.

4. Network Suppression: eliminate all Abs having affinity lower than σs.

5. Add a new set of randomly generated Abs to the network Abs. Thus, cluster centers are generated in the form of a network of antibodies.

6. Provide these cluster centers to K-means clustering to optimize the function of Eq. 5.

7. Merge data points into the closest clusters using Eq. 5.

8. Use the voting method on the data points belonging to each cluster.

9. Group the clusters in agglomerative fashion using the labels.

10. Assign each data point to one of the classes.

11. Calculate the performance measures of each class.

In the above algorithm, steps 2.1 to 2.7 describe the clonal selection and affinity maturation principles. Steps 2.8 to 2.10 and 3 to 5 simulate the immune network model. There are four tunable parameters: N, the number of Abs selected for cloning in step 2.2 (written n there); nc, the percentage of reselected Abs in step 2.6; σd, the death threshold below which low-affinity Abs are removed after reselection in step 2.7; and σs, the suppression threshold used in steps 2.9 and 4 to eliminate redundant Abs. The algorithm is parameter-sensitive: these parameters significantly influence the quality of the result and the computation time. σs has the most prominent effect in controlling the network size. Based on its value, N and nc adjust the network size. σd eliminates the antibodies with low affinity for antigens and is useful only in the first iteration of the evolving process. However, the exact behavior of these parameters depends on the input data set, so the values of all four must be chosen for the data at hand to obtain the best results.
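The effect of the suppression threshold σs on network size can be seen in a small sketch of steps 2.9 and 4. It assumes that the affinity between two antibodies is their Euclidean distance, so an antibody is eliminated when it lies closer than σs to one already kept; the greedy, order-dependent scan and all names below are illustrative choices rather than the paper's exact procedure.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def suppress(antibodies, sigma_s):
    """Keep an antibody only if it is at least sigma_s away from every kept one."""
    kept = []
    for ab in antibodies:
        if all(euclidean(ab, k) >= sigma_s for k in kept):
            kept.append(ab)
    return kept


# Hypothetical memory antibodies on a line, with some near-duplicates.
memory = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.0), (9.0, 0.0)]

# Network size for a small, medium and large suppression threshold.
sizes = {s: len(suppress(memory, s)) for s in (0.1, 1.0, 5.0)}
```

Raising σs removes more near-duplicate antibodies and therefore shrinks the network, which matches the role the text assigns to this parameter.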

III. RESULTS AND DISCUSSIONS

In this paper, we work with images acquired by the EO-1 Hyperion satellite. It has 225 bands, each with a bandwidth of the order of 10 nm, ranging from 300 to 2400 nm. The image resolution is 30 m by 30 m. Each image has a width of 7.5 km and a length of 100 km. The region of study is around Meerut city in Uttar Pradesh, India. The latitudes and longitudes of the four corners of the image are 29°15′56.20″N 77°38′9.56″E, 29°15′49.53″N 77°43′41.58″E, 29°1′20.77″N 77°37′47.23″E and 29°1′14.17″N 77°43′18.46″E. The images were first corrected using Geomatica-10.0 software, Macrovision Corporation. To read the images, an open source C language library called Geospatial Data Abstraction Library (GDAL) was used.

The data consists of three stages of growth of the wheat crop. It has a total of 352 samples with 225 bands. The details are given in TABLE I. A plot of wavelength versus change in reflectance for the three crop stages in the red-edge region (680-780 nm), using one pixel from each stage, is shown in FIGURE I. From the figure, we can observe that the reflectance value increases as the crop grows from Stage 1 to Stage 3.

To reduce the high dimension of the image data, Principal Component Analysis is used. HAIS is then applied to the reduced data set and the classification efficiency is noted. The individual efficiency $\eta_i$ for class $c_i$ is

$$\eta_i = \frac{q_{ii}}{\sum_{j=1}^{n} q_{ji}} \tag{6}$$

where $q_{ii}$ is the number of correctly classified samples and n is the number of samples for the class $c_i$ in the data set. The global performance measures are the average efficiency $\eta_a$ and the overall classification efficiency $\eta_o$, which are defined as:

$$\eta_a = \frac{1}{n_c} \sum_{i=1}^{n_c} \eta_i \tag{7}$$

$$\eta_o = \frac{1}{N} \sum_{i=1}^{n_c} q_{ii} \tag{8}$$

where $n_c$ is the total number of classes and N is the number of samples. The results in terms of classification efficiency are compared with K-means and AIS. From TABLE II, it is apparent that Hierarchical AIS gives better classification efficiency than the K-means and AIS algorithms. The reason is that K-means, being a parametric method, generates only a fixed number of cluster centers, while the proposed algorithm, being hierarchical in nature, generates many cluster centers. The most favorable parameter values (σs, σd, N, nc) described in Section II-C and used in this study are AIS: (8, 500, 2, 0.9) and HAIS: (2, 500, 2, 0.9). For the reduced data set of crop stage classification, HAIS generated 15 cluster centers. This gives the algorithm the freedom to find the best cluster centers and merge them into small clusters with the help of the K-means method. Hence, it results in better classification efficiency.

TABLE I. HYPERSPECTRAL DATA DESCRIPTION

| Stage Level | Description | Number of Pixels |
|---|---|---|
| Stage 1 | Emergence | 110 |
| Stage 2 | Mature | 109 |
| Stage 3 | Milking | 133 |
| Total Samples | | 352 |

TABLE II. CLASSIFICATION EFFICIENCY FOR HYPERSPECTRAL DATA SET

| Efficiency | K-means (%) | AIS (%) | Hierarchical AIS (%) |
|---|---|---|---|
| η1 | 53.64 | 76.36 | 81.82 |
| η2 | 22.94 | 14.68 | 73.39 |
| η3 | 38.35 | 87.97 | 82.71 |
| ηa | 38.31 | 59.67 | 79.30 |
| ηo | 38.35 | 61.65 | 79.55 |
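The measures of Eqs. 6 to 8 can be checked against TABLE II with a short sketch. The class sizes come from TABLE I; the counts of correctly classified samples for Hierarchical AIS (90, 80, 110) are not stated in the paper and are back-computed here from the reported percentages, so they should be read as an assumption. The denominator of Eq. 6 is taken to be the number of samples in the class, and the function names are hypothetical.

```python
def individual_efficiency(correct, class_sizes):
    """Eq. 6 in percent: correctly classified samples over samples in the class."""
    return [100.0 * q / n for q, n in zip(correct, class_sizes)]


def average_efficiency(etas):
    """Eq. 7: mean of the per-class efficiencies."""
    return sum(etas) / len(etas)


def overall_efficiency(correct, total_samples):
    """Eq. 8 in percent: all correctly classified samples over all samples."""
    return 100.0 * sum(correct) / total_samples


class_sizes = [110, 109, 133]      # Stage 1 to Stage 3 pixel counts from TABLE I
hais_correct = [90, 80, 110]       # assumption: back-computed from TABLE II

etas = individual_efficiency(hais_correct, class_sizes)
eta_a = average_efficiency(etas)
eta_o = overall_efficiency(hais_correct, sum(class_sizes))

print([round(e, 2) for e in etas], round(eta_a, 2), round(eta_o, 2))
```

With these counts the per-class and overall values agree with the Hierarchical AIS column of TABLE II, and the average agrees to within rounding in the second decimal place.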

FIGURE I. Wavelengths Vs Reflectance value for crop stage data

IV. CONCLUSION

A new hierarchical clustering algorithm based on the principles of artificial immune systems, the Hierarchical Artificial Immune System (HAIS), was proposed and implemented in this paper. HAIS was successfully applied to crop stage classification of hyperspectral images. It performs data clustering by generating a representative set of memory cells for classification. The key mechanisms and concepts incorporated in the algorithm include antibody network generation, clonal selection, memory cell development and the merging technique. To reduce the dimensionality of the data, PCA was applied, which reduced the data from 225 dimensions to 10 dimensions. The results were compared with the K-means and AIS clustering algorithms.

The classification efficiency of Hierarchical AIS is better than that of K-means and AIS: its overall efficiency is 79.55%, against 38.35% for K-means and 61.65% for AIS. This indicates that the proposed algorithm is applicable to the processing of hyperspectral images and achieves high classification efficiency.

REFERENCES

[1] David Landgrebe, "Hyperspectral Image Data Analysis as a High Dimensional Signal Processing Problem", IEEE Signal Processing Magazine, Vol. 19, No. 1, pp. 17-28, January 2002.

[2] Mohammad S. Alam, Mohammed Nazrul Islam, Abdullah Bal & Mohammad A. Karim, "Hyperspectral target detection using Gaussian filter and post processing", Optics and Lasers in Engineering, Vol. 46, No. 11, pp. 817-822, November 2008.

[3] Amar Kachenoura, Laurent Albera, Lotfi Senhadji & Pierre Comon, "ICA: a potential tool for BCI systems", IEEE Signal Processing Magazine, Vol. 25, No. 1, pp. 57-68, 2008.

[4] E. Ben-Dor, K. Patkin, A. Banin & A. Karnieli, "Mapping of several soil properties using DAIS-7915 Hyperspectral scanner data: a case study over clayey soils in Israel", International Journal of Remote Sensing, Vol. 23, No. 6, pp. 1043-1062, 2002.

[5] Paul C. Doraiswamy, Alan J. Stern & Bakhyt Akhmedov, "Crop Classification in the U.S. Corn Belt Using MODIS imagery", in the proceedings of the International Geoscience and Remote Sensing Symposium, July 19-27, 2007, Barcelona, Spain.

[6] Abdulhamit Subasi & Ergun Ercelebi, "Classification of EEG signals using Neural Networks and logistic regression", Computer Methods and Programs in Biomedicine, Vol. 78, Issue 2, pp. 87-99, May 2005.

[7] Suman Mukherjee, "Crop Stage Classification of Hyperspectral Data Using Red Edge Wavelength Pursuit", in the proceedings of the XXVIII INCA International Congress on Collaborative Mapping and Space Technology, November 2008, Ahmedabad.

[8] Sunghyun Lim, Kwang Hoon Sohn & Chulhee Lee, "Principal Component Analysis for compression of Hyperspectral Images", Geoscience & Remote Sensing Symposium, 2001, IGARSS01, Vol. 1, pp. 97-99, July 2001.

[9] D. Lu & Q. Weng, "A survey of image classification methods and techniques for improving classification performance", International Journal of Remote Sensing, Vol. 28, No. 5, pp. 823-827, 2007.

[10] P. K. Goel, S. O. Prasher, R. M. Patel, J. M. Landry, R. B. Bonnell & A. A. Viau, "Classification of Hyperspectral Data by Decision Trees and Artificial Neural Networks to Identify Weed Stress and Nitrogen Status of Corn", Computers and Electronics in Agriculture, Vol. 39, pp. 67-93, 2003.

[11] S. N. Omkar, Sivaranjani V, J. Senthilnath & Suman Mukherjee, "Dimensionality Reduction and Classification of Hyperspectral Data", Vol. 2, No. 3, pp. 157-163, 2010.

[12] J. Timmis, "Artificial Immune Systems: A novel data analysis technique inspired by the immune network theory", PhD thesis, 2000.

[13] Tao Liu, Yan Zhou, Zhifeng Hu & Zhijie Wang, "A New Clustering Algorithm Based on Artificial Immune System", Fifth International Conference on Fuzzy Systems and Knowledge Discovery.

[14] Yanfei Zhong, Liangpei Zhang, Bo Huang & Pingxiang Li, "An Unsupervised Artificial Immune Classifier for Multi/Hyperspectral Remote Sensing Imagery", IEEE Transactions on Geoscience and Remote Sensing, Vol. 44, No. 2, February 2006.

[15] R. S. Sutton & A. G. Barto, "Reinforcement Learning: An Introduction", A Bradford Book, 1998.

[16] N. K. Jerne, "Towards a Network Theory of the Immune System", Ann. Immunol. (Inst. Pasteur), 125C, pp. 373-389, 1974.

[17] De Castro, Von Zuben, "Artificial Immune Systems: Part I - Basic Theory and Applications", Technical Report RT DCA 01/99, URL: http://www.dca.fee.unicamp.br/ lnunes.