Wavelet Feature Selection for Image Classification

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

Ke Huang and Selin Aviyente, Member, IEEE

Abstract—Energy distribution over wavelet subbands is a widely used feature for wavelet packet based texture classification. Due to the overcomplete nature of the wavelet packet decomposition, feature selection is usually applied for a better classification accuracy and a compact feature representation. The majority of wavelet feature selection algorithms conduct feature selection based on the evaluation of each subband separately, which implicitly assumes that the wavelet features from different subbands are independent. In this paper, the dependence between features from different subbands is investigated theoretically and simulated for a given image model. Based on the analysis and simulation, a wavelet feature selection algorithm based on statistical dependence is proposed. This algorithm is further improved by combining the dependence between wavelet features and the evaluation of individual feature components. Experimental results show the effectiveness of the proposed algorithms in incorporating dependence into wavelet feature selection.

Index Terms—Feature selection, mutual information, texture classification, wavelet packet.

I. INTRODUCTION

WAVELET decomposition and its extension, wavelet packet decomposition, have gained popular applications in the field of signal/image processing and classification [2]–[5]. Wavelet transforms enable the decomposition of the image into different frequency subbands, similar to the way the human visual system operates. This property makes them especially suitable for the segmentation and classification of texture images [4]–[12]. For the purpose of texture classification, appropriate features need to be extracted to obtain a representation that is as discriminative as possible in the transform domain. A widely used wavelet feature is the energy of each wavelet subband [2]–[4], [6], [7], [11]–[13]. The idea of extracting energy features from filtered images can be traced back to [14], where a bank of band-pass filters was used for image analysis. A more recent survey on filtering based methods for texture classification can be found in [15]. In early research, such as in [4] and [13], features from all wavelet subbands are used for texture classification. However, it is known that proper feature selection is likely to improve the classification accuracy with a smaller number of features [16]. At the same time, the overcomplete structure of the wavelet

Manuscript received November 27, 2007; revised April 17, 2008. First published August 4, 2008; last published August 13, 2008 (projected). Part of this paper was published at ICASSP 2005. This work was supported in part by the Michigan Economic Development Corporation under GR-433. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dimitri Van De Ville.

The authors are with the Department of Electrical and Computer Engineering, Michigan State University, MI 48824 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2008.2001050

packet transform motivates the selection of the wavelet features for classification. For the widely used energy feature, wavelet feature selection is equivalent to selecting a set of subbands for image decomposition. Therefore, “wavelet feature selection” and “wavelet subband selection” are interchangeably used in this paper.

At this point, it is important to make a distinction between wavelet feature selection and the general feature selection discussed in pattern recognition [16], [17]. Both selection methods aim at obtaining a compact representation of the image for classification. However, the two techniques are not exactly the same. First, for general feature selection methods, the explicit knowledge of the feature extraction process may not always be available. The input to the general feature selection process is usually a vector of values representing the different features without any a priori information about how these features were obtained. On the other hand, wavelet feature selection methods can take advantage of the tree structure of the wavelet decomposition for the selection process. For example, in this paper, the wavelet packet decomposition structure is analyzed and used in determining the optimal set of subbands. Second, selection methods based on component analysis, such as principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA), usually require that all of the original feature components are available at both the training and testing stages for projecting features into a selected subspace. In the setting of wavelet packet decomposition, this means that a full decomposition is required at both the training and testing stages. On the other hand, for the wavelet feature selection method, an image only needs to be decomposed into the wavelet subbands selected during the training stage.

One of the most well-known subband selection algorithms is based on the entropy cost function [18], where the entropy of the coefficients at the children nodes is compared to the entropy at the parent node, and the “best” wavelet packet tree is chosen to minimize the entropy of the representation coefficients. As such, the main goal of this method is signal reconstruction and not classification. For this reason, subband selection methods that particularly aim at achieving compact signal representations for classification have been proposed. For example, in [6], the features are only extracted from subbands with energy values higher than a predetermined threshold value. The energy distribution over these subbands is then used as features for classification. In [19], each subband is evaluated based on the discrimination power of the extracted energy value, and the subbands with high discrimination power are selected for subsequent classification. A similar evaluation method is also employed in [3], where the decomposition tree is pruned by comparing the discrimination scores of the parent and its children nodes.


One important and effective principle guiding the feature selection process is to exploit the dependence among different feature components [5], [9], [20]–[24]. For wavelet feature selection, this principle indicates that the dependence between different subbands should be investigated and utilized. However, in the existing research on wavelet feature selection [3], [6], [11], [18], [19], either each subband is evaluated separately, or only the parent and its children subbands are compared based on a predetermined criterion. These methods do not take into account the dependencies between all of the subbands, thus degrading the classification accuracy.

The statistical properties of wavelet coefficients [25], [26] and the dependence across scales have been successfully measured by mutual information [27], or modeled by the hidden Markov model (HMM) [28], [29]. It has been shown that incorporating the dependence among wavelet coefficients improves the image compression efficiency [30] and image denoising performance. In this paper, we are interested in the dependence of the extracted features, not the wavelet coefficients. Since the extracted features are a function of all the coefficients in a subband, the dependence between features is more complicated than the dependence between individual wavelet coefficients.

The dependence among subbands can also be interpreted as the amount of redundancy among subbands. In classification, one can take advantage of this redundancy in features through two different approaches. In the first approach, the redundancy among the transform coefficients is modeled with a parametric model and the parameters of the model are extracted as features for classification. Examples of this first approach can be seen in [5], [9], and [22], where the dependency among wavelet subband coefficients that have a “parent-children” relation is modeled with the HMM and used for texture classification. In the second approach, first the features are extracted from the transform coefficients and then the redundant features are removed. The algorithms presented in [20], [21], [23], and [24] belong to this second approach. The “minimax entropy principle” proposed in [21] requires that a newly added feature should be “very different” from the existing features, and the degree of difference is measured by the changes in the entropy caused by the inclusion of a new feature. In this paper, we adopt the second approach, i.e., identifying the redundant features and then removing them. In this sense, the models developed in [27]–[29] are not suitable for the problem of wavelet feature selection for texture classification studied in this paper.

In this paper, the dependence between energy values extracted from different subbands is analyzed and incorporated into the wavelet feature selection. Different from the HMM that only models the dependence between “parent-children” subbands [5], [9], [22], the dependence between features from all subbands is taken into account in the selection process. The dependence between energy features extracted from different subbands is theoretically analyzed and verified by simulation results. Based on this analysis, a mutual information based subband selection (MISS) algorithm is proposed for subband selection based on feature dependence. Experimental results show that dependence is an effective criterion for selecting subbands to obtain a compact feature representation. In order to further combine the dependence between the subbands and the evaluation score of individual subbands, a subband grouping and selection (SGS) algorithm is proposed to incorporate both factors into the subband selection process.

The contributions of this paper include demonstrating the dependence among extracted features, demonstrating that dependence can be used for effective subband selection, and combining the dependence between features from different subbands and the evaluation score of each individual feature for better subband selection. Although classification of texture images is used to show the effectiveness of the proposed methods, the focus of this paper is not to propose a new feature extraction method for texture classification, but to identify the dependence among different components of wavelet features and propose new subband selection methods for texture classification. The proposed methods can be similarly applied to the classification of other signals using the wavelet packet transform.

The organization of the rest of this paper is as follows. Section II briefly reviews the standard 2-D wavelet packet decomposition and feature extraction. In Section III, the dependence between energy features from different subbands is analyzed. Due to the lack of an accurate statistical model for natural images, simulation results for the covariance between energy values from different subbands are presented. Based on the analysis provided in Section III, Section IV proposes an algorithm for subband selection based on dependence. Section V proposes a second algorithm that combines the dependence between subbands and the evaluation score of each individual subband. Experimental results and related discussions are given in Section VI. Section VII concludes the paper with suggestions for possible future research.

II. WAVELET PACKET FEATURE EXTRACTION

As an extension of the standard wavelets, wavelet packets represent a generalization of the multiresolution analysis and use the entire family of subband decompositions to generate an overcomplete representation of signals. In the 2-D discrete wavelet packet transform (2-D DWPT), an image is decomposed into one approximation and three detail images. The approximation and the detail images are then decomposed into second-level approximation and detail images, and the process is repeated. The standard 2-D DWPT can be implemented with a low-pass filter $h$ and a high-pass filter $g$ [31]. The 2-D DWPT of an $N \times N$ discrete image $I$ up to level $J$ ($J \le \log_2 N$) is recursively defined in terms of the coefficients at level $j$ as follows:

$$w_{4n}^{j+1}(k,l) = \sum_{m}\sum_{p} h(m)\,h(p)\,w_{n}^{j}(2k - m,\, 2l - p) \qquad (1)$$

$$w_{4n+1}^{j+1}(k,l) = \sum_{m}\sum_{p} h(m)\,g(p)\,w_{n}^{j}(2k - m,\, 2l - p) \qquad (2)$$

$$w_{4n+2}^{j+1}(k,l) = \sum_{m}\sum_{p} g(m)\,h(p)\,w_{n}^{j}(2k - m,\, 2l - p) \qquad (3)$$

$$w_{4n+3}^{j+1}(k,l) = \sum_{m}\sum_{p} g(m)\,g(p)\,w_{n}^{j}(2k - m,\, 2l - p) \qquad (4)$$

where $w_0^0 = I$ is the image and $n$ is an index of the nodes in the wavelet packet tree denoting each subband. At each step, the image $w_n^j$ is decomposed into four quarter-size images $w_{4n}^{j+1}$, $w_{4n+1}^{j+1}$, $w_{4n+2}^{j+1}$, $w_{4n+3}^{j+1}$.


Two-dimensional DWPT decomposition allows us to analyze an image simultaneously at different resolution levels and orientations. Different functions for energy can be used to extract features from each subband for classification. Commonly used energy functions include the magnitude $|x|$, the magnitude square $x^2$, and the rectified sigmoid $|\tanh(\alpha x)|$ [15]. All of these definitions of energy are highly correlated. In this paper, the definition of energy based on squaring is used. The energy in different subbands is computed from the subband coefficient matrix as $E_n = \sum_{k}\sum_{l} \left[w_n(k,l)\right]^2$, where $E_n$ is the energy of the image projected onto the subspace at node $n$. The energy of each subband provides a measure of the image characteristics in that subband. The energy distribution has important discriminatory properties for images and as such can be used as a feature for texture classification.
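To make the feature extraction concrete, the following minimal sketch computes per-subband energies with the PyWavelets package. The use of PyWavelets, the function name `wavelet_energy_features`, and the plain sum-of-squares energy are illustrative assumptions; the paper does not prescribe an implementation.

```python
import numpy as np
import pywt

def wavelet_energy_features(img, wavelet="haar", level=3):
    # Full 2-D wavelet packet decomposition up to `level`.
    wp = pywt.WaveletPacket2D(data=img, wavelet=wavelet, maxlevel=level)
    features = {}
    for lvl in range(1, level + 1):
        for node in wp.get_level(lvl, order="natural"):
            # Energy of the subband at node `node.path` (e.g., "av"):
            # the sum of squared coefficients, E_n = sum_{k,l} w_n(k,l)^2.
            features[node.path] = float(np.sum(node.data ** 2))
    return features
```

For a 3-level decomposition this yields 4 + 16 + 64 = 84 subband energies beyond the root, matching the 85 subbands counted later when the undecomposed image is included.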

III. DEPENDENCE ANALYSIS

Wavelet packet decomposition is an orthogonal transform. However, it has been observed that the wavelet packet coefficients from different subbands are highly dependent on each other. The dependence between coefficients has been modeled with the HMM [28], [29] and utilized for better image compression efficiency [30]. However, no analysis has been done on the dependence among features extracted from different subbands. In this section, covariance is used to analyze the dependence between energy values from any two subbands. The analysis is first conducted theoretically and then a simulation is used for a given image model. It is important to note that the following analysis and the simulations are valid only for dyadic wavelet packets.

A. Theoretical Analysis

Based on (1)–(4), the wavelet coefficients at level $j+1$ are obtained by convolving the wavelet coefficients at level $j$ with the wavelet filters. The highest level in the wavelet packet decomposition tree is the original image $I$. Therefore, the coefficients in any subband can be written as a linear combination of pixel values in the original image $I$. The weighting coefficients in the linear combination are determined by the properties of the wavelet basis. Considering the definition of subband energy, the energy of wavelet coefficients in a subband is a second-order polynomial in terms of the pixel values of the image $I$. The coefficients of the polynomial are determined by the filters $h$ and $g$ and the decomposition level. The energy at node $n$ can be written as

$$E_n = \sum_{i,j}\sum_{k,l} \alpha_{i,j,k,l}\, I(i,j)\, I(k,l) \qquad (5)$$

where $\alpha_{i,j,k,l}$ is the coefficient of $I(i,j)I(k,l)$. Due to the downsampling, some of the coefficients $\alpha_{i,j,k,l}$ are zero. Hence, the covariance between two energy values is defined as

$$\mathrm{Cov}\!\left(E_{n_1}^{j_1}, E_{n_2}^{j_2}\right) = E\!\left[E_{n_1}^{j_1} E_{n_2}^{j_2}\right] - E\!\left[E_{n_1}^{j_1}\right] E\!\left[E_{n_2}^{j_2}\right] \qquad (6)$$

where $E_{n_1}^{j_1}$ and $E_{n_2}^{j_2}$ are the energies of the wavelet coefficients at level $j_1$, subband $n_1$ and level $j_2$, subband $n_2$, respectively. $E[E_{n_1}^{j_1} E_{n_2}^{j_2}]$ is a fourth-order polynomial in terms of the pixel values in the image $I$. Using definitions (1) to (4), the correlation between energy values of two children nodes, $E_{4n}^{j+1}$ and $E_{4n+1}^{j+1}$, is

$$\mathrm{Cov}\!\left(E_{4n}^{j+1}, E_{4n+1}^{j+1}\right) = \sum_{i,j,k,l}\;\sum_{i',j',k',l'} \alpha_{i,j,k,l}\,\beta_{i',j',k',l'} \Big( E\big[I(i,j)I(k,l)I(i',j')I(k',l')\big] - E\big[I(i,j)I(k,l)\big]\,E\big[I(i',j')I(k',l')\big] \Big) \qquad (7)$$

where $\alpha$ and $\beta$ denote the polynomial coefficients of the two energies in the sense of (5). Given that $h$ and $g$ are FIR filters, the random variables in (7) are the different pixel values. Therefore, the covariance is a function of the 4th-order statistics of the original image $I$. If we have a statistical model for the pixels, the correlation value can be easily calculated.

To illustrate this, suppose that the image is of size 4×4. The image model can be described using the AR-1 Gaussian model, which is simple but often used in digital image processing [32]. A useful property of the AR-1 Gaussian model is that the marginal distribution of each pixel is also Gaussian [33]. The Gaussian AR-1 process with mean $\mu$ is usually written in terms of a series of white noise innovation processes

$$I(k) = \mu + \rho\,\big(I(k-1) - \mu\big) + \epsilon_k \qquad (8)$$

where the $\epsilon_k$ are i.i.d. and $\epsilon_k \sim \mathcal{N}(0, \sigma_\epsilon^2)$. The marginal distribution is also normal:

$$I(k) \sim \mathcal{N}\!\left(\mu,\; \frac{\sigma_\epsilon^2}{1 - \rho^2}\right). \qquad (9)$$

Given the marginal distribution, (7) can be expanded as follows:

$$\mathrm{Cov}\!\left(E_{4n}^{j+1}, E_{4n+1}^{j+1}\right) = \sum c_{i,j,k,l}\,\Big( E[X_i X_j X_k X_l] - E[X_i X_j]\,E[X_k X_l] \Big) \qquad (10)$$

where the $X_i$ are i.i.d. Gaussian random variables corresponding to different pixels distributed as given by (9), and the coefficients $c_{i,j,k,l}$ are functions of the wavelet filters.

Suppose that the size of the original image is $N \times N$, and define $S_4$ as the group of terms on the right-hand side of (7) where the indices of the four pixel values are exactly the same. Thus, $S_4$ contains exactly $N^2$ terms. Similarly, $S_{3,1}$ (exactly three indices equal) contains $4N^2(N^2-1)$ terms, $S_{2,2}$ (two distinct pairs) contains $3N^2(N^2-1)$ terms, $S_{2,1,1}$ (one pair and two distinct singles) contains $6N^2(N^2-1)(N^2-2)$ terms, and $S_{1,1,1,1}$ (all indices distinct) contains $N^2(N^2-1)(N^2-2)(N^2-3)$ terms from the right-hand side of


TABLE I
NORMALIZED ENERGY COVARIANCE WITH HAAR BASIS

TABLE II
NORMALIZED ENERGY COVARIANCE WITH DB4 BASIS

(7). Based on the marginal distribution, the four expected values are given as follows:

$$E[X] = \mu, \quad E[X^2] = \mu^2 + \sigma^2, \quad E[X^3] = \mu^3 + 3\mu\sigma^2, \quad E[X^4] = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4 \qquad (11)$$

where $\sigma^2 = \sigma_\epsilon^2 / (1 - \rho^2)$ is the marginal variance in (9).

Therefore, we can obtain the covariance between the energy features from two subbands analytically. In the next section, we compute the derived covariance values for a specific set of wavelet filters.

B. Simulation

Suppose that the Haar wavelet basis is used, i.e., $h = \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$ and $g = \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right)$. The parameters in (9) are chosen as standard values for a 256-level grey image. The normalized covariance is used for measuring the dependence between the energies of different subbands. The covariance matrix between the four energy values from the first decomposition level for the Haar basis, $E_0^1$, $E_1^1$, $E_2^1$, $E_3^1$, is shown in Table I. For the Daubechies' 4-tap (2 vanishing moments) filters, the same covariance matrix is shown in Table II. In Table III, the covariance between the 4 energy values computed from the subbands in the first decomposition level of the Haar basis and the 4 energy values for the corresponding subbands in the Daubechies' 4-tap basis is given.
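The simulation behind Table I can be reproduced in spirit with a short Monte Carlo estimate. Since the exact parameter values of (9) were not preserved in this transcript, the values below (mean 128, correlation 0.95, innovation deviation 12) are assumptions chosen to mimic a 256-level grey image:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)

def ar1_image(n=64, mu=128.0, rho=0.95, sigma_eps=12.0):
    # Sample an AR-1 Gaussian image (row-major scan), as in (8).
    # mu, rho, sigma_eps are assumed values, not the paper's.
    eps = rng.normal(0.0, sigma_eps, size=n * n)
    x = np.empty(n * n)
    x[0] = mu + eps[0] / np.sqrt(1 - rho**2)   # start at stationarity
    for k in range(1, n * n):
        x[k] = mu + rho * (x[k - 1] - mu) + eps[k]
    return x.reshape(n, n)

def level1_energies(img, wavelet="haar"):
    # One-level 2-D decomposition: approximation + 3 detail subbands.
    cA, (cH, cV, cD) = pywt.dwt2(img, wavelet)
    return np.array([np.sum(c**2) for c in (cA, cH, cV, cD)])

# Monte Carlo estimate of the normalized energy covariance (cf. Table I).
E = np.array([level1_energies(ar1_image()) for _ in range(2000)])
cov = np.cov(E, rowvar=False)
d = np.sqrt(np.diag(cov))
print(np.round(cov / np.outer(d, d), 3))   # normalized covariance matrix
```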

TABLE III
NORMALIZED ENERGY COVARIANCE BETWEEN DB4 BASIS AND HAAR BASIS

These results show that various degrees of dependence exist between the energy features extracted from different subbands within a wavelet basis and from different wavelet bases. The wavelet basis, i.e., the high-pass and low-pass filter pair $h$ and $g$, affects the degree of dependence. Therefore, the assumption used in many image processing algorithms that different wavelet subbands yield independent energy values is incorrect.

IV. SUBBAND SELECTION WITH DEPENDENCE

In Section III, an analytical procedure is given for quantifying the dependence between energy values from different subbands using different wavelet bases, given that the underlying image model is known. Although in most applications an accurate statistical image model is not available [34], the simulation in Section III still supports the hypothesis that the energy values from different subbands and even different wavelet bases may be dependent. This observation motivates us to select subbands based on dependence to generate a compact representation of the wavelet features for texture classification.

In real applications, due to the lack of suitable statistical image models, it is not likely that the dependence between features can be easily calculated analytically. However, the dependence relationship between energy values from subbands can still be estimated empirically using nonparametric methods based on a training set. Based on this estimation, independent subbands can be chosen to achieve a compact representation for the subsequent classification. In this paper, mutual information based subband selection (MISS) is proposed to implement this subband selection process. Given a training texture set $T_r$, a testing texture set $T_e$ for classification, and a set of subbands $S = \{s_1, s_2, \ldots, s_M\}$ from the wavelet packet decomposition, the MISS algorithm can be formulated as follows.

1) For each texture $t \in T_r$, extract the energy values for all subbands. Each time the texture is decomposed, the subband size is decreased by a factor of 4. When the subband size is small, the energy of the subband will not be a robust measure. In the implementation, the minimum subband size is set to 16×16.

2) Divide the subbands into two sets: SU and SR. Initialize SU to contain the subband that has the highest energy and SR to contain all of the remaining subbands.

3) Move a subband $s^*$ from SR to SU based on the following criterion: $s^*$ is chosen such that the feature corresponding to $s^*$ has the smallest mutual information with the features extracted from the subbands in the set SU, i.e., $s^* = \arg\min_{s \in \mathrm{SR}} I(\mathrm{SU}; s)$, where the mutual information $I(\cdot\,;\cdot)$ is used to measure the dependence. This process stops when the number of subbands in SU reaches a predefined value, which can be determined by cross-validation. In the experiments presented in this paper, this value is set as a parameter and the relation between the value of this parameter and the classification accuracy is studied. Finally, the subbands in the set SU are used to construct the compact representation for the given images.

4) Extract the energy values corresponding to the sparse representation SU for all images in the testing set $T_e$.

This algorithm tries to select subbands such that the energy features from these subbands are as independent as possible. It does not require that the chosen subbands construct a valid wavelet packet decomposition tree, as in [4], [12], and [18].
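A minimal sketch of the MISS loop follows. The function name and the equal-weight approximation of $I(\mathrm{SU}; s)$ by the mutual information between the candidate and the mean of the SU features reflect one reading of Section IV-B below; `mi_fn` stands for any pairwise mutual information estimator, such as the histogram estimator sketched later.

```python
import numpy as np

def miss_select(F, n_select, mi_fn):
    # F: (n_samples, n_subbands) matrix of subband energy features
    # computed on the training set; mi_fn: pairwise MI estimator.
    n_subbands = F.shape[1]
    su = [int(np.argmax(F.sum(axis=0)))]    # seed: highest-energy subband
    sr = [s for s in range(n_subbands) if s not in su]
    while sr and len(su) < n_select:
        g = F[:, su].mean(axis=1)           # equal-weight mapping g(SU)
        s_star = min(sr, key=lambda s: mi_fn(g, F[:, s]))
        su.append(s_star)                   # least dependent on SU
        sr.remove(s_star)
    return su
```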

A. Mutual Information

This subsection discusses the computation of mutual information, which is used in step 3) of the MISS algorithm. Consider two random variables $X$ and $Y$ with a joint distribution $p(x,y)$. The mutual information between $X$ and $Y$ is defined as

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} = D\big(p(x,y)\,\|\,p(x)p(y)\big) \qquad (12)$$

where $D(\cdot\,\|\,\cdot)$ is the Kullback-Leibler distance between two probability mass functions $p(x,y)$ and $p(x)p(y)$. When the logarithm function in the definition uses a base of 2, the unit for $I(X;Y)$ is bits. The mutual information is symmetric in $X$ and $Y$, nonnegative, and is equal to zero if and only if $X$ and $Y$ are independent. The mutual information $I(X;Y)$ indicates how much information $X$ conveys about $Y$.

Similarly, the mutual information between a set of random variables $\{X_1, \ldots, X_K\}$ and a single random variable $Y$ can be defined as

$$I(X_1, \ldots, X_K; Y) = \sum_{x_1,\ldots,x_K}\sum_{y} p(x_1,\ldots,x_K,y)\,\log\frac{p(x_1,\ldots,x_K,y)}{p(x_1,\ldots,x_K)\,p(y)}. \qquad (13)$$

This definition of mutual information can be used for evaluating the independence between the features extracted from a set of subbands SU and a single subband $s$. The higher the value of the mutual information, the easier it is to estimate the distribution of $s$ given SU. The mutual information thus provides a criterion for the subband selection process.

B. Computation of Mutual Information

Considering the right-hand side of (13), one practical difficulty in computation is the estimation of high-dimensional joint pdfs (or, equivalently, conditional pdfs), due to the possibly large size of the subband set SU. For example, if the texture is decomposed up to level 3 using wavelet packets, then there are 85 subbands. It is impractical and unreliable to estimate such high-dimensional distributions. To avoid the problem of the “curse of dimensionality,” we assume that the set of random variables $\{X_1, \ldots, X_K\}$ provides information to $Y$ only through a many-to-one scalar mapping $g(X_1, \ldots, X_K)$. In this sense, the mutual information $I(X_1,\ldots,X_K;Y)$ can be calculated as $\hat{I} = I\big(g(X_1,\ldots,X_K); Y\big)$. In this model, $g(X_1,\ldots,X_K)$ is obtained by processing $X_1,\ldots,X_K$. Thus, applying the data-processing theorem [35] yields $I(X_1,\ldots,X_K;Y) \ge I\big(g(X_1,\ldots,X_K);Y\big)$. The equality is achieved if and only if $g(X_1,\ldots,X_K)$ is a sufficient statistic for $Y$ [35]. Since this is rarely true in practice, finding a proper form for the function $g$ is important to minimize the error caused by the inaccurate assumption. Since the true mutual information $I(X_1,\ldots,X_K;Y)$ is the upper bound for the estimated mutual information $\hat{I}$, the function $g$ should maximize the estimate $\hat{I}$. For the convenience of computation, $g$ is usually assumed to be a linear function. In [27], a simple linear model was adopted, as follows:

$$g(X_1, \ldots, X_K) = \sum_{i=1}^{K} w_i X_i. \qquad (14)$$

Optimizing this model, i.e., the weighting coefficients $w_i$, incurs similar difficulties as discussed in [36]–[38]. To avoid the difficulties, a simple equal-weight model, i.e., $w_i = 1/K$, is used in [27] and the results from the computational model match the statistical property of actual images well. In this paper, equal weights are used for mutual information computation, i.e., $w_i = 1/K$. In this case, $g$ is the unbiased estimate for the mean of $X_1, \ldots, X_K$.

At this point, the accuracy of this simplified model for mutual information computation and its effect on the MISS algorithm deserve further discussion. In [27], different weighting models, such as equal weights and optimal linear weights, have been proposed. The experimental results show that the mutual information estimate does not vary much based on the weighting model. Using the equal-weight model is equivalent to averaging the energy values in the set SU. In this process, the subbands with higher energy values play a more important role in the mutual information computation that determines the next selected subband. This approximation of the mutual information computation may cause some bias in the selection of the subbands. The next selected subband is the one least dependent on those subbands in SU with high energy values. This is consistent with the current knowledge that the wavelet subbands with higher energy values are more informative for texture classification. As shown in the experimental results in the later sections, the simplified model is able to achieve a relatively low classification error rate with a small number of subbands.

In the actual implementation, the value of mutual information between two discrete vectors $x$ and $y$ needs to be computed in the discrete domain. Usually, the ranges of $x$ and $y$ are partitioned into $K_x$ and $K_y$ intervals, respectively [39], [40]. The continuous pdf $p(x,y)$ can be approximated by the 2-D histogram $\hat{p}(i,j)$, $i = 1, \ldots, K_x$, $j = 1, \ldots, K_y$. Similarly, the marginal distributions $p(x)$ and $p(y)$ can be estimated by $\hat{p}(i) = \sum_j \hat{p}(i,j)$ and $\hat{p}(j) = \sum_i \hat{p}(i,j)$, respectively. Based on the estimated discrete probability distributions, the mutual information can be calculated as

$$\hat{I}(x;y) = \sum_{i=1}^{K_x}\sum_{j=1}^{K_y} \hat{p}(i,j)\,\log\frac{\hat{p}(i,j)}{\hat{p}(i)\,\hat{p}(j)}. \qquad (15)$$

If the random processes $x$, $y$ are stationary and ergodic, then the histogram $\hat{p}(i,j)$ reliably approaches the pdf $p(x,y)$ and $\hat{I}(x;y)$ converges to $I(X;Y)$.
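A histogram-based estimator of (15) can be written in a few lines; the bin count of 16 is an assumed value, since the paper does not report the partition sizes it used:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    # 2-D histogram of the paired samples, normalized to a joint pmf.
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y
    nz = pxy > 0                            # skip empty bins (0 log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])))
```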


V. SUBBAND SELECTION COMBINING DEPENDENCE AND ENERGY

As mentioned in Section I, the evaluation of individual subbands using criteria such as the energy of each subband provides useful information for subband selection. It is now clear that two factors, the evaluation of individual subbands and the dependence among subbands, affect the subband selection. In Section IV, the dependence between different subbands is considered for subband selection. However, the subbands selected based on the independence criterion are not necessarily the most significant subbands for classification. In order to further improve subband selection for classification, the subband grouping and selection (SGS) algorithm is proposed to combine the dependence between subbands and the evaluation of individual subbands. SGS achieves this objective through two steps. First, the algorithm discovers the structure of statistical dependency between subbands using a grouping algorithm that partitions the subbands into different sets based on their dependency on each other, i.e., ideally, the subbands from the same set are dependent, while subbands from different sets are independent. In the second step, the subband with the highest energy from each subset is selected for classifying images. With these two steps, SGS successfully incorporates the previous research results in [6] and exploits the statistical dependency among subbands for a concise representation of the images.

A. Statistical Dependency Between Subbands

The general problem of discovering the structure of statistical dependency can be described as follows. Given a set of subbands $S = \{s_1, s_2, \ldots, s_M\}$, which is the full wavelet packet decomposition in our problem, find a partition $\{S_1, S_2, \ldots, S_P\}$ of the set $S$ such that

$$\bigcup_{i=1}^{P} S_i = S \quad \text{and} \quad S_i \cap S_j = \emptyset \;\;\text{for } i \ne j \qquad (16)$$

where $P$ is the number of subsets. It is desired that the subbands from the same subset are dependent on each other, while the subbands from different subsets are independent. If this property holds, then we can choose only one feature from each subset to construct a concise representation of the texture.

Discovering the structure of statistical dependency has been addressed in the literature. One of the recent research results in this area was introduced in [41], where a method based on hypothesis testing is proposed. Different hypotheses on subset partitions were compared based on log-likelihood. However, the number of all possible partitions of a set of $M$ random variables (the Bell number $B_M$) grows faster than exponentially in $M$, which is practically intractable. The hypothesis testing also requires evaluating the joint pdf of all the random variables in the same set. The estimation of the pdf in high-dimensional spaces is usually unstable and inaccurate, due to the sparsity of the training data [16]. To avoid this problem of estimating high-dimensional joint pdfs, we propose an approximate method for statistically partitioning the subbands based on pairwise dependency. More specifically, we can estimate the marginal pdf of features extracted from each subband. Using these estimated marginal pdfs, the pairwise dependency between two subbands can be quantified. A data grouping algorithm is subsequently applied to the features from all subbands, generating the partition of the features from each subband. This grouping algorithm only requires pairwise dependencies, thus avoiding computations in high-dimensional space. Note that this method is an approximation to the discovery of the structure of statistical dependency. For example, given three random variables $X$, $Y$, $Z$, it is possible that $I(X;Z) = 0$ and $I(Y;Z) = 0$, while $I(X,Y;Z) > 0$ [42]. If only pairwise dependence is measured, $Z$ will not be grouped into the same subset with $X$ and $Y$, though the dependence is evident. Another limitation lies in the data grouping algorithm itself. Research on data grouping, though conducted for several decades, is still far from perfect [16]. This indicates that an accurate partitioning of the subbands, such that the ones in the same subset are dependent while the ones from different subsets are independent, can only be approximated. Despite these limitations, the grouping algorithm, measuring pairwise dependency, is still able to find the dependence structure in the data.

In this paper, the $K$-means grouping algorithm is used to generate the partition of subbands. Given $M$ subbands, $P$ partition subsets ($P < M$), and the mean of each subset as $m_1, m_2, \ldots, m_P$, the $K$-means algorithm can be described as follows [16].

1) Randomly select $P$ subbands from the $M$ subbands and set them as initial class means $m_1, m_2, \ldots, m_P$.

2) Classify the subbands. Each subband is assigned to the mean with which it has the highest mutual information.

3) Recompute the means $m_1, m_2, \ldots, m_P$ based on the partition of subbands from step 2).

4) Repeat steps 2) and 3) until the means do not change.

To avoid randomness in the final performance evaluation, we run the $K$-means algorithm multiple times and the final evaluation is based on the average value from all runs.
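The grouping step might be sketched as follows. Treating the elementwise mean of a group's feature vectors as the group "mean" is an assumption; the transcript does not spell out how the means are recomputed.

```python
import numpy as np

def group_subbands(F, n_groups, mi_fn, n_iter=50, seed=0):
    # F: (n_samples, n_subbands) energy features; mi_fn: pairwise MI.
    rng = np.random.default_rng(seed)
    n_subbands = F.shape[1]
    # Step 1: random subbands serve as the initial class means.
    means = F[:, rng.choice(n_subbands, n_groups, replace=False)].copy()
    labels = -np.ones(n_subbands, dtype=int)
    for _ in range(n_iter):
        # Step 2: assign each subband to the mean with the highest MI.
        new = np.array([
            int(np.argmax([mi_fn(F[:, s], means[:, g])
                           for g in range(n_groups)]))
            for s in range(n_subbands)
        ])
        if np.array_equal(new, labels):     # Step 4: stop when stable
            break
        labels = new
        # Step 3: recompute each group mean from its members.
        for g in range(n_groups):
            if np.any(labels == g):
                means[:, g] = F[:, labels == g].mean(axis=1)
    return labels
```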

B. Subband Grouping and Selection

Given a training texture set $T_r$ and a testing texture set $T_e$ for classification, and a subband set $S = \{s_1, s_2, \ldots, s_M\}$ from the wavelet packet decomposition, the SGS algorithm can be described as follows.

1) For each texture $t \in T_r$, extract the energy values of all subbands from the wavelet packet decomposition. In the implementation, the minimum subband size is set to 16×16.

2) For each subband $s_i$, estimate the marginal pdf of its energy value using the Parzen method with a Gaussian kernel [16] based on the training set.

3) For any two subbands $s_i$, $s_j$, use (15) to compute the mutual information between the extracted features using the pdfs from step 2).

4) Apply the $K$-means grouping algorithm with the pairwise mutual information as the metric to generate the partition of the subband set $S$: $\{S_1, S_2, \ldots, S_P\}$.


Fig. 1. Simplified illustration of SGS: Subbands are first partitioned into different groups (three ellipses) and then the subband with the highest energy (filled dot) is selected from each group.

5) Select the subband with the highest energy value from each subset $S_i$, $s_i^* = \arg\max_{s \in S_i} E_s$. The set of subbands used for representing the images is thus $\mathrm{SU} = \{s_1^*, s_2^*, \ldots, s_P^*\}$.

6) Extract the subband energies of all images in the testing set $T_e$. Use only the energy values corresponding to the subbands in the set SU for classification.

A simple illustration of SGS for a three-dimensional space is given in Fig. 1.
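Steps 4) and 5) then reduce to a few lines on top of the grouping sketch above, again assuming the hypothetical helpers `group_subbands` and a pairwise MI estimator:

```python
import numpy as np

def sgs_select(F, n_groups, mi_fn):
    # Group subbands by pairwise MI, then keep the highest-energy
    # subband of each group (steps 4 and 5 of SGS).
    labels = group_subbands(F, n_groups, mi_fn)
    energy = F.sum(axis=0)                  # total energy per subband
    selected = []
    for g in range(n_groups):
        members = np.flatnonzero(labels == g)
        if members.size > 0:
            selected.append(int(members[np.argmax(energy[members])]))
    return selected                         # the set SU
```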

VI. EXPERIMENTS

A. Experiment Setting

To evaluate the proposed wavelet subband selection algorithms, classification experiments are conducted on the Brodatz texture database [43]. All of the images are grayscale and have a size of 512×512. Each texture is divided into 16 nonoverlapping subimages with a size of 128×128. In this way, the data set for the experiments contains 54 different texture classes, with 16 images in each class, resulting in 864 images for the experiments. This experimental setting of using training and testing images from the same large texture image is commonly used in the evaluation of texture classification methods, such as in [6], [7], [9], [10], and [44]. It is known that the size of the subimages affects the classification results. Since the focus of this paper is to demonstrate the effect of subband selection methods rather than finding a set of optimal parameters for texture classification, we fix the texture size to 128×128.

The gray-scale images are first normalized to a given mean and variance. Denoting $I(i,j)$ as the value of pixel $(i,j)$ and $\mu$ and $\sigma^2$ as the mean and variance of $I$, respectively, the normalized image is given by

$$\hat{I}(i,j) = \begin{cases} \mu_0 + \sqrt{\dfrac{\sigma_0^2\,\big(I(i,j) - \mu\big)^2}{\sigma^2}} & \text{if } I(i,j) > \mu \\[6pt] \mu_0 - \sqrt{\dfrac{\sigma_0^2\,\big(I(i,j) - \mu\big)^2}{\sigma^2}} & \text{otherwise} \end{cases} \qquad (17)$$

where $\mu_0$ and $\sigma_0^2$ are the predefined mean and variance of the adjusted image $\hat{I}$, respectively. In the experiments, $\mu_0$ and $\sigma_0^2$ are set to fixed predefined values.

The experiment includes two stages: the training stage and the testing stage. Half of the data set ($54 \times 8 = 432$ images) is used for training and the other half is used for testing. In the training stage, each texture in the training set is decomposed with the wavelet packet transform up to three levels. Thus, for each texture, the number of subbands from a single wavelet basis is $1 + 4 + 16 + 64 = 85$. A predetermined number of subbands are selected by applying the subband selection algorithms on the training set $T_r$. In the training stage, due to the randomness of selecting the initial seeds for the $K$-means grouping algorithm, SGS is run ten times for the same training and testing data set and the results are averaged. For this experimental setting, it is found empirically that averaging over ten runs results in a relatively stable result.

Classification experiments are conducted on the testing set $T_e$, with the images decomposed only at the subbands selected in the training stage. In the testing stage, the classification error rates for different numbers of selected subbands are computed for performance evaluation. The classification error rate is computed using the $k$-NN classifier with cross-validation (leave-one-out) [16]. All of the images in the testing set are classified. In each round, one texture from $T_e$ is taken out and the normalized Euclidean distance defined in (18) between this texture and all the other images from $T_e$ is computed based on the energy features extracted from the selected subbands. The normalized Euclidean distance, which was shown to be effective for measuring texture similarity [45], is

$$d(\mathbf{x}, \mathbf{y}) = \sum_{n} \left( \frac{x_n - y_n}{\sigma_n} \right)^2 \qquad (18)$$

where $\mathbf{x}$, $\mathbf{y}$ are two different feature vectors, and $\sigma_n$ is the standard deviation of the feature extracted from subband $n$, estimated from the training set $T_r$. The “$k$” most similar images determine the class label of the texture to be classified by a majority vote. After each texture in the testing set $T_e$ is classified in this way, the ratio of the number of misclassified images to the number of total images in $T_e$ is computed as the error rate. The value “$k$” in “$k$-NN” is chosen to be 16. Considering that the number of images in each class is 16, we expect that most of the 16 most similar images come from the same class as the texture to be classified.
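The evaluation loop might look as sketched below; `F_test` holds the selected-subband energies of the testing images and `sigma` the per-subband standard deviations estimated on the training set. The names are illustrative, not the paper's.

```python
import numpy as np

def loo_knn_error_rate(F_test, labels, sigma, k=16):
    # labels: integer class labels as a NumPy array.
    Z = F_test / sigma              # after scaling, (18) is an ordinary
    n = len(Z)                      # squared Euclidean distance on Z
    errors = 0
    for i in range(n):
        d = np.sum((Z - Z[i]) ** 2, axis=1)
        d[i] = np.inf               # leave-one-out: exclude the query
        nearest = np.argsort(d)[:k]
        predicted = np.bincount(labels[nearest]).argmax()  # majority vote
        errors += int(predicted != labels[i])
    return errors / n
```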

For comparison, four different algorithms for wavelet subband selection are implemented and tested: 1) the subband selection algorithm based only on the magnitude of the energy in each subband [6] (the “Energy” method); 2) the subband selection algorithm that evaluates the Fisher discrimination power of each subband [3], [19] (the “Fisher” method); 3) the best wavelet packet decomposition tree based on entropy [18] (the “Best Tree” method); and 4) the subband selection algorithm that evaluates each subband with its entropy value (the “Entropy” method). In the “Best Tree” method, if the entropy value of a subband is less than the sum of the entropy values of its children subbands, the subband is decomposed into children subbands. This criterion is iteratively applied to the leaf nodes of the wavelet decomposition tree. With this criterion, it is not possible to obtain an arbitrary number of subbands; the only parameter that can be modified is the decomposition level, which relates to the maximum number of subbands. The “Entropy” method selects subbands with high entropy values for classification and is a combination of the ideas presented in [3], [18], and [19].

The four selection methods (“Energy,” “Fisher,” “Best Tree,” and “Entropy”) are tested using the same experimental settings as the two algorithms proposed in this paper (“MISS” and

1716 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 9, SEPTEMBER 2008

Fig. 2. Error rates with six subband selection algorithms with the “Haar” basis.

Fig. 3. Error rates with six subband selection algorithms with the “db10” basis.

Fig. 4. Error rates with six subband selection algorithms with the “bio28” basis.

“SGS”). The error rates for different numbers of subbands selected by the different methods are reported and compared.

B. Performance With Different Wavelet Bases

Different wavelet bases define the transform filters used in the tree-structured decomposition introduced in Section II. Therefore, using different wavelet bases will generate different energy distributions over subbands. The relationship between wavelet bases and classification performance was empirically discussed in [44]. In this paper, three wavelet bases are used for the experiments: Haar, “db10” (Daubechies' basis with ten vanishing moments), and “bio28” (biorthogonal spline wavelets having two vanishing moments in the decomposition filter and eight vanishing moments in the reconstruction filter) [31]. Since the total number of subbands for each wavelet basis is 85, the number of selected subbands is set to change from 5 to 85, with increments of 5. Figs. 2–4 show the classification performance with the “Haar,” “db10,” and “bio28” wavelet bases, respectively.

Several observations can be made from these figures. First of all, the dependence between features extracted from different subbands is useful for subband selection. This is verified by the performance of the MISS algorithm. For all three wavelet bases, the MISS algorithm can achieve a classification

Fig. 5. Error rates with five subband selection algorithms with the “db10” basis for 64×64 images.

accuracy comparable to the accuracy with the full decomposition using only 20 to 30 subbands. Second, the evaluation of individual subbands is also effective in subband selection. The “Energy,” “Entropy,” and “Fisher” methods select subbands based on the evaluation of single subbands with different evaluation functions. For the “Haar” and “db10” bases, the MISS algorithm performs better than these three algorithms when a smaller number of subbands (fewer than 30) is selected. However, for the “bio28” basis, the “Fisher” method performs better over the same range. This shows that both the dependence between different subbands and the evaluation of individual subbands affect the subband selection for classification. Third, SGS consistently outperforms the other algorithms. This is due to the fact that both the dependence between subbands and the evaluation of individual subbands are incorporated into the selection process. The advantage is more obvious when a smaller number of subbands is selected. The number of selected subbands is the number of clusters in the SGS algorithm. When the number of clusters is relatively small compared to the total number of subbands, the distribution of clusters reflects the underlying dependence structure more reliably, since more data points can be used to construct each cluster. Finally, the “Best Tree” algorithm does not achieve the best classification accuracy. The “Best Tree” is optimal in the sense of image reconstruction, not in the sense of classification.

C. Performance With Different Image Sizes

In order to test the dependence of the performance of the subband selection algorithms on the image size, we tested the algorithms with different image sizes. In texture classification algorithms, it is common to use an image size of 128×128. In this section, we tested the performance of our algorithms with a smaller image size of 64×64. Fig. 5 illustrates the performance of the algorithms for 64×64 input images using the db10 basis. From this figure, it can be concluded that SGS and MISS still perform better than the other algorithms, especially for smaller numbers of subbands.

D. Performance Using Subbands Selected From Two Wavelet Bases

Traditional subband selection algorithms are confined to selecting subbands that are generated from a single wavelet basis. However, for classification, the pool of candidate subbands can be expanded to subbands generated by multiple wavelet bases.

In this subsection, experiments on selecting subbands from two wavelet bases are conducted for all of the subband selection methods mentioned in Section VI-A. Figs. 6–8 show the


Fig. 6. Error rates with six subband selection algorithms by selecting subbands from the “bio28” basis and the “db10” basis.

Fig. 7. Error rates with six subband selection algorithms by selecting subbands from the “Haar” basis and the “bio28” basis.

Fig. 8. Error rates with six subband selection algorithms by selecting subbands from the “Haar” basis and the “db10” basis.

error rates obtained by the algorithms using “bio28” and “db10,” “Haar” and “bio28,” and “db10” and “Haar,” respectively. Each decomposition generates 85 subbands, so the total number of subbands for selection is 170. In order to make the results directly comparable with subband selection from a single wavelet basis, the number of selected subbands is still set to change from 5 to 85, with an increment of 5.

In this modified experimental setting, SGS still consistently outperforms the other algorithms and the observations in Section VI-B still hold true. It is also observed that the error rates obtained by subband selection from two wavelet bases are consistently lower than the error rates using a single wavelet basis. This shows the advantage of selecting subbands from multiple wavelet bases, or redundant dictionaries in general. More subbands can provide more information for classification, and a proper selection algorithm can select the most informative subbands for classification.

The minimum error rates achieved with different wavelet bases and combinations of bases for the different subband selection methods are summarized in Table IV. Note that the different minimum error rates are not necessarily obtained with

Fig. 9. Wavelet basis functions at the first decomposition level for “haar” and “bio28,” in the first row and second row, respectively. In this figure, white regions correspond to high values and dark regions correspond to low values.

the same number of subbands. Based on the results in this table, the best classification performance is achieved by applying the SGS algorithm on the combination of the “bio28” and “haar” bases. In [44], the relation between the wavelet basis and the classification results on texture classification was discussed. Based on the conclusions from [44], the amount of shift variance in the decomposition filters of the wavelet basis is much more important than the regularity of the filters. This was used to explain why the “haar” basis performed better than the Daubechies wavelets for texture classification [44]. Another empirical conclusion drawn in [44] is that even-length biorthogonal filters perform better than odd-length ones. These conclusions support our finding that the combination of the “bio28” and “haar” bases achieves the best performance. Another support comes from the difference of the spaces spanned by the wavelet functions. As shown in Fig. 9, the two wavelet functions (“haar” and “bio28”) have quite different shapes, and, therefore, the combination of the two is able to capture a variety of structures in texture images such as smooth curves and sharp discontinuities.

E. Clustering of Subbands in the SGS Algorithm

In order to understand why SGS performs the best compared to the other methods, it is important to explore the distribution of subbands among clusters. The distribution of subbands in different clusters reflects the similarity of these subbands in identifying different texture structures. To make this analysis easier to visualize, we analyze the grouping results with a 2-level wavelet packet decomposition, i.e., there are 21 subbands in total. The grouping algorithm is run 1000 times. Each time, half of the 864 texture images are randomly selected for decomposition and the energy features are used for grouping. When the number of clusters is set to 2, 998 out of the 1000 runs give the same grouping results. All approximation subbands (subbands 1, 2, and 6 in Fig. 10) are in one cluster and the rest of the subbands are in another cluster. This result indicates that subbands corresponding to the same spatial frequency range usually have a higher degree of dependence and tend to be grouped into the same cluster.

We further extend this analysis by changing the number of clusters to 4. The number of clusters is set to 4 because there are four types of subbands in the wavelet packet decomposition: LL, LH, HL, HH. The resulting subband clusters are used to build a 21×21 co-occurrence matrix $C$, where the $(i,j)$th entry is the number of times the $i$th and $j$th subbands are in the same cluster, with the subband labeling scheme given in Fig. 10. The phenomenon observed in the 2-cluster case appears


TABLE IV
MINIMUM ERROR RATES ACHIEVED WITH DIFFERENT WAVELET BASES AND DIFFERENT SELECTION METHODS

Fig. 10. Subband labeling scheme for a 2-level wavelet packet decomposition.

again, i.e., the approximation subbands are always grouped in the same cluster, with the entries of $C$ relating subbands 1, 2, and 6 having the highest values. Also, subbands corresponding to similar spatial frequencies are grouped into the same cluster most of the time; the corresponding entries of $C$, such as $C(7,10)$, all have large values. The common pattern in these subband pairs is that the paths from the root to the two subbands are swapped. For example, the path from the root to subband 7 is “LL+LH” and the path to subband 10 is “LH+LL.” The empirical analysis of the grouping results is consistent with the filtering structure of the wavelet packet analysis and verifies the validity of the proposed SGS algorithm.
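The co-occurrence analysis might be reproduced with the sketch below, reusing the hypothetical `group_subbands` helper from Section V; the random halving per run follows the description above.

```python
import numpy as np

def cooccurrence_matrix(F, n_groups=4, n_runs=1000, mi_fn=None):
    # F: (n_samples, 21) energy features for a 2-level decomposition.
    n_samples, n_subbands = F.shape
    C = np.zeros((n_subbands, n_subbands), dtype=int)
    rng = np.random.default_rng(1)
    for run in range(n_runs):
        half = rng.choice(n_samples, n_samples // 2, replace=False)
        labels = group_subbands(F[half], n_groups, mi_fn, seed=run)
        # C(i, j) counts how often subbands i and j share a cluster.
        C += (labels[:, None] == labels[None, :]).astype(int)
    return C
```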

F. Performance on Unseen Data

The experimental results discussed so far are based on the setting that the training and testing subimages are cropped from different regions of the same large texture image. Although this experimental setting is commonly used for conducting experiments on texture classification, such as in [6], [7], [9], [10], and [44], having training and testing texture images from the same large image may cause bias in the classification results. Therefore, experiments are conducted for studying the performance of the proposed algorithms on unseen data to see if the proposed algorithms are still effective. For this purpose, nine large texture images are selected and divided into three classes, as shown in Fig. 11, with one class in each row. Each of these 512×512 images is divided into 16 nonoverlapping 128×128 subimages. Subimages from the first image in each row are used for training and the subimages from the other two images are used for testing. In this case, the training set includes 48 samples and the testing set includes 96 samples. The wavelet basis “bio28” is selected for decomposing the images. Other experimental settings are the same as in the previous sections. Given these conditions, the classification results are shown in Fig. 12. For comparison, experiments are also conducted by using subimages from each large texture image in the training set. In this case, 5 subimages are selected from the 16 subimages of each larger texture for the training set and the remaining 11 subimages are used for testing. The classification results of this setting are shown in Fig. 13. From Figs. 12 and 13, it can be

Fig. 11. Texture images used for classification on unseen data. First row: bars; second row: grass; third row: knit pattern.

seen that the classification on the unseen data introduces some fluctuations in the performance curve, but does not change the conclusions of the comparison of the different selection algorithms. The minimum error rates obtained by the two experimental settings are comparable. The major difference in the results is that a small error rate is achieved with fewer subbands in the second experimental setting (Fig. 13) compared to the modified experiments with unseen data (Fig. 12). The reason for the observed fluctuation in the error rates and the slight increase in the number of subbands selected is that the subbands chosen from the training set may not necessarily be representative of the test data because of the differences in orientation and the larger variability within the test data compared to the training data. Despite these fluctuations, the results are consistent with the previous sections, which shows the applicability and validity of the proposed algorithms in a wider setting of texture classification problems.

G. Discussion

In most applications of wavelet transforms, such as denoising and compression, the goal is to reconstruct the original signal/image. The perfect reconstruction goal constrains how the subbands are chosen. However, for classification, it is not required that the selected subbands can be used to reconstruct the original image. The major requirement for classification applications is that the features extracted from the selected subbands are uncorrelated and discriminative. With this criterion, the features from the selected subbands will form a compact and informative representation of the images for the purpose of classification.

The idea of incrementally selecting an informative subset of wavelet subbands, used in the MISS algorithm, bears some similarities to “feature pursuit” under the “minimax entropy principle” [21]. In [21], the feature pursuit process selects the new feature that maximizes the decrease in entropy between the existing feature set and the feature set obtained by adding the new feature.


Fig. 12. Error rates with six subband selection algorithms with the “bio28” basis on unseen data.

Fig. 13. Error rates with six subband selection algorithms with the “bio28” basis and training data from all large texture images.

Feature pursuit was proposed for texture synthesis with a parametric probability model, whereas the objective of the MISS algorithm is texture classification. This difference directly leads to different selection criteria: feature pursuit uses the change in entropy between feature sets to select new features, whereas the MISS algorithm selects the new subband based on the change in mutual information.
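For intuition, a simplified sketch of such dependence-aware incremental selection is given below. It is not the exact MISS criterion from this paper; it substitutes an mRMR-style score (relevance of each subband's feature to the class label minus its average mutual information with already-selected subbands), estimated with scikit-learn.

```python
# Simplified sketch (ours): greedy, dependence-aware subband selection.
# NOT the exact MISS criterion; uses an mRMR-style surrogate score.
import numpy as np
from sklearn.feature_selection import (mutual_info_classif,
                                       mutual_info_regression)

def greedy_select(features, labels, k):
    """features: (n_samples, n_subbands) energy matrix; returns k indices."""
    n = features.shape[1]
    relevance = mutual_info_classif(features, labels)  # I(feature; class)
    selected, remaining = [], list(range(n))
    while len(selected) < k and remaining:
        def score(j):
            if not selected:
                return relevance[j]
            # redundancy: mean MI between candidate j and chosen subbands
            red = np.mean([mutual_info_regression(
                features[:, [s]], features[:, j])[0] for s in selected])
            return relevance[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```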

It is also important to note that different experimental settings for texture classification can affect the classification accuracy. For example, the choice of distance measure, changes in the texture size, and extracting features other than energy from the wavelet subbands may all affect the classification accuracy. However, the focus of this paper is not to propose a new method that outperforms the state-of-the-art texture classification methods, but rather to investigate one factor, dependence, that affects subband selection and to incorporate it into the selection process. The proposed subband selection method is applicable to the filterbank-based classification of other signals.

VII. CONCLUSION

In this paper, the dependence between features extracted from the wavelet packet decomposition is investigated and incorporated into subband selection for classification. Traditional methods implicitly assume independence among features extracted from different subbands, or only consider the dependence between a parent subband and its children subbands. We first demonstrate, through theoretical analysis and simulation, that the features extracted from different subbands are dependent. An algorithm exploiting this dependence, MISS, is proposed for subband selection.

Experimental results show that exploiting the dependence among features from different subbands is effective for subband selection. By combining the dependence among features from different subbands with subband selection based on the individual evaluation of subbands, SGS has been shown to be even more effective. Experimental results indicate that SGS can select a smaller number of subbands and achieve lower classification error rates than existing subband selection methods.

REFERENCES

[1] K. Huang and S. Aviyente, “Mutual information based subband selection for wavelet packet based image classification,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 2005, vol. 2, pp. 241–244.

[2] G. Choueiter and J. Glass, “A wavelet and filter bank framework for phonetic classification,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 933–936.

[3] N. Ince, A. Tewfik, and S. Arica, “Classification of movement EEG with local discriminant bases,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 5, pp. 413–416.

[4] A. Laine and J. Fan, “Texture classification by wavelet signature,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 11, pp. 1186–1191, Nov. 1993.

[5] G. Fan and X. Xia, “Wavelet-based texture analysis and synthesis using hidden Markov models,” IEEE Trans. Circuits Syst. I: Fundam. Theory Appl., vol. 50, no. 1, pp. 106–120, Jan. 2003.

[6] T. Chang and C. Kuo, “Texture analysis and classification with tree-structured wavelet transform,” IEEE Trans. Image Process., vol. 2, no. 4, pp. 429–441, Oct. 1993.

[7] G. Wouwer, P. Scheunders, and D. Dyck, “Statistical texture characterization from discrete wavelet representations,” IEEE Trans. Image Process., vol. 8, no. 4, pp. 592–598, Apr. 1999.

[8] C. Garcia, G. Zikos, and G. Tziritas, “Wavelet packet analysis for face recognition,” Image Vis. Comput., vol. 18, no. 4, pp. 289–297, 2000.

[9] M. Do and M. Vetterli, “Rotation invariant texture characterization and retrieval using steerable wavelet-domain hidden Markov models,” IEEE Trans. Multimedia, vol. 4, no. 4, pp. 517–527, Dec. 2002.

[10] M. Do and M. Vetterli, “Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance,” IEEE Trans. Image Process., vol. 11, no. 2, pp. 146–158, Feb. 2002.

[11] M. Acharyya, R. De, and M. Kundu, “Extraction of features using M-band wavelet packet frame and their neuro-fuzzy evaluation for multitexture segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1639–1644, Dec. 2003.

[12] S. Arivazhagan and L. Ganesan, “Texture classification using wavelet transform,” Pattern Recognit. Lett., vol. 24, no. 10, pp. 1513–1521, Oct. 2003.

[13] M. Unser, “Texture classification and segmentation using wavelet frames,” IEEE Trans. Image Process., vol. 4, no. 11, pp. 1549–1560, Nov. 1995.

[14] K. Laws, “Rapid texture identification,” Proc. SPIE, vol. 238, Image Processing for Missile Guidance, pp. 376–380, 1980.

[15] T. Randen and J. Husoy, “Filtering for texture classification: A comparative study,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 4, pp. 291–310, Apr. 1999.

[16] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York: Wiley-Interscience, 2000.

[17] A. Jain and D. Zongker, “Feature selection: Evaluation, application, and small sample performance,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 2, pp. 153–158, Feb. 1997.

[18] R. R. Coifman and M. V. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 713–718, Mar. 1992.

[19] N. Rajpoot, “Local discriminant wavelet packet basis for texture classification,” presented at the SPIE Wavelets X, 2003.

[20] D. Koller and M. Sahami, “Toward optimal feature selection,” in Proc. Int. Conf. Machine Learning, 1996, pp. 284–292.

[21] S. Zhu, N. Wu, and D. Mumford, “Minimax entropy principle and its application to texture modeling,” Neural Comput., vol. 9, no. 8, 1997.

[22] J. Li and J. Wang, “Automatic linguistic indexing of pictures by a statistical modeling approach,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 9, pp. 1075–1088, Sep. 2003.

[23] L. Yu and H. Liu, “Efficient feature selection via analysis of relevance and redundancy,” J. Mach. Learn. Res., vol. 5, pp. 1205–1224, 2004.


[24] H. Chen and P. Varshney, “Feature subset selection with applications to hyperspectral data,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 2005, vol. 2, pp. 249–252.

[25] J. Portilla and E. P. Simoncelli, “A parametric texture model based on joint statistics of complex wavelet coefficients,” Int. J. Comput. Vis., no. 1, pp. 49–71, 2000.

[26] E. P. Simoncelli, “Modeling the joint statistics of images in the wavelet domain,” presented at the SPIE 44th Annu. Meet., 1999.

[27] J. Liu and P. Moulin, “Information-theoretic analysis of interscale and intrascale dependencies between image wavelet coefficients,” IEEE Trans. Image Process., vol. 10, no. 11, pp. 1647–1658, Nov. 2001.

[28] M. Crouse, R. Nowak, and R. Baraniuk, “Wavelet-based statistical signal processing using hidden Markov models,” IEEE Trans. Signal Process., vol. 46, no. 4, pp. 886–902, Apr. 1998.

[29] G. Fan and X. Xia, “Improved hidden Markov models in the wavelet domain,” IEEE Trans. Signal Process., vol. 49, no. 1, pp. 115–120, Jan. 2001.

[30] J. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3445–3462, Dec. 1993.

[31] S. Mallat, A Wavelet Tour of Signal Processing. New York: Academic, 1999.

[32] A. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.

[33] G. Grunwald, R. Hyndman, L. Tedesco, and R. Tweedie, “A unified view of linear AR(1) models,” Tech. Rep., Univ. Colorado Health Sciences Center, Dept. of Preventive Medicine and Biometrics, Denver, CO, 1996.

[34] A. Srivastava, A. Lee, E. Simoncelli, and S. Zhu, “On advances in statistical modeling of natural images,” J. Math. Imag. Vis., vol. 18, no. 1, pp. 17–33, 2003.

[35] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[36] J. Principe, D. Xu, and J. Fisher, Information Theoretic Learning. New York: Wiley, 1999, pp. 265–319.

[37] K. Torkkola, “Feature extraction by non-parametric mutual information maximization,” J. Mach. Learn. Res., vol. 3, pp. 1415–1438, 2003.

[38] K. Hild, D. Erdogmus, K. Torkkola, and J. Principe, “Feature extraction using information-theoretic learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, pp. 1385–1392, Sep. 2006.

[39] R. Moddemeijer, “On estimation of entropy and mutual information of continuous distributions,” Signal Process., vol. 16, no. 3, pp. 233–246, 1989.

[40] L. Paninski, “Estimation of entropy and mutual information,” Neural Comput., vol. 15, no. 6, pp. 1191–1253, 2003.

[41] A. Ihler, J. Fisher, and A. Willsky, “Nonparametric hypothesis tests for statistical dependency,” IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2234–2249, Aug. 2004.

[42] S. Ross, Introduction to Probability Models, 7th ed. New York: Academic, 2000.

[43] P. Brodatz, Texture: A Photographic Album for Artists and Designers. New York: Dover, 1966.

[44] A. Mojsilovic, M. Popovic, and D. Rackov, “On the selection of an optimal wavelet basis for texture characterization,” IEEE Trans. Image Process., vol. 9, no. 12, pp. 2043–2050, Dec. 2000.

[45] W. Ma and B. Manjunath, “Texture features and learning similarity,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 1996, pp. 425–430.

Ke Huang received the B.S. degree in chemical engineering and the M.S. degree in computer science from Tsinghua University, Beijing, China, in 1999 and 2002, respectively, and the Ph.D. degree in electrical engineering from Michigan State University, East Lansing, in 2007.

From May 2005 to March 2006, he was a full-time temporary technical staff member at Siemens Corporate Research, Princeton, NJ, working on medical image processing for eye disease prevention and diagnosis. Since May 2007, he has been with Google, Inc., Mountain View, CA. His research interests include wavelet analysis, machine learning, information retrieval, and medical image analysis.

Selin Aviyente (M'97) received the B.S. degree (with high honors) in electrical and electronics engineering from Bogazici University, Istanbul, Turkey, in 1997, and the M.S. and Ph.D. degrees in electrical engineering: systems from the University of Michigan, Ann Arbor, in 1999 and 2002, respectively.

Since 2002, she has been an Assistant Professor in the Department of Electrical and Computer Engineering, Michigan State University, East Lansing. Her research focuses on statistical signal processing, in particular nonstationary signal analysis, with applications to image classification and biological signals. Her current research focuses on the study of functional networks in the brain.

Dr. Aviyente is the recipient of the 2005 Withrow Teaching Excellence Award and the 2008 National Science Foundation CAREER Award.