Correspondence Analysis for Principal Components ...

6
Coruespondence Analysis for Principal Gomponents Transformation of Multispectral and Hyperspectral Digital lmages James R. Caft and Korblaah Matanawi Abstract Correspondence analysis is introduced for principal compo- nents ftansformation of multispectral and hyperspectral digital images. This method relies on squared deviations between pixel volues and their expected values (joint probabilities computed as the product of the sum of all pixels in one spectal band and the sum of pixel values across all bands at a given pixel position). Correspondenceanalysis is applied to o multispectml SPoT High Resolution Visible fnnv) image of Eleuthera, Bahamas. Correspondenceanalysis, principa) components analysis, and factor andysis (standardized pfin- cipal components) yiel d similar transformations. Conespon- dence analysis,however, compresses mote imoge variance into fewer principal components. For the particular SPOT HRv scenechosen,correspondence analysis captures g6 percent of the original image vafiance in its first principal compo- nent. Used in a lossy image compression algorithm to recon- struct the original set of three SPOT HRVimages, this first principal component from cofiespondence analysis restores spectrul content better than doesprincipal componentsanal- ysis. lntroduction Principal components transformation, also refened to as a Hotelling transform,of multispectral imagesis well known (e.g., Schowengerdt, 1997;Gonzalez and Woods,1992; Jen- sen, 1996).The typical application relies on within-band var- iance/between-band covariance matrices,eigenvectors of which are used to rotate (transform) original image bands such that each rotated band is statistically orthogonalto (in- dependentofl all other bands. Another application, called standardized principal components(Singh and Harrison, 1985),uses within- and between-band correlation coeffrcients to assemble a matrix that is eigen decomposed. Although called standardized principal componentsin literature re- lated to remote sensing, this method is often known as factor analysis (see reviewsin Davis (1s73), Davis (1986), and Carr (1995)), referred to as such throughout this manuscript to avoid confusion with principal componentsanalysis.Factor analysishas been used in digital image processing to mini- mize the numerical influence of bands having a larger vari- ance.Realistically,principal componentsand factor analysis are similar because (co)variance is directly proportional to correlation coeffi cient. A multivariate method of relatively recent advent, corre- spondence analysis (Benzecri, 1973), differs from principal J. Carr is with the Departmentof Geological Sciences/l.72, University of Nevada,Reno, NV 89557 ([email protected]). K. Matanawi is with World Vision, Sudan Proqram, P.O. Box 56527, Nairobi, Kenya. PHOIOGRAMiIETRIC EiIGINEERING & RETIIOTE SENSING componentsand factor analysisby using a chi-square metric for representing inter- and intra-variablerelationships.For certain types of imagesand/or individual scenes, correspon- dence analysismay provide better discrimination among in- tercorrelated spectralbands when compared to principal componentsor factor analysis.Correspondence analysismay be a better method when applied to ratio data, such as pixel values, or measurements in general (Greenacre, 1984). Correspondence analysis is more a geometricalmethod than a statistical one. Factor analysis and principal compo- nents analysis,for instance,are statisticalmethods. Both are functions of covariance among the multiple variables, where covatianceis the metric for closeness (similarity). Correspon- dence analysisuses a different measureof closeness between variables.A chi-souare metric is used to measuredistance between pixel veciors of order M, where M is the number of spectral bands. In this sense, conespondence analysis is a ge- ometric method because its main concern is the orientation of [pixel] vectors in M-dimensional space, and how close these vectors are to one another. A d-etailed discussionof the historical and theoretical background of correspondence analysiscan be found in Greenacre (19s4). A subsequent application is used to emphasize the simi- larities and differences among principal components analysis, factor analysis, and correspondence analysis. The primary in- tent of this paper is to introduce correspondence analysisfor principal components transformation of multispectraland hy- perspectral digital images.Depending on the image type and scene, it may yield resultsthat are superior to those from the other two methods. An extensionof corresoondence analvsis to digital image compression is also suggesied. Algotithms Principal Components Analysis Given M intercorrelated image bands, principal components analysisproceeds by assembling a matrix [S], M by M in size and symmetrical, such that S(i,i) is the variance for band i, and S(i,il : S(/,t is the covariance between bands i and l. Variance can be comouted as Photogrammetric Engineering& RemoteSensing, Vol. 0s, No. B, August 19s9, pp. 909-914. oo99 - 1, 1, 1, 2/ 99/6508-909$ 3. 00/0 @ 1999 American Society for Photogrammetry and Remote Sensine -;(i"u,) ] ri ['i'"'- (i'",)'] (1) '= [i'"' VAR(BV) : August I gqq 9O9

Transcript of Correspondence Analysis for Principal Components ...

Page 1: Correspondence Analysis for Principal Components ...

Coruespondence Analysis forPrincipal Gomponents Transformation of

Multispectral and Hyperspectral Digital lmagesJames R. Caft and Korblaah Matanawi

AbstractCorrespondence analysis is introduced for principal compo-nents ftansformation of multispectral and hyperspectral digitalimages. This method relies on squared deviations betweenpixel volues and their expected values (joint probabilitiescomputed as the product of the sum of all pixels in onespectal band and the sum of pixel values across all bandsat a given pixel position). Correspondence analysis is appliedto o multispectml SPoT High Resolution Visible fnnv) imageof Eleuthera, Bahamas. Correspondence analysis, principa)components analysis, and factor andysis (standardized pfin-cipal components) yiel d similar transformations. Cone spon-dence analysis, however, compresses mote imoge varianceinto fewer principal components. For the particular SPOT HRvscene chosen, correspondence analysis captures g6 percentof the original image vafiance in its first principal compo-nent. Used in a lossy image compression algorithm to recon-struct the original set of three SPOT HRV images, this firstprincipal component from cofiespondence analysis restoresspectrul content better than does principal components anal-ysis.

lntroductionPrincipal components transformation, also refened to as aHotelling transform, of multispectral images is well known(e.g., Schowengerdt, 1997; Gonzalez and Woods, 1992; Jen-sen, 1996). The typical application relies on within-band var-iance/between-band covariance matrices, eigenvectors ofwhich are used to rotate (transform) original image bandssuch that each rotated band is statistically orthogonal to (in-dependent ofl all other bands. Another application, calledstandardized principal components (Singh and Harrison,1985), uses within- and between-band correlation coeffrcientsto assemble a matrix that is eigen decomposed. Althoughcalled standardized principal components in literature re-lated to remote sensing, this method is often known as factoranalysis (see reviews in Davis (1s73), Davis (1986), and Carr(1995)), referred to as such throughout this manuscript toavoid confusion with principal components analysis. Factoranalysis has been used in digital image processing to mini-mize the numerical influence of bands having a larger vari-ance. Realistically, principal components and factor analysisare similar because (co)variance is directly proportional tocorrelation coeffi cient.

A multivariate method of relatively recent advent, corre-spondence analysis (Benzecri, 1973), differs from principal

J. Carr is with the Department of Geological Sciences/l.72,University of Nevada, Reno, NV 89557 ([email protected]).

K. Matanawi is with World Vision, Sudan Proqram, P.O. Box56527, Nairobi, Kenya.

PHOIOGRAMiIETRIC EiIGINEERING & RETIIOTE SENSING

components and factor analysis by using a chi-square metricfor representing inter- and intra-variable relationships. Forcertain types of images and/or individual scenes, correspon-dence analysis may provide better discrimination among in-tercorrelated spectral bands when compared to principalcomponents or factor analysis. Correspondence analysis maybe a better method when applied to ratio data, such as pixelvalues, or measurements in general (Greenacre, 1984).

Correspondence analysis is more a geometrical methodthan a statistical one. Factor analysis and principal compo-nents analysis, for instance, are statistical methods. Both arefunctions of covariance among the multiple variables, wherecovatiance is the metric for closeness (similarity). Correspon-dence analysis uses a different measure of closeness betweenvariables. A chi-souare metric is used to measure distancebetween pixel veciors of order M, where M is the number ofspectral bands. In this sense, conespondence analysis is a ge-ometric method because its main concern is the orientationof [pixel] vectors in M-dimensional space, and how closethese vectors are to one another. A d-etailed discussion of thehistorical and theoretical background of correspondenceanalysis can be found in Greenacre (19s4).

A subsequent application is used to emphasize the simi-larities and differences among principal components analysis,factor analysis, and correspondence analysis. The primary in-tent of this paper is to introduce correspondence analysis forprincipal components transformation of multispectral and hy-perspectral digital images. Depending on the image type andscene, it may yield results that are superior to those from theother two methods. An extension of corresoondence analvsisto digital image compression is also suggesied.

Algotithms

Principal Components AnalysisGiven M intercorrelated image bands, principal componentsanalysis proceeds by assembling a matrix [S], M by M in sizeand symmetrical, such that S(i,i) is the variance for band i,and S(i,il : S(/,t is the covariance between bands i and l.Variance can be comouted as

Photogrammetric Engineering & Remote Sensing,Vol. 0s, No. B, August 19s9, pp. 909-914.

oo99 - 1, 1, 1, 2 / 99/6508-909$ 3. 00/0@ 1999 American Society for Photogrammetry

and Remote Sensine

-;(i"u,) ]ri [' i '"'- (i '",)'] (1)

'= [i'"'VAR(BV) :

A u g u s t I g q q 9 O 9

Page 2: Correspondence Analysis for Principal Components ...

in which N is the total number of pixels, BV, collectivelyrepresenting a particular image band. This equation com-pares to that for covariance between two image bands, BV,and BVr: i.e.,

1 f . . $cov(nv, . BVr) :

N(N _ 1 l LNZ BV,. r BVr. ,

- ,i ,u,,i ,u,,] e)where N is again the total number of pixels in each imageband.

Suppose the BVs in a Landsat TM scene are to be rotated(transformed) using principal components analysis of allseven (M bands. In this case, a matrix S of size 7 by 7 iscomputed. Diagonal entries of this matrix consist of theseven variances for each band. Additionallv. there are 21, off-diagonal entries, symmetrical above and below the diagonal,representing the covariances for all possible two-band combi-nations from the group of seven bands.

Factor Analysis (Standardized Principal Components Analysis)Again, assume a multispectral image consisting of M spectralbands. A matrix S of size Mbv M is comouted such thatS(i,, : S(i,t : r, the correlatibn coefficient between bands, jand i. For consistency with previous equations, let the pixelsin band i be represented by BVi, and those for band j be rep-resented by BVr. Then

COV (BVJ BV,)t" : (s*rs,J-In this equation, S represents the standard deviation of a dis-crete sample. Moreover, if i : i, then r : 1; thus, all diago-nal entries of S are equal to 1.

Given a seven-band Landsat TM scene, S is a 7 by 7 ma-trix, The seven diagonal entries are each equal to 1. The 21unique off-diagonal entries, symmetrical above and belowthe diagonal, are equal to correlation coefficients for all pos-sible two-band combinations (from the original group ofseven bands). These correlation coefficients range numeri-cally between -1 and 1, inclusive.

Conespondence AnalysisIn this development, a hypothetical notion of an image isused to demonstrate mathematical concepts. Suppose a 512by SIZ , seven-band Landsat TM image is to be transformedusing correspondence analysis. Each band consists of 512 by512 pixels, or a total of 262,L44 pixels. Let a matrix Y beformed that is N by M in size; M in this case, is 262,1.44,and M is 7. Notice that Y is simplv the total collection of allpixels involved. Further, Iet theioial sum of all entries in Ybe called u. The following steps are completed to yield asquare, symmetrical matrix S that is eigen decomposed toyield the image transformation (for more detail and computerprograms, see Carr (1990; 1995; 1998)):

Step 3:

normalize Y by u: Y' = Y/u;form an N by 1 vector W, each entry of whichrepresents the sum of each row of Y'; each entryin W is the total sum of [normalized] pixel val-ues across all spectral bands considered for agiven pixel position;form an Mby 1, vector T, each entry of whichrepresents the sum of each column of Y'; eachentry in T is the sum of all [normalized] pixelvalues in a given spectral band;

Step 4: although a matrix S of size Mby M can beformed from two intermediate matrices. S^ andS., such that

g f O A u s u s t I 9 9 9

This is an inefficient approach with respect to computermemory; instead, the matrix S can alternatively be formedas follows, an approach that more directly implies thechi-square analogy (Davis, 1986; Can, 1995):

,*:i(w)(w) (s)For example, if i : i = k :1, then

q - ( Y , , - w ' 7 , ) 'v1l wrT,

and comparing to the equation for the chi-square statistic:

" (O' - E,)'

x " : E ,

in which O is an observed value and E is the expected value,the analogy is realized if one considers that Y is observed,and the product WT is the expected value of Y. When usingthe second approach in correspondence analysis for comput-ing the matrix S, off diagonal entries can become negative.The second approach for forming S is used in this paper andin Car (1998). As a ffnal note, correspondence analysis al-ways reduces the dimensional space by at least one. If M im-age bands are considered, only M - 1 eigenvalues, at themost, will be significant. This has important implications fordigital image compression, a concept that is explored later inthis paper.

ApplicationA portion (400 rows by 400 pixels per row) of a SPOT HighResolution Visible (nnv) image of Eleuthera, Bahamas (Figure1) acquired on 21 April 1986 is chosen for an example com-parison among principal components analysis, factor analy-sis, and correspondence analysis. The three multispectralchannels of the SPOT HRV sensor are used: channel 1 (0.5 to0.59 pm), channel 2 (0.61 to 0.68 pm), and channel 3 (0.79 to0.89 pm). Matrices S, obtained from each multivariate method,are listed (Table 1), along with their associated eigenvaluesand eigenvectors (Table 2).

Principal components analysis and factor analysis areidentical with respect to the amount of original image vari-ance captured (represented) in each principal component: 70percent in the first, 28 percent in the second, and 2 percentin the third principal component. For this particular SPoTHRV scene, there seems to be no need to standardize imagevariance using correlation coefficients in a principal compo-nents analysis because the eigen decomposition results ardidentical from principal components analysis and factoranalysis.

Tralr 1. MATRTCES S, ron PnrnclpnL Colponenrs, FAcroR, ANDCoRnesponoeruce Annlysrs

Principal Components Analysis

(4)

(3)

Step 1:Step 2:

2261.61546.1

- l / / 5 - 5

Factor Analysis1.00000.8025

-0 .6029Correspondence Analysis

0.08750.0370

-o.7277

1546.11641.4-432.O

0.80251.0000

-o .1722

0.0370o.0267

-0 .0619

_ L / / J . J

-432.O3834.3

- 0 .6029-o.7722

1.0000

-o.1211.-0.0619

0.1 780

PHOTOGRAMMETRIC ENGINEERII{G & REIIIOTE SENSING

Page 3: Correspondence Analysis for Principal Components ...

Figure 1' Mosaic of sPor HRV spectral images of Eleuthera, Bahamas acquired on 21 April 1986. These images represent asub-sampled, 400 row by 400 pixel section of the original scene. Left: Channel 1 (0.5 to 0.59 pm); center: Channel 2 (o.6Ito 0.68 pm); right: Channel 3 (0.79 to 0.89 pm).

In-comparison, correspondence analysis captures g6 per-cent of the original image variance in its first principal com-ponent. The second principal component captures most ofthe remaining 4 percent of the total image variance, whereasthe third component represents a negligible amount of vari-ance. The 1arge amount of variance captured by correspon-dence analysis in its first principal component is usedlaterin a discussion on digital image compression.

Eigenvectors (Table 2) from all three methods are usedto transform the original suite of three bands to three mutu-ally orthogonal (statistically independent) bands, mosaics ofwhich are shown (Figures 2 and 3; a specific figure devel-oped using factor analysis is not presented because its trans-formation is visually identical to that from principalc_omponents analysis). No scaling was used to develop thedisplays (Figures 2 and 3), except to transform pixel valuesto a range, [0, 255], to facilitate display.

Mathematically, the transformation was implemented asfollows. Each of the SPOT HRV spectral band imases wasloaded as a single column in a matrix O, an N b! M imagewhere N is the total number of pixels in each band image,and M is the number of spectral bands. Eigenvectors (Table2) from each method were loaded as columns into a matrixE. The transformation was then obtained as a product: T :

OE, and each column in T is one of the transformed images,N total pixels in size.

Principal components analysis, factor analysis, and cor-respondence analysis yield visually similar results for thefirst principal component. The second principal componentimage from correspondence analysis is visuafly similar to thethird principal component image from both piincipal compo-nents an_alysis and factor analysis, probably becauJe approxi-mately the _same amount of original image-variance is beingrepresented. The third principal component image from coi-respondence analysis is shown, but represents negligible var-iance and is not comparable to any principal componentimage from the other two methods

Extension to Digital lmage CompressionGiven that OE : T effects the principal components transfor-mation, then a reconstruction is afforded by reversing thisprocess: TE-' : O, in which E I is the inverse of theeigenvector matrix. If all M columns (principal componentimages) in T are used in this process, O is reconstructed ex-actly with no loss. In this case, T and O have the same infor-mation content (the same size).

- A- lossy reconstruction (one with error) is possible usingonly the most significant principal component, or compo-

TneLE 2. Ereen Decovposrrron Resulrs

Eigenvectors

Vector 1 Vector 2 Vector 3

Principal Components:

Factor Analysis:

Correspondence Analysis:

Eigenvalues:Variance:

Eigenvalues:Variance:

Eigenvalues:Variance:

-o.779- 0 .435

1.0005463.7

(7Oo/o)- 1 .458-7.230

1.0002.O9

(7Oo/o)-0 .687-o.342

1..0000.28

(s6%)

0.6647.1721.000

2204.3(28%)

0.0630.7381.0000.84

(28o/o)

3.948-5 .008

1.0000.0099

(4%)

2.679-2.499

1.000' t57 .8

(2"/.)7 .970

-1..522

1.000o.o7(2%)

o.9750.9681.0005.04E-15(o%)

Sum = 7825.8

Sum = 3.0

Sum : 0.29

PHOTOGRAMMETRIC ENGINEERII{G & REMOTE SEI{SII{G A u g u s t 1 9 9 9 9 1 1

Page 4: Correspondence Analysis for Principal Components ...

Figure 2. Mosaic of transformed images obtained using correspondenceter: second principal component image; right: third principal componentl i ne .

analysis. Left: f irst principal component image; cen-image. Each image is 400 lines by 400 pixels per

nents, in T. For example, correspondence analysis of the oneSPor HRV image used in this paper resulted in the first prin-cipal component capturing 96 percent of the original datavariance, An experiment is presented to reconstruct the origi-nal three images, O, using only the first column in T. In thiscase. the second and third columns of T are discarded. Sucha reconstruction is presented (Figure a). This is an exampleof a lossy compression, because discarding the second andthird columns in T results in a loss of 4 percent of the origi-nal data variance. The reconstruction (Figure 4) is visuallysimilar, but not identical to the original three spectral bands(Figure 1). For comparison, a reconstruction is attempted us-ing only the first principal component image from principalcomponents analysis (Figure 5). In this case, 30 percent ofthe original image variance is discarded, and the comparisonto the original spectral images (Figure 1) is less satisfying.

In these reconstructions (Figure 4 contrasted by Figure5), correspondence analysis restores the vegetation brightnessin the infrared spectral band (spor HRV band 3) better thandoes principal components analysis when using only the first

principal component for reconstruction. In this particularcase, the first principal component in correspondence analy-sis represents more between-band covariance than does thecomparable component in principal components analysis.Whereas lossy digital image compression may not be satisfy-ing for many applications, it is used here more to distinguishthe information content in principal components developedusing correspondence analysis from the information contentin principal components developed using principal compo-nents analysis or factor analysis.

SummaryCorresoondence analvsis is a relativelv new method of multi-variate analysis. Its metric is a "chi-square" deviation be-tween a true (normalized) pixel value and its expected value.In application to a three-band, spor HRV image, the first twoprincipal component images from correspondence analysiscapture almost all original image variance. In contrast, allprincipal component images are necessary from principalcomponents analysis or factor analysis (standardized princi-

Figure 3. Mosaic of transformed images obtained using principal components analysis. Left: f irst principal component image;center: second principal component image; right: third principal component image. Each image is 4OO lines by 400 pixelsoer l ine.

A u g u s t I 9 9 9 PHOTOGRAMMETRIC EI{GI]{EERING & REMOTE SEI{SING

Page 5: Correspondence Analysis for Principal Components ...

Figure 4. Mosaic of reconstructed sPoT uRv images using the first principal component image from correspondence analysis.Channel 1 is on the left; Channel 2 is in the center; and Channel 3 is on the right. Note the restoration of vegetation bright-ness in Channel 3.

pal components analysis) to represent the same amount oforiginal image variance.

Visually, correspondence analysis yields principal com-ponent images similar to those obtained lrom principal com-ponents analysis or factor analysis. This statement is con-firmed by comparing Figures 2 and 3 again and noting thatthe second principal component image from correspondenceanalysis is almost identical to the third principal componentimage from principal components or factor analysis. If ap-plied for decorrelation stretches, there is no validity to a rec-ommendation that one of these methods be used rather thananother. But. correspondence analvsis captures more of theoriginal image variance in fewer piincipal components, anadvantage when the original image data consists of manyspectral bands.

Another advantage to correspondence analysis may befor digital image compression, especially for spectral com-pression. Such compression is of two types: lossless (error-free reconstruction) and lossy (reconstruction with error),Principal components and factor analysis provide lossless re-

construction if all principal component images are used forthe reconstruction. This, however, represents no compres-sion. Correspondence analysis will always yield some com-pression, even for lossless reconstruction, because it alwaysyields M - 1 significant principal components (recall thenegligible third principal component from correspondenceanalysis in the one application reviewed in this paper; Table2). In the one example of reconstruction that is presented,correspondence analysis yielded a fair reconstruction usingonly its first principal component.

AcknowledgmentsThis work was supported in part by the NAsA-sponsoredUniversity of Nevada System Space Grant Consortium. Soft-ware used in this study was developed by the senior authorusing Visual Basic 5.0 (Carr, 1998). This software is in thepublic domain and can be obtained at the following WorldWide Web site : http ://www. iamg. org/CGEditor/index.htm.The SPoT HRV image of Eleuthera, Bahamas was taken from aCD-RoM: SPOT Satellite Image Library for Schools, NPA Group

Figure 5. Mosaic of reconstructed spoT HRV images using the first principal component image from principal componentsanalysis. Channels are arranged as described in the caption to Figure 4. Vegetation brightness is not reconstructed well inChannel 3.

PHOTOGRAMMEf RIC ENGINEERING & REMOTE SENSIT{G A u e u s t 1 9 9 9 9 1 3

Page 6: Correspondence Analysis for Principal Components ...

Ltd., Edenbridge, Kent, TNB 6HS, United Kingdom. We aregrateful to two anonymous individuais whose reviews sub-s l a n t i a l l y i m p r o v e d t h i s m a n u s c r i p t .

References

Benzecri, J.-P., 1,573. L'Analyse des Donnees, Vol. 1: La Taxinomie.Vol. 2: L'Analyse des Correspondances, Dunod, Paris, 628 p.

Carr, J.R., 1990. CORSPOND: A Portable FORTRAN-77 Program lbrCorrespondence Analysis, Computers and Geosciences, 16(3):285-307.

1995. Numedcal Analysis for the Geological Sciences,Pren-tice-Hall, Englewood Cliffs, New Jersey, 592 p.

1sgB. A Visual Baslc Program for Principal ComponentsTransformation of Digital Images, Computers and Geosciences,24t31:2O9-21,8.

Davis, f.C., 1-973. Statistics and Data Analysis in Geolog.v, Fit'st Edi-t ion, Wi lev, New York, 550 p.

1986. Stofisfics and Data Analysis in Geologv, Second Edi-tion, Wilev, New York, 646 p.

Greenacre, M.J., 1984. Theory and Applications of CotespondenceAnalysis, Academic Press, London, 364 p.

Gonzalez, R.C., and R.E. Woods, 1992. Digital Image Processing,Addison-Wesley, Reading, Massachusetts, 7Lo p.

Jensen, J.R., 1996. lntroductory Digital Image Processing, Prentice-Hall, Englervood Cliffs, New Iersey, 316 p.

Schowengerdt, R.A., 1997. Remote Sensing: Models and Methods forImage Processing, Second Edition, Academic Press, Ner,r' York,5 2 2 p .

Singh, A. , and A. Harr ison, 1985. Standardized Pr incipal Compo-nents, In l . J . Remote Sensing, 6[6) :885-8S6.

[Received 18 JuIy 1997; revised and accepted 10 September 1998t re-v ised 0z October 1998)

ASOTS

n u g u s r PHOTOGRAMMETRIC ENGIIiEERING & REMOTE SENSING