
Barry M. Wise, President, Eigenvector Research, Inc., 830 Wapato Lake Road, Manson, Washington 98831, [email protected]

Patrick Wiegand, President-Elect, Union Carbide Corporation, P.O. Box 8361, South Charleston, West Virginia 25303, [email protected]

Margaret A. Nemeth, Secretary, Monsanto Company, 800 N. Lindbergh Boulevard, St. Louis, Missouri 63167, [email protected]

Aaron J. Owens, Editor-in-Chief, DuPont Company, POB 80249, Wilmington, DE 19880-0249, [email protected]

Neal B. Gallagher, Treasurer, Eigenvector Research, Inc., 830 Wapato Lake Road, Manson, Washington 98831, [email protected]

Ronald E. Shaffer, WebMaster, Naval Research Laboratory, Chemistry Division, Code 6116, Washington, DC 20375-5342, [email protected]

David Duewer, Membership Secretary, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, [email protected]

The North American Chapter of the International Chemometrics Society

Newsletter #21 October, 2000

In this issue:

Guest Editorial, by Bob Schweitzer

Conference Announcements

A Brief Introduction to Multivariate Image Analysis (MIA), by Barry M. Wise and Paul Geladi

The Angle Measure Technique (AMT) for textural pre-processing in Multivariate Image Analysis (MIA), by Kim H. Esbensen and Jun Huang

Opportunities for Chemical Imaging in the Industrial World, by Nancy Jestel

The Role of Chemometrics in Chemical Image Analysis, by Robert C. Schweitzer, Arjun S. Bangalore and Patrick J. Treado

Thanks to:

for supporting the duplication and mailing of the newsletters


Chemometrics is the Engine of Chemical Imaging

Chemical imaging is a rapidly emerging area in chemistry, having been identified by the Council for Chemical Research as a focus area in its roadmap for the U.S. chemical industry, "Technology Vision 2020." Chemical imaging combines digital imaging and molecular spectroscopy for the chemical analysis of materials. While chemical imaging encompasses many areas of analytical spectroscopy, all involve the collection of spatially resolved multidimensional chemical information. Virtually every spectroscopic technique can be used to generate chemical images. As a result, chemical imaging can be enabling in many diverse applications. For example, chemical imaging can be applied to remote sensing, medical imaging, chemical microscopy, process monitoring and high throughput screening. Chemical imaging is a growing field with an increasing number of literature citations, experienced users and a diversity of commercially available chemical imaging technology. Given the multivariate nature of the data generated and the impact that can be made by chemometrics in analyzing chemical image data, it is appropriate that a chemometrics newsletter should address the basics of chemical imaging. In this issue of the NAmICS newsletter, several experienced chemical imaging researchers offer their perspectives on the role of chemometrics in chemical imaging science and practice. It is hoped that this newsletter will provide a useful introduction to chemical imaging and will spur interest in further research on this topic.

Bob Schweitzer, ChemIcon Inc., [email protected]

Conference Announcements

50th Gordon Research Conference on Statistics in Chemistry and Chemical Engineering

July 22-27, 2001, Williams College

The 50th Gordon Research Conference on Statistics in Chemistry and Chemical Engineering is scheduled for July 22-27, 2001 at Williams College in Williamstown, MA. Note the year: the conference is skipping Y2K and will next meet in the summer of Y2K+1.

The GRC focuses on new research directions in applied statistics and the analysis of chemical phenomena. It has met annually for half a century, drawing statisticians, chemometricians, chemists and chemical engineers from industry, government and universities around the world. Statistical interests typically lie somewhere between Technometrics and the Journal of the American Statistical Association, with the applied interests of the former and the technical depth of the latter. New methods of predictive modeling, experimentation, chemometrics, and quality are perennial favorites.

The program is still being formed, but some areas in which conferees have expressed interest are:

• Image analysis

• Boosting and bagging

• Effective degrees of freedom

• Process control

• Mixture experimentation

• Modern Bayesian methods

• Chemical informatics

• Spatial statistics

• Clustering

• Drug / chemical discovery

If you have thoughts on these topics or any others that you think would make for good GRC talks, please contact the conference organizers:

Randy Tobias, [email protected]

Mary Beth Seasholtz, [email protected]


A Brief Introduction to Multivariate Image Analysis (MIA)

Barry M. Wise* and Paul Geladi†

*Eigenvector Research, Inc.; †Umeå University

Introduction

Images have been used in the sciences for a long time and they are used increasingly. Large amounts of data representing complex systems can only be represented by visualization as images. Multivariate images arise from a surprising variety of sources. Some are images in the conventional sense (such as satellite data) while others are not (secondary ion mass spectroscopy, SIMS). Almost all physical units can be used to make images and multivariate images: temperature, gravitational field, impedance, magnetic field, electrical field, mass, wavelength, ultrasound wavelength, polarization, electron energy, etc. A rough but practical subdivision of the fields of scientific imaging is into satellite imaging, medical (clinical) imaging and the microscopies. The simplest meaningful multivariate image has two pixel indices (e.g. width and height in the image plane) and a variable index, making up a three-way array. An important aspect in going from analog scenes or objects to digital images is resolution. Multivariate images have spatial, intensity, spectral and time (temporal) resolution. A typical older satellite image would have 512x512 pixels, in 7 wavelength bands and an intensity resolution of 256 gray levels. High spatial and intensity resolution is desirable and this makes the arrays rather large and the calculations slow.

The traditional field of univariate image analysis works in the spatial domain on 2D or 3D image arrays. When images become multivariate or multitemporal, the spectral or time domain becomes a higher priority than spatial considerations. When this is the case, the tools of Multivariate Image Analysis (MIA) become very useful.

Theory

Principal Component Analysis (PCA) is the workhorse of MIA. The key is the proper reorganization (matricizing) of the original 3-way or higher array. Unfolding is done so that each pixel (or voxel) becomes a single row in the analysis. Thus, an image that is originally I by J pixels with K spectral channels is reshaped to form a two-way array that is IxJ by K. PCA can then be performed on this matrix in the usual way. Mean centering is typically done and in some circumstances variance scaling may be used. After the PCA model is calculated, the scores, residuals and T2 values can be folded back up to reform images. Loadings vectors can be interpreted in the usual way.
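The unfold, decompose and refold recipe above maps directly onto a few lines of MATLAB. The following is a minimal sketch only, using plain MATLAB and an SVD rather than the authors' PLS_Toolbox routines; the variable img is an assumed I by J by K image array.

    % Minimal MIA/PCA sketch: unfold, mean center, decompose, refold.
    X = double(img);                      % I x J x K multivariate image (assumed)
    [I,J,K] = size(X);
    Xu = reshape(X, I*J, K);              % unfold: one row per pixel
    mx = mean(Xu,1);
    Xc = Xu - repmat(mx, I*J, 1);         % mean centering
    [U,S,V] = svd(Xc, 'econ');            % columns of V are the loadings
    T = Xc*V;                             % scores, one row per pixel
    score1 = reshape(T(:,1), I, J);       % score image for PC 1
    E = Xc - T(:,1:2)*V(:,1:2)';          % residuals of a two-PC model
    Q = reshape(sum(E.^2,2), I, J);       % sum-squared residual (Q) image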

Computational Issues

Computational issues arise due to the sheer size of the images. It is not uncommon for images to be 512 by 512 by 20. Stored as double precision arrays, such an image would take up 40 Megs of RAM. Often images are stored as unsigned 8 bit integers (i.e. 0 to 255), which reduces the storage space to 5 Megs for the array above. However, most math libraries and MATLAB do not define most mathematical operations (such as addition and multiplication!) for 8 bit integers. This often means that the data must be converted piecemeal to double precision as it is needed. Generally, algorithms must be written to take advantage of the smallest dimension in the problem, which is usually the spectral dimension (K). Scatter matrices can be constructed by converting the spectral channels to doubles pairwise. Covariance matrices are calculated from the scatter matrices given the means of the spectral variables. Decompositions can then be performed on these matrices, recovering the spectral loadings.
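A sketch of this channel-pairwise strategy is shown below, assuming the raw image is held as a uint8 array named img8; only one or two channels are in double precision at any time, and the K by K covariance matrix is obtained from the accumulated scatter matrix and the channel means.

    % Build the K x K scatter matrix channel pair by channel pair (sketch).
    [I,J,K] = size(img8);                 % img8: uint8 image array (assumed name)
    n = I*J;
    s = zeros(1,K);                       % channel sums
    Sc = zeros(K,K);                      % scatter matrix
    for k = 1:K
        xk = double(reshape(img8(:,:,k), n, 1));
        s(k) = sum(xk);
        for m = k:K
            xm = double(reshape(img8(:,:,m), n, 1));
            Sc(k,m) = xk'*xm;
            Sc(m,k) = Sc(k,m);
        end
    end
    mu = s/n;                             % channel means
    C = (Sc - n*(mu'*mu)) / (n-1);        % covariance from scatter and means
    [V,D] = eig(C);                       % spectral loadings (sort by eigenvalue)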

Once the spectral loadings are in hand, scores for each pixel can be determined. It is often convenient to scale the scores back into the range of 0 to 255 and convert them back to unsigned 8 bit integers. This saves storage space and doesn't lose much information if the scores are to be observed as images. As will be discussed below, it also makes working in the score space easier.

Working in the Image Plane

Once scores have been calculated for each pixel, they can be folded back up to the original image dimensions (I by J) and displayed as pseudo color maps. This gives a graphic representation of the score value of each pixel as a function of position. Examination of the corresponding loadings gives information as to the original spectral variables which give rise to the variations captured in the scores plots. Residuals can also be displayed this way, as can Hotelling's T2 and many other diagnostics.

Displays of any three of the scores, residuals, or T2 values can be accomplished in the image plane by assigning each to be the red, green and blue values in the display of the image. If this is done for the first 3 PCs it is arguable that this is the most information about the data that can be displayed in a single image (at least to the non-colorblind). Areas with different attributes show up as different colors in the pseudo color image, often offering stark contrast to features that do not show up in any individual image or even any particular score image. Display of the residuals can be particularly useful with regard to identifying unique areas (such as minor amounts of contamination). The display and visual study of color images is informative, but it is also subjective. The human eye is not very linear in interpreting color differences and misjudgment of similarities or differences is a risk.
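A minimal sketch of such a red-green-blue composite, assuming the scores matrix T and the image dimensions I and J from the PCA sketch above:

    % Display the first three PC score images as an RGB composite (sketch).
    rgb = zeros(I, J, 3);
    for k = 1:3
        tk = T(:,k);
        tk = (tk - min(tk)) / (max(tk) - min(tk));   % scale each score to [0,1]
        rgb(:,:,k) = reshape(tk, I, J);              % assign to R, G and B
    end
    image(rgb); axis image; axis off                 % pseudo color composite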

As an example, we'll consider data from SIMS. The sample surface is PMMA (polymethylmethacrylate) which has been exposed to deuterated polystyrene. The data consists of the SIMS spectra from 1 to 300 amu on a 64 by 64 grid. Thus, the image is unfolded to 4096 (64x64) by 300 and PCA is performed. A score image for the first PC is shown in Figure 1.

Figure 1. Image Scores on First PC (panel title: Scores on PC# 1; scaled 95 percent limits are 244.857).

Figure 2. Residual Image of PS on PMMA Based on Two PC Model (panel title: Residual; scaled 95 percent Q limit is 44.5304).

The islands of deuterated polystyrene (low scores) are clearly visible against the background of PMMA (high scores). Inspection of the next two PCs (not shown) shows little systematic variation across the image. The residual image, Figure 2, shows two small areas with large residuals. These areas are not representative of the remainder of the surface and may be minor amounts of contamination.

Working in Score Space

While working in the image plane can provide useful clues as to the nature of the data set at hand, many features of the data, particularly clustering of the pixels, show up only in score space. Here the scores for all the pixels on two of the loadings are plotted against each other. However, because of the very large number of points typically involved (512x512 = 262,144 points), simple scatter plots of the data often produce large "blobs" in the plots that lose information regarding the densities of points in a given area. For that reason, it is convenient to color code the plots in a way that gives the density of points in a given region. When the scores are stored as unsigned 8 bit integers this is particularly easy: one need only count the number of pixels with scores falling in each element of a 256 by 256 array and display this count as a pseudo color image. This image always shows outliers, dense clusters, sparse clusters and gradients between the clusters. The cluster size can be interpreted as the standard deviation within the cluster. The clusters rarely have normally distributed ellipsoidal shapes, so the best way to delineate a cluster is by drawing a polygon around it.
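A minimal sketch of such a density-coded score-score plot, assuming t1 and t2 are score vectors already rescaled and stored as uint8 (0 to 255), one element per pixel:

    % Density image of a score-score plot from uint8 scores (sketch).
    D = zeros(256, 256);
    for p = 1:length(t1)
        r = double(t1(p)) + 1;           % map 0..255 to bin 1..256
        c = double(t2(p)) + 1;
        D(r,c) = D(r,c) + 1;             % count pixels per score-score bin
    end
    imagesc(log(D + 1)); axis xy         % log scale keeps sparse clusters visible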

The scores plots for our example problem are shown in Figure 3. The PC#2 versus PC#1 data is in the lower right quadrant, the PC#2 versus PC#3 is in the lower left, and PC#3 versus PC#1 in the upper right. A pseudo color image is in the upper left quadrant. The data appears to split into two groups in the lower right, a more concentrated group on the right and a more diffuse group on the left. A polygon has been drawn around the group on the left and the points within it have been highlighted in green. These points are also highlighted in the other score plots and on the original image. The ability to connect points in scores space with points on the image plane is critical to MIA.


Figure 3. Working in Score Space Linked to Image Plane

Local modeling

Because of the size of the images, major expected constituents may swamp smaller more interesting ones, making them show up in higher and noisier components. It is therefore very useful to make local models for smaller parts of the images. The selection of these parts can be as rectangular subsets, but even better as subsets of an irregular shape selected in the score plots. The subset doesn't even have to be contiguous (see Geladi, 1995). In the future, when both high spatial and spectral resolution will be available, it may be necessary to develop local modeling both in space and in the spectral domain. In this case, each spatial subset may also benefit from the use of a specific spectral subset. The different methods of chemometrics will be useful both for selecting the spatial and spectral subsets and for analyzing the obtained subsets.

Preprocessing

Not all images are ideal when they are digitized. They may be noisy, have missing pixels, have bad contrast etc. Fortunately there are the operations of univariate image analysis to correct for most of these unwanted properties. Noise removal, hole filling, and contrast improvement can all be done in an interactive manner by visual inspection. A special problem is non-linearity. Many imaging techniques may produce a large range of intensity values. This range should be projected to the linear range of 0-255 for many applications by a combination of logarithmic transformation and rounding or truncation to the nearest integer. Another problem is that some imaging techniques give intensities in the reflectance mode and these may have to be transformed to absorbances and rescaled in order to better represent the underlying phenomena. A high intensity resolution to begin with is often crucial in avoiding rounding/truncation errors.

The use of preprocessing is especially important in satellite imaging. The satellite images that are made available have already undergone a substantial amount of preprocessing to remove scanning errors. Additional preprocessing methods include texture filters, wavelets, Fourier analysis and the angle measure technique (AMT).

Classification

Once a basic MIA has been performed, it is easy to develop classification models which can be applied to the current image or future images. In the score space, pixels which cluster can be selected for development of a separate PCA model (as in the SIMCA technique), or as input into other classification methods. Pixels can also be selected from the image plane. Once the classification models are developed, areas on new images (or remaining parts of the calibration image) can be classified.

Regression

Images may be regressed against each other. An I by J by K image may be used as the predictor (X) block for an I by J by M image (response variables, Y block). This is rare. More often, a smaller image or an image of lower spatial resolution is used as the Y-image. Even images of irregular shapes may be used as Y-images in regression. The regression model parameters allow a prediction image of the size I by J. The use of latent variable based calibration methods also leads to a residual of the X-block. This residual is used as a measure of the reliability of the predictions. In satellite images, regression is used for "ground truth" prediction of vegetation quality or quantity. In medical applications, discriminant analysis of known malignant and benign tissue may be formulated as a regression on a binary variable. In this way, an automated detector of malignant tissue may be constructed that is useful for future images. An advantage of latent variable regression methods is that the latent variables are images and can be studied as images or as scatter plots.

Other Extensions

The earliest uses of multivariate image analysis made use of the most obvious multivariate methods: PCA, ridge regression, principal components regression (PCR) and partial least squares regression (PLS). This was often dictated by limitations in storage space, memory and calculation capacity. Also, the limited availability of wavelength bands (or other variables) often dictated a simple form of data analysis, but nothing prevents the use of more advanced methods, or hybrids. Curve resolution may be a good alternative to PCA if linear mixtures are studied. Alternative supervised and unsupervised, interactive or automatic clustering methods are also possible. For temporal imaging, the use of the principles of time series modeling is very promising, as are positive matrix factorization, maximum likelihood models and parallel factor analysis (PARAFAC) on series of multivariate images. When both temporal and spectral data are available, the image array may be reorganized into a three-way array with pixel, time and wavelength as the ways. The pixel mode three-way loadings may be reorganized into images.

Hyperspectral Imaging

New technologies are emerging that make it very easy to get high spectral resolution. In satellite imaging, systems like AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) with 224 wavelength bands are replacing the <20 band older satellite images. In optical clinical imaging, systems that combine the resolution of a single point infrared, fluorescence or Raman measurement with a moderate spatial resolution are becoming available. In electron and ion microscopy too, high energy or mass resolution is becoming easy and fast to obtain. The emergence of these systems, with a sometimes low (64x64) spatial resolution and a rather high (>200) spectral resolution, is generating new ways of treating the data. More emphasis is going to the spectral aspect. With some techniques (Raman, infrared, fluorescence), it is easy to do wavelength selection and fall back to univariate imaging in one wavelength or an integral over a small wavelength band. Band ratios are also often put to good use.

Examples

The MIA paradigm will be demonstrated using two additional examples. The first is from SIMS of a PVA sample. The original data is 64 by 64 pixels with 300 mass channels. Only the positive SIMS spectra were considered. Figure 4 shows a number of false color images of the data at several different mass numbers. Some of the images appear to have some grouping of pixels with large numbers of ion counts on the image plane. The picture becomes much clearer when PCA is applied, as shown in Figure 5. Here a spot is clearly visible on the image surface. Inspection of the loadings (not shown) reveals some mass numbers which load positively, and others that load negatively. Thus, the spot is depleted in the ion fragments that load positively and enriched in those that load negatively compared to the remainder of the image. We would expect this to be meaningful to a competent mass spectroscopist. (Unfortunately, we're not mass spectroscopists and thus won't attempt to interpret this image chemically.)

Figure 4. Slices of the SIMS Data of a PVA Sample (64 by 64 pixels, 300 mass channels).

Figure 5. Scores on First PC of PVA Sample Showing Spot Clearly.

In our second example, PARAFAC will be used to analyze a series of multivariate images. PARAFAC cannot generally be used in MIA as most (single) images are not tri-linear. This is because the image plane is not easily decomposed as a summation over the outer product of pairs of factors. However, if the data consist of a sequence of multivariate images, the data can be unfolded in a way that should be approximately tri-linear. As an example we will consider a series of 5 images taken by the Eigenvector Research web camera on March 9, 2000. The first of the images is shown in Figure 6. The images are each 240 by 352 by 3 (the red, green and blue layers). Thus the total array is 240 by 352 by 3 by 5. This array can be unfolded to 84,480 (240x352) by 3 by 5 and analyzed with PARAFAC. The PARAFAC loadings in the time and color modes will describe the general changes of the image over the series while the loadings in the pixel dimension describe changes in specific pixels on the images.

Figure 6. Image 1 from Eigenvector Research Web Camera.

PARAFAC models with 1, 2 and 3 factors were developed. The two factor model, capturing 99.6% of the sum of squares, was selected. The PARAFAC model can be interrogated to determine the "trends" in the series of images. The loadings from the color and time modes are shown in Figure 7. The color dimension shows one factor that is high in blue relative to red and green. This factor is associated with an almost constant factor in the time domain. In the pixel mode, Figure 8, this factor separates the foreground, which is mostly grass, from the mountains and sky, which appear blue. The second factor is highest in red and green in the color domain and is associated with a generally decreasing time factor. This factor separates different types of vegetation from each other, such as the bushes from the lawn, as shown in Figure 9. This is most apparent in the plot of pixel mode loadings, which we will refer to here as scores, shown in Figure 10. Here the different clusters of the scores are associated with different elements of the images. Mountains, sky, front lawn, mowed field, unmowed field, shrubs, rocks, orchard, and man-made structures (road, fence and some buildings) all have their own cluster as indicated.

Figure 7. PARAFAC Loadings from Color and Time Modes of Image Sequence.

Figure 8. PARAFAC Model Factor 1 Loadings in Image Plane.

Figure 9. PARAFAC Model Factor 2 Loadings in Image Plane.
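A minimal sketch of the unfolding described above, assuming the image series is held in a 240 by 352 by 3 by 5 array named imgs; the call to a PARAFAC routine (for example the parafac function of the N-way toolbox) is left as a comment, since its exact interface is an assumption here:

    % Unfold a 240 x 352 x 3 x 5 image series for three-way analysis (sketch).
    [I,J,C,T] = size(imgs);                 % pixels x pixels x color x time
    X3 = reshape(double(imgs), I*J, C, T);  % 84,480 x 3 x 5 three-way array
    % model = parafac(X3, 2);               % two-factor PARAFAC (assumed API)
    X2 = reshape(double(imgs), I*J, C*T);   % 84,480 x 15 matrix, for PCA instead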



Figure 10. Scores on First Two PARAFAC Factors in Pixel Mode.

The residual in the pixel mode over all of the images collectively is shown in Figure 11. The image has been "thresholded" so that a few pixels with very high residuals do not take up most of the color map. The largest residuals are associated with elements of the image that changed from frame to frame. The lawn tractor and trailer are obvious in the foreground; they appear only in the last frame. A car moving along the road can also be seen (in image 2 only), along with the lake, which changed considerably due to breezes in the first two frames.

Figure 11. Residual Image from Two Factor PARAFAC Model, Thresholded to 40,000.

The analysis of this series of images can also be done with PCA if the images are unfolded in the pixel mode and the time and color modes to produce a matrix that is 84,480 (240x352) by 15 (3x5). Results (not shown) are similar. The main advantage of the PARAFAC model is that the time and color modes are not convolved, easing interpretation. The main disadvantage of the PARAFAC model is that it takes much longer to calculate than the PCA model (about an hour versus 18 seconds).

Conclusions

Multivariate images are a rich source of information but present unique challenges due to their structure and abundance of data. Techniques based on PCA can be used to gain insight into their overall structure and the relationships of the parts of the image. Additional techniques, such as PARAFAC, can be used on series of images.

Acknowledgment

All results shown in this manuscript were generated with MATLAB and PLS_Toolbox. The authors would like to thank Anna Belu of Physical Electronics for providing the SIMS data on samples provided by the Garcia Center for Polymers at Engineered Interfaces at Stony Brook.

Literature

For more information on MIA we suggest the following references:

P. Geladi and H. Grahn, Multivariate Image Analysis, Wiley, Chichester, 1996.

P. Geladi and H. Grahn, "Multivariate Image Analysis", in Encyclopedia of Analytical Chemistry, Wiley, Chichester, in press, 2000.

P. Geladi, "Sampling and local models for multivariate image analysis", Microchimica Acta, 120, pp. 211-230, 1995.

A. Kriete, ed., Visualization in Biomedical Microscopies, VCH, Weinheim, 1992.

F. Toselli and J. Bodechtel, eds., Imaging Spectroscopy: Fundamentals and Applications, ECSC, EEC, EAEC, Brussels, 1992.

B. M. Wise and P. Geladi, "Analysis of a Series of Images with PCA and PARAFAC," presented at TRICAP 2000: Three-way Methods in Chemistry and Psychology, Fåborg, Denmark, July, 2000.



The Angle Measure Technique (AMT) for textural pre-processing in Multivariate Image Analysis (MIA)

Kim H. Esbensen & Jun Huang

Telemark University College (HIT), Porsgrunn, Norway

http://www-pors.hit.no/tf/forskning/kjemomet/kjemomet.html

Introduction

Multivariate Image Analysis (MIA) was presented elsewhere in this issue, along with an introduction to the different types of 2-D and 3-D arrays (the "multivariate image" concept) and the appropriate methods for their image analysis, especially where the spectral domain represents a higher priority than spatial and textural considerations. Here we present a new type of complementary textural pre-processing, which has been found useful for many types of technical and industrial imagery in both multivariate and univariate (grey-scale) images. The Angle Measure Technique (AMT) is a domain transform, along with the FT (Fourier Transform) and WT (Wavelet Transform). These three pre-processing techniques all have the property of transforming a 2-D spatial array into another modal domain: the frequency domain (FT), the scale-frequency domain (WT) and the scale domain (AMT). The latter two "scale domains" are not identical; in fact only the AMT transform deals with a spatial "scale" notion, whereas the WT addresses the amplitude domain proper. We shall here exclusively present the AMT for textural image pre-processing.

The Angle Measure Technique (AMT)

AMT was originally proposed by the American physical geographer Robert Andrle, 1994 (ref. 1) as a novel substitute for fractal analysis, in a direct attempt to circumvent a documented problem of scale-dependent "complexity" of geomorphic lines (in reality a scale-related fractional dimensionality). Andrle's topographic analysis results clashed rather fundamentally with the dominant prerequisite notion of complexity scale-invariance, as postulated by fractal analysis, and his innovative answer was AMT, the Angle Measure Technique. This technique was later introduced into technical image analysis and process chemometrics by Esbensen et al., 1996 (ref. 2) and Huang & Esbensen, 2000 (ref. 3).

The workings of AMT, as described in Fig. 1, are in essence extremely simple. AMT transforms the "complexity" of a 1-D measurement series into a "scale-dependent spectrum", examples of which are presented in Fig. 2. Such scale-spectra have been found very useful in characterising different complexity at different scales for any 1-D measurement series and also for 2-D image arrays. The latter are simply unfolded, and thus also correspond to the basic 1-D series format. Perhaps counterintuitively, this unfolding does not destroy the powerful texture-characterising features of AMT (refs. 2-3), and is indeed quite beneficial.

Figure 1. Explanation of the AMT derivation. The solid line represents the 1-D measurement series. The angle α(i) is measured (for e.g. 500 random points A along the entire measurement series) as the supplement to angle CAB. This produces a statistically robust "mean angle" measure, MA, of the local complexity corresponding to the scale, S, which is the current radius of a circle with origin A. This circle will intersect the measurement series in two points, B and C, which in turn define the complexity-related angle CAB. Difference Y is the vertical distance between points C and B, which produces the complementary complexity measure MDY. By letting S := S + 1 for the entire interval from 1 (the digitalization unit of the measurement series) to N/2 (N := total number of measurements), the (MA, MDY) AMT measures will simultaneously characterize the complexity at all scales.
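A minimal sketch of this derivation is given below. It is a simplification, under the assumption that points B and C are taken s samples to either side of each random point A along the index axis, rather than as the exact circle/line intersections described in the caption; the function name and variables are illustrative only.

    % Simplified AMT sketch: mean angle (MA) and mean difference Y (MDY) spectra.
    function [MA, MDY] = amt_sketch(y, nPoints)
    y = y(:);                              % 1-D measurement series
    N = length(y);
    Smax = floor(N/2);
    MA  = zeros(Smax,1);
    MDY = zeros(Smax,1);
    idx = ceil(rand(nPoints,1)*N);         % random points A along the series
    for s = 1:Smax
        iA = idx(idx-s >= 1 & idx+s <= N); % keep points with both neighbours
        a = [-s*ones(size(iA)), y(iA-s) - y(iA)];   % vector A->B
        c = [ s*ones(size(iA)), y(iA+s) - y(iA)];   % vector A->C
        cs = sum(a.*c,2) ./ (sqrt(sum(a.^2,2)) .* sqrt(sum(c.^2,2)));
        ang = acos(max(min(cs,1),-1));     % angle CAB at each point A
        MA(s)  = mean(pi - ang);           % supplement, averaged over points
        MDY(s) = mean(abs(y(iA+s) - y(iA-s)));      % vertical distance C to B
    end
    end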


Figure 2. Representative (MA, MDY) AMT complexity-spectra (upper: MA; lower: MDY). Observe how MA and MDY capture different but complementary aspects of the intricacies of the scale-complexity interrelationships. The horizontal axis represents log S. MA displays a complexity "peak" corresponding to a scale of approx. 25-30. This illustration pertains to one AMT derivation only. When several AMT spectra are collected into a common X-matrix, see Figs. 3-4, the (log S) scale is used as the variable dimension. For full details of AMT see refs. 1-3.




Figure 3. 16 representative food powders: White Sugar, Sortpepper, White Rice, "Stranger", Sesame, Salt, Mustard seed, British tea, Cumin, Coriander, Black Pepper (fine), Coffee, Brown Sugar, Brown Rice, Black Pepper (coarse), Sunflower seed.


Features of AMT

Chemometrically the most useful aspect of the new AMT transform is that the compound (MA, MDY) spectra can be used directly as 1-D object vectors in e.g. PCA, PCR or PLS-modelling endeavours. For process chemometric applications the AMT transforms from the time domain to the scale domain, whereas for 2-D image objects it is the local texture which is transformed into a corresponding 1-D linear complexity. Irrespective, the resulting complexity spectra, Fig. 2 (and below), implicitly carry a truly remarkable information richness relating the scale(s) at which the geometric complexity is relatively high vs. low, refs. 1-5. This is the principal feature of AMT, which in the present MIA context performs as a useful texture-characterising pre-processing in very many applications, ibid. The way AMT complexity spectra may serve as "ordinary" 1-D object vectors enables the use of all conventional chemometric multivariate techniques, for PCA display or for classification purposes for example, or for predictive purposes (PLSR).

Some major advantageous characteristics of AMT in image analysis:

• Digital image arrays are often (very) large two-way data. AMT transforms a 2-D image into a 1-D complexity spectrum without losing textural information.

• AMT can thus also (partly) be considered a data compression method.

• AMT has a high sensitivity with respect to even very small complexity-scale changes, see Huang et al., 2001 (ref. 4) and Esbensen & Huang, 2001 (ref. 5).

Below are some application results relating to powder science and technology (POSTEC).

AMT applications in 2-D image analysis

16 representative types of powders were collected from different sources to show AMT's usefulness in powder discrimination/classification. Fig. 3 is intended to show how powder characterisation involves both single-grain factors (surface roughness/smoothness, form factors: long/short vs. wide/lean, colour etc.) as well as aggregate factors, related to the ensemble of grains. Complete powder description (visually or microscopically) is often more complex than what immediately meets the eye, which is why this particular example was chosen in our initial AMT studies.

Figure 4. Averaged AMT spectra (each from 4 replicates) of selected food powder samples. There is some overlap at the smallest scales, but clear discrimination at all larger scales. Note that AMT spectra are here composed of two parts, MA (left) and MDY (right).

Figure 5. PCA score-plot (t1-t2) for the food powders; note complete discrimination.



Fig. 4 shows a set of the corresponding AMT spectra for some selected food powder types spanning the feature space involved. Already here it is quite apparent that full-spectrum PCA will be able to do a good discrimination job. PCA analysis on all 16 transformed complexity spectra (X) shows a very clear discrimination between all powder groups, Fig. 5. Inspection of the pertinent loadings (not shown here) allows for detailed interpretations of the particular scales involved, which indeed have both single-grain as well as aggregate components (ref. 3). It was concluded that AMT succeeded in extracting relevant information for discrimination/classification. This particular food powder study was carried out to test the AMT approach on a set of relatively similar powders.

A parallel study of seven standard POSTEC powders was also undertaken to test AMT's ability to extract information related to external prediction, using PLS-regression. These powders are much more different from one another, Fig. 6. We here only show their AMT spectra, Fig. 7. It is evident that there will be a satisfactory discriminating ability with PCA working on the pertinent AMT X-matrix, but we are now interested in using the X-matrix directly for PLS-prediction of so-called functional powder parameters.

Figure 6. Representative POSTEC powders: PVC pellets, rape seeds, sand, PVC powder, microdolomite (MD100), alumina powder (Aluminum) and cement powder. These seven powder types span most of the entire feature span met with in industrial powder handling.

Figure 7. Derived AMT spectra for the seven representative POSTEC powders.

In the POSTEC realm, powder characterisation plays a dominating role. Direct access to the so-called "bulk functional parameters" based on traditional material science approaches has often met with severe difficulties, however. In many instances it has been necessary to resort to carrying out pilot experiments in special full-scale rigs in order to be able to arrive at reliable estimates for parameters such as mean grain size, density (powder, bulk), permeability, angle of repose and the absolutely essential mean fluidisation velocity. This has expensive economic consequences. It would be of great interest if it were possible to go directly from static (laboratory-based) powder characterisations to the wished-for dynamic bulk functional properties. MIA/AMT has indeed allowed this for the very first time. (Observe that AMT may work on either one selected variable (channel) or on a selected PC-component image.)

Based on the same derived AMT X-spectra as demonstrated above for the POSTEC powders, Fig. 7, inclusion of any of the bulk functional properties as Y-variables allows for a straightforward multivariate calibration (PLS-modeling and prediction evaluation). Fig. 8 shows the specific prediction results for the Y-parameter mean fluidisation velocity (MFV), which is the most interesting dynamic POSTEC parameter. MFV is critical for determining the powder propensity for pipeline clogging and similar congestions. The result in Fig. 8 is novel and promising in the POSTEC area. As with any multivariate calibration, the Y-values originate from external, bona fide reference determinations; hence only a relatively modest number of powder types was included in the present pilot study.

Figure 8. Promising initial prediction ability assessment for mean fluidisation velocity (MFV).

Table 1 shows the results of prediction ability assessments (PLS-1 models) for the bulk functional parameters shown.

Parameter                           Slope   Offset   Correlation coefficient   RMSEP   #PLS components
Mean size                           0.95    17.4     0.97                      135.3   1
Density                             0.84    330.6    0.91                      356.3   4
Bulk density                        (not predicted)
Min. fluid. velocity                0.86    0.02     0.91                      0.07    3
Permeability                        (not predicted)
Wall friction angle against ST37    0.92    -1.7     0.94                      1.45    3
Static angle of repose              1.14    -7.8     0.94                      5.72    3
Dynamic angle of repose             (not predicted)

Table 1. Comparative statistics for fitted regression models pertaining to different powder properties. Note that all results are based on the AMT spectra with 4 replicates for each powder. All replicates were put into the same segment when the models were cross validated. Parameters shown without entries could not be predicted.

The combination of AMT pre-processing (taking care of the textural aspects in many image analysis situations) with the all-powerful chemometric PLS-regression has been termed MAR: Multivariate AMT Regression. MAR provides a unified methodology of data modeling and analysis for prediction of Y-properties directly from digitized imagery in which texture is important. We have several key studies under way further exploring this powerful new MAR approach, including many application areas other than just powder science and technology.

For further information, please see:

1. Andrle, R. 1994. Mathematical Geology, 26, 83-97.

2. Esbensen, Hjelmen & Kvaal, 1996. Jour. Chemom. 10, 569-590.

3. Huang & Esbensen, 2000. Jour. Chemom. 14 (Proceedings SSC6).

4. Huang, Møller, Munck & Esbensen, 2001. (in prep)

5. Esbensen & Huang, 2001. (in prep)

Opportunities for Chemical Imaging in the Industrial World

Nancy Jestel

GE [email protected]

Why bother imaging? Spectroscopy works fine!

In chemical analysis, industry generally focuses on speed and productivity, trying to get an accurate answer in the shortest possible time using the fewest human and financial resources. Since imaging has traditionally been labor- and instrument-intensive and slow, it has not enjoyed as much application in industry as might be expected. Imaging has typically been reserved for obviously visual problems, such as particle morphology and distribution. In contrast, traditional spectroscopic techniques are widely applied.

Most, if not all, spectroscopy techniques can be used to generate chemical images simply by interrogating additional contiguous points on the sample. By viewing imaging as the consequence of a higher sampling frequency for spectroscopy, the boundaries between imaging and spectroscopy are blurred. Imaging is simply the multivariate way of doing spectroscopy. The question "Why bother imaging?" becomes "Can you afford not to image?"

What applications are there for chemical imaging?

When chemometrics was a newer field, much effort was devoted to justifying the routine use of multivariate instead of univariate data. Often this argument was rooted in the time and expense involved in capturing and analyzing the extra data. The same situation is present today for imaging. Since images can be collected in one, two, or three dimensions, substantial amounts of data can be generated. The key to identifying industrial imaging applications is to look for spectroscopy problems where the addition of spatial information is either crucial to solving the issue, could result in a substantially different, more useful interpretation, or is more productive.

Industrial analytical chemistry problems fall into three broad categories: product development, process development, and customer issue resolution. For each category, variants on the same basic questions can be asked. Who made the material? What is the formation or reaction mechanism? When does a specific change in the material occur? How did the change occur? Where are the components located in the matrix? Why is the material performing as it is?


Mapping particle distribution and determining chemical identity

Chemical images that combine particle or filler distribution and chemical composition information are the most similar to traditional optical imaging. The particles or fillers can be distinguished from the matrix visually. The analyst will be more productive with chemical imaging because only one test needs to be performed, rather than microscopy followed by some spectroscopy. One example is verifying the origin of product returned by customers. Though the shape and distribution of fillers and binding agents in products can sometimes be used as a manufacturer's signature, the results are often inconclusive. A secondary technique is usually necessary to reach a conclusion. Chemical imaging provides both the spatial and chemical information simultaneously.

Identifying chemical changes as a function of location

Because chemical images contain chemical information by definition, they also afford the ability to examine chemical changes as a function of location. Examples include the distribution and chemical integrity of raw material changes in an existing product formulation, such as a new impact modifier or flame retardant in a polymer blend, a new excipient in a pharmaceutical tablet, or a new emulsification agent in a hand cream. First, the image is examined to verify that the components are properly mixed and distributed and that the material is not separating or unduly segregating. Second, the spectral information is examined to ensure that the materials are not reacting unexpectedly or changing form, such as crystallizing instead of remaining liquid. Vibrational spectroscopy techniques are sensitive to changes in the local molecular environment from chemical interactions. These effects are visible in small band shifts that can often be missed or dismissed as instrument drift or calibration errors when only a few points on the material are sampled. Such information would only be observed in single-point spectroscopy if the analyst was lucky enough to hit such a region when sampling. Chemical imaging, however, provides a large enough number of spectra so that these small spectral changes will not be missed and generates new information that is greater than the sum of outputs from imaging and spectroscopy.

Quantitative imaging

Quantitative spectroscopy methods are routinely used in industry. It is clear that chemical imaging would have even more utility if chemical concentration information could be extracted also. Studies such as those discussed above are useful but will always be limited by the absence of quantitative information about the relative or absolute proportions of the constituents. Quantitative chemical imaging would increase productivity by consolidating chemical imaging and other techniques that are currently used to provide concentration information.

The contrast in any chemical image reveals concentration differences. Since each image is typically scaled independently, it is difficult to compare amounts when looking at a series of chemical images, each specific to one species. Quantitative imaging would enable both the estimation of the relative concentration of the bulk material's components as well as real concentration difference between two points on the image. This would be relevant for all of the types of problems discussed above.

Making chemical imaging even more multivariate

Chemical image data is inherently multivariate since there are up to three spatial dimensions and as many spectral dimensions as were probed. Vibrational spectroscopy data easily could have over 1000 spectral dimensions. Adding time, temperature, or environment studies increases the dimensionality. The chemical imaging approach to these problems offers the opportunity to obtain information that is difficult or impossible to obtain otherwise.

If reaction or phase changes are identified in experiments, additional chemical images could be generated over time to try to identify the mechanism or pin down how and when a change, like polymer degradation, occurs. Such studies are related to the ones necessary to understand the effects of processing on materials, such as blending, mixing, heating, freezing, or pressing. Similarly, materials could be monitored as they dissolve or are exposed to different chemical environments or humidity levels. In all cases, the correlation of these results with other properties provides more information.

Conclusion

Spectroscopy and imaging are powerful tools, but are even better together. The combination begs for the application of chemometrics. The problems are interesting both for their chemical solutions and for the chemometric tools needed to tackle them. Learning to view imaging as a way of doing spectroscopy changes your perspective and frees you to recognize imaging problems everywhere you look. Applications abound in industry!


THE ROLE OF CHEMOMETRICS IN CHEMICAL IMAGE ANALYSIS

Robert C. Schweitzer, Arjun S. Bangalore, and Patrick J. Treado

ChemIcon Inc., 7301 Penn Avenue, Pittsburgh, PA 15208, USA
www.chemimage.com

Introduction

Many industries, such as pharmaceuticals, coatings and plastics, and semiconductors, manufacture products in which the spatial distribution of chemical species is critical to the performance of the product. Chemical imaging, which is the combination of molecular spectroscopy and digital imaging, has been shown to be an effective means for rapidly measuring the chemical architecture of materials. As a result, many organizations are turning to chemical image analysis of their materials to improve product formulations, perform failure analysis, and enhance manufacturing productivity.

In support of industry's need, chemical imaging technologies based on spectroscopic techniques including Raman, ultraviolet (UV), visible (VIS), near-infrared (NIR), mid-infrared (MIR), fluorescence, and photoluminescence have been developed. The choice of chemical imaging strategy is determined by the requirements of the application. Chemical imaging has been used in a broad range of applications, including: the measurement of the distribution of active ingredients in pharmaceutical tablets [1]; the evaluation of defects in semiconductors [2]; and the detection of trace contaminants in fuel cells [3].

A wide variety of chemical imaging systems have been developed to examine macroscopic sized materials, microscopic sized materials and materials in remote locations. Whatever the system design, chemical imaging data is collected rapidly and in large volumes. In many cases, data is gathered at sufficiently high throughput that manufacturing process monitoring becomes practical if data reduction and processing can be performed at comparable rates. Combining chemometric techniques with traditional digital image analysis represents an effective strategy for meeting these compelling data processing needs.

The Chemical Image Analysis Cycle

Until recently, seamless integration of spectral analysis, chemometric analysis and digital image analysis has not been commercially available. Individual communities have independently developed advanced software applicable to their specific requirements. For example, digital imaging software packages that treat single-frame gray-scale images and spectral processing programs that apply chemometric techniques have both reached a relatively mature state. One limitation to the development of chemical imaging, however, has been the lack of integrated software that combines enough of the features of each of these individual disciplines to have practical utility.

Historically, practitioners of chemical imaging were forced to develop their own software routines to perform each of the key steps of the data analysis. Typically, routines were prototyped using packages that supported scripting capability, such as Matlab, IDL, Grams or LabView. These packages, while flexible, are limited by steep learning curves, computational inefficiencies, and the need for individual practitioners to develop their own graphical user interface (GUI). Today, commercially available software does exist that provides efficient data processing and the ease of use of a simple GUI [4].

Software that meets these goals must address the entirety of the chemical imaging process. Figure 1 illustrates this cycle of analysis needed to successfully extract information from chemical images and to tap the full potential provided by chemical imaging systems. The cycle begins with the selection of sample measurement strategies and continues through to the presentation of a measurement solution. The first step is the collection of images. The related software must accommodate the full complement of chemical image acquisition configurations, including support of various spectroscopic techniques, the associated spectrometers and imaging detectors, and the sampling flexibility required by differing sample sizes and collection times. Ideally, even relatively disparate instrument designs can have one intuitive GUI to facilitate ease of use and ease of adoption.

Figure 1: The chemical image analysis cycle


The second step in the analysis cycle is data preprocessing. In general, preprocessing steps attempt to minimize contributions from chemical imaging instrument response that are not related to variations in the chemical composition of the imaged sample. Some of the functionalities needed include: correction for detector response, including variations in detector quantum efficiency, bad detector pixels and cosmic events; variation in source illumination intensity across the sample; and gross differentiation between spectral lineshapes based on baseline fitting and subtraction. Examples of tools available for preprocessing include ratiometric correction of detector pixel response; spectral operations such as Fourier filters and other spectral filters, normalization, mean centering, baseline correction, and smoothing; and spatial operations such as cosmic filtering, low-pass filters, high-pass filters, and a number of other spatial filters.

Once instrument response has been suppressed, qualitative processing can be employed. Qualitative chemical image analysis attempts to address a simple question: "What is present and how is it distributed?" Many chemometric tools fall under this category. A partial list includes: correlation techniques such as cosine correlation and Euclidean distance correlation [5-6]; classification techniques such as principal components analysis, cluster analysis, discriminant analysis, and multi-way analysis [6-9]; and spectral deconvolution techniques such as SIMPLISMA [10] and multivariate curve resolution [11-12].

Quantitative analysis deals with the development of concentration map images. Just as in quantitative spectral analysis, a number of multivariate chemometric techniques can be used to build the calibration models. In applying quantitative chemical imaging, all of the challenges experienced in non-imaging spectral analysis are present, such as the selection of the calibration set and the verification of the model. However, in chemical imaging additional challenges exist, such as variations in sample thickness and the variability of multiple detector elements, to name a few. Depending on the quality of the models developed, the results can range from semi-quantitative concentration maps to rigorous quantitative measurements.

Results obtained from preprocessing, qualitative analysis and quantitative analysis must be visualized. Software tools must provide scaling, automapping, pseudo-color image representation, surface maps, volumetric representation, and multiple modes of presentation such as single image frame views, montage views, and animation of multidimensional chemical images, as well as a variety of digital image analysis algorithms for look up table (LUT) manipulation and contrast enhancement.

Once digital chemical images have been generated, traditional digital image analysis can be applied. For example, Spatial Analysis and Chemical Image Measurement involve binarization of the high bit depth (typically 32 bits/pixel) chemical image using threshold and segmentation strategies. Once binary images have been generated, analysis tools can examine a number of image domain features such as size, location, alignment, shape factors, domain count, domain density, and classification of domains based on any of the selected features. Results of these calculations can be used to develop key quantitative image parameters that can be used to characterize materials.

The final category of tools, Automated Image Processing, involves the automation of key steps or of the entire chemical image analysis process. For example, the detection of well defined features in an image can be completely automated and the results of these automated analyses can be tabulated based on any number of criteria (particle size, shape, chemical composition, etc). Automated chemical imaging platforms have been developed that can run for hours in an unsupervised fashion.

The ideal analysis package should support the user's efforts to carefully plan experiments and optimize instrument parameters and should allow the maximum amount of information to be extracted from chemical images so that the user can make intelligent decisions. ChemIcon has developed a chemical image software package, ChemImage, which supports many sophisticated analysis tools. This article will focus on several applications that illustrate these tools.

Correlation Analysis Guided Spectral Subtraction

In some analyses, an image can be dominated by undesirable background interferences. In traditional spectral analysis, spectral subtraction would be performed to suppress the contribution from the interfering species. Because the background may not be present at all spatial locations in an image, applying spectral subtraction strategies to chemical images can be challenging. An effective spectral subtraction strategy is shown here for removing spectral background arising from sample preparation requirements. Figure 2 shows an example from FT-IR chemical imaging of bone cross-sections. FT-IR data from a BioRad Stingray instrument was provided to us by Prof. Richard Mendelsohn of Rutgers University. The Mendelsohn group applies FT-IR chemical imaging to the study of bone disease and fracture healing [13]. Bone sample preparation includes fixing material in polymethylmethacrylate (PMMA) prior to microtoming. This standard preparation unfortunately results in images with varying spatial distributions of PMMA intermingled with the chemical features of interest. PMMA is especially prevalent in the porous components of bones such as osteons.

To overcome this problem, an algorithm was developed to perform a weighted subtraction of the spectral contribution of PMMA from every pixel in the image. Figure 2A shows a FT-IR image at 1719 cm-1 of a bone sample cross-section where the bright regions are dominated by the presence of PMMA. Fig. 2B plots the spectra associated with the three labeled points in Fig. 2A. Clearly, the PMMA spectrum is brightest within the osteon regions, which may hide potentially interesting bone spectral features.

To generate the weighting function image, the FT-IR data was normalized and the pixel in the image with the highest concentration of PMMA (largest intensity value at 1719 cm-1) was selected. A small portion of the corresponding spectrum centered around 1719 cm-1 was selected as the reference vector for the calculation of a Euclidean distance with each appropriately truncated spectrum from the normalized image. The resulting Euclidean distance image is a single frame image in which each pixel value quantifies the similarity of the corresponding spectrum in the original image to the reference vector. This image was inverted so that a value of 0 would correspond to regions of no PMMA and a value of 1 would correspond to regions of the highest concentration of PMMA. The resulting image was then used to weight the spectral subtraction of the PMMA spectrum from each pixel in the normalized version of the original image. Fig. 2C shows the frame at 1719 cm-1 of the PMMA subtracted image and Fig. 2D shows the three spectra associated with the same spatial locations sampled in Fig. 2B. It can be seen that the PMMA subtracted image does show more detail than the original image and that the spectral contributions of PMMA have been successfully suppressed from the image. While in this discussion we have shown the effectiveness of using Euclidean distance to guide spectral subtraction, other strategies can be employed as well, including other correlation techniques and principal component analysis.
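A minimal sketch of this weighted subtraction is given below, with illustrative variable names (cube is an assumed I by J by K FT-IR image and wn the corresponding wavenumber axis); the window width and the use of unit-length normalization are assumptions, not the authors' exact settings.

    % Euclidean-distance-guided weighted subtraction of a background spectrum.
    [I,J,K] = size(cube);
    X = reshape(double(cube), I*J, K);
    X = X ./ repmat(sqrt(sum(X.^2,2)), 1, K);        % normalize each pixel spectrum
    [dmy, band] = min(abs(wn - 1719));               % frame closest to 1719 cm-1
    win = max(band-10,1):min(band+10,K);             % small window around the band
    [dmy, pmmaPix] = max(X(:,band));                 % pixel richest in PMMA
    ref = X(pmmaPix, win);                           % truncated reference vector
    d = sqrt(sum((X(:,win) - repmat(ref, I*J, 1)).^2, 2));
    w = 1 - (d - min(d)) / (max(d) - min(d));        % 1 = most PMMA, 0 = none
    Xsub = X - w * X(pmmaPix, :);                    % weighted subtraction per pixel
    frame = reshape(Xsub(:,band), I, J);             % corrected image at 1719 cm-1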

Figure 2 (a) FT-IR image at 1719 cm-1 of bone sample without PMMA background suppression. (b) Spectra (absorbance vs. wavenumber, 900-3400 cm-1) associated with labeled points in Figure 2a: (1,2) osteons with PMMA, (3) background with no PMMA. (c) FT-IR image at 1719 cm-1 of bone sample with PMMA background suppression. (d) Spectra associated with labeled points in Figure 2c: (1,2) osteons, (3) background.


The Role of Factor Rotation in Pharmaceutical Analysis

Pharmaceuticals are an especially fruitful field for the application of chemical imaging. This second application studies the distribution of aspirin particles in a pharmaceutical tablet. Figure 3A is the reflectance optical microscope brightfield image collected from a rectangular region of interest (ROI) in the tablet. Previous Raman chemical image studies have shown that aspirin domains can be measured at the surface of the tablet [1]. In these studies, we established that aspirin has a peak at 1044 cm-1, while the excipient matrix has a peak at 1060 cm-1. Fig. 3B shows a pseudo-colored Raman image where the green color channel has been assigned to the raw Raman intensity image for aspirin (1044 cm-1) and the blue color channel has been assigned to the raw Raman intensity image for excipient (1060 cm-1). The aspirin particles are relatively difficult to distinguish from the excipient in this image because the surface morphology of the sample dominates the absolute Raman intensities measured for this sample.

The spatial domains can be made visible with differing degrees of success using a wide array of analysis tools. We have successfully employed vector normalization, cosine correlation analysis (CCA), and principal component analysis (PCA), to name a few.

Figure 3 (a) Bright field image of a ROI of a pharmaceutical tablet. (b) Pseudo-color Raman image at 1044 and 1060 cm-1. (c) Raman spectra (1000-1100 cm-1) associated with aspirin and excipient enriched regions in Figure 3b. (d) Resolved spectra obtained using MCR. (e) Pseudo-color image from pure component MCR spectra. Scale bars: 25 µm.


Raman spectra associated with aspirin and excipient enriched domains first identified using CCA [1] are shown in Fig. 3C. In CCA, the user supplies an external reference spectrum that has the same wavelength dimensionality as the image, or selects a relatively pure spectrum from the image, or employs an orthonormal basis set. The cosine of the angle between the reference spectrum and every spectrum in the image is then calculated. The resultant single frame image is brighter where there is a high correlation with the reference spectrum and darker where there is little correlation with the reference spectrum.
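
The CCA calculation itself is compact; a sketch in NumPy (with placeholder variable names, not the ChemImage interface) might look like this:

    import numpy as np

    def cosine_correlation_image(cube, reference):
        # cube: (rows, cols, n_channels) image; reference: (n_channels,) spectrum.
        rows, cols, nc = cube.shape
        spectra = cube.reshape(-1, nc).astype(float)

        # Cosine of the angle between the reference and every pixel spectrum.
        denom = np.linalg.norm(spectra, axis=1) * np.linalg.norm(reference)
        cos_theta = (spectra @ reference) / np.where(denom == 0, 1.0, denom)

        # Bright pixels correlate strongly with the reference spectrum.
        return cos_theta.reshape(rows, cols)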

Note in Fig. 3C that baseline resolution of the aspirin and excipient Raman peaks is not achieved. At the spatial and spectral resolving power of the measurement, an individual chemical image pixel contains contributions from both aspirin and excipient. In addition to techniques such as normalization, CCA, and PCA, strategies based on factor rotation can also generate “pure” chemical images and spectra, even from complex mixtures.

We have applied a powerful factor rotation technique, Multivariate Curve Resolution (MCR) [11-12], to the analysis of pharmaceutical data. MCR is an iterative, pure component spectral resolution technique which is finding increasing application in the chemometrics community. The first step in the implementation of MCR is the selection of a set of spectra which contain the initial estimates of the pure component spectra. We compared a Gram-Schmidt routine to PCA for generating the key set spectra and found that PCA generally gave the better results of the two methods. The second step in MCR uses alternating least squares (ALS) with both concentration and spectral non-negativity constraints. If convergence is achieved, the resulting spectra should represent pure component spectra.
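
A bare-bones version of the ALS step can be written in a dozen lines of NumPy; the sketch below imposes non-negativity by simply clipping negative values, whereas the published implementations [11-12] use more careful constrained least squares and convergence tests, so treat it only as an outline of the idea.

    import numpy as np

    def mcr_als(D, S0, n_iter=125):
        # D: (n_pixels, n_channels) unfolded image; S0: (n_components,
        # n_channels) initial spectral estimates (e.g. chosen with PCA).
        S = S0.astype(float).copy()
        for _ in range(n_iter):
            # Concentration step: fit C in D ~= C @ S with S held fixed.
            C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T
            C = np.clip(C, 0.0, None)          # concentration non-negativity
            # Spectral step: fit S with C held fixed.
            S = np.linalg.lstsq(C, D, rcond=None)[0]
            S = np.clip(S, 0.0, None)          # spectral non-negativity
        return C, S                            # concentration maps and spectra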

Figure 3D shows the spectra obtained after 125 iterations. The resolved spectra are not completely pure, but compare favorably with the initial mixture spectra. Fig. 3E is a pseudo-colored MCR concentration map of the pure components, with aspirin appearing in green and excipient shown in blue.

Quantitative Chemical Imaging of Semiconductors

Cadmium zinc telluride (CdZnTe) is a compound semiconductor material used in a variety of IR and gamma radiation detectors. Minor variations in Zn concentration will dramatically affect the electro-optical properties of CdZnTe. At present, optical methods for Zn concentration monitoring in CdZnTe based on low temperature photoluminescence and photoreflectance spectroscopy have been shown to be effective, but are costly to implement because they are relatively slow, require sophisticated instrumentation and typically require samples to be at cryogenic temperatures [14].

Near-infrared (NIR) chemical imaging has the potential to provide a cost-effective strategy for high throughput screening of Zn concentration in CdZnTe. Figure 4A shows a NIR image at 825 nm of five CdZnTe standards with Zn cation bulk growth loading concentrations of 0%, 1.2%, 2.4%, 4%, and 10%. The wavelength dependent transmission of NIR light through CdZnTe varies with Zn concentration. For example, the sample with 10% zinc becomes transparent at 825 nm, while the sample with 1.2% zinc does not show appreciable transmission until 855 nm. By developing quantitative models using partial least squares regression (PLSR) that effectively monitor the NIR transmittance band edge, Zn concentration can be monitored quantitatively, even for large CdZnTe substrates.

NIR images were collected from 805 to 865 nm in steps of 5 nm for use in qualitative and quantitative analysis. The image data was mean-centered and PCA was applied, with four PCs explaining 98.2% of the variance. The first score image explains primarily background, while the remaining three score images each highlight unique combinations of regions with different zinc concentrations. An examination of the scatter plots of PC scores showed that the score plot of PC 1 vs. PC 3 shown in Fig. 4B effectively discriminates all five zinc standards from the background and from each other. The points belonging to a given cluster have similar spectral characteristics and can be assigned a unique color. These clusters often identify unique components in the sample and can be exploited as a guide to map the morphological features of the sample. For example, Fig. 4C shows a cluster-guided color mapping of the six selected regions in Fig. 4B. Five of the clusters correspond to zinc standards and are annotated in the color image. The sixth cluster corresponds to background pixels and is not annotated. The points in the scatter plot which are not clustered correspond primarily to the edges of the zinc standards. At these edges, the CdZnTe samples are undergoing strain, which modulates the NIR transmission spectra.
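
The score images themselves can be reproduced with a mean-centered SVD of the unfolded image; only the interactive cluster selection requires the imaging software. A sketch follows, with array shapes assumed and four components as in the text.

    import numpy as np

    def pca_score_images(cube, n_components=4):
        # cube: (rows, cols, n_channels) NIR image.
        rows, cols, nc = cube.shape
        X = cube.reshape(-1, nc).astype(float)
        Xc = X - X.mean(axis=0)                     # mean-center

        # Scores from the SVD of the mean-centered data: T = U * s.
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        scores = U[:, :n_components] * s[:n_components]
        explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
        return scores.reshape(rows, cols, n_components), explained

A scatter plot of the first score image against the third (PC 1 vs. PC 3) is then inspected for clusters corresponding to the five standards and the background.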

In addition to qualitative exploration of the CdZnTe standards, quantitative modeling can be implemented. A calibration set was selected for model building consisting of 60 spectra, with 10 spectra selected from each concentration standard and 10 spectra selected from the background. A test set was also constructed consisting of 30 spectra, with 5 spectra selected from each concentration standard and 5 spectra selected from the background. None of the test set spectra are included in the calibration set.

Figure 4 (a) NIR image at 825 nm of five concentration standards. (b) Scatter plot of PC 1 vs. PC 3. (c) Colorized mapping of ROIs in Figure 4b. (d) Concentration map using a PLSR model.

The historical approach for quantitative modeling in the CdZnTe industry is a univariate model where the independent variable is the wavelength at which 65% transmission occurs. This approach is susceptible to calibration drift and does not support calibration transfer.
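
In code, that univariate model reduces to interpolating, pixel by pixel, the wavelength at which the transmission curve first crosses 65%. The sketch below assumes transmission rises with wavelength across the measured range, as described for these standards.

    import numpy as np

    def band_edge_wavelength(transmission, wavelengths, level=0.65):
        # transmission: (rows, cols, n_wavelengths), values in [0, 1].
        rows, cols, nw = transmission.shape
        T = transmission.reshape(-1, nw)
        edge = np.full(T.shape[0], np.nan)
        for i, t in enumerate(T):
            above = np.nonzero(t >= level)[0]
            if above.size == 0:
                continue                     # band edge beyond measured range
            j = above[0]
            if j == 0:
                edge[i] = wavelengths[0]     # already above 65% at first wavelength
            else:
                # Linear interpolation between the bracketing wavelengths.
                frac = (level - t[j - 1]) / (t[j] - t[j - 1])
                edge[i] = wavelengths[j - 1] + frac * (wavelengths[j] - wavelengths[j - 1])
        return edge.reshape(rows, cols)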

We have developed a multivariate model using PLSR with all wavelengths used for the generation of the independent variables. Fig. 4D shows the results of this PLSR approach. We believe this is the first report in the literature of quantitative NIR chemical imaging results. We have evaluated these first results using the standard error of calibration (SEC) and the standard error of prediction (SEP). As might be expected, the multivariate approach gives much better results than the univariate approach.
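
A present-day reader could assemble an equivalent full-spectrum calibration with scikit-learn; the component count, the array names and the simple root-mean-square definitions of SEC and SEP below are illustrative assumptions, not the model reported here.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    def fit_zn_plsr(X_cal, y_cal, X_test, y_test, n_components=4):
        # X_*: (n_spectra, n_wavelengths) matrices; y_*: %Zn loading values.
        pls = PLSRegression(n_components=n_components)
        pls.fit(X_cal, y_cal)

        # Root-mean-square residuals over calibration and test sets.
        sec = np.sqrt(np.mean((pls.predict(X_cal).ravel() - np.asarray(y_cal)) ** 2))
        sep = np.sqrt(np.mean((pls.predict(X_test).ravel() - np.asarray(y_test)) ** 2))
        return pls, sec, sep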

There are a number of issues which are being addressed to make these quantitative results more robust. No corrections have been made for variations in sample thickness, illumination variability, or detector element variability. The concentrations used in the calibration model development and in the test set assume that there is no variance in the reported concentration for each concentration standard. Gold standard techniques including low temperature photoluminescence and photoreflectance are being used to generate calibration data which will improve the accuracy of our PLSR model. Despite the preliminary nature of this quantitative work, the initial results are promising.

Automated Particle Analysis of Semiconductors

In the applications discussed to this point, images are obtained with a high degree of interactivity between the software and the analyst in order to obtain insight about a particular chemical image dataset. Many applications, however, need to operate in an unsupervised fashion. One such application is the automated NIR chemical imaging microanalysis of CdZnTe semiconductor materials for Te defects.

Te inclusions are observable as opaque spherical defects in the NIR. We have developed an instrument that can automatically inspect a wafer and provide particle statistics on Te inclusions within thick CdZnTe wafers.


For each wafer, a series of Te inclusion images are collected at some user-defined depth in the wafer. An example wafer is shown in Figure 5A. A region of interest (ROI) is highlighted with an approximate size of 1 cm x 1 cm. Figs. 5B and 5C show raw and post-processed binary Te inclusion images, respectively, of four contiguous frames of the ROI. A visual inspection of these images shows that the software routines have identified and analyzed the particles in an automated fashion.

As currently implemented, the only operator interaction required is the definition of an XY field of view using a joystick, the area of the wafer to scan, and the Z level (focus plane) at which to perform the analysis. The first step in automated particle analysis involves preprocessing the image so that it can be binarized. One of the biggest problems with the raw images collected is the gradually varying background across each image frame. As a result, a particle in one area of a frame may have a higher intensity value than the background of another area of that frame. To solve this problem we use a background equalization step to force the background to be essentially constant across a given image frame.
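
The article does not spell out how the background equalization is done; one common way to flatten a slowly varying background, shown here purely as an assumed stand-in, is to estimate it with a heavy Gaussian blur and subtract it from the frame.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def equalize_background(frame, sigma=50.0):
        # frame: a single 2-D NIR image frame; sigma sets how slowly the
        # estimated background is allowed to vary (an assumed value).
        background = gaussian_filter(frame.astype(float), sigma=sigma)
        flattened = frame - background
        # Shift so the background level sits near zero across the frame.
        return flattened - np.median(flattened)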

Next, the software automatically selects the threshold value resulting in the binarized image that best reflects the number and size of particles actually present in the sample being imaged. A human operator would typically approach this problem by trying multiple threshold values and comparing the resulting binarized images to the actual image to see which binarized image best matches their perception of the particles in the actual image.

Our algorithm takes essentially the same approach, but in a completely automated manner. Briefly, a series of threshold values are used to generate binarized images. Each binarized image is submitted to a routine that finds the particles present in the image. A set of rules is then used to determine the threshold value at which the best binarized image is created.
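
The specific rules used by the software are not given here, so the sketch below substitutes one plausible rule of thumb: binarize at a series of thresholds, count the particles at each, and keep the threshold where the count is most stable from one threshold to the next.

    import numpy as np
    from scipy.ndimage import label

    def auto_threshold(frame, thresholds, min_pixels=4):
        # frame: background-equalized 2-D image; thresholds: candidate values.
        counts = []
        for t in thresholds:
            labels, n = label(frame > t)             # find connected particles
            sizes = np.bincount(labels.ravel())[1:]  # pixel count per particle
            counts.append(int(np.sum(sizes >= min_pixels)))
        counts = np.array(counts)
        if len(counts) < 2:
            return thresholds[0], counts
        # Assumed rule: pick the threshold where the particle count changes least.
        best = int(np.argmin(np.abs(np.diff(counts))))
        return thresholds[best], counts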

For each wafer, the software generates an enhanced version of the raw image, a binarized image using the threshold approach, a table of particle statistics for the sample, and a montage view of the binarized image. The application of this imaging system with automated particle analysis is not limited to semiconductor applications. The methodology described here can be applied to any digital image. This raises the prospect that any chemical imaging application can be automated.

Figure 5 (a) Macro bright field image of a CdZnTe semiconductor wafer. (b) Broad band NIR unprocessed image of four contiguous frames of the ROI in Figure 5a. (c) Processed image of the ROI in Figure 5a.

Conclusion

Chemical imaging is a valuable tool for the analysis of materials in which heterogeneity of chemical species plays a role. A number of industries are applying chemical imaging on a routine basis in order to improve product quality and increase production efficiency. Over the next several years, we anticipate seeing explosive growth in the use of chemical imaging in industry. The full value of this technology can only be realized if the appropriate software tools are available and are utilized to extract meaningful information from the images. The examples in this article describe a range of capabilities currently available in ChemIcon's ChemImage software, which has been designed to support the entire chemical image analysis cycle from preprocessing to automation.

Acknowledgements

Dr. Rich Mendelsohn of Rutgers University is acknowledged for providing the bone images.

Mr. Dan Reese of eV Products provided CdZnTe samples.

Dr. Tom Hancewicz of Unilever is acknowledged for helpful discussions regarding the MCR implementation.

Partial support of the work described was provided by a NIST ATP Grant (70NANB8H4021).

References

1. Zugates, C. T.; Treado, P. J. Raman Chemical Imaging of Pharmaceutical Content Uniformity. Int. J. Vibr. Spectrosc. 1999, 2, 4.

2. Schaeberle, M. D.; Morris, H. R.; Turner II, J. F.; Treado, P. J. Raman Chemical Imaging. Anal. Chem. 1999, 71, 175A-181A.

3. Schoonover, J. R.; Saab, A.; Bridgewater, J. S.; Havrilla, G. J.; Zugates, C. T.; Treado, P. J. Raman/SEM Chemical Imaging of a Residual Gallium Phase in a Mixed Oxide Feed Surrogate. Appl. Spectrosc. 2000, 54, in press.

4. ChemImage Product Literature, 2000; ChemIcon Inc., Pittsburgh, PA.

5. Morris, H. R.; Turner II, J. F.; Munro, B.; Ryntz, R. A.; Treado, P. J. Chemical Imaging of Thermoplastic Olefin (TPO) Surface Architecture. Langmuir 1999, 15, 2961-2972.

6. Massart, D. L.; Vandeginste, B. G. M.; Deming, S. N.; Michotte, Y.; Kaufman, L. Chemometrics: A Textbook. Data Handling in Science and Technology 2; Elsevier: Amsterdam, 1988.

7. Wold, S.; Esbensen, K.; Geladi, P. Principal Component Analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37-52.

8. Bro, R. Multiway Calibration. Multilinear PLS. J. Chemom. 1996, 10, 47-61.

9. Special Issue: Multiway Analysis. J. Chemom. 2000, 14.

10. Windig, W.; Heckler, C. E.; Agblevor, F. A.; Evans, R. J. Self-Modeling Mixture Analysis of Categorized Pyrolysis Mass Spectral Data with the SIMPLISMA Approach. Chemom. Intell. Lab. Syst. 1992, 14, 195-207.

11. Tauler, R.; Kowalski, B.; Fleming, S. Multivariate Curve Resolution Applied to Spectral Data from Multiple Runs of an Industrial Process. Anal. Chem. 1993, 65, 2040-2047.

12. Andrew, J. J.; Hancewicz, T. M. Rapid Analysis of Raman Image Data Using Two-Way Multivariate Curve Resolution. Appl. Spectrosc. 1998, 52, 797-807.

13. Mendelsohn, R.; Paschalis, E. P.; Sherman, P. J.; Boskey, A. L. IR Microscopic Imaging of Pathological States and Fracture Healing of Bone. Appl. Spectrosc. 2000, 54, 1183-1191.

14. Lee, J.; Giles, N. C.; Rajavel, D.; Summers, C. J. Room-Temperature Band-Edge Photoluminescence from Cadmium Telluride. Phys. Rev. B 1994, 49, 1668-1676.


NAmICS on the WWW

Our address is http://iris4.chem.ohiou.edu/

See this site for application for membership, previous newsletters, and much more!

Application for Membership
North American Chapter of the International Chemometrics Society

Professional Name

Mailing Address

Unofficial Name (Nickname)

Telephone Fax

E-Mail

Chemometric “parent” (Who do we call if you’ve moved and we don’t have your new address?)

Field(s) (e.g. analchem, applied math, biotech, electrical eng., enviroscience, geophysics, statistics, toxicology) circle one

Research Interests

Primary Affiliation (circle one): Academic   Industrial   Non-Profit Org.   Consultant   Government   Natl. Lab.   Other

About how many chemometrics-related documents (publications & internal reports) do you generate/year?

Relevant Publications, 1990 to in press. (If more convenient, attach your publication list.)

What operating system do you use most?.........................DOS / Win / Mac / UNIX / VMS / Other:

Today’s Date:

Return to: David Lee Duewer, NAmICS Membership Secretary, Building 222/Chemistry, Room A213, National Institute of Standards and Technology, Gaithersburg, MD 20899