Use and Abuse of Chemometrics in Chromatography

Michal Daszykowski, Beata Walczak

This article presents a selection of the relevant issues that emerge at the interface between chromatography and chemometrics. In the first part, we present advantages and drawbacks of applying signal-enhancement, warping and mixture-analysis methods. In the second part, we discuss typical examples of misuse and abuse of chemometrics that can occur with those less familiar with the data-processing approaches. Finally, we conclude that close collaboration between the communities of chromatographers and chemometricians will allow a deeper insight into the chromatographic systems being analyzed and permit new chromatographic problems to be solved in an efficient, elegant manner. © 2006 Elsevier Ltd. All rights reserved.

Keywords: De-noising; Mixture analysis; Signal alignment; Warping; Wavelets

1. Introduction

Chemometrics is considered part of analytical chemistry. In its arsenal, we can find methods that help analytical chemists to deal with all steps of analytical procedures, starting from the design of an experiment through extraction of information to the final decision making. A majority of chemometrics methods are general (i.e. they can be applied to any type of analytical experiment and to any type of instrumental signal). However, there are problems associated with specific types of instrumental signals or with particular analytical techniques that need special treatment. Only with the knowledge of the system studied and of the principles of the measurements performed can well-suited methods be chosen. What are the methods specific to chromatography? Mainly, mixture-analysis approaches and warping. Other chemometric approaches are of general use, but their possible applications to chromatography are endless. It is enough just to mention all the methods of data compression, visualization, calibration, and classification [1].
However, we cannot forget that chemometrics is a relatively new sub-discipline and many methods, included in its arsenal today, have been applied successfully to chromatography for decades. Here, it will suffice to mention the methods of experimental design and optimization.

As any other analytical technique, chromatography adapts from the other fields what is necessary and useful for its development, and the speed of these adaptations is determined by the complexity of the problems to be solved, the instrumentation currently in use and the amount of the data to be processed. For its part, chemometrics attempts to cope with the ongoing challenges and to develop new tools to deal with new problems. However, there are old chromatographic problems that can only now be solved efficiently, due to the increasing power of computers and due to progress in computer-related fields of human knowledge (e.g., warping analytical signals and signal de-noising).

This article presents a selection of the relevant issues that emerge at the interface between chromatography and chemometrics. We were invited to write this article because of our familiarity with chemometrics and our appreciation of certain chromatographic problems; yet it has proved a rather demanding, even challenging, task for us. The first step was to develop our own vision of the questions under discussion and to group them into thematic ‘‘building blocks’’, to be handled less generally and with greater ease.

Our first thematic ‘‘building block’’ is addressed to all chromatographers who are still not fully convinced that chemometrics can make their life easier. In this section, we present advantages (and also certain drawbacks) of applying the signal-enhancement, warping, and mixture-analysis methods.

Michal Daszykowski, Beata Walczak*
Department of Chemometrics, Institute of Chemistry, The University of Silesia, 9 Szkolna Street, 40-006 Katowice, Poland
* Corresponding author.
Tel.: +48 32 359 12 46; Fax: +48 32 259 99 78; E-mail: [email protected]
Trends in Analytical Chemistry, Vol. 25, No. 11, 2006
0165-9936/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.trac.2006.09.001




In the second ‘‘building block’’, we discuss typical examples of misuse and abuse of chemometrics that can occur with those less familiar with the data-processing approaches. Even though certain examples were taken directly from the literature, our intention was certainly not to criticize, but to provide instruction for the future.

In the third ‘‘building block’’, we focus on the most up-to-date challenges of chemometrics in the service of chromatography.

The article draws its conclusions directly from the three preceding sections.

As an introduction to the issues of interest, we performed a search of the literature available in Scopus (the largest abstract and citation database of research literature and quality web sources). Our search focused on paper titles and keywords that contained chemometrics in combination with chromatography. The results of this search are given in Table 1. The overall picture of chemometric applications to solving chromatographic problems seems rather promising. Even if certain keywords have a relatively high score, there is still a vast space left for many possible uses of chemometric methods for enhancement, processing and analysis of chromatographic data in everyday chromatographic practice.

Table 1. Results of keyword search in the SCOPUS system, using a quick search (keyword(s) and chromatography)

     Keyword(s)                      Score
 1   Multivariate curve resolution     44
 2   Alternating least squares         34
 3   MCR-ALS                           18
 4   Chemometrics                     403
 5   Experimental design              605
 6   Multivariate analysis            275
 7   Pattern recognition              280
 8   Classification                  1029
 9   PCA                              556
10   QSPR                              51
11   QSAR                             111
12   Topological indices               38
13   Topological descriptors           11
14   Modeling retention                 5
15   Fingerprints                     802
16   Clustering                       219
17   Peak shifts                       16
18   Deconvolution                    244
19   Background correction             21
20   De-noising                         9
21   Noise reduction                   17
22   Signal enhancement                43
23   Preprocessing                     45
24   Mixture analysis                  82
25   Alignment                        383
26   Warping                           11
27   Peak matching                     18
28   Peak detection                    54
29   Wavelets                          32
     Total                           5456

Now let us start by presenting a standard arsenal of chemometric methods that can widely and profitably be invested in chromatography.

2. What can chemometrics offer chromatography?

In this short article, we cannot, of course, discuss all possible applications of chemometrics in chromatography. An extensive overview of diverse chemometric approaches used in separation science is available [2]. Our present examples were arbitrarily extracted from the arsenal of chemometrics methods. We focus on what we think can be useful, yet still remains relatively new and hence not well enough known in chromatography. Our presentation of these approaches is not very extensive and its only purpose is to encourage analytical chemists to study them further.

2.1. Signal enhancement

The quality of the chromatogram(s) determines the final results of chromatographic analysis and it is crucially important that the preprocessing step be performed properly. As any other instrumental signal, chromatograms contain three major components: signal, noise and background (Fig. 1). These components differ in their frequency. Noise is the highest-frequency component, background is the lowest-frequency component, and the frequency of the signal is usually intermediate.

Chromatogram enhancement can be achieved by eliminating the noise and the background components.

Figure 1. Components of analytical signal: (a) overall signal; (b) relevant signal; (c) background; and, (d) noise.

2.2. De-noising

To eliminate undesired frequencies from the signal without distorting the frequency region that contains crucial information, the processed signal needs to be treated with digital filters. Usually, digital filtering can be performed in either the time domain or the frequency domain (there is equivalence between the direct time-domain and the indirect frequency-domain noise filtering).

To analyze a signal in both time and frequency domains, the windowed Fourier Transform (FT) is often used. The main idea of the windowed FT is to study the frequencies of a signal, segment by segment. However, this approach has a serious disadvantage, namely, the smaller the window, the better localized are sudden changes (peaks), but the less is known about the lower-frequency components of the signal being studied. If a larger window is applied, then more of the low frequencies are observed, but localization in time is worsened.

However, in the arsenal of de-noising tools, there is an approach well suited to non-stationary types of signal (i.e. signals with components of very different frequencies), namely, de-noising in the wavelet domain [3,4]. Wavelets automatically adapt to the different components of a signal, using a narrow window to look at the high-frequency components, and a wide window to look at the long-lived, low-frequency components of a signal. In other words, the signal can be studied at different resolution levels. At a coarse resolution, we can get an overall picture of an analyzed signal, and, at consecutively higher levels, we can see increasingly finer details.

Figure 2. Illustration of multi-resolution properties of the wavelet transform: (a) wavelets automatically adapt to the different components of a signal, using the narrow window to trace the high-frequency components, and the wide window to trace the low-frequency components of a signal; and, (b) decomposition of the time-frequency space: each tile represents the area covered by the wavelet-based functions in the time-frequency domain.

Due to the unique properties of wavelets as basis functions (they are orthogonal and local), non-stationary signals can be processed very efficiently, and multi-resolution theory provides a simple, very fast method for decomposing a signal, whose length equals an integer power of two, into its components at different scales (Fig. 2). The time-frequency analysis is performed by repeating the filtering of a signal. At each filter step, the frequency domain is cut in the middle, using a pair of filters (low-pass and high-pass). From the raw data, the first step produces n/2 low-frequency coefficients and n/2 high-frequency coefficients. In each consecutive step, the high-frequency coefficients are kept, and the same filters are used to further subdivide the low frequencies, until only one point is left [5].

Wavelet coefficients at the first level of signal decomposition are associated with the basis functions of the highest frequency and they can be used to estimate the noise level. For the normalized signal, the cut-off value, t, can be calculated [6]:

t = σ √(2 log n)     (1)

where n is the length of the signal, and σ is the standard deviation of the noise, estimated on the basis of the wavelet coefficients at the first level of resolution (d1) as:

σ = (1/0.6745) median(|d1|)     (2)

Depending on the applied thresholding policy, wavelet coefficients with amplitudes lower than the cut-off can be replaced by zeros (hard thresholding) or, additionally, the remaining coefficients can be diminished by the cut-off value (so-called soft thresholding) [6] (see Fig. 3).

Inverse wavelet transform of the pretreated signal allows reconstruction of the signal in the time domain, but now it is free of the noise component (Fig. 4).

Figure 3. (a) Mallat's pyramid algorithm for discrete wavelet decomposition (at level 0, the original signal is presented, and, at levels 1–5, approximation coefficients and details (bold line) of the decomposed signal are given); (b) final representation of the signal in the wavelet domain for the five levels of decomposition.
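As a rough illustration of Eqs. (1) and (2), the whole scheme can be sketched in a few lines of code. The fragment below is only an illustration, not the implementation of [6]: it assumes the simplest (Haar) wavelet, hard thresholding, and a signal whose length is an integer power of two; the function names are our own.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar wavelet transform: (approximation, detail)."""
    s = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass (approximation) coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass (detail) coefficients
    return s, d

def haar_idwt(s, d):
    """Inverse of one Haar transform level."""
    x = np.empty(2 * len(s))
    x[0::2] = (s + d) / np.sqrt(2.0)
    x[1::2] = (s - d) / np.sqrt(2.0)
    return x

def wavelet_denoise(signal, levels=4):
    """Hard-thresholding de-noising; signal length must be a power of two."""
    s = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):                                 # Mallat's pyramid
        s, d = haar_dwt(s)
        details.append(d)
    sigma = np.median(np.abs(details[0])) / 0.6745          # noise level, Eq. (2)
    cutoff = sigma * np.sqrt(2.0 * np.log(len(signal)))     # cut-off value, Eq. (1)
    details = [np.where(np.abs(d) > cutoff, d, 0.0) for d in details]
    for d in reversed(details):                             # inverse transform
        s = haar_idwt(s, d)
    return s
```

For a chromatogram corrupted with white noise, the peaks survive thresholding (their coefficients at the coarser levels exceed the cut-off), while most high-frequency noise coefficients fall below it and are zeroed.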

2.3. Elimination of background

Elimination of the chromatogram baseline is of great importance (e.g., for peak detection or for a comparison of different chromatographic signals), because a varying baseline can greatly influence measures of similarity (e.g., the correlation coefficient or the Euclidean distance of the signals being compared).

Among the different approaches to baseline approximation (and elimination), we favor the approach proposed by Eilers [7], in which the asymmetric least squares method is employed. The objective function of the method, Q, is defined as:

Q = Σi vi (yi − ŷi)² + λ Σi (Δ²ŷi)²     (3)

where yi is the experimental signal (e.g., a chromatogram), ŷi is the smooth trend (in the case discussed, the baseline approximation), vi are the prior weights, λ is the positive parameter weighing the second term in Eq. (3), and Δ² denotes the second-order differences of ŷ.

The first term of Eq. (3) represents the weighted squared residuals, whereas the second term is associated with a roughness penalty.

Choosing the weights in an asymmetric way:

vi = p        if yi > ŷi     (4)
vi = 1 − p    if yi < ŷi     (5)

where 0 < p < 1, we can weight the positive and the negative deviations from the trend ŷ differently (e.g., if p = 0.01, all data points with a positive deviation from the approximation ŷ are going to exert a very small influence on the baseline approximation).

The problem is how to determine ŷ with unknown weights. This can be done in an iterative way. Starting with vi = 1 for all the data points, we can calculate the first approximation of the signal, and then, for all the points above this first approximation, take p = 0.01 and find the second approximation of the signal, and so on.

The consecutive approximations of the chromatogram baseline are shown in the consecutive panels of Fig. 5. Once the baseline is approximated in a satisfactory manner, it can be subtracted from the studied signal.

This approach requires optimization of λ and, possibly, of the order of the differences.
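The iterative reweighting described above translates almost directly into code. The sketch below is a dense-matrix illustration of the idea in [7], suitable only for short signals (practical implementations use sparse matrices); the function name and the default parameter values are our own choices, not taken from the original paper.

```python
import numpy as np

def asls_baseline(y, lam=1.0e4, p=0.001, n_iter=10):
    """Asymmetric least squares baseline: minimize Eq. (3) with the
    asymmetric weights of Eqs. (4)-(5), updated iteratively."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)      # second-order difference operator
    penalty = lam * D.T @ D                  # roughness penalty, second term of Eq. (3)
    v = np.ones(n)                           # start with equal weights
    for _ in range(n_iter):
        # solve the weighted penalized least-squares problem for the trend z
        z = np.linalg.solve(np.diag(v) + penalty, v * y)
        v = np.where(y > z, p, 1.0 - p)      # asymmetric reweighting, Eqs. (4)-(5)
    return z
```

Subtracting asls_baseline(y) from a drifting chromatogram leaves the peaks on an approximately flat baseline; points inside peaks barely influence the fit, because their weight is only p.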

After some modifications [8], the approach discussed can also be applied to estimating the background of two-dimensional signals (e.g., 2D-gel electropherograms; see Fig. 6).

2.4. Pitfalls of the approaches presented

We hope that we have illustrated the practical value of the signal-enhancement methods sufficiently. Now the time has come to consider why their applications in chromatography are rather limited.

Figure 4. (a) Original signal in the time domain; (b) same signal in the wavelet domain; (c) signal in the wavelet domain after thresholding; and, (d) reconstructed signal in the time domain.

Let us start with wavelets. They are a very flexible tool, but, in reality, it means that the user is faced with many choices. Just to mention a handful of basic choices:

• wavelet transform;
• basic wavelets;
• decomposition level;
• thresholding policy; and,
• threshold criterion.

As the choices are data- and problem-dependent,

implementation of fixed rules does not seem proper. With fixed rules, the power of wavelets as a flexible tool would be lost in a given application. Also, the background-correction approach mentioned earlier requires the input parameters to be optimized. The user has to select a proper order of the derivatives and the penalty parameter, both of which depend on the chromatograms to hand.

In other words, it is not easy to implement these methods as black-box approaches. The user needs to be aware of their principles and their limitations. However, one can hardly demand that a regular chromatographer have the advanced knowledge of chemometrics necessary to select a proper approach and to optimize the input parameters of the method.

Moreover, although there are many programs in the public domain (e.g., in MATLAB code), they are not implemented in any standard chromatographic software. It seems that the software producers still have a big role to play in popularizing the new chemometric approaches.

The situation is no better for the other chemometric approaches, particularly those considered by us as developed especially for chromatographic purposes (i.e. the warping and mixture-analysis methods).

2.5. Alignment of chromatograms by warping

Chromatograms of complex mixtures can be treated like fingerprints and used in the same way as the other instrumental signals for further data analysis (e.g., calibration and classification). However, this requires a uniform representation of the signals in matrix form. To construct such a representation for m chromatograms (representing m samples), their time axes have to be synchronized. Synchronization of the time axes, named warping or alignment, is not a trivial problem. There are different approaches, with signal shifts [9–12] and Correlation Optimized Warping [13] being the most popular and efficient.

2.5.1. Correlation optimized warping. The Correlation Optimized Warping (COW) algorithm aims to correct peak shifts in a chromatogram, P, with respect to the target chromatogram, T [13]. This is achieved by linear

Figure 5. (a–d) Consecutive estimates of the background; (e) the original chromatogram; and, (f) the chromatogram after background subtraction.

stretching and compressing chromatogram P, such that the correlation between P and T is maximal. Two input parameters are required and are responsible for the quality of the chromatogram alignment. The first parameter is the number of sections, N, into which the chromatogram is divided. The second parameter, t, the so-called warping parameter, defines all possible end positions of a section and is responsible for the degree of alignment (flexibility). For larger values of t, larger time shifts can be corrected. By stretching and compressing the corresponding sections in the two chromatograms, the signals are aligned so that the overall correlation between them is maximal.

The performance of the COW approach is illustrated on two pairs of chromatograms, where the peak shifts along the time axis were nearly linear (Fig. 7, first

  • Figure 6. (a) Original 2D-gel electropherogram; (b) estimated background; and, (c) 2D-gel electropherogram after elimination of background.

Figure 7. Alignment of chromatograms: (a and c) original chromatograms before alignment, with simple and complex retention shifts; and, (b and d) chromatograms aligned by Correlation Optimized Warping.

column) and highly non-linear (Fig. 7, second column).

The alignment for the first pair of chromatograms is satisfactory when N = 25 and t = 3. Their initial correlation is equal to 0.78, and, after alignment, about 0.97. The COW algorithm offers optimal alignment, but the selection of the warping parameters is crucial. The computational time of COW is exponentially influenced by high t values. Although more drastic retention shifts can be corrected with higher t values, by increasing the number of sections at relatively low t values, some extra flexibility in alignment can be gained.

When the alignment is unsatisfactory, larger t values should be considered. Often, it is possible to achieve good alignment at low t values, thus ensuring reasonable computation time.

In the second example, for complex retention shifts (see Fig. 7c), the COW algorithm gives an acceptable alignment with N = 35 and t = 7. In this case, larger N and small t values do not lead to better alignment. For the warping parameters selected, the initial correlation of the two chromatograms was improved from 0.58 to 0.96.
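To convey the core idea of stretching and compressing sections, without reproducing the full dynamic-programming implementation of [13], the deliberately miniature function below warps a chromatogram with only N = 2 sections and a single movable boundary: each candidate boundary position linearly stretches one section and compresses the other, and the position giving the highest correlation with the target is kept. The function name and all simplifications are ours.

```python
import numpy as np

def warp_one_boundary(p, target, slack):
    """Miniature COW-style warping: N = 2 sections, one movable boundary."""
    n = len(p)
    mid = n // 2                                  # boundary position in the target
    best_corr, best_warped = -np.inf, np.asarray(p, dtype=float)
    for shift in range(-slack, slack + 1):
        b = mid + shift                           # candidate section end in p
        # linearly stretch/compress each section of p onto the target's grid
        left = np.interp(np.linspace(0, b - 1, mid), np.arange(b), p[:b])
        right = np.interp(np.linspace(0, n - b - 1, n - mid),
                          np.arange(n - b), p[b:])
        warped = np.concatenate([left, right])
        c = np.corrcoef(warped, target)[0, 1]     # alignment quality
        if c > best_corr:
            best_corr, best_warped = c, warped
    return best_warped, best_corr
```

For a single peak shifted along the time axis, the search moves the boundary so that the peak is stretched towards its position in the target, raising the correlation substantially; real COW does this jointly for many section boundaries via dynamic programming.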

The COW algorithm can also be used for warping LC-MS signals along the time axis. Among the approaches available for warping 2D signals [14], there is fuzzy warping, developed for the alignment of 2D-gel electropherograms. It allows automated global and local warping of two images with unidentified components of the mixture being studied (see Fig. 8). It is also suitable for the alignment of 1D signals [15]. It is particularly useful for warping long signals [16], when it is not feasible to use the COW algorithm.

2.6. Deconvolution of chromatographic signals

Deconvolution is used to enhance the selectivity of a certain chromatographic technique, when a separation cannot be improved by optimizing the separation conditions. There are a number of chemometric approaches that help to deconvolute the chromatographic signals measured using a single-channel detector. In principle, most of them require advanced knowledge, so they are usually not used in everyday practice. Just to give an impression of the complexity of this problem: usually, a chromatographer is faced with signal de-noising, peak detection, selecting a certain deconvolution method, choosing a peak model, and specifying the range of

Figure 8. Pseudocolor images of the two 2D-gel electropherograms, illustrating: (a) two images before; and, (b) after fuzzy warping; spots from image 1 are displayed in the overall image as green spots, and spots from image 2 appear as magenta; when spots from both images overlap, the entire image appears in grayscale.

characteristics for the parameters of the peak model. In order to facilitate deconvolution and to encourage chromatographers to use it, an automatic program was developed [17,18]. The main idea of developing this program was to make the task of deconvolution easy for non-experienced users with little knowledge about implementing chemometrics tools or about the samples being analyzed.

2.7. Peak deconvolution for single-channel detection

Nowadays, so-called hyphenated chromatographic techniques have become a standard analytical tool. The interest in hyphenated techniques arose from the extra information that they can provide about the samples being analyzed, compared with standard techniques. It is possible to detect co-eluting compounds (i.e. to evaluate the purity of the chromatographic peaks). This information is of great importance (e.g., when studying the purity or contamination of pharmaceutical or chemical products).

Typical data obtained with the aid of a hyphenated technique can be presented as a two-way data table, X (Fig. 9). The columns of X are the chromatograms of a sample, registered by using a single-wavelength channel of the detector, whereas the rows are the spectra at a given elution time. This type of data has a bi-linear structure and can be decomposed into two matrices containing concentration and spectral profiles, C and A, respectively:

X = C A^T     (6)

Decomposition of the bi-linear data has a unique

solution only when good estimates of the spectral profiles are available and additional constraints are introduced to

Figure 9. Illustration of the bi-linear chromatographic data. Columns of matrix X contain the concentration profiles (chromatograms) and rows contain the spectral profiles.

identify an initial concentration and the spectral profiles (e.g., the Orthogonal Projection Approach (OPA) [19] can be used, followed by Alternating Least Squares (ALS) [20]).

The range of mixture-analysis approaches is very diverse, but a good overview of them is available [21]. Let us briefly discuss the Orthogonal Projection Approach and Alternating Least Squares.

2.7.1. Orthogonal projection approach. The Orthogonal Projection Approach (OPA) aims to determine, in a stepwise manner, the most dissimilar spectral or concentration profiles [19]. To score the dissimilarity of the spectra (or of the concentration profiles), a dissimilarity criterion, d, is introduced. The dissimilarity between a set of profiles, s1, s2, . . ., sn, being the column vectors, is the determinant of the so-called dispersion matrix (Y^T Y), with the normalized s1, s2, . . ., sn in the columns of Y:

d = det(Y^T Y)     (7)

To determine the first pure profile (i.e. the profile most

dissimilar to those remaining), the profiles are compared with a reference profile, which is the normalized mean profile. The dissimilarity of a possible pure profile is scored by the determinant of the dispersion matrix, where the columns of Y are the normalized mean profile and the actually tested profile. The first pure profile has the largest dissimilarity measure. The second pure profile is found by calculating the dissimilarity for all the profiles with respect to the first pure profile (in the columns of Y, there are, normalized, (i) the first pure profile and (ii) the profile considered). The consecutive pure profiles are found as the most dissimilar to those already selected.

Information about the number of individual components in the mixture can be found by examining the so-called dissimilarity plot. It presents the dissimilarity of all spectra calculated with respect to the pure profiles. If the dissimilarity plot resembles a random profile, all pure profiles in the data have been identified. An example of the dissimilarity plots constructed for a three-component mixture of pesticides (containing two known and one unknown component) [22] is shown in Fig. 10. Concentrations, spectral profiles and the spectra of the two known pure components are shown in Fig. 11.

The first dissimilarity plot suggests that the first initial spectral profile corresponds with Spectrum 25 (Fig. 10a). The second initial spectral profile, according to the

Figure 10. Dissimilarity plots with marked maxima for: (a) the first component; (b) the second component; and, (c) the third component.

Figure 11. (a) Concentration profiles; (b) spectral profiles; and, (c) pure spectra of two known pesticides (fenitrothion and azinphos-ethyl); data taken from [22].

dissimilarity plot, is Spectrum 61 (Fig. 10b), and the third is Spectrum 38 (see Fig. 10c).
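The stepwise selection just described can be sketched as follows, for spectra stored in the rows of a data matrix X. This is a bare-bones illustration of Eq. (7), not the reference implementation of [19]; the function name is our own.

```python
import numpy as np

def opa(X, n_pure):
    """Select the n_pure most dissimilar (purest) rows of X via Eq. (7)."""
    X = np.asarray(X, dtype=float)
    spectra = X / np.linalg.norm(X, axis=1, keepdims=True)   # normalized profiles
    mean = X.mean(axis=0)
    refs = [mean / np.linalg.norm(mean)]       # first reference: normalized mean
    selected = []
    for k in range(n_pure):
        # dissimilarity of every profile w.r.t. the current references, Eq. (7)
        diss = [np.linalg.det(np.column_stack(refs + [s]).T
                              @ np.column_stack(refs + [s]))
                for s in spectra]
        selected.append(int(np.argmax(diss)))  # most dissimilar profile
        if k == 0:
            refs = []                          # drop the mean after the first pick
        refs.append(spectra[selected[-1]])     # compare against pure profiles found
    return selected
```

On simulated bilinear data, the selected rows sit where a single component dominates the spectrum, which is exactly the role the initial estimates play for the subsequent curve-resolution step.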

2.7.2. Alternating least squares. Alternating Least Squares (ALS), a so-called self-modeling curve-resolution technique, aims to provide the pure concentration and spectral profiles [20]. Once the initial pure spectra have been estimated (e.g., with the aid of the Orthogonal Projection Approach algorithm), the concentration and the spectral profiles are obtained in an iterative way with Alternating Least Squares. In each iteration of the algorithm, the concentration and spectral profiles are calculated:

C = X A (A^T A)^-1     (8)
A = X^T C (C^T C)^-1     (9)

Hence, after each step of the algorithm, a better solution is obtained. The procedure is continued until the squared differences between the original data and those reconstructed (X and CAᵀ) are smaller than a predefined limit.

It is possible to implement certain constraints within the steps of Alternating Least Squares, such as unimodality, non-negativity and closure. The unimodality constraint forces the chromatograms to have a unimodal shape (i.e. only one peak). The non-negativity constraint assumes that the concentration and the spectral profiles are positive, and closure ensures that the total concentration of the analytes remains unchanged.
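A minimal sketch of the two alternating steps, Eqs. (8) and (9), with non-negativity imposed by simple clipping (a crude stand-in for a properly constrained least-squares solver; the toy data and the function itself are hypothetical):

```python
import numpy as np

def als(X, A0, n_iter=200):
    """Minimal ALS sketch: alternate Eqs. (8) and (9), imposing non-negativity
    by clipping after each step.
    X  : (n_mixtures, n_channels) data matrix, modeled as X ~ C @ A.T
    A0 : (n_channels, k) initial estimates of the pure spectral profiles"""
    A = A0.copy()
    for _ in range(n_iter):
        C = X @ A @ np.linalg.pinv(A.T @ A)    # Eq. (8): C = XA(A'A)^-1
        C = np.clip(C, 0.0, None)              # non-negative concentrations
        A = X.T @ C @ np.linalg.pinv(C.T @ C)  # Eq. (9): A = X'C(C'C)^-1
        A = np.clip(A, 0.0, None)              # non-negative spectra
    return C, A

# toy two-component data and a slightly perturbed initial guess of the spectra
t = np.linspace(0.0, 1.0, 50)
S = np.vstack([np.exp(-((t - 0.3) / 0.05) ** 2),
               np.exp(-((t - 0.7) / 0.05) ** 2)])
C_true = np.array([[1.0, 0.0], [0.7, 0.3], [0.4, 0.6], [0.0, 1.0]])
X = C_true @ S
C_est, A_est = als(X, A0=S.T + 0.01)
residual = np.linalg.norm(X - C_est @ A_est.T) / np.linalg.norm(X)
```

With a reasonable initial estimate (here, the true spectra slightly offset), the reconstruction error X − CAᵀ quickly falls below any practical threshold while both profile sets remain non-negative.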

Let us briefly present how Alternating Least Squares is used to resolve the chromatograms of co-eluting compounds. Fig. 12 shows the pure concentration profiles and the spectral profiles obtained for a mixture of pesticides with Alternating Least Squares. The initial spectral profiles used in Alternating Least Squares were identified with the Orthogonal Projection Approach.

The large number of hyphenated techniques provides an almost unlimited number of potential applications for the mixture-analysis techniques to resolve overlapping peaks. An excellent review of the latest trends in the multivariate resolution techniques is available [23]. The advantages of studying the data obtained from hyphenated techniques using chemometric techniques are greatly appreciated in quality control of herbal medicines [24], environmental studies [22,25], metabolomic analyses [26], determining the kinetic parameters of enzymes [27], controlling the purity of drugs [28], and characterizing chromatographic stationary phases [29].

Figure 12. (a) Concentration profiles; and, (b) spectral profiles obtained by the Alternating Least Squares approach with the non-negativity and unimodality constraints and the initial spectral profiles found by means of the OPA approach.

It would be possible to extend the applications of mixture analysis to chromatography further. However, let us stop at this point. As we have shown, mixture analysis already has many interesting applications, although many more are yet to be published. We certainly anticipate extensive applications of mixture analysis (e.g., in drug-purity assignment, where its use could strongly limit the need for costly orthogonal systems).

3. Misuse of chemometrics in chromatography

3.1. Typical examples

There is another important problem, which we ought to discuss, namely, improper use of chemometrics methods and/or improper choice of a method to deal with the problem at hand.

The examples given below originate, without exception, from research papers published in internationally recognized separation-science and life-science journals, all of which enjoy relatively high impact factors. We refrain from providing the references because we consider the problems more important than the bibliographical details.

3.1.1. Application of improper methods. For the sake of example, let us mention one of the most common mistakes. Let us assume that we have at our disposal chromatograms of nominally analogous products of different origin (e.g., samples of olive oil originating from different countries) and that we are interested in tracing differences in the composition of these samples caused by their place of origin. In order to study the effect of origin on sample composition, we ought to apply a classification or discrimination approach. This type of approach is known as supervised learning (both sample composition and its origin are known a priori). However, in many applications, unsupervised methods (e.g., Principal Component Analysis) are used for data analysis instead of a supervised approach. For example, in one paper, we found the sentence: "Principal component analysis (PCA) recently became a popular technique in data analysis for classification", and, in another, the authors concluded: "The results show that the artificial neural network technique is better than the principal component analysis for the classification of healthy persons and cancer patients based on nucleoside data".

If the data are well structured, then PCA score plots alone can reveal groups of samples of different origin, although the lack of such groups in the PCA space does not necessarily mean that there is no statistically significant difference between these samples and that their classification or discrimination is impossible. PCA by definition maximizes the description of data variance, but the main variance may not necessarily be associated with the effect studied (in our case, sample origin). Of course, PCA can be used for exploration (e.g., compression and visualization) of any data, but it must not be confused with supervised classification methods (e.g., SIMCA, where PCA is applied to each class of objects separately and the PCA models are used for sample classification) or supervised discrimination approaches (e.g., Linear Discriminant Analysis (LDA), Discriminant Partial Least Squares (D-PLS), Neural Networks (NN) or Support Vector Machines (SVM)).
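A small simulation illustrates the point that the direction of maximum variance need not coincide with the effect studied (the data are synthetic and deliberately constructed so that the class information sits in a low-variance direction):

```python
import numpy as np

rng = np.random.default_rng(0)

# two classes that differ along a LOW-variance direction: axis 0 carries a
# large but uninformative variance, axis 1 carries a small class-related shift
n = 200
cls = np.repeat([0.0, 1.0], n)
X = np.column_stack([rng.normal(0.0, 10.0, 2 * n),
                     rng.normal(0.0, 0.5, 2 * n) + cls])
Xc = X - X.mean(axis=0)

# PC1 = eigenvector of the covariance matrix with the largest eigenvalue
eigval, eigvec = np.linalg.eigh(np.cov(Xc.T))
pc1 = eigvec[:, np.argmax(eigval)]

# pc1 points almost exactly along axis 0 and therefore ignores the class
# structure, even though the classes separate along axis 1
```

A supervised method given the class labels would immediately exploit axis 1; a PC1-score plot of these data would show no grouping at all.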

As application of an improper method to solve a given problem is quite common, let us give another example. In a certain paper, the authors used PLS to select the most significant variables from a large set of descriptors to model gas-chromatographic retention indices, and, in the Conclusions, we can read that "PLS failed as a variable selection method".

However, PLS is not a method of variable selection. We agree that, based on the final PLS model, we can draw certain conclusions about the significance of individual variables in model construction, but this can only be done (based on the values of the regression coefficients) if the data were standardized, which was not the case in the study cited.

In the same paper, as an alternative to PLS for variable selection, the following approach was introduced: construct MLR models containing two parameters (i.e. the boiling point and a topological index X), select the indices for which the individual two-parameter models were best, and construct the final MLR model with these particular parameters.

For those familiar with basic statistical methods, it should be obvious that this is not the best approach, because the individual variables are considered in an independent manner. It would be much better to apply stepwise MLR, a basic method available in any statistical software. In stepwise MLR, it is possible to construct the MLR model with the boiling point, and then find the variable(s) that model well the residuals from the previous model. There are also other approaches (e.g., Uninformative Variable Elimination-PLS (UVE-PLS) or Classification and Regression Trees (CART)) that could be proper choices, well suited to the purpose of variable selection.
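The stepwise idea can be sketched as a greedy forward selection (a simplified illustration; real stepwise MLR implementations also apply F-tests for variable entry and removal, and the function below is hypothetical):

```python
import numpy as np

def forward_stepwise(X, y, n_vars):
    """Greedy forward selection for MLR: at each step, add the variable that
    most reduces the residual sum of squares of the current model."""
    n, p = X.shape
    selected = []
    for _ in range(n_vars):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            Xj = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(Xj, y, rcond=None)
            rss = float(np.sum((y - Xj @ beta) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

# toy data: only descriptors 3 and 7 actually drive the response
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 3] + 1.5 * X[:, 7] + rng.normal(0.0, 0.1, 100)
selected = forward_stepwise(X, y, n_vars=2)
```

Because the second variable is chosen conditionally on the first, the procedure captures what the two-parameter-models-in-isolation approach cannot: how well a candidate explains the residuals of the model built so far.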

3.1.2. Improper (or lack of) model validation. Another example of a common abuse of chemometrics methods is improper validation (or even a lack of validation) of the models used. It does not seem obvious to everyone that the fit of a model has nothing in common with its predictive power and that it is possible to fit almost any type of data, especially when using flexible non-linear modeling techniques. This problem appears quite often when the data are modeled with Neural Networks (NN) (e.g., in one case, an NN was used for modeling the retention of nine phenols as a function of the mobile-phase composition). There is no good reason to optimize the architecture of an NN (i.e. the number of nodes in the hidden layer) based on the Root Mean Square Error for the training set. It is necessary to apply a monitoring set (i.e. a set that is used to trace the predictive power of the NN, but is not used for its training) or to use the Cross-Validation (CV) procedure (if the NN is stable). In the paper we are considering, the CV procedure was mentioned, but not applied. Finally, the predictive power of the NN ought to be estimated with an independent test set. The roles of the monitoring set and the test set are not the same: the monitoring set was already used for optimizing the NN architecture, so it can no longer be considered independent. There is also no reason to determine the number of learning epochs based on the Root Mean Square Error of prediction for the test set; this ought to be done with the monitoring set. The final NN model, proposed in this paper for modeling the retention of nine phenols as a function of mobile-phase composition, contains 129 weights (i.e. 129 parameters). Taking into account that the NN architecture was not properly optimized and that these 129 parameters were calculated based on 25 experimental examples, we can be sure that this model will be useless for practical applications.
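The division of roles among the three sets can be sketched as follows, using polynomial degree as a stand-in for NN architecture (synthetic data; the point is only where each set may legitimately be used):

```python
import numpy as np

rng = np.random.default_rng(1)

# three independent sets: training (fits the model), monitoring (chooses the
# model complexity) and test (final, untouched estimate of predictive power)
def make_set(n):
    x = rng.uniform(-1.0, 1.0, n)
    return x, np.sin(3.0 * x) + rng.normal(0.0, 0.1, n)

x_tr, y_tr = make_set(40)
x_mon, y_mon = make_set(20)
x_te, y_te = make_set(20)

def rmse(x, y, coef):
    return float(np.sqrt(np.mean((np.polyval(coef, x) - y) ** 2)))

# complexity (polynomial degree, standing in for the number of hidden nodes
# of an NN) is chosen on the MONITORING set -- never on the test set
degrees = range(1, 10)
fits = {d: np.polyfit(x_tr, y_tr, d) for d in degrees}
best_degree = min(degrees, key=lambda d: rmse(x_mon, y_mon, fits[d]))

# the test set is used exactly once, to report the final predictive power
final_rmse = rmse(x_te, y_te, fits[best_degree])
```

The training-set error alone would keep rewarding higher degrees; the monitoring set stops that, and the test set, touched only once at the end, remains a fair measure of prediction.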

The list of papers with NNs containing hundreds of weights (parameters) to be calculated and trained with a very limited number of examples could be really long. For instance, in another paper, an NN model constructed to predict an optimal separation of the analytes was trained with only 21 examples to calculate 773 weights (the net architecture was determined to be three input parameters, two hidden layers comprising 20 nodes each, and 13 output parameters).

3.1.3. Problems with data representativeness. The lack of model validation is often accompanied by a lack of representativeness of the data used for model construction (e.g., in one case, Linear Discriminant Analysis was applied to distinguish among 12 classes of oils on the basis of chromatographic data, where some individual classes were represented by only two or three samples, and the model was, of course, not validated). One does not need to be a chemometrician to realize that two or three samples are not enough to draw any relevant conclusions about the class to which they belong. There are more sources of possible data variance than the number of samples used to estimate class variability. As the models constructed are statistical by nature, they ought to be built using representative data sets.

3.1.4. Unfair comparison of different methods. Let us add yet another example to our list of common abuses. We can find a number of papers in which the performance of two different modeling methods is compared. Quite often, these two methods are linear and non-linear. When applied to a particular data set, they can perform differently, depending on the complexity of the data modeled. If the problem at hand is non-linear, then the non-linear method obviously outperforms the linear one. Such a result does not justify any general conclusion (e.g., "the study shows that ANN can give better prediction results than MLR" or "The results obtained using ANNs were compared with the experimental values as well as with those obtained using regression models and showed the superiority of ANNs over regression models"). The only well-founded conclusion could be that, for such data, a non-linear modeling technique (e.g., NN) was necessary.

In the above paper, we also found some intriguing statements (e.g., "Before training, the network was optimized for the number of nodes in the hidden layer, learning rates and momentum. Then the network was trained using the training set to optimize the values of weights and biases").

Similar mysterious things were found in yet another paper, in which the authors managed to construct an NN that performed much worse than the multiple linear regression model.

3.1.5. Chemometrics approaches as black-box tools. Although we are genuinely interested in popularizing chemometrics in different fields of analytical chemistry, we do not recommend treating its methods as black-box approaches. Carefully studying the papers on applications of chemometrics to chromatography, we often found phrases such as "the computer program calculated" or "the computer programs available also do not differentiate among numerical values of logP for organic isomers". It sounds as if the authors in some way attributed personalities to their computer programs, so that they could not bear full responsibility for the failed outcomes of these undisciplined creatures. It also sounds similar to the sentence "Mrs. X measured the concentration of calcium in the studied samples". It is not important who measured the concentration of the element of interest; we need to know how it was measured. In other words, it is necessary to describe the applied approach, its principles and its input parameters.

3.1.6. Useless efforts. A considerable number of abuses can be found in the area of modeling chromatographic retention based on topological indices.

In one of the journals related to the separation sciences, we found a series of papers in which several well-known topological indices were explored in detail and several new topological indices were also proposed (e.g., two new indices, named "optical topological indexes" and "valence optical indexes", were introduced to differentiate between the L and D amino acids). Introduced completely out of the blue, they have the same predictive power as any arbitrarily chosen pair of two numbers, higher for isomer L and lower for isomer D. In other words, the vertices corresponding to the symmetric atoms and the asymmetric atom in isomer L are denoted as "+d_i", whereas the respective asymmetric carbon atom in isomer D is denoted as "−d_i". The new index is defined as the product of the distance matrix and a vector z, whose elements are just these d parameters. Of course, this absolutely trivial trick leads to a lower value of the proposed index for isomer D than for isomer L, thus differentiating between the two isomers. It looks very much like time-consuming nonsense.

A similar mystery can be found elsewhere, where the term "qualitative correlation" is proposed to describe the correlation between two objects (isomers) described by two parameters. Taking into account that, through two points placed on a plane (i.e. in a two-dimensional parameter space), we can always draw a straight line, we wonder what the real meaning of this "qualitative correlation", or of any other type of correlation, could be.

    We would like to encourage chromatographers toapply chemometrics methods, but it should be obvious toeveryone that any method ought to be applied properly.

In the same way that repeated weighing of your sample on an unbalanced balance will not help you to estimate its real mass, wrongly applied chemometrics methods will not help you to draw any reasonable conclusions.

However, make no mistake: examples of abuse and/or misuse of chemometrics methods like those cited happen in any sub-field of chemistry, chemometrics included.

4. Ongoing challenges

Chromatography as an analytical technique is challenged by the analysis of complex biological samples. Let us focus on LC-MS, which has rapidly emerged as a method of choice for large-scale proteomic analysis. LC-MS systems can be used to identify and to evaluate the relative abundance of thousands of molecules (in proteomic profiling, the molecules in question are peptides derived by proteolysis of intact proteins). For a very complex sample (e.g., a blood sample), the peptide mixture is resolved by chromatographic separation prior to its injection into the mass spectrometer, so the data generated during analysis consist of both the unique retention times and the m/z ratios of the individual peptides (Fig. 13).

Figure 13. Color image of the log values of the LC-MS data.

Figure 14. Examples of individual LC-MS time profiles.

In any description of state-of-the-art LC-MS systems, it is stated that LC-MS is an information-rich technique. Of course, this is true, but there is a long way to go from the masses of experimental data to useful information about the complex biological systems and events studied. Data analysis is rapidly becoming the major obstacle in converting experimental knowledge to valid conclusions. The main problems are associated with the facts that:

- the LC-MS systems are subject to considerable noise and their variability is not fully characterized (e.g., the individual mass chromatographic profiles presented in Fig. 14);
- the elution-time axis varies between different experiments (see Fig. 14, in which, for illustrative purposes, two total ion chromatograms are presented);
- the overlap of peptides across the experimental space is confounding; and, additionally,
- the differences in overall sample composition can affect the signal intensities of individual peptides [30].

To address all these issues, many challenging tasks need to be tackled. At the stage of data organization into a matrix form, we already face many problems. In raw form, the full-scan spectra obtained from an LC-MS experiment comprise a table of the following values: (a) scan number; (b) LC retention time; and, (c) ion abundance. Depending on the resolution of the MS instrument, retaining all possible values can lead to an intractably huge matrix, so matrix representation of the LC-MS data usually involves so-called binning of nominal m/z values. An optimal bin width should be large enough to keep the matrix tractable and not too sparse, but small enough so that individual m/z values remain informative. However, so far, no methods have been reported for evaluating optimal bin width, or for determining its influence on calculating the features. There are other approaches that can eliminate the binning problem; namely, the variables in the data matrix can correspond to peaks characterized by their retention time and m/z. This type of data organization keeps the high resolution of MS, but also requires pre-selection of certain kinds of peak, in order to avoid the data having enormous dimensionality. In any such approach, signal enhancement, peak detection and peak alignment are unavoidable. All these methods are involved only in the data preprocessing, which is followed by data exploration (e.g., data classification, marker identification and model validation). Chemometrics offers all possible tools from start to finish. All the methods for data compression, clustering, visualization, feature selection, classification or discrimination, and resolution are at our disposal for the analysis of this type of data, and analytical chemists ought to benefit from them.
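A minimal sketch of such binning (the data layout and the function are hypothetical; in practice, scans would first be parsed from a vendor or mzML file):

```python
import numpy as np

def bin_scans(scans, mz_min, mz_max, bin_width):
    """Organize LC-MS scans into an (n_scans x n_bins) matrix by binning m/z
    values; intensities falling into the same bin are summed.
    scans : list of (mz_array, intensity_array) pairs, one per retention time"""
    n_bins = int(np.ceil((mz_max - mz_min) / bin_width))
    M = np.zeros((len(scans), n_bins))
    for i, (mz, intensity) in enumerate(scans):
        idx = np.clip(((mz - mz_min) / bin_width).astype(int), 0, n_bins - 1)
        np.add.at(M[i], idx, intensity)     # unbuffered summation into each bin
    return M

# two toy scans with high-resolution m/z values, binned to a 1 Da width
scans = [(np.array([100.02, 100.07, 250.5]), np.array([10.0, 5.0, 1.0])),
         (np.array([100.04, 251.1]), np.array([7.0, 2.0]))]
M = bin_scans(scans, mz_min=100.0, mz_max=300.0, bin_width=1.0)
```

The trade-off discussed above is visible even here: with a 1 Da bin, the two resolved values 100.02 and 100.07 collapse into a single feature, while a much narrower bin would leave the matrix almost entirely sparse.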

We should stress that, depending on the overall strategy of data modeling, a different type of preprocessing is required, and only with a good overview of the problem at hand can the optimal strategy be chosen. For example, if a pilot study suggests that classification can be successfully performed on the total mass chromatographic profiles only, there is no need to enhance the individual mass spectra or the individual mass chromatograms, because, in the total mass chromatograms, random noise is to a large degree reduced (by summation of many individual mass chromatograms), so only background correction is necessary. Classification based on the total mass chromatograms, extended by feature selection and perhaps also mixture analysis, allows us to return to the pure mass spectra of the significant components identified and to benefit from their original high resolution.

Several research groups have developed systems that address the main problems associated with processing LC-MS data [31-33], but all these systems are far from being flexible enough to deal with the different problems associated with LC-MS data; moreover, they assume data organization in one out of many possible ways. The optimal approach depends on the structure and the complexity of the data, and it cannot be defined a priori. For each problem, the simplest, and simultaneously the most stable, approach ought to be selected individually.

Another aspect of effective modeling of massive and noisy LC-MS data that needs to be addressed is the application of robust methods of data pretreatment and modeling. We have at our disposal a growing number of robust approaches (e.g., robust PLS [34], Least Squares Support Vector Machines (LSSVM) [35] and robust Soft Independent Modeling of Class Analogies (SIMCA) [36]), which allow the construction of stable models that describe the majority of the data well (i.e. models not influenced by outlying observations).

    Massive chromatographic data sets require intensivechemometric treatment.

    5. Conclusions

Although there are analytical chemists who are somewhat critical of chemometrics, this review demonstrates not only the various advantages of data processing, but also the need for mathematical evaluation of the data in order to improve understanding of the results in areas of increasing sample complexity.

As demonstrated in this review, chemometrics can deal efficiently with signal enhancement and with the other specific tasks associated with chromatographic signals. However, the reader can see that each of the methods presented requires its input parameters to be optimized, and, in the majority of cases, they cannot be run fully automatically. The choice of a particular method and its parameters is by no means a trivial task. It requires an understanding of the principles of the method and of the meaning of the individual input parameters. What is relatively easy for all those involved in chemometrics usually seems too complicated for others who work in a particular sub-discipline of analytical chemistry.

For many approaches, software is available in the public domain, yet this advantage does not seem to have much influence on the popularity of the methods. Although the main goal of chemometrics is to provide analytical chemists with efficient tools for data processing, it seems that these methods are mainly used by chemometricians, and we are still far from their daily use in analytical laboratories. In laboratories, the methods that dominate are those implemented in the software offered as part of the equipment of a given instrument.

Summing up, it seems that there is a huge gap between the communities of chemometricians and analytical chemists. Chemometricians are extending the arsenal of possible approaches, but analytical chemists are not profiting from these to the extent that they could.

    Both communities ought to make some effort andcollaborate more effectively.

Both communities could profit from closer collaboration. There are many things to be learnt by chemometricians about modern separation techniques and the problems inherent in them, and there are many chemometric approaches that could make the life of chromatographers easier. Moreover, chemometricians should care more about popularizing their approaches and should publish more examples of practical applications of chemometrics methods in journals more oriented towards analytical chemistry, but chromatographers also ought to make an effort to study the basic principles of chemometrics approaches.

    The main message of chemometricians to chroma-tographers is:

    Chemometric methods are being invented to make yourlife easier.

    References

[1] D.L. Massart, L. Buydens, J. Pharm. Biomed. Anal. 6 (1988) 535.
[2] A.C. Duarte, S. Capelo, J. Liq. Chromatogr. Relat. Technol. 29 (2006) 1143.
[3] B.B. Hubbard, The World According to Wavelets, A.K. Peters, Wellesley, Massachusetts, USA, 1998.
[4] B. Walczak, Wavelets in Chemistry, Elsevier, Amsterdam, The Netherlands, 2000.
[5] S.G. Mallat, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989) 674.
[6] D.L. Donoho, IEEE Trans. Inf. Theory 41 (1995) 613.
[7] P.H.C. Eilers, Anal. Chem. 75 (2003) 3631.
[8] K. Kaczmarek, B. Walczak, S. De Jong, B.G.M. Vandeginste, Acta Chromatogr. 15 (2005) 82.
[9] A. Kassidas, J.F. MacGregor, P.A. Taylor, AIChE J. 44 (1998) 864.
[10] P. Eilers, Anal. Chem. 76 (2004) 404.
[11] J. Forshed, I. Schuppe-Koistinen, S.P. Jacobsson, Anal. Chim. Acta 487 (2003) 189.
[12] B. Walczak, W. Wu, Chemometrics Intell. Lab. Syst. 77 (2005) 173.
[13] N.-P.V. Nielsen, J.M. Carstensen, J. Smedsgaard, J. Chromatogr., A 805 (1998) 17.
[14] K. Kaczmarek, B. Walczak, S. de Jong, B.G.M. Vandeginste, J. Chem. Inf. Comput. Sci. 42 (2002) 1431.
[15] B. Walczak, W. Wu, Chemometrics Intell. Lab. Syst. 77 (2005) 173.
[16] W. Wu, M. Daszykowski, B. Walczak, B.C. Sweatman, S.C. Connor, J.N. Haselden, D.J. Crowther, R.W. Gill, M.W. Lutz, J. Chem. Inf. Model. 46 (2006) 863.
[17] G. Vivo-Truyols, J.R. Torres-Lapasio, A.M. van Nederkassel, Y. Vander Heyden, D.L. Massart, J. Chromatogr., A 1096 (2005) 133.
[18] G. Vivo-Truyols, J.R. Torres-Lapasio, A.M. van Nederkassel, Y. Vander Heyden, D.L. Massart, J. Chromatogr., A 1096 (2005) 146.
[19] F. Cuesta Sanchez, J. Toft, B. van den Bogaert, D.L. Massart, Anal. Chem. 68 (1996) 79.
[20] R. Tauler, D. Barcelo, Trends Anal. Chem. 12 (1993) 319.
[21] F.C. Sanchez, B. van den Bogaert, S.C. Rutan, D.L. Massart, Chemometrics Intell. Lab. Syst. 34 (1996) 139.
[22] R. Tauler, S. Lacorte, D. Barcelo, J. Chromatogr., A 730 (1996) 177.
[23] A. de Juan, R. Tauler, Anal. Chim. Acta 500 (2003) 195.
[24] Y.-Z. Liang, P. Xie, K. Chan, J. Chromatogr., B 812 (2004) 53.
[25] E. Pere-Trepat, S. Lacorte, R. Tauler, J. Chromatogr., A 1096 (2005) 111.
[26] H. Idborg, L. Zamani, P.-O. Edlund, I. Schuppe-Koistinen, S.P. Jacobsson, J. Chromatogr., B 828 (2005) 14.
[27] R. Sanchez-Ponce, S.C. Rutan, Chemometrics Intell. Lab. Syst. 77 (2005) 50.
[28] D. Lincoln, A.F. Fell, N.H. Anderson, D. England, J. Pharm. Biomed. Anal. 10 (1992) 837.
[29] S. Nigam, M. Stephens, A. de Juan, S.C. Rutan, Anal. Chem. 73 (2001) 290.
[30] J. Listgarten, A. Emili, Mol. Cell. Proteomics 4 (2005) 419.
[31] P. Kearney, P. Thibault, J. Bioinform. Comput. Biol. 1 (2003) 183.
[32] W. Wang, H. Zhou, H. Lin, S. Roy, T.A. Shaler, L.R. Hill, S. Norton, P. Kumar, M. Anderle, C.H. Becker, Anal. Chem. 75 (2003) 4818.
[33] D. Radulovic, S. Jelveh, S. Ryu, T.G. Hamilton, E. Foss, Y. Mao, A. Emili, Mol. Cell. Proteomics 3 (2004) 984.
[34] S. Serneels, C. Croux, P. Filzmoser, P.J. Van Espen, Chemometrics Intell. Lab. Syst. 79 (2005) 55.
[35] S. Serneels, J. Chemometrics (submitted).
[36] K. Vanden Branden, M. Hubert, Chemometrics Intell. Lab. Syst. 79 (2005) 10.
