Interactive Series Baseline Correction Algorithm Andrey Bogomolov a , Willem Windig b , Susan M. Geer c , Debra B. Blondell c , and Mark J. Robbins c a ACD/Labs, Russian Chemometrics Society, Moscow, Russia b Eigenvector Research Inc., Rochester, NY, USA c Eastman Kodak Company, Rochester, NY, USA


Interactive Series Baseline Correction Algorithm

Andrey Bogomolov (a), Willem Windig (b), Susan M. Geer (c), Debra B. Blondell (c), and Mark J. Robbins (c)

(a) ACD/Labs, Russian Chemometrics Society, Moscow, Russia
(b) Eigenvector Research Inc., Rochester, NY, USA
(c) Eastman Kodak Company, Rochester, NY, USA

Page 2

Baseline (Background) Problem

The baseline is an “eternal” issue in analytical data processing.

“Baseline” or “background”? There is no clear distinction between the two terms. “Baseline” is associated with a smooth line reflecting a “physical” interference. “Background” tends to be used in a more general sense to designate any unwanted signal, including noise and chemical components.

We prefer the term “baseline” because smoothness of the background signal is the main assumption of the proposed correction algorithm.

Page 3

Classical Approach to the Baseline Correction Problem

Classical baseline correction algorithms for a single curve have been elaborated almost exhaustively in the literature.

The baseline to be subtracted is obtained by fitting a linear (or polynomial) function to nodes that belong to signal-free regions.

The nodes can be detected automatically by the software or placed manually by the user.

These methods are advantageous for semi-automatic processing, where software-generated results need to be revised by a human expert.
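
As a minimal sketch of this classical single-curve approach, the Python snippet below fits a straight line (a degree-1 polynomial) to manually placed nodes with NumPy and subtracts it from the curve. The synthetic curve, the node indices, and the function name `fit_polynomial_baseline` are illustrative assumptions and do not come from the presentation.

```python
import numpy as np


def fit_polynomial_baseline(x, y, node_idx, degree=1):
    """Fit a polynomial to the node points (assumed to lie in signal-free
    regions) and evaluate it over the whole x axis."""
    coeffs = np.polyfit(x[node_idx], y[node_idx], degree)
    return np.polyval(coeffs, x)


# Synthetic curve (assumption): one Gaussian peak on top of a linear drift.
x = np.linspace(0.0, 10.0, 201)
peak = np.exp(-0.5 * ((x - 5.0) / 0.3) ** 2)
drift = 0.2 + 0.05 * x
y = peak + drift

# Nodes placed by hand where the peak contribution is negligible.
node_idx = np.array([0, 20, 40, 160, 180, 200])
baseline = fit_polynomial_baseline(x, y, node_idx, degree=1)
corrected = y - baseline
```

Raising `degree` gives a curved baseline, at the cost of needing more nodes to keep the fit stable.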

Page 4

Serial (Batch) Methods

The development of two-dimensional spectroscopy and hyphenated techniques demanded new methods applicable to data matrices.

Early work in this direction applied automated baseline correction algorithms to every individual curve in a matrix dataset.

The main problem with this approach is that it neglects internal (inter-spectral) correlations.

Instead of the expected rank reduction, it may introduce additional variance into the dataset.

It is a “black-box” routine that is difficult to control
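
The rank argument can be illustrated with a constructed toy example (an assumption for illustration, not data from the presentation). Three synthetic curves share one peak shape and differ in concentration and linear baseline. Correcting them with common nodes leaves a rank-1 matrix, whereas choosing nodes curve by curve, with some nodes landing on a peak shoulder, leaves curve-specific distortions that keep the rank high. The node positions and the helper names `correct` and `numerical_rank` are hypothetical.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 201)
peak = np.exp(-0.5 * ((x - 5.0) / 0.3) ** 2)
conc = np.array([0.5, 1.0, 2.0])
offsets = np.array([0.1, 0.2, 0.3])
slopes = np.array([0.01, 0.03, 0.05])
# Rows are curves: one chemical component plus a linear baseline per curve.
D = conc[:, None] * peak + offsets[:, None] + slopes[:, None] * x


def correct(row, nodes):
    """Subtract a piecewise-linear baseline drawn through the given nodes."""
    nodes = np.asarray(nodes)
    return row - np.interp(x, x[nodes], row[nodes])


def numerical_rank(M):
    """Rank estimated from singular values above a fixed tolerance."""
    return int(np.linalg.matrix_rank(M, tol=1e-8))


# Same nodes for every curve, all in signal-free regions.
common = np.vstack([correct(row, [0, 40, 160, 200]) for row in D])

# Nodes chosen independently per curve; two choices hit a peak shoulder.
per_curve_nodes = [[0, 40, 160, 200], [0, 90, 160, 200], [0, 40, 110, 200]]
independent = np.vstack(
    [correct(row, nodes) for row, nodes in zip(D, per_curve_nodes)]
)

print(numerical_rank(D), numerical_rank(common), numerical_rank(independent))
```

In this toy case the misplaced nodes are deliberate; an automated per-curve routine may or may not make such errors, but nothing in it ties the node choice for one curve to the others.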

Page 5

Multivariate Background Correction

Multivariate data analysis has had a revolutionary impact on the baseline problem in general.

The paradigm shift from hard (knowledge-driven) to soft or self (data-driven) modeling has opened new horizons and introduced new concepts.

In the calibration context, PLS (partial least squares) provides a means of addressing the background without subtracting it.

OSC (orthogonal signal correction) by S. Wold turns the problem inside out, eliminating from the data (X) the variance that is irrelevant for calibration (orthogonal to Y).

A number of other excellent algorithms…

Page 6

Our Objectives

Researchers typically concentrate on developing fully automated background correction methods.

Statement: the fuzzy character of the baseline problem in general casts doubt on the feasibility of automated (expert-free) baseline correction routines.

In contrast, we present an alternative approach that aims to maximize the means of control available to a human operator: simplicity, visualization, and an interactive stepwise algorithm.

Page 7

The Method

The method is applied to a series of curves (e.g., spectra or chromatograms).

The method consists of two distinct steps.

First, a prototype baseline is constructed from linear segments by selecting a set of nodes. To aid in the node selection, the mean values are calculated to represent the entire series:

$$\bar{d}_j = \frac{1}{c} \sum_{i=1}^{c} d_{ij}$$

where $d_{ij}$ is the value of curve $i$ at point $j$ and $c$ is the number of curves in the series.

Second, the prototype baseline is used to construct individual baselines to be subtracted from the series curves by adjusting the nodes vertically to the corrected curve.
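
The two steps can be sketched in Python with NumPy. This is one reading of the slide under stated assumptions, not the authors' implementation: the series is stored as a matrix `D` with one curve per row, the operator's node choice is hard-coded as `node_idx`, and "adjusting the nodes vertically" is interpreted as moving each node onto the curve being corrected. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def mean_curve(D):
    """Mean over all curves (rows) of the series, one value per point."""
    return D.mean(axis=0)


def series_baselines(x, D, node_idx):
    """For each curve, shift every node vertically onto that curve and
    join the nodes with linear segments."""
    node_idx = np.sort(np.asarray(node_idx))
    return np.vstack(
        [np.interp(x, x[node_idx], row[node_idx]) for row in D]
    )


# Synthetic series (assumption): one peak, a different linear drift per curve.
x = np.linspace(0.0, 10.0, 201)
peak = np.exp(-0.5 * ((x - 5.0) / 0.3) ** 2)
conc = np.array([0.5, 1.0, 2.0])
true_baselines = (np.array([0.1, 0.2, 0.3])[:, None]
                  + np.array([0.01, 0.03, 0.05])[:, None] * x)
D = conc[:, None] * peak + true_baselines

# Step 1: nodes are picked once, on the mean curve, in signal-free regions.
d_mean = mean_curve(D)
node_idx = np.array([0, 40, 160, 200])
prototype = np.interp(x, x[node_idx], d_mean[node_idx])

# Step 2: the same node positions yield one baseline per curve.
baselines = series_baselines(x, D, node_idx)
corrected = D - baselines
```

Because the node positions are shared across the series, a single interactive selection on the mean curve controls the correction of every curve.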

Page 8

HPLC/DAD: Sample Data

[Figure labels: Calculating the mean; Selecting nodes; Subtracting the baseline; Raw; Corrected]

Page 9

2nd Derivative for Node Selection
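
The slide gives only its title, so the following is an assumption about how a second derivative could assist node selection: regions where the second derivative of the mean curve stays near zero are locally straight and therefore candidate baseline areas. The sketch uses NumPy finite differences; the function names, the relative threshold, and the synthetic mean curve are hypothetical.

```python
import numpy as np


def second_derivative(x, y):
    """Numerical second derivative: finite differences applied twice."""
    return np.gradient(np.gradient(y, x), x)


def flat_mask(d2, rel_threshold=0.01):
    """True where |second derivative| is below a fraction of its maximum."""
    return np.abs(d2) <= rel_threshold * np.abs(d2).max()


# Synthetic mean curve (assumption): one peak on a linear drift.
x = np.linspace(0.0, 10.0, 201)
mean_curve = np.exp(-0.5 * ((x - 5.0) / 0.3) ** 2) + 0.2 + 0.05 * x

d2 = second_derivative(x, mean_curve)
candidates = flat_mask(d2)  # candidate node positions for the operator
```

The second derivative also crosses zero at the inflection points of a peak, so isolated points inside a peak can pass the threshold. The mask is therefore a visual aid for the operator rather than an automatic node selector.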

Page 10

Baseline Correction for Curve Resolution

Baseline correction is an application-specific preprocessing technique.

The present baseline correction algorithm was developed to improve the performance of the SIMPLISMA (SIMPLe-to-use Interactive Self-modeling Mixture Analysis) curve resolution technique.

The algorithm has been used at Eastman Kodak Company for over 10 years for routine analysis of TGA/IR data, which represent a challenging case for curve resolution: many components, a high degree of overlap, and an intense background signal.

Page 11

TGA/IR Sample Data

Reprinted with permission from Eastman Kodak Company, 2005

Page 12

Baseline Nature in TGA/IR

The most common reasons for TGA/IR baseline drift:

Temperature fluctuations over time, instrument drift, material scattering, impurities, inappropriate background, etc.

In the present dataset, the reasons are miscellaneous.

The spectral domain is more suitable for series baseline correction because of its narrow peaks and explicit baseline areas.

Page 13

TGA/IR: Baseline Correction

[Figure labels: Raw spectral series; Calculating the mean; “Snapping” the baseline; Subtracting; Raw; Corrected]

Reprinted with permission from Eastman Kodak Company, 2005

Page 14

TGA/IR: Corrected Data Map

Reprinted with permission from Eastman Kodak Company, 2005

Page 15

TGA/IR: SIMPLISMA Curve Resolution

Reprinted with permission from Eastman Kodak Company, 2005

Page 16

IR Library Identification

Reprinted with permission from Eastman Kodak Company, 2005

Page 17

Conclusions

A new interactive approach to the baseline correction problem has been suggested

It allows for adapting traditional automated single-scan baseline correction routines or for performing manual correction on matrix data as if they were a single curve

Advantages of the method include “transparency” of the process and the means for extensive operator interaction

The method has passed long-term testing in an industrial laboratory and was integrated into a professional software package

In spite of the simplicity of the algorithm, it allows for successful elimination of baselines – even in complex cases such as TGA/IR data

Page 18

Acknowledgements

We thank Antony Williams for his friendly support and Michel Hachey for his help and valuable ideas.