What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ......

13
10/29/2007 1 Using Pre-Whitening Techniques for Process Model Stabilization Using Pre-Whitening Techniques for Process Model Stabilization Jeremy M. Shaver, Barry M. Wise Eigenvector Research, Inc. Seattle/Wenatchee, WA [email protected] Inverse linear regression (e.g. partial least squares) likes random noise Non-random systematic interference common and make it work harder for same accuracy. More components makes models harder to interpret, introduces more noise Goal: Reduce presence of systematic information not relevant to prediction What is Pre-Whitening?

Transcript of What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ......

Page 1: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

1

Using Pre-Whitening Techniques for Process Model Stabilization

Using Pre-Whitening Techniques for Process Model Stabilization

Jeremy M. Shaver, Barry M. Wise

Eigenvector Research, Inc.

Seattle/Wenatchee, WA

[email protected]

• Inverse linear regression (e.g. partial least squares)

likes random noise

• Non-random systematic interference common and

make it work harder for same accuracy.

• More components makes models harder to

interpret, introduces more noise

• Goal: Reduce presence of systematic information

not relevant to prediction

What is Pre-Whitening?

Page 2: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

2

Pre-Whitening Methods

• Add more PLS components… (brute force)

• Orthogonal Signal Correction (OSC)Wold et. al. Chemometrics Intell. Lab. Syst. 1998; 44: 175-185

• Orthogonal Partial Least Squares (OPLS)*Trygg, Wold, J. Chemometrics 2002; 16: 119-128

• Generalized Least Squares Weighting (GLS

/ GLSW) with y-block sort

* use requires patent license agreement

Orthogonal Signal Correction (OSC)

and GLS with y-block Sort

• Use y-block to identify spectral differences which are not apparently related to the “property of interest” (e.g. concentration)

• use y-block value to sort x-block (spectra)

• calculate row-wise differences

• Use those differences to create filter and “deweight” those features.

• y-block differences also calculated to use as weights in covariance calculation

Page 3: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

3

Concentration Spectra

analyte band

GLS with y-block Sort

Sort concentration

Concentration Spectra

and use order to sort spectra…

Spectra with similar y-block values are near each other and should contain similar features.

Sorting Spectra Using y-block

Page 4: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

4

Concentration Spectra

derivative down columns

(both conc. and spec.)

Only differences between spectra with similar y-values are left.

Create covariance from these differences.

Isoflavone in Corn

• Determine isoflavone content in corn

kernels (~ 600-7000 ppm)

• 820 samples from 2000 growing season

• NIR 400-2500 cm-1

• Goal: <400 ppm and <10 latent variables?!?

Steven Wright and James Janni

Pioneer Hi-Bred, Inc., A DuPont Business

(presented at CAC 2002 conference)

Page 5: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

5

5 10 15 20 25 30 35 40300

400

500

600

700

800

900

1000

1100

1200

Latent Variable Number

RM

SE

CV

5-splits contiguous block

(1)(2)

(3)AutoOPLS + AutoGLS + AutoAuto + OSC

(#) = number of components used in preprocessing.Dashed lines = Same preprocessing technique

with different numbers of components.

(6)

(8)

(4)

Cross-Validation Results for PLS Models

Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm

-1000

0

1000

2000

3000

4000

5000

Y C

V P

red

icte

d

Y Measured

Y C

V P

red

icte

d

1000 2000 3000 4000-1000

0

1000

2000

3000

4000

5000

GLS + Autoscale

RMSECV =384 (6 LVs)

AutoscaleRMSECV = 508 (11 LVs)

Page 6: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

6

Raman Analysis of Ionic Mixtures

from Salt Cake Dissolution

• Monitor ionic concentration during dissolution of

waste tank salt cakes

• Most ions by classical least squares successful

• OH- difficult due to significant overlap with water

OH envelope (changes with ionic strength and

other effects)

Samuel A. Bryan, Tatiana G.

Levitskaia, Serguei I. Sinkov

Pacific Northwest National Labs

Spectral Changes with [OH-]

2800 3000 3200 3400 3600 3800 4000-50

0

50

100

150

200

250

300

Frequency (cm-1)

Rela

tive Inte

nsity

Other ions change

envelope shape!

Page 7: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

7

Before we bother, what’s the

advantage?!?

• Does the model respond differently to

random noise?

• Does the model respond differently to

changes in interferences?

• Is the model differently sensitive to

Hotelling’s T2 and/or outliers?

100 200 300 400 500 600 700 800 900

Simulated Frequency

0 50 100 150 200 250 300-20

0

20

40

60

80

100

120

Sample Number

Sim

ula

ted

Co

nce

ntr

atio

n

Experiment with different

positions and amplitudes of

analytical band and different

interferences.

Simulated Process DataConcentrations

Spectra

Page 8: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

8

1 2 3 4 5 6 7 8 9 1012

14

16

18

20

22

24

26

28

30

# of LVs

RM

SE

CV

OSC (2 comp, 7 iter)

mean center

GLSW (alpha = 1)

GLSW (alpha = .01)

GLSW (alpha = 100)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

High Overlap Simulated Example(Very Low Signal to Noise Ratio)

Prediction at V. High Noise(peak to peak SNR = 1)

0 10 20 30 40 50 60 70 800

10

20

30

40

50

60

70

80

90

100

RMSEP

Num

ber

of

Replic

ate

s

noneOSCGLS

Page 9: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

9

Cross-Validation at Low Noise(peak to peak SNR = 5)

2 4 6 8 10 12 140

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Number of Latent Variables

RM

SE

CV

None

OSCGLS

Prediction at Low Noise(peak to peak SNR = 5)

1.7 1.8 1.9 2 2.1 2.2 2.3 2.4 2.5 2.60

5

10

15

20

25

30

35

RMSEP

Num

ber

of

Replic

ate

s

noneOSCGLS

Page 10: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

10

Cross-Validation at High Noise(peak to peak SNR = 2)

2 4 6 8 10 12 140

1

2

3

4

5

6

7

8

9

10

Number of Latent Variables

RM

SE

CV

meancenterOSC (2)OSC (3)GLS

Prediction Error at High Noise(peak to peak SNR = 2)

3.4 3.6 3.8 4 4.2 4.4 4.6 4.8 50

5

10

15

20

25

30

35

RMSEP

Num

ber

of

Replic

ate

s

noneOSCGLS

Changing Noise and Background

Page 11: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

11

Q Residuals With and Without GLS

50 100 150 200 250 30050

60

70

80

90

100

110

120

130

Sample

Q R

esid

uals

(0.0

3%

), Q

Resid

uals

(54.2

8%

)

Samples/Scores Plot of data

SNR = 3

Scores With and Without GLS

1000

-20 0 20 40 60 80 100-800

-600

-400

-200

0

200

400

600

800

Y Measured 1

Without GLS

-20 0 20 40 60 80 100-20

-15

-10

-5

0

5

10

15

Y Measured 1

With GLS

SNR = 3

Page 12: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

12

Spectral Changes with [OH-]

2800 3000 3200 3400 3600 3800 4000-50

0

50

100

150

200

250

300

Frequency (cm-1)

Rela

tive Inte

nsity

OH- Cross Validation

2 4 6 8 10 12 140

0.2

0.4

0.6

0.8

1

Latent Variables

none

OSC

GLS

RM

SE

CV

Page 13: What is Pre-Whitening? - Eigenvector FACSS 05.pdf · Using Pre-Whitening Techniques ... Cross-Validation Prediction Plots for [isoflavone] < 5000 ppm 1-000 0 1000 2000 3000 4000 5000

10/29/2007

13

Conclusions

• Sensitivity to Outliers and Noise not apparently changed

• GLS can achieve factor reduction in some scenarios in which OSC cannot

• GLS pre-processed data less sensitive to selection of number of factors

• Easier to interpret models may be primary benefit