PLS Regression

Hervé Abdi, The University of Texas at Dallas




An Example: What is Mouthfeel?

From Folkenberg, D.M., Bredie, W.L.P., & Martens, M. (1999). What is mouthfeel: Sensory-rheological relationship in instant hot cocoa drinks. Journal of Sensory Studies, 14, 181-195. (Data set courtesy of Martens, H., & Martens, M. (2001). Multivariate Analysis of Quality: An Introduction. London: Wiley. Downloaded from www.wiley.co.uk/chemometrics; data set: Cocoa-ii.mat.)

Goal: predict sensory attributes (mouthfeel), the dependent variables (Y set), from physical/chemical/rheological properties, the predictors or independent variables (X set).


6 predictors / independent variables (X set), physical/chemical/rheological properties: %COCOA, %SUGAR, %MILK, SEDIMENT, COLOUR, VISCOSITY

10 dependent variables (Y set): colour, cocoa-odour, milk-odour, thick-txtr, mouthfeel, smooth-txtr, creamy-txtr, cocoa-taste, milk-taste, sweet

14 samples (n-: without stabilizer; n+: with stabilizer): 1-, 2-, 3-, 4-, 5-, 6-, 7-, 1+, 2+, 3+, 4+, 5+, 6+, 7+


X (predictors):

Sample  %COCOA  %SUGAR  %MILK  SEDIMENT  COLOUR  VISCOSITY
1-       20.00   30.00  50.00      2.60   44.89       1.86
2-       20.00   43.30  36.70      2.65   42.77       1.80
3-       20.00   50.00  30.00      2.40   41.64       1.78
4-       26.70   30.00  43.30      3.10   42.37       2.06
5-       26.60   36.70  36.70      3.55   41.04       1.97
6-       33.30   36.70  30.00      4.30   39.14       2.13
7-       40.00   30.00  30.00      4.70   38.31       2.26
1+       20.00   30.00  50.00      0.12   44.25      48.60
2+       20.00   43.30  36.70      0.09   41.98      44.10
3+       20.00   50.00  30.00      0.10   41.18      43.60
4+       26.70   30.00  43.30      0.10   41.13      47.80
5+       26.60   36.70  36.70      0.10   40.39      50.30
6+       33.30   36.70  30.00      0.10   38.85      51.40
7+       40.00   30.00  30.00      0.09   37.91      54.80


Y (dependent variables):

Sample      colour cocoa-odour  milk-odour  thick-txtr   mouthfeel smooth-txtr creamy-txtr cocoa-taste  milk-taste       sweet
1-            1.67        6.06        7.37        5.94        7.80        8.59        6.51        6.24        6.89        8.48
2-            3.22        6.30        5.10        6.34        8.40        9.09        7.14        7.04        5.17        9.76
3-            4.82        7.09        4.11        6.68        8.29        8.61        6.76        7.26        4.62       10.50
4-            4.90        7.57        3.86        6.79        8.58        5.96        5.46        8.77        3.26        6.69
5-            7.03        7.96        2.99        6.92        8.71        6.42        5.59        8.93        2.76        7.05
6-           10.60       10.24        1.57        6.51        9.70        4.55        4.62       11.44        1.51        5.48
7-           11.11       11.31        1.25        7.04        9.72        3.42        4.11       12.43        0.86        3.91
1+            3.06        6.97        5.40        9.84        9.99       10.67        9.11        7.66        5.71        8.24
2+            6.02        8.61        3.75       10.01        9.92       10.86        8.64        7.66        4.86        8.71
3+            7.94        8.40        2.95        9.61        9.92       10.84        8.26        8.32        4.09        9.67
4+            9.17        9.30        2.86       10.68       11.05       10.48        8.20       10.40        2.22        6.43
5+           10.46       10.14        1.90       10.71       10.64        9.60        7.84       11.05        2.01        7.02
6+           12.40       11.30        1.18       10.64       11.09        7.24        7.23       11.78        1.65        5.59
7+           13.46       11.49        1.56       11.31       11.36        7.22        6.86       12.60        1.06        4.34


Why use PLS, PCA, and MLR?


• I × J data sets: PCA, CA, biplots, etc.

The beauty of Euclid …


• I × J and I × 1 data sets (with J << I): multiple regression



• I × J and I × K data sets: PLS, CANDIS, etc.



Why use PLS?

1. To explain the similarity between the observations (here, the cocoa samples).

2. To detect the structure in the relationships between dependent and independent variables.

3. To get a graphical representation of the data.

4. To predict the values of new observations.


What is PLS regression?

PLS combines features of Principal Component Analysis (PCA) and Multiple Linear Regression (MLR).

• Like PCA: PLS extracts factors from X.
• Like MLR: PLS predicts Y from X.

Combining PCA and MLR: PLS extracts factors from X in order to predict Y.


When to use PLS?

To analyze two data tables describing the same I observations with J predictors and K dependent variables.

[Figure: two tables sharing the same rows (observations 1 … i … I): the independent variables X (columns 1 … j … J, entries $x_{i,j}$) and the dependent variables Y (columns 1 … k … K, entries $y_{i,k}$).]


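General principle of PLS:

[Figure: the predictors X (observations 1 … i … I, columns 1 … j … J, entries $x_{i,j}$) are summarized by latent variables $\mathbf{t}_1, \dots, \mathbf{t}_L$ (entries $t_{i,\ell}$), computed with NIPALS; the latent variables are then used to predict the dependent variables Y (columns 1 … k … K, entries $y_{i,k}$).]

$\mathbf{t}_\ell = \mathbf{X}\mathbf{w}_\ell$

$\hat{\mathbf{Y}}_\ell = \mathbf{t}_\ell \mathbf{c}_\ell^\mathsf{T}$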
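The NIPALS step named in the schematic can be sketched in Python with NumPy. This is a minimal sketch of one common NIPALS variant for several dependent variables (often called PLS2), not necessarily the exact normalization used in these slides: here $\mathbf{c}_\ell$ is scaled so that $\hat{\mathbf{Y}} = \sum_\ell \mathbf{t}_\ell \mathbf{c}_\ell^\mathsf{T}$, and after the first component $\mathbf{t}_\ell$ is computed from the deflated residual of X rather than from X itself. The function name `pls_nipals`, its starting vector, and its tolerance are illustrative assumptions.

```python
import numpy as np


def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """NIPALS PLS2 sketch. X (I x J) and Y (I x K) are assumed to be
    centered already (e.g. Z-scores)."""
    E = np.array(X, dtype=float)  # residual of X, deflated after each component
    F = np.array(Y, dtype=float)  # residual of Y, deflated after each component
    I, J = E.shape
    K = F.shape[1]
    T = np.zeros((I, n_components))  # latent variables t_l, one per column
    W = np.zeros((J, n_components))  # X weights w_l
    P = np.zeros((J, n_components))  # X loadings p_l (used to deflate X)
    C = np.zeros((K, n_components))  # Y weights c_l, so that Y_hat = T @ C.T
    for ell in range(n_components):
        # Start from the Y column with the largest residual variance.
        u = F[:, np.argmax(F.var(axis=0))].copy()
        t_old = np.zeros(I)
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)      # unit-length weights
            t = E @ w                   # t_l = E w_l (E is X for the first component)
            c = F.T @ t / (t @ t)       # least-squares regression of Y residual on t_l
            u = F @ c / (c @ c)         # Y-side score used in the next pass
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break                   # t_l has stopped changing
            t_old = t
        p = E.T @ t / (t @ t)
        E -= np.outer(t, p)             # remove what t_l explains in X
        F -= np.outer(t, c)             # remove what t_l explains in Y
        T[:, ell], W[:, ell], P[:, ell], C[:, ell] = t, w, p, c
    return T, W, P, C
```

Because each latent variable is removed from X before the next one is extracted, the columns of `T` come out mutually orthogonal, and `T @ C.T` gives the fitted (centered) Y.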


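PLS: Maps of the observations

[Figure: the PLS schematic (X, latent variables $\mathbf{t}_1, \dots, \mathbf{t}_L$, Y; $\mathbf{t}_\ell = \mathbf{X}\mathbf{w}_\ell$ via NIPALS, $\hat{\mathbf{Y}}_\ell = \mathbf{t}_\ell \mathbf{c}_\ell^\mathsf{T}$) next to a map of the observations 1, 2, 3, 4, …, i, …, I on the latent variables lv1 and lv2.]

Observations: their coordinates are given by $\mathbf{t}_\ell$.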


PLS: Maps of the variables

[Figure: the PLS schematic ($\mathbf{t}_\ell = \mathbf{X}\mathbf{w}_\ell$ via NIPALS, $\hat{\mathbf{Y}}_\ell = \mathbf{t}_\ell \mathbf{c}_\ell^\mathsf{T}$) next to two maps on lv1 and lv2: a circle of correlations, and a common map of $\mathbf{w}_\ell$ and $\mathbf{c}_\ell$ showing the X variables (x) and the Y variables (y).]
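Coordinates on a circle of correlations are usually the correlations between each variable and the latent variables. A minimal sketch under that assumption (the slides do not show the computation); `correlation_circle` is a hypothetical name. For the common map of $\mathbf{w}_\ell$ and $\mathbf{c}_\ell$, the columns of the weight matrices would be plotted directly instead.

```python
import numpy as np


def correlation_circle(V, T):
    """Correlation of each variable (column of V) with each latent variable
    (column of T); row k gives the coordinates of variable k on the circle."""
    V = np.asarray(V, dtype=float)
    T = np.asarray(T, dtype=float)
    Vz = (V - V.mean(axis=0)) / V.std(axis=0)  # Z-scores with population SD
    Tz = (T - T.mean(axis=0)) / T.std(axis=0)
    return Vz.T @ Tz / V.shape[0]              # (n_variables x n_latent) correlations
```

Plotting column 1 against column 2 of the result, inside a unit circle, gives the usual lv1 versus lv2 display.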


PLS: Predicting Y from X

[Figure: the PLS schematic, from X through the latent variables to Y.]

$\mathbf{t}_\ell = \mathbf{X}\mathbf{w}_\ell$ and $\hat{\mathbf{Y}} = \sum_\ell \mathbf{t}_\ell \mathbf{c}_\ell^\mathsf{T} = \mathbf{X}\mathbf{B}_{\mathrm{PLS}}$

Some magic here!
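The "magic" can be made concrete. A standard identity for deflation-based PLS is $\mathbf{T} = \mathbf{X}\mathbf{W}(\mathbf{P}^\mathsf{T}\mathbf{W})^{-1}$, where $\mathbf{P}$ holds the X loadings used for deflation; it gives $\mathbf{B}_{\mathrm{PLS}} = \mathbf{W}(\mathbf{P}^\mathsf{T}\mathbf{W})^{-1}\mathbf{C}^\mathsf{T}$. The sketch below assumes centered data; `fit_pls` and `b_pls` are hypothetical helper names, and `fit_pls` takes each weight vector directly as the dominant left singular vector of $\mathbf{E}^\mathsf{T}\mathbf{F}$ (the fixed point the NIPALS iterations converge to) instead of iterating.

```python
import numpy as np


def fit_pls(X, Y, n_components):
    """Compact PLS2 on centered X and Y; returns T, W, P, C."""
    E = np.array(X, dtype=float)
    F = np.array(Y, dtype=float)
    T, W, P, C = [], [], [], []
    for _ in range(n_components):
        w = np.linalg.svd(E.T @ F)[0][:, 0]  # X weights w_l (unit length)
        t = E @ w                             # latent variable t_l
        p = E.T @ t / (t @ t)                 # X loadings p_l
        c = F.T @ t / (t @ t)                 # Y weights c_l
        E = E - np.outer(t, p)                # deflate X
        F = F - np.outer(t, c)                # deflate Y
        T.append(t); W.append(w); P.append(p); C.append(c)
    return tuple(np.column_stack(M) for M in (T, W, P, C))


def b_pls(W, P, C):
    """B_PLS = W (P^T W)^{-1} C^T, so that X @ B_PLS equals T @ C^T."""
    return W @ np.linalg.inv(P.T @ W) @ C.T
```

For new observations, the prediction in the original scale would then be `(X_new - X_mean) @ B + Y_mean`, using the means of the training data.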


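PLS: How do we explain Y from X?

Compare the data (Y) with the prediction ($\hat{\mathbf{Y}} = \mathbf{X}\mathbf{B}_{\mathrm{PLS}}$, computed with ℓ latent variables).

RESS (REsidual Sum of Squares): $\mathrm{RESS} = \sum_{i}\sum_{k} (y_{i,k} - \hat{y}_{i,k})^2$, i.e. (data − prediction)², summed over all observations and dependent variables.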
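RESS can be computed for an increasing number of latent variables. A minimal sketch, assuming centered data and a hypothetical compact helper `fit_pls` (weights taken from an SVD of $\mathbf{E}^\mathsf{T}\mathbf{F}$ rather than from NIPALS iterations). Since the fitted values with ℓ latent variables are the projection of Y onto $\mathbf{t}_1, \dots, \mathbf{t}_\ell$, RESS can only decrease as ℓ grows.

```python
import numpy as np


def fit_pls(X, Y, n_components):
    """Compact PLS2 on centered X and Y; returns T, W, P, C."""
    E = np.array(X, dtype=float)
    F = np.array(Y, dtype=float)
    T, W, P, C = [], [], [], []
    for _ in range(n_components):
        w = np.linalg.svd(E.T @ F)[0][:, 0]  # X weights w_l
        t = E @ w                             # latent variable t_l
        p = E.T @ t / (t @ t)                 # X loadings p_l
        c = F.T @ t / (t @ t)                 # Y weights c_l
        E = E - np.outer(t, p)                # deflate X
        F = F - np.outer(t, c)                # deflate Y
        T.append(t); W.append(w); P.append(p); C.append(c)
    return tuple(np.column_stack(M) for M in (T, W, P, C))


def ress_per_component(X, Y, max_components):
    """RESS on the training data after 1, 2, ..., max_components latent variables."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    Yc = np.asarray(Y, dtype=float) - np.mean(Y, axis=0)
    T, _, _, C = fit_pls(Xc, Yc, max_components)
    ress = []
    for ell in range(1, max_components + 1):
        Y_hat = T[:, :ell] @ C[:, :ell].T   # same as Xc @ B_PLS with ell latent variables
        ress.append(float(np.sum((Yc - Y_hat) ** 2)))
    return ress
```

With as many latent variables as (linearly independent) predictors, the span of the latent variables equals that of X, so RESS reaches the ordinary least-squares residual sum of squares.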


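PLS: How do we predict Y from X?

How well will we do with NEW data? Cross-validation; here, the jackknife (leave one observation out at a time):

• Leave observation 1 out, compute $\mathbf{B}_{\mathrm{PLS}}$ from $\mathbf{X}_{(-1)}$ and $\mathbf{Y}_{(-1)}$ (the data without observation 1), and use it to predict $\mathbf{y}_1$ from $\mathbf{x}_1$.
• Leave observation 2 out and predict $\mathbf{y}_2$ with the model fitted on $\mathbf{X}_{(-2)}$ and $\mathbf{Y}_{(-2)}$.
• … etc. …
• Leave observation I out and predict $\mathbf{y}_I$ with the model fitted on $\mathbf{X}_{(-I)}$ and $\mathbf{Y}_{(-I)}$.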


PRESS (Predicted REsidual Sum of Squares): compare the data (Y) with the jackknifed prediction ($\hat{\mathbf{Y}}_{\mathrm{jack}}$), whose row i is predicted by the model fitted without observation i:

$\mathrm{PRESS} = \sum_{i}\sum_{k} (y_{i,k} - \hat{y}_{\mathrm{jack},i,k})^2$
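The jackknife loop and PRESS can be sketched together. Assumptions: each left-out fit re-centers X and Y with the means of the remaining I − 1 observations, `fit_pls` and `jackknife_press` are hypothetical names, and $\mathbf{B}_{\mathrm{PLS}}$ for the first ℓ latent variables is taken as $\mathbf{W}_\ell(\mathbf{P}_\ell^\mathsf{T}\mathbf{W}_\ell)^{-1}\mathbf{C}_\ell^\mathsf{T}$.

```python
import numpy as np


def fit_pls(X, Y, n_components):
    """Compact PLS2 on centered X and Y; returns T, W, P, C."""
    E = np.array(X, dtype=float)
    F = np.array(Y, dtype=float)
    T, W, P, C = [], [], [], []
    for _ in range(n_components):
        w = np.linalg.svd(E.T @ F)[0][:, 0]  # X weights w_l
        t = E @ w                             # latent variable t_l
        p = E.T @ t / (t @ t)                 # X loadings p_l
        c = F.T @ t / (t @ t)                 # Y weights c_l
        E = E - np.outer(t, p)                # deflate X
        F = F - np.outer(t, c)                # deflate Y
        T.append(t); W.append(w); P.append(p); C.append(c)
    return tuple(np.column_stack(M) for M in (T, W, P, C))


def jackknife_press(X, Y, max_components):
    """PRESS for 1..max_components latent variables, leaving one observation out at a time."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    I = X.shape[0]
    press = np.zeros(max_components)
    for i in range(I):
        keep = np.arange(I) != i                    # rows of X_(-i) and Y_(-i)
        mx, my = X[keep].mean(axis=0), Y[keep].mean(axis=0)
        _, W, P, C = fit_pls(X[keep] - mx, Y[keep] - my, max_components)
        for ell in range(1, max_components + 1):
            Wl, Pl, Cl = W[:, :ell], P[:, :ell], C[:, :ell]
            B = Wl @ np.linalg.inv(Pl.T @ Wl) @ Cl.T
            y_jack = (X[i] - mx) @ B + my           # jackknifed prediction of row i
            press[ell - 1] += np.sum((Y[i] - y_jack) ** 2)
    return press
```

When ℓ equals the number of predictors, PLS coincides with multiple regression, so the last PRESS value equals the leave-one-out PRESS of ordinary least squares.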


PLS Big Question:

How Many Latent Variables?

Compare RESS and PRESS, or use PRESS.

Quick and Dirty:

Min(PRESS) => Optimum number of Latent Variables


Back to cocoa

Goals: explain and predict the sensory variables (Y) from the physico-chemical variables (X).



[Figure: Correlation within the X set]
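The correlations behind this figure can be computed directly from the X table of the cocoa example. A minimal sketch with NumPy's `np.corrcoef` (with `rowvar=False`, columns are treated as variables). Applying the same call to the Y table would give the within-Y correlations, and the off-diagonal block of `np.corrcoef(np.hstack([X, Y]), rowvar=False)` would give the X-Y correlations.

```python
import numpy as np

# Cocoa X table: rows 1- ... 7-, then 1+ ... 7+; columns in the order of `names`
names = ["%COCOA", "%SUGAR", "%MILK", "SEDIMENT", "COLOUR", "VISCOSITY"]
X = np.array([
    [20.00, 30.00, 50.00, 2.60, 44.89, 1.86],
    [20.00, 43.30, 36.70, 2.65, 42.77, 1.80],
    [20.00, 50.00, 30.00, 2.40, 41.64, 1.78],
    [26.70, 30.00, 43.30, 3.10, 42.37, 2.06],
    [26.60, 36.70, 36.70, 3.55, 41.04, 1.97],
    [33.30, 36.70, 30.00, 4.30, 39.14, 2.13],
    [40.00, 30.00, 30.00, 4.70, 38.31, 2.26],
    [20.00, 30.00, 50.00, 0.12, 44.25, 48.60],
    [20.00, 43.30, 36.70, 0.09, 41.98, 44.10],
    [20.00, 50.00, 30.00, 0.10, 41.18, 43.60],
    [26.70, 30.00, 43.30, 0.10, 41.13, 47.80],
    [26.60, 36.70, 36.70, 0.10, 40.39, 50.30],
    [33.30, 36.70, 30.00, 0.10, 38.85, 51.40],
    [40.00, 30.00, 30.00, 0.09, 37.91, 54.80],
])

R = np.corrcoef(X, rowvar=False)  # 6 x 6 correlation matrix within the X set

for name, row in zip(names, R):
    print(f"{name:>9} " + " ".join(f"{r:6.2f}" for r in row))
```

Strong correlations in this matrix (for example between SEDIMENT and VISCOSITY, which both track the stabilizer) are the kind of collinearity that motivates extracting latent variables from X.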


[Figure: Correlation within the Y set]


[Figure: Correlation between X and Y]


The t (latent) variables, one row per sample and one column per latent variable:

Sample      t1      t2      t3      t4
1-       -0.42   -0.19   -0.34   -0.35
2-       -0.25   -0.17    0.22   -0.20
3-       -0.17   -0.14    0.50   -0.22
4-       -0.13   -0.25   -0.26   -0.11
5-       -0.03   -0.27    0.02    0.33
6-        0.23   -0.36    0.10    0.30
7-        0.41   -0.42   -0.11    0.06
1+       -0.32    0.27   -0.37    0.04
2+       -0.15    0.27    0.19    0.14
3+       -0.08    0.27    0.46    0.03
4+        0.01    0.25   -0.29    0.38
5+        0.07    0.27   -0.02    0.33
6+        0.32    0.25    0.05   -0.22
7+        0.51    0.23   -0.16   -0.50


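The X weights w, one row per predictor and one column per latent variable:

Predictor       w1      w2      w3      w4
%COCOA        0.61   -0.15   -0.20   -0.46
%SUGAR       -0.22    0.09    0.77    0.08
%MILK        -0.39    0.06   -0.57    0.38
SEDIMENT      0.01   -0.70   -0.00    0.41
COLOUR       -0.62    0.00   -0.15   -0.62
VISCOSITY     0.20    0.69   -0.10    0.28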


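The Y weights c, one row per dependent variable and one column per latent variable:

Variable          c1      c2      c3      c4
colour          0.38    0.12    0.07    0.28
cocoa-odour     0.38    0.11   -0.07    0.25
milk-odour     -0.37   -0.05   -0.30   -0.57
thick-txtr      0.15    0.55   -0.18    0.18
mouthfeel       0.27    0.41   -0.25    0.36
smooth-txtr    -0.23    0.46    0.22    0.10
creamy-txtr    -0.16    0.53    0.09    0.04
cocoa-taste     0.38    0.03   -0.28    0.30
milk-taste     -0.37    0.03    0.07   -0.50
sweet          -0.33    0.09    0.81   -0.16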


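$\mathbf{B}_{\mathrm{PLS}}$: from X to Y (in Z-scores), one row per predictor and one column per dependent variable:

Predictor       colour cocoa-odour  milk-odour  thick-txtr   mouthfeel smooth-txtr creamy-txtr cocoa-taste  milk-taste       sweet
%COCOA           -0.11       -0.05        0.63       -0.21       -0.36       -0.48       -0.31       -0.09        0.45       -0.18
%SUGAR           -0.03       -0.09       -0.13       -0.03       -0.07        0.24        0.15       -0.17        0.04        0.41
%MILK             0.14        0.15       -0.50        0.24        0.43        0.25        0.16        0.26       -0.50       -0.24
SEDIMENT          0.32        0.29       -0.80       -0.19        0.19       -0.25       -0.40        0.43       -0.78       -0.33
COLOUR           -1.04       -0.97        1.70       -0.56       -1.10       -0.02        0.06       -1.07        1.54        0.68
VISCOSITY         0.52         0.5       -0.77        0.71        0.83        0.40        0.42        0.49       -0.65       -0.26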
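Coefficients estimated on Z-scores can be converted back to original units. A minimal sketch, assuming each entry is rescaled by the standard deviations used for the Z-scores, $b^*_{j,k} = b_{j,k}\, s_{y_k} / s_{x_j}$, and that an intercept row is added first, as the 7-row layout of the original-units matrix suggests; the function name is hypothetical.

```python
import numpy as np


def bpls_to_original_units(B_z, x_mean, x_sd, y_mean, y_sd):
    """Convert a J x K matrix B_PLS estimated on Z-scores to original units.

    Returns a (J + 1) x K matrix: the first row is the intercept, the other
    rows are b*_jk = b_jk * sd(y_k) / sd(x_j)."""
    B_z = np.asarray(B_z, dtype=float)
    B_star = B_z * np.asarray(y_sd)[None, :] / np.asarray(x_sd)[:, None]
    # Intercept so that predictions from raw X match back-transformed Z predictions
    intercept = np.asarray(y_mean) - np.asarray(x_mean) @ B_star
    return np.vstack([intercept, B_star])
```

With this layout, the predictions in original units are `np.column_stack([np.ones(len(X)), X]) @ B_full`.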


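$\mathbf{B}^{*}_{\mathrm{PLS}}$: from X to Y (original units); the first row is the intercept, followed by one row per predictor:

Term            colour cocoa-odour  milk-odour  thick-txtr   mouthfeel smooth-txtr creamy-txtr cocoa-taste  milk-taste       sweet
intercept        79.86       43.18      -52.77       29.23       32.63        6.91        4.32       52.51      -50.26      -19.07
%COCOA           -0.06       -0.01        0.15       -0.06       -0.06       -0.16       -0.06       -0.03        0.12       -0.05
%SUGAR           -0.01       -0.02       -0.03       -0.01       -0.01        0.08        0.03       -0.05        0.01        0.11
%MILK             0.07        0.04       -0.12        0.06        0.07        0.08        0.03        0.08       -0.13       -0.07
SEDIMENT          0.67        0.31       -0.82       -0.22        0.12       -0.33       -0.34        0.52       -0.84       -0.37
COLOUR           -1.85       -0.88        1.47       -0.54        -0.6       -0.02        0.04       -1.10        1.40        0.66
VISCOSITY         0.08        0.04       -0.06        0.06        0.04        0.04        0.03        0.04       -0.05       -0.02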


RESS and PRESS by number of latent variables:

# LV      RESS      PRESS
1       182.39    8505.47
2        50.86    8318.84
3        30.28    8292.23
4        15.69    8286.95   < min PRESS
5        13.00    8299.23
6        11.91    8309.38

Keep 4 latent variables.
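Applying the quick-and-dirty rule (minimum PRESS) to the cocoa values is a one-liner; the numbers below are copied from the RESS/PRESS table, and the function name is illustrative.

```python
import numpy as np

# RESS and PRESS for 1..6 latent variables, from the cocoa table
ress = np.array([182.39, 50.86, 30.28, 15.69, 13.00, 11.91])
press = np.array([8505.47, 8318.84, 8292.23, 8286.95, 8299.23, 8309.38])


def n_latent_by_min_press(press):
    """Number of latent variables (1-based) with the smallest PRESS."""
    return int(np.argmin(press)) + 1


best = n_latent_by_min_press(press)
print(best)  # 4
```

RESS keeps decreasing as latent variables are added, whereas PRESS turns back up after 4, which is why PRESS rather than RESS drives the choice.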


Plot of w and t (dimension 1 vs. dimension 2)

[Figure: Predictors & Rows (W*T), Cocoa data (# factors: 4). Horizontal axis: Dimension 1; vertical axis: Dimension 2. Points: the predictors %COCOA, %SUGAR, %MILK, SEDIMENT, COLOUR, VISCOSITY and the samples 1- to 7- and 1+ to 7+.]


Plot of w and c (dimension 1 vs. dimension 2)

[Figure: Predictors & DVs (W*C), Cocoa (# factors: 4). Horizontal axis: Dimension 1; vertical axis: Dimension 2. Points: the predictors %COCOA, %SUGAR, %MILK, SEDIMENT, COLOUR, VISCOSITY and the dependent variables colour, cocoa-odour, milk-odour, thick-txtr, mouthfeel, smooth-txtr, creamy-txtr, cocoa-taste, milk-taste, sweet.]


The circle of correlations

[Figure: Cocoa (# factors: 4), Y loadings (circle of correlation). Horizontal axis: correlation with principal component #1; vertical axis: correlation with principal component #2; both axes run from -1 to 1. Points: colour, cocoa-odour, milk-odour, thick-txtr, mouthfeel, smooth-txtr, creamy-txtr, cocoa-taste, milk-taste, sweet.]


Conclusion

• Useful references (these contain a bibliography):

Abdi (2007, 2003); see www.utd.edu/~herve