Applications of Tensor Methods in Life Sciences
mmds.imm.dtu.dk/presentations/bro.pdf

Page 1

Rasmus Bro

University of Copenhagen

Faculty of Life Sciences

[email protected]

Page 2

PARAFAC
• A very nice model

Some examples
• How to store a cheese?
• A model of wine

Some issues
• Variable selection
• Nonnegativity
• Dealing with missing data

Page 3

PARallel FACtor analysis

• PCA - bilinear model (shown for two components),

$$\mathbf{X} = \mathbf{a}_1\mathbf{b}_1^T + \mathbf{a}_2\mathbf{b}_2^T + \mathbf{E} = \mathbf{A}\mathbf{B}^T + \mathbf{E}$$

$$x_{ij} = \sum_{f=1}^{F} a_{if} b_{jf} + e_{ij}$$

Page 4

• PCA - bilinear model,

$$x_{ij} = \sum_{f=1}^{F} a_{if} b_{jf} + e_{ij}$$

• PARAFAC - trilinear model (shown for two components; $\circ$ is the outer product),

$$\underline{\mathbf{X}} = \mathbf{a}_1 \circ \mathbf{b}_1 \circ \mathbf{c}_1 + \mathbf{a}_2 \circ \mathbf{b}_2 \circ \mathbf{c}_2 + \underline{\mathbf{E}}$$

$$x_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + e_{ijk}$$

PARAFAC was invented in 1970 by Harshman and, independently, by Carroll & Chang under the name CANDECOMP. It is based on a principle of parallel proportional profiles suggested in 1944 by Cattell.

• R. A. Harshman. UCLA working papers in phonetics 16:1-84, 1970.

• J. D. Carroll and J. Chang. Psychometrika 35:283-319, 1970.

• R. B. Cattell. Psychometrika 9:267-283, 1944.

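PARallel FACtor analysis

PCA: $\mathbf{X} = \mathbf{A}\mathbf{B}^T$

PARAFAC: $\mathbf{X}_k = \mathbf{A}\mathbf{D}_k\mathbf{B}^T$, where $\mathbf{D}_k = \mathrm{diag}(\mathbf{c}(k,:))$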
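As an illustrative aside that is not part of the original slides, the slab-wise form Xk = A Dk B^T and the element-wise trilinear sum can be checked against each other numerically. The sketch below uses NumPy; the array sizes, the random loadings and the helper name `parafac_slabs` are assumptions chosen only for illustration.

```python
import numpy as np

def parafac_slabs(A, B, C):
    """Build a noise-free PARAFAC tensor slab by slab: X_k = A diag(C[k, :]) B^T."""
    slabs = [A @ np.diag(C[k]) @ B.T for k in range(C.shape[0])]
    return np.stack(slabs, axis=2)  # shape (I, J, K)

rng = np.random.default_rng(0)
I, J, K, F = 4, 5, 3, 2          # hypothetical dimensions and number of components
A = rng.normal(size=(I, F))      # first-mode loadings
B = rng.normal(size=(J, F))      # second-mode loadings
C = rng.normal(size=(K, F))      # third-mode loadings

X_slabs = parafac_slabs(A, B, C)
# Element-wise definition: x_ijk = sum_f a_if * b_jf * c_kf
X_sum = np.einsum("if,jf,kf->ijk", A, B, C)

print(np.allclose(X_slabs, X_sum))
```

Both constructions describe the same model; the slab form makes it visible that every slab shares A and B and differs only in the diagonal scaling taken from row k of C.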

Page 5

Fluorescence spectroscopy

[Figure: spectrofluorometer schematic with lamp (uv-vis), excitation monochromator, sample, emission monochromator and detector; intensity is recorded as a function of excitation and emission]

Excitation-emission matrix: a chemical fingerprint

Page 6

Fluorescence excitation-emission

Very high sensitivity and selectivity towards important compounds

• Amino acids
• Chlorophyll
• Porphyrin
• ATP
• NADH
• Vitamins (A, B2, B6, E)

J. Christensen - Foodfluor database at www.models.life.ku.dk/research/foodfluor

Page 9

PARAFAC - uniqueness

• Uniqueness - conditions

A PARAFAC model is unique (up to scaling and permutation of the components) when

$$k_A + k_B + k_C \geq 2F + 2$$

This is a sufficient condition. F is the number of components and $k_A$ is the k-rank of the loading matrix A: the largest number k such that every set of k columns of A is linearly independent ($k_A = F$ when A has full column rank).

J. B. Kruskal. Linear Algebra and its Applications 18:95-138, 1977.

N. D. Sidiropoulos and R. Bro. Journal of Chemometrics 14 (3):229-239, 2000.

• No rotational freedom

Unlike the bilinear 'PCA' model, there is only one solution
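The k-rank condition above can be checked numerically for small loading matrices. The following is a brute-force sketch, not code from the presentation: the function names `k_rank` and `kruskal_condition` are hypothetical, and the exhaustive search over column subsets is only practical for a handful of components.

```python
from itertools import combinations

import numpy as np

def k_rank(M):
    """Largest k such that every set of k columns of M is linearly independent."""
    n_cols = M.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        subsets = combinations(range(n_cols), size)
        if all(np.linalg.matrix_rank(M[:, list(cols)]) == size for cols in subsets):
            k = size
        else:
            break
    return k

def kruskal_condition(A, B, C):
    """Check the sufficient condition k_A + k_B + k_C >= 2F + 2."""
    F = A.shape[1]
    return k_rank(A) + k_rank(B) + k_rank(C) >= 2 * F + 2

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(6, 3))
C = rng.normal(size=(4, 3))
print(k_rank(A), k_rank(B), k_rank(C), kruskal_condition(A, B, C))

# Making two columns of C proportional drops its k-rank to 1,
# and the condition is no longer met.
C_bad = C.copy()
C_bad[:, 1] = 2.0 * C_bad[:, 0]
print(k_rank(C_bad), kruskal_condition(A, B, C_bad))
```

Random loadings typically have full k-rank, which is why the condition is usually easy to satisfy; it fails when components are proportional in one of the modes.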

Page 10

PARAFAC is mathematical chromatography

aka
• Blind source separation
• Solving the cocktail party effect
• Unmixing
• Curve resolution
• ...

Mathematical chromatography eliminates major problems in multivariate analysis:

• Indirect correlations stemming from rotational freedom
• It also eliminates outliers
• It determines underlying sources
• Simpler because it provides a chemical model
• It is far less sensitive to noise

Page 11

How to store a cheese?

Oxidation

• Oxidation from light causes rancid taste of cheese, butter etc.

• Important for packaging of food and shelf-storage

• Believed to be caused by riboflavin acting as photosensitizer

• Riboflavin does not absorb much red light, hence red material should protect

Page 12

Experiment
• Different light
• With / without oxygen
• Different storage time

Samples measured by
• Sensory analysis (quality)
• Fluorescence EEM

Page 13

Spectra from PARAFAC of EEMs

[Figure: PARAFAC excitation loadings (350 to 455 nm) and emission loadings (600 to 705 nm) for the components labelled Riboflavin, HematoP, ProtoP, Cloro B, Cloro A and Chlorin X. From JPWo/Matforsk]

Page 14

Relation between sensory data and PARAFAC estimated concentrations

[Figure: plot labelled "Rancid taste". From JPWo/Matforsk]

Importance of different compounds

[Figure: panels for Protoporphyrin, Chlorophyll B and X (chlorin?). From JPWo/Matforsk]

Page 15

New result

• Apart from riboflavin, there are at least five other light-sensitizers

• The 'new' ones seem to be more important than riboflavin

• Fluorescence and PARAFAC provide a 'simple' approach for exploring these.

[Figure: absorbance (Abs., 0 to 1.8) over the range 300 to 700 for Chlorophyll a and Riboflavin, with the UV region marked. From JPWo/Matforsk]

Page 16

A wine model

PARAFAC cannot handle shifts and shape changes

PARAFAC(1): $\mathbf{X}_k = \mathbf{A}\mathbf{D}_k\mathbf{B}^T$


Page 18

PARAFAC2

PARAFAC(1): $\mathbf{X}_k = \mathbf{A}\mathbf{D}_k\mathbf{B}^T$

PARAFAC2: $\mathbf{X}_k = \mathbf{A}\mathbf{D}_k\mathbf{B}_k^T$ subject to $\mathbf{B}_k^T\mathbf{B}_k$ constant

*Actually it is more general than shifts, but it's a feasible approximation to what PARAFAC2 can handle

R. A. Harshman. UCLA working papers in phonetics 22:30-47, 1972.

H. A. L. Kiers, J. M. F. ten Berge, R. Bro. J. Chemom. 13:275-294, 1999.

R. Bro, C. A. Andersson, H. A. L. Kiers. J. Chemom. 13:295-309, 1999.
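One way to see what the constraint "Bk^T Bk constant" allows is to write each Bk as Pk H, where Pk has orthonormal columns and H is shared by all slabs; to my understanding this is the reparametrization used in the direct fitting approach of the Kiers, ten Berge and Bro paper cited above, but treat that attribution as an assumption. The sketch below is not from the slides; the sizes and the helper `random_orthonormal` are hypothetical.

```python
import numpy as np

def random_orthonormal(rows, cols, rng):
    """Return a rows x cols matrix Q with orthonormal columns (Q^T Q = I)."""
    Q, _ = np.linalg.qr(rng.normal(size=(rows, cols)))
    return Q

rng = np.random.default_rng(1)
I, J, K, F = 6, 10, 4, 2          # hypothetical dimensions and number of components

A = rng.normal(size=(I, F))
C = rng.normal(size=(K, F))
H = rng.normal(size=(F, F))       # shared by all slabs

# Each slab gets its own profile matrix B_k = P_k H.
B_list = [random_orthonormal(J, F, rng) @ H for _ in range(K)]
X_list = [A @ np.diag(C[k]) @ B_list[k].T for k in range(K)]

# The profiles differ from slab to slab, yet their cross-products agree.
cross_products = [Bk.T @ Bk for Bk in B_list]
print(all(np.allclose(cp, H.T @ H) for cp in cross_products))
print(np.allclose(B_list[0], B_list[1]))
```

The individual Bk can change shape freely between slabs, as shifted elution profiles do, while the inner products between components stay fixed.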

Page 19

60 wine samples measured by GC-MS

[Figure: GC-MS data; axes: Elution, Samples]

Page 20

Weird shifts

Overlap and shifts

Low intensity and baseline

fronting

Page 21

PARAFAC2 results

Page 22

Other applications of tensor methods

Scientific field
• Environmental monitoring
• Sensory analysis
• Process monitoring
• Fermentation
• Cell phone audio quality
• Wireless communication
• Metabolomics
• Proteomics
• Cancer diagnostics
• Anthropometry
• ...

[Figure: plot with 20 labelled regions: 1, N. America; 2, L. America (Indian); 3, L. America; 4, N. Europe; 5, C. Europe; 6, E. Europe; 7, S.E. Europe; 8, France; 9, Iberian Peninsula; 10, N. Africa; 11, W. Africa; 12, S.E. Africa; 13, Near East; 14, N. India; 15, S. India; 16, N. Asia; 17, S. China; 18, S.E. Asia; 19, Australia; 20, Korea/Japan]

Page 23

Variable selection

VIS/NIR spectra of 61 beers

[Figure: absorbance (1 to 4) versus wavelength (400 nm to 2300 nm)]

Page 24

[Figure: raw data, 400 to 2200 nm, with three annotated regions: "Just crap! Random noise", "The good part" and "Not relevant but highly systematic"; below it, a panel titled "Actual modelled data - centered" over the same range]

Page 25

[Figure: regression vector PLS; predictions test set PLS]

'Classical' regression, in this case partial least squares (PLS). Good!

And can be optimized by chemical interpretation

But this is not always the case

Page 26

[Figure: regression vector LASSO and predictions test set LASSO, shown next to regression vector PLS and predictions test set PLS]

Lasso. Weird stuff! Important area represented by two variables.

Little support:
• Fragile
• Non-robust
• Poor outlier ability
• Interpretation low

Page 27

[Figure: regression vectors and test-set predictions for LASSO, PLS and SR]

'Chemometric' variable selection: very nice!

Variable selection by selectivity ratios, but others would do the job as well

Page 28

Nonnegativity

Classical papers related to NMF

Lawton & Sylvestre. Self modeling curve resolution. Technometrics 13:617-633, 1971.

Lawson & Hanson. Solving least squares problems, Englewood Cliffs: Prentice-Hall, Inc, 1974.

In 1999 Lee and Seung wrote a Nature paper on NMF (non-negative matrix factorization).

However, NMF has existed for much more than 30 years under the name multivariate curve resolution.

Page 29

Some facts worth noting

NMF is not generally unique: rotational freedom

Conditions for uniqueness exist

When unique, the solution is often shaky

NMF is (very) sensitive to starting values
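The rotational freedom of NMF can be shown with a small constructed example. The sketch below is not from the presentation: the matrices W, H and the mixing matrix T are made-up numbers, chosen so that both factor pairs stay nonnegative while reproducing exactly the same data.

```python
import numpy as np

# One nonnegative factorization X = W H (hypothetical numbers).
W = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [2.0, 2.0]])
H = np.array([[4.0, 3.0, 5.0, 2.0],
              [1.0, 2.0, 1.0, 1.0]])
X = W @ H

# A non-trivial mixing matrix (neither a permutation nor a pure scaling).
eps = 0.5
T = np.array([[1.0, eps],
              [0.0, 1.0]])

W2 = W @ T                    # second column becomes eps * w1 + w2
H2 = np.linalg.solve(T, H)    # T^{-1} H: first row becomes h1 - eps * h2

print(np.allclose(W2 @ H2, X))             # same fit to the data
print((W2 >= 0).all(), (H2 >= 0).all())    # still nonnegative
print(np.allclose(W, W2))                  # but different factors
```

Both factor pairs describe X equally well under the nonnegativity constraint, so without further conditions the data alone cannot tell them apart.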

Page 30

Dealing with missing data

No missing

Ex.: standard PCA loss function

$$\|\mathbf{X} - \mathbf{T}\mathbf{P}'\|^2 = \sum_{i=1}^{I}\sum_{j=1}^{J}\Big(x_{ij} - \sum_{f=1}^{F} t_{if}\,p_{jf}\Big)^2$$

I.e., a summation of errors over all elements of X

If missing

Only fit the model to the data that exist

I.e., fit to the loss function

$$\sum_{i=1}^{I}\sum_{j=1}^{J} w_{ij}\Big(x_{ij} - \sum_{f=1}^{F} t_{if}\,p_{jf}\Big)^2$$

where $w_{ij}$ is zero if $x_{ij}$ is missing and one otherwise
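A small numerical sketch of the weighted loss, under the assumption that missing elements are stored as NaN (a convention chosen here, not stated on the slide). The function name `pca_loss` and the toy numbers are hypothetical.

```python
import numpy as np

def pca_loss(X, T, P):
    """Sum of squared residuals of X - T P' over the elements that exist (non-NaN)."""
    W = ~np.isnan(X)                        # w_ij: True if x_ij is present
    residuals = np.where(W, X - T @ P.T, 0.0)
    return float(np.sum(residuals ** 2))

T = np.array([[1.0], [2.0], [3.0]])         # scores, one component
P = np.array([[1.0], [2.0]])                # loadings

# T P' = [[1, 2], [2, 4], [3, 6]]; one element is missing and one is off by 1.
X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 7.0]])

print(pca_loss(X, T, P))
```

Only the element that is present and deviates from the model contributes; the missing element is ignored rather than counted as an error. With no missing elements the function reduces to the ordinary sum of squared residuals.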

Page 31

How can that loss function be optimized?

Method 1: use weighted least squares regression

Method 2: use imputation (expectation maximization)

1. Put numbers in the missing elements

2. Fit the model to these 'wrong' data (Ex: M = TP' in PCA)

3. Replace the missing elements with the model guess (Ex: xij = Mij in PCA)

4. Go to step 2 until convergence

Both methods give the same result. Method 2 is easy to implement; Method 1 is sometimes faster, but more memory-demanding.
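The four steps of Method 2 can be sketched for PCA as follows. This is a minimal illustration, not the author's implementation: it assumes missing elements are coded as NaN, uses an uncentred PCA model fitted by truncated SVD, starts from column means, and the function name `pca_impute`, the tolerance and the simulated data are all hypothetical choices.

```python
import numpy as np

def pca_impute(X, n_components, max_iter=2000, tol=1e-14):
    """Fit an uncentred PCA model M = T P' to data with NaNs by iterative imputation."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Step 1: put numbers in the missing elements (here: observed column means).
    col_means = np.nanmean(X, axis=0)
    filled[missing] = col_means[np.nonzero(missing)[1]]
    for _ in range(max_iter):
        # Step 2: fit the model to the filled-in data via a truncated SVD.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        M = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Step 3: replace the missing elements with the model's estimate.
        change = np.sum((filled[missing] - M[missing]) ** 2)
        filled[missing] = M[missing]
        # Step 4: repeat until the imputed values stop changing.
        if change < tol:
            break
    return filled, M

# Simulated rank-2 data with a few elements removed.
rng = np.random.default_rng(0)
truth = rng.normal(size=(20, 2)) @ rng.normal(size=(8, 2)).T
X = truth.copy()
for i, j in [(0, 0), (3, 2), (7, 5), (12, 1), (18, 7)]:
    X[i, j] = np.nan

filled, M = pca_impute(X, n_components=2)
print(np.max(np.abs(filled - truth)))   # largest error in the imputed elements
```

The observed elements are never altered; only the missing positions are updated, so at convergence the model is effectively fitted to the data that exist.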

Page 32

Some concluding remarks

Tensor models provide
• Mathematical chromatography (real blind source separation)
• Huge noise reduction
• Intuitive models (chemically)
• Better handling of correlations
• Robustness
• …

But you need to know your data well, or be lucky

Much needed
• Better algorithms
• Better statistical diagnostics
• Better software

Page 33

Papers, m-files, courses, database of references, data sets, spectral libraries etc.

www.models.life.ku.dk

Rasmus Bro