Applications of Tensor Methods in Life Sciences
mmds.imm.dtu.dk/presentations/bro.pdf

Rasmus Bro

University of Copenhagen

Faculty of Life Sciences

PARAFAC
• A very nice model

Some examples
• How to store a cheese?
• A model of wine

Some issues
• Variable selection
• Nonnegativity
• Dealing with missing data

PARallel FACtor analysis

• PCA - bilinear model (drawn here with two components):

$$X = a_1 b_1^T + a_2 b_2^T + E = AB^T + E, \qquad x_{ij} = \sum_{f=1}^{F} a_{if} b_{jf} + e_{ij}$$

• PARAFAC - trilinear model (drawn here with two components):

$$X = a_1 \circ b_1 \circ c_1 + a_2 \circ b_2 \circ c_2 + E, \qquad x_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + e_{ijk}$$

PARAFAC was invented in 1970 by Harshman and, independently, by Carroll & Chang under the name CANDECOMP. It is based on the principle of parallel proportional profiles suggested in 1944 by Cattell.

• R. A. Harshman. UCLA Working Papers in Phonetics 16:1-84, 1970.

• J. D. Carroll and J. Chang. Psychometrika 35:283-319, 1970.

• R. B. Cattell. Psychometrika 9:267-283, 1944.

PARallel FACtor analysis

PCA: $X = AB^T$

PARAFAC: $X_k = A D_k B^T$, $D_k = \mathrm{diag}(c(k,:))$
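The elementwise sum and the slab-wise notation describe the same trilinear model. A minimal NumPy sketch, assuming small random loading matrices (the sizes and the names `A`, `B`, `C` are chosen for illustration only and there is no noise term), builds the array both ways and checks that they agree:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, F = 4, 5, 3, 2            # hypothetical dimensions and number of components
A = rng.random((I, F))             # first-mode loadings
B = rng.random((J, F))             # second-mode loadings
C = rng.random((K, F))             # third-mode loadings

# Elementwise trilinear model: x_ijk = sum_f a_if * b_jf * c_kf
X = np.einsum("if,jf,kf->ijk", A, B, C)

# Slab-wise form: X_k = A D_k B^T with D_k = diag(c(k,:))
slabs = [A @ np.diag(C[k, :]) @ B.T for k in range(K)]
X_slabs = np.stack(slabs, axis=2)

same = np.allclose(X, X_slabs)
print(same)
```

Each frontal slab uses the same `A` and `B`; only the diagonal weights taken from row `k` of `C` change from slab to slab.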

Fluorescence spectroscopy

[Figure: instrument schematic with lamp (UV-vis), excitation monochromator, sample, emission monochromator and detector; intensity is recorded as a function of excitation and emission wavelength.]

Excitation-emission matrix: a chemical fingerprint

Amino acids

Chlorophyll

Porphyrin

ATP

NADH

Vitamins (A, B2, B6, E)

Fluorescence excitation-emission
Very high sensitivity and selectivity towards important compounds

J. Christensen - Foodfluor database at www.models.life.ku.dk/research/foodfluor

PARAFAC - uniqueness

• Uniqueness - conditions
A PARAFAC model is unique when

$$k_A + k_B + k_C \geq 2F + 2$$

F is the number of components and $k_A$ is the k-rank of the loading matrix A: the largest number k such that every set of k columns of A is linearly independent (at most F).

J. B. Kruskal. Linear Algebra and its Applications 18:95-138, 1977.

N. D. Sidiropoulos and R. Bro. Journal of Chemometrics 14 (3):229-239, 2000.
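The k-rank condition can be checked numerically for small loading matrices. The following is a brute-force sketch, not taken from the slides: the function names `k_rank` and `kruskal_unique` and the tolerance are assumptions, and the search over column subsets is only practical for a small number of components.

```python
import itertools
import numpy as np

def k_rank(M, tol=1e-10):
    """Largest k such that every set of k columns of M is linearly independent."""
    n_cols = M.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        for cols in itertools.combinations(range(n_cols), size):
            if np.linalg.matrix_rank(M[:, list(cols)], tol=tol) < size:
                return k           # found a dependent subset of this size
        k = size                   # all subsets of this size are independent
    return k

def kruskal_unique(A, B, C):
    """Check the sufficient condition kA + kB + kC >= 2F + 2."""
    F = A.shape[1]
    return k_rank(A) + k_rank(B) + k_rank(C) >= 2 * F + 2

rng = np.random.default_rng(0)
A = rng.random((4, 3))
B = rng.random((5, 3))
C = rng.random((3, 3))

A_bad = A.copy()
A_bad[:, 1] = 2.0 * A_bad[:, 0]    # two proportional columns give k-rank 1

print(k_rank(A), k_rank(A_bad))
print(kruskal_unique(A, B, C), kruskal_unique(A_bad, B, C))
```

With generic random loadings every k-rank equals F, so the condition holds for F = 3. Making two columns of one loading matrix proportional drops its k-rank to 1, and the condition fails.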

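[Figure: data array X with loading matrices A, B and C]

• No rotational freedom
Unlike the bilinear ‘PCA’ model, there is only one solution (up to scaling and permutation of the components)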
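Rotational freedom can be illustrated numerically. In this sketch (random loadings and an arbitrary invertible matrix `R`, all hypothetical), transforming the loadings leaves the bilinear fit unchanged, whereas the same transformation applied to a trilinear model changes the array it describes:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((6, 2))
B = rng.random((5, 2))
C = rng.random((3, 2))

R = np.array([[2.0, 1.0], [0.5, 3.0]])    # arbitrary invertible matrix
A_rot = A @ R
B_rot = B @ np.linalg.inv(R).T

# Bilinear model: A_rot B_rot^T = A R R^{-1} B^T = A B^T, so the fit is identical
same_bilinear_fit = np.allclose(A @ B.T, A_rot @ B_rot.T)
different_loadings = not np.allclose(A, A_rot)

# Trilinear model: the same transformation no longer reproduces the array
T = np.einsum("if,jf,kf->ijk", A, B, C)
T_rot = np.einsum("if,jf,kf->ijk", A_rot, B_rot, C)
trilinear_fit_changed = not np.allclose(T, T_rot)

print(same_bilinear_fit, different_loadings, trilinear_fit_changed)
```

This is a single example rather than a proof; the general statement is the uniqueness condition given above.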

PARAFAC is mathematical chromatography
aka
• Blind source separation
• Solving the cocktail party effect
• Unmixing
• Curve resolution
• ...

Mathematical chromatography eliminates major problems in multivariate analysis:

• Indirect correlations stemming from rotational freedom
• It also eliminates outliers
• It determines underlying sources
• Simpler because it provides a chemical model
• It is far less sensitive to noise

How to store a cheese?

Oxidation
• Oxidation from light causes rancid taste of cheese, butter etc.

• Important for packaging of food and shelf-storage

• Believed to be caused by riboflavin acting as photosensitizer

• Riboflavin does not absorb much red light, hence red material should protect

Experiment
• Different light
• With / without oxygen
• Different storage time

Samples measured by
• Sensory analysis (quality)
• Fluorescence EEM

Spectra from PARAFAC of EEMs

[Figure: estimated excitation spectra (350-455 nm) and emission spectra (600-705 nm). Component labels: Riboflavin, HematoP, ProtoP, Cloro A, Cloro B, Chlorin X. From JPWo/Matforsk]

Relation between sensory data and PARAFAC estimated concentrations

[Figure: rancid taste against PARAFAC estimated concentrations. From JPWo/Matforsk]

Importance of different compounds

[Figure: panels for Protoporphyrin, Chlorophyll B and X (Chlorine?). From JPWo/Matforsk]

New result

• Apart from riboflavin, there are at least five other light-sensitizers

• The ‘new’ ones seem to be more important than riboflavin

• Fluorescence and PARAFAC provide a ‘simple’ approach for exploring these.

[Figure: absorbance (Abs., 0 to 1.8) against wavelength (300-700 nm) for chlorophyll a and riboflavin, with the UV region marked. From JPWo/Matforsk]

PARAFAC cannot handle shifts and shape changes

PARAFAC(1): $X_k = A D_k B^T$

A wine model

R. A. Harshman. UCLA Working Papers in Phonetics 22:30-47, 1972.

H. A. L. Kiers, J. M. F. ten Berge, R. Bro. J. Chemom. 13:275-294, 1999.

R. Bro, C. A. Andersson, H. A. L. Kiers. J. Chemom. 13:295-309, 1999.

PARAFAC2

PARAFAC2: $X_k = A D_k B_k^T$ subject to $B_k^T B_k$ constant over k

PARAFAC(1): $X_k = A D_k B^T$

*Actually it is more general than shifts, but it's a feasible approximation to what PARAFAC2 can handle
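One way to see what the constraint allows is to write each $B_k$ as $P_k H$, where $P_k$ has orthonormal columns and $H$ is a small matrix shared by all slabs; this is the parameterization used in the direct-fitting papers cited above. The sketch below uses that form with random, hypothetical matrices, and only constructs data that satisfy the constraint. It does not fit a PARAFAC2 model.

```python
import numpy as np

rng = np.random.default_rng(2)
I, F = 6, 2
A = rng.random((I, F))
H = rng.random((F, F))                   # shared across all slabs
row_counts = [30, 35, 40]                # hypothetical: profile length may differ per slab
C = rng.random((len(row_counts), F))

B_list, X_list = [], []
for k, J_k in enumerate(row_counts):
    # P_k has orthonormal columns, so (P_k H)^T (P_k H) = H^T H for every k
    P_k, _ = np.linalg.qr(rng.standard_normal((J_k, F)))
    B_k = P_k @ H
    B_list.append(B_k)
    X_list.append(A @ np.diag(C[k, :]) @ B_k.T)   # X_k = A D_k B_k^T

cross_products = [B_k.T @ B_k for B_k in B_list]
constant = all(np.allclose(cp, H.T @ H) for cp in cross_products)
profiles_differ = not np.allclose(B_list[0][:30], B_list[1][:30])

print(constant, profiles_differ)
```

The individual profiles in `B_list` differ from slab to slab, yet their cross-products are identical, which is what lets PARAFAC2 absorb shifts that PARAFAC(1) cannot.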

60 wine samples measured by GC-MS

[Figure: GC-MS elution profiles across samples. Panels: weird shifts; overlap and shifts; low intensity and baseline; fronting.]

PARAFAC2 results

Other applications of tensor methods

Scientific field
• Environmental monitoring
• Sensory analysis
• Process monitoring
• Fermentation
• Cell phone audio quality
• Wireless communication
• Metabolomics
• Proteomics
• Cancer diagnostics
• Anthropometry
• ...

[Figure: plot of 20 population groups. Labels: 1, N. America; 2, L. America (Indian); 3, L. America; 4, N. Europe; 5, C. Europe; 6, E. Europe; 7, S.E. Europe; 8, France; 9, Iberian Peninsula; 10, N. Africa; 11, W. Africa; 12, S.E. Africa; 13, Near East; 14, N. India; 15, S. India; 16, N. Asia; 17, S. China; 18, S.E. Asia; 19, Australia; 20, Korea/Japan]

Variable selection

VIS/NIR spectra of 61 beers

[Figure: absorbance against wavelength, 400 nm to 2300 nm. Annotated regions: "The good part"; "Not relevant but highly systematic"; "Just crap! Random noise".]

[Figure panels: Raw data (400-2200 nm); Actual modelled data - centered (400-2200 nm); Regression vector PLS; Predictions test set PLS]

‘Classical’ regression, in this case partial least squares (PLS). Good!

And can be optimized by chemical interpretation

But – this is not always the case

[Figure panels: Regression vector LASSO; Predictions test set LASSO]
LASSO. Weird stuff! Important area represented by two variables.

Little support:
• Fragile
• Non-robust
• Poor outlier ability
• Low interpretability

[Figure panels: Regression vector LASSO; Predictions test set LASSO; Regression vector SR; Predictions test set SR]

‘Chemometric’ variable selection: very nice!

Variable selection by selectivity ratios, but others would do the job as well
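A selectivity ratio can be sketched as the ratio of explained to residual variance for each variable along a target-projected component. The example below is an illustration under stated assumptions, not the analysis on the slides: the data are simulated, the function name `selectivity_ratio` is hypothetical, and a ridge regression vector stands in for the PLS regression vector.

```python
import numpy as np

def selectivity_ratio(X, b):
    """Explained / residual variance per variable along the target-projected component."""
    w = b / np.linalg.norm(b)          # normalised regression vector
    t = X @ w                          # target-projected scores
    p = X.T @ t / (t @ t)              # target-projected loadings
    X_tp = np.outer(t, p)              # part of X described by the component
    explained = np.sum(X_tp ** 2, axis=0)
    residual = np.sum((X - X_tp) ** 2, axis=0)
    return explained / residual

rng = np.random.default_rng(3)
n, n_vars = 50, 20
t_true = rng.standard_normal(n)
profile = np.zeros(n_vars)
profile[:5] = 1.0                      # only the first five variables carry signal
X = np.outer(t_true, profile) + 0.1 * rng.standard_normal((n, n_vars))
y = t_true + 0.1 * rng.standard_normal(n)
X = X - X.mean(axis=0)                 # centre the data
y = y - y.mean()

# Assumption: ridge regression replaces PLS to keep the sketch dependency-free
lam = 1.0
b = np.linalg.solve(X.T @ X + lam * np.eye(n_vars), X.T @ y)

sr = selectivity_ratio(X, b)
selected = np.sort(np.argsort(sr)[-5:])
print(selected)
```

Variables whose variation lies mostly along the predictive direction get a high ratio, so thresholding `sr` keeps whole informative regions rather than a few isolated variables.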

Nonnegativity

Classical papers related to NMF

Lawton & Sylvestre. Self modeling curve resolution. Technometrics 13:617-633, 1971.

Lawson & Hanson. Solving least squares problems, Englewood Cliffs: Prentice-Hall, Inc, 1974.

In 1999 Lee and Seung wrote a Nature paper on NMF (non-negative matrix factorization).

However, NMF has existed for much more than 30 years under the name multivariate curve resolution.

Some facts worth noting

• NMF is not generally unique: there is rotational freedom

• Conditions for uniqueness exist

• When unique, the solution is often shaky

• NMF is (very) sensitive to starting values
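The lack of uniqueness is easy to reproduce. In this sketch (hypothetical, strictly positive random factors and a hand-picked mixing matrix `T`), two different pairs of nonnegative factors reproduce exactly the same data matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.random((8, 2)) + 0.5           # strictly positive factors, entries in [0.5, 1.5)
H = rng.random((2, 6)) + 0.5
X = W @ H

eps = 0.1
T = np.array([[1.0, eps], [0.0, 1.0]]) # mild mixing of the two components
W_alt = W @ T                          # stays nonnegative: sums of nonnegative columns
H_alt = np.linalg.inv(T) @ H           # first row becomes H[0] - eps * H[1], still > 0 here

same_fit = np.allclose(X, W_alt @ H_alt)
both_nonnegative = bool((W_alt >= 0).all() and (H_alt >= 0).all())
different_factors = not np.allclose(W, W_alt)

print(same_fit, both_nonnegative, different_factors)
```

Nonnegativity narrows the set of admissible solutions but does not, in general, reduce it to one; here the second solution exists because no entry of `H` is close to zero.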

Dealing with missing data

No missing
Ex.: standard PCA loss function

$$\|X - TP'\|^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \Big( x_{ij} - \sum_{f=1}^{F} t_{if} p_{jf} \Big)^2$$

I.e., a summation of errors over all elements of X

If missing
Only fit the model to the data that exist

I.e., fit to the loss function

$$\sum_{i=1}^{I} \sum_{j=1}^{J} w_{ij} \Big( x_{ij} - \sum_{f=1}^{F} t_{if} p_{jf} \Big)^2$$

where $w_{ij}$ is zero if $x_{ij}$ is missing and one otherwise

How can that loss function be optimized?

Method 1: use weighted least squares regression

Method 2: use imputation (expectation maximization)

1. Put numbers in missing elements

2. Fit model to these ‘wrong’ data (Ex: M = TP’ in PCA)

3. Replace missing elements with model guess (Ex: xij = Mij in PCA)

4. Go to step 2 until convergence

Both methods give the same result. Method 2 is easy to implement; Method 1 is sometimes faster but more memory-demanding.
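The four steps of Method 2 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `pca_impute` is hypothetical, the model is an uncentred truncated SVD, missing values are coded as NaN, and the test data are simulated noise-free rank-2 data.

```python
import numpy as np

def pca_impute(X, n_components, n_iter=1000, tol=1e-10):
    """Fit M = T P' to X (NaN marks missing elements) by iterative imputation."""
    missing = np.isnan(X)
    X_filled = X.copy()
    # Step 1: put numbers in the missing elements (column means of observed data)
    col_means = np.nanmean(X, axis=0)
    X_filled[missing] = np.take(col_means, np.where(missing)[1])
    M = X_filled
    for _ in range(n_iter):
        # Step 2: fit the model to the filled-in data
        U, s, Vt = np.linalg.svd(X_filled, full_matrices=False)
        M = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Step 3: replace only the missing elements with the model guess
        change = np.max(np.abs(X_filled[missing] - M[missing])) if missing.any() else 0.0
        X_filled[missing] = M[missing]
        # Step 4: repeat until the imputed values stop changing
        if change < tol:
            break
    return X_filled, M

rng = np.random.default_rng(5)
X_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))  # rank-2 data
mask = rng.random(X_true.shape) < 0.1                                # about 10 % missing
X_obs = X_true.copy()
X_obs[mask] = np.nan

X_filled, M = pca_impute(X_obs, n_components=2)

mean_fill = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)
err_init = np.abs(mean_fill[mask] - X_true[mask]).mean()
err_em = np.abs(X_filled[mask] - X_true[mask]).mean()
print(err_em < err_init)
```

Observed elements are never overwritten, so at convergence the model has effectively been fitted only to the data that exist, which is the weighted loss function above.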

Some concluding remarks

Tensor models provide
• Mathematical chromatography (real blind source separation)
• Huge noise reduction
• Intuitive models (chemically)
• Better handling of correlations
• Robustness
• …

But you need to know your data well, or be lucky

Much needed
• Better algorithms
• Better statistical diagnostics
• Better software

Papers, m-files, courses, database of references, data sets, spectral libraries etc.

www.models.life.ku.dk

Rasmus Bro
