Applications of Tensor Methods in Life Sciences
Rasmus Bro
University of Copenhagen
Faculty of Life Sciences
PARAFAC
• A very nice model
Some examples
• How to store a cheese?
• A model of wine
Some issues
• Variable selection
• Nonnegativity
• Dealing with missing data
PARallel FACtor analysis
• PCA - bilinear model:
$$X = a_1 b_1^T + a_2 b_2^T + E = A B^T + E$$
$$x_{ij} = \sum_{f=1}^{F} a_{if} b_{jf} + e_{ij}$$
• PARAFAC - trilinear model:
$$X = a_1 \circ b_1 \circ c_1 + a_2 \circ b_2 \circ c_2 + E$$
$$x_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf} + e_{ijk}$$
PARAFAC was invented in 1970 by Harshman and, independently, by Carroll & Chang under the name CANDECOMP. It is based on the principle of parallel proportional profiles suggested in 1944 by Cattell.
• R. A. Harshman. UCLA working papers in phonetics 16:1-84, 1970.
• J. D. Carroll and J. Chang. Psychometrika 35:283-319, 1970.
• R. B. Cattell. Psychometrika 9:267-283, 1944.
PARallel FACtor analysis
PCA: $X = A B^T$
PARAFAC: $X_k = A D_k B^T$, $D_k = \mathrm{diag}(c(k,:))$
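As a minimal sketch of the two equivalent ways of writing the PARAFAC model, the NumPy snippet below builds a small noise-free three-way array from loadings A, B and C and checks that the elementwise sum agrees with the slab-wise form. The sizes and the random loadings are illustrative assumptions, not data from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, F = 4, 5, 3, 2            # illustrative sizes (assumption)
A = rng.random((I, F))
B = rng.random((J, F))
C = rng.random((K, F))

# Elementwise trilinear model: x_ijk = sum_f a_if * b_jf * c_kf
X = np.einsum("if,jf,kf->ijk", A, B, C)

# Slab-wise form: X_k = A D_k B^T with D_k = diag(c(k,:))
slabs = np.stack([A @ np.diag(C[k]) @ B.T for k in range(K)], axis=2)

slabs_match = np.allclose(X, slabs)
print("slab form equals elementwise form:", slabs_match)
```

Each frontal slab uses the same A and B; only the diagonal scaling D_k changes from slab to slab.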
Fluorescence spectroscopy
[Figure: instrument schematic: lamp (UV-vis), excitation monochromator, sample, emission monochromator, detector; intensity recorded as a function of excitation and emission]
Excitation-emission matrix: a chemical fingerprint
• Amino acids
• Chlorophyll
• Porphyrin
• ATP
• NADH
• Vitamins (A, B2, B6, E)
Fluorescence excitation-emission
Very high sensitivity and selectivity towards important compounds
J. Christensen - Foodfluor database at
www.models.life.ku.dk/research/foodfluor
PARAFAC - uniqueness
• Uniqueness - conditions
A PARAFAC model is unique when
$$k_A + k_B + k_C \geq 2F + 2$$
F is the number of components and $k_A$ is the k-rank of loading matrix A: the largest number k such that any k columns of A are linearly independent (at most F)
J. B. Kruskal. Linear Algebra and its Applications 18:95-138, 1977.
N. D. Sidiropoulos and R. Bro. Journal of Chemometrics 14 (3):229-239, 2000.
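The condition above can be checked numerically for small loading matrices. The sketch below is a brute-force illustration: the helper names `k_rank` and `kruskal_condition` are hypothetical, the tolerance is an assumption, and the exhaustive search over column subsets is only practical for a handful of components.

```python
import numpy as np
from itertools import combinations

def k_rank(M, tol=1e-10):
    """Largest k such that every set of k columns of M is linearly independent."""
    n_cols = M.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        subsets = combinations(range(n_cols), size)
        if all(np.linalg.matrix_rank(M[:, list(cols)], tol=tol) == size
               for cols in subsets):
            k = size
        else:
            break
    return k

def kruskal_condition(A, B, C):
    """True when k_A + k_B + k_C >= 2F + 2 holds."""
    F = A.shape[1]
    return k_rank(A) + k_rank(B) + k_rank(C) >= 2 * F + 2

rng = np.random.default_rng(0)
A, B, C = rng.random((4, 3)), rng.random((5, 3)), rng.random((3, 3))
kruskal_ok = kruskal_condition(A, B, C)          # generic loadings, F = 3

C_bad = C.copy()
C_bad[:, 1] = C_bad[:, 0]                        # two identical columns: k-rank drops to 1
kruskal_fail = kruskal_condition(A, B, C_bad)

print(kruskal_ok, kruskal_fail)
```

The second case shows why the k-rank, rather than the ordinary rank, is used: a single pair of proportional columns is enough to lower it to 1.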
• No rotational freedom
Unlike the bilinear ‘PCA’ model, there is only one solution
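A small numerical illustration of this difference, under assumed random loadings and an arbitrary invertible matrix T: mixing the bilinear loadings with T leaves the bilinear fit unchanged, whereas applying the same mixing to A and B in the trilinear model changes the reconstructed array. This is a sketch of the idea, not a proof of uniqueness.

```python
import numpy as np

rng = np.random.default_rng(1)
F = 2
A, B, C = rng.random((4, F)), rng.random((5, F)), rng.random((3, F))

T = np.array([[1.0, 0.5],
              [0.2, 1.0]])                  # arbitrary invertible mixing matrix (assumption)
A_rot = A @ T
B_rot = B @ np.linalg.inv(T).T

# Bilinear model: A B^T = (A T)(B T^{-T})^T, so the fit cannot tell the two apart
bilinear_same = np.allclose(A @ B.T, A_rot @ B_rot.T)

# Trilinear model: the same mixing changes the reconstructed three-way array
X = np.einsum("if,jf,kf->ijk", A, B, C)
X_rot = np.einsum("if,jf,kf->ijk", A_rot, B_rot, C)
trilinear_same = np.allclose(X, X_rot)

print("bilinear unchanged:", bilinear_same)
print("trilinear unchanged:", trilinear_same)
```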
PARAFAC is mathematical chromatography
aka
• Blind source separation
• Solving the cocktail party effect
• Unmixing
• Curve resolution
• ...
Mathematical chromatography eliminates major problems in multivariate analysis:
• It removes indirect correlations stemming from rotational freedom
• It also eliminates outliers
• It determines underlying sources
• It is simpler because it provides a chemical model
• It is far less sensitive to noise
How to store a cheese?
Oxidation
• Oxidation from light causes rancid taste of cheese, butter etc.
• Important for packaging of food and shelf-storage
• Believed to be caused by riboflavin acting as photosensitizer
• Riboflavin does not absorb much red light, hence red material should protect
Experiment
• Different light
• With / without oxygen
• Different storage time
Samples measured by
• Sensory analysis (quality)
• Fluorescence EEM
Spectra from PARAFAC of EEMs
[Figure: PARAFAC excitation and emission loadings; wavelength axes (nm) span 350-455 and 600-705. Components labelled Riboflavin, HematoP, ProtoP, Cloro A, Cloro B and Chlorin X.]
From JPWo/Matforsk
Relation between sensory data and PARAFAC estimated concentrations
[Figure: plot labelled "Rancid taste"]
From JPWo/Matforsk
Importance of different compounds
[Figure: panels for Protoporphyrin, Chlorophyll B and X (chlorin?)]
From JPWo/Matforsk
New result
• Apart from riboflavin, there are at least five other light-sensitizers
• The ‘new’ ones seem to be more important than riboflavin
• Fluorescence and PARAFAC provide a ‘simple’ approach for exploring these
[Figure: absorbance (Abs.) versus wavelength, 300-700 nm, for chlorophyll a and riboflavin, with the UV region marked]
From JPWo/Matforsk
A wine model
PARAFAC cannot handle shifts and shape changes
PARAFAC(1): $X_k = A D_k B^T$
R. A. Harshman. UCLA working papers in phonetics 22:30-47, 1972.
H. A. L. Kiers, J. M. F. ten Berge, R. Bro. J. Chemom. 13:275-294, 1999.
R. Bro, C. A. Andersson, H. A. L. Kiers. J. Chemom. 13:295-309, 1999.
PARAFAC2
PARAFAC(1): $X_k = A D_k B^T$
PARAFAC2: $X_k = A D_k B_k^T$ subject to $B_k^T B_k$ constant over $k$
*Actually, PARAFAC2 is more general than shifts, but shifts are a feasible approximation to what PARAFAC2 can handle
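One way to see what the PARAFAC2 constraint allows is to generate slab-specific profiles of the form B_k = P_k H, where each P_k has orthonormal columns and H is shared across slabs. This parametrization and all sizes below are assumptions made for illustration; the sketch only constructs data obeying the model and does not fit it.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K, F = 6, 8, 3, 2                     # illustrative sizes (assumption)
A = rng.random((I, F))
C = rng.random((K, F))
H = rng.random((F, F))                      # shared F x F matrix

B_list, X_list = [], []
for k in range(K):
    # P_k: J x F matrix with orthonormal columns, different for every slab
    P_k, _ = np.linalg.qr(rng.standard_normal((J, F)))
    B_k = P_k @ H                           # slab-specific profiles
    B_list.append(B_k)
    X_list.append(A @ np.diag(C[k]) @ B_k.T)

# The profiles differ between slabs, yet B_k^T B_k = H^T H for every k
cross_product_constant = all(np.allclose(B_k.T @ B_k, H.T @ H) for B_k in B_list)
profiles_differ = not np.allclose(B_list[0], B_list[1])

print(cross_product_constant, profiles_differ)
```

The profiles in B_k may change from slab to slab as long as their cross-product stays the same, which is the extra freedom PARAFAC(1) lacks.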
60 wine samples measured by GC-MS
[Figure: GC-MS data, elution versus samples, with annotated regions: weird shifts; overlap and shifts; low intensity and baseline; fronting]
PARAFAC2 results
Other applications of tensor methods
Scientific field
• Environmental monitoring
• Sensory analysis
• Process monitoring
• Fermentation
• Cell phone audio quality
• Wireless communication
• Metabolomics
• Proteomics
• Cancer diagnostics
• Anthropometry
• ...
[Figure: plot of 20 numbered population groups: 1, N. America; 2, L. America (Indian); 3, L. America; 4, N. Europe; 5, C. Europe; 6, E. Europe; 7, S.E. Europe; 8, France; 9, Iberian Peninsula; 10, N. Africa; 11, W. Africa; 12, S.E. Africa; 13, Near East; 14, N. India; 15, S. India; 16, N. Asia; 17, S. China; 18, S.E. Asia; 19, Australia; 20, Korea/Japan]
Variable selection
VIS/NIR spectra of 61 beers
[Figure: absorbance versus wavelength, 400-2300 nm, with three annotated regions]
• Just crap! Random noise
• The good part
• Not relevant but highly systematic
[Figure: four panels: raw data (400-2200 nm); actual modelled data, centered; regression vector PLS; predictions test set PLS]
‘Classical’ regression, in this case partial least squares (PLS). Good!
And it can be optimized by chemical interpretation
But this is not always the case
[Figure: regression vector and test-set predictions for LASSO, compared with PLS]
Lasso. Weird stuff! Important area represented by two variables.
Little support:
• Fragile
• Non-robust
• Poor outlier ability
• Interpretation low
[Figure: regression vector and test-set predictions for LASSO, PLS and SR (selectivity ratio)]
‘Chemometric’ variable selection: very nice!
Variable selection by selectivity ratios, but other methods would do the job as well
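The slides do not spell out how the selectivity ratio is computed. The sketch below assumes a commonly used target-projection definition: project the centered data onto the normalized PLS regression vector, then take, per variable, the ratio of explained to residual variance. The NIPALS-style `pls1_coefficients` helper, the simulated data and the choice of two components are all illustrative assumptions.

```python
import numpy as np

def pls1_coefficients(Xc, yc, n_components):
    """Regression vector of a PLS1 model (NIPALS-style) for centered Xc and yc."""
    Xd, yd = Xc.copy(), yc.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xd.T @ yd
        w = w / np.linalg.norm(w)           # weight vector
        t = Xd @ w                          # scores
        tt = t @ t
        p = Xd.T @ t / tt                   # X loadings
        q_a = (yd @ t) / tt                 # y loading
        Xd = Xd - np.outer(t, p)            # deflate X
        yd = yd - q_a * t                   # deflate y
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def selectivity_ratio(Xc, b):
    """Explained / residual variance per variable after target projection (assumed definition)."""
    w_tp = b / np.linalg.norm(b)
    t_tp = Xc @ w_tp
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)
    X_tp = np.outer(t_tp, p_tp)
    explained = (X_tp ** 2).sum(axis=0)
    residual = ((Xc - X_tp) ** 2).sum(axis=0)
    return explained / residual

# Simulated data: 5 informative variables among 50, the rest pure noise
rng = np.random.default_rng(3)
n, p = 60, 50
latent = rng.standard_normal(n)
X = rng.standard_normal((n, p))
X[:, 10:15] = latent[:, None] + 0.1 * rng.standard_normal((n, 5))
y = latent + 0.1 * rng.standard_normal(n)

Xc, yc = X - X.mean(axis=0), y - y.mean()
b = pls1_coefficients(Xc, yc, n_components=2)
sr = selectivity_ratio(Xc, b)
selected = np.sort(np.argsort(sr)[-5:])     # five variables with the highest ratio
print(selected)
```

In practice the ratio would be computed on real spectra and a cut-off chosen by validation; the fixed top-five rule here is only for the simulated example.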
Nonnegativity
Classical papers related to NMF
Lawton & Sylvestre. Self modeling curve resolution. Technometrics 13:617-633, 1971.
Lawson & Hanson. Solving least squares problems, Englewood Cliffs: Prentice-Hall, Inc, 1974.
In 1999 Lee and Seung wrote a Nature paper on NMF (non-negative matrix factorization).
However, NMF has existed for much more than 30 years under the name multivariate curve resolution.
Some facts worth noting
• NMF is not generally unique: rotational freedom
• Conditions for uniqueness exist
• When unique, the solution is often shaky
• NMF is (very) sensitive to starting values
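To experiment with the dependence on starting values, one option is the multiplicative-update scheme from the Lee and Seung line of work. The sketch below is a bare-bones version run from two different random starts on synthetic nonnegative data. The function name, iteration count and data are assumptions, and the two runs may or may not end up with visibly different factors on a given data set.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Least-squares NMF, V ~ W H with W, H >= 0, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(4)
V = rng.random((20, 3)) @ rng.random((3, 15))    # synthetic nonnegative data (assumption)

W1, H1 = nmf_multiplicative(V, rank=3, seed=1)   # start 1
W2, H2 = nmf_multiplicative(V, rank=3, seed=2)   # start 2

err1 = np.linalg.norm(V - W1 @ H1)
err2 = np.linalg.norm(V - W2 @ H2)
print("fit errors from two starts:", err1, err2)
```

Comparing W1 with W2 after matching column order and scaling gives a quick impression of how much the solution moves between starts.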
Dealing with missing data
No missing
Ex.: standard PCA loss function
$$\|X - TP'\|^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \Big( x_{ij} - \sum_{f=1}^{F} t_{if} p_{jf} \Big)^2$$
I.e., a summation of errors over all elements of X
If missing
Only fit the model to the data that exist
I.e., fit to the loss function
$$\sum_{i=1}^{I} \sum_{j=1}^{J} w_{ij} \Big( x_{ij} - \sum_{f=1}^{F} t_{if} p_{jf} \Big)^2$$
where $w_{ij}$ is zero if $x_{ij}$ is missing and one otherwise
How can that loss function be optimized?
Method 1: use weighted least squares regression
Method 2: use imputation (expectation maximization)
1. Put numbers in missing elements
2. Fit model to these ‘wrong’ data (Ex: M = TP’ in PCA)
3. Replace missing elements with model guess (Ex: xij = Mij in PCA)
4. Go to step 2 until convergence
Both methods give the same result. Method 2 is easy to implement; Method 1 is sometimes faster but more memory-demanding.
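The four steps of Method 2 can be sketched in a few lines of NumPy for an uncentered PCA model fitted by truncated SVD. The function name `pca_impute`, the use of NaN to mark missing elements, the column-mean starting values and the stopping rule are assumptions made for this illustration.

```python
import numpy as np

def pca_impute(X, n_components, n_iter=1000, tol=1e-10):
    """EM-style imputation for an uncentered PCA model M = T P'. NaN marks missing."""
    missing = np.isnan(X)
    # Step 1: put numbers in missing elements (column means here)
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        # Step 2: fit the model to the filled-in data (truncated SVD)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        M = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        # Step 3: replace missing elements with the model guess
        updated = np.where(missing, M, X)
        converged = np.linalg.norm(updated - filled) <= tol * np.linalg.norm(updated)
        filled = updated
        # Step 4: repeat until the imputed values stop changing
        if converged:
            break
    return filled

# Synthetic rank-2 data with about 10 % of the elements removed
rng = np.random.default_rng(5)
X_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
mask = rng.random(X_true.shape) < 0.10
X_obs = np.where(mask, np.nan, X_true)

X_em = pca_impute(X_obs, n_components=2)
X_mean = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)

rmse_mean = np.sqrt(np.mean((X_mean[mask] - X_true[mask]) ** 2))
rmse_em = np.sqrt(np.mean((X_em[mask] - X_true[mask]) ** 2))
print("RMSE of mean fill:", rmse_mean, "RMSE after imputation:", rmse_em)
```

Observed elements are never overwritten, so at convergence the model is effectively fitted only to the data that exist, as in the weighted loss function.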
Some concluding remarks
Tensor models provide
Mathematical chromatography (real blind source separation)
• Huge noise reduction
• Intuitive models (chemically)
• Better handling of correlations
• Robustness
• …
But you need to know your data well, or be lucky
Much needed
• Better algorithms
• Better statistical diagnostics
• Better software
Papers, m-files, courses, database of references, data sets, spectral libraries etc.
www.models.life.ku.dk
Rasmus Bro