6. Other issues. Quimiometria Teórica e Aplicada, Instituto de Química - UNICAMP.
1
6. Other issues
Quimiometria Teórica e Aplicada
Instituto de Química - UNICAMP
2
How many components to use?
• Use the ‘unfolding trick’, i.e. look at the rank of each mode.
  – does not have a strict statistical basis, but generally works well!
• Use the core-consistency diagnostic (PARAFAC).
  – also seems to work well in practice
• Split-half analysis.
• Does the algorithm converge without problems?
• Use full cross-validation.
  – the N-way Toolbox now has a routine for this
  – can be slow!
• Look at loadings and residuals.
• Use chemical knowledge.
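The ‘unfolding trick’ from the list above can be sketched in a few lines of NumPy. This is only an illustration on a synthetic, noise-free three-component array; the array sizes, the helper name `mode_singular_values` and the 1e-8 threshold are assumptions made for the example and do not come from the slides.

```python
import numpy as np

def mode_singular_values(X, mode):
    """Singular values of the matrix obtained by unfolding X along `mode`."""
    unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
    return np.linalg.svd(unfolded, compute_uv=False)

# Synthetic trilinear array with R = 3 components (assumed sizes).
rng = np.random.default_rng(0)
I, J, K, R = 8, 7, 6, 3
A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Count the singular values that are clearly above numerical zero in each mode.
estimated_ranks = []
for mode in range(3):
    s = mode_singular_values(X, mode)
    estimated_ranks.append(int(np.sum(s > 1e-8 * s[0])))

print(estimated_ranks)
```

With real, noisy data the singular values fall off gradually, so where to cut is a judgement call rather than a fixed threshold.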
3
Preprocessing: centering (1)
• We are often interested in the differences between objects, not in their absolute values.
  – building calibration models: differences between samples
• Mean-centering removes offsets from the data
  – removes constant background effects
  – can help to linearize data, i.e.
4
Preprocessing: centering (2)
• When performing a calibration, it is most common to remove the mean value from each column:
Two-way (X: object × variable), with $\bar{x}_j$ the mean of column j:
  $x^*_{ij} = x_{ij} - \bar{x}_j$
Three-way (X: object × primary variable × secondary variable), with $\bar{x}_{jk}$ the mean over the objects for column (j, k):
  $x^*_{ijk} = x_{ijk} - \bar{x}_{jk}$
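Column centering for two-way and three-way arrays can be written as follows in NumPy. This is a minimal sketch on random data; the array sizes and variable names are assumptions made for the example, and the object mode is taken to be the first axis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-way: objects x variables. Subtract the mean of each column j.
X2 = rng.normal(loc=5.0, size=(10, 4))
X2_centered = X2 - X2.mean(axis=0, keepdims=True)

# Three-way: objects x primary variable x secondary variable.
# Subtract the mean over the objects for every (j, k) column.
X3 = rng.normal(loc=5.0, size=(10, 4, 3))
X3_centered = X3 - X3.mean(axis=0, keepdims=True)

print(np.abs(X2_centered.mean(axis=0)).max())
print(np.abs(X3_centered.mean(axis=0)).max())
```

Both printed values should be zero up to rounding error, since every column now has zero mean across the objects.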
5
Preprocessing: scaling (1)
• Sometimes we want to analyse variables measured in different units
  – chemical engineering: temperatures, pressures, flow rates
  – QSAR: ionization constants, Hammett constants, dipole moments
• These variables should be scaled in order to give variables an equal chance to appear in the model.
6
Preprocessing: scaling (2)
• For two-way arrays (object × variable), it is common to divide by the standard deviation after mean-centering the data (‘autoscaling’):
Two-way (X: object × variable), with $s_j$ the standard deviation of column j:
  $x^*_{ij} = x_{ij} / s_j$
Three-way (X: object × primary variable × secondary variable), with a separate standard deviation $s_{jk}$ for each column (j, k):
  Autoscaling can destroy multilinear structure!
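The warning about three-way autoscaling can be checked numerically. The sketch below builds a synthetic two-component trilinear array, autoscales every (j, k) column separately, and compares the rank of the unfolded variable mode before and after. The sizes, the helper names and the 1e-8 rank threshold are assumptions made for the example.

```python
import numpy as np

def numerical_rank(M, rel_tol=1e-8):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def unfold(X, mode):
    """Unfold a three-way array along `mode` into a matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

# Synthetic trilinear array with two components (assumed sizes).
rng = np.random.default_rng(2)
I, J, K, R = 10, 6, 5, 2
A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Centering across the object mode keeps the two-component structure.
X_centered = X - X.mean(axis=0, keepdims=True)

# Autoscaling: a different standard deviation s_jk for every (j, k) column.
X_auto = X_centered / X_centered.std(axis=0, ddof=1, keepdims=True)

rank_before = numerical_rank(unfold(X_centered, 1))
rank_after = numerical_rank(unfold(X_auto, 1))
print(rank_before, rank_after)
```

The rank of the variable-mode unfolding rises above two after autoscaling, so a two-component trilinear model no longer describes the scaled data exactly.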
7
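Preprocessing: scaling (3)
Slab scaling maintains the multilinear structure! For an array X (object × process variable × time), every element of the slab $X_j$ belonging to process variable j is divided by the same scale factor $s_j$:
  $x^*_{ijk} = x_{ijk} / s_j$
Double slab scaling may also be useful - ITERATIVE. For an array X (object × process variable 1 × process variable 2), the slabs $X_j$ and $X_k$ are scaled in turn:
  $x^*_{ijk} = x_{ijk} / s_j$
  $x^{**}_{ijk} = x^*_{ijk} / s_k$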
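A possible NumPy sketch of slab scaling and of iterative double slab scaling is given below. The slides do not say how the scale factor of a slab is computed; the root-mean-square of the slab is used here as an assumption, and the sizes, helper names, iteration limit and tolerances are likewise illustrative.

```python
import numpy as np

def numerical_rank(M, rel_tol=1e-8):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def unfold(X, mode):
    """Unfold a three-way array along `mode` into a matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def slab_rms(X, mode):
    """Root-mean-square of every slab along `mode` (kept broadcastable)."""
    other_axes = tuple(ax for ax in range(X.ndim) if ax != mode)
    return np.sqrt(np.mean(X**2, axis=other_axes, keepdims=True))

def slab_scale(X, mode):
    """Divide every slab along `mode` by its own RMS value."""
    return X / slab_rms(X, mode)

# Synthetic trilinear array with two components (assumed sizes).
rng = np.random.default_rng(3)
I, J, K, R = 10, 6, 5, 2
A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Single slab scaling over the second mode: one factor per slab X_j.
X_slab = slab_scale(X, mode=1)
ranks_slab = [numerical_rank(unfold(X_slab, m)) for m in range(3)]

# Double slab scaling: alternate between the two modes until both are scaled.
X_double = X.copy()
for iteration in range(500):
    X_double = slab_scale(X_double, mode=1)
    X_double = slab_scale(X_double, mode=2)
    if np.allclose(slab_rms(X_double, 1), 1.0, rtol=0.0, atol=1e-10):
        break
ranks_double = [numerical_rank(unfold(X_double, m)) for m in range(3)]

print(ranks_slab, ranks_double)
```

Because each slab is multiplied by a single number, the scaling can be absorbed into the loadings of that mode, which is why the rank of every unfolding stays at two. The loop is needed for double slab scaling because rescaling one mode disturbs the scale of the other.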
8
Tucker models
• Tucker1: $X = AG + E$
  – Tucker1 = PCA
• Tucker2: $X = G(B \otimes A)^T + E$
  – G (I × R2 × R3)
  – very rarely used
• Tucker3: $X = AG(C \otimes B)^T + E$
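The unfolded Tucker3 expression can be checked against an element-wise construction of the array. The sketch below assumes that the Tucker3 product is a Kronecker product and that X (I × JK) and G (R1 × R2R3) are unfolded with the third-mode index running slowest; both conventions, along with all sizes and names, are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, K = 6, 5, 4          # array dimensions (assumed)
R1, R2, R3 = 3, 2, 2       # number of components per mode (assumed)

A = rng.normal(size=(I, R1))
B = rng.normal(size=(J, R2))
C = rng.normal(size=(K, R3))
G = rng.normal(size=(R1, R2, R3))   # core array

# Element-wise definition: x_ijk = sum_pqr a_ip * b_jq * c_kr * g_pqr
X = np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)

# Unfold so that column index = k * J + j (third-mode index slowest).
X_unfolded = X.transpose(0, 2, 1).reshape(I, K * J)
G_unfolded = G.transpose(0, 2, 1).reshape(R1, R3 * R2)

# Matrix form of the noise-free Tucker3 model: X = A G (C kron B)^T
X_model = A @ G_unfolded @ np.kron(C, B).T

print(np.allclose(X_unfolded, X_model))
```

The residual term E is omitted because the data are generated without noise.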
9
PARAFAC2
[Figure: three-way array with modes object (I), wavelength (J) and time (K), showing a time shift between objects.]
In PARAFAC2, only the matrix product $X_i X_i^T$ (J × J) is modelled. It works if the correlation structures in the objects are the same.
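One way to see why modelling only the cross-product helps with time shifts is the idealized case below, where the slab of one object is shifted circularly along the time axis. A circular shift is a simplifying assumption standing in for the shift shown on the slide, and the sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
J, K = 5, 40                      # wavelengths x time points (assumed sizes)

X_i = rng.normal(size=(J, K))     # slab of one object: wavelength x time
X_shifted = np.roll(X_i, shift=3, axis=1)   # same profiles, shifted in time

cross_original = X_i @ X_i.T      # J x J cross-product matrix
cross_shifted = X_shifted @ X_shifted.T

print(np.allclose(X_i, X_shifted))
print(np.allclose(cross_original, cross_shifted))
```

The slabs themselves differ, but their J × J cross-products agree, because the shift only reorders the time points that are summed over.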
10
Missing data
• Expectation-maximization (EM) is a technique for estimating models (PARAFAC, Tucker, PLS, PCA etc.) when some of the data is missing:
  X = [X* X#], with X* known and X# missing
• 0. Initialize X#
• 1. Estimate model (maximization): $X_n = \hat{X}_n + E_n$
• 2. Replace missing values with model values (expectation): $X^{\#}_n = \hat{X}^{\#}_n$
• 3. Repeat until convergence
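The steps above can be sketched for a simple bilinear model. The example uses a truncated SVD without centering as a stand-in for PCA, initializes the missing entries with the column means of the known values, and runs on a synthetic noise-free rank-2 matrix. These choices, together with the function names, sizes, missing fraction and tolerances, are assumptions made for the example, not the N-way Toolbox implementation.

```python
import numpy as np

def low_rank_fit(X, n_components):
    """Model values from a truncated SVD with `n_components` components."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]

def em_impute(X, missing, n_components, max_iter=1000, tol=1e-10):
    """Fill the entries flagged in `missing` by iterating model fit and replacement."""
    X_filled = X.copy()
    # 0. Initialize the missing entries with the column means of the known data.
    col_means = np.nanmean(np.where(missing, np.nan, X), axis=0)
    X_filled[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(max_iter):
        # 1. Estimate the model from the completed data.
        X_hat = low_rank_fit(X_filled, n_components)
        # 2. Replace the missing entries with the model values.
        change = np.max(np.abs(X_filled[missing] - X_hat[missing]))
        X_filled[missing] = X_hat[missing]
        # 3. Repeat until the imputed values stop changing.
        if change < tol:
            break
    return X_filled

# Synthetic rank-2 data with about 5 % of the entries removed (assumed setup).
rng = np.random.default_rng(6)
X_true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 12))
missing = rng.random(X_true.shape) < 0.05
X_observed = X_true.copy()
X_observed[missing] = np.nan

X_imputed = em_impute(X_observed, missing, n_components=2)
max_error = np.max(np.abs(X_imputed - X_true))
print(max_error)
```

The known entries are never modified; only the flagged positions are updated. For PARAFAC or Tucker models the same loop applies with `low_rank_fit` replaced by the corresponding model fit.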
11
Thank you very much for your attention!