Definition and overview of chemometrics
Paul Geladi
Head of Research NIRCE, Chairperson NIR Nord
Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå
Technobothnia, Vasa
paul.geladi @ btk.slu.se paul.geladi @ syh.fi
Project geography
Chemometrics
Mathematics
Statistics
Computer Science
In Chemistry
Similar fields
• Biometrics ±1900
• Psychometrics ±1930
• Econometrics ±1950
• Technometrics ±1960
Chemometrics
• Design of Experiments (DOE)
• Exploratory Data Analysis
• Classification
• Regression and Calibration
Design of Experiments
• Most important where possible
• Uses:
• ANOVA
• F-test
• t-test
• Plots
• Response Surfaces
Design of Experiments
y = b_0 + b_1 x_1 + b_2 x_2 + ... + b_K x_K + b_{11} x_1^2 + b_{22} x_2^2 + ... + b_{KK} x_K^2 + b_{12} x_1 x_2 + ...
Factors x1, x2,...xK changed systematically
Response y measured and modeled
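As an illustration of changing factors systematically, the sketch below builds a full factorial design and evaluates the quadratic model above for each run. The function names, the three-level coding (-1, 0, +1) and the coefficient values are assumptions made for this example, not part of the original slides.

```python
from itertools import product


def full_factorial(k, levels=(-1.0, 1.0)):
    """Return every combination of the given levels for k factors."""
    return [list(run) for run in product(levels, repeat=k)]


def quadratic_terms(x):
    """Expand one run x = [x1, ..., xK] into the model terms:
    constant, linear terms, squared terms, then cross products x_i * x_j (i < j)."""
    k = len(x)
    terms = [1.0]
    terms += list(x)
    terms += [v * v for v in x]
    terms += [x[i] * x[j] for i in range(k) for j in range(i + 1, k)]
    return terms


def predict(coefficients, x):
    """Evaluate y for one run, with coefficients in the same order as the terms."""
    return sum(b * t for b, t in zip(coefficients, quadratic_terms(x)))


# Two factors at three coded levels: 3^2 = 9 runs.
design = full_factorial(2, levels=(-1.0, 0.0, 1.0))

# Hypothetical coefficients b0, b1, b2, b11, b22, b12 (made up for illustration).
b = [10.0, 2.0, -1.0, 0.5, 0.25, 1.5]
responses = [predict(b, run) for run in design]
```

In a real experiment the responses would be measured and the coefficients estimated from them; here the direction is reversed only to show how the model terms are formed.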
Exploratory Data Analysis
• Design not possible
• Sampling situations
• Find structure
• Find groupings
• Find outliers
Classification
• Check for groupings = UNSUPERVISED
• Existing groupings = SUPERVISED
• Visualize groupings
• Classify
• Test
Regression / Calibration
• Two types of variables X / y
• Relationship linear / nonlinear
• Model
• Diagnostics
• Residual
[Figure: plot of y against x]
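The simplest case of the model, diagnostics and residual steps is a straight line fitted by least squares. The sketch below assumes one x variable and one y variable; the function names and the data values are invented for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates of intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def residuals(x, y, b0, b1):
    """Difference between each measured y and the value predicted by the model."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Made-up calibration data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
res = residuals(x, y, b0, b1)
```

Plotting the residuals against x is one basic diagnostic: a visible trend suggests that the relationship is nonlinear.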
Multivariate Data Analysis
• Sampled data and designs with too many responses:
• Mining
• Hospitals
• Agriculture
• Food industry
• More
Nomenclature
• Samples are objects
• What is measured on the object is a variable
[Figure: variables measured on samples, e.g. a single value (34.92) or a spectrum]
Vectors
[Figure: a column vector with the elements 12, 3.6, 11.1, 5.9, 34, 0.5, 1.4, 17; index labels 1 ... K and 1 ... I]
A vector is a collection of numbers.
It is always a column vector.
The transpose of a vector is a row vector.
Symbols for the transpose are ' and T: a' or a^T.
Example row vector: [ 12 3.6 11.1 5.9 34 0.5 1.4 17 ]
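The column and row conventions can be made concrete with nested lists. This is a minimal sketch in plain Python; the helper name transpose is an assumption of the example.

```python
def transpose(matrix):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*matrix)]


# Column vector a: 8 rows, 1 column (values taken from the slide).
a = [[12], [3.6], [11.1], [5.9], [34], [0.5], [1.4], [17]]

# Its transpose is a row vector: 1 row, 8 columns.
a_t = transpose(a)
```

Transposing twice returns the original column vector.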
[Figure: Particle size, 1 sample]
[Figure: Small particles, 35 samples]
The Data Matrix
A data matrix is a vector of vectors
Size: I rows (objects) by K columns (variables)
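A data matrix stored as a vector of vectors might look like the sketch below, with one inner list per object. The numbers and the helper name column are placeholders chosen for this example.

```python
# Each inner list is one object (sample); each position is one variable.
X = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]

I = len(X)      # number of objects (rows)
K = len(X[0])   # number of variables (columns)


def column(matrix, k):
    """Return variable k (zero-based) for all objects."""
    return [row[k] for row in matrix]


second_variable = column(X, 1)
```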
[Figure: Size histograms, all samples]
[Figure labels: Particle area, NIR wavelengths, Times in batch reaction]
Geometry of multivariate space
Problem
I and K can be large
Correlation
Univariate statistics does not apply
I patients
3 variables: blood oxygen, iron, hemoglobin
[Figure: three-dimensional plots with the axes O2, Fe and Hb]
Properties of multivariate space
Rotation
vectors unchanged / distance unchanged
Translation
vectors changed / distance unchanged
Rescaling / change units
all changes
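The effect of the three operations on the distance between two objects can be checked numerically. The sketch below works in two dimensions for brevity; the points, the rotation angle and the scale factors are arbitrary choices made for this example.

```python
import math


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def rotate_2d(p, angle):
    """Rotate a 2D point about the origin by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * p[0] - s * p[1], s * p[0] + c * p[1]]


def translate(p, shift):
    """Add the same shift vector to a point."""
    return [pi + si for pi, si in zip(p, shift)]


def rescale(p, factors):
    """Multiply each coordinate by its own factor (a change of units)."""
    return [pi * fi for pi, fi in zip(p, factors)]


p, q = [1.0, 2.0], [4.0, 6.0]
d0 = distance(p, q)  # 3-4-5 triangle, so the distance is 5

d_rot = distance(rotate_2d(p, 0.7), rotate_2d(q, 0.7))
d_trans = distance(translate(p, [10.0, -3.0]), translate(q, [10.0, -3.0]))
# Changing the unit of the first variable by a factor of 1000 changes the distance.
d_scaled = distance(rescale(p, [1000.0, 1.0]), rescale(q, [1000.0, 1.0]))
```

Here d_rot and d_trans equal d0 up to rounding, while d_scaled is dominated by the rescaled first variable.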
Consequences
• We can move the coordinate system around
• The relative distances between objects do not change
• We can rotate the coordinate system
• Scale changes are important
• Move coordinate system to center of data
• Scale properly
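One common way to carry out the last two points is to subtract each variable's mean and divide by its standard deviation, often called autoscaling. The slides do not say which scaling is meant, so this choice is an assumption, and the O2, Fe and Hb values below are invented.

```python
import math


def column_means(X):
    """Mean of each variable (column) over all objects."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def column_stds(X):
    """Sample standard deviation (n - 1 denominator) of each variable."""
    n = len(X)
    means = column_means(X)
    return [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(zip(*X), means)
    ]


def autoscale(X):
    """Move the origin to the center of the data and give each variable unit variance."""
    means = column_means(X)
    stds = column_stds(X)
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Hypothetical patients (rows) with the variables O2, Fe, Hb in different units.
X = [
    [95.0, 60.0, 13.5],
    [97.0, 80.0, 14.2],
    [92.0, 55.0, 12.1],
    [98.0, 100.0, 15.0],
]
Z = autoscale(X)
```

After this step every variable has mean 0 and standard deviation 1, so no variable dominates the distances merely because of its units.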
Vectors (physics)
x = [ x_1, x_2, x_3 ]
|| x || = ( x_1^2 + x_2^2 + x_3^2 )^{1/2}
Geometry
[Figure: right triangle with sides a, b and hypotenuse c]
c^2 = a^2 + b^2
Vectors (K dimensions)
x = [ x_1, x_2, ..., x_K ]
|| x || = ( x_1^2 + x_2^2 + ... + x_K^2 )^{1/2}
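The same formula works for any number of dimensions, as this short sketch shows. The function name norm is an assumption of the example.

```python
import math


def norm(x):
    """Euclidean length of a vector with any number K of elements."""
    return math.sqrt(sum(v * v for v in x))


length_2d = norm([3.0, 4.0])                      # right triangle a = 3, b = 4
length_3d = norm([1.0, 2.0, 2.0])                 # three dimensions
length_8d = norm([12, 3.6, 11.1, 5.9, 34, 0.5, 1.4, 17])  # K = 8
```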
Problem
We cannot see in more than 3 dimensions
Paper, computer screen: 2-2.5 dimensions
[Figure: three-dimensional plots with the axes O2, Fe and Hb]
Projection
2D plane (screen, paper)
Many projections possible
Find a good one
Find a few good ones
What is good?
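A projection onto a 2D plane can be sketched as two dot products per object, assuming the plane is described by two orthonormal direction vectors. The function names and the example point are assumptions; how to choose a good plane is the question the slide leaves open, and this sketch does not answer it.

```python
import math


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))


def project(point, axis1, axis2):
    """Coordinates of a point in the plane spanned by two orthonormal axes."""
    return [dot(point, axis1), dot(point, axis2)]


point = [1.0, 1.0, 5.0]  # hypothetical object in (O2, Fe, Hb) space

# Projection 1: simply drop the third variable.
flat = project(point, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])

# Projection 2: a plane tilted 45 degrees between the first two variables.
r = 1.0 / math.sqrt(2.0)
tilted = project(point, [r, r, 0.0], [0.0, 0.0, 1.0])
```

The two results place the same object at different positions on the screen, which is why many projections are possible and a criterion for a good one is needed.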