The use of PARAFAC in the analysis of CDOM fluorescence
Kate Murphy (1, 2)
1. Smithsonian Environmental Research Center, Edgewater, USA
2. The University of New South Wales, Dept. of Civil and Environmental Engineering, Sydney, Australia
Challenges in CDOM fluorescence research
Can different sources of DOM be reliably distinguished on the basis of fluorescence?
Which chemical constituents contribute to CDOM fluorescence?
How do environmental variables (e.g. pH, temperature) and processes (e.g. photodegradation) affect fluorescence spectra?
EEMs (Excitation-Emission Matrices)
[Figure: example EEM, fluorescence intensity as a function of excitation and emission wavelength (nm).]
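As a minimal sketch (Python/NumPy, not part of the original talk), an EEM can be held as a 2-D array of intensities indexed by emission and excitation wavelength, and a set of EEMs measured on the same wavelength grid can be stacked into the 3-way array that PARAFAC decomposes. The wavelength grids, the random intensities and the helper `stack_eems` are hypothetical; the sample x emission x excitation ordering is an assumed convention.

```python
import numpy as np


def stack_eems(eems):
    """Stack 2-D EEMs (emission x excitation) into one sample x emission x excitation array."""
    shapes = {e.shape for e in eems}
    if len(shapes) != 1:
        # PARAFAC needs every sample on the same excitation/emission grid
        raise ValueError(f"all EEMs must share one wavelength grid, got shapes {shapes}")
    return np.stack(eems, axis=0)


em = np.arange(300, 601, 2)   # hypothetical emission grid (nm), 151 points
ex = np.arange(240, 451, 5)   # hypothetical excitation grid (nm), 43 points
eems = [np.random.default_rng(i).random((em.size, ex.size)) for i in range(5)]

X = stack_eems(eems)
print(X.shape)  # (5, 151, 43)
```

Each slice `X[i]` is then one sample's EEM; EEMs measured on different grids would first have to be interpolated onto a common grid.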
What is PARAFAC?
A chemometric decomposition method that uses alternating least squares (ALS) algorithms to estimate the underlying structure of a multiway dataset
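As a minimal sketch of the ALS idea (Python/NumPy, not the MATLAB toolboxes named later in the talk), the trilinear model x_ijk = Σ_f a_if b_jf c_kf can be fitted by repeatedly solving for one loading matrix while the other two are held fixed; each of those steps is an ordinary least-squares problem. The helpers `parafac_als` and `khatri_rao` are hypothetical names, the data are synthetic, and no constraints (e.g. non-negativity) are applied, so this is an illustration rather than a substitute for a validated toolbox.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: row u * len(V) + v holds U[u, f] * V[v, f]."""
    return np.einsum("uf,vf->uvf", U, V).reshape(-1, U.shape[1])


def parafac_als(X, F, n_iter=1000, tol=1e-12, seed=0):
    """Unconstrained PARAFAC fitted by alternating least squares.

    X is a 3-way array (samples x emission x excitation). Returns loadings
    A (I x F), B (J x F) and C (K x F) with X ~ sum_f a_f o b_f o c_f.
    """
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B, C = rng.random((J, F)), rng.random((K, F))   # random starting values
    X1 = X.reshape(I, J * K)                        # mode-1 unfolding, column j*K + k
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)     # mode-2 unfolding, column i*K + k
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)     # mode-3 unfolding, column i*J + j
    prev_sse = np.inf
    for _ in range(n_iter):
        # With two loading matrices fixed, the third is a least-squares solution
        A = np.linalg.lstsq(khatri_rao(B, C), X1.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
        sse = float(np.sum((X - np.einsum("if,jf,kf->ijk", A, B, C)) ** 2))
        if prev_sse - sse < tol * prev_sse:          # stop once the fit stops improving
            break
        prev_sse = sse
    return A, B, C


def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


# Synthetic two-component data with known (hypothetical) spectra
em = np.linspace(300, 600, 31)
ex = np.linspace(240, 440, 21)
B_true = np.column_stack([gaussian(em, 450, 40), gaussian(em, 350, 25)])
C_true = np.column_stack([gaussian(ex, 340, 30), gaussian(ex, 280, 15)])
A_true = np.random.default_rng(1).random((12, 2)) + 0.1
X = np.einsum("if,jf,kf->ijk", A_true, B_true, C_true)

A, B, C = parafac_als(X, F=2)
rel_err = np.linalg.norm(X - np.einsum("if,jf,kf->ijk", A, B, C)) / np.linalg.norm(X)
print(rel_err)
```

The recovered loadings match the true ones only up to component order and scaling, which is the usual PARAFAC indeterminacy.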
How does it work?
Multiway Data Structure
[Figure: a stack of EEMs (excitation 240 to 440 nm, emission 300 to 600 nm) is decomposed by the PARAFAC model into Component 1, Component 2, ..., Component N. Each component's EEM is proportional (∝) to the product of its loadings: a (concentration), b (emission spectrum) and c (excitation spectrum).]
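$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk}$$
where $x_{ijk}$ is the fluorescence of sample $i$ at emission wavelength $j$ and excitation wavelength $k$, $a_{if}$ is the concentration (score) of component $f$ in sample $i$, $b_{jf}$ and $c_{kf}$ are its emission and excitation spectra (loadings), $F$ is the number of components and $e_{ijk}$ is the residual.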
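A short NumPy sketch of the model equation (the loadings are random, hypothetical values): `np.einsum` evaluates the sum over components directly, and the explicit loop shows the same thing as F terms, each a concentration times the outer product of an emission and an excitation spectrum. With measured data, the residuals $e_{ijk}$ are whatever the model leaves over.

```python
import numpy as np


def parafac_model(A, B, C):
    """Structural part of the model: x_ijk = sum_f a_if * b_jf * c_kf."""
    return np.einsum("if,jf,kf->ijk", A, B, C)


rng = np.random.default_rng(42)
I, J, K, F = 4, 6, 5, 3   # samples, emission and excitation wavelengths, components
A = rng.random((I, F))    # a: concentrations (scores)
B = rng.random((J, F))    # b: emission loadings
C = rng.random((K, F))    # c: excitation loadings
X_hat = parafac_model(A, B, C)

# Same model as an explicit sum of F terms: concentration x (emission outer excitation)
X_sum = sum(A[:, f, None, None] * np.outer(B[:, f], C[:, f]) for f in range(F))

# Hypothetical "measured" data and the resulting residuals e_ijk
X = X_hat + 0.01 * rng.standard_normal(X_hat.shape)
E = X - X_hat
print(np.allclose(X_hat, X_sum))  # True
```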
Model hierarchy
Principal Components Analysis (PCA): bilinear / 2-way
Parallel Factor Analysis (PARAFAC): trilinear / 3-way
Moving from PCA to PARAFAC applies constraints, which reduces model complexity and degrees of freedom, and also reduces fit.
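One way to see the reduction in degrees of freedom is to count loading entries, as in this small sketch. It assumes the PCA alternative is an F-component PCA of the data unfolded to samples x (emission·excitation), and it counts raw loading entries only, ignoring rotational and scaling redundancies. The dimensions are hypothetical.

```python
def n_params_pca(I, J, K, F):
    """Loading entries in an F-component PCA of the I x (J*K) unfolded data."""
    return F * (I + J * K)


def n_params_parafac(I, J, K, F):
    """Loading entries in an F-component PARAFAC model of the I x J x K array."""
    return F * (I + J + K)


I, J, K, F = 100, 50, 20, 5   # hypothetical: samples, emission and excitation wavelengths, components
print(n_params_pca(I, J, K, F))      # 5500
print(n_params_parafac(I, J, K, F))  # 850
```

PCA needs a separate loading for every excitation/emission pair, whereas PARAFAC describes each component with one emission and one excitation spectrum.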
9-component PARAFAC model
[Figure: contour plots of components C1 to C9, excitation (250 to 450 nm) vs emission (300 to 600 nm).]
[Figure: Raw Data = Model + Residuals, where the model is built from each Component scaled by its Conc.; EEMs span excitation 240 to 440 nm and emission 300 to 600 nm.]
Assumptions
PARAFAC assumes that:
1. The data structure is approximately trilinear
• fluorescence increases linearly with concentration
• the emission spectrum does not change with excitation wavelength, and vice versa
2. Additivity: fluorescence results from the linear superposition of N individual fluorescent components
• N is determined by trial and error (or known in advance)
3. Uniqueness: no two components have identical spectra
PARAFAC makes NO assumptions about:
1. Spectral shapes
2. Number of components
3. Structure of parameters and error terms
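The trilinearity assumption has a simple consequence that a short sketch can show (Python/NumPy, synthetic Gaussian spectra chosen for illustration): a component whose emission shape does not depend on excitation has an EEM equal to the outer product of its two spectra, i.e. a rank-one matrix, so a linear mixture of two such components has rank two. A hypothetical component whose emission maximum shifts with excitation wavelength breaks this structure. The helpers `gaussian` and `numerical_rank` are illustrative names.

```python
import numpy as np

em = np.linspace(300, 600, 61)   # hypothetical emission grid (nm)
ex = np.linspace(240, 440, 41)   # hypothetical excitation grid (nm)


def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


def numerical_rank(m, rel_tol=1e-8):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(m, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


# Two components obeying the assumptions: fixed spectral shapes,
# scaled linearly by concentration (2.0 and 0.5) and added together
mixture = (2.0 * np.outer(gaussian(em, 450, 40), gaussian(ex, 340, 30))
           + 0.5 * np.outer(gaussian(em, 350, 25), gaussian(ex, 280, 15)))

# A component violating them: its emission maximum shifts with excitation wavelength
shifting = np.column_stack(
    [gaussian(em, 400 + 0.5 * (e - 240), 30) * gaussian(e, 340, 30) for e in ex]
)

print(numerical_rank(mixture))   # 2: one rank-one term per component
print(numerical_rank(shifting))  # more than one, although it is a single "component"
```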
Advantages of PARAFAC
Unique solution (with few exceptions): pure spectra are recovered, and
concentrations can be estimated for each component
cf. rotational freedom in PCA means external information is needed to recover spectra and concentrations
Fully exploits the “second-order advantage”: the concentration of an analyte can be estimated in an unknown mixture in the presence of uncalibrated interferents
Easily interpreted
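The contrast with PCA's rotational freedom can be illustrated numerically (a sketch with random, hypothetical loadings; it illustrates the point but is not a proof of uniqueness, which rests on formal conditions such as Kruskal's). For a bilinear model, replacing A by AR and B by BR⁻ᵀ for any invertible R leaves the fitted data unchanged. Applying the same transformation to two modes of a trilinear model changes the fitted array.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 2))   # scores
B = rng.random((15, 2))   # emission loadings
C = rng.random((8, 2))    # excitation loadings
R = np.array([[1.0, 0.5],
              [0.2, 1.0]])  # an arbitrary invertible, non-diagonal transformation

A_rot = A @ R
B_rot = B @ np.linalg.inv(R).T

# Bilinear (PCA-type) model: the transformed loadings reproduce the data exactly
bilinear_same = np.allclose(A @ B.T, A_rot @ B_rot.T)

# Trilinear (PARAFAC) model: the same transformation changes the fitted array
trilinear_same = np.allclose(np.einsum("if,jf,kf->ijk", A, B, C),
                             np.einsum("if,jf,kf->ijk", A_rot, B_rot, C))
print(bilinear_same, trilinear_same)  # True False
```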
Modelling with PARAFAC
1. Pre-treatment
• Center and scale
• Remove or down-weight scatter (Raman, Rayleigh)
2. Calibration
• Apply constraints (non-negativity, unimodality)
• Choose the number of components
• Software: MATLAB N-way Toolbox or PLS_Toolbox
3. Validation
4. Interpretation
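Scatter removal can be sketched as masking: cells near first- and second-order Rayleigh scatter, near the water Raman peak, and below the emission = excitation diagonal are set to missing (NaN), which PARAFAC implementations that handle missing values can ignore. This is one possible approach, not necessarily the one used in the talk; the band widths and the 3400 cm⁻¹ Raman shift are illustrative assumptions, and `mask_scatter` is a hypothetical helper.

```python
import numpy as np


def mask_scatter(eem, em, ex, rayleigh_width=15.0, raman_width=10.0, raman_shift=3400.0):
    """Return a copy of an EEM (emission x excitation) with scatter regions set to NaN.

    Widths (nm) and the water Raman shift (cm^-1) are illustrative defaults;
    in practice they depend on the instrument's slit widths.
    """
    EM, EX = np.meshgrid(em, ex, indexing="ij")            # both shaped like eem
    out = np.array(eem, dtype=float)                        # work on a copy
    first_order = np.abs(EM - EX) <= rayleigh_width        # Rayleigh scatter, em = ex
    second_order = np.abs(EM - 2 * EX) <= rayleigh_width   # second-order Rayleigh, em = 2 ex
    raman_em = 1.0 / (1.0 / EX - raman_shift * 1e-7)       # water Raman peak (nm); 1 cm^-1 = 1e-7 nm^-1
    raman = np.abs(EM - raman_em) <= raman_width
    below_diagonal = EM < EX - rayleigh_width               # emission shorter than excitation
    out[first_order | second_order | raman | below_diagonal] = np.nan
    return out


em = np.arange(250, 601, 5.0)   # hypothetical emission grid (nm)
ex = np.arange(240, 451, 5.0)   # hypothetical excitation grid (nm)
masked = mask_scatter(np.ones((em.size, ex.size)), em, ex)
print(np.isnan(masked).mean())  # fraction of cells masked
```

For 350 nm excitation the Raman line computed this way falls near 397 nm, the familiar position of the water Raman peak.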
Validation
1. Examine residuals
2. Split-half analysis (possible because the solution is unique)
3. Core consistency, cross-validation, influence plots…
4. Compare models of different datasets
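Core consistency can be sketched as follows, assuming the usual definition of the core consistency diagnostic (CORCONDIA): given fitted PARAFAC loadings, compute the least-squares Tucker3 core and compare it with the superdiagonal identity core that a perfectly trilinear model implies. Values near 100% support the chosen number of components; much lower values suggest too many. The loadings below are random and hypothetical, and `core_consistency` is an illustrative name.

```python
import numpy as np


def core_consistency(X, A, B, C):
    """Core consistency (%) of PARAFAC loadings A, B, C fitted to the 3-way array X.

    G is the least-squares Tucker3 core given A, B and C; T is the
    superdiagonal identity core. Returns 100 * (1 - sum((G - T)^2) / F).
    """
    G = np.einsum("ijk,ai,bj,ck->abc", X,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    F = A.shape[1]
    T = np.zeros((F, F, F))
    T[np.arange(F), np.arange(F), np.arange(F)] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)


rng = np.random.default_rng(3)
A, B, C = rng.random((20, 3)), rng.random((30, 3)), rng.random((25, 3))
X = np.einsum("if,jf,kf->ijk", A, B, C)   # exactly trilinear, noise-free
print(round(core_consistency(X, A, B, C), 6))  # 100.0
```

With real, noisy data the loadings would come from a PARAFAC fit, and the diagnostic is typically compared across models with increasing numbers of components.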
Split-half analysis
[Figure: split-half results for components P1 to P8. Contours: excitation wavelength (300 to 450 nm) vs emission wavelength (350 to 500 nm); line plots: loadings vs wavelength (nm).]
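The comparison step of a split-half analysis can be sketched as below, assuming each half of the dataset has already been fitted separately. Because component order and scale are arbitrary, components are paired by trying every ordering and scoring each pair with Tucker's congruence coefficient on both the emission and excitation loadings. A congruence above about 0.95 is often treated as a match; that threshold is an assumption here. The loadings for the second half are synthetic (the same spectra, reordered, rescaled and slightly perturbed), and the helper names are hypothetical.

```python
import itertools

import numpy as np


def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


def tucker_congruence(u, v):
    """Tucker's congruence coefficient: cosine of the angle between two loading vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def match_halves(B1, C1, B2, C2):
    """Pair the components of two split-half models.

    Bk: emission loadings (n_em x F); Ck: excitation loadings (n_ex x F).
    Tries every ordering of half-2 components (fine for small F) and keeps the
    one with the highest total score; a pair's score is the smaller of its
    emission and excitation congruences.
    """
    F = B1.shape[1]
    best_perm, best_scores = None, None
    for perm in itertools.permutations(range(F)):
        scores = [min(tucker_congruence(B1[:, f], B2[:, p]),
                      tucker_congruence(C1[:, f], C2[:, p]))
                  for f, p in enumerate(perm)]
        if best_scores is None or sum(scores) > sum(best_scores):
            best_perm, best_scores = perm, scores
    return best_perm, best_scores


em = np.linspace(300, 600, 61)
ex = np.linspace(240, 440, 41)
B_half1 = np.column_stack([gaussian(em, 450, 40), gaussian(em, 350, 25)])
C_half1 = np.column_stack([gaussian(ex, 340, 30), gaussian(ex, 280, 15)])

# Hypothetical second half: same spectra, swapped order, different scale, slight noise
rng = np.random.default_rng(1)
B_half2 = 0.7 * B_half1[:, ::-1] + 0.01 * rng.standard_normal(B_half1.shape)
C_half2 = 1.3 * C_half1[:, ::-1] + 0.01 * rng.standard_normal(C_half1.shape)

perm, scores = match_halves(B_half1, C_half1, B_half2, C_half2)
print(perm, [round(s, 3) for s in scores])
```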
Validation of dye spectra
[Figure: Scores vs. Concentration: C5 fluorescence maximum [QSE] plotted against F255/580 [QSE] (both 0 to 120); excitation (250 to 450 nm) and emission (300 to 600 nm) spectra of the sample compared with those from PARAFAC, fluorescence in QSE.]
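The "scores vs. concentration" check can be sketched as a straight-line fit between an independent measurement and the PARAFAC component's fluorescence maximum. Here the fluorescence maximum is assumed to be the score times the peak emission loading times the peak excitation loading, a common convention; the measured values and loadings are hypothetical and constructed to be perfectly linear, whereas real data would scatter around the line.

```python
import numpy as np


def fmax(A, B, C):
    """Fluorescence maximum of each component in each sample (I x F):
    score x peak emission loading x peak excitation loading."""
    return A * B.max(axis=0) * C.max(axis=0)


def linear_fit(x, y):
    """Slope, intercept and r^2 of an ordinary least-squares line y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(slope), float(intercept), float(r2)


# Hypothetical independent measurement of the dye (e.g. F255/580 in QSE)
measured = np.array([10.0, 25.0, 40.0, 60.0, 85.0, 110.0])
# Hypothetical loadings in which component 0 is the dye
B = np.array([[0.1, 0.3], [0.5, 0.2], [0.2, 0.4]])   # emission; dye peak 0.5
C = np.array([[0.8, 0.1], [0.4, 0.6]])               # excitation; dye peak 0.8
A = np.column_stack([measured / (0.5 * 0.8), np.linspace(1.0, 2.0, 6)])

slope, intercept, r2 = linear_fit(measured, fmax(A, B, C)[:, 0])
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```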
Validation of protein constituents
[Figure: excitation and emission loadings (200 to 600 nm) of C1 compared with tyrosine and of C6 compared with tryptophan, with contour plots of C1 and C6.]
Inter-model comparisons
[Figure: excitation and emission loadings (200 to 600 nm) of C7 compared with those of S&M*, and a contour plot of C7.]
(S&M* = Stedmon & Markager (in press). Marine Chemistry.)
Inter-model comparisons
[Figure: excitation (LHS) and emission (RHS) loadings vs wavelength (nm), compared between the Kauai model and the BWE7 model: C1 cf. P2+P5; C2 cf. P1; C3 cf. P3; C6 cf. P6+P7; C8 cf. P8.]
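Where one component in one model corresponds to two in another (e.g. C1 cf. P2+P5), a simple check is whether a least-squares combination of the two candidate spectra reproduces the single spectrum much better than either candidate alone. This sketch uses only emission loadings and synthetic Gaussian spectra; in practice excitation loadings would be checked as well, and a non-negative fit could be preferred. The spectra and `explain_by_combination` are hypothetical.

```python
import numpy as np


def gaussian(x, centre, width):
    return np.exp(-0.5 * ((x - centre) / width) ** 2)


def explain_by_combination(target, candidates):
    """Least-squares weights w with candidates @ w ~ target, plus Tucker's
    congruence between the fitted combination and the target."""
    w, *_ = np.linalg.lstsq(candidates, target, rcond=None)
    fitted = candidates @ w
    congruence = float(fitted @ target / (np.linalg.norm(fitted) * np.linalg.norm(target)))
    return w, congruence


em = np.linspace(300, 600, 61)
P2, P5 = gaussian(em, 420, 30), gaussian(em, 470, 35)   # hypothetical emission loadings
C1 = 0.6 * P2 + 0.4 * P5                                # hypothetical broader component

w_pair, cong_pair = explain_by_combination(C1, np.column_stack([P2, P5]))
w_single, cong_single = explain_by_combination(C1, P2[:, None])
print(np.round(w_pair, 3), round(cong_pair, 4))   # [0.6 0.4] 1.0
print(round(cong_single, 4))                       # lower than for the pair
```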
Example: Ships’ ballast water
Sampling Effort
[Figure: map of sampling effort in the Pacific Ocean (90° E to 90° W, 45° S to 45° N); legend: Port Survey, Cruise, 80 nmi transect.]
9-component PARAFAC model
[Figure: contour plots of components C1 to C9, excitation (250 to 450 nm) vs emission (300 to 600 nm), annotated in order of appearance: “B”; “T”; ?; “A”, “C”, “M”; “A”, “C”; PAH; ?; Rhodamine WT dye; and the label Humic-like fluorescence.]
Model used EEMs from >700 samples of seawater and ballast water
[Figure: maps of the eastern North Pacific (165° W to 105° W, 15° N to 45° N), the Hawaiian Islands (160° W to 154° W, 20° N to 24° N; stations KFos and BN) and the North Atlantic (80° W to 0°, 0° to 60° N), with samples classed by distance to land (nautical miles: 0-2, 2-10, 10-25, 25-50, 50-100, 100-200, ocean); fluorescence ratio relative to C3 (0.0 to 1.5); contour plots of C2, C3 and C8.]
Decoupling between C2 and C3 fluorescence
[Figure: (A) log(C2) vs log(C3) for Harbor, Coast, Shelf and Ocean samples; (B) C2/C3 vs C2; (C) and (D) the same axes with modeled curves (i) to (iv).]
(A) At low C3 concentrations, C2 concentrations frequently lie above the conservative dilution curve; (B) the C2/C3 ratio in seawater is independent of C2 at high concentrations, but at low concentrations it is driven by the concentration of C2; (C & D) modeled relationships assuming dilution only (i), dilution and increased removal of C3 (ii), dilution and constant production of C2 (iii), or dilution and heterogeneous but generally increasing production of C2 (iv).
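Scenarios (i) and (iii) can be sketched with a few lines of arithmetic. The sketch assumes an ocean end-member with negligible C2 and C3, so that dilution alone keeps C2/C3 fixed at the ratio of the land-derived water; adding a constant production term p gives C2/C3 = k + p/C3, which is flat at high C3 and rises steeply as C3 falls, the pattern described in (A) and (B). All numbers are hypothetical, and scenarios (ii) and (iv) are not modeled.

```python
import numpy as np

c3 = np.logspace(-2, 1, 7)   # hypothetical C3 concentrations (QSE), 0.01 to 10
k = 0.5                      # hypothetical C2:C3 ratio of the land-derived end-member
p = 0.05                     # hypothetical constant background production of C2 (QSE)

c2_dilution = k * c3             # (i) dilution only: C2 tracks C3 at a fixed ratio
c2_production = k * c3 + p       # (iii) dilution plus constant production of C2

ratio_i = c2_dilution / c3       # constant, equal to k
ratio_iii = c2_production / c3   # k + p / C3, rising as C3 falls
print(np.round(ratio_i, 2))
print(np.round(ratio_iii, 2))
```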
Interpretation of PARAFAC models
[Figure: C3*: 370/494. Concentration (QSE, 0 to 12) of C3 vs distance to land (0.01 to 1000, log scale), with a level marked 0.7 ppb, and a contour plot of C3.]
Protein-like fluorescence
[Figure: F (QSE) of C1, C6 and C7 at Rott., NS/ECE., Shlf, B.Bisc, NEAtlc, TropAt, BrzShf and SaoLui, and a contour plot of C7.]
Public Resources – Educational Materials
Chemometrics group of the Faculty of Life Sciences at the University of Copenhagen - www.models.life.ku.dk.
Information on meetings, symposia, new books, software
Downloadable datasets (including fluorescence of amino acids, fish muscle, Parma ham, yoghurt, …)
Web-based tutorials, interactive internet courses, graphical illustrations (movies)
Public Resources – Spectral database (www.models.life.ku.dk)
Guidelines for fluorescence spectral correction and calibration procedures.
For a range of compounds and IHSS humic standards, ASCII files containing:
currently published and available DOM PARAFAC components
carbon specific absorption spectra of the individual compounds
carbon specific fluorescence excitation emission matrices (EEMs)
Small datasets of DOM fluorescence for use in PARAFAC tutorials.
Acknowledgements
Funding: The University of Birmingham (Fluoronet), USCG Research & Development Center, Columbia River Aquatic Nuisance Species Initiative (CRANSI), California State Lands Commission, New Zealand Ministry of Fisheries
Host Shipping Companies: NYK Bulkship (USA) LTD., Gateway Maritime Corp. / Sincere Industrial Corp., Matson Navigation Company, Bergesen DY ASA., Sea River Maritime, the Alaska Tanker Company, BP Amoco PLC and Krupp Seeschiffahrt GmbH
Analyses: University of S. Florida, University of Maine, Portland State University, Denmark National Environmental Research Institute
PARAFAC: Thanks to Colin Stedmon for sharing his PARAFAC spectra with me