Data-driven signal-resolving approaches of infrared spectra …10.1007... · · 2015-05-142....

Analytical and Bioanalytical Chemistry

Electronic Supplementary Material

Data-driven signal-resolving approaches of infrared spectra to

explore the macroscopic and microscopic spatial distribution of

organic and inorganic compounds in plant

Jian-bo Chen, Su-qin Sun, Qun Zhou

1. Details of Algorithms

1) Derivative Spectroscopy

Briefly speaking, the derivative spectrum is the differentiation of the original

absorbance spectrum. The band shape of the derivative spectrum is different from the

original spectrum and becomes more complex as the order of differentiation increases.

The even derivative spectra have negative or positive bands corresponding to the

bands of the original spectrum; in addition, on both sides of the main bands there are

satellite bands that are absent from the original spectrum. The differentiation of the

original absorbance spectrum can be obtained by a number of means. Using the

Savitzky-Golay algorithm, the second derivatives are calculated as:

$$y_k = \frac{1}{N_{2m+1}} \sum_{j=-m}^{m} d_j \, x_{k+j} \tag{1}$$

where xk and yk are original and derivative values at the kth wavenumber, respectively.

2m+1 is the width of the spectral window used in the calculation. dj is the weighting

coefficient, and N2m+1 is the normalization factor.
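As a rough illustration of the windowed weighted sum described above, the following sketch applies a five-point window (m = 2). The coefficients d_j = [2, -1, -2, -1, 2] and the normalization factor 7 are the tabulated Savitzky-Golay values for a quadratic or cubic fit; they are an assumption here, since the window width and polynomial order used in the paper are not stated. Unit spacing between points is also assumed, and the function name is illustrative.

```python
# Assumed 5-point Savitzky-Golay second-derivative coefficients (m = 2).
D_COEFFS = [2, -1, -2, -1, 2]   # weighting coefficients d_j, j = -2..2
NORM = 7                        # normalization factor N_5


def second_derivative(x):
    """Return y_k = (1/N) * sum_j d_j * x[k+j] for interior points.

    The m points at each end, where the window does not fit, are set to None.
    Unit spacing is assumed; for a spacing h, divide the result by h**2.
    """
    m = len(D_COEFFS) // 2
    y = [None] * len(x)
    for k in range(m, len(x) - m):
        acc = sum(d * x[k + j] for j, d in zip(range(-m, m + 1), D_COEFFS))
        y[k] = acc / NORM
    return y


# A parabola sampled at unit spacing has a constant second derivative of 2.
parabola = [float(k * k) for k in range(8)]
result = second_derivative(parabola)
```

For real spectra, a library routine with a configurable window and polynomial order would normally be preferable to hard-coded coefficients.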

2) Two-Dimensional Correlation Spectroscopy (2DCOS)

IR spectra of a sample recorded at m temperature points can be expressed as:

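$$
\mathbf{X} = \begin{bmatrix}
x_{1,1} & \cdots & x_{1,k} & \cdots & x_{1,n} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
x_{p,1} & \cdots & x_{p,k} & \cdots & x_{p,n} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
x_{m,1} & \cdots & x_{m,k} & \cdots & x_{m,n}
\end{bmatrix} \tag{2}
$$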

where n is the number of spectral variables contained in each spectrum.

The dynamic spectra Y are obtained by subtracting the average spectral intensity

(the mean of each column of X) from the original intensities at the same variable

(elements of X at the same column). The dynamic spectral intensity at temperature p

and variable k can be calculated as:

$$y_{p,k} = x_{p,k} - \frac{1}{m} \sum_{i=1}^{m} x_{i,k} \tag{3}$$

The synchronous 2D spectrum Φ can be obtained from the dynamic spectra Y

using the following equation:

$$\boldsymbol{\Phi} = \frac{1}{m-1} \cdot \mathbf{Y}^{\mathrm{T}} \mathbf{Y} \tag{4}$$

The synchronous 2D spectrum indicates the coincidence of spectral intensity

variations at corresponding variables under the perturbation. For example, a

cross-peak at (νi, νj) means the spectral intensities at νi and νj change simultaneously

when the perturbation is applied. The cross-peak is positive when the intensities at νi

and νj both increase (or decrease) along the perturbation, otherwise the cross-peak

will be negative. The peak on the diagonal of the synchronous 2D spectrum is defined

as the auto-peak; auto-peaks are always positive. The amplitude of the auto-peak

reflects the extent of the change at a variable under the perturbation.
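The two steps above (mean-centering to obtain the dynamic spectra, then forming the synchronous spectrum) can be sketched in a few lines. This is a minimal illustration on made-up numbers, not the authors' implementation; the function name and the toy data are hypothetical.

```python
def synchronous_spectrum(X):
    """Synchronous 2D correlation matrix from m spectra of n variables each.

    X is a list of m rows, one spectrum per perturbation (temperature) point.
    """
    m, n = len(X), len(X[0])
    # Dynamic spectra: subtract the column mean from every element.
    means = [sum(row[k] for row in X) / m for k in range(n)]
    Y = [[row[k] - means[k] for k in range(n)] for row in X]
    # Synchronous spectrum: Phi = Y^T Y / (m - 1).
    return [
        [sum(Y[p][i] * Y[p][j] for p in range(m)) / (m - 1) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 3 temperature points, 3 spectral variables.
# Variables 0 and 1 increase together, while variable 2 decreases.
X = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 2.0],
    [3.0, 6.0, 1.0],
]
phi = synchronous_spectrum(X)
```

In this toy case the cross-peak between variables 0 and 1 is positive, the cross-peaks with variable 2 are negative, and the diagonal (auto-peak) entries are non-negative, in line with the sign rules stated above.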

3) Principal Component Analysis (PCA)

According to the Lambert-Beer law, if there are d kinds of chemical

components in total, the spectral data matrix X can be considered as the product of

the concentrations matrix Cm×d and the spectra matrix Sn×d:

$$\mathbf{X} = \mathbf{C} \cdot \mathbf{S}^{\mathrm{T}} \tag{5}$$

where each column of the matrix C represents the concentrations of a chemical

component in all pixels, while each column of the matrix S represents the spectrum of

a chemical component.

In most cases, the concentrations matrix C and the spectra matrix S are both

unknown. The measured spectral data matrix X has to be decomposed with little prior

knowledge. Generally, PCA is the first method to try. The PCA decomposition of the

data matrix X can be expressed as:

$$\mathbf{X} = \mathbf{T} \cdot \mathbf{F}^{\mathrm{T}} \tag{6}$$

where each column of the score matrix Tm×n represents the scores of all pixels on one

principal component (PC), while each column of the loading matrix Fn×n represents

the contributions of all spectral channels to one PC.

The loadings of different PCs are orthogonal to each other by construction, but the

IR spectra of different chemical components are unlikely to be orthogonal. In other words, PCs are

not identical to chemical components; it is more reasonable to treat PCs as

linear combinations of chemical components. Nevertheless, the number of significant

PCs may be taken as the number of significant chemical components.

Each PC contains a part of the total spectral variance in the data matrix X, and

all PCs are arranged in descending order of variance. Usually, the first few PCs represent most of the

meaningful spectral variance resulting from the differences in chemical composition

between pixels, while the remaining PCs, which contain spectral variance from minor chemical

differences or even random noise, can be neglected for the time being. The number of

significant PCs can be determined from the cumulative spectral variance: the

significant PCs should contain most of the variance in the data matrix X. Another

useful tool to determine the number of significant PCs is the IND function, defined as:

$$\mathrm{IND}_k = \sqrt{\frac{\sum_{i=k+1}^{\min(m,n)} \lambda_i}{(n-k) \cdot (m-k)}} \cdot \frac{1}{\left(\min(m,n)-k\right)^{2}} \tag{7}$$

where k is the number of PCs, while λi is the spectral variance contained in the ith PC.

m and n are the numbers of pixels and spectral channels, respectively, and min(m, n)

is the smaller one of m and n.
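The decomposition into scores and loadings, and the cumulative-variance criterion for choosing the number of significant PCs, can be sketched with a singular value decomposition. This is only an illustration under stated assumptions: column mean-centering, the 99.5 % cut-off, the function name, and the synthetic two-component data are all hypothetical choices, not taken from the paper. The economy-size SVD also returns at most min(m, n) PCs.

```python
import numpy as np


def pca(X):
    """Decompose mean-centered X as T @ F.T and return (T, F, cumulative).

    X has shape (m pixels, n spectral channels). Column mean-centering is an
    assumption; the preprocessing is not specified in the text.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U * s                  # scores, one column per PC
    F = Vt.T                   # loadings, orthonormal columns
    variance = s ** 2          # proportional to the variance of each PC
    cumulative = np.cumsum(variance) / variance.sum()
    return T, F, cumulative


# Hypothetical toy data: 20 pixels, 50 channels, two chemical components.
rng = np.random.default_rng(0)
C = rng.random((20, 2))
S = rng.random((50, 2))
X = C @ S.T

T, F, cumulative = pca(X)
# Smallest number of PCs whose cumulative variance reaches the assumed cut-off.
n_significant = int(np.searchsorted(cumulative, 0.995) + 1)
```

Because the toy matrix is built from two components, no more than two PCs are needed to reach the cut-off, which mirrors the idea that the number of significant PCs reflects the number of significant components.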

4) Independent Component Analysis - Alternating Least Squares (ICA-ALS)

The basic assumption of ICA is that the distribution of each chemical component

is independent and non-Gaussian. Being independent means that the concentration of

one component provides no information about the concentration of another

component.

According to Eq. 5, the concentrations matrix C can be obtained from X by a

separating matrix W:

$$\mathbf{C} \approx \mathbf{U} = \mathbf{X} \cdot \mathbf{W} \tag{8}$$

where U is the estimate of C.

To ensure that the distributions of the components are independent, the separating matrix

W has to maximize the entropy of E:

$$\mathbf{E} = g(\mathbf{U}) \tag{9}$$

where the function g is assumed to be the cumulative distribution function (cdf) of C.

The separating matrix W is usually optimized by gradient-based algorithms to

maximize the entropy of E. The spectra and concentrations matrices obtained by ICA

can be further optimized by ALS with nonnegativity constraints, because neither the

spectral intensity nor the concentration is negative. The iterative steps of ALS can be

summarized as:

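$$\mathbf{S}_{z-1} = \frac{\mathbf{S}_{z-1} + |\mathbf{S}_{z-1}|}{2} \tag{10}$$

$$\mathbf{C}_{z} = \mathbf{X} \cdot \mathbf{S}_{z-1} \left(\mathbf{S}_{z-1}^{\mathrm{T}} \mathbf{S}_{z-1}\right)^{-1} \tag{11}$$

$$\mathbf{C}_{z} = \frac{\mathbf{C}_{z} + |\mathbf{C}_{z}|}{2} \tag{12}$$

$$\mathbf{S}_{z} = \mathbf{X}^{\mathrm{T}} \cdot \mathbf{C}_{z} \left(\mathbf{C}_{z}^{\mathrm{T}} \mathbf{C}_{z}\right)^{-1} \tag{13}$$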

where Sz and Cz are spectra and concentrations matrices after z iteration cycles.

The ALS iteration does not stop until the difference function D(z) is minimized.

The function D(z) is defined as:

$$D(z) = \sum_{j=1}^{d} \sum_{i=1}^{n} \left(s_{ij}^{z} - s_{ij}^{z-1}\right)^{2} \tag{14}$$

where $s_{ij}^{z}$ is the spectral intensity of component j at wavenumber i after z cycles.
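The alternating steps with nonnegativity clipping can be sketched as follows. This is a minimal illustration, not the authors' code: the pseudo-inverse stands in for the matrix inverse for numerical safety, the loop stops when the squared change in the spectra falls below a tolerance (an assumed stand-in for "D(z) is minimized"), and the function names, tolerance, and toy data are hypothetical. The initial estimate is a known mixture of the true spectra, standing in for an ICA result.

```python
import numpy as np


def nonneg(A):
    """(A + |A|) / 2 sets every negative entry to zero."""
    return (A + np.abs(A)) / 2


def als(X, S0, max_iter=500, tol=1e-12):
    """Alternating least squares with nonnegativity clipping.

    X: (m, n) data matrix, S0: (n, d) initial spectra estimate.
    """
    S = S0
    for _ in range(max_iter):
        S_prev = nonneg(S)                                   # clip spectra
        C = X @ S_prev @ np.linalg.pinv(S_prev.T @ S_prev)   # update C
        C = nonneg(C)                                        # clip C
        S = X.T @ C @ np.linalg.pinv(C.T @ C)                # update S
        if np.sum((S - S_prev) ** 2) < tol:                  # change in S
            break
    return C, S


# Hypothetical toy data: 30 pixels, 40 channels, 2 strictly positive components.
rng = np.random.default_rng(1)
C_true = 0.5 + rng.random((30, 2))
S_true = 0.5 + rng.random((40, 2))
X = C_true @ S_true.T

# Assumed initial estimate: a slight mixture of the true spectra.
S0 = S_true @ np.array([[1.0, 0.1], [0.1, 1.0]])
C_est, S_est = als(X, S0)
```

The toy run reproduces X from nonnegative factors, but the recovered spectra remain a mixture of the true ones; ALS alone does not remove this rotational ambiguity, which is why the quality of the initial estimate matters.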

5) Partial Least Squares Target - Correlation Coefficients (PLST-CC)

Sometimes the reference spectrum of one or more chemical components can be

obtained. Here the data matrix X can be decomposed into the sum of the known part

and the unknown part:

$$\mathbf{X} = \mathbf{c}_t \cdot \mathbf{s}_t^{\mathrm{T}} + \mathbf{C}_u \cdot \mathbf{S}_u^{\mathrm{T}} \tag{15}$$

where ct and st are the concentration and spectrum vectors of the known component,

respectively, while Cu and Su correspond to the other unknown components.

To calculate ct from st and X, the following inverse model may be used:

$$\mathbf{s}_t = \mathbf{X}^{\mathrm{T}} \cdot \mathbf{c}_t \tag{16}$$

Considering the existence of a large number of unknown components, the

partial least squares target (PLST) is a reasonable approach to generate the

concentration vector ct of the target component. It should be remembered that the

PLST results are not real concentrations. Consequently, the results from different X

matrices cannot be compared directly.

Although the relative target values can reveal the pixels that are likely to contain the

target component, it is difficult to set an absolute threshold. To reduce the number of

false-positive pixels, the correlation coefficient is used as a secondary

criterion to confirm the existence of the target component in the pixels. The target

value is reset to zero if the correlation coefficient between the target spectrum

and the pixel spectrum is below the threshold. According to the ASTM standard, the

spectral correlation coefficient is defined as:

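$$r_{p,t} = \frac{\sum_{i=1}^{n} \left(x_{p,i} \cdot s_{t,i}\right)}{\left(\sum_{i=1}^{n} x_{p,i}^{2}\right)^{1/2} \cdot \left(\sum_{i=1}^{n} s_{t,i}^{2}\right)^{1/2}} \tag{17}$$

where xp,i and st,i are the spectral intensities of the pth pixel and the target component

at the ith wavenumber.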
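The secondary criterion can be sketched as a simple filter over the target values. The sketch assumes the uncentered form of the correlation coefficient (sum of products divided by the product of the root sums of squares); the threshold of 0.9, the function names, and the toy spectra are hypothetical, since the threshold used in the paper is not given here.

```python
import math


def spectral_correlation(x, s):
    """r = sum(x_i * s_i) / sqrt(sum(x_i^2) * sum(s_i^2)), uncentered."""
    num = sum(a * b for a, b in zip(x, s))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in s))
    return num / den if den else 0.0


def confirm_targets(target_values, pixels, target_spectrum, threshold=0.9):
    """Reset a pixel's target value to zero when r is below the threshold."""
    return [
        v if spectral_correlation(p, target_spectrum) >= threshold else 0.0
        for v, p in zip(target_values, pixels)
    ]


# Hypothetical toy data: one pixel proportional to the target spectrum,
# one pixel orthogonal to it.
target = [1.0, 2.0, 3.0]
pixels = [[2.0, 4.0, 6.0], [3.0, 0.0, -1.0]]
plst_values = [0.8, 0.5]
confirmed = confirm_targets(plst_values, pixels, target)
```

Only the first pixel keeps its target value; the second is reset to zero because its spectrum does not resemble the target, however large its PLST value might have been.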

2. Second derivative IR spectra of different parts of the clove bud

Figure S1 shows the second derivative IR spectra of the calyx tube, petal, sepal,

and stamen-pistil of the clove bud in the region of 1100~400 cm-1.

The bands of the four parts at 993~990, 949~948, 919~913, 850~848, 816~815

and 795~794 cm-1 correspond to the bands of eugenol at 994, 949, 915, 850, 818 and

794 cm-1, which confirms the existence of eugenol in all parts of the clove bud. The

calyx tube shows the characteristic band from eugenol at 557 cm-1, which is

insignificant in the spectra of the other parts. This indicates that the calyx tube contains

more eugenol than the other three parts of the clove bud.

The bands of the four parts at 782~779 and 518~517 cm-1 correspond to the

bands of calcium oxalate at 782 and 516 cm-1, which confirms the existence of

calcium oxalate in all parts of the clove bud. The intensity of the band near 782 cm-1

indicates that there is more calcium oxalate in the stamen-pistil than in the other parts.

Fig. S1. The second derivative IR spectra of four parts of the clove bud and reference

materials in the region of 1100~400 cm-1. (a) calcium oxalate monohydrate; (b) stamen-pistil;

(c) sepal; (d) petal; (e) calyx tube; (f) eugenol

3. Two-dimensional correlation IR spectra of different parts of the clove bud

Figure S2 shows the 2D correlation IR spectra of the calyx tube, petal, sepal, and

stamen-pistil of the clove bud in the region of 950~750 cm-1.

The auto-peaks corresponding to eugenol at 913, 818~816 and 796~795 cm-1

attenuate successively from the calyx tube to the petal to the sepal, which suggests that the content of

eugenol also decreases in this order.

Fig. S2. The two-dimensional correlation IR spectra of four parts of the clove bud in the

region of 950~750 cm-1. Synchronous 2D correlation spectrum: (a) calyx tube; (b) petal; (c)

sepal; (d) stamen-pistil. Auto-peak spectrum: (e) calyx tube; (f) petal; (g) sepal; (h)

stamen-pistil

4. PCA results of the NIR imaging data of the section of the calyx tube

Figure S3 shows the PCA results of the NIR imaging data of the transverse

section of the calyx tube of the clove bud in the region of 7500∼4000 cm-1. The first

two PCs account for more than 99.5% of the spectral variance, and the IND

function has a minimum at the second PC. Therefore, the first two PCs are significant

and represent most spectral information.

Fig. S3. The cumulative spectral variances and IND values of the first ten PCs of the NIR

imaging data of the transverse section of the calyx tube

5. PCA results of the ATR imaging data of different tissues of the calyx tube

Figure S4 shows the PCA results of the ATR imaging data of different tissues of

the calyx tube of the clove bud in the region of 1800∼800 cm-1. The first three PCs

account for 96.2%, 97.5%, and 95.7% of the spectral variance of the stele,

aerenchyma, and cortex, respectively. For each tissue, the IND function has a

minimum at the third PC. Therefore, the first three PCs of each tissue are significant

and represent most spectral information.

Fig. S4. The cumulative spectral variances and IND values of the first ten PCs of the ATR

imaging data of different tissues of the calyx tube

6. ATR spectra of typical pixels of different tissues of the calyx tube

Figure S5 shows the ATR spectra of the pixels with the highest PLST values in

each tissue of the calyx tube when the spectrum of eugenol in the range of 1800∼800

cm-1 is the reference. The typical pixel spectra of the aerenchyma and cortex are very

close to that of eugenol, while the typical pixel spectrum of the stele is quite different. This

indicates that eugenol occurs in the aerenchyma and cortex, but not in the stele.

Figure S6 shows the ATR spectra of the pixels with the highest PLST values in

each tissue of the calyx tube when the spectrum of calcium oxalate in the range of

1800∼800 cm-1 is the reference. The typical pixel spectra of all three kinds of tissue

are very close to that of calcium oxalate, which confirms the existence of calcium oxalate in

all tissues of the calyx tube.

Fig. S5. ATR spectra of the pixels with the highest PLST values in each tissue of the calyx

tube when the spectrum of eugenol is the reference. (a) the typical pixel of the stele; (b) the

typical pixel of the aerenchyma; (c) the typical pixel of the cortex; (d) eugenol

Fig. S6. ATR spectra of the pixels with the highest PLST values in each tissue of the calyx

tube when the spectrum of calcium oxalate is the reference. (a) the typical pixel of the stele;

(b) the typical pixel of the aerenchyma; (c) the typical pixel of the cortex; (d) calcium oxalate
