
EVID4 Evidence Project Final Report (Rev. 06/11)

General Enquiries on the form should be made to:

Defra, Procurements and Commercial Function (Evidence Procurement Team) E-mail: [email protected]

Evidence Project Final Report

Note

In line with the Freedom of Information Act 2000, Defra aims to place the results of its completed research projects in the public domain wherever possible. The Evidence Project Final Report is designed to capture the information on the results and outputs of Defra-funded research in a format that is easily publishable through the Defra website. An Evidence Project Final Report must be completed for all projects.

This form is in Word format and the boxes may be expanded, as appropriate.

ACCESS TO INFORMATION

The information collected on this form will be stored electronically and may be sent to any part of Defra, or to individual researchers or organisations outside Defra for the purposes of reviewing the project. Defra may also disclose the information to any outside organisation acting as an agent authorised by Defra to process final research reports on its behalf. Defra intends to publish this form on its website, unless there are strong reasons not to, which fully comply with exemptions under the Environmental Information Regulations or the Freedom of Information Act 2000.

Defra may be required to release information, including personal data and commercial information, on request under the Environmental Information Regulations or the Freedom of Information Act 2000. However, Defra will not permit any unwarranted breach of confidentiality or act in contravention of its obligations under the Data Protection Act 1998. Defra or its appointed agents may use the name, address or other details on your form to contact you in connection with occasional customer research aimed at improving the processes through which Defra works with its contractors.

Project identification

1. Defra Project code FAO 158

2. Project title

Further development and validation of the proposed methodology to verify vegetable oil species in mixtures of oil

3. Contractor organisation(s)

The Queen’s University Belfast

4. Total Defra project costs £107,102

(agreed fixed price)

5. Project: start date 1/8/14

end date 31/10/15



6. It is Defra’s intention to publish this form.

Please confirm your agreement to do so: YES / NO

(a) When preparing Evidence Project Final Reports contractors should bear in mind that Defra intends that they be made public. They should be written in a clear and concise manner and represent a full account of the research project which someone not closely associated with the project can follow.

Defra recognises that in a small minority of cases there may be information, such as intellectual property or commercially confidential data, used in or generated by the research project, which should not be disclosed. In these cases, such information should be detailed in a separate annex (not to be published) so that the Evidence Project Final Report can be placed in the public domain. Where it is impossible to complete the Final Report without including references to any sensitive or confidential data, the information should be included and section (b) completed. NB: only in exceptional circumstances will Defra expect contractors to give a "No" answer.

In all cases, reasons for withholding information must be fully in line with exemptions under the Environmental Information Regulations or the Freedom of Information Act 2000.

(b) If you have answered NO, please explain why the Final report should not be released into public domain

Executive Summary

7. The executive summary must not exceed 2 sides in total of A4 and should be understandable to the intelligent non-scientist. It should cover the main objectives, methods and findings of the research, together with any other significant events and options for new work.

This report details the results of the work to validate a newly developed methodology for vegetable

oil species identification in a refined vegetable oil blend and its extension to processed foods

containing vegetable oil. The method has particular emphasis on the detection of palm oil and its

derivatives, palm oil being by far the most widely used vegetable oil in the food industry. Recent changes in

legislation now require the vegetable oil species used in processed foods to be labelled in the

ingredients list. Having the tools in place to verify and enforce food labelling requirements gives

consumers confidence in the integrity of the food chain. A novel method to allow verification of

vegetable oil species was developed under project FA0117; this follow-on project aimed to validate

the previously developed method. The methodology employed is a staged procedure that consists of a

combination of a spectroscopic technique known as FTIR (Fourier Transform Infrared spectroscopy)

that is used to screen and classify the oils and the established fatty acid methyl esters analysis using

gas chromatography to confirm the composition of the oils when required. These two techniques,

when performed serially using the developed decision making system, exploit the small differences in

chemical composition between different oil species in different types of oil blends to classify the

unknown sample into one of the 6 or 12 oil classes studied. In that way, both untargeted fingerprint

analysis (spectroscopic screening) and targeted analysis (fatty acid quantification by gas

chromatography, GC) are applied to increase the certainty of the result. The project was divided into 4 main

sections, a) extension of the current method to include more samples and more oil classes, b) inter-lab

trials: FTIR and GC for fatty acids, c) validation of the method to identify oil species in pastry products

including any method refinement required, and d) validation of the methodology to detect the presence

of palm oil in chocolate confectionery products including any method refinement.

During the preliminary project FAO117, an initial database was created comprising 23 pure

vegetable oils and 160 oil admixtures, grouped into 6 different oil classes. In this current follow-on

project, this database was expanded to include more variability, i.e. more pure oils from different

geographical origins and more in-house admixtures, in order to increase robustness of the calibration

models that the FTIR spectroscopic method is based on. Overall, a total of 80 pure vegetable oils were

obtained from reliable sources and 215 oil admixtures were prepared in-house in variable

concentrations. After merging the two datasets from FAO117 and from the current project, a total of

376 samples were used in the calibration models and 101 were used in the prediction set. The

prediction set is comprised of independent samples that are used in-house to validate the method.


Calibration models were built using both SIMCA and PLS-DA classification techniques (multivariate

analysis) using the computation software Matlab and SIMCA 14.0 (Umetrics™) for comparison purposes.

Calibration models were developed for the 6 classes of oil determined in FAO117 (with modifications

to incorporate the new oils/oil admixtures) and for 12 classes determined in this current project. The

new model of 12 classes provides much more resolution because it clearly distinguishes between the

different botanical origins of the vegetable oil species compared to the legacy 6 classes model that

contained some speciation overlap. In the legacy model design the 6 classes to be predicted

(corresponding to different oil types) using the method are palm, sunflower/rapeseed oil, palm kernel oil,

and coconut oil and limited binary mixtures of the above. The new higher resolution model with the 12

classes includes: palm, palm kernel, sunflower, rapeseed, coconut, and all the binary combinations of

the above.

Overall, in the legacy design (when the classification result will be determined between 6 classes)

the best classification result was achieved using a calibration model built with PLS-DA using Matlab

software and simulated samples, whereas in the high resolution design (12 classes’ model) the best

performing calibration model is built with PLS-DA combined with a threshold (t=0.57) using the

Umetrics™ SIMCA software. In the first case (legacy) no samples needed to go to the confirmation

step (GC analysis of fatty acids) whereas 44 samples were referred to the confirmation step with the

high resolution model, which was expected because the classification difficulty escalates when

classes are doubled. New criteria based on fatty acid content were developed for the high resolution

model design while the criteria for legacy model design were slightly revised to include the new pure

oil species and oil admixtures (coconut oil and its admixtures).

An inter-lab validation trial was undertaken in order to establish if the analytical method is

‘instrument-agnostic’, i.e. independent of the FTIR instruments used to acquire the spectra of the oils.

Nine samples including pure oils and oil admixtures were prepared in-house and dispatched to each of

the 12 participants agreeing to take part in the inter-lab trial. The majority of the blends could be

identified using the FTIR chemometric models (using the PLS-DA classification technique) (1st stage);

a small percentage of pure oils and oil blends were not correctly identified (14% non-classified and 2.3%

wrongly classified). The GC fatty acid analysis (2nd stage) of these non-classified samples, however,

correctly identified the nature of 16 out of 18 of the samples (88.9%) that had been referred to this

confirmation step. As a general conclusion, the original method, i.e. FTIR spectroscopy coupled with

PLS-DA classification technique, followed by GC fatty acid analysis when required, offers a great

insight into the nature of pure oil and binary mixtures and correctly classifies 96.03% of unknown oil

samples as seen in this inter-lab validation.

In order to establish the reproducibility of the GC fatty acid data obtained in-house (necessary for

the confirmation step), an inter-lab validation of the GC method was also undertaken. Three different

UK accredited food testing laboratories participated in the GC fatty acid inter-lab trial. Anonymous

samples (n=8) were submitted to the testing laboratories. Each of the laboratories performed the

analysis using their own GCMS instrument and official method for determination of individual fatty

acids in oil samples. Results showed very good reproducibility with low relative standard deviation

values (from 0.01 to 0.53) obtained for the major fatty acids present in oil samples.

The new calibration models (the legacy model design and the new high resolution model design)

were tested on oils extracted from commercial biscuits (plain biscuits, rich tea and digestive biscuits)

obtained in a UK survey. The accuracy (80%) was good for the legacy model but the false positive rate

(20%) was above the threshold we used to determine the quality of the screening method (5%). For

the new high resolution model, the accuracy was low (50%) and the false positive rate was again high

(25%). Due to the relatively poor results obtained with these calibration models, a new calibration

model was built using oils extracted from biscuits prepared in-house (biscuit-specific model). Digestive

biscuits (DG) were prepared in-house using authentic palm oil and rapeseed oil and rich tea biscuits

(RT) were prepared with palm oil and admixtures of palm oil and rapeseed oil. All oils used were

sourced from reliable sources. After baking, the oils were recovered using hexane extraction and FTIR

spectra were recorded in triplicate for all the biscuit samples (n=40). The biscuit-specific model was

validated with oils extracted from in-house biscuits as well as with oils extracted from commercial

biscuits. Validation with oils from in-house biscuits showed 100% accuracy whereas validation with oils

from commercial biscuits showed 85% accuracy (15% wrongly classified). With the establishment of

thresholds, the false positive rate decreased from 15% to 5%, the accuracy decreased from 85% to

80% and 15% of the samples (3 samples) were referred to the confirmation step. Two out of three


samples referred to the confirmation step were correctly identified using the 6-classes fatty acids

criteria. In conclusion, FTIR spectroscopy coupled with PLS-DA classification technique, followed by

GC fatty acid analysis (when required), offers an insight into the nature of oils and oil admixtures

extracted from biscuits and correctly classifies 100% of the oils extracted from in-house biscuits

(validation set) and 90% of the oil extracted from commercial biscuits (validation set).

The presence of palm oil in confectionery products is widespread. Due to the different nature of

confectionery oils they could not be tested within the developed calibration models built with pure oils

(legacy and new model). New product-specific calibration models for chocolate confectionery products

were developed to answer the question “is there palm oil in a confectionery product, yes or no?”. FTIR

spectroscopy provided very good and promising results on the single detection of palm oil in a

chocolate confectionery product. Validation with in-house oil admixtures as well as with fats extracted

from commercial confectionery products showed 100% accuracy when using FTIR combined with

PLS-DA using a small dataset. Chocolate products with only cocoa butter (higher added value

products) could be confirmed and the presence of palm oil could be detected in those chocolate

products containing palm oil (generally of lower added value). Fatty acid criteria for confectionery

samples were created and successfully identified all oils extracted from commercial confectionery

products for those samples that needed a confirmatory analysis following a non-specific screening

result.

In summary:

The vegetable species identification method performed very well when evaluating the speciation

of unprocessed oil blends when the mixture contains up to two different oils (legacy and high resolution

model design). Due to the harmonisation protocols developed for the interlab trial the method

delivers accurate results on a range of instruments that were used for the spectra acquisition

and confirmatory chromatographic analysis.

The results from this study indicate that the method can be successfully used when testing

processed foods containing vegetable oils; however, a generic method is not possible and

modifications or the development of new calibration models may be necessary in order to adapt its

use to different food product categories. This is because the FTIR calibration model is not wholly

universal for all commercial products currently on the market.

Confectionery fats are very complex products. The presence of palm oil in confectionery

products has been successfully detected using specific PLS-DA calibration models for chocolate

confectionery products (yes/no model). Chocolate products with only cocoa butter (non-palm oil

confectionery) could also be confirmed using this model in the small dataset of commercial samples

that was tested.

In conclusion, the staged procedure consisting of a spectroscopic screening with FTIR and a

chromatographic confirmatory analysis proved effective in identifying the nature of unknown complex

refined vegetable oil blends, both in oils and to some extent in processed foods, with some essential

modifications. The methodology is simple to implement, very affordable in terms of cost per sample

and equipment resources required and yet highly specific. The research showed that a different variation

of the method (a different calibration model) is needed for every product category tested. Further work

is needed to develop a universal (applicable to all products), instrument-agnostic (applicable to all

acquisition instruments) method in order to adequately enforce the legislation.

Project Report to Defra

8. As a guide this report should be no longer than 20 sides of A4. This report is to provide Defra with details of the outputs of the research project for internal purposes; to meet the terms of the contract; and to allow Defra to publish details of the outputs to meet Environmental Information Regulation or Freedom of Information obligations. This short report to Defra does not preclude contractors from also seeking to publish a full, formal scientific report/paper in an appropriate scientific or other journal/publication. Indeed, Defra actively encourages such publications as part of the contract terms. The report to Defra should include:

the objectives as set out in the contract;


the extent to which the objectives set out in the contract have been met;

details of methods used and the results obtained, including statistical analysis (if appropriate);

a discussion of the results and their reliability;

the main implications of the findings;

possible future work; and

any action resulting from the research (e.g. IP, Knowledge Exchange).

FOR THE ABBREVIATIONS USED PLEASE SEE PAGE 45

1. BRIEF BACKGROUND INFORMATION

In 2011, the European Commission (EC) introduced new legislation for labelling of processed foods

containing refined vegetable oils (EU Regulation 1169/2011) and this legislation took effect in 2014. A

number of important changes in the labelling of foodstuffs came into force. According to the legislation,

prepacked food labels should demonstrate clearly in the list of ingredients the vegetable oil species

used in the product. This essentially means that in the case of blended vegetable oils used in food

products, the type of vegetable oil is now clearly identified on the package in contrast to the previous

requirement where an oil blend could be labelled under the generic term “vegetable oil”. Currently

there is no official method that can be used to verify the vegetable oil constituents found in a product

under the new labelling legislation, which will be required to support its enforcement. In 2012, DEFRA

funded a 1-year proof-of-concept research project (FAO117) at Queen's University Belfast which

aimed to develop such a methodology. After a thorough literature review (Osorio et al., 2013), it was

concluded that spectroscopic and chromatographic methods were suitable to tackle this problem

although their application had never been attempted with these particular oil species. Through the

course of that project, the team developed a procedure based on a fusion of spectroscopic and

chromatographic methods for the analysis of binary blends of refined vegetable oils of interest with

emphasis on palm oil and its fractions (stearin and olein). The staged procedure consists of a

screening step (infrared spectroscopy, FTIR) and a confirmation step (chromatographic determination

of fatty acids) coupled with an embedded decision making system. The procedure demonstrated

excellent results when validated with external authentic oil samples in a single lab validation (SLV)

exercise. The extension of the method into foodstuffs (biscuits and confectionery) has been

undertaken within the current project and the reproducibility of the spectroscopic analysis, the fatty

acid criteria and the overall robustness of the method have been studied and re-evaluated.

2. OBJECTIVES

The specific project aims are:

1. To set up an inter-laboratory trial with partners in the UK using different FTIR

spectroscopy instruments.

2. To set up an inter-laboratory trial with partners in the UK using different GCMS (gas

chromatography-mass spectrometry) instruments.

3. Expand vegetable oils reference database including a limited number of other types of oils

such as coconut oil present in processed foods.

4. Update calibration models and update/create fatty acid criteria when needed.

5. Assess robustness of using the method to determine oil species in food matrices (pastry

products): case study with biscuits.

6. Establish a method to detect the presence of palm oil and palm oil species in

confectionery products: case study with chocolate confectionery bars and cakes.

7. Develop and validate the web tool used for data analysis.

8. Link with other EU-wide initiatives and dissemination.

The overall aim is to further improve and validate the developed oil speciation DEFRA method and

SOPs in order to make them fit-for-purpose for policing sustainable labelling of foodstuffs under the

new EU Regulation. This Regulation was driven by consumer awareness and the need for better food

labelling in products across the EU.


3. DATABASE EXPANSION

3.1. Sourcing of refined authentic oils

As for project FAO117, the authenticity of reference vegetable oil samples was crucial for the

reliability of the final project results. Reference refined palm oil and its derivatives (palm stearin and

palm olein), palm kernel oil, sunflower oil, rapeseed oil and coconut oil samples were purchased from

reliable and reputable sources (major food industries and the oil processing industry) and are

representative of the refined oils present in the European/UK market. These oils were sourced globally

and usually refined/fractionated in the EU/UK. The period for the sourcing of oil samples was from

November 2014 to July 2015. Oils used in the confectionery industry were not easy to find and they

were mainly purchased from online retailers. Thus the authenticity of these oils was not verified and it

cannot be guaranteed. These oils were cocoa butter, hydrogenated palm kernel oil, shea butter, illipe

butter, mango kernel, kokum gurgi and sal. The list of all oils purchased for the current project is

shown in Table 1.

Table 1. Details of all oil samples sourced for the project FAO158.

Oil species | Sample name | Usage | Origin | Company | Date of purchase
Palm Oil | POn1 | Calibration | Brazil | Oil processor 1 | 07/14
Palm Oil | POn2 | Calibration | Malaysia | Oil processor 2 | 01/15
Palm Oil | POn3 | Validation | Thailand | Oil processor 3 | 11/14
Palm Oil | POn4 | Calibration | Not provided | Oil processor 4 | 01/15
Palm Oil | POn5 | Validation | Not provided | Oil processor 4 | 02/15
Palm Oil | POn8 | Validation | Malaysia | Oil processor 6 | 03/15
Palm Oil | POn11 | Calibration | Indonesia | Oil processor 9 | 04/15
Palm Oil | POn13 | Validation | Indonesia | Oil processor 9 | 04/15
Palm Oil | POn21 | Validation | Not provided | Oil supplier 1 |
Palm Oil | POn22 | Validation | Not provided | Oil supplier 2 | 06/15
Palm Oil | POn23 | Calibration | Colombia | Oil supplier 2 | 06/15
Palm Kernel Oil | PKOn1 | Calibration | Malaysia | Oil processor 2 | 01/15
Palm Kernel Oil | PKOn2 | Validation | Thailand | Oil processor 3 | 11/14
Palm Kernel Oil | PKOn3 | Calibration | China | Oil supplier 3 | 04/15
Palm Kernel Oil | PKOn4 | Calibration | Not provided | Oil supplier 2 | 06/15
Palm Kernel Oil | PKOn5 | | Not provided | Oil supplier 2 | 06/15
Palm Olein | POln | Calibration | Malaysia | Oil processor 2 | 01/15
Palm Olein | POln2 | Validation | Thailand | Oil processor 3 | 11/14
Palm Olein | POln3 | Calibration | Not provided | Oil processor 5 | 02/15
Palm Stearin | PSn1 | Calibration | Malaysia | Oil processor 2 | 01/15
Palm Stearin | PSn2 | Validation | Thailand | Oil processor 3 | 11/14
Palm Stearin | PSn3 | Calibration | Not provided | Oil processor 4 | 01/15
Palm Stearin | PSn6 | Validation | Not provided | Oil processor 4 | 02/14
Rapeseed Oil | ROn1 | Validation | Not provided | Oil retailer 1 | 01/15
Rapeseed Oil | ROn2 | Calibration | Not provided | Oil retailer 2 | 03/15
Rapeseed Oil | ROn13 | Calibration | Not provided | Online retailer 1 | 04/15
Rapeseed Oil | ROn14 | Calibration | Not provided | Oil supplier 2 | 06/15
Sunflower Oil | SOn1 | Validation | Not provided | Oil retailer 1 | 01/15
Sunflower Oil | SOn2 | Calibration | Not provided | Oil retailer 7 | 03/15
Sunflower Oil | SOn14 | Calibration | Not provided | Online retailer 2 | 04/15
Sunflower Oil | SOn15 | Validation | Not provided | Online retailer 3 |
Sunflower Oil | SOn18 | Calibration | Italy | Oil supplier 2 | 06/15
Coconut Oil | CCO1 | Calibration | Not provided | Online retailer 6 | 04/15
Coconut Oil | CCO3 | Validation | Not provided | Online retailer 8 | 04/15
Coconut Oil | CCO9 | Calibration | Not provided | Oil supplier 2 | 06/15
Coconut Oil | CCO10 | Validation | Not provided | Oil supplier 2 | 06/15
Cocoa Butter | COA1 | | Not provided | Online retailer 14 | 04/15
Cocoa Butter | COA7 | | Not provided | Oil supplier 2 | 06/15
Cocoa Butter | COA8 | | Not provided | Oil supplier 2 | 06/15
Shea butter | ShB1 | | Not provided | Oil supplier 2 | 06/15
Shea butter | ShB2 | | Not provided | Online retailer 19 | 06/15
Mango Kernel | MnB1 | | Not provided | Online retailer 23 | 06/15
Kokum gurgi | KmB1 | | Not provided | Online retailer 27 | 07/15
Kokum gurgi | KmB2 | | Not provided | Online retailer 28 | 07/15
Illipe Butter | IlB1 | | Not provided | Online retailer 29 | 07/15
Illipe Butter | IlB2 | | Not provided | Online retailer 30 | 07/15
Sal | SB1 | | Not provided | Online retailer 31 | 07/15


Authentic oil samples were separated into calibration and prediction sets. Both sets are

independent. Calibration sets are samples used only to create the models and prediction sets are

samples used to test the prediction ability of the models. Calibration samples were added to the whole

FAO117 dataset (calibration + prediction) and prediction samples were used to validate the new

expanded database. New chemometric models were developed and prediction samples were used to

validate the new models.

3.2. Preparation of in-house oil mixtures

New binary oil admixtures including all sourced authentic oils (excluding oils for biscuits and

confectionery products) were created in our laboratory. These binary oil mixtures were (Appendix I):

Palm stearin + palm oil 23 samples

Palm olein + sunflower oil 17 samples

Palm oil + sunflower oil 23 samples

Rapeseed oil + palm kernel oil 18 samples

Sunflower oil + palm kernel oil 16 samples

Palm oil + palm kernel oil 30 samples

Rapeseed oil + sunflower oil 18 samples

Rapeseed oil + Palm oil 24 samples

In addition, a new binary admixture was also prepared:

Coconut oil + Palm oil 46 samples

In the preparation of every admixture, oils from different sources and geographic origins were used

in order to include compositional and geographical variability. All oil samples and resulting admixtures

were stored at -20 °C in glass vials with a headspace of <5% to avoid oxidation.

3.3. Spectral data acquisition with FTIR spectroscopy

FTIR spectroscopy was used as a screening technique in order to create a database of

spectroscopic data of vegetable oil samples. Three replicates were acquired for each sample. All

spectra were pre-processed according to a suitable standardized treatment which includes three

spectral filters: standard normal variate (SNV), first order derivative and Savitzky-Golay smoothing,

applied in a sequential order. Pre-processing of spectral data removed undesired systematic variation

in the data (i.e. baseline drift and wavenumber regions of low information content) and enhanced the

predictive power of multivariate calibration models (Eriksson et al., 2006).
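The sequential treatment described above can be illustrated with a minimal sketch. It is not the project's implementation: the window width (5 points), the polynomial order (quadratic) and the function names are assumptions chosen for illustration, and the derivative is expressed per data point rather than per wavenumber unit.

```python
from statistics import fmean, stdev

# Standard tabulated 5-point quadratic Savitzky-Golay kernels.
# The 5-point window is an assumption; the report does not state the window used.
SG_SMOOTH = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
SG_DERIV1 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mean = fmean(spectrum)
    sd = stdev(spectrum)
    return [(value - mean) / sd for value in spectrum]


def apply_kernel(spectrum, kernel):
    """Slide the kernel over the spectrum; points too close to the edges are dropped."""
    half = len(kernel) // 2
    return [
        sum(weight * spectrum[i + offset - half] for offset, weight in enumerate(kernel))
        for i in range(half, len(spectrum) - half)
    ]


def preprocess(spectrum):
    """Apply SNV, first derivative and smoothing in the order given in the text."""
    scaled = snv(spectrum)
    derivative = apply_kernel(scaled, SG_DERIV1)
    return apply_kernel(derivative, SG_SMOOTH)
```

Each kernel pass shortens the spectrum by four points (two at each end), so a spectrum of n points yields n - 8 pre-processed values in this sketch.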

3.4. Chromatographic determination of fatty acid methyl esters

Fatty acid methyl esters were prepared according to BS684-2.34:2001 part 5 (see SOP, FAO117).

Specific criteria of individual fatty acids (FA) were modified accordingly and new criteria were

developed for the identification of an unknown sample.

3.5. Data analysis

Extended data analysis was undertaken. In advance of chemometric analysis, the datasets were

pre-processed as described in Section 3.3. After the elimination of the unwanted and systematic

variation, Principal Component Analysis (PCA) as an unsupervised pattern recognition technique was

applied for the exploratory data analysis (EDA) in order to simplify, gain better knowledge of datasets

and identify the outliers. In a second step, two supervised pattern recognition techniques were

performed to build up the classification models, Partial Least Squares Discriminant Analysis (PLS-DA)

and Soft Independent Modelling of Class Analogy (SIMCA). PLS-DA is a discriminant technique which

aims to find the variables and directions in the multivariate space which discriminate the established

classes in the calibration set (Berrueta et al., 2007). On the other hand, SIMCA is a class-modelling

technique where each class is independently modelled using PCA, and can be described by a

different number of principal components. For the interpretation of the models, inspection of the

Variable Importance in Projection (VIP) scores was used. The VIP of a predictor is a value that


expresses the contribution of the individual variable in the definition of the F-latent vector model

(Bevilacqua et al., 2012). The SIMCA 14.0 Umetrics™ software (Uppsala, Sweden) and MATLAB R2015b

(The MathWorks Inc., USA) software were used for conducting the chemometric analyses.

Specifically, in the workspace of MATLAB, SIMCA and PLS-DA Matlab functions of Cleiton A. Nunes

(UFLA, MG, Brazil) in combination with some in-house functions allowed us to establish the

identification models. The performance of the classification models produced was evaluated by means

of the most common statistical measures (Oliveri & Downey, 2012). In particular, the samples

belonging to the class being modeled are called true positive (TP) if they are correctly found inside of

class boundaries or false negative (FN) if they fall outside of the boundaries. By analogy, samples

extraneous to that class are referred to as false positive (FP) if they are found within the boundaries

or true negative (TN) if they are correctly outside the boundaries. Boundaries for each class are

defined by the classification technique applied for the development of the classification model. The

selection of these boundaries in the training step and the mapping of the new testing samples in the

validation step is based on the theory of each pattern recognition technique.

Sensitivity is defined as the fraction of samples belonging to the modeled class which is

correctly accepted by the respective model:

$$\text{sensitivity} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$

Specificity is that fraction of samples not belonging to the modeled class that is correctly

rejected by the model:

$$\text{specificity} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$$

Precision is defined as the ratio between the number of samples correctly accepted and the

total number of samples accepted by the same model:

$$\text{precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$

Accuracy or correct classification rate is the percentage of samples correctly classified. It is

used for the evaluation of the outcome of a discriminant classification:

$$\text{accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}$$

Thresholds for the classification decision have been selected for both the 6-class and 12-class

cases (see Section 3.6.2). The thresholds were chosen using a standard cut-off for the

false positive rate (FPR):

$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \leq 5\%$$
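As an illustration of the measures defined above, the following sketch counts TP, FN, FP and TN for one modelled class and evaluates the five formulas. The function names and the example labels are hypothetical and are not project data; the class codes P, PKO and RS are borrowed from the report only as labels.

```python
def confusion_counts(true_labels, predicted_labels, target_class):
    """One-vs-rest counts (TP, FN, FP, TN) for the class being modelled."""
    tp = fn = fp = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == target_class:
            if predicted == target_class:
                tp += 1  # member of the class, accepted by the model
            else:
                fn += 1  # member of the class, rejected by the model
        elif predicted == target_class:
            fp += 1      # extraneous sample, wrongly accepted
        else:
            tn += 1      # extraneous sample, correctly rejected
    return tp, fn, fp, tn


def _ratio(numerator, denominator):
    """Return the ratio, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance(tp, fn, fp, tn):
    """Evaluate the measures exactly as written in the formulas above."""
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "fpr": _ratio(fp, fp + tn),
    }


# Made-up labels for six test samples, evaluated for the class "P".
truth = ["P", "P", "PKO", "RS", "RS", "RS"]
predicted = ["P", "RS", "PKO", "RS", "RS", "P"]
counts = confusion_counts(truth, predicted, "P")  # (TP, FN, FP, TN) = (1, 1, 1, 3)
measures = performance(*counts)
```

In this made-up example the false positive rate for class P is 1/4 = 25%, which would fail the 5% cut-off.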

Testing samples classified with a predicted dummy variable (SIMCA Umetrics™) or probability

(MATLAB) below the thresholds (0.54 and 0.50, respectively) were forwarded to the

confirmation step (gas chromatography analysis) of the proposed analytical method. In the testing

step, a value for each class is generated for every testing sample (a 1×N vector, where N is the

number of model classes) corresponding to the predicted dummy variable (SIMCA Umetrics™)

or probability (MATLAB) that an unknown sample belongs to a class. The maximum of these

values is used as the classification criterion. In each case investigated, the thresholds were set

manually by minimising the FPR for the given testing datasets. Specifically, the threshold

started at a value of 0.5 and was then increased gradually, with the ultimate aim of reaching the highest correct

classification rate while keeping the false positive rate for the testing samples below 5%. If the

false positive rate was more than 5% then the threshold was decreased; otherwise it was increased

until the highest overall classification rate had been achieved.
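The decision rule and the threshold search can be sketched as follows. This is a simplified illustration rather than the procedure actually used, which was carried out manually: it replaces the manual adjustment with a plain grid search from 0.50 upwards, pools the false positive rate over all classes (one-vs-rest), and counts referred samples as not correctly classified. All names, the step size and the example scores are assumptions.

```python
REFERRED = None  # marker for samples forwarded to the GC confirmation step


def screen(scores, class_names, threshold):
    """Assign the class with the highest score, or refer the sample if that score is below the threshold."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best] if scores[best] >= threshold else REFERRED


def evaluate(score_rows, true_labels, class_names, threshold):
    """Return (correct classification rate, pooled false positive rate) at one threshold."""
    predicted = [screen(row, class_names, threshold) for row in score_rows]
    correct = sum(p == t for p, t in zip(predicted, true_labels))
    fp = tn = 0
    for name in class_names:
        for p, t in zip(predicted, true_labels):
            if t != name:          # sample extraneous to this class
                if p == name:
                    fp += 1        # wrongly accepted into the class
                else:
                    tn += 1        # correctly kept out (including referred samples)
    return correct / len(true_labels), fp / (fp + tn)


def choose_threshold(score_rows, true_labels, class_names,
                     start=0.50, stop=1.00, step=0.01, max_fpr=0.05):
    """Lowest threshold giving the highest correct rate subject to FPR <= max_fpr."""
    best = None
    for k in range(round((stop - start) / step) + 1):
        threshold = round(start + k * step, 10)
        rate, fpr = evaluate(score_rows, true_labels, class_names, threshold)
        if fpr <= max_fpr and (best is None or rate > best[1]):
            best = (threshold, rate, fpr)
    return best  # None if no threshold meets the FPR cut-off


# Made-up scores for four test samples and three classes.
names = ["P", "PKO", "RS"]
rows = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.52, 0.08, 0.40], [0.20, 0.10, 0.70]]
truth = ["P", "PKO", "RS", "RS"]
best = choose_threshold(rows, truth, names)
```

In this example the third sample is misclassified as P with a score of 0.52, so the search settles on a threshold of 0.53, at which that sample is referred to the confirmation step instead.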

Additionally, for the MATLAB results, simulated samples were generated using real samples as references in order to add a certain amount of variation to the calibration dataset. This strategy can be very useful for improving the overall classification performance, as shown by the results (see Section 3.6.2). In-house algorithms were developed for changing the baseline, shifting and adding random noise to the real calibration samples in order to produce simulated samples (unpublished work, under review). The user can define the range of values for the amplification factor of the spectral intensifier, which changes the baseline of a spectrum. Moreover, although the random x-axis shifting and noise-adding blocks are not deterministic, the user can select their parameters, i.e. the scale parameter of the Laplacian distribution (shifting along the x-axis) and the signal-to-noise ratio per spectrum, in dB, for the Gaussian noise (adding noise).
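The in-house algorithms are unpublished, so the following is only a sketch of how the three blocks could be combined. It assumes that the spectral intensifier multiplies the intensities by the amplification factor, that the x-axis shift is a single random offset drawn from a Laplacian distribution, and that the noise variance is set from the mean signal power; the function and parameter names are hypothetical.

```python
import numpy as np


def simulate_spectrum(wavenumbers, spectrum, amplification=1.0,
                      shift_scale=None, snr_db=None, rng=None):
    """Return one simulated spectrum derived from a real one (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(spectrum, dtype=float) * amplification  # assumed intensifier

    if shift_scale is not None:
        # Random shift along the x-axis, Laplacian with scale parameter b.
        shift = rng.laplace(loc=0.0, scale=shift_scale)
        order = np.argsort(x)  # np.interp needs increasing x values
        y = np.interp(x, x[order] + shift, y[order])

    if snr_db is not None:
        # Gaussian noise scaled to the requested signal-to-noise ratio in dB.
        signal_power = np.mean(y ** 2)
        noise_power = signal_power / (10 ** (snr_db / 10))
        y = y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

    return y


# Hypothetical single-band spectrum on an arbitrary wavenumber grid.
x = np.linspace(650.0, 3120.0, 500)
y = np.exp(-((x - 1740.0) / 20.0) ** 2)
simulated = simulate_spectrum(x, y, amplification=1.02, shift_scale=0.6,
                              snr_db=30, rng=np.random.default_rng(0))
```

Passing a seeded generator makes the otherwise non-deterministic shifting and noise blocks reproducible.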

3.6. Results and discussion

3.6.1 Database Expansion

The database created during the DEFRA-funded FAO117 project comprised 23 pure oils and 160 oil admixtures, grouped into 6 different classes (PKO, P, RS, PPKO, RSPKO and RSPO). In order to make the models more robust, the database was expanded to include more variability, i.e. more pure oils from different origins and more in-house admixtures. A total of 80 pure oils were purchased from reliable sources for the database expansion. Those pure oils included palm oil (n=23), palm kernel oil (n=5), palm olein (n=3), palm stearin (n=7), rapeseed oil (n=14), sunflower oil (n=18) and coconut oil (n=10). Samples used in the calibration set were excluded from the prediction set. From these 80 pure oils, a total of 52 were used for calibration purposes and 27 were used for testing the new updated database. A total of 215 oil admixtures were prepared in-house: 141 were used for calibration purposes and added to the previous database, and the remaining 74 were used for testing the new updated database. Thus, a total of 193 oils, including pure oils and admixtures, were added to the existing database, and a total of 101 oils, including pure oils and oil admixtures, were used as the prediction set to validate the new expanded database.

3.6.2 Calibration model building – Classification

Substantial differences were observed among different types of pure oils (Figure 1) and oil

admixtures (Figure 2) when all spectra were superimposed.

Figure 1. Superimposed FTIR spectra of 7 pure oils.


Figure 2. Superimposed FTIR spectra of 9 oil admixtures.

Two different classification methods were applied to the pre-processed spectroscopic data of the expanded FTIR database: a) Soft Independent Modelling of Class Analogy (SIMCA) and b) Partial Least Squares Discriminant Analysis (PLS-DA). Both models were developed using specific intervals of the FTIR spectra (from 654.2 to 1875.4 cm⁻¹ and from 2520 to 3120.7 cm⁻¹; the selected 3781 variables were concatenated serially, making them suitable for untargeted analysis). The selection of the specific intervals was based on literature findings, and the aim was to exclude the areas of the spectra without peaks. The calibration set was used to develop the classification models at the 95% confidence level.
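Selecting the two spectral intervals and concatenating them serially can be sketched as below. The interval limits are those quoted in the text; the function name, the wavenumber grid and the random spectra are hypothetical and serve only to make the example run.

```python
import numpy as np

# Interval limits in cm-1, as quoted in the text.
INTERVALS = [(654.2, 1875.4), (2520.0, 3120.7)]


def select_intervals(wavenumbers, spectra, intervals=INTERVALS):
    """Keep only the variables inside the intervals, concatenated in order."""
    wn = np.asarray(wavenumbers, dtype=float)
    X = np.atleast_2d(np.asarray(spectra, dtype=float))
    columns = np.concatenate(
        [np.flatnonzero((wn >= lo) & (wn <= hi)) for lo, hi in intervals]
    )
    return wn[columns], X[:, columns]


# Hypothetical axis and spectra (3 samples), for illustration only.
grid = np.arange(400.0, 4000.0, 0.482)
spectra = np.random.default_rng(0).random((3, grid.size))
wn_selected, X_selected = select_intervals(grid, spectra)
```

The exact number of variables retained depends on where the instrument's grid points fall relative to the interval limits.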

A total of 376 samples were used in the calibration models and 101 in the prediction set. Two chemometric packages, SIMCA 14.0 (Umetrics™) and MATLAB, were used for different purposes.

Various classes were considered in the calibration building phase including:

i. 6 classes legacy design (MODEL A and B) as per previous project (FAO 117) with minor

modifications

ii. 12 classes new high resolution design (MODEL C and D)

The characteristics (R²X and Q²) of the new updated models are shown in Table 2. R² is the percentage of the variation of the calibration set (Y, with PLS) explained by the model. R² is a measure of fit, i.e. how well the model fits the data. R²X is the fraction of the X variation modelled by a component, and R²X (cumulative) is the cumulative R²X up to the specified component. Q² is the percentage of the variation of the calibration set (Y, with PLS) predicted by the model according to cross-validation. Q² indicates how well the model predicts new data. A large Q² (Q² > 0.5) indicates good predictability. Q² (cumulative) is the cumulative Q² up to the specified component. Unlike R²X (cumulative), Q² (cumulative) is not additive. The model characteristics R²X and Q² are generally very good for both models, with the exception of the palm kernel oil (PKO) and coconut oil (CCO) class models, which had lower R²X and Q² values in the 12 classes’ model (Table 2c). The RO class also had a low Q² value in the 12 classes’ model (Table 2c).
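The difference between a fit statistic (R²) and a cross-validated statistic (Q²) can be illustrated with a deliberately simple model. The sketch below uses a straight-line fit with leave-one-out cross-validation instead of PLS, and the data are made up; the exact way the SIMCA software computes R²X and Q² may differ.

```python
import numpy as np


def explained_fraction(y_obs, y_pred):
    """1 - (sum of squared prediction errors) / (total sum of squares)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_err = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_err / ss_tot


def r2_and_q2_loo(x, y):
    """R2 from the full fit; Q2 from leave-one-out predictions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r2 = explained_fraction(y, slope * x + intercept)

    y_cv = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # leave sample i out
        s, c = np.polyfit(x[keep], y[keep], 1)
        y_cv[i] = s * x[i] + c                 # predict the left-out sample
    q2 = explained_fraction(y, y_cv)
    return r2, q2


# Made-up data: a noisy straight line.
x = np.arange(10.0)
y = 2.0 * x + 1.0 + np.array([0.3, -0.2, 0.1, 0.4, -0.5, 0.2, -0.1, 0.3, -0.4, 0.1])
r2_value, q2_value = r2_and_q2_loo(x, y)
```

Because each left-out sample is predicted by a model that never saw it, Q² cannot exceed R² for this kind of fit, and a large gap between the two is a warning sign of overfitting.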


Table 2. a) SIMCA and PLS-DA model characteristics on calibration dataset using FTIR spectral data

on all oil samples for the 6 classes’ models (model A) using SIMCA Umetrics™. b) PLS-DA model

characteristics on calibration dataset using FTIR spectral data on all oil samples for the 6 classes’

models (model B) using MATLAB. c) SIMCA and PLS-DA model characteristics on calibration dataset

using FTIR spectral data on all oil samples for the 12 classes’ models (model C) using SIMCA

Umetrics™. d) PLS-DA model characteristics on calibration dataset using FTIR spectral data on all oil

samples for the 12 classes’ models (model D) using MATLAB.

* R²X (cumulative) is the cumulative R²X up to the specified component. R²X is the fraction of X variation modelled in the component. ** Q² (cumulative) is the cumulative Q² up to the specified component. Q² indicates how well the model predicts new data. P group: palm oil, palm stearin, palm olein; PKOC group: palm kernel oil, coconut oil; RS group: rapeseed oil, sunflower oil, rapeseed and sunflower admixtures; RSP group: RS group + P group; PPKOC group: P group + PKOC group; RSPKOC group: RS group + PKOC; RO: rapeseed oil; SO: sunflower oil; PKO: palm kernel oil; CCO: coconut oil; ROPO: rapeseed and palm oil admixture; SOPO: sunflower and palm oil admixture; ROPKO: rapeseed and palm kernel oil admixture; SOPKO: sunflower and palm kernel oil admixture; ROSO: rapeseed and sunflower oil admixture; PPKO: palm oil and palm kernel oil admixture; PCCO: palm oil and coconut oil admixture.

A   Data   Method   Class                       R²X* (cumulative)   Q²** (cumulative)
    FTIR   SIMCA    P                           0.918               0.872
    FTIR   SIMCA    PKOC                        0.692               0.628
    FTIR   SIMCA    RS                          0.929               0.881
    FTIR   SIMCA    PPKOC                       0.967               0.944
    FTIR   SIMCA    RSP                         0.963               0.951
    FTIR   SIMCA    RSPKOC                      0.961               0.948
    FTIR   PLS-DA   One model for all classes   0.984               0.728

B   Data   Method   Class                       R²X* (cumulative)   Q²** (cumulative)
    FTIR   PLS-DA   One model for all classes   0.946               0.887

C   Data   Method   Class                       R²X* (cumulative)   Q²** (cumulative)
    FTIR   SIMCA    P                           0.918               0.872
    FTIR   SIMCA    RO                          0.684               0.474
    FTIR   SIMCA    SO                          0.829               0.638
    FTIR   SIMCA    PKO                         0.722               0.320
    FTIR   SIMCA    CCO                         0.535               0.286
    FTIR   SIMCA    ROPO                        0.954               0.907
    FTIR   SIMCA    SOPO                        0.960               0.941
    FTIR   SIMCA    ROPKO                       0.936               0.913
    FTIR   SIMCA    SOPKO                       0.960               0.931
    FTIR   SIMCA    ROSO                        0.807               0.733
    FTIR   SIMCA    PPKO                        0.960               0.926
    FTIR   SIMCA    PCCO                        0.963               0.933
    FTIR   PLS-DA   One model for all classes   0.984               0.460

D   Data   Method   Class                       R²X* (cumulative)   Q²** (cumulative)
    FTIR   PLS-DA   One model for all classes   0.936               0.817



The Variable Importance in Projection (VIP) scores estimate the importance of each variable in the projection used in a PLS-DA model and are often used for variable selection. A variable with a VIP score close to or greater than 1 can be considered important in a model. The 10 variables (cm⁻¹) with the highest VIP scores for PLS-DA model A were: 1738.99, 1739.48, 1738.51, 1739.96, 1738.03, 1737.55, 1740.44, 1737.07, 1740.92 and 1736.58. C=C, C=O and C=N bonds (stretching vibrations) normally absorb in this region of the spectrum, e.g. ester C=O stretch, carboxylic acid C=O stretch, etc. The 10 variables (cm⁻¹) with the highest VIP scores for PLS-DA model B were: 2919.7, 2919.22, 2920.18, 1738.99, 1739.48, 1738.51, 1739.96, 1737.07, 1737.55 and 1738.03. The region around 1740 cm⁻¹, related to C=C, C=O and C=N bonds, is therefore also relevant for model B. C-H bonds (stretching vibrations) absorb in the 2900 cm⁻¹ region of the spectrum. The 10 variables with the highest VIP scores for PLS-DA model C were: 1133.46, 1737.07, 1736.58, 1737.55, 1738.03, 1738.51, 1736.1, 1738.99, 2919.7 and 2919.22. The regions around 1740 cm⁻¹ and 2900 cm⁻¹ are also relevant for the 12-class model (model C). The variable with the highest VIP score (3.01) is 1133.46 cm⁻¹, which lies within the fingerprint region (1500-550 cm⁻¹) related to bending vibrations (C-C, C-O, C-N). The 10 variables with the highest VIP scores for PLS-DA model D were: 1736.58, 1767.07, 1736.10, 1753.46, 2919.22, 1737.55, 1753.94, 2918.73, 2919.70 and 1086.89. The regions around 1740 cm⁻¹ and 2900 cm⁻¹ are also relevant for model D. The variable 1086.69 cm⁻¹ is within the fingerprint region (1500-550 cm⁻¹).
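For reference, VIP scores are commonly computed from the weights, scores and Y-loadings of a fitted PLS model. The sketch below implements that common formula with NumPy; it is not taken from the SIMCA or MATLAB code used in the project, and the random matrices merely stand in for the outputs of a real model.

```python
import numpy as np


def vip_scores(W, T, Q):
    """VIP for each of the p variables of a PLS model with A components.

    W: (p, A) X-weights, T: (n, A) X-scores, Q: (m, A) Y-loadings.
    """
    W = np.asarray(W, dtype=float)
    T = np.asarray(T, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p = W.shape[0]
    # Y-variation explained by each component.
    ss = np.sum(T ** 2, axis=0) * np.sum(Q ** 2, axis=0)
    # Weights normalised to unit length per component.
    w_norm = W / np.linalg.norm(W, axis=0, keepdims=True)
    return np.sqrt(p * (w_norm ** 2 @ ss) / ss.sum())


# Stand-in matrices (50 variables, 20 samples, 2 responses, 3 components).
rng = np.random.default_rng(0)
vip = vip_scores(rng.normal(size=(50, 3)), rng.normal(size=(20, 3)),
                 rng.normal(size=(2, 3)))
top10 = np.argsort(vip)[::-1][:10]  # indices of the ten highest-scoring variables
```

With this definition the mean of the squared VIP scores is exactly 1, which is why a score of about 1 is used as the cut-off for an important variable.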

The developed SIMCA and PLS-DA classification models were validated using the prediction set (n=101). The prediction set contained different oils from the ones included in the calibration set. The classification results of the prediction dataset against the SIMCA and PLS-DA models are presented in Table 3. The performance of the classification models was calculated using four parameters: sensitivity, specificity, precision and accuracy (see Section 3.5). Confusion tables can be seen in Appendix II.


Table 3. SIMCA and PLS-DA model performance on prediction dataset (n=101) using FTIR spectral

data.

METHOD                                          TRAINING   ACC (%)   FALSE POSITIVE   AVERAGE         NON CLASSIFIED
                                                SAMPLES              RATE (%)         PRECISION (%)   (%)
MODEL A (using SIMCA Umetrics™) / 6 classes
  SIMCA                                         376        93.07     6.93             91.53           0
  PLS-DA (Lv=17)                                376        90.10     9.90             92.06           0
MODEL A WITH THRESHOLDS / 6 classes
  SIMCA (t=0.05)                                376        33.66     0.99             81.67           65.35
  PLS-DA (t=0.57)                               376        85.15     4.95             78.71           9.90
MODEL B (using MATLAB) / 6 classes
  SIMCA with simulated samples                  1302       76.24     23.76            71.76           0
  PLS-DA with simulated samples (t=0.5) (Lv=11) 1302       95.05     4.95             96.63           0
MODEL C (using SIMCA Umetrics™) / 12 classes
  SIMCA                                         376        89.11     10.89            92.12           0
  PLS-DA (Lv=16)                                376        81.19     18.81            77.30           0
MODEL C WITH THRESHOLDS / 12 classes
  SIMCA (t=0.05)                                376        25.74     0                66.67           74.26
  PLS-DA (t=0.54)                               376        51.49     4.95             59.36           43.56
MODEL D (using MATLAB) / 12 classes
  SIMCA + sim. samples                          2123       34.65     65.35            45.72           0
  PLS-DA + sim. samples (Lv=17)                 2123       91.09     8.91             92.00           0

* For definitions of the terms in the first row, please see ‘Data analysis’ (Section 3.5)

* See Appendix II for confusion tables

In fact, the PLS-DA classification technique performed better when using simulated matrices: the accuracy increased to 95.05%, the false positive rate was less than 5% (4.95%), the average precision was 96.63% and no samples needed to go to the confirmation step.

Permutation tests were performed, and permuted R² and Q² values were obtained, in order to assess whether the PLS-DA model for 6 classes (Model B) is overfitted. For each class, the oil class labels were randomised and the cross-validation calibration procedure was then repeated (20 times). The permuted Q² for every class is negative and lower than the original Q², which indicates that the model is not overfitted (see permutation plots in Appendix III).

One new oil type, coconut oil, was introduced in the expanded database. Going beyond the legacy model design, the classes were rebuilt so that each class contains clearly defined oil type(s). A total of 12 classes with defined oil types were created. The spectroscopic datasets are the same, but the samples were re-grouped into 12 classes instead of 6. The new 12 classes were P (palm oil, palm olein and palm stearin), RO (rapeseed oil), SO (sunflower oil), PKO (palm kernel oil), CCO (coconut oil), ROPO (rapeseed-palm oil admixtures), SOPO (sunflower-palm oil admixtures), ROPKO (rapeseed-palm kernel oil admixtures), SOPKO (sunflower-palm kernel oil admixtures), ROSO (rapeseed-sunflower oil admixtures), PPKO (palm-palm kernel oil admixtures) and PCCO (palm-coconut oil admixtures).


Table 4. Number of simulated samples and parameters used for generating simulated samples for the

6 classes’ model (Model B).

FTIR simulated samples / 6 classes

No.   Class    Actual   New total with       Spectral intensifier   Shifting along x-axis           Gaussian noise
                        simulated samples    (step=0.01)
1     P        68       204                  1.01-1.02              -                               -
2     RS       62       248                  1.01-1.03              -                               -
3     PKOC     14       210                  1.01-1.07              Laplacian distribution, b=0.6   -
4     RSPKOC   51       204                  1.01-1.03              -                               -
5     RSP      107      214                  1.01                   -                               -
6     PPKOC    74       222                  1.01-1.02              -                               -
      TOTAL    376      1302

* P group: palm oil, palm stearin, palm olein; PKOC group: palm kernel oil, coconut oil; RS group: rapeseed oil,

sunflower oil, rapeseed and sunflower admixtures; RSP group: RS group + P group; PPKOC group: P group +

PKOC group; RSPKOC group: RS group + PKOC.

The performance of the SIMCA and PLS-DA models on the prediction dataset using FTIR when considering 12 classes can be seen in Table 3 (Models C and D). Class discrimination using 12 classes proved to be more challenging. A calibration model built with 12 classes provided lower accuracy (89.11% when using SIMCA and 81.19% when using PLS-DA) than the 6 classes’ model. Average precision was higher (92.12%) when using SIMCA compared to the 6 classes’ model, but it decreased to 77.30% when using PLS-DA. Moreover, the false positive rate (i.e. the proportion of samples wrongly assigned to a class they do not belong to) is much higher (10.89% and 18.81% for SIMCA and PLS-DA, respectively) than for the 6 classes’ model. With the aim of decreasing the false positive rate to <5%, thresholds were introduced in the models, as for the 6 classes’ model. These thresholds were t=0.05 for SIMCA and t=0.54 for PLS-DA. The false positive rate decreased to 0% for SIMCA and to 4.95% for PLS-DA (the same value as for the 6 classes’ model with thresholds), at the expense of significantly decreasing the accuracy to 25.74% for SIMCA and 51.49% for PLS-DA. With SIMCA, 74.26% of the samples (75 out of 101) were not classified, and with PLS-DA 43.56% (44 out of 101), which means that these samples need to go to the second step (confirmation step based on fatty acid criteria).

Permutation tests were performed, and permuted R² and Q² values were obtained, in order to assess whether the PLS-DA model for 12 classes (Model C) is spurious, i.e. overfitted. The order of the y-variable was randomly permuted 20 times and separate models were fitted to all the permuted y-variables, extracting 16 components (the same number of components as for the original Y matrix). All permutation plots are shown in Appendix III; they showed no overfitting of the PLS-DA models.

The simulated samples approach was also applied to improve the 12 classes’ model. The actual

number of samples and the final number of samples including the simulated samples for the 12

classes’ model can be seen in Table 5.


Table 5. Number of simulated samples and parameters used for generating simulated samples for the 12 classes’ model (Model D)

FTIR simulated samples / 12 classes

No.   Class   Actual   New total with       Spectral intensifier       Shifting along x-axis           Gaussian noise
                       simulated samples
1     P       68       136                  1.01                       -                               -
2     RO      15       150                  1.005-1.045, step=0.005    -                               -
3     SO      16       144                  1.005-1.040, step=0.005    -                               -
4     PKO     8        152                  1.01-1.06, step=0.01       Laplacian distribution, b=0.6   30dB
5     CCO     6        138                  1.01-1.11, step=0.01       Laplacian distribution, b=0.6   -
6     ROPO    39       195                  1.01-1.04, step=0.01       -                               -
7     SOPO    68       204                  1.01-1.02, step=0.01       -                               -
8     ROPKO   26       182                  1.01-1.06, step=0.01       -                               -
9     SOPKO   25       200                  1.01-1.07, step=0.01       -                               -
10    ROSO    31       217                  1.01-1.06, step=0.01       -                               -
11    PPKO    39       195                  1.01-1.04, step=0.01       -                               -
12    PCCO    35       210                  1.01-1.05, step=0.01       -                               -
      TOTAL   376      2123

*P: palm oil, palm olein and palm stearin; RO: rapeseed oil; SO: sunflower oil; PKO: palm kernel oil; CCO:

coconut oil; ROPO: rapeseed and palm oil admixture; SOPO: sunflower and palm oil admixture; ROPKO:

rapeseed and palm kernel oil admixture; SOPKO: sunflower and palm kernel oil admixture; ROSO: rapeseed and

sunflower oil admixture; PPKO: palm oil and palm kernel oil admixture; PCCO: palm oil and coconut oil admixture.

PLS-DA performed much better than SIMCA when using simulated matrices: the accuracy increased to 91.09% and the average precision was 92.00%, although the false positive rate remained above 5% (8.91%). No samples needed to go to the confirmation step.

Overall, the method with the best performance for the 6 classes’ model is a calibration model built with PLS-DA using MATLAB and simulated samples (MODEL B), whereas for the 12 classes’ model it is a calibration model built with PLS-DA combined with a threshold (t=0.54) (MODEL C). Please note that the best models for 6 and 12 classes are not comparable, as they are different approaches. In the first model, no samples needed to go to the confirmation step, whereas 44 samples were referred to the confirmation step in the second model.

3.6.3 Confirmation step– Fatty acids

PLS-DA performed better than SIMCA for the given problem, and thus SIMCA was excluded from further analyses.

A total of 10 samples (9.90%) were submitted to the confirmation step when using the method A

with thresholds/6 classes and 44 samples (43.56%) when using the method C with thresholds/12

classes.


The criteria for the 6 and 12 classes’ models are shown in Table 6 and Table 8, respectively.

These criteria are applied for the identification of an unknown sample. All conditions have to be met for

a sample to belong in a class. This is applied to all classes. If the unknown sample meets the criteria

of a specific class it is classified in the corresponding class.

The criteria for the 6 classes’ model were modified from the ones of the previous project (FAO117)

since new oils species/oil admixtures were included. Those changes were: C14:0 (5.8-10.0 instead of

7.8-10.0) and C18:2 (43-85 instead of 43-80) for P class; PUFA/SAT ratio (>3.5 instead of >4.0) for

RS class and all criteria for the PKOC class.
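The rule that a sample belongs to a class only when every condition of that class is met can be sketched as follows. Only the limits quoted in the paragraph above are encoded, so the criteria dictionary is deliberately incomplete; the function name, the treatment of range limits as inclusive and the example profile are assumptions.

```python
# Partial criteria (mg fatty acid/g oil), limited to the values quoted in the text.
CRITERIA = {
    "P": {
        "C14:0": lambda v: 5.8 <= v <= 10.0,
        "C18:2": lambda v: 43.0 <= v <= 85.0,
    },
    "RS": {
        "PUFA/SAT": lambda v: v > 3.5,
    },
}


def matching_classes(profile, criteria=CRITERIA):
    """Return every class whose conditions are all satisfied by the profile."""
    return [
        name
        for name, conditions in criteria.items()
        if all(key in profile and check(profile[key])
               for key, check in conditions.items())
    ]


# Hypothetical fatty acid profile of an unknown sample.
sample = {"C14:0": 8.0, "C18:2": 60.0, "PUFA/SAT": 0.2}
identity = matching_classes(sample)
```

Returning a list rather than a single label makes the two special outcomes explicit: an empty list corresponds to an unidentified sample, and a list with two entries corresponds to a sample that meets overlapping criteria.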

Table 6. Criteria expressed in quantities (mg fatty acid/g oil) for 6 classes’ model.

Specific FA

P PKOC RS PPKOC RSP RSPKOC

C8:0 Caprylic acid >8 >2.5 >2.5

C12:0 Lauric acid >0.99 >150 <0.1

C14:0 Myristic acid 5.8-10.0 <0.7

C16:0 Palmitic acid 315-490 50-100 >=70 58-330 35-70

C18:1 Oleic acid >=195

C18:2 Linoleic acid 43-85 <35 135-550 25-75 70-425 24-450

PUFA /SAT (P/S) ratio <0.25 <0.06 >3.5 <=0.3 >=0.325

* FA: fatty acid; P group: palm oil, palm stearin, palm olein; PKOC group: palm kernel oil, coconut oil; RS

group: rapeseed oil, sunflower oil, rapeseed and sunflower admixtures; RSP group: RS group + P group; PPKOC

group: P group + PKOC group; RSPKOC group: RS group + PKOC. PUFA/SAT: polyunsaturated fatty

acids/Saturated fatty acids

All samples (n=10) submitted to the confirmation step according to method A with thresholds/6

classes were successfully identified according to the 6 classes’ criteria (Table 7).

Table 7. Predicted identity of the samples submitted to the confirmation step (fatty acid criteria)

SAMPLE NAME ACTUAL PREDICTED IDENTITY

1 100PKOn2 PKOC PKOC

2 100POln2 P P

3 100POn21 P P

4 100POn22 P P

5 100POn5 P P

6 100POn8 P P

7 70CCO3+30POn3 PPKOC PPKOC

8 26POn3+74SOn4 RSP RSP

9 65POn18+35SOn13 RSP RSP

10 35POn5+65ROn3 RSP RSP

New criteria based on fatty acids were created for the 12 classes, in the same way as the criteria for the 6 classes. Pure oils and admixtures from FAO117 and the current project were used. The new criteria for 12 classes can be seen in Table 8. Due to the similar fatty acid profiles of some of the oils/oil admixtures, there are some overlapping criteria between the RO and ROSO classes and between the PCCO and PPKO classes.


Table 8. Criteria expressed in quantities (mg fatty acid/g oil) for 12 classes’ model

Class FA

PKO RO SO P ROSO ROPKO SOPKO ROPO SOPO PPKO PCCO CCO

C6:0 <3 0 0 0 0

0 0 <1.0 0.1-2.5

>1.0

C8:0 5.0-40

0 0 0 0 <15 <15 0 0 <15 3.0-35

25-50

C10:0 10-30.0

0 0 0 0 <15 <15 0 0 <20 3.0-35

25-50

C12:0 150-400

0 0 >0.5 0 <235 <235 0.02-1.5

0.01-1.25

<250 20-275

250-350

C14:0

5-10

<10 <10 <100 15-125

>100

C16:0 50-100

20-50

30-70

>300 20-60 <70 <70 20-400

50-400

50-400

100-325

50-100

C16:1 0

0.5-1.5

C18:0 <25 5.0-15

15-35

20-45

5.0-30

<25 <25 5.0-35

20-40 15-35 20-35 15-30

C18:1c 80-175

20-600

150-250

150-400

200-600

100-600

100-250

200-600

150-350

125-300

80-250

40-80

C18:2c <30 75-175

300-550

40-85

100-450

15-175 50-400 50-175

50-450

15-75 15-60 5.0-35

C18:3c9,12,15

30-100

<3

<75 2.0-75 0.1-2.0 2.0-90

0.5-2

PUFA/ SAT

<0.07 2.0-4.5

4.5-6.0

<0.27

3.0-6.0

<2.75 <4 <3.25 <5.0 <0.16 <0.16 <0.075

* FA: fatty acid; PKO: palm kernel oil; RO: rapeseed oil; SO: sunflower oil; P: palm oil, palm olein and palm

stearin; ROSO: rapeseed and sunflower oil admixture; ROPKO: rapeseed and palm kernel oil admixture; SOPKO:

sunflower and palm kernel oil admixture; ROPO: rapeseed and palm oil admixture; SOPO: sunflower and palm oil

admixture; PPKO: palm oil and palm kernel oil admixture; PCCO: palm oil and coconut oil admixture; CCO: coconut

oil; PUFA/SAT: polyunsaturated fatty acids/Saturated fatty acids

Thirty-nine out of forty-four samples submitted to the confirmation step according to method C with thresholds/12 classes were successfully identified according to the 12 classes’ criteria (Table 9). Four samples were given two identities because they met all conditions for two classes that are similar in terms of fatty acid profile. One sample did not meet all conditions for any of the 12 classes and was therefore left unidentified.

Table 9. Predicted identity of the samples submitted to the confirmation step (fatty acid criteria)

SAMPLE NAME ACTUAL PREDICTED IDENTITY

1 100POn21 P P

2 100POn22 P P

3 100POn3 P P

4 100POn5 P P

5 100POn8 P P

6 100ROn3 RO UNIDENTIFIED

7 100ROn6 RO RO/ROSO

8 100ROn9 RO RO

9 25ROn1+75SOn1 ROSO ROSO






15 73ROn6+27SOn15 ROSO ROSO/RO


16 22PKO2+78POn3 PPKO PPKO

17 28PKO2+72POn5 PPKO PPKO

18 34PKOn2+66POn8 PPKO PPKO

19 40PKOn2+60POn13 PPKO PPKO

20 46PKOn2+54POn16 PPKO PCCO/PPKO

21 52PKO2+48POn18 PPKO PCCO/PPKO

22 28CCO3+72POn3 PCCO PCCO

23 31CCO4+69PO5 PCCO PCCO

24 25PKOn2+75ROn1 ROPKO ROPKO








32 42PKOn2+58SOn7 SOPKO SOPKO





37 65POn18+35SOn13 SOPO SOPO

38 35POn5+65ROn3 ROPO ROPO







4. INTER-LAB TRIALS AS PART OF METHOD VALIDATION

4.1 FTIR inter-lab trial

It was established from DEFRA project FAO117 that the combination of a two-step analytical

procedure, standard chemometric classification techniques and a vertical decision making process

produced very good results when validated in our lab (intra-lab validation). The analytical procedure

can be summarised as a screening step where a spectroscopic method such as FTIR (untargeted

analysis) is employed in oil admixtures and a confirmation step (targeted analysis) where the identity

of the unidentified samples from the screening step is confirmed by standard fatty acid analysis (GC).

In order to know if the method is ‘instrument-agnostic’ i.e. independent of the instruments used to

acquire the spectra of the oils, an inter-lab validation trial was undertaken.

4.1.1 Participants

Twelve different institutions in the UK including research centres, food industries, public services and

private companies participated in the inter-lab validation.

4.1.2 Samples

A total of nine samples including pure oils and oil admixtures were prepared in our lab and sent to each

of the participants. The oils used for preparing the admixtures were different from the ones included in

the calibration set. They were new oils (Origin: Thailand, Oil processor 3) purchased between August 2014 and December 2014. The pure oil and oil admixture samples were:

o Sample 1: Palm oil (100% PO)

o Sample 2: Rapeseed oil (100% RO)

o Sample 3: Palm kernel oil (100% PKO)

o Sample 4: Rapeseed-palm oil (50% RO-50% PO)

o Sample 5: Rapeseed-palm stearin (70% RO-30% PS)

o Sample 6: Palm kernel oil-palm oil (40% PKO-60% PO)

o Sample 7: Rapeseed oil-Palm kernel oil (50% RO-50% PKO)

o Sample 8: Rapeseed oil-Sunflower oil (40% RO-60% SO)

o Sample 9: Palm olein-rapeseed oil (70% POL-30% RO)

4.1.3 Results

Due to the high variability observed on the spectral data coming from different instruments a new

approach to pre-processing was needed before testing them in our calibration models. Acquisition

parameters varied amongst participants due to the different FTIR instruments and software used.

Duplicates of all spectra were averaged before pre-processing. All spectra for every sample were plotted together to show the variation between participants (Figures 3-11).

Figure 3. Superimposed FTIR spectra of 16 palm oils

Figure 4. Superimposed FTIR spectra of 16 rapeseed oils


Figure 5. Superimposed FTIR spectra of 16 palm kernel oils

Figure 6. Superimposed FTIR spectra of 16 rapeseed oil + palm oil admixture

Figure 7. Superimposed FTIR spectra of 16 rapeseed oil + palm stearin admixture


Figure 8. Superimposed FTIR spectra of 16 palm kernel oil + palm oil admixture

Figure 9. Superimposed FTIR spectra of 16 rapeseed oil + palm kernel oil admixture

Figure 10. Superimposed FTIR spectra of 16 rapeseed oil + sunflower oil admixture


Figure 11. Superimposed FTIR spectra of 16 palm olein + rapeseed oil admixture

The first difference between the spectra recorded using different instruments is the number of variables. Data spacing depends on the resolution and on other acquisition parameters such as zero filling. The spectra used to create the calibration models were recorded at a resolution of 4 cm⁻¹ with four-times zero filling (2 levels), so that the data spacing was 0.482 cm⁻¹ and the number of variables (wavenumbers) was 7157. Other aspects of the spectra that need to be corrected through signal correction filters are baseline slope and peak shifting. The pre-processing techniques included linear interpolation, iCoShift, Standard Normal Variate (SNV), first derivative, Savitzky–Golay and Pareto scaling. A description of these pre-processing techniques can be found in Appendix IV.
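Three of the pre-processing steps listed above can be sketched with NumPy alone: linear interpolation of a spectrum onto a common wavenumber grid, SNV and Pareto scaling. The function names and example data are hypothetical, and peak alignment (iCoShift), the first derivative and Savitzky–Golay smoothing are not covered by this sketch.

```python
import numpy as np


def resample(wavenumbers, spectrum, target_grid):
    """Linearly interpolate one spectrum onto a common wavenumber grid."""
    wn = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(spectrum, dtype=float)
    order = np.argsort(wn)  # np.interp needs increasing x values
    return np.interp(target_grid, wn[order], y[order])


def snv(X):
    """Standard Normal Variate: centre and scale each spectrum (row)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def pareto_scale(X):
    """Centre each variable (column) and divide by the square root of its std."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))


# Hypothetical data: 4 spectra with 50 variables each.
X = np.random.default_rng(1).random((4, 50))
X_snv = snv(X)
X_pareto = pareto_scale(X)
```

Interpolating every participant's spectra onto the grid of the calibration spectra gives all instruments the same number of variables, which is the first difference noted above.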

The FTIR inter-lab trial was conducted before the database expansion performed later on in the

current project (see Section 3) and thus the unequal number of samples amongst classes was

overcome by creating simulated samples that were added to the calibration models in order to create

balanced classes and avoid any biased classification decision. Simulated samples are new samples

created by offsetting the mean spectrum of each class along the Y axis and slightly along the X axis.

These samples were appended to the calibration dataset and the model was re-trained. The offset

percentage along the Y-axis varied between 0 and 25% in order to have a balanced classification

model.

Detailed results and discussion can be found in Appendix IV. Overall, PLS-DA proved to be more powerful than the SIMCA algorithm at correctly assigning unknown samples to any of the oil classes. The problem of misclassification was tackled by establishing thresholds (P values) and adding synthetic samples to the calibration models. The screening method (FTIR) has proved very capable of predicting the nature of both the pure oils and the binary oil admixtures, and has the great advantage of being a fast and easy method to rapidly screen an oil sample for authentication purposes. The initial concept proved to work, as seen in the inter-lab trial validation results, where the majority of the blends can be identified by the chemometric models (PLS-DA) in the screening step and a small percentage of pure oils and oil blends (14% non-classified and 2.3% wrongly classified) are rejected. Those pure oils and/or oil admixtures had to be analysed further using targeted analytical methods such as analysis of fatty acid composition (confirmation step). The fatty acid analysis of the validation samples correctly identified the nature of 16 out of 18 samples (88.9%) referred to the confirmation step when using the PLS-DA algorithm. As a general conclusion, FTIR spectroscopy coupled with the PLS-DA algorithm, followed by fatty acid analysis when required, offers an insight into the nature of pure oils and binary mixtures and correctly classified 96.03% of unknown oil samples in this inter-lab validation.


4.2 Fatty acids inter-lab trial

A second step, or confirmation step, based on fatty acid analysis was established in FAO117 in order to determine the identity of samples that could not be identified in the screening step based on spectroscopic analysis. Criteria were created based on fatty acid data obtained in our laboratory and they proved successful. In order to assess the reproducibility of the fatty acid data obtained in our laboratory, and thus of the fatty acid criteria, an inter-lab validation was undertaken.

4.2.1 Participants

Three different accredited laboratories based in the UK participated in the fatty acid inter-lab trial.

Samples were anonymous and were submitted to the testing laboratories for performing fatty acid

analyses using GC. Each of the laboratories performed the analysis using their own GC instrument

and official method for determination of individual fatty acids in oil samples. The same samples were

also analysed in our laboratory.

4.2.2 Samples

A total of eight samples including pure oils and oil admixtures as well as certified reference

materials from the European Commission- Institute for Reference Materials and Measurements

(IRMM) were submitted to each of the participants. The samples were:

o Sample 1: Standard Soya-Maize oil blend. European Commission, Institute for Reference

Materials and Measurements (IRMM), certified reference material BCR-162R.

o Sample 2: Palm oil and shea butter admixture (50% palm oil + 50% shea butter)

o Sample 3: Palm oil and rapeseed oil admixture (65% palm oil + 35% rapeseed oil)

o Sample 4: Palm kernel oil and palm oil admixture (42% palm kernel oil + 58% palm oil)

o Sample 5: Coconut oil and palm oil admixture (58% coconut oil + 42% palm oil)

o Sample 6: Soybean oil and palm oil admixture (59% soybean oil + 41% palm oil)

o Sample 7: Palm oil

o Sample 8: Standard cocoa butter. European Commission, Institute for Reference Materials and

Measurements (IRMM), certified reference material IRMM-801.

4.2.3 Method

Individual fatty acid concentrations were calculated using the internal standard method, as in phase 1 of the FAO117 project. Response factors were calculated from the external fatty acid standards with respect to C13:0, which was used as the internal standard. The peak area of the individual fatty acid was divided by the peak area of the internal standard, multiplied by the internal standard concentration and by the corresponding response factor, and then corrected for sample weight and dilution factors. Duplicate analyses were then averaged.
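The calculation chain described above can be sketched in a few lines. How the sample weight and dilution factor enter is not spelled out in the text; the sketch assumes that the result is multiplied by the dilution factor and divided by the sample weight. The function name, units and example values are hypothetical.

```python
def fatty_acid_content(area_fa, area_is, conc_is, response_factor,
                       sample_weight_g, dilution_factor=1.0):
    """Fatty acid content per gram of oil by the internal standard method.

    area_fa / area_is : peak areas of the fatty acid and of the internal standard (C13:0)
    conc_is           : amount of internal standard added (assumed to be in mg)
    response_factor   : response factor of the fatty acid relative to C13:0
    """
    amount = (area_fa / area_is) * conc_is * response_factor
    # Assumed correction: scale by dilution, then normalise by sample weight.
    return amount * dilution_factor / sample_weight_g


# Hypothetical duplicate injections of one sample, averaged as in the text.
duplicates = [
    fatty_acid_content(2000.0, 1000.0, 1.0, 1.05, sample_weight_g=0.1),
    fatty_acid_content(2040.0, 1000.0, 1.0, 1.05, sample_weight_g=0.1),
]
mean_content = sum(duplicates) / len(duplicates)
```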

4.2.4 Results

The fatty acid contents of all the oil samples included in this validation trial are presented in Appendix V. The results for the first sample are discussed in this section, as it is a certified standard sample (Table 10). A similar pattern was observed for the rest of the samples analysed.

The relative standard deviation (RSD) was used to evaluate the repeatability of the measurements

taken using different instruments. The results obtained (Table 10) indicate that the repeatability of the

method is acceptable. The RSD of the most abundant fatty acids (palmitic acid, stearic acid, oleic acid,

linoleic acid and linolenic acid) ranged from 0.02 to 0.07 which indicates good repeatability.
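As a worked example, the RSD reported for stearic acid (C18:0) in Table 10 can be reproduced from the four laboratory values. The sketch assumes that the RSD is the sample standard deviation (n-1) divided by the mean and that it is calculated over the four laboratory columns only, excluding the certified value; both assumptions are inferred from the numbers and not stated in the report.

```python
import statistics


def rsd(values):
    """Relative standard deviation as a fraction: sample std (n-1) / mean."""
    return statistics.stdev(values) / statistics.mean(values)


# C18:0 (%), LAB 1 to LAB 4, taken from Table 10.
c18_0_labs = [3.27, 2.90, 2.84, 2.90]
c18_0_rsd = round(rsd(c18_0_labs), 2)
```

Under these assumptions the result rounds to 0.07, matching the value reported for C18:0.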


Table 10. Fatty acid content (expressed in %) of sample 1 (Standard Soya-Maize oil blend,

certified reference material BCR-162R)

FATTY ACIDS      BCR-162R IRMM   LAB 1 (%)   LAB 2 (%)   LAB 3 (%)   LAB 4 (%)   RSD
C6:0             -               0.00        -           0.01        -           2.00
C8:0             -               0.00        0.00        0.01        -           2.00
C10:0            -               0.00        <0.1        0.01        -           1.76
C12:0            -               0.00        <0.1        0.01        -           1.76
C14:0            -               0.04        <0.1        0.05        0.10        0.41
C15:0            -               0.00        -           0.03        -           2.00
C16:0            10.74           11.18       10.90       10.69       11.00       0.02
C16:1c           -               0.06        -           0.12        0.20        0.92
C17:0            -               0.07        -           0.07        0.10        0.71
C17:1c           -               0.03        -           0.08        -           1.43
C18:0            2.82            3.27        2.90        2.84        2.90        0.07
C18:1t           -               0.00        <0.1        0.03        0.10        0.87
C18:1c           25.40           28.58       26.70       26.71       26.60       0.04
C18:2t           -               0.16        <0.1        0.46        0.50        0.67
C18:2c           54.13           52.13       55.30       53.86       53.60       0.02
C20:0            -               0.27        0.40        0.40        0.40        0.17
C18:3c6,9,12     -               0.16        -           0.01        -           1.58
C20:1c           -               0.35        0.30        0.35        0.30        0.09
C18:3c9,12,15    3.35            3.28        3.60        3.75        3.30        0.07
C20:2c           -               0.02        -           0.03        -           1.17
C22:0            -               0.28        <0.1        0.29        0.30        0.39
C23:0            -               0.00        -           0.00        -           -
C24:0            -               0.12        -           0.17        0.10        0.73

5. APPLICATION OF THE METHOD IN PASTRY PRODUCTS (BISCUITS)

5.1 Validation of the FTIR 6 and 12 classes’ models (Model B and C) on commercial biscuits

A total of 20 commercial samples including different types of plain biscuits and brands were

purchased from retailers in the UK (Table 11).

According to the ingredient list on the label, 16 biscuits contain palm oil (PO) and 4 biscuits contain

palm oil and rapeseed oil (PORO). Oils from commercial biscuits were extracted using the method

described in Section 5.2.2 for the extraction of oils from in-house biscuits. FTIR spectra were collected

for all samples. Spectroscopic data of the oils extracted from the commercial biscuits were checked

against the models built using pure oils (model B and C, see Section 3.6.2).


Table 11. List of commercial biscuits purchased from retailers in the UK

COMMERCIAL BISCUITS

Sample code Oil type Country Product type

CMDGV2 PO UK Digestives










CMRTV3 PO UK Rich Tea


CMRTV5 PORO UK Rich Tea








* PO: palm oil; PORO: palm oil and rapeseed oil admixtures.

5.1.1 Results using the 6 classes’ legacy model (Model B)

The 6-classes’ model was used to predict the oil types included in the commercial biscuits and the results were:

o Accuracy (%): 80.00;

o False rate (%): 20.00;

o Average precision (%): 96.67

80% of the samples were correctly identified using Model B, whereas 20% were wrongly predicted, i.e. assigned to the wrong class.

5.1.2 Results using the 12 classes’ high resolution model (Model C)

The 12-classes’ model was used to predict the oil types included in the commercial biscuits and the results were:

o Accuracy (%): 50.00;

o False rate (%): 25.00;

o Samples for the Confirmation Step (%): 25.00 (5 samples out of 20)

Accuracy was lower than with the 6-classes’ model (50% vs 80%). 25% of the samples were wrongly assigned to classes and the remaining 25% were unidentified and were referred to the confirmation step based on fatty acid criteria.


5.2 Development of specific biscuit-only model

Specific models using the FTIR spectroscopic data of in-house biscuits were created to compare to the

results obtained with the previous models (Section 5.1).

5.2.1 Samples

Pure oils were purchased from wholesalers, retailers and supermarkets in the UK (Table 12). These oils were palm oil (PO) (n=12) and rapeseed oil (RO) (n=10), which are the most common oils used in the biscuit sector.

Table 12. Details of samples (calibration and prediction set) used for the biscuit-only model

Oil species

Sample code Usage Origin Company

Palm Oil (PO)

POn3 Prediction Thailand Oil processor 3

POn4 Calibration UK Oil processor 4


POn6 Prediction UK Oil processor 4

POn7 Calibration Not provided Oil processor 5

POn8 Calibration Malaysia Oil processor 6



POn12 Calibration Indonesia Oil processor 9

POn16 Calibration Indonesia Oil processor 9

POn19 Prediction Indonesia Oil processor 9


Rapeseed Oil (RO)

ROn1 Prediction Not provided Oil retailer 1

ROn2 Calibration Not provided Oil retailer 2

ROn3 Calibration UK Oil retailer 3





ROn8 Calibration More than one country

Oil retailer 7

ROn9 Calibration UK Oil retailer 3



ROn12 Calibration Belgium Oil retailer 8

A brief market survey was undertaken to establish the combinations of vegetable oil species used in the making of plain biscuits. Two different types of biscuit, digestive (DG) and rich tea (RT), from different brands were studied as they are the most typical plain biscuits on the market. The most common oils/oil admixtures found in biscuits are as follows:

- Palm oil (PO)

- Rapeseed oil (RO)

- Palm oil and rapeseed oil admixtures (PORO)

5.2.2 In-house biscuits preparation and extraction process

Digestive and rich tea biscuits were baked in our laboratory following the recipe and baking conditions obtained from industry sources. The ingredients list is presented in Table 13. All ingredients were weighed and mixed together. Palm oil (PO), rapeseed oil (RO) and PORO (palm oil and rapeseed oil admixture) were added to the biscuits accordingly. Biscuits were baked for 10 minutes at 170°C. Digestive biscuits (DG) were prepared using two different oils/oil admixtures, PO and RO, and rich tea biscuits (RT) were prepared using two different oils/oil admixtures, PO and PORO.

Table 13. Formulation derived from industry practices (Manley a, 2001; Manley b, 2001).

Baking method: 170°C for 10 min

Ingredients Digestives (weight in g/biscuit) Rich Tea (weight in g/biscuit)

Wholemeal flour 3

Plain flour 15 27.6

Sugar 3.9 6.9

Syrup 0.6 1.8

Soda 0.5 0.15

Salt 0.3 0.2

Water 2 6

Vegetable oils 6 6

In-house biscuits were finely ground for the extraction of the oils. Extractions were done with n-hexane as follows. The ground biscuit powder was mixed with n-hexane (1:2) in 50 mL centrifuge tubes (13-15 g of biscuit powder with 30 mL of n-hexane in each tube) on a roller mixer for 1 hour (33 rpm with 16 mm amplitude) to dissolve the oils in the solvent. Afterwards, the tubes containing the biscuit powder and solvent were centrifuged at 3000 ×g for 10 min to separate the powder from the solvent. The upper layer, containing the oil dissolved in the solvent, was transferred immediately into a 50 mL round-bottomed flask and the solvent was evaporated using a rotary evaporator (60°C and 160 rpm for 15 min). After evaporation of the solvent, the oil was transferred to a small vial and kept at -20°C until further analysis.

5.2.3 FTIR spectral data acquisition

FTIR spectroscopy was used as a screening technique to collect spectroscopic data from the oils present in biscuits. The procedure and spectroscopic conditions were the same as those used for building the spectroscopic database of pure oils. Three replicates were obtained for each sample. Samples were defrosted and heated at 50°C for 3-5 minutes prior to spectra collection.

All spectra were pre-processed according to a standardized treatment comprising three spectral filters applied in sequential order: standard normal variate (SNV), first order derivative and Savitzky-Golay smoothing.
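
The three-filter sequence can be sketched in plain Python as below. This is an illustration under stated assumptions, not the software used in the project: the derivative is approximated by finite differences, and the Savitzky-Golay step uses the standard 5-point quadratic smoothing coefficients (-3, 12, 17, 12, -3)/35, whereas the window width and polynomial order actually used are not given in this section. All function names are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def first_derivative(spectrum):
    """Finite-difference first derivative (central inside, one-sided at the ends)."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two points")
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((spectrum[hi] - spectrum[lo]) / (hi - lo))
    return out


# Assumed window: 5-point quadratic Savitzky-Golay smoothing coefficients
SG5 = (-3, 12, 17, 12, -3)


def savgol_smooth5(spectrum):
    """Savitzky-Golay smoothing; the two points at each end are left unchanged."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(c * spectrum[i + k - 2] for k, c in enumerate(SG5)) / 35.0
    return out


def preprocess(spectrum):
    """Apply the filters in the order given in the text."""
    return savgol_smooth5(first_derivative(snv(spectrum)))
```

Each replicate spectrum would be passed through `preprocess` independently before model building.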

5.2.4 Biscuit-only model building: calibration models and validation

5.2.4.1 Biscuit dataset building

Calibration models were built using spectroscopic data from oils extracted from the in-house biscuits (n=40). Samples were divided into two independent sets, the calibration set (n=40) and the prediction set (n=14), and were assigned to classes.

Calibration models using FTIR data were built for 3 classes (RO, PO and PORO). Only PLS-DA was used as the chemometric technique, as it had proved to perform better than SIMCA.

The model characteristics (R2 and Q2) are shown in Table 14. R2 is the percent of variation of the calibration set (Y with PLS) explained by the model. R2 is a measure of fit, i.e. how well the model fits the data. R2X is the fraction of X variation modeled in the component and R2X (cumulative) is the cumulative R2X up to the specified component. Q2 is the percent of variation of the calibration set (Y with PLS) predicted by the model according to cross validation. Q2 indicates how well the model predicts new data. A large Q2 (Q2 > 0.5) indicates good predictability. Q2 (cumulative) is the cumulative Q2 up to the specified component. Unlike R2X (cum), Q2 (cum) is not additive.
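
For orientation, the widely used definitions of these two statistics for a response Y are R2 = 1 - RSS/TSS, computed from the fitted values, and Q2 = 1 - PRESS/TSS, computed from cross-validated predictions. The sketch below assumes that the chemometric software follows these definitions, which the report does not state explicitly; it covers the Y-based statistics only, not R2X, and the function names are hypothetical.

```python
def _one_minus_ratio(y_observed, y_estimated):
    """1 - (sum of squared errors) / (total sum of squares about the mean)."""
    y_mean = sum(y_observed) / len(y_observed)
    total_ss = sum((y - y_mean) ** 2 for y in y_observed)
    error_ss = sum((y - e) ** 2 for y, e in zip(y_observed, y_estimated))
    return 1.0 - error_ss / total_ss


def r2(y_observed, y_fitted):
    """Goodness of fit: y_fitted comes from the model built on all calibration samples."""
    return _one_minus_ratio(y_observed, y_fitted)


def q2(y_observed, y_cross_validated):
    """Goodness of prediction: each value is predicted with that sample left out."""
    return _one_minus_ratio(y_observed, y_cross_validated)
```

The two functions share the same arithmetic; the difference lies entirely in whether the estimates were obtained with or without the sample being estimated.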


Table 14. PLS-DA model characteristics on calibration dataset using FTIR variables on all oil samples for the 3 classes’ model (n=40).

* R2 is the percent of variation of the training set (Y with PLS) explained by the model; ** Q2 indicates how well the model predicts new data.

The model characteristics R2X and Q2 are good for the 3-classes model based on FTIR, as seen in Table 14.

5.2.4.2 Validation of the biscuit model with in-house biscuits

The developed PLS-DA classification models were validated using the prediction set (n=14). The prediction set contained different oils from those included in the calibration set. The performance of the classification models was calculated using four parameters: sensitivity, specificity, precision and accuracy (see Section 3.5). The classification results of the prediction dataset against the PLS-DA models are as follows:

- Accuracy (%): 100.00

- False rate (%): 0.00

- Average precision (%): 100.00

All in-house biscuit oils (n=14) were correctly classified when using FTIR and PLS-DA. The performance of the classification models and the confusion table can be found in Tables 15a and 15b.

Table 15a. Performance of the classification model (oils extracted from in-house biscuits, FTIR,

PLS-DA) on the validation samples (oils extracted from in-house biscuits, n=14).

Description Statistical Measures

ACC (%) Class TP TN FP FN Sensitivity or TPR Specificity FPR Precision F1 score

Application of PLS-DA

100 PO 6 8 0 0 1.00 1.00 0.00 1.00 1.00

PORO 6 8 0 0 1.00 1.00 0.00 1.00 1.00

RO 2 12 0 0 1.00 1.00 0.00 1.00 1.00

*ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.

Table 15b. Confusion table for results presented in Table 15a

Actual

Predicted

PO PORO RO Total Sensitivity (%)

PO 6 0 0 6 100.0

PORO 0 6 0 6 100.0

RO 0 0 2 2 100.0

Precision (%) 100.0 100.0 100.0

Average sensitivity 100.0

Average precision 100.0

Overall accuracy 100.0

Class R2X* (cumulative) Q2** (cumulative)

PLS-DA One model for all classes 0.942 0.783


* PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.

Six samples were correctly classified as belonging to the PO class, another six samples were correctly classified as belonging to the PORO class and the last two samples were correctly classified as belonging to the RO class. Thus the overall accuracy was 100% and the false positive rate 0%.

5.2.4.3 Validation of the biscuit-only model with commercial biscuits

The developed PLS-DA classification models based on oils extracted from in-house biscuits were validated using the oils extracted from the commercial biscuits (n=20). The performance of the classification models was calculated using four parameters: sensitivity, specificity, precision and accuracy (see Section 3.5). The classification results of the prediction dataset against the PLS-DA models are as follows:

- Accuracy (%): 85.00

- False rate (%): 15.00

- Average precision (%): 75.00

85% (17 samples) of the oils extracted from the commercial biscuits were correctly classified when using FTIR and PLS-DA, whereas 15% (3 samples) were wrongly classified. The performance of the classification models and the confusion table can be found in Tables 16a and 16b.


Table 16a. Performance of the classification model (oils extracted from in-house biscuits, FTIR, PLS-DA) on the validation samples (oils extracted from commercial biscuits, n=20).


ACC (%) Class TP TN FP FN Sensitivity or TPR Specificity FPR Precision F1 score

Application of PLS-DA

85 PO 16 1 3 0 1.00 0.25 0.75 0.84 0.91

PORO 1 16 0 3 0.25 1.00 0.00 1.00 0.40

RO 0 20 0 0 1.00 1.00 0.00 1.00 1.00

* ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.

Table 16b. Confusion table for results presented in Table 16a


Actual

Predicted

PO PORO RO Total Sensitivity (%)

PO 16 0 0 16 100.00

PORO 3 1 0 4 25.00

RO 0 0 0 0 0.00

Precision (%) 84.21 100.00 0.00





Three samples of oils extracted from commercial biscuits containing PORO were wrongly classified in the PO class.
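
The per-class statistical measures reported in the performance tables follow from the TP, TN, FP and FN counts through the usual definitions. The sketch below is an illustration with hypothetical function names, not the project's own code; ratios with a zero denominator are returned as `None` rather than assigned a value.

```python
def _ratio(numerator, denominator):
    """Safe division: None when the denominator is zero."""
    return numerator / denominator if denominator else None


def class_metrics(tp, tn, fp, fn):
    """Sensitivity (TPR), specificity, FPR, precision and F1 score for one class."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    fpr = _ratio(fp, fp + tn)
    precision = _ratio(tp, tp + fp)
    if sensitivity is None or precision is None or sensitivity + precision == 0:
        f1 = None
    else:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "fpr": fpr, "precision": precision, "f1": f1}


def overall_accuracy(n_correct, n_total):
    """Percentage of samples assigned to their true class."""
    return 100.0 * n_correct / n_total


# PO and PORO counts for the commercial biscuits (n=20)
po = class_metrics(tp=16, tn=1, fp=3, fn=0)
poro = class_metrics(tp=1, tn=16, fp=0, fn=3)
```

For the PO class this gives a precision of 16/19 and a specificity of 1/4, in line with the rounded values in the table.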

In order to decrease the false positive rate (<5%), thresholds were established (t=0.70) and the performance of the model was as follows:

- Accuracy (%): 80.00

- False rate (%): 5.00

- Samples for the confirmation step (%): 15.00

80% (16 samples) of the oils extracted from the commercial biscuits were correctly classified when using FTIR and PLS-DA, whereas 5% (1 sample) was wrongly classified. Three samples (15%) were not assigned to any class and were therefore submitted to the confirmation step based on fatty acid criteria. The performance of the classification models and the confusion table can be found in Tables 17a and 17b. One sample of oil extracted from commercial biscuits containing PORO was wrongly classified in the PO class.
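
One plausible reading of the threshold rule is sketched below: a sample is assigned to the class with the highest PLS-DA predicted value only if that value reaches t, and is otherwise referred to the confirmation step. This decision rule is an assumption made for illustration, as the section gives only the threshold value; the function names and the example scores are hypothetical.

```python
def assign_class(scores, threshold=0.70):
    """Return the best-scoring class, or 'confirmation' if it falls below the threshold."""
    best_class = max(scores, key=scores.get)
    if scores[best_class] >= threshold:
        return best_class
    return "confirmation"


def outcome_rates(n_correct, n_wrong, n_confirmation):
    """Accuracy, false rate and confirmation-step rate, each in percent."""
    total = n_correct + n_wrong + n_confirmation
    return tuple(100.0 * n / total for n in (n_correct, n_wrong, n_confirmation))


# Hypothetical predicted values for two samples
print(assign_class({"PO": 0.90, "PORO": 0.10, "RO": 0.00}))
print(assign_class({"PO": 0.55, "PORO": 0.45, "RO": 0.00}))
# Counts reported for the commercial biscuits with t=0.70
print(outcome_rates(16, 1, 3))
```

Raising the threshold trades wrong assignments for unassigned samples, which is why the false rate falls while the number of samples sent to the confirmation step rises.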


Table 17a. Performance of the classification model (oils extracted from in-house biscuits, FTIR, PLS-DA, thresholds) on the validation samples (oils extracted from commercial biscuits, n=20).


ACC (%) Class TP TN FP FN Sensitivity or TPR Specificity FPR Precision F1 score

Application of PLS-DA

80 PO 15 3 1 1 0.94 0.75 0.25 0.94 0.94

PORO 1 16 0 3 0.25 1.00 0.00 1.00 0.40

RO 0 20 0 0 1.00 1.00 0.00 1.00 1.00

* ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.

Table 17b. Confusion table for results presented in Table 17a


Actual

Predicted

PO PORO RO Confirmation Total Sensitivity (%)

PO 15 0 0 1 16 93.75

PORO 1 1 0 2 4 25.00

RO 0 0 0 0 0 0.00

Precision (%) 93.75 100.00 0.00





5.2.5 Confirmation step: fatty acids

Three samples of oil extracted from commercial biscuits were submitted to the confirmation step.

These samples were:

- CMDGV11: containing palm oil

- CMRTV6: containing palm oil and rapeseed oil

- CMRTV9: containing palm oil and rapeseed oil


According to the 6 classes’ criteria (Table 6) sample CMDGV11 was classified as belonging to the

group P and sample CMRTV6 was classified as belonging to the group RSPO (Table 18a). Sample

CMRTV9 did not meet all the conditions of any class so remained unidentified but the high values of

palmitic acid indicate it may contain palm species.

Table 18a. Application of 6 classes’ fatty acid criteria to the unidentified biscuit samples

Specific FA (mg FA/g oil) CMDGV11 CMRTV6 CMRTV9

C8:0 Caprylic acid 0.09 0.09 1.95

C12:0 Lauric acid 1.25 1.12 1.08

C14:0 Myristic acid 6.25 5.15 5.16

C16:0 Palmitic acid 310.61 259.29 243.79

C18:1 Oleic acid 223.03 252.01 183.16

C18:2 c Linoleic acid 72.37 82.52 34.64

PUFA/SAT index 0.22 0.35 0.14

GC CRITERIA RESULT (assigned class) P RSPO Not conclusive result. Contains P

ACTUAL IDENTITY Palm oil biscuit (Oil retailer 7) Rapeseed oil and palm oil (Oil retailer 7) Rapeseed oil and palm oil (Oil retailer 1)

* PUFA/SAT: polyunsaturated fatty acids/saturated fatty acids.

Similar results (Table 18b) were obtained when applying the 12 classes’ criteria (Table 8). Sample

CMDGV11 was identified as being a pure palm oil sample and sample CMRTV6 was identified as

being a rapeseed and palm oil admixture. Sample CMRTV9 again did not meet the criteria of any of the classes and so remained unidentified; however, the high values of palmitic acid indicate it may contain palm species.

Table 18b. Application of 12 classes’ fatty acid criteria to the unidentified biscuit samples

Specific FA (mg FA/g oil) CMDGV11 CMRTV6 CMRTV9

C8:0 Caprylic acid 0.09 0.09 1.95

C12:0 Lauric acid 1.25 1.12 1.08

C14:0 Myristic acid 6.25 5.15 5.16

C16:0 Palmitic acid 310.61 259.29 243.79

C18:1 Oleic acid 223.03 252.01 183.16

C18:2c Linoleic acid 72.37 82.52 34.64

PUFA/SAT index 0.22 0.35 0.14

GC CRITERIA RESULT (assigned class) PO ROPO Not conclusive result. Contains P

ACTUAL IDENTITY Palm oil biscuit (Oil retailer 7) Rapeseed oil and palm oil (Oil retailer 7) Rapeseed oil and palm oil (Oil retailer 1)



6. APPLICATION OF THE METHOD IN CONFECTIONERY PRODUCTS

6.1 Background

Palm oil is used as an ingredient in the production of confectionery products. Palm oil and palm

kernel oil fractions provide ideal functional properties in the development of confectionery fats. The

confectionery market can be divided into two broad sectors: chocolate confectionery (‘countlines’ and

moulded bars, blocks, boxed chocolates and bite-size products), and sugar confectionery (including

fruit sweets, mints and chewing gum). Confectionery fats are used in coatings, fillings, toffees and

caramels and ice-cream. Vegetable fats have been used in chocolate and chocolate-like coatings for

many years. Current EU legislation restricts their use to 5% of specific fats, if the product is being sold

as chocolate, and also requires very clear labelling. Confectionery with greater than 5% Cocoa Butter

Equivalent (CBE) cannot be labelled as chocolate. If higher levels or other fats are used, it must be

sold under another name, such as a chocolate flavoured coating. Legislation varies elsewhere in the

world with a few countries even allowing all the cocoa butter to be replaced by other fats. In the EU,

there is a Chocolate Directive which defines milk chocolate as having a minimum fat content of

25%, not including vegetable oils. For a typical low-cost milk chocolate recipe with 28.3% fat, the

maximum vegetable fat that can be added is 3.3%. Chocolate for use in ice cream or similar may

contain up to 5% vegetable fat other than cocoa butter (EU, 2012). The Chocolate Directive does not

cover chocolate coatings and fillings. Chocolate fillings may include for example, hazelnut, praline,

toffee, wafers or other fat products. Fillings in chocolates may use coconut or palm kernel oils or CBE

containing palm mid-fractions (EU, 2012).

There are three types of palm-based confectionery fats used in chocolate:

1. Cocoa Butter Equivalents (CBE) (Non-lauric fats, temper): Vegetable fats with similar chemical and physical characteristics to cocoa butter, which can hence be used interchangeably with cocoa butter in any recipe. They are like cocoa butter (i.e. palm oil mid-fractions), with a triacylglycerol composition (POP, POSt, StOSt) similar to that of cocoa butter (CB), and can be added in any proportion without causing a significant softening or hardening effect. A standard Cocoa Butter Equivalent (CBE) contains around 50% exotic fats and 50% palm oil. ‘Soft’ cocoa butter equivalents are typically used in the UK and Ireland, and contain up to 30% exotics and 70% palm mid-fractions (compared to ‘hard’ CBE, which contain higher proportions of exotics to palm oil) (EU, 2012).

2. Cocoa Butter Replacers (CBR) (Non-lauric fats, non-temper): Vegetable fats of a non-lauric

origin with similar physical, but not chemical characteristics to cocoa butter and which can be used to

replace most of the cocoa butter in coating applications. They are partially compatible with CB, adding

up to 20-30% of CB in the fat phase. They can also replace cocoa fats entirely (partly hydrogenated

double fractionated palm olein).

3. Cocoa Butter Substitutes (CBS) (Lauric fats): Vegetable fats of a lauric origin with similar

physical, but not chemical characteristics to cocoa butter and which can be used to replace almost all

of the cocoa butter in coating applications. Toffees are also likely to contain palm oil, at around 7.5%

of volume. Toffee fats can include interesterified hydrogenated palm kernel oil and palm oil,

hydrogenated palm kernel stearin, palm kernel olein and palm olein or hydrogenated palm kernel oil.

6.2 Samples

Oils used in the confectionery industry were sourced from retail companies. Exotic oils were not

easy to find so a low number of samples were purchased from online retailers and authenticity was not

guaranteed. These oils were cocoa butter (n=8), hydrogenated palm kernel oil (n=1), shea butter

(n=5), illipe butter (n=2), mango kernel (n=4), kokum gurgi (n=2) and sal (n=1) (see Section 3.1, Table

1).

Additionally, three types of confectionery products containing chocolate were purchased from local

supermarkets. These were:

o Confectionery product 1: bar of two crispy wafer fingers covered with milk chocolate (66%). Fats

included in the ingredients list are: Cocoa butter, vegetable fat (Palm

Kernel/Palm/Shea/Sal/Illipe/Kokum Gurgi/Mango Kernel) and butterfat (from milk).

o Confectionery product 2: contains milk chocolate (35%) covered caramel (32%) and biscuit


(26%). Fats included in the ingredients list are: Palm fat, cocoa butter and milk fat.

o Confectionery products 3:

Brand 1- Sponge cakes with dark crackly chocolate and a smashing orangey centre.

Fats listed in the ingredients list are: Vegetable fats (palm, sal and/or shea), butter oil

(milk) and cocoa butter for the chocolate coating (19%) and vegetable oils (sunflower,

palm) for the rest of the cake.

Brand 2- Sponge cakes with dark crackly chocolate and a smashing orangey centre.

Fats listed in the ingredients list are: Palm oil for the biscuit (38%) and cocoa butter for

the chocolate coating (17%).

6.3 In-house admixtures of pure oils

Oil admixtures including oils and butters widely used in the confectionery industry (palm oil, palm

kernel, hydrogenated palm kernel oil, shea butter, cocoa butter, sal, kokum butter, illipe butter and

mango kernel butter) were created in our laboratory for model validation purposes. These oil

admixtures were intended to mimic some of the most popular oil admixtures used in the confectionery

industry and were as follows:

Confectionery admixture 1 (sample code EBM1):

- Cocoa butter (70 %) COA2

- Palm oil (30 %) POn9



- Hydrogenated palm kernel oil (20 %) PKOn5





- Shea butter (5 %) ShB1

- Sal (5 %) SB1




- Palm kernel oil (4 %) PKOn4

- Sal (2 %) SB1

- Illipe (2 %) IlB1

- Kokum (2 %) KmB1

- Mango kernel (2 %) MnB1







- Palm kernel oil (50 %) PKOn4

- Coconut oil (50 %) CCO9





- Sal (2 %) SB1

- Illipe (4 %) IlB1

- Kokum (2 %) KmB1






- Sal (2 %) SB1

- Illipe (5 %) IlB1






- Sal (4 %) SB1








- Sal (4 %) SB1





- Sal (2 %) SB1

- Sunflower oil (20 %) SOn5





- Palm kernel oil (10%) PKOn3

- Sal (4 %) SB1

- Illipe (4 %) IlB1


- Kokum (4%) KmB1


6.4 Fat extraction of commercial confectionery products

Confectionery chocolate products (1, 2 and 3) were analysed as a whole as well as in parts. Confectionery product 1 was separated using a sharp knife into two parts: the chocolate coating and the wafer fingers with filling. Confectionery product 2 was separated into three parts: the chocolate coating, the caramel and the biscuit. Confectionery product 3 (brand 1 and brand 2) was divided into three sections: the chocolate coating, the orangey centre and the biscuit.

All samples were manually milled into powder/fine particles using a knife or a wooden stick. 10 g of sample was mixed with 30 mL of hexane in a 50 mL centrifuge tube. Tubes were mixed in a tube mixer at 2500 rpm for 2 minutes and then left for 1 hour in a rotary mixer (33 rpm) to let the fat dissolve in the solvent. Tubes were centrifuged at 3000 rpm for 10 minutes until the phases had separated completely. The upper layer, containing the fat dissolved in hexane, was transferred to a round-bottomed flask. Another 30 mL of hexane was added to the remaining bottom layer for a second extraction, following the same procedure as for the first extraction. The second upper layer, containing the remaining fat dissolved in hexane, was transferred to the same round-bottomed flask and combined with the first extract.

Solvent was evaporated using a rotary evaporator at 50°C for 15 minutes (160 rpm). The fat was then weighed and transferred into small plastic tubes. The extraction procedure was repeated as many times as needed to obtain the required amount of oil sample (approx. 3 g). Nitrogen was injected into the headspace to prevent oxidation. Oil samples were stored at -20°C until analysis.

6.5 Spectral Data Acquisition with FTIR spectroscopy

FTIR and Raman spectroscopy were used as screening techniques in order to collect

spectroscopic data from the oils present in confectionery products. The procedure and spectroscopic

conditions were the same as described in the SOP (FAO117). Three and two replicates were obtained

for each sample, respectively.

All spectra were pre-processed according to a standardized treatment comprising three spectral filters applied in sequential order: standard normal variate (SNV), first order derivative and Savitzky-Golay smoothing.

6.6 Confectionery-only model building: calibration models and validation

6.6.1 Dataset building

Calibration models were built using spectroscopic data from pure oils as well as oil admixtures likely to be present in confectionery products. The pure oils used for the calibration models were palm oil (n=20), palm kernel oil (n=7), palm olein (n=3), hydrogenated palm kernel oil (n=1) and cocoa butter (n=7). The number of different oil types and oil combinations in a confectionery product is high, which makes the identification of oil species in a confectionery product very challenging. A large number of simulated samples/admixtures were generated in order to cover all the potential compositional ranges met in commercial confectionery products. The simulated oil admixtures were:

o CB+PO = CB (99%-69% (5%)) + PO (1%-31% (5%)) ---Oil admixtures of cocoa butter (ranging from 69% to 99% in intervals of 5%) and palm oil (ranging from 1% to 31% in intervals of 5%) (n=245)

o CB+PO+SB = CB (95%-99% (4%)) + PO (3.5%-0.5% (3%)) + SB (1.5%-0.5% (1%)) ---Oil

admixtures of cocoa butter (ranging from 95% to 99% in intervals of 4%), palm oil (ranging from

0.5% to 3.5% in intervals of 3%) and sal butter (ranging from 0.5% to 1.5% in intervals of 1%)

(n=350)

o CB+PO+ShB = CB (95%-99% (4 %)) + PO (3.5%-0.5% (3%)) + ShB (1.5%-0.5% (1%)) ---Oil

admixtures of cocoa butter (ranging from 95% to 99% in intervals of 4%), palm oil (ranging from

0.5% to 3.5% in intervals of 3%) and shea butter (ranging from 0.5% to 1.5% in intervals of 1%)

(n=350)

o CB+PO+SB+ShB = CB (95%-99% (4%)) + PO (3.5%-0.5% (3%)) + SB (0.75%-0.25% (0.5%)) +

ShB (0.75%-0.25% (0.5%)) ---Oil admixtures of cocoa butter (ranging from 95% to 99% in

intervals of 4%), palm oil (ranging from 0.5% to 3.5% in intervals of 3%), sal butter (ranging from

0.25% to 0.75% in intervals of 0.5%) and shea butter (ranging from 0.25% to 0.75% in intervals

of 0.5%) (n=350)

o CB+PO+SB+ShB+ILB+KMB+MNB = CB (95%-99% (4%)) + PO (3.5%-0.5% (3%)) + SB (0.3%-0.1% (0.2%)) + ShB (0.3%-0.1% (0.2%)) + ILB (0.3%-0.1% (0.2%)) + KMB (0.3%-0.1% (0.2%)) + MNB (0.3%-0.1% (0.2%)) ---Oil admixtures of cocoa butter (ranging from 95% to 99% in intervals of 4%), palm oil (ranging from 0.5% to 3.5% in intervals of 3%), sal butter (ranging from 0.1% to 0.3% in intervals of 0.2%), shea butter (ranging from 0.1% to 0.3% in intervals of 0.2%), illipe butter (ranging from 0.1% to 0.3% in intervals of 0.2%), kokum butter (ranging from 0.1% to 0.3% in intervals of 0.2%) and mango kernel butter (ranging from 0.1% to 0.3% in intervals of 0.2%) (n=5600)

o SO+PO: Oil admixtures of sunflower oil and palm oil (36 in-house samples and 900 simulated

samples for FTIR).

o PO+PKO: Oil admixtures of palm oil and palm kernel oil (37 in-house admixtures and 420

simulated samples for FTIR)
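
How the simulated admixture spectra were computed is not described in this section. A common approach, assumed here purely for illustration, is to take a weighted sum of pure-oil spectra over a grid of compositions. The sketch below builds the CB+PO composition grid from the list above and mixes two spectra linearly; the function names and the toy spectra are hypothetical.

```python
def binary_grid(first_max, first_min, step):
    """Percentage pairs (first oil, second oil) that always sum to 100."""
    pairs = []
    level = first_max
    while level >= first_min:
        pairs.append((level, 100 - level))
        level -= step
    return pairs


def mix_spectra(spectra, percentages):
    """Weighted sum of equally long spectra (linear mixing is an assumption)."""
    n_points = len(spectra[0])
    return [sum(p / 100.0 * s[i] for s, p in zip(spectra, percentages))
            for i in range(n_points)]


# CB from 99% down to 69% in steps of 5%, with PO making up the remainder
cb_po_grid = binary_grid(99, 69, 5)

# Toy two-point "spectra" standing in for measured cocoa butter and palm oil spectra
simulated = [mix_spectra([[1.0, 2.0], [3.0, 4.0]], pair) for pair in cb_po_grid]
```

Repeating the mixing over several pure-oil spectra per species would multiply the number of simulated samples obtained from each composition level.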

After the introduction of the simulated samples for the above types of admixture, the training dataset becomes unbalanced, since the pure oil classes have a very small number of samples. Therefore, simulated samples were also used to increase the number of samples of the pure oils and to create more balanced and robust models, avoiding any bias towards the classes with the most representatives. The final numbers of samples were as follows:

o Palm oil: 20 pure oil samples and 280 simulated samples (n=300).

o Palm kernel oil: 7 pure oil samples and 336 simulated samples (n=343)

o Palm olein: 3 pure oil samples and 300 simulated samples (n=303)

o Hydrogenated palm kernel oil: 1 pure oil sample and 280 simulated samples (n=281)

o Cocoa butter: 7 pure oil samples and 8400 simulated samples (n=8407)

All the pure oils and oil admixtures mentioned above are commonly found in confectionery products, especially those products composed of a chocolate coating and a biscuit/cake. The total number of samples for building the models was 17922. Due to the different nature of the oils, confectionery oils could not be tested with our initial calibration models built with pure oils (model B and C, see section 3.6.2). For this particular case, the question to answer is “is palm oil present in a confectionery product?” To answer this question, all samples were divided into two classes, the palm oil class (P class) and the non-palm oil class (non-P class). The palm oil class was composed of 9515 samples and the non-palm oil class was composed of a total of 8407 samples. The palm oil class (P class) includes palm oil, palm olein, palm kernel oil, hydrogenated palm kernel oil and the oil admixtures SO+PO, PO+PKO, CB+PO, CB+PO+SB, CB+PO+ShB, CB+PO+SB+ShB and CB+PO+SB+ShB+ILB+KMB+MNB, whereas the non-palm oil class (non-P class) includes only cocoa butter. Models were built using PLS-DA (number of latent variables used equals 2).
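
The class sizes quoted above can be checked against the sample counts listed earlier in this section. The short tally below is only an arithmetic check; the dictionary layout and variable names are illustrative.

```python
# Pure oils in the P class: measured + simulated samples
pure_p = {
    "palm oil": 20 + 280,
    "palm kernel oil": 7 + 336,
    "palm olein": 3 + 300,
    "hydrogenated palm kernel oil": 1 + 280,
}

# Admixtures in the P class (in-house + simulated where both are given)
admixtures_p = {
    "CB+PO": 245,
    "CB+PO+SB": 350,
    "CB+PO+ShB": 350,
    "CB+PO+SB+ShB": 350,
    "CB+PO+SB+ShB+ILB+KMB+MNB": 5600,
    "SO+PO": 36 + 900,
    "PO+PKO": 37 + 420,
}

p_class = sum(pure_p.values()) + sum(admixtures_p.values())
non_p_class = 7 + 8400  # cocoa butter: measured + simulated
total = p_class + non_p_class
print(p_class, non_p_class, total)
```

The tally reproduces the 9515 P class samples, the 8407 non-P class samples and the overall total of 17922.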

The PCA space of the two first principal components for FTIR is presented in Figure 12 (green

colour for the non-P class and blue colour for the P class). In the PCA space of the FTIR spectral data,

the ‘P class’ samples are dispersed as they include a large variety of oils and oil admixtures whereas

the ‘non-P class’ samples are grouped together.


Figure 12. Principal Component Analysis of FTIR spectral data of confectionery products (green

colour: non-P class and blue colour: P class)

6.6.2 Validation: in-house admixtures

The total number of in-house admixtures was 13 (see section 6.3) and one sample of pure cocoa

butter was also included in the model.

All oils and in-house admixtures were correctly classified when using FTIR and PLS-DA. Confusion

tables and the performance of the classification models can be found in Tables 19 a and b.

Table 19a. Performance of the FTIR classification model on the validation samples (cocoa butter

and in-house confectionery admixtures, n=14).


ACC (%) Class TP TN FP FN Sensitivity or TPR Specificity FPR Precision F1 score

Application of PLS-DA

100 Non-P 1 13 0 0 1.00 1.00 0.00 1.00 1.00

P 13 1 0 0 1.00 1.00 0.00 1.00 1.00

* ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate.

Table 19b. Confusion table for the results presented in Table 19a

Actual

Predicted

Non-P class P class Total Sensitivity (%)

Non-P 1 0 1 100.0

P 0 13 13 100.0

Precision (%) 100.0 100.0




One sample out of 14 validation samples was correctly classified as belonging to the Non-P class

and 13 out of 14 validation samples were correctly classified as belonging to the P class and thus the

overall accuracy was 100%.


6.6.3 Validation: commercial admixtures

A total of 13 samples of oils extracted from commercial confectionery products were used for

validation purposes. The 13 samples were:

- Confectionery product 1 whole (sample code CP1W)

- Confectionery product 1 chocolate coating (sample code CP1CH)

- Confectionery product 1 biscuit with filling (sample code CP1B)

- Confectionery product 2 whole (sample code CP2W)

- Confectionery product 2 chocolate coating (sample code CP2CH)

- Confectionery product 2 caramel (sample code CP2C)

- Confectionery product 2 biscuit (sample code CP2B)

- Confectionery product 3 brand 1 whole (sample code CP3B1W)

- Confectionery product 3 brand 1 chocolate coating (sample code CP3B1CH)

- Confectionery product 3 brand 1 biscuit/cake (sample code CP3B1B)

- Confectionery product 3 brand 2 whole (sample code CP3B2W)

- Confectionery product 3 brand 2 chocolate coating (sample code CP3B2CH)

- Confectionery product 3 brand 2 biscuit/cake (sample code CP3B2B)


All oils extracted from confectionery products were correctly classified when using FTIR and PLS-DA. Confusion tables and the performance of the classification models can be found in the tables below (Tables 20a and 20b).

Table 20a. Performance of the FTIR classification model on the validation samples (oil extracted from commercial confectionery products, n=13)

                       ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  100      Non-P   1    12   0    0    1.00                1.00         0.00  1.00       1.00
                                P       12   1    0    0    1.00                1.00         0.00  1.00       1.00

* ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate.

Table 20b. Confusion table for the results presented in Table 20a

               Predicted
Actual         Non-P class  P class  Total  Sensitivity (%)
Non-P          1            0        1      100.0
P              0            12       12     100.0
Precision (%)  100.0        100.0




Of the 13 commercial confectionery samples, one (CP3B2CH) was correctly classified as belonging to the Non-P class and the other 12 were correctly classified as belonging to the P class, giving an overall accuracy of 100%. According to the ingredient list on the package of confectionery product 3 brand 2, the dark chocolate coating contains only cocoa butter as oil/fat.


6.7 Application of chromatographic confirmation method based on fatty acid criteria

Because a single confectionery product can contain a complex admixture of oils, the current two-step methodology cannot identify every oil present in a product. The most practical approach is therefore to determine whether or not a given confectionery product contains palm oil. Confectionery products with a chocolate coating are the most likely to contain palm oil, although some contain only cocoa butter.

Fatty acid criteria were built using pure cocoa butter samples to confirm that a chocolate confectionery product does not contain any palm oil. They are listed in Table 21.

Table 21. Fatty acid criteria for confectionery chocolate products

Specific FA (mg FA/g oil)  Pure cocoa butter
C16:0 Palmitic acid        <250
C18:0 Stearic acid         >200
PUFA/SAT (P/S) ratio       <0.048

* FA: fatty acid; PUFA/SAT (P/S): polyunsaturated fatty acids/saturated fatty acids.
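The criteria in Table 21 amount to three simultaneous threshold checks. The sketch below illustrates them under the assumption that a sample is reported as cocoa butter only when all three strict inequalities hold, and as a palm oil (PO) admixture otherwise; the function name is hypothetical, and the example values are three of the coating samples reported in Table 22.

```python
def predict_identity(c16_0, c18_0, ps_ratio):
    """Apply the Table 21 criteria for pure cocoa butter.

    c16_0 and c18_0 are in mg FA/g oil; ps_ratio is the PUFA/SAT ratio.
    """
    meets_all = c16_0 < 250.0 and c18_0 > 200.0 and ps_ratio < 0.048
    return "Cocoa butter" if meets_all else "PO admixture"


# (C16:0, C18:0, P/S ratio) for three chocolate coating samples from Table 22.
samples = {
    "CP3B2CH": (231.17, 284.95, 0.0456),
    "CP2CH": (241.85, 216.26, 0.0565),   # fails only the P/S ratio criterion
    "CP1CH": (233.48, 199.33, 0.0467),   # fails only the C18:0 criterion
}
for code, values in samples.items():
    print(code, predict_identity(*values))
```

Requiring every criterion to hold keeps the check conservative: a sample that misses a single threshold is still flagged as a possible palm oil admixture.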

Oil extracted from commercial confectionery products was used to test the effectiveness of the cocoa butter fatty acid criteria. Results are presented in Table 22.

Table 22. Identity of oil extracted from commercial confectionery products using fatty acid criteria

Sample    C16:0 Palmitic acid  C18:0 Stearic acid  PUFA/SAT ratio  Predicted identity
          (mg FA/g oil)        (mg FA/g oil)
CP2W      334.81               137.51              0.0793          PO admixture
CP2CH     241.85               216.26              0.0565          PO admixture
CP2C      437.62               40.39               0.1033          PO admixture
CP2B      414.47               42.44               0.1139          PO admixture
CP1W      257.84               176.44              0.0635          PO admixture
CP1CH     233.48               199.33              0.0467          PO admixture
CP1B      341.33               68.86               0.1330          PO admixture
CP3B1W    214.46               192.78              0.1108          PO admixture
CP3B1CH   219.09               202.19              0.0726          PO admixture
CP3B1B    136.82               34.67               1.2618          PO admixture
CP3B2W    265.26               230.94              0.0831          PO admixture
CP3B2CH   231.17               284.95              0.0456          Cocoa butter
CP3B2B    363.44               38.58               0.2141          PO admixture

Of all the commercial confectionery samples, only the chocolate coating of confectionery product 3 brand 2 (CP3B2CH) fulfilled all the criteria for pure cocoa butter. This result agrees with the oils stated in the ingredient list on the package.

One sample, the chocolate coating of confectionery product 2 (CP2CH), comes quite close to fulfilling all the criteria for pure cocoa butter. This indicates that the amount of cocoa butter in the chocolate coating of confectionery product 2 is quite high and that, although palm oil is present, the amount is very small.


7. DEVELOP AND VALIDATE THE WEB TOOL USED FOR DATA ANALYSIS

The web tool predicts the composition of an unknown oil mixture using advanced multivariate analysis tools, and enables users to perform the analysis without needing their own statistical and data analysis packages. It can currently be accessed at www.whatismyoil.co.uk. One version of the web tool is ready and work is ongoing.

The classification models of the web tool have been updated in this follow-on project to include new vegetable oil species and to extend the tool to processed foods containing vegetable oils, especially foods containing palm oil such as biscuits and confectionery bars.

The figure shows the structure of the options provided by the web tool. Through the interface, the user selects whether the oil is a vegetable oil blend or has been extracted from a biscuit or a confectionery product (e.g. a chocolate bar). Different models have been developed for each of these options. The ‘biscuit’ and ‘vegetable oil’ options comprise one sub-model each, whereas the ‘confectionery’ option comprises three independent sub-models, i.e. coating, caramel and biscuit sub-models.

The web tool also provides an extra function to detect the presence of palm oil in an unknown test sample when there is no information about the original source of the oil. The web tool is still under development and is expected to come online by the end of 2016.

[Figure: structure of the web tool options (http://www.whatismyoil.co.uk/). Question: What is the sample? User options and models: Known: Biscuit (biscuit model); Confectionery (coating, caramel and biscuit sub-models); Vegetable oil (oil model). Unknown: palm oil / no palm oil model.]
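The option structure of the web tool can be read as a simple lookup from the user's answer to the model or sub-models that apply. The sketch below is only an illustration of that structure; the dictionary, the function name and the model labels are hypothetical and do not describe the actual implementation of the web tool.

```python
# Hypothetical mapping of user options to models, following the figure.
KNOWN_SAMPLE_MODELS = {
    "biscuit": ["biscuit model"],
    "confectionery": ["coating sub-model", "caramel sub-model", "biscuit sub-model"],
    "vegetable oil": ["oil model"],
}
UNKNOWN_SAMPLE_MODELS = ["palm oil / no palm oil model"]


def select_models(sample_type=None):
    """Return the model(s) offered for a sample type.

    sample_type=None stands for a sample of unknown origin, for which only
    the palm oil / no palm oil model is offered.
    """
    if sample_type is None:
        return list(UNKNOWN_SAMPLE_MODELS)
    key = sample_type.strip().lower()
    if key not in KNOWN_SAMPLE_MODELS:
        raise ValueError(f"unsupported sample type: {sample_type!r}")
    return list(KNOWN_SAMPLE_MODELS[key])


print(select_models("Confectionery"))
print(select_models())
```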


8. OVERALL CONCLUSIONS AND IMPLICATIONS OF THE FINDINGS

The two-stage procedure developed in the previous DEFRA project FAO117, consisting of a screening stage based on a spectroscopic method (FTIR) and a confirmation stage based on a chromatographic method (FA analysis using GC), has been successfully applied to two processed food product categories, i.e. biscuits and confectionery products, with some necessary modifications that further improve the initial method. Additionally, the initial dataset of pure oils and oil admixtures has been significantly expanded to include more variability, and the calibration models have been rebuilt and revalidated.

The conclusions of this project are:

The initial database of pure oils and oil admixtures has been expanded in terms of the number of samples and oil species, and the sample library now covers global oil production. All samples were purchased from reliable and reputable sources (major food industries and the oil processing industry), with the exception of the exotic oils/fats. Exotic oils were not easy to obtain and were mainly purchased from online retailers. The authenticity of these oils was therefore not verified and cannot be guaranteed.

Extension of the oil-only detection: The initial 6-class calibration models (legacy design) have been rebuilt and improved with the introduction of the ‘enhanced dataset’ concept (simulated samples and the use of the MATLAB data analysis package), resulting in a very good classification rate: 95.05% of the validation samples were correctly classified and only 4.95% were wrongly classified. The FTIR screening stage was so successful with this dataset that no sample needed to go to the confirmation step (fatty acid analysis).

New oil-species calibration models have been successfully built for 12 classes (new high-resolution model design) on top of the 6-class model (which now also contains coconut oil and its admixtures). Using the PLS-DA classification algorithm on FTIR data, a correct classification rate of 51.49% was achieved. A lower rate was expected because the classification problem becomes harder as the number of classes increases. Only 4.95% of the samples were wrongly classified, and 43.56% of the samples (44 samples) needed to go to the second step, based on fatty acid criteria, to confirm their identity.

New fatty acid criteria for 12 classes were established for the confirmation step. Following screening, the criteria were applied to the samples left ‘unknown’ by the FTIR stage and correctly identified 39 of the 44 pure oil/oil admixture samples submitted to the confirmation step (88.6% success). The overall success when considering both stages (screening by FTIR and confirmation by GC) was 90.1%.
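The two success rates quoted for the 12-class procedure follow from simple counts. The sketch below reproduces the arithmetic; the function name is hypothetical, and the totals of 101 validation samples and 52 samples correctly classified at screening are not stated in the text but are inferred from the reported percentages (44 samples being 43.56%, and 51.49% of 101 being 52).

```python
def two_stage_success(n_total, n_correct_screening, n_referred, n_confirmed):
    """Return (confirmation success %, overall success %) for a two-stage scheme.

    n_correct_screening: samples correctly classified by the FTIR screening.
    n_referred: samples left unclassified and sent to the confirmation step.
    n_confirmed: referred samples correctly identified by fatty acid analysis.
    """
    if n_correct_screening + n_referred > n_total or n_confirmed > n_referred:
        raise ValueError("inconsistent sample counts")
    confirmation = 100.0 * n_confirmed / n_referred if n_referred else 0.0
    overall = 100.0 * (n_correct_screening + n_confirmed) / n_total
    return confirmation, overall


# 101 and 52 are inferred from the reported percentages (assumption, see above).
confirmation_rate, overall_rate = two_stage_success(101, 52, 44, 39)
print(round(confirmation_rate, 1), round(overall_rate, 1))
```

Under these assumed counts the result rounds to the 88.6% and 90.1% reported above.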

FTIR inter-lab trial: The majority of the blends in the FTIR inter-lab trial validation were identified by the PLS-DA chemometric models in the screening step, and a small percentage of pure oils and oil blends (14% non-classified and 2.3% wrongly classified) were rejected. The fatty acid analysis of the validation samples correctly identified the nature of 16 of the 18 samples (88.9%) referred to the confirmation step. FTIR spectroscopy coupled with the PLS-DA algorithm, followed by fatty acid analysis when required, offers an insight into the nature of pure oils and binary mixtures and correctly classified 96.03% of unknown oil samples in this inter-lab validation.

More specifically, with respect to the gas chromatographic analysis, the inter-lab trial of the targeted analysis (fatty acids) proved successful. The fatty acid contents of the same oil samples, analysed on different gas chromatography instruments and under different derivatisation and chromatographic conditions, were consistent amongst participants. Low RSD (relative standard deviation) values (from 0.01 to 0.53) were obtained for the quantities of the major fatty acids present in the oil samples.

Validation in biscuits: The two-step procedure has been used to identify the oil species present in biscuits. The 6-class model was more effective than the 12-class model at identifying the classes of oils extracted from commercial biscuits. With the 6-class model, 80% of the samples were correctly classified and 20% were wrongly identified. With the 12-class model, 50% of the samples were correctly classified, 25% were wrongly classified and 25% were non-classified and needed to go to the confirmation step.

To improve the results further, new calibration models built specifically for biscuits (biscuit-only model) were prepared using authentic vegetable oils extracted from in-house biscuits. Validation of the methodology with in-house biscuits showed 100% accuracy, whereas validation with oils from commercial biscuits showed 80% accuracy with 15% wrongly classified. To bring false positives down to an acceptable level (<5%), thresholds were established (threshold=0.70); as a result, the accuracy was 80% and the false positive rate was 5%.
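One way to picture how a threshold lowers the false positive rate is a rejection rule on the predicted class scores. The sketch below is an assumption about the decision rule, not the rule documented for the project software: it assigns the class with the highest predicted PLS-DA response and leaves the sample non-classified (to be referred to the confirmation step) when that response is below the threshold. The function name, the class labels used in the example and the score values are all hypothetical.

```python
def classify_with_threshold(scores, threshold=0.70):
    """Assign the best-scoring class, or reject the sample if the score is too low.

    scores: dict mapping class label -> predicted response for one sample.
    Returns the class label, or "non-classified" when the highest response is
    below the threshold (assumed rule; such samples go to confirmation).
    """
    if not scores:
        raise ValueError("no class scores supplied")
    best_class = max(scores, key=scores.get)
    if scores[best_class] < threshold:
        return "non-classified"
    return best_class


# Hypothetical predicted responses for two samples.
print(classify_with_threshold({"P": 0.82, "RS": 0.10, "RSP": 0.08}))
print(classify_with_threshold({"P": 0.55, "RS": 0.05, "RSP": 0.40}))
```

Under this rule a weak, ambiguous prediction is not counted as a class assignment, which trades some correct classifications for fewer false positives.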

Confectionery fats are very complex products and resolving the oil types required a different approach. Since these fats are very different from most of the oils used so far in the methodology and would not be adequately identified by the calibration models, it was decided to simplify the problem to the ‘bare minimum’: detecting the presence of palm oil (yes/no model). The presence of palm oil in confectionery products has thus been successfully detected using specific PLS-DA calibration models for chocolate confectionery products (yes/no model or confectionery-only model). FTIR spectroscopy provided excellent and promising results for the detection of palm oil in chocolate confectionery products. Validation with in-house oil admixtures as well as with oils extracted from commercial confectionery products showed 100% accuracy when using FTIR.

The PLS-DA model for confectionery products (yes/no model) could confirm both chocolate products containing only cocoa butter (non-palm oil confectionery) and the presence of palm oil in chocolate products containing palm oil. Fatty acid criteria for confectionery samples were created and successfully identified all oils extracted from commercial confectionery products. Because only a limited number of samples was used, further work could strengthen the model design.

The method has some known limitations:

The performance of the ‘processed foods-specific models’ has not been evaluated with spectral input from different spectrometers, but the groundwork on the harmonisation protocol should help in this direction.

Complex admixtures of more than two oil types have not been tested with the newly developed method. These admixtures are not common in processed foods. There is evidence, however, that when complicated mixtures are analysed (as in the example of confectionery fats) the developed method can be reduced to a binary problem (‘is there palm oil or not?’ - yes/no model), which has shown promising results.

Although an attempt was made to develop a generic method, the results showed that some modifications will be required to adapt it for use with different food products (e.g. coleslaw, ice cream, chilled ready-to-eat foods such as a lasagne dish with a mixture of animal and vegetable fat). It is in the nature of the method (untargeted analysis) that it ‘cannot be prepared for the unexpected’; it needs to be supported by robust calibration data, which limits its application/specificity.

The method is not suitable for testing trans-esterified oils, such as the oil contained in margarines, because it is based on vibrational spectroscopy and fatty acid analysis. These tailored mixtures can be found in numerous combinations, are the intellectual property of each company, and were not available as reference samples in the project. More importantly, the nature of trans-esterification and the countless possibilities for a different final oil composition would make the analysis extremely challenging.


To what extent were the original objectives met?

Ultimately, the staged procedure consisting of a spectroscopic screening with FTIR and a chromatographic confirmatory analysis proved effective in identifying the nature of unknown complex refined vegetable oil blends, both in oils and, to some extent and with some essential modifications, in processed foods. The methodology is simple to implement, very affordable in terms of cost per sample and equipment resources required, and yet highly specific. In this regard the original objectives were fully met.

A Standard Operating Procedures (SOP) manual has been developed, covering what are essentially 4 different variations:

Variation 1. Initial determination of oil species in an unknown oil blend (oils-only)

Variation 2. High resolution determination of oil species (oils-only)

Variation 3. Prediction of the oil species in a biscuit product

Variation 4. Confirmation of the presence of palm oil in confectionery (chocolate) products.

The methodology as it now stands is ready to be transferred for routine analysis of unprocessed vegetable oils (variations 1 and 2) because its performance has been fully validated both in-house and externally, with the new harmonisation protocols, enlarged sample database and advanced method criteria implemented. Processed foods testing (where legislation actually applies) has been developed (variations 3 and 4) in two product categories (pastry and confectionery products) with some limitations. The performance of these methods (although fully outlined in the SOP) has not been evaluated externally, i.e. with spectral input from different spectrometers. Results, however, are very promising (>90% success rate). The research showed that a different variation of the method (a different calibration model) is needed for every product category tested. Further work is needed to develop a universal (applicable to all products), instrument-agnostic (applicable to all acquisition instruments) method in order to adequately enforce the legislation.

9. ABBREVIATIONS

ACC Accuracy

CB Cocoa Butter

CCO Coconut oil

DEFRA Department for Environment, Food and Rural Affairs

DG Digestive biscuits

EC European Commission

EDA Exploratory data analysis

EU European Union

FA Fatty Acid(s)

FN False negative

FP False positive

FTIR Fourier Transform Infrared

FPR False positive rate

GC Gas chromatography

ILB Illipe Butter

IRMM Institute for Reference Materials and Measurements

KMB Kokum gurgi Butter

MNB Mango kernel Butter

P Palm oil and its derivatives olein and stearin

PCA Principal Component Analysis

PCCO Palm oil (PO) and Coconut oil (CCO) binary admixture

PKO Palm Kernel oil

PKOC Palm Kernel oil, coconut oil

PLS Partial least squares

PLS-DA Partial least squares discriminant analysis

PO Palm oil

PPKO Palm Oil (PO) and Palm Kernel oil (PKO) binary admixture

PPKOC P and PKOC oil admixtures

P/S Polyunsaturated/Saturated fatty acids

PUFA Polyunsaturated fatty acids

QUB Queen's University Belfast

RO Rapeseed oil

ROSO Rapeseed (RO) and Sunflower oil (SO) binary admixture

ROPKO Rapeseed (RO) and Palm Kernel oil (PKO) binary admixture

ROPO Rapeseed (RO) and Palm oil (PO) binary admixture

rpm Revolutions per minute

RSD Relative Standard Deviation

RS Rapeseed oil, sunflower oil, rapeseed and sunflower oil admixtures

RSP RS and P oil admixtures

RSPKOC RS and PKOC oil admixtures

RT Rich Tea biscuits

SAT Saturated fatty acids

SB Sal Butter

ShB Shea Butter

SIMCA Soft independent modelling of class analogy

SLV Single Lab Validation

SNV Standard Normal Variate

SO Sunflower oil

SOP Standard Operating Procedure(s)

SOPKO Sunflower (SO) and Palm Kernel oil (PKO) binary admixture

SOPO Sunflower (SO) and Palm oil (PO) binary admixture

TN True negative

TP True positive

TPR True positive rate

UK United Kingdom


References to published material

9. This section should be used to record links (hypertext links where possible) or references to other published material generated by, or relating to this project.

These are the references used in the Evidence Project Final Report.

1. Bevilacqua, M., Bucci, R., Magri, A. D., Magri, A. L., and Marini, F. 2012. Tracing the origin of extra virgin olive oils by infrared spectroscopy and chemometrics: A case study. Analytica Chimica Acta, 717, pp. 39-51.

2. Berrueta, L. A., Alonso-Salces, R. M., and Héberger, K. 2007. Supervised pattern recognition in food analysis. Journal of Chromatography A, 1158 (1-2), pp. 196-214.

3. Eriksson, L., Johansson, E., Kettaneh-Wold, N., Trygg, J., Wikstrom, C., and Wold, S. 2006. Multi- and megavariate data analysis (Part I): Basic principles and applications. 2nd ed. Umetrics AB, Umea, Sweden.

4. Manley, D. 2001a. Section 5.6 Semisweet biscuits, in Chapter 5 Recipes for hard doughs. In: Biscuits, cracker and cookie recipes for the food industry. Abington Hall, Abington: Woodhead Publishing Ltd., pp. 63-74.

5. Manley, D. 2001b. Section 6.2 Plain biscuits, in Chapter 6 Recipes for short doughs. In: Biscuits, cracker and cookie recipes for the food industry. Abington Hall, Abington: Woodhead Publishing Ltd., pp. 81-90.

6. Oliveri, P. and Downey, G. 2012. Multivariate class modeling for the verification of food-authenticity claims. TrAC Trends in Analytical Chemistry, 35, pp. 74-86.

7. EU legislation and agriculture reports

http://ec.europa.eu/agriculture/eval/reports/chocolate/fullrep_en.pdf

http://ec.europa.eu/agriculture/eval/reports/chocolate/sum_en.pdf

http://europa.eu/legislation_summaries/consumers/product_labelling_and_packaging/l21122b_en.htm








APPENDIX

Appendix I - In-house admixtures - Database expansion

Palm Stearin + Palm Oil binary mixture

Palm Stearin %  Palm oil %  Usage
23 PSn3         77 POn1     Calibration
63 PSn3         37 POn15    Calibration
32 PSn2         68 POn3     Validation
38 PSn6         62 POn5     Validation
44 PSn2         56 POn8     Validation

Palm Olein + Sunflower Oil binary mixture

Palm Olein %  Sunflower oil %  Usage
27 POln1      73 SOn3          Calibration
26 POln2      74 SOn1          Validation
65 POln2      35 SOn13         Validation
71 POln2      29 SOn15         Validation






Rapeseed oil + palm kernel oil binary mixture

Rapeseed oil %  Palm kernel oil %  Usage
73 ROn2         27 PKOn1           Calibration
75 ROn1         25 PKOn2           Validation
70 ROn3         30 PKOn2           Validation

Rapeseed Oil + Sunflower Oil binary mixture

Rapeseed Oil %  Sunflower oil %  Usage
25 ROn2         75 SOn2          Calibration
73 ROn2         27 SOn17         Calibration
25 ROn1         75 SOn1          Validation








Rapeseed oil + palm oil binary mixture

Rapeseed oil %  Palm oil %  Usage
82 ROn2         18 POn1     Calibration
42 ROn13        58 POn12    Calibration
74 ROn1         26 POn3     Validation

Sunflower oil + palm kernel oil binary mixture

Sunflower oil %  Palm kernel oil %  Usage
73 SOn2          27 PKOn1           Calibration
70 SOn1          30 PKOn2           Validation








Sunflower oil + palm oil binary mixture

Sunflower oil %  Palm oil %  Usage
82 SOn2          18 POn1     Calibration
74 SOn1          26 POn3     Validation
35 SOn13         65 POn18    Validation

Palm oil + palm kernel oil binary mixture

Palm oil %  Palm kernel oil %  Usage
76 POn1     24 PKOn1           Calibration
32 POn17    68 PKOn3           Calibration
78 POn3     22 PKOn2           Validation
60 POn13    40 PKOn2           Validation








Coconut + Palm Oil binary mixture

Coconut oil %  Palm oil %  Usage
20 CCO1        80 POn1     Calibration
28 CCO3        72 POn3     Validation












Appendix II – Confusion tables – Oil Database expansion

MODEL A (using SIMCA Umetrics TM)/6 classes and MODEL A with thresholds/6 classes

SIMCA

                       ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  93.0     P       18   81   1    1    0.95                0.99         0.01  0.95       0.95
                                RS      14   83   0    4    0.78                1.00         0.00  1.00       0.88
                                PKOC    4    95   1    1    0.80                0.99         0.01  0.80       0.80
                                RSPKOC  15   83   3    0    1.00                0.97         0.03  0.83       0.91
                                RSP     22   77   1    1    0.96                0.99         0.01  0.96       0.96
                                PPKOC   21   79   1    0    1.00                0.99         0.01  0.95       0.98

               Predicted
Actual         P       RS      PKOC    RSPKOC  RSP     PPKOC   Total  Sensitivity (%)
P              18      0       1       0       0       0       19     94.74
RS             0       14      0       3       1       0       18     77.78
PKOC           0       0       4       0       0       1       5      80.00
RSPKOC         0       0       0       15      0       0       15     100.00
RSP            1       0       0       0       22      0       23     95.65
PPKOC          0       0       0       0       0       21      21     100.00
Precision (%)  94.74   100.00  80.00   83.33   95.65   95.45





SIMCA with thresholds (threshold=0.05)

                       ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  33.66    P       7    82   0    12   0.37                1.00         0.00  1.00       0.54
                                RS      6    83   0    12   0.33                1.00         0.00  1.00       0.50
                                PKOC    0    96   0    5    0.00                1.00         0.00  0.00       0.00
                                RSPKOC  3    86   0    12   0.20                1.00         0.00  1.00       0.33
                                RSP     9    78   0    14   0.39                1.00         0.00  1.00       0.56
                                PPKOC   9    79   1    12   0.43                0.99         0.01  0.90       0.58

               Predicted
Actual         P       RS      PKOC    RSPKOC  RSP     PPKOC   Confirmation  Total  Sensitivity (%)
P              7       0       0       0       0       0       12            19     36.84
RS             0       6       0       0       0       0       12            18     33.33
PKOC           0       0       0       0       0       1       4             5      0.00
RSPKOC         0       0       0       3       0       0       12            15     20.00
RSP            0       0       0       0       9       0       14            23     39.13
PPKOC          0       0       0       0       0       9       12            21     42.86
Precision (%)  100.00  100.00  0.00    100.00  100.00  90.00





PLS-DA

                       ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  90.1     P       14   81   1    5    0.74                0.99         0.01  0.93       0.82
                                RS      18   83   0    0    1.00                1.00         0.00  1.00       1.00
                                PKOC    1    96   0    4    0.20                1.00         0.00  1.00       0.33
                                RSPKOC  15   82   4    0    1.00                0.95         0.05  0.79       0.88
                                RSP     22   74   4    1    0.96                0.95         0.05  0.85       0.90
                                PPKOC   21   79   1    0    1.00                0.99         0.01  0.95       0.98

               Predicted
Actual         P       RS      PKOC    RSPKOC  RSP     PPKOC   Total  Sensitivity (%)
P              14      0       0       0       4       1       19     73.68
RS             0       18      0       0       0       0       18     100.00
PKOC           0       0       1       0       0       4       5      20.00
RSPKOC         0       0       0       15      0       0       15     100.00
RSP            1       0       0       0       22      0       23     95.65
PPKOC          0       0       0       0       0       21      21     100.00
Precision (%)  93.33   100.00  100.00  100.00  84.62   80.77





PLS-DA with thresholds (threshold=0.57)

                       ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  85.15    P       14   81   1    5    0.74                0.99         0.01  0.93       0.82
                                RS      18   83   0    0    1.00                1.00         0.00  1.00       1.00
                                PKOC    0    96   0    5    0.00                1.00         0.00  0.00       0.00
                                RSPKOC  15   82   4    0    1.00                0.95         0.05  0.79       0.88
                                RSP     19   78   0    4    0.83                1.00         0.00  1.00       0.90
                                PPKOC   20   80   0    1    0.95                1.00         0.00  1.00       0.98

               Predicted
Actual         P       RS      PKOC    RSPKOC  RSP     PPKOC   Confirmation  Total  Sensitivity (%)
P              14      0       0       0       0       0       5             19     73.68
RS             0       18      0       0       0       0       0             18     100.00
PKOC           0       0       0       4       0       0       1             5      0.00
RSPKOC         0       0       0       15      0       0       0             15     100.00
RSP            1       0       0       0       19      0       3             23     82.61
PPKOC          0       0       0       0       0       20      1             21     95.24
Precision (%)  93.33   100.00  0.00    78.95   100.00  100.00





MODEL B (using MATLAB)/6 classes

SIMCA + simulated samples

Statistical measures

Description            ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  76.24    P       16   80   2    3    0.84                0.98         0.02  0.89       0.86
                                RS      11   83   0    7    0.61                1.00         0.00  1.00       0.76
                                PKOC    0    96   0    5    0.00                1.00         0.00  0.00       0.00
                                RSPKOC  10   86   0    5    0.67                1.00         0.00  1.00       0.80
                                RSP     21   58   20   2    0.91                0.74         0.26  0.51       0.66
                                PPKOC   19   78   2    2    0.90                0.98         0.03  0.90       0.90

               Predicted
Actual         P       RS      PKOC    RSPKOC  RSP     PPKOC   Total  Sensitivity (%)
P              16      0       0       0       2       1       19     84.21
RS             0       11      0       0       7       0       18     61.11
PKOC           0       0       0       0       4       1       5      0.00
RSPKOC         0       0       0       10      5       0       15     66.67
RSP            2       0       0       0       21      0       23     91.30
PPKOC          0       0       0       0       2       19      21     90.48
Precision (%)  88.89   100.00  0.00    100.00  51.22   90.48





PLS-DA + simulated samples

Description            ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  95.05    P       14   82   0    5    0.74                1.00         0.00  1.00       0.85
                                RS      18   83   0    0    1.00                1.00         0.00  1.00       1.00
                                PKOC    5    96   0    0    1.00                1.00         0.00  1.00       1.00
                                RSPKOC  15   86   0    0    1.00                1.00         0.00  1.00       1.00
                                RSP     23   75   3    0    1.00                0.96         0.04  0.88       0.94
                                PPKOC   21   78   2    0    1.00                0.98         0.03  0.91       0.95

               Predicted
Actual         P       RS      PKOC    RSPKOC  RSP     PPKOC   Total  Sensitivity (%)
P              14      0       0       0       3       2       19     73.68
RS             0       18      0       0       0       0       18     100.00
PKOC           0       0       5       0       0       0       5      100.00
RSPKOC         0       0       0       15      0       0       15     100.00
RSP            0       0       0       0       23      0       23     100.00
PPKOC          0       0       0       0       0       21      21     100.00
Precision (%)  100.00  100.00  100.00  100.00  88.46   91.30





MODEL C (using SIMCA Umetrics TM)/12 classes and MODEL C with thresholds/12 classes

SIMCA

               Predicted
Actual         P       RO      SO      PKO     CCO     ROPO    SOPO    ROPKO   SOPKO   ROSO    PPKO    PCCO    Total  Sensitivity (%)
P              11      0       0       0       8       0       0       0       0       0       0       0       19     57.89
RO             0       4       0       0       0       0       0       0       0       0       0       0       4      100.00
SO             0       0       7       0       0       0       0       0       0       0       0       0       7      100.00
PKO            0       0       0       1       0       0       0       0       0       0       0       0       1      100.00
CCO            0       0       0       0       4       0       0       0       0       0       0       0       4      100.00
ROPO           2       0       0       0       0       6       0       0       0       0       0       0       8      75.00
SOPO           0       0       0       0       0       0       15      0       0       0       0       0       15     100.00
ROPKO          0       0       0       0       0       0       0       8       0       0       0       0       8      100.00
SOPKO          0       0       0       0       0       0       0       0       7       0       0       0       7      100.00
ROSO           0       0       0       0       0       0       0       0       1       6       0       0       7      85.71
PPKO           0       0       0       0       0       0       0       0       0       0       10      0       10     100.00
PCCO           0       0       0       0       0       0       0       0       0       0       0       11      11     100.00
Precision (%)  84.62   100.00  100.00  100.00  33.33   100.00  100.00  100.00  87.50   100.00  100.00  100.00

Description            ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  89.11    P       11   80   2    8    0.58                0.98         0.02  0.85       0.69
                                RO      4    97   0    0    1.00                1.00         0.00  1.00       1.00
                                SO      7    94   0    0    1.00                1.00         0.00  1.00       1.00
                                PKO     1    100  0    0    1.00                1.00         0.00  1.00       1.00
                                CCO     4    89   8    0    1.00                0.92         0.08  0.33       0.50
                                ROPO    6    93   0    2    0.75                1.00         0.00  1.00       0.86
                                SOPO    15   86   0    0    1.00                1.00         0.00  1.00       1.00
                                ROPKO   8    93   0    0    1.00                1.00         0.00  1.00       1.00
                                SOPKO   7    93   1    0    1.00                0.99         0.01  0.88       0.93
                                ROSO    6    94   0    1    0.86                1.00         0.00  1.00       0.92
                                PPKO    10   91   0    0    1.00                1.00         0.00  1.00       1.00
                                PCCO    11   90   0    0    1.00                1.00         0.00  1.00       1.00


SIMCA with thresholds (threshold=0.05)

               Predicted
Actual         P       RO      SO      PKO     CCO     ROPO    SOPO    ROPKO   SOPKO   ROSO    PPKO    PCCO    Confirmation  Total  Sensitivity (%)
P              7       0       0       0       0       0       0       0       0       0       0       0       12            19     36.84
RO             0       2       0       0       0       0       0       0       0       0       0       0       2             4      50.00
SO             0       0       0       0       0       0       0       0       0       0       0       0       7             7      0.00
PKO            0       0       0       0       0       0       0       0       0       0       0       0       1             1      0.00
CCO            0       0       0       0       0       0       0       0       0       0       0       0       4             4      0.00
ROPO           0       0       0       0       0       2       0       0       0       0       0       0       6             8      25.00
SOPO           0       0       0       0       0       0       5       0       0       0       0       0       10            15     33.33
ROPKO          0       0       0       0       0       0       0       2       0       0       0       0       6             8      25.00
SOPKO          0       0       0       0       0       0       0       0       1       0       0       0       6             7      14.29
ROSO           0       0       0       0       0       0       0       0       0       0       0       0       7             7      0.00
PPKO           0       0       0       0       0       0       0       0       0       0       3       0       7             10     30.00
PCCO           0       0       0       0       0       0       0       0       0       0       0       4       7             11     36.36
Precision (%)  100.00  100.00  0.00    0.00    0.00    100.00  100.00  100.00  100.00  0.00    100.00  100.00

Description            ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  25.74    P       7    82   0    12   0.37                1.00         0.00  1.00       0.54
                                RO      2    97   0    2    0.50                1.00         0.00  1.00       0.67
                                SO      0    94   0    7    0.00                1.00         0.00  0.00       0.00
                                PKO     0    100  0    1    0.00                1.00         0.00  0.00       0.00
                                CCO     0    97   0    4    0.00                1.00         0.00  0.00       0.00
                                ROPO    2    93   0    6    0.25                1.00         0.00  1.00       0.40
                                SOPO    5    86   0    10   0.33                1.00         0.00  1.00       0.50
                                ROPKO   2    93   0    6    0.25                1.00         0.00  1.00       0.40
                                SOPKO   1    94   0    6    0.14                1.00         0.00  1.00       0.25
                                ROSO    0    94   0    7    0.00                1.00         0.00  0.00       0.00
                                PPKO    3    91   0    7    0.30                1.00         0.00  1.00       0.46
                                PCCO    4    90   0    7    0.36                1.00         0.00  1.00       0.53


PLS-DA

               Predicted
Actual         P       RO      SO      PKO     CCO     ROPO    SOPO    ROPKO   SOPKO   ROSO    PPKO    PCCO    Total  Sensitivity (%)
P              16      0       0       0       0       0       3       0       0       0       0       0       19     84.21
RO             0       1       0       0       0       3       0       0       0       0       0       0       4      25.00
SO             0       0       7       0       0       0       0       0       0       0       0       0       7      100.00
PKO            0       0       0       0       0       0       0       1       0       0       0       0       1      0.00
CCO            0       0       0       0       3       0       1       0       0       0       0       0       4      75.00
ROPO           1       0       0       0       0       4       3       0       0       0       0       0       8      50.00
SOPO           0       0       0       0       0       0       15      0       0       0       0       0       15     100.00
ROPKO          0       0       0       0       0       0       0       8       0       0       0       0       8      100.00
SOPKO          0       0       0       0       0       0       2       0       5       0       0       0       7      71.43
ROSO           0       1       2       0       0       0       2       0       0       2       0       0       7      28.57
PPKO           0       0       0       0       0       0       0       0       0       0       10      0       10     100.00
PCCO           0       0       0       0       0       0       0       0       0       0       0       11      11     100.00
Precision (%)  94.12   50.00   77.78   0.00    100.00  57.14   57.69   88.89   100.00  100.00  100.00  100.00

Description            ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  81.19    P       16   81   1    3    0.84                0.99         0.01  0.94       0.89
                                RO      1    96   1    3    0.25                0.99         0.01  0.50       0.33
                                SO      7    92   2    0    1.00                0.98         0.02  0.78       0.88
                                PKO     0    100  0    1    0.00                1.00         0.00  0.00       0.00
                                CCO     3    97   0    1    0.75                1.00         0.00  1.00       0.86
                                ROPO    4    90   3    4    0.50                0.97         0.03  0.57       0.53
                                SOPO    15   75   11   0    1.00                0.87         0.13  0.58       0.73
                                ROPKO   8    93   0    0    1.00                1.00         0.00  1.00       1.00
                                SOPKO   5    94   0    2    0.71                1.00         0.00  1.00       0.83
                                ROSO    2    94   0    5    0.29                1.00         0.00  1.00       0.44
                                PPKO    10   90   1    0    1.00                0.99         0.01  0.91       0.95
                                PCCO    11   90   0    0    1.00                1.00         0.00  1.00       1.00


PLS-DA with thresholds (threshold=0.54)

               Predicted
Actual         P       RO      SO      PKO     CCO     ROPO    SOPO    ROPKO   SOPKO   ROSO    PPKO    PCCO    Confirmation  Total  Sensitivity (%)
P              13      0       0       0       0       0       1       0       0       0       0       0       5             19     68.42
RO             0       0       0       0       0       1       0       0       0       0       0       0       3             4      0.00
SO             0       0       7       0       0       0       0       0       0       0       0       0       0             7      100.00
PKO            0       0       0       0       0       0       0       0       0       0       1       0       0             1      0.00
CCO            0       0       0       0       3       0       1       0       0       0       0       0       0             4      75.00
ROPO           0       0       0       0       0       1       0       0       0       0       0       0       7             8      12.50
SOPO           0       0       0       0       0       0       14      0       0       0       0       0       1             15     93.33
ROPKO          0       0       0       0       0       0       0       0       0       0       0       0       8             8      0.00
SOPKO          0       0       0       0       0       0       1       0       1       0       0       0       5             7      14.29
ROSO           0       0       0       0       0       0       0       0       0       0       0       0       7             7      0.00
PPKO           0       0       0       0       0       0       0       0       0       0       4       0       6             10     40.00
PCCO           0       0       0       0       0       0       0       0       0       0       0       9       2             11     81.82
Precision (%)  100.00  0.00    100.00  0.00    100.00  50.00   82.35   0.00    100.00  0.00    80.00   100.00

Description            ACC (%)  Class   TP   TN   FP   FN   Sensitivity or TPR  Specificity  FPR   Precision  F1 score
Application of PLS-DA  51.49    P       13   82   0    6    0.68                1.00         0.00  1.00       0.81
                                RO      0    97   0    4    0.00                1.00         0.00  0.00       0.00
                                SO      7    94   0    0    1.00                1.00         0.00  1.00       1.00
                                PKO     0    100  0    1    0.00                1.00         0.00  0.00       0.00
                                CCO     3    97   0    1    0.75                1.00         0.00  1.00       0.86
                                ROPO    1    92   1    7    0.13                0.99         0.01  0.50       0.20
                                SOPO    14   83   3    1    0.93                0.97         0.03  0.82       0.88
                                ROPKO   0    93   0    8    0.00                1.00         0.00  0.00       0.00
                                SOPKO   1    94   0    6    0.14                1.00         0.00  1.00       0.25
                                ROSO    0    94   0    7    0.00                1.00         0.00  0.00       0.00
                                PPKO    4    90   1    6    0.40                0.99         0.01  0.80       0.53
                                PCCO    9    90   0    2    0.82                1.00         0.00  1.00       0.90
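The confusion table for PLS-DA with thresholds (threshold=0.54) contains a Confirmation column, so the split between correctly classified, wrongly classified and referred samples can be recomputed from it. The sketch below does this under the assumption that each row lists the 12 predicted-class counts in class order followed by the Confirmation count; the function name is hypothetical.

```python
def screening_breakdown(table):
    """Percentages of correct, wrong and referred samples after screening.

    table: one row per actual class, holding the predicted-class counts in
    the same class order, followed by the Confirmation count as last entry.
    """
    total = correct = referred = 0
    for k, row in enumerate(table):
        total += sum(row)
        correct += row[k]     # diagonal entry: actual class k predicted as k
        referred += row[-1]   # samples sent to the confirmation step
    wrong = total - correct - referred
    counts = {"correct": correct, "wrong": wrong, "referred": referred}
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


# Rows: P, RO, SO, PKO, CCO, ROPO, SOPO, ROPKO, SOPKO, ROSO, PPKO, PCCO.
plsda_thresholds_12_classes = [
    [13, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 5],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3],
    [0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 7],
    [0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8],
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 5],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 6],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 2],
]
breakdown = screening_breakdown(plsda_thresholds_12_classes)
print(breakdown)
```

With the counts of this table, the breakdown matches the 51.49% correctly classified, 4.95% wrongly classified and 43.56% referred samples quoted in the conclusions.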


MODEL D (using MATLAB)/ 12 classes

SIMCA + simulated samples

Rows: actual class; columns: predicted class.

Actual               P     RO     SO    PKO    CCO   ROPO   SOPO  ROPKO  SOPKO   ROSO   PPKO   PCCO  Total  Sensitivity (%)
P                    9      0      0      8      0      0      1      0      0      0      0      1     19            47.37
RO                   0      0      0      4      0      0      0      0      0      0      0      0      4             0.00
SO                   0      0      0      7      0      0      0      0      0      0      0      0      7             0.00
PKO                  0      0      0      1      0      0      0      0      0      0      0      0      1           100.00
CCO                  0      0      0      4      0      0      0      0      0      0      0      0      4             0.00
ROPO                 2      0      0      3      0      3      0      0      0      0      0      0      8            37.50
SOPO                 0      0      0      6      0      0      9      0      0      0      0      0     15            60.00
ROPKO                0      0      0      8      0      0      0      0      0      0      0      0      8             0.00
SOPKO                0      0      0      7      0      0      0      0      0      0      0      0      7             0.00
ROSO                 0      0      0      5      0      0      1      0      0      1      0      0      7            14.29
PPKO                 0      0      0      3      0      0      0      0      0      0      7      0     10            70.00
PCCO                 0      0      0      6      0      0      0      0      0      0      0      5     11            45.45
Precision (%)    81.82   0.00   0.00   1.61   0.00 100.00  81.82   0.00   0.00 100.00 100.00  83.33





Description: Application of PLS-DA. ACC (%): 34.65

Class     TP   TN   FP   FN  Sensitivity or TPR  Specificity   FPR  Precision  F1 score
P          9   80    2   10                0.47         0.98  0.02       0.82      0.60
RO         0   97    0    4                0.00         1.00  0.00       0.00      0.00
SO         0   94    0    7                0.00         1.00  0.00       0.00      0.00
PKO        1   39   61    0                1.00         0.39  0.61       0.02      0.03
CCO        0   97    0    4                0.00         1.00  0.00       0.00      0.00
ROPO       3   93    0    5                0.38         1.00  0.00       1.00      0.55
SOPO       9   84    2    6                0.60         0.98  0.02       0.82      0.69
ROPKO      0   93    0    8                0.00         1.00  0.00       0.00      0.00
SOPKO      0   94    0    7                0.00         1.00  0.00       0.00      0.00
ROSO       1   94    0    6                0.14         1.00  0.00       1.00      0.25
PPKO       7   91    0    3                0.70         1.00  0.00       1.00      0.82
PCCO       5   89    1    6                0.45         0.99  0.01       0.83      0.59


PLS-DA + simulated samples

Rows: actual class; columns: predicted class.

Actual               P     RO     SO    PKO    CCO   ROPO   SOPO  ROPKO  SOPKO   ROSO   PPKO   PCCO  Total  Sensitivity (%)
P                   14      0      0      0      0      1      1      0      0      0      3      0     19            73.68
RO                   0      4      0      0      0      0      0      0      0      0      0      0      4           100.00
SO                   0      0      7      0      0      0      0      0      0      0      0      0      7           100.00
PKO                  0      0      0      1      0      0      0      0      0      0      0      0      1           100.00
CCO                  0      0      0      0      4      0      0      0      0      0      0      0      4           100.00
ROPO                 0      0      0      0      0      8      0      0      0      0      0      0      8           100.00
SOPO                 0      0      0      0      0      0     15      0      0      0      0      0     15           100.00
ROPKO                0      0      0      0      0      0      0      8      0      0      0      0      8           100.00
SOPKO                0      0      0      0      0      0      0      0      7      0      0      0      7           100.00
ROSO                 0      2      2      0      0      0      0      0      0      3      0      0      7            42.86
PPKO                 0      0      0      0      0      0      0      0      0      0     10      0     10           100.00
PCCO                 0      0      0      0      0      0      0      0      0      0      0     11     11           100.00
Precision (%)   100.00  66.67  77.78 100.00 100.00  88.89  93.75 100.00 100.00 100.00  76.92 100.00





Description: Application of PLS-DA. ACC (%): 91.09

Class     TP   TN   FP   FN  Sensitivity or TPR  Specificity   FPR  Precision  F1 score
P         14   82    0    5                0.74         1.00  0.00       1.00      0.85
RO         4   95    2    0                1.00         0.98  0.02       0.67      0.80
SO         7   92    2    0                1.00         0.98  0.02       0.78      0.88
PKO        1  100    0    0                1.00         1.00  0.00       1.00      1.00
CCO        4   97    0    0                1.00         1.00  0.00       1.00      1.00
ROPO       8   92    1    0                1.00         0.99  0.01       0.89      0.94
SOPO      15   85    1    0                1.00         0.99  0.01       0.94      0.97
ROPKO      8   93    0    0                1.00         1.00  0.00       1.00      1.00
SOPKO      7   94    0    0                1.00         1.00  0.00       1.00      1.00
ROSO       3   94    0    4                0.43         1.00  0.00       1.00      0.60
PPKO      10   88    3    0                1.00         0.97  0.03       0.77      0.87
PCCO      11   90    0    0                1.00         1.00  0.00       1.00      1.00


Appendix III – Permutation plots

A) PLS-DA 6 classes’ model (MODEL B) using MATLAB


B) PLS-DA 12 classes’ model (MODEL C) using SIMCA Umetrics™

Permutation plot for model 1: P class

Permutation plot for model 2: RO class

Permutation plot for model 3: SO class

Permutation plot for model 4: PKO class


Permutation plot for model 5: CCO class

Permutation plot for model 6: ROPO class


Permutation plot for model 7: SOPO class

Permutation plot for model 8: ROPKO class

Permutation plot for model 9: SOPKO class


Permutation plot for model 10: ROSO class

Permutation plot for model 11: PPKO class

Permutation plot for model 12: PCCO class


Appendix IV – FTIR Inter-Lab trial results

1. DATA EXPLORATION

Principal component analysis (PCA) was first applied to all the FTIR spectral data composed of 3781

variables (654.23 - 1875.4 cm-1 and 2520.02 - 3120.74 cm-1) and 144 samples (Fig. 1). Spectral data were

previously transformed by three different signal correction methods: Standard Normal Variate (SNV), first

order derivative and Savitzky-Golay.

Figure 1. Score plot of the first two principal components: R2X(1)=0.361; R2X(2)=0.162; Q2(cum)=0.831. P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO group: RS group + P group; PPKO group: P group + PKO; RSPKO group: RS group + PKO.

A total of 9 uncorrelated principal components were calculated. The first two principal components accounted for 52% of the total variance, with the first and second components explaining 36% and 16% of the total variability, respectively. All the samples from one of the participants (Participant C) fell outside the 95% confidence level. This large variability could be explained by poor instrument performance or by human/user error, and these samples were therefore removed from the dataset. A new principal component analysis was applied to the reduced dataset (135 samples, PCs=12); the score plots of the first two and the first three principal components are shown in Figure 2.


Figure 2. Score plot (2D and 3D) of the first two principal components: R2X(1)=0.246; R2X(2)=0.173; Q2(cum)=0.725. P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO group: RS group + P group; PPKO group: P group + PKO; RSPKO group: RS group + PKO.


The score plot of the first three principal components revealed more samples outside the 95% confidence ellipse, which indicates a possible error in the performance of the instrument and/or by the person who carried out the analysis. All the samples from one of the participants (Participant B) were outside the confidence level. Thus, all samples from Participant B were considered outliers and were also removed from the dataset. A final principal component analysis was applied to a total of 126 samples and 11 principal components were calculated. The score plot of the first two principal components is presented in Figure 3.



R2X(2)=0.191; Q2(cum)=0.669. P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil,



2. DATA PRE-PROCESSING

Due to the high variability observed on the spectral data coming from different instruments a new

approach to pre-processing was needed before testing them in our calibration models.

Linear interpolation

All FTIR spectra used for creating the calibration models were recorded from 550 to 4000 cm-1 at a resolution of 4 cm-1. The total number of variables generated was 7157 (data spacing = 0.482 cm-1). The number of variables varied amongst participants: 1738 scan points for one participant, 1762 for one, 1763 for one, 1764 for eight, 1765 for one, 3526 for one, 7053 for one and 7054 for one, compared with 7157 scan points for the samples collected in our lab.

Linear interpolation was applied to all spectra in order to obtain the desired number of variables.

If the two known points are given by the coordinates (X0, Y0) and (X1, Y1), the linear interpolant is the straight line between these points. For a value X in the interval (X0, X1), the value Y along the straight line is given by the equation

$$Y = Y_0 + (X - X_0)\,\frac{Y_1 - Y_0}{X_1 - X_0}$$

We created two or three points between the given scan points of each participant using the interp1 function in MATLAB R2014b, which returns interpolated values at specific query points using linear interpolation. This yielded a total of 7054 variables. All participants collected the spectra from 600 to 4000 cm-1. Thus, 103 variables covering the region from 550 to 600 cm-1 were added at the beginning of every spectrum to reach a total of 7157 variables. These added variables were set equal to the first variable of each spectrum.
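The resampling and padding steps described above can be sketched in a few lines. This is a minimal Python illustration, not the MATLAB interp1 call used in the project; the function names and the toy spectrum are assumptions made for the example.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def resample(xs, ys, new_xs):
    """Linearly interpolate the spectrum (xs, ys) at each point of new_xs.

    xs and new_xs must be increasing, and every new x must lie in [xs[0], xs[-1]].
    """
    out = []
    k = 0
    for x in new_xs:
        # Advance to the segment [xs[k], xs[k + 1]] that contains x.
        while k < len(xs) - 2 and x > xs[k + 1]:
            k += 1
        out.append(linear_interpolate(xs[k], ys[k], xs[k + 1], ys[k + 1], x))
    return out


def pad_left(ys, n_pad):
    """Prepend n_pad copies of the first value (the 550-600 cm-1 padding step)."""
    return [ys[0]] * n_pad + list(ys)


# Toy spectrum (hypothetical values): 4 scan points resampled to 7, then padded by 2.
xs = [600.0, 602.0, 604.0, 606.0]
ys = [0.10, 0.30, 0.20, 0.40]
new_xs = [600.0, 601.0, 602.0, 603.0, 604.0, 605.0, 606.0]
resampled = resample(xs, ys, new_xs)
padded = pad_left(resampled, 2)
```

In the report the same idea is applied at full scale: each participant's spectrum is resampled to 7054 points and 103 copies of its first value are prepended to reach 7157 variables.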


Figure 4. Example of a linear interpolation of one of the spectra from one participant.

iCoShift

Using different types of FTIR ATR modules (made of different materials such as diamond) can result in significant shifting of the spectral peaks. To overcome this problem, a rapid and versatile algorithm for the alignment of spectral datasets called “iCoShift” was applied to all spectra using MATLAB R2014b.

The iCoShift algorithm is based on correlation shifting of spectral intervals and employs an FFT engine that aligns all spectra simultaneously. The algorithm is fast, making full-resolution alignment of large datasets feasible and thus avoiding down-sampling steps such as binning. The algorithm can use missing values (NaN) as a filling alternative in order to avoid spectral artifacts at the segment boundaries.
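The idea of correlation shifting can be illustrated with a much simplified sketch. The code below is not the iCoShift implementation: it aligns one whole spectrum to a reference by a direct (non-FFT) cross-correlation search over integer shifts, and the function names and toy data are assumptions.

```python
import math


def best_shift(reference, spectrum, max_shift):
    """Integer shift (in data points) that maximises the cross-correlation
    between `spectrum` and `reference`; a positive shift moves the spectrum right."""
    n = len(reference)
    best, best_score = 0, -math.inf
    for s in range(-max_shift, max_shift + 1):
        # Correlate reference[j] with spectrum[j - s] over the overlapping region.
        score = sum(reference[j] * spectrum[j - s]
                    for j in range(max(0, s), min(n, n + s)))
        if score > best_score:
            best, best_score = s, score
    return best


def apply_shift(spectrum, s, fill=math.nan):
    """Shift by s points, filling the vacated edge with `fill` (NaN by default)."""
    n = len(spectrum)
    if s >= 0:
        return [fill] * s + list(spectrum[:n - s])
    return list(spectrum[-s:]) + [fill] * (-s)


reference = [0, 0, 1, 5, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 5, 1, 0]   # same peak, two points to the right
s = best_shift(reference, shifted, max_shift=3)
aligned = apply_shift(shifted, s)
```

Filling the vacated edge with NaN mirrors the option mentioned above for avoiding artifacts at segment boundaries.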

Figure 5. Example of a spectrum pre-processed with iCoShift.

Standard normal variate (SNV)

Standard normal variate (SNV) is a mathematical transformation that was applied to all FTIR spectra from

the participants. SNV is a normalization method that removes the slope variation from spectra caused by

scatter and variation of particle size (Barnes et al., 1989) [4]. It calculates the standard deviation of all the

variables for the given sample. The entire sample is then normalized by this value, thus giving the sample a

unit standard deviation (σ = 1). This procedure also includes a zero-order detrend (subtraction of the

individual mean value from each spectrum). The equations used by the algorithm are the mean and standard

deviation equations:


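$$\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} X_{i,j}$$

$$SNV_{i,j} = \frac{X_{i,j} - \bar{x}_i}{\sqrt{\dfrac{\sum_{j=1}^{n}\left(X_{i,j} - \bar{x}_i\right)^2}{n-1}}}$$

where n is the number of variables and $X_{i,j}$ is the value of the jth variable for the ith sample.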

This normalization approach is weighted towards considering the values that deviate from the individual

sample mean more heavily than values near the mean. FTIR raw spectra of 14 palm oils are presented in

Figure 6.
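As a concrete illustration of the SNV transformation, the following minimal Python sketch centres a single spectrum on its own mean and divides it by its own standard deviation (with n - 1 in the denominator). The function name and the toy values are assumptions for the example.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre on the spectrum's own mean and divide
    by its own sample standard deviation (n - 1 in the denominator)."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / sd for x in spectrum]


raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical intensities
scaled = snv(raw)

# After SNV the spectrum has zero mean and unit sample standard deviation.
scaled_mean = sum(scaled) / len(scaled)
scaled_sd = math.sqrt(sum((x - scaled_mean) ** 2 for x in scaled) / (len(scaled) - 1))
```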

Figure 6. Superimposed FTIR spectra of 14 palm oils before applying any mathematical transformation

Figure 7. Superimposed FTIR spectra of 14 palm oils after applying SNV mathematical transformation

Another example of a spectrum pre-processed with SNV is presented in Figure 8.


Figure 8. Comparison of a spectrum before and after pre-processing with SNV

The effect of both mathematical transformations (iCoshift + SNV) on a raw spectrum can be seen in Figure

9.

Figure 9. Example of a spectrum pre-processed with iCoshift followed by SNV

First derivative

First order derivative (Osborne, Fearn & Hindle, 1993) [5] aims to resolve overlapping peaks and correct the baseline. The derivative pulls overlapping peaks apart, and a linear background becomes a constant level in the first derivative spectrum. Peak maxima become zero crossings in the first derivative. Specifically, a forward-difference implementation of the first derivative was applied to the data:

$$f'(x) = f(x + 1) - f(x)$$

$$X'_{i,j} = X_{i,j+1} - X_{i,j}$$

where $X_{i,j}$ is the value of the jth variable for the ith sample.
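The forward difference is simple enough to show directly. In this small Python sketch (toy values, hypothetical function name) a peak sits on a linear baseline; after differencing, the baseline becomes a constant and the peak shows up as a positive value followed by a negative one.

```python
def first_derivative(spectrum):
    """Forward difference: d[j] = x[j + 1] - x[j]; the result has n - 1 points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


# Linear baseline 1, 2, 3, ... with a peak added at the fourth point.
baseline_plus_peak = [1.0, 2.0, 3.0, 6.0, 5.0, 6.0, 7.0]
d = first_derivative(baseline_plus_peak)
```

Note that the differenced spectrum is one variable shorter than the input.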

An example of a spectrum before and after pre-processing with the first order derivative is shown in Figure

10.


Figure 10. Comparison of a spectrum before and after pre-processing with first order derivative

The effect of the three mathematical transformations (iCoShift + SNV + First Derivative) on a raw spectrum

can be seen in Figure 11.

Figure 11. Example of a spectrum pre-processed with iCoShift followed by SNV and by the first derivative

Savitzky–Golay

Savitzky–Golay (Savitzky & Golay, 1964) [6] is a filter that can be applied to a set of data points for the

purpose of smoothing the data, that is, to increase the signal-to-noise ratio without greatly distorting the

signal. This is achieved in a process known as convolution, by fitting successive sub-sets of adjacent data

points with a low-degree polynomial by the method of linear least squares. When the data points are equally

spaced an analytical solution to the least-squares equations can be found, in the form of a single set of

"convolution coefficients" that can be applied to all data sub-sets, to give estimates of the smoothed signal (or

derivatives of the smoothed signal) at the central point of each sub-set.
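A minimal sketch of the convolution step is given below, using the well-known coefficients for a 5-point window and a quadratic polynomial, (-3, 12, 17, 12, -3)/35. The window size is chosen only for illustration and is not the setting used in this project; the function name and the treatment of the end points are assumptions.

```python
# Convolution coefficients for a 5-point window and a quadratic (or cubic) polynomial.
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savgol_smooth5(y):
    """Smooth interior points with the 5-point Savitzky-Golay filter.
    The two points at each end are left unchanged in this sketch."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(SG5))
    return out


quadratic = [float(x * x) for x in range(7)]       # 0, 1, 4, 9, 16, 25, 36
noisy = [0.0, 1.0, 4.0, 10.0, 16.0, 25.0, 36.0]    # one point perturbed by +1
smooth_quadratic = savgol_smooth5(quadratic)
smooth_noisy = savgol_smooth5(noisy)
```

A noise-free quadratic passes through the filter unchanged, while the perturbed point in `noisy` is pulled back towards the underlying curve.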

An example of a raw spectrum (before pre-processing) and the same spectrum after the application of

Savitzky-Golay filter can be seen in Figure 12.


Figure 12. Comparison of a spectrum before and after pre-processing with Savitzky-Golay smoothing.

The effect of the four mathematical transformations (iCoShift + SNV + First Derivative + Savitzky-Golay)

applied as a series of spectral filters on a raw spectrum is presented in Figure 13.

Figure 13. Example of a spectrum pre-processed with iCoShift followed by SNV, the first derivative and Savitzky-

Golay

Pareto scaling

Scaling methods are data pre-processing approaches that divide each variable by a factor, the scaling

factor, which is different for each variable. They aim to adjust for the differences in fold differences between

the different variables by converting the data into differences in concentration relative to the scaling factor.

Pareto scaling uses a measure of the data dispersion (square root of the standard deviation) as a scaling

factor. Large fold changes are decreased more than small fold changes, thus the large fold changes are less

dominant compared to clean data. Furthermore, the data does not become dimensionless.

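$$\tilde{X}_{i,j} = \frac{X_{i,j} - \bar{X}_j}{\sqrt{s_j}}$$

where $X_{i,j}$ is the value of the jth variable for the ith sample, and $\bar{X}_j$ and $s_j$ are the mean and the standard deviation of the jth variable.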
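Pareto scaling can be sketched as follows: each variable (column) is mean-centred and divided by the square root of its standard deviation. The function name and the toy data matrix are assumptions, and the sketch does not guard against variables with zero variance.

```python
import math


def pareto_scale(data):
    """data: list of samples (rows), each a list of variables (columns).
    Each column is mean-centred and divided by the square root of its
    sample standard deviation."""
    n = len(data)
    n_vars = len(data[0])
    scaled = [[0.0] * n_vars for _ in range(n)]
    for j in range(n_vars):
        col = [row[j] for row in data]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        factor = math.sqrt(sd)   # Pareto scaling factor
        for i in range(n):
            scaled[i][j] = (data[i][j] - mean) / factor
    return scaled


# Two variables with very different spread (hypothetical values).
data = [[1.0, 100.0],
        [2.0, 300.0],
        [3.0, 500.0]]
scaled = pareto_scale(data)
```

The second variable still has a larger range than the first after scaling, but the gap is reduced; this is the sense in which large changes become less dominant without the data becoming dimensionless.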


An example of a raw spectrum (before pre-processing) and the same spectrum after the application of

Pareto scaling can be seen in Figure 14.

Figure 14. Comparison of a spectrum before and after scaling with Pareto

When all filters are applied together (iCoShift, SNV, first derivative, Savitzky-Golay and Pareto scaling), the pre-processed spectra look quite different from the raw spectra: they are shifted, normalised, differentiated, smoothed and scaled (Figure 15).

Figure 15. Example of a spectrum pre-processed with iCoshift followed by SNV, the first derivative, Savitzky-

Golay and Pareto scaling

3. SIMULATED SAMPLES

The calibration models developed in the first phase of the FAO117 project were created using a different number of samples for each class. The unequal number of samples was due to the re-grouping of classes done at the end of project FAO117 because of the similarities observed amongst the initial classes. This imbalance makes the model less certain when classifying unknown samples, mainly those that belong to classes with few samples such as the PKO class (only 12 samples). The number of samples in each class was as follows:

P class= 78 samples

RS class= 78 samples

PKO class= 12 samples


RSPKO class= 84 samples

RSPO class= 180 samples

PPKO class= 54 samples

Simulated samples were added to the calibration models in order to create balanced classes and avoid

any biased classification decision. Simulated samples are new samples created by offsetting the mean

spectrum of each class along the Y axis and slightly along the X axis. These samples were appended to the

calibration dataset and the model was re-trained. The offset percentage along the Y-axis varied between 0

and 25% in order to have a balanced classification model.

Table 1. Description of offsetting for the production of simulated samples and the resulted number of

samples

Class   Y-axis offset   X-axis offset        Number of synthetic samples added   Total number of samples
P       15%             1 variable to left   32                                  110
RS      20%             1 variable to left   42                                  120
PKO     25%             1 variable to left   52                                  64
RSPKO   15%             1 variable to left   32                                  116
RSPO    -               -                    0                                   180
PPKO    20%             1 variable to left   42                                  96

For instance, in the case of the P class, fifteen simulated samples were created above the mean spectrum in 1% steps (up to a 15% offset). Thereafter, the resulting spectra plus the mean spectrum were shifted by one variable to the left. In total, 32 simulated samples were added to the calibration dataset for the P class (new total = 110). Figure 16 shows the new simulated samples compared with the mean spectrum of the class.
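One way to reproduce the sample counts in Table 1 is sketched below. Several details are assumptions rather than statements from the report: the Y-axis offset is taken to be multiplicative, the mean spectrum itself is counted as one of the simulated samples (which is what gives 32 for a 15% offset), and the last value is repeated after the left shift to keep the spectrum length constant.

```python
def simulate_class(mean_spectrum, max_offset_pct):
    """Return simulated spectra for one class.

    Step 1: the mean spectrum plus copies raised by 1 %, 2 %, ..., max_offset_pct %.
    Step 2: every spectrum from step 1 shifted one variable to the left.
    """
    raised = [[y * (1 + pct / 100) for y in mean_spectrum]
              for pct in range(0, max_offset_pct + 1)]   # pct = 0 is the mean itself
    # Shift left by one variable; repeating the last value keeps the length (assumption).
    shifted = [s[1:] + s[-1:] for s in raised]
    return raised + shifted


mean_p = [0.10, 0.40, 0.90, 0.30, 0.05]   # toy mean spectrum, not real data
simulated_p = simulate_class(mean_p, 15)   # P class: 15 % offset
simulated_rs = simulate_class(mean_p, 20)  # RS class: 20 % offset
simulated_pko = simulate_class(mean_p, 25) # PKO class: 25 % offset
```

Under these assumptions the three calls give 32, 42 and 52 spectra, matching the numbers of synthetic samples listed for the P, RS and PKO classes.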

Figure 16. Thirty-two simulated samples added for P class (green colour)

The simulated samples for the rest of the classes were created following the same procedure as for the P class (Figures 17, 18, 19 and 20).


Figure 17. Forty-two simulated samples for RS class (green colour)

Figure 18. Fifty-two simulated samples for PKO class (green colour)

Figure 19. Thirty-two simulated samples for RSPKO class (green colour)

It has to be noted that no simulated spectra were created for the RSPO class because of the high number

of samples included originally in that class.


Figure 20. Forty-two simulated samples for PPKO class

Principal component analysis (PCA) was applied to all the resulting FTIR spectral data, composed of 686 samples (including all samples of the calibration models and the new simulated samples), for visualization purposes.


R2X(2)=0.279; Q2(cum)=0.979. (R2X: fraction of X variation modelled in the component; Q2: overall cross-validated R2X from the component) P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil,



A total of 23 uncorrelated principal components were calculated. The first two principal components

accounted for 88% of the total variance, with the first and second components explaining 60% and 28% of the

total variability, respectively.

4. PREDICTION- SCREENING STEP

4.1 INITIAL RESULTS OF THE INTER-LAB VALIDATION

A total of 126 spectra from all the participants were used for method validation. The SIMCA and PLS-DA classification models developed in DEFRA FAO117 were validated using the inter-lab validation set to predict the type of oil of each sample (oil and oil admixtures). The FTIR spectral intervals used were the same as those used for creating the calibration models: from 654.2 to 1875.4 and from 2520 to 3120.7 cm-1 (3781 variables).

In general PLS-DA gave a better performance (higher correct classification rate, 88.89%) than SIMCA (33.33%). Performance was assessed by the accuracy in predicting each class correctly. However, both techniques (PLS-DA and SIMCA) gave a high number of false positives (28.3% and 59.8%, respectively), which means a high risk of misclassification and would render the whole process redundant: samples that are wrongly classified in the screening step are not referred to the second (confirmation) step. In order to decrease the number of wrongly classified samples (false positives) and increase the number of non-classified samples, a probability threshold was calculated (see 4.2) and included in the initial methodology. The classification results with both methods are presented in Tables 2 and 3 below.

Table 2. Classification results on the prediction of the inter-lab samples when using FTIR coupled with

PLS-DA algorithm

Pre-processing (calibration dataset): 1. SNV 2. 1st Deriv 3. S-Golay (7,39) 4. Pareto
Pre-processing (prediction dataset): 1. Icoshift (‘average’, ’whole’) 2. SNV 3. 1st Derivative 4. S-Golay (7,39) 5. Pareto
PLS-DA ACC (%): 88.9

Class     TP   TN   FP   FN  Sensitivity or TPR  Specificity   FPR  Precision  F1 score
P         14   99   13    0                1.00         0.88  0.12       0.52      0.68
RS        28   98    0    0                1.00         1.00  0.00       1.00      1.00
PKO       14  111    1    0                1.00         0.99  0.01       0.93      0.97
RSPKO     14  112    0    0                1.00         1.00  0.00       1.00      1.00
RSPO      35   84    0    7                0.83         1.00  0.00       1.00      0.91
PPKO       7  112    0    7                0.50         1.00  0.00       1.00      0.67

*ACC: accuracy; TP: true positive; TN: true negative; FP: false positive; FN: false negative, TPR: true positive rate; FPR:

false positive rate. P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower

oil, rapeseed and sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS

class + PKO.

Table 3. Classification results on the prediction of the inter-lab samples when using FTIR coupled with SIMCA

algorithm


Pre-processing (calibration dataset): 1. SNV 2. 1st Deriv
SIMCA ACC (%): 33.3

Class     TP   TN   FP   FN  Sensitivity or TPR  Specificity   FPR  Precision  F1 score
P          0  112    0   14                0.00         1.00  0.00       0.00      0.00
RS         0   98    0   28                0.00         1.00  0.00       0.00      0.00
PKO        0  112    0   14                0.00         1.00  0.00       0.00      0.00
RSPKO      0  112    0   14                0.00         1.00  0.00       0.00      0.00
RSPO      42    0   84    0                1.00         0.00  1.00       0.33      0.50
PPKO       0  112    0   14                0.00         1.00  0.00       0.00      0.00

*P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO. ACC: accuracy; TP: true positive; TN: true negative; FP: false positive; FN: false negative; TPR: true positive rate; FPR: false positive rate.

The confusion matrices containing the information about actual and predicted classifications done by the

two classifiers (PLS-DA and SIMCA) are shown in Table 4 and 5, respectively.


Table 4. Performance of the PLS-DA classification model (confusion matrix).

Rows: actual class; columns: class predicted by PLS-DA.

Actual                  P      RS     PKO   RSPKO    RSPO    PPKO   Total  Accuracy (%)
P                      14       0       0       0       0       0      14        100.00
RS                      0      28       0       0       0       0      28        100.00
PKO                     0       0      14       0       0       0      14        100.00
RSPKO                   0       0       0      14       0       0      14        100.00
RSPO                    7       0       0       0      35       0      42         83.33
PPKO                    6       0       1       0       0       7      14         50.00
Reliability (%)     51.85  100.00   93.33  100.00  100.00  100.00

The average accuracy, average reliability and overall accuracy were 88.89, 90.86 and 88.89 %,

respectively.

Table 5. Performance of the SIMCA classification model

Rows: actual class; columns: class predicted by SIMCA.

Actual                  P      RS     PKO   RSPKO    RSPO    PPKO   Total  Accuracy (%)
P                       0       0       0       0      14       0      14          0.00
RS                      0       0       0       0      28       0      28          0.00
PKO                     0       0       0       0      14       0      14          0.00
RSPKO                   0       0       0       0      14       0      14          0.00
RSPO                    0       0       0       0      42       0      42        100.00
PPKO                    0       0       0       0      14       0      14          0.00
Reliability (%)      0.00    0.00    0.00    0.00   33.33    0.00

The SIMCA algorithm performed worse than PLS-DA, with an average accuracy, average reliability and overall accuracy of prediction of 16.67%, 5.56% and 33.33%, respectively.

4.2 CALCULATION OF P-VALUES

P-values were calculated to define thresholds for normalized confidence/probability as an upper limit for

classifying a sample to each class and for sample referral to the second step or confirmation step. For this

calculation, the training dataset is used as a prediction set to our model.

P-values were calculated by first determining the degrees of freedom of the experiment:

Degrees of freedom (dF) = n - 1

where n is the number of classes, and then calculating the Chi-square score using the following formula:

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

where "o" is the observed value and "e" is the expected value for each class.

Chi-square probability distribution is used to find P-value. The bigger the obtained Chi-Square is, the

greater the difference between the observed and expected frequencies will be. Due to the very high Chi-

square values obtained using the above formula, a web-based Chi-Square Distribution Calculator instead of

the Chi-square distribution table was used to automatically estimate the P-values by using the dF and Chi-

square values.

A significance level of 0.05 (5%) was selected for these experiments. This means that, if the null hypothesis were true, results at least as extreme as those observed would be expected to arise by chance at most 5% of the time.

4.2.1 Results with PLS-DA

The P-value estimated by using 5 degrees of freedom and a Chi-square value of 131.82 was <0.00001 for

a dataset composed of 686 samples. The calculation of each class contribution to the Chi-square value of the

classification model developed by PLS-DA is shown in Table 6.

Table 6. Expected, observed number of samples and contribution to Chi-square for each class

Class   Expected   Test proportion   Observed   Contribution to Chi-square
P       110        0.160             168        30.58
RS      120        0.175             183        33.08
PKO     64         0.093             91         11.39
RSPKO   116        0.169             80         11.17
RSPO    180        0.262             110        27.22
PPKO    96         0.140             54         18.38


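P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO.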

In these results, the calculated P-value is less than the significance level of 0.05, so the null hypothesis can be rejected. The null hypothesis is that the experimental classes manipulated did not affect the results observed. Thus, it is highly likely that there is an association between the classes manipulated and the results observed.

The individual class contributions to Chi-square were used to quantify how much of the total Chi-square

statistic is attributable to each class's divergence.

$$\text{Contribution to Chi-square} = \frac{(o - e)^2}{e}$$

The Chi-square statistic is the sum of these values for all classes.
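The class contributions and the total Chi-square statistic for the PLS-DA model can be recomputed from the expected and observed counts reported in Table 6. The short Python sketch below does only that arithmetic; the variable names are illustrative, and the P-value lookup (done with a web-based calculator in this work) is not reproduced.

```python
# Expected and observed numbers of samples per class (Table 6, PLS-DA).
expected = {"P": 110, "RS": 120, "PKO": 64, "RSPKO": 116, "RSPO": 180, "PPKO": 96}
observed = {"P": 168, "RS": 183, "PKO": 91, "RSPKO": 80, "RSPO": 110, "PPKO": 54}

# Contribution of each class: (o - e)^2 / e
contributions = {c: (observed[c] - expected[c]) ** 2 / expected[c] for c in expected}

chi_square = sum(contributions.values())     # sum over all classes
degrees_of_freedom = len(expected) - 1       # n - 1 with n = 6 classes
largest = max(contributions, key=contributions.get)
```

Rounded to two decimals, the total agrees with the reported Chi-square value of 131.82 with 5 degrees of freedom, and the RS class makes the largest single contribution.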

Classes with a large difference between observed and expected values make a larger contribution to the overall Chi-square statistic. The largest contribution comes from the RS class. Based on the above class contributions to Chi-square and the resulting normalized confidence/probability, the following thresholds were selected (Table 7). The higher the class contribution to Chi-square, the higher the threshold selected for that class.

Table 7. Thresholds for the normalized confidence/probability

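P*      RS      PKO     RSPKO   RSPO    PPKO
>0.21   >0.21   >0.18   >0.18   >0.19   >0.18

*P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO.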

4.2.2 Results with SIMCA

The P-value estimated by using 5 degrees of freedom and a Chi-square value of 192.1 was <0.00001 for a dataset composed of 686 samples. The calculation of each class contribution to the Chi-square value of the classification model developed by SIMCA is shown below (Table 8). In this case, the calculated P-value is also less than the significance level of 0.05. Therefore, the classes of this experiment had a meaningful effect on the results.

Table 8. Expected, observed number of samples and contribution to Chi-square

Class   Expected   Test proportion   Observed   Contribution to Chi-square
P*      110        0.160             97         1.54
RS      120        0.175             87         9.08
PKO     64         0.093             0          64.00
RSPKO   116        0.169             160        16.69
RSPO    180        0.262             296        74.76
PPKO    96         0.140             46         26.04

*P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and sunflower

mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO.

The RSPO class makes the greatest contribution to the high Chi-square value. Using the above class contributions to Chi-square and the resulting normalized confidence/probability, the following thresholds were selected. A higher threshold was selected for the classes with a high contribution to Chi-square.

Table 9. Thresholds for the normalized confidence/probability

P* RS PKO RSPKO RSPO PPKO

>0.20 >0.20 >0.22 >0.21 >0.29 >0.21

*P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and sunflower

mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO.

4.3 PREDICTION WITH PLS-DA

PLS-DA consists of a classical PLS regression in which the response variable is categorical (replaced by the set of dummy variables describing the categories) and expresses the class membership of the statistical units. Therefore, PLS-DA does not allow for response variables other than the one defining the groups of individuals. As a consequence, all measured variables play the same role with respect to the class assignment. PLS components are built by trying to find a proper compromise between two purposes: describing the set of explanatory variables and predicting the response ones. This approach may go further than the classical SIMCA classification method, which works more on the reassignment of units to pre-defined classes. PLS-DA calibration models were created in our previous DEFRA project (FAO117). Samples in the inter-lab validation set were compared to the model and either assigned to the category being modelled or not, on the basis of their normalised probabilities from the model and the thresholds defined by the P-values in Section 4.2. Each of these probabilities is a negative exponential function of the distance between the testing sample and each model class. The values were normalised by dividing them by the sum of the probabilities associated with each testing sample in the probability space, so that this sum is equal to one.
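The assignment rule can be sketched as follows. This is an illustrative Python example under stated assumptions, not the software used in the project: the probability is taken as exp(-distance), the sample is assigned to the class with the highest normalised probability only if that probability exceeds the class threshold from Table 7, and the distances are invented numbers.

```python
import math

# Thresholds for the normalised probability of each class (Table 7, PLS-DA).
THRESHOLDS = {"P": 0.21, "RS": 0.21, "PKO": 0.18,
              "RSPKO": 0.18, "RSPO": 0.19, "PPKO": 0.18}


def classify(distances):
    """distances: mapping class -> distance between the sample and that class.
    Returns (label, normalised probabilities); label is 'non-classified'
    when the best probability does not exceed its class threshold."""
    raw = {c: math.exp(-d) for c, d in distances.items()}   # negative exponential
    total = sum(raw.values())
    probs = {c: p / total for c, p in raw.items()}          # probabilities sum to one
    best = max(probs, key=probs.get)
    if probs[best] > THRESHOLDS[best]:
        return best, probs
    return "non-classified", probs


# Hypothetical sample clearly closest to the P class.
label_a, probs_a = classify({"P": 1.0, "RS": 3.0, "PKO": 3.0,
                             "RSPKO": 3.0, "RSPO": 2.0, "PPKO": 2.5})

# Hypothetical sample almost equally far from every class.
label_b, probs_b = classify({"P": 2.0, "RS": 2.0, "PKO": 2.0,
                             "RSPKO": 2.0, "RSPO": 2.0, "PPKO": 2.05})
```

The second sample has no class with a normalised probability above its threshold, so it would be referred to the confirmation step rather than assigned.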

Performance of classification models (95% confidence level) was calculated using several parameters:

Sensitivity or true positive rate (TPR) is the percentage of positive labelled samples that were predicted as positive (Sensitivity = TP / (TP + FN)),

Specificity or true negative rate is the percentage of negative labelled samples that were predicted as negative (Specificity = TN / (TN + FP)),

False positive rate (FPR) is the percentage of negative labelled samples that were incorrectly predicted as positive (FPR = FP / (FP + TN)),

Precision is the percentage of positive predictions that are correct (Precision = TP / (TP + FP)) and,

F1 score (also F-score or F-measure) is the harmonic mean of precision and sensitivity; it reaches its best value at 1 and its worst at 0 (F1 = 2TP / (2TP + FP + FN)).
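These definitions translate directly into code. The Python sketch below applies them to the counts reported for the P class in Table 10 (TP = 12, TN = 109, FP = 3, FN = 2). The function name is illustrative, and returning 0 when a denominator is zero is an assumption, chosen because the tables in this report show 0.00 for classes with no positive predictions.

```python
def metrics(tp, tn, fp, fn):
    """Per-class performance parameters from the four confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
        # Report 0.0 when nothing was predicted as positive (assumption).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


m = metrics(tp=12, tn=109, fp=3, fn=2)            # P class, Table 10
rounded = {name: round(value, 2) for name, value in m.items()}
```

Rounded to two decimals, the values are 0.86, 0.97, 0.03, 0.80 and 0.83, in agreement with the P class row of Table 10.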

Inter-lab validation samples (n=126) were predicted using the calibration models developed in DEFRA FAO117. Specific intervals from the FTIR spectra (from 654.23 to 1875.4 cm-1 and from 2520.0 to 3120.7 cm-1, 3781 variables) and two latent variables were used. The results are shown in Tables 10 and 11.

Table 10. PLSDA model performances on inter-lab validation set using 3781 variables


Pre-processing (calibration dataset): 1. SNV 2. 1st Deriv
PLS-DA ACC (%): 83.33

Class     TP   TN   FP   FN  Sensitivity or TPR  Specificity   FPR  Precision  F1 score
P         12  109    3    2                0.86         0.97  0.03       0.80      0.83
RS        26   98    0    2                0.93         1.00  0.00       1.00      0.96
PKO       14  112    0    0                1.00         1.00  0.00       1.00      1.00
RSPKO     13  112    0    1                0.93         1.00  0.00       1.00      0.96
RSPO      35   84    0    7                0.83         1.00  0.00       1.00      0.91
PPKO       5  112    0    9                0.36         1.00  0.00       1.00      0.53

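* ACC: accuracy; TP: true positive; TN: true negative; FP: false positive; FN: false negative; TPR: true positive rate; FPR: false positive rate. P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO.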

The average accuracy, average reliability and overall accuracy were 81.75, 96.67 and 83.33 %,

respectively. Accuracy is the fraction of correctly classified samples with regard to all samples of that ground

truth class and reliability is the fraction of correctly classified samples with regard to all samples classified as

that class. The overall accuracy is calculated as the total number of correctly classified samples divided by

the total number of validation samples.

The confusion table is presented below in Table 11. Twelve out of fourteen samples of the validation set that belong to the P group were classified as belonging to the P group and the remaining two were not assigned to any of the modelled groups (non-classified). Twenty-six of the 28 samples that belong to the RS group were classified as belonging to the RS group and two were non-classified. All fourteen samples that belong to the PKO group were correctly classified. Thirteen samples that belong to the RSPKO group were classified as belonging to the RSPKO group and one sample was non-classified. Thirty-five samples that belong to the RSPO group were classified as belonging to the RSPO group, whereas two samples were wrongly classified as belonging to the P group and five samples were non-classified. Five samples that belong to the PPKO group were classified as belonging to the PPKO group, whereas one sample was wrongly classified as belonging to the P group and eight samples were non-classified.


Table 11. Performance of the PLS-DA classification model after the application of thresholds

Rows: actual class; columns: predicted class.

Actual                  P      RS     PKO   RSPKO    RSPO    PPKO  Non-classified   Total  Accuracy (%)
P                      12       0       0       0       0       0               2      14         85.71
RS                      0      26       0       0       0       0               2      28         92.86
PKO                     0       0      14       0       0       0               0      14        100.00
RSPKO                   0       0       0      13       0       0               1      14         92.86
RSPO                    2       0       0       0      35       0               5      42         83.33
PPKO                    1       0       0       0       0       5               8      14         35.71
Reliability (%)     80.00  100.00  100.00  100.00  100.00  100.00
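The summary figures quoted for this model (average accuracy, average reliability, overall accuracy, number of non-classified samples and error rate) can be recomputed from the counts in Table 11. The following Python sketch shows the arithmetic; the variable names are illustrative.

```python
classes = ["P", "RS", "PKO", "RSPKO", "RSPO", "PPKO"]

# Rows: actual class. Columns: predicted class in `classes` order, then non-classified.
table11 = {
    "P":     [12, 0, 0, 0, 0, 0, 2],
    "RS":    [0, 26, 0, 0, 0, 0, 2],
    "PKO":   [0, 0, 14, 0, 0, 0, 0],
    "RSPKO": [0, 0, 0, 13, 0, 0, 1],
    "RSPO":  [2, 0, 0, 0, 35, 0, 5],
    "PPKO":  [1, 0, 0, 0, 0, 5, 8],
}

# Accuracy: correctly classified / all samples of that actual class.
accuracy = {c: 100 * table11[c][k] / sum(table11[c]) for k, c in enumerate(classes)}

# Reliability: correctly classified / all samples predicted as that class.
reliability = {}
for k, c in enumerate(classes):
    predicted_as_c = sum(table11[a][k] for a in classes)
    reliability[c] = 100 * table11[c][k] / predicted_as_c

correct = sum(table11[c][k] for k, c in enumerate(classes))
total = sum(sum(row) for row in table11.values())
non_classified = sum(row[-1] for row in table11.values())
wrong_class = total - correct - non_classified

average_accuracy = sum(accuracy.values()) / len(classes)
average_reliability = sum(reliability.values()) / len(classes)
overall_accuracy = 100 * correct / total
error_rate = 100 * wrong_class / total
```

The results reproduce the reported values: 81.75% average accuracy, 96.67% average reliability, 83.33% overall accuracy, 18 non-classified samples and three samples assigned to the wrong class (2.38%).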

A total of 18 samples were not assigned to any of the modelled groups because their probability was below the established threshold for each class (P-value). All 18 samples were referred to the confirmation step. The three false positive samples (samples classified as belonging to the wrong group) correspond to an error of 2.38% for the method when using the PLS-DA algorithm. The expected class, the observed class and the potential reason for each misclassification are shown in Table 12.

Table 12. Potential reasons for the misclassification of samples

No.  Sample name   Sample composition   Expected class   Observed class
1    gmx5-a.spa    70% RO - 30% PS      RSPO             P
2    hmx5_a.spa    70% RO - 30% PS      RSPO             P
3    lmx6b-a.spa   40% PKO - 60% PO     PPKO             P

Potential reasons:

1: The sample contains palm stearin, which is difficult to analyse because it solidifies quickly when placed on the non-heated ATR. Most of the participants reported that samples were solid before the end of the spectra collection. Additionally, admixtures of palm stearin and rapeseed oil were not included in the calibration models (only admixtures of palm stearin and palm oil, which belong to the P class, were included).

2: Same as sample 1.

3: This admixture contains 60% palm oil and the model assigned it to the closest class, in this case P.

4.4 PREDICTION WITH SIMCA

SIMCA is a class-modelling technique that involves the use of principal components to model a class of material on the basis of samples in a training (or calibration) set. SIMCA calibration models were created in DEFRA project FAO117. Samples in the inter-lab validation set were compared to the model and assigned either to the category being modelled or not, on the basis of their predicted distance from the model (and P values).

The performance of the classification models (95% confidence level) was evaluated using several parameters: sensitivity (the percentage of positive-labelled samples that were predicted as positive), specificity (the percentage of negative-labelled samples that were predicted as negative), false positive rate (FPR; the percentage of negative-labelled samples that were incorrectly predicted as positive), precision (the percentage of positive predictions that are correct) and F1 score (the harmonic mean of precision and sensitivity).
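The parameters defined above can be written out from the four per-class counts (TP, TN, FP, FN). The sketch below is illustrative only. The function name `classification_metrics` is hypothetical, and returning 0 when a denominator is zero is an assumed convention. The example assumes that the four counts in the RSPO row of Table 13 are TP, TN, FP and FN in that order.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute one-vs-rest performance parameters for a single class."""

    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio (0/0) is reported as 0.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # positives predicted as positive
    specificity = ratio(tn, tn + fp)   # negatives predicted as negative
    fpr = ratio(fp, fp + tn)           # negatives predicted as positive
    precision = ratio(tp, tp + fp)     # positive predictions that are correct
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": fpr,
        "precision": precision,
        "f1": f1,
    }


# RSPO row of Table 13, read as TP=5, TN=82, FP=2, FN=37 (assumed order).
rspo = classification_metrics(tp=5, tn=82, fp=2, fn=37)
print({name: round(value, 2) for name, value in rspo.items()})
```

With these counts the sketch gives 0.12, 0.98, 0.02, 0.71 and 0.20 after rounding, which matches the five ratios printed in that row.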

Inter-lab validation samples (n=126) were predicted using the calibration models. Specific intervals of the FTIR spectra (from 654.23 to 1875.4 cm⁻¹ and from 2520.0 to 3120.7 cm⁻¹; 3781 variables) were used. The results are shown in Tables 13 and 14.
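Restricting a spectrum to the two wavenumber intervals quoted above can be sketched as follows. This is an illustration under assumptions, not the software used in the project: the function name `select_intervals` and the toy six-point spectrum are hypothetical, and a real spectrum would have far more points (the text reports 3781 variables after selection).

```python
# Wavenumber intervals (cm^-1) quoted in the text.
INTERVALS = [(654.23, 1875.4), (2520.0, 3120.7)]


def select_intervals(wavenumbers, absorbances, intervals=INTERVALS):
    """Keep only the points whose wavenumber lies inside one of the intervals."""
    kept_wavenumbers, kept_absorbances = [], []
    for wavenumber, absorbance in zip(wavenumbers, absorbances):
        if any(low <= wavenumber <= high for low, high in intervals):
            kept_wavenumbers.append(wavenumber)
            kept_absorbances.append(absorbance)
    return kept_wavenumbers, kept_absorbances


# Hypothetical toy spectrum, for illustration only.
toy_wavenumbers = [600.0, 700.0, 1800.0, 2000.0, 2600.0, 3200.0]
toy_absorbances = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
kept_w, kept_a = select_intervals(toy_wavenumbers, toy_absorbances)
print(kept_w)
print(kept_a)
```

Only the points at 700, 1800 and 2600 cm⁻¹ fall inside the two intervals, so the other three points of the toy spectrum are dropped.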

Table 13. SIMCA model performances on inter-lab validation set using 3781 variables

Pre-processing (calibration dataset): 1. SNV 2. 1st Deriv
Pre-processing (prediction dataset): [...] 1st Deriv
SIMCA ACC (%): 3.96

Class    TP    TN   FP   FN   Sensitivity (TPR)   Specificity    FPR   Precision   F1 score
P         0   112    0   14                0.00          1.00   0.00        0.00       0.00
RS        0    98    0   28                0.00          1.00   0.00        0.00       0.00
PKO       0   112    0   14                0.00          1.00   0.00        0.00       0.00
RSPKO     0   112    0   14                0.00          1.00   0.00        0.00       0.00
RSPO      5    82    2   37                0.12          0.98   0.02        0.71       0.20
PPKO      0   112    0   14                0.00          1.00   0.00        0.00       0.00

* ACC: accuracy; TP: true positive; TN: true negative; FP: false positive; FN: false negative; TPR: true positive rate; FPR: false positive rate; [...] class + PKO.

The confusion table is presented below in Table 14. One of the fourteen validation samples belonging to the P group was classified as belonging to the RSPO group and the remaining thirteen were not assigned to any of the modelled groups (non-classified). None of the validation samples belonging to the RS, RSPKO and PPKO groups was assigned to any of the modelled groups (all non-classified). Five of the forty-two samples belonging to the RSPO group were correctly classified as RSPO, whereas thirty-seven were non-classified. One of the fourteen samples belonging to the PKO group was classified as belonging to the RSPO group, whereas thirteen were non-classified.

Table 14. Performance of the SIMCA classification model after the application of thresholds

Actual class (rows) against predicted class (columns)

Actual                 P     RS    PKO  RSPKO   RSPO   PPKO  Not classified  Total  Accuracy (%)
P                      0      0      0      0      1      0              13     14          0.00
RS                     0      0      0      0      0      0              28     28          0.00
PKO                    0      0      0      0      1      0              13     14          0.00
RSPKO                  0      0      0      0      0      0              14     14          0.00
RSPO                   0      0      0      0      5      0              37     42         11.90
PPKO                   0      0      0      0      0      0              14     14          0.00
Reliability (%)     0.00   0.00   0.00   0.00  71.43   0.00

*P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil, rapeseed

and sunflower mixtures; RSPO group: RS group + P group; PPKO group: P group + PKO; RSPKO group: RS group + PKO

A total of 119 samples were not assigned to any of the modelled groups because their probability was below the established threshold for each class (P-value). All 119 samples were referred to the confirmation step. Overall, the performance of the SIMCA algorithm is poor, with an average accuracy, average reliability and overall accuracy of prediction of 1.98%, 11.90% and 3.97% respectively (Table 14). Only five samples belonging to the RSPO group were correctly classified as RSPO. All the other samples were either wrongly classified (two false positive samples) or non-classified (n=119). The two false positive samples (samples classified as belonging to the wrong group) correspond to an error of 1.59% for the method when using the SIMCA algorithm. The expected class, the observed class and the potential reason for each misclassification are shown in Table 15.

Table 15. Potential reasons for the misclassification of samples

No.  Sample name   Sample composition   Expected class   Observed class
1    lpx3a-a.spa   100% PKO             PKO              RSPO
2    lpx1a-a.spa   100% PO              P                RSPO

Potential reasons:

1: The RSPO class includes a wide variety of pure oils and oil admixtures.

2: Same as sample 1.

4.5 SAMPLE REFERRAL TO THE CONFIRMATION STEP (GC-FAMEs)

A total of 18 samples were referred to the confirmation step when using the PLS-DA algorithm and 119 samples when using the SIMCA algorithm (Tables 16 and 17).

Table 16. Samples referred to the confirmation step when using FTIR coupled with PLS-DA algorithm.

File Name      Actual class
apx1-a.spa     P
emx6b_a.spa    PPKO
fmx6-a.spa     PPKO
gmx6-a.spa     PPKO
hmx6_a.spa     PPKO
kmx6-a.spa     PPKO
kmx5-a.spa     RSPO
lmx9b-a.spa    RSPO
lmx5b-a.spa    RSPO
lpx2b-a.spa    RS
lmx6c-a.spa    PPKO
lmx7c-a.spa    RSPKO
nmx6-a.spa     PPKO
npx1-a.spa     P
omx6-a.spa     PPKO
omx9-a.spa     RSPO
omx5-a.spa     RSPO
opx2-a.spa     RS


Only the PLS-DA classification is taken forward in the project, although the less satisfactory SIMCA classification results are also presented here.

Table 17. Samples referred to the confirmation step when using FTIR coupled with SIMCA.

No. File Name Actual class No. File Name Actual class

1 apx1-a.spa P 43 gmx4-a.spa RSPO

2 amx2-a.spa RS 44 hpx3_a.spa PKO

3 amx3-a.spa PKO 45 hmx6_a.spa PPKO

4 amx4-a.spa RSPO 46 hpx1-a.spa P

5 amx5-a.spa RSPO 47 hmx9_a.spa RSPO

6 amx6-a.spa PPKO 48 hpx2_2.spa RS

7 amx7-a.spa RSPKO 49 hmx7_a.spa RSPKO

8 amx8-a.spa RS 50 hmx8_a.spa RS

9 epx3a_a.spa PKO 51 hmx4_a.spa RSPO

10 emx6a_a.spa PPKO 52 kpx3-a.spa PKO

11 epx1a_a.spa P 53 kmx6-a.spa PPKO

12 emx9a_a.spa RSPO 54 kpx1-a.spa P

13 emx5a_a.spa RSPO 55 kpx2-a.spa RS

14 epx2a_a.spa RS 56 kmx7-a.spa RSPKO

15 emx7a_a.spa RSPKO 57 kmx8-a.spa RS

16 emx8a_a.spa RS 58 kmx4-a.spa RSPO

17 emx4a_a.spa RSPO 59 kmx9-a.spa RSPO

18 epx3b_a.spa PKO 60 lmx6a-a.spa PPKO

19 emx6b_a.spa PPKO 61 lmx9a-a.spa RSPO

20 epx1b_a.spa P 62 lmx5a-a.spa RSPO

21 emx9b_a.spa RSPO 63 lpx2a-a.spa RS

22 emx5b_a.spa RSPO 64 lmx7a-a.spa RSPKO

23 epx2b_a.spa RS 65 lmx8a-a.spa RS

24 emx7b_a.spa RSPKO 66 lmx4a-a.spa RSPO

25 emx8b_a.spa RS 67 lpx3b-a.spa PKO

26 emx4b_a.spa RSPO 68 lmx6b-a.spa PPKO

27 fpx3-a.spa PKO 69 lpx1b-a.spa P

28 fmx6-a.spa PPKO 70 lmx9b-a.spa RSPO

29 fpx1-a.spa P 71 lmx5b-a.spa RSPO

30 fmx9-a.spa RSPO 72 lpx2b-a.spa RS

31 fmx5-a.spa RSPO 73 lmx7b-a.spa RSPKO

32 fpx2-a.spa RS 74 lmx8b-a.spa RS

33 fmx7-a.spa RSPKO 75 lmx4b-a.spa RSPO

34 fmx8-a.spa RS 76 lpx3c-a.spa PKO

35 fpx4-a.spa RSPO 77 lmx6c-a.spa PPKO

36 gpx3-a.spa PKO 78 lpx1c-a.spa P

37 gmx6-a.spa PPKO 79 lmx9c-a.spa RSPO

38 gpx1-a.spa P 80 lmx5c-a.spa RSPO

39 gmx9-a.spa RSPO 81 lpx2c-a.spa RS

40 gpx2-a.spa RS 82 lmx7c-a.spa RSPKO

41 gmx7-a.spa RSPKO 83 lmx8c-a.spa RS

42 gmx8-a.spa RS 84 lmx4c-a.spa RSPO


No. File Name Actual class

85 npx3-a.spa PKO

86 nmx6-a.spa PPKO

87 npx1-a.spa P

88 nmx9-a.spa RSPO

89 nmx5-a.spa RSPO

90 npx2-a.spa RS

91 nmx7-a.spa RSPKO

92 nmx8-a.spa RS

93 nmx4-a.spa RSPO

94 opx3-a.spa PKO

95 omx6-a.spa PPKO

96 opx1-a.spa P

97 omx9-a.spa RSPO

98 opx2-a.spa RS

99 omx7-a.spa RSPKO

100 omx8-a.spa RS

101 omx4-a.spa RSPO

102 ppx3-a.spa PKO

103 pmx6-a.spa PPKO

104 ppx1-a.spa P

105 pmx9-a.spa RSPO

106 pmx5-a.spa RSPO

107 ppx2-a.spa RS

108 pmx7-a.spa RSPKO

109 pmx8-a.spa RS

110 pmx4-a.spa RSPO

111 palm kernel oil-a.spa PKO

112 40 pko + 60 po -a.spa PPKO

113 palm oil-a.spa P

114 70 pol + 30 ro -a.spa RSPO

115 70 ro + 30 ps -a.spa RSPO

116 rapeseed oil-a.spa RS

117 50 ro + 50 pko -a.spa RSPKO

118 40 ro + 60 so -a.spa RS

119 50 ro + 50 po -a.spa RSPO

5. PREDICTION- CONFIRMATION STEP

Individual fatty acid concentrations were calculated using the internal standard method, as in phase 1 of the FAO117 project. Response factors were calculated from the external fatty acid standards with respect to C17:0, which was used as the internal standard. The peak area of each individual fatty acid was divided by the peak area of the internal standard, multiplied by the internal standard concentration and by the corresponding response factor, and then corrected for sample weight and dilution factors. Duplicate analyses were then averaged. The fatty acid concentrations of all the oil samples included in this validation trial are presented in Table 18.
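The internal standard calculation described above can be sketched as follows. This is a hedged illustration rather than the project's SOP: the function names, the units and the numerical inputs are hypothetical, and the way sample weight and dilution enter the formula (multiply by the dilution factor, divide by the sample weight) is an assumption, since the text does not spell it out.

```python
def response_factor(area_fa_std, conc_fa_std, area_is_std, conc_is_std):
    """Response factor of a fatty acid relative to the internal standard
    (C17:0 in the text), estimated from an external standard run."""
    return (conc_fa_std / area_fa_std) * (area_is_std / conc_is_std)


def fatty_acid_content(area_fa, area_is, conc_is, rf,
                       sample_weight_g, dilution_factor=1.0):
    """Fatty acid content per gram of oil for a single injection.

    Assumed form: (A_FA / A_IS) * C_IS * RF * dilution / sample weight.
    """
    amount = (area_fa / area_is) * conc_is * rf * dilution_factor
    return amount / sample_weight_g


def mean_of_duplicates(first, second):
    """Average the results of duplicate analyses."""
    return (first + second) / 2.0


# Hypothetical peak areas and concentrations, for illustration only.
rf = response_factor(area_fa_std=80.0, conc_fa_std=1.0,
                     area_is_std=100.0, conc_is_std=1.0)
run_1 = fatty_acid_content(200.0, 100.0, conc_is=1.0, rf=rf,
                           sample_weight_g=0.5, dilution_factor=10.0)
run_2 = fatty_acid_content(208.0, 100.0, conc_is=1.0, rf=rf,
                           sample_weight_g=0.5, dilution_factor=10.0)
print(rf, run_1, run_2, mean_of_duplicates(run_1, run_2))
```

The resulting unit depends on the unit chosen for the internal standard concentration. With a concentration expressed in mg, the result is in mg fatty acid per gram of oil, the unit used in Table 18.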


A total of 18 samples from all participants were referred to the confirmation step when using the PLS-DA algorithm. Across all participants, these corresponded to six different samples:

Sample 1: Palm oil (P class)

Sample 2: Rapeseed oil (RS group)

Sample 5: Rapeseed oil (70%) + Palm stearin (30%) (RSPO class)

Sample 6: Palm kernel oil (40%) + Palm oil (60%) (PPKO class)

Sample 7: Rapeseed oil (50%) + Palm kernel oil (50%) (RSPKO class)

Sample 9: Palm olein (70%) + Rapeseed oil (30%) (RSPO class)

These six (6) samples were analysed chromatographically to determine their fatty acid profile according to

the SOPs and all FA contents (mg fatty acid / gram oil blend) and the P/S ratios were calculated.

With the application of the FA criteria (Table 19), the following classification results were obtained:

Sample 1 was not assigned to any of the classes

Sample 2 was found to belong to the RS class

Sample 5 was found to belong to the RSPO class

Sample 6 was found to belong to the PPKO class

Sample 7 was found to belong to the RSPKO class


At the end of the procedure, 16 out of 18 samples were correctly classified, whereas two samples were not assigned to any of the classes because they did not meet all the requirements of any class.

A total of 119 samples were referred to the confirmation step when using the SIMCA algorithm. Across all participants, these corresponded to nine different samples:

Sample 1: Palm oil (P class)

Sample 2: Rapeseed oil (RS group)

Sample 3: Palm kernel oil (PKO class)

Sample 4: Rapeseed oil (50%) + Palm oil (50%) (RSPO class)

Sample 5: Rapeseed oil (70%) + Palm stearin (30%) (RSPO class)

Sample 6: Palm kernel oil (40%) + Palm oil (60%) (PPKO class)

Sample 7: Rapeseed oil (50%) + Palm kernel oil (50%) (RSPKO class)

Sample 9: Palm olein (70%) + Rapeseed oil (30%) (RSPO class)

These nine (9) samples were analysed chromatographically to determine their fatty acid profile according

to the SOPs and all FA contents (mg fatty acid / gram oil blend) and the P/S ratios were calculated.

With the application of the FA criteria (Table 19), the following classification results were obtained:


Sample 2 was found to belong to the RS class




Sample 6 was found to belong to the PPKO class

Sample 7 was found to belong to the RSPKO class



At the end of the procedure, 93 out of 119 samples were correctly classified, whereas 26 samples were not assigned to any of the classes because they did not meet all the requirements of any class.

Table 18. Content (mg/g) of fatty acids of interest for all oil samples included in the inter-lab validation.

Fatty acid content (mg FA/g oil)

Sample           Group        C8:0     C12:0     C14:0     C16:0     C18:1C18:2 c n6 P/S ratio
                          Caprylic    Lauric  Myristic  Palmitic     Oleic  Linoleic
100% PO          P           0.081     1.233     4.618   268.228   336.052    98.829     0.325
100% RO          RS          0.000     0.000     0.158    24.627   361.206   165.410     6.275
40% RO+60% SO    RS          0.000     0.000     0.248    34.679   280.455   384.626     6.792
100% PKO         PKO        17.698   214.702    75.010    54.168   106.653    21.056     0.045
40% PKO+60% PO   PPKO        5.061    75.942    30.996   158.751   199.496    54.047     0.189
50% RO+50% PO    RSPO        0.031     0.578     2.562   152.631   309.400    99.498     0.776
70% RO+30% PS    RSPO        0.000     0.181     1.725   128.258   345.890   130.745     1.299
70% POL+30% RO   RSPO        0.047     0.894     3.270   170.749   320.622   104.774     0.650
50% RO+50% PKO   RSPKO       8.418   100.836    37.843    41.922   239.518    90.102     0.603

*FA: fatty acid; P/S ratio: polyunsaturated FAs/Saturated FAs; P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO group: RS group + P group; PPKO group: P group + PKO; RSPKO group: RS group + PKO; PO: palm oil; RO: rapeseed oil; PKO: palm kernel oil; PS: palm stearin; POL: palm olein.

Table 19. Classification criteria of fatty acids for every one of the 6 classes developed from control in-house oil admixtures (DEFRA FAO117).

Specific FA P group PKO group RS group PPKO group RSPO group RSPKO group

C8:0 >20 >2.5 >2.5

C12:0 >0.99 >300 <0.1

C14:0 7.8-10.0 <0.7

C16:0 315-490 >=70 58-330 35-70

C18:1 >=195

C18:2c n6 43-80 135-550 25-75 70-425 24-450

P/S ratio <0.25 <0.04 >4.0 <=0.3 >=0.325

*FA: fatty acid; P/S ratio: polyunsaturated FAs/Saturated FAs; P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO group: RS group + P group; PPKO group: P group + PKO; RSPKO group: RS group + PKO


Appendix V – Fatty acid Inter-Lab trial

Sample 1: Standard Soya-Maize oil blend

FATTY ACIDS

BCR-162R IRMM


C6:0 0.00 0.01 2.00

C8:0 0.00 0.00 0.01 2.00

C10:0 0.00 <0.1 0.01 1.76

C12:0 0.00 <0.1 0.01 1.76

C14:0 0.04 <0.1 0.05 0.10 0.41

C15:0 0.00 0.03 2.00

C16:0 10.74 11.18 10.90 10.69 11.00 0.02

C16:1c 0.06 0.12 0.20 0.92

C17:0 0.07 0.07 0.10 0.71

C17:1c 0.03 0.08 1.43

C18:0 2.82 3.27 2.90 2.84 2.90 0.07

C18:1t 0.00 <0.1 0.03 0.10 0.87

C18:1c 25.40 28.58 26.70 26.71 26.60 0.04

C18:2t 0.16 <0.1 0.46 0.50 0.67

C18:2c 54.13 52.13 55.30 53.86 53.60 0.02

C20:0 0.27 0.40 0.40 0.40 0.17

C18:3c6,9,12 0.16 0.01 1.58

C20:1c 0.35 0.30 0.35 0.30 0.09

C18:3c9,12,15 3.35 3.28 3.60 3.75 3.30 0.07

C20:2c 0.02 0.03 1.17

C22:0 0.28 <0.1 0.29 0.30 0.39

C23:0 0.00 0.00

C24:0 0.12 0.17 0.10 0.73


Sample 2: Palm oil and shea butter admixture (50% palm oil + 50% shea butter)

FATTY ACIDS


C6:0 0.00 0.01 2.00
C8:0 0.00 0.00 0.01 2.00
C10:0 0.00 <0.1 0.01 1.76
C12:0 0.13 <0.1 0.13 0.20 0.31
C14:0 0.58 0.50 0.49 0.50 0.08
C15:0 0.03 0.03 1.17
C16:0 24.35 24.50 24.02 24.60 0.01
C16:1c 0.06 0.10 0.10 0.73
C17:0 0.07 0.08 0.10 0.69
C17:1c 0.00 0.02 2.00
C18:0 24.24 22.80 22.65 22.70 0.03
C18:1t 0.00 <0.1 0.05 0.10 0.76
C18:1c 41.68 43.00 42.64 42.20 0.01
C18:2t 0.07 <0.1 0.14 0.20 0.44
C18:2c 7.49 8.20 7.98 7.90 0.04
C20:0 0.85 0.90 0.93 0.90 0.04
C18:3c6,9,12 0.00 0.03 2.00
C20:1c 0.21 <0.1 0.26 0.30 0.40
C18:3c9,12,15 0.12 <0.1 0.21 0.20 0.35
C20:2c 0.00 0.01 2.00
C22:0 0.08 <0.1 0.10 0.10 0.11
C23:0 0.00
C24:0 0.05 0.08 0.10 0.77


Sample 3: Palm oil and rapeseed oil admixture (65% palm oil + 35% rapeseed oil)

FATTY ACIDS


C6:0 0.00 0.01 2.00
C8:0 0.00 0.00 0.01 2.00
C10:0 0.00 <0.1 0.01 1.76
C12:0 0.05 <0.1 0.07 0.10 0.28
C14:0 0.73 0.70 0.63 0.70 0.06
C15:0 0.03 0.05 1.23
C16:0 30.22 30.10 29.53 30.20 0.01
C16:1c 0.12 0.23 0.20 0.74
C17:0 0.06 0.08 0.10 0.71
C17:1c 0.00 0.06 2.00
C18:0 3.66 3.50 3.45 3.50 0.03
C18:1t 0.00 <0.1 0.05 0.10 0.76
C18:1c 49.02 48.90 48.46 47.90 0.01
C18:2t 0.24 <0.1 0.38 0.50 0.57
C18:2c 12.42 13.30 12.89 12.70 0.03
C20:0 0.38 0.40 0.45 0.40 0.07
C18:3c6,9,12 0.16 0.01 1.57
C20:1c 0.59 0.50 0.51 0.50 0.08
C18:3c9,12,15 2.13 2.60 2.82 2.30 0.13
C20:2c 0.00 0.02 2.00
C22:0 0.13 <0.1 0.16 0.10 0.23
C23:0 0.00
C24:0 0.06 0.10 0.10 0.73


Sample 4: Palm kernel oil and palm oil admixture (42% palm kernel oil + 58% palm oil)

FATTY ACIDS


C6:0 0.03 0.05 0.10 0.94
C8:0 0.74 1.40 0.95 1.10 0.26
C10:0 1.03 1.40 1.10 1.30 0.14
C12:0 18.14 19.20 17.01 18.70 0.05
C14:0 7.74 7.00 6.67 7.10 0.06
C14:1 0.10 2.00
C15:0 0.03 0.03 1.15
C16:0 29.09 28.80 28.93 28.50 0.01
C16:1c 0.06 0.13 0.10 0.77
C17:0 0.06 0.07 0.10 0.73
C17:1c 0.00 0.02 2.00
C18:0 3.91 3.70 3.77 3.60 0.03
C18:1t 0.00 <0.1 0.04 1.33
C18:1c 32.46 32.20 33.79 31.90 0.03
C18:2t 0.09 <0.1 0.15 0.10 0.24
C18:2c 6.09 6.30 6.46 6.00 0.03
C20:0 0.25 <0.1 0.32 0.30 0.41
C18:3c6,9,12 0.00 0.01 2.00
C20:1c 0.12 <0.1 0.17 0.10 0.26
C18:3c9,12,15 0.08 <0.1 0.18 0.10 0.37
C20:2c 0.00 0.01 2.00
C22:0 0.04 <0.1 0.06 0.10 0.42
C23:0 0.00
C24:0 0.04 0.07 0.10 0.82


Sample 5: Coconut oil and palm oil admixture (58% coconut oil + 42% palm oil)

FATTY ACIDS


C6:0 0.11 0.18 0.20 0.74
C8:0 2.44 4.10 2.83 3.20 0.23
C10:0 2.76 3.40 2.76 3.10 0.10
C12:0 26.88 27.50 25.06 26.80 0.04
C14:0 12.73 11.30 11.26 11.50 0.06
C15:0 0.00 0.03 2.00
C16:0 24.85 24.30 25.46 24.60 0.02
C16:1c 0.05 0.10 0.10 0.77
C17:0 0.04 0.05 0.10 0.86
C17:1c 0.00 0.01 2.00
C18:0 3.94 3.70 3.88 3.70 0.03
C18:1t 0.00 <0.1 0.03 1.44
C18:1c 20.69 20.60 22.23 20.60 0.04
C18:2t 0.08 <0.1 0.15 0.30 0.63
C18:2c 5.05 5.10 5.41 5.00 0.04
C20:0 0.19 <0.1 0.23 0.20 0.31
C18:3c6,9,12 0.00 0.01 2.00
C20:1c 0.07 <0.1 0.08 0.10 0.14
C18:3c9,12,15 0.07 <0.1 0.12 0.10 0.22
C20:2c 0.00 0.01 2.00
C22:0 0.02 <0.1 0.04 1.05
C23:0 0.00
C24:0 0.03 0.06 0.10 0.93


Sample 6: Soybean oil and palm oil admixture (59% soybean oil + 41% palm oil)

FATTY ACIDS


C6:0 0.00 0.01 2.00
C8:0 0.00 0.00 0.01 2.00
C10:0 0.00 <0.1 0.01 1.76
C12:0 0.05 <0.1 0.05 0.10 0.38
C14:0 0.58 0.50 0.47 0.50 0.09
C15:0 0.02 0.04 1.24
C16:0 26.22 24.80 24.40 25.10 0.03
C16:1c 0.07 0.15 0.10 0.77
C17:0 0.08 0.09 0.10 0.68
C17:1c 0.01 0.06 1.56
C18:0 4.87 4.30 4.24 4.30 0.07
C18:1t 0.00 <0.1 0.05 0.10 0.76
C18:1c 31.02 30.30 29.99 29.80 0.02
C18:2t 0.11 <0.1 0.29 0.50 0.75
C18:2c 32.47 35.90 34.96 34.30 0.04
C20:0 0.27 0.40 0.38 0.40 0.17
C18:3c6,9,12 0.13 0.04 1.17
C20:1c 0.26 <0.1 0.25 0.30 0.39
C18:3c9,12,15 3.46 3.90 4.01 3.50 0.07
C20:2c 0.00 0.03 2.00
C22:0 0.26 <0.1 0.32 0.30 0.41
C23:0 0.02 2.00
C24:0 0.08 0.13 0.10 0.71


Sample 7: Palm oil

FATTY ACIDS


C6:0 0.00 0.01 2.00
C8:0 0.00 0.00 0.01 2.00
C10:0 0.00 <0.1 0.01 1.76
C12:0 0.09 <0.1 0.09 0.10 0.04
C14:0 1.12 1.00 0.93 1.00 0.08
C15:0 0.04 0.05 0.10 0.84
C16:0 43.82 43.60 42.56 43.20 0.01
C16:1c 0.10 0.19 0.20 0.77
C17:0 0.08 0.09 0.10 0.68
C17:1c 0.00 0.02 2.00
C18:0 4.70 4.50 4.41 4.40 0.03
C18:1t 0.00 <0.1 0.05 0.10 0.76
C18:1c 39.96 40.40 40.34 39.60 0.01
C18:2t 0.15 <0.1 0.41 0.40 0.62
C18:2c 9.26 10.20 9.85 9.50 0.04
C20:0 0.33 0.40 0.41 0.40 0.10
C18:3c6,9,12 0.03 0.01 1.07
C20:1c 0.14 <0.1 0.14 0.20 0.29
C18:3c9,12,15 0.10 <0.1 0.24 0.10 0.53
C20:2c 0.00 0.01 2.00
C22:0 0.05 <0.1 0.07 0.10 0.32
C23:0 0.00
C24:0 0.04 0.08 0.10 0.79


Sample 8: Standard cocoa butter

FATTY ACIDS


C6:0 0.00 0.01 2.00
C8:0 0.00 0.00 0.01 2.00
C10:0 0.00 <0.1 0.01 1.76
C12:0 0.00 <0.1 0.01 1.76
C14:0 0.08 <0.1 0.08 0.10 0.10
C15:0 0.02 0.04 1.32
C16:0 24.95 25.80 25.08 25.40 0.02
C16:1c 0.16 0.27 0.20 0.72
C17:0 0.21 0.25 0.30 0.69
C17:1c 0.00 0.03 2.00
C18:0 38.52 36.70 36.60 36.00 0.03
C18:1t 0.00 <0.1 0.02 1.57
C18:1c 32.03 33.50 33.00 33.20 0.02
C18:2t 0.00 <0.1 0.01 1.76
C18:2c 2.63 2.90 2.86 2.80 0.04
C20:0 1.04 1.10 1.12 1.10 0.03
C18:3c6,9,12 0.00 0.02 2.00
C20:1c 0.03 <0.1 0.07 0.10 0.44
C18:3c9,12,15 0.13 <0.1 0.18 0.20 0.30
C20:2c 0.00 0.01 2.00
C22:0 0.15 <0.1 0.20 0.20 0.30
C23:0 0.00
C24:0 0.05 0.12 0.10 0.77
