EVID4 Evidence Project Final Report (Rev. 06/11) Page 1 of 106
General Enquiries on the form should be made to:
Defra, Procurements and Commercial Function (Evidence Procurement Team)
Evidence Project Final Report
Note
In line with the Freedom of Information Act 2000, Defra aims to place the results of its completed research projects in the public domain wherever possible. The Evidence Project Final Report is designed to capture the information on the results and outputs of Defra-funded research in a format that is easily publishable through the Defra website. An Evidence Project Final Report must be completed for all projects.
This form is in Word format and the boxes may be expanded, as appropriate.
ACCESS TO INFORMATION
The information collected on this form will be stored electronically and may be sent to any part of Defra, or to individual researchers or organisations outside Defra for the purposes of reviewing the project. Defra may also disclose the information to any outside organisation acting as an agent authorised by Defra to process final research reports on its behalf. Defra intends to publish this form on its website, unless there are strong reasons not to, which fully comply with exemptions under the Environmental Information Regulations or the Freedom of Information Act 2000.
Defra may be required to release information, including personal data and commercial information, on request under the Environmental Information Regulations or the Freedom of Information Act 2000. However, Defra will not permit any unwarranted breach of confidentiality or act in contravention of its obligations under the Data Protection Act 1998. Defra or its appointed agents may use the name, address or other details on your form to contact you in connection with occasional customer research aimed at improving the processes through which Defra works with its contractors.
Project identification
1. Defra Project code FAO 158
2. Project title
Further development and validation of the proposed methodology to verify vegetable oil species in mixtures of oil
3. Contractor organisation(s)
The Queen’s University Belfast
4. Total Defra project costs: £107,102 (agreed fixed price)
5. Project start date: 1/8/14; end date: 31/10/15
6. It is Defra’s intention to publish this form.
Please confirm your agreement to do so: YES / NO
(a) When preparing Evidence Project Final Reports contractors should bear in mind that Defra intends that they be made public. They should be written in a clear and concise manner and represent a full account of the research project which someone not closely associated with the project can follow.
Defra recognises that in a small minority of cases there may be information, such as intellectual property or commercially confidential data, used in or generated by the research project, which should not be disclosed. In these cases, such information should be detailed in a separate annex (not to be published) so that the Evidence Project Final Report can be placed in the public domain. Where it is impossible to complete the Final Report without including references to any sensitive or confidential data, the information should be included and section (b) completed. NB: only in exceptional circumstances will Defra expect contractors to give a "No" answer.
In all cases, reasons for withholding information must be fully in line with exemptions under the Environmental Information Regulations or the Freedom of Information Act 2000.
(b) If you have answered NO, please explain why the Final report should not be released into public domain
Executive Summary
7. The executive summary must not exceed 2 sides in total of A4 and should be understandable to the intelligent non-scientist. It should cover the main objectives, methods and findings of the research, together with any other significant events and options for new work.
This report details the results of the work to validate a newly developed methodology for identifying the vegetable oil species in a refined vegetable oil blend, and its extension to processed foods containing vegetable oil. The method places particular emphasis on the detection of palm oil and its derivatives, palm oil being by far the most widely used vegetable oil in the food industry. Recent changes in legislation now require the vegetable oil species used in processed foods to be labelled in the ingredients list. Having the tools in place to verify and enforce food labelling requirements gives consumers confidence in the integrity of the food chain. A novel method to allow verification of vegetable oil species was developed under project FA0117; this follow-on project aimed to validate that method. The methodology is a staged procedure that combines a spectroscopic technique, FTIR (Fourier Transform Infrared spectroscopy), used to screen and classify the oils, with the established analysis of fatty acid methyl esters by gas chromatography, used to confirm the composition of the oils when required. Performed serially using the developed decision-making system, these two techniques exploit the small differences in chemical composition between oil species in different types of oil blend to classify an unknown sample into one of the 6 or 12 oil classes studied. In this way, both untargeted fingerprint analysis (spectroscopic screening) and targeted analysis (fatty acid quantification by gas chromatography, GC) are applied to increase the certainty of the result. The project was divided into 4 main sections: a) extension of the current method to include more samples and more oil classes; b) inter-lab trials of FTIR and of GC analysis of fatty acids; c) validation of the method to identify oil species in pastry products, including any method refinement required; and d) validation of the methodology to detect the presence of palm oil in chocolate confectionery products, including any method refinement.
During the preliminary project FAO117, an initial database was created comprising 23 pure vegetable oils and 190 oil admixtures, grouped into 6 different oil classes. In this follow-on project, the database was expanded to include more variability, i.e. more pure oils from different geographical origins and more in-house admixtures, in order to increase the robustness of the calibration models on which the FTIR spectroscopic method is based. Overall, a total of 80 pure vegetable oils were obtained from reliable sources and 215 oil admixtures were prepared in-house at variable concentrations. After merging the datasets from FAO117 and from the current project, a total of 376 samples were used in the calibration models and 101 in the prediction set. The prediction set consists of independent samples used in-house to validate the method.
For comparison purposes, calibration models were built with two multivariate classification techniques, SIMCA and PLS-DA, using the computation software MATLAB and SIMCA 14.0 (Umetrics™).
Calibration models were developed for the 6 classes of oil determined in FAO117 (modified to incorporate the new oils/oil admixtures) and for 12 classes determined in the current project. The new 12-class model provides much more resolution because it clearly distinguishes between the different botanical origins of the vegetable oil species, whereas the legacy 6-class model contained some speciation overlap. In the legacy model design, the 6 classes to be predicted (corresponding to different oil types) are palm oil, sunflower/rapeseed oil, palm kernel oil and coconut oil, and limited binary mixtures of the above. The new, higher-resolution 12-class model includes palm, palm kernel, sunflower, rapeseed and coconut oils, and all the binary combinations of the above.
Overall, in the legacy design (classification among 6 classes), the best classification result was achieved with a calibration model built with PLS-DA using MATLAB software and simulated samples, whereas in the high-resolution design (12-class model) the best-performing calibration model was built with PLS-DA combined with a threshold (t = 0.57) using the Umetrics™ SIMCA software. In the first case (legacy), no samples needed to go to the confirmation step (GC analysis of fatty acids), whereas 44 samples were referred to the confirmation step with the high-resolution model. This was expected, because the classification difficulty escalates when the number of classes is doubled. New criteria based on fatty acid content were developed for the high-resolution model design, while the criteria for the legacy model design were slightly revised to include the new pure oil species and oil admixtures (coconut oil and its admixtures).
An inter-lab validation trial was undertaken to establish whether the analytical method is ‘instrument-agnostic’, i.e. independent of the FTIR instrument used to acquire the spectra of the oils. Nine samples, including pure oils and oil admixtures, were prepared in-house and dispatched to each of the 12 participants that agreed to take part in the inter-lab trial. The majority of the blends could be identified using the FTIR chemometric models (PLS-DA classification technique) in the 1st stage; a small percentage of pure oils and oil blends were not correctly identified (14% non-classified and 2.3% wrongly classified). The GC fatty acid analysis (2nd stage) of these non-classified samples, however, correctly identified 16 of the 18 samples (88.9%) that had been referred to this confirmation step. As a general conclusion, the original method, i.e. FTIR spectroscopy coupled with the PLS-DA classification technique, followed by GC fatty acid analysis when required, gives valuable insight into the nature of pure oils and binary mixtures and correctly classified 96.03% of the unknown oil samples in this inter-lab validation.
To establish the reproducibility of the GC fatty acid data obtained in-house (needed for the confirmation step), an inter-lab validation of the GC method was also undertaken. Three UK accredited food testing laboratories participated in the GC fatty acid inter-lab trial. Anonymised samples (n = 8) were submitted to the testing laboratories, each of which performed the analysis using its own GCMS instrument and official method for the determination of individual fatty acids in oil samples. The results showed very good reproducibility, with low relative standard deviation values (from 0.01 to 0.53) for the major fatty acids present in the oil samples.
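The relative standard deviation (RSD) quoted above expresses the between-laboratory spread of a result relative to its mean. The sketch below shows the usual calculation for one fatty acid across laboratories. It assumes the RSD is the sample standard deviation as a percentage of the mean (the summary does not state the formula or units), and the input values are hypothetical, not results from the trial.

```python
import statistics


def relative_standard_deviation(values: list[float]) -> float:
    """RSD (%) = 100 * sample standard deviation / mean of the reported results."""
    if len(values) < 2:
        raise ValueError("need at least two results to estimate a standard deviation")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical content of one fatty acid (% of total FAMEs) reported by
# three laboratories; illustrative values only, not data from the trial.
lab_results = [44.0, 44.2, 43.8]
print(f"RSD = {relative_standard_deviation(lab_results):.2f}%")
```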
The new calibration models (the legacy model design and the new high-resolution model design) were tested on oils extracted from commercial biscuits (plain, rich tea and digestive biscuits) obtained in a UK survey. The accuracy (80%) was good for the legacy model, but the false positive rate (20%) was above the threshold used to judge the quality of the screening method (5%). For the new high-resolution model, the accuracy was low (50%) and the false positive rate was again high (25%). Because of the relatively poor results obtained with these calibration models, a new calibration model was built using oils extracted from biscuits prepared in-house (biscuit-specific model). Digestive biscuits (DG) were prepared in-house using authentic palm oil and rapeseed oil, and rich tea biscuits (RT) were prepared with palm oil and admixtures of palm oil and rapeseed oil. All oils used were obtained from reliable sources. After baking, the oils were recovered by hexane extraction and FTIR spectra were recorded in triplicate for all the biscuit samples (n = 40). The biscuit-specific model was validated with oils extracted from in-house biscuits as well as with oils extracted from commercial biscuits. Validation with oils from in-house biscuits showed 100% accuracy, whereas validation with oils from commercial biscuits showed 85% accuracy (15% wrongly classified). With the establishment of thresholds, the false positive rate decreased from 15% to 5%, the accuracy decreased from 85% to 80%, and 15% of the samples (3 samples) were referred to the confirmation step. Two out of three
samples referred to the confirmation step were correctly identified using the 6-class fatty acid criteria. In conclusion, FTIR spectroscopy coupled with the PLS-DA classification technique, followed by GC fatty acid analysis (when required), offers insight into the nature of oils and oil admixtures extracted from biscuits and correctly classifies 100% of the oils extracted from in-house biscuits (validation set) and 90% of the oils extracted from commercial biscuits (validation set).
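The 90% figure for commercial biscuits follows from combining the two stages of the procedure. The sketch below reproduces that arithmetic. It assumes the commercial validation set contained 20 samples (inferred from "3 samples" being 15%) and that each sample is counted exactly once, either as accepted at FTIR screening or as referred to GC confirmation; the function name is illustrative.

```python
def staged_accuracy(n_total: int, n_screen_correct: int, n_confirm_correct: int) -> float:
    """Fraction of samples correctly identified by FTIR screening plus GC confirmation.

    Each sample is counted once: either accepted at screening or referred to GC.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_screen_correct + n_confirm_correct > n_total:
        raise ValueError("more correct results than samples")
    return (n_screen_correct + n_confirm_correct) / n_total


# Commercial biscuit validation set, using the figures summarised above:
# 3 referred samples = 15%, so n = 20; 80% (16 samples) correct at screening;
# 2 of the 3 referred samples correctly identified by the GC criteria.
print(f"{staged_accuracy(20, 16, 2):.0%}")  # 90%
```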
The presence of palm oil in confectionery products is widespread. Because of the different nature of confectionery oils, they could not be tested with the developed calibration models built with pure oils (legacy and new model). New product-specific calibration models for chocolate confectionery products were therefore developed to answer the question “is there palm oil in a confectionery product, yes or no?”. FTIR spectroscopy gave very good and promising results for detecting the presence of palm oil alone in a chocolate confectionery product. Validation with in-house oil admixtures, as well as with fats extracted from commercial confectionery products, showed 100% accuracy when using FTIR combined with PLS-DA on a small dataset. Chocolate products containing only cocoa butter (higher added value products) could be confirmed, and the presence of palm oil could be detected in those chocolate products containing palm oil (generally of lower added value). Fatty acid criteria for confectionery samples were created and successfully identified all oils extracted from commercial confectionery products for those samples that needed confirmatory analysis following a non-specific screening result.
In summary:
The vegetable oil species identification method performed very well when evaluating the speciation of unprocessed oil blends in which the mixture contains up to two different oils (legacy and high-resolution model designs). Owing to the harmonisation protocols developed for the inter-lab trial, the method delivers accurate results on the range of instruments that were used for spectra acquisition and confirmatory chromatographic analysis.
The results of this study indicate that the method can be used successfully when testing processed foods containing vegetable oils. However, a generic method is not possible, and modifications or the development of new calibration models may be necessary to adapt its use to different food product categories, because the FTIR calibration model is not universal for all commercial products currently on the market.
Confectionery fats are very complex products. The presence of palm oil in confectionery products has been successfully detected using specific PLS-DA calibration models for chocolate confectionery products (yes/no model). Chocolate products containing only cocoa butter (non-palm oil confectionery) could also be confirmed using this model in the small dataset of commercial samples tested.
In conclusion, the staged procedure consisting of spectroscopic screening with FTIR and chromatographic confirmatory analysis proved effective in identifying the nature of unknown, complex refined vegetable oil blends, both in oils and, to some extent and with some essential modifications, in processed foods. The methodology is simple to implement, very affordable in terms of cost per sample and equipment required, and yet highly specific. The research showed that a different variant of the method (i.e. a different calibration model) is needed for every product category tested. Further work is needed to develop a universal (applicable to all products), instrument-agnostic (applicable to all acquisition instruments) method in order to adequately enforce the legislation.
Project Report to Defra
8. As a guide this report should be no longer than 20 sides of A4. This report is to provide Defra with details of the outputs of the research project for internal purposes; to meet the terms of the contract; and to allow Defra to publish details of the outputs to meet Environmental Information Regulation or Freedom of Information obligations. This short report to Defra does not preclude contractors from also seeking to publish a full, formal scientific report/paper in an appropriate scientific or other journal/publication. Indeed, Defra actively encourages such publications as part of the contract terms. The report to Defra should include:
the objectives as set out in the contract;
the extent to which the objectives set out in the contract have been met;
details of methods used and the results obtained, including statistical analysis (if appropriate);
a discussion of the results and their reliability;
the main implications of the findings;
possible future work; and
any action resulting from the research (e.g. IP, Knowledge Exchange).
FOR THE ABBREVIATIONS USED PLEASE SEE PAGE 45
1. BRIEF BACKGROUND INFORMATION
In 2011, the European Commission (EC) introduced new legislation for the labelling of processed foods containing refined vegetable oils (EU Regulation 1169/2011), which took effect in 2014 and brought a number of important changes to the labelling of foodstuffs. Under this legislation, the list of ingredients on prepacked food labels must clearly show the vegetable oil species used in the product. In practice, this means that when blended vegetable oils are used in food products, the type of vegetable oil must now be identified on the package, in contrast to the previous requirement, under which an oil blend could be labelled with the generic term “vegetable oil”. Currently there is no official method for verifying the vegetable oil constituents of a product, although such a method will be required to support enforcement of the new labelling legislation. In 2012, DEFRA funded a 1-year proof-of-concept research project (FAO117) at Queen’s University Belfast which aimed to develop such a methodology. After a thorough literature review (Osorio et al., 2013), it was concluded that spectroscopic and chromatographic methods were suitable to tackle this problem, although their application had never been attempted with these particular oil species. Over the course of that project, the team developed a procedure based on a fusion of spectroscopic and chromatographic methods for the analysis of binary blends of the refined vegetable oils of interest, with emphasis on palm oil and its fractions (stearin and olein). The staged procedure consists of a screening step (infrared spectroscopy, FTIR) and a confirmation step (chromatographic determination of fatty acids) coupled with an embedded decision-making system. The procedure demonstrated excellent results when validated with external authentic oil samples in a single-lab validation (SLV) exercise. The current project extended the method to foodstuffs (biscuits and confectionery) and studied and re-evaluated the reproducibility of the spectroscopic analysis, the fatty acid criteria and the overall robustness of the method.
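The screening/confirmation logic described above can be outlined as follows. This is a hypothetical sketch, not the project's decision-making system: it assumes the FTIR screening model returns one score per class, that the highest-scoring class is accepted when its score reaches a threshold, and that otherwise the sample is passed to a fatty acid check supplied by the caller. The function name, class labels and scores are illustrative; the threshold of 0.50 mirrors the MATLAB probability threshold reported in Section 3.5.

```python
from typing import Callable, Optional


def staged_identification(
    ftir_scores: dict[str, float],
    threshold: float,
    gc_confirm: Callable[[], Optional[str]],
) -> tuple[str, Optional[str]]:
    """Return (stage, oil class) for one unknown sample.

    ftir_scores: score per class from the FTIR screening model (hypothetical input).
    gc_confirm: hypothetical callback applying the fatty acid criteria; it returns
    a class name, or None if the sample matches no criterion.
    """
    if not ftir_scores:
        raise ValueError("no screening scores supplied")
    best_class = max(ftir_scores, key=ftir_scores.get)
    if ftir_scores[best_class] >= threshold:
        # Screening result is confident enough to be accepted directly.
        return "screening", best_class
    # Highest score is below the threshold: refer the sample to GC confirmation.
    return "confirmation", gc_confirm()


# Illustrative scores only.
print(staged_identification({"palm": 0.81, "palm kernel": 0.12}, 0.50, lambda: None))
# ('screening', 'palm')
print(staged_identification({"palm": 0.41, "palm kernel": 0.38}, 0.50,
                            lambda: "palm + palm kernel"))
# ('confirmation', 'palm + palm kernel')
```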
2. OBJECTIVES
The specific project aims are:
1. To set up an inter-laboratory trial with partners in the UK using different FTIR
spectroscopy instruments.
2. To set up an inter-laboratory trial with partners in the UK using different GCMS (gas
chromatography-mass spectrometry) instruments.
3. Expand vegetable oils reference database including a limited number of other types of oils
such as coconut oil present in processed foods.
4. Update calibration models and update/create fatty acid criteria when needed.
5. Assess robustness of using the method to determine oil species in food matrices (pastry
products): case study with biscuits.
6. Establish a method to detect the presence of palm oil and its derivatives in
confectionery products: case study with chocolate confectionery bars and cakes.
7. Develop and validate the web tool used for data analysis.
8. Link with other EU-wide initiatives and dissemination.
The overall aim is to further improve and validate the oil speciation method and SOPs developed for DEFRA, in order to make them fit for purpose for policing sustainable labelling of foodstuffs under the new EU Regulation. This Regulation was driven by consumer awareness and the need for better food labelling of products across the EU.
3. DATABASE EXPANSION
3.1. Sourcing of refined authentic oils
As in project FAO117, the authenticity of the reference vegetable oil samples was crucial for the reliability of the final project results. Reference samples of refined palm oil and its derivatives (palm stearin and palm olein), palm kernel oil, sunflower oil, rapeseed oil and coconut oil were purchased from reliable and reputable sources (major food manufacturers and the oil processing industry) and are representative of the refined oils present in the European/UK market. These oils were sourced globally and usually refined/fractionated in the EU/UK. Oil samples were sourced between November 2014 and July 2015. Oils used in the confectionery industry were not easy to find and were mainly purchased from online retailers; their authenticity was therefore not verified and cannot be guaranteed. These oils were cocoa butter, hydrogenated palm kernel oil, shea butter, illipe butter, mango kernel, kokum gurgi and sal. The list of all oils purchased for the current project is shown in Table 1.
Table 1. Details of all oil samples sourced for the project FAO158.
Oil species  Sample name  Usage  Origin  Company  Date of purchase
Palm Oil POn1 Calibration Brazil Oil processor 1 07/14
POn2 Calibration Malaysia Oil processor 2 01/15
POn3 Validation Thailand Oil processor 3 11/14
POn4 Calibration Not provided Oil processor 4 01/15
POn5 Validation Not provided Oil processor 4 02/15
POn6 Calibration Not provided Oil processor 4 11/14
POn7 Calibration Not provided Oil processor 5 02/15
POn8 Validation Malaysia Oil processor 6 03/15
POn9 Calibration Malaysia Oil processor 7 03/15
POn10 Calibration Malaysia Oil processor 8 03/15
POn11 Calibration Indonesia Oil processor 9 04/15
POn12 Calibration Indonesia Oil processor 9 04/15
POn13 Validation Indonesia Oil processor 9 04/15
POn14 Calibration Indonesia Oil processor 9 04/15
POn15 Calibration Indonesia Oil processor 9 04/15
POn16 Validation Indonesia Oil processor 9 04/15
POn17 Calibration Indonesia Oil processor 9 04/15
POn18 Validation Indonesia Oil processor 9 04/15
POn19 Calibration Indonesia Oil processor 9 04/15
POn20 Calibration Not provided Oil processor 4 02/14
POn21 Validation Not provided Oil supplier 1
POn22 Validation Not provided Oil supplier 2 06/15
POn23 Calibration Colombia Oil supplier 2 06/15
Palm Kernel Oil
PKOn1 Calibration Malaysia Oil processor 2 01/15
PKOn2 Validation Thailand Oil processor 3 11/14
PKOn3 Calibration China Oil supplier 3 04/15
PKOn4 Calibration Not provided Oil supplier 2 06/15
PKOn5 Not provided Oil supplier 2 06/15
Palm Olein POln Calibration Malaysia Oil processor 2 01/15
POln2 Validation Thailand Oil processor 3 11/14
POln3 Calibration Not provided Oil processor 5 02/15
Palm Stearin
PSn1 Calibration Malaysia Oil processor 2 01/15
PSn2 Validation Thailand Oil processor 3 11/14
PSn3 Calibration Not provided Oil processor 4 01/15
PSn4 Calibration Not provided Oil processor 4 02/15
PSn5 Calibration Not provided Oil processor 4 11/14
PSn6 Validation Not provided Oil processor 4 02/14
PSn7 Calibration Not provided Oil processor 5 02/15
Rapeseed Oil
ROn1 Validation Not provided Oil retailer 1 01/15
ROn2 Calibration Not provided Oil retailer 2 03/15
ROn3 Validation Not provided Oil retailer 3 03/15
ROn4 Calibration Not provided Oil retailer 4 03/15
ROn5 Calibration Not provided Oil retailer 5 03/15
ROn6 Validation Not provided Oil retailer 6 03/15
ROn7 Calibration Not provided Oil retailer 1 03/15
ROn8 Calibration Not provided Oil retailer 7 03/15
ROn9 Validation Not provided Oil retailer 3 03/15
ROn10 Calibration Not provided Oil retailer 3 03/15
ROn11 Calibration Not provided Oil retailer 2 03/15
ROn12 Calibration Not provided Oil retailer 8 04/15
ROn13 Calibration Not provided Online retailer 1 04/15
ROn14 Calibration Not provided Oil supplier 2 06/15
Sunflower Oil
SOn1 Validation Not provided Oil retailer 1 01/15
SOn2 Calibration Not provided Oil retailer 7 03/15
SOn3 Calibration Not provided Oil retailer 2 03/15
SOn4 Validation Not provided Oil retailer 3 03/15
SOn5 Calibration Not provided Oil retailer 3 03/15
SOn6 Calibration Not provided Oil retailer 3 03/15
SOn7 Validation Not provided Oil retailer 9 03/15
SOn8 Validation Not provided Oil retailer 4 03/15
SOn9 Calibration Not provided Oil retailer 5 03/15
SOn10 Calibration Not provided Oil retailer 6 03/15
SOn11 Validation Not provided Oil retailer 1 03/15
SOn12 Calibration Not provided Oil retailer 10 03/15
SOn13 Validation Not provided Oil retailer 11 03/15
SOn14 Calibration Not provided Online retailer 2 04/15
SOn15 Validation Not provided Online retailer 3
SOn16 Calibration Not provided Online retailer 4 04/15
SOn17 Calibration Not provided Online retailer 5 04/15
SOn18 Calibration Italy Oil supplier 2 06/15
Coconut Oil
CCO1 Calibration Not provided Online retailer 6 04/15
CCO2 Calibration Not provided Online retailer 7 04/15
CCO3 Validation Not provided Online retailer 8 04/15
CCO4 Validation Not provided Online retailer 9 04/15
CCO5 Calibration Not provided Online retailer 10 04/15
CCO6 Calibration Not provided Online retailer 11 04/15
CCO7 Validation Not provided Online retailer 12 04/15
CCO8 Calibration Not provided Online retailer 13 04/15
CCO9 Calibration Not provided Oil supplier 2 06/15
CCO10 Validation Not provided Oil supplier 2 06/15
Cocoa Butter
COA1 Not provided Online retailer 14 04/15
COA2 Not provided Online retailer 15 04/15
COA3 Not provided Online retailer 15 04/15
COA4 Not provided Online retailer 16 04/15
COA5 Not provided Online retailer 17 04/15
COA6 Not provided Online retailer 18 04/15
COA7 Not provided Oil supplier 2 06/15
COA8 Not provided Oil supplier 2 06/15
Shea butter
ShB1 Not provided Oil supplier 2 06/15
ShB2 Not provided Online retailer 19 06/15
ShB3 Not provided Online retailer 20 07/15
ShB4 Not provided Online retailer 21 07/15
ShB5 Not provided Online retailer 22 07/15
Mango Kernel
MnB1 Not provided Online retailer 23 06/15
MnB2 Not provided Online retailer 24 07/15
MnB3 Not provided Online retailer 25 07/15
MnB4 Not provided Online retailer 26 07/15
Kokum gurgi
KmB1 Not provided Online retailer 27 07/15
KmB2 Not provided Online retailer 28 07/15
Illipe Butter
IlB1 Not provided Online retailer 29 07/15
IlB2 Not provided Online retailer 30 07/15
Sal SB1 Not provided Online retailer 31 07/15
Authentic oil samples were separated into independent calibration and prediction sets: calibration samples were used only to build the models, and prediction samples only to test the predictive ability of the models. The new calibration samples were added to the whole FAO117 dataset (calibration + prediction), new chemometric models were developed on this expanded database, and the new prediction samples were used to validate the new models.
3.2. Preparation of in-house oil mixtures
New binary oil admixtures including all sourced authentic oils (excluding oils for biscuits and
confectionery products) were created in our laboratory. These binary oil mixtures were (Appendix I):
Palm stearin + palm oil 23 samples
Palm olein + sunflower oil 17 samples
Palm oil + sunflower oil 23 samples
Rapeseed oil + palm kernel oil 18 samples
Sunflower oil + palm kernel oil 16 samples
Palm oil + palm kernel oil 30 samples
Rapeseed oil + sunflower oil 18 samples
Rapeseed oil + Palm oil 24 samples
In addition, a new binary admixture was also prepared:
Coconut oil + Palm oil 46 samples
In the preparation of every admixture, oils from different sources and geographic origins were used
in order to include compositional and geographical variability. All oil samples and resulting admixtures
were stored at -20 °C in glass vials with a headspace of <5% to avoid oxidation.
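As a quick consistency check, the admixture counts listed above can be summed; they total the 215 in-house admixtures quoted in the executive summary.

```python
# Number of in-house binary admixtures per blend type, as listed in Section 3.2.
admixture_counts = {
    "palm stearin + palm oil": 23,
    "palm olein + sunflower oil": 17,
    "palm oil + sunflower oil": 23,
    "rapeseed oil + palm kernel oil": 18,
    "sunflower oil + palm kernel oil": 16,
    "palm oil + palm kernel oil": 30,
    "rapeseed oil + sunflower oil": 18,
    "rapeseed oil + palm oil": 24,
    "coconut oil + palm oil": 46,  # the new blend type added in this project
}

total = sum(admixture_counts.values())
print(total)  # 215
```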
3.3. Spectral data acquisition with FTIR spectroscopy
FTIR spectroscopy was used as a screening technique to create a database of spectroscopic data for the vegetable oil samples. Three replicate spectra were acquired for each sample. All spectra were pre-processed with a standardised treatment consisting of three spectral filters applied sequentially: standard normal variate (SNV), first-order derivative and Savitzky-Golay smoothing. Pre-processing removed undesired systematic variation in the data (i.e. baseline drift and wavenumber regions of low information content) and enhanced the predictive power of the multivariate calibration models (Eriksson et al., 2006).
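The three filters named above can be chained as in the minimal pure-Python sketch below. The report does not give the filter settings, so the following are assumptions: SNV uses the sample standard deviation, the first derivative is a simple central difference, and smoothing uses the standard 5-point quadratic Savitzky-Golay coefficients (-3, 12, 17, 12, -3)/35. End points are dropped rather than padded. In practice a library routine such as SciPy's `scipy.signal.savgol_filter` would normally be used instead.

```python
import statistics


def snv(spectrum: list[float]) -> list[float]:
    """Standard normal variate: centre a spectrum on its mean and scale by its SD."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mean) / sd for x in spectrum]


def first_derivative(spectrum: list[float], step: float = 1.0) -> list[float]:
    """Central-difference first derivative; drops one point at each end."""
    return [(spectrum[i + 1] - spectrum[i - 1]) / (2 * step)
            for i in range(1, len(spectrum) - 1)]


# Savitzky-Golay smoothing coefficients: 5-point window, quadratic fit (sum = 35).
SG5_QUADRATIC = [-3, 12, 17, 12, -3]


def savitzky_golay_5(spectrum: list[float]) -> list[float]:
    """5-point quadratic Savitzky-Golay smoothing; drops two points at each end."""
    return [sum(c * spectrum[i + k - 2] for k, c in enumerate(SG5_QUADRATIC)) / 35
            for i in range(2, len(spectrum) - 2)]


def preprocess(spectrum: list[float]) -> list[float]:
    """SNV, then first derivative, then smoothing, in the order given in the text."""
    return savitzky_golay_5(first_derivative(snv(spectrum)))
```

Because each step drops end points, a spectrum of n points yields n - 6 pre-processed values; a real implementation would normally pad or fit the edges instead.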
3.4. Chromatographic determination of fatty acid methyl esters
Fatty acid methyl esters were prepared according to BS684-2.34:2001 part 5 (see SOP, FAO117).
The specific criteria for individual fatty acids (FA) were modified accordingly, and new criteria were developed for the identification of an unknown sample.
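The criteria themselves are not reproduced in this section. As a hypothetical sketch, range-based fatty acid criteria could be applied to a FAME profile by checking every class's (minimum, maximum) limits, as below. The data structure, class names, fatty acids and numeric limits are placeholders for illustration only and are not the project's criteria; a sample matching no class, or several, would still need expert interpretation.

```python
# Hypothetical criteria structure: class -> {fatty acid: (min %, max %)}.
Criteria = dict[str, dict[str, tuple[float, float]]]


def matching_classes(profile: dict[str, float], criteria: Criteria) -> list[str]:
    """Return the classes whose fatty acid ranges are all met by the sample profile."""
    matches = []
    for oil_class, ranges in criteria.items():
        # A fatty acid missing from the profile counts as not meeting the range.
        if all(fa in profile and low <= profile[fa] <= high
               for fa, (low, high) in ranges.items()):
            matches.append(oil_class)
    return matches


# Placeholder criteria and profile for illustration only; these are NOT the
# project's fatty acid criteria.
example_criteria: Criteria = {
    "class A": {"C12:0": (40.0, 55.0)},
    "class B": {"C16:0": (35.0, 50.0), "C18:2": (5.0, 15.0)},
}
sample_profile = {"C12:0": 0.2, "C16:0": 44.0, "C18:2": 10.0}
print(matching_classes(sample_profile, example_criteria))  # ['class B']
```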
3.5. Data analysis
Extended data analysis was undertaken. Before chemometric analysis, the datasets were pre-processed as described in Section 3.3. After removal of unwanted systematic variation, Principal Component Analysis (PCA), an unsupervised pattern recognition technique, was applied for exploratory data analysis (EDA) in order to simplify the datasets, gain a better understanding of them and identify outliers. In a second step, two supervised pattern recognition techniques were used to build the classification models: Partial Least Squares Discriminant Analysis (PLS-DA) and Soft Independent Modelling of Class Analogy (SIMCA). PLS-DA is a discriminant technique which aims to find the variables and directions in the multivariate space that discriminate between the established classes in the calibration set (Berrueta et al., 2007). SIMCA, on the other hand, is a class-modelling technique in which each class is modelled independently using PCA and can be described by a different number of principal components. To interpret the models, the Variable Importance in Projection (VIP) scores were inspected. The VIP of a predictor is a value that
expresses the contribution of the individual variable to the definition of the F-latent-vector model (Bevilacqua et al., 2012). The SIMCA 14.0 (Umetrics™, Uppsala, Sweden) and MATLAB R2015b (The MathWorks Inc., USA) software packages were used to conduct the chemometric analyses. Specifically, within the MATLAB workspace, the SIMCA and PLS-DA MATLAB functions of Cleiton A. Nunes (UFLA, MG, Brazil), in combination with some in-house functions, were used to establish the identification models. The performance of the resulting classification models was evaluated using the most common statistical measures (Oliveri & Downey, 2012). In particular, samples belonging to the class being modelled are called true positives (TP) if they are correctly found inside the class boundaries, or false negatives (FN) if they fall outside the boundaries. By analogy, samples extraneous to that class are referred to as false positives (FP) if they are found within the boundaries, or true negatives (TN) if they are correctly outside the boundaries. The boundaries for each class are defined by the classification technique applied to develop the classification model. The selection of these boundaries in the training step, and the mapping of new test samples in the validation step, is based on the theory of each pattern recognition technique.
Sensitivity is defined as the fraction of samples belonging to the modeled class which is correctly accepted by the respective model:
$$\text{sensitivity} = \frac{TP}{TP + FN}$$
Specificity is the fraction of samples not belonging to the modeled class that is correctly rejected by the model:
$$\text{specificity} = \frac{TN}{TN + FP}$$
Precision is defined as the ratio between the number of samples correctly accepted and the total number of samples accepted by the same model:
$$\text{precision} = \frac{TP}{TP + FP}$$
Accuracy, or correct classification rate, is the percentage of samples correctly classified. It is used to evaluate the outcome of a discriminant classification:
$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Thresholds for the classification decision were selected for both the 6-class and the 12-class cases (see Section 3.6.2). The thresholds were chosen using a standard cut-off on the false positive rate (FPR):
$$\text{FPR} = \frac{FP}{FP + TN} \leq 5\%$$
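The four figures of merit and the FPR cut-off can all be computed directly from the confusion counts of a single class model. The following is a minimal Python sketch. The function name `class_metrics` and the example counts are hypothetical and are not taken from the project's MATLAB or SIMCA code:

```python
def class_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Figures of merit for one class model from its confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),                # class members accepted
        "specificity": tn / (tn + fp),                # foreign samples rejected
        "precision": tp / (tp + fp),                  # accepted samples that truly belong
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # all correct decisions
        "fpr": fp / (fp + tn),                        # false positive rate = 1 - specificity
    }


# Hypothetical counts for one class model tested on 101 samples
m = class_metrics(tp=18, fn=2, fp=3, tn=78)
meets_cutoff = m["fpr"] <= 0.05  # the 5% FPR criterion used when setting thresholds
print({k: round(v, 3) for k, v in m.items()}, meets_cutoff)
```

Note that FPR and specificity are complementary. A model that satisfies the 5% cut-off therefore has a specificity of at least 95%.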
Test samples whose predicted dummy variable (SIMCA Umetrics™) or probability (MATLAB) was below the threshold (0.54 and 0.50, respectively) were forwarded to the confirmation step (gas chromatography analysis) of the proposed analytical method. In the testing step, a value is generated for each class for every test sample: a 1×N vector, where N is the number of model classes. Each value is the predicted dummy variable (SIMCA Umetrics™) or probability (MATLAB) that the unknown sample belongs to that class. The maximum of these values is used as the classification criterion. In each case investigated, the thresholds were set manually by minimising the FPR for the given test datasets. Specifically, the threshold was started at 0.5 and then increased gradually. The ultimate aim was the highest correct classification rate combined with a false positive rate below 5% for the test samples. If the false positive rate exceeded 5%, the threshold was increased; otherwise it was decreased, until the highest overall classification rate had been achieved.
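The decision rule and the threshold search described above can be sketched in Python. The original thresholds were tuned by hand, so the automated grid scan below is a stand-in for that manual procedure, not the authors' code. All function names and the example scores are hypothetical. The sketch also assumes that the false positive rate is the fraction of all test samples accepted into a wrong class. This matches how the Table 3 columns behave, where accuracy, false positive rate and the non-classified percentage add up to 100%.

```python
import numpy as np

UNCLASSIFIED = -1  # marker for samples forwarded to the GC confirmation step


def classify(scores: np.ndarray, threshold: float) -> np.ndarray:
    """scores: (n_samples, n_classes) predicted dummy values or probabilities.
    Each sample goes to its highest-scoring class, or to UNCLASSIFIED when
    that maximum is below the threshold."""
    best = scores.argmax(axis=1)
    return np.where(scores.max(axis=1) >= threshold, best, UNCLASSIFIED)


def rates(pred: np.ndarray, truth: np.ndarray):
    """(accuracy, false positive rate, non-classified rate) as fractions of all samples."""
    n = len(truth)
    accepted = pred != UNCLASSIFIED
    acc = np.sum(accepted & (pred == truth)) / n
    fpr = np.sum(accepted & (pred != truth)) / n
    return acc, fpr, np.sum(~accepted) / n


def select_threshold(scores, truth, max_fpr=0.05, start=0.5, step=0.01):
    """Scan thresholds upward from `start` and keep the one giving the highest
    accuracy among those whose false positive rate does not exceed `max_fpr`."""
    best_t, best_acc = None, -1.0
    for t in np.arange(start, 1.0 + 1e-9, step):
        acc, fpr, _ = rates(classify(scores, t), truth)
        if fpr <= max_fpr and acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc


# Hypothetical scores for five test samples and three classes
scores = np.array([
    [0.90, 0.08, 0.02],
    [0.20, 0.70, 0.10],
    [0.45, 0.40, 0.15],
    [0.10, 0.25, 0.65],
    [0.615, 0.33, 0.055],
])
truth = np.array([0, 1, 1, 2, 1])
t, acc = select_threshold(scores, truth)
print(t, acc, classify(scores, t))
```

In this toy case, raising the threshold to about 0.62 rejects the misassigned fifth sample, at the cost of sending it, along with the low-scoring third sample, to the confirmation step.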
Additionally, for the MATLAB models, simulated samples were generated using real samples as references in order to add a certain amount of variation to the calibration dataset. As the results show (see Section 3.6.2), this strategy can be very useful for improving the overall classification performance. In-house algorithms were developed to produce simulated samples from the real calibration samples by changing the baseline, shifting the spectrum and adding random noise (unpublished work, under review). The user can define the value range of the amplification factor for the spectral intensifier, which changes the baseline of a spectrum. The random x-axis shifting and noise-adding blocks are not deterministic, but the user can still select their parameters. These are the scale parameter of the Laplacian distribution (shifting along the x-axis) and the signal-to-noise ratio per spectrum, in dB, of the Gaussian noise (adding noise).
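The in-house simulation algorithms are unpublished, so the following Python sketch rests on stated assumptions only. It assumes that the spectral intensifier multiplies a spectrum by the amplification factor. It further assumes that each factor yields one scaled copy, plus one shifted copy and/or one noisy copy of that scaled spectrum when those blocks are enabled. These assumptions were chosen because they reproduce the sample counts in Tables 4 and 5; for example, 8 PKO spectra × (1 + 6 factors × 3) = 152. Shifts are drawn from a Laplace(0, b) distribution, in units of data points, and noise is added at a chosen signal-to-noise ratio in dB. All function names are hypothetical.

```python
import numpy as np


def factor_range(start: float, stop: float, step: float) -> np.ndarray:
    """Inclusive range of amplification factors, e.g. 1.01-1.06 step 0.01 -> 6 values."""
    return np.linspace(start, stop, int(round((stop - start) / step)) + 1)


def shift_x(spectrum, rng, b):
    """Shift along the x-axis by a Laplace(0, b)-distributed number of points
    (fractional shifts by linear interpolation; edge values are held constant)."""
    idx = np.arange(spectrum.size, dtype=float)
    return np.interp(idx - rng.laplace(0.0, b), idx, spectrum)


def add_noise(spectrum, rng, snr_db):
    """Add white Gaussian noise so that signal power / noise power = snr_db (in dB)."""
    noise_power = np.mean(spectrum ** 2) / 10 ** (snr_db / 10)
    return spectrum + rng.normal(0.0, np.sqrt(noise_power), spectrum.size)


def simulate(X, factors, b=None, snr_db=None, seed=0):
    """Return the real spectra (rows of X) followed by simulated copies: for each
    factor, one scaled copy, plus one shifted and/or one noisy copy if b / snr_db
    are given."""
    rng = np.random.default_rng(seed)
    out = [x for x in X]
    for x in X:
        for f in factors:
            scaled = f * x
            out.append(scaled)
            if b is not None:
                out.append(shift_x(scaled, rng, b))
            if snr_db is not None:
                out.append(add_noise(scaled, rng, snr_db))
    return np.vstack(out)


rng = np.random.default_rng(1)
pko = rng.random((8, 500)) + 1.0  # placeholder for 8 real PKO spectra
sim = simulate(pko, factor_range(1.01, 1.06, 0.01), b=0.6, snr_db=30)
print(sim.shape)
```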
3.6. Results and discussion
3.6.1 Database Expansion
The database created during the FAO117 project funded by DEFRA comprised 23 pure oils and 160 oil admixtures, grouped into 6 different classes (PKO, P, RS, PPKO, RSPKO and RSPO). To make the models more robust, the database was expanded to include more variability, i.e. more pure oils from different origins and more in-house admixtures. A total of 80 pure oils were purchased from reliable sources for the database expansion: palm oil (n=23), palm kernel oil (n=5), palm olein (n=3), palm stearin (n=7), rapeseed oil (n=14), sunflower oil (n=18) and coconut oil (n=10). No sample in the calibration set was included in the prediction set. Of these 80 pure oils, 52 were used for calibration purposes and 27 were used for testing the updated database. A total of 215 oil admixtures were prepared in-house. Of these, 141 were used for calibration purposes and added to the previous database, and the remaining 74 were used for testing the updated database. In total, 193 oils (pure oils and admixtures) were added to the existing database, and 101 oils (pure oils and oil admixtures) were used as the prediction set to validate the expanded database.
3.6.2 Calibration model building – Classification
Substantial differences were observed among different types of pure oils (Figure 1) and oil
admixtures (Figure 2) when all spectra were superimposed.
Figure 1. Superimposed FTIR spectra of 7 pure oils.
Figure 2. Superimposed FTIR spectra of 9 oil admixtures.
Two different classification methods were applied to the pre-processed spectroscopic data of the expanded FTIR database: a) Soft Independent Modelling of Class Analogy (SIMCA) and b) Partial Least Squares Discriminant Analysis (PLS-DA). Both models were developed using specific intervals of the FTIR spectra, from 654.2 to 1875.4 cm-1 and from 2520 to 3120.7 cm-1. The 3781 selected variables were concatenated serially, which is suitable for untargeted analysis. The intervals were selected based on literature findings, with the aim of excluding the regions of the spectra without peaks. The calibration set was used to develop the classification models at the 95% confidence level.
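The interval selection described above amounts to a boolean mask over the wavenumber axis. The Python sketch below applies the two windows from the text. The acquisition axis is an assumption: 7157 points from 4000 to 550 cm-1, which gives roughly the 0.482 cm-1 spacing reported in Section 4.1.3. Because the true axis end points are not given here, the selected count will be close to, though not necessarily exactly, the 3781 variables reported. The placeholder spectra are random numbers.

```python
import numpy as np

# Hypothetical acquisition axis: 7157 points from 4000 to 550 cm-1 (about 0.482 cm-1 spacing)
wavenumbers = np.linspace(4000.0, 550.0, 7157)
spectra = np.random.default_rng(0).random((3, wavenumbers.size))  # placeholder absorbances

# Windows used for the expanded-database models (cm-1)
INTERVALS = [(654.2, 1875.4), (2520.0, 3120.7)]


def select_intervals(spectra, wavenumbers, intervals):
    """Keep only the variables inside the given windows. The kept columns stay in
    their original order, so the windows are concatenated serially."""
    mask = np.zeros(wavenumbers.size, dtype=bool)
    for lo, hi in intervals:
        mask |= (wavenumbers >= lo) & (wavenumbers <= hi)
    return spectra[:, mask], wavenumbers[mask]


X_sel, wn_sel = select_intervals(spectra, wavenumbers, INTERVALS)
print(X_sel.shape)
```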
A total of 376 samples were used in the calibration models and 101 in the prediction set. Two chemometric packages, SIMCA 14.0 (Umetrics™) and MATLAB, were used for different purposes.
Various classes were considered in the calibration building phase, including:
i. a 6-class legacy design (Models A and B), as in the previous project (FAO117), with minor modifications
ii. a new 12-class high-resolution design (Models C and D)
The characteristics (R²X and Q²) of the updated models are shown in Table 2. R² is the fraction of the variation of the calibration set (Y, with PLS) explained by the model. It is a measure of fit, i.e. of how well the model fits the data. R²X is the fraction of X variation modelled in the component, and R²X (cumulative) is the cumulative R²X up to the specified component. Q² is the fraction of the variation of the calibration set (Y, with PLS) predicted by the model according to cross-validation. Q² indicates how well the model predicts new data; a large Q² (Q² > 0.5) indicates good predictability. Q² (cumulative) is the cumulative Q² up to the specified component. Unlike R²X (cum), Q² (cum) is not additive. The model characteristics R²X and Q² are generally very good for both models. The exceptions are the palm kernel oil (PKO) and coconut oil (CCO) class models in the 12-class model, which had lower R²X and Q² values (Table 2c). The RO class also had a low Q² value in the 12-class model (Table 2c).
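The fit and prediction statistics defined above can be illustrated with a short Python sketch. Cumulative R²X is computed for a PCA model, as in a single SIMCA class model. Q² is computed as 1 − PRESS/SS from cross-validated predictions. The sketch assumes seven cross-validation groups, a common default, and uses an ordinary least-squares regression as a stand-in for PLS. The data are synthetic and all names are hypothetical.

```python
import numpy as np


def r2x_cumulative(X: np.ndarray, n_components: int) -> float:
    """Cumulative R2X of a PCA model: share of the (column-centred) X variance
    captured by the first n_components principal components."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return float(np.sum(s[:n_components] ** 2) / np.sum(s ** 2))


def q2_cv(X, y, fit, predict, n_folds=7):
    """Q2 = 1 - PRESS/SS, where PRESS sums the squared errors of the predictions
    made for each fold by a model trained without that fold."""
    y = np.asarray(y, dtype=float)
    folds = np.arange(len(y)) % n_folds
    y_cv = np.empty_like(y)
    for k in range(n_folds):
        held_out = folds == k
        model = fit(X[~held_out], y[~held_out])
        y_cv[held_out] = predict(model, X[held_out])
    press = np.sum((y - y_cv) ** 2)
    return float(1.0 - press / np.sum((y - y.mean()) ** 2))


# Stand-in regression model (ordinary least squares with intercept), not PLS
def ols_fit(X, y):
    return np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0]


def ols_predict(coef, X):
    return np.c_[np.ones(len(X)), X] @ coef


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=40)
y_noise = rng.normal(size=40)  # unrelated response: Q2 should be near or below zero
print(r2x_cumulative(X, 2), q2_cv(X, y, ols_fit, ols_predict), q2_cv(X, y_noise, ols_fit, ols_predict))
```

Because Q² is computed only from predictions on held-out samples, it can turn negative for a response the model cannot predict. This is why it is not additive across components in the way R²X is.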
Table 2. a) SIMCA and PLS-DA model characteristics on calibration dataset using FTIR spectral data
on all oil samples for the 6 classes’ models (model A) using SIMCA Umetrics™. b) PLS-DA model
characteristics on calibration dataset using FTIR spectral data on all oil samples for the 6 classes’
models (model B) using MATLAB. c) SIMCA and PLS-DA model characteristics on calibration dataset
using FTIR spectral data on all oil samples for the 12 classes’ models (model C) using SIMCA
Umetrics™. d) PLS-DA model characteristics on calibration dataset using FTIR spectral data on all oil
samples for the 12 classes’ models (model D) using MATLAB.
a) Model A (SIMCA Umetrics™, 6 classes), FTIR
Method    Class                        R²X (cum)*   Q² (cum)**
SIMCA     P                            0.918        0.872
SIMCA     PKOC                         0.692        0.628
SIMCA     RS                           0.929        0.881
SIMCA     PPKOC                        0.967        0.944
SIMCA     RSP                          0.963        0.951
SIMCA     RSPKOC                       0.961        0.948
PLS-DA    One model for all classes    0.984        0.728

b) Model B (MATLAB, 6 classes), FTIR
Method    Class                        R²X (cum)*   Q² (cum)**
PLS-DA    One model for all classes    0.946        0.887

c) Model C (SIMCA Umetrics™, 12 classes), FTIR
Method    Class                        R²X (cum)*   Q² (cum)**
SIMCA     P                            0.918        0.872
SIMCA     RO                           0.684        0.474
SIMCA     SO                           0.829        0.638
SIMCA     PKO                          0.722        0.320
SIMCA     CCO                          0.535        0.286
SIMCA     ROPO                         0.954        0.907
SIMCA     SOPO                         0.960        0.941
SIMCA     ROPKO                        0.936        0.913
SIMCA     SOPKO                        0.960        0.931
SIMCA     ROSO                         0.807        0.733
SIMCA     PPKO                         0.960        0.926
SIMCA     PCCO                         0.963        0.933
PLS-DA    One model for all classes    0.984        0.460

d) Model D (MATLAB, 12 classes), FTIR
Method    Class                        R²X (cum)*   Q² (cum)**
PLS-DA    One model for all classes    0.936        0.817

* R²X (cumulative) is the cumulative R²X up to the specified component; R²X is the fraction of X variation modelled in the component. ** Q² (cumulative) is the cumulative Q² up to the specified component; Q² indicates how well the model predicts new data.
P group: palm oil, palm stearin, palm olein; PKOC group: palm kernel oil, coconut oil; RS group: rapeseed oil, sunflower oil, rapeseed and sunflower admixtures; RSP group: RS group + P group; PPKOC group: P group + PKOC group; RSPKOC group: RS group + PKOC group; RO: rapeseed oil; SO: sunflower oil; PKO: palm kernel oil; CCO: coconut oil; ROPO: rapeseed and palm oil admixture; SOPO: sunflower and palm oil admixture; ROPKO: rapeseed and palm kernel oil admixture; SOPKO: sunflower and palm kernel oil admixture; ROSO: rapeseed and sunflower oil admixture; PPKO: palm oil and palm kernel oil admixture; PCCO: palm oil and coconut oil admixture.
The Variable Importance in Projection (VIP) scores estimate the importance of each variable in the projection used in a PLS-DA model and are often used for variable selection. A variable with a VIP score close to or greater than 1 can be considered important in a model. The 10 variables (cm-1) with the highest VIP scores for the PLS-DA model A were: 1738.99, 1739.48, 1738.51, 1739.96, 1738.03, 1737.55, 1740.44, 1737.07, 1740.92 and 1736.58. C=C, C=O and C=N bonds (stretching vibrations) normally absorb in this region of the spectrum, e.g. ester C=O stretch and carboxylic acid C=O stretch. The 10 variables (cm-1) with the highest VIP scores for the PLS-DA model B were: 2919.7, 2919.22, 2920.18, 1738.99, 1739.48, 1738.51, 1739.96, 1737.07, 1737.55 and 1738.03. The region around 1740 cm-1, related to C=C, C=O and C=N bonds, is also relevant for model B. C-H bonds (stretching vibrations) absorb in the 2900 cm-1 region of the spectrum. The 10 variables (cm-1) with the highest VIP scores for the PLS-DA model C were: 1133.46, 1737.07, 1736.58, 1737.55, 1738.03, 1738.51, 1736.1, 1738.99, 2919.7 and 2919.22. The regions around 1740 cm-1 and 2900 cm-1 are also relevant for the 12-class model (model C). The variable with the highest VIP score (3.01) is 1133.46 cm-1. It lies within the fingerprint region (1500-550 cm-1), which is related to bending vibrations (C-C, C-O, C-N). The 10 variables (cm-1) with the highest VIP scores for the PLS-DA model D were: 1736.58, 1767.07, 1736.10, 1753.46, 2919.22, 1737.55, 1753.94, 2918.73, 2919.70 and 1086.69. The regions around 1740 cm-1 and 2900 cm-1 are also relevant for model D. The variable 1086.69 cm-1 lies within the fingerprint region (1500-550 cm-1).
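For reference, the following Python sketch computes VIP scores from a fitted PLS model using the standard definition. VIP_j = sqrt(p · Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a), where p is the number of X variables and SS_a is the Y sum of squares explained by component a. The function works on plain arrays. With scikit-learn, for example, a fitted `PLSRegression` exposes the corresponding `x_weights_`, `x_scores_` and `y_loadings_` attributes. The random arrays below are placeholders only; this is not the MATLAB or SIMCA implementation used in the project.

```python
import numpy as np


def vip_scores(W: np.ndarray, T: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """VIP of every X variable for a fitted PLS(-DA) model with A latent variables.

    W: (p, A) X-weights, T: (n, A) X-scores, Q: (m, A) Y-loadings
    (m = number of dummy Y columns, i.e. classes, in PLS-DA)."""
    p = W.shape[0]
    # Y sum of squares explained by each component: ||q_a||^2 * t_a' t_a
    ss = np.sum(Q ** 2, axis=0) * np.sum(T ** 2, axis=0)
    # Squared, column-normalised weights (each column sums to 1)
    w2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w2 @ ss) / ss.sum())


rng = np.random.default_rng(0)
W, T, Q = rng.normal(size=(50, 4)), rng.normal(size=(30, 4)), rng.normal(size=(6, 4))
vip = vip_scores(W, T, Q)
top10 = np.argsort(vip)[::-1][:10]  # indices of the 10 most important variables
print(top10, vip[top10].round(2))
```

By construction, the squared VIP scores average exactly 1 over all variables. This is where the "close to or greater than 1" rule of thumb comes from.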
The developed SIMCA and PLS-DA classification models were validated using the prediction set (n=101), which contained different oils from those included in the calibration set. Table 3 presents the classification results of the prediction dataset against the SIMCA and PLS-DA models. The performance of the classification models was calculated using four parameters: sensitivity, specificity, precision and accuracy (see Section 3.5). Confusion tables can be found in Appendix II.
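Per-class TP, FP, FN and TN counts, and from them the summary figures, can be derived from a multi-class confusion table such as those in Appendix II. The Python sketch below does this. It assumes that "average precision" means the unweighted mean of the per-class precisions; this is an assumption, since the report does not define the averaging. The 3-class confusion matrix is invented for illustration.

```python
import numpy as np


def per_class_counts(cm):
    """cm[i, j]: number of samples of true class i assigned to class j."""
    cm = np.asarray(cm)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # assigned to the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp  # belonging to the class but assigned elsewhere
    tn = cm.sum() - tp - fp - fn
    return tp, fp, fn, tn


def summarise(cm):
    tp, fp, fn, tn = per_class_counts(cm)
    precision = tp / np.maximum(tp + fp, 1)  # avoid 0/0 for never-predicted classes
    return {
        "accuracy": tp.sum() / np.sum(cm),
        "average_precision": precision.mean(),  # unweighted mean over classes (assumption)
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


cm = [[5, 1, 0],
      [0, 4, 0],
      [1, 0, 3]]
print(summarise(cm))
```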
Table 3. SIMCA and PLS-DA model performance on prediction dataset (n=101) using FTIR spectral data.
Model / method                                   Training samples   ACC (%)   False positive rate (%)   Average precision (%)   Non-classified (%)
MODEL A (using SIMCA Umetrics™) / 6 classes
SIMCA                                            376                93.07     6.93                      91.53                   0
PLS-DA (Lv=17)                                   376                90.10     9.90                      92.06                   0
MODEL A WITH THRESHOLDS / 6 classes
SIMCA (t=0.05)                                   376                33.66     0.99                      81.67                   65.35
PLS-DA (t=0.57)                                  376                85.15     4.95                      78.71                   9.90
MODEL B (using MATLAB) / 6 classes
SIMCA with simulated samples                     1302               76.24     23.76                     71.76                   0
PLS-DA with simulated samples (t=0.5) (Lv=11)    1302               95.05     4.95                      96.63                   0
MODEL C (using SIMCA Umetrics™) / 12 classes
SIMCA                                            376                89.11     10.89                     92.12                   0
PLS-DA (Lv=16)                                   376                81.19     18.81                     77.30                   0
MODEL C WITH THRESHOLDS / 12 classes
SIMCA (t=0.05)                                   376                25.74     0                         66.67                   74.26
PLS-DA (t=0.54)                                  376                51.49     4.95                      59.36                   43.56
MODEL D (using MATLAB) / 12 classes
SIMCA with simulated samples                     2123               34.65     65.35                     45.72                   0
PLS-DA with simulated samples (Lv=17)            2123               91.09     8.91                      92.00                   0
* For definitions of the column headings, see 'Data analysis' (Section 3.5).
* See Appendix II for confusion tables.
The PLS-DA classification technique performed better when simulated samples were added to the calibration matrix (Model B). The accuracy increased to 95.05%, the false positive rate was below 5% (4.95%), the average precision was 96.63% and no samples needed to go to the confirmation step.
Permutation tests were performed, and permuted R² and Q² values were obtained, in order to assess whether the PLS-DA model for 6 classes (Model B) is overfitted. For each class, the oil class labels were randomised and the cross-validated calibration procedure was repeated 20 times. The permuted Q² for every class is negative and lower than the original Q², which indicates that the model is not overfitted (see permutation plots in Appendix III).
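The logic of the permutation test above can be sketched in Python: the model is refitted on randomly permuted labels, and the cross-validated Q² obtained is compared with that of the real model. To keep the sketch self-contained, an ordinary least-squares regression stands in for PLS-DA, with seven cross-validation groups assumed. The 20 permutations follow the text. The data and function names are hypothetical.

```python
import numpy as np


def cv_q2(X, y, n_folds=7):
    """Cross-validated Q2 of an ordinary least-squares model (stand-in for PLS-DA)."""
    A = np.c_[np.ones(len(X)), X]  # design matrix with intercept
    folds = np.arange(len(y)) % n_folds
    y_cv = np.empty(len(y))
    for k in range(n_folds):
        held_out = folds == k
        coef = np.linalg.lstsq(A[~held_out], y[~held_out], rcond=None)[0]
        y_cv[held_out] = A[held_out] @ coef
    return 1.0 - np.sum((y - y_cv) ** 2) / np.sum((y - y.mean()) ** 2)


def permutation_test(X, y, n_perm=20, seed=0):
    """Q2 of the real model and of n_perm models refitted on randomly permuted y.
    A model that is not overfitted should score clearly lower on permuted labels."""
    rng = np.random.default_rng(seed)
    q2_real = cv_q2(X, y)
    q2_perm = np.array([cv_q2(X, rng.permutation(y)) for _ in range(n_perm)])
    return q2_real, q2_perm


rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=60)
q2_real, q2_perm = permutation_test(X, y)
print(round(q2_real, 3), np.round(q2_perm.max(), 3))
```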
One new oil type, coconut oil, was introduced in the expanded database. Going beyond the legacy model design, the classes were rebuilt so that each class contained a clearly defined oil type or admixture type. A total of 12 such classes were created. The spectroscopic datasets are the same, but the samples were regrouped into 12 classes instead of 6. The 12 new classes were:
P (palm oil, palm olein and palm stearin), RO (rapeseed oil), SO (sunflower oil), PKO (palm kernel oil), CCO (coconut oil), ROPO (rapeseed-palm oil admixtures), SOPO (sunflower-palm oil admixtures), ROPKO (rapeseed-palm kernel oil admixtures), SOPKO (sunflower-palm kernel oil admixtures), ROSO (rapeseed-sunflower oil admixtures), PPKO (palm-palm kernel oil admixtures) and PCCO (palm-coconut oil admixtures).
Table 4. Number of simulated samples and parameters used for generating simulated samples for the
6 classes’ model (Model B).
FTIR simulated samples / 6 classes
#   Class    Actual   New total with simulated samples   Spectral intensifier (step=0.01)   Shifting along x-axis            Gaussian noise
1   P        68       204                                1.01-1.02                          -                                -
2   RS       62       248                                1.01-1.03                          -                                -
3   PKOC     14       210                                1.01-1.07                          Laplacian distribution, b=0.6    -
4   RSPKOC   51       204                                1.01-1.03                          -                                -
5   RSP      107      214                                1.01                               -                                -
6   PPKOC    74       222                                1.01-1.02                          -                                -
    TOTAL    376      1302
* P group: palm oil, palm stearin, palm olein; PKOC group: palm kernel oil, coconut oil; RS group: rapeseed oil,
sunflower oil, rapeseed and sunflower admixtures; RSP group: RS group + P group; PPKOC group: P group +
PKOC group; RSPKOC group: RS group + PKOC.
SIMCA and PLS-DA model performance on the prediction dataset using FTIR with 12 classes is shown in Table 3 (Models C and D). Class discrimination using 12 classes proved more challenging. A calibration model built with 12 classes gave lower accuracy than the 6-class model (89.11% with SIMCA and 81.19% with PLS-DA). Average precision with SIMCA was higher than in the 6-class model (92.12%), but with PLS-DA it decreased to 77.30%. Moreover, the false positive rate was much higher than in the 6-class model (10.89% and 18.81% for SIMCA and PLS-DA, respectively). Here, the false positive rate is the proportion of samples wrongly classified as belonging to a class to which they do not belong. To decrease the false positive rate to <5%, thresholds were introduced in the models, as for the 6-class model: t=0.05 for SIMCA and t=0.54 for PLS-DA. The false positive rate then decreased to 0% for SIMCA and to 4.95% for PLS-DA (the same value as for the thresholded 6-class PLS-DA model). This came at the expense of a significant drop in accuracy, to 25.74% for SIMCA and 51.49% for PLS-DA. With SIMCA, 74.26% of the samples (75 out of 101) were not classified; with PLS-DA, 43.56% (44 out of 101). These samples therefore need to go to the second step (confirmation step based on fatty acid criteria).
Permutation tests were performed, and permuted R² and Q² values were obtained, in order to assess whether the PLS-DA model for 12 classes (Model C) is spurious, i.e. overfitted. The order of the y-variable was randomly permuted 20 times, and a separate model was fitted to each permuted y-variable. Each of these models extracted 16 components, the same number as the original model. All permutation plots are shown in Appendix III; they showed no overfitting of the PLS-DA models.
The simulated samples approach was also applied to improve the 12 classes’ model. The actual
number of samples and the final number of samples including the simulated samples for the 12
classes’ model can be seen in Table 5.
Table 5. Number of simulated samples and parameters used for generating simulated samples for the 12 classes' model (Model D)
FTIR simulated samples / 12 classes
#    Class   Actual   New total with simulated samples   Spectral intensifier       Shifting along x-axis            Gaussian noise
1    P       68       136                                1.01                       -                                -
2    RO      15       150                                1.005-1.045, step=0.005    -                                -
3    SO      16       144                                1.005-1.040, step=0.005    -                                -
4    PKO     8        152                                1.01-1.06, step=0.01       Laplacian distribution, b=0.6    30 dB
5    CCO     6        138                                1.01-1.11, step=0.01       Laplacian distribution, b=0.6    -
6    ROPO    39       195                                1.01-1.04, step=0.01       -                                -
7    SOPO    68       204                                1.01-1.02, step=0.01       -                                -
8    ROPKO   26       182                                1.01-1.06, step=0.01       -                                -
9    SOPKO   25       200                                1.01-1.07, step=0.01       -                                -
10   ROSO    31       217                                1.01-1.06, step=0.01       -                                -
11   PPKO    39       195                                1.01-1.04, step=0.01       -                                -
12   PCCO    35       210                                1.01-1.05, step=0.01       -                                -
     TOTAL   376      2123
*P: palm oil, palm olein and palm stearin; RO: rapeseed oil; SO: sunflower oil; PKO: palm kernel oil; CCO:
coconut oil; ROPO: rapeseed and palm oil admixture; SOPO: sunflower and palm oil admixture; ROPKO:
rapeseed and palm kernel oil admixture; SOPKO: sunflower and palm kernel oil admixture; ROSO: rapeseed and
sunflower oil admixture; PPKO: palm oil and palm kernel oil admixture; PCCO: palm oil and coconut oil admixture.
PLS-DA performed much better than SIMCA when simulated samples were used (Model D). The accuracy increased to 91.09%, the false positive rate was above 5% (8.91%), the average precision was 92.00% and no samples needed to go to the confirmation step.
Overall, the best-performing method for the 6-class model is a calibration model built with PLS-DA using MATLAB and simulated samples (Model B). For the 12-class model, it is a calibration model built with PLS-DA combined with a threshold (t=0.54) (Model C). Note that the best models for 6 and 12 classes are not comparable, as they represent different approaches. With the first model, no samples needed to go to the confirmation step, whereas with the second, 44 samples were referred to the confirmation step.
3.6.3 Confirmation step – Fatty acids
PLS-DA performed better than SIMCA for the given problem, and SIMCA was therefore excluded from further analyses.
A total of 10 samples (9.90%) were submitted to the confirmation step when using method A with thresholds/6 classes, and 44 samples (43.56%) when using method C with thresholds/12 classes.
The criteria for the 6-class and 12-class models are shown in Table 6 and Table 8, respectively. These criteria are applied to identify an unknown sample. For every class, a sample belongs to that class only if all of the class's conditions are met. If the unknown sample meets the criteria of a specific class, it is classified in the corresponding class.
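The "all conditions must be met" rule can be sketched as a Python function that returns every matching class, which also covers the dual identities and unidentified samples reported later. The criteria dictionary holds only an illustrative subset of the 6-class criteria: the P limits for C14:0 and C18:2, and the RS limits for PUFA/SAT and C18:2. The full criteria table has further conditions. The function name and sample values are hypothetical.

```python
from typing import Callable

Condition = Callable[[float], bool]

# Illustrative subset of the 6-class criteria (mg fatty acid/g oil; PUFA/SAT is a ratio)
CRITERIA: dict[str, dict[str, Condition]] = {
    "P": {
        "C14:0": lambda v: 5.8 <= v <= 10.0,
        "C18:2": lambda v: 43 <= v <= 85,
    },
    "RS": {
        "C18:2": lambda v: 135 <= v <= 550,
        "PUFA/SAT": lambda v: v > 3.5,
    },
}


def identify(sample: dict[str, float], criteria=CRITERIA) -> list[str]:
    """Return every class whose conditions are ALL met by the sample.
    A fatty acid missing from the sample counts as a failed condition;
    an empty list means the sample is left unidentified."""
    return [
        cls
        for cls, conditions in criteria.items()
        if all(fa in sample and cond(sample[fa]) for fa, cond in conditions.items())
    ]


print(identify({"C14:0": 7.2, "C18:2": 60.0, "PUFA/SAT": 0.2}))
print(identify({"C14:0": 0.5, "C18:2": 300.0, "PUFA/SAT": 4.1}))
```

Returning a list rather than a single label is a deliberate design choice. When class criteria overlap, a sample can then be reported with two identities.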
The criteria for the 6-class model were modified from those of the previous project (FAO117), since new oil species and oil admixtures were included. The changes were as follows. For the P class, C14:0 became 5.8-10.0 instead of 7.8-10.0 and C18:2 became 43-85 instead of 43-80. For the RS class, the PUFA/SAT ratio became >3.5 instead of >4.0. All criteria for the PKOC class were changed.
Table 6. Criteria expressed in quantities (mg fatty acid/g oil) for 6 classes’ model.
Specific FA
P PKOC RS PPKOC RSP RSPKOC
C8:0 Caprylic acid >8 >2.5 >2.5
C12:0 Lauric acid >0.99 >150 <0.1
C14:0 Myristic acid 5.8-10.0 <0.7
C16:0 Palmitic acid 315-490 50-100 >=70 58-330 35-70
C18:1 Oleic acid >=195
C18:2 Linoleic acid 43-85 <35 135-550 25-75 70-425 24-450
PUFA /SAT (P/S) ratio <0.25 <0.06 >3.5 <=0.3 >=0.325
* FA: fatty acid; P group: palm oil, palm stearin, palm olein; PKOC group: palm kernel oil, coconut oil; RS
group: rapeseed oil, sunflower oil, rapeseed and sunflower admixtures; RSP group: RS group + P group; PPKOC
group: P group + PKOC group; RSPKOC group: RS group + PKOC. PUFA/SAT: polyunsaturated fatty
acids/Saturated fatty acids
All samples (n=10) submitted to the confirmation step according to method A with thresholds/6
classes were successfully identified according to the 6 classes’ criteria (Table 7).
Table 7. Predicted identity of the samples submitted to the confirmation step (fatty acid criteria)
SAMPLE NAME ACTUAL PREDICTED IDENTITY
1 100PKOn2 PKOC PKOC
2 100POln2 P P
3 100POn21 P P
4 100POn22 P P
5 100POn5 P P
6 100POn8 P P
7 70CCO3+30POn3 PPKOC PPKOC
8 26POn3+74SOn4 RSP RSP
9 65POn18+35SOn13 RSP RSP
10 35POn5+65ROn3 RSP RSP
New fatty acid criteria were created for the 12 classes, in the same way as the criteria for 6 classes, using pure oils and admixtures from FAO117 and the current project. The new criteria for 12 classes are shown in Table 8. Because some of the oils and oil admixtures have similar fatty acid profiles, some criteria overlap between the RO and ROSO classes and between the PCCO and PPKO classes.
Table 8. Criteria expressed in quantities (mg fatty acid/g oil) for 12 classes’ model
Class FA
PKO RO SO P ROSO ROPKO SOPKO ROPO SOPO PPKO PCCO CCO
C6:0 <3 0 0 0 0
0 0 <1.0 0.1-2.5
>1.0
C8:0 5.0-40
0 0 0 0 <15 <15 0 0 <15 3.0-35
25-50
C10:0 10-30.0
0 0 0 0 <15 <15 0 0 <20 3.0-35
25-50
C12:0 150-400
0 0 >0.5 0 <235 <235 0.02-1.5
0.01-1.25
<250 20-275
250-350
C14:0
5-10
<10 <10 <100 15-125
>100
C16:0 50-100
20-50
30-70
>300 20-60 <70 <70 20-400
50-400
50-400
100-325
50-100
C16:1 0
0.5-1.5
C18:0 <25 5.0-15
15-35
20-45
5.0-30
<25 <25 5.0-35
20-40 15-35 20-35 15-30
C18:1c 80-175
20-600
150-
250
150-400
200-600
100-600
100-250
200-600
150-350
125-300
80-250
40-80
C18:2c <30 75-175
300-
550
40-85
100-450
15-175 50-400 50-175
50-450
15-75 15-60 5.0-35
C18:3c9,12,15
30-100
<3
<75 2.0-75 0.1-2.0 2.0-90
0.5-2
PUFA/ SAT
<0.07 2.0-4.5
4.5-6.0
<0.27
3.0-6.0
<2.75 <4 <3.25 <5.0 <0.16 <0.16 <0.075
* FA: fatty acid; PKO: palm kernel oil; RO: rapeseed oil; SO: sunflower oil; P: palm oil, palm olein and palm
stearin; ROSO: rapeseed and sunflower oil admixture; ROPKO: rapeseed and palm kernel oil admixture; SOPKO:
sunflower and palm kernel oil admixture; ROPO: rapeseed and palm oil admixture; SOPO: sunflower and palm oil
admixture; PPKO: palm oil and palm kernel oil admixture; PCCO: palm oil and coconut oil admixture; CCO: coconut
oil; PUFA/SAT: polyunsaturated fatty acids/Saturated fatty acids
Thirty-nine of the forty-four samples submitted to the confirmation step according to method C with thresholds/12 classes were successfully identified according to the 12-class criteria (Table 9). Four samples were given two identities because they met all conditions for two classes with similar fatty acid profiles. One sample did not meet all conditions for any of the 12 classes and was therefore left unidentified.
Table 9. Predicted identity of the samples submitted to the confirmation step (fatty acid criteria)
SAMPLE NAME ACTUAL PREDICTED IDENTITY
1 100POn21 P P
2 100POn22 P P
3 100POn3 P P
4 100POn5 P P
5 100POn8 P P
6 100ROn3 RO UNIDENTIFIED
7 100ROn6 RO RO/ROSO
8 100ROn9 RO RO
9 25ROn1+75SOn1 ROSO ROSO
10 33ROn3+67SOn4 ROSO ROSO
11 41ROn6+59SOn7 ROSO ROSO
12 49ROn9+51SOn8 ROSO ROSO
13 57ROn1+43SOn11 ROSO ROSO
14 65ROn3+35SOn13 ROSO ROSO
15 73ROn6+27SOn15 ROSO ROSO/RO
16 22PKO2+78POn3 PPKO PPKO
17 28PKO2+72POn5 PPKO PPKO
18 34PKOn2+66POn8 PPKO PPKO
19 40PKOn2+60POn13 PPKO PPKO
20 46PKOn2+54POn16 PPKO PCCO/PPKO
21 52PKO2+48POn18 PPKO PCCO/PPKO
22 28CCO3+72POn3 PCCO PCCO
23 31CCO4+69PO5 PCCO PCCO
24 25PKOn2+75ROn1 ROPKO ROPKO
25 30PKOn2+70ROn3 ROPKO ROPKO
26 39PKOn2+61ROn6 ROPKO ROPKO
27 47PKOn2+53ROn9 ROPKO ROPKO
28 50PKOn2+50ROn1 ROPKO ROPKO
29 60PKOn2+40ROn3 ROPKO ROPKO
30 69PKOn2+31ROn6 ROPKO ROPKO
31 73PKOn2+27ROn9 ROPKO ROPKO
32 42PKOn2+58SOn7 SOPKO SOPKO
33 47PKOn2+53SOn8 SOPKO SOPKO
34 57PKOn2+43SOn11 SOPKO SOPKO
35 60PKOn2+40SOn13 SOPKO SOPKO
36 69PKOn2+31SOn15 SOPKO SOPKO
37 65POn18+35SOn13 SOPO SOPO
38 35POn5+65ROn3 ROPO ROPO
39 41POn8+59ROn6 ROPO ROPO
40 50POn13+50ROn9 ROPO ROPO
41 56POn16+44ROn1 ROPO ROPO
42 65POn18+35ROn3 ROPO ROPO
43 71POn21+29ROn6 ROPO ROPO
44 75POn22+25ROn9 ROPO ROPO
4. INTER-LAB TRIALS AS PART OF METHOD VALIDATION
4.1 FTIR inter-lab trial
DEFRA project FAO117 established that the combination of a two-step analytical procedure, standard chemometric classification techniques and a vertical decision-making process produced very good results when validated in our laboratory (intra-lab validation). The analytical procedure can be summarised in two steps. In the screening step, a spectroscopic method such as FTIR (untargeted analysis) is applied to the oil samples. In the confirmation step (targeted analysis), the identity of the samples left unidentified by the screening step is confirmed by standard fatty acid analysis (GC). An inter-lab validation trial was undertaken to establish whether the method is 'instrument-agnostic', i.e. independent of the instruments used to acquire the spectra of the oils.
4.1.1 Participants
Twelve different institutions in the UK including research centres, food industries, public services and
private companies participated in the inter-lab validation.
4.1.2 Samples
A total of nine samples, including pure oils and oil admixtures, were prepared in our laboratory and sent to each of the participants. The oils used for preparing the admixtures were different from those included in the calibration set. They were new oils (origin: Thailand, oil processor 3) purchased between August 2014 and December 2014. The pure oil and oil admixture samples were:
o Sample 1: Palm oil (100% PO)
o Sample 2: Rapeseed oil (100% RO)
o Sample 3: Palm kernel oil (100% PKO)
o Sample 4: Rapeseed-palm oil (50% RO-50% PO)
o Sample 5: Rapeseed-palm stearin (70% RO-30% PS)
o Sample 6: Palm kernel oil-palm oil (40% PKO-60% PO)
o Sample 7: Rapeseed oil-Palm kernel oil (50% RO-50% PKO)
o Sample 8: Rapeseed oil-Sunflower oil (40% RO-60% SO)
o Sample 9: Palm olein-rapeseed oil (70% POL-30% RO)
4.1.3 Results
Because of the high variability observed in the spectral data from different instruments, a new pre-processing approach was needed before the data could be tested in our calibration models. Acquisition parameters varied amongst participants because of the different FTIR instruments and software used. Duplicates of all spectra were averaged before pre-processing. All spectra for every sample were plotted together to show the variation between participants (Figures 3-11).
Figure 3. Superimposed FTIR spectra of 16 palm oils
Figure 4. Superimposed FTIR spectra of 16 rapeseed oils
Figure 5. Superimposed FTIR spectra of 16 palm kernel oils
Figure 6. Superimposed FTIR spectra of 16 rapeseed oil + palm oil admixtures
Figure 7. Superimposed FTIR spectra of 16 rapeseed oil + palm stearin admixtures
Figure 8. Superimposed FTIR spectra of 16 palm kernel oil + palm oil admixtures
Figure 9. Superimposed FTIR spectra of 16 rapeseed oil + palm kernel oil admixtures
Figure 10. Superimposed FTIR spectra of 16 rapeseed oil + sunflower oil admixtures
Figure 11. Superimposed FTIR spectra of 16 palm olein + rapeseed oil admixtures
The first difference between spectra recorded on different instruments is the number of variables. Data spacing depends on the resolution and on other acquisition parameters such as zero filling. The spectra used to create the calibration models were recorded at a resolution of 4 cm-1 with a zero-filling factor of four (2 levels). The resulting data spacing was 0.482 cm-1 and the number of variables (wavenumbers) was 7157. Other aspects of the spectra that need to be corrected with signal correction filters are baseline slope and peak shifting. The pre-processing techniques included linear interpolation, iCoShift, Standard Normal Variate (SNV), first derivative, Savitzky–Golay and Pareto scaling. A description of these pre-processing techniques can be found in Appendix IV.
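Several of the pre-processing steps listed above can be sketched with NumPy alone. The steps covered are linear interpolation of a participant's spectrum onto the calibration axis, SNV, a first derivative and Pareto scaling. iCoShift alignment and Savitzky–Golay filtering are not reproduced; SciPy's `scipy.signal.savgol_filter` offers Savitzky–Golay smoothing and derivatives. The following are all assumptions rather than the project's actual pipeline, which is described in Appendix IV: the calibration axis from 4000 to 550 cm-1, a 2 cm-1 participant axis, the order of the steps, and the finite-difference derivative.

```python
import numpy as np


def to_reference_grid(wn, spectrum, ref_wn):
    """Resample a spectrum recorded on axis `wn` onto the calibration axis `ref_wn`
    by linear interpolation (np.interp needs its x-points in increasing order)."""
    order = np.argsort(wn)
    return np.interp(ref_wn, wn[order], spectrum[order])


def snv(X):
    """Standard Normal Variate: centre and scale each spectrum (row) individually."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def first_derivative(X):
    """Finite-difference first derivative along the wavenumber (column) axis."""
    return np.gradient(X, axis=1)


def pareto(X):
    """Pareto scaling: centre each variable and divide by the square root of its SD."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant variables unscaled
    return (X - X.mean(axis=0)) / np.sqrt(sd)


# Hypothetical axes: calibration grid (7157 points) and a coarser participant grid (2 cm-1)
ref_wn = np.linspace(4000.0, 550.0, 7157)
wn = np.linspace(4000.0, 500.0, 1751)
raw = np.array([np.sin(wn / 100.0 + k) + 0.1 * k for k in range(3)])  # placeholder spectra

resampled = np.vstack([to_reference_grid(wn, s, ref_wn) for s in raw])
X = pareto(first_derivative(snv(resampled)))
print(X.shape)
```

Resampling onto a single reference axis comes first because every later step, and the calibration model itself, assumes that column j means the same wavenumber for every instrument.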
The FTIR inter-lab trial was conducted before the database expansion carried out later in the current project (see Section 3). The unequal number of samples amongst classes was therefore overcome by creating simulated samples, which were added to the calibration models to create balanced classes and avoid any biased classification decision. Simulated samples are new samples created by offsetting the mean spectrum of each class along the Y-axis and slightly along the X-axis. These samples were appended to the calibration dataset and the model was re-trained. The offset along the Y-axis varied between 0 and 25% in order to obtain a balanced classification model.
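The class-balancing idea described above can be sketched in Python: every class is topped up to the size of the largest class with offset copies of its mean spectrum. The sketch assumes several details that the text does not specify. The Y-axis offset is taken as multiplicative, i.e. the mean spectrum is scaled by (1 + offset) with the offset drawn uniformly from 0 to 25%. The X-axis offset is taken as a shift of at most two data points. The function name, defaults and data are hypothetical.

```python
import numpy as np


def balance_with_offsets(X, labels, max_offset=0.25, max_shift=2, seed=0):
    """Top up every class to the size of the largest class with copies of the
    class-mean spectrum, each scaled by (1 + offset) with offset drawn uniformly
    from [0, max_offset] and shifted by at most max_shift points along the
    wavenumber axis (np.roll wraps values around the ends of the spectrum)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    new_X, new_y = [X], [labels]
    for cls, n in zip(classes, counts):
        mean = X[labels == cls].mean(axis=0)
        for _ in range(target - n):
            offset = rng.uniform(0.0, max_offset)
            shift = int(rng.integers(-max_shift, max_shift + 1))
            new_X.append(np.roll(mean * (1.0 + offset), shift)[np.newaxis, :])
            new_y.append(np.array([cls]))
    return np.vstack(new_X), np.concatenate(new_y)


rng = np.random.default_rng(3)
X = rng.random((7, 20))
labels = np.array([0, 0, 0, 0, 0, 1, 1])
X_bal, y_bal = balance_with_offsets(X, labels)
print(np.unique(y_bal, return_counts=True))
```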
Detailed results and discussion can be found in Appendix IV. Overall, PLS-DA proved more powerful than the SIMCA algorithm at correctly assigning unknown samples to the oil classes. The problem of misclassification was tackled by establishing thresholds (P values) and adding synthetic samples to the calibration models. The screening method (FTIR) proved very capable of predicting the nature of both pure oils and binary oil admixtures. It has the great advantage of being a fast and easy method for rapidly screening an oil sample for authentication purposes. The inter-lab trial validation results showed that the initial concept works: the chemometric models (PLS-DA) identified the majority of the blends in the screening step, and only a small percentage of pure oils and blends were rejected (14% non-classified and 2.3% wrongly classified). These pure oils and/or oil admixtures had to be analysed further using targeted analytical methods such as analysis of fatty acid composition (confirmation step). The fatty acid analysis of the validation samples correctly identified the nature of 16 out of 18 samples (88.9%) referred to the confirmation step when using the PLS-DA algorithm. In conclusion, FTIR spectroscopy coupled with the PLS-DA algorithm, followed by fatty acid analysis when required, offers an insight into the nature of pure oils and binary mixtures. In this inter-lab validation, the approach correctly classified 96.03% of unknown oil samples.
4.2 Fatty acids inter-lab trial
A second, confirmation step based on fatty acid analysis was established in FAO117 to determine the
identity of samples that could not be resolved in the screening step based on spectroscopic analysis.
Criteria were created from fatty acid data obtained in our laboratory, and they proved successful. To
assess the reproducibility of the fatty acid data obtained in our laboratory, and thus of the fatty acid
criteria, an inter-lab validation was undertaken.
4.2.1 Participants
Three different accredited laboratories based in the UK participated in the fatty acid inter-lab trial.
Samples were anonymised and submitted to the testing laboratories for fatty acid analysis by GC. Each
laboratory performed the analysis using its own GC instrument and official method for the
determination of individual fatty acids in oil samples. The same samples were also analysed in our
laboratory.
4.2.2 Samples
A total of eight samples, including pure oils, oil admixtures and certified reference materials from
the European Commission Institute for Reference Materials and Measurements (IRMM), were submitted
to each participant. The samples were:
o Sample 1: Standard Soya-Maize oil blend. European Commission, Institute for Reference
Materials and Measurements (IRMM), certified reference material BCR-162R.
o Sample 2: Palm oil and shea butter admixture (50% palm oil + 50% shea butter)
o Sample 3: Palm oil and rapeseed oil admixture (65% palm oil + 35% rapeseed oil)
o Sample 4: Palm kernel oil and palm oil admixture (42% palm kernel oil + 58% palm oil)
o Sample 5: Coconut oil and palm oil admixture (58% coconut oil + 42% palm oil)
o Sample 6: Soybean oil and palm oil admixture (59% soybean oil + 41% palm oil)
o Sample 7: Palm oil
o Sample 8: Standard cocoa butter. European Commission, Institute for Reference Materials and
Measurements (IRMM), certified reference material IRMM-801.
4.2.3 Method
Individual fatty acid concentrations were calculated using the internal standard method, as described
in phase 1 of the FAO117 project. Response factors were calculated from external fatty acid standards
relative to C13:0, which was used as the internal standard. The peak area of each individual fatty acid
was divided by the peak area of the internal standard, multiplied by the internal standard
concentration and then by the corresponding response factor, and finally corrected for sample weight
and dilution factors. Duplicate analyses were then averaged.
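The calculation chain above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the way sample weight and dilution are applied (multiply by the final volume the oil was made up to, divide by the oil mass) is an assumption, because the report does not give the exact formula or units.

```python
from statistics import mean


def fatty_acid_concentration(area_fa, area_is, conc_is, response_factor,
                             final_volume_ml, sample_weight_g):
    """Internal-standard quantification of one fatty acid (hypothetical helper).

    area_fa, area_is : peak areas of the fatty acid and of the C13:0 internal standard
    conc_is          : internal standard concentration in the injected solution (mg/mL)
    response_factor  : response factor of the fatty acid relative to C13:0
    final_volume_ml  : volume the weighed oil was made up to (assumed dilution step)
    sample_weight_g  : mass of oil weighed out (g)
    Returns mg of fatty acid per g of oil.
    """
    # Area ratio x IS concentration x response factor gives mg/mL in the solution.
    conc_in_solution = (area_fa / area_is) * conc_is * response_factor
    # Correct for dilution and sample weight to express the result per g of oil.
    return conc_in_solution * final_volume_ml / sample_weight_g


def average_duplicates(first, second):
    """Duplicate analyses are averaged, as described above."""
    return mean([first, second])


# Hypothetical duplicate injections of one sample (illustrative numbers only).
rep1 = fatty_acid_concentration(2.0, 1.0, 0.5, 1.1, 10.0, 0.1)
rep2 = fatty_acid_concentration(2.1, 1.0, 0.5, 1.1, 10.0, 0.1)
print(f"{average_duplicates(rep1, rep2):.1f} mg/g")
```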
4.2.4 Results
The fatty acid contents of all the oil samples included in this validation trial are presented in
Appendix V. Results for the first sample are discussed in this section as it is a certified reference
material (Table 10). A similar pattern was observed for the rest of the samples analysed.
The relative standard deviation (RSD) was used to evaluate the reproducibility of the measurements
obtained by the different laboratories using different instruments. The results (Table 10) indicate that
the reproducibility of the method is acceptable: the RSD of the most abundant fatty acids (palmitic,
stearic, oleic, linoleic and linolenic acids) ranged from 0.02 to 0.07 (2% to 7%), indicating good
agreement between laboratories.
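The RSD figures can be checked with a short calculation. This sketch assumes the RSD is the sample standard deviation divided by the mean, computed over the four laboratory results only; recomputing it this way reproduces the RSD column of Table 10 for the five major fatty acids, which suggests (but does not confirm) that the certified IRMM value was not included.

```python
from statistics import mean, stdev


def relative_sd(values):
    """Relative standard deviation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Results of laboratories 1-4 for BCR-162R, taken from Table 10 (% of fatty acids).
table_10 = {
    "C16:0": [11.18, 10.90, 10.69, 11.00],
    "C18:0": [3.27, 2.90, 2.84, 2.90],
    "C18:1c": [28.58, 26.70, 26.71, 26.60],
    "C18:2c": [52.13, 55.30, 53.86, 53.60],
    "C18:3c9,12,15": [3.28, 3.60, 3.75, 3.30],
}
for fatty_acid, values in table_10.items():
    print(fatty_acid, round(relative_sd(values), 2))
# Prints 0.02, 0.07, 0.04, 0.02 and 0.07, matching the RSD column of Table 10.
```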
Table 10. Fatty acid content (expressed in %) of sample 1 (Standard Soya-Maize oil blend,
certified reference material BCR-162R)
FATTY ACIDS BCR-162R IRMM (%) LAB 1 (%) LAB 2 (%) LAB 3 (%) LAB 4 (%) RSD
C6:0
0.00
0.01
2.00
C8:0
0.00 0.00 0.01
2.00
C10:0
0.00 <0.1 0.01
1.76
C12:0
0.00 <0.1 0.01
1.76
C14:0
0.04 <0.1 0.05 0.10 0.41
C15:0
0.00
0.03
2.00
C16:0 10.74 11.18 10.90 10.69 11.00 0.02
C16:1c
0.06
0.12 0.20 0.92
C17:0
0.07
0.07 0.10 0.71
C17:1c
0.03
0.08
1.43
C18:0 2.82 3.27 2.90 2.84 2.90 0.07
C18:1t
0.00 <0.1 0.03 0.10 0.87
C18:1c 25.40 28.58 26.70 26.71 26.60 0.04
C18:2t
0.16 <0.1 0.46 0.50 0.67
C18:2c 54.13 52.13 55.30 53.86 53.60 0.02
C20:0
0.27 0.40 0.40 0.40 0.17
C18:3c6,9,12
0.16
0.01
1.58
C20:1c
0.35 0.30 0.35 0.30 0.09
C18:3c9,12,15
3.35 3.28 3.60 3.75 3.30 0.07
C20:2c
0.02
0.03
1.17
C22:0
0.28 <0.1 0.29 0.30 0.39
C23:0
0.00
0.00
C24:0
0.12
0.17 0.10 0.73
5. APPLICATION OF THE METHOD IN PASTRY PRODUCTS (BISCUITS)
5.1 Validation of the FTIR 6-class and 12-class models (Models B and C) on commercial biscuits
A total of 20 commercial samples, including different types and brands of plain biscuit, were
purchased from retailers in the UK (Table 11).
According to the ingredient list on the label, 16 biscuits contain palm oil (PO) and 4 biscuits contain
palm oil and rapeseed oil (PORO). Oils were extracted from the commercial biscuits using the method
described in Section 5.2.2 for the in-house biscuits, and FTIR spectra were collected for all samples.
The spectroscopic data of the oils extracted from the commercial biscuits were checked against the
models built using pure oils (Models B and C, see Section 3.6.2).
Table 11. List of commercial biscuits purchased from retailers in the UK
COMMERCIAL BISCUITS
Sample code Oil type* Country Product type
CMDGV2 PO UK Digestives
CMDGV3 PO UK Digestives
CMDGV4 PO UK Digestives
CMDGV6 PO UK Digestives
CMDGV7 PO UK Digestives
CMDGV8 PO UK Digestives
CMDGV9 PO UK Digestives
CMDGV10 PO UK Digestives
CMDGV11 PO UK Digestives
CMDGV12 PO UK Digestives
CMRTV3 PO UK Rich Tea
CMRTV4 PO UK Rich Tea
CMRTV5 PORO UK Rich Tea
CMRTV6 PORO UK Rich Tea
CMRTV7 PORO UK Rich Tea
CMRTV8 PO UK Rich Tea
CMRTV9 PORO UK Rich Tea
CMRTV10 PO UK Rich Tea
CMRTV11 PO UK Rich Tea
CMRTV12 PO UK Rich Tea
* PO: palm oil; PORO: palm oil and rapeseed oil admixtures.
5.1.1 Results using the 6-class legacy model (Model B)
The 6-class model was used to predict the oil types contained in the commercial biscuits, with the
following results:
o Accuracy (%): 80.00;
o False rate (%): 20.00;
o Average precision (%): 96.67
Model B correctly identified 80% of the samples (16 of 20), whereas 20% (4 samples) were assigned
to the wrong class.
5.1.2 Results using the 12-class high-resolution model (Model C)
The 12-class model was used to predict the oil types contained in the commercial biscuits, with the
following results:
o Accuracy (%): 50.00;
o False rate (%): 25.00;
o Samples for the Confirmation Step (%): 25.00 (5 samples out of 20)
Accuracy was lower than with the 6-class model (50% vs 80%). A quarter of the samples (25%) were
assigned to the wrong class, and the remaining 25% were unidentified and were referred to the
confirmation step based on fatty acid criteria.
5.2 Development of specific biscuit-only model
Specific models were created using the FTIR spectroscopic data of oils from in-house biscuits, and
their results were compared with those obtained with the previous models (Section 5.1).
5.2.1 Samples
Pure oils were purchased from wholesalers, retailers and supermarkets in the UK (Table 12): palm oil
(PO) (n=12) and rapeseed oil (RO) (n=12), the most common oils used in the biscuit sector.
Table 12. Details of samples (calibration and prediction set) used for the biscuit-only model
Oil species
Sample code Usage Origin Company
Palm Oil (PO)
POn3 Prediction Thailand Oil processor 3
POn4 Calibration UK Oil processor 4
POn5 Calibration UK Oil processor 4
POn6 Prediction UK Oil processor 4
POn7 Calibration Not provided Oil processor 5
POn8 Calibration Malaysia Oil processor 6
POn9 Calibration Malaysia Oil processor 7
POn10 Calibration Malaysia Oil processor 8
POn12 Calibration Indonesia Oil processor 9
POn16 Calibration Indonesia Oil processor 9
POn19 Prediction Indonesia Oil processor 9
POn20 Calibration UK Oil processor 4
Rapeseed Oil (RO)
ROn1 Prediction Not provided Oil retailer 1
ROn2 Calibration Not provided Oil retailer 2
ROn3 Calibration UK Oil retailer 3
ROn4 Calibration Not provided Oil retailer 4
ROn5 Calibration Not provided Oil retailer 5
ROn6 Calibration Not provided Oil retailer 6
ROn7 Calibration Not provided Oil retailer 1
ROn8 Calibration More than one country Oil retailer 7
ROn9 Calibration UK Oil retailer 3
ROn10 Prediction Not provided Oil retailer 3
ROn11 Prediction Not provided Oil retailer 2
ROn12 Calibration Belgium Oil retailer 8
A brief market survey was undertaken to establish the combinations of vegetable oil species used in
plain biscuits. Two types of biscuit, digestive (DG) and rich tea (RT), from different brands were
studied, as they are the most typical plain biscuits on the market. The most common oils/oil
admixtures found in biscuits are as follows:
- Palm oil (PO)
- Rapeseed oil (RO)
- Palm oil and rapeseed oil admixtures (PORO)
5.2.2 In-house biscuits preparation and extraction process
Digestive and rich tea biscuits were baked in our laboratory following the recipe and baking
conditions obtained from industry sources. The ingredients list is presented in Table 13. All ingredients
were weighed and mixed together, and palm oil (PO), rapeseed oil (RO) or PORO (palm oil and
rapeseed oil admixture) was added as appropriate. Biscuits were baked for 10 minutes at 170°C.
Digestive biscuits (DG) were prepared with two different oils/oil admixtures (PO and RO), and rich tea
biscuits (RT) with two different oils/oil admixtures (PO and PORO).
Table 13. Formulation derived from industry practice (Manley, 2001a; Manley, 2001b).
Baking method: 170°C for 10 min
Ingredients Digestives (weight in g/biscuit) Rich Tea (weight in g/biscuit)
Wholemeal flour 3
Plain flour 15 27.6
Sugar 3.9 6.9
Syrup 0.6 1.8
Soda 0.5 0.15
Salt 0.3 0.2
Water 2 6
Vegetable oils 6 6
In-house biscuits were finely ground before the oils were extracted with n-hexane as follows. The
ground biscuit powder was mixed with n-hexane (1:2) in 50 mL centrifuge tubes (13-15 g of biscuit
powder with 30 mL n-hexane per tube) on a roller mixer for 1 hour (33 rpm, 16 mm amplitude) to
dissolve the oils in the solvent. The tubes containing the biscuit powder and solvent were then
centrifuged at 3000 ×g for 10 min to separate the powder from the solvent. The upper layer, containing
the oil dissolved in the solvent, was immediately transferred into a 50 mL round-bottomed flask and the
solvent was evaporated using a rotary evaporator (60°C and 160 rpm for 15 min). After evaporation of
the solvent, the oil was transferred to a small vial and kept at -20°C until further analysis.
5.2.3 FTIR spectral data acquisition
FTIR spectroscopy was used as the screening technique to collect spectroscopic data from the oils
present in the biscuits. The procedure and spectroscopic conditions were the same as those used for
building the spectroscopic database of pure oils. Three replicate spectra were obtained for each
sample. Samples were thawed and heated at 50°C for 3-5 minutes prior to spectra collection.
All spectra were pre-processed with a standardized treatment consisting of three spectral filters
applied sequentially: standard normal variate (SNV), first-order derivative and Savitzky-Golay
smoothing.
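The three-filter pre-processing chain can be sketched with NumPy as below. This is an illustrative sketch, not the software used in the project: the window length (11 points) and polynomial order (2) are assumptions, since the report does not give them, and edge points are handled by repeating the end values. `scipy.signal.savgol_filter` provides an equivalent library implementation of the smoothing step.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def first_derivative(spectra):
    """First-order derivative along the wavenumber axis (unit spacing per data point)."""
    return np.gradient(spectra, axis=1)


def savitzky_golay(spectra, window=11, polyorder=2):
    """Savitzky-Golay smoothing: fit a local polynomial in each window, keep the centre value."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)  # columns: 1, x, x^2, ...
    weights = np.linalg.pinv(design)[0]  # least-squares estimate of the fitted value at x = 0
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")  # repeat edge values
    # Convolving with the reversed weights applies them as a sliding weighted sum.
    return np.array([np.convolve(row, weights[::-1], mode="valid") for row in padded])


def preprocess(spectra, window=11, polyorder=2):
    """Apply SNV, then the first derivative, then Savitzky-Golay smoothing, in that order."""
    return savitzky_golay(first_derivative(snv(spectra)), window, polyorder)
```

The order matters: SNV removes per-spectrum baseline and scaling differences first, the derivative removes remaining constant offsets, and the final smoothing damps the noise that differentiation amplifies.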
5.2.4 Biscuit-only model building: calibration models and validation
5.2.4.1 Biscuit dataset building
Calibration models were built using spectroscopic data from oils extracted from the in-house
biscuits. Samples were divided into two independent sets, the calibration set (n=40) and the
prediction set (n=14), and were assigned to classes.
Calibration models using FTIR data were built for 3 classes (RO, PO and PORO). Only PLS-DA was
used as the chemometric technique, as it had proved to perform better than SIMCA.
The model characteristics (R² and Q²) are shown in Table 14. R² is the percentage of the variation of
the calibration set Y explained by the PLS model; it is a measure of fit, i.e. how well the model fits the
data. R²X is the fraction of the X variation modelled by a component, and R²X (cumulative) is the
cumulative R²X up to the specified component. Q² is the percentage of the variation of the calibration
set Y predicted by the PLS model according to cross-validation; it indicates how well the model
predicts new data, and a large Q² (Q² > 0.5) indicates good predictability. Q² (cumulative) is the
cumulative Q² up to the specified component. Unlike R²X (cum), Q² (cum) is not additive.
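The difference between R² (fit) and Q² (prediction) can be made concrete with the textbook definitions R² = 1 - RSS/TSS and Q² = 1 - PRESS/TSS. This sketch assumes those standard definitions; the per-component and cumulative bookkeeping performed by chemometrics software is not reproduced, and the function name `r2_and_q2` is hypothetical.

```python
import numpy as np


def r2_and_q2(y, y_fitted, y_cv):
    """Return (R2, Q2) for a PLS(-DA) model.

    y        : observed responses (for PLS-DA, 0/1 dummy-coded class membership)
    y_fitted : model predictions for the calibration samples (goodness of fit)
    y_cv     : cross-validated predictions, each made with that sample left out
    """
    y = np.asarray(y, dtype=float)
    tss = np.sum((y - y.mean(axis=0)) ** 2)                     # total sum of squares
    rss = np.sum((y - np.asarray(y_fitted, dtype=float)) ** 2)  # residual SS of the fit
    press = np.sum((y - np.asarray(y_cv, dtype=float)) ** 2)    # predictive residual SS
    return 1.0 - rss / tss, 1.0 - press / tss


# Illustrative dummy-coded Y for two classes and made-up predictions.
y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
fitted = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
cv_pred = np.array([[0.7, 0.3], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
r2, q2 = r2_and_q2(y, fitted, cv_pred)
print(round(r2, 2), round(q2, 2))
```

Because cross-validated predictions are made without the sample itself, PRESS is normally larger than RSS, so Q² is usually lower than R²; a Q² well above 0.5 suggests the model is not simply memorising the calibration set.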
Table 14. PLS-DA model characteristics on the calibration dataset using FTIR variables on all oil samples for the 3-class model (n=40).
Class R²X* (cumulative) Q²** (cumulative)
PLS-DA (one model for all classes) 0.942 0.783
* R² is the percentage of the variation of the training set Y explained by the PLS model; ** Q² indicates how well the model predicts new data.
The model characteristics of the 3-class FTIR model are good (R²X = 0.942; Q² = 0.783, above the
0.5 level that indicates good predictability), as seen in Table 14.
5.2.4.2 Validation of the biscuit model with in-house biscuits
The developed PLS-DA classification models were validated using the prediction set (n=14), which
contained different oils from those included in the calibration set. The performance of the classification
models was assessed using four parameters: sensitivity, specificity, precision and accuracy (see
Section 3.5). The classification results of the prediction dataset against the PLS-DA models are as
follows:
- Accuracy (%): 100.00
- False rate (%): 0.00
- Average precision (%): 100.00
All in-house biscuit oils (n=14) were correctly classified using FTIR and PLS-DA. The performance of
the classification model and the corresponding confusion table are given in Tables 15a and 15b.
Table 15a. Performance of the classification model (oils extracted from in-house biscuits, FTIR,
PLS-DA) on the validation samples (oils extracted from in-house biscuits, n=14).
Description ACC (%) Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
Application of PLS-DA 100 PO 6 8 0 0 1.00 1.00 0.00 1.00 1.00
PORO 6 8 0 0 1.00 1.00 0.00 1.00 1.00
RO 2 12 0 0 1.00 1.00 0.00 1.00 1.00
*ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.
Table 15b. Confusion table for results presented in Table 15a
Actual / Predicted PO PORO RO Total Sensitivity (%)
PO 6 0 0 6 100.0
PORO 0 6 0 6 100.0
RO 0 0 2 2 100.0
Precision (%) 100.0 100.0 100.0
Average sensitivity 100.0
Average precision 100.0
Overall accuracy 100.0
* PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.
Six samples were correctly classified as belonging to the PO class, another six were correctly
classified in the PORO class and the remaining two were correctly classified as belonging to the RO
class. The overall accuracy was therefore 100% and the false positive rate 0%.
5.2.4.3 Validation of the biscuit-only model with commercial biscuits
The developed PLS-DA classification models based on oils extracted from in-house biscuits were
validated using the oils extracted from the commercial biscuits (n=20). The performance of the
classification models was assessed using four parameters: sensitivity, specificity, precision and
accuracy (see Section 3.5). The classification results of the prediction dataset against the PLS-DA
models are as follows:
- Accuracy (%): 85.00
- False rate (%): 15.00
- Average precision (%): 75.00
In total, 85% (17 samples) of the oils extracted from the commercial biscuits were correctly classified
using FTIR and PLS-DA, whereas 15% (3 samples) were wrongly classified. The performance of the
classification model and the corresponding confusion table are given in Tables 16a and 16b.
Table 16a. Performance of the classification model (oils extracted from in-house biscuits, FTIR,
PLS-DA) on the validation samples (oils extracted from commercial biscuits, n=20).
Description ACC (%) Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
Application of PLS-DA 85 PO 16 1 3 0 1.00 0.25 0.75 0.84 0.91
PORO 1 16 0 3 0.25 1.00 0.00 1.00 0.40
RO 0 20 0 0 1.00 1.00 0.00 1.00 1.00
* ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.
Table 16b. Confusion table for results presented in Table 16a
Actual / Predicted PO PORO RO Total Sensitivity (%)
PO 16 0 0 16 100.00
PORO 3 1 0 4 25.00
RO 0 0 0 0 0.00
Precision (%) 84.21 100.00 0.00
Average sensitivity 41.67
Average precision 61.40
Overall accuracy 85.00
* PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.
Three oil samples extracted from commercial biscuits containing PORO were wrongly classified in
the PO class.
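The per-class figures in Tables 16a and 16b follow from the confusion table by treating each class one-vs-rest. A minimal sketch is given below; the function name `class_metrics` is hypothetical. Note that no RO samples were present or predicted, so RO sensitivity and precision are 0/0 and undefined: the sketch returns None for them, whereas Table 16a reports RO sensitivity as 1.00 and Table 16b as 0.00. The averages in Table 16b (41.67% and 61.40%) are reproduced if those undefined RO values are counted as 0.

```python
def class_metrics(confusion, labels):
    """One-vs-rest metrics per class from a confusion matrix (rows = actual, columns = predicted).

    Ratios whose denominator is zero are returned as None (undefined) rather than 0 or 1.
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # actual class k predicted as another class
        fp = sum(row[k] for row in confusion) - tp  # other classes predicted as class k
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else None  # sensitivity (TPR)
        spec = tn / (tn + fp) if tn + fp else None  # specificity; FPR = 1 - specificity
        prec = tp / (tp + fp) if tp + fp else None
        if sens is None or prec is None:
            f1 = None
        elif sens + prec == 0:
            f1 = 0.0
        else:
            f1 = 2 * prec * sens / (prec + sens)
        metrics[label] = {"TP": tp, "TN": tn, "FP": fp, "FN": fn, "sensitivity": sens,
                          "specificity": spec, "precision": prec, "F1": f1}
    accuracy = sum(confusion[k][k] for k in range(len(labels))) / total
    return metrics, accuracy


# Confusion table from Table 16b (rows = actual PO, PORO, RO; columns = predicted).
table_16b = [[16, 0, 0],
             [3, 1, 0],
             [0, 0, 0]]
metrics, accuracy = class_metrics(table_16b, ["PO", "PORO", "RO"])
print(accuracy)                                # 0.85
print(round(metrics["PO"]["specificity"], 2))  # 0.25
print(round(metrics["PO"]["F1"], 2))           # 0.91
print(metrics["RO"]["sensitivity"])            # None
```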
To decrease the false positive rate (target <5%), a threshold (t = 0.70) was applied, and the
performance of the model was as follows:
- Accuracy (%): 80.00
- False rate (%): 5.00
- Samples for the confirmation step (%): 15.00
In total, 80% (16 samples) of the oils extracted from the commercial biscuits were correctly classified
using FTIR and PLS-DA, whereas 5% (1 sample) was wrongly classified. Three samples (15%) were not
assigned to any class and were therefore referred to the confirmation step based on fatty acid criteria.
The performance of the classification model and the corresponding confusion table are given in
Tables 17a and 17b. One oil sample extracted from a commercial biscuit containing PORO
was wrongly classified in the PO class.
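How a threshold turns a forced classification into "assign or refer" can be sketched as follows. The report does not state exactly how t = 0.70 is applied, so the decision rule here is an assumption: a sample is assigned to the class with the highest PLS-DA predicted score only if that score reaches t, and is otherwise referred to the fatty acid confirmation step. The function names and the example scores are hypothetical; the outcome counts come from Table 17b.

```python
CONFIRMATION = "confirmation step"


def assign_with_threshold(scores, labels, threshold=0.70):
    """Return the top-scoring class if its score reaches the threshold, otherwise
    refer the sample to the fatty acid confirmation step (assumed decision rule)."""
    best = max(range(len(labels)), key=lambda k: scores[k])
    return labels[best] if scores[best] >= threshold else CONFIRMATION


def summarise(actual, predicted):
    """Accuracy, false rate and share referred to confirmation, all in %."""
    n = len(actual)
    referred = sum(p == CONFIRMATION for p in predicted)
    correct = sum(a == p for a, p in zip(actual, predicted))
    wrong = n - correct - referred
    return {
        "accuracy": 100 * correct / n,
        "false rate": 100 * wrong / n,
        "confirmation": 100 * referred / n,
    }


labels = ["PO", "PORO", "RO"]
print(assign_with_threshold([0.82, 0.15, 0.03], labels))  # PO
print(assign_with_threshold([0.55, 0.40, 0.05], labels))  # confirmation step

# Outcome counts taken from Table 17b: 16 PO and 4 PORO commercial biscuits.
actual = ["PO"] * 16 + ["PORO"] * 4
predicted = ["PO"] * 15 + [CONFIRMATION] + ["PO", "PORO", CONFIRMATION, CONFIRMATION]
print(summarise(actual, predicted))  # {'accuracy': 80.0, 'false rate': 5.0, 'confirmation': 15.0}
```

Raising the threshold trades wrong assignments for referrals: the false rate falls, but more samples need the slower fatty acid analysis.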
Table 17a. Performance of the classification model (oils extracted from in-house biscuits, FTIR,
PLS-DA, thresholds) on the validation samples (oils extracted from commercial biscuits, n=20).
Description ACC (%) Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
Application of PLS-DA 80 PO 15 3 1 1 0.94 0.75 0.25 0.94 0.94
PORO 1 16 0 3 0.25 1.00 0.00 1.00 0.40
RO 0 20 0 0 1.00 1.00 0.00 1.00 1.00
* ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.
Table 17b. Confusion table for results presented in Table 17a
Actual / Predicted PO PORO RO Confirmation Total Sensitivity (%)
PO 15 0 0 1 16 93.75
PORO 1 1 0 2 4 25.00
RO 0 0 0 0 0 0.00
Precision (%) 93.75 100.00 0.00
Average sensitivity 39.58
Average precision 64.58
Overall accuracy 80.00
* PO: palm oil; PORO: palm oil and rapeseed oil admixtures; RO: rapeseed oil.
5.2.5 Confirmation step- Fatty acids
Three samples of oil extracted from commercial biscuits were submitted to the confirmation step.
These samples were:
- CMDGV11: containing palm oil
- CMRTV6: containing palm oil and rapeseed oil
- CMRTV9: containing palm oil and rapeseed oil
According to the 6-class criteria (Table 6), sample CMDGV11 was classified as belonging to group P
and sample CMRTV6 as belonging to group RSPO (Table 18a). Sample CMRTV9 did not meet all the
conditions of any class and so remained unidentified, although its high palmitic acid content indicates
that it may contain palm species.
Table 18a. Application of 6-class fatty acid criteria to the unidentified biscuit samples
Specific FA (mg FA/g oil) CMDGV11 CMRTV6 CMRTV9
C8:0 Caprylic acid 0.09 0.09 1.95
C12:0 Lauric acid 1.25 1.12 1.08
C14:0 Myristic acid 6.25 5.15 5.16
C16:0 Palmitic acid 310.61 259.29 243.79
C18:1 Oleic acid 223.03 252.01 183.16
C18:2 c Linoleic acid 72.37 82.52 34.64
PUFA/SAT index 0.22 0.35 0.14
GC CRITERIA RESULT (assigned class): CMDGV11 P; CMRTV6 RSPO; CMRTV9 not conclusive (contains P)
ACTUAL IDENTITY: CMDGV11 palm oil biscuit (Oil retailer 7); CMRTV6 rapeseed oil and palm oil (Oil retailer 7); CMRTV9 rapeseed oil and palm oil (Oil retailer 1)
* PUFA/SAT: polyunsaturated fatty acids/saturated fatty acids.
Similar results (Table 18b) were obtained when applying the 12-class criteria (Table 8). Sample
CMDGV11 was identified as a pure palm oil sample and sample CMRTV6 as a rapeseed and palm oil
admixture. Sample CMRTV9 again did not meet the criteria of any class and so remained unidentified;
however, its high palmitic acid content indicates that it may contain palm species.
Table 18b. Application of 12-class fatty acid criteria to the unidentified biscuit samples
Specific FA (mg FA/g oil) CMDGV11 CMRTV6 CMRTV9
C8:0 Caprylic acid 0.09 0.09 1.95
C12:0 Lauric acid 1.25 1.12 1.08
C14:0 Myristic acid 6.25 5.15 5.16
C16:0 Palmitic acid 310.61 259.29 243.79
C18:1 Oleic acid 223.03 252.01 183.16
C18:2c Linoleic acid 72.37 82.52 34.64
PUFA/SAT index 0.22 0.35 0.14
GC CRITERIA RESULT (assigned class): CMDGV11 PO; CMRTV6 ROPO; CMRTV9 not conclusive (contains P)
ACTUAL IDENTITY: CMDGV11 palm oil biscuit (Oil retailer 7); CMRTV6 rapeseed oil and palm oil (Oil retailer 7); CMRTV9 rapeseed oil and palm oil (Oil retailer 1)
* PUFA/SAT: polyunsaturated fatty acids/saturated fatty acids.
6. APPLICATION OF THE METHOD IN CONFECTIONERY PRODUCTS
6.1 Background
Palm oil is used as an ingredient in the production of confectionery products, and palm oil and palm
kernel oil fractions provide ideal functional properties for confectionery fats. The confectionery market
can be divided into two broad sectors: chocolate confectionery (‘countlines’ and moulded bars, blocks,
boxed chocolates and bite-size products) and sugar confectionery (including fruit sweets, mints and
chewing gum). Confectionery fats are used in coatings, fillings, toffees, caramels and ice cream.
Vegetable fats have been used in chocolate and chocolate-like coatings for many years. Current EU
legislation restricts their use to 5% of specified fats if the product is sold as chocolate, and also
requires very clear labelling. Confectionery with more than 5% Cocoa Butter Equivalent (CBE) cannot
be labelled as chocolate; if higher levels or other fats are used, the product must be sold under another
name, such as a chocolate-flavoured coating. Legislation varies elsewhere in the world, with a few
countries even allowing all the cocoa butter to be replaced by other fats. In the EU, the Chocolate
Directive defines milk chocolate as having a minimum fat content of 25%, not including vegetable oils.
For a typical low-cost milk chocolate recipe with 28.3% fat, the maximum vegetable fat that can be
added is therefore 3.3%. Chocolate for use in ice cream or similar products may contain up to 5%
vegetable fat other than cocoa butter (EU, 2012). The Chocolate Directive does not cover chocolate
coatings and fillings. Chocolate fillings may include, for example, hazelnut, praline, toffee, wafers or
other fat products. Fillings in chocolates may use coconut or palm kernel oils or CBE containing palm
mid-fractions (EU, 2012).
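The 3.3% figure follows from reading the two limits above together. A worked form, assuming (as the text implies) that the 25% minimum must be met by fats other than added vegetable fats and that the 5% cap applies to the finished product, with f_fat the total fat content of the recipe:

```latex
f_{\mathrm{veg,max}} = \min\left(5\%,\; f_{\mathrm{fat}} - 25\%\right)
                     = \min\left(5\%,\; 28.3\% - 25\%\right) = 3.3\%
```

For this recipe the 25% fat minimum, not the 5% cap, is the binding constraint; a recipe with more than 30% total fat would instead be limited by the 5% cap.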
There are three types of palm-based confectionery fats used in chocolate:
1. Cocoa Butter Equivalents (CBE) (non-lauric fats, tempering): vegetable fats with chemical and
physical characteristics similar to those of cocoa butter, which can therefore be used interchangeably
with cocoa butter in any recipe. They are based on palm oil mid-fractions with a triacylglycerol
composition (POP, POSt, StOSt) similar to that of cocoa butter (CB), and can be added in any
proportion without causing significant softening or hardening. A standard CBE contains around 50%
exotic fats and 50% palm oil. ‘Soft’ CBEs, typically used in the UK and Ireland, contain up to 30%
exotics and 70% palm mid-fractions (compared with ‘hard’ CBEs, which contain higher proportions of
exotics to palm oil) (EU, 2012).
2. Cocoa Butter Replacers (CBR) (non-lauric fats, non-tempering): vegetable fats of non-lauric origin
with physical, but not chemical, characteristics similar to those of cocoa butter, which can be used to
replace most of the cocoa butter in coating applications. They are partially compatible with CB,
tolerating up to 20-30% CB in the fat phase, and can also replace cocoa fats entirely (e.g. partly
hydrogenated, double-fractionated palm olein).
3. Cocoa Butter Substitutes (CBS) (lauric fats): vegetable fats of lauric origin with physical, but not
chemical, characteristics similar to those of cocoa butter, which can be used to replace almost all of the
cocoa butter in coating applications.
Toffees are also likely to contain palm oil, at around 7.5% by volume. Toffee fats can include
interesterified hydrogenated palm kernel oil and palm oil, hydrogenated palm kernel stearin, palm
kernel olein and palm olein, or hydrogenated palm kernel oil.
6.2 Samples
Oils used in the confectionery industry were sourced from retail companies. Exotic fats were difficult
to find, so a small number of samples were purchased from online retailers and their authenticity
could not be guaranteed. These oils were cocoa butter (n=8), hydrogenated palm kernel oil (n=1), shea
butter (n=5), illipe butter (n=2), mango kernel (n=4), kokum gurgi (n=2) and sal (n=1) (see Section 3.1,
Table 1).
Additionally, three types of confectionery products containing chocolate were purchased from local
supermarkets. These were:
o Confectionery product 1: bar of two crispy wafer fingers covered with milk chocolate (66%). Fats
included in the ingredients list are: Cocoa butter, vegetable fat (Palm
Kernel/Palm/Shea/Sal/Illipe/Kokum Gurgi/Mango Kernel) and butterfat (from milk).
o Confectionery product 2: milk chocolate (35%) covering caramel (32%) and biscuit (26%). Fats
included in the ingredients list are: palm fat, cocoa butter and milk fat.
o Confectionery product 3 (two brands):
Brand 1- Sponge cakes with dark crackly chocolate and a smashing orangey centre.
Fats listed in the ingredients list are: Vegetable fats (palm, sal and/or shea), butter oil
(milk) and cocoa butter for the chocolate coating (19%) and vegetable oils (sunflower,
palm) for the rest of the cake.
Brand 2- Sponge cakes with dark crackly chocolate and a smashing orangey centre.
Fats listed in the ingredients list are: Palm oil for the biscuit (38%) and cocoa butter for
the chocolate coating (17%).
6.3 In-house admixtures of pure oils
Oil admixtures including oils and butters widely used in the confectionery industry (palm oil, palm
kernel oil, hydrogenated palm kernel oil, shea butter, cocoa butter, sal, kokum butter, illipe butter and
mango kernel butter) were prepared in our laboratory for model validation purposes. These admixtures
were intended to mimic some of the most popular oil admixtures used in the confectionery industry
and were as follows:
Confectionery admixture 1 (sample code EBM1):
- Cocoa butter (70 %) COA2
- Palm oil (30 %) POn9
Confectionery admixture 2 (sample code EBM2):
- Cocoa butter (65 %) COA2
- Hydrogenated palm kernel oil (20 %) PKOn5
- Palm oil (15 %) POn9
Confectionery admixture 3 (sample code EBM3):
- Cocoa butter (70 %) COA2
- Palm oil (20 %) POn9
- Shea butter (5 %) ShB1
- Sal (5 %) SB1
Confectionery admixture 4 (sample code EBM4):
- Cocoa butter (65 %) COA2
- Palm oil (20 %) POn9
- Palm kernel oil (4 %) PKOn4
- Sal (2 %) SB1
- Illipe (2 %) IlB1
- Kokum (2 %) KmB1
- Mango kernel (2 %) MnB1
- Shea butter (3 %) ShB1
Confectionery admixture 5 (sample code EBM5):
- Cocoa butter (65 %) COA2
- Palm oil (25 %) POn9
- Shea butter (10 %) ShB1
Confectionery admixture 6 (sample code EBM6):
- Palm kernel oil (50 %) PKOn4
- Coconut oil (50 %) CCO9
Confectionery admixture 7 (sample code EBM7):
- Cocoa butter (70 %) COA2
- Palm oil (15 %) POn9
- Sal (2 %) SB1
- Illipe (4 %) IlB1
- Kokum (2 %) KmB1
- Mango kernel (4 %) MnB1
- Shea butter (3 %) ShB1
Confectionery admixture 8 (sample code EBM8):
- Cocoa butter (70 %) COA2
- Palm oil (10 %) POn9
- Sal (2 %) SB1
- Illipe (5 %) IlB1
- Mango kernel (5 %) MnB1
- Shea butter (8 %) ShB1
Confectionery admixture 9 (sample code EBM9):
- Cocoa butter (80 %) COA7
- Palm oil (16 %) POn18
- Sal (4 %) SB1
Confectionery admixture 10 (sample code EBM10):
- Cocoa butter (80 %) COA7
- Palm oil (16 %) POn9
- Shea butter (4 %) ShB1
Confectionery admixture 11 (sample code EBM11):
- Cocoa butter (80 %) COA7
- Palm oil (12 %) POn18
- Sal (4 %) SB1
- Shea butter (4 %) ShB1
Confectionery admixture 12 (sample code EBM12):
- Cocoa butter (70 %) COA7
- Palm oil (6 %) POn18
- Sal (2 %) SB1
- Sunflower oil (20 %) SOn5
- Shea butter (2 %) ShB1
Confectionery admixture 13 (sample code EBM13):
- Cocoa butter (60 %) COA7
- Palm oil (10 %) POn18
- Palm kernel oil (10%) PKOn3
- Sal (4 %) SB1
- Illipe (4 %) IlB1
- Mango kernel (4 %) MnB3
- Kokum (4%) KmB1
- Shea butter (4 %) ShB1
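Preparing any of the admixtures listed above requires converting the recipe percentages into weighed masses. A small sketch, assuming a hypothetical helper `component_masses` and an illustrative 10 g batch size (the report does not state batch masses); it also checks that the percentages add up to 100%, which holds for all thirteen recipes above.

```python
def component_masses(recipe_percent, batch_mass_g):
    """Grams of each component needed for a batch of a given mass (hypothetical helper).

    Raises ValueError if the recipe percentages do not add up to 100%.
    """
    total = sum(recipe_percent.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"recipe percentages sum to {total}%, not 100%")
    return {code: batch_mass_g * pct / 100 for code, pct in recipe_percent.items()}


# Confectionery admixture 4 (EBM4), using the sample codes listed above.
ebm4 = {"COA2": 65, "POn9": 20, "PKOn4": 4, "SB1": 2,
        "IlB1": 2, "KmB1": 2, "MnB1": 2, "ShB1": 3}

# The 10 g batch size is an illustrative assumption, not a value from the report.
for code, grams in component_masses(ebm4, 10.0).items():
    print(f"{code}: {grams:.2f} g")
```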
6.4 Fat extraction of commercial confectionery products
Confectionery chocolate products (1, 2 and 3) were analysed both whole and by component.
Confectionery product 1 was separated with a sharp knife into two parts: the chocolate coating and the
wafer fingers with filling. Confectionery product 2 was separated into three parts: the chocolate
coating, the caramel and the biscuit. Confectionery product 3 (brands 1 and 2) was divided into three
sections: the chocolate coating, the orangey centre and the biscuit.
All samples were manually milled into powder/fine particles using a knife or a wooden stick. Ten
grams of sample were mixed with 30 mL of hexane in a 50 mL centrifuge tube. Tubes were mixed on a
tube mixer at 2500 rpm for 2 minutes and then left for 1 hour on a rotary mixer (33 rpm) to dissolve
the fat in the solvent. Tubes were then centrifuged at 3000 rpm for 10 minutes until the phases had
fully separated. The upper layer containing the fat dissolved in hexane was transferred to a
round-bottomed flask. Another 30 mL of hexane was added to the remaining bottom layer for a second
extraction, following the same procedure as for the first. The second upper layer, containing the
remaining fat dissolved in hexane, was transferred to the same round-bottomed flask and combined
with the first extract.
The solvent was evaporated using a rotary evaporator at 50°C for 15 minutes (160 rpm). The fat was
then weighed and transferred into small plastic tubes. The extraction procedure was repeated as many
times as needed to obtain the required amount of oil sample (approx. 3 g). Nitrogen was injected into
the headspace to prevent oxidation, and oil samples were stored at -20°C until analysis.
6.5 Spectral Data Acquisition with FTIR spectroscopy
FTIR and Raman spectroscopy were used as screening techniques to collect spectroscopic data from
the oils present in the confectionery products. The procedure and spectroscopic conditions were the
same as described in the SOP (FAO117). Three FTIR and two Raman replicate spectra were obtained
for each sample.
All spectra were pre-processed with a standardized treatment consisting of three spectral filters
applied sequentially: standard normal variate (SNV), first-order derivative and Savitzky-Golay
smoothing.
6.6 Confectionery-only model building: calibration models and validation
6.6.1 Dataset building
Calibration models were built using spectroscopic data from pure oils as well as oil admixtures likely
to be present in confectionery products. The pure oils used for the calibration models were palm oil
(n=20), palm kernel oil (n=7), palm olein (n=3), hydrogenated palm kernel oil (n=1) and cocoa butter
(n=7). The large number of oil types and oil combinations that may be present in a confectionery
product makes the identification of oil species very challenging. A large number of simulated
samples/admixtures were therefore generated to cover the potential compositional ranges found in
commercial confectionery products. The simulated oil admixtures were:
o CB+PO = CB (99%-69% (5%)) + PO (1%-31% (5%)) ---Oil admixtures of cocoa butter (ranging
from 69% to 99% in intervals of 5%) and palm oil (ranging from 1% to 31% in intervals of 5%)
(n=245)
o CB+PO+SB = CB (95%-99% (4%)) + PO (3.5%-0.5% (3%)) + SB (1.5%-0.5% (1%)) ---Oil
admixtures of cocoa butter (ranging from 95% to 99% in intervals of 4%), palm oil (ranging from
0.5% to 3.5% in intervals of 3%) and sal butter (ranging from 0.5% to 1.5% in intervals of 1%)
(n=350)
o CB+PO+ShB = CB (95%-99% (4 %)) + PO (3.5%-0.5% (3%)) + ShB (1.5%-0.5% (1%)) ---Oil
admixtures of cocoa butter (ranging from 95% to 99% in intervals of 4%), palm oil (ranging from
0.5% to 3.5% in intervals of 3%) and shea butter (ranging from 0.5% to 1.5% in intervals of 1%)
(n=350)
o CB+PO+SB+ShB = CB (95%-99% (4%)) + PO (3.5%-0.5% (3%)) + SB (0.75%-0.25% (0.5%)) +
ShB (0.75%-0.25% (0.5%)) ---Oil admixtures of cocoa butter (ranging from 95% to 99% in
intervals of 4%), palm oil (ranging from 0.5% to 3.5% in intervals of 3%), sal butter (ranging from
0.25% to 0.75% in intervals of 0.5%) and shea butter (ranging from 0.25% to 0.75% in intervals
of 0.5%) (n=350)
o CB+PO+SB+ShB+ILB+KMB+MNB = CB (95%-99% (4%)) + PO (3.5%-0.5% (3%)) + SB (0.3%-0.1% (0.2%)) +
ShB (0.3%-0.1% (0.2%)) + ILB (0.3%-0.1% (0.2%)) + KMB (0.3%-0.1% (0.2%)) +
MNB (0.3%-0.1% (0.2%)) ---Oil admixtures of cocoa butter (ranging from 95% to 99% in
intervals of 4%), palm oil (ranging from 0.5% to 3.5% in intervals of 3%), sal butter (ranging from
0.1% to 0.3% in intervals of 0.2%), shea butter (ranging from 0.1% to 0.3% in intervals of 0.2%),
illipe butter (ranging from 0.1% to 0.3% in intervals of 0.2%), kokum butter (ranging from 0.1%
to 0.3% in intervals of 0.2%) and mango kernel butter (ranging from 0.1% to 0.3% in intervals of
0.2%) (n=5600)
o SO+PO: Oil admixtures of sunflower oil and palm oil (36 in-house samples and 900 simulated
samples for FTIR).
o PO+PKO: Oil admixtures of palm oil and palm kernel oil (37 in-house admixtures and 420
simulated samples for FTIR)
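This section does not state how the simulated spectra were produced. Purely as an illustration, the sketch below assumes linear mixing, i.e. that the spectrum of an admixture is the percentage-weighted sum of the spectra of its component oils, and enumerates the CB+PO composition levels listed above (seven complementary pairs from 69/31 to 99/1). The function names and the 4-point spectra are hypothetical; real FTIR spectra have many more points, and real admixtures may deviate from linear mixing.

```python
def composition_levels(major_min, major_max, step):
    """Complementary (major %, minor %) pairs, e.g. cocoa butter and palm oil."""
    return [(p, 100 - p) for p in range(major_min, major_max + 1, step)]


def simulate_admixture(pure_spectra, percentages):
    """ASSUMED linear mixing: percentage-weighted sum of pure-oil spectra."""
    if len(pure_spectra) != len(percentages):
        raise ValueError("one percentage is needed per pure-oil spectrum")
    if abs(sum(percentages) - 100.0) > 1e-9:
        raise ValueError("percentages must sum to 100")
    n_points = len(pure_spectra[0])
    return [
        sum(pct / 100.0 * spec[i] for pct, spec in zip(percentages, pure_spectra))
        for i in range(n_points)
    ]


# CB+PO design: cocoa butter 69-99% in 5% steps, palm oil making up the balance
cb_po_levels = composition_levels(69, 99, 5)

# Hypothetical 4-point spectra, for illustration only
cocoa_butter = [0.10, 0.40, 0.80, 0.30]
palm_oil = [0.20, 0.35, 0.60, 0.50]
mixed = simulate_admixture([cocoa_butter, palm_oil], cb_po_levels[0])
```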
Introducing simulated samples for the above admixture types produced an unbalanced training
dataset, since the pure oil classes contain very few samples. Simulated samples were therefore also
used to increase the number of pure oil samples and to create more balanced and robust models that
avoid bias towards the classes with the most representatives. The final numbers of samples were as
follows:
o Palm oil: 20 pure oil samples and 280 simulated samples (n=300).
o Palm kernel oil: 7 pure oil samples and 336 simulated samples (n=343)
o Palm olein: 3 pure oil samples and 300 simulated samples (n=303)
o Hydrogenated palm kernel oil: 1 pure oil sample and 280 simulated samples (n=281)
o Cocoa butter: 7 pure oil samples and 8400 simulated samples (n=8407)
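The class sizes quoted in the next paragraph (9515 P-class samples, 8407 non-P-class samples, 17922 in total) can be checked against the counts listed in this section with a few lines of arithmetic; the variable names are illustrative.

```python
# Sample counts as listed in section 6.6.1 (pure + simulated)
pure_p_oils = {
    "palm oil": 20 + 280,
    "palm kernel oil": 7 + 336,
    "palm olein": 3 + 300,
    "hydrogenated palm kernel oil": 1 + 280,
}
p_admixtures = {
    "CB+PO": 245,
    "CB+PO+SB": 350,
    "CB+PO+ShB": 350,
    "CB+PO+SB+ShB": 350,
    "CB+PO+SB+ShB+ILB+KMB+MNB": 5600,
    "SO+PO": 36 + 900,   # in-house + simulated
    "PO+PKO": 37 + 420,  # in-house + simulated
}
non_p = {"cocoa butter": 7 + 8400}

p_total = sum(pure_p_oils.values()) + sum(p_admixtures.values())
non_p_total = sum(non_p.values())
grand_total = p_total + non_p_total

print(p_total, non_p_total, grand_total)  # 9515 8407 17922
print(f"non-P share of training set: {non_p_total / grand_total:.1%}")  # 46.9%
```

After balancing, the non-P class makes up just under half of the training set.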
All the pure oils and oil admixtures mentioned above are commonly found in confectionery products,
especially products composed of a chocolate coating and a biscuit/cake. The total number of
samples used to build the models was 17922. Because of the different nature of these oils,
confectionery oils could not be tested with our initial calibration models built with pure oils
(models B and C, see section 3.6.2). For this particular case, the question to answer is “is palm oil
present in a confectionery product?” To answer this question, all samples were divided into two
classes: a palm oil class (P class) and a non-palm oil class (non-P class). The P class comprised
9515 samples and the non-P class comprised 8407 samples. The P class includes palm oil, palm
olein, palm kernel oil, hydrogenated palm kernel oil and the oil admixtures SO+PO, PO+PKO, CB+PO,
CB+PO+SB, CB+PO+ShB, CB+PO+SB+ShB and CB+PO+SB+ShB+ILB+KMB+MNB, whereas the non-P
class includes only cocoa butter. Models were built using PLS-DA with two latent variables.
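The report does not describe the PLS-DA implementation (the conclusions mention the Matlab data analysis package). As an illustration only, the sketch below fits a two-class PLS-DA model with two latent variables using a dependency-free PLS1 (NIPALS) algorithm, coding the P class as 1 and the non-P class as 0. The 0.5 decision threshold, the function names and the toy 3-variable data are assumptions, not the project's settings.

```python
from statistics import mean


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_plsda(X, labels, n_lv=2):
    """Two-class PLS-DA via PLS1 (NIPALS). labels: 1 = P class, 0 = non-P class."""
    n_obs, n_vars = len(X), len(X[0])
    x_mean = [mean(row[j] for row in X) for j in range(n_vars)]
    y_mean = mean(labels)
    Xr = [[v - m for v, m in zip(row, x_mean)] for row in X]  # mean-centred X
    yr = [y - y_mean for y in labels]                          # mean-centred y
    model = {"x_mean": x_mean, "y_mean": y_mean, "lv": []}
    for _ in range(n_lv):
        # Weight vector w = X'y, scaled to unit length
        w = [sum(Xr[i][j] * yr[i] for i in range(n_obs)) for j in range(n_vars)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:  # y already fully explained: stop early
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in Xr]  # scores
        tt = _dot(t, t)
        p = [sum(Xr[i][j] * t[i] for i in range(n_obs)) / tt for j in range(n_vars)]  # X loadings
        q = _dot(yr, t) / tt  # y loading
        # Deflate X and y before extracting the next latent variable
        Xr = [[Xr[i][j] - t[i] * p[j] for j in range(n_vars)] for i in range(n_obs)]
        yr = [yr[i] - q * t[i] for i in range(n_obs)]
        model["lv"].append((w, p, q))
    return model


def predict_plsda(model, x, threshold=0.5):
    """Return (class label, predicted y) for one pre-processed spectrum x."""
    xr = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in model["lv"]:
        t = _dot(xr, w)
        y_hat += q * t
        xr = [v - t * pj for v, pj in zip(xr, p)]
    return ("P" if y_hat >= threshold else "non-P"), y_hat


# Hypothetical toy data: three P-class and three non-P-class "spectra"
X = [[1.0, 2.0, 0.5], [1.2, 2.1, 0.4], [0.9, 1.8, 0.6],
     [3.0, 0.5, 1.5], [3.2, 0.4, 1.4], [2.9, 0.6, 1.6]]
labels = [1, 1, 1, 0, 0, 0]
model = fit_plsda(X, labels, n_lv=2)
label, score = predict_plsda(model, [1.1, 1.9, 0.5])
```

Prediction reuses the stored weights and loadings to deflate the new spectrum in the same way as the training data, so no matrix inversion is needed.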
The PCA space of the first two principal components for FTIR is presented in Figure 12 (green for
the non-P class and blue for the P class). In the PCA space of the FTIR spectral data, the ‘P class’
samples are widely dispersed, as they include a large variety of oils and oil admixtures, whereas the
‘non-P class’ samples are grouped together.
Figure 12. Principal Component Analysis of FTIR spectral data of confectionery products (green
colour: non-P class and blue colour: P class)
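To make the "PCA space" of Figure 12 concrete, the sketch below computes scores on the first two principal components of mean-centred data, using power iteration on the covariance matrix followed by deflation. It is a dependency-free illustration on hypothetical two-variable data, not the software used in the project; in practice a library SVD routine would normally be used.

```python
from math import sqrt


def pca_scores(X, n_components=2, n_iter=200):
    """Scores on the leading principal components (power iteration with deflation)."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]  # mean-centre
    # Sample covariance matrix (m x m)
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(m)] for a in range(m)]
    components = []
    for _ in range(n_components):
        v = [1.0 / sqrt(m)] * m  # starting vector
        for _ in range(n_iter):
            w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[a] * sum(C[a][b] * v[b] for b in range(m)) for a in range(m))
        components.append(v)
        # Remove this component's variance before finding the next one
        C = [[C[a][b] - eigval * v[a] * v[b] for b in range(m)] for a in range(m)]
    scores = [[sum(r[j] * c[j] for j in range(m)) for c in components] for r in Xc]
    return scores, components


# Hypothetical data: two groups separated along the first variable
X = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0], [5.0, 0.05], [5.2, -0.05], [4.9, 0.0]]
scores, components = pca_scores(X)
```

Plotting the first column of `scores` against the second gives the kind of score plot shown in Figure 12; here the two groups fall on opposite sides of zero along PC1.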
6.6.2 Validation- in-house admixtures
The total number of in-house admixtures was 13 (see section 6.3), and one sample of pure cocoa
butter was also included in the validation set.
All oils and in-house admixtures were correctly classified when using FTIR and PLS-DA. The
performance of the classification model and the confusion table are given in Tables 19a and 19b.
Table 19a. Performance of the FTIR classification model on the validation samples (cocoa butter
and in-house confectionery admixtures, n=14).
Description: Application of PLS-DA; ACC (%): 100
Class   TP  TN  FP  FN  Sensitivity (TPR)  Specificity  FPR   Precision  F1 score
Non-P   1   13  0   0   1.00               1.00         0.00  1.00       1.00
P       13  1   0   0   1.00               1.00         0.00  1.00       1.00
* ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate.
Table 19b. Confusion table for the results presented in Table 19a
Actual / Predicted   Non-P class   P class   Total   Sensitivity (%)
Non-P                1             0         1       100.0
P                    0             13        13      100.0
Precision (%)        100.0         100.0
Average sensitivity (%): 100.0
Average precision (%): 100.0
Overall accuracy (%): 100.0
The single cocoa butter sample was correctly classified as belonging to the Non-P class and all 13
in-house admixtures were correctly classified as belonging to the P class, so the overall accuracy on
the 14 validation samples was 100%.
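The statistics in Tables 19a and 20a follow from the TP, TN, FP and FN counts. The sketch below uses the standard definitions of these measures (the report lists the measures but not their formulas, so the definitions are assumed here); the function name is hypothetical. For the P class in Table 19a (TP=13, TN=1, FP=0, FN=0) every ratio is 1.00 and the FPR is 0.00.

```python
def classification_metrics(tp, tn, fp, fn):
    """Per-class statistics from confusion counts (standard definitions, assumed)."""
    def ratio(num, den):
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)  # true positive rate (TPR)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),  # false positive rate = 1 - specificity
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


table_19a_p = classification_metrics(tp=13, tn=1, fp=0, fn=0)
table_20a_p = classification_metrics(tp=12, tn=1, fp=0, fn=0)
```

With perfect classification these ratios carry little information on their own; they become more useful as validation sets grow and errors appear.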
6.6.3 Validation- commercial admixtures
A total of 13 samples of oils extracted from commercial confectionery products were used for
validation purposes. The 13 samples were:
- Confectionery product 1 whole (sample code CP1W)
- Confectionery product 1 chocolate coating (sample code CP1CH)
- Confectionery product 1 biscuit with filling (sample code CP1B)
- Confectionery product 2 whole (sample code CP2W)
- Confectionery product 2 chocolate coating (sample code CP2CH)
- Confectionery product 2 caramel (sample code CP2C)
- Confectionery product 2 biscuit (sample code CP2B)
- Confectionery product 3 brand 1 whole (sample code CP3B1W)
- Confectionery product 3 brand 1 chocolate coating (sample code CP3B1CH)
- Confectionery product 3 brand 1 biscuit/cake (sample code CP3B1B)
- Confectionery product 3 brand 2 whole (sample code CP3B2W)
- Confectionery product 3 brand 2 chocolate coating (sample code CP3B2CH)
- Confectionery product 3 brand 2 biscuit/cake (sample code CP3B2B)
All oils extracted from confectionery products were correctly classified when using FTIR and
PLS-DA. The performance of the classification model and the confusion table are given in
Tables 20a and 20b.
Table 20a. Performance of the FTIR classification model on the validation samples (oil extracted
from commercial confectionery products, n=13)
Description: Application of PLS-DA; ACC (%): 100
Class   TP  TN  FP  FN  Sensitivity (TPR)  Specificity  FPR   Precision  F1 score
Non-P   1   12  0   0   1.00               1.00         0.00  1.00       1.00
P       12  1   0   0   1.00               1.00         0.00  1.00       1.00
* ACC: accuracy; TP: true positives; TN: true negatives; FP: false positives; FN: false negatives; TPR: true positive rate; FPR: false positive rate.
Table 20b. Confusion table for the results presented in Table 20a
Actual / Predicted   Non-P class   P class   Total   Sensitivity (%)
Non-P                1             0         1       100.0
P                    0             12        12      100.0
Precision (%)        100.0         100.0
Average sensitivity (%): 100.0
Average precision (%): 100.0
Overall accuracy (%): 100.0
One sample (CP3B2CH) out of the 13 commercial confectionery samples was correctly classified as
belonging to the Non-P class and the remaining 12 samples were correctly classified as belonging to
the P class, so the overall accuracy was 100%. According to the ingredient list on the package of
confectionery product 3 brand 2, the dark chocolate coating contains only cocoa butter as oil/fat.
6.7 Application of chromatographic confirmation method based on fatty acid criteria
Due to the complexity of the oil admixtures that can be included in a single confectionery product,
comprehensive identification of all oils present in one product is not possible using the current
two-step methodology. The best approach is therefore to determine whether or not palm oil is present
in a given confectionery product. Confectionery products with a chocolate coating are most likely to
contain palm oil, but sometimes they might contain only cocoa butter.
Fatty acid criteria were established using pure cocoa butter samples in order to confirm that a
chocolate confectionery product does not contain any palm oil; they are given in Table 21.
Table 21. Fatty acid criteria for confectionery chocolate products
Specific FA                  Pure cocoa butter
C16:0 Palmitic acid          <250 mg FA/g oil
C18:0 Stearic acid           >200 mg FA/g oil
PUFA/SAT (P/S) ratio         <0.048
* FA: fatty acid; PUFA/SAT (P/S): polyunsaturated fatty acids/saturated fatty acids.
Oil extracted from commercial confectionery products was used to test the performance of the
cocoa butter fatty acid criteria. Results are presented in Table 22.
Table 22. Identity of oil extracted from commercial confectionery products using fatty acid criteria
SAMPLE     C16:0 palmitic acid   C18:0 stearic acid   PUFA/SAT ratio   PREDICTED IDENTITY
           (mg FA/g oil)         (mg FA/g oil)
CP2W       334.81                137.51               0.0793           PO admixture
CP2CH      241.85                216.26               0.0565           PO admixture
CP2C       437.62                40.39                0.1033           PO admixture
CP2B       414.47                42.44                0.1139           PO admixture
CP1W       257.84                176.44               0.0635           PO admixture
CP1CH      233.48                199.33               0.0467           PO admixture
CP1B       341.33                68.86                0.1330           PO admixture
CP3B1W     214.46                192.78               0.1108           PO admixture
CP3B1CH    219.09                202.19               0.0726           PO admixture
CP3B1B     136.82                34.67                1.2618           PO admixture
CP3B2W     265.26                230.94               0.0831           PO admixture
CP3B2CH    231.17                284.95               0.0456           Cocoa butter
CP3B2B     363.44                38.58                0.2141           PO admixture
* PUFA/SAT: polyunsaturated fatty acids/saturated fatty acids.
Of all the commercial confectionery samples, only the chocolate coating of confectionery product 3
brand 2 (CP3B2CH) fulfilled all the criteria for pure cocoa butter. This result agrees with the oils
stated in the ingredient list on the package.
One sample, the chocolate coating of confectionery product 2 (CP2CH), comes close to fulfilling all
the criteria for pure cocoa butter: it meets the palmitic and stearic acid limits, but its PUFA/SAT ratio
(0.0565) exceeds the 0.048 limit. This indicates that the amount of cocoa butter in the chocolate
coating of confectionery product 2 is quite high and that, although palm oil is present, its amount is
very small.
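The confirmation rule in Table 21 amounts to three threshold tests that must all pass. The sketch below applies them to the values in Table 22 and reproduces its predicted-identity column; treating any failed criterion as "PO admixture" mirrors how Table 22 is reported, and the function names are hypothetical.

```python
# Table 21: limits for pure cocoa butter
C16_MAX = 250.0  # C16:0 palmitic acid, mg FA/g oil (must be below)
C18_MIN = 200.0  # C18:0 stearic acid, mg FA/g oil (must be above)
PS_MAX = 0.048   # PUFA/SAT ratio (must be below)

# Table 22: (C16:0, C18:0, PUFA/SAT) for each commercial sample
TABLE_22 = {
    "CP2W": (334.81, 137.51, 0.0793),
    "CP2CH": (241.85, 216.26, 0.0565),
    "CP2C": (437.62, 40.39, 0.1033),
    "CP2B": (414.47, 42.44, 0.1139),
    "CP1W": (257.84, 176.44, 0.0635),
    "CP1CH": (233.48, 199.33, 0.0467),
    "CP1B": (341.33, 68.86, 0.1330),
    "CP3B1W": (214.46, 192.78, 0.1108),
    "CP3B1CH": (219.09, 202.19, 0.0726),
    "CP3B1B": (136.82, 34.67, 1.2618),
    "CP3B2W": (265.26, 230.94, 0.0831),
    "CP3B2CH": (231.17, 284.95, 0.0456),
    "CP3B2B": (363.44, 38.58, 0.2141),
}


def failed_criteria(c16, c18, ps_ratio):
    """List the cocoa butter criteria that a sample does not meet."""
    failed = []
    if not c16 < C16_MAX:
        failed.append("C16:0")
    if not c18 > C18_MIN:
        failed.append("C18:0")
    if not ps_ratio < PS_MAX:
        failed.append("P/S ratio")
    return failed


def predicted_identity(c16, c18, ps_ratio):
    return "PO admixture" if failed_criteria(c16, c18, ps_ratio) else "Cocoa butter"


for code, values in TABLE_22.items():
    print(code, predicted_identity(*values), failed_criteria(*values))
```

Listing the failed criteria also shows how close each sample is to the limits: CP2CH fails only the P/S ratio, and CP1CH fails only the stearic acid limit (199.33 against >200).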
7. DEVELOP AND VALIDATE THE WEB TOOL USED FOR DATA ANALYSIS
The web tool predicts the composition of an unknown oil mixture using advanced multivariate
analysis tools and enables users to perform the analysis without needing their own statistical and
data analysis packages. It can currently be accessed through the link:
www.whatismyoil.co.uk. One version of this web tool is ready and work is ongoing.
The classification models of the web tool have been updated in this follow-on project to include new
vegetable oil species and to extend the tool to processed foods containing vegetable oils, especially
foods containing palm oil such as biscuits and confectionery bars.
The figure below shows the structure of the different options provided by the web tool. Users can
select from the interface whether the oil is a vegetable oil blend or has been extracted from a
biscuit or a confectionery product (e.g. a chocolate bar). Different models have been developed for
each of these options. The ‘biscuit’ and ‘vegetable oil’ options comprise one sub-model each, whereas
the ‘confectionery’ option comprises three independent sub-models, i.e. coating, caramel and biscuit
sub-models.
The web tool also provides an extra functionality to detect the presence of palm oil in an unknown
testing sample where there is no information about the original source of the oil. The web tool is still
under development and it is expected to come online by the end of 2016.
Figure (web tool structure): user options and the corresponding models
What is the sample?
- Known:
  - Biscuit → Biscuit model
  - Confectionery → Coating, Caramel or Biscuit sub-model
  - Vegetable oil → Oil model
- Unknown → Palm oil / No palm oil model
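The option tree in the figure can be sketched as a simple lookup from the user's answer to the model that is applied. This is a hedged illustration of the routing logic only: the web tool's internal implementation is not described in the report, and the model identifiers and function name below are hypothetical.

```python
from typing import Optional

# Hypothetical identifiers for the models described in section 7
MODELS = {
    "vegetable oil": {None: "oil model"},
    "biscuit": {None: "biscuit model"},
    "confectionery": {
        "coating": "confectionery coating sub-model",
        "caramel": "confectionery caramel sub-model",
        "biscuit": "confectionery biscuit sub-model",
    },
}
UNKNOWN_SOURCE_MODEL = "palm oil / no palm oil model"


def select_model(sample_type: Optional[str] = None, component: Optional[str] = None) -> str:
    """Return the model to apply for the user's answer to 'What is the sample?'."""
    if sample_type is None:
        # No information about the original source of the oil
        return UNKNOWN_SOURCE_MODEL
    options = MODELS.get(sample_type.lower())
    if options is None:
        raise ValueError(f"unsupported sample type: {sample_type!r}")
    key = component.lower() if component else None
    if key not in options:
        choices = sorted(k for k in options if k is not None)
        raise ValueError(f"component for {sample_type!r} must be one of {choices}")
    return options[key]
```

Keeping the tree as data rather than nested conditionals means a new product category (with its own calibration model) can be added as one more dictionary entry.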
8. OVERALL CONCLUSIONS AND IMPLICATIONS OF THE FINDINGS
The two-stage procedure developed in the previous DEFRA project FAO117, consisting of a
screening stage based on a spectroscopic method (FTIR) and a confirmation stage based on a
chromatographic method (FA analysis using GC), has been successfully applied to two processed
food product categories, i.e. biscuits and confectionery products, with some necessary modifications
that further improve the initial method. Additionally, the initial dataset of pure oils and oil
admixtures has been significantly expanded to include more variability, and the calibration models
have been rebuilt and revalidated.
The conclusions of this project are:
The initial database of pure oils and oil admixtures has been expanded in terms of number of
samples and oil species, and the sample library now covers global oil production. All
samples were purchased from reliable and reputable sources (major food companies and the oil
processing industry), with the exception of the exotic oils/fats. Exotic oils were not easy to obtain
and were mainly purchased from online retailers; their authenticity was therefore not verified and
cannot be guaranteed.
Extension of the oil-only detection: the initial 6-class calibration models (legacy design) have been
rebuilt and improved with the introduction of the ‘enhanced dataset’ concept (simulated samples and
the use of the Matlab data analysis package), resulting in a remarkably good classification rate:
95.05% of the validation samples were correctly classified and only 4.95% were wrongly classified.
The FTIR screening stage was so successful with this dataset that no sample needed to go to the
confirmation step (fatty acid analysis).
New oil-species calibration models have been successfully built for 12 classes (new high-resolution
model design) on top of the 6-class model (which now also contains coconut oil and its
admixtures). Using the PLS-DA classification algorithm on FTIR data, a correct classification rate
of 51.49% was achieved, which was expected given the higher degree of difficulty (the more classes,
the more difficult the classification problem). Only 4.95% of the samples were wrongly classified,
and 43.56% of the samples (44 samples) needed to go to the second step, based on fatty acid
criteria, to confirm their identity.
New fatty acid criteria for the 12 classes were established for the confirmation step. Following
screening, the criteria were applied to the ‘unknown’ samples from the FTIR stage and correctly
identified 39 out of 44 pure oil/oil admixture samples submitted to the confirmation step (88.6%
success). The overall success rate when considering both stages (screening by FTIR and
confirmation by GC) was 90.1%.
FTIR inter-lab trial: the majority of the blends in the FTIR inter-lab trial validation were identified
by the PLS-DA chemometric models in the screening step, and only a small percentage of pure oils
and oil blends (14% non-classified and 2.3% wrongly classified) were rejected. The fatty acid
analysis correctly identified the nature of 16 out of 18 validation samples (88.9%) referred to the
confirmation step. FTIR spectroscopy coupled with the PLS-DA algorithm, followed by fatty acid
analysis when required, offers an insight into the nature of pure oils and binary mixtures and
correctly classified 96.03% of unknown oil samples in this inter-lab validation.
More specifically (with respect to the gas chromatographic analysis), the inter-lab trial of the
targeted analysis (fatty acids) proved successful. Fatty acid contents of the same oil samples
analysed on different gas chromatography instruments and under different derivatisation and
chromatographic conditions were shown to be consistent amongst participants. Low RSD (relative
standard deviation) values (from 0.01 to 0.53) were obtained for the quantities of the major fatty
acids present in the oil samples.
Validation in biscuits: the two-step procedure has been used to identify the oil species present
in biscuits. The 6-class model was more efficient than the 12-class model in identifying the oil
classes of oils extracted from commercial biscuits. With the 6-class model, 80% of the samples were
correctly classified (20% were wrongly identified), whereas with the 12-class model 50% of the
samples were correctly classified (25% were wrongly classified and 25% were non-classified and
needed to go to the confirmation step).
To improve the results further, new calibration models built specifically for biscuits (biscuit-only
model) were prepared using authentic vegetable oils extracted from in-house biscuits. Validation
of the methodology with in-house biscuits showed 100% accuracy, whereas validation with oils
from commercial biscuits showed 80% accuracy with 15% of samples wrongly classified. To reduce
false positives to an acceptable level (<5%), a classification threshold of 0.70 was established; with
this threshold the accuracy was 80% and the false positive rate was 5%.
Confectionery fats are very complex products, and resolving the oil types required a different
approach. Since these fats are very different from most of the oils used so far in the
methodology and would not be sufficiently identified by the calibration models, it was decided to
simplify the problem to the ‘bare minimum’: detecting the presence of palm oil (yes/no model).
The presence of palm oil in confectionery products has thus been successfully detected using
specific PLS-DA calibration models for chocolate confectionery products (yes/no model or
confectionery-only model). FTIR spectroscopy provided excellent and promising results for the
detection of palm oil in chocolate confectionery products. Validation with in-house oil
admixtures as well as with oils extracted from commercial confectionery products showed 100%
accuracy when using FTIR.
Chocolate products containing only cocoa butter (non-palm oil confectionery) could be confirmed
using this PLS-DA model for confectionery products (yes/no model), as could the presence of palm
oil in chocolate products containing palm oil. Fatty acid criteria for confectionery samples were
created and successfully identified all oils extracted from commercial confectionery products.
Because of the limited number of samples used, further work could strengthen the model design.
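The 12-class screening and confirmation figures quoted above are mutually consistent. The sketch below back-calculates the size of the validation set from the 44 referred samples (43.56%) and recomputes the confirmation and overall success rates; note that the total of 101 samples is inferred here, not stated in this section, and the variable names are illustrative.

```python
referred = 44            # samples sent to the GC confirmation step (43.56%)
confirmed_correct = 39   # correctly identified by the fatty acid criteria

n_total = round(referred / 0.4356)        # inferred size of the validation set
screen_correct = round(0.5149 * n_total)  # correct at the FTIR screening step
screen_wrong = round(0.0495 * n_total)    # wrongly classified at screening

assert screen_correct + screen_wrong + referred == n_total
print(n_total, screen_correct, screen_wrong)                             # 101 52 5
print(f"confirmation: {confirmed_correct / referred:.1%}")               # 88.6%
print(f"overall: {(screen_correct + confirmed_correct) / n_total:.1%}")  # 90.1%
```

The overall figure counts a sample as a success if it is either correct at screening or correctly identified at confirmation, which is why it exceeds both the 51.49% screening rate and falls close to the 88.6% confirmation rate.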
The method has some known limitations:
The performance of the ‘processed foods-specific models’ has not been evaluated using spectral
input from different spectrometers, but groundwork on the harmonisation protocol should help in
this direction.
Complex admixtures of more than two oil types have not been tested with the newly developed
method. These admixtures are not common in processed foods. There is evidence, however, that
when complicated mixtures are analysed (as in the example of confectionery fats) the developed
method can be reduced to a binary problem (‘is there palm oil or not?’ - yes/no model), which has
shown promising results.
Although the aim was to develop a generic method, results showed that some modifications will be
required to adapt it to different food products (e.g. coleslaw, ice cream, or chilled ready-to-eat
foods such as a lasagne dish containing a mixture of animal and vegetable fat). Because of its
nature (untargeted analysis), the method ‘cannot be prepared for the unexpected’ and needs to be
supported by robust calibration data, which limits its application/specificity.
The method is not suitable for testing trans-esterified oils, such as the oil contained in
margarines, because it is based on vibrational spectroscopy and fatty acid analysis. These
tailored mixtures exist in numerous combinations, are the intellectual property of individual
companies and were not available as reference samples in the project. More importantly, the
nature of trans-esterification and the countless possible final oil compositions would make the
analysis extremely challenging.
To what extent were the original objectives met?
Ultimately, the staged procedure, consisting of spectroscopic screening with FTIR and a
chromatographic confirmatory analysis, proved effective in identifying the nature of unknown complex
refined vegetable oil blends in oils and, to some extent and with some essential modifications, in
processed foods. The methodology is simple to implement, very affordable in terms of cost per sample
and equipment required, and yet highly specific. In this regard the original objectives were
fully met.
A Standard Operating Procedures (SOP) manual has been developed covering what are essentially four
different variations:
Variation 1. Initial determination of oil species in an unknown oil blend (oils-only)
Variation 2. High resolution determination of oil species (oils-only)
Variation 3. Prediction of the oil species in a biscuit product
Variation 4. Confirmation of the presence of palm oil in confectionery (chocolate) products.
The methodology as it now stands is ready to be transferred for routine analysis of unprocessed
vegetable oils (variations 1 and 2), because its performance has been fully validated both in-house
and externally, and new harmonisation protocols, an enlarged sample database and advanced method
criteria have been implemented. Testing of processed foods (where legislation actually applies)
has been developed (variations 3 and 4) for two product categories (pastry and confectionery
products), with some limitations. The performance of these methods (although fully outlined in the
SOP) has not been evaluated externally, i.e. using spectral input from different spectrometers.
Results, however, are very promising (>90% success rate). The research showed that a different
variation of the method (a different calibration model) is needed for every product category tested.
Further work is needed to develop a universal (applicable to all products), instrument-agnostic
(applicable to all acquisition instruments) method in order to enforce the legislation adequately.
9. ABBREVIATIONS
ACC Accuracy
CB Cocoa Butter
CCO Coconut oil
DEFRA Department for Environment, Food and Rural Affairs
DG Digestive biscuits
EC European Commission
EDA Exploratory data analysis
EU European Union
FA Fatty Acid(s)
FN False negative
FP False positive
FTIR Fourier Transform Infrared
FPR False positive rate
GC Gas chromatography
ILB Illipe Butter
IRMM Institute for Reference Materials and Measurements
KMB Kokum gurgi Butter
MNB Mango kernel Butter
P Palm oil and its derivatives olein and stearin
PCA Principal Component Analysis
PCCO Palm oil (PO) and Coconut oil (CCO) binary admixture
PKO Palm Kernel oil
PKOC Palm Kernel oil, coconut oil
PLS Partial least squares
PLS-DA Partial least squares discriminant analysis
PO Palm oil
PPKO Palm Oil (PO) and Palm Kernel oil (PKO) binary admixture
PPKOC P and PKOC oil admixtures
P/S Polyunsaturated/saturated fatty acids ratio
PUFA Polyunsaturated fatty acids
QUB Queen's University Belfast
RO Rapeseed oil
ROSO Rapeseed (RO) and Sunflower oil (SO) binary admixture
ROPKO Rapeseed (RO) and Palm Kernel oil (PKO) binary admixture
ROPO Rapeseed (RO) and Palm oil (PO) binary admixture
rpm Revolutions per minute
RSD Relative Standard Deviation
RS Rapeseed oil, sunflower oil, rapeseed and sunflower oil admixtures
RSP RS and P oil admixtures
RSPKOC RS and PKOC oil admixtures
RT Rich Tea biscuits
SAT Saturated fatty acids
SB Sal Butter
ShB Shea Butter
SIMCA Soft independent modelling of class analogy
SLV Single Lab Validation
SNV Standard Normal Variate
SO Sunflower oil
SOP Standard Operating Procedure(s)
SOPKO Sunflower (SO) and Palm Kernel oil (PKO) binary admixture
SOPO Sunflower (SO) and Palm oil (PO) binary admixture
TN True negative
TP True positive
TPR True positive rate
UK United Kingdom
References to published material
These are the references used in the Evidence Project Final Report.
1. Bevilacqua, M., Bucci, R., Magri, A. D., Magri, A. L., and Marini, F. 2012. Tracing the origin of extra virgin olive oils by infrared spectroscopy and chemometrics: A case study. Analytica Chimica Acta, 717, pp. 39-51.
2. Berrueta, L. A., Alonso-Salces, R. M., and Héberger, K. 2007. Supervised pattern recognition in food analysis. Journal of Chromatography A, 1158 (1-2), pp. 196-214.
3. Eriksson, L., Johansson, E., Kettaneh-Wold, N., Trygg, J., Wikström, C., and Wold, S. 2006. Multi- and megavariate data analysis. Part I: Basic principles and applications. Second edition. Umetrics AB, Umeå, Sweden.
4. Manley, D. 2001a. Section 5.6 Semisweet biscuits. In: Chapter 5, Recipes for hard doughs. Biscuit, cracker and cookie recipes for the food industry. Abington Hall, Abington: Woodhead Publishing Ltd., pp. 63-74.
5. Manley, D. 2001b. Section 6.2 Plain biscuits. In: Chapter 6, Recipes for short doughs. Biscuit, cracker and cookie recipes for the food industry. Abington Hall, Abington: Woodhead Publishing Ltd., pp. 81-90.
6. Oliveri, P. and Downey, G. 2012. Multivariate class modeling for the verification of food-authenticity claims. TrAC Trends in Analytical Chemistry, 35, pp. 74-86.
7. EU legislation and agriculture reports
http://ec.europa.eu/agriculture/eval/reports/chocolate/fullrep_en.pdf
http://ec.europa.eu/agriculture/eval/reports/chocolate/sum_en.pdf
http://europa.eu/legislation_summaries/consumers/product_labelling_and_packaging/l21122b_en.htm
APPENDIX
Appendix I - In house admixtures- Database expansion
Palm Stearin + Palm Oil binary mixture Palm Olein + Sunflower Oil binary mixture
Palm Stearin % Palm oil % Usage Palm Olein % Sunflower oil % Usage
23 PSn3 77 POn1 Calibration 27 POln1 73 SOn3 Calibration
27 PSn5 73 POn2 Calibration 34 POln3 66 SOn5 Calibration
31 PSn1 69 POn4 Calibration 38 POln1 62 SOn6 Calibration
35 PSn4 65 POn6 Calibration 43 POln3 57 SOn9 Calibration
39 PSn7 61 POn7 Calibration 47 POln1 53 SOn10 Calibration
43 PSn3 57 POn9 Calibration 54 POln3 46 SOn12 Calibration
47 PSn5 53 POn10 Calibration 58 POln1 42 SOn14 Calibration
51 PSn1 49 POn11 Calibration 63 POln3 37 SOn16 Calibration
55 PSn4 45 POn12 Calibration 67 POln1 33 SOn17 Calibration
59 PSn7 41 POn14 Calibration 74 POln3 26 SOn2 Calibration
63 PSn3 37 POn15 Calibration 26 POln2 74 SOn1 Validation
67 PSn5 33 POn17 Calibration 35 POln2 65 SOn4 Validation
71 PSn1 29 POn19 Calibration 41 POln2 59 SOn7 Validation
75 PSn4 25 POn20 Calibration 50 POln2 50 SOn8 Validation
79 PSn7 21 POn1 Calibration 56 POln2 44 SOn11 Validation
32 PSn2 68 POn3 Validation 65 POln2 35 SOn13 Validation
38 PSn6 62 POn5 Validation 71 POln2 29 SOn15 Validation
44 PSn2 56 POn8 Validation
50 PSn6 50 POn13 Validation
56 PSn2 44 POn16 Validation
62 PSn6 38 POn18 Validation
68 PSn2 32 POn21 Validation
74 PSn6 26 POn22 Validation
Rapeseed oil + palm kernel oil binary mixture Rapeseed Oil + Sunflower Oil binary mixture
Rapeseed oil % Palm kernel oil % Usage Rapeseed Oil % Sunflower oil % Usage
73 ROn2 27 PKOn1 Calibration 25 ROn2 75 SOn2 Calibration
67 ROn4 33 PKOn3 Calibration 30 ROn4 70 SOn3 Calibration
60 ROn5 40 PKOn1 Calibration 35 ROn5 65 SOn5 Calibration
56 ROn14 44 PKOn4 Calibration 41 ROn7 59 SOn6 Calibration
53 ROn7 47 PKOn3 Calibration 46 ROn8 54 SOn9 Calibration
45 ROn8 55 PKOn1 Calibration 50 ROn14 50 SOn18 Calibration
38 ROn10 62 PKOn3 Calibration 51 ROn10 49 SOn10 Calibration
36 ROn14 64 PKOn4 Calibration 57 ROn11 43 SOn12 Calibration
31 ROn11 69 PKOn1 Calibration 62 ROn12 38 SOn14 Calibration
25 ROn12 75 PKOn3 Calibration 67 ROn13 33 SOn16 Calibration
75 ROn1 25 PKOn2 Validation 73 ROn2 27 SOn17 Calibration
70 ROn3 30 PKOn2 Validation 25 ROn1 75 SOn1 Validation
61 ROn6 39 PKOn2 Validation 33 ROn3 67 SOn4 Validation
53 ROn9 47 PKOn2 Validation 41 ROn6 59 SOn7 Validation
50 ROn1 50 PKOn2 Validation 49 ROn9 51 SOn8 Validation
40 ROn3 60 PKOn2 Validation 57 ROn1 43 SOn11 Validation
31 ROn6 69 PKOn2 Validation 65 ROn3 35 SOn13 Validation
27 ROn9 73 PKOn2 Validation 73 ROn6 27 SOn15 Validation
Rapeseed oil + palm oil binary mixture Sunflower oil + palm kernel oil binary mixture
Rapeseed oil % Palm oil % Usage Sunflower oil % Palm kernel oil % Usage
82 ROn2 18 POn1 Calibration 73 SOn2 27 PKOn1 Calibration
76 ROn4 24 POn2 Calibration 67 SOn3 33 PKOn3 Calibration
72 ROn5 28 POn4 Calibration 60 SOn5 40 PKOn1 Calibration
66 ROn7 34 POn6 Calibration 53 SOn6 47 PKOn3 Calibration
64 ROn14 36 POn23 Calibration 50 SOn18 50 PKOn4 Calibration
62 ROn8 38 POn7 Calibration 45 SOn9 55 PKOn1 Calibration
56 ROn10 44 POn9 Calibration 38 SOn10 62 PKOn3 Calibration
52 ROn11 48 POn10 Calibration 31 SOn12 69 PKOn1 Calibration
46 ROn12 54 POn11 Calibration 25 SOn14 75 PKOn3 Calibration
42 ROn13 58 POn12 Calibration 70 SOn1 30 PKOn2 Validation
36 ROn2 64 POn14 Calibration 67 SOn4 33 PKOn2 Validation
34 ROn14 66 POn23 Calibration 58 SOn7 42 PKOn2 Validation
32 ROn4 68 POn15 Calibration 53 SOn8 47 PKOn2 Validation
26 ROn5 74 POn17 Calibration 43 SOn11 57 PKOn2 Validation
22 ROn7 78 POn19 Calibration 40 SOn13 60 PKOn2 Validation
16 ROn8 84 POn20 Calibration 31 SOn15 69 PKOn2 Validation
74 ROn1 26 POn3 Validation
65 ROn3 35 POn5 Validation
59 ROn6 41 POn8 Validation
50 ROn9 50 POn13 Validation
44 ROn1 56 POn16 Validation
35 ROn3 65 POn18 Validation
29 ROn6 71 POn21 Validation
25 ROn9 75 POn22 Validation
Sunflower oil + palm oil binary mixture Palm oil + palm kernel oil binary mixture
Sunflower oil % Palm oil % Usage Palm oil % Palm kernel oil % Usage
82 SOn2 18 POn1 Calibration 76 POn1 24 PKOn1 Calibration
76 SOn3 24 POn2 Calibration 72 POn2 28 PKOn3 Calibration
72 SOn5 28 POn4 Calibration 68 POn4 32 PKOn1 Calibration
66 SOn6 34 POn6 Calibration 66 POn23 34 PKOn4 Calibration
64 SOn18 36 POn23 Calibration 64 POn6 36 PKOn3 Calibration
62 SOn9 38 POn7 Calibration 60 POn7 40 PKOn1 Calibration
56 SOn10 44 POn9 Calibration 58 POn23 42 PKOn4 Calibration
52 SOn12 48 POn10 Calibration 56 POn9 44 PKOn3 Calibration
46 SOn14 54 POn11 Calibration 52 POn10 48 PKOn1 Calibration
42 SOn16 58 POn12 Calibration 50 POn23 50 PKOn4 Calibration
36 SOn17 64 POn14 Calibration 48 POn11 52 PKOn3 Calibration
34 SOn18 66 POn23 Calibration 44 POn12 56 PKOn1 Calibration
32 SOn2 68 POn15 Calibration 40 POn14 60 PKOn3 Calibration
26 SOn3 74 POn17 Calibration 38 POn23 62 PKOn4 Calibration
22 SOn5 78 POn19 Calibration 36 POn15 64 PKOn1 Calibration
74 SOn1 26 POn3 Validation 32 POn17 68 PKOn3 Calibration
65 SOn4 35 POn5 Validation 30 POn23 70 PKOn4 Calibration
59 SOn7 41 POn8 Validation 28 POn19 72 PKOn1 Calibration
50 SOn8 50 POn13 Validation 24 POn20 76 PKOn3 Calibration
44 SOn11 56 POn16 Validation 20 POn1 80 PKOn1 Calibration
35 SOn13 65 POn18 Validation 78 POn3 22 PKOn2 Validation
29 SOn15 71 POn21 Validation 72 POn5 28 PKOn2 Validation
25 SOn1 75 POn22 Validation 66 POn8 34 PKOn2 Validation
60 POn13 40 PKOn2 Validation
54 POn16 46 PKOn2 Validation
48 POn18 52 PKOn2 Validation
42 POn21 58 PKOn2 Validation
36 POn22 64 PKOn2 Validation
EVID4 Evidence Project Final Report (Rev. 06/11)Page 50 of 106
30 POn3 70 PKOn2 Validation
24 POn5 76 PKOn2 Validation
Coconut oil + palm oil binary mixture
Coconut oil % Palm oil % Usage
20 CCO1 80 POn1 Calibration
22 CCO2 78 POn2 Calibration
24 CCO5 76 POn4 Calibration
26 CCO6 74 POn6 Calibration
28 CCO8 72 POn7 Calibration
30 CCO1 70 POn9 Calibration
31 CCO9 69 POn23 Calibration
32 CCO2 68 POn10 Calibration
34 CCO5 66 POn11 Calibration
36 CCO6 64 POn12 Calibration
38 CCO8 62 POn14 Calibration
40 CCO1 60 POn15 Calibration
41 CCO9 59 POn23 Calibration
42 CCO2 58 POn17 Calibration
44 CCO5 56 POn19 Calibration
46 CCO6 54 POn20 Calibration
48 CCO8 52 POn1 Calibration
50 CCO1 50 POn2 Calibration
51 CCO9 49 POn23 Calibration
52 CCO2 48 POn4 Calibration
54 CCO5 46 POn6 Calibration
56 CCO6 44 POn7 Calibration
58 CCO8 42 POn9 Calibration
60 CCO1 40 POn10 Calibration
61 CCO9 39 POn23 Calibration
62 CCO2 38 POn11 Calibration
64 CCO5 36 POn12 Calibration
66 CCO6 34 POn14 Calibration
68 CCO8 32 POn15 Calibration
70 CCO1 30 POn17 Calibration
71 CCO9 29 POn23 Calibration
72 CCO2 28 POn19 Calibration
74 CCO5 26 POn20 Calibration
76 CCO6 24 POn1 Calibration
78 CCO8 22 POn2 Calibration
28 CCO3 72 POn3 Validation
31 CCO4 69 POn5 Validation
40 CCO7 60 POn8 Validation
43 CCO10 57 POn13 Validation
49 CCO3 51 POn16 Validation
52 CCO4 48 POn18 Validation
61 CCO7 39 POn21 Validation
64 CCO10 36 POn22 Validation
70 CCO3 30 POn3 Validation
73 CCO4 27 POn5 Validation
82 CCO10 18 POn8 Validation
Appendix II – Confusion tables – Oil Database expansion
MODEL A (using SIMCA Umetrics™)/6 classes and MODEL A with thresholds/6 classes
SIMCA
Statistical measures. Description: Application of PLS-DA. ACC (%): 93.0
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 18 81 1 1 0.95 0.99 0.01 0.95 0.95
RS 14 83 0 4 0.78 1.00 0.00 1.00 0.88
PKOC 4 95 1 1 0.80 0.99 0.01 0.80 0.80
RSPKOC 15 83 3 0 1.00 0.97 0.03 0.83 0.91
RSP 22 77 1 1 0.96 0.99 0.01 0.96 0.96
PPKOC 21 79 1 0 1.00 0.99 0.01 0.95 0.98
Actual \ Predicted: P RS PKOC RSPKOC RSP PPKOC Total Sensitivity (%)
P 18 0 1 0 0 0 19 94.74
RS 0 14 0 3 1 0 18 77.78
PKOC 0 0 4 0 0 1 5 80.00
RSPKOC 0 0 0 15 0 0 15 100.00
RSP 1 0 0 0 22 0 23 95.65
PPKOC 0 0 0 0 0 21 21 100.00
Precision (%) 94.74 100.00 80.00 83.33 95.65 95.45
Average sensitivity 91.36
Average precision 91.53
Overall accuracy 93.07
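The per-class statistics in these appendix tables can be recomputed from each confusion matrix using the standard definitions: TP on the diagonal, FN as the rest of the actual-class row (including any Confirmation column), FP as the rest of the predicted-class column, and TN as all remaining samples. The sketch below is a minimal illustration under those assumptions (Python with NumPy; `class_metrics` is a hypothetical helper, not part of SIMCA or the project's MATLAB code). It reports undefined precision as 0, as the tables do, and applied to the MODEL A SIMCA matrix above it reproduces the 93.07% overall accuracy and the 91.36% and 91.53% averages.

```python
import numpy as np


def class_metrics(matrix):
    """Per-class statistics from a confusion matrix.

    Rows are actual classes; the first k columns (k = number of rows) are the
    predicted classes. An optional extra final column holds unclassified samples
    (the "Confirmation" column), which count as false negatives.
    """
    cm = np.asarray(matrix, dtype=float)
    k = cm.shape[0]
    n = cm.sum()                          # total number of samples
    tp = np.diag(cm[:, :k])               # correctly classified samples
    fn = cm.sum(axis=1) - tp              # misclassified or unclassified
    fp = cm[:, :k].sum(axis=0) - tp       # other classes predicted as this one
    tn = n - tp - fp - fn
    with np.errstate(divide="ignore", invalid="ignore"):
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        # No predictions for a class -> precision reported as 0, as in the tables
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        f1 = np.where(precision + sensitivity > 0,
                      2 * precision * sensitivity / (precision + sensitivity), 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "specificity": specificity,
            "fpr": 1 - specificity, "precision": precision, "f1": f1,
            "accuracy": tp.sum() / n}


# MODEL A, SIMCA (6 classes): rows/columns P, RS, PKOC, RSPKOC, RSP, PPKOC
model_a_simca = [
    [18, 0, 1, 0, 0, 0],
    [0, 14, 0, 3, 1, 0],
    [0, 0, 4, 0, 0, 1],
    [0, 0, 0, 15, 0, 0],
    [1, 0, 0, 0, 22, 0],
    [0, 0, 0, 0, 0, 21],
]
m = class_metrics(model_a_simca)
print(round(100 * m["accuracy"], 2))              # 93.07
print(round(100 * m["sensitivity"].mean(), 2))    # 91.36
print(round(100 * m["precision"].mean(), 2))      # 91.53
```

Passing the SIMCA-with-thresholds matrix, including its Confirmation column, gives an overall accuracy of 33.66% in the same way, because unclassified samples reduce TP without adding FP.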
SIMCA with thresholds (threshold=0.05)
Statistical measures. Description: Application of PLS-DA. ACC (%): 33.66
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 7 82 0 12 0.37 1.00 0.00 1.00 0.54
RS 6 83 0 12 0.33 1.00 0.00 1.00 0.50
PKOC 0 96 0 5 0.00 1.00 0.00 0.00 0.00
RSPKOC 3 86 0 12 0.20 1.00 0.00 1.00 0.33
RSP 9 78 0 14 0.39 1.00 0.00 1.00 0.56
PPKOC 9 79 1 12 0.43 0.99 0.01 0.90 0.58
Actual \ Predicted: P RS PKOC RSPKOC RSP PPKOC Confirmation Total Sensitivity (%)
P 7 0 0 0 0 0 12 19 36.84
RS 0 6 0 0 0 0 12 18 33.33
PKOC 0 0 0 0 0 1 4 5 0.00
RSPKOC 0 0 0 3 0 0 12 15 20.00
RSP 0 0 0 0 9 0 14 23 39.13
PPKOC 0 0 0 0 0 9 12 21 42.86
Precision (%) 100.00 100.00 0.00 100.00 100.00 90.00
Average sensitivity 28.69
Average precision 81.67
Overall accuracy 33.66
PLS-DA
Statistical measures. Description: Application of PLS-DA. ACC (%): 90.1
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 14 81 1 5 0.74 0.99 0.01 0.93 0.82
RS 18 83 0 0 1.00 1.00 0.00 1.00 1.00
PKOC 1 96 0 4 0.20 1.00 0.00 1.00 0.33
RSPKOC 15 82 4 0 1.00 0.95 0.05 0.79 0.88
RSP 22 74 4 1 0.96 0.95 0.05 0.85 0.90
PPKOC 21 79 1 0 1.00 0.99 0.01 0.95 0.98
Actual \ Predicted: P RS PKOC RSPKOC RSP PPKOC Total Sensitivity (%)
P 14 0 0 0 4 1 19 73.68
RS 0 18 0 0 0 0 18 100.00
PKOC 0 0 1 0 0 4 5 20.00
RSPKOC 0 0 0 15 0 0 15 100.00
RSP 1 0 0 0 22 0 23 95.65
PPKOC 0 0 0 0 0 21 21 100.00
Precision (%) 93.33 100.00 100.00 100.00 84.62 80.77
Average sensitivity 81.56
Average precision 93.12
Overall accuracy 90.10
PLS-DA with thresholds (threshold=0.57)
Statistical measures. Description: Application of PLS-DA. ACC (%): 85.15
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 14 81 1 5 0.74 0.99 0.01 0.93 0.82
RS 18 83 0 0 1.00 1.00 0.00 1.00 1.00
PKOC 0 96 0 5 0.00 1.00 0.00 0.00 0.00
RSPKOC 15 82 4 0 1.00 0.95 0.05 0.79 0.88
RSP 19 78 0 4 0.83 1.00 0.00 1.00 0.90
PPKOC 20 80 0 1 0.95 1.00 0.00 1.00 0.98
Actual \ Predicted: P RS PKOC RSPKOC RSP PPKOC Confirmation Total Sensitivity (%)
P 14 0 0 0 0 0 5 19 73.68
RS 0 18 0 0 0 0 0 18 100.00
PKOC 0 0 0 4 0 0 1 5 0.00
RSPKOC 0 0 0 15 0 0 0 15 100.00
RSP 1 0 0 0 19 0 3 23 82.61
PPKOC 0 0 0 0 0 20 1 21 95.24
Precision (%) 93.33 100.00 0.00 78.95 100.00 100.00
Average sensitivity 75.26
Average precision 78.71
Overall accuracy 85.15
MODEL B (using MATLAB)/6 classes
SIMCA + simulated samples
Statistical measures. Description: Application of PLS-DA. ACC (%): 76.24
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 16 80 2 3 0.84 0.98 0.02 0.89 0.86
RS 11 83 0 7 0.61 1.00 0.00 1.00 0.76
PKOC 0 96 0 5 0.00 1.00 0.00 0.00 0.00
RSPKOC 10 86 0 5 0.67 1.00 0.00 1.00 0.80
RSP 21 58 20 2 0.91 0.74 0.26 0.51 0.66
PPKOC 19 78 2 2 0.90 0.98 0.03 0.90 0.90
Actual \ Predicted: P RS PKOC RSPKOC RSP PPKOC Total Sensitivity (%)
P 16 0 0 0 2 1 19 84.21
RS 0 11 0 0 7 0 18 61.11
PKOC 0 0 0 0 4 1 5 0.00
RSPKOC 0 0 0 10 5 0 15 66.67
RSP 2 0 0 0 21 0 23 91.30
PPKOC 0 0 0 0 2 19 21 90.48
Precision (%) 88.89 100.00 0.00 100.00 51.22 90.48
Average sensitivity 65.63
Average precision 71.76
Overall accuracy 76.24
PLS-DA + simulated samples
Statistical measures. Description: Application of PLS-DA. ACC (%): 95.05
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 14 82 0 5 0.74 1.00 0.00 1.00 0.85
RS 18 83 0 0 1.00 1.00 0.00 1.00 1.00
PKOC 5 96 0 0 1.00 1.00 0.00 1.00 1.00
RSPKOC 15 86 0 0 1.00 1.00 0.00 1.00 1.00
RSP 23 75 3 0 1.00 0.96 0.04 0.88 0.94
PPKOC 21 78 2 0 1.00 0.98 0.03 0.91 0.95
Actual \ Predicted: P RS PKOC RSPKOC RSP PPKOC Total Sensitivity (%)
P 14 0 0 0 3 2 19 73.68
RS 0 18 0 0 0 0 18 100.00
PKOC 0 0 5 0 0 0 5 100.00
RSPKOC 0 0 0 15 0 0 15 100.00
RSP 0 0 0 0 23 0 23 100.00
PPKOC 0 0 0 0 0 21 21 100.00
Precision (%) 100.00 100.00 100.00 100.00 88.46 91.30
Average sensitivity 95.61
Average precision 96.63
Overall accuracy 95.05
MODEL C (using SIMCA Umetrics™)/12 classes and MODEL C with thresholds/12 classes
SIMCA
Actual \ Predicted: P RO SO PKO CCO ROPO SOPO ROPKO SOPKO ROSO PPKO PCCO Total Sensitivity (%)
P 11 0 0 0 8 0 0 0 0 0 0 0 19 57.89
RO 0 4 0 0 0 0 0 0 0 0 0 0 4 100.00
SO 0 0 7 0 0 0 0 0 0 0 0 0 7 100.00
PKO 0 0 0 1 0 0 0 0 0 0 0 0 1 100.00
CCO 0 0 0 0 4 0 0 0 0 0 0 0 4 100.00
ROPO 2 0 0 0 0 6 0 0 0 0 0 0 8 75.00
SOPO 0 0 0 0 0 0 15 0 0 0 0 0 15 100.00
ROPKO 0 0 0 0 0 0 0 8 0 0 0 0 8 100.00
SOPKO 0 0 0 0 0 0 0 0 7 0 0 0 7 100.00
ROSO 0 0 0 0 0 0 0 0 1 6 0 0 7 85.71
PPKO 0 0 0 0 0 0 0 0 0 0 10 0 10 100.00
PCCO 0 0 0 0 0 0 0 0 0 0 0 11 11 100.00
Precision (%) 84.62 100.00 100.00 100.00 33.33 100.00 100.00 100.00 87.50 100.00 100.00 100.00
Average sensitivity 93.22
Average precision 92.12
Overall accuracy 89.11
Statistical measures. Description: Application of PLS-DA. ACC (%): 89.11
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 11 80 2 8 0.58 0.98 0.02 0.85 0.69
RO 4 97 0 0 1.00 1.00 0.00 1.00 1.00
SO 7 94 0 0 1.00 1.00 0.00 1.00 1.00
PKO 1 100 0 0 1.00 1.00 0.00 1.00 1.00
CCO 4 89 8 0 1.00 0.92 0.08 0.33 0.50
ROPO 6 93 0 2 0.75 1.00 0.00 1.00 0.86
SOPO 15 86 0 0 1.00 1.00 0.00 1.00 1.00
ROPKO 8 93 0 0 1.00 1.00 0.00 1.00 1.00
SOPKO 7 93 1 0 1.00 0.99 0.01 0.88 0.93
ROSO 6 94 0 1 0.86 1.00 0.00 1.00 0.92
PPKO 10 91 0 0 1.00 1.00 0.00 1.00 1.00
PCCO 11 90 0 0 1.00 1.00 0.00 1.00 1.00
SIMCA with thresholds (threshold=0.05)
Actual \ Predicted: P RO SO PKO CCO ROPO SOPO ROPKO SOPKO ROSO PPKO PCCO Confirmation Total Sensitivity (%)
P 7 0 0 0 0 0 0 0 0 0 0 0 12 19 36.84
RO 0 2 0 0 0 0 0 0 0 0 0 0 2 4 50.00
SO 0 0 0 0 0 0 0 0 0 0 0 0 7 7 0.00
PKO 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0.00
CCO 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0.00
ROPO 0 0 0 0 0 2 0 0 0 0 0 0 6 8 25.00
SOPO 0 0 0 0 0 0 5 0 0 0 0 0 10 15 33.33
ROPKO 0 0 0 0 0 0 0 2 0 0 0 0 6 8 25.00
SOPKO 0 0 0 0 0 0 0 0 1 0 0 0 6 7 14.29
ROSO 0 0 0 0 0 0 0 0 0 0 0 0 7 7 0.00
PPKO 0 0 0 0 0 0 0 0 0 0 3 0 7 10 30.00
PCCO 0 0 0 0 0 0 0 0 0 0 0 4 7 11 36.36
Precision (%) 100.00 100.00 0.00 0.00 0.00 100.00 100.00 100.00 100.00 0.00 100.00 100.00
Average sensitivity 20.90
Average precision 66.67
Overall accuracy 25.74
Statistical measures. Description: Application of PLS-DA. ACC (%): 25.74
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 7 82 0 12 0.37 1.00 0.00 1.00 0.54
RO 2 97 0 2 0.50 1.00 0.00 1.00 0.67
SO 0 94 0 7 0.00 1.00 0.00 0.00 0.00
PKO 0 100 0 1 0.00 1.00 0.00 0.00 0.00
CCO 0 97 0 4 0.00 1.00 0.00 0.00 0.00
ROPO 2 93 0 6 0.25 1.00 0.00 1.00 0.40
SOPO 5 86 0 10 0.33 1.00 0.00 1.00 0.50
ROPKO 2 93 0 6 0.25 1.00 0.00 1.00 0.40
SOPKO 1 94 0 6 0.14 1.00 0.00 1.00 0.25
ROSO 0 94 0 7 0.00 1.00 0.00 0.00 0.00
PPKO 3 91 0 7 0.30 1.00 0.00 1.00 0.46
PCCO 4 90 0 7 0.36 1.00 0.00 1.00 0.53
PLS-DA
Actual \ Predicted: P RO SO PKO CCO ROPO SOPO ROPKO SOPKO ROSO PPKO PCCO Total Sensitivity (%)
P 16 0 0 0 0 0 3 0 0 0 0 0 19 84.21
RO 0 1 0 0 0 3 0 0 0 0 0 0 4 25.00
SO 0 0 7 0 0 0 0 0 0 0 0 0 7 100.00
PKO 0 0 0 0 0 0 0 1 0 0 0 0 1 0.00
CCO 0 0 0 0 3 0 1 0 0 0 0 0 4 75.00
ROPO 1 0 0 0 0 4 3 0 0 0 0 0 8 50.00
SOPO 0 0 0 0 0 0 15 0 0 0 0 0 15 100.00
ROPKO 0 0 0 0 0 0 0 8 0 0 0 0 8 100.00
SOPKO 0 0 0 0 0 0 2 0 5 0 0 0 7 71.43
ROSO 0 1 2 0 0 0 2 0 0 2 0 0 7 28.57
PPKO 0 0 0 0 0 0 0 0 0 0 10 0 10 100.00
PCCO 0 0 0 0 0 0 0 0 0 0 0 11 11 100.00
Precision (%) 94.12 50.00 77.78 0.00 100.00 57.14 57.69 88.89 100.00 100.00 100.00 100.00
Average sensitivity 69.52
Average precision 77.13
Overall accuracy 81.19
Statistical measures. Description: Application of PLS-DA. ACC (%): 81.19
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 16 81 1 3 0.84 0.99 0.01 0.94 0.89
RO 1 96 1 3 0.25 0.99 0.01 0.50 0.33
SO 7 92 2 0 1.00 0.98 0.02 0.78 0.88
PKO 0 100 0 1 0.00 1.00 0.00 0.00 0.00
CCO 3 97 0 1 0.75 1.00 0.00 1.00 0.86
ROPO 4 90 3 4 0.50 0.97 0.03 0.57 0.53
SOPO 15 75 11 0 1.00 0.87 0.13 0.58 0.73
ROPKO 8 93 0 0 1.00 1.00 0.00 1.00 1.00
SOPKO 5 94 0 2 0.71 1.00 0.00 1.00 0.83
ROSO 2 94 0 5 0.29 1.00 0.00 1.00 0.44
PPKO 10 90 1 0 1.00 0.99 0.01 0.91 0.95
PCCO 11 90 0 0 1.00 1.00 0.00 1.00 1.00
PLS-DA with thresholds (threshold=0.54)
Actual \ Predicted: P RO SO PKO CCO ROPO SOPO ROPKO SOPKO ROSO PPKO PCCO Confirmation Total Sensitivity (%)
P 13 0 0 0 0 0 1 0 0 0 0 0 5 19 68.42
RO 0 0 0 0 0 1 0 0 0 0 0 0 3 4 0.00
SO 0 0 7 0 0 0 0 0 0 0 0 0 0 7 100.00
PKO 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0.00
CCO 0 0 0 0 3 0 1 0 0 0 0 0 0 4 75.00
ROPO 0 0 0 0 0 1 0 0 0 0 0 0 7 8 12.50
SOPO 0 0 0 0 0 0 14 0 0 0 0 0 1 15 93.33
ROPKO 0 0 0 0 0 0 0 0 0 0 0 0 8 8 0.00
SOPKO 0 0 0 0 0 0 1 0 1 0 0 0 5 7 14.29
ROSO 0 0 0 0 0 0 0 0 0 0 0 0 7 7 0.00
PPKO 0 0 0 0 0 0 0 0 0 0 4 0 6 10 40.00
PCCO 0 0 0 0 0 0 0 0 0 0 0 9 2 11 81.82
Precision (%) 100.00 0.00 100.00 0.00 100.00 50.00 82.35 0.00 100.00 0.00 80.00 100.00
Average sensitivity 40.45
Average precision 59.36
Overall accuracy 51.49
Statistical measures. Description: Application of PLS-DA. ACC (%): 51.49
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 13 82 0 6 0.68 1.00 0.00 1.00 0.81
RO 0 97 0 4 0.00 1.00 0.00 0.00 0.00
SO 7 94 0 0 1.00 1.00 0.00 1.00 1.00
PKO 0 100 0 1 0.00 1.00 0.00 0.00 0.00
CCO 3 97 0 1 0.75 1.00 0.00 1.00 0.86
ROPO 1 92 1 7 0.13 0.99 0.01 0.50 0.20
SOPO 14 83 3 1 0.93 0.97 0.03 0.82 0.88
ROPKO 0 93 0 8 0.00 1.00 0.00 0.00 0.00
SOPKO 1 94 0 6 0.14 1.00 0.00 1.00 0.25
ROSO 0 94 0 7 0.00 1.00 0.00 0.00 0.00
PPKO 4 90 1 6 0.40 0.99 0.01 0.80 0.53
PCCO 9 90 0 2 0.82 1.00 0.00 1.00 0.90
MODEL D (using MATLAB)/ 12 classes
SIMCA + simulated samples
Actual \ Predicted: P RO SO PKO CCO ROPO SOPO ROPKO SOPKO ROSO PPKO PCCO Total Sensitivity (%)
P 9 0 0 8 0 0 1 0 0 0 0 1 19 47.37
RO 0 0 0 4 0 0 0 0 0 0 0 0 4 0.00
SO 0 0 0 7 0 0 0 0 0 0 0 0 7 0.00
PKO 0 0 0 1 0 0 0 0 0 0 0 0 1 100.00
CCO 0 0 0 4 0 0 0 0 0 0 0 0 4 0.00
ROPO 2 0 0 3 0 3 0 0 0 0 0 0 8 37.50
SOPO 0 0 0 6 0 0 9 0 0 0 0 0 15 60.00
ROPKO 0 0 0 8 0 0 0 0 0 0 0 0 8 0.00
SOPKO 0 0 0 7 0 0 0 0 0 0 0 0 7 0.00
ROSO 0 0 0 5 0 0 1 0 0 1 0 0 7 14.29
PPKO 0 0 0 3 0 0 0 0 0 0 7 0 10 70.00
PCCO 0 0 0 6 0 0 0 0 0 0 0 5 11 45.45
Precision (%) 81.82 0.00 0.00 1.61 0.00 100.00 81.82 0.00 0.00 100.00 100.00 83.33
Average sensitivity 31.22
Average precision 45.72
Overall accuracy 34.65
Statistical measures. Description: Application of PLS-DA. ACC (%): 34.65
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 9 80 2 10 0.47 0.98 0.02 0.82 0.60
RO 0 97 0 4 0.00 1.00 0.00 0.00 0.00
SO 0 94 0 7 0.00 1.00 0.00 0.00 0.00
PKO 1 39 61 0 1.00 0.39 0.61 0.02 0.03
CCO 0 97 0 4 0.00 1.00 0.00 0.00 0.00
ROPO 3 93 0 5 0.38 1.00 0.00 1.00 0.55
SOPO 9 84 2 6 0.60 0.98 0.02 0.82 0.69
ROPKO 0 93 0 8 0.00 1.00 0.00 0.00 0.00
SOPKO 0 94 0 7 0.00 1.00 0.00 0.00 0.00
ROSO 1 94 0 6 0.14 1.00 0.00 1.00 0.25
PPKO 7 91 0 3 0.70 1.00 0.00 1.00 0.82
PCCO 5 89 1 6 0.45 0.99 0.01 0.83 0.59
PLS-DA + simulated samples
Actual \ Predicted: P RO SO PKO CCO ROPO SOPO ROPKO SOPKO ROSO PPKO PCCO Total Sensitivity (%)
P 14 0 0 0 0 1 1 0 0 0 3 0 19 73.68
RO 0 4 0 0 0 0 0 0 0 0 0 0 4 100.00
SO 0 0 7 0 0 0 0 0 0 0 0 0 7 100.00
PKO 0 0 0 1 0 0 0 0 0 0 0 0 1 100.00
CCO 0 0 0 0 4 0 0 0 0 0 0 0 4 100.00
ROPO 0 0 0 0 0 8 0 0 0 0 0 0 8 100.00
SOPO 0 0 0 0 0 0 15 0 0 0 0 0 15 100.00
ROPKO 0 0 0 0 0 0 0 8 0 0 0 0 8 100.00
SOPKO 0 0 0 0 0 0 0 0 7 0 0 0 7 100.00
ROSO 0 2 2 0 0 0 0 0 0 3 0 0 7 42.86
PPKO 0 0 0 0 0 0 0 0 0 0 10 0 10 100.00
PCCO 0 0 0 0 0 0 0 0 0 0 0 11 11 100.00
Precision (%) 100.00 66.67 77.78 100.00 100.00 88.89 93.75 100.00 100.00 100.00 76.92 100.00
Average sensitivity 93.05
Average precision 92.00
Overall accuracy 91.09
Statistical measures. Description: Application of PLS-DA. ACC (%): 91.09
Class TP TN FP FN Sensitivity (TPR) Specificity FPR Precision F1 score
P 14 82 0 5 0.74 1.00 0.00 1.00 0.85
RO 4 95 2 0 1.00 0.98 0.02 0.67 0.80
SO 7 92 2 0 1.00 0.98 0.02 0.78 0.88
PKO 1 100 0 0 1.00 1.00 0.00 1.00 1.00
CCO 4 97 0 0 1.00 1.00 0.00 1.00 1.00
ROPO 8 92 1 0 1.00 0.99 0.01 0.89 0.94
SOPO 15 85 1 0 1.00 0.99 0.01 0.94 0.97
ROPKO 8 93 0 0 1.00 1.00 0.00 1.00 1.00
SOPKO 7 94 0 0 1.00 1.00 0.00 1.00 1.00
ROSO 3 94 0 4 0.43 1.00 0.00 1.00 0.60
PPKO 10 88 3 0 1.00 0.97 0.03 0.77 0.87
PCCO 11 90 0 0 1.00 1.00 0.00 1.00 1.00
Appendix III – Permutation plots
A) PLS-DA 6-class model (MODEL B) using MATLAB
B) PLS-DA 12-class model (MODEL C) using SIMCA Umetrics™
Permutation plot for model 1: P class
Permutation plot for model 2: RO class
Permutation plot for model 3: SO class
Permutation plot for model 4: PKO class
Permutation plot for model 5: CCO class
Permutation plot for model 6: ROPO class
Permutation plot for model 7: SOPO class
Permutation plot for model 8: ROPKO class
Permutation plot for model 9: SOPKO class
Permutation plot for model 10: ROSO class
Permutation plot for model 11: PPKO class
Permutation plot for model 12: PCCO class
Appendix IV – FTIR Inter-Lab trial results
1. DATA EXPLORATION
Principal component analysis (PCA) was first applied to all the FTIR spectral data, composed of 3781
variables (654.23–1875.4 cm-1 and 2520.02–3120.74 cm-1) and 144 samples (Fig. 1). The spectral data had
previously been transformed by three signal correction methods: standard normal variate (SNV), first-order
derivative and Savitzky-Golay smoothing.
Figure 1. Score plot of the first two principal components: R²X(1) = 0.361; R²X(2) = 0.162; Q²(cum) = 0.831.
P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil,
rapeseed and sunflower mixtures; RSPO group: RS group + P group; PPKO group: P group + PKO; RSPKO
group: RS group + PKO.
A total of 9 uncorrelated principal components were calculated. The first two principal components
accounted for 52% of the total variance, with the first and second components explaining 36% and 16% of the
total variability, respectively. All the samples from one of the participants (Participant C) lay outside the 95%
confidence ellipse, showing a large variability that could be explained by poor instrument performance or a
human/user error; these samples were therefore removed from the dataset. A new principal component
analysis was applied to the reduced dataset (135 samples, 12 PCs), and the score plots on the first two and
first three principal components are shown in Figure 2.
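The R²X values quoted in the figure captions are the fractions of the total X variance captured by each component (for Figure 1, 0.361 + 0.162 ≈ 0.52, the 52% quoted above). A minimal sketch of that calculation, assuming mean-centred data and a plain SVD-based PCA in Python/NumPy, is shown below; `explained_variance_fractions` is a hypothetical helper, the random matrix is placeholder data, and SIMCA's own algorithm and its cross-validated Q² are not reproduced.

```python
import numpy as np


def explained_variance_fractions(X, n_components):
    """R2X of each principal component: its share of the total variance of mean-centred X."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)   # centre each variable
    s = np.linalg.svd(Xc, compute_uv=False)                # singular values, largest first
    return (s ** 2 / np.sum(s ** 2))[:n_components]


# Placeholder data standing in for pre-processed spectra (samples x variables)
rng = np.random.default_rng(0)
X = rng.normal(size=(144, 50))
r2x = explained_variance_fractions(X, 2)
print(r2x, r2x.sum())   # R2X(1), R2X(2) and their cumulative share
```

Each squared singular value of the centred matrix is proportional to the variance along one component, so dividing by their sum gives the per-component fraction directly.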
Figure 2. Score plots (2D and 3D) of the first two and first three principal components: R²X(1) = 0.246;
R²X(2) = 0.173; Q²(cum) = 0.725. P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS
group: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO group: RS group + P group; PPKO
group: P group + PKO; RSPKO group: RS group + PKO.
The score plot of the first three principal components revealed further samples outside the 95%
confidence ellipse, indicating a possible problem with the performance of the instrument and/or with the
analyst who carried out the analysis. All the samples from one of the participants (Participant B) lay outside
the confidence ellipse, so all samples from Participant B were considered outliers and were also removed
from the dataset. A final principal component analysis was applied to the remaining 126 samples, and 11
principal components were calculated. The score plot of the first two principal components is presented in Figure 3.
Figure 3. Score plot of the first two principal components: R²X(1) = 0.264; R²X(2) = 0.191; Q²(cum) = 0.669.
P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil,
rapeseed and sunflower mixtures; RSPO group: RS group + P group; PPKO group: P group + PKO; RSPKO
group: RS group + PKO.
2. DATA PRE-PROCESSING
Because of the high variability observed in the spectral data from the different instruments, a new
pre-processing approach was needed before the data could be tested in our calibration models.
Linear interpolation
All FTIR spectra used to create the calibration models were recorded from 550 to 4000 cm-1 at a
resolution of 4 cm-1. The total number of variables generated was 7157 (data spacing = 0.482 cm-1). The
number of variables varied amongst participants: 1738, 1762 and 1763 scan points (one participant each),
1764 scan points (eight participants), 1765, 3526, 7053 and 7054 scan points (one participant each), and
7157 scan points for the samples collected in our lab.
Linear interpolation was applied to all spectra in order to obtain the required number of variables.
If the two known points are given by the coordinates (X0, Y0) and (X1, Y1), the linear interpolant is the
straight line between these points. For a value X in the interval (X0, X1), the value Y along the straight line is
given by the equation
$$ Y = Y_0 + (Y_1 - Y_0)\,\frac{X - X_0}{X_1 - X_0} $$
We created two or three points between the given scan points of each participant using the interp1 function
in MATLAB R2014b, which returns interpolated values at specific query points using linear interpolation. This
yielded a total of 7054 variables. All participants collected their spectra from 600 to 4000 cm-1, so 103
variables covering the region from 550 to 600 cm-1 were added at the beginning of every spectrum to reach a
total of 7157 variables. These added variables were set equal to the first variable of each spectrum.
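The resampling step can be sketched in Python with `numpy.interp`, which, like MATLAB's `interp1` with linear interpolation, evaluates the piecewise-linear interpolant at query points. This is an illustration under stated assumptions, not the project's MATLAB code: the common target grid (7157 points evenly spaced from 550 to 4000 cm-1) and the participant grid are placeholders, and `resample_spectrum` is a hypothetical helper. Query points below the first recorded wavenumber take the first measured value (the default `left` behaviour of `numpy.interp`), which mirrors padding the 550–600 cm-1 region with the first variable of each spectrum.

```python
import numpy as np


def resample_spectrum(wavenumbers, absorbance, target_grid):
    """Linearly interpolate one spectrum onto a common wavenumber grid.

    Points below the first recorded wavenumber are set to the first measured
    value; points above the last are set to the last measured value.
    """
    order = np.argsort(wavenumbers)          # numpy.interp needs increasing x values
    x = np.asarray(wavenumbers, dtype=float)[order]
    y = np.asarray(absorbance, dtype=float)[order]
    return np.interp(target_grid, x, y)      # left/right default to y[0]/y[-1]


# Hypothetical participant spectrum: 1764 points recorded from 600 to 4000 cm-1
target = np.linspace(550.0, 4000.0, 7157)
wn = np.linspace(600.0, 4000.0, 1764)
spectrum = np.exp(-((wn - 1745.0) / 10.0) ** 2)   # synthetic absorption band
resampled = resample_spectrum(wn, spectrum, target)
print(resampled.shape)                            # (7157,)
```

Sorting first lets the same helper accept spectra exported in descending wavenumber order, which is common for FTIR instruments.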
Figure 4. Example of a linear interpolation of one of the spectra from one participant.
iCoShift
Using different types of ATR module on the FTIR instruments (made of different materials, such as diamond)
can cause significant shifts of the spectral peaks. To overcome this problem, a rapid and versatile algorithm
for the alignment of spectral datasets, called “iCoShift”, was applied to all spectra using MATLAB R2014b.
The iCoShift algorithm is based on correlation shifting of spectral intervals and employs a fast Fourier
transform (FFT) engine that aligns all spectra simultaneously. The algorithm is fast, making full-resolution
alignment of large datasets feasible and thus avoiding down-sampling steps such as binning. It can use
missing values (NaN) as a filling alternative in order to avoid spectral artifacts at the segment boundaries.
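The core idea, finding the shift that maximises the FFT-computed cross-correlation between an interval and a target and filling the vacated positions with NaN, can be illustrated for a single interval. This is a simplified sketch in Python/NumPy under our own assumptions (one interval, one target spectrum of the same length, integer shifts within ±`max_shift`); it is not the iCoShift MATLAB toolbox, which also handles interval definition, target selection and many spectra at once. The function names are hypothetical.

```python
import numpy as np


def best_shift(segment, target, max_shift):
    """Integer shift of `segment` that maximises its cross-correlation with `target`.

    Both arrays must have the same length. A positive value means the segment
    must move towards higher indices to line up with the target.
    """
    n = len(target)
    size = 2 * n                                          # zero-padding avoids wrap-around
    spec_t = np.fft.rfft(target, size)
    spec_s = np.fft.rfft(segment, size)
    xcorr = np.fft.irfft(spec_t * np.conj(spec_s), size)  # xcorr[k] = sum_i t[i+k] * s[i]
    lags = np.arange(-max_shift, max_shift + 1)
    return int(lags[np.argmax(xcorr[lags])])              # negative lags index from the end


def shift_with_nan(segment, lag):
    """Shift a 1-D array by `lag` positions, filling vacated positions with NaN."""
    out = np.full(len(segment), np.nan)
    if lag > 0:
        out[lag:] = segment[:-lag]
    elif lag < 0:
        out[:lag] = segment[-lag:]
    else:
        out[:] = segment
    return out


x = np.arange(200)
target = np.exp(-0.5 * ((x - 100) / 3.0) ** 2)    # reference peak at index 100
segment = np.exp(-0.5 * ((x - 95) / 3.0) ** 2)    # same peak, 5 variables early
lag = best_shift(segment, target, max_shift=20)
aligned = shift_with_nan(segment, lag)
print(lag, int(np.nanargmax(aligned)))            # 5 100
```

Computing the correlation for all lags through one FFT product is what makes this approach fast enough for full-resolution spectra.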
Figure 5. Example of a spectrum pre-processed with iCoShift.
Standard normal variate (SNV)
Standard normal variate (SNV) is a mathematical transformation that was applied to all FTIR spectra from
the participants. SNV is a normalization method that removes the slope variation from spectra caused by
scatter and variation in particle size (Barnes et al., 1989) [4]. It calculates the standard deviation of all the
variables for a given sample, and the entire sample is then divided by this value, giving the sample a
unit standard deviation (σ = 1). This procedure also includes a zero-order detrend (subtraction of the
individual mean value from each spectrum). The equations used by the algorithm are the mean and standard
deviation equations:
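$$ \bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} X_{i,j} $$
$$ \mathrm{SNV}_{i,j} = \frac{X_{i,j} - \bar{x}_i}{\sqrt{\dfrac{1}{n-1} \sum_{k=1}^{n} \left( X_{i,k} - \bar{x}_i \right)^2}} $$
where $n$ is the number of variables and $X_{i,j}$ is the value of the jth variable for the ith sample.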
This normalization gives more weight to values that deviate strongly from the individual sample mean than
to values near the mean. The raw FTIR spectra of 14 palm oils are presented in
Figure 6.
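Row by row, the two SNV equations translate directly into NumPy. The sketch below assumes spectra are stored as rows of a samples × variables array and uses the n − 1 denominator of the standard deviation equation; the function name `snv` and the toy spectra are our own.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)         # mean of each sample's variables
    std = X.std(axis=1, ddof=1, keepdims=True)   # standard deviation with n - 1 denominator
    return (X - mean) / std


raw = np.array([[1.0, 2.0, 3.0, 4.0],
                [2.0, 4.0, 6.0, 8.0]])   # second spectrum = first x 2 (a scatter-like effect)
print(snv(raw))                          # both rows become identical
```

Because each row is centred and scaled by its own statistics, a purely multiplicative difference between two spectra disappears after SNV.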
Figure 6. Superimposed FTIR spectra of 14 palm oils before applying any mathematical transformation
Figure 7. Superimposed FTIR spectra of 14 palm oils after applying SNV mathematical transformation
Another example of a spectrum pre-processed with SNV is presented in Figure 8.
Figure 8. Comparison of a spectrum before and after pre-processing with SNV
The effect of both mathematical transformations (iCoshift + SNV) on a raw spectrum can be seen in Figure
9.
Figure 9. Example of a spectrum pre-processed with iCoshift followed by SNV
First derivative
The first-order derivative (Osborne, Fearn & Hindle, 1993) [5] aims to resolve overlapping peaks and correct
the baseline. Differentiation pulls overlapping peaks apart, and a linear background becomes a constant
level in the first-derivative spectrum. Peak maxima become zero crossings in the first derivative.
Specifically, a forward-difference implementation of the first derivative was applied to the data.
$$ f'(x) = f(x + 1) - f(x) $$
$$ X'_{i,j} = X_{i,j+1} - X_{i,j} $$
where $X_{i,j}$ is the value of the jth variable for the ith sample.
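The forward difference corresponds to `numpy.diff` along the variable axis. The sketch below (Python/NumPy; `first_derivative` is a hypothetical helper) shows the two effects described above on toy data: a linear background becomes a constant, and a band maximum becomes a change of sign. Note that the output has one variable fewer than the input.

```python
import numpy as np


def first_derivative(spectra):
    """Forward-difference first derivative along the variable axis: X[i, j+1] - X[i, j]."""
    return np.diff(np.asarray(spectra, dtype=float), axis=1)


ramp = np.array([[1.0, 3.0, 5.0, 7.0]])        # linear background
peak = np.array([[0.0, 1.0, 3.0, 1.0, 0.0]])   # single band with its maximum at j = 2
print(first_derivative(ramp))                  # values 2, 2, 2: a constant level
print(first_derivative(peak))                  # values 1, 2, -2, -1: sign change at the maximum
```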
An example of a spectrum before and after pre-processing with the first order derivative is shown in Figure
10.
Figure 10. Comparison of a spectrum before and after pre-processing with first order derivative
The effect of the three mathematical transformations (iCoShift + SNV + First Derivative) on a raw spectrum
can be seen in Figure 11.
Figure 11. Example of a spectrum pre-processed with iCoShift followed by SNV and by the first derivative
Savitzky–Golay
Savitzky–Golay (Savitzky & Golay, 1964) [6] is a filter that can be applied to a set of data points for the
purpose of smoothing the data, that is, to increase the signal-to-noise ratio without greatly distorting the
signal. This is achieved in a process known as convolution, by fitting successive sub-sets of adjacent data
points with a low-degree polynomial by the method of linear least squares. When the data points are equally
spaced an analytical solution to the least-squares equations can be found, in the form of a single set of
"convolution coefficients" that can be applied to all data sub-sets, to give estimates of the smoothed signal (or
derivatives of the smoothed signal) at the central point of each sub-set.
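The convolution-coefficient idea can be sketched in NumPy: fit a polynomial of degree `polyorder` by least squares over a window of `window` points and keep the fitted value at the centre. The report does not state the window length or polynomial order used, so the defaults below (11 and 2) are placeholders, and the edge handling (repeating the end values) is our own choice; the function names are hypothetical. In practice `scipy.signal.savgol_filter` provides a tested implementation.

```python
import numpy as np


def savgol_coefficients(window, polyorder):
    """Convolution coefficients that return the fitted polynomial's value at the window centre."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)   # columns 1, t, t^2, ...
    # Row 0 of the least-squares pseudo-inverse maps the window to the fitted value at t = 0
    return np.linalg.pinv(design)[0]


def savgol_smooth(y, window=11, polyorder=2):
    """Smooth a 1-D signal; edges are handled by repeating the end values (an assumption)."""
    coeffs = savgol_coefficients(window, polyorder)
    half = window // 2
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    return np.convolve(padded, coeffs[::-1], mode="valid")


print(np.round(35 * savgol_coefficients(5, 2)))   # classic 5-point quadratic: -3, 12, 17, 12, -3
```

Because the coefficients depend only on the window and the polynomial order, they are computed once and applied to every window, which is what makes the filter cheap on equally spaced spectra.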
An example of a raw spectrum (before pre-processing) and the same spectrum after the application of
Savitzky-Golay filter can be seen in Figure 12.
Figure 12. Comparison of a spectrum before and after pre-processing with Savitzky-Golay smoothing.
The effect of the four mathematical transformations (iCoShift + SNV + First Derivative + Savitzky-Golay)
applied as a series of spectral filters on a raw spectrum is presented in Figure 13.
Figure 13. Example of a spectrum pre-processed with iCoShift followed by SNV, the first derivative and Savitzky-Golay
Pareto scaling
Scaling methods are data pre-processing approaches that divide each variable by a factor, the scaling
factor, which is different for each variable. They aim to adjust for differences in magnitude (fold differences)
between the variables by converting the data into differences in concentration relative to the scaling factor.
Pareto scaling uses the square root of the standard deviation, a measure of data dispersion, as the scaling
factor. Large fold changes are therefore decreased more than small fold changes, so the large fold changes are
less dominant than in the unscaled data. Furthermore, the data do not become dimensionless.
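$$ \tilde{X}_{i,j} = \frac{X_{i,j} - \bar{X}_j}{\sqrt{s_j}} $$
where $X_{i,j}$ is the value of the jth variable for the ith sample, $\bar{X}_j$ is the mean of the jth variable over all samples and $s_j$ is its standard deviation.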
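Unlike SNV, which works on each spectrum (row), Pareto scaling works on each variable (column) across samples. A minimal NumPy sketch of the equation above, assuming the sample standard deviation (n − 1 denominator) and no zero-variance variables; `pareto_scale` and the toy matrix are our own:

```python
import numpy as np


def pareto_scale(X):
    """Mean-centre each variable (column) and divide by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    s = X.std(axis=0, ddof=1)         # sample standard deviation of each variable
    return (X - mean) / np.sqrt(s)


X = np.array([[10.0, 0.1],
              [20.0, 0.3],
              [30.0, 0.2]])
Xp = pareto_scale(X)
print(X.std(axis=0, ddof=1))          # before scaling: 10 and 0.1 (ratio 100)
print(Xp.std(axis=0, ddof=1))         # after: sqrt(10) and sqrt(0.1) (ratio 10)
```

The example shows the intended effect: the spread ratio between a large and a small variable shrinks from 100 to 10 without being removed entirely, as it would be with unit-variance scaling.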
An example of a raw spectrum (before pre-processing) and the same spectrum after the application of
Pareto scaling can be seen in Figure 14.
Figure 14. Comparison of a spectrum before and after Pareto scaling
When all the filters are applied together (iCoShift, SNV, first derivative, Savitzky-Golay and Pareto scaling),
the pre-processed spectra look quite different from the raw spectra: they are shifted, normalised,
differentiated, smoothed and scaled (Figure 15).
Figure 15. Example of a spectrum pre-processed with iCoShift followed by SNV, the first derivative, Savitzky-Golay and Pareto scaling
3. SIMULATED SAMPLES
The calibration models developed in the first phase of the FAO117 project were created using different
numbers of samples per class. This imbalance was due to the re-grouping of classes at the end of project
FAO117, prompted by the similarities observed amongst the initial classes. The unequal class sizes introduce
uncertainty when the model classifies unknown samples, particularly those belonging to poorly represented
classes such as the PKO class (only 12 samples). The number of samples in each class was as
follows:
P class= 78 samples
RS class= 78 samples
PKO class= 12 samples
RSPKO class= 84 samples
RSPO class= 180 samples
PPKO class= 54 samples
Simulated samples were added to the calibration models in order to create balanced classes and avoid
any biased classification decision. Simulated samples are new samples created by offsetting the mean
spectrum of each class along the Y axis and slightly along the X axis. These samples were appended to the
calibration dataset and the model was re-trained. The offset percentage along the Y-axis varied between 0
and 25% in order to have a balanced classification model.
Table 1. Description of the offsets used to produce the simulated samples and the resulting number of samples
Class Y-axis offset X-axis offset Number of synthetic samples added Total number of samples
P 15% 1 variable to left 32 110
RS 20% 1 variable to left 42 120
PKO 25% 1 variable to left 52 64
RSPKO 15% 1 variable to left 32 116
RSPO - - 0 180
PPKO 20% 1 variable to left 42 96
For instance, for the P class, fifteen simulated samples were created above the mean spectrum in 1% steps
(15% offset). Thereafter, the resulting spectra plus the mean spectrum were shifted by one variable to the left.
In total, 32 simulated samples were added to the calibration dataset for the P class (new total = 110).
Figure 16 shows the new simulated samples compared with the mean spectrum of the class.
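The counts in Table 1 follow a simple pattern: with a maximum offset of m% in 1% steps, the m offset spectra plus the mean spectrum give m + 1 spectra, and shifting all of them by one variable doubles this to 2(m + 1), i.e. 32, 42 and 52 for 15%, 20% and 25%. That pattern implies the mean spectrum itself was counted among the added samples, which we take as an assumption. The sketch below (Python/NumPy; `simulate_class_samples` is hypothetical) also assumes that a k% Y-axis offset means multiplying the mean spectrum by (1 + k/100) and that the left shift repeats the last value to keep the spectrum length; the report does not specify either detail.

```python
import numpy as np


def simulate_class_samples(class_spectra, max_offset_pct, step_pct=1):
    """Hypothetical reconstruction of the simulated-sample recipe for one class.

    ASSUMPTIONS: a k% Y-axis offset scales the mean spectrum by (1 + k/100), and the
    one-variable left shift repeats the last value to keep the spectrum length.
    """
    mean = np.asarray(class_spectra, dtype=float).mean(axis=0)
    offsets = [mean * (1 + k / 100) for k in range(step_pct, max_offset_pct + 1, step_pct)]
    base = np.vstack([mean] + offsets)                 # mean spectrum + offset copies
    shifted = np.hstack([base[:, 1:], base[:, -1:]])   # shift one variable to the left
    return np.vstack([base, shifted])


rng = np.random.default_rng(1)
p_class = rng.random((78, 100))                        # placeholder for the 78 P-class spectra
print(simulate_class_samples(p_class, 15).shape)       # (32, 100)
```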
Figure 16. Thirty-two simulated samples added for P class (green colour)
The simulated samples for the remaining classes were created following the same procedure as for the P
class (Figures 17, 18, 19 and 20).
Figure 17. Forty-two simulated samples for RS class (green colour)
Figure 18. Fifty-two simulated samples for PKO class (green colour)
Figure 19. Thirty-two simulated samples for RSPKO class (green colour)
No simulated spectra were created for the RSPO class because of the high number of samples (180)
originally included in that class.
Figure 20. Forty-two simulated samples for PPKO class
Principal component analysis (PCA) was applied, for visualization purposes, to all the resulting FTIR
spectral data, composed of 686 samples (all samples of the calibration models plus the new simulated
samples).
Figure 21. Score plot of the first two principal components: R²X(1) = 0.597; R²X(2) = 0.279; Q²(cum) = 0.979.
(R²X: fraction of X variation modelled in the component; Q²: overall cross-validated R²X from the component)
P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil,
rapeseed and sunflower mixtures; RSPO group: RS group + P group; PPKO group: P group + PKO; RSPKO
group: RS group + PKO.
A total of 23 uncorrelated principal components were calculated. The first two principal components
accounted for 88% of the total variance, with the first and second components explaining 60% and 28% of the
total variability, respectively.
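The R²X values quoted for Figure 21 are the fractions of the total (mean-centred) X variance captured by each component. The sketch below shows how such fractions can be obtained from a singular value decomposition with NumPy; it is an illustration on random stand-in data, not the software used in the project, and it does not reproduce the cross-validated Q² value. The function name `pca_explained_variance` is hypothetical.

```python
import numpy as np

def pca_explained_variance(X: np.ndarray, n_components: int):
    """Return PCA scores and the fraction of total X variance (R2X) per component."""
    Xc = X - X.mean(axis=0)                            # mean-centre every variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # sample coordinates (scores)
    r2x = s**2 / np.sum(s**2)                          # share of variance per component
    return scores, r2x[:n_components]

# Random stand-in for 686 pre-processed spectra (not the project data).
rng = np.random.default_rng(1)
X = rng.normal(size=(686, 50))
scores, r2x = pca_explained_variance(X, 23)
first_two = r2x[:2].sum()   # analogous to the 88% reported for the real data
```

Because singular values are returned in decreasing order, the R²X values decrease from the first component onwards.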
4. PREDICTION- SCREENING STEP
4.1 INITIAL RESULTS OF THE INTER-LAB VALIDATION
A total of 126 spectra from all the participants were used for method validation. The SIMCA and PLS-DA
classification models developed in DEFRA FAO117 were validated by using them to predict the type of oil
(pure oil or oil admixture) of each sample in the inter-lab validation set. The FTIR spectral intervals used were
the same as those used for creating the calibration models: from 654.2 to 1875.4 cm-1 and from 2520 to
3120.7 cm-1 (3781 variables).
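Restricting each spectrum to the two wavenumber intervals amounts to a boolean mask on the wavenumber axis. The sketch below assumes the spectra are stored as a NumPy array with a matching wavenumber vector; the function name `select_intervals` and the toy axis with 1 cm-1 spacing are hypothetical. With the instrument's actual axis the report states that the two intervals give 3781 variables; the toy axis gives fewer.

```python
import numpy as np

# Spectral regions used for the calibration models and predictions (cm-1).
INTERVALS = [(654.2, 1875.4), (2520.0, 3120.7)]

def select_intervals(spectra, wavenumbers, intervals=INTERVALS):
    """Keep only the spectral variables whose wavenumber lies in any interval."""
    wavenumbers = np.asarray(wavenumbers)
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for low, high in intervals:
        mask |= (wavenumbers >= low) & (wavenumbers <= high)
    return spectra[:, mask], wavenumbers[mask]

# Toy axis with 1 cm-1 spacing (a real instrument axis will differ).
axis = np.arange(600.0, 4000.0, 1.0)
spectra = np.ones((3, axis.size))
X, w = select_intervals(spectra, axis)
```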
In general, PLS-DA performed better than SIMCA, with a higher correct classification rate (88.89% versus
33.33%); performance was assessed as the accuracy in predicting each class correctly. However, both
techniques (PLS-DA and SIMCA) gave a high number of false positives (28.3% and 59.8%, respectively),
which means a high risk of misclassification and would render the whole process redundant: samples that
are wrongly classified in the screening step are not referred to the second (confirmation) step. To decrease the
number of wrongly classified samples (false positives) and correspondingly increase the number of
non-classified samples, a probability threshold was calculated (see 4.2) and included in the initial
methodology. The classification results with both methods are presented in Tables 2 and 3 below.
Table 2. Classification results on the prediction of the inter-lab samples when using FTIR coupled with
PLS-DA algorithm
Pre-processing (calibration dataset): 1. SNV; 2. 1st Deriv; 3. S-Golay (7,39); 4. Pareto
Pre-processing (prediction dataset): 1. Icoshift ('average', 'whole'); 2. SNV; 3. 1st Derivative; 4. S-Golay (7,39); 5. Pareto
PLS-DA ACC (%): 88.9
Class | TP | TN | FP | FN | Sensitivity or TPR | Specificity | FPR | Precision | F1 score
P | 14 | 99 | 13 | 0 | 1.00 | 0.88 | 0.12 | 0.52 | 0.68
RS | 28 | 98 | 0 | 0 | 1.00 | 1.00 | 0.00 | 1.00 | 1.00
PKO | 14 | 111 | 1 | 0 | 1.00 | 0.99 | 0.01 | 0.93 | 0.97
RSPKO | 14 | 112 | 0 | 0 | 1.00 | 1.00 | 0.00 | 1.00 | 1.00
RSPO | 35 | 84 | 0 | 7 | 0.83 | 1.00 | 0.00 | 1.00 | 0.91
PPKO | 7 | 112 | 0 | 7 | 0.50 | 1.00 | 0.00 | 1.00 | 0.67
*ACC: accuracy; TP: true positive; TN: true negative; FP: false positive; FN: false negative, TPR: true positive rate; FPR:
false positive rate. P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower
oil, rapeseed and sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS
class + PKO.
Table 3. Classification results on the prediction of the inter-lab samples when using FTIR coupled with SIMCA
algorithm
Pre-processing (calibration dataset): 1. SNV; 2. 1st Deriv; 3. S-Golay (7,39); 4. Pareto
Pre-processing (prediction dataset): 1. Icoshift ('average', 'whole'); 2. SNV; 3. 1st Derivative; 4. S-Golay (7,39); 5. Pareto
SIMCA ACC (%): 33.3
Class | TP | TN | FP | FN | Sensitivity or TPR | Specificity | FPR | Precision | F1 score
P | 0 | 112 | 0 | 14 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
RS | 0 | 98 | 0 | 28 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
PKO | 0 | 112 | 0 | 14 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
RSPKO | 0 | 112 | 0 | 14 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
RSPO | 42 | 0 | 84 | 0 | 1.00 | 0.00 | 1.00 | 0.33 | 0.50
PPKO | 0 | 112 | 0 | 14 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
*P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and
sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO. ACC:
accuracy; TP: true positive; TN: true negative; FP: false positive; FN: false negative; TPR: true positive rate; FPR: false
positive rate.
The confusion matrices containing the actual and predicted classifications made by the two classifiers
(PLS-DA and SIMCA) are shown in Tables 4 and 5, respectively.
Table 4. Performance of the PLS-DA classification model (confusion matrix).
Actual \ Predicted (PLS-DA) | P | RS | PKO | RSPKO | RSPO | PPKO | Total | Accuracy (%)
P | 14 | 0 | 0 | 0 | 0 | 0 | 14 | 100.00
RS | 0 | 28 | 0 | 0 | 0 | 0 | 28 | 100.00
PKO | 0 | 0 | 14 | 0 | 0 | 0 | 14 | 100.00
RSPKO | 0 | 0 | 0 | 14 | 0 | 0 | 14 | 100.00
RSPO | 7 | 0 | 0 | 0 | 35 | 0 | 42 | 83.33
PPKO | 6 | 0 | 1 | 0 | 0 | 7 | 14 | 50.00
Reliability (%) | 51.85 | 100.00 | 93.33 | 100.00 | 100.00 | 100.00
The average accuracy, average reliability and overall accuracy were 88.89, 90.86 and 88.89 %,
respectively.
Table 5. Performance of the SIMCA classification model
Actual \ Predicted (SIMCA) | P | RS | PKO | RSPKO | RSPO | PPKO | Total | Accuracy (%)
P | 0 | 0 | 0 | 0 | 14 | 0 | 14 | 0.00
RS | 0 | 0 | 0 | 0 | 28 | 0 | 28 | 0.00
PKO | 0 | 0 | 0 | 0 | 14 | 0 | 14 | 0.00
RSPKO | 0 | 0 | 0 | 0 | 14 | 0 | 14 | 0.00
RSPO | 0 | 0 | 0 | 0 | 42 | 0 | 42 | 100.00
PPKO | 0 | 0 | 0 | 0 | 14 | 0 | 14 | 0.00
Reliability (%) | 0.00 | 0.00 | 0.00 | 0.00 | 33.33 | 0.00
The SIMCA algorithm performed worse than PLS-DA, with an average accuracy, average reliability and overall
prediction accuracy of 16.67%, 5.56% and 33.33%, respectively.
4.2 CALCULATION OF P-VALUES
P-values were calculated to define thresholds for normalized confidence/probability as an upper limit for
classifying a sample to each class and for sample referral to the second step or confirmation step. For this
calculation, the training dataset is used as a prediction set to our model.
P-values were calculated by first determining the degrees of freedom of the experiment:
$$df = n - 1$$
where $n$ is the number of classes (six classes, hence df = 5), and then calculating the Chi-square statistic
using the following formula:
$$\chi^2 = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$$
where $o_i$ is the observed value and $e_i$ is the expected value for class $i$.
The Chi-square probability distribution is used to find the P-value: the larger the Chi-square statistic, the
greater the difference between the observed and expected frequencies. Because the Chi-square values
obtained with the above formula were very high, the P-values were estimated from the df and Chi-square
values with a web-based Chi-square distribution calculator rather than a Chi-square distribution table.
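As an alternative to a web-based calculator, the Chi-square statistic, the per-class contributions and the upper-tail P-value can be computed directly. The sketch below uses only the Python standard library and the standard closed-form Chi-square survival function for integer degrees of freedom (`scipy.stats.chi2.sf` would give the same values). The observed and expected counts are the PLS-DA values reported in Table 6; the function names `chi2_sf` and `chi_square` are illustrative.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a Chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2*Q(sqrt(x)) + 2*phi(sqrt(x)) * sum_j x^(j - 1/2) / (2j - 1)!!
    r = math.sqrt(x)
    tail = math.erfc(r / math.sqrt(2))              # equals 2 * Q(r)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = r, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return tail + 2 * phi * total

def chi_square(observed, expected):
    """Return the per-class contributions (o - e)^2 / e and their sum."""
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return contributions, sum(contributions)

# PLS-DA counts from Table 6 (classes P, RS, PKO, RSPKO, RSPO, PPKO).
expected = [110, 120, 64, 116, 180, 96]
observed = [168, 183, 91, 80, 110, 54]
contributions, statistic = chi_square(observed, expected)
df = len(expected) - 1                      # n - 1 = 5
p_value = chi2_sf(statistic, df)
print(round(statistic, 2), p_value < 0.00001)  # 131.82 True
```

Replacing `observed` with the SIMCA counts from Table 8 (97, 87, 0, 160, 296, 46) gives the statistic of about 192.1 reported in Section 4.2.2.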
A significance level of 0.05 (5%) was selected for these experiments. This means that a result is considered
significant when, assuming the null hypothesis is true, the probability of obtaining a Chi-square statistic at
least as large as the one observed is 5% or less.
4.2.1 Results with PLS-DA
The P-value estimated by using 5 degrees of freedom and a Chi-square value of 131.82 was <0.00001 for
a dataset composed of 686 samples. The calculation of each class contribution to the Chi-square value of the
classification model developed by PLS-DA is shown in Table 6.
Table 6. Expected, observed number of samples and contribution to Chi-square for each class
Class | Expected | Test proportion | Observed | Contribution to Chi-square
P | 110 | 0.160 | 168 | 30.58
RS | 120 | 0.175 | 183 | 33.08
PKO | 64 | 0.093 | 91 | 11.39
RSPKO | 116 | 0.169 | 80 | 11.17
RSPO | 180 | 0.262 | 110 | 27.22
PPKO | 96 | 0.140 | 54 | 18.38
*P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and
sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO.
In these results, the P-value is less than the significance level of 0.05, so the null hypothesis, namely that
the experimental classes did not affect the observed results, can be rejected. It is therefore highly likely that
there is an association between the classes and the observed results.
The individual class contributions to Chi-square were used to quantify how much of the total Chi-square
statistic is attributable to each class's divergence.
Contribution to Chi-square = $(o_i - e_i)^2 / e_i$
The Chi-square statistic is the sum of these values for all classes.
Classes with a large difference between observed and expected values make a larger contribution to the
overall Chi-square statistic. The largest contribution comes from the RS class. Based on the above class
contributions to Chi-square and the resulting normalized confidence/probability, the thresholds in Table 7
were selected: the higher a class's contribution to Chi-square, the higher the threshold selected for that class.
Table 7. Thresholds for the normalized confidence/probability
P* | RS | PKO | RSPKO | RSPO | PPKO
>0.21 | >0.21 | >0.18 | >0.18 | >0.19 | >0.18
*P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and
sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO.
4.2.2 Results with SIMCA
The P-value estimated using 5 degrees of freedom and a Chi-square value of 192.1 was <0.00001 for a
dataset of 686 samples. The contribution of each class to the Chi-square value of the classification model
developed by SIMCA is shown below (Table 8). In this case, the P-value is also less than the significance level
of 0.05; therefore, the classes in this experiment had a significant effect on the results.
Table 8. Expected, observed number of samples and contribution to Chi-square
Class | Expected | Test proportion | Observed | Contribution to Chi-square
P* | 110 | 0.160 | 97 | 1.54
RS | 120 | 0.175 | 87 | 9.08
PKO | 64 | 0.093 | 0 | 64.00
RSPKO | 116 | 0.169 | 160 | 16.69
RSPO | 180 | 0.262 | 296 | 74.76
PPKO | 96 | 0.140 | 46 | 26.04
*P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and sunflower
mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO.
The RSPO class makes the greatest contribution to the high Chi-square value. Using the above class
contributions to Chi-square and the resulting normalized confidence/probability, the thresholds in Table 9
were selected, with higher thresholds for the classes with a high contribution to Chi-square.
Table 9. Thresholds for the normalized confidence/probability
P* | RS | PKO | RSPKO | RSPO | PPKO
>0.20 | >0.20 | >0.22 | >0.21 | >0.29 | >0.21
*P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower oil, rapeseed and sunflower
mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS class + PKO.
4.3 PREDICTION WITH PLS-DA
PLS-DA consists of a classical PLS regression in which the response variable is categorical (replaced by a
set of dummy variables describing the categories) and expresses the class membership of the statistical
units. PLS-DA therefore does not allow response variables other than the one defining the groups of
individuals. As a consequence, all measured variables play the same role with respect to the class
assignment. PLS components are built by seeking a compromise between two purposes: describing the set
of explanatory variables and predicting the response variables. This approach may go further than the
classical SIMCA classification method, which works more on the reassignment of units to pre-defined
classes. The PLS-DA calibration models were created in our previous DEFRA project (FAO117). Samples in
the inter-lab validation set were compared to the model and either assigned to the category being modelled or
not, on the basis of their normalised probabilities from the model and the thresholds defined by the P-values in
Section 4.2. Each of these probabilities is a negative exponential function of the distance between the test
sample and each class model. The probabilities were normalised by dividing each one by the sum of the
probabilities of that test sample over all classes, so that they sum to one.
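The assignment rule described above can be sketched as follows. The report states only that each probability is a negative exponential function of the distance to each class model and that the probabilities are normalised to sum to one; the use of exp(-d) with unit scale, the distance values and the tie-breaking rule (choose the most probable class among those above threshold) are assumptions made for illustration. The thresholds are the PLS-DA values from Table 7, and the function names are hypothetical.

```python
import math

# Thresholds for the normalised probability with PLS-DA (Table 7).
PLSDA_THRESHOLDS = {"P": 0.21, "RS": 0.21, "PKO": 0.18,
                    "RSPKO": 0.18, "RSPO": 0.19, "PPKO": 0.18}

def normalised_probabilities(distances: dict) -> dict:
    """Map class distances to probabilities that sum to one.

    Assumption: each raw probability is exp(-d); the exact distance measure
    and scaling used by the classification software are not given.
    """
    raw = {cls: math.exp(-d) for cls, d in distances.items()}
    total = sum(raw.values())
    return {cls: p / total for cls, p in raw.items()}

def assign_class(distances: dict, thresholds=PLSDA_THRESHOLDS):
    """Return the assigned class, or None (non-classified, referred to GC-FAMEs)."""
    probs = normalised_probabilities(distances)
    # Keep only classes whose normalised probability exceeds their threshold.
    passing = {c: p for c, p in probs.items() if p > thresholds[c]}
    if not passing:
        return None
    # Assumption: if several classes pass, the most probable one is chosen.
    return max(passing, key=passing.get)

# A sample much closer to the RS model than to any other class.
distances = {c: 3.0 for c in PLSDA_THRESHOLDS}
distances["RS"] = 0.1
print(assign_class(distances))  # RS
```

A sample that is equally distant from all six class models gets a normalised probability of 1/6 (about 0.167) for every class, which is below every PLS-DA threshold, so it is left non-classified and referred to the confirmation step.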
Performance of classification models (95% confidence level) was calculated using several parameters:
Sensitivity or true positive rate is the percentage of positive labelled samples that were predicted
as positive (Sensitivity = TP / (TP + FN)),
Specificity or true negative rate is the percentage of negative labelled samples that were predicted
as negative (Specificity = TN / (TN + FP)),
False positive rate (FPR) is the percentage of negative labelled samples that were incorrectly
predicted as positive (FPR = FP / (FP + TN)),
Precision is the percentage of positive predictions that are correct (Precision = TP / (TP + FP)) and,
F1 score (also F-score or F-measure) is the harmonic mean of precision and sensitivity; it reaches its
best value at 1 and its worst at 0 (F1 = 2TP / (2TP + FP + FN)).
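The five parameters listed above follow directly from the TP, TN, FP and FN counts of each class. The helper below is a hypothetical sketch that reproduces the rounded values in the tables; returning 0.0 when a denominator is zero mirrors how the tables report such cases (for example, precision for classes with no positive predictions in Table 3).

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Per-class performance parameters as defined in the list above."""
    def ratio(num, den):
        # The tables report 0.00 when a denominator is zero.
        return num / den if den else 0.0
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }

# P class, PLS-DA without thresholds (Table 2).
m = classification_metrics(tp=14, tn=99, fp=13, fn=0)
print({k: round(v, 2) for k, v in m.items()})
# {'sensitivity': 1.0, 'specificity': 0.88, 'fpr': 0.12, 'precision': 0.52, 'f1': 0.68}
```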
Inter-lab validation samples (n=126) were predicted using the calibration models developed in DEFRA
FAO117. Specific intervals of the FTIR spectra (from 654.23 to 1875.4 cm-1 and from 2520.0 to 3120.7 cm-1;
3781 variables) and two latent variables were used. The results are shown in Tables 10 and 11.
Table 10. PLS-DA model performance on the inter-lab validation set using 3781 variables
Pre-processing (calibration dataset): 1. SNV; 2. 1st Deriv; 3. S-Golay (7,39); 4. Pareto
Pre-processing (prediction dataset): 1. Icoshift ('average', 'whole'); 2. SNV; 3. 1st Derivative; 4. S-Golay (7,39); 5. Pareto
PLS-DA ACC (%): 83.33
Class | TP | TN | FP | FN | Sensitivity or TPR | Specificity | FPR | Precision | F1 score
P | 12 | 109 | 3 | 2 | 0.86 | 0.97 | 0.03 | 0.80 | 0.83
RS | 26 | 98 | 0 | 2 | 0.93 | 1.00 | 0.00 | 1.00 | 0.96
PKO | 14 | 112 | 0 | 0 | 1.00 | 1.00 | 0.00 | 1.00 | 1.00
RSPKO | 13 | 112 | 0 | 1 | 0.93 | 1.00 | 0.00 | 1.00 | 0.96
RSPO | 35 | 84 | 0 | 7 | 0.83 | 1.00 | 0.00 | 1.00 | 0.91
PPKO | 5 | 112 | 0 | 9 | 0.36 | 1.00 | 0.00 | 1.00 | 0.53
* ACC: accuracy; TP: true positive; TN: true negative; FP: false positive; FN: false negative, TPR: true positive rate; FPR:
false positive rate. P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower
oil, rapeseed and sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS
class + PKO.
The average accuracy, average reliability and overall accuracy were 81.75, 96.67 and 83.33 %,
respectively. Accuracy is the fraction of correctly classified samples with regard to all samples of that ground
truth class and reliability is the fraction of correctly classified samples with regard to all samples classified as
that class. The overall accuracy is calculated as the total number of correctly classified samples divided by
the total number of validation samples.
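The three summary figures defined above can be computed from a confusion matrix. In the sketch below rows are actual classes and an optional final column holds non-classified samples, which count towards each row total (accuracy, overall accuracy) but not towards any predicted-class column (reliability). Setting reliability to zero for a class that received no predictions matches how Tables 5 and 14 report it. The function name `confusion_summary` is hypothetical; the matrix is Table 11.

```python
import numpy as np

CLASSES = ["P", "RS", "PKO", "RSPKO", "RSPO", "PPKO"]

def confusion_summary(cm: np.ndarray):
    """cm: rows = actual classes; first len(CLASSES) columns = predicted classes,
    optional extra column = non-classified samples."""
    k = len(CLASSES)
    correct = np.diag(cm[:, :k]).astype(float)
    accuracy = correct / cm.sum(axis=1)            # per actual (ground truth) class
    col = cm[:, :k].sum(axis=0)                    # samples predicted as each class
    reliability = np.divide(correct, col, out=np.zeros(k), where=col > 0)
    overall = correct.sum() / cm.sum()
    return accuracy, reliability, overall

# Table 11 (PLS-DA after thresholds); last column = non-classified.
cm = np.array([
    [12, 0, 0, 0, 0, 0, 2],
    [0, 26, 0, 0, 0, 0, 2],
    [0, 0, 14, 0, 0, 0, 0],
    [0, 0, 0, 13, 0, 0, 1],
    [2, 0, 0, 0, 35, 0, 5],
    [1, 0, 0, 0, 0, 5, 8],
])
acc, rel, overall = confusion_summary(cm)
print(round(100 * acc.mean(), 2), round(100 * rel.mean(), 2), round(100 * overall, 2))
# 81.75 96.67 83.33
```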
The confusion table is presented below in Table 11. Twelve of the fourteen validation samples belonging to the
P group were classified as P and the remaining two were not assigned to any of the modelled groups
(non-classified). Twenty-six of the twenty-eight validation samples belonging to the RS group were classified
as RS and two were non-classified. All fourteen validation samples belonging to the PKO group were correctly
classified as PKO. Thirteen validation samples belonging to the RSPKO group were classified as RSPKO and
one sample was non-classified. Thirty-five validation samples belonging to the RSPO group were classified as
RSPO, whereas two samples were wrongly classified as P and five samples were non-classified. Five
validation samples belonging to the PPKO group were classified as PPKO, whereas one sample was wrongly
classified as P and eight samples were non-classified.
Table 11. Performance of the PLS-DA classification model after the application of thresholds
Actual \ Predicted | P | RS | PKO | RSPKO | RSPO | PPKO | Non-classified | Total | Accuracy (%)
P | 12 | 0 | 0 | 0 | 0 | 0 | 2 | 14 | 85.71
RS | 0 | 26 | 0 | 0 | 0 | 0 | 2 | 28 | 92.86
PKO | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 14 | 100.00
RSPKO | 0 | 0 | 0 | 13 | 0 | 0 | 1 | 14 | 92.86
RSPO | 2 | 0 | 0 | 0 | 35 | 0 | 5 | 42 | 83.33
PPKO | 1 | 0 | 0 | 0 | 0 | 5 | 8 | 14 | 35.71
Reliability (%) | 80.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00
A total of 18 samples were not assigned to any of our modelled groups because their normalised probability
was below the established threshold for each class (derived from the P-values). All 18 samples were referred
to the confirmation step. The three false positive samples (samples assigned to the wrong group) correspond to
an error of 2.38% (3/126) for the method when using the PLS-DA algorithm. The expected and observed
classes, together with the potential reason for each misclassification, are shown in Table 12.
Table 12. Potential reasons for the misclassification of samples
Sample | Sample name | Sample composition | Expected class | Observed class | Potential reason
1 | gmx5-a.spa | 70% RO - 30% PS | RSPO | P | It contains palm stearin, which is difficult to analyse because it solidifies quickly when placed in the non-heated ATR. Most of the participants reported that samples were solid before the end of the spectra collection. Additionally, admixtures of palm stearin and rapeseed oil were not included in our calibration models (only the palm stearin and palm oil admixture, which belongs to the P class, was included).
2 | hmx5_a.spa | 70% RO - 30% PS | RSPO | P | Same as sample 1
3 | lmx6b-a.spa | 40% PKO - 60% PO | PPKO | P | This admixture contains 60% palm oil and the model assigned it to the closest class, in this case P.
4.4 PREDICTION WITH SIMCA
SIMCA is a class-modelling technique that involves the use of principal components to model a class of
material on the basis of samples in a training (or calibration) set. SIMCA calibration models were created in
DEFRA project FAO117. Samples in the inter-lab validation set were compared to the model and either
assigned to the category being modelled or not, on the basis of their predicted distance from the model (and
P-values).
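A minimal SIMCA-style class model can be sketched as one PCA model per class, with a new sample accepted by a class when its residual (orthogonal) distance to that class's principal-component subspace is below a critical limit. This is an illustrative simplification, not the FAO117 models: the number of components, the use of the residual distance only (without the score distance) and the 95th-percentile limit taken from the training residuals are all assumptions. The class names `SimcaClass` and `simca_predict` and the synthetic data are hypothetical. Because each class is modelled independently, a sample can be accepted by several classes or by none, the latter corresponding to a non-classified sample.

```python
import numpy as np

class SimcaClass:
    """One class model: mean, PCA loadings and a residual-distance limit."""

    def __init__(self, X: np.ndarray, n_components: int = 2, quantile: float = 0.95):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_components].T            # loadings (variables x components)
        self.limit = np.quantile(self.distance(X), quantile)

    def distance(self, X: np.ndarray) -> np.ndarray:
        """Orthogonal distance of each row of X to the class PC subspace."""
        Xc = np.atleast_2d(X) - self.mean
        residual = Xc - Xc @ self.P @ self.P.T
        return np.linalg.norm(residual, axis=1)

    def accepts(self, X: np.ndarray) -> np.ndarray:
        return self.distance(X) <= self.limit

def simca_predict(models: dict, x: np.ndarray) -> list:
    """Return every class whose model accepts x (may be empty or several)."""
    return [name for name, m in models.items() if m.accepts(x)[0]]

# Two synthetic, well-separated classes (stand-ins for spectral classes).
rng = np.random.default_rng(2)
a = rng.normal(0.0, 0.1, size=(40, 20))
b = rng.normal(5.0, 0.1, size=(40, 20))
models = {"A": SimcaClass(a), "B": SimcaClass(b)}
print(simca_predict(models, np.full(20, 5.0)))  # ['B']
```

A sample lying between the two classes, such as `np.full(20, 2.5)`, is rejected by both models and would be left non-classified.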
Performance of the classification models (95% confidence level) was calculated using several parameters:
sensitivity (the percentage of positive labelled samples that were predicted as positive), specificity (the
percentage of negative labelled samples that were predicted as negative), false positive rate (FPR) (the
percentage of negative labelled samples that were incorrectly predicted as positive), precision (the percentage
of positive predictions that are correct) and F1 score (the harmonic mean of precision and sensitivity).
Inter-lab validation samples (n=126) were predicted using the calibration models. Specific intervals of the
FTIR spectra (from 654.23 to 1875.4 cm-1 and from 2520.0 to 3120.7 cm-1; 3781 variables) were used.
The results are shown in Tables 13 and 14.
Table 13. SIMCA model performance on the inter-lab validation set using 3781 variables
Pre-processing (calibration dataset): 1. SNV; 2. 1st Deriv; 3. S-Golay (7,39); 4. Pareto
Pre-processing (prediction dataset): 1. Icoshift ('average', 'whole'); 2. SNV; 3. 1st Deriv; 4. S-Golay (7,39); 5. Pareto
SIMCA ACC (%): 3.96
Class | TP | TN | FP | FN | Sensitivity or TPR | Specificity | FPR | Precision | F1 score
P | 0 | 112 | 0 | 14 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
RS | 0 | 98 | 0 | 28 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
PKO | 0 | 112 | 0 | 14 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
RSPKO | 0 | 112 | 0 | 14 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
RSPO | 5 | 82 | 2 | 37 | 0.12 | 0.98 | 0.02 | 0.71 | 0.20
PPKO | 0 | 112 | 0 | 14 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00
* ACC: accuracy; TP: true positive; TN: true negative; FP: false positive; FN: false negative, TPR: true positive rate; FPR:
false positive rate. P class: palm oil, palm stearin, palm olein; PKO class: palm kernel oil; RS class: rapeseed oil, sunflower
oil, rapeseed and sunflower mixtures; RSPO class: RS class + P class; PPKO class: P group + PKO; RSPKO class: RS
class + PKO.
The confusion table is presented below in Table 14. One of the fourteen validation samples belonging to the P
group was classified as RSPO and the remaining 13 samples were not assigned to any of the modelled groups
(non-classified). None of the validation samples belonging to the RS, RSPKO and PPKO groups were
assigned to any of the modelled groups (non-classified). Five validation samples belonging to the RSPO group
were correctly classified as RSPO, whereas thirty-seven samples were non-classified. One validation sample
belonging to the PKO group was classified as RSPO, whereas thirteen samples were non-classified.
Table 14. Performance of the SIMCA classification model after the application of thresholds
Actual \ Predicted | P | RS | PKO | RSPKO | RSPO | PPKO | Non-classified | Total | Accuracy (%)
P | 0 | 0 | 0 | 0 | 1 | 0 | 13 | 14 | 0.00
RS | 0 | 0 | 0 | 0 | 0 | 0 | 28 | 28 | 0.00
PKO | 0 | 0 | 0 | 0 | 1 | 0 | 13 | 14 | 0.00
RSPKO | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 14 | 0.00
RSPO | 0 | 0 | 0 | 0 | 5 | 0 | 37 | 42 | 11.90
PPKO | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 14 | 0.00
Reliability (%) | 0.00 | 0.00 | 0.00 | 0.00 | 71.43 | 0.00
*P group: palm oil, palm stearin, palm olein; PKO group: palm kernel oil; RS group: rapeseed oil, sunflower oil, rapeseed
and sunflower mixtures; RSPO group: RS group + P group; PPKO group: P group + PKO; RSPKO group: RS group + PKO
A total of 119 samples were not assigned to any of our modelled groups because their normalised probability
was below the established threshold for each class (derived from the P-values). All 119 samples were referred
to the confirmation step. Overall, the performance of the SIMCA algorithm is poor, with an average accuracy,
average reliability and overall prediction accuracy of 1.98%, 11.90% and 3.97%, respectively. Only five
samples belonging to the RSPO group were correctly classified as RSPO. All the other samples were either
wrongly classified (two false positive samples) or non-classified (n=119). The two false positive samples
(samples assigned to the wrong group) correspond to an error of 1.59% (2/126) for the method when using the
SIMCA algorithm. The expected and observed classes, together with the potential reason for each
misclassification, are shown in Table 15.
Table 15. Potential reasons for the misclassification of samples
Sample | Sample name | Sample composition | Expected class | Observed class | Potential reason
1 | lpx3a-a.spa | 100% PKO | PKO | RSPO | The RSPO class includes a wide variety of pure oils and oil admixtures
2 | lpx1a-a.spa | 100% PO | P | RSPO | Same as sample 1
4.5 SAMPLE REFERRAL TO THE CONFIRMATION STEP (GC-FAMEs)
A total of 18 samples were referred to the confirmation step when using the PLS-DA algorithm and 119
samples when using the SIMCA algorithm (Tables 16 and 17).
Table 16. Samples referred to the confirmation step when using FTIR coupled with PLS-DA algorithm.
File Name | Actual class
apx1-a.spa | P
emx6b_a.spa | PPKO
fmx6-a.spa | PPKO
gmx6-a.spa | PPKO
hmx6_a.spa | PPKO
kmx6-a.spa | PPKO
kmx5-a.spa | RSPO
lmx9b-a.spa | RSPO
lmx5b-a.spa | RSPO
lpx2b-a.spa | RS
lmx6c-a.spa | PPKO
lmx7c-a.spa | RSPKO
nmx6-a.spa | PPKO
npx1-a.spa | P
omx6-a.spa | PPKO
omx9-a.spa | RSPO
omx5-a.spa | RSPO
opx2-a.spa | RS
Only the PLS-DA classification is taken forward in the project, although the less satisfactory SIMCA
classification results are also presented here.
Table 17. Samples referred to the confirmation step when using FTIR coupled with SIMCA.
a/a File Name Actual class a/a File Name Actual class
1 apx1-a.spa P 43 gmx4-a.spa RSPO
2 amx2-a.spa RS 44 hpx3_a.spa PKO
3 amx3-a.spa PKO 45 hmx6_a.spa PPKO
4 amx4-a.spa RSPO 46 hpx1-a.spa P
5 amx5-a.spa RSPO 47 hmx9_a.spa RSPO
6 amx6-a.spa PPKO 48 hpx2_2.spa RS
7 amx7-a.spa RSPKO 49 hmx7_a.spa RSPKO
8 amx8-a.spa RS 50 hmx8_a.spa RS
9 epx3a_a.spa PKO 51 hmx4_a.spa RSPO
10 emx6a_a.spa PPKO 52 kpx3-a.spa PKO
11 epx1a_a.spa P 53 kmx6-a.spa PPKO
12 emx9a_a.spa RSPO 54 kpx1-a.spa P
13 emx5a_a.spa RSPO 55 kpx2-a.spa RS
14 epx2a_a.spa RS 56 kmx7-a.spa RSPKO
15 emx7a_a.spa RSPKO 57 kmx8-a.spa RS
16 emx8a_a.spa RS 58 kmx4-a.spa RSPO
17 emx4a_a.spa RSPO 59 kmx9-a.spa RSPO
18 epx3b_a.spa PKO 60 lmx6a-a.spa PPKO
19 emx6b_a.spa PPKO 61 lmx9a-a.spa RSPO
20 epx1b_a.spa P 62 lmx5a-a.spa RSPO
21 emx9b_a.spa RSPO 63 lpx2a-a.spa RS
22 emx5b_a.spa RSPO 64 lmx7a-a.spa RSPKO
23 epx2b_a.spa RS 65 lmx8a-a.spa RS
24 emx7b_a.spa RSPKO 66 lmx4a-a.spa RSPO
25 emx8b_a.spa RS 67 lpx3b-a.spa PKO
26 emx4b_a.spa RSPO 68 lmx6b-a.spa PPKO
27 fpx3-a.spa PKO 69 lpx1b-a.spa P
28 fmx6-a.spa PPKO 70 lmx9b-a.spa RSPO
29 fpx1-a.spa P 71 lmx5b-a.spa RSPO
30 fmx9-a.spa RSPO 72 lpx2b-a.spa RS
31 fmx5-a.spa RSPO 73 lmx7b-a.spa RSPKO
32 fpx2-a.spa RS 74 lmx8b-a.spa RS
33 fmx7-a.spa RSPKO 75 lmx4b-a.spa RSPO
34 fmx8-a.spa RS 76 lpx3c-a.spa PKO
35 fpx4-a.spa RSPO 77 lmx6c-a.spa PPKO
36 gpx3-a.spa PKO 78 lpx1c-a.spa P
37 gmx6-a.spa PPKO 79 lmx9c-a.spa RSPO
38 gpx1-a.spa P 80 lmx5c-a.spa RSPO
39 gmx9-a.spa RSPO 81 lpx2c-a.spa RS
40 gpx2-a.spa RS 82 lmx7c-a.spa RSPKO
41 gmx7-a.spa RSPKO 83 lmx8c-a.spa RS
42 gmx8-a.spa RS 84 lmx4c-a.spa RSPO
85 npx3-a.spa PKO
86 nmx6-a.spa PPKO
87 npx1-a.spa P
88 nmx9-a.spa RSPO
89 nmx5-a.spa RSPO
90 npx2-a.spa RS
91 nmx7-a.spa RSPKO
92 nmx8-a.spa RS
93 nmx4-a.spa RSPO
94 opx3-a.spa PKO
95 omx6-a.spa PPKO
96 opx1-a.spa P
97 omx9-a.spa RSPO
98 opx2-a.spa RS
99 omx7-a.spa RSPKO
100 omx8-a.spa RS
101 omx4-a.spa RSPO
102 ppx3-a.spa PKO
103 pmx6-a.spa PPKO
104 ppx1-a.spa P
105 pmx9-a.spa RSPO
106 pmx5-a.spa RSPO
107 ppx2-a.spa RS
108 pmx7-a.spa RSPKO
109 pmx8-a.spa RS
110 pmx4-a.spa RSPO
111 palm kernel oil-a.spa PKO
112 40 pko + 60 po -a.spa PPKO
113 palm oil-a.spa P
114 70 pol + 30 ro -a.spa RSPO
115 70 ro + 30 ps -a.spa RSPO
116 rapeseed oil-a.spa RS
117 50 ro + 50 pko -a.spa RSPKO
118 40 ro + 60 so -a.spa RS
119 50 ro + 50 po -a.spa RSPO
5. PREDICTION- CONFIRMATION STEP
Individual fatty acid concentrations were calculated using the internal standard method, as in phase 1 of the
FAO117 project. Response factors were calculated from the external fatty acid standards relative to C17:0,
which was used as the internal standard. The peak area of each fatty acid was divided by the peak area of the
internal standard, multiplied by the internal standard concentration and by the corresponding response factor,
and then corrected for sample weight and dilution factors. Duplicate analyses were then averaged. The fatty
acid concentrations of all the oil samples included in this validation trial are presented in Table 18.
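The internal standard calculation described above can be sketched as follows. The report does not give the units or the exact form of the weight and dilution correction, so the parameter names and the correction used here (internal standard concentration in mg/mL of the injected solution, multiplied by a dilution factor and a solution volume, divided by the sample mass in g) are assumptions, as is the common response-factor definition relative to C17:0. All numbers are hypothetical.

```python
from statistics import mean

def response_factor(area_fa_std, area_is_std, conc_fa_std, conc_is_std):
    """Response factor of a fatty acid relative to the C17:0 internal standard,
    from an external standard mixture (common definition, assumed here)."""
    return (area_is_std * conc_fa_std) / (area_fa_std * conc_is_std)

def fatty_acid_content(area_fa, area_is, is_conc_mg_ml, rf,
                       dilution_factor, volume_ml, sample_mass_g):
    """Fatty acid content in mg FA per g oil (hypothetical helper)."""
    conc_injected = (area_fa / area_is) * is_conc_mg_ml * rf   # mg FA per mL injected
    conc_original = conc_injected * dilution_factor            # undo the dilution
    return conc_original * volume_ml / sample_mass_g           # mg FA per g oil

# Hypothetical standard and duplicate sample injections.
rf = response_factor(area_fa_std=900.0, area_is_std=1000.0,
                     conc_fa_std=0.5, conc_is_std=0.5)
duplicates = [fatty_acid_content(a, 1000.0, 0.5, rf, 10, 1.0, 0.1)
              for a in (900.0, 920.0)]
content = mean(duplicates)   # duplicate analyses averaged
```

The structure (area ratio, times internal standard concentration, times response factor, then weight and dilution corrections, then averaging of duplicates) follows the text; only the units and the form of the final correction are assumed.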
A total of 18 samples from all participants were referred to the confirmation step when using the PLS-DA
algorithm. Across all participants, these corresponded to six different samples:
Sample 1: Palm oil (P class)
Sample 2: Rapeseed oil (RS group)
Sample 5: Rapeseed oil (70%) + Palm stearin (30%) (RSPO class)
Sample 6: Palm kernel oil (40%) + Palm oil (60%) (PPKO class)
Sample 7: Rapeseed oil (50%) + Palm kernel oil (50%) (RSPKO class)
Sample 9: Palm olein (70%) + Rapeseed oil (30%) (RSPO class)
These six (6) samples were analysed chromatographically to determine their fatty acid profile according to
the SOPs and all FA contents (mg fatty acid / gram oil blend) and the P/S ratios were calculated.
With the application of the FA criteria (Table 19), the following classification results were obtained:
Sample 1 was not assigned to any of the classes
Sample 2 was found to belong to the RS class
Sample 5 was found to belong to the RSPO class
Sample 6 was found to belong to the PPKO class
Sample 7 was found to belong to the RSPKO class
Sample 9 was found to belong to the RSPO class
At the end of the procedure, 16 of the 18 samples were correctly classified, whereas two samples were not
assigned to any class because they did not fully meet the criteria of any class.
A total of 119 samples were referred to the confirmation step when using the SIMCA algorithm. Across all
participants, these corresponded to nine different samples:
Sample 1: Palm oil (P class)
Sample 2: Rapeseed oil (RS group)
Sample 3: Palm kernel oil (PKO class)
Sample 4: Rapeseed oil (50%) + Palm oil (50%) (RSPO class)
Sample 5: Rapeseed oil (70%) + Palm stearin (30%) (RSPO class)
Sample 6: Palm kernel oil (40%) + Palm oil (60%) (PPKO class)
Sample 7: Rapeseed oil (50%) + Palm kernel oil (50%) (RSPKO class)
Sample 9: Palm olein (70%) + Rapeseed oil (30%) (RSPO class)
These nine (9) samples were analysed chromatographically to determine their fatty acid profile according
to the SOPs and all FA contents (mg fatty acid / gram oil blend) and the P/S ratios were calculated.
With the application of the FA criteria (Table 19), the following classification results were obtained:
Sample 1 was not assigned to any of the classes
Sample 2 was found to belong to the RS class
Sample 3 was not assigned to any of the classes
Sample 4 was found to belong to the RSPO class
Sample 5 was found to belong to the RSPO class
Sample 6 was found to belong to the PPKO class
Sample 7 was found to belong to the RSPKO class
Sample 9 was found to belong to the RSPO class
At the end of the procedure, 93 of the 119 samples were correctly classified, whereas 26 samples were not
assigned to any class because they did not fully meet the criteria of any class.
Table 18. Content (mg/g) of fatty acids of interest for all oil samples included in the inter-lab validation.
Fatty acid content (mg FA/g oil)
Sample | Class | C8:0 Caprylic acid | C12:0 Lauric acid | C14:0 Myristic acid | C16:0 Palmitic acid | C18:1 Oleic acid | C18:2 c n6 Linoleic acid | P/S ratio
100% PO | P group | 0.081 | 1.233 | 4.618 | 268.228 | 336.052 | 98.829 | 0.325
100% RO | RS group | 0.000 | 0.000 | 0.158 | 24.627 | 361.206 | 165.410 | 6.275
40% RO+60% SO | RS group | 0.000 | 0.000 | 0.248 | 34.679 | 280.455 | 384.626 | 6.792
100% PKO | PKO group | 17.698 | 214.702 | 75.010 | 54.168 | 106.653 | 21.056 | 0.045
40% PKO+60% PO | PPKO group | 5.061 | 75.942 | 30.996 | 158.751 | 199.496 | 54.047 | 0.189
50% RO+50% PO | RSPO group | 0.031 | 0.578 | 2.562 | 152.631 | 309.400 | 99.498 | 0.776
70% RO+30% PS | RSPO group | 0.000 | 0.181 | 1.725 | 128.258 | 345.890 | 130.745 | 1.299
70% POL+30% RO | RSPO group | 0.047 | 0.894 | 3.270 | 170.749 | 320.622 | 104.774 | 0.650
50% RO+50% PKO | RSPKO group | 8.418 | 100.836 | 37.843 | 41.922 | 239.518 | 90.102 | 0.603
*FA: fatty acid; P/S ratio: polyunsaturated FAs/Saturated FAs; P group: palm oil, palm stearin, palm olein; PKO group: palm
kernel oil; RS group: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO group: RS group + P group;
PPKO group: P group + PKO; RSPKO group: RS group + PKO; PO: palm oil; RO: rapeseed oil; PKO: palm kernel oil; PS:
palm stearin; POL: palm olein.
Table 19. Classification criteria of fatty acids for every one of the 6 classes developed from control in-house oil
admixtures (DEFRA FAO117).
Specific FA P group PKO group RS group PPKO group RSPO group RSPKO group
C8:0 >20 >2.5 >2.5
C12:0 >0.99 >300 <0.1
C14:0 7.8-10.0 <0.7
C16:0 315-490 >=70 58-330 35-70
C18:1 >=195
C18:2c n6 43-80 135-550 25-75 70-425 24-450
P/S ratio <0.25 <0.04 >4.0 <=0.3 >=0.325
*FA: fatty acid; P/S ratio: polyunsaturated FAs/Saturated FAs; P group: palm oil, palm stearin, palm olein; PKO group: palm
kernel oil; RS group: rapeseed oil, sunflower oil, rapeseed and sunflower mixtures; RSPO group: RS group + P group;
PPKO group: P group + PKO; RSPKO group: RS group + PKO
Appendix V – Fatty acid Inter-Lab trial
Sample 1: Standard Soya-Maize oil blend
FATTY ACIDS  BCR-162R IRMM  LAB 1 (%)  LAB 2 (%)  LAB 3 (%)  LAB 4 (%)  RSD
C6:0 0.00 0.01 2.00
C8:0 0.00 0.00 0.01 2.00
C10:0 0.00 <0.1 0.01 1.76
C12:0 0.00 <0.1 0.01 1.76
C14:0 0.04 <0.1 0.05 0.10 0.41
C15:0 0.00 0.03 2.00
C16:0 10.74 11.18 10.90 10.69 11.00 0.02
C16:1c 0.06 0.12 0.20 0.92
C17:0 0.07 0.07 0.10 0.71
C17:1c 0.03 0.08 1.43
C18:0 2.82 3.27 2.90 2.84 2.90 0.07
C18:1t 0.00 <0.1 0.03 0.10 0.87
C18:1c 25.40 28.58 26.70 26.71 26.60 0.04
C18:2t 0.16 <0.1 0.46 0.50 0.67
C18:2c 54.13 52.13 55.30 53.86 53.60 0.02
C20:0 0.27 0.40 0.40 0.40 0.17
C18:3c6,9,12 0.16 0.01 1.58
C20:1c 0.35 0.30 0.35 0.30 0.09
C18:3c9,12,15 3.35 3.28 3.60 3.75 3.30 0.07
C20:2c 0.02 0.03 1.17
C22:0 0.28 <0.1 0.29 0.30 0.39
C23:0 0.00 0.00
C24:0 0.12 0.17 0.10 0.73
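For the five fatty acids in Sample 1 that carry a BCR-162R value, agreement with the reference material can be summarised as the relative difference between the laboratory results and that value. The Python sketch below does this for the mean of Labs 1 to 4. The helper name `relative_bias`, and the choice to compare the four-laboratory mean rather than each laboratory separately, are illustrative assumptions and not the evaluation used in the trial.

```python
from statistics import mean

# Sample 1 rows that carry a BCR-162R value (all figures in %).
bcr_162r = {
    "C16:0": 10.74, "C18:0": 2.82, "C18:1c": 25.40,
    "C18:2c": 54.13, "C18:3c9,12,15": 3.35,
}
# Corresponding results from Labs 1-4 (%).
labs_1_to_4 = {
    "C16:0": [11.18, 10.90, 10.69, 11.00],
    "C18:0": [3.27, 2.90, 2.84, 2.90],
    "C18:1c": [28.58, 26.70, 26.71, 26.60],
    "C18:2c": [52.13, 55.30, 53.86, 53.60],
    "C18:3c9,12,15": [3.28, 3.60, 3.75, 3.30],
}


def relative_bias(results, reference):
    """Relative difference (%) of the mean result from the reference value."""
    return 100.0 * (mean(results) - reference) / reference


for fa, ref in bcr_162r.items():
    results = labs_1_to_4[fa]
    print(f"{fa:<14} mean {mean(results):6.2f}  ref {ref:6.2f}  "
          f"bias {relative_bias(results, ref):+5.1f} %")
```

A positive value means the laboratories on average reported more of that fatty acid than the reference value.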
Sample 2: Palm oil and shea butter admixture (50% palm oil + 50% shea butter)
FATTY ACIDS     LAB 1 (%)  LAB 2 (%)  LAB 3 (%)  LAB 4 (%)  RSD
C6:0            0.00       -          0.01       -          2.00
C8:0            0.00       0.00       0.01       -          2.00
C10:0           0.00       <0.1       0.01       -          1.76
C12:0           0.13       <0.1       0.13       0.20       0.31
C14:0           0.58       0.50       0.49       0.50       0.08
C15:0           0.03       -          0.03       -          1.17
C16:0           24.35      24.50      24.02      24.60      0.01
C16:1c          0.06       -          0.10       0.10       0.73
C17:0           0.07       -          0.08       0.10       0.69
C17:1c          0.00       -          0.02       -          2.00
C18:0           24.24      22.80      22.65      22.70      0.03
C18:1t          0.00       <0.1       0.05       0.10       0.76
C18:1c          41.68      43.00      42.64      42.20      0.01
C18:2t          0.07       <0.1       0.14       0.20       0.44
C18:2c          7.49       8.20       7.98       7.90       0.04
C20:0           0.85       0.90       0.93       0.90       0.04
C18:3c6,9,12    0.00       -          0.03       -          2.00
C20:1c          0.21       <0.1       0.26       0.30       0.40
C18:3c9,12,15   0.12       <0.1       0.21       0.20       0.35
C20:2c          0.00       -          0.01       -          2.00
C22:0           0.08       <0.1       0.10       0.10       0.11
C23:0           0.00       -          -          -          -
C24:0           0.05       -          0.08       0.10       0.77
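The appendix does not define the RSD column. Its figures are consistent with the sample standard deviation of the four laboratory results divided by their mean, expressed as a fraction rather than a percentage, with cells that have no reported value counted as zero. This is an inference from the numbers, not a definition given in the report. The Python sketch below reproduces three Sample 2 entries under that assumption (the helper name `rsd` is illustrative). It also accounts for the recurring value of 2.00: when only one of four laboratories reports a non-zero result and the other three entries are zero, the standard deviation is exactly twice the mean. Rows containing "<0.1" entries are not covered by this sketch.

```python
from statistics import mean, stdev


def rsd(results):
    """Relative standard deviation as a fraction: sample SD / mean."""
    return stdev(results) / mean(results)


# Sample 2, C16:0: Labs 1-4 all reported (%).
print(round(rsd([24.35, 24.50, 24.02, 24.60]), 2))  # 0.01

# Sample 2, C16:1c: Lab 2 reported no value, counted here as 0.
print(round(rsd([0.06, 0.0, 0.10, 0.10]), 2))  # 0.73

# One non-zero result among four laboratories gives SD / mean = 2.
print(round(rsd([0.00, 0.0, 0.01, 0.0]), 2))  # 2.0
```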
Sample 3: Palm oil and rapeseed oil admixture (65% palm oil + 35% rapeseed oil)
FATTY ACIDS     LAB 1 (%)  LAB 2 (%)  LAB 3 (%)  LAB 4 (%)  RSD
C6:0            0.00       -          0.01       -          2.00
C8:0            0.00       0.00       0.01       -          2.00
C10:0           0.00       <0.1       0.01       -          1.76
C12:0           0.05       <0.1       0.07       0.10       0.28
C14:0           0.73       0.70       0.63       0.70       0.06
C15:0           0.03       -          0.05       -          1.23
C16:0           30.22      30.10      29.53      30.20      0.01
C16:1c          0.12       -          0.23       0.20       0.74
C17:0           0.06       -          0.08       0.10       0.71
C17:1c          0.00       -          0.06       -          2.00
C18:0           3.66       3.50       3.45       3.50       0.03
C18:1t          0.00       <0.1       0.05       0.10       0.76
C18:1c          49.02      48.90      48.46      47.90      0.01
C18:2t          0.24       <0.1       0.38       0.50       0.57
C18:2c          12.42      13.30      12.89      12.70      0.03
C20:0           0.38       0.40       0.45       0.40       0.07
C18:3c6,9,12    0.16       -          0.01       -          1.57
C20:1c          0.59       0.50       0.51       0.50       0.08
C18:3c9,12,15   2.13       2.60       2.82       2.30       0.13
C20:2c          0.00       -          0.02       -          2.00
C22:0           0.13       <0.1       0.16       0.10       0.23
C23:0           0.00       -          -          -          -
C24:0           0.06       -          0.10       0.10       0.73
Sample 4: Palm kernel oil and palm oil admixture (42% palm kernel oil + 58% palm oil)
FATTY ACIDS     LAB 1 (%)  LAB 2 (%)  LAB 3 (%)  LAB 4 (%)  RSD
C6:0            0.03       -          0.05       0.10       0.94
C8:0            0.74       1.40       0.95       1.10       0.26
C10:0           1.03       1.40       1.10       1.30       0.14
C12:0           18.14      19.20      17.01      18.70      0.05
C14:0           7.74       7.00       6.67       7.10       0.06
C14:1           -          -          -          0.10       2.00
C15:0           0.03       -          0.03       -          1.15
C16:0           29.09      28.80      28.93      28.50      0.01
C16:1c          0.06       -          0.13       0.10       0.77
C17:0           0.06       -          0.07       0.10       0.73
C17:1c          0.00       -          0.02       -          2.00
C18:0           3.91       3.70       3.77       3.60       0.03
C18:1t          0.00       <0.1       0.04       -          1.33
C18:1c          32.46      32.20      33.79      31.90      0.03
C18:2t          0.09       <0.1       0.15       0.10       0.24
C18:2c          6.09       6.30       6.46       6.00       0.03
C20:0           0.25       <0.1       0.32       0.30       0.41
C18:3c6,9,12    0.00       -          0.01       -          2.00
C20:1c          0.12       <0.1       0.17       0.10       0.26
C18:3c9,12,15   0.08       <0.1       0.18       0.10       0.37
C20:2c          0.00       -          0.01       -          2.00
C22:0           0.04       <0.1       0.06       0.10       0.42
C23:0           0.00       -          -          -          -
C24:0           0.04       -          0.07       0.10       0.82
Sample 5: Coconut oil and palm oil admixture (58% coconut oil + 42% palm oil)
FATTY ACIDS     LAB 1 (%)  LAB 2 (%)  LAB 3 (%)  LAB 4 (%)  RSD
C6:0            0.11       -          0.18       0.20       0.74
C8:0            2.44       4.10       2.83       3.20       0.23
C10:0           2.76       3.40       2.76       3.10       0.10
C12:0           26.88      27.50      25.06      26.80      0.04
C14:0           12.73      11.30      11.26      11.50      0.06
C15:0           0.00       -          0.03       -          2.00
C16:0           24.85      24.30      25.46      24.60      0.02
C16:1c          0.05       -          0.10       0.10       0.77
C17:0           0.04       -          0.05       0.10       0.86
C17:1c          0.00       -          0.01       -          2.00
C18:0           3.94       3.70       3.88       3.70       0.03
C18:1t          0.00       <0.1       0.03       -          1.44
C18:1c          20.69      20.60      22.23      20.60      0.04
C18:2t          0.08       <0.1       0.15       0.30       0.63
C18:2c          5.05       5.10       5.41       5.00       0.04
C20:0           0.19       <0.1       0.23       0.20       0.31
C18:3c6,9,12    0.00       -          0.01       -          2.00
C20:1c          0.07       <0.1       0.08       0.10       0.14
C18:3c9,12,15   0.07       <0.1       0.12       0.10       0.22
C20:2c          0.00       -          0.01       -          2.00
C22:0           0.02       <0.1       0.04       -          1.05
C23:0           0.00       -          -          -          -
C24:0           0.03       -          0.06       0.10       0.93
Sample 6: Soybean oil and palm oil admixture (59% soybean oil + 41% palm oil)
FATTY ACIDS     LAB 1 (%)  LAB 2 (%)  LAB 3 (%)  LAB 4 (%)  RSD
C6:0            0.00       -          0.01       -          2.00
C8:0            0.00       0.00       0.01       -          2.00
C10:0           0.00       <0.1       0.01       -          1.76
C12:0           0.05       <0.1       0.05       0.10       0.38
C14:0           0.58       0.50       0.47       0.50       0.09
C15:0           0.02       -          0.04       -          1.24
C16:0           26.22      24.80      24.40      25.10      0.03
C16:1c          0.07       -          0.15       0.10       0.77
C17:0           0.08       -          0.09       0.10       0.68
C17:1c          0.01       -          0.06       -          1.56
C18:0           4.87       4.30       4.24       4.30       0.07
C18:1t          0.00       <0.1       0.05       0.10       0.76
C18:1c          31.02      30.30      29.99      29.80      0.02
C18:2t          0.11       <0.1       0.29       0.50       0.75
C18:2c          32.47      35.90      34.96      34.30      0.04
C20:0           0.27       0.40       0.38       0.40       0.17
C18:3c6,9,12    0.13       -          0.04       -          1.17
C20:1c          0.26       <0.1       0.25       0.30       0.39
C18:3c9,12,15   3.46       3.90       4.01       3.50       0.07
C20:2c          0.00       -          0.03       -          2.00
C22:0           0.26       <0.1       0.32       0.30       0.41
C23:0           0.02       -          -          -          2.00
C24:0           0.08       -          0.13       0.10       0.71
Sample 7: Palm oil
FATTY ACIDS
LAB 1 (%) LAB 2 (%) LAB 3 (%) LAB 4 (%) RSD
C6:0 0.00
0.01
2.00
C8:0 0.00 0.00 0.01
2.00
C10:0 0.00 <0.1 0.01
1.76
C12:0 0.09 <0.1 0.09 0.10 0.04
C14:0 1.12 1.00 0.93 1.00 0.08
C15:0 0.04
0.05 0.10 0.84
C16:0 43.82 43.60 42.56 43.20 0.01
C16:1c 0.10
0.19 0.20 0.77
C17:0 0.08
0.09 0.10 0.68
C17:1c 0.00
0.02
2.00
C18:0 4.70 4.50 4.41 4.40 0.03
C18:1t 0.00 <0.1 0.05 0.10 0.76
C18:1c 39.96 40.40 40.34 39.60 0.01
C18:2t 0.15 <0.1 0.41 0.40 0.62
C18:2c 9.26 10.20 9.85 9.50 0.04
C20:0 0.33 0.40 0.41 0.40 0.10
C18:3c6,9,12 0.03
0.01
1.07
C20:1c 0.14 <0.1 0.14 0.20 0.29
C18:3c9,12,15 0.10 <0.1 0.24 0.10 0.53
C20:2c 0.00
0.01
2.00
C22:0 0.05 <0.1 0.07 0.10 0.32
C23:0 0.00
C24:0 0.04
0.08 0.10 0.79
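Because each laboratory reports its fatty acids as percentages, a quick consistency check is that a single laboratory's results for one sample sum to about 100 %. The Python sketch below applies this check to the Lab 1 results for Sample 7. Two things in it are assumptions for illustration, not criteria from the trial: that the results are normalised to total fatty acids, and the 0.5 percentage-point tolerance (`TOLERANCE` and `closes_to_100` are hypothetical names). Columns that contain "<0.1" entries or blanks would need a stated convention for those cells before they could be checked the same way.

```python
# Lab 1 results for Sample 7 (palm oil), in %, in the order of the table.
lab1_sample7 = {
    "C6:0": 0.00, "C8:0": 0.00, "C10:0": 0.00, "C12:0": 0.09,
    "C14:0": 1.12, "C15:0": 0.04, "C16:0": 43.82, "C16:1c": 0.10,
    "C17:0": 0.08, "C17:1c": 0.00, "C18:0": 4.70, "C18:1t": 0.00,
    "C18:1c": 39.96, "C18:2t": 0.15, "C18:2c": 9.26, "C20:0": 0.33,
    "C18:3c6,9,12": 0.03, "C20:1c": 0.14, "C18:3c9,12,15": 0.10,
    "C20:2c": 0.00, "C22:0": 0.05, "C23:0": 0.00, "C24:0": 0.04,
}

TOLERANCE = 0.5  # hypothetical acceptance window, in percentage points


def closes_to_100(profile, tolerance=TOLERANCE):
    """Return the summed percentages and whether they lie within tolerance of 100."""
    total = sum(profile.values())
    return total, abs(total - 100.0) <= tolerance


total, ok = closes_to_100(lab1_sample7)
print(f"Lab 1, Sample 7: total = {total:.2f} %, within tolerance: {ok}")
```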
Sample 8: Standard cocoa butter
FATTY ACIDS     LAB 1 (%)  LAB 2 (%)  LAB 3 (%)  LAB 4 (%)  RSD
C6:0            0.00       -          0.01       -          2.00
C8:0            0.00       0.00       0.01       -          2.00
C10:0           0.00       <0.1       0.01       -          1.76
C12:0           0.00       <0.1       0.01       -          1.76
C14:0           0.08       <0.1       0.08       0.10       0.10
C15:0           0.02       -          0.04       -          1.32
C16:0           24.95      25.80      25.08      25.40      0.02
C16:1c          0.16       -          0.27       0.20       0.72
C17:0           0.21       -          0.25       0.30       0.69
C17:1c          0.00       -          0.03       -          2.00
C18:0           38.52      36.70      36.60      36.00      0.03
C18:1t          0.00       <0.1       0.02       -          1.57
C18:1c          32.03      33.50      33.00      33.20      0.02
C18:2t          0.00       <0.1       0.01       -          1.76
C18:2c          2.63       2.90       2.86       2.80       0.04
C20:0           1.04       1.10       1.12       1.10       0.03
C18:3c6,9,12    0.00       -          0.02       -          2.00
C20:1c          0.03       <0.1       0.07       0.10       0.44
C18:3c9,12,15   0.13       <0.1       0.18       0.20       0.30
C20:2c          0.00       -          0.01       -          2.00
C22:0           0.15       <0.1       0.20       0.20       0.30
C23:0           0.00       -          -          -          -
C24:0           0.05       -          0.12       0.10       0.77