Development of Sensory Testing

1.0 DEVELOPMENT OF SENSORY TESTING IN FOOD INDUSTRY

1.1 Introduction:

Sensory tests have been conducted for as long as human beings have evaluated the quality of their surroundings. Formal sensory analysis began in wartime, with efforts to provide acceptable food to soldiers. Sensory testing is valuable because it helps to determine a product's acceptability in the marketplace. The three principal uses of sensory techniques are quality control, product development, and research. The primary function of sensory testing is to conduct valid and reliable tests that provide data.

There are many kinds of sensory tests, which can be classified into two major groups: analytical tests and affective tests. Analytical tests can be divided into overall difference tests, attribute difference tests, and descriptive tests. Affective tests are based on consumer testing.

There are many types of difference test. A Triangle Test is a sensory test used to determine whether a difference exists between products. Such differences may arise from ingredients, processing, or packaging. The test involves presenting three samples and asking which sample is different. In any type of test, leaving room for panelists to make comments is also beneficial, because comments can sometimes explain their choices. A Two-out-of-Five Test is similar to the Triangle Test: panelists are given five samples and asked to pick the two that are similar in characteristics.

In Multiple Paired Comparison Tests, panelists are asked to taste two samples and rate an attribute such as saltiness. The panelists may be asked to mark the sample that is the most or least salty. This test involves a number of sample pairs.

In a ranking test, panelists are asked to rank samples in order of an attribute that the samples possess (or lack). Ranking samples of apples by crispness (from most crisp to mushy) is an example of a ranking test. Ranking the brown color of various French fries after deep-fat frying (different types of potatoes may brown to different intensities) is another example of a Difference Ranking Test.

Descriptive analysis methods involve the detection (discrimination) and the description of both the qualitative and quantitative sensory aspects of a product by trained panels of 5 to 100 judges (subjects). The attributes of a food or product are identified and quantified using human subjects who have been trained for this purpose. Descriptive analysis is appropriate when detailed information is required on individual characteristics of the product, the material, or both. Descriptive tests can provide information that cannot be obtained by other analytical means. The analysis can include all parameters of the product, or it can be limited to certain aspects.

Smaller panels of five to ten subjects are used for the typical product on the grocery shelf, whereas larger panels are used for mass-produced products such as beers and soft drinks, where small differences can be very important. Panelists must be able to detect and describe the perceived sensory attributes of a sample. In addition, panelists must learn to differentiate and rate the quantitative (intensity) aspects of a sample, and to define the degree to which each characteristic or qualitative note is present in that sample. Panelists must be screened and qualified to participate, and must maintain their skills.

The qualitative aspects of a product combine to define the product and include all of the appearance, aroma, flavor, texture, and sound properties that differentiate it from others. The goal of descriptive analysis is to provide a quantitative specification of the important sensory aspects of a product. Descriptive tests are used to obtain detailed descriptions of the aroma, flavor, and oral texture of foods and beverages, the skin-feel of personal care products, the hand-feel of fabrics and paper products, and the appearance and sound of any product.

Qualitative factors include the terms that define the sensory profile or picture of the sample. Three types of scale are used: category scales, line scales, and magnitude estimation. The order of appearance of physical properties related to oral, skin, and fabric textures is generally predetermined by the way the product is handled (the input of forces by the panelist). In addition to detecting and describing the qualitative, quantitative, and time factors that define the sensory characteristics of a product, panelists are capable of, and management is often interested in, some integrated assessment of the product properties. This overall impression includes the total intensity of aroma or flavor; the balance or blend (amplitude) of the aroma; the overall difference of the sample; and hedonic ratings.

2.0 OVERALL DIFFERENCE TESTS

2.1 Triangle test

The objective of the triangle test is to determine whether a sensory difference exists between two products. The method is particularly useful when a treatment may have produced changes in a product that cannot be characterized simply by one or two attributes. Statistically, the triangle test is more efficient than the paired comparison and duo-trio methods, but it has limited use with products that involve sensory fatigue, carryover, or adaptation. It also has limited use with subjects who find testing three samples confusing.

Despite these limits, the triangle test is effective in certain situations: first, to determine whether product differences result from a change in ingredients, processing, packaging, or storage; second, to determine whether an overall difference exists when no specific attribute can be identified as having been affected; and third, to select and monitor panelists for their ability to discriminate given differences.

In this test, each panelist is presented with three coded samples. Two of the three samples are identical and one is different (odd). Panelists taste, feel, or examine each sample in order from left to right and identify the odd sample. The number of panelists who identify it correctly is counted, and the result is interpreted by reference to a statistical table.
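The table lookup can also be reproduced with an exact binomial calculation, because a panelist who cannot tell the samples apart picks the odd one with probability 1/3. The following is a minimal Python sketch under that assumption; the function names, the 5% significance level, and the panel size of 12 are illustrative choices rather than anything prescribed by the source.

```python
from math import comb


def binomial_tail(correct: int, n: int, p_guess: float) -> float:
    """Probability of at least `correct` right answers out of n by guessing alone."""
    return sum(
        comb(n, k) * p_guess**k * (1 - p_guess) ** (n - k)
        for k in range(correct, n + 1)
    )


def critical_number(n: int, p_guess: float = 1 / 3, alpha: float = 0.05) -> int:
    """Smallest number of correct answers that is significant at level alpha."""
    for correct in range(n + 1):
        if binomial_tail(correct, n, p_guess) <= alpha:
            return correct
    raise ValueError("no number of correct answers reaches significance")


# Hypothetical panel of 12 assessors.
needed = critical_number(12)
print(f"At least {needed} of 12 correct answers are needed at the 5% level")
```

The same two functions could serve other forced-choice tests by changing `p_guess`, which is why the guessing probability is a parameter rather than a constant.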

Generally, 20 to 40 panelists are needed for a triangle test. When differences are large and easy to identify, as few as 12 panelists can be used. A similarity test, on the other hand, requires 50 to 100 panelists. Panelists must be familiar with the triangle test format and procedure and with the product being tested, because flavour memory is important in this test. An orientation session is recommended before panelists take the test, and care must be taken that the information given is instructive and motivating without biasing the panelists.

Two things need to be controlled in this test: the test area and the preparation of the samples. The lighting in the tasting area must be controlled to reduce any colour variables, and the samples should be prepared under optimum conditions for the product type being tested. Questions about acceptance, preference, degree of difference, or type of difference should not be asked after the initial selection of the odd sample, because this can bias the panelists' responses.

2.2 Two-out-of-Five Test

The Two-out-of-Five Test is statistically very efficient compared with the triangle test, because the chance of guessing correctly is only 1 in 10, compared with 1 in 3 for the triangle test. However, the test is so strongly affected by sensory fatigue and memory effects that its principal use is in visual, auditory, and tactile applications rather than in flavor testing.
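The 1-in-10 figure follows from counting: there are 10 ways to choose two samples out of five, and only one of those choices is right. A small Python sketch of this counting argument is given here; the helper name is hypothetical.

```python
from fractions import Fraction
from math import comb


def chance_of_correct_guess(samples_presented: int, samples_to_pick: int) -> Fraction:
    """Chance of a fully correct answer when the panelist guesses at random."""
    return Fraction(1, comb(samples_presented, samples_to_pick))


print(chance_of_correct_guess(3, 1))  # triangle test: pick the one odd sample
print(chance_of_correct_guess(5, 2))  # two-out-of-five test: pick the odd pair
```

A lower guessing probability means fewer correct answers are needed before chance becomes an implausible explanation, which is the sense in which the test is statistically efficient.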

This method is used when the objective of the test is to determine whether a sensory difference exists between two samples, and particularly when the number of subjects is small (for example, ten persons).

The Two-out-of-Five Test is effective in certain situations: to determine whether a product difference is caused by a change in ingredients, processing, packaging, or storage; to determine whether an overall difference exists when no specific attribute can be identified as having been affected; and to select and monitor panelists for their ability to discriminate given differences in test situations.

In this method, each panelist is presented with five samples. Two of the five samples belong to one type, and the other three belong to another type. The samples are tasted, viewed, felt, or otherwise examined in order from left to right, and the panelist identifies the two samples that differ from the other three. Trained panelists are needed for this test. Generally, 10 to 20 panelists are used; when the differences are large and easy to identify, 5 or 6 panelists are enough.

2.3 Same/Different Test (or Simple Difference Test)

The same/different test is also known as the simple difference test. It is used when the objective of the test is to determine whether a sensory difference exists between two products, and especially when the samples are not suitable for triple or multiple presentation, that is, when the triangle and duo-trio tests cannot be used. Examples include comparisons between samples with a strong or lingering flavour, samples that must be applied to the skin in half-face tests, and samples so complex that they are mentally confusing to the panelist.

Like the other tests, the same/different test is effective in certain situations: to determine whether a product difference results from a change in ingredients, processing, packaging, or storage, and to determine whether an overall difference exists when no specific attribute can be identified as having been affected. This test is more time-consuming than the other tests, because information about the difference between the products is obtained by comparing responses to the different pairs (A/B and B/A) with responses to the matched pairs (A/A and B/B).

In this method, each panelist is presented with two samples and asked to identify whether they are the same or different. Half of the pairs consist of two different samples, and the other half consist of the same sample presented twice. Generally, 20 to 50 presentations of each of the four sample combinations (A/A, B/B, A/B, and B/A) are required to determine differences. Up to 200 panelists can be used, each receiving one pair, or 100 panelists can each receive two of the pairs. If the same/different test was chosen because of the complexity of the stimuli, subjects should not be presented with more than one pair of samples at a time. The panelists can be trained or untrained, but a panel should not mix trained and untrained subjects. The results are analyzed by comparing the responses for the different pairs using the χ² test.
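One way to set up the χ² comparison is a 2x2 table of pair served (matched or unmatched) against answer given (same or different). The Python sketch below computes the Pearson statistic for such a table; the layout, the function names, the absence of a continuity correction, and the counts are all assumptions made for illustration, not data or procedure from the source.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table.

    table[i][j] = number of responses; rows are the pair served
    (matched, unmatched), columns are the answer given (same, different).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic


def p_value_one_df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(statistic / 2))


# Made-up counts for illustration only.
counts = [
    [17, 13],  # matched pairs (A/A, B/B): answered "same", "different"
    [9, 21],   # unmatched pairs (A/B, B/A): answered "same", "different"
]
stat = chi_square_2x2(counts)
print(f"chi-square = {stat:.2f}, p = {p_value_one_df(stat):.3f}")
```

If panelists answer "different" clearly more often for unmatched pairs than for matched pairs, the statistic is large and the p-value small, which is taken as evidence that the two products differ.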

2.4 “A” - “Not A” Test

Like the same/different test, this test is used when the objective is to determine whether a sensory difference exists between two products, and especially when the samples are not suitable for dual or triple presentation, that is, when the triangle and duo-trio tests cannot be used. Examples include comparisons between samples with a strong or lingering flavour, samples that must be applied to the skin in half-face tests, and samples so complex that they are mentally confusing to the panelist.

The “A” - “not A” test is used in preference to the same/different test when one of the two products has importance as a standard or reference product, is familiar to the panelists, or is essential to the project as the current sample against which all others are measured. The “A” - “not A” test is effective in the same situations as the same/different test. It is also very useful for screening panelists, and it can be used to determine sensory thresholds by the Signal Detection Method. The principle of the test is first to familiarize the panelists with samples “A” and “not A”. Each panelist is then presented with samples, some of which are product “A” and others “not A”, and identifies each sample as “A” or “not A”. The χ² test is used to compare the correct identifications with the incorrect ones in order to determine the subjects' ability to discriminate.

This test requires 10 to 50 panelists trained to recognize the “A” and “not A” samples, with 20 to 50 presentations of each sample in the study. Each panelist may receive only one sample (either “A” or “not A”), two samples (one “A” and one “not A”), or more than 10 samples in a series. The number of samples that may be presented to a subject is determined by the degree of physical or mental fatigue the samples produce. In the standard version of the procedure, the following protocol must be observed: products “A” and “not A” are available to panelists only until the start of the test; only one “not A” sample exists for each test; and equal numbers of “A” and “not A” are presented in each test. This protocol may be changed for any given test, but the panelists must be informed of the changes.

2.5 Duo-trio test

Application and importance:

The duo-trio test (ISO 2004a) is statistically less efficient than the triangle test, because the chance of obtaining a correct result by guessing is 1 in 2. However, the test is simple and easily understood. Its advantage is that a reference sample is presented, which avoids confusion about what constitutes a difference. Its disadvantage is that three samples, rather than two, must be tasted. This method is used when the test objective is to determine whether a sensory difference exists between two samples. It is particularly useful for determining whether product differences result from a change in ingredients, processing, packaging, or storage. It is also used to determine whether an overall difference exists when no specific attribute can be identified as having been affected.

The duo-trio test has general application whenever more than 15, and preferably more than 30, test subjects are available. The test exists in two forms: the constant reference mode and the balanced reference mode. In the constant reference mode, the same sample, usually drawn from regular production, is always the reference; in the balanced reference mode, both of the samples being compared are used at random as the reference. Use the constant reference mode with trained subjects whenever a product well known to them can be used as the reference. Use the balanced reference mode when both samples are unknown or when untrained subjects are used. The duo-trio test is less suitable than the paired comparison test if there are pronounced aftertastes.

Principle of the Test:

An identified reference sample is presented to the subject, followed by two coded samples, one of which matches the reference. The subject indicates which coded sample matches the reference. The number of correct replies is counted and interpreted by reference to the table of Critical Number of Correct Responses.

Test Subjects:

The minimum number of subjects for this test is 16, but with fewer than 28 subjects the beta error is high. Discrimination is improved if 32, 40, or more subjects can be employed. At a minimum, subjects need to be familiarized with the product characteristics and the test procedure. Subjects are not given specific information about the samples, to avoid bias.

Method:

Control of lighting may be necessary to reduce colour variables, and samples need to be prepared and presented under optimum conditions for the product being inspected. Samples are offered simultaneously if possible, or else sequentially. Equal numbers of the possible combinations are prepared, and the sets are allocated at random among the subjects. The same score sheet is used in the balanced reference and constant reference modes, and space for several duo-trio tests may be provided on it. The number of correct responses and the total number of responses are referred to the table of Critical Number of Correct Responses. “No difference” responses are not counted; subjects must guess when in doubt.
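Preparing equal numbers of each combination and allocating them at random can be sketched in a few lines of Python. The version below is an illustration only: it assumes the two products are labelled A and B, that A is the reference in the constant reference mode, and that the panel size is a multiple of the number of combinations; the function name is hypothetical.

```python
import random


def duo_trio_serving_plan(n_subjects, balanced, seed=None):
    """Return one (reference, first coded, second coded) set per subject.

    Constant reference mode uses only reference A; balanced mode uses A and B
    as reference equally often. Sets are shuffled before allocation.
    """
    references = ["A", "B"] if balanced else ["A"]
    combinations = [
        (ref, first, second)
        for ref in references
        for first, second in (("A", "B"), ("B", "A"))
    ]
    if n_subjects % len(combinations) != 0:
        raise ValueError("number of subjects must be a multiple of the number of combinations")
    plan = combinations * (n_subjects // len(combinations))
    random.Random(seed).shuffle(plan)
    return plan


# Hypothetical panel of 32 subjects in balanced reference mode.
for subject, serving in enumerate(duo_trio_serving_plan(32, balanced=True, seed=1), start=1):
    print(subject, serving)
```

Building the full list first and shuffling it afterwards keeps the combinations exactly balanced, which drawing a random combination for each subject would not guarantee.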

Usage:

As an example, consider a food manufacturer (for instance, of a chocolate blend) that needs to replace the cocoa beans currently used in its product. The food analyst tries to determine which type of cocoa bean can best replace the current one by testing for similarity between the current blend and each candidate blend. The test shows whether there is a significant difference, or sufficient similarity, between the original blend and the substitute blend.

2.6 Difference-from-control test:

This test is used when the test objective is twofold: to determine whether a difference exists between one or more samples and a control, and to estimate the size of any such difference. One sample is designated the “control”, “reference”, or “standard”, and all other samples are evaluated with respect to how different each is from that control. The test is useful in situations where a difference may be detectable but the size of the difference affects the decision about the test objective. It is appropriate when the duo-trio and triangle tests cannot be used because of the normal heterogeneity of food products. It can also be used as a two-sample test in situations where multiple-sample tests are inappropriate because of fatigue and carryover effects.

Principle of the Test:

Each subject is presented with a control sample plus one or more test samples and rates the size of the difference between each sample and the control on a scale provided for this purpose. Subjects are told that some of the test samples may be the same as the control. The resulting mean difference-from-control estimates are evaluated by comparing them with the difference-from-control obtained for the blind controls. The estimate obtained from the blind controls provides a measure of the placebo effect.
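The comparison against the blind control can be illustrated with a short Python sketch. It is descriptive only: it computes the mean rating per sample and subtracts the blind-control mean as the placebo effect, and it does not test significance. The ratings, the 0 to 10 scale, and the function name are made up for illustration.

```python
from statistics import mean


def mean_difference_from_control(ratings):
    """Mean rating per sample, plus each mean corrected for the blind control.

    ratings maps a sample name to the list of difference-from-control ratings
    it received; the key "blind control" holds ratings of the hidden control.
    """
    means = {name: mean(values) for name, values in ratings.items()}
    placebo = means["blind control"]
    corrected = {
        name: value - placebo for name, value in means.items() if name != "blind control"
    }
    return means, corrected


# Made-up ratings on a hypothetical 0 (no difference) to 10 (extreme) scale.
example = {
    "blind control": [1, 2, 0, 1],
    "test sample 1": [3, 4, 2, 3],
    "test sample 2": [6, 5, 7, 6],
}
means, corrected = mean_difference_from_control(example)
print("mean ratings:", means)
print("corrected for placebo effect:", corrected)
```

A non-zero mean for the blind control shows how much difference subjects report even when none exists, so only the excess over that value is attributed to the test samples.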

Test subjects:

Generally, 20 to 50 presentations of each of the samples and of the blind control, each with the labelled control, are required to determine a degree of difference. When the difference-from-control test is chosen because of a complex comparison or fatigue factor, no more than one pair of samples should be given to a subject at a time. The panel may consist of trained or untrained panellists, but not a mixture of both. The subjects need to be familiar with the test format, the meaning of the scale, and the fact that a proportion of the test samples will be blind controls.

Method:

The test controls and product controls for this test are the same as for the triangle test and the duo-trio test. The samples are presented simultaneously if possible, with the labelled control evaluated first. One labelled control sample is prepared, and the other sample is presented as the test sample. If every sample is to be evaluated by all subjects but this cannot be done in one test session, a record of subjects by sample must be kept to ensure that the remaining samples are presented in subsequent sessions.

Usage:

As an example, the test can be used to measure perceived differences between batches of a food, such as a flavoured peanut snack. A test method is developed that is suitable for monitoring batch-to-batch variation in the production of the flavoured peanut snacks (for example, spicy and barbecue flavours). In such a test, subjects detect batch-to-batch differences, which allows the variations among the flavoured peanut snacks to be separated.

2.7 Sequential test


Sequential tests are meant to economize on the number of evaluations required to draw a conclusion, for example, acceptance vs. rejection of a trainee on a panel, or shipment vs. destruction of a lot of produced goods. Because the alpha and beta errors are decided beforehand, sequential tests provide a direct approach to testing simultaneously for either a difference or a similarity between two samples.

Sequential tests are practical and efficient because they take into account the possibility that the evidence from the first few evaluations is sufficient to reach a conclusion, so that further testing would be a waste of time and money. They can reduce the number of evaluations by as much as 50%. They may be used with any existence-of-difference test in which there is a correct and an incorrect answer.

Principle:

A sequence of evaluations is conducted according to the procedure appropriate for the chosen method, and the results are entered on a test graph. Three regions are identified on the graph: the acceptance region, the rejection region, and the continue-testing region. The number of trials is plotted on the horizontal (x) axis and the total number of correct responses on the vertical (y) axis. The result of the first test is entered, and for each succeeding test x is increased by 1 and y is increased by 1 for a correct reply or by 0 for an incorrect reply. Testing continues until a point touches or crosses one of the lines bordering the region of indecision. The conclusion is then read from the region of the graph that the point has entered.
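The source does not say how the boundary lines are constructed. One common construction is Wald's sequential probability ratio test, in which the two straight lines on the graph correspond to fixed thresholds on a running log-likelihood ratio. The Python sketch below follows that approach as an assumption; `p0` (proportion correct when there is no difference), `p1` (proportion correct regarded as a real difference), and the error levels are values the analyst chooses beforehand, and the numbers used here are hypothetical.

```python
from math import log


def sequential_decision(responses, p0, p1, alpha, beta):
    """Apply Wald's sequential test to a stream of correct/incorrect replies.

    responses: iterable of booleans (True = correct reply).
    Returns (decision, number of trials used).
    """
    upper = log((1 - beta) / alpha)  # crossing it: declare a difference
    lower = log(beta / (1 - alpha))  # crossing it: declare similarity
    log_ratio = 0.0
    n = 0
    for n, correct in enumerate(responses, start=1):
        if correct:
            log_ratio += log(p1 / p0)
        else:
            log_ratio += log((1 - p1) / (1 - p0))
        if log_ratio >= upper:
            return "different", n
        if log_ratio <= lower:
            return "similar", n
    return "continue testing", n


# Hypothetical duo-trio series: chance level 0.5, alternative 0.8, 10% error rates.
replies = [True, True, False, True, True, True, True, True]
print(sequential_decision(replies, p0=0.5, p1=0.8, alpha=0.1, beta=0.1))
```

Tracking the log-likelihood ratio is equivalent to plotting cumulative correct replies against trials and checking the point against two parallel lines, so the function gives the same stop-or-continue answer as reading the graph would.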

Usage:

A sequential test allows a conclusion to be drawn by plotting the results on a graph. An example is a sequential duo-trio test of warmed-over flavour in beef patties. The aim is to determine whether a difference can be detected between patties stored for 1 day, 3 days, and 5 days and freshly grilled patties. A preliminary duo-trio test shows that the 5-day patties have a strong warmed-over flavour and the 1-day patties have none, so a sequential design is appropriate: the decision for these two samples could be reached after only a few responses. As each subject completes one test, the result is added to the previous responses and the cumulative results are plotted. The test series continues until the stored sample is declared similar to or different from the control.

3.0 ATTRIBUTE DIFFERENCE TESTS

Attribute difference tests measure a single attribute, such as sweetness, comparing one sample with one or several others. The lack of a difference between samples with regard to one attribute does not imply that no overall difference exists. Attribute difference tests involving two samples are simple with regard to test design and statistical treatment; the main difficulty is determining whether the test situation is one-sided or two-sided. With more than two samples, some designs can be analyzed by the analysis of variance, whereas others require specialized statistics. The degree of complexity increases rapidly with the number of samples, as does the economy of testing that is possible with improved test designs.

This section describes the various paired tests, followed by the multisample tests and their designs.

3.1 Directional Difference Test: Comparing Two Samples

DEFINITION This method is also called the paired comparison test or the 2-AFC (2-alternative forced choice) test. It is one of the simplest and most widely used sensory tests, and it is often used first to determine whether other, more sophisticated tests should be applied.

PURPOSE/USAGE Use this method when the test objective is to determine in which way a particular sensory characteristic differs between two samples.

APPLICATION, TOOLS AND TECHNIQUE INVOLVED

The number of respondents required for the test is affected by:

1) Whether the test is one-sided or two-sided

2) The values chosen for the test-sensitivity parameters.

Each subject is presented with two coded samples. Equal numbers of the combinations AB and BA are prepared and allotted at random among the subjects. The subject is asked to taste the products from left to right and fill in the scoresheet. The subject must be clearly informed whether “no difference” verdicts are permitted.

Only the “forced choice” technique is amenable to formal statistical analysis. However, in some cases subjects may object quite strenuously to inventing a difference when none is perceived. If “no difference” verdicts are permitted, the sensory analyst must then decide whether to divide those scores evenly over the two samples or to ignore them.

In the test procedure, equal numbers of the combinations AB and BA are prepared and the sets are allocated at random among the subjects. The scoresheet is the same whether the test is one-sided or two-sided, but it must show whether “no difference” verdicts are permitted. Space for several successive paired comparisons may be provided on a single scoresheet, but supplemental questions should not be added, because these may introduce bias.

To analyze the results, count the number of responses of interest. In a one-sided test, count the number of correct responses, that is, the responses in the direction of interest. In a two-sided test, count the number of agreeing responses citing the sample that was chosen more frequently.
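The practical effect of the one-sided versus two-sided distinction can be shown with an exact binomial calculation, since each subject has a 1/2 chance of choosing either sample when no difference is perceived. The Python sketch below is illustrative; the function name and the result of 9 out of 10 are assumptions, not data from the source.

```python
from math import comb


def paired_comparison_p_value(count: int, n: int, two_sided: bool) -> float:
    """Exact binomial p-value for a paired comparison with guessing chance 1/2.

    One-sided: `count` is the number of responses in the direction of interest.
    Two-sided: `count` is the number of responses for the more frequently
    chosen sample, and the upper-tail probability is doubled.
    """
    upper_tail = sum(comb(n, k) for k in range(count, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail) if two_sided else upper_tail


# Hypothetical result: 9 of 10 subjects choose the same sample.
print(paired_comparison_p_value(9, 10, two_sided=False))
print(paired_comparison_p_value(9, 10, two_sided=True))
```

Because the two-sided p-value is twice the one-sided value, the same count is weaker evidence when the direction of the difference was not specified in advance, which is why more respondents are needed for a two-sided test.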

IMPLICATION AND IMPORTANCE

The test can be conducted with subjects who have received a minimum of training; it is sufficient that subjects are completely familiar with the attribute under test. When a test is particularly important, such as one for an off-flavor in a product already on the market, highly trained subjects who have shown special acuity for the attribute may be selected. Because the chance of guessing correctly is 50%, fairly large numbers of test subjects are required.

3.2 Pairwise Ranking Test: Friedman Analysis

Comparing Several Samples in All Possible Pairs

PURPOSE/USAGE

This method is used when the test objective is to compare several samples for a single attribute, such as sweetness, freshness, or preference. The test is particularly useful for sets of three to six samples that are to be evaluated by a relatively inexperienced panel. It arranges the samples on a scale of intensity of the chosen attribute and provides a numerical indication of the differences between samples and of the significance of those differences.

APPLICATION, TOOLS AND TECHNIQUE INVOLVED

The principle of the test is to present each subject with one pair of samples at a time, in random order, together with a question such as “Which sample is sweeter?” (or fresher, or preferred). This continues until each subject has evaluated all possible pairs that can be formed from the samples. The results are evaluated with Friedman statistical analysis.
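With t samples there are t(t-1)/2 possible pairs, so four samples give six pairs per subject. The Python sketch below generates such a schedule, randomizing both the order of the pairs and the left/right position within each pair; the sample codes, the seed, and the function name are hypothetical, and the Friedman analysis itself is not shown.

```python
import random
from itertools import combinations


def pair_schedule(samples, rng):
    """All possible pairs of samples, in random order and random left/right position."""
    pairs = [tuple(rng.sample(pair, 2)) for pair in combinations(samples, 2)]
    rng.shuffle(pairs)
    return pairs


rng = random.Random(7)  # fixed seed so the illustration is repeatable
samples = ["A", "B", "C", "D"]  # hypothetical sample codes
for subject in range(1, 4):
    print(subject, pair_schedule(samples, rng))
```

The number of pairs grows quickly with the number of samples (15 pairs for six samples), which is consistent with the recommendation to use this test for small sets.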

Test subjects should be selected, trained, and instructed as described for the other tests. Use no fewer than 10 subjects; discrimination is much improved if 20 or more can be used. Ascertain that subjects can recognize the attribute of interest by training them with various pairs of known intensity difference in the attribute. Depending on the test objective, subjects may be required who have proven ability to detect small differences in the attribute.

The test controls and product controls are the same as those stated for the preceding tests.

3.3 Multisample Difference Tests

There are several types of multisample difference test:

1. Multisample Difference Test: Rating Approach - Evaluation by Analysis of Variance
2. Multisample Difference Test: BIB Ranking Test (Balanced Incomplete Block Design) - Friedman Analysis
3. Multisample Difference Test: BIB Rating Test (Balanced Incomplete Block Design) - Evaluation by Analysis of Variance

3.3.1 Multisample Difference Test: Rating Approach - Evaluation by Analysis of Variance

The rating approach is used when the test objective is to determine in which way a particular sensory attribute varies over a number of t samples, where t may vary from 3 to 6, or at most 8, and it is possible to compare all t samples as one large set. Subjects rate the intensity of the selected attribute on a numerical intensity scale, for example a category scale. The results are evaluated by the analysis of variance.

Each subject receives the set of t samples in balanced, randomized order, and the task is to rate each sample using the specified scale. The set may be presented once only, or several times with different coding. Accuracy is much improved if the set can be presented two or more times. If more than one attribute is to be rated, theoretically the samples should be presented separately for each attribute.

As an example, consider the hop character in five beers. A brewer is producing a new brand of beer that is to have a high level of hop character. He is brewing with five alternative lots of hops that cost $1.00, $1.20, $1.40, $1.60, and $1.80/lb. The project objective is to choose the lot that gives the most hop character for the money, while the test objective is to compare the resulting five beers for degree of hop character in a way that also gives a measure of the reliability of the results. Twenty subjects evaluate the samples on a scale of 0-9. The order of presentation is randomized, and the samples are presented on three separate occasions with different coding.
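The source does not spell out the analysis of variance for this example. One plausible layout, assumed here, treats each subject as a block and tests whether the sample means differ by more than the residual variation. The Python sketch below computes the F statistic for a single replication of that layout; the ratings are made up (three subjects, three samples) to keep the arithmetic easy to follow, and the function name is hypothetical.

```python
def randomized_block_anova(ratings):
    """F statistic for samples, with subjects treated as blocks.

    ratings[i][j] is the rating subject i gave to sample j (one rating each).
    """
    n_subjects = len(ratings)
    n_samples = len(ratings[0])
    n_total = n_subjects * n_samples
    grand_mean = sum(sum(row) for row in ratings) / n_total

    sample_means = [sum(row[j] for row in ratings) / n_subjects for j in range(n_samples)]
    subject_means = [sum(row) / n_samples for row in ratings]

    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_samples = n_subjects * sum((m - grand_mean) ** 2 for m in sample_means)
    ss_subjects = n_samples * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_error = ss_total - ss_samples - ss_subjects

    df_samples = n_samples - 1
    df_error = (n_samples - 1) * (n_subjects - 1)
    f_statistic = (ss_samples / df_samples) / (ss_error / df_error)
    return f_statistic, df_samples, df_error


# Made-up ratings on a 0-9 scale: rows are subjects, columns are samples.
toy_ratings = [
    [2, 4, 6],
    [3, 5, 7],
    [1, 3, 8],
]
f_value, df1, df2 = randomized_block_anova(toy_ratings)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

Removing the subject-to-subject variation from the error term is what makes this layout more sensitive than a simple one-way comparison, since panelists often use different parts of the scale. A full analysis of the beer example would also need to account for the three repeated sessions.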

3.3.2 Multisample Difference Test: BIB Ranking Test (Balanced Incomplete Block Design) - Friedman Analysis

The BIB ranking test is used when the test objective is to determine in which way a particular sensory attribute varies over a number of samples and there are too many samples to evaluate at any one time. Typically, the method is used when the number of samples to be compared is from 6 to 12, or at most 16. Ranking is chosen when the panelists are relatively untrained for the type of sample or when relatively simple statistical analysis is preferred. Subjects are asked to rank the samples according to the attribute of interest.

For example, consider the species of fish. A military field

ration, XPQ-6 (fish fingers in aspic), has been prepared in the past from 15

different species of fish. The project objective is to compare the 15 species

such that quantitative information on the degree of fishy flavor is obtained,

while the test objective is to compare fish fingers produced from the 15

species for degree of fishy flavor. A randomly selected group of 105 enlisted

personnel is randomly divided into 35 groups of three subjects each. A

scoresheet is prepared that asks each subject to rank his three samples

according to fishy flavor, from least (=1) to most (=3).
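
The numbers in this example can be checked against the standard balanced incomplete block identities, r = bk/t and lambda = r(k - 1)/(t - 1). The sketch below assumes t = 15 samples, k = 3 samples per block and b = 35 blocks, one block per group of three subjects. That is a reading of the example rather than something it states outright, and the helper name `bib_parameters` is hypothetical.

```python
from fractions import Fraction


def bib_parameters(t, k, b):
    """Return (r, lam) for a balanced incomplete block design.

    t: number of samples, k: samples per block, b: number of blocks.
    r is the number of times each sample is evaluated; lam is the number
    of blocks in which any given pair of samples appears together.
    Raises ValueError when r or lam is not a whole number, in which case
    no balanced design exists for these values.
    """
    r = Fraction(b * k, t)
    lam = r * (k - 1) / (t - 1)
    if r.denominator != 1 or lam.denominator != 1:
        raise ValueError("no balanced design with these parameters")
    return int(r), int(lam)


# Assumed reading of the fish example: 15 species, 3 per block, 35 blocks.
r, lam = bib_parameters(t=15, k=3, b=35)
```

Whole-number values of r and lam are a necessary condition for balance, not a guarantee that a design exists; in practice the block layout is taken from a published table of designs.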

3.3.3 Multisample Difference Test: BIB Rating Test (Balanced Incomplete

Block Design)-Evaluation by Analysis of Variance

Usage/Application

This method is used when the test objective is to determine in which

way a particular sensory attribute varies over a number of samples.

Typically, the number of samples to be compared is from 6 to 12, or at

most 16. The present method (rating) is chosen when panelists are trained to use

a rating scale and results need to be as precise and actionable as possible.

Instead of presenting all t samples as one large block, the samples are

presented in smaller blocks, and the subjects are asked to rate the intensity

of the attribute of interest on a numerical intensity scale. The results are

evaluated by analysis of variance.

The subjects must be able to recognize the attribute of interest, for

example by training with sets of known intensity levels in the attribute. Not

fewer than 8 subjects are used; discrimination is much improved if

16 or more are used. Subjects may require special instruction to enable them

to recognize the attributes of interest reproducibly. Depending on the test

objectives, subjects may be selected who show high discriminating ability in

the attribute(s) of interest.

In the BIB rating test, samples are offered simultaneously if possible, or else

sequentially. The order of presentation must be truly random: the subjects

must not be led to suspect a regular pattern, as this will influence verdicts.

For example, consider a case in which the QC manager of an ice cream

plant routinely screens samples of finished product to select lots that will be

added to the pool of quality reference samples for use in the main QC testing

program. The project objective is to maintain a sufficient inventory of

reference samples of finished ice cream for QC testing purposes while the

test objective is to rate the inventory of six lots each day for overall off-flavor

and discard any lot that may not be suitable as a reference. The samples of

the six lots are evaluated for overall off flavor by 15 well-trained panelists

who use a 10-point category scale from 0 (no off-flavor) to 9 (extreme off-

flavor). Each of the 15 panelists is randomly assigned one block of four

samples from the design. The order of presentation of the samples within

each block is randomized.
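
A serving plan for this example might be generated as sketched below. The text does not say which design is used, so the sketch assumes the design made up of every possible combination of four lots out of six, which happens to give exactly 15 blocks, one per panelist. The function name `ice_cream_plan` and the lot numbers 1 to 6 are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def ice_cream_plan(n_lots=6, block_size=4, seed=None):
    """Return one block of lots per panelist, in serving order."""
    rng = random.Random(seed)
    # Assumed design: all combinations of block_size lots out of n_lots.
    blocks = [list(c) for c in combinations(range(1, n_lots + 1), block_size)]
    rng.shuffle(blocks)        # random assignment of blocks to panelists
    for block in blocks:
        rng.shuffle(block)     # random order of presentation within a block
    return blocks


plan = ice_cream_plan(seed=1)

# Balance check: how often each lot, and each pair of lots, is served.
appearances = Counter(lot for block in plan for lot in block)
pair_counts = Counter(
    frozenset(pair) for block in plan for pair in combinations(block, 2)
)
```

Under this assumption every lot is rated by 10 of the 15 panelists and every pair of lots is rated together by 6 panelists, which is what makes the design balanced.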

4.0 DESCRIPTIVE ANALYSIS TECHNIQUE

Descriptive analysis is applied in documenting product sensory

characteristics, identifying and quantifying sensory characteristics,

correlating instrumental and chemical measurements with sensory

responses, monitoring product quality, interpreting consumer responses,

sensory diagnostics of ingredient, processing or packaging changes,

prediction of consumer acceptance, and matching of sensory

profiles in quality assessments. In addition, sensory profiles are used

in research and development and in manufacturing to define the sensory

properties of a target for new product development; to document product

attributes before a consumer test to help in the selection of attributes to be

included in the consumer questionnaire and to help in an explanation of the

results of the consumer test; to track a product’s sensory changes over time

with respect to understanding shelf life, packaging and many more; to map

perceived product attributes for the purpose of relating them to

instrumental, chemical or physical properties; and to measure short-term

changes in the intensity of specific attributes over time (time-intensity

analysis).

The principles of descriptive analysis are that it deals with perceptions,

not with ingredients, causes or implications; it does not ask questions about

consumer acceptability; it uses panels consisting of trained or calibrated

observers; it uses well-defined terminology; data are quantified through

ratings of perceived intensities on scales; and it seeks to answer questions

about how products differ on specific sensory bases. There are four

components in descriptive analysis: first, characteristics

(qualitative aspect); second, intensity (quantitative aspect, which includes

category scales, line scales, and magnitude estimation); third, order of

appearance (time aspect); and lastly, overall impression (integrated aspect).

6 Commonly Used Descriptive Test Methods.

1) The Flavor Profile Method.

It is an analysis of a product's perceived aroma and flavor

characteristics, their intensities, order of appearance, and aftertaste. An

amplitude rating is generally included as part of the profile. It provides a

general tool for characterizing the flavors of complex food products.

Moreover, the method has proved valuable for examining flavor differences

among foods that are functions of ingredient, processing and storage

changes. Normally it is carried out by 5-8 panelists.

2) The Texture Profile Method.

The texture profile method was developed in order to define the

textural parameters of food. Later the method was expanded to include

attribute descriptors specific to particular products, including semisolid foods,

beverages, skin feel products, and fabric and paper goods. Texture is a

sensory attribute that is perceived by the human senses of touch, sight and

hearing. A texture profile is the sensory analysis of the texture complex of a food in terms of

its mechanical, geometrical, fat and moisture characteristics, the degree of

each present, and the order in which they appear from first bite through

complete mastication.

3) The Spectrum Descriptive Analysis Method.

The principal characteristic of the Spectrum Descriptive Analysis method is

that the panelists score the perceived intensities with reference to

pre-learned “absolute” intensity scales. The purpose is to make the resulting

profiles universally understandable and usable, not only at a later date but

also at any laboratory outside the originating one. For this purpose, the method

provides an array of standard attribute names, each with its set of

standards that define a scale of intensity, usually from 0 to 15. The

philosophy of Spectrum is pragmatic: it provides the tools to design a

descriptive procedure for a given product category. The principal tools

are the reference lists contained in Spectrum’s appendices,

together with the scaling procedures and methods of panel training. The

main aim is to choose the most practical system, given the

product in question, the overall sensory program, the specific project

objectives in developing a panel and the desired level of statistical treatment

of the data.

4) Time-Intensity Descriptive Analysis.

As food enters the oral cavity, travels over the tongue and is ingested,

flavor, texture, and even sound perception change due to the breakdown of

food. Conventional scaling procedures, used to evaluate e.g. flavor intensity,

require judges to average their sensory response over time. This yields only

an overall impression, with no information about the course of the sensation.

However, the time-intensity (T-I) technique can overcome this. It focuses

on the dynamic changes in food over the entire physiological process. The

changes in perception of taste, flavor, texture, irritation and odor over a

selected period of time can be precisely measured. The duration of the

perceived intensity varies among products. Time-intensity studies can

be divided into three kinds: long-term time-intensity studies,

shorter-term time-intensity studies and the shortest-term time-intensity

studies. Long-term time-intensity can be applied to skin lotion studies, to

measure the reduction of skin dryness periodically over days. Shorter-term

time-intensity tracks flavor and texture attributes of chewing gum over

several minutes. The shortest-term time-intensity can be applied to the

measurement of sweetness and bitterness of certain products over several

seconds.
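
A time-intensity record can be reduced to a few summary figures, as in the sketch below. The text does not name any summary measures, so the three used here (peak intensity, time of the peak, and area under the curve by the trapezoidal rule) are assumptions chosen for illustration. The function name `summarize_ti_curve` and the readings are hypothetical.

```python
def summarize_ti_curve(times, intensities):
    """Summarize one time-intensity record.

    times and intensities are equal-length lists of paired readings.
    Returns (peak intensity, time of the first peak, area under the curve).
    """
    peak = max(intensities)
    time_of_peak = times[intensities.index(peak)]
    # Trapezoidal rule: average of neighbouring readings times the interval.
    area = sum(
        (times[i + 1] - times[i]) * (intensities[i] + intensities[i + 1]) / 2
        for i in range(len(times) - 1)
    )
    return peak, time_of_peak, area


# Hypothetical sweetness readings taken every 10 seconds.
peak, time_of_peak, area = summarize_ti_curve([0, 10, 20, 30], [0, 6, 4, 0])
```

Summaries of this kind let two products be compared on how strongly and for how long an attribute is perceived, rather than on a single averaged rating.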

5) Free-Choice Profiling.

Free-choice profiling (FCP), developed in the 1980s, is a

sensory analysis method that can be carried out by untrained panels.

The participants need only be able to use a scale and be consumers of the

product under evaluation. Free-choice profiling is a novel

technique developed by Williams and Arnold at the Agricultural and Food

Council in the United Kingdom as a solution to the problem

of consumers using different terms for a given attribute. It allows the

panelists to invent and use as many terms as they need to

describe the sensory characteristics of a set of samples. The samples are

all from the same category of products, and each panelist

develops his or her own score sheet. The main advantage of the technique is

that it saves much time by not requiring any training of the panelists other

than an hour of instruction in the use of the chosen scale. The second

advantage is that panelists who have not been trained can still be

regarded as representing naïve consumers. However, questions regarding

the ability of the sensory analyst to “interpret” the resulting terms, combined

from all the panelists, need to be addressed. In order to give

reliable guidance to product researchers, the experimenter or sensory

analyst must decide what each of the combined terms actually means.

Therefore, the words or terms for each resulting parameter come from the

experimenter or sensory analyst rather than from the panelists. The results

may be colored more by the perspective of the analyst than by the combined

weight of the panelists’ verdicts.

6) The Quantitative Descriptive Analysis Method (QDA).

The Quantitative Descriptive Analysis (QDA) method was developed by

the Tragon Corp. in response to the lack of statistical treatment of data

in the other methods. This method relies on statistical analysis to determine the

appropriate terms, procedures and panelists to be used for analysis of a

specific product. This is intended to reduce unnecessary bias, such as

the panel leader dominating discussion and scaling. The panelists

can be selected from a large pool of candidates, as long as they have successfully

passed standardized tests for olfactory, taste and color sensitivity as well as

for memory, verbal abilities and creativity. In this method, there is

also a panel leader. However, unlike in the flavor profile method, the panel leader

acts as a facilitator rather than an instructor, and refrains from influencing the

group. The panelists evaluate the samples and record their own

results in separate booths under defined conditions such as temperature and

light. This reduces distraction and interaction among the panelists; there is

no discussion among the panelists during evaluation. The score

sheets are collected once the panelists finish evaluating, and the data are entered

into a computer for statistical analysis. One such computer program is CASA

(Computer Aided Sensory Analysis). The results are analyzed statistically and

the data are presented graphically, normally in the form

of a spider web with a branch or spoke from a central point for each attribute.
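
The geometry of a spider-web plot can be sketched as follows: each attribute gets one spoke, the spokes are evenly spaced around the centre, and the distance along a spoke is the panel mean for that attribute. This is only an illustration of the layout, not the output of CASA or any other named program. The function name `spider_vertices` and the attribute means are hypothetical.

```python
import math


def spider_vertices(mean_scores):
    """Map attribute means to (x, y) points, one spoke per attribute.

    mean_scores: dict of attribute name -> panel mean intensity.
    """
    names = list(mean_scores)
    n = len(names)
    points = {}
    for i, name in enumerate(names):
        angle = 2 * math.pi * i / n     # spokes evenly spaced around the centre
        radius = mean_scores[name]      # distance from the centre = mean score
        points[name] = (radius * math.cos(angle), radius * math.sin(angle))
    return points


# Hypothetical panel means for four attributes of one product.
vertices = spider_vertices({"sweet": 4.0, "sour": 2.0, "bitter": 1.0, "salty": 3.0})
```

Joining the points in order gives the web for one product; drawing the webs of several products on the same spokes shows at a glance where their profiles differ.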

Panelists work independently of one another. Booths can be used to minimize social influences. Discussion can follow for calibration purposes.

5.0 AFFECTIVE TESTS

5.1 Usage/Application

These are tests in which subjective attitudes, such as product

acceptance and preference, are measured. In affective tests the task is to

indicate preference or acceptance by either selecting, ranking, or scoring

samples.

Respondents are usually consumers who are selected on their current

or potential use of the product. In laboratory situations, consumer

demographics are often set aside in favor of accessible respondents (e.g.,

employees) whose preference and acceptance behavior satisfactorily

correlate with those of the target consumer population. Laboratory-type

acceptance tests can be done with 25 to 50 respondents. In field studies

where the target population is used, minimum numbers are increased to 75

to 200 or more. As a rule, technical, marketing, and administrative personnel

involved with the particular product should not be used in affective tests

because of their prior knowledge and potential for biased response.

The primary purpose of affective tests is to assess the personal

response (preference or acceptance) of current or potential customers to a

product, a product idea, or specific product characteristics.

Affective tests may be used for a variety of purposes including:

Product Maintenance

Product Improvement/Optimization

New Product Development

Assessment of Market Potential

Support for Advertising Claims

Affective tests are used mainly by producers of consumer goods, but also

by service providers such as hospitals, banks, and the Armed Forces, where

many tests were first developed. Every year, the use of consumer tests

becomes more common. They have proven highly effective as a tool used to

design products and services that will sell in large quantities or command a

higher price. Prosperous companies tend to excel in customer-testing

knowledge and, consequently, in knowledge about their consumers. Affective

tests can be qualitative or quantitative, depending on purpose. Whichever

type of test is used, care needs to be taken to ensure the sample of testers is

representative of the target population expected to buy the product.

5.2 Affective Test Methods—Fuzzy Front End

One of the affective test methods is the Fuzzy Front End. Uncovering

consumers’ needs often occurs at the beginning, at the fuzzy front end.

Typically, the research is conducted at the very early stage of a project,

when planning is being carried out, initial market and technical feasibility is

being assessed, and breakthrough ideas are being explored. Research at the

fuzzy front end is conducted before dollars are committed to detailed

technical assessment, costly concept testing is executed and significant

manpower and out-of-pocket expenses are committed. This does not imply

that the tools and techniques applied to understand the consumer early

cannot be applied at all stages of the product development process.

Methods used are unique because they gather in-depth information on who

the consumer really is, how and why products are used, what they really like,

dislike, and need. To capture this level of information, one must move

beyond the standard, frequently used quantitative and qualitative

approaches.

Research at the fuzzy front end allows the:

Exploration of consumers as purchasers of products with specific

features or sensory properties identified.

Study of product functionality and ergonomics.

Determination of how a consumer is modifying a product or adapting

usage to suit his/her needs.

Uncovering of attitudes, behaviors, and motivators within the culture.

Study of the consumers in their own environment through

observational research.

Beyond the traditional techniques used to elicit information from

consumers in focus groups or one-to-one interviews, information-gathering

approaches that are used in support of the fuzzy front end are often

imagery-based and include, but are not limited to, compare and contrast,

mind maps, word webs, and collages. Quantitative techniques to consider that go

beyond CLTs (central location tests) or HUTs (home use tests) include online

research and intrinsic/extrinsic studies. The online research provides early exploration into the

design of concepts, attitudes, and behavioral research. Intrinsic or extrinsic

research studies the essential aspects of a product along with the external

motivators.

5.3 Types of Affective Tests

There are two main types of affective tests, namely:

1. Qualitative

2. Quantitative; which may be further divided into:

i. Preference tests

ii. Acceptance tests

I. Qualitative tests

Qualitative affective tests are those (e.g., interviews and focus groups)

which measure subjective responses of a sample of consumers to the

sensory properties of products by having those consumers talk about their

feelings in an interview or small group setting. Qualitative methods are used

in the following situations:

To uncover and understand consumer needs that are unexpressed

(example: Why do people buy 4-wheel-drive cars to drive on asphalt?).

Researchers that include anthropologists and ethnographers conduct

open-ended interviews. This type of study, often called “the fuzzy front

end,” can help marketers identify trends in consumer behavior and

product use.

To assess consumers’ initial responses to a product concept and/or a

product prototype. When product researchers need to determine if a

concept has some general acceptance or, conversely, some obvious

problems, a qualitative test can allow consumers to discuss freely the

concept and/or a few early prototypes. The results, a summary and a

tape of such discussions, permit the researcher to understand better the

consumers’ initial reactions to the concept or prototypes. Project

direction can be adjusted at this point, in response to the information

obtained.

To learn consumer terminology to describe the sensory attributes of a

concept, prototype or commercial product, or product category. In the

design of a consumer questionnaire and advertising it is critical to use

consumer-oriented terms rather than those derived from marketing or

product development. Qualitative tests permit consumers to discuss

product attributes openly in their own words.

To learn about consumer behavior regarding use of a particular product.

When product researchers wish to determine how consumers use certain

products (package directions) or how consumers respond to the use

process (dental floss, feminine protection), qualitative tests probe the

reasons and practices of consumer behavior.

Qualitative tests include the use of:

1. Focus Groups

A small group of 10 to 12 consumers, selected on the basis of specific

criteria (product usage, consumer demographics, etc.) meet for 1 to 2 hours

with the focus group moderator. The moderator presents the subject of

interest and facilitates the discussion using group dynamics techniques to

uncover as much specific information from as many participants as possible

directed toward the focus of the session.

Typically, two or three such sessions, all directed toward the same

project focus, are held in order to determine any overall trend of responses

to the concept and/or prototypes. Note is also made of unique responses

apart from the overall trend. A summary of these responses plus tapes,

audio or visual, are provided to the client researcher. Purists will say that 3 ×

12 = 36 verdicts are too few to be representative of any consumer trend, but

in practice if a trend emerges that makes sense, modifications are made

based on this. The modifications may then be tested in subsequent groups.

2. Focus Panels (focus groups with a longer existence)

In this variant of the focus group, the interviewer utilizes the same group

of consumers two or three more times. The objective is to make some initial

contact with the group, have some discussion on the topic, send the group

home to use the product, and then have the group return to discuss its

experiences.

3. One-on-one interviews

Qualitative affective tests in which consumers are individually interviewed

in a one-on-one setting are appropriate in situations in which the researcher

needs to understand and probe a great deal from each consumer or in which

the topic is too sensitive for a focus group.

The interviewer conducts successive interviews with up to 50 consumers,

using a similar format with each, but probing in response to each consumer’s

answers.

One unique variant of this method is to have a person use or prepare a

product at a central interviewing site or in the consumer’s home. Notes or a

video are taken regarding the process, which is then discussed with the

consumer for more information. Interviews with consumers regarding how

they use a detergent or prepare a packaged dinner have yielded information

about consumer behavior which was very different from what the company

expected or what consumers said they did.

One-on-one interviews or observations of consumers can give researchers

insights into unarticulated or underlying consumer needs, and this in turn

can lead to innovative products or services that meet such needs.

All these methods involve small samples so findings usually need to be

further supported by larger scale, usually quantitative, studies. However,

small scale studies often supply insights that will be missed in large scale

quantitative studies which have, by their nature, to focus on specific

attributes. Small scale studies, on the other hand, give scope for probing

responses and trying to identify reasons behind response.

II. Quantitative tests

Quantitative affective tests are those which determine the responses of a

large group (50 to several hundred) of consumers to a set of questions

regarding preference, liking, sensory attributes, etc. Quantitative affective

methods are applied in the following situations:

To determine overall preference or liking for a product or products by a

sample of consumers who represent the population for whom the

product is intended. Decisions about whether to use acceptance and/or

preference questions are discussed under each test method below.

To determine preference or liking for broad aspects of product sensory

properties (aroma, flavor, appearance, texture). Studying broad facets of

product character can provide insight regarding the factors affecting

overall preference or liking.

To measure consumer responses to specific sensory attributes of a

product. Use of intensity, hedonic, or “just right” scales can generate

data which can then be related to the hedonic ratings discussed

previously and to descriptive analysis data.

Preference and acceptance tests should not use trained panelists.

i. Preference tests

Simple preference test

Present two samples and ask: Which do you prefer?

You can either force a decision or allow a "no preference" option.

If a "no preference" option is permitted, the no preference responses may

either be removed from the sample or randomly allocated to either of the

two samples. Either way, there is need for particular care in interpreting the

results of any expressed preference. From the point of view of a more robust

statistical analysis, the forced preference method is preferable. On the other hand, some

testers believe that "A happy panel is a better panel".

The simple preference test is very similar to the directional difference test.
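
The two ways of handling "no preference" responses can be sketched as follows. The text does not name a statistical test, so the sketch assumes an exact two-sided binomial test against the hypothesis that both samples are equally likely to be preferred. The function names and the counts are hypothetical.

```python
import random
from math import comb


def allocate_no_preference(prefer_a, prefer_b, no_preference, rng):
    """Randomly allocate each "no preference" response to sample A or B."""
    to_a = sum(rng.random() < 0.5 for _ in range(no_preference))
    return prefer_a + to_a, prefer_b + (no_preference - to_a)


def preference_p_value(prefer_a, prefer_b):
    """Exact two-sided binomial p-value for a paired preference result."""
    n = prefer_a + prefer_b
    extreme = max(prefer_a, prefer_b)
    # Chance of a split at least this lopsided if neither sample is preferred.
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical result: 20 prefer A, 12 prefer B, 8 give no preference.
p_removed = preference_p_value(20, 12)     # "no preference" responses removed
a, b = allocate_no_preference(20, 12, 8, random.Random(0))
p_allocated = preference_p_value(a, b)     # "no preference" responses allocated
```

Removing the responses shrinks the sample, while allocating them at random pulls the split toward 50:50, so the two treatments can give different conclusions from the same data. This is why the interpretation needs care.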

Ranking tests

These involve asking subjects to put three or more samples in order of

preference. Care should be taken not to induce sensory fatigue by

introducing too many samples.

An alternative procedure to identify ranking of preference is to use

multiple paired preference tests. This can involve all possible pairs of three

or more samples or selecting one or two samples as controls and rating the

other samples against these.
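
The two pairing schemes described above can be enumerated as in this sketch. The function names and sample labels are hypothetical.

```python
from itertools import combinations


def all_pairs(samples):
    """Every possible pair of samples, each pair listed once."""
    return list(combinations(samples, 2))


def pairs_against_controls(samples, controls):
    """Pair each control with every sample that is not itself a control."""
    return [(c, s) for c in controls for s in samples if s not in controls]


samples = ["A", "B", "C", "D"]
full_design = all_pairs(samples)                          # 6 pairs
control_design = pairs_against_controls(samples, ["A"])   # 3 pairs
```

With n samples the full design needs n(n - 1)/2 pairs, so the number of tastings grows quickly; comparing against one or two controls keeps the workload down and helps limit sensory fatigue.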

Sample size

Preference tests require a minimum of 30 assessors; 100 or more is better.

ii. Acceptance tests

These tests are aimed at identifying a liking for a product. They can be used

for general liking or evaluation of specific attributes. It is possible to infer

preference from acceptance scores.

Caution needs to be exercised with attribute testing. For example, are the tester's

and the panellists' perceptions of sour/bitter/astringent the same?

Rating scales are generally preferred to a simple yes/no response as

they give an indication of degree of liking. It is important that rating scales

are balanced, i.e. the number of "like this" options is equal to the number of "dislike

this" options. You should normally include a neutral response (neither like

nor dislike).

Note that rating scales are prone to central tendency errors i.e. a reluctance

to use the extremes of the scale, so a sufficient number of scale points

should be provided to counter this.
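
The balance requirement can be illustrated with the 9-point hedonic scale, using its usual wording. The sketch checks that the scale has as many "like" as "dislike" options and a neutral point, and converts responses to scores from 1 to 9. The helper names and the responses are hypothetical.

```python
HEDONIC_9 = [
    "dislike extremely",
    "dislike very much",
    "dislike moderately",
    "dislike slightly",
    "neither like nor dislike",
    "like slightly",
    "like moderately",
    "like very much",
    "like extremely",
]


def is_balanced(labels):
    """True when the scale offers equal numbers of like and dislike options."""
    likes = sum(label.startswith("like") for label in labels)
    dislikes = sum(label.startswith("dislike") for label in labels)
    return likes == dislikes


def has_neutral(labels):
    """True when the scale includes a neutral option."""
    return any(label.startswith("neither") for label in labels)


def mean_score(responses, labels=HEDONIC_9):
    """Average liking, scoring the first label as 1 and the last as len(labels)."""
    return sum(labels.index(r) + 1 for r in responses) / len(responses)


# Hypothetical responses from two consumers.
average = mean_score(["like slightly", "like very much"])
```

Nine points also leave room for respondents who avoid the two extremes, which is the central tendency problem noted above.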

Types of Rating Scale

1. Category scales

Sample is assigned to one of a set of descriptive terms

2. Line scales

Mark a point along a line

The outer limits of the range are marked at each end of the line

3. Ratio Scales

Rates sample against some standard

Always involves a comparison

Needs highly trained panellists to achieve meaningful results

Examples of Rating Scale

1. Likeability scale (9-point hedonic scale)

2. 'Just Right' scales

3. Line or Numerical Scales

Respondent places a mark on a line or gives a number to express the degree

of liking, e.g.

Please score the suitability of product X for use in ... meal

not at all suitable ____________________ very suitable for this occasion

4. Likelihood to Purchase or Food-Action-Rating

Eat the whole portion & evaluate the sample on the basis of your experience.

Tick which statement best reflects your opinion

I would eat this at every opportunity

I would eat this very often

I like this & would eat this now and then

I would eat this if available but would not go out of my way for it

I don't like this & would eat it only occasionally ... etc.

Note: This scale as it stands is unbalanced. There needs to be an equal number

of like and dislike points, and there is no clear neutral response.

Advantages:

Provides essential information; bottom line

Can identify liking/disliking segments

Can be related to descriptive profile, other variables in optimization

Liabilities:

Consumer vocabulary fuzzy

Representative samples can be a problem

Preference may be ambiguous

Costs:

Consumer recruiting, qualification as users/likers

Technician time in setup, recruiting, analysis, reporting

Computing required if long questionnaire, large sample

Some products may require controlled facility (odors, noise, etc.)
