
Treatment of bias in estimating measurement uncertainty

Gregory E. O’Donnell^{a,b} and D. Brynn Hibbert*^{b}

Received 24th September 2004, Accepted 15th February 2005

First published as an Advance Article on the web 8th March 2005

DOI: 10.1039/b414843f

Bias in an analytical measurement should be estimated and corrected for, but this is not always done. As an alternative to correction, there are a number of methods that increase the expanded uncertainty to take account of bias. All sensible combinations of correcting or enlarging uncertainty for bias, whether considered significant or not, were modeled by a Latin hypercube simulation of 125,000 iterations for a range of bias values. The different methods were assessed by the fraction of results for which the result and its expanded uncertainty contained the true value of a simulated test measurand. The strategy of estimating the bias and always correcting is consistently the best throughout the range of biases. Expansion of the uncertainty when the bias is considered significant is best done by SUMU$_{\text{Max}}$: $U(C_{\text{test result}}) = k\,u_c(C_{\text{test result}}) + |d_{\text{run}}|$, where $k$ is the coverage factor ($k = 2$ for a 95% confidence interval), $u_c$ is the combined standard uncertainty of the measurement and $d_{\text{run}}$ is the run bias.

Introduction

Uncertainties exist in every analytical measurement whether they are acknowledged or not. A truly informed decision based on a measurement result can only be made if the measurement uncertainty is considered. The measurement uncertainty of a result also matters when comparing subsamples or laboratories, and when comparing a test result with a specification limit. The Guide to the Expression of Uncertainty in Measurement (GUM)1 gives scientists a theoretical approach to the estimation of uncertainty. Together with ISO (ref. 2), it has developed a formal definition of uncertainty of measurement as a “parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand”. The measurand is the quantity being measured. Most analysts would consider that their primary objective is to obtain a test result that is as close to the true value of the measurand as possible. An analyst produces a test result based on the analytical procedure employed. The true value of a sample is unknown. Through good method validation and quality control procedures, one can only estimate a value and the range in which the true value might lie. Therefore, the expression of the uncertainty of an analytical result defines an interval where the true value of the measurand is expected to lie with a stated level of confidence. The aim of this paper is to illustrate how the treatment of bias influences the uncertainty interval and the probability of the true value occurring within such an interval.

Components of measurement uncertainty

Measurement uncertainty has been considered to have two components, arising from random effects (random error) and systematic effects. The former may be estimated by determining the precision, measured as a standard deviation, of measurement results under repeatability conditions. The trueness of an observed value can be assessed by comparison with an accepted reference value, preferably embodied in a certified reference material (CRM). A correction to the observed value is made when the bias is considered significant. The uncertainty of such a correction is then included in the measurement uncertainty calculation. In the bottom-up approach, by contrast, each contribution to bias is determined as a variance. These variances are summed together with the precision estimate in an uncertainty budget. The difference between the two approaches therefore lies in the treatment of bias.

Definitions of bias

The term bias has been defined3 as the difference between the expectation of the test results and an accepted reference value. Here the term “test result” refers to a single observation or to the mean of a set of observations of the particular measurand. A bias is valid for a particular period of time and applies to the results of a single laboratory or a group of laboratories.4,5 Different types of bias therefore become apparent, namely method bias, laboratory bias and run bias.

Method bias is the difference between an accepted reference value and the expectation of test results obtained from all laboratories using that method.6 Method bias is usually determined in a collaborative trial, in which a number of laboratories analyze a certified reference material (CRM). Ideally the CRM should be of the same matrix as the test samples. When a CRM is not available, a standard material purchased from a reputable supplier will need to suffice. Such a material will have a larger uncertainty than a CRM and gives traceability to that standard material only. Walker and Lumley7 have noted that using a CRM to establish a reference value is equivalent to participating in an interlaboratory or collaborative trial. This is because the reference value of a reputable CRM has usually been established by such a study. The bias of a method is therefore obtained by comparing the mean of all the laboratories’ mean test results with the accepted reference value. This represents full reproducibility conditions.

PAPER www.rsc.org/analyst | The Analyst

This journal is © The Royal Society of Chemistry 2005. Analyst, 2005, 130, 721–729.

Published on 08 March 2005. Downloaded by Universidade Tecnica de Lisboa (UTL) on 08/02/2014 16:44:44.

The laboratory bias is the difference between the expectation of the test results from a particular laboratory and an accepted reference value.3 Ref. 8 interprets the “expectation of the test results” as the mean of a sufficiently large number of results. Hence, the laboratory bias is the difference between the accepted reference value and the mean of the means of the test results of individual analytical runs. The individual analytical runs would be performed at different times and by different analysts. If possible, they would also be performed on different instruments. These are sometimes referred to as intermediate, intralaboratory or within-laboratory reproducibility conditions.9

It then follows that the run bias ($d_{\text{run}}$) would be defined as the difference between the accepted reference value and the mean of a small number of test results from a single run, determined under repeatability conditions. Run bias is the result of uncontrolled factors that remain constant for the duration of a batch of samples and have equal effects on all samples in that batch.10 With many analytical methods, the method bias and the laboratory bias tend to be reasonably consistent over time, provided they have been determined with enough data. The run bias, however, will tend to vary from batch to batch. The different types of bias are illustrated in Fig. 1.

Consider the viewpoint of a client of a commercial laboratory who submits samples regularly to that laboratory for analysis. It is logical that such a client would require these results to be comparable from run to run. This can only be achieved by recognizing that the run bias is the bias of a particular analysis and, moreover, that it should be estimated for each analysis batch.

We note that ref. 11 uses the symbol $d_{\text{run}}$ for the component of the total bias attributable to the run, as distinct from the laboratory and method components. That usage should not be confused with the symbol used here, which refers to the actual bias of a particular analytical run.

Some sectors of analytical chemistry take the view that the bias of an analysis is the laboratory bias.7,12,13 They use the laboratory bias to correct the measurement result, if a correction is applied. In most cases this is an erroneous correction to use, although it may be argued that using the laboratory bias is better than having no estimate at all. In practice, such a correction is rarely applied. The laboratory bias distribution usually has a substantial standard deviation, and a correction is only deemed necessary when the bias is considered statistically significant.

Decisions for dealing with run bias

There are four major decisions in dealing with bias, and these are illustrated schematically as a decision tree diagram in Scheme 1. They are described here, followed by a computer simulation. The simulation determines how effectively each treatment provides a measurement result and expanded uncertainty that encompass the true value with the stated probability.

Fig. 1 Schematic of different levels of bias and accompanying dispersion of results. Run bias and repeatability standard deviation contribute to the standard deviation of the laboratory result (intra-laboratory or intermediate reproducibility). This, together with laboratory bias, contributes to the method reproducibility. The accepted reference value (ARV) is the result obtained with zero bias.

Decision 1—measure bias. The first decision is whether it is necessary to estimate bias at all. Many sectors of analytical chemistry use empirical methods to make analytical results comparable when “trueness” cannot be achieved by any practical means. The results of such analyses depend on the method used and are not related to the “true value”. They are traceable to that method only. They are, ipso facto, free of method bias, but they may indeed suffer from laboratory and run bias. This bias can only be determined if a reference material specific to that particular method is available. If such a material is available and the bias is determined, one can proceed to test whether the bias is significant. If the method of interest is not an empirical method but a rational method,14 and the bias has not been measured, then the method has not been adequately validated and should not be used. If the bias has not been determined, the reported amount concentration of the test material is the value obtained by the measurement procedure. This reported value is the test result, $C_{\text{Test Result}}$, and it equals $C_{\text{obs}}$, or $\bar{C}_{\text{obs}}$ if the results of a number of analyses have been averaged.

$$C_{\text{Test Result}} = C_{\text{obs}} \tag{1}$$

The combined uncertainty of the test result is then the combined uncertainty of the measurement, which is estimated by the “bottom-up” methods explained in the GUM.1

$$u_c(C_{\text{Test Result}}) = u_c(C_{\text{obs}}) \tag{2}$$

Once the combined uncertainty has been calculated, the expanded uncertainty ($U$) is obtained by multiplying it by the coverage factor $k$. This factor is often taken as 2, which gives approximately a 95% probability that the true value lies within the interval formed by adding the expanded uncertainty to, and subtracting it from, the mean.

$$U = k\,u_c(C_{\text{Test Result}}) \tag{3}$$

The run bias of a batch of analyses can be determined by averaging a small number ($p$) of observations of a CRM ($\bar{C}_{\text{obs,CRM}}$) and comparing this average with the accepted reference value of the CRM ($C_{\text{CRM}}$). The run bias is determined each time a batch of analyses is performed, so it is determined on one day, in a short period of time, by one analyst, on one instrument. These are repeatability conditions. The uncertainty of the run bias is therefore the combination of the repeatability uncertainty of the CRM analyses and the uncertainty of the reference material. The run bias is

$$d_{\text{run}} = \bar{C}_{\text{obs,CRM}} - C_{\text{CRM}} \tag{4}$$

and the uncertainty of the run bias is

$$u(d_{\text{run}}) = \sqrt{\left(\frac{s_{r,\text{CRM}}}{\sqrt{p}}\right)^2 + u^2(C_{\text{CRM}})} \tag{5}$$

where $s_{r,\text{CRM}}$ is the repeatability standard deviation of the analyses of the CRM and $u(C_{\text{CRM}})$ is the uncertainty of the value of the CRM.
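Eqns. (4) and (5) translate directly into code. The Python function below is a minimal sketch and is not part of the original work. The function and argument names are hypothetical. The repeatability $s_{r,\text{CRM}}$ is passed in (for example, from method validation) rather than estimated from the $p$ replicates. The numbers in the usage lines are invented for illustration.

```python
import math
from statistics import mean


def run_bias(crm_results, c_crm, s_r_crm, u_c_crm):
    """Run bias, eqn (4), and its standard uncertainty, eqn (5).

    crm_results : the p observed values of the CRM in this run (hypothetical data)
    c_crm       : accepted (certified) value of the CRM, C_CRM
    s_r_crm     : repeatability standard deviation of the CRM analyses
    u_c_crm     : standard uncertainty of the certified value, u(C_CRM)
    """
    p = len(crm_results)
    d_run = mean(crm_results) - c_crm  # eqn (4): mean of p CRM results minus C_CRM
    # eqn (5): repeatability of the mean of p results combined with u(C_CRM)
    u_d_run = math.sqrt((s_r_crm / math.sqrt(p)) ** 2 + u_c_crm ** 2)
    return d_run, u_d_run


# Hypothetical run: three analyses of a CRM certified at 10.0 (arbitrary units).
d, u = run_bias([10.6, 10.9, 10.8], c_crm=10.0, s_r_crm=0.3, u_c_crm=0.1)
print(f"d_run = {d:.3f}, u(d_run) = {u:.3f}")
```

With only a few replicates, the $u(C_{\text{CRM}})$ term can be a substantial share of $u(d_{\text{run}})$. This is why the sketch keeps the two contributions explicit rather than folding them into a single standard deviation.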

Scheme 1 Decisions (labeled in bold face) that can be made with respect to bias and its correction, or otherwise. Outcomes (labeled in italics) give the reported test result ($C_{\text{Test Result}}$) and expanded uncertainty, for which $k$ is 2 for approximately 95% coverage. “$U$ = Enlargement method” refers to one of the strategies described in the text to expand the uncertainty when bias is not corrected for.

The value of $p$ in the denominator of eqn. (5) is the number of analyses of the CRM performed to determine the run bias. It would be best to establish the repeatability on the day of the analysis, but time and cost constraints usually do not allow this to be done with much rigor. Therefore, the repeatability determined in the method validation process is often used as a reasonable estimate of $s_{r,\text{CRM}}$ in eqn. (5). To increase the precision of this estimate, it can be calculated as a pooled standard deviation from a number of different runs. In that case it is written $s_{p,r,\text{CRM}}$. A commercial working laboratory may be restricted by time and cost to a single analysis of the CRM on the day. In that case the run bias is simply

$$d_{\text{run}} = C_{\text{obs,CRM}} - C_{\text{CRM}} \tag{6}$$

and the uncertainty of the run bias is

$$u(d_{\text{run}}) = \sqrt{s_{p,r,\text{CRM}}^2 + u^2(C_{\text{CRM}})} \tag{7}$$

with the $s_{p,r,\text{CRM}}$ term taken from the method validation process as above.
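In the single-analysis case of eqns. (6) and (7), the repeatability comes from a pooled validation estimate. The sketch below assumes the conventional pooling, in which each run's variance is weighted by its degrees of freedom: $s_p^2 = \sum_i (n_i - 1)s_i^2 / \sum_i (n_i - 1)$. That formula, the function names and the data are illustrative assumptions rather than details given in the text.

```python
import math
from statistics import stdev


def pooled_sd(runs):
    """Pooled standard deviation over several validation runs.

    Assumes the usual degrees-of-freedom weighting; every run needs at
    least two results so that its own standard deviation is defined.
    """
    weighted_var = sum((len(r) - 1) * stdev(r) ** 2 for r in runs)
    dof = sum(len(r) - 1 for r in runs)
    return math.sqrt(weighted_var / dof)


def single_crm_run_bias(c_obs_crm, c_crm, s_p_r_crm, u_c_crm):
    """Eqns (6) and (7): one CRM analysis on the day, with the repeatability
    taken from the pooled validation estimate s_p,r,CRM."""
    d_run = c_obs_crm - c_crm
    u_d_run = math.sqrt(s_p_r_crm ** 2 + u_c_crm ** 2)
    return d_run, u_d_run


# Hypothetical validation data: replicate CRM results from three earlier runs.
validation_runs = [[10.2, 10.5, 10.4], [10.1, 10.3], [10.6, 10.2, 10.4, 10.3]]
s_p = pooled_sd(validation_runs)
print(single_crm_run_bias(10.45, c_crm=10.0, s_p_r_crm=s_p, u_c_crm=0.1))
```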

Once the bias has been measured, it is usual to test its magnitude for statistical significance. The test is performed because some action has been decided on when the bias is significantly different from zero: either the result is corrected or the uncertainty is enlarged. A non-significant bias, by contrast, is ignored. Bias is generally considered significant when its magnitude is greater than the expanded uncertainty of the measurement of the bias, where $u(d_{\text{run}})$ is given by eqn. (5):

$$|d_{\text{run}}| > k\,u(d_{\text{run}}) \tag{8}$$

The magnitude of the uncertainty of the bias depends largely on $p$, the number of repeats used to determine the bias. The significance test should therefore be performed at $p - 1$ degrees of freedom with the null hypothesis $H_0: d_{\text{run}} = 0$, and $k$ should take the value $t_{0.050,\,p-1}$. In practice $k$ is usually taken as 2. This is the Student-$t$ value for 60 degrees of freedom and so approximates $t_{0.050,\infty}$.
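The decision rule of eqn. (8) can be sketched as below. The critical values are the standard two-tailed 95% Student-$t$ values for 1 to 10 degrees of freedom. Falling back to $k = 2$ outside that small table mirrors the common practice described above. It is a simplification, not a general $t$ lookup. The function names and example numbers are hypothetical.

```python
# Two-tailed 95 % critical values of Student's t, t_{0.050, nu}, for small nu.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def coverage_factor(p):
    """k = t_{0.050, p-1} for p replicate CRM analyses when p - 1 is tabulated;
    otherwise fall back to the conventional k = 2."""
    return T_95.get(p - 1, 2.0)


def bias_is_significant(d_run, u_d_run, k=2.0):
    """Eqn (8): the run bias is significant when |d_run| > k * u(d_run)."""
    return abs(d_run) > k * u_d_run


# Hypothetical values: d_run = 0.77 with u(d_run) = 0.20 from p = 3 replicates.
print(bias_is_significant(0.77, 0.20))                      # conventional k = 2
print(bias_is_significant(0.77, 0.20, coverage_factor(3)))  # k = t_{0.050,2}
```

With these invented numbers, the bias counts as significant at $k = 2$ but not at $k = t_{0.050,2} \approx 4.30$. This shows how strongly the conclusion can depend on $p$ when only a few CRM replicates are run.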

Decision 2—is the bias statistically significant. If the run bias is found not to be significant, there are three options:

- correct the observed value for the non-significant bias anyway (Outcome 2.A);
- account for the non-significant bias by enlarging the uncertainty interval (Outcome 2.B);
- report the test result without correction for bias, with an uncertainty that includes the uncertainty associated with the zero correction (Outcome 2.C).

These options are explained in more detail below. If a correction is made (Outcome 2.A), the test result is the observed value minus the run bias:

$$C_{\text{Test Result}} = C_{\text{obs,Sample}} - d_{\text{run}} \tag{9}$$

A standard deviation of the observed value is obtained as a pooled standard deviation of a number of real test samples that are known to give a positive result for the analyte of interest. The samples’ results should be of a similar magnitude. If they are not, all standard deviations should be converted to relative standard deviations (RSD) and a pooled RSD calculated. These samples can be analysed on different days, by different analysts and, if possible, on different equipment. They should also represent the variability of the matrix to be analyzed and the inhomogeneity of the samples likely to be encountered in this analysis. The definition of the measurand15 determines whether the measurement uncertainty of the sample includes the uncertainty of primary sampling, or only that of any subsampling after the sample arrived in the laboratory. If the current method is used for different types of samples, a separate pooled standard deviation for each type of sample may be appropriate. The combined uncertainty of the test result can then be expressed as

$$u_c(C_{\text{Test Result}}) = \sqrt{\left(\frac{s_{p,r,\text{Sample}}}{\sqrt{n}}\right)^2 + u^2(d_{\text{run}})} \tag{10}$$

where n represents the number of analyses of a test sample.

The expanded uncertainty can again be obtained by multiplying the combined uncertainty by the coverage factor $k$.

$$U = k\,u_c(C_{\text{Test Result}}) \tag{11}$$
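Outcome 2.A, which always corrects for the bias (eqns. (9)–(11)), can be sketched as follows. This is an illustrative helper with hypothetical names and values. Here `s_p_r_sample` stands for the pooled standard deviation $s_{p,r,\text{Sample}}$ described above, and `n` for the number of analyses of the test sample.

```python
import math


def corrected_result(c_obs_sample, n, s_p_r_sample, d_run, u_d_run, k=2.0):
    """Correct a sample result for the run bias and expand its uncertainty.

    c_obs_sample : mean of the n analyses of the test sample
    s_p_r_sample : pooled standard deviation for this type of sample
    d_run, u_d_run : run bias and its uncertainty from the CRM in the same batch
    """
    c_test = c_obs_sample - d_run                                       # eqn (9)
    u_c = math.sqrt((s_p_r_sample / math.sqrt(n)) ** 2 + u_d_run ** 2)  # eqn (10)
    return c_test, k * u_c                                              # eqn (11)


# Hypothetical sample: mean of n = 2 analyses, run bias taken from the CRM.
c, U = corrected_result(25.8, n=2, s_p_r_sample=0.4, d_run=0.77, u_d_run=0.20)
print(f"C_test = {c:.2f} +/- {U:.2f}")
```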

When a test result is not corrected for a run bias (Decision 2.3), the uncertainty range should be increased to take into account the offset of the result (Outcome 2.B). At the least, the range should be increased so that it includes the “true value” with the stated probability. This can be done by enlarging the uncertainty interval with one of five enlargement methods, which are explained below:

- the method of Barwick and Ellison, eqns. (13)–(16);
- the RSSu method, eqn. (17);
- the RSSU method, eqn. (18);
- the SUMU method, eqn. (19);
- the SUMU$_{\text{Max}}$ method, eqn. (20).

When these methods are employed, the expanded uncertainty is expressed in Scheme 1 in the general form

$$U = \text{enlargement method} \tag{12}$$

Significant run bias commonly occurs in trace analysis, where it is due to incomplete recovery of the analyte. A recent IUPAC recommendation16 expresses this in terms of the apparent recovery, which is the ratio of the observed value to the expected reference value:

$$\bar{R} = \frac{\bar{C}_{\text{obs,CRM}}}{C_{\text{CRM}}} \tag{13}$$

Barwick and Ellison17,18 evaluated the uncertainty of the

apparent recovery, which is a quotient, as

$$u(\bar{R})' = \bar{R}\sqrt{\frac{1}{p}\left(\frac{s_{\text{obs,CRM}}}{\bar{C}_{\text{obs,CRM}}}\right)^2 + \left(\frac{u(C_{\text{CRM}})}{C_{\text{CRM}}}\right)^2} \tag{14}$$

They concluded that when the test result is not corrected for

the loss of the apparent recovery, then the uncertainty interval

should be increased according to

$$u(\bar{R}) = \sqrt{\left(\frac{|1-\bar{R}|}{k}\right)^2 + \left[u(\bar{R})'\right]^2} \tag{15}$$

The uncertainty of the final test result can be determined by

combining the uncertainty of the measurement result with

the uncertainty of the apparent recovery, giving the expanded

uncertainty

$$U(C_{\text{Test Result}}) = k \times C_{\text{Test Result}}\sqrt{\frac{1}{n}\left(\frac{s_{\text{obs,Sample}}}{C_{\text{obs,Sample}}}\right)^2 + \left(\frac{u(\bar{R})}{\bar{R}}\right)^2} \tag{16}$$
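The following sketch of the Barwick and Ellison treatment chains eqns. (13)–(16) for a result that is not corrected for recovery. It assumes that $s_{\text{obs,CRM}}$ and $s_{\text{obs,Sample}}$ are the sample standard deviations of the replicate CRM and test-sample results, so at least two of each are needed. It also assumes that the reported result is the uncorrected sample mean. The function name and numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def barwick_ellison(crm_results, c_crm, u_c_crm, sample_results, k=2.0):
    """Uncorrected test result, apparent recovery and expanded uncertainty."""
    p = len(crm_results)
    c_obs_crm = mean(crm_results)
    r_bar = c_obs_crm / c_crm                                    # eqn (13)
    u_r0 = r_bar * math.sqrt(
        (stdev(crm_results) / c_obs_crm) ** 2 / p
        + (u_c_crm / c_crm) ** 2)                                # eqn (14)
    u_r = math.sqrt((abs(1 - r_bar) / k) ** 2 + u_r0 ** 2)        # eqn (15)
    n = len(sample_results)
    c_test = mean(sample_results)    # reported without recovery correction
    U = k * c_test * math.sqrt(
        (stdev(sample_results) / c_test) ** 2 / n
        + (u_r / r_bar) ** 2)                                    # eqn (16)
    return c_test, r_bar, U


# Hypothetical data: 85 % apparent recovery on a CRM certified at 10.0.
print(barwick_ellison([8.4, 8.6, 8.5], 10.0, 0.1, [4.1, 4.3]))
```

In this example, the $|1-\bar{R}|/k$ term of eqn. (15) dominates $u(\bar{R})$. The uncertainty is therefore enlarged mainly because the recovery is not corrected for, rather than because of the scatter of the replicates.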

Different sectors of analytical chemistry use a number of other approaches to dealing with uncorrected significant bias. The first approach adds the bias to the combined uncertainty using the “root sum of squares” (RSS) method:19,20

$$\text{RSSu:}\quad U(C_{\text{Test Result}}) = k\sqrt{u_c^2(C_{\text{Test Result}}) + d_{\text{run}}^2} \tag{17}$$

The second approach also uses the RSS method, but this time it combines the bias with the expanded uncertainty. This is the method put forward by the APHA:21

$$\text{RSSU:}\quad U(C_{\text{Test Result}}) = \sqrt{k^2u_c^2(C_{\text{Test Result}}) + d_{\text{run}}^2} \tag{18}$$

The next method simply adds the bias to the expanded uncertainty:19,20

$$\text{SUMU:}\quad U(C_{\text{Test Result}})_{\pm} = \max\left(k\,u_c(C_{\text{Test Result}}) \pm d_{\text{run}},\; 0\right) \tag{19}$$

When the absolute value of the run bias is larger than the expanded uncertainty, one of the enlarged uncertainties, $U_+$ or $U_-$, would be negative. In that case the negative limit is set to zero, as expressed by the maximum with zero in eqn. (19).

The SUMU approach has the disadvantage of producing asymmetrical uncertainty limits. This drawback can be overcome by using the SUMU$_{\text{Max}}$ method, which adds the absolute value of the run bias to the expanded uncertainty:19,20

$$\text{SUMU}_{\text{Max}}\text{:}\quad U(C_{\text{Test Result}}) = k\,u_c(C_{\text{Test Result}}) + |d_{\text{run}}| \tag{20}$$
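The four alternative enlargement methods of eqns. (17)–(20) are simple closed-form expressions. The sketch below writes them as hypothetical Python helpers so that they can be compared on the same inputs. Here `sumu` returns the pair $(U_+, U_-)$, following the $\pm$ of eqn. (19) literally.

```python
import math


def rss_combined(u_c, d_run, k=2.0):
    """Eqn (17), RSSu: bias combined with the combined standard uncertainty."""
    return k * math.sqrt(u_c ** 2 + d_run ** 2)


def rss_expanded(u_c, d_run, k=2.0):
    """Eqn (18), RSSU: bias combined with the expanded uncertainty."""
    return math.sqrt((k * u_c) ** 2 + d_run ** 2)


def sumu(u_c, d_run, k=2.0):
    """Eqn (19), SUMU: asymmetric limits (U+, U-), negative limits set to zero."""
    return max(k * u_c + d_run, 0.0), max(k * u_c - d_run, 0.0)


def sumu_max(u_c, d_run, k=2.0):
    """Eqn (20), SUMU_Max: |d_run| added to the expanded uncertainty."""
    return k * u_c + abs(d_run)


# Hypothetical values: u_c = 0.3 and d_run = 0.77, in the units of the result.
for name, method in [("RSSu", rss_combined), ("RSSU", rss_expanded),
                     ("SUMU", sumu), ("SUMU_Max", sumu_max)]:
    print(name, method(0.3, 0.77))
```

For these example values, the methods enlarge the interval by quite different amounts, and SUMU collapses one of its limits to zero. Which of them actually achieves the stated coverage is the question the simulations address.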

If the outcome of Decision 2.3 was not to account for bias in the test result (Outcome 2.C), the test result is equal to the observed value. When the bias is measured, there is an uncertainty in that measurement. Even if the measurement shows that the bias is not significant, that conclusion is itself uncertain, and bias may still exist. One may proceed as if the bias does not exist, which is often called a zero correction. In that case the uncertainty of this decision still needs to be taken into account. Ignoring the bias and its uncertainty altogether yields a test result that is in error and an uncertainty interval that may not include the “true” value.

Decision 3—is the method fit for purpose. If the run bias is significant, one should always ask, “Is the method fit for purpose?” That is to say, does the method give a result of the quality expected of it? If the answer is no, the method should not be used until improvements enable it to attain the expected level of quality. If the method is fit for purpose but has significant run bias, there are three choices: correct the result for the bias, account for the bias in the uncertainty by using an enlargement method, or present the result uncorrected.

It might be thought rare for a test result with known significant bias to be reported uncorrected. However, many multi-analyte analyses include analytes that are not recovered well enough to be considered to have non-significant bias according to the criterion in eqn. (8), and these may indeed not be corrected for that bias. The guidelines for that type of analysis may well acknowledge that, for a number of analytes, significant bias is present and acceptable. However, they rarely stipulate that a correction should be performed.13

If one corrects the test result for the bias, the uncertainty of the test result is the combination of the uncertainty of the measurement and the uncertainty of the bias, as in eqns. (9)–(11). If the test result is not corrected for bias, the bias can be accounted for in the uncertainty instead. One of the enlargement methods can enlarge the interval so that it has a higher probability of containing the “true” value. If the result is not corrected for significant run bias, the result is in appreciable error.

In the next section we describe Monte Carlo simulations that investigate which of the methods lead to results and confidence intervals that are consistent with their stated aims.

Experimental

Many of the decisions on the treatment of bias have a discontinuity at the point at which the measured bias is deemed to be significant. The resulting probability distribution therefore has no analytical solution. Maroto20 has shown Monte Carlo simulation to be a very powerful way to test the methods. We have adopted the same overall approach, but we use Latin hypercubes for speed and cover a wider range of decision combinations and treatments.

In brief, the following are chosen for the experiments: the true value of a measurand, the measurement repeatability, the uncertainty of the quantity value of the CRM, and the number of measurements of the CRM. Simulations are then performed for a range of run biases, assuming normal distributions of $C_{\text{CRM}}$ and $C_{\text{obs}}$. In the first simulation, the quantity value of a CRM is measured with $\mu = C_{\text{CRM}} + d_{\text{run}}$ and $\sigma = u(d_{\text{run}})$, as given by eqns. (4) and (5). In a second simulation the test result is obtained, and the value of $d_{\text{run}}$ obtained from the first simulation is applied in the prescribed treatment (correction or expansion of uncertainty). This yields a distribution of test results that can be compared with the true value. The fraction of expanded uncertainty ranges that encompass the true value is then recorded. If the treatments are well designed, this fraction should be near 95% for any value of the bias. Scheme 2 shows this approach for the case in which the simulated bias is significant. The simulated $C_{\text{obs}}$ is then either corrected for this bias, or its uncertainty is expanded using the value of the bias.
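The following is a rough sketch of the two-step simulation just described, written with Python's standard library. It is an illustration only. It uses plain pseudo-random sampling rather than the Latin hypercube sampling and @RISK set-up of this work, and its parameter values are hypothetical rather than those of Table 1. The second strategy treats a non-significant bias by reporting the uncorrected value with the eqn. (10)–(11) uncertainty. That treatment is an assumed reading of Outcome 2.C.

```python
import math
import random


def simulate_coverage(d_run, strategy, n_iter=20000, k=2.0, seed=1,
                      c_true=10.0, s_meas=0.3, u_d_run=0.2):
    """Fraction of simulated (result +/- U) intervals that contain c_true.

    strategy : "always_correct" or "correct_if_significant" (hypothetical labels)
    s_meas   : standard uncertainty of the observed sample value,
               standing in for s_p,r,Sample / sqrt(n) in eqn (10)
    u_d_run  : standard uncertainty of the run-bias estimate, eqn (5)
    """
    rng = random.Random(seed)
    u_c = math.sqrt(s_meas ** 2 + u_d_run ** 2)    # eqn (10)
    U = k * u_c                                    # eqn (11)
    hits = 0
    for _ in range(n_iter):
        d_hat = rng.gauss(d_run, u_d_run)          # step 1: measured run bias
        c_obs = rng.gauss(c_true + d_run, s_meas)  # step 2: biased sample result
        significant = abs(d_hat) > k * u_d_run     # eqn (8)
        if strategy == "always_correct" or significant:
            c_test = c_obs - d_hat                 # eqn (9)
        else:
            c_test = c_obs                         # zero correction (assumed 2.C)
        if abs(c_test - c_true) <= U:
            hits += 1
    return hits / n_iter


for d in (0.0, 0.4, 0.8, 1.6):
    print(d, simulate_coverage(d, "always_correct"),
          simulate_coverage(d, "correct_if_significant"))
```

Because the corrected result has the same distribution whatever the bias, the always-correct fraction should stay near 95% for every value of `d_run`. The conditional strategy, in contrast, changes behaviour at the significance threshold. This is the discontinuity that motivates simulation.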

The simulations were performed on a WinTel personal computer running @RISK 3.5 (Palisade software, USA) as an add-in to Microsoft Excel (Office 2000, Microsoft Inc, Seattle, USA). The characteristics of the system were chosen to give non-negligible contributions from the measurement uncertainty and from the uncertainty of the CRM. A range of bias values was chosen to span non-significant to significant bias. Table 1 contains the parameters of the model.

For cases in which there is an analytical solution, i.e. when

the procedure is continuous across the level of significance (e.g.

always correct, never correct) the simulations were checked by

a calculation involving the normal distribution. The fraction of

measurements of a value Ctrue with run bias, drun, and


combined uncertainty uc for which the result plus or minus an expanded uncertainty U includes the true value is given by

$$\int_{C_{\mathrm{true}}-U}^{C_{\mathrm{true}}+U} f\left(x,\; C_{\mathrm{true}}+\delta_{\mathrm{run}},\; u_c\right)\mathrm{d}x \qquad (21)$$

where f(x, μ, σ) is the probability density function of the normal distribution with mean μ and standard deviation σ.
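Because the integrand of eqn. (21) is a normal density, the integral reduces to a difference of two normal CDF values, Φ((U − δrun)/uc) − Φ((−U − δrun)/uc), with Φ the standard normal CDF. A minimal sketch using Python's `statistics.NormalDist` follows; the function name `coverage` and its argument order are our own choices.

```python
from statistics import NormalDist


def coverage(c_true, delta_run, u_c, U):
    """Eqn. (21) in closed form: probability that x ~ N(c_true + delta_run, u_c)
    falls inside [c_true - U, c_true + U]."""
    dist = NormalDist(mu=c_true + delta_run, sigma=u_c)
    return dist.cdf(c_true + U) - dist.cdf(c_true - U)


# An unbiased result with U = 2 * u_c recovers the familiar 95.45 % coverage.
print(round(coverage(100.0, 0.0, 1.0, 2.0), 4))  # 0.9545
```

Any non-zero δrun shifts the density away from the centre of the interval and lowers the result, symmetrically for positive and negative bias.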

Results

Five scenarios, those most likely to occur in practice, have been considered.

1. Always correct for bias

The most obvious scenario would be to always correct for

bias whether it is significant or not. This is equivalent to a

combination of outcomes 2.A and 3.A on the decision chart in

Scheme 1. The observed value is always corrected for the bias

using eqn. (9) and the expanded uncertainty interval (eqn. (11))

is the coverage factor multiplied by the combined uncertainty

(eqn. (10)). For a coverage factor of 2, the probability of the true value occurring within the uncertainty limits is 95.45% at all levels of bias, and the mean percentage obtained from the simulation, shown as a line in Fig. 2, closely approximates this value.
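Always correcting works at every bias because subtracting the estimated bias removes δrun from the test result, leaving only random error with standard deviation uc. The plain Monte Carlo sketch below (simple random sampling, not the Latin hypercube design used in the paper) illustrates this with the Table 1 values. It assumes that the estimated bias is drawn as N(δrun, u(δrun)), that Cobs is N(Ctrue + δrun, sp,r,sample), and that uc = √(sp,r,sample² + u(δrun)²) with n = p = 1; the function name is hypothetical.

```python
import math
import random


def always_correct_coverage(delta_run, trials=100_000, seed=0):
    """Monte Carlo coverage for the 'always correct' strategy.

    Assumed model (Table 1 values, n = p = 1):
      d_hat ~ N(delta_run, u_d)                estimated run bias from the CRM
      C_obs ~ N(C_true + delta_run, s_sample)  observed value of the sample
      C_test = C_obs - d_hat, U = k * sqrt(s_sample**2 + u_d**2)
    """
    rng = random.Random(seed)
    c_true, s_sample, u_d, k = 100.0, 6.0, math.sqrt(17.0), 2.0
    U = k * math.sqrt(s_sample ** 2 + u_d ** 2)
    hits = 0
    for _ in range(trials):
        d_hat = rng.gauss(delta_run, u_d)
        c_obs = rng.gauss(c_true + delta_run, s_sample)
        c_test = c_obs - d_hat  # correct for the estimated run bias
        if abs(c_test - c_true) <= U:
            hits += 1
    return hits / trials


for bias in (0.0, 10.0, 30.0):
    print(bias, always_correct_coverage(bias))
```

Each printed coverage should sit close to 0.9545, whatever the bias, within Monte Carlo noise.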

2. Never correct for bias

The antithesis of the first position would be to never correct

for bias. This corresponds to the combination of outcomes 2.C and 3.C in Scheme 1. The test result is simply the observed value as expressed in eqn. (1), and the uncertainty of the test result is the uncertainty of the measurement as expressed in eqns. (2) and (3). The probability of the true value occurring inside the expanded uncertainty interval starts at 98.5%, higher than the expected value of 95.45%. The curve starts high because the uncertainty of the test result (eqns. (10) and (11)) includes the uncertainty of the bias, whereas the distribution of the observed value, Cobs, which the uncertainty interval encompasses, does not. This

means that the uncertainty interval is slightly larger than the corresponding 95.45% interval of the distribution of Cobs.

Scheme 2 Detection of significant bias (a) followed by correction (b) or enlargement of the expanded uncertainty (c).

Table 1 Parameters of the simulation model

Simulation parameter | Value
Accepted reference value (“true value”) of sample | 100
δrun | ±30, in increments of 1
u(CCRM) | 1.0
sp,r,CRM | 4.0
p (for bias estimation) | 1
u(δrun) | √(4² + 1²) = 4.12
sp,r,sample | 6.0
n (for sample estimation) | 1
k | 2
Number of iterations of Latin hypercube simulation for each δrun and sample | 125,000

As the bias increases, the probability quickly decreases, as shown in Fig. 2. This is not surprising: as the bias increases while the uncertainty interval remains constant, there is an increasingly small chance of the true value occurring inside the interval.
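The 98.5% starting value for the never-correct strategy can be checked in closed form. The sketch assumes, from the explanation above, that Cobs scatters only with sp,r,sample = 6.0 (n = 1), while the interval half-width is U = k·√(sp,r,sample² + u(δrun)²) ≈ 14.56 with u(δrun) = √17 from Table 1. It is our interpretation of the model, not code from the paper, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def never_correct_coverage(delta_run, s_sample=6.0, u_d=math.sqrt(17.0), k=2.0):
    """Closed-form coverage when the observed value is reported uncorrected.

    Assumed model: C_obs - C_true ~ N(delta_run, s_sample), and the reported
    interval is C_obs +/- U with U = k * sqrt(s_sample**2 + u_d**2).
    """
    U = k * math.sqrt(s_sample ** 2 + u_d ** 2)
    phi = NormalDist()  # standard normal distribution
    return phi.cdf((U - delta_run) / s_sample) - phi.cdf((-U - delta_run) / s_sample)


for bias in (0, 5, 10, 20, 30):
    print(f"delta_run = {bias:>2}: coverage = {never_correct_coverage(bias):.3f}")
```

At zero bias this gives about 0.985, because U is computed with u(δrun) included while Cobs does not scatter with it, and the coverage then falls steadily as the bias grows.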

3. Non-significant bias—uncorrected/significant bias—corrected

The intermediate positions between these two extreme cases would be to leave the observed value uncorrected for non-significant bias and to correct the observed value, or enlarge the uncertainty, for significant bias. The former situation, in which any non-significant bias is disregarded and the test result is corrected once the bias becomes significant, corresponds to the combination of outcomes 2.C and 3.A in Scheme 1. The simulation gives a curve that starts just below 97.5%, which is again above the expected probability of 95.45%. The reason is the same: the uncertainty of the test result includes the uncertainty of the bias, whereas the distribution of the observed value does not. From this starting value the probability decreases until the bias becomes significant. The observed value is then corrected for the bias, and the probability of the uncertainty interval containing the true value increases until it reaches the expected 95.45% (see Fig. 2).
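The correct-only-when-significant strategy needs a significance rule. Eqn. (8) is not reproduced in this section, so the Monte Carlo sketch below assumes that the bias is significant when the estimated bias exceeds k·u(δrun), i.e. a significant bias ratio above 1 (the ratio is defined in the next subsection). The remaining assumptions (Table 1 values, n = p = 1, simple random sampling rather than a Latin hypercube) are stated in the docstring; the function name is hypothetical.

```python
import math
import random


def correct_if_significant_coverage(delta_run, trials=100_000, seed=0):
    """Monte Carlo coverage when bias is corrected only if judged significant.

    Assumptions: d_hat ~ N(delta_run, u_d) from one CRM run; the bias is taken
    as significant when |d_hat| > k * u_d; C_obs ~ N(C_true + delta_run, s_sample);
    in both branches U = k * sqrt(s_sample**2 + u_d**2).
    """
    rng = random.Random(seed)
    c_true, s_sample, u_d, k = 100.0, 6.0, math.sqrt(17.0), 2.0
    U = k * math.sqrt(s_sample ** 2 + u_d ** 2)
    hits = 0
    for _ in range(trials):
        d_hat = rng.gauss(delta_run, u_d)
        c_obs = rng.gauss(c_true + delta_run, s_sample)
        significant = abs(d_hat) > k * u_d
        # Correct only when the estimated bias is significant.
        c_test = c_obs - d_hat if significant else c_obs
        if abs(c_test - c_true) <= U:
            hits += 1
    return hits / trials


for bias in (0.0, 8.0, 16.0, 30.0):
    print(bias, correct_if_significant_coverage(bias))
```

Under these assumptions the coverage at zero bias comes out above the nominal 0.9545, dips at intermediate bias, and returns to about 0.9545 at large bias, where correction is always triggered.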

4. Non-significant bias—uncorrected/significant bias—enlarged

A combination of outcomes 2.C and 3.B illustrates the situation in which non-significant bias is disregarded and the uncertainty range is enlarged to compensate for the bias once it becomes significant (see Fig. 3). Again the curves start higher than expected, because the uncertainty of the test result includes the uncertainty of the bias whereas the distribution of the observed value does not. The SUMU method gave the least acceptable probability of all the enlargement methods. Its curve starts at 97.5% and falls slowly to zero at a significant bias ratio (the ratio of the bias to the expanded uncertainty of the bias) marginally higher than 2. The RSSU and Barwick and Ellison methods start at zero bias with a probability of 98.6%, decrease to 91% when the bias becomes significant, and continue to decline at slightly different rates to approximately 80% and 83%, respectively, at a significant bias ratio of 2.0. The SUMUMax curve mirrors the RSSu curve; both decrease slightly to 92% just after the bias becomes significant. The curves then rise, as the interval is enlarged, to the expected probability at a significant bias ratio of 1.75, above which they diverge, with RSSu rising to approximately 2% above SUMUMax at a significant bias ratio of 3.0. At high bias, above a significant bias ratio of 1.75, the RSSu and SUMUMax methods slightly overestimate the uncertainty interval.

5. Always enlarge for bias

The remaining scenario is to enlarge the uncertainty range to account for the bias at all levels of bias. In this scenario, as expected, all the curves start at a probability above the expected value, except that of the SUMU method, which performs poorly (Fig. 4): starting from the expected value of 95.45% at zero bias, its probability quickly decreases. The RSSU and Barwick and Ellison curves mirror each other, the latter giving a slightly higher probability. They gradually decrease to 92% and 93.5%, respectively, at the onset of significant bias, and continue to decrease to 80% and 83.5%, respectively, at a significant bias ratio of 2.0. The RSSu method starts approximately 3% above the expected value at zero bias and declines to approximately 1% above it at the onset of significant bias; as the bias increases further, the uncertainty interval is overestimated to a small extent. The SUMUMax method starts approximately 4% above the expected value at zero bias and declines slightly, to approximately 97.7%, as the bias increases. It is to be expected that enlarging the uncertainty interval when the bias is non-significant gives a higher probability of the true value lying inside the limits, and this can be seen clearly in Fig. 4.
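Scenarios 4 and 5 differ only in when the interval is enlarged. The sketch below implements only SUMUMax, U = k·uc + |δrun|, the one enlargement rule whose formula is given in the text; SUMU, RSSU, RSSu and the Barwick and Ellison method are not reproduced. It assumes Table 1 values with n = p = 1, an estimated bias drawn as N(δrun, u(δrun)), significance when the estimated bias exceeds k·u(δrun), and simple random rather than Latin hypercube sampling. The function names are hypothetical.

```python
import math
import random


def sumu_max_U(k, u_c, bias):
    """SUMUMax expanded uncertainty: k * u_c + |bias|."""
    return k * u_c + abs(bias)


def enlarge_coverage(delta_run, always=True, trials=100_000, seed=0):
    """Monte Carlo coverage for an uncorrected result whose interval is
    enlarged by SUMUMax, either always or only when the bias is significant.

    Assumptions: d_hat ~ N(delta_run, u_d); significant when |d_hat| > k * u_d;
    C_obs ~ N(C_true + delta_run, s_sample); u_c = sqrt(s_sample**2 + u_d**2).
    """
    rng = random.Random(seed)
    c_true, s_sample, u_d, k = 100.0, 6.0, math.sqrt(17.0), 2.0
    u_c = math.sqrt(s_sample ** 2 + u_d ** 2)
    hits = 0
    for _ in range(trials):
        d_hat = rng.gauss(delta_run, u_d)
        c_obs = rng.gauss(c_true + delta_run, s_sample)
        if always or abs(d_hat) > k * u_d:
            U = sumu_max_U(k, u_c, d_hat)  # enlarge by the estimated bias
        else:
            U = k * u_c  # non-significant bias: no enlargement
        if abs(c_obs - c_true) <= U:
            hits += 1
    return hits / trials


for bias in (0.0, 30.0):
    print(bias, enlarge_coverage(bias, always=False), enlarge_coverage(bias, always=True))
```

Under these assumptions the always-enlarge coverage at large bias approaches Φ(2) ≈ 0.977: the enlarged upper limit tracks the estimated bias, so only random error with standard deviation uc remains. This is close to the 97.7% quoted for SUMUMax above.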

Fig. 2 To correct or not correct for bias. This graph shows the

probability of the true value occurring within the uncertainty limits

when the bias is corrected (circles), when it is not corrected for at all

(triangles), and when it is corrected for only when the bias is significant

(squares). Lines are drawn indicating the level at which the bias is

considered significant and the expected probability for k = 2 (95.45%).

Fig. 3 Enlarge when significant. This graph shows the probability of

the true value occurring within the uncertainty interval when the

uncertainty interval is increased by an enlargement method when bias

is significant. The enlargement methods are SUMU (diamonds), RSSU

(circles), Barwick & Ellison (triangles), SUMUMax (crosses), RSSu

(squares). Lines are drawn indicating the level at which the bias is

considered significant and the expected probability for k = 2 (95.45%).


Discussion

The estimation of the uncertainty of an analytical test result needs to consider only the precision of the whole analytical procedure and the uncertainty of the correction for bias. These are best estimated separately, as the repeatabilities of a sample and a CRM, as in eqns. (5), (10) and (11). Because accreditation bodies and method providers already stipulate that a bias estimate (usually as recovery) be determined with each run, it is a simple matter to use these data without much extra effort by the laboratory. Estimates of uncertainty based on intermediate reproducibility can give an adequate estimate of the uncertainty of the test result; however, the bias and its uncertainty must still be estimated to verify whether the bias is significant. Estimates based on full reproducibility, such as those from interlaboratory collaborative trials, should be considered only when it is clear that all participants are rigorously following the same analytical method in unbiased situations.

The significance of bias can be determined by the comparison in eqn. (8). Bias is often hidden when the significance test is based on reproducibility data, because between-run or between-laboratory reproducibility is necessarily larger than within-run repeatability and would thus, through eqn. (8), incorrectly render the bias non-significant. A second question may be raised about the level of significance. Testing at the 95% level means that the null hypothesis (bias is zero) is rejected only when the probability of obtaining a bias estimate at least as large as the one observed, were the true bias zero, falls below 5%. In situations where a small bias is likely, there may be a case for rejecting the null hypothesis at a much higher significance level (say 10 or 20%).
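Both points above, reproducibility hiding bias and the choice of significance level, can be seen in a minimal two-sided test. It assumes a z-type form, significant when |bias| > z(1−α/2)·u, which may differ in detail from eqn. (8). The bias of 9.0 and the standard uncertainties 4.12 (repeatability-based) and 6.0 (reproducibility-based) are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist


def bias_is_significant(bias, u_bias, alpha=0.05):
    """Two-sided test (assumed form): significant if |bias| > z_{1-alpha/2} * u_bias."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(bias) > z_crit * u_bias


# Hypothetical numbers: the same estimated bias judged against a
# repeatability-based and a (larger) reproducibility-based uncertainty.
bias = 9.0
u_repeatability, u_reproducibility = 4.12, 6.0

for alpha in (0.05, 0.10, 0.20):
    print(alpha,
          bias_is_significant(bias, u_repeatability, alpha),
          bias_is_significant(bias, u_reproducibility, alpha))
# 0.05 True False
# 0.1 True False
# 0.2 True True
```

With the larger reproducibility-based uncertainty the same bias passes as non-significant at the 5% and 10% levels and is flagged only at 20%.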

Metrological traceability and therefore comparability of test

results within a laboratory and between laboratories can only

be achieved if one recognizes that the run bias is the bias of the

analysis as performed and a traceable correction for that bias

is carried out. The run bias of an analysis tends to be due to

uncontrolled factors that remain constant for the period of the

analysis and have equal effects on all samples in that batch. If

the bias varies from sample to sample then it could be said that

the method is not very robust. Efforts should be made, usually

in the method validation stage, to rid the method of

uncontrolled factors. This is usually done by introducing an internal standard or surrogate. If this cannot be achieved, then the “fitness for purpose” of the method needs to be addressed. If bias is seen to vary in repeated analyses of a sample, this can be attributed to the sample or to the method. If the within-sample bias is due to the sample, it is usually associated with the heterogeneity of the sample. Efforts should then be made to employ a sampling strategy that results in a homogeneous sample; if this is not possible, the number of replicate analyses of the sample should be increased to give a more reliable result. If the within-sample bias is due to the method, this variability is accounted for in the repeatability of the sample and again can be reduced by increasing the number of replicates of the sample.

The simulation reveals that the best way to deal with bias is always to correct for it. If correction for bias is not allowed, then enlarging the uncertainty by the SUMUMax method gives the best compromise across all levels of bias.

Conclusions

Testing the average bias obtained under reproducibility conditions for significance may render the bias acceptable, that is, non-significant, when on many occasions it is not. The bias should be included in the test result by correcting for it on the basis of the magnitude of the effect in that particular run, and the uncertainty of the correction should be included in the uncertainty of the test result. If the observed value is not corrected for the bias, then the uncertainty of the test result should be enlarged by one of the enlargement methods so that it at least includes the “true” value. The best enlargement method, as illustrated in this paper by simulation, is the SUMUMax method, which adds the absolute value of the bias to the expanded uncertainty. If bias is not allowed for, either by correcting the test result or by enlarging its uncertainty, and the uncertainty of the bias estimate is not included, both the test result and its uncertainty are invalid.

Gregory E. O’Donnell a,b and D. Brynn Hibbert* b

a WorkCover NSW, Laboratory Services Unit, 5a Pioneer Ave, Thornleigh, NSW, 2120, Australia. Fax: +61 2 9980 6849; Tel: +61 2 9473 4005
b School of Chemistry, University of New South Wales, Sydney, NSW, 2052, Australia. Fax: +61 2 9385 4713; Tel: +61 2 9385 6141

References

1 ISO, BIPM, IEC, IFCC, IUPAC, IUPAP and OIML, Guide to the Expression of Uncertainty in Measurement, ISO, Geneva, Switzerland, 1993.

2 ISO, International vocabulary of basic and general terms in metrology, ISO, Geneva, Switzerland, 1993.

3 ISO 3534-1, Statistics – Vocabulary and symbols – Part 1, ISO, Geneva, Switzerland, 1993.

4 M. Thompson and R. Wood, Pure Appl. Chem., 1995, 67, 649.

5 M. Thompson, S. L. R. Ellison and R. Wood, Pure Appl. Chem., 2002, 74, 835.

Fig. 4 Always enlarge. This graph shows the probability of the true

value occurring within the uncertainty interval when the uncertainty

interval is increased by an enlargement method over all levels of bias.

The enlargement methods are SUMU (diamonds), RSSU (circles),

Barwick & Ellison (triangles), SUMUMax (crosses), RSSu (squares).

Lines are drawn indicating the level at which the bias is considered

significant and the expected probability for k = 2 (95.45%).


6 ISO 5725-1, Accuracy (trueness and precision) of measurement methods and results – Part 1: General principles and definitions, ISO, Geneva, Switzerland, 1998.

7 R. Walker and I. Lumley, Trends Anal. Chem., 1999, 18, 594.

8 D. L. Massart, B. G. M. Vandeginste, L. M. C. Buydens, S. D. Jong, P. J. Lewi and J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier Science B.V., Amsterdam, 1997.

9 E. Hund, D. L. Massart and J. Smeyers-Verbeke, Trends Anal. Chem., 2001, 20, 394.

10 R. Song, E. Kennedy and D. Bartley, Anal. Chem., 2001, 73, 310.

11 Analytical Methods Committee, Analyst, 1995, 120, 2303.

12 A. Maroto, R. Boque, J. Riu and F. X. Rius, Trends Anal. Chem., 1999, 18, 577.

13 A. Ambrus, Accredit. Qual. Assur., 2004, 9, 288.

14 M. Thompson, J. Environ. Monit., 1999, 1, 19N.

15 M. Ramsey, VAM Bulletin, 2004, autumn 9.

16 D. T. Burns, K. Danzer and A. Townshend, Pure Appl. Chem., 2002, 74, 2201.

17 V. J. Barwick and S. L. R. Ellison, Analyst, 1999, 124, 981.

18 V. J. Barwick and S. L. R. Ellison, Accredit. Qual. Assur., 2000, 5, 47.

19 S. D. Phillips, K. R. Eberhardt and B. Parry, J. Res. Natl. Inst. Stand. Technol., 1997, 102, 577.

20 A. Maroto, R. Boque, J. Riu and F. X. Rius, Accredit. Qual. Assur., 2002, 7, 90.

21 M. A. H. Franson, American Public Health Association, Standard methods for examination of water and wastewater, Washington, DC, 1989.
