GUIDELINES FOR OBTAINING, CALCULATING AND REPORTING QUALITY

STATEMENTS FOR CHEMICAL ANALYSIS OF MARINE SEDIMENTS

R. W. MACDONALD

TABLE OF CONTENTS

1. INTRODUCTION

2. REPORTING

2.1 What should be reported

2.2 How data should be reported

3. METHODOLOGY

3.1 Method documentation

3.1.1 Principles of a good procedure

3.2 Organizational requirements of RODAC

3.3 Validation of the method

3.3.1 Ruggedness testing

4. FACTORS CONTRIBUTING TO QUALITY

4.1 Reagents

4.2 Recovery and Interferences

4.2.1 Reference materials

4.2.2 Spiking environmental samples

4.4 Calibration

4.5 Blanks

4.6 Limit of detection (LOD)

4.7 Accuracy

4.7.1 Random error

4.7.2 Systematic error (bias)

4.7.3 Combining and reporting the two error statements

4.8 Quality assurance

4.8.1 Control charts

5. RECOMMENDATIONS FOR VERIFYING QUALITY OF DATA

6. REFERENCES

APPENDICES

APPENDIX 1 DEFINITION OF TERMS USED IN REPORTING OF DATA

APPENDIX 2 THE RUGGEDNESS TEST

APPENDIX 3 CERTIFIED REFERENCE SEDIMENTS

A.3.1 Metals

A.3.2 Chlorinated hydrocarbons

A.3.3 Reference sediments in preparation

A.3.4 Addresses

APPENDIX 4 PROPAGATION OF ERROR

APPENDIX 5 COMMON FORMULAS USED TO SUMMARIZE DATA AND MAKE QUALITY STATEMENTS

A.5.1 Sample standard deviation

A.5.2 Pooled standard deviation, sp

A.5.3 Rejection of outliers

A.5.4 Calibration using linear regression

A.5.4.1 Linearity

A.5.4.2 Linear regression

A.5.5 Blank Correction

A.5.6 Limit of detection (LOD)

A.5.6.1 IUPAC Definition

A.5.6.2 LOD by propagation of error

A.5.7 Symbols used in Appendix 5

APPENDIX 6 CONTROL CHARTS

A.6.1 The Shewhart control chart

A.6.2 The Youden control method

A.6.3 The range control chart

A.6.4 Summary

A.6.4.1 Chart preparation

A.6.4.2 Chart interpretation

deviations. However, the leap from these to confidence limits requires certain

assumptions which may be invalid (Kaiser, 1970) and should be approached with

caution. At the very least, the average, $\bar{X}$, the standard deviation, s, and the

number of replicates, n, should be reported (Natrella, 1982) because how well $\bar{X}$ estimates the population mean, $\mu$, depends on s and n, and how well s estimates $\sigma$,

the population standard deviation, depends on n. Uncertainty statements should

be carefully formulated and supported so that there can be no confusion in what

is meant. Outliers which have been deleted from the data set should be

identified, and statistical or other reasons for their deletion should be

specified.
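
As a minimal sketch of the reporting recommendation above, the following Python computes the three quantities that should accompany a result: the average, the sample standard deviation and the number of replicates. The replicate values and the function name `summarize` are hypothetical and serve only as an illustration.

```python
import math

def summarize(values):
    """Return (mean, sample standard deviation, n) for a list of replicates."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate s")
    mean = sum(values) / n
    # Sample standard deviation with n - 1 degrees of freedom.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, s, n

# Hypothetical replicate determinations of one metal in one sediment (ug/g).
replicates = [24.1, 25.3, 24.8, 25.9, 24.4]
mean, s, n = summarize(replicates)
print(f"mean = {mean:.2f}, s = {s:.2f}, n = {n}")
```

Reporting all three values lets a reader judge how well the average and standard deviation estimate the population parameters without assuming a particular confidence model.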

Data points which fall below the LOD (limit of detection) should be

reported as "not detected" followed by the LOD in brackets. Those points which

fall between the LOD and the quantitation limit (LOQ) should be reported as

numbers followed by the detection limit in brackets (ACS Committee, 1980).
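
The reporting convention just described can be sketched as a small formatting helper. This is an illustration only; the function name `report_value` and the example limits are hypothetical.

```python
def report_value(value, lod, loq):
    """Format one determination according to where it falls relative to LOD and LOQ."""
    if value < lod:
        # Below the LOD: report "not detected" with the LOD in brackets.
        return f"not detected ({lod:g})"
    if value < loq:
        # Between LOD and LOQ: report the number with the LOD in brackets.
        return f"{value:g} ({lod:g})"
    # At or above the LOQ: report the number (error statement handled separately).
    return f"{value:g}"

# Hypothetical limits (ug/g) and three determinations.
for v in (0.10, 0.35, 0.90):
    print(report_value(v, lod=0.18, loq=0.60))
```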

The numbers should not be distorted by the reporting process. This

means that blank and recovery corrections, and calibration conversions should be

made clear. Give at least the formula and preferably a worked example. Errors

should be avoided by rounding off at the end of the calculations. It is

generally assumed that roundoff (correctly applied) implies ± 1/2 in the last

significant figure (i.e. 3.2 ± 0.05); however, this is a poor way to judge

confidence in data (ACS Committee, 1980). State the uncertainty to two

significant figures and the reported value to the last place in the uncertainty

statement (Ku, 1968). Terminology and units should be expressed according to

S.I. practice (see for instance the last pages in a recent January issue of

Analytical Chemistry). For data where neither imprecision nor systematic error

(bias) are negligible, Eisenhart (1968) recommends qualifying the results with a

statement placing bounds on systematic error with a separate statement of the

standard deviation or imprecision. The reported result should be stated to the

last place affected by the finer of the two qualifying error statements. Later,

in the section on accuracy (4.7), we discuss how the two error statements might

be combined.
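
The rounding rule attributed to Ku (1968), uncertainty to two significant figures and the value to the same decimal place, can be sketched as follows. The function name `round_result` and the example numbers are hypothetical; rounding is applied once, at the end of the calculation.

```python
import math

def round_result(value, uncertainty):
    """Round the uncertainty to two significant figures and the value to the same place."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Power of ten of the leading digit of the uncertainty.
    exponent = math.floor(math.log10(uncertainty))
    # Decimal place of the second significant figure.
    decimals = 1 - exponent
    return round(value, decimals), round(uncertainty, decimals)

print(round_result(24.8763, 0.7176))
print(round_result(203.46, 12.3))
```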

3. METHODOLOGY

The word "method" has been defined as a "set of written instructions

completely defining the procedures to be adopted by the analyst in order to

obtain the required analytical result" (Wilson, cited in Kirchmer, 1983). While

most analysts would concur with the definition, there is a further need for

clarity since there is a large variation in what is normally reported as a

method. Taylor (1983) has suggested the following hierarchy:

technique-method-procedure-protocol, where technique refers to the scientific principle,

and method is a distinct adaptation of technique. Only when we get to procedure

do we consider written directions necessary to use a method, and more

specifically a protocol is a set of definitive directions that must be followed

without exception. Generally when we refer to method as defined at the outset,

we mean a procedure or more stringently, protocol. In the following, we will

use Taylor's terminology for clarity.

3.1 Method documentation

The documentation should include sample pre-treatment, digestion and

instrumentation. All need to be specified accurately and completely so that

others can understand exactly how the determination was performed. New methods

should be reported fully with exhaustive testing (ACS Committee, 1980). It is

acceptable to cite a reference provided it is generally available and gives a

complete procedure for the method. Any modifications to a procedure should be

fully tested and reported, and the procedure should be updated when

modifications have been instituted. Little known methods are not recommended

since they force one to rely on the analyst, and generally do not have a good

basis for comparison. Uniformity of methods between laboratories can remove one

source of inter-laboratory variance, however, with properly validated methods,

this step is not essential to meet quality objectives. If the laboratory is

following acceptable quality assurance practice, they will already have at least

a procedure and preferably a laboratory protocol. Provision of a detailed

description of their procedure, therefore, need only be done once and kept on

file with updates when changes are introduced. This should not be an onerous

TABLE 1

Proposed performance characteristics of a suitable ocean dumping procedure for total metal in sediments⁴.

METAL   LOD³         PRECISION       ACCURACY¹       CONCENTRATION²
        (μg g⁻¹)     (s × 100 / X̄)   (s × 100 / X̄)   X̄ (μg g⁻¹)

Hg      < 0.23       < 10%           15%             0.75
Cd      < 0.18       < 10%           15%             0.60
Cu      < 4          < 5%            10%             25
Pb      < 21         < 20%           25%             35
Zn      < 30         < 5%            15%             200

1. Modified standard deviation, calculated from Σ(Xᵢ − R)² rather than Σ(Xᵢ − X̄)², where R is the reference value.

2. X̄ is the concentration (dry weight basis) upon which precision, accuracy and LOD are based.

3. For Cu, Pb and Zn these were estimated as 3s at the concentration shown.

4. A suitable method for total metals should also have a recovery of metal in certified reference sediments of greater than 60%, and preferably greater than 90% (see section 4.2).
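
The detection limits in Table 1 can be cross-checked against the precision column if note 3 is read as three standard deviations (3s) at the listed concentration. The sketch below makes that assumption explicit; the function name `lod_from_precision` is hypothetical.

```python
# (relative standard deviation as a fraction, concentration in ug/g) from Table 1.
table1 = {"Cu": (0.05, 25), "Pb": (0.20, 35), "Zn": (0.05, 200)}

def lod_from_precision(relative_sd, concentration, k=3):
    """LOD taken as k standard deviations at the stated concentration."""
    return k * relative_sd * concentration

lods = {metal: lod_from_precision(*row) for metal, row in table1.items()}
for metal, lod in lods.items():
    print(f"{metal}: LOD about {lod:.2f} ug/g")
```

Under this reading the computed values (about 3.75, 21 and 30 μg g⁻¹) agree with the tabulated limits of < 4, < 21 and < 30.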

3.3 Validation of the method

This subject has recently been reviewed by Taylor (1983), and deserves

more emphasis than it is generally given. The goal of validation is to see if a

laboratory can use a specific method to produce results which conform to

pre-determined requirements. This implies that corporate goals are defined as

in Table 1. The literature may be reviewed for prospective methods which should

meet the pre-defined performance characteristics. Reported methods often lack

detail and it is generally unwise to accept claims for the performance at face

value. Therefore, a good laboratory will perform its own validation before

using the method routinely, and as an end product of validation it prepares a

procedure or better still a protocol as described above.

During validation, the performance of the method is examined; a good

design allows estimates of precision and accuracy (as a function of concentration)

for both reference materials and real samples (Kirchmer, 1983). Typically,

three concentration levels should be examined: the extremes and the mid-range.

Information acquired during this process can lead to an estimate of the range of

application, sensitivity to interferences, purity of reagents and the detection

limits. The product of validation, therefore, is a list of performance

characteristics. Through validation, the laboratory demonstrates its capability

of performing a method, and generates sufficient information to prepare the

protocol. The performance expectations so determined should not be used for

quality estimates of later data; rather they are a yardstick for comparison. If

there is a limited number of sample types for which the method does not work,

this does not mean the method need be abandoned (Evans, 1978). Rather these

sample types should be identified, and use of an alternate procedure designated.

This is the value of documenting "accumulated experience" with a method.

3.3.1 Ruggedness testing

No validation procedure would be complete without a ruggedness test.

This can be performed with remarkably little effort, and provides evidence of

sensitivity of a procedure to small changes. An outline of how to perform a

ruggedness test is given in Appendix 2. Procedures which are rugged are

desirable since their application by different analysts at different times is

likely to produce more consistent results. If the outcome of a procedure is

found to be very sensitive to a particular variable (e.g. temperature, humidity,

time), then alternate steps should be considered, or that variable must be very

tightly controlled within specifications outlined in the protocol.

4. FACTORS CONTRIBUTING TO QUALITY

4.1 Reagents

The required purity of reagents varies with the determination, and needs

specific evaluation during method validation. The performance will be

controlled by purity (and variability of purity) since it affects precision,

bias and LOD for the procedure. As a rule, for analytical determination of

components of sediments analytical reagent grade will be required as a minimum.

For some metals, spectra-grade or ultra-pure chemicals will be required.

Certainly, once established, no chemicals of lesser purity than specified in the

protocol should be used. The reagent background or blank of each chemical

component should be determined prior to use (Booth, 1979). Variations in purity

occur from source to source and even from lot to lot and therefore new bottles

should be evaluated before use. In addition to the individual reagent blanks, a

method blank is required to evaluate the combination of all chemicals used in

sample digestion and preparation, and also to estimate the LOD.

Chemicals used in the preparation of standards or calibrants require

special attention since they will be factors in both precision and accuracy of

the reported results. It is particularly important to evaluate the storage

procedure (bottle type, time, light, temperature). Restandardization and

chemical preparation should be carried out as often as required and should be

specified in the protocol. Standards can be prepared or bought commercially,

but should be independently checked and intercalibrated with the old standards

to provide continuity.

4.2 Recovery and Interferences

Methods which achieve high recoveries are desirable since they do not

require a large bias correction, and therefore are easier to check with

reference materials. Furthermore, they have inherently better relative

precision. For example, compare two different methods with routine recoveries of

85-95% and 25-35% respectively. The range in recovery as a percent is

apparently the same for both methods; however, the variance in recovery leads to

a coefficient of variation of ± 6% for the first method, and ± 17% for the

second. In practice methods with low recoveries tend to have wider ranges in

recovery which worsens this situation. For these reasons, the ACS Committee

(1980) recommends that methods with recoveries of less than 60% not be used.

The recommendation to avoid methods which have low recovery should in no way be

misconstrued to mean that selective or partial extraction schemes are to be

avoided. What I mean here is low recovery of the determinand of interest, which

could be a small portion of the total, for example the "weakly bound" component.
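
The comparison of the two recovery ranges earlier in this section can be reproduced with a few lines of Python. The sketch assumes that the quoted coefficients of variation are the half-range of recovery divided by the mid-point recovery; that interpretation reproduces the figures quoted above but is not a formula given in the text, and the function name is hypothetical.

```python
def relative_half_range(low, high):
    """Half the recovery range expressed as a percentage of the mid-point recovery."""
    midpoint = (low + high) / 2
    half_range = (high - low) / 2
    return 100 * half_range / midpoint

high_recovery = relative_half_range(85, 95)  # method recovering 85-95%
low_recovery = relative_half_range(25, 35)   # method recovering 25-35%
print(f"{high_recovery:.1f}% versus {low_recovery:.1f}%")
```

The same absolute spread of ten percentage points is therefore about three times larger, in relative terms, for the low-recovery method.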

The determination of recovery (and its repeatability and

reproducibility) is an important validation step which should be addressed with

reference materials, and by using referee methods such as those known to give

total extraction (HF for total metals in sediments for example). Spiking can be

used to determine the efficiency of various chemical or physical extraction

steps but will give no information of the extraction of the determinand embedded

in a complex matrix or in a form which differs from the spike (see section

4.2.2).

4.2.1 Reference materials

Certified reference materials (Appendix 3) can be obtained from a number

of sources and are a keystone in validating a method and checking its

performance. Reference materials are intended to behave like environmental

samples, and are in fact carefully characterized environmental samples of

known composition. At present, the selection for metals, hydrocarbons and

chlorinated hydrocarbons in reference marine sediments is limited to a very few,

so not all matrix types or metal concentrations are represented. For

environmental material which is similar in composition and trace component

concentration, reference materials are the method of choice to check for

recovery and interference problems.

The limited selection of reference materials can be augmented by

preparing in-house uncertified sediments. This could be particularly important

for exceptional material (mine tailings, woody waste) which present reference

materials do not well represent. The uncertified reference should be well

mixed, properly stored to avoid deterioration, and fine grained (< 62 μm) so

that representative samples can be easily removed. This material should be

analyzed by referee methods if possible as a further check, or used in

inter-calibrations with other laboratories performing similar analyses.

4.2.2 Spiking environmental samples

Spiking can be carried out by adding a known amount of analyte to a

portion of sample for which there is already a determination and estimating the

difference between actual recovery and theoretical recovery. This test is not

very powerful in a statistical sense (Kirchmer, 1983) and even without

interference, considerable differences from 100% are to be expected. An

alternative approach is the well known standard additions method (SAM), the uses

and limitations of which are discussed by Klein and Hach (1977).
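
A spike recovery of the kind described above can be computed as in the sketch below. The function name `spike_recovery` and the example concentrations are hypothetical, and, as the text notes, a single recovery figure is a weak test, so departures from 100% should be interpreted cautiously.

```python
def spike_recovery(unspiked, spiked, spike_added):
    """Percent recovery of a known spike added to a previously analysed sample."""
    if spike_added <= 0:
        raise ValueError("spike_added must be positive")
    # Observed increase divided by the theoretical increase.
    return 100 * (spiked - unspiked) / spike_added

# Hypothetical example: 25.0 ug/g found before spiking, 43.0 ug/g after adding 20.0 ug/g.
recovery = spike_recovery(unspiked=25.0, spiked=43.0, spike_added=20.0)
print(f"recovery = {recovery:.0f}%")
```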

Standard additions and spiking are fraught with problems in

interpretation and should be approached with great caution. Spiking with a

soluble form of the analyte will indicate nothing of the effectiveness in

leaching the analyte from a solid matrix. The assumption in spiking experiments

is that the analyte added has equilibrated with the natural form, or is subject

to the same recovery and interference (Holden et al., 1983; Corsini et al., 1982;

ACS Committee, 1983). This assumption is often unwarranted and Corsini et al.

(1982) point out that even complete recovery of the analyte spike is not

evidence that the analytical result is correct. Suitable combinations of

thermodynamic and kinetic factors are required before SAM is reliable. However

recovery tests by spiking can provide a certain degree of information; for

example non-linear SAM curves, or disagreement between SAM and normal

calibration procedures indicate a problem. Furthermore, spiking of blanks

(calibration) which are run through the entire analytical sequence can give

information on interferences arising from the reagents. These kinds of

interferences or recovery problems can and should be eliminated before the

method is applied to environmental samples. If results for spikes on a sample

are poor, the matrix is likely to present even greater problems (Holden et al.,

1983). Spiking of blanks can also be used in the estimation of the LOD of the

method (Glaser et al., 1981).

4.4 Calibration

Calibration is the process of relating an instrumental response to a

corresponding mass or concentration of a particular substance. Taylor (1983)

notes that there are two kinds of calibrations; physical calibration for the

measurement equipment and ancillary measurements (time, volume, mass), and

chemical calibration. Due care should be paid to physical calibrations so that

they do not contribute significantly to error. Of all the ancillary equipment

the analytical balance is probably the most important and most neglected. The

balance plays a central role in analysis, generally used to prepare standards

and to calibrate or check other laboratory procedures, micro-pipetting for

example. Balances should be checked for calibration regularly with a set of

standard weights, and a control chart maintained with the balance to ensure

prompt correction in case of a problem.

Uncertainty in the calibration, particularly biased calibration curves,

is a leading contributor to inter-laboratory variance. For this reason it is

essential to perform the calibration with high quality calibrants which have

been verified. Calibrants should not be stored longer than recommended by

manufacturers and detail concerning their preparation and use should be

specified in the protocol. New calibrants should be inter-calibrated with the

old ones, when possible, to ensure continuity.

To prevent bias, calibrants should match as closely as possible blanks

and samples, and they should be analyzed by the identical procedure (Kirchmer,

1983). Different procedures are acceptable only where there is experimental

evidence that the results differ negligibly. The procedure for calibrating

should be specified in the laboratory protocol (ACS Committee, 1980).

The ideal arrangement is to have a linear calibration but this should

initially be verified with the linearity test as shown in section A.5.4.1. The

ACS Committee (1980) recommends that 5 concentrations analyzed in triplicate

should be used. The concentrations should span the measurement range for

samples. With experience of the instrument response (sensitivity, calibration

blank) these requirements could be dropped to 3 different concentrations

analyzed in triplicate. Check calibrants should be prepared independently to

assess precision and bias of the calibration and further establish linearity.

We recommend control standards at a high concentration to check slope bias and

at a low concentration to check blank bias (see appendix). Calibration data can

and should be summarized by linear regression equations (Kaiser, 1970).

Appropriate formulas and reporting instructions are given in A.5.4. Calibration

near zero concentration will also affect the estimate of LOD and so needs to be

performed with care.
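
To illustrate summarizing calibration data by linear regression, the sketch below fits an ordinary least-squares line to five concentrations read in triplicate, the design attributed above to the ACS Committee (1980). The data, units and function names are hypothetical; the reporting formulas the report itself recommends are those of A.5.4, which are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical calibration: 5 concentrations (ug/mL), each read in triplicate.
concentrations = [0.0, 1.0, 2.0, 3.0, 4.0]
responses = [
    [0.02, 0.03, 0.01],
    [0.52, 0.50, 0.51],
    [1.01, 1.03, 0.99],
    [1.52, 1.49, 1.51],
    [2.00, 2.03, 2.01],
]
# Flatten to one (x, y) pair per individual reading.
x = [c for c, reps in zip(concentrations, responses) for _ in reps]
y = [r for reps in responses for r in reps]
slope, intercept = fit_line(x, y)

def to_concentration(response):
    """Convert an instrument response to concentration using the fitted line."""
    return (response - intercept) / slope

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Fitting every reading, rather than the triplicate means, keeps the replicate scatter available for any later estimate of calibration error.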

Memory effects should be identified by presenting calibrants randomly to

the instrument, interspersed with samples and blanks. Of prime importance is

the establishment of the stability of the instrument, so that frequency of

calibration can be adjusted to attain the required accuracy given the stability

or drift rate. This can best be achieved by inserting standards regularly and

using a control chart.
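
A minimal sketch of how readings of a regularly inserted standard might be screened is shown below. It assumes the conventional Shewhart layout of warning limits at 2s and action limits at 3s about the mean of past readings; those limits, the function names and the data are assumptions for illustration, not values taken from this section.

```python
import statistics

def control_limits(history):
    """Centre line with conventional 2s warning and 3s action limits."""
    centre = statistics.mean(history)
    s = statistics.stdev(history)
    return {
        "centre": centre,
        "warning": (centre - 2 * s, centre + 2 * s),
        "action": (centre - 3 * s, centre + 3 * s),
    }

def classify(value, limits):
    """Label a new control-standard reading against the chart limits."""
    low_a, high_a = limits["action"]
    low_w, high_w = limits["warning"]
    if value < low_a or value > high_a:
        return "out of control"
    if value < low_w or value > high_w:
        return "warning"
    return "in control"

# Hypothetical past readings of a control standard with nominal value 10.0.
history = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(history)
for reading in (10.05, 10.3, 10.5):
    print(reading, classify(reading, limits))
```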

The standard addition method, SAM, can provide useful information about

the calibration and should be used from time to time, particularly during method

validation, and when problem sediments are suspected. Klein and Hach (1977)

have developed a standard additions decision tree which may be used to guide

interpretation of SAM results. We re-emphasize that SAM should be used with

caution. Normally, SAM is performed by preparing a four point calibration

curve. In no case should a one point standard addition be used as a substitute

for a calibration curve.

4.5 Blanks

A blank is a response which occurs in the end measurement in the absence

of analyte derived from a sample. Several types of blanks are possible;

instrumental, calibration, reagent, and method (field). Generally the magnitude

and variation tend to increase from the instrument blank to the method blank.

The intent and utility of each blank varies and therefore clarity in reporting

how a blank was determined is essential. The validity of a blank depends on how

well it represents the measurement process to which it applies. To prevent

bias, blanks should always be handled randomly and in an identical fashion to

samples. Use of special chemicals, containers or facilities for blanks will

invalidate them.

The instrumental blank may be determined by measuring the response when

the instrument is operated normally but no sample is presented to it. This

blank should be monitored since the overall blank will never be less, and

because it can give valuable information on instrumental deterioration or memory

effects (a pertinent example is the deterioration of instrumental blank as

graphite tubes reach the end of their lifetime). The calibration blank will

depend on the chemicals used to make up the standards and should be monitored

since it is a potential source of calibration bias. The reagent blank should be

determined individually for each of the chemicals used in the procedure by

introducing them to the same detection system (diluted or made up in appropriate

quantities of solvent). This process can be used to accept or reject new lots

of chemicals before they are brought into service, or to indicate when

purification steps must be taken.

The method blank is the most important and both its magnitude and

variance need to be known, the latter being used for the estimation of the

method LOD and overall error in a blank corrected result. This blank is found,

in principle, by analysis of a representative sediment which contains no

analyte. Ideally, this would be run as a field blank wherein the "analyte free"

sediment would be packaged and stored in the field identically to samples.

Unfortunately, this ideal does not presently appear to be feasible for marine

sediments and therefore a "combined" laboratory reagent blank must substitute.

(With careful sampling and storage, normal marine sediments should not be

subject to a significant contamination and therefore absence of a field blank

should not invalidate results.) To determine the combined reagent blank,

therefore, digestions are carried out exactly as described in the procedure in

the absence of a sample.

It is statistically advantageous to have a blank to sample ratio of 1:1,

but this is seldom done in practice due to cost. The number of blanks required

depends on several factors including the number of analyses, the accuracy or

precision required, and experience with blanks from previous determinations. It

has been suggested that one blank should be run per batch (Kirchmer, 1983) or

that the blank to sample ratio should be 1:9 (Booth, 1979). Until blank

characteristics have been well established it would be wise to run them

frequently. For the purposes of control (blank variance and magnitude) no fewer

than 2 per batch are necessary.

Performance of a blank correction can contribute both to random error

and bias. To minimize these, low and constant blanks are desirable. The blank

response on the recorder should not be subtracted from the sample response

directly unless the linearity of the calibration is well established.

4.6 Limit of Detection (LOD)

The limit of detection (LOD) has been a subject of confusion for two

reasons: many definitions have been used and often a given definition can be

applied in more than one way. Different statistical approaches result in an

order of magnitude variation in the estimate of the LOD, and recently much

thought has gone into developing a rational basis for determining it (Long and

Winefordner, 1983; Kirchmer, 1983; ACS Committee 1980; Currie, 1968, and Porter,

1983). Consensus is being reached in the literature and we summarize here two

acceptable approaches. Detailed formulas and worked examples are provided for

the IUPAC method, and the "propagation of error" method in A.5.6.

Detection limit is frequently defined as twice background noise. This

definition evolves from electronics and relates specifically to the response of

an instrument. While simple to state, application of this definition is

problematic; the relationship between frequency of signal and dominant frequency

of noise plays a role but is seldom discussed. Furthermore it is universally

recognized that such "instrumental" detection limits contribute to but do not

encompass the broader concept of a method detection limit which includes

additional procedure uncertainties (see Glaser et al., 1981).

The IUPAC definition of LOD (Appendix 1) brings in the concept of

"reasonable certainty" in separating an analytical signal from noise. To do

this, IUPAC recommends using the random error associated with the blank signal, $\sigma_B$,

as the noise. Reasonable certainty is assured by demanding that a signal be at

least $k\sigma_B$ above the average blank before concluding that analyte has been

detected. In the past, k=2 has often been chosen but this is now discouraged in

favour of k=3; for a normal distribution only 1 in 1000 blanks would give so

large a signal, and for any distribution, Tschebyscheff's inequality assures

that no more than 1 in 9 blanks would exceed this criterion. The real world

lies somewhere between these two limits.
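
The two bounds quoted for k = 3 can be checked numerically. The sketch below evaluates the one-sided normal tail probability and the Tschebyscheff (Chebyshev) bound 1/k²; the function names are hypothetical.

```python
import math

def normal_upper_tail(k):
    """P(Z > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def chebyshev_bound(k):
    """Upper bound on the fraction of values k or more standard deviations from the mean."""
    return 1 / k ** 2

for k in (2, 3):
    print(k, normal_upper_tail(k), chebyshev_bound(k))
```

For k = 3 the normal tail is about 0.0013, of the order of 1 in 1000, while the distribution-free bound is 1 in 9, which is the spread the text describes.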

To calculate LOD, $\sigma_B$ is estimated by the blank standard deviation, $s_B$, or the

standard deviation observed as one approaches the LOD from above. If

this estimate is based on a very limited number of blanks, $s_B$ may seriously

underestimate $\sigma_B$. To correct for this one can substitute the appropriate "t"

value, which converges to 3 as n gets large, as shown in equation 1:

$$\mathrm{LOD} = \frac{3 s_B}{m} \ (n \text{ large}), \qquad \mathrm{LOD} = \frac{t\, s_B}{m} \ (n \text{ small}) \qquad (1)$$

where m, the slope of the calibration curve (analytical sensitivity), converts

from recorder signal units to concentration units. Experience in a variety of

situations has shown that the "$3\sigma_B$" method of estimating LOD is in agreement

with estimates made by inspection (Kaiser, 1970). The "method" detection limit

can therefore be defined as:

$$\mathrm{LOD} = 3 \lim_{[c] \to \mathrm{LOD}} \sigma_c \qquad (2)$$

For this calculation, $\sigma_B$ is chosen as a reasonable estimate for $\sigma_{\mathrm{LOD}}$.

Kirchmer (1983) estimates the random error at the LOD to be about 66% of the LOD

at the 95% confidence level. If blanks are so small that they do not give an

apparent signal above instrumental noise, it is difficult to apply the above

concept. The ACS Committee (1980) recommends that the LOD may then be based on

the peak to peak noise measured on the baseline close to the actual or expected

analyte peak. A more thorough approach, consistent with the definition in

equation 2, would be to spike blanks to give analyte concentrations which lie

slightly above the LOD but still not in the region of quantitation as discussed

below (see Glaser et al., 1981).

Long and Winefordner (1983) have criticised the IUPAC definition of LOD

because it does not include error associated with the calibration and therefore

underestimates the true LOD. Based on the propagation of error, they give

formulas which allow the calibration to be factored into the LOD calculation,

and these are developed in A.5.6.2.

There are three regions of chemical analysis (Figure 1); unreliable

detection, detection (qualitative analysis) and determination (quantitative

analysis).

[Figure 1. Regions of analyte measurement. Axis marks: 0, LOD = 3σ_B, LOQ = 10σ_B. Region of unreliable detection: reported as "not detected (LOD)". Region of detection: reported as a number followed by (LOD). Region of quantitation: reported as a number plus error estimates.]

Figure 1 shows that care is needed in expressing LOD, or LOQ since one can

perform a blank correction as shown at the bottom. The definition of LOD

implies that it is the blank corrected value, "$3\sigma_B$", which should be expressed.

Near the detection limit the substance can only be detected (present,

absent) with a certain degree of confidence but not reliably quantified. A

second term, the limit of quantitation (LOQ) has therefore been introduced to

designate the point at which quantitative analysis can begin. The ACS Committee

(1980) recommends that LOQ be calculated analogously to LOD but with k set equal

to 10. The choice of 10 is a minimum and really depends on having a well

defined blank (Currie, 1968). Blanks, and their variance, sensitivity and

calibrations change from data batch to data batch and are intrinsic properties

of a data set. Therefore LOD and LOQ vary also; good control over blanks and

calibrations is essential to keep LOD in control and this should be continually

evaluated.
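
The LOD and LOQ calculations described in this section can be sketched as follows, using k = 3 for the LOD and k = 10 for the LOQ and dividing by the calibration slope to convert from signal to concentration units. The blank signals, slope and function name are hypothetical; for a small number of blanks the text suggests replacing 3 with an appropriate t value, which can be passed as `k_lod`.

```python
import statistics

def detection_limits(blank_signals, slope, k_lod=3, k_loq=10):
    """Return (LOD, LOQ) in concentration units from blank signals and calibration slope."""
    # Standard deviation of the blank signal estimates the blank random error.
    s_blank = statistics.stdev(blank_signals)
    return k_lod * s_blank / slope, k_loq * s_blank / slope

# Hypothetical method-blank signals (recorder units) and calibration slope
# (recorder units per ug/g).
blanks = [0.012, 0.015, 0.010, 0.013, 0.011, 0.014, 0.012]
lod, loq = detection_limits(blanks, slope=0.50)
print(f"LOD = {lod:.4f} ug/g, LOQ = {loq:.4f} ug/g (n = {len(blanks)} blanks)")
```

Because the blank variance and the slope change from batch to batch, the calculation would be repeated for each data set and the number of blanks reported alongside the limits.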

Detection and quantitation limits are part of the performance

characteristics of a method and should be reported with the data including how

they were determined, and the number of blanks used to estimate them. The LOD

and LOQ are useful indicators of how the method will perform for single

determinations at various concentrations. By replicating samples and taking

averages, one can effectively reduce the LOD and LOQ by decreasing random error

($s_{\bar{X}} = s_X / \sqrt{n}$). Therefore these performance characteristics should not be viewed as

a replacement for a valid error statement which is based on replicate

determinations.
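
The effect of replication on the detection limit can be illustrated with the familiar relation that the standard deviation of a mean of n independent replicates is s/√n. The sketch below applies that scaling to a single-determination LOD; the function name and numbers are hypothetical, and the result holds only if the replicates are truly independent.

```python
import math

def lod_of_mean(lod_single, n):
    """Effective LOD for the mean of n independent replicates."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return lod_single / math.sqrt(n)

# Hypothetical single-determination LOD of 0.18 ug/g, averaged over 4 replicates.
effective_lod = lod_of_mean(0.18, 4)
print(f"effective LOD = {effective_lod:.2f} ug/g")
```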

4.7 Accuracy

Although a fundamental property of data, and the most important quality

statement, there is still no agreement on the definition of accuracy or how best

to express it. Therefore it is essential to illustrate clearly how one has

arrived at a final error statement. Here I consider accuracy to be an

expression about how close each determination is expected to be to the "true"

value. With this definition accuracy comprises a random error (imprecision) and

a systematic error (bias). The need for clarity in accuracy statements becomes

all the more important since systematic error can behave like a random error and

vice-versa depending on exact circumstance. For example, a single calibration

curve which is in error (slope or intercept) would cause a bias in all data

where that curve was used. However, over a number of days with many

independent calibrations it would begin to look like a random error. Of course

if the primary calibrant used to prepare all calibration curves was "off" this

would contribute a bias in addition to the daily fluctuations. Thus it is that

bias between two laboratories becomes treatable as a random error if a large

population of laboratories is sampled, and would be called inter-laboratory

variance.

It is unreasonable to require that every measurement be made in a

completely independent manner (i.e. have its own calibration curve, blank, etc.)

and therefore we must estimate error for these steps and ensure that they

contribute to the final error statement. Figure 2 illustrates how errors can be

grouped into three categories: genuine bias, random bias, and random error. The "random

bias" will be treated here as if it were part of the random error component, but

it needs special consideration because while it contributes error to the final

result, it is often omitted from the error statement due to the manner in which

calculations are made.

[Figure 2. Random and systematic components of laboratory error. Panel labels: Bias (calibrant, blank, poor method, contamination, analyst; total bias for a laboratory or analyst); Random Bias (many laboratories, many analysts; long term, many events); Random (sample preparation, calibration, blank correction, reading, calculation; short term, less than one day, single event). Accuracy = Bias ± (Random Bias ± Random). A note in the figure states that the period of errors in the marked box is about one day.]

4.7.1 Random error

Precision is defined as the degree of agreement between individual

measurements when using a method. It is commonly represented by standard

deviation or some other measure of spread which is really an estimate of

imprecision. Great care is needed in giving the precision statement. As noted

earlier, not only is the standard deviation, s, required, but also the number of

replicates, n, used to determine s.

In addition to s and n, the model and formula used to estimate s must

also be stated, as will now be shown. Variation in data generated on a single

sample by a single analyst or instrument over a short period of time is one

possible model which could be used to estimate a valid standard deviation for

the "repeatability" of the method. A second model would estimate the variation

observed over an extended period of time including different analysts in

different laboratories (reproducibility). The variance associated with the

reproducibility will be greater than or equal to that associated with the

repeatability. Furthermore, determinations which have been performed in

different laboratories would be expected to contain inter- and intra-laboratory

components of variance. The skill in providing a valid precision estimate lies

in choosing the correct model for the circumstance. In reporting, it is

essential that the model be clarified by a statement supporting the precision

estimate; a statement such as A ± a is, by itself, meaningless.

Sources of error in an analytical measurement are listed below in

approximate order of increasing significance (Anonymous, 1976): electronic

noise, reading errors, analytical variation (depends on method), sample

preparation noise, inhomogeneity of samples and poor sampling, sample handling

and preservation. It is important that steps be taken to control and measure

the total analytical variability since too often, sample inhomogeneity is used

as an excuse for poor reproducibility in analysis. Errors which are calculated

individually for each of the above components can be combined according to the

rules of propagation of error as outlined in Appendix 4.

Imprecision will vary over the range of measurement. Near the LOD the

standard deviation will be about the same magnitude as the measurement. As one

reaches and passes the LOQ the standard deviation will be such that the

coefficient of variation (CV) is about 10% or less. One might expect the CV

then to remain constant up to some concentration where the calibration curve

becomes non-linear and the instrument saturates. Over the linear range, one

expects an empirical relationship between standard deviation and concentration,

C, such as:

$$ s = a + bC \qquad (3) $$

$$ CV = \frac{s}{C} = \frac{a}{C} + b \qquad (4) $$

Therefore at low concentrations, $s \approx a$, while at higher concentrations the

coefficient of variation is approximately constant and equal to b. While more

complex formulations are possible (higher powers of C are used) it is doubtful

that data exist which warrant their usage. The main point is that a single

standard deviation may not suffice, and at the least, s should be evaluated at

the upper and lower end of the concentration range of interest. This requires

judgement based on experience with the method.
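As a minimal sketch of how equations 3 and 4 might be applied, the following Python fits a and b by ordinary least squares to standard deviations measured at several concentrations and then evaluates the coefficient of variation. The function names and the numerical values are hypothetical and serve only as an illustration; they are not data from this report.

```python
def fit_sd_model(conc, sd):
    """Ordinary least-squares fit of s = a + b*C; returns (a, b)."""
    n = len(conc)
    mean_c = sum(conc) / n
    mean_s = sum(sd) / n
    sxx = sum((c - mean_c) ** 2 for c in conc)
    sxy = sum((c - mean_c) * (s - mean_s) for c, s in zip(conc, sd))
    b = sxy / sxx
    a = mean_s - b * mean_c
    return a, b


def cv(a, b, c):
    """Coefficient of variation at concentration c: CV = a/C + b (equation 4)."""
    return a / c + b


# Hypothetical standard deviations observed at five concentrations.
conc = [0.5, 1.0, 5.0, 10.0, 50.0]
sd = [0.1 + 0.05 * c for c in conc]

a, b = fit_sd_model(conc, sd)
print("a =", round(a, 4), " b =", round(b, 4))
print("CV at low and high concentration:", cv(a, b, 0.5), cv(a, b, 50.0))
```

With real data the fit will not be exact, and as few as two well-replicated concentrations (low and high) may be all that the data justify.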

There are a number of approaches to measuring precision, some of which

are listed below.

1. Repeat determinations of a single sediment extract;

2. a single sediment sample (well mixed) is split and sub-samples are

independently run through the complete analytical procedure; and,

3. a number of different sediment samples (each well mixed) are split and

replicate sub-samples are independently run through the complete analytical

procedure.

The first method would evaluate the measurement system noise but would not

detect memory effects (unless blanks were interspersed with samples) or sample

digestion variance. It would, therefore, be an unsuitable way to determine

variance pertaining to a group of varied sediments. The second method would be

an improved way to estimate the overall analytical precision, but only for a

particular sample. As noted, two or more samples investigated in this fashion

could help determine standard deviation as a function of analyte concentration

or even matrix. Method 3 is probably best for sets of varied sediment samples,

but duplicates should certainly not be run consecutively, and it is better if

the analyst does not know which samples are duplicates. A pooled variance

(A.5.2) can be calculated from the results of replicates (assuming variances are

relatively uniform). As noted above, variance is likely to be a function of

analyte concentration, and possibly of other aspects such as matrix. In cases

where data are obtained over a wide range of substrates or concentrations, the

error treatment might best be handled by dividing the data into smaller sub-sets

where variance is more or less homogeneous, for example a high concentration

data set (polluted) and a low concentration data set (background).

Alternatively, variance stabilizing operations can be used such as the

log-transformation, but it is likely that some judgement will be required by the

analyst.
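The pooled estimate mentioned above can be sketched in a few lines of Python. This assumes the usual definition of a pooled standard deviation (each group's variance weighted by its degrees of freedom), which is presumed to be the formula of A.5.2; the duplicate results are hypothetical.

```python
import math
from statistics import variance


def pooled_sd(groups):
    """Pooled standard deviation over groups of replicate results.

    Each group contributes (n_i - 1) * s_i**2 to the numerator and
    (n_i - 1) degrees of freedom to the denominator.
    """
    numerator = 0.0
    dof = 0
    for g in groups:
        if len(g) < 2:
            continue  # a single result carries no information on spread
        numerator += (len(g) - 1) * variance(g)
        dof += len(g) - 1
    return math.sqrt(numerator / dof), dof


# Hypothetical blind duplicates on three different sediments.
duplicates = [[1.0, 1.2], [2.0, 2.2], [3.1, 2.9]]
sp, dof = pooled_sd(duplicates)
print("pooled s =", round(sp, 4), "with", dof, "degrees of freedom")
```

If the spread clearly grows with concentration, the same function can be applied separately to a low-concentration and a high-concentration subset rather than to all groups at once.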

Correction of the blank poses an interesting problem for expressing

error. If there is a matched blank for each sample and the variance is

calculated on the adjusted values, the final reported variance will correctly

contain the two random error components. However, it is usually the case that

there are fewer blanks than samples, and the blank correction is made by

subtracting the average blank from each sample. In this case the blank behaves

as a "random bias" and the error may be filtered out of the final statement. It

is very important that one specify exactly how blank correction has been

performed including the information $\bar{B}$, s, n, where $\bar{B}$ is the average blank.

Formulas for the various precision estimates plus worked examples are given in

A.5.5.
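The way an average blank drops out of the replicate statistics can be illustrated with a short Python sketch. It assumes that blank and sample errors are independent and that the error of the average blank is $s_B/\sqrt{n_B}$; the function names and the numbers are hypothetical.

```python
import math
from statistics import stdev


def sd_after_mean_blank(samples, mean_blank):
    """Replicate standard deviation after subtracting one average blank."""
    return stdev([x - mean_blank for x in samples])


def total_sd(s_replicates, s_blank, n_blank):
    """Add the variance of the average blank back in (assumes independence)."""
    return math.sqrt(s_replicates ** 2 + s_blank ** 2 / n_blank)


# Hypothetical replicate readings and blanks, same units.
samples = [5.2, 4.9, 5.4, 5.1]
blanks = [0.30, 0.50, 0.40, 0.40]
mean_blank = sum(blanks) / len(blanks)

s_rep = sd_after_mean_blank(samples, mean_blank)
s_tot = total_sd(s_rep, stdev(blanks), len(blanks))

# Subtracting a constant leaves the spread unchanged, so the blank error
# is invisible in s_rep and has to be added explicitly.
print("s from corrected replicates:", round(s_rep, 4))
print("s including blank error:    ", round(s_tot, 4))
```

The first figure is what a replicate-based statement would report; the second is closer to the error actually carried by each corrected result.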

4.7.2 Systematic error (bias)

The main sources of analytical bias include calibration, blank

correction, interference and inability to determine all forms of the determinand

(Kirchmer, 1983; Holden et al., 1983). Bias can also arise from the sampling and

storage process, but this will not be dealt with here.

Sections dealing with calibration, blanks and interferences have already

emphasized the importance of treating these properly. Although not widely

recognized, bias from interference or incomplete extraction is difficult to

control, but definite steps can and should be taken. Firstly, procedures should

be rugged, as already outlined, to ensure that extraction will be consistent.

The most satisfactory method for evaluating sources of bias is to analyze

certified reference materials. We recommend using replicate determinations of

reference materials at each end of the expected range of analyte concentration.

If suitable reference materials do not exist, the laboratory may have to prepare

their own working material taking the precautions noted earlier. Reference

sediments, used properly, can help to estimate both precision and bias.

A second way to estimate bias is to use an alternate analytical method

from time to time. This "reference" method should be an adaptation of a

completely independent technique if possible. For example, neutron activation

of the solid sediment followed by γ counting would be a possible reference

method for a procedure which relied on a digestion step followed by

determination by AA. Colorimetric determination after the same digestion step

would not be an independent technique and so would fail as a reference method.

4.7.3 Combining and reporting the two error statements

Considering the problems identified above, it is unlikely that a perfect

error statement can be made. However, an unbiased best estimate can be

attempted by considering how error arises in the measurement system.

Three steps should lead to a valid assessment and clear statement of

error:

1. All possible sources of error should be identified, preferably in order of

importance. Within-laboratory errors will normally include (see Figure 2):

a. Subsampling and sample preparation

b. Calibration

c. Blank correction

d. Reading, calculation errors

e. Electronic noise

2. Develop an error model by which each of the important errors can be

measured. Calibration can be treated by linear least squares and errors on

slopes and intercepts can be calculated. Similarly, random blanks can be

used to determine the average and variance of the blank. Subsampling,

sample preparation, reading errors and electronic noise are difficult to

separate and are most easily evaluated together in the results from random

blind replicates. Alternatively, variance for these might be measured by

replicate measurements at 2 or more concentrations. To assess the combined

error of several steps, each must be allowed to contribute to error

independently. Bias can be evaluated independently through the use of

referee methods or blind, random reference material, and will be reported

separately from the random errors.

3. Combine the errors to give an unbiased best estimate. This will involve

two parts; the bias and the random errors. The former can best be treated

by examining all experience with the method particularly in consideration of

reference material, referee methods and inter-laboratory comparisons. For

the random component of error one must make some judgement of how error

contributes to the final data as outlined in Figure 2. The intent of

putting the errors together is to make sure that all errors which contribute

to the measurement have been given an unbiased and independent chance to

contribute to the error statement. "Random bias" components will require

particular care. The two most important examples are calibration and blank

correction. If a single calibration has been used for all data, then the

calibration error needs to be added to the error estimated by replicates.

If an average blank has been used rather than independent blanks for each

sample, then the blank variance needs also to be added into the final

statement.

These corrections can be made, where required, by the use of propagation of

error formulas. The main problem confronting us with virtually all real

data is that s tends to underestimate $\sigma$ because n is small (n < 20), and

furthermore not all estimates of s will be based on the same replication

density. For example we will have $n_B$ the number of blanks, $n_c$ the number of

points used to calibrate, and perhaps n the number of replicates (or pooled

replicates). To convert s to an unbiased best estimate of $\sigma$ ($2\sigma$, $3\sigma$, ...) we

should multiply by an appropriate "t" value (depending on n). This is

basically a normalization process for error estimates based on few

replicates. Here we will use $\alpha(2) = 0.05$ (also expressed as 95%

confidence) which converges to 1.96 for n large. Therefore in the

propagation of error formula we are replacing $\sigma$ with $1.96\sigma \approx t_{0.05,\,n-1} \times s$.

(One could equally choose some other "t" which converged to $\sigma$, $2\sigma$, $3\sigma$ for

example if that were desired.)
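The normalization described in step 3 can be sketched as follows. The table holds standard two-tailed Student t values for a 0.05 significance level; the component standard deviations are hypothetical, and the combination in quadrature assumes that each component is independent and already expressed in units of the final result.

```python
import math

# Two-tailed Student t values at the 0.05 level, keyed by degrees of freedom.
T_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def ts(s, n):
    """Scale a standard deviation based on n results by t for n - 1 d.f."""
    return T_05[n - 1] * s


def combine(*terms):
    """Combine independent error terms in quadrature."""
    return math.sqrt(sum(term ** 2 for term in terms))


# Hypothetical components (same units as the result):
ts_replicates = ts(0.10, 5)   # s = 0.10 from 5 blind replicates
ts_blank = ts(0.04, 3)        # s = 0.04 from 3 blanks

total = combine(ts_replicates, ts_blank)
print("ts (replicates):", round(ts_replicates, 4))
print("ts (blank):     ", round(ts_blank, 4))
print("combined:       ", round(total, 4))
```

Because each s is scaled by its own t before combining, a component estimated from only a few results is weighted more heavily than its raw standard deviation would suggest.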

What is proposed here is a "modular" approach whereby bias is estimated,

and random error is estimated for each of the components. Provided some

planning is done before the measurements, this should make the calculations

relatively straightforward. Ultimately the propagation of error formula will

indicate how to combine the errors properly. Necessary formulas are given with

worked examples in Appendix 5. A last step to meld random error and bias into

one concise error statement has been proposed by Taylor (1981). This is done by

assuming the "t" distribution for random error, and calculating a confidence

interval as:

$$ C = \Delta \pm \frac{t\,s}{\sqrt{n}} \qquad (5) $$

where $\Delta$ is bias, t the Student factor (given n-1 degrees of freedom and some

arbitrary confidence) and n is the number of replicates upon which s, the

standard deviation, is based. This form is particularly convenient since the

calculations so far have been arranged in the form "ts". However, the division

by $\sqrt{n}$ to convert error to "error of the mean" is not normally this simple since

there are different n's for each of the measurements. To do this correctly we

must return again to the propagation of error equation and insert $(ts)^2/n$ where

formerly $(ts)^2$ was used (see Appendix 4). In reporting, a final error statement

along the lines of equation 5 may accompany but should not substitute for the

simpler expressions $\bar{X}$, s, n for each of the error contributors. Figure 3 shows

the modules which should be reported, and how they combine to form the final

statement of error. The above discussion and an understanding of the error

assessment process as shown in Appendix 7 will convince the reader of the

necessity to report each of the error modules and how they were combined.

[Figure 3. Combining the error statements. Modules shown: Bias (1. reference material, 2. reference method(s), 3. intercalibration, 4. method validation); Random Bias (calibration statement with $n_c$; blanks statement $\bar{B}$, $s_B$, $n_B$; limit of detection); Random ($\bar{X}$, s, n, model, formula). Information above the line should be reported. Below the line, a formula for combining random and random bias gives the total random statement, and a grand error formula combines bias and random.]

4.8 Quality assurance

Quality assurance comprises two concepts; quality control, and quality

assessment which verifies that control is working (Taylor, 1981). Aside from

the laboratory's own error assessment, data validity can be strengthened by an

independent verification of the laboratory claims. Two components needed for

quality assurance are a "control system" and a mechanism to verify the system.

Control is continuing, active feedback which is used to correct problems and

assess accuracy. Bringing a system into control involves two stages, the first

of which is method validation and calculation of performance characteristics.

The second stage is setting up the control system, the core of which is usually

a set of control charts and/or control calculations. At this point the errors

in the procedure have been reduced to acceptable limits, characterized

statistically, and included in the laboratory protocol. Elements of a

laboratory quality assurance program are listed in Table 2. These comprise what

would be considered good laboratory practice.

The task of the laboratory/analyst is to produce data within a known

uncertainty range, and document them fully. Elements in Table 2, if

implemented, will certainly achieve these aims. However it would be an

oversight to let the matter end there. While data produced by the analyst might

be impeccable, they could be useless from the point of view of interpretation

(Bewers, pers. comm.). The analyst or laboratory often neither has nor needs

expertise in interpreting environmental data and has no incentive to develop it.

Therefore an essential step in the quality process is to ensure that expertise

to interpret the data is consulted before sampling, and from time to time to

review the data. Too often data are collected and compiled in a large and

growing file only to find in the end that they cannot be used by anyone.

Table 2

Quality assurance elements (after Inhorn, cited in ACS Committee, 1980).

1. Maintenance of skilled personnel, written and validated methods, and

properly constructed, equipped and maintained lab facilities.

2. Provision of representative samples and controls.

3. Use of high-quality glassware, solvents, and other testing materials.

4. Calibration, adjustment, and maintenance of equipment.

5. Use of control samples and standard samples, with proper records.

6. Directly observing the performance of certain critical tests.

7. Review and critique of results.

8. Tests of internal and external proficiency testing.

9. Use of replicate samples.

10. Comparison of replicate results with other laboratories.

11. Response to user complaints.

12. The monitoring of results.

13. Correction of departures from standards of quality.

Element 10 in Table 2 (collaborative testing) is recognized as one of the

most essential components of quality assurance and there is a clear need to make

this option available to all laboratories. The use of referee methods and

reference sediments can assist the laboratory in a self-audit and will

strengthen their quality statements but they should not substitute for the

round-robin. Reference materials should simulate the environmental samples as

closely as possible and they should be treated identically if they are to be

used as controls. The very limited number of certified reference sediments

(Appendix 3) will certainly not be representative of all sediments and therefore

laboratories will need to prepare their own uncertified materials. Certified

and uncertified reference materials will form the backbone of the control

process and therefore the laboratory protocol should schedule the number, order

and type of controls to be used. New analysts should demonstrate their ability

to perform the method within the performance characteristics before they handle

environmental samples.

A minimum quality assurance program should include control charts as

outlined in section 4.8.1. Not only must performance be acceptable (see

McFarren et al., 1970) within the corporate aims (Table 1 in this case) but the

regulatory demands should be practical in terms of what can be obtained and how

much it will cost. According to Horowitz (1979) a "practical" method must be

reliable (accurate and specific), sufficiently rapid to provide timely

information (i.e. results can be provided within about one day), and economical.

The latter element comprises a number of factors including cost and stability of

reagents, cost and availability of instrumentation and expertise required to

perform analysis. Other considerations include the availability of reference

materials, problems of contamination (high blanks) and laboratory safety.

The task for ocean dumping and other environmental regulations is to

establish the concentration and variability of a contaminant in an environmental

reservoir. More specifically it is desired to know if the contaminant exceeds a

certain critical limit at which some action is necessary. What is not specified

is the acceptable risk of false rejections or false acceptances. It is beyond

the scope of this guide to delve into these issues except to point out that

environmental signals can confidently be detected provided the analytical

variability is less than about a third of the environmental noise. Furthermore,

confidence in an average of n replicates goes up as a function of $\sqrt{n}$ whereas

expense increases approximately with n. Therefore more than 3-5 replicates of a

single sample is probably not economically advisable, although saving replicate

material in case of later inconsistencies is always worthwhile.

4.8.1 Control charts

"Until a measurement operation ... has attained a state of statistical

control, it cannot be regarded in any logical sense as measuring anything at

all" (Eisenhart, quoted in Taylor, 1983).

The design and use of the control chart has been reviewed (Mandel and

Nanni, 1978; Wernimont, 1979) and very detailed instructions are available for

presentation and analysis of control data (ASTM, 1976). The purpose of a

control chart is to establish that a measurement is in control; to maintain

control; and help assign cause when the process goes out of control. The

principle of preparing a Shewhart control chart is very simple, and an example

is given in Appendix 6. Points from a measurement (replicate of some sort) are

plotted as a function of time. After sufficient replicates have been collected,

a mean or centre line can be plotted, along with limit lines within which most

data should fall. There are as many different kinds of control charts as there

are measurements, going from something conceptually simple such as monitoring

the performance of a laboratory instrument (balance, AA, fluorometer etc.) to the

more difficult task of controlling a complete method including processing the

sample, making the determination and eventually calculating the result. To set

up a control chart requires about 20 points initially (Faires and Boswell,

1981). The mean is pencilled on the graph and the individual points are

plotted. The following three rules are suggested:

1. Not more than 1 in 20 results lie outside 2 standard deviations (warning

limit). A result outside 3 standard deviations requires action;

2. Not more than 7 consecutive results are on the same side of the mean;

3. There are no regular periodic variations.
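The first two rules lend themselves to a simple automated check. The Python below is a sketch under stated assumptions: the centre line and standard deviation come from an initial set of about 20 points, rule 1 is read as "at most one result per 20 beyond 2 standard deviations", and rule 3 (periodic variation) is left to visual inspection. All names and data are hypothetical.

```python
def longest_run_one_side(results, centre):
    """Length of the longest run of consecutive results on one side of centre."""
    longest = run = 0
    last_side = 0
    for x in results:
        side = (x > centre) - (x < centre)   # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        last_side = side
        longest = max(longest, run)
    return longest


def rule_violations(results, centre, s):
    """Apply rules 1 and 2 to a series of control results."""
    beyond_2s = sum(1 for x in results if abs(x - centre) > 2 * s)
    beyond_3s = [i for i, x in enumerate(results) if abs(x - centre) > 3 * s]
    allowed = max(1, len(results) // 20)     # one warning per 20 results
    return {
        "too_many_warnings": beyond_2s > allowed,
        "action_points": beyond_3s,          # indices needing action
        "long_run": longest_run_one_side(results, centre) > 7,
    }


# Hypothetical control results with centre line 10.0 and s = 1.0.
results = [10.5, 9.5] * 9 + [12.5, 13.5]
report = rule_violations(results, 10.0, 1.0)
print(report)
```

In this example the last two results exceed the warning limit and the final one exceeds the action limit, so the process would be stopped and investigated.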

Other control charts have been devised, but if emphasis is on the

accuracy of individual analytical results the simple chart outlined above is

adequate (Kirchmer, 1983; Natrella, 1982).

We also recommend the use of the Youden plot mainly because of its

visual impact (see Appendix 6). Reference materials are analyzed in pairs where

each member of the pair can be either identical or slightly different. One

member of the pair is plotted against the other and with sufficient replication,

error circles can be drawn analogous to the control limits of the Shewhart

control chart. The Youden plot is actually a replicate or double control chart

and has the advantage of helping to diagnose error source. Systematic errors

such as calibration or blank correction which vary from day to day tend to lie

on a line with 45° slope while random errors favour no direction. Generally the

combination of the two types of error forms an elliptical pattern. If the

centre of mass of the plotted points is different from the certified values,

then a constant bias in the method can be suspected. (See for example Macdonald

and Nelson, 1984; Youden and Steiner, 1978; Youden, 1968).
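The diagnostic idea behind the Youden plot can be sketched numerically by resolving each paired result into a component along the 45° line and a component perpendicular to it. This is an illustrative decomposition rather than the procedure of Appendix 6; the function name and the paired data are hypothetical.

```python
import math


def youden_components(pairs):
    """Split scatter of paired results into along- and across-diagonal parts.

    Returns the centre of mass (mean_x, mean_y), the standard deviation
    along the 45-degree line (random plus day-to-day systematic error) and
    the standard deviation perpendicular to it (random error only).
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    along, across = [], []
    for x, y in pairs:
        dx, dy = x - mean_x, y - mean_y
        along.append((dx + dy) / math.sqrt(2))
        across.append((dx - dy) / math.sqrt(2))
    sd_along = math.sqrt(sum(a * a for a in along) / (n - 1))
    sd_across = math.sqrt(sum(p * p for p in across) / (n - 1))
    return (mean_x, mean_y), sd_along, sd_across


# Hypothetical paired reference results from six analytical days.
pairs = [(1.02, 1.05), (0.95, 0.97), (1.10, 1.08),
         (0.90, 0.93), (1.05, 1.04), (0.98, 1.00)]
centre, sd_along, sd_across = youden_components(pairs)
print("centre of mass:", centre)
print("sd along 45 degrees:", round(sd_along, 4))
print("sd across:          ", round(sd_across, 4))
```

A spread along the diagonal that is much larger than the spread across it points to day-to-day systematic effects, while a centre of mass displaced from the certified values points to constant bias.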

Whatever control technique is used, data should simulate normal

measurements. Control samples should be random and blind to prevent bias. The

rationale for the control chart use should be outlined in the laboratory

protocol; if numbers exceed the action limit the process should be stopped, the

problem identified, resolved and the process started again at the point last

known to be in control. Care should be taken to choose a statistic which is not

sensitive to concentration, for example CV ($s/\bar{X}$) might be a better choice than s

in some cases (ASTM, 1979). Another option would be to divide the range of

measurements into sections where the statistic is reasonably well behaved and

prepare control charts for each range. This latter option is likely to be

tedious and not practical.

For short term measurements, the control chart will not be very useful

although control samples are still required. The frequency of control samples

cannot be stated at the outset; experience will dictate how many are required as

the procedure becomes understood. It is likely that more controls will be

needed initially.

Control charts could be maintained for the following:

1. Reference material. Single or better still paired samples are analyzed

(Youden chart). Use of two concentration levels spanning the expected range

is desirable.

2. Blind replicates of environmental material. Random samples are well mixed

and split to be analyzed in duplicate (or triplicate, ...). The difference

between duplicates or the range (Appendix 6) is plotted with time.

Duplicates should not be run sequentially. An "out of control" duplicate

may not indicate an "out of control" analytical system, and checking results

for (1) above will help to resolve the problem. Choice of model (see

precision) is important since duplicates run during the same batch, or in

different batches will not evaluate the same error.

3. Standard solutions. Results for check calibrants should be plotted as a

Shewhart or Youden control chart. We recommend a high concentration (slope

bias) and low concentration (blank bias) be used.

4. Blanks. Blank control tends to be neglected (King, 1976). Blanks greatly

influence LOD and contribute error (bias and precision) to all samples.

Blank control charts therefore help in the assessment of LOD and detect

contamination quickly.
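For the blind duplicates of item 2, a range chart can be set up with a few lines of Python. The sketch assumes the standard Shewhart constants for subgroups of two (d2 = 1.128 for converting the mean range to a standard deviation, D4 = 3.267 for the upper control limit); whether Appendix 6 uses the same constants is an assumption, and the duplicate data are hypothetical.

```python
D2_PAIRS = 1.128   # mean range / sigma for subgroups of size 2
D4_PAIRS = 3.267   # upper control limit factor for ranges, subgroups of size 2


def range_chart(pairs):
    """Centre line, upper control limit and implied s for duplicate ranges."""
    ranges = [abs(a - b) for a, b in pairs]
    r_bar = sum(ranges) / len(ranges)
    return {
        "ranges": ranges,
        "centre": r_bar,
        "ucl": D4_PAIRS * r_bar,
        "s_estimate": r_bar / D2_PAIRS,
    }


def out_of_control(chart, new_pair):
    """True when the range of a new duplicate exceeds the upper limit."""
    return abs(new_pair[0] - new_pair[1]) > chart["ucl"]


# Hypothetical blind duplicates used to establish the chart.
history = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.3)]
chart = range_chart(history)
print("mean range:", round(chart["centre"], 3), " UCL:", round(chart["ucl"], 3))
print("new duplicate flagged:", out_of_control(chart, (2.0, 2.9)))
```

In practice many more than three duplicates would be needed before the limits are trusted, and the ranges may need to be expressed relative to concentration if the spread is not uniform.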

A control chart should also be maintained for any other instrument or

process which is used in the analytical system, the analytical balance being the

most important example. Where electronic data handling and processing takes

place, a check set of data should also be used to verify that the program is

working properly. This does not require a chart, but perhaps a "tick" system to

ensure that it is always done.

5. RECOMMENDATIONS FOR VERIFYING QUALITY OF DATA

The individual, or group, who wish to interpret the data or use it for

making decisions require a mechanism for independently assessing the quality of

the data. About 10-20% of the total budget should be spent on this, more for

areas which are particularly sensitive or when litigation is likely. We

recommend the following:

1. The laboratory protocol and quality assurance program should be available

and kept on file by RODAC. Laboratories which cannot produce detailed

procedures of validated methods with performance characteristics are not

exercising quality control.

2. Blind replicates, reference material and calibrants should be submitted for

analysis from time to time. Performance on this material would have to

equal or exceed the laboratory's claims for accuracy.

3. Evidence of quality control procedures should be available with each data

set. This would normally consist of recent control charts, and should be

backed up by occasional audits. The laboratory should keep raw data on file

for a specified time period (2 years) and should be able to produce it.

Inspection of the laboratory can also give evidence that they follow the

quality control practices outlined in their protocol.

4. Occasionally samples should be checked by independent reference methods,

particularly for sediments where problems are anticipated (high sulfide or

organic content for example).

Horowitz, W., 1979. Practicality in regulatory analytical chemistry, Analytical Chemistry, 741A-745A.

Kaiser, H., 1970. Quantitation in elemental analysis, Analytical Chemistry (4), 26A-59A.

King, D.E., 1976. Evaluation of interlaboratory comparison data by linear regression analysis, in National Bureau of Standards Special Publication 464, Proceedings of the 8th Symposium, p. 581-596.

Kirchmer, C.J., 1983. Quality control in water analyses, Environ. Sci. Technol., 174A-181A.

Klein, R.J. and C. Hach, 1977. Standard additions: uses and limitations in spectrophotometric analysis, American Laboratory, 21-27.

Ku, H.H., 1968. Statistical concepts in Metrology, p. 296-330 In, Precision Measurements and Calibration Statistical Concepts and Procedures, N.B.S. Special Publ. 300, volume I, ed. H.H. Ku, Washington, D.C. 436pp.

Long, G.L. and J.D. Winefordner, 1983. Limit of detection - a closer look at the IUPAC definition, Analytical Chemistry, 712A-724A.

Macdonald, R.W. and H. Nelson, 1984. A laboratory performance check for the determination of metals (Hg, Zn, Cd, Cu, Pb) in reference marine sediments, Canadian Tech. Rep. of Hydrography & Ocean Sciences No. 33, 57pp.

Mandel, J. and L.F. Nanni, 1978. Measurement evaluation In Quality assurance practices for health laboratories, S.L. Inhorn, ed., APHA, N.Y.

McFarren, E.F., R.J. Lishka and J.H. Parker, 1970. Criterion for judging acceptability of analytical methods, Analytical Chemistry, 358-365.

Natrella, M.G., 1983. Experimental Statistics, NBS Handbook 91, U.S. Dept. of Commerce, Washington, D.C., various pagings.

Porter, W.R., 1983. Proper statistical evaluation of calibration data, Analytical Chemistry, 1290A.

Russell, D.S., 1984. Available standards for use in the analysis of marine materials, Marine Analytical Standards Program, National Research Council of Canada, Report No. 8, NRCC No. 23025, 35pp.

Samant, H.S., D.H. Loring and S. Ray, 1979. Laboratory evaluation program. First quality control round-robin surveillance. Report EPS-4-AR-79-1, Environment Canada, 38pp.

Taylor, J.K., 1983. Validation of analytical methods, Analytical Chemistry, 1588A-1596A.

Wernimont, G., 1979. Statistical control of measurement processes, In, Validation of the Measurement Process, J.R. Devoe, ed., ACS Symposium Series 63, American Chemical Society, Washington, D.C.

Youden, W.J., 1968. Graphical diagnosis of interlaboratory test results, p. 133-137, In, Precision Measurements and Calibration Statistical Concepts and Procedures, N.B.S. Special Publ. 300, volume I, ed. H.H. Ku, Washington, D.C. 436 pp.

Youden, W.J. and E.H. Steiner, 1975. Statistical manual of the Association of Official Analytical Chemists, Association of Official Analytical Chemists, Arlington, Va., 88pp.

Standard Operating Procedures: Detailed written procedures (Taylor, 1981).

Statistical Control: Measurements behave like random samples from a probability distribution, and therefore can be predicted (Natrella, 1982).

Technique: Scientific principle useful for providing compositional information (Taylor, 1983).

Uncertainty: Allowance assigned to a measured value to include two major components of error: (1) bias, and (2) random error (Natrella, 1982).

Validation: An experimental process involving external corroboration by other laboratories (internal or external) or methods or the use of reference materials to evaluate the suitability of methodology (ACS Committee, 1983).

Verification: The general process used to decide whether a method in question is capable of producing accurate and reliable data (ACS Committee, 1983).

A.2 APPENDIX 2 THE RUGGEDNESS TEST (AFTER YOUDEN & STEINER, 1976).

This test is a simple way to learn if the results of a determination are

sensitive to small procedural changes, for example temperature of a digest or

length of time for an extraction. A rugged procedure is relatively insensitive

to small changes and is more likely to produce consistent results when subjected

to normal "abuse" by different analysts working in different laboratories. A

ruggedness test should be used as an integral part of method validation, and the

results of the test can help to estimate the performance characteristics and in

the preparation of the written laboratory procedure or protocol.

A particularly efficient procedure based on a fractional factorial

design may be used to investigate up to 7 variables with only 8 determinations.

Unfortunately the consequence of the design is that main effects are confounded

with some of the possible interactions while other interactions cannot even be

estimated. To interpret results one is forced to assume that confounded

interactions are negligible. Furthermore, sorting out random error from real

effects is difficult since one cannot calculate the former. A simple solution

to the problem is to run the ruggedness test in duplicate with a total of 16

determinations.

The basis of the ruggedness test is to allow each of 7 variables to have

two states, preferably representative of the extremes likely to occur when a

procedure is followed by two different analysts using different equipment.

Defining the two states by upper and lower case letters, one obtains a factorial

design like that shown in Table A-2.

Calculations, as illustrated by the worked example, are performed using the average differences between upper and lower case results, $D_A, D_B, \ldots, D_G$ and $D_A', D_B', \ldots, D_G'$, where, for example,

$$D_A = \frac{r+t+u+v}{4} - \frac{w+x+y+z}{4}, \qquad D_G' = \frac{r'+v'+x'+y'}{4} - \frac{t'+u'+w'+z'}{4} \qquad \text{(a-2-1)}$$

The differences $D_i - D_i'$ are independent of factor effects and therefore can provide an estimate of the random error, s, with 7 degrees of freedom:

$$s = \sqrt{\frac{(D_A - D_A')^2 + \cdots + (D_G - D_G')^2}{7}} \qquad \text{(a-2-2)}$$

For an effect to be statistically significant at the 95% confidence level, the absolute mean difference, $\left|\frac{D_i + D_i'}{2}\right|$, of any factor must exceed 1.18s.

In the worked example, factors found to have a significant effect, in order of importance, are A > C > B ≈ E. Factors F and G, which were unassigned, did not (and should not) have a significant effect.

Several types of information are available from the ruggedness test. We have an estimate of s at the concentration of 1.45 µg g⁻¹, and we know that temperature must be rather closely controlled during digestion. If we have a certified value for Cd concentration in the material, or a "reliable" measurement by an independent method, we can estimate bias and recommend operating conditions which will go into the laboratory procedure or protocol. If the variance is approximately the same for blanks as it appears to be for these samples, the detection limit will be about 0.12 µg g⁻¹ and the method shows much promise for complying with ocean dumping requirements. In fact the precision could be called excellent.

The ruggedness test could also be a useful way of training an analyst

new to the method, and allowing him to evaluate for himself which factors are

likely to have an important effect on the final data. It would also show the

ability of the analyst to produce data which conform to the performance

characteristics prior to analyzing environmental material.

TABLE A-2

Eight combinations of seven hypothetical factors used to test the ruggedness of an analytical method.

FACTORS                                          DETERMINATION #
                                                 1      2      3      4      5      6      7      8
Digestion temperature (A, 100°C; a, 90°C)        A      A      A      A      a      a      a      a
Digestion time (B, 2 hours; b, 3 hours)          B      B      b      b      B      B      b      b
Volume of acid (C, 6 mL; c, 10 mL)               C      c      C      c      C      c      C      c
Ratio of HCl:HNO3 (D, 3:1; d, 2:1)               D      D      d      d      d      d      D      D
Digest storage time (E, 1 day; e, 1 week)        E      e      E      e      e      E      e      E
Unassigned                                       F      f      f      F      F      f      f      F
Unassigned                                       G      g      g      G      g      G      G      g

Generalized observed results                     r      t      u      v      w      x      y      z
                                                 r'     t'     u'     v'     w'     x'     y'     z'

Hypothetical results for worked example,         1.47   1.63   1.58   1.72   1.14   1.38   1.21   1.47
Cd (µg g⁻¹)                                      1.54   1.56   1.64   1.69   1.10   1.41   1.20   1.54

WORKED EXAMPLE USING THE HYPOTHETICAL RESULTS

DIFFERENCES          A        B        C        D        E        F        G
Di                0.300   -0.090   -0.200   -0.010    0.050    0.000   -0.010
D'i               0.295   -0.115   -0.180    0.000    0.145    0.015    0.000
Di - D'i          0.005    0.025   -0.020   -0.010   -0.095   -0.015   -0.010
|(Di + D'i)/2|    0.297    0.103    0.190    0.005    0.098    0.008    0.005

s = 0.039          CV = (s / X̄) x 100 = 3%
1.18s = 0.046

∴ Factors A, B, C and E are significant at the 95% confidence level.
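The arithmetic of the worked example can be sketched in a few lines of Python. This is only an illustration of eqns a-2-1 and a-2-2 under the design of Table A-2; the names `UPPER`, `effects`, `first` and `second` are hypothetical and are not part of the original procedure.

```python
from math import sqrt

# Design of Table A-2: for each factor, the determinations (1-8)
# that were run at the upper-case level. Names here are illustrative only.
UPPER = {
    "A": (1, 2, 3, 4),
    "B": (1, 2, 5, 6),
    "C": (1, 3, 5, 7),
    "D": (1, 2, 7, 8),
    "E": (1, 3, 6, 8),
    "F": (1, 4, 5, 8),
    "G": (1, 4, 6, 7),
}

def effects(results):
    """Average difference (upper case minus lower case) for each factor, eqn a-2-1."""
    out = {}
    for factor, upper in UPPER.items():
        hi = [results[i - 1] for i in upper]
        lo = [results[i - 1] for i in range(1, 9) if i not in upper]
        out[factor] = sum(hi) / 4 - sum(lo) / 4
    return out

# Hypothetical Cd results from Table A-2 (first and duplicate runs).
first = [1.47, 1.63, 1.58, 1.72, 1.14, 1.38, 1.21, 1.47]
second = [1.54, 1.56, 1.64, 1.69, 1.10, 1.41, 1.20, 1.54]

D = effects(first)
D_prime = effects(second)

# Random error with 7 degrees of freedom, eqn a-2-2.
s = sqrt(sum((D[f] - D_prime[f]) ** 2 for f in UPPER) / 7)

# A factor is significant at the 95% level if |(D + D')/2| exceeds 1.18 s.
threshold = 1.18 * s
significant = [f for f in UPPER if abs((D[f] + D_prime[f]) / 2) > threshold]

print(round(s, 3), round(threshold, 3), significant)
```

Running the sketch on the tabulated results gives the same s, the same 1.18s threshold and the same list of significant factors as the worked example.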


A.3 APPENDIX 3 CERTIFIED REFERENCE SEDIMENTS (SEE RUSSELL, 1984 FOR A

COMPLETE LISTING OF PRESENTLY AVAILABLE REFERENCE MATERIALS)

A.3.1 Metals

Metal concentration, µg g⁻¹

                             Schedule 1                      Schedule 2
Reference   Organization     Hg      Cd      Pb     Cu       Zn    As     Be      Cr    Ni     V
MESS-1      MACSP-NRC        0.171   0.59    34.0   25.1     191   10.6   1.9     71    29.5   72.4
BCSS-1      MACSP-NRC        0.129   0.25    22.7   18.5     119   11.1   1.3     123   55.3   93.4
SRM-1646    NBS              0.063   0.36    28.2   18       138   11.6   (1.5)   76    32     94
MAG-1       USGS                     (0.2)   (24)   27       135          (3)     105   54     140

( ) Data are based on limited results

A.3.2 Chlorinated Hydrocarbons

Reference   Organization   Compounds¹   X̄ (s), µg kg⁻¹
CS-1        MACSP-ARL      PCB          1.15 (0.60)
HS-1        MACSP-ARL      PCB          21.8 (0.12)²
HS-2        MACSP-ARL      PCB          111.8 (2.5)²

¹ Relative to 1254
² Individual compounds also determined

A.3.3 Reference sediments in preparation

Reference   Organization   Elements or Compounds
            MACSP          Polycyclic aromatic hydrocarbons (PAH)
SD-N-1/1    IAEA           Low-level transuranics
SD-N-1/2    IAEA           Trace elements, U, Th + decay products
SD-N-2      IAEA           Low-level transuranics
            BCR            May prepare marine reference sediments in future

A.3.4 Addresses

BCR:      Community Bureau of Reference
          Directorate General XII
          Commission of the European Communities
          200 Rue de la Loi
          B-1049 Brussels
          Belgium

IAEA:     International Atomic Energy Agency
          Analytical Quality Control Services
          Laboratory Seibersdorf
          P. O. Box 590
          A-1011 Vienna
          Austria

MACSP:    Marine Analytical Chemistry Standard Program
          National Research Council of Canada

NRC:      1. Division of Chemistry
          Montreal Road
          Ottawa, Ontario
          K1A 0R9

ARL:      2. Atlantic Research Laboratory
          1411 Oxford Street
          Halifax, N.S.
          B3H 3Z1

NBS:      National Bureau of Standards
          U. S. Department of Commerce
          Washington, D.C. 20234
          U. S. A.

USGS:     United States Geological Survey
          National Centre
          Stop 972, Reston, VA 22092
          U. S. A.

A.4 APPENDIX 4 PROPAGATION OF ERROR

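When a quantity, Q, is calculated indirectly by combining several measured quantities, each with an associated error, the overall error in Q can be estimated from the theory of propagation of error. Suppose $Q = f(A, B, C, \ldots)$ and that each variable has associated with it an independent error with variance $\sigma_A^2, \sigma_B^2, \ldots$ Then the variance of Q is given by

$$\sigma_Q^2 = \left(\frac{\partial Q}{\partial A}\right)^2 \sigma_A^2 + \left(\frac{\partial Q}{\partial B}\right)^2 \sigma_B^2 + \cdots \qquad \text{(a-4-1)}$$

The best estimate of σ, particularly when the number of replicates, n, is small, is ts, where t is the Student factor (which depends on n and the number of σ limits one wishes to estimate) and s is the sample standard deviation. The formula becomes

$$\sigma_Q^2 \approx \left(\frac{\partial Q}{\partial A}\right)^2 (t s_A)^2 + \left(\frac{\partial Q}{\partial B}\right)^2 (t s_B)^2 + \cdots \qquad \text{(a-4-2)}$$

And where it is desired to estimate the "variance of the mean"

$$\sigma_{\bar{Q}}^2 \approx \left(\frac{\partial Q}{\partial A}\right)^2 \frac{(t s_A)^2}{n_A} + \left(\frac{\partial Q}{\partial B}\right)^2 \frac{(t s_B)^2}{n_B} + \cdots \qquad \text{(a-4-3)}$$

where $n_A, n_B, \ldots$ are the number of replicates associated with each respective standard deviation, $s_A, s_B, \ldots$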

Table A-4-1 shows these equations applied to common cases.

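TABLE A-4-1

Propagation of error formulas for commonly encountered cases.¹ For each formula Q the three expressions are, in order: $\sigma_Q^2$ (eqn. a-4-1); the best estimate of $\sigma_Q^2$ (eqn. a-4-2); and the best estimate of $\sigma_{\bar{Q}}^2$ (eqn. a-4-3).

$Q = A \pm B$

$$\sigma_A^2 + \sigma_B^2$$
$$(ts_A)^2 + (ts_B)^2$$
$$\frac{(ts_A)^2}{n_A} + \frac{(ts_B)^2}{n_B}$$

$Q = aA \pm bB$ (a, b constant)

$$a^2\sigma_A^2 + b^2\sigma_B^2$$
$$a^2(ts_A)^2 + b^2(ts_B)^2$$
$$\frac{a^2(ts_A)^2}{n_A} + \frac{b^2(ts_B)^2}{n_B}$$

$Q = AB$ or $Q = AB^{-1}$

$$Q^2\left[\frac{\sigma_A^2}{A^2} + \frac{\sigma_B^2}{B^2}\right]$$
$$Q^2\left[\frac{(ts_A)^2}{A^2} + \frac{(ts_B)^2}{B^2}\right]$$
$$Q^2\left[\frac{(ts_A)^2}{n_A A^2} + \frac{(ts_B)^2}{n_B B^2}\right]$$

$Q = A^n$

$$n^2\left[A^{n-1}\sigma_A\right]^2$$
$$n^2\left[A^{n-1}\,ts_A\right]^2$$
$$\frac{n^2}{n_A}\left[A^{n-1}\,ts_A\right]^2$$

$Q = \ln[A]$

$$\frac{\sigma_A^2}{A^2}$$
$$\frac{(ts_A)^2}{A^2}$$
$$\frac{(ts_A)^2}{A^2\,n_A}$$

$Q = \dfrac{A+B}{C}$

$$\left(\frac{A+B}{C}\right)^2\left[\frac{\sigma_A^2 + \sigma_B^2}{(A+B)^2} + \frac{\sigma_C^2}{C^2}\right]$$
$$\left(\frac{A+B}{C}\right)^2\left[\frac{(ts_A)^2 + (ts_B)^2}{(A+B)^2} + \frac{(ts_C)^2}{C^2}\right]$$
$$\left(\frac{A+B}{C}\right)^2\left[\frac{(ts_A)^2 n_B + (ts_B)^2 n_A}{n_A n_B (A+B)^2} + \frac{(ts_C)^2}{n_C C^2}\right]$$

$Q = \dfrac{A+B}{C} + D$

$$\left(\frac{A+B}{C}\right)^2\left[\frac{\sigma_A^2 + \sigma_B^2}{(A+B)^2} + \frac{\sigma_C^2}{C^2}\right] + \sigma_D^2$$
$$\left(\frac{A+B}{C}\right)^2\left[\frac{(ts_A)^2 + (ts_B)^2}{(A+B)^2} + \frac{(ts_C)^2}{C^2}\right] + (ts_D)^2$$
$$\left(\frac{A+B}{C}\right)^2\left[\frac{(ts_A)^2 n_B + (ts_B)^2 n_A}{n_A n_B (A+B)^2} + \frac{(ts_C)^2}{n_C C^2}\right] + \frac{(ts_D)^2}{n_D}$$

1. "t" is Student's t and depends on the degree of confidence desired (number of σ limits) and the respective number of replicates ($n_A, n_B, \ldots$).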

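As a rough check on the propagation of error formulas, eqn a-4-1 can be evaluated numerically and compared with the closed form for Q = (A+B)/C. The sketch below assumes independent errors and estimates the partial derivatives by central differences; the function name `propagate` and all input numbers are hypothetical and chosen only for illustration.

```python
def propagate(f, values, sigmas, rel_step=1e-6):
    """First-order variance of f(*values) for independent errors (eqn a-4-1).

    Partial derivatives are approximated by central differences.
    """
    variance = 0.0
    for i, (v, sd) in enumerate(zip(values, sigmas)):
        h = rel_step * (abs(v) if v != 0 else 1.0)
        up = list(values)
        dn = list(values)
        up[i] = v + h
        dn[i] = v - h
        dfdv = (f(*up) - f(*dn)) / (2 * h)
        variance += (dfdv * sd) ** 2
    return variance

# Hypothetical measured quantities and their standard deviations.
A, B, C = 10.0, 2.0, 4.0
sA, sB, sC = 0.3, 0.1, 0.2

def q(a, b, c):
    return (a + b) / c

numeric = propagate(q, [A, B, C], [sA, sB, sC])

# Closed form for Q = (A+B)/C as tabulated for this case.
Q = q(A, B, C)
closed = Q ** 2 * ((sA ** 2 + sB ** 2) / (A + B) ** 2 + sC ** 2 / C ** 2)

print(numeric, closed)
```

To obtain the small-sample estimates of eqns a-4-2 and a-4-3, each standard deviation passed in would be replaced by ts or by ts/√n respectively.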

A.5 APPENDIX 5 COMMON FORMULAS USED TO SUMMARIZE DATA AND MAKE QUALITY STATEMENTS. (SYMBOLS AND ABBREVIATIONS ARE GIVEN AT THE END OF THIS APPENDIX.)

As described in the text under "accuracy" and shown in Figure 3, a complete error treatment can be considered in several simple compartments which can be combined appropriately to make the final error statement. This section describes formulas which will assist in presenting compact statements of data and their quality to fit in each of the compartments.

For data sets based on only a few samples, uncertainties will always be great. Where not much is known about the underlying distribution for the data it is worthwhile keeping the Tschebycheff inequality in mind. It states that regardless of distribution the probability of a measurement exceeding an average by more than k standard deviations is less than or equal to $1/k^2$, or

$$P\left(X_i > \bar{X} + ks\right) \le \frac{1}{k^2}$$

Therefore at least 89% of the numbers will be less than X̄ + 3s (regardless of distribution), and in the favourable circumstance of a normal distribution with the determination of s based on many replicates, 99.9% will be less than X̄ + 3s. The real world will always be between these two extremes.

Caution in the use of the formulas.

Care must be taken with units since averages and standard deviations can be calculated at three different stages.

Stage 1 - Direct reading from a recorder; Y_i (mV, cm)

Stage 2 - After a calibration process; C_i (µg L⁻¹)

Stage 3 - After correction for weight of sediment and/or makeup volume; X_i (µg g⁻¹ dry weight basis)

Here we will always use Y_i, C_i and X_i to designate the respective stages. It is assumed that some derived quantity such as X_i is what is eventually reported.

A.5.1 Sample standard deviation, s.

Formula

$$s = \sqrt{\frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n-1)}} \qquad \text{(a-5-1)}^{\dagger}$$

Example

X_i = 4.00, 3.28, 3.25, 4.03, 3.27

Report

X̄, s, n = 3.57, 0.41, 5 (µg g⁻¹)

† This formula is subject to error if roundoff is carried out carelessly.
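The example can be reproduced with a short Python sketch. It evaluates the sums-of-squares form of eqn a-5-1 and, as a cross-check against the roundoff problem noted in the footnote, the equivalent form based on squared deviations about the mean. The function names are illustrative only.

```python
from math import sqrt

def mean_and_sd(values):
    """Return (mean, s), with s from the sums-of-squares form of eqn a-5-1."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    s = sqrt((n * total_sq - total ** 2) / (n * (n - 1)))
    return total / n, s

def sd_from_deviations(values):
    """Same quantity from squared deviations about the mean.

    This form is less sensitive to careless roundoff of the sums.
    """
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

x = [4.00, 3.28, 3.25, 4.03, 3.27]  # example data, ug/g
mean, s = mean_and_sd(x)
print(f"{mean:.2f}, {s:.2f}, {len(x)}")
```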

A.5.2 Pooled standard deviation, s_p.

Formula

$$s_p = \sqrt{\frac{\sum \nu_i s_i^2}{\sum \nu_i}} \qquad \text{(a-5-2)}$$

where $\nu_i = n_i - 1$ = degrees of freedom.

For the case where data consist of k duplicates and $d_i$ is the difference between duplicates, $\nu_i = 1$ and

$$s_p = \sqrt{\frac{\sum d_i^2}{2k}} \qquad \text{(a-5-3)}$$

Example

Suppose for a data set, 20% of the samples were run as blind duplicates, on different days, with the following matched pairs (X₁, X₂; µg g⁻¹).

(1.61, 1.29) (1.18, 1.25) (1.72, 1.60) (0.31, 0.41)

(0.21, 0.29) (2.48, 2.59) (4.81, 4.39)

$$s_p = \sqrt{\frac{0.327}{14}} = 0.15\ \mu\text{g g}^{-1}$$

Report

For the range 0.21 - 4.81 µg g⁻¹, s_p = 0.15 µg g⁻¹ (n = 2k = 14) (plus the formula used and a statement describing the model).
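A minimal sketch of both pooled standard deviation formulas follows, applied to the blind duplicates of the example. The function names are hypothetical; the cross-check uses the fact that a single duplicate pair has standard deviation |d|/√2 with one degree of freedom.

```python
from math import sqrt

def pooled_sd(sds, ns):
    """General pooled standard deviation, eqn a-5-2 (degrees of freedom n_i - 1)."""
    num = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    den = sum(n - 1 for n in ns)
    return sqrt(num / den)

def pooled_sd_duplicates(pairs):
    """Pooled standard deviation for k duplicate pairs, eqn a-5-3."""
    k = len(pairs)
    return sqrt(sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * k))

pairs = [(1.61, 1.29), (1.18, 1.25), (1.72, 1.60), (0.31, 0.41),
         (0.21, 0.29), (2.48, 2.59), (4.81, 4.39)]

sp = pooled_sd_duplicates(pairs)

# Cross-check through the general formula: each pair has s_i = |d_i| / sqrt(2).
sp_general = pooled_sd([abs(a - b) / sqrt(2) for a, b in pairs], [2] * len(pairs))

print(round(sp, 2))
```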


A.5.3 Rejection of outliers

Extreme observations may occur due to gross error or may be part of the population. There are two reasons for rejecting outliers: known contamination (sample mishandling) and statistical improbability. In the first instance, if a sample is suspect, it should not be analyzed. For the second instance, statistical means exist for removing outliers, such as range tests (Dixon and Massey, 1969) or Chauvenet's criterion. Regardless of which method is used, it is not likely to have much meaning for small data sets (n < 4). In that case there is no easy solution to dealing with "wild" data points except re-analysis or re-sampling or both. Criteria for data rejection should be decided before data are evaluated. Chauvenet's criterion is tabulated below.

Table A-5-1. Chauvenet's criterion for rejection of a suspected value having a deviation from the mean of δ = X_i - X̄.

n      δ/σ        n      δ/σ        n      δ/σ
3      1.37       7      1.80       20     2.24
4      1.53       8      1.87       30     2.39
5      1.64       9      1.91       50     2.57
6      1.73       10     1.97       100    2.80

For a small population, s may seriously underestimate σ and the "t" distribution could be used to correct this.

Example

X_i = 1.60, 1.59, 1.71, 1.28, 1.60, 1.48, 1.65, 1.73, 1.43

X̄ ± s (n) = 1.56 ± 0.14 (9)

Should 1.28 be eliminated because it is too low?

δ/s = 2.0 > 1.91

Reject 1.28

X̄ ± s (n) = 1.60 ± 0.10 (8)

Report

X̄, s, n = 1.60, 0.10 µg g⁻¹, 8 (One low data point was rejected by Chauvenet's criterion)
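The rejection test of the example can be sketched as below. The lookup table simply copies the tabulated values of Table A-5-1, so the sketch only works for the sample sizes listed there; the function name `chauvenet_test` is hypothetical, and s is used in place of σ as in the worked example.

```python
from statistics import mean, stdev

# Critical ratios copied from Table A-5-1 (only the tabulated n are supported).
CHAUVENET = {3: 1.37, 4: 1.53, 5: 1.64, 6: 1.73, 7: 1.80, 8: 1.87,
             9: 1.91, 10: 1.97, 20: 2.24, 30: 2.39, 50: 2.57, 100: 2.80}

def chauvenet_test(values):
    """Return (suspect value, deviation ratio, reject?) for the most extreme point."""
    m = mean(values)
    s = stdev(values)
    suspect = max(values, key=lambda v: abs(v - m))
    ratio = abs(suspect - m) / s
    return suspect, ratio, ratio > CHAUVENET[len(values)]

x = [1.60, 1.59, 1.71, 1.28, 1.60, 1.48, 1.65, 1.73, 1.43]
suspect, ratio, reject = chauvenet_test(x)

# Recompute the summary statistics once the rejected point is removed.
kept = [v for v in x if not (reject and v == suspect)]
print(suspect, round(ratio, 1), reject, round(mean(kept), 2), round(stdev(kept), 2))
```

Applied a second time to the eight retained values, the same function finds no further point exceeding the tabulated ratio for n = 8.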


A.5.4 Calibration using linear regression (see Natrella, 1963)

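A calibration is usually performed to relate a measurement such as recorder response to one which is more useful such as concentration. If the relationship is linear;

$$Y = a + mC$$

The normal model used for the calculation of the regression assumes that there is no error in the independent variable, C. Careful calibrant preparation can arrange $s_C \ll s_Y$ and therefore the regression should be performed according to the above equation. The appropriate formulas are:

Formulas

slope

$$m = \frac{n\sum CY - \sum C \sum Y}{n\sum C^2 - \left(\sum C\right)^2} \qquad \text{(a-5-4)}$$

intercept

$$a = \frac{\sum Y}{n} - m\,\frac{\sum C}{n} \qquad \text{(a-5-5)}$$

variance of Y

$$s^2 = \frac{1}{n-2}\left[\sum Y^2 - \frac{1}{n}\left(\sum Y\right)^2 - m\sum CY + \frac{m}{n}\sum C \sum Y\right] \qquad \text{(a-5-6)}$$

variance of slope

$$s_m^2 = \frac{s^2}{\sum C^2 - \left(\sum C\right)^2 / n} \qquad \text{(a-5-7)}$$

variance of intercept

$$s_a^2 = \frac{s^2 \sum C^2}{n\sum C^2 - \left(\sum C\right)^2} \qquad \text{(a-5-8)}$$

correlation coefficient

$$r = \frac{n\sum CY - \sum C \sum Y}{\sqrt{\left[n\sum C^2 - \left(\sum C\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \qquad \text{(a-5-9)}$$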


A.5.4.1 Linearity

How best to perform a calibration should be investigated as part of method validation and prior to preparing a laboratory protocol. If it is desired to use a linear regression formula, the "linearity" of the calibration should be demonstrated along with the range for which it is linear. The protocol should specify that calibrations and samples must always be within this linear range.

The linearity test can be performed if there are replicate observations

on Y (recorder response) at one or more values of C (concentration). The

recommended 5 calibration points in triplicate would supply exactly this sort of

information.

Suppose there are n different concentrations, and that at each concentration, k_i (i = 1, n) replicate observations of recorder response are made. Table A-5-2 shows a worked example of the linearity test using hypothetical data. To demonstrate linearity, the calculated F should be less than F_crit (taken from common statistical tables) at the chosen significance level, normally α(1) = 0.05. The worked example shows that the first 6 recorder responses have a linear relationship with concentration but that as the concentration gets larger the calibration becomes non-linear and response tends to drop off. If the calibration data are plotted, it will be seen that the recorder response is tending to drop off slightly even at 70 µg L⁻¹ concentration, and inclusion of this point would cause a negative bias in estimated concentration near zero and a positive bias near 70 µg L⁻¹. Therefore the linearity test is not a substitute for analytical experience, and calibration data should be plotted to get a clear idea of what the recorder response function is like. The linearity test and the plotted data lead one to conclude, for the hypothetical case in Table A-5-2, that linear regression is safe provided concentration of analyte presented to the instrument is no more than 50 µg L⁻¹.

TABLE A-5-2 Linearity test

Column   Quantity
1        Concentration, C_i (µg L⁻¹)
2        Recorder response, Y (cm)
3        ΣY
4        (ΣY)²
5        ΣY²
6        k_i
7        k_i C_i        (6 x 1)
8        k_i C_i²       (6 x 1²)
9        ΣCY            (1 x 3)
10       (ΣY)²/k_i      (4 ÷ 6)

1        2                              3        4         5          6    7        8        9         10
0.00     0.2, 0.4, 0.7, 0.5             1.8      3         0.94       4    0.00     0        0.00      1
5.00     12.0, 11.5, 11.4               34.9     1218      406.21     3    15.00    75       174.5     406
10.00    23.4, 23.8, 23.0               70.2     4928      1643.0     3    30.00    300      702.0     1642.7
30.00    69.0, 71.1                     140.1    19628     9816.2     2    60.00    1800     4203.0    9814
50.00    117.1, 110.2, 115.2, 114.4     456.9    208758    52214.9    4    200.00   10000    22845.0   52190
70.00    148.1, 150.3, 142.1            440.5    194040    64716.1    3    210.00   14700    30835.0   64680

n = 6, j = k - n = 13          Σ        1144.4             128797     19   515.00   26875    58759.5   128734

Calculations (bracketed numbers denote the column totals; k is the total of column 6)

$$S_1 = (10) - \frac{(3)^2}{k}, \qquad b = \frac{(9) - (7)(3)/k}{(8) - (7)^2/k}, \qquad S_2 = b\left[(9) - \frac{(7)(3)}{k}\right], \qquad S_3 = (5) - \frac{(3)^2}{k}$$

$$F = \left[\frac{S_1 - S_2}{S_3 - S_1}\right]\left[\frac{k - j}{j - 2}\right] = \left[\frac{59805.0 - 59579.8}{59868.0 - 59805.0}\right]\left[\frac{6}{11}\right] = 1.95, \qquad F_{crit}(6,11) = 3.09$$

∴ F < F_crit, ∴ Calibration is linear

Suppose we have two more calibrations at higher concentration

1        2                              3        4         5          6    7        8        9         10
100      202.1, 204.3, 198.0, 203.1     807.5    652056    163036.5   4    400      40000    80750     163014
150      268.1, 270.3, 265.0            803.4    645452    215164.7   3    450      67500    120510    215151

n = 8, j = k - n = 17          Σ        2755.3   1723083   506998     25   1365     134375   260020    506899

$$F = \left[\frac{203232 - 200647}{203331 - 203232}\right]\left[\frac{8}{15}\right] = 13.9, \qquad F_{crit}(8,15) = 2.64$$

∴ F > F_crit, ∴ Calibration is not linear

A.5.4.2 Linear regression

Assuming that linearity has been established, linear regression can be

applied using the well known formulas given above. The worked example below

(using selected data from Table A-5-2) shows what to report.

Example

Suppose we have the following calibration data reported as (C µg L⁻¹, Y cm).

(0.00, 0.2), (5.00, 12.0), (10.00, 23.0), (30.00, 71.1), (50.00, 117.2)

n = 5          s_m = 0.0122
m = 2.346      s_a = 0.325
a = 0.111      C̄ = 19.0
s² = 0.257

Report

Calibration was done by linear regression, Y = a + mC, where Y (cm) is recorder signal and C (µg L⁻¹) is analyte concentration, for the range 0 - 50 µg L⁻¹, n = 5. (The linearity test shows calibration to be linear within this range at the α(1) = 0.05 significance level.)

Y = 0.30 + 2.32 C, s = 0.303, s_a = 0.194, s_m = 0.00732, C̄ = 19.0
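The regression formulas a-5-4 to a-5-9 can be sketched directly in Python. The block below applies them to the five calibration points listed in the example; the function name and the dictionary keys are illustrative, not part of the original text.

```python
from math import sqrt

def linear_calibration(C, Y):
    """Least-squares line Y = a + m*C using the summation formulas a-5-4 to a-5-9."""
    n = len(C)
    sC, sY = sum(C), sum(Y)
    sCC = sum(c * c for c in C)
    sYY = sum(y * y for y in Y)
    sCY = sum(c * y for c, y in zip(C, Y))

    m = (n * sCY - sC * sY) / (n * sCC - sC ** 2)                      # a-5-4
    a = sY / n - m * sC / n                                            # a-5-5
    s2 = (sYY - sY ** 2 / n - m * sCY + m * sC * sY / n) / (n - 2)     # a-5-6
    s2_m = s2 / (sCC - sC ** 2 / n)                                    # a-5-7
    s2_a = s2 * sCC / (n * sCC - sC ** 2)                              # a-5-8
    r = (n * sCY - sC * sY) / sqrt((n * sCC - sC ** 2) * (n * sYY - sY ** 2))  # a-5-9
    return {"n": n, "m": m, "a": a, "s2": s2,
            "s_m": sqrt(s2_m), "s_a": sqrt(s2_a), "r": r, "C_mean": sC / n}

# Calibration points from the example: (C in ug/L, Y in cm).
C = [0.00, 5.00, 10.00, 30.00, 50.00]
Y = [0.2, 12.0, 23.0, 71.1, 117.2]

fit = linear_calibration(C, Y)
for key, value in fit.items():
    print(key, round(value, 4))
```

For these five points the sketch returns the slope, intercept, s², s_m, s_a and mean concentration listed under the example data.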

A.5.5 Blank correction

As noted in the text, care needs to be taken to assess correctly the

error contributed by the blank. With an independent blank for each sample

(seldom the case), calculation of net concentration will correctly contain error

from the total and blank, and standard deviation may be calculated as shown in

eqn. a-5-1 above. More commonly, a limited number of blanks are available and

their average is used to correct total to net concentration for a given data

batch. When this is done, variation in the blank is eliminated and will not

show up in the final error statement. This can easily be corrected through the

propagation of error formula. The following example illustrates the difference in "apparent error" between the two methods of calculating, and how the correction should be made when an average blank is used.

Example

The following blanks and samples are measured (µg L⁻¹)

Total, T_i                 171   149   157   161   138      T̄, s_T = 155.2, 12.5
Blank, B_i                  11    18    23     0     7      B̄, s_B = 11.8, 9.0
a) T_i - B_i = C_i         160   131   134   161   131      C̄, s_C = 143, 15.7
b) T_i - B̄ = Q_i           159   137   145   149   126      Q̄, s† = 143, 12.5 (= s_T)

† The important point is that error associated with Q is identical to s_T and underestimates s_C, the true error. The correct error associated with Q can be calculated according to the propagation of error formulas (Appendix 4);

$$s_Q = \sqrt{s_T^2 + s_B^2} = \sqrt{12.5^2 + 9.0^2} = 15.4$$

Report

B̄, s_B, n = 11.8, 9.0, 5 (µg L⁻¹)

Also make a statement about how blanks were run (randomly), how totals were blank corrected (a or b) and finally how error from the blank was factored into the final error statement. (See Appendix 7 for a complete example).
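The two ways of blank-correcting the example data, and the correction for the average-blank case, can be sketched as follows. Variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

T = [171, 149, 157, 161, 138]   # totals, ug/L
B = [11, 18, 23, 0, 7]          # blanks, ug/L

# Method a: an independent blank subtracted from each total.
C = [t - b for t, b in zip(T, B)]

# Method b: the average blank subtracted from every total.
Q = [t - mean(B) for t in T]

s_T, s_B, s_C = stdev(T), stdev(B), stdev(C)

# Subtracting a constant leaves the spread unchanged, so the apparent
# error of Q equals s_T and omits the blank variation.
s_Q_apparent = stdev(Q)

# Propagation of error for Q = T - B restores the blank contribution.
s_Q = sqrt(s_T ** 2 + s_B ** 2)

print(round(s_C, 1), round(s_Q_apparent, 1), round(s_Q, 1))
```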

A.5.6 Limit of detection (LOD)

A.5.6.1 IUPAC definition

Formula

$$\text{LOD (IUPAC)} = \frac{3\sigma_B}{m} \qquad \text{(a-5-10)}$$

$$\text{LOD (IUPAC)} = \frac{3 s_B}{m} \quad (n > 20) \qquad \text{(a-5-11)}$$

$$\text{LOD (IUPAC)} = \frac{t s_B}{m} \quad (n \text{ small}) \qquad \text{(a-5-12)}$$

where t is $t_{0.001(1),\,n-1}$ and s_B is calculated in terms of recorder signal output.

Example

The following blank peak heights were measured (cm) with the system calibrated according to the example in A.5.4.2 above.

2.1, 3.0, 1.4, 5.1, 4.1, 2.6, 3.3, 1.9, 1.1, 5.6, 3.8

Ȳ_B, s_B, n = 3.09, 1.5, 11

t (0.001(1), 10) = 4.1

m = 2.346

LOD = 4.1 x 1.5 / 2.346 = 2.6 µg L⁻¹

If the normal procedure is to digest about 0.8 g (dry weight of sediment) and make up final volume to 50 mL, this corresponds to:

LOD = (2.6 / 0.8) x (50 / 1000) µg g⁻¹ (dry weight) = 0.16 µg g⁻¹

Report

LOD (IUPAC) = 2.6 µg L⁻¹ (LOD = t s_B / m, where α(1) = 0.001, m = sensitivity and n = 11)

Normally about 0.8 g of sediment are used and digests are made up to 50 mL, which corresponds to an LOD of 0.16 µg g⁻¹
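The small-sample IUPAC calculation (eqn a-5-12) and the conversion to a dry-weight basis can be sketched as below. Student's t is passed in as a tabulated constant rather than computed; the value 4.14 is the one quoted in this appendix for α(1) = 0.001 and 10 degrees of freedom. Function names are hypothetical.

```python
from statistics import stdev

def lod_iupac(blank_signals, slope, t_value):
    """LOD = t * s_B / m, with s_B in recorder signal units (eqn a-5-12)."""
    return t_value * stdev(blank_signals) / slope

def to_dry_weight(lod_ug_per_L, volume_mL, mass_g):
    """Convert a solution LOD (ug/L) to a sediment LOD (ug/g dry weight)."""
    return lod_ug_per_L * (volume_mL / 1000.0) / mass_g

blanks = [2.1, 3.0, 1.4, 5.1, 4.1, 2.6, 3.3, 1.9, 1.1, 5.6, 3.8]  # peak heights, cm
T_0001_10 = 4.14   # tabulated t for alpha(1) = 0.001, 10 degrees of freedom

lod = lod_iupac(blanks, slope=2.346, t_value=T_0001_10)
lod_sediment = to_dry_weight(lod, volume_mL=50, mass_g=0.8)

print(round(lod, 1), round(lod_sediment, 2))
```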

A.5.6.2 LOD by propagation of error (includes calibration error)

Formula

The blank expressed as a concentration is;

$$C_B = \frac{\bar{Y}_B - a}{m} \qquad \text{(a-5-13)}$$

Therefore according to the propagation of error (Appendix 4)

$$3\sigma_{C_B} = \frac{\bar{Y}_B - a}{m}\sqrt{\frac{(3\sigma_B)^2 + (3\sigma_a)^2}{(\bar{Y}_B - a)^2} + \frac{(3\sigma_m)^2}{m^2}} \qquad \text{(a-5-14)}$$

or for small numbers of blanks and calibrations

$$3\sigma_{C_B} \approx \frac{\bar{Y}_B - a}{m}\sqrt{\frac{(ts_B)^2 + (ts_a)^2}{(\bar{Y}_B - a)^2} + \frac{(ts_m)^2}{m^2}} \qquad \text{(a-5-15)}$$

where Student's t (α(1) = 0.001) is based on the respective degrees of freedom for the blank (n-1) or residual error of the calibration (n-2).

Example

Assume the same blanks given before (A.5.6.1) and the calibration data (A.5.4.2). We have

a = 0.30        s_a = 0.194
m = 2.35        s_m = 0.00732
s_B = 1.5       n_B = 11        t (0.001, 10) = 4.14
n_c = 5         t (0.001, 3) = 10.2

LOD (eqn a-5-15, n small) = 1.19 √(4.95 + 0.50 + 0.001) = 2.8 µg L⁻¹

Evidently the contribution to LOD from error in the intercept is small and from error in the slope is negligible, but this will not always be the case. Normally, the intercept error will contribute more than slope error so it is important to calibrate near zero to fix the intercept accurately.

Report

LOD (propagation of error) = 2.8 µg L⁻¹ (Based on 11 blanks and 5 calibration points, and normalized to 3σ using the appropriate t values. Give formula used, or reference to it.)

For 0.8 g sediment digests made up to a working volume of 50 mL this corresponds to an LOD of 0.17 µg g⁻¹
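The small-sample calculation of eqn a-5-15 can be sketched as follows, using the numbers of the example together with the mean blank signal of 3.09 cm from A.5.6.1. The function and argument names are hypothetical, and the t values are the tabulated ones quoted above rather than computed.

```python
from math import sqrt

def lod_propagated(y_blank, a, m, s_blank, s_a, s_m, t_blank, t_cal):
    """LOD including calibration error, following the form of eqn a-5-15.

    t_blank applies to the blank standard deviation (n - 1 degrees of freedom);
    t_cal applies to the intercept and slope errors (n - 2 degrees of freedom).
    """
    net = y_blank - a
    rel_var = (((t_blank * s_blank) ** 2 + (t_cal * s_a) ** 2) / net ** 2
               + (t_cal * s_m) ** 2 / m ** 2)
    return (net / m) * sqrt(rel_var)

lod = lod_propagated(
    y_blank=3.09,      # mean blank peak height, cm (A.5.6.1)
    a=0.30, m=2.35,    # intercept and slope used in the example
    s_blank=1.5, s_a=0.194, s_m=0.00732,
    t_blank=4.14,      # t(0.001, 10)
    t_cal=10.2,        # t(0.001, 3)
)
print(round(lod, 1))
```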

A.5.7 Symbols used in Appendix 5

(A bar over any symbol denotes average)

a       Intercept on a calibration line.
B       Blank measurement, µg L⁻¹
C       Concentration after applying calibration formula, µg L⁻¹
CV      Coefficient of variation (CV = s/X̄ x 100).
d       Difference between two replicates, X₁ - X₂.
δ       Distance away from an average, X_i - X̄.
k       The number of standard deviations used for determining LOD.
LOD     The limit of detection.
m       The slope of a calibration line (analytical sensitivity).
n       The number of samples, blanks (n_B) or calibration points (n_C).
Q       Samples corrected by subtracting an average blank, µg L⁻¹.
s       Standard deviation.
s_a     Standard deviation of the intercept of a linear calibration.
s_B     Standard deviation of the blank.
s_m     Standard deviation of the slope of a linear calibration.
s_p     Pooled standard deviation.
s_Q     Standard deviation of a sample corrected with an average blank.
T       Samples after applying a calibration formula but before a blank correction.
t       Student's t.
X       Concentration of metal in a sediment, µg g⁻¹
Y       Signal from an instrument (mm, volts etc).
Y_B     Signal from an instrument when measuring a blank.
σ       Population standard deviation.
ν       Degrees of freedom, normally n-1.

A.6 APPENDIX 6 CONTROL CHARTS

Here I give worked examples of three simple control charts which should

be adequate for most laboratory analyses. They are the Shewhart chart (good for

detecting a shift in results), the Youden control chart (helps to differentiate

random and systematic error problems) and the standard deviation or range chart

(good for detecting increased variability or noise). Use of combinations of

these control charts should help to spot and diagnose problems quickly. More

complex approaches are possible and the interested laboratory should refer to

the ASTM manual (1976) or to Friedman and Erdmann, 1982.

To prepare a control chart, a stable measurement is required, and we have suggested (1) reference material, (2) blind replicates, (3) check calibrants and (4) blanks as possible candidates. Additionally, physical measurements such as weighing need some control chart attention. A faulty balance can generate error which will pervade all measurements made in a laboratory.

A.6.1 The Shewhart Control Chart

The following hypothetical Cd data (µg g⁻¹) for reference material (MESS-1) have been generated over a number of days, using the same procedure, analyst and instrumentation.

0.63, 0.58, 1.8, 0.50, 0.40, 0.59, 0.58, 0.64, 0.59, 0.62, 0.61, 1.0, 0.40, 0.40, 0.58, 0.58, 0.55, 0.49, 0.63, 0.68, 0.62, 0.59, 0.54, 0.58, 0.60, 0.65;

X̄, s, n = 0.63, 0.26, 26.

The numbers 1.8 and 1.0 look suspiciously high, and both can be eliminated (stepwise) by Chauvenet's criterion. This should be done, because we want the control chart to be based on samples which we feel were "in control". The revised estimates are X̄, s, n = 0.568, 0.078, 24. The chart may now be drawn with a line representing the mean, X̄, and two control limits which are X̄ ± 3s. Subsequent data collected during several weeks are marked on the control chart (Fig. A-6-1), along with pertinent comments.
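The chart parameters can be recomputed from the 24 retained values with a short sketch. The warning (±2s) and action (±3s) limits follow the summary in A.6.4; the `classify` helper and its labels are hypothetical.

```python
from statistics import mean, stdev

# MESS-1 Cd data (ug/g) after removal of 1.8 and 1.0 by Chauvenet's criterion.
retained = [0.63, 0.58, 0.50, 0.40, 0.59, 0.58, 0.64, 0.59, 0.62, 0.61,
            0.40, 0.40, 0.58, 0.58, 0.55, 0.49, 0.63, 0.68, 0.62, 0.59,
            0.54, 0.58, 0.60, 0.65]

centre = mean(retained)
s = stdev(retained)

limits = {
    "warning": (centre - 2 * s, centre + 2 * s),
    "action": (centre - 3 * s, centre + 3 * s),
}

def classify(x):
    """Place a new control result relative to the warning and action limits."""
    lo_a, hi_a = limits["action"]
    lo_w, hi_w = limits["warning"]
    if x < lo_a or x > hi_a:
        return "action"
    if x < lo_w or x > hi_w:
        return "warning"
    return "in control"

print(round(centre, 3), round(s, 3), len(retained))
```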

Examination of the Shewhart control chart shows its potential utility.

For example in Figure A-6-1, new standards seem to contribute to variance

[Figure A-6-1. Shewhart control chart: Cd concentration in MESS-1 (µg g⁻¹) plotted against date (February 1984), with lines for the mean (X̄), the warning limits (X̄ ± 2s) and the action limits (X̄ ± 3s). Annotations on the chart mark a new batch of graphite tubes and new standards.]

TABLE A-6-1

PAIR    MESS-1        BCSS-1        MESS + BCSS    MESS - BCSS
        Cd µg g⁻¹     Cd µg g⁻¹
n       X             Y             T              D

1       0.63          0.28          0.91           0.35
2       0.58          0.25          0.83           0.33
3       1.8*          0.27
4       0.40          0.20          0.60           0.20
5       0.59          0.27          0.86           0.32
6       0.58          0.27          0.85           0.31
7       0.64          0.30          0.94           0.34
8       0.59          0.68*
9       0.62          0.26          0.88           0.36
10      0.68          0.30          0.98           0.38
11      0.55          0.27          0.82           0.28
12      0.50          0.18          0.68           0.32
13      0.59          0.22          0.71           0.37
14      0.61          0.26          0.87           0.35
15      0.49          0.20          0.69           0.29
16      0.52          0.22          0.74           0.30
17      0.66          0.18          0.84           0.48*
18      0.60          0.24          0.84           0.36
19      0.49          0.18          0.67           0.31
20      0.55          0.26          0.81           0.29
21      0.51          0.20          0.70           0.30

AVG     0.56          0.24          0.799          0.320
s       0.068         0.039         0.106          0.0423
n = 18

* Data points removed according to Chauvenet's criterion.

CALCULATIONS

s_R = 0.030
s_S = 0.069
s = 0.075

CONTROL LIMITS

2s_D = 0.085
3s_D = 0.127

NOTE: Variances for X and Y appear to be different.

[Figure A-6-2. Youden control method. Top left panel: X-Y plot of MESS-1 against BCSS-1 Cd (µg g⁻¹), showing the certified value (+), the average (o, n = 18), an inner circle of radius 2s_R = 0.060 and an outer circle of radius 2s = 0.150. Remaining panels: D (MESS - BCSS) and T (MESS + BCSS) plotted against days, each with warning limits.]

considered. If the T chart shows points exceeding the warning limit while the D chart does not, bias can be suspected as the cause.

The utility of the Youden method can be seen in Figure A-6-2, where data collected during 8 days (batches, etc.) have been plotted. In the top left hand X-Y plot, systematic errors are evident since points generally form an ellipse. On day 4 and day 8, there seems to be some problem with random error. On days 3, 4 and 6, systematic error is a major contributor.

A.6.3 The range control chart

Either a range or standard deviation chart may be used to control

precision. Range is used here since it is easier to calculate and understand by

the average observer. This control chart may be set up if measurements are

being replicated a fixed number of times per batch or time period, and therefore

can be used in conjunction with a Shewhart control chart for mean concentration.

To set up the control chart, calculate range for each set of replicates, and eventually estimate average range, R̄ = ΣR_i/N. The control limits may now be calculated by multiplying R̄ by a factor which can be found in published tables (ASTM, 1976). For most cases, where the number of replicates is small (< 7), this process is likely to be simple. The lower limit multiplier, D_L, is zero and the upper limit multiplier, D_U, for the respective n's is 3.27 (n=2), 2.57 (n=3), 2.28 (n=4), 2.11 (n=5), 2.00 (n=6).
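A minimal sketch of the range chart set-up follows. The multipliers are those quoted above; the input reuses the blind duplicates of A.5.2 purely as illustrative data (they are not presented in the text as control data), and the function name is hypothetical.

```python
# Upper limit multipliers quoted in the text for n = 2 to 6 replicates;
# the lower limit multiplier is zero over this range.
D_UPPER = {2: 3.27, 3: 2.57, 4: 2.28, 5: 2.11, 6: 2.00}

def range_chart_limits(batches):
    """Return (average range, lower limit, upper limit) for equally sized batches."""
    sizes = {len(batch) for batch in batches}
    if len(sizes) != 1:
        raise ValueError("all batches must have the same number of replicates")
    n = sizes.pop()
    ranges = [max(batch) - min(batch) for batch in batches]
    r_bar = sum(ranges) / len(ranges)
    return r_bar, 0.0, D_UPPER[n] * r_bar

# Illustrative input only: the duplicate pairs of A.5.2 (ug/g).
batches = [(1.61, 1.29), (1.18, 1.25), (1.72, 1.60), (0.31, 0.41),
           (0.21, 0.29), (2.48, 2.59), (4.81, 4.39)]

r_bar, lower, upper = range_chart_limits(batches)
print(round(r_bar, 3), lower, round(upper, 3))
```

A range from a later batch that falls above the upper limit would then signal increased variability.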

A.6.4 Summary

A.6.4.1 Chart preparation

1. Establish parameters, preferably with n ≥ 20.

2. Prepare chart using X̄ ± 2s (warning) and X̄ ± 3s (action).

3. Maintain control and plot subsequent data.

4. Update parameters from time to time. F-tests can be used to establish if variances are the same, while "t" tests (assuming underlying normality of control data) can be used to decide if means are the same. For example, could we update the data in Table A-6-1 with the data plotted in Figure A-6-2 (n=8)? Table A-6-2 lists the data, and subsequent calculations show that we could pool the data sets to derive a new control chart. If groups prove to be different a reason should be sought and the process must be restarted.

TABLE A-6-2: Update Data Plotted in Figure A-6-2

PAIR    X         Y         T         D

1       0.60      0.28      0.88      0.32
2       0.53      0.24      0.77      0.29
3       0.52      0.15      0.67      0.37
4       0.41*     0.20      0.61      0.21
5       0.62      0.25      0.87      0.37
6       0.66      0.31      0.97      0.35
7       0.53      0.19      0.72      0.44
8       0.60      0.15      0.75      0.35

AVG     0.58      0.22      0.80      0.36
s       0.054     0.063     0.106     0.047
n = 7

* We will eliminate this data point since it lies outside the 2s circle in Figure A-6-2.

CALCULATIONS

s_R = 0.033
s_S = 0.082
s = 0.088

F = s (n=7) / s (n=18) = 0.088 / 0.075 = 1.17 (F crit = 3.10)

Variances are essentially the same. Similarly, "t" tests cannot distinguish the X or Y data from the two time periods. Therefore data could be pooled with the earlier set to give n=25.

A.6.4.2 Chart interpretation

1. Points outside the action limits require investigation. Data accumulated

since last known control must be redone.

2. Trends, for example 7 points consecutively on the same side of the mean

should be investigated.

3. The charts should be watched for cycles or periodicity which takes place

within the control limits. Cause should be investigated and this source of

error eliminated.


A.7 APPENDIX 7 CALCULATING AND REPORTING THE ERROR STATEMENTS.

What to report and how to report it have been dealt with in the text. A

complete data report will include complete validated methods and their

performance characteristics, physical handling and appearance of samples,

sampling strategy and some statement about laboratory quality assurance tactics.

However, at some point we are going to want to report our data, give some estimate of error and how we calculated it, and the model used. To illustrate how this can be done for a realistic data set, some hypothetical data for Hg determinations in sediments are shown in Table A-7-1. This data set includes calibration, blanks, reference material and replicate samples. Ideally, all of these have been run randomly as shown by the sequential number. Furthermore,

our hypothetical laboratory checks its balances (using a control chart and NBS

standards at a high and a low weight), keeps a control chart on blanks and blind

reference materials, and checks the instrument calibration by inserting two

blind calibrants (high and low) prepared by the lab manager. Before data are

allowed to go out of the laboratory, the calibrant, reference materials and

blanks must be "in control". One advantage of the control charts is that one

can draw on a longer period of experience with the method to assess average and

variance of the blank, and also assess bias from the long term experience with

reference materials.

Here, error will be treated in a "modular" fashion, dealing with each aspect as a separate entity and putting the error statement together at the end.

This will require logical thinking, and a due consideration of all elements

which potentially contribute error to the final reported value. Also it should

be anticipated that some problems will arise which require an operational

decision. For example if our detection limit is expressed as an absolute

amount, ng Hg then in terms of sediment concentration, ng g-l, it will depend on

how much sediment we originally used.


TABLE A-7-1

RAW DATA AS LISTED IN LABORATORY DATA SHEETS

ANALYST: MIKE ROMOLE        DATE: JUNE 5/83        FILE #: JUN-5-83

SAMPLE MEASUREMENTS: SEQ # †, sediment weight w (g), peak height Y.
DERIVED QUANTITIES: Y - Y_B (Y_B = mean blank); Hg, ng = (Y - Y_B - a)/m;
blank-corrected Hg, ng g⁻¹ = (Y - Y_B - a)/(m w); AVERAGE, s and ν (s_p calculation).

SAMPLE   SEQ †   w       Y       Y - Y_B   Hg      Hg        AVERAGE   s      ν
#        #       (g)                       (ng)    (ng g⁻¹)

1         8      0.291   35.3    30.1      21.9    75.4      73.4      2.83   1
         36      0.263   31.2    26.0      18.8    71.4
2        28      0.375   44.2    39.0      28.8    76.9      70.9      8.49   1
         17      0.474   46.7    41.5      30.8    64.9
3        35      0.247   27.9    22.7      16.2    65.7      66.8      0.99   2
         18      0.222   26.3    21.1      15.0    67.5
         51      0.232   27.1    21.9      15.6    67.3
4        52      0.303   31.2    26.0      18.8    61.9      57.1      6.79   1
          2      0.365   31.6    26.4      19.1    52.3
5        26      0.632   44.1    38.9      28.7    45.5      51.8      8.90   1
         19      0.484   43.3    38.1      28.1    58.1
6        49      0.387   35.0    29.8      21.7    56.1      57.4      1.84   1
         53      0.329   31.9    26.7      19.3    58.7
7        48      0.223   17.1    11.9       7.9    35.3      37.8      3.54   1
         37      0.205   17.6    12.4       8.3    40.3
8        24      0.451   46.0    40.8      30.2    67.0      67.2      0.40   2
         33      0.458   46.6    41.4      30.7    67.0
          7      0.402   42.1    36.9      27.2    67.7
                                                                       Σν = 10

BCSS-1   54      0.567   91.4    86.2      65.3    115       Control limit X - 2s = 105
MESS-1   42      0.498   97.1    91.9      69.7    140       Control limit X - 2s = 130

BLANKS (2 DAYS, POOLED)

DATE      SEQ #   PK HT
5/6/83    4       6.5
5/6/83    1       3.8
5/6/83    6       4.8
4/6/83            6.0
4/6/83            5.4
4/6/83            3.7
4/6/83            6.3

† SEQ # gives the order in which determinations were run (italics).


TABLE A-7-1 CONTINUED

CALIBRATION SHEET        DATE: JUNE 5/83        FILE #: JUN-5-83

W, Hg (ng)   SEQ #                  Y, PEAK HEIGHT

 0.0         40, 34, 43, 32         0.0, 0.4, 0.0, 0.0
20.0         22, 14, 38, 50         27.0, 30.0, 24.1, 28.0
30.0         12, 27, 5, 46          41.5, 43.5, 44.4, 38.5
40.0         11, 16, 30             55.7, 56.9, 54.3
50.0         15, 23, 29             67.8, 63.5, 62.4
60.0         20, 13                 76.8, 77.1
70.0         3                      93.3

W1*          21, 10, 25, 39, 47     8.4, 8.8, 7.6, 8.0, 8.1          mean Y1, k = 8.2, 5
W2*          41, 31, 44, 9, 45      74.0, 70.1, 75.1, 74.3, 74.0     mean Y2, k = 73.5, 5

* W1, W2: check calibrants supplied by lab manager.

Analyses started at 0900, finished at 1100.


A.7.1 Error elements for the data reported in Table A-7-1.

A.7.1.1 Calibration

A linearity check need not be performed every time a calibration is run,

but it certainly should be checked initially as part of method validation, and

from time to time. A check on the data in Table A-7-1 gives

$F = 0.75 < F_{\alpha(1)=0.05,\,7,\,12} = 2.91$. Therefore linear regression is appropriate

within the amounts of Hg (ng) shown in the Table. A check confirms that no

samples were measured outside the linear calibration range.

Results of linear regression are (Section A.5.4), where Y = a + mW:

n = 21          s_c = 2.34
m = 1.295       s_m = 0.0249
a = 1.70        s_a = 0.935
r = 0.9965      mean W = 31.4 ng

Y = peak height units; W = mass of Hg, ng
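For readers who want to rerun the fit, the sketch below applies the standard least-squares expressions for slope, intercept and their standard errors to the calibration values as transcribed in Table A-7-1. It assumes these expressions match those of Section A.5.4; the names `CALIBRATION` and `fit_calibration` are illustrative. Because the tabulated peak heights are a transcription, the fitted coefficients may differ slightly from the values reported above.

```python
import math

# Calibration data as transcribed in Table A-7-1: Hg mass W (ng) -> peak heights Y.
CALIBRATION = {
    0.0: [0.0, 0.4, 0.0, 0.0],
    20.0: [27.0, 30.0, 24.1, 28.0],
    30.0: [41.5, 43.5, 44.4, 38.5],
    40.0: [55.7, 56.9, 54.3],
    50.0: [67.8, 63.5, 62.4],
    60.0: [76.8, 77.1],
    70.0: [93.3],
}


def fit_calibration(data):
    """Ordinary least-squares fit of Y = a + m*W with standard errors."""
    pairs = [(w, y) for w, ys in data.items() for y in ys]
    n = len(pairs)
    w_bar = sum(w for w, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((w - w_bar) ** 2 for w, _ in pairs)
    sxy = sum((w - w_bar) * (y - y_bar) for w, y in pairs)
    m = sxy / sxx                      # slope
    a = y_bar - m * w_bar              # intercept
    rss = sum((y - a - m * w) ** 2 for w, y in pairs)
    s_c = math.sqrt(rss / (n - 2))     # residual standard deviation
    s_m = s_c / math.sqrt(sxx)         # standard error of the slope
    s_a = s_c * math.sqrt(1 / n + w_bar ** 2 / sxx)  # standard error of the intercept
    return {"n": n, "m": m, "a": a, "s_c": s_c, "s_m": s_m, "s_a": s_a, "W_bar": w_bar}


fit = fit_calibration(CALIBRATION)
for name, value in fit.items():
    print(f"{name} = {value:.4g}")
```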

Given that the calibration is linear, and a linear regression has been

performed, the next task is to check the calibration against the blind

calibrants. Although control charts can and should be prepared to assess long

term performance, it is probably simpler and more internally consistent to

confirm that the blind calibrants lie within some arbitrary confidence limit

(95% for example) determined by the regression. If k replicate recorder

readings are made on each of the blind calibrants, the appropriate 95%

confidence interval is:

$$t s_Y = t_{0.05(2),\,n-2} \sqrt{ s_c^2 \left[ \frac{1}{k} + \frac{1}{n} \right] + (W_i - \bar{W})^2 s_m^2 } \qquad \text{(a-7-1)}$$

For blind calibrant 1, $W_1$ = 5.0 ng, k = 5:

$$t s_Y = 2.09 \sqrt{ 5.48 \left[ \frac{1}{5} + \frac{1}{21} \right] + (26.4)^2 \times 0.000620 } = 2.8$$

Therefore the average, $\bar{Y}$, of the 5 readings on $W_1$ must lie within the

95% confidence band:

8.2 ± 2.8 (peak height units)

Similarly for blind calibrant 2, $W_2$ = 55 ng, and the average of 5 readings on $W_2$

must lie within

72.9 ± 2.7 (peak height units)

The data on W1 and W2 in Table A-7-1 show that the calibration is acceptable,

and we have no reason to believe that the blind calibrants come from a different

population. The blind calibrants could now be included in a revised calibration,

but this will not be done here.
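The check on a blind calibrant can be sketched as follows, using the regression statistics reported in A.7.1.1 and the form of equation a-7-1. The function name `calibrant_half_width` is an assumption for illustration, and the t value is taken from the text rather than computed.

```python
import math


def calibrant_half_width(w_i, k, n, s_c, s_m, w_bar, t):
    """Half-width of the confidence band for the mean of k readings at mass w_i."""
    return t * math.sqrt(s_c ** 2 * (1 / k + 1 / n) + (w_i - w_bar) ** 2 * s_m ** 2)


# Regression statistics as reported in A.7.1.1.
a, m = 1.70, 1.295
s_c, s_m, n, w_bar = 2.34, 0.0249, 21, 31.4
t = 2.09  # two-tailed 95% value for n - 2 = 19 degrees of freedom, as used in the text

# Blind calibrant 2: 55 ng Hg, k = 5 readings, observed mean 73.5 peak height units.
predicted = a + m * 55.0
half_width = calibrant_half_width(55.0, 5, n, s_c, s_m, w_bar, t)
in_control = abs(73.5 - predicted) <= half_width
print(f"band: {predicted:.1f} +/- {half_width:.1f}, calibrant in band: {in_control}")
```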

A.7.1.2 Blanks

The three blanks run with this batch are within the blank control limits

so blanks will be pooled with those from the day before to give a better

estimate.

Therefore $\bar{Y}_B$, $s_B$, $n_B$ = 5.21, 1.15, 7 (peak height units)

(eqn. a-5-1)
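The pooled blank statistics can be reproduced from the seven blank peak heights listed in Table A-7-1. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Blank peak heights from Table A-7-1 (three from 5/6/83, four from 4/6/83).
blanks = [6.5, 3.8, 4.8, 6.0, 5.4, 3.7, 6.3]

blank_mean = statistics.mean(blanks)   # mean blank, peak height units
blank_sd = statistics.stdev(blanks)    # sample standard deviation (n - 1 divisor)
blank_n = len(blanks)

print(f"mean = {blank_mean:.2f}, s = {blank_sd:.2f}, n = {blank_n}")
```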

A.7.1.3 Limit of Detection

$$\text{LOD (IUPAC)} = \frac{t_{0.001(1),\,6} \times s_B}{m} \qquad \text{(eqn. a-5-12)}$$

or

$$\frac{5.21 \times 1.15}{1.3} = 4.6 \text{ ng Hg}$$

LOD (Prop. of Error), eqn. a-5-15:

$$2.71 \sqrt{2.91 + 0.91 + 0.005} = 5.3 \text{ ng Hg}$$

Limit of Quantitation is about $\tfrac{10}{3}$ LOD ≈ 18 ng Hg
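The arithmetic for the IUPAC detection limit and the limit of quantitation can be checked with a few lines. This is a sketch only: the t value is copied from the text rather than derived, and the function name `lod_iupac` is an assumption.

```python
def lod_iupac(t: float, s_blank: float, slope: float) -> float:
    """IUPAC-style detection limit: t * s_B / m, in the units of the calibrant (ng Hg)."""
    return t * s_blank / slope


t_one_tailed = 5.21   # t for alpha(1) = 0.001 and 6 degrees of freedom, as used in the text
s_blank = 1.15        # standard deviation of the pooled blanks, peak height units
slope = 1.295         # calibration slope m, peak height units per ng Hg

lod = lod_iupac(t_one_tailed, s_blank, slope)
loq = (10 / 3) * 5.3  # limit of quantitation from the propagation-of-error LOD (5.3 ng Hg)

print(f"LOD (IUPAC) = {lod:.1f} ng Hg, LOQ = {loq:.0f} ng Hg")
```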


A.7.1.4 Pooled Variance

We may use the replicate determinations (independent weight, digest,

recorder reading) to calculate a pooled variance.

$s_p$, $\nu$ = 4.73, 10 (ng g⁻¹)

(eqn. a-5-2)
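The pooled standard deviation can be recomputed from the per-sample standard deviations and degrees of freedom listed in Table A-7-1. The sketch assumes the usual pooled form, the square root of Σ(ν s²)/Σν; the function name `pooled_sd` is illustrative.

```python
import math

# (s, degrees of freedom) for samples 1 to 8, from Table A-7-1, in ng/g.
replicate_stats = [
    (2.83, 1), (8.49, 1), (0.99, 2), (6.79, 1),
    (8.90, 1), (1.84, 1), (3.54, 1), (0.40, 2),
]


def pooled_sd(stats):
    """Pool standard deviations weighted by their degrees of freedom."""
    total_dof = sum(dof for _, dof in stats)
    weighted = sum(dof * s ** 2 for s, dof in stats)
    return math.sqrt(weighted / total_dof), total_dof


s_p, dof = pooled_sd(replicate_stats)
print(f"s_p = {s_p:.2f} ng/g on {dof} degrees of freedom")
```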

A.7.2 Putting the error components together.

A.7.2.1 Random error

Now that we have calculated the likely size of error for each of the

above four components, we need to find some succinct way of putting them

together in a meaningful final error statement. In the treatments below, we

will use "t" to indicate Student's t and will mean by it $t_{\alpha(2)=0.05}$ (the 95%

confidence limit) for the appropriate degrees of freedom, usually n-1. It

should be noted that "t" is used as a correction or normalizer to convert s to

$\sigma$ (or $2\sigma$ etc.) when s is calculated from a small number of replicates. What we

will attempt to estimate is the random error one would observe if each final

measured or calculated value were arrived at independently. That is, each final

concentration reported had its own blank, calibration, digest and weighing.

Furthermore we will assume that underlying distributions are close to normal.

The first step in assessing error is to specify exactly how we arrived at
the final reported measurement. Here for each Hg concentration, $C_i$,

$$C_i\ (\text{ng g}^{-1}) = \frac{Y_i - \bar{Y}_B - a}{w_i\, m} \qquad \text{(a-7-2)}$$

Note that $Y_i$ and $w_i$ are independent for each sample but that $\bar{Y}_B$, a and m are
constant for all samples. For the completely independent measurement,

$$C_i = \frac{Y_i - Y_{B_i} - a_i}{w_i\, m_i} \qquad \text{(a-7-3)}$$
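As a check on how a reported concentration is built up, the sketch below evaluates the shared-blank, shared-calibration form of the calculation (the first of the two expressions above) for the first determination of sample 1 in Table A-7-1. The function name `hg_concentration` is an assumption for illustration.

```python
def hg_concentration(peak_height, weight_g, blank_mean, intercept, slope):
    """Blank- and intercept-corrected Hg concentration in ng/g dry weight."""
    hg_ng = (peak_height - blank_mean - intercept) / slope
    return hg_ng / weight_g


# Constants shared by every sample in the batch (A.7.1.1 and A.7.1.2).
BLANK_MEAN = 5.21   # peak height units
INTERCEPT = 1.70    # a, peak height units
SLOPE = 1.295       # m, peak height units per ng Hg

# Sample 1, SEQ # 8: w = 0.291 g, Y = 35.3.
c_sample1 = hg_concentration(35.3, 0.291, BLANK_MEAN, INTERCEPT, SLOPE)
print(f"C = {c_sample1:.1f} ng/g")
```

Small differences from the tabulated value arise because the worksheet rounds each intermediate column before the next step.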


If we measure separately each of the errors associated with the elements
in equation a-7-3, propagation of error theory (Appendix 4) will show us how
they should be combined to give the unbiased best estimate of error associated
with C. While tractable, it is rather messy to do this and some simplifying
assumptions may be possible at this point. For instance, control on our balance
tells us that $\sigma_w$ is very small, and we can ignore it without seriously hurting
our error estimate. Therefore we will use some arbitrary "errorless" weight, $\bar{w}$
(average or minimum for example), when estimating the error. Formula a-4-1
applied to a-7-3, setting $w_i = \bar{w}$, yields

$$\sigma_C^2 = \frac{1}{\bar{w}^2 m^2} \left[ \sigma_Y^2 + \sigma_{Y_B}^2 + \sigma_a^2 + (Y - Y_B - a)^2 \frac{\sigma_m^2}{m^2} \right] \qquad \text{(a-7-4)}$$

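Now by substituting ts for $\sigma$ we can estimate the combined error (keeping units
the same!).

$$\frac{\sigma_Y^2}{\bar{w}^2 m^2}$$

$$\frac{\sigma_{Y_B}^2}{\bar{w}^2 m^2} = \frac{(t s_B)^2}{\bar{w}^2 m^2} = \left[ \frac{2.45 \times 1.15}{0.352 \times 1.29} \right]^2 = 38.5\ (\text{ng g}^{-1})^2$$

$$\frac{\sigma_a^2}{\bar{w}^2 m^2} = \frac{(t s_a)^2}{\bar{w}^2 m^2} = \left[ \frac{2.09 \times 0.935}{0.352 \times 1.29} \right]^2 = 18.5\ (\text{ng g}^{-1})^2$$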
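The blank and intercept contributions can be rechecked numerically. The sketch below simply evaluates $(t s / \bar{w} m)^2$ with the values quoted in the text; the helper name `error_term` is an assumption, and the t values are those used in the text for 6 and 19 degrees of freedom.

```python
def error_term(t: float, s: float, mean_weight_g: float, slope: float) -> float:
    """Squared contribution (t*s / (w*m))**2 of one error element, in (ng/g)**2."""
    return (t * s / (mean_weight_g * slope)) ** 2


MEAN_WEIGHT = 0.352   # "errorless" mean sediment weight, g
SLOPE = 1.29          # calibration slope as rounded in the text

blank_term = error_term(2.45, 1.15, MEAN_WEIGHT, SLOPE)       # pooled blank, 6 d.f.
intercept_term = error_term(2.09, 0.935, MEAN_WEIGHT, SLOPE)  # intercept a, 19 d.f.

print(f"blank term = {blank_term:.1f}, intercept term = {intercept_term:.1f} (ng/g)^2")
```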


SAMPLE*   Hg conc, ng g⁻¹    n_p †
#         dry weight

1         73.4               2
2         70.9               2
3         66.8 (23)          3
4         57.1               2
5         51.8               2
6         57.4               2
7         37.8 (25)          2
8         67.2               3

† Number of determinations used to calculate average.

( ) LOD for measurements falling between LOQ and LOD.

* Samples, standards, blanks and reference material run randomly.

Calibration:   Y = a + mW        Y = peak height; W = Hg, ng
               n = 21, a = 1.70, m = 1.295, s_c = 2.34, s_m = 0.0249, s_a = 0.935
               Calibration range 0 - 70 ng Hg.
               Calibration checked (2 points).

Blanks:        Three blanks in control; pooled with previous day's blanks.
               mean Y_B, s_B, n_B = 5.21, 1.15, 7

LOD:           Calculated by method of propagation of error (a-5-15).
               LOD = 5.3 ng Hg
               SAMPLE SIZE RANGE 0.205 - 0.632 g
               ∴ LOD = 8 - 25 ng g⁻¹ (depending on sample)

POOLED VARIANCE, s_p:

Calculated from the replicate (n_p) determinations. Includes error of
digestion, determination and chart reading.

s_p = 4.73 ng g⁻¹ (based on 10 degrees of freedom - formula a-5-2)

ESTIMATED TOTAL ERROR:

a) Random

Random error was calculated according to the propagation of error theory,
using $t_{\alpha(2)=0.05,\,\nu}$ to normalize s.

$$t s_C = \frac{1}{\bar{w} m} \sqrt{ (t s_Y)^2 + (t s_{Y_B})^2 + (t s_a)^2 + (Y_{max} - \bar{Y}_B - a)^2 \frac{(t s_m)^2}{m^2} } = 13.4\ \text{ng g}^{-1}$$

where $s_Y \approx s_p \bar{w} m$

or ts_c = S.l ng g⁻¹

b) Bias

Bias was estimated from long term determinations of reference sediments
BCSS-1 and MESS-1. At these concentrations, our data appear to be consistently
low by about 5%, based on 128 determinations over a period of 1 year.

Additional Information:

These data are referenced as file # JUN-5-83. File contains control
charts for blanks, balances and reference material, and the work sheets (along
with sampling location, storage procedure and any physical manipulations done to
the samples prior to chemical digestion).
