GUIDELINES FOR OBTAINING, CALCULATING AND REPORTING QUALITY
STATEMENTS FOR CHEMICAL ANALYSIS OF MARINE SEDIMENTS
R. W. MACDONALD
TABLE OF CONTENTS
1. INTRODUCTION
2. REPORTING
2.1 What should be reported
2.2 How data should be reported
3. METHODOLOGY
3.1 Method documentation
3.1.1 Principles of a good procedure
3.2 Organizational requirements of RODAC
3.3 Validation of the method
3.3.1 Ruggedness testing
4. FACTORS CONTRIBUTING TO QUALITY
4.1 Reagents
4.2 Recovery and Interferences
4.2.1 Reference materials
4.2.2 Spiking environmental samples
4.4 Calibration
4.5 Blanks
4.6 Limit of detection (LOD)
4.7 Accuracy
4.7.1 Random error
4.7.2 Systematic error (bias)
4.7.3 Combining and reporting the two error statements
4.8 Quality assurance
4.8.1 Control charts
5. RECOMMENDATIONS FOR VERIFYING QUALITY OF DATA
6. REFERENCES
APPENDICES
APPENDIX 1 DEFINITION OF TERMS USED IN REPORTING OF DATA
APPENDIX 2 THE RUGGEDNESS TEST
APPENDIX 3 CERTIFIED REFERENCE SEDIMENTS
A.3.1 Metals
A.3.2 Chlorinated hydrocarbons
A.3.3 Reference sediments in preparation
A.3.4 Addresses
APPENDIX 4 PROPAGATION OF ERROR
APPENDIX 5 COMMON FORMULAS USED TO SUMMARIZE DATA AND MAKE QUALITY STATEMENTS
A.5.1 Sample standard deviation
A.5.2 Pooled standard deviation, sp
A.5.3 Rejection of outliers
A.5.4 Calibration using linear regression
A.5.4.1 Linearity
A.5.4.2 Linear regression
A.5.5 Blank Correction
A.5.6 Limit of detection (LOD)
A.5.6.1 IUPAC Definition
A.5.6.2 LOD by propagation of error
A.5.7 Symbols used in Appendix 5
APPENDIX 6 CONTROL CHARTS
A.6.1 The Shewhart control chart
A.6.2 The Youden control method
A.6.3 The range control chart
A.6.4 Summary
A.6.4.1 Chart preparation
A.6.4.2 Chart interpretation
deviations. However, the leap from these to confidence limits requires certain
assumptions which may be invalid (Kaiser, 1970) and should be approached with
caution. At the very least, the average, X̄, the standard deviation, s, and the
number of replicates, n, should be reported (Natrella, 1982) because how well X̄
estimates the population mean, μ, depends on s and n, and how well s estimates σ,
the population standard deviation, depends on n. Uncertainty statements should
be carefully formulated and supported so that there can be no confusion in what
is meant. Outliers which have been deleted from the data set should be
identified, and statistical or other reasons for their deletion should be
specified.
Data points which fall below the LOD (limit of detection) should be
reported as "not detected" followed by the LOD in brackets. Those points which
fall between the LOD and the quantitation limit (LOQ) should be reported as
numbers followed by the detection limit in brackets (ACS Committee, 1980).
The numbers should not be distorted by the reporting process. This
means that blank and recovery corrections, and calibration conversions should be
made clear. Give at least the formula and preferably a worked example. Rounding
errors should be avoided by rounding off only at the end of the calculations. It is
generally assumed that roundoff (correctly applied) implies ± 1/2 in the last
significant figure (i.e. 3.2 ± 0.05); however, this is a poor way to judge
confidence in data (ACS Committee, 1980). State the uncertainty to two
significant figures and the reported value to the last place in the uncertainty
statement (Ku, 1968). Terminology and units should be expressed according to
S.I. practice (see for instance the last pages in a recent January issue of
Analytical Chemistry). For data where neither imprecision nor systematic error
(bias) is negligible, Eisenhart (1968) recommends qualifying the results with a
statement placing bounds on systematic error with a separate statement of the
standard deviation or imprecision. The reported result should be stated to the
last place affected by the finer of the two qualifying error statements. Later,
in the section on accuracy (4.7), we discuss how the two error statements might
be combined.
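
The reporting rule attributed to Ku (1968) above can be sketched in a few lines of Python. This is only an illustration: the function names and the replicate values are hypothetical, and the sample standard deviation (n - 1 divisor) is assumed as the uncertainty being reported.

```python
import math
import statistics

def summarize(values):
    """Return (mean, sample standard deviation, n) for replicate results."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate s")
    return statistics.mean(values), statistics.stdev(values), n

def round_for_report(value, uncertainty):
    """Round the uncertainty to two significant figures and the value
    to the same decimal place."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty))
    decimals = 1 - exponent  # position of the second significant figure
    return round(value, decimals), round(uncertainty, decimals)

# Hypothetical replicate determinations (ug/g), for illustration only.
replicates = [24.81, 25.37, 24.56, 25.12, 25.94]
mean, s, n = summarize(replicates)
value, unc = round_for_report(mean, s)
print(f"{value} +/- {unc} (s, n = {n})")
```

Rounding is applied once, after the mean and standard deviation have been computed at full precision, which is the order of operations the text recommends.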
3. METHODOLOGY
The word "method" has been defined as a "set of written instructions
completely defining the procedures to be adopted by the analyst in order to
obtain the required analytical result" (Wilson, cited in Kirchmer, 1983). While
most analysts would concur with the definition, there is a further need for
clarity since there is a large variation in what is normally reported as a
method. Taylor (1983) has suggested the following hierarchy:
technique-method-procedure-protocol, where technique refers to the scientific
principle, and method is a distinct adaptation of technique. Only when we get to
procedure do we consider written directions necessary to use a method, and more
specifically a protocol is a set of definitive directions that must be followed
without exception. Generally when we refer to method as defined at the outset,
we mean a procedure or, more stringently, a protocol. In the following, we will
use Taylor's terminology for clarity.
3.1 Method documentation
The documentation should include sample pre-treatment, digestion and
instrumentation. All need to be specified accurately and completely so that
others can understand exactly how the determination was performed. New methods
should be reported fully with exhaustive testing (ACS Committee, 1980). It is
acceptable to cite a reference provided it is generally available and gives a
complete procedure for the method. Any modifications to a procedure should be
fully tested and reported, and the procedure should be updated when
modifications have been instituted. Little known methods are not recommended
since they force one to rely on the analyst, and generally do not have a good
basis for comparison. Uniformity of methods between laboratories can remove one
source of inter-laboratory variance; however, with properly validated methods,
this step is not essential to meet quality objectives. If the laboratory is
following acceptable quality assurance practice, it will already have at least
a procedure and preferably a laboratory protocol. Provision of a detailed
description of the procedure, therefore, need only be done once and kept on
file with updates when changes are introduced. This should not be an onerous
TABLE 1

Proposed performance characteristics of a suitable ocean dumping procedure for
total metal in sediments⁴.

METAL   LOD³        PRECISION      ACCURACY¹      CONCENTRATION²
        (μg g⁻¹)    s × 100 / X̄    s × 100 / X̄    X̄ (μg g⁻¹)

Hg      < 0.23      < 10%          15%            0.75
Cd      < 0.18      < 10%          15%            0.60
Cu      < 4         < 5%           10%            25
Pb      < 21        < 20%          25%            35
Zn      < 30        < 5%           15%            200

1. Modified standard deviation, calculated from the deviations (Xi - R) rather
   than (Xi - X̄), where R is the reference value.
2. X̄ is the concentration (dry weight basis) upon which precision, accuracy and
   LOD are based.
3. For Cu, Pb and Zn these were estimated as 3s at the concentration shown.
4. A suitable method for total metals should also have a recovery of metal in
   certified reference sediments of greater than 60%, and preferably greater
   than 90% (see section 4.2).
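
Footnote 3 of Table 1 can be checked numerically. The sketch below assumes that the standard deviation s is the precision limit multiplied by the listed concentration, and that the detection limit is three times that value; the dictionary and function names are illustrative only.

```python
# Table 1 targets: metal -> (relative precision limit, concentration in ug/g)
targets = {
    "Cu": (0.05, 25.0),
    "Pb": (0.20, 35.0),
    "Zn": (0.05, 200.0),
}

def lod_from_precision(rel_precision, concentration, k=3.0):
    """Detection limit taken as k standard deviations, where the standard
    deviation is the relative precision times the concentration."""
    s = rel_precision * concentration
    return k * s

lods = {metal: lod_from_precision(p, c) for metal, (p, c) in targets.items()}
for metal, lod in lods.items():
    print(f"{metal}: 3s = {lod:.2f} ug/g")
```

Under these assumptions the results (3.75, 21 and 30 μg g⁻¹) agree with the tabulated limits of < 4, < 21 and < 30.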
3.3 Validation of the method
This subject has recently been reviewed by Taylor (1983), and deserves
more emphasis than it is generally given. The goal of validation is to see if a
laboratory can use a specific method to produce results which conform to
pre-determined requirements. This implies that corporate goals are defined as
in Table 1. The literature may be reviewed for prospective methods which should
meet the pre-defined performance characteristics. Reported methods often lack
detail and it is generally unwise to accept claims for the performance at face
value. Therefore, a good laboratory will perform its own validation before
using the method routinely, and as an end product of validation it prepares a
procedure or, better still, a protocol as described above.
During validation, the performance of the method is examined, a good
design allowing estimates of precision and accuracy (as a function of concentration)
for both reference materials and real samples (Kirchmer, 1983). Typically,
three concentration levels should be examined; extremes and mid-range.
Information acquired during this process can lead to an estimate of the range of
application, sensitivity to interferences, purity of reagents and the detection
limits. The product of validation, therefore, is a list of performance
characteristics. Through validation, the laboratory demonstrates its capability
of performing a method, and generates sufficient information to prepare the
protocol. The performance expectations so determined should not be used for
quality estimates of later data; rather they are a yardstick for comparison. If
there is a limited number of sample types for which the method does not work,
this does not mean the method need be abandoned (Evans, 1978). Rather these
sample types should be identified, and use of an alternate procedure designated.
This is the value of documenting "accumulated experience" with a method.
3.3.1 Ruggedness testing
No validation procedure would be complete without a ruggedness test.
This can be performed with remarkably little effort, and provides evidence of
sensitivity of a procedure to small changes. An outline of how to perform a
ruggedness test is given in Appendix 2. Procedures which are rugged are
desirable since their application by different analysts at different times is
likely to produce more consistent results. If the outcome of a procedure is
found to be very sensitive to a particular variable (e.g. temperature, humidity,
time), then alternate steps should be considered, or that variable must be very
tightly controlled within specifications outlined in the protocol.
4. FACTORS CONTRIBUTING TO QUALITY
4.1 Reagents
The required purity of reagents varies with the determination, and needs
specific evaluation during method validation. The performance will be
controlled by purity (and variability of purity) since it affects precision,
bias and LOD for the procedure. As a rule, for analytical determination of
components of sediments, analytical reagent grade will be required as a minimum.
For some metals, spectra-grade or ultra-pure chemicals will be required.
Certainly, once established, no chemicals of lesser purity than specified in the
protocol should be used. The reagent background or blank of each chemical
component should be determined prior to use (Booth, 1979). Variations in purity
occur from source to source and even from lot to lot and therefore new bottles
should be evaluated before use. In addition to the individual reagent blanks, a
method blank is required to evaluate the combination of all chemicals used in
sample digestion and preparation, and also to estimate the LOD.
Chemicals used in the preparation of standards or calibrants require
special attention since they will be factors in both precision and accuracy of
the reported results. It is particularly important to evaluate the storage
procedure (bottle type, time, light, temperature). Restandardization and
chemical preparation should be carried out as often as required and should be
specified in the protocol. Standards can be prepared or bought commercially,
but should be independently checked and intercalibrated with the old standards
to provide continuity.
4.2 Recovery and Interferences
Methods which achieve high recoveries are desirable since they do not
require a large bias correction, and therefore are easier to check with
reference materials. Furthermore, they have inherently better relative
precision. For example, compare two different methods with routine recoveries of
85-95% and 25-35% respectively. The range in recovery as a percent is
apparently the same for both methods; however, the variance in recovery leads to
a coefficient of variation of ± 6% for the first method, and ± 17% for the
second. In practice methods with low recoveries tend to have wider ranges in
recovery which worsens this situation. For these reasons, the ACS Committee
(1980) recommends that methods with recoveries of less than 60% not be used.
The recommendation to avoid methods which have low recovery should in no way be
misconstrued to mean that selective or partial extraction schemes are to be
avoided. What I mean here is low recovery of the determinand of interest, which
could be a small portion of the total, for example the "weakly bound" component.
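
The comparison of the two recovery ranges can be reproduced with a short calculation. The text does not say how the quoted percentages were derived; the sketch below assumes each is the half-range of the recovery interval expressed as a percentage of its midpoint, which gives the same figures.

```python
def relative_spread(low, high):
    """Half-range of a recovery interval as a percentage of its midpoint."""
    midpoint = (low + high) / 2
    half_range = (high - low) / 2
    return 100 * half_range / midpoint

for low, high in [(85, 95), (25, 35)]:
    spread = relative_spread(low, high)
    print(f"{low}-{high}% recovery: +/- {spread:.0f}% relative")
```

The same absolute range of ten percentage points is about three times larger in relative terms for the low-recovery method.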
The determination of recovery (and its repeatability and
reproducibility) is an important validation step which should be addressed with
reference materials, and by using referee methods such as those known to give
total extraction (HF for total metals in sediments for example). Spiking can be
used to determine the efficiency of various chemical or physical extraction
steps but will give no information of the extraction of the determinand embedded
in a complex matrix or in a form which differs from the spike (see section
4.2.2).
4.2.1 Reference materials
Certified reference materials (Appendix 3) can be obtained from a number
of sources and are a keystone in validating a method and checking its
performance. Reference materials are intended to behave like environmental
samples, and are in fact environmental samples of careful determination and
known composition. At present, the selection for metals, hydrocarbons and
chlorinated hydrocarbons in reference marine sediments is limited to a very few,
so not all matrix types or metal concentrations are represented. For
environmental material which is similar in composition and trace component
concentration, reference materials are the method of choice to check for
recovery and interference problems.
The limited selection of reference materials can be augmented by
preparing in-house uncertified sediments. This could be particularly important
for exceptional material (mine tailings, woody waste) which present reference
materials do not well represent. The uncertified reference should be well
mixed, properly stored to avoid deterioration, and fine grained (< 62 μm) so
that representative samples can be easily removed. This material should be
analyzed by referee methods if possible as a further check, or used in
inter-calibrations with other laboratories performing similar analyses.
4.2.2 Spiking environmental samples
Spiking can be carried out by adding a known amount of analyte to a
portion of sample for which there is already a determination and estimating the
difference between actual recovery and theoretical recovery. This test is not
very powerful in a statistical sense (Kirchmer, 1983) and even without
interference, considerable differences from 100% are to be expected. An
alternative approach is the well known standard additions method (SAM), the uses
and limitations of which are discussed by Klein and Hach (1977).
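
As an illustration of the spiking calculation described above, spike recovery can be expressed as the measured increase divided by the amount added. The function name and the numbers are hypothetical.

```python
def spike_recovery(unspiked, spiked, spike_added):
    """Percent recovery of a spike: measured increase over amount added.

    All three arguments must be in the same concentration units.
    """
    if spike_added <= 0:
        raise ValueError("spike_added must be positive")
    return 100.0 * (spiked - unspiked) / spike_added

# Hypothetical example: a sample at 20 ug/g spiked with 10 ug/g reads 29 ug/g.
recovery = spike_recovery(unspiked=20.0, spiked=29.0, spike_added=10.0)
print(f"spike recovery = {recovery:.0f}%")
```

Because the result is a difference of two determinations, it carries the random error of both, which is one reason the test has limited statistical power.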
Standard additions and spiking are fraught with problems in
interpretation and should be approached with great caution. Spiking with a
soluble form of the analyte will indicate nothing of the effectiveness in
leaching the analyte from a solid matrix. The assumption in spiking experiments
is that the analyte added has equilibrated with the natural form, or is subject
to the same recovery and interference (Holden et al., 1983; Corsini et al., 1982;
ACS Committee, 1983). This assumption is often unwarranted and Corsini et al.
(1982) point out that even complete recovery of the analyte spike is not
evidence that the analytical result is correct. Suitable combinations of
thermodynamic and kinetic factors are required before SAM is reliable. However
recovery tests by spiking can provide a certain degree of information; for
example non-linear SAM curves, or disagreement between SAM and normal
calibration procedures indicate a problem. Furthermore, spiking of blanks
(calibration) which are run through the entire analytical sequence can give
information on interferences arising from the reagents. These kinds of
interferences or recovery problems can and should be eliminated before the
method is applied to environmental samples. If results for spikes on a sample
are poor, the matrix is likely to present even greater problems (Holden et al.,
1983). Spiking of blanks can also be used in the estimation of the LOD of the
method (Glaser et al., 1981).
4.4 Calibration
Calibration is the process of relating an instrumental response to a
corresponding mass or concentration of a particular substance. Taylor (1983)
notes that there are two kinds of calibrations; physical calibration for the
measurement equipment and ancillary measurements (time, volume, mass), and
chemical calibration. Due care should be paid to physical calibrations so that
they do not contribute significantly to error. Of all the ancillary equipment
the analytical balance is probably the most important and most neglected. The
balance plays a central role in analysis, generally used to prepare standards
and to calibrate or check other laboratory procedures, micro-pipetting for
example. Balances should be checked for calibration regularly with a set of
standard weights, and a control chart maintained with the balance to ensure
prompt correction if a problem arises.
Uncertainty in the calibration, particularly biased calibration curves,
is a leading contributor to inter-laboratory variance. For this reason it is
essential to perform the calibration with high quality calibrants which have
been verified. Calibrants should not be stored longer than recommended by
manufacturers and detail concerning their preparation and use should be
specified in the protocol. New calibrants should be inter-calibrated with the
old ones, when possible, to ensure continuity.
To prevent bias, calibrants should match as closely as possible blanks
and samples, and they should be analyzed by the identical procedure (Kirchmer,
1983). Different procedures are acceptable only where there is experimental
evidence that the results differ negligibly. The procedure for calibrating
should be specified in the laboratory protocol (ACS Committee, 1980).
The ideal arrangement is to have a linear calibration but this should
initially be verified with the linearity test as shown in section A.5.4.1. The
ACS Committee (1980) recommends that 5 concentrations analyzed in triplicate
should be used. The concentrations should span the measurement range for
samples. With experience of the instrument response (sensitivity, calibration
blank) these requirements could be dropped to 3 different concentrations
analyzed in triplicate. Check calibrants should be prepared independently to
assess precision and bias of the calibration and further establish linearity.
We recommend control standards at a high concentration to check slope bias and
at a low concentration to check blank bias (see appendix). Calibration data can
and should be summarized by linear regression equations (Kaiser, 1970).
Appropriate formulas and reporting instructions are given in A.5.4. Calibration
near zero concentration will also affect the estimate of LOD and so needs to be
performed with care.
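
A minimal sketch of summarizing calibration data by linear regression is given here. It uses ordinary least squares on five concentrations analyzed in triplicate, as recommended above. The data are hypothetical, and the function is not necessarily identical to the formulas of A.5.4.

```python
import math

def fit_calibration(conc, response):
    """Ordinary least-squares line: response = intercept + slope * conc.

    Returns (slope, intercept, residual standard deviation).
    """
    n = len(conc)
    if n != len(response) or n < 3:
        raise ValueError("need at least three paired points")
    c_mean = sum(conc) / n
    r_mean = sum(response) / n
    sxx = sum((c - c_mean) ** 2 for c in conc)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    sxy = sum((c - c_mean) * (r - r_mean) for c, r in zip(conc, response))
    slope = sxy / sxx
    intercept = r_mean - slope * c_mean
    ss_res = sum((r - (intercept + slope * c)) ** 2
                 for c, r in zip(conc, response))
    s_res = math.sqrt(ss_res / (n - 2))  # scatter about the fitted line
    return slope, intercept, s_res

# Hypothetical calibration: 5 concentrations (ug/g), each in triplicate.
levels = [0.0, 10.0, 20.0, 40.0, 80.0]
readings = [
    [0.004, 0.006, 0.005],
    [0.052, 0.055, 0.050],
    [0.101, 0.104, 0.099],
    [0.203, 0.198, 0.206],
    [0.401, 0.396, 0.405],
]
conc = [c for c, reps in zip(levels, readings) for _ in reps]
response = [r for reps in readings for r in reps]
slope, intercept, s_res = fit_calibration(conc, response)
print(f"slope = {slope:.5f}, intercept = {intercept:.5f}, s_res = {s_res:.5f}")
```

The intercept estimates the calibration blank, and the residual standard deviation gives a first indication of how well a straight line describes the data.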
Memory effects should be identified by presenting calibrants randomly to
the instrument, interspersed with samples and blanks. Of prime importance is
the establishment of the stability of the instrument, so that frequency of
calibration can be adjusted to attain the required accuracy given the stability
or drift rate. This can best be achieved by inserting standards regularly and
using a control chart.
The standard addition method, SAM, can provide useful information about
the calibration and should be used from time to time, particularly during method
validation, and when problem sediments are suspected. Klein and Hach (1977)
have developed a standard additions decision tree which may be used to guide
interpretation of SAM results. We re-emphasize that SAM should be used with
caution. Normally, SAM is performed by preparing a four point calibration
curve. In no case should a one point standard addition be used as a substitute
for a calibration curve.
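
The arithmetic of a multi-point standard addition can be sketched as follows. The sketch assumes a linear response and estimates the concentration in the unspiked sample as the intercept divided by the slope of the fitted line; the function name, the `min_points` parameter and the data are hypothetical.

```python
def standard_additions(added, response, min_points=4):
    """Estimate the analyte concentration in the unspiked sample.

    Fits response = intercept + slope * added by least squares and
    returns intercept / slope (the magnitude of the x-axis intercept).
    """
    n = len(added)
    if n != len(response) or n < min_points:
        raise ValueError(f"need at least {min_points} paired points")
    x_mean = sum(added) / n
    y_mean = sum(response) / n
    sxx = sum((x - x_mean) ** 2 for x in added)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(added, response))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept / slope

# Hypothetical four-point addition (added ug/g, instrument response).
added = [0.0, 5.0, 10.0, 15.0]
response = [4.0, 6.0, 8.0, 10.0]
print(f"estimated concentration = {standard_additions(added, response):.1f} ug/g")
```

Rejecting fewer than four points by default reflects the warning that a one point addition cannot substitute for a calibration curve.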
4.5 Blanks
A blank is a response which occurs in the end measurement in the absence
of analyte derived from a sample. Several types of blanks are possible;
instrumental, calibration, reagent, and method (field). Generally the magnitude
and variation tend to increase from the instrument blank to the method blank.
The intent and utility of each blank varies and therefore clarity in reporting
how a blank was determined is essential. The validity of a blank depends on how
well it represents the measurement process to which it applies. To prevent
bias, blanks should always be handled randomly and in an identical fashion to
samples. Use of special chemicals, containers or facilities for blanks will
invalidate them.
The instrumental blank may be determined by measuring the response when
the instrument is operated normally but no sample is presented to it. This
blank should be monitored since the overall blank will never be less, and
because it can give valuable information on instrumental deterioration or memory
effects (a pertinent example is the deterioration of instrumental blank as
graphite tubes reach the end of their lifetime). The calibration blank will
depend on the chemicals used to make up the standards and should be monitored
since it is a potential source of calibration bias. The reagent blank should be
determined individually for each of the chemicals used in the procedure by
introducing them to the same detection system (diluted or made up in appropriate
quantities of solvent). This process can be used to accept or reject new lots
of chemicals before they are brought into service, or to indicate when
purification steps must be taken.
The method blank is the most important and both its magnitude and
variance need to be known, the latter being used for the estimation of the
method LOD and overall error in a blank corrected result. This blank is found,
in principle, by analysis of a representative sediment which contains no
analyte. Ideally, this would be run as a field blank wherein the "analyte free"
sediment would be packaged and stored in the field identically to samples.
Unfortunately, this ideal does not presently appear to be feasible for marine
sediments and therefore a "combined" laboratory reagent blank must substitute.
(With careful sampling and storage, normal marine sediments should not be
subject to a significant contamination and therefore absence of a field blank
should not invalidate results.) To determine the combined reagent blank,
therefore, digestions are carried out exactly as described in the procedure in
the absence of a sample.
It is statistically advantageous to have a blank to sample ratio of 1:1,
but this is seldom done in practice due to cost. The number of blanks required
depends on several factors including the number of analyses, the accuracy or
precision required, and experience with blanks from previous determinations. It
has been suggested that one blank should be run per batch (Kirchmer, 1983) or
that the blank to sample ratio should be 1:9 (Booth, 1979). Until blank
characteristics have been well established it would be wise to run them
frequently. For the purposes of control (blank variance and magnitude) no fewer
than 2 per batch are necessary.
Performance of a blank correction can contribute both to random error
and bias. To minimize these, low and constant blanks are desirable. The blank
response on the recorder should not be subtracted from the sample response
directly unless the linearity of the calibration is well established.
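
The contribution of a blank correction to random error can be illustrated with a simple propagation-of-error sketch. It assumes that sample and blank have both been converted to concentration units and that their random errors are independent, so that variances add. The function name and values are hypothetical.

```python
import math

def blank_corrected(sample_conc, s_sample, blank_conc, s_blank):
    """Subtract the blank and combine the two standard deviations.

    Assumes independent random errors, so the variances add.
    """
    corrected = sample_conc - blank_conc
    s_corrected = math.hypot(s_sample, s_blank)  # sqrt(s_sample^2 + s_blank^2)
    return corrected, s_corrected

# Hypothetical values in ug/g.
value, s_value = blank_corrected(10.0, 0.3, 1.0, 0.4)
print(f"blank-corrected result = {value:.1f}, s = {s_value:.1f}")
```

In this example the variable blank, not the sample measurement, dominates the uncertainty of the corrected result, which is why low and constant blanks are desirable.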
4.6 Limit of Detection (LOD)
The limit of detection (LOD) has been a subject of confusion for two
reasons: many definitions have been used and often a given definition can be
applied in more than one way. Different statistical approaches result in an
order of magnitude variation in the estimate of the LOD, and recently much
thought has gone into developing a rational basis for determining it (Long and
Winefordner, 1983; Kirchmer, 1983; ACS Committee 1980; Currie, 1968, and Porter,
1983). Consensus is being reached in the literature and we summarize here two
acceptable approaches. Detailed formulas and worked examples are provided for
the IUPAC method, and the "propagation of error" method in A.5.6.
Detection limit is frequently defined as twice background noise. This
definition evolves from electronics and relates specifically to the response of
an instrument. While simple to state, application of this definition is
problematic; the relationship between frequency of signal and dominant frequency
of noise plays a role but is seldom discussed. Furthermore it is universally
recognized that such "instrumental" detection limits contribute to but do not
encompass the broader concept of a method detection limit which includes
additional procedure uncertainties (see Glaser et al., 1981).
The IUPAC definition of LOD (Appendix 1) brings in the concept of
"reasonable certainty" in separating an analytical signal from noise. To do
this, IUPAC recommends using the random error associated with the blank signal, σB,
as the noise. Reasonable certainty is assured by demanding that a signal be at
least kσB above the average blank before concluding that analyte has been
detected. In the past, k=2 has often been chosen but this is now discouraged in
favour of k=3; for a normal distribution only 1 in 1000 blanks would give so
large a signal, and for any distribution, Tschebyscheff's inequality assures
that no more than 1 in 9 blanks would exceed this criterion. The real world
lies somewhere between these two limits.
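
The two limits quoted for k = 3 can be evaluated directly. The sketch below computes the one-sided upper tail of a normal distribution and the Tschebyscheff bound of 1/k²; the function names are illustrative.

```python
import math

def normal_upper_tail(k):
    """Probability that a normal variate exceeds its mean by k sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def chebyshev_bound(k):
    """Tschebyscheff bound on deviating from the mean by k sigma or more."""
    return 1.0 / k ** 2

for k in (2, 3):
    print(f"k = {k}: normal tail = {normal_upper_tail(k):.5f}, "
          f"distribution-free bound = {chebyshev_bound(k):.3f}")
```

For k = 3 the normal tail is about 0.0013, the same order as the 1 in 1000 quoted above, while the distribution-free bound is 1 in 9.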
To calculate LOD, σB is estimated by the blank standard deviation, sB, or the
standard deviation observed as one approaches the LOD from above. If this
estimate is based on a very limited number of blanks, sB may seriously
underestimate σB. To correct for this one can substitute the appropriate "t"
value, which converges to 3 as n gets large, as shown in equation 1;

\[ \mathrm{LOD} = \frac{3\, s_B}{m} \quad (n\ \text{large}), \qquad \mathrm{LOD} = \frac{t\, s_B}{m} \quad (n\ \text{small}) \tag{1} \]

where m, the slope of the calibration curve (analytical sensitivity), converts
from recorder signal units to concentration units. Experience in a variety of
situations has shown that the "3σB" method of estimating LOD is in agreement
with estimates made by inspection (Kaiser, 1970). The "method" detection limit
can therefore be defined as;

\[ \mathrm{LOD} = 3 \lim_{[c] \to \mathrm{LOD}} \sigma_c \tag{2} \]

For this calculation, σB is chosen as a reasonable estimate for σLOD.
Kirchmer (1983) estimates the random error at the LOD to be about 66% of the LOD
at the 95% confidence level. If blanks are so small that they do not give an
apparent signal above instrumental noise, it is difficult to apply the above
concept. The ACS Committee (1980) recommends that the LOD may then be based on
the peak to peak noise measured on the baseline close to the actual or expected
analyte peak. A more thorough approach, consistent with the definition in
equation 2, would be to spike blanks to give analyte concentrations which lie
slightly above the LOD but still not in the region of quantitation as discussed
below (see Glaser et al., 1981).
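
The calculation in equation 1 can be sketched as follows. The multiplier k defaults to 3; for a small number of blanks the appropriate "t" value would be supplied by the user from tables. The function name, blank signals and slope are hypothetical.

```python
import statistics

def lod_from_blanks(blank_signals, slope, k=3.0):
    """LOD in concentration units: k * s_B / m.

    blank_signals: replicate blank responses in signal units
    slope: calibration slope m (signal units per concentration unit)
    k: 3 for large n, or a tabulated t value for small n
    """
    if len(blank_signals) < 2:
        raise ValueError("at least two blanks are needed to estimate s_B")
    if slope <= 0:
        raise ValueError("slope must be positive")
    s_b = statistics.stdev(blank_signals)
    return k * s_b / slope

# Hypothetical blank responses and calibration slope.
blanks = [0.011, 0.013, 0.009, 0.012, 0.010, 0.011, 0.014, 0.008]
print(f"LOD = {lod_from_blanks(blanks, slope=0.005):.2f} ug/g")
```

Reporting the number of blanks alongside the result lets a reader judge how well sB estimates σB.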
Long and Winefordner (1983) have criticised the IUPAC definition of LOD
because it does not include error associated with the calibration and therefore
underestimates the true LOD. Based on the propagation of error, they give
formulas which allow the calibration to be factored into the LOD calculation,
and these are developed in A.5.6.2.
There are three regions of chemical analysis (Figure 1); unreliable
detection, detection (qualitative analysis) and determination (quantitative
analysis).

Figure 1. Regions of analyte measurement. From 0 to the LOD (3σB): unreliable
detection, reported as "not detected (LOD)". From the LOD (3σB) to the LOQ
(10σB): region of detection, reported as number (LOD). Above the LOQ (10σB):
region of quantitation, reported as number plus error estimates.

Figure 1 shows that care is needed in expressing LOD or LOQ, since one can
perform a blank correction as shown at the bottom. The definition of LOD
implies that it is the blank corrected value, "3σB", which should be expressed.
Near the detection limit the substance can only be detected (present,
absent) with a certain degree of confidence but not reliably quantified. A
second term, the limit of quantitation (LOQ) has therefore been introduced to
designate the point at which quantitative analysis can begin. The ACS Committee
(1980) recommends that LOQ be calculated analogously to LOD but with k set equal
to 10. The choice of 10 is a minimum and really depends on having a well
defined blank (Currie, 1968). Blanks, and their variance, sensitivity and
calibrations change from data batch to data batch and are intrinsic properties
of a data set. Therefore LOD and LOQ vary also; good control over blanks and
calibrations is essential to keep LOD in control and this should be continually
evaluated.
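
The three regions of measurement and the corresponding reporting conventions can be sketched as a small formatting routine. It assumes blank-corrected concentrations, with LOD and LOQ taken as 3 and 10 blank standard deviations; the function name and values are hypothetical.

```python
def report_result(conc, lod, loq, error=None):
    """Format one blank-corrected result according to its measurement region."""
    if not 0 < lod <= loq:
        raise ValueError("expected 0 < lod <= loq")
    if conc < lod:
        # Region of unreliable detection
        return f"not detected ({lod:g})"
    if conc < loq:
        # Region of detection: number followed by the LOD in brackets
        return f"{conc:g} ({lod:g})"
    if error is None:
        raise ValueError("results above the LOQ need an error estimate")
    # Region of quantitation: number plus error estimate
    return f"{conc:g} +/- {error:g}"

s_blank = 0.02        # hypothetical blank standard deviation, ug/g
lod = 3 * s_blank     # k = 3
loq = 10 * s_blank    # k = 10 (ACS Committee, 1980)
for c in (0.03, 0.11, 0.45):
    print(report_result(c, lod, loq, error=0.04))
```

Since LOD and LOQ vary from batch to batch, they would be recomputed for each data set rather than fixed once.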
Detection and quantitation limits are part of the performance
characteristics of a method and should be reported with the data including how
they were determined, and the number of blanks used to estimate them. The LOD
and LOQ are useful indicators of how the method will perform for single
determinations at various concentrations. By replicating samples and taking
averages, one can effectively reduce the LOD and LOQ by decreasing random error
(s_X̄ = s_X/√n). Therefore these performance characteristics should not be viewed as
a replacement for a valid error statement which is based on replicate
determinations.
4.7 Accuracy
Although a fundamental property of data, and the most important quality
statement, there is still no agreement on the definition of accuracy or how best
to express it. Therefore it is essential to illustrate clearly how one has
arrived at a final error statement. Here I consider accuracy to be an
expression about how close each determination is expected to be to the "true"
value. With this definition accuracy comprises a random error (imprecision) and
a systematic error (bias). The need for clarity in accuracy statements becomes
all the more important since systematic error can behave like a random error and
vice-versa depending on exact circumstance. For example, a single calibration
curve which is in error (slope or intercept) would cause a bias in all data
where that curve was used. However, over a number of days with many
independent calibrations it would begin to look like a random error. Of course
if the primary calibrant used to prepare all calibration curves was "off" this
would contribute a bias in addition to the daily fluctuations. Thus it is that
bias between two laboratories becomes treatable as a random error if a large
population of laboratories is sampled, and would be called inter-laboratory
variance.
It is unreasonable to require that every measurement be made in a
completely independent manner (i.e. have its own calibration curve, blank, etc.)
and therefore we must estimate error for these steps and ensure that they
contribute to the final error statement. Figure 2 illustrates how errors can be
grouped into three categories; genuine bias, random bias, and random error. The
"random bias" will be treated here as if it were part of the random error
component, but it needs special consideration because while it contributes error
to the final result, it is often omitted from the error statement due to the
manner in which calculations are made.

Figure 2. Random and systematic components of laboratory error. Bias sources
shown: calibrant, blank, poor method, contamination, analyst, combining to the
total bias of a laboratory or analyst. Over many laboratories and many analysts
this becomes random bias. Random error sources shown: sample preparation,
calibration, blank correction, reading, calculation. Time scales shown: short
term (< 1 day, single event) and long term (many events). Accuracy = Bias ±
(Random Bias ± Random).

4.7.1 Random error
Precision is defined as the degree of agreement between individual
measurements when using a method. It is commonly represented by standard
deviation or some other measure of spread which is really an estimate of
imprecision. Great care is needed in giving the precision statement. As noted
earlier, not only is the standard deviation, s, required, but also the number of
replicates, n, used to determine s.
In addition to s and n, the model and formula used to estimate s must
also be stated, as will now be shown. Variation in data generated on a single
sample by a single analyst or instrument over a short period of time is one
possible model which could be used to estimate a valid standard deviation for
the "repeatability" of the method. A second model would estimate the variation
observed over an extended period of time including different analysts in
different laboratories (reproducibility). The variance associated with the
reproducibility will be greater than or equal to that associated with the
repeatability. Furthermore, determinations which have been performed in
different laboratories would be expected to contain inter- and intra-laboratory
components of variance. The skill in providing a valid precision estimate lies
in choosing the correct model for the circumstance. In reporting, it is
essential that the model be clarified by a statement supporting the precision
estimate; a statement such as A± a is, by itself, meaningless.
Sources of error in an analytical measurement are listed below in
approximate order of increasing significance (Anonymous, 1976): electronic
noise, reading errors, analytical variation (depends on method), sample
preparation noise, inhomogeneity of samples and poor sampling, sample handling
and preservation. It is important that steps be taken to control and measure
the total analytical variability since too often, sample inhomogeneity is used
as an excuse for poor reproducibility in analysis. Errors which are calculated
individually for each of the above components can be combined according to the
rules of propagation of error as outlined in Appendix 4.
Imprecision will vary over the range of measurement. Near the LOD the
standard deviation will be about the same magnitude as the measurement. As one
reaches and passes the LOQ the standard deviation will be such that the
coefficient of variation (CV) is about 10% or less. One might expect the CV
then to remain constant up to some concentration where the calibration curve
becomes non-linear and the instrument saturates. Over the linear range, one
expects an empirical relationship between standard deviation and concentration,
C, such as:
$$ s = a + bC \qquad (3) $$
$$ \mathrm{CV} = \frac{s}{C} = \frac{a}{C} + b \qquad (4) $$
Therefore at low concentrations, s ≈ a, while at higher concentrations the
coefficient of variation is approximately constant and equal to b. While more
complex formulations are possible (higher powers of C are used) it is doubtful
that data exist which warrant their usage. The main point is that a single
standard deviation may not suffice, and at the least, s should be evaluated at
the upper and lower end of the concentration range of interest. This requires
judgement based on experience with the method.
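As a minimal sketch of equations 3 and 4, assume s has been evaluated at only two concentrations, the lower and upper ends of the range of interest. The constants a and b can then be estimated and the CV predicted at intermediate concentrations. The function names and all numerical values below are hypothetical and serve only to illustrate the arithmetic.

```python
def fit_two_point(c_low, s_low, c_high, s_high):
    """Solve s = a + b*C exactly through two (C, s) points."""
    b = (s_high - s_low) / (c_high - c_low)
    a = s_low - b * c_low
    return a, b


def predicted_cv(c, a, b):
    """Coefficient of variation, CV = s/C = a/C + b."""
    return a / c + b


# Hypothetical values: s = 0.05 at C = 0.2 and s = 0.52 at C = 10 (same units)
a, b = fit_two_point(0.2, 0.05, 10.0, 0.52)
for c in (0.2, 1.0, 5.0, 10.0):
    s = a + b * c
    print(f"C = {c:5.1f}  s = {s:.3f}  CV = {100 * predicted_cv(c, a, b):.1f}%")
```

With more than two concentrations a least-squares fit would be preferable; the two-point form simply mirrors the minimum evaluation suggested above.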
There are a number of approaches to measuring precision, some of which
are listed below.
1. Repeat determinations of a single sediment extract;
2. a single sediment sample (well mixed) is split and sub-samples are
independently run through the complete analytical procedure; and,
3. a number of different sediment samples (each well mixed) are split and
replicate sub-samples are independently run through the complete analytical
procedure.
The first method would evaluate the measurement system noise but would not
detect memory effects (unless blanks were interspersed with samples) or sample
digestion variance. It would, therefore, be an unsuitable way to determine
variance pertaining to a group of varied sediments. The second method would be
an improved way to estimate the overall analytical precision, but only for a
particular sample. As noted, two or more samples investigated in this fashion
could help determine standard deviation as a function of analyte concentration
or even matrix. Method 3 is probably best for sets of varied sediment samples,
but duplicates should certainly not be run consecutively, and it is better if
the analyst does not know which samples are duplicates. A pooled variance
(A.5.2) can be calculated from the results of replicates (assuming variances are
relatively uniform). As noted above, variance is likely to be a function of
analyte concentration, and possibly of other aspects such as matrix. In cases
where data are obtained over a wide range of substrates or concentrations, the
error treatment might best be handled by dividing the data into smaller sub-sets
where variance is more or less homogeneous, for example a high concentration
data set (polluted) and a low concentration data set (background).
Alternatively, variance stabilizing operations can be used such as the
log-transformation, but it is likely that some judgement will be required by the
analyst.
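The pooled estimate mentioned above can be sketched as follows, using the usual definition in which each group's variance is weighted by its degrees of freedom. The duplicate results are hypothetical, and the sketch assumes the variances are reasonably uniform across the samples.

```python
import math
import statistics


def pooled_sd(groups):
    """Pooled standard deviation: sqrt(sum((n_i - 1) * s_i^2) / sum(n_i - 1)).

    Returns the pooled standard deviation and its degrees of freedom.
    """
    weighted = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    dof = sum(len(g) - 1 for g in groups)
    return math.sqrt(weighted / dof), dof


# Hypothetical blind duplicates on three different sediment samples
duplicates = [[1.0, 1.2], [2.0, 2.2], [3.1, 2.9]]
sp, dof = pooled_sd(duplicates)
print(f"pooled s = {sp:.3f} with {dof} degrees of freedom")
```

Reporting the degrees of freedom alongside the pooled value keeps the statement in the s, n form asked for earlier.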
Correction of the blank poses an interesting problem for expressing
error. If there is a matched blank for each sample and the variance is
calculated on the adjusted values, the final reported variance will correctly
contain the two random error components. However, it is usually the case that
there are fewer blanks than samples, and the blank correction is made by
subtracting the average blank from each sample. In this case the blank behaves
as a "random bias" and the error may be filtered out of the final statement. It
is very important that one specify exactly how blank correction has been
performed including the information B, s, n, where B is the average blank.
Formulas for the various precision estimates plus worked examples are given in
A.5.5.
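The effect of subtracting an average blank can be illustrated with a short sketch. It assumes simple propagation of error for a difference, so that the variance of a blank-corrected result is the replicate variance plus the variance of the mean blank (s_B²/n_B). This is an illustration under that assumption and not necessarily the form given in Appendix 5; the function name and data are hypothetical.

```python
import math
import statistics


def blank_corrected_error(samples, blanks):
    """Subtract the average blank and report the error with and without
    the contribution of the blank."""
    b_mean = statistics.mean(blanks)
    s_b = statistics.stdev(blanks)
    n_b = len(blanks)
    corrected = [x - b_mean for x in samples]
    # Subtracting a constant leaves the replicate scatter unchanged,
    # so s_rep contains no information about the blank.
    s_rep = statistics.stdev(corrected)
    # Add the variance of the mean blank (assumed independent of the samples).
    s_total = math.sqrt(s_rep ** 2 + s_b ** 2 / n_b)
    return statistics.mean(corrected), s_rep, s_total, (b_mean, s_b, n_b)


samples = [1.52, 1.47, 1.58, 1.49]   # hypothetical replicate results
blanks = [0.10, 0.14, 0.12]          # hypothetical blanks
mean_c, s_rep, s_total, blank_stats = blank_corrected_error(samples, blanks)
print(f"corrected mean = {mean_c:.3f}, s (replicates only) = {s_rep:.4f}, "
      f"s (with blank) = {s_total:.4f}, blank (B, s, n) = {blank_stats}")
```

The replicate-only value is what would be reported if the blank error were "filtered out"; the difference between the two values shows why B, s, n for the blank should be stated.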
4.7.2 Systematic error (bias)
The main sources of analytical bias include calibration, blank
correction, interference and inability to determine all forms of the determinand
(Kirchmer, 1983; Holden et al., 1983). Bias can also arise from the sampling and
storage process, but this will not be dealt with here.
Sections dealing with calibration, blanks and interferences have already
emphasized the importance of treating these properly. Although not widely
recognized, bias from interference or incomplete extraction is difficult to
control, but definite steps can and should be taken. Firstly, procedures should
be rugged, as already outlined, to ensure that extraction will be consistent.
The most satisfactory method for evaluating sources of bias is to analyze
certified reference materials. We recommend using replicate determinations of
reference materials at each end of the expected range of analyte concentration.
If suitable reference materials do not exist, the laboratory may have to prepare
their own working material taking the precautions noted earlier. Reference
sediments, used properly, can help to estimate both precision and bias.
A second way to estimate bias is to use an alternate analytical method
from time to time. This "reference" method should be an adaptation of a
completely independent technique if possible. For example, neutron activation
of the solid sediment followed by γ counting would be a possible reference
method for a procedure which relied on a digestion step followed by
determination by AA. Colorimetric determination after the same digestion step
would not be an independent technique and so would fail as a reference method.
4.7.3 Combining and reporting the two error statements
Considering the problems identified above, it is unlikely that a perfect
error statement can be made. However, an unbiased best estimate can be
attempted by considering how error arises in the measurement system.
Three steps should lead to a valid assessment and clear statement of
error:
1. All possible sources of error should be identified, preferably in order of
importance. Within-laboratory errors will normally include (see Figure 2):
a. Subsampling and sample preparation
b. Calibration
c. Blank correction
d. Reading, calculation errors
e. Electronic noise
2. Develop an error model by which each of the important errors can be
measured. Calibration can be treated by linear least squares and errors on
slopes and intercepts can be calculated. Similarly, random blanks can be
used to determine the average and variance of the blank. Subsampling,
sample preparation, reading errors and electronic noise are difficult to
separate and are most easily evaluated together in the results from random
blind replicates. Alternatively, variance for these might be measured by
replicate measurements at 2 or more concentrations. To assess the combined
error of several steps, each must be allowed to contribute to error
independently. Bias can be evaluated independently through the use of
referee methods or blind, random reference material, and will be reported
separately from the random errors.
3. Combine the errors to give an un-biased best estimate. This will involve
two parts; the bias and the random errors. The former can best be treated
by examining all experience with the method particularly in consideration of
reference material, referee methods and inter-laboratory comparisons. For
the random component of error one must make some judgement of how error
contributes to the final data as outlined in Figure 2. The intent of
putting the errors together is to make sure that all errors which contribute
to the measurement have been given an unbiased and independent chance to
contribute to the error statement. "Random bias" components will require
particular care. The two most important examples are calibration and blank
correction. If a single calibration has been used for all data, then the
calibration error needs to be added to the error estimated by replicates.
If an average blank has been used rather than independent blanks for each
sample, then the blank variance needs also to be added into the final
statement.
These corrections can be made, where required, by the use of propagation of
error formulas. The main problem confronting us with virtually all real
data is that s tends to underestimate σ because n is small (n < 20), and
furthermore not all estimates of s will be based on the same replication
density. For example we will have n_B the number of blanks, n_c the number of
points used to calibrate, and perhaps n the number of replicates (or pooled
replicates). To convert s to an unbiased best estimate of σ (2σ, 3σ ...) we
should multiply by an appropriate "t" value (depending on n). This is
basically a normalization process for error estimates based on few
replicates. Here we will use α(2) = 0.05 (also expressed as 95%
confidence) which converges to 1.96 for n large. Therefore in the
propagation of error formula we are replacing σ with 1.96σ ≈ $t_{0.05,\,n-1} \times s$.
(One could equally choose some other "t" which converged to σ, 2σ, 3σ for
example if that were desired.)
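A minimal sketch of this normalization follows. Each s is multiplied by the two-tailed 95% Student t for its own n - 1 degrees of freedom, and the resulting "ts" terms are combined in quadrature. The quadrature sum assumes independent, additive error terms; the general case is governed by the propagation of error formulas in Appendix 4. The short t table, the function names and the example values are illustrative assumptions.

```python
import math

# Two-tailed 95% Student t values, indexed by degrees of freedom (n - 1)
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
       7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 15: 2.131, 20: 2.086,
       30: 2.042}


def t95(n):
    """t for n replicates. Where n - 1 is not tabulated, the nearest smaller
    tabulated value is used, which gives a slightly conservative (larger) t."""
    df = n - 1
    if df < 1:
        raise ValueError("at least 2 replicates are needed")
    return T95[max(k for k in T95 if k <= df)]


def ts(s, n):
    """Standard deviation scaled to approximately 95% confidence."""
    return t95(n) * s


def combine(terms):
    """Quadrature sum of t*s for independent additive errors.

    terms is a list of (s, n) pairs, one per error module."""
    return math.sqrt(sum(ts(s, n) ** 2 for s, n in terms))


# Hypothetical modules: replicates (s = 0.05, n = 8), blanks (s = 0.012, n = 3)
print(f"ts (replicates) = {ts(0.05, 8):.3f}")
print(f"ts (blanks)     = {ts(0.012, 3):.3f}")
print(f"combined        = {combine([(0.05, 8), (0.012, 3)]):.3f}")
```

Keeping each module as its own (s, n) pair makes the different replication densities explicit before they are combined.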
What is proposed here is a "modular" approach whereby bias is estimated,
and random error is estimated for each of the components. Provided some
planning is done before the measurements, this should make the calculations
relatively straightforward. Ultimately the propagation of error formula will
indicate how to combine the errors properly. Necessary formulas are given with
worked examples in Appendix 5. A last step to meld random error and bias into
one concise error statement has been proposed by Taylor (1981). This is done by
assuming the "t" distribution for random error, and calculating a confidence
interval as:
$$ C = \Delta \pm \frac{t s}{\sqrt{n}} \qquad (5) $$
where Δ is bias, t the Student factor (given n-1 degrees of freedom and some
arbitrary confidence) and n is the number of replicates upon which s, the
standard deviation, is based. This form is particularly convenient since the
calculations so far have been arranged in the form "ts". However, the division
by √n to convert error to "error of the mean" is not normally this simple since
there are different n's for each of the measurements. To do this correctly we
must return again to the propagation of error equation and insert (ts)²/n where
(ts)² formerly was used (see Appendix 4). In reporting, a final error statement
along the lines of equation 5 may accompany but should not substitute for the
simpler expressions X, s, n for each of the error contributors. Figure 3 shows
the modules which should be reported, and how they combine to form the final
statement of error. The above discussion and an understanding of the error
assessment process as shown in Appendix 7 will convince the reader of the
necessity to report each of the error modules and how they were combined.
[Figure 3. Combining the error statements. Modules to be reported (information
above the line): bias (1. reference material, 2. reference method(s),
3. intercalibration, 4. method validation); random bias (calibration statement
with n_c; blanks statement B, s_B, n_B; limit of detection); random (X, s, n,
model, formula). Random and random bias are combined by formula into a total
random statement, which a grand error formula then combines with bias.]
4.8 Quality assurance
Quality assurance comprises two concepts; quality control, and quality
assessment which verifies that control is working (Taylor, 1981). Aside from
the laboratory's own error assessment, data validity can be strengthened by an
independent verification of the laboratory claims. Two components needed for
quality assurance are a "control system" and a mechanism to verify the system.
Control is continuing, active feedback which is used to correct problems and
assess accuracy. Bringing a system into control involves two stages, the first
of which is method validation and calculation of performance characteristics.
The second stage is setting up the control system, the core of which is usually
a set of control charts and/or control calculations. At this point the errors
in the procedure have been reduced to acceptable limits, characterized
statistically, and included in the laboratory protocol. Elements of a
laboratory quality assurance program are listed in Table 2. These comprise what
would be considered good laboratory practice.
The task of the laboratory/analyst is to produce data within a known
uncertainty range, and document them fully. Elements in Table 2, if
implemented, will certainly achieve these aims. However it would be an
oversight to let the matter end there. While data produced by the analyst might
be impeccable, they could be useless from the point of view of interpretation
(Bewers, pers. comm.). The analyst or laboratory often does not have or need
expertise in interpreting environmental data and no incentive to develop it.
Therefore an essential step in the quality process is to ensure that expertise
to interpret the data is consulted before sampling, and from time to time to
review the data. Too often data are collected and compiled in a large and
growing file only to find in the end that they cannot be used by anyone.
Table 2
Quality assurance elements (After, Inhorn, cited in ACS Committee, 1980).
1. Maintenance of skilled personnel, written and validated methods, and
properly constructed, equipped and maintained lab facilities.
2. Provision of representative samples and controls.
3. Use of high-quality glassware, solvents, and other testing materials.
4. Calibration, adjustment, and maintenance of equipment.
5. Use of control samples and standard samples, with proper records.
6. Directly observing the performance of certain critical tests.
7. Review and critique of results.
8. Tests of internal and external proficiency testing.
9. Use of replicate samples.
10. Comparison of replicate results with other laboratories.
11. Response to user complaints.
12. The monitoring of results.
13. Correction of departures from standards of quality.
Element 10 in Table 2 (collaborative testing) is recognized as one of the
most essential components of quality assurance and there is a clear need to make
this option available to all laboratories. The use of referee methods and
reference sediments can assist the laboratory in a self-audit and will
strengthen their quality statements but they should not substitute for the
round-robin. Reference materials should simulate the environmental samples as
closely as possible and they should be treated identically if they are to be
used as controls. The very limited number of certified reference sediments
(Appendix 3) will certainly not be representative of all sediments and therefore
laboratories will need to prepare their own uncertified materials. Certified
and uncertified reference materials will form the backbone of the control
process and therefore the laboratory protocol should schedule the number, order
and type of controls to be used. New analysts should demonstrate their ability
to perform the method within the performance characteristics before they handle
environmental samples.
A minimum quality assurance program should include control charts as
outlined in section 4.8.1. Not only must performance be acceptable (see
McFarren et al., 1970) within the corporate aims (Table 1 in this case) but the
regulatory demands should be practical in terms of what can be obtained and how
much it will cost. According to Horowitz (1979) a "practical" method must be
reliable (accurate and specific), sufficiently rapid to provide timely
information (i.e. results can be provided within about one day), and economical.
The latter element comprises a number of factors including cost and stability of
reagents, cost and availability of instrumentation and expertise required to
perform analysis. Other considerations include the availability of reference
materials, problems of contamination (high blanks) and laboratory safety.
The task for ocean dumping and other environmental regulations is to
establish the concentration and variability of a contaminant in an environmental
reservoir. More specifically it is desired to know if the contaminant exceeds a
certain critical limit at which some action is necessary. What is not specified
is the acceptable risk of false rejections or false acceptances. It is beyond
the scope of this guide to delve into these issues except to point out that
environmental signals can confidently be detected provided the analytical
variability is less than about a third of the environmental noise. Furthermore,
confidence in an average of n replicates goes up as a function of √n whereas
expense increases approximately with n. Therefore more than 3-5 replicates of a
single sample is probably not economically advisable, although saving replicate
material in case of later inconsistencies is always worthwhile.
4.8.1 Control charts
"Until a measurement operation ... has attained a state of statistical
control, it cannot be regarded in any logical sense as measuring anything at
all" (Eisenhart, quoted in Taylor, 1983).
The design and use of the control chart has been reviewed (Mandel and
Nanni, 1978; Wernimont, 1979) and very detailed instructions are available for
presentation and analysis of control data (ASTM, 1976). The purpose of a
control chart is to establish that a measurement is in control; to maintain
control; and help assign cause when the process goes out of control. The
principle of preparing a Shewhart control chart is very simple, and an example
is given in Appendix 6. Points from a measurement (replicate of some sort) are
plotted as a function of time. After sufficient replicates have been collected,
a mean or centre line can be plotted, along with limit lines within which most
data should fall. There are as many different kinds of control charts as there
are measurements, going from something conceptually simple such as monitoring
the performance of a laboratory instrument (balance, AA, fluorometer etc.) to the
more difficult task of controlling a complete method including processing the
sample, making the determination and eventually calculating the result. To set
up a control chart requires about 20 points initially (Faires and Boswell,
1981). The mean is pencilled on the graph and the individual points are
plotted. The following three rules are suggested:
1. Not more than 1 in 20 results lie outside 2 standard deviations (warning
limit). A result outside 3 standard deviations requires action;
2. Not more than 7 consecutive results are on the same side of the mean;
3. There are no regular periodic variations.
Other control charts have been devised, but if emphasis is on the
accuracy of individual analytical results the simple chart outlined above is
adequate (Kirchmer, 1983; Natrella, 1982).
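The first two rules can be checked mechanically, as in the sketch below. It assumes the centre line and standard deviation are taken from an initial set of about 20 in-control results. The function name, the flag labels and the data are hypothetical, and rule 3 (periodic variation) is left to visual inspection of the chart.

```python
import statistics


def check_control(baseline, results, window=20, max_run=7):
    """Flag new control results against limits set from baseline data.

    Returns (index, flag) tuples where flag is one of:
      "action"  - result lies outside 3 standard deviations
      "warning" - more than 1 of the last `window` results outside 2 s.d.
      "run"     - more than `max_run` consecutive results on one side of the mean
    """
    centre = statistics.mean(baseline)
    s = statistics.stdev(baseline)
    flags = []
    outside = []                 # True where a result lies outside 2 s.d.
    run_side, run_len = 0, 0
    for i, x in enumerate(results):
        dev = x - centre
        if abs(dev) > 3 * s:
            flags.append((i, "action"))
        outside.append(abs(dev) > 2 * s)
        if sum(outside[-window:]) > 1:
            flags.append((i, "warning"))
        side = (dev > 0) - (dev < 0)      # +1 above, -1 below, 0 on the mean
        if side != 0 and side == run_side:
            run_len += 1
        else:
            run_side, run_len = side, (1 if side else 0)
        if run_len > max_run:
            flags.append((i, "run"))
    return flags


baseline = [9.8, 10.2] * 10               # hypothetical initial 20 points
print(check_control(baseline, [10.1] * 8))          # eight results above the mean
print(check_control(baseline, [10.5, 9.9, 10.5]))   # two of three outside 2 s.d.
print(check_control(baseline, [10.7]))              # one result outside 3 s.d.
```

A flag is a prompt to investigate; the response itself should follow what the laboratory protocol specifies.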
We also recommend the use of the Youden plot mainly because of its
visual impact (see Appendix 6). Reference materials are analyzed in pairs where
each member of the pair can be either identical or slightly different. One
member of the pair is plotted against the other and with sufficient replication,
error circles can be drawn analogous to the control limits of the Shewhart
control chart. The Youden plot is actually a replicate or double control chart
and has the advantage of helping to diagnose error source. Systematic errors
such as calibration or blank correction which vary from day to day tend to lie
on a line with 45° slope while random errors favour no direction. Generally the
combination of the two types of error forms an elliptical pattern. If the
centre of mass of the plotted points is different from the certified values,
then a constant bias in the method can be suspected. (See for example Macdonald
and Nelson, 1984; Youden and Steiner, 1978; Youden, 1968).
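The diagnostic idea behind the Youden plot can be sketched numerically. Assume each pair (x, y) shares one day-to-day systematic error and carries two independent random errors of equal variance. The scatter of the differences x - y then reflects random error only (perpendicular to the 45° line), while the scatter of the sums x + y also contains the shared error (along the line). The function name and data are hypothetical.

```python
import math
import statistics


def youden_components(pairs):
    """Return (s_random, s_along) for paired control results.

    s_random: s.d. of (x - y) / sqrt(2), an estimate of random error alone.
    s_along:  s.d. of (x + y) / sqrt(2), which equals s_random on average
              when no shared (systematic) error is present and exceeds it
              when day-to-day systematic error stretches the points along
              the 45 degree line.
    """
    diffs = [x - y for x, y in pairs]
    sums = [x + y for x, y in pairs]
    s_random = statistics.stdev(diffs) / math.sqrt(2)
    s_along = statistics.stdev(sums) / math.sqrt(2)
    return s_random, s_along


# Hypothetical paired results for two similar reference materials
pairs = [(1.42, 1.55), (1.50, 1.61), (1.38, 1.49), (1.47, 1.60), (1.44, 1.52)]
s_random, s_along = youden_components(pairs)
print(f"s_random = {s_random:.3f}, s_along = {s_along:.3f}")
```

A ratio of s_along to s_random well above one corresponds to the elongated ellipse described above.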
Whatever control technique is used, data should simulate normal
measurements. Control samples should be random and blind to prevent bias. The
rationale for the control chart use should be outlined in the laboratory
protocol; if numbers exceed the action limit the process should be stopped, the
problem identified, resolved and the process started again at the point last
known to be in control. Care should be taken to choose a statistic which is not
sensitive to concentration, for example CV (s/X) might be a better choice than s
in some cases (ASTM, 1979). Another option would be to divide the range of
measurements into sections where the statistic is reasonably well behaved and
prepare control charts for each range. This latter option is likely to be
tedious and not practical.
For short term measurements, the control chart will not be very useful
although control samples are still required. The frequency of control samples
cannot be stated at the outset; experience will dictate how many are required as
the procedure becomes understood. It is likely that more controls will be
needed initially.
Control charts could be maintained for the following:
1. Reference material. Single or better still paired samples are analyzed
(Youden chart). Use of two concentration levels spanning the expected range
is desirable.
2. Blind replicates of environmental material. Random samples are well mixed
   and split to be analyzed in duplicate (or triplicate ...). The difference
   between duplicates or the range (Appendix 6) is plotted with time.
   Duplicates should not be run sequentially. An "out of control" duplicate
   may not indicate an "out of control" analytical system, and checking results
for (1) above will help to resolve the problem. Choice of model (see
precision) is important since duplicates run during the same batch, or in
different batches will not evaluate the same error.
3. Standard solutions. Results for check calibrants should be plotted as a
Shewhart or Youden control chart. We recommend a high concentration (slope
bias) and low concentration (blank bias) be used.
4. Blanks. Blank control tends to be neglected (King, 1976). Blanks greatly
influence LOD and contribute error (bias and precision) to all samples.
Blank control charts therefore help in the assessment of LOD and detect
contamination quickly.
A control chart should also be maintained for any other instrument or
process which is used in the analytical system, the analytical balance being the
most important example. Where electronic data handling and processing takes
place, a check set of data should also be used to verify that the program is
working properly. This does not require a chart, but perhaps a "tick" system to
ensure that it is always done.
5. RECOMMENDATIONS FOR VERIFYING QUALITY OF DATA
The individual, or group, who wish to interpret the data or use it for
making decisions require a mechanism for independently assessing the quality of
the data. About 10-20% of the total budget should be spent on this, more for
areas which are particularly sensitive or when litigation is likely. We
recommend the following:
1. The laboratory protocol and quality assurance program should be available
and kept on file by RODAC. Laboratories which cannot produce detailed
procedures of validated methods with performance characteristics are not
excercising quality control.
2. Blind replicates, reference material and calibrants should be submitted for
analysis from time to time. Performance on this material would have to
equal or exceed the laboratory's claims for accuracy.
3. Evidence of quality control procedures should be available with each data
set. This would normally consist of recent control charts, and should be
backed up by occasional audits. The laboratory should keep raw data on file
for a specified time period (2 years) and should be able to produce it.
Inspection of the laboratory can also give evidence that they follow the
quality control practices outlined in their protocol.
4. Occasionally samples should be checked by independent reference methods,
particularly for sediments where problems are anticipated (high sulfide or
organic content for example).
Horowitz, W., 1979. Practicality in regulatory analytical chemistry, Analytical Chemistry, 741A-745A.
Kaiser, H., 1970. Quantitation in elemental analysis, Analytical Chemistry (4), 26A-59A.
King, D.E., 1976. Evaluation of interlaboratory comparison data by linear regression analysis, In, National Bureau of Standards Special Publication 464, Proceedings of the 8th Symposium, p. 581-596.
Klein, R.J. and C. Hach, 1977. Standard additions: uses and limitations in spectrophotometric analysis, American Laboratory, 21-27.
Kirchmer, C.J., 1983. Quality control in water analyses, Environ. Sci. Technol., 174A-181A.
Ku, H.H., 1968. Statistical concepts in metrology, p. 296-330 In, Precision Measurements and Calibration: Statistical Concepts and Procedures, N.B.S. Special Publ. 300, volume I, ed. H.H. Ku, Washington, D.C. 436pp.
Long, G.L. and J.D. Winefordner, 1983. Limit of detection - a closer look at the IUPAC definition, Analytical Chemistry, 712A-724A.
Macdonald, R.W. and H. Nelson, 1984. A laboratory performance check for the determination of metals (Hg, Zn, Cd, Cu, Pb) in reference marine sediments, Canadian Tech. Rep. of Hydrography & Ocean Sciences No. 33, 57pp.
Mandel, J. and L.F. Nanni, 1978. Measurement evaluation, In, Quality assurance practices for health laboratories, S.L. Inhorn, ed., APHA, N.Y.
McFarren, E.F., R.J. Lishka and J.H. Parker, 1970. Criterion for judging acceptability of analytical methods, Analytical Chemistry, 358-365.
Natrella, M.G., 1983. Experimental Statistics, NBS Handbook 91, U.S. Dept. of Commerce, Washington, D.C., various pagings.
Porter, W.R., 1983. Proper statistical evaluation of calibration data, Analytical Chemistry, 1290A.
Russell, D.S., 1984. Available standards for use in the analysis of marine materials, Marine Analytical Standards Program, National Research Council of Canada, Report No. 8, NRCC No. 23025, 35pp.
Samant, H.S., D.H. Loring and S. Ray, 1979. Laboratory evaluation program. First quality control round-robin surveillance. Report EPS-4-AR-79-1, Environment Canada, 38pp.
Taylor, J.K., 1983. Validation of analytical methods, Analytical Chemistry, 1588A-1596A.
Wernimont, G., 1979. Statistical control of measurement processes, In, Validation of the Measurement Process, J.R. Devoe, ed., ACS Symposium Series 63, American Chemical Society, Washington, D.C.
Youden, W.S., 1968. Graphical diagnosis of interlaboratory test results, p. 133-137, In, Precision Measurements and Calibration: Statistical Concepts and Procedures, N.B.S. Special Publ. 300, volume I, ed. H.H. Ku, Washington, D.C. 436pp.
Youden, W.S. and E.H. Steiner, 1975. Statistical Manual of the Association of Official Analytical Chemists, Association of Official Analytical Chemists, Arlington, Va., 88pp.
Standard Operating Procedures: Detailed written procedures (Taylor, 1981).
Statistical Control: Measurements behave like random samples from a probability distribution, and therefore can be predicted (Natrella, 1982).
Technique: Scientific principle useful for providing compositional information (Taylor, 1983).
Uncertainty: Allowance assigned to a measured value to include two major components of error: (1) bias, and (2) random error (Natrella, 1982).
Validation: An experimental process involving external corroboration by other laboratories (internal or external) or methods or the use of reference materials to evaluate the suitability of methodology (ACS Committee, 1983).
Verification: The general process used to decide whether a method in question is capable of producing accurate and reliable data (ACS Committee, 1983).
A.2 APPENDIX 2 THE RUGGEDNESS TEST (AFTER YOUDEN & STEINER, 1976).
This test is a simple way to learn if the results of a determination are
sensitive to small procedural changes, for example temperature of a digest or
length of time for an extraction. A rugged procedure is relatively insensitive
to small changes and is more likely to produce consistent results when subjected
to normal "abuse" by different analysts working in different laboratories. A
ruggedness test should be used as an integral part of method validation, and the
results of the test can help to estimate the performance characteristics and in
the preparation of the written laboratory procedure or protocol.
A particularly efficient procedure based on a fractional factorial
design may be used to investigate up to 7 variables with only 8 determinations.
Unfortunately the consequence of the design is that main effects are confounded
with some of the possible interactions while other interactions cannot even be
estimated. To interpret results one is forced to assume that confounded
interactions are negligible. Furthermore, sorting out random error from real
effects is difficult since one cannot calculate the former. A simple solution
to the problem is to run the ruggedness test in duplicate with a total of 16
determinations.
The basis of the ruggedness test is to allow each of 7 variables to have
two states, preferably representative of the extremes likely to occur when a
procedure is followed by two different analysts using different equipment.
Defining the two states by upper and lower case letters, one obtains a factorial
design like that shown in Table A-2.
Calculations as illustrated by the worked example are performed using
the average differences between upper and lower case results, D_A, D_B ... D_G,
D_A', ... D_G', where
$$ D_A = \frac{r+t+u+v}{4} - \frac{w+x+y+z}{4}, \quad \ldots, \quad D_G' = \frac{r'+v'+x'+y'}{4} - \frac{t'+u'+w'+z'}{4} \qquad (A-2-1) $$
The differences D_i - D_i' are independent of factor effects and therefore
can provide an estimate of the random error, s, with 7 degrees of freedom:
$$ s^2 = \frac{(D_A - D_A')^2 + \cdots + (D_G - D_G')^2}{7} \qquad (A-2-2) $$
For an effect to be statistically significant at the 95% confidence level, the
absolute mean difference, |D_i + D_i'|/2, of any factor must exceed 1.18s.
In the worked example, factors found to have a significant effect in
order of importance are A > C > B ≈ E. Factors F and G which were unassigned
did not (and should not) have a significant effect.
Several types of information are available from the ruggedness test. We
have an estimate of s at the concentration of 1.45 µg g⁻¹, and we know that
temperature must be rather closely controlled during digestion. If we have a
certified value for Cd concentration in the material, or a "reliable"
measurement by an independent method, we can estimate bias and recommend
operating conditions which will go into the laboratory procedure or protocol.
If the variance is approximately the same for blanks as it appears to be for
these samples, the detection limit will be about 0.12 µg g⁻¹ and the method shows
much promise for complying with ocean dumping requirements. In fact the
precision could be called excellent.
The ruggedness test could also be a useful way of training an analyst
new to the method, and allowing him to evaluate for himself which factors are
likely to have an important effect on the final data. It would also show the
ability of the analyst to produce data which conform to the performance
characteristics prior to analyzing environmental material.
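The calculation described in this appendix can be sketched in a few lines of Python. This is only an illustrative sketch: the names `DESIGN`, `effects` and `ruggedness` are hypothetical, the design strings are transcribed from Table A-2, and the data are the hypothetical Cd results of the worked example.

```python
from math import sqrt

# Design of Table A-2: one string per determination, one letter per factor
# (upper case = first level, lower case = second level).
DESIGN = ["ABCDEFG", "ABcDefg", "AbCdEfg", "AbcdeFG",
          "aBCdeFg", "aBcdEfG", "abCDefG", "abcDEFg"]
FACTORS = "ABCDEFG"

# Hypothetical Cd results (ug/g) from Table A-2: first run and its duplicate.
FIRST = [1.47, 1.63, 1.58, 1.72, 1.14, 1.38, 1.21, 1.47]
DUPLICATE = [1.54, 1.56, 1.64, 1.69, 1.10, 1.41, 1.20, 1.54]


def effects(results):
    """Average difference (upper-case level minus lower-case level) per factor."""
    out = {}
    for col, factor in enumerate(FACTORS):
        upper = [y for run, y in zip(DESIGN, results) if run[col].isupper()]
        lower = [y for run, y in zip(DESIGN, results) if run[col].islower()]
        out[factor] = sum(upper) / 4 - sum(lower) / 4
    return out


def ruggedness(first, duplicate, multiplier=1.18):
    """Return s (eqn a-2-2), the mean effects, and the significant factors."""
    d, d_dup = effects(first), effects(duplicate)
    s = sqrt(sum((d[f] - d_dup[f]) ** 2 for f in FACTORS) / 7)
    mean_effect = {f: (d[f] + d_dup[f]) / 2 for f in FACTORS}
    significant = [f for f in FACTORS if abs(mean_effect[f]) > multiplier * s]
    return s, mean_effect, significant


s, mean_effect, significant = ruggedness(FIRST, DUPLICATE)
print(f"s = {s:.3f}; significant factors: {significant}")
```

With the tabulated data this should flag factors A, B, C and E, in agreement with the worked example.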
TABLE A-2
Eight combinations of seven hypothetical factors used to test the ruggedness of an analytical method.

FACTORS                                          DETERMINATION #
                                                 1     2     3     4     5     6     7     8
Digestion temperature (A, 100 °C; a, 90 °C)      A     A     A     A     a     a     a     a
Digestion time (B, 2 hours; b, 3 hours)          B     B     b     b     B     B     b     b
Volume of acid (C, 6 mL; c, 10 mL)               C     c     C     c     C     c     C     c
Ratio of HCl:HNO3 (D, 3:1; d, 2:1)               D     D     d     d     d     d     D     D
Digest storage time (E, 1 day; e, 1 week)        E     e     E     e     e     E     e     E
Unassigned                                       F     f     f     F     F     f     f     F
Unassigned                                       G     g     g     G     g     G     G     g

Generalized observed results                     r     t     u     v     w     x     y     z
                                                 r'    t'    u'    v'    w'    x'    y'    z'

Hypothetical results for worked example,         1.47  1.63  1.58  1.72  1.14  1.38  1.21  1.47
Cd (µg g⁻¹)                                      1.54  1.56  1.64  1.69  1.10  1.41  1.20  1.54

WORKED EXAMPLE USING THE HYPOTHETICAL RESULTS

DIFFERENCES        A        B        C        D        E        F        G
D_i                0.300   -0.090   -0.200   -0.010    0.050    0.000   -0.010
D_i'               0.295   -0.115   -0.180    0.000    0.145    0.015    0.000
D_i - D_i'         0.005    0.025   -0.020   -0.010   -0.095   -0.015   -0.010
|D_i + D_i'|/2     0.297    0.103    0.190    0.005    0.098    0.008    0.005

s = 0.039     CV = (s/X̄) x 100 = 3%
1.18s = 0.046
∴ Factors A, B, C and E are significant at the 95% confidence level.
A.3 APPENDIX 3 CERTIFIED REFERENCE SEDIMENTS (SEE RUSSELL, 1984 FOR A
COMPLETE LISTING OF PRESENTLY AVAILABLE REFERENCE MATERIALS)
A.3.1 Metals

Metal concentration, µg g⁻¹

                          Schedule 1      Schedule 2
Reference  Organization   Hg     Cd       Pb     Cu     Zn    As     Be     Cr    Ni     V
MESS-1     MACSP-NRC      0.171  0.59     34.0   25.1   191   10.6   1.9    71    29.5   72.4
BCSS-1     MACSP-NRC      0.129  0.25     22.7   18.5   119   11.1   1.3    123   55.3   93.4
SRM-1646   NBS            0.063  0.36     28.2   18     138   11.6   (1.5)  76    32     94
MAG-1      USGS           -      (0.2)    (24)   27     135   -      (3)    105   54     140

( ) Data are based on limited results

A.3.2 Chlorinated Hydrocarbons

Reference  Organization   Compounds ¹   X̄ (s), µg kg⁻¹
CS-1       MACSP-ARL      PCB           1.15 (0.60)
HS-1       MACSP-ARL      PCB           21.8 (0.12) ²
HS-2       MACSP-ARL      PCB           111.8 (2.5) ²

¹ Relative to 1254
² Individual compounds also determined
A.3.3 Reference sediments in preparation

Reference   Organization   Elements or Compounds
-           MACSP          Polycyclic aromatic hydrocarbons (PAH)
SD-N-1/1    IAEA           Low-level transuranics
SD-N-1/2    IAEA           Trace elements, U, Th + decay products
SD-N-2      IAEA           Low-level transuranics
-           BCR            May prepare marine reference sediments in future
A.3.4 Addresses

BCR:    Community Bureau of Reference
        Directorate General XII
        Commission of the European Communities
        200 Rue de la Loi
        B-1049 Brussels
        Belgium

IAEA:   International Atomic Energy Agency
        Analytical Quality Control Services
        Laboratory Seibersdorf
        P. O. Box 590
        A-1011 Vienna
        Austria

MACSP:  Marine Analytical Chemistry Standard Program
        National Research Council of Canada

  NRC:  1. Division of Chemistry
           Montreal Road
           Ottawa, Ontario
           K1A 0R9

  ARL:  2. Atlantic Research Laboratory
           1411 Oxford Street
           Halifax, N.S.
           B3H 3Z1

NBS:    National Bureau of Standards
        U. S. Department of Commerce
        Washington, D.C. 20234
        U. S. A.

USGS:   United States Geological Survey
        National Centre
        Stop 972, Reston, VA 22092
        U. S. A.
A.4 APPENDIX 4 PROPAGATION OF ERROR
When a quantity, Q, is calculated indirectly by combining several
measured quantities, each with an associated error, the overall error in Q can
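be estimated from the theory of propagation of error. Suppose Q = f(A, B, C, ...)
and that each variable has associated with it an independent error with variance
$\sigma_A^2, \sigma_B^2, \dots$ Then the variance of Q is given by

$$\sigma_Q^2 = \left(\frac{\partial Q}{\partial A}\right)^2 \sigma_A^2 + \left(\frac{\partial Q}{\partial B}\right)^2 \sigma_B^2 + \dots \qquad \text{(a-4-1)}$$

The best estimate of σ, particularly when the number of replicates, n, is small,
is ts where t is the Student factor (which depends on n and the number of σ
limits one wishes to estimate) and s is the sample standard deviation. The
formula becomes

$$\sigma_Q^2 \approx \left(\frac{\partial Q}{\partial A}\right)^2 (ts_A)^2 + \left(\frac{\partial Q}{\partial B}\right)^2 (ts_B)^2 + \dots \qquad \text{(a-4-2)}$$

And where it is desired to estimate the "variance of the mean"

$$\sigma_{\bar{Q}}^2 \approx \left(\frac{\partial Q}{\partial A}\right)^2 \frac{(ts_A)^2}{n_A} + \left(\frac{\partial Q}{\partial B}\right)^2 \frac{(ts_B)^2}{n_B} + \dots \qquad \text{(a-4-3)}$$

where $n_A, n_B, \dots$ are the number of replicates associated with each respective
standard deviation, $s_A, s_B, \dots$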
Table A-4-1 shows these equations applied to common cases.
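As a numerical cross-check on eqn. a-4-1, the partial derivatives can be approximated by finite differences and the result compared with a closed-form case such as Q = (A+B)/C. The sketch below is illustrative only: the function name `propagate` and the input values are hypothetical, and the errors passed in may be σ or ts, depending on which of eqns. a-4-1 or a-4-2 is wanted.

```python
from math import sqrt


def propagate(f, values, errors, rel_step=1e-6):
    """Variance of Q = f(*values) by eqn a-4-1, using central differences
    for the partial derivatives. `errors` holds sigma (or t*s) per variable."""
    var_q = 0.0
    for i, (x, err) in enumerate(zip(values, errors)):
        step = rel_step * max(abs(x), 1.0)
        hi, lo = list(values), list(values)
        hi[i], lo[i] = x + step, x - step
        dq_dx = (f(*hi) - f(*lo)) / (2 * step)
        var_q += (dq_dx * err) ** 2
    return var_q


def ratio(a, b, c):
    """Example function Q = (A + B) / C."""
    return (a + b) / c


# Hypothetical values and standard deviations for A, B and C.
A, B, C = 10.0, 5.0, 3.0
sA, sB, sC = 0.3, 0.2, 0.1

numeric = propagate(ratio, [A, B, C], [sA, sB, sC])
# Closed form for Q = (A+B)/C, as tabulated for this case in Table A-4-1.
closed = ((A + B) / C) ** 2 * ((sA**2 + sB**2) / (A + B) ** 2 + sC**2 / C**2)
print(sqrt(numeric), sqrt(closed))
```

The two estimates should agree to within the finite-difference error, which gives a quick way to check a hand-derived formula before it goes into a protocol.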
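TABLE A-4-1
Propagation of error formulas for commonly encountered cases. ¹

For each formula Q the three entries are: $\sigma_Q^2$ (eqn. a-4-1); the best estimate
of $\sigma_Q^2$ (eqn. a-4-2); and the best estimate of $\sigma_{\bar{Q}}^2$ (eqn. a-4-3).

Q = A ± B
  (a-4-1)  $\sigma_A^2 + \sigma_B^2$
  (a-4-2)  $(ts_A)^2 + (ts_B)^2$
  (a-4-3)  $\frac{(ts_A)^2}{n_A} + \frac{(ts_B)^2}{n_B}$

Q = aA ± bB  (a, b constant)
  (a-4-1)  $a^2\sigma_A^2 + b^2\sigma_B^2$
  (a-4-2)  $a^2(ts_A)^2 + b^2(ts_B)^2$
  (a-4-3)  $\frac{a^2(ts_A)^2}{n_A} + \frac{b^2(ts_B)^2}{n_B}$

Q = AB, AB⁻¹
  (a-4-1)  $Q^2\left[\frac{\sigma_A^2}{A^2} + \frac{\sigma_B^2}{B^2}\right]$
  (a-4-2)  $Q^2\left[\frac{(ts_A)^2}{A^2} + \frac{(ts_B)^2}{B^2}\right]$
  (a-4-3)  $Q^2\left[\frac{(ts_A)^2}{n_A A^2} + \frac{(ts_B)^2}{n_B B^2}\right]$

Q = Aⁿ
  (a-4-1)  $n^2\left[A^{n-1}\sigma_A\right]^2$
  (a-4-2)  $n^2\left[A^{n-1}ts_A\right]^2$
  (a-4-3)  $\frac{n^2}{n_A}\left[A^{n-1}ts_A\right]^2$

Q = ln[A]
  (a-4-1)  $\frac{\sigma_A^2}{A^2}$
  (a-4-2)  $\frac{(ts_A)^2}{A^2}$
  (a-4-3)  $\frac{(ts_A)^2}{A^2 n_A}$

Q = (A+B)/C
  (a-4-1)  $\left[\frac{A+B}{C}\right]^2\left[\frac{\sigma_A^2 + \sigma_B^2}{(A+B)^2} + \frac{\sigma_C^2}{C^2}\right]$
  (a-4-2)  $\left[\frac{A+B}{C}\right]^2\left[\frac{(ts_A)^2 + (ts_B)^2}{(A+B)^2} + \frac{(ts_C)^2}{C^2}\right]$
  (a-4-3)  $\left[\frac{A+B}{C}\right]^2\left[\frac{(ts_A)^2 n_B + (ts_B)^2 n_A}{n_A n_B (A+B)^2} + \frac{(ts_C)^2}{n_C C^2}\right]$

Q = (A+B)/C + D
  (a-4-1)  $\left[\frac{A+B}{C}\right]^2\left[\frac{\sigma_A^2 + \sigma_B^2}{(A+B)^2} + \frac{\sigma_C^2}{C^2}\right] + \sigma_D^2$
  (a-4-2)  $\left[\frac{A+B}{C}\right]^2\left[\frac{(ts_A)^2 + (ts_B)^2}{(A+B)^2} + \frac{(ts_C)^2}{C^2}\right] + (ts_D)^2$
  (a-4-3)  $\left[\frac{A+B}{C}\right]^2\left[\frac{(ts_A)^2 n_B + (ts_B)^2 n_A}{n_A n_B (A+B)^2} + \frac{(ts_C)^2}{n_C C^2}\right] + \frac{(ts_D)^2}{n_D}$

¹ "t" is Student's t and depends on the degree of confidence desired (number of σ limits) and the respective number of replicates ($n_A, n_B, \dots$).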
A.5 APPENDIX 5 COMMON FORMULAS USED TO SUMMARIZE DATA AND MAKE QUALITY
STATEMENTS. (SYMBOLS AND ABBREVIATIONS ARE GIVEN AT THE
END OF THIS APPENDIX.)
As described in the text under "accuracy" and shown in Figure 3, a
complete error treatment can be considered in several simple compartments which
can be combined appropriately to make the final error statement. In this
section are described formulas which will assist in presenting compact
statements of data and their quality to fit in each of the compartments.
For data sets based on only a few samples, uncertainties will always be
great. Where not much is known about the underlying distribution for the data
it is worthwhile keeping the Tschebycheff inequality in mind. It states that
regardless of distribution the probability of a measurement exceeding an average
by more than k standard deviations is less than or equal to 1/k², or

$$P(X_i > \bar{X} + ks) \le \frac{1}{k^2}$$

Therefore at least 89% of the numbers will be less than X̄ + 3s
(regardless of distribution), and in the favourable circumstance of a normal
distribution with the determination of s based on many replicates, 99.9% will be
less than X̄ + 3s. The real world will always be between these two extremes.
Caution in the use of the formulas.
Care must be taken with units since averages and standard deviations can
be calculated at three different stages.
Stage 1 - Direct reading from a recorder; Y_i (mV, cm)
Stage 2 - After a calibration process; C_i (µg L⁻¹)
Stage 3 - After correction for weight of sediment and/or makeup
volume; X_i (µg g⁻¹ dry weight basis)
Here we will always use Y_i, C_i and X_i to designate the respective
stages. It is assumed that some derived quantity such as X_i is what is
eventually reported.
A.5.1 Sample standard deviation, s.

Formula:

$$s = \sqrt{\frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n-1)}} \;\dagger \qquad \text{(a-5-1)}$$

Example: X_i = 4.00, 3.28, 3.25, 4.03, 3.27

Report: X̄, s, n = 3.57, 0.41, 5 (µg g⁻¹)

† This formula is subject to error if roundoff is carried out carelessly.
A.5.2 Pooled standard deviation, s_p.

Formula:

$$s_p = \sqrt{\frac{\sum \nu_i s_i^2}{\sum \nu_i}} \qquad \text{(a-5-2)}$$

where ν_i = n_i - 1 = degrees of freedom.
For the case where data consist of k duplicates and d_i is
the difference between duplicates, ν_i = 1 and

$$s_p = \sqrt{\frac{\sum d_i^2}{2k}} \qquad \text{(a-5-3)}$$

Example: Suppose for a data set, 20% of the samples were run as blind
duplicates, on different days with the following matched
pairs, (X_1, X_2) (µg g⁻¹).
(1.61, 1.29) (1.18, 1.25) (1.72, 1.60) (0.31, 0.41)
(0.21, 0.29) (2.48, 2.59) (4.81, 4.39)

$$s_p = \sqrt{\frac{0.327}{14}} = 0.15\ \mu\text{g g}^{-1}$$

Report: For the range 0.21 - 4.81 µg g⁻¹, s_p = 0.15 µg g⁻¹ (n = 2k = 14)
(plus the formula used and a statement describing the model).
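The pooled standard deviation of the example can be reproduced with a short Python sketch. The function names are hypothetical; `pooled_sd_duplicates` follows eqn. a-5-3 and `pooled_sd` follows eqn. a-5-2 for replicate groups of any size, so both should give the same answer for duplicate pairs.

```python
from math import sqrt


def pooled_sd_duplicates(pairs):
    """Eqn a-5-3: s_p = sqrt(sum(d_i^2) / (2k)) for k duplicate pairs."""
    k = len(pairs)
    return sqrt(sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * k))


def pooled_sd(groups):
    """Eqn a-5-2 for replicate groups of any size (each with n_i >= 2)."""
    weighted_variance = 0.0
    degrees_of_freedom = 0
    for group in groups:
        mean = sum(group) / len(group)
        # sum of squared deviations equals nu_i * s_i^2 for this group
        weighted_variance += sum((x - mean) ** 2 for x in group)
        degrees_of_freedom += len(group) - 1
    return sqrt(weighted_variance / degrees_of_freedom)


# Matched pairs from the example (ug/g).
pairs = [(1.61, 1.29), (1.18, 1.25), (1.72, 1.60), (0.31, 0.41),
         (0.21, 0.29), (2.48, 2.59), (4.81, 4.39)]

print(round(pooled_sd_duplicates(pairs), 2), round(pooled_sd(pairs), 2))
```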
A.5.3 Rejection of outliers
Extreme observations may occur due to gross error or may be part of the
population. There are two reasons for rejecting outliers; known contamination
(sample mishandling) and statistical improbability. In the first instance, if a
sample is suspect, it should not be analyzed. For the second instance,
statistical means exist for removing outliers, such as range tests (Dixon and
Massey, 1969) or Chauvenet's criterion. Regardless of which method is used, it
is not likely to have much meaning for small data sets (n < 4). In that case
there is no easy solution to dealing with "wild" data points except re-analysis
or re-sampling or both. Criteria for data rejection should be decided before
data are evaluated. Chauvenet's criterion is tabulated below.
Table A-5-1. Chauvenet's criterion for rejection of a suspected value
having a deviation from the mean of δ = X_i - X̄

n     δ/σ       n     δ/σ       n      δ/σ
3     1.37      7     1.80      20     2.24
4     1.53      8     1.87      30     2.39
5     1.64      9     1.91      50     2.57
6     1.73      10    1.97      100    2.80

For a small population, s may seriously underestimate σ and the "t" distribution
could be used to correct this.
Example: X_i = 1.60, 1.59, 1.71, 1.28, 1.60, 1.48, 1.65, 1.73, 1.43
X̄ ± s (n) = 1.56 ± 0.14 (9)
Should 1.28 be eliminated because it is too low? δ/s = 2.0 > 1.91
Reject 1.28
X̄ ± s (n) = 1.60 ± 0.10 (8)

Report: X̄, s, n = 1.60, 0.10 µg g⁻¹, 8 (One low data point
was rejected by Chauvenet's criterion)
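The rejection test of the example can be sketched as follows. The lookup table is copied from Table A-5-1; the function name `chauvenet_reject` is hypothetical, s is used in place of σ as in the example, and only the single most extreme value is tested per call (stepwise use means calling it again on the reduced data set).

```python
from statistics import mean, stdev

# Critical ratios delta/sigma copied from Table A-5-1.
CHAUVENET = {3: 1.37, 4: 1.53, 5: 1.64, 6: 1.73, 7: 1.80, 8: 1.87,
             9: 1.91, 10: 1.97, 20: 2.24, 30: 2.39, 50: 2.57, 100: 2.80}


def chauvenet_reject(data):
    """Test the most extreme value; return (kept_values, rejected_or_None).

    Only values of n tabulated in Table A-5-1 are supported in this sketch.
    """
    limit = CHAUVENET[len(data)]
    x_bar, s = mean(data), stdev(data)
    suspect = max(data, key=lambda x: abs(x - x_bar))
    if abs(suspect - x_bar) / s > limit:
        kept = list(data)
        kept.remove(suspect)
        return kept, suspect
    return list(data), None


data = [1.60, 1.59, 1.71, 1.28, 1.60, 1.48, 1.65, 1.73, 1.43]
kept, rejected = chauvenet_reject(data)
print(rejected, round(mean(kept), 2), round(stdev(kept), 2), len(kept))
```

As the text cautions, the rejection criterion should be chosen before the data are evaluated, not after.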
A.5.4 Calibration using linear regression (see Natrella, 1963)
A calibration is usually performed to relate a measurement such as
recorder response to one which is more useful such as concentration. If the
relationship is linear;
Y = a + m C
The normal model used for the calculation of the regression assumes that there
is no error in the independent variable, C. Careful calibrant preparation can
arrange s_C << s_Y and therefore the regression should be performed according to
the above equation. The appropriate formulas are:
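Formulas

slope:

$$m = \frac{n\sum CY - \sum C \sum Y}{n\sum C^2 - \left(\sum C\right)^2} \qquad \text{(a-5-4)}$$

intercept:

$$a = \frac{\sum Y}{n} - m\,\frac{\sum C}{n} \qquad \text{(a-5-5)}$$

variance of Y:

$$s^2 = \frac{1}{n-2}\left[\sum Y^2 - \frac{1}{n}\left(\sum Y\right)^2 - m\sum CY + \frac{m}{n}\sum C \sum Y\right] \qquad \text{(a-5-6)}$$

variance of slope:

$$s_m^2 = \frac{s^2}{\sum C^2 - \left(\sum C\right)^2 / n} \qquad \text{(a-5-7)}$$

variance of intercept:

$$s_a^2 = \frac{s^2 \sum C^2}{n\sum C^2 - \left(\sum C\right)^2} \qquad \text{(a-5-8)}$$

correlation coefficient:

$$r = \frac{n\sum CY - \sum C \sum Y}{\sqrt{\left[n\sum C^2 - \left(\sum C\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \qquad \text{(a-5-9)}$$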
A.5.4.1 Linearity
How best to perform a calibration should be investigated as part of
method validation and prior to preparing a laboratory protocol. If it is
desired to use a linear regression formula, the "linearity" of the calibration
should be demonstrated along with the range for which it is linear. The
protocol should specify that calibrations and samples must always be within this
linear range.
The linearity test can be performed if there are replicate observations
on Y (recorder response) at one or more values of C (concentration). The
recommended 5 calibration points in triplicate would supply exactly this sort of
information.
Suppose there are n different concentrations, and that at each
concentration, k_i (i = 1, n) replicate observations of recorder response are
made. Table A-5-2 shows a worked example of the linearity test using
hypothetical data. To demonstrate linearity, the calculated F should be less
than the F_crit (taken from common statistical tables) at the chosen significance
level, normally α(1) = 0.05. The worked example shows that the first 6
recorder responses have a linear relationship with concentration but that as the
concentration gets larger the calibration becomes non-linear and response tends
to drop off. If the calibration data are plotted, it will be seen that the
recorder response is tending to drop off slightly even at 70 µg L⁻¹
concentration, and inclusion of this point would cause a negative bias in
estimated concentration near zero and a positive bias near 70 µg L⁻¹. Therefore
the linearity test is not a substitute for analytical experience, and
calibration data should be plotted to get a clear idea of what the recorder
response function is like. The linearity test and the plotted data lead one to
conclude for the hypothetical case in Table A-5-2, that linear regression is
safe provided concentration of analyte presented to the instrument is no more
than 50 µg L⁻¹.
TABLE A-5-2 Linearity test

Columns: (1) C_i, concentration (µg L⁻¹); (2) Y, recorder response (cm); (3) ΣY;
(4) (ΣY)²; (5) ΣY²; (6) k_i; (7) k_iC_i = (6)x(1); (8) k_iC_i² = (6)x(1)²;
(9) ΣCY = (1)x(3); (10) (ΣY)²/k_i = (4)÷(6).

(1)      (2)                           (3)      (4)      (5)        (6)  (7)      (8)       (9)       (10)
0.00     0.2, 0.4, 0.7, 0.5            1.8      3        0.94       4    0.00     0         0.00      1
5.00     12.0, 11.5, 11.4              34.9     1218     406.21     3    15.00    75        174.5     406
10.00    23.4, 23.8, 23.0              70.2     4928     1643.0     3    30.00    300       702.0     1642.7
30.00    69.0, 71.1                    140.1    19628    9816.2     2    60.00    1800      4203.0    9814
50.00    117.1, 110.2, 115.2, 114.4    456.9    208758   52214.9    4    200.00   10000     22845.0   52190
70.00    148.1, 150.3, 142.1           440.5    194040   64716.1    3    210.00   14700.0   30835.0   64680

n = 6, j = k - n = 13           Σ      1144.4            128797     19   515.00   26875     58759.5   128734

Calculations (bracketed numbers are the column sums)

S1 = (10) - (3)²/k
b  = [(9) - (7)(3)/k] / [(8) - (7)²/k]
S2 = b[(9) - (7)(3)/k]
S3 = (5) - (3)²/k

$$F = \left[\frac{S_1 - S_2}{S_3 - S_1}\right]\left[\frac{k - j}{j - 2}\right] = \left[\frac{59805.0 - 59579.8}{59868.0 - 59805.0}\right]\left[\frac{6}{11}\right] = 1.95 \qquad F_{crit}(6, 11) = 3.09$$

∴ F < F_crit, ∴ Calibration is linear

Suppose we have two more calibrations at higher concentration

(1)      (2)                           (3)      (4)      (5)        (6)  (7)      (8)       (9)       (10)
100      202.1, 204.3, 198.0, 203.1    807.5    652056   163036.5   4    400      40000     80750     163014
150      268.1, 270.3, 265.0           803.4    645452   215164.7   3    450      67500     120510    215151

n = 8, j = k - n = 17           Σ      2755.3   1723083  506998     25   1365     134375    260020    506899

$$F = \left[\frac{203232 - 200647}{203331 - 203232}\right]\left[\frac{8}{15}\right] = 13.9 \qquad F_{crit}(8, 15) = 2.64$$

F > F_crit ∴ Calibration is not linear
A.5.4.2 Linear regression
Assuming that linearity has been established, linear regression can be
applied using the well known formulas given above. The worked example below
(using selected data from Table A-5-2) shows what to report.
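The regression formulas a-5-4 to a-5-9 can be evaluated directly from the running sums. The sketch below is illustrative only (the function name `linear_calibration` and its return layout are hypothetical) and uses the five calibration points of this section's example; with those points it should return roughly m = 2.35, a = 0.11 and s² = 0.26.

```python
from math import sqrt


def linear_calibration(c, y):
    """Least-squares line Y = a + m*C with the statistics of eqns a-5-4 to a-5-9."""
    n = len(c)
    sum_c, sum_y = sum(c), sum(y)
    sum_cc = sum(ci * ci for ci in c)
    sum_yy = sum(yi * yi for yi in y)
    sum_cy = sum(ci * yi for ci, yi in zip(c, y))

    m = (n * sum_cy - sum_c * sum_y) / (n * sum_cc - sum_c ** 2)        # a-5-4
    a = sum_y / n - m * sum_c / n                                       # a-5-5
    s2 = (sum_yy - sum_y ** 2 / n - m * sum_cy
          + m * sum_c * sum_y / n) / (n - 2)                            # a-5-6
    s2_m = s2 / (sum_cc - sum_c ** 2 / n)                               # a-5-7
    s2_a = s2 * sum_cc / (n * sum_cc - sum_c ** 2)                      # a-5-8
    r = (n * sum_cy - sum_c * sum_y) / sqrt(
        (n * sum_cc - sum_c ** 2) * (n * sum_yy - sum_y ** 2))          # a-5-9
    return {"n": n, "m": m, "a": a, "s2": s2,
            "s_m": sqrt(s2_m), "s_a": sqrt(s2_a), "r": r,
            "c_mean": sum_c / n}


# Calibration points (C in ug/L, Y in cm) from the example in this section.
conc = [0.00, 5.00, 10.00, 30.00, 50.00]
signal = [0.2, 12.0, 23.0, 71.1, 117.2]

fit = linear_calibration(conc, signal)
print({key: round(value, 4) for key, value in fit.items()})
```

Because eqn. a-5-6 subtracts nearly equal large numbers, intermediate sums should be carried at full precision and rounded only when reporting.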
Example: Suppose we have the following calibration data reported as
(C µg L⁻¹, Y cm).
(0.00, 0.2), (5.00, 12.0), (10.00, 23.0), (30.00, 71.1),
(50.00, 117.2)
n = 5          s_m = 0.0122
m = 2.346      s_a = 0.325
a = 0.111      C̄ = 19.0
s² = 0.257

Report: Calibration was done by linear regression, Y = a + mC where
Y (cm) is recorder signal and C (µg L⁻¹) is analyte
concentration for the range 0 - 50 µg L⁻¹, n = 5. (The
linearity test shows calibration to be linear within this
range at the α(1) = 0.05 significance level.)
Y = 0.30 + 2.32 C, s = 0.303, s_a = 0.194, s_m = 0.00732, C̄ = 19.0
A.5.5 Blank correction
As noted in the text, care needs to be taken to assess correctly the
error contributed by the blank. With an independent blank for each sample
(seldom the case), calculation of net concentration will correctly contain error
from the total and blank, and standard deviation may be calculated as shown in
eqn. a-5-1 above. More commonly, a limited number of blanks are available and
their average is used to correct total to net concentration for a given data
batch. When this is done, variation in the blank is eliminated and will not
show up in the final error statement. This can easily be corrected through the
propagation of error formula. The following example illustrates the difference
in "apparent error" between the two methods of calculating, and how the
correction should be made when an average blank is used.
Example: The following blanks and samples are measured (µg L⁻¹)

Total, T_i             171   149   157   161   138     T̄, s_T = 155.2, 12.5
Blank, B_i             11    18    23    0     7       B̄, s_B = 11.8, 9.0
a) T_i - B_i = C_i     160   131   134   161   131     C̄, s_C = 143, 15.7
b) T_i - B̄ = Q_i       159   137   145   149   126     Q̄, s_Q† = 143, 12.5

† The important point is that error associated with Q̄ is
identical to s_T and underestimates s_C, the true error. The
correct error associated with Q can be calculated according
to the propagation of error formulas (Appendix 4);

$$s_Q = \sqrt{s_T^2 + s_B^2} = \sqrt{12.5^2 + 9.0^2} = 15.4$$

Report: B̄, s_B, n = 11.8, 9.0, 5 (µg L⁻¹)
Also make a statement about how blanks were run (randomly)
how totals were blank corrected (a or b) and finally how
error from the blank was factored into the final error
statement. (See Appendix 7 for a complete example).
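The difference between the two ways of blank-correcting can be checked numerically. The following is a minimal sketch using the totals and blanks of the example; the variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Totals and blanks from the example (ug/L).
totals = [171, 149, 157, 161, 138]
blanks = [11, 18, 23, 0, 7]

# Method (a): each total has its own independent blank.
c = [t - b for t, b in zip(totals, blanks)]
s_c = stdev(c)

# Method (b): the average blank is subtracted from every total, so the
# spread of Q only reflects the spread of the totals.
b_bar = mean(blanks)
q = [t - b_bar for t in totals]
s_q_apparent = stdev(q)

# Corrected error for method (b): add the blank variance back in quadrature.
s_q_corrected = sqrt(stdev(totals) ** 2 + stdev(blanks) ** 2)

print(round(s_c, 1), round(s_q_apparent, 1), round(s_q_corrected, 1))
```

The corrected value for method (b) comes out close to the standard deviation obtained with independent blanks in method (a), which is the point of the example.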
A.5.6 Limit of detection (LOD)
A.5.6.1 IUPAC definition

Formula:

$$\text{LOD (IUPAC)} = \frac{3\sigma_B}{m} \qquad \text{(a-5-10)}$$

$$\text{LOD (IUPAC)} = \frac{3s_B}{m} \quad (n > 20) \qquad \text{(a-5-11)}$$

$$\text{LOD (IUPAC)} = \frac{ts_B}{m} \quad (n \text{ small}) \qquad \text{(a-5-12)}$$

where t is $t_{0.001(1),\,n-1}$ and s_B is calculated in terms of recorder
signal output.

Example: The following blank peak heights were measured (cm) with
the system calibrated according to the example in A.5.4.2
above.
2.1, 3.0, 1.4, 5.1, 4.1, 2.6, 3.3, 1.9, 1.1, 5.6, 3.8
Ȳ_B, s_B, n = 3.09, 1.5, 11
t_0.001(1),10 = 4.1
m = 2.346

$$\text{LOD} = \frac{4.1 \times 1.5}{2.346} = 2.6\ \mu\text{g L}^{-1}$$

If the normal procedure is to digest about 0.8 g (dry
weight of sediment) and make up final volume to 50 mL,
this corresponds to:

$$\text{LOD} = \frac{2.6}{0.8} \times \frac{50}{1000} = 0.16\ \mu\text{g g}^{-1}\ \text{(dry weight)}$$

Report: LOD (IUPAC) = 2.6 µg L⁻¹
(LOD = ts_B/m where α(1) = 0.001, m = sensitivity and n = 11)
Normally about 0.8 g of sediment are used and digests
are made up to 50 mL which corresponds to an LOD of
0.16 µg g⁻¹
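A.5.6.2 LOD by propagation of error (includes calibration error)

Formula: The blank expressed as a concentration is;

$$B = \frac{Y_B - a}{m} \qquad \text{(a-5-13)}$$

Therefore according to the propagation of error (Appendix 4)

$$3\sigma_B = 3\,\frac{\bar{Y}_B - a}{m}\sqrt{\frac{\sigma_{Y_B}^2 + \sigma_a^2}{(\bar{Y}_B - a)^2} + \frac{\sigma_m^2}{m^2}} \qquad \text{(a-5-14)}$$

or for small numbers of blanks and calibrations

$$3\sigma_B \approx \frac{\bar{Y}_B - a}{m}\sqrt{\frac{(ts_B)^2 + (ts_a)^2}{(\bar{Y}_B - a)^2} + \frac{(ts_m)^2}{m^2}} \qquad \text{(a-5-15)}$$

where $\sigma_{Y_B}$ and s_B refer to the blank in terms of recorder signal, and
Student's t (α(1) = 0.001) is based on the
respective degrees of freedom for the blank (n-1) or
residual error of the calibration (n-2).

Example: Assume the same blanks given before (A.5.6.1) and the
calibration data (A.5.4.2). We have
a = 0.30          s_a = 0.194
s_B = 1.5         m = 2.35
n_B = 11          s_m = 0.00732
n_c = 5
t(0.001,10) = 4.14
t(0.001,3) = 10.2

$$\text{LOD (eqn a-5-15, n small)} = 1.19\sqrt{4.95 + 0.50 + 0.001} = 2.8\ \mu\text{g L}^{-1}$$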
Evidently the contribution to LOD from error in the
intercept is small and from error in the slope is
negligible, but this will not always be the case.
Normally, the intercept error will contribute more than
slope error so it is important to calibrate near zero to
fix the intercept accurately.
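Both LOD estimates can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the t values are the tabulated ones quoted in the text rather than computed, and the blank statistics are the rounded values reported in A.5.6.1.

```python
from math import sqrt


def lod_iupac(s_blank, slope, t):
    """Eqn a-5-12: LOD = t * s_B / m (s_B in recorder signal units)."""
    return t * s_blank / slope


def lod_propagated(y_blank, s_blank, t_blank, a, s_a, slope, s_m, t_cal):
    """Eqn a-5-15: LOD including the error of the calibration line."""
    net = y_blank - a
    relative_variance = (((t_blank * s_blank) ** 2 + (t_cal * s_a) ** 2) / net ** 2
                         + (t_cal * s_m) ** 2 / slope ** 2)
    return net / slope * sqrt(relative_variance)


def per_gram(lod_ug_per_l, volume_ml=50.0, mass_g=0.8):
    """Convert a solution LOD (ug/L) to a sediment LOD (ug/g dry weight)."""
    return lod_ug_per_l * volume_ml / 1000.0 / mass_g


# Values quoted in the worked examples of A.5.6.1 and A.5.6.2.
iupac = lod_iupac(s_blank=1.5, slope=2.346, t=4.14)
propagated = lod_propagated(y_blank=3.09, s_blank=1.5, t_blank=4.14,
                            a=0.30, s_a=0.194, slope=2.35, s_m=0.00732,
                            t_cal=10.2)

print(round(iupac, 1), round(per_gram(iupac), 2))
print(round(propagated, 1), round(per_gram(propagated), 2))
```

Keeping the three variance terms separate, as in `lod_propagated`, makes it easy to see which of blank, intercept or slope dominates for a given calibration.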
Report: LOD (propagation of error) = 2.8 µg L⁻¹ (Based on
11 blanks and 5 calibration points, and normalized to 3σ
using the appropriate t values. Give formula used, or
reference to it.)
For 0.8 g sediment digests made up to a working volume
of 50 mL this corresponds to an LOD of 0.17 µg g⁻¹.
A.5.7 Symbols used in Appendix 5
(A bar over any symbol denotes average)

a      Intercept on a calibration line.
B      Blank measurement, µg L⁻¹.
C      Concentration after applying calibration formula, µg L⁻¹.
CV     Coefficient of variation (CV = s/X̄ x 100).
d      Difference between two replicates, X_1 - X_2.
δ      Distance away from an average, X_i - X̄.
k      The number of standard deviations used for determining L.O.D.
LOD    The limit of detection.
m      The slope of a calibration line (analytical sensitivity).
n      The number of samples, blanks (n_B) or calibration points (n_c).
Q      Samples corrected by subtracting an average blank, µg L⁻¹.
s      Standard deviation.
s_a    Standard deviation of the intercept of a linear calibration.
s_B    Standard deviation of the blank.
s_m    Standard deviation of the slope of a linear calibration.
s_p    Pooled standard deviation.
s_Q    Standard deviation of a sample corrected with an average blank.
T      Samples after applying a calibration formula but before a blank
       correction.
t      Student's t.
X      Concentration of metal in a sediment, µg g⁻¹.
Y      Signal from an instrument (mm, volts etc).
Y_B    Signal from an instrument when measuring a blank.
σ      Population standard deviation.
ν      Degrees of freedom, normally n-1.
A.6 APPENDIX 6 CONTROL CHARTS
Here I give worked examples of three simple control charts which should
be adequate for most laboratory analyses. They are the Shewhart chart (good for
detecting a shift in results), the Youden control chart (helps to differentiate
random and systematic error problems) and the standard deviation or range chart
(good for detecting increased variability or noise). Use of combinations of
these control charts should help to spot and diagnose problems quickly. More
complex approaches are possible and the interested laboratory should refer to
the ASTM manual (1976) or to Friedman and Erdmann, 1982.
To prepare a control chart, a stable measurement is required, and we
have suggested (1) Reference material, (2) Blind replicates, (3) Check
calibrants and (4) Blanks as possible candidates. Additionally, physical
measurements such as weighing need some control chart attention. A faulty
balance can generate error which will pervade all measurements made in a
laboratory.
A.6.1 The Shewhart Control Chart
The following hypothetical Cd data (µg g⁻¹) for reference material
(MESS-1) have been generated over a number of days, using the same procedure,
analyst and instrumentation.
0.63, 0.58, 1.8, 0.50, 0.40, 0.59, 0.58, 0.64, 0.59, 0.62, 0.61, 1.0, 0.40,
0.40, 0.58, 0.58, 0.55, 0.49, 0.63, 0.68, 0.62, 0.59, 0.54, 0.58, 0.60, 0.65;
X, s, n = 0.63, 0.26, 26.
The numbers 1.8, and 1.0 look suspiciously high, and both can be eliminated
(stepwise) by Chauvenet's criterion. This should be done, because we want the
control chart to be based on samples which we feel were "in control". The
revised estimates are X̄, s, n = 0.568, 0.078, 24. The chart may now be drawn
with a line representing the mean, X̄, and two control limits which are X̄ ± 3s.
Subsequent data collected during several weeks are marked on the control chart
(Fig. A-6-1), along with pertinent comments.
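Setting up the chart limits from the "in control" data can be sketched as below. The data are the 24 hypothetical values that remain after 1.8 and 1.0 are removed; the names `limits` and `classify` are hypothetical, and the ±2s warning limits are taken from the chart summary in A.6.4.1.

```python
from statistics import mean, stdev

# Hypothetical MESS-1 Cd results (ug/g) with 1.8 and 1.0 already rejected.
in_control = [0.63, 0.58, 0.50, 0.40, 0.59, 0.58, 0.64, 0.59, 0.62, 0.61,
              0.40, 0.40, 0.58, 0.58, 0.55, 0.49, 0.63, 0.68, 0.62, 0.59,
              0.54, 0.58, 0.60, 0.65]

x_bar, s = mean(in_control), stdev(in_control)

limits = {
    "action_low": x_bar - 3 * s, "warning_low": x_bar - 2 * s,
    "mean": x_bar,
    "warning_high": x_bar + 2 * s, "action_high": x_bar + 3 * s,
}


def classify(x):
    """Place a new control result relative to the warning and action limits."""
    z = abs(x - x_bar) / s
    if z > 3:
        return "action"
    if z > 2:
        return "warning"
    return "in control"


print({name: round(value, 3) for name, value in limits.items()})
print(classify(0.60), classify(0.75), classify(0.85))
```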
Examination of the Shewhart control chart shows its potential utility.
For example in Figure A-6-1, new standards seem to contribute to variance
[Figure A-6-1. Shewhart control chart: Cd concentration (µg g⁻¹) in MESS-1 plotted by date (February 1984), with lines for the mean (X̄), the warning limits (X̄ ± 2s) and the action limits (X̄ ± 3s), and annotations marking "New Batch of Graphite Tubes" and "New Standards".]
TABLE A-6-1

PAIR   MESS-1       BCSS-1       MESS +    MESS -
       Cd µg g⁻¹    Cd µg g⁻¹    BCSS      BCSS
n      X            Y            T         D
1      0.63         0.28         0.91      0.35
2      0.58         0.25         0.83      0.33
3      1.8*         0.27
4      0.40         0.20         0.60      0.20
5      0.59         0.27         0.86      0.32
6      0.58         0.27         0.85      0.31
7      0.64         0.30         0.94      0.34
8      0.59         0.68*
9      0.62         0.26         0.88      0.36
10     0.68         0.30         0.98      0.38
11     0.55         0.27         0.82      0.28
12     0.50         0.18         0.68      0.32
13     0.59         0.22         0.71      0.37
14     0.61         0.26         0.87      0.35
15     0.49         0.20         0.69      0.29
16     0.52         0.22         0.74      0.30
17     0.66         0.18         0.84      0.48*
18     0.60         0.24         0.84      0.36
19     0.49         0.18         0.67      0.31
20     0.55         0.26         0.81      0.29
21     0.51         0.20         0.70      0.30
AVG    0.56         0.24         0.799     0.320
s      0.068        0.039        0.106     0.0423
n = 18

* Data points removed according to Chauvenet's criterion.

CALCULATIONS
s_R = 0.030
s_S = 0.069
s = 0.075

CONTROL LIMITS
2s_D = 0.085
3s_D = 0.127

NOTE: Variances for X and Y appear to be different.
[Figure A-6-2. Youden control method. X-Y plot of MESS-1 against BCSS-1 Cd (µg g⁻¹) marking the certified value (+) and the average (o, n = 18), with an inner circle of radius 2s_R = 0.060 and an outer circle of radius 2s = 0.150; accompanied by control charts of D (MESS - BCSS) and T (MESS + BCSS) against days, each with warning limits.]
considered. If the T chart shows points exceeding the warning limit while the D
chart does not, bias can be suspected as the cause.
The utility of the Youden method can be seen in Figure A-6-2 where data
collected during 8 days (batches, etc.) have been plotted. In the top left hand
X-Y plot, systematic errors are evident since points generally form an ellipse.
On day 4 and day 8, there seems to be some problem with random error. On days
3, 4 and 6, systematic error is a major contributor.
A.6.3 The range control chart
Either a range or standard deviation chart may be used to control
precision. Range is used here since it is easier to calculate and understand by
the average observer. This control chart may be set up if measurements are
being replicated a fixed number of times per batch or time period, and therefore
can be used in conjunction with a Shewhart control chart for mean concentration.
To set up the control chart, calculate range for each set of replicates,
and eventually estimate average range, R̄ = ΣR_i/N. The control limits may now be
calculated by multiplying R̄ by a factor which can be found in published Tables
(ASTM, 1976). For most cases, where the number of replicates is small (< 7)
this process is likely to be simple. The lower limit multiplier, D_L, is zero and
the upper limit multiplier, D_U, for the respective n's is 3.27 n=2, 2.57 n=3,
2.28 n=4, 2.11 n=5, 2.00 n=6.
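The range chart limits follow directly from the multipliers quoted above. The sketch below is illustrative only: the function name `range_chart_limits` is hypothetical and the three duplicate sets are made-up numbers used purely to exercise the calculation.

```python
# Upper limit multipliers quoted in the text for n = 2 to 6 replicates;
# the lower limit multiplier is zero over this range.
D_UPPER = {2: 3.27, 3: 2.57, 4: 2.28, 5: 2.11, 6: 2.00}


def range_chart_limits(replicate_sets):
    """Return (average range, lower limit, upper limit) for a range chart.

    Every set must contain the same number of replicates (2 to 6).
    """
    n = len(replicate_sets[0])
    if any(len(group) != n for group in replicate_sets):
        raise ValueError("all sets must have the same number of replicates")
    ranges = [max(group) - min(group) for group in replicate_sets]
    r_bar = sum(ranges) / len(ranges)
    return r_bar, 0.0, D_UPPER[n] * r_bar


# Hypothetical duplicate determinations from three batches.
batches = [(1.61, 1.29), (1.18, 1.25), (1.72, 1.60)]
r_bar, lower, upper = range_chart_limits(batches)
print(round(r_bar, 3), lower, round(upper, 3))
```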
A.6.4 Summary
A.6.4.1 Chart preparation
1. Establish parameters, preferably with n ≥ 20
2. Prepare chart using X̄ ± 2s (warning), ± 3s (action)
3. Maintain control and plot subsequent data.
4. Update parameters from time to time. F-tests can be used to
establish if variances are the same, while "t" tests (assuming
underlying normality of control data) can be used to decide if
means are the same. For example, could we up-date the data in
Table A-6-1 with the data plotted in Figure A-6-2 (n=8)? Table
A-6-2 lists the data, and subsequent calculations show that we
could pool the data sets to derive a new control chart. If
groups prove to be different a reason should be sought and the
process must be restarted.

TABLE A-6-2: Update Data Plotted in Figure A-6-2

PAIR   X        Y       T       D
1      0.60     0.28    0.88    0.32
2      0.53     0.24    0.77    0.29
3      0.52     0.15    0.67    0.37
4      0.41*    0.20    0.61    0.21
5      0.62     0.25    0.87    0.37
6      0.66     0.31    0.97    0.35
7      0.53     0.19    0.72    0.44
8      0.60     0.15    0.75    0.35
AVG    0.58     0.22    0.80    0.36
s      0.054    0.063   0.106   0.047
n = 7

* We will eliminate this data point since it
lies outside the 2s circle in Figure A-6-2.

CALCULATIONS
s_R = 0.033
s_S = 0.082
s = 0.088
F = s(n=7)/s(n=18) = 0.088/0.075 = 1.17 (F_crit = 3.10)

Variances are essentially the same. Similarly, "t" tests cannot distinguish the X or Y data from the two time periods. Therefore data could be pooled with the earlier set to give n=25.
A.6.4.2 Chart interpretation
1. Points outside the action limits require investigation. Data accumulated
since last known control must be redone.
2. Trends, for example 7 points consecutively on the same side of the mean
should be investigated.
3. The charts should be watched for cycles or periodicity which takes place
within the control limits. Cause should be investigated and this source of
error eliminated.
A.7 APPENDIX 7 CALCULATING AND REPORTING THE ERROR STATEMENTS.
What to report and how to report it have been dealt with in the text. A
complete data report will include complete validated methods and their
performance characteristics, physical handling and appearance of samples,
sampling strategy and some statement about laboratory quality assurance tactics.
However, at some point we are going to want to report our data, give some
estimate of error and how we calculated it, and the model used. To illustrate
how this can be done for a real data set, some hypothetical data for Hg
determinations in sediments are shown in Table A-7-1. This data set includes
calibration, blanks, reference material and replicate samples. Ideally, all of
these have been run randomly as shown by the sequential number. Furthermore,
our hypothetical laboratory checks its balances (using a control chart and NBS
standards at a high and a low weight), keeps a control chart on blanks and blind
reference materials, and checks the instrument calibration by inserting two
blind calibrants (high and low) prepared by the lab manager. Before data are
allowed to go out of the laboratory, the calibrant, reference materials and
blanks must be "in control". One advantage of the control charts is that one
can draw on a longer period of experience with the method to assess average and
variance of the blank, and also assess bias from the long term experience with
reference materials.
Here, error will be treated in a "modular" fashion, dealing with each
aspect as a separate entity and putting the error statement together at the end.
This will require logical thinking, and a due consideration of all elements
which potentially contribute error to the final reported value. Also it should
be anticipated that some problems will arise which require an operational
decision. For example if our detection limit is expressed as an absolute
amount, ng Hg, then in terms of sediment concentration, ng g⁻¹, it will depend on
how much sediment we originally used.
TABLE A-7-1
RAW DATA AS LISTED IN LABORATORY DATA SHEETS

ANALYST: MIKE ROMOLE     DATE: JUNE 5/83     FILE #: JUN-5-83

SAMPLE MEASUREMENTS AND DERIVED QUANTITIES

Columns: SAMPLE #; SEQ† #; w, sediment weight; Y, peak height;
Y - Ȳ_B, blank corrected peak height; (Y - Ȳ_B - a)/m, Hg (ng);
(Y - Ȳ_B - a)/(mw), Hg (ng g⁻¹); and, for the s_p calculation,
AVERAGE Hg (ng g⁻¹), s and ν.

SAMPLE   SEQ†   w       Y      Y - Ȳ_B   Hg ng   Hg ng g⁻¹   AVERAGE   s      ν
1        8      0.291   35.3   30.1      21.9    75.4        73.4      2.83   1
         36     0.263   31.2   26.0      18.8    71.4
2        28     0.375   44.2   39.0      28.8    76.9        70.9      8.49   1
         17     0.474   46.7   41.5      30.8    64.9
3        35     0.247   27.9   22.7      16.2    65.7        66.8      0.99   2
         18     0.222   26.3   21.1      15.0    67.5
         51     0.232   27.1   21.9      15.6    67.3
4        52     0.303   31.2   26.0      18.8    61.9        57.1      6.79   1
         2      0.365   31.6   26.4      19.1    52.3
5        26     0.632   44.1   38.9      28.7    45.5        51.8      8.90   1
         19     0.484   43.3   38.1      28.1    58.1
6        49     0.387   35.0   29.8      21.7    56.1        57.4      1.84   1
         53     0.329   31.9   26.7      19.3    58.7
7        48     0.223   17.1   11.9      7.9     35.3        37.8      3.54   1
         37     0.205   17.6   12.4      8.3     40.3
8        24     0.451   46.0   40.8      30.2    67.0        67.2      0.40   2
         33     0.458   46.6   41.4      30.7    67.0
         7      0.402   42.1   36.9      27.2    67.7
                                                                       Σν_i = 10
BCSS-1   54     0.567   91.4   86.2      65.3    115         Control limit X̄ - 2s = 105
MESS-1   42     0.498   97.1   91.9      69.7    140         Control limit X̄ - 2s = 130

BLANKS (2 DAYS, POOLED)

DATE     SEQ #   PK HT
5/6/83   4       6.5
5/6/83   1       3.8
5/6/83   6       4.8
4/6/83           6.0
4/6/83           5.4
4/6/83           3.7
4/6/83           6.3

† SEQ # gives order in which determinations were run (italics)
TABLE A-7-1 CONTINUED

CALIBRATION SHEET     DATE: JUNE 5/83     FILE #: JUN-5-83

W, Hg (ng)    Y, PEAK HEIGHT (SEQ # in brackets)
0.0           0.0 (40), 0.4 (34), 0.0 (43), 0.0 (32)
20.0          27.0 (22), 30.0 (14), 24.1 (38), 28.0 (50)
30.0          41.5 (12), 43.5 (27), 44.4 (5), 38.5 (46)
40.0          55.7 (11), 56.9 (16), 54.3 (30)
50.0          67.8 (15), 63.5 (23), 62.4 (29)
60.0          76.8 (20), 77.1 (13)
70.0          93.3 (3)
W_1*          8.4 (21), 8.8 (10), 7.6 (25), 8.0 (39), 8.1 (47)         Ȳ_1, k = 8.2, 5
W_2*          74.0 (41), 70.1 (31), 75.1 (44), 74.3 (9), 74.0 (45)     Ȳ_2, k = 73.5, 5

* W_1, W_2: Check calibrants supplied by lab manager.
Analyses started at 0900, finished at 1100.
A.7.1 Error elements for the data reported in Table A-7-1.
A.7.1.1 Calibration
A linearity check need not be performed every time a calibration is run,
but it certainly should be checked initially as part of method validation, and
from time to time. A check on the data in Table A-7-1 gives
$F = 0.75 < F_{\alpha(1)=0.05,\,7,\,12} = 2.91$. Therefore linear regression is appropriate
within the amounts of Hg (ng) shown in the Table. A check confirms that no
samples were measured outside the linear calibration range.
Results of linear regression are (Section A.5.4)
n = 21        s = 2.34
m = 1.295     s_m = 0.0249
a = 1.70      s_a = 0.935
r = 0.9965    W̄ = 31.4 ng
where Y = a + mW, Y = peak height units and W = mass of Hg, ng.
Given that the calibration is linear, and a linear regression has been
performed, the next task is to check the calibration against the blind
calibrants. Although control charts can and should be prepared to assess long
term performance, it is probably simpler and more internally consistent to
confirm that the blind calibrants lie within some arbitrary confidence limit
(95% for example) determined by the regression. If k replicate recorder
readings are made on each of the blind calibrants, the appropriate 95%
confidence interval is:
$$ t s_Y = t_{0.05(2),\,n-2}\,\sqrt{ s_c^2\left[\frac{1}{k} + \frac{1}{n}\right] + (W_i - \bar{W})^2 s_m^2 } \qquad \text{(a-7-1)} $$
For blind calibrant 1, $W_1 = 5.0$ ng, $k = 5$:
$$ t s_Y = 2.09\sqrt{5.48\left[\tfrac{1}{5} + \tfrac{1}{21}\right] + (26.4)^2 \times 0.000620} = 2.09\sqrt{1.357 + 0.432} = 2.8 $$
Therefore the average, $\bar{Y}$, of the 5 readings on $W_1$ must lie within the
95% confidence band:
8.2 ± 2.8 (peak height units)
Similarly for blind calibrant 2, $W_2 = 55$ ng, and the average of 5 readings on $W_2$
must lie within
72.9 ± 2.7 (peak height units)
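The check on a blind calibrant can be sketched in a few lines. This is only an illustration of the confidence-band expression, using the regression summary reported in A.7.1.1 and the tabulated value t = 2.09; the function names `calibrant_band` and `in_control` are hypothetical.

```python
import math

# Regression summary reported in A.7.1.1
A, M = 1.70, 1.295          # intercept and slope (peak height = A + M * W)
S_C, S_M = 2.34, 0.0249     # residual and slope standard deviations
N, W_BAR = 21, 31.4         # calibration points and mean calibrant mass (ng)
T_95 = 2.09                 # two-tailed Student's t, 0.05, n - 2 = 19 d.f.


def calibrant_band(w, k):
    """Predicted mean peak height and 95% half-width for k readings at mass w."""
    predicted = A + M * w
    variance = S_C ** 2 * (1.0 / k + 1.0 / N) + (w - W_BAR) ** 2 * S_M ** 2
    return predicted, T_95 * math.sqrt(variance)


def in_control(mean_reading, w, k):
    """True if the mean of k readings lies inside the confidence band."""
    predicted, half_width = calibrant_band(w, k)
    return abs(mean_reading - predicted) <= half_width


predicted, half_width = calibrant_band(55.0, 5)
print(f"{predicted:.1f} +/- {half_width:.1f}")   # band for blind calibrant 2
print(in_control(73.5, 55.0, 5))                 # observed mean of W2 readings
```

For blind calibrant 2 (55 ng, five readings) this gives the band 72.9 ± 2.7 peak height units, and the observed mean of 73.5 falls inside it.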
The data on $W_1$ and $W_2$ in Table A-7-1 show that the calibration is acceptable,
and we have no reason to believe that the blind calibrants come from a different
population. The blind calibrants could now be included in a revised calibration,
but this will not be done here.
A.7.1.2 Blanks
The three blanks run with this batch are within the blank control limits
so blanks will be pooled with those from the day before to give a better
estimate.
Therefore $\bar{Y}_B$, $s_B$, $n_B$ = 5.21, 1.15, 7 (peak height units)
(eqn. a-5-1)
A.7.1.3 Limit of Detection
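$$ \text{LOD (IUPAC)} = \frac{t_{0.001(1),\,6} \times s_B}{m} \qquad \text{(eqn. a-5-12)} $$
or
$$ \frac{5.21 \times 1.15}{1.3} = 4.6\ \text{ng Hg} $$
$$ \text{LOD (Prop. of Error)} = 2.71\sqrt{2.91 + 0.91 + 0.005} = 5.3\ \text{ng Hg} \qquad \text{(eqn. a-5-15)} $$
Limit of Quantitation is about $\tfrac{10}{3}$ LOD ≈ 18 ng Hg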
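The arithmetic for the two detection limits can be reproduced as follows. This is a minimal sketch: the multiplier 2.71 and the three terms under the root are copied from the worked line for the propagation-of-error LOD rather than re-derived from eqn. a-5-15, and the variable names are illustrative.

```python
import math

T_ONE_TAILED = 5.21   # Student's t, one-tailed 0.001, 6 d.f., as used in the text
S_BLANK = 1.15        # standard deviation of the pooled blanks (peak height units)
SLOPE = 1.3           # calibration slope, rounded as in the text

# IUPAC-style detection limit: t * s_B / m
lod_iupac = T_ONE_TAILED * S_BLANK / SLOPE

# Propagation-of-error form: the multiplier and the three terms under the
# root are taken as printed in the worked example, not re-derived here.
lod_propagated = 2.71 * math.sqrt(2.91 + 0.91 + 0.005)

# Limit of quantitation taken as (10/3) * LOD
loq = 10.0 / 3.0 * lod_propagated

print(round(lod_iupac, 1), round(lod_propagated, 1), round(loq))
```

The three results round to 4.6, 5.3 and 18 ng Hg, matching the values quoted for this batch.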
A.7.1.4 Pooled Variance
We may use the replicate determinations (independent weight, digest,
recorder reading) to calculate a pooled variance.
$s_p$, $\nu$ = 4.73, 10 (ng g$^{-1}$)
(eqn. a-5-2)
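The pooled value can be checked from the per-sample standard deviations and degrees of freedom listed in Table A-7-1. The sketch assumes the usual degrees-of-freedom-weighted form, $s_p = \sqrt{\sum \nu_i s_i^2 / \sum \nu_i}$, which is taken here to be what eqn. a-5-2 expresses; the function name is illustrative.

```python
import math

# (standard deviation, degrees of freedom) for the eight samples in Table A-7-1
REPLICATE_STATS = [
    (2.83, 1), (8.49, 1), (0.99, 2), (6.79, 1),
    (8.90, 1), (1.84, 1), (3.54, 1), (0.40, 2),
]


def pooled_sd(stats):
    """Pooled standard deviation weighted by degrees of freedom."""
    total_df = sum(df for _, df in stats)
    weighted = sum(df * s ** 2 for s, df in stats)
    return math.sqrt(weighted / total_df), total_df


s_p, df_p = pooled_sd(REPLICATE_STATS)
print(round(s_p, 2), df_p)
```

This returns 4.73 ng g⁻¹ on 10 degrees of freedom, in agreement with the value above.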
A.7.2 Putting the error components together.
A.7.2.1 Random error
Now that we have calculated the likely size of error for each of the
above four components, we need to find some succinct way of putting them
together in a meaningful final error statement. In the treatments below, we
will use "t" to indicate Student's "t" and will mean by it $t_{\alpha(2)=0.05}$ (or 95%
confidence limit) for the appropriate degrees of freedom, usually n-1. It
should be noted that "t" is used as a correction or normalizer to convert s to
$\sigma$ (or $2\sigma$, etc.) when s is calculated from a small number of replicates. What we
will attempt to estimate is the random error one would observe if each final
measured or calculated value were arrived at independently. That is, each final
concentration reported had its own blank, calibration, digest and weighing.
Furthermore we will assume that underlying distributions are close to normal.
The first step in assessing error is to specify exactly how we arrived at
the final reported measurement. Here for each Hg concentration, $C_i$,
$$ C_i\ (\mathrm{ng\,g^{-1}}) = \frac{Y_i - \bar{Y}_B - a}{w_i\, m} \qquad \text{(a-7-2)} $$
Note that $Y_i$ and $w_i$ are independent for each sample but that $\bar{Y}_B$, $a$ and $m$ are
constant for all samples. For the completely independent measurement,
$$ C_i = \frac{Y_i - Y_{B_i} - a_i}{w_i\, m_i} \qquad \text{(a-7-3)} $$
If we measure separately each of the errors associated with the elements
in equation a-7-3, propagation of error theory (Appendix 4) will show us how
they should be combined to give the un-biased best-estimate of error associated
with C. While tractable, it is rather messy to do this and some simplifying
assumptions may be possible at this point. For instance, control on our balance
tells us that $\sigma_w$ is very small, and we can ignore it without seriously hurting
our error estimate. Therefore we will use some arbitrary "errorless" weight, $\bar{w}$
(average or minimum for example), when estimating the error. Formula a-4-1
applied to a-7-3, setting $w_i = \bar{w}$, yields
$$ \sigma_C^2 = \frac{1}{\bar{w}^2 m^2}\left[\sigma_Y^2 + \sigma_{Y_B}^2 + \sigma_a^2 + (Y - Y_B - a)^2\,\frac{\sigma_m^2}{m^2}\right] \qquad \text{(a-7-4)} $$
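Now by substituting $ts$ for $\sigma$ we can estimate the combined error (keeping units
the same!). The terms to be evaluated are $\sigma_Y^2/\bar{w}^2 m^2$, $\sigma_{Y_B}^2/\bar{w}^2 m^2$ and
$\sigma_a^2/\bar{w}^2 m^2$. For the blank and intercept terms:
$$ \frac{\sigma_{Y_B}^2}{\bar{w}^2 m^2} = \frac{(t s_B)^2}{\bar{w}^2 m^2} = \left[\frac{2.45 \times 1.15}{0.352 \times 1.29}\right]^2 = 38.5\ (\mathrm{ng\,g^{-1}})^2 $$
$$ \frac{\sigma_a^2}{\bar{w}^2 m^2} = \frac{(t s_a)^2}{\bar{w}^2 m^2} = \left[\frac{2.09 \times 0.935}{0.352 \times 1.29}\right]^2 = 18.5\ (\mathrm{ng\,g^{-1}})^2 $$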
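The two evaluated terms can be checked with a small helper. This is only a sketch of the arithmetic: each term is (t × s)/(nominal weight × slope), squared, with the weight 0.352 g and slope 1.29 used as printed; the function name is illustrative.

```python
W_BAR = 0.352   # nominal "errorless" sample weight, g
M = 1.29        # calibration slope as rounded in the worked terms


def error_term(t, s, w_bar=W_BAR, m=M):
    """Contribution (t*s / (w_bar*m))**2 to the variance of C, in (ng/g)^2."""
    return (t * s / (w_bar * m)) ** 2


blank_term = error_term(2.45, 1.15)       # t for 6 d.f., blank s
intercept_term = error_term(2.09, 0.935)  # t for 19 d.f., intercept s
print(round(blank_term, 1), round(intercept_term, 1))
```

The helper returns 38.5 and 18.5 (ng g⁻¹)², the blank and intercept contributions respectively.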
SAMPLE #*    Hg conc, ng g⁻¹ dry weight    n_p†
1            73.4                          2
2            70.9                          2
3            66.8 (23)                     3
4            57.1                          2
5            51.8                          2
6            57.4                          2
7            37.8 (25)                     2
8            67.2                          3
† Number of determinations used to calculate average.
( ) LOD for measurements falling between LOQ and LOD.
* Samples, standards, blanks and reference material run randomly.
Calibration:
Y = a + mW, where Y = peak height and W = Hg, ng
n = 21, a = 1.70, m = 1.294, s_c = 2.34, s_m = 0.0249, s_a = 0.935
Calibration range 0 - 70 ng Hg.
Calibration checked (2 points).

Blanks:
Three blanks in control; pooled with previous day's blanks.
Ȳ_B, s_B, n_B = 5.21, 1.15, 7

LOD:
Calculated by method of propagation of error (a-5-15).
LOD = 5.3 ng Hg
Sample size range 0.205 - 0.632 g
∴ LOD = 8 - 25 ng g⁻¹ (depending on sample)

POOLED VARIANCE, s_p:
Calculated from the replicate (n_p) determinations.
Includes error of digestion, determination and chart
reading.
s_p = 4.73 ng g⁻¹ (based on 10 degrees of
freedom; formula a-5-2)

ESTIMATED TOTAL ERROR:
a) Random
Random error was calculated according to the
propagation of error theory using $t_{\alpha(2)=0.05,\,\nu}$
to normalize s.
$$ (ts_C)^2 = \frac{(ts_Y)^2 + (ts_{Y_B})^2 + (ts_a)^2 + (Y_{max} - \bar{Y}_B - a)^2\, t^2 s_m^2 / m^2}{\bar{w}^2 m^2} $$
$ts_C$ = 13.4 ng g$^{-1}$, where $s_Y \approx s_p\,\bar{w}\,m$
or $ts_C$ = 5.1 ng g$^{-1}$
b) Bias
Bias was estimated from long term
determinations of reference sediments BCSS-1,
MESS-1. At these concentrations, our data appear
to be consistently low by about 5%, based on 128
determinations over a period of 1 year.

Additional Information:
These data are referenced as file # JUN-5-83. The file contains control
charts for blanks, balances, reference material, and the work sheets (along
with sampling location, storage procedure and any physical manipulations done to
the samples prior to chemical digestion).
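The random-error figure quoted in the report can be approximately reproduced by combining the four components in quadrature. The sketch below rests on several assumptions that are not stated in the report itself: the largest sample peak height in Table A-7-1 (41.5) is used for Y_max, t = 2.23 is taken for the 10 degrees of freedom of the pooled standard deviation, and the nominal weight 0.352 g and slope 1.29 are those used in the worked terms of A.7.2.1. The function name is hypothetical.

```python
import math

W_BAR, M = 0.352, 1.29          # nominal sample weight (g) and slope
Y_BLANK, A = 5.21, 1.70         # mean blank and intercept (peak height units)
S_P, T_P = 4.73, 2.23           # pooled s (ng/g); ASSUMED t for 10 d.f.
S_B, T_B = 1.15, 2.45           # blank s and t for 6 d.f.
S_A, S_M, T_CAL = 0.935, 0.0249, 2.09   # calibration s values, t for 19 d.f.
Y_MAX = 41.5   # ASSUMPTION: largest sample peak height in Table A-7-1


def total_random_error():
    """Combined random error t*s_C in ng/g, summing components in quadrature."""
    scale = (W_BAR * M) ** 2
    y_term = (T_P * S_P) ** 2                    # uses s_Y ~ s_p * w_bar * m
    blank_term = (T_B * S_B) ** 2 / scale
    intercept_term = (T_CAL * S_A) ** 2 / scale
    slope_term = ((Y_MAX - Y_BLANK - A) * T_CAL * S_M / M) ** 2 / scale
    return math.sqrt(y_term + blank_term + intercept_term + slope_term)


ts_c = total_random_error()
print(round(ts_c, 1))
```

Under these assumptions the result is about 13.3 ng g⁻¹, close to the 13.4 ng g⁻¹ reported; the small difference presumably reflects rounding of the inputs or a slightly different choice of Y_max.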