DATA & STATISTICS 101
Presented by
Stu Nagourney NJDEP, OQA
Precision, Accuracy and Bias
Precision: Degree of agreement between a series of measured values under the same conditions
Accuracy: Degree of agreement between the measured and the true value
Bias: Error caused by some aspect of the measurement system
Sources of Error
Systematic Errors: Bias always in the same direction, and constant no matter how many measurements are made
Random Errors: Vary in sign and are unpredictable. Average to 0 if enough measurements are made
Blunders: The occasional mistake that produces erroneous results; can be minimized but never eliminated
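The difference between systematic and random error can be illustrated with a small simulation. This is only a sketch: the true value, the size of the bias, the noise level, and the function name `simulate_measurements` are all hypothetical choices made for illustration, and the random error is assumed to be Gaussian.

```python
import random
import statistics


def simulate_measurements(true_value, bias, noise_sd, n, seed=0):
    """Return n simulated readings: true value + constant bias + random noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


TRUE_VALUE = 10.0  # hypothetical true value
BIAS = 0.5         # hypothetical systematic error (always the same sign and size)
NOISE_SD = 0.2     # hypothetical spread of the random error

for n in (5, 50, 5000):
    readings = simulate_measurements(TRUE_VALUE, BIAS, NOISE_SD, n)
    mean_error = statistics.mean(readings) - TRUE_VALUE
    print(f"n = {n:5d}: mean error = {mean_error:+.3f}")
```

As n grows, the random part of the mean error shrinks toward 0, while the mean error itself settles near the bias: more measurements do not remove a systematic error.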
Applying Statistics
One cannot sample every entity of an entire system or population. Statistics provides estimates of the behavior of an entire system or population, provided that:
– Measurement system is stable
– Individual measurements are all independent
– Individual measurements are random representatives of the system or population
Distributions
Data generated by a measurement process generally have the following properties:
– Results spread symmetrically around a central value
– Small deviations from the central value occur more often than large deviations
– The frequency distribution of a large amount of data approximates a bell-shaped curve
– The mean of even a small set of data represents the overall population better than individual values do
“Normal” Distribution
Other Distributions
Issues with Distributions
For large amounts of data, distributions are easy to define. For smaller data sets, it is harder to define a distribution.
Deviations from “normal” distributions:
– Outliers that are not representative of the population
– Shifts in operational characteristics that skew the distribution
– Large point-to-point variations that cause broadening
Estimation of Standard Deviation
The basic parameters that characterize a population are
– Mean ($\mu$)
– Standard Deviation ($\sigma$)
Unless the entire population is examined, $\mu$ and $\sigma$ cannot be known. They can only be estimated from a representative sample by
– Sample Mean ($\bar{X}$)
– Estimate of Standard Deviation ($s$)
Measures of Central Tendency & Variability
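Central Tendency: the value about which the individual results tend to “cluster”
Mean: $\bar{X} = (X_1 + X_2 + X_3 + \dots + X_n) / n$
Median: Middle value of an odd number of results when listed in order (for an even number of results, the average of the two middle values)
Standard Deviation: $s = \left[ \sum (X_i - \bar{X})^2 / (n - 1) \right]^{1/2}$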
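The three definitions above can be written out directly. The sketch below is illustrative only: the function names and the `results` data are hypothetical, and the even-count rule for the median (average of the two middle values) is the usual convention rather than something stated on the slide.

```python
import math
import statistics


def mean(xs):
    """Arithmetic mean: sum of the results divided by their count."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted results (average of the two middle
    values when the count is even)."""
    ordered = sorted(xs)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def sample_sd(xs):
    """Estimate of standard deviation s, using n - 1 in the denominator."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


results = [10.1, 9.8, 10.3, 10.0, 9.9]  # hypothetical measurements
print("mean  :", mean(results))
print("median:", median(results))
print("s     :", sample_sd(results))

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sample_sd(results), statistics.stdev(results))
```

In practice the standard library's `statistics.mean`, `statistics.median`, and `statistics.stdev` do the same job; the hand-written versions are shown only to mirror the formulas.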
Statistics
If you make several sets of measurements from a normal distribution, you will get different means and standard deviations
Even the best scientist and/or laboratory will have measurement differences when examining the same sample (system)
What needs to be defined is the confidence in measurement data and the significance of any differences
Estimation of Standard Deviation
$X_i$     $X_i - \bar{X}$     $(X_i - \bar{X})^2$
15.2       0.143       0.0204
14.7      -0.357       0.1276
15.1       0.043       0.0018
15.0      -0.057       0.0033
15.3       0.243       0.0590
15.2       0.143       0.0204
14.9      -0.157       0.0247
$\bar{X} = 15.057$
$\sum (X_i - \bar{X})^2 = 0.2572$
$s = (0.2572 / 6)^{1/2} = 0.207$
If we take 10X the measurements, and all the values are the same as above:
$\bar{X} = 15.057$
$\sum (X_i - \bar{X})^2 = 2.572$
$s = (2.572 / 69)^{1/2} = 0.193$
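The worked example can be checked numerically. The sketch below uses the seven values from the table; the helper names `sum_sq_dev` and `sample_sd` are hypothetical, and repeating the list ten times is one way to model "10X the measurements with the same values".

```python
import math

values = [15.2, 14.7, 15.1, 15.0, 15.3, 15.2, 14.9]  # data from the table


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def sample_sd(xs):
    """Estimate of standard deviation with n - 1 degrees of freedom."""
    return math.sqrt(sum_sq_dev(xs) / (len(xs) - 1))


for label, data in (("n = 7 ", values), ("n = 70", values * 10)):
    print(f"{label}: sum of squares = {sum_sq_dev(data):.4f}, "
          f"s = {sample_sd(data):.3f}")
```

The sum of squares grows tenfold while the divisor grows from 6 to 69, so s drops slightly, from 0.207 to 0.193.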
Does a Measured Value Differ from an Expected Value?
Confidence Interval of the Mean (CI): The range around the sample mean within which the population mean is expected to lie at a stated level of confidence
CI = $\bar{X} \pm t \, s / \sqrt{n}$, where the value of t depends upon the level of confidence desired and the number of degrees of freedom (n - 1)
Does a Measured Value Differ from an Expected Value?
NIST SRM 2682 (Subbituminous Coal) was analyzed in triplicate for S
Certified value = 0.47%
$\bar{X} = 0.485\%$
$s = 0.0090\%$
$n = 3$
Desired CI = 95%; does the measured mean agree with the certified value for S of 0.47%?
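$\bar{X} \pm t \, s / \sqrt{n} = 0.485 \pm (4.303)(0.0090) / \sqrt{3} = 0.485 \pm 0.0224$
Values 0.463 to 0.507 are OK, so the certified value of 0.47% agrees with the measured mean.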
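What if s = 0.0090, but 21 measurements were made? t goes down (2.086 for 20 degrees of freedom) and $\sqrt{n}$ goes up, so the interval narrows:
$\bar{X} \pm t \, s / \sqrt{n} = 0.485 \pm (2.086)(0.0090) / \sqrt{21} = 0.485 \pm 0.0041$
Values 0.481 to 0.489 are OK; the certified value of 0.47% now falls outside the interval.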
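The confidence-interval calculation for the coal example can be sketched as follows. The function name `confidence_interval` and the lookup table `T_95` are hypothetical; the two t values are the ones quoted in the slides for 95% confidence, and in each case the half-width divides by the square root of the actual number of measurements.

```python
import math


def confidence_interval(mean, s, n, t):
    """Return (low, high) for mean ± t * s / sqrt(n)."""
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# t values for 95% confidence, keyed by degrees of freedom (n - 1).
# Only the two values quoted in the slides are included here.
T_95 = {2: 4.303, 20: 2.086}

CERTIFIED = 0.47  # certified S value, in percent

for n in (3, 21):
    low, high = confidence_interval(mean=0.485, s=0.0090, n=n, t=T_95[n - 1])
    agrees = low <= CERTIFIED <= high
    print(f"n = {n:2d}: {low:.3f} to {high:.3f}, "
          f"certified value inside interval: {agrees}")
```

With more measurements at the same s, the interval tightens, so a difference from the certified value that was not significant for n = 3 can become significant for n = 21.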
Criteria for Rejecting an Observation
One can always reject a data point if there is an assignable cause
If not, evaluate using statistical techniques
Common Outlier Tests
– Dixon (Q) Test
– Grubbs Test
– Youden Test
– Student t Test
Criteria for Rejecting an Observation: Dixon (Q) Test
1. Calculate the range of results
2. Find the difference between the suspected result and its nearest neighbor
3. Q = Step 2 / Step 1
4. Consult a Table; if the computed Q > the value in the Table, the result in question can be rejected with 90% confidence.
Results: 0.1014, 0.1012, 0.1019 (?), 0.1016
Q = (0.1019 – 0.1016) / (0.1019 – 0.1012)
Q = 0.0003 / 0.0007
Q = 0.43
Since the measured Q (0.43) is less than the reference value (0.76), the value of 0.1019 cannot be rejected
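The four steps of the Dixon (Q) test can be sketched in code. The function name `dixon_q` is hypothetical, the sketch assumes the suspect value is the smallest or largest result, and the reference value 0.76 is simply the one quoted in the slide for this example; other sample sizes or confidence levels need their own table values.

```python
def dixon_q(values, suspect):
    """Return Q = (gap to nearest neighbor) / (range of results).

    Assumes the suspect value is the smallest or largest result.
    """
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]           # Step 1: range
    if data_range == 0:
        raise ValueError("all results are identical; Q is undefined")
    if suspect == ordered[-1]:
        gap = ordered[-1] - ordered[-2]             # Step 2: high suspect
    elif suspect == ordered[0]:
        gap = ordered[1] - ordered[0]               # Step 2: low suspect
    else:
        raise ValueError("suspect must be the smallest or largest result")
    return gap / data_range                         # Step 3: Q


results = [0.1014, 0.1012, 0.1019, 0.1016]  # data from the slide
Q_REFERENCE = 0.76                          # table value quoted in the slide

q = dixon_q(results, suspect=0.1019)
reject = q > Q_REFERENCE                    # Step 4: compare with the table
print(f"Q = {q:.2f}, reject 0.1019: {reject}")
```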
Control Charts