DATA & STATISTICS 101

DATA & STATISTICS 101 Presented by Stu Nagourney NJDEP, OQA



Page 1: DATA & STATISTICS 101

DATA & STATISTICS 101

Presented by

Stu Nagourney NJDEP, OQA

Page 2: DATA & STATISTICS 101

Precision, Accuracy and Bias

Precision: Degree of agreement between a series of measured values under the same conditions

Accuracy: Degree of agreement between the measured and the true value

Bias: Error caused by some aspect of the measurement system

Page 3: DATA & STATISTICS 101

Precision, Accuracy and Bias

Page 4: DATA & STATISTICS 101

Sources of Error

Systematic Errors: Bias always in the same direction, and constant no matter how many measurements are made

Random Errors: Vary in sign and are unpredictable. Average to 0 if enough measurements are made

Blunders: The occasional mistake that produces erroneous results; can be minimized but never eliminated
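As a rough illustration of the first two error types, the sketch below simulates a measurement with an assumed constant bias and assumed Gaussian random noise. The true value, bias, and noise level are made-up numbers, not taken from the presentation. Averaging many readings removes most of the random part but leaves the bias untouched.

```python
import random
import statistics

def simulate_readings(true_value, bias, noise_sd, n, seed=0):
    """Return n simulated readings: true value + constant bias + random noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

# Hypothetical numbers chosen only for illustration.
TRUE_VALUE = 10.0
BIAS = 0.5        # systematic error: same sign and size every time
NOISE_SD = 1.0    # random error: varies in sign, averages toward 0

readings = simulate_readings(TRUE_VALUE, BIAS, NOISE_SD, n=10_000)
mean_error = statistics.mean(readings) - TRUE_VALUE

# The mean error stays near the bias, not near 0, however many readings are taken.
print(f"mean error over {len(readings)} readings: {mean_error:.3f}")
```

Blunders are not modeled here, since by definition they do not follow a predictable pattern.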

Page 5: DATA & STATISTICS 101

Applying Statistics

One cannot sample every entity of an entire system or population. Statistics provides estimates of the behavior of an entire system or population, provided that:

– Measurement system is stable
– Individual measurements are all independent
– Individual measurements are random representatives of the system or population

Page 6: DATA & STATISTICS 101

Distributions

Data generated by a measurement process generally have the following properties:

– Results spread symmetrically around a central value
– Small deviations from the central value occur more often than large deviations
– The frequency distribution of a large amount of data approximates a bell-shaped curve
– The mean of even a small set of data represents the overall population better than individual values
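The last property can be checked with a small simulation. The sketch below assumes normally distributed results with a made-up mean and standard deviation, and compares how widely single values scatter with how widely the means of sets of four scatter. The function name and all numbers are illustrative assumptions.

```python
import random
import statistics

def spread_of_means(pop_mean, pop_sd, set_size, n_sets, seed=1):
    """Standard deviation of the means of n_sets simulated sets of set_size values."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_sets):
        values = [rng.gauss(pop_mean, pop_sd) for _ in range(set_size)]
        means.append(statistics.mean(values))
    return statistics.stdev(means)

# Hypothetical population: mean 15.0, standard deviation 0.2.
single = spread_of_means(15.0, 0.2, set_size=1, n_sets=5000)   # individual values
of_four = spread_of_means(15.0, 0.2, set_size=4, n_sets=5000)  # means of 4 values

# Means of 4 should scatter roughly half as much as single values (1 / sqrt(4)).
print(f"spread of single values: {single:.3f}")
print(f"spread of means of four: {of_four:.3f}")
```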

Page 7: DATA & STATISTICS 101

“Normal” Distribution

Page 8: DATA & STATISTICS 101

Other Distributions

Page 9: DATA & STATISTICS 101

Issues with Distributions

For large amounts of data, distributions are easy to define. For smaller data sets, it is harder to define a distribution.

Deviations from “normal” distributions:

– Outliers that are not representative of the population
– Shifts in operational characteristics that skew the distribution
– Large point-to-point variations that cause broadening

Page 10: DATA & STATISTICS 101

Estimation of Standard Deviation

The basic parameters that characterize a population are:

– Mean ($\mu$)
– Standard Deviation ($\sigma$)

Unless the entire population is examined, $\mu$ and $\sigma$ cannot be known. They can only be estimated from a representative sample by:

– Sample Mean ($\bar{X}$)
– Estimate of Standard Deviation ($s$)

Page 11: DATA & STATISTICS 101

Measures of Central Tendency & Variability

Central Tendency: the value about which the individual results tend to “cluster”

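Mean: $\bar{X} = (X_1 + X_2 + X_3 + \dots + X_n) / n$

Median: Middle value of an odd number of results when listed in order; for an even number of results, the mean of the two middle values

Estimate of Standard Deviation: $s = \left[ \sum_i (X_i - \bar{X})^2 / (n - 1) \right]^{1/2}$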
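The three formulas above can be written out directly. This is a minimal sketch with hypothetical function names and made-up example data; the handling of an even number of results in the median (mean of the two middle values) is the usual convention.

```python
import math

def mean(values):
    """Arithmetic mean: sum of the results divided by their number."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted results (mean of the two middle ones if n is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def sample_std(values):
    """Estimate of standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    x_bar = mean(values)
    sum_sq = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(sum_sq / (len(values) - 1))

# Made-up data, chosen so the arithmetic is easy to follow by hand.
data = [2, 4, 4, 5, 10]
print(mean(data), median(data), sample_std(data))  # 5.0 4 3.0
```

Note how the single large value (10) pulls the mean above the median.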

Page 12: DATA & STATISTICS 101

Measures of Central Tendency & Variability

Page 13: DATA & STATISTICS 101

Statistics

If you make several sets of measurements from a normal distribution, you will get different means and standard deviations

Even the best scientist and/or laboratory will have measurement differences when examining the same sample (system)

What needs to be defined is the confidence in measurement data and the significance of any differences

Page 14: DATA & STATISTICS 101

Estimation of Standard Deviation

$X_i$     $X_i - \bar{X}$     $(X_i - \bar{X})^2$
15.2       0.143               0.0204
14.7      -0.357               0.1276
15.1       0.043               0.0018
15.0      -0.057               0.0033
15.3       0.243               0.0590
15.2       0.143               0.0204
14.9      -0.157               0.0247

$\bar{X} = 15.057$; $\sum_i (X_i - \bar{X})^2 = 0.2572$; $s = (0.2572 / 6)^{1/2} = 0.207$

If we take 10 times as many measurements, with each of the values above repeated 10 times:

$\bar{X} = 15.057$; $\sum_i (X_i - \bar{X})^2 = 2.572$; $s = (2.572 / 69)^{1/2} = 0.193$
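The arithmetic for the seven measurements above can be checked with Python's standard `statistics` module, whose `stdev` function divides by n - 1. The variable names are illustrative; the data are the seven values from the slide.

```python
import statistics

# The seven measurements from the slide.
measurements = [15.2, 14.7, 15.1, 15.0, 15.3, 15.2, 14.9]

x_bar = statistics.mean(measurements)
s = statistics.stdev(measurements)       # divides by n - 1 = 6

# Ten times the measurements, same values repeated: n = 70.
tenfold = measurements * 10
s_tenfold = statistics.stdev(tenfold)    # divides by n - 1 = 69

print(f"mean = {x_bar:.3f}, s = {s:.3f}, s (10x) = {s_tenfold:.3f}")
# mean = 15.057, s = 0.207, s (10x) = 0.193
```

The estimate of s changes only slightly with more data, because the sum of squares and the divisor n - 1 both grow about tenfold.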

Page 15: DATA & STATISTICS 101

Does a Measured Value Differ from an Expected Value?

Confidence Interval of the Mean (CI): The range around the sample mean within which the population mean is expected to lie, at a stated level of confidence

$CI = \bar{X} \pm t \, s / n^{1/2}$, where the value of $t$ depends upon the level of confidence desired and the number of degrees of freedom ($n - 1$)

Page 16: DATA & STATISTICS 101

Does a Measured Value Differ from an Expected Value?

NIST SRM 2682 (Subbituminous Coal) was analyzed in triplicate for S. Certified value = 0.47%

$\bar{X} = 0.485\%$, $s = 0.0090\%$, $n = 3$

Desired CI = 95%; does the measured mean agree with the certified value for S of 0.47%?

$CI = 0.485 \pm (4.303)(0.0090) / (3)^{1/2}$

$CI = 0.485 \pm 0.0224$. Values 0.463 to 0.507 are OK, so the certified value of 0.47 agrees with the measured mean.

What if s = 0.0090, but 21 measurements were made? t goes down and n goes up:

$CI = 0.485 \pm (2.086)(0.0090) / (21)^{1/2}$

$CI = 0.485 \pm 0.0041$. Values 0.481 to 0.489 are OK, so the certified value of 0.47 now falls outside the interval.
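The confidence-interval calculation for the sulfur example can be sketched as follows. The function names are hypothetical, and the t values (4.303 for 2 degrees of freedom, 2.086 for 20 degrees of freedom, both at 95% confidence) are passed in by hand from a Student t table, since the Python standard library does not provide them. The half-width is divided by the square root of the actual number of measurements (3 or 21).

```python
import math

def confidence_interval(x_bar, s, n, t):
    """Return (low, high) for x_bar +/- t * s / sqrt(n); t comes from a t table."""
    half_width = t * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def agrees(expected, interval):
    """True if the expected value lies inside the interval."""
    low, high = interval
    return low <= expected <= high

CERTIFIED = 0.47  # certified sulfur value, in percent

# Triplicate analysis: n = 3, t = 4.303 (95%, 2 degrees of freedom).
ci_3 = confidence_interval(0.485, 0.0090, n=3, t=4.303)

# Same mean and s, but n = 21, t = 2.086 (95%, 20 degrees of freedom).
ci_21 = confidence_interval(0.485, 0.0090, n=21, t=2.086)

for label, ci in (("n = 3", ci_3), ("n = 21", ci_21)):
    print(f"{label}: {ci[0]:.3f} to {ci[1]:.3f}, agrees: {agrees(CERTIFIED, ci)}")
```

With more measurements the interval narrows, so a difference from the certified value that is not significant in triplicate can become significant at n = 21.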

Page 17: DATA & STATISTICS 101

Criteria for Rejecting an Observation

One can always reject a data point if there is an assignable cause

If not, evaluate using statistical techniques

Common Outlier Tests:

– Dixon (Q) Test
– Grubbs Test
– Youden Test
– Student t Test

Page 18: DATA & STATISTICS 101

Criteria for Rejecting an Observation: Dixon (Q) Test

1. Calculate the range of results
2. Find the difference between the suspected result and its nearest neighbor
3. Q = Step 2 / Step 1
4. Consult a Table; if the computed Q > the value in the Table, the result in question can be rejected with 90% confidence.

Results: 0.1014, 0.1012, 0.1019 (?), 0.1016

Q = (0.1019 – 0.1016) / (0.1019 – 0.1012)
Q = 0.0003 / 0.0007
Q = 0.43

Since the measured Q (0.43) is less than the reference value (0.76), the value of 0.1019 cannot be rejected
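The Dixon steps above can be sketched as a short function. The function name is hypothetical, and the critical value 0.76 is simply the reference value quoted on the slide for four results at 90% confidence; for other sample sizes it would have to be looked up in a Q table.

```python
def dixon_q(values, suspect):
    """Q = gap between the suspect value and its nearest neighbor, divided by the range."""
    ordered = sorted(values)
    if len(ordered) < 3:
        raise ValueError("need at least three results")
    if suspect not in (ordered[0], ordered[-1]):
        raise ValueError("the suspect value must be the lowest or highest result")
    data_range = ordered[-1] - ordered[0]          # step 1
    if suspect == ordered[-1]:
        gap = ordered[-1] - ordered[-2]            # step 2, suspect is the highest
    else:
        gap = ordered[1] - ordered[0]              # step 2, suspect is the lowest
    return gap / data_range                        # step 3

results = [0.1014, 0.1012, 0.1019, 0.1016]
Q_CRITICAL = 0.76   # reference value from the slide (4 results, 90% confidence)

q = dixon_q(results, suspect=0.1019)
can_reject = q > Q_CRITICAL                        # step 4

print(f"Q = {q:.2f}, reject 0.1019: {can_reject}")
# Q = 0.43, reject 0.1019: False
```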

Page 19: DATA & STATISTICS 101

Control Charts