Error and Error Analysis

Types of error

Random error – Unbiased error due to unpredictable fluctuations in the environment, measuring device, and so forth.

Systematic error – Error that biases a result in a particular direction. This can be due to calibration error, instrumental bias, errors in the approximations used in the data analysis, and so forth.

For example, if we carried out several independent measurements of the normal boiling point of a pure unknown liquid, we would expect some scatter in the data due to random differences that occur from measurement to measurement. If the thermometer used in the measurements was not calibrated, or we failed to correct the measurements for the effect of pressure on boiling point, there would be systematic errors.


Precision and Accuracy

Precision is a measure of how close successive independent measurements of the same quantity are to one another.

Accuracy is a measure of how close a particular measurement (or the average of several measurements) is to the true value of the measured quantity.

(Figure: two target diagrams, one showing good precision with poor accuracy, the other good precision with good accuracy.)


Significant Figures

The total number of digits in a number that provide information about the value of a quantity is called the number of significant figures. It is assumed that all but the rightmost digit are exact, and that there is an uncertainty of ± a few units in the last digit.

Example: The mass of a metal cylinder is measured and found to be 42.816 g. This is interpreted as follows

42.816 g — the digits 42.81 are considered as certain; the last digit (6) carries an error of ± 2 or 3.

Number of significant figures is 5.


Rules For Counting Significant Figures

1) Exact numbers have an infinite number of significant figures.

2) Leading zeros are not significant.

3) Zeros between nonzero digits are always significant.

4) Trailing zeros to the right of a decimal point are always significant.

5) Trailing zeros to the left of a decimal point may or may not be significant (they may, for example, be placeholders).

6) All other digits are significant.


Counting Significant Figures (Examples)

3.108 4 significant figures

0.00175 3 significant figures (zeros are placeholders)

7214.0 5 significant figures

4000 1, 2, 3, or 4 significant figures (trailing zeros may be placeholders)

We can always make clear how many significant figures are present in a number by writing the number in scientific notation.

4 × 10³ 1 significant figure

4.00 × 10³ 3 significant figures
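The counting rules can be sketched as a short program. The following Python function is my own illustration (the function name and the choice to treat bare trailing zeros as placeholders, the minimal reading of rule 5, are assumptions):

```python
def count_sig_figs(s: str) -> int:
    """Count significant figures in a number written as a string."""
    s = s.lstrip("+-").lower()
    if "e" in s:                    # scientific notation: only the mantissa counts
        s = s.split("e")[0]
    if "." in s:
        # leading zeros are placeholders; trailing zeros after the point count
        digits = s.replace(".", "").lstrip("0")
    else:
        # trailing zeros without a decimal point are ambiguous;
        # here we treat them as placeholders (the minimal reading)
        digits = s.strip("0")
    return len(digits)
```

Under this reading, count_sig_figs("0.00175") gives 3 and count_sig_figs("4000") gives 1, matching the ambiguous-case convention above.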


Significant Figures in Calculations

There are two common general cases. A general method that applies to cases other than those given below will be discussed later.

Multiplication and/or division - The number of significant figures in the result is equal to the smallest number of significant figures present in the numbers used in the calculation.

Example. For a particular sample of a solid, the mass of the solid is 34.1764 g and the volume of the solid is 8.7 cm3. What is the value for D, the density of the solid?

D = mass/volume = (34.1764 g)/(8.7 cm3) = 3.9283218 g/cm3 = 3.9 g/cm3


Addition and/or subtraction - The result is rounded at the decimal place where the significant figures of the numbers used in the calculation first run out; that is, the result keeps only as many decimal places as the least precise number in the calculation.

Example. The masses for three rocks are 24.19 g, 2.7684 g, and 91.8 g. What is the combined mass of the rocks?

   24.19   g
    2.7684 g
+  91.8    g
------------
  118.7584 g = 118.8 g (rounded to one decimal place, set by 91.8 g)
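Both rules can be checked numerically. A minimal Python sketch (the helper name round_sig is my own):

```python
from math import floor, log10

def round_sig(x: float, n: int) -> float:
    """Round x to n significant figures."""
    return round(x, n - 1 - floor(log10(abs(x))))

# multiplication/division: result limited to 2 significant figures by 8.7 cm^3
density = round_sig(34.1764 / 8.7, 2)     # 3.9 g/cm^3

# addition: result limited to one decimal place by 91.8 g
total = round(24.19 + 2.7684 + 91.8, 1)   # 118.8 g
```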


Additional Cases

1) For calculations involving both addition/subtraction and multiplication/division, the rules for significant figures are applied one step at a time, keeping track of the number of significant figures at each stage but rounding off to the correct number of significant figures only at the end of the calculation.

Example.

(46.38 - 39.981)/13.821 = 6.399/13.821 = 0.4629911 = 0.463

2) Significant figures for other kinds of mathematical operations (logarithm, exponential, etc.) are more complicated and will be discussed separately.


Rounding Numbers

In the above calculations we had to round off the results to the correct number of significant figures. The general procedure for doing so is as follows:

1) If the first digit removed is less than 5, round down (leave the last retained digit unchanged).

2) If the first digit removed is greater than 5, or is a 5 followed by nonzero digits, round up.

3) If the digits removed are exactly 5 (a 5 followed only by zeros), round so that the last retained digit is even.

Examples: 3.4682 to 3 sig figs 3.47

18.4513 to 4 sig figs 18.45

1.4500 to 2 sig figs 1.4

1.4501 to 2 sig figs 1.5
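Python's decimal module implements exactly this round-half-to-even convention, so the examples above can be reproduced directly. The wrapper below is a sketch (the function name is my own choice):

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_sig_even(s: str, n: int) -> Decimal:
    """Round the decimal string s to n significant figures, halves to even."""
    d = Decimal(s)
    last_kept = d.adjusted() - (n - 1)   # exponent of the last retained digit
    return d.quantize(Decimal(1).scaleb(last_kept), rounding=ROUND_HALF_EVEN)
```

For example, round_sig_even("1.4500", 2) gives 1.4 while round_sig_even("1.4501", 2) gives 1.5, as in the slide.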


Statistical Analysis of Random Error

There are powerful general techniques that may be used to determine the error present in a set of experimental measurements. These techniques generally make the following assumptions:

1) Successive measurements are independent of one another.

2) Systematic error in the measurements can be ignored.

3) The random error present in the measurements has a normal (Gaussian) distribution

P(x) dx = [1/(2πσ²)^(1/2)] exp[- (x - μ)²/2σ²] dx

P(x) = probability of obtaining x from the measurement

μ = average value (the true value in the absence of systematic error)

σ = standard deviation (a measure of the width of the distribution)
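As a numerical illustration (not part of the original notes), the density can be coded directly and checked to integrate to 1:

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal (Gaussian) probability density P(x)."""
    return (math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
            / math.sqrt(2 * math.pi * sigma ** 2))

# crude numerical check that the total probability is ~1
dx = 0.001
total = sum(normal_pdf(-6 + i * dx) for i in range(int(12 / dx))) * dx
```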


Normal Error Distribution

P(x) dx = [1/(2πσ²)^(1/2)] exp[- (x - μ)²/2σ²] dx

(Plot of the distribution for μ = 0, σ = 1.)


Error Analysis – General Case

Consider the case where we have N independent experimental measurements of a particular quantity. We may define the following:

x̄ = (Σ xi)/N

S = [Σ (xi – x̄)² / (N – 1)]^(1/2)

Sm = S/N^(1/2) = [Σ (xi – x̄)² / N(N – 1)]^(1/2)

In the above equations the summation is assumed to run from i = 1 to i = N.

We call x̄ the average (mean) value for the N measurements, S the estimated standard deviation, and Sm the estimated standard deviation of the mean.
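These three quantities are exactly what the standard library computes. A minimal Python sketch (the data below are hypothetical, made up for illustration):

```python
import math
import statistics

# hypothetical repeated measurements of a boiling point (deg C)
xs = [78.2, 78.5, 78.3, 78.6, 78.4]

xbar = statistics.mean(xs)      # average (mean) value
S = statistics.stdev(xs)        # estimated standard deviation (N - 1 in denominator)
Sm = S / math.sqrt(len(xs))     # estimated standard deviation of the mean
```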


Limiting Case: N → ∞

The equations given on the previous slide show interesting behavior in the limit of a large number of measurements. As N → ∞: x̄ → μ, S → σ, Sm → 0.

What does this mean? In the limit of a large number of measurements (and subject to the restrictions previously given) the average value approaches the true value for the quantity being measured, the estimated standard deviation approaches the true standard deviation, and the estimated standard deviation of the mean approaches zero. This means that in the absence of systematic error we can reduce the uncertainty (error) in our average value by carrying out a larger and larger number of measurements. Unfortunately, since Sm ~ 1/N^(1/2), the uncertainty decreases only as the square root of the number of measurements made, so the reduction in the uncertainty is slow. We normally have to make a compromise between reducing the uncertainty and making a reasonable number of measurements.


Uncertainty for a Finite Number of Measurements

Consider the case where N does not become large. In this case, we usually report uncertainties in terms of Δ, the confidence limits on the result. We interpret the confidence limits in the following way. If the value for the experimental quantity is given as x̄ ± Δ, at (P × 100%) confidence limits, it means that the probability that the true value for the quantity being measured lies within the stated error limits is P × 100%.

Example: The density of an unknown substance has been measured and the following results obtained.

D = (1.485 ± 0.16) g/cm3, at 95% confidence limits

This means that there is a 95% probability that the true value for the density lies within 0.16 g/cm3 of the average value obtained from experiment.

We typically use 95% confidence limits in reporting data, though other limits can be used.


Determination of Confidence Limits

The theoretical basis for determining confidence limits is complicated (see Garland et al. for a brief discussion). We will therefore focus on the results and not be concerned with the theoretical justification.

Consider the case where N independent measurements have been carried out, and x and Sm have been found by the methods previously discussed. The confidence limits for x are given by the following relationship

ΔP,ν = tP,ν Sm = tP,ν S/N^(1/2)

where P is the confidence level (expressed as a decimal number instead of a percentage), Sm is the estimated standard deviation of the mean, S is the estimated standard deviation, ν is the number of degrees of freedom in the measurements, and tP,ν is the critical value of t (Table 3 of Garland et al.).


The number of degrees of freedom, ν, in a measurement is equal to the number of measurements made minus the number of parameters determined from the measurements. For example, for the case of N measurements:

…if we find the average value for the measurements (one parameter), then ν = N – 1

…if we fit the data to a line (two parameters), then ν = N – 2

…if we fit the data to a cubic equation (y = a0 + a1x + a2x² + a3x³, four parameters), then ν = N – 4

and so forth.


Example

Consider the determination of ΔUc, the energy of combustion, for an unknown pure organic compound. The following experimental results are found:

Trial    ΔUc (kJ/g)

1        - 11.348
2        - 11.215
3        - 11.496
4        - 11.335
5        - 11.290

N = 5    ν = N – 1 = 4    x̄ = - 11.3368 kJ/g

S = 0.10308 kJ/g


For ν = 4 and P = 0.95 (95% confidence limits), we get from Table 3 (top row; values for P) that t0.95,4 = 2.78. Therefore

Δ = tP,ν S/N^(1/2) = (2.78)(0.10308 kJ/g)/(5)^(1/2) = 0.1281 kJ/g

We generally round off the confidence limits to two significant figures, and then report the average value out to the same number of significant figures as the confidence limits.

We would therefore report the result from the above measurements as

ΔUc = (- 11.34 ± 0.13) kJ/g, at 95% confidence limits
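The whole worked example can be reproduced in a few lines of Python as a check on the arithmetic (the value t = 2.78 is taken from the table as quoted above):

```python
import math
import statistics

xs = [-11.348, -11.215, -11.496, -11.335, -11.290]   # kJ/g, trials 1-5

N = len(xs)
xbar = statistics.mean(xs)      # average: -11.3368 kJ/g
S = statistics.stdev(xs)        # estimated standard deviation: 0.10308 kJ/g
t = 2.78                        # critical t for P = 0.95, nu = 4
delta = t * S / math.sqrt(N)    # confidence limits: 0.128 kJ/g, reported as 0.13
```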


Comparison of Experimental Results to Literature or Theoretical Values

You will often compare your experimental results to either a literature value or a value obtained by some appropriate theory. While literature values have their own uncertainty, it is generally small compared to the uncertainty in your results. You will therefore generally treat literature values as “exact”.

If a literature result or theoretical result falls within the confidence limits of your result, then we say that your result agrees with the literature or theoretical value at the stated confidence limits. If the result falls outside the confidence limits, then we say that your result disagrees with the literature or theoretical value at the stated confidence limits.


Discordant Data

It will sometimes be the case that one result looks markedly different from the other results found in an experiment. In this case it is usually a good idea to go back and check your calculations to see if a mistake has been made.

Assuming there are no obvious calculation mistakes, there is a procedure (called the Q-test) that can be used to decide whether or not to keep an anomalous experimental result (Garland et al.).

Generally speaking, one should be reluctant to discard any experimental result, and should resist the temptation to discard results just to make the experimental data look better. If you do decide not to include an experimental result in your data analysis, I expect the result still to be reported, along with the reason for excluding it from your calculations.
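The Q statistic itself is simple to compute; deciding whether to reject the point then requires comparing it to a tabulated critical value (e.g. from Garland et al.). The function below is my own sketch of the common form of the test (gap to the nearest neighbor divided by the total range):

```python
def q_statistic(values):
    """Dixon Q for the most extreme point: gap to nearest neighbor / range."""
    vs = sorted(values)
    spread = vs[-1] - vs[0]
    q_low = (vs[1] - vs[0]) / spread      # Q if the smallest value is suspect
    q_high = (vs[-1] - vs[-2]) / spread   # Q if the largest value is suspect
    return max(q_low, q_high)
```

If the computed Q exceeds the critical value for the given N and confidence level, the suspect point may be rejected; otherwise it should be kept.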


Propagation of Error

We often use experimental results in calculations. There is a general procedure for finding how the errors in the measured quantities propagate into the calculated result.

Let F(x,y,z) be a function whose value is determined by the values of the variables x, y, and z. We further assume that x, y, and z are independent variables. Let Δx, Δy, and Δz be the uncertainties in the values for x, y, and z (these might, for example, be confidence limits for the values). ΔF, the uncertainty in the value of F, is related to the uncertainties in x, y, and z as follows

(ΔF)² = (∂F/∂x)²(Δx)² + (∂F/∂y)²(Δy)² + (∂F/∂z)²(Δz)²

where (∂F/∂x), (∂F/∂y), and (∂F/∂z) are the partial derivatives of F with respect to x, y, and z.
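When the partial derivatives are awkward to take by hand, they can be approximated numerically. The following Python sketch (the function name and step size are my own choices) applies the formula above with central finite differences:

```python
import math

def propagate(f, vals, errs, h=1e-6):
    """Propagate uncertainties errs in the arguments vals through f,
    using (dF)^2 = sum_i (dF/dx_i)^2 (dx_i)^2 with numerical derivatives."""
    var = 0.0
    for i in range(len(vals)):
        up, dn = list(vals), list(vals)
        up[i] += h
        dn[i] -= h
        dfdx = (f(*up) - f(*dn)) / (2 * h)   # central-difference partial derivative
        var += (dfdx * errs[i]) ** 2
    return math.sqrt(var)
```

For F = x + y + z this reproduces (ΔF)² = (Δx)² + (Δy)² + (Δz)², the first example on the next slide.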


Examples

F = x + y + z

(ΔF)² = (∂F/∂x)²(Δx)² + (∂F/∂y)²(Δy)² + (∂F/∂z)²(Δz)²

(ΔF)² = (1)²(Δx)² + (1)²(Δy)² + (1)²(Δz)²

= (Δx)² + (Δy)² + (Δz)²

F = xyz

(ΔF)² = (∂F/∂x)²(Δx)² + (∂F/∂y)²(Δy)² + (∂F/∂z)²(Δz)²

(ΔF)² = (yz)²(Δx)² + (xz)²(Δy)² + (xy)²(Δz)²

If we divide both sides of the equation by F² = (xyz)², we get (after some rearrangement)

(ΔF/F)² = (Δx/x)² + (Δy/y)² + (Δz/z)²

Note that the above results are general for combinations of addition and subtraction (1st result) or of multiplication and division (2nd result).


Example

k = A exp(- Ea/RT) (the Arrhenius equation)

(Δk)² = (∂k/∂A)²(ΔA)² + (∂k/∂Ea)²(ΔEa)²

(Δk)² = [exp(-Ea/RT)]²(ΔA)² + [(A/RT) exp(-Ea/RT)]²(ΔEa)²

If we divide through by k², we can rewrite the above in a slightly nicer form

(Δk/k)² = (ΔA/A)² + (ΔEa/RT)²
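As a numerical illustration of this result (the values of A, Ea, T and their uncertainties below are hypothetical, chosen only for the example):

```python
import math

R = 8.314                    # gas constant, J/(mol K)
T = 300.0                    # temperature, K
A, dA = 1.0e13, 0.2e13       # pre-exponential factor and its uncertainty (s^-1)
Ea, dEa = 50000.0, 1000.0    # activation energy and its uncertainty (J/mol)

k = A * math.exp(-Ea / (R * T))
rel = math.sqrt((dA / A) ** 2 + (dEa / (R * T)) ** 2)   # relative error dk/k
dk = rel * k                                            # absolute error in k
```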

The nice thing about the general method for finding how error propagates in a calculation is that it allows you to deal with “non-standard” equations - that is, equations involving operations more complicated than addition, subtraction, multiplication, and division.

There are more sophisticated treatments of error that take into account correlation between the errors in different terms of an equation, but we will not consider such treatments here.


“Best Fit” of Data to a Line

It is often the case that two variables x and y are linearly related. In this case we would like to be able to find the line that best fits the experimental data, that is, find the best values for the slope (m) and intercept (b). We consider the following case (there are other more complicated cases that we will not consider here).

Assume that the values for x (independent variable) are exact. Assume the error is the same for all values of y. The general way in which we proceed is as follows:

1) Define an error function

E(m,b) = Σ (yc,i – yi)²

= Σ (mxi + b – yi)²

where xi and yi are the ith experimental values for x and y, yc,i is the ith calculated value for y, and the sum runs from i = 1 to i = N.


2) Find the values for m and b that minimize the error. We do this by setting the first derivatives of the error function equal to zero (strictly speaking this only finds extreme points in the error function, but in this case the extreme point will turn out to be a minimum).

(∂E/∂m) = 0        (∂E/∂b) = 0

This will give us two equations in terms of two unknowns, m and b.

3) Use the equations found in 2 to determine the best fit values for m and b.

The results (not derived here) are as follows

m = [Σ xiyi – (1/N)(Σ xi)(Σ yi)] / [Σ xi² – (1/N)(Σ xi)²]

b = (1/N) [ Σ yi – m(Σ xi) ]

The above procedure is called the method of least squares. Note that expressions for the confidence limits for m and b can also be found.
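The least-squares expressions translate directly into code. A minimal Python sketch (the function name is my own):

```python
def linear_fit(xs, ys):
    """Best-fit slope m and intercept b from the least-squares formulas."""
    N = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (sxy - sx * sy / N) / (sxx - sx * sx / N)
    b = (sy - m * sx) / N
    return m, b
```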


“Best Fit” of Data to a Polynomial

The same procedure outlined above may be used to fit data to a function that is a power series expansion in x.

y = a0 + a1x + a2x² + … + amx^m

where a0, a1, …, am are constants determined by finding values that minimize the error. Note that the above polynomial is of order m.

While it is tedious to obtain expressions for the constants a0, a1, …, am by hand, the method outlined above for finding the best fitting line carries over to the general case, and a (relatively) simple procedure for determining the constants, and the confidence limits for the constants, can be found using matrix methods.
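One way to see the matrix formulation is to build and solve the normal equations directly. The sketch below (all names are my own) uses plain Gaussian elimination and is only suitable for low polynomial orders:

```python
def poly_fit(xs, ys, order):
    """Least-squares fit of y = a0 + a1 x + ... + a_order x^order
    by building and solving the normal equations."""
    n = order + 1
    # normal-equation matrix A and right-hand side c
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    c = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # forward elimination with partial pivoting
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for j in range(i, n):
                A[r][j] -= f * A[i][j]
            c[r] -= f * c[i]
    # back substitution
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (c[i] - sum(A[i][j] * a[j] for j in range(i + 1, n))) / A[i][i]
    return a
```

In practice one would use a library routine, but this makes the "matrix methods" remark concrete.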


Best Fit of Data to an Arbitrary Function

The same procedure outlined above can usually be used to fit experimental data to any function. The difference is that the math involved is generally more complicated, and that correlation among the variables in the functions used can become important. In many cases numerical approximation methods must be used to determine the values of the constants that give a best fit of the function to the experimental data.


Confidence Limits for Slope and Intercept

Just as we can find the confidence limits for the average value for a series of measurements, we can also determine confidence limits for the slope and intercept that give the best linear fit to a set of data. The formulas are more complicated than those for the confidence limits of the average, but the principle involved is the same.

There are several programs that will not only find the best fit values for slope and intercept but also the 95% confidence limits for these parameters. They include:

EXCEL (some versions)

POLY (available in CP-375)

Other data packages (ANOVA and others).

A file of test data will be placed on my web page for checking that a particular program is correctly finding the confidence limits for the slope and intercept in a linear fit to data.