Unit 8 Textbook
7/30/2019 Unit 8 Textbook
1/47
UNIT 8: USING STATISTICS FOR SCIENCE
BTEC (Extended) Diploma
Applied Science (Forensics) Level 3
Steve Bishop
November 2012
Unit 8 Steve Bishop
Contents
1 BE ABLE TO USE STATISTICAL TECHNIQUES TO INVESTIGATE SCIENTIFIC PROBLEMS
Statistical techniques
  Measures of location
  Measures of dispersion
Normal distribution 1
  Confidence limits
  Shapes of distributions
The normal distribution 2
  Finding probabilities with negative values of z
Standardising a normal distribution
Probability introduction
Conditional probability
Statistics and probability questions
2 BE ABLE TO PERFORM STATISTICAL TESTS TO INVESTIGATE SCIENTIFIC PROBLEMS
Chi-squared ($\chi^2$) test
  Practice questions
Type I and type II errors
The angel of death: guilty or not guilty?
Student's t-test
  t-test for matched pairs
  Independent samples
  Independent t-test
STATISTICAL TABLES
1 BE ABLE TO USE STATISTICAL TECHNIQUES
TO INVESTIGATE SCIENTIFIC PROBLEMS
Probability: addition and multiplication rules; conditional probability, eg lottery,
Mendelian inheritance
Frequency distributions: discrete data; continuous data (grouped and ungrouped)
Shape of distributions: unimodal distributions (normal distributions and skewed
distributions); bimodal distributions (qualitative explanation)
Statistical data calculations: calculation of the mean, $\bar{x}$; mode;
median; calculations of standard deviation, $s$; using ICT
equipment to calculate the standard deviation; entering statistical data into ICT
equipment; retrieving statistical information from ICT equipment; standard error of
the mean; confidence limits
Normal distribution: mean; variance; use of tables of the cumulative distribution
function; application of the normal distribution in science
Sampling: random sampling (quadrat in field sampling); population and sample
(Gallup or Mori poll); standard error of the mean (the uncertainty in the average
value of a set of measurements, eg the calorific value of oil)
P1 carry out statistical calculations to investigate a scientific problem
M1 perform a calculation using probability to investigate a scientific problem
D1 interpret shapes of distributions in scientific data
Statistical techniques
Measures of location
There are three types of average: the mean ($\bar{x}$), the mode and the median. These are known as measures of location. Each provides a single value that represents the data.
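As a rough illustration of the three measures of location, the sketch below uses Python's standard `statistics` module. The data set is hypothetical and is not taken from this unit.

```python
from statistics import mean, median, multimode

# Hypothetical data set (an assumption for illustration, not from the text).
data = [2, 3, 3, 5, 7, 40]

data_mean = mean(data)        # sum of the values divided by how many there are
data_median = median(data)    # middle value once the data are sorted
data_modes = multimode(data)  # most frequent value(s), returned as a list

print("mean:", data_mean)
print("median:", data_median)
print("mode(s):", data_modes)
```

`multimode` is used rather than `mode` so that a data set with more than one most-common value reports all of them.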
Now try this
Find the mean, median and mode of the following data:
(a) 1, 1, 1, 3, 4, 5, 6
(b) 0, 1, 1, 1, 1 ,1 ,1, 9
Which of the three measures of location is most affected by an extreme value?
When might the mode be of more use than either the median or the mean?
What is the advantage of the mean?
Measures of location do not tell us how spread out (how dispersed) our data are.
Be able to use statistical techniques to investigate scientific problems:
Frequency distributions
Shape of distributions
Statistical data calculations: mean, mode, median and standard deviation
Samples and populations; standard error of the mean
Using spreadsheets and calculators
Measures of dispersion
One measure of dispersion or spread is the range. Another is the standard deviation, $s$ or $\sigma$.
The formula for the standard deviation is:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
This is not quite as scary as it looks!
It involves a few simple steps:
1. Find the mean.
2. Subtract it from each value to find the deviation, and then square it.
3. Total up the squared deviations.
4. Divide the total from step 3 by the number of data points less one.
5. Take the square root of the answer from step 4. This is the sample standard deviation.
Done manually, it is best done in a table:
Example
These are the number of break-ins in a housing estate over a twelve-month period.
Find the mean and the standard deviation of the data:
1, 3, 3, 4, 2, 0, 0, 3, 4, 3, 0, 1
1. Find the mean
$$\bar{x} = \frac{1+3+3+4+2+0+0+3+4+3+0+1}{12} = \frac{24}{12} = 2$$
2. Subtract the mean from each value and square the result: $(x - \bar{x})^2$
In the formula:
$s$ is the standard deviation
$x$ is the individual data points
$\bar{x}$ (x bar) is the mean
$n$ is the number of data points
$x$      $x - \bar{x}$      $(x - \bar{x})^2$
1      1 - 2 = -1      1
3      3 - 2 = 1       1
3      3 - 2 = 1       1
4      4 - 2 = 2       4
2      2 - 2 = 0       0
0      0 - 2 = -2      4
0      0 - 2 = -2      4
3      3 - 2 = 1       1
4      4 - 2 = 2       4
3      3 - 2 = 1       1
0      0 - 2 = -2      4
1      1 - 2 = -1      1
Total                  26
3. Find the total of the squared deviations
$$\sum (x - \bar{x})^2 = 26$$
4. Divide the above by the number of data points ($n$) less 1 ($n - 1$)
12 - 1 = 11
$$\frac{\sum (x - \bar{x})^2}{n - 1} = \frac{26}{11} = 2.3636363\ldots$$
(Don't round yet!)
5. To find the standard deviation, take the square root of the answer above
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{2.363636\ldots} = 1.5374$$
= 1.54 (3 sig fig)
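The five steps can also be followed in a few lines of Python. This is only a sketch of the manual method applied to the break-in data; the variable names are illustrative, and the final line cross-checks the result against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
from statistics import stdev

# Number of break-ins per month, from the worked example.
break_ins = [1, 3, 3, 4, 2, 0, 0, 3, 4, 3, 0, 1]

n = len(break_ins)
mean = sum(break_ins) / n                             # step 1: find the mean
squared_devs = [(x - mean) ** 2 for x in break_ins]   # step 2: square each deviation
total = sum(squared_devs)                             # step 3: total the squared deviations
variance = total / (n - 1)                            # step 4: divide by n - 1
s = math.sqrt(variance)                               # step 5: square root

print("mean =", mean)
print("sum of squared deviations =", total)
print("s =", round(s, 4))
print("statistics.stdev gives", round(stdev(break_ins), 4))
```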
This can be done on a spreadsheet using the insert function.
Enter the data in a column.
Ensure the correct data points are chosen.
Insert function > choose Statistical > STDEV
Alternatively, use the function statement
=STDEV(cell range)
Normal distribution 1
The standard deviation can be used to find the confidence interval for a set of measurements.
We expect about 95% of measured values to lie within 2 standard deviations above and below the mean.
The distribution of the height of 1000 people might look like this.
The shape is known as a bell shape.
The mean, median and mode all have the same value.
It is symmetrical around the mean value.
Many biological variables, such as weight, height, blood pressure and life span, have this same distribution shape.
Given enough data points, the curve will be a smooth bell shape.
On a normal distribution:
68% of the data items will be within 1 standard deviation of the mean
95.5% of the data items will be within 2 standard deviations of the mean
99.7% of the data items will be within 3 standard deviations of the mean
However, the mean of a sample is only an estimate of the exact value, because we usually have only a small sample of values. There will be a sampling error, as we cannot always sample the whole population.
We then need to calculate the standard error of the mean:
$$\text{Standard error} = \frac{s}{\sqrt{n}}$$
Any data point that is more than 3 standard deviations from the mean is considered to be an outlier.
If the whole population is sampled then this is known as a census.
Now try this
Complete the following table
Sample size   Mean (cm)   Standard deviation   Standard error of the mean
10            150         2
100           150         2
1000          150         2
10 000        150         2
What happens to the standard error of the mean as the sample size increases?
Confidence limits
To find how confident we can be in the data we can find the confidence limits; these are related to the standard error.
For data that are normally distributed, approximately 95% lie within 2 standard deviations of the mean. The 95% confidence level is adequate for most scientific investigations.
95% confidence limits = mean ± 1.96 × standard error of the mean
In forensic situations or in clinical trials a 99.7% confidence limit is often required.
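The standard error and the 95% confidence limits can be sketched in Python as follows. The function names and the readings are assumptions made for illustration; they are not part of the unit or of any library.

```python
import math
from statistics import mean, stdev

def standard_error(s, n):
    """Standard error of the mean: s divided by the square root of n."""
    return s / math.sqrt(n)

def confidence_limits_95(values):
    """Return (lower, upper) 95% confidence limits for the mean of values."""
    m = mean(values)
    se = standard_error(stdev(values), len(values))
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical repeated measurements (not data from the text).
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]
lower, upper = confidence_limits_95(readings)
print(f"mean = {mean(readings):.3f}")
print(f"95% confidence limits = {lower:.3f} to {upper:.3f}")
```

The factor 1.96 is the one quoted above for the 95% level; a different confidence level would need a different factor from the normal tables.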
Now try these
1. The diameter of a piece of wire is measured using a micrometer. The
following results in mm were obtained:
2.34, 2.34, 2.35, 2.37, 2.38
Calculate the mean and standard deviation.
2. The mean of five diameter values of a piece of wire is 2.36 mm and the
standard deviation is 0.018 mm.
What is the standard error of the mean?
A piece of wire is measured as 2.39 mm. Can you be 95% confident that this is a correct
measurement of the diameter?
3. The volumes of acid to determine the end point of a titration are the
following:
(a) Calculate the mean and standard deviation
(b) Find the standard error of the mean.
(c) What are the 95% confidence limits, assuming the data is normally
distributed?
Shapes of distributions
The normal distribution 2
The normal distribution is a very important distribution.
It is described by:
$$X \sim N(\bar{x}, s^2)$$
where $\bar{x}$ is the mean and $s^2$ is the variance.
It has the following features:
bell-shaped
symmetrical about the mean, $\mu$
it extends from $-\infty$ to $+\infty$
the maximum value of $f(x)$ is $\frac{1}{\sigma\sqrt{2\pi}}$
the total area under the curve is 1
[Figure: normal curve $f(x)$ with 95% of the area between $-2\sigma$ and $+2\sigma$, and 99.7% between $-3\sigma$ and $+3\sigma$]
Approximately 95% of the distribution lies within 2 SDs of the mean.
Approximately 99.7% of the distribution lies within 3 SDs of the mean.
The probability that $X$ lies between $a$ and $b$ is written as: $P(a < X < b)$
Now try these
Draw sketches to illustrate your answers.
If $Z \sim N(0, 1)$, find
1. $P(Z < 0.87)$   2. $P(Z > 0.87)$   3. $P(Z < 0.544)$   4. $P(Z > 0.544)$
Finding probabilities with negative values of z
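For negative values of $Z$ we use $\Phi(-z) = 1 - \Phi(z)$, remembering that the curve is symmetrical and that the total area under the curve is 1.
[Figure: two normal curves, one with the area $\Phi(-a)$ to the left of $-a$ shaded and one with the area $1 - \Phi(a)$ to the right of $a$ shaded]
The figure shows that $P(Z < -a) = \Phi(-a) = 1 - \Phi(a)$, so $P(Z < -a) = P(Z > a)$.
[Figure: two normal curves, one with the area to the right of $-a$ shaded and one with the area $\Phi(a)$ to the left of $a$ shaded]
This shows that $P(Z > -a) = \Phi(a)$.
Example
Find (a) $P(Z < 0.411)$ (b) $P(Z > -0.411)$ (c) $P(Z > 0.411)$ (d) $P(Z < -0.411)$
Solution
(a) $P(Z < 0.411)$ = 0.6595 [from tables: 0.6591 + 0.0004]
(b) $P(Z > -0.411) = P(Z < 0.411)$ = 0.6595 (from (a))
(c) $P(Z > 0.411) = 1 - \Phi(0.411)$ = 1 - 0.6595 = 0.3405
(d) $P(Z < -0.411) = P(Z > 0.411)$ = 0.3405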
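Where tables are not to hand, $\Phi(z)$ can be evaluated numerically. The sketch below defines a helper called `phi` (a name chosen here for illustration) using the error function from Python's `math` module, and repeats the four parts of the example. Values may differ from the tables in the last decimal place because the tables are rounded.

```python
import math

def phi(z):
    """Cumulative distribution function of Z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

a = 0.411
print("(a) P(Z <  0.411) =", round(phi(a), 4))
print("(b) P(Z > -0.411) =", round(1 - phi(-a), 4))  # equals phi(a) by symmetry
print("(c) P(Z >  0.411) =", round(1 - phi(a), 4))
print("(d) P(Z < -0.411) =", round(phi(-a), 4))      # equals 1 - phi(a)
```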
Now try these
1. $P(Z > -0.314)$   2. $P(Z < -0.314)$   3. $P(Z > 0.111)$   4. $P(Z > -0.111)$
$P(a < Z < b) = \Phi(b) - \Phi(a)$
Example:
Find $P(0.345 < Z < 1.751)$
= $\Phi(1.751) - \Phi(0.345)$ = 0.9600 - 0.6350
= 0.3250
Now try these
Find
(a) $P(0.35 < Z < 1.50)$
(b) $P(0.45 < Z < 1.51)$
(c) $P(0.354 < Z < 1.541)$
(d) $P(0.349 < Z < 1.716)$
Answers
Now try these (positive values of z)
1. 0.8078
2. 1 - 0.8078 = 0.1922
3. 0.7068
4. 1 - 0.7068 = 0.2932
Now try these (negative values of z)
1. $\Phi(0.314)$ = 0.6231, so by symmetry the answer is 0.6231
2. 1 - 0.6231 = 0.3769
3. 1 - 0.5442 = 0.4558
4. 0.5442
Now try these ($P(a < Z < b)$)
(a) $\Phi(1.50) - \Phi(0.35)$ = 0.9332 - 0.6368 = 0.2964
(b) 0.9345 - 0.6736 = 0.2609
(c) 0.9383 - 0.6382 = 0.3001
(d) 0.9569 - 0.6363 = 0.3206
Standardising a normal distribution
To standardise $X$, where $X \sim N(\mu, \sigma^2)$, subtract the mean and then divide by the standard deviation:
$$Z = \frac{X - \mu}{\sigma}$$
where $Z \sim N(0, 1)$.
Example
If $X \sim N(100, 25)$, find $P(X > 110)$.
Solution
First standardise the random variable:
$$P(X > 110) = P\left(Z > \frac{110 - 100}{5}\right) = P(Z > 2)$$
$P(Z > 2) = 1 - P(Z \le 2)$
= 1 - 0.9772
= 0.0228
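The same standardising step can be sketched in Python. The helper names `phi` and `prob_less_than` are assumptions made for this illustration. Note that the second parameter of $N(100, 25)$ is the variance, so the standard deviation is its square root.

```python
import math

def phi(z):
    """Cumulative distribution function of Z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_less_than(x, mean, variance):
    """P(X < x) for X ~ N(mean, variance), found by standardising to Z."""
    z = (x - mean) / math.sqrt(variance)
    return phi(z)

# Worked example from the text: X ~ N(100, 25), find P(X > 110).
p = 1 - prob_less_than(110, mean=100, variance=25)
print("P(X > 110) =", round(p, 4))
```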
Now try these
1. If $X \sim N(116, 64)$, find $P(X < 100)$
2. If $X \sim N(100, 16)$, find $P(X > 90)$
7/30/2019 Unit 8 Textbook
20/47
Unit 8 Steve Bishop
20150 165x:
z: 0 0.5145-05
Example
Lengths of a murder victim's hair are normally distributed with a mean length of 150 cm and a standard deviation of 10 cm.
Find the probability that the length of a randomly selected strand is shorter than 165 cm.
Solution
Here $X \sim N(150, 10^2)$
(a) This means we have to find $P(X < 165)$
Now try these 2
1. The masses of packages from a particular machine are normally distributed with a mean of 200 g and a standard deviation of 2 g. Find the probability that a randomly selected package from the machine weighs
(a) less than 197 g
(b) more than 200.5 g
(c) between 198.5 and 199.5 g.
2. The heights of boys at a particular age follow a normal distribution with mean 150.3 cm and variance 25 cm². Find the probability that a boy picked at random from this group has height
(a) less than 153 cm
(b) more than 158 cm
(c) between 150 cm and 158 cm
(d) more than 10 cm different from the mean height.
Answers
Now try these
1. 0.0228
2. 0.9938
Now try these 2
1. (a) 0.0668 (b) 0.4013 (c) 0.1747
2. (a) 0.7054 (b) 0.0618 (c) 0.4621 (d) 0.0456
Probability introduction
Random events happen by chance. Probability is a measure of how likely they are.
It is measured on a scale from 0 (impossible) to 1 (certain).
A random event has various outcomes.
In a trial (or experiment) the things that happen are called outcomes.
Events are groups of one or more outcomes.
When all outcomes are equally likely, the probability of an event is determined by counting the outcomes.
$$P(\text{event}) = \frac{\text{Number of outcomes where event happens}}{\text{Total number of possible outcomes}}$$
Example
A bag contains 10 balls: 4 are red, 3 are blue, 2 are white and 1 is black.
What is the probability of picking a blue ball? a white ball? a green ball? a ball that is not red?
A sample space is the set of all possible outcomes.
Example
Complete the following sample space for the score on rolling 2 dice:
Scores 1 2 3 4 5 6
1
2
3
4
5
6
Find the probability of scoring: a total of 12; a total of 7; a score of less than 4.
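Venn diagrams can be used to show which outcome corresponds to which event.
[Venn diagrams: two overlapping circles A and B. In the first, the shaded area in the middle shows A and B, $P(A \cap B)$. In the second, the whole shaded area shows A or B, $P(A \cup B)$.]
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$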
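Counting outcomes and the addition rule can both be checked by listing a sample space in code. The sketch below enumerates the 36 outcomes of rolling two dice; the events A (first die shows 6) and B (second die shows 6) are hypothetical choices made for illustration, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (first die, second die) pair, all equally likely.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """Number of outcomes where the event happens / total number of outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

def event_a(outcome):
    return outcome[0] == 6   # hypothetical event A: first die shows 6

def event_b(outcome):
    return outcome[1] == 6   # hypothetical event B: second die shows 6

p_a = probability(event_a)
p_b = probability(event_b)
p_a_and_b = probability(lambda o: event_a(o) and event_b(o))
p_a_or_b = probability(lambda o: event_a(o) or event_b(o))

print("P(A) =", p_a, " P(B) =", p_b)
print("P(A and B) =", p_a_and_b)
print("P(A or B) =", p_a_or_b)
print("P(A) + P(B) - P(A and B) =", p_a + p_b - p_a_and_b)
```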
Example
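If you roll a die, event A is an even number and event B is a number greater than 4, then the Venn diagram would be:
[Venn diagram: circle A contains 2 and 4, circle B contains 5, the overlap contains 6, and 1 and 3 lie outside both circles]
Find: $P(A)$; $P(B)$; $P(A \cap B)$; $P(A')$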
Conditional probability
In a small prison there are 100 prisoners. 50 are imprisoned for burglary, 29 for arson and 34 for other crimes.
First draw a Venn diagram. Work out how many are in for both burglary and arson:
(50 + 29 + 34) - 100 = 13. These must have been counted twice, so they are the ones in for both. So those in for burglary only must be 50 - 13 = 37, and those in for arson only 29 - 13 = 16. Place these numbers on the Venn diagram.
[Venn diagram: Burglary (50) and Arson (29) overlap; 37 are in for burglary only, 13 for both and 16 for arson only; 34 lie outside both circles]
What is the probability of choosing someone from the prison who is not in for burglary or arson?
1. What is the probability of choosing someone who is in for burglary and arson?
2. What is the probability of choosing someone who is in for arson?
3. What is the probability of this person being in for burglary as well?
This last question is known as conditional probability. It is often phrased as "What is the probability of choosing someone who is convicted for burglary given that they are convicted for arson?"
This is written as: $P(B|A)$ (the probability of B given A).
From the diagram, the answer is straightforward: 13/29.
The 13 represents those in for Burglary and Arson, and the 29 represents those in for Arson.
So this can be written as:
$$P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$$
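The conditional probability for the prison example can be checked with exact fractions. This is a minimal sketch using the counts given above; the variable names are illustrative.

```python
from fractions import Fraction

total_prisoners = 100
arson = 29                 # everyone in for arson, including those also in for burglary
burglary_and_arson = 13    # those counted in both groups

p_a = Fraction(arson, total_prisoners)                     # P(A)
p_a_and_b = Fraction(burglary_and_arson, total_prisoners)  # P(A and B)

p_b_given_a = p_a_and_b / p_a                              # P(B|A) = P(A and B) / P(A)
print("P(B|A) =", p_b_given_a)
```

Dividing the two probabilities gives the same 13/29 that was read directly from the Venn diagram, because the total of 100 cancels.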
Now try these
1. Two dice are thrown. What is the probability that the total is: (i) 7; (ii) a prime number; (iii) 7, given that it is a prime number?
2. A forensics company is worried about the high turnover of its employees and decides to investigate whether they are more likely to stay if they are given training. On 1st January one year the company employed 256 people (excluding those about to retire). During the year a record was kept of who received training as well as who left the company. The results are summarised below:
                     Still employed   Left company   Total
Given training       109              43             152
Not given training   60               44             104
Total                169              87             256
Find the probability that a randomly selected employee:
(i) received training
(ii) did not leave the company
(iii) received training and did not leave the company
(iv) did not leave the company, given that the person had received training
(v) received training, given the person had not left the company.
3. 100 cars are entered for a road-worthiness test which is in two parts, mechanical and electrical. A car passes only if it passes both parts. Half the cars fail the electrical test and 62 pass the mechanical. 15 pass the electrical but fail the mechanical test. Find the probability that a car chosen at random:
(i) passes overall
(ii) fails one test only
(iii) given that it has failed, failed the mechanical test only.
Statistics and probability questions
For each task show all your workings. Give the final answer where appropriate to 3 significant figures. Hand in your completed working and solution.
Task 1
A. Use the data from your titration experiment.
(a) Find the mean and median of the volume of HCl used to determine the end point.
(b) Determine the standard deviation using an appropriate method.
B. On a particular corpse some unidentified tissue has been found. A sample of 11 cells has been taken and measured. The diameters (in µm) are as follows:
123, 126, 129, 122, 125, 128, 125, 124, 125, 126, 122
(a) Find the mean, median and mode of the diameters.
(b) Determine the standard deviation manually and by ICT (if you use a spreadsheet include a screen shot).
(c) Calculate the standard error of the mean. What is the 95% confidence limit?
(P1)
Task 2
You have been investigating the probability of certain ballistic trace evidence being found at a crime scene. The probability of type A is 0.3 and the probability of type B is 0.5. The conditional probability P(A|B) = 0.25.
(a) Find the probability of finding A and B at a crime scene.
(b) Find the probability of finding A or B at the scene.
(P1 part; M1 part)
Task 3
A forensic anthropologist has asked your advice. She was investigating the lifespan of insects on a human corpse. The mean lifespan for one insect is 144 days and the standard deviation is 16 days. Find the probability that one insect will live less than 140 days and another more than 156 days.
(M1 part, D1 part)
Task 4
The following distributions have had their labels removed.
[Figure: four distribution curves labelled A, B, C and D]
Identify:
(a) the bimodal distribution
(b) the positively skewed distribution
(c) the negatively skewed distribution and
(d) the normal distribution.
Which distribution matches the following:
(i) An easy science examination
(ii) The salary of workers in a large laboratory
(iii) The heights of males and females in the UK
(iv) The mass of males in a large science laboratory.
(D1 part)
2 BE ABLE TO PERFORM STATISTICAL TESTS
TO INVESTIGATE SCIENTIFIC PROBLEMS
Chi-squared test: ($\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency and E is the expected frequency); degrees of freedom; contingency tables; science
related applications of the Chi-squared test, eg colour blindness, psychology,
genetics, drug tests, any other science related test
P2 perform a chi-squared test to support a scientific hypothesis
M2 interpret the results of the chi-squared test
D2 evaluate the validity of the interpretation of the results of the chi-squared test
The t-test: independent samples; related samples (matched pairs); applications,
eg equal number of seeds in two different composts, test whether a particular
fertilizer improves yield of tomatoes, any other science related test
P3 perform a t-test on data collected from a laboratory experiment
M3 interpret the results of the t-test
D3 evaluate the validity of the interpretation of the results of the t-test
Correlation testing: graphical test, eg line of best fit; linear regression, eg using a
calculator in linear regression mode; testing for power law, eg radioactivity
experiments, electrical experiments, any other science related example
P4 carry out an appropriate correlation method to investigate data collected from a
laboratory experiment.
M4 interpret the results of the correlation.
D4 evaluate the validity of the interpretation of the results of the correlation.
Chi-squared ($\chi^2$) test
$\chi^2$ is pronounced "kai-squared", and sometimes written "chi-squared". The $\chi^2$ test helps discover if there is any connection between two variables that can be arranged into categories (eg colours, countries, gender). (It cannot be used with continuous data.)
Example 1
50 men and 50 women are interviewed.
43 men can name over 15 clubs in the Premier League.
27 women can name over 15 clubs in the Premier League.
Is there a connection between gender and football interest (assuming being able to name over 15 clubs means that the person has an interest in football)?
1. Define the null and alternative hypotheses
H0: there is no difference between genders
H1: there is a difference between genders.
2. Arrange the data into a contingency table
         Interested in football   Not interested in football   Total
Men      43                       7                            50
Women    27                       23                           50
Total    70                       30                           100
This is a 2 × 2 table: there are 2 categories for each variable.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $O$ is the observed values and $E$ is the expected values.
The contingency table above gives the observed values. We now have to find the expected values.
3. Find the expected values
Each expected value is found by multiplying the column total by the row total and dividing by the grand total:
expected value = (column total × row total) ÷ overall total
Hence for men:
interested in football we would expect: (70 × 50) ÷ 100 = 35
not interested in football: (30 × 50) ÷ 100 = 15
For women:
interested in football: (70 × 50) ÷ 100 = 35
not interested in football: (30 × 50) ÷ 100 = 15
The expected table would then read:
         Interested in football   Not interested in football   Total
Men      35                       15                           50
Women    35                       15                           50
Total    70                       30                           100
The totals will remain unchanged.
4. Calculate the residual table
The residual is the difference between the observed and the expected values.
Observed   -   Expected   =   Residual
43   7         35   15        +8   -8
27   23        35   15        -8   +8
In 2 × 2 tables the numbers will always be the same, with only the signs differing.
5. Calculate $\chi^2$
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $O$ is the observed values and $E$ is the expected values.
The residual table gives the values of $(O - E)$.
So,
$$\chi^2 = \frac{(+8)^2}{35} + \frac{(-8)^2}{15} + \frac{(-8)^2}{35} + \frac{(+8)^2}{15} = 12.19$$
We now have to decide if the $\chi^2$ value is high enough to conclude that it is unlikely to get such a number by chance.
To do this we have to look at the concept of degrees of freedom.
Degrees of freedom
In a 2 × 2 contingency table, the value of one entry determines all the others:
                  Total
        43   ?    50
        ?    ?    50
Total   70   30   100
However, in a 3 × 3 table we need 4 values before we can know what all the other values are:
                       Total
        37   22   ?    70
        8    10   ?    20
        ?    ?    ?    60
Total   60   50   40
One in the first example, and four in the second example, are called the degrees of freedom.
The degrees of freedom can be calculated using:
degrees of freedom = (r - 1) × (c - 1)
where r = the number of rows and c = the number of columns.
Knowing the degrees of freedom, the $\chi^2$ value and a table of critical values we can find out if there is any relationship between gender and interest in football.
One-tail    5%      2.5%    1.25%   0.5%    0.25%   0.05%
Two-tail    10%     5%      2.5%    1%      0.5%    0.1%
            0.9     0.95    0.975   0.99    0.995   0.999
$\nu$ = 1   2.706   3.841   5.024   6.635   7.879   10.83
$\nu$ = 2   4.605   5.991   7.378   9.210   10.60   13.82
$\nu$ = 3   6.251   7.815   9.348   11.34   12.84   16.27
$\nu$ = 4   7.779   9.488   11.14   13.28   14.86   18.47
With one degree of freedom, a test at the 5% level gives a critical value of 3.841. This means that, if there were no relationship, we would expect a value greater than 3.841 only 5% of the time.
As the $\chi^2$ value is 12.19, which is greater than 3.841, we can say that at the 5% level there is a relationship between interest in football and gender.
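The whole calculation for Example 1 (expected values, $\chi^2$ and degrees of freedom) can be sketched in plain Python. The function name `chi_squared` is an assumption made for this illustration, and the critical value 3.841 is the 5% value for one degree of freedom taken from the table above.

```python
def chi_squared(observed):
    """Return (chi-squared, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # expected value = (row total x column total) / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

# Observed values from Example 1: rows are men and women,
# columns are interested and not interested in football.
observed = [[43, 7],
            [27, 23]]

chi2, dof = chi_squared(observed)
CRITICAL_5_PERCENT = 3.841  # from the table above, 1 degree of freedom

print("chi-squared =", round(chi2, 2), "with", dof, "degree(s) of freedom")
print("significant at the 5% level:", chi2 > CRITICAL_5_PERCENT)
```

The same function works for larger tables, but the critical value must then be looked up for the matching number of degrees of freedom.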
Example 2
A sociologist wants to know if middle-class men are more likely to change babies' nappies than working-class men. The sociologist interviews 40 middle-class and 60 working-class men. 17 middle-class men change nappies and 13 working-class men change nappies.
1. Define the null and alternative hypotheses
H0: There is no connection between social class and nappy changing
H1: The two variables are related.
2. Arrange the data into a contingency table
3. Find the expected values
4. Calculate the residual table
17   23   -   12   28   =   +5   -5
13   47       18   42       -5   +5
5. Calculate $\chi^2$
So,
$$\chi^2 = \frac{(+5)^2}{12} + \frac{(-5)^2}{28} + \frac{(-5)^2}{18} + \frac{(+5)^2}{42} = 4.96$$
6. Find the degrees of freedom
Degrees of freedom = (2 - 1) × (2 - 1) = 1
7. Use the tables
If H0 is true, the probability that $\chi^2$ will be 3.841 or more by chance is 5%.
$\chi^2$ = 4.96, so this suggests that we reject H0 and conclude that there is some connection between social class and nappy changing.
Now try these
1. Find the expected values for the following tables.
(a) 18   32        (b) 25   16
    8    42            22   37
(c) 40   60   60
    60   50   50
    20   50   10
2. Find the residual tables for the tables in question 1.
3. Calculate the $\chi^2$ for the tables in question 1.
4. How many degrees of freedom will there be for each of the following contingency tables?
(a) 5 × 3 (b) 7 × 5 (c) 6 × 2 (d) 10 × 17
5. The table below shows the results of a drug test on an infection. Is there any evidence that treatment is related to cure?
            Treated   Not treated
Cured       24        57
Not cured   53        257
6. Murder Inc., a forensic science firm, carried out a survey to find out the political affiliation of its employees. Carry out a $\chi^2$ test on the table to determine whether there is any association between political affiliation and type of work.
               Lab-based   Non lab-based   Total
Conservative   22          16              38
Labour         53          8               61
LibDem         20          11              31
Total          95          35              130
7. A researcher in genetics is investigating whether eye colour bears any relationship to place of residence. From the table below, is there any evidence of such a relationship?
              Brown   Blue   Other
Leicester     72      80     28
Bournemouth   20      62     18
Aberdeen      67      120    44
Answers
1.
(a) 13   37        (b) 19.27   21.73
    13   37            27.73   31.27
(c) 48   64   48
    48   64   48
    24   32   24
2.
(a) +5   -5        (b) +5.73   -5.73
    -5   +5            -5.73   +5.73
(c) -8    -4    +12
    +12   -14   +2
    -4    +18   -14
3. (a) $\chi^2$ = 25/13 + 25/37 + 25/13 + 25/37 = 5.20 (b) 5.45 (c) 29.69
4. (a) 4 × 2 = 8 (b) 6 × 4 = 24 (c) 5 (d) 144
5. 6.38, significant at 2.5%, so there is evidence of an association
6. 11.59, significant at 0.5%, so there is evidence of an association
7. 13.5, 4 degrees of freedom, significant at 1%, so there is evidence of an association.
Practice questions
1. Is there a connection at the 5% level between burglary and house type?
           Burglary   No burglary   Total
House      3          2
Bungalow   4          1
Total
2. Is there a connection between the type of area and fatal traffic accidents (figures in thousands) at the 5% level?
           Fatal   Non-fatal   Total
Motorway   5       15          20
Urban      4       24          28
Rural      3       12          15
Total      12      51          63
Solutions
1. Degrees of freedom: 1
Chi-square = 0.476
For significance at the 5% level, chi-square should be greater than or equal to 3.84.
The result is not significant.
2. Degrees of freedom: 2
Chi-square = 0.88
For significance at the 5% level, chi-square should be greater than or equal to 5.99.
The result is not significant.
Type I and type II errors
There are four possible conclusions when conducting a significance test:
True situation   Our conclusion
H0 is true       Accept H0        Correct decision
H0 is true       Reject H0        Wrong decision: Type I error
H0 is false      Accept H0        Wrong decision: Type II error
H0 is false      Reject H0        Correct decision
A type I error is known as a false positive. For example, a court finding a person guilty of a crime they did not commit.
The probability of a type I error is the same as the significance level.
A type II error is a false negative. For example, a court finding a person not guilty of a crime they did commit.
A third type of error has also been proposed, type III: rejecting the null hypothesis for the wrong reason!
Justice System - Trial
                                                               Defendant innocent   Defendant guilty
Reject presumption of innocence (guilty verdict)               Type I error         Correct
Fail to reject presumption of innocence (not guilty verdict)   Correct              Type II error
Statistics - Hypothesis Test
                                 Null hypothesis true   Null hypothesis false
Reject null hypothesis           Type I error           Correct
Fail to reject null hypothesis   Correct                Type II error
The angel of death: guilty or not guilty?
Kristen Gilbert was a nurse on Ward C at the Veterans Affairs Medical Centre in Northampton, Massachusetts, USA. She earned the nickname "Angel of Death" as she was often the first to notice that a patient was going into cardiac arrest. She was calm and competent and would be able to administer the correct drug to save the patient.
However, there were growing suspicions about her behaviour. There had been a high number of deaths on her particular ward, as well as shortages of the amphetamine-type drug epinephrine, which can be used to cause cardiac arrest.
A hospital investigation found nothing untoward. Some staff were still concerned, so a second investigation took place, this time involving statistician Stephen Gehlbach. Gehlbach plotted the annual number of deaths, broken down by shift and year (below). Gilbert started to work on Ward C in March 1990 and stopped working at the hospital in February 1996.
[Figure: Total deaths at the hospital, by shift and year. Source: Devlin & Lorden (2007, p. 16)]
What pattern does the bar chart show?
Is there evidence to secure a conviction? Could it be a coincidence? To determine this we
can use a chi-squared test.
Here is the data the investigators had:
Gilbert Present Death on shift
Yes No Total
Yes 40 217
No 34 1350
Total
Perform a chi-squared test to support the following one-tail hypothesis at the 0.01 level (P2):
HA: Significantly more patients will be found to die on a shift where the subject is
working than on shifts when the subject is not working.
State clearly your conclusion.
What are the implications of the result? (M2)
How valid are your results? How valid is your interpretation? Is Kirsten Gilbert really gulity or
non-guilty? (D2)
Bibliography
Pyrek, K. M. (2009). Kristen Gilbert Case Explored in New Book. Forensic Nurse [online:
http://www.forensicnursemag.com/webx/391webx1.html, accessed 22 Jan 2010].
Devlin, K. and Lorden, G. (2007). The Numbers behind Numb3rs. Plume: New York.
Student's t-test
"Student" was W. S. Gosset. He published his test under the pseudonym
Student because he was working for the brewers Guinness as a statistician, and Guinness did not want the competition knowing that they
were using statistics to help improve the brewing process.
The test is used to compare samples from two different batches.
This may be beer brewed under different circumstances, soil from
different areas or evidence from two different crime scenes.
It is usually used with small (
4. Calculate the standard deviation of the differences
$s = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}} = 1.51$
5. Calculate the standard error
$SE = \frac{s}{\sqrt{n}} = \frac{1.51}{\sqrt{10}} = 0.478$
6. Calculate the value of t
$t = \frac{\bar{D}}{SE} = 5.0$
7. Calculate the number of degrees of freedom and find the critical value
Number of pairs of data − 1 = n − 1
10 − 1 = 9
8. From the table with 9 degrees of freedom, 1-tail at the 0.05 level: tcritical = 1.833
9. Determine if there is a difference or not
t > tcritical (5.0 > 1.833)
So, the null hypothesis is rejected and the alternative hypothesis is accepted.
The ninhydrin does make a positive difference.
t-test for matched pairs
1. Set up the null and alternative hypotheses (H0 and HA) and determine if it is a one-
or two-tail test.
2. Calculate the differences between the pairs in the samples (D).
3. Calculate the mean of the differences:
$\bar{D} = \frac{\sum D}{n}$
4. Calculate the standard deviation of the differences:
$s = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}}$
5. Calculate the standard error of the differences:
$SE = \frac{s}{\sqrt{n}}$
6. Calculate the value of t:
$t = \frac{\bar{D}}{SE}$
7. Calculate the number of degrees of freedom:
Number of pairs of data − 1 = n − 1
8. Find the critical value from the table.
9. Determine if there is a difference or not.
If t < critical value then there is no
significant difference between the two sets
of data and the null hypothesis is accepted.
If t ≥ critical value then the null hypothesis
is rejected. Then the two sets of data differ
significantly.
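The steps above can be sketched in Python as follows. The before and after values are hypothetical numbers invented purely for illustration (they are not data from this unit), and the critical value of 2.015 is assumed from a standard t-table for 5 degrees of freedom, 1-tail, at the 0.05 level.

```python
import math
import statistics

# Hypothetical measurements for 6 matched pairs (illustration only).
before = [12, 15, 11, 14, 13, 16]
after = [14, 18, 12, 17, 15, 17]

# Step 2: differences between the pairs (D).
differences = [a - b for a, b in zip(after, before)]
n = len(differences)

# Step 3: mean of the differences.
mean_d = statistics.mean(differences)

# Step 4: standard deviation of the differences (n - 1 in the denominator).
sd_d = statistics.stdev(differences)

# Step 5: standard error of the differences.
standard_error = sd_d / math.sqrt(n)

# Step 6: value of t.
t = mean_d / standard_error

# Step 7: degrees of freedom.
degrees_of_freedom = n - 1

# Steps 8 and 9: compare with the critical value.
# Assumption: 2.015 is the 1-tail 0.05 value for 5 degrees of freedom.
T_CRITICAL = 2.015

print(f"mean difference = {mean_d}, s = {sd_d:.3f}, SE = {standard_error:.3f}")
print(f"t = {t:.2f}, df = {degrees_of_freedom}")
print("reject H0" if t >= T_CRITICAL else "accept H0")
```

`statistics.stdev` divides by n − 1, which matches the formula in step 4; `statistics.pstdev` divides by n and would give a slightly smaller value.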
Independent samples
If there is no before and after relationship between the samples then the independent
samples test is used.
$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Example
Some brown dog hairs were found on the clothing of a victim at a crime scene involving a
dog.
The widths of five of the hairs were measured: 46, 57, 54, 51, 38 µm.
A suspect is the owner of a dog with similar brown hairs. A sample of the hairs has been
taken and their widths measured: 31, 35, 50, 35, 36 µm.
Is it possible that the hairs found on the victim were left by the suspect's dog? Test at the 5%
level. [From D. Lucy, Introduction to Statistics for Forensic Scientists. Chichester: Wiley, 2005, p. 44.]
Solution
1. Calculate the mean and standard deviation for the data sets $x_1$ and $x_2$

                      Dog A     Dog B
                      46        31
                      57        35
                      54        50
                      51        35
                      38        36
Total                 246       187
Mean                  49.2      37.4
Standard deviation    7.463     7.301

2. Calculate the magnitude of the difference between the two means, $|\bar{x}_1 - \bar{x}_2|$
49.2 − 37.4 = 11.8
3. Calculate the standard error in the difference, $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$:
$\sqrt{\frac{7.463^2}{5} + \frac{7.301^2}{5}} = \sqrt{11.139 + 10.661}$
= 4.669 ≈ 4.67 (3 sf)
4. Calculate the value of t:
t = difference between the means ÷ standard error in the difference
11.8 ÷ 4.669 = 2.527
≈ 2.53 (3 sig fig)
5. Calculate the degrees of freedom = $n_1 + n_2 - 2$
5 + 5 − 2 = 8
6. Find the critical value from the table for the significance level you are working to.
At the 0.05 level tcrit = 2.306
If t < critical value then there is no significant difference between the two sets of data
If t > critical value then there is a significant difference between the two sets of data
Here t = 2.53 > 2.306, so at the 0.05 level there is a significant difference between the two data sets.
So it is unlikely that the hairs came from the same dog.
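The worked example can be checked with a short Python sketch. The function name `independent_t` is made up for this illustration, and the critical value of 2.306 is the one quoted in the example for 8 degrees of freedom at the 0.05 level.

```python
import math
import statistics

# Hair widths in micrometres, from the example above.
victim_hairs = [46, 57, 54, 51, 38]        # Dog A
suspect_dog_hairs = [31, 35, 50, 35, 36]   # Dog B


def independent_t(sample_1, sample_2):
    """Return (t, degrees of freedom, standard error) for two independent samples."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    # Sample standard deviations (n - 1 in the denominator).
    s1, s2 = statistics.stdev(sample_1), statistics.stdev(sample_2)
    # Standard error in the difference between the means.
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    # t uses the magnitude of the difference between the means.
    t = abs(mean_1 - mean_2) / standard_error
    return t, n1 + n2 - 2, standard_error


t, degrees_of_freedom, standard_error = independent_t(victim_hairs, suspect_dog_hairs)

T_CRITICAL = 2.306  # from the table: 8 degrees of freedom, 0.05 level

print(f"SE = {standard_error:.3f}, t = {t:.3f}, df = {degrees_of_freedom}")
print("significant difference" if t > T_CRITICAL else "no significant difference")
```

Working through the code by hand reproduces the figures in the solution: a standard error of about 4.669 and t of about 2.527.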
Independent t-test
1. Calculate the mean and standard deviation for the data sets $x_1$ and $x_2$.
2. Calculate the magnitude of the difference between the two means:
$|\bar{x}_1 - \bar{x}_2|$
3. Calculate the standard error in the difference:
$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
4. Calculate the value of t:
t = difference between the means ÷ standard error in the difference [step 2 ÷ step 3]
5. Calculate the degrees of freedom = $n_1 + n_2 - 2$
6. Find the critical value for the particular significance you are working to.
7. If t < critical value then there is no significant difference between the two sets
of data and the null hypothesis is accepted.
If t ≥ critical value then the null hypothesis is rejected. Then the two sets of data differ
significantly.
STATISTICAL TABLES