Unit 8 Textbook

7/30/2019 Unit 8 Textbook


UNIT 8

USING STATISTICS FOR SCIENCE

BTEC (Extended) Diploma

Applied Science (Forensics) Level 3

Steve Bishop

November 2012


Unit 8 Steve Bishop

Contents

1 BE ABLE TO USE STATISTICAL TECHNIQUES TO INVESTIGATE SCIENTIFIC PROBLEMS

Statistical techniques
    Measures of location
    Measures of dispersion
Normal distribution 1
    Confidence limits
    Shapes of distributions
The normal distribution 2
    Finding probabilities with negative values of z
Standardising a normal distribution
Probability introduction
Conditional probability
Statistics and probability questions

2 BE ABLE TO PERFORM STATISTICAL TESTS TO INVESTIGATE SCIENTIFIC PROBLEMS

Chi-squared (χ²) test
    Practice questions
Type I and type II errors
The angel of death: guilty or not guilty?
Student's t-test
    t-test for matched pairs
    Independent samples
    Independent t-test

STATISTICAL TABLES



1 BE ABLE TO USE STATISTICAL TECHNIQUES

TO INVESTIGATE SCIENTIFIC PROBLEMS

Probability: addition and multiplication rules; conditional probability, eg lottery, Mendelian inheritance

Frequency distributions: discrete data; continuous data (grouped and ungrouped)

Shape of distributions: unimodal distributions (normal distributions and skewed distributions); bimodal distributions (qualitative explanation)

Statistical data calculations: calculation of the mean; mode; median; calculation of standard deviation; using ICT equipment to calculate the standard deviation; entering statistical data into ICT equipment; retrieving statistical information from ICT equipment; standard error of the mean; confidence limits

Normal distribution: mean; variance; use of tables of the cumulative distribution function; application of the normal distribution in science

Sampling: random sampling (quadrat in field sampling); population and sample (Gallup or Mori poll); standard error of the mean (the uncertainty in the average value of a set of measurements, eg the calorific value of oil)

P1 carry out statistical calculations to investigate a scientific problem

M1 perform a calculation using probability to investigate a scientific problem

D1 interpret shapes of distributions in scientific data



Statistical techniques

Measures of location

There are three types of average: the mean ($\bar{x}$), mode and median. These are known as measures of location. Each provides a single value that represents the data.
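As a rough illustration, the three measures of location can be found with Python's standard `statistics` module. The data set below is hypothetical and is not taken from the exercises.

```python
from statistics import mean, median, mode

# Hypothetical data set, used only for illustration.
data = [2, 3, 3, 5, 7, 10]

print(mean(data))    # sum of the values divided by the number of values
print(median(data))  # middle value of the sorted data (average of the two middle values here)
print(mode(data))    # most frequently occurring value
```

A spreadsheet or calculator gives the same results; the code is only one possible way of checking hand calculations.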

Now try this

Find the mean, median and mode of the following data:

(a) 1, 1, 1, 3, 4, 5, 6

(b) 0, 1, 1, 1, 1, 1, 1, 9

Which of the three measures of location is most affected by an extreme value?

When might the mode be of more use than either the median or the mean?

What is the advantage of the mean?

Measures of location do not tell us how spread out our data are, that is, how dispersed they are.

Be able to use statistical techniques to investigate scientific problems:

Frequency distributions
Shape of distributions
Statistical data calculations: mean, mode, median and standard deviation
Samples and populations: standard error of the mean
Using spreadsheets and calculators



Measures of dispersion

One measure of dispersion, or spread, is the range. Another is the standard deviation, s or σ.

The formula for the standard deviation is:

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

This is not quite so scary as it looks!

It involves a few simple steps:

1. Find the mean.

2. Subtract it from each value to find the deviation, and then square it.

3. Total up the squared deviations.

4. Divide the total from step 3 by the number of data points less one.

5. Square root the answer from step 4. This is the sample standard deviation.

Done manually, it is best done in a table:

Example

These are the number of break-ins in a housing estate over a twelve-month period.

Find the mean and the standard deviation of the data:

1, 3, 3, 4, 2, 0, 0, 3, 4, 3, 0, 1

1. Find the mean

$\bar{x} = \dfrac{1+3+3+4+2+0+0+3+4+3+0+1}{12} = \dfrac{24}{12} = 2$

2. Subtract the mean from each value and square the result: $(x - \bar{x})^2$

s is the standard deviation
x is the individual data points
$\bar{x}$ (x bar) is the mean
n is the number of data points



x        x − x̄            (x − x̄)²
1        1 − 2 = −1        1
3        3 − 2 = 1         1
3        3 − 2 = 1         1
4        4 − 2 = 2         4
2        2 − 2 = 0         0
0        0 − 2 = −2        4
0        0 − 2 = −2        4
3        3 − 2 = 1         1
4        4 − 2 = 2         4
3        3 − 2 = 1         1
0        0 − 2 = −2        4
1        1 − 2 = −1        1
Total                      26

3. Find the total of the squared deviations

$\sum (x - \bar{x})^2 = 26$

4. Divide the above by the number of data points (n) less 1 (n − 1)

12 − 1 = 11

$\dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{26}{11} = 2.3636363...$ (Don't round yet!)

5. To find the standard deviation, square root the answer above

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{2.363636...} = 1.5374$

= 1.54 (3 sig fig)
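The five steps can also be sketched in Python as a check on the table method. This is a minimal sketch using the break-in data from the example; the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

# Break-ins per month, from the worked example.
data = [1, 3, 3, 4, 2, 0, 0, 3, 4, 3, 0, 1]

x_bar = mean(data)                                # step 1: the mean
squared_devs = [(x - x_bar) ** 2 for x in data]   # step 2: squared deviations
total = sum(squared_devs)                         # step 3: total of squared deviations
variance = total / (len(data) - 1)                # step 4: divide by n - 1
s = sqrt(variance)                                # step 5: square root

print(x_bar, total, round(s, 2))
print(round(stdev(data), 2))  # library function, also divides by n - 1
```

The library function `statistics.stdev` uses the same n − 1 divisor, so it should agree with the manual calculation.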



This can be done on a spreadsheet using the insert function.

Enter the data in a column.

Ensure the correct data points are chosen.

Insert function: choose Statistical > STDEV.

Alternatively, use the function statement =STDEV(cell range).



Normal distribution 1

The standard deviation can be used to find the confidence interval for a set of measurements.

We expect 95% of measured values to lie within 2 standard deviations above and

below the mean.

The distribution of the height of 1000 people might look like this.

The shape is known as a bell shape.

The mean, median and mode will all have the same value.

It is symmetrical around the mean value.

Many biological variables, such as weight, height, blood pressure and life span, have this same distribution shape.

Given enough data points, the curve will be a smooth bell shape.



On a normal distribution:

68% of the data items will be within 1 standard deviation of the mean

95.5% of the data items will be within 2 standard deviations of the mean

99.7% of the data items will be within 3 standard deviations of the mean

However, the mean is only an estimate of the exact value, because we usually have only a small sample of values. There will be a sampling error, as we cannot always sample the whole population.

We then need to calculate the standard error of the mean:

$\text{Standard error} = \dfrac{s}{\sqrt{n}}$
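As a small sketch of the formula, the function below divides the sample standard deviation by the square root of the sample size. The function name and the input values are hypothetical, chosen only to give round numbers.

```python
from math import sqrt

def standard_error(s, n):
    """Standard error of the mean: s divided by the square root of n."""
    return s / sqrt(n)

# Hypothetical values: standard deviation 4.0 from a sample of 16 measurements.
print(standard_error(4.0, 16))  # 1.0
```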

Any data point that is more than 3 standard deviations from the mean is considered to be an outlier.

If the whole population is sampled then this is known as a census.



Now try this

Complete the following table

Sample size    Mean (cm)    Standard deviation    Standard error of the mean
10             150          2
100            150          2
1000           150          2
10 000         150          2

What happens to the standard error of the mean as the sample size increases?

Confidence limits

To find how confident we can be in the data we can find the confidence limits. These are related to the standard error.

For data that are normally distributed, approximately 95% lie within 2 standard deviations of the mean. The 95% confidence level is adequate for most scientific investigations.

95% confidence limits = mean ± 1.96 × standard error of the mean
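The calculation of the 95% confidence limits can be sketched as follows. The function name and the sample values (mean 50.0, standard deviation 4.0, sample size 16) are assumptions made for illustration and do not come from the exercises.

```python
from math import sqrt

def confidence_limits_95(mean, s, n):
    """Return the lower and upper 95% confidence limits for the mean."""
    standard_error = s / sqrt(n)
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

# Hypothetical sample: mean 50.0, standard deviation 4.0, 16 measurements.
lower, upper = confidence_limits_95(50.0, 4.0, 16)
print(round(lower, 2), round(upper, 2))
```

Here the standard error is 1.0, so the limits lie 1.96 either side of the mean.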

In forensic situations or in clinical trials a 99.7% confidence limit is often required.



Now try these

1. The diameter of a piece of wire is measured using a micrometer. The

following results in mm were obtained:

2.34, 2.34, 2.35, 2.37, 2.38

Calculate the mean and standard deviation.

2. The mean of five diameter values of a piece of wire is 2.36 mm and the

standard deviation is 0.018 mm.

What is the standard error of the mean?

A piece of wire is 2.39 mm. Can you be 95% confident that it is a correct

measurement of the diameter?



3. The volumes of acid to determine the end point of a titration are the

following:

(a) Calculate the mean and standard deviation

(b) Find the standard error of the mean.

(c) What are the 95% confidence limits, assuming the data is normally

distributed?

Shapes of distributions



The normal distribution 2

The normal distribution is a very important distribution.

It is described by:

$X \sim N(\bar{x}, s^2)$

where $\bar{x}$ is the mean and $s^2$ is the variance (s is the standard deviation).

It has the following features:

bell-shaped

symmetrical about the mean

it extends from −∞ to +∞

the maximum value of f(x) is $\dfrac{1}{s\sqrt{2\pi}}$

the total area under the curve is 1

Approximately 95% of the distribution lies within 2 SDs of the mean.

Approximately 99.7% of the distribution lies within 3 SDs of the mean.

[Figure: normal curve f(x), with 95% of the area between −2 SD and +2 SD and 99.7% between −3 SD and +3 SD]
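The areas quoted for 1, 2 and 3 standard deviations can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of printed tables; the helper name `within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """Area under the curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

Because any normal distribution can be standardised, the same areas apply whatever the mean and standard deviation.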


The probability that X lies between a and b is written as: P(a < X < b)



Now try these

Draw sketches to illustrate your answers

If Z ~ N(0, 1), find

1. P(Z < 0.87) 2. P(Z > 0.87) 3. P(Z < 0.544) 4. P(Z > 0.544)



Finding probabilities with negative values of z

For negative values of Z we use Φ(−z) = 1 − Φ(z), remembering that the curves are symmetrical and that the total area under the curves is 1.

The first sketch shows that P(Z < −a) = Φ(−a) = 1 − Φ(a).

The second sketch shows that P(Z > −a) = Φ(a).

Example

Find (a) P(Z < 0.411) (b) P(Z > −0.411) (c) P(Z > 0.411) (d) P(Z < −0.411)

Solution

(a) P(Z < 0.411) = [from tables: 0.6591 + 0.0004] = 0.6595

(b) P(Z > −0.411) = P(Z < 0.411) = 0.6595 (from (a))

(c) P(Z > 0.411) = 1 − Φ(0.411) = 1 − 0.6595 = 0.3405

(d) P(Z < −0.411) = P(Z > 0.411) = 0.3405
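The four results can be checked without tables. This is a sketch only, using `statistics.NormalDist` for Φ; the variable names are illustrative.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # Φ(z) for the standard normal distribution
a = 0.411

p_a = phi(a)        # (a) P(Z < 0.411)
p_b = 1 - phi(-a)   # (b) P(Z > -0.411)
p_c = 1 - phi(a)    # (c) P(Z > 0.411)
p_d = phi(-a)       # (d) P(Z < -0.411)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

Parts (a) and (b) agree, as do (c) and (d), which is the symmetry property used in the solution.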

[Sketches: standard normal curves with the areas Φ(−a), 1 − Φ(a) and Φ(a) shaded, showing that P(Z < −a) = P(Z > a)]



Now try these

1. P(Z > −0.314) 2. P(Z < −0.314) 3. P(Z > 0.111) 4. P(Z > −0.111)

P(a < Z < b) = Φ(b) − Φ(a)

Example:

Find P(0.345 < Z < 1.751)

= Φ(1.751) − Φ(0.345) = 0.9600 − 0.6350

= 0.3250
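The rule P(a < Z < b) = Φ(b) − Φ(a) translates directly into code. In this sketch the function name `prob_between` is an assumption made for illustration.

```python
from statistics import NormalDist

def prob_between(a, b):
    """P(a < Z < b) for the standard normal distribution."""
    phi = NormalDist().cdf
    return phi(b) - phi(a)

print(round(prob_between(0.345, 1.751), 3))
```

Small differences in the fourth decimal place compared with table values are to be expected, because tables are rounded.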

Now try these

Find

(a) P(0.35 < Z < 1.50)

(b) P(0.45 < Z < 1.51)

(c) P(0.354 < Z < 1.541)

(d) P(0.349 < Z < 1.716)

Answers

Now try these

1. 0.8078

2. 1 − 0.8078 = 0.1922

3. 0.7068

4. 1 − 0.7068 = 0.2932

5. Φ(0.314) = 0.6231, so by symmetry 0.6231

6. 1 − 0.6231 = 0.3769

7. 1 − 0.5442 = 0.4558

8. 0.5442

Now try these

(a) Φ(1.50) − Φ(0.35) = 0.9332 − 0.6368 = 0.2964

(b) 0.9345 − 0.6736 = 0.2609

(c) 0.9383 − 0.6382 = 0.3001

(d) 0.9569 − 0.6363 = 0.3206



Standardising a normal distribution

To standardise X, where X ~ N(μ, σ²), subtract the mean and then divide by the standard deviation:

$Z = \dfrac{X - \mu}{\sigma}$, where Z ~ N(0, 1)

Example
If X ~ N(100, 25), find P(X > 110)

Solution
First standardise the random variable:

$P(X > 110) = P\left(Z > \dfrac{110 - 100}{5}\right) = P(Z > 2)$

P(Z > 2) = 1 − P(Z ≤ 2)

= 1 − 0.9772

= 0.0228
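The same standardising step can be sketched in code. The function name `prob_greater` is illustrative; note that the second parameter of N(μ, σ²) is the variance, so its square root is taken before dividing.

```python
from math import sqrt
from statistics import NormalDist

def prob_greater(x, mean, variance):
    """P(X > x) for X ~ N(mean, variance), found by standardising to Z."""
    z = (x - mean) / sqrt(variance)
    return 1 - NormalDist().cdf(z)

print(round(prob_greater(110, 100, 25), 4))  # 0.0228
```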

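Now try these

1. If X ~ N(116, 64), find P(X < 100)

2. If X ~ N(100, 16), find P(X > 90)

[Figure: the curve X ~ N(100, 25) with x = 100 and x = 110 marked, and the corresponding curve Z ~ N(0, 1) with z = 0 and z = 2 marked]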


Example
Lengths of a murder victim's hair are normally distributed with a mean length of 150 cm and a standard deviation of 10 cm.

Find the probability that the length of a randomly selected hair is shorter than 165 cm.

Solution

Here X ~ N(150, 10²)

[Figure: normal curve with x = 150 and x = 165 marked against the corresponding z values]

(a) This means we have to find P(X < 165)



Now try these 2

1. The masses of packages from a particular machine are normally distributed with a mean of 200 g and a standard deviation of 2 g. Find the probability that a randomly selected package from the machine weighs

(a) less than 197 g

(b) more than 200.5 g

(c) between 198.5 and 199.5 g.

2. The heights of boys at a particular age follow a normal distribution with mean 150.3 cm and variance 25 cm². Find the probability that a boy picked at random from this group has height

(a) less than 153 cm

(b) more than 158 cm

(c) between 150 cm and 158 cm

(d) more than 10 cm difference from the mean height.

Answers

Now try these

1. 0.0228

2. 0.8994

Now try these 2

1. (a) 0.0668 (b) 0.4013 (c) 0.1747

2. (a) 0.7054 (b) 0.0618 (c) 0.4621 (d) 0.0456



Probability introduction

Random events happen by chance. Probability is a measure of how likely they are.

It is measured on a scale from 0 (impossible) to 1 (certain).

A random event has various outcomes.

In a trial (or experiment) the things that happen are called outcomes.

Events are groups of one or more outcomes.

When all outcomes are equally likely, the probability of an event is determined by counting the outcomes.

$P(\text{event}) = \dfrac{\text{Number of outcomes where event happens}}{\text{Total number of possible outcomes}}$

Example

A bag contains 10 balls: 4 are red, 3 are blue, 2 are white and 1 is black.

What is the probability of picking a blue ball? a white ball? a green ball? a ball that is not red?

A sample space is the set of all possible outcomes.

Example

Complete the following sample space for the score on rolling 2 dice:

Scores 1 2 3 4 5 6

1

2

3

4

5

6

Find the probability of scoring: a total of 12; a total of 7; a score of less than 4.
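One way to check a completed sample space is to list all 36 outcomes and count. The sketch below assumes that "score" means the total of the two dice; the helper name `probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes: (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """Count the outcomes where the event happens and divide by the total."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_total_12 = probability(lambda o: sum(o) == 12)
p_total_7 = probability(lambda o: sum(o) == 7)
p_less_than_4 = probability(lambda o: sum(o) < 4)

print(p_total_12, p_total_7, p_less_than_4)  # 1/36 1/6 1/12
```

Using `Fraction` keeps the answers exact rather than as rounded decimals.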

Venn diagrams can be used to show which outcome corresponds to which event.

[Venn diagram 1] The shaded area in the middle shows A and B: P(A∩B)

[Venn diagram 2] The shaded area shows A or B: P(A∪B)

P(A∪B) = P(A) + P(B) − P(A∩B)
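The addition rule can be checked with Python sets, where `|` gives the union and `&` the intersection. The events chosen here (an even number, and a number greater than 4, on one roll of a die) are for illustration.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {n for n in outcomes if n % 2 == 0}  # an even number
B = {n for n in outcomes if n > 4}       # a number greater than 4

def p(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

left_side = p(A | B)                   # P(A or B)
right_side = p(A) + p(B) - p(A & B)    # P(A) + P(B) - P(A and B)

print(left_side, right_side)  # 2/3 2/3
```

Subtracting P(A∩B) stops the outcome 6, which belongs to both events, from being counted twice.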



Example

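If you roll a dice, event A is an even number, and event B is a number > 4, then the Venn diagram would be:

[Venn diagram: circle A contains 2 and 4; circle B contains 5; 6 lies in the overlap of A and B; 1 and 3 lie outside both circles]

Find: P(A); P(B); P(A∩B); P(A')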



Conditional probability

In a small prison there are 100 prisoners. 50 are imprisoned for burglary, 29 for arson and 34 for other crimes.

First draw a Venn diagram. Work out how many are in for both burglary and arson: (50 + 29 + 34) − 100 = 13. These must have been counted twice, so they are the ones in for both. So those who are in for burglary only must be 50 − 13 = 37, and for arson only 29 − 13 = 16. Place these numbers on the Venn diagram.

[Venn diagram: Burglary circle (50) and Arson circle (29); 37 in burglary only, 13 in the overlap, 16 in arson only, 34 outside both circles]

What is the probability of choosing someone from the prison who is not in for burglary or arson?

1. What is the probability of choosing someone who is in for burglary and arson?

2. What is the probability of choosing someone who is in for arson?

3. What is the probability of this person being in for burglary as well?

This last question is known as conditional probability. It is often phrased as 'What is the probability of choosing someone who is convicted for burglary given that they are convicted for arson?'

This is written as: P(B|A) (the probability of B given A).

From the diagram, the answer is straightforward: 13/29.

The 13 represents those in for burglary and arson, and the 29 represents those in for arson.

So this can be written as:

$P(B|A) = \dfrac{P(A \text{ and } B)}{P(A)}$
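The conditional probability formula can be checked with the prison figures. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

total_prisoners = 100
burglary_and_arson = 13
arson = 29

p_a_and_b = Fraction(burglary_and_arson, total_prisoners)  # P(A and B)
p_a = Fraction(arson, total_prisoners)                     # P(A)

p_b_given_a = p_a_and_b / p_a                              # P(B|A)
print(p_b_given_a)  # 13/29
```

The total of 100 cancels, which is why the answer can be read straight from the Venn diagram as 13 out of 29.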



Now try these

1. Two dice are thrown. What is the probability that the total is: (i) 7; (ii) a prime

number; (iii) 7, given that it is a prime number.

2. A forensics company is worried about the high turnover of its employees and decides to investigate whether they are more likely to stay if they are given training. On 1st January one year the company employed 256 people (excluding those about to retire). During the year a record was kept of who received training as well as who left the company. The results are summarised below:

                      Still employed    Left company    Total
Given training        109               43              152
Not given training    60                44              104
Total                 169               87              256

Find the probability that a randomly selected employee:

(i) received training

(ii) did not leave the company

(iii) received training and did not leave the company

(iv) did not leave the company, given that the person had received training

(v) received training, given the person had not left the company.

3. 100 cars are entered for a road-worthiness test which is in two parts, mechanical

and electrical. A car passes only if it passes both parts. Half the cars fail the

electrical test and 62 pass the mechanical. 15 pass the electrical but fail the

mechanical test. Find the probability that a car chosen at random:

(i) passes overall (ii) fails one test only (iii) given that it has failed, failed the mechanical test only.



Statistics and probability questions

For each task show all your workings. Give the final answer where appropriate to 3 significant figures. Hand in your completed working and solution.

Task 1
A. Use the data from your titration experiment.
(a) Find the mean and median of the volume of HCl used to determine the end point.
(b) Determine the standard deviation using an appropriate method.

B. On a particular corpse some unidentified tissue has been found. A sample of 11 cells has been taken and measured. The diameters (in μm) are as follows:

123, 126, 129, 122, 125, 128, 125, 124, 125, 126, 122

(a) Find the mean, median and mode of the diameters.
(b) Determine the standard deviation manually and by ICT (if you use a spreadsheet include a screen shot).
(c) Calculate the standard error of the mean. What is the 95% confidence limit?

(P1)

Task 2
You have been investigating the probability of certain ballistic trace evidence being found at a crime scene. The probability of type A is 0.3 and the probability of type B is 0.5. The conditional probability P(A|B) = 0.25.

(a) Find the probability of finding A and B at a crime scene.
(b) Find the probability of finding A or B at the scene.

(P1 part; M1 part)

Task 3
A forensic anthropologist has asked your advice. She was investigating the lifespan of insects on a human corpse. The mean lifespan for one insect is 144 days and the standard deviation is 16 days. Find the probability that one insect will live less than 140 days and another more than 156 days.

(M1 part, D1 part)



Task 4
The following distributions have had their labels removed.

[Figure: four distribution sketches labelled A, B, C and D]

Identify:
(a) the bimodal distribution
(b) the positively skewed distribution
(c) the negatively skewed distribution and
(d) the normal distribution.

Which distribution matches the following:
(i) An easy science examination
(ii) The salary of workers in a large laboratory
(iii) The heights of males and females in the UK
(iv) The mass of males in a large science laboratory.

(D1 part)



2 BE ABLE TO PERFORM STATISTICAL TESTS

TO INVESTIGATE SCIENTIFIC PROBLEMS

Chi-squared test: ($\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where O is the observed frequency and E is the expected frequency); degrees of freedom; contingency tables; science related applications of the Chi-squared test, eg colour blindness, psychology, genetics, drug tests, any other science related test

P2 perform a chi-squared test to support a scientific hypothesis

M2 interpret the results of the chi-squared test

D2 evaluate the validity of the interpretation of the results of the chi-squared test

The t-test: independent samples; related samples (matched pairs); applications,

eg equal number of seeds in two different composts, test whether a particular

fertilizer improves yield of tomatoes, any other science related test

P3 perform a t-test on data collected from a laboratory experiment

M3 interpret the results of the t-test

D3 evaluate the validity of the interpretation of the results of the t-test

Correlation testing: graphical test, eg line of best fit; linear regression, eg using a

calculator in linear regression mode; testing for power law, eg radioactivity

experiments, electrical experiments, any other science related example

P4 carry out an appropriate correlation method to investigate data collected from a

laboratory experiment.

M4 interpret the results of the correlation.

D4 evaluate the validity of the interpretation of the results of the correlation.



Chi-squared (χ²) test

χ² is pronounced kai-squared, and sometimes written chi-squared. The χ² test helps discover if there is any connection between two variables that can be arranged into categories (eg colours, countries, gender). (It cannot be used with continuous data.)

Example 1

50 men and 50 women are interviewed.
43 men can name over 15 clubs in the premier league.
27 women can name over 15 clubs in the premier league.
Is there a connection between gender and football interest (assuming being able to name over 15 clubs means that the person has an interest in football)?

1. Define the null and alternative hypotheses

H0: there is no difference between genders
H1: there is a difference between genders.

2. Arrange the data into a contingency table

         Interested in football    Not interested in football    Total
Men      43                        7                             50
Women    27                        23                            50
Total    70                        30                            100

This is a 2 × 2 table: there are 2 categories for each variable.

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where O is the observed values and E is the expected values.

The contingency table above gives the observed values. We now have to find the expected values.

3. Find the expected values

Each expected value is found by multiplying the column total by the row total and dividing by the grand total:

$\text{expected value} = \dfrac{\text{column total} \times \text{row total}}{\text{overall total}}$


Hence for men:

interested in football we would expect: (70 × 50) ÷ 100 = 35

not interested in football: (30 × 50) ÷ 100 = 15

For women:

interested in football: (70 × 50) ÷ 100 = 35

not interested in football: (30 × 50) ÷ 100 = 15

The expected table would then read:

         Interested in football    Not interested in football    Total
Men      35                        15                            50
Women    35                        15                            50
Total    70                        30                            100

The totals will remain unchanged.

4. Calculate the residual table

The residual is the difference between the observed and the expected values.

Observed      −    Expected      =    Residual

43    7            35    15           +8    −8
27    23           35    15           −8    +8

In 2 × 2 tables the numbers will always be the same, with only the signs differing.

5. Calculate χ²

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where O is the observed values and E is the expected values.

The residual table gives the values of (O − E).

So, $\chi^2 = \dfrac{(+8)^2}{35} + \dfrac{(-8)^2}{15} + \dfrac{(-8)^2}{35} + \dfrac{(+8)^2}{15} = 12.19$

We now have to decide if the χ² value is high enough to conclude that it is unlikely to get such a number by chance.

To do this we have to look at the concept of degrees of freedom.
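Steps 3 to 5 can be sketched as a small Python function that works for a table of any size. The function name `chi_squared` and the list-of-rows layout are assumptions made for illustration, not part of the unit.

```python
def chi_squared(observed):
    """Return (chi-squared value, expected table) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected value = (row total x column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (O - E)^2 / E over every cell of the table
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected

football = [[43, 7], [27, 23]]  # rows: men, women
stat, expected = chi_squared(football)
print(expected)        # [[35.0, 15.0], [35.0, 15.0]]
print(round(stat, 2))  # 12.19
```

The function only calculates the statistic; deciding whether the value is significant still needs the degrees of freedom and a table of critical values.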



Degrees of freedom

In a 2 × 2 contingency table, the value of one entry determines all the others:

                         Total
         43              50
                         50
Total    70      30      100

However, in a 3 × 3 table we need 4 values before we can know what all the other values are:

                                 Total
         37      22              70
         8       10              20
                                 60
Total    60      50      40

One in the first example, and four in the second example, are called the degrees of freedom.

The degrees of freedom can be calculated using:

degrees of freedom = (r − 1) × (c − 1)

where r = the number of rows and c = the number of columns.
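The formula is simple enough to write as a one-line function. The function name below is illustrative.

```python
def degrees_of_freedom(rows, cols):
    """Degrees of freedom for a contingency table with the given rows and columns."""
    return (rows - 1) * (cols - 1)

print(degrees_of_freedom(2, 2))  # 1
print(degrees_of_freedom(3, 3))  # 4
```

The row and column of totals are not counted when working out r and c.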

Knowing the degrees of freedom, the χ² value and a table of critical values we can find out if there is any relationship between gender and interest in football.

One-tail    5%       2.5%     1.25%    0.5%     0.25%    0.05%
Two-tail    10%      5%       2.5%     1%       0.5%     0.1%
            0.9      0.95     0.975    0.99     0.995    0.999
ν = 1       2.706    3.841    5.024    6.635    7.879    10.83
ν = 2       4.605    5.991    7.378    9.210    10.60    13.82
ν = 3       6.251    7.815    9.348    11.34    12.84    16.27
ν = 4       7.779    9.488    11.14    13.28    14.86    18.47

With one degree of freedom, a test at the 5% level gives us a critical value of 3.841. This means that, if there were no relationship, we would expect a number greater than 3.841 only 5% of the time.

As the χ² value is 12.19, we can say that at the 5% level we are confident that there is a relationship between football and gender.



Example 2

A sociologist wants to know if middle-class men are more likely to change babies' nappies than working-class men. The sociologist interviews 40 middle-class and 60 working-class men. 17 middle-class men change nappies and 13 working-class men change nappies.

1. Define the null and alternative hypotheses
H0: There is no connection between social class and nappy changing
H1: The two variables are related.

2. Arrange the data into a contingency table

3. Find the expected values

4. Calculate the residual table

17    23           12    28           +5    −5
13    47     −     18    42     =     −5    +5

5. Calculate χ²

So, $\chi^2 = \dfrac{(+5)^2}{12} + \dfrac{(-5)^2}{28} + \dfrac{(-5)^2}{18} + \dfrac{(+5)^2}{42} = 4.96$

6. Find the degrees of freedom

Degrees of freedom = (2 − 1) × (2 − 1) = 1

7. Use the tables

The chance that χ² will be 3.841 or more if H0 is true is 5%. χ² = 4.96, so this suggests that we reject H0 and conclude that there is some connection between social class and nappy changing.
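The final comparison with the critical value can be sketched as below. The observed and expected values are those of the example; the critical value 3.841 is the tabulated 5% value for one degree of freedom, and the variable names are illustrative.

```python
observed = [[17, 23], [13, 47]]  # rows: middle-class, working-class
expected = [[12, 28], [18, 42]]  # (row total x column total) / grand total

# Sum of (O - E)^2 / E over the four cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

CRITICAL_5_PERCENT_DF1 = 3.841  # from the table of critical values
reject_h0 = chi2 >= CRITICAL_5_PERCENT_DF1

print(round(chi2, 2), reject_h0)
```

Rejecting H0 at the 5% level still leaves a 5% chance of a type I error, so the conclusion should be worded with that in mind.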



Now try these

1. Find the expected values for the following tables.

(a)    18    32          (b)    25    16
       8     42                 22    37

(c)    40    60    60
       60    50    50
       20    50    10

2. Find the residual tables for the tables in question 1.

3. Calculate the χ² for the tables in question 1.

4. How many degrees of freedom will there be for each of the following contingency tables?

(a) 5 × 3 (b) 7 × 5 (c) 6 × 2 (d) 10 × 17

5. The table below shows the results of a drug test on an infection. Is there any evidence that treatment is related to cure?

             Treated    Not treated
Cured        24         57
Not cured    53         257

6. Murder Inc., a forensic science firm, carried out a survey to find out the political affiliation of its employees. Carry out a χ² test on the table to determine whether there is any association between political affiliation and type of work.

                Lab-based    Non lab-based    Total
Conservative    22           16               38
Labour          53           8                61
LibDem          20           11               31
Total           95           35               130

7. A researcher in genetics is investigating whether eye colour bears any relationship to place of residence. From the table below, is there any evidence of such a relationship?

               Brown    Blue    Other
Leicester      72       80      28
Bournemouth    20       62      18
Aberdeen       67       120     44



Answers

1.

(a)    13    37          (b)    19.27    21.73
       13    37                 27.73    31.27

(c)    48    64    48
       48    64    48
       24    32    24

2.

(a)    +5    −5          (b)    +5.73    −5.73
       −5    +5                 −5.73    +5.73

(c)    −8     −4     +12
       +12    −14    +2
       −4     +18    −14

3. (a) χ² = 25/13 + 25/37 + 25/13 + 25/37 = 5.20 (b) 5.45 (c) 29.69

4. (a) 4 × 2 = 8 (b) 6 × 4 = 24 (c) 5 (d) 144

5. 6.38, significant at 2.5%, so there is evidence of an association

6. 11.59, significant at 0.5%, so there is evidence of an association

7. 13.5, 4 degrees of freedom, significant at 1%, so there is evidence of an association.



Practice questions

1. Is there a connection at the 5% level between burglary and house type?

            Burglary    No burglary    Total
House       3           2
Bungalow    4           1
Total

2. Is there a connection between the type of area and fatal traffic accidents (figures in thousands) at the 5% level?

            Fatal    Non-fatal    Total
Motorway    5        15           20
Urban       4        24           28
Rural       3        12           15
Total       12       51           63

Solutions

1. Degrees of freedom: 1

Chi-square = 0.476

For significance at the 5% level, chi-square should be greater than or equal to 3.84.

The result is not significant.

2. Degrees of freedom: 2

Chi-square = 0.88

For significance at the 5% level, chi-square should be greater than or equal to 5.99.

The result is not significant.



Type I and type II errors

There are four possible conclusions when conducting a significance test:

True situation    Our conclusion    Result
H0 is true        Accept H0         Correct decision
H0 is true        Reject H0         Wrong decision: Type I error
H0 is false       Accept H0         Wrong decision: Type II error
H0 is false       Reject H0         Correct decision

A type I error is known as a false positive. For example, a court finding a person guilty of a crime they did not commit.

The probability of a type I error is the same as the significance level.

A type II error is a false negative. For example, a court finding a person not guilty of a crime they did commit.

A third type of error has also been proposed: type III, rejecting the null hypothesis for the wrong reason!

Justice System - Trial

                                                                Defendant Innocent    Defendant Guilty
Reject Presumption of Innocence (Guilty Verdict)                Type I Error          Correct
Fail to Reject Presumption of Innocence (Not Guilty Verdict)    Correct               Type II Error

Statistics - Hypothesis Test

                                  Null Hypothesis True    Null Hypothesis False
Reject Null Hypothesis            Type I Error            Correct
Fail to Reject Null Hypothesis    Correct                 Type II Error



The angel of death: guilty or not guilty?

Kristen Gilbert was a nurse on Ward C at the Veterans Affairs Medical Center in Northampton, Massachusetts, USA. She earned the nickname 'Angel of Death' as she was often the first to notice that a patient was going into cardiac arrest. She was calm and competent and would be able to administer the correct drug to save the patient.

However, there were growing suspicions about her behaviour. There had been a high number of deaths on her particular ward, as well as shortages of the amphetamine-type drug epinephrine, which can be used to cause cardiac arrest.

A hospital investigation found nothing untoward. Some staff were still concerned, so a second investigation took place, this time involving statistician Stephen Gehlbach. Gehlbach plotted the annual number of deaths, broken down by shift and year (below). Gilbert started to work on Ward C in March 1990 and stopped working at the hospital in February 1996.

[Figure: Total deaths at the hospital, by shift and year. Source: Devlin & Lorden (2007, p. 16)]

What pattern does the bar chart show?



Is there evidence to secure a conviction? Could it be a coincidence? To determine this we can use a chi-squared test.

Here is the data the investigators had:

                   Death on shift
Gilbert present    Yes    No      Total
Yes                40     217
No                 34     1350
Total

Perform a chi-squared test to support the following one-tail hypothesis at the 0.01 level (P2):

HA: Significantly more patients will be found to die on a shift where the subject is working than on shifts when the subject is not working.

State clearly your conclusion.

What are the implications of the result? (M2)

How valid are your results? How valid is your interpretation? Is Kristen Gilbert really guilty or not guilty? (D2)

Bibliography

Devlin, K. and Lorden, G. (2007). The Numbers behind Numb3rs. Plume: New York.

Pyrek, K. M. (2009). Kristen Gilbert Case Explored in New Book. Forensic Nurse [online: http://www.forensicnursemag.com/webx/391webx1.html, accessed 22 Jan 2010]



Student's t-test

'Student' was W. S. Gosset. He published his test under the pseudonym 'Student' because he was working as a statistician for the brewers Guinness, and Guinness did not want the competition knowing that they were using statistics to help improve the brewing process.

The test is used to compare samples from two different batches.

This may be beer brewed under different circumstances, soil from

different areas or evidence from two different crime scenes.

It is usually used with small (



4. Calculate the standard deviation of the differences

$s = \sqrt{\dfrac{\sum (D - \bar{D})^2}{n - 1}} = 1.51$

5. Calculate the standard error

$SE = \dfrac{s}{\sqrt{n}} = \dfrac{1.51}{\sqrt{10}} = 0.478$

6. Calculate the value of t

$t = \dfrac{\bar{D}}{SE} = \dfrac{2.4}{0.478} = 5.0$

7. Calculate the number of degrees of freedom and find the critical value

Number of pairs of data − 1 = n − 1

10 − 1 = 9

8. From the table with 9 degrees of freedom, 1-tail at the 0.05 level: tcritical = 1.833

9. Determine if there is a difference or not

t > tcritical (5.0 > 1.833)

So, the null hypothesis is rejected and the alternative hypothesis is accepted.

The ninhydrin does make a positive difference.
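The arithmetic in steps 5 to 9 can be checked with a few lines of Python. This is a rough check rather than a full solution: the standard deviation, number of pairs and critical value are taken from the steps above, while the mean difference of 2.4 is an assumption inferred from t = 5.0 and SE = 0.478, because the original figure is not legible here.

```python
import math

n = 10              # number of pairs (step 7)
s = 1.51            # standard deviation of the differences (step 4)
mean_diff = 2.4     # ASSUMPTION: inferred from t = 5.0 and SE = 0.478
t_critical = 1.833  # 9 degrees of freedom, 1-tail, 0.05 level (step 9)

se = s / math.sqrt(n)  # step 5: standard error of the differences
t = mean_diff / se     # step 6: value of t

print(f"SE = {se:.3f}, t = {t:.1f}, degrees of freedom = {n - 1}")
print("Reject H0" if t > t_critical else "Do not reject H0")
```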



t-test for matched pairs

1. Set up the null and alternative hypotheses and determine if it is a one- or two-tail test

H0:

HA:

2. Calculate the differences between the pairs in the samples (D)

3. Calculate the mean of the differences

$\bar{D} = \dfrac{\sum D}{n}$

4. Calculate the standard deviation of the differences

$s = \sqrt{\dfrac{\sum (D - \bar{D})^2}{n - 1}}$

5. Calculate the standard error of the differences

$SE = \dfrac{s}{\sqrt{n}}$

6. Calculate the value of t

$t = \dfrac{\bar{D}}{SE}$

7. Calculate the number of degrees of freedom

Number of pairs of data − 1 = n − 1

8. Find the critical value from the table

9. Determine if there is a difference or not

If t < critical value then there is no significant difference between the two sets of data and the null hypothesis is accepted.

If t ≥ critical value then the null hypothesis is rejected and the two sets of data differ significantly.
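The steps above can be sketched as a short Python function. This is a minimal sketch, not a definitive implementation: the function name paired_t and the before and after measurements are hypothetical, invented purely for illustration.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, degrees of freedom) for matched pairs, following steps 2 to 7."""
    if len(before) != len(after):
        raise ValueError("matched pairs need two samples of equal length")
    diffs = [a - b for a, b in zip(after, before)]  # step 2: differences D
    n = len(diffs)
    mean_diff = statistics.mean(diffs)              # step 3: mean of D
    s = statistics.stdev(diffs)                     # step 4: uses n - 1
    se = s / math.sqrt(n)                           # step 5: standard error
    return mean_diff / se, n - 1                    # steps 6 and 7


# Hypothetical measurements, not taken from the textbook.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 15]

t, dof = paired_t(before, after)
print(f"t = {t:.2f} with {dof} degrees of freedom")
```

The value of t is then compared with the critical value from the table for the chosen tail and significance level (steps 8 and 9); for 4 degrees of freedom, 1-tail at the 0.05 level, the standard table value is 2.132.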



Independent samples

If there is no before and after relationship between the samples then the independent

samples test is used.

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

Example

Some brown dog hairs were found on the clothing of a victim at a crime scene involving a dog.

Five of the hairs were measured; their widths were 46, 57, 54, 51 and 38 µm.

A suspect is the owner of a dog with similar brown hairs. A sample of its hairs has been taken and their widths measured: 31, 35, 50, 35 and 36 µm.

Is it possible that the hairs found on the victim were left by the suspect's dog? Test at the 5% level.

[From D. Lucy, Introduction to Statistics for Forensic Scientists. Chichester: Wiley, 2005, p. 44.]

Solution

1. Calculate the mean and standard deviation for the data sets $x_1$ and $x_2$

                      Dog A    Dog B

                      46       31

                      57       35

                      54       50

                      51       35

                      38       36

Total                 246      187

Mean                  49.2     37.4

Standard deviation    7.463    7.301

2. Calculate the magnitude of the difference between the two means, $|\bar{x}_1 - \bar{x}_2|$

49.2 − 37.4 = 11.8

3. Calculate the standard error (SE) in the difference:

$SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{7.463^2}{5} + \dfrac{7.301^2}{5}} = \sqrt{11.14 + 10.66}$

= 4.669 ≈ 4.67 (3 sf)



4. Calculate the value of t:

t = difference between the means ÷ standard error in the difference

11.8 ÷ 4.669 = 2.527

≈ 2.53 (3 sig fig)

5. Calculate the degrees of freedom = $n_1 + n_2 - 2$

5 + 5 − 2 = 8

6. Find the critical value from the table for the significance level you are working to

At the 0.05 level tcrit = 2.306

If t < critical value then there is no significant difference between the two sets of data

If t > critical value then there is a significant difference between the two sets of data

Here t > tcrit (2.53 > 2.306), so at the 0.05 level there is a significant difference between the two data sets.

So it is unlikely that the hairs found on the victim came from the suspect's dog.
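The worked example can be reproduced in a few lines of Python as a check on the arithmetic. This is a sketch under stated assumptions: the function name independent_t and the variable names are invented for this example, and the standard error is the unpooled form used in step 3.

```python
import math
import statistics


def independent_t(sample_1, sample_2):
    """Return (t, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample_1), len(sample_2)
    # Step 2: magnitude of the difference between the two means.
    mean_gap = abs(statistics.mean(sample_1) - statistics.mean(sample_2))
    # Step 3: standard error; variance() uses n - 1, so it equals s squared.
    se = math.sqrt(statistics.variance(sample_1) / n1
                   + statistics.variance(sample_2) / n2)
    # Steps 4 and 5: value of t and degrees of freedom.
    return mean_gap / se, n1 + n2 - 2


victim_hairs = [46, 57, 54, 51, 38]       # widths in micrometres
suspect_dog_hairs = [31, 35, 50, 35, 36]  # widths in micrometres

t, dof = independent_t(victim_hairs, suspect_dog_hairs)
T_CRIT_0_05 = 2.306  # critical value quoted in step 6 for 8 degrees of freedom
print(f"t = {t:.3f}, degrees of freedom = {dof}")
print("Significant difference" if t > T_CRIT_0_05 else "No significant difference")
```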



Independent t-test

1. Calculate the mean and standard deviation for the data sets $x_1$ and $x_2$

2. Calculate the magnitude of the difference between the two means, $|\bar{x}_1 - \bar{x}_2|$

3. Calculate the standard error (SE) in the difference:

$SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

4. Calculate the value of t:

t = difference between the means ÷ standard error in the difference [step 2 ÷ step 3]

5. Calculate the degrees of freedom = $n_1 + n_2 - 2$

6. Find the critical value for the particular significance level you are working to.

7. If t < critical value then there is no significant difference between the two sets of data and the null hypothesis is accepted.

If t ≥ critical value then the null hypothesis is rejected and the two sets of data differ significantly.



STATISTICAL TABLES

