Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Page 1: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Research Methods

Part 4: T-Statistics

Partly based on material by Sherry O’Sullivan

Page 2: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Revision

• General terms
– Population
– Sample
– Parameter
– Statistic

• Measures of central tendency
– Mean
– Median
– Mode

• Measures of spread
– Range
– Inter-quartile range
– Variance
– Standard deviation

• Population
• Sample

Page 3: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Revision of notation

• Numbers describing a population are called parameters
• Notation uses Greek letters
• Population mean = μ
• Population standard deviation = σ

• Numbers describing a sample are called statistics
• Notation uses ordinary letters
• Sample mean = x̄
• Sample standard deviation = s

Page 4: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Revision: Z - Scores

• A specific method for describing a specific location within a distribution
– Used to determine the precise location of an individual score within the distribution
– Used to compare the relative positions of 2 or more scores
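A z-score expresses a score as a number of standard deviations from the mean, z = (x − μ)/σ. A minimal Python sketch of both uses listed above; the function name `z_score` and the test scores are illustrative assumptions, not taken from the slides:

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

# Hypothetical: the same raw score of 70 on two differently scaled tests.
maths = z_score(70, mu=60, sigma=5)
english = z_score(70, mu=65, sigma=10)
print(maths, english)
```

The raw scores are equal, but the z-scores show that 70 is a relatively stronger result on the first test than on the second.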

Page 5: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Revision: Standard Deviation

• Measures the spread of scores within the data set
– Population standard deviation is used when you are only interested in your own data
– Sample standard deviation is used when you want to generalise from your sample to the rest of the population
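The two versions differ only in the divisor. A short sketch using Python's standard `statistics` module, applied to the four scores 2, 4, 6, 8 that appear later in these slides:

```python
import statistics

scores = [2, 4, 6, 8]

# Treat the scores as the whole population: divide the sum of squares by N.
population_sd = statistics.pstdev(scores)

# Treat the scores as a sample from a larger population: divide by n - 1.
sample_sd = statistics.stdev(scores)

print(population_sd, sample_sd)
```

The sample version is always the larger of the two for the same data, because it divides by a smaller number.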

Page 6: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Normal Distribution (Bell shaped)

Page 7: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Normal distribution

• Many data sets follow a Normal distribution
– Defined mathematically by its mean and standard deviation

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Many statistical tests assume that data follows the Normal distribution

Strictly, you can’t use these tests unless you can show that your data follows a Normal distribution
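Because the Normal density depends only on the mean and standard deviation, it can be evaluated directly. The sketch below writes the density out by hand and compares it with `statistics.NormalDist` from the standard library; the values μ = 500 and σ = 100 are illustrative. It does not test whether a data set is Normal, it only shows what the curve is.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

mu, sigma = 500.0, 100.0  # illustrative values
for x in (300.0, 500.0, 540.0):
    by_hand = normal_pdf(x, mu, sigma)
    library = NormalDist(mu, sigma).pdf(x)
    print(x, by_hand, library)
```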

Page 8: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Other possible distributions

• Poisson distribution – for very rare events
– e.g. number of BSCs (blue screen crashes) per hour of computer use
– Mean is small, often less than 1
– Mode and median often zero

• Binomial distribution
– Can look very similar to the Normal distribution when the number of trials is large, but it is a discrete distribution (as opposed to a continuous distribution)

• There are lots of others…

Page 9: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Poisson distribution (μ = mean = variance)

[Figure: histogram titled "Poisson Distribution, μ = 2; n = 1000"; horizontal axis from 0 to 15, vertical axis (frequency) from 0 to 300]

$$f(x) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$
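The Poisson formula gives the probability of seeing exactly x events when the mean count is μ. As a rough sketch, the expected frequencies for the plotted case (μ = 2, n = 1000) can be computed as follows; the function name `poisson_pmf` is an assumption for illustration:

```python
import math

def poisson_pmf(x, mu):
    """Probability of observing exactly x events when the mean count is mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

mu, n = 2, 1000  # values from the plot title
expected = [n * poisson_pmf(x, mu) for x in range(16)]
for x, count in enumerate(expected):
    print(x, round(count, 1))
```

With μ = 2, the values x = 1 and x = 2 are equally likely, and the expected counts fall away quickly after that.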

Page 10: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Distribution of the sample means (simple example)

[Figure: frequency plot on an axis from 0 to 9, with one X above each of 2, 4, 6 and 8]

• Frequency distribution of 4 scores (2, 4, 6, 8)

• Distribution looks flat and not bell shaped
• (actually not enough data to decide what the distribution might be)

• Mean of population is (2+4+6+8)/4 = 5

Page 11: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Distribution of the sample means

• Take all possible samples of two scores
• Calculate average for each sample

(2+2)/2 = 2   (2+4)/2 = 3   (2+6)/2 = 4   (2+8)/2 = 5
(4+2)/2 = 3   (4+4)/2 = 4   (4+6)/2 = 5   (4+8)/2 = 6
(6+2)/2 = 4   (6+4)/2 = 5   (6+6)/2 = 6   (6+8)/2 = 7
(8+2)/2 = 5   (8+4)/2 = 6   (8+6)/2 = 7   (8+8)/2 = 8

[Figure: frequency plot of the 16 sample means on an axis from 0 to 9: one X at 2, two at 3, three at 4, four at 5, three at 6, two at 7, one at 8]
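The 16 samples listed here are ordered pairs drawn with replacement, which is why (2+2)/2 appears and why (2+4)/2 and (4+2)/2 are counted separately. A small sketch that enumerates them and tallies the means:

```python
from collections import Counter
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]

# Every ordered sample of two scores, drawn with replacement (16 in total).
sample_means = [mean(pair) for pair in product(population, repeat=2)]

frequencies = Counter(sample_means)
for value in sorted(frequencies):
    print(value, "X" * frequencies[value])

print("mean of the sample means:", mean(sample_means))
```

The tally piles up in the middle even though the original four scores were spread flat, and the mean of the sample means equals the population mean.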

Page 12: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Central Limit Theorem

• “For any population with a mean μ and standard deviation σ , the distribution of sample means for sample size n will have a mean of μ and standard deviation of σ/√n and will approach a normal distribution as n gets very large.”

• How big should the sample size be? By convention, around n = 30 is treated as large enough

[Figure: frequency plot of sample means on an axis from 0 to 9, forming a symmetric pile of X marks that peaks in the middle]
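The mean and standard deviation parts of the theorem can be checked exactly on the four-score population (2, 4, 6, 8) by enumerating every ordered sample drawn with replacement. This sketch does not demonstrate the approach to a Normal shape, only that the sample means have mean μ and standard deviation σ/√n:

```python
import math
from itertools import product
from statistics import mean, pstdev

population = [2, 4, 6, 8]
mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation

for n in (2, 3, 4):
    # All ordered samples of size n, drawn with replacement.
    means = [mean(sample) for sample in product(population, repeat=n)]
    print(n, mean(means), pstdev(means), sigma / math.sqrt(n))
```

In each row the last two columns should agree up to floating-point rounding.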

Page 13: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Standard Error

σ/√n is used to calculate the Standard Error of the sample mean.

Sample data = x
The mean of each sample = x̄
Then the standard error becomes

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

It identifies how much the observed sample mean is likely to differ from the un-measurable population mean μ.

So to be more confident that our sample mean is a good measure of the population mean, then the standard error should be small. One way we can ensure this is to take large samples (large n).
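To see how quickly larger samples shrink the standard error, here is a short sketch; σ = 100 and the sample sizes are illustrative choices, and `standard_error` is a hypothetical helper name:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for samples of size n."""
    return sigma / math.sqrt(n)

sigma = 100  # illustrative population standard deviation
for n in (4, 25, 100, 400):
    print(n, standard_error(sigma, n))
```

Because of the square root, the sample size has to be quadrupled to halve the standard error.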

Page 14: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Example

The population of SAT scores is normal with μ = 500, σ = 100. What is the chance that a sample of n = 25 students has a mean score of 540 or more? Since the distribution is normal, we can use the z-score.

First calculate the Standard Error: σ/√n = 100/√25 = 100/5 = 20

Then the Z-Score: z = (540 − 500)/20 = 2

The z-value is 2, therefore around 98% of the sample means are below this and only about 2% are above. So we conclude that the chance of getting a sample mean of 540 or more through random variation alone is about 2%. If this sample mean were recorded in an experiment, it would therefore be reasonable to suspect that it is not just due to random variation, but that the 25 students are (on average) brighter than average.
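The arithmetic in this example can be reproduced with the standard library, which also gives the tail probability more precisely than a rounded table lookup. A minimal sketch:

```python
import math
from statistics import NormalDist

mu, sigma, n = 500, 100, 25
sample_mean = 540

standard_error = sigma / math.sqrt(n)        # 100 / 5
z = (sample_mean - mu) / standard_error      # (540 - 500) / 20

# Probability that a sample mean lies at least this far above mu.
p_above = 1 - NormalDist().cdf(z)
print(standard_error, z, round(p_above, 4))
```

The tail probability comes out slightly above 2%, consistent with the "about 2%" quoted in the example.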

Page 15: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

t - Statistics

• So far we’ve looked at mean and sd of populations and our calculations have had parameters

• But how do we deduce something about the population using our sample?

• We can use the t-Statistic

Page 16: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

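t - Statistics

Remember SD from last week?

$$\sigma = \sqrt{\frac{SS}{N}} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

Great for a population of N but not for a sample of n:

$$s = \sqrt{\frac{SS}{n-1}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$$

Why n − 1? Because once the sample mean is fixed we can only freely choose n − 1 of the scores (Degrees of freedom = df)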
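The "only n − 1 free choices" idea can be seen numerically: the deviations from the sample mean always sum to zero, so the last deviation is determined by the others. A sketch with a hypothetical sample of four scores:

```python
import math

sample = [3, 5, 7, 9]  # hypothetical sample of n = 4 scores
n = len(sample)
x_bar = sum(sample) / n

deviations = [x - x_bar for x in sample]
ss = sum(d ** 2 for d in deviations)   # sum of squares, SS

# The deviations sum to zero, so the last one is fixed by the others.
last_from_others = -sum(deviations[:-1])

s = math.sqrt(ss / (n - 1))            # sample SD, using df = n - 1
print(ss, s, deviations[-1], last_from_others)
```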

Page 17: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

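t - Statistics

Standard Error, estimated from the sample:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

The t statistic is the z-score redone using the above:

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \qquad \text{becomes} \qquad t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$

For the t-statistic, we substitute σ (SD of population) with s (SD of sample).

But what about μ? An example…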

Page 18: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Hypothesis Testing

• Sample of computer game players n = 16
• Intervention = inclusion of rich graphical elements
• Level has 2 rooms
– Room A = lots of visuals
– Room B = very bland
• Put them in the level for 60 minutes
• Record how long they spend in B

Page 19: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Results

• Average time spent in B = 39 minutes
• Observed “sum of squares” for the sample is SS = 540.

[Figure: the two rooms, A and B]

Page 20: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Stage 1: Formulation of Hypothesis

• H0: “null hypothesis”, that the visuals have no effect on the behaviour.

• H1: “alternate hypothesis”, that the visuals do have an effect on the players’ behaviour.

• If visuals have no effect, how long on average should they be in room B?

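• The null hypothesis is crucial: if the visuals have no effect, players should split the 60 minutes evenly between the rooms, so we can infer μ = 30 without having to measure the population mean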

Page 21: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Stage 2: Locate the critical region

• We use the t-table to help us locate this, enabling us to reject or fail to reject the null hypothesis. To get the critical value we need:
– Number of degrees of freedom (df): 16 − 1 = 15
– We choose a significance level: α = 0.05 (95% confidence)
– Locate in t-table (2 tails): critical value of t = 2.131
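The critical value is normally read from a printed t-table. As a rough cross-check that needs only the standard library, the sketch below integrates the t density numerically (Simpson's rule) and searches for the cut-off by bisection. The function names, step count and search range are all assumptions made for illustration; a statistics package would be the usual tool.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= t), using Simpson's rule on the density between 0 and t."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

def critical_t(df, alpha, lo=0.0, hi=50.0):
    """Bisection search for the t value whose two-tailed p equals alpha."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid   # tails still too heavy: move outwards
        else:
            hi = mid
    return (lo + hi) / 2

print(round(critical_t(df=15, alpha=0.05), 3))
```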

Page 22: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Stage 3: Calculate statistics

Calculate sample sd: s = √(SS/(n − 1)) = √(540/15) = 6

Sample Standard Error = s/√n = 6/4 = 1.5

t-Statistic: t = (x̄ − μ)/(standard error) = (39 − 30)/1.5 = 6

The μ = 30 came from the null hypothesis: if visuals had no effect, then the players would spend 30 minutes in each of rooms A and B.

Page 23: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Stage 4: Decision

• Can we reject H0, that the visuals have no effect on the behaviour?
– t = 6, which is well beyond the critical value of 2.131 that marks where chance stops being a plausible explanation.
• So “yes”, we can safely reject it and say the visuals do affect behaviour
• Which room do they prefer?
– They spent on average 39 minutes in Room B, which is bland
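Stages 3 and 4 of the worked example can be reproduced in a few lines. This sketch uses only the summary figures given in the slides (n = 16, mean 39, SS = 540, μ = 30 under H0, critical value 2.131); the variable names are illustrative:

```python
import math

n = 16            # players in the sample
sample_mean = 39  # average minutes spent in room B
ss = 540          # observed sum of squares
mu_null = 30      # expected minutes in room B under H0
critical = 2.131  # two-tailed critical value for df = 15, alpha = 0.05

s = math.sqrt(ss / (n - 1))                    # sqrt(540 / 15)
standard_error = s / math.sqrt(n)              # s / 4
t = (sample_mean - mu_null) / standard_error   # (39 - 30) / standard error

print(s, standard_error, t)
print("reject H0" if abs(t) > critical else "fail to reject H0")
```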

Page 24: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Another use of t (part 1)

• Our example was a comparison of what was observed with what was expected

• Our analysis gave a confidence with which the observations were different from the expected
– Note: cannot be used to confirm similarity…

• Another use of t: comparison of two samples
– e.g. male and female performance on a game, or opinion of a website….

Page 25: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Another use of t (part 2)

• In this case, we have two sample means, and we are testing for a difference

Recall: before, we had

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

This time, it gets messy because the two standard deviations might not be the same, and we finish up with

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$$
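A minimal sketch of the two-sample statistic, computed from summary figures. The group means, standard deviations and sizes are made up for illustration, and `two_sample_t` is a hypothetical helper name. Choosing the degrees of freedom for this unequal-variance form is a separate step that is not shown here.

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic for two independent samples with possibly unequal SDs."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical game scores for two groups of 16 players each.
t = two_sample_t(mean1=52.0, s1=6.0, n1=16, mean2=46.0, s2=8.0, n2=16)
print(t)
```

Swapping the two groups flips the sign of t but not its size, which is why the direction of the comparison matters for the choice of tails.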

Page 26: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Yet another complication:

• When looking for a difference, we may have no reason to suppose that one sample gives higher values than the other
– We don’t know which mean might be higher

• This is a two-tailed test: we test both tails of the distribution

• If we have a good reason to suppose that one set of results has to be higher than the other
– e.g. game scores before and after a practice session

• Then we have a one-tailed test
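The practical effect of the choice is on the cut-off. The sketch below compares one-tailed and two-tailed cut-offs at α = 0.05. Note the assumption: it uses the standard Normal distribution as a large-sample stand-in for the t distribution, because the Python standard library has no t distribution; for small samples the t-table values are somewhat larger.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()

# Two-tailed: alpha is split between both tails of the distribution.
two_tailed_cutoff = standard_normal.inv_cdf(1 - alpha / 2)

# One-tailed: all of alpha sits in the tail where the effect is expected.
one_tailed_cutoff = standard_normal.inv_cdf(1 - alpha)

print(round(two_tailed_cutoff, 3), round(one_tailed_cutoff, 3))
```

The one-tailed cut-off is lower, so the same data can reach significance more easily, which is why a one-tailed test needs a good reason decided in advance.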

Page 27: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

Fortunately, it can all be done on a computer…

Page 28: Research Methods Part 4 T- Statistics Partly based on material by Sherry O’Sullivan.

End