Revision of basic statistics Hypothesis testing Principles Testing a proportion Testing a mean Testing the difference between two means Estimation

Page 1

Revision of basic statistics

Hypothesis testing Principles Testing a proportion Testing a mean Testing the difference between two means

Estimation

Page 2

Principles of hypothesis testing

Null vs alternative hypothesis

The null is assumed true until proven otherwise

If the evidence is inconsistent with the null, reject it in favour of the alternative. E.g.

H0: a coin is fair vs H1: a coin is biased towards heads

Evidence (data): 20 heads in 25 tosses

Evidence seems unlikely if H0 were true, hence reject H0

The probability of evidence this extreme is actually 0.2%. We usually reject if the probability is < 5% (the significance level of the test).
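The 0.2% figure for the coin example can be checked directly from the Binomial distribution. The following is a minimal sketch, assuming "such extreme evidence" means 20 or more heads in 25 tosses; the function name `binomial_upper_tail` is hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Probability of 20 or more heads in 25 tosses if the coin is fair (H0).
prob = binomial_upper_tail(20, 25)

# Reject H0 at the 5% significance level if the probability is below 0.05.
reject_h0 = prob < 0.05

print(f"{prob:.4f}", reject_h0)
```

The result is roughly 0.002, well below 5%, so H0 is rejected.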

Page 3

Testing a proportion

H0: 10% of people are left handed, i.e. H0: $\pi = 0.1$

H1: the proportion is not 10%, i.e. H1: $\pi \neq 0.1$

The sample proportion p is a random variable and should be somewhere near to the true value.

Its probability distribution is $p \sim N\left(\pi, \pi(1-\pi)/n\right)$ under H0

Hence the test statistic is

$$z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}}$$

This is, in general,

$$z = \frac{\text{sample statistic} - \text{hypothesised value}}{\text{standard error of the sample statistic}}$$

Page 4

Using data to calculate the test statistic

If 7 out of a group of 50 are left handed, the test statistic is

$$z = \frac{0.14 - 0.10}{\sqrt{\dfrac{0.1(1 - 0.1)}{50}}} = 0.94$$

This is less than z* = 1.96, the critical value which cuts off 5% in the two tails of the Normal distribution.

Hence we cannot reject H0.

[Figure: standard Normal distribution, horizontal axis from -3 to 3]
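The arithmetic for the left-handedness example can be reproduced in a few lines. This is a sketch under the assumptions on the slide (Normal approximation, standard error computed under H0); the helper name `proportion_z` is hypothetical.

```python
from math import sqrt


def proportion_z(successes: int, n: int, pi0: float) -> float:
    """z statistic for H0: pi = pi0, using the standard error under H0."""
    p = successes / n                      # sample proportion
    std_error = sqrt(pi0 * (1 - pi0) / n)  # standard error of p under H0
    return (p - pi0) / std_error


z = proportion_z(successes=7, n=50, pi0=0.10)

# Two-tail test at the 5% level: critical value taken from the slide.
z_crit = 1.96
reject_h0 = abs(z) > z_crit

print(round(z, 2), reject_h0)
```

Here z is about 0.94, inside the range -1.96 to 1.96, so H0 is not rejected.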

Page 5

Testing a mean

A firm selling franchises claims that the average weekly income of a franchise is at least £2000. A sample of 40 such franchises finds an average weekly income of £1770 with s.d. £450. Is the claim justified?

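H0: $\mu = 2000$ vs H1: $\mu < 2000$

Significance level for test: 1% (we want to avoid a false accusation)

Critical value: z* = 2.33

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{so} \quad \bar{x} \sim N\left(2000, \frac{450^2}{40}\right)$$

$$z = \frac{1770 - 2000}{\sqrt{\dfrac{450^2}{40}}} = -3.23$$

Since z < -z* we reject H0.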
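As a check on the franchise example, the sketch below recomputes the test statistic and obtains the 1% one-tail critical value from the standard Normal distribution rather than from tables. It assumes, as the slide does, that the sample s.d. can stand in for the population s.d. with n = 40; `mean_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def mean_z(xbar: float, mu0: float, s: float, n: int) -> float:
    """z statistic for H0: mu = mu0, with standard error sqrt(s^2 / n)."""
    return (xbar - mu0) / sqrt(s**2 / n)


z = mean_z(xbar=1770, mu0=2000, s=450, n=40)

# One-tail critical value cutting off 1% in the tail of the standard Normal.
z_crit = NormalDist().inv_cdf(0.99)

# H1 is mu < 2000, so we reject when z falls below -z_crit.
reject_h0 = z < -z_crit

print(round(z, 2), round(z_crit, 2), reject_h0)
```

The statistic is about -3.23 against a critical value of about 2.33, so the claim of at least £2000 is rejected.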

Page 6

The Prob-value approach

Instead of comparing the test statistic to the critical value, we could compare the prob-value to the significance level (1% in this case)

The prob-value is the area in the tail of the distribution beyond the value of the test statistic.

In this case (z = -3.23) the prob-value is below 0.0013 (0.13%, the standard Normal table value for z = 3.0; the exact tail area is about 0.0006)

Since 0.13% < 1% we reject H0

[Figure: Left hand tail of the Normal distribution, marking the critical value -2.33 (1% in tail of distribution) and the test statistic -3.23 (0.13% in tail)]
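The tail area can also be computed directly instead of read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library. Note that the exact area beyond z = -3.23 comes out at roughly 0.06%, somewhat smaller than the 0.13% quoted in the text, which matches the tail area at z = -3.0. The decision is the same either way.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

z = -3.23
prob_value = std_normal.cdf(z)      # area in the left tail beyond z
table_value = std_normal.cdf(-3.0)  # tail area at z = -3.0, for comparison

significance_level = 0.01
reject_h0 = prob_value < significance_level

print(f"{prob_value:.5f}", f"{table_value:.4f}", reject_h0)
```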

Page 7

How to reject the null hypothesis

Method 1: test statistic > critical value (in absolute value)
3.23 > 2.33

Method 2 (prob-value): prob-value < significance level
0.13% < 1%

Note the different direction of the inequality! Both methods reject the null.

If in doubt, draw the diagram!

Watch out for:

Choice of significance level (5% or 1%)

One vs two tail test. If we had a two tail test, the prob-value would be 0.26% (and we would compare this to 1%).
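The two methods always lead to the same decision, which a small sketch can make concrete. The function `decide` below is hypothetical. It assumes a z test and, for the one-tail case, that the test statistic lies in the direction of the alternative hypothesis. For a two-tail test the prob-value is twice the one-tail area and the critical value cuts off half the significance level in each tail.

```python
from statistics import NormalDist


def decide(z: float, alpha: float, tails: int = 1) -> tuple[bool, bool]:
    """Return (reject by critical value, reject by prob-value) for a z test."""
    std_normal = NormalDist()
    # Method 1: compare |z| with the critical value.
    z_crit = std_normal.inv_cdf(1 - alpha / tails)
    by_critical_value = abs(z) > z_crit
    # Method 2: compare the prob-value with the significance level.
    prob_value = tails * std_normal.cdf(-abs(z))
    by_prob_value = prob_value < alpha
    return by_critical_value, by_prob_value


one_tail = decide(-3.23, alpha=0.01, tails=1)    # franchise example
two_tail = decide(-3.23, alpha=0.01, tails=2)    # same data, two-tail test
proportion = decide(0.94, alpha=0.05, tails=2)   # left-handedness example

print(one_tail, two_tail, proportion)
```

In each case the two entries of the returned pair agree, even though the inequalities point in opposite directions.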

Page 8

Testing the difference of two means

A sample of 40 students five years ago found an average expenditure on text books per annum of £87 (at today's prices) with s.d. £21. A current survey of 50 students found average expenditure of £77 with s.d. £30. Has expenditure declined?

H0: $\mu_1 - \mu_2 = 0$ vs H1: $\mu_1 - \mu_2 > 0$

Random variable:

$$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

Significance level: 5%. Critical value z* = 1.64.

Test statistic:

$$z = \frac{87 - 77 - 0}{\sqrt{\dfrac{21^2}{40} + \dfrac{30^2}{50}}} = 1.86$$

Decision: z > z* hence reject H0.

Or, prob-value associated with 1.86 is 3.14% < 5% hence reject.
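The text book expenditure example can be reproduced as follows. This is a sketch assuming two independent samples, with sample variances used in place of the population variances as on the slide; `two_sample_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, s1, n1, x2, s2, n2, diff0=0.0):
    """z statistic for H0: mu1 - mu2 = diff0 with independent samples."""
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2 - diff0) / std_error


# Sample 1: five years ago; sample 2: current survey.
z = two_sample_z(x1=87, s1=21, n1=40, x2=77, s2=30, n2=50)

# One-tail prob-value: area in the right tail beyond z.
prob_value = 1 - NormalDist().cdf(z)

z_crit = 1.64  # 5% one-tail critical value from the slide
reject_h0 = z > z_crit

print(round(z, 2), f"{prob_value:.3f}", reject_h0)
```

The statistic is about 1.86 with a prob-value a little over 3%, so H0 is rejected at the 5% level by either method.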

Page 9

The t distribution

When testing a mean with small samples, we use the t distribution instead of the Normal.

(But note that regression coefficients follow the t distribution whatever the sample size.)

A sample of 12 National Lottery outlets finds an average sale of 800 tickets per week, with s.d. 140. Does this suggest the original target of 700 has been exceeded?

H0: $\mu = 700$ vs H1: $\mu > 700$

Significance level: 5%. Critical value t* = 1.796 (d.f. = 11)

Test statistic:

$$t = \frac{800 - 700}{\sqrt{\dfrac{140^2}{12}}} = 2.47$$

2.47 > 1.796 hence reject H0. Alternatively, prob-value associated with 2.47 is 1.6%.
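The lottery example can be checked without statistical tables. The Python standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule to approximate the prob-value. That numerical approach, and the helper names `t_statistic`, `t_pdf` and `t_upper_tail`, are illustrative assumptions rather than anything prescribed by the slides.

```python
from math import gamma, pi, sqrt


def t_statistic(xbar: float, mu0: float, s: float, n: int) -> float:
    """t statistic for H0: mu = mu0, with n - 1 degrees of freedom."""
    return (xbar - mu0) / sqrt(s**2 / n)


def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so the area to the right of 0 is 0.5.
    return 0.5 - total * h / 3


n = 12
t = t_statistic(xbar=800, mu0=700, s=140, n=n)
prob_value = t_upper_tail(t, df=n - 1)

t_crit = 1.796  # 5% one-tail critical value for 11 d.f., from the slide
reject_h0 = t > t_crit

print(round(t, 2), f"{prob_value:.3f}", reject_h0)
```

The statistic is about 2.47 and the prob-value lies between 1% and 2.5%, in line with the figure quoted above.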

Page 10

Estimation

An alternative approach to hypothesis testing

The sample mean or proportion is a point estimate

Around this we build a confidence interval

For the Normal distribution, the 95% CI is given by

Point estimate ± 1.96 standard errors

For the franchising example above, we have

$$\bar{x} \pm 1.96\sqrt{\frac{s^2}{n}} = 1770 \pm 1.96\sqrt{\frac{450^2}{40}} = [1630.5,\ 1909.5]$$

The interval has a width of about 280 (roughly ±140 around the point estimate), expressing our uncertainty.

For the t distribution, the interval is given by

Point estimate ± t* standard errors

where t* is obtained from tables, using the appropriate degrees of freedom (d.f. = n – 1 for the mean).
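The interval for the franchising example can be reproduced with a short sketch. The function `confidence_interval` is hypothetical. Its `multiplier` argument would be 1.96 for a 95% interval based on the Normal distribution, or the appropriate t* from tables when the t distribution applies.

```python
from math import sqrt


def confidence_interval(point: float, std_error: float, multiplier: float = 1.96):
    """Return (lower, upper) = point estimate -/+ multiplier * standard error."""
    half_width = multiplier * std_error
    return point - half_width, point + half_width


# Franchising example: mean 1770, s.d. 450, n = 40.
std_error = sqrt(450**2 / 40)
lower, upper = confidence_interval(1770, std_error)

print(round(lower, 1), round(upper, 1), round(upper - lower))
```

This gives limits of about 1630.5 and 1909.5, a total width of about 279.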