T test and ANOVA

Post on 07-May-2015

10.320 views 3 download

description

TTest and ANOVA

Transcript of T test and ANOVA

©drtamil@gmail.com 2012

Inferential Statistics, T Test, ANOVA & Proportionate Test

Assoc. Prof . Dr Azmi Mohd Tamil

Dept of Community Health

Universiti Kebangsaan Malaysia

FF2613

©drtamil@gmail.com 2012

Inferential Statistics

Basic Hypothesis Testing

FF2613

©drtamil@gmail.com 2012

Inferential Statistic

4When we conduct a study, we want to

make an inference from the data

collected. For example;

“drug A is better than drug B in treating

disease D"

©drtamil@gmail.com 2012

Drug A Better Than Drug B?

4Drug A has a higher rate of cure than

drug B. (Cured/Not Cured)

4 If for controlling BP, the mean of BP

drop for drug A is larger than drug B.

(continuous data – mm Hg)

©drtamil@gmail.com 2012

Null Hypothesis

4Null Hyphotesis;

“no difference of effectiveness between

drug A and drug B in treating disease D"

©drtamil@gmail.com 2012

Null Hypothesis

4H0 is assumed TRUE unless data indicate

otherwise:

• The experiment is trying to reject the null

hypothesis

• Can reject, but cannot prove, a hypothesis

– e.g. “all swans are white”

» One black swan suffices to reject

» H0 “Not all swans are white”

» No number of white swans can prove the hypothesis –

since the next swan could still be black.

©drtamil@gmail.com 2012

Can reindeer fly?

4 You believe reindeer can fly

4 Null hypothesis: “reindeer cannot fly”

4 Experimental design: to throw reindeer off the roof

4 Implementation: they all go splat on the ground

4 Evaluation: null hypothesis not rejected• This does not prove reindeer cannot fly: what you have

shown is that

– “from this roof, on this day, under these weather conditions, these particular reindeer either could not, or chose not to, fly”

4 It is possible, in principle, to reject the null hypothesis• By exhibiting a flying reindeer!

©drtamil@gmail.com 2012

Significance

4 Inferential statistics determine whether a significant difference of effectiveness exist between drug A and drug B.

4 If there is a significant difference (p<0.05), then the null hypothesis would be rejected.

4 Otherwise, if no significant difference (p>0.05), then the null hypothesis would not be rejected.

4 The usual level of significance utilised to reject or not reject the null hypothesis are either 0.05 or 0.01. In the above example, it was set at 0.05.

©drtamil@gmail.com 2012

Confidence interval

4Confidence interval = 1 - level of significance.

4 If the level of significance is 0.05, then

the confidence interval is 95%.

4CI = 1 – 0.05 = 0.95 = 95%

4If CI = 99%, then level of significance is 0.01.

©drtamil@gmail.com 2012

What is level of significance? Chance?

What is level of significance? Chance?

t0 2.0639-2.0639

.025

Reject H0

Reject H0

.025

t0 2.0639-2.0639

.025

Reject H0

Reject H0

.025

-1.96 1.96

©drtamil@gmail.com 2012

Fisher’s Use of p-Values

4 R.A. Fisher referred to the probability to declare significance as “p-value”.

4 “It is a common practice to judge a result significant, if it is of such magnitude that it would be produced by chance not more frequently than once in 20 trials.”

4 1/20=0.05. If p-value less than 0.05, then the probability of the effect detected were due to chance is less than 5%.

4 We would be 95% confident that the effect detected is due to real effect, not due to chance.

©drtamil@gmail.com 2012

Error

4Although we have determined the level

of significance and confidence interval,

there is still a chance of error.

4There are 2 types;

• Type I Error

• Type II Error

©drtamil@gmail.com 2012

Treatments are

not different

Treatments are

different

Conclude

treatments are

not different

Conclude

treatments are

different

DECISION

REALITY

Correct DecisionType I error

α error

Type II error

β error

Correct Decision

(Cell b)(Cell a)

(Cell c) (Cell d)

Error

©drtamil@gmail.com 2012

Error

Test of Correct Null Hypothesis

Incorrect Null

Hypothesis

Significance (Ho not rejected) (Ho rejected)

Null Hypothesis

Not Rejected Correct Conclusion Type II Error

Null Hypothesis

Rejected Type I Error Correct Conclusion

©drtamil@gmail.com 2012

Type I Error

• Type I Error – rejecting the null hypothesis

although the null hypothesis is correct

e.g.

• when we compare the mean/proportion of

the 2 groups, the difference is small but the

difference is found to be significant.

Therefore the null hypothesis is rejected.

• It may occur due to inappropriate choice of

alpha (level of significance).

©drtamil@gmail.com 2012

Type II Error

• Type II Error – not rejecting the null

hypothesis although the null hypothesis is

wrong

• e.g. when we compare the mean/proportion

of the 2 groups, the difference is big but the

difference is not significant. Therefore the

null hypothesis is not rejected.

• It may occur when the sample size is

too small.

©drtamil@gmail.com 2012

Type of treatment * Pain (2 hrs post-op) Crosstabulation

8 7 15

53.3% 46.7% 100.0%

4 11 15

26.7% 73.3% 100.0%

12 18 30

40.0% 60.0% 100.0%

Count

% within Type

of treatment

Count

% within Type

of treatment

Count

% within Type

of treatment

Pethidine

Cocktail

Type of treatment

Total

No pain In pain

Pain (2 hrs post-op)

Total

Example of Type II Error

Data of a clinical trial on 30 patients on comparison of pain control between

two modes of treatment.

p = 0.136. p bigger than 0.05. No significant difference and the null hypothesis was not

rejected.

There was a large difference between the rates but were not

significant. Type II Error?

Chi-square =2.222, p=0.136

©drtamil@gmail.com 2012

Not significant since power of the study is less than 80%.

Power is only

32%!

©drtamil@gmail.com 2012

Check for the errors

4You can check for type II errors of your

own data analysis by checking for the

power of the respective analysis

4This can easily be done by utilising

software such as Power & Sample Size

(PS2) from the website of the Vanderbilt

University

©drtamil@gmail.com 2012

Determining the appropriate statistical test

©drtamil@gmail.com 2012

Data Analysis

4Descriptive – summarising data

4Test of Association

4Multivariate – controlling for confounders

©drtamil@gmail.com 2012

Test of Association

4To study the relationship between one

or more risk variable(s) (independent)

with outcome variable (dependent)

4For example; does ethnicity affects the

suicidal/para-suicidal tendencies of

psychiatric patients.

©drtamil@gmail.com 2012

Problem Flow Chart

Marital Status

Suicidal Tendencies

Ethnicity

Independent Variables

Dependent Variable

©drtamil@gmail.com 2012

Multivariat

4Studies the association between

multiple causative factors/variables

(independent variables) with the

outcome (dependent).

4For example; risk factors such as

parental care, practise of religion,

education level of parents & disciplinary

problems of their child (outcome).

©drtamil@gmail.com 2012

Hypothesis TestingHypothesis Testing

4Distinguish parametric & non-parametric

procedures

4Test two or more populations using

parametric & non-parametric procedures

• Means

• Medians

• Variances

©drtamil@gmail.com 2012

Hypothesis Testing Procedures

Hypothesis Testing Procedures

©drtamil@gmail.com 2012

Parametric Test Procedures

Parametric Test Procedures

4 Involve population parameters

• Example: Population mean

4Require interval scale or ratio scale

• Whole numbers or fractions

• Example: Height in inches: 72, 60.5, 54.7

4Have stringent assumptions

• Example: Normal distribution

4Examples: Z test, t test

©drtamil@gmail.com 2012

Nonparametric Test Procedures

Nonparametric Test Procedures

4Statistic does not depend on population

distribution

4Data may be nominally or ordinally

scaled

• Example: Male-female

4May involve population parameters such

as median

4Example: Wilcoxon rank sum test

©drtamil@gmail.com 2012

Parametric Analysis –Quantitative

Qualitative

Dichotomus

Quantitative Normally distributed data Student's t Test

Qualitative

Polinomial

Quantitative Normally distributed data ANOVA

Quantitative Quantitative Repeated measurement of the

same individual & item (e.g.

Hb level before & after

treatment). Normally

distributed data

Paired t Test

Quantitative -

continous

Quantitative -

continous

Normally distributed data Pearson Correlation

& Linear

Regresssion

©drtamil@gmail.com 2012

non-parametric tests

Variable 1 Variable 2 Criteria Type of Test

Qualitative

Dichotomus

Qualitative

Dichotomus

Sample size < 20 or (< 40 but

with at least one expected

value < 5)

Fisher Test

Qualitative

Dichotomus

Quantitative Data not normally distributed Wilcoxon Rank Sum

Test or U Mann-

Whitney Test

Qualitative

Polinomial

Quantitative Data not normally distributed Kruskal-Wallis One

Way ANOVA Test

Quantitative Quantitative Repeated measurement of the

same individual & item

Wilcoxon Rank Sign

TestQuantitative -

continous

Quantitative -

continous

Data not normally distributed Spearman/Kendall

Rank Correlation

©drtamil@gmail.com 2012

Statistical Tests - Qualitative

Variable 1 Variable 2 Criteria Type of Test

Qualitative Qualitative Sample size > 20 dan no

expected value < 5Chi Square Test (X

2)

Qualitative

Dichotomus

Qualitative

Dichotomus

Sample size > 30 Proportionate Test

Qualitative

Dichotomus

Qualitative

Dichotomus

Sample size > 40 but with at

least one expected value < 5X2 Test with Yates

Correction

Qualitative Quantitative Normally distributed data Student's t TestQualitative

Dichotomus

Qualitative

Dichotomus

Sample size < 20 or (< 40 but

with at least one expected

value < 5)

Fisher Test

Qualitative Quantitative Data not normally distributed Wilcoxon Rank Sum

©drtamil@gmail.com 2012

Data Analysis

4Using SPSS;

http://161.142.92.104/spss/

4Using Excel;

http://161.142.92.104/excel/

©drtamil@gmail.com 2012

T Test, ANOVA & Proportionate Test

Assoc. Prof . Dr Azmi Mohd Tamil

Dept of Community Health

Universiti Kebangsaan Malaysia

FF2613

© d rtam il@ g m ail.co m 2012

T - Test

Independent T-Test

Student’s T-Test

Paired T-Test

ANOVA

©drtamil@gmail.com 2012

Student’s T-test

William Sealy Gosset @

“Student”, 1908. The Probable

Error of Mean. Biometrika.

©drtamil@gmail.com 2012

Student’s T-Test

4To compare the means of two independent

groups. For example; comparing the mean

Hb between cases and controls. 2 variables

are involved here, one quantitative (i.e. Hb)

and the other a dichotomous qualitative

variable (i.e. case/control).

4 t =

©drtamil@gmail.com 2012

Examples: Student’s t-test

4Comparing the level of blood cholestrol

(mg/dL) between the hypertensive and

normotensive.

4Comparing the HAMD score of two

groups of psychiatric patients treated

with two different types of drugs (i.e.

Fluoxetine & Sertraline

©drtamil@gmail.com 2012

Example

Group Statistics

35 4.2571 3.12808

32 3.8125 4.39529

DRUG

F

S

DHAMAWK6

N Mean Std. Deviation

Independent Samples Test

.48 65 .633 .4446Equal variances

assumed

DHAMAWK6

t df

Sig.

(2-tailed)

Mean

Difference

t-test for Equality of Means

©drtamil@gmail.com 2012

Assumptions of T test

4Observations are normally distributed in

each population. (Explore)

4The population variances are equal.

( L e v e n e ’s T e s t )

The 2 groups are independent of each

other. (Design of study)

©drtamil@gmail.com 2012

Manual Calculation

4 Sample size > 30 4 Small sample size,

equal variance

1 2

2 2

1 2

1 2

X Xt

s s

n n

−=

+

1 2

0

1 2

2 22 1 1 2 20

1 2

1 1

( 1) ( 1)

( 1) ( 1)

X Xt

sn n

n s n ss

n n

−=

+

− + −=

− + −

©drtamil@gmail.com 2012

Example – compare cholesterol level

4Hypertensive :Mean : 214.92s.d. : 39.22n : 64

4Normal :Mean : 182.19s.d. : 37.26n : 36

• Comparing the cholesterol level between

hypertensive and normal patients.

• The difference is (214.92 – 182.19) = 32.73 mg%.

• H0 : There is no difference of cholesterol level

between hypertensive and normal patients.

• n > 30, (64+36=100), therefore use the first formula.

©drtamil@gmail.com 2012

Calculation

4 t = (214.92- 182.19)________ ((39.222/64)+(37.262/36))0.5

4 t = 4.137

4 df = n1+n2-2 = 64+36-2 = 98

4 Refer to t table; with t = 4.137, p < 0.001

1 2

2 2

1 2

1 2

X Xt

s s

n n

−=

+

If df>100, can refer Table A1.We don’t have 4.137 so we use 3.99 instead. If t = 3.99, then p=0.00003x2=0.00006Therefore if t=4.137, p<0.00006.

Or can refer to Table A3.We don’t have df=98,

so we use df=60 instead. t = 4.137 > 3.46 (p=0.001)

Therefore if t=4.137, p<0.001.

©drtamil@gmail.com 2012

Conclusion

• Therefore p < 0.05, null hypothesis rejected.

• There is a significant difference of

cholesterol level between hypertensive and

normal patients.

• Hypertensive patients have a significantly

higher cholesterol level compared to

normotensive patients.

©drtamil@gmail.com 2012

Exercise (try it)

• Comparing the mini test 1 (2012) results between

UKM and ACMS students.

• The difference is 11.255

• H0 : There is no difference of marks between UKM

and ACMS students.

• n > 30, therefore use the first formula.

©drtamil@gmail.com 2012

Exercise (answer)

4Null hypothesis rejected

4There is a difference of marks between

UKM and ACMS students. UKM marks

higher than AUCMS

©drtamil@gmail.com 2012

T-Test In SPSS

4 For this exercise, we will be using the data from the CD, under Chapter 7, sga-bab7.sav

4 This data came from a case-control study on factors affecting SGA in Kelantan.

4 Open the data & select ->Analyse

>Compare Means>Ind-Samp T

Test…

©drtamil@gmail.com 2012

T-Test in SPSS

4 We want to see whether there is any association between the mothers’ weight and SGA. So select the risk factor (weight2) into ‘Test Variable’ & the outcome (SGA) into ‘Grouping Variable’.

4 Now click on the ‘Define Groups’ button. Enter

• 0 (Control) for Group 1 and

• 1 (Case) for Group 2.

4 Click the ‘Continue’ button & then click the ‘OK’ button.

©drtamil@gmail.com 2012

T-Test Results

4Compare the mean+sd of both groups.

• Normal 58.7+11.2 kg

• SGA 51.0+ 9.4 kg

4Apparently there is a difference of

weight between the two groups.

Group Statistics

108 58.666 11.2302 1.0806

109 51.037 9.3574 .8963

SGA

Normal

SGA

Weight at first ANC

N Mean Std. Deviation

Std. Error

Mean

©drtamil@gmail.com 2012

Results & Homogeneity of Variances

4 Look at the p value of Levene’s Test. If p is not significant then equal variances is assumed (use top row).

4 If it is significant then equal variances is not assumed (use bottom row).

4 So the t value here is 5.439 and p < 0.0005. The difference is significant. Therefore there is an association between the mothers weight and SGA.

Independent Samples Test

1.862 .174 5.439 215 .000 7.629 1.4028 4.8641 10.3940

5.434 207.543 .000 7.629 1.4039 4.8612 10.3969

Equal variances

assumed

Equal variances

not assumed

Weight at first ANC

F Sig.

Levene's Test for

Equality of Variances

t df Sig. (2-tailed)

Mean

Difference

Std. Error

Difference Lower Upper

95% Confidence

Interval of the

Difference

t-test for Equality of Means

©drtamil@gmail.com 2012

How to present the result?

Group N Mean test p

Normal 108 58.7+11.2 kg

T test

t = 5.439<0.0005

SGA 109 51.0+ 9.4

©drtamil@gmail.com 2012

Paired t-test

“Repeated measurement on the

same individual”

©drtamil@gmail.com 2012

Paired T-Test

4“Repeated measurement on the same

individual”

4 t =

©drtamil@gmail.com 2012

Formula

( )22

0

1

1

d

i

d

p

dt

s

n

dd

nsn

df n

−=

−=

= −

∑∑

©drtamil@gmail.com 2012

Examples of paired t-test

4Comparing the HAMD score between

week 0 and week 6 of treatment with

Sertraline for a group of psychiatric

patients.

4Comparing the haemoglobin level

amongst anaemic pregnant women after

6 weeks of treatment with haematinics.

©drtamil@gmail.com 2012

Example

Paired Samples Statistics

13.9688 32 6.48315

3.8125 32 4.39529

DHAMAWK0

DHAMAWK6

Pair

1

Mean N Std. Deviation

Paired Samples Test

10.1563 6.75903 8.500 31 .000DHAMAWK0 -

DHAMAWK6

Pair

1

Mean

Std.

Deviation

Paired Differences

t df

Sig.

(2-tailed)

©drtamil@gmail.com 2012

M a n u a l C a l c u l a t i o n

D The measurement of the systolic and diastolic

blood pressures was done two consecutive

times with an interval of 10 minutes. You want

t o d e t e r m in e w h e t h e r t h e r e w a s a n y

difference between those two measurements.

4H0:There is no difference of the systolic blood

pressure during the first (time 0) and second

measurement (time 10 minutes).

©drtamil@gmail.com 2012

Calculation

4Calculate the difference between first &

second measurement and square it.

Total up the difference and the square.

©drtamil@gmail.com 2012

Calculation

4∑ d = 112 ∑ d2 = 1842 n = 36

4Mean d = 112/36 = 3.11

4sd = ((1842-1122/36)/35)0.5

sd = 6.53

4 t = 3.11/(6.53/6)

t = 2.858

4df = np – 1 = 36 – 1 = 35.

4Refer to t table;

( )22

0

1

1

d

i

d

p

dt

s

n

dd

nsn

df n

−=

−=

= −

∑∑

Refer to Table A3.We don’t have df=35,

so we use df=30 instead. t = 2.858, larger than 2.75

(p=0.01) but smaller than 3.03 (p=0.005). 3.03>t>2.75

Therefore if t=2.858, 0.005<p<0.01.

©drtamil@gmail.com 2012

Conclusion

with t = 2.858, 0.005<p<0.01Therefore p < 0.01.Therefore p < 0.05, null hypothesis rejected.Conclusion: There is a significant difference of the systolic blood pressure between the first and second measurement. The mean average of first reading is significantly higher compared to the second reading.

©drtamil@gmail.com 2012

Paired T-Test In SPSS

4 For this exercise, we will be using the data from the CD, under Chapter 7, sgapair.sav

4 This data came from a controlled trial on haematinic effect on Hb.

4 Open the data & select ->Analyse>Compare Means

>Paired-Samples T

Test…

©drtamil@gmail.com 2012

Paired T-Test In SPSS

4 We want to see whether there is any association between the prescription on haematinic to anaemic pregnant mothers and Hb.

4 We are comparing the Hb before & after treatment. So pair the two measurements (Hb2 & Hb3) together.

4 Click the ‘OK’ button.

©drtamil@gmail.com 2012

Paired T-Test Results

4This shows the mean & standard

deviation of the two groups.

Paired Samples Statistics

10.247 70 .3566 .0426

10.594 70 .9706 .1160

HB2

HB3

Pair

1

Mean N Std. Deviation

Std. Error

Mean

©drtamil@gmail.com 2012

Paired T-Test Results

4This shows the mean difference of Hb

before & after treatment is only 0.347

g%.

4Yet the t=3.018 & p=0.004 show the

difference is statistically significant.

Paired Samples Test

-.347 .9623 .1150 -.577 -.118 -3.018 69 .004HB2 - HB3Pair 1

Mean Std. Deviation

Std. Error

Mean Lower Upper

95% Confidence

Interval of the

Difference

Paired Differences

t df Sig. (2-tailed)

©drtamil@gmail.com 2012

How to present the result?

Group NMean D

(Diff.)Test p

Before

treatment

(HB2) vs

After

treatment

(HB3)

70 0.35 + 0.96

Paired T-

test

t = 3.018

0.004

©drtamil@gmail.com 2012

ANOVA

©drtamil@gmail.com 2012

ANOVA –Analysis of Variance

4Extension of independent-samples t test

4Compares the means of groups of

independent observations

• Don’t be fooled by the name. ANOVA does

not compare variances.

4Can compare more than two groups

©drtamil@gmail.com 2012

One-Way ANOVA F-Test

One-Way ANOVA F-Test

4 Tests the equality of 2 or more population means

4 Variables• One nominal scaled independent variable

– 2 or more treatment levels or classifications

(i.e. Race; Malay, Chinese, Indian & Others)

• One interval or ratio scaled dependent variable(i.e. weight, height, age)

4 Used to analyse completely randomized

experimental designs

©drtamil@gmail.com 2012

Examples

4Comparing the blood cholesterol levels

between the bus drivers, bus conductors

and taxi drivers.

4Comparing the mean systolic pressure

between Malays, Chinese, Indian &

Others.

©drtamil@gmail.com 2012

One-Way ANOVA F-Test Assumptions

One-Way ANOVA F-Test Assumptions

4Randomness & independence of errors

• Independent random samples are drawn

4Normality

• Populations are normally distributed

4Homogeneity of variance

• Populations have equal variances

©drtamil@gmail.com 2012

Example

Descriptives

Birth weight

151 2.7801 .52623 1.90 4.72

23 2.7643 .60319 1.60 3.96

44 2.8430 .55001 1.90 3.79

218 2.7911 .53754 1.60 4.72

Housewife

Office work

Field work

Total

N Mean Std. Deviation Minimum Maximum

ANOVA

Birth weight

.153 2 .077 .263 .769

62.550 215 .291

62.703 217

Between Groups

Within Groups

Total

Sum of

Squares df Mean Square F Sig.

©drtamil@gmail.com 2012

Manual Calculation

ANOVA

©drtamil@gmail.com 2012

Manual Calculation

4Not expected to be calculated manually

by medical students.

Example: Time To Complete Analysis

45 samples were

analysed using 3 different

blood analyser (Mach1,

Mach2 & Mach3).

15 samples were placed

into each analyser.

Time in seconds was

measured for each

sample analysis.

Example: Time To Complete Analysis

The overall mean of the

entire sample was 22.71

seconds.

This is called the “grand”

mean, and is often

denoted by .

If H0 were true then we’d

expect the group means

to be close to the grand

mean.

X

Example: Time To Complete Analysis

The ANOVA test is

based on the combined

distances from .

If the combined

distances are large, that

indicates we should

reject H0.

X

The Anova Statistic

To combine the differences from the grand mean we

• Square the differences

• Multiply by the numbers of observations in the groups

• Sum over the groups

where the are the group means.

“SSB” = Sum of Squares Between groups

( ) ( ) ( )23

2

2

2

1 151515 SSB XXXXXX MachMachMach −+−+−=

*X

The Anova Statistic

To combine the differences from the grand mean we

• Square the differences

• Multiply by the numbers of observations in the groups

• Sum over the groups

where the are the group means.

“SSB” = Sum of Squares Between groups

Note: This looks a bit like a variance.

*X

( ) ( ) ( )23

2

2

2

1 151515 SSB XXXXXX MachMachMach −+−+−=

©drtamil@gmail.com 2012

4Grand Mean = 22.71

4Mean Mach1 = 24.93; (24.93-22.71)2=4.9284

4Mean Mach2 = 22.61; (22.61-22.71)2=0.01

4Mean Mach3 = 20.59; (20.59-22.71)2=4.4944

4SSB = (15*4.9284)+(15*0.01)+(15*4.4944)

4SSB = 141.492

( ) ( ) ( )23

2

2

2

1 151515 SSB XXXXXX MachMachMach −+−+−=

Sum of Squares Between

How big is big?

4 For the Time to Complete, SSB = 141.492

4 Is that big enough to reject H0?

4 As with the t test, we compare the statistic to

the variability of the individual observations.

4 In ANOVA the variability is estimated by the

Mean Square Error, or MSE

MSEMean Square Error

The Mean Square Error

is a measure of the

variability after the

group effects have

been taken into

account.

where xij is the ith

observation in the jth

group.

( )∑∑ −−

=j i

jij XxKN

MSE21

MSEMean Square Error

The Mean Square Error

is a measure of the

variability after the

group effects have

been taken into

account.

where xij is the ith

observation in the jth

group.

( )∑∑ −−

=j i

jij XxKN

MSE21

MSEMean Square Error

The Mean Square Error

is a measure of the

variability after the

group effects have

been taken into

account.

( )∑∑ −−

=j i

jij XxKN

MSE21

©drtamil@gmail.com 2012

( )∑∑ −−

=j i

jijXx

KNMSE

21

Mach1 (x-mean)^2 Mach2 (x-mean)^2 Mach3 (x-mean)^2

23.73 1.4400 21.5 1.2321 19.74 0.7225

23.74 1.4161 21.6 1.0201 19.75 0.7056

23.75 1.3924 21.7 0.8281 19.76 0.6889

24.00 0.8649 21.7 0.8281 19.9 0.4761

24.10 0.6889 21.8 0.6561 20 0.3481

24.20 0.5329 21.9 0.5041 20.1 0.2401

25.00 0.0049 22.75 0.0196 20.3 0.0841

25.10 0.0289 22.75 0.0196 20.4 0.0361

25.20 0.0729 22.75 0.0196 20.5 0.0081

25.30 0.1369 23.3 0.4761 20.5 0.0081

25.40 0.2209 23.4 0.6241 20.6 0.0001

25.50 0.3249 23.4 0.6241 20.7 0.0121

26.30 1.8769 23.5 0.7921 22.1 2.2801

26.31 1.9044 23.5 0.7921 22.2 2.5921

26.32 1.9321 23.6 0.9801 22.3 2.9241

SUM 12.8380 9.4160 11.1262

©drtamil@gmail.com 2012

4Note that the variation of the means (141.492) seems quite large (more likely to be significant???) compared to the variance of observations within groups (12.8380+9.4160+11.1262=33.3802).

4MSE = 33.3802/(45-3) = 0.7948

( )∑∑ −−

=j i

jijXx

KNMSE

21

Notes on MSE

4 If there are only two groups, the MSE is equal

to the pooled estimate of variance used in the

equal-variance t test.

4 ANOVA assumes that all the group variances

are equal.

4 Other options should be considered if group

variances differ by a factor of 2 or more.

4 (12.8380 ~ 9.4160 ~ 11.1262)

ANOVA F Test

4 The ANOVA F test is based on the F statistic

where K is the number of groups.

4 Under H0 the F statistic has an “F” distribution,

with K-1 and N-K degrees of freedom (N is the

total number of observations)

MSE

KSSBF

)1( −=

Time to Analyse:F test p-value

To get a p-value we

compare our F statistic

to an F(2, 42)

distribution.

Time to Analyse:F test p-value

To get a p-value we

compare our F statistic

to an F(2, 42)

distribution.

In our example

We cannot draw the line

since the F value is so

large, therefore the p

value is so small!!!!!!

015.89423802.33

2492.141==F

©drtamil@gmail.com 2012

Refer to F Dist. Table (α=0.01).We don’t have df=2;42,

so we use df=2;40 instead. F = 89.015, larger than 5.18

(p=0.01) Therefore if F=89.015, p<0.01.

Why use df=2;42?We have 3 groups so K-1 = 2We have 45 samples therefore N-K = 42.

Time to Analyse:F test p-value

To get a p-value we

compare our F statistic

to an F(2, 42)

distribution.

In our example

The p-value is really

( ) 00080000000000.0015.89(2,42) => FP

015.89423802.33

2492.141==F

ANOVA Table

Sum of

Squares df

Mean

Square F Sig.

Between

Groups141.492 2 40.746 89.015 .0000000

Within Groups 33.380 42 .795

Total 174.872 44

Results are often displayed using an ANOVA Table

ANOVA Table

Sum of

Squares df

Mean

Square F Sig.

Between

Groups141.492 2 40.746 89.015 .0000000

Within Groups 33.380 42 .795

Total 174.872 44

Results are often displayed using an ANOVA Table

Sum of Squares

Between (SSB)

Mean Square

Error (MSE)F Statistic p value

Pop Quiz!: Where are the following quantities presented in this table?

ANOVA Table

Sum of

Squares df

Mean

Square F Sig.

Between

Groups141.492 2 40.746 89.015 .0000000

Within Groups 33.380 42 .795

Total 174.872 44

Results are often displayed using an ANOVA Table

Sum of Squares

Between (SSB)

Mean Square

Error (MSE)F Statistic p value

ANOVA Table

Sum of

Squares df

Mean

Square F Sig.

Between

Groups141.492 2 40.746 89.015 .0000000

Within Groups 33.380 42 .795

Total 174.872 44

Results are often displayed using an ANOVA Table

Sum of Squares

Between (SSB)

Mean Square

Error (MSE)F Statistic p value

ANOVA Table

Sum of

Squares df

Mean

Square F Sig.

Between

Groups141.492 2 40.746 89.015 .0000000

Within Groups 33.380 42 .795

Total 174.872 44

Results are often displayed using an ANOVA Table

Sum of Squares

Between (SSB)

Mean Square

Error (MSE)F Statistic p value

ANOVA Table

Sum of

Squares df

Mean

Square F Sig.

Between

Groups141.492 2 40.746 89.015 .0000000

Within Groups 33.380 42 .795

Total 174.872 44

Results are often displayed using an ANOVA Table

Sum of Squares

Between (SSB)

Mean Square

Error (MSE)F Statistic p value

©drtamil@gmail.com 2012

ANOVA In SPSS

4 For this exercise, we will be using the data from the CD, under Chapter 7, sga-bab7.sav

4 This data came from a case-control study on factors affecting SGA in Kelantan.

4 Open the data & select ->Analyse

>Compare Means>One-Way

ANOVA…

©drtamil@gmail.com 2012

ANOVA in SPSS

4 We want to see whether there is any association between the babies’ weight and mothers’ type of work. So select the risk factor (typework) into ‘Factor’ & the outcome (birthwgt) into ‘Dependent’.

4 Now click on the ‘Post Hoc’button. Select Bonferonni.

4 Click the ‘Continue’ button & then click the ‘OK’ button.

4 Then click on the ‘Options’button.

©drtamil@gmail.com 2012

ANOVA in SPSS

4 Select ‘Descriptive’,

‘Homegeneity of

variance test’ and

‘Means plot’.

4 Click ‘Continue’ and

then ‘OK’.

©drtamil@gmail.com 2012

ANOVA Results

4Compare the mean+sd of all groups.

4Apparently there are not much

difference of babies’ weight between the

groups.

Descriptives

Birth weight

151 2.7801 .52623 .04282 2.6955 2.8647 1.90 4.72

23 2.7643 .60319 .12577 2.5035 3.0252 1.60 3.96

44 2.8430 .55001 .08292 2.6757 3.0102 1.90 3.79

218 2.7911 .53754 .03641 2.7193 2.8629 1.60 4.72

Housewife

Office work

Field work

Total

N Mean Std. Deviation Std. Error Lower Bound Upper Bound

95% Confidence Interval for

Mean

Minimum Maximum

©drtamil@gmail.com 2012

Results & Homogeneity of Variances

4Look at the p value of Levene’s Test. If p

is not significant then equal variances is

assumed.

Test of Homogeneity of Variances

Birth weight

.757 2 215 .470

Levene

Statistic df1 df2 Sig.

©drtamil@gmail.com 2012

ANOVA Results

4So the F value here is 0.263 and p =0.769.

The difference is not significant. Therefore

there is no association between the

babies’ weight and mothers’ type of work.

ANOVA

Birth weight

.153 2 .077 .263 .769

62.550 215 .291

62.703 217

Between Groups

Within Groups

Total

Sum of

Squares df Mean Square F Sig.

©drtamil@gmail.com 2012

How to present the result?

Type of Work Mean+sd Test p

Office 2.76 + 0.60

ANOVA

F = 0.2630.769Housewife 2.78 + 0.53

Farmer 2.84 + 0.55

©drtamil@gmail.com 2012

Proportionate Test

©drtamil@gmail.com 2012

Proportionate Test

4Qualitative data utilises rates, i.e. rate of

anaemia among males & females

4To compare such rates, statistical tests

such as Z-Test and Chi-square can be

used.

©drtamil@gmail.com 2012

Formula

• where p1 is the rate for

event 1 = a1/n1

• p2 is the rate for event 2

= a2/n2

• a1 and a2 are frequencies

of event 1 and 2

4 We refer to the normal

distribution table to

decide whether to reject

or not the null

hypothesis.

1 2

0 0

1 2

1 1 2 20

1 2

0 0

1 1

1

p pz

p qn n

p n p np

n n

q p

−=

+

+=

+

= −

©drtamil@gmail.com 2012

http://stattrek.com/hypothesis-test/proportion.aspx

4 ■The sampling method is simple random

sampling.

4 ■Each sample point can result in just two

possible outcomes. We call one of these

outcomes a success and the other, a failure.

4 ■The sample includes at least 10 successes

and 10 failures.

4 ■The population size is at least 10 times as

big as the sample size.

©drtamil@gmail.com 2012

Example

4Comparison of worm infestation rate between male and female medical students in Year 2.

4Rate for males ; p1= 29/96 = 0.302

4Rate for females;p2 =24/104 = 0.231

4H0: There is no difference of worm infestation rate between male and female medical students in Year 2

©drtamil@gmail.com 2012

Cont.

p1 p2

p0 q0

©drtamil@gmail.com 2012

Cont.

4p0 = (29/96*96)+(24/104*104) = 0.265

96+104

4q0 = 1 – 0.265 = 0.735

©drtamil@gmail.com 2012

Cont.

4 z = 0.302 - 0.231 = 1.1367

((0.735*0.265) (1/96 + 1/104))0.5

4 From the normal distribution table (A1), z value

is significant at p=0.05 if it is above 1.96. Since

the value is less than 1.96, then there is no

difference of rate for worm infestatation

between the male and female students.

Refer to Table A1.We don’t have 1.1367 so we use 1.14 instead. If z = 1.14, then p=0.1271x2=0.2542Therefore if z=1.14, p=0.2542. H0 not rejected

©drtamil@gmail.com 2012

Exercise (try it)

4Comparison of failure rate between

ACMS and UKM medical students in

Year 2 for minitest 1 (MS2 2012).

4Rate for UKM ; p1= 42/196 = 0.214

4Rate for ACMS;p2 = 35/70 = 0.5

©drtamil@gmail.com 2012

Answer

4P1 = 0.214, p2 = 0.5, p0 = 0.289, q0 = 0.711

4N1 = 196, n2 = 70, Z = 20.470.5 = 4.52

4p < 0.00006