Malimu statistical significance testing.

STATISTICAL SIGNIFICANCE TESTING

BY MALIMU
Dept of Epidemiology/Biostatistics, School of Public Health and Social Sciences, MUHAS/KIU

OUTLINE

• Introduction
– why significance tests (with examples)
– how significance tests work
– significance levels
– critical region and critical values
– concept of the P-value
– hypotheses
• Significance test for 1 mean (z-test and t-test)
• Significance test for 1 proportion (z-test)
• Significance test for 2 means (z-test and t-test)
• Significance test for 2 proportions (χ²-test) derived from a 2 x 2 contingency table

WHY A SIGNIFICANCE TEST
Example 1: mean age at first sexual intercourse; do males start earlier than females?

            Male        Female      Overall
mean (n)    16.9 (93)   17.2 (99)   17.0 (192)
std. dev.   2.1         2.0         2.0

INTRODUCTION: why significance tests

Example 2: Prevalence of HIV infection: comparing in-school youth and the general youth population

• Large studies have indicated that the proportion of youth infected with HIV is 10%. A study done involving 400 school youth showed a prevalence of 7%. Do these results provide evidence that the prevalence of HIV among school youth is lower than that in the general youth population?

Example 3: Utilization of VCT services by marital status

• In a study on VCT utilization, we find that 60% of married people in the sample utilize VCT services compared to 30% of unmarried people. How should we interpret this result?



• We note that in all examples, there are differences

• However, the observed difference in each example might:
– reflect a TRUE DIFFERENCE (i.e. the difference also exists in the total population from which the sample was drawn)

– be due to CHANCE (i.e. in reality there is no difference, but the observed difference is due to sampling variation)

– be due to BIAS (e.g. due to defects in the study methodology)

• With an appropriate study design, we can feel confident that an observed difference between two groups cannot be explained by BIAS

• We would like to find out whether this difference can be considered as a TRUE difference

• We can only conclude that this is the case if we can rule out the CHANCE explanation

• We accomplish this by applying a significance test


• A SIGNIFICANCE TEST estimates how likely it is that a study result at least as large as the one observed (e.g. a difference between two groups) would arise by chance alone

• A significance test is used to assess whether a result observed in a sample can be taken to hold in the study population from which the sample was drawn


INTRODUCTION: How significance tests work

• Suppose we observed a difference between two groups in a study sample.

• We want to know whether this observed difference between the two groups represents a real difference in the total study population from which the sample was drawn, or whether it just occurred by chance (due to sampling variation).

• To find this out, we determine how likely it is that this difference could have occurred by chance.

• We can never be 100% sure that an observed difference is true, but in general, we are happy if we can be 99% or 95% sure (confident).

• Working at the 95% level means that, if no real difference existed, a difference as large as the one observed would occur by chance less than 5% of the time.

INTRODUCTION: How significance tests work

Important Terminologies
• Statistical hypothesis

– This is a statement about the parameter(s) of the population(s) from which the sample(s) were taken.

– Null hypothesis, H0 : hypothesis of “no difference”. This is the one to be tested.

– Alternative hypothesis, H1: hypothesis that disagrees with the null hypothesis.

• Test statistic
– This is a mathematical function (expression) of sample values which provides a basis for testing a statistical hypothesis.
– It has a known sampling distribution with tabulated percentage points (e.g. standard normal deviate (SND), z; chi-squared, χ²; t)

Important Terminologies

• Significance level (α):
– The probability of rejecting H0 when it is true
– Often expressed in percentage form (i.e. probability, α, is multiplied by 100)
– In social sciences, the commonly accepted level is 0.05 (5%), i.e. we allow a 5% risk that our conclusion arose by chance. In clinical trials involving new drugs, a stricter significance level (e.g. 0.01 = 1%) would be chosen
– Generally, 0.01 and 0.05 values are most commonly used in scientific studies


• Critical (Rejection) Region
– This is the region that encompasses values of the test statistic leading to rejection of the null hypothesis

– Location of the critical region is dependent on the test statistic and the specified significance level


• Critical Value
– This is the value of the test statistic corresponding to a given significance level
– The critical value changes as the confidence level alters: e.g. the two-sided critical values of z for confidence levels of 90%, 95%, 99% are 1.64, 1.96 (≈2), 2.58, respectively
– If the absolute value of the test statistic computed from the data is greater than the critical value, H0 is rejected

– It is the boundary value of the critical region
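The critical values quoted above can be checked numerically. The following is a minimal sketch, assuming a two-sided test on the standard normal distribution; the function name two_sided_critical_z is illustrative and not part of the original slides.

```python
from statistics import NormalDist


def two_sided_critical_z(confidence: float) -> float:
    """Return z such that the central `confidence` area of N(0, 1) lies in [-z, z]."""
    alpha = 1.0 - confidence
    # Half of alpha sits in each tail, so look up the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: critical z = {two_sided_critical_z(level):.3f}")
```

To three decimals the values are 1.645, 1.960 and 2.576, which round to the 1.64, 1.96 and 2.58 shown on the slide.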


CRITICAL REGION & VALUE

Concept of P-value
• In any study looking for differences between groups or associations between variables, the likelihood or PROBABILITY of observing a certain result by chance has to be calculated by a statistical test

• This PROBABILITY of observing a result at least as extreme as the one found, by chance alone (i.e. if the null hypothesis were true), is usually expressed as a P-VALUE

• If it is unlikely (P < 0.05) that the difference occurred by chance, we reject the chance explanation and accept that there is a real difference. We then say that the difference is statistically significant

• If it is likely (P ≥ 0.05) that the difference occurred by chance, we cannot conclude that a real difference exists. We then say that the difference is not statistically significant

• Therefore a difference is considered significant if P < 0.05

Concept of P-value

Hypotheses
• In statistical terms the assumption that in the total study population no real difference exists between groups (or that no real association exists between variables) is called the NULL HYPOTHESIS (H0)

• The ALTERNATIVE HYPOTHESIS (H1) is that there exists a difference between groups or that a real association exists between variables

• If the result is statistically significant, we reject the NULL HYPOTHESIS (H0) and accept the ALTERNATIVE HYPOTHESIS (H1) that there is a real difference between two groups, or a real association between two variables

One sample significance test for a mean (σ known): the Standard Normal Deviate or z-test

• Problem: Is it reasonable to conclude that a sample of n observations, with mean x̄, could have been from a population with mean µ and standard deviation σ?

One sample significance test for a mean (σ known): the Standard Normal Deviate or z-test

• H0: The difference between µ and x̄ is merely due to sampling error
• Calculate SND, z = (x̄ − µ) / SE(x̄) = (x̄ − µ) / (σ/√n)
• Consider the numerical value of z

INTERPRETATION OF P-VALUE

• If |z| < 1.96 then P > 0.05:
– we have no strong evidence against H0
– the difference could plausibly be due to chance
– hence, difference is not statistically significant
• If |z| > 1.96 then P < 0.05:
– we have evidence against H0
– it is unlikely that the difference between µ and x̄ is due only to sampling error
– hence, difference is statistically significant
• If |z| > 2.58 then P < 0.01:
– we have strong evidence against H0

Example
• Results of a study investigating medical risks associated with a certain occupation show that in a random sample of 20 men aged 30-39 years the mean systolic blood pressure is 141.4 mmHg.

• Suppose that the mean systolic blood pressure in the general population of men aged 30-39 years is known to be 133.2 mmHg with a standard deviation of 15.1 mmHg.

• Do the results of the study provide evidence of an increased blood pressure associated with this occupation?

Solution
• Null hypothesis, H0: there is no increase in blood pressure in this occupation, and the sample of 20 men can be regarded as a random sample from the general population of men aged 30-39 years.
• Calculate SND, z = (x̄ − µ) / SE(x̄) = (x̄ − µ) / (σ/√n)

Solution
• n = 20
• x̄ = 141.4 mmHg
• µ = 133.2 mmHg
• σ = 15.1 mmHg
• SE(x̄) = 15.1/√20 = 3.38
• SND, z = (141.4 − 133.2)/3.38 = 2.43; P = 0.015 < 0.05
• Conclusion:
– The difference is statistically significant!
– That is, there is enough evidence of an increase in systolic blood pressure among men in this occupation
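The arithmetic of the blood pressure example can be reproduced in a few lines. This is a sketch only: the function name one_sample_z_test is illustrative, and the P-value is assumed to be two-sided, which is what matches the P = 0.015 quoted in the solution.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (SE, z, two-sided P) for a one-sample z-test with known sigma."""
    se = pop_sd / sqrt(n)                 # standard error of the sample mean
    z = (sample_mean - pop_mean) / se     # standard normal deviate
    # Two-sided P: area in both tails beyond |z| (an assumption of this sketch).
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return se, z, p_two_sided


# Figures from the occupational blood pressure example.
se, z, p = one_sample_z_test(sample_mean=141.4, pop_mean=133.2, pop_sd=15.1, n=20)
print(f"SE = {se:.2f}, z = {z:.2f}, P = {p:.3f}")
```

Since z exceeds 1.96 but not 2.58, P lies between 0.01 and 0.05, in line with the interpretation rules given earlier.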

PRACTICAL
The mean level of prothrombin in the normal population is known to be 20.0 mg/100 ml of plasma and the standard deviation is 4 mg/100 ml. A sample of 40 patients showing vitamin K deficiency has a mean prothrombin level of 18.5 mg/100 ml.

• (a) How reasonable is it to conclude that the true mean for patients with vitamin K deficiency is the same as that for a normal population?

• (b) Within what limits would the mean prothrombin level be expected to lie for all patients with vitamin K deficiency? (Give the 95% confidence limits)

ONE SAMPLE SIGNIFICANCE TEST FOR A PROPORTION

Problem: Is it reasonable to conclude that a sample of n observations, in which the proportion p have a characteristic, could have been taken from a population in which the proportion with the characteristic is π?

• H0: the difference between p and π is merely due to sampling error (i.e. by chance)

• If n is reasonably large (say n > 40), then calculate
z = (p − π) / SE(p) = (p − π) / √(π(1 − π)/n)
OR, with p and π expressed as percentages,
z = (p − π) / √(π(100 − π)/n)
• Conclusions follow as before
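As an illustration of the proportion test, the sketch below applies the formula to the figures of Example 2 (general youth prevalence 10%, 400 school youth, observed prevalence 7%). The slides do not work this example through, so the function name and the choice of a two-sided P-value are assumptions made here.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z_test(p_hat, pi0, n):
    """Return (SE, z, two-sided P) for a one-sample z-test of a proportion."""
    se = sqrt(pi0 * (1.0 - pi0) / n)      # SE of p under H0
    z = (p_hat - pi0) / se
    # Two-sided P (assumption); halve it for a one-sided alternative.
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return se, z, p_two_sided


# Example 2: 7% prevalence among 400 school youth versus 10% in general youth.
se, z, p = one_sample_proportion_z_test(p_hat=0.07, pi0=0.10, n=400)
print(f"SE = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```

Here SE is 0.015 and z is about −2.0, just beyond the 1.96 boundary, so the two-sided P falls slightly below 0.05.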

PRACTICAL: one sample proportion

• In a clinical trial to compare two systems of TB treatment, A (hospital based DOTS) and B (home based DOTS), 100 patients each tried the two systems on different occasions. Of the 100 patients, 65 say they prefer A and 35 prefer B. Is this reasonably good evidence that more patients prefer A than B?

One sample significance test for a mean (σ unknown): the t-test

• Use of SND, z, applies when the population standard deviation, σ, is known
• If σ is unknown, it can be estimated from the sample by the standard deviation s
• With small samples, replacing σ by s in the formula for SND leads to a new quantity t, given by
• t = (x̄ − µ) / SE(x̄) = (x̄ − µ) / (s/√n)
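The t statistic can be computed directly from raw observations. The sketch below is illustrative only: the function name one_sample_t and the sample values are hypothetical, and because the Python standard library has no t distribution, the result is meant to be compared with a tabulated critical value of t on n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, pop_mean):
    """Return (t, df) for a one-sample t-test of the sample mean against pop_mean."""
    n = len(sample)
    s = stdev(sample)                     # sample SD, computed with n - 1 in the denominator
    se = s / sqrt(n)                      # estimated standard error of the mean
    t = (mean(sample) - pop_mean) / se
    return t, n - 1                       # t is referred to tables on n - 1 df


# Hypothetical measurements, not taken from the slides.
sample = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 12.7, 11.8]
t, df = one_sample_t(sample, pop_mean=10.0)
print(f"t = {t:.2f} on {df} df")
```

For instance, the two-sided 5% critical value of t on 11 df is 2.201, so a result such as t = 2.65 on 11 df gives P < 0.05.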

END OF ONE SAMPLE SIGNIFICANCE TEST

• s = 5.18
• t = 2.65 on 11 df; P<0.05
• probably G. secundum

DETERMINING SIGNIFICANT DIFFERENCES BETWEEN GROUPS IN CATEGORICAL DATA

• For NOMINAL data the significance test to be used depends on whether the sample is small or large

• Generally, the Chi-square test (χ²) will be used

• However, for small samples (say N < 40) in a 2 x 2 table, or if any cell of the cross-table has an expected value of less than 5, it is better to use Yates' corrected χ² or the Fisher exact test

COMPARING TWO PROPORTIONS: the chi-square (χ²) test

• The chi-square test is used for CATEGORICAL data to test for independence between two or more variables, basically comparing two or more proportions

• With categorical data the chi-square test is used to find out whether observed differences between proportions of events in two or more groups may be considered statistically significant

THE CHI-SQUARE (χ²) TEST: Example

• Suppose that in a cross-sectional study of the factors affecting the utilization of antenatal clinics we find that 64% of the women who lived within 10 km of the clinic came for antenatal care, compared to only 47% of those who lived more than 10 km away

• This suggests that antenatal care (ANC) is used more often by women who live close to the clinics. The complete results are presented in the following table

UTILIZATION OF ANTENATAL CLINICS BY DISTANCE FROM A CLINIC

• We now want to examine if this observed difference is statistically significant or not.

• The chi-square test can be used to give us the answer.

• To perform a χ² test we need to complete the following 3 steps:
– calculate the χ² value,
– use a χ² table to obtain the P-value and
– interpret the χ²

THE CHI-SQUARE (χ²) TEST

TABLE OF χ² VALUES

df    P = 0.05    P = 0.01
 1      3.84        6.63
 2      5.99        9.21
 3      7.81       11.34
 4      9.49       13.28
 5     11.07       15.09
 6     12.59       16.81
 7     14.07       18.48
 8     15.51       20.09
 9     16.92       21.67
10     18.31       23.21
11     19.68       24.72
12     21.03       26.22

CALCULATING χ² VALUE: the 2 X 2 table

• Consider the following general cross-table:

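• The quick formula for calculating χ² in a 2 x 2 table is as follows:

• χ² = N(ad − bc)² / (EFGH), where a, b, c and d are the four cell counts, E, F, G and H are the row and column totals, and N is the grand total
• The general formula for larger contingency tables is time consuming to perform by hand, but computers are very useful for this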

• For larger contingency tables, we shall therefore learn how to use Epi-Info for statistical significance testing
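The quick formula is easy to compute by machine as well. The sketch below is a hedged illustration: the function name chi_square_2x2 is hypothetical, and the cell counts are assumed values (the table itself did not survive in this transcript) chosen so that the row percentages round to the 64% and 47% quoted for the antenatal care example. For 1 degree of freedom the upper-tail probability of χ² can be obtained from the complementary error function, so no statistical tables are needed.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Return (chi-square, P) for a 2 x 2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d                     # grand total N
    row1, row2 = a + b, c + d             # row totals
    col1, col2 = a + c, b + d             # column totals
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of chi-square on 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = erfc(sqrt(chi2 / 2.0))
    return chi2, p


# Assumed counts (not from the transcript): within 10 km, 51 used ANC and 29 did not;
# more than 10 km away, 35 used ANC and 40 did not.
chi2, p = chi_square_2x2(51, 29, 35, 40)
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
```

With these assumed counts the statistic comes to about 4.57 with P close to 0.03, which agrees with the values quoted for the antenatal care data.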

CALCULATING χ² VALUE: the 2 X 2 table

USING A χ² TABLE
(1) Decide on the significance level you want to use (e.g. 0.05)
(2) Calculate the degrees of freedom (df) as:
• df = (r-1) x (c-1), where r is the number of rows and c is the number of columns
• for a simple two-by-two table the number of degrees of freedom is 1 (df = (2-1) x (2-1) = 1)
(3) If the calculated χ² > the tabulated χ² then P<0.05
• In this case, we reject the null hypothesis and conclude that there is a statistically significant difference between the groups
(4) If the calculated χ² < the tabulated χ² then P>0.05
• In this case, we do not reject the null hypothesis and conclude that the observed difference is not statistically significant
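The table lookup described in these steps can be sketched as a small helper. The critical values are copied from the table of χ² values in this deck; the names CHI2_CRITICAL, degrees_of_freedom and is_significant are illustrative, and only the two tabulated significance levels are supported.

```python
# Critical chi-square values by degrees of freedom: (P = 0.05, P = 0.01).
CHI2_CRITICAL = {
    1: (3.84, 6.63), 2: (5.99, 9.21), 3: (7.81, 11.34), 4: (9.49, 13.28),
    5: (11.07, 15.09), 6: (12.59, 16.81), 7: (14.07, 18.48), 8: (15.51, 20.09),
    9: (16.92, 21.67), 10: (18.31, 23.21), 11: (19.68, 24.72), 12: (21.03, 26.22),
}


def degrees_of_freedom(rows, cols):
    """df = (r - 1) x (c - 1) for an r x c contingency table."""
    return (rows - 1) * (cols - 1)


def is_significant(chi2, rows, cols, alpha=0.05):
    """Return True if the calculated chi-square exceeds the tabulated value."""
    if alpha not in (0.05, 0.01):
        raise ValueError("only the tabulated levels 0.05 and 0.01 are supported")
    crit_05, crit_01 = CHI2_CRITICAL[degrees_of_freedom(rows, cols)]
    critical = crit_05 if alpha == 0.05 else crit_01
    return chi2 > critical


# A chi-square of 4.57 from a 2 x 2 table, tested at both tabulated levels.
print(is_significant(4.57, rows=2, cols=2))              # compared with 3.84
print(is_significant(4.57, rows=2, cols=2, alpha=0.01))  # compared with 6.63
```

A value of 4.57 on 1 df exceeds 3.84 but not 6.63, so it is significant at the 5% level and not at the 1% level.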


APPLYING THE χ²

Example: use of χ² test with the data on utilization of antenatal care:

• χ² = 4.57; P = 0.03 <0.05
• Conclusion:
– women living within a distance of 10 km from the clinic utilize antenatal care more often (64%) than the women living more than 10 km away (47%);
– this difference is statistically significant (χ² = 4.57; P = 0.03)

PRACTICAL: two sample proportions

• From each of 509 vaginal swabs taken at an STI clinic, isolation of Candida albicans and culture of Trichomonas vaginalis were attempted.

• There were 347 swabs negative for both Candida and Trichomonas. Candida was isolated from 7 of the 44 swabs positive for Trichomonas:
– Present this information in a 2 x 2 table.
– Is there evidence of an association between Candidiasis and Trichomoniasis?
