Review Normal Distribution Statistical...

Hypothesis Testing for Proportions

1

HT - 1

Statistical Inference

Sampling Distribution

2

Review Normal Distribution

Example: Consider the distribution of serum

cholesterol levels for 40- to 70-year-old

males living in community A has a mean of

211 mg/100 ml, and the standard deviation

of 46 mg/100 ml. If an individual is selected

from this population, what is the probability

that his/her serum cholesterol level is higher

than 225?

3

P(X > 225) = ?

225

0

z

x

211

X ~ N (m = 211, s = 46)

.30

225 - 211

46 = .30 z =

.382

.382

4

Statistical Inference

1. Type of Inference:

– Estimation

– Hypothesis Testing

2. Purpose

– Make Decisions

about Population

Characteristics

Population?

5

Statistics Used to Estimate Population Parameters

– Sample Mean,

– Sample Variance, s 2

– Sample Proportion,

…

Estimators

p̂

x m population mean

s 2 population variance

p population proportion

Parameters Statistics

Theoretical Basis Is Sampling Distribution.

6

Theoretical Probability Distribution of the

Sample Statistic.


What is the Shape of this distribution?

What are the values of the parameters

such as mean and standard deviation?


2

7

Sampling Distribution of The

Mean (Normal Theorem) If a random sample is taken from a Normally

distributed population that has a mean m and

a standard deviation s, the sampling

distribution of the sample mean, x, will be

Normal with a mean that is the same as the

population mean, and will have a standard

deviation that is equal to the standard

deviation of the population divided by the

square root of the sample size.

ss

xn

m mx 8

Sampling Distribution of The

Mean (Central Limit Theorem) If a relatively large random sample is taken

from a population that has a mean m and a

standard deviation s, the sampling

distribution of the sample mean, x, will be

approximately a normal distribution with

mean that is the same as the population

mean, and a standard deviation that is equal

to the standard deviation of the population

divided by the square root of the sample

size.

ss

xn

m mx

9

Standard Error of Mean

1. Formula

2. Standard Deviation of the sampling

distribution of the Sample Means,X

3. Less Than Pop. Standard Deviation

n

s

nx

ss

ss

n

10


s = 2

m = 8

x Population

Distribution

m = 8

x

4.025

2xsSampling

Distribution of

Mean for a

sample of size 25

11

Probability Related to Mean

Example: Consider the distribution of serum

cholesterol levels for all 20- to 74-year-old

males living in United States has a mean of

211 mg/100 ml, and the standard deviation

of 46 mg/100 ml. If a random sample of

100 individuals is taken from the population,

what is the probability that the average

serum cholesterol level of these 100

individuals is higher than 225?

12

P(X > 225) = ?

225

3.04

225 - 211

4.6 = 3.04

.001

Cholesterol Level has a mean 211, s.d. 46.

001.0

)04.3()225(

ZPXP

211

n = 100

x

)6.4,211( xxNX sm

0

z


3

Inference on Proportions

HT - 13 HT - 14

Inference on Proportion

Parameter: Population Proportion p (or p)

(Percentage of people has no health insurance)

Statistic: Sample Proportion n

xp ˆ

x is number of successes

n is sample size

Data: 1, 0, 1, 0, 0 4.5

2ˆ p

4.5

00101

x

xp ˆ

HT - 15

Sampling Distribution of

Sample Proportion A random sample of size n from a large population

with proportion of successes (usually represented by a value 1) p , and therefore proportion of failures (usually represented by a value 0) 1 – p , the sampling distribution of sample proportion,

= x/n, where x is the number of successes in the sample, is asymptotically normal with a mean p

and standard deviation . n

pp )1( -

p̂

HT - 16


When sampling from a population that has a

proportion of successes p, the distribution of is approximately normal if n is large,

m = p

n

ppp

)1(ˆ

-s

p̂

p̂

17

Sampling Error

Sample statistic

(point estimate)

ө

Sampling Error = | ө – |

18

Key Elements of

Interval Estimation

Confidence

interval

Sample statistic

(point estimate)

Confidence

limit (lower)

Confidence

limit (upper)

Confidence Level: A probability that the

population parameter falls somewhere

within the interval.

Point estimate Margin of Error


4

HT - 19


m = p p̂

95%

za/2·

n

pp )ˆ1(ˆ -

1.96

p̂

za/2· n

pp )ˆ1(ˆ -p̂

HT - 20

Confidence Interval

Confidence interval: The (1- a)x100%

confidence interval estimate for population

proportion is

za/2·

n

pp )ˆ1(ˆ -p̂

Large Sample Assumption:

Both np and n(1-p) are greater than 5, that is, it is

expected that there at least 5 counts in each category.

HT - 21

1026 ,481026

495ˆ n.p

)ˆ1(ˆ

ˆ2/

n

ppZp

- a

1026

)48.1(48.96.148.

-

) %51 , %45( %3%48 .03 .48

Margin of Error

22

Sample Size

n

ppp

)ˆ1(ˆzˆ :C.I.

2

- a

n

ppZE

)ˆ1(ˆError ofMargin

2

- a

if pilot study is done. )ˆ1(ˆ2

2

2 ppE

zn -

a

to get the largest sample to

achieve the goal. 25.0

2

2

2 E

zn

a

23

Sample Size (No prior information on p)

Sample Size Example: If one wishes to do a

survey to estimate the population proportion

with 95% confidence and a margin of error of

3%, how large a sample is needed?

Za/2 = 1.96; E = .03

n = (1.962/.032) x .25 = 1067.11

A sample of size 1068 is needed.

24

Sample Size (With prior information on p)

Sample Size Example: If one wishes to to estimate

the percentage of people infected with West Nile in a

population with 95% confidence and a margin of

error of 3%, how large a sample is needed? (A pilot

study has been done, and the sample proportion was

6%.)

Za/2 = 1.96; E = .03

n = (1.962/.032) x .06 x (1 – .06) = 240.7

A sample of size 241 is needed.

How large a sample was used for pilot study?


5

HT - 25

An Alternative Method

nz

nznppznzp

/1

)4/(/)ˆ1(ˆ)2/(ˆ2

2/

22

2/2/

2

2/

a

aaa

-

2//)1(

|/|az

npp

pny

-

-By solving for in p

n

yˆ

The (1- a)x100% Confidence Interval for p is

HT - 26

Hypothesis Testing

1. State research hypotheses or questions.

2. Gather data or evidence (observational or experimental) to answer the question.

3. Summarize data and test the hypothesis.

4. Draw a conclusion.

p = 30% ?

%2525.ˆ p

HT - 27

Statistical Hypothesis

Null hypothesis (H0):

Hypothesis of no difference or no relation,

often has =, , or notation when testing

value of parameters.

Example:

H0: p = 30% or

H0: Percentage of votes for A is 30%. HT - 28

Statistical Hypothesis

Alternative hypothesis (H1 or Ha)

Usually corresponds to research hypothesis and opposite to null hypothesis,

often has >, < or notation in testing mean.

Example:

Ha: p 30% or

Ha: Percentage of votes for A is not 30%.

HT - 29

Hypotheses Statements Example

• A researcher is interested in finding out

whether percentage of people in favor of

policy A is different from 60%.

H0: p = 60%

Ha: p 60%

[Two-tailed test]

HT - 30



whether percentage of people in a community

that has health insurance is more than 77%.

H0: p = 77% ( or p 77% )

Ha: p > 77%

[Right-tailed test]


6

HT - 31



whether the percentage of bad product is

less than 10%.

H0: p = 10% ( or p 10% )

Ha: p < 10%

[Left-tailed test]

HT - 32

Evidence

Test Statistic (Evidence): A sample

statistic used to decide whether to reject

the null hypothesis.

HT - 33

Logic Behind

Hypothesis Testing

In testing statistical hypothesis,

the null hypothesis is first assumed to

be true.

We collect evidence to see if the evidence

is strong enough to disprove (reject) the null

hypothesis and therefore support the

alternative hypothesis.

HT - 34

One Sample Z-Test for Proportion

(Large sample test)

Two-Sided Test

HT - 35

I. Hypothesis

One wishes to test whether the percentage

of votes for A is different from 30%

Ho: p = 30% v.s. Ha: p 30%

HT - 36

What will be the key statistic (evidence) to use for testing the hypothesis about population proportion?

Evidence

pSample Proportion:

A random sample of 100 subjects is

chosen and the sample proportion is 25%

or .25.


7

HT - 37


If H0: p = 30% is true, sampling distribution of sample proportion will be approximately normally distributed with mean .3 and standard

deviation (or standard error)

.30

0458.0ˆ ps

0458.0100

)3.1(3.

-

p̂

HT - 38

This implies that the statistic is 1.09 standard

deviations away from the mean .3 under H0 ,

and is to the left of .3 (or less than .3)

09.1

100

)3.1(3.

3.25.

)1(

ˆˆ

00

0

ˆ

0

--

-

-

-

-

n

pp

ppppz

ps

II. Test Statistic

-1.09 0

Z

.25 .30

p̂

HT - 39

Level of Significance

Level of significance for the test (a)

A probability level selected by the

researcher at the beginning of the

analysis that defines unlikely values of

sample statistic if null hypothesis is true.

Total tail area = a

c.v. 0 c.v.

c.v. = critical value

HT - 40

III. Decision Rule Critical value approach: Compare the test statistic

with the critical values defined by significance level a,

usually a = 0.05.

We reject the null hypothesis, if the test statistic

z < –za/2 = –z0.025 = –1.96, or z > za/2 = z0.025 = 1.96.

( i.e., | z | > za/2 )

–1.09

–1.96 0 1.96 Z

a/2=0.025 a/2=0.025

Rejection

region Rejection

region

Two-sided Test

Critical values

HT - 41

III. Decision Rule p-value approach: Compare the probability of the evidence or more extreme evidence to occur when null hypothesis is true. If this probability is less than the level of significance of the test, a, then we reject the null hypothesis. (Reject H0 if p-value < a)

p-value = P(Z -1.09 or Z 1.09) = 2 x P(Z -1.09) = 2 x .1379 = .2758

Z 0

Left tail area .1379

–1.09 Two-sided Test

1.09

Right tail area .138

HT - 42

p-value p-value

The probability of obtaining a test statistic that is as extreme or more extreme than actual sample statistic value given null hypothesis is true. It is a probability that indicates the extremeness of evidence against H0.

The smaller the p-value, the stronger the evidence for supporting Ha and rejecting H0 .


8

HT - 43

IV. Draw conclusion

Since from either

critical value approach z = -1.09 > -za/2= -1.96

or p-value approach p-value = .2758 > a = .05 ,

we do not reject null hypothesis.

Therefore we conclude that there is no

sufficient evidence to support the alternative

hypothesis that the percentage of votes

would be different from 30%.

HT - 44

Steps in Hypothesis Testing

1. State hypotheses: H0 and Ha.

2. Choose a proper test statistic, collect data, checking the assumption and compute the value of the statistic.

3. Make decision rule based on level of significance(a).

4. Draw conclusion. (Reject or not reject null hypothesis)

(Support or not support alternative hypothesis)

HT - 45

When do we use this z-test for

testing the proportion of a

population?

• Large random sample.

HT - 46

One-Sided Test

Example with the same data:

A random sample of 100 subjects is chosen

and the sample proportion is 25% .

HT - 47

I. Hypothesis

One wishes to test whether the percentage

of votes for A is less than 30%

Ho: p = 30% v.s. Ha: p < 30%

HT - 48

What will be the key statistic (evidence) to use for testing the hypothesis about population proportion?

Evidence

pSample Proportion:

A random sample of 100 subjects is

chosen and the sample proportion is 25%

or .25.


9

HT - 49


If H0: p = 30% is true, sampling distribution of sample proportion will be approximately normally distributed with mean .3 and standard

deviation (or standard error)

.30

0458.0ˆ ps

0458.0100

)3.1(3.

-

p̂

HT - 50

This implies that the statistic is 1.09 standard

deviations away from the mean .3 under H0 ,

and is to the left of .3 (or less than .3)

09.1

100

)3.1(3.

3.25.

)1(

ˆˆ

00

0

ˆ

0

--

-

-

-

-

n

pp

ppppz

ps

II. Test Statistic

-1.09 0

Z

.25 .30

p̂

HT - 51

III. Decision Rule Critical value approach: Compare the test statistic

with the critical values defined by significance level a,

usually a = 0.05.

We reject the null hypothesis, if the test statistic

z < –za = –z0.05 = –1.645,

–1.09

–1.645 0 Z

a = .05

Rejection

region

Left-sided Test

HT - 52

III. Decision Rule

p-value approach: Compare the probability of the

evidence or more extreme evidence to occur when

null hypothesis is true. If this probability is less than

the level of significance of the test, a, then we

reject the null hypothesis.

p-value = P(Z -1.09) = P(Z -1.09) = .1379

Z

–1.09 0

Left tail area .1379

Left-sided Test

Z-Table

HT - 53

IV. Draw conclusion

Since from either

critical value approach z = -1.09 > -za/2= -1.645

or p-value approach p-value = .1379 > a = .05 ,

we do not reject null hypothesis.

Therefore we conclude that there is no

sufficient evidence to support the alternative

hypothesis that the percentage of votes is

less than 30%.

HT - 54

Can we see data and then

make hypothesis?

1. Choose a test statistic, collect data, checking the assumption and compute the value of the statistic.

2. State hypotheses: H0 and HA.

3. Make decision rule based on level of significance(a).

4. Draw conclusion. (Reject null hypothesis or not)

SN_Table.jpg

SN_Table.jpg

SN_Table.jpg


10

HT - 55

Errors in Hypothesis Testing

Possible statistical errors:

• Type I error: The null hypothesis is true,

but we reject it.

• Type II error: The null hypothesis is false,

but we don’t reject it.

Z p

a

“a” is the probability of committing Type I Error.

HT - 56

One-Sample z-test for a

population proportion

z-test:

Step 1: State Hypotheses (choose one of the three

hypotheses below)

i) H0 : p = p0 v.s. HA : p p0 (Two-sided test)

ii) H0 : p = p0 v.s. HA : p > p0 (Right-sided test)

iii) H0 : p = p0 v.s. HA : p < p0 (Left-sided test)

HT - 57

Test Statistic

Step 2: Compute z test statistic:

n

pp

ppz

)1(

ˆ

00

0

-

-

HT - 58

Step 3: Decision Rule:

p-value approach: Compute p-value,

if HA : p p0 , p-value = 2·P( Z | z | )

if HA : p > p0 , p-value = P( Z z )

if HA : p < p0 , p-value = P( Z z )

reject H0 if p-value < a

Critical value approach: Determine critical value(s) using a , reject H0 against

i) HA : p p0 , if | z | > za/2

ii) HA : p > p0 , if z > za

iii) HA : p < p0 , if z < - za

Step 4: Draw Conclusion.

HT - 59

Example: A researcher hypothesized that the

percentage of the people living in a community

who has no insurance coverage during the past

12 months is not 10%. In his study, 1000

individuals from the community were randomly

surveyed and checked whether they were

covered by any health insurance during the 12

months. Among them, 122 answered that they

did not have any health insurance coverage

during the last 12 months. Test the researcher’s

hypothesis at the level of significance of 0.05.

HT - 60

Hypothesis: H0 : p = .10 v.s. HA : p .10 (Two-sided test)

32.2

1000

)10.1(10.

10.122.

)1(

ˆ

00

0 -

-

-

-

n

pp

ppz

Test Statistic:

p-value = 2 x .0102 = .0204

Decision Rule: Reject null hypothesis if p-value < .05.

Conclusion: p-value = .0204 < .05. There is sufficient

evidence to support the alternative hypothesis that the

percentage is statistically significantly different from 10%.

Ex. 8.10


11

Binomial Test

HT - 61

32.2

1000

)10.1(10.

10.122.

)1(

ˆ

00

0 -

-

-

-

n

pp

ppzZ-Test Statistic:

P(Z >2.32) = 0.0102, p-value = 2 x .0102 = .0204

Binomial Test: test statistic = 122

(Exact Test)

P(Xb 122 | p = .1) = 0.0134, p-value = 2 x .0134 = .0268

HT - 62

Purpose: Compare proportions of two populations

Assumption: Two independent large random samples.

Step 1: Hypothesis:

1) H0: p1 = p2 v.s. HA: p1 p2

2) H0: p1 = p2 v.s. HA: p1 > p2

3) H0: p1 = p2 v.s. HA: p1 < p2

Two Independent Samples z-test

for Two Proportions

HT - 63

Step 2: Test Statistic:

-

---

21

2121

111

nnpp

ppppz

)ˆ(ˆ

)(ˆˆ

1

1

1n

xp ˆ

2

2

2n

xp ˆ

21

21

nn

xxp

ˆ

If a random sample of size n1 from population 1 has x1 successes,

and a random sample of size n2 from population 2 has x2

successes, the sample proportions of these two samples are

(proportion of successes in sample 1)

(proportion of successes in sample 2)

(overall sample proportion of successes)

(If H0: p1 = p2 , then p1 – p2 = 0 )

z has a standard normal distribution if n 1 and n 2 are large. HT - 64

Step 3: Decision Rule:

p-value approach: Compute p-value,

if HA : p1 p2 , p-value = 2·P( Z | z | )

if HA : p1 > p2 , p-value = P( Z z )

if HA : p1 < p2 , p-value = P( Z z )

reject H0 if p-value < a

Critical value approach: Determine critical value(s) using a ,

reject H0 against

i) HA : p1 p2 if | z | > za/2

ii) HA : p1 > p2 if z > za

iii) HA : p1 < p2 if z < - za

Step 4: Conclusion

HT - 65

Example: Test to see if the percentage of

smokers in country A is significant different from

country B, at 5% level of significance?

For country A, 1500 adults were randomly

selected and 551 of them were smokers.

For country B, 2000 adults were randomly

selected and 652 of them were smokers.

1p̂

2p̂

p̂

= 551/1500 = .367 (Country A)

= 652/2000 = .326 (Country B)

=(551+652)/(1500+2000) =.344

(overall percentage of smokers)

HT - 66

Step 1:

Hypothesis:

H0: p1 = p2 v.s. HA: p1 p2

532

2000

1

1500

13441344

0326367.

).(.

..

-

--z

Step 2:

Test Statistic:

p-value = .0057x2 = 0.0114


12

HT - 67

Step 3:

Decision Rule: Using the level of

significance at 0.05, the null hypothesis

would be rejected if p-value is less than 0.05.

Step 4:

Conclusion: Since p-value = 0.0114 < 0.05, the null

hypothesis is rejected. There is sufficient

evidence to support the alternative

hypothesis that there is a statistically

significantly difference in the percentages of

smokers in country A and country B. HT - 68

Confidence interval: The (1- a)% confidence

interval estimate for the difference of two population proportions is

za/2 · 21

pp ˆˆ -

2

22

1

1111

n

pp

n

pp )ˆ(ˆ)ˆ(ˆ -

-

2000

3261326

1500

3671367 ).(.).(. -

-

The 95% confidence interval estimate for the

difference of the two population proportions is:

.367 - .326 1.96 ·

.041 .032 4.1% 3.2%

(0.9%, 7.3%) CI does not cover 0

implies significant

difference.

HT - 69

34.7% 38.9%

( )( )

30.9% 34.3%

Confidence Interval Estimate of One Proportion

= 551/1500 = .367 = 36.7% (from A)

= 652/2000 = .326 = 32.6% (from B)

For A: 36.7% 2% or (34.7%, 38.9%)

For B: 32.6% 1.7% or (30.9%, 34.3%)

1p̂

2p̂

Two CI’s do not overlap implies significant difference.

HT - 70

Methods of Testing Hypotheses

• Traditional Critical Value Method

• P-value Method

• Confidence Interval Method

Review Normal Distribution Statistical...

Documents

Transcript of Review Normal Distribution Statistical...