CHAPTER 10 CONFIDENCE INTERVALS FOR ONE SAMPLE POPULATION CONFIDENCE INTERVAL FOR THE DIFFERENCE IN...

CHAPTER 10

• CONFIDENCE INTERVALS FOR ONE SAMPLE POPULATION

• CONFIDENCE INTERVAL FOR THE DIFFERENCE IN TWO SAMPLE POPULATION

Point Estimate and Interval Estimate

A point estimate is a single number that is our “best guess” for the parameter. Point estimation produces a number (an estimate) which is believed to be close to the value of the unknown parameter.

An interval estimate is an interval of numbers within which the parameter value is believed to fall. Interval estimation produces an interval that contains the estimated parameter with a prescribed confidence.

Point Estimate and Interval Estimate (Figure 10.1)


• Figure 10.1 A point estimate predicts a parameter by a single number. An interval estimate is an interval of numbers that are believable values for the parameter.

• Question: Why is a point estimate alone not sufficiently informative?



A point estimate doesn’t tell us how close the estimate is likely to be to the parameter.

An interval estimate is more useful: it incorporates a margin of error, which helps us gauge the accuracy of the point estimate.

Properties of Point Estimators

Property 1: A good estimator has a sampling distribution that is centered at the parameter.

An estimator with this property is unbiased. The sample mean is an unbiased estimator of the population mean.

The sample proportion is an unbiased estimator of the population proportion.

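SOME POINT ESTIMATORS

PARAMETER                      POINT ESTIMATOR
PROPORTION p                   $\hat{p}$ (unbiased)
MEAN $\mu$                     $\bar{x}$ (unbiased)
STANDARD DEVIATION $\sigma$    $s$ ($s^2$ is unbiased for $\sigma^2$; $s$ itself is slightly biased for $\sigma$)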

Properties of Point Estimators

Property 2: A good estimator has a small standard deviation compared to other estimators.

This means it tends to fall closer than other estimates to the parameter.

The sample mean has a smaller standard error than the sample median when estimating the population mean of a normal distribution.


The Logic behind Constructing a Confidence Interval

To construct a confidence interval for a population proportion, start with the sampling distribution of a sample proportion.

The sampling distribution gives the possible values for the sample proportion and their probabilities.

The sampling distribution:
• Is approximately a normal distribution for large random samples, by the CLT.
• Has mean equal to the population proportion.
• Has standard deviation called the standard error.

Constructing a Confidence Interval to Estimate a Population Proportion

We symbolize a population proportion by p.

The point estimate of the population proportion is the sample proportion.

We symbolize the sample proportion by $\hat{p}$, called “p-hat”.


• A CONFIDENCE INTERVAL OFTEN HAS THE FORM: POINT ESTIMATE ± MARGIN OF ERROR (ME)

• IT IS CONSTRUCTED WITH A PRESCRIBED CONFIDENCE KNOWN AS THE CONFIDENCE LEVEL

Confidence Interval or Interval Estimate

Sample estimate ± Multiplier × Standard Error

Sample estimate ± Margin of error

• The multiplier is a number based on the desired confidence level and determined from the standard normal distribution (for proportions) or Student’s t-distribution (for means).

The Multiplier

• Multiplier, denoted as z*, is the standardized score such that the area between -z* and z* under the standard normal curve corresponds to the desired confidence level.

• Note: Increase confidence level => larger multiplier


The Multiplier


For 90% Confidence Level

SOME CRITICAL VALUES FOR STANDARD NORMAL DISTRIBUTION

C% CONFIDENCE LEVEL    CRITICAL VALUE z*
80%                    1.282
90%                    1.645
95%                    1.960
98%                    2.326
99%                    2.576
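The critical values in this table can be reproduced from the definition of the multiplier. The following is a minimal sketch using Python's standard library; the function name `z_multiplier` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Return z* such that the area between -z* and z* equals `confidence`."""
    # A central area C leaves (1 - C) / 2 in each tail, so z* is the
    # (1 + C) / 2 quantile of the standard normal distribution.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  z* = {z_multiplier(level):.3f}")
```

Rounded to three decimals, the printed values should agree with the table, and they grow as the confidence level grows.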

Interpretation of the Confidence Level

So what does it mean to say that we have “95% confidence”?

The meaning refers to a long-run interpretation—how the method performs when used over and over with many different random samples.

If we used the 95% confidence interval method over time to estimate many population proportions, then in the long run about 95% of those intervals would give correct results, containing the population proportion.


WHAT DOES C% CONFIDENCE REALLY MEAN?

• FORMALLY, WHAT WE MEAN IS THAT C% OF SAMPLES OF THIS SIZE WILL PRODUCE CONFIDENCE INTERVALS THAT CAPTURE THE TRUE PROPORTION.

• C% CONFIDENCE MEANS THAT, ON AVERAGE, IN C OUT OF 100 ESTIMATIONS THE INTERVAL WILL CONTAIN THE TRUE VALUE OF THE PARAMETER BEING ESTIMATED.

• E.G., 95% CONFIDENCE MEANS THAT, ON AVERAGE, IN 95 OUT OF 100 ESTIMATIONS THE INTERVAL WILL CONTAIN THE TRUE VALUE OF THE PARAMETER BEING ESTIMATED.
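The long-run interpretation can be checked by simulation. The sketch below is only an illustration: the true proportion (0.30), the sample size (400), the number of repetitions, and the seed are all assumed values chosen for the demonstration, and the interval used is the usual sample proportion plus or minus z* standard errors.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(p_true, n, confidence, repetitions, seed=1):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    hits = 0
    for _ in range(repetitions):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        se = sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z_star * se <= p_true <= p_hat + z_star * se:
            hits += 1
    return hits / repetitions

rate = coverage(p_true=0.30, n=400, confidence=0.95, repetitions=2000)
print(f"Intervals containing the true proportion: {rate:.1%}")
```

Any single interval either contains the true proportion or does not; the 95% describes how often the method succeeds across repeated samples, so the printed rate should be close to, but not exactly, 95%.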


CONFIDENCE INTERVAL FOR PROPORTION P [ONE-PROPORTION Z-INTERVAL]

ASSUMPTIONS AND CONDITIONS

• RANDOMIZATION CONDITION

• 10% CONDITION

• SAMPLE SIZE ASSUMPTION OR SUCCESS/FAILURE CONDITION

• INDEPENDENCE ASSUMPTION

• NOTE: PROPER RANDOMIZATION CAN HELP ENSURE INDEPENDENCE.

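CONSTRUCTING CONFIDENCE INTERVALS

ESTIMATOR: SAMPLE PROPORTION $\hat{p}$

STANDARD ERROR: $SE(\hat{p}) = \sqrt{\hat{p}\hat{q}/n}$, WHERE $\hat{q} = 1 - \hat{p}$

C% MARGIN OF ERROR: $ME(\hat{p}) = z^* \cdot SE(\hat{p})$

C% CONFIDENCE INTERVAL: $\hat{p} \pm ME(\hat{p})$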

Compact Formula For a Confidence Interval For a Population Proportion p

$\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$

where
• $\hat{p}$ is the sample proportion.
• z* denotes the multiplier.
• $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error of $\hat{p}$.


• The exact standard deviation of a sample proportion equals $\sqrt{p(1-p)/n}$.

• This formula depends on the unknown population proportion, p.

• In practice, we don’t know p, and we need to estimate the standard error as $se = \sqrt{\hat{p}(1-\hat{p})/n}$.
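Putting the pieces together (sample proportion, estimated standard error, multiplier, margin of error), the one-proportion interval can be sketched as below. The function name `one_proportion_ci` and the sample of 40 successes in 100 trials are hypothetical, chosen only to illustrate the formula.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_ci(successes, n, confidence=0.95):
    """Return (p_hat, se, lower, upper) for the one-proportion z-interval."""
    p_hat = successes / n
    # Estimated standard error: the unknown p is replaced by p_hat.
    se = sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    me = z_star * se  # margin of error
    return p_hat, se, p_hat - me, p_hat + me

# Hypothetical sample: 40 successes in 100 trials.
p_hat, se, lower, upper = one_proportion_ci(40, 100, confidence=0.95)
print(f"p-hat = {p_hat:.3f}, se = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The sketch does not check the assumptions and conditions listed earlier (randomization, 10% condition, success/failure condition); those still need to be judged from how the data were collected.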

Margin of Error

– The margin of error measures how accurate the point estimate is likely to be in estimating a parameter.

– It is a multiple of the standard error of the sampling distribution when the sampling distribution is a normal distribution.

– The distance of 1.96 standard errors is the margin of error for a 95% confidence interval when the sampling distribution is a normal distribution.

Intuitive Explanation of Margin of Error

• Margin of Error Characteristics:

• The difference between the sample proportion and the population proportion is less than the margin of error about 95% of the time, or for about 19 of every 20 sample estimates.

• The difference between the sample proportion and the population proportion is more than the margin of error about 5% of the time, or for about 1 of every 20 sample estimates


SAMPLE SIZE NEEDED TO PRODUCE A CONFIDENCE INTERVAL WITH A GIVEN MARGIN OF ERROR, ME

$ME(\hat{p}) = z^* \sqrt{\hat{p}\hat{q}/n}$

SOLVING FOR n GIVES

$n = \frac{(z^*)^2 \hat{p}\hat{q}}{(ME)^2}$

WHERE $\hat{p}$ AND $\hat{q}$ ARE A REASONABLE GUESS. IF WE CANNOT MAKE A GUESS, WE TAKE $\hat{p} = \hat{q} = 0.5$.
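The sample-size formula can be sketched as a small function. The name `sample_size_for_proportion` and the target margin of error of 0.03 are illustrative assumptions; the result is rounded up because a fractional observation is not possible.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(me, confidence=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most `me` for the guessed p."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    n = z_star ** 2 * p_guess * (1 - p_guess) / me ** 2
    return ceil(n)  # round up to a whole number of observations

# Hypothetical target: margin of error of 3 percentage points at 95% confidence.
n_needed = sample_size_for_proportion(0.03)
print(n_needed)
```

Using 0.5 as the guess gives the largest possible value of the product of the two proportions, so it is the cautious choice when no better guess is available.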

EXAMPLE 1

A MAY 2002 GALLUP POLL FOUND THAT ONLY 8% OF A RANDOM SAMPLE OF 1012 ADULTS APPROVED OF ATTEMPTS TO CLONE A HUMAN.

(A) FIND THE MARGIN OF ERROR FOR THIS POLL IF WE WANT 95% CONFIDENCE IN OUR ESTIMATE OF THE PERCENT OF AMERICAN ADULTS WHO APPROVE OF CLONING HUMANS.

(B) EXPLAIN WHAT THAT MARGIN OF ERROR MEANS.

(C) IF WE ONLY NEED TO BE 90% CONFIDENT, WILL THE MARGIN OF ERROR BE LARGER OR SMALLER? EXPLAIN.

(D) FIND THAT MARGIN OF ERROR.

(E) IN GENERAL, IF ALL OTHER ASPECTS OF THE SITUATION REMAIN THE SAME, WOULD SMALLER SAMPLES PRODUCE SMALLER OR LARGER MARGINS OF ERROR?


SOLUTION
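The worked solution is not reproduced in this transcript. As a rough check of parts (a), (d) and (e), the margins of error can be computed from the figures in the problem statement. This is a sketch, not the original solution; the function name and the smaller sample of 253 are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Margin of error z* * sqrt(p_hat * (1 - p_hat) / n)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

me_95 = margin_of_error(0.08, 1012, 0.95)        # part (a)
me_90 = margin_of_error(0.08, 1012, 0.90)        # part (d)
me_95_small = margin_of_error(0.08, 253, 0.95)   # part (e), hypothetical n

print(f"95%: {me_95:.4f}   90%: {me_90:.4f}   95% with n = 253: {me_95_small:.4f}")
```

Lower confidence gives a smaller margin of error, and a smaller sample gives a larger one.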


EXAMPLE 2

DIRECT MAIL ADVERTISERS SEND SOLICITATIONS (a.k.a. “junk mail”) TO THOUSANDS OF POTENTIAL CUSTOMERS IN THE HOPE THAT SOME WILL BUY THE COMPANY’S PRODUCT. THE RESPONSE RATE IS USUALLY QUITE LOW. SUPPOSE A COMPANY WANTS TO TEST THE RESPONSE TO A NEW FLYER, AND SENDS IT TO 1000 PEOPLE RANDOMLY SELECTED FROM THEIR MAILING LIST OF OVER 200,000 PEOPLE. THEY GET ORDERS FROM 123 OF THE RECIPIENTS.

(A) CREATE A 90% CONFIDENCE INTERVAL FOR THE PERCENTAGE OF PEOPLE THE COMPANY CONTACTS WHO MAY BUY SOMETHING.

(B) EXPLAIN WHAT THIS INTERVAL MEANS.

(C) EXPLAIN WHAT “90% CONFIDENCE” MEANS.

(D) THE COMPANY MUST DECIDE WHETHER TO NOW DO A MASS MAILING. THE MAILING WON’T BE COST-EFFECTIVE UNLESS IT PRODUCES AT LEAST A 5% RETURN. WHAT DOES YOUR CONFIDENCE INTERVAL SUGGEST? EXPLAIN.

SOLUTION
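The worked solution is not reproduced in this transcript. A sketch of parts (a) and (d), using only the numbers in the problem statement (123 orders from 1000 flyers, 90% confidence, 5% break-even return), might look like this; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

orders, mailed = 123, 1000
p_hat = orders / mailed
se = sqrt(p_hat * (1 - p_hat) / mailed)
z_star = NormalDist().inv_cdf((1 + 0.90) / 2)  # multiplier for 90% confidence

lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(f"90% CI for the response rate: {lower:.1%} to {upper:.1%}")

# Part (d): compare the whole interval with the 5% break-even return.
break_even = 0.05
print("Entire interval above 5%:", lower > break_even)
```

Because the whole interval lies above 5%, the data are consistent with the mass mailing being cost-effective, provided the mailing list is like the sampled recipients.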


EXAMPLE 3

IN 1998 A SAN DIEGO REPRODUCTIVE CLINIC REPORTED 49 BIRTHS TO 207 WOMEN UNDER THE AGE OF 40 WHO HAD PREVIOUSLY BEEN UNABLE TO CONCEIVE.

(A) FIND A 90% CONFIDENCE INTERVAL FOR THE SUCCESS RATE AT THIS CLINIC.

(B) INTERPRET YOUR INTERVAL IN THIS CONTEXT.

(C) EXPLAIN WHAT “90% CONFIDENCE” MEANS.

(D) WOULD IT BE MISLEADING FOR THE CLINIC TO ADVERTISE A 25% SUCCESS RATE? EXPLAIN.

(E) THE CLINIC WANTS TO CUT THE STATED MARGIN OF ERROR IN HALF. HOW MANY PATIENTS’ RESULTS MUST BE USED?

(F) DO YOU HAVE ANY CONCERNS ABOUT THIS SAMPLE? EXPLAIN.


SOLUTION
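The worked solution is not reproduced in this transcript. The sketch below covers parts (a), (d) and (e) under the assumption that the sample proportion stays the same when more patients are included. Since the margin of error is proportional to one over the square root of n, halving it requires four times as many patients.

```python
from math import sqrt
from statistics import NormalDist

births, women = 49, 207
p_hat = births / women
z_star = NormalDist().inv_cdf((1 + 0.90) / 2)

def margin(n):
    """90% margin of error for sample size n, holding p_hat fixed."""
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

me_207 = margin(women)
lower, upper = p_hat - me_207, p_hat + me_207
print(f"90% CI: {lower:.3f} to {upper:.3f}")       # part (a)
print("25% inside interval:", lower < 0.25 < upper)  # part (d)

n_new = 4 * women  # part (e): quadruple n to halve the margin of error
print(n_new, f"{margin(n_new):.4f}")
```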

How Can We Use Confidence Levels Other than 95%?

In practice, the confidence level 0.95 is the most common choice, but some applications require greater (or less) confidence.

• To increase the chance of a correct inference, we can use a larger confidence level, such as 0.99.

A 99% Confidence Interval Is Wider Than a 95% Confidence Interval.


Question: If you want greater confidence, why would you expect a wider interval?

• In using confidence intervals, we must compromise between the desired margin of error and the desired confidence of a correct inference.

– As the desired confidence level increases, the margin of error gets larger.


Effects of Confidence Level and Sample Size on Margin of Error

The margin of error for a confidence interval:
• Increases as the confidence level increases
• Decreases as the sample size increases

For instance, a 99% confidence interval is wider than a 95% confidence interval, and a confidence interval with 200 observations is narrower than one with 100 observations at the same confidence level. These properties apply to all confidence intervals, not just the one for the population proportion.


What is the Error Probability for the Confidence Interval Method?

• The general formula for the confidence interval for a population proportion is:

• Sample estimate ± Multiplier × Standard Error

– which in symbols is $\hat{p} \pm z(se)$

Confidence Intervals for the Difference Between Two Proportions

$\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

where z* is the value of the standard normal variable with area between -z* and z* equal to the desired confidence level.

Necessary Conditions

• Condition 1: Sample proportions are available based on independent, randomly selected samples from the two populations.

• Condition 2: All of the quantities $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1-\hat{p}_2)$ are at least 10.

Example: Age and Using the Internet

Young: 92 of 262 use Internet as main news source, $\hat{p}_1$ = .351
Old: 59 of 632 use Internet as main news source, $\hat{p}_2$ = .093

$\hat{p}_1 - \hat{p}_2 = .351 - .093 = .258$ and $s.e.(\hat{p}_1 - \hat{p}_2) = .0317$

• Approximate 95% Confidence Interval: .258 ± 1.96(.0317), or .196 to .320

• We are 95% confident that the percentage of young adults who use the Internet as their main news source is between 19.6 and 32.0 percentage points higher than the percentage of older adults who do.
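The numbers in this example can be reproduced from the counts. The sketch below follows the two-proportion formula and checks Condition 2 (all four counts at least 10); the function name `two_proportion_ci` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (difference, se, lower, upper) for p1 - p2."""
    # Condition 2: successes and failures in each sample are at least 10.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("success/failure counts should all be at least 10")
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    diff = p1 - p2
    return diff, se, diff - z_star * se, diff + z_star * se

# Young: 92 of 262; Old: 59 of 632.
diff, se, lower, upper = two_proportion_ci(92, 262, 59, 632)
print(f"difference = {diff:.3f}, s.e. = {se:.4f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

The interval does not include 0, which is consistent with the conclusion that the two population proportions differ.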

Using Confidence Intervals to Guide Decisions

Principle 1. A value not in a confidence interval can be rejected as a possible value of the population proportion. A value in a confidence interval is an “acceptable” possibility for the value of a population proportion.

Principle 2. When a confidence interval for the difference in two population proportions does not cover 0, it is reasonable to conclude the two population proportions are different.

Principle 3. When the confidence intervals for proportions in two different populations do not overlap, it is reasonable to conclude the two population proportions are different.


Example: Which Drink Tastes Better?

• Taste Test: A sample of 60 people taste both drinks and 55% like the taste of Drink A better than Drink B.

• Makers of Drink A want to advertise these results.

• Makers of Drink B make a 95% confidence interval for the population proportion who prefer Drink A.

• 95% Confidence Interval: $.55 \pm 2\sqrt{.55(1-.55)/60} = .55 \pm .13$, or .42 to .68

• Note: Since .50 is in the interval, there is not enough evidence to claim that Drink A is preferred by a majority of the population represented by the sample.

CHAPTER 11

ESTIMATING MEANS WITH CONFIDENCE


CONFIDENCE INTERVALS FOR ONE POPULATION MEAN

The confidence interval again has the form: Point estimate ± margin of error

The sample mean $\bar{x}$ is the point estimate of the population mean $\mu$.

The exact standard error of the sample mean is $\sigma/\sqrt{n}$.

• In practice, we estimate σ by the sample standard deviation, s, so $s.e.(\bar{x}) = s/\sqrt{n}$.

Confidence Intervals for One Population Mean

• For large n from any population, and also

• For small n from an underlying population that is normal,

• The confidence interval for the population mean is: $\bar{x} \pm z(\sigma/\sqrt{n})$

Confidence Intervals for One Population Mean

In practice, we don’t know the population standard deviation σ.

• Substituting the sample standard deviation s for σ to get $s.e.(\bar{x}) = s/\sqrt{n}$ introduces extra error.

• To account for this increased error, we must replace the z-score by a slightly larger score, called a t-score. The confidence interval is then a bit wider. The distribution of the t-score is called the t distribution.

Summary: Properties of the t-Distribution

The t-distribution is bell shaped and symmetric about 0.

The probabilities depend on the degrees of freedom, df = n − 1.

The t-distribution has thicker tails than the standard normal distribution, i.e., it is more spread out.

A t-score multiplied by the standard error gives the margin of error for a confidence interval for the mean.

t - Distribution

• The t Distribution Relative to the Standard Normal Distribution: The t distribution gets closer to the standard normal as the degrees of freedom (df) increase. The two are practically identical when df ≥ 30.

• Question: Can you find z-scores (such as 1.96) for a normal distribution on the t table?

t - Distribution

• Part of a t-Table Displaying t-Scores. The scores have right-tail probabilities of 0.100, 0.050, 0.025, 0.010, 0.005, and 0.001. When n = 7, df = 6, and $t_{.025} = 2.447$ is the t-score with right-tail probability = 0.025 and two-tail probability = 0.05. It is used in a 95% confidence interval, $\bar{x} \pm 2.447(se)$.

t - Distribution

• The t Distribution with df = 6. 95% of the distribution falls between -2.447 and 2.447. These t-scores are used with a 95% confidence interval when n = 7.

• Question: Which t-scores with df = 6 contain the middle 99% of a t distribution (for a 99% confidence interval)?

Using the t Distribution to Construct a Confidence Interval for a Mean

• Summary: 95% Confidence Interval for a Population Mean

• When the standard deviation of the population is unknown, a 95% confidence interval for the population mean is: $\bar{x} \pm t_{.025}(s/\sqrt{n})$; df = n − 1

• To use this method, you need:
– Data obtained by randomization
– An approximately normal population distribution
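A minimal sketch of the t interval for a mean follows. The t-score is passed in from a t table rather than computed, and the seven data values are hypothetical; they were chosen so that n = 7 and df = 6, matching the table value 2.447 quoted earlier.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(data, t_star):
    """Return (x_bar, se, lower, upper) for x_bar ± t* * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    se = stdev(data) / sqrt(n)  # stdev uses the n - 1 divisor, i.e. s
    return x_bar, se, x_bar - t_star * se, x_bar + t_star * se

sample = [1, 2, 3, 4, 5, 6, 7]  # hypothetical data, n = 7 so df = 6
x_bar, se, lower, upper = mean_ci(sample, t_star=2.447)
print(f"x-bar = {x_bar}, se = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The caller is responsible for choosing the t-score that matches both the confidence level and df = n − 1.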

SUMMARY


ASSUMPTIONS AND CONDITIONS

• INDEPENDENCE ASSUMPTION: THE DATA VALUES SHOULD BE INDEPENDENT. THERE’S REALLY NO WAY TO CHECK INDEPENDENCE OF THE DATA BY LOOKING AT THE SAMPLE, BUT WE SHOULD THINK ABOUT WHETHER THE ASSUMPTION IS REASONABLE.

• RANDOMIZATION CONDITION: THE DATA SHOULD ARISE FROM A RANDOM SAMPLE OR A SUITABLY RANDOMIZED EXPERIMENT.

ASSUMPTIONS AND CONDITIONS

• 10% CONDITION: THE SAMPLE IS NO MORE THAN 10% OF THE POPULATION.

• NORMAL POPULATION ASSUMPTION OR NEARLY NORMAL CONDITION: THE DATA COME FROM A DISTRIBUTION THAT IS UNIMODAL AND SYMMETRIC. REMARK: CHECK THIS CONDITION BY MAKING A HISTOGRAM OR NORMAL PROBABILITY PLOT.


CONSTRUCTING CONFIDENCE INTERVALS FOR MEANS

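• POINT ESTIMATOR: $\bar{x}$

• STANDARD ERROR: $SE(\bar{x}) = s/\sqrt{n}$

• C% MARGIN OF ERROR: $ME(\bar{x}) = t^*_{n-1} \cdot SE(\bar{x})$

WHERE $t^*_{n-1}$ IS A CRITICAL VALUE FOR STUDENT’S t-MODEL WITH n − 1 DEGREES OF FREEDOM THAT CORRESPONDS TO THE C% CONFIDENCE LEVEL.

SOLVING THE MARGIN OF ERROR FOR n: $n = \frac{(t^*_{n-1})^2 s^2}{(ME)^2}$

REMARK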

ILLUSTRATIVE PICTURE


FINDING CRITICAL t - VALUES

• Using t tables (Table T) and/or calculator, find or estimate the

• 1. critical value t7* for 90% confidence level if number of degrees of freedom is 7

• 2. one tail probability if t = 2.56 and number of degrees of freedom is 7

• 3. two tail probability if t = 2.56 and number of degrees of freedom is 7

• NOTE: If t has a Student's t-distribution with df degrees of freedom, then the TI-83 function tcdf(a,b,df) computes the area under the t-curve between a and b.
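For readers without a TI-83, the same quantities can be approximated numerically. The sketch below is an assumption-laden illustration, not the calculator's algorithm: it integrates the t density with Simpson's rule and finds critical values by bisection. The names `t_pdf`, `tcdf` and `t_critical` are chosen for this example, with `tcdf` mirroring the calculator's argument order.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def tcdf(a, b, df, steps=2000):
    """Area under the t curve between a and b (composite Simpson's rule)."""
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return total * h / 3

def t_critical(confidence, df):
    """t* with area `confidence` between -t* and t*, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if tcdf(-mid, mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t7_star = t_critical(0.90, 7)        # exercise 1
one_tail = 0.5 - tcdf(0, 2.56, 7)    # exercise 2: area to the right of 2.56
two_tail = 2 * one_tail              # exercise 3
print(f"t7* = {t7_star:.3f}, one tail = {one_tail:.4f}, two tail = {two_tail:.4f}")
```

The one-tail probability uses the symmetry of the t curve about 0: half the area lies to the right of 0, so subtracting the area between 0 and 2.56 leaves the right tail.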


EXAMPLES FROM MIDTERM EXAM III PRACTICE EXERCISES


Choosing the Sample Size for Estimating a Population Mean

In practice, you don’t know the value of the standard deviation, σ.

• You must substitute an educated guess for σ.

• Sometimes you can use the sample standard deviation from a similar study.

When no prior information is known, a crude estimate that can be used is to divide the estimated range of the data by 6 since for a bell-shaped distribution we expect almost all of the data to fall within 3 standard deviations of the mean.
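The range-over-6 rule can be combined with the margin-of-error formula to get a first estimate of the sample size. This sketch uses the normal multiplier z* as an approximation (the t multiplier depends on n, which is not yet known); the function name, the estimated range of 60 units and the target margin of 2 units are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(margin, sigma_guess, confidence=0.95):
    """Approximate n so that z* * sigma / sqrt(n) is at most `margin`."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z_star * sigma_guess / margin) ** 2)

estimated_range = 60                 # hypothetical: largest minus smallest plausible value
sigma_guess = estimated_range / 6    # crude guess for a bell-shaped distribution
n_needed = sample_size_for_mean(margin=2, sigma_guess=sigma_guess)
print(n_needed)
```

Because the guess for σ is crude, the result is best read as a planning figure rather than a guarantee of the final margin of error.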


Other Factors That Affect the Choice of the Sample Size

The first is the desired precision, as measured by the margin of error, m.

The second is the confidence level.

The third factor is the variability in the data.

The fourth factor is cost.


What if You Have to Use a Small n?

The t methods for a mean are valid for any n.

However, you need to be extra cautious to look for extreme outliers or great departures from the normal population assumption.

– In the case of the confidence interval for a population proportion, the method works poorly for small samples because the normal approximation given by the CLT is not reliable when n is small.

Confidence Intervals for Difference in Two Population Means (Independent Samples)


Confidence Intervals for the Difference Between Two Population Means

Approximate CI for $\mu_1 - \mu_2$:

$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

where t* is the value in a t-distribution with area between -t* and t* equal to the desired confidence level.

The approximate df is difficult to specify. Use computer software, or conservatively use the smaller of the two sample sizes minus 1.

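Degrees of Freedom

The t-distribution is only approximately correct and the df formula is complicated (Welch’s approximation):

$df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$

Statistical software can use the above approximation, but if done by hand then use a conservative df = smaller of n1 – 1 and n2 – 1.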

Necessary Conditions

Two samples must be independent and either:

Situation 1: Populations of measurements both bell-shaped, and random samples of any size are measured.

Situation 2: Large (n ≥ 30) random samples are measured. But if there are extreme outliers, or extreme skewness, it is better to have an even larger sample than n = 30.

Example: Effect of a Stare on Driving

• Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign; Timed how long (sec) it took driver to proceed from sign to a mark on other side of the intersection.

No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7

Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8

• Task: Make a 95% CI for the difference between the mean crossing times for the two populations represented by these two independent samples.



Checking Conditions

Boxplots show …

• No outliers and no strong skewness.

• Crossing times in stare group generally faster and less variable.


Example: Effect of a Stare on Driving

Note: The df = 21 was reported by the computer package based on the Welch’s approximation formula.
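The computer output for this example is not reproduced in the transcript. As a sketch, the unpooled interval can be recomputed from the crossing times listed earlier. The multiplier 2.080 is an assumed t-table value for df = 21 at 95% confidence, and the function name `welch_summary` is illustrative.

```python
from math import sqrt
from statistics import mean, variance

no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]

def welch_summary(a, b):
    """Return (difference in means, unpooled s.e., Welch approximate df)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # s^2 / n for each group
    se = sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return mean(a) - mean(b), se, df

diff, se, df = welch_summary(no_stare, stare)
T_STAR = 2.080  # assumed table value for df = 21, 95% confidence
lower, upper = diff - T_STAR * se, diff + T_STAR * se

print(f"difference = {diff:.2f}, s.e. = {se:.3f}, Welch df = {df:.1f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

The Welch formula gives a non-integer df; truncating it to a whole number gives the 21 mentioned in the note. Since the interval lies entirely above 0, the data suggest that drivers who were stared at crossed faster on average.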

Equal Variance Assumption and the Pooled Standard Error

• It may be reasonable to assume the two populations have equal population standard deviations, or equivalently, equal population variances: $\sigma_1^2 = \sigma_2^2 = \sigma^2$

• Estimate of this variance based on the combined or “pooled” data is called the pooled variance. The square root of the pooled variance is called the pooled standard deviation:

$\text{Pooled standard deviation} = s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$

Pooled Standard Error

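$\text{Pooled } s.e.(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$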

Pooled Degrees of Freedom (df)

• Note: Pooled df = (n1 – 1) + (n2 – 1) = (n1 + n2 – 2).

Pooled Confidence Interval

Pooled CI for the Difference Between Two Means (Independent Samples):

$\bar{x}_1 - \bar{x}_2 \pm t^* s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

where t* is found using a t-distribution with df = (n1 + n2 – 2) and sp is the pooled standard deviation.

Example: Male and Female Sleep Times

• Q: How much difference is there between how long female and male students slept the previous night?

• Data: The 83 female and 65 male responses from students in an intro stat class.

• Task: Make a 95% CI for the difference between the population mean sleep hours for females and for males.

• Note: We will assume equal population variances.

Example: Male and Female Sleep Times

Two-sample T for sleep [with “Assume Equal Variance” option]

Sex      N   Mean  StDev  SE Mean
Female  83   7.02   1.75     0.19
Male    65   6.55   1.68     0.21

Difference = mu (Female) – mu (Male)
Estimate for difference: 0.461
95% CI for difference: (-0.103, 1.025)
T-Test of difference = 0 (vs not =): T-Value = 1.62 P = 0.108 DF = 146
Both use Pooled StDev = 1.72

Example: Male and Female Sleep Times

Notes:
• Two sample standard deviations are very similar.
• Sample mean for females higher than for males.
• 95% confidence interval contains 0 so cannot rule out that the population means may be equal.


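• Pooled Standard Deviation and Pooled Standard Error “by hand”:

$\text{Pooled std dev} = s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = \sqrt{\frac{(83-1)1.75^2 + (65-1)1.68^2}{83+65-2}} = \sqrt{2.957} = 1.72$

$\text{Pooled } s.e.(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 1.72\sqrt{\frac{1}{83} + \frac{1}{65}} = 0.285$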
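The by-hand calculation can be checked with a short sketch that works from the summary statistics in the software output. The multiplier 1.976 is an assumed approximate t value for df = 146 at 95% confidence, and the function names are illustrative. Small differences from the printed output are expected because the inputs here are rounded.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation from two sample sizes and standard deviations."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def pooled_se(n1, s1, n2, s2):
    """Pooled standard error of the difference between two sample means."""
    return pooled_sd(n1, s1, n2, s2) * sqrt(1 / n1 + 1 / n2)

sp = pooled_sd(83, 1.75, 65, 1.68)
se = pooled_se(83, 1.75, 65, 1.68)

T_STAR = 1.976     # assumed approximate t value for df = 83 + 65 - 2 = 146
estimate = 0.461   # estimate for the difference, taken from the output
lower, upper = estimate - T_STAR * se, estimate + T_STAR * se

print(f"sp = {sp:.2f}, pooled s.e. = {se:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```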

Pooled or Unpooled?

• If the larger sample size produced the larger standard deviation, the pooled procedure is acceptable because it will be conservative.

• If the smaller standard deviation accompanies the larger sample size, the pooled procedure can be quite misleading and is not recommended.

• If sample sizes are equal, the pooled and unpooled standard errors are equal. Unless the sample standard deviations are quite similar, it is best to use the unpooled procedure.

Confidence Interval for the Difference in Two Population Means

1. Make sure appropriate conditions apply by checking sample size and/or a shape picture of the differences.
2. Choose a confidence level.
3. Compute the mean and std dev for each sample.
4. Determine whether the std devs are similar enough that the pooled procedure can be used.
5. Calculate the appropriate standard error (pooled or unpooled).
6. Calculate the appropriate df.
7. Use Table A.2 (or software) to find the multiplier t*.

The confidence interval is $\bar{x}_1 - \bar{x}_2 \pm t^* \, s.e.(\bar{x}_1 - \bar{x}_2)$.

Examples From Midterm Exam III Practice Sheet
