More About Confidence Intervals


Page 1: More About Confidence Intervals

More About Confidence Intervals

Presentation 10

Page 2: More About Confidence Intervals

Types of CIs in Chapter 12

A. 1 mean

B. Difference between 2 independent means

C. Difference between 2 paired means

D. Difference between 2 proportions

In cases B, C and D we are interested in comparing 2 populations with regard to a parameter. There are two possible ways to get the samples from the two populations:

1. Independent samples: the data from one sample tell us nothing about the data in the other sample (cases B and D).

2. Paired data: a natural pairing exists between the two samples, e.g. “before and after” studies, studies on twins, etc. (case C).

The basic formula for a CI remains the same!

Estimate ± Multiplier × Standard Error of the Estimate

Page 3: More About Confidence Intervals

Recognize the Situation

The biggest challenge that most of you face at this point is reading a problem and deciding which kind of confidence interval is required. So, I will make it very clear how to do so, and then we will get some practice.

First, identify the response variable and then determine what type of variable (categorical or quantitative) it is.

If it is categorical, we are dealing with proportions. From there, you should be able to determine whether we are looking at just one proportion or the difference between two proportions.

If the variable of interest is quantitative, we are dealing with means. If it is just one mean, you are all set. If we are looking at the difference between two means, you need to determine whether the samples are paired or independent.

Recognizing what you need to do is half the battle. Once you have accomplished that, it is just a matter of putting the right pieces together. Every confidence interval requires a sample estimate, a multiplier, and a standard error, and you should have the right formulas written down for each type of CI. Once you have made the correct diagnosis, just plug in and have fun with the calculations!
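The diagnosis steps above can be written as a small decision function. This is a minimal Python sketch, not part of the course material: the function name `choose_ci` and its string labels are hypothetical, chosen only to mirror the four CI types (A to D) listed earlier.

```python
from typing import Optional


def choose_ci(response: str, groups: int, paired: Optional[bool] = None) -> str:
    """Hypothetical helper: map the diagnosis questions to a CI type.

    response: "categorical" or "quantitative" (type of the response variable)
    groups:   1 for a single population, 2 when comparing two populations
    paired:   only consulted for two quantitative samples
    """
    if response not in ("categorical", "quantitative"):
        raise ValueError("response must be 'categorical' or 'quantitative'")
    if groups not in (1, 2):
        raise ValueError("groups must be 1 or 2")

    # Categorical response -> proportions
    if response == "categorical":
        return "1 proportion" if groups == 1 else "difference between 2 proportions"

    # Quantitative response -> means
    if groups == 1:
        return "1 mean"
    if paired is None:
        raise ValueError("for two means, specify whether the samples are paired")
    if paired:
        return "difference between 2 paired means"
    return "difference between 2 independent means"


print(choose_ci("categorical", 2))                 # difference between 2 proportions
print(choose_ci("quantitative", 2, paired=True))   # difference between 2 paired means
```

The `paired` flag is only asked for when the response is quantitative and two populations are compared, which matches the order of the questions in the text.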

Page 4: More About Confidence Intervals

Example 1: John records the number of blue-eyed individuals in a sample of 60 men and 60 women. Construct an appropriate confidence interval for the difference between men and women with respect to blue eyes.

Data Table based on Each Observation

Subject | Gender | Blue Eyes?
1 | Male | N
2 | Female | Y
3 | Female | N
4 | Male | N
Etc. | … | …

Think about what variables are recorded for each subject. In this case we have gender and eye color for each subject, both of which are categorical variables. When we want to compare the categorical response variable (Blue Eyes) across the 2 levels of the categorical predictor variable (Gender), we want a confidence interval for the difference between 2 proportions. Note: means would make NO SENSE here. You can’t have the mean of a categorical variable!

Construct a 95% CI for the difference in the proportion of men and women who have blue eyes.

Page 5: More About Confidence Intervals

Example 2: John recorded the heights of 50 randomly chosen redwood trees in State College. He is interested in estimating the average height of redwood trees in State College. It is easy to see that the data consist of a single quantitative variable (height) measured for each tree. An appropriate CI might be a 95% CI for the mean height of redwood trees. That is a CI for 1 Mean.

Tree | Height (ft)
1 | 190
2 | 230
3 | 175
4 | 245
Etc. | …

Note: If height had been replaced by a categorical variable (e.g. tree taller than 200 ft: Yes/No), then a confidence interval for 1 Proportion would have been appropriate.

Page 6: More About Confidence Intervals

Examples: Independent vs. Paired Data

Independent Data: Occurs when the observations are not related in any way. For example, take a random sample of 50 males and 50 females and record their SAT scores. The scores of the first female and the first male are NOT related. The observations are independent.

Paired Data: Occurs when the observations are paired. For example, select 50 random subjects to participate in a diet study and record their weights before and after. The weight before is paired with the weight after for each individual. Paired data occur either when there are repeated measurements on the same unit (e.g. before and after some treatment) or when the units themselves are naturally paired (e.g. twins, husband and wife, etc.).

Page 7: More About Confidence Intervals

Structure of Paired and Independent Data

Independent Data: A random sample of 400 apples is taken off the shelf at a grocery store. The apples are classified as yellow or red, and the amount of vitamin C in each apple is recorded.

Apple | Color | Vitamin C (mg)
1 | Red | 125
2 | Red | 110
3 | Yellow | 235
4 | Red | 104
Etc. | … | …

What type of CI makes sense here?

A CI for the difference in the mean amount of vitamin C between yellow and red apples. That is a CI for the difference between 2 independent means.

Page 8: More About Confidence Intervals

Structure of Paired and Independent Data

Paired Data: A random sample of 200 patients is administered a new cholesterol drug. Each patient’s cholesterol is recorded before and after taking the drug.

Patient | Cholesterol Before | Cholesterol After | Decrease in Cholesterol
1 | 235 | 215 | 20
2 | 310 | 254 | 56
3 | 198 | 178 | 20
4 | 245 | 231 | 14
Etc. | … | … | …

What type of CI makes sense here?

A CI for the mean decrease in cholesterol. That is a CI for 1 Mean based on the pairwise differences (decrease in cholesterol).

Page 9: More About Confidence Intervals

Practice…

Twenty-five people have their blood pressure measured in the morning and again in the afternoon. The data will be used to determine whether blood pressure increases during the day.

Independent    Paired

What is the difference in the average ages at which teachers and plumbers retire?

Independent    Paired

A sample of 100 students at a university was asked how many hours a week they spent studying and how many they spent socializing. The difference was computed for each student.

Independent    Paired

What is the difference in average salaries for high school graduates and college graduates?

Independent    Paired

Students are asked their actual weight and their ideal weight in order to determine how far they are from their "goal".

Independent    Paired

Page 10: More About Confidence Intervals

General Format of a CI

In Chapter 10 we saw how to create a confidence interval for a proportion. Recall that a β% C.I. for a population proportion $p$ is

$$\hat{p} \pm z^* \, \mathrm{s.e.}(\hat{p}), \qquad \mathrm{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

where $\hat{p}$ is the sample proportion (the statistic), and the $z^*$ multiplier depends on the desired confidence level, β%, and is obtained from the standard normal tables. More specifically, $z^*$ is such that

$$P(-z^* < Z < z^*) = \beta\%.$$

In general, the format of a CI for a parameter is

Sample Estimate ± Multiplier × Standard Error of the Sample Estimate

In the following, we will see what the appropriate sample statistic is, what its standard error is, and how to obtain the multiplier for each of the situations.
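As a cross-check on the normal tables, $z^*$ can be computed directly: the central area is β, so each tail holds (1 − β)/2 and $z^*$ is the (1 + β)/2 quantile of the standard normal. This minimal Python sketch uses the standard library's `statistics.NormalDist` (Python 3.8+); the helper names `z_star` and `one_proportion_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def z_star(confidence: float) -> float:
    """Return z* with P(-z* < Z < z*) = confidence for a standard normal Z."""
    # Central area = confidence, so each tail holds (1 - confidence) / 2;
    # z* is therefore the (1 + confidence) / 2 quantile.
    return NormalDist().inv_cdf((1 + confidence) / 2)


def one_proportion_ci(x: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Large-sample CI for p: p_hat ± z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star(confidence) * se
    return p_hat - margin, p_hat + margin


print(round(z_star(0.95), 2), round(z_star(0.99), 2))  # 1.96 2.58
print(one_proportion_ci(30, 100))  # hypothetical: 30 "successes" out of 100
```

The counts 30 and 100 are made up purely to exercise the formula; the printed multipliers match the familiar table values 1.96 and 2.58.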

Page 11: More About Confidence Intervals

CI for One Mean

Here is the case where we want to make inference about the population mean of a quantitative random variable.

The sample statistic used in this case is the sample mean $\bar{x}$.

The standard error of the sample mean is

$$\mathrm{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}},$$

where $s$ is the sample standard deviation and $n$ is the sample size.

It remains to specify the “Multiplier” in the general form of a CI. To do so we need to introduce some further distribution theory.

In Chapter 9 we saw that if we have a sample from a population with some mean µ and some standard deviation σ, then under some conditions $\bar{X}$ is normal with mean µ and standard deviation σ/√n. Equivalently,

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \text{ is normal with mean 0 and std. dev. 1.}$$

If σ were known, based on this result we would be able to create a CI for µ. However, usually this is not the case.

Page 12: More About Confidence Intervals

CI for One Mean

Replacing σ with $s$, we have that

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \text{ has a } t\text{-distribution with } (n-1) \text{ degrees of freedom,}$$

if one of the following conditions is true:

1. the random variable of interest is bell-shaped (in practice, for small samples the data should show no extreme skewness or outliers);

2. the random variable is not bell-shaped, but a large random sample is measured, n ≥ 30.

Some Properties of the t-distribution:

1. There are infinitely many t-distributions, each characterized by one parameter, the degrees of freedom (df).

2. The degrees of freedom are positive integers, e.g. 1, 2, …

3. Random variables with a t-distribution are continuous.

4. The density curve of a t-distribution is symmetric, bell-shaped and centered at zero (similar to the standard normal curve).

5. As the degrees of freedom increase, the variance of the t random variable decreases, i.e. the density curve is less spread out, and it approaches the standard normal density. (This implies that the density curve of a t-distribution is more spread out than the standard normal curve.)

Page 13: More About Confidence Intervals

CI for One Mean

Based on these results, the multiplier for the confidence interval of µ is the value in the t-distribution with df = n − 1 such that the area between −(multiplier) and (multiplier) is equal to the desired confidence level.

The multiplier in this case is denoted t*.

We can easily obtain the values of the multiplier from Table A2. Here are some examples of values of the multiplier:

1. n = 41 (i.e. df = 40), confidence level 95%, t* = 2.02.

2. n = 10 (i.e. df = 9), confidence level 99%, t* = 3.25.

Summary: Steps to obtain a CI for µ:

1. Check that the condition is satisfied, i.e. a bell-shaped population or n ≥ 30.

2. Calculate $\bar{x}$ and $\mathrm{s.e.}(\bar{x}) = s/\sqrt{n}$.

3. Based on the required confidence level, β%, and the degrees of freedom (n − 1), use Table A2 to get the multiplier t*.

4. The β% CI for µ is

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}.$$
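Table A2 is not reproduced here, so as a cross-check the t* multiplier can be computed numerically. The sketch below is a stand-in for the table, not the table itself: it integrates the t density with Simpson's rule and finds t* by bisection, using only the Python standard library. The function names are hypothetical; in practice a statistics library (for example `scipy.stats.t.ppf((1 + conf) / 2, df)`) does the same job.

```python
from math import exp, lgamma, log, pi


def t_pdf(x: float, df: int) -> float:
    """Density of the t-distribution with `df` degrees of freedom."""
    # Work on the log scale so the gamma functions stay finite for large df.
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def central_area(t: float, df: int, steps: int = 2000) -> float:
    """P(-t < T < t), using Simpson's rule on [0, t] and symmetry about 0."""
    h = t / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    return 2 * (h / 3) * total


def t_star(confidence: float, df: int) -> float:
    """Multiplier t* with P(-t* < T < t*) = confidence, found by bisection."""
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < confidence:  # widen until t* is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_star(0.95, 40), 2))  # 2.02, first example above
print(round(t_star(0.99, 9), 2))   # 3.25, second example above
```

Calling `t_star(0.95, df)` for increasing `df` also illustrates property 5 of the t-distribution: the multiplier shrinks toward the standard normal value 1.96.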

Page 14: More About Confidence Intervals

Special case of CI for 1 Mean: CI for Paired Data

Consider the example where we are interested in the difference in the mean blood pressure before exercise and after exercise.

We are interested in estimating $\mu_1 - \mu_2$, where

$\mu_1$: mean blood pressure before exercise
$\mu_2$: mean blood pressure after exercise.

For each person we have two measurements, resulting in two samples: the "before" sample (the values of blood pressure before exercise) and the "after" sample (the values of blood pressure after exercise).

However, we are only interested in the difference between the "before" measurement and the "after" measurement. So, for each pair of values we compute their difference, resulting in one sample of differences. Then, using the sample of differences, we can create a C.I. for the population mean of the differences using the same procedure as the CI for one mean!

Let $\mu_d$ be the population mean of the differences and $\bar{d}$ the sample mean of the differences; then

$$\mu_d = \mu_1 - \mu_2 \quad \text{and} \quad \bar{d} = \bar{x}_1 - \bar{x}_2.$$

The CI for $\mu_d$ is

$$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}},$$

where $s_d$ is the sample standard deviation of the differences and $n$ is the number of pairs; as for one mean, df = n − 1.
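To make the reduction to one sample concrete, here is a minimal Python sketch of the paired procedure. It reuses only the four rows visible in the cholesterol table from the earlier slide (the full sample of 200 is not shown), so it only exercises the arithmetic; with n = 4 the n ≥ 30 condition clearly fails. The name `paired_ci` is hypothetical, and t* = 3.182 (the usual 95% value for df = 3) is passed in by hand as if read from a table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_ci(before, after, t_star):
    """CI for mu_d = mu_1 - mu_2 from paired data: d_bar ± t* * s_d / sqrt(n).

    `t_star` should come from a t table with df = n - 1 (n = number of pairs).
    """
    if len(before) != len(after):
        raise ValueError("paired data need the same number of before/after values")
    diffs = [b - a for b, a in zip(before, after)]  # one sample of differences
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation (divides by n - 1)
    margin = t_star * s_d / sqrt(n)
    return diffs, d_bar, (d_bar - margin, d_bar + margin)


# Only the four rows shown in the cholesterol table; illustrative, not a real analysis.
before = [235, 310, 198, 245]
after = [215, 254, 178, 231]
diffs, d_bar, ci = paired_ci(before, after, t_star=3.182)  # t* for df = 3, 95%
print(diffs, d_bar)  # [20, 56, 20, 14] 27.5
```

The differences reproduce the "Decrease in Cholesterol" column, after which everything is the one-mean recipe applied to that single column.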

Page 15: More About Confidence Intervals

Difference between two means (Independent Samples).

Steps to obtain a CI for $\mu_1 - \mu_2$ (difference between 2 population means):

1. Check that the following conditions are valid:
   a. The two samples are independent.
   b. Each sample either comes from a bell-shaped population or has sample size ≥ 30.

2. Calculate the sample statistic $\bar{x}_1 - \bar{x}_2$ and the standard error

$$\mathrm{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

where $n_1, n_2$ are the sizes of the two samples and $s_1^2, s_2^2$ are the variances of the two samples.

3. The multiplier for the confidence interval is a t-multiplier (t*), and the df are approximately equal to the smaller of $n_1 - 1$ and $n_2 - 1$.

4. The β% CI for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \, \mathrm{s.e.}(\bar{x}_1 - \bar{x}_2).$$
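Steps 2 to 4 can be sketched in Python from summary statistics. This is a minimal illustration: the helper names are hypothetical, the summary numbers below are made up, and t* ≈ 2.03 is the approximate 95% value for df = 35, passed in as if read from a table.

```python
from math import sqrt


def two_mean_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """CI for mu_1 - mu_2 from summary statistics of two independent samples."""
    estimate = xbar1 - xbar2
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # unpooled standard error
    margin = t_star * se
    return estimate, se, (estimate - margin, estimate + margin)


def conservative_df(n1, n2):
    """df used with t*: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


# Hypothetical summary statistics, made up for illustration.
df = conservative_df(36, 40)  # 35
estimate, se, ci = two_mean_ci(52.0, 12.0, 36, 47.0, 9.0, 40, t_star=2.03)
print(df, estimate, round(se, 3), ci)
```

The standard error adds the two squared standard errors of the individual means, which is why each sample contributes its own $s^2/n$ term.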

Page 16: More About Confidence Intervals

Difference between two proportions (Independent Samples).

Steps to obtain a CI for $p_1 - p_2$ (difference between 2 population proportions):

1. Check that the following conditions are valid:
   a. The two samples are independent.
   b. All the quantities $n_1\hat{p}_1,\ n_1(1-\hat{p}_1),\ n_2\hat{p}_2,\ n_2(1-\hat{p}_2)$ are at least 5 and preferably at least 10.

2. Calculate the sample statistic $\hat{p}_1 - \hat{p}_2$ and the standard error

$$\mathrm{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},$$

where $n_1, n_2$ are the sizes of the two samples and $\hat{p}_1, \hat{p}_2$ are the sample proportions in the two samples.

3. The multiplier for the confidence interval is a z-multiplier (z*), as in the one-sample case, i.e. $P(-z^* < Z < z^*) = \beta\%$.

4. The β% CI for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, \mathrm{s.e.}(\hat{p}_1 - \hat{p}_2).$$
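The four steps, including the condition check, fit in one short Python function. This is a minimal sketch with a hypothetical name, `two_prop_ci`; its default `min_count=10` follows the "preferably at least 10" rule (pass 5 for the looser minimum). It is demonstrated on the counts from the pro-choice example (Example 2) later in this presentation.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, confidence=0.95, min_count=10):
    """CI for p1 - p2 from two independent samples (large-sample z interval)."""
    p1, p2 = x1 / n1, x2 / n2
    # Condition check: n1*p1, n1*(1-p1), n2*p2, n2*(1-p2) must all be large enough.
    counts = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    if min(counts) < min_count:
        raise ValueError(f"large-sample condition fails: {counts}")
    estimate = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # z* with central area = confidence
    return estimate, se, (estimate - z * se, estimate + z * se)


# 180 of 300 women and 80 of 200 men say they are pro-choice; 99% confidence.
estimate, se, (lo, hi) = two_prop_ci(180, 300, 80, 200, confidence=0.99)
print(round(estimate, 2), round(se, 4), round(lo, 3), round(hi, 3))  # 0.2 0.0447 0.085 0.315
```

The code uses z* ≈ 2.5758 rather than the rounded table value 2.58, so its endpoints agree with the hand calculation to three decimals.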

Page 17: More About Confidence Intervals

Table of CI Types

Type | Parameter | Statistic | Standard Error | Multiplier
One Mean (or Paired mean) | $\mu$ or $\mu_d$ | $\bar{x}$ or $\bar{d}$ | $s/\sqrt{n}$ or $s_d/\sqrt{n}$ | t*, df = n − 1
Difference Between Means | $\mu_1 - \mu_2$ | $\bar{x}_1 - \bar{x}_2$ | $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ | t*, df = min(n1 − 1, n2 − 1)
One Proportion | $p$ | $\hat{p}$ | $\sqrt{\hat{p}(1-\hat{p})/n}$ | z*
Difference Between Proportions | $p_1 - p_2$ | $\hat{p}_1 - \hat{p}_2$ | $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$ | z*

Page 18: More About Confidence Intervals

Conditions Necessary for Confidence Intervals

1 mean or Difference Between Paired Means: Population is normal (bell-shaped) or n ≥ 30. (For paired data, this applies to the differences.)

Difference Between 2 Independent Means: At least one of the above conditions must hold for BOTH samples. The two samples are independent.

Difference Between 2 Proportions: Both $n_1\hat{p}_1,\ n_1(1-\hat{p}_1)$ AND $n_2\hat{p}_2,\ n_2(1-\hat{p}_2)$ must be greater than or equal to 10.

Page 19: More About Confidence Intervals

Example 1: Veronica records the weights of 64 adult black bears trapped in New York in the fall of 2002. The sample mean weight was 210 lbs with a standard deviation of 25 lbs. Construct a 95% confidence interval for the mean weight of adult black bears.

The parameter of interest is μ, the population mean weight of black bears.

Conditions: The sample size is greater than 30, n = 64 > 30.

$$\bar{x} = 210 \text{ lbs}, \qquad \mathrm{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{25}{8} = 3.125.$$

The multiplier is a t*. Use Table A2 in your text. The df = n − 1 = 63 and the CI level = 95%.

Note: If the table does not list the specific df, use the next LOWEST df in the table. So for df = 60, we get t* = 2.

95% CI for μ: 210 ± 2(3.125) = (203.8, 216.3)

Interpretation: We are 95% confident that the mean weight of adult black bears is between 203.8 and 216.3 lbs.
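A minimal Python sketch reproducing the arithmetic of this example from its summary statistics; `one_mean_ci` is a hypothetical helper name, and t* is passed in exactly as read from the df = 60 row of the table.

```python
from math import sqrt


def one_mean_ci(xbar, s, n, t_star):
    """CI for mu from summary statistics: xbar ± t* * s / sqrt(n)."""
    se = s / sqrt(n)
    margin = t_star * se
    return se, (xbar - margin, xbar + margin)


# Black bear example: n = 64, xbar = 210 lbs, s = 25 lbs, t* = 2 (table row df = 60).
se, (lo, hi) = one_mean_ci(xbar=210, s=25, n=64, t_star=2.0)
print(se, lo, hi)  # 3.125 203.75 216.25
```

The slide rounds these endpoints to (203.8, 216.3). Using the exact df = 63 multiplier (about 1.998) instead of the df = 60 row would narrow the interval only slightly, which is why the "next lowest df" rule is a safe, slightly conservative choice.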

Page 20: More About Confidence Intervals

Example 2: Margaret conducts a study to determine the difference in opinion between men and women on abortion. She randomly asks 200 men and 300 women whether they are pro-life or pro-choice. 80 men and 180 women say they are pro-choice. Construct a 99% confidence interval for the difference in the proportion of men and women who are pro-choice.

The parameter of interest is $p_f - p_m$.

$$\hat{p}_f - \hat{p}_m = \frac{180}{300} - \frac{80}{200} = .6 - .4 = .2$$

Conditions: All the quantities $n_f\hat{p}_f,\ n_f(1-\hat{p}_f),\ n_m\hat{p}_m,\ n_m(1-\hat{p}_m)$ (180, 120, 80 and 120) are greater than 10.

$$\mathrm{s.e.}(\hat{p}_f - \hat{p}_m) = \sqrt{\frac{(.6)(.4)}{300} + \frac{(.4)(.6)}{200}} = .0447$$

For a 99% confidence level, z* = 2.58.

The 99% CI for $p_f - p_m$ is: .20 ± 2.58(.0447) = (.085, .315).

Interpretation: We are 99% confident that the proportion of females who are pro-choice is between 8.5 and 31.5 percentage points greater than the proportion of males who are pro-choice.

Page 21: More About Confidence Intervals

Identifying the C.I.

For each example below, decide which type of confidence interval should be calculated.

We want to estimate the difference between the heights of smokers and non-smokers at PSU.

We want to calculate an interval that contains the fraction of all PSU students who are right-handed.

We want to capture the difference between the proportions of smokers and non-smokers at PSU who have two or more tattoos.

We want to estimate the daily sugar intake (in grams) of adult Americans.