Inference about a population proportion. 1. Paper due March 29 Last day for consultation with me...

Page 1: Inference about a population proportion. 1. Paper due March 29 Last day for consultation with me March 22 2.

Inference about a population proportion.

Page 2

• Paper due March 29
• Last day for consultation with me March 22

Page 4

Prediction

Page 5

Prediction

Page 6

Probabilistic Reasoning

• “The Achilles’ heel of human cognition.”

Page 7

Probabilistic Reasoning

• “Men are taller than women”

• “All men are taller than all women”

Page 8

Probabilistic Reasoning

• A probabilistic trend is one that is more likely than not to hold, but does not hold in every case.

Page 9

Probabilistic Reasoning

• Knowledge does not have to be certain to be useful.

• Individual cases cannot be predicted but trends can

Page 10

BPS - 5th Ed. Chapter 19

Proportions

• The proportion of a population that has some outcome (“success”) is p.

• The proportion of successes in a sample is measured by the sample proportion p̂ (“p-hat”):

$$\hat{p} = \frac{\text{number of successes in the sample}}{\text{total number of observations in the sample}}$$
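The definition of the sample proportion can be sketched in a few lines of Python. The function name and the sample counts below are hypothetical, chosen only for illustration.

```python
def sample_proportion(successes: int, n: int) -> float:
    """Return p-hat: the number of successes divided by the sample size."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    return successes / n


# Hypothetical sample: 30 successes among 120 observations.
p_hat = sample_proportion(30, 120)
print(f"p-hat = {p_hat:.3f}")  # p-hat = 0.250
```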

Page 11

Inference about a Proportion: Simple Conditions

Page 12

Confidence Intervals for Proportions

• Social media is poised to become a central player in the 2012

Page 13

Example 19.5 page 508

• What proportion of Euros have cocaine traces?

• Sample 17 out of 20
• 85%
• Plus 4 method
• 79%

$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
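The two estimates on this slide can be reproduced as sketched below. The plus four method adds 2 successes and 2 failures before computing the proportion. The slide reports only the point estimates, so the 95% interval (z = 1.96) and the function name are assumptions made here for illustration.

```python
from math import sqrt


def plus_four_interval(successes: int, n: int, z: float = 1.96):
    """Plus four estimate and interval: add 2 successes and 2 failures."""
    p_tilde = (successes + 2) / (n + 4)
    standard_error = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    margin = z * standard_error
    return p_tilde, p_tilde - margin, p_tilde + margin


p_hat = 17 / 20                                   # ordinary sample proportion
p_tilde, low, high = plus_four_interval(17, 20)   # plus four version

print(f"sample proportion  = {p_hat:.0%}")
print(f"plus four estimate = {p_tilde:.0%}")
print(f"assumed 95% interval: {low:.3f} to {high:.3f}")
```

With only 20 observations the interval is wide, which is one reason the plus four adjustment is preferred over the plain large-sample formula for small samples.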

Page 14

Dealing with sampling error

• Confidence intervals
• Hypothesis testing

Page 15

Obtaining confidence intervals

• estimate ± margin of error

Page 16

Determining Critical values of Z

Confidence level   Area in each tail   Critical value z
90%                .05                 1.645
95%                .025                1.96
99%                .005                2.576

• Critical Values: values that mark off a specified area under the standard normal curve.
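The critical values listed on this slide can be recovered from the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `critical_z` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist


def critical_z(confidence: float) -> float:
    """z value that leaves (1 - confidence) / 2 in each tail of N(0, 1)."""
    tail_area = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    tail_area = (1 - level) / 2
    print(f"{level:.0%}  tail area {tail_area:.3f}  z = {critical_z(level):.3f}")
```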

Page 17

19.25 page 517

• Do smokers know it is bad for them?

• Yes 848
• Total 1010
• 84%
• Margin of error .0226
• Lower limit .8170
• Upper limit .8622
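A quick check of this example, as a sketch. The slide does not state the confidence level; 95% (z = 1.96) is assumed here because it reproduces the reported limits. The function name is illustrative.

```python
from math import sqrt


def proportion_interval(successes: int, n: int, z: float):
    """Large-sample interval: p-hat plus or minus z * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat, margin, p_hat - margin, p_hat + margin


# 848 "yes" answers out of 1010 smokers; z = 1.96 is an assumption.
p_hat, margin, low, high = proportion_interval(848, 1010, z=1.96)

print(f"p-hat  = {p_hat:.4f}")
print(f"margin = {margin:.4f}")
print(f"limits = {low:.4f} to {high:.4f}")
```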

Page 18

Problem 19.6 page 507

• What proportion of SAT takers have coaching?

• 427 coaching
• 2733 did not
• 3160 total
• standard error 0.0061
• margin of error 0.0157
• upper 0.1508
• lower 0.1195

$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
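The slide gives the standard error and the margin of error but not the confidence level. As a sketch, the level can be backed out by dividing the reported margin by the standard error; the variable names are illustrative. The ratio comes out near 2.58, which matches the 99% critical value in the table of critical values above.

```python
from math import sqrt

coached = 427
total = 3160                 # 427 coached + 2733 not coached

p_hat = coached / total
standard_error = sqrt(p_hat * (1 - p_hat) / total)

reported_margin = 0.0157     # margin of error as printed on the slide
implied_z = reported_margin / standard_error

print(f"p-hat          = {p_hat:.4f}")
print(f"standard error = {standard_error:.4f}")
print(f"implied z      = {implied_z:.2f}")
```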

Page 19

Two-way tables

William P. Wattles, Ph.D.

Chapter 20

Page 20

Categorical Data

• Examples: gender, race, occupation, type of cellphone, and type of trash are categorical

Page 21

Categorical Data

• Sometimes measurement data are grouped into categories.

heart dis          Freq
less than 219.2    12
219.2-247.9        13
248-282            13
>=282              13
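Grouping a measurement into categories can be sketched as follows. The cut points are the ones in the table; the measurements are hypothetical, and treating a value of exactly 282 as ">=282" is an assumption, since the table's last two labels overlap at that value.

```python
from collections import Counter


def heart_category(value: float) -> str:
    """Assign a measurement to one of the four categories from the table."""
    if value < 219.2:
        return "less than 219.2"
    if value < 248:
        return "219.2-247.9"
    if value < 282:
        return "248-282"
    return ">=282"        # assumption: exactly 282 falls in the top group


# Hypothetical measurements, for illustration only.
measurements = [198.5, 231.0, 247.9, 260.4, 282.0, 305.7]
frequencies = Counter(heart_category(v) for v in measurements)

for category, count in frequencies.items():
    print(f"{category:<16} {count}")
```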

Page 22

Categorical Data

• Expressed in counts or percents

heart dis          Freq   %
less than 219.2    12     24%
219.2-247.9        13     25%
248-282            13     25%
>=282              13     25%
Total              51     100%

[Chart legend: Less than 219.2; 219.2 to 247.9; 248.0 to 282.0; More than 282.0]

Page 23

Population Parameter

p = population proportion

Sample

p̂ (“p-hat”) = sample proportion

Page 24

$$\hat{p} = \text{sample proportion} = \frac{\text{count of successes}}{\text{total count}}$$

Page 25

Two-way table

• Organizes data about two categorical variables

                 Column variable
Row variable     column 1     column 2     column 3
row1             #            #            #            row1 total
row2             #            #            #            row2 total
                 col1 total   col2 total   col3 total

Page 26

BPS - 5th Ed. Chapter 6

Categorical Variables

• Now we will study the relationship between two categorical variables (variables whose values fall in groups or categories).

• To analyze categorical data, use the counts or percents of individuals that fall into various categories.

Page 27

Two-Way Table

• When there are two categorical variables, the data are summarized in a two-way table
  – each row in the table represents a value of the row variable
  – each column of the table represents a value of the column variable

• The number of observations falling into each combination of categories is entered into each cell of the table

Page 28

Two-way table

                 25-34    35-54    55+      Total
No High School   4459     9174     14226    27859
High School      11562    26455    20060    58077
College 1-3      10693    22647    11125    44465
College 4+       11071    23160    10597    44828
Total            37785    81436    56008

Page 29

Marginal Distributions

• A distribution for a categorical variable tells how often each outcome occurred
  – totaling the values in each row of the table gives the marginal distribution of the row variable (totals are written in the right margin)
  – totaling the values in each column of the table gives the marginal distribution of the column variable (totals are written in the bottom margin)

Page 30

                 25-34    35-54    55+      Total
No High School   4459     9174     14226    27859
High School      11562    26455    20060    58077
College 1-3      10693    22647    11125    44465
College 4+       11071    23160    10597    44828
Total            37785    81436    56008

Page 31

Marginal Distributions

• It is usually more informative to display each marginal distribution in terms of percents rather than counts
  – each marginal total is divided by the table total to give the percents

• A bar graph could be used to graphically display marginal distributions for categorical variables

Page 32

                 25-34    35-54    55+      Percent
No High School   4459     9174     14226    15.9%
High School      11562    26455    20060    33.1%
College 1-3      10693    22647    11125    25.4%
College 4+       11071    23160    10597    25.6%
Percent          21.6%    46.5%    32.0%
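The marginal percents in this table can be reproduced from the cell counts: total each row and each column, then divide by the table total. The sketch below uses plain Python; the variable names are illustrative.

```python
ages = ["25-34", "35-54", "55+"]
counts = {
    "No High School": [4459, 9174, 14226],
    "High School":    [11562, 26455, 20060],
    "College 1-3":    [10693, 22647, 11125],
    "College 4+":     [11071, 23160, 10597],
}

# Marginal totals: rows give education, columns give age.
row_totals = {edu: sum(row) for edu, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
table_total = sum(row_totals.values())

# Each marginal total divided by the table total gives the percent.
row_percents = {edu: round(100 * t / table_total, 1) for edu, t in row_totals.items()}
col_percents = [round(100 * t / table_total, 1) for t in col_totals]

for edu, pct in row_percents.items():
    print(f"{edu:<15} {pct}%")
for age, pct in zip(ages, col_percents):
    print(f"{age:<15} {pct}%")
```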

Page 33

Case Study: Age and Education

Data from the U.S. Census Bureau for the year 2000 on the level of education reached by Americans of different ages.

(Statistical Abstract of the United States, 2001)

Page 34

Case Study: Age and Education

Variables

Marginal distributions

Page 35

Case Study: Age and Education

Variables

Marginal distributions

Age (bottom margin): 21.6% 46.5% 32.0%

Education (right margin): 15.9% 33.1% 25.4% 25.6%

Page 36

Case Study: Age and Education

Marginal Distribution for Education Level

Not HS grad       15.9%
HS grad           33.1%
College 1-3 yrs   25.4%
College ≥4 yrs    25.6%

Page 37

Conditional Distributions

• Relationships between categorical variables are described by calculating appropriate percents from the counts given in the table
  – prevents misleading comparisons due to unequal sample sizes for different groups

Page 38

Case Study: Age and Education

Compare the 25-34 age group to the 35-54 age group in terms of success in completing at least 4 years of college:

Data are in thousands, so we have that 11,071,000 persons in the 25-34 age group have completed at least 4 years of college, compared to 23,160,000 persons in the 35-54 age group.

The groups appear greatly different, but look at the group totals.

Page 39

Case Study: Age and Education

Compare the 25-34 age group to the 35-54 age group in terms of success in completing at least 4 years of college:

Change the counts to percents:

$$\frac{11{,}071}{37{,}786} = .293 \ (29.3\%) \text{ for the 25-34 age group}$$

$$\frac{23{,}160}{81{,}435} = .284 \ (28.4\%) \text{ for the 35-54 age group}$$

Now, with a fairer comparison using percents, the groups appear very similar.

Page 40

Case Study: Age and Education

If we compute the percent completing at least four years of college within each age group, we get the “completed at least 4 years of college” entry from the conditional distribution of education level for each age group. The three percents come from three different distributions, so they need not add to 100%:

Age:                           25-34   35-54   55 and over
Percent with ≥ 4 yrs college:  29.3%   28.4%   18.9%
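The percents on this slide can be reproduced by dividing each age group's count of people with at least 4 years of college by that age group's total. The sketch below takes the counts and column totals from the two-way table on Page 28; the variable names are illustrative.

```python
ages = ["25-34", "35-54", "55 and over"]
college_4_plus = [11071, 23160, 10597]   # "College 4+" row of the table
age_totals = [37785, 81436, 56008]       # column totals of the table

percent_college = {
    age: round(100 * count / total, 1)
    for age, count, total in zip(ages, college_4_plus, age_totals)
}

for age, pct in percent_college.items():
    print(f"{age:<12} {pct}%")
```

Dividing by the age group total, rather than comparing raw counts, is what makes groups of different sizes comparable.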

Page 41

Conditional Distributions

• The conditional distribution of one variable can be calculated for each category of the other variable.

• These can be displayed using bar graphs.

• If the conditional distributions of the second variable are nearly the same for each category of the first variable, then we say that there is not an association between the two variables.

• If there are significant differences in the conditional distributions for each category, then we say that there is an association between the two variables.

Page 42

Case Study: Age and Education

Conditional Distributions of Age for each level of Education:

Page 43

Cell phone preference

Page 44

Marginal Distribution

• Row and column totals
• Provides counts or percents of one variable

Page 45

Conditional Distribution

• Each value as a percent of its marginal (row or column) total

Page 46

Two-way Tables

• Do you think the Bush administration has a clear and well-thought-out policy on Iraq, or not?

             New Yorkers   USA
Yes          42%           59%
No           48%           35%
No opinion   10%           6%

Page 47

Relationships between categorical variables

Risks of Soccer

               Elite   non-elite   did not play
Arthritis      10      9           24
No Arthritis   61      206         548

Page 48

Relationships between categorical variables

Risks of Soccer

               Elite   non-elite   did not play   Total
Arthritis      10      9           24             43
No Arthritis   61      206         548            815
Total          71      215         572            858
Percent        8%      25%         67%

Page 49

Relationships between categorical variables

• Calculate percent of players who had arthritis

Page 50

Relationships between categorical variables

• Calculate percent of players who had arthritis

Risks of Soccer: Percent with Arthritis

               Elite   non-elite   did not play
Arthritis      14.1%   4.2%        4.2%
No Arthritis   85.9%   95.8%       95.8%
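The percents in this table are column percents: each count divided by the total for its playing group. A minimal sketch, with an illustrative helper name, using the counts from the Risks of Soccer table:

```python
def column_percents(rows: dict) -> dict:
    """Express each cell as a percent of its column total."""
    col_totals = [sum(col) for col in zip(*rows.values())]
    return {
        label: [round(100 * cell / total, 1) for cell, total in zip(row, col_totals)]
        for label, row in rows.items()
    }


groups = ["Elite", "non-elite", "did not play"]
soccer = {
    "Arthritis":    [10, 9, 24],
    "No Arthritis": [61, 206, 548],
}

percents = column_percents(soccer)
for label, row in percents.items():
    cells = "  ".join(f"{g}: {p}%" for g, p in zip(groups, row))
    print(f"{label:<13} {cells}")
```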

Page 51

Categorical data

• Smoking Data

                         Neither parent smokes   one parent smokes   both parents smoke
Student does not smoke   1168                    1823                1380
Student smokes           188                     416                 400

Page 52

Categorical data

• Smoking Data

                         Neither parent smokes   one parent smokes   both parents smoke   Total
Student does not smoke   1168                    1823                1380                 4371
Student smokes           188                     416                 400                  1004
Total                    1356                    2239                1780                 5375

Page 53

Categorical data

• Smoking Data

                         Neither parent smokes   one parent smokes   both parents smoke
Student does not smoke   86.1%                   81.4%               77.5%
Student smokes           13.9%                   18.6%               22.5%

Page 54

[Bar graph: Student Smoking. Percent of students who smoke (axis from 0.0% to 25.0%) for three groups: Neither parent smokes, one parent smokes, both parents smoke]
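As a rough stand-in for the bar graph, the sketch below computes the percent of students who smoke in each parental group from the Smoking Data counts and prints a text bar for each. The one-character-per-percentage-point scale is an arbitrary choice for illustration.

```python
groups = ["Neither parent smokes", "One parent smokes", "Both parents smoke"]
smokers = [188, 416, 400]
non_smokers = [1168, 1823, 1380]

# Percent of students who smoke within each parental group.
percents = [round(100 * s / (s + n), 1) for s, n in zip(smokers, non_smokers)]

for group, pct in zip(groups, percents):
    bar = "#" * round(pct)          # one '#' per percentage point, roughly
    print(f"{group:<22} {bar} {pct}%")
```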

Page 55

Evaluating Treatment

          better   not better
swim      200      75
no swim   50       15

Page 56

Evaluating Treatment

          better   not better   Total
swim      200      75           275
no swim   50       15           65
Total     250      90           340

Page 57

          better   not better   Total
swim      200      75           275
no swim   50       15           65
Total     250      90           340

Percent improved

          better   not better   Total
swim      73%      27%          100%
no swim   77%      23%          100%
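The "Percent improved" table uses row percents: each count divided by the total for its treatment group. The sketch below reproduces them from the counts; the names are illustrative. Although four times as many swimmers got better (200 versus 50), the percent who improved is slightly lower in the swim group.

```python
treatment = {
    "swim":    {"better": 200, "not better": 75},
    "no swim": {"better": 50, "not better": 15},
}

percent_better = {}
for group, outcome in treatment.items():
    group_total = sum(outcome.values())           # row total for this group
    percent_better[group] = round(100 * outcome["better"] / group_total)

for group, pct in percent_better.items():
    print(f"{group:<8} {pct}% better")
```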

Page 58

The End