Inference about a population proportion. 1. Paper due March 29 Last day for consultation with me...

Post on 14-Jan-2016


Inference about a population proportion.

1

• Paper due March 29

• Last day for consultation with me March 22

2

4

Prediction

5

Prediction

6

Probabilistic Reasoning

• “The Achilles’ heel of human cognition.”

7

Probabilistic Reasoning

• “Men are taller than women”

• “All men are taller than all women”

8

Probabilistic Reasoning

• A probabilistic trend means that something is more likely than not, but it does not always hold true.

9

Probabilistic Reasoning

• Knowledge does not have to be certain to be useful.

• Individual cases cannot be predicted but trends can

BPS - 5th Ed. Chapter 19 10

• The proportion of a population that has some outcome (“success”) is p.

• The proportion of successes in a sample is measured by the sample proportion:

Proportions

$$\hat{p} = \frac{\text{number of successes in the sample}}{\text{total number of observations in the sample}}$$

($\hat{p}$ is read “p-hat”)

11

Inference about a Proportion: Simple Conditions

Confidence Intervals for Proportions

• Social media is poised to become a central player in the 2012 

12

Example 19.5 page 508

• What proportion of Euros have cocaine traces?

• Sample: 17 out of 20

• Sample proportion: 85%

• Plus four method: 79%
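The two percents in this example can be checked with a short sketch. It assumes the usual definition of the plus four method (add four imaginary observations, two successes and two failures, before dividing); the function name `plus_four_estimate` is only illustrative.

```python
def plus_four_estimate(successes: int, n: int) -> float:
    """Plus four estimate: add 2 successes and 2 failures, then divide."""
    return (successes + 2) / (n + 4)

# Counts from the example: 17 of 20 sampled notes showed traces.
sample_proportion = 17 / 20
adjusted = plus_four_estimate(17, 20)

print(f"sample proportion:  {sample_proportion:.0%}")
print(f"plus four estimate: {adjusted:.0%}")
```

The adjustment pulls the estimate toward 50%, which matters most for small samples like this one.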

13

$$\hat{p} \pm z^{*} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Dealing with sampling error

• Confidence intervals

• Hypothesis testing

Obtaining confidence intervals

• estimate ± margin of error

Determining Critical values of Z

• 90% confidence: tail area .05, z* = 1.645

• 95% confidence: tail area .025, z* = 1.96

• 99% confidence: tail area .005, z* = 2.576

• Critical values: values that mark off a specified area under the standard normal curve.
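The critical values listed above can be reproduced from the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name `critical_z` is an assumption made for illustration.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Return z* such that the central `confidence` area lies between -z* and z*."""
    tail_area = (1 - confidence) / 2          # area in each tail
    return NormalDist().inv_cdf(1 - tail_area)

for level in (0.90, 0.95, 0.99):
    tail = (1 - level) / 2
    print(f"{level:.0%}  tail area {tail:.3f}  z* = {critical_z(level):.3f}")
```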

19.25 page 517

• Do smokers know it is bad for them?

• Yes: 848

• Total: 1010

• Sample proportion: 84% (848/1010 = .8396)

• Margin of error: .0226

• Lower limit: .8170

• Upper limit: .8622
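A minimal sketch of the interval arithmetic for this exercise, assuming a 95% confidence level. The slide does not state the level; 95% is an assumption that reproduces its limits of .8170 and .8622. The function name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Large-sample interval: p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin

# 848 of 1010 smokers answered yes; the 95% level is assumed, not given.
p_hat, margin, lower, upper = proportion_ci(848, 1010)
print(f"p-hat = {p_hat:.4f}, margin = {margin:.4f}")
print(f"interval: {lower:.4f} to {upper:.4f}")
```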

17

Problem 19.6 page 507

• What proportion of SAT takers have coaching?

• 427 had coaching

• 2733 did not

• 3160 total

• Standard error: 0.0061

• Margin of error: 0.0157

• Upper limit: 0.1508

• Lower limit: 0.1195
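The figures above can be traced step by step. The confidence level is not stated on the slide; the worked arithmetic below assumes 99% ($z^{*} = 2.576$), because the listed margin of error is about 2.58 standard errors.

$$\hat{p} = \frac{427}{3160} \approx 0.1351$$

$$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{(0.1351)(0.8649)}{3160}} \approx 0.0061$$

$$m = z^{*} \cdot SE \approx 2.576 \times 0.00608 \approx 0.0157$$

$$\hat{p} \pm m \approx 0.1351 \pm 0.0157, \text{ or about } 0.1195 \text{ to } 0.1508 \text{ when unrounded values are carried through}$$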

18

$$\hat{p} \pm z^{*} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

19

Two-way tables

William P. Wattles, Ph.D.

Chapter 20

20

Categorical Data

• Examples: gender, race, occupation, type of cellphone, and type of trash are all categorical

21

Categorical Data

• Sometimes measurement data is grouped into categories.

heart dis          Freq
less than 219.2    12
219.2-247.9        13
248-282            13
>=282              13

22

Categorical Data

• Expressed in counts or percents

heart dis          Freq    %
less than 219.2    12      24%
219.2-247.9        13      25%
248-282            13      25%
>=282              13      25%
Total              51      100%

[Chart of the four categories: less than 219.2, 219.2 to 247.9, 248.0 to 282.0, more than 282.0]

23

Population Parameter

p = population proportion

Sample

p̂ = sample proportion

24

$$\hat{p} = \text{sample proportion} = \frac{\text{count of successes}}{\text{total count}}$$

25

Two-way table

• Organizes data about two categorical variables

                 Column variable
Row variable     column 1       column 2       column 3
row 1            #              #              #              row 1 total
row 2            #              #              #              row 2 total
                 col 1 total    col 2 total    col 3 total

BPS - 5th Ed. Chapter 6 26

• Now we will study the relationship between two categorical variables (variables whose values fall in groups or categories).

• To analyze categorical data, use the counts or percents of individuals that fall into various categories.

Categorical Variables

27

• When there are two categorical variables, the data are summarized in a two-way table
  – each row in the table represents a value of the row variable
  – each column of the table represents a value of the column variable

• The number of observations falling into each combination of categories is entered into each cell of the table

Two-Way Table

Two-way table

28

                 25-34    35-54    55+      Total
No High School   4459     9174     14226    27859
High School      11562    26455    20060    58077
College 1-3      10693    22647    11125    44465
College 4+       11071    23160    10597    44828
Total            37785    81436    56008

29

• A distribution for a categorical variable tells how often each outcome occurred
  – totaling the values in each row of the table gives the marginal distribution of the row variable (totals are written in the right margin)
  – totaling the values in each column of the table gives the marginal distribution of the column variable (totals are written in the bottom margin)

Marginal Distributions

30


31

• It is usually more informative to display each marginal distribution in terms of percents rather than counts
  – each marginal total is divided by the table total to give the percents

• A bar graph could be used to graphically display marginal distributions for categorical variables

Marginal Distributions

32

                   25-34    35-54    55+      Percent of total
No High School     4459     9174     14226    15.9%
High School        11562    26455    20060    33.1%
College 1-3        10693    22647    11125    25.4%
College 4+         11071    23160    10597    25.6%
Percent of total   21.6%    46.5%    32.0%
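The marginal percents in this table come from dividing each row or column total by the table total. A minimal sketch of that calculation, using the cell counts from the table (the variable names are illustrative):

```python
# Cell counts (in thousands) by education level; columns are ages 25-34, 35-54, 55+.
ages = ["25-34", "35-54", "55+"]
counts = {
    "No High School": [4459, 9174, 14226],
    "High School":    [11562, 26455, 20060],
    "College 1-3":    [10693, 22647, 11125],
    "College 4+":     [11071, 23160, 10597],
}

table_total = sum(sum(row) for row in counts.values())

# Marginal distribution of the row variable (education): row total / table total.
education_marginal = {edu: 100 * sum(row) / table_total for edu, row in counts.items()}

# Marginal distribution of the column variable (age): column total / table total.
column_totals = [sum(col) for col in zip(*counts.values())]
age_marginal = {age: 100 * t / table_total for age, t in zip(ages, column_totals)}

for edu, pct in education_marginal.items():
    print(f"{edu:15s} {pct:.1f}%")
for age, pct in age_marginal.items():
    print(f"{age:15s} {pct:.1f}%")
```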

33

Case Study

Data from the U.S. Census Bureau for the year 2000 on the level of education reached by Americans of different ages.

(Statistical Abstract of the United States, 2001)

Age and Education

34

Case Study: Age and Education

[Figure: the age and education table, annotated with the labels "Variables" and "Marginal distributions"]

35

Case Study: Age and Education

[Figure: the same annotated table with the marginal distributions as percents. Age: 21.6%, 46.5%, 32.0%. Education: 15.9%, 33.1%, 25.4%, 25.6%.]

36

Case Study: Age and Education

Marginal Distribution for Education Level

Not HS grad 15.9%

HS grad 33.1%

College 1-3 yrs 25.4%

College ≥4 yrs 25.6%

37

• Relationships between categorical variables are described by calculating appropriate percents from the counts given in the table
  – prevents misleading comparisons due to unequal sample sizes for different groups

Conditional Distributions

38

Case Study: Age and Education

Compare the 25-34 age group to the 35-54 age group in terms of success in completing at least 4 years of college:

Data are in thousands, so we have that 11,071,000 persons in the 25-34 age group have completed at least 4 years of college, compared to 23,160,000 persons in the 35-54 age group.

The groups appear greatly different, but look at the group totals.

39

Case Study: Age and Education

Compare the 25-34 age group to the 35-54 age group in terms of success in completing at least 4 years of college:

Change the counts to percents:

$$\frac{11{,}071}{37{,}786} = .293 \text{ (29.3\%) for the 25-34 age group}$$

$$\frac{23{,}160}{81{,}435} = .284 \text{ (28.4\%) for the 35-54 age group}$$

Now, with a fairer comparison using percents, the groups appear very similar.

40

Case Study: Age and Education

If we compute the percent completing at least four years of college for each of the age groups, we can compare the groups directly. Each percent is one entry from the conditional distribution of education given that age group, so the three values need not add to 100%:

Age:                            25-34    35-54    55 and over
Percent with ≥ 4 yrs college:   29.3%    28.4%    18.9%
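The three percents above each use a different denominator (the total for that age group). The sketch below reproduces them from the table counts and, for contrast, also computes the conditional distribution of age among people with at least 4 years of college, which divides by the row total instead and does add to 100%. Variable names are illustrative.

```python
ages = ["25-34", "35-54", "55+"]
college_4_plus = [11071, 23160, 10597]   # row "College 4+" (thousands)
age_totals = [37785, 81436, 56008]       # column totals from the two-way table
# Note: the slide's worked division uses 37,786 and 81,435; the rounded
# percents come out the same either way.

# Percent with >= 4 years of college within each age group.
pct_within_age = {
    age: 100 * c / t for age, c, t in zip(ages, college_4_plus, age_totals)
}

# Conditional distribution of age, given >= 4 years of college.
row_total = sum(college_4_plus)
age_given_college = {age: 100 * c / row_total for age, c in zip(ages, college_4_plus)}

print({a: round(p, 1) for a, p in pct_within_age.items()})
print({a: round(p, 1) for a, p in age_given_college.items()})
```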

41

• The conditional distribution of one variable can be calculated for each category of the other variable.

• These can be displayed using bar graphs.

• If the conditional distributions of the second variable are nearly the same for each category of the first variable, then we say that there is not an association between the two variables.

• If there are significant differences in the conditional distributions for each category, then we say that there is an association between the two variables.

Conditional Distributions

42

Case Study: Age and Education

Conditional Distributions of Age for each level of Education:

Cell phone preference

43

44

Marginal Distribution

• Row and column totals

• Provides counts or percents of one variable

45

Conditional Distribution

• Each cell value expressed as a percent of its marginal (row or column) total

46

Two-way Tables

• Do you think the Bush administration has a clear and well-thought-out policy on Iraq, or not?

              New Yorkers    USA
Yes           42%            59%
No            48%            35%
No opinion    10%            6%

47

Relationships between categorical variables

Risks of Soccer
                Elite    Non-elite    Did not play
Arthritis       10       9            24
No arthritis    61       206          548

48

Relationships between categorical variables

Risks of Soccer
                   Elite    Non-elite    Did not play    Total
Arthritis          10       9            24              43
No arthritis       61       206          548             815
Total              71       215          572             858
Percent of total   8%       25%          67%

49

Relationships between categorical variables

• Calculate percent of players who had arthritis

50

Relationships between categorical variables

• Calculate percent of players who had arthritis

Risks of Soccer: Percent with Arthritis
                Elite    Non-elite    Did not play
Arthritis       14.1%    4.2%         4.2%
No arthritis    85.9%    95.8%        95.8%
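The percents in this table are column percents: each count is divided by the total for its own playing group, not by the table total. A short sketch of that step using the counts from the soccer table (names are illustrative):

```python
arthritis    = {"Elite": 10, "Non-elite": 9,   "Did not play": 24}
no_arthritis = {"Elite": 61, "Non-elite": 206, "Did not play": 548}

percent_with_arthritis = {}
for group in arthritis:
    group_total = arthritis[group] + no_arthritis[group]   # column total
    percent_with_arthritis[group] = 100 * arthritis[group] / group_total

for group, pct in percent_with_arthritis.items():
    print(f"{group:13s} {pct:.1f}% with arthritis, {100 - pct:.1f}% without")
```

Comparing within groups this way shows that the elite players stand out even though they contribute the fewest cases in raw counts.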

51

Categorical data

• Smoking Data

Smoking
                          Neither parent smokes    One parent smokes    Both parents smoke
Student does not smoke    1168                     1823                 1380
Student smokes            188                      416                  400

52

Categorical data

• Smoking Data

Smoking
                          Neither parent smokes    One parent smokes    Both parents smoke    Total
Student does not smoke    1168                     1823                 1380                  4371
Student smokes            188                      416                  400                   1004
Total                     1356                     2239                 1780                  5375

53

Categorical data

• Smoking Data

Smoking
                          Neither parent smokes    One parent smokes    Both parents smoke
Student does not smoke    86.1%                    81.4%                77.5%
Student smokes            13.9%                    18.6%                22.5%

54

[Bar graph: Student Smoking. Percent of students who smoke (axis from 0.0% to 25.0%) for three groups: neither parent smokes, one parent smokes, both parents smoke.]

55

Evaluating Treatment

           better    not better
swim       200       75
no swim    50        15

56

Evaluating Treatment

           better    not better    Total
swim       200       75            275
no swim    50        15            65
Total      250       90            340

57

           better    not better    Total
swim       200       75            275
no swim    50        15            65
Total      250       90            340

Percent improved
           better    not better    Total
swim       73%       27%           100%
no swim    77%       23%           100%
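The percent-improved table uses row percents: each count is divided by its own group total. A minimal sketch with the counts from the treatment table (the helper name `row_percents` is illustrative):

```python
outcomes = {
    "swim":    {"better": 200, "not better": 75},
    "no swim": {"better": 50,  "not better": 15},
}

def row_percents(row: dict) -> dict:
    """Convert one row of counts into percents of that row's total."""
    total = sum(row.values())
    return {outcome: 100 * count / total for outcome, count in row.items()}

percent_improved = {group: row_percents(row)["better"] for group, row in outcomes.items()}

for group, pct in percent_improved.items():
    print(f"{group:8s} {pct:.0f}% better")
```

The raw counts (200 better versus 50) make swimming look far more effective, but the groups differ in size; as percents, the no-swim group improved at a slightly higher rate.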

58

The End