STT 421 Day 7: September 28, 2015. STT 421: Vince Melfi


STT 421

Day 7: September 28, 2015

September 28, 2015 STT 421: Vince Melfi 1


Sample Surveys

• Want to learn something about a (often large) group called the population.

• We only can collect data on a subset of the population, called the sample.

• We’d like the sample to be “representative” of the population.

• If a sampling method over- or under-represents an important characteristic, it’s called biased.


Literary Digest Poll (1936)

• Goal: Predict the outcome of the 1936 presidential election between Roosevelt and Landon

• The Literary Digest magazine mailed out 10 million surveys and got 2.4 million responses.

• Of those who responded, 57% preferred Landon to Roosevelt.

• On the basis of this (large!) sample, Literary Digest predicted a landslide victory for Landon


Literary Digest Poll (1936)

• George Gallup, a pollster, also tried to predict the outcome of the election

• He had a smaller sample size of 50,000.

• But he selected his sample via “quota sampling,” where he tried to get proportions in his sample matching those in the population for important groups.

• For example, the sample should have the same proportion of middle class urban women, lower class rural men, etc.


Literary Digest Poll (1936)

• Roosevelt won the election by a landslide.

• Gallup’s poll predicted this.

• Literary Digest went out of business shortly after 1936.

• Gallup polls are still conducted today. (But they don’t use “quota sampling” any more. There are better methods that we’ll learn about.)


Literary Digest Poll (1936)

• What went wrong for Literary Digest?

– They found their 10 million names in three places:

   • Their own readers (who tended to be affluent)

   • Telephone registries (in 1936, at the height of the Depression, many poorer people had no phone)

   • Automobile registries (in 1936, at the height of the Depression, many poorer people had no car)

– So the sample wasn’t representative of the population. In fact, it overrepresented the wealthy.
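The effect of a biased list of names can be illustrated with a small simulation. This is only a sketch: the population size, the share of wealthy people, and the support rates below are made-up numbers chosen for illustration, not data from 1936, and `simulate_poll` is a hypothetical helper.

```python
import random


def simulate_poll(seed=0):
    """Compare a large sample from a biased list with a small random sample."""
    rng = random.Random(seed)

    # Hypothetical population: about 30% "wealthy", 70% "not wealthy".
    # Hypothetical support for candidate R: 40% among the wealthy,
    # 70% among everyone else. None of these numbers are historical.
    population = []
    for _ in range(100_000):
        wealthy = rng.random() < 0.30
        supports_r = rng.random() < (0.40 if wealthy else 0.70)
        population.append((wealthy, supports_r))

    true_p = sum(s for _, s in population) / len(population)

    # Biased list: only wealthy people appear on it (an extreme case).
    wealthy_list = [person for person in population if person[0]]
    biased_sample = rng.sample(wealthy_list, 10_000)
    biased_p = sum(s for _, s in biased_sample) / len(biased_sample)

    # Much smaller sample, but drawn at random from the whole population.
    random_sample = rng.sample(population, 1_000)
    random_p = sum(s for _, s in random_sample) / len(random_sample)

    return true_p, biased_p, random_p


true_p, biased_p, random_p = simulate_poll()
print(f"population: {true_p:.3f}  biased list: {biased_p:.3f}  random: {random_p:.3f}")
```

Under these assumptions the large sample from the biased list lands far from the population proportion, while the much smaller random sample lands close to it. A large sample size does not repair a biased sampling method.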


Randomization

• How do we avoid bias even if we don’t know much about the population?

• The key idea is randomization.

• By choosing people “at random” we guard against potential biases.

• There are many sampling methods that employ randomization. One of the most basic is “simple random sampling.”


Population and Sample

• The population is the group we’re interested in.

• Numerical characteristics of the population are called parameters.

• The sample is the group we’re able to collect data on.

• Numerical characteristics of the sample are called statistics.


Population and Sample

• Example: 1936 election prediction.

• Population is all those who will vote.

• Parameter of interest is p, the proportion of those who vote who will vote for Roosevelt.

• Statistic we’d calculate from the sample is the proportion in the sample who say they’ll vote for Roosevelt, denoted p̂ (“p-hat”).


Simple Random Sample

• A simple random sample of size n is drawn in such a way that every sample of size n from the population has the same chance of being selected.

• Example: Population is A, B, C, D. n=2

• {A,B}, {A,C}, {A,D}, {B,C}, {B,D}, {C,D} are all the samples of size 2. All should have the same chance of being selected.
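The A, B, C, D example can be checked with a short Python sketch. It lists every possible sample of size 2 and then draws one simple random sample. The seed value and the helper name `draw_srs` are arbitrary choices made for this illustration.

```python
import itertools
import random

population = ["A", "B", "C", "D"]
n = 2

# Every possible sample of size n (order does not matter).
all_samples = list(itertools.combinations(population, n))
print(len(all_samples), all_samples)


def draw_srs(population, n, rng):
    """Draw a simple random sample: n distinct members, and every
    subset of size n has the same chance of being chosen."""
    return rng.sample(population, n)


rng = random.Random(421)  # arbitrary seed so the draw can be repeated
print(draw_srs(population, n, rng))
```

`random.sample` draws without replacement, so no member can appear twice in the same sample.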


“Good” samples aren’t so easy to obtain

• Example: In an election poll, how do you determine who will actually vote, to avoid having people in your sample who are registered voters but won’t vote?

• Even ignoring this, how do you deal with people who refuse to answer, who lie, who will change their vote by the time of the election, etc.?


The Salk polio vaccine study

• Polio was a greatly feared disease in the first half of the 20th century.

• Franklin Roosevelt contracted polio and was partially paralyzed

• Polio is caused by a virus.

• Not all cases of polio cause severe symptoms: some mild cases are hard to distinguish from other illnesses.

February 13, 2013 STT 200: Vince Melfi 12


The Salk polio vaccine study

• Two references (class material largely drawn from the second):

– “Polio: An American Story” by David Oshinsky

– “The Biggest Public Health Experiment Ever: The 1954 Field Trial of the Salk Poliomyelitis Vaccine” by Paul Meier


The early 1950s

• In the early 1950s there were two vaccines under development that had substantial promise:

– A “live virus” vaccine developed by Albert Sabin

– A “killed virus” vaccine developed by Jonas Salk

• Based on preliminary data, it was decided to do a large-scale study of the effectiveness of the Salk vaccine.

• The vaccine was NOT expected to be 100% effective


A Simple Study

• Safety of the vaccine was not a worry.

• A simple plan: Make the vaccine available as widely as possible; let subjects (or their parents) volunteer to get the vaccine.

• See whether and how much the rate of polio drops

• This is an observational study


Which of these are potential problems with the simple idea of distributing the vaccine widely and comparing the rate of polio with that in the past?

(a) If the rate drops, we don’t know whether the drop is due to the vaccine or other factors

(b) Those who volunteer may have different health characteristics than those who do not

(c) Since polio is hard to diagnose, doctors who know a patient is vaccinated might be less likely to diagnose polio


[Figure: Number of polio cases by year. Horizontal axis: 1937 to 1952; vertical axis: 0 to 50,000 cases.]


Adding a control group

• A control group (people who would not have the opportunity to receive the vaccine) can help with some of the issues.

• A suggestion:

– Offer (but do not require) vaccination for all second graders (the treatment group)

– Don’t offer vaccination to others

– First and third graders form the control group


Which of these are potential problems with the modified study which includes a control group?

(a) Those who volunteer may have different health characteristics than those who do not

(b) Since polio is hard to diagnose, doctors who know a patient is vaccinated might be less likely to diagnose polio

(c) There may be differences between the treatment and control group that affect the results


Experiment vs observational study

• Adding a control group moves us closer to a designed experiment


An experimental study

• Assign children at random to one of two groups:

– “Treatment” group: receives the polio vaccine

– “Placebo control” group: receives an injection of an innocuous serum that does not affect polio

• Children, parents, and physicians are not allowed to know which children are in the control group and which are in the treatment group (a double-blind study).
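Random assignment to two groups can be sketched in a few lines of Python. This is an illustration only: the function name `assign_groups`, the ID numbers, and the even split are assumptions, and the slides do not describe how the 1954 trial actually carried out its assignment.

```python
import random


def assign_groups(child_ids, seed=None):
    """Randomly split the children into a treatment group and a placebo group."""
    rng = random.Random(seed)
    shuffled = list(child_ids)   # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)        # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}


# Hypothetical ID numbers for ten children.
groups = assign_groups(range(1, 11), seed=7)
print(groups)
```

Because chance alone decides who lands in each group, health, income, and other characteristics tend to balance out between the groups, even characteristics nobody thought to measure.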


Sample Size

• Polio was relatively rare, about 50 cases per 100,000

• The vaccine was not expected to be 100% effective without further refinement

• Clearly a large sample size would be needed to detect effectiveness
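A rough calculation shows why small groups would not be enough. The sketch below uses the 50 per 100,000 rate from the slide; the 50% effectiveness figure and the group sizes are hypothetical values chosen for illustration, and `expected_cases` is a made-up helper name.

```python
def expected_cases(group_size, rate_per_100k=50, effectiveness=0.0):
    """Expected number of polio cases in a group.

    effectiveness is the assumed fraction of cases the vaccine prevents
    (0.0 for an unvaccinated group).
    """
    return group_size * rate_per_100k / 100_000 * (1 - effectiveness)


for size in (1_000, 10_000, 200_000):
    control = expected_cases(size)
    treated = expected_cases(size, effectiveness=0.5)  # hypothetical 50%
    print(f"group size {size:>7,}: control {control:6.2f}, treated {treated:6.2f}")
```

With only 1,000 children per group, fewer than one case is expected in either group, so a difference between the groups could not be seen. Only with groups in the hundreds of thousands do the expected counts differ by dozens of cases.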


If the incidence of polio is 50 per 100,000, the vaccine is 50% effective, and there are 40,000 children in the treatment group and 40,000 in the control group, how many children in the treatment group would be expected to contract polio?

(a) 20

(b) 40

(c) 50

(d) 10


Results of first study

Group                            Size      # Polio Cases   Rate (per 100,000)
Vaccine (2nd grade)              221,988   56              25
No vaccine (1st and 3rd grade)   725,173   391             54
Refused vaccine (2nd grade)      123,605   54              44
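The rate column can be reproduced from the group sizes and case counts in the table. The sketch below does that arithmetic; the variable and function names are chosen only for this illustration.

```python
# (group size, number of polio cases), copied from the first-study table.
first_study = {
    "Vaccine (2nd grade)": (221_988, 56),
    "No vaccine (1st and 3rd grade)": (725_173, 391),
    "Refused vaccine (2nd grade)": (123_605, 54),
}


def rate_per_100k(cases, size):
    """Number of cases per 100,000 children in the group."""
    return cases / size * 100_000


for group, (size, cases) in first_study.items():
    print(f"{group}: {rate_per_100k(cases, size):.1f} per 100,000")
```

Rounded to whole numbers, these match the table. Note that the second graders who refused the vaccine had a lower rate than the unvaccinated first and third graders, a sign that the groups differed in ways other than vaccination.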


Results of second study

Group        Size      # Polio Cases   Rate (per 100,000)
Vaccinated   200,745   57              28
Placebo      201,229   142             71
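One way to summarize the second study is to compare the two rates directly. The sketch below computes the rates from the counts in the table and then the relative reduction in the polio rate among vaccinated children. The relative reduction is a summary added here for illustration; the slides themselves report only the rates.

```python
# (group size, number of polio cases), copied from the second-study table.
vaccinated_size, vaccinated_cases = 200_745, 57
placebo_size, placebo_cases = 201_229, 142

vaccinated_rate = vaccinated_cases / vaccinated_size * 100_000
placebo_rate = placebo_cases / placebo_size * 100_000

# Fraction by which the rate is lower in the vaccinated group.
relative_reduction = 1 - vaccinated_rate / placebo_rate

print(f"vaccinated: {vaccinated_rate:.1f} per 100,000")
print(f"placebo:    {placebo_rate:.1f} per 100,000")
print(f"relative reduction: {relative_reduction:.0%}")
```

Because children were assigned to the two groups at random and the study was double-blind, the difference in rates is much harder to explain by anything other than the vaccine.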
