The Scientific Study of Politics (POL 51)

The Scientific Study of Politics (POL 51) Professor B. Jones University of California, Davis


Transcript of The Scientific Study of Politics (POL 51)

Page 1: The Scientific Study of Politics (POL 51)

The Scientific Study of Politics (POL 51)

Professor B. Jones

University of California, Davis

Page 2: The Scientific Study of Politics (POL 51)

Today

Sampling Plans Survey Research

Page 3: The Scientific Study of Politics (POL 51)

Populations

Key Concepts Population

– Defined by the research
– “All U.S. citizens age 18 or older.”
– All democratic countries
– Counties in the United States

Characteristics of a Population
– Bounded and definable
– If you can’t define the population, you probably don’t have a well-formed research question!

Page 4: The Scientific Study of Politics (POL 51)

Populations vs. Samples

Populations are often unattainable
– TOO BIG (U.S. population)
– Very costly to obtain
– May not be necessary

The beauty of statistical theory

Samples
– Simply defined: a subset of the population chosen in some manner
– How you choose is the important question!

Page 5: The Scientific Study of Politics (POL 51)

Moving Parts of a Sample

Units of Analysis
– J is the population
– i is a member of J
– Then i is a “sample element”

Sampling Frames
– The actual source of the data
– Literary Digest Poll (1936)
– “Dewey Defeats Truman” (1948)
– Exit Polls

Page 6: The Scientific Study of Politics (POL 51)

More Moving Parts

Sampling Unit
– Could be the same as the sample element (Unit of Analysis)
– But it could be collections of elements (cluster, stratified sampling)

Sampling Plan
– Random? Nonrandom?

Page 7: The Scientific Study of Politics (POL 51)

Kinds of Samples

Simple Random Sample
– Major characteristic: every sample element has an equi-probable chance of selection.
– If done properly, maximizes the likelihood of a representative sample.

What if your assumption of randomness goes badly?

Nonrandom samples (often) produce nonrepresentative surveys.

Page 8: The Scientific Study of Politics (POL 51)

Why Randomness is Goodness

Nonprobability Sampling
– Probability of “getting into” the sample is unknown
– All bets are off; inference most likely impossible
– Highly unreliable!

Simple Random Sampling
– Every sample element has the same probability of being selected: Pr(selection)=1/N on any single draw, where N is the population size
– In practice, not always easy to guarantee or achieve

An Example of a Bad Assumption

Page 9: The Scientific Study of Politics (POL 51)

Some Data

[Figure: plot with fitted line y = -7.0524x + 229.42, R² = 0.7519; axes scaled 0 to 250]

Page 10: The Scientific Study of Politics (POL 51)

More Data

[Figure: plot with fitted line y = 6.7867x + 90.803, R² = 0.7005; axes scaled 0 to 250]

Page 11: The Scientific Study of Politics (POL 51)

Getting Probability Samples Wrong

[Figure: Vietnam Draft Lottery: Lottery Numbers and Deaths by Month of Birth; series: Lottery Number, Deaths; vertical axis scaled 0 to 250]

Page 12: The Scientific Study of Politics (POL 51)

Draft Lottery

Simple random sampling did not exist.
– Avg. Lottery Number Jan.-June: 206
– Avg. Lottery Number July-Dec.: 161
– Avg. Deaths Jan.-June: 159
– Avg. Deaths July-Dec.: 111

Differences highly significant.

Its absence had profound consequences.

Randomness should have ensured an equal chance of draft, invariant to birth date. It didn’t.

By analogy, suppose college admissions were based on this kind of lottery… Those of you born later in the year would be less likely to be admitted. Would you consider that fair?

Page 13: The Scientific Study of Politics (POL 51)

How to Achieve Randomness

Random number generation
– Modern computers are really good at this.
– Assign sample elements a number
– Generate a random numbers table
– Use a decision rule to select the sample.

The Key: sampled units are randomly drawn.

Why Important? Randomness helps ensure REPRESENTATIVENESS!

Absent this, all bets are off:
– Convenience Polls
– Push Polls
– Person-on-the-Street Interviews
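The random-number steps listed on this slide (number the elements, generate random numbers, apply a decision rule) can be sketched in a few lines of R. This is only an illustration: the frame of 1,000 elements, the sample size of 50, the seed, and the object names are all assumptions made up for the example, not part of the lecture.

```r
# Hypothetical sampling frame: 1,000 elements, each assigned an ID number
set.seed(51)                 # arbitrary seed so the draw can be replicated
frame_ids <- 1:1000
n <- 50                      # hypothetical sample size

# Option 1: let sample() draw n IDs at random, without replacement
chosen <- sample(frame_ids, size = n, replace = FALSE)

# Option 2: attach a uniform random number to every element, then apply
# a decision rule: keep the n elements with the smallest random numbers
u <- runif(length(frame_ids))
chosen_rule <- frame_ids[order(u)[1:n]]

head(sort(chosen))
head(sort(chosen_rule))
```

Both options give every element the same chance of ending up in the sample; the second simply makes the "random numbers table plus decision rule" idea explicit.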

Page 14: The Scientific Study of Politics (POL 51)

A Population and Some “Samples”

A “Population”– Striations represent “attitudes”

Some “Samples”

Page 15: The Scientific Study of Politics (POL 51)

Other Kinds of Sampling Strategies

Stratified Samples: a probability sample in which elements sharing some characteristic are grouped and then sample elements are randomly chosen from each group.

Benefit? Can ensure more representative sample with smaller sample sizes.

Why might this be the case?

Page 16: The Scientific Study of Politics (POL 51)

Sampling comes to life in…R!!!

Suppose we have a population of roughly 100,000 (the four groups below sum to 99,000).

And in that population, we have 4 groups
– Group 1: 13,000 (about 13 percent)
– Group 2: 12,000 (about 12 percent)
– Group 3: 4,000 (about 4 percent)
– Group 4: 70,000 (about 71 percent)

Racial/Ethnic Characteristics in the US: US Census
– White (69.13 percent)
– Black (12.06 percent)
– Hispanic (12.55 percent)
– Asian (3.6 percent)

Some R Code

Page 17: The Scientific Study of Politics (POL 51)

R

#Creating a population consisting of 4 groups (13,000 + 12,000 + 4,000 + 70,000 = 99,000 elements)
set.seed(535126235)
population<- rep(1:4,c(13000, 12000, 4000, 70000))

#Tabulating the population (ctab requires package catspec)
#The percents are not whole numbers because the four groups sum to 99,000, not 100,000

ctab(table(population))

              Count  Total %
population
1          13000.00    13.13
2          12000.00    12.12
3           4000.00     4.04
4          70000.00    70.71
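For readers without the catspec package, the same tabulation can be approximated in base R. The sketch below is an alternative to ctab(), not the code used in the lecture; the object names counts and pct are made up for the example.

```r
# Rebuild the population from the slide: 4 groups of the stated sizes
population <- rep(1:4, c(13000, 12000, 4000, 70000))

# Total number of elements (the four group sizes added together)
length(population)

# Counts and percentages with base R only
counts <- table(population)
pct <- round(100 * prop.table(counts), 2)

print(cbind(Count = counts, Percent = pct))
```

Dividing each count by length(population) rather than by 100,000 is what produces percentages such as 13.13 instead of 13.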

Page 18: The Scientific Study of Politics (POL 51)

Sampling

What do we expect from random sampling? That each sample reproduces the population proportions.

Let’s consider SIMPLE RANDOM SAMPLES.

Also, let’s consider small samples (size 100) …which is roughly a 0.1 percent sample (a sampling fraction of about .001).

Page 19: The Scientific Study of Politics (POL 51)

R: 3 samples of n=100

#Three Simple Random Samples without Replacement; n=100, which is roughly a 0.1 percent sample
#The set.seed command ensures I can exactly replicate the simulations

set.seed(15233)
srs1<-sample(population, size=100, replace=FALSE)
ctab(table(srs1))

set.seed(5255563)
srs2<-sample(population, size=100, replace=FALSE)
ctab(table(srs2))

set.seed(5255)
srs3<-sample(population, size=100, replace=FALSE)
ctab(table(srs3))

Page 20: The Scientific Study of Politics (POL 51)

R: Sample Results

> set.seed(15233)
> srs1<-sample(population, size=100, replace=FALSE)
> ctab(table(srs1))
     Count  Total %
srs1
1       19       19
2       13       13
3        5        5
4       63       63

> set.seed(5255563)
> srs2<-sample(population, size=100, replace=FALSE)
> ctab(table(srs2))
     Count  Total %
srs2
1       16       16
2        8        8
3        4        4
4       72       72

> set.seed(5255)
> srs3<-sample(population, size=100, replace=FALSE)
> ctab(table(srs3))
     Count  Total %
srs3
1       12       12
2        9        9
3        1        1
4       78       78

Page 21: The Scientific Study of Politics (POL 51)

Implications?

Small samples? Variability in proportion of groups.

Why does this occur?

Let’s understand stratification. What does it do? You’re sampling within strata.

Suppose we know the population proportions?
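The variability in group proportions across small samples can be made visible by repeating the n=100 draw many times. The following is a rough sketch under assumed settings: the 1,000 replications, the seed, and the focus on group 3 are choices made for illustration only.

```r
# Rebuild the population from the earlier slide
population <- rep(1:4, c(13000, 12000, 4000, 70000))

set.seed(2024)   # arbitrary seed for replication

# Draw 1,000 simple random samples of size 100 and record,
# for each one, the share of the sample that falls in group 3
share_group3 <- replicate(1000, mean(sample(population, size = 100) == 3))

# Share of group 3 in the population itself
true_share <- mean(population == 3)

true_share
summary(share_group3)   # spread of the sample shares around true_share
```

The shares centre on the population value, but any single sample of 100 can land noticeably above or below it, which is what the three samples on the previous slide show.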

Page 22: The Scientific Study of Politics (POL 51)

R: Identifying Strata and then Sampling from them.

#Stratified Sampling
#Creating the Groupings
#(each stratum is stored as a vector of 1s, so each table below shows a single category)
strata1<- rep(1,c(13000))
strata2<- rep(1,c(12000))
strata3<- rep(1,c(4000))
strata4<- rep(1,c(70000))

#Sampling by strata
#Selecting observations proportional to known population values: Proportionate Sampling
set.seed(52524425)
srs4<-sample(strata1, size=13, replace=FALSE)
ctab(table(srs4))

set.seed(4244225)
srs5<-sample(strata2, size=12, replace=FALSE)
ctab(table(srs5))

set.seed(33325)
srs6<-sample(strata3, size=4, replace=FALSE)
ctab(table(srs6))

set.seed(1114225)
srs7<-sample(strata4, size=70, replace=FALSE)
ctab(table(srs7))

Page 23: The Scientific Study of Politics (POL 51)

R: Results? Proportional Sampling w/small samples.

> srs4<-sample(strata1, size=13, replace=FALSE)
> ctab(table(srs4))
     Count  Total %
srs4
1       13      100
>
> set.seed(4244225)
> srs5<-sample(strata2, size=12, replace=FALSE)
> ctab(table(srs5))
     Count  Total %
srs5
1       12      100
>
> set.seed(33325)
> srs6<-sample(strata3, size=4, replace=FALSE)
> ctab(table(srs6))
     Count  Total %
srs6
1        4      100
>
> set.seed(1114225)
> srs7<-sample(strata4, size=70, replace=FALSE)
> ctab(table(srs7))
     Count  Total %
srs7
1       70      100

Page 24: The Scientific Study of Politics (POL 51)

Proportionate Sampling

What do we see?

If we know the proportions of the relevant stratification variable(s)… then sample from the groups.

SMALL SAMPLES can reproduce certain characteristics of the population.

But of course, it is probabilistic.
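Proportionate sampling can also be sketched with group labels kept on each element, so that the resulting sample can be tabulated by group in one step. This is an illustrative variant, not the lecture's code: the data frame layout, the column names id and group, the seed, and the total sample size of 99 are assumptions.

```r
# Population as a data frame: one row per element, with a group label
population <- data.frame(
  id    = 1:99000,
  group = rep(1:4, c(13000, 12000, 4000, 70000))
)

set.seed(99)   # arbitrary seed for replication
n <- 99        # hypothetical total sample size

# Allocate the sample to strata in proportion to stratum size
stratum_sizes <- table(population$group)
alloc <- setNames(
  as.integer(round(n * as.numeric(stratum_sizes) / nrow(population))),
  names(stratum_sizes)
)

# Draw a simple random sample inside each stratum, then stack the pieces
strat_sample <- do.call(rbind, lapply(names(alloc), function(g) {
  stratum <- population[population$group == as.integer(g), ]
  stratum[sample(nrow(stratum), size = alloc[[g]]), ]
}))

table(strat_sample$group)
```

Because the allocation is fixed before any drawing happens, the group counts in the sample match the population proportions by construction; randomness only decides which elements are picked within each group.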

Page 25: The Scientific Study of Politics (POL 51)

Disproportionate Sampling

Why? “Oversampling” may be of interest when research centers on small pockets in the population.

Race is often an issue in this context.

Page 26: The Scientific Study of Politics (POL 51)

R: Disproportionate Sampling

> #Sampling by strata
> #Selecting observations disproportional to known population values: Disproportionate Sampling
> #"Oversampling by Race"
> set.seed(5555425)
> srs8<-sample(strata1, size=24, replace=FALSE)
> ctab(table(srs8))
      Count  Total %
srs8
1        24      100
>
> set.seed(4222225)
> srs9<-sample(strata2, size=22, replace=FALSE)
> ctab(table(srs9))
      Count  Total %
srs9
1        22      100
>
> set.seed(103325)
> srs10<-sample(strata3, size=14, replace=FALSE)
> ctab(table(srs10))
      Count  Total %
srs10
1        14      100
>
> set.seed(11534)
> srs11<-sample(strata4, size=70, replace=FALSE)
> ctab(table(srs11))
      Count  Total %
srs11
1        70      100
>

Page 27: The Scientific Study of Politics (POL 51)

Disproportionate Samples

What did I ask R to do? I “oversampled” for some groups.

Again, understand why we, as researchers, might want to do this.
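One consequence of oversampling is that the raw sample no longer mirrors the population, so group shares computed from it are off unless each stratum is weighted back to its population size. The slides do not cover weighting; the sketch below is an added illustration using the stratum and sample sizes from the example, with hypothetical object names.

```r
# Population counts per stratum and the oversampled allocation from the slides
strata_sizes <- c(`1` = 13000, `2` = 12000, `3` = 4000, `4` = 70000)
sample_sizes <- c(`1` = 24,    `2` = 22,    `3` = 14,   `4` = 70)

# Unweighted sample shares: the small groups are overrepresented
unweighted <- sample_sizes / sum(sample_sizes)

# Design weight: how many population elements each sampled element stands for
weights <- strata_sizes / sample_sizes

# Weighted shares: each stratum's sample count multiplied by its weight
weighted <- (sample_sizes * weights) / sum(sample_sizes * weights)

round(100 * rbind(unweighted, weighted), 2)
```

The oversample buys more cases (and so more precision) for the small groups, while the weights restore the population proportions when describing the population as a whole.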

Page 28: The Scientific Study of Politics (POL 51)

Side-trip: Sample Sizes

Who is happy with a 0.1 percent SRS?

On the other hand… What do we get from a stratified sample?

Suppose we increase n in a SRS?

It’s R time!

Page 29: The Scientific Study of Politics (POL 51)

R: SRS with a 1 percent sample

> #Sample Size=1000
>
> set.seed(1775233)
> srs1<-sample(population, size=1000, replace=FALSE)
> ctab(table(srs1))
     Count  Total %
srs1
1    129.0     12.9
2     97.0      9.7
3     46.0      4.6
4    728.0     72.8
>
> set.seed(5200563)
> srs2<-sample(population, size=1000, replace=FALSE)
> ctab(table(srs2))
     Count  Total %
srs2
1    117.0     11.7
2    127.0     12.7
3     41.0      4.1
4    715.0     71.5
>
> set.seed(52909)
> srs3<-sample(population, size=1000, replace=FALSE)
> ctab(table(srs3))
     Count  Total %
srs3
1    147.0     14.7
2    126.0     12.6
3     39.0      3.9
4    688.0     68.8
>

Page 30: The Scientific Study of Politics (POL 51)

Implications?

Sample Size MATTERS

What do we see?

Note, again, what stratification “buys” us.

The issues with stratification?

Another R example (code posted on website)

Page 31: The Scientific Study of Politics (POL 51)

R

We again have 4 sample elements

> set.seed(52352)
> urn<-sample(c(1,2,3,4),size=1000, replace=TRUE)
>
> ctab(table(urn))
     Count  Total %
urn                    (My Population)
1    239.0     23.9
2    253.0     25.3
3    268.0     26.8
4    240.0     24.0

Page 32: The Scientific Study of Politics (POL 51)

R version of a person-on-the-street interview

> #Convenience Sample: What shows up
>
> con<-urn[1:10]; con
 [1] 1 1 1 3 4 2 4 3 4 3
>
> ctab(table(con))
     Count  Total %
con
1        3       30
2        1       10
3        3       30
4        3       30

Page 33: The Scientific Study of Politics (POL 51)

R and Samples, redux

What do we find?

Very unreliable sample: we oversample some groups, undersample others.

Useless data more than likely.

What do you imagine happens when we increase the sample sizes?

Page 34: The Scientific Study of Politics (POL 51)

R and SRS with samples of size N

#Sample: Sizes 10, 50, 75, 100, 200, 250, 900, 1000

set.seed(562)
s1<-sample(urn, 10, replace=FALSE)
ctab(table(s1))

set.seed(58862)
s1a<-sample(urn, 50, replace=FALSE)
ctab(table(s1a))

set.seed(562657)
s1b<-sample(urn, 75, replace=FALSE)
ctab(table(s1b))

set.seed(58862)
s2<-sample(urn, 100, replace=FALSE)
ctab(table(s2))

set.seed(58862)
s3<-sample(urn, 200, replace=FALSE)
ctab(table(s3))

set.seed(10562)
s4<-sample(urn, 250, replace=FALSE)
ctab(table(s4))

set.seed(22562)
s5<-sample(urn, 900, replace=FALSE)
ctab(table(s5))

set.seed(56882)
s6<-sample(urn, 1000, replace=FALSE)
ctab(table(s6))

Page 35: The Scientific Study of Politics (POL 51)

> #Sample: Sizes 10, 50, 75, 100, 200, 250, 900, 1000
>
> set.seed(562)
> s1<-sample(urn, 10, replace=FALSE)
> ctab(table(s1))
     Count  Total %
s1
1        2       20
2        4       40
3        2       20
4        2       20
>
> set.seed(58862)
> s1a<-sample(urn, 50, replace=FALSE)
> ctab(table(s1a))
     Count  Total %
s1a
1       13       26
2       13       26
3       13       26
4       11       22
>

Sampling and Sample Size

Page 36: The Scientific Study of Politics (POL 51)

> set.seed(562657)
> s1b<-sample(urn, 75, replace=FALSE)
> ctab(table(s1b))
     Count  Total %
s1b
1    22.00    29.33
2    18.00    24.00
3    22.00    29.33
4    13.00    17.33
>
> set.seed(58862)
> s2<-sample(urn, 100, replace=FALSE)
> ctab(table(s2))
     Count  Total %
s2
1       27       27
2       24       24
3       22       22
4       27       27
>

Sample Sizes

Page 37: The Scientific Study of Politics (POL 51)

> set.seed(58862)
> s3<-sample(urn, 200, replace=FALSE)
> ctab(table(s3))
     Count  Total %
s3
1       54       27
2       48       24
3       48       24
4       50       25
>
> set.seed(10562)
> s4<-sample(urn, 250, replace=FALSE)
> ctab(table(s4))
     Count  Total %
s4
1     62.0     24.8
2     67.0     26.8
3     56.0     22.4
4     65.0     26.0
>

Sample Size

Page 38: The Scientific Study of Politics (POL 51)

Sample Size

> set.seed(22562)
> s5<-sample(urn, 900, replace=FALSE)
> ctab(table(s5))
     Count  Total %
s5
1   220.00    24.44
2   231.00    25.67
3   234.00    26.00
4   215.00    23.89
>
> set.seed(56882)
> s6<-sample(urn, 1000, replace=FALSE)
> ctab(table(s6))
     Count  Total %
s6
1    239.0     23.9
2    253.0     25.3
3    268.0     26.8
4    240.0     24.0
>

Page 39: The Scientific Study of Politics (POL 51)

R: What did we learn?

Sample size seems to have some impact here.

But there are trade-offs.
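The trade-off can be put in numbers with the usual standard error of a sample proportion, sqrt(p(1-p)/n). The sketch below assumes a true proportion of 0.25 (roughly one group's share of the urn) and ignores the finite population correction, so it is only a rough guide for the larger samples drawn from an urn of 1,000.

```r
# Assumed true proportion for one group (roughly 1 in 4 in the urn example)
p <- 0.25

# Sample sizes used in the slides
n <- c(10, 50, 75, 100, 200, 250, 900, 1000)

# Standard error of a sample proportion under simple random sampling,
# ignoring the finite population correction (an approximation)
se <- sqrt(p * (1 - p) / n)

# Approximate 95 percent margin of error
round(data.frame(n = n, se = se, approx_95_margin = 1.96 * se), 3)
```

The error shrinks with the square root of n, so quadrupling the sample only halves it: going from 50 to 200 cases helps a lot, while going from 900 to 1,000 adds cost and barely changes precision.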

Page 40: The Scientific Study of Politics (POL 51)

Important Moving Parts

Randomness (covered!)

Sampling Frame
– Random sampling from a bad sampling frame produces bad samples.

Sample Size
– What is your intuition about sample sizes? Must they always be large?
– Not necessarily so…although…
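The sampling-frame point can be illustrated with the four-group population from the earlier slides. In this sketch the flawed frame is a made-up one that leaves out group 1 entirely; the seed and the sample size of 1,000 are also assumptions.

```r
# Rebuild the population from the earlier slide
population <- rep(1:4, c(13000, 12000, 4000, 70000))

# Hypothetical bad sampling frame: group 1 is missing from the list
bad_frame <- population[population != 1]

set.seed(1936)   # arbitrary seed for replication

# A perfectly random, fairly large sample drawn from the bad frame
srs_bad <- sample(bad_frame, size = 1000, replace = FALSE)

# Percent of the sample in each group (group 1 can never appear)
round(100 * prop.table(table(factor(srs_bad, levels = 1:4))), 1)
```

No amount of randomness or added sample size brings group 1 back, because the draw is random only with respect to the frame, not the population.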

Page 41: The Scientific Study of Politics (POL 51)

Bad Sampling

Person-on-the-Street Interviews

What do these imply? Small samples and inherently nonrandom

Likely poor inference.

Other examples?

Not all non-random samples are necessarily bad

Purposive Samples