The Scientific Study of Politics (POL 51)

Professor B. Jones, University of California, Davis

Today
Sampling Plans
Survey Research

Populations
Key Concepts
Population: defined by the research question. Examples:
“All U.S. citizens age 18 or older.”
All democratic countries
Counties in the United States

Characteristics of a Population
Bounded and definable
If you can’t define the population, you probably don’t have a well-formed research question!

Populations vs. Samples
Populations are often unattainable:
TOO BIG (U.S. population)
Very costly to obtain
May not be necessary

The beauty of statistical theory
Samples
Simply defined: a subset of the population chosen in some manner.

How you choose is the important question!

Moving Parts of a Sample
Units of Analysis

J is the population; i is a member of J. Then i is a “sample element.”

Sampling Frames: the actual source of the data. Examples: Literary Digest Poll (1936), “Dewey Defeats Truman” (1948), Exit Polls

More Moving Parts
Sampling Unit

Could be the same as the sample element (unit of analysis)

But it could be a collection of elements (cluster, stratified sampling)

Sampling Plan: Random? Nonrandom?

Kinds of Samples
Simple Random Sample

Major characteristic: every sample element has an equal probability of selection.

If done properly, this maximizes the likelihood of a representative sample.

What if your assumption of randomness goes badly?

Nonrandom samples (often) produce nonrepresentative surveys.

Why Randomness is Goodness

Nonprobability Sampling
Probability of “getting into” the sample is unknown.
All bets are off; inference most likely impossible.
Highly unreliable!

Simple Random Sampling
Every sample element has the same probability of being selected: on each draw, Pr(selection) = 1/N, so in a sample of n drawn without replacement each element’s inclusion probability is n/N.
In practice, this is not always easy to guarantee or achieve.

An Example of a Bad Assumption

Some Data

[Chart: values by month, January to December (y-axis 0 to 250), with fitted line y = -7.0524x + 229.42, R² = 0.7519]

More Data

[Chart: values by month, January to December (y-axis 0 to 250), with fitted line y = 6.7867x + 90.803, R² = 0.7005]

Getting Probability Samples Wrong

[Chart: Vietnam Draft Lottery, Lottery Numbers and Deaths by Month of Birth; series: Lottery Number, Deaths; y-axis 0 to 250]

Draft Lottery

Simple random sampling did not occur.
Avg. lottery number, Jan.-June: 206
Avg. lottery number, July-Dec.: 161
Avg. deaths, Jan.-June: 111
Avg. deaths, July-Dec.: 159

The differences are highly significant. The absence of simple random sampling had profound consequences: randomness should have ensured an equal chance of being drafted, invariant to birth date. It didn't.

By analogy, suppose college admissions were based on this kind of lottery…

http://www.poetv.com/video.php?vid=52539
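To see why a gap of 206 vs. 161 is so damning, here is a minimal sketch of what a *fair* draft lottery would produce. It assumes 366 possible birth dates, with positions 1 to 182 standing for January to June birthdays and 183 to 366 for July to December. Each simulated lottery is a random permutation of the numbers 1 to 366. The function name `simulate_gap` is hypothetical, and only the two averages come from the slide.

```r
# Sketch: how often would a FAIR lottery be as lopsided as the one on the slide?
# Assumption: 366 birth dates; positions 1-182 = Jan.-June, 183-366 = July-Dec.
set.seed(1969)
n_days <- 366
first_half <- 1:182

# Hypothetical helper: one fair draw, returning the Jan.-June minus July-Dec. gap
simulate_gap <- function() {
  lottery <- sample(n_days)   # random permutation of lottery numbers 1..366
  mean(lottery[first_half]) - mean(lottery[-first_half])
}

gaps <- replicate(10000, simulate_gap())
observed_gap <- 206 - 161     # averages reported on the slide
p_value <- mean(abs(gaps) >= observed_gap)   # share of fair lotteries this extreme
summary(gaps)
p_value
```

In fair draws, the gap clusters around zero, and a gap of 45 almost never appears. That is the sense in which the differences are "highly significant."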

How to Achieve Randomness
Random number generation: modern computers are really good at this.
Assign each sample element a number.
Generate a table of random numbers.
Use a decision rule to select the sample.

The key: sampled units are randomly drawn. Why is this important? Randomness helps ensure REPRESENTATIVENESS! Absent this, all bets are off:

Convenience Polls
Push Polls
Person-on-the-Street Interviews
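The three steps above (number the elements, generate random numbers, apply a decision rule) can be sketched in R as follows. The frame of 500 names and the specific rule "keep the n elements with the smallest random numbers" are illustrative assumptions, not the course's own procedure. Because the random numbers put the elements in a uniformly random order, this rule yields a simple random sample.

```r
# Sketch of the slide's three steps, using a hypothetical frame of 500 people
set.seed(2024)
frame <- paste0("person_", 1:500)   # hypothetical sampling frame
ids <- seq_along(frame)             # Step 1: assign each element a number
draws <- runif(length(ids))         # Step 2: one random number per element
n <- 25
# Step 3 (decision rule): keep the n elements with the smallest random numbers
chosen <- ids[order(draws)][seq_len(n)]
selected <- frame[chosen]
head(selected)
```

In practice, `sample(frame, n)` does the same job in one call. Spelling out the steps makes clear where the randomness enters.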

Populations and Samples

A population is any well-defined set of units of analysis.

The population is determined largely by the research question; the population should be consistent through all parts of a research project.

A sample is a subset of a population. Samples are drawn through a systematic procedure called a sampling method.

Sample statistics measure characteristics of the sample to estimate the value of population parameters that describe the characteristics of a population.
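The statistic/parameter distinction can be made concrete with a small simulation. This sketch assumes a hypothetical population of 100,000 people, each coded 1 (holds some attitude) or 0 (does not), with about 40 percent holding it. Researchers normally cannot compute the parameter, but a simulation lets us compare it with the sample statistic directly.

```r
set.seed(42)
# Hypothetical population: 100,000 people, 1 = holds the attitude, 0 = does not
pop <- rbinom(100000, size = 1, prob = 0.4)
parameter <- mean(pop)             # population parameter (usually unknown in practice)
smp <- sample(pop, size = 1000)    # simple random sample, without replacement
statistic <- mean(smp)             # sample statistic used to estimate the parameter
round(c(parameter = parameter, statistic = statistic), 3)
```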

Populations and Samples

A population would be the first choice for analysis.

Resources and feasibility usually preclude analysis of population data.

Most research uses samples.

Probability Samples

The goal in sampling is to create a sample that is identical to the population in all characteristics except size.

Any systematic difference between a population and a sample is defined as bias.

Bias leads to inaccurate conclusions about the population.

Probability Samples

Probability samples: Each element in the population has a known probability of inclusion in the sample.

Probability samples are a better choice than nonprobability samples, when possible, because they are more likely to be representative and unbiased.

Probability Samples

Simple random sample: each element, and each combination of elements, in a population has an equal chance of selection.

Selection can be driven by a lottery, a random number generator, or any other method that guarantees an equal chance of selection.
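One way to check the "equal chance" property is to draw many simple random samples and count how often each element gets in. This sketch assumes a hypothetical population of N = 50 elements and samples of n = 10. If selection is truly simple random, every element's inclusion probability should be close to n/N = 0.2.

```r
set.seed(7)
N <- 50        # hypothetical population size
n <- 10        # sample size
reps <- 20000
hits <- integer(N)
for (r in seq_len(reps)) {
  s <- sample(N, n)            # SRS of element IDs 1..N, without replacement
  hits[s] <- hits[s] + 1L      # record which elements were included
}
inclusion <- hits / reps       # estimated inclusion probability of each element
summary(inclusion)             # all values should sit close to n / N = 0.2
```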

Probability Samples

Systematic sample: generated by selecting elements from a list of the population at a predetermined interval.

The starting point must be chosen at random, or the list must be randomized; otherwise, the sample will be less representative.
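A systematic sample with a random start can be sketched as follows. The helper `systematic_sample` is hypothetical. It sets the interval as k = floor(list length / n), picks a random start between 1 and k, and then takes every k-th element. The list of 1,000 elements is also an assumption for illustration.

```r
# Hypothetical helper: systematic sample of size n with a random start
systematic_sample <- function(frame, n) {
  k <- floor(length(frame) / n)      # sampling interval
  start <- sample.int(k, 1)          # random start between 1 and k
  frame[seq(from = start, by = k, length.out = n)]
}

set.seed(11)
frame <- 1:1000                      # hypothetical list of 1,000 elements
s <- systematic_sample(frame, 50)    # interval k = 20
s[1:5]
```

If the list might be ordered in a pattern that lines up with the interval, shuffling it first (`frame[sample(length(frame))]`) corresponds to the "randomize the list" option on the slide.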

Probability Samples

Stratified sample: drawn from a population that has been subdivided into two or more strata based on a single characteristic.

Elements are selected from each stratum in proportion to the stratum’s representation in the entire population.
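Proportionate stratified sampling can be sketched as a two-step procedure: allocate the total sample to strata by population share, then take a simple random sample within each stratum. The helper `stratified_sample` is hypothetical. The four group sizes match the simulated population used in the R examples in these notes (13,000, 12,000, 4,000, and 70,000).

```r
# Hypothetical helper: proportionate stratified sample.
# `strata` holds each element's stratum label; `n` is the total sample size.
stratified_sample <- function(strata, n) {
  shares <- prop.table(table(strata))                 # each stratum's population share
  alloc <- setNames(round(n * as.vector(shares)), names(shares))  # proportional allocation
  picks <- lapply(names(alloc), function(h) {
    members <- which(strata == h)                     # indices of elements in stratum h
    members[sample.int(length(members), alloc[[h]])]  # SRS within the stratum
  })
  unlist(picks)
}

set.seed(3)
strata <- rep(1:4, c(13000, 12000, 4000, 70000))
idx <- stratified_sample(strata, 100)
table(strata[idx])    # 13, 12, 4, 71: shares of 99,000 rounded to a sample of 100
```

Rounding keeps allocations whole. With other stratum sizes, the rounded counts may not add up exactly to `n`, so a real design would need a rule for handling the remainder.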

Probability Samples

Disproportionate stratified sample: elements are drawn disproportionately from the strata.

Used to over-represent a group that, due to its small size in the population, would not likely make up a large enough percentage of the sample to allow for quality inferences.
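Oversampling changes the sample's group shares on purpose. This sketch compares population shares with sample shares under an illustrative allocation of 24, 22, 14, and 70 (the same counts used in the disproportionate-sampling R example in these notes). Estimating quantities for the whole population then requires accounting for the oversampling. A common approach, assumed here and not covered on the slide, is to weight each sampled unit by N_h / n_h (population units per sampled unit in its stratum), which restores the population shares.

```r
# Sketch: oversample the small groups, then weight to recover population shares
N_h <- c(g1 = 13000, g2 = 12000, g3 = 4000, g4 = 70000)   # stratum sizes
n_h <- c(g1 = 24, g2 = 22, g3 = 14, g4 = 70)               # disproportionate allocation
pop_share <- N_h / sum(N_h)          # group shares in the population
sample_share <- n_h / sum(n_h)       # group shares in the sample (group 3 is overrepresented)
w <- N_h / n_h                       # design weight: population units per sampled unit
weighted_share <- (n_h * w) / sum(n_h * w)
round(rbind(pop_share, sample_share, weighted_share), 3)
```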

Probability Samples

Cluster samples: elements are grouped for an initial sampling frame (e.g., the 50 states).

Samples are drawn from increasingly narrow groups (counties, then cities, then blocks) until the final sample of elements is drawn from the smallest group (individuals living in each household).
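The slide describes several nested stages. This sketch shrinks that to two stages on a hypothetical population of 200 blocks with 50 residents each. It first draws a simple random sample of blocks (clusters), then draws residents within each chosen block. All sizes are illustrative assumptions.

```r
# Sketch of a two-stage cluster sample (hypothetical population)
set.seed(99)
pop <- data.frame(
  block = rep(1:200, each = 50),   # 200 blocks (clusters), 50 residents each
  person = 1:10000
)
blocks <- sample.int(200, 10)                   # stage 1: SRS of 10 blocks
in_blocks <- pop[pop$block %in% blocks, ]       # everyone living in the chosen blocks
# stage 2: SRS of 5 residents within each chosen block
stage2 <- do.call(rbind, lapply(split(in_blocks, in_blocks$block), function(b) {
  b[sample.int(nrow(b), 5), ]
}))
table(stage2$block)
```

Only the selected blocks need a list of residents, which is why cluster designs suit populations with no complete national list.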

Nonprobability Samples

Nonprobability samples: Each element in the population has an unknown probability of inclusion in the sample.

These sampling techniques, while less representative, are used to collect data when probability samples are not feasible.

Nonprobability Samples

Purposive samples: used to study a diverse and limited number of observations (e.g., case studies).

Nonprobability Samples

Convenience samples: include elements that are easy or convenient for the investigator to reach; for example, college students in samples collected on college campuses.

Nonprobability Samples

Quota sample: elements are chosen for inclusion in a nonprobabilistic manner (usually in a purposive or convenient manner) in proportion to their representation in the population.

Nonprobability Samples

Snowball sample: relies on elements in the target population to identify other elements in the population for inclusion in the sample.

Particularly useful when studying populations that are hard to locate or hard to identify.

A Population and Some “Samples”

[Figure: A “Population” (striations represent “attitudes”) and Some “Samples”]

Sampling comes to life in… R!!!
Suppose we have a population of roughly 100,000, and in that population we have 4 groups:
Group 1: 13,000 (13 percent)
Group 2: 12,000 (12 percent)
Group 3: 4,000 (4 percent)
Group 4: 70,000 (70 percent)
(The four groups actually sum to 99,000, so their exact shares are 13.13, 12.12, 4.04, and 70.71 percent.)

Racial/Ethnic Characteristics in the US (US Census):
White (69.13 percent)
Black (12.06 percent)
Hispanic (12.55 percent)
Asian (3.6 percent)

Some R Code

R
# Creating a population of 99,000 (13,000 + 12,000 + 4,000 + 70,000) in 4 groups
library(catspec)   # provides ctab()
set.seed(535126235)
population <- rep(1:4, c(13000, 12000, 4000, 70000))

# Tabulating the population (ctab requires package catspec)
# Percents are not whole numbers because the groups sum to 99,000, not 100,000
# Base-R alternative: round(100 * prop.table(table(population)), 2)
ctab(table(population))

              Count  Total %
population
1          13000.00    13.13
2          12000.00    12.12
3           4000.00     4.04
4          70000.00    70.71

Sampling
What do we expect from random sampling? That each sample reproduces the population proportions.

Let’s consider SIMPLE RANDOM SAMPLES.

Also, let’s consider small samples (size 100), which is roughly a 0.1 percent sample (a sampling fraction of about .001).

R: 3 samples of n=100

# Three simple random samples without replacement; n = 100, roughly a 0.1 percent sample
# The set.seed command ensures I can exactly replicate the simulations
set.seed(15233)
srs1 <- sample(population, size = 100, replace = FALSE)
ctab(table(srs1))

set.seed(5255563)
srs2 <- sample(population, size = 100, replace = FALSE)
ctab(table(srs2))

set.seed(5255)
srs3 <- sample(population, size = 100, replace = FALSE)
ctab(table(srs3))

R: Sample Results
> set.seed(15233)
> srs1<-sample(population, size=100, replace=FALSE)
> ctab(table(srs1))
     Count Total %
srs1
1       19      19
2       13      13
3        5       5
4       63      63
> set.seed(5255563)
> srs2<-sample(population, size=100, replace=FALSE)
> ctab(table(srs2))
     Count Total %
srs2
1       16      16
2        8       8
3        4       4
4       72      72

> set.seed(5255)
> srs3<-sample(population, size=100, replace=FALSE)
> ctab(table(srs3))
     Count Total %
srs3
1       12      12
2        9       9
3        1       1
4       78      78

Implications?
Small samples? Variability in proportion of groups.
Why does this occur?
Let’s understand stratification. What does it do? You’re sampling within strata.
Suppose we know the population proportions?
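The three samples above are only three draws. Repeating the exercise many times shows how much a small group's share varies from sample to sample. This sketch re-creates the same four-group population and draws 2,000 simple random samples of size 100. For each sample, it records the share that belongs to group 3 (the 4 percent group). The number of replications is an arbitrary choice.

```r
# Sketch: how much does group 3's share vary across many SRSs of n = 100?
set.seed(123)
population <- rep(1:4, c(13000, 12000, 4000, 70000))
share3 <- replicate(2000, {
  s <- sample(population, size = 100, replace = FALSE)
  mean(s == 3)                     # proportion of this sample in group 3
})
mean(population == 3)              # population share, about 0.040
quantile(share3, c(0.05, 0.95))    # middle 90 percent of sample shares
```

On average the samples are centered on the population share. Any single sample of 100, however, can easily contain one or seven members of group 3. This is the variability that stratification removes.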

R: Identifying Strata and then Sampling from them.

# Stratified sampling
# Creating the groupings: each stratum is stored as a vector of 1s,
# so only its size matters in this demonstration
strata1 <- rep(1, 13000)
strata2 <- rep(1, 12000)
strata3 <- rep(1, 4000)
strata4 <- rep(1, 70000)

# Sampling by strata
# Selecting observations in proportion to known population values: proportionate sampling
set.seed(52524425)
srs4 <- sample(strata1, size = 13, replace = FALSE)
ctab(table(srs4))
set.seed(4244225)
srs5 <- sample(strata2, size = 12, replace = FALSE)
ctab(table(srs5))
set.seed(33325)
srs6 <- sample(strata3, size = 4, replace = FALSE)
ctab(table(srs6))
set.seed(1114225)
srs7 <- sample(strata4, size = 70, replace = FALSE)
ctab(table(srs7))

R: Results? Proportional Sampling w/small samples.

> srs4<-sample(strata1, size=13, replace=FALSE)
> ctab(table(srs4))
     Count Total %
srs4
1       13     100
> set.seed(4244225)
> srs5<-sample(strata2, size=12, replace=FALSE)
> ctab(table(srs5))
     Count Total %
srs5
1       12     100
> set.seed(33325)
> srs6<-sample(strata3, size=4, replace=FALSE)
> ctab(table(srs6))
     Count Total %
srs6
1        4     100
> set.seed(1114225)
> srs7<-sample(strata4, size=70, replace=FALSE)
> ctab(table(srs7))
     Count Total %
srs7
1       70     100

Proportionate Sampling
What do we see?
If we know the proportions of the relevant stratification variable(s)…
Then sample from the groups.
SMALL SAMPLES can reproduce certain characteristics of the population.
But of course, it is probabilistic.

Disproportionate Sampling
Why? “Oversampling” may be of interest when research centers on small pockets in the population.

Race is often an issue in this context.

R: Disproportionate Sampling
> #Sampling by strata
> #Selection observations disproportional to known population values: disproportionate Sampling
> #"Oversampling by Race"
> set.seed(5555425)
> srs8<-sample(strata1, size=24, replace=FALSE)
> ctab(table(srs8))
     Count Total %
srs8
1       24     100
> set.seed(4222225)
> srs9<-sample(strata2, size=22, replace=FALSE)
> ctab(table(srs9))
     Count Total %
srs9
1       22     100
> set.seed(103325)
> srs10<-sample(strata3, size=14, replace=FALSE)
> ctab(table(srs10))
      Count Total %
srs10
1        14     100
> set.seed(11534)
> srs11<-sample(strata4, size=70, replace=FALSE)
> ctab(table(srs7))
     Count Total %
srs7
1       70     100

Disproportionate Samples
What did I ask R to do? I “oversampled” for some groups.
Again, understand why we, as researchers, might want to do this.

Side-trip: Sample Sizes
Who is happy with a 0.1 percent SRS?
On the other hand… what do we get from a stratified sample?
Suppose we increase n in an SRS? It’s R time!

R: SRS with a 1 percent sample

> #Sample Size=1000
> set.seed(1775233)
> srs1<-sample(population, size=1000, replace=FALSE)
> ctab(table(srs1))
     Count Total %
srs1
1    129.0    12.9
2     97.0     9.7
3     46.0     4.6
4    728.0    72.8
> set.seed(5200563)
> srs2<-sample(population, size=1000, replace=FALSE)
> ctab(table(srs2))
     Count Total %
srs2
1    117.0    11.7
2    127.0    12.7
3     41.0     4.1
4    715.0    71.5
> set.seed(52909)
> srs3<-sample(population, size=1000, replace=FALSE)
> ctab(table(srs3))
     Count Total %
srs3
1    147.0    14.7
2    126.0    12.6
3     39.0     3.9
4    688.0    68.8

Implications?
Sample Size MATTERS
What do we see?
Note, again, what stratification “buys” us.
The issues with stratification?
Another R example (code posted on website)
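A standard formula (not on the slides) quantifies why sample size matters. Under simple random sampling without replacement, the standard error of a sample proportion p is sqrt(p(1 - p)/n), multiplied by the finite population correction sqrt((N - n)/(N - 1)). This sketch applies the formula to group 3 of the simulated population of 99,000. The helper `se_prop` is hypothetical.

```r
# Sketch: standard error of a sample proportion under SRS without replacement,
# including the finite population correction (FPC)
se_prop <- function(p, n, N) {
  sqrt(p * (1 - p) / n) * sqrt((N - n) / (N - 1))
}
N <- 99000                # size of the simulated population
p3 <- 4000 / N            # group 3's population share, about 0.040
round(c(n100 = se_prop(p3, 100, N), n1000 = se_prop(p3, 1000, N)), 4)
```

Multiplying n by 10 shrinks the standard error by a factor of roughly sqrt(10), or about 3.2. The samples of 1,000 are therefore much closer to the population shares than the samples of 100.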

R
We again have 4 groups (labeled 1 to 4); this time “My Population” is an urn of 1,000 draws:
> set.seed(52352)
> urn<-sample(c(1,2,3,4),size=1000, replace=TRUE)
> ctab(table(urn))
     Count Total %
urn
1    239.0    23.9
2    253.0    25.3
3    268.0    26.8
4    240.0    24.0

R version of a person-on-the-street interview

> #Convenience Sample: What shows up
> con<-urn[1:10]; con
 [1] 1 1 1 3 4 2 4 3 4 3
> ctab(table(con))
     Count Total %
con
1        3      30
2        1      10
3        3      30
4        3      30

R and Samples, redux
What do we find?
Very unreliable sample: we oversample some groups and undersample others.
Useless data, more than likely.
What do you imagine happens when we increase the sample sizes?

R and SRS with samples of size N

# Sample sizes: 10, 50, 75, 100, 200, 250, 900, 1000
# (R comments start with #; a C-style /* ... */ comment is a syntax error in R)
# Note: urn holds exactly 1,000 elements, so n = 1000 without replacement draws
# the entire urn (a census), and n = 900 draws 90 percent of it
set.seed(562)
s1 <- sample(urn, 10, replace = FALSE)
ctab(table(s1))

set.seed(58862)
s1a <- sample(urn, 50, replace = FALSE)
ctab(table(s1a))

set.seed(562657)
s1b <- sample(urn, 75, replace = FALSE)
ctab(table(s1b))

set.seed(58862)
s2 <- sample(urn, 100, replace = FALSE)
ctab(table(s2))

set.seed(58862)
s3 <- sample(urn, 200, replace = FALSE)
ctab(table(s3))

set.seed(10562)
s4 <- sample(urn, 250, replace = FALSE)
ctab(table(s4))

set.seed(22562)
s5 <- sample(urn, 900, replace = FALSE)
ctab(table(s5))

set.seed(56882)
s6 <- sample(urn, 1000, replace = FALSE)
ctab(table(s6))

Sampling and Sample Size

> /*Sample: Sizes 10, 50, 75, 100, 200, 250, 900, 1000*/
Error: unexpected '/' in "/"
> set.seed(562)
> s1<-sample(urn, 10, replace=FALSE)
> ctab(table(s1))
     Count Total %
s1
1        2      20
2        4      40
3        2      20
4        2      20
> set.seed(58862)
> s1a<-sample(urn, 50, replace=FALSE)
> ctab(table(s1a))
     Count Total %
s1a
1       13      26
2       13      26
3       13      26
4       11      22

Sample Sizes

> set.seed(562657)
> s1b<-sample(urn, 75, replace=FALSE)
> ctab(table(s1b))
     Count Total %
s1b
1    22.00   29.33
2    18.00   24.00
3    22.00   29.33
4    13.00   17.33
> set.seed(58862)
> s2<-sample(urn, 100, replace=FALSE)
> ctab(table(s2))
     Count Total %
s2
1       27      27
2       24      24
3       22      22
4       27      27

Sample Size

> set.seed(58862)
> s3<-sample(urn, 200, replace=FALSE)
> ctab(table(s3))
     Count Total %
s3
1       54      27
2       48      24
3       48      24
4       50      25
> set.seed(10562)
> s4<-sample(urn, 250, replace=FALSE)
> ctab(table(s4))
     Count Total %
s4
1     62.0    24.8
2     67.0    26.8
3     56.0    22.4
4     65.0    26.0

Sample Size
> set.seed(22562)
> s5<-sample(urn, 900, replace=FALSE)
> ctab(table(s5))
     Count Total %
s5
1   220.00   24.44
2   231.00   25.67
3   234.00   26.00
4   215.00   23.89
> set.seed(56882)
> s6<-sample(urn, 1000, replace=FALSE)
> ctab(table(s6))
     Count Total %
s6
1    239.0    23.9
2    253.0    25.3
3    268.0    26.8
4    240.0    24.0

R: What did we learn?
Sample size seems to have some impact here. But there are trade-offs.
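The eight runs above can be condensed into a loop that reports, for each sample size, the largest gap between a sample's group shares and the urn's shares. This is a minimal sketch. It rebuilds the urn with the slide's seed, but with R 3.6.0 and later (which changed the default `sample()` algorithm) the urn and the samples may not match the slide's exact numbers. The gap measure is an illustrative choice.

```r
# Sketch: summarize how far each sample's group shares fall from the urn's shares
set.seed(52352)
urn <- sample(c(1, 2, 3, 4), size = 1000, replace = TRUE)
urn_share <- prop.table(table(factor(urn, levels = 1:4)))
sizes <- c(10, 50, 75, 100, 200, 250, 900, 1000)
max_gap <- sapply(sizes, function(n) {
  s <- sample(urn, n, replace = FALSE)
  s_share <- prop.table(table(factor(s, levels = 1:4)))
  max(abs(s_share - urn_share))    # largest group-share error, as a proportion
})
round(setNames(max_gap, sizes), 3)
```

The gap at n = 1000 is exactly zero because that "sample" is the whole urn. This is one face of the trade-off: the closer n gets to the population, the smaller the error, but the more costly the data collection.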

Important Moving Parts
Randomness (covered!)
Sampling Frame

Random sampling from a bad sampling frame produces bad samples.

Sample Size
What is your intuition about sample sizes? Must they always be large? Not necessarily so… although…
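The sampling-frame point can be demonstrated directly. In this sketch, the frame is a hypothetical list that silently omits group 3 (a coverage error), and simple random samples of growing size are drawn from it. Every draw is perfectly random, yet group 3 never appears, and a larger n does nothing to fix that.

```r
# Sketch: a perfectly random draw from a bad frame (hypothetical coverage error)
set.seed(8)
population <- rep(1:4, c(13000, 12000, 4000, 70000))
frame <- population[population != 3]     # the list we can actually sample from
sizes <- c(100, 1000, 10000)
share3 <- sapply(sizes, function(n) mean(sample(frame, n) == 3))
setNames(share3, sizes)                  # group 3's share in each sample
```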

Bad Sampling
Person-on-the-Street Interviews
What do these imply? Small samples and inherently nonrandom. Likely poor inference.
Other examples?
Not all non-random samples are necessarily bad: Purposive Samples