POL SOC 360 Sampling Generalizability

Research Methods

Sampling and Generalizability

The Basics of Sampling

Population:

o Entire set of individuals or other entities to which study findings are generalized

o Does not necessarily have to be people

Population might also refer to:

Adults living in a geographical area (e.g. city, state), or working in a given organization

Set of countries, corporations, government agencies, events, etc.

“All….” and countable

o Examples:

“All political science / sociology majors at MSU”

“All high schools in Kentucky”

o Important that population be carefully and fully defined and that it be relevant to the research question being asked

Again, remember McDonald and Popkin’s work on the “Vanishing Voter”

Question: How did they re-conceptualize the traditional meaning of a “population”?

o Went from Voting Age Population (VAP) to Voting Eligible Population (VEP)

It is normally infeasible to interview the entire population of anything about anything

o The U.S. census experiences an undercounting problem in many cities

It is better to select a few members of the population for further inquiry

o Known as a sample

Sample

o Any subset of units collected in some way from a population

o Data we use to actually test theories

Quality / Precision / Reliability of Sample based on:

Overall sample size

How members are chosen to be in the sample

o Will discuss this more later

Population or Sample?

Which should we really use?

Advantages of Sampling

o Time and Money

Disadvantages of Sampling

o Information based on a sample is usually less accurate or more subject to error than information collected from the entire population

Some studies do not lend themselves to sampling

o Case Studies: Involve detailed examination of just one or a few units

Sometimes you really don’t have a choice…

o At the end of the day, decision is usually made on practical grounds—due to time,

money, and other “research costs”

Example: State of the Union vs. State of the State Addresses

Show Stateline.org

Cannot analyze population of these gubernatorial speeches because

complete dataset does not exist before 2003

o Developing a better dataset—to get a little closer to

population—might end up being my dissertation

Fundamental Concepts

As we know, social scientists are mainly interested in certain characteristics about

populations such as differences between individuals, groups, societal relationships, etc.

Population Parameter

o Characteristics about a population that can be quantified as a number

Examples: Proportion, Mean/Average, etc.

Population Proportion = P

Population Mean = μ (Greek mu)

Estimator

o Numerically estimates the value of a population characteristic, or population parameter

Sample Statistic

o An estimator of a population parameter derived from a sample of the population

Examples:

Sample Proportion = p

Sample Mean = Y-bar
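
The difference between a population parameter and a sample statistic can be sketched in a few lines of Python. This is only an illustration: the population of ages is randomly generated (hypothetical, not real data), and the names `population`, `mu`, `y_bar`, `P`, and `p` are chosen here to mirror the notation above.

```python
import random
from statistics import mean

# Hypothetical population: ages of 10,000 adults (made-up values for illustration).
rng = random.Random(360)
population = [rng.randint(18, 90) for _ in range(10_000)]

# Population parameter: the mean over every element in the population (mu).
mu = mean(population)

# Sample statistic: the mean of a simple random sample of 200 elements (Y-bar).
sample = rng.sample(population, 200)
y_bar = mean(sample)

# Population proportion P and sample proportion p of people aged 65 or older.
P = sum(age >= 65 for age in population) / len(population)
p = sum(age >= 65 for age in sample) / len(sample)

print(f"mu = {mu:.2f}, Y-bar = {y_bar:.2f}")
print(f"P = {P:.3f}, p = {p:.3f}")
```

The sample values will usually be close to, but not exactly equal to, the population values; that gap is the sampling error discussed under Disadvantages of Sampling.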

Element

o Not hydrogen or helium…

o We know this better as a unit of analysis

o A single occurrence, realization, or instance of the objects or entities being

studied

Examples: Individuals, States, Cities, Countries, Political Speeches, Wars

Stratum

o We will discuss this more shortly, but for now, know that a population can be stratified (subdivided into groups of similar elements) before a sample is drawn

o Each stratum is a subgroup of a population that shares one or more characteristics

Examples:

MSU students stratified by class, major, or GPA

o Latin Graduation Honors are a type of stratification

Cum Laude, Magna Cum Laude, Summa Cum Laude

Teten stratified State of the Union addresses into:

o Founding Period

Washington to John Quincy Adams (1790 to 1825)

o Traditional Period

JQA to Taft (1825-1913)

o Modern Period

Wilson to Present (1913-)

Stratification based on word length of the

address

Sampling Frame

o Particular population from which sample is actually drawn

o The closer the sampling frame is to the population of interest (the theoretical population), the better off you are

Example: Mall

If you interview every nth person entering Fayette Mall about who they are going to vote for in November, you will only reach the people who happen to shop there, not the entire population of Lexington voters (unless Anthony Davis is in the Food Court signing autographs, in which case everyone in Lexington will go to the Mall)

o Remember the Literary Digest Poll?

In 1936, the presidential election was between Republican Alfred Landon (Kansas Governor) and FDR

o The Literary Digest predicted that Landon would win, 55% to FDR’s 41%

In actuality, Landon only carried Maine and Vermont and won a whopping eight electoral votes (1.5% of total)

Why did this happen?

o Comes down to polling techniques

Magazine sent out 10 million poll ballots and got a 24% response rate (2.4 million respondents)

o However, they surveyed their own readership

Group with disposable income (because they could still afford a subscription during the Great Depression)

o Used two other lists for surveying:

Registered automobile users

Telephone users

Question: While lists like these might be reasonably representative today, what was the big problem in 1936?

These groups had high incomes, which made it much more likely that they would vote for the Republican candidate

o Sampling frame ended up oversampling wealthy and GOP voters

Did not factor in the Great Depression and the fact that Hoover had done next to nothing to help

Lots of poor people voted, and they overwhelmingly voted for FDR

o If sampling frame is incomplete or inappropriate, then sample bias will occur

Sample will be unrepresentative of the population, and inaccurate

conclusions may result

Sample bias may also be caused by a biased selection of elements, even if

frame is complete and accurate
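
The effect of an incomplete sampling frame can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the 1936 poll: the electorate, the 25% wealthy share, and the preference rates are all invented for illustration, and `make_voter` and `share_for_a` are hypothetical helper names.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate (all numbers invented for illustration, not 1936 data).
def make_voter():
    wealthy = rng.random() < 0.25  # assume 25% of voters are wealthy
    # Assume wealthy voters lean toward candidate A and everyone else toward B.
    prefers_a = rng.random() < (0.65 if wealthy else 0.30)
    return {"wealthy": wealthy, "prefers_a": prefers_a}

population = [make_voter() for _ in range(100_000)]

def share_for_a(voters):
    """Proportion of the given voters who prefer candidate A."""
    return sum(v["prefers_a"] for v in voters) / len(voters)

# Incomplete sampling frame: only wealthy voters appear on the lists we can use.
frame = [v for v in population if v["wealthy"]]

biased_sample = rng.sample(frame, 5_000)   # large sample, bad frame
fair_sample = rng.sample(population, 500)  # small sample, complete frame

print(f"population share for A:   {share_for_a(population):.3f}")
print(f"biased frame, n = 5000:   {share_for_a(biased_sample):.3f}")
print(f"complete frame, n = 500:  {share_for_a(fair_sample):.3f}")
```

Under these assumptions the large sample from the biased frame misses the population value by a wide margin, while the much smaller sample from the complete frame lands close to it. A big sample does not fix a bad frame.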

Sampling Unit

o Entity listed in a sampling frame

Can be thought of like an element

Types of Samples

Sampling types differ mainly in how elements are selected for the sample

Probability Sample

o Sample for which each element in the total population has a known probability of

being included in the sample

Can calculate how accurately sample reflects population from which it is drawn

Nonprobability Sample

o Sample in which each element in the total population has an unknown probability

of being selected

Without knowing probability, you cannot use statistical theory to make

inferences about population

PROBABILITY SAMPLES

Simple Random Sample (SRS)

Each element and combination of elements has an equal chance of being selected

o What has to happen for this to occur?

A list of all elements in the population must be available

A method for selecting those elements must be used that ensures each element has an equal chance of being selected

While seemingly simple, drawing a true SRS can be difficult

Class Activity:

o Write down list of random numbers for 30 seconds

o How “random” are the lists?

Example: Vietnam Draft

As Vietnam War continued and opposition to national policies grew (LBJ), there was a need to make the draft process fairer so that all men, not just poor and minorities, would have a chance to serve

Because you could not go out and pick men at random, the Selective Service began a lottery system

Likelihood that a man would be drafted was determined randomly by writing every day of the year on individual slips of paper, placing slips in separate capsules, and putting all capsules in a barrel

o VIDEO: “The Draft Lottery—Vietnam War”

Selective Service estimated that anyone w/ number higher than 200

would not be called; process seemed fair

However, people found negative correlation between day of birth and draft number

o If you were born in later months of year, you had more chance to serve than people born in early months

Capsules were probably not mixed well

One way to get around this issue is to assign a number to each element in the sampling frame, and then use a random number generator (or a random numbers table)

o Simply a list of random numbers

o Suppose we had list of all 500 MSU political science majors, and we wanted to randomly sample 10 to ask their thoughts about our program

We would have to number each person: 1, 2, 3…

Then we start at a random place in the random numbers table and start selecting numbers (if the same number comes up twice, or a number is larger than 500, we ignore it)

If the table gave 463, 335, 658, 618, 161, 543…, we would keep #463, #335, and #161 as subjects and skip 658, 618, and 543

Since SRS only requires list of population members, we could use it to survey members

of Congress, all countries in world, or cities with more than 50,000 people
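
The random-number procedure for the 500 majors can be sketched with Python's standard `random` module in place of a printed random numbers table. The frame here is just the ID numbers 1 to 500, and the seed value is arbitrary (an assumption made so the draw can be repeated).

```python
import random

# Hypothetical sampling frame: the 500 majors, numbered 1 through 500.
frame = list(range(1, 501))

rng = random.Random(42)        # fixed seed so the same draw can be reproduced
srs = rng.sample(frame, k=10)  # sampling without replacement, so no repeats

print(sorted(srs))
```

Because `sample` draws without replacement from the numbered frame, the two manual rules (ignore repeated numbers, ignore numbers above 500) are handled automatically.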

Systematic Sample

Elements selected from list at predetermined intervals

o May be easier than random number generator but still requires list of target

population

In a systematic sample, every Kth element on list is selected

o K is number that will result in desired number of elements being chosen

K = Sampling Interval or “skip” between elements

K = Population Size (N) / Sample size (n)

o Example: If you had 25 people and wanted to sample five (K = 25 / 5 = 5), you could sample #2, 7, 12, 17, and 22

Useful when dealing with long list of population elements (e.g. all SC justices)

Often used in product testing

o Example: Working at JIF plant, your job is to check that lids are screwed on

Would make sense to simply sample every 5th lid
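
The sampling interval calculation can be sketched as a short function. This is a minimal sketch that assumes the population size N divides evenly by the sample size n, and it uses a random starting point within the first interval; the function name `systematic_sample` is made up for this illustration.

```python
import random

def systematic_sample(elements, n, rng=random):
    """Select every K-th element after a random start, where K = N // n."""
    N = len(elements)
    k = N // n                # sampling interval, or "skip"
    start = rng.randrange(k)  # random position within the first interval
    return [elements[start + i * k] for i in range(n)]

people = list(range(1, 26))   # 25 people, numbered 1 to 25
chosen = systematic_sample(people, 5, random.Random(0))
print(chosen)
```

With 25 people and a sample of five, K is 5, so the selected numbers are always five apart. Starting at person #2 gives the #2, 7, 12, 17, 22 sample from the example.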

Stratified Sample

Probability sample where elements sharing one or more characteristics are grouped

o Elements are selected from each group, often in proportion to the group’s representation in the total population

Two Main Types:

o Proportionate Sample

Stratified sample where each stratum is represented in proportion to its size in population

Example: Imagine that there were 500 members in Congress, with six parties, and we wanted to sample 100 of these members on an upcoming policy issue

Party          Members   Sample
Blue Party     100       20
Red Party      100       20
Green Party    50        10
White Party    150       30
Brown Party    50        10
Black Party    50        10
Total          500       100

First have to calculate sampling fraction

100 / 500 = 1/5, so we sample one-fifth of each party (see the Sample counts above)

Avoids a problem with SRS, where by chance too many of the 100 (in the extreme, all of them) might come from the White Party
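
The proportionate allocation for the six-party example can be sketched as follows. The party sizes come from the example above; the member IDs such as "Blue-1" are hypothetical placeholders, and the seed is arbitrary.

```python
import random

# Stratum sizes from the six-party example (500 members in total).
strata_sizes = {"Blue": 100, "Red": 100, "Green": 50,
                "White": 150, "Brown": 50, "Black": 50}
N = sum(strata_sizes.values())  # population size: 500
n = 100                         # desired total sample size

# Apply the same sampling fraction (n / N = 1/5) to every stratum.
allocation = {party: size * n // N for party, size in strata_sizes.items()}

# Draw a simple random sample of the allocated size within each stratum.
rng = random.Random(7)
sample = {}
for party, size in strata_sizes.items():
    members = [f"{party}-{i}" for i in range(1, size + 1)]  # hypothetical IDs
    sample[party] = rng.sample(members, allocation[party])

print(allocation)
```

Each party's share of the sample matches its share of the chamber, so no party can be over- or underrepresented by chance the way it could in a single SRS of 100.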

o Disproportionate Sample

Stratified sample where each stratum is not represented in proportion to its

size in population

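Example: Thinking about differences between racial groups in the US, the number in the sample for one race might be too small to make valid inferences

o Sample disproportionately to get enough of that race

Issue of Weighting: oversampled groups must be weighted back to their population share before making estimates about the whole population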
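
One common way to handle the weighting issue is to weight each stratum by its population share divided by its sample share. The sketch below assumes two hypothetical strata and invented support figures; none of the numbers come from the lecture.

```python
# Hypothetical strata: the smaller group is deliberately oversampled.
population_sizes = {"majority": 900, "minority": 100}
sample_sizes = {"majority": 100, "minority": 100}

N = sum(population_sizes.values())
n = sum(sample_sizes.values())

# Weight = (population share) / (sample share) for each stratum.
weights = {g: (population_sizes[g] / N) / (sample_sizes[g] / n)
           for g in population_sizes}

# Hypothetical share supporting a policy within each stratum's sample.
support = {"majority": 0.40, "minority": 0.80}

# The unweighted estimate treats the oversampled group as half the population.
unweighted = sum(support[g] * sample_sizes[g] for g in support) / n

# The weighted estimate restores each group to its population share.
weighted = sum(support[g] * sample_sizes[g] * weights[g] for g in support) / n

print(f"unweighted = {unweighted:.2f}, weighted = {weighted:.2f}")
```

Under these assumed numbers the unweighted figure overstates support because the oversampled group counts for half the sample but only a tenth of the population. The weighted figure corrects for that.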

Cluster Samples

Probability sample in which sampling frame initially consists of clusters of elements

Groups / clusters of elements are identified and listed as sampling units

o Within each sampling unit, certain elements are identified and sampled

Happens a lot when dealing with public opinion polling

o Step 1: Get a map of Murray and identify city blocks

These blocks become the sampling frame

o Step 2: Sample (either randomly or systematically) a smaller number of blocks

o Step 3: Go to selected blocks and list all houses on block

o Step 4: Sample list of households to actually interview
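
The four steps above can be sketched as a two-stage draw. Everything concrete here is an assumption for illustration: 200 blocks, 20 blocks selected, 10 to 30 households per block, and 5 interviews per block. In practice the household lists would be compiled in the field, not generated.

```python
import random

rng = random.Random(2024)

# Step 1: hypothetical sampling frame of city blocks (the clusters).
blocks = [f"block-{b}" for b in range(1, 201)]

# Step 2: randomly sample a smaller number of blocks.
chosen_blocks = rng.sample(blocks, 20)

# Step 3: list all households on each selected block.
# The household lists are invented here; real ones would be compiled on site.
def list_households(block):
    count = rng.randint(10, 30)
    return [f"{block}/house-{h}" for h in range(1, count + 1)]

households_by_block = {block: list_households(block) for block in chosen_blocks}

# Step 4: sample households within each selected block to interview.
interviews = []
for block, households in households_by_block.items():
    interviews.extend(rng.sample(households, 5))

print(len(chosen_blocks), "blocks,", len(interviews), "interviews")
```

Note that only the selected blocks ever need a household list, which is the main practical advantage of cluster sampling.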

Advantages of Cluster Sampling

o Allows researchers to get around problem of acquiring list of elements in target

population

o Reduces fieldwork costs for public opinion surveys (people closer together)

Disadvantage of Cluster Sampling

o Greater level of imprecision

Error arises at each stage of cluster sample

Example: Sample of city blocks will not necessarily be

representative of all city blocks

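Systematic, stratified, and cluster samples are often more practical alternatives to SRS, although each has trade-offs (e.g. cluster samples are usually less precise)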

NONPROBABILITY SAMPLES

Nonprobability Sample

o Sample in which each element in the total population has an unknown probability

of being selected

o Probability samples are preferred because they are more likely to represent the population and let us calculate how close sample estimates are likely to be to population values

Purposive Sample

Researcher exercises considerable discretion over what observations to study

Goal: To study a diverse and usually limited number of observations

Example: Fenno and Home Style

o Describes behavior of 18 incumbent representatives in Congress

Convenience Sample

Elements are included because they are convenient or easy for a researcher to study

o Example: Studying those State of the State Addresses found on Stateline.org

Used for exploratory research or when target population is impossible to define / locate

Quota Sample

Sample in which elements are sampled in proportion to their representation in population

Similar to proportionate stratified sampling, but elements in a quota sample are NOT chosen in a reasoned or probabilistic manner

o Chosen in convenience fashion until each type of element (quota) has been

reached

Leads to biased and inaccurate measures of target population

Example: 1948 Gallup Poll used quota sampling and predicted that Thomas Dewey, Republican governor of New York, would beat incumbent President Harry S Truman (Truman won)

Snowball Sample

Initial respondents are used to identify others who might qualify for inclusion in the sample

o Asked to provide names for further surveying / interviewing

Useful when trying to study members in a typically elusive population:

o Draft Dodgers

o Political Protestors

o Drug Users