PPA 501 – ANALYTICAL METHODS IN ADMINISTRATION Lecture 3c – Sampling


INTRODUCTION – WHEN INFORMATION IS UNAVAILABLE

If all possible information needed to solve an administrative problem could be collected, there would be no need for a sample.

But such data gathering is limited by time and money.

So most analysts and administrators use samples and estimate effects based on probabilities.

REASONS FOR SAMPLING

Cost, time, accuracy, and the destructive nature of the measurement process.

Key concerns: the accuracy needed to make the decision; the cost of a wrong choice; how much more information is needed; what kinds of data are needed and at what cost; and whether the extra cost is affordable.

Sampling precision. Sampling is based on probability: What is the probability that I will be wrong if I generalize from the sample to the population?

REASONS FOR SAMPLING

A high degree of precision is difficult and expensive to achieve. A doubling in accuracy requires a four-fold increase in sample size.

For many social science applications, that level of accuracy is not necessary: tracking trends is often equally or more important.
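The four-fold rule above can be checked with a little arithmetic. The sketch below assumes the usual formula for the standard error of the mean, sigma divided by the square root of n; the population standard deviation of 10 and the sample sizes are illustrative values, not figures from the lecture.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean for a simple random sample of size n."""
    return sigma / math.sqrt(n)

sigma = 10.0  # assumed population standard deviation (illustrative only)

# Each step quadruples the sample size, which halves the standard error.
for n in (100, 400, 1600):
    print(n, standard_error(sigma, n))
# prints:
# 100 1.0
# 400 0.5
# 1600 0.25
```

Going from 100 to 400 respondents buys a full half-point of precision here; going from 400 to 1,600 costs three times as many additional interviews and buys only a quarter point.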

The survey or research process can also begin to change people’s reactions, answers, behaviors, and so on.

SAMPLING METHODS

Probability or nonprobability sample. Single unit or cluster of units. Unstratified or stratified sample. Equal unit probability or unequal probability. Single stage or multi-stage sampling.

PROBABILITY OR NONPROBABILITY SAMPLE?

A probability sample is one in which the sample units (people, states, counties, etc.) are selected at random, each with a known chance of being selected (an equal chance in the simplest designs). Examples: simple random samples and systematic samples.

A nonprobability sample is one in which random selection techniques are not used.

The key difference is the generalizability of the results to the larger population.

The choice is usually based on cost versus value.

Rule of thumb: the more diverse the population, the more important representativeness becomes.
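The two probability designs named on this slide, simple random and systematic samples, can be sketched in a few lines. This is a minimal illustration rather than a production sampler: the list of 200 counties is a hypothetical sampling frame, and the function names are invented for the example.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit has the same chance."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit from the frame."""
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

rng = random.Random(501)  # fixed seed so the draw can be reproduced
frame = [f"county_{i:03d}" for i in range(1, 201)]  # hypothetical frame

srs = simple_random_sample(frame, 20, rng)
sys_sample = systematic_sample(frame, 20, rng)
print(srs[:3])
print(sys_sample[:3])
```

A systematic sample is only as good as the ordering of the frame: if the list has a cycle that lines up with the interval k, the sample can be badly unrepresentative.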

SINGLE UNIT OR CLUSTER SAMPLING?

A sampling unit is the basic element of the population being sampled.

In single unit sampling, each sampling unit is selected independently.

In cluster sampling, units are selected in groups.

Cluster sampling reduces costs. However, diverse populations generate pressure to guarantee representativeness by using single unit sampling.
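The cost argument can be made concrete with a small sketch. It assumes a hypothetical city of 50 blocks with 8 households each, and treats "number of blocks an interviewer must visit" as a rough stand-in for fieldwork cost; none of these numbers come from the lecture.

```python
import random

rng = random.Random(3)

# Hypothetical population: 50 city blocks (clusters) of 8 households each.
blocks = {b: [f"block{b}_hh{h}" for h in range(8)] for b in range(50)}

def single_unit_sample(blocks, n, rng):
    """Select n households independently from the whole population."""
    households = [hh for members in blocks.values() for hh in members]
    return rng.sample(households, n)

def cluster_sample(blocks, n_clusters, rng):
    """Select whole blocks at random and keep every household in them."""
    chosen = rng.sample(sorted(blocks), n_clusters)
    return [hh for b in chosen for hh in blocks[b]]

single = single_unit_sample(blocks, 40, rng)
clustered = cluster_sample(blocks, 5, rng)

# Both samples contain 40 households, but they are spread very differently.
blocks_visited_single = len({hh.split("_")[0] for hh in single})
blocks_visited_cluster = len({hh.split("_")[0] for hh in clustered})
print(blocks_visited_single, blocks_visited_cluster)
```

The cluster design always sends interviewers to exactly 5 blocks, while the single unit design scatters the same 40 interviews over many more. The price is that households on the same block tend to resemble one another, so 40 clustered interviews usually carry less information than 40 independent ones.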

STRATIFIED OR UNSTRATIFIED SAMPLING

A sample stratum is a portion of the population that is of interest to the researcher.

Stratification can be used either to ensure representativeness or to deliberately overrepresent a selected subpopulation.

EQUAL UNIT OR UNEQUAL UNIT

In combination with stratified sampling, unequal unit sampling can be used to ensure an overrepresentation of a research population of interest.
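A short sketch of stratified sampling with and without oversampling follows. The urban/rural split and the sample sizes are hypothetical. The weighting step at the end is an addition not covered on the slide: it is the standard correction (stratum population size divided by stratum sample size) that lets an oversampled design still describe the whole population.

```python
import random

rng = random.Random(74)

# Hypothetical population: 9,000 urban and 1,000 rural residents (by ID).
population = {
    "urban": list(range(9000)),
    "rural": list(range(9000, 10000)),
}

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample of the requested size within each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Equal probability: each stratum is sampled in proportion to its size (5%).
proportionate = stratified_sample(population, {"urban": 450, "rural": 50}, rng)

# Unequal probability: rural residents are oversampled (25% versus about 2.8%).
oversampled = stratified_sample(population, {"urban": 250, "rural": 250}, rng)

# Assumed correction: weight each respondent by N_h / n_h for the stratum.
weights = {
    name: len(population[name]) / len(units)
    for name, units in oversampled.items()
}
print(weights)  # {'urban': 36.0, 'rural': 4.0}
```

With only 50 rural cases, the proportionate design says little about rural residents on their own; the oversampled design gives 250 rural cases to analyze, at the cost of needing weights for any population-wide estimate.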

SINGLE STAGE VERSUS MULTISTAGE SAMPLING

Multistage sampling is used when sampling over a large geographic area.

Face-to-face surveying. Stages: congressional districts, census tracts, residential blocks, households, residents (most recent birthday).

Telephone interviewing. Stages: area codes, prefixes, first two-digit clusters, random assignment of the last two digits (to reach unlisted numbers), over-sampling to accommodate disconnects and commercial numbers, residents (most recent birthday).
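The face-to-face sequence above (districts, then tracts, then blocks, then households) can be sketched as nested random draws. The frame below is entirely hypothetical: 6 districts, 3 tracts per district, 4 blocks per tract, and 5 households per block. The final within-household step (most recent birthday) is left out.

```python
import random

rng = random.Random(84)

# Hypothetical nested frame: districts -> tracts -> blocks -> households.
frame = {
    f"district_{d}": {
        f"tract_{d}{t}": {
            f"block_{d}{t}{b}": [f"hh_{d}{t}{b}{h}" for h in range(5)]
            for b in range(4)
        }
        for t in range(3)
    }
    for d in range(6)
}

def multistage_sample(frame, n_districts, n_tracts, n_blocks, n_households, rng):
    """Sample at each stage only within the units chosen at the stage before."""
    selected = []
    for district in rng.sample(sorted(frame), n_districts):
        tracts = frame[district]
        for tract in rng.sample(sorted(tracts), n_tracts):
            blocks = tracts[tract]
            for block in rng.sample(sorted(blocks), n_blocks):
                selected.extend(rng.sample(blocks[block], n_households))
    return selected

households = multistage_sample(frame, 2, 2, 2, 3, rng)
print(len(households))  # 2 * 2 * 2 * 3 = 24
```

The practical advantage is that a full list of households is needed only for the 8 blocks actually selected, not for the whole area.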

SAMPLE BIAS AND SAMPLING ERROR

The ultimate purpose of sampling is to generate a sample that accurately reflects the research-relevant characteristics of the population.

This purpose can be undermined by both sampling and nonsampling error.

Sample bias. Conscious or unconscious bias in the selection of the sample. Overcome by random selection.

Sampling error. No sample ever exactly matches the population, but random sampling allows probability estimates of the match.

Law of large numbers versus law of diminishing returns.

NONSAMPLING ERROR

Sampling frame. You should start with as complete a sampling frame as possible. Example: random digit dialing versus one-plus dialing versus telephone directories. Example: residential survey versus telephone survey.

Nonresponse error. Low response rates nearly always guarantee a biased sample. Use incentives and follow-up phone calls to reduce nonresponse, although there are no guarantees of reduction.
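A small simulation shows why nonresponse is so damaging even when the sample itself is drawn at random. Every number here is an assumption made for illustration: half of a population of 10,000 is satisfied with a service, and satisfied people are assumed to answer the survey 60 percent of the time against 20 percent for dissatisfied people.

```python
import random

rng = random.Random(114)

# Hypothetical population: 1 = satisfied, 0 = dissatisfied (true share 0.50).
population = [1] * 5000 + [0] * 5000

# Assumed response rates: satisfied people are three times as likely to answer.
response_rate = {1: 0.6, 0: 0.2}

# Step 1: a proper simple random sample of 2,000 people.
sample = rng.sample(population, 2000)

# Step 2: only some of the people sampled actually respond.
respondents = [x for x in sample if rng.random() < response_rate[x]]

true_share = sum(population) / len(population)
sample_share = sum(sample) / len(sample)
respondent_share = sum(respondents) / len(respondents)

print(f"population: {true_share:.2f}")
print(f"full random sample: {sample_share:.2f}")
print(f"respondents only: {respondent_share:.2f}")
```

The full random sample lands close to the true share, but the respondents overstate satisfaction by a wide margin (about 0.75 is expected under these assumptions). Drawing a larger sample does not help, because the bias comes from who answers, not from how many are asked.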

SAMPLING DISTRIBUTIONS

The sampling distribution is a hypothetical distribution that was developed by statisticians to allow the estimation of the probability of a match between the sample and the population.

The sampling distribution is a distribution composed of the means of a very large number of samples drawn from the population.

This distribution is generally normal and has a mean equal to the population mean and a standard deviation equal to $\sigma/\sqrt{n}$ or, when estimated from a sample, $s/\sqrt{n-1}$. This is called the standard error of the mean.

Central limit theorem – as the sample size increases, the distribution of the sample statistic will take on a normal distribution. This begins to occur at about n=30.
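The sampling distribution can be approximated by simulation. The sketch below assumes a deliberately skewed hypothetical population (exponential, with a mean of about 50), draws 2,000 samples of size 30, and compares the spread of the sample means with the theoretical standard error, the population standard deviation divided by the square root of n. The population shape, the seed, and the number of samples are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(30)

# Hypothetical, strongly skewed population with a mean of roughly 50.
population = [rng.expovariate(1 / 50) for _ in range(20000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Draw many samples of size n and record the mean of each one.
n = 30
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(2000)
]

mean_of_means = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)

print(f"population mean {mu:.2f}, mean of sample means {mean_of_means:.2f}")
print(f"observed SE {observed_se:.2f}, theoretical SE {theoretical_se:.2f}")
```

Although the population is far from normal, the 2,000 sample means cluster around the population mean, and their standard deviation comes out close to the theoretical standard error.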