Chapter 1: Probability Sampling - Iowa State...

27
Chapter 1: Probability Sampling Jae-kwang Kim Fall, 2014

Transcript of Chapter 1: Probability Sampling - Iowa State...

Chapter 1: Probability Sampling

Jae-kwang Kim

Fall, 2014

Introduction

1 Introduction

2 Example

3 Probability sampling

4 Survey Procedures

5 Survey Errors

Kim Ch. 1: Probability Sampling Fall, 2014 2 / 27

Introduction

What is sampling?

Population

Finite PopulationInfinite Population

Sample : Subset of a population

Sampling : Make statistical inference about the (finite) populationwithout measuring the whole population

Sampling error : the error that results from taking a sample instead ofexamining the whole population

Kim Ch. 1: Probability Sampling Fall, 2014 3 / 27

Introduction

Why sampling ?

1 To reduce the cost

2 To save the time

3 Sometimes, to get a more accurate information about the population.

4 Sometimes, it is the only way of getting information about the targetpopulation.

Kim Ch. 1: Probability Sampling Fall, 2014 4 / 27

Introduction

How to do the sampling ?

Two types of sampling

Probability samplingNon-probability sampling

Roughly speaking, a probability rule is assigned to obtain a sample inprobability sampling.

Kim Ch. 1: Probability Sampling Fall, 2014 5 / 27

Example

1 Introduction

2 Example

3 Probability sampling

4 Survey Procedures

5 Survey Errors

Kim Ch. 1: Probability Sampling Fall, 2014 6 / 27

Example

Let’s look at an artificial finite population (of size N = 4).

ID Size of farms yield(Acres) (y)

1 4 12 6 33 6 54 20 15

Parameter of interest: Mean yield of the farms in the population

θ = (y1 + y2 + y3 + y4)/4

The value of y is obtained only from the sampled units.

Kim Ch. 1: Probability Sampling Fall, 2014 7 / 27

Example

Instead of observing N = 4 farms, we want to select a sample of sizen = 2.

6 possible samples

case sample ID sample mean sampling error

1 1, 2 22 1, 3 33 1, 4 84 2, 3 45 2, 4 96 3, 4 10

Each sample has a sampling error.

Two ways of selecting one of the six possible samples.

Nonprobability sampling : (using size of farms or etc.) select a samplesubjectively.Probability sampling : select a sample by a probability rule.

Kim Ch. 1: Probability Sampling Fall, 2014 8 / 27

Example

Probability sampling

Simple Random Sampling : Assign the same selection probability toall possible samples

case sample ID sample sampling selectionmean (y) error probability

1 1, 2 2 -4 1/62 1, 3 3 -3 1/63 1, 4 8 2 1/64 2, 3 4 -2 1/65 2, 4 9 3 1/66 3, 4 10 4 1/6

In this case, the sample mean(y) has a discrete probabilitydistribution.

Kim Ch. 1: Probability Sampling Fall, 2014 9 / 27

Example

Probability mass function of y :

Py (y) =

{1/6 if y ∈ {2, 3, 4, 8, 9, 10}0 otherwise.

Unbiased

Variance

Kim Ch. 1: Probability Sampling Fall, 2014 10 / 27

Example

Remark

No model assumption about yi in the example: totally differentframework !

Design-based approach: the reference distribution is the samplingdistribution generated by the repeated application of the givensampling mechanism.

Kim Ch. 1: Probability Sampling Fall, 2014 11 / 27

Probability sampling

1 Introduction

2 Example

3 Probability sampling

4 Survey Procedures

5 Survey Errors

Kim Ch. 1: Probability Sampling Fall, 2014 12 / 27

Probability sampling

Definition & Notation

U = {1, 2, · · · ,N} : index set of finite population

A : subset of U, index set of the sample.

A: set of samples under consideration, sample support.

θ = θ(yi ; i ∈ A) : statistic

Kim Ch. 1: Probability Sampling Fall, 2014 13 / 27

Probability sampling

1 Probability distribution of samples, or sample distribution: probabilitymass function P (·) defined on A. That is, P (·) satisfies

1 P (A) ∈ [0, 1] , ∀A ∈ A2∑

A∈A P (A) = 1.

2 (Induced) probability distribution of a statistic

1. Expectation : E (θ) =∑

A∈A P(A)θ (A)

2. Variance : Var(θ) =∑

A∈A P(A)[θ (A)− E (θ)

]2

3. Mean squared error :

MSE (θ) =∑A∈A

P(A)[θ (A)− θ

]2

= Var(θ)

+[E (θ)− θ

]2

MSE = Variance + (Bias)2

Kim Ch. 1: Probability Sampling Fall, 2014 14 / 27

Probability sampling

Probability Sampling

Definition : For each element in the population, the probability thatthe element is included in the sample is known and greater than 0.

Advantages1 Exclude subjectivity of selecting samples.2 Remove sampling bias (or selection bias)

What is sampling bias ? ( θ : true value, θ: estimated value of θ)

(sampling) error of θ = θ − θ

={θ − E

(θ)}

+{E(θ)− θ}

= variation + bias

In nonprobability sampling, variation is 0 but there is a bias. Inprobability sampling, there exists variation but bias is 0.

Kim Ch. 1: Probability Sampling Fall, 2014 15 / 27

Probability sampling

Probability Sampling

Main theory1 Law of Large Numbers : θ converges to E (θ) for sufficiently large

sample size.2 Central Limit Theorem : θ follows a normal distribution for sufficiently

large sample size.

Additional advantages of probability sampling with large sample :1 Improve the precision of an estimator2 Can compute confidence intervals or test statistical hypotheses.

With the same sample size, we may have different precision.

Kim Ch. 1: Probability Sampling Fall, 2014 16 / 27

Probability sampling

Example - Continued : Probability sampling 2

Unequal probability sampling

Sample ID y value Mean Estimator Selection probability1, 4 1, 15 4.5 1/32, 4 3, 15 6 1/33, 4 5, 15 7.5 1/3

What is the probability distribution of the mean estimator ?

What is the expected value of the sampling error ?

E (y) =1

3(4.5 + 6 + 7.5) = 6.0

Compute the variance. Compare it with that of SRS.

Kim Ch. 1: Probability Sampling Fall, 2014 17 / 27

Survey Procedures

1 Introduction

2 Example

3 Probability sampling

4 Survey Procedures

5 Survey Errors

Kim Ch. 1: Probability Sampling Fall, 2014 18 / 27

Survey Procedures

Outline of a survey

1 Define Target population: this is the population to which theconclusions apply.

2 Determine population characteristics of interest (e.g. acres of corn,daily intake of calcium)

3 Find sampling frame: device that associates elements by samplingunits (e.g. phone book)

4 Obtain a sample by a probability sampling design.

5 Measure the study variables.

6 Use measured values to compute point estimates and standard errors.

Kim Ch. 1: Probability Sampling Fall, 2014 19 / 27

Survey Procedures

Basic procedures for survey sampling

1 Planning1 Statement of objectives2 Selection of a sampling frame

2 Design and development1 Sample design2 Questionnaire design

3 Implementation1 Data collection2 Data capture and coding3 Editing and Imputation4 Estimation5 Data analysis6 Data dissemination

4 Evaluation - Documentation

Kim Ch. 1: Probability Sampling Fall, 2014 20 / 27

Survey Procedures

Two aspects of Survey Design

1 Sampling design: how to collect a sample ?

What is your target population ?What is your sampling frame ?What information is available from the sampling frame ?What is the cost for observing each unit in the sample ?

2 Questionnaire design: how to obtain measurement from the selectedsample ?

What is your research questions ?What are the variables that you want to measure from the sample ?How to ask the questions properly ?Is there any other auxiliary variables that you want to measure to usein the weighting or editing stage ?

Kim Ch. 1: Probability Sampling Fall, 2014 21 / 27

Survey Errors

1 Introduction

2 Example

3 Probability sampling

4 Survey Procedures

5 Survey Errors

Kim Ch. 1: Probability Sampling Fall, 2014 22 / 27

Survey Errors

Source of Errors

1 Errors of nonobservationCoverage error

(Target) population 6= FrameSome elements are not listed.

Sampling error

Frame 6= sampleSome listed elements not sampled.

Non-response error

sample 6= respondentsSome sampled elements don’t respond.

2 Errors of observation

Measurement error: Interviewer, respondent, instrument, modeProcessing error

Kim Ch. 1: Probability Sampling Fall, 2014 23 / 27

Survey Errors

Total survey errors

Kim Ch. 1: Probability Sampling Fall, 2014 24 / 27

Survey Errors

Sampling error:

If n = N, no sampling errorIn fact, if n ↑ then sampling error ↓.

Nonsampling error: Everything else

Even if n = N, you have nonsampling error.In practice, we can decrease nonsampling error by decreasing n.

Because of nonsampling error, a sample is often more accurate than aCensus.

Kim Ch. 1: Probability Sampling Fall, 2014 25 / 27

Survey Errors

Survey Methodology

Psychology (Cognitivescience), social science

More interested innon-sampling errors

By properly asking questions,we can reduce survey errors.

Questionnaire design, surveymanagement

Survey Statistics

Statistics

More interested in samplingerrors

Want to measure theuncertainty of survey errorsand incorporate them intoestimation.

Sampling design, estimation,editing etc.

Kim Ch. 1: Probability Sampling Fall, 2014 26 / 27

Survey Errors

Summary

The course is about the sampling error of θ1 What is it ?2 How to reduce it ?3 How to measure it ?

Kim Ch. 1: Probability Sampling Fall, 2014 27 / 27