Chapter 1: Probability Sampling - Iowa State...

Chapter 1: Probability Sampling

Jae-kwang Kim

Fall, 2014

Introduction

1 Introduction

2 Example

3 Probability sampling

4 Survey Procedures

5 Survey Errors

Kim Ch. 1: Probability Sampling Fall, 2014 2 / 27

Introduction

What is sampling?

Population

Finite PopulationInfinite Population

Sample : Subset of a population

Sampling : Make statistical inference about the (finite) populationwithout measuring the whole population

Sampling error : the error that results from taking a sample instead ofexamining the whole population


Introduction

Why sampling ?

1 To reduce the cost

2 To save the time

3 Sometimes, to get a more accurate information about the population.

4 Sometimes, it is the only way of getting information about the targetpopulation.


Introduction

How to do the sampling ?

Two types of sampling

Probability samplingNon-probability sampling

Roughly speaking, a probability rule is assigned to obtain a sample inprobability sampling.


Example

1 Introduction

2 Example


4 Survey Procedures

5 Survey Errors


Example

Let’s look at an artificial finite population (of size N = 4).

ID Size of farms yield(Acres) (y)

1 4 12 6 33 6 54 20 15

Parameter of interest: Mean yield of the farms in the population

θ = (y1 + y2 + y3 + y4)/4

The value of y is obtained only from the sampled units.


Example

Instead of observing N = 4 farms, we want to select a sample of sizen = 2.

6 possible samples

case sample ID sample mean sampling error

1 1, 2 22 1, 3 33 1, 4 84 2, 3 45 2, 4 96 3, 4 10

Each sample has a sampling error.

Two ways of selecting one of the six possible samples.

Nonprobability sampling : (using size of farms or etc.) select a samplesubjectively.Probability sampling : select a sample by a probability rule.


Example

Probability sampling

Simple Random Sampling : Assign the same selection probability toall possible samples

case sample ID sample sampling selectionmean (y) error probability

1 1, 2 2 -4 1/62 1, 3 3 -3 1/63 1, 4 8 2 1/64 2, 3 4 -2 1/65 2, 4 9 3 1/66 3, 4 10 4 1/6

In this case, the sample mean(y) has a discrete probabilitydistribution.


Example

Probability mass function of y :

Py (y) =

{1/6 if y ∈ {2, 3, 4, 8, 9, 10}0 otherwise.

Unbiased

Variance


Example

Remark

No model assumption about yi in the example: totally differentframework !

Design-based approach: the reference distribution is the samplingdistribution generated by the repeated application of the givensampling mechanism.



1 Introduction

2 Example


4 Survey Procedures

5 Survey Errors



Definition & Notation

U = {1, 2, · · · ,N} : index set of finite population

A : subset of U, index set of the sample.

A: set of samples under consideration, sample support.

θ = θ(yi ; i ∈ A) : statistic



1 Probability distribution of samples, or sample distribution: probabilitymass function P (·) defined on A. That is, P (·) satisfies

1 P (A) ∈ [0, 1] , ∀A ∈ A2∑

A∈A P (A) = 1.

2 (Induced) probability distribution of a statistic

1. Expectation : E (θ) =∑

A∈A P(A)θ (A)

2. Variance : Var(θ) =∑

A∈A P(A)[θ (A)− E (θ)

]2

3. Mean squared error :

MSE (θ) =∑A∈A

P(A)[θ (A)− θ

]2

= Var(θ)

+[E (θ)− θ

]2

MSE = Variance + (Bias)2



Probability Sampling

Definition : For each element in the population, the probability thatthe element is included in the sample is known and greater than 0.

Advantages1 Exclude subjectivity of selecting samples.2 Remove sampling bias (or selection bias)

What is sampling bias ? ( θ : true value, θ: estimated value of θ)

(sampling) error of θ = θ − θ

={θ − E

(θ)}

+{E(θ)− θ}

= variation + bias

In nonprobability sampling, variation is 0 but there is a bias. Inprobability sampling, there exists variation but bias is 0.



Probability Sampling

Main theory1 Law of Large Numbers : θ converges to E (θ) for sufficiently large

sample size.2 Central Limit Theorem : θ follows a normal distribution for sufficiently

large sample size.

Additional advantages of probability sampling with large sample :1 Improve the precision of an estimator2 Can compute confidence intervals or test statistical hypotheses.

With the same sample size, we may have different precision.



Example - Continued : Probability sampling 2

Unequal probability sampling

Sample ID y value Mean Estimator Selection probability1, 4 1, 15 4.5 1/32, 4 3, 15 6 1/33, 4 5, 15 7.5 1/3

What is the probability distribution of the mean estimator ?

What is the expected value of the sampling error ?

E (y) =1

3(4.5 + 6 + 7.5) = 6.0

Compute the variance. Compare it with that of SRS.


Survey Procedures

1 Introduction

2 Example


4 Survey Procedures

5 Survey Errors


Survey Procedures

Outline of a survey

1 Define Target population: this is the population to which theconclusions apply.

2 Determine population characteristics of interest (e.g. acres of corn,daily intake of calcium)

3 Find sampling frame: device that associates elements by samplingunits (e.g. phone book)

4 Obtain a sample by a probability sampling design.

5 Measure the study variables.

6 Use measured values to compute point estimates and standard errors.


Survey Procedures

Basic procedures for survey sampling

1 Planning1 Statement of objectives2 Selection of a sampling frame

2 Design and development1 Sample design2 Questionnaire design

3 Implementation1 Data collection2 Data capture and coding3 Editing and Imputation4 Estimation5 Data analysis6 Data dissemination

4 Evaluation - Documentation


Survey Procedures

Two aspects of Survey Design

1 Sampling design: how to collect a sample ?

What is your target population ?What is your sampling frame ?What information is available from the sampling frame ?What is the cost for observing each unit in the sample ?

2 Questionnaire design: how to obtain measurement from the selectedsample ?

What is your research questions ?What are the variables that you want to measure from the sample ?How to ask the questions properly ?Is there any other auxiliary variables that you want to measure to usein the weighting or editing stage ?


Survey Errors

1 Introduction

2 Example


4 Survey Procedures

5 Survey Errors


Survey Errors

Source of Errors

1 Errors of nonobservationCoverage error

(Target) population 6= FrameSome elements are not listed.

Sampling error

Frame 6= sampleSome listed elements not sampled.

Non-response error

sample 6= respondentsSome sampled elements don’t respond.

2 Errors of observation

Measurement error: Interviewer, respondent, instrument, modeProcessing error


Survey Errors

Total survey errors


Survey Errors

Sampling error:

If n = N, no sampling errorIn fact, if n ↑ then sampling error ↓.

Nonsampling error: Everything else

Even if n = N, you have nonsampling error.In practice, we can decrease nonsampling error by decreasing n.

Because of nonsampling error, a sample is often more accurate than aCensus.


Survey Errors

Survey Methodology

Psychology (Cognitivescience), social science

More interested innon-sampling errors

By properly asking questions,we can reduce survey errors.

Questionnaire design, surveymanagement

Survey Statistics

Statistics

More interested in samplingerrors

Want to measure theuncertainty of survey errorsand incorporate them intoestimation.

Sampling design, estimation,editing etc.


Survey Errors

Summary

The course is about the sampling error of θ1 What is it ?2 How to reduce it ?3 How to measure it ?


Chapter 1: Probability Sampling - Iowa State...

Documents

Transcript of Chapter 1: Probability Sampling - Iowa State...