Session 1A - World Bankpubdocs.worldbank.org/pubdocs/publicdoc/2016/5/... · Session 1A: Total...

Session 1A: Total survey quality and the fundamentals of scientific sampling

Juan Muñoz Sistemas Integrales Delhi, March 18, 2013

Please join Channel 41

The end product

Sampling

Data management

Tools

Field work


Total quality in surveys

The integration of sampling design, questionnaires and tools, field work and data management with the goal of delivering analysts a reliable database on time

Data lose their value if they don’t represent the reality of the day

The implementer of the intervention program

The evaluator (National or International Evaluation Agency)

The survey firm responsible for data collection

Core team

The stage and the actors


Financial manager

Field teams

Project manager

Data manager

Specialists in Impact Evaluation Design and Analysis

Field operations manager

Quality Assurance

Impact Evaluation team

Survey methods: an overview

Morning 1

Morning 2

Afternoon 1

Afternoon 2

Monday Tuesday Wednesday Thursday

Total Quality

Scientific sampling

Exercise on scientific sampling

Complex samples and design effects

Exercise on complex samples

Sampling and Impact Evaluation

Statistical Power

Exercise on power calculations

Questionnaire design

Exercise on questionnaire design

Fieldwork and Quality Assurance

Exercise on Fieldwork

Survey schedules and budgets

Exercise on Quality Assurance

Documentation

The keys of quality

The end product of a survey is…

1. Field work

2. A database

3. Scientific sampling

4. Data management

5. The questionnaire

Total quality is important for

1. Demographic and Health Surveys (DHSs)

2. Randomized Control Trials (RCTs)

3. Living Standards Measurement Surveys (LSMSs)

4. Impact Evaluation Surveys (IEs)

5. All of the above

Elections and statistical wizardry


Random sampling

• In our examples, electors in the sample were chosen as in a lottery. This selection technique is called “Simple Random Sampling”

• We shall soon discuss more complex techniques (stratified sampling, multi-stage sampling, etc.), but they will all be random techniques.

• In random sampling (a.k.a. probabilistic, or scientific sampling), each element in the population has a probability of being chosen that is
– positive, and
– known (but not necessarily the same for all elements)

• This is why random sampling permits the definition of margins of error and confidence intervals. Other sampling techniques (quota sampling, convenience sampling, etc.) do not
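The lottery idea behind Simple Random Sampling can be sketched in a few lines of Python. This is only an illustration: the register of 10,000 electors, the sample of 500 and the function name `simple_random_sample` are hypothetical and do not come from the slides.

```python
import random


def simple_random_sample(population_ids, n, seed=None):
    """Draw n distinct elements, as in a lottery without replacement."""
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    return rng.sample(population_ids, n)


# Hypothetical register of 10,000 electors, identified by number.
electors = list(range(1, 10_001))
sample = simple_random_sample(electors, 500, seed=1)

# In Simple Random Sampling every elector has the same chance of selection,
# and that chance is positive and known: n / N.
inclusion_probability = 500 / len(electors)
print(len(sample), inclusion_probability)
```

In more complex designs the selection probability may differ between elements, but it must still be positive and known for each of them.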

Only one sample

• In our examples, in order to see the distribution of estimations, we pretended that our researcher could go back in time and select several samples

• In actual practice, the researcher can only select one single sample

• Sampling theory gives formulas for the distribution of estimations on the basis of that unique sample
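A small simulation can make the point concrete. The sketch below plays the researcher who goes back in time: it draws many samples from a hypothetical population (10,000 people, 3,000 of them with the characteristic of interest), then compares the spread of those estimates with the standard error computed by the Simple Random Sampling formula from one single sample. All numbers and names are assumptions made for the illustration.

```python
import math
import random
import statistics


def srs_standard_error(p, n, N):
    """Standard error of a prevalence under Simple Random Sampling."""
    return math.sqrt((1 - n / N) * p * (1 - p) / n)


rng = random.Random(0)
N, n = 10_000, 400
n_positive = 3_000  # hypothetical: 30 percent of the population
population = [1] * n_positive + [0] * (N - n_positive)

# "Going back in time": 2,000 independent samples of the same size.
estimates = [sum(rng.sample(population, n)) / n for _ in range(2_000)]
empirical_sd = statistics.pstdev(estimates)

# Actual practice: only the first sample is available.
one_sample_p = estimates[0]
formula_se = srs_standard_error(one_sample_p, n, N)

print(f"spread over repeated samples: {empirical_sd:.4f}")
print(f"formula from a single sample: {formula_se:.4f}")
```

The two figures should be close, which is what allows the formula to stand in for the repeated samples that cannot be drawn in practice.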


The basic formula for the estimation of a prevalence in Simple Random Sampling

$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1 - P)}{n}}$$

where e is the standard error, N the population size, P the prevalence and n the sample size.

What is the use of the standard error?


Example

• We want to evaluate the impact of a program that intends to reduce smoking among pregnant women

• To get baseline data, we took a Simple Random Sample of 900 of the 17,125 births recorded in the province in the year 2012

• 279 of the mothers in the sample reported having smoked during pregnancy

• What can we say about the prevalence of smoking among mothers in the province?


We can obviously estimate the prevalence of smoking as P = 279 / 900 = 0.31 (31 percent)

We can also estimate the standard error as

$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1 - P)}{n}} = \sqrt{\left(1 - \frac{900}{17{,}125}\right)\frac{0.31 \times (1 - 0.31)}{900}}$$

e = 0.0150 (1.50 percent)
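The same arithmetic can be checked with a short Python sketch. The function name `srs_standard_error` is only illustrative; the figures are those of the smoking example.

```python
import math


def srs_standard_error(p, n, N):
    """Standard error of a prevalence p estimated from a Simple Random
    Sample of size n drawn from a population of size N."""
    fpc = 1 - n / N  # finite population correction
    return math.sqrt(fpc * p * (1 - p) / n)


p_hat = 279 / 900  # estimated prevalence
se = srs_standard_error(p_hat, n=900, N=17_125)
print(f"P = {p_hat:.2f}, e = {se:.4f}")  # P = 0.31, e = 0.0150
```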

Based on the standard error, we can compute confidence intervals

Confidence intervals

The prevalence is estimated as 31 percent with a standard error of 1.50 percent

[Figure: distribution of the estimate on a prevalence axis running from 27 to 35 percent, with the standard error marked]

95% confidence interval: 31 ± 1.50 × 1.96

99% confidence interval: 31 ± 1.50 × 2.58
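As a sketch, the two intervals can be worked out in Python. The helper name `confidence_interval` is an assumption made for the illustration; the multipliers 1.96 and 2.58 are the ones given on the slide.

```python
def confidence_interval(estimate, standard_error, t):
    """Return (lower, upper) as estimate plus or minus t standard errors."""
    margin = t * standard_error
    return estimate - margin, estimate + margin


# Figures in percent, taken from the smoking example.
for level, t in (("95%", 1.96), ("99%", 2.58)):
    low, high = confidence_interval(31.0, 1.50, t)
    print(f"{level} confidence interval: {low:.2f} to {high:.2f} percent")
# 95% confidence interval: 28.06 to 33.94 percent
# 99% confidence interval: 27.13 to 34.87 percent
```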


Effect of the population size

$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1 - P)}{n}}$$

The factor (1 − n/N) is the finite population correction. In practice this is almost always so close to 1 that we can safely ignore it

Effect of the population size

[Figure: sample size needed to achieve a given precision, plotted against the size of the population]

In practice, the size of the population has very little influence on the sample size needed to achieve a given precision
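The same point can be seen from the other side: with the sample size held fixed, the standard error barely moves as the population grows. The sketch below reuses the figures of the smoking example (P = 0.31, n = 900); the larger population sizes are hypothetical.

```python
import math


def srs_standard_error(p, n, N):
    """Standard error of a prevalence under Simple Random Sampling."""
    return math.sqrt((1 - n / N) * p * (1 - p) / n)


# Same sample (P = 0.31, n = 900), increasingly large populations.
population_sizes = (17_125, 100_000, 1_000_000, 100_000_000)
errors = [srs_standard_error(0.31, 900, N) for N in population_sizes]

for N, e in zip(population_sizes, errors):
    print(f"N = {N:>11,}: e = {100 * e:.2f} percent")
```

Going from a province of 17,125 births to a population of one hundred million changes the standard error by only a few hundredths of a percentage point.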

Effect of the sample size

[Figure: standard error plotted against sample size]

To halve the standard error, the sample size needs to be quadrupled
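A quick numerical check of this rule, as a sketch that ignores the finite population correction (an assumption made to keep the relationship exact) and uses illustrative values P = 0.31 and n = 900:

```python
import math


def se_large_population(p, n):
    """Standard error of a prevalence, finite population correction ignored."""
    return math.sqrt(p * (1 - p) / n)


base = se_large_population(0.31, 900)
quadrupled = se_large_population(0.31, 4 * 900)
ratio = quadrupled / base

print(f"n =  900: e = {base:.4f}")
print(f"n = 3600: e = {quadrupled:.4f}")
print(f"ratio = {ratio:.2f}")  # 0.50, because e is proportional to 1/sqrt(n)
```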

Sampling error vs non sampling error

[Figure: sampling error, non-sampling error and total error plotted against sample size]

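$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1 - P)}{n}}$$

For an infinite population (N=∞)

$$e = \sqrt{\frac{P(1 - P)}{n}} \qquad\Rightarrow\qquad n_\infty = \frac{P(1 - P)}{e^2}$$

For a maximum error E at a given confidence level α

$$n_\infty = \frac{t_\alpha^2 \, P(1 - P)}{E^2}$$

With t95%=1.96, t99%=2.58, etc.

For a population size N

$$n_N = \frac{n_\infty}{1 + n_\infty / N}$$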
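These sample size formulas translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the inputs (P = 0.5, a maximum error of 5 percentage points at 95% confidence, and the province of 17,125 births as the population) are assumptions chosen for the example.

```python
import math


def sample_size_infinite(p, max_error, t=1.96):
    """Sample size for an infinite population: t^2 * P(1 - P) / E^2."""
    return t ** 2 * p * (1 - p) / max_error ** 2


def sample_size_finite(n_infinite, N):
    """Adjust the infinite-population sample size for a population of size N."""
    return n_infinite / (1 + n_infinite / N)


n_inf = sample_size_infinite(p=0.5, max_error=0.05, t=1.96)
n_N = sample_size_finite(n_inf, N=17_125)

# Round up, since a fraction of an interview cannot be conducted.
print(math.ceil(n_inf), math.ceil(n_N))  # 385 376
```

The adjustment for a population of 17,125 saves fewer than ten interviews, in line with the earlier remark that population size has very little influence on the sample size needed.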

Estimating an average

For the estimation of an average

$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{\sigma^2}{n}} \approx \frac{\sigma}{\sqrt{n}}$$
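A minimal sketch of the formula for an average. Two assumptions are made for the illustration: σ² is replaced by its estimate from the sample, and the eight birth weights (in kilograms) are invented values, not data from the slides.

```python
import math
import statistics


def mean_standard_error(values, N=None):
    """Standard error of a sample average under Simple Random Sampling.

    sigma^2 is estimated from the sample itself (an assumption of this
    sketch). If the population size N is omitted, the finite population
    correction is ignored.
    """
    n = len(values)
    variance = statistics.variance(values)  # sample estimate of sigma^2
    fpc = 1 - n / N if N is not None else 1.0
    return math.sqrt(fpc * variance / n)


# Hypothetical birth weights in kilograms.
weights = [3.1, 2.9, 3.4, 3.0, 3.6, 2.8, 3.2, 3.3]

approx = mean_standard_error(weights)           # sigma / sqrt(n)
exact = mean_standard_error(weights, N=17_125)  # with the correction
print(f"average = {statistics.mean(weights):.3f}")
print(f"e (approx.) = {approx:.4f}, e (with correction) = {exact:.4f}")
```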

Which of these is least important?

1. Non sampling errors

2. The finite population correction

3. Sampling errors

4. Field work

For the group sessions

• After the coffee break, proceed to your group’s room and turn your clicker to the corresponding channel, as follows:
– Group 1 Agra hall Channel 41
– Group 2 Jaipur hall Channel 42
– Group 3 Mumbai hall Channel 43
– Group 4 Varanasi hall Channel 44

• Have your calculator at hand

• Copy to your computer the Excel workbook
– Simulators\Lesson1.xlsm

• Authorize Excel macros, according to the instructions at
– Simulators\Authorize_Excel_Macros.doc

• In case of difficulties consult your group’s animators during the coffee break, before the sessions start, in 30 minutes

• If you don’t have a computer, sit next to a colleague who has one
