Session 1A: Total survey quality and the fundamentals of scientific sampling
Juan Muñoz, Sistemas Integrales
Delhi, March 18, 2013
Please join Channel 41
[Diagram: The end product, built on Sampling, Data management, Tools and Field work]
Total quality in surveys
The integration of sampling design, questionnaires and tools, field work and data management with the goal of delivering analysts a reliable database on time
Data lose their value if they do not represent the reality of the day.
The stage and the actors
[Diagram: The stage and the actors. Actors: the implementer of the intervention program; the evaluator (national or international evaluation agency); the survey firm responsible for data collection. Roles shown: core team, project manager, financial manager, data manager, field operations manager, quality assurance, field teams, and the impact evaluation team of specialists in impact evaluation design and analysis.]
Survey methods: an overview
[Timetable: four days, Monday to Thursday, each with two morning and two afternoon sessions. Sessions shown:]
Total Quality
Scientific sampling
Exercise on scientific sampling
Complex samples and design effects
Exercise on complex samples
Sampling and Impact Evaluation
Statistical Power
Exercise on power calculations
Questionnaire design
Exercise on questionnaire design
Fieldwork and Quality Assurance
Exercise on Fieldwork
Survey schedules and budgets
Exercise on Quality Assurance
Documentation
The keys of quality
The end product of a survey is…
1. Field work
2. A database
3. Scientific sampling
4. Data management
5. The questionnaire
Total quality is important for
1. Demographic and Health Surveys (DHSs)
2. Randomized Control Trials (RCTs)
3. Living Standards Measurement Surveys (LSMSs)
4. Impact Evaluation Surveys (IEs)
5. All of the above
Elections and statistical wizardry
Random sampling
• In our examples, electors in the sample were chosen as in a lottery. This selection technique is called “Simple Random Sampling”.
• We shall soon discuss more complex techniques (stratified sampling, multi-stage sampling, etc.), but they will all be random techniques.
• In random sampling (a.k.a. probabilistic, or scientific sampling), each element in the population has a probability of being chosen that is
  – positive, and
  – known,
  but not necessarily the same for all.
• This is why random sampling permits the definition of margins of error and confidence intervals. Other sampling techniques (quota sampling, convenience sampling, etc.) cannot.
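A lottery-style draw of this kind can be sketched in Python with the standard library's `random.sample`, which returns n distinct elements with every subset of size n equally likely. The population of 17,125 numbered units and the function name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements as in a lottery (simple random sampling)."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population of 17,125 units, identified by number
population = list(range(1, 17_126))
n = 900
sample = simple_random_sample(population, n, seed=2013)

# Under simple random sampling every element has the same inclusion
# probability n / N, which is positive and known
inclusion_probability = n / len(population)
print(len(sample), round(inclusion_probability, 4))
```

In the more complex designs mentioned above, inclusion probabilities may differ between elements, but each one must still be positive and known.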
Only one sample
• In our examples, in order to see the distribution of estimations, we pretended that our researcher could go back in time and select several samples
• In actual practice, the researcher can only select one single sample
• Sampling theory gives formulas for the distribution of estimations on the basis of that unique sample
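The contrast between the thought experiment and actual practice can be simulated. This sketch assumes a hypothetical population of 17,125 units in which 31 percent have the trait of interest; it draws 1,000 samples of 900 to see the spread of the estimates, then compares that spread with the standard error the formula gives from a single sample.

```python
import math
import random
import statistics

# Hypothetical population: N = 17,125 units, 31% of which have the trait (coded 1)
N, n = 17_125, 900
n_with_trait = round(0.31 * N)
population = [1] * n_with_trait + [0] * (N - n_with_trait)
rng = random.Random(42)

# "Going back in time": draw many samples and measure the spread of the estimates
estimates = [sum(rng.sample(population, n)) / n for _ in range(1000)]
empirical_se = statistics.stdev(estimates)

# Actual practice: one single sample, plus the sampling-theory formula
one_sample = rng.sample(population, n)
p_hat = sum(one_sample) / n
formula_se = math.sqrt((1 - n / N) * p_hat * (1 - p_hat) / n)

print(f"spread over 1,000 samples: {empirical_se:.4f}")
print(f"formula from one sample:   {formula_se:.4f}")
```

Both numbers come out close to 0.015; the formula recovers the spread of the estimates without ever drawing more than one sample.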
The basic formula for the estimation of a prevalence in Simple Random Sampling
$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1-P)}{n}}$$
where e is the standard error, N the population size, P the prevalence and n the sample size.
What is the use of the standard error?
Example
• We want to evaluate the impact of a program that intends to reduce smoking among pregnant women
• To get baseline data, we took a Simple Random Sample of 900 of the 17,125 births recorded in the province in the year 2012
• 279 of the mothers in the sample reported having smoked during pregnancy
• What can we say about the prevalence of smoking among mothers in the province?
We can obviously estimate the prevalence of smoking as P = 279 / 900 = 0.31 (31 percent)
We can also estimate the standard error as
$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1-P)}{n}} = \sqrt{\left(1 - \frac{900}{17{,}125}\right)\frac{0.31 \times (1 - 0.31)}{900}}$$
e = 0.0150 (1.50 percent)
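The computation above can be reproduced with a few lines of Python. The function name `prevalence_standard_error` is hypothetical; the formula is the one given for Simple Random Sampling, with the finite population correction dropped when no population size is supplied.

```python
import math

def prevalence_standard_error(p, n, N=None):
    """Standard error of a prevalence under simple random sampling.

    e = sqrt((1 - n/N) * P * (1 - P) / n); if N is None the finite
    population correction is omitted (infinite population).
    """
    fpc = 1.0 if N is None else 1 - n / N
    return math.sqrt(fpc * p * (1 - p) / n)

p = 279 / 900                                   # estimated prevalence, 0.31
e = prevalence_standard_error(p, n=900, N=17_125)
print(f"P = {p:.2f}, e = {e:.4f}")              # P = 0.31, e = 0.0150
```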
Based on the standard error, we can compute confidence intervals
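Confidence intervals
[Chart: the estimate and its standard error on a prevalence axis running from 27 to 35 percent]
The prevalence is estimated as 31 percent with a standard error of 1.50 percent.
95% confidence interval: 31 ± 1.50 × 1.96 = 31 ± 2.94, i.e. from about 28.1 to 33.9 percent
99% confidence interval: 31 ± 1.50 × 2.58 = 31 ± 3.87, i.e. from about 27.1 to 34.9 percent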
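A minimal sketch of the interval computation, using the critical values quoted on the slide (1.96 for 95 percent and 2.58 for 99 percent) and the unrounded standard error of the smoking example.

```python
import math

p, n, N = 0.31, 900, 17_125
e = math.sqrt((1 - n / N) * p * (1 - p) / n)   # standard error, about 0.0150

# Interval = estimate plus or minus critical value times standard error
intervals = {}
for level, t in [("95%", 1.96), ("99%", 2.58)]:
    intervals[level] = (p - t * e, p + t * e)
    low, high = intervals[level]
    print(f"{level} confidence interval: {100 * low:.1f}% to {100 * high:.1f}%")
# 95% confidence interval: 28.1% to 33.9%
# 99% confidence interval: 27.1% to 34.9%
```

The higher the confidence level, the larger the critical value and the wider the interval.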
Effect of the population size
$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1-P)}{n}}$$
The factor (1 − n/N) is the finite population correction. In practice it is almost always so close to 1 that we can safely ignore it.
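To see how small the correction is, this sketch keeps n = 900 and P = 0.31 from the smoking example and varies the population size. Only N = 17,125 comes from the example; the larger population sizes are hypothetical.

```python
import math

n, p = 900, 0.31
se_no_fpc = math.sqrt(p * (1 - p) / n)          # as if N were infinite

factors = {}
for N in [17_125, 100_000, 1_000_000]:
    fpc = 1 - n / N                             # finite population correction
    factors[N] = fpc
    se = math.sqrt(fpc) * se_no_fpc
    print(f"N = {N:>9,}: 1 - n/N = {fpc:.4f}, e = {se:.4f} (vs {se_no_fpc:.4f} ignoring N)")
```

Even in the example, where the sample covers about 5 percent of the births, the correction lowers the standard error only from about 0.0154 to 0.0150.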
[Chart: sample size needed to achieve a given precision, plotted against the size of the population]
In practice, the size of the population has very little influence on the sample size needed to achieve a given precision
Effect of the sample size
[Chart: standard error plotted against sample size]
Because the standard error is proportional to 1/√n, to halve the standard error the sample size needs to be quadrupled.
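A quick numerical check of the quadrupling rule, ignoring the finite population correction and using P = 0.31 from the smoking example; `standard_error` is an illustrative helper name.

```python
import math

def standard_error(p, n):
    """Standard error of a prevalence, ignoring the finite population correction."""
    return math.sqrt(p * (1 - p) / n)

p = 0.31
for n in [225, 900, 3_600]:                     # each step quadruples n
    print(f"n = {n:>5}: e = {standard_error(p, n):.4f}")

ratio = standard_error(p, 3_600) / standard_error(p, 900)
print(f"quadrupling n multiplies e by {ratio:.2f}")   # 0.50
```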
Sampling error vs. non-sampling error
[Chart: sampling error, non-sampling error and total error plotted against sample size]
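$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{P(1-P)}{n}}$$
For an infinite population (N = ∞):
$$e = \sqrt{\frac{P(1-P)}{n}} \quad\Longrightarrow\quad n_\infty = \frac{P(1-P)}{e^2}$$
For a maximum error E at a given confidence level α:
$$n_\infty = \frac{t_\alpha^2\, P(1-P)}{E^2}$$
with t95% = 1.96, t99% = 2.58, etc.
For a population size N:
$$n_N = \frac{n_\infty}{1 + n_\infty / N}$$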
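These sample-size formulas translate directly into a small calculator. The sketch below is illustrative: the function name `sample_size`, the maximum error E = 0.03 and the rounding up to a whole unit are assumptions, not choices made in the session.

```python
import math

def sample_size(p, E, t=1.96, N=None):
    """Sample size to estimate a prevalence p within a maximum error E.

    n_inf = t^2 * p * (1 - p) / E^2; for a finite population of size N,
    n_N = n_inf / (1 + n_inf / N). The result is rounded up.
    """
    n_inf = t ** 2 * p * (1 - p) / E ** 2
    n = n_inf if N is None else n_inf / (1 + n_inf / N)
    return math.ceil(n)

# Smoking example: P = 0.31 and an illustrative maximum error E = 0.03
print(sample_size(0.31, 0.03))                      # infinite population
print(sample_size(0.31, 0.03, N=17_125))            # 17,125 births
print(sample_size(0.31, 0.03, t=2.58, N=17_125))    # 99% confidence

# Population size matters little once N is large (P = 0.5 is the worst case)
for N in [1_000, 10_000, 100_000, 1_000_000]:
    print(f"N = {N:>9,}: n = {sample_size(0.5, 0.03, N=N)}")
```

Using P = 0.5 when no prior estimate is available is a conservative choice, since P(1 − P) is largest at that value.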
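Estimating an average
For the estimation of an average, the standard error is
$$e = \sqrt{\left(1 - \frac{n}{N}\right)\frac{\sigma^2}{n}} \approx \frac{\sigma}{\sqrt{n}}$$
where σ is the standard deviation of the variable in the population.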
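In practice σ is unknown and is replaced by the sample standard deviation. This sketch assumes that substitution; the function name `mean_standard_error` and the household-size values are hypothetical, for illustration only.

```python
import math
import statistics

def mean_standard_error(values, N=None):
    """Estimated standard error of a sample mean under simple random sampling.

    Uses the sample variance s^2 (divisor n - 1) in place of sigma^2:
    e = sqrt((1 - n/N) * s^2 / n), which is about s / sqrt(n) when n/N is small.
    """
    n = len(values)
    s2 = statistics.variance(values)
    fpc = 1.0 if N is None else 1 - n / N
    return math.sqrt(fpc * s2 / n)

# Hypothetical sample of household sizes (illustrative data only)
household_sizes = [4, 6, 3, 5, 7, 2, 5, 4, 6, 8]
mean = statistics.mean(household_sizes)
e = mean_standard_error(household_sizes)
print(f"mean = {mean:.2f}, e = {e:.2f}")
```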
Which of these is the least important?
1. Non-sampling errors
2. The finite population correction
3. Sampling errors
4. Field work
For the group sessions
• After the coffee break, proceed to your group’s room and turn your clicker to the corresponding channel, as follows:
  – Group 1: Agra hall, Channel 41
  – Group 2: Jaipur hall, Channel 42
  – Group 3: Mumbai hall, Channel 43
  – Group 4: Varanasi hall, Channel 44
• Have your calculator at hand.
• Copy the Excel workbook Simulators\Lesson1.xlsm to your computer.
• Enable Excel macros, following the instructions in Simulators\Authorize_Excel_Macros.doc.
• In case of difficulties, consult your group’s facilitators during the coffee break, before the sessions start in 30 minutes.
• If you don’t have a computer, sit next to a colleague who has one.