Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Integrating GPS and self-reported measures of land area in household surveys. Alberto Zezza, Development Data Group, The World Bank. RuLIS Expert Consultation, FAO Headquarters, Rome, November 8, 2016

Page 1: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Integrating GPS and self-reported measures of land area in household surveys

Alberto Zezza Development Data Group, The World Bank

RuLIS Expert Consultation FAO Headquarters - Rome – November 8, 2016

Page 2: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

worldbank.org/lsms RuLIS, 8 November 2016

Background: LSMS Methodological Research

• Broad scope of LSMS methodological research since 2005
• Agricultural productivity measurement: LSMS Methodological Validation Program (MVP) - UK Aid, partnerships with the (FAO) Global Strategy to Improve Agricultural & Rural Statistics

• Approach:
  – Test (old & new) methods in tandem with a gold standard
  – Assess relative accuracy & scale-up feasibility
  – Cost effectiveness, skill & training requirements, respondent burden
  – Document results, best practices & protocols for scale-up (guidelines)
  – Integrate validated & cost-effective methods into LSMS operations

• Today’s focus: GPS and self-reported land area measures in Household surveys

Page 3: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Motivation

Land area is critical in:
• Measuring productivity
• Assessing farmer wealth
• Designing titling/registration schemes
• Anything agriculture…

> 70% of the world’s poor reside in rural areas*

*Source: IFAD, Rural Poverty Report 2011

Page 4: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Measuring Land Area: Methodological Options

Farmer self-reported estimate
PROS
- Inexpensive
- Less missingness
CONS
- Subjective
- Complicated by traditional units
- Potential ulterior motives

Compass and rope (aka traversing)
PROS
- Traditional gold standard for accuracy
- Eliminates subjectivity
CONS
- Time/labor intensive (leading to higher costs)
- Requires travel to plot

GPS
PROS
- Significantly quicker than traversing, with the advantages of objective measurement
CONS
- Questions of accuracy on small plots (?)
- Requires travel to plot

Remote Sensing (?)
PROS
- Potential to eliminate plot visits
CONS
- Resolution limitations
- Feasibility of boundary identification

3 Methodological experiments: Ethiopia (n=1798), Tanzania (n=1945), Nigeria (n=494) – Total N=4237

Page 5: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Comparison of Methods (National Surveys): Subjective vs. Objective

Subjective farmer self-reported estimates are potentially sensitive to:
• Respondent characteristics
• Perceived use of the data (taxation, program eligibility)
• Traditional/local units of measurement
• Rounding

• Large errors
• Systematic biases

Source: Carletto, Savastano, Zezza (2013). “Fact or Artifact: the Impact of Measurement Errors on the Farm size - Productivity Relationship”, Journal of Development Economics.

Page 6: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

GPS vs. Compass & Rope vs. Subjective

[Figure: panels for Ethiopia, Tanzania and Nigeria plotting CR area (vertical axis) against GPS area and against SR area (horizontal axes).]

Correlation between GPS and CR measurements: 0.997 (about 0.5 between SR and CR)

Page 7: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Average Measurement Duration

[Figure: average measurement duration in minutes, GPS vs. CR, for Ethiopia, Tanzania and Total; axis label "Plot Size Level (CR) & Minutes".]

• Ethiopia:
  – GPS = 13.7 minutes
  – CR = 56.8 minutes
• Tanzania:
  – GPS = 7.4 minutes
  – CR = 29.3 minutes

GPS much, much faster (cheaper) than CR
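
To put "much, much faster" into numbers, the short sketch below divides the CR duration by the GPS duration for each country, using only the four averages reported above. The dictionary layout and the function name `cr_to_gps_ratio` are illustrative choices, not part of the original analysis.

```python
# Average measurement durations in minutes, as reported on this slide.
durations = {
    "Ethiopia": {"GPS": 13.7, "CR": 56.8},
    "Tanzania": {"GPS": 7.4, "CR": 29.3},
}

def cr_to_gps_ratio(country):
    """How many times longer compass-and-rope takes than GPS, on average."""
    d = durations[country]
    return d["CR"] / d["GPS"]

# Round to one decimal for reporting.
ratios = {country: round(cr_to_gps_ratio(country), 1) for country in durations}
print(ratios)
```

In both countries the ratio comes out at roughly four, so a CR measurement takes about four times as long as a GPS measurement on average.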

Page 8: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

So GPS is the way to go, except…
• Collecting GPS-based land areas not always feasible: field work protocols, lack of physical access, refusals
• Substantial presence of missing values (up to 30 percent or more): empirical implications unclear

Survey | Rate of Missingness | Required Spatial Coverage of GPS-Based Plot Area Measurements
Niger Enquête Nationale sur les Conditions de Vie des Ménages et l’Agriculture 2011 | 29% | Measure all plots in the same enumeration area as the household.
Nigeria General Household Survey - Panel 2012/2013 | 13% | Measure all plots in the same district of the household and within 3 hours of travel, regardless of mode of transportation.
Tanzania National Panel Survey 2010/2011 | 22% | Measure all plots within 1 hour of travel from the household, regardless of mode of transportation.
Uganda National Panel Survey 2011/2012 | 44% | Measure all plots in the same enumeration area as the household.

Page 9: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Non-randomness in missing GPS-based plot areas

TZNPS 2010/11
Variable | Entire Sample | W/ GPS | W/o GPS
Observations | 4,142 | 3,383 (82%) | 759 (18%)
GPS-Based Plot Area (Acres) | 2.59 | 2.59 | --
Farmer-Reported Plot Area (Acres) | 2.31 | 2.30 | 2.35
Distance to Home (KM) | 3.74 | 1.95 | 13.92 ***
Distance to Road (KM) | 2.18 | 1.62 | 5.39 ***
Rented/Other † | 0.12 | 0.09 | 0.25 ***
# of Plots in Holding | 3.09 | 3.08 | 3.15 ***
Mover Original HH † | 0.06 | 0.05 | 0.09 ***
Split-Off HH † | 0.09 | 0.08 | 0.15 ***
Wealth Index (2008/09) | -1.06 | -1.09 | -0.88 ***

UNPS 2009/10
Variable | Entire Sample | W/ GPS | W/o GPS
Observations | 4,333 | 2,814 (65%) | 1,519 (35%)
GPS-Based Plot Area (Acres) | 2.13 | 2.13 | --
Farmer-Reported Plot Area (Acres) | 2.05 | 2.00 | 2.12
Less Than 15 Mins Away from HH † | 0.62 | 0.80 | 0.31 ***
30+ Mins Away from HH † | 0.22 | 0.06 | 0.48 ***
Rented/Other † | 0.26 | 0.14 | 0.46 ***
# of Plots in Holding | 3.31 | 3.17 | 3.54 ***
Mover Original HH † | 0.04 | 0.01 | 0.09 ***
Split-Off HH † | 0.13 | 0.06 | 0.25 ***
Wealth Index (2005/06) | -0.66 | -0.77 | -0.47 ***

Note (both tables): Results from tests of mean differences reported. *** p<0.01, ** p<0.05, * p<0.1. Statistics weighted through the use of household sampling weights. † denotes a dummy variable.

Page 10: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Multiple Imputation (MI): Background
• MI originally proposed to handle missing data in public use files from censuses and sample household surveys (Rubin, 1977)
• Uses the distribution of observed data to estimate plausible values for missing data, incorporating random, imputation-related components to reflect uncertainty (Rubin, 1987)
• Superior to casewise deletion & conditional mean imputation, which are known to understate true variance (Schafer & Graham, 2002)
• Key assumption: Missing At Random (MAR) conditional on observables; plausibility depends on the nature & sources of missing data

Our Approach:
• 50 imputations of GPS-based plot area, using predictive mean matching (PMM) with 5 neighbors
• Robustness checks: number of imputations (m), number of neighbors, bootstrapping, PMM vs. OLS
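
As an illustration of what "50 imputations using PMM with 5 neighbors" involves, here is a minimal sketch of predictive mean matching in plain Python. It is not the authors' implementation. It assumes a single predictor (the self-reported area), draws coefficient uncertainty by bootstrapping the observed plots, and runs on synthetic data. The function names `fit_ols` and `pmm_impute` and all demo values are hypothetical.

```python
import random

def fit_ols(x, y):
    """One-predictor least squares; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # guard against a degenerate resample
    return my - slope * mx, slope

def pmm_impute(sr, gps, k=5, m=50, seed=0):
    """Return m completed copies of gps, where None marks a missing value."""
    rng = random.Random(seed)
    obs = [i for i, g in enumerate(gps) if g is not None]
    mis = [i for i, g in enumerate(gps) if g is None]
    completed = []
    for _ in range(m):
        # Assumption: bootstrap the observed plots so that each imputation
        # is based on a different set of regression coefficients.
        boot = [rng.choice(obs) for _ in obs]
        a, b = fit_ols([sr[i] for i in boot], [gps[i] for i in boot])
        filled = list(gps)
        for j in mis:
            target = a + b * sr[j]
            # The k observed plots whose predicted area is closest to the target.
            donors = sorted(obs, key=lambda i: abs(a + b * sr[i] - target))[:k]
            # Borrow the observed GPS area of one randomly chosen donor.
            filled[j] = gps[rng.choice(donors)]
        completed.append(filled)
    return completed

# Synthetic demo data (not survey data): GPS area loosely tracks SR area.
demo_rng = random.Random(1)
sr_area = [round(demo_rng.uniform(0.2, 5.0), 2) for _ in range(200)]
gps_area = [0.9 * s + demo_rng.gauss(0, 0.3) for s in sr_area]
gps_with_gaps = [None if i % 4 == 0 else g for i, g in enumerate(gps_area)]

imputations = pmm_impute(sr_area, gps_with_gaps)
print(len(imputations), "completed data sets")
```

Because PMM borrows observed values from donors, every imputed area is a value actually measured on some other plot, which keeps imputations within the observed range.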

Page 11: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Multiple Imputation (MI) model

Selected OLS Regression Results Underlying Multiple Imputation
Dependent Variable = GPS-Based Plot Area (Acres)

Variable | UNPS 2009/10 | TZNPS 2010/11
Farmer-Reported Plot Area (Acres) | 0.945*** | 0.866***
Log [Value of Plot Output] | 0.023 | 0.056***
Log [Value of Plot Input] | 0.027** | 0.032***
# of Plots in Holding | -0.141*** | -0.094**
District & Enumerator Fixed Effects | YES | YES
Observations | 2,814 | 3,363
R2 | 0.658 | 0.688

Page 12: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Empirical Approach for MI validation

• Create artificial missing(ness) in GPS-based plot areas
• Conduct MI based on each unique data set under a specific simulated degree of missing observations beyond the two different distance thresholds (SR: key predictor)
• Compare the distributions of plot area and plot-level agricultural productivity (imputed vs. observed) for the same plots
• Identify the missing(ness) threshold beyond which MI yields imputed distributions that are statistically different from the observed distributions

• MI reliably predicting missing GPS-based plot areas in surveys

Presentation Notes
• Create artificial missing(ness) in GPS-based plot areas at random above operationally relevant thresholds, namely a distance of greater than 500 meters or 1 kilometer from the dwelling unit.
• Conduct MI based on each unique data set under a specific simulated degree of missing observations beyond the two different distance thresholds (SR: key predictor).
• Compare the distributions of plot area and plot-level agricultural productivity based on the imputed GPS-based plot areas with the distributions of the same variables based on the observed area measures for the same plots, using the Kolmogorov-Smirnov (K-S) test.
• Identify the missing(ness) threshold beyond which MI yields imputed distributions that are statistically different from the observed distributions. The critical missing rate is the lowest rate of plot area missing(ness) at which at least 1 of the 50 imputed variables is statistically different from the observed variable under the K-S test at the 5% level.
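
The comparison step can be sketched as follows. This is a rough illustration under stated assumptions, not the code used in the study. It computes a two-sample K-S statistic by hand, uses a standard asymptotic approximation for the p-value (in practice a library routine such as `scipy.stats.ks_2samp` would normally be used), and picks the critical missing rate as the lowest simulated rate at which at least one imputation is rejected at the 5% level. All function names and demo numbers are hypothetical.

```python
import math

def ks_2samp(a, b):
    """Two-sample K-S statistic D and an asymptotic p-value approximation."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))  # gap between the two empirical CDFs
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        return d, 1.0  # the series below converges poorly here; p is ~1
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

def count_rejections(observed, imputations, alpha=0.05):
    """Number of imputed vectors whose distribution differs from the observed one."""
    return sum(1 for imp in imputations if ks_2samp(observed, imp)[1] < alpha)

def critical_missing_rate(rejections_by_rate):
    """Lowest simulated missing rate with at least one rejected imputation."""
    failing = [rate for rate, n_rej in rejections_by_rate.items() if n_rej >= 1]
    return min(failing) if failing else None

# Toy check with made-up values: one imputation close to the truth, one far off.
observed = [float(i) for i in range(1, 101)]
close_imputation = [v + 0.5 for v in observed]
far_imputation = [v + 50.0 for v in observed]
n_rejected = count_rejections(observed, [close_imputation, far_imputation])

# Hypothetical rejection counts per simulated missing rate (in percent).
example_rate = critical_missing_rate({10: 0, 20: 0, 30: 2, 40: 5})
print(n_rejected, example_rate)
```

In a full run, `rejections_by_rate` would be filled by repeating the mask, impute and compare cycle at each simulated missing rate, with 50 imputations per rate.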
Page 13: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Assessing the tolerable rate of missing(ness) for use of MI

[Figure: four panels, Malawi (1km threshold), Malawi (500m threshold), Ethiopia (1km threshold) and Ethiopia (500m threshold). Vertical axis: # imputations statistically identical to the 'truth' (0 to 50). Horizontal axis: % of plots missing beyond threshold (0 to 100).]

Tolerable rates of plot area missingness

Country | Threshold | Plot Area: tolerable rate (%) | Yield (land productivity): tolerable rate (%)
Malawi | 1.0 km | 93 (26) | 82 (23)
Malawi | 500 m | 52 (24) | 45 (21)
Ethiopia | 1.0 km | 73 (20) | 56 (13)
Ethiopia | 500 m | 48 (18) | 36 (15)

*overall missing(ness) in parentheses

Page 14: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

Concluding Thoughts
• Clear evidence of systematic bias in farmer self-reported area estimates
• GPS serves as a time- and cost-efficient substitute for CR (in most cases)
• GPS + SR: When GPS measurements are missing, impute them using the self-reported area estimates
• Imputing missing GPS-based plot areas has clear implications for policy-relevant productivity analysis
• Use of MI to compute mean statistics is empirically validated by our work under MAR
• Critical rates of missing(ness) that MI can overcome are context-specific and can be used to efficiently plan survey operations
• RuLIS: Distribute one land area variable with notation on whether SR or GPS+MI?
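
For the point on computing mean statistics from multiply imputed data, the sketch below applies Rubin's combining rules: the pooled estimate is the average of the per-imputation estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The function name `pool_rubin` and the three example values are illustrative assumptions, not results from the surveys.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances (Rubin's rules)."""
    m = len(estimates)
    pooled = statistics.mean(estimates)        # average point estimate
    within = statistics.mean(variances)        # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total

# Made-up example: mean plot area (acres) from three imputed data sets.
example_means = [2.0, 2.2, 2.4]
example_variances = [0.01, 0.01, 0.01]
pooled_mean, total_variance = pool_rubin(example_means, example_variances)
print(round(pooled_mean, 3), round(total_variance, 4))
```

The between-imputation term is what single imputation leaves out, which is why single-imputation standard errors tend to be too small.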

Page 15: Integrating GPS and SR Measures of Land in HH Surveys (Alberto Zezza, World Bank)

LSMS Resources on Land Area Measurement

• Carletto, G., Gourlay, S., Murray, S., and Zezza, A. (2016). Land Area Measurement in Household Surveys: A Guidebook. Washington DC: World Bank.

• Carletto, G., Gourlay, S., Murray, S. and Zezza, A. (2016). Cheaper, Faster and More Than Good Enough: Is GPS the new gold standard in land area measurement? World Bank Policy Research Working Paper, 7759.

• Carletto, G., Gourlay, S., and Winters, P. (2015). From Guesstimates to GPStimates: Land Area Measurement and Implications for Agricultural Analysis. Journal of African Economies, 24 (5), 593–628. (Also available in the World Bank Policy Research Working Paper series.)

• Carletto, G., Savastano, S., and Zezza, A. (2013). Fact or artifact: The impact of measurement errors on the farm size–productivity relationship, Journal of Development Economics, 103(C), 254–261. (Also available in the World Bank Policy Research Working Paper series.)

• Dillon, A., Gourlay, S., McGee, K., and Oseni, G. (2016). Land measurement bias and its empirical implications: evidence from a validation exercise. World Bank Policy Research Working Paper, 7597.

• Kilic, T., Zezza, A., Carletto, G., and Savastano, S. (2013). Missing(ness) in Action: Selectivity Bias in GPS-Based Land Area Measurements. World Bank Policy Research Working Paper 6490.

• Kilic, T., Yacoubou Djima, I., and Carletto, C. (2016). Is Predicting Missing GPS-Based Land Area Measures Mission Impossible in Household Surveys? Exploring the Promise of MI. World Bank Policy Research Working Paper, forthcoming.