Data Validation For the 2006 Census of Agriculture
1
Data Validation For the 2006 Census of Agriculture
Charlie Arcaro, Statistics Canada
ICES III - 20 June, 2007
2
Outline
Introduction
What is Data Validation?
CEAG and Validation Process
New Strategies for 2006
Conclusions
3
Introduction
The Census of Agriculture (CEAG) provides a quinquennial snapshot of Canadian agriculture
Conducted together with (piggybacked on) the Census of Population
Collects inventories of farm commodities and financial information
Approximately 229K farms and 327K operators in CEAG 2006
Used for: redesigning agricultural surveys; a basic source of sampling frames; intracensal corrections; updating the Farm Register
4
What is Data Validation?
Major part of data quality evaluation for Census of Agriculture
Costliest part of the CEAG process post-collection
Analyze and change CEAG data (Sep 2006 to Feb 2007)
Data analyzed using both micro- and macro-level tools: aggregate data at various geographic levels; quality evaluation of small-area data
Produce reports, make presentations to Certification Committee and recommendations for publication
5
CEAG Process
Collection
Data Scanning
Editing, Matching, Follow-up
Imputation
Data Validation
Output & Dissemination
6
What’s Involved?
Senior validators prepare a validation plan based on knowledge and expectations: number of farms and totals in 2006; structural changes since 2001
Validation tools on the Central Processing System (CPS): Comparison, Match, and Distribution tables; Top Contributors; Impact of Processing
Compare reference sources
Validators are responsible for many variables; validation is done by variable, not by questionnaire
7
Goals for CEAG Validation 2006
More efficient use of resources
Improve data quality at finer geographic levels: greater scrutiny expected in 2006
8
Validation Tools
Comparison Tables
Compare data for 2001 and 2006 and the % change: total values, number of farms reporting, average reported value; four geographic levels
Look for changes that make the data “questionable”: use survey data or other sources to justify changes; more likely at lower levels of geography
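
The comparison described on this slide can be sketched in a few lines of Python. This is only an illustration, not the CPS implementation: the function names, the rule that a farm "reports" when its value is greater than zero, and the 25% threshold for flagging a change as questionable are all assumptions made for the example.

```python
def summarize(values):
    """Total, number of farms reporting, and average value per reporting farm.

    Assumption: a farm counts as "reporting" when its value is greater than zero.
    """
    reporting = [v for v in values if v > 0]
    total = sum(reporting)
    n = len(reporting)
    return {
        "total": total,
        "farms_reporting": n,
        "average": total / n if n else 0.0,
    }


def pct_change(old, new):
    """Percent change from old to new; None when it is undefined (old is 0, new is not)."""
    if old == 0:
        return 0.0 if new == 0 else None
    return 100.0 * (new - old) / old


def comparison_row(values_2001, values_2006, threshold_pct=25.0):
    """Compare one variable in one geographic area between the two censuses.

    threshold_pct is a hypothetical cutoff; the slides do not give one.
    """
    s01, s06 = summarize(values_2001), summarize(values_2006)
    row = {}
    for key in s01:
        change = pct_change(s01[key], s06[key])
        row[key] = {
            "2001": s01[key],
            "2006": s06[key],
            "pct_change": change,
            "questionable": change is None or abs(change) > threshold_pct,
        }
    return row


row = comparison_row([100, 0, 300], [150, 250, 0, 200])
for measure, cells in row.items():
    print(measure, cells)
```

Running the same function once per geographic level would give the four levels of comparison the slide mentions.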
9
Validation Tools (cont.)
Impact of Processing Tables: assess the impact on the data of changes made during imputation and validation (provincial level only)
Reports available for all variables
No change for 2006
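
One way to picture an impact-of-processing figure is as the percent change in a provincial total between processing stages. The sketch below assumes three snapshots of the same total (as captured, after imputation, after validation); the slides do not describe how the reports are actually computed, so the function name, the stages, and the sample figures are illustrative only.

```python
def impact_of_processing(captured, imputed, validated):
    """Percent change in a total attributable to each processing stage.

    Assumption: three snapshots of the same provincial total are available.
    Returns None for a stage whose starting total is zero.
    """
    def pct(before, after):
        if before == 0:
            return None
        return round(100.0 * (after - before) / before, 2)

    return {
        "imputation_pct": pct(captured, imputed),
        "validation_pct": pct(imputed, validated),
        "overall_pct": pct(captured, validated),
    }


# Made-up totals, for illustration only.
print(impact_of_processing(captured=1000, imputed=1100, validated=1045))
```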
10
Validation Tools (cont.)
Top Contributor Tables: farm records with the highest values for certain geographic areas
What to look for: imputed values; jumps in consecutive values; compare associated variables with the main variable
Bottom Contributors: greenhouse variables (supposed to be reported in ft² or m²)
Locate capture and response errors
11
Validation Tools
Top Contributor Tables
In 2001… Top 100 farms in each province (default)
In 2006… Top 100 farms or top 80% contributors in each province
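
The 2006 rule can be read as a cumulative-share cutoff layered on the fixed top-100 list. The sketch below assumes the list stops at whichever limit is reached first (100 farms, or enough farms to cover 80% of the provincial total); the slide does not say how the two limits combine, and the function name and record layout are hypothetical.

```python
def top_contributors(records, max_farms=100, share=0.80):
    """Select the largest contributors to a provincial total.

    records: list of (farm_id, value) pairs.
    Assumption: stop at max_farms or once the selected farms cover `share`
    of the total, whichever comes first.
    """
    ranked = sorted(records, key=lambda r: r[1], reverse=True)
    total = sum(value for _, value in ranked)
    selected, running = [], 0.0
    for farm_id, value in ranked:
        if len(selected) >= max_farms:
            break
        if total > 0 and running >= share * total:
            break
        selected.append((farm_id, value))
        running += value
    return selected


# Made-up records, for illustration only.
farms = [("A", 60), ("B", 30), ("C", 6), ("D", 4)]
print(top_contributors(farms))               # stops once 80% of the total is covered
print(top_contributors(farms, max_farms=1))  # stops at the farm-count limit
```

Under this reading, a variable dominated by a few large farms yields a short list, while a variable spread over many small farms still caps out at 100 records.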
12
Validation Tools
Match Tables: compare CEAG data to selected Referential Sources (RS)
Three reports are generated: in RS but not in CEAG; in CEAG but not in RS; in both but with significant differences
In 2001… Top 100 from each of the 3 match reports (default)
In 2006… Using the Call Management System (CMS) cutoff algorithm (2% units, 50% cumulative total, 1% individual total); missing CEAG farms > 30% CCS estimate
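
The three match reports amount to set comparisons on farm identifiers plus a difference test for farms found in both sources. The sketch below illustrates that structure only: the dictionary layout, the function name, and the 50% relative-difference tolerance are assumptions, and the CMS cutoff algorithm is not modeled because the slide gives only its parameters.

```python
def match_reports(ceag, rs, rel_tol=0.5):
    """Build the three match reports for one variable.

    ceag, rs: dicts mapping farm_id -> reported value.
    rel_tol is a hypothetical definition of a "significant" difference,
    measured relative to the larger of the two values.
    """
    in_rs_not_ceag = sorted(set(rs) - set(ceag))
    in_ceag_not_rs = sorted(set(ceag) - set(rs))
    both_but_different = []
    for farm_id in sorted(set(ceag) & set(rs)):
        c, r = ceag[farm_id], rs[farm_id]
        base = max(abs(c), abs(r))
        if base > 0 and abs(c - r) / base > rel_tol:
            both_but_different.append((farm_id, c, r))
    return {
        "in_rs_not_ceag": in_rs_not_ceag,
        "in_ceag_not_rs": in_ceag_not_rs,
        "both_but_different": both_but_different,
    }


# Made-up data, for illustration only.
ceag = {"F1": 100, "F2": 40, "F3": 10}
rs = {"F2": 42, "F3": 90, "F4": 70}
print(match_reports(ceag, rs))
```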
13
Validation Tools
Distribution Tables: distribution of variables (geographic classification, category, ...)
Similar to Comparison Tables. Three types:
Operator Tables: all operators at detailed geography level; number of operators, age, sex, job injuries, etc.
Livestock Tables: counts and farm sizes for various livestock and poultry (province level)
Other Tables: computer usage, land management; data on tick-box variables
14
Results for CEAG 2006
Total Cattle (TCATTL) – 109,920 Farms
                   # Validated   % Farms   Total       % Total
New Methods        1,551         1.41%     3,346,211   20.15
Previous Methods   2,928         2.66%     3,992,312   20.73

Alfalfa (ALFALFA) – 88,064 Farms
                   # Validated   % Farms   Total       % Total
New Methods        1,916         2.18%     1,220,928    9.62
Previous Methods   2,562         2.91%     1,187,804    9.36

Total Pigs (TOPIGS) – 11,506 Farms
                   # Validated   % Farms   Total       % Total
New Methods        1,466         12.74%    7,018,275   45.71
Previous Methods   2,808         24.40%    7,255,827   47.25
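
The "% Farms" column is the number of validated records divided by the number of farms reporting the variable. The short check below recomputes it from the figures above and also derives the reduction in records reviewed under the new methods; that reduction is not stated on the slide and is computed here only for illustration. The dictionary and function names are hypothetical.

```python
# Figures copied from the results tables above.
RESULTS = {
    "TCATTL": {"farms": 109_920, "new": 1_551, "previous": 2_928},
    "ALFALFA": {"farms": 88_064, "new": 1_916, "previous": 2_562},
    "TOPIGS": {"farms": 11_506, "new": 1_466, "previous": 2_808},
}


def pct_farms(validated, farms):
    """Share of reporting farms that were validated, in percent."""
    return round(100.0 * validated / farms, 2)


def workload_reduction(new, previous):
    """Percent fewer records validated under the new methods (derived figure)."""
    return round(100.0 * (previous - new) / previous, 1)


for name, r in RESULTS.items():
    print(
        name,
        pct_farms(r["new"], r["farms"]),
        pct_farms(r["previous"], r["farms"]),
        workload_reduction(r["new"], r["previous"]),
    )
```

On these figures the new methods reviewed roughly 47% fewer records for cattle, 25% fewer for alfalfa, and 48% fewer for pigs, while the share of the total covered changed by less than two percentage points in each case.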
15
Conclusions
How much validation is enough?
More focused and structured approach to the validation process
Reduced work without quality compromise
16
Contact Details
If you have any further questions / Si vous avez plus de questions
Charlie Arcaro
Phone (613) [email protected]
Visit our website / Visitez notre site: www.statcan.ca