Selective Data Editing
The Third Baltic-Nordic Conference on Survey Statistics – BaNoCoss
13-17 June 2011 in the High Coast area, Sweden
Mr Anders Norberg, Statistics Sweden (SCB)
If …
• we only want information from businesses that we know they have,
• and we ask for that information so they understand,
• and we motivate them to deliver as good quality in data as possible,
• and we help them to avoid accidental errors in answering questionnaires,
• then editing would be a minor process!
Editing
Editing is an activity of detecting, resolving and understanding errors in data and produced statistics.
Where errors are introduced
• Errors in raw data delivered by respondents to the statistical agency are typically non-response and measurement errors
• Errors in data transmissions
• The statistics production process is a mixture of many activities with risks of introducing errors
Editing activities
A. Respondent editing
B. Manual editing before data registration
C. Data registration editing
D. Production editing / micro editing
  1 “Traditional” editing
  2 Selective editing
E. Coherence analysis
F. Output editing / macro editing
G. Evaluation
H. Delivery control
Types of errors
Obvious errors / Fatal errors
• Item non-response
• Non-valid values
• Data structure or model errors, total sum of components
• Contradictions
Suspected data values
Deviation errors (Outliers)
• Suspiciously high/low values, data outside of predetermined limits
Definition errors (Inliers)
• Many respondents misunderstand a question in the same way
• Many respondents fetch data from info-systems with other definitions
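The obvious (fatal) error types listed on this slide lend themselves to mechanical checks. The following is a minimal Python sketch, not taken from the presentation: the record layout, the field names, the set of valid codes and the tolerance are all hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical set of valid codes for a classification variable.
VALID_INDUSTRY_CODES = {"A", "B", "C", "D", "E"}


def fatal_errors(record: dict, tolerance: float = 0.5) -> list[str]:
    """Return the obvious errors found in one record (empty list if none)."""
    errors = []

    # Item non-response: a required value is missing.
    for field in ("industry", "wages_total", "wages_men", "wages_women"):
        if record.get(field) is None:
            errors.append(f"item non-response: {field}")

    # Non-valid value: code outside the permitted set.
    industry: Optional[str] = record.get("industry")
    if industry is not None and industry not in VALID_INDUSTRY_CODES:
        errors.append("non-valid value: industry")

    # Data structure error: components must add up to the reported total.
    parts = (record.get("wages_men"), record.get("wages_women"))
    total = record.get("wages_total")
    if total is not None and None not in parts:
        if abs(sum(parts) - total) > tolerance:
            errors.append("sum of components differs from total")

    return errors
```

Deviation and definition errors cannot be caught this way, since the reported values are formally valid; they need the suspicion-based treatment discussed later in the talk.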
Suspected data values
Deviation errors
• Manual follow-up takes time and is expensive
• Few deviation errors have impact on output statistics (low hit-rate, many changes in data have very little impact)
Editing must have impact on the output!
Remember response burden!
Suspected data values
Definition errors (Inliers)
• Difficult to find
• Ways to find them:
  – Combined editing for several surveys
  – Deep interviews in focus groups
  – Use statistics from FAQ and from re-contacts with respondents
  – High proportions of item non-response
  – Graphical editing
  – Good examples
The new role of editing
• Quality Control of the measurement process
  – Find errors (use efficient controls)
  – Consider every identified error as a problem for the respondent to deliver correct data by our collection instrument
  – Identify sources of error (process data)
  – Analyse process data – communicate with cognitive specialists
• Contribute to quality declaration
• Adjust (change/correct) significant errors
Granquist (1997). The New View on Editing. International Statistical Review
The Process Perspective
• Audit and improve data collection
  – measurement instrument
  – collection process
• and the editing process itself
Un-edited data must be saved in order to produce important process indicators, such as hit-rate and impact on output!
Process indicators
• Sources of errors (problem for the respondents)
• Prop. of flagged units and variables
• Prop. of manually and automatically reviewed units and variables
• Prop. of amended values and impact of the changes, per variable
• Hit-rate for edits
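Several of these indicators can be computed directly once both the un-edited and the edited value of each record are kept. The sketch below is illustrative only: the record keys are hypothetical, and the hit-rate is read here as the share of flagged values that the review actually changed, which is an assumption rather than a definition given in the slides.

```python
def process_indicators(records: list[dict]) -> dict:
    """Compute simple editing indicators for one variable.

    Each record is a dict with hypothetical keys:
      'raw'     - value as received (un-edited),
      'edited'  - value after editing,
      'flagged' - True if some edit flagged the value,
      'weight'  - design weight used in estimation.
    """
    n = len(records)
    flagged = [r for r in records if r["flagged"]]
    amended = [r for r in records if r["raw"] != r["edited"]]
    hits = [r for r in flagged if r["raw"] != r["edited"]]

    raw_total = sum(r["weight"] * r["raw"] for r in records)
    edited_total = sum(r["weight"] * r["edited"] for r in records)

    return {
        "prop_flagged": len(flagged) / n,
        "prop_amended": len(amended) / n,
        # Hit-rate: share of flagged values that review actually changed.
        "hit_rate": len(hits) / len(flagged) if flagged else 0.0,
        # Impact: change in the weighted total caused by editing.
        "impact_on_total": edited_total - raw_total,
    }
```

A low hit-rate combined with a small impact on the total suggests that the edits generate a lot of review work without improving the output.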
“Traditional” data editing
An EDIT is a checking rule / edit rule, a logical condition or a restriction to the value of a data item or a data group which must be met if the data is to be considered correct.
An EDIT has:
• Test-variable
• Edit group
• Acceptance region
    if Occupation = 'Doctor' and
       not (2900 < Salary_Month < 7100)
    then Errcode_A01 = 'Flag';
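The three parts of an edit (test variable, edit group, acceptance region) can also be made explicit as data. The following Python sketch restates the slide's example with the same limits; the `Edit` class and its field names are hypothetical and are not part of SELEKT or of the presentation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Edit:
    """One edit: test variable, edit group and acceptance region."""
    code: str                         # error code set when the edit fails
    test_variable: str                # name of the variable being tested
    in_group: Callable[[dict], bool]  # edit group membership
    lower: float                      # acceptance region is the open
    upper: float                      # interval (lower, upper)

    def flags(self, record: dict) -> bool:
        """True if the record is in the edit group and outside the region."""
        if not self.in_group(record):
            return False
        value = record[self.test_variable]
        return not (self.lower < value < self.upper)


# The slide's example, restated with the same limits.
edit_a01 = Edit(
    code="A01",
    test_variable="Salary_Month",
    in_group=lambda r: r["Occupation"] == "Doctor",
    lower=2900,
    upper=7100,
)
```

For instance, `edit_a01.flags({"Occupation": "Doctor", "Salary_Month": 8000})` is true, while a record for another occupation is never flagged by this edit because it falls outside the edit group.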
Suspicion / Traditional edits
Finding acceptance limits: Data from previous survey rounds
[Figure: hourly wage distributed by SNI code at one-digit level]
Selective data editing
A procedure which targets only some of the micro data variables or records for review by prioritizing the manual work.
[Figure: potential impact (vertical axis) against suspicion (horizontal axis, 0 to 1), with the flagged region marked]
Selective data editing
Criteria for prioritizing variables and records for review:
• Limited bias
• Limited variance
… imagining that 100% review would yield the best quality
Hedlin, D. (2008). Local and global score functions in selective editing. Invited paper, UNECE Work Session on Statistical Data Editing, Wien, Austria, 21-23 April.
Selective data editing
Construct a score function for prioritizing variables and records:
• Potential impact on statistics for records flagged by traditional edits
• Expected impact on statistics for variable values flagged to be suspected by edits
Norberg, A. et al. (2010): A General Methodology for Selective Data Editing. Statistics Sweden
Selective data editing
The purpose of selective data editing is to reduce cost for the statistical agency as well as for the respondents, without significant decrease of the quality of the output statistics.
Selective data editing
Latouche, M. and Berthelot, J.-M. (1992): Use of a score function to prioritize and limit re-contacts in business surveys. Journal of Official Statistics, Vol. 8, pp. 389-400
Lawrence, D. and McDavitt, C. (1994): Significance Editing in the Australian Survey of Average Weekly Earnings. Journal of Official Statistics, Vol. 10, pp. 437-447
Granquist, L. (1995): Improving the Traditional Editing Process. In Business Survey Methods, eds. Cox et al., Wiley
Granquist, L. (1997): The New View on Editing. International Statistical Review
Granquist, L. and Kovar, J. (1997): Editing of survey data: How much is enough? In Survey Measurement and Process Quality (pp. 415-435), eds. Lyberg et al., Wiley
Hedlin, D. (2008): Local and global score functions in selective editing. Invited paper, UNECE Work Session on Statistical Data Editing, Wien, Austria, 21-23 April.
Norberg, A. et al. (2010): A General Methodology for Selective Data Editing. Statistics Sweden
Ilves, K. (2010): Probability Approach to Editing. Workshop on Survey Sampling Theory and Methodology, Vilnius, Lithuania, August 23-27, 2010
Selective data editing
Statistics Sweden has developed a generic IT-tool for selective editing, SELEKT 1.1.
It is based on a documented methodology.
SELEKT 1.1 is flexible but requires your understanding of the methodology.
Norberg, A. et al. (2010): A General Methodology for Selective Data Editing. Statistics Sweden
Norberg, A. et al. (2011): User's Guide to SELEKT 1.1, A Generic Toolbox for Selective Data Editing. Statistics Sweden
The survey environment
[Diagram of the survey process in four stages: Input, Throughput, Output, Use]
• Input: a respondent (u) has one or several sampled units (k); each sampled unit has observed units (l) with background variables (Industry, Gender, Occupation) and measurement variables (j), e.g. 2 = Wage. An observed value is written $y_{jkl}$.
• Throughput: Coding, Editing, Imputation, Estimation
• Output: tables such as Sum of wages by Industry (A, B, C, D, E, F - Z) and Sum of wages by Occupation (1, 2, 3, 4) and Gender (Men, Women, Sum)
• Use: Decision making, Information
The survey environment (diagram repeated, now with the label “Suspicion” added)
Predicted (expected) values
Data / predictor
• Time series
  – Previous value
  – Forecast
• Cross section
  – Mean / standard error
  – Median / quartile
Edit groups
[Tree diagram of edit groups; node labels: All data; Blue collar workers, White collars; Monthly pay, Weekly pay, Payment by the hour; Profession = 1, 2, 3, 9; Profession = 3111, 3112, 3113, 3480; Men, Women]
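For the cross-section predictors, the median and quartiles can be computed within each edit group from past ("cold") data. The sketch below is one possible reading, using only the Python standard library; the function name, the record keys and the choice of the inclusive quantile method are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import quantiles


def group_predictors(cold_records: list[dict], group_key: str, variable: str) -> dict:
    """Median and quartiles of `variable` within each edit group.

    Returns {group: (lower_quartile, median, upper_quartile)}.
    """
    values_by_group = defaultdict(list)
    for record in cold_records:
        values_by_group[record[group_key]].append(record[variable])

    predictors = {}
    for group, values in values_by_group.items():
        if len(values) < 2:
            continue  # quantiles() needs at least two data points
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        predictors[group] = (q1, q2, q3)
    return predictors
```

Groups with too few observations are skipped here; in practice such units would presumably fall back on a coarser edit group higher up in the tree, which is why the groups are arranged hierarchically.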
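Suspicion
$$
R_{l,k,j} =
\begin{cases}
\dfrac{\left[\tilde z_{l,k,j} - \mathrm{KAPPA}\,(\tilde z_{l,k,j} - \tilde z^{L}_{l,k,j})\right] - z_{l,k,j}}{\tilde z^{U}_{l,k,j} - \tilde z^{L}_{l,k,j}}
 & \text{if } z_{l,k,j} < \tilde z_{l,k,j} - \mathrm{KAPPA}\,(\tilde z_{l,k,j} - \tilde z^{L}_{l,k,j}) \\[2ex]
0
 & \text{if } \tilde z_{l,k,j} - \mathrm{KAPPA}\,(\tilde z_{l,k,j} - \tilde z^{L}_{l,k,j}) \le z_{l,k,j} \le \tilde z_{l,k,j} + \mathrm{KAPPA}\,(\tilde z^{U}_{l,k,j} - \tilde z_{l,k,j}) \\[2ex]
\dfrac{z_{l,k,j} - \left[\tilde z_{l,k,j} + \mathrm{KAPPA}\,(\tilde z^{U}_{l,k,j} - \tilde z_{l,k,j})\right]}{\tilde z^{U}_{l,k,j} - \tilde z^{L}_{l,k,j}}
 & \text{if } z_{l,k,j} > \tilde z_{l,k,j} + \mathrm{KAPPA}\,(\tilde z^{U}_{l,k,j} - \tilde z_{l,k,j})
\end{cases}
$$
$$
\mathrm{Susp}_{l,k,j} = \frac{R_{l,k,j}}{\mathrm{TAU} + R_{l,k,j}}
$$
Here $z$ is the value of the test variable, $\tilde z$ its predicted centre, and $\tilde z^{L}$, $\tilde z^{U}$ the lower and upper limits of the dispersion range.
KAPPA = 0. The ratio R is the distance a between $z$ and the centre $\tilde z$ divided by the dispersion range $r = \tilde z^{U} - \tilde z^{L}$: R = a/r.
KAPPA = 1. The ratio R is the distance a from the nearest range limit divided by the range r. Hence R = a/r. For data between the lower and upper limits of the dispersion range the suspicion is zero.
[Figure: two sketches, one per KAPPA value, marking the distance a and the range r]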
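The suspicion measure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SELEKT implementation: the function and parameter names are hypothetical, and values of KAPPA between 0 and 1 are assumed to shrink the unsuspicious interval linearly between the two cases described on the slide.

```python
def suspicion(z: float, centre: float, lower: float, upper: float,
              kappa: float, tau: float) -> float:
    """Suspicion in [0, 1) for a test value z.

    centre, lower, upper: predicted centre and dispersion limits
    (for example median and quartiles of the edit group).
    kappa: share of the dispersion range treated as unsuspicious.
    tau:   positive tuning constant; R == tau gives suspicion 0.5.
    """
    spread = upper - lower
    if spread <= 0:
        raise ValueError("upper limit must exceed lower limit")

    # Interval inside which the value is not suspicious at all.
    low_bound = centre - kappa * (centre - lower)
    high_bound = centre + kappa * (upper - centre)

    if z < low_bound:
        r = (low_bound - z) / spread
    elif z > high_bound:
        r = (z - high_bound) / spread
    else:
        r = 0.0
    return r / (tau + r)
```

With a centre of 30 and limits 20 and 40, a value of 60 lies one full range above the upper limit, so with KAPPA = 1 and TAU = 1 it gets R = 1 and suspicion 0.5. The transformation R/(TAU + R) keeps the suspicion below 1 however extreme the value is.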
Impact
• Actual impact = $w\,(y_{une} - y_{edi})$ for an observation is the impact on the estimated domain total of variable Y if the unedited value $y_{une}$ is kept instead of making a review to find the edited value $y_{edi}$
• Potential impact = $w\,(y_{une} - y_{pred})$ is a proxy for actual impact to be used in practice. $y_{pred}$ is a prediction (expected value) for $y_{edi}$
• Expected impact (per domain, variable, observation) is the product of suspicion and potential impact
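As a small worked illustration of these definitions, the two quantities that are available before review can be written as follows. The function names are hypothetical; `weight` stands for w in the slide's notation.

```python
def potential_impact(weight: float, y_unedited: float, y_predicted: float) -> float:
    """Proxy for actual impact: w * (y_une - y_pred)."""
    return weight * (y_unedited - y_predicted)


def expected_impact(suspicion: float, weight: float,
                    y_unedited: float, y_predicted: float) -> float:
    """Expected impact: suspicion times potential impact."""
    return suspicion * potential_impact(weight, y_unedited, y_predicted)
```

For example, a reported value of 5000 against a prediction of 500, with weight 10, has a potential impact of 45 000 on the domain total; at suspicion 0.5 the expected impact is 22 500. The actual impact only becomes known once the record has been reviewed and the edited value found.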
Score function (1)
Local score nr 5, by domain d, variable j and observed unit k,l, is the expected impact related to an appropriate measure of size for the domain/variable, say the standard error of the estimate.
$$
\mathrm{Score5}_{d,j,k,l} = \mathrm{Suspicion}_{j,k,l} \times \mathrm{Potential\ impact}_{d,j,k,l} \times
\frac{\mathrm{CELLO}_{c(d),j}}{\left(\max\left\{\mathrm{ALFA}_j \times \hat T_{d,j,t_0},\ \mathrm{SE}(\hat T_{d,j,t_0})\right\}\right)^{\mathrm{OBOE}_j}}
$$
$$
\mathrm{CELLO}_{c(d),j} = \mathrm{VIOLIN}_j \times \mathrm{CLARINET}_{c(d)}
$$
VIOLIN_j = weight for variable j
CLARINET_c(d) = weight for classification (domains) c(d)
OBOE_j = adjustment for size of estimated total or its standard error for variable j
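A sketch of the local score in Python may help to see how the tuning parameters interact. It reads the slide's formula as suspicion times potential impact times CELLO, divided by the larger of ALFA × estimated total and the standard error, raised to the power OBOE. The function signature is hypothetical, and taking the absolute value of the potential impact is an assumption added here so that scores are non-negative.

```python
def local_score(suspicion: float, potential_impact: float,
                violin: float, clarinet: float,
                alfa: float, oboe: float,
                total_estimate: float, standard_error: float) -> float:
    """Local score for one domain, variable and observed unit."""
    # CELLO combines the variable weight and the classification weight.
    cello = violin * clarinet
    # Size measure of the domain/variable, damped or switched off by OBOE.
    size = max(alfa * total_estimate, standard_error) ** oboe
    # Assumption: the absolute value is used so that over- and
    # under-reporting both raise the score.
    return suspicion * abs(potential_impact) * cello / size
```

With OBOE = 0 the size adjustment disappears and the score is the weighted expected impact itself; with OBOE = 1 the score is expressed relative to the size measure, so that domains of different magnitude become comparable.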
Score function (2)
• Global scores are local scores aggregated by domain, variable and second-stage units (optional) to a score for the primary unit, and finally to the respondent unit (optional)
• Methods: sum, sum of squares, maximum etc. (Minkowski distance)
$$
\mathrm{Score2}_{k} = \sum_{l} \max\left\{0,\ \mathrm{Score3}_{k,l} - \mathrm{Threshold3}\right\}
$$
Hedlin, D. (2008): Local and global score functions in selective editing. Invited paper, UNECE Work Session on Statistical Data Editing, Wien, Austria, 21-23 April.
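The aggregation step can be sketched as a single function covering the methods named on the slide. This is an illustrative reading, not SELEKT code: the function name and the `power` parameter are hypothetical, and the Minkowski form with order 2 returns the root of the sum of squares, which ranks units in the same order as the sum of squares itself.

```python
def global_score(local_scores: list[float], threshold: float = 0.0,
                 power: float = 1.0) -> float:
    """Aggregate local scores into one score for the unit above them.

    Scores are first reduced by `threshold` and truncated at zero.
    power=1 gives the sum, power=2 the root of the sum of squares,
    power=float('inf') the maximum (Minkowski distance of order `power`).
    """
    excess = [max(0.0, s - threshold) for s in local_scores]
    if not excess:
        return 0.0
    if power == float("inf"):
        return max(excess)
    return sum(e ** power for e in excess) ** (1.0 / power)
```

The choice of order is a design decision: the sum lets many moderately suspicious values add up to a review, while the maximum flags a unit only when at least one value is individually important.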
Evaluation
Relative pseudo-bias is a measure of error in output due to incomplete data review
$$
\mathrm{RPB}(q) = \frac{\hat T_{q} - \hat T_{100}}{\mathrm{SE}(\hat T_{100})}
$$
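The relative pseudo-bias can be traced out as review proceeds through the records in descending order of score. The sketch below assumes that q denotes the share of records reviewed, so that the fully reviewed estimate serves as the reference; this reading, the function names and the record keys are assumptions, and the standard error is taken as given.

```python
def relative_pseudo_bias(estimate_q: float, estimate_full: float,
                         se_full: float) -> float:
    """RPB(q) = (T_q - T_100) / SE(T_100)."""
    if se_full <= 0:
        raise ValueError("standard error must be positive")
    return (estimate_q - estimate_full) / se_full


def rpb_curve(records: list[dict], se_full: float) -> list[float]:
    """RPB after reviewing the 0, 1, 2, ... highest-scoring records.

    Each record has hypothetical keys 'score', 'weight', 'raw', 'edited'.
    """
    ordered = sorted(records, key=lambda r: r["score"], reverse=True)
    full = sum(r["weight"] * r["edited"] for r in ordered)
    current = sum(r["weight"] * r["raw"] for r in ordered)
    curve = [relative_pseudo_bias(current, full, se_full)]
    for r in ordered:
        # Reviewing a record replaces its raw value by the edited one.
        current += r["weight"] * (r["edited"] - r["raw"])
        curve.append(relative_pseudo_bias(current, full, se_full))
    return curve
```

A well-chosen score function makes this curve drop quickly towards zero, which is what justifies stopping the review well before all records have been looked at.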
Evaluation
Pseudo-bias for the PPI survey relative to the overall price index. PSUs ordered in descending order of score.
[Figure: pseudo-bias (vertical axis, 0 to 0.2) plotted against number of changes (horizontal axis, 1 to 821; original axis label: Antal ändringar)]
Cut-off or probability sampling?
Say that 821 of the total sample (n = 4 000) have a score > 0.
There are two options for manual review:
– Cut-off sampling: Score2 > Threshold2, assuming the remaining bias is small
– Two-phase sampling: πps-sampling and design-based estimation of measurement errors to subtract from initial estimates
Ilves, K. (2010): Probability Approach to Editing. Workshop on Survey Sampling Theory and Methodology, Vilnius, Lithuania, August 23-27, 2010
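The two options can be contrasted in a short Python sketch. It is deliberately simplified and makes several assumptions that are not in the slides: Poisson sampling is used as the probability design with inclusion probabilities proportional to the score, the error total is estimated with a Horvitz-Thompson estimator, and all function names are hypothetical. It is not meant to reproduce the design in Ilves (2010).

```python
import random


def cut_off_selection(scores: dict, threshold: float) -> list:
    """Units whose global score exceeds the threshold, highest first."""
    chosen = [u for u, s in scores.items() if s > threshold]
    return sorted(chosen, key=lambda u: scores[u], reverse=True)


def poisson_pps_sample(scores: dict, expected_size: float,
                       rng: random.Random) -> dict:
    """Poisson sampling with inclusion probability proportional to score.

    Returns {unit: inclusion_probability} for the sampled units.
    """
    total = sum(scores.values())
    if total <= 0:
        return {}
    sample = {}
    for unit, score in scores.items():
        prob = min(1.0, expected_size * score / total)
        if prob > 0 and rng.random() < prob:
            sample[unit] = prob
    return sample


def estimated_error_total(sample: dict, found_error: dict) -> float:
    """Horvitz-Thompson estimate of the total weighted error.

    found_error[u] is the weighted error w * (y_une - y_edi) that the
    review of unit u revealed.
    """
    return sum(found_error[u] / p for u, p in sample.items())
```

Under cut-off sampling the units below the threshold are never reviewed, so any error left there stays in the estimate as bias. Under the probability approach every unit with a positive score has a chance of review, and the estimated error total can be subtracted from the initial estimate at the price of extra variance.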
SELEKT 1.1
[Flow chart of SELEKT 1.1; box labels:]
• Raw + edited past (cold) survey data
• Survey specific cold adapter (SAS code): data preparation
• SAS data set
• PRE-SELEKT: parameter specifications, analysis of cold data
• Table of parameters
• Table of estimates
• CLAN estimation software
• SNOWDON-X analysis of edits
• Edits
• Input (hot) survey data
• Survey specific hot adapter (SAS code): data preparation
• SAS data set
• AUTOSELEKT: score calculation & record flagging
• Records to FOLLOW-UP
• Records to IMPUTATION
• Accepted records
• Process data and reports
Editing – remaining methodology issues
• Confidence (respondents and clients)
• Do we make a difference between new and old respondents?
• Editing in earlier processes
  – Web questionnaires
  – Scanned paper questionnaires
• Fatal errors
  – Classifying variables
  – Survey variables
• Data and methods for computing predicted values etc.
• Homogeneous edit groups
• How to decide threshold values
• Aggregating scores
• Sampling below threshold
  – Inference
  – Data for evaluation