Editing, Imputation, and Synthesis: A Public Use File for the Census of Manufactures
Hang Kim
NISS / Duke University
2015 Affiliates Annual Meeting, Miami, FL, Sunday, March 15
Joint work with Jerry Reiter, Alan Karr, and Larry Cox
Research supported by NSF [SES-11-31897]

Background
Disseminate Public Use File

Production of Public Use File
• Survey Design
• Survey Data Collection
• Data Processing
• Publication

Current Practice: Data Processing
• Initial Check / Recontact
• Imputation
• Data Editing
• Disclosure Control / Data Masking

Edit Rules
Logical constraints that reported records must satisfy to be considered plausible and consistent
• find unacceptable errors in survey data, e.g. pregnant male, avg. salary of $2M
• specify the space of reasonable imputed values
Common edit rules for continuous values
1. Range restriction, e.g. total emp. > 0
2. Ratio edit, e.g. total salary / total emp. < $1M
3. Balance edit, e.g. total emp. = production workers + other emp.
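
The three kinds of edit rule can be written as simple predicates on a record. The sketch below is only an illustration: the `Record` fields, the function name `failed_edits`, and the numerical tolerance are hypothetical, and the bounds are the ones used as examples on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    # Hypothetical field names chosen to mirror the slide's examples.
    total_emp: float
    production_workers: float
    other_emp: float
    total_salary: float


def failed_edits(rec: Record, max_avg_salary: float = 1_000_000.0,
                 tol: float = 1e-9) -> List[str]:
    """Return the names of the edit rules that the record violates."""
    failed = []
    # 1. Range restriction: total employment must be positive.
    if not rec.total_emp > 0:
        failed.append("range")
    # 2. Ratio edit: total salary / total emp. < bound, written in
    #    multiplicative form (equivalent when total_emp > 0) so that a
    #    zero denominator cannot raise an exception.
    if not rec.total_salary < max_avg_salary * rec.total_emp:
        failed.append("ratio")
    # 3. Balance edit: the components must add up to the total.
    if abs(rec.total_emp - (rec.production_workers + rec.other_emp)) > tol:
        failed.append("balance")
    return failed


clean = Record(total_emp=100, production_workers=60, other_emp=40,
               total_salary=5_000_000)
faulty = Record(total_emp=10, production_workers=6, other_emp=3,
                total_salary=20_000_000)
```

Here `failed_edits(clean)` is empty, while `failed_edits(faulty)` reports the ratio and balance edits.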
Three Related Research Topics
1. Imputation under linear constraints
• Technical Report No. 182, NISS
• Journal of Business and Economic Statistics, 2015, Vol 32
2. Simultaneous data editing and imputation
• Technical Report No. 189, NISS
3. Synthetic microdata for the U.S. Census of Manufactures
• work in progress

Topic I: Imputation under linear constraints

Example: Colombian Manufacturing Survey
• Similar to U.S. Census of Manufactures
• Data from 1977-1991 have been used by researchers
• All variables are continuous
• Complex feasible region given edit rules
• Not easy to assume a parametric distribution

Joint Modeling Imputation (NISS report 182)
Nonparametric Bayesian Model
• use mixtures of normals with Dirichlet process (DP) priors to capture complex features of the data under very weak distributional assumptions
• restrict support to the constrained region to guarantee that imputed values satisfy edit rules
Multiple Imputation Approach
• to capture uncertainty introduced by missing values and the imputation process
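
One simple way to see what "restricting the support" means is rejection sampling: draw from an unconstrained mixture of normals and keep only draws that pass every edit. The sketch below is an illustration under stated assumptions, not the sampler from the technical report: the mixture parameters, the two linear constraints in `is_feasible`, and all function names are hypothetical, and it draws from the joint distribution rather than conditioning on observed values.

```python
import numpy as np


def sample_constrained_mixture(weights, means, covs, is_feasible, rng,
                               max_tries=10_000):
    """Draw one point from a normal mixture truncated to the feasible region."""
    for _ in range(max_tries):
        k = rng.choice(len(weights), p=weights)         # pick a mixture component
        x = rng.multivariate_normal(means[k], covs[k])  # draw from that component
        if is_feasible(x):                              # accept only draws passing all edits
            return x
    raise RuntimeError("no feasible draw found; the region may carry very little mass")


def is_feasible(x):
    # Hypothetical linear constraints standing in for the edit rules.
    return x[0] > 0 and x[1] < 3 * x[0]


# Hypothetical two-component mixture in two dimensions.
weights = np.array([0.6, 0.4])
means = np.array([[1.0, 1.0], [3.0, 4.0]])
covs = np.array([0.5 * np.eye(2), 0.5 * np.eye(2)])

rng = np.random.default_rng(1)
draws = np.array([
    sample_constrained_mixture(weights, means, covs, is_feasible, rng)
    for _ in range(50)
])
```

Every row of `draws` satisfies the constraints by construction, which is the guarantee the slide describes for imputed values.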
Illustration of Mixture Normals
Dirichlet process (DP) prior helps the model stochastically decide the number of components and weights
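
A common way to represent a DP prior on mixture weights is the truncated stick-breaking construction, sketched below. Whether the talk's model uses this exact representation is an assumption; the concentration parameter `alpha`, the truncation level `n_components`, and the function name are hypothetical.

```python
import numpy as np


def stick_breaking_weights(alpha: float, n_components: int,
                           rng: np.random.Generator) -> np.ndarray:
    """Draw mixture weights from a truncated stick-breaking construction."""
    # Break fractions v_k ~ Beta(1, alpha); the last one is set to 1 so
    # that the truncated weights sum to exactly 1.
    v = rng.beta(1.0, alpha, size=n_components)
    v[-1] = 1.0
    # Stick length remaining before break k: prod_{j<k} (1 - v_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=1.0, n_components=20, rng=rng)
```

With a small `alpha` most of the mass falls on a few components, so the number of components that are effectively used is decided by the data rather than fixed in advance.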

Simulation Study using Colombian Manufacturing Survey data
1. Assume the data are true reported values with no missing values
2. Randomly blank some values to simulate nonresponse
3. Fill in the simulated missing values using the suggested method
Pink dots: unchanged values
Blue dots: (Left) original values before blanking; (Right) imputed values
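
The blanking step of such a study can be sketched as follows. The slide does not state the missingness mechanism or rate, so independent cell-wise blanking and the `missing_rate` value are assumptions, and the toy data merely stand in for the survey file.

```python
import numpy as np


def blank_at_random(data: np.ndarray, missing_rate: float,
                    rng: np.random.Generator):
    """Return a copy of data with cells set to NaN, plus the mask of blanked cells."""
    # Each cell is blanked independently with probability missing_rate (assumed).
    mask = rng.random(data.shape) < missing_rate
    blanked = data.astype(float)  # astype returns a new array, so data is untouched
    blanked[mask] = np.nan
    return blanked, mask


rng = np.random.default_rng(42)
truth = rng.lognormal(mean=2.0, sigma=0.5, size=(200, 4))  # toy stand-in data
blanked, mask = blank_at_random(truth, missing_rate=0.3, rng=rng)
```

Keeping `mask` makes it possible to compare imputed values with `truth[mask]` afterwards, which is what the left and right panels of the plot do.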

Topic II: Simultaneous data editing and imputation
Automatic Data Editing
Agencies detect and edit unacceptable errors in survey data
Manual editing
• utilizes expert knowledge
Automatic editing
• fast and able to handle massive datasets
Automatic editing process
1. Error localization step
• Which variable of a record is incorrect?
2. Imputation step
• What is a reasonable value to replace the incorrect value?

Fellegi-Holt (F-H) Approach
Proposed by Fellegi and Holt in 1976; since then the best-known and most-used guiding principle for automatic data editing
Mathematical optimization approach
Objective function
• the number of changed variables (to be minimized)
Constraints
• imputed/edited values satisfy edit rules
Example
• If avg. salary > $1M, the record needs further review
• avg. salary = total salary / total employees
• F-H changes one of the two variables, but not both
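
The minimum-change principle can be illustrated with a brute-force search: try change sets of increasing size and stop at the first size for which some replacement values satisfy all edits. This is a toy sketch, not how production F-H systems solve the optimization problem; the candidate value grid, the edit function, and all names are hypothetical.

```python
from itertools import combinations, product


def satisfies_edits(rec: dict) -> bool:
    # Hypothetical edits: positive employment, non-negative salary,
    # and the slide's ratio edit (avg. salary below $1M).
    return (rec["total_emp"] > 0
            and rec["total_salary"] >= 0
            and rec["total_salary"] < 1_000_000 * rec["total_emp"])


def minimal_change_sets(rec: dict, candidates: dict, satisfies) -> list:
    """Return all smallest sets of variables whose change can fix the record."""
    names = list(rec)
    for size in range(len(names) + 1):
        found = []
        for subset in combinations(names, size):
            # Try every combination of candidate values for this subset.
            for values in product(*(candidates[v] for v in subset)):
                trial = {**rec, **dict(zip(subset, values))}
                if satisfies(trial):
                    found.append(subset)
                    break
        if found:  # the smallest feasible size has been reached
            return found
    return []


record = {"total_salary": 50_000_000, "total_emp": 10}  # avg. salary of $5M
grid = {
    "total_salary": [100_000, 500_000, 1_000_000, 5_000_000],
    "total_emp": [1, 10, 100, 1000],
}
solutions = minimal_change_sets(record, grid, satisfies_edits)
```

For this record, `solutions` contains two one-variable change sets, one for each variable, which matches the slide's point that F-H changes either variable but not both.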

Edited/Imputed Values Under F-H Approach
Case 1
• assume the observed value fails the edit rules
• no option but changing the value of X1
• can draw imputations from a high-density region
Case 2
• no option but changing the value of X2
• can draw imputations from a high-density region
Case 3
• both options available: changing X1 or X2
• imputed values are drawn from the tails of the distribution

Bayesian Data Editing (NISS report 189)
Nonparametric Bayesian Model
• use Dirichlet process (DP) mixtures of normals
• restrict support to the constrained region defined by balance edits as well as ratio edits
• use latent indicators to stochastically locate errors
Multiple Imputation Approach
• measure uncertainty introduced by the imputation and data editing processes

Simulation Study
Generate simulated reported values with introduced errors
[Figure panels: Reported values with errors; True simulated values]

Result with Bayesian Editing
Bayesian Editing successfully estimates the distribution of simulated true values
[Figure panels: Edited values by BE; True simulated values]

Result with Fellegi-Holt
The F-H approach places some edited values in the tails of the distribution of true values
[Figure panels: Edited values by FH; True simulated values]
Simulation Study: Comparison of Pairwise Correlations from Edited Data
Application: U.S. Census of Manufactures
Economic census for manufacturing industries, conducted by the U.S. Census Bureau every five years
variables: cost of materials, total emp., total value of shipments
widely used by researchers, e.g., those interested in plant-level productivity
Current editing practice
F-H based automatic editing system
using ratio edits and balance edits
additional (separate) manual editing processes
We compare three editing approaches using pairwise correlations
BE: Bayesian Editing
FH: Fellegi-Holt based editing
FH & manual: Final edited data produced by the Census Bureau
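
A comparison of this kind can be sketched by computing every pairwise correlation within each edited dataset and setting the results side by side. The slides do not say which correlation measure or data scale was used, so Pearson correlation on the raw columns is an assumption, and the function name and toy data are hypothetical.

```python
import numpy as np


def pairwise_correlations(data: np.ndarray) -> np.ndarray:
    """Return the Pearson correlations for all column pairs (upper triangle)."""
    corr = np.corrcoef(data, rowvar=False)    # columns are variables
    upper = np.triu_indices_from(corr, k=1)   # each pair once, diagonal excluded
    return corr[upper]


# Toy data with known correlations: columns x, 2x + 1, and -x.
x = np.arange(10.0)
toy = np.column_stack([x, 2 * x + 1, -x])
toy_corr = pairwise_correlations(toy)
```

Applying the same function to the output of each editing approach gives vectors that can be plotted against one another pair by pair.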
2007 Census of Manufactures: Comparison of Pairwise Correlations from Edited Data

Topic III: Synthetic microdata for the U.S. Census of Manufactures

Integration of Imputation, Editing and Synthesizer (in progress)
Two-stage Multiple Imputation Approach
1. impute/edit survey data $X$ given a size measure $z$
• resulting in $m$ copies of edited/imputed datasets, $X^{(1)}, \dots, X^{(m)}$
2. generate synthetic data given $X^{(l)}$ and $z$
• resulting in $r$ synthetic files $X_1^{(l)}, \dots, X_r^{(l)}$ for $l = 1, \dots, m$
Inferences
• based on $mr$ complete-data analyses and combining rules
Compared to current practices (with separate steps)
• correctly estimate variance of final synthetic data
• all benefits enjoyed by Bayesian editing/imputation
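
The inference step can be sketched by arranging the mr point estimates in an m-by-r array and computing between-nest and within-nest variance components. The slide does not state which combining rule is used; the variance formula below is one commonly cited form for nested multiple imputation with partially synthetic data and is an assumption here (fully synthetic data would call for a different rule). All names are hypothetical, and both m and r must be at least 2.

```python
import numpy as np


def combine_two_stage(q: np.ndarray, u: np.ndarray):
    """Combine estimates q[l, i] and their variances u[l, i] over m nests of r files."""
    m, r = q.shape
    q_nest = q.mean(axis=1)   # mean estimate within each nest l
    q_bar = q_nest.mean()     # overall point estimate
    b = q_nest.var(ddof=1)    # between-nest variance
    w = ((q - q_nest[:, None]) ** 2).sum() / (m * (r - 1))  # within-nest variance
    u_bar = u.mean()          # average of the complete-data variances
    # ASSUMED rule for partially synthetic data; check it against the
    # reference used by the authors before relying on it.
    t = (1 + 1 / m) * b - w / r + u_bar
    return q_bar, t, {"b": b, "w": w, "u_bar": u_bar}


# Toy inputs: m = 2 nests, r = 2 synthetic files per nest.
q = np.array([[1.0, 3.0], [5.0, 7.0]])
u = np.ones((2, 2))
q_bar, t, parts = combine_two_stage(q, u)
```

Separating the two variance components is what lets the combined variance reflect both the editing/imputation stage and the synthesis stage.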
Concluding Remarks
We proposed a Bayesian framework to integrate the currently separate processes: imputation, data editing, and disclosure limitation
A future research topic is simultaneous data editing/imputation methods for mixed-type data, such as the American Community Survey
An R package for Bayesian editing/imputation of continuous variables will be published on CRAN soon
Technical reports are available at the NISS website (http://www.niss.org/publications/technical-reports)