1
Overview of Statistical Disclosure Methodology for Microdata
Laura Zayatz
Census Bureau
laura.zayatz@census.gov
BTS Confidentiality Seminar Series, April 2003
2
Microdata
Respondent-level data
Each record represents one responding household or person
Usually demographic data
3
Microdata Disclosure Risk
High visibility records
Linkable external files
4
Addressing Risk
Can we measure it? (file level – function of ever-changing external database environment)
Getting specific (record level)
Reduce the amount of information
Distort the information
5
Making the Data Safe
Remove obvious identifiers
Set a geographic threshold
Know external data
Know your data
Look for problems
6
Look for Problems: External Files
Get to know external files
• State and local government agencies
• Other federal agencies
• Private firms ($)
Perform reidentification experiments
• Hunt and peck
• Data fusion
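A reidentification experiment can be approximated as a linkage between the file you plan to release and an external file that carries names. The sketch below is only an illustration under assumed inputs: the field names (`zip`, `birth_year`, `sex`, `name`) and the function `link_files` are hypothetical, and real experiments use far richer matching than exact agreement on a few keys.

```python
from collections import defaultdict

def link_files(released, external, keys):
    """Return (released_index, external_name) for every released record
    that agrees with exactly one external record on all of `keys`."""
    # Index the external file by the shared variables.
    index = defaultdict(list)
    for ext in external:
        index[tuple(ext[k] for k in keys)].append(ext)

    matches = []
    for i, rec in enumerate(released):
        candidates = index.get(tuple(rec[k] for k in keys), [])
        # A single candidate is a putative reidentification.
        if len(candidates) == 1:
            matches.append((i, candidates[0]["name"]))
    return matches

# Hypothetical toy data.
released = [
    {"zip": "20001", "birth_year": 1950, "sex": "F"},
    {"zip": "20002", "birth_year": 1980, "sex": "M"},
]
external = [
    {"name": "A. Example", "zip": "20001", "birth_year": 1950, "sex": "F"},
    {"name": "B. Example", "zip": "20002", "birth_year": 1980, "sex": "M"},
    {"name": "C. Example", "zip": "20002", "birth_year": 1980, "sex": "M"},
]
matches = link_files(released, external, ["zip", "birth_year", "sex"])
```

Here only the first released record links to a single external record; the second agrees with two external records and so is not flagged.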
7
Look for Problems: Your File
Special uniques
• 2-way cross tabulations of all variables
• Work your way up
Variables that you know exist on external databases (even if you cannot access them)
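The search for uniques can be sketched as a loop over cross tabulations: start with every pair of variables, flag records that sit alone in a cell, then move up to three-way tables and beyond. This is a minimal illustration, not the Census Bureau's procedure; `find_uniques` and the variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def find_uniques(records, variables, max_order=2):
    """Return {variable_combo: [indices of records alone in their cell]}."""
    flagged = {}
    for order in range(2, max_order + 1):
        for combo in combinations(variables, order):
            # Cross-tabulate: count the records falling in each cell.
            cells = Counter(tuple(r[v] for v in combo) for r in records)
            unique_rows = [
                i for i, r in enumerate(records)
                if cells[tuple(r[v] for v in combo)] == 1
            ]
            if unique_rows:
                flagged[combo] = unique_rows
    return flagged

# Hypothetical toy file.
records = [
    {"age_group": "30-39", "sex": "F", "occupation": "teacher"},
    {"age_group": "30-39", "sex": "F", "occupation": "teacher"},
    {"age_group": "30-39", "sex": "M", "occupation": "teacher"},
    {"age_group": "30-39", "sex": "M", "occupation": "teacher"},
    {"age_group": "80+", "sex": "F", "occupation": "pilot"},
]
uniques = find_uniques(records, ["age_group", "sex", "occupation"], max_order=3)
```

A record that is already unique in a 2-way table, like the last one here, stays unique in every higher-order table that contains those two variables, which is why it makes sense to start low and work up.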
8
Be Careful with ...
Geographic detail
Contextual variables/Sampling information
Establishment data
Longitudinal data (life events often generate records on external files)
Administrative data
Data that will also be published in the form of tables at smaller geographic levels
9
When You Find a Problem ...
Reduce risk by reducing the amount of information
Reduce risk by perturbing the data
10
Reducing the Amount of Information
Delete a variable
Recode a categorical variable into larger categories (perhaps using thresholds)
Recode a continuous variable into categories
Round continuous variables
Use top and bottom codes (provide means, …)
Use local suppression
Enlarge geographic areas
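Two of these techniques are easy to show in a few lines. The sketch below top-codes a continuous variable, replacing every value above a cutoff with the mean of those values, and recodes exact age into fixed-width brackets. The cutoff, the bracket width, and the function names are assumptions made for illustration.

```python
def top_code(values, cutoff):
    """Replace each value above `cutoff` with the mean of all such values."""
    above = [v for v in values if v > cutoff]
    if not above:
        return list(values)
    top_mean = sum(above) / len(above)
    return [top_mean if v > cutoff else v for v in values]

def recode_age(age, width=5):
    """Recode an exact age into a bracket label such as '35-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

incomes = [20_000, 35_000, 150_000, 250_000]
coded = top_code(incomes, cutoff=100_000)
# coded == [20000, 35000, 200000.0, 200000.0]
bracket = recode_age(37)
# bracket == "35-39"
```

Publishing the mean of the top-coded values keeps totals for the variable intact while hiding the individual extremes.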
11
Perturbing the Data
Noise addition
Swapping
Rank swapping
Blanking and imputation
Microaggregation
Multiple imputation/modeling to generate synthetic data
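As one example from this list, univariate microaggregation can be sketched as follows: sort the values, cut them into groups of at least k neighbours, and replace each value with its group mean. This is a simple fixed-size variant chosen for illustration; the function name and the choice of k are assumptions.

```python
def microaggregate(values, k=3):
    """Replace each value with the mean of its group of at least k
    neighbouring values (neighbours in sorted order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # Fold a tail shorter than k into the current group.
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result

values = [10, 52, 11, 50, 12, 51, 90]
protected = microaggregate(values, k=3)
# protected == [11.0, 60.75, 11.0, 60.75, 11.0, 60.75, 60.75]
```

Every published value is now shared by at least k records, and the overall total is unchanged.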
12
Census 2000 Public Use Microdata Samples (PUMS) --- Decrease in Detail
Internal reidentification study looking at all products (microdata and tables) from 1990 census
Increased concern about external data linking capabilities
13
The Files
17% of households receive the long form
Maximum of 6% of the population appears on PUMS
2 mutually exclusive files
5% state file, PUMAs contain 100,000 people, less variable detail, 2003 release
1% characteristics file, SuperPUMAs contain 400,000 people, same variable detail, 2003 release
14
Changes for the 5% and the 1% Files
Round all dollar amounts
• $1-$7 = $4
• $8-$999 round to nearest $10
• $1,000-$49,000 round to nearest $100
• $50,000+ round to nearest $1,000
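The rounding bands above translate directly into a small function. A few details are assumptions rather than anything stated on the slide: amounts are whole dollars, ties round up, zero and negative amounts pass through unchanged, and the $100 band is taken to run all the way up to the $50,000 band (the slide lists its upper end as $49,000).

```python
def round_to_base(amount, base):
    """Round a whole-dollar amount to the nearest multiple of `base`,
    with ties rounding up (an assumed tie rule)."""
    return ((amount + base // 2) // base) * base

def round_dollars(amount):
    """Apply the banded rounding described above to a whole-dollar amount."""
    if amount <= 0:
        return amount  # not covered by the slide; left unchanged here
    if amount <= 7:
        return 4
    if amount < 1_000:
        return round_to_base(amount, 10)
    if amount < 50_000:
        return round_to_base(amount, 100)
    return round_to_base(amount, 1_000)

examples = [5, 8, 994, 1_234, 73_600]
rounded = [round_dollars(a) for a in examples]
# rounded == [4, 10, 990, 1200, 74000]
```

The helper avoids Python's built-in `round`, which rounds ties to the nearest even value and would make the tie rule harder to state.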
15
Changes for the 5% and the 1% Files
Round departure time
• 2400-0259 in 30-minute intervals
• 0300-0459 in 10-minute intervals
• 0500-1059 in 5-minute intervals
• 1100-2359 in 10-minute intervals
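The interval widths above can be applied with a short function. The slide gives only the widths, so mapping each time to the start of its interval is an assumption, as is the HHMM integer encoding and the function name.

```python
def round_departure(hhmm):
    """Map a departure time (HHMM integer, 24-hour clock) to the start
    of its interval; 2400 is treated as midnight (0000)."""
    hour, minute = divmod(hhmm, 100)
    hour %= 24
    if hour < 3:
        width = 30   # 2400-0259
    elif hour < 5:
        width = 10   # 0300-0459
    elif hour < 11:
        width = 5    # 0500-1059
    else:
        width = 10   # 1100-2359
    # Every width divides 60, so intervals never straddle an hour.
    return hour * 100 + (minute // width) * width

times = [147, 459, 733, 1758]
binned = [round_departure(t) for t in times]
# binned == [130, 450, 730, 1750]
```

The finest intervals fall in the morning commute hours, where most departures occur and small cells are least likely.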
16
Changes for the 5% and the 1% Files
Noise added to ages of people in households with 10 or more people
Ages must stay within certain groupings
Blank original ages
New ages generated from a given distribution of ages in that grouping
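The blank-and-impute step for large households might look like the sketch below. The actual groupings and age distributions are not given in the talk, so `AGE_GROUPS` and `reference_ages` are purely hypothetical stand-ins, and the sketch assumes the reference distribution contains at least one age in every grouping.

```python
import random

# Hypothetical groupings; the real ones are not disclosed.
AGE_GROUPS = [(0, 17), (18, 64), (65, 115)]

def group_of(age):
    """Return the (low, high) grouping that contains `age`."""
    return next(g for g in AGE_GROUPS if g[0] <= age <= g[1])

def protect_household(ages, reference_ages, rng, min_size=10):
    """For households with `min_size` or more people, discard each age and
    draw a new one from the reference ages in the same grouping."""
    if len(ages) < min_size:
        return list(ages)
    new_ages = []
    for age in ages:
        low, high = group_of(age)
        candidates = [a for a in reference_ages if low <= a <= high]
        new_ages.append(rng.choice(candidates))
    return new_ages

# Hypothetical reference distribution: one person at each age 0-99.
reference_ages = list(range(100))
household = [2, 5, 9, 13, 16, 20, 34, 41, 67, 70]
new_household = protect_household(household, reference_ages, random.Random(0))
```

Because every new age is drawn within the original grouping, tabulations by age group are unaffected while exact ages no longer match the respondent's.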
17
Small Amount of Additional Noise
Certain characteristics of small, unusual subgroups of people and housing units
Vulnerable to disclosure via publicly available datasets
Not disclosing the details
Resulted from reidentification studies
18
Changes for the 5% File Only
Categories Must Have 10,000 People Nationwide
Variable           1990   2000
Language            305     74
Ancestry            292    143
Birthplace          312    167
Tribe                27     23
Hispanic Origin      48     29
Occupation          506    443
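A population threshold like this is applied by collapsing every category that falls below it. The sketch below folds small categories into a single residual label; in practice small categories are more likely merged into related broader ones, so treat the `"Other"` label, the function name, and the tiny threshold in the example as assumptions. Counts in the example stand in for national population estimates.

```python
from collections import Counter

def collapse_small_categories(values, threshold, other_label="Other"):
    """Replace every category with fewer than `threshold` cases by
    `other_label`."""
    counts = Counter(values)
    return [v if counts[v] >= threshold else other_label for v in values]

# Hypothetical toy data with a threshold of 3 instead of 10,000.
languages = ["Spanish"] * 5 + ["French"] * 3 + ["Navajo"] + ["Hmong"] * 2
collapsed = collapse_small_categories(languages, threshold=3)
category_counts = Counter(collapsed)
# category_counts == Counter({"Spanish": 5, "French": 3, "Other": 3})
```

The residual category can itself end up below the threshold, so the result needs one more check before release.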
19
Multiple Races
White
Black
American Indian or Alaska Native
Asian: Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, Other Asian
Native Hawaiian or Pacific Islander: Native Hawaiian, Guamanian or Chamorro, Samoan, Other Pacific Islander
Some Other Race
20
Race on the PUMS
Yes/No for each major race grouping (6)
Designation of 1 of about 70 single race groups (includes some write-ins)
Designation of multiple race groups
At least 10,000 (5% file) and 8,000 (1% file) people nationwide
21
Data Swapping
5% and 1% files
Swapping of “special uniques” (high-risk records)
Pairs of households that agree on certain demographic characteristics but are in different geographic areas are swapped across PUMAs and SuperPUMAs
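The pairing logic can be sketched as follows: for each high-risk household, find a household in another area that agrees on the matching characteristics and exchange their geography codes. The matching variables, the field names, and the first-match pairing rule are all assumptions for illustration; the talk does not describe how partners are actually chosen.

```python
def swap_geography(households, risky_ids, match_keys, geo_key="puma"):
    """Swap the geography of each high-risk household with the first unused
    household elsewhere that agrees on every variable in `match_keys`."""
    swapped = {h["id"]: dict(h) for h in households}  # work on copies
    used = set()
    for rid in risky_ids:
        if rid in used:
            continue
        target = swapped[rid]
        for partner in swapped.values():
            if (partner["id"] != rid
                    and partner["id"] not in used
                    and partner[geo_key] != target[geo_key]
                    and all(partner[k] == target[k] for k in match_keys)):
                target[geo_key], partner[geo_key] = partner[geo_key], target[geo_key]
                used.update({rid, partner["id"]})
                break
    return list(swapped.values())

# Hypothetical toy file.
households = [
    {"id": 1, "puma": "A", "size": 4, "tenure": "own"},
    {"id": 2, "puma": "A", "size": 2, "tenure": "rent"},
    {"id": 3, "puma": "B", "size": 4, "tenure": "own"},
    {"id": 4, "puma": "B", "size": 2, "tenure": "own"},
]
result = swap_geography(households, risky_ids=[1], match_keys=["size", "tenure"])
pumas = {h["id"]: h["puma"] for h in result}
# pumas == {1: "B", 2: "A", 3: "A", 4: "B"}
```

Because the swapped households agree on the matching variables, counts by area for those variables are unchanged, while an intruder can no longer be sure that a high-risk record really belongs to the area shown.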
22
Issues that Have Not Changed
Topcoding (half-percent/three-percent rule)
Property taxes (categorization)
23
Conclusion
Know your data
Know external data
Proactively look for problems
If you find a disclosure problem, let users help you choose the best method for reducing the risk of disclosure, if possible