Census Processing Procedures
Matt Sobek
Funded by the National Science Foundation
Minnesota Population Center
IPUMS Work Process
1. Inventory
2. English Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality Measures
6. Data Harmonization
7. Data Improvement
8. Dissemination
1. Inventory
For each sample:
• data
• data dictionary
• census questionnaire and instructions
• sample design
• census design
• published tabulations, post-enumeration surveys, demographic analyses (when available)
2. Translation
• Census questionnaire
• Census instructions
• Data dictionary codes and labels
3. Data Restructuring
a) Create labels/set-up file
Labels File, Costa Rica 2000
Var | Col | Wid | VarLabel | VarLabelOrig | Freq | VarOrig | DD#
    Value | ValueLabel | ValueLabelOrig
relate | 36 | 2 | Relationship to household head | P01-Parentesco con el jefe(a) | | PARENTES | 64
    1 | Head (male or female) | Jefe o jefa
    2 | Spouse or partner | Esposo(a)/compañera
    3 | Child or stepchild | Hijo(a)/hijastro
    4 | Son-in-law or daughter-in-law | Yerno o nuera
    5 | Grandchild | Nieto(a)
    6 | Parent or parent in-law | Padres o suegros
    7 | Other relative | Otro familiar
    8 | Domestic servant or relative of serv. | Serv.Domestico o su familiar
    9 | Other non-relative | Otro no familiar
sex | 38 | 1 | Sex | P02-Sexo | | SEXO | 65
    1 | Male | Masculino
    2 | Female | Femenino
age | 39 | 3 | Age | P03-Edad | | EDAD | 66
bplg | 42 | 1 | Place of birth | P04-Lugar de Nacimiento | | NACIMIEN | 67
    1 | In this same canton | Mismo canton
    2 | In another canton | Otro canton
    3 | In another country | Otro pais
arryr | 46 | 5 | Year of Arrival in Costa Rica | P04c-Año de Llegada a Costa Rica | | ANOLLEGA | 69
    9999 | Unknown | Ignorado
nation | 51 | 1 | Nationality | P05-Nacionalidad | | NACIONAL | 70
    1 | Costa Rican by birth | CR nacimiento
    2 | Costa Rican by naturalization | CR naturalizado
    3 | Other | Otra
ethnic | 55 | 2 | Ethnic group | P06-Etnia | | ETNIA | 72
    1 | Indigenous | Indigena
    2 | Black or Afrocostarican | Negra o Afrocostarricense
    3 | Asian | China
    4 | None of the above | Ninguna anterior
    9 | Unknown | Ignorado
b) Analyze data
• Basic record structure
• Unique IDs or other means of distinguishing household membership
c) Reformat the data
• Convert to household-person hierarchical structure
Reformat Rectangular Sample (Brazil 1980)
(Person records only; household data duplicated on person records)
Before:
geography housing person (head)
geography housing person (child)
geography housing person (child)
geography housing person (head)
geography housing person (spouse)
geography housing person (child)
geography housing person (child)
After:
geography housing
person (head)
person (child)
person (child)
geography housing
person (head)
person (spouse)
person (child)
person (child)
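The rectangular-to-hierarchical conversion can be sketched in a few lines. This is a minimal illustration, not the IPUMS program: the field names (`hhid`, `geography`, `housing`, `relate`) and the "H"/"P" record tags are assumptions, and the sketch assumes that records of the same household are adjacent in the file and share a household identifier.

```python
from itertools import groupby

def to_hierarchical(records, household_keys):
    """Turn rectangular person records into household ("H") and person ("P") records."""
    def household_part(rec):
        return tuple(rec[k] for k in household_keys)

    out = []
    # groupby merges only consecutive records, so household members
    # are assumed to sit next to each other in the input.
    for hh_values, members in groupby(records, key=household_part):
        out.append(("H", dict(zip(household_keys, hh_values))))
        for rec in members:
            person = {k: v for k, v in rec.items() if k not in household_keys}
            out.append(("P", person))
    return out

# Hypothetical input mirroring the two households in the diagram.
rectangular = [
    {"hhid": 1, "geography": "A", "housing": "owned", "relate": "head"},
    {"hhid": 1, "geography": "A", "housing": "owned", "relate": "child"},
    {"hhid": 1, "geography": "A", "housing": "owned", "relate": "child"},
    {"hhid": 2, "geography": "A", "housing": "rented", "relate": "head"},
    {"hhid": 2, "geography": "A", "housing": "rented", "relate": "spouse"},
    {"hhid": 2, "geography": "A", "housing": "rented", "relate": "child"},
    {"hhid": 2, "geography": "A", "housing": "rented", "relate": "child"},
]

hierarchical = to_hierarchical(rectangular, ["hhid", "geography", "housing"])
```

Each household's shared fields are written once, followed by one person record per member.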
Reformat Dwelling-Household-Person Sample (Chile 1992)
(Separate dwelling and household records)
Before:
dwelling
household
person (head)
person (spouse)
person (child)
household
person (head)
person (child)
dwelling
household
person (head)
person (spouse)
After:
dwelling household
person (head)
person (spouse)
person (child)
dwelling household
person (head)
person (child)
dwelling household
person (head)
person (spouse)
Reformat Dwelling-Person Sample (Colombia 1993)
(Multi-household dwellings; no separate household record)
Before:
dwelling 001
head
spouse
child
head
dwelling 002
head
child
After:
household 00101
head
spouse
child
household 00102
head
household 00201
head
child
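One way to read the Colombia 1993 diagram is that a new household starts at each head, and the household number is the dwelling number plus a two-digit sequence. The sketch below implements that reading only; the splitting rule and the function name `split_households` are assumptions, not the documented procedure.

```python
def split_households(dwelling_id, relationships):
    """Split the persons of one dwelling into households, starting a new one at each head."""
    households = []
    for relate in relationships:
        # Assumed rule: every "head" opens a new household; a dwelling
        # whose first person is not a head still gets one household.
        if relate == "head" or not households:
            households.append([])
        households[-1].append(relate)
    # Household number = dwelling number + two-digit sequence within the dwelling.
    return {
        f"{dwelling_id}{seq:02d}": members
        for seq, members in enumerate(households, start=1)
    }

dwelling_001 = split_households("001", ["head", "spouse", "child", "head"])
dwelling_002 = split_households("002", ["head", "child"])
```

With the diagram's input, dwelling 001 yields households 00101 and 00102, and dwelling 002 yields household 00201.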
Merge Separate Household and Person Files (Brazil 2000)
Household File:
serial 001 geog & housing
serial 002 geog & housing
serial 003 geog & housing
Person File:
serial 001 head
serial 001 spouse
serial 002 head
serial 002 child
serial 003 head
Merged:
serial 001 household
serial 001 head
serial 001 spouse
serial 002 household
serial 002 head
serial 002 child
serial 003 household
serial 003 head
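The merge of the two files on the household serial number can be sketched as follows. The record layout (dicts with a `serial` key) and the "H"/"P" tags are illustrative assumptions.

```python
def merge_files(household_file, person_file):
    """Interleave household and person records by serial number."""
    persons_by_serial = {}
    for person in person_file:
        persons_by_serial.setdefault(person["serial"], []).append(person)

    merged = []
    for household in sorted(household_file, key=lambda h: h["serial"]):
        merged.append(("H", household))
        # Persons keep their original file order within the household.
        for person in persons_by_serial.get(household["serial"], []):
            merged.append(("P", person))
    return merged

household_file = [
    {"serial": "001", "geog": "x", "housing": "y"},
    {"serial": "002", "geog": "x", "housing": "y"},
    {"serial": "003", "geog": "x", "housing": "y"},
]
person_file = [
    {"serial": "001", "relate": "head"},
    {"serial": "001", "relate": "spouse"},
    {"serial": "002", "relate": "head"},
    {"serial": "002", "relate": "child"},
    {"serial": "003", "relate": "head"},
]
merged = merge_files(household_file, person_file)
```

A production version would also need to report person records whose serial has no household record, and households with no persons.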
Reformat Individual-level Data (Mexico 1960)
(Individuals only; not organized in households)
[Diagram: five individual records, each carrying geography, person, and housing fields, are each converted into a household record followed by a person record.]
d) Identify and flag errors in structure
Flags Identifying Structural Issues, Chile 1970
Var | Col | Wid | VarLabel
    Value | ValueLabel
hdN | 20 | 1 | N of heads in household
    0 | Zero
    1 | One
    2 | Two or more
hdFirst | 22 | 1 | Head not first
    0 | No problem
    1 | Household has a head but not as the 1st person
spN | 24 | 1 | N of spouses in household
    0 | Zero
    1 | One
    2 | Two or more
dupRecD | 25 | 1 | Duplicate record (dwelling-wide)
    0 | No problem
    1 | Dwelling has duplicate person records
dupRecH | 26 | 1 | Duplicate record (household-wide)
    0 | No problem
    1 | Household has duplicate person records
dupRec | 27 | 1 | Duplicate record
    0 | No problem
    1 | Record equals the previous person record
fgeog | 28 | 1 | Inconsistent geography
    0 | No problem
    1 | Records in dw do not have same geography-form vars
lineN | 29 | 7 | Line number from original data file
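A few of these flags can be computed per household along the following lines. The flag names and value codes follow the table; the input layout (a list of person dicts with a `relate` field) is an assumption for illustration.

```python
def structural_flags(household):
    """Compute hdN, hdFirst, spN and dupRecH for one household (list of person dicts)."""
    relates = [person["relate"] for person in household]
    n_heads = relates.count("head")
    n_spouses = relates.count("spouse")
    # Persons are compared on all of their fields to detect duplicates.
    fingerprints = [tuple(sorted(person.items())) for person in household]
    return {
        "hdN": min(n_heads, 2),          # 0 zero, 1 one, 2 two or more
        "hdFirst": int(n_heads > 0 and relates[0] != "head"),
        "spN": min(n_spouses, 2),        # 0 zero, 1 one, 2 two or more
        "dupRecH": int(len(set(fingerprints)) < len(fingerprints)),
    }

# Hypothetical problem household: head not first, and one duplicated record.
household = [
    {"relate": "child", "age": 10},
    {"relate": "head", "age": 40},
    {"relate": "head", "age": 40},
]
flags = structural_flags(household)
```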
4. Sample Creation
a) Formerly, systematic samples
• We developed a household-substitution technique to exclude corrupt records during sampling
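Sampling Procedure – Colombia 1973
[Figure: households 1-40 with their sizes, a Flag column marking "bad" households, a "10th" column marking positions with "x", and Take/No annotations.]
HH (Size): 1 (4), 2 (2), 3 (9), 4 (4), 5 (3), 6 (1), 7 (5), 8 (6), 9 (4), 10 (5), 11 (3), 12 (1), 13 (5), 14 (2), 15 (2), 16 (4), 17 (10), 18 (2), 19 (6), 20 (2), 21 (14), 22 (6), 23 (9), 24 (6), 25 (5), 26 (1), 27 (2), 28 (1), 29 (7), 30 (5), 31 (4), 32 (3), 33 (4), 34 (6), 35 (6), 36 (13), 37 (4), 38 (7), 39 (6), 40 (5)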
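A systematic sample with household substitution might look like the sketch below. The substitution rule used here (move forward to the next uncorrupted household) is an assumption made for illustration; the slide does not state the actual rule, and the set of "bad" households in the example is invented.

```python
def systematic_sample(households, interval=10):
    """Take every `interval`-th household, substituting when the pick is corrupt.

    households: list of dicts with "id" and "bad" (True when flagged corrupt).
    """
    taken = []
    for start in range(interval - 1, len(households), interval):
        # Assumed rule: if the designated household is flagged bad, walk
        # forward to the next good household before the following pick.
        stop = min(start + interval, len(households))
        for index in range(start, stop):
            if not households[index]["bad"]:
                taken.append(households[index]["id"])
                break
    return taken

# Hypothetical flags: households 10, 11 and 30 are corrupt.
bad_ids = {10, 11, 30}
households = [{"id": i, "bad": i in bad_ids} for i in range(1, 41)]
sample = systematic_sample(households)
```

Under these assumptions, household 12 stands in for household 10 and household 31 for household 30, so the sample keeps its intended size.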
b) Stratified samples
• Variables for variance estimation
• Develop strata for each sample using geography, ethnicity, household size, household type, and socioeconomic status, adjusted as necessary for each census
5. Confidentiality
Five measures, applied as required:
• Limit geographic specificity
• Swap across geographic units
• Randomize order within geographies
• Merge small variable categories
• Top-code sensitive numeric variables
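Two of these measures, top-coding and merging small categories, are simple enough to sketch. The thresholds and the "other" label below are hypothetical; in practice they would be set per variable and per census.

```python
from collections import Counter

def top_code(values, ceiling):
    """Replace every value above `ceiling` with the ceiling itself."""
    return [min(value, ceiling) for value in values]

def merge_small_categories(values, min_count, other="other"):
    """Fold categories with fewer than `min_count` cases into one residual category."""
    counts = Counter(values)
    return [value if counts[value] >= min_count else other for value in values]

incomes = top_code([20000, 150000, 99999], ceiling=99999)
groups = merge_small_categories(["a", "a", "b", "a", "c"], min_count=2)
```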
6. Harmonization
a) Data translation matrices
Translation Matrix – Marital Status
MARST Marital Status
code | label | chn1982 | col1973 | ken1989 | mex1970 | usa1990
 | (source columns) | P35..35 | P17..17 | P24..24 | P46..46 | P59..59
100 | SINGLE/NEVER MARRIED | 1=never married | 4=single | 1=Single | 9=Soltero | 6=Never married
200 | MARRIED/IN UNION | | | | |
210 | Married (not specified) | 2=married | 2=married | 2=Monogamous | | 1=married, sp present
211 | Civil | | | | 3=Only civil |
212 | Religious | | | | 4=Only religious |
213 | Civil and religious | | | | 2=Civil and religious |
214 | Polygamous | | | 3=Polygamous | |
220 | Consensual union | | 1=free union | | 5=Free union |
300 | SEPARATED/DIVORCED | | | | |
310 | Separated or divorced | | 3=separated or divorced | | |
320 | Separated | | | 6=Separated | 8=Separated | 3=separated
321 | Legally separated | | | | |
322 | De facto separated | | | | |
330 | Divorced | 4=divorced | | 5=Divorced | 7=Divorced | 4=divorced
340 | Married, spouse absent | | | | | 2=married, sp absent
400 | WIDOWED | 3=widowed | 5=widowed | 4=Widowed | 6=Widowed | 5=widowed
999 | UNKNOWN/MISSING | 0=missing | 6=unknown | B=Blank | 1=Unknown |
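Applying a translation matrix amounts to a per-sample lookup from source code to harmonized code. The sketch below uses the mex1970 and usa1990 entries as read from the marital status matrix; the dictionary layout, the function name, and the choice to send unlisted codes to 999 are assumptions.

```python
# Source code -> harmonized MARST code, one mapping per sample.
MARST_TRANSLATION = {
    "mex1970": {
        "9": 100,  # Soltero
        "3": 211,  # Only civil
        "4": 212,  # Only religious
        "2": 213,  # Civil and religious
        "5": 220,  # Free union
        "8": 320,  # Separated
        "7": 330,  # Divorced
        "6": 400,  # Widowed
        "1": 999,  # Unknown
    },
    "usa1990": {
        "6": 100,  # Never married
        "1": 210,  # Married, spouse present
        "2": 340,  # Married, spouse absent
        "3": 320,  # Separated
        "4": 330,  # Divorced
        "5": 400,  # Widowed
    },
}

def harmonize_marst(sample, source_code):
    """Recode one source value; unlisted codes fall back to 999 (an assumption)."""
    return MARST_TRANSLATION[sample].get(str(source_code), 999)

harmonized = [harmonize_marst("mex1970", code) for code in (9, 5, 6)]
```

Codes are stored as strings because some samples use non-numeric source values, such as "B" for blank in ken1989.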
b) Specialized variable programming
• Where one-to-one recoding of the translation matrix is insufficient
7. Data Improvement
a) Constructed variables
• Family structure and other derived variables
• Location of mother, father and spouse
IPUMS “Pointer” Variables (Simple household) (Colombia 1985)
Pernum | Relate | Age | Sex | Marst | Chborn | Spouse’s Location | Mother’s Location | Father’s Location
1 | head | 46 | male | married | n/a | 2 | 0 | 0
2 | spouse | 44 | female | married | 3 | 1 | 0 | 0
3 | aunt | 77 | female | widow | 7 | 0 | 0 | 0
4 | child | 15 | female | single | 0 | 0 | 2 | 1
5 | child | 13 | female | single | n/a | 0 | 2 | 1
6 | child | 11 | male | single | n/a | 0 | 2 | 1
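For a simple household like this one, the pointers can be derived from the relationship-to-head variable alone. The sketch below covers only that case, under the assumption that every "child" is the child of both the head and the spouse; complex households need additional rules (age, marital status, record order) that are not shown here. The field and function names are illustrative.

```python
def simple_pointers(household):
    """Derive spouse, mother and father locations from relationship to head."""
    head = next((p for p in household if p["relate"] == "head"), None)
    spouse = next((p for p in household if p["relate"] == "spouse"), None)
    pointers = {
        p["pernum"]: {"sploc": 0, "momloc": 0, "poploc": 0} for p in household
    }
    if head and spouse:
        pointers[head["pernum"]]["sploc"] = spouse["pernum"]
        pointers[spouse["pernum"]]["sploc"] = head["pernum"]
    for child in (p for p in household if p["relate"] == "child"):
        for parent in (head, spouse):
            if parent is None:
                continue
            # Assumption: children of the head are also children of the spouse.
            key = "momloc" if parent["sex"] == "female" else "poploc"
            pointers[child["pernum"]][key] = parent["pernum"]
    return pointers

household = [
    {"pernum": 1, "relate": "head", "sex": "male"},
    {"pernum": 2, "relate": "spouse", "sex": "female"},
    {"pernum": 3, "relate": "aunt", "sex": "female"},
    {"pernum": 4, "relate": "child", "sex": "female"},
    {"pernum": 5, "relate": "child", "sex": "female"},
    {"pernum": 6, "relate": "child", "sex": "male"},
]
pointers = simple_pointers(household)
```

A value of 0 means that no spouse, mother, or father was located in the household.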
IPUMS “Pointer” Variables (Complex household) (Colombia 1985)
Pernum | Relationship | Age | Sex | Marst | Chborn | Spouse’s Location | Mother’s Location | Father’s Location
1 | head | 53 | female | separated | 6 | 0 | 0 | 0
2 | child | 28 | male | single | n/a | 0 | 1 | 0
3 | child | 22 | male | single | n/a | 0 | 1 | 0
4 | child | 21 | male | single | n/a | 0 | 1 | 0
5 | child | 25 | female | married | 2 | 6 | 1 | 0
6 | child-in-law | 28 | male | married | n/a | 5 | 0 | 0
7 | grandchild | 3 | male | single | n/a | 0 | 5 | 6
8 | grandchild | 1 | male | single | n/a | 0 | 5 | 6
9 | non-relative | 32 | female | separated | 2 | 0 | 0 | 0
10 | non-relative | 10 | male | single | n/a | 0 | 9 | 0
11 | non-relative | 5 | female | single | n/a | 0 | 9 | 0
b) Data editing and missing data allocation
Missing Data Allocation – Occupation Script
(USA pre-1940 samples)
OCC allocated when 975, 996, 998;
sex (2 categories) 1; 2;
empstat (3 categories) 10-19; 20-29; 30-39;
classwkr (3 categories) 10-19; 20-29; 99;
age (6 categories) 10-19; 20-29; 30-39; 40-49; 50-59; 60-126;
race (3 categories) 100-199; 200-299; 300-899;
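One common way to use such a specification is sequential hot-deck allocation: a record with a missing occupation receives the occupation of the most recent complete record in the same stratum. The sketch below takes that approach as an assumption; the script itself does not name the method. The category bands are taken from the script, while the field names and the `occ_allocated` marker are hypothetical.

```python
MISSING_OCC = {975, 996, 998}

# Inclusive (low, high) bands per stratifying variable, from the script above.
BANDS = {
    "sex": [(1, 1), (2, 2)],
    "empstat": [(10, 19), (20, 29), (30, 39)],
    "classwkr": [(10, 19), (20, 29), (99, 99)],
    "age": [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 126)],
    "race": [(100, 199), (200, 299), (300, 899)],
}

def band_index(value, bands):
    """Return the index of the band containing value, or None if out of range."""
    for index, (low, high) in enumerate(bands):
        if low <= value <= high:
            return index
    return None

def allocate_occ(records):
    """Fill missing OCC from the latest donor in the same stratum (hot deck)."""
    last_donor = {}
    allocated = []
    for record in records:
        record = dict(record)  # leave the caller's data untouched
        stratum = tuple(band_index(record[var], BANDS[var]) for var in BANDS)
        if record["occ"] in MISSING_OCC:
            if stratum in last_donor:
                record["occ"] = last_donor[stratum]
                record["occ_allocated"] = True  # hypothetical marker field
        else:
            last_donor[stratum] = record["occ"]
        allocated.append(record)
    return allocated

records = [
    {"sex": 1, "empstat": 10, "classwkr": 20, "age": 35, "race": 100, "occ": 123},
    {"sex": 1, "empstat": 12, "classwkr": 25, "age": 31, "race": 150, "occ": 996},
    {"sex": 2, "empstat": 10, "classwkr": 20, "age": 35, "race": 100, "occ": 975},
]
result = allocate_occ(records)
```

In this example the second record falls in the same stratum as the first and inherits its occupation; the third has no donor yet and keeps its missing code.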
8. Dissemination
a) Metadata
• Static pages
• Dynamic pages
• Translation matrices
• Control files
b) Dissemination Programming
• Documentation system
• Extract interface (front end)
• Extract engine (back end)
• On-line data analysis