Posted 21-Dec-2015

Census Processing Procedures

Matt Sobek

Funded by the National Science Foundation

Minnesota Population Center

IPUMS Work Process

1. Inventory
2. English Translation
3. Data Restructuring
4. Sample Creation
5. Confidentiality Measures
6. Data Harmonization
7. Data Improvement
8. Dissemination

1. Inventory

For each sample:

• data
• data dictionary
• census questionnaire and instructions
• sample design
• census design
• published tabulations, post-enumeration surveys, demographic analyses (when available)

2. Translation

• Census questionnaire

• Census instructions

• Data dictionary codes and labels

3. Data Restructuring

a) Create labels/set-up file

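Labels File, Costa Rica 2000

Var | Col | Wid | Value | VarLabel | ValueLabel | VarLabelOrig | ValueLabelOrig | Freq | VarOrig | DD#
relate | 36 | 2 | | Relationship to household head | | P01-Parentesco con el jefe(a) | | | PARENTES | 64
| | | 1 | | Head (male or female) | | Jefe o jefa | | |
| | | 2 | | Spouse or partner | | Esposo(a)/compañera | | |
| | | 3 | | Child or stepchild | | Hijo(a)/hijastro | | |
| | | 4 | | Son-in-law or daughter-in-law | | Yerno o nuera | | |
| | | 5 | | Grandchild | | Nieto(a) | | |
| | | 6 | | Parent or parent in-law | | Padres o suegros | | |
| | | 7 | | Other relative | | Otro familiar | | |
| | | 8 | | Domestic servant or relative of serv. | | Serv.Domestico o su familiar | | |
| | | 9 | | Other non-relative | | Otro no familiar | | |
sex | 38 | 1 | | Sex | | P02-Sexo | | | SEXO | 65
| | | 1 | | Male | | Masculino | | |
| | | 2 | | Female | | Femenino | | |
age | 39 | 3 | | Age | | P03-Edad | | | EDAD | 66
bplg | 42 | 1 | | Place of birth | | P04-Lugar de Nacimiento | | | NACIMIEN | 67
| | | 1 | | In this same canton | | Mismo canton | | |
| | | 2 | | In another canton | | Otro canton | | |
| | | 3 | | In another country | | Otro pais | | |
arryr | 46 | 5 | | Year of Arrival in Costa Rica | | P04c-Año de Llegada a Costa Rica | | | ANOLLEGA | 69
| | | 9999 | | Unknown | | Ignorado | | |
nation | 51 | 1 | | Nationality | | P05-Nacionalidad | | | NACIONAL | 70
| | | 1 | | Costa Rican by birth | | CR nacimiento | | |
| | | 2 | | Costa Rican by naturalization | | CR naturalizado | | |
| | | 3 | | Other | | Otra | | |
ethnic | 55 | 2 | | Ethnic group | | P06-Etnia | | | ETNIA | 72
| | | 1 | | Indigenous | | Indigena | | |
| | | 2 | | Black or Afrocostarican | | Negra o Afrocostarricense | | |
| | | 3 | | Asian | | China | | |
| | | 4 | | None of the above | | Ninguna anterior | | |
| | | 9 | | Unknown | | Ignorado | | |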

b) Analyze data

• Basic record structure
• Unique IDs or other means of distinguishing household membership

c) Reformat the data

• Convert to household-person hierarchical structure

Reformat Rectangular Sample (Brazil 1980)

(Person records only; household data duplicated on person records)

Original:

geography housing person (head)
geography housing person (child)
geography housing person (child)
geography housing person (head)
geography housing person (spouse)
geography housing person (child)
geography housing person (child)

Reformatted:

geography housing
    person (head)
    person (child)
    person (child)
geography housing
    person (head)
    person (spouse)
    person (child)
    person (child)
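The rectangular-to-hierarchical conversion can be sketched in Python. This is a minimal illustration only: it assumes each person record is a dict that carries a household identifier, and the field names `hh_id`, `geography`, `housing`, and `relate` are hypothetical rather than taken from any actual sample layout.

```python
from itertools import groupby

# Hypothetical household-level field names; real samples define their own.
HOUSEHOLD_FIELDS = ("hh_id", "geography", "housing")

def to_hierarchical(person_records):
    """Turn rectangular person records into household and person records.

    Assumes records belonging to the same household are adjacent.
    """
    output = []
    for _, members in groupby(person_records, key=lambda r: r["hh_id"]):
        members = list(members)
        # Household data are duplicated on every person; keep one copy.
        household = {k: members[0][k] for k in HOUSEHOLD_FIELDS}
        output.append(("H", household))
        for m in members:
            person = {k: v for k, v in m.items() if k not in HOUSEHOLD_FIELDS}
            output.append(("P", person))
    return output

rectangular = [
    {"hh_id": 1, "geography": "A", "housing": "owned", "relate": "head"},
    {"hh_id": 1, "geography": "A", "housing": "owned", "relate": "child"},
    {"hh_id": 2, "geography": "B", "housing": "rented", "relate": "head"},
]
hierarchical = to_hierarchical(rectangular)
record_types = [rectype for rectype, _ in hierarchical]
```

Each household record is emitted once, followed by its person records with the duplicated household fields stripped.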

Reformat Dwelling-Household-Person Sample (Chile 1992)

(Separate dwelling and household records)

Original:

dwelling
    household
        person (head)
        person (spouse)
        person (child)
    household
        person (head)
        person (child)
dwelling
    household
        person (head)
        person (spouse)

Reformatted:

dwelling household
    person (head)
    person (spouse)
    person (child)
dwelling household
    person (head)
    person (child)
dwelling household
    person (head)
    person (spouse)

Reformat Dwelling-Person Sample (Colombia 1993)

(Multi-household dwellings; no separate household record)

Original:

dwelling 001
    head
    spouse
    child
    head
dwelling 002
    head
    child

Reformatted:

household 00101
    head
    spouse
    child
household 00102
    head
household 00201
    head
    child
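One way to split a multi-household dwelling is sketched below. The rule that every "head" starts a new household is an assumption read off the diagram, not a documented procedure, and the function name `split_dwelling` is hypothetical. The household IDs imitate the pattern shown (dwelling number plus a two-digit counter).

```python
def split_dwelling(dwelling_id, relationships):
    """Assign household IDs to the persons of one dwelling.

    Assumption (illustration only): each "head" starts a new household.
    A dwelling whose first person is not a head still gets a household.
    """
    households = []
    count = 0
    for relate in relationships:
        if relate == "head" or count == 0:
            count += 1
            households.append((f"{dwelling_id}{count:02d}", []))
        households[-1][1].append(relate)
    return households

result = split_dwelling("001", ["head", "spouse", "child", "head"])
```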

Merge Separate Household and Person Files (Brazil 2000)

Household File:

serial 001 geog & housing
serial 002 geog & housing
serial 003 geog & housing

Person File:

serial 001 head
serial 001 spouse
serial 002 head
serial 002 child
serial 003 head

Merged:

serial 001 household
serial 001 head
serial 001 spouse
serial 002 household
serial 002 head
serial 002 child
serial 003 household
serial 003 head
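A minimal Python sketch of such a merge, assuming both files are lists of `(serial, data)` tuples keyed by the same serial number. The tuple layout and the function name `merge_files` are assumptions for illustration.

```python
def merge_files(household_file, person_file):
    """Interleave household and person records by serial number.

    Person records keep their original order within each household.
    """
    persons_by_serial = {}
    for serial, data in person_file:
        persons_by_serial.setdefault(serial, []).append(data)

    merged = []
    for serial, data in sorted(household_file):
        merged.append((serial, "household", data))
        for person in persons_by_serial.get(serial, []):
            merged.append((serial, "person", person))
    return merged

household_file = [("001", "geog & housing"), ("002", "geog & housing"),
                  ("003", "geog & housing")]
person_file = [("001", "head"), ("001", "spouse"), ("002", "head"),
               ("002", "child"), ("003", "head")]
merged = merge_files(household_file, person_file)
```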

Reformat Individual-level Data (Mexico 1960)

(Individuals only; not organized in households)

Original:

geog person housing geog person
geog person housing geog person
geog person housing geog person
geog person housing geog person
geog person housing geog person

Reformatted:

household
    person
household
    person
household
    person
household
    person
household
    person

d) Identify and flag errors in structure

Flags Identifying Structural Issues, Chile 1970

Var      Col  Wid  Value  VarLabel / ValueLabel
hdN      20   1           N of heads in household
                   0      Zero
                   1      One
                   2      Two or more
hdFirst  22   1           Head not first
                   0      No problem
                   1      Household has a head but not as the 1st person
spN      24   1           N of spouses in household
                   0      Zero
                   1      One
                   2      Two or more
dupRecD  25   1           Duplicate record (dwelling-wide)
                   0      No problem
                   1      Dwelling has duplicate person records
dupRecH  26   1           Duplicate record (household-wide)
                   0      No problem
                   1      Household has duplicate person records
dupRec   27   1           Duplicate record
                   0      No problem
                   1      Record equals the previous person record
fgeog    28   1           Inconsistent geography
                   0      No problem
                   1      Records in dw do not have same geography-form vars
lineN    29   7           Line number from original data file
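Several of these flags can be computed with straightforward checks. The sketch below is a guess at what the labels imply, not the actual IPUMS program. It assumes each person is a dict with a `relate` key and that persons are listed in file order; the function names are hypothetical.

```python
def household_flags(persons):
    """Compute household-level structural flags for one household."""
    relates = [p["relate"] for p in persons]
    n_heads = relates.count("head")
    n_spouses = relates.count("spouse")
    return {
        "hdN": min(n_heads, 2),    # 0, 1, or 2 (two or more)
        "hdFirst": int(n_heads > 0 and relates[0] != "head"),
        "spN": min(n_spouses, 2),  # 0, 1, or 2 (two or more)
        # 1 if any person record is identical to another in the household
        "dupRecH": int(any(persons.count(p) > 1 for p in persons)),
    }

def dup_rec(persons):
    """Per-person flag: 1 if the record equals the previous person record."""
    return [int(i > 0 and p == persons[i - 1]) for i, p in enumerate(persons)]

household = [
    {"relate": "spouse", "age": 40},
    {"relate": "head", "age": 42},
    {"relate": "child", "age": 9},
    {"relate": "child", "age": 9},
]
flags = household_flags(household)
dups = dup_rec(household)
```

In this made-up household the head is listed second and the last two records are identical, so `hdFirst`, `dupRecH`, and the final `dupRec` value are all set.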

4. Sample Creation

a) Formerly, systematic samples

• We developed a household-substitution technique to exclude corrupt records during sampling

Sampling Procedure – Colombia 1973

[Figure: households 1–40 listed with the columns HH, Size, Flag ("bad"), 10th ("x"), and Take ("Take" / "No").]

HH  Size    HH  Size
1   4       21  14
2   2       22  6
3   9       23  9
4   4       24  6
5   3       25  5
6   1       26  1
7   5       27  2
8   6       28  1
9   4       29  7
10  5       30  5
11  3       31  4
12  1       32  3
13  5       33  4
14  2       34  6
15  2       35  6
16  4       36  13
17  10      37  4
18  2       38  7
19  6       39  6
20  2       40  5
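The slides do not spell out the substitution rule, so the following Python sketch rests on an assumed one: when the systematically selected household is flagged as bad, take the nearest later household that is not flagged and has the same size. The function name, the tuple layout, and the example data are all made up for illustration.

```python
def systematic_sample(households, interval, start):
    """Systematic sample with substitution for flagged households.

    `households` is a list of (hh_id, size, is_bad) tuples in file order.
    Assumed rule (not stated in the source): a bad selection is replaced
    by the closest later household that is not bad, has the same size,
    and has not already been taken. If none exists, nothing is taken.
    """
    taken = []
    for i in range(start, len(households), interval):
        hh_id, size, is_bad = households[i]
        if not is_bad:
            if hh_id not in taken:
                taken.append(hh_id)
            continue
        # Search forward for a usable substitute.
        for cand_id, cand_size, cand_bad in households[i + 1:]:
            if not cand_bad and cand_size == size and cand_id not in taken:
                taken.append(cand_id)
                break
    return taken

# Made-up example data: (household ID, size, flagged as bad)
households = [
    (1, 4, False), (2, 2, False), (3, 9, False), (4, 2, True),
    (5, 3, False), (6, 2, False), (7, 5, False), (8, 6, False),
]
sample = systematic_sample(households, interval=4, start=3)
```

Matching the substitute on household size is one plausible way to keep the size distribution of the sample close to that of the source file.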


b) Stratified samples

• Variables for variance estimation

• Develop strata for each sample using geography, ethnicity, hh size, hh type, socioeconomic status; adjusted as necessary for census
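A stratified draw can be sketched as follows. This is an illustration under simple assumptions: households are dicts, the strata are formed from whichever keys are passed in, and every stratum is sampled at the same interval. The field names and the function name `stratified_sample` are hypothetical.

```python
from collections import defaultdict

def stratified_sample(households, strata_keys, interval):
    """Take every `interval`-th household within each stratum.

    Each sampled household carries its stratum, which can later serve
    as a variable for variance estimation.
    """
    by_stratum = defaultdict(list)
    for hh in households:
        stratum = tuple(hh[k] for k in strata_keys)
        by_stratum[stratum].append(hh)

    sample = []
    for stratum, members in by_stratum.items():
        for hh in members[::interval]:
            sample.append({**hh, "stratum": stratum})
    return sample

households = [
    {"id": 1, "region": "north", "hh_size": 1},
    {"id": 2, "region": "north", "hh_size": 1},
    {"id": 3, "region": "north", "hh_size": 4},
    {"id": 4, "region": "south", "hh_size": 1},
    {"id": 5, "region": "north", "hh_size": 1},
]
sample = stratified_sample(households, ["region", "hh_size"], interval=2)
```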

5. Confidentiality Measures

5 measures, as required:

• Limit geographic specificity
• Swap across geographic units
• Randomize order within geographies
• Merge small variable categories
• Top-code sensitive numeric variables
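Two of these measures, top-coding and merging small categories, are simple enough to sketch in Python. The thresholds, function names, and example values below are arbitrary assumptions, not values used for any actual sample.

```python
from collections import Counter

def top_code(values, cap):
    """Replace every value above `cap` with `cap`."""
    return [min(v, cap) for v in values]

def merge_small_categories(values, min_count, other="other"):
    """Recode categories with fewer than `min_count` cases to `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

incomes = [1200, 5400, 98000, 310000]
capped = top_code(incomes, cap=100000)

ethnic = ["a", "a", "a", "b", "c", "a"]
merged = merge_small_categories(ethnic, min_count=2)
```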

6. Data Harmonization

a) Data translation matrices

Translation Matrix – Marital Status

MARST Marital Status

code | label | chn1982 | col1973 | ken1989 | mex1970 | usa1990
| | P35..35 | P17..17 | P24..24 | P46..46 | P59..59
100 | SINGLE/NEVER MARRIED | 1=never married | 4=single | 1=Single | 9=Soltero | 6=Never married
200 | MARRIED/IN UNION | | | | |
210 | Married (not specified) | 2=married | 2=married | 2=Monogamous | | 1=married, sp present
211 | Civil | | | | 3=Only civil |
212 | Religious | | | | 4=Only religious |
213 | Civil and religious | | | | 2=Civil and religious |
214 | Polygamous | | | 3=Polygamous | |
220 | Consensual union | | 1=free union | | 5=Free union |
300 | SEPARATED/DIVORCED | | | | |
310 | Separated or divorced | | 3=separated or divorced | | |
320 | Separated | | | 6=Separated | 8=Separated | 3=separated
321 | Legally separated | | | | |
322 | De facto separated | | | | |
330 | Divorced | 4=divorced | | 5=Divorced | 7=Divorced | 4=divorced
340 | Married, spouse absent | | | | | 2=married, sp absent
400 | WIDOWED | 3=widowed | 5=widowed | 4=Widowed | 6=Widowed | 5=widowed
999 | UNKNOWN/MISSING | 0=missing | 6=unknown | B=Blank | 1=Unknown |
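A translation matrix amounts to a per-sample lookup from original codes to harmonized codes. The Python sketch below transcribes three of the sample columns from the marital status matrix above. The dictionary layout and the function name `harmonize_marst` are assumptions for illustration, not the format of the actual matrix files.

```python
# Harmonized MARST codes keyed by sample and original code,
# transcribed from three columns of the marital status matrix.
MARST_MATRIX = {
    "chn1982": {"1": 100, "2": 210, "3": 400, "4": 330, "0": 999},
    "col1973": {"1": 220, "2": 210, "3": 310, "4": 100, "5": 400, "6": 999},
    "ken1989": {"1": 100, "2": 210, "3": 214, "4": 400, "5": 330,
                "6": 320, "B": 999},
}

def harmonize_marst(sample, original_code):
    """Look up the harmonized code; fail loudly on an unmapped code."""
    try:
        return MARST_MATRIX[sample][original_code]
    except KeyError:
        raise ValueError(f"no mapping for {sample} code {original_code!r}")

codes = [harmonize_marst("col1973", c) for c in ["4", "1", "3"]]
```

Raising an error on unmapped codes, rather than silently assigning 999, makes gaps in the matrix visible during processing.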


b) Specialized variable programming

• Where one-to-one recoding of the translation matrix is insufficient

7. Data Improvement

a) Constructed variables

• Family structure and other derived variables
• Location of mother, father and spouse

IPUMS "Pointer" Variables (Simple household) (Colombia 1985)

Pernum  Relate  Age  Sex     Marst    Chborn  Spouse's Location  Mother's Location  Father's Location
1       head    46   male    married  n/a     2                  0                  0
2       spouse  44   female  married  3       1                  0                  0
3       aunt    77   female  widow    7       0                  0                  0
4       child   15   female  single   0       0                  2                  1
5       child   13   female  single   n/a     0                  2                  1
6       child   11   male    single   n/a     0                  2                  1
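For a simple household like this one, the pointers can be derived from the relationship and sex variables alone. The Python sketch below uses deliberately simplified rules that are assumptions for illustration: the head and spouse point to each other, and each child points to the head and spouse as parents according to their sex. Complex households need more elaborate rules than this.

```python
def locate_relatives(persons):
    """Return {pernum: (spouse_loc, mother_loc, father_loc)}.

    A value of 0 means the relative is not present or not identified.
    """
    head = next((p for p in persons if p["relate"] == "head"), None)
    spouse = next((p for p in persons if p["relate"] == "spouse"), None)
    parents = [p for p in (head, spouse) if p is not None]
    mother = next((p["pernum"] for p in parents if p["sex"] == "female"), 0)
    father = next((p["pernum"] for p in parents if p["sex"] == "male"), 0)

    pointers = {}
    for p in persons:
        spouse_loc = mother_loc = father_loc = 0
        if p is head and spouse:
            spouse_loc = spouse["pernum"]
        elif p is spouse and head:
            spouse_loc = head["pernum"]
        elif p["relate"] == "child":
            mother_loc, father_loc = mother, father
        pointers[p["pernum"]] = (spouse_loc, mother_loc, father_loc)
    return pointers

household = [
    {"pernum": 1, "relate": "head", "sex": "male"},
    {"pernum": 2, "relate": "spouse", "sex": "female"},
    {"pernum": 3, "relate": "aunt", "sex": "female"},
    {"pernum": 4, "relate": "child", "sex": "female"},
    {"pernum": 5, "relate": "child", "sex": "female"},
    {"pernum": 6, "relate": "child", "sex": "male"},
]
pointers = locate_relatives(household)
```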

IPUMS "Pointer" Variables (Complex household) (Colombia 1985)

Pernum  Relationship  Age  Sex     Marst      Chborn  Spouse's Location  Mother's Location  Father's Location
1       head          53   female  separated  6       0                  0                  0
2       child         28   male    single     n/a     0                  1                  0
3       child         22   male    single     n/a     0                  1                  0
4       child         21   male    single     n/a     0                  1                  0
5       child         25   female  married    2       6                  1                  0
6       child-in-law  28   male    married    n/a     5                  0                  0
7       grandchild    3    male    single     n/a     0                  5                  6
8       grandchild    1    male    single     n/a     0                  5                  6
9       non-relative  32   female  separated  2       0                  0                  0
10      non-relative  10   male    single     n/a     0                  9                  0
11      non-relative  5    female  single     n/a     0                  9                  0


b) Data editing and missing data allocation

Missing Data Allocation – Occupation Script

(USA pre-1940 samples)

OCC allocated when 975, 996, 998;

sex (2 categories) 1; 2;

empstat (3 categories) 10-19; 20-29; 30-39;

classwkr (3 categories) 10-19; 20-29; 99;

age (6 categories) 10-19; 20-29; 30-39; 40-49; 50-59; 60-126;

race (3 categories) 100-199; 200-299; 300-899;
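The script lists the codes that trigger allocation and the predictor categories, but not the allocation method itself. The Python sketch below assumes a sequential hot deck, in which a missing value is filled from the most recent valid record in the same predictor cell. To stay short it uses only sex and age band as predictors, and all names in it are hypothetical.

```python
MISSING_OCC = {975, 996, 998}
AGE_BANDS = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 126)]

def band(value, bounds):
    """Return the index of the first (low, high) band containing value."""
    for i, (low, high) in enumerate(bounds):
        if low <= value <= high:
            return i
    return None

def allocate_occ(records):
    """Fill missing occ from the most recent donor in the same cell.

    Records with no earlier donor in their cell are left unchanged.
    """
    last_donor = {}
    for rec in records:
        cell = (rec["sex"], band(rec["age"], AGE_BANDS))
        if rec["occ"] in MISSING_OCC:
            if cell in last_donor:
                rec["occ"] = last_donor[cell]
                rec["occ_allocated"] = True
        else:
            last_donor[cell] = rec["occ"]
    return records

records = [
    {"sex": 1, "age": 34, "occ": 100},
    {"sex": 2, "age": 31, "occ": 200},
    {"sex": 1, "age": 38, "occ": 996},
    {"sex": 1, "age": 70, "occ": 975},
]
allocated = allocate_occ(records)
```

The third record receives occupation 100 from the first record, which shares its sex and age band; the fourth has no donor yet and keeps its missing code.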

8. Dissemination

a) Metadata

• Static pages
• Dynamic pages
• Translation matrices
• Control files

b) Dissemination Programming

• Documentation system
• Extract interface (front end)
• Extract engine (back end)
• On-line data analysis

End

Matt Sobek

[email protected]