IPUMS-International Methods

Matt Sobek, Minnesota Population Center

[email protected]


IPUMS-International Development Process

1. Inventory
2. Metadata Preparation
3. Data Preparation
4. Harmonization
5. Data Enhancements
6. Dissemination

IPUMS-International Development Process

1. Inventory
   a) Data
   b) Data dictionary
   c) Census questionnaire and instructions
   d) Sample design


IPUMS-International Development Process

2. Metadata Preparation
   • English translation
   • Data dictionaries

Original Data Dictionary (Kenya 1989)

C006-EA-TYPE       N   13      1    RURAL 1, URBAN 2
C007-HHOLDNUM      N   14-16   3    HHOLD-CODE 001:999
(record type)      A   17      1

Data Dictionary: REAL1    IMPS Version 3.1    Created: 31/10/95 11:57:21
Record Name: POP-RECORD    Record Type: 2

Item (occurs) / Subitem (occurs)   Type  Position  Len.  Dec.  Value Name and Values
POP1                               A     18-67     50
  P00-LINENUMBER                   N     18-19     2     0     LINE-NUMBER 01:49
  P10-RELATIONSHI                  N     20        1     0     HEAD 1, SPOUSE 2, SON-OFHEAD 3, DAU-OFHEAD 4, FATHER 5, MOTHER 6, OTHERRELATIVE 7, NOTRELATED 8, NR 9
  P11-SEX                          N     21        1     0     MALE 1, FEMALE 2, NR 9
  P12-AGE                          N     22-23     2     0     UNDERONE 00, YEARGIVEN 01:96, OVR97 97, NR 99
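A dictionary like this is what makes a raw record readable: each field is located by its column positions. As a minimal sketch, assuming one record per line and using only the fields shown in the excerpt, the Python below cuts a hypothetical Kenya 1989 person record into named fields. The name RECORD-TYPE and the record contents are invented for illustration.

```python
# Field layout transcribed from the Kenya 1989 dictionary excerpt:
# name -> (first column, last column), 1-based and inclusive.
LAYOUT = {
    "C006-EA-TYPE": (13, 13),
    "C007-HHOLDNUM": (14, 16),
    "RECORD-TYPE": (17, 17),  # hypothetical name: the dictionary leaves this field unnamed
    "P00-LINENUMBER": (18, 19),
    "P10-RELATIONSHI": (20, 20),
    "P11-SEX": (21, 21),
    "P12-AGE": (22, 23),
}

# Value names for P10-RELATIONSHI, copied from the dictionary.
RELATIONSHIP = {
    "1": "HEAD", "2": "SPOUSE", "3": "SON-OFHEAD", "4": "DAU-OFHEAD", "5": "FATHER",
    "6": "MOTHER", "7": "OTHERRELATIVE", "8": "NOTRELATED", "9": "NR",
}


def parse_fixed_width(record: str, layout: dict) -> dict:
    """Cut one fixed-width record into named fields (columns are 1-based, inclusive)."""
    return {name: record[start - 1:end] for name, (start, end) in layout.items()}


# Hypothetical person record. Columns 1-12 are not described in the excerpt,
# so they are filled with "X" here.
record = "X" * 12 + "2" + "045" + "2" + "03" + "4" + "2" + "15"
fields = parse_fixed_width(record, LAYOUT)
print(fields["P12-AGE"], RELATIONSHIP[fields["P10-RELATIONSHI"]])  # 15 DAU-OFHEAD
```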

Original Data Dictionary (Romania 1992)

Line No.  Item   Data type and Item Len.   Signification and values
1.        MAPA   N 6                       010001-47XXXX number of the file, where:
                                             - 01-47 is the code of the county
                                             - 0001-XXXX is the code of the census sector within the county
2.        CLAD   N 3                       The order number of the building in the file
3.        LOC    N 3                       The order number of the dwelling within the building
4.        RT     N 1                       Record type value: 4
5.        P00    N 1                       The order number of the household in the dwelling
6.        PNR    N 2                       The order number of the person in the household
7.        P01    N 2                       Relationship with the household head:
                                             . household head                      1
                                             . husband / wife                      2
                                             . son / daughter                      3
                                             . son in law / daughter in law        4
                                             . grandson / granddaughter            5
                                             . father / mother                     6
                                             . grandfather / grandmother           7
                                             . brother / sister                    8
                                             . brother in law / sister in law      9
                                             . father in law / mother in law       10
                                             . other relative                      11
                                             . non-related person                  20
8.        P05    N 1                       Situation at the census moment:
                                             . present                             1
                                             . temporally absent from the household:
                                                 - left in other place of the country   2
                                                 - left abroad                          3
                                             . absent for a long time:
                                                 - for working                          4
                                                 - for studies                          5
                                                 - other reason                         6

Original Data Dictionary (China 1982)

=======================================
year: 1982, sample: 1%, record: individual, variable: age
Length: 3  Start: 7
Age in years 0..99
=======================================
year: 1982, sample: 1%, record: individual, variable: race
Length: 2  Start: 10
Ethnicity
  01: Han         21: Va          41: Tajik
  02: Mongol      22: She         42: Nu
  03: Hui         23: Gaoshan     43: Uzbek
  04: Tibetan     24: Lahu        44: Russian
  05: Uygur       25: Sui         45: Ewenkei
  06: Miao        26: Dongxiang   46: Benglong
  07: Yi          27: Naxi        47: Baoan
  08: Zhuang      28: Jingpo      48: Yugur
  09: Bouyi       29: Kirgiz      49: Gin
  10: Korean      30: Tu          50: Tatar
  11: Man         31: Daur        51: Derung
  12: Dong        32: Mulam       52: Orogen
  13: Yao         33: Qiang       53: Hezhen
  14: Bai         34: Bulang      54: monba
  15: Tujia       35: Salar       55: Lhoba
  16: Hani        36: Maonan      56: Jino
  17: Kazak       37: Gelao       97: Other Unidentified
  18: Dai         38: Xibe        98: Naturalized Foreigners
  19: Li          39: Achang
  20: Lisu        40: Pumi
=======================================
year: 1982, sample: 1%, record: individual, variable: regstats
Length: 1  Start: 12
Registration Status
  1: Residing and registered here
  2: Residing here over 1 year, but registered elsewhere.
  3: Residing here less than 1 year, absent from the registration place 1 year or more.
  4: Living here with registration unsettled
  5: Used to reside here; is now abroad with no local registration
=======================================

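Original Data Dictionary (Mexico 1990), with English glosses in brackets

25 CLAVE DE PARENTESCO [relationship code]
   CATALOGO DE PARENTESCO (CATPAREN.TXT) [relationship catalog]
   PRIMER DIGITO IGUAL A: [first digit equal to:]
   1 JEFE(A) [head]
   2 ESPOSA(O) O COMPAÑERA(O) [spouse or partner]
   3 HIJO(A) [child]
   4 SIRVIENTE [servant]
   5 SIN PARENTESCO [not related]
   6 OTRO PARENTESCO [other relative]
   7 PERSONA SOLA [person living alone]
   9 PARENTESCO NO ESPECIFICADO [relationship not specified]
26 SEXO [sex]
   1 HOMBRE [male]
   2 MUJER [female]
27 EDAD [age]
   AÑOS CUMPLIDOS [completed years]
   999 EDAD NO ESPECIFICADA [age not specified]
28 LUGAR DE NACIMIENTO [place of birth]
   CATALOGO DE PAISES (CATPAISE.TXT) [country catalog]
   001..032 ENTIDADES DEL PAIS [states of the country]
   033..099 ENTIDAD INSUFICIENTEMENTE ESPECIFICADO [state insufficiently specified]
   100..998 OTRO PAIS [other country]
   999 NO ESPECIFICO LUGAR DE NACIMIENTO [place of birth not specified]
29 LUGAR DE RESIDENCIA ANTERIOR [previous place of residence]
   CATALOGO DE PAISES (CATPAISE.TXT) [country catalog]
   001..032 ENTIDADES DEL PAIS [states of the country]
   033..099 ENTIDAD INSUFICIENTEMENTE ESPECIFICADO [state insufficiently specified]
   100..998 OTRO PAIS [other country]
   999 NO ESPECIFICO LUGAR DE RESIDENCIA ANTERIOR [previous place of residence not specified]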

Variable Labels File – IPUMS Metadata (Costa Rica 2000)

Rec Var      Col Wid  Value  Value_Label                          Value_Label_Original           Freq       Svar
P   relate   36  2           Relationship to household head       P01-Parentesco con el jefe(a)             CR00A400
                      1      Head (male or female)                Jefe o jefa                    960,098
                      2      Spouse or partner                    Esposo(a)/compañera            680,217
                      3      Child or stepchild                   Hijo(a)/hijastro               1,763,230
                      4      Son-in-law or daughter-in-law        Yerno o nuera                  23,644
                      5      Grandchild                           Nieto(a)                       140,300
                      6      Parent or parent in-law              Padres o suegros               44,393
                      7      Other relative                       Otro familiar                  117,223
                      8      Domestic servant or relative         Serv.Domestico o su familiar   11,884
                      9      Other non-relative                   Otro no familiar               69,190
P   sex      38  1           Sex                                  P02-Sexo                                  CR00A401
                      1      Male                                 Masculino                      1,902,614
                      2      Female                               Femenino                       1,907,565
P   bpl      39  1           Place of birth                       P04-Lugar de Nacimiento                   CR00A403
                      1      In this same canton                  Mismo canton                   2,303,784
                      2      In another canton                    Otro canton                    1,209,934
                      3      In another country                   Otro pais                      296,461
P   ethnic   40  2           Ethnic group                         P06-Etnia                                 CR00A408
                      1      Indigenous                           Indigena                       63,876
                      2      Black or Afrocostarican              Negra o Afrocostarricense      72,784
                      3      Asian                                China                          7,873
                      4      None of the above                    Ninguna anterior               3,568,471
                      9      Unknown                              Ignorado                       97,175
P   indigsp  42  2           Speaks Indigenous language           P06b-Habla lengua indigena                CR00A410
                      1      Yes, speaks Indigenous lang          Si habla lengua indígena       15,806
                      2      No, does not speak Indigenous lang   No habla lengua indígena       13,768
                      9      Unknown                              Ignorado                       3,554
                      10     [no label]                                                          3,777,051
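Because the Freq column is tabulated from the data, it allows a simple consistency check: every person variable should account for the same number of person records. The sketch below applies that check to the frequencies transcribed from the file above; the check is an illustration, not a step the slides describe. With these figures, each of the five variables sums to 3,810,179.

```python
# Value frequencies transcribed from the Costa Rica 2000 variable labels file.
FREQS = {
    "relate": {1: 960_098, 2: 680_217, 3: 1_763_230, 4: 23_644, 5: 140_300,
               6: 44_393, 7: 117_223, 8: 11_884, 9: 69_190},
    "sex": {1: 1_902_614, 2: 1_907_565},
    "bpl": {1: 2_303_784, 2: 1_209_934, 3: 296_461},
    "ethnic": {1: 63_876, 2: 72_784, 3: 7_873, 4: 3_568_471, 9: 97_175},
    "indigsp": {1: 15_806, 2: 13_768, 9: 3_554, 10: 3_777_051},
}


def totals_by_variable(freqs: dict) -> dict:
    """Number of records accounted for by each variable's value frequencies."""
    return {var: sum(counts.values()) for var, counts in freqs.items()}


def consistent(freqs: dict) -> bool:
    """True when every variable accounts for the same number of person records."""
    return len(set(totals_by_variable(freqs).values())) == 1


totals = totals_by_variable(FREQS)
print(totals["sex"])      # 3810179
print(consistent(FREQS))  # True
```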

IPUMS-International Development Process

2. Metadata Preparation
   • English translation
   • Data dictionaries
   • Questionnaires and instructions

Census Questionnaire (Mexico 2000)

[Callout: Water Access]

5. Number of Rooms

How many rooms are used for sleeping without counting hallways? _____ Write the number

Without counting the hallways or bathrooms how many total rooms are in this dwelling? Count the kitchen

_____Write the number

6. Access to water

Read all of the options until you get an affirmative answer. Circle only one answer

1 Running water inside the dwelling 2 Running water outside the dwelling but on the land 3 Running water from a public faucet or hydrant 4 Running water that is carried from another dwelling 5 Tanked in by truck 6 Water from a well, river, lake, stream or other

Answers 3, 4, 5, 6 continue with number 8

7. Water supply

How many days of the week is water available? Circle only one answer

1 Daily 2 Every third day 3 Twice a week 4 Once a week 5 Occasionally

Text of Census Questionnaire (Mexico 2000)

5. Number of Rooms <svar v="MX00A016" a="all"> How many rooms are used for sleeping without counting hallways?

<i1> _____ Write the number </i1>

</svar> <svar v="MX00A017" a="all"> Without counting the hallways or bathrooms how many total rooms are in this dwelling? Count the kitchen

<i1> _____Write the number </i1>

</svar> <svar v="MX00A018" a="all"> 6. Access to water

Read all of the options until you get an affirmative answer. Circle only one answer <i1> 1 Running water inside the dwelling 2 Running water outside the dwelling but on the land 3 Running water from a public faucet or hydrant 4 Running water that is carried from another dwelling 5 Tanked in by truck 6 Water from a well, river, lake, stream or other </i1>

Answers 3, 4, 5, 6 continue with number 8 </svar>

XML-Tagged Census Questionnaire (Mexico 2000)

[Callout: Source variable MX00A016]

[Callout: Source variable MX00A017]

[Callout: Source variable MX00A018 (water access)]

<svar v="MX00A016 MX00A017" a="all"> 5. Number of Rooms

Room is the space in the dwelling delimited, normally, by fixed walls and roofs of any material. In the first question, only the rooms utilized for sleeping are considered. In the second, include all the rooms in the dwelling: bedrooms, living room, dining room, living room-dining room, kitchen, living room ["estancia"], study, and service room.

Storerooms, granaries, commercial areas, stores, garages, or others, which are regularly used for sleeping, should be counted as bedrooms and be included in the total number of rooms. </svar> <svar v="MX00A018" a="all"> 6. Access to Water

This question distinguishes dwellings which have piped water from those that get water from a different source.

[Depiction of this completed question on the enumeration form, and a related drawing] </svar> <svar v="MX00A019 MX00A020" a="all"> 7. Water Supply

When there is piped water within the dwelling or outside of the dwelling but within the property, first ask how often they receive it, and if they receive it daily, if they receive it during all or part of the day.

[Depiction of this completed question on the enumeration form] </svar>

[Callout: Source variable MX00A018]

XML-Tagged Census Instructions (Mexico 2000)
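The svar tags let a program collect, for each source variable, every passage of questionnaire and instruction text that refers to it; a single tag can name several variables, as the MX00A016 and MX00A017 block shows. A minimal sketch using the standard library's ElementTree, assuming each tagged text is wrapped in one root element (the <doc> root is an invention here). The slides do not explain the a attribute, so the sketch ignores it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged copies of the tagged questionnaire and instructions. The <doc> root
# is an assumption added so that each snippet parses as one XML document.
QUESTIONNAIRE = """<doc>
<svar v="MX00A018" a="all">6. Access to water
Read all of the options until you get an affirmative answer. Circle only one answer
<i1>1 Running water inside the dwelling 2 Running water outside the dwelling but on the land</i1>
</svar>
</doc>"""

INSTRUCTIONS = """<doc>
<svar v="MX00A016 MX00A017" a="all">5. Number of Rooms
Room is the space in the dwelling delimited, normally, by fixed walls and roofs of any material.</svar>
<svar v="MX00A018" a="all">6. Access to Water
This question distinguishes dwellings which have piped water from those that get water from a different source.</svar>
</doc>"""


def text_by_variable(xml_text: str) -> dict:
    """Map every source variable listed in an svar's v attribute to that block's text."""
    found = defaultdict(list)
    for svar in ET.fromstring(xml_text).iter("svar"):
        # itertext() also collects text inside nested tags such as <i1>.
        text = " ".join("".join(svar.itertext()).split())
        for var in svar.get("v", "").split():  # one block may document several variables
            found[var].append(text)
    return dict(found)


def assemble(variable: str) -> str:
    """Combine questionnaire and instruction text for one source variable."""
    q = " ".join(text_by_variable(QUESTIONNAIRE).get(variable, []))
    i = " ".join(text_by_variable(INSTRUCTIONS).get(variable, []))
    return f"Questionnaire: {q}\nInstructions: {i}"


print(assemble("MX00A018"))
```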

IPUMS-International Development Process

3. Data Preparation
   • Data reformatting

Reformat Rectangular Sample (Brazil 1980)

Source: person records only; household data duplicated on person records

Rectangular source:
  geography housing person (head)
  geography housing person (child)
  geography housing person (child)
  geography housing person (head)
  geography housing person (spouse)
  geography housing person (child)
  geography housing person (child)

Reformatted (hierarchical):
  geography housing
    person (head)
    person (child)
    person (child)
  geography housing
    person (head)
    person (spouse)
    person (child)
    person (child)
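Reformatting a rectangular file amounts to writing the duplicated household fields once, followed by the person fields of each member. A minimal sketch, assuming a household identifier (called serial here) on every record and records sorted by household; HOUSEHOLD_FIELDS and all values are hypothetical.

```python
from itertools import groupby

# Hypothetical household-level fields carried on every person record;
# "serial" is assumed to identify the household.
HOUSEHOLD_FIELDS = ("serial", "geo", "rooms")


def to_hierarchical(records):
    """Split rectangular person records into ("H", household) and ("P", person) records.

    Assumes the records are sorted so that members of a household are adjacent.
    """
    hierarchical = []
    for serial, members in groupby(records, key=lambda r: r["serial"]):
        members = list(members)
        household = {f: members[0][f] for f in HOUSEHOLD_FIELDS}
        # The duplicated household data should be identical on every member's record.
        if any(m[f] != household[f] for m in members for f in HOUSEHOLD_FIELDS):
            raise ValueError(f"household fields differ within serial {serial}")
        hierarchical.append(("H", household))
        for m in members:
            person = {"serial": serial}
            person.update({k: v for k, v in m.items() if k not in HOUSEHOLD_FIELDS})
            hierarchical.append(("P", person))
    return hierarchical


# The two households drawn in the Brazil 1980 figure (field values are made up).
rectangular = [
    {"serial": 1, "geo": 33, "rooms": 4, "relate": "head"},
    {"serial": 1, "geo": 33, "rooms": 4, "relate": "child"},
    {"serial": 1, "geo": 33, "rooms": 4, "relate": "child"},
    {"serial": 2, "geo": 35, "rooms": 6, "relate": "head"},
    {"serial": 2, "geo": 35, "rooms": 6, "relate": "spouse"},
    {"serial": 2, "geo": 35, "rooms": 6, "relate": "child"},
    {"serial": 2, "geo": 35, "rooms": 6, "relate": "child"},
]
print("".join(rectype for rectype, _ in to_hierarchical(rectangular)))  # HPPPHPPPP
```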

Reformat Dwelling-Household-Person Sample (Chile 1992)

Source: separate dwelling and household records

Source:
  dwelling
    household
      person (head)
      person (spouse)
      person (child)
    household
      person (head)
      person (child)
  dwelling
    household
      person (head)
      person (spouse)

Reformatted:
  dwelling household
    person (head)
    person (spouse)
    person (child)
  dwelling household
    person (head)
    person (child)
  dwelling household
    person (head)
    person (spouse)
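For a dwelling-household-person file, the dwelling data has to be carried down to every household in the dwelling, so that each household record in the result stands on its own. A minimal sketch, assuming records arrive in file order as (type, fields) pairs; the field names and values are hypothetical.

```python
def fold_dwellings(records):
    """Merge each separate dwelling record into the household records that follow it.

    records: ("D" | "H" | "P", fields) tuples in file order. Where a dwelling field and
    a household field share a name, the household value is kept.
    """
    folded, dwelling = [], None
    for rectype, fields in records:
        if rectype == "D":
            dwelling = fields  # remembered, but not written out on its own
        elif rectype == "H":
            if dwelling is None:
                raise ValueError("household record appears before any dwelling record")
            folded.append(("H", {**dwelling, **fields}))
        else:
            folded.append((rectype, fields))
    return folded


# The structure drawn in the Chile 1992 figure; field names and values are made up.
source = [
    ("D", {"dwelling": 1}),
    ("H", {"household": 1}),
    ("P", {"relate": "head"}), ("P", {"relate": "spouse"}), ("P", {"relate": "child"}),
    ("H", {"household": 2}),
    ("P", {"relate": "head"}), ("P", {"relate": "child"}),
    ("D", {"dwelling": 2}),
    ("H", {"household": 3}),
    ("P", {"relate": "head"}), ("P", {"relate": "spouse"}),
]
result = fold_dwellings(source)
print("".join(t for t, _ in result))  # HPPPHPPHPP
```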

Merge Separate Household and Person Files (Brazil 2000)

Household File:
  serial 001 geog & housing
  serial 002 geog & housing
  serial 003 geog & housing

Person File:
  serial 001 head
  serial 001 spouse
  serial 002 head
  serial 002 child
  serial 003 head

Merged:
  serial 001 household
  serial 001 head
  serial 001 spouse
  serial 002 household
  serial 002 head
  serial 002 child
  serial 003 household
  serial 003 head
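Merging separate files is a match on the household identifier followed by interleaving: each household record, then its persons. The sketch assumes both files share a serial field and fit in memory; it also reports person records whose serial has no household record, a check added here for illustration.

```python
from collections import defaultdict


def merge_files(households, persons, id_field="serial"):
    """Write each household record followed by its person records, matched on id_field.

    Persons keep the order they have in the person file; households are sorted by id.
    """
    persons_by_id = defaultdict(list)
    for p in persons:
        persons_by_id[p[id_field]].append(p)
    orphans = set(persons_by_id) - {h[id_field] for h in households}
    if orphans:
        raise ValueError(f"person records with no household record: {sorted(orphans)}")
    merged = []
    for h in sorted(households, key=lambda rec: rec[id_field]):
        merged.append(("H", h))
        merged.extend(("P", p) for p in persons_by_id[h[id_field]])
    return merged


# The Brazil 2000 example (household contents are placeholders).
households = [{"serial": "001"}, {"serial": "002"}, {"serial": "003"}]
persons = [
    {"serial": "001", "relate": "head"}, {"serial": "001", "relate": "spouse"},
    {"serial": "002", "relate": "head"}, {"serial": "002", "relate": "child"},
    {"serial": "003", "relate": "head"},
]
for rectype, rec in merge_files(households, persons):
    print(rectype, rec["serial"], rec.get("relate", "household"))
```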

IPUMS-International Development Process

3. Data Preparation
   • Data reformatting
   • Draw samples
   • Confidentiality measures
   • Convert source variables to input

MX2000   MX00A018   H 49

Original Source Variable                                             IPUMSI Input Variable
Code  Label                                              {Freq}      Code  Label
B     Not specified                                      {21,807}    0     NIU
1     Piped water inside the dwelling                    {1,138,262} 1     Piped water, inside dwelling
2     Piped water outside dwelling, but within property  {697,912}   2     Piped water, outside dwelling, within property
3     Piped water from a public tap (or hydrant)         {68,212}    3     Piped water, from a public tap
4     Piped water brought in from another dwelling       {52,041}    4     Piped water, brought from another dwelling
5     Tanked in by truck                                 {46,147}    5     Tanked in by truck
6     Water from a well, river, lake, stream or other    {287,654}   6     From a well, river, lake, stream or other
7     [undocumented]                                     {3}         9     Unknown
8     [undocumented]                                     {1}         9     Unknown
9     [undocumented]                                     {6}         9     Unknown

Input Variables – Data
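The recode in the table is a lookup from source codes to input codes, and the source frequencies determine the frequencies of the input variable. A minimal sketch with the codes and counts transcribed from the table; treating a blank field as the key "B" is an assumption about how blanks are represented.

```python
# Source code -> IPUMSI input code for MX00A018, transcribed from the table.
# "B" stands for a blank field; the three undocumented codes all become 9 (Unknown).
RECODE = {"B": 0, "1": 1, "2": 2, "3": 3, "4": 4, "5": 5, "6": 6, "7": 9, "8": 9, "9": 9}

# Source frequencies from the same table.
SOURCE_FREQS = {"B": 21_807, "1": 1_138_262, "2": 697_912, "3": 68_212, "4": 52_041,
                "5": 46_147, "6": 287_654, "7": 3, "8": 1, "9": 6}


def recode_value(raw: str) -> int:
    """Recode one raw 1-character field; a blank field is looked up as "B"."""
    key = "B" if raw.strip() == "" else raw
    return RECODE[key]  # a KeyError exposes a source code the table does not cover


def recoded_freqs(source_freqs: dict) -> dict:
    """Frequencies of the input variable implied by the source frequencies."""
    out = {}
    for code, n in source_freqs.items():
        target = RECODE[code]
        out[target] = out.get(target, 0) + n
    return out


freqs = recoded_freqs(SOURCE_FREQS)
print(freqs[0], freqs[9])  # 21807 10
```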

Input Variables – Description

MX00A018   Water source

Universe: Not collective households.

Description: Source of water used by the household.

Questionnaire:
  6. Access to water
  Read all of the options until you get an affirmative answer. Circle only one answer
  1 Running water inside the dwelling
  2 Running water outside the dwelling but on the land
  3 Running water from a public faucet or hydrant
  4 Running water that is carried from another dwelling
  5 Tanked in by truck
  6 Water from a well, river, lake, stream or other
  Answers 3, 4, 5, 6 continue with number 8

Instructions:
  6. Access to Water
  This question distinguishes dwellings which have piped water from those that get water from a different source.

[Callout: Assigned by computer]

[Callout: Developed by researchers]

[Callout: Assembled by computer from XML markups]

IPUMS-International Development Process

4. Harmonization
   • Data
   • Correspondence tables

Correspondence Table – Marital Status

MARST Marital Status

code label CN82A403 CO73A411 KN89A413 MX70A402 US90A425

100 SINGLE/NEVER MARRIED 1=never married 4=single 1=single 9=single 6=never married

200 MARRIED/IN UNION

210 Married (not specified) 2=married 2=married 3=monogamous 1=married

211 Civil 3=only civil

212 Religious 4=only religious

213 Civil and religious 2=civil and religious

214 Polygamous 3=polygamous

220 Consensual union 1=free union 5=free union

300 SEPARATED/DIVORCED 3=sep. or divorced

310 Separated 6=separated 8=separated 3=separated

321 Legally separated

322 De facto separated

330 Divorced 4=divorced 5=divorced 7=divorced 4=divorced

400 WIDOWED 3=widowed 5=widowed 4=widowed 6=widowed 5=widowed

999 UNKNOWN/MISSING 0=missing 6=unknown B=blank 1=unknown

Column key: CN82A403 = China 1982; CO73A411 = Colombia 1973; KN89A413 = Kenya 1989; MX70A402 = Mexico 1970; US90A425 = U.S.A. 1990
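A correspondence table is, in effect, one lookup per sample from source codes to a shared harmonized code. The sketch transcribes only the SINGLE/NEVER MARRIED and WIDOWED rows, where all five samples have an entry; in the other rows the flattened table does not show which sample each code belongs to. It illustrates the idea and is not the IPUMS harmonization software.

```python
# Sample variable -> {source code: MARST detailed code}. Only the two rows in which
# every sample has an entry are transcribed; all other codes are deliberately absent.
CORRESPONDENCE = {
    "CN82A403": {1: 100, 3: 400},
    "CO73A411": {4: 100, 5: 400},
    "KN89A413": {1: 100, 4: 400},
    "MX70A402": {9: 100, 6: 400},
    "US90A425": {6: 100, 5: 400},
}


def harmonize(source_var: str, code: int) -> int:
    """Return the MARST code for one source value; KeyError if it is not transcribed."""
    return CORRESPONDENCE[source_var][code]


# Different source codes for "single" all land on the same harmonized code.
print([harmonize(v, c) for v, c in [("CN82A403", 1), ("CO73A411", 4), ("MX70A402", 9)]])  # [100, 100, 100]
```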

Correspondence Table – Marital Status

MARST Marital Status

gen code label CN82A403 CO73A411 KN89A413 MX70A402 US90A425

1 100 SINGLE/NEVER MARRIED 1=never married 4=single 1=single 9=single 6=never married

2 200 MARRIED/IN UNION

210 Married (not specified) 2=married 2=married 3=monogamous 1=married

211 Civil 3=only civil

212 Religious 4=only religious

213 Civil and religious 2=civil and religious

214 Polygamous 3=polygamous

220 Consensual union 1=free union 5=free union

3 300 SEPARATED/DIVORCED 3=sep. or divorced

310 Separated 6=separated 8=separated 3=separated

321 Legally separated

322 De facto separated

330 Divorced 4=divorced 5=divorced 7=divorced 4=divorced

4 400 WIDOWED 3=widowed 5=widowed 4=widowed 6=widowed 5=widowed

9 999 UNKNOWN/MISSING 0=missing 6=unknown B=blank 1=unknown

General Codes
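In the table above the general code is simply the first digit of the three-digit detailed code: 100 becomes 1, 210 through 220 become 2, and 999 becomes 9. A minimal sketch of that derivation, assuming every detailed MARST code has three digits as in the table.

```python
def general_code(detailed: int) -> int:
    """General MARST code: the leading digit of the three-digit detailed code (e.g. 213 -> 2)."""
    if not 100 <= detailed <= 999:
        raise ValueError(f"not a three-digit detailed code: {detailed}")
    return detailed // 100


# Detailed codes from the marital status table; 210-214 and 220 all fall under 2.
detailed = [100, 200, 210, 211, 212, 213, 214, 220, 300, 310, 321, 322, 330, 400, 999]
print(sorted({general_code(c) for c in detailed}))  # [1, 2, 3, 4, 9]
```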

IPUMS-International Development Process

4. Harmonization
   • Data
   • Correspondence tables
   • Supplemental programming

<programming>
BRA1970
  inctot=p25..28*12;
  if (p25..28 = 9999) inctot=0;
  if (age<10) inctot=99999999;
BRA1980
  inctot=p64..72 + p87..95 + p103..111 + p112..120 + p121..129 + p130..138;
BRA1991
  if (p139..146=BBBBBBBB) inctot=0;
  if (p139..146 > 9999997 && p139..146 < 99999999) inctot=9999997;
  if (p139..146=99999999) inctot=9999998;
  if (age<10) inctot=99999999;
BRA2000
  if (p310..315 = 0 and age < 10) inctot=99999999;
  if (p310..315 = BBBBBB && age > 9) inctot=9999998;
COL1973
  if (age < 10) inctot=9999999;
  if (p55..59 = 99999) inctot=9999998;
  if (p55..59 = BBBBB) inctot=0;
USA1960, USA1970, USA1980, USA1990
  if (p154..159=999999) inctot=9999999;
MEX1970
  if (age < 12) inctot=9999999;
MEX2000
  if (p170..175 = 999999) inctot=9999998;
</programming>

Supplementary Variable Programming (INCTOT)
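To make the rule syntax concrete, here is the BRA1991 block translated into Python as a sketch. The slide shows only the special cases, so the assumption that all other values are copied unchanged is labeled as such; as in the original, the age rule comes last and therefore overrides the others.

```python
# Special output codes used in the BRA1991 rules.
TOP_CODE = 9_999_997
UNKNOWN = 9_999_998
NIU = 99_999_999  # assigned to persons under age 10


def inctot_bra1991(field: str, age: int) -> int:
    """Apply the BRA1991 INCTOT rules to the raw 8-character field (p139..146).

    Assumption: values not caught by any rule are copied unchanged; the slide
    shows only the special cases.
    """
    if field.strip() == "":          # blank field (BBBBBBBB)
        inctot = 0
    else:
        value = int(field)
        if value == 99_999_999:
            inctot = UNKNOWN
        elif value > 9_999_997:      # 9999998 through 99999998 are top-coded
            inctot = TOP_CODE
        else:
            inctot = value
    if age < 10:                     # last rule, so it overrides the others
        inctot = NIU
    return inctot


print(inctot_bra1991("00001500", 35), inctot_bra1991("99999999", 35))  # 1500 9999998
```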

IPUMS-International Development Process

4. Harmonization
   • Data
   • Correspondence tables
   • Supplemental programming
   • Documentation
   • Integration
   • Mark-up for web delivery

XML-Tagged Variable Text (Literacy)

<vardesc>
<var> LIT </var>
<desc> LIT indicates whether or not the respondent could read and write in any language. A person is typically considered literate if they can both read and write. All other persons are illiterate, including those who can either read or write but cannot do both. </desc>
<comp> Some samples provided more specific criteria than others with respect to the level of ability that should constitute literacy. Typically, the instructions appear to be aimed at distinguishing persons who have memorized how to write their signature or recognize certain words from those who can truly write and comprehend text they read. In the 1999 Vietnam census, all persons with 5 or more years of schooling are automatically considered literate. </comp>
<comp.bra> All Brazilian censuses consistently stipulated that to be considered literate a person must be able to read and write a simple note in any language. Persons are not literate if they can only write their name or if they once learned to read and write but have since forgotten. </comp.bra>
<comp.chn> The Chinese census instructions supplied explicit criteria for defining literate and semi-literate persons, who are combined in the data as "illiterate." The instructions stated that illiterate and semi-literate persons were those who knew fewer than 1500 words and could not read "simple language books and newspapers or write a simple message." </comp.chn>

[Callouts: Variable Name; Description; General Comparability; Comparability: Brazil; Comparability: China]

Variable Description on Website (Literacy)
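Splitting the comparability text into a general element and per-country elements plausibly lets the website show only the notes relevant to the samples a user works with. A sketch of that selection, assuming an abridged and closed copy of the markup and a hypothetical list of country suffixes ("bra", "chn") standing in for the user's choice.

```python
import xml.etree.ElementTree as ET

# Abridged, closed copy of the LIT markup (the slide shows only its beginning).
VARDESC = """<vardesc>
<var>LIT</var>
<desc>LIT indicates whether or not the respondent could read and write in any language.</desc>
<comp>Some samples provided more specific criteria than others with respect to the level
of ability that should constitute literacy.</comp>
<comp.bra>All Brazilian censuses consistently stipulated that to be considered literate
a person must be able to read and write a simple note in any language.</comp.bra>
<comp.chn>The Chinese census instructions supplied explicit criteria for defining
literate and semi-literate persons.</comp.chn>
</vardesc>"""


def comparability_for(xml_text: str, countries: list) -> list:
    """General comparability text plus the comp.xxx sections for the chosen countries.

    countries: hypothetical user selection of tag suffixes such as "bra" or "chn".
    """
    root = ET.fromstring(xml_text)
    wanted = {"comp"} | {f"comp.{c}" for c in countries}
    # Direct children are compared by tag name, keeping document order.
    return [" ".join(el.text.split()) for el in root if el.tag in wanted and el.text]


for section in comparability_for(VARDESC, ["bra"]):
    print(section)
```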

IPUMS-International Development Process

5. Data Enhancements
   • Data editing
   • Consistency edits
   • Hot-deck imputation

Missing Data Allocation Script (Occupation variable, USA)

OCC is allocated when its value is 975, 996, or 998.

Variable   categ1      categ2      categ3      categ4    categ5    categ6
empstat    (10-19)     (20-29)     (30-39)
classwkr   (10-19)     (20-29)     (99)
sex        (1)         (2)
race       (100-199)   (200-299)   (300-899)
age        (10-19)     (20-29)     (30-39)     (40-49)   (50-59)   (60-120)

5-dimensional table: 3 × 3 × 2 × 3 × 6 = 324 cells
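A common way to run a hot deck over such a table is sequentially: each person is classified into one of the 324 cells, and an invalid occupation is replaced by the most recent valid occupation seen in the same cell. The following is a minimal sketch of that technique, not the IPUMS allocation program; the starting value used before any donor is seen and the occ_allocated flag are hypothetical.

```python
from itertools import product

# Category bounds (inclusive) from the allocation script.
DIMENSIONS = {
    "empstat": [(10, 19), (20, 29), (30, 39)],
    "classwkr": [(10, 19), (20, 29), (99, 99)],
    "sex": [(1, 1), (2, 2)],
    "race": [(100, 199), (200, 299), (300, 899)],
    "age": [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 120)],
}
INVALID_OCC = {975, 996, 998}  # values that trigger allocation


def all_cells():
    """Every combination of category indexes: 3 * 3 * 2 * 3 * 6 = 324 cells."""
    return list(product(*(range(len(bands)) for bands in DIMENSIONS.values())))


def cell_of(person):
    """Category index of the person on each dimension, as a tuple."""
    key = []
    for var, bands in DIMENSIONS.items():
        matches = [i for i, (lo, hi) in enumerate(bands) if lo <= person[var] <= hi]
        if not matches:
            raise ValueError(f"{var}={person[var]} is outside every category")
        key.append(matches[0])
    return tuple(key)


def hot_deck(persons, start_value):
    """Sequential hot deck over records in file order; returns new dicts.

    A valid OCC becomes the donor for its cell; an invalid OCC takes the cell's
    current donor. start_value (hypothetical) seeds every cell.
    """
    donors = dict.fromkeys(all_cells(), start_value)
    out = []
    for p in persons:
        q, c = dict(p), cell_of(p)
        q["occ_allocated"] = q["occ"] in INVALID_OCC
        if q["occ_allocated"]:
            q["occ"] = donors[c]
        else:
            donors[c] = q["occ"]
        out.append(q)
    return out


persons = [
    {"empstat": 10, "classwkr": 20, "sex": 1, "race": 100, "age": 35, "occ": 100},
    {"empstat": 12, "classwkr": 22, "sex": 1, "race": 150, "age": 38, "occ": 998},
    {"empstat": 20, "classwkr": 99, "sex": 2, "race": 300, "age": 70, "occ": 975},
]
result = hot_deck(persons, start_value=0)
print([p["occ"] for p in result])  # [100, 100, 0]
```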

IPUMS-International Development Process

5. Data Enhancements
   • Data editing
   • Consistency edits
   • Hot-deck imputation
   • Family interrelationship “pointers”

IPUMS “Pointer” Variables (Simple household)

Pernum  Relate  Age  Sex     Marst    Chborn  Spouse’s location  Mother’s location  Father’s location
1       head    46   male    married  n/a     2                  0                  0
2       spouse  44   female  married  3       1                  0                  0
3       aunt    77   female  widow    7       0                  0                  0
4       child   15   female  single   0       0                  2                  1
5       child   13   female  single   n/a     0                  2                  1
6       child   11   male    single   n/a     0                  2                  1

IPUMS “Pointer” Variables (Complex household)

Pernum  Relationship  Age  Sex     Marst      Chborn  Spouse’s location  Mother’s location  Father’s location
1       head          53   female  separated  6       0                  0                  0
2       child         28   male    single     n/a     0                  1                  0
3       child         22   male    single     n/a     0                  1                  0
4       child         21   male    single     n/a     0                  1                  0
5       child         25   female  married    2       6                  1                  0
6       child-in-law  28   male    married    n/a     5                  0                  0
7       grandchild    3    male    single     n/a     0                  5                  6
8       grandchild    1    male    single     n/a     0                  5                  6
9       non-relative  32   female  separated  2       0                  0                  0
10      non-relative  10   male    single     n/a     0                  9                  0
11      non-relative  5    female  single     n/a     0                  9                  0
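In the simple household, the pointers follow directly from the relationship codes: head and spouse point to each other, and each child points to the female and the male among them. A minimal sketch of only those links, assuming every child of the head is also the spouse's child. It does not reproduce the links the complex household needs (the child-in-law's marriage, the grandchildren, the non-relative mother and her children), which would require further evidence such as age, marital status, and children ever born.

```python
def pointers(persons):
    """Spouse's, mother's and father's locations for head/spouse/child links only.

    persons: (pernum, relate, sex) tuples. A location of 0 means "not in the household".
    Assumption: every "child" is the child of both the head and the head's spouse.
    """
    head = next((n for n, rel, _ in persons if rel == "head"), 0)
    spouse = next((n for n, rel, _ in persons if rel == "spouse"), 0)
    sex_of = {n: sex for n, _, sex in persons}
    locations = {}
    for n, rel, _ in persons:
        sp = mom = pop = 0
        if rel == "head":
            sp = spouse
        elif rel == "spouse":
            sp = head
        elif rel == "child":
            for parent in (head, spouse):
                if parent and sex_of[parent] == "female":
                    mom = parent
                elif parent and sex_of[parent] == "male":
                    pop = parent
        locations[n] = (sp, mom, pop)
    return locations


simple = [(1, "head", "male"), (2, "spouse", "female"), (3, "aunt", "female"),
          (4, "child", "female"), (5, "child", "female"), (6, "child", "male")]
for pernum, locs in pointers(simple).items():
    print(pernum, *locs)
```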


IPUMS-International Development Process

6. Dissemination
   • Documentation system
   • Preferences and dynamic content delivery
   • Data extraction system
   • Sample, variable, and case selection
   • General and detailed variables
   • Advanced extract features
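The extraction steps listed above can be pictured as three filters over person records: samples, then cases, then variables, with an option to deliver a variable's general rather than detailed version. A sketch with a hypothetical record layout and sample identifiers; the general version assumes, as with MARST, that the general code is the leading digit of a three-digit detailed code.

```python
def extract(records, samples, variables, case_filter=None, general=()):
    """Select samples, then cases, then variables from person records.

    records: dicts with at least "sample", "serial" and "pernum" (hypothetical layout).
    general: variables to deliver in their general version (detailed code // 100).
    """
    keep = ["sample", "serial", "pernum", *variables]
    out = []
    for r in records:
        if r["sample"] not in samples:
            continue
        if case_filter is not None and not case_filter(r):
            continue
        row = {k: r[k] for k in keep}
        for var in general:
            row[var] = row[var] // 100
        out.append(row)
    return out


records = [
    {"sample": "mx2000", "serial": 1, "pernum": 1, "age": 46, "marst": 210, "sex": "male"},
    {"sample": "mx2000", "serial": 1, "pernum": 2, "age": 9, "marst": 100, "sex": "female"},
    {"sample": "br2000", "serial": 7, "pernum": 1, "age": 30, "marst": 330, "sex": "male"},
]
rows = extract(records, {"mx2000"}, ["age", "marst"],
               case_filter=lambda r: r["age"] >= 15, general={"marst"})
print(rows)  # [{'sample': 'mx2000', 'serial': 1, 'pernum': 1, 'age': 46, 'marst': 2}]
```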