DwB-Training Course on EU-SILC, February 13-15, 2013
Working with EU-SILC using the hierarchical data structure, matching & aggregating data
Practical computing session I – Part 2
Heike Wirth
GESIS – Leibniz Institut für Sozialwissenschaften
Romanian Social Data Archive at the Department of Sociology, University of Bucharest, Romania
Introduction
• EU-SILC data have a hierarchical structure
  • more than one level of analysis is possible
  • household & individual levels are represented by separate files
  • data are stored in multiple data files
Example of household level data
Example 1: Household

record #  Year of survey  Country  HH-ID  Dwelling type         Total disposable HHLD income  Ability to make ends meet  …
          HB010           HB020    HB030  HH010                 HY020                         HS120                      …
1         2010            AT       1      apartment or flat in  15,271                        with great difficulty
2         2010            AT       2      detached house        30,081                        fairly easily
…
1500      2010            RO       1      detached house        2,243                         fairly easily
1501      2010            RO       2      detached house        2,409                         with difficulty
…         …               …        …      …                     …                             …                          …

1 observation = 1 household
Please note: the household ID (HH-ID) does not differentiate between countries.
To be on the safe side, use the household ID together with country & year of survey.
Example of individual level data
Example 2: Individual data

record #  Year of survey  Country  HH-ID  Person-ID  Marital status  Gross monthly earnings  Highest ISCED level attained
          PB010           PB020    PX030  PB030      PB190           PY0200G                 PE040
1         2010            AT       1      11         married         3500                    (upper) secondary
2         2010            AT       1      12         married         1400                    lower secondary
3         2010            AT       1      13         never married   1450                    (upper) secondary
4         2010            AT       1      14         never married   2307                    lower secondary
30001     2010            RO       1      11         married         1500                    (upper) secondary
30002     2010            RO       1      12         married         750                     lower secondary
30003     2010            RO       1      13         never married   250                     (upper) secondary
…         …               …        …      …          …               …                       …

1 observation = 1 person
The Person-ID is sequential within the household.
Working with this kind of data requires
• a decision on the appropriate unit of analysis for your research question, e.g.
  • research interest in households or persons?
  • % of households / persons / men / women / children who live in poverty?
  • % of households with only 1 person or % of persons who live alone?
• knowledge of procedures for manipulating the data
Types of Matching
• One-to-one matching
  • Household Register to Household Data
  • Personal Register to Personal Data
• One-to-many matching
  • Household variables to individual data
• Many-to-one matching (‘aggregation’)
  • e.g. adding information from the individual data to the household data
EU-SILC – Types of matching
[Diagram: the Household-Register File (D), Household-Data File (H), Personal-Register File (R) and Personal-Data File (P), connected by links labelled 1:1, 1:n and n:1]
Linking EU-SILC files (cross-sectional)
• Key variables provide links between the related records
  • between household files
  • between individual files
  • between household and individual files
• Key variables (depending on the files) are
  • household id (DB030; HB030; RX030; PX030)
  • personal id (RB030; PB030)
• To be on the safe side, always use the key variables together with
  • ‘year of survey’ (DB010; HB010; RB010; PB010) &
  • ‘country’ (DB020; HB020; RB020; PB020)
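The advice to combine the household id with year and country can be checked directly in the data. The following is a minimal sketch, not part of the course material: it assumes the file handle data_path has been defined as in the hands-on part, and the variable names n_full and n_hhid are made up for illustration.

```spss
* Sketch only: n_full and n_hhid are hypothetical variable names.
* Assumes the file handle data_path is already defined.
GET FILE='data_path/udb_c10H_silc_course.sav'.
* Number of records sharing the full key (year, country, household id).
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=HB010 HB020 HB030
  /n_full = N.
* Number of records sharing the household id alone.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=HB030
  /n_hhid = N.
FREQUENCIES VARIABLES=n_full n_hhid.
```

If the full key identifies households uniquely, n_full should be 1 for every record, whereas n_hhid is expected to exceed 1 wherever the same household id occurs in more than one country.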
Example 1: one-to-one
• Attach household register information (D-File) to the household data file (H-File)
  • e.g. ‘Degree of urbanisation’ (DB100) is included only in the household register; it can be useful to have this information in the household data, too.
One-to-One Match, e.g. household information

Household Register (separate file)
DB010  DB020  DB030  DB075  (…)  DB100
2010   AT     2      3      (…)  intermediate area
2010   AT     12     2      (…)  thinly populated area
2010   AT     13     3      (…)  thinly populated area
2010   AT     19     2      (…)  thinly populated area
2010   AT     26     3      (…)  thinly populated area
2010   AT     59     4      (…)  densely populated area

Household Data (separate file)
HB010  HB020  HB030  HS090               HS120                  (…)  HX060
2010   AT     2      no - cannot afford  with great difficulty  (…)  One person household
2010   AT     12     yes                 with difficulty        (…)  Other hhlds without dep. children
2010   AT     13     no - other reason   fairly easily          (…)  One person household
2010   AT     19     yes                 fairly easily          (…)  Other hhlds without dep. children
2010   AT     26     yes                 easily                 (…)  Other hhlds without dep. children
2010   AT     59     yes                 with some difficulty   (…)  One person household
Result: Combined Household File

Household Data (combined file)
HB010  HB020  HB030  HS090               HS120                  (…)  HX060                                        DB100
2010   AT     2      no - cannot afford  with great difficulty  (…)  One person household                         intermediate area
2010   AT     12     yes                 with difficulty        (…)  Other households without dependent children  thinly populated area
2010   AT     13     no - other reason   fairly easily          (…)  One person household                         thinly populated area
2010   AT     19     yes                 fairly easily          (…)  Other households without dependent children  thinly populated area
2010   AT     26     yes                 easily                 (…)  Other households without dependent children  thinly populated area
2010   AT     59     yes                 with some difficulty   (…)  One person household                         densely populated area
Example 2: one-to-many
• Attach household register information (D-File) to the personal data file (P-File)
  • Attach ‘Degree of urbanisation’ (again), this time to the personal data file
Attaching household data to personal data (1:n)

Personal Data (combined)
PB010  PB020  PX030  PB030  PH010  PH020  PH030            PX020  DB100
2010   AT     2      201    fair   yes    yes, limited     71     intermediate area
2010   AT     12     1201   fair   no     no, not limited  32     thinly populated area
2010   AT     12     1202   fair   yes    yes, limited     31     thinly populated area
2010   AT     12     1203   good   no     no, not limited  30     thinly populated area
2010   AT     12     1204   fair   no     no, not limited  26     thinly populated area
(…)

Household Register (separate file)
DB010  DB020  DB030  DB075  (…)  DB100
2010   AT     2      3      (…)  intermediate area
2010   AT     12     2      (…)  thinly populated area
2010   AT     26     3      (…)  thinly populated area
Example 3: many-to-one
• e.g. number of persons in a household who are
  • unemployed,
  • full-time employed,
  • self-employed?
• such information is not included in the data
  => own computation
Matching: many-to-one (summarizing information)

Personal Data, with summarized variables
PB010  PB020  PX030  PB030  PL031                # unempl  # employed full time  # self employed
2010   AT     2      201    Unemployed (5)       1         0                     0
2010   AT     12     1201   Empl. full time (1)  0         2                     1
2010   AT     12     1202   Empl. full time (1)  0         2                     1
2010   AT     12     1203   Empl. part time (2)  0         2                     1
2010   AT     12     1204   Self-employed (3)    0         2                     1
(…)

Household Data (combined file)
HB010  HB020  HB030  # unempl  # employed  # self employed
2010   AT     2      1         0           0
2010   AT     12     0         2           1
2010   AT     26     ..        …
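One possible way to produce such household-level counts in SPSS is the AGGREGATE command. The sketch below is an illustration under assumptions, not necessarily the procedure used in the course: it assumes PL031 is numeric with the codes shown in the example table (5 = unemployed, 1 = employed full time, 3 = self-employed), that the file handles data_path and mydata_path are defined, and all new variable names and the output file name hh_counts.sav are hypothetical.

```spss
* Sketch only: variable names and hh_counts.sav are hypothetical.
* PL031 codes 5, 1 and 3 are taken from the example table.
GET FILE='data_path/udb_c10p_silc_course.sav'.
* Flag variables: 1 if the person is in the category, 0 otherwise.
* A missing PL031 gives a missing flag, which SUM ignores.
COMPUTE unempl = (PL031 = 5).
COMPUTE fulltime = (PL031 = 1).
COMPUTE selfempl = (PL031 = 3).
* One record per household with the number of persons and the sums of the flags.
AGGREGATE
  /OUTFILE='mydata_path/hh_counts.sav'
  /BREAK=PB010 PB020 PX030
  /n_persons = N
  /n_unempl = SUM(unempl)
  /n_fulltime = SUM(fulltime)
  /n_selfempl = SUM(selfempl).
```

The resulting file has one record per year, country and household id, so it could afterwards be attached to the household data with a one-to-one match on those keys.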
Hands on – matching 1:1
• Attach ‘Degree of Urbanisation’ (DB100) to the household data file (H-File)
  • Open the EU-SILC training dataset (D-File).
  • Check the variables you are interested in.
  • Sort your data according to the key variables used for linkage.
  • Names of key variables in the files to be matched must be identical
    => Create new key variables (ID010, ID020, ID_HH) in such a way that
       DB010 = ID010
       DB020 = ID020
       DB030 = ID_HH
  • Create a new file with only the key variables & the variable(s) you are interested in
  • Name the new file DB100.sav
SPSS–Matching: one-to-one
**** Before you start ************.
* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
* specify the path where you want to save your data.
FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.
* open the EU-SILC training dataset – D-File *.
GET FILE='data_path/udb_c10d_silc_course.sav'.
* check the variables you are interested in.
crosstabs DB020 by DB100.
SPSS–Matching: one-to-one
* open the EU-SILC training dataset – D-File *.
GET FILE='data_path/udb_c10d_silc_course.sav'.
* check the variables you are interested in.
crosstabs DB020 by DB100.
* Step 1 - Sort your data according to the key variables used for linkage *.
sort cases by DB010 DB020 DB030.
* Step 2 - Names of key variables in the files to be matched must be identical *.
rename variables (DB010 DB020 DB030 = ID010 ID020 ID_HH).
* create a new file with the key variables & the variable(s) you are interested in *.
save outfile = 'mydata_path/DB100.sav' /keep ID010 ID020 ID_HH DB100.
SPSS–Matching: one-to-one
GET FILE='data_path/udb_c10H_silc_course.sav'.
sort cases by HB010 HB020 HB030.
* Key – Variables *.
* either rename (like before) or, better, generate new variables *.
STRING ID020 (A2).
compute ID010 = HB010.
compute ID020 = HB020.
compute ID_HH = HB030.
MATCH FILES FILE= *
 /file ='mydata_path/DB100.sav'
 /BY ID010 ID020 ID_HH.
execute.
* check whether it worked.
crosstabs HB020 by DB100.
SPSS–Matching: One-to-many Match (1:n)
Example 2: Combining household and personal data
E.g. attach ‘Degree of Urbanisation’ (DB100) to personal data.
GET FILE='data_path/udb_c10p_silc_course.sav'.
* Sort by the key variables used for linkage *.
sort cases by PB010 PB020 PX030.
* PB020 = string variable - create a new string variable ID020 / or use the rename command *.
STRING ID020 (A2).
compute ID010 = PB010.
compute ID020 = PB020.
compute ID_HH = PX030.
SPSS–Matching: One-to-many Match (1:n)
MATCH FILES FILE= *
 /table = 'mydata_path/DB100.sav'
 /BY ID010 ID020 ID_HH.
execute.
* Check whether it worked *.
crosstabs pb020 by db100.
save outfile = 'mydata_path/personal_data.sav'.
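Besides the crosstab, it can be useful to count how many persons ended up without a value for DB100 after the match. The following is a small sketch under assumptions: it is meant to run directly after the MATCH FILES step above, DB100 is assumed to be numeric, and no_db100 is a hypothetical variable name.

```spss
* Sketch only: no_db100 is a hypothetical variable name.
* Run after the MATCH FILES step; DB100 is assumed to be numeric.
* MISSING() returns 1 if DB100 is missing for the person, 0 otherwise.
COMPUTE no_db100 = MISSING(DB100).
EXECUTE.
FREQUENCIES VARIABLES=no_db100.
```

Note that this flag cannot distinguish between a person whose household was not found in DB100.sav and a household whose DB100 value is itself missing in the register.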
Matching: many-to-one (n : 1)
• Create new summary variables for personal data (P-File)
  • number of persons living in the same household
  • number of unemployed persons living in a household
  • number of full-time employed persons living in a household
  • number of part-time employed persons living in a household
  • number of self-employed persons living in a household
  • sum of ‘pensions from individual private plans’ (PY080G)
*********************************************************.
* many-to-one (n:1)
* Personal Data
* example 1
* number of persons living in the same household
* number of unemployed persons living in a household
*********************************************************.
* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
* specify the path where you want to save your data.
FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.
* open the EU-SILC training dataset.
GET FILE='data_path/udb_c10p_silc_course.sav'.