Weighting and Imputation for CORE Social Housing Statistics Julia Bowman & Niall Goulding.
What CORE is
• COntinuous REcording of Social Housing Lettings
• Census – hybrid of interview and administrative data
• Household level data collected
• Private Registered Providers and Local Authorities
• Collected from all housing providers in England since 2004
• Many types of information are collected, not just the number of lettings…
Lettings log
2012/13 Headline stats
Context – 378,700 lettings
Household characteristics – 91% UK nationals, 22% in work, 3% under 18
Most common reason for leaving last settled home – overcrowding
Average weekly rent - £79.58 / £104.52
Length of time vacant – 32 days
Staying within local authority – 90%
Complementary data sets
• Local Authority Housing Statistics (LAHS)
• English Housing Survey (EHS)
Users
• Interest centres on household characteristics
• And media interest…
QIF bid
• Two problems we sought to resolve…
• Bid placed with the UKSA’s Quality Improvement Fund (QIF)
• Work carried out by the ONS Methodology Advisory Board
Problem 1: LA missing records
• Lettings volume varies greatly by local authority
• Local Authority Housing Statistics (LAHS): nearly complete lettings data at LA level
• CORE: lettings data at household level
• Some LAs do not provide logs for every letting in CORE
• This introduces bias into demographic statistics
• Lettings are grossed up to LAHS counts by urban/rural classification
• This does not account for the demographics of the population
Solution 1: Improved Weighting
• Geographic approach maintained
• ONS area classifications (OACs) are used to replace urban/rural classifications.
• Areas are grouped on many factors using cluster analysis
• What is our best estimate for lettings per ONS cluster area?
• The highest of LAHS or CORE for each LA
• If neither is available, we use an imputed LAHS figure
• Sum these to get total lettings per ONS cluster area
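A minimal Python sketch of the best-estimate step, assuming each LA is a dict with hypothetical keys `cluster`, `lahs`, `core` and `imputed_lahs`, where `None` means the figure is not available. Taking the highest of whichever figures exist follows the "highest of LAHS, CORE, imputed LAHS" rule; the figures in the example are made up.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional


def best_estimate(lahs: Optional[float], core: Optional[float],
                  imputed_lahs: Optional[float]) -> float:
    """Highest available lettings figure for one LA (None = not available)."""
    candidates = [v for v in (lahs, core, imputed_lahs) if v is not None]
    if not candidates:
        raise ValueError("no lettings figure available for this LA")
    return max(candidates)


def lettings_per_cluster(las: Iterable[Mapping]) -> dict:
    """Sum the per-LA best estimates within each ONS cluster area group."""
    totals: dict = defaultdict(float)
    for la in las:
        totals[la["cluster"]] += best_estimate(
            la.get("lahs"), la.get("core"), la.get("imputed_lahs"))
    return dict(totals)


if __name__ == "__main__":
    # Made-up figures for three hypothetical LAs in two cluster groups.
    las = [
        {"cluster": "A", "lahs": 1200, "core": 1000},
        {"cluster": "A", "lahs": None, "core": None, "imputed_lahs": 350},
        {"cluster": "B", "lahs": 800, "core": 850},
    ]
    print(lettings_per_cluster(las))  # {'A': 1550.0, 'B': 850.0}
```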
Highest of LAHS, CORE, imputed LAHS for each LA
Sum lettings per ONS cluster area group
Compare to reported CORE figure per area group
Ratio of best estimate to CORE figure = weight
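The final ratio step can be sketched as below, assuming cluster totals are held in plain dicts keyed by a hypothetical cluster label. Every CORE log in a cluster receives that cluster's weight, so the weights summed over the logs reproduce the best estimate of total lettings. The totals are made up.

```python
def cluster_weights(best_totals: dict, core_totals: dict) -> dict:
    """Weight per cluster = best estimate of lettings / CORE logs reported."""
    weights = {}
    for cluster, best in best_totals.items():
        core = core_totals.get(cluster, 0)
        if core <= 0:
            # A cluster with no CORE logs cannot be grossed up.
            raise ValueError(f"no CORE logs in cluster {cluster!r}")
        weights[cluster] = best / core
    return weights


def weight_logs(log_clusters: list, weights: dict) -> list:
    """Attach the cluster weight to each CORE log, given its cluster label."""
    return [weights[c] for c in log_clusters]


if __name__ == "__main__":
    # Made-up totals: best estimates versus CORE logs actually received.
    best = {"A": 1550, "B": 850}
    core = {"A": 1240, "B": 850}
    w = cluster_weights(best, core)
    print(w)  # {'A': 1.25, 'B': 1.0}
    logs = ["A"] * 1240 + ["B"] * 850
    print(sum(weight_logs(logs, w)))  # 2400.0
```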
Problem 2: Record level missing data
• Both LAs and PRPs submit logs with missing household characteristics
• Age, sex, ethnicity, nationality and economic status
• Reasons include:
  • the tenant refuses to provide the information
  • some LAs do not interview
  • admin data constraints
  • IT constraints
Solution 2: Imputation
• So how do we account for this?
• Donor imputation: Nearest-neighbour Imputation Methodology (NIM)
• CanCEIS, the Canadian Census Edit and Imputation System (used for the 2001 Canadian Census and the 2011 UK Census)
• Efficient, free licence, and supports a variety of record editing rules
Raw data comes to DCLG (SPSS)
Data reformatted for CanCEIS (ASCII)
CanCEIS finds incomplete records and donor records
CanCEIS matches records on:
• household characteristics that are available (age, sex, ethnicity, nationality, economic status)
• area classification, provider type (LA/PRP), previous tenure, size of property, asylum seeker and refugee status (and client type)
A donor record is picked at random from the pool of donors
Imputed output data set
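A simplified Python sketch of the donor step, for illustration only: it is a stand-in, not CanCEIS itself. Here the distance between two records is just the number of matching variables on which they differ, the donors at minimum distance form the pool, and one is drawn at random. The record layout (dicts with keys such as `age` and `nationality`) and the function names are hypothetical; `-10` is the missing code used in the coded example records, and the usage data are those example records plus one donor of the kind shown in the donor pool.

```python
import random
from typing import List, Optional

MISSING = -10  # missing-value code, as in the coded example records


def distance(a: dict, b: dict, match_vars: List[str]) -> int:
    """Count the matching variables on which two records differ."""
    return sum(a[v] != b[v] for v in match_vars)


def impute(records: List[dict], target: str, match_vars: List[str],
           rng: Optional[random.Random] = None) -> List[dict]:
    """Fill MISSING values of `target` from nearest-neighbour donors.

    Simplified stand-in, not CanCEIS: donors are records whose `target`
    is present; the donors at minimum distance form the pool and one is
    drawn at random. Matching variables are assumed to be complete.
    Returns new dicts; the input records are not modified.
    """
    rng = rng or random.Random()
    donors = [r for r in records if r[target] != MISSING]
    if not donors:
        raise ValueError("no donor records available")
    out = []
    for rec in records:
        if rec[target] != MISSING:
            out.append(dict(rec))
            continue
        dists = [distance(rec, d, match_vars) for d in donors]
        nearest = min(dists)
        pool = [d for d, dist in zip(donors, dists) if dist == nearest]
        out.append({**rec, target: rng.choice(pool)[target]})
    return out


if __name__ == "__main__":
    records = [
        {"age": 45, "sex": 1, "nationality": 1, "area": 6, "asylum": 0},
        {"age": 35, "sex": 1, "nationality": 2, "area": 2, "asylum": 0},
        {"age": 27, "sex": 2, "nationality": MISSING, "area": 4, "asylum": 0},
        {"age": 27, "sex": 2, "nationality": 2, "area": 4, "asylum": 0},
    ]
    filled = impute(records, "nationality", ["age", "sex", "area", "asylum"],
                    random.Random(0))
    print([r["nationality"] for r in filled])  # [1, 2, 2, 2]
```

Exact matches on every variable is a deliberately crude distance; a production system would weight variables and treat age as numeric.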
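Raw records:
Age  Sex  Nationality  Area  Asylum
45   M    UK           6     N
35   M    EEA          2     N
27   F    MISSING      4     N

Coded records (Sex: M = 1, F = 2; Nationality: UK = 1, EEA = 2, missing = -10; Asylum: N = 0):
Age  Sex  Nationality  Area  Asylum
45   1    1            6     0
35   1    2            2     0
27   2    -10          4     0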
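Recipient record and matching donor records:
Age  Sex  Nationality  Area  Asylum
27   2    -10          4     0      recipient (nationality missing)
27   2    2            4     0      donors (×10)
Imputed nationality: 2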
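The recoding from the raw records to the numeric records can be sketched as follows. The code tables contain only the categories visible in the example; any other category, and the real CanCEIS input layout, are not shown on the slides, so the space-separated output line here is a hypothetical stand-in and unrecognised values are simply mapped to the `-10` missing code.

```python
MISSING = -10  # missing-value code seen in the example

# Code tables read off the example records; only the categories shown are covered.
SEX_CODES = {"M": 1, "F": 2}
NATIONALITY_CODES = {"UK": 1, "EEA": 2}
ASYLUM_CODES = {"N": 0}


def code_record(age: int, sex: str, nationality: str, area: int,
                asylum: str) -> list:
    """Convert one raw record to numeric codes; unknown values become -10."""
    return [
        age,
        SEX_CODES.get(sex, MISSING),
        NATIONALITY_CODES.get(nationality, MISSING),
        area,
        ASYLUM_CODES.get(asylum, MISSING),
    ]


def to_text_line(codes: list) -> str:
    """Space-separated text line (a stand-in for the real input layout)."""
    return " ".join(str(c) for c in codes)


if __name__ == "__main__":
    raw = [(45, "M", "UK", 6, "N"), (35, "M", "EEA", 2, "N"),
           (27, "F", "MISSING", 4, "N")]
    for rec in raw:
        print(to_text_line(code_record(*rec)))
    # 45 1 1 6 0
    # 35 1 2 2 0
    # 27 2 -10 4 0
```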
The complete process
Raw data comes to DCLG
Weighting: weights assigned
Imputation: complete records
Final data set
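Once both steps have run, every record in the final data set is complete and carries a weight, so a weighted count for any category is the sum of the weights of the records in it. A minimal sketch, assuming a hypothetical dict-per-record layout with a `weight` key and made-up weights:

```python
from collections import defaultdict


def weighted_counts(records: list, var: str, weight_key: str = "weight") -> dict:
    """Weighted count per category of `var`: the sum of record weights."""
    totals: dict = defaultdict(float)
    for rec in records:
        totals[rec[var]] += rec[weight_key]
    return dict(totals)


if __name__ == "__main__":
    # Made-up final records: imputation has filled nationality,
    # weighting has attached a weight to every record.
    final = [
        {"provider": "PRP", "nationality": "UK", "weight": 1.0},
        {"provider": "LA", "nationality": "UK", "weight": 1.25},
        {"provider": "LA", "nationality": "A10", "weight": 1.25},
        {"provider": "PRP", "nationality": "Other", "weight": 1.0},
    ]
    print(weighted_counts(final, "nationality"))
    # {'UK': 2.25, 'A10': 1.25, 'Other': 1.0}
    print(weighted_counts(final, "provider"))
    # {'PRP': 2.0, 'LA': 2.5}
```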
Results
• What happens when we weight and impute?
Original reported data
Nationality    PRP        LA        %
UK             113,071    69,256    91.8%
A10            4,258      2,547     3.4%
Other EEA      1,286      936       1.1%
Other          3,537      3,710     3.6%
Missing        4,324      17,131    9.7%
Total lettings: 220,056

Weighted and imputed data
Nationality    PRP        LA        %
UK             116,944    96,410    91.4%
A10            4,427      3,569     3.4%
Other EEA      1,347      1,369     1.2%
Other          3,758      5,510     4.0%
Total lettings: 233,334

Imputed data
Nationality    PRP        LA        %
UK             116,944    84,439    91.5%
A10            4,427      3,118     3.4%
Other EEA      1,347      1,204     1.2%
Other          3,758      4,819     3.9%
Total lettings: 220,056
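The percentages in the original reported data can be reproduced from its counts. Doing so shows that the nationality shares are computed over records with a known nationality, while the Missing row is a share of all 220,056 lettings. The counts below are copied from that table; the function names are illustrative.

```python
# Counts copied from the original reported data table: (PRP, LA).
ORIGINAL = {
    "UK": (113_071, 69_256),
    "A10": (4_258, 2_547),
    "Other EEA": (1_286, 936),
    "Other": (3_537, 3_710),
    "Missing": (4_324, 17_131),
}


def row_totals(table: dict) -> dict:
    """Add the PRP and LA columns for each nationality group."""
    return {k: prp + la for k, (prp, la) in table.items()}


def pct_of_known(totals: dict, missing_key: str = "Missing") -> dict:
    """Percentage of each group among records with a known nationality."""
    known = {k: v for k, v in totals.items() if k != missing_key}
    denom = sum(known.values())
    return {k: round(100 * v / denom, 1) for k, v in known.items()}


if __name__ == "__main__":
    totals = row_totals(ORIGINAL)
    print(sum(totals.values()))  # 220056
    print(pct_of_known(totals))
    # {'UK': 91.8, 'A10': 3.4, 'Other EEA': 1.1, 'Other': 3.6}
    print(round(100 * totals["Missing"] / sum(totals.values()), 1))  # 9.7
```

The same arithmetic on the other two tables shows that PRP totals stay at 126,476 throughout, while LA lettings rise from 93,580 to 106,858 once weighting is applied.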
Testing
• But what further tests can we do?
• Remove logs from a complete data set, weight the remainder, and compare the results with the complete version
• Delete known values, impute them, and check the error rate
• Look for other unaccounted-for biases that need weighting
• Any other thoughts?
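The deletion test in the list above could be prototyped along these lines. This is a sketch under stated assumptions: a simple within-class random donor (hot-deck) imputer stands in for CanCEIS, records are dicts, the `-10` missing code is borrowed from the example records, and the function names and toy data are hypothetical.

```python
import random
from collections import defaultdict

MISSING = -10  # missing-value code borrowed from the example records


def hot_deck(records, target, class_vars, rng):
    """Stand-in imputer: random donor value from the same class of records."""
    pools = defaultdict(list)
    for r in records:
        if r[target] != MISSING:
            pools[tuple(r[v] for v in class_vars)].append(r[target])
    out = []
    for r in records:
        if r[target] == MISSING:
            pool = pools.get(tuple(r[v] for v in class_vars))
            # No donor in the class: leave the value missing.
            out.append({**r, target: rng.choice(pool) if pool else MISSING})
        else:
            out.append(dict(r))
    return out


def deletion_test(complete, target, class_vars, frac=0.1, seed=0):
    """Blank a random fraction of known values, re-impute, return the error rate."""
    rng = random.Random(seed)
    n_delete = max(1, int(frac * len(complete)))
    deleted = set(rng.sample(range(len(complete)), n_delete))
    masked = [{**r, target: MISSING} if i in deleted else dict(r)
              for i, r in enumerate(complete)]
    imputed = hot_deck(masked, target, class_vars, rng)
    wrong = sum(imputed[i][target] != complete[i][target] for i in deleted)
    return wrong / n_delete


if __name__ == "__main__":
    # Toy data in which nationality is fully determined by area,
    # so a class-based imputer should recover every deleted value.
    complete = [{"area": a, "nationality": 1 if a < 5 else 2}
                for a in range(1, 9) for _ in range(20)]
    print(deletion_test(complete, "nationality", ["area"]))  # 0.0
```

On real CORE data the error rate would be compared across variables and imputation settings rather than expected to be zero.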
Future work
• CORE is now National Statistics – improvements pending
• Use areas from 2011 census data
• Affordable rent weighting and imputation
• Improve data quality and volume from LAs – 2013/14 is the first year in which all LAs will participate
• Ongoing disclosure control investigations
• Make CORE data more easily available via Open Data Communities
Thank you. Questions and comments please!