Administrative Data Sources

ONS Centre for Demography

Overview

Assess the statistical quality of the record-level data to which ONS has access:

• School Census
• Higher Education Statistics Agency
• Migrant Workers Scan
• Lifetime Labour Market Database
• Patient Registers

Detailed knowledge of these data sets gained during phase 1 will inform their potential use in phase 2.

Statistical Quality Measures

• Relevance – purpose/uses/definitions

• Accessibility – collection procedures

• Accuracy – coverage/completeness/error

• Timeliness – frequency/lag

• Comparability – legislative/administrative changes

• Coherence – common identifiers

School Census: Overview

The School Census (SC) collects pupil-level data on all pupils attending maintained schools in England. Although this covers only a subset of the population, it collects a wide range of demographic information for each child.

School Census: Key Features

• All pupils attending maintained schools in England

• Annual snapshot

• Extensive validation is carried out by school software, the Local Education Authority (LEA) and the Department for Education (DfE)

• Wide range of variables available to ONS

• New variables being introduced

• Accuracy of variables very high and improving

School Census: Key Issues

• Extract taken in January, whereas mid-year estimates (MYEs) refer to 30 June

• Excludes independent schools and several other institutions

• Reliant on pupils/parents to inform the school of any changes

• Pupil reference matching number not always unique

• No migrant identifier

• Pupil number not available on other sources
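Because the School Census is taken in January while mid-year estimates refer to 30 June, some pupils' single-year ages change between the two dates. The short Python sketch below counts those pupils; the census day used (21 January 2010) is a hypothetical date chosen for illustration, not the actual census date.

```python
from datetime import date

# Hypothetical January census day and the 30 June MYE reference date.
CENSUS_DAY = date(2010, 1, 21)
MYE_REFERENCE = date(2010, 6, 30)


def age_on(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def ages_changing_by_mid_year(births: list[date]) -> int:
    """Count pupils whose age at 30 June differs from their age on census
    day, i.e. those with a birthday after census day and up to 30 June."""
    return sum(age_on(b, CENSUS_DAY) != age_on(b, MYE_REFERENCE) for b in births)


# A pupil born 1 March 2004 is 5 on census day but 6 at mid-year.
print(ages_changing_by_mid_year([date(2004, 3, 1), date(2004, 9, 1), date(2003, 12, 25)]))  # 1
```

A real adjustment would apply this to the full extract, but the principle is the same: the January snapshot has to be aged on to the mid-year reference date before it can be compared with MYEs.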

Higher Education Statistics Agency Data: Overview

The Higher Education Statistics Agency (HESA) collects data on all students at Higher Education Institutions in the UK (full- and part-time, UK-domiciled and overseas students); ONS has access to the data for England and Wales.

HESA: Key Features

• All students registered at a Higher Education Institution in England and Wales

• HESA also records the number of students who complete their studies each year, by term-time local authority

• Wide range of variables available

• High-quality data source

• Domicile and study addresses used to identify internal migration

• Linking to other sources to identify international student inflows

HESA: Key Issues

• Data relate to the academic year, so are not directly comparable with ONS mid-year estimates

• Some student groups excluded

• Term-time address assumed to apply as at 30th June

• Student identifier not always unique

• Student identifier not available on other sources
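The mismatch between academic years and the year to 30 June used for ONS estimates can be made concrete. The sketch below assumes an academic year runs from 1 August to 31 July (an assumption for illustration), labels a date with its academic year, and counts how many days an academic year shares with a mid-year period.

```python
from datetime import date

# Assumption: an academic year runs 1 August to 31 July; the mid-year
# reference period runs 1 July to 30 June.


def academic_year(d: date) -> str:
    """Label of the academic year containing d, e.g. '2008/09'."""
    start = d.year if d.month >= 8 else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def overlap_days(a_start: date, a_end: date, b_start: date, b_end: date) -> int:
    """Number of days (inclusive) shared by two date ranges."""
    return max(0, (min(a_end, b_end) - max(a_start, b_start)).days + 1)


# Academic year 2008/09 versus the year to mid-2009.
shared = overlap_days(date(2008, 8, 1), date(2009, 7, 31), date(2008, 7, 1), date(2009, 6, 30))
print(academic_year(date(2009, 6, 30)), shared)  # 2008/09 334
```

Under this assumption the two periods share all but one month at each end, and the 30 June term-time address date falls in the last weeks of an academic year.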

Migrant Workers Scan: Overview

The Migrant Workers Scan (MWS) is a subset of data from the National Insurance Recording and PAYE System (NPS) and contains information on all overseas nationals who have registered for, and been allocated, a National Insurance Number (NINo).

MWS: Key Features

• All overseas nationals who are allocated a NINo

• Quarterly extracts

• Key demographic variables supplied

• MWS considered a high-quality source

• New registrations used to identify international immigration

• Address changes used to identify internal migration of overseas nationals
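Counting new NINo registrations by period and area is the simplest way to turn MWS records into an immigration indicator. The sketch below assumes each record is a tuple of a hypothetical encrypted identifier, registration date and local authority code (the codes shown are made up), and groups by calendar quarter to match the quarterly extracts.

```python
from collections import Counter
from datetime import date


def quarter(d: date) -> str:
    """Calendar quarter label, e.g. '2009Q1'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def inflows_by_quarter_and_la(registrations) -> Counter:
    """registrations: iterable of (person_id, registration_date, la_code).
    Returns counts of new registrations keyed by (quarter, la_code)."""
    return Counter((quarter(reg_date), la) for _, reg_date, la in registrations)


registrations = [
    ("id-a", date(2009, 2, 10), "LA1"),  # hypothetical records
    ("id-b", date(2009, 3, 5), "LA1"),
    ("id-c", date(2009, 4, 1), "LA2"),
]
print(inflows_by_quarter_and_la(registrations))
```

Such counts measure registrations rather than arrivals, so they inherit the limitations listed under the key issues.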

MWS: Key Issues

• Not all overseas nationals captured

• Registration lag

• Arrival date is self-reported

• Address at first extract is a proxy for arrival address

• Address changes are self-reported (reporting them is not compulsory)

• Includes arrivals of both short- and long-term migrants

• No measure of outflows

• Encrypted NINo not available on other sources
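Because registration can follow arrival by some time, registrations in a quarter partly reflect arrivals in earlier quarters. A sketch of measuring that lag from the self-reported arrival date, assuming each record carries both dates; records where arrival is reported after registration are treated here as inconsistent and skipped, which is an illustrative choice rather than ONS practice.

```python
from datetime import date
from statistics import median


def registration_lags(records) -> list[int]:
    """records: iterable of (arrival_date, registration_date) pairs.
    Returns lags in days, skipping pairs where the self-reported arrival
    date is later than the registration date."""
    return [(registered - arrived).days for arrived, registered in records if arrived <= registered]


records = [
    (date(2009, 1, 1), date(2009, 1, 31)),
    (date(2009, 2, 1), date(2009, 5, 2)),
    (date(2009, 6, 1), date(2009, 5, 1)),  # inconsistent: skipped
]
lags = registration_lags(records)
print(lags, median(lags))  # [30, 90] 60.0
```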

Lifetime Labour Market Database (L2): Overview

• The Lifetime Labour Market Database (L2) is a 1% extract of data from the National Insurance Recording and PAYE System (NPS)

• The Department for Work and Pensions (DWP) creates the L2 database to inform policy decisions and expenditure forecasts on NI registrations, National Insurance contributions, benefit entitlement and take-up, and employment & self-employment trends.

L2: Key Features

• L2 has over 700,000 individuals in the 1% sample.

• Holds detailed records for each tax year from 1975 & summary data between 1948 & 1974

• Each person may have the following data present:
– Registration details (for all people) and arrival dates & country of origin (for migrants)
– Individual employment-level data in each tax year
– Benefit spells & number of weeks credited with NI contributions
– Self-employed & voluntary NI contributions
– Periods of NI liability/non-liability, Child Benefit & Tax Credits
– Address details and histories

• L2 also has 100% of all employer PAYE details by tax year (from 1997 onwards).
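Because L2 is a 1% sample, counts from it have to be scaled up to represent the full population. Treating the sample as a simple 1% random selection (an assumption; the selection rule is not described here), each sampled person carries a weight of 100, so over 700,000 sampled individuals correspond to over 70 million people who have ever held a NINo.

```python
# Assumption: L2 behaves like a simple 1% random sample of NPS records,
# so each sampled person represents 100 people.
SAMPLE_WEIGHT = 100


def gross_up(sample_counts: dict[str, int], weight: int = SAMPLE_WEIGHT) -> dict[str, int]:
    """Scale counts from the 1% sample to full-population estimates."""
    return {group: count * weight for group, count in sample_counts.items()}


print(gross_up({"all individuals": 700_000}))  # {'all individuals': 70000000}
```

The employer PAYE table is a 100% extract, so it would not be weighted in this way.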

L2: Data Recording

• Data on benefits are recorded by DWP and passed to NPS via the relevant benefit system or the Customer Information System (CIS)

• Data on employment are submitted to NPS by employers via their PAYE P14 returns. HMRC validates the P14 data before they are recorded on NPS

• Data on self-employment are submitted by individuals via NI contributions or Self Assessment. HMRC generates Class 2 bills & receipts for the payment of NI contributions

• Data on Tax Credits are recorded on NPS by HMRC after a Tax Credit award

• Data on Child Benefit are captured on NPS by HMRC when a claim starts and when it is amended or ceases

• Address data are updated via any interaction with either a DWP or an HMRC system

Creation of L2

Data sources

• NPS*: NINo registrations (including migrants); births & claims to Child Benefit; changes of name or circumstances; changes of address notified by employer or person; employee & employer PAYE data (P14 etc); Tax Credit claims & awards; self-employment & Self Assessment data

• Benefit systems (IB, JSA etc): benefit claims and awards

• NPS and the benefit systems exchange data

DWP processes

• NPS identification, PAYE & NI, liabilities and address data (1% sample): load data; validate & split; remove audit trail; add archived records from old L2; derive analysis variables; postcode matching; anonymise

• NPS employer PAYE data (100% extract): postcode matching; anonymise

• Benefit systems (1% sample): extract 1% from 5% sample of benefit systems; anonymise

L2 database

• Identification data (1%)

• PAYE & NI data (1%)

• Liabilities data (1%)

• Address data (1%)

• Employer PAYE data (100%)

• Benefit spell data (1%)

* Selection of data on NPS – not a complete list of data tables
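The benefit-system stream is drawn as 1% out of an existing 5% sample, and every stream is anonymised. One common way to make samples nest like this is to derive a stable pseudo-random bucket from each identifier, so that everyone in the 1% sample is automatically in the 5% sample. The Python sketch below illustrates that idea together with keyed hashing for pseudonymisation; it is a hypothetical scheme, not DWP's actual selection or anonymisation method, and the identifiers are invented.

```python
import hashlib
import hmac

BUCKETS = 10_000


def bucket(person_id: str) -> int:
    """Stable pseudo-random bucket in [0, BUCKETS) derived from an identifier."""
    digest = hashlib.sha256(person_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_sample(person_id: str, fraction: float) -> bool:
    """Select identifiers whose bucket falls below the sampling fraction.
    Smaller fractions select subsets of larger ones (1% within 5%)."""
    return bucket(person_id) < round(fraction * BUCKETS)


def pseudonymise(person_id: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


ids = [f"person-{i}" for i in range(20_000)]  # hypothetical identifiers
five_pct = {i for i in ids if in_sample(i, 0.05)}
one_pct = {i for i in ids if in_sample(i, 0.01)}
print(len(five_pct), len(one_pct), one_pct <= five_pct)
```

Using a secret key for the hash means the same person always maps to the same pseudonym across tables, which keeps records linkable inside the database without exposing the original identifier.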

L2: Coverage, Advantages & Limitations

• Coverage
– Holds records for anyone who has ever had a National Insurance number
– Records all employment, self-employment, tax credit & benefit activities since 1975
– Holds address histories for all people who interact with NPS or DWP

• Advantages
– Large sample compared with survey data, spanning over 30 years
– Not subject to survey anomalies or response rates
– Highly accurate: the data are used to calculate benefit entitlement and State Pension accrual, so they are accurate at individual record level
– Individual record-level data allow cohort and longitudinal analysis of activities over time (L2 is anonymised to comply with Data Protection legislation)

• Limitations
– Doesn’t hold contextual data (education, household status etc)
– Not collected specifically for population and migration purposes, so not always defined in the same way as ONS survey data

L2: Data Relevance

• Allows activities within the tax year to be analysed, and an assessment to be made of whether the person was resident in that year, based on the type of activity.

• Enables methodological testing of data at individual record level to assess the most suitable and robust method of generating population and migration estimates using administrative data, based on activities that leave a ‘footprint’ on government computer systems.

• Enables cohort analyses to monitor changes in interaction behaviours over time, and to make adjustments accordingly to improve methodologies and population estimates using L2 data

• Early results from L2 suggest that this data source will provide robust population and migration estimates that will be highly beneficial to the MSIP & Beyond 2011 projects
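The first point, deciding from the types of activity recorded in a tax year whether someone was resident, can be sketched as a simple rule. The activity codes below, and the rule that any one qualifying activity implies residence, are hypothetical choices for illustration, not the L2 methodology.

```python
from dataclasses import dataclass

# Hypothetical activity codes treated as evidence of residence.
QUALIFYING_ACTIVITIES = frozenset(
    {"employment", "self_employment", "benefit", "tax_credit", "child_benefit"}
)


@dataclass(frozen=True)
class Activity:
    person_id: str
    tax_year: str  # e.g. "2008-09"
    kind: str


def residents(activities, tax_year: str, qualifying=QUALIFYING_ACTIVITIES) -> set[str]:
    """People with at least one qualifying activity in the given tax year."""
    return {a.person_id for a in activities if a.tax_year == tax_year and a.kind in qualifying}


activities = [
    Activity("p1", "2008-09", "employment"),
    Activity("p2", "2008-09", "registration"),  # not qualifying on its own
    Activity("p2", "2008-09", "benefit"),
    Activity("p3", "2007-08", "benefit"),       # different tax year
]
print(sorted(residents(activities, "2008-09")))  # ['p1', 'p2']
```

Keeping the qualifying set as a parameter makes it easy to test alternative rules against each other, which is the kind of methodological testing the second point describes.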

L2: Timelines & Next steps

• Timeliness
– NPS is a live system receiving continual updates
– L2 extracts are taken in January and May; each extract takes around 2 months to load and process
– The L2 extract is generally provided at least 9 months after the end of the last tax year being analysed, because of the time employers take to submit PAYE returns

• Next steps
– Refine the current L2 population and migration methodology to reflect (as closely as possible) ONS survey/census selection criteria
– Investigate the possibility of a larger sample for the Beyond 2011 project. This would allow more robust analysis at local authority level and would improve the accuracy of internal migration measures.
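Two points on this slide can be made concrete. The UK tax year runs from 6 April to 5 April, so the "at least 9 months" lag can be turned into an earliest availability date. And if L2 is treated as a simple random sample (an assumption), the relative standard error of a grossed-up count is roughly sqrt((1 - f) / n) for sampling fraction f and n sampled people, which shows why a larger sample would give more robust local authority figures. Function names are illustrative.

```python
import calendar
from datetime import date
from math import sqrt


def tax_year_end(d: date) -> date:
    """5 April ending the UK tax year (6 April to 5 April) that contains d."""
    end = date(d.year, 4, 5)
    return end if d <= end else date(d.year + 1, 4, 5)


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month length."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def earliest_extract(last_day_analysed: date, lag_months: int = 9) -> date:
    """Earliest date an extract covering this tax year might be expected,
    assuming a lag of at least `lag_months` after the tax year ends."""
    return add_months(tax_year_end(last_day_analysed), lag_months)


def relative_se(sample_count: int, fraction: float) -> float:
    """Approximate relative standard error of a grossed-up count when each
    person is sampled independently with probability `fraction`."""
    return sqrt((1 - fraction) / sample_count)


print(earliest_extract(date(2009, 12, 1)))  # 2011-01-05
# Same local authority of about 100,000 people: 1% vs a hypothetical 5% sample.
print(round(relative_se(1_000, 0.01), 3), round(relative_se(5_000, 0.05), 3))  # 0.031 0.014
```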

Patient Registers: Overview

ONS creates a Patient Register (PR) dataset by collating data from each Primary Care Organisation; these organisations hold records of patients registered with an NHS doctor in England and Wales.

Patient Register – Key features

• Annual ‘mid-year’ snapshot

• Captures basic demographic variables

• Has been used by ONS for a number of years (main source for measuring internal migration)

• Other potential uses:
– Population stock measure
– International migration (Flag 4)

• Covers all ages
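Internal migration is measured from the Patient Register by comparing successive annual snapshots: a person whose registered local authority differs between snapshots is counted as a move. A sketch, assuming each snapshot maps an NHS number to a local authority code (the identifiers and codes below are made up) and that each current record carries a Flag 4 marker, taken here to indicate a new registration from abroad:

```python
from collections import Counter


def internal_moves(previous: dict[str, str], current: dict[str, str]) -> Counter:
    """Count (origin LA, destination LA) pairs for NHS numbers present in
    both snapshots whose registered LA has changed."""
    return Counter(
        (previous[nhs], current[nhs])
        for nhs in previous.keys() & current.keys()
        if previous[nhs] != current[nhs]
    )


def flag4_counts(records) -> Counter:
    """records: iterable of (nhs_number, la_code, flag4) tuples; counts
    flagged registrations by LA."""
    return Counter(la for _, la, flag4 in records if flag4)


mid_2008 = {"n1": "LA1", "n2": "LA2", "n3": "LA1"}
mid_2009 = {"n1": "LA3", "n2": "LA2", "n4": "LA1"}
print(internal_moves(mid_2008, mid_2009))  # Counter({('LA1', 'LA3'): 1})
print(flag4_counts([("n4", "LA1", True), ("n2", "LA2", False)]))  # Counter({'LA1': 1})
```

People who appear in only one snapshot (n3 and n4 here) are not counted as internal moves; they are the exits and new registrations that the Flag 4 counts and the key issues relate to.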

Patient Register – Key Issues

• Some gaps in coverage

• Reliant on patients to update their record and to re-register with a doctor

• Migrants may delay registering

• List inflation (registers can overstate the resident population, e.g. when people who have left remain registered)

• List cleaning (ad hoc)

• Duplicate NHS numbers

• NHS number not present in other sources
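Two of these issues can be quantified with simple checks: how many NHS numbers occur more than once, and how far a register total exceeds an independent population estimate. The sketch below is illustrative; the comparison figure would in practice be something like a mid-year estimate, and the numbers used are invented.

```python
from collections import Counter


def duplicate_ids(ids) -> set[str]:
    """Identifiers that occur more than once."""
    return {i for i, n in Counter(ids).items() if n > 1}


def inflation_ratio(register_count: int, population_estimate: int) -> float:
    """Register count relative to an independent estimate; values above 1
    are consistent with list inflation."""
    return register_count / population_estimate


nhs_numbers = ["n1", "n2", "n2", "n3", "n3", "n3"]  # invented
print(sorted(duplicate_ids(nhs_numbers)), len(set(nhs_numbers)))  # ['n2', 'n3'] 3
print(inflation_ratio(105_000, 100_000))  # 1.05
```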

How will the administrative data sources be used in phase 2?

Phase 2 – Reconciling Administrative Sources

• These data sources will be considered, either singly or in conjunction with other sources, in forming plausibility ranges for each LA by age and sex:
– Patient Register
– School Census
– HESA
– L2
– Electoral Roll?

• These data sources can be used as an indirect check on each other's quality, and for comparison with mid-year estimates
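The slide does not say how a plausibility range is built from several sources. One simple possibility, shown here purely as a hypothetical sketch, is to take the lowest and highest counts that the available sources give for an LA, age and sex cell, optionally widened by a tolerance, and check whether the mid-year estimate falls inside that range.

```python
def plausibility_range(source_counts: dict[str, int], tolerance: float = 0.0) -> tuple[float, float]:
    """Lowest and highest source count for one LA x age x sex cell,
    widened by a proportional tolerance."""
    if not source_counts:
        raise ValueError("at least one source count is required")
    values = source_counts.values()
    return min(values) * (1 - tolerance), max(values) * (1 + tolerance)


def check_estimate(mye: float, source_counts: dict[str, int], tolerance: float = 0.0) -> str:
    """Classify a mid-year estimate against the plausibility range."""
    low, high = plausibility_range(source_counts, tolerance)
    if mye < low:
        return "below range"
    if mye > high:
        return "above range"
    return "within range"


cell = {"Patient Register": 1_050, "School Census": 990}  # invented counts
for mye in (900, 1_000, 1_060):
    print(mye, check_estimate(mye, cell), check_estimate(mye, cell, tolerance=0.02))
```

A range based only on minimum and maximum is sensitive to a single outlying source; weighting sources by their assessed quality would be a natural refinement.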

Age Coverage

[Figure: age coverage of each source along an age axis from 0 to 100+. Sources shown: Patient Register, School Census, HESA, Lifetime Labour Market Database (L2) and other data sources (?). Age markers on the axis: 5, 16, 18, 28, 45, 59/64.]

Phase 2 – Migrant Distribution

• Aim to deliver an improved method for distributing long and short term migrants at Regional and Local Authority level

• Research how administrative sources can be used to distribute the national International Passenger Survey (IPS) figure

Distribution Methods (LTM)

Geographic levels: National immigration estimate (IPS) → Region/UK country → Intermediate geography → Local authority

Current method*:

• Region/UK country: distribute using 3-yr LFS average

• Intermediate geography: use 3-yr IPS average

• Local authority: model-based distribution

Proposed method:

• Distribute using administrative data

* For non-London LAs. A similar, but slightly more complex, method is used for London LAs.
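The current approach distributes a national total down a geographic hierarchy in proportion to shares taken from averaged data (for example, a 3-year average). A minimal sketch of that proportional allocation follows, with two levels and invented area names and figures; the intermediate geography and the London variant are left out, and the same allocation step could equally take its shares from administrative sources.

```python
def average_shares(yearly_counts: list[dict[str, float]]) -> dict[str, float]:
    """Average each area's count over several years and normalise to shares."""
    areas = set().union(*yearly_counts)
    averages = {a: sum(y.get(a, 0.0) for y in yearly_counts) / len(yearly_counts) for a in areas}
    total = sum(averages.values())
    return {a: v / total for a, v in averages.items()}


def distribute(total: float, shares: dict[str, float]) -> dict[str, float]:
    """Allocate a total in proportion to shares that sum to 1."""
    return {area: total * share for area, share in shares.items()}


def distribute_hierarchy(national_total, region_shares, la_shares_by_region):
    """National -> region -> LA allocation using shares at each level."""
    la_estimates = {}
    for region, region_total in distribute(national_total, region_shares).items():
        la_estimates.update(distribute(region_total, la_shares_by_region[region]))
    return la_estimates


# Invented three-year series of regional counts.
region_shares = average_shares([
    {"Region A": 100, "Region B": 300},
    {"Region A": 120, "Region B": 280},
    {"Region A": 80, "Region B": 320},
])
la_shares = {"Region A": {"LA1": 0.5, "LA2": 0.5}, "Region B": {"LA3": 1.0}}
print(distribute_hierarchy(200_000, region_shares, la_shares))
```

Averaging over three years smooths sampling noise in the shares at the cost of responding more slowly to real shifts in where migrants settle.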

Outline Proposals

Potential sources by immigrant stream, for long-term and short-term migrants:

• Workers – Long-term: MWS & L2 (workers); Short-term: MWS & L2 (workers)

• Students – Long-term: HESA; Short-term: HESA & data on English language schools?

• Children under 16 – Long-term: Flag 4s, School Census (ages 5-16); Short-term: School Census?

• Returning migrants – Long-term: IPS uncalibrated, L2 population data, Census

• Other migrants – Long-term: Flag 4, L2 (claiming benefits)

Data from the IPS is distributed to local level using a regression model approach based on a wide range of sources, e.g. Flag 4s, IDBR, population estimates by ethnic group.

Further Information

• More information on the MSIP can be found at:

http://www.statistics.gov.uk/imps

• Initial feasibility report on use of MWS:

http://www.ons.gov.uk/about-statistics/methodology-and-quality/imps/updates-reports/historical-updates-reports/updates-reports-09/initial-feasibility-report---october-2009.pdf

• Research paper on use of School Census:

http://www.ons.gov.uk/about-statistics/methodology-and-quality/imps/updates-reports/historical-updates-reports/updates-reports-09/research-paper-on-the-use-of-school-census-data-to-improve-population-statistics---october-2009.pdf

Phase 2 – Other uses

• Internal migration adjustments
– MWS: student adjustment (end-of-study moves)
– MWS: first onward moves
– HESA: already used to improve internal migration adjustments
– Compare internal moves in SC/MWS with PR

Other uses of administrative data:
• MWS – NPP assumption
• MWS – other migration research
• SC – continue to quality-assure mid-year estimates for 5 to 15 year olds in England and Wales
• SC – pursue access to Welsh microdata
• L2 – create sex ratios for comparison
• L2 – assess the feasibility of obtaining a larger sample of NPS
