All the answers? Statistics New Zealand’s Integrated Data Infrastructure
![Page 1: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/1.jpg)
All the answers? Statistics New Zealand’s Integrated Data Infrastructure
Paper by Felibel Zabala, Rodney Jer, Jamas Enright and Allyson Seyb
Presented by Felibel Zabala
Sept 2012
![Page 2: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/2.jpg)
Statistics New Zealand’s Integrated Data Infrastructure (IDI)
Merges data from different suppliers including Statistics NZ
Variable quality of the different datasets, both within and between datasets
![Page 3: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/3.jpg)
Statistics New Zealand’s Integrated Data Infrastructure (IDI)
Linking clean datasets is not easy; linking datasets of variable quality is much more difficult
Importance of an effective and efficient editing strategy
![Page 4: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/4.jpg)
Main objective
Present some of the issues with, and solutions for, linked administrative datasets, with a focus on one of Statistics NZ’s first integrated datasets, the Linked Employer-Employee Data (LEED)
![Page 5: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/5.jpg)
LEED
Provides the backbone of the IDI prototype
Links longitudinal business data from Statistics NZ’s Business Frame to a longitudinal series of payroll tax data from Inland Revenue (IRD)
Used to produce quarterly statistics that measure labour market dynamics at various levels, eg filled jobs, worker flows, and total earnings
![Page 6: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/6.jpg)
LEED Payroll data
Collected from employers for New Zealand’s taxation system through IRD’s Employer Monthly Schedule (EMS)
Information available from EMS:
  Employer/employee name and IRD number
  Taxable earnings for work performed, taxed at source
  Income tax deductions (pay-as-you-earn or PAYE, withholding tax, child support payment, student loan indicator amount)
  Start and finish dates of employment
![Page 7: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/7.jpg)
LEED – additional details
Also includes payments made to beneficiaries by the government
Contains a subset of the self-employed
![Page 8: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/8.jpg)
LEED – additional details (cont’d)
Collection unit – the legal entity that files the EMS return
Statistical unit – the ‘employer’ in LEED, which is the geographical or physical location of the business
![Page 9: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/9.jpg)
Methods of integration in LEED
Figure 1. Unit record links in LEED
![Page 10: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/10.jpg)
Figure 1 (cont’d). Linking employer to enterprise
![Page 11: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/11.jpg)
Figure 1 (cont’d). Linking employer longitudinally
![Page 12: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/12.jpg)
Figure 1 (cont’d). Linking enterprise and geo longitudinally
![Page 13: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/13.jpg)
Figure 1 (cont’d). Linking employee longitudinally
![Page 14: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/14.jpg)
Variables edited in LEED
IRD numbers
Gross earnings
Date of birth
Sex
Workplace of an employee
Start and end dates of employment
Editing strategy: Do not replace any IRD data unless there is strong evidence it is an error
![Page 15: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/15.jpg)
Variables edited in LEED (cont’d)
IRD numbers
Imputation of sex
Imputation of start and end dates of employment
![Page 16: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/16.jpg)
Variables edited in LEED (cont’d)
Gross earnings
  Presence of systematic errors
  Detection method – use of ratio edit: PAYE/gross earnings
  Imputation method
Date of birth
  Presence of systematic errors
  Detection method – edit rules based on an employee’s age against some events
  Imputation method
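A ratio edit of the kind named above can be sketched as follows. This is a minimal illustration, not the LEED implementation: the field names (`id`, `paye`, `gross_earnings`) and the acceptance bounds `lower` and `upper` are assumptions made for the example, since the slides do not state the thresholds used.

```python
def ratio_edit(records, lower=0.05, upper=0.45):
    """Flag records whose PAYE/gross earnings ratio lies outside [lower, upper].

    The bounds are hypothetical; a real edit would derive them from the data
    or from the tax schedule.
    """
    flagged = []
    for rec in records:
        gross = rec["gross_earnings"]
        if gross <= 0:
            # Ratio is undefined, so the record is flagged for review.
            flagged.append((rec["id"], None))
            continue
        ratio = rec["paye"] / gross
        if not lower <= ratio <= upper:
            flagged.append((rec["id"], ratio))
    return flagged


# Hypothetical records for illustration only.
records = [
    {"id": "r1", "gross_earnings": 4000.0, "paye": 800.0},  # plausible ratio
    {"id": "r2", "gross_earnings": 40.0, "paye": 800.0},    # implausibly high ratio
    {"id": "r3", "gross_earnings": 0.0, "paye": 100.0},     # ratio undefined
]
flagged = ratio_edit(records)
```

Flagged records are only candidates for treatment; consistent with the editing strategy stated earlier, a value would be replaced only where there is strong evidence of an error.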
![Page 17: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/17.jpg)
Variables edited in LEED (cont’d)
Imputation of workplace of an employee
Uses the transportation method, where the imputed workplace of an employee is the geo that minimises the distance between the employee’s home address and the geo, subject to the constraints that
  each employee is assigned to a geo, and
  the total number of employees allocated to a geo equals the number of employees expected from the geo
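The allocation described above is a transportation problem: choose an assignment of employees to geos that minimises total home-to-geo distance while filling each geo to its expected headcount. The sketch below solves a tiny instance by brute-force enumeration purely for illustration; the coordinates, the names `homes`, `geos` and `expected`, and the use of straight-line distance are all assumptions, and a production system would use a proper linear-programming or network-flow solver.

```python
from itertools import permutations
from math import hypot


def allocate_workplaces(homes, geos, expected):
    """Assign each employee to exactly one geo, filling each geo to its
    expected count, so that total distance is minimal (tiny inputs only)."""
    if sum(expected.values()) != len(homes):
        raise ValueError("expected counts must sum to the number of employees")
    # One slot per expected employee at each geo.
    slots = [geo for geo, n in expected.items() for _ in range(n)]
    employees = list(homes)

    def dist(emp, geo):
        (x1, y1), (x2, y2) = homes[emp], geos[geo]
        return hypot(x1 - x2, y1 - y2)

    best_cost, best = float("inf"), None
    for perm in permutations(slots):
        cost = sum(dist(e, g) for e, g in zip(employees, perm))
        if cost < best_cost:
            best_cost, best = cost, dict(zip(employees, perm))
    return best, best_cost


# Hypothetical coordinates and expected headcounts.
homes = {"emp_a": (0, 0), "emp_b": (1, 0), "emp_c": (9, 0)}
geos = {"geo_1": (0, 1), "geo_2": (10, 0)}
expected = {"geo_1": 2, "geo_2": 1}
allocation, total = allocate_workplaces(homes, geos, expected)
```

Because the objective is the total distance, an individual employee is not always assigned to their nearest geo; the headcount constraint can push some employees to a more distant location.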
![Page 18: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/18.jpg)
The IDI prototype
Datasets linked to LEED
Benefit data
Tertiary education data
Administrative tertiary education data and student loans and allowances data
Statistics NZ’s Household Labour Force Survey (HLFS) and its supplementary surveys
![Page 19: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/19.jpg)
The IDI prototype (cont’d)
Other linked dataset in the IDI
The Longitudinal Business Database (LBD) prototype includes information on business demographics, financial data, employment, goods exports, government assistance, and management practices
![Page 20: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/20.jpg)
The IDI prototype (cont’d)
Figure 2. Linking in the IDI prototype
![Page 21: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/21.jpg)
Issues in linking in the IDI
Lack of a common identifier across datasets
Main variables in the Central Linking Concordance (CLC): IRD numbers, passport numbers, and student ID, where available
Use of demographic variables as partial identifiers
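One way to picture linking without a common identifier is a two-pass deterministic match: link on IRD number where both records carry one, and otherwise fall back to demographic variables as partial identifiers. The sketch below is an assumption-laden illustration, not the IDI's linking method; the field names, the choice of name, date of birth and sex as the fallback key, and the rule of accepting only unique demographic matches are all hypothetical.

```python
def link(left, right):
    """Link on IRD number first; otherwise on (name, dob, sex) when unique."""
    by_ird = {r["ird"]: r for r in right if r.get("ird")}
    by_demo = {}
    for r in right:
        key = (r["name"].strip().lower(), r["dob"], r["sex"])
        by_demo.setdefault(key, []).append(r)

    links = []
    for rec in left:
        match = by_ird.get(rec["ird"]) if rec.get("ird") else None
        method = "ird"
        if match is None:
            key = (rec["name"].strip().lower(), rec["dob"], rec["sex"])
            candidates = by_demo.get(key, [])
            # Accept a demographic match only when it is unambiguous.
            match = candidates[0] if len(candidates) == 1 else None
            method = "demographic"
        if match is not None:
            links.append((rec["id"], match["id"], method))
    return links


# Made-up records; the identifiers are not real IRD numbers.
left = [
    {"id": "L1", "ird": "111", "name": "Ana Smith", "dob": "1980-01-02", "sex": "F"},
    {"id": "L2", "ird": None, "name": "Ben Jones ", "dob": "1975-05-06", "sex": "M"},
    {"id": "L3", "ird": None, "name": "Cy Lee", "dob": "1990-09-09", "sex": "M"},
]
right = [
    {"id": "R1", "ird": "111", "name": "A. Smith", "dob": "1980-01-02", "sex": "F"},
    {"id": "R2", "ird": None, "name": "ben jones", "dob": "1975-05-06", "sex": "M"},
]
links = link(left, right)
```

Record `L3` stays unlinked because nothing in `right` shares its identifier or its demographic key. Probabilistic linkage, which scores partial agreement instead of requiring exact equality, is a common alternative to this exact-match fallback.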
![Page 22: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/22.jpg)
Issues in linking in the IDI (cont’d)
Need for standard software for automated data linkage that is robust to data changes
Timing of receipt of data
![Page 23: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/23.jpg)
Editing strategy in the IDI
Focus on ensuring high-quality linking variables are used in linking. Examples:
  Validity rules were used to edit names across data sources
  Sex and date of birth are reformatted to ensure common coding is used across data sources
Where inconsistencies occur in records linked from two different data sources, it is important to know which of the two data sources is more reliable
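Reformatting sex and date of birth to a common coding might look like the sketch below. The source codings in `SEX_CODES` and the date layouts in `DATE_FORMATS` are assumptions chosen for illustration; the slides do not list the codings actually found in the source datasets.

```python
from datetime import datetime

# Hypothetical source codings mapped to one common code.
SEX_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "2": "F"}
# Hypothetical date layouts, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardise_sex(value):
    """Return 'M' or 'F', or None when the code is not recognised."""
    return SEX_CODES.get(str(value).strip().lower())


def standardise_dob(value):
    """Return the date as an ISO 8601 string, or None when it cannot be parsed."""
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for unrecognised values, instead of guessing, keeps invalid entries visible so they can be treated as missing in later editing steps.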
![Page 24: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/24.jpg)
Editing strategy in the IDI (cont’d)
Process to resolve inconsistencies in personal details
  Most common value present in the datasets should be kept
  Prioritise the data sources to determine the order of retaining their values
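The two rules above can be combined in a small resolver. How they interact is an assumption here: the sketch keeps the most common value and uses the source priority order only to break ties. The source names and their ordering are hypothetical.

```python
from collections import Counter


def resolve(values_by_source, priority):
    """Keep the most common non-missing value; break ties by source priority."""
    present = {s: v for s, v in values_by_source.items() if v is not None}
    if not present:
        return None
    counts = Counter(present.values())
    top = max(counts.values())
    tied = {value for value, n in counts.items() if n == top}
    if len(tied) == 1:
        return tied.pop()
    # Tie: take the value from the highest-priority source that holds one.
    for source in priority:
        if present.get(source) in tied:
            return present[source]
    return None


# Hypothetical priority order, most reliable source first.
priority = ["ird", "benefit", "education"]
dob = resolve(
    {"ird": "1980-01-02", "education": "1980-01-02", "benefit": "1980-02-01"},
    priority,
)
sex = resolve({"ird": "F", "education": "M"}, priority)
```

Here `dob` is settled by majority and `sex` by priority, since the two sources disagree one to one.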
![Page 25: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/25.jpg)
Editing strategy in the IDI (cont’d)
Editing strategy should be able to
Edit inconsistencies for the same unit across different sources
Treat erroneous and missing variables in a record
Ensure consistency in variables across a record for a time period and over time
![Page 26: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/26.jpg)
Next steps
Build the IDI with a focus on improving the linking methodology
Determine standard quality measures for outputs produced using administrative data
![Page 27: All the answers? Statistics New Zealand’s Integrated Data Infrastructure](https://reader035.fdocuments.in/reader035/viewer/2022070421/5681603f550346895dcf6432/html5/thumbnails/27.jpg)
Next steps (cont’d)
Redevelopment of LEED and SLA systems
  Investigate the use of geospatial information to improve the employee allocation method
  Review of the editing of gross earnings
  Investigate the use of Banff