web: www.ons.gov.uk
twitter: @ONS
The data revolution -
Building data capability for a
modern national statistics institute
Tom Smith
Director, Data Science Campus
@_datasmith
Economy: GDP, Inflation, Labour market, +++
People: Population, Census, Incomes, +++
World: Trade, Sustainable Development Goals, +++
Data for ONS, government and wider
Insight into Society
Insight into the Economy
Savings & efficiencies
Better informed debate
Innovative economy
Better informed research
Targeted Services
Targeted service delivery (some examples)
Service integration: Land Availability
• Identify possible sites for new schools
• Support house building
Efficiency: Clustering
• Better policy decisions based on demographics and geography
Better Informed Public Debate: Migration
• Cross-government collaboration to reduce mixed messages from different sources
• Clearly present and explain all information
Better policy decisions: Flow of funds
• Closer monitoring of financial flows
• Reduce risk of another financial crisis
• Asset and liability position by sector
• One sector's liabilities are spread across economies
Efficient & effective services: Reduce reoffending rates
• Assess success of interventions
• (Data often not held by service providers)
Early Indicators of GDP
Data Science Campus | datasciencecampus.ons.gov.uk | [email protected] | @datascicampus
-6%: Change in UK GDP between first quarter of 2008 and second quarter of 2009
5 years: Length of time from 2008 for the UK economy to return to pre-recession size
£12b: Estimated value for earlier identification of 2008 downturn
Fig 1. UK GDP Growth Rate
Fig 2. ONS National Accounts Publication Timetable
Early Intervention
Early Indicators
• HMRC VAT Data (VAT turnover returns)
• AIS Ship Location
• Road Traffic
• Broadband Traffic
Reproducible Analytical Pipeline
The Challenge
• Producing official statistics for publication is a key problem, as it is a time-consuming, meticulous process
• It is time-consuming because the analysis has to pass through multiple systems and multiple individuals
• The systems are diverse and do not always conform to good software engineering practice
Solution
• Use of software engineering tools and techniques such as version control
• Automated generation of tables/charts and statistical verification
• A single process from data storage to report generation
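As a rough illustration of the solution bullets above, here is a minimal sketch of a scripted pipeline that goes from a data extract to a generated table, with automated checks in between. It is not the ONS implementation: the slides name R Markdown as the tool, and Python is swapped in here purely for illustration. The CSV contents, column names and function names are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract standing in for a data store; the values are invented.
RAW_CSV = """region,quarter,turnover
North,Q1,120.0
North,Q2,130.0
South,Q1,200.0
South,Q2,190.0
"""


def load_rows(text):
    """Parse the CSV extract into typed records."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "region": row["region"],
                "quarter": row["quarter"],
                "turnover": float(row["turnover"]),
            }
        )
    return rows


def summarise(rows):
    """Total turnover per region, sorted by region name."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["turnover"]
    return dict(sorted(totals.items()))


def verify(rows, totals):
    """Automated checks that stand in for manual verification steps."""
    if not rows:
        raise ValueError("no input rows")
    if any(r["turnover"] < 0 for r in rows):
        raise ValueError("negative turnover found")
    # The regional totals must add up to the grand total of the input.
    if abs(sum(totals.values()) - sum(r["turnover"] for r in rows)) > 1e-9:
        raise ValueError("regional totals do not match input total")


def render_report(totals):
    """Render the summary as a plain-text table."""
    lines = ["Turnover by region", "region | total"]
    for region, total in totals.items():
        lines.append(f"{region} | {total:.1f}")
    return "\n".join(lines)


def run_pipeline(text):
    """One scripted path from data extract to report text."""
    rows = load_rows(text)
    totals = summarise(rows)
    verify(rows, totals)
    return render_report(totals)


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Because every step is code, the whole script can be kept under version control and rerun unchanged when the data is refreshed.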
£8.8k Estimated average annual saving per publication
Data Store → Statistical Software → Spreadsheet → Word Processor → pdf
Data Store → R Markdown
£118m Estimated annual efficiency savings across government stats publications
Efficiency Savings
Data Capability to support National Statistics
Production: Secure, in-house access; Automated & audited; Reproducible
Research & insight: Working in partnership, across ONS and government; Secure 3rd party access to data
Services: Acquire & access data; Match, link, anonymise; Statistical methods; Internal and 3rd party use
People & skills: Upskilling & recruitment; Learning Academy; Data Science Campus
Deputy National Statistician for Data Capability: Heather Savory
National Statistician: John Pullinger
Digital Technology (IT Infrastructure/Products, Platforms & Computing)
Enterprise Architecture & Service Design
Digital Policy & Service Standards
Information Assurance & Technical Security
Systems & Data Security
Operations Support & Maintenance
Service Delivery & Design
Technological Policy & Standards
Technical Service Policy
Data Policy and Standards
Information Infrastructure
Methodological Policy & Standards
Methodological Services (GSS)
Research and researcher Accreditation – Secure Research Service (SRS)
Statistical Quality Centre
Data Services
Good Practice and GSS support
Cross-Gov Data Science
Data Science Frameworks & Definitions
Data Science Policy & Standards
Data Science Technical Delivery & Design
External Partnering
GSS/Heads of Profession Leadership
Human Resources
Learning Academy
People Capability
Analysis Function
GSS Careers & Learning
Data Capability at ONS
Digital Services & Technology: Simon Taylor
People & Business Services: Philippa Bonay
Data Science Campus: Tom Smith
Methods, Data & Research: Sarah Henry
Security: Andy Wall
UN Global Platform: Mark Craddock
Data policy: Ross Young
Inform
Analyse
Prepare
Acquire
Data use and access
Statutory: SRSA (inc ISOs), DEA, STA (business), RSA (registration), VAT/finance acts
Non-Statutory: Voluntary Surveys, Non-controlled admin data, Commercial partnerships, Open data
Metadata
Statistical Methods Development
Statistical Releases
Ad Hoc Outputs
Inform Policy
Accredited Research Output
Devolved Statistics
Identified Safe
Unrestricted Access
Controlled Access
Statistical Methods Advice
Statistical Production
Statistical Research
3rd Party Service
3rd Party Disclosure
Data Access Platform
One integrated digital platform
for all data storage, analysis
and processing in the ONS.
People and Learning
Transformational growth
• Management and Leadership pathways
• Future Leader and High Potential Programmes
• Emotional good health and resilience
Professional development
• R and Python learning pathways
• Statistical Analyst scheme
• GSG Induction and foundation
• DAP user training
Analysis Function in Government
Analysis, research and evidence help make better decisions to deliver improved outcomes for the UK.
Data Science Campus
“Although better use of [data] has the potential
to transform the provision of economic statistics,
ONS will need to build up its capability to
handle such data.
This will take some time and will require not only
recruitment of a cadre of data scientists but
also active learning and experimentation.
That can be facilitated through collaboration
with relevant partners – in academia, the
private and public sectors, and internationally.”
Independent Review of UK Economic Statistics, Professor Sir Charles Bean, 2016, p.11
Data Science Campus delivery & capability
Data science projects
• New data sources, e.g. satellite images, text, big data, Internet of Things, social media
• New techniques – machine learning, neural networks, network, text & image analysis, big data processing etc.
• Short, exploratory research – innovation and risk
Building data science capability
• Cross-govt training & train-the-trainers
• Apprenticeships in Data Analytics
• MSc Data Analytics for Government
• Continuous Professional Development
• Mentoring – Data Science Accelerator & Academy
• STEM Ambassadors
• Co-funded & co-supervised PhD placements and programmes
[Chart: counts for complete, in progress and mentoring (axis 0 to 12)]
UN Global Platform
A global programme using and
integrating big data for official
statistics, ensuring no one is
left behind.
Moving Forward Together
Partnership for the Goals
Products & Services
Global Platform
Data security – assessment & principles
Criteria to outline content
Attributes to describe content
1 Security governance: Accountable business owner, accountable data owners and a set of support policies and processes that govern security operations and data management
2 Risk assessment of data: Appropriate security protection and business access with regular reviews of the security control environment
3 Best practice technical design: Government and industry best practice for 'Secure by Design' to blend system and security development activity within the development lifecycle
4 Need To Know: User access through controlled and centrally managed unique user accounts based on role and need
5 Protective monitoring: Logged access for business user data and platform maintenance with advanced analytics to identify anomalies
6 Import-export control: Data ingest and export follow a defined process with single routes, full authorisation and validation
7 Assurance and audit: Assess and demonstrate that governance and security controls are working as expected
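To make the "Need To Know" and "Protective monitoring" principles concrete, here is a minimal sketch of a role-based access check that logs every decision. It is an assumption-laden illustration, not a description of any ONS system: the role names, dataset names, `ROLE_DATASETS` mapping and `can_access` function are all hypothetical.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("access")

# Hypothetical mapping from role to the datasets that role needs.
ROLE_DATASETS = {
    "vat_analyst": {"vat_returns"},
    "platform_admin": set(),  # maintains the platform, has no data access
}


@dataclass(frozen=True)
class User:
    user_id: str  # unique, centrally managed account identifier
    role: str


def can_access(user, dataset):
    """Allow access only if the user's role needs the dataset; log the decision."""
    allowed = dataset in ROLE_DATASETS.get(user.role, set())
    # Every decision is logged so later analysis can look for anomalies.
    log.info(
        "user=%s role=%s dataset=%s allowed=%s",
        user.user_id, user.role, dataset, allowed,
    )
    return allowed


if __name__ == "__main__":
    print(can_access(User("u001", "vat_analyst"), "vat_returns"))
    print(can_access(User("u002", "platform_admin"), "vat_returns"))
```

Unknown roles fall through to an empty set, so access is denied by default.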
Assess data content
National Statistician’s Data Ethics Advisory Committee (NSDEC)
Ethical Principles
Confidentiality, data security, consent: The data subject's identity (whether person or organisation) is protected, information is kept confidential and secure, and the issue of consent is considered appropriately
Public Good: The use of data has clear benefits for users and serves the public good
Methods & Quality: The risks and limits of new technologies are considered and there is sufficient human oversight so that methods employed are consistent with recognised standards of integrity & quality
Transparency: The access, use and sharing of data is transparent, and is communicated clearly and accessibly to the public
Public views & engagement: The views of the public are considered in light of the data used and the perceived benefits of the research
Legal Compliance: Data used and methods employed are consistent with legal requirements such as the DPA, the Human Rights Act, the SRSA and the common law duty of confidence