BIG DATA INITIATIVES - Kasetsart University...2019/01/01 · BIG DATA FRAMEWORK Ministry of Digital...
BIG DATA FRAMEWORK
Ministry of Digital Economy and Society
Missions
• Encourage big data analytics in government
• Better decision-making
• More efficient operation
• Promote data exchange standards among government agencies
• Encourage the compilation and usage of open data
Government Data Services
National Data Center, Cloud, and Big Data Platforms
Training Programs
• Data Scientists
• Data Engineers
• Business Analysts
Services
• ID verification
• Access control
• Data distribution
• Transaction logging
Data Council
Data Cataloging and directory services
[Figure: framework components. 1 Infrastructure; 2 Data Catalog; 3 Data exchange; 4 Data council & Operating team (Subcommittee for Screening Data Requests); 5 Peopleware; 6 Use cases in government]
Showcases
• Healthcare
• Tourism
• Traffic
• Etc.
Profile the Agencies’ Computing Facility
Assessor
Facility Profile
Standards Gap Analysis
Infrastructure Recommendations
Gap Report
THE APPROACH
Different services require different computing environments and possibly different infrastructure. One size will not fit all.
Transition Plan
Total Cost of Ownership for SaaS
https://www.peoplehr.com/blog/index.php/2015/06/12/saas-vs-on-premise-hr-systems-pros-cons-hidden-costs/
Could be 77% less than on-premise
Source: Yankee Group DecisionNote
Technology Analysis
Personally identifiable information (PII), government confidential data, and data related to national security: the service provider must be an organization that the government can oversee.
Public data: the service provider can be any organization.
Tier 3.x datacenter, self-administered and self-provisioned
Datacenter
Traditional IT Infrastructure
Private Cloud
Hybrid Cloud
Public Cloud X
Public Cloud A
Multi cloud
Gov Cloud
Government cloud with Service Level Agreement
Services provisioned and managed by the DE Ministry
Infrastructure-as-a-service
• Co-location
• Hardware
• Virtualization
Co-location
Agencies place their own machines on the provider's premises and may contract server maintenance (MA) to the provider
Providers have buildings (tier 4) with
• Reliable electrical power
• Easy access to fast network connectivity
• Reliable cooling systems
• Secured space
Infrastructure-as-a-service: Hardware
• Provider takes over the administration, monitoring, and support of the dedicated server systems
Infrastructure-as-a-service: Virtualization
• Agencies rent virtual machines from providers
• Agencies take care of core business applications only
• Providers take care of everything else
Agency Cloud A
Gov Cloud
Ministry 1
Ministry 20
20 Ministry clouds (self-provisioned or outsourced)
Government cloud with Service Level Agreement, provisioned by the DE Ministry
Estimated to have 55+ agency datacenters
10+ agency clouds (self-provisioned or outsourced)
Data Exchange Service APIs
Ministry level
Department level or equivalent
Metadata storage and data service facility and services
Data Catalog, Data Linkage, and Data Exchange Services
Linkage Centers
Agency datacenter X
National Data Catalog for Thai government data
In big data analytics projects, analysts need to find sources of data. They need a place where they can see the list of available data and the tools to gather it.
National Data Catalog
Provide government data directory services with web/mobile interfaces that can be searched by
• Sectors (healthcare, finance, natural resources, education, justice, transportation, social services, etc.)
• Agencies / Ministries
• Access license (Open data, Person data, Govt data, Secured Govt data)
• Data type / data format
National Data Catalog
• Compile metadata from government agencies for all critical government services (and other data portals)
• Build a data index for multi-dimensional search
• Data search, request, and approval mechanisms
• Track and report on all data exchange transactions
To provide data directory services
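The search criteria named for the catalog (sector, agency, access license, data format) and the multi-dimensional index could be served by a simple inverted index over metadata records. The sketch below is only an illustration under assumptions: the `CatalogIndex` class, the facet names, and the sample datasets are hypothetical and are not part of the framework.

```python
from collections import defaultdict

# Hypothetical facet names, loosely following the search criteria in the slides.
FACETS = ("sector", "agency", "license", "format")


class CatalogIndex:
    """Toy inverted index: (facet, value) -> set of dataset ids."""

    def __init__(self):
        self._index = defaultdict(set)
        self._records = {}

    def add(self, dataset_id, **metadata):
        """Register one dataset's metadata and index every known facet."""
        self._records[dataset_id] = metadata
        for facet in FACETS:
            if facet in metadata:
                self._index[(facet, metadata[facet])].add(dataset_id)

    def search(self, **criteria):
        """Return sorted ids of datasets matching ALL given facet values."""
        matches = set(self._records)
        for facet, value in criteria.items():
            matches &= self._index.get((facet, value), set())
        return sorted(matches)


# Made-up sample entries for illustration.
catalog = CatalogIndex()
catalog.add("ds-001", sector="healthcare", agency="Agency A", license="open", format="csv")
catalog.add("ds-002", sector="healthcare", agency="Agency B", license="secured", format="json")
catalog.add("ds-003", sector="transportation", agency="Agency A", license="open", format="csv")

print(catalog.search(sector="healthcare", license="open"))  # ['ds-001']
```

Combining facets with set intersection keeps each added criterion cheap; a production catalog would more likely rely on a search engine or database index.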
[Figure: Agency A, Agency B, Agency C, Agency D; data marked open or sensitive; User Interfaces and APIs]
Encourage each government agency to …
• Identify their critical datasets
• Evaluate data quality and quantity of critical datasets, find gaps, and define a roadmap (3-5 years) on compliance
Define Metadata Template and ..
• Encourage each agency to define their own metadata on critical datasets based on the template
• Metadata model may include definitions and information about data
• Metadata model
• Data Owner
• Collection methods
• Purpose of data collection
• Sources of data
• Data description
• Types and format
• Tags (business domain and keywords)
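A metadata template of this kind could be expressed as a typed record that each agency fills in. The following is a minimal sketch, assuming a Python dataclass; the class name `DatasetMetadata`, the field names, and the sample values are hypothetical, with the fields chosen to mirror the template items listed above.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """One metadata record; fields mirror the template items on the slide."""
    data_owner: str
    collection_method: str
    purpose: str
    sources: list
    description: str
    data_format: str
    tags: list = field(default_factory=list)

    def missing_fields(self):
        """Names of fields left empty, so an agency can see what to fill in."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical sample values for illustration only.
record = DatasetMetadata(
    data_owner="Agency A",
    collection_method="annual survey",
    purpose="",
    sources=["field offices"],
    description="Hospital bed occupancy by province",
    data_format="csv",
)
print(record.missing_fields())  # ['purpose', 'tags']
```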
Define Data Decision Tree Guideline
• Use the data classification guideline to classify data/datasets into classes
• Public data
• Personally identifiable information
• Confidential data
• Data related to national security
[Figure: decision tree with outcomes NS (national security), PII, Confidential, Public]
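A decision tree over the four classes can be sketched as a short chain of yes/no questions. The slides do not give the actual questions or their order, so the version below assumes the most restrictive class is tested first; the function and parameter names are hypothetical.

```python
from enum import Enum


class DataClass(Enum):
    PUBLIC = "Public data"
    PII = "Personally identifiable information"
    CONFIDENTIAL = "Confidential data"
    NATIONAL_SECURITY = "Data related to national security"


def classify(relates_to_national_security, is_government_confidential, identifies_a_person):
    """Walk the questions from most to least restrictive (an assumed ordering)."""
    if relates_to_national_security:
        return DataClass.NATIONAL_SECURITY
    if is_government_confidential:
        return DataClass.CONFIDENTIAL
    if identifies_a_person:
        return DataClass.PII
    return DataClass.PUBLIC


print(classify(False, False, True).value)  # Personally identifiable information
```

Testing the strictest class first means a dataset that fits several classes receives the strictest handling; an agency's real guideline may order or word the questions differently.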
Draft digital data management policy guideline
Encourage each agency to segment data into 4 data classes and …
For each data class
• Define procedures to ensure that data quality and quantity meet standards
• Define procedures to ensure that data is kept up to date
• Define policy on the use and sharing of data
• Terms and Conditions of Data Release
• Protocol for the data transferring and handling method (API, FTP, removable storage device, Email, etc.)
• Access control policy
• A set of filters to anonymize or de-classify datasets that are confidential but very useful
• Define data protection policy template
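One of the anonymizing filters mentioned above could, for example, drop direct identifiers and replace record keys with salted hashes. This is a minimal sketch under assumptions: the field lists, function names, and sample record are hypothetical, and salted hashing is pseudonymization, which on its own does not guarantee that individuals cannot be re-identified.

```python
import hashlib

# Hypothetical field lists; a real agency would define these per dataset.
DROP_FIELDS = {"name", "address"}
PSEUDONYMIZE_FIELDS = {"citizen_id"}


def pseudonymize(value, salt):
    """Replace an identifier with a salted SHA-256 digest (truncated for readability)."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()[:16]


def anonymize(record, salt):
    """Return a copy of the record with direct identifiers dropped or pseudonymized."""
    cleaned = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue
        cleaned[key] = pseudonymize(value, salt) if key in PSEUDONYMIZE_FIELDS else value
    return cleaned


# Made-up sample record.
row = {"name": "Somchai", "address": "Bangkok", "citizen_id": "1234567890123",
       "province": "Bangkok", "visits": 3}
print(sorted(anonymize(row, salt="demo-salt")))  # ['citizen_id', 'province', 'visits']
```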
DE to provide all the templates
Emphasize that data format matters
DE to advertise, encourage, and enforce the 4-star data format
Encourage and guide each government Agency to define their own data policy
• Ensure data is fit for the purposes of internal and external reporting
• Ensure data is appropriately categorized for storage, retrieval, destruction, backup, and access (Appropriate data life cycle)
• Ensure proper management of digital data
Agency should establish the Data Stewardship Team
Artifacts / Templates
• Data quantity and quality evaluation method
• Metadata template
• Data classification guideline (decision tree)
• Data transfer protocol for each class of data
• Tag tree (for different business domain and keywords)
• Data access control policy for each class of data
• Digital Data Management Policy
• Data Exchange/Supply Agreement
• Terms and Conditions of Data Release
• Data protection policy and directives
• A set of filters to anonymize or de-classify datasets that are confidential but very useful
• All artifacts should be produced for one agency as an example
Data Exchange Services
• Provisioning of a data exchange platform to enable update and sharing of data through automation
• Verifying person identity for each data request (Authentication)
• Controlling access to all data (Authorization)
• Provisioning of data usage reports for both internal and external uses
• Provision a data log to keep track of data access trails
• Commit to data handling method defined by each agency
• Define and commit to data service level agreement (SLA)
• Draft IT roadmap for data service (3-5 year plan)
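The authentication, authorization, logging, and usage-report items above can be sketched together in a few lines. This is an illustration only: the function names, the in-memory log, and the rule that a requester must hold the dataset's access license are assumptions, not the platform's actual design.

```python
from collections import Counter
from datetime import datetime, timezone

access_log = []  # in-memory stand-in for the data log


def request_data(user, user_licenses, dataset_id, dataset_license, authenticated):
    """Authenticate, authorize, and log one data request; return True if granted."""
    if not authenticated:
        outcome = "denied: identity not verified"
    elif dataset_license not in user_licenses:
        outcome = "denied: license mismatch"
    else:
        outcome = "granted"
    # Every request is logged, whether it is granted or denied.
    access_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset_id,
        "outcome": outcome,
    })
    return outcome == "granted"


def usage_report(log):
    """Count granted requests per dataset, as a basic usage report."""
    return Counter(entry["dataset"] for entry in log if entry["outcome"] == "granted")


request_data("analyst-1", {"Open data"}, "ds-001", "Open data", authenticated=True)
request_data("analyst-1", {"Open data"}, "ds-002", "Secured Govt data", authenticated=True)
request_data("unknown", {"Open data"}, "ds-001", "Open data", authenticated=False)
print(dict(usage_report(access_log)))  # {'ds-001': 1}
```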
For all transfers of information containing controlled data,
• Establish the identity of the recipients
• Check whether the access license of the recipients matches that of the data (authorization, security clearance)
• Check whether the request is reasonable, considering
• The nature of the information, its sensitivity and confidentiality
• The size of the data being requested
• The damage or implications any data loss would have on individuals, agencies, or the council
• Check whether it is a special request that needs formal approval
• Ensure the transfer protocol is followed (API, FTP, removable storage device, Email, etc.)
• Check that the storage facility on the receiving side meets the security standard for each data type
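The transfer checks above lend themselves to a checklist that reports which conditions a request fails. The sketch below is a hypothetical illustration: the `TransferRequest` fields and the set of allowed protocols are assumptions, and each boolean stands in for a review that would be carried out by people or other systems.

```python
from dataclasses import dataclass


@dataclass
class TransferRequest:
    recipient_verified: bool
    license_matches: bool
    request_reasonable: bool
    special_request: bool
    formally_approved: bool
    protocol: str
    storage_meets_standard: bool


# Hypothetical: protocols an agency might allow for one class of data.
ALLOWED_PROTOCOLS = {"API", "FTP"}


def failed_checks(req):
    """Return the names of the checks that the request does not pass."""
    checks = {
        "recipient identity": req.recipient_verified,
        "access license": req.license_matches,
        "reasonable request": req.request_reasonable,
        # Formal approval only matters when the request is a special one.
        "formal approval": (not req.special_request) or req.formally_approved,
        "transfer protocol": req.protocol in ALLOWED_PROTOCOLS,
        "receiving storage": req.storage_meets_standard,
    }
    return [name for name, passed in checks.items() if not passed]


req = TransferRequest(True, True, True, special_request=True, formally_approved=False,
                      protocol="Email", storage_meets_standard=True)
print(failed_checks(req))  # ['formal approval', 'transfer protocol']
```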
Data Exchange Services
Directory Services: facilitate data services in response to regular and ad hoc data requests
Participants: Data Consumer; Data Provider 1, Data Provider 2, Data Provider X
1 Publish data description
2 Search and Request data
3 Approve Request
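The three numbered steps (publish a data description, search and request data, approve the request) can be sketched as a small in-memory workflow. The `DataExchange` class and its method names are hypothetical, and the rule that only the owning provider may approve a request is an assumption made for the example.

```python
class DataExchange:
    """Toy in-memory model of the publish / request / approve flow."""

    def __init__(self):
        self.descriptions = {}  # dataset_id -> {"provider": ..., "text": ...}
        self.requests = []      # each request is a dict with a "status" key

    def publish(self, provider, dataset_id, text):
        """Step 1: a provider publishes a data description to the directory."""
        self.descriptions[dataset_id] = {"provider": provider, "text": text}

    def search(self, keyword):
        """Step 2a: a consumer searches descriptions by keyword."""
        keyword = keyword.lower()
        return sorted(
            ds for ds, d in self.descriptions.items() if keyword in d["text"].lower()
        )

    def request(self, consumer, dataset_id):
        """Step 2b: a consumer requests a dataset; returns the request id."""
        if dataset_id not in self.descriptions:
            raise KeyError(f"unknown dataset: {dataset_id}")
        self.requests.append(
            {"consumer": consumer, "dataset": dataset_id, "status": "pending"}
        )
        return len(self.requests) - 1

    def approve(self, provider, request_id):
        """Step 3: only the provider that owns the dataset may approve (assumed rule)."""
        req = self.requests[request_id]
        owner = self.descriptions[req["dataset"]]["provider"]
        if provider != owner:
            raise PermissionError(f"{provider} does not own {req['dataset']}")
        req["status"] = "approved"
        return req


exchange = DataExchange()
exchange.publish("Data Provider 1", "traffic-2018", "Road traffic counts by district")
hits = exchange.search("traffic")
request_id = exchange.request("Data Consumer", hits[0])
print(exchange.approve("Data Provider 1", request_id)["status"])  # approved
```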
Data council and operating team
• Appoint Chief Data Officer (CDO)
• Form Data council (Representatives from all ministries, External experts)
• Coordinate actions in terms of the inventorying, governance, production, circulation and use of data
• Organize the best use of these data and their wider circulation, while respecting data privacy laws
• For data exchange with external organizations, draft standard templates for a Data Exchange Agreement (DEA), a Data Supply Agreement (DSA), and “Terms and Conditions of Data Release”
• Suggest data strategies to the prime minister
To regulate data access within and across agencies/ministries
Data Council and operating team
• Build standard protocols and repeatable processes for data requests and data provision across agencies
• Design a framework to approve special data requests that do not fit standard request protocols
[Figure: Ministry A Data Council and Ministry X Data Council, with Agency A, Agency B, Agency C, and Agency D, under the Subcommittee on Data, Cloud, and Big Data]
Goal: Enable systematic data exchange across agencies
Training (Problem-based Learning)
• Data Scientists = 250 persons / year, 5-6 months
• Background: basic statistics and basic programming
• Data Engineers = 250 persons / year, 4-5 months
• Background: system administrators and basic programming
• Business Analysts = 800 persons / year, 2-3 months
• Background: none
Special Program: Data Science Bootcamp
Stages: Admission Test; Foundation Courses; Comprehensive Test; Hands-on; Advanced Courses
Courses:
• Python / R Programming (3 days)
• Mathematics for Data Science (3 days)
• Statistical Data Analysis (4 days)
• Python / R for Big Data Analytics (2 days)
• Machine Learning (4 days)
• Data Science Workshop (3 days)
• Introduction to Data Science and Enterprise IT (1 day)
• Time Series Analysis (2 days)
Special Project Study: Select a project with impact from own organization (weekly consultation for a total of 4 months)
Final results are evaluated from both technical and business perspectives by professors and managers
Special Program: BI Analytics Bootcamp
Stages: Admission Test; PBL; Foundation Course; Comprehensive Test; Seminar; Main Course
Courses:
• Statistical Data Analysis (4 days)
• Visualization Workshop with Tools (4 days)
• Consumer Behavior Analytics (2 days)
• Analytics for Decision Making (3 days)
• Discussion and Showcases (1 day)
• Data Analytic Thinking (1 day)
Special Program: Data Engineering Bootcamp
Stages: Admission Test; PBL; Foundation Course; Foundation Level Test; Hands-on workshop; Main Courses
Courses:
• Introduction to Big Data (1 day)
• High Performance Computing and Hadoop (2 days)
• Data Integration and Service (3 days)
• Python Programming (3 days)
• Hadoop Ecosystem and Workshop (2 days)
• Data Processing with Spark (2 days)
Special Project Study on Big Data Solution Design and Implementation (bi-weekly consultation for a total of 3 months)
Final results are evaluated from both technical and business perspectives by professors and IT managers
Training Methods
• Universities offer a series of data boot camps for government units
• Each camp is independent of the others (no required sequence)
• Each camp has a set of knowledge modules and technical skills, called learning outcomes
• Certificates are given for each camp to those who pass the comprehensive/hands-on exams
• Learners can collect satisfied learning outcomes (micro-credentials) and later use them as part of a certification or degree
Micro-credential for Learners
Create Interactive visualization
Explore data distribution
Understand basic statistics
Program in Python
Write short reports
Understand consumer behaviors
Each learner collects micro-credentials (learning outcomes)
A set of learning outcomes = a certificate
A set of certificates = a master's degree
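The two rules above are subset checks: a certificate is earned once its required learning outcomes are all held, and a degree once its required certificates are all held. The sketch below assumes made-up certificate names and requirement sets; only the learning-outcome names come from the slide.

```python
# Hypothetical requirements; only the outcome names come from the slide.
CERTIFICATE_REQUIREMENTS = {
    "Data Visualization": {"Create interactive visualization", "Explore data distribution"},
    "Data Analysis Basics": {"Understand basic statistics", "Program in Python",
                             "Write short reports"},
}
DEGREE_REQUIREMENTS = {"Data Visualization", "Data Analysis Basics"}


def earned_certificates(outcomes):
    """Certificates whose required learning outcomes are all held."""
    held = set(outcomes)
    return sorted(
        name for name, required in CERTIFICATE_REQUIREMENTS.items() if required <= held
    )


def degree_earned(outcomes):
    """True once every certificate required for the degree has been earned."""
    return DEGREE_REQUIREMENTS <= set(earned_certificates(outcomes))


learner = ["Create interactive visualization", "Explore data distribution",
           "Program in Python"]
print(earned_certificates(learner))  # ['Data Visualization']
print(degree_earned(learner))        # False
```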
Blockchain for HR development in Big Data
Participants: Univ. A, Univ. B, Univ. C, Univ. D, OHEC (สกอ), OCSC (กพ), DE Ministry (by SDU), DGA (สพร)
DE
• Professional standard framework
DGA
• E-Learning Platform for DS
• Blockchain Hosting (handle payment later on)
• Training competency assessment
OCSC (กพ)
• Training Budget Management
• Process civil servant promotions based on skills
OHEC (สกอ)
• Govern degree awarding
Universities
• Select trainers (theory and practical experiences)
• Conduct training and teaching
• Certify based on learning outcome
Advanced Analytics Use Cases
• Public Health
• Public Finance
• Workforce
• Public Safety
• Justice and Corrections
• Education
• Agriculture
• Economic Development, commerce, industry
• Transportation
• Utilities
• Citizen Services
• Etc.