Semantic web design for - Presentation
muthu-kumaar-thangavelu -
What is Linked Data?
• Linked Data is an alternative data representation format.
• Actually, it's just a repackaging of Semantic Web elements
• It is different from relational database concepts such as tables, rows, columns…
RDF
Subject-Predicate-Object
Jurong belongs to the West Zone
Linked Data Representation Format
Subject: http://data.gov.sg/resource/area/Jurong_West
Predicate: http://data.gov.sg/ontology/property/has_zone
Object: http://data.gov.sg/resource/zone/West
http://www.w3.org/2003/01/geo/wgs84_pos#lat http://www.w3.org/2003/01/geo/wgs84_pos#long
1°20'040.2"N 103°42'24.54"E
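To make the subject-predicate-object idea concrete, here is a minimal sketch in plain Python (no RDF library). It stores the Jurong West statement from the slide as a three-part record and prints it in N-Triples style. The `Triple` class and `to_ntriples` helper are names invented for this illustration, not part of any tool mentioned in the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose three terms are URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# "Jurong belongs to the West Zone", using the URIs shown on the slide.
jurong_zone = Triple(
    "http://data.gov.sg/resource/area/Jurong_West",
    "http://data.gov.sg/ontology/property/has_zone",
    "http://data.gov.sg/resource/zone/West",
)

line = to_ntriples(jurong_zone)
print(line)
```

The point of the sketch is only that a statement is an ordered triple of identifiers; a real deployment would use an RDF library and a triple store.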
Traditional representation - Tables
Linked Data Components
• Data talks about itself. Humans and machines both understand the data - how?
• URIs - lots of them (http://data.gov.sg/PlanningArea/Kallang)
• RDF - Data model (Jurong Point is a location)
• Ontologies - Enforce a structure on data (Land Hierarchy) – represented in RDF
• SPARQL - Does the same job as SQL and a bit more...
Linked Data Cloud (Web of Data)
Linked Data becomes Linked Open Data (LOD) when it is published under an “appropriate” license
Provides opportunity to link with other useful data sets
Provides variety of information about the same resource
Linked Data and Government Data - a natural compatibility
• Why?
• Govt data is used by all
• Govt data needs to be transparent and easily understandable
• Govt data is mainly factual – a direct fit!
• Standardized representation of Govt data across the globe can facilitate comparison without hassles.
• Best way to propagate a useful agenda to the private arena...
Who has implemented Linked Data?
• UK, US, Brazil Governments
• Private Corporations? Yes
– BBC
– Nature
– World Bank
– New York Times
– FAO
– CIA Factbook
?Provide Links?
http://wheredoesmymoneygo.org/bubbletree-map.html#/~/grand-total--2010-
Sample Linked Data Usecase in UK
Agency Websites
Singstat publications
Formats: XLS, HTML
Agencies: Accountant-General's Department; Accounting and Corporate Regulatory Authority; Agency For Science, Technology & Research; Attorney-General’s Chambers; Building & Construction Authority; Central Narcotics Bureau; Central Provident Fund Board; Civil Aviation Authority of Singapore; Department of Statistics; Economic Development Board; Energy Market Authority; Health Sciences Authority; Housing & Development Board; Immigration & Checkpoints Authority; Infocomm Development Authority of Singapore; Inland Revenue Authority of Singapore; Institute of Technical Education; Intellectual Property Office of Singapore; JTC Corporation; Judiciary, Subordinate Courts; Judiciary, Supreme Court; Land Transport Authority; Majlis Ugama Islam Singapura; Maritime & Port Authority of Singapore; Monetary Authority of Singapore; Nanyang Polytechnic; National Environment Agency; National Heritage Board; National Library Board; National Parks Board; Ngee Ann Polytechnic; People's Association; Public Service Division; Public Transport Council; Public Utilities Board; Republic Polytechnic; Sentosa Development Corporation; Singapore Civil Defence Force; Singapore Customs; Singapore Land Authority; Singapore Police Force; Singapore Polytechnic; Singapore Sports Council; Singapore Workforce Development Agency; Spring Singapore; Temasek Polytechnic; Urban Redevelopment Authority; Media Development Authority
MINISTRIES: Ministry of Community Development, Youth & Sports; Ministry of Education; Ministry of Foreign Affairs; Ministry of Health; Ministry of Law – Community Mediation Unit; Ministry of Manpower; Ministry of Transport
Theme legend: C - Community; Cul - Culture; E - Environment; Emp - Employment; Edu - Education; H - Health; F - Family; R - Recreation; S - Sports
Themes: ABC Water Proj (R); BFA Buildings (C); Green Building (E); Breast Screen (H); Cervical Screen (H); Healthier Dining (H); Quit Centers (H); Infocomm Access (C); Silver Infocomm (C); Wireless Hotspots (R); Child care (F); Disability (F); Elder care (F); Family (F); Family Friendly Estab (F); Student Care (F); Comm Mediation Center (C); After Death Facilities (E); Funeral Parlours (E); Dengue Cluster (H); Hawker Center (E); NEA Offices (E); Recycling Bins (E); Waste Disposal Site (E); Waste Treatment (E); Heritage sites (Cul); Monuments (Cul); Museums (Cul); Libraries (Cul); Streets and Places (Cul); CD Councils (C); Community Clubs (C); Constituency offices (C); Other facilities (C); Other Pan networks (C); PA head quarters (C); Residents Committee (C); Water Venture (C); National Parks (R); Skyrise greenery (E); Sports clubs (S); CET Centers (Emp); WDA Service points (Emp); Kindergartens (Edu)
Operations: Get Token; Address Search; Agency Data Search; Static Map; Get Layer Info; Mashup; Get Related Data; Get Directions; Public Transportation; Reverse Geocode
Map-related APIs from various agencies
Traffic-related APIs from Land Transport Authority
Tourism-related APIs from the Singapore Tourism Board
Environment-related APIs from the National Environment Agency
Library-related data feeds & web services from National Library Board
DGS Eco System (diagram labels): SG DATA; TEXTUAL; SPATIAL; API; THEMES; OPERATIONS; CATEGORIES; UNSTRUCTURED DATA; STRUCTURED DATA; STATUTORY BOARDS
SG Government Data Eco System
Few design issues spotted through the Linked Data lens
• Different levels of granularity
• Multiple end points
• Metadata only at data set level
• Data already cooked!
• Hierarchies not captured
• Vocabulary conflict in spatial and textual data
Benefits of using Linked Data for iDA Singapore
• An opportunity to standardize common terms across agencies
• Re-use of resources (through URIs) ex: http://data.gov.sg/zone/central
• Centralized control?
• Single endpoint for all govt data - Linked Data API
• Very convenient for developers to join data from different agencies, e.g. combining data from SLA and URA
Datasets Used for Framework Evaluation
• URA Sites for Sales dataset (Urban Planning)
• DOS Population and Household Characteristics dataset (Population Demographics)
  – Age Pyramid of Resident Population
  – Old Age Support Ratio
Framework Formulation Process
• Work was split into three phases – Analysis, Design and Evaluation
• Based on a study of Linked Data migration research papers and cookbooks published by the World Wide Web Consortium (W3C)
• Analysis of Linked Data implementations in the UK, US and Brazil
• Evaluation of Linked Data tools with Singapore data sets for recommendation in each step of the framework
• Consideration of probable issues that could be faced during implementation
Proposed Linked Data Migrational Framework for DGS
Steps and personnel involved (in diagram order):
1) Specification (Identification, Analysis) - Govt Agencies and IDA
2) Object Modeling - Govt Agencies Domain Matter Experts
3) Ontology Modeling (Re-use, Create) - Ontology Modelers
4) URI Naming - IDA and Web Architects
5) RDF Creation (S2R, D2R, A2R) - Developers
6) External Linking - Developers and Domain Experts
7) Datasets Publication - Developers
8) Discovery & Exploitation - Web Architects
Inputs, processes and outputs named in the framework diagram (grouped in reading order):
• Objectives; Specifications; Project Duration; Dataset Prioritization; Dataset License Setting; Impln Mode Selection; Roadmap; Architecture Overview
• Relational Model; Dataset Overview; Drawing Objects in Whiteboard; Conceptual View
• Conceptual View; Public Vocabularies; Re-use of Existing Vocabularies; Creation of New Vocabularies; OWL, RDFS, RDF Vocabulary files
• Resources Class and Properties; Visualization of URI mining process; URI Administration; URI Lifecycle
• ER Model; Spreadsheets, DBMS, API; Conversion to RDF triples using Mapping files; RDF Triples
• Government and external data sets; Linking based on Similarity Algorithms; Outbound Links
• RDF Triples; Ontologies; SPARQL, API; Data Insertion; VOID Modeling; Data Retrieval; API to SPARQL conversion; VOID Triples; JSON data
• Actual Data; Existing Apps; Gamification; Crowdsourcing; Catalog Registration; External Reference; New Apps
[Diagram labels: each of the eight steps has INPUT, PROCESS and OUTPUT rows]
Resource allocation (%) by step: Step 1 - 10; Step 2 - 15; Step 3 - 15; Step 4 - 5; Step 5 - 20; Step 6 - 5; Step 7 - 15; Step 8 - 15
Step 1 Step 2 Step 3 Step 4 Step 5 Step 6 Step 7 Step 8
Specification
1) Design the High Level Architecture
2) Set the “Migration Potential” for data sets
3) Decide the “Perspective” – Vertical vs Horizontal -> Agency vs Application (We recommend Agency perspective)
Data set | URL | Data Type | Agency | Utility Level | Interlinking Possibility | Potential Level
Annual Vehicle Population by Type of Fuel Use | URL | Textual (PDF) | LTA | H | L | M
Administrative Data - Employment Statistic | URL | Textual (HTML) | MOM | H | M | H
Specification
4) Setting up of License for data sets
5) Implementation Method – “Linked Data + RDF”
Other options - 1) Just URIs 2) URI for data sets only
Analysis of Data sets
• Study of system specifications, design & integration documents (including database) of the selected data sets
• Understand Metadata, Schema design and Entity Relationship (ER) models
Data Set | URL | Data Type | Agency | License | Access Rights | Data Access Modes
Annual Vehicle Population by Type of Fuel Use | URL | Textual (PDF) | LTA | PDDL | R | API, SPARQL, RDF Dump
Administrative Data - Employment Statistic | URL | Textual (HTML) | MOM | PDDL | R | API, SPARQL, RDF Dump
Object Modeling
This is modeling without usage context.
* Requires normalization of the database model to 3NF
Issues
• Possibility of applying high abstraction and high granularity to objects
Key Learning
• Ease in identifying the use of common objects across data sets
• Facilitates brainstorming of relationships between objects
Ontology Modeling
Takes the conceptual diagram from Object Modeling as input.
Design Ontologies
1. Identify classes and subclasses
2. Identify hierarchy structure
3. Connect classes through relationships
4. Create rules for inference (optional)
5. Output OWL vocabulary files
Ontology modelling is carried out in two ways:
1) Using and extending public ontologies
2) Designing a local ontology from scratch
Ontology Modeling
Date fields, location fields and fields related to measurements in DGS have scope for vocabulary re-use
Vocabulary for the identified data sets (developed using Protege) with screenshots
List of vocabularies required for LOGD implementation
List of tools used for ontology modeling
Output? Allocation percentage? Personnel involved?
URI Naming
TBOX (ontology terms) | ABOX (instances)
http://data.gov.sg/ontology/Ministry/ | http://data.gov.sg/ministry/MOH
http://data.gov.sg/ontology/Agency/ | http://data.gov.sg/agency/SLA
http://data.gov.sg/ontology/SiteLocation | http://data.gov.sg/location/pioneer_road_north
http://data.gov.sg/ontology/Race | http://data.gov.sg/race/chinese
Dataset URIs
Dataset ID | URAstaticfile001
Dataset | http://data.gov.sg/dataset/URAstaticfile001/
Class | http://data.gov.sg/terms/class/URAstaticfile001/sitesforsale
Property | http://data.gov.sg/terms/property/URAstaticfile001/time
Row 1 | http://data.gov.sg/dataset/URAstaticfile001/1
Row 1 - A generic column | http://data.gov.sg/dataset/URAstaticfile001/1/columnName
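The dataset URI pattern above is regular enough to generate. The sketch below is one possible way to mint such URIs in Python; the function names are hypothetical, and percent-encoding the column name is an added assumption rather than something the slides specify.

```python
from urllib.parse import quote

BASE = "http://data.gov.sg"  # central URI root used on the slide


def dataset_uri(dataset_id: str) -> str:
    return f"{BASE}/dataset/{dataset_id}/"


def class_uri(dataset_id: str, class_name: str) -> str:
    return f"{BASE}/terms/class/{dataset_id}/{class_name}"


def property_uri(dataset_id: str, property_name: str) -> str:
    return f"{BASE}/terms/property/{dataset_id}/{property_name}"


def row_uri(dataset_id: str, row_number: int) -> str:
    return f"{BASE}/dataset/{dataset_id}/{row_number}"


def cell_uri(dataset_id: str, row_number: int, column_name: str) -> str:
    # Assumption: column names are percent-encoded so that spaces and
    # slashes cannot break the URI structure.
    return f"{row_uri(dataset_id, row_number)}/{quote(column_name, safe='')}"


print(dataset_uri("URAstaticfile001"))
print(class_uri("URAstaticfile001", "sitesforsale"))
print(property_uri("URAstaticfile001", "time"))
print(cell_uri("URAstaticfile001", 1, "columnName"))
```

Generating URIs from one place like this is one way to keep the taxonomy consistent when many data sets are migrated.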
1) “URI Administration” Mode
• Maintained centrally in the DGS platform (resultant URIs will start with http://data.gov.sg/) -> RECOMMENDED
vs.
• Maintained by individual agencies (resultant URIs will start with http://ura.gov.sg or http://sla.gov.sg)
vs.
• Maintained externally by third party platforms such as Kasabi (resultant URIs will start with http://data.kasabi.org) – no longer valid as the Kasabi service has been shut down
2) Setup of URI Taxonomy
RDF Creation
RDF triples are generated by converting data from source format with the necessary transformation
Type | Nature | Example of Singapore data sets | Source format
S2R (Static Files) | Static | URA Site for Sales, Singstat’s Population Household Characteristics | XLS, CSV, TXT files and other static files
D2R (RDBMS) | Dynamic | DGS tables | RDBMS
A2R (APIs/Web Services) | Dynamic | OneMap API, myTransport API, NLB web services | Application Programming Interface (API) and Web Services (SOAP, REST)
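As a rough illustration of the S2R (static file to RDF) mode, the sketch below turns each cell of a CSV file into one triple, reusing the row and property URI patterns from the URI Naming slide. The sample rows and the column names `location` and `time` are made up for the example and are not real URA data; the tools evaluated in the slides work from mapping files rather than code like this.

```python
import csv
import io

BASE = "http://data.gov.sg"
DATASET_ID = "URAstaticfile001"

# Illustrative input only: invented rows standing in for a static file.
SAMPLE_CSV = """location,time
pioneer_road_north,2012
"""


def csv_to_triples(text, dataset_id):
    """Emit one (subject, predicate, literal) triple per cell.

    Subject is the row URI, predicate is a property URI derived from
    the column header, and the object is the raw cell value.
    """
    triples = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{BASE}/dataset/{dataset_id}/{row_number}"
        for column, value in row.items():
            predicate = f"{BASE}/terms/property/{dataset_id}/{column}"
            triples.append((subject, predicate, value))
    return triples


triples = csv_to_triples(SAMPLE_CSV, DATASET_ID)
for s, p, o in triples:
    print(s, p, repr(o))
```

A real conversion would also map values such as locations to shared resource URIs instead of leaving them as plain strings, which is where the re-use benefit of Linked Data comes from.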
RDF Creation
Evaluated three tools, one for each mode of conversion:
• Google Refine - S2R
• RDF Views - D2R
• RDF Sponger - A2R
Google Refine Demo for S2R!
ER models from RDBMS are to be converted into corresponding vocabularies/Ontologies for D2R process using STDTrip methodology
For A2R, External Cartridges (mapping files) are to be created for mapping API parameters to vocabularies. This can be done in RDF Sponger
“We feel that Linked Data is best suited for data from Static files and not for data that is real-time and dynamic in nature unless conformity to structure can be trusted”
External Linking
External Linking is connecting with other data sets in the web of data
[Diagram: Data.gov.sg linked to World Bank, CIA World Factbook, DBpedia, FAO, Geonames, Supreme Court, Flickr]
<http://data.gov.sg/location/bugis> <owl:sameAs> <http://www.dbpedia.org/resource/Bugis>
<http://data.gov.sg/race/malay> <owl:sameAs> <http://www.dbpedia.org/resource/Malay_race>
Issues
• The outbound links made to data sets outside of IDA’s purview can be risky
• Dead links are a real possibility when resource URIs change or during system downtime
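The framework describes external linking as "linking based on similarity algorithms". The sketch below shows the general idea with Python's standard `difflib`: compare the last path segment of each local URI with each external URI and propose an owl:sameAs link only above a threshold. This is not how SILK or LIMES work internally; the helper names and the 0.9 threshold are assumptions made for the illustration.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def label_of(uri):
    """Use the last path segment as a crude, lowercased label."""
    return uri.rstrip("/").rsplit("/", 1)[-1].replace("_", " ").lower()


def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()


def propose_links(local_uris, external_uris, threshold=0.9):
    """Propose (local, owl:sameAs, external) links for close label matches."""
    links = []
    for local in local_uris:
        best = max(
            external_uris,
            key=lambda ext: similarity(label_of(local), label_of(ext)),
        )
        if similarity(label_of(local), label_of(best)) >= threshold:
            links.append((local, OWL_SAME_AS, best))
    return links


local = [
    "http://data.gov.sg/location/bugis",
    "http://data.gov.sg/race/malay",
]
external = [
    "http://www.dbpedia.org/resource/Bugis",
    "http://www.dbpedia.org/resource/Malay_race",
]

links = propose_links(local, external)
for link in links:
    print(link)
```

With this strict threshold only the Bugis pair is proposed; "malay" against "malay race" scores lower and would need a looser threshold or a human reviewer, which reflects the risk noted in the issues above.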
Datasets Publication
A triple store (or RDF store) is the database used to store Linked Data.
• We used Virtuoso Universal Server’s built-in triple store for evaluation
• It is envisaged that the triple store will be centralized at iDA
SPARQL (pronounced "sparkle") will be the main output endpoint for Linked Data
• SPARQL can be used to SELECT, INSERT, DELETE and UPDATE data
• SPARQL is the gateway to any operation on Linked Data. APIs and applications are built on top of it
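To give a feel for what a SPARQL SELECT does against a triple store, here is a toy in-memory version in Python. It is not SPARQL and not Virtuoso; it only mimics a single triple pattern with wildcards. The `match` function is a hypothetical helper, and the Kallang statement is an invented second row added so the filter has something to exclude.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

HAS_ZONE = "http://data.gov.sg/ontology/property/has_zone"
WEST = "http://data.gov.sg/resource/zone/West"

STORE: List[Triple] = [
    ("http://data.gov.sg/resource/area/Jurong_West", HAS_ZONE, WEST),
    # Hypothetical second statement, added only to show filtering.
    ("http://data.gov.sg/resource/area/Kallang", HAS_ZONE,
     "http://data.gov.sg/resource/zone/Central"),
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts like a variable."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly the intent of:
#   SELECT ?area WHERE { ?area <.../has_zone> <.../zone/West> }
west_areas = [t[0] for t in match(STORE, p=HAS_ZONE, o=WEST)]
print(west_areas)
```

A real endpoint adds joins across several patterns, filters and updates, but every query still reduces to matching patterns like this one.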
Triple Store and SPARQL Demo!
We had some information about External Linked Data Hosting, but we had to remove it as the major provider Talis has closed its own hosting service Kasabi!
Datasets Publication
Linked Data API is the common API endpoint that will be used by developers and public users to access government data.
- This solves the problem of maintaining multiple end points!
ex: http://gov.tso.co.uk/transport/api/transport/doc/bus-stop-point.xml
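The framework lists "API to SPARQL conversion" as part of publication. The sketch below shows one way such a layer could translate a REST-style path into a SPARQL query string. The route `/api/zone/<name>/areas` and both function names are invented for this example and do not describe the real Linked Data API configuration; the property and zone URIs follow the earlier Jurong West slide.

```python
import re

BASE = "http://data.gov.sg"
_SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def areas_in_zone_query(zone: str) -> str:
    """Build a SPARQL SELECT for all areas in the given zone."""
    # Reject anything that could inject extra SPARQL into the query.
    if not _SAFE_NAME.match(zone):
        raise ValueError(f"unsafe zone name: {zone!r}")
    return (
        "SELECT ?area WHERE { "
        f"?area <{BASE}/ontology/property/has_zone> "
        f"<{BASE}/resource/zone/{zone}> . "
        "}"
    )


def route(path: str) -> str:
    """Translate the hypothetical path /api/zone/<name>/areas to SPARQL."""
    m = re.fullmatch(r"/api/zone/([^/]+)/areas", path)
    if m is None:
        raise ValueError(f"unknown route: {path!r}")
    return areas_in_zone_query(m.group(1))


query = route("/api/zone/West/areas")
print(query)
```

Developers would call the simple URL while the API layer sends the generated query to the single SPARQL endpoint, which is how one API can front data from many agencies.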
Discovery & Exploitation
Key Theme
1) Internal discovery within Singapore for local citizens – idea4apps (link)
2) External discovery for attracting usage of Singapore government data in international economic & political research and global issues (water scarcity, carbon footprint, etc.)
• Entry in CKAN registry -> http://thedatahub.org/tag/registry
Gamification? Promoted by LinkedGov.org
Interlinked Datasets Post-Migration
• Original data provided by URA
• Similarly, location-based data from the OneMap API is retrieved for Pasir Ris
• Possible because of the re-use of the common resource URI for Pasir Ris across data sets
Other Interesting Use Cases
Definitely not Science Fiction!
Q & A Engine that works on top of government linked data. Inspired by www.trueknowledge.com
Sense-Making
Question: Which recent year had a growth rate close to 50% for majority of Singapore based SME?
Step 1: Spot the resources in this query
DBpedia Spotlight does just that! – Semantic Information Extraction
Which recent year had a growth rate close to 50% for majority of Singapore based SME
Step 2: Identify the relationship between the resources
• SME is an instance of the Organization class
• Organization class comes under Singapore country
• Growth rate is a property of the Sales class
• Year is a class by itself
• Majority is a subset of the Group class
Step 3: Use NLP technique – Syntactic Analysis (Stanford Parser) followed by Focus Extraction for understanding the question
A syntactic parse tree is generated, followed by the Access Pattern
Step 4: Look for RDF triples that meet the criteria
2010 is returned as the result!
Key Challenges
• Dense data - a lot of additional RDF triples get created along with the required RDF triples, as a resource belongs to multiple ontologies
  Demographics dataset stats: Rows: ~300, Columns: 16 in the Excel file. Resultant triple count in the RDF/triple store: 13711. Reason: the majority of the generated triples are for machine understanding.
• URI administration could be an intense activity, as dead URIs can damage applications, e.g. what will happen if http://data.gov.sg/area/jurong doesn't work?
• Changes to the structure of static files and RDBMS tables require changes in RDF mapping files - might be a long process if not properly regulated
• Not readily suitable for real-time data
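The "dense data" figures above can be put in proportion with a little arithmetic. The sketch assumes, as a baseline, one triple per spreadsheet cell; that baseline is an assumption for comparison, and the row count is approximate ("~300") on the slide.

```python
rows = 300                 # "~300" on the slide, so approximate
columns = 16
reported_triples = 13711   # triple count reported for the demographics dataset

# Baseline assumption: one triple per cell of the spreadsheet.
cell_triples = rows * columns

# Triples beyond the baseline (typing, vocabulary and linking statements).
extra_triples = reported_triples - cell_triples

ratio = reported_triples / cell_triples

print(cell_triples, extra_triples, round(ratio, 2))
```

Under that baseline the store holds nearly three times as many statements as there are cells, which is consistent with the slide's remark that most generated triples exist for machine understanding.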
Summary
Four in-person discussion sessions with IDA, NIIT and SLA
Analysis of Five data.gov.sg system specifications
Evaluation of Four existing Migration Frameworks
Prototyping with Six core Linked Data Tools
Dataset Publication: Virtuoso Universal Server, Linked Data API
External Linking: SILK, LIMES
RDF Creation: Google Refine, RDF Views, RDF Sponger
URI Naming: Pubby
Ontology Modeling: Protégé
Object Modeling: Concept Map
Summary
• Applicability of the framework to Singapore Government Data
• Issues identified in existing Data Eco System
• Recommended tools and best practices for each step
• Launchpad for SG Linked Data implementation
Final Thoughts…
• ROI is not a key metric for Linked Data implementation
• Benefits of moving to Linked Data are intangible and may not be immediately realizable
• Volume of work is huge compared to traditional systems
We are thankful to Prof Chris Khoo for his supervision and to iDA staff Soy Boon Lim for providing an overview of data.gov.sg and for furnishing DGS design documents.
Why are we doing this project?
To prescribe a Linked Data migrational framework for data.gov.sg (DGS) data sets
First hand view of the required migration activities
Issues anticipated at each step
Evaluation & Recommendation on Linked Data tools
To help IDA in realizing - What more can be done with existing data? A closer look at Government counterparts – UK and US!
In totality, iDA can use this report as a guide for the various aspects related to Linked Data implementation
Basic Thought Process of Linked Data Publishing
• Select data sets that appear apt for Linked Data
• Identify the data sources for the data sets
• Find out what type of transformations are needed
• Publish it!