Implementation of a Linked Open Data Solution For
-
7/27/2019 Implementation of a Linked Open Data Solution For
1/16
Alejandro Villar
Ana Santurtún
-
Necessary for citizens to have a better understanding of their society and to help them make informed decisions about it.
Fundamental piece of Open Government.
Many agencies collect, process and publish statistical information at different levels around the world.
Need for interoperability (e.g. SDMX, PC-Axis).
-
Data: RDBMS / OLAP model.
  Structural metadata (cubes, dimensions, measures, etc.) stored in Mondrian XML Schema files.
Descriptive metadata:
  Stored in RDBMS.
  Most fields not normalized.
Domain model:
  Area: Subject area for a group of fields.
  Type: Type of series (time series, historical data or municipal data).
  Group: Subject area subdivision (Area + Type intersection).
  Node: Hierarchical layout of folders and time series.
Software:
  Liferay Enterprise Portal 5.2 (main website).
  Custom web application using Pentaho Mondrian 3.1 + JPivot 1.8 (data bank).
  MySQL 5.0 (data and metadata storage).
-
New domain model:
  Four sections (economy, population, society, and territory and environment).
  Several subsections per section.
  Up to three categories (regional data, historical data, municipal data) inside each subsection.
  A hierarchy of nodes (folders and time series) inside every subsection + category.
Descriptive metadata reorganization:
  Normalize fields whenever possible.
  Define sufficiently unique URI tags for later use in a URI schema.
  Create link repository.
Development of Web Service and metadata API.
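One way to picture the "sufficiently unique URI tags" step is the minimal sketch below. It assumes tags are derived from Spanish labels by stripping accents and joining lowercase words with hyphens. The `uri_tag` and `unique_tags` helpers and the numeric-suffix rule for repeated tags are hypothetical; the slides do not say how the tags were actually produced.

```python
import re
import unicodedata


def uri_tag(label: str) -> str:
    """Turn a human-readable label into an ASCII, hyphen-separated tag."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def unique_tags(labels):
    """Assign one tag per label, appending -2, -3, ... when a tag repeats."""
    seen = {}
    tags = []
    for label in labels:
        base = uri_tag(label)
        seen[base] = seen.get(base, 0) + 1
        tags.append(base if seen[base] == 1 else f"{base}-{seen[base]}")
    return tags
```

Keeping tags ASCII-only avoids percent-encoding in the URIs that are later built from them.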
-
Folder hierarchy: SKOS model (skos).
  Concept Schemes: Sections and subsections.
  Concepts: Folders.
Time series: RDF Data Cube Vocabulary (qb).
  Each time series is a qb:DataSet.
  Linked to folders using dcterms:subject.
DCMI Terms (dcterms) used throughout the model for descriptive metadata (subject, modified, accrualPeriodicity, spatial, temporal, etc.).
Custom ICANE support vocabulary (icane):
  RDF local domain class mapping.
  Missing and confidential data values.
  Links to Metadata API using icane:metadataApiUri.
Dates: XML Schema date type (except initial and final periods).
VoID at http://www.icane.es/.well-known/void
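The model described on this slide can be sketched as a handful of triples. The example below is a minimal illustration in plain Python that emits N-Triples for one time series and its folder. The vocabulary namespaces are the published ones for RDF Data Cube, SKOS and DCMI Terms, but the `BASE` URI, the path layout and the `describe_series` helper are assumptions made for illustration, since the slides do not spell out the URI schema.

```python
QB = "http://purl.org/linked-data/cube#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Hypothetical base URI: the real URI schema is not given in the slides.
BASE = "http://example.org/icane/"


def triple(s, p, o):
    """Format one N-Triples statement whose object is a URI."""
    return f"<{s}> <{p}> <{o}> ."


def describe_series(series_tag, folder_tag, subsection_tag, modified):
    """Return N-Triples lines for one time series and the folder it is filed in."""
    series = f"{BASE}series/{series_tag}"
    folder = f"{BASE}folder/{folder_tag}"
    scheme = f"{BASE}subsection/{subsection_tag}"
    return [
        # Subsections are concept schemes; folders are concepts inside them.
        triple(scheme, RDF_TYPE, SKOS + "ConceptScheme"),
        triple(folder, RDF_TYPE, SKOS + "Concept"),
        triple(folder, SKOS + "inScheme", scheme),
        # Each time series is a qb:DataSet linked to its folder.
        triple(series, RDF_TYPE, QB + "DataSet"),
        triple(series, DCTERMS + "subject", folder),
        # Dates use the XML Schema date type.
        f'<{series}> <{DCTERMS}modified> "{modified}"^^<{XSD_DATE}> .',
    ]


if __name__ == "__main__":
    for line in describe_series("unemployment-rate", "labour-market", "labour", "2013-01-15"):
        print(line)
```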
-
Converted HTML into XHTML+RDFa for user-visitable entities.
  Fragment identifiers (Hash URIs).
  Already visible metadata annotated with RDFa.
  New, invisible metadata annotated with empty elements.
Some entities only have RDF/XML views (not designed for human consumers):
  Category
  Reference Area
  Source
Canonical TLD: www.icane.es
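The two annotation styles mentioned above (visible metadata annotated in place, invisible metadata carried by an empty element) can be illustrated with a small generator. This is a sketch under assumptions: the `rdfa_fragment` helper, the choice of `span` as the empty element, and the use of `dcterms:title` for the heading are illustrative and not taken from the actual page templates.

```python
from xml.sax.saxutils import escape, quoteattr


def rdfa_fragment(fragment_id, title, modified):
    """Build an XHTML+RDFa snippet describing one entity by its hash URI."""
    about = quoteattr("#" + fragment_id)
    return (
        '<div xmlns:dcterms="http://purl.org/dc/terms/" '
        'xmlns:qb="http://purl.org/linked-data/cube#" '
        f'about={about} typeof="qb:DataSet">'
        # Metadata already shown to the user: annotate the existing element.
        f'<h1 property="dcterms:title">{escape(title)}</h1>'
        # Metadata with no visible counterpart: empty element plus @content.
        f'<span property="dcterms:modified" content={quoteattr(modified)}></span>'
        '</div>'
    )


if __name__ == "__main__":
    print(rdfa_fragment("series", "Employment & unemployment", "2013-01-15"))
```

Because the subject is a fragment identifier relative to the page, the same document serves human readers and RDFa parsers without a second URL.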
-
RDF Data Cube export filter developed for the data bank application.
  Export files created on the fly using current user selection.
  A default selection is available.
OLAP to RDF Data Cube mapping is straightforward.
icane:ObservableValue, icane:NullValue, icane:ConfidentialValue
OLAP              RDF Data Cube
Cube              DataSet
Dimension         SKOS Concept Scheme & Concept Class
Dimension Member  SKOS Concept
Measure           MeasureProperty
Slice             Slice
Data cell         Observation
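The mapping in the table above can be sketched as a function that turns OLAP data cells into observation records. This is an illustration only: the export filter itself belongs to the Mondrian-based data bank application, and the `CONFIDENTIAL` marker, the `value_class` rule and the plain-dictionary output format are assumptions. The cell values in the usage example are made up.

```python
# Hypothetical marker for a cell whose value is withheld.
CONFIDENTIAL = object()


def value_class(value):
    """Pick the icane value class for a cell (assumed rule, see lead-in)."""
    if value is None:
        return "icane:NullValue"
    if value is CONFIDENTIAL:
        return "icane:ConfidentialValue"
    return "icane:ObservableValue"


def cells_to_observations(dataset, measure, cells):
    """Map OLAP data cells to observation records.

    `cells` maps a tuple of (dimension, member) pairs to the cell value.
    Dimension members play the role of SKOS concepts, the measure the role
    of a qb:MeasureProperty, and each cell becomes one qb:Observation.
    """
    observations = []
    for coordinates, value in cells.items():
        observations.append({
            "type": "qb:Observation",
            "dataset": dataset,
            "dimensions": dict(coordinates),
            "measure": measure,
            "value": None if value is CONFIDENTIAL else value,
            "value_class": value_class(value),
        })
    return observations


if __name__ == "__main__":
    demo_cells = {
        (("period", "2012"), ("area", "region-a")): 12.5,
        (("period", "2013"), ("area", "region-a")): None,
        (("period", "2013"), ("area", "region-b")): CONFIDENTIAL,
    }
    for obs in cells_to_observations("demo-series", "rate", demo_cells):
        print(obs)
```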
-
Only descriptive metadata served:
  Structural metadata resides in Mondrian XML Schema files (impractical).
  All available data cannot be served for performance reasons.
Initial consideration: D2RQ.
  Optimizations unavailable due to similarity of URI patterns.
  Complex SPARQL queries lead to high SQL query redundancy.
  Database structure modification implied reconfiguration (Metadata Web service not used).
Custom solution developed: Metadata API to Apache Jena bridge.
  Model generation on the fly.
  Periodic (every 5) updates.
  Model is cached and only regenerated when metadata changes.
SPARQL form developed as Liferay portlet.
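The caching behaviour described above (periodic checks, regeneration only when metadata changes) can be sketched as follows. The actual bridge is built on Apache Jena; this Python version only illustrates the control flow. The `CachedModel` class, the version-token comparison and the injected callables are assumptions, and `check_interval` is left as a required parameter because the slide does not state the unit of the update period.

```python
import time


class CachedModel:
    """Keep a generated model in memory and rebuild it only on metadata change."""

    def __init__(self, fetch_version, build_model, check_interval, clock=time.monotonic):
        self._fetch_version = fetch_version    # returns a token that changes with the metadata
        self._build_model = build_model        # expensive: generates the model
        self._check_interval = check_interval  # seconds between metadata checks
        self._clock = clock
        self._version = None
        self._model = None
        self._last_check = None

    def get(self):
        """Return the cached model, checking for changes at most once per interval."""
        now = self._clock()
        due = self._last_check is None or now - self._last_check >= self._check_interval
        if due:
            self._last_check = now
            version = self._fetch_version()
            # Rebuild on first use or when the metadata version has moved on.
            if self._model is None or version != self._version:
                self._model = self._build_model()
                self._version = version
        return self._model
```

Queries arriving between checks are answered from the cached model, so the cost of regeneration is paid only when a check finds new metadata.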
-
Semantic links were established manually for select entities.
Automatic linking discarded:
  Data heterogeneity.
  Domain specificity.
  Spanish texts and labels.
Time series not linked due to abundance and specificity (already linked to folders).
Links to non-RDF resources use foaf:page and rdfs:seeAlso.
Link targets:
  GeoNames (4 links).
  DBpedia (45 links) and DBpedia ES (47 links).
  Instituto Nacional de Estadística (251 links).
  Eurostat (22 links).
  Lista de Encabezamientos de Materia (168 links).
  Library of Congress Subject Headings (151 links).
-
Entity          Entity count  Property         Link count
Section         4             dcterms:subject  18
                              rdfs:seeAlso     1
Subsection      27            dcterms:subject  141
                              rdfs:seeAlso     43
Category        3             None             None
Folder          703           skos:closeMatch  161
                              rdfs:seeAlso     199
Time Series     2707          None             None
Reference Area  6             owl:sameAs       10
                              rdfs:seeAlso     15
Source*         2694          None             None
* All Sources have a foaf:page property not included here.
-
Linked Open Data solution designed, developed and deployed in < 6 months.
No impact on appearance or functionality thanks to RDFa.
Automatic data consumers can discover entities and their relationships.
Series data is made available to RDF consumers.
SPARQL acts as an alternative metadata API and allows data exploration using queries.
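As an example of SPARQL used as a metadata API, the sketch below builds a query that lists the time series filed under one folder and the corresponding GET request URL. It relies only on the modelling stated earlier (time series are qb:DataSet resources linked to folders with dcterms:subject) and on the standard `query` parameter of the SPARQL Protocol. The `ENDPOINT` address, the folder URI and both helper functions are hypothetical; no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not give the SPARQL endpoint address.
ENDPOINT = "http://example.org/sparql"


def datasets_in_folder_query(folder_uri, limit=10):
    """SPARQL query listing the qb:DataSets filed under one folder."""
    return f"""PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?series WHERE {{
  ?series a qb:DataSet ;
          dcterms:subject <{folder_uri}> .
}}
LIMIT {int(limit)}"""


def request_url(query):
    """Build a GET URL that passes the query in the 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    q = datasets_in_folder_query("http://example.org/icane/folder/labour-market")
    print(q)
    print(request_url(q))
```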
-
RDF markup for other published resources.
Time series structural metadata centralization (similarly to descriptive metadata).
SPARQL data repository.
Internationalization of text literals.
Improving link quantity and quality (Google Refine, Free Your Metadata, Silk Framework).
-
and thanks to ICANE's IT department, especially Miguel Expósito and Alberto Lezcano.