Implementation of a Linked Open Data Solution For

  • 7/27/2019 Implementation of a Linked Open Data Solution For

    Alejandro Villar, Ana Santurtún

    Necessary for citizens to have a better understanding of their society and to help them make informed decisions about it.

    Fundamental piece of Open Government.

    Many agencies around the world collect, process and publish statistical information at different levels.

    Need for interoperability (e.g. SDMX, PC-Axis).

    Data: RDBMS / OLAP model.

    Structural metadata (cubes, dimensions, measures, etc.): stored in Mondrian XML Schema files.

    Descriptive metadata:
      Stored in RDBMS.
      Most fields not normalized.

    Domain model:
      Area: Subject area for a group of fields.
      Type: Type of series (time series, historical data or municipal data).
      Group: Subject area subdivision (Area + Type intersection).
      Node: Hierarchical layout of folders and time series.

    Software:
      Liferay Enterprise Portal 5.2 (main website).
      Custom web application using Pentaho Mondrian 3.1 + JPivot 1.8 (data bank).
      MySQL 5.0 (data and metadata storage).

    New domain model:
      Four sections (economy, population, society, and territory and environment).
      Several subsections per section.
      Up to three categories (regional data, historical data, municipal data) inside each subsection.
      A hierarchy of nodes (folders and time series) inside every subsection + category.

    Descriptive metadata reorganization:
      Normalize fields whenever possible.
      Define sufficiently unique URI tags for later use in a URI schema.
      Create link repository.

    Development of Web Service and metadata API.
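
The slides do not say how the URI tags were derived. As a minimal sketch, assuming tags are lowercase ASCII slugs built from Spanish labels and made unique with a numeric suffix, the idea could look like this (the function names and the slug rules are illustrative assumptions, not ICANE's actual scheme):

```python
import re
import unicodedata


def uri_tag(label):
    """Turn a human-readable label into a lowercase, hyphen-separated ASCII tag."""
    # Decompose accented characters, then drop the combining marks.
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def unique_tag(label, taken):
    """Return a tag that is not yet in `taken`, adding a numeric suffix on collision."""
    base = uri_tag(label)
    tag, n = base, 2
    while tag in taken:
        tag = f"{base}-{n}"
        n += 1
    taken.add(tag)
    return tag


if __name__ == "__main__":
    used = set()
    print(unique_tag("Población y Territorio", used))
    print(unique_tag("Población y territorio", used))
```

Keeping the set of used tags per parent node (rather than globally) would be enough if the tags are later combined into hierarchical paths; that choice is also an assumption here.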

    Folder hierarchy: SKOS model (skos).
      Concept Schemes: sections and subsections.
      Concepts: folders.

    Time series: RDF Data Cube Vocabulary (qb).
      Each time series is a qb:DataSet.
      Linked to folders using dcterms:subject.

    DCMI Terms (dcterms) used throughout the model for descriptive metadata (subject, modified, accrualPeriodicity, spatial, temporal, etc.).

    Custom ICANE support vocabulary (icane):
      Mapping between RDF and local domain classes.
      Missing and confidential data values.
      Links to Metadata API using icane:metadataApiUri.

    Dates: XML Schema date type (except initial and final periods).

    VoID at http://www.icane.es/.well-known/void
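
To make the model above concrete, the following sketch emits a small Turtle document in which a folder is a skos:Concept inside a concept scheme and a time series is a qb:DataSet attached to that folder with dcterms:subject. The skos, qb, dcterms and xsd namespaces are the standard ones; every http://example.org/ URI, the function name and the choice of dcterms:title are placeholders assumed for illustration, not ICANE's published URIs.

```python
# Standard namespace URIs for the vocabularies named on the slide.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "qb": "http://purl.org/linked-data/cube#",
    "dcterms": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def dataset_turtle(series_uri, folder_uri, scheme_uri, title, modified):
    """Describe one folder and one time series as Turtle text."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines += [
        "",
        # The folder is a SKOS concept inside a section/subsection scheme.
        f"<{folder_uri}> a skos:Concept ;",
        f"    skos:inScheme <{scheme_uri}> .",
        "",
        # The time series is a Data Cube dataset linked to its folder.
        f"<{series_uri}> a qb:DataSet ;",
        f'    dcterms:title "{escape_literal(title)}" ;',
        f'    dcterms:modified "{modified}"^^xsd:date ;',
        f"    dcterms:subject <{folder_uri}> .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        dataset_turtle(
            "http://example.org/series/unemployment-rate",  # hypothetical
            "http://example.org/folder/labour-market",      # hypothetical
            "http://example.org/section/society",           # hypothetical
            "Unemployment rate",
            "2012-05-01",
        )
    )
```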

    Converted HTML into XHTML+RDFa for user-visitable entities.
      Fragment identifiers (hash URIs).
      Already visible metadata annotated with RDFa.
      New, invisible metadata annotated with empty elements.

    Some entities only have RDF/XML views (not designed for human consumers):
      Category
      Reference Area
      Source

    Canonical TLD: www.icane.es

    RDF Data Cube export filter developed for the data bank application.
      Export files created on the fly using current user selection.
      A default selection is available.

    OLAP to RDF Data Cube mapping is straightforward.

    icane:ObservableValue, icane:NullValue, icane:ConfidentialValue

    OLAP                RDF Data Cube
    Cube                DataSet
    Dimension           SKOS Concept Scheme & Concept Class
    Dimension Member    SKOS Concept
    Measure             MeasureProperty
    Slice               Slice
    Data cell           Observation
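
The mapping table can be read as a lookup from OLAP concepts to RDF terms. The sketch below encodes that table and converts one data cell into a dictionary describing a qb:Observation. How the export filter actually attaches icane:NullValue and icane:ConfidentialValue is not stated on the slides; using them directly as the measure value, the CONFIDENTIAL sentinel, and all example property names are assumptions made for illustration.

```python
# OLAP concept -> RDF Data Cube / SKOS term, following the table on the slide.
OLAP_TO_QB = {
    "Cube": "qb:DataSet",
    "Dimension": "skos:ConceptScheme and concept class",
    "Dimension Member": "skos:Concept",
    "Measure": "qb:MeasureProperty",
    "Slice": "qb:Slice",
    "Data cell": "qb:Observation",
}

# Hypothetical sentinel marking a cell whose value is withheld as confidential.
CONFIDENTIAL = object()


def cell_to_observation(dataset, members, measure, value):
    """Describe one OLAP data cell as a qb:Observation (property -> value dict).

    `members` maps dimension properties to dimension member (skos:Concept) URIs.
    """
    observation = {"rdf:type": "qb:Observation", "qb:dataSet": dataset}
    observation.update(members)
    if value is None:
        # Missing cell: assumed to be flagged with the custom null term.
        observation[measure] = "icane:NullValue"
    elif value is CONFIDENTIAL:
        # Withheld cell: assumed to be flagged with the custom confidential term.
        observation[measure] = "icane:ConfidentialValue"
    else:
        observation[measure] = value
    return observation


if __name__ == "__main__":
    dims = {"ex:refArea": "ex:area/santander", "ex:refPeriod": "ex:period/2011"}
    for cell in (1234.5, None, CONFIDENTIAL):
        print(cell_to_observation("ex:dataset/population", dims, "ex:value", cell))
```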

    Only descriptive metadata served:
      Structural metadata resides in Mondrian XML Schema files (impractical).
      Serving all available data is not feasible for performance reasons.

    Initial consideration: D2RQ.
      Optimizations unavailable due to similarity of URI patterns.
      Complex SPARQL queries lead to high SQL query redundancy.
      Database structure modification implied reconfiguration (Metadata Web service not used).

    Custom solution developed: Metadata API to Apache Jena bridge.
      Model generation on the fly.
      Periodic (every 5) updates.
      Model is cached and only regenerated when metadata changes.

    SPARQL form developed as Liferay portlet.
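
The caching behaviour described above (periodic checks, with regeneration only when the metadata has changed) can be sketched independently of Jena. The deployed bridge is built on Apache Jena, so this Python version only illustrates the pattern; the class name, the callables and the hash-based change detection are all assumptions.

```python
import hashlib
import json


class ModelCache:
    """Keep a generated model and rebuild it only when its source metadata changes."""

    def __init__(self, fetch_metadata, build_model):
        self._fetch = fetch_metadata   # callable returning JSON-serializable metadata
        self._build = build_model      # callable turning metadata into a model
        self._fingerprint = None
        self._model = None
        self.rebuilds = 0              # counter kept only to make the behaviour visible

    def refresh(self):
        """Run on every periodic tick; rebuilds only if the metadata differs."""
        metadata = self._fetch()
        serialized = json.dumps(metadata, sort_keys=True).encode("utf-8")
        fingerprint = hashlib.sha256(serialized).hexdigest()
        if fingerprint != self._fingerprint:
            self._model = self._build(metadata)
            self._fingerprint = fingerprint
            self.rebuilds += 1
        return self._model


if __name__ == "__main__":
    source = {"folders": ["labour-market"]}
    cache = ModelCache(lambda: source, lambda meta: sorted(meta["folders"]))
    cache.refresh()                    # first call builds the model
    cache.refresh()                    # unchanged metadata: cached model reused
    source["folders"].append("prices")
    cache.refresh()                    # changed metadata: model rebuilt
    print(cache.rebuilds)
```

A scheduler would call `refresh()` at the update interval, while query handling reads the cached model between ticks.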

    Semantic links were established manually for select entities.

    Automatic linking discarded:
      Data heterogeneity.
      Domain specificity.
      Spanish texts and labels.

    Time series not linked due to abundance and specificity (already linked to folders).

    Links to non-RDF resources use foaf:page and rdfs:seeAlso.

    Link targets:
      GeoNames (4 links).
      DBpedia (45 links) and DBpedia ES (47 links).
      Instituto Nacional de Estadística (251 links).
      Eurostat (22 links).
      Lista de Encabezamientos de Materia (168 links).
      Library of Congress Subject Headings (151 links).

    Entity           Entity count   Property          Link count
    Section          4              dcterms:subject   18
                                    rdfs:seeAlso      1
    Subsection       27             dcterms:subject   141
                                    rdfs:seeAlso      43
    Category         3              None              None
    Folder           703            skos:closeMatch   161
                                    rdfs:seeAlso      199
    Time Series      2707           None              None
    Reference Area   6              owl:sameAs        10
                                    rdfs:seeAlso      15
    Source*          2694           None              None

    * All Sources have a foaf:page property not included here.

    Linked Open Data solution designed, developed and deployed in < 6 months.

    No impact on appearance or functionality, thanks to RDFa.

    Automatic data consumers can discover entities and their relationships.

    Series data is made available to RDF consumers.

    SPARQL acts as an alternative metadata API and allows data exploration using queries.

    RDF markup for other published resources.

    Time series structural metadata centralization (similarly to descriptive metadata).

    SPARQL data repository.

    Internationalization of text literals.

    Improving link quantity and quality (Google Refine, Free Your Metadata, Silk Framework).

    Thanks to ICANE's IT department, especially Miguel Expósito and Alberto Lezcano.