Mapping Hierarchical Sources into RDF using the RML Mapping Language

Post on 05-Dec-2014

description

Incorporating structured data into the Linked Data cloud is still complicated, despite the numerous existing tools. In particular, hierarchically structured data (e.g., JSON) are underrepresented, due to their processing complexity. A uniform mapping formalisation for data in different formats, which would enable reuse and exchange between tools and applied data, is missing. This paper describes a novel approach to mapping heterogeneous and hierarchical data sources into RDF using the RML mapping language, an extension of R2RML (the W3C standard for mapping relational databases into RDF). To facilitate those mappings, we present a toolset for producing RML mapping files using the Karma data modelling tool, and for consuming them using a prototype RML processor. A use case shows how RML facilitates the definition and execution of mapping rules for several heterogeneous sources. http://rml.io https://github.com/mmlab/RMLProcessor

Transcript of Mapping Hierarchical Sources into RDF using the RML Mapping Language

Mapping Hierarchical Sources into RDF using the RML Mapping Language

Anastasia Dimou (1), Miel Vander Sande (1), Jason Slepicka (2), Pedro Szekely (2), Erik Mannens (1), Craig Knoblock (2), Rik Van de Walle (1)

(1) Ghent University – iMinds – Multimedia Lab

(2) University of Southern California – Information Sciences Institute – Department of Computer Science

http://rml.io

IEEE-ICSC14

Newport Beach, California, 18th June 2014

Most of the data that we would like to be able to query as Linked Open Data

exists in formats other than RDF

There are…

over 11,000 APIs according to ProgrammableWeb.org

only 74 of which return results in RDF

But more than 5000

return results in JSON or XML

Many languages, tools and approaches

were proposed

to convert data from relational databases to RDF

Relational Database to RDF (R2RML W3C)

[Diagram: the data owner/publisher defines R2RML mappings; an R2RML processor applies them to a DB to produce RDF]

[Diagram: the data owner/publisher defines R2RML mappings that an R2RML processor applies to the DB to produce RDF; CSV, JSON and XML sources are each converted to RDF on their own]

lack of uniform definitions to describe mapping rules for heterogeneous sources

lack of interoperable definitions that would allow the re-use of mapping rules

across different implementations

lack of reusable definitions that would allow the re-use of mapping rules

for representing data in the same or different formats

mapping data

on a per-source and per-format basis

or on case-specific basis

Uniform way of defining mappings

for heterogeneous sources

that can be re-used across data

in the same or different formats

and be interoperable

across different implementations

[Diagram: the data owner/publisher defines mapping definitions; a single processor maps DB, CSV, JSON and XML sources (any format) to RDF]

RDF Mapping Language (RML)

generic scalable mapping language

for mapping heterogeneous resources into RDF

in an integrable and interoperable fashion

superset of the W3C standardized

R2RML mapping language

http://semweb.mmlab.be/ns/rml

Relational Database to RDF

Mapping Language

(R2RML)

R2RML mapping document

NAME                    BIRTH_DATE  DEATH_DATE
Robert Theodore McCall  1919-12-23  2010-02-26
Ronald Anderson         1929-12-06

Triples Map

Logical Table

Table Name

<#ArtistMapping>
    rr:logicalTable [ rr:tableName "ARTISTS" ].

R2RML mapping definition

Table Name

Triples Map

Logical Table

Subject Map

Predicate-Object Map

Predicate-Object Map

Predicate-Object Map

Predicate Map

Object Map

R2RML mapping document

Triples Map

Subject Map

<#ArtistMapping>
    rr:subjectMap [
        rr:template "http://ex.com/{NAME}" ;
        rr:class ex:Person ];

<http://ex.com/Robert+Theodore+McCall> a ex:Person
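The step from template to subject IRI can be sketched in Python. This is only an illustration under stated assumptions: `expand_template` is a hypothetical helper, not part of any R2RML tool, and `quote_plus` is used because it reproduces the `+` for spaces shown in the slide output. The R2RML recommendation describes percent-encoding for template values, so a conforming processor may well emit `%20` instead.

```python
import re
from typing import Optional
from urllib.parse import quote_plus


def expand_template(template: str, row: dict) -> Optional[str]:
    """Replace each {COLUMN} placeholder with the encoded value from the row.

    Returns None when a referenced column is absent or NULL, so the caller
    can skip the row. Hypothetical helper written for this sketch.
    """
    missing = False

    def substitute(match):
        nonlocal missing
        value = row.get(match.group(1))
        if value is None:
            missing = True
            return ""
        # Assumption: encode spaces as '+', as in the slide's example output.
        return quote_plus(str(value))

    iri = re.sub(r"\{([^{}]+)\}", substitute, template)
    return None if missing else iri


row = {"NAME": "Robert Theodore McCall", "BIRTH_DATE": "1919-12-23"}
subject = expand_template("http://ex.com/{NAME}", row)
print(f"<{subject}> a ex:Person .")
```

Returning `None` for a missing value keeps the decision to skip a row with the caller rather than hiding it inside the string substitution.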

R2RML mapping document

Predicate Map

<#ArtistMapping>
    rr:predicateObjectMap [
        rr:predicate ex:birth_date;
        rr:objectMap [ rr:column "BIRTH_DATE" ] ];

<http://ex.com/Robert+Theodore+McCall> ex:birth_date "1919-12-23"
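A rough Python sketch of what a column-valued object map does across all rows of the ARTISTS table: one triple per row, and no triple when the column is NULL (which is how R2RML treats NULL values). The function name `column_triples`, the in-memory row list and the tuple representation of triples are assumptions made for illustration.

```python
from urllib.parse import quote_plus

# The ARTISTS table from the slides, as a list of dicts (None stands for NULL).
ARTISTS = [
    {"NAME": "Robert Theodore McCall", "BIRTH_DATE": "1919-12-23", "DEATH_DATE": "2010-02-26"},
    {"NAME": "Ronald Anderson", "BIRTH_DATE": "1929-12-06", "DEATH_DATE": None},
]


def column_triples(rows, predicate, column):
    """Emit (subject, predicate, object) for every row whose column is not NULL."""
    triples = []
    for row in rows:
        value = row[column]
        if value is None:
            # A NULL column value yields no RDF term, hence no triple.
            continue
        # Assumption: '+' encoding of spaces, as in the slide output.
        subject = "http://ex.com/" + quote_plus(row["NAME"])
        triples.append((subject, predicate, value))
    return triples


birth = column_triples(ARTISTS, "ex:birth_date", "BIRTH_DATE")
death = column_triples(ARTISTS, "ex:death_date", "DEATH_DATE")
for triple in birth + death:
    print(triple)
```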

Predicate-Object Map

Object Map

RDF Mapping Language

(RML)

RDF Mapping Language (RML)

mapping hierarchical sources to RDF

deal with hierarchy and heterogeneity

R2RML: each row is self-contained

and can be processed independently

R2RML: the columns in each row

can be referred to unambiguously

R2RML: for each reference to a column in a single row

a unique value is returned

RML: explicit reference to the iteration pattern

R2RML: each row is self-contained and can be processed independently

RML: abstract reference to the input data

R2RML: the columns in each row can be referred to unambiguously

RML: more than one triple per Predicate-Object Map

R2RML: for each reference to a column in a single row a unique value is returned
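The last contrast can be illustrated with a small Python sketch: in a hierarchical record a single reference may return several values, so one Predicate-Object Map can yield several triples. The predicate `ex:sitter`, the helper `sitter_names` and the reference it stands for are hypothetical and not taken from the slides.

```python
import json

# One artwork record, shaped like the entries of artworks.json in the slides.
artwork = json.loads("""
{ "Title": "Apollo 11 Crew",
  "Ref": "NPG_70_36",
  "Sitter": [ { "Name": "Neil Armstrong" },
              { "Name": "Buzz Aldrin" },
              { "Name": "Michael Collins" } ] }
""")


def sitter_names(record):
    """All values reached by a (hypothetical) reference to the sitters' names."""
    return [s["Name"] for s in record.get("Sitter", []) if "Name" in s]


subject = "http://ex.com/" + artwork["Ref"]
# One reference, three values: the same predicate is emitted three times.
triples = [(subject, "ex:sitter", name) for name in sitter_names(artwork)]
for triple in triples:
    print(triple)
```

In a relational row the equivalent column lookup would return exactly one value, which is why R2RML never needs to handle this case.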

RDF Mapping Language

(RML)

For hierarchical sources

artworks.json

[ ...
  { "Title": "Apollo 11 Crew",
    "Artist": "Ronald Anderson",
    "Ref": "NPG_70_36",
    "Sitter": [
      { "Name": "Neil Armstrong",
        "Birth Date": "1930-08-05" },
      { "Name": "Buzz Aldrin",
        "Birth Date": "1930-01-20" },
      { "Name": "Michael Collins" } ],
    "DateOfWork": "1969" },
  { "Title": "Neil Armstrong",
    "Artist": "Robert Theodore McCall",
    "Ref": "S_NPG_2010_51",
    "Sitter": [
      { "Name": "Neil Armstrong" } ],
    "DateOfWork": "2009" },
  ... ]

artists.xml

<Artists> ...
  <Artist>
    <Name>Robert Theodore McCall</Name>
    <Birth_Date>1919-12-23</Birth_Date>
    <Death_Date>2010-02-26</Death_Date>
  </Artist>
  <Artist>
    <Name>Ronald Anderson</Name>
    <Birth_Date>1929-12-06</Birth_Date>
    <Death_Date/>
  </Artist> ...
</Artists>

Specifying the input data

R2RML: database

RML: file, API, …

R2RML: Logical Table (rr:logicalTable)

RML: Logical Source (rml:logicalSource)

R2RML: Table Name (rr:tableName)

RML: source (rml:source)

Triples Map

Logical Source

source

<#ArtworkMapping>
    rml:logicalSource [ rml:source "http://ex.com/artworks.json" ].

Triples Map

Logical Source

source

<#ArtistMapping>
    rml:logicalSource
        [ rml:source "artists.xml" ].

Referring to the input data

R2RML: databases

RML: XML or JSON or CSV or ….

R2RML: (SQL)

RML: XPath/XQuery or JSONPath or CSV (RFC 4180) or …

R2RML: (rr:sqlQuery)

RML: rml:referenceFormulation

<#ArtworkMapping>
    rml:logicalSource
        [ rml:source "http://ex.com/artworks.json" ;
          rml:referenceFormulation ql:JSONPath ].

Triples Map

Logical Source

source

<#ArtistMapping>
    rml:logicalSource
        [ rml:source "artists.xml" ;
          rml:referenceFormulation ql:XPath ].

Triples Map

Logical Source

source

Reference Formulation

Iterating over the input data

R2RML: per row

RML: ?

R2RML:

RML: rml:iterator

<#ArtistMapping>
    rml:logicalSource
        [ rml:source "artists.xml" ;
          rml:referenceFormulation ql:XPath ;
          rml:iterator "/Artists/Artist" ].
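As a rough illustration of what the iterator selects, the following Python sketch walks the artists document with the standard library. It is a simplification under stated assumptions: `iterate` is a hypothetical helper, the XML is inlined rather than read from `artists.xml`, and because `xml.etree.ElementTree` supports only a limited XPath subset and does not accept an absolute path on an element, the root step of the iterator is matched by hand.

```python
import xml.etree.ElementTree as ET

ARTISTS_XML = """
<Artists>
  <Artist>
    <Name>Robert Theodore McCall</Name>
    <Birth_Date>1919-12-23</Birth_Date>
    <Death_Date>2010-02-26</Death_Date>
  </Artist>
  <Artist>
    <Name>Ronald Anderson</Name>
    <Birth_Date>1929-12-06</Birth_Date>
    <Death_Date/>
  </Artist>
</Artists>
"""


def iterate(xml_text, iterator):
    """Return the elements selected by a simple absolute path such as /Artists/Artist."""
    root = ET.fromstring(xml_text.strip())
    steps = iterator.strip("/").split("/")
    if steps[0] != root.tag:
        # The first step must name the document element.
        return []
    if len(steps) == 1:
        return [root]
    # The remaining steps are evaluated relative to the root element.
    return root.findall("/".join(steps[1:]))


artists = iterate(ARTISTS_XML, "/Artists/Artist")
names = [artist.findtext("Name") for artist in artists]
print(names)
```

Each element returned by `iterate` plays the role that a row plays in R2RML: the unit over which the Triples Map is applied once.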

<#ArtworkMapping>
    rml:logicalSource
        [ rml:source "http://ex.com/artworks.json" ;
          rml:referenceFormulation ql:JSONPath ;
          rml:iterator "$.[*]" ].

<#SitterMapping>
    rml:logicalSource
        [ rml:source "http://ex.com/artworks.json" ;
          rml:referenceFormulation ql:JSONPath ;
          rml:iterator "$.[*].Sitter" ].
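To make the two JSON iterators concrete, here is a minimal Python sketch of a JSONPath-like selector. It is not a JSONPath implementation: `select` is a hypothetical helper that understands only `$`, `.key` and `[*]` steps, and the data is inlined instead of fetched from the source URL. Under this reading, `$.[*].Sitter` selects the `Sitter` arrays (one per artwork) and an extra `[*]` is needed to reach the individual sitter objects; how the JSONPath library behind an actual RML processor treats the expression is not stated in the slides.

```python
import json

ARTWORKS = json.loads("""
[ { "Title": "Apollo 11 Crew", "Artist": "Ronald Anderson", "Ref": "NPG_70_36",
    "Sitter": [ { "Name": "Neil Armstrong", "Birth Date": "1930-08-05" },
                { "Name": "Buzz Aldrin", "Birth Date": "1930-01-20" },
                { "Name": "Michael Collins" } ],
    "DateOfWork": "1969" },
  { "Title": "Neil Armstrong", "Artist": "Robert Theodore McCall", "Ref": "S_NPG_2010_51",
    "Sitter": [ { "Name": "Neil Armstrong" } ],
    "DateOfWork": "2009" } ]
""")


def select(document, path):
    """Evaluate a tiny JSONPath subset: '$', '.key' and '[*]' steps only."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes = [document]
    for segment in path[1:].split("."):
        if not segment:
            continue
        key = segment.split("[", 1)[0]
        if key:
            # Descend into the named member of every object node.
            nodes = [n[key] for n in nodes if isinstance(n, dict) and key in n]
        for _ in range(segment.count("[*]")):
            # Wildcard: replace every array node by its items.
            nodes = [item for n in nodes if isinstance(n, list) for item in n]
    return nodes


artworks = select(ARTWORKS, "$.[*]")
sitter_arrays = select(ARTWORKS, "$.[*].Sitter")
sitters = select(ARTWORKS, "$.[*].Sitter[*]")
print(len(artworks), len(sitter_arrays), len(sitters))
```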

Referring to the extracts of the input data

explicitly and implicitly

R2RML: column name

RML: XML element or JSON object or …

R2RML: rr:column

RML: rml:reference

<#ArtistMapping>
    rml:logicalSource [
        rml:source "http://ex.com/artists.xml" ;
        rml:referenceFormulation ql:XPath ;
        rml:iterator "/Artists/Artist" ] ;
    rr:subjectMap [
        rr:template "http://ex.com/{Name}" ] ;
    rr:predicateObjectMap [
        rr:predicate ex:death_date ;
        rr:objectMap [ rml:reference "/Artists/Artist/Death_Date" ] ].

<http://ex.com/Robert+Theodore+McCall> ex:death_date "2010-02-26".
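The mapping above mixes an implicit reference (`{Name}`, relative to the current iteration) with an explicit one (the full path to `Death_Date`). A Python sketch of one way to reconcile the two is to strip the iterator prefix from explicit references before evaluating them against the current element. The helper names are hypothetical, and skipping the empty `<Death_Date/>` element is an assumption made by analogy with NULL handling in R2RML, not something the slides state.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

ARTISTS_XML = """
<Artists>
  <Artist>
    <Name>Robert Theodore McCall</Name>
    <Birth_Date>1919-12-23</Birth_Date>
    <Death_Date>2010-02-26</Death_Date>
  </Artist>
  <Artist>
    <Name>Ronald Anderson</Name>
    <Birth_Date>1929-12-06</Birth_Date>
    <Death_Date/>
  </Artist>
</Artists>
"""

ITERATOR = "/Artists/Artist"


def relative(reference, iterator=ITERATOR):
    """Make an explicit (absolute) reference relative to the iterator."""
    prefix = iterator.rstrip("/") + "/"
    return reference[len(prefix):] if reference.startswith(prefix) else reference


def death_date_triples(xml_text):
    root = ET.fromstring(xml_text.strip())
    triples = []
    # The root element is <Artists>, so the iterator reduces to its Artist children.
    for artist in root.findall("Artist"):
        name = artist.findtext(relative("Name"))
        date = artist.findtext(relative("/Artists/Artist/Death_Date"))
        if name and date:  # assumption: empty elements produce no triple
            subject = "http://ex.com/" + quote_plus(name)
            triples.append((subject, "ex:death_date", date))
    return triples


triples = death_date_triples(ARTISTS_XML)
print(triples)
```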

<#ArtworkMapping>
    rml:logicalSource [
        rml:source "http://ex.com/artworks.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.[*]" ] ;
    rr:subjectMap [ rr:template "http://ex.com/{Ref}" ] ;
    rr:predicateObjectMap [
        rr:predicate rdfs:label ;
        rr:objectMap [ rml:reference "$.[*].Title" ] ].

<http://ex.com/NPG_70_36> rdfs:label "Apollo 11 Crew".
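One way to read this Triples Map is as plain data that a generic routine interprets once per iterated record. The Python sketch below takes that view; the dictionary layout, the name `apply_mapping` and the use of bare member names in place of the full JSONPath references are simplifying assumptions, not the structure of any existing RML processor.

```python
import re

# Records as produced by iterating over the artworks array (abridged).
ARTWORKS = [
    {"Title": "Apollo 11 Crew", "Ref": "NPG_70_36", "DateOfWork": "1969"},
    {"Title": "Neil Armstrong", "Ref": "S_NPG_2010_51", "DateOfWork": "2009"},
]

# The Triples Map as data: a subject template plus (predicate, reference) pairs.
ARTWORK_MAPPING = {
    "template": "http://ex.com/{Ref}",
    "predicate_object": [("rdfs:label", "Title")],
}


def apply_mapping(mapping, records):
    """Apply one mapping description to every record and collect the triples."""
    triples = []
    for record in records:
        subject = re.sub(
            r"\{(\w+)\}",
            lambda m: str(record[m.group(1)]),
            mapping["template"],
        )
        for predicate, reference in mapping["predicate_object"]:
            if reference in record:
                triples.append((subject, predicate, record[reference]))
    return triples


triples = apply_mapping(ARTWORK_MAPPING, ARTWORKS)
for triple in triples:
    print(triple)
```

Keeping the mapping separate from the routine that executes it mirrors the point of the slides: the same description can be handed to a different implementation unchanged.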

<#SitterMapping>
    rml:logicalSource [
        rml:source "http://ex.com/artworks.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.[*].Sitter" ] ;
    rr:subjectMap [ rr:template "http://ex.com/{Name}" ] ;
    rr:predicateObjectMap [
        rr:predicate ex:birth_date ;
        rr:objectMap [ rml:reference "$.[*].Sitter.Birth Date" ] ].

<http://ex.com/Neil+Armstrong> ex:birth_date "1930-08-05".
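The sitter data has two wrinkles that a sketch makes visible: the member name `Birth Date` contains a space, and some sitters have no birth date at all (Neil Armstrong appears twice, once with and once without it). The Python below is an illustrative reading, assuming that sitters without a birth date simply produce no triple and that triples form a set; `sitter_birth_dates` is a hypothetical name.

```python
import json
from urllib.parse import quote_plus

ARTWORKS = json.loads("""
[ { "Ref": "NPG_70_36",
    "Sitter": [ { "Name": "Neil Armstrong", "Birth Date": "1930-08-05" },
                { "Name": "Buzz Aldrin", "Birth Date": "1930-01-20" },
                { "Name": "Michael Collins" } ] },
  { "Ref": "S_NPG_2010_51",
    "Sitter": [ { "Name": "Neil Armstrong" } ] } ]
""")


def sitter_birth_dates(artworks):
    """Collect one ex:birth_date triple per sitter that has a birth date."""
    triples = set()  # a set, so the same statement is never recorded twice
    for artwork in artworks:
        for sitter in artwork.get("Sitter", []):
            birth = sitter.get("Birth Date")  # member name contains a space
            if birth is None:
                continue  # assumption: a missing value yields no triple
            subject = "http://ex.com/" + quote_plus(sitter["Name"])
            triples.add((subject, "ex:birth_date", birth))
    return sorted(triples)


triples = sitter_birth_dates(ARTWORKS)
for triple in triples:
    print(triple)
```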

RDF Mapping Language (RML)

Source

Triples Map

Logical Source

Subject Map

Predicate-Object Map

Predicate Map

Object Map

Term Map

template

constant

reference

Iterator

Reference Formulation

Referencing Object Map

Triples Map

Join Condition

Parent column

Child column
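The Referencing Object Map and its Join Condition listed above can be pictured as a join between two iterated sources. The Python sketch below links each artwork (child) to its artist (parent) where the artwork's `Artist` value equals the artist's `Name`. The predicate `ex:artist`, the choice of join keys and the function `join` are assumptions for illustration; the slides only name the model components.

```python
from urllib.parse import quote_plus

# Child records (artworks.json) and parent records (artists.xml), abridged.
ARTWORKS = [
    {"Ref": "NPG_70_36", "Artist": "Ronald Anderson"},
    {"Ref": "S_NPG_2010_51", "Artist": "Robert Theodore McCall"},
]
ARTISTS = [
    {"Name": "Robert Theodore McCall"},
    {"Name": "Ronald Anderson"},
]


def join(children, parents, child_key, parent_key):
    """Yield (child, parent) pairs whose join values are equal."""
    index = {}
    for parent in parents:
        index.setdefault(parent[parent_key], []).append(parent)
    for child in children:
        for parent in index.get(child[child_key], []):
            yield child, parent


triples = [
    (
        "http://ex.com/" + child["Ref"],
        "ex:artist",  # hypothetical predicate
        "http://ex.com/" + quote_plus(parent["Name"]),
    )
    for child, parent in join(ARTWORKS, ARTISTS, "Artist", "Name")
]
for triple in triples:
    print(triple)
```

The object of each triple is the subject IRI that the parent Triples Map would generate, which is what lets two sources in different formats be linked.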

RDF Mapping Language

(RML)

Editing mappings with Karma http://www.isi.edu/integration/karma/

RDF Mapping Language

(RML)

Processing

mapping-driven processing:

processing driven by the mapping module

data-driven processing:

processing driven by the extraction module

Extraction Module Mapping Module

RML Processor
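The difference between the two processing modes can be sketched as a difference in which loop is outermost. The Python below is one possible interpretation of the slide, not a description of the RML Processor's internals: in the mapping-driven variant each Triples Map pulls the records it needs, while in the data-driven variant each source is walked once and every record is pushed to all maps defined on it. All names, including the predicate `ex:artist_name`, are hypothetical.

```python
ARTWORKS = [
    {"Ref": "NPG_70_36", "Title": "Apollo 11 Crew", "Artist": "Ronald Anderson"},
    {"Ref": "S_NPG_2010_51", "Title": "Neil Armstrong", "Artist": "Robert Theodore McCall"},
]
SOURCES = {"artworks.json": ARTWORKS}


def label_map(record):
    return [("http://ex.com/" + record["Ref"], "rdfs:label", record["Title"])]


def artist_map(record):
    return [("http://ex.com/" + record["Ref"], "ex:artist_name", record["Artist"])]


TRIPLES_MAPS = [
    {"source": "artworks.json", "emit": label_map},
    {"source": "artworks.json", "emit": artist_map},
]


def mapping_driven(maps, sources):
    """Mapping module in control: the source is traversed once per Triples Map."""
    for triples_map in maps:
        for record in sources[triples_map["source"]]:
            yield from triples_map["emit"](record)


def data_driven(maps, sources):
    """Extraction module in control: each source is traversed exactly once."""
    by_source = {}
    for triples_map in maps:
        by_source.setdefault(triples_map["source"], []).append(triples_map)
    for name, records in sources.items():
        for record in records:
            for triples_map in by_source.get(name, []):
                yield from triples_map["emit"](record)


mapping_first = list(mapping_driven(TRIPLES_MAPS, SOURCES))
data_first = list(data_driven(TRIPLES_MAPS, SOURCES))
print(mapping_first)
print(data_first)
```

Both variants produce the same set of triples; they differ in emission order and in how often each source has to be read, which matters for large or streamed inputs.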

Mapping Hierarchical Sources into RDF

using the RML mapping language

RML: http://rml.io

RML Namespace: http://semweb.mmlab.be/ns/rml

RML Processor: https://github.com/mmlab/RMLProcessor

Contact us

Anastasia Dimou anastasia.dimou@ugent.be @natadimou

Miel Vander Sande miel.vandersande@ugent.be @Miel_vds