Mapping Hierarchical Sources into RDF using the RML Mapping Language

Mapping Hierarchical Sources into RDF using the RML Mapping Language. Anastasia Dimou (1), Miel Vander Sande (1), Jason Slepicka (2), Pedro Szekely (2), Erik Mannens (1), Craig Knoblock (2), Rik Van de Walle (1). (1) Ghent University – iMinds – Multimedia Lab; (2) University of Southern California – Information Science Institute – Department of Computer Science. http://rml.io. IEEE-ICSC14, Newport Beach, California, 18th June 2014

description

Incorporating structured data in the Linked Data cloud is still complicated, despite the numerous existing tools. In particular, hierarchical structured data (e.g., JSON) are underrepresented, due to their processing complexity. A uniform mapping formalisation for data in different formats, which would enable reuse and exchange between tools and applied data, is missing. This paper describes a novel approach to mapping heterogeneous and hierarchical data sources into RDF using the RML mapping language, an extension of R2RML (the W3C standard for mapping relational databases into RDF). To facilitate those mappings, we present a toolset for producing RML mapping files using the Karma data modelling tool, and for consuming them using a prototype RML processor. A use case shows how RML facilitates the definition and execution of mapping rules for several heterogeneous sources. http://rml.io https://github.com/mmlab/RMLProcessor

Transcript of Mapping Hierarchical Sources into RDF using the RML Mapping Language

Page 1: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Mapping Hierarchical Sources into RDF using the RML Mapping Language

Anastasia Dimou (1), Miel Vander Sande (1), Jason Slepicka (2), Pedro Szekely (2),

Erik Mannens (1), Craig Knoblock (2), Rik Van de Walle (1)

(1) Ghent University – iMinds – Multimedia Lab

(2) University of Southern California – Information Science Institute – Department of Computer Science

http://rml.io

IEEE-ICSC14

Newport Beach, California, 18th June 2014

Page 2: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Most of the data that we would like to be able to query as Linked Open Data

exists in formats other than RDF

Page 3: Mapping Hierarchical Sources into RDF using the RML Mapping Language

There are…

over 11,000 APIs according to ProgrammableWeb.org

only 74 of which return results in RDF

but more than 5,000

return results in JSON or XML

Page 4: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Many languages, tools and approaches

were proposed

to convert data from relational databases to RDF

Page 5: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Relational Database to RDF (R2RML W3C)

R2RML mappings R2RML processor

Data OWNER / PUBLISHER

defines

RDF

DB

Page 6: Mapping Hierarchical Sources into RDF using the RML Mapping Language
Page 7: Mapping Hierarchical Sources into RDF using the RML Mapping Language

R2RML mappings R2RML processor

Data OWNER / PUBLISHER

defines

RDF

DB CSV JSON XML

RDF RDF RDF

Page 8: Mapping Hierarchical Sources into RDF using the RML Mapping Language

lack of uniform definitions to describe mapping rules for heterogeneous sources

lack of interoperable definitions that would allow the re-use of mapping rules

across different implementations

lack of reusable definitions that would allow the re-use of mapping rules

for representing data in the same or different formats

Page 9: Mapping Hierarchical Sources into RDF using the RML Mapping Language

mapping data

on a per-source and per-format basis

or on case-specific basis

Uniform way of defining mappings

for heterogeneous sources

that can be re-used across data

in the same or different formats

and be interoperable

across different implementations

Page 10: Mapping Hierarchical Sources into RDF using the RML Mapping Language

R2RML mappings R2RML processor

Data OWNER / PUBLISHER

defines

RDF

DB CSV JSON XML

RDF RDF RDF

Page 11: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Mappings definitions processor

Data OWNER / PUBLISHER

defines

RDF

DB CSV JSON XML

any format to RDF

Page 12: Mapping Hierarchical Sources into RDF using the RML Mapping Language

RDF Mapping Language (RML)

generic scalable mapping language

for mapping heterogeneous resources into RDF

in an integrable and interoperable fashion

superset of the W3C standardized

R2RML mapping language

http://semweb.mmlab.be/ns/rml

Page 13: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Relational Database to RDF

Mapping Language

(R2RML)

Page 14: Mapping Hierarchical Sources into RDF using the RML Mapping Language

R2RML mapping document

NAME BIRTH_DATE DEATH_DATE

Robert Theodore McCall 1919-12-23 2010-02-26

Ronald Anderson 1929-12-06

Triples Map

Logical Table

Table Name

<#ArtistMapping>
  rr:logicalTable [
    rr:tableName "ARTISTS" ].

Page 15: Mapping Hierarchical Sources into RDF using the RML Mapping Language

R2RML mapping definition

Table Name

Triples Map

Logical Table

Subject Map

Predicate-Object Map

Predicate-Object Map

Predicate-Object Map

Predicate Map

Object Map

Page 16: Mapping Hierarchical Sources into RDF using the RML Mapping Language

R2RML mapping document

Triples Map

Subject Map

NAME BIRTH_DATE DEATH_DATE

Robert Theodore McCall 1919-12-23 2010-02-26

Ronald Anderson 1929-12-06

<#ArtistMapping>
  rr:subjectMap [
    rr:template "http://ex.com/{NAME}" ;
    rr:class ex:Person ];

<http://ex.com/Robert+Theodore+McCall> a ex:Person
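The way a Subject Map template turns a row into a subject IRI can be sketched in a few lines of Python. This is only an illustration, not how any particular R2RML processor is implemented: the helper name `expand_template` is hypothetical, and spaces are encoded as "+" (via `quote_plus`) purely to reproduce the IRI shown on the slide; a real processor may encode them differently.

```python
import re
from urllib.parse import quote_plus


def expand_template(template, row):
    """Replace every {COLUMN} placeholder with the encoded value from the row.

    Hypothetical helper for illustration; "+" encoding is an assumption
    chosen to match the IRI printed on the slide.
    """
    def substitute(match):
        column = match.group(1)
        return quote_plus(str(row[column]))

    return re.sub(r"\{([^{}]+)\}", substitute, template)


# One row of the ARTISTS table from the slide.
row = {
    "NAME": "Robert Theodore McCall",
    "BIRTH_DATE": "1919-12-23",
    "DEATH_DATE": "2010-02-26",
}

subject = expand_template("http://ex.com/{NAME}", row)
# The rr:class statement contributes one rdf:type triple for the subject.
type_triple = (subject, "a", "ex:Person")
print(type_triple)
```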

Page 17: Mapping Hierarchical Sources into RDF using the RML Mapping Language

R2RML mapping document

Predicate Map

NAME BIRTH_DATE DEATH_DATE

Robert Theodore McCall 1919-12-23 2010-02-26

Ronald Anderson 1929-12-06

<#ArtistMapping>
  rr:predicateObjectMap [
    rr:predicate ex:birth_date ;
    rr:objectMap [ rr:column "BIRTH_DATE" ] ];

<http://ex.com/Robert+Theodore+McCall> ex:birth_date "1919-12-23"

Predicate Object Map

Object Map
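As a rough sketch of what a Predicate-Object Map with a column-valued Object Map does, the Python below walks the two rows of the ARTISTS table and emits one triple per row and column. The function names are hypothetical. The one behaviour taken from R2RML itself is that a NULL column value yields no triple, which is why Ronald Anderson gets no death date.

```python
# The two rows of the ARTISTS table; the empty DEATH_DATE cell is modelled as None (SQL NULL).
rows = [
    {"NAME": "Robert Theodore McCall", "BIRTH_DATE": "1919-12-23", "DEATH_DATE": "2010-02-26"},
    {"NAME": "Ronald Anderson", "BIRTH_DATE": "1929-12-06", "DEATH_DATE": None},
]


def subject_iri(row):
    """Subject built from the template http://ex.com/{NAME} ("+" encoding as on the slide)."""
    return "http://ex.com/" + row["NAME"].replace(" ", "+")


def map_column(rows, predicate, column):
    """Emit (subject, predicate, literal) for every row whose column is not NULL."""
    triples = []
    for row in rows:
        value = row[column]
        if value is None:
            continue  # NULL values generate no triple
        triples.append((subject_iri(row), predicate, value))
    return triples


birth_triples = map_column(rows, "ex:birth_date", "BIRTH_DATE")
death_triples = map_column(rows, "ex:death_date", "DEATH_DATE")
for triple in birth_triples + death_triples:
    print(triple)
```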

Page 18: Mapping Hierarchical Sources into RDF using the RML Mapping Language

RDF Mapping Language

(RML)

Page 19: Mapping Hierarchical Sources into RDF using the RML Mapping Language

RDF Mapping Language (RML)

mapping hierarchical sources to RDF

deal with hierarchy and heterogeneity

Page 20: Mapping Hierarchical Sources into RDF using the RML Mapping Language

R2RML: each row is self-contained

and can be processed independently

R2RML: the columns in each row

can be referred to unambiguously

R2RML: for each reference to a column in a single row

a unique value is returned

Page 21: Mapping Hierarchical Sources into RDF using the RML Mapping Language

RML: explicit reference to the iteration pattern

R2RML: each row is self-contained and can be processed independently

RML: abstract reference to the input data

R2RML: the columns in each row can be referred to unambiguously

RML: more than one triple per Predicate-Object Map

R2RML: for each reference to a column in a single row a unique value is returned
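The third contrast is the easiest to see with data. The sketch below compares a column reference in a flat row, which always yields exactly one value, with a reference into a nested JSON record, which can yield several. The artwork record is abridged from the JSON sample used later in the deck; the predicate `ex:sitter` and the function names are assumptions made up for this illustration.

```python
import json

# A relational row: a column reference yields exactly one value.
row = {"NAME": "Ronald Anderson", "BIRTH_DATE": "1929-12-06"}

# A hierarchical record: a reference to the sitters' names yields several values.
artwork = json.loads("""
{ "Title": "Apollo 11 Crew",
  "Ref": "NPG_70_36",
  "Sitter": [ { "Name": "Neil Armstrong" },
              { "Name": "Buzz Aldrin" },
              { "Name": "Michael Collins" } ] }
""")


def column_values(row, column):
    """Always a single value per row and column."""
    return [row[column]]


def sitter_names(artwork):
    """One value per nested Sitter object, so possibly many per record."""
    return [sitter["Name"] for sitter in artwork["Sitter"]]


subject = "http://ex.com/" + artwork["Ref"]
# A single Predicate-Object Map therefore produces one triple per returned value.
triples = [(subject, "ex:sitter", name) for name in sitter_names(artwork)]
print(column_values(row, "NAME"))
print(triples)
```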

Page 22: Mapping Hierarchical Sources into RDF using the RML Mapping Language

RDF Mapping Language

(RML)

For hierarchical sources

Page 23: Mapping Hierarchical Sources into RDF using the RML Mapping Language

[ ... …

{ "Title": "Apollo 11 Crew",

"Artist": "Ronald Anderson",

"Ref": "NPG_70_36",

"Sitter": [

{ "Name": "Neil Armstrong",

"Birth Date": "1930-08-05" },

{ "Name": "Buzz Aldrin",

"Birth Date": "1930-01-20" },

{ "Name": "Michael Collins" } ],

"DateOfWork": "1969" },

{ "Title": "Neil Armstrong",

"Artist": "Robert Theodore McCall",

"Ref": "S_NPG_2010_51",

"Sitter": [

{ "Name": "Neil Armstrong" } ],

"DateOfWork": "2009" },

... … ]

<Artists> ... ...

<Artist>

<Name>Robert Theodore McCall</Name>

<Birth_Date>1919-12-23</Birth_Date>

<Death_Date>2010-02-26</Death_Date>

</Artist>

<Artist>

<Name>Ronald Anderson</Name>

<Birth_Date>1929-12-06</Birth_Date>

<Death_Date/>

</Artist> ... ...

</Artists>

artworks.JSON artists.XML

Page 24: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Specifying the input data

R2RML: database

RML: file, API, …

R2RML: Logical Table (rr:logicalTable)

RML: Logical Source (rml:logicalSource)

R2RML: Table Name (rr:tableName)

RML: source (rml:source)

Page 25: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Triples Map

Logical Source

source

<#ArtworkMapping>
  rml:logicalSource [ rml:source "http://ex.com/artworks.json" ].

Triples Map

Logical Source

source

<#ArtistMapping>
  rml:logicalSource
    [ rml:source "artists.xml" ].

Page 26: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Referring to the input data

R2RML: databases

RML: XML or JSON or CSV or ….

R2RML: (SQL)

RML: XPath/XQuery or JSONPath or RFC 4180 or …

R2RML: (rr:sqlQuery)

RML: rml:referenceFormulation

Page 27: Mapping Hierarchical Sources into RDF using the RML Mapping Language

<#ArtworkMapping>
  rml:logicalSource
    [ rml:source "http://ex.com/artworks.json" ;
      rml:referenceFormulation ql:JSONPath ].

Triples Map

Logical Source

source

Reference Formulation

<#ArtistMapping>
  rml:logicalSource
    [ rml:source "artists.xml" ;
      rml:referenceFormulation ql:XPath ].

Triples Map

Logical Source

source

Reference Formulation

Page 28: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Iterating over the input data

R2RML: per row

RML: ?

R2RML:

RML: rml:iterator

Page 29: Mapping Hierarchical Sources into RDF using the RML Mapping Language

<#ArtistMapping>
  rml:logicalSource
    [ rml:source "artists.xml" ;
      rml:referenceFormulation ql:XPath ;
      rml:iterator "/Artists/Artist" ].

<Artists> ... ...

<Artist>

<Name>Robert Theodore McCall</Name>

<Birth_Date>1919-12-23</Birth_Date>

<Death_Date>2010-02-26</Death_Date>

</Artist>

<Artist>

<Name>Ronald Anderson</Name>

<Birth_Date>1929-12-06</Birth_Date>

<Death_Date/>

</Artist> ... ...

</Artists>
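What the iterator "/Artists/Artist" selects can be approximated with Python's standard library. This is a sketch, not the RML Processor's code: `ElementTree` supports only a subset of XPath and no absolute paths, so the hypothetical helper `iterate` checks the first step against the root element and evaluates the remaining steps relative to it.

```python
import xml.etree.ElementTree as ET

ARTISTS_XML = """<Artists>
  <Artist>
    <Name>Robert Theodore McCall</Name>
    <Birth_Date>1919-12-23</Birth_Date>
    <Death_Date>2010-02-26</Death_Date>
  </Artist>
  <Artist>
    <Name>Ronald Anderson</Name>
    <Birth_Date>1929-12-06</Birth_Date>
    <Death_Date/>
  </Artist>
</Artists>"""


def iterate(root, iterator):
    """Return the elements matched by a simple absolute path such as /Artists/Artist.

    Hypothetical helper: only plain child steps are handled.
    """
    steps = iterator.strip("/").split("/")
    if steps[0] != root.tag:
        return []
    if len(steps) == 1:
        return [root]
    return root.findall("/".join(steps[1:]))


root = ET.fromstring(ARTISTS_XML)
artists = iterate(root, "/Artists/Artist")
# Each matched <Artist> element is one iteration, playing the role of a row in R2RML.
names = [artist.findtext("Name") for artist in artists]
print(names)
```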

Page 30: Mapping Hierarchical Sources into RDF using the RML Mapping Language

[ ... …

{ "Title": "Apollo 11 Crew",

"Artist": "Ronald Anderson",

"Ref": "NPG_70_36",

"Sitter": [

{ "Name": "Neil Armstrong",

"Birth Date": "1930-08-05" },

{ "Name": "Buzz Aldrin",

"Birth Date": "1930-01-20" },

{ "Name": "Michael Collins" } ],

"DateOfWork": "1969" },

{ "Title": "Neil Armstrong",

"Artist": "Robert Theodore McCall",

"Ref": "S_NPG_2010_51",

"Sitter": [

{ "Name": "Neil Armstrong" } ],

"DateOfWork": "2009" },

... … ]

<#ArtworkMapping>
  rml:logicalSource
    [ rml:source "http://ex.com/artworks.json" ;
      rml:referenceFormulation ql:JSONPath ;
      rml:iterator "$.[*]" ].

<#SitterMapping>
  rml:logicalSource
    [ rml:source "http://ex.com/artworks.json" ;
      rml:referenceFormulation ql:JSONPath ;
      rml:iterator "$.[*].Sitter" ].
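The two iterators carve the same JSON file into different sets of iterations. The Python sketch below mimics this without a JSONPath library: the paths are hard-coded as plain list and dictionary traversals. Treating every object inside each "Sitter" array as its own iteration is an assumption about how a processor interprets "$.[*].Sitter".

```python
import json

ARTWORKS_JSON = """
[ { "Title": "Apollo 11 Crew",
    "Artist": "Ronald Anderson",
    "Ref": "NPG_70_36",
    "Sitter": [ { "Name": "Neil Armstrong", "Birth Date": "1930-08-05" },
                { "Name": "Buzz Aldrin", "Birth Date": "1930-01-20" },
                { "Name": "Michael Collins" } ],
    "DateOfWork": "1969" },
  { "Title": "Neil Armstrong",
    "Artist": "Robert Theodore McCall",
    "Ref": "S_NPG_2010_51",
    "Sitter": [ { "Name": "Neil Armstrong" } ],
    "DateOfWork": "2009" } ]
"""

artworks = json.loads(ARTWORKS_JSON)

# Iterator "$.[*]": one iteration per top-level artwork object.
artwork_iterations = list(artworks)

# Iterator "$.[*].Sitter": one iteration per sitter (assumed: arrays are flattened).
sitter_iterations = [
    sitter
    for artwork in artworks
    for sitter in artwork.get("Sitter", [])
]

print(len(artwork_iterations), "artwork iterations")
print(len(sitter_iterations), "sitter iterations")
```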

Page 31: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Referring to the extracts of the input data

explicitly and implicitly

R2RML: column name

RML: XML element or JSON object or …

R2RML: rr:column

RML: rml:reference

Page 32: Mapping Hierarchical Sources into RDF using the RML Mapping Language

<#ArtistMapping>
  rml:logicalSource
    [ rml:source "http://ex.com/artists.xml" ;
      rml:referenceFormulation ql:XPath ;
      rml:iterator "/Artists/Artist" ] ;
  rr:subjectMap [
    rr:template "http://ex.com/{Name}" ] ;
  rr:predicateObjectMap [
    rr:predicate ex:death_date ;
    rr:objectMap [ rml:reference "/Artists/Artist/Death_Date" ] ].

<Artists> ... ...

<Artist>

<Name>Robert Theodore McCall</Name>

<Birth_Date>1919-12-23</Birth_Date>

<Death_Date>2010-02-26</Death_Date>

</Artist>

<Artist>

<Name>Ronald Anderson</Name>

<Birth_Date>1929-12-06</Birth_Date>

<Death_Date/>

</Artist> ... ...

</Artists>

<http://ex.com/Robert+Theodore+McCall> ex:death_date "2010-02-26".
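A minimal Python sketch of this mapping follows, assuming that references are resolved against the current <Artist> iteration and that an empty element such as <Death_Date/> produces no triple. Both points are assumptions of this illustration rather than statements about the RML Processor, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ARTISTS_XML = """<Artists>
  <Artist>
    <Name>Robert Theodore McCall</Name>
    <Birth_Date>1919-12-23</Birth_Date>
    <Death_Date>2010-02-26</Death_Date>
  </Artist>
  <Artist>
    <Name>Ronald Anderson</Name>
    <Birth_Date>1929-12-06</Birth_Date>
    <Death_Date/>
  </Artist>
</Artists>"""


def map_death_dates(xml_text):
    """Apply the subject template and the death_date reference to each <Artist>."""
    root = ET.fromstring(xml_text)
    triples = []
    for artist in root.findall("Artist"):          # iterator /Artists/Artist
        name = artist.findtext("Name")              # value for {Name} in the template
        death_date = artist.findtext("Death_Date")  # "" for the empty <Death_Date/>
        if not death_date:
            continue  # assumed: empty values yield no triple
        subject = "http://ex.com/" + name.replace(" ", "+")
        triples.append((subject, "ex:death_date", death_date))
    return triples


death_triples = map_death_dates(ARTISTS_XML)
print(death_triples)
```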

Page 33: Mapping Hierarchical Sources into RDF using the RML Mapping Language

[ ... …

{ "Title": "Apollo 11 Crew",

"Artist": "Ronald Anderson",

"Ref": "NPG_70_36",

"Sitter": [

{ "Name": "Neil Armstrong",

"Birth Date": "1930-08-05" },

{ "Name": "Buzz Aldrin",

"Birth Date": "1930-01-20" },

{ "Name": "Michael Collins" } ],

"DateOfWork": "1969" },

{ "Title": "Neil Armstrong",

"Artist": "Robert Theodore McCall",

"Ref": "S_NPG_2010_51",

"Sitter": [

{ "Name": "Neil Armstrong" } ],

"DateOfWork": "2009" },

... … ]

<#ArtworkMapping>
  rml:logicalSource
    [ rml:source "http://ex.com/artworks.json" ;
      rml:referenceFormulation ql:JSONPath ;
      rml:iterator "$.[*]" ] ;
  rr:subjectMap [ rr:template "http://ex.com/{Ref}" ] ;
  rr:predicateObjectMap [
    rr:predicate rdfs:label ;
    rr:objectMap [ rml:reference "$.[*].Title" ] ].

<http://ex.com/NPG_70_36> rdfs:label "Apollo 11 Crew".

Page 34: Mapping Hierarchical Sources into RDF using the RML Mapping Language

[ ... …

{ "Title": "Apollo 11 Crew",

"Artist": "Ronald Anderson",

"Ref": "NPG_70_36",

"Sitter": [

{ "Name": "Neil Armstrong",

"Birth Date": "1930-08-05" },

{ "Name": "Buzz Aldrin",

"Birth Date": "1930-01-20" },

{ "Name": "Michael Collins" } ],

"DateOfWork": "1969" },

{ "Title": "Neil Armstrong",

"Artist": "Robert Theodore McCall",

"Ref": "S_NPG_2010_51",

"Sitter": [

{ "Name": "Neil Armstrong" } ],

"DateOfWork": "2009" },

... … ]

<#SitterMapping>
  rml:logicalSource
    [ rml:source "http://ex.com/artworks.json" ;
      rml:referenceFormulation ql:JSONPath ;
      rml:iterator "$.[*].Sitter" ] ;
  rr:subjectMap [ rr:template "http://ex.com/{Name}" ] ;
  rr:predicateObjectMap [
    rr:predicate ex:birth_date ;
    rr:objectMap [ rml:reference "$.[*].Sitter.Birth Date" ] ].

<http://ex.com/Neil+Armstrong> ex:birth_date "1930-08-05".
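Applied to the whole sample, the sitter mapping yields more than the single triple shown. The sketch below, written under the assumptions that each object in a "Sitter" array is one iteration and that a missing "Birth Date" key produces no triple, lists every birth-date triple the sample data would give. The function name is hypothetical.

```python
import json

ARTWORKS_JSON = """
[ { "Title": "Apollo 11 Crew",
    "Ref": "NPG_70_36",
    "Sitter": [ { "Name": "Neil Armstrong", "Birth Date": "1930-08-05" },
                { "Name": "Buzz Aldrin", "Birth Date": "1930-01-20" },
                { "Name": "Michael Collins" } ] },
  { "Title": "Neil Armstrong",
    "Ref": "S_NPG_2010_51",
    "Sitter": [ { "Name": "Neil Armstrong" } ] } ]
"""


def map_sitter_birth_dates(json_text):
    """Emit one ex:birth_date triple per sitter that has a "Birth Date" value."""
    triples = []
    for artwork in json.loads(json_text):
        for sitter in artwork.get("Sitter", []):
            birth_date = sitter.get("Birth Date")
            if birth_date is None:
                continue  # assumed: a missing key yields no triple
            subject = "http://ex.com/" + sitter["Name"].replace(" ", "+")
            triples.append((subject, "ex:birth_date", birth_date))
    return triples


birth_triples = map_sitter_birth_dates(ARTWORKS_JSON)
for triple in birth_triples:
    print(triple)
```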

Page 35: Mapping Hierarchical Sources into RDF using the RML Mapping Language

RDF Mapping Language (RML)

Source

Triples Map

Logical Source

Subject Map

Predicate-Object Map

Predicate Map

Object Map

Term Map

template

constant

reference

Iterator

Reference Formulation

Referencing Object Map

Triples Map

Join Condition

Parent column

Child column
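The Referencing Object Map and its Join Condition link two Triples Maps, possibly over different sources. As a hedged illustration, the Python below joins the "Artist" value of each artwork (child) with the "Name" of each artist (parent) and emits a triple pointing at the parent's subject. The predicate `ex:artist`, the choice of join fields, and the function names are assumptions for this sketch; the slide itself only names the components.

```python
# Child iterations (from artworks.json) and parent iterations (from artists.xml), abridged.
artworks = [
    {"Ref": "NPG_70_36", "Artist": "Ronald Anderson"},
    {"Ref": "S_NPG_2010_51", "Artist": "Robert Theodore McCall"},
]
artists = [
    {"Name": "Robert Theodore McCall"},
    {"Name": "Ronald Anderson"},
]


def iri(value):
    """Subject template http://ex.com/{...} with "+" for spaces, as on the slides."""
    return "http://ex.com/" + value.replace(" ", "+")


def join_maps(children, parents, child_ref, parent_ref, predicate):
    """Emit (child subject, predicate, parent subject) where child_ref equals parent_ref."""
    parents_by_key = {}
    for parent in parents:
        parents_by_key.setdefault(parent[parent_ref], []).append(parent)
    triples = []
    for child in children:
        for parent in parents_by_key.get(child[child_ref], []):
            triples.append((iri(child["Ref"]), predicate, iri(parent["Name"])))
    return triples


artist_links = join_maps(artworks, artists, "Artist", "Name", "ex:artist")
for triple in artist_links:
    print(triple)
```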

Page 36: Mapping Hierarchical Sources into RDF using the RML Mapping Language

RDF Mapping Language

(RML)

Editing mappings with Karma http://www.isi.edu/integration/karma/

Page 37: Mapping Hierarchical Sources into RDF using the RML Mapping Language
Page 38: Mapping Hierarchical Sources into RDF using the RML Mapping Language

RDF Mapping Language

(RML)

Processing

Page 39: Mapping Hierarchical Sources into RDF using the RML Mapping Language

mapping-driven processing:

processing driven by the mapping module

data-driven processing:

processing driven by the extraction module
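One way to read the two strategies is as a question of which loop is outermost. The Python below is an interpretation of the slide, not the RML Processor's design: in the mapping-driven variant the mapping side walks the Triples Maps and asks for the data each one needs, while in the data-driven variant the extraction side walks the data and hands each record to the maps that use that source. All names and the toy data are hypothetical.

```python
# Toy sources (already extracted into records) and two toy Triples Maps.
sources = {
    "artists": [{"Name": "Ronald Anderson"}],
    "artworks": [{"Ref": "NPG_70_36", "Title": "Apollo 11 Crew"}],
}
maps = [
    {"source": "artworks",
     "apply": lambda r: ("http://ex.com/" + r["Ref"], "rdfs:label", r["Title"])},
    {"source": "artists",
     "apply": lambda r: ("http://ex.com/" + r["Name"].replace(" ", "+"), "a", "ex:Person")},
]


def mapping_driven(maps, sources):
    """The mapping module leads: for each map, request its source's records."""
    triples = []
    for triples_map in maps:
        for record in sources[triples_map["source"]]:
            triples.append(triples_map["apply"](record))
    return triples


def data_driven(maps, sources):
    """The extraction module leads: for each record, apply the maps on that source."""
    triples = []
    for source_name, records in sources.items():
        for record in records:
            for triples_map in maps:
                if triples_map["source"] == source_name:
                    triples.append(triples_map["apply"](record))
    return triples


# Both strategies produce the same triples, only in a different order.
print(mapping_driven(maps, sources))
print(data_driven(maps, sources))
```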

Page 40: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Extraction Module Mapping Module

RML Processor

Page 41: Mapping Hierarchical Sources into RDF using the RML Mapping Language

Mapping Hierarchical Sources into RDF

using the RML mapping language

RML: http://rml.io

RML Namespace: http://semweb.mmlab.be/ns/rml

RML Processor: https://github.com/mmlab/RMLProcessor

Contact us

Anastasia Dimou @natadimou

Miel Vander Sande @Miel_vds