ISWC 2014 - Dandelion: from raw data to dataGEMs for developers

Post on 02-Jul-2015


description

This is the presentation shown at ISWC 2014 in Riva del Garda. The session was titled "Developers Workshop" and focused on how practical Linked Data problems have been solved. We presented the Dandelion platform, our data curation workflow, and the overall idea of dataGEM APIs.

Transcript of ISWC 2014 - Dandelion: from raw data to dataGEMs for developers

Dandelion: from raw data to dataGEMs for developers

Stefano Parmesan

Tatiana Tarasova

Ugo Scaiella

Michele Barbera

A bit of context

• SpazioDati s.r.l.
• Italian startup: Pisa & Trento
• Members of the DBpedia Association
• Manage the Italian DBpedia

Goal

• Close the gap between getting the data and using it

• Build a Knowledge Graph as-a-service:
  • Make it queryable
  • Make it stable, make it scale
  • Support different access levels

How?

• Phase #1: PUT the data in
  • Data normalization
  • Entity deduplication

• Phase #2: GET the data out
  • Slices
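The two Phase #1 steps can be illustrated with a toy sketch: labels from different sources are normalized, then records that agree on the normalized label and code are grouped as one entity. This is only an assumption-laden illustration, not the Dandelion pipeline. The record layout, the function names `normalize` and `deduplicate`, and the matching rule are hypothetical, and the sample records are made up for the example.

```python
import unicodedata
from collections import defaultdict


def normalize(label):
    """Strip accents, case-fold, and collapse whitespace in a label."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return " ".join(without_accents.casefold().split())


def deduplicate(records):
    """Group records that share a normalized label and a code.

    Hypothetical matching rule: exact equality on (normalized label, code).
    """
    groups = defaultdict(list)
    for record in records:
        key = (normalize(record["label"]), record["code"])
        groups[key].append(record)
    return list(groups.values())


# Illustrative records, as if coming from two different sources.
records = [
    {"source": "source-1", "label": "Agliè", "code": "001001"},
    {"source": "source-n", "label": "  AGLIE ", "code": "001001"},
    {"source": "source-1", "label": "Airasca", "code": "001002"},
]
clusters = deduplicate(records)
```

Here the first two records end up in the same cluster, because both labels normalize to the same string. A real deduplication step would typically need fuzzier comparison rules than exact equality.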

How?

[Diagram: data pipeline. Stages: Data Normalisation, Entity Deduplication, Data Storage, Data Access. Inputs: Raw Data (Source 1 … Source N), Sample. Other labels: Reconciliation Services. Tools: Azkaban, Silk Framework, Titan Graph, dandelion.eu. Outputs: Linked Data, Slices, dataGEM.]

Why…

• … slices?
  • SQL-like APIs
  • Common knowledge, linked data

• … a graph at all?
  • Traversals
  • Data is centralized
  • Different sources, different access levels

Why…

• … Titan/Gremlin?
  • Scalable
  • Richer (multi-property, undefined-depth queries)
  • Open source
  • Elasticsearch-powered

And now what?

• Still a prototype:
  • Private beta access to slices (demo)
  • English and Italian DBpedia
  • Corporate private data

Future?

• Phase #1b: PUT the data in
  • Scalable entity deduplication

• Phase #2b: GET the data out
  • API for graph traversal
  • Text analysis tools (dataTXT)
  • Customizations

RDF mappings

<http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> a code:ISTATAdministrativeDivision ;
    sd:childOf <http://data.spaziodati.eu/resource/7b7d45857f1372e1205bcfc87c19b2b2db2e0f59> ;
    sd:code "001001" ;
    sd:acheneID "ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb" ;
    code:cadastralCode "A074" ;
    sd:label "Agliè" ;
    code:elevation "315"^^xsd:int ;
    code:isCoastal "false"^^xsd:boolean ;
    code:isMountainous "false"^^xsd:boolean ;
    sd:level "60"^^xsd:int .

_:node194hhq904x1 rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ;
    rdf:predicate code:population ;
    rdf:object "2574"^^xsd:int ;
    sd:acheneID "31e4104e62168ffc4c3d6d278ecc775effff6ebc" ;
    metaprop:validSince "2001-10-21"^^xsd:date .

_:node194hhq904x2 rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ;
    rdf:predicate code:population ;
    rdf:object "2644"^^xsd:int ;
    sd:acheneID "f38e87252cc5614faeec4abbeedd6315f5d00e9f" ;
    metaprop:validSince "2011-10-09"^^xsd:date .
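The two reified code:population statements carry a metaprop:validSince date, so a consumer can pick the value that was valid on a given day. The sketch below shows one way to do that in plain Python. It assumes a statement stays valid until a later one replaces it, which the slides do not state; the function name `value_at` and the tuple layout are hypothetical.

```python
from datetime import date

# (value, valid_since) pairs copied from the two reified
# code:population statements in the mapping above.
POPULATION_STATEMENTS = [
    (2574, date(2001, 10, 21)),
    (2644, date(2011, 10, 9)),
]


def value_at(statements, when):
    """Return the value of the most recent statement valid on `when`.

    Assumption: a statement holds from its valid_since date until a
    later statement supersedes it. Returns None if nothing is valid yet.
    """
    valid = [(since, value) for value, since in statements if since <= when]
    if not valid:
        return None
    # max() compares the valid_since date first, so it picks the latest.
    return max(valid)[1]
```

For example, `value_at(POPULATION_STATEMENTS, date(2005, 1, 1))` selects the 2001 statement, while a date after 2011-10-09 selects the later one.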

Graph structure

Provenance nodes

Type nodes

Bristle node

Achene node

Traversing

• v.as('x').out('sd:childOf').loop('x'){ cur -> cur.outE('sd:childOf').hasNext() }.path()
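The traversal above keeps following sd:childOf edges for as long as the current node still has one, then returns the path walked. The same idea can be sketched outside Gremlin with a plain dictionary. This is a simplified illustration, not the Titan implementation: it assumes each node has at most one sd:childOf edge, and the node names and the `path_to_root` helper are hypothetical.

```python
def path_to_root(child_of, start):
    """Follow child_of links from `start` until a node has no parent.

    `child_of` maps a node to its single parent (an assumption; a real
    graph may have several sd:childOf edges per node).
    """
    path = [start]
    seen = {start}
    current = start
    while current in child_of:
        current = child_of[current]
        if current in seen:
            # Guard against malformed data that would loop forever.
            raise ValueError("cycle detected in child_of links")
        seen.add(current)
        path.append(current)
    return path


# Hypothetical hierarchy of administrative divisions.
CHILD_OF = {
    "municipality": "province",
    "province": "region",
    "region": "country",
}
```

Calling `path_to_root(CHILD_OF, "municipality")` walks up to the top-level node and returns every node visited along the way, in order.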

Stefano Parmesan parmesan@spaziodati.eu