Introduction to Linked Open Data (LOD)
Linked Open Data for Digital Humanities
Transcript of Linked Open Data for Digital Humanities
Linked Open Data for Digital Humanities
What is Linked Open Data and why is it relevant for you?
Christophe Guéret (@cgueret)
Open Data
“A piece of data or content is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”
http://opendefinition.org/
Linked Data
"a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."
http://linkeddata.org/
Linked Open Data
● Linked Open Data = Open Data + Linked Data
● Interconnected data sets that are on the Web and free to use
● 5-star scheme http://5stardata.info/
Why does it matter for DH ?
● Digital Humanities use a lot of data and study relations between things
● Data acquisition & curation take a LOT of effort for data consumers
● Linked Open Data is a good way to
  ○ Facilitate your own work (as a data consumer)
  ○ Facilitate others' work (as a data publisher)
Data found on the Web
● You get the following table as a CSV file

Kennis      Stad
Christophe  Amsterdam
David       Parijs

● And that Excel table from somewhere else

Ville      Pays
Paris      France
Amsterdam  Pays-Bas
And you want to integrate it
● Data integration issues
  ○ Kennis, Stad, Ville, Pays?
  ○ Parijs = Paris?
  ○ Amsterdam = Amsterdam?
● A lot of work for the (uninformed) consumer!
Kennis      Stad             Ville      Pays
Christophe  Amsterdam    +   Paris      France      =  ?
David       Parijs           Amsterdam  Pays-Bas
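To make the problem concrete, here is a minimal Python sketch of a naive join on city names. Only the table contents come from the slides; the variable and function names are illustrative.

```python
# The two tables from the slides, as lists of dictionaries.
people = [
    {"Kennis": "Christophe", "Stad": "Amsterdam"},
    {"Kennis": "David", "Stad": "Parijs"},
]
cities = [
    {"Ville": "Paris", "Pays": "France"},
    {"Ville": "Amsterdam", "Pays": "Pays-Bas"},
]


def naive_join(people, cities):
    """Join the tables by comparing city names as plain strings."""
    country_by_city = {row["Ville"]: row["Pays"] for row in cities}
    joined = []
    for row in people:
        # .get() returns None when the spelling differs ("Parijs" vs "Paris").
        country = country_by_city.get(row["Stad"])
        joined.append((row["Kennis"], row["Stad"], country))
    return joined


result = naive_join(people, cities)
for line in result:
    print(line)
```

The row for David ends up without a country, because nothing in the data says that "Parijs" and "Paris" name the same city. The consumer also had to know beforehand that "Stad" and "Ville" both mean "city".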
Linked Data approach
● Assign unique identifiers (URIs) to concepts and things
● Create a "triple": connect the identifiers with labelled, directed edges
dbpedia:Amsterdam --dbo:country--> dbpedia:Netherlands
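As a rough illustration, a triple can be held in Python as a three-field record of full URIs. The `Triple` type and the `expand` helper are hypothetical names chosen for this sketch; the two namespace URIs are the ones DBpedia uses for resources and for its ontology.

```python
from typing import NamedTuple

# Prefix table: short name -> namespace URI.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


class Triple(NamedTuple):
    """One labelled, directed edge: subject --predicate--> object."""
    subject: str
    predicate: str
    object: str


def expand(prefixed_name):
    """Turn a prefixed name such as 'dbo:country' into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


triple = Triple(
    expand("dbpedia:Amsterdam"),
    expand("dbo:country"),
    expand("dbpedia:Netherlands"),
)
print(triple)
```

The prefixed form is only a shorthand; what is actually stored and compared is the full URI.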
Why does it solve the issue?
● Shift some of the data integration load to the provider side
  ○ Clarify the semantics of the data
  ○ Refer to identifiers rather than names
● There is only one "dbpedia:Amsterdam" at http://dbpedia.org/resource/Amsterdam
● Labels used for the edges are published by an external authority
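The effect can be sketched in Python, assuming both providers had replaced their local city and country names with DBpedia URIs before publishing. The two dictionaries below are hypothetical provider-side versions of the tables from the earlier slides, not data taken from DBpedia.

```python
DBR = "http://dbpedia.org/resource/"

# First provider: person -> city, with the city given as a URI.
lives_in = {
    "Christophe": DBR + "Amsterdam",
    "David": DBR + "Paris",  # published as a URI instead of the label "Parijs"
}

# Second provider: city -> country, both given as URIs.
country_of = {
    DBR + "Paris": DBR + "France",
    DBR + "Amsterdam": DBR + "Netherlands",
}


def country_for(person):
    """Follow person -> city -> country by exact URI match."""
    return country_of[lives_in[person]]


countries = {person: country_for(person) for person in lives_in}
print(countries)
```

The join is now a plain dictionary lookup: the decision that "Parijs" means Paris was made once, by the provider who knows the data, instead of by every consumer.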
Some vocabulary publishers
From triples to the Web of Data
● Every triple is a bit of factual information
● Because nodes are re-used across triples, the union of all the triples is a graph
● The "Web of Data" is a pre-integrated, semantically clear, data set ready to be used!
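A small Python sketch of this idea, using prefixed names for brevity. The two "sources" and their facts are illustrative assumptions, not an extract from DBpedia, and `build_graph` and `follow` are hypothetical helpers.

```python
from collections import defaultdict

# Two hypothetical sources that happen to re-use the same identifiers.
source_a = {
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
}
source_b = {
    ("dbpedia:Netherlands", "dbo:officialLanguage", "dbpedia:Dutch_language"),
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),  # same fact again
}

# Integration is a set union: identical triples collapse into one.
merged = source_a | source_b


def build_graph(triples):
    """Index the triples as node -> list of (edge label, neighbour)."""
    graph = defaultdict(list)
    for subject, predicate, obj in sorted(triples):
        graph[subject].append((predicate, obj))
    return dict(graph)


def follow(graph, start, *predicates):
    """Walk from a node along the given sequence of edge labels."""
    node = start
    for predicate in predicates:
        node = next(o for (p, o) in graph[node] if p == predicate)
    return node


graph = build_graph(merged)
print(follow(graph, "dbpedia:Amsterdam", "dbo:country", "dbo:officialLanguage"))
```

Because `dbpedia:Netherlands` appears in both sources, a path from Amsterdam to a language exists in the merged graph even though neither source contains it on its own.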
Exploring relations in the graph
Let's make a social network !
● The network
  ○ A node per European country
  ○ An edge means a shared official language
  ○ Label the edges with the languages
  ○ Label the nodes with the country names
● Data source
  ○ DBpedia SPARQL http://dbpedia.org/sparql
● Visualisation tool
  ○ Gephi https://gephi.org/
SPARQL ?
● Query language for Linked Open Data
● Describe part of the graph and use variables
dbpedia:Amsterdam --dbo:country--> ?Country
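The idea behind a single triple pattern can be imitated in a few lines of Python. This is only a toy matcher over an in-memory list, not how a SPARQL engine is implemented; `match` is a hypothetical helper and the two triples are illustrative.

```python
def match(pattern, triples):
    """Return one dict of variable bindings per triple that fits the pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        bindings = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if bindings.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            results.append(bindings)
    return results


triples = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
]

answers = match(("dbpedia:Amsterdam", "dbo:country", "?Country"), triples)
print(answers)
```

Replacing the subject with a second variable, as in `("?City", "dbo:country", "?Country")`, returns one binding per city instead of a single answer.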
Suggested book to read
The query in SPARQL
SELECT DISTINCT ?Source ?Target ?Label WHERE {
  ?country1 a <http://dbpedia.org/class/yago/EuropeanCountries> .
  ?country1 <http://dbpedia.org/ontology/officialLanguage> ?language .
  ?country2 a <http://dbpedia.org/class/yago/EuropeanCountries> .
  ?country2 <http://dbpedia.org/ontology/officialLanguage> ?language .
  FILTER (?country1 != ?country2)

  ?country1 <http://www.w3.org/2000/01/rdf-schema#label> ?Source .
  ?country2 <http://www.w3.org/2000/01/rdf-schema#label> ?Target .
  ?language <http://www.w3.org/2000/01/rdf-schema#label> ?Label .
  FILTER ((LANG(?Source) = "en") && (LANG(?Target) = "en") && (LANG(?Label) = "en"))
}
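A query like this can also be sent from a script instead of the web form. The sketch below uses only the Python standard library and assumes the endpoint follows the standard SPARQL protocol (a `query` parameter on a GET request) and honours an `Accept: text/csv` header; neither assumption is checked here. `QUERY` holds a shortened stand-in query, and `build_request` and `run_query` are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

# Shortened stand-in; paste the full query in its place.
QUERY = """
SELECT ?Country WHERE {
  <http://dbpedia.org/resource/Amsterdam>
      <http://dbpedia.org/ontology/country> ?Country .
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Prepare a GET request carrying the query and asking for CSV results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "text/csv"})


def run_query(query):
    """Send the query and return the response body as text (needs network access)."""
    with urlopen(build_request(query)) as response:
        return response.read().decode("utf-8")


request = build_request(QUERY)
print(request.full_url)
```

Calling `run_query(QUERY)` performs the actual request; it is left out of the script body so that the sketch can be read and run without a network connection.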
Making the network
● Get the query from
  ○ https://gist.github.com/cgueret/5098706
● Copy and paste it into
  ○ http://dbpedia.org/sparql
● Change the result format to "CSV"
● Press "Run Query" and save the result
● Open Gephi
● Start a new project
● Import the CSV file in the "Data Laboratory"
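One optional clean-up step between saving the CSV and importing it: since the query only requires the two countries to be different, each pair can be matched in both orders (A, B and B, A). The Python sketch below keeps one row per unordered pair and language. The sample rows are hand-written in the shape the query returns and are not actual DBpedia output; `undirected_edges` is a hypothetical helper.

```python
import csv
import io

# Hand-written sample with the query's three columns (illustrative only).
SAMPLE_CSV = '''"Source","Target","Label"
"Belgium","Netherlands","Dutch language"
"Netherlands","Belgium","Dutch language"
"Belgium","France","French language"
"France","Belgium","French language"
'''


def undirected_edges(csv_text):
    """Keep the first row seen for each unordered country pair and language."""
    seen = set()
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # frozenset ignores order, so (A, B) and (B, A) share one key.
        key = (frozenset((row["Source"], row["Target"])), row["Label"])
        if key not in seen:
            seen.add(key)
            edges.append((row["Source"], row["Target"], row["Label"]))
    return edges


edges = undirected_edges(SAMPLE_CSV)
for edge in edges:
    print(edge)
```

To use it on a real result, read the saved file into a string and pass it to `undirected_edges`, then write the remaining rows back out with `csv.writer`.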
There is not only DBpedia ...
Last words
● Look for data sources published as Linked Open Data (RDF); this can save you time
● Consider publishing your own data as Linked Open Data
● There is much more to say...
  ○ Using SPARQL within R (very easily)
    ■ http://linkedscience.org/tools/sparql-package-for-r/
  ○ Reasoning capabilities of triple stores
  ○ Creating and extending vocabularies