Linked Open Data - Seminar 25.04.12
Date posted: 18-Oct-2014
Category: Technology
www.vestforsk.no
Outline
Web evolution
Semantic Web
Why do we need it?
Linked Data Paradigm
Tools
JSON-LD
Web Evolution
From Gopher to Super-Mashups
http://reegle.info/countries
Why do we want to add meaning to data?
When a computer understands what data means, it can search it, reason over it, and combine it
Meaning is about understanding
To understand we need a language
A language starts with words
Things mean something in words
Online, we describe things with XML
Look at my coin collection
The first coin is called “Silver Tram” and is from Armenia. It was made in 1246-47 AD.
The second coin is called “Gold Stater of Lahor” and is from India. It was made in 127-151 AD.
< ... etc >
<?xml version="1.0" encoding="ISO-8859-1"?>
<collection name="My coin collection">
  <coin>
    <title>Silver Tram</title>
    <country>Armenia</country>
    <year>1246-47 AD</year>
  </coin>
  <coin>
    <title>Gold Stater of Lahor</title>
    <country>India</country>
    <year>127-151 AD</year>
  </coin>
</collection>
We can’t understand words alone. We also need grammar
Online grammar is RDF (Resource Description Framework)
This coin is from India
subject: This coin
predicate: is from
object: India
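As a minimal sketch of the subject / predicate / object split, the statement can be held as a three-part record in Python. The identifiers `coin:GoldStaterOfLahor`, `isFrom` and `place:India` are made up for illustration and do not come from any real vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers standing in for "This coin is from India".
statement = Triple("coin:GoldStaterOfLahor", "isFrom", "place:India")


def describe(t: Triple) -> str:
    """Render a triple as a readable arrow expression."""
    return f"{t.subject} --{t.predicate}--> {t.obj}"


print(describe(statement))
```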
With RDF Schema we can define concepts and make simple relations between them
This coin is from India, hence from South Asia
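The "hence" step can be sketched as a small inference over stored facts. This is plain Python, not RDF Schema itself: the `partOf` relation and the fact that India is part of South Asia are assumptions written down by hand to stand in for the kind of relation a schema would declare.

```python
# Hand-written facts; the names are hypothetical.
facts = {
    ("coin:GoldStater", "isFrom", "India"),
    ("India", "partOf", "South Asia"),
}


def regions_of(place, facts):
    """Follow partOf links transitively, starting from a place."""
    found = []
    frontier = [place]
    while frontier:
        current = frontier.pop()
        for s, p, o in facts:
            if s == current and p == "partOf" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


def origins(coin, facts):
    """Return the stated origin of a coin plus every region that contains it."""
    result = []
    for s, p, o in facts:
        if s == coin and p == "isFrom":
            result.append(o)
            result.extend(regions_of(o, facts))
    return result


print(origins("coin:GoldStater", facts))
```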
But RDF Schema is limited
A language needs more expression and logic to make good reasoning possible
That’s why OWL (The Web Ontology Language) was invented
Next, to reason you need rules
I got this coin from my grandfather.
The rule for calling someone my grandfather is that he is the father of one of my parents (my mother or my father)
Rules are formulated in a rule language (RL)
<ruleml:imp>
<ruleml:_rlab ruleml:href="#example1"/>
<ruleml:_body>
<swrlx:individualPropertyAtom swrlx:property="hasParent">
<ruleml:var>x1</ruleml:var>
<ruleml:var>x2</ruleml:var>
</swrlx:individualPropertyAtom>
<swrlx:individualPropertyAtom swrlx:property="hasFather">
<ruleml:var>x2</ruleml:var>
<ruleml:var>x3</ruleml:var>
</swrlx:individualPropertyAtom>
</ruleml:_body>
<ruleml:_head>
<swrlx:individualPropertyAtom swrlx:property="hasGrandfather">
<ruleml:var>x1</ruleml:var>
<ruleml:var>x3</ruleml:var>
</swrlx:individualPropertyAtom>
</ruleml:_head>
</ruleml:imp>
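The rule above reads: if hasParent(x1, x2) and hasFather(x2, x3), then hasGrandfather(x1, x3). A rough Python sketch of applying it to a handful of facts follows; the individuals (`me`, `mum`, `dad`, and the two grandfathers) are invented for the example, and this is a hand-rolled join rather than a rule engine.

```python
def infer_grandfathers(has_parent, has_father):
    """Apply: hasParent(x1, x2) and hasFather(x2, x3) => hasGrandfather(x1, x3)."""
    inferred = set()
    for x1, x2 in has_parent:
        for y2, x3 in has_father:
            if x2 == y2:  # the shared variable x2 links body atoms together
                inferred.add((x1, x3))
    return inferred


# Hypothetical facts.
has_parent = {("me", "mum"), ("me", "dad")}
has_father = {("mum", "grandpa_a"), ("dad", "grandpa_b")}

grandfathers = infer_grandfathers(has_parent, has_father)
print(sorted(grandfathers))
```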
So,
Words in XML
Grammar in RDF (Schema) and OWL
Rules in RL
There are a lot of things that can be described using standard formats
Suppose I want to search for a specific coin
“I want all the gold coins designed in Asia but used in Europe between 1958 and 1989”
We can use SPARQL (SPARQL Protocol and RDF Query Language)
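To give a feel for what such a query could look like, here is a sketch that holds the SPARQL text in a Python string and, next to it, the same filter written over a few in-memory records. The `ex:` vocabulary, the property names and the three sample coins are all assumptions made for the example; the Python filter only mirrors the query's logic and does not execute SPARQL.

```python
# SPARQL text for the coin question. The ex: vocabulary is hypothetical.
QUERY = """
PREFIX ex: <http://example.org/coins#>
SELECT ?coin WHERE {
  ?coin ex:material   "gold" ;
        ex:designedIn ?designPlace ;
        ex:usedIn     ?usePlace ;
        ex:year       ?year .
  ?designPlace ex:partOf ex:Asia .
  ?usePlace    ex:partOf ex:Europe .
  FILTER (?year >= 1958 && ?year <= 1989)
}
"""

# Invented sample data with the continents already resolved.
coins = [
    {"id": "coin1", "material": "gold", "designed_in": "Asia", "used_in": "Europe", "year": 1970},
    {"id": "coin2", "material": "silver", "designed_in": "Asia", "used_in": "Europe", "year": 1970},
    {"id": "coin3", "material": "gold", "designed_in": "Asia", "used_in": "Europe", "year": 1995},
]


def matching_coins(records):
    """Same conditions as the query, applied to plain dictionaries."""
    return [
        c["id"]
        for c in records
        if c["material"] == "gold"
        and c["designed_in"] == "Asia"
        and c["used_in"] == "Europe"
        and 1958 <= c["year"] <= 1989
    ]


print(matching_coins(coins))
```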
Because the Web is decentralized and data lives in many places, language is not the only thing that matters
Exchanging data between different databases to create knowledge is an ultimate goal
To make a connection, a machine needs a source
For this, we use resource identifiers
The best-known resource identifier is the URI, which can act as a name (URN), a location (URL), or both
URI
URN (name): Gold Stater of Lahor
URL (location): http://www.mycollection.in/goldStater
With all this background, we are capable of using the power of all the different data resources on the Web
Linked Data vs. Semantic Web
The Semantic Web, or the Web of Data, is the ultimate goal
Linked Data provides the means to reach that goal
Linked Data helps build the Web of Data that later can be exploited by more advanced technologies such as intelligent agents
Linked Data vs. Linked Open Data
Databases store data to answer questions (1)
Persons
• How old is Rajendra?
• Where does Rajendra work?
• What is Rajendra interested in?
Organisations
• When was VF founded?
• Where is VF located?
• What can VF do for me?
Databases store data to answer questions (2)
Persons
name      date_birth  work_place  interests
Rajendra  08-08       Sogndal     Linked Data
Svein     ….          ….          ….
• Rajendra is .. years old.
• Rajendra works in Sogndal.
• Rajendra is interested in Linked Data.
Organisations
organisation  date_founded  location  services
VF            1985          Norway    IT-Consulting & Research
nLink         ….            ….        ….
• VF was founded 27 years ago.
• VF is located in Norway.
• VF offers IT-Consulting & Research.
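A small sketch of how the two tables answer these questions, using Python's built-in sqlite3 module with an in-memory database. Only the rows that are filled in on the slide are inserted; the reference year 2012 is taken from the seminar date and is an assumption about what "27 years ago" was measured from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (name TEXT, date_birth TEXT, work_place TEXT, interests TEXT)"
)
conn.execute(
    "CREATE TABLE organisations "
    "(organisation TEXT, date_founded INTEGER, location TEXT, services TEXT)"
)
conn.execute("INSERT INTO persons VALUES ('Rajendra', '08-08', 'Sogndal', 'Linked Data')")
conn.execute(
    "INSERT INTO organisations VALUES ('VF', 1985, 'Norway', 'IT-Consulting & Research')"
)


def ask(sql, params=()):
    """Run a query and return the first column of the first row, or None."""
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None


work_place = ask("SELECT work_place FROM persons WHERE name = ?", ("Rajendra",))
location = ask("SELECT location FROM organisations WHERE organisation = ?", ("VF",))
founded = ask("SELECT date_founded FROM organisations WHERE organisation = ?", ("VF",))
years_since_founding = 2012 - founded  # assumed reference year

print(work_place, location, years_since_founding)
```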
Data from Databases can be exposed to the Web via HTML
Persons Organisations
Data from Databases can be accessed via APIs
Persons: getWorkplace("Rajendra") returns <workPlace>Sogndal</workPlace>
Organisations: getLocation("VF") returns <location>Norway</location>
(Some) information on the Web can be found via search engines
Google: questions won't necessarily be answered
But how do we get answers to complex questions? (1)
Who is interested in “Linked Data” and works in the same country where VF is located?
But how do we get answers to complex questions? (2)
Who is interested in “Linked Data” and works in the same country where VF is located?
Persons
name      date_birth  work_place  interests
Rajendra  08-08       Sogndal     Linked Data
Svein     ….          ….          ….
Organisations
organisation  date_founded  location  services
VF            1985          Norway    IT-Consulting & Research
nLink         ….            ….        ….
work_place: Sogndal
location: Norway
Same thing? Same country?
Still no answer
Is mapping the solution?
work_place: Sogndal
location: Norway
Mapped! Same country?
Still not clear
Students
name      date_birth  university  course
Rajendra  08-08       NTNU        Computer Science
Svein     ….          ….          ….
And what if we need to add another database?
What if the DB owners can't agree on a common model?
Mapping is no solution for a distributed Web of data
Before I come up with a solution, let us understand four simple things
Resources
work_place isA place
location isA place
Sogndal type work_place
Norway type location
Sogndal partOf Norway
URIs & Namespaces
geo:point rdfs:subClassOf umbel:place
geonames:country rdfs:subClassOf umbel:place
dbpedia:Sogndal rdf:type geo:point
dbpedia:Norway rdf:type geonames:country
dbpedia:Sogndal p:subdivisionName dbpedia:Norway
dbpedia:Sogndal = http://dbpedia.org/resource/Sogndal
rdfs:subClassOf = http://www.w3.org/2000/01/rdf-schema#subClassOf
A namespace is an abstract container or environment created to hold a logical grouping of unique identifiers or symbols.
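The two "=" lines on this slide show a prefixed name being expanded into a full URI. A minimal sketch of that expansion in Python follows; the `expand` helper is a name chosen here for illustration, and only the three prefixes whose namespace URIs appear in this deck are included.

```python
# Prefix table: namespace prefix -> namespace URI.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn 'prefix:local' into a full URI using the prefix table."""
    prefix, separator, local = prefixed_name.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


print(expand("dbpedia:Sogndal"))
print(expand("rdfs:subClassOf"))
```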
Ontologies
Classes: Person, Organisation, University, place, work_place, location
Relations: worksFor, studiesAt, has, isA, partOf, type
Individuals: Norway, Sogndal
What if each resource (classes and individuals) had a URI?
Expose data from databases as resources & triples on the Web
Persons
resource          foaf:name  foaf:birthday  foaf:based_near  foaf:topic_interest
persons:Rajendra  Rajendra   08-08          dbpedia:Sogndal  dbpedia:LinkedData
persons:Svein     Svein      ….             ….               ….
Triple: persons:Rajendra foaf:based_near dbpedia:Sogndal
Organisations
resource    foaf:name  foaf:birthday  foaf:based_near  orgs:services
orgs:VF     VF         1985           dbpedia:Norway   IT-Consulting & Research
orgs:nLink  nLink      ….             ….               ….
Triple: orgs:VF foaf:based_near dbpedia:Norway
Link data and do queries all over the Web
Who is interested in “Linked Data” and works in the same country where VF is located?
persons:Rajendra foaf:topic_interest dbpedia:LinkedData
persons:Rajendra foaf:based_near dbpedia:Sogndal
dbpedia:Sogndal p:subdivisionName dbpedia:Norway
orgs:VF foaf:based_near dbpedia:Norway
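Once the data is linked, the question on this slide becomes a walk over triples. The sketch below assumes the diagram's links read as four triples, with `p:subdivisionName` pointing from Sogndal to Norway; that direction is a reading of the diagram, not something the slide states in words. It is plain Python over an in-memory set, not a SPARQL engine.

```python
# Triples as read from the diagram (direction of p:subdivisionName is assumed).
triples = {
    ("persons:Rajendra", "foaf:topic_interest", "dbpedia:LinkedData"),
    ("persons:Rajendra", "foaf:based_near", "dbpedia:Sogndal"),
    ("dbpedia:Sogndal", "p:subdivisionName", "dbpedia:Norway"),
    ("orgs:VF", "foaf:based_near", "dbpedia:Norway"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def place_and_country(place):
    """The place itself plus whatever it links to via p:subdivisionName."""
    return {place} | objects(place, "p:subdivisionName")


def interested_and_colocated(topic, organisation):
    """People interested in `topic` who are based where `organisation` is."""
    org_places = set()
    for place in objects(organisation, "foaf:based_near"):
        org_places |= place_and_country(place)
    people = set()
    for s, p, o in triples:
        if p == "foaf:topic_interest" and o == topic:
            for place in objects(s, "foaf:based_near"):
                if place_and_country(place) & org_places:
                    people.add(s)
    return people


print(interested_and_colocated("dbpedia:LinkedData", "orgs:VF"))
```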
Link data from more than 40 datasets
Make use of more than 2 Billion triples!
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
The Linking Open Data cloud diagram
http://richard.cyganiak.de/2007/10/lod/
Link data from more than 295 datasets
Last updated: 2011-09-19
How to get answers to really complex questions?
Who is interested in “Linked Data” and works in a country where the unemployment rate is lower than 4%?
persons:Rajendra foaf:topic_interest dbpedia:LinkedData
persons:Rajendra foaf:based_near dbpedia:Sogndal
dbpedia:Sogndal p:subdivisionName dbpedia:Norway
orgs:VF foaf:based_near dbpedia:Norway
dbpedia:Norway owl:sameAs Scandinavia:Norge
Scandinavia:Norge Scandinavia:unemployment_rate_total 3.6
A new way to get knowledge and answers: not by searching the web, but by doing dynamic computations based on a vast collection of data, algorithms, and methods
http://www.wolframalpha.com/
Comprehensive Knowledge Archive Network
http://no.ckan.net/
Open Knowledge Foundation
Licensed under the Open Database
A collaboration between: Norwegian Press Association, Association of Norwegian Editors, Norwegian Union of Journalists, and Department of Journalism
http://www.offentlighet.no/Registeroffentlighet/Alle-registre
Linked data ...
publishing data on the Web ...
... to enable integration, linking and reuse across silos
Six Steps to Publishing Linked Data
1. Understand the Principles
2. Model Your Data
3. Choose URIs for Things in your Data
4. Set Up Your Infrastructure
5. Link to other Data Sets
6. Describe and Publicise your Data
Can’t we just publish data as files?
PDF: easy to read and publish
Excel: allows further processing and analysis
CSV: processing without need for proprietary tools
But ...
structure of data not explained
no connection between different data sets, silos
static and fixed – can’t retrieve just slices relevant to problem
Linked data
Apply the principles of the Web to the publication of data
The Web:
is a global network of pages
each identified by a URL
fetching a URL gives a document
pages connected by links
open, anyone can say anything about anything else
The linked data web:
is a global network of things
each identified by a URI
fetching a URI gives a set of statements
things connected by typed links
open, anyone can say anything about anything else
Linked data is “data you can click on”
Linked Data - Paradigm
Use URIs as names for things
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information.
Include links to other URIs, so that they can discover more things.
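"Looking up" an HTTP URI usually means sending a GET request and saying in the Accept header which representation is wanted. The sketch below builds such a request with Python's standard urllib; `build_lookup` and `fetch` are helper names invented here, `text/turtle` is just one possible media type, and whether a given server honours it is an assumption. Only the request is constructed in the example, so nothing is sent over the network unless `fetch` is called.

```python
import urllib.request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare a GET request for a URI, asking for a machine-readable format."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch(uri, media_type="text/turtle"):
    """Dereference the URI and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_lookup(uri, media_type), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


request = build_lookup("http://dbpedia.org/resource/Sogndal")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```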
LOD Benefits
other humans and applications can easily access your data using Web technologies
follow the links in order to obtain further contextual information
links to your data and search engine indices can increase the visibility of your data
JSON-LD - JSON for Linking Data
JSON-LD (JavaScript Object Notation for Linking Data) is a lightweight Linked Data format that gives your data context. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web scale.
If you are already familiar with JSON, writing JSON-LD is very easy.
These properties make JSON-LD an ideal Linked Data interchange language for JavaScript environments, Web services, and unstructured databases such as CouchDB and MongoDB.
http://json-ld.org/spec/latest/json-ld-syntax/
This RDF model in standard XML notation
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="/wiki/Tony_Benn">
    <dc:title>Tony Benn</dc:title>
    <dc:publisher>Wikipedia</dc:publisher>
  </rdf:Description>
</rdf:RDF>
is written in JSON-LD like this:
{
"@context": {
"title": "http://purl.org/dc/elements/1.1/title",
"publisher": "http://purl.org/dc/elements/1.1/publisher"
},
"@id": "/wiki/Tony_Benn",
"title": "Tony Benn",
"publisher": "Wikipedia"
}
A context is used to allow developers to use aliases for IRIs.
JSON-LD object
An Internationalized Resource Identifier (IRI) is a mechanism for representing unique identifiers on the web. In Linked Data, IRIs (or URI references) are commonly used for describing entities and properties.
{
"a": "Person",
"name": "Manu Sporny",
"homepage": "http://manu.sporny.org/",
"avatar": "http://twitter.com/account/profile_image/manusporny"
}
Unambiguous Identifiers for JSON
If a set of terms, like Person, name, and homepage, is defined in a context, and that context is used to resolve the names in JSON objects, machines can automatically expand the terms to something meaningful and unambiguous
{
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "http://xmlns.com/foaf/0.1/Person",
"http://xmlns.com/foaf/0.1/name": "Manu Sporny",
"http://xmlns.com/foaf/0.1/homepage": "http://manu.sporny.org",
"http://rdfs.org/sioc/ns#avatar": "http://twitter.com/account/profile_image/manusporny"
}
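The expansion step can be sketched in a few lines of Python. This is a deliberately simplified stand-in for real JSON-LD processing: it only swaps keys, and string values that happen to be terms, for the IRIs listed in a context. The `CONTEXT` table below is assembled from the IRIs shown on this slide, and `expand_terms` is a name chosen for illustration.

```python
# Term -> IRI table, assembled from the expanded form on this slide.
CONTEXT = {
    "a": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "Person": "http://xmlns.com/foaf/0.1/Person",
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
    "avatar": "http://rdfs.org/sioc/ns#avatar",
}

compact = {
    "a": "Person",
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
    "avatar": "http://twitter.com/account/profile_image/manusporny",
}


def expand_terms(obj, context):
    """Replace known terms (keys and term-valued strings) with their IRIs."""
    expanded = {}
    for key, value in obj.items():
        full_key = context.get(key, key)
        if isinstance(value, str):
            value = context.get(value, value)  # e.g. "Person" -> foaf:Person IRI
        expanded[full_key] = value
    return expanded


expanded = expand_terms(compact, CONTEXT)
for key, value in expanded.items():
    print(key, "=", value)
```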
JSON-LD Example
Let's start by building up a fictitious bike store called "Links Bike Shop". We've already got our bike store set up at http://store.example.com/ and are using linked data principles.
Here are some of the URLs:
http://store.example.com/: The home page of the store.
http://store.example.com/products/links-swift-chain: A chain product.
http://store.example.com/products/links-speedy-lube: A chain lube product.
We want to start creating some linked data for this fictitious store and start with rough JSON data on the store itself.
{
"@id": "http://store.example.com/",
"@type": "Store",
"name": "Links Bike Shop",
"description": "The most \"linked\" bike store on earth!"
}
Next let's create some rough data for our two premier products
{
"@id": "http://store.example.com/products/links-swift-chain",
"@type": "Product",
"name": "Links Swift Chain",
"description": "A fine chain with many links.",
"category": ["http://store.example.com/categories/parts", "http://store.example.com/categories/chains"],
"price": "10.00",
"stock": 10
}
{
"@id": "http://store.example.com/products/links-speedy-lube",
"@type": "Product",
"name": "Links Speedy Lube",
"description": "Lubricant for your chain links.",
"category": ["http://store.example.com/categories/lubes", "http://store.example.com/categories/chains"],
"price": "5.00",
"stock": 20
}
To make this into a full JSON-LD document we combine the data, add a @context, and adjust some values.
{
"@id": "http://store.example.com/",
"@type": "Store",
"name": "Links Bike Shop",
"description": "The most \"linked\" bike store on earth!",
"product": [
...
...
],
"@context": {
"Store": "http://ns.example.com/store#Store",
"Product": "http://ns.example.com/store#Product",
"product": "http://ns.example.com/store#product",
"category": {
"@id": "http://ns.example.com/store#category",
"@type": "@id"
},
"price": "http://ns.example.com/store#price",
"stock": "http://ns.example.com/store#stock",
"name": "http://purl.org/dc/terms/title",
"description": "http://purl.org/dc/terms/description",
"p": "http://store.example.com/products/",
"cat": "http://store.example.com/category/"
}
}
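One way to assemble such a document programmatically is sketched below with Python's json module. It assumes the elided product list holds the two products from the earlier slides unchanged; the slide says some values are adjusted, and those adjustments are not shown, so none are applied here.

```python
import json

context = {
    "Store": "http://ns.example.com/store#Store",
    "Product": "http://ns.example.com/store#Product",
    "product": "http://ns.example.com/store#product",
    # "@type": "@id" marks category values as links rather than plain strings.
    "category": {"@id": "http://ns.example.com/store#category", "@type": "@id"},
    "price": "http://ns.example.com/store#price",
    "stock": "http://ns.example.com/store#stock",
    "name": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
}

chain = {
    "@id": "http://store.example.com/products/links-swift-chain",
    "@type": "Product",
    "name": "Links Swift Chain",
    "description": "A fine chain with many links.",
    "category": [
        "http://store.example.com/categories/parts",
        "http://store.example.com/categories/chains",
    ],
    "price": "10.00",
    "stock": 10,
}

lube = {
    "@id": "http://store.example.com/products/links-speedy-lube",
    "@type": "Product",
    "name": "Links Speedy Lube",
    "description": "Lubricant for your chain links.",
    "category": [
        "http://store.example.com/categories/lubes",
        "http://store.example.com/categories/chains",
    ],
    "price": "5.00",
    "stock": 20,
}

store = {
    "@id": "http://store.example.com/",
    "@type": "Store",
    "name": "Links Bike Shop",
    "description": 'The most "linked" bike store on earth!',
    "product": [chain, lube],  # assumed content of the elided list
    "@context": context,
}

document = json.dumps(store, indent=2)
print(document)
```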
Publishing Solutions and Tools
Triplify
Goal: expose semantics available in RDBMS as simply as possible
Available for most popular Web app languages: PHP (ready), Ruby/Python (under dev.)
Works with most popular Web app databases: MySQL, PHP-PDO DBs (SQLite, Oracle, DB2, MS SQL, PostgreSQL)
Virtuoso RDF Views
transforms the result of SQL SELECT statements into RDF
mapping steps:
define RDFS class IRIs for each table
define construction of subject IRIs from primary key column values
define construction of predicate IRIs from each non-key column
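The three mapping steps can be illustrated generically in Python with sqlite3. This is not Virtuoso's own mapping language or API, only a sketch of the same idea: the base IRI `http://example.org/`, the IRI patterns, the `rows_to_triples` helper and the sample table are all assumptions made for the example.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base IRI


def rows_to_triples(conn, table, key_column):
    """Map every row of a table to triples following the three steps above.

    The table name is interpolated into the SQL text, so it must come from
    trusted code, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    class_iri = f"{BASE}schema/{table}"  # step 1: one class IRI per table
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"  # step 2: subject from key
        triples.append((subject, "rdf:type", class_iri))
        for column, value in record.items():
            if column != key_column:
                predicate = f"{BASE}schema/{table}#{column}"  # step 3: predicate per column
                triples.append((subject, predicate, value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT, work_place TEXT)")
conn.execute("INSERT INTO persons VALUES ('rajendra', 'Rajendra', 'Sogndal')")

triples = rows_to_triples(conn, "persons", "id")
for triple in triples:
    print(triple)
```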
Relational Databases vs. RDF & Ontologies
Data model: relational (tables, columns, rows) vs. triples (subject, predicate, object)
Schema and data separation
Implicit information
Scalability
Schema flexibility
Web data integration readiness
Marrying DBs with RDF & Ontologies
Using DBs for storage and querying of RDF & ontologies
Publishing DB content as RDF
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.
The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples).
http://dbpedia.org/
DBpedia and all other linked data is searchable with SPARQL
http://en.wikipedia.org/wiki/SPARQL
OpenStreetMap
OpenStreetMap is a free editable map of the whole world. It is made by people like you.
OpenStreetMap allows you to view, edit and use geographical data in a collaborative way from anywhere on Earth.
www.openstreetmap.org
GeoNames
The GeoNames geographical database is available for download free of charge under a Creative Commons attribution license. It contains over eight million geographical names and consists of 6.5 million unique features.
www.geonames.org
Creating Open Data
Public Domain – Only after the expiration of copyright
Science Commons protocol for open data
Creative Commons Zero Public Domain Dedication & License with Community Norms
o Avoid Technical protection measures
o Give credit where credit’s due
o Use Open formats
o Let others know!
o Share your work too!
Photo by suttonhoo @ Flickr, CC BY-NC-SA
Examples
http://data-gov.tw.rpi.edu/wiki
http://dbrec.net/
http://fanhu.bz/
http://data.nytimes.com/schools/schools.html
http://sig.ma
http://visinav.deri.org/semtech2010/
The road to open knowledge begins here!
Thank you!