Introduction to the Semantic Web and Linked Open Data.
Transcript of Introduction to the Semantic Web and Linked Open Data.
• Overview of issues relating to the publication and use of linked data in HEIs
• The lessons that we’ve learned!
  • Pragmatism rather than perfection
  • General guidelines rather than detailed specifications
  • Coining cool URIs
  • Publication alongside existing resources
  • Licensing
Goals
• Detailed tutorial on the finer points of:
  • RDF
  • RDFa
  • RDF Schema
  • OWL
  • SPARQL
  • …
(an hour and a half isn’t enough for this – and there are good tutorials available online)
Non-Goals
“If HP knew what HP knows, we’d be three times more profitable”
Lew Platt, Hewlett-Packard Chairman and CEO
Linked Data in a Nutshell
http://www.flickr.com/photos/arielarielariel/322301228/
• Linked Data is about providing structured data on the Web
• Doesn’t necessarily require RDF (though it usually uses it)
• An underlying model of triples is used to describe the relations between entities in linked data
• This is the basis of the RDF data model
• (subject, predicate, object)
  • e.g. “The Hobbit”, “created by”, “JRR Tolkien”
The triple
[Diagram: The Hobbit (subject) → created by (predicate) → JRR Tolkien (object)]
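A minimal sketch of the triple as a data structure, assuming plain strings stand in for the three parts. The names `Triple` and `describe` are illustrative only and are not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a predicate and an object."""
    subject: str
    predicate: str
    object: str


def describe(t: Triple) -> str:
    """Render a triple as a labelled arrow, like the slide's diagram."""
    return f"{t.subject} --{t.predicate}--> {t.object}"


hobbit = Triple("The Hobbit", "created by", "JRR Tolkien")
print(describe(hobbit))
```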
• Take a citation:
  • Tim Berners-Lee, James Hendler and Ora Lassila. The Semantic Web. Scientific American, May 2001
• We can identify a number of distinct statements in this citation:
  • There is an article titled “The Semantic Web”
  • One of its authors is a person named “Tim Berners-Lee” (etc)
  • It appeared in a publication titled “Scientific American”
  • It was published in May 2001
Example
• We can represent these statements graphically:
Example
[Graph: an article node with title “The Semantic Web” and date “2001-05”; three creator edges to person nodes named “Tim Berners-Lee”, “James Hendler” and “Ora Lassila”; one publishedIn edge to a publication node with title “Scientific American”]
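The citation graph can be sketched as a set of triples. This is an illustration only: the short node names (`article`, `tbl`, `jh`, `ora`, `sciam`) are placeholders rather than real URIs, and the helper functions are hypothetical.

```python
# Each statement from the citation becomes one (subject, predicate, object) tuple.
TRIPLES = {
    ("article", "title", "The Semantic Web"),
    ("article", "creator", "tbl"),
    ("article", "creator", "jh"),
    ("article", "creator", "ora"),
    ("article", "publishedIn", "sciam"),
    ("article", "date", "2001-05"),
    ("tbl", "name", "Tim Berners-Lee"),
    ("jh", "name", "James Hendler"),
    ("ora", "name", "Ora Lassila"),
    ("sciam", "title", "Scientific American"),
}


def objects(graph, subject, predicate):
    """All objects reachable from `subject` along `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def author_names(graph, article):
    """Follow creator edges, then name edges, to list the authors' names."""
    names = []
    for person in objects(graph, article, "creator"):
        names.extend(objects(graph, person, "name"))
    return sorted(names)


print(author_names(TRIPLES, "article"))
```

Answering "who wrote this article?" takes two hops through the graph, which is the kind of traversal that a query language such as SPARQL expresses declaratively.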
Example
• There are two types of node in this graph:
  • Literals, which have a value but no identity (a string, a number, a date)
  • Resources, which represent objects with identity (a web page, a person, a journal)
• Resources are identified by URIs
• Property labels are also identified by URIs, and are drawn from a vocabulary or ontology
Example
[Diagram: <http://www.sciam.com/> (subject) → <http://purl.org/dc/elements/1.1/title> (predicate) → “Scientific American” (object)]
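The distinction between resources and literals can be sketched with two small wrapper types. `URIRef`, `Literal` and `to_ntriples` are defined here purely for illustration (they are not imported from a library), and the output follows the N-Triples convention of angle brackets for URIs and double quotes for literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    """A resource: something with identity, named by a URI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A literal: a plain value with no identity of its own."""
    value: str


def to_ntriples(s, p, o) -> str:
    """Serialise one triple as a single N-Triples line."""
    def term(t):
        if isinstance(t, URIRef):
            return f"<{t.value}>"
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"{term(s)} {term(p)} {term(o)} ."


triple = (
    URIRef("http://www.sciam.com/"),
    URIRef("http://purl.org/dc/elements/1.1/title"),
    Literal("Scientific American"),
)
print(to_ntriples(*triple))
```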
• The triple-based graph model makes it possible to mix terms from different vocabularies in the same graph
• Simplifies the task of information integration
Mixing Vocabularies
[Graph: the citation graph from the earlier example, with each property labelled by the vocabulary it is drawn from (foaf, dc, bibo)]
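Mixing vocabularies amounts to using predicates whose URIs begin with different namespace prefixes. The sketch below expands prefixed names into full URIs. The `expand` helper is hypothetical, and it does not check that a local name is actually defined by the vocabulary in question.

```python
# Commonly used namespace URIs for the three vocabularies named on the slide.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand(prefixed_name: str) -> str:
    """Turn 'dc:title' into its full URI using the PREFIXES table."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


# Predicates from two vocabularies used side by side in one graph.
for name in ("dc:title", "dc:creator", "foaf:name"):
    print(name, "->", expand(name))
```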
Set of publishing practices for SW data:
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information
4. Include links to other URIs, so that they can discover more things
Effectively, putting the hypertext back into the Semantic Web
Simplifies integration between datasets while maintaining loose coupling
Linked Data Principles
Example
graph describing ‘sw’: sw title “The Semantic Web”; sw creator tbl; sw creator jh; sw creator ora; sw publishedIn sciam; sw date “2001-05”
graph describing ‘tbl’: tbl name “Tim Berners-Lee”
graph describing ‘jh’: jh name “James Hendler”
graph describing ‘ora’: ora name “Ora Lassila”
graph describing ‘sciam’: sciam title “Scientific American”
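The effect of splitting the data into one small graph per resource can be sketched as follows. Looking up a name returns only the triples about that thing, and following the links recovers the full picture. The `DOCUMENTS` dictionary stands in for real HTTP lookups, and the short names stand in for URIs; both are assumptions made for this sketch.

```python
# What you would get back when looking up each name (simulated, no network).
DOCUMENTS = {
    "sw": {
        ("sw", "title", "The Semantic Web"),
        ("sw", "creator", "tbl"),
        ("sw", "creator", "jh"),
        ("sw", "creator", "ora"),
        ("sw", "publishedIn", "sciam"),
        ("sw", "date", "2001-05"),
    },
    "tbl": {("tbl", "name", "Tim Berners-Lee")},
    "jh": {("jh", "name", "James Hendler")},
    "ora": {("ora", "name", "Ora Lassila")},
    "sciam": {("sciam", "title", "Scientific American")},
}


def crawl(start):
    """Look up `start`, then follow every object that can itself be looked up."""
    merged, seen, todo = set(), set(), [start]
    while todo:
        name = todo.pop()
        if name in seen:
            continue
        seen.add(name)
        for triple in DOCUMENTS.get(name, set()):
            merged.add(triple)
            if triple[2] in DOCUMENTS:  # the object is a link to follow
                todo.append(triple[2])
    return merged


print(len(crawl("sw")), "triples discovered starting from 'sw'")
```

Because merging graphs is just a set union of triples, each publisher can maintain its own small graph independently, which is where the loose coupling comes from.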
• The URI represents a person.
• Requesting the URI via the web gets a “303 See Other” response.
• The requester is redirected to the most appropriate document URL, usually HTML or RDF+XML
Publishing Example
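The redirect step can be sketched as a pure function that picks a document URL from the request's Accept header. This is a simplified illustration, not a description of any particular server: the `.rdf` and `.html` suffixes are assumed as the document naming convention, wildcard media types are ignored, and the function names are hypothetical.

```python
# Media types this sketch can serve, and the document suffix for each.
SUPPORTED = {"application/rdf+xml": ".rdf", "text/html": ".html"}


def parse_accept(header: str):
    """Return acceptable media types, highest q-value first (simplified)."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            prefs.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(prefs)]


def redirect_for(uri: str, accept_header: str):
    """Answer a request for a thing's URI with 303 and a document location."""
    for media_type in parse_accept(accept_header):
        if media_type in SUPPORTED:
            return 303, {"Location": uri + SUPPORTED[media_type]}
    return 303, {"Location": uri + ".html"}  # fall back to the HTML page


print(redirect_for("http://mysite.org/person/username/t23", "application/rdf+xml"))
```

A browser sending `text/html` is sent to the HTML page, while an RDF-aware client asking for `application/rdf+xml` is sent to the RDF document; in both cases the person's own URI is never served directly.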
• DON’T worry about understanding the XML. It’s the equivalent of “view-source” in a webpage!
• Use a tool to convert it to something less icky! (http://graphite.ecs.soton.ac.uk/browser/ for example)
Publishing RDF
• You want your data to be used & reused, right?
• Don’t prevent commercial use.
• Don’t prevent derivative works (prevents people using it at all!)
• If there are any things which your data should not be used for, why are you publishing it?
Licensing
• Must-Attribute license
• Public Domain license (your info still can’t be used in illegal ways, of course)
• Procrastinate and worry about it later (much better than not publishing your data)
Licensing Options
• What datasets does your organisation already maintain?
• What is the business case for making them available?
  • in a machine readable form
  • to all members
  • without bureaucracy or restriction.
• What are the barriers to putting them online and maintaining them?
• What are the benefits to the wider community?
• What are the risks?
Task
• List your 3 easiest wins - the lowest hanging fruit.
• Starting suggestion: Every building & campus in your organisation with:
  • Number
  • Building Name
  • Site (Campus)
  • Lat & Long
This data changes very slowly and is also already made freely available.
Task
• http://id.ecs.soton.ac.uk/docs/
• http://rdf.ecs.soton.ac.uk/person/1248
• http://rdf.ecs.soton.ac.uk/project/42
Beauty
• http://domain/classOfThing/scheme/identifier
• http://domain/classOfThing/scheme/identifier.rdf
• http://domain/classOfThing/scheme/identifier.html
• http://mysite.org/person/username/t23
• http://mysite.org/person/username/t23.rdf
• http://mysite.org/person/username/t23.html
Scheme is optional but futureproofs you against the next time the university reorganises everything.
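A small sketch of the URI pattern as a helper function. The name `cool_uris` and its parameters are hypothetical; the only thing taken from the slide is the shape `domain/classOfThing/scheme/identifier` with `.rdf` and `.html` variants.

```python
from urllib.parse import quote


def cool_uris(domain, class_of_thing, identifier, scheme=None):
    """Build the URI for a thing plus its RDF and HTML document URLs."""
    segments = [class_of_thing]
    if scheme:  # the scheme segment is optional
        segments.append(scheme)
    segments.append(identifier)
    # Percent-encode each segment so odd identifiers cannot break the path.
    path = "/".join(quote(segment, safe="") for segment in segments)
    base = f"http://{domain}/{path}"
    return {"thing": base, "rdf": base + ".rdf", "html": base + ".html"}


uris = cool_uris("mysite.org", "person", "t23", scheme="username")
for kind, uri in uris.items():
    print(kind, uri)
```

Keeping the scheme segment means that if identifiers are later reissued under a different numbering, the new ones can live under a new scheme name without colliding with URIs that are already published.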
And The Beast
http://www.diy.com/diy/jsp/bq/nav.jsp?action=detail&fh_oneslice=true&fh_view_size=10&fh_reffacet=styleStyle&fh_location=%2f%2fcatalog01%2fen_GB%2fcategories%3C{9372014}%2fcategories%3C{9372039}%2fcategories%3C{9372150}%2fspecificationsProductType%3done_hole_taps%2fstyleStyle%3E{adelaide}&fh_refview=summary&fh_refpath=facet_159017215&fh_secondid=10507747&fh_eds=%C3%9F&ts=1279018688652
Further Reading
http://www.flickr.com/photos/markhillary/337685031/
• http://www.w3.org/standards/semanticweb/
• http://www.w3.org/standards/techs/rdf
• http://www.w3.org/standards/techs/owl
• http://www.w3.org/TR/swbp-vocab-pub/
W3C Specifications
Tools
• Graphite Browser
  • http://graphite.ecs.soton.ac.uk/browser/
• Tabulator
  • http://www.w3.org/2005/ajar/tab
Linked Data Help
• Linked Data Website
  • http://linkeddata.org/
• The Patterns Book
  • http://patterns.dataincubator.org/book/
• Semantic Overflow
  • http://www.semanticoverflow.com/
• SKOS (Simple Knowledge Organisation Scheme)
  • Taxonomies and thesauri
• SIOC (Semantically Interlinked Online Communities)
  • Web forums, mailing lists, etc
• FOAF (Friend of a Friend)
  • People, social networks
• DC (Dublin Core)
  • Basic bibliographic information
• BIBO (Bibliographic Ontology)
  • Advanced bibliographic information
• GEO
  • Simple geolocation (lat/long) ontology
Common Namespaces
Cool URIs
• Cool URIs don't change (by TimBL)
  • http://www.w3.org/Provider/Style/URI
• Cool URIs for the Semantic Web
  • http://www.w3.org/TR/cooluris/
• ECS URI scheme documentation
  • http://id.ecs.soton.ac.uk/docs/
Infrastructure Namespaces
• RDF & RDFS
  • These describe classes & predicates which are used to tie everything together. rdf:type is used to give a URI a class:
    <http://id.ecs.soton.ac.uk/person/1248> rdf:type foaf:Person .
• OWL
  • Used to describe the meaning of predicates & classes in machine-readable form.
  • Start with human-readable documents; OWL is not widely consumed (yet?)
• XSD
  • Describes datatypes like String, Positive Integer etc.
Take Home Messages
http://www.flickr.com/photos/71894657@N00/2696793132/
• ‘Cool URIs don’t change’ – once you’ve chosen a URI convention for your organisation, it’s a pain to change it
• Getting this right is key to having your linked data used more widely
We think that we got this one mostly right, but we still had too many anonymous nodes around
Good URI Selection
• Go for an incremental approach
  • …but keep an eye on possible avenues for future expansion
• RDFa is not for beginners!
• Don’t do as we did: we tried to build linked data for all of our internal data in one go
Start with the easy stuff
• Regardless of your application domain, there is probably already an ontology that does some of what you want
• …but don’t be afraid to invent relationships and classes if you can’t find any suitable
• Don’t do as we did: we wrote a new ontology from scratch, rather than reusing FOAF+DC
Don’t reinvent the wheel
• Build linked data for your own consumption first
  • You know what your use cases are – better to support these than to second guess those of unknown future users
• Don’t do as we did: we overcomplicated our data by trying to support all of the plausible scenarios that we could think of, rather than concentrating on what mattered to us
(be glad I couldn't find any clip art for this slide)
Eat your own dogfood
• You should aim to publish as RDF
  • Publishing as CSV may get your data out there faster as an interim measure
We used CSV as a ‘glue’ data format between different systems, but chose not to expose data until we could do so as RDF.
Don’t underestimate CSV
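Moving from CSV to RDF later need not be a big step. The sketch below turns a buildings CSV into N-Triples lines. Everything specific in it is an assumption for illustration: the column names, the sample row, the `http://example.org/` base URI and the `EX_SITE` predicate are made up, while `rdfs:label` and the W3C Basic Geo `lat`/`long` properties are existing vocabulary terms.

```python
import csv
import io

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
BASE = "http://example.org/building/"   # hypothetical base URI
EX_SITE = "http://example.org/ns#site"  # hypothetical predicate

# Hypothetical sample data; a real export would come from an existing system.
SAMPLE_CSV = """number,name,site,lat,long
1,Main Library,North Campus,50.9350,-1.3960
"""


def literal(value: str) -> str:
    """Quote a plain string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def csv_to_ntriples(text: str):
    """Emit one N-Triples line per CSV cell, keyed on the building number."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['number']}>"
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .")
        lines.append(f"{subject} <{EX_SITE}> {literal(row['site'])} .")
        lines.append(f"{subject} <{GEO}lat> {literal(row['lat'])} .")
        lines.append(f"{subject} <{GEO}long> {literal(row['long'])} .")
    return lines


for line in csv_to_ntriples(SAMPLE_CSV):
    print(line)
```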