Post on 14-Jul-2015
Open Data: Principles and Techniques
VU Web Engineering / TU Wien May 15th 2014
Bernhard Haslhofer
About me
• Data Scientist @ AIT - Austrian Institute of Technology
• Previously
  – Lecturer & Researcher @ Cornell University, NY, USA
  – Univ. Ass. @ University of Vienna
  – …
About me
• Research Interests
  – Web-based information systems
    • Structured Web Data
    • Knowledge Graphs
    • Data quality issues
    • …
  – Large-scale data analytics
    • Machine learning
    • Network analysis
    • Information retrieval
My plan for today…
• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion
Open Data – Principles
“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”
Open Data Handbook, 2012, Open Knowledge Foundation, http://opendatahandbook.org/
P#1: Availability and Access
Data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet
Data must also be available in a convenient and modifiable form
Source: http://opendefinition.org/
P#2: Reuse and Redistribution
Data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets.
Source: http://opendefinition.org/
P#3: Universal Participation
Everyone must be able to use, reuse and redistribute (no discrimination)
No ‘non-commercial’ restrictions
Source: http://opendefinition.org/
Questions
• Do the open data principles sound familiar (to CS students / software engineers)?
• Any known “open data” examples?
Open Data Licensing
Public Domain Dedication
Open Data Movement
Source: http://www.flickr.com/photos/jamescridland/613445810/sizes/l/in/photostream/
Open Government Data
“Decades ago, the US Government made both weather data and the GPS System freely available. Since that time, American entrepreneurs and innovators have utilised these resources to create navigation systems, location-based applications, …”
Open Government Data
Open Government Data
Developers Entrepreneurs
Startups
Apps / Services
(Open) Data Journalism
(Open) Data Journalism
Open Data in Science
Open Data in Science / Open Access
How can we publish and access structured data on the Web?
My plan for today…
• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion
Linked Data
“A method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried”
[Bizer, Heath, Berners-Lee 2009]
Linked Open Data
Open Data + Linked Data = Linked Open Data
Why Linked Data?
Web Architecture
• A set of simple standards
  – Uniform global addressing (URI)
  – Uniform document encoding (HTML)
  – Uniform transportation (HTTP)
• Hyperlinks connecting documents
• Works pretty well for accessing and exchanging documents
How can we publish and access structured data on the Web?
Web Services and Web APIs
Source: http://www.blogperfume.com/new-27-circular-social-media-icons-in-3-sizes/
Web Services and Web APIs
• Each Web API has a proprietary interface
• Data sources must be known in advance
• Information entities (papers, authors, subjects, etc.) are often not linked
Social Networking Sites as Walled Gardens by David Simonds
Linked Data Vision
• Publish and link structured data on the Web
• Create a single globally connected data space based on the Web Architecture
Web of Linked Data
• A set of simple standards
  – Uniform global addressing (URI)
  – Uniform data model (RDF)
  – Uniform transportation (HTTP)
• RDF links connecting entities
• Forms a global data space and facilitates accessing and exchanging data
What is Linked Data?
• A method to build a Web of Data
• Architectural style, set of standards
Linking Open Data Project
• A W3C community project with the goal to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources
~$ curl -I -H "Accept: text/turtle" http://dbpedia.org/resource/The_Shining_\(film\)
~$ curl -H "Accept: text/turtle" http://dbpedia.org/data/The_Shining_\(film\).ttl
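The curl calls above rely on HTTP content negotiation: the client names the RDF format it wants in the Accept header. A minimal Python sketch of the same idea follows. The helper name turtle_request is made up for illustration, and the request is only built here, not sent.

```python
from urllib.request import Request

RESOURCE = "http://dbpedia.org/resource/The_Shining_(film)"


def turtle_request(uri: str, method: str = "GET") -> Request:
    """Build (but do not send) a request asking for a Turtle representation.

    Hypothetical helper for illustration only.
    """
    return Request(uri, headers={"Accept": "text/turtle"}, method=method)


head = turtle_request(RESOURCE, method="HEAD")  # roughly `curl -I -H "Accept: ..."`
get = turtle_request(RESOURCE)                  # roughly `curl -H "Accept: ..."`

# To actually dereference the URI (needs network access):
#   from urllib.request import urlopen
#   with urlopen(get) as response:
#       print(response.headers.get("Content-Type"))
```

Whether the server redirects to a data document or answers directly is up to the server; the sketch only shows what the client asks for.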
~$ sudo apt-get install raptor2-utils    # Linux (Debian/Ubuntu)
~$ brew install raptor                   # Mac OS X
~$ rapper http://dbpedia.org/resource/The_Shining_\(film\)
LINKED DATA TECHNOLOGIES
RDF
• A data model for representing data on the Web
• Several statements (triples) form a graph
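To make "several triples form a graph" concrete, here is a small sketch in plain Python (no RDF library). The three statements are illustrative and not copied from DBpedia output, and build_graph is a made-up helper name.

```python
from collections import defaultdict

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each statement is a (subject, predicate, object) triple.
# Illustrative statements, chosen for this sketch.
triples = [
    (DBR + "The_Shining_(film)", RDF_TYPE, DBO + "Film"),
    (DBR + "The_Shining_(film)", DBO + "director", DBR + "Stanley_Kubrick"),
    (DBR + "Stanley_Kubrick", RDF_TYPE, DBO + "Person"),
]


def build_graph(statements):
    """Index triples by subject: nodes are URIs, labelled edges are predicates."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(triples)
for predicate, obj in graph[DBR + "The_Shining_(film)"]:
    print(predicate, "->", obj)
```

Because the object of one triple (Stanley_Kubrick) is the subject of another, the statements connect into a graph rather than staying isolated records.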
RDF/XML, N3, Turtle, etc.
• Data formats for RDF resource representations
• Used to transfer RDF data between apps
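As a rough illustration of what a serialization does, the sketch below writes URI-only triples as N-Triples, a line-based subset of Turtle. It is deliberately simplified: literals, blank nodes and escaping are not handled, and to_ntriples is a made-up helper name.

```python
def to_ntriples(statements):
    """Serialize (subject, predicate, object) URI triples, one statement per line.

    Simplified sketch: assumes every term is a plain URI without special characters.
    """
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


example = [(
    "http://dbpedia.org/resource/The_Shining_(film)",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://dbpedia.org/ontology/Film",
)]
line = to_ntriples(example)
print(line)
```

In practice one would use an RDF library or a tool such as rapper for this; the point is only that the same triples can be written out in different text formats.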
RDFS
• A language for describing the syntax and semantics of schemas/vocabularies in a machine-understandable way
http://dbpedia.org/ontology/Film rdfs:subClassOf http://dbpedia.org/ontology/Work
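A schema statement like Film rdfs:subClassOf Work lets software conclude that every Film is also a Work. The following sketch shows that inference step in plain Python. It assumes single inheritance and no cycles, and the helper names are made up for illustration.

```python
FILM = "http://dbpedia.org/ontology/Film"
WORK = "http://dbpedia.org/ontology/Work"

# rdfs:subClassOf statements, stored as child -> parent.
SUBCLASS_OF = {FILM: WORK}


def superclasses(cls, subclass_of):
    """Follow rdfs:subClassOf links upward (assumes one parent per class, no cycles)."""
    result = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        result.append(cls)
    return result


def types_of(asserted_type, subclass_of):
    """All types a resource has, given the single type that was asserted for it."""
    return [asserted_type] + superclasses(asserted_type, subclass_of)


print(types_of(FILM, SUBCLASS_OF))
```

A real RDFS reasoner also handles multiple parents, property hierarchies, domains and ranges; this covers only the subclass rule.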
OWL
• A more expressive (formal) language for defining the syntax and semantics of schemas/vocabularies
• Solves RDFS shortcomings but introduces quite some complexity
SKOS
• A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes)
SPARQL
• A query language and protocol for accessing RDF data on the Web
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?x WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}
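SPARQL is also a protocol: a query is sent to an endpoint over HTTP, for example as the query parameter of a GET request. The sketch below builds such a request for the public DBpedia endpoint without sending it. The helper name sparql_get is made up, and asking for JSON results via the Accept header is one common choice, not the only one.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?x WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}
"""


def sparql_get(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request. Hypothetical helper."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_get(ENDPOINT, QUERY)

# To run the query (needs network access):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       bindings = json.load(response)["results"]["bindings"]
```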
Database Systems Analogy...
Purpose | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query | ? | ?
Schema Definition Language | ? | ?
Data Representation | ? | ?
Identifiers | ? | ?
Database Systems Analogy...
Purpose | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query | SQL | SPARQL
Schema Definition Language | SQL DDL | RDFS / OWL
Data Representation | Relational Model / Tables | RDF / Graph
Identifiers | Primary Keys (numeric sequences) | URI
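One way to read the analogy: a table row becomes a set of triples, the primary key becomes a URI, and each column becomes a predicate. The sketch below shows this mapping under made-up assumptions. The example.org namespace, the film table and the helper row_to_triples are all hypothetical.

```python
BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def row_to_triples(table, primary_key, row):
    """Turn one relational row into (subject, predicate, value) triples.

    The primary key value is minted into the subject URI; every other
    column becomes one predicate with the cell value as object.
    """
    subject = f"{BASE}{table}/{row[primary_key]}"
    return [
        (subject, f"{BASE}{table}#{column}", value)
        for column, value in row.items()
        if column != primary_key
    ]


film_row = {"id": 42, "title": "The Shining", "year": 1980}
triples = row_to_triples("film", "id", film_row)
for triple in triples:
    print(triple)
```

Unlike a numeric key, the resulting URI is meaningful outside the database it came from, which is what allows other datasets to link to it.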
DBpedia Query Demo
SELECT ?person (COUNT(DISTINCT ?spouse) AS ?spouses) WHERE {
  ?person a yago:AmericanFilmActors .
  ?person dbpprop:spouse ?spouse .
}
GROUP BY ?person
ORDER BY DESC(?spouses)
LIMIT 100
LINKED DATA EXAMPLES
Google Knowledge Graph
• Enables search for things (people, places) that Google knows about
• Rooted in public sources such as Freebase, Wikipedia, CIA World Factbook, etc.
  – augmented to 500M objects, 3.5B facts and relationships
• Next generation search (semantic index)
My plan for today…
• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion
Rich Snippets / Microdata
Microdata (HTML5)
• An HTML5 specification used to nest structured data within existing content on Web pages.
• Search engines and browsers can extract and process Microdata and provide a richer browsing experience for users
Microdata Example
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">301 College Avenue</span>
    <span itemprop="addressLocality">Ithaca</span>
    <span itemprop="addressCountry">United States</span>
  </div>
</div>
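To give a feel for how a search engine or browser might read such markup, here is a deliberately simplified extractor built on Python's standard html.parser. It only collects the text of elements that carry an itemprop, flattens nested items, and assumes well-formed markup without void elements such as meta or img. It is a sketch, not an implementation of the Microdata extraction algorithm, and the class name ItempropExtractor is made up.

```python
from html.parser import HTMLParser

SAMPLE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">301 College Avenue</span>
    <span itemprop="addressLocality">Ithaca</span>
    <span itemprop="addressCountry">United States</span>
  </div>
</div>
"""


class ItempropExtractor(HTMLParser):
    """Collects (itemprop, text) pairs; ignores item nesting and item types."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._stack = []  # one [itemprop-or-None, collected text] per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element that opens a nested item (itemscope) has no text value itself.
        name = None if "itemscope" in attrs else attrs.get("itemprop")
        self._stack.append([name, ""])

    def handle_data(self, data):
        # Text is credited to the innermost open element only.
        if self._stack and self._stack[-1][0] is not None:
            self._stack[-1][1] += data

    def handle_endtag(self, tag):
        if not self._stack:
            return
        name, text = self._stack.pop()
        if name is not None:
            self.pairs.append((name, text.strip()))


parser = ItempropExtractor()
parser.feed(SAMPLE)
parser.close()
pairs = parser.pairs
for name, value in pairs:
    print(name, "=", value)
```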
Schema.org
schema.org / Microdata example
<h1>Pirates of the Caribbean: On Stranger Tides (2011)</h1>
Jack Sparrow and Barbossa embark on a quest to find the elusive fountain of youth, only to discover that Blackbeard and his daughter are after it too.
Director: Rob Marshall
Writers: Ted Elliott, Terry Rossio, and 7 more credits
Stars: Johnny Depp, Penelope Cruz, Ian McShane
8/10 stars from 200 users. Reviews: 50.
schema.org / Microdata example
schema.org
• Defines
  – a number of types (e.g., person), organized in an inheritance hierarchy
  – a number of properties (e.g., name)
• Extension mechanisms to extend the schemas
• OWL representation: http://schema.org/docs/schemaorg.owl
• http://schema.rdfs.org/index.html
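The inheritance hierarchy means a type can use its own properties plus those of all its ancestors. The sketch below models a tiny, hand-picked excerpt of schema.org types. The real vocabulary is far larger, each type has many more properties, and the helper all_properties is made up for illustration.

```python
# Hand-picked excerpt for illustration; not the full schema.org vocabulary.
PARENT = {
    "Person": "Thing",
    "CreativeWork": "Thing",
    "Movie": "CreativeWork",
}
OWN_PROPERTIES = {
    "Thing": ["name", "url"],
    "Person": ["birthDate"],
    "CreativeWork": ["author"],
    "Movie": ["director"],
}


def all_properties(type_name):
    """Own properties plus everything inherited, most general type first."""
    props = []
    while type_name is not None:
        props = OWN_PROPERTIES.get(type_name, []) + props
        type_name = PARENT.get(type_name)
    return props


print(all_properties("Movie"))
print(all_properties("Person"))
```

This is why the Person example earlier can use name: the property is defined on Thing and inherited by every other type.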
Open Graph Protocol
My plan for today…
• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion
Open Government Data
Open Government Data
Open Government Data Apps
My plan for today…
• Open Data – The idea
• Implementation #1: Linked Open Data
• Implementation #2: Machine-readable HTML tags
• Open Data Activities in Austria
• Questions / Discussion
Readings
• Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.
• Jason Ronallo: HTML5 Microdata and Schema.org. http://journal.code4lib.org/articles/6400