Open Data - Principles and Techniques

Post on 14-Jul-2015


VU Web Engineering / TU Wien May 15th 2014

Bernhard Haslhofer

About me

• Data Scientist @ AIT - Austrian Institute of Technology

• Previously
– Lecturer & Researcher @ Cornell University, NY, USA
– Univ. Ass. @ University of Vienna
– …

About me

• Research Interests

– Web-based information systems
   • Structured Web Data
   • Knowledge Graphs
   • Data quality issues
   • …
– Large-scale data analytics
   • Machine learning
   • Network analysis
   • Information retrieval

My plan for today…

• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion

Open Data – Principles

“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”

Open Data Handbook, 2012, Open Knowledge Foundation, http://opendatahandbook.org/

P#1: Availability and Access

Data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet.

Data must also be available in a convenient and modifiable form.

http://opendefinition.org/

P#2: Reuse and Redistribution

Data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets.

http://opendefinition.org/

P#3: Universal Participation

Everyone must be able to use, reuse and redistribute (no discrimination).

No ‘non-commercial’ restrictions.

http://opendefinition.org/

Questions

• Do the open data principles sound familiar (to CS students / software engineers)?
• Any known “open data” examples?

Open Data Licensing


Public Domain Dedication


Open Data Movement


Source: http://www.flickr.com/photos/jamescridland/613445810/sizes/l/in/photostream/

Open Government Data


“Decades ago, the US Government made both weather data and the GPS System freely available. Since that time, American entrepreneurs and innovators have utilised these resources to create navigation systems, location-based applications, …”


Open Government Data

Developers Entrepreneurs

Startups

Apps / Services

(Open) Data Journalism


http://datajournalismhandbook.org/

Open Data in Science


Open Data in Science / Open Access


How can we publish and access structured data on the Web?

My plan for today…

• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion

Linked Data

“A method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried”

[Bizer, Heath, Berners-Lee 2009]

Linked Open Data

Open Data + Linked Data = Linked Open Data

Why Linked Data?


Web Architecture


• A set of simple standards
– Uniform global addressing (URI)
– Uniform document encoding (HTML)
– Uniform transportation (HTTP)
• Hyperlinks connecting documents
• Works pretty well for accessing and exchanging documents

How can we publish and access structured data on the Web?

Web Services and Web APIs

Source: http://www.blogperfume.com/new-27-circular-social-media-icons-in-3-sizes/


• Each Web API has a proprietary interface
• Data sources must be known in advance
• Information entities (papers, authors, subjects, etc.) are often not linked

Social Networking Sites as Walled Gardens by David Simonds

Linked Data Vision

• Publish and link structured data on the Web
• Create a single globally connected data space based on the Web Architecture

Web of Linked Data

• A set of simple standards
– Uniform global addressing (URI)
– Uniform data model (RDF)
– Uniform transportation (HTTP)
• RDF links connecting entities
• Forms a global data space and facilitates accessing and exchanging data

What is Linked Data?

• A method to build a Web of Data
• Architectural style, set of standards

Linking Open Data Project

• A W3C community project with the goal to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources

~$ curl -I -H "Accept: text/turtle" http://dbpedia.org/resource/The_Shining_\(film\)
~$ curl -H "Accept: text/turtle" http://dbpedia.org/data/The_Shining_\(film\).ttl

~$ sudo apt-get install raptor   # Linux
~$ brew install raptor           # Mac OSX
~$ rapper http://dbpedia.org/resource/The_Shining_\(film\)
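The curl calls rely on HTTP content negotiation: the client names the RDF serialization it wants in the Accept header. A minimal Python sketch of the same idea, using only the standard library, could look like this. The helper names `build_rdf_request` and `fetch_rdf` are made up for illustration, and it is an assumption that the server still answers for this resource in the same way.

```python
from urllib.request import Request, urlopen


def build_rdf_request(uri, mime_type="text/turtle"):
    """Prepare a GET request that asks the server for an RDF serialization."""
    return Request(uri, headers={"Accept": mime_type})


def fetch_rdf(uri, mime_type="text/turtle", timeout=10):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_rdf_request(uri, mime_type), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_rdf("http://dbpedia.org/resource/The_Shining_(film)")` would then correspond roughly to the curl invocation with the `Accept: text/turtle` header.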

LINKED DATA TECHNOLOGIES


RDF

• A data model for representing data on the Web
• Several statements (triples) form a graph
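To make the "triples form a graph" idea concrete, here is a small sketch in plain Python, without an RDF library. The URIs follow DBpedia naming, but the three statements are written by hand for illustration and were not fetched from DBpedia; `build_graph` and `reachable` are hypothetical helper names.

```python
from collections import defaultdict

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

# Each statement is a (subject, predicate, object) triple (illustrative data).
triples = [
    (DBR + "The_Shining_(film)", DBO + "director", DBR + "Stanley_Kubrick"),
    (DBR + "The_Shining_(film)", DBO + "starring", DBR + "Jack_Nicholson"),
    (DBR + "Stanley_Kubrick", DBO + "birthPlace", DBR + "New_York_City"),
]


def build_graph(statements):
    """Index triples by subject: node -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node that can be reached from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen
```

Because the object of one triple can be the subject of another, following edges from the film leads to the director and from there to the director's birthplace.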

RDF/XML, N3, Turtle, etc.

• Data formats for RDF resource representations

• Used to transfer RDF data between apps
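As a rough illustration of what such a format does, the sketch below writes triples in N-Triples, a line-based syntax that is also valid Turtle. It is a simplification under stated assumptions: objects starting with "http://" or "https://" are treated as URIs, everything else as a plain string literal, and language tags, datatypes and blank nodes are ignored. The function name `to_ntriples` is made up.

```python
def to_ntriples(statements):
    """Serialize (subject, predicate, object) triples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in statements:
        if obj.startswith(("http://", "https://")):
            obj_term = f"<{obj}>"  # URI reference
        else:
            # Plain literal: escape backslashes, quotes and newlines.
            escaped = (
                obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            )
            obj_term = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj_term} .")
    return "\n".join(lines)
```

Each output line is one complete statement, which is what makes this kind of format convenient for moving RDF data between applications.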

RDFS

• A language for describing the syntax and semantics of schemas/vocabularies in a machine-understandable way

http://dbpedia.org/ontology/Film rdfs:subClassOf http://dbpedia.org/ontology/Work
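One consequence of the subclass statement is that anything typed as a Film can also be treated as a Work. The sketch below shows that inference in plain Python. It covers only rdfs:subClassOf, not the rest of RDFS, and the function names and the single data triple are assumptions made for illustration.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"

schema = [(DBO + "Film", RDFS_SUBCLASS_OF, DBO + "Work")]
data = [(DBR + "The_Shining_(film)", RDF_TYPE, DBO + "Film")]  # illustrative


def superclasses(cls, schema_triples):
    """Return all direct and indirect superclasses of `cls`."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in schema_triples:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


def inferred_types(resource, data_triples, schema_triples):
    """Asserted types of `resource` plus every type implied by rdfs:subClassOf."""
    asserted = {o for s, p, o in data_triples if s == resource and p == RDF_TYPE}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, schema_triples)
    return inferred
```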

OWL

• A more expressive (formal) language for defining the syntax and semantics of schemas/vocabularies
• Solves RDFS shortcomings but introduces quite some complexity

SKOS

• A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes)

SPARQL

• A query language and protocol for accessing RDF data on the Web

SELECT DISTINCT ?x WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}
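SPARQL is a protocol as well as a query language: in the simplest case the query text is sent to an endpoint as the `query` parameter of an HTTP GET request. The sketch below only builds such a request and does not send it. The endpoint URL passed in by the caller, the explicit PREFIX line (the slide relies on a prefix predefined by the endpoint) and the helper names are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?x WHERE {
  ?x dcterms:subject <http://dbpedia.org/resource/Category:1980s_horror_films> .
}
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL protocol GET URL; the query travels in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def sparql_request(endpoint, query):
    """Prepare a request that asks for results in the SPARQL JSON results format."""
    return Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
```

Passing the prepared request to `urllib.request.urlopen` would execute the query, for example against `http://dbpedia.org/sparql`, provided that endpoint is reachable.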

Database Systems Analogy...

Purpose | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query | |
Schema Definition Language | |
Data Representation | |
Identifiers | |

?

Database Systems Analogy...

Purpose | Relational Database Management Systems (RDBMS) | Linked Data Technologies
Query | SQL | SPARQL
Schema Definition Language | SQL DDL | RDFS / OWL
Data Representation | Relational Model / Tables | RDF / Graph
Identifiers | Primary Keys (numeric sequences) | URI
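The analogy can be pushed one step further by turning a relational row into triples: the primary key becomes part of a subject URI and every other column becomes a predicate. The sketch below is one possible mapping, not a standard one. The `http://example.org/` base and the URI layout are invented for illustration.

```python
def row_to_triples(table, primary_key, row, base="http://example.org/"):
    """Turn one relational row (a dict) into (subject, predicate, object) triples."""
    # The primary key value identifies the row, so it goes into the subject URI.
    subject = f"{base}{table}/{row[primary_key]}"
    return [
        (subject, f"{base}{table}#{column}", value)
        for column, value in row.items()
        if column != primary_key and value is not None  # NULLs yield no triple
    ]
```

For a row such as `{"id": 7, "title": "The Shining", "year": 1980}` in a table `film`, this produces two triples that share the subject `http://example.org/film/7`.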


DBPedia Query Demo

SELECT ?person (COUNT(DISTINCT ?spouse) AS ?spouses) WHERE {
  ?person a yago:AmericanFilmActors .
  ?person dbpprop:spouse ?spouse .
}
GROUP BY ?person
ORDER BY DESC(?spouses)
LIMIT 100
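An endpoint can return the answer to such a query in the SPARQL JSON results format, with one binding per result row. The sketch below flattens that structure into simple dictionaries. The sample document is written by hand with placeholder URIs and counts; it is not real DBpedia output, and `rows` is a hypothetical helper name.

```python
import json

# Hand-written sample in the SPARQL JSON results layout (invented values).
SAMPLE = """
{
  "head": {"vars": ["person", "spouses"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "http://example.org/actor/A"},
     "spouses": {"type": "literal",
                 "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                 "value": "3"}},
    {"person": {"type": "uri", "value": "http://example.org/actor/B"},
     "spouses": {"type": "literal",
                 "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                 "value": "2"}}
  ]}
}
"""


def rows(result_text):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    doc = json.loads(result_text)
    variables = doc["head"]["vars"]
    return [
        # A variable may be unbound in a row, so only copy the ones present.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]
```

Values arrive as strings, so a count such as `spouses` still has to be converted with `int(...)` before it is used numerically.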

LINKED DATA EXAMPLES


Google Knowledge Graph

• Enables search for things (people, places) that Google knows about
• Rooted in public sources such as Freebase, Wikipedia, CIA World Factbook, etc.
– augmented to 500M objects, 3.5B facts and relationships
• Next generation search (semantic index)

My plan for today…

• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion

Rich Snippets / Microdata


Microdata (HTML5)

• An HTML5 specification used to nest structured data within existing content on Web pages.
• Search engines and browsers can extract and process Microdata and provide a richer browsing experience for users.

Microdata Example

<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">301 College Avenue</span>
    <span itemprop="addressLocality">Ithaca</span>
    <span itemprop="addressCountry">United States</span>
  </div>
</div>
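To show how a consumer might read this markup, here is a simplified extractor built on Python's standard `html.parser`. It is a sketch, not a full implementation of the Microdata extraction algorithm: it assumes well-formed markup with explicit closing tags, takes property values only from element text (not from `href`, `content` or `src` attributes), and keeps only the last value when a property repeats. The class name `MicrodataParser` is made up.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

EXAMPLE = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Bernhard Haslhofer</span>,
  <span itemprop="nickname">behas</span>.
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">301 College Avenue</span>
    <span itemprop="addressLocality">Ithaca</span>
    <span itemprop="addressCountry">United States</span>
  </div>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collect Microdata items whose property values are plain element text."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items found in the document
        self._open = []    # one entry per open element: (kind, payload)
        self._scopes = []  # stack of items currently being filled

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no end tag, so they are not tracked
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and self._scopes:
                self._scopes[-1]["properties"][prop] = item  # nested item
            else:
                self.items.append(item)
            self._scopes.append(item)
            self._open.append(("scope", None))
        elif prop and self._scopes:
            self._open.append(("prop", [prop, self._scopes[-1], []]))
        else:
            self._open.append(("other", None))

    def handle_data(self, data):
        # Text belongs to every property element that is currently open.
        for kind, payload in self._open:
            if kind == "prop":
                payload[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        kind, payload = self._open.pop()
        if kind == "scope":
            self._scopes.pop()
        elif kind == "prop":
            name, item, chunks = payload
            item["properties"][name] = "".join(chunks).strip()
```

After `parser = MicrodataParser()` and `parser.feed(EXAMPLE)`, `parser.items` holds one Person item whose `address` property is itself a nested PostalAddress item.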

Schema.org

schema.org / Microdata example

<h1>Pirates of the Caribbean: On Stranger Tides (2011)</h1>
Jack Sparrow and Barbossa embark on a quest to find the elusive fountain of youth, only to discover that Blackbeard and his daughter are after it too.
Director: Rob Marshall
Writers: Ted Elliott, Terry Rossio, and 7 more credits
Stars: Johnny Depp, Penelope Cruz, Ian McShane
8/10 stars from 200 users. Reviews: 50.


schema.org

• Defines
– a number of types (e.g., Person), organized in an inheritance hierarchy
– a number of properties (e.g., name)
• Extension mechanisms to extend the schemas
• OWL representation: http://schema.org/docs/schemaorg.owl
• http://schema.rdfs.org/index.html

Open Graph Protocol


My plan for today…

• Open Data – Principles and Examples
• Technique #1: Linked (Open) Data
• Technique #2: Microdata
• Open Data Activities in Austria
• Questions / Discussion

Open Government Data


Open Government Data Apps


My plan for today…

• Open Data – The idea
• Implementation #1: Linked Open Data
• Implementation #2: Machine-readable HTML tags
• Open Data Activities in Austria
• Questions / Discussion

Readings

• Tom Heath and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool.
• Jason Ronallo: HTML5 Microdata and Schema.org. http://journal.code4lib.org/articles/6400