Linked Open Data in Digital Humanities - Judaica Europeana

Linked Open Data in Digital Humanities

Christian Morbidoni <[email protected]>

thanks to Michele Barbera for the slides :-)

This work is licensed under a Creative Commons Attribution-NoDerivs 2.5 Italy license

Friday, July 30, 2010

Mission

Net7

Scientific knowledge

Business data

Cultural Heritage

Make knowledge more usable and accessible by applying Semantic Web technologies.

From the Cathedral to the Bazaar

Philosophy

We use and promote Open Source and Free Software

We support Open Access to scientific knowledge and cultural heritage

Location

Knowledge and innovation area

4 Universities & research Institutions:

- Università di Pisa
- Scuola Normale Superiore
- Scuola Superiore Sant’Anna
- CNR

2 Science Parks:

- Polo Tecnologico di Navacchio
- Pontech

Consorzio Apice:

- 25 High-Tech SMEs
- Turnover: 10 MEUR
- Operators: 250

~1000 High-Tech SMEs

>7000 knowledge workers

The province of Pisa has ~380,000 inhabitants

History

2002

Riccardo - 28, Federico - 27, Michele - 23, Alessio - 29, Simone - 27, Roan - 28, Armando - 28

Founded by 7 people: 3 computer scientists, 1 historian, 2 physicists, 1 film director

LABS: Pisa, Tuscany

Capital: 4k EUR

2010 (now)

Investment in (own) Research & Innovation in 2009: 45%

LABS in 2010: Pisa, Tuscany - Lecce, Puglia

Michele - 29, Federico - 33, Riccardo - 34, Alessio - 35, Luca - 38, Nicola - 32, Daniel - 32, Evelyne - 39, Matteo - 46, Enrico - 45, Giovanni - 26, Romeo - 26, Simone - 27, Danilo - 32, Massimiliano - 28, Massimiliano - 32, Marco - 33, Adrian - 38

Some Clients & Partners

Italy
- Tuscany Region
- Province of Pisa
- Municipality of Milan
- Scuola Normale Superiore
- University of Pisa
- Scuola Superiore Sant’Anna
- University of Bologna
- University of Bergamo
- Cilea
- UNICREDIT
- Intesa Sanpaolo Spa
- Connexia Spa

France
- INRIA
- CNRS
- Université Paris-Nord 13
- Maison des Sciences de l'Homme

Poland
- Knowledgehives Ltd

Cyprus
- Cyprus Institute

Ireland
- Trinity College Dublin
- Digital Enterprise Research Institute
- IBM

UK
- Oxford Internet Institute
- De Montfort University
- In2 Ltd.
- King’s College London

Germany
- Universität München
- Staatsbibliothek zu Berlin
- Humboldt-Universität zu Berlin

Norway
- University of Bergen
- AKSIS

Belgium
- Id Consulting

Areas: Cultural Heritage, Knowledge Management, Document Management, eGovernment

Publishing on the Web is crucial...

fancy Web sites are nice

so are carefully marked-up texts

but

are they really accessible?

peer review and quality of data are important things

except the notion of “quality” changes over time and space and culture and...

My boss allowed me to do it* in my spare time...

* the Web!

Researchers are no longer found only in academia*

* See also recent speeches of EU Commissioner M. Geoghegan-Quinn and the Europe 2020 strategy

Wikinomics and the Era of Openness: European Innovation at a Crossroads

e-brief published by the Lisbon Council

...the primary difference between the old and new [innovation] system can perhaps best be captured by a single concept of profound ramifications: openness.

This openness stands in stark contrast to the closed innovation systems of the past – systems that produced new products in remote, closed-off laboratories; systems in which intellectual property was never shared but fiercely guarded with the help of patents; systems which were built on incremental, even slow and predictable change. Today, even traditional “brick-and-mortar” industries are opening up their innovation processes, sometimes even sharing their intellectual property and releasing patents. Consumers and users are no longer passive recipients of products that companies produce but are co-creators and valuable sources of intelligence and new ideas. [...]

Europe is already an active contributor and enthusiastic participant in this new, open era of innovation [driven by] the brilliance of collaborative communities [...] In many ways, what we are witnessing is nothing short of the “democratisation” of innovation, empowering millions of people [...] It holds huge potential for citizens, consumers, companies and governments alike, and it must be urgently recognised as a profound and lasting paradigm shift that effects not only the private sector but also society at large.


research is no longer done only here

researchers are all around us

so

if we want to advance human knowledge (our mission?)

we can no longer

keep data hidden on our hard disks

nor

in our fancy websites with TEI-aware, user-adaptive, context-sensitive, whatever search engines

we must allow

people (and the next generations) to freely re-use, re-think, re-mix the knowledge we encoded into machines

possibly lots of people

Internet users

in Europe: 402,380,474 (~50%)

in the world: 1,668,870,408

average growth from 2000 to 2009: 362.3%

so we can’t hide data because

data is less valuable when it’s isolated

value is in

context

linking

someone understood this and...

in the nineties he built the Web

but it didn’t work as he imagined

it became a Web of Documents meant for Human consumption (only)

machines had no way of understanding

that two documents talk about the same “thing”

semantics is hidden, only humans (and not all of them) can extract it from documents

machines are helpless

unless...

we explicitly tell machines what we are talking about

... and we keep it simple...

... and we use simple, structured language that they can understand.

this is what the Semantic Web (and Linked Data, and the Web of Data) is all about.

FROM Wikipedia:

DONATELLO is from FLORENCE
DONATELLO worked with BRUNELLESCHI
DONATELLO made DAVID

FROM the museum web site:

DAVID is a Marble Statue
DAVID is exposed in Museo Nazionale del Bargello
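The two groups of statements on this slide can be sketched as sets of (subject, predicate, object) triples. This is only an illustration in plain Python, not the tooling used in the talk; the helper names `objects` and `where_are_works_of` are made up for the example. It shows that once both sources use the same name for DAVID, merging them is a simple union, and a question that neither source answers alone becomes answerable.

```python
# Each statement is a (subject, predicate, object) triple.
# The names mirror the slide; the two sets stand in for two independent sources.
from_wikipedia = {
    ("DONATELLO", "is from", "FLORENCE"),
    ("DONATELLO", "worked with", "BRUNELLESCHI"),
    ("DONATELLO", "made", "DAVID"),
}
from_museum = {
    ("DAVID", "is a", "Marble Statue"),
    ("DAVID", "is exposed in", "Museo Nazionale del Bargello"),
}

# Merging independently produced graphs is a plain set union,
# because both sources use the same name for the same thing.
merged = from_wikipedia | from_museum


def objects(graph, subject, predicate):
    """Return every object stated for (subject, predicate) in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def where_are_works_of(graph, artist):
    """Hypothetical helper: places where works made by the artist are shown."""
    places = set()
    for work in objects(graph, artist, "made"):
        places |= objects(graph, work, "is exposed in")
    return places


print(where_are_works_of(merged, "DONATELLO"))
```

Run against `from_wikipedia` alone, the same question returns an empty set, which is the point of linking the two sources.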

Linking ?

The Web is great because it is so simple to jump from one web page to another

With the Semantic Web it should be just as easy to jump from one database to another:

Imagine separate clouds of data that are ready to interoperate with each other ...

as if they were a single worldwide database
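A rough sketch of "jumping from one database to another" follows. It assumes two in-memory datasets keyed by the base URI that would serve them; every URI and property name (`made`, `same_as`, `located_in`) is hypothetical and chosen only for illustration. A real deployment would fetch descriptions over HTTP instead of reading a dictionary.

```python
# Two independently published datasets, keyed by the base URI that serves them.
# All URIs and property names are hypothetical.
DATASETS = {
    "http://art.example.org/": {
        ("http://art.example.org/donatello", "made",
         "http://art.example.org/david"),
        ("http://art.example.org/david", "same_as",
         "http://museum.example.org/item/42"),
    },
    "http://museum.example.org/": {
        ("http://museum.example.org/item/42", "located_in",
         "http://museum.example.org/bargello"),
    },
}


def describe(uri):
    """Look up the statements about a URI in whichever dataset owns it."""
    for base, triples in DATASETS.items():
        if uri.startswith(base):
            return {(p, o) for s, p, o in triples if s == uri}
    return set()


def follow(uri, path):
    """Follow a sequence of properties from uri, crossing datasets as needed."""
    current = {uri}
    for prop in path:
        current = {o for u in current for p, o in describe(u) if p == prop}
    return current


print(follow("http://art.example.org/donatello",
             ["made", "same_as", "located_in"]))
```

The caller never says which dataset holds which statement; the link between the two identifiers for the statue is what lets the query cross over.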

Web of Documents vs. Web of Data

Web of Documents: presentation layer, connected by hyperlinks

Web of Data: data representation layer (RDF data), connected by semantically defined relations

Both sit on the same Web infrastructure

Feb 2008

Jun 2009

Web 1.0 - Web of Documents: hidden semantics, no access to data

Web 2.0 - Social Web: hidden semantics, partial access to data, 1-to-1 interoperability

Web 3.0 - Web of Data: explicit semantics, full access to data, global interoperability

An important consideration

The Semantic Web is not magic

It doesn’t necessarily mean artificial intelligence

It aims at structuring knowledge better, so that:

- Independently produced knowledge can be more easily merged
- Applications can be smarter in collecting, managing and presenting knowledge

after Tim Berners-Lee's call for “Raw data now!” at TED in 2009, there has been huge growth in LOD

UK government

Swedish government

Austrian government

BBC

New York Times

Library of Congress

German National Library

Europeana

Municipalities of San Francisco, Toronto, London, Zaragoza

Google

Yahoo

Facebook

today we have more than 15 billion triples

what for?

to let other people use our data in new and unanticipated ways

to let them do research, develop applications and mash-ups, play, do business...

just a couple of examples

freebase.com

data.gov.uk

geonames.org+

http://www.where-can-i-live.com/londonproperty

http://www.wheredoesmymoneygo.org/

http://www.we-love-the.net/FreeInfluencer/


the web is becoming a huge distributed global database

we want to be part of it

yes we can!*

* with Muruca

Muruca

Semantic Web, Open Source, Open Access

A set of tools (not an out-of-the-box product) to build research platforms and digital libraries in the Linked Data Cloud

Led by Net7 with University of Ancona, DERI Galway and other contributors

‣ Talia + SWickyNotes (to be merged soon)

‣ IIPImage + IIPImageAnnotator

‣ Linked Data recommender & search

Talia & SWickyNotes

[Diagram labels: SWickyNotes; semantic annotations; import or run-time mashup; XHTML; RDF(S); navigation based on contextual semantic information; Talia-based web site; site ontology; http://www.example.com/ex1.html]

• SWickyNotes users add semantic annotations to web pages or to zones of a page

• e.g. this phrase (fragment) contradicts that other one; this page mentions a molecule; this word is a place.

• Semantic annotations are published into Talia

• Talia uses semantic annotations to enrich the browsing and searching experience of its users.
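The three kinds of annotation listed above can be sketched as small records that point at a page or at a zone of it. This is an assumption-laden illustration, not the SWickyNotes or Talia data model: the `Annotation` class, the fragment identifiers, the relation names and the molecule label are all invented. Only the page URL comes from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass

PAGE = "http://www.example.com/ex1.html"  # page URL taken from the slide


@dataclass(frozen=True)
class Annotation:
    target: str    # URI of a page, or of a zone (fragment) inside it
    relation: str  # what is being claimed about the target
    value: str     # another URI or a plain label


# Hypothetical annotations; fragment ids and relation names are made up.
ANNOTATIONS = [
    Annotation(PAGE + "#phrase1", "contradicts", PAGE + "#phrase2"),
    Annotation(PAGE, "mentions_molecule", "caffeine"),
    Annotation(PAGE + "#word7", "is_a", "place"),
]


def annotations_for_page(page_uri, annotations):
    """Group the annotations on a page (or on any zone of it) by relation."""
    grouped = defaultdict(list)
    for a in annotations:
        if a.target == page_uri or a.target.startswith(page_uri + "#"):
            grouped[a.relation].append(a)
    return dict(grouped)


for relation, found in annotations_for_page(PAGE, ANNOTATIONS).items():
    print(relation, [a.value for a in found])
```

A site that receives such records could use the grouping to offer extra navigation, for example "show all pages that mention this molecule".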

Talia & SWickyNotes

Semantic enrichment in practice

Puccini Source

Text

http://puccini-source.com/sources/Bio

Stable URL: http://taliaexample.com/puccini/xpointer(id('123'))

Stable URL: http://puccini-source/sources/Bio/xpointer(id('123'))

Example:

http://puccini-source.com/sources/Bio#xpointer(id('123')) is part of http://puccini-source.com/sources/Bio

http://puccini-source.com/sources/Bio#xpointer(id('123')) has the comment "This is partially true because most recent research found out that Puccini bla bla bla..."

RDF/XML
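The Puccini example can be sketched as two statements about a fragment URI. The slide names RDF/XML as the output; the sketch below emits N-Triples instead, simply because that syntax is easy to write by hand with the standard library. The choice of `rdfs:comment` and Dublin Core `isPartOf` for the slide's "comment" and "part of" arrows is an assumption, and the function names are hypothetical.

```python
# Property URIs chosen for this sketch; the slide only says "comment" and "part of".
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"


def fragment_uri(page_uri, element_id):
    """Build a stable URI for one element of a page, XPointer style."""
    return f"{page_uri}#xpointer(id('{element_id}'))"


def iri(uri):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{uri}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def annotate(page_uri, element_id, comment):
    """Return N-Triples lines linking a fragment to its page and to a comment."""
    frag = fragment_uri(page_uri, element_id)
    return [
        f"{iri(frag)} {iri(DCTERMS_IS_PART_OF)} {iri(page_uri)} .",
        f"{iri(frag)} {iri(RDFS_COMMENT)} {literal(comment)} .",
    ]


for line in annotate("http://puccini-source.com/sources/Bio", "123",
                     "This is partially true because...."):
    print(line)
```

Because the fragment has its own URI, the comment lives apart from the source text and can be published or merged like any other data.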

Image Annotation

target

• digital humanities

• medicine

approach

example.com/mymanuscript/

example.com/shape1

example.com/shape2

example.com/shape3

example.com/shape1 onto:belongs_to example.com/mymanuscript/

example.com/shape2 onto:belongs_to example.com/mymanuscript/

example.com/shape3 onto:belongs_to example.com/mymanuscript/

onto:coordinates example.com/svgfile.xml

http://example.com/myImageAnnotations.rdf

example.com/shape654/

???

example.com/textFragment39847/

??? ???

another.com/pagex/

example.com/concepts/democracy

???
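A minimal sketch of the image annotation graph, again as plain triples. The identifiers and the `onto:belongs_to` and `onto:coordinates` properties are the ones on the slide. Two things are assumptions and are marked as such in the code: which shape the coordinates statement belongs to, and the property `onto:depicts`, which stands in for a link the slide leaves as "???".

```python
MANUSCRIPT = "example.com/mymanuscript/"

# Statements kept apart from the image file itself.
TRIPLES = [
    ("example.com/shape1", "onto:belongs_to", MANUSCRIPT),
    ("example.com/shape2", "onto:belongs_to", MANUSCRIPT),
    ("example.com/shape3", "onto:belongs_to", MANUSCRIPT),
    # Assumption: shape1 is the shape whose outline is stored in the SVG file.
    ("example.com/shape1", "onto:coordinates", "example.com/svgfile.xml"),
    # Assumption: "onto:depicts" is a made-up property; the slide shows "???".
    ("example.com/shape654/", "onto:depicts", "example.com/concepts/democracy"),
]


def shapes_on(image_uri, triples):
    """List the shapes that belong to an image, in a stable order."""
    return sorted(s for s, p, o in triples
                  if p == "onto:belongs_to" and o == image_uri)


def statements_about(shape_uri, triples):
    """Collect everything said about one shape as {property: [values]}."""
    result = {}
    for s, p, o in triples:
        if s == shape_uri:
            result.setdefault(p, []).append(o)
    return result


print(shapes_on(MANUSCRIPT, TRIPLES))
print(statements_about("example.com/shape1", TRIPLES))
```

Nothing here touches the image: shapes, coordinates and links are all separate resources that refer to it by URI.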

web

• browser application

• editor and viewer versions

• embeddable and scriptable

• viewer based on OpenZoom

• based on multiple technologies on the server side (see next slides)


links

• link shapes (URIs) to everything (URIs)

• other images

• shapes in other images

• ASCII text

• non-TEI XML

• webpages

stand-off

• anchors are stand-off

• coordinates are stand-off

• links are stand-off (instances of ontology properties)

• targets are stand-off (anything has a URI)


base layers

• shapes on server-side gigabyte images

• shapes on normal flat images

• shapes on tiled pyramidal GIS maps

• shapes on multispectral pyramidal images

annotation layers

• sublevels

• more shapes on the same level (groups)


Thank you!


Some applications

• Zemanta.

• With a bit of text processing, things mentioned in a text can be automatically linked to the blogosphere, Wikipedia and more.
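The "bit of text processing" can be as simple as looking up known names in a text. The sketch below is a plain dictionary lookup and makes no claim about how Zemanta works; the `GAZETTEER` table and the `link_entities` function are hypothetical, and the DBpedia URIs are used only as illustrative link targets.

```python
import re

# Hypothetical lookup table from names to Linked Data URIs.
GAZETTEER = {
    "Donatello": "http://dbpedia.org/resource/Donatello",
    "Florence": "http://dbpedia.org/resource/Florence",
    "Filippo Brunelleschi": "http://dbpedia.org/resource/Filippo_Brunelleschi",
}


def link_entities(text, gazetteer=GAZETTEER):
    """Return (offset, label, uri) for every known name found in the text."""
    # Try longer names first so multi-word names are not split.
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b"
    )
    return [(m.start(), m.group(1), gazetteer[m.group(1)])
            for m in pattern.finditer(text)]


for offset, label, uri in link_entities(
        "Donatello was born in Florence and worked with Filippo Brunelleschi."):
    print(offset, label, uri)
```

Real entity linkers also have to disambiguate names that have several meanings; this lookup skips that step entirely.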