Keynote csws2013

Linked Data for Digital Heritage and History Victor de Boer VU University Amsterdam Keynote CSWS 2013 Shanghai

description

Keynote presentation for CSWS 2013 Conference in Shanghai, China. Some slides borrowed from Jan Wielemaker, Guus Schreiber, Jacco van Ossenbruggen, Niels Ockeloen, Antske Fokkens, Serge ter Braake.

Transcript of Keynote csws2013

Page 1: Keynote csws2013

Linked Data for Digital Heritage and History

Victor de Boer

VU University Amsterdam Keynote CSWS 2013 Shanghai

Page 2: Keynote csws2013

About me

Victor de Boer

Assistant professor at VU University Amsterdam

Domain-driven Semantic Technologies, Linked Data

Cultural Heritage

Digital History

Linked Data for Development

Page 3: Keynote csws2013

Linked Data is “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” -- Wikipedia

Page 4: Keynote csws2013

The evolution of Science

Antonie van Leeuwenhoek’s microscope (17th C.)

Large Hadron Collider in Switzerland (21st C.)

Page 5: Keynote csws2013

Why Linked Data for E-science

Large amounts of data

Efficient analysis, data mining

Sharing data, information and knowledge between scientists

Across continents

Across disciplines

Page 6: Keynote csws2013

OpenPhacts explorer

http://www.openphacts.org/

Page 7: Keynote csws2013
Page 8: Keynote csws2013

But what about the humanities?

Page 9: Keynote csws2013

Cultural Heritage

Page 10: Keynote csws2013

MultimediaN E-Culture project

• Museums have increasingly nice websites

• But: most of them are driven by stand-alone collection databases

• Data is isolated, both syntactically and semantically

• If users can do cross-collection search, the individual collections become more valuable!

• Semantic Search

Page 12: Keynote csws2013

Search for objects which are linked via concepts (semantic link)

China

Kanton

PartOf

Query “China”

Use the type of semantic link to provide meaningful presentation of the search results

Rijksmuseum: View of Canton, with two Dutch ships

Semantic Search
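
The search idea on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the identifiers, the `depicts` predicate and the in-memory triple list are hypothetical and are not taken from the E-Culture system. It shows how a query for "China" can reach an object about Kanton through a PartOf link, and how the link used can be reported with the result.

```python
# Toy triple store: (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("place:Kanton", "partOf", "place:China"),
    ("obj:view_of_canton", "depicts", "place:Kanton"),
    ("obj:view_of_canton", "title", "View of Canton, with two Dutch ships"),
]

def narrower_places(place, triples):
    """Return the place plus everything reachable through inverse partOf links."""
    found, frontier = {place}, [place]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "partOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def semantic_search(place, triples):
    """Find objects depicting the place or any part of it, with a note on the link used."""
    places = narrower_places(place, triples)
    results = []
    for s, p, o in triples:
        if p == "depicts" and o in places:
            if o == place:
                note = f"depicts {place}"
            else:
                note = f"depicts {o}, which is part of {place}"
            results.append((s, note))
    return results

hits = semantic_search("place:China", TRIPLES)
print(hits)
```

Returning the note alongside each hit is what allows the results to be grouped by the type of semantic link.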

Page 13: Keynote csws2013

Vocabulary alignment

• In large virtual collections there are always multiple vocabularies, each with its own perspective

– In multiple languages

– You can’t just merge them

• But you can use vocabularies jointly by defining a limited set of links

• It is surprising what you can do with just a few links

Page 14: Keynote csws2013

Vocabulary alignment

“Easel-pieces”

RMA concept “Schilderij”

RMA is the thesaurus of Rijksmuseum

AAT artefact type “Easel Piece” “Painting”

AAT is Getty’s Art & Architecture Thesaurus
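
A minimal sketch of how a single alignment link lets two vocabularies be used jointly. The concept and object identifiers below are hypothetical; only the labels "Schilderij" and "Painting" come from the slide, and the link is treated as symmetric for simplicity.

```python
# Hypothetical identifiers for the RMA and AAT concepts named on the slide.
ALIGNMENT = {("rma:schilderij", "aat:painting")}  # one cross-vocabulary link

# Hypothetical objects, each indexed with a concept from a different vocabulary.
OBJECT_TYPES = {
    "rijksmuseum:obj1": "rma:schilderij",
    "other:obj2": "aat:painting",
}

def equivalent_concepts(concept, alignment):
    """Return the concept plus every concept directly linked to it."""
    same = {concept}
    for a, b in alignment:
        if a == concept:
            same.add(b)
        elif b == concept:
            same.add(a)
    return same

def objects_of_type(concept, object_types, alignment):
    """Find objects typed with the concept or with an aligned concept."""
    wanted = equivalent_concepts(concept, alignment)
    return sorted(obj for obj, typ in object_types.items() if typ in wanted)

print(objects_of_type("aat:painting", OBJECT_TYPES, ALIGNMENT))
```

The vocabularies themselves stay untouched; only the small set of links is added, which is the point made on the previous slide.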

Page 15: Keynote csws2013

http://e-culture.multimedian.nl/

Page 16: Keynote csws2013

Amsterdam Museum as Linked Open Data

Page 17: Keynote csws2013


Page 18: Keynote csws2013

Amsterdam Museum

• Formerly Amsterdam Historic Museum

– “The rich collection of works of art, objects and archaeological finds brings to life the fortunes of Amsterdammers of days gone by and today.”

• In March 2010 published their whole collection online

– 70,000 objects

– CC license

Page 19: Keynote csws2013

Requirements for conversion and linking

• Transparent conversion and linking of the data

– Use of provenance and reproducibility

• keep original complexities of the data

• while making it interoperable with other (Europeana) data

• Retain the relation to original data


Page 20: Keynote csws2013

Methods

ClioPatria

XMLRDF

1. XML ingestion (OAI)

2. Direct transformation to ‘crude’ RDF

3. Interactive RDF restructuring

4. Create a metadata mapping schema

5. Align vocabularies with external sources

6. Publish as Linked Data

Amalgame

Tools
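
Step 2 of the method, the direct transformation to ‘crude’ RDF, can be illustrated roughly as follows. This is not the XMLRDF tool (which runs inside ClioPatria); it is a Python sketch using only the standard library, and the XML record layout and the `title` field are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the real Amsterdam Museum XML layout is not shown in the slides.
RECORD_XML = """
<record id="22093">
  <title>Portrait</title>
  <contentPersonName>Job Cohen</contentPersonName>
</record>
"""

def crude_triples(xml_text, prefix="am"):
    """Turn every child element of a record into one (subject, predicate, literal) triple."""
    root = ET.fromstring(xml_text.strip())
    subject = f"{prefix}:obj_{root.get('id')}"
    triples = []
    for child in root:
        text = (child.text or "").strip()
        if text:  # skip empty elements
            triples.append((subject, f"{prefix}:{child.tag}", text))
    return triples

for triple in crude_triples(RECORD_XML):
    print(triple)
```

The output deliberately mirrors the source structure one-to-one; restructuring and mapping to shared vocabularies happen in the later steps.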

Page 21: Keynote csws2013

ClioPatria.swi-prolog.org

ClioPatria is powered by

Page 22: Keynote csws2013

XMLRDF rewriting rules examples

Page 23: Keynote csws2013

Mapping to popular vocabularies

am:obj_22093 “Job Cohen” am:contentPersonName

rdfs:subPropertyOf

dcterms:subject
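
The effect of this mapping can be sketched with a small inference step. The triples are the ones on the slide; the in-memory representation and function names are illustrative assumptions. Because am:contentPersonName is declared a sub-property of dcterms:subject, a client that only knows dcterms:subject still finds "Job Cohen", while the original, more specific property is kept.

```python
TRIPLES = [
    ("am:obj_22093", "am:contentPersonName", "Job Cohen"),
    ("am:contentPersonName", "rdfs:subPropertyOf", "dcterms:subject"),
]

def super_properties(prop, triples):
    """All properties that prop specialises, following rdfs:subPropertyOf transitively."""
    result, frontier = set(), [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subPropertyOf" and o not in result:
                result.add(o)
                frontier.append(o)
    return result

def infer(triples):
    """Add (s, super, o) for every triple (s, p, o) where p has a super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        for sup in super_properties(p, triples):
            inferred.add((s, sup, o))
    return inferred

subjects = [o for s, p, o in infer(TRIPLES) if s == "am:obj_22093" and p == "dcterms:subject"]
print(subjects)
```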

Page 24: Keynote csws2013

Amalgame alignment platform

• Semi-automatic linking

– Simple automatic techniques,

– chained together by hand

• Transparent and interactive
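
The idea of chaining simple techniques can be sketched as follows. This is not Amalgame's implementation; the two matchers, the concept identifiers and the labels are assumptions chosen for illustration. Each matcher only sees the concepts that earlier steps left unmatched, and the leftover set stays visible for manual inspection.

```python
def exact_label_match(source, target):
    """Link concepts whose labels are identical strings."""
    by_label = {label: cid for cid, label in target.items()}
    return {(sid, by_label[label]) for sid, label in source.items() if label in by_label}

def normalise(label):
    """Lower-case the label and treat hyphens as spaces."""
    return label.lower().replace("-", " ").strip()

def normalised_label_match(source, target):
    """Link concepts whose labels are equal after normalisation."""
    by_label = {normalise(label): cid for cid, label in target.items()}
    return {(sid, by_label[normalise(label)])
            for sid, label in source.items() if normalise(label) in by_label}

def chain(source, target, matchers):
    """Apply matchers in order; each one only sees still-unmatched source concepts."""
    links, remaining = set(), dict(source)
    for matcher in matchers:
        new_links = matcher(remaining, target)
        links |= new_links
        for sid, _ in new_links:
            remaining.pop(sid, None)
    return links, remaining

# Hypothetical vocabularies: concept id -> label.
SOURCE = {"a:1": "Painting", "a:2": "easel-pieces", "a:3": "Unmatched thing"}
TARGET = {"b:1": "Painting", "b:2": "Easel pieces"}

links, remaining = chain(SOURCE, TARGET, [exact_label_match, normalised_label_match])
print(sorted(links), remaining)
```

Which matchers to chain, and in what order, is the decision left to the human operator.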

Page 26: Keynote csws2013

E-history (digital history)

Page 27: Keynote csws2013

BiographyNet

Page 28: Keynote csws2013

(Narrative) historical methodology

• Historical facts derived mainly from archival findings and existing literature

• Historians put them together into a narrative/synthesis.

– The Narrative: a historical synthesis which cannot be scientifically proven (only made likely), based on facts which can be proven or falsified. There is necessarily a creative element in drawing up a narrative.

Slides by BiographyNet team

Page 29: Keynote csws2013

Where do eScience and Biographical History meet?

• Quantitative analyses of a larger group of people (prosopography). Surpassing the anecdotal.

• Finding relations/networks between people which are otherwise hard to detect

Page 30: Keynote csws2013

Linked Data for BiographyNet

[Diagram: the person “Thorbecke” is linked to Biographical Descriptions. Each description carries Provenance Meta Data (source NNBW, or Enrichment by an NLP Tool), Person Meta Data (name “Thorbecke”) and Biography Parts (the text). The metadata yields an Event: Birth, 1798. From the text “Johan Rudolph Thorbecke was born on 14 January 1798 in Zwolle and comes from a half-German…” the NLP tool extracts an Event: Birth, Zwolle, 1798-01-14.]

Page 31: Keynote csws2013

Prototype under development

The information provided by the first system can be used to:

1. Identify alternative descriptions of events (same time, location and/or participants)

2. Identify relations between events (same locations & time, consequent events, same participants, etc.)

3. Initial networks of people

http://www.biographynet.nl
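
Use 1, identifying alternative descriptions of the same event, might look roughly like this. The sketch is an assumption about one possible approach, not the prototype's code: events are grouped on type, person and year, so that a metadata record giving only "1798" and an NLP extraction giving "1798-01-14" end up together. The third event is hypothetical filler data.

```python
from collections import defaultdict

EVENTS = [
    {"source": "NNBW metadata", "type": "Birth", "person": "Thorbecke",
     "date": "1798", "place": None},
    {"source": "NLP tool", "type": "Birth", "person": "Thorbecke",
     "date": "1798-01-14", "place": "Zwolle"},
    # Hypothetical unrelated event, included only to show that it is not grouped.
    {"source": "NLP tool", "type": "Birth", "person": "Someone Else",
     "date": "1800-05-01", "place": "Leiden"},
]

def alternative_descriptions(events):
    """Group events that share type, person and year; keep groups with more than one member."""
    groups = defaultdict(list)
    for event in events:
        key = (event["type"], event["person"], event["date"][:4])
        groups[key].append(event)
    return {key: members for key, members in groups.items() if len(members) > 1}

for key, members in alternative_descriptions(EVENTS).items():
    print(key, [m["source"] for m in members])
```

Keeping the source with every description preserves provenance, so a historian can see which claim came from where.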

Page 32: Keynote csws2013

Verrijkt Koninkrijk

Page 33: Keynote csws2013

History of German-occupied Dutch society (1940-1945)

Published between 1969 and 1991 in 14 volumes, 30 parts, 18,000 pages

1. Digitization, 2. Open Data, 3. Enriched access with Linked Open Data

Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog

(The Kingdom of the Netherlands During World War II )

Page 34: Keynote csws2013

country, collection, doc-type, volume, chapter, section, sub-section, paragraph

Page 35: Keynote csws2013

Back of the Book Vocabulary +

Named Entity Vocabulary

SKOS vocabularies as stepping stones

Page 36: Keynote csws2013

http://semanticweb.cs.vu.nl/verrijktkoninkrijk/

Page 37: Keynote csws2013

niod:Blitzkrieg

niod:parRef

niod:oai_wo2_niod_nl_rec_102045 dct:subject

http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386

botb:Blitzkrieg

skos:exactMatch

Page 38: Keynote csws2013

skos:exactMatch

skos:exactMatch

Page 39: Keynote csws2013

botb:sjanghai

dbpedianl:sjanghai

dbpedia:sjanghai

owl:sameAs

Шанхай

Thượng Hải

上海市

Xangai

Šanghaj

Shanghai

Shanghai

rdfs:label

dbpedia:

Shanghai_Jiao_Tong_University

dbp:is_city_of
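
The chain on this slide can be followed programmatically. The sketch below treats every link between the three Shanghai resources as an identity link (an assumption; the slide shows owl:sameAs and the preceding slides show skos:exactMatch) and collects the labels reachable from the book's own index term. Only a few of the labels from the slide are included.

```python
# Identity links between resources, taken as undirected for this sketch.
LINKS = [
    ("botb:sjanghai", "dbpedianl:sjanghai"),
    ("dbpedianl:sjanghai", "dbpedia:sjanghai"),
]

# A subset of the rdfs:label values shown on the slide.
LABELS = {"dbpedia:sjanghai": ["Shanghai", "上海市", "Шанхай", "Xangai"]}

def identity_closure(start, links):
    """All resources reachable from start by following identity links in either direction."""
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for a, b in links:
            for src, dst in ((a, b), (b, a)):
                if src == current and dst not in seen:
                    seen.add(dst)
                    frontier.append(dst)
    return seen

def labels_for(resource, links, labels):
    """Collect labels attached to the resource or to anything identified with it."""
    collected = []
    for res in sorted(identity_closure(resource, links)):
        collected.extend(labels.get(res, []))
    return collected

print(labels_for("botb:sjanghai", LINKS, LABELS))
```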

Page 40: Keynote csws2013

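SELECT * WHERE {
  ?s skos:prefLabel ?pl .          # index term and its label
  ?s skos:closeMatch ?geo .        # aligned GeoNames place
  ?geo gn:parentADM1 ?prov .       # first-level administrative division of that place
  ?prov gn:name ?provname .        # name of that division
  ?s niod:pageRef ?pref .          # reference from the index term into the text
}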

[Bar chart: Geographical analysis using background knowledge from GeoNames. Series: NE index, BotB index. Value axis from 0 to 12000.]
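
To make the query pattern concrete, here is a rough Python equivalent of the join, followed by the per-province count that a chart like this needs. The triples are hypothetical sample data (only the property names come from the query), and a real analysis would run the SPARQL query against the endpoint instead.

```python
from collections import Counter

# Hypothetical sample data using the properties from the query.
TRIPLES = [
    ("botb:zwolle", "skos:prefLabel", "Zwolle"),
    ("botb:zwolle", "skos:closeMatch", "geo:zwolle"),
    ("geo:zwolle", "gn:parentADM1", "geo:overijssel"),
    ("geo:overijssel", "gn:name", "Overijssel"),
    ("botb:zwolle", "niod:pageRef", "page:1"),
    ("botb:zwolle", "niod:pageRef", "page:2"),
]

def objects(subject, predicate, triples):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def page_refs_per_province(triples):
    """Mirror the query's five triple patterns and count result rows per province name."""
    counts = Counter()
    for s, p, _label in triples:
        if p != "skos:prefLabel":
            continue
        for geo in objects(s, "skos:closeMatch", triples):
            for prov in objects(geo, "gn:parentADM1", triples):
                for provname in objects(prov, "gn:name", triples):
                    for _pref in objects(s, "niod:pageRef", triples):
                        counts[provname] += 1
    return counts

print(page_refs_per_province(TRIPLES))
```

The province information is not in the book's index at all; it comes entirely from the linked GeoNames data.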

Page 41: Keynote csws2013

Results are links to paragraphs

Page 42: Keynote csws2013

SPARQL for R

National-Socialist 29%
Social-Democrat 21%
Protestant 13%
Liberal 12%
R-Catholic 12%
Communist 8%
Jewish 5%

Pillar1      Pillar2      Co
Liber.       Protestant   0.29
Protestant   R-Cath.      0.22
Liber.       R-Cath.      0.21
Comm         Soc-dem      0.20
Liber.       Soc-dem      0.15

Page 43: Keynote csws2013

Dutch Ships and Sailors

Page 44: Keynote csws2013

gz:Mercuur

1782

gz:Buijksloot

gz:Batavia

gz:Claas Roem

voc:Claas Roem

voc:Buijksloot

1752 das:Mercuur

das:Departure

das:Roem, Klaas

19-12-1780 das:Texel

das:Arrival

20-7-1781 das:Batavia

das:Voyage1

Web of Data
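
The figure shows the same sailor appearing under different spellings in different datasets ("Claas Roem" and "Roem, Klaas"). One simple way to propose such links is approximate string matching after normalising the name order. The sketch below is an assumption for illustration, not the project's linking method, and the 0.8 threshold is an arbitrary choice.

```python
from difflib import SequenceMatcher

def normalise_name(name):
    """Lower-case the name and turn 'Last, First' into 'first last'."""
    name = name.strip().lower()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(name.split())

def name_similarity(a, b):
    """Similarity between 0 and 1 of the two normalised names."""
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()

def same_person_candidate(a, b, threshold=0.8):
    """Propose a link when the names are similar enough; a human should still confirm it."""
    return name_similarity(a, b) >= threshold

print(normalise_name("Roem, Klaas"))
print(same_person_candidate("Claas Roem", "Roem, Klaas"))
```

In practice a name match alone is weak evidence; shared context such as the ship (Mercuur) and the place (Buijksloot) would be needed to support a link.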

Page 45: Keynote csws2013

DataLab

Page 46: Keynote csws2013

Lessons Learned

Page 47: Keynote csws2013

Lessons Learned

Be humble, transparent and interactive in your data conversion and linking

Page 48: Keynote csws2013

Lessons Learned

Retain complexities of the data and establish layers of interoperability

Page 49: Keynote csws2013

Lessons Learned

A Little Semantics goes a Long Way… and so does a small number of links

Page 50: Keynote csws2013

Lessons Learned

Make sure your solutions and tools fit the methodology of the field

Page 51: Keynote csws2013

Lessons Learned

Show added benefit for scientific research and (unexpected) re-use

Page 52: Keynote csws2013

Lessons Learned

Linked Data is a good fit for Humanities research

Page 53: Keynote csws2013

Thank you!

Victor de Boer

http://victordeboer.com

Page 54: Keynote csws2013

Image credits

• Wikipedia lemmas

• Flickr images (cc-licensed)

– RMTip21

– Argonne National Laboratory

– thegarethwiscombe

• “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

• http://blogs.voanews.com/science-world/tag/cern/

• Gezicht op Canton, Vingboons-atlas, Bussum 1981, p. 35 VOC Kenniscentrum

Links

• http://semanticweb.cs.vu.nl

• http://biographynet.nl

• http://e-culture.multimedian.nl

• http://cliopatria.swi-prolog.org