Michiel Hildebrand: CultuurLINK: Connecting Cultural Heritage

ENRICH > LINK > SEARCH The lean approach for advanced search applications over linked data Michiel Hildebrand Semantics Conference Vienna 2015


Page 1

ENRICH > LINK > SEARCH
The lean approach for advanced search applications over linked data

Michiel Hildebrand
Semantics Conference Vienna 2015


Page 3

Do you see value in open data?

Page 4

Do you think that open data could improve access to your own data?

Page 5

Have you integrated open data with your own data?

Page 6

Have you created an application on top of your integrated data?

Page 7

The billion-dollar Open Data example

Page 8

Cultural Heritage: advanced access through (Open) Data

multi-lingual

location-based

recommendation

personalization

advanced ranking

analytics

http://www.getty.edu/research/tools/vocabularies/aat/

Page 9

multi-lingual

location-based

recommendation

personalization

advanced ranking

analytics

Cultural Heritage: advanced access through (Open) Data

http://www.vistory.nl/

Page 10

Cultural Heritage: advanced access through (Open) Data

multi-lingual

location-based

recommendation

personalization

advanced ranking

analytics

query logs

content-based

Page 11

Cultural Heritage: advanced access through (Open) Data

multi-lingual

location-based

recommendation

personalization

advanced ranking

analytics

http://manovich.net/

Page 12

Historic newsreels and photographs

Page 13

Demo: Linked Open Images

http://link.spinque.com/openbeelden

Page 14

Can we build this in a day?

Page 15

Factory metaphor

PUSH: make to stock
• Output and efficiency oriented
• Exact needs of the user secondary

PULL: make to order
• User needs oriented
• Production costly

Page 16

How can we reduce the time and cost?

Data factory

PUSH: make to stock

PULL: make to order

How good is the data for your application?

Page 17

The lean approach

Diagram: Your data > Integrate > Access > Deploy (also labelled: Enrich, API)

Page 18

Moving from one-off to sustainable data publishing

Open Data Node platform

http://opendatanode.org/

Methodology for publishing Open Data

http://www.comsode.eu/index.php/deliverables/

http://unifiedviews.eu/

Page 19

Key requirements for the integration step

Sustainable

Quality control

Diagram: Your data > Integrate > Access > Deploy (also labelled: Enrich, API)

Page 20

Integrating historic newsreels with photographs

GTAA thesaurus (SKOS)

NIOD subject terms (SKOS)

Page 21

Vocabularies: NIOD subject terms and GTAA thesaurus

preferred label ↔ preferred label
antisemitisme ↔ antisemitisme
spionage ↔ spionage
amnestie ↔ amnestie
...

Match rule: preferred label = preferred label
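The match rule above can be sketched as a lookup from one vocabulary's preferred labels into the other's. This is a minimal illustration, not CultuurLINK's implementation: it assumes each vocabulary is loaded as a plain dict from concept URI to SKOS-style labels, and the URIs (`niod:1`, `gtaa:10`, ...) are hypothetical placeholders, not real NIOD or GTAA identifiers.

```python
from collections import defaultdict

# Hypothetical toy vocabularies: concept URI -> {"pref": preferred label, "alt": [alternative labels]}
NIOD = {
    "niod:1": {"pref": "antisemitisme", "alt": []},
    "niod:2": {"pref": "spionage", "alt": []},
    "niod:3": {"pref": "amnestie", "alt": []},
}
GTAA = {
    "gtaa:10": {"pref": "antisemitisme", "alt": []},
    "gtaa:11": {"pref": "spionage", "alt": []},
    "gtaa:12": {"pref": "amnestie", "alt": []},
}

def normalize(label: str) -> str:
    """Trim and case-fold a label before comparing."""
    return label.strip().casefold()

def match_pref_pref(source: dict, target: dict) -> list:
    """Return (source URI, target URI) pairs whose preferred labels are equal."""
    index = defaultdict(list)  # normalized preferred label -> target URIs
    for uri, labels in target.items():
        index[normalize(labels["pref"])].append(uri)
    links = []
    for uri, labels in source.items():
        for target_uri in index.get(normalize(labels["pref"]), []):
            links.append((uri, target_uri))
    return links

for s, t in match_pref_pref(NIOD, GTAA):
    print(s, "->", t)
```

Indexing the target labels once keeps the comparison linear in the size of both vocabularies instead of comparing every pair.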

Page 22

Vocabularies: NIOD subject terms and GTAA thesaurus

preferred label (alternative label) ↔ preferred label
politieagenten (agenten) ↔ agenten
militaire parades (parades) ↔ parades
optochten (parades) ↔ parades

Match rule: preferred label = alternative label

Introduces ambiguity: "parades" matches both "militaire parades" and "optochten"
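A small sketch makes the ambiguity visible: matching preferred labels against alternative labels can return more than one target concept for a single source concept. The toy data follows the slide; the URIs and function names are hypothetical, and which vocabulary carries the alternative labels is an assumption made only for the example.

```python
from collections import defaultdict

# Hypothetical toy data: a vocabulary with alternative labels ...
WITH_ALT = {
    "a:politieagenten": {"pref": "politieagenten", "alt": ["agenten"]},
    "a:militaire_parades": {"pref": "militaire parades", "alt": ["parades"]},
    "a:optochten": {"pref": "optochten", "alt": ["parades"]},
}
# ... and one with preferred labels only
PREF_ONLY = {
    "b:agenten": {"pref": "agenten", "alt": []},
    "b:parades": {"pref": "parades", "alt": []},
}

def match_pref_alt(source: dict, target: dict) -> dict:
    """Map each source concept to the target concepts whose alternative label equals its preferred label."""
    alt_index = defaultdict(list)
    for uri, labels in target.items():
        for alt in labels["alt"]:
            alt_index[alt.casefold()].append(uri)
    return {uri: alt_index.get(labels["pref"].casefold(), []) for uri, labels in source.items()}

def ambiguous(links: dict) -> dict:
    """Keep only source concepts that matched more than one target concept."""
    return {uri: targets for uri, targets in links.items() if len(targets) > 1}

links = match_pref_alt(PREF_ONLY, WITH_ALT)
print(links)
print(ambiguous(links))  # candidates to review by hand or resolve with another rule
```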

Page 23

Vocabularies: NIOD subject terms and GTAA thesaurus (preferred labels)

dodenherdenking ↔ dodenherdenkingen
hamsteren ↔ hamsters

Match rule: singular label = plural label (stemming)

Introduces errors: "hamsteren" (to hoard) is matched with "hamsters" (the animals)
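Why stemming introduces errors can be reproduced with a deliberately naive suffix stripper. This is a toy for illustration only, not a real Dutch stemmer such as Snowball, and `naive_stem` and `match_stemmed` are hypothetical names: the same rule that correctly pairs a singular with its plural also collapses the verb "hamsteren" and the plural noun "hamsters" to one stem.

```python
def naive_stem(label: str) -> str:
    """Toy suffix stripper (NOT a real Dutch stemmer): drop a trailing 'en' or 's'."""
    for suffix in ("en", "s"):
        # Require a minimal remaining length so very short words are left alone.
        if label.endswith(suffix) and len(label) > len(suffix) + 2:
            return label[: -len(suffix)]
    return label

def match_stemmed(source_labels: list, target_labels: list) -> list:
    """Pair labels whose naive stems are equal (singular = plural)."""
    stems = {}
    for t in target_labels:
        stems.setdefault(naive_stem(t), []).append(t)
    return [(s, t) for s in source_labels for t in stems.get(naive_stem(s), [])]

pairs = match_stemmed(["dodenherdenking", "hamsteren"], ["dodenherdenkingen", "hamsters"])
print(pairs)  # the first pair is correct, the second is a false match
```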

Page 24

Vocabularies: NIOD subject terms and GTAA thesaurus

preferred labels: dieren, graven

candidate matches (preferred label, concept scheme):
dieren, subject terms
dieren, geographical names
graven, subject terms
grave, geographical names

Filter sources: subject ≠ location (noise); Dieren and Grave are also Dutch place names
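Filtering on the concept scheme is a simple post-processing step on the candidate links. The sketch below assumes an earlier matching step has already produced (source label, target label, target concept scheme) triples as on the slide; the tuple layout and the function name are hypothetical.

```python
# Hypothetical candidate links from an earlier matching step:
# (source label, target label, target concept scheme)
CANDIDATES = [
    ("dieren", "dieren", "subject terms"),
    ("dieren", "dieren", "geographical names"),
    ("graven", "graven", "subject terms"),
    ("graven", "grave", "geographical names"),
]

def keep_schemes(candidates: list, allowed_schemes: set) -> list:
    """Keep only candidates whose target concept lies in one of the allowed concept schemes."""
    return [(s, t, scheme) for s, t, scheme in candidates if scheme in allowed_schemes]

# A subject term should only link to subject terms, not to place names.
print(keep_schemes(CANDIDATES, {"subject terms"}))
```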

Page 25

Other alignment techniques

fuzzy string matching

join matches on multiple attributes

similarity in the hierarchy (skos:broader)

select best candidate (most generic/specific term)

...
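Two of these techniques, fuzzy string matching and selecting the best candidate, can be sketched with Python's standard `difflib.SequenceMatcher`. The slides do not say which similarity measure is used; the ratio measure and the 0.8 threshold here are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def best_match(label: str, candidates: list, threshold: float = 0.8) -> Optional[str]:
    """Return the most similar candidate label, or None if none reaches the (assumed) threshold."""
    scored = [(similarity(label, c), c) for c in candidates]
    score, best = max(scored, default=(0.0, None))
    return best if score >= threshold else None

print(best_match("politieagent", ["politieagenten", "agenten", "parades"]))
print(best_match("spionage", ["amnestie"]))
```

Returning `None` below the threshold leaves the concept unlinked rather than forcing a weak match, which fits the trial-and-error workflow of checking results before accepting them.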

Page 27

Key requirements for the integration step: checked

Quality control
• Model link strategy out of (simple) building blocks
• Iterative process (trial and error)
• Exploration of the source data
• Direct access to the results
• Evaluate the subsets

Sustainable
• Export links and link strategy
• Provenance of the process is explicit in the strategy
• Rerun after update of datasets
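The idea of a link strategy that is built from blocks, exported with its links, and rerun after an update can be sketched by treating the strategy itself as data. Everything below is hypothetical (block names, parameters, URIs, the JSON layout); it is not CultuurLINK's export format, only an illustration of why a declarative strategy gives provenance for free.

```python
import json

# Hypothetical building blocks. Each takes both vocabularies plus the links
# produced so far and returns a new list of (source URI, target URI) links.
def exact_pref(source, target, links):
    """Propose links between concepts with equal (case-folded) preferred labels."""
    return [(s, t) for s, sl in source.items() for t, tl in target.items()
            if sl["pref"].casefold() == tl["pref"].casefold()]

def filter_scheme(source, target, links, scheme):
    """Keep only links whose target concept belongs to the given concept scheme."""
    return [(s, t) for s, t in links if target[t]["scheme"] == scheme]

BLOCKS = {"exact_pref": exact_pref, "filter_scheme": filter_scheme}

def run_strategy(strategy, source, target):
    """Apply the steps in order. Because the strategy is plain data, it can be
    exported next to the links (provenance) and rerun after a dataset update."""
    links = []
    for step in strategy:
        links = BLOCKS[step["block"]](source, target, links, **step.get("params", {}))
    return links

STRATEGY = [
    {"block": "exact_pref"},
    {"block": "filter_scheme", "params": {"scheme": "subject terms"}},
]
SOURCE = {"niod:dieren": {"pref": "dieren"}, "niod:graven": {"pref": "graven"}}
TARGET = {
    "gtaa:1": {"pref": "dieren", "scheme": "subject terms"},
    "gtaa:2": {"pref": "Dieren", "scheme": "geographical names"},
    "gtaa:3": {"pref": "graven", "scheme": "subject terms"},
}

links = run_strategy(STRATEGY, SOURCE, TARGET)
export = json.dumps({"strategy": STRATEGY, "links": links}, indent=2)
print(export)
```

Rerunning after a dataset update is then just another call to `run_strategy` with the same exported strategy and the new data.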

Page 28

Dutch National Strategy for Digital Heritage

Page 29

CultuurLINK: a free service for the cultural heritage domain

http://cultuurlink.beeldengeluid.nl/

Page 30

Rijksmuseum Amsterdam: integrated multilingual vocabularies

http://www.rijksmuseum.nl/nl/collectie/BK-NM-1010

http://www.getty.edu/research/tools/vocabularies/aat/

Page 31

Key requirements for the access step

Diagram: Your data > Integrate > Access > Deploy (also labelled: Enrich, API)

Model complex access (search)

Combine graph queries and ranking
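Combining a graph query with ranking can be sketched in two stages: a structural selection over triples, then a relevance score over the selected items. The triples, predicate names and the term-count score are hypothetical stand-ins; a real system would use a triple store and a proper retrieval model such as BM25.

```python
# Hypothetical mini graph of (subject, predicate, object) triples.
TRIPLES = [
    ("video:1", "subject", "gtaa:parades"),
    ("video:2", "subject", "gtaa:parades"),
    ("video:3", "subject", "gtaa:spionage"),
    ("video:1", "title", "Militaire parade in Amsterdam"),
    ("video:2", "title", "Parade en optocht, parade na de bevrijding"),
    ("video:3", "title", "Proces tegen spionage"),
]

def objects(s, p):
    """All objects o for which the triple (s, p, o) exists."""
    return [o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p]

def subjects(p, o):
    """All subjects s for which the triple (s, p, o) exists."""
    return [s for (s, p2, o2) in TRIPLES if p2 == p and o2 == o]

def keyword_score(text, query):
    """Toy ranking: count occurrences of the query terms in the text."""
    words = text.casefold().replace(",", " ").split()
    return sum(words.count(term) for term in query.casefold().split())

def search(concept, query):
    """Graph query (items with the given subject), then rank their titles by keyword score."""
    hits = subjects("subject", concept)
    scored = [(keyword_score(" ".join(objects(h, "title")), query), h) for h in hits]
    return [h for score, h in sorted(scored, reverse=True)]

print(search("gtaa:parades", "parade"))
```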

Page 32

Already three types of search in a simple app

keyword search

location-based search

recommendation
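Of these three, location-based search is the easiest to show in isolation: rank items by their distance to a point. The slides do not describe how this is computed; the sketch assumes items carry WGS84 coordinates and uses the haversine great-circle formula with a mean Earth radius of 6371 km. The item ids and coordinates are hypothetical examples.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points (mean Earth radius 6371 km)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical items with the coordinates of the place they depict
ITEMS = {
    "photo:dam": (52.3731, 4.8926),        # Dam square, Amsterdam
    "photo:rotterdam": (51.9225, 4.4792),
    "photo:utrecht": (52.0907, 5.1214),
}

def nearby(lat, lon, radius_km):
    """Return item ids within radius_km of (lat, lon), nearest first."""
    dists = [(haversine_km(lat, lon, *coords), item) for item, coords in ITEMS.items()]
    return [item for d, item in sorted(dists) if d <= radius_km]

print(nearby(52.37, 4.89, 50))
```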

Page 33

Advanced search applications with Spinque

multilingual

location-based

recommendation

personalization

ranking

analytics

Probabilistic Graph Database

Building blocks (SPINQL)

Search by Strategy

Page 34

Demo: Spinque Search

Page 35

Key requirements for the access step: checked

Model complex search problems
• Search strategy out of (simple) building blocks
• No programming required

Combine graph queries and ranking
• Integrated triple store and search index
• Probabilistic graph database
• Building blocks for graph queries
• Building blocks for search and ranking
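One way to read "probabilistic graph database" is that every edge carries a probability, so graph traversal and ranking become the same operation. The sketch below is a generic illustration of that idea, not Spinque's SpinQL or its API (which users compose without programming). The edges and probabilities are hypothetical: path probabilities are multiplied, and several paths into the same node are combined with noisy-OR, 1 - ∏(1 - p), which assumes the paths are independent.

```python
from collections import defaultdict

# Hypothetical weighted edges: (source node, predicate, target node, probability)
EDGES = [
    ("query:parade", "matches", "gtaa:parades", 0.9),
    ("query:parade", "matches", "gtaa:optochten", 0.4),
    ("gtaa:parades", "depicted_in", "video:1", 0.8),
    ("gtaa:optochten", "depicted_in", "video:1", 0.7),
    ("gtaa:optochten", "depicted_in", "video:2", 0.9),
]

def traverse(weights, predicate):
    """Follow `predicate` edges from weighted nodes. A path's probability is the product of
    node and edge probabilities; paths into one node are combined with noisy-OR."""
    miss = defaultdict(lambda: 1.0)  # probability that no path reaches the node
    for src, pred, dst, p in EDGES:
        if pred == predicate and src in weights:
            miss[dst] *= 1.0 - weights[src] * p
    return {node: 1.0 - m for node, m in miss.items()}

def rank(weights):
    """Sort nodes by probability, highest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

concepts = traverse({"query:parade": 1.0}, "matches")
videos = traverse(concepts, "depicted_in")
print(rank(videos))
```

Here video:1 ranks first because two independent paths reach it, which a plain boolean graph query could not express.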

Page 36

The lean approach

Diagram (labels): Your data, Enrich, Link strategy, Search strategy, Deploy, API

Page 37

Breakout

What kind of functionality would you like to provide to your users?

1. What kind of data do you want to make accessible in a richer way?

2. What additional (open) data can you use for this enriched access?

3. What type of (search) functionality is required?

Page 38

Other applications: Restaurant inspections

Page 39

Other applications: Community platform