E-Culture semantic search pilot

MultimediaN Pilot E-Culture

Seminar, Stanford Medical Informatics, August 2006

Page 1: E-Culture semantic search pilot

MultimediaN Pilot E-Culture

Page 2: E-Culture semantic search pilot

Pilot E-Culture

Partners: VU, UvA, CWI, DEN, ICN

Subproject of MultimediaN, a 16 MEuro project on multimedia technology funded by the Dutch government

Aim: demonstrate added value of Semantic Web techniques for virtual heritage collections

Page 3: E-Culture semantic search pilot

Page 4: E-Culture semantic search pilot

Hypothesis

Semantic Web technology is in particular useful in knowledge-rich domains

or formulated differently

If we cannot show added value in knowledge-rich domains, then it may have no value at all

Page 5: E-Culture semantic search pilot

Use case: painting style

Find paintings of a similar style

KLIMT, Gustav
Portrait of Adele Bloch-Bauer I
1907
Oil and gold on canvas
138 x 138 cm
Austrian Gallery, Vienna

Page 6: E-Culture semantic search pilot

How can we find this other ‘Art nouveau’ painting?

MUNCH, Edvard
The Scream
1893
Oil, tempera and pastel on cardboard
91 x 73.5 cm
National Gallery, Oslo

Page 7: E-Culture semantic search pilot

Issues w.r.t. the use case

Parse annotations to find matches with thesaurus terms
– E.g. match artists to ULAN individuals

Artist-style links
– AAT contains styles; ULAN contains artists, but there is no link between them
  • Learn link from corpora
  • Derive it from other annotations
– Domain-specific rules/reasoning needed
  • see example in SWRL doc
  • Painters may have painted in multiple styles
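
The "derive it from other annotations" option could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's method: the record fields `creator` and `style`, the `min_count` parameter and the sample records are all hypothetical. It counts the styles that annotated works attach to each artist and keeps every style found, since painters may have painted in multiple styles.

```python
from collections import Counter, defaultdict


def derive_artist_styles(works, min_count=1):
    """Map each creator to the styles seen on their annotated works.

    `works` is a list of dicts with optional "creator" and "style" keys
    (hypothetical field names). Styles seen fewer than `min_count` times
    for an artist are dropped as weak evidence.
    """
    counts = defaultdict(Counter)
    for work in works:
        creator, style = work.get("creator"), work.get("style")
        if creator and style:
            counts[creator][style] += 1
    return {
        creator: {s: n for s, n in styles.items() if n >= min_count}
        for creator, styles in counts.items()
    }


# Hypothetical annotations; identifiers are placeholders, not real records.
works = [
    {"creator": "artist-a", "style": "style-x"},
    {"creator": "artist-a", "style": "style-x"},
    {"creator": "artist-a", "style": "style-y"},
    {"creator": "artist-b", "style": "style-y"},
    {"creator": "artist-b"},  # no style annotation: ignored
]
links = derive_artist_styles(works)
```

Raising `min_count` trades recall for precision: a style attached to only one work by an artist may be an annotation error rather than a genuine artist-style link.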

Page 8: E-Culture semantic search pilot

Digital heritage collections: semantic annotation and semantic search

Natural-language processing: automatic annotation (text strings to concepts)

Distributed collections: cultuurwijzer.nl, OAI-based access

Reasoning support: time/space reasoning

Web interface: support for web collections

Presentation facilities: semantic presentation, device-specific

Interoperability: XML/RDF/OWL

Scalability: > 10,000,000 triples

Ontologies: WordNet, AAT, TGN, ULAN, Dutch labels

Search strategies: sibling search, semantic distance

Dublin Core: specializations, dumb-down

Legend: baseline, enhanced features, new features

Page 9: E-Culture semantic search pilot

Architecture

Page 10: E-Culture semantic search pilot

Use of thesauri

RDF/OWL data models of Getty thesauri
– Issues: scope, preserving structure

WordNet: W3C SWBPD work
http://www.w3.org/TR/wordnet-rdf/

Multilingualism
– Dutch version of AAT

Existing collection metadata are parsed to find matches in thesauri (e.g. creator name => ULAN entry)
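
A minimal sketch of this kind of matching is shown below. It assumes exact matching on normalized name strings, which is a simplification; the thesaurus content and the `ulan:placeholder-*` identifiers are invented for illustration and are not real ULAN identifiers.

```python
import unicodedata


def normalize(name):
    """Casefold, strip accents and punctuation, and ignore token order."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.casefold())
    return " ".join(sorted(cleaned.split()))


def build_index(thesaurus):
    """Map every normalized label to the set of entries that carry it."""
    index = {}
    for entry_id, labels in thesaurus.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(entry_id)
    return index


def match_creator(creator, index):
    """Return candidate thesaurus entries for a creator string."""
    return index.get(normalize(creator), set())


# Hypothetical thesaurus extract with placeholder identifiers.
thesaurus = {
    "ulan:placeholder-1": ["Klimt, Gustav", "Gustav Klimt"],
    "ulan:placeholder-2": ["Munch, Edvard"],
}
index = build_index(thesaurus)
```

Here `match_creator("KLIMT, Gustav", index)` returns the first placeholder entry, because capitalisation, punctuation and name order are normalized away. Returning a set rather than a single entry leaves room for ambiguous names, which then need manual or rule-based disambiguation.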

Page 11: E-Culture semantic search pilot

Distributed vs. centralized collection data

Minimal requirement: collection object has image URI

Preference for external metadata, accessed through protocol such as OAI

In practice, external metadata access is still cumbersome
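
As an illustration of OAI-based access, the sketch below builds an OAI-PMH `ListRecords` request and extracts a few Dublin Core fields from a response. The endpoint `http://example.org/oai`, the sample response and the choice to read the image URI from `dc:identifier` are assumptions made for the example; real repositories differ in where they expose the image location.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract title, creator and (assumed) image URI from each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # e.g. a deleted record without metadata
            continue
        records.append({
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "creator": dc.findtext("dc:creator", default="", namespaces=NS),
            "image_uri": dc.findtext("dc:identifier", default="", namespaces=NS),
        })
    return records


# Hypothetical response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example work</dc:title>
          <dc:creator>Example, Artist</dc:creator>
          <dc:identifier>http://example.org/images/1.jpg</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")
records = parse_records(SAMPLE)
```

The sketch parses a canned response so that it runs offline; fetching `url` over HTTP and following resumption tokens for large collections are left out.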

Page 12: E-Culture semantic search pilot

Search strategies

Basic search: keyword-oriented

Advanced search:
– Tweaking default search parameters
– Time-related queries

Faceted search

Relation search
– How are two URIs related?
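
One plausible reading of relation search is a shortest-path query over the triple graph. The sketch below uses breadth-first search and treats triples as undirected links; both choices, the `max_hops` limit and the `ex:` sample triples are assumptions for illustration, not a description of the demonstrator's implementation.

```python
from collections import deque


def find_relation(triples, start, goal, max_hops=4):
    """Return the shortest chain of triples linking start to goal, or None."""
    neighbours = {}
    for s, p, o in triples:
        # Index each triple from both ends so it can be crossed either way.
        neighbours.setdefault(s, []).append((o, (s, p, o)))
        neighbours.setdefault(o, []).append((s, (s, p, o)))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt, triple in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [triple]))
    return None


# Hypothetical triples with placeholder URIs.
triples = [
    ("ex:work1", "ex:creator", "ex:artistA"),
    ("ex:artistA", "ex:style", "ex:styleX"),
    ("ex:work2", "ex:creator", "ex:artistB"),
    ("ex:artistB", "ex:style", "ex:styleX"),
]
path = find_relation(triples, "ex:work1", "ex:work2")
```

In this toy graph the two works are related through a shared style: the returned path runs from `ex:work1` via its creator to `ex:styleX` and back down to `ex:work2`.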

Page 13: E-Culture semantic search pilot

Keyword search with semantic clustering

1. Btree of literals plus Porter stem and metaphone index

2. Find resources with matching labels
• Default resources are “Work”s

3. Find related resources by one-way graph traversal
• owl:inverseOf is used
• Threshold used for constraining search

4. Cluster results (group instances)
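
The steps above can be sketched as follows. This is a simplified illustration, not the demonstrator's code: a plain lowercase token index stands in for the Btree with Porter stem and metaphone index, the traversal direction (object to subject), the per-step `decay` weight and the reading of the threshold as a minimum score are assumptions, and results are clustered by the property path that led to them. All `ex:` names are placeholders.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
RDFS_LABEL = "rdfs:label"


def add_inverses(triples, inverse_of):
    """Add (o, q, s) for every (s, p, o) where p is declared owl:inverseOf q."""
    extra = {(o, inverse_of[p], s) for s, p, o in triples if p in inverse_of}
    return set(triples) | extra


def search(triples, keyword, target_class="ex:Work", decay=0.5, threshold=0.2):
    """Return {property path: [(work, score), ...]} for a keyword."""
    labels = defaultdict(set)     # token -> resources whose label contains it
    incoming = defaultdict(list)  # object -> [(property, subject)]
    targets = set()
    for s, p, o in triples:
        if p == RDFS_LABEL:
            for token in o.lower().split():
                labels[token].add(s)
        elif p == RDF_TYPE:
            if o == target_class:
                targets.add(s)
        else:
            incoming[o].append((p, s))

    clusters = defaultdict(list)
    frontier = [(r, (), 1.0) for r in sorted(labels.get(keyword.lower(), ()))]
    seen = {r for r, _, _ in frontier}
    while frontier:
        node, path, score = frontier.pop()
        if node in targets:
            clusters[path].append((node, score))  # step 4: group by path
            continue
        for prop, subject in incoming[node]:      # step 3: one-way traversal
            new_score = score * decay
            if new_score >= threshold and subject not in seen:
                seen.add(subject)
                frontier.append((subject, path + (prop,), new_score))
    return {path: sorted(hits) for path, hits in clusters.items()}


# Hypothetical graph; ex:creatorOf is declared the inverse of ex:creator.
triples = [
    ("ex:work1", RDF_TYPE, "ex:Work"),
    ("ex:work1", "ex:creator", "ex:artistA"),
    ("ex:work2", RDF_TYPE, "ex:Work"),
    ("ex:work2", "ex:style", "ex:styleX"),
    ("ex:work3", RDF_TYPE, "ex:Work"),
    ("ex:artistA", "ex:creatorOf", "ex:work3"),
    ("ex:artistA", "ex:style", "ex:styleX"),
    ("ex:styleX", RDFS_LABEL, "Style X"),
]
graph = add_inverses(triples, {"ex:creatorOf": "ex:creator"})
results = search(graph, "style")
```

Searching for "style" yields two clusters: works annotated with the style directly, and works whose creator is linked to it. `ex:work3` is found only because the inverse of `ex:creatorOf` was added, which shows why owl:inverseOf matters for a one-way traversal.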

Page 14: E-Culture semantic search pilot

Demonstrator

Page 15: E-Culture semantic search pilot

Search: WordNet patterns that increase recall without sacrificing precision (Hollink)

Page 16: E-Culture semantic search pilot

Triple statistics

Page 17: E-Culture semantic search pilot

Status

4-year project, now in month 18

Short-term goals:
– Adding more ethnological collections
– Location-oriented presentation
– User studies with professional users (museum people) and interested lay persons
– Multi-lingual interface (English, Dutch, Indonesian)

Page 18: E-Culture semantic search pilot

Issues

Getting access to collections is mainly a social process
– There is usually no principled objection to making data, metadata and thesauri publicly available, but it still feels threatening

Cultural heritage is a good area for a Semantic Web “island”:
– lots of domain-specific knowledge
– strong application pull
– enormous amount of existing annotations, which have been built up over centuries

Page 19: E-Culture semantic search pilot

On-line demo: http://e-culture.multimedian.nl