Interoperability in the Cultural Heritage Domain

Interoperability in the Cultural Heritage Domain. Lourens van der Meij, VU Amsterdam – KB (part of sheets by A. Isaac). October 3rd, 2008


Page 1: Interoperability in the Cultural Heritage Domain

Interoperability in the Cultural Heritage Domain

Lourens van der Meij, VU Amsterdam – KB

(part of sheets by A. Isaac)

October 3rd, 2008

Page 2: Interoperability in the Cultural Heritage Domain

Background

• CATCH (NWO)
  • Continuous Access To Cultural Heritage
  • Computer science research projects
  • Applied to Cultural Heritage (libraries, museums)
• STITCH
  • SemanTic Interoperability To access Cultural Heritage
  • Interoperability:
    • Exchanging metadata (standardization)
    • Integrating metadata (translating, linking)

Page 3: Interoperability in the Cultural Heritage Domain

Intention

Show through example applications that the following are important in the Cultural Heritage domain:

• Integration of data, collections, and services
• Interoperability:
  • Data standardized so that it can be used across different applications
  • Functionality reusable via services
• Creating mappings: semantic links between data from different sources

Page 4: Interoperability in the Cultural Heritage Domain

First

• Illustrate integrated access to collections in the CH domain by looking at a use case:
  • Introduction of the use case
  • About vocabularies
  • Introduction of the collections that will be integrated
  • Faceted browsing
  • What we want ->
  • Demo
  • Requirements, details

Page 5: Interoperability in the Cultural Heritage Domain

(Integrated) Access to collections

• Collections: records of books, pieces of art, …
• Electronic access, web portal.
• STITCH focuses on semantics: structured access using the available knowledge sources, not full-text search.
• Records: metadata, information about the object
  • Author
  • Date
  • Subject
• CH institutes often maintain knowledge structures (KOS), or vocabularies, to facilitate storage, access, and maintenance.
• Subject metadata and access through KOS are the focus of STITCH.

Page 6: Interoperability in the Cultural Heritage Domain

Vocabularies (Knowledge Structures, KOS)

• Thesauri and classification systems: structuring collections; describing content, form, and other aspects of collection elements.
• STITCH is a cooperation between VU Amsterdam (KRR group), the National Library (KB), and MPI Nijmegen.
• Many vocabularies: within the KB, on the order of 10 vocabularies are maintained internally, and 20 or more external vocabularies play a role. Why?
  • History
  • Specialized collections, particular views on the collection, and theories of how access should be provided.
• Examples of vocabularies in the demos.

Page 7: Interoperability in the Cultural Heritage Domain

Vocabularies

• Many different (kinds of) vocabularies
• Many different representations, data formats, methods of access.
• Integrated access requires:
  • Standardized representation of vocabularies and collections
  • Standardized access => services
  • Links between elements of vocabularies: alignment of vocabularies
• Next: example of integration

Page 8: Interoperability in the Cultural Heritage Domain

Illustration: the STITCH use case

• Integrated access to two collections:
  • KB: geïllumineerde manuscripten (illuminated manuscripts)
  • BnF: Mandragore, manuscrits enluminés (illuminated manuscripts)
• STITCH focus:
  • Integration
    • Alignment, techniques (and standards)
  • Interoperability
    • RDF, SKOS

Those aspects will be discussed after the first demo.

Page 9: Interoperability in the Cultural Heritage Domain


KB Illustrated Manuscripts

Page 10: Interoperability in the Cultural Heritage Domain

KB Illustrated Manuscripts: Iconclass

Page 11: Interoperability in the Cultural Heritage Domain


Mandragore

Page 12: Interoperability in the Cultural Heritage Domain


Mandragore

Page 13: Interoperability in the Cultural Heritage Domain

Faceted browsing

• Access the collection using the structure of the vocabularies.
• Different dimensions: subject, author, …
• Use the hierarchy of a vocabulary, if it has one, to group objects together.
  • Lions, giraffes, zebras -> animals. Distinguish them as a group.
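The grouping idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy vocabulary, the record identifiers, and the function names are all hypothetical and do not come from the STITCH demo.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: concept -> broader concept (None at the top).
BROADER = {
    "lion": "animals",
    "giraffe": "animals",
    "zebra": "animals",
    "wheat": "plants",
    "animals": None,
    "plants": None,
}

# Hypothetical records: object identifier -> subject concept.
RECORDS = {
    "ms-001": "lion",
    "ms-002": "zebra",
    "ms-003": "wheat",
    "ms-004": "giraffe",
}


def ancestors(concept, broader):
    """Return the concept itself followed by all of its broader concepts."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = broader.get(concept)
    return chain


def facet_groups(records, broader):
    """Group object identifiers under every concept on their broader-chain."""
    groups = defaultdict(set)
    for obj, subject in records.items():
        for concept in ancestors(subject, broader):
            groups[concept].add(obj)
    return groups


groups = facet_groups(RECORDS, BROADER)
for concept in sorted(groups):
    print(concept, sorted(groups[concept]))
```

Selecting the facet value "animals" then returns the lion, zebra, and giraffe records together, even though none of them is indexed with "animals" directly.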

Page 14: Interoperability in the Cultural Heritage Domain

What we have

MDS 1
- Field 1
  - Field 1.1
- Field 2
  - Field 2.1
  - Field 2.2
- …

MDS 2
- Field 1
  - Field 1.1
  - Field 1.2
    - Field 1.2.1
  - Field 1.3
- Field 2
- …

Page 15: Interoperability in the Cultural Heritage Domain


What we want

Page 16: Interoperability in the Cultural Heritage Domain

Demo

• KB Illuminated Manuscripts
• BNF Mandragore Manuscripts
• http://galjas.cs.vu.nl:33333/MANDRA-SV-ICE-mandraNewNONE , amphibians
• Wheat

Page 17: Interoperability in the Cultural Heritage Domain

Integrated Access

• Integrated semantic access requires:
  • Standardized representation of vocabularies and collections
  • Standardized access => services
  • Links between elements of vocabularies

Page 18: Interoperability in the Cultural Heritage Domain

Standardized representation

• Use of semantic web techniques:
  • “Things” are represented as “resources” identified by URIs, across any application and data set.
  • Values are simple strings or numbers (literals), or URIs.
  • Properties are typed, named links between URIs, or between URIs and literals.
  • Theory and reasoning methods; interoperability; some standardization.

Standardization is still needed on how to represent CH objects (XML: Dublin Core), vocabularies (SKOS), and links between elements of vocabularies.

Page 19: Interoperability in the Cultural Heritage Domain

SKOS: Example

[Diagram: RDF graph of an Iconclass concept. Nodes: http://www.iconclass.nl/s_11, http://www.iconclass.nl/s_11F, http://www.iconclass.nl/, skos:Concept, skos:ConceptScheme. Edges: rdf:type, skos:broader, skos:inScheme. Labels: skos:prefLabel “the Virgin Mary”@en, skos:prefLabel “la Vierge Marie”@fr.]

Page 20: Interoperability in the Cultural Heritage Domain

SKOS (Simple Knowledge Organization System)

• SKOS offers building blocks to represent KOSs in RDF:
  • Objects: Concept and ConceptScheme
  • Lexical properties (multilingual)
    • prefLabel
    • altLabel
  • Semantic relations
    • broader, narrower
    • related
  • Notes
    • scopeNote
    • definition
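To make the building blocks concrete, the Iconclass example from the previous slide can be written down as plain subject-predicate-object triples. The sketch below uses only Python tuples, not an RDF library. Since the diagram layout is lost, it assumes that the labels and the skos:broader link belong to s_11F and that s_11 is its broader concept; the helper name `pref_label` is hypothetical.

```python
# Namespace prefixes expanded to full URIs.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ICONCLASS = "http://www.iconclass.nl/"

concept = ICONCLASS + "s_11F"   # assumed: the concept carrying the labels
parent = ICONCLASS + "s_11"     # assumed: its broader concept

# Each triple is (subject, predicate, object); label objects are
# (text, language tag) pairs, all other objects are URIs.
triples = [
    (concept, RDF_TYPE, SKOS + "Concept"),
    (concept, SKOS + "broader", parent),
    (concept, SKOS + "prefLabel", ("the Virgin Mary", "en")),
    (concept, SKOS + "prefLabel", ("la Vierge Marie", "fr")),
    (concept, SKOS + "inScheme", ICONCLASS),
    (ICONCLASS, RDF_TYPE, SKOS + "ConceptScheme"),
]


def pref_label(subject, lang, graph):
    """Return the preferred label of a concept in the given language, if any."""
    for s, p, o in graph:
        if s == subject and p == SKOS + "prefLabel" and o[1] == lang:
            return o[0]
    return None


print(pref_label(concept, "fr", triples))  # la Vierge Marie
```

Because labels are attached per language, the same concept URI can be shown in a Dutch, French, or English interface without changing the underlying data.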

Page 21: Interoperability in the Cultural Heritage Domain

Vocabulary alignment

• Aim: finding semantic correspondences between vocabulary elements
  • “klassieke ruïnes” (classical ruins) ≈ “landschap met ruïnes” (landscape with ruins)
  • “maagd Maria” (Virgin Mary) = “Heilige Moeder” (Holy Mother)
• Doing it (semi-)automatically, because:
  • Vocabularies are big (tens of thousands of concepts)
  • They change
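One possible way to store such correspondences is as typed mapping links. The sketch below assumes that "≈" and "=" are recorded with the SKOS mapping properties skos:closeMatch and skos:exactMatch; the slides do not say which relation types STITCH used, and the `Mapping` class and `targets` helper are hypothetical.

```python
from dataclasses import dataclass

SKOS = "http://www.w3.org/2004/02/skos/core#"


@dataclass(frozen=True)
class Mapping:
    """One alignment link between a source and a target vocabulary element."""
    source: str
    relation: str
    target: str


# The two correspondences from the slide, typed by assumption.
ALIGNMENT = [
    Mapping("klassieke ruïnes", SKOS + "closeMatch", "landschap met ruïnes"),
    Mapping("maagd Maria", SKOS + "exactMatch", "Heilige Moeder"),
]


def targets(label, alignment, relations):
    """Return the targets linked to a label by any of the given relations."""
    return [m.target for m in alignment
            if m.source == label and m.relation in relations]


# An application that only trusts equivalences ignores the looser link.
print(targets("maagd Maria", ALIGNMENT, {SKOS + "exactMatch"}))
print(targets("klassieke ruïnes", ALIGNMENT, {SKOS + "exactMatch"}))
```

Keeping the relation type on each link lets each application decide how loose a correspondence it is willing to follow.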

Page 22: Interoperability in the Cultural Heritage Domain

Automatic alignment techniques

• Lexical: labels of entities and textual definitions
• Structural: structure of the vocabularies
• Background knowledge: using a shared conceptual reference to find links
• Extensional: object information (e.g. book indexing)

Example: “céréale, grain, blé” and “blé”
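A minimal sketch of the lexical technique, using the "blé" example: two concepts are proposed as a match when they share a label after normalization. The concept identifiers, the normalization rule (lowercasing and stripping accents), and the function names are assumptions for illustration; real lexical matchers typically also handle stemming, plurals, and partial matches.

```python
import unicodedata


def normalize(label):
    """Lowercase a label and strip accents so that 'Blé' and 'ble' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label.lower().strip())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical concepts: identifier -> list of labels.
VOCAB_A = {"a:cereal": ["céréale", "grain", "blé"]}
VOCAB_B = {"b:wheat": ["Blé"], "b:lion": ["lion"]}


def lexical_matches(vocab_a, vocab_b):
    """Return sorted (concept_a, concept_b) pairs that share a normalized label."""
    index = {}
    for concept, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(concept)
    matches = set()
    for concept, labels in vocab_a.items():
        for label in labels:
            for other in index.get(normalize(label), ()):
                matches.add((concept, other))
    return sorted(matches)


print(lexical_matches(VOCAB_A, VOCAB_B))  # [('a:cereal', 'b:wheat')]
```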


Page 24: Interoperability in the Cultural Heritage Domain

Extensional Statistical Alignment

• Object information (e.g. book indexing)

[Diagram: Thesaurus 1 and Thesaurus 2, both used to index a collection of books; example concepts “Dutch Literature” and “Dutch”.]
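The extensional idea can be sketched as follows: two concepts from different thesauri are likely to correspond when they index largely the same books. The sketch uses the Jaccard overlap as the similarity measure, which is an assumption made for illustration; the slides do not name the statistic STITCH used. The book identifiers and the extra concepts are hypothetical.

```python
def jaccard(objects_a, objects_b):
    """Share of objects indexed with both concepts among those indexed with either."""
    union = objects_a | objects_b
    if not union:
        return 0.0
    return len(objects_a & objects_b) / len(union)


# Hypothetical indexing data: concept -> set of books annotated with it.
THESAURUS_1 = {
    "Dutch Literature": {"book1", "book2", "book3"},
    "History": {"book4"},
}
THESAURUS_2 = {
    "Dutch": {"book2", "book3", "book5"},
    "Geography": {"book6"},
}


def rank_pairs(index_a, index_b):
    """Score every concept pair and return the overlapping ones, best first."""
    scored = []
    for a, objs_a in index_a.items():
        for b, objs_b in index_b.items():
            score = jaccard(objs_a, objs_b)
            if score > 0:
                scored.append((score, a, b))
    return sorted(scored, reverse=True)


print(rank_pairs(THESAURUS_1, THESAURUS_2))  # [(0.5, 'Dutch Literature', 'Dutch')]
```

This only works where the same objects are indexed with both vocabularies, as in the doubly indexed book collection shown in the diagram.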

Page 25: Interoperability in the Cultural Heritage Domain


Results

1: 9132.9 (1704 3479 976) Schilderijen - schilderkunst

2: 8088.5 (1204 2330 767) Kwaliteitszorg - kwaliteitsmanagement

3: 6232.7 (820 1572 543) Personeelsmanagement - personeelsbeleid

4: 5392.1 (1399 3271 622) Beeldende kunsten - beeldende kunst

5: 5063.1 (4951 1152 613) Nederlands - Nederlandse taalkunde

17: 3421.8 (280 714 243) Diabetes mellitus - suikerziekte

Page 26: Interoperability in the Cultural Heritage Domain

Alignment: no Trivial Solution

• Current techniques are not reliable as the sole source of knowledge.
• What is a good alignment?
• Evaluation criteria?
• => What will it be used for? Usage scenarios:
  • Integrated search
  • Reindexing
  • Thesaurus merging
  • Navigation => faceted browsing

Page 27: Interoperability in the Cultural Heritage Domain

What next

• Evaluation, lessons learned
• What next ->
• Second use case: reindexing
• (Vocabulary service)
• Conclusion

Page 28: Interoperability in the Cultural Heritage Domain

Why usage scenarios

• Evaluation of an alignment depends on its use.
• Real-world applications provide a test of the quality of alignments.
• Requirements on alignments depend on their use.
• What kinds of links should be distinguished?
• Optional demo evaluation:
  • http://localhost:33344/logineval
  • http://kits.cs.vu.nl:33344/logineval
• Next: reindexing, the use case nearest to a real-world application.

Page 29: Interoperability in the Cultural Heritage Domain

Situation at Dutch libraries, National Library (= KB)

• KB: two large collections:
  • Deposit collection (DEPOT): all Dutch-language publications
  • Its own scientific collection
  • Subject indexing uses two completely different indexing systems: Brinkman and GOO.
• Common automation system for the Netherlands and Europe (OCLC-Pica)
• Metadata of books contains many fields.
• A book or publication is given metadata by different libraries, using many different vocabularies.

Page 30: Interoperability in the Cultural Heritage Domain

Reindexing

• The KB has about 20 people indexing books daily; about 20,000 books per year are indexed.
• Indexing: adding keywords and classification information to books. Even internally, indexing is done according to different vocabularies.
• Some books come with indexing done by other libraries (openbare bibliotheken, i.e. public libraries; Biblion).
• If Biblion indices, or combinations of them, could be translated to KB indices (Brinkman), this would mean less work for the KB.

Page 31: Interoperability in the Cultural Heritage Domain

WinIBW

• OCLC (PICA): automation system for libraries in the Netherlands, also used elsewhere in Europe.
• Online Public Access Catalogue (OPAC)
• WinIBW: internet access to the Pica system (local and central). Adding records, adding metadata, searching records.
• Demo, closest to a real-world application.

Page 32: Interoperability in the Cultural Heritage Domain

Reindexing

• Biblion -> Brinkman: Fietstochten (cycling tours), Kapellen (chapels), Beesel (a place name), Heiligenbeelden (statues of saints), … -> Brinkman?
• Use alignment:
  • Bibl:Fietstochten -> Brinkman?
  • Bibl:Kapellen -> Brinkman?
• DEMO (example: z sel 3-10-2008 gd?79)


Page 41: Interoperability in the Cultural Heritage Domain


Result

Page 42: Interoperability in the Cultural Heritage Domain

Reindexing

• Under evaluation
• Improvement:
  • Use other metadata
  • Adapt scenario (pass records with 95% confidence)
• Many other uses.
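The adapted scenario could look roughly like the sketch below: Biblion terms are translated through an alignment, and only suggestions at or above a confidence threshold are accepted automatically, while the rest go to a human indexer. Reading "95% confidence" as such a threshold is an interpretation. The Brinkman targets are placeholders, and the confidence values and function name are hypothetical; none of them come from the actual KB data.

```python
# Hypothetical alignment: Biblion term -> (Brinkman term, confidence in [0, 1]).
# The Brinkman targets are placeholders, not real Brinkman headings.
ALIGNMENT = {
    "Fietstochten": ("Brinkman:placeholder-1", 0.97),
    "Kapellen": ("Brinkman:placeholder-2", 0.96),
    "Heiligenbeelden": ("Brinkman:placeholder-3", 0.60),
}


def reindex(biblion_terms, alignment, threshold=0.95):
    """Split a record's Biblion terms into accepted Brinkman terms and
    Biblion terms that still need a human indexer."""
    accepted, needs_review = [], []
    for term in biblion_terms:
        mapped = alignment.get(term)
        if mapped is None:
            needs_review.append(term)      # no mapping known for this term
            continue
        target, confidence = mapped
        if confidence >= threshold:
            accepted.append(target)        # confident enough to pass through
        else:
            needs_review.append(term)      # mapping exists but is too uncertain
    return accepted, needs_review


accepted, needs_review = reindex(
    ["Fietstochten", "Kapellen", "Beesel", "Heiligenbeelden"], ALIGNMENT
)
print(accepted)
print(needs_review)
```

Raising the threshold trades fewer automatic suggestions for fewer wrong ones, which is why the right setting depends on the usage scenario.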

Page 43: Interoperability in the Cultural Heritage Domain

Sketch of the vocabularies relevant to the KB

Page 44: Interoperability in the Cultural Heritage Domain

Integrated Access

• Services through the internet
• Protocols: SOAP, REST, …
• Collection access?
• Vocabulary access, alignment access
• http://eculture.cs.vu.nl:38080/vocreptags
• http://localhost:8080/vocreptags

Page 45: Interoperability in the Cultural Heritage Domain

Lessons

• Using semantic web techniques, interoperability and integration of collections can be made easier.
• Aligning vocabularies is of use in different situations. The alignment methods need to be fine-tuned to the application they are meant for.
• When introducing new techniques, interaction between the CH field and scientific institutes is very valuable.
• Standardization of access to collections and vocabularies should be dealt with (a prototype has been developed).

Page 46: Interoperability in the Cultural Heritage Domain

Terms

• An ontology in both computer science and information science is a formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain.

• Metadata (meta data, or sometimes metainformation) is "data about data", of any sort in any media. An item of metadata may describe an individual datum, or content item, or a collection of data including multiple content items and hierarchical levels, for example a database schema.

Page 47: Interoperability in the Cultural Heritage Domain

Terms

• A library classification is a system of coding and organizing library materials (books, serials, audiovisual materials, computer files, maps, manuscripts, realia) according to their subject and allocating a call number to that information resource. Similar to classification systems used in biology, bibliographic classification systems group similar entities together, typically arranged in a hierarchical tree structure.

• In information technology, a thesaurus represents a database or list of semantically orthogonal topical search keys. In the field of Artificial Intelligence, a thesaurus may sometimes be referred to as an ontology.