Semantic annotation and search of large virtual heritage collections
Guus Schreiber
Free University Amsterdam
Overview
• A non-technical view on the Semantic Web
• Work on Semantic-Web deployment
– SKOS, RDFa
• Semantic annotation and search in virtual collections: the E-Culture example
The Web: resources and links
[Figure: two URLs connected by a Web link]
The Semantic Web: typed resources and links
[Figure: two URLs connected by a typed link. The painting “Femme au chapeau” (SFMOMA) is linked via the Dublin Core property creator to the ULAN entry Henri Matisse]
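The typed link on this slide can be written down as a subject, predicate, object statement. The following is only a minimal sketch in plain Python: the Dublin Core namespace is real, but the painting and artist URIs under example.org are made-up placeholders, not the identifiers used by SFMOMA or ULAN.

```python
# Typed resources and links as (subject, predicate, object) triples.
# Only the Dublin Core namespace is real; the example.org URIs are
# hypothetical placeholders for the painting and the ULAN artist entry.
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"

painting = EX + "sfmoma/femme-au-chapeau"
matisse = EX + "ulan/henri-matisse"

triples = {
    (painting, DC + "creator", matisse),
    (painting, DC + "title", "Femme au chapeau"),
}


def objects(subject, predicate):
    """Return all objects linked from `subject` via the typed link `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


creators = objects(painting, DC + "creator")
```

Because the link carries a type (creator), a program can ask specifically for the creator of the painting instead of following an untyped hyperlink.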
Principle 1: semantic annotation
• Description of web objects with “concepts” from a shared vocabulary
Principle 2: semantic search
• Search for objects which are linked via concepts (semantic link)
• Use the type of semantic link to provide meaningful presentation of the search results
[Figure: concept labels urang-utang, orange, ape, great ape]
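One way to picture semantic search is as following concept links rather than matching strings. The sketch below assumes a toy hierarchy (orang-utan is narrower than great ape, which is narrower than ape) and two invented annotated objects; neither the hierarchy nor the object names come from the slides.

```python
# Hypothetical toy vocabulary: each concept points to its broader concept.
broader = {"orang-utan": "great ape", "great ape": "ape"}

# Hypothetical annotated objects: object id -> concept used in the annotation.
annotations = {"photo-1": "orang-utan", "photo-2": "orange"}


def path_to(concept, target):
    """Return the chain of concepts from `concept` up to `target`, or None."""
    path = [concept]
    while path[-1] != target:
        parent = broader.get(path[-1])
        if parent is None:
            return None
        path.append(parent)
    return path


def semantic_search(query):
    """Map each matching object to the concept path that explains the match."""
    results = {}
    for obj, concept in annotations.items():
        path = path_to(concept, query)
        if path is not None:
            results[obj] = path
    return results


hits = semantic_search("ape")
```

The returned path is what allows a meaningful presentation of the result: the object matches "ape" because it is annotated with a narrower concept, while the lexically similar "orange" is not returned.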
Principle 3: multiple vocabularies, or: the myth of a unified vocabulary
• In large virtual collections there are always multiple vocabularies
– In multiple languages
• Every vocabulary has its own perspective
– You can’t just merge them
• But you can use vocabularies jointly by defining a limited set of links
– “Vocabulary alignment”
• It is surprising what you can do with just a few links
Example: “Tokugawa”
AAT style/period: Edo (Japanese period), Tokugawa
SVCN period: Edo
SVCN is a local in-house thesaurus
A link between two thesauri
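The effect of a single alignment link can be sketched as follows. The concept identifiers, the object identifiers and the dictionary layout are all assumptions made for illustration; only the labels (Edo, Tokugawa) come from the example above.

```python
# Labels per concept in the two vocabularies (identifiers are hypothetical).
aat_labels = {"aat:edo": {"Edo (Japanese period)", "Tokugawa"}}
svcn_labels = {"svcn:edo": {"Edo"}}

# One alignment link between the two thesauri.
alignment = {"aat:edo": "svcn:edo"}

# Hypothetical objects indexed with the in-house SVCN thesaurus only.
svcn_index = {"svcn:edo": ["object-17", "object-42"]}


def search(term, links):
    """Find SVCN-indexed objects for `term`, optionally via alignment links."""
    found = []
    # Direct match on an SVCN label.
    for concept, labels in svcn_labels.items():
        if term in labels:
            found.extend(svcn_index.get(concept, []))
    # Match on an AAT label, then follow the alignment link into SVCN.
    for concept, labels in aat_labels.items():
        if term in labels and concept in links:
            found.extend(svcn_index.get(links[concept], []))
    return found


with_link = search("Tokugawa", alignment)
without_link = search("Tokugawa", {})
```

Without the link, a search for "Tokugawa" finds nothing in the SVCN-indexed collection; with it, the AAT synonym leads to the objects annotated with the SVCN period Edo.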
RDF/OWL language constructs
• classes and individuals
• subclasses
• properties
• subproperties
• domain/range of properties
• XML Schema datatypes
• equality, inequality
• inverse, transitive, symmetric, functional properties
• property constraints: cardinality, allValuesFrom, someValuesFrom
• conjunction, disjunction, negation of classes
• hasValue, enumerated type
How useful are RDF and OWL?
• RDF: basic level of interoperability
• Some constructs of OWL are key:
– Logical characteristics of properties: symmetric, transitive, inverse
– Identity: sameAs
• OWL pitfalls
– Bad: if it is written in OWL it is an ontology
– Worse: if it is not in OWL, then it is not an ontology
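To illustrate why the logical characteristics of properties matter, here is a minimal sketch of the inferences they license. It is a naive fixed-point loop in plain Python, not an OWL reasoner; the property names (partOf, relatedTo, creator, created) and the sample facts are chosen for illustration only.

```python
def infer(triples, symmetric=(), transitive=(), inverse_of=None):
    """Naively apply symmetric, transitive and inverse rules until nothing new appears."""
    inverse_of = inverse_of or {}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in transitive:
                # Chain s -p-> o -p-> o2 into s -p-> o2.
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


facts = infer(
    {
        ("Amsterdam", "partOf", "Noord-Holland"),
        ("Noord-Holland", "partOf", "Netherlands"),
        ("Edo", "relatedTo", "Tokugawa"),
        ("Femme au chapeau", "creator", "Henri Matisse"),
    },
    symmetric={"relatedTo"},
    transitive={"partOf"},
    inverse_of={"creator": "created"},
)
```

Three declarations about properties yield three statements nobody entered by hand, which is the kind of modest but useful inference the slide refers to.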
W3C Semantic Web Deployment Working Group: making vocabularies/thesauri/ontologies available on the Web
• Schema for interoperable RDF/OWL representation of vocabularies
– SKOS
• Publication guidelines:
– URI management, representation of versions
• Embedding RDF in (X)HTML pages
– RDFa
SKOS: pattern for thesaurus modeling
• Based on ISO standard
• RDF representation
• Documentation: http://www.w3.org/TR/swbp-skos-core-guide/
• Base class: SKOS Concept
Multi-lingual labels for concepts
Semantic relation: broader and narrower
• No subclass semantics assumed!
Indexing a resource with a SKOS concept
• primarySubject is defined as a subproperty of subject
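The SKOS pattern from the last few slides (concepts, multi-lingual labels, broader/narrower, indexing a resource) can be mirrored in a small data structure. This is a sketch in plain Python, not the SKOS RDF schema itself; the ex: identifiers, the sample concepts and the helper names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: dict = field(default_factory=dict)  # language tag -> label
    broader: list = field(default_factory=list)     # URIs of broader concepts


# Hypothetical two-concept vocabulary with English and Dutch labels.
concepts = {
    "ex:animals": Concept("ex:animals", {"en": "animals", "nl": "dieren"}),
    "ex:mammals": Concept(
        "ex:mammals", {"en": "mammals", "nl": "zoogdieren"}, ["ex:animals"]
    ),
}

# Indexing: a hypothetical resource annotated with a concept as its subject.
subject_index = {"ex:photo-1": ["ex:mammals"]}


def narrower(uri):
    """Narrower is the inverse of broader; no subclass semantics is implied."""
    return sorted(c.uri for c in concepts.values() if uri in c.broader)


def label(uri, lang):
    """Return the preferred label of a concept in the given language, if any."""
    return concepts[uri].pref_label.get(lang)


def resources_about(uri):
    """Return resources indexed with the given concept."""
    return sorted(r for r, subjects in subject_index.items() if uri in subjects)
```

For example, label("ex:mammals", "nl") gives the Dutch label, and narrower("ex:animals") lists the concepts that name ex:animals as broader.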
Adding semantics
• Adding OWL statements
• Interpretations of thesaurus relations such as narrower as subclass-of are often imprecise (but can still be useful)
• Learning relations between thesauri is an important form of additional semantics
– Example: AAT contains styles; ULAN contains artists, but there is no link
– Availability of this kind of alignment knowledge is extremely useful
W3C standardization process
• Input: draft specification
• Collect use cases
• Derive requirements
• Create issues list: requirements that cannot be handled by the draft spec
• Propose resolutions for issues
• Continuously: ask for public feedback/comments
• Get consensus on amended spec
• Find two independent implementations for each feature in the spec
Example issue: relationships between lexical labels
• In draft SKOS spec lexical labels of concepts are represented as datatype properties
• Use cases require relations between labels, e.g. “AAT” is an acronym of “Art & Architecture Thesaurus”
• This is a problem because literals have no URI (so cannot be subject of an RDF property)
• Possible resolutions:
– Labels/terms as classes
– Relaxing constraints on label property
– …..
Recipes for vocabulary URIs
• Simplified rule:
– Use “hash” variant for vocabularies that are relatively small and require frequent access
http://www.w3.org/2004/02/skos/core#Concept
– Use “slash” variant for large vocabularies, where you do not always want the whole vocabulary to be retrieved
http://xmlns.com/foaf/0.1/Person
• For more information and other recipes, see: http://www.w3.org/TR/swbp-vocab-pub/
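The reason behind the rule is that the fragment after “#” is never sent to the server, so every term of a hash vocabulary resolves to the same document. A small sketch using the standard library shows this for the two URIs above; the helper name document_url is made up for the example.

```python
from urllib.parse import urldefrag


def document_url(vocab_uri):
    """Return the URL a client actually requests: the fragment stays client-side."""
    url, _fragment = urldefrag(vocab_uri)
    return url


hash_uri = "http://www.w3.org/2004/02/skos/core#Concept"
slash_uri = "http://xmlns.com/foaf/0.1/Person"

hash_doc = document_url(hash_uri)    # one document for the whole vocabulary
slash_doc = document_url(slash_uri)  # each term has its own requestable URL
```

With the hash variant every lookup fetches the full vocabulary document, which is fine when it is small; with the slash variant the server sees the individual term and can answer with only the relevant part.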
Query for WordNet URI returns “concise bounded description”
RDFa: embedding RDF metadata in an (X)HTML file
Regular HTML
Resulting RDF statements
HTML with RDFa
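The slide content for this example did not survive in the transcript, so the following is a stand-in rather than the original. It sketches the idea with an invented HTML fragment that carries about and property attributes, and a deliberately tiny extractor built on Python's html.parser. It is not a conformant RDFa processor: it hard-codes the dc prefix, ignores nesting, and the page URI is a placeholder.

```python
from html.parser import HTMLParser

DC = "http://purl.org/dc/elements/1.1/"

# Invented page fragment; the example.org URI is a placeholder.
PAGE = """
<div about="http://example.org/painting/1">
  <span property="dc:title">Femme au chapeau</span> by
  <span property="dc:creator">Henri Matisse</span>
</div>
"""


class TinyRDFa(HTMLParser):
    """Collects (subject, predicate, literal) for elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.subject = None   # value of the most recent `about` attribute
        self.pending = None   # property waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        self.pending = attrs.get("property")

    def handle_data(self, data):
        if self.pending and data.strip():
            prefix, _, local = self.pending.partition(":")
            # Simplification: only the hard-coded "dc" prefix is expanded.
            predicate = DC + local if prefix == "dc" else self.pending
            self.triples.append((self.subject, predicate, data.strip()))
            self.pending = None


parser = TinyRDFa()
parser.feed(PAGE)
triples = parser.triples
```

The same markup renders as ordinary text in a browser, while a program can read two RDF statements about the painting out of it.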
More information
E-Culture demonstrator
• Part of large Dutch knowledge-economy project MultimediaN
• Partners: VU, CWI, UvA, DEN, ICN
• People:
– Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga
• Artchive.com, ICN: Rijksmuseum Amsterdam, Dutch ethnology museums (Amsterdam, Leiden), National Library (Bibliopolis)
Use case: painting style
Find paintings of a similar style
KLIMT, Gustav
Portrait of Adele Bloch-Bauer I
1907
Oil and gold on canvas
138 x 138 cm
Austrian Gallery, Vienna
How can we find this other ‘Art nouveau’ painting?
MUNCH, Edvard
The Scream
1893
Oil, tempera and pastel on cardboard
91 x 73.5 cm
National Gallery, Oslo
Issues w.r.t. the use case
• Parse annotation to find matches with thesaurus terms
– E.g. match artists to ULAN individuals
• Artist-style links
– AAT contains styles; ULAN contains artists, but there is no link
• Learn link from corpora
• Derive it from other annotations
– Domain-specific rules/reasoning needed
• see example in SWRL doc
• Painters may have painted in multiple styles
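Once artist-style links exist, the use case reduces to a simple rule: a painting may share a style with another painting if their artists are linked to a common style. The sketch below assumes such links have already been learned or derived; the identifiers and the specific links are illustrative, not taken from AAT or ULAN.

```python
# Paintings from the use case, annotated with a (hypothetical) ULAN artist id.
paintings = {
    "Portrait of Adele Bloch-Bauer I": "ulan:klimt",
    "The Scream": "ulan:munch",
}

# Illustrative artist-style links; a painter can have several styles.
artist_styles = {
    "ulan:klimt": {"aat:art-nouveau"},
    "ulan:munch": {"aat:art-nouveau", "aat:expressionism"},
}


def candidate_styles(painting):
    """Rule: a painting may be in any style its artist is linked to."""
    return artist_styles.get(paintings[painting], set())


def similar_style(painting):
    """Return other paintings that share at least one candidate style."""
    styles = candidate_styles(painting)
    return sorted(
        other
        for other in paintings
        if other != painting and styles & candidate_styles(other)
    )


matches = similar_style("Portrait of Adele Bloch-Bauer I")
```

Because painters may have worked in multiple styles, the rule only yields candidate styles per painting; a real system would need further evidence, such as the date of the work, to narrow them down.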
Example enrichment
• Learning relations between art styles in AAT and artists in ULAN through NLP of art-historic texts
• But don’t learn things that already exist!
Culture Web demonstrator
http://e-culture.multimedian.nl
16 Nov 2006
Perspectives
• Basic Semantic Web technology is ready for deployment
– in open knowledge-rich domains
– Important research issues: scalability, vocabulary alignment, metadata extraction
• Web 2.0 features:
– Involving community experts in annotation
– Personalization, myArt
• Social barriers have to be overcome!
– “open door” policy
– Involvement of general public => issues of “quality”
• Importance of using open standards
– Away from custom-made flashy web sites