Terminology mapping for subject cross-browsing in distributed information environments

Libo Si, PhD student in the Department of Information Science, Loughborough University

Background

Users face many different information resources, each organised using different schemes.

Library portal systems, such as MetaLib and SirSi Room, provide a single access point.

Background

Keyword cross-searching: map different metadata schemes to make them interoperable.

Subject cross-browsing: integrate different knowledge organisation systems (KOSs) into a hierarchical tree.

Issues:

Interoperability between different knowledge organisation systems

Interoperability between metadata standards

My Research Aim

To develop methods to facilitate both subject cross-browsing and cross-searching for library portal systems.

Objectives

To investigate different methods to develop a cross-search service in a library portal product;

To investigate different methods to make different metadata standards interoperable;

To investigate different methods to make different knowledge organisation systems interoperable;

To indicate some trends in establishing ontologies to facilitate both cross-searching and cross-browsing by subject for the development of library portal systems.

Methodology

Case study: HILT, Renardus, MetaNet, ABC Ontology, OpenCyc Ontology, ePrint UK, and UMLS.

Investigate the different methods used by these projects to facilitate subject cross-browsing and cross-searching services.

Methods to cross-search (1)

Federated Search (Sadeh 2006)

Methods to cross-search (1)

“A cross-search service can create and maintain their own repository of resource metadata” (Sadeh 2004).

Issues:

Loss of data value

Cannot capture the rich knowledge organisation systems used by different online databases, due to the lack of methods to reuse different metadata schemes and controlled vocabularies (Hughs and Kamat 2005).

Methods to cross-search (2)

An alternative is …

In the semantic web community, ontologies can be constructed to maximise the use of both subject classification systems and metadata schemes across different collections.

Each participating resource provider can offer its metadata and classification systems to any cross-search service.

Mapping semantics of different metadata standards

Derivation;

Application profile;

Crosswalk (one-to-one, and switch);

Metadata registry;

Data reuse and integration (RDF);

Aggregation.

- Chan and Zeng (2006)

Derivation

One metadata scheme can be developed based on the principle and structure of an existing one (Chan and Zeng 2006a).

Ex.: TEI Lite is derived from the full Text Encoding Initiative (TEI).

Application profile

An application profile can be defined by combining a selected range of metadata elements from different metadata schemes for some application-specific purpose (Heery and Patel 2004).
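As a rough illustration of this definition, an application profile can be modelled as a named selection of elements drawn from several namespaces. The sketch below is only a toy under stated assumptions: the `ApplicationProfile` class, the `ex` namespace with its URI, and the chosen elements are hypothetical and are not taken from any real profile.

```python
from dataclasses import dataclass, field

# Namespace prefixes and URIs. "dc" is the Dublin Core 1.1 element namespace;
# "ex" is a made-up local namespace used only for this sketch.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/local-elements/",
}


@dataclass
class ApplicationProfile:
    """A named selection of (prefix, element) pairs from several schemes."""
    name: str
    elements: set = field(default_factory=set)

    def use(self, prefix, element):
        """Add one element from a known namespace to the profile."""
        if prefix not in NAMESPACES:
            raise ValueError(f"unknown namespace prefix: {prefix}")
        self.elements.add((prefix, element))

    def validate(self, record):
        """Return the record keys that the profile does not allow."""
        return sorted(
            key for key in record
            if tuple(key.split(":", 1)) not in self.elements
        )


profile = ApplicationProfile("toy-portal-profile")
profile.use("dc", "title")
profile.use("dc", "creator")
profile.use("ex", "subjectGateway")  # hypothetical local element

record = {
    "dc:title": "Patrologia Latina Database",
    "ex:subjectGateway": "demo",
    "dc:rights": "n/a",  # not part of the profile, so it is reported
}
unexpected = profile.validate(record)
```

Here `unexpected` lists the keys of a record that fall outside the profile, which is one simple way a portal could check incoming records against the profile it has agreed on.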

Project using Application Profile

Five namespaces used by the Renardus application profile (http://renardus.sub.unigoettingen.de/renap/renap.html):

Renardus Metadata Element Set (rmes),

Renardus Metadata Element Set Qualifiers (rmesq),

Dublin Core Metadata Element Set, version 1.1 (dc 1.1),

Dublin Core Metadata Element Set Qualifiers (dcterms),

DCMI Type Vocabulary (dcmitype).

Crosswalk

“A crosswalk is a specification for mapping one metadata standard to another” (St. Pierre and LaPlant 1998).

One-to-one

Many-to-many (switch scheme)
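The two crosswalk styles can be contrasted in a small sketch. All scheme names (`A`, `B`, `C`), element names and the `sw:` switch scheme below are invented for illustration and do not reproduce any published crosswalk. In the one-to-one case a table links one pair of schemes directly; in the switch case each scheme is mapped once to a shared intermediate scheme.

```python
# One-to-one crosswalk: a direct table from scheme A elements to scheme B.
A_TO_B = {"a:name": "b:title", "a:author": "b:creator"}

# Switch-scheme crosswalk: every scheme is mapped once to a shared switch
# scheme, so n schemes need n tables rather than one table per pair.
TO_SWITCH = {
    "A": {"a:name": "sw:title", "a:author": "sw:creator"},
    "B": {"b:title": "sw:title", "b:creator": "sw:creator"},
    "C": {"c:heading": "sw:title", "c:maker": "sw:creator"},
}


def convert_direct(record, table):
    """Rename the keys of a record using a one-to-one crosswalk table."""
    return {table[key]: value for key, value in record.items() if key in table}


def convert_via_switch(record, source, target):
    """Convert source -> switch scheme -> target by inverting the target table."""
    from_switch = {sw: elem for elem, sw in TO_SWITCH[target].items()}
    in_switch = convert_direct(record, TO_SWITCH[source])
    return convert_direct(in_switch, from_switch)


record_a = {"a:name": "Patrologia Latina Database", "a:author": "Jacques Paul Migne"}
direct_b = convert_direct(record_a, A_TO_B)
switched_c = convert_via_switch(record_a, "A", "C")
```

Elements with no entry in a table are silently dropped in this sketch; a real crosswalk would also need rules for one-to-many and many-to-one element matches.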

Metadata scheme registry

A metadata registry refers to an application that provides services based on information about 'metadata terms' and about related resources (Johnston 2005).

Ex: the CORES registry lists more than 40 metadata schemes, and supports searching and browsing by metadata scheme developer, maintenance agency, element sets, elements, encoding schemes, application profiles and element usages. (http://www.cores-eu.net/registry/)

Data reuse and integration

This refers to describing information objects by using different elements from different metadata schemes or application profiles (Chan and Zeng 2006b).

The Resource Description Framework (RDF) provides a basic platform for integrating different metadata schemes to describe web resources (Heery and Patel 2004).

RDF can facilitate the use of different application profiles.

An RDF example

<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.0/"
         xmlns:bc="http://www.schemas-forum.org/registry/schemas/BIBLINK/1.0/bc-ap#">
  <rdf:Description about="urn:isbn:0-89887-113-1">
    <dc:title>Patrologia Latina Database</dc:title>
    <dc:creator>Jacques Paul Migne</dc:creator>
    <dc:date>1993</dc:date>
    <dc:language>la</dc:language>
    <bc:extent>2 computer laser optical disks; 4 3/4 in</bc:extent>
    <bc:systemRequirements>Multimedia PC 486x or higher, 8mb memory, CD-ROM drive, sound card, SVGA 256-colour monitor, Windows 95 or Windows 3.1</bc:systemRequirements>
    <dc:subject rdf:value="Christian literature, Early" bc:subjectScheme="LCSH" />
    <dc:identifier rdf:value="isbn:0-89887-113-1" bc:identifierScheme="URN" />
    <bc:placePublication>Cambridge</bc:placePublication>
    <dc:publisher>Chadwyck-Healey</dc:publisher>
  </rdf:Description>
</rdf:RDF>
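To make the point about mixing schemes concrete, the sketch below reads a shortened copy of this record and groups its properties by namespace, showing Dublin Core (`dc`) and BIBLINK (`bc`) elements side by side in one description. It uses Python's standard `xml.etree.ElementTree` as a plain XML parser, which is an assumption of convenience and not a full RDF processor; the helper name `properties_by_namespace` is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A shortened copy of the record from the slide (three properties only).
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.0/"
         xmlns:bc="http://www.schemas-forum.org/registry/schemas/BIBLINK/1.0/bc-ap#">
  <rdf:Description about="urn:isbn:0-89887-113-1">
    <dc:title>Patrologia Latina Database</dc:title>
    <dc:creator>Jacques Paul Migne</dc:creator>
    <bc:placePublication>Cambridge</bc:placePublication>
  </rdf:Description>
</rdf:RDF>
"""


def properties_by_namespace(rdf_xml):
    """Group the property elements of each description by namespace URI."""
    root = ET.fromstring(rdf_xml)
    grouped = defaultdict(dict)
    for description in root:
        for prop in description:
            # ElementTree stores qualified names as "{namespace}local".
            namespace, local_name = prop.tag[1:].split("}", 1)
            grouped[namespace][local_name] = prop.text
    return dict(grouped)


grouped = properties_by_namespace(RDF_XML)
for namespace, props in grouped.items():
    print(namespace, sorted(props))
```

Because each property carries its own namespace, a consumer can pick out the elements it understands and ignore the rest, which is what lets several schemes share one record.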

Aggregation

This refers to:

Employing a central knowledge base to gather metadata records from different online databases using different metadata standards

Converting heterogeneous metadata records into a consistent form

Developing a range of enhancement services to enrich the metadata records gathered.
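The three steps above (gather, convert, enrich) can be sketched as a tiny pipeline. Everything here is assumed for illustration: the source names, field names, sample records and the `title_sort` enhancement are invented, and a Python list stands in for the central knowledge base.

```python
# Per-source crosswalks into one common form. Source and field names are
# hypothetical stand-ins for databases using different metadata standards.
CROSSWALKS = {
    "source_a": {"Title": "title", "Author": "creator"},
    "source_b": {"ti": "title", "au": "creator"},
}


def normalise(source, record):
    """Convert one harvested record into the common form."""
    table = CROSSWALKS[source]
    common = {table[key]: value for key, value in record.items() if key in table}
    common["source"] = source  # keep provenance alongside the converted fields
    return common


def enrich(record):
    """A toy enhancement service: add a normalised sort key for the title."""
    enriched = dict(record)
    enriched["title_sort"] = record.get("title", "").strip().lower()
    return enriched


def aggregate(harvested):
    """Gather, convert and enrich records into a central store (a list here)."""
    return [enrich(normalise(source, record)) for source, record in harvested]


# Made-up sample records, one per source.
harvested = [
    ("source_a", {"Title": "Example Record One", "Author": "Author A"}),
    ("source_b", {"ti": "  Example Record Two ", "au": "Author B"}),
]
knowledge_base = aggregate(harvested)
```

After aggregation every record has the same field names, so a single search index can be built over `knowledge_base` regardless of where each record came from.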

Project using Aggregation - ePrint UK

(Powell 2001)

Mapping semantics of different KOSs

Derivation

Direct mapping

Switch language

Merging

Co-occurrence mapping

Satellite and leaf node linking

Derivation

A subject-specific vocabulary is developed based on some widely-used general vocabularies.

Ex: MeSH was developed based on the structure of LCSH.

Direct mapping

(Chan and Zeng 2004)

Switch language

(Mai 2003)

Projects using a switch language

The HILT Project: uses DDC as a switch language to guide users to relevant information.

The Renardus Project.
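A minimal sketch of the switch-language idea follows. Each vocabulary is mapped once to DDC-style class numbers, and a term is translated by going through the shared class. The vocabulary names and the term-to-class assignments are illustrative assumptions and are not taken from the HILT or Renardus mapping data.

```python
# Each vocabulary is mapped once to the switch language (DDC-style class
# numbers). These assignments are illustrative only.
TO_SWITCH = {
    "vocab_a": {"Medicine": "610", "Farming": "630"},
    "vocab_b": {"Medical sciences": "610", "Agriculture": "630"},
}


def translate(term, source, target):
    """Translate a term from one vocabulary to another via the switch class."""
    switch_class = TO_SWITCH[source].get(term)
    if switch_class is None:
        return []
    return sorted(
        candidate for candidate, cls in TO_SWITCH[target].items()
        if cls == switch_class
    )


def mappings_needed(n_vocabularies):
    """Mapping tables needed: pairwise direct mapping vs. one switch language."""
    direct = n_vocabularies * (n_vocabularies - 1) // 2
    via_switch = n_vocabularies
    return direct, via_switch


equivalents = translate("Farming", "vocab_a", "vocab_b")
direct, via_switch = mappings_needed(5)
```

The `mappings_needed` helper shows why a switch language scales better than direct mapping: with five vocabularies, pairwise mapping needs ten tables while a switch language needs five.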

Co-occurrence mapping

(Zeng and Chan 2004)

Merging

Different vocabularies in the same domain can be merged into a super-thesaurus.

Ex: The Unified Medical Language System (UMLS) merges concepts from about fifty medical controlled vocabularies into a metathesaurus.
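Merging can be pictured as grouping terms from several vocabularies under shared concept identifiers. The sketch below assumes the hard part (deciding which terms denote the same concept) has already been done. The vocabulary names and the `C000x` identifiers are invented and are not real UMLS identifiers.

```python
from collections import defaultdict

# Source vocabularies: term -> concept identifier (all values invented).
VOCABULARIES = {
    "vocab_a": {"Heart attack": "C0001", "Hypertension": "C0002"},
    "vocab_b": {"Myocardial infarction": "C0001", "High blood pressure": "C0002"},
}


def merge(vocabularies):
    """Build concept -> {vocabulary: [terms]} from several term lists."""
    merged = defaultdict(lambda: defaultdict(list))
    for vocab, terms in vocabularies.items():
        for term, concept in terms.items():
            merged[concept][vocab].append(term)
    return {concept: dict(by_vocab) for concept, by_vocab in merged.items()}


def synonyms(merged, term):
    """All terms, from any vocabulary, that share a concept with `term`."""
    for by_vocab in merged.values():
        all_terms = [t for terms in by_vocab.values() for t in terms]
        if term in all_terms:
            return sorted(t for t in all_terms if t != term)
    return []


metathesaurus = merge(VOCABULARIES)
related = synonyms(metathesaurus, "Heart attack")
```

With the merged structure in place, a search for a term from one vocabulary can be expanded with the equivalent terms used by the other sources.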

Satellite and leaf node linking

Editors can select and adapt parts of a general vocabulary as a subject-specific vocabulary for some particular requirements.

Ex: A number of domain-specific controlled vocabularies have been developed by selecting parts of LCSH.

Ontology mapping for subject cross-search and browsing

Current efforts within the digital library community include developing ways to map different metadata schemes, and ways to map different knowledge organisation systems.

In the semantic web community, the ways to improve semantic interoperability include the construction of ontology and ontology mapping.

There is much in common between the methods used by these two communities.

What is an ontology?

Definition: An ontology is a formal (explicit) specification of a conceptualization shared by a community of people (R. Studer, 1998).

The difference between an ontology and other knowledge organisation systems.

Types of ontologies in digital libraries

Upper level ontology

Domain ontology

Upper level ontology

An upper level ontology is a common vocabulary of basic concepts (such as things, space, events, time and behaviour) and the relations between them (Gomez-Perez and Benjamins 1999; Ding and Foo 2004a).

Ex: OpenCyc, WordNet, and ABC ontology.

ABC Ontology

“It provides the notional basis for developing domain, role, or community specific ontologies, and it incorporates a number of basic entities and relationships common across other metadata ontologies including time and object modification, agency, places, concepts, and tangible objects. Communities wishing to build their own metadata ontologies and models may then extend the ABC entities and relationships as needed” (Lagoze and Hunter 2001).

ABC Ontology is designed to incorporate basic entities and relationships common across different metadata standards, and provide a basis to create metadata ontologies, into which different metadata schemes can be mapped.

OpenCyc Ontology

This is a universal ontology, in which "every concept one can imagine can be correctly linked into the OpenCyc Ontology in appropriate places, no matter how general or specific, no matter how arcane or prosaic, no matter what the context (nationality, age, native language, epoch, childhood experiences, current goals, etc.) of the imaginer" (Stubkjar 2001).

It provides a framework for building custom and domain-specific ontologies.

WordNet Ontology

This is a “manually constructed online lexical reference system” (Noy and Hafner 1997). In WordNet, lexical objects are organised systematically, with a basic distinction between nouns, verbs, adjectives, and adverbs. Nouns are grouped by concept, and concepts are organised hierarchically. A verb is related to a concept’s function, and an adjective is related to a concept’s property.

The WordNet ontology is often used to provide a taxonomic tree and to support natural language processing.

Domain ontology

A domain-specific vocabulary that encompasses the concepts in a given domain (such as medicine, agriculture, computer science, etc.) and their relationships (Gomez-Perez and Benjamins 1999; Uschold and Gruninger 1996; Guarino 1997).

In some cases, traditional KOSs could potentially be integrated to form the basis of a domain ontology.

Use of ontologies

MetaNet:

Different metadata elements from different metadata schemes have been mapped to ABC ontology.

Mappings between E-learning object metadata and OpenCyc ontology

Mappings between MeSH and OpenCyc ontology

Mappings between different subject classification systems and OpenCyc

An Ontology Library System

“An ontology library system is a library system that offers various functions for managing, adapting and standardizing groups of different ontologies” (Ding and Fensel 2001).

To support searching and browsing different ontologies.

Conclusion (1)

A library portal system should be able to maximise the reuse of existing library resources, such as metadata schemes and knowledge organisation systems.

To improve semantic interoperability, each resource provider is expected to publish its metadata schemes and knowledge organisation systems in semantic-web-enabled formats (RDF, XML), so that these resources can be reused.

Conclusion (2)

In order to facilitate cross-searching:

Develop or apply a common metadata scheme, into which different metadata elements from different metadata schemes can be mapped.

Different metadata schemes can also be mapped into an upper level ontology.

These two approaches can be developed together.

Conclusion (3)

To facilitate cross-browsing by subject:

Different knowledge organisation systems can be mapped onto DDC, which then serves as a subject navigation tree.

In order to support more powerful computational semantics, all concepts, intra-relationships, and inter-relationships in different knowledge organisation systems can be mapped into an upper level ontology.
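The first of these options, a DDC-based navigation tree with local terms attached to its nodes, can be sketched in a few lines. The tree below is a three-class slice with simplified captions, and the vocabulary names and term assignments are assumptions made for illustration only.

```python
# A small slice of a DDC-style hierarchy: class number -> (caption, parent).
# Captions are simplified for this sketch.
TREE = {
    "600": ("Technology", None),
    "610": ("Medicine and health", "600"),
    "630": ("Agriculture", "600"),
}

# Terms from local KOSs mapped onto tree nodes (illustrative assignments).
MAPPED_TERMS = {
    "610": [("vocab_a", "Medicine"), ("vocab_b", "Medical sciences")],
    "630": [("vocab_a", "Farming")],
}


def descendants(node):
    """The node itself plus every class below it in the tree."""
    found = [node]
    for child, (_, parent) in TREE.items():
        if parent == node:
            found.extend(descendants(child))
    return found


def browse(node):
    """Collect the local terms reachable by browsing from `node` downwards."""
    terms = []
    for class_number in descendants(node):
        terms.extend(MAPPED_TERMS.get(class_number, []))
    return terms


under_technology = browse("600")
under_agriculture = browse("630")
```

Browsing from a broad class gathers the terms of every narrower class beneath it, so a user can start at a general subject and reach resources indexed with any of the mapped vocabularies. Mapping into an upper level ontology would extend this idea from a single hierarchy to richer typed relationships.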

Conclusion (4)

A variety of mappings have been developed.

Each type of mapping is designed to offer specific capabilities to improve semantic interoperability, and limited search or browsing functions.

A combination of the different types of mapping is required.

Thank you and questions!

Libo Si