ICS-FORTH January 11, 2000
Thesaurus Mapping
Martin Doerr
Foundation for Research and Technology - Hellas
Institute of Computer Science
Centre for Cultural Informatics and Documentation Systems
Bath, UK, January 11, 2000

Thesaurus Mapping: The Problem
- Logical aspects
  - Semantics of involved entities
  - Notions of translation
  - Objectives and logics of mapping
- Production of mappings
  - Human
  - Language engineering, cluster analysis
- Architecture
  - Mapping management
  - Mapping service
  - Integration in IT environment

Thesaurus Mapping: Why do we need mapping?
- Thesauri for information retrieval depend on:
  - Viewpoint (e.g. functional, morphological, social, special database fields, etc.)
  - Language or social group (experts, common people, etc.)
  - Size and distribution of target material (effective partitioning)
- Therefore:
  - Concepts differ
  - Use of concepts differs
  - Semantic embedding differs
- Even if we agree on the same world
- Research topic: formalisation of views and context

Thesaurus Mapping: Semantics of entities
- Concepts are defined by agreement, e.g. orange (colour)
- Concepts identify sets of real-world objects
- Concepts are identified by scope notes, literature references, examples, images
- Concepts should not be changed:
  - they should be created or abandoned
  - they should be understood, accepted or rejected
- A descriptor is a concept identifier

Thesaurus Mapping: Semantics of entities
- Links should express opinions and differences:
  - about set relations between concepts (subsumption, disjointness, etc.)
  - about derived concepts
  - about term usage
  - opinions may be human or computational!
- Terms (noun phrases) should be:
  - used by social groups to refer to (multiple) concepts
  - without direct linguistic meaning
  - one term is selected as concept identifier

Thesaurus Mapping: Semantics of entities
- Concept - concept relations:
  - set semantics: BT, between thesauri/versions - for query expansion, users
  - associative: RTs, BTP, etc. - for user guidance
- Concept - term relations:
  - authoritative: preferred, used for - for cataloguers, users
  - statistical, possible synonyms - for information retrieval
- Term - term relations:
  - dictionary entries - limited precision, within LE tools
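
The entities discussed on these slides (concepts, descriptors, terms, links) can be pictured as a small data model. The Python sketch below is only an illustration: the class names, field names and example values are hypothetical and are not taken from ISO 5964 or from any particular thesaurus management system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A noun phrase used by a social group; it may refer to several concepts."""
    label: str
    language: str


@dataclass
class Concept:
    """A concept, identified by its descriptor and explained by a scope note."""
    descriptor: Term                                     # the term selected as concept identifier
    scope_note: str = ""
    used_for: list[Term] = field(default_factory=list)   # non-preferred terms


@dataclass(frozen=True)
class Link:
    """An opinion about how two concepts relate, e.g. 'BT' or 'RT'."""
    source: str        # descriptor label of the source concept
    relation: str
    target: str        # descriptor label of the target concept
    asserted_by: str   # a human group or a computational tool


# Illustrative values only (assumptions, not from a real thesaurus).
orange = Concept(
    descriptor=Term("orange (colour)", "en"),
    scope_note="Colour between red and yellow.",
    used_for=[Term("orange colour", "en")],
)
colour = Concept(descriptor=Term("colour", "en"))

# Read as: "orange (colour)" has the broader term "colour".
link = Link(orange.descriptor.label, "BT", colour.descriptor.label, "editor group")
```

Keeping the author of each link (`asserted_by`) reflects the point that links are opinions, which may come from humans or from tools.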

Thesaurus Mapping: What is a Multilingual Thesaurus?
- A translated thesaurus: for comprehension
  - Established concepts and terms from one user group
  - Optimally interpreted in words of one or more other languages
  - Translations are not established terms
- Mapped thesauri (ISO 5964): for transition
  - Independent thesauri, each one from another user group
  - Established concepts and terms
  - Links declare “overlap” between concepts
- Interlingua: for communication and knowledge sharing
  - Compromise to share concepts between many user groups
  - Optimally interpreted in words of another language

Thesaurus Mapping: Functionality of Mapping
- Transparent query transformation (Z39.50!)
  - Replace a Boolean term combination from thesaurus A with the optimal term combination from thesaurus B to retrieve equivalent results
- Guaranteed transition needed (possibly to higher concepts)
- Need controlled loss of precision or recall (research!)
- Combinatorial explosion:
  - Need cascading: Thes. A => Thes. B => Thes. C
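
The query transformation and cascading described on this slide can be sketched as a recursive substitution over a Boolean query tree. This is a minimal illustration under assumptions: the tuple encoding of queries, the function names `translate` and `cascade`, and the mapping tables are all hypothetical, and every link is assumed to be an exact equivalence (substituting a broader or narrower target under "NOT" would not preserve results).

```python
# A query is a descriptor (str) or a tuple: ("AND", q1, q2), ("OR", q1, q2), ("NOT", q).

def translate(query, mapping):
    """Replace every descriptor in `query` by its mapped expression in the target thesaurus."""
    if isinstance(query, str):
        if query not in mapping:
            raise KeyError(f"no mapping for descriptor {query!r}")
        return mapping[query]
    operator, *operands = query
    return (operator, *(translate(operand, mapping) for operand in operands))


def cascade(query, *mappings):
    """Apply several mapping tables in sequence, e.g. Thes. A => Thes. B => Thes. C."""
    for mapping in mappings:
        query = translate(query, mapping)
    return query


# Hypothetical mapping tables; the "A:", "B:", "C:" prefixes mark the thesaurus.
a_to_b = {
    "A:mills": ("OR", "B:watermills", "B:windmills"),
    "A:ruins": "B:ruins",
}
b_to_c = {
    "B:watermills": "C:watermills",
    "B:windmills": "C:windmills",
    "B:ruins": "C:ruins",
}

query_a = ("AND", "A:mills", ("NOT", "A:ruins"))
query_c = cascade(query_a, a_to_b, b_to_c)
```

With cascading, n thesauri need only a chain of mapping tables rather than one table for every pair, which is the point of the "combinatorial explosion" remark.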

Thesaurus Mapping: Logics of Mapping
Interthesaurus relations (ISO 5964), from a descriptor of Thes. A to a descriptor of Thes. B:
- partial equivalence
  - Better: broader equivalence, narrower equivalence
- exact equivalence
- inexact equivalence (“+/-”)
  - good for FTR only
- single to multiple equivalence
  - Better: exact equivalence to a BOOLEAN combination of target terms: “AND” (intersection), “OR” (union), “NOT” (complement)
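
One way to record these relation types is a typed link whose target may be a single descriptor or a Boolean combination. The sketch below is an assumption-laden illustration: the names `Equivalence`, `MappingLink` and `guarantees` are hypothetical, "broader" is read as "the target covers more than the source", and the recall/precision table follows from plain set reasoning rather than from the standard.

```python
from dataclasses import dataclass
from enum import Enum


class Equivalence(Enum):
    EXACT = "exact"
    BROADER = "broader"      # assumption: the target covers more than the source concept
    NARROWER = "narrower"    # assumption: the target covers less than the source concept
    INEXACT = "+/-"          # overlap only


@dataclass(frozen=True)
class MappingLink:
    source: str              # descriptor in thesaurus A
    kind: Equivalence
    target: object           # descriptor in thesaurus B, or a tuple such as ("OR", "x", "y")


def guarantees(kind):
    """Return (keeps_recall, keeps_precision) when the target replaces the source in a query."""
    return {
        Equivalence.EXACT: (True, True),
        Equivalence.BROADER: (True, False),    # superset: nothing missed, extra hits possible
        Equivalence.NARROWER: (False, True),   # subset: no extra hits, some may be missed
        Equivalence.INEXACT: (False, False),
    }[kind]


# Hypothetical link: one source descriptor mapped exactly to a Boolean compound.
link = MappingLink("A:mills", Equivalence.EXACT, ("OR", "B:watermills", "B:windmills"))
keeps_recall, keeps_precision = guarantees(link.kind)
```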

Thesaurus Mapping: Translation and Mapping
[Figure: diagram relating the English Heritage Thesaurus and the Merimee Thesaurus (interthesaurus relations), the English Vocabulary and the French Vocabulary (linguistic translation), and an Interlingua; links are labelled “+/-” and “AND”.]

Thesaurus Mapping: Boolean OR-Combinations
[Figure: concept A in exact equivalence with the Boolean compound “B OR C”; labels: A, B, C, BT.]
- Combines instances of B and C
- Uses properties of either B or C
- Is BT of B, C and NT of their common broader terms

Thesaurus Mapping: Boolean AND-Combinations
[Figure: concept A in exact equivalence with the Boolean compound “B AND C”; labels: A, B, C, BT.]
- Uses instances of both B and C
- Combines properties of B and C
- Is NT of B, C and BT of their common narrower terms
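
If concepts are read as sets of real-world objects, the BT/NT position of a Boolean compound follows directly from set operations. The sketch below illustrates this with made-up instance sets; the object names, the `universe` set and the function `extension` are assumptions introduced for the example.

```python
# Hypothetical instance sets: the objects each concept of the target thesaurus identifies.
instances = {
    "B": {"obj1", "obj2", "obj3"},
    "C": {"obj3", "obj4"},
}
universe = {"obj1", "obj2", "obj3", "obj4", "obj5"}


def extension(expr):
    """Evaluate a descriptor or Boolean compound to the set of objects it identifies."""
    if isinstance(expr, str):
        return instances[expr]
    operator, *operands = expr
    sets = [extension(operand) for operand in operands]
    if operator == "AND":
        return set.intersection(*sets)
    if operator == "OR":
        return set.union(*sets)
    if operator == "NOT":
        return universe - sets[0]
    raise ValueError(f"unknown operator {operator!r}")


b_or_c = extension(("OR", "B", "C"))
b_and_c = extension(("AND", "B", "C"))

# "B OR C" contains B and C, so it behaves as a BT of both.
assert instances["B"] <= b_or_c and instances["C"] <= b_or_c
# "B AND C" is contained in B and in C, so it behaves as an NT of both.
assert b_and_c <= instances["B"] and b_and_c <= instances["C"]
```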

Thesaurus Mapping: Approximation by Inclusion
[Figure: concept A approximated by concepts B and C; link labels: BT, broader equivalence, narrower equivalences.]

Thesaurus Mapping: Avoid redundant linking!
[Figure: concepts A and B; link labels: BT, broader equivalence, narrower equivalences, exact equivalence.]
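
One possible reading of "avoid redundant linking" is that a broader-equivalence link to a target concept already implies links to all of that target's own broader terms, so only the most specific targets need to be stored. The sketch below illustrates that reading only; the function names, the example hierarchy and the assumption that the BT hierarchy is acyclic are not from the slides.

```python
def ancestors(concept, parents):
    """All concepts reachable from `concept` by following BT links upwards."""
    found = set()
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def prune_broader_links(targets, parents):
    """Drop broader-equivalence targets already implied, via BT, by a more specific target."""
    implied = set()
    for target in targets:
        implied |= ancestors(target, parents)
    return set(targets) - implied


# Hypothetical BT hierarchy of the target thesaurus: child -> list of its broader terms.
parents_b = {
    "watermills": ["mills"],
    "mills": ["industrial buildings"],
}

# A is broader-equivalent to both "mills" and "industrial buildings";
# the second link follows from the first and can be dropped.
links = prune_broader_links({"mills", "industrial buildings"}, parents_b)
```

The same idea applies in mirror image to narrower equivalences, using NT links downwards.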

Thesaurus Mapping: Problems of Mapping
- Consistency and reasoning (Description Logics!)
- Optimal substitution of combined query terms
- Protocol to propagate recall/precision control
- Inverse reading of one-to-many links
- Postcoordination: unclear semantics!
  - e.g. “grinding & factories”; solution by DL?

Thesaurus Mapping: Production of Mappings
- Human assessment needs (see Term-IT):
  - CSCW, work flow, decentralised management tools
  - Excellent comparative presentation of thesaurus contents
- Language engineering (see Term-IT):
  - termhood recognition, automatic translation by parallel texts, filtering by occurrence in target indexing language
  - Excellent for preprocessing!
- Analysis of use:
  - Cluster analysis with doubly indexed entries
  - Libraries: problem to identify the same “work”!
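
The "analysis of use" idea can be illustrated with records that carry index terms from both thesauri. The sketch below uses a plain co-occurrence (Jaccard) score, which is a much simpler stand-in for real cluster analysis; the records, the function name `candidate_links` and the threshold are assumptions, and the output would only be a list of candidates for human assessment.

```python
from collections import Counter
from itertools import product

# Hypothetical doubly indexed records: (terms from thesaurus A, terms from thesaurus B).
records = [
    ({"mills"}, {"moulins"}),
    ({"mills"}, {"moulins"}),
    ({"mills", "bridges"}, {"moulins", "ponts"}),
    ({"bridges"}, {"ponts"}),
]


def candidate_links(records, min_score=0.5):
    """Score term pairs by how strongly they co-occur on the same records."""
    pair_counts = Counter()
    a_counts = Counter()
    b_counts = Counter()
    for terms_a, terms_b in records:
        a_counts.update(terms_a)
        b_counts.update(terms_b)
        pair_counts.update(product(terms_a, terms_b))
    scored = []
    for (a, b), n in pair_counts.items():
        # Jaccard overlap of the record sets indexed by a and by b.
        score = n / (a_counts[a] + b_counts[b] - n)
        if score >= min_score:
            scored.append((a, b, round(score, 2)))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[2], item[0], item[1]))


candidates = candidate_links(records)
```

As the slide notes for libraries, this only works once the same "work" has been identified in both collections, which the sketch simply takes for granted.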

SIS - Thesaurus Management System: Co-operative linking
[Figure: diagram of co-operative linking between Group 1 and Group 2; labels: BT, Version 0, Version 1, Version 2, New Workspace, obsolete term, links of group 1, links of group 2.]

Thesaurus Mapping: Users Environment
[Figure: labels: User’s Authorities, Target Authorities, CMS Collections, Distributed Retrieval, Local Term, Agreed-on Term, old version, specialized, foreign language, “??”.]

Thesaurus Mapping: Three-level Architecture
[Figure: labels: End User, Search Aid Tool, Cascaded mapping service, CMS Maintainer, CMS, Local TMS, National Authority Providers; arrow labels: concept proposal, Thesaurus initialization, Update term use.]

Thesaurus Mapping: Architectural Considerations
- We propose to distinguish:
  - Collection Management Systems with local term management
  - National authority providers
  - Mapping service
- Mapping service:
  - Co-operative mapping production environment and system, for few languages (3?), domain specific?
  - Large-scale mapping tables detached from the production system, accessible as a replicated Web resource
- Integration:
  - Access engines connect to mapping resources on demand
  - Provision of suitable metadata for CMS capabilities

Thesaurus Mapping: Conclusions
- Thesaurus mapping is feasible and the best means to coherently access multiple CMS with controlled vocabulary
- Thesaurus mapping is a major investment in human resources and IT environment
- Targeted research can much improve the currently feasible
  - quality of mapping
  - quality of service
  - and production cost