Page 1: Taxonomies: Insuring compatibility and crosswalks Marjorie M. K. Hlava Access Innovations / Data Harmony mhlava@accessinn.com.

Taxonomies: Insuring compatibility and crosswalks

Marjorie M. K. Hlava

Access Innovations / Data Harmony

mhlava@accessinn.com

Page 2:

Background

"Underlying the information architecture for web sites and search are taxonomies. The standards for thesauri, taxonomies, ontologies, semantic web and topic maps are converging. Where do they differ and where are they the same? This one-hour talk will cover the ISO, ANSI/NISO and W3C terminology and controlled vocabulary standards, as well as the differences in the new standards compared to the previous editions. Finally, it will talk about the crosswalks and registries underway between these development communities."

Page 3:

What we will cover today

• Background
• Overview of standards
• Specifics on 3 things
  • NISO Z39.19
  • BSI 8723
  • IFLA
• Thoughts on a registry

Page 4:

Why are taxonomies hot?

• Search doesn't work without tagged data
• Websites need them to display information
• To tag navigation back to content

Page 5:

What's happening to the business?

• Carpetbaggers
• Differences of opinion
• Want to build on existing taxonomies
• Need for standards
• Need for crosswalks
• Need for international communication
• Need for general registries of taxonomies

Page 6:

The Problem – KEEPING UP

• Many players we know and don't know
• Between controlled vocabulary standards
  • ISO 2788 and 5964, BSI 8723
• Groups developing guidelines and standards
  • W3C with SKOS and OWL
• Governments worldwide developing and mandating taxonomies
• Communities increasing reuse, mapping and interoperability between controlled vocabularies

Page 7:

Traditional Standards

• ISO
  • TC 46 SC 9
• ANSI/NISO
  • Z39.19
• BSI
  • BS 8723
• W3C
  • OWL
  • SKOS
• US Government
  • Office of Management and Budget
• European Union

Page 8:

Thesaurus related

• NISO Z39.19 2006: www.niso.org
• BSI (BS 8723): the next revised ISO
• ISO 2788 - Monolingual (1986)
• ISO 5964 - Multilingual (1985)
  • www.iso.ch/iso/en/ISOOnline.frontpage
• ISO 5127, Information and documentation - Vocabulary
• OWL from W3C
• SKOS, the W3C thesaurus standard

Page 9:

Thesaurus and Indexing Standards – ANSI/NISO

• ANSI/NISO Z39.19 - 2003 Guidelines for the Construction, Format, and Management of Monolingual Thesauri
• NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies
• NISO TR02-1997 Guidelines for Indexes and Related Information Retrieval Devices, by James D. Anderson

Page 10:

The standards

• NISO Z39.19 2006: www.niso.org
• BSI (BS 8723): the next revised ISO
• ISO 2788 - Monolingual (1986)
• ISO 5964 - Multilingual (1985)
  • www.iso.ch/iso/en/ISOOnline.frontpage
• ISO 5127 - Information and documentation - Vocabulary
• OWL from W3C
• SKOS - the W3C thesaurus standard

Page 11:

Z39.19 - What's new?

The old standard

• Coverage: documents
• Types of vocabularies: thesauri
• Single BT
• Post-coordinated
• Printed formats
• Monolingual vocabularies

The revised standard

• Coverage: content objects
• Types of vocabularies: lists, synonym rings, taxonomy
• Pre-coordinated
• Web format
• Multilingual vocabularies (general)
• Polyhierarchical
• Interoperability
• Facet analysis

Page 12:

British Standards - BS 8723

Structured vocabularies for information retrieval – Guide

• Part 1: General
• Part 2: Thesauri
• Part 3: Vocabularies other than thesauri
• Part 4: Interoperability between vocabularies
• Part 5: Interoperability with applications

Page 13:

ISO TC 37

Scope of ISO TC 37: Standardization of principles, methods and applications relating to terminology and other language resources.

• TC 37/SC 1 - Principles and methods
• TC 37/SC 2 - Terminography and lexicography
• TC 37/SC 3 - Computer applications for terminology
• TC 37/SC 4 - Language resource management

Page 14:

Other ISO standards: Concept-oriented terminology

• ISO 704:2000 Terminology work - Principles and methods
• ISO 860:1996 Terminology work - Harmonization of concepts and terms
• ISO 1087-1:2000 Terminology work - Vocabulary - Part 1: Theory and application
• ISO 1087-2:2000 Terminology work - Vocabulary - Part 2: Computer applications
• ISO 10241:1992 Preparation and layout of international terminology standards

Page 15:

Sample ISO - Data Categories

• ISO 12200:1999 Computer applications in terminology - Machine-readable terminology interchange format (MARTIF) - Negotiated interchange
• ISO 12616:2002 Translation-oriented terminography
• ISO/TR 12618:1994 Computer aids in terminology - Creation and use of terminological databases and text corpora
• ISO 12620:1999 Computer applications in terminology - Data categories
  • Used to create glossaries

Page 16:

ISO Thesaurus and Indexing Standards

• ISO 2788:1986 Documentation - Guidelines for the establishment and development of monolingual thesauri
• ISO 5964:1985 Documentation - Guidelines for the establishment and development of multilingual thesauri
• ISO 5963:1985 Documentation - Methods for examining documents, determining their subjects, and selecting indexing terms
• ISO 999:1996 Information and documentation - Guidelines for the content, organization and presentation of indexes

Page 17:

ISO TC 46/SC 9

Information and Documentation - Identification and Description

TC 46 is ISO's Technical Committee (TC) for information and documentation standards.

SC 9 is the TC 46 Subcommittee (SC) that develops and maintains ISO standards on the identification and description of information resources.

Page 18:

ANSI/NISO Thesaurus and Indexing Standards

• ANSI/NISO Z39.19 - 2005 Guidelines for the Construction, Format, and Management of Monolingual Thesauri
• NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies
• NISO TR02-1997 Guidelines for Indexes and Related Information Retrieval Devices, by James D. Anderson

Page 19:

Reports to use

• Report on the Workshop on Electronic Thesauri, November 4-5, 1999
  • http://www.niso.org/news/events_workshops/thes99rprt.html
• Final Report to the ALCTS/CCS Subject Analysis Committee: Subcommittee on Subject Relationships/Reference Structures, June 1997
  • http://archive.ala.org/alcts/organization/ccs/sac/rpt97rev.html

Page 20:

Other links

http://esw.w3.org/topic/SkosDev/ThesaurusLinks/

XmlFormats

• MARC-21 XMLSchema.
• Zthes Z39.50 profile for thesaurus navigation (2001).
• TML thesaurus markup language (1999).
• ADL Thesaurus Protocol XML formats (2002).
• MeSH XML format (2001).
• GEMET XML format (2003).
• APAIS XML thesaurus format, an extension of Zthes (2000).
• Open University thesaurus schemas (2002).
• Soergel XML thesaurus specification (2001).

Page 21:

W3C

• OWL – Web Ontology Language
• RDF – Resource Description Framework
• Topic Maps
• SKOS - Simple Knowledge Organization System
• Which community to serve?
• Build on the current standard
• Might make this link next

Page 22:

Other things to watch

• Other W3C and ISO areas
• Support groups
  • Blogs
  • Communities of Practice
• SIMILE
• Web 2.0 activities
• WSDL – Web Services Description Language

Page 23:

Other Relevant ISO & W3C Standards

For translation, terminology and applied linguistics, go to: http://appling.kent.edu/ResourcePages/LTStandards/Chart/standards.chart.htm#Ontology

• Markup Languages
• Metadata Resources
• Character Coding
• Access Protocols and Interoperability
• Content Creation, Manipulation, and Maintenance
• Authoring Standards
• Text and Content Markup
• Translation Standards
• Terminology and Lexicography Standards
• ISO TC 37 Standards
• Terminology Interchange Standards
• Controlled Language Standards
• Taxonomy and Ontology Standards
• Corpus Management Standards
• Locale-Related Standards

Page 24:

SIMILE

Semantic Interoperability of Metadata and Information in unLike Environments

• Forming a data reference for open source taxonomies

Page 25:

Revised Standards for Controlled Vocabularies

• U.S. Standard (NISO Z39.19 - 2005)
• British Standard (BS 8723 - 2005)
• IFLA Guidelines - 2005

Page 26:

U.S. Standard for Controlled Vocabularies – NISO Z39.19

NISO Z39.19-200x Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies

Some of the slides are based on Emily Fayen's June 2004 SLA presentation, Margie Hlava's talk at the 2005 Data Harmony User Group meeting, and Marcia Zeng's presentation at the NKOS meeting in Denver.

Page 27:

A little bit of history…

• ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Thesauri – 1993
  • The most frequently requested NISO Standard
  • In spite of its age the Standard is still relevant
• 1999: NISO Workshop on Electronic Thesauri
  • http://www.niso.org/news/events_workshop/thes99rpt.html
• 2002: NISO initiates revision of Z39.19
• 2004: 1993 standard reaffirmed
• 2005: new standard published

Page 28:

Scope

• Expand beyond thesaurus
• Make more user-friendly
• Explain important concepts
• Explain principles of vocabulary control
• Include electronic information environment
• Include additional user search methods:
  • Browse
  • Navigate
  • Keyword searching
• Expand beyond A & I services
• Include Web applications

Page 29:

The Team:

• Vivian Bliss – Microsoft
• Carol Brent – ProQuest
• John Dickert – DTIC
• Lynn El-Hoshy – Library of Congress
• Marjorie Hlava – Access Innovations
• Stephen Hearn – ALA
• Sabine Kuhn – Chemical Abstracts Service
• Pat Kuhr – H.W. Wilson Company
• Diane McKerlie – DMA Consulting
• Peter Morville – Semantic Studios
• Stuart Nelson – National Library of Medicine
• Allan Savage – National Library of Medicine
• Diane Vizine-Goetz – OCLC
• Marcia Lei Zeng – Special Libraries Association

Page 30:

Z39.19 Chapters

1. Introduction
2. Scope
3. Referenced Standards
4. Definitions, Abbreviations, and Acronyms
5. Controlled Vocabularies – Purpose, Concepts, Principles, and Structure
6. Term Choice, Scope, and Form
7. Compound Terms
8. Relationships
9. Displaying Controlled Vocabularies
10. Interoperability
11. Construction, Testing, Maintenance, and Management Systems

Page 31:

Z39.19 - What's new?

The old standard

• Coverage: documents
• Types of vocabularies: thesauri
• Single BT
• Post-coordinated
• Printed formats
• Monolingual vocabularies

The revised standard

• Coverage: content objects
• Types of vocabularies: lists, synonym rings, taxonomy
• Pre-coordinated
• Web format
• Multilingual vocabularies (general)
• Polyhierarchical
• Interoperability
• Facet analysis

Page 32:

Principles of Controlled Vocabularies

Four important principles of vocabulary control guide the design and development of controlled vocabularies:

• eliminating ambiguity
• controlling synonyms
• establishing relationships among terms where appropriate
• testing and validation of terms

Page 33:

Type of vocabulary control

Page 34:

Lists

A list is a simple group of terms.

Example:

Alabama
Alaska
Arkansas
California
Colorado
. . . .

Frequently used in Web site pick lists and pull-down menus

Page 35:

Synonym Rings

A synonym ring is a list of synonyms or near synonyms that are used interchangeably for retrieval purposes.

Page 36:

Synonym Rings – Examples

Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms.

e.g., cholesterol:

Cholesterol
Blood Cholesterol
Serum Cholesterol
Good Cholesterol
Bad Cholesterol
LDL
. . .

Frequently used in systems where the content is not indexed or the indexing vocabulary is not controlled

Page 37:

An example from International SEMATECH; a search for Silicon would look like this:

Your search was submitted as “SILICON” or “SI”

Page 38:

Synonym rings are used:

• To expand queries for content objects: any one of these terms retrieves content containing any of the terms in the cluster
• With unstructured natural-language formats, where the interface draws together similar terms
• With search engines
• To help control the diversity of the language
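The query expansion described on the synonym ring slides can be sketched roughly as follows. This is only an illustration in Python: the ring contents are taken from the cholesterol and SEMATECH examples, while the names `SYNONYM_RINGS`, `expand_query` and `to_boolean_query` are assumptions made for the sketch, not part of any standard or product.

```python
# Hypothetical synonym rings; the terms come from the slide examples.
SYNONYM_RINGS = [
    {"cholesterol", "blood cholesterol", "serum cholesterol",
     "good cholesterol", "bad cholesterol", "ldl"},
    {"silicon", "si"},
]


def expand_query(term: str) -> list[str]:
    """Return every term in the ring that contains `term`, or just the term."""
    key = term.strip().lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return sorted(ring)
    return [key]


def to_boolean_query(term: str) -> str:
    """Render the expanded terms as an OR query, in the style of the SEMATECH example."""
    return " OR ".join(f'"{t.upper()}"' for t in expand_query(term))


print(to_boolean_query("Silicon"))
```

No term in a ring is preferred over the others, which is what distinguishes a synonym ring from the Use/Used For relationship in a thesaurus.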

Page 39:

Taxonomies

A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy.

Example:

Chemistry
  Organic chemistry
    Polymer chemistry
      Nylon

Frequently used in web navigation systems
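A minimal sketch of how such a hierarchy might be stored, assuming a simple child-to-parents mapping in Python. The Chemistry branch is from the slide; the "Materials" top term and Nylon's second parent are invented here purely to show a polyhierarchy, and `PARENTS` and `paths_to_top` are hypothetical names.

```python
# Each preferred term maps to its parent terms. A term with more than
# one parent makes the structure a polyhierarchy.
PARENTS = {
    "Chemistry": [],
    "Organic chemistry": ["Chemistry"],
    "Polymer chemistry": ["Organic chemistry"],
    "Materials": [],                               # hypothetical top term
    "Nylon": ["Polymer chemistry", "Materials"],   # second parent is hypothetical
}


def paths_to_top(term: str) -> list[list[str]]:
    """List every path from a top term down to `term`."""
    parents = PARENTS[term]
    if not parents:
        return [[term]]
    return [path + [term] for p in parents for path in paths_to_top(p)]


for path in paths_to_top("Nylon"):
    print(" > ".join(path))
```

Each path corresponds to one breadcrumb trail a web navigation system could display for the term.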

Page 40:

Thesauri

A thesaurus is a controlled vocabulary with multiple types of relationships.

Example:

Rice
  UF paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw

Page 41:

Thesauri (cont.)

Relationship types:

• Equivalence (Use/Used For) – indicates the preferred term in a synonym relationship
• Hierarchy – indicates broader and narrower terms
• Associative (related terms) – almost unlimited types of relationships may be used

The thesaurus is the most complex format for controlled vocabularies and is widely used.
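The three relationship types can be sketched as a small data structure, using the Rice example from the previous slide. This is an illustrative Python sketch under stated assumptions: the `Term` and `Thesaurus` classes and their method names are hypothetical, and the reciprocal bookkeeping (BT implies NT, RT is symmetric) reflects common thesaurus practice rather than any particular software.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    uf: set = field(default_factory=set)  # Used For (non-preferred entry terms)
    bt: set = field(default_factory=set)  # Broader Terms
    nt: set = field(default_factory=set)  # Narrower Terms
    rt: set = field(default_factory=set)  # Related Terms


class Thesaurus:
    def __init__(self):
        self.terms = {}  # preferred label -> Term
        self.use = {}    # lower-cased non-preferred term -> preferred label

    def term(self, label):
        return self.terms.setdefault(label, Term(label))

    def add_uf(self, preferred, non_preferred):
        # Equivalence: record both the UF entry and the reverse USE pointer.
        self.term(preferred).uf.add(non_preferred)
        self.use[non_preferred.lower()] = preferred

    def add_bt(self, narrower, broader):
        # Hierarchy: every BT link gets its reciprocal NT link.
        self.term(narrower).bt.add(broader)
        self.term(broader).nt.add(narrower)

    def add_rt(self, a, b):
        # Associative: related-term links are symmetric.
        self.term(a).rt.add(b)
        self.term(b).rt.add(a)

    def resolve(self, entry):
        """Map an entry term to its preferred term, if one is recorded."""
        return self.use.get(entry.lower(), entry)


th = Thesaurus()
th.add_uf("Rice", "paddy")
th.add_bt("Rice", "Cereals")
th.add_bt("Rice", "Plant products")
th.add_bt("Brown rice", "Rice")
th.add_rt("Rice", "Rice straw")

print(th.resolve("paddy"))
print(sorted(th.term("Rice").bt))
```

Storing reciprocals at insert time keeps the displayed record for each term complete without a second pass over the vocabulary.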

Page 42:

Interoperability

• One of the most important issues from the 1999 workshop
• Question: how to compare indexes, perform searches, and merge databases that have been developed using different controlled vocabularies?

Page 43:

Interoperability (cont.)

• Factors Affecting Interoperability
• Multilingual Controlled Vocabularies
• Searching
• Indexing
• Merging Databases
• Merging Controlled Vocabularies
• Achieving Interoperability
• Storage and Maintenance of Relationships among Terms in Multiple Controlled Vocabularies

Page 44:

II. The British Standard

BS 8723: Structured Vocabularies for Information Retrieval – Guide

Slides based on the presentation by Stella G Dextre Clarke, Alan Gilchrist, and Leonard Will at ISKO 2004, London

Page 45:

Existing BSI/ISO thesaurus standards

• ISO 2788-1986 Guidelines for the establishment and development of monolingual thesauri = BS 5723:1987
• ISO 5964-1985 Guidelines for the establishment and development of multilingual thesauri = BS 6723:1985

Page 46:

What needs updating?

• Printed versus electronic application
• Guidance on management software
• Interoperability:
  • Mapping between thesauri and other types of vocabulary
  • Formats/protocols for data exchange with downstream applications
• Applicability to end-user applications, not just those for information professionals

Page 47:

Outline of new standard

BS 8723: Structured vocabularies for information retrieval – Guide

• Part 1 - Definitions, symbols and abbreviations
• Part 2 - Thesauri
• Part 3 - Vocabularies other than thesauri
• Part 4 - Interoperability between vocabularies
• Part 5 - Interoperation between vocabularies and other components of information storage and retrieval systems

Page 48:

Part 3 chapters

• Classification schemes
• Subject heading lists
• Taxonomies
• Ontologies
• Semantic nets (?)
• Search thesauri

Page 49:

Issues for Part 3

• How much guidance is needed on how to build other sorts of vocabulary?
• Should we describe the idiosyncrasies of existing schemes, even where we judge there is a 'better' way?
• Pick out the characteristics of different vocabulary types that govern when and how you can map them.
• But some of the observable characteristics might not be what we'd recommend.

Page 50:

Part 4: Interoperability between vocabularies

• Huge demand for accessing information indexed with another language and/or vocabulary: 'mapping'. The Semantic Web is just one application.
• Includes multilingual thesauri as a special case of mapping between vocabularies.
• Applies where more than one language or vocabulary is in use but access to all resources is through one vocabulary.

Page 51:

Part 4: Interoperability between vocabularies (cont.)

BS 8723 Part 4 has a wider scope: BS 6723 dealt only with multilingual thesauri.

BS 8723 extends the scope to:

• thesauri in different dialects of one language
• different thesauri in a single language
• situations where a thesaurus interoperates with one or more different types of structured vocabulary, such as classification schemes
• situations where not all the interoperating vocabularies have the same status and/or function
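A crosswalk of the kind Part 4 addresses can be sketched as a mapping table from one vocabulary to another. The Python below is only an assumed illustration for the "different dialects of one language" case: the taps/faucets pairs echo the IFLA example later in this talk, the "mixer taps" entry and the mapping-type labels ("exact", "broader") are hypothetical, and `CROSSWALK` and `translate_index_terms` are names invented for the sketch.

```python
# Hypothetical crosswalk from a British English vocabulary to a US one.
# Each entry: source term -> (target term, mapping type).
CROSSWALK = {
    "taps": ("faucets", "exact"),
    "water taps": ("water faucets", "exact"),
    "gas taps": ("gas faucets", "exact"),
    "mixer taps": ("faucets", "broader"),  # hypothetical: no exact target term
}


def translate_index_terms(terms):
    """Split index terms into mapped (source, target, type) triples and unmapped terms."""
    mapped, unmapped = [], []
    for term in terms:
        hit = CROSSWALK.get(term.lower())
        if hit:
            mapped.append((term, *hit))
        else:
            unmapped.append(term)
    return mapped, unmapped


mapped, unmapped = translate_index_terms(["Water taps", "Mixer taps", "Bath plugs"])
print(mapped)
print(unmapped)
```

Recording the mapping type alongside the target lets a search system decide whether a non-exact match is acceptable, and the unmapped list shows where the crosswalk still needs editorial work.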

Page 52:

Part 5: Interoperability with applications

Vocabularies must work with:

• Search software
• Content Management Systems
• Web publishing software, etc.

Page 53:

Build on existing formats and protocols for data exchange

• Z39.50 and Zthes, XML schema, DTD
• MARC
• SKOS Core Schema
• Topic Map
• ADL gazetteer protocol
• W3C crosswalks
• OMB – Section 207 of the E-Government Act

Page 54:

Review and Comments

• Request a copy for Parts 1, 2, 3 and 4: Parts 1 and 2 are numbered 04/30086620 DC and 04/30094113 DC.
• The documents may be ordered from BSI Customer Services, tel +44(0)208-996-9001 or email [email protected]
• Part 5 is out for comment

Page 55:

III. IFLA Guidelines for Multilingual Thesauri

• IFLA Classification and Indexing Section
• April 2005: released for comments
• Published 2005

Page 56:

World-Wide Review of IFLA Guidelines for Multilingual Thesauri

URL: http://www.ifla.org/VII/s29/pubs/Draft-multilingualthesauri.pdf

Adds to ISO 5964 for multilingual thesauri

Page 57:

IFLA Classification and Indexing Section WG on Guidelines for Multilingual Thesauri

Chair: Gerhard J.A. Riesthuis (Netherlands)

Members: Lois Mai Chan (USA), Patrice Landry (Switzerland), Pia Leth (Sweden), Ia McIlwaine (United Kingdom), Martin Kunz (Germany), Dorothy McGarry (USA), Max Naudi (France), Marcia Lei Zeng (USA)

Page 58:

Three approaches in the development of multilingual thesauri:

1. Building a new thesaurus from the bottom up
  • starting with one language and adding another language or languages
  • starting with more than one language simultaneously
2. Combining existing thesauri
  • merging two or more existing thesauri into one new (multilingual) information retrieval language to be used in indexing and retrieval
  • linking existing thesauri and subject heading languages to each other; using the existing thesauri and/or subject heading languages both in indexing and retrieval
3. Translating a thesaurus into one or more other languages

Page 59:

Semantic problems

Semantic problems pertain to equivalence relations between terms used as preferred and non-preferred terms in information retrieval languages.

Equivalence relations exist not only within each separate language involved, but also between the languages (intra-language equivalence and inter-language equivalence).

Intra-language homonymy and inter-language homonymy are also considered semantic questions.

Additional problems pertaining to semantics involve the scope, form and choice of thesaurus terms.

Page 60:

Structural problems

• Structural problems involve hierarchical and associative relations between the terms.
• An important question in this respect is whether the structure should be the same or different for each language.
• In most if not all cases of linking, the structure will most probably not be the same in all the information retrieval languages involved.
• In the other approaches mentioned it is possible in principle to apply the same structure to all languages.

Page 61:

Contents covered by the guidelines

• Building multilingual thesauri starting from scratch
  • Structure
  • Morphology and Semantics
• Starting from existing thesauri
  • Merging
  • Linking
• Glossary
• Appendix: An example of a non-symmetrical thesaurus

Page 62:

Examples are in multiple languages

English (British) | English (USA) | Dutch | French
cranes (birds) | cranes (birds) | kraanvogels | grue (oiseau)
cranes (lifting equipment) | cranes (lifting equipment) | hijskranen; SN voor andere typen kranen, zie aldaar | grue (appareil de levage)
water taps | water faucets | waterkranen | robinet à eau
gas taps | gas faucets | gaskranen | robinet à gaz
taps; NT water taps; NT gas taps | faucets; NT water faucets; NT gas faucets | kranen; SN voor kranen als hijswerktuig gebruik hijskranen; NT waterkranen; NT gaskranen | robinet; NT robinet à eau; NT robinet à gaz

That cranes is a homograph in English does not necessarily mean that equivalent terms in other languages are also homographs. The Dutch term kranen is a homograph too, but with the meanings cranes (lifting equipment) and taps.
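One way to see why homographs do not carry across languages is to store labels per concept and then group them by their unqualified word. The Python sketch below assumes a concept-keyed layout; the concept identifiers, the language codes, and the names `CONCEPTS` and `homographs` are hypothetical, while the labels come from the example above. It only detects homographs that are marked with a parenthetical qualifier, so the Dutch kranen case, which the example handles with a scope note instead, is not detected by this method.

```python
import re
from collections import defaultdict

# Concept id -> preferred label per language (labels from the example above).
CONCEPTS = {
    "crane_bird": {"en-GB": "cranes (birds)", "nl": "kraanvogels",
                   "fr": "grue (oiseau)"},
    "crane_lifting": {"en-GB": "cranes (lifting equipment)", "nl": "hijskranen",
                      "fr": "grue (appareil de levage)"},
    "tap": {"en-GB": "taps", "nl": "kranen", "fr": "robinet"},
}

# Matches a trailing parenthetical qualifier such as " (birds)".
QUALIFIER = re.compile(r"\s*\(.*\)$")


def homographs(lang):
    """Group concepts whose labels in `lang` differ only by a qualifier."""
    groups = defaultdict(list)
    for concept_id, labels in CONCEPTS.items():
        word = QUALIFIER.sub("", labels[lang])
        groups[word].append(concept_id)
    return {word: ids for word, ids in groups.items() if len(ids) > 1}


for lang in ("en-GB", "nl", "fr"):
    print(lang, homographs(lang))
```

Because the concepts, not the words, are the unit of mapping, each language is free to need qualifiers for a different set of terms.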

Page 63:

What is a taxonomist to do?

• Watch the standards
• Participate in development
• Exceed the guidelines
• Comply with all standards – internationally
• Promote standards participation
• And we do – so far!

Page 64:

Controlled vocabularies of all stripes need a place to call home

• Open contribution
• Thesaurus metadata contributions
• Comments on the contributions
• Examples of implementation
• A clearing house to keep track of all the initiatives and suggested standards, a means to allow input from and to those initiatives, and publishing of best practices or lessons learned from implementations
• Perhaps a WikiKOS

Page 65:

The Solutions

• Registry?
• NKOS
• KOS of KOS
• SKOS participants
• KOS typology - Tudhope
• Tesauro.com – Spanish - Salama
• Kent.edu site – Marcia Zeng
• Taxonomy Warehouse – Factiva - Clarke
• UMLS - Unified Medical Language System

Page 66:

More Solutions

• Semantic Interoperability of Metadata and Information in unLike Environments (Open Source)
• UK HILT - Dennis Nicholson

Page 67:

Good starts

• Link to each other
• Include
  • Thesauri
  • Taxonomies
  • Semantic webs
  • Classification systems
  • Subject headings
  • SKOS
  • OWL and Ontologies
  • Other KOS

Page 68:

What about?

• Authority Files
• Other pick lists
• Roget's and other synonym rings
• Dictionaries
• Gazetteers
• Glossaries
• Etc.

Page 69:

Discussion??

Thank you for your attention!

Marjorie M. K. Hlava

Access Innovations / Data Harmony

mhlava@accessinn.com