Interoperability, Z39.50 Profiles & Testing
William E. Moen <wemoen@unt.edu>
School of Library and Information Sciences
Texas Center for Digital Knowledge
University of North Texas
Denton, TX 72603
Netspeed 2002 Conference, October 25, 2002 Calgary, Alberta
Moen Netspeed 2002 -- Calgary, Alberta -- October 2002 2
Overview
• Interoperability
• Profiles
  • The Bath Profile
  • The U.S. National Profile
• Beyond profiles
  • Indexing and search functionality
  • Interoperability testing
Interoperability
Systems and organizations will interoperate!
“One should actively be engaged in the ongoing process of ensuring that the systems, procedures and culture of an organisation are managed in such a way as to maximise opportunities for exchange and re-use of information, whether internally or externally.”
(Paul Miller, 2000)
Defining interoperability
• System-oriented definition
  • The ability of two or more systems or components to exchange information and use the exchanged information without special effort on either system
• User-oriented definition
  • User’s ability to successfully search and retrieve information in a meaningful way and have confidence in the results
• The condition achieved when two or more technical systems can exchange information directly in a way that is satisfactory to users of the systems (AAP)
Assessing interoperability
• Binary
  • Interoperable
  • Not interoperable
• Continuum
  • More or less interoperable
  • Acceptable levels of interoperability
Factors affecting interoperability
• Multiple and disparate systems
  • Operating systems, information retrieval systems, etc.
• Multiple protocols
  • Z39.50, HTTP, SOAP, etc.
• Multiple data formats, syntax, metadata schemes
  • MARC 21, UNIMARC, XML, ISBD/AACR2-based, Dublin Core
• Multiple vocabularies, ontologies, disciplines
  • LCSH, MeSH, AAT
• Multiple languages, multiple character sets
• Indexing, word normalization, and word extraction policies
Mapping the landscape
• Networked information retrieval occurs within and across communities
• Information communities
  • Focal community (e.g., libraries)
  • Extended community (e.g., cultural heritage community)
  • Extra community
• Knowledge domains
  • Intra domain
  • Extra domain
• Costs to achieve interoperability vary
Information communities
[Figure: information communities. Labels: Focal Community (e.g., Libraries); Focal Community (e.g., Archives); Focal Community (e.g., Museum); Extended Community (e.g., Cultural Heritage); Focal Community (e.g., Geospatial); Focal Community (e.g., Natural History Museums); Extended Community; Extra Community]
Libraries as Focal Community
• Focal community
  • Community agreements exist (e.g., standards, rules, etc.)
  • Interoperability factors reduced
  • Interoperability more easily achieved
• Relative homogeneity of data and systems
  • Z39.50 widely implemented
  • Standards-based MARC records
  • Content and structure prescribed by AACR
  • Commonly understood access points
  • Use of controlled vocabularies
Threats to Z39.50 interoperability
• Differences in implementation of the standard
• Differences in local information retrieval systems
  • Search functionality
  • Indexing policies
• These threats can be addressed by
  • Z39.50 specifications and configuration
  • Enhancing local information retrieval systems
  • Recommendations for local indexing decisions
Virtual Catalog Application
Z39.50 Model of Resource Discovery
Profiles
[Figure: Z39.50 specifications. Labels: Complete Z39.50 Specifications; Z39.50 Profile]
• Represent community consensus on requirements
• Identify Z39.50 specifications to support those requirements
• Aid in purchasing decisions
• Provide specifications for vendors
Profiles are a solution path for improving interoperability
Profiles
• Define a subset of specifications from one or more standards
• Goal of profiles is to improve interoperability
• Profiles are useful for:
  • Prescribing how Z39.50 should be used in a particular application environment
  • Solving interoperability problems with existing Z39.50 implementations within a community or across two or more communities
The Bath Profile
• The Bath Profile: An International Z39.50 Specification for Library Applications and Resource Discovery, Release 2 (Draft 3, Oct. 2002)
• Enables effective use of Z39.50 in a range of library applications:
  • Search and retrieval from library catalogues
  • Search and retrieval of bibliographic holdings
  • Search and retrieval of authority records
  • Cross-domain searching
FOR MORE INFORMATION, VISIT THE BATH MAINTENANCE AGENCY WEBSITE…
http://www.nlc-bnc.ca/bath/
Structure of the profile
• Modular for extensibility
• Related requirements and specifications grouped in Functional Areas
• Release 2 defines four Functional Areas
  • Functional Area A: Basic Bibliographic Search and Retrieval, with Primary Focus on Library Catalogues
  • Functional Area B: Bibliographic Holdings Search and Retrieval
  • Functional Area C: Cross-Domain Search and Retrieval
  • Functional Area D: Authority Record Search and Retrieval in Online Library Catalogues
• Defines Conformance Levels for each area
Addressing interoperability
The Bath Profile:
• Identifies searching requirements (tasks)
• Defines the searches (semantics and behavior)
• Specifies Z39.50 query to represent the search
  • Standard combination of Z39.50 attribute types and values
  • Clients must send all attribute type values specified for search
  • Servers must be able to process all values
  • No default behavior by client or server
• Requires support for specific formats for interchanging retrieval records
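The "no default behavior" rule can be illustrated with a small check that a query term carries every attribute type. This is a minimal sketch, assuming the six attribute types (Use through Completeness) that appear in the profile's search tables; the function name and the dict representation are hypothetical and not part of the Bath Profile or of any Z39.50 toolkit.

```python
# Attribute type numbers and names as they appear in the profile's search tables.
REQUIRED_ATTRIBUTE_TYPES = {
    1: "Use",
    2: "Relation",
    3: "Position",
    4: "Structure",
    5: "Truncation",
    6: "Completeness",
}


def missing_attribute_types(attributes):
    """Return the names of attribute types absent from a query term.

    `attributes` maps attribute type number -> attribute value
    (an illustrative representation, not a toolkit data structure).
    """
    return [
        name
        for number, name in sorted(REQUIRED_ATTRIBUTE_TYPES.items())
        if number not in attributes
    ]


# A term that states all six types explicitly, and one that relies on defaults.
complete_term = {1: 4, 2: 3, 3: 3, 4: 2, 5: 100, 6: 1}
partial_term = {1: 4, 4: 2}

print(missing_attribute_types(complete_term))
print(missing_attribute_types(partial_term))
```

A client built this way would refuse to send `partial_term`, since the server is not expected to guess the missing values.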
Functional Area A, Level 0
• Conformance Level 0
  • Version 2 required, Version 3 recommended
  • Basic Bibliographic Search (Z39.50 Search Service)
    • Author Search – Keyword
    • Title Search – Keyword
    • Subject Search – Keyword
    • Any Search – Keyword
  • Basic Bibliographic Retrieval (Z39.50 Present Service)
    • Z-clients to support MARC 21 and SUTRS
    • Z-servers to support MARC 21
Functional Area A, Level 1
• Conformance Level 1
  • Inherits search requirements from Level 0
  • Requires 15 additional searches, including:
    • Exact Match (author, title, subject)
    • First Words & First Characters in Field (author, title, subject)
    • Keyword with Right Truncation (author, title, subject)
    • Standard ID, Date
  • Browse Indexes (Z39.50 Scan Service)
    • 3 scans defined
  • Retrieval
    • Z-clients to support MARC 21 and SUTRS
    • Z-servers to support MARC 21
Functional Areas B, C, D
• Area B -- Holdings Information
  • Addresses the challenge of search and retrieval of bibliographic holdings information
    • Locations Only
    • Locations, Summary Information and Count if available
    • Summary Copy Level Holdings
  • Use of XML as Record Syntax
• Area C -- Cross-Domain Search/Retrieval
  • Defines two conformance levels (13 searches)
  • Dublin Core DTD for XML record syntax
• Area D -- Authority Record Search/Retrieval
  • Defines one conformance level
  • Defines 14 searches
Level 0: title keyword search
Attribute Type     Attribute Value   Attribute Name
Use (1)            4                 Title
Relation (2)       3                 Equal
Position (3)       3                 Any
Structure (4)      2                 Word
Truncation (5)     100               Do not truncate
Completeness (6)   1                 Incomplete subfield
Uses: Searches for complete word in a title of a resource.
Example: Title search for “woman” represented in Z query as: (1,4)(2,3)(3,3)(4,2)(5,100)(6,1) woman
Level 1: title keyword right truncation
Attribute Type     Attribute Value   Attribute Name
Use (1)            4                 Title
Relation (2)       3                 Equal
Position (3)       3                 Any
Structure (4)      2                 Word
Truncation (5)     1                 Right Truncation
Completeness (6)   1                 Incomplete subfield
Uses: Searches for complete word beginning with the specified character string in fields that contain a title of a resource.
Example: Title search for “woman” truncated as “wom” represented in Z query as: (1,4)(2,3)(3,3)(4,2)(5,1)(6,1) wom
Level 1: title first words in field
Attribute Type     Attribute Value   Attribute Name
Use (1)            4                 Title
Relation (2)       3                 Equal
Position (3)       1                 First in field
Structure (4)      1                 Phrase
Truncation (5)     100               Do not truncate
Completeness (6)   1                 Incomplete subfield
Uses: Searches for complete word(s) in the order specified in fields that contain a title of a resource. The field must begin with the specified character string. This search is useful when the beginning words in a title are known to the user.
Example: Title search for “Gone with the” represented in Z query as: (1,4)(2,3)(3,1)(4,1)(5,100)(6,1) gone with the
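The three attribute tables above translate mechanically into query strings. The sketch below renders each combination both in the slides' own "(type,value)" notation and in Prefix Query Format (PQF), the textual query syntax used by the YAZ toolkit. The dictionary keys and function names are illustrative assumptions, and the PQF output assumes the server treats Bib-1 as the default attribute set.

```python
ATTRIBUTE_ORDER = (1, 2, 3, 4, 5, 6)  # Use, Relation, Position, Structure, Truncation, Completeness

# Attribute values copied from the three tables above; the key names are hypothetical labels.
SEARCHES = {
    "title_keyword": {1: 4, 2: 3, 3: 3, 4: 2, 5: 100, 6: 1},
    "title_keyword_right_truncation": {1: 4, 2: 3, 3: 3, 4: 2, 5: 1, 6: 1},
    "title_first_words_in_field": {1: 4, 2: 3, 3: 1, 4: 1, 5: 100, 6: 1},
}


def slide_notation(attributes, term):
    """Render a query the way the slides write it: (type,value)... term."""
    pairs = "".join(f"({t},{attributes[t]})" for t in ATTRIBUTE_ORDER)
    return f"{pairs} {term}"


def pqf(attributes, term):
    """Render the same query in PQF; multi-word terms are quoted."""
    parts = [f"@attr {t}={attributes[t]}" for t in ATTRIBUTE_ORDER]
    quoted = f'"{term}"' if " " in term else term
    return " ".join(parts + [quoted])


print(slide_notation(SEARCHES["title_keyword"], "woman"))
print(pqf(SEARCHES["title_keyword_right_truncation"], "wom"))
print(pqf(SEARCHES["title_first_words_in_field"], "gone with the"))
```

Keeping the attribute combinations in one table makes it easy to confirm that a client sends exactly the values the profile specifies for each named search.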
Endorsements of Bath Profile
• Atlantic Scholarly Information Network
• CENL Working Group on Technical Standards
• Czech and Slovak Library Information Network (CASLIN)
• Committee on Institutional Cooperation (CIC)
• International Coalition of Library Consortia (ICOLC)
• Istituto Centrale per il Catalogo Unico delle Biblioteche Italiane e per le Informazioni Bibliografiche (ICCU)
• M25 Consortium of Higher Education Libraries
• National Library of Canada
• OCLC
• ONE2
• SmartLibrary
• Standing Conference of National and University Libraries (SCONUL)
• Z Texas Project
Bath as foundation profile
• National, regional, and state profiles based on the Bath Profile
  • ONE-2 Profile
  • DanZIG Profile
  • U.S. National Z39.50 Profile
  • Z Texas Profile
Library application profiles
The Bath Profile: An International Z39.50 Specification for Library Applications and Resource Discovery
U.S. National Z39.50 Profile for Library Applications
Z Texas Profile: A Z39.50 Profile for Library Systems Applications in Texas
Bath Profile Core Specifications for Global Interoperability
Relationship among profiles
U.S. National Profile
• National Information Standards Organization (NISO) standards effort
• National Profile:
  • Addresses cross-catalog searching and holdings information interchange
  • Bath Profile is foundation for U.S. National Profile
  • Responds to national requirements
• Work initiated in November 2000
• Draft standard ready by end of 2002
FOR MORE INFORMATION, VISIT THE PROJECT WEBSITE…
http://www.unt.edu/zprofile
U.S. Profile Functional Area A
• Conformance Level 0
  • Version 2 required, Version 3 recommended
  • Basic Bibliographic Search (Z39.50 Search Service)
    • Author Search – Keyword (NISO)
    • Title Search – Keyword (Bath)
    • Subject Search – Keyword (Bath)
    • Any Search – Keyword (Bath)
  • Basic Bibliographic Retrieval (Z39.50 Present Service)
    • MARC 21 supported by Z-clients and Z-servers
U.S. Profile Functional Area A
• Conformance Level 1
  • Version 3 required
  • Inherits search requirements from Level 0
  • Requires 20 additional searches, including:
    • Exact Match (author, title, subject)
    • First Words & First Characters in Field (author, title, subject)
    • Keyword with Right Truncation (author, title, subject)
    • ISBN, ISSN, Standard ID, Format/Type, Date, Language
  • Browse Indexes (Z39.50 Scan Service)
  • Retrieval
    • Z-clients support MARC 21
    • Z-servers support MARC 21
U.S. Profile Functional Area A
• Conformance Level 2
  • 38 additional searches, including:
    • Key Title, Series Title, Uniform Title
    • Unanchored phrase searches for Title, Subject, Name, Any
    • Personal Author, Corporate Author, Conference Meeting
    • Notes, other standard number (e.g., LCCN)
    • Pattern searches for one or more controlled vocabularies
U.S. Profile Functional Area B
• Bibliographic Holdings Information Retrieval
• Use of XML as Record Syntax
• Z39.50 Holdings XML Schema
  • http://www.portia.dk/zholdings/
• Harmonized with Bath Profile
Z39.50 profiles are not enough
• Profiles can:
  • Identify searching requirements (tasks)
  • Define the searches (semantics and behavior)
  • Specify Z39.50 query to represent the search and formats of retrieval records
• Also needed are:
  • Agreements on indexing
  • Common search functionality
  • Methods and testbed for interoperability testing
  • Conformance to profiles by vendors and libraries
Indexing & search functionality
• Indexing
  • Access points
  • Populating indexes from which MARC fields/subfields
  • Moving toward community agreements on common indexing policies to support profile-defined searches
  • Indexing guidelines available for use
    • http://www.unt.edu/zinterop/
  • Related issues: word normalization, word extraction
• Search functionality
  • Phrase searching
  • Truncation
  • Proximity searching, etc.
Interoperability testbed project
Realizing the Vision of Networked Access to Library Resources: An Applied Research and Demonstration Project to Establish and Operate a Z39.50 Interoperability Testbed
• An Institute of Museum and Library Services National Leadership Grant
• Goal: Improve Z39.50 semantic interoperability among libraries for information access and resource sharing
FOR MORE INFORMATION, VISIT THE PROJECT WEBSITE…
http://www.unt.edu/zinterop/
Z-Interop vision
• Provide a technically and organizationally trusted environment for vendors and consumers to demonstrate and evaluate Z39.50 products
• Develop rigorous methodologies, test scenarios & procedures to measure and assess the extent of interoperability
• Demonstrate and operate a Z39.50 interoperability testbed
Z-Interop partners
• Institute of Museum and Library Services
• UNT’s Texas Center for Digital Knowledge
• University of North Texas School of Library and Information Sciences
• OCLC Online Computer Library Center
• Sirsi Corporation
• Sea Change Corporation, Bookwhere 2000
Components of the testbed
• Test dataset
  • 400,000 MARC 21 records from OCLC’s WorldCat
• Z39.50 reference implementations
  • Z-client, Z-server, information retrieval system
• Test scenarios & searches
  • Searches with known result records from dataset
• Benchmarks
  • Results of test searches against reference implementations
Analysis of test dataset
• Determine frequency of words in dataset
• Systematically select words for use in test searches
• Identify records that contain selected word
  • Aggregate Record Group
  • Word appears in any fields and subfields
• Identify records that contain selected word in specified fields/subfields
  • Candidate Record Group
  • For example, examine records for occurrence of word in title-related fields/subfields
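The first two steps, counting word frequencies and picking test words, can be sketched as follows. The slides do not say how words were "systematically" selected, so the rank-based selection here is purely an assumption for illustration, as are the function names and the sample words.

```python
from collections import Counter


def word_frequencies(words):
    """Count occurrences of each word, ignoring case."""
    return Counter(word.casefold() for word in words)


def pick_by_rank(frequencies, ranks):
    """Pick words at the given frequency ranks (0 = most frequent).

    Ties are broken alphabetically so the selection is repeatable.
    This selection rule is an assumption, not the project's documented method.
    """
    ranked = [
        word
        for word, _count in sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    ]
    return [ranked[rank] for rank in ranks if rank < len(ranked)]


# Hypothetical words, standing in for the words of the test dataset.
sample_words = ["river", "River", "maine", "report", "river", "maine"]
frequencies = word_frequencies(sample_words)

print(frequencies.most_common())
print(pick_by_rank(frequencies, [0, 2]))
```

Choosing words from different frequency bands would give test searches with large, medium, and small expected result sets.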
Decomposed MARC records
OCLC #  Tag  1st Ind  2nd Ind  SubFld  Fld Pos  SubFld Pos  Word Pos  Word
3       1                              1        1           1         Ocm00000003
3       3                              2        1           1         OCoLC
3       110  2                 a       11       1           1         National
3       110  2                 a       11       1           2         Study
3       110  2                 a       11       1           3         Service
3       245  1        0        a       12       1           1         Illegitimacy
3       245  1        0        a       12       1           2         and
3       245  1        0        a       12       1           3         Adoption
3       245  1        0        b       12       2           1         Report
3       650           0        a       17       1           1         Illegitimacy
3       650           0        z       17       2           1         Maine
400,000 MARC 21 records = 33 million decomposed records
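The table implies one row per word, keyed by record, field, subfield, and word position; at 33 million rows for 400,000 records, that is about 82.5 rows per record on average. The sketch below shows one way such rows could be produced. The input structure, the whitespace tokenization, and the punctuation stripping are assumptions for illustration; the project's actual word extraction rules are not given in the slides.

```python
import string


def decompose(oclc_number, fields):
    """Return one row per word in a record.

    `fields` is a list of (tag, ind1, ind2, subfields) tuples, where `subfields`
    is a list of (code, text) pairs. Each row is
    (oclc, tag, ind1, ind2, subfield, field_pos, subfield_pos, word_pos, word).
    """
    rows = []
    for field_pos, (tag, ind1, ind2, subfields) in enumerate(fields, start=1):
        for subfield_pos, (code, text) in enumerate(subfields, start=1):
            # Assumed tokenization: split on whitespace, trim surrounding punctuation.
            words = [token.strip(string.punctuation) for token in text.split()]
            words = [word for word in words if word]
            for word_pos, word in enumerate(words, start=1):
                rows.append(
                    (oclc_number, tag, ind1, ind2, code,
                     field_pos, subfield_pos, word_pos, word)
                )
    return rows


# A two-field excerpt modeled on the table; field positions therefore start at 1 here.
record_fields = [
    ("245", "1", "0", [("a", "Illegitimacy and Adoption :"), ("b", "Report")]),
    ("650", " ", "0", [("a", "Illegitimacy"), ("z", "Maine")]),
]
rows = decompose(3, record_fields)
for row in rows:
    print(row)
```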
Analysis logic
[Figure: Test Dataset (decomposed records), Aggregate Record Group, Candidate Record Group]
1. Examine for occurrence of word “river”
2. Yields Aggregate Record Group for word “river”
3. Examine for occurrence of word “river” in selected fields/subfields
4. Yields Candidate Record Group for word “river” in selected fields/subfields
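The four steps amount to two filters over the decomposed rows. A minimal sketch follows, assuming each row is a dict with `oclc`, `tag`, and `word` keys and that title-related fields are limited to tag 245. The project's indexing guidelines define the real field list, so `TITLE_TAGS` is only a placeholder.

```python
def aggregate_record_group(rows, word):
    """Records in which the word appears in any field or subfield."""
    target = word.casefold()
    return {row["oclc"] for row in rows if row["word"].casefold() == target}


def candidate_record_group(rows, word, tags):
    """Records in which the word appears in one of the selected fields."""
    target = word.casefold()
    return {
        row["oclc"]
        for row in rows
        if row["word"].casefold() == target and row["tag"] in tags
    }


TITLE_TAGS = {"245"}  # assumption: stands in for the guideline's title-related fields

# Hypothetical decomposed rows for four records.
rows = [
    {"oclc": 1, "tag": "245", "word": "River"},
    {"oclc": 2, "tag": "650", "word": "river"},
    {"oclc": 3, "tag": "245", "word": "Mountain"},
    {"oclc": 4, "tag": "245", "word": "river"},
    {"oclc": 4, "tag": "500", "word": "river"},
]

aggregate = aggregate_record_group(rows, "river")
candidate = candidate_record_group(rows, "river", TITLE_TAGS)
print(sorted(aggregate))
print(sorted(candidate))
```

The Candidate Record Group is always a subset of the Aggregate Record Group, which gives a quick sanity check on the analysis.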
Some critical questions
• What is a “word”?
  • Self-help
  • Self help
• Normalization
  • Elena
  • Éléna
• What are the appropriate Author, Title, and Subject fields to look in for the word?
  • Decision related to indexing policies
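Both questions come down to policy choices that are easy to make explicit in code. The sketch below shows one possible normalization (Unicode decomposition, removal of combining marks, case folding) and a switch for treating a hyphen as a word break. These are illustrative choices under stated assumptions, not the rules adopted by the testbed.

```python
import unicodedata


def normalize_word(word):
    """Fold case and strip diacritics so that 'Éléna' and 'Elena' compare equal."""
    decomposed = unicodedata.normalize("NFKD", word)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def extract_words(text, split_hyphens=True):
    """Split text into normalized words.

    `split_hyphens` decides whether 'Self-help' yields one word or two;
    either answer is defensible, which is why it is a policy parameter here.
    """
    if split_hyphens:
        text = text.replace("-", " ")
    return [normalize_word(token) for token in text.split()]


print(normalize_word("Éléna") == normalize_word("Elena"))
print(extract_words("Self-help"))
print(extract_words("Self-help", split_hyphens=False))
```

Two systems that choose differently on either point can return different result sets for the same Z39.50 query, even when both conform to the profile.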
Reference implementations
• Online Catalog Software
  • Z-Interop testbed uses SIRSI’s UNICORN system
  • Test dataset loaded on the system
  • Indexing policies based on guidelines
• Z39.50 Server
  • SIRSI Z39.50 Module
  • Configured according to Bath/U.S. Profile
• Z39.50 Client
  • Bookwhere 2000
  • Configured according to Bath/U.S. Profile
Establishing benchmarks
[Figure: establishing benchmarks. Labels: Reference Z39.50 Client (configured to support Profile specifications); Reference Z39.50 Server (configured to support Profile specifications); Test Dataset (indexed per guidelines to support Profile searches); Test searches; Retrieval Results; Candidate Record Group; Compared to; Yields; Benchmarks for Test Search]
Interoperability testing
• Z-Interop Interoperability Testing Policies and Procedures
• Test dataset loaded on participant’s system
  • Configured to conform with Bath/U.S. Profiles
  • Indexed according to participant’s policies
• Testing Z-servers
  • Z-Interop will send test searches from reference Z-client
  • Report results compared with benchmarks
  • Analyze results to assist implementor to improve interoperability
• Testing Z-clients
  • Test searches sent to reference Z-server
Testing & assessment
[Figure: testing and assessment. Labels: Reference Z39.50 Client (configured to support Profile specifications); Vendor Z39.50 Server (configured by vendor for conformance to Profile); Test Dataset (loaded by vendor or library; indexed by vendor according to vendor’s specifications); Test Searches; Retrieval Results; Compared to Benchmarks for Test Search]
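Comparing a vendor server's retrieval results with the benchmark for a test search is essentially a set comparison. The sketch below reports matched, missing, and unexpected records plus the share of the benchmark retrieved. The report fields and the single summary ratio are assumptions for illustration; the slides leave the appropriate measures of interoperability as an open research question.

```python
def compare_to_benchmark(retrieved, benchmark):
    """Compare retrieved record IDs with the benchmark record IDs for one test search."""
    retrieved = set(retrieved)
    benchmark = set(benchmark)
    matched = retrieved & benchmark
    return {
        "matched": sorted(matched),
        "missing": sorted(benchmark - retrieved),      # in benchmark, not retrieved
        "unexpected": sorted(retrieved - benchmark),   # retrieved, not in benchmark
        "share_of_benchmark_retrieved": (
            len(matched) / len(benchmark) if benchmark else None
        ),
    }


# Hypothetical record IDs for one test search.
report = compare_to_benchmark(retrieved=[1, 4, 7], benchmark=[1, 2, 4])
for key, value in report.items():
    print(key, value)
```

Listing the missing and unexpected records, not only a score, is what lets an implementor trace a difference back to an indexing or configuration decision.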
Current testing
• Validate testing methodologies, procedures, policies
• Bath/U.S. National Profiles Levels 0 & 1 Search & Retrieval
  • Title Search – Keyword
  • Author Search – Keyword
  • Subject Search – Keyword
  • Any Search – Keyword
  • Title, Author, Subject Searches – Keyword Right Truncation
  • Simple Keyword Boolean searches (AND, OR, NOT)
• Test participants
  • InQuirion
  • OCLC
  • Innovative Interfaces
  • TLC/CARL
  • epixtech
  • Fretwell-Downing
  • M25 (UK)
  • Others expressing interest
Research questions
• What are acceptable levels of interoperability?
• What are appropriate measures of interoperability?
• What does conformance to a Profile mean?
  • Conformance of vendor’s product
  • Conformance of your implementation of vendor’s product
• To what extent are organizations willing to support common indexing practices to improve interoperability?
Critical success factors
• Openness and transparency of processes
  • Project documents available on website
• Culture of nurturing improvement
• Trustworthiness
  • Confidentiality of participants’ results
An opportunity for Z39.50
• Z39.50 experience has shown the challenges of interoperability
• Problems of interoperability are better understood within a focal community
• Solution paths exist
• Interoperability testing serves as platform for improvement
• The pieces are finally falling into place!
References
• The Bath Profile Maintenance Agency: http://www.nlc-bnc.ca/bath/
• U.S. National Profile: http://www.unt.edu/zprofile/
• Z39.50 Interoperability Testbed: http://www.unt.edu/zinterop/