

Deep-Indexing the OPAC: Integrating Contents Information into Search Results

Mary M. Strouse, CUA DuFour Law Library

7th MAIUG Annual Conference, October 2005

Functions of Contents Information (TOC Data)

• Evaluation: Does this resource suit my purpose?

• Navigation: Which volume(s), pages do I need?

• Identification/Collocation:

  What does the library have about…?

  …Written by…?

  …Containing… [known title]?

Local Priorities for TOC Inclusion

• Edited collections on broad themes

• Diverse geographic treatments

• Conferences, symposia, anthologies

• Local interest (our faculty, etc.)

Unifying themes: multiple authorship and/or non-predictable content

Where, When and How

• Choice of vendors
  • Blackwell
  • Syndetic Solutions
  • Marcive (using Syndetic’s data)
  • Scanning/local input

• Loading mechanisms
  • III loading services (Blackwell)
  • Matching on demand

• Choice of formats

505 Enhanced Contents Note

• Standards-compliant

• Existing tools (macros, spell-check)

• Includes volume and chapter, but not page #

• Keyword access to titles and authors

• Titles indexed (all or nothing)

• Can’t index authors (not in inverted form)

• Can be difficult to read

Vendor TOC Format (97x)

• Displays as a table of contents

• Includes page numbers

• Indexing flexibility: authors and titles as well as keyword

• Can exclude generic titles (“Introduction”, “Preface”, etc.) from indexing

• Space for both transcribed and authorized forms

97x TOC Format Detail

970 field (one per chapter or section title)

Indicator 1: title indexing
  0  Non-distinctive title (don’t index*)
  1  General chapter title or heading
  2  Citable title
  No longer used

Indicator 2: hierarchy, degree of indentation

Subfields:
  |l  Section or chapter label
  |t  Section or chapter title
  |c  Personal author
  |f  Personal author in inverted form*
  |d  Non-personal author
  |e  Editor
  |p  Starting page number

*By default, authors of non-distinctive titles are not indexed. May be indexed on request.
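To make the field layout concrete, here is a minimal Python sketch that splits a 970 line into its indicators and subfields. It assumes the plain-text layout used in the examples in this presentation (tag, two indicator characters, then subfields introduced by "|" and a one-letter code). The names `TocEntry` and `parse_970` are hypothetical and chosen for illustration; they are not part of any III or vendor tool.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    """One 970 field: a single chapter or section entry (hypothetical helper)."""
    title_indexing: str                            # indicator 1
    hierarchy: str                                 # indicator 2
    subfields: list = field(default_factory=list)  # (code, value) pairs, in order

    def values(self, code):
        """Return every value stored under one subfield code, e.g. 'c' or 'f'."""
        return [value for c, value in self.subfields if c == code]


def parse_970(line):
    """Parse a line such as '970 21 |tSome Title |cA. Author |p23'.

    Assumed layout: '970', whitespace, two indicator characters, then subfields.
    """
    match = re.match(r"\s*970\s+(\S)(\S)\s*(.*)$", line, flags=re.S)
    if match is None:
        raise ValueError("not a 970 line: %r" % line)
    ind1, ind2, rest = match.groups()
    # A subfield is '|', an optional stray space, a one-letter code, then its text.
    pairs = re.findall(r"\|\s*([a-z])([^|]*)", rest)
    return TocEntry(ind1, ind2, [(code, value.strip()) for code, value in pairs])
```

For instance, `parse_970("970 12 |tExcerpts from Antigone |d Sophocles |p11").values("d")` would give a one-item list holding the transcribed name, and `title_indexing` would hold the first indicator as a string.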

Disadvantages of TOC Format:

• Table-like format takes up screen, adds significantly to printing

• Limitations on use of vendor data (TOC blocked in exported results)

• Implicit burden on library staff

• Commingling of library and vendor data

Display bug: Corporate author doesn’t display

Effect on Keyword Search

• Adds significantly to retrieval in keyword searches – authors and titles

• Display element in keyword results is always the book title – also true of sorted/limited results.

• Search terms are highlighted in full record display

Effect on Title Search

• Library determines which titles to exclude (970 first indicator)

• Chapter titles will appear in unsorted results browse

• Chapter titles not identified as such

• English initial articles automatically excluded

• Search terms highlighted in full record display
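The two exclusion rules on this slide (the library-controlled 970 first indicator and the automatic dropping of English initial articles) can be sketched as follows. This is an illustration only: the function name `title_index_key`, the article list, and the lowercasing are assumptions, and the system's actual normalization rules may differ.

```python
# Assumed list of English initial articles; the real system list is not given here.
ENGLISH_INITIAL_ARTICLES = ("a", "an", "the")


def title_index_key(ind1, title):
    """Return a browse key for a chapter title, or None when it is not indexed.

    ind1 is the 970 first indicator; '0' marks a non-distinctive title
    such as 'Introduction' or 'Preface'.
    """
    if ind1 == "0":
        return None
    words = title.split()
    # Drop a leading article, but never reduce a one-word title to nothing.
    if len(words) > 1 and words[0].lower() in ENGLISH_INITIAL_ARTICLES:
        words = words[1:]
    return " ".join(words).lower()
```

A call such as `title_index_key("0", "Introduction")` returns `None`, so a generic heading never reaches the title browse, while a citable title is filed under its first significant word.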

Effect on Author Search

• Individual authors are linked in record (as transcribed) and appear in browse list (indexed form)

• Authority work often needed to match with existing names

• Corporate authors from 970 |d and editors from 970 |e not indexed

• Display of titles in extended browse follows same rules as title search

Keyword-only Indexing Option

• Includes authors and titles
  • Must specify inclusion in author and title segments

• Avoids collocation issue/authority work

• Avoids “noise” retrieval, confusion between chapters and books

• Limits access to documents and reports (distinct works)

• Limits effectiveness of known-author and known-title searches

Formatting Controls

• BIB_TOC_HEADER WWWoption
  • Places a caption at head of TOC display
  • Default: no caption
  • Accepts HTML for formatting or link to a help file

• TABLEPARAM_BIB_TOC

• Stylesheet Class: bibTOC

• No link in brief citation

Search result display options

• DISPLAY_245= does not apply to chapter titles

• EXTENDED_T=U will not force a book title to display

• Beware confusion from forcing extended display (INDEX_EXT=ta)

BROWSE WWWoption

• Controls first line of index browse

• BROWSE_T= controls first line of record browse (in absence of briefcit.htm)

• If no 970 subfields are specified, all subfields will display

• If you specify default subfields for non-245 titles, you must include subfield t

Example 1: BROWSE_T=245/abnp/c or BROWSE_T=245/abnp/c |970/t/cd |/a/c

(ALL TOC subfields display)

Example 2: BROWSE_T=245/abnp/c |970/t/cd and BROWSE_T=245/abnp/c |970/t/cd |/at/c

Briefcit Format

<span class="briefcitTitle"><!--{linkfieldspec:VbT}--></span>

• All record browse screens show book titles (includes limited and keyword results)

• All index browse screens show chapter titles (includes sorted results)

• Use BROWSE_T= (define 970 |t to avoid “no title” display)

Briefcit Format

<span class="briefcitTitle"><!--{linkfieldspec:Vbt245abnp}--></span>

• All record browse screens show book titles (includes sorted, limited and keyword)

• Only system-sorted index browse shows chapter titles

Loading and Workflow Issues

• False adds: monographic series w/ ISSNs

• False drops: CIP and other title discrepancies

• Coding consistency

• Authority control

• Volume of work

• Lack of tools

• No mechanism to identify/protect library-added data

Coding issues: vendor-supplied data

• Titles and names transcribed from TOC, not from fullest form available

• No space for formal titles of included works -- we add 7xx

• Inconsistent coding of index-worthiness

• Is “Appendix” a title or a number?

“Non-personal” Authors: |d

• Used for corporate author:

970 11 |l9 |tRedefining Discrimination: 'Disparate Impact' and the Institutionalization of Affirmative Action |d United States Department of Justice Office of Legal Policy |p121

• Also used for personal authors in direct order (but sometimes not):

970 12 |tExcerpts from Antigone |d Sophocles |p11

970 11 |tReith Lecture 2000 |d The Prince of Wales |p11

“Non-personal” Authors: |d

• Also used for other transcribed phrases and “et al.”:

970 21 |tWorkshop Discussion: Civil Litigation Against Terrorism |d Workshop Participants |p185

970 21 |tPublic Support for Access to Government Records: A National Survey |cPaul D. Driscoll |fDriscoll, Paul D. |cSigman L. Splichal |fSplichal, Sigman L. |cMichael B. Salwen |fSalwen, Michael B. |d [et al.] |p23

• Library can add index link in |f (not vendor-provided)
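As a rough illustration of what adding a |f index link involves, the sketch below derives an inverted form from a name transcribed in direct order. The function name `invert_personal_name` is hypothetical, and the rule it applies (treat the last word as the surname) is a deliberately naive assumption, not the vendor's or the library's actual procedure.

```python
def invert_personal_name(name):
    """Naively invert a direct-order personal name: 'First Middle Last' -> 'Last, First Middle'.

    Single-word names are returned unchanged. Titles, compound surnames and
    suffixes are NOT handled; results need review against the authority file.
    """
    parts = name.strip().split()
    if len(parts) < 2:
        return name.strip()
    return "%s, %s" % (parts[-1], " ".join(parts[:-1]))


def add_index_link(transcribed):
    """Build a library-added '|f' subfield string for a transcribed author (assumed layout)."""
    return "|f" + invert_personal_name(transcribed)
```

With this rule, "Paul D. Driscoll" becomes "Driscoll, Paul D." and "Sophocles" is left alone, but "The Prince of Wales" would come out as "Wales, The Prince of". That failure is the point: a mechanical inversion cannot replace the authority work these slides describe.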

Recap: User Issues

• Cost in screen space, added printing

• Multiple forms of author entry (split files)

• Can’t distinguish between chapter and book-length treatment (increased noise)

• License limitations on data use

Wish List

• Fix corporate author display bug

• Identify chapter titles in search results

• Option to force display of both chapter title and book title in extended browse

• Link to full TOC display from brief citation format (briefcit.html)

• Allow limited data export for legitimate scholarly use

Recap: Workflow Issues

• Vendor-dependent format

• Staff burden – need coding regularization

• Co-mingling of vendor and library data

• False positives (multiple ISBN)

Wish List

• Additional Subfields/Codes:
  • Indexed/authorized form for corporate author
  • Data source and ownership
  • Authority history

• Subfield code(s) to identify library-added TOC data, overlay-protect library-added authority work

• Better coding conventions, transparency

References

• CSDirect TOC Data FAQ (password required)
  http://csdirect.iii.com/faq/tocfaq.shtml

• Blackwell TOC Enrichment brochure
  http://www.blackwell.com/pdf/TOCEnrichment.pdf

• Vendors
  http://www.blackwell.com/level2/TOC.asp
  http://www.syndetics.com/index.htm
  http://www.marcive.com/HOMEPAGE/MARCres.htm (Marcive uses Syndetics data)

Contact

Mary M. Strouse
Head of Technical Services
Judge Kathryn J. DuFour Law Library
Catholic University of America
strouse at law.cua.edu