Implementing FRBR on Large Databases
Thomas Hickey, Diane Vizine-Goetz
OCLC Research
CNI 2002 Fall Task Force
What is FRBR?
• IFLA study group report: Functional Requirements for Bibliographic Records
• Bibliographic model independent of cataloging rules
• Clusters bibliographic items into a four-level structure
  – Work
  – Expression
  – Manifestation
  – Item
Control of Entities in FRBR
[Diagram]
• Entities: Item, Manifestation, Expression, Work; Corporate Body, Person; Concept, Place, Event, Object
• Surrogates: Uniform titles, Citations, Names, Subjects
Why FRBR?
• Potential to improve:
  – Cataloging
  – Discovery
  – Delivery
• By:
  – Bringing versions of works together
  – Showing relationships of various kinds
  – Enabling users to navigate to level of interest
Research on FRBR & WorldCat
• Subsets
  – By library, region
  – Example/problem sets
    • Shakespeare, the Bible
    • Humphry Clinker
    • 1,000 random works
  – By genre
    • Dissertations
    • Fiction
• Whole file, 47 million bibliographic records
Our Approach
• Concentrating on work-level
  – Problems with expression-level clusters
• Efficient, maintainable, understandable
• Few, if any, false matches with correct cataloging
  – Err on the side of missed matches
  – Some accommodation of frequent variants
• Compare with manually clustered
The Algorithm
• A key is generated for each record
• Extract author, title
  – Look up in NACO authority file
  – Added entry information as needed
• Form a key from bibliographic record
  – Author, title, added entry information
  – These can be sorted, compared
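The key-building steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the OCLC implementation: `normalize` is a simplified stand-in for NACO normalization (it only lowercases, strips diacritics and apostrophes, and turns other punctuation into spaces), the authority-file lookup is skipped by assuming the author heading is already in its established form, and the function names and record layout are hypothetical. The `\` and `/` separators follow the key format shown in the cluster listings in this deck.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text, keep_first_comma=False):
    """Simplified stand-in for NACO normalization (assumption: not the full rules)."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and delete apostrophes outright ("Alice's" -> "alices").
    text = text.lower().replace("'", "").replace("\u2019", "")

    def squash(s):
        # Any other run of non-alphanumeric characters becomes one space.
        return re.sub(r"[^a-z0-9]+", " ", s).strip()

    if keep_first_comma:
        # Keep the comma between surname and forename in a name heading.
        head, sep, tail = text.partition(",")
        head, tail = squash(head), squash(tail)
        return f"{head}, {tail}" if sep and tail else head
    return squash(text)


def work_key(author_subfields, title):
    """Build an author/title key; author_subfields may be an empty list."""
    parts = [normalize(sub, keep_first_comma=(i == 0))
             for i, sub in enumerate(author_subfields)]
    author = "\\".join(p for p in parts if p)
    title = normalize(title)
    return f"{author}/{title}" if author else title


def cluster(records):
    """Group record ids by key; each record is (id, author_subfields, title)."""
    clusters = defaultdict(list)
    for rec_id, author_subfields, title in records:
        clusters[work_key(author_subfields, title)].append(rec_id)
    return dict(clusters)


# Made-up sample records for illustration only.
records = [
    (1, ["Twain, Mark,", "1835-1910."], "Adventures of Huckleberry Finn"),
    (2, ["Twain, Mark,", "1835-1910."], "Adventures of Huckleberry Finn."),
    (3, ["Carroll, Lewis,", "1832-1898."], "Alice's Adventures in Wonderland"),
    (4, [], "Arabian nights"),
]
for key, ids in sorted(cluster(records).items()):
    print(len(ids), key)
```

Because the keys are plain strings, clustering reduces to grouping (or sorting) on the key, which is what keeps the approach simple to run over a very large file. Uniform-title keys with subfields, such as `bible\o t\psalms`, are not modeled in this sketch.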
Problems
• Many (17%) records do not have
  – Author main-entry
  – Uniform title
• In general these cannot be matched
  – Look at added entries
  – Information at the expression and manifestation levels
  – Handled separately
  – 180,000 clusters involving ~400,000 records
Top 10 WorldCat Clusters
# Recs   Author/Title Key
8,383    bible\n t
8,055    bible
6,174    bible\authorized
4,033    bible\o t\psalms
3,964    haggadah
3,477    great britain/treaties etc
2,402    bible\o t
2,248    koran
2,153    arabian nights
Top 10 from a Public Library
# Recs   Author/Title Key
89       bible\authorized
85       mother goose
84       chopin, frederic\1810 1849/piano music
81       schulz, charles m/peanuts
63       davis, jim/garfield
61       moore, clement clarke\1779 1863/night before christmas
60       mozart, wolfgang amadeus\1756 1791/instrumental music
58       bach, johann sebastian\1685 1750/cantatas
57       beethoven, ludwig van\1770 1827/sonatas
56       twain, mark\1835 1910/adventures of huckleberry finn
Results
• Manual estimate: 1.5 manifestations/work in WorldCat
• Algorithm: ~1.3
• 25,844 clusters have 20 or more records
• 401,659 clusters have 5 or more records
Preliminary Plans
• Build structures for FRBR into new catalog
• Expose FRBR clustering for searching
• Make visible in cataloging
  – As consensus on implementation is developed
  – As cataloging rules accommodate FRBR
Spin-offs
• NACO normalization code
  – Testbed
  – Server
• Authority work
  – ePrints UK
• FRBR in other projects
  – FictionFinder
  – NDLTD union catalog
Fiction Subset
• 2,665,662 WorldCat records
• 1,758,479 work clusters
• 1.5 records/cluster
• 3,866 clusters have 20 or more records
• 50,540 clusters have 5 or more records
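Summary figures of this kind can be derived from a list of cluster sizes. The sketch below is illustrative only: `cluster_stats` and the toy sizes are hypothetical, and the only numbers taken from the slide are the record and cluster totals used in the last line.

```python
def cluster_stats(sizes, thresholds=(5, 20)):
    """Summarize cluster sizes: totals, mean size, and counts at or above each threshold."""
    sizes = list(sizes)
    total_records = sum(sizes)
    stats = {
        "records": total_records,
        "clusters": len(sizes),
        "records_per_cluster": total_records / len(sizes),
    }
    for t in thresholds:
        stats[f"clusters_with_{t}_or_more"] = sum(1 for s in sizes if s >= t)
    return stats


# Toy cluster sizes, made up for illustration.
toy_sizes = [1, 1, 1, 2, 5, 7, 20, 23]
print(cluster_stats(toy_sizes))

# Ratio implied by the slide's totals: 2,665,662 records in 1,758,479 clusters.
print(round(2_665_662 / 1_758_479, 2))
```

The last line evaluates to about 1.52, consistent with the 1.5 records/cluster reported on the slide.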
Top 10 clusters for fiction
# Recs   Author/Title Key
1,296    defoe, daniel\1661 1731/robinson crusoe
1,248    carroll, lewis\1832 1898/alices adventures in wonderland
971      cervantes saavedra, miguel de\1547 1616/don quixote
828      stevenson, robert louis\1850 1894/treasure island
689      twain, mark\1835 1910/adventures of huckleberry finn
624      twain, mark\1835 1910/adventures of tom sawyer
618      swift, jonathan\1667 1745/gullivers travels
600      andersen, h c\hans christian\1805 1875/tales
581      stowe, harriet beecher\1811 1896/uncle toms cabin
570      arabian nights
FictionFinder
• Employs work clusters in a prototype system for searching and browsing bibliographic records for fiction
• Indexes records at the work level and organizes displays by work and expression (primarily language)
• Includes records for textual items; additional modes of expression (moving image, sound) to be added later
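A display organized by work and then by expression, with language standing in for expression as the slide describes, could be modeled as a two-level grouping. This is a minimal sketch under assumed names: the record fields (`work_key`, `language`, `title`) and the function `group_for_display` are hypothetical and do not describe FictionFinder's actual data model.

```python
from collections import defaultdict


def group_for_display(records):
    """Group records by work key, then by language code as a proxy for expression."""
    works = defaultdict(lambda: defaultdict(list))
    for rec in records:
        # "und" (undetermined) is used when a record carries no language code.
        works[rec["work_key"]][rec.get("language", "und")].append(rec)
    # Convert to plain dicts so the result is easy to inspect and serialize.
    return {key: dict(by_lang) for key, by_lang in works.items()}


# Made-up sample records for illustration only.
key = "cervantes saavedra, miguel de\\1547 1616/don quixote"
records = [
    {"work_key": key, "language": "spa", "title": "Don Quijote de la Mancha"},
    {"work_key": key, "language": "eng", "title": "Don Quixote"},
    {"work_key": key, "language": "eng", "title": "The history of Don Quixote"},
]
display = group_for_display(records)
for language, recs in sorted(display[key].items()):
    print(language, len(recs))
```

Language is only a coarse approximation of expression: two different English translations land in the same group here, which is the kind of case where translator or editor information would be needed to separate them.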
395 records for author “crichton, michael\1942” clustered into 17 entries
# Recs   Title Key
23       airframe
40       andromeda strain
5        binary
11       case of need
44       congo
26       disclosure
5        disclosure a novel
16       eaters of the dead
7        eaters of the dead the manuscript of ibn fadlan relating his experiences with the northmen in a d 922
27       great train robbery
47       jurassic park
25       lost world
37       rising sun
31       sphere
7        sphere a novel
19       terminal man
25       timeline
395      (total)
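The listing shows the same work split across keys that differ only by a generic subtitle ("disclosure" and "disclosure a novel", "sphere" and "sphere a novel"). The slides do not say how such variants are handled, so the following is only one hypothetical accommodation: fold a key into its shorter form when it ends in a generic suffix and the shorter form already exists as its own cluster. The suffix list and function name are assumptions.

```python
from collections import Counter

# Hypothetical list of subtitles treated as generic.
GENERIC_SUFFIXES = (" a novel",)


def merge_generic_subtitles(counts, suffixes=GENERIC_SUFFIXES):
    """Fold 'title a novel' into 'title' when the shorter key also exists."""
    merged = Counter()
    for title, n in counts.items():
        for suffix in suffixes:
            base = title[: -len(suffix)]
            if title.endswith(suffix) and base in counts:
                title = base
                break
        merged[title] += n
    return merged


# Record counts per title key, copied from the listing above.
crichton = {
    "airframe": 23, "andromeda strain": 40, "binary": 5, "case of need": 11,
    "congo": 44, "disclosure": 26, "disclosure a novel": 5,
    "eaters of the dead": 16,
    "eaters of the dead the manuscript of ibn fadlan relating his "
    "experiences with the northmen in a d 922": 7,
    "great train robbery": 27, "jurassic park": 47, "lost world": 25,
    "rising sun": 37, "sphere": 31, "sphere a novel": 7,
    "terminal man": 19, "timeline": 25,
}
merged = merge_generic_subtitles(crichton)
print(len(crichton), "->", len(merged), "entries;", sum(merged.values()), "records")
```

Requiring the shorter key to exist keeps the rule conservative, in line with the stated preference for missed matches over false ones. The long "eaters of the dead" variant is left untouched because its subtitle is not generic.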
Typical Results Set Display
Typical Work-level Display
Benefits
• Aggregated displays for works and expressions
• Enhancement of (fiction) records at work level
  – with elements from records within the work cluster (e.g., summaries, genre terms, subject headings, class numbers)
  – with external data (e.g., literary prizes, prequels/sequels, evaluative content)
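Enhancing a work-level record from its cluster amounts to pooling selected elements from the member records and attaching any external data. The sketch below assumes a simple dictionary layout; the field names, `build_work_record`, and the sample values are all hypothetical and are not taken from the FictionFinder prototype.

```python
# Elements pooled from member records (names are assumptions for this sketch).
ENRICHMENT_FIELDS = ("summaries", "genre_terms", "subject_headings", "class_numbers")


def build_work_record(key, members, external=None):
    """Pool distinct values of each enrichment field across a cluster's records."""
    work = {"key": key, "record_count": len(members)}
    for field in ENRICHMENT_FIELDS:
        values = []
        for rec in members:
            for value in rec.get(field, []):
                if value not in values:  # keep first-seen order, drop repeats
                    values.append(value)
        work[field] = values
    # External data (e.g. prizes, sequels) is attached as-is.
    work["external"] = dict(external or {})
    return work


# Made-up member records for illustration only.
members = [
    {"id": 1, "genre_terms": ["Adventure fiction"], "subject_headings": ["Dinosaurs"]},
    {"id": 2, "genre_terms": ["Adventure fiction", "Science fiction"],
     "summaries": ["(summary text)"]},
]
work = build_work_record("crichton, michael\\1942/jurassic park", members)
print(work["genre_terms"])
```

A record that lacks, say, a summary still benefits at display time, because the work-level record carries the summary contributed by another member of the cluster.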
Challenges
• Identifying appropriate bibliographic data for systematically grouping or differentiating works and expressions
  – Works
    • Genre (graphic novel vs. novel)
    • Genre + mode of expression (audio book vs. radio play)
    • Degree of modification (abridgement of juvenile work vs. an adaptation for young children)
  – Expressions
    • Translators, illustrators, editors
Next Steps
• FRBR algorithm
  – Explore applications
  – Refine algorithm as needed
• FictionFinder
  – Add records for sound and image
  – Conduct user studies
Links
• Functional Requirements for Bibliographic Records - Final Report
  – http://www.ifla.org/VII/s13/frbr/frbr.htm
• Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR)
  – http://www.dlib.org/dlib/september02/hickey/09hickey.html
• OCLC Research Activities and IFLA's Functional Requirements for Bibliographic Records
  – http://www.oclc.org/research/projects/frbr/index.shtm
• Implementing FRBR on Large Databases
  – http://staff.oclc.org/~vizine/CNI/OCLCFRBR.htm