Maintaining the integrity of e-book titles in CityU library catalogue
7th HKIUG, 12 Dec 2006, HKUST
Joanna Pong, Philip Wong
Run Run Shaw Library
City University of Hong Kong
Maintaining the integrity of e-book titles in CityU library catalogue, 7th HKIUG, 2006
Table of Contents
1. Growth of e-books in CityU
2. Duplication problems
3. Attempted solutions
4. Effective Solutions
5. De-duplication jobs
6. Benefits and limitations
1. Growth of e-books in CityU
The e-book collection contains English e-books, Chinese e-books and e-theses
From 2001: NetLibrary (around 200 titles)
To Oct 2006: > 200,000 titles
- English e-books: > 87,000 titles
- Chinese e-books: > 45,000 titles
- e-theses: > 70,000 titles
1. Growth of e-books in CityU (cont’d)
Acquisition of e-books from 2001 onwards
Year | English e-books | Chinese e-books | e-theses | Total
2001-02 | 200 | 0 | 0 | >200
2002-03 | 100 | 0 | 200 | >300
2003-04 | 200 | 0 | 100 | >400
2004-05 | 1,300 | 1,400 | 39,000 | >40,000
2005-06 | 77,000 | 44,000 | 31,000 | >150,000
2006-07 (Jul-Oct 06) | 8,000 | 0 | 100 | >8,100
Total | >87,000 | >45,000 | >70,000 | >200,000
1. Growth of e-books in CityU (cont’d)
[Chart: Acquisition of e-books, number of titles per year (2001-02 to 2006-07) for English e-books, Chinese e-books and e-theses; total > 200,000 titles]
1. Growth of e-books in CityU (cont’d)
Major e-book collections (no. of titles)
Apabi | 46,000 | 36%
Books24x7 | 5,000 | 4%
Ebrary | 27,000 | 22%
NetLibrary | 43,000 | 35%
Safari | 1,000 | 1%
Springer | 2,000 | 2%
1. Growth of e-books in CityU (cont’d)
E-theses (no. of titles)
UMI pdf files | 2,000 | 3%
ProQuest ABI/Inform | 15,000 | 21%
Digital Dissertation Consortium | 54,000 | 76%
1. Growth of e-books in CityU (cont’d)
Consortial acquisition of e-books
- Digital Dissertation Consortium: since 2005
- Apabi D-Lib Consortium: since 2006
- NetLibrary Super E-book Consortium: since 2006
New consortia
- Electronic Resources Academic Library Link (ERALL), a JULAC project on collective e-book collection development
1. Growth of e-books in CityU (cont’d)
Growth of e-book usage (from CGI logs) showed an upward trend
e-books | Yr 2004 | Yr 2005 | Yr 2006 | % growth 2005 to 2006
Apabi | 588 | 5196 | 8047 | 55%
ebrary | - | 5922 | 18467 | 212%
netLibrary | 1928 | 2563 | 14753 | 476%
Safari | - | 1488 | 1768 | 19%
Wiley InterScience | - | 302 | 1291 | 327%
Digital Dissert. Con. | - | 9881 | 11485 | 16%
ProQuest Dissert. | - | 1594 | 3171 | 99%
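The "% growth" column follows from the 2005 and 2006 counts, computed as (2006 - 2005) / 2005. The short check below copies those counts from the table and reproduces each percentage, rounded to the nearest whole percent.

```python
# 2005 and 2006 usage counts, copied from the table above.
USAGE = {
    "Apabi": (5196, 8047),
    "ebrary": (5922, 18467),
    "netLibrary": (2563, 14753),
    "Safari": (1488, 1768),
    "Wiley InterScience": (302, 1291),
    "Digital Dissert. Con.": (9881, 11485),
    "ProQuest Dissert.": (1594, 3171),
}

def growth_percent(before: int, after: int) -> int:
    """Year-on-year growth in percent, rounded to the nearest whole number."""
    return round((after - before) / before * 100)

for name, (y2005, y2006) in USAGE.items():
    print(f"{name}: {growth_percent(y2005, y2006)}%")
```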
2. Duplication problems
The variety of e-book collections and the high number of titles created problems in cataloguing
A major problem: title duplication
- We load records supplied by different vendors, which results in title duplication
- The more e-book titles, the more title duplication, of two kinds:
  - the same title from different collections
  - the same title from the same collection
2. Duplication problems (cont’d)
Duplication from different collections
2. Duplication problems (cont’d)
Duplication from the same collection: NetLibrary collection
- Titles purchased by CityU since 2001
- Titles acquired via the Super E-book Consortium
2. Duplication problems (cont’d)
The same title from NetLibrary acquired in different periods
2. Duplication problems (cont’d)
Duplication from the same collection (cont’d): UMI e-theses
- Titles purchased by CityU since 2002
- Titles acquired via the Digital Dissertation Consortium
- Titles in the ProQuest database
2. Duplication problems (cont’d)
The same UMI e-thesis title acquired in different periods
3. Attempted solutions
Single-record approach in cataloguing
- We apply the single-record approach to all e-versions of the same title
- Applied to both e-books and e-journals
3. Attempted solutions (cont’d)
Duplication control in e-journals
- CityU applied and modified BU’s program to merge e-journal titles from aggregator databases
3. Attempted solutions (cont’d)
Duplication control through manual methods
For e-books, our previous solutions were:
1. Manual checking
2. Headings reports: duplicate call numbers
3. Loading through match field 001: identify duplicate records
4. On an encounter basis
This was adequate only while the number of titles remained small
3. Attempted solutions (cont’d)
Duplication control through customized load profiles
- The first attempt to automate the procedure
- Used the local load profiles and translation table in INNOPAC to merge 2 sets of NetLibrary titles:
  - Super E-book Consortium titles purchased in 2006
  - NetLibrary titles purchased since 2001
- 2,206 titles were found to be duplicated
3. Attempted solutions (cont’d)
Duplication control through customized load profiles (cont’d)
Using load profiles is not a complete solution:
- Cannot match multiple tags (cannot match tag 020 against tag 024)
- Cannot match selected sets (cannot exclude print titles)
- Cannot merge multiple records automatically; records must be output for manual checking to decide the master record
4. Effective Solutions
Cataloguing worked with Systems to run de-duplication and merging of records
Prerequisites: the solution should
- be easy to apply
- fit into the existing workflow
- be flexible enough to handle e-book batches of different sizes
- allow prompt or ad hoc loading of records if necessary
4. Effective Solutions (cont’d)
Scope of de-duplication: include English e-books and e-theses
- e-books: 88,000 records
- e-theses: 70,000 records
4. Effective Solutions (cont’d)
Scope of de-duplication (cont’d): exclude Chinese e-books, because
- CityU so far has only one Chinese e-book collection, Apabi
- The vendor supplied unique records when we joined the Apabi D-Lib Consortium (no duplication with previously purchased titles)
- We will also handle Chinese e-books if we acquire other Chinese e-book collections in the future
4. Effective Solutions (cont’d)
What fields to match?
E-books
- Match ISBN: a relatively reliable tag
- Match major MARC tags: the 110 match key
UMI e-theses
- Use the UMI number for matching
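Per-record match keys (ISBNs, UMI numbers) still have to be turned into groups of duplicates. The slides do not say how this is done. One possible approach, shown here as an assumption rather than CityU's program, is union-find over shared keys. It places records that are linked only indirectly in one group: A carries the print ISBN, B carries both ISBNs, and C carries only the e-book ISBN. The prefixed key strings are hypothetical.

```python
def group_duplicates(keys_by_record: dict[str, set[str]]) -> list[list[str]]:
    """Group record IDs that share any match key, directly or transitively.

    keys_by_record maps a record ID to its normalized keys, e.g. ISBNs or
    UMI numbers (prefixed here so the two kinds never collide). Only groups
    with more than one record, i.e. actual duplicates, are returned.
    """
    parent = {rec: rec for rec in keys_by_record}

    def find(rec: str) -> str:
        # Follow parent links to the group's root, halving the path as we go.
        while parent[rec] != rec:
            parent[rec] = parent[parent[rec]]
            rec = parent[rec]
        return rec

    first_owner: dict[str, str] = {}  # key -> first record seen with that key
    for rec, keys in keys_by_record.items():
        for key in keys:
            if key in first_owner:
                parent[find(rec)] = find(first_owner[key])  # union the two groups
            else:
                first_owner[key] = rec

    groups: dict[str, list[str]] = {}
    for rec in keys_by_record:
        groups.setdefault(find(rec), []).append(rec)
    return [members for members in groups.values() if len(members) > 1]
```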
4. Effective Solutions (cont’d)
How to merge?
- Set the record with the earliest Create Date as the master record
- Add the reproduction note (tag 533), name of the book collection (tag 773) and URL link (tag 856) of the duplicate record(s) to the master record
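The merge rule can be sketched as follows: the earliest-created record becomes the master, and the 533/773/856 fields of the other records are appended to it. The `Record` class is a hypothetical simplification in which each MARC field is reduced to a (tag, text) pair. Indicators and subfields, which a real INNOPAC export carries, are deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical, simplified record model: each MARC field is a (tag, text) pair.
@dataclass
class Record:
    record_id: str
    create_date: date
    fields: list[tuple[str, str]] = field(default_factory=list)

# Fields copied from each duplicate into the master, per the rule above.
TAGS_TO_COPY = ("533", "773", "856")

def merge_duplicates(records: list[Record]) -> Record:
    """Merge a non-empty group of matched records into one master record."""
    master, *duplicates = sorted(records, key=lambda r: r.create_date)
    # Work on a copy so the original master record is left untouched.
    merged = Record(master.record_id, master.create_date, list(master.fields))
    for dup in duplicates:
        for tag, value in dup.fields:
            # Append reproduction note, collection name and URL, skipping exact repeats.
            if tag in TAGS_TO_COPY and (tag, value) not in merged.fields:
                merged.fields.append((tag, value))
    return merged
```

Other fields of the duplicates, such as their titles, are simply dropped, because the master already describes the same work.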
4. Effective Solutions (cont’d)
Matching algorithm of ISBN
Print ISBN vs. e-book ISBN
- Some records come with the print ISBN, some with the e-book ISBN, some with both
- Both types are used for matching
Different tags store the ISBN
- 020 $a, $z
- 024 (1st indicator 3) $a, $z
- 776 $z
- All of the above are used for matching
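A minimal sketch of collecting ISBN candidates from the tags and subfields listed above. It assumes a hypothetical `Field` tuple (tag, two indicators, list of subfield code/value pairs) rather than any particular MARC library. In tag 024, first indicator 3 marks an International Article Number (EAN), which is the form an ISBN-13 takes.

```python
from typing import NamedTuple

class Field(NamedTuple):
    """Hypothetical minimal MARC field: tag, indicators and subfields."""
    tag: str
    ind1: str
    ind2: str
    subfields: list[tuple[str, str]]

# Tag -> subfield codes that may hold an ISBN, as listed above.
ISBN_SOURCES = {"020": {"a", "z"}, "024": {"a", "z"}, "776": {"z"}}

def isbn_candidates(fields: list[Field]) -> list[str]:
    """Return raw ISBN strings (print and e-book alike) for matching."""
    found = []
    for f in fields:
        codes = ISBN_SOURCES.get(f.tag)
        if codes is None:
            continue
        if f.tag == "024" and f.ind1 != "3":  # only 024 with ind1 = 3 (EAN) is used
            continue
        found.extend(value for code, value in f.subfields if code in codes)
    return found
```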
4. Effective Solutions (cont’d)
Matching algorithm of ISBN (cont’d)
13-digit ISBN vs. 10-digit ISBN
- Starting on 1 Jan 2007, the ISBN is 13 digits; some publishers already used 13-digit ISBNs before that
- Starting from 12 Nov 2006, OCLC moves 13-digit ISBNs to tag 020
- 13-digit ISBNs with the prefix “978” have 10-digit equivalents; they are converted to 10 digits for matching
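The 978-to-10-digit conversion is standard ISBN arithmetic: drop the "978" prefix and the ISBN-13 check digit, then recompute a modulus-11 check digit over the remaining nine digits. The helper names below are illustrative, and the sketch does not verify the incoming ISBN-13 check digit.

```python
from typing import Optional

def isbn13_to_isbn10(isbn: str) -> Optional[str]:
    """Return the ISBN-10 equivalent of a 978-prefixed ISBN-13, else None.

    ISBN-13s with the 979 prefix have no ISBN-10 equivalent.
    """
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith("978"):
        return None
    core = digits[3:12]  # drop the 978 prefix and the ISBN-13 check digit
    # ISBN-10 check digit: weights 10 down to 2, modulus 11, 10 written as "X".
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))

def match_form(isbn: str) -> str:
    """Normalize an ISBN for matching: 10 digits whenever an equivalent exists."""
    cleaned = isbn.replace("-", "").replace(" ", "").upper()
    return isbn13_to_isbn10(cleaned) or cleaned
```

For example, `match_form("978-0-415-19132-6")` and `match_form("0415191327")` both give `"0415191327"`, so the two forms of the same ISBN match.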
4. Effective Solutions (cont’d)
Matching algorithm of ISBN (cont’d)
ISBN with “noise”
- Some ISBNs include a note enclosed in parentheses
- Do not use the ISBN for matching if the text inside the parentheses indicates that the ISBN is for a set, a series, a volume, etc., e.g. “0415191327 (series : International library of psychology)”
- Hint: look for the keywords “set” and “series”, and compare the note with tags 440 and 830
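A sketch of the noise filter, under stated assumptions. The keywords "set" and "series" come from the hint above. The volume patterns ("volume", "vol.", "v.") are an added guess for the "volume" case. Series titles are assumed to have been extracted from tags 440/830 already.

```python
import re
from typing import Optional

# "set"/"series" from the slide's hint; the volume patterns are an added assumption.
QUALIFIER = re.compile(r"\b(?:sets?|series|volumes?)\b|\b(?:vol|v)\.", re.IGNORECASE)
# Leading ISBN (digits, hyphens, X), optionally followed by a parenthesized note.
ISBN_WITH_NOTE = re.compile(r"\s*([0-9Xx-]+)\s*(?:\((.*)\))?")

def usable_isbn(raw: str, series_titles: list[str]) -> Optional[str]:
    """Return the bare ISBN from a 020 value, or None if it should not be matched.

    series_titles: titles taken from the record's 440/830 fields.
    """
    m = ISBN_WITH_NOTE.match(raw)
    if m is None:
        return None
    isbn, note = m.group(1), m.group(2) or ""
    if QUALIFIER.search(note):
        return None  # note says the ISBN identifies a set, series or volume
    note_lower = note.lower()
    if any(title.lower() in note_lower for title in series_titles if title):
        return None  # note repeats a series title from 440/830
    return isbn.replace("-", "").upper()
```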
4. Effective Solutions (cont’d)
Matching algorithm of the 110 Match Key
- To guard against mismatches by ISBN, construct an additional match key based on the INN-Reach 110 Match Key:
  Title + Gen. Media + Pub. Year + Pagination + Edition + Publisher + Type of Record + Title Part + Title Number
- The key is constructed and then normalized; refer to the INN-Reach documentation for details
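The sketch below shows the general shape of such a key: the listed elements, each normalized and then joined in a fixed order. It is not the INN-Reach algorithm. The real 110 Match Key has its own element rules, which are in the INN-Reach documentation. The normalization used here (strip diacritics, lowercase, drop punctuation, collapse spaces) is a simplifying assumption.

```python
import re
import unicodedata
from typing import NamedTuple

class KeyElements(NamedTuple):
    """The match-key elements, in the order listed above (plain strings)."""
    title: str
    gen_media: str
    pub_year: str
    pagination: str
    edition: str
    publisher: str
    record_type: str
    title_part: str
    title_number: str

def normalize(text: str) -> str:
    """Assumed normalization: strip diacritics, lowercase, drop punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.sub(r"[^\w\s]", " ", no_marks.lower()).split())

def match_key(e: KeyElements) -> str:
    """Join normalized elements with '|' so empty elements keep their position."""
    return "|".join(normalize(part) for part in e)
```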
5. De-duplication jobs
- Initial clean-up
- Regular de-duplication
5. De-duplication jobs (cont’d)
Initial clean-up
- A one-time job to de-duplicate records that had already been loaded
- 6,063 duplicate records (7.2%) were found out of 84,756 English e-book records
- The program was fine-tuned after the initial clean-up
5. De-duplication jobs (cont’d)
Regular de-duplication
- Once every month
- Flexible: depends on the number of titles loaded and the urgency of loading the records
- Clean-up before loading vs. clean-up after loading
5. De-duplication jobs (cont’d)
Regular de-duplication (cont’d): procedures
1. Output e-book records from the catalogue
2. Run the de-duplication program to match them with the vendor records
3. Overlay records in the catalogue with the merged records
4. If the vendor records have already been loaded, delete the duplicate vendor records from the catalogue; otherwise, insert the new vendor records into the catalogue
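The branching in step 4 can be sketched as a small planning function. It assumes, hypothetically, that the matching step yields a mapping from each vendor record ID to the catalogue master it matched, or `None` for a title not yet in the catalogue. The record IDs and the `Plan` type are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Plan:
    """Catalogue actions for one de-duplication run (hypothetical structure)."""
    overlay: list[str] = field(default_factory=list)  # masters replaced by merged records
    delete: list[str] = field(default_factory=list)   # duplicate vendor records to remove
    insert: list[str] = field(default_factory=list)   # new vendor records to add

def plan_actions(matches: dict[str, Optional[str]], vendor_loaded: bool) -> Plan:
    """Turn match results into overlay/delete/insert actions."""
    plan = Plan()
    for vendor_id, master_id in matches.items():
        if master_id is not None:
            # Duplicate: its 533/773/856 were merged into the master, which is overlaid.
            if master_id not in plan.overlay:
                plan.overlay.append(master_id)
            if vendor_loaded:
                plan.delete.append(vendor_id)  # the copy is already in the catalogue
        elif not vendor_loaded:
            plan.insert.append(vendor_id)      # a genuinely new title
    return plan
```

Clean-up before loading never needs a delete step, because duplicate vendor records are never inserted in the first place.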
5. De-duplication jobs (cont’d)
[Flow chart: master records from INNOPAC and vendor records go through Match & Merge; merged records overlay INNOPAC, duplicate records are deleted, and new vendor records are inserted into INNOPAC]
5. De-duplication jobs (cont’d)
De-duplication results: initial clean-up of e-books
Total English e-book records | 84,756 | 100.0%
Records duplicated | 6,063 | 7.2%
Titles merged from 2 records | 3,024 | 99.8% of merged titles
Titles merged from 3 records | 5 | 0.2% of merged titles
Titles merged from >= 4 records | 0 | 0.0%
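These statistics can be derived from the size of each merged group. "Records duplicated" counts every record in a merged group, which is why the figures add up: 3,024 x 2 + 5 x 3 = 6,063. The group-size percentages are shares of merged titles, not of all records. The function name and input shape below are illustrative assumptions.

```python
from collections import Counter

def duplication_stats(total_records: int, group_sizes: list[int]) -> dict:
    """Summarize a clean-up run.

    group_sizes: one entry per merged title, giving how many records were
    merged into it. Percentages are returned unrounded.
    """
    duplicated = sum(group_sizes)
    titles = len(group_sizes)
    by_size = Counter(group_sizes)
    return {
        "records_duplicated": duplicated,
        "percent_duplicated": 100 * duplicated / total_records,
        # group size -> (merged titles of that size, share of all merged titles in %)
        "titles_by_group_size": {
            size: (count, 100 * count / titles) for size, count in sorted(by_size.items())
        },
    }
```

Applied to the e-book run, `duplication_stats(84756, [2] * 3024 + [3] * 5)` gives 6,063 records, or 7.2% when rounded to one decimal place.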
5. De-duplication jobs (cont’d)
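De-duplication results: initial clean-up of e-books (cont’d)
Distribution of titles merged from 2 records (pairs of collections; the diagonal counts duplicates within the same collection)
Collection | Books24x7 | ebrary | netLibrary | Safari | Springer | Wiley | Total
Books24x7 | 7
ebrary | 0 | 14
netLibrary | 4 | 2,842 | 10
Safari | 10 | 30 | 51 | 2
Springer | 0 | 41 | 0 | 0 | 0
Wiley | 0 | 11 | 0 | 0 | 0 | 0
Total | 21 | 2,938 | 61 | 2 | 0 | 0 | 3,022
(Misc) | 2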
5. De-duplication jobs (cont’d)
De-duplication results Initial clean-up of e-books (cont’d)
We found that, among titles duplicated within the same collection, some records direct users to different e-books; this problem is more serious in ebrary.
We fine-tuned the program by adding this condition:
- When two matched records have the same CGI script (i.e. belong to the same collection) but different book IDs, do not merge them; flag them for review instead
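A sketch of the added condition. It assumes that the "CGI script" is the host plus path of the 856 URL and that the book ID lives in the query string. Real platform URL formats differ and would need per-vendor parsing, and the example.com URLs in the tests are hypothetical.

```python
from urllib.parse import urlsplit

def script_and_book_id(url: str) -> tuple[str, str]:
    """Split an 856 URL into (script, book ID).

    Assumption: script = host + path; book ID = the query string.
    """
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path, parts.query

def merge_decision(url_a: str, url_b: str) -> str:
    """Merge matched records unless they share a script but point to different books."""
    script_a, id_a = script_and_book_id(url_a)
    script_b, id_b = script_and_book_id(url_b)
    if script_a == script_b and id_a != id_b:
        return "flag for review"
    return "merge"
```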
5. De-duplication jobs (cont’d)
De-duplication results (cont’d): initial clean-up of e-theses
Total UMI e-thesis records | 66,358 | 100.0%
Records duplicated | 502 | 0.76%
Titles merged from 2 records | 251 | 100% of merged titles
Titles merged from 3 records | 0 | 0.0%
Titles merged from >= 4 records | 0 | 0.0%
5. De-duplication jobs (cont’d)
De-duplication results: initial clean-up of e-theses (cont’d)
Distribution of titles merged from 2 records (DDC = Digital Dissertation Consortium)
Collection | UMI (pdf) | DDC | ProQuest | Total
UMI (pdf) | 0
DDC | 226 | 0
ProQuest | 23 | 2 | 0
Total | 249 | 2 | 0 | 251
More than 4,000 DDC and ProQuest records had been de-duplicated manually (using the 001 field) before the initial clean-up.
6. Benefits and limitations
Benefits
- A single record for all versions of the same e-book or e-thesis title maintains integrity in the library catalogue
- Saves much staff time and manual effort
- The method is applicable to other e-resources
- Meets a management need: generating duplication statistics
- Can be applied to match existing e-book collections against e-book titles supplied by potential vendors, supporting e-book collection development
6. Benefits and limitations (cont’d)
Limitations
- Depends on the data in vendor-supplied records
- Incorrect match and merge when data are incorrect or incomplete
- Chinese e-book records:
  - Brief bibliographic data
  - Lack of standardization in transcription
  - Difficult to construct a reliable match key
  - Sometimes lack ISBNs
Thank You!
Joanna Pong, E-mail: [email protected]
Philip Wong, E-mail: [email protected]