Creating and Managing a Digital Library
Library Learning Trends
ELSEVIER’S LEARNING TRENDS SERIES
Table of Contents
Cover image
Title page
Chapter 1: A Climate of Demand
Synopsis
Abstract
1.1 THE EMERGENCE OF DEMAND-DRIVEN ACQUISITIONS
1.2 LIBRARIES AND PUBLISHERS
1.3 GOING FORWARD
Chapter 6: Managing Digital Collections
Synopsis
Abstract
6.1 INTRODUCTION
6.2 ERESOURCES
6.3 DIGITAL COLLECTIONS
6.4 COLLECTION ASSESSMENT
6.5 THE ROLE OF MANAGERS AND ADMINISTRATORS
6.6 CONCLUSIONS
Chapter 13: Staffing the Libraries of the Future
Synopsis
Abstract
Chapter 12: Altmetrics and Research Support
Synopsis
Abstract
12.1 INTRODUCTION
12.2 TRADITIONAL METRICS
12.3 NEW METRICS: ALTMETRICS
12.4 RESEARCH ON ALTMETRICS
12.5 FUTURE OF ALTMETRICS
12.6 LIST OF TRADITIONAL CITATION-BASED AND ALTMETRICS TOOLS
Chapter 6: EBook discovery metadata
Synopsis
Abstract
6.1 Structure of the discovery metadata chapter and parts
6.2 What is discovery metadata?
Notes
6.3 Why MARC?
6.4 What is the MARC 21 standard?
6.5 Other eBook metadata containers
6.6 Original and copy cataloguing
6.7 Subject headings
6.8 Classification
Notes
For those tempted to begin reading this book at this chapter
6.9 What does bulk processing mean?
6.10 What is a record set?
6.11 Sources of record sets
6.12 Multiple modes for providing record sets
6.13 When record sets aren’t available
6.14 Collaboration between library functions
6.15 KBART for eBooks
6.16 Bulk processing of record sets
6.17 Record loading
Questions to answer and documentation to include
6.18 Updating record set metadata
Notes
CHAPTER 1
A Climate of Demand
Laura Costello
Synopsis
The shift from traditional print collections to the emergence of e-books and demand-driven acquisitions
Abstract
This chapter describes the conditions that produced demand-driven acquisitions (DDA) in libraries, from the collection assessment literature produced in the 1960s and 1970s to the emerging ebook research in the early 2000s. Budget changes for libraries and significant disruptions to the way information is produced, published, and distributed created a climate ready to experiment and accept a radical change in the way libraries build collections for users. This chapter discusses the particularities of the changing relationship between libraries and publishers and goes on to describe several changes that may become part of the future of DDA.
Laura Costello is Head of Research & Emerging Technologies at Stony Brook University where she works to apply new technologies to existing library and education practice.
The fiscal troubles and technological advances of the early 2000s represented a turning point for libraries of all kinds. The 2007–09 financial crisis caused a withdrawal of state funding for public institutions, while private universities and libraries saw a reduction in endowments (Geiger, 2015). Along with these reductions, the prices of scholarly monographs and serials, which had been rising through the 1980s (Carrigan, 1996; Rossmann & Arlitsch, 2015), continued to rise beyond the rate of inflation. Increased scrutiny of the cost and value of higher education along with growing interest in the technology and content of distance learning began to change the way patrons interacted with libraries. Along with these usage changes, libraries faced shrinking collection spaces, a larger demand for new types of collaborative spaces, and a stagnation in building expansions (Mays, 2012).
Though collection assessment has always been an important part of library service, these changes have led many librarians to approach collection building with data-driven attention to budgets and an awareness of their finite storage spaces. Librarian resourcefulness and new options from vendors have led to experimentation with different methods and formats. Strategies that had been around for decades, like collaborative collection building, consortia participation, floating collections, and working with nonlibrary partners have taken on a new urgency. Newer strategies like evidence-based collection development, short-term loans, and pay-per-view emerged and have been quickly and broadly adopted. The changes made to policies and acquisitions strategies have had an impact on library service at all levels, from ensuring that patrons are able to find and use information (Hedlund & Copeland, 2013) to keeping the peace and the lights on as units of the library that once functioned separately suddenly find themselves thrown together in new workflows (De Fino & Lo, 2011).
Libraries have been experiencing flat or reduced budgets since 2008, and much
of the personnel expansion in libraries has been towards electronic resources management and development. With these developments there has been a greater examination of workflows and costs in libraries, including examinations of cataloging and processing workflows. This has led to more outsourcing as well as a greater reliance on electronic resources. The same financial pressures that have made budgets static have also impacted publishers. Sixty percent of publishers indicated that their business models were impacted by the 2008–09 economic downturn (Moeller, 2013).
Like all organizations, libraries have become more data-driven and our
institutions and funders require quantitative proof of usage and financial decision-making. Demand-driven acquisition (DDA) is an appropriate model for this because of the opportunities for data collection, the options for loan and purchase, and the high circulation rate, which helps justify increased costs and more complex workflows. The University of Maryland is an example of an institution that used DDA to overcome budget difficulties, opting for three short-term, 24-h loans before a multiuser license was purchased. They also built in a manual override for the short-term loan process so they could purchase a multiuser license right away for titles that seemed popular (Mays, 2012). Short-term loans, which give users full access to materials for a limited time, have an additional benefit for budget justifications because it is easy to contrast the level of access with the cost of purchasing all the titles outright. The library literature at the end of the first decade of the 21st century is full of examples like the one at the University of Maryland and many of the DDA programs that began during this time are now mature and standing methods of purchase.
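The mechanics of a plan like Maryland's are simple enough to sketch in code. The example below is a hypothetical illustration only, not any vendor's actual API: the three-loan threshold, the 24-h loan, and the manual override flag stand in for settings a library would configure with its ebook platform.

# Minimal sketch of a short-term-loan DDA policy loosely modeled on the
# Maryland example: mediate three short-term loans, then buy a multiuser
# license, with a manual override for titles that look popular. The
# class, thresholds, and costs are hypothetical, not a vendor API.

from dataclasses import dataclass

LOANS_BEFORE_PURCHASE = 3  # hypothetical trigger threshold


@dataclass
class Title:
    isbn: str
    list_price: float
    short_term_loans: int = 0
    owned: bool = False
    flagged_popular: bool = False  # selector's manual override


def record_use(title: Title) -> str:
    """Decide what a patron access event triggers for this title."""
    if title.owned:
        return "access under existing multiuser license"
    if title.flagged_popular or title.short_term_loans >= LOANS_BEFORE_PURCHASE:
        title.owned = True
        return f"purchase multiuser license at ${title.list_price:.2f}"
    title.short_term_loans += 1
    return f"short-term 24-h loan #{title.short_term_loans}"


book = Title(isbn="9780000000000", list_price=120.00)
for _ in range(5):
    print(record_use(book))

Run against five successive uses, the sketch mediates three 24-h loans, purchases on the fourth, and grants license access thereafter.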
Increasing prices for serials have remodeled the way libraries allocate funds; most libraries are now devoting less than a quarter of their total acquisitions expenditures to books and spending most of their budgets on electronic serials (Rossmann & Arlitsch, 2015). Bob Holley theorizes that it is possible that the rise of DDA is enabling a reduced expenditure on monographs by getting faculty members and other active patrons what they need without developing a speculative collection of materials for other patrons to discover (Holley, 2011). In a 2010 issue of Against the Grain, John D. Riley suggested that libraries took so quickly to DDA because they had already adopted a needs-based purchasing model as a result of shrinking acquisitions budgets (Riley, 2010).
DDA supports active patrons at the point of need and there is some evidence
that this strategy also supports subsequent circulations in the research, which will be discussed at length in Part II of this volume. It is still difficult to know whether DDA is a good model for everyone; inexperienced or passive patrons who might not have a good idea of what they need until they see and use library resources may be left out by a strategy that requires proactive use of library materials. Publishers used to a certain percentage of fluff purchasing by libraries may need to adjust their policies. One of the primary fears of DDA is that we may not know if there are implications for these stakeholders until after it is too late to assist them.
On the other side of this, there has also been an explosion in the research
surrounding DDA and other user-driven acquisitions strategies and researchers are attuned to the criticisms as well as the benefits of DDA programs. There is evidence that DDA programs of all kinds are serving patrons well and increasing the usage of collections. In a data-driven world, the benefits of shrinking costs per use and the evidence of what seems to be blossoming usage of electronic resources is more valuable than ever. Careful assessment is necessary to ensure that we are meeting the needs of all patrons, from those motivated enough to go to great lengths to request what they need to those who access library resources for the first time the night before an assignment is due, but the tools and strategies we are using to accomplish this goal are becoming more available and reliable. There is cause to be cautiously optimistic about the state of acquisitions.
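Cost per use, mentioned above, is the simplest of these data-driven measures: total spend on an item or collection divided by its recorded uses. A minimal sketch, with invented figures purely for illustration:

# Cost per use: the workhorse metric behind many DDA budget
# justifications. All figures below are invented for illustration.

def cost_per_use(total_cost: float, uses: int) -> float:
    """Total spend on an item divided by its recorded uses."""
    return total_cost / uses if uses else float("inf")

# A firm-ordered print title with two circulations vs. a DDA ebook
# whose cost is loan fees plus a triggered purchase.
print(f"print title: ${cost_per_use(95.00, 2):.2f} per use")
print(f"DDA ebook:   ${cost_per_use(35.00 + 120.00, 40):.2f} per use")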
1.1 THE EMERGENCE OF DEMAND-DRIVEN ACQUISITIONS
Richard Trueswell suggested in 1969 that 20% of a typical academic library's collection generated about 80% of circulations (Trueswell, 1969). This idea has been tested several times with mixed results. Some librarians find that it is representative of how circulations generally fall in their traditionally acquired collections, though it leaves out some important factors like the age of the items and the time that the items have already spent on the shelf (Burrell, 1985). What we know for certain about Trueswell's ratio is that it does not apply equally to all libraries. Factors like discipline, type of library, and proportion of undergraduate to graduate users change both the breadth and depth of circulations in any given library (Alan, Chrzastowski, German, & Wiley, 2011). OhioLINK's massive consortium study is a great representation of this: they found that circulation rates were affected by the age of the materials, the institution, and the discipline (Force, Gammon, & O'Neill, 2011).
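Trueswell's ratio is easy to test against local data once per-title circulation counts are exported from an integrated library system. The sketch below is a hypothetical illustration; the sample counts are invented.

# A direct test of Trueswell's 80/20 claim: what fraction of titles
# accounts for 80% of circulations? The sample counts are invented.

def trueswell_fraction(circ_counts, share=0.80):
    """Smallest fraction of titles covering `share` of all circulations."""
    counts = sorted(circ_counts, reverse=True)
    if not counts or sum(counts) == 0:
        return 0.0
    target = share * sum(counts)
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        if running >= target:
            return i / len(counts)
    return 1.0

sample = [40, 22, 15, 9, 5, 3, 2, 1, 1, 0, 0, 0]
print(f"{trueswell_fraction(sample):.0%} of titles produce 80% of circulations")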
Ohio State University Libraries traces the "just in case" acquisitions model back to the rise of universities after World War II, when there was an influx of enrollment and money from returning veterans. In order to provide for this boom and the future students that would follow, universities began developing large collections and models for continuing to collect materials that they predicted would circulate. Changes toward the turn of the 21st century caused institutions including Ohio State University to begin examining this policy and experimenting with alternatives (Hodges, Preston, & Hamilton, 2010).
These alternatives have made an impact, but circulation issues are complex and
there are several significant examples of both large percentages of uncirculated materials and concentrated groups of hyper-use materials. Penn State and the University of Illinois at Urbana–Champaign (UIUC) analyzed their print approval plan circulations from 2004–05 and found that 31% of Penn State's materials and 40% of the UIUC materials did not circulate within 1–2 years. Twenty-four percent of Penn State's materials and 9% of UIUC's materials circulated more than five times within the same period (Alan et al., 2011). The University of Liverpool found a similar ratio with ebook package materials: 40% of these had not circulated in 2 years and 3.4% of items circulated more than five times within a year (Bucknell, 2010). Low circulation rates were also revealed in the University of Pittsburgh study (Kent, 1979) and further echoed by a 2010 Cornell University report that suggested 55% of the University's monographs acquired since 1990 had never circulated (Goedeken & Lawson, 2015).
Amy Fry's article in Library Philosophy and Practice thoroughly examines the
confirmed data we have on print circulations and suggests that these dismal percentages may not tell the whole story of traditionally acquired print circulations in academic libraries (Fry, 2015). The ebook frenzy and rise of DDA programs in 2010–11 did seem to have an impact on the way print circulations were portrayed in the literature. Though it is tempting to compare the phenomenal circulations and daringly low costs per use of ebook DDA to traditionally acquired physical monographs, they are very different. We know that our common measures of use for both print and ebooks present incomplete pictures of how our users interact with our resources.
What we can tell, though, is that ebooks cast a much wider net of use and are
capturing more kinds of use, both scholarly and glancing. Our traditional circulation measures only suggest that someone once had an intent of seriously using the book and their actual behavior may have been quite different. We cannot see, as we do in ebooks, the users that take our items off the shelf, page through them or read them for a moment and then put them back on the shelf, but we also lose data about the books that were deeply important to the research of a few users. There is simply not a very good measure for the use of physical books (Danielson, 2012) and this factor restricts our ability to compare the two formats in fair ways.
We also know that institutional and individual differences between libraries,
which may not be obvious when looking at only circulation data, impact the depth and breadth of circulations even in institutions with similar rates and methods of purchase. The Penn State and UIUC study we examined earlier in the chapter is a good example of this phenomenon. Penn State's bigger population and increased course reserve circulations are probably responsible for the wider circulation of materials they observed. Creating collections and setting up DDA trials that are appropriate for the size and population of particular libraries is very important. This concept of right sizing may have had an impact on the percentages of uncirculated and hypercirculated materials that each university observed. Penn State and UIUC had similar-sized material pools at 13,658 and 11,037 respectively, but Penn State's user population was around 98,000 at the time while UIUC's was around 45,000 (Alan et al., 2011). The concepts of right sizing, uncirculated material counts, and hypercirculations are important measures that will come up again in the research analysis. This volume will use the term hypercirculation to refer to the group of materials with the most circulations in a particular collection, not to refer to a specific number. The rates of circulation differ greatly between ebooks and physical books and between materials acquired in different ways, but there is a lot to learn from examining the group of materials that performs particularly well in usage for each of these collections.
There is a lot of conflicting information about ebook access, but what seems
clear is that ebooks are becoming more common in libraries (Sharp & Thompson, 2010) and users are becoming more open to reading ebooks under some circumstances, even if they generally prefer print (Mizrachi, 2015; Walton, 2014). Ebooks are also good tools for assessment; they capture a lot of the data that physical books are missing, including the many glancing and few deep uses. This wide net of usage is undoubtedly part of the reason that ebook circulations look so good to librarians. As anyone who has worked in a library can attest, the uses of spaces, services, and collections encompass ideal use, but are also home to many other sorts of activities and uses. Though data for this are impossible to collect, it is likely that physical book circulations exclude a lot of these alternative uses and digital book circulations probably erroneously include them (Rose-Wiles, 2013).
Even if the percentages of physical items that spend their lives on the shelf are not as
high as initially quoted, most libraries still want to avoid spending money on materials that patrons do not use. The University of Nebraska-Lincoln estimated that they spent an average of $325,137 per year between 2003 and 2008 on books that did not circulate, and that is with a relatively respectable percentage of circulating materials at about 54% (Tyler, Melvin, Yang, Epp, & Kreps, 2011). The University of Alaska Fairbanks looked at one example of a $55,000 approval plan investment over 1 year in engineering. Within 5 years only 10% of those books had circulated (Jensen, 2012). Cornell University Libraries also found that 55% of their traditionally selected items published since 1990 had not circulated (Walker et al., 2010). This climate of physical book assessment evolved alongside the practices for DDA and the two literatures directly influence one another.
One of the common goals of print and ebook collection building is to create
collections that are broadly used and another is to create collections that are deeply used. Ideally, librarians would be able to create collections that are used both broadly and deeply, but different formats and collection development activities have different strengths. A diverse collection development strategy with clear goal setting, robust assessment, and quick iterations of experimentation is the best way to build collections with a wide percentage of use and the capacity to support deep, sustained research. DDA can be a great part of this strategy, and assessment, both of DDA programs and of other collections in the library, can help collection development evolve and change towards broader and deeper usage.
Removing barriers to change is the first step in this process. Thomas Peters
observed in 2000 that "computers have changed everything—except perhaps the working assumptions and beliefs of the majority of collection development librarians" (Peters, 2000). Even though many years have passed since Peters wrote this, our strategies and goals have not developed as quickly as our technologies and tools. Individual librarian attitudes are seldom the cause of slow rates of change in libraries, but institutional ideas at the library and global level sometimes make for hard-won progress in changing styles and methods of collection building. Library acquisitions operate in a digital-enabled world that comes from a strong print tradition, and shaking off the workflows and assumptions of print is a process that is still in progress. Fortunately, along with the technologies that offer greater collection development control to our patrons have come new tools and strategies to evaluate our holdings.
One of the most common criticisms of DDA as a strategy is that patrons fail to
distinguish their immediate research needs from their learning requirements (Sens & Fonseca, 2013). DDA shifts the burden of cooking up a collection from a small group of librarians to a large group of patrons, but librarians still have to stock the kitchen with the ingredients that will make for the best outcome. The work of selection librarians shifts in DDA to creating the profile from which patrons will choose. They have the freedom to include resources that are outside the traditional scope, but all should be of sufficient quality that they will have long and fruitful lives on the shelf. Another common issue for DDA programs is
that they do not serve all stakeholders equally. Patrons that use the libraries benefit most from the inclusion of DDA, and patrons that rely on using book collections on the shelf may be left out of some DDA decisions (Walters, 2012), though they may be the beneficiaries of users with more foresight and similar research interests.
There are also some barriers that are unique to ebook programs. The success of
ebooks depends on both the user's equipment for access and the availability of digital books to the library market (Benhamou, 2015). Ebook availability has improved, but it still is not universal. The University of Mississippi conducted a collection assessment by looking at books that had circulated for the first time in 2012 to see how many of these could have been purchased on demand. They found that 8020 titles were used for the first time in 2012 and, of these, 6130 titles (76%) were available for purchase as ebooks. Some of the titles were also in the public domain, so only 21% of titles were unavailable for purchase or access in any electronic format. The University participates in a consortium, and only 1% of the titles could not be purchased, accessed freely, or borrowed from another member institution. Sixty-four percent of the titles had been published after 1990 (Herrera, 2015). This is a promising strategy, especially as publisher backfiles move towards greater electronic access. It also represents an exciting possibility for increased on-demand purchasing and reduced "just in case" purchasing. If libraries can get nearly anything on demand, the task of stocking libraries with materials that patrons might need becomes less essential.
Ebooks are considered to be leased rather than purchased outright because the vendor almost always owns most uses of the file and the proprietary platform. Because ebooks are not owned, they cannot be sold or transferred (Walters, 2014). Vendor license terms often specify that these materials are restricted from being shared via interlibrary loan (Radnor & Shrauger, 2012). Even perpetual access titles might be considered leased, because the future of any given platform or vendor is uncertain. The move from physical to ephemeral seems frightening and radical, but it does not necessarily represent a fundamental shift from the way librarians have always managed collections. We have always had to make preservation plans; it is not a flood or fire that is going to destroy ebook collections, but there are other digital natural disasters that might impact our collections, like obsolescence or companies collapsing, and it makes sense for libraries to make a preservation plan in the event that these things happen. For digital items this planning happens not every few years or so, but at the time of licensing and license renewal. This process should begin to involve more people at the institutional level; it is no longer simply an issue for acquisitions librarians, but involves many other workflows, including preservation, interlibrary loan, and collection management.
Ebooks also can be challenging to promote. Because these items are not physically visible, libraries must work hard to ensure that these resources are easily accessible via the catalog and available for both browsing and actively searching patrons. These materials should be clearly and proudly differentiated as ebooks, and libraries should select discovery platforms that clearly represent ebooks and ensure that access is seamless. Kent State evaluated how users come to access ebooks and found that most users were accessing specific titles through the library's bibliographic catalog rather than browsing for titles on the Ebrary interface. An overwhelming number of these searches came from general keywords, with a not insignificant portion coming from title and author searches. The researchers called these accesses "full-orthodox" because they showed some will on the part of the patron in seeking out a specific title or topic. They also found that when users clicked the first result, they were significantly more likely to trigger the book. The researchers took this as a sign that users had intention when searching to find specific titles; many of the keyword searches actually contained significant numbers of words from the specific triggered title. Their research revealed that over 70% of trigger purchases were associated with searches and click-throughs from the bibliographic record (Urbano, Zhang, Downey, & Klingler, 2015). This means that the catalog is still a very important discovery interface for users and that ensuring users are able to find materials in the online catalog is as important as making sure they can find their way around our physical stacks.
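An analysis in the spirit of the Kent State study can be approximated with basic log processing: classify each trigger by how much the search terms overlap the triggered title. The sketch below assumes a simplified, hypothetical log format; real discovery systems export different fields.

# Rough sketch of the kind of search-log analysis in the Kent State
# study: what share of triggers came from searches whose terms overlap
# the triggered title? The log records and field names are hypothetical,
# not any discovery system's real export format.

import re

def words(text: str) -> set:
    """Lowercased alphanumeric tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def term_overlap(query: str, title: str) -> float:
    """Fraction of query words that appear in the clicked title."""
    q, t = words(query), words(title)
    return len(q & t) / len(q) if q else 0.0

log = [
    {"query": "qualitative research methods",
     "title": "Qualitative Research Methods in Education", "triggered": True},
    {"query": "beowulf", "title": "Beowulf: A New Translation", "triggered": True},
    {"query": "cell biology", "title": "Advanced Organic Chemistry", "triggered": False},
]

triggers = [r for r in log if r["triggered"]]
intentional = [r for r in triggers if term_overlap(r["query"], r["title"]) >= 0.5]
print(f"{len(intentional)} of {len(triggers)} triggers look like intentional title/topic searches")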
Ebook visibility is also important for library branding. Many libraries are reducing their print collections to make more space for patrons, but continuing to brand the library as a space to get information and access resources is important, even if those resources are invisible to users walking into our buildings. Since the digital transition, libraries have endured a period of identity transition and our work experimenting with new formats and methods has ensured that libraries have remained relevant to our users. Promoting this spirit of inquiry to our users is essential to the library mission as our spaces transform from book repositories to active and evolving information spaces.
Though there are many roads to increasing patron input in collection
development, the rapid rise of DDA and the vast majority of the research publications on this subject are strongly tied to the demand for and availability of ebooks. Many universities are developing programs and courses to meet a greater demand for online and distance education. For academic libraries, this means that our student body and patron base is increasingly far-flung. Academic publishers have jumped on the digital bandwagon and every year more titles are available in this format. For public libraries, this road has been rockier, but the demand for digital content spiked in the early 2000s and continues to rise, albeit at a more moderate pace. DDA does not always involve ebooks, but the strategy that took the acquisitions world by storm in 2010–11 was catalog-integrated ebook DDA. Physical books can also be acquired via DDA, and many examples of this will be examined, but the fate of the strategy is inexorably tied to the digital format.
Libraries, especially public libraries, have long used informal suggestions and more formal suggestion request forms to connect with patrons and develop collections, but interlibrary loan-to-purchase programs and a greater emphasis on the benefits of patron-selected materials in the early 2000s led to a grand adoption of catalog-integrated electronic DDA when the option became available through common vendors (Fulton, 2014).
The rise of DDA is strongly tied to the explosion of new ebook platforms,
e-reader technologies, and ebook research in 2010–11. In 2010 and 2011 DDA sales were growing in popularity and sales of ebooks in the personal market were also at record highs. This time period marked the beginning of increased acceptance of the electronic format. An American Library Association report specified that ebook sales rose 210% between 2010 and 2011, but print volumes still accounted for 79% of trade sales (Besen & Kirby, 2012). By the end of 2011, Amazon announced that their electronic sales had surpassed their print sales (Miller & Bosman, 2011). Though ebooks were still a small percentage of book sales from all retailers, this was a proof of concept that ebook sellers offering competitive pricing and extraordinary convenience could change the reading habits of their customers from print to blended or even fully digital.
This new acceptance of ebooks also had an impact on library purchasing. The
J.N. Desmarais Library of Laurentian University explored their ebook usage during and after this period. Making ebooks discoverable is of utmost importance. Laurentian does this by putting links to all of its ebooks immediately up on the library's website. They found that during the first 7 years of ebook acquisitions (2002–09) they experienced a steadily increasing interest in searching and viewing ebooks. In 2010 they saw an exponential increase in both interest and acquisitions. They added about 30,000 ebooks to the collection and there was a huge increase in searches and viewings for the ebook content (Lamothe, 2013).
In the intervening years, digital sales have plateaued in popular reading (Alter,
2015) and libraries, especially public libraries, have faced an uncertain and frequently changing ebook market (Benhamou, 2015). Ebook purchasing has fluctuated in public libraries, but the bump in demand through the early 2000s established vendors and protocols that have made digital purchasing at least as simple as print, both in personal purchasing and in most academic libraries. There are no certainties about the future of ebooks but many possibilities. As with any format, ebooks have drawbacks like restrictive licenses, promotion challenges, and sometimes even resistance within library communities. Ebooks also tend to be more expensive than print titles, though there are many more options for purchasing these materials and a lot more flexibility for models in the future. Movements like open access might shift the payment paradigms that libraries are used to, but the purchasing strategy for print books is unlikely to change very much. Access to ebooks is very diverse. The most recent Pew study indicates that 68% of adults now own a smartphone and 86% of adults ages 18–29 use smartphones. Computer ownership among adults in this age group has also dropped, from 89% in 2012 to 78% in 2015 (Anderson, 2015).
The method of access, though it does not change the format of the item, should
shape the way libraries think about providing devices to support their demand-driven acquisitions programs. Acquiring more desktop computers, laptops, and large-format screens might drive ebook use more than the purchase of dedicated e-readers, and support for users of smartphones and tablets should be among the main objectives of library service. The personal and library institutional ebook markets have thus far remained very separate, with Amazon and Barnes & Noble accounting for more than 85% of personal ebook sales and other companies like Ebrary, Proquest, and EBSCO accounting for most academic library sales (Survey of Ebook Penetration and Use in US Academic Libraries, New York: Library Journal, 2010). The major differentiators between these two groups are in price and licensure. It may be that these two markets become more differentiated, but if a large-scale player in the personal ebook space like Amazon begins to offer institutional licensing, the landscape of library ebook purchasing could change greatly.
1.2 LIBRARIES AND PUBLISHERS
In addition to changing the way libraries serve patrons, DDA also changes the relationship between libraries and publishers. DDA and the rise of ebooks have raised questions about how use-based spending will affect publishers as libraries move from purchasing entire catalogs to only purchasing materials patrons use (Fischer, Wright, Clatanoff, Barton, & Shreeves, 2012). The reshuffling of acquisitions programs in preparation for DDA initiatives has caused publishers and librarians to examine their complex relationship and evaluate the impact of the greater patron focus in collection building and the increasing prevalence of digital materials. Librarians cannot predict what will happen to the ebook industry as it matures. This is a caution borne from multiple iterations of new library technologies that were sometimes sustainable and sometimes became obsolete almost as soon as we had invested money in them (Sens & Fonseca, 2013). Many libraries are burdened with aging formats and hardware that are increasingly expensive and difficult to maintain even as their usage declines. The next generation of librarians may very well need to sift through our elderly ebooks, hosted on rapidly aging software and making clutter and trouble in our streamlined, zippy future online catalogs. It is also possible that the dream of perpetual access will come true for libraries. Like any industry riding the squall of technology, we need to grasp the most likely lifeboat and hope that it does not have too many holes. We continue to do our best to provide for the needs our users have now while trying to keep an eye on what is coming up. Neither vendors nor libraries can predict the future of technology, and we are all going to have to work together to ensure that we are doing the best possible assessment and planning for it.
DDA changes the balance between libraries and publishers in significant ways.
Instead of purchasing a reliable quantity of newly published titles, DDA moves purchasing to the point of need, scattered across the semester and sometimes dipping back into older publications. The publishing model for many publishers operates on the assumption that many libraries will purchase broadly and in the process acquire many materials their communities never use; this is particularly true for scholarly and academic presses. DDA is disruptive for publishers, but the goals of libraries and publishers are still aligned. Robust DDA programs are good for users, libraries, and publishers. Increasing the discoverability of ebook records and making ebooks more usable and flexible is a process that librarians and publishers will take on in collaboration as user data continue to shape product and collection development (Seger & Allen, 2011).
This greater alignment is a shift for libraries as well. Jean-Mark Sens and
Anthony J. Fonseca warn that academic librarians ought not to embrace DDA programs too quickly, because the increasing similarity of library catalogs to bookstores puts the brand at risk. Sens and Fonseca also warn against the increasing weight of publishers in the representation of library records (2013). This worry is compounded by the presence of library publishers in discovery platforms and their power to optimize results towards their own materials. The oligarchical distribution of library technology and publishing means that for many libraries, the same company or small group of companies is producing both content and discovery tools, and in this relationship there is the potential for abuse. Patron choice is important, particularly for DDA programs and data-driven collection development, so librarians should be vigilant over search result weighting in discovery systems so that results are not skewed towards a friendly publisher over the right content for the right patron. Barbara Quint warns in her Information Today article, "Do we let publishers and vendors design our collections and just tell them what we're going to get?" (Quint, 2014). With increasing alliances between publishers and the vendors of discovery systems, this is a very real threat to our searches. Librarians should monitor this carefully and examine alternative discovery strategies if the major discovery systems fail to give librarians full control to tweak their search algorithms and change the weights of different providers.
Some publisher costs, like distribution and printing, will decrease with the rise
of ebooks, but some costs, like management and making materials visible, will increase. The turn to ebooks takes the power out of the hands of publishers to be the sole arbiters of what information gets turned into a book. Because the costs of producing a book are lowered, the market should begin to flatten out with a reduction in the barrier to entry. This might be similar to what we have seen as the music industry has gone digital.
Users are still going to seek out the book equivalent of Beyoncé on established platforms, but reputable, reliable sources for discovering new content, like Bandcamp (https://bandcamp.com/), will begin to rise up and gain legitimacy as sources of information. This process will be disruptive not only for publishers' costs but for the evaluation of library materials. Publisher quality used to inform collection development decisions, but in the future of publishing, quality materials might be produced by anyone. Libraries must seek to understand this system, or find ways to help patrons access materials that might not be a traditional book or published through a publisher, but might be good for collections anyway (Benhamou, 2015).
DDA is a good model for this, but libraries have to be proactive about ensuring that these materials can be accessed by patrons. The work of collection development is becoming a more serendipitous process. While libraries may have favored publishers, formats, and methods for obtaining materials, it seems like patrons are finding their resources more often on the web and looking for the materials they have discovered in libraries after the fact.
Libraries are also pushing publishers harder to provide fair terms for
ebooks. Several librarians at UNC Charlotte have been working on an initiative to push academic ebook vendors towards policies and standards that are good for the long-term collection health of libraries. UNC Charlotte began this initiative in 2014 with a Charleston Conference presentation that explored the sustainability of acquiring only ebooks that ensured perpetual access, allowed for an unlimited number of simultaneous users, and were free from digital rights management (DRM). They recruited a working group of professionals from libraries, consortia, and publishing and have secured a Mellon grant to explore these issues further (Hamaker, 2016). Other platforms like Portico and LOCKSS/CLOCKSS focus on the shared interests of librarians and publishers to minimize future information loss.
The Claremont Colleges have been investigating purchasing DRM-free books directly from publishers for years. It is likely that these types of books (which can be fully "owned" by the library, that is: distributed, downloaded, printed, controlled, etc.) won't ever be integrated into something so sleek as catalog-integrated ebook DDA, but if libraries are investing heavily in ebooks without a guarantee that they will stick around forever, it's worth investigating (Price, 2011).
Another argument from Sens and Fonseca questions whether librarian and
publisher needs are really as well-aligned as we tend to think. It is true that publishers had the need and desire to push a profitable model for ebook sales even before libraries sought them out, but there are very practical reasons that libraries approach ebooks. Ebooks are useful for sharing our progress with stakeholders: because we are able to back up their existence with extensive usage data, it is quick and easy for us to produce impressive statistics that may help keep our budgets stocked. They are also good for the kind of public relations that universities are doing right now; offering new types of online learning initiatives requires a well-stocked electronic library that is accessible both on and off campus, and this is an ecosystem that still requires both journals and monographs. For public libraries, ebooks help us deliver the materials patrons want in new ways (Gray & Copeland, 2012).
Joseph Esposito has written prolifically about the impact of these new
acquisitions policies on the business practices of publishers. He continues to write for the Scholarly Kitchen (http://scholarlykitchen.sspnet.org/). He notes that cost reduction policies in libraries necessitate changes in publisher workflows. DDA or not, our budgets are shrinking. Esposito also suggests that the different purchasing model, with revenue coming not at the point of publication but at the point of trigger (which might be much later), could impact publishers' ability to plan. He counsels that publishers will deal with DDA in different ways: commercial publishers might experiment with including or excluding titles from DDA programs to observe the effects on sales, while university presses should focus on long-term relevance, since many of their titles are of the "long tail" of academia, which may be triggered even years after publication.
There is a potential scenario in which university presses lose so much money that they begin to restrict output, though this is just an extension of a process that has been happening for years already. One of the complicating issues with this is that at the same time DDA is becoming a serious strategy, course adoption purchasing is anecdotally declining as libraries begin to offer more reserve and course books for checkout. A lot of this is pure speculation, though, since there are not good published numbers on how many books from these presses are sold to libraries and whether that percentage is declining (Esposito, Walker, & Ehling, 2013). Esposito also suggests in a later article that as libraries become less dependent on stocking libraries with all the relevant titles and more dependent on data to make collection development decisions, publishers and vendors may start to commercialize these data and sell them to libraries (Esposito, 2015).
Sandy Thatcher and Rick Anderson also debated the question of whether DDA will crush the scholarly publishing market in Against the Grain. Thatcher defends the right and role of university presses to publish niche monographs that are commercial failures, while Anderson suggests that perhaps the model of obscure scholarship for the sake of monographs should be put to bed. The suggestion of the debate was that universities as a whole have an obligation to support scholarship for the benefit of institutions and scholars across the system, but that this is not necessarily the responsibility of library collection developers to solve (Arch, Anderson, & Thatcher, 2011). Libraries alternatively can invest in other tools and platforms that enable their academics to share scholarly research outside the monograph form. Development and maintenance of institutional repositories and more informal scholarly sharing and collaboration platforms might help to do this. There have also been some efforts to collaboratively support the scholarly publishing industry through consortia. Four State University of New York research centers made an agreement to collaboratively purchase the entire yearly catalog of eight university presses and share the usage between their institutions (Booth & O'Brien, 2011). When this kind of collaboration serves both library patrons and the output of our faculty and scholars in university presses, it could serve as one potential solution to this problem.
The relationship between libraries and publishers is definitely changing, and
DDA is only one part of this. It is good practice to keep the fraught parts of this relationship in mind when purchasing and setting up things like discovery platforms, and the harmonious parts of this relationship in mind when making collection development decisions. The level of guilt or fear about the future of publishing when starting a DDA program is unique to every institution. Questions about the future of ebooks and publishing are difficult to answer because so much is uncertain, but the good news is that many librarians and publishers are actively involved in hammering out the unknowns and, in that endeavor at least, the interests of publishers and librarians are perfectly aligned.
1.3 GOING FORWARD
We have considered the climate that led to the advancement of DDA as an acquisitions strategy, examined the library issues surrounding its advancement and success, and have discussed the potential issues that may arise for publishers and libraries as we embrace ebooks and DDA programs. The short history of DDA has already been a wild ride, but what can we expect from its future? Much of this is pure speculation, but we may see things like developing license types for ebook materials, increased scrutiny on patron security when using library technologies, a diversification in patron access strategies, and the rising importance of discovery platforms for connecting users with information.
We might also see a rise in DDA acceptance. A recent survey of small academic
library directors in Indiana found that 82% of these institutions did not have ebook DDA programs yet, but 82% of those directors felt confident that their patrons would choose appropriate selections that would circulate in the library if given the chance to purchase using DDA. Several of the directors indicated that, even though they believed in DDA and wanted to implement a program, staff and time constraints prevented them from starting one (Freeman, Nixon, & Ward, 2015). Librarians that fall into this group might appreciate the next chapter, which will outline a wide variety of DDA plan options that are appropriate for different budget and staffing configurations.
Along with these options for creating DDA programs, librarians might see an
increase in the ways patrons are accessing library materials. A study out of Boise State University investigated whether students could access and use library resources successfully via several mobile devices including tablets, smartphones, and e-readers. Student use was studied with pre- and post-surveys along with a focus group. Participants' use of library electronic resources, including ebooks, rose with participation in the program, so it's likely that many of the users were not accessing ebooks in the catalog because they did not know about them. Students suggested that they had problems with ebook usability because they could not annotate the text and they did not like to read electronically for extended periods. Seventy-eight percent of students believed that electronic resource access would improve education in the future (Glackin, Rodenhiser, & Herzog, 2014). Though there are still issues with ebook access, it is heartening that students appreciate accessing electronic materials and feel that they will improve education.
We also owe it to patrons to help improve their experiences using electronic
materials in the library. One of the things that libraries can do towards this goal is work to strengthen privacy controls and ensure that our vendors are doing the same. Andromeda Yelton spoke brilliantly on this subject in her keynote to the 2016 LibTech Conference in St. Paul, MN. Her notes, along with a list of questions and answers that even nontechnical librarians can use to talk to their vendors about patron privacy, are available on her website (https://andromedayelton.com/talks/ltc2016/). Another thing that librarians can do to improve the electronic experience for patrons is help create and organize discovery and delivery platforms so that all library users, both new and experienced, can access electronic materials in a straightforward way. The web-scale discovery layers on the library market today are very powerful and put many kinds of resources within reach through a single search bar, but they mostly fail to provide context for this information, and their recommendation algorithms can sometimes be skewed to favor one vendor over another. These discovery platforms improve the initial user transaction in our systems (Lundrigan, Manuel, & Yan, 2015), but the mess of different types of resources they sometimes return could be overwhelming for users. The way discovery platforms impact usage statistics is also still an emerging area of scholarship.
Discovery layer research definitely shows that this technology will have an impact on the way our patrons use materials. When the University of Liverpool adopted the EBSCO Discovery Service, they saw a decline in their usage of SpringerLink journal titles. They theorized that this might have been due to users discovering ebooks that had previously been hidden to them when they were doing journal article searches, though the use of both journal articles and ebooks increased over the next several months (Bucknell, 2012). Penn State conducted a study with Proquest's Summon that showed that use of the system decreased the number of erroneous interlibrary loan requests for materials the library already owned (Musser & Coopey, 2015). As the research in this area grows, it is certain that we will see more examples of the ways discovery layers are impacting our patrons' use of the system and how that could affect our purchasing and assessment.
Discovery layers and other search technologies also offer libraries the opportunity to promote DDA collections. Most institutions do not promote this strategy; a survey found that 74% of respondents did not promote DDA programs offered by their libraries (Carrico, Leonard, & Gallagher, 2016). One of the big barriers to this is fear about the library budget or that some patrons would use this knowledge to purposefully trigger materials the library may not need. Rutgers University used a great solution to balance promotion with caution. They did not advertise their DDA program, but informed new students in particular disciplines that they intended to strengthen the ebook collection (De Fino & Lo, 2011). This strategy might build interest in ebook collections and DDA without releasing specifics.
DDA has the potential to evolve in many directions beyond what we have explored here, but these ideas represent some near-term considerations for librarians focused on ebooks and DDA programs to keep in mind. Promoting these collections and conducting user surveys can help further predictions in specific institutions.
CHAPTER 6
Managing Digital Collections
Susan Higgins
Synopsis
Once you have made the decision to transition to a more digital-based collection, how do you manage it?
Abstract
This chapter discusses the ways managers of collections in academic libraries engage their spaces and services. There is always a great deal of work to be done to handle resources in academic libraries, and libraries have adapted their routines to accommodate e-resources. Even experienced libraries often decide to create and maintain resources that require server space. Different methods of collection development are presented and the role of managers and administrators is discussed.
Susan E. Higgins is currently an Adjunct Instructor for San Jose State University School of Information as well as the University of Arizona School of Information.
6.1 INTRODUCTION
In the current academic library environment, managing information is managing digital information. To a great extent, the terms "collection development" and "collection management" refer to building digital collections. They refer to managing the transition from print, to print plus online, to an environment where digital collections are the default, even though print and other physical media will have a place for some time to come. Managing this transition and managing digital collections have many aspects: funding, space, equipment, staffing, training, and technology. It means making choices, staying current, communicating with users, and collaborating with other libraries, other institutions, and others on campus.
Academic libraries are currently engaged in "spaces and services" discussions with others on campus, with the goal of creating student-centered spaces, academic success facilities (including campus units such as writing centers, as well as library reference and instruction activities), and spaces for faculty and student research. This means rethinking the role of print and other physical collections, and the collection assessment activity goes hand in hand with the spaces and services discussion.
Library collections affect every function and department: acquisition,
bibliographic control, access, and use. Libraries are more than their collections, and collections are more than the things owned by or housed in the library. Nevertheless, the resources chosen by a library for its users are one of the central defining services in any library. Electronic or digital resources take a number of forms. These include commercially produced databases, ejournals, ebooks, and other similar resources, including streaming audio and video and digital image collections. Other commercial eresources include collections of digitized primary source material. There are also increasing numbers of open-access journals, books, and archives. Another important kind of digital resource is locally produced research material and digital archives. Research material includes theses, dissertations, and faculty publications that may be housed in an institutional repository. It also includes research data that are preserved in a data repository. Other locally created resources include digitized special collections material, for example, letters, diaries, images, and other materials that pertain to a person, place, or thing.
6.2 ERESOURCES
Eresources have become common in library collections in the last 15 years. In the past 10 years, ejournals have become the default format for journals, and more and more titles are available electronically. Ebooks are increasingly common but not yet the default. Libraries have adapted their routines to accommodate these eresources. Library software has been adapted and created to handle these materials. Library catalogs and discovery tools have been optimized for access to eresources of all kinds. These resources allow access from anywhere at any time and have motivated libraries to create virtual reference services, since users do not need to go to the library to use them. Eresources are expensive, but libraries have been able to take advantage of consortia discounts and to negotiate with vendors in other ways to achieve advantageous pricing. Faculty and students in all disciplines, but especially in the sciences, have demanded access to ejournals and databases.
In the current environment, academic libraries are facing a space crunch and are
finding other uses for the space occupied by the print collection. Access to ejournals is not always stable enough to allow the print volumes to be withdrawn, but where there are stable archives, such as JSTOR, academic libraries are beginning to withdraw the print versions of titles. In addition, shared print repositories are being developed, so that libraries will be more willing to withdraw print.
6.3 DIGITAL COLLECTIONS
Academic libraries may potentially have a number of different kinds of digital resources. The first is the product of faculty and student research that may reside in a digital repository. These repositories are a product of the last 10 years and had their origin in the idea of electronic theses and dissertations. There are a number of available repositories, including Digital Commons, DSpace, and others. They provide server space and metadata for theses, dissertations, and for faculty's published works, generally a preprint version of journal articles and other materials. This allows open access to a large amount of research material that can be found with a Web search.
Another kind of digital resource is the "digital library," which may include a
variety of things, including the institutional repository. Digital libraries bring together many different types of resources, including digitized special collections material. While the rise of ejournals is somewhat associated with the sciences, a large area of digital library resources is digital humanities, which includes archives of primary source material in literature, history, and other areas. Digital humanities projects allow scholars to use primary sources without having to travel, making these resources available to users who previously would never have dreamed of using them. Digital humanities resources can be used for textual analysis, linguistic research, and for many other kinds of research in a variety of humanities disciplines. Libraries that create digital humanities projects may be collaborating with scholars on campus and at other institutions. Creating and maintaining these resources requires server space, IT support, and people with knowledge of a number of metadata schemes and encoding formats, including TEI and XML. There are large grants available for these projects, which require skill in applying for grants and carrying out the work that is funded.
6.4 COLLECTION ASSESSMENT
Libraries are completing the transition to an environment in which all or virtually all resources are online. This requires a process of collection assessment, which can refer to a number of activities, including weeding, determining what ejournal holdings are in stable archives, working with other institutions on shared storage, working with vendors on purchasing backfiles to allow withdrawal of print, and so on. Collection assessment requires careful coordination and communication. It requires a process for requesting and setting up projects. The process must include a way of prioritizing and a way to make sure that overlapping projects are not causing redundant effort. The transition from print to online requires space. The intersection of the "spaces and services" discussion and the collection assessment that goes along with it requires staging areas and swing space, since collections that are moving or being withdrawn must be housed somewhere during the process. Collection assessment also requires time and space for thoughtful decision-making.
6.5 THE ROLE OF MANAGERS AND ADMINISTRATORS
Managing digital collections implies many things, including funding, training, and workflow. It also implies that the library administration has made an organizational commitment to the creation and acquisition of digital resources. Administrators and managers must help the organization create a vision of the 21st-century library, and the role of collections and services in that library. Part of that vision is aligning the library's goals with those of the larger institution. Everyone in the library has a role in the management of digital resources, and administrators must get buy-in from the whole organization to move forward.
Funding issues include IT infrastructure, including hardware, software, and
their maintenance. There are ongoing training needs for staff in every department. As there is turnover, or even new positions, it is essential to rethink every open position and hire strategically for the changing environment. In addition to rethinking positions, libraries must continuously rethink the organization itself. How are functions and departments aligned to deliver digital resources and services? Administrators must find appropriate collaborators for enriching digital collections. These include consortia, which might provide advantageous pricing for purchasing eresources, as well as cooperative digitization projects.
development component. Managers must consider the allocation of funds forvarious formats of information. The transition from print to online resourcesbrings into question the traditional split between funding for monographs andserials. It may no longer make sense to maintain that strict division, to maintaintraditional funding formulas (eg, 70% for serials, 30% for monographs), and tomaintain a single approach for all disciplines. The needs of the sciences forjournals and the humanities for books may call for a more customized approachto funding, for example.There is a great deal of literature on all aspects of managing digital collections.
An important aspect of this topic is the needs and attitudes of users. Connaway and Dickey (2010) report that students in academic libraries seek full-text digital content from the academic library that serves the institution. They still value traditional library services and human sources of information but find digital collections more adaptable to work and study. Respondents to the survey also overwhelmingly reported using online resources such as the library’s research databases and online journals, followed by the library online catalog. Mortimore (2006) advocates just-in-time acquisition accomplished through analysis of interlibrary loan requests. Evidence-based and demand-driven acquisitions are an important part of the digital environment, and this approach to collection development employs evidence to guide collection decision-making. Gerke and Maness (2010) discuss a survey of library users regarding digital collections. Use did not vary by discipline but was correlated with frequency of use of the library website. Hutton (2008) explores library service to distance students, recommending that libraries “pursue metadata standards to support cross-searching, collaborative projects, and development of eresource search software, which integrates with the library catalog.” Huwe (2010) discusses the design of library spaces in the digital age. The library as place has taken on a new meaning in the digital era, and academic libraries are creating new spaces for students.
A crucial part of the digital landscape is open access material. Fernandez and
Nariani (2011) state that “The open access publishing landscape is now international in scope and encompasses many approaches. Funding of OA initiatives is becoming increasingly important to libraries and has relevance for changing librarian roles” (p. 3). Open access is important as a collection stream but also as an organizational consideration.
The management of digital collections has a strong connection to the way libraries are organized and how work and responsibility are assigned. Jordan (2010) wrote that “OCLC Research provides the OCLC cooperative with an infrastructure and interactive process for helping libraries, museums and archives deal with the rapidly changing digital, global community” (p. 13).
The library has an important role to play in the digital educational
environment. Mathews (2009, p. 19) writes that “Academic libraries must be able to express how the library is unique and how it adds value and contributes to the intellectual life of the university.” That includes effective records management, including the management of digital records, which increases the operating efficiency and effectiveness of the academic library; reduces unnecessary, often hidden costs; ensures compliance with legislative requirements; provides litigation support; and is the basis of institutional memory. The 2012 top ten trends in academic libraries (2012) include communicating value, managing research data, and preserving digital collections. The article also explores data repositories and acquisition of electronic material. Maloney et al. (2010) found that leaders “perceive a significant gap between their current and preferred organizational cultures and that current organizational cultures limit their effectiveness.” That gap may make it difficult to achieve the aims of creating data repositories and creating digital collections. “Adhocracy” is defined by Waterman (1992) as “any form of organization that cuts across normal bureaucratic lines to capture opportunities, solve problems, and get results.” The current environment calls for some adhocracy, which may determine best practices and create new organizational patterns that work better. These new models and practices may apply to particular types of libraries or particular areas of academic libraries. Breakstone (2010) explores the availability of online resources for law libraries. Brenner, Larsen, and Weston (2006) “offer a strategy for adapting a library system to traditional archival practice.” Conway (2008) defines a framework for the management of digital collections, which “offers an original model for evaluating the asset values of digital content produced or acquired in a university context.”
One important aspect of digital collections is digitized material, which is
created in-house or through a cooperative project. These may involve one or more academic libraries, other departments on campus, or a large organization such as Google. Breitbach, Tracey, and Neely (2002) describe a project to digitize slide images. Carlson and Young (2005) describe the Google Books digitization project, which began with Google collaborating with five large research libraries. “Framework for good digital collections: Version 3,” released by NISO and IMLS (2008), reports on the release of NISO’s framework for digital collections, which “establishes principles for creating, managing and preserving digital collections.” Goldman (2011) explores the management of “born-digital” collections, including storage and access solutions. Gueguen and Hanlon (2009) discuss the management of digitization that is done at the point of use or demand. Huwe (2011) discusses the lawsuit filed against HathiTrust and its implications for the creation and use of digital collections. Jeng (2005) looks at the issue of usability of digital libraries, proposing a model for evaluating them and finding “an interlocking relationship among effectiveness, efficiency, and satisfaction.” Johnson and Mandity (2010) describe a collaborative digitization project involving two university libraries. Chen and Reilly (2011) discuss the use of preservation metadata in a digital library. Nelson (2012) investigates the inclusion of “born-digital” materials in library special collections. Nikolaidou, Anagnostopoulos, and Hatzopoulos (2005) discuss a digital library that supports research in a medical school, describing requirements for creating objects and searching. Oehlerts and Liu (2013) present options for digital preservation, including practices, tools, and technologies. Zorich (2007) discusses the need for preservation of digital objects and for cultural heritage organizations to maintain their stewardship role. Foulonneau et al. (2006) describe the CIC metadata portal project, which explored sharing information about digital collections among universities. Hurford and Runyon (2011) describe the Bracker Collection of horticultural material at Ball State University Libraries, which “posed significant challenges to traditional archival collection processing procedures and existing digital collection building workflows.” Seo and Zanish-Belcher (2005) observe that a variety of issues go into preservation decision-making as it relates to special collections, including the paramount role of priority setting as well as effective communication. Hubbard (2001) describes the programs of the Getty Research Institute, including the creation of a single discovery system and a review of digital asset management. Kretzschmar and Potter (2010) discuss issues in digital humanities projects, including data storage, changing media and technology, and the unique challenges presented in maintaining digital archives. Lampert and Vaughan (2009) use a case study and survey to investigate academic library digitization programs, finding that “potential success factors” include “staff skill sets, funding, and strategic planning.” Lopatin (2010) looks at metadata for digital projects, including issues like interoperability, user-created metadata, and staffing. Prilop, Westbrook, and German (2012) describe a multidepartment workflow for digitization, including “the collaborative planning process . . . the rewards and challenges of tackling such a project,” and “lessons learned.” Rafiq and Ameen (2013) explored digitization practices in university libraries of Pakistan and found that one-third of libraries surveyed had digitization programs. Rentfrow (2006) explores the challenges of producing digital thematic research collections, drawing on experience from particular projects. Wolski (2011) investigates archiving of research data and states that although “libraries have a history of designing discovery systems, new research paradigms” present challenges and opportunities. Worthey (2009) explores issues in archiving digital content, including its role in scholarly communication. Watanabe (2007) explores the promotion of eresources to library users. Wu (2011) “presents a vision of a collaborative, digital academic law library” and explores issues such as copyright. Zambare et al. (2009) describe the challenges of migration from a print to an online environment. Zimmerman and Paschal (2009) write of an exploratory study in which the usability of websites was assessed.
Another essential aspect of managing digital collections is the organization and
administration of library functions and departments, including the general collection development process. The literature on this topic explores the print-to-online transition, the management of eresources, and the roles of traditional library functions such as cataloging, acquisitions, reference, and instruction. Breeding (2012) addresses the management of library collections in the current environment, including various models of acquisition of ebooks and journals. Carr and Collins (2008) explore the management of digital resources and the transition from a print to an online environment, including acquisition and licensing. Chadwell (2011) discusses gaming in public and academic libraries, including the field of new media studies. Collins (2009) investigates electronic resource management workflows, including planning, staffing, and communication. Demas and Miller (2012) explore the use of collection management, including “disposition of withdrawn materials, life-cycle management retention, and education and community support.” DeVoe (2006) defines the challenges of the electronic environment, including the rapid growth in availability of eresources. Farmer (2009) discusses digital reference resources, “focusing on subscription databases: assessment, selection, acquisition, Web presentation and maintenance, archiving and preservation, and de-selection.” Flatley and Prock (2009) explore the need for a defined process for selecting and evaluating eresources. Angel (2011) provides a gap analysis of the digital collections department at an academic library. Blummer and Kenton (2012) review the literature on ebooks from 2005 to 2011 to find best practices, which include cataloging, usage statistics, and promotion of these materials. Gregory, Weber, and Dippie (2008) explore the role of technical services librarians in the management of digital resources, including “creative uses of the catalog, participating in the creation and improvement of metadata standards, enhancing the development of, and access to digital collections; knowledge management collaborations with library colleagues, academic departments, and other organizations; database development and instruction; teaching and reference activities; and technology support.” Gómez, López, Prats, and Rovira (2004) describe an academic library’s management of digital resources, including separate systems for commercial ebooks and journals and a repository of material such as dissertations. Horava (2010a) explores collection management in the current environment, including “core values, scholarly communication issues, acquisition activities, access and delivery issues, and innovation.” Horava (2010b) considers collection management from the point of view of environmental sustainability and use of space. Kulp and Rupp-Serrano (2005) explore the acquisition of eresources, including funding, staffing, decision-making, and workflow, finding various answers in a survey of 24 academic libraries. Johns (2003) explores the problem of supporting and managing both print and digital collections. Kichuk (2010) is a case study of the growth of eresources in academic libraries, finding stages of development corresponding to different types of resources: bibliographic, full text, and reference. Lewis (2007) discusses the disruption of traditional academic library service that is the result of the “application of digital technologies to scholarly communications.” The author advises that libraries “complete the migration from print to electronic collections . . . retire legacy print collections . . . redevelop library space . . . reposition library and information tools, resources, and expertise . . . and . . . migrate the focus of collections from purchasing materials to curating content.” Lindsay, Kemper, and Oelschlegel (2012) present the advantages of purchasing electronic backfiles and removing print collections in a medical library. Maxey-Harris (2010) investigates eresources that enhance research into diversity and multiculturalism, finding a rapid increase in subscriptions to these materials by ARL libraries in recent years. Price (2009) discusses electronic collection development for libraries with limited funds, including open access resources, negotiation with vendors, and forming consortia. Safley (2006) describes the role of eresources and consortia purchasing in improving collections and services in the library of a scientific research institute. Schonfeld (2010) discusses the future of print collections in an increasingly digital environment. Shearer, Klatt, and Nagy (2009) investigate methods of choosing and evaluating collections of core ejournals for a medical school library. Smith (2006) discusses development of electronic collections and methods of assessing collections, including digital and electronic collections, to determine whether they match institutional goals. Sinn (2012) surveyed the scholarly literature in history to discover the extent of use of digital resources, resulting in guidelines for the creation of digital material. Skekel (2008) analyzes the role of libraries in producing and providing access to digital collections, finding that libraries are “expanding their traditional roles of collecting, organizing, and providing access to resources. Their new roles include creating content and in some ways, also creating the access.” Southwell and Slater (2012) examine the use of technology for accessing digital materials, surveying websites to determine whether digital materials were accessible using screen reader technology and finding that about 42% were accessible in this way. Sowell, Boock, Landis, and Nutefall (2012) is a case study of managing government documents in the transition from a print to a digital environment, recommending a balance between the needs of the library and its users and the requirements of the Federal Depository Library Program. Staiger (2012) reviews the research on the use of ebooks, discovering that academic library users do not read ebooks in their entirety but refer to particular pieces of information. Stevens (2006) describes the planning process for a completely electronic library, including changes to “collections, budgets, staffing, services, and buildings.” Stewart (2012) explores digital preservation and presents ideas such as collaboration with others on campus in these efforts. Taber and Conger (2010) describe the involvement of a cataloging department in the creation of an institutional repository, which “brought opportunities to redefine its perceived role,” including the “creative repurposing of staff, students, and skills in order to integrate these new formats and processes (both physical and digital) into departmental workflows.” Tennant (2001) discusses XML and its role in creating and managing digital objects. Tharani (2012) explores digitization for off-campus communities, in which “academic libraries can reposition themselves as responsive and relevant in the face of a changing digital services landscape.” Updike and Rosen (2006) describe the creation of a digital image database for teaching, learning, and research. Vasileiou, Hartley, and Rowley (2012) report on research into methods and criteria for selecting ebooks. These criteria include price, platform, and a number of other items.
6.6 CONCLUSIONS
Libraries are always in transition. Academic libraries find themselves in the position of maintaining collections and services that support teaching and research as they exist now, but also helping their institutions carve a path into the emerging world of online teaching, learning, and research. These things have existed for nearly 20 years, and libraries have become increasingly adept at supporting online education and providing online resources, but the environment continues to shift rapidly, and it is not always easy to find the best path through competing choices in a somewhat chaotic environment.
Academic library managers and administrators have a clear role in managing
digital resources in this era of transition. That role includes:
• Creating dialog and partnerships with others on campus and in the educational and information technology community to create digital resources, purchase commercial eresources at an advantageous cost, create and acquire useful discovery systems, and create a student-centered physical environment, including a reduced footprint for the physical collection.
• Creating a more flexible library organization that allows learning and collaboration, that is inclusive, and that recognizes expertise that can be used and developed to manage digital collections.
• Recognizing the continuous environment of transition and encouraging the organization to become comfortable with that environment.
• Recognizing and cultivating areas of collection strength that can be an essential part of the digital environment.
• Being a part of the open access movement, encouraging creation of open access publications, and providing easy access to them.
CHAPTER 13
Staffing the Libraries of the Future
Richard Jost
Synopsis
The skills librarians will need as the library transitions from a print to a more technical environment.
Abstract
The people who staff libraries of the future will need to have an expanded set of skills relative to current employees. In addition to traditional library skills, they will also need to be comfortable in a constantly shifting library environment that will require both technical and interpersonal qualities. Seeking these traits in future library candidates will be an important step in aligning the library with its evolving mission.
Richard Jost is currently the Information Systems Coordinator at the University of Washington Marian Gould Gallagher Law Library in Seattle.
Just as library services have been transformed in the current library environment, so too will the skill set required to staff these new organizations. Although it may be easy to project that many of the skills and expertise that libraries will need in the future will differ from those of current staff members, it is more difficult to know exactly how different they will be. Will there still be a need for “traditional” library skills (cataloging, collection development, records management) in the future, or will these skills be obsolete? And, if it is true that these skills are no longer needed, what new requirements will have taken their place?
Before a discussion can begin about skills, a word must first be spent on the future of librarianship itself. Will there still be a need for professional librarians in the libraries of the future, or will the current staff positions be filled by another type of professional or paraprofessional? As has been documented (Griffiths & King, 2011), there will be a need in the future for librarians but with some important differences:
The future for qualified librarians is strong. There is a continuing need for librarians to reinforce and strengthen their roles as leaders, navigators, organisers, managers and interpreters of the expanding record of human activity and accomplishment in all its forms. However, the library profession and workforce face continued and increasing competition for qualified individuals from the rapidly evolving information industry. Librarians need to recognize that, to remain relevant as society at large becomes much more aware of and engaged with recorded knowledge opportunities, they need to be proactive in monitoring and adapting to change, and to pay considerable attention to positioning and marketing themselves in this environment. (p. 299)
The authors contend that there will still be a need for professional librarians but not necessarily those with the same skills that are currently employed. How, then, to prepare a list of qualifications for a library future that contains so many different options? If one knew for certain how libraries will be used by various patron communities, it would be a relatively simple task to come up with a matching set of librarian skills to meet this future. Unfortunately, there is precious little agreement on what the future will look like and how it will impact staffing.
There are, however, some general observations that point to trends that will impact library staffs of the future:
• Libraries will operate with smaller staffs and leaner budgets.
• Libraries will rely more on technology and will need staff that are technologically literate.
• Libraries will increase their reliance on shared services (like shared technical services) to save costs.
• Libraries will devote more job positions to direct patron service for access to digital content.
For technical services staff, there are several changes that are already impacting
their work and will continue to do so in the future. The continuing evolution of the MARC record for cataloging purposes has accelerated in recent years, in particular with the introduction of Resource Description and Access (RDA; http://www.loc.gov/aba/rda/), the new standard for resource description and access designed for the digital world. RDA was introduced in 2010 and was built on the foundations established by the Anglo-American Cataloguing Rules (AACR2), issued in 1978. RDA provides a comprehensive set of guidelines and instructions on resource description and access covering all types of content and media. One of the main goals for RDA is to create a single standard for cataloging all types of resources, both analog and digital, so that records for all materials will be compatible.
In addition, the Library of Congress has launched the Bibliographic Framework Initiative (BIBFRAME; http://www.loc.gov/bibframe/) to provide a descriptive framework for resources that appear on the web or in the networked world. This new method of description was developed to be able to cite library data in a way that differentiates the conceptual work (a title and author) from the physical details about that work’s manifestation (page numbers, illustrations, etc.). It is intended to replace the MARC record at some point with a new structure that better meets the needs of digital items and their relationships.
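To make that Work/Instance split concrete, here is a minimal, unofficial sketch in Python dict form of how a BIBFRAME-style description might separate the two levels. The URIs are hypothetical, and while the class and property names (bf:Work, bf:Instance, bf:instanceOf) come from the published BIBFRAME vocabulary, a real record would be richer, so treat this as an illustration rather than a cataloging template:

```python
# Hypothetical BIBFRAME-style description, expressed as Python dicts.
# The conceptual Work is described once; each physical or digital
# manifestation becomes an Instance that points back via bf:instanceOf.

work = {
    "@type": "bf:Work",
    "@id": "http://example.org/works/moby-dick",      # hypothetical URI
    "bf:title": "Moby Dick",
    "bf:contribution": [{"bf:agent": "Melville, Herman"}],
}

instance = {
    "@type": "bf:Instance",
    "@id": "http://example.org/instances/moby-dick-1851",
    "bf:instanceOf": work["@id"],                     # link to the Work
    "bf:extent": "635 pages",                         # carrier details
    "bf:provisionActivity": [{"bf:date": "1851"}],    # publication data
}
```

A second Instance, say an ebook edition, would reuse the same bf:instanceOf link; that shared link is what lets a catalog relate all manifestations of one conceptual work.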
One projection that can be guaranteed is that the library of the future will have a large component of technology. Not only will libraries continue to provide hardware and software in their physical locations for patron use, but they will have to adapt to the ever-increasing reliance on personal devices and mobile applications. There will be many patrons who will continue to come to the library (both for using the collections and utilizing the space), but some of the most dedicated library users may never set foot within the building. It is these patrons, increasingly independent, that libraries will need to reach out to with appropriate technology.
It also seems clear that services provided to the patron community will need to change as a result of the upgrades to technology but also because of rising expectations. Patrons may no longer be content with borrowing a DVD from the library to watch a movie, but may expect the library to provide a streaming service similar to the commercial online providers.
This example provides a clear contrast of the staff qualities desired under each
scenario. For the library purchasing a movie on DVD for circulation, most of the traditional library skills that are associated with library operations will suffice. Starting with the acquisition and purchase of the materials, through cataloging and processing, the DVD does not represent a huge change from how books have been curated for many years. The loan rules may be different for the DVD, and certainly the patron has to have a DVD player to watch the movie, but the basic library pipeline of acquiring and distributing materials would be recognizable.
In contrast, a library setting up a streaming service for movies will have different challenges. Since this service will most likely be a subscription service, the staff person responsible for acquisitions and procurement must understand the complexities of signing a contract with limitations and terms that do not apply to one-time purchases. Instead of putting the DVD on the shelves in the media collection, the streaming service will need to be added to the library web page by the IT department or webmaster, complete with instructions on how to use it. As for the reference and circulation librarians, they will need an understanding of copyright laws, rights management, and user restrictions to explain to patrons why they cannot download a copy of the movie to their personal device.
All in all, this is a much more complex transaction than purchasing a DVD, requiring a correspondingly complex set of skills. No matter what the future shape of the library, it is safe to assume that it will be a more challenging environment as new technologies and user demands proliferate. Libraries will need to adapt to these changes while not losing sight of their core missions, values, and patron goodwill.
Without knowing what new technologies and services will be in use, can
libraries predict the types of staff they will need in the future? Although the future may be uncertain, it is possible to develop a list of qualities that will serve libraries well no matter what the future may bring.
In former days, the introduction of a new library service had a fairly well-defined path. When a new service was introduced or a new technology was added to the existing framework, it was usually followed by a period of stability before the next update or revision to that service. In the current environment, there is no longer an interlude between major changes; rather, an ever-expanding cycle of constant change no longer allows for the long periods of stability that used to be common. In particular, updates to software are being pushed out to users on a constant basis, often leading to the necessity of adopting new hardware or platforms to take advantage of the new features. Hiring staff who are comfortable in this new environment will be crucial (Figure 13.1).
FIGURE 13.1 Model of constant change with new library services.
It is difficult to forecast the type of technical knowledge that future job seekers will need, as some of the technology that they may be implementing is not even available yet. This is where qualities that focus not on specific technical knowledge but rather on the willingness to explore technology solutions with an open mind will be invaluable. Future librarians will need exposure to a wide variety of technologies, on both the technical services and public services sides of library operations, and will need to be comfortable with migrating systems or implementing new ones as circumstances change. Keeping current with the library automation marketplace and the advances in social media will also be required to be an effective proponent of new services that enhance library operations.
Knowing that the environment is shifting, recruiting new professionals who can succeed in this atmosphere can be formidable. In addition to knowledge of programming languages, integrated library system experience, and graduate degrees, the library manager of the future must also seek some more intangible traits. When interviewing and hiring new staff members, libraries must be aware of these qualities to seek out in potential candidates:
1. Adaptability—The ability to change and adapt to circumstances as they evolve is critical in a constantly changing technical environment. The ability to be nimble when a shift in direction occurs is necessary for career longevity.
2. Inquisitiveness—The desire to explore new ways of doing things is an ideal trait for the future. It is no longer acceptable to fall back on the canard that libraries must do things a certain way because “that is the way it has always been done around here.” Always questioning current practices ensures that unnecessary policies and procedures are not being maintained without purpose.
3. Irreverence—Coupled with the ability to be inquisitive, the ideal future candidates will also need the courage to challenge preconceived notions of how the library operates. Done in a respectful manner, the ability to seek justification for long-standing library activities can help the library examine whether some library folklore tales need to be retired.
4. Confidence—Future library candidates will have to be unafraid of trying new things even if they risk failure. Having a positive and confident (although not reckless) attitude will inspire others to be courageous too. Especially in a tumultuous time of change, having an upbeat attitude can help create an atmosphere of trust and support.
5. Collegiality—The need for collegiality is a core qualification. Just as almost everything done in libraries today is based on project management, most of these activities are also done by teams working together. The days of individual library employees working by themselves in separate offices are over; most of the work done today in libraries is collaborative. The need to get along with a wide variety of people from different backgrounds and ethnicities will be a foundational prerequisite for future library staff members.
6. Versatility—It is going to be challenging for future library hiring committees to write a job posting that adequately addresses the needs of any one position. Libraries are going to have to recruit people who are comfortable using a host of skills in their workplace, often a blend of traditional technical services and public services expertise. Given lean budgets and expanded services, more cross-training between library staff members will be the new normal to meet patron demands within budgetary limitations. Given the need for collegiality, the conventional pattern of hiring introverts for technical services and extroverts for public services is waning. Future candidates will need the same attention to detail that traditional library work requires in order to launch new services, but they will also need to interact with patrons to help them use those services most effectively. The fusion of both technical service and public service skills will be the hallmark of new library staff members.
While this list of qualifications may seem daunting (or impossible) to seek in any one individual, it can be a useful measure of what libraries need to cultivate in their new staff members. This is not to say, however, that all new staff must be hired to achieve the library’s goals. Many current staff members in libraries have these traits already or are willing to learn them. Creating a new culture within an organization will take time and patience, but having a roadmap of the type of staff members who are needed is an important first step.
The growth in the complexity and functionality of new library systems is just one aspect of the current transformation of the library world. Libraries are part of the larger societal changes taking place through the increasing proliferation of electronic access to information and constant communication. The digital landscape that has transformed how people interact with technology has had a profound effect on the role that libraries play in modern society. The rise in social media applications has allowed libraries to engage with their patron communities in new ways, both raising awareness of the collections they offer and raising expectations of new services that patrons may demand.
CHAPTER 12
Altmetrics and Research Support
Sharon Q. Yang and Lili Li
Synopsis
Tips on using altmetrics to measure the impact of your digital library.
Abstract
Alternative metrics (altmetrics) are web-based metrics that measure the impact of print and digital scholarship. They include downloads, views, clicks, discussions, mentions, blog posts, and more. Open access content promotes altmetrics, which in turn serve as early indicators of use for print materials. This chapter will discuss both traditional and new metrics, research findings about altmetrics, and relevant altmetrics tools.
Dr. Sharon Q. Yang is a professor and systems librarian at Rider University, Lawrenceville, New Jersey.
Lili Li is an associate professor/e-information service librarian at Georgia Southern University.
12.1 INTRODUCTION
Scholarly communication and support for research are primary interests for academic librarians. Many librarians are scholars and researchers themselves, as they hold faculty status and are actively engaged in publications and scholarly endeavors. Like teaching faculty, many academic librarians strive for excellence in scholarship, as they are often under stringent scrutiny in promotion and tenure processes. Academic achievements are equally important for both faculty and librarians. However, librarians are more interested in means or tools that can assist faculty and students, because the latter often seek help from libraries for their research-related activities. This chapter will cover some of the tools that may be useful for scholars in various ways.
One emerging technology for scholars is in the area of tracking research impact. Recent years have witnessed increasing interest in the measurement of digital or web-native scholarship. Other emerging areas include online reference management systems and open access content. Scholars are always interested in metrics to measure and showcase their scholarship for various reasons, such as grant writing, award selection, promotion, and tenure. In addition, metrics are often used for background checking in hiring and as information-seeking aids and collection development tools.
12.2 TRADITIONAL METRICS
Publication and research have remained in print format for a long time. Long-established metrics are in place to measure the impact of print scholarship. Those metrics include the total number of published articles or books, the total number of citations, the average number of citations per article for a scholar, the average number of citations per article per year, the H-index, the G-index, the journal impact factor (JIF), and more.
Counting publications and citations is the most direct way to measure one’s research impact. The H-index is based on the number of both publications and citations. Created by J.E. Hirsch, Department of Physics, University of California at San Diego, the H-index is “defined as the number of papers with citation number ≥h, a useful index to characterize the scientific output of a researcher” (Hirsch, 2014). The G-index was created by Leo Egghe, a Belgian professor, as an improvement on the H-index. “[Given a set of articles] ranked in decreasing order of the number of citations that they received, the g-index is the (unique) largest number such that the top g articles received (together) at least g² citations” (Tarma Software Research, 2014). The JIF is calculated as the number of citations per article in a journal over a period of time and is often considered an indicator of the relative importance of a journal within a discipline. Past evidence shows that different disciplines have different citation rates, that is, different average numbers of citations per article over a fixed period of time such as 5 or 10 years (Yang and Dawson, 2014). The impact of scholarship therefore cannot be compared across disciplines using citation counts and analysis.
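Because the two index definitions above are compact, a small worked example may help. The following sketch, using hypothetical citation counts, computes both indexes directly from the quoted definitions:

```python
# Minimal sketch of the h-index and g-index as defined above.
# The citation counts are hypothetical, one entry per paper.

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
    return h

def g_index(citations):
    """Largest g such that the top g papers together have >= g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g

papers = [48, 33, 20, 9, 7, 6, 3, 1, 0]  # hypothetical publication record
print(h_index(papers))  # 6: six papers have at least 6 citations each
print(g_index(papers))  # 9: the top 9 papers total 127 >= 81 citations
```

The two indexes reward different shapes of a publication record: a few very highly cited papers can raise the g-index, while the h-index is insensitive to them.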
While traditional metrics are widely accepted in academia across the United States, there has been long-standing criticism of citation-based metrics. First, citation counts do not differentiate between positive and negative citations. Self-citations may inflate citation counts and analysis, and sometimes authors may cite simply to pad their publications. According to a 2013 study by the Public Library of Science (PLoS), “Citation counts represent less than 1% of usage for an article” (Buschman and Michalek, 2013). Additionally, it takes a long time for a work to be cited after its publication: “It takes 5 years for a paper in physics to receive half of the cited-by references that the article will ever receive” (Brody et al., 2006). Works that were used do not always get cited. Typically, an estimated 30% of the works used are cited, and the other 70% are omitted for various reasons (MacRoberts and MacRoberts, 2010). Therefore, traditional metrics, specifically citation counts and analysis, have flaws and may not accurately reflect one’s scholarly achievements.
Traditional web-based metrics tools include Google Scholar and Publish or Perish. Google Scholar is a free, web-based application that uses a crawler to comb the Internet, including many publishers’ sites, for scholarly works. Its metrics are based on citations. Publish or Perish is a free downloadable program that draws data from Google Scholar and provides more functionality to manipulate data for analysis (see the list of tools in Section 12.6 for details on Google Scholar and Publish or Perish).
12.3 NEW METRICS: ALTMETRICS
The term “altmetrics” stands for alternative metrics. Other names for altmetrics include social metrics, article-level metrics (ALM), and influmetrics (Ronald and Fred, 2013). The failure of traditional metrics to capture the impact of web-native scholarship led to increased interest in altmetrics. These new measurements feed on web-based usage of more diverse research products that, in addition to publications, may include data sets, media stories, computer code, molecular structures, algorithms, presentations, and other nontraditional research products. Altmetrics capture and analyze web-based usage and traffic generated by scholarly works. The new metrics may include, but are not limited to, downloads, bookmarks, saves, favorites, reads, likes, wall posts, discussions, mentions, citations, tweets, views, reviews, expert or public opinions, and more. These usage statistics accrue anywhere on the Internet, contributed by both the public and scholars, especially in Web 2.0 applications such as social networking sites, blogs, publishers’ websites, online reference management tools, online databases, and sites of open access journals. These metrics are so new that they are not standardized or regulated in any way.
Altmetrics are increasingly catching the attention of librarians, faculty, and students in all disciplines as the Internet becomes more and more popular as a channel for scholarly communication. Most scholarly journals also publish an electronic copy on the web. The development of open access content nurtures altmetrics and helps them flourish. In 2012, seven of the ten most popular articles in science were from open access journals (Mounce, 2013). A JIF analysis revealed that open access journals receive two to three times more citations than closed-access journals (Laakso and Björk, 2013). Meanwhile, there are an estimated 150 million failed attempts per year to access JSTOR, a subscription-based database of scholarly journal articles with restricted access (Mounce, 2013). In addition, research is not limited to publications only; modern scholarship can manifest itself in a variety of research objects.
Altmetrics are gaining momentum slowly. The Research Excellence Framework allows scientists to include altmetrics in their submitted reports for evaluation, which may influence funding decisions (Kwok, 2013). Plum Analytics, an altmetrics company, is working with academic institutions to provide researchers’ profiles based on altmetrics. Recently, the National Science Foundation changed its policy to allow scholars to list research products, including publications, in grant applications, recognizing “the breadth of a scientist’s intellectual contributions” (Piwowar, 2013). These alternative products are evaluated by altmetrics. At least two high-profile promotion and tenure cases have included altmetrics in the application packages. Emilio Bruna, a faculty member in the Department of Wildlife Ecology & Conservation at the University of Florida, applied for promotion to full professor and for selection to the Academy of Distinguished Teaching Scholars, a campus-wide faculty award. He included altmetrics in his applications, and both were successful (Konkiel, 2014). He also teaches altmetrics tools in a workshop on scientific publishing for graduate students. Marine scientist Steven B. Roberts at the University of Washington also included altmetrics in his application for promotion to associate professor with tenure. His impact data included tweets, blog posts, views, downloads, and mentions on social networking sites (Howard, 2013). He was successful in his application.
Many altmetrics tools have been developed over the last few years, all of which are
cloud based. Two kinds of altmetrics tools exist: web-based programs that aggregate data from different sources and compile them into coherent, meaningful impact statistics; and publishers or content providers that keep track of usage counts for digital materials. Of the former, the most comprehensive and best-designed tool is Altmetric Explorer at http://altmetric.com. It covers almost all major social networking sites and many scholarly sources, including Scopus and Web of Science. In addition, its interactive map displays the geographical areas and countries where the usage of a publication takes place. Altmetric Explorer is used by many publishers and content providers to track article-level altmetrics in their archives and databases. Some online reference management tools, for instance Mendeley, have also developed the capability to collect altmetrics. Most altmetrics tools also cover citation counts and analyses.
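As a concrete illustration of what such aggregators expose, the sketch below queries Altmetric’s public article-lookup endpoint for a single DOI. The DOI is only an example, and the response field names follow Altmetric’s public API documentation, so treat the specifics as assumptions:

```python
# Minimal sketch: look up attention data for one article by DOI
# via Altmetric's public v1 API. The DOI below is illustrative.
import requests

doi = "10.1038/nature12373"  # example DOI, not an endorsement
resp = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=30)

if resp.status_code == 404:
    print("No attention data tracked for this DOI.")
else:
    resp.raise_for_status()
    data = resp.json()
    print("Altmetric score:", data.get("score"))
    print("Tweeters:", data.get("cited_by_tweeters_count", 0))
    print("News outlets:", data.get("cited_by_msm_count", 0))
    print("Blog posts:", data.get("cited_by_feeds_count", 0))
```

The same per-article counts are what subscribing publishers embed next to their articles, typically rendered as the colored wreath described above.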
Altmetrics are still in an early stage of development, as are their tools and applications. Many of these tools need improvement. One problem is the difficulty of disambiguating authors’ names when collecting publications and gathering usage data. Another serious weakness is that these tools tell only part of the story about one’s research impact. All these altmetrics tools are new, and usage statistics take time to accumulate. The hidden web and “the dark social” exist, and altmetrics tools cannot reach them for usage data. Consequently, many may not be comprehensive in data collection, as each one may be limited to a certain number of sources; each tool therefore may not cover the totality of a scholar’s research and publications. Another serious threat to altmetrics comes from what is called the “liquid culture,” in which applications disappear due to the evanescent nature of Internet content, in contrast to a solid culture such as print materials, a stable and tangible format (Torres et al., 2013). The closing of Google Wave is one example, and Meebo, a chat program once used heavily by libraries, also illustrates the evanescent character, or liquid culture, of the Internet. When a social network or Internet application is closed, the associated discussions, bookmarks, and downloads vanish into thin air.
Copyright is a more serious legal issue. Most altmetrics tools allow scholars to upload their research and publications, but the copyrights of these publications belong to publishers once authors sign a copyright release. The desire to share research, a motto hailed by altmetrics tools, is thus constrained by copyright. Publishers have begun a takedown campaign, demanding that uploaded works be withdrawn (Wecker, 2014).
There are also suspicions about altmetrics, as they are still a new area to be explored and validated. Some complain that altmetrics “open the gates for the barbarians” and “herald a system where a number of tweets would decide a professor’s tenure” (Griffin, 2013). One research study revealed that publications can be faked and citations can be generated artificially without being detected in Google Scholar (López-Cózar et al., 2012). Web-based metric applications are easy to manipulate, and one can inflate metrics and game the system. Some even call altmetrics a technology of narcissism (Mounce, 2013). Compiling data from disparate sources can be difficult. Altmetrics, like citation counts, also do not differentiate positive from negative attention.
12.4 RESEARCH ON ALTMETRICS
Most research on altmetrics done in 2013 analyzed the structure and interaction of altmetric indicators with each other and their correlation with traditional metrics. Findings all point to the fact that highly cited items also receive high volumes of altmetrics. Significant associations were found between higher metric scores and higher numbers of citations, with sufficient evidence (Thelwall et al., 2013; Torres et al., 2013). The evidence also indicates that altmetrics may serve as early indicators of the expected usage of a published work (Wang et al., 2014). Some researchers conclude that “Altmetrics are in fact superior to traditional filters for assessing scholarly impact in multiple dimensions and in terms of social structure” (Liu et al., 2013). More research is called for on altmetrics to verify and broaden those findings.
12.5 FUTURE OF ALTMETRICS
It has never been easy to measure research quality. As web-native scholarship flourishes, altmetrics are here to stay and will flourish as well, in spite of the controversies. Altmetrics will complement, not replace, traditional metrics. There is still a large body of research work that may not be on the Internet and thus cannot be easily tracked by altmetrics. Traditional and new measures are mutually supportive and complementary, together providing a more holistic picture of the quality of a scholar’s research and publications. As many have cautioned, one must use altmetrics wisely, providing context and explanations to avoid suspicion or confusion (Kwok, 2013).
Future altmetrics tools will incorporate gaming resistance and detection capabilities. So far, no one tool can find all the research products of a scholar. Future tools will have to find ways to correct this, and work to address the issue is already under way. One difficulty in gathering all the research products and associated altmetrics of a researcher is the ambiguity of authors’ names: altmetrics applications have difficulty distinguishing one scholar from others with identical names. Open Researcher and Contributor ID (ORCID) is an effort to connect a scholar with his or her research objects through a unique identifier, and altmetrics applications can use data from ORCID through its application program interface (API). ResearcherID, a similar identifier service, performs much the same function. If a researcher has linked all of his or her research objects to an ORCID iD, an altmetrics tool can retrieve this information and provide a more complete set of altmetrics for his or her works.
The National Information Standards Organization (NISO), a nonprofit organization for creating information industry standards, is working on standards and recommended practices for altmetrics. “For altmetrics to move out of its current pilot and proof-of-concept phase, the community must begin coalescing around a suite of commonly understood definitions, calculations, and data sharing practices” (Lagace, 2013). It is important to establish commonly agreed upon rules governing what will be measured, how it will be measured, and the technical infrastructure for producing and exchanging these data. A regulated approach to altmetrics will help them become more accepted in scholarly communities.
12.6 LIST OF TRADITIONAL CITATION-BASED AND ALTMETRICS TOOLS
12.6.1 Traditional Citation-Based Tools
Name: Publish or Perish
Developer/company: A.W. Harzing; harzing.com
Price: Open source/free
Description: Publish or Perish is a software program available for both Windows and Mac. One needs to download and install it on a local PC. The metrics are drawn from Google Scholar and are citation-based. It provides more possibilities for data analysis than Google Scholar itself can provide. Author or journal searches are available for evaluation with the H-index, G-index, and other traditional metrics. It is easy to download and install, and it is self-explanatory; anyone can work with Publish or Perish without much training.
Reviewer comment: Publish or Perish is designed to empower individual academics to present their case for research impact to its best advantage. We would be concerned if it were used for academic staff evaluation purposes in a mechanistic way (Harzing, 2007).
Name: Google Scholar
Developer/company: Google Inc.
Price: Free
URL: http://scholar.google.com
System requirement: A browser and Internet access
Description: When an author creates an account and logs into Google Scholar for the first time, the system will find all the publications under the same or a similar name so that he or she can choose which ones are his or hers. The author can also manually import published works when Google cannot find them. Google Scholar provides citation-based metrics such as citation counts, citation patterns, the H-index, and the i10-index. Additional features include “My Library,” where a scholar can save his or her retrieved results. Criticisms of Google Scholar include incomplete coverage, the Matthew effect, and vulnerability to gaming.
Reviewer comment: Google Scholar Citations is a citation service provided free of charge. It is easy to set up, especially if you already have a Google account. Like other citation tracking services, it tracks academic articles, but it also counts theses, book titles, and other documents toward author citation metrics (Cornell University Libraries, 2015).
12.6.2 Altmetrics Tools
Name: Academia.edu
Developer/company: Academia.edu
Price: Open source/free
URL: http://www.academia.edu
System requirement: A browser and Internet access
Description: A repository for scholarly works, Academia.edu is a free Internet application that allows scholars to upload their research, CVs, keywords for research interests, and publications to share with others. The link called “Analytics” is where altmetrics are displayed; these consist mainly of views of uploaded documents, keywords, and external links in a scholar’s profile. Like Altmetric Explorer’s “Demographics,” Academia.edu has a tab called “Country” where one can see the geographical locations of viewers who have visited the author’s profile and read his or her publications. One problem with Academia.edu is copyright: where authors have signed their copyright over to publishers, they are prohibited from uploading their publications into the system, and some publishers have demanded that Academia.edu take down publications.
Reviewer comment: Researchers use the site to follow one another’s work, track their influence with analytics tools, and, the company suggests, “build powerful brands online” (Shankland, 2013).
Name: Altmetric Explorer
Developer/company: Altmetric LLP, The Macmillan Building, 4 Crinan St., London N1 9XW, UK
Price: Publisher/institution subscription; free for librarians
System requirement: A browser and Internet access
Description: Altmetric Explorer is by far the most comprehensive and most powerful altmetrics application. Publishers subscribe to it to display article-level altmetrics alongside their online citations and publications. Institutions subscribe to it to showcase the impact of research by their scholars. Each article receives a score to indicate its usage, and the importance of an article is indicated by the shades of color on the wreath around the score. Collected altmetrics include blogs, news, Weibo, Facebook, Google+, Twitter, and more. There is also a “Demographics” link, which displays a visual map of where the attention for an article comes from. Altmetric Explorer allows users to search for an article by author, title keywords, or journal, but the search feature still needs improvement. As it is a newly developed application, its coverage of publications does not go back very far, even though its collection of altmetrics is comprehensive.
Reviewer comment: This program is comprehensive in that it provides information about how many times an article has been viewed and the rankings of the journal it is from. Explorer also provides a list of social components, like how many times an article has been picked up on a news feed, how often it has been tweeted, and who has discussed it on Google+ and several other social media platforms. Using Altmetric Explorer, a researcher can even see the demographics of who has seen their article. This is an excellent feature, as it provides people with an idea of who is looking at the material (Read, 2013).
Name: Impactstory
Developer/company: Heather Piwowar, Jason Priem, and Stacy Konkiel
Price: Open source/free
System requirement: A web browser and Internet access
Description: Impactstory is an open source, web-based tool that measures the diverse impacts of all research products—from traditional products like journal articles to emerging products like blog posts, data sets, presentations, and software. Built by several individuals with grants from various organizations, it analyzes and displays the impact of research based on raw metrics. Each scholar is given a custom persistent URL, and metrics are classified based on audience and type of engagement with the research (Impactstory, 2014). It shows data regarding users and use as a percentile calculated in comparison with other research indexed by Web of Science in that year. Its uniqueness lies in its analysis and its ability to display the impact of one’s research in an easily understandable format, called an impact story. Users may need to import their items into Impactstory, which in turn automatically gathers impact statistics from Scopus, Mendeley, Google Scholar, SlideShare, ORCID, and PubMed Central. However, Impactstory is not synchronized with the aforementioned systems and cannot automatically update its content. This application is an excellent tool for scholars who want to trace the impact of their web-native scholarship, though it still needs improvement, as some parts may not always work well.
Reviewer comment: I included Impactstory data in my portfolios for (1) promotion to full professor and (2) selection to UF’s Academy of Distinguished Teaching Scholars, a campus-wide faculty award. Both were successful. But perhaps more importantly, I included Impactstory in my workshop on scientific publishing for graduate students, where in one of the sessions all the participants set up ORCID iDs, ResearcherIDs, and Impactstory profiles—check it out. Students get it (Byrne, 2014).
Name: DataCite Metadata Store
Developer/company: DataCite
Price: Open source/free
URL: http://www.datacite.org
Description: DataCite is a service that promotes metrics for data sets by allowing data publishers to register DOIs and associated metadata. Metadata Search also allows people to search for data sets registered with DataCite. DataCite provides statistics on DOI registrations and resolutions as well, sorted by allocator, data center, or prefix. By working with data centers around the world, DataCite aims to develop a global citation framework that supports simple and effective methods of data citation, discovery, and access. The search service is fairly simple, but it also comes with a list of filters that allows the user to filter results by allocator, data center, prefix, resource type, contributor, creator, publication year, publisher, and/or language.
Reviewer comment: The DataCite Metadata Search service aims to support efforts at increasing the ease and prevalence of data citation. By exposing and providing a search interface for metadata attached to data sets that are registered with the DataCite Metadata Store, the search service allows researchers to find others’ data and to track their own data’s DOIs and citations (Digital Curation Centre, 2013).
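For readers curious what such a registry lookup involves, here is a minimal sketch against DataCite’s public REST API. The query term is arbitrary, and the parameter and field names follow DataCite’s documented JSON:API responses, so treat the specifics as assumptions:

```python
# Minimal sketch: search DataCite's public REST API for registered data sets.
# The query term "climate" is arbitrary; page[size] caps the result count.
import requests

resp = requests.get(
    "https://api.datacite.org/dois",
    params={"query": "climate", "page[size]": 3},
    timeout=30,
)
resp.raise_for_status()

for record in resp.json().get("data", []):
    attrs = record["attributes"]
    titles = attrs.get("titles") or [{}]
    print(attrs.get("doi"), "-", titles[0].get("title", "(untitled)"))
```

Resolving any returned DOI through https://doi.org then leads to the data set’s landing page, which is the citation-and-discovery loop the entry above describes.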
Name: Kudos
Developer/company: Kudos Innovations Ltd.
Price: Publisher and institution subscription; free for individuals
URL: https://www.growkudos.com/
Description: Kudos is an altmetrics aggregator that charges a fee for publishers and institutions but is available free of charge to researchers. To set up a publication for altmetrics, researchers complete a process involving “explain,” “enrich,” “share,” and “measure”: they annotate their publication for easy and quick understanding by the public, link or upload the content, agree to publicize the research, and measure impact by gathering altmetrics. Kudos is unique in that it pushes research and publications to social networking sites such as Facebook and Twitter, and via e-mail, for promotion, and in return gathers data from those sources; none of the other altmetrics tools is as aggressive. Its altmetrics include tweets, Facebook posts, referrals, Kudos views, Kudos downloads, and more. Kudos has partnerships with publishers including Emerald, Elsevier, Wiley, and Taylor & Francis Group. Its recent integration with Altmetric Explorer will enhance its data coverage.
Reviewer comment: First piloted in September 2013, Kudos is a new service designed to help scholars and their institutions increase the impact of their published research articles. Altmetric Explorer tracks and collates mentions of research articles on social media, blogs, news outlets, and other online sources. This integration means mentions are now incorporated on the Kudos metrics pages for individual authors, accompanied by a short summary that further details the number of mentions per source. Each article is assigned a score based on the amount of attention it has received to date, and authors are able to click through to see a sample of the original mentions of their article (Wheeler, 2014).
Name: PlumX
Developer/company: Plum Analytics
Price: N/A
URL: http://www.plumanalytics.com/
Description: Plum Analytics is a service that lets users assess scholarship impact, track that impact, measure output, and compare group metrics. Plum Analytics has developed an impact dashboard that displays how research output is being utilized, interacted with, and mentioned around the world. The dashboard harvests metrics from multiple online sources and tracks many artifact types. These factors contribute to PlumX's easily understandable metric summaries, while the PlumX toolset allows users to gain more depth from these summaries. Plum Analytics is now integrated with EBSCO Information Services as well.
Reviewer comment: PlumX may be of interest to academic libraries, special libraries, research support offices, and anyone seeking to better understand how the research output of their organization is being used. The tools that assess research impact beyond citations are new, and it will take some time to determine how useful these metrics are to administrators and researchers (Swoger, 2013).
Name: ReaderMeter
Developer/company: Dario Taraborelli, San Francisco, CA
Price: Free
URL: http://www.readermeter.org
Description: ReaderMeter is based on traditional concepts such as the H‑index and G‑index, but redefines them so that bookmarks, rather than citations, are used for analysis. It takes advantage of the API provided by Mendeley, an online reference management system, to obtain readership data, turning it into an Hr‑index and a Gr‑index. The application is free for anyone to use.
Reviewer comment: ReaderMeter measures the use of scientific content by a large number of readers. It presents author‑ and article‑level statistics visually. Data are obtained using the Mendeley API, and the reports are available in both machine‑readable and HTML formats (Terkko, 2014).
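For readers unfamiliar with the underlying measure: the H‑index is the largest number h such that an author has at least h works that have each been cited at least h times; the Hr‑index simply substitutes bookmark counts for citation counts. A minimal sketch of the computation (the function name and sample counts are invented for illustration):

    def h_index(counts):
        """Largest h such that at least h items each have >= h citations
        (or bookmarks, for a ReaderMeter-style Hr-index)."""
        h = 0
        for rank, count in enumerate(sorted(counts, reverse=True), start=1):
            if count >= rank:
                h = rank  # the top `rank` items all have at least `rank` counts
            else:
                break
        return h

    # Five papers cited 10, 8, 5, 3, and 1 times yield an h-index of 3.
    print(h_index([10, 8, 5, 3, 1]))  # -> 3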
12.6.3 Publishers/Content Providers with Altmetrics
Name: PLOS Article‑Level Metrics (ALM)
Developer/company: PLOS, San Francisco
Price: Open source/free
URL: https://github.com/articlemetrics
Description: PLOS is a publisher of seven open access journals, primarily in the biomedical field. A fee is charged for an article to be published; however, its ALM application, written in Ruby, is open source and available for download. It requires someone with IT skills to install and run the program. PLOS uses ALM, which stores and reports performance data on research articles, to aggregate relevant data and statistics for research articles, including online usage, citations, social bookmarks, ratings, blog coverage, and more (FAQ, 2014). It is the first publisher to have developed an altmetrics application that gathers nontraditional usage data on articles that are viewed, cited, saved, discussed, and recommended. ALM has a list of sources for data collection, and PLOS provides an API giving developers access to the data ALM collects.
Reviewer comment: It is clear that PLoS sees quality as a multidimensional construct, and thus presents a collection of indicators in an attempt to paint a broader, more complex picture of article performance (Davis, 2009).
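As a concrete illustration of the API mentioned above, the sketch below queries the ALM service for the summary metrics of a single article. It is a hypothetical example: the v5 endpoint, parameters, and response fields are assumptions based on the ALM (Lagotto) documentation of the period, the DOI is just a sample PLOS ONE identifier, and the API key is a placeholder.

    # Hypothetical sketch of querying the PLOS ALM (Lagotto) API for one article.
    # Endpoint, parameters, and response fields are assumptions to verify against
    # the current ALM documentation; the API key below is a placeholder.
    import requests

    DOI = "10.1371/journal.pone.0036240"   # sample PLOS ONE article identifier
    API_KEY = "YOUR_API_KEY"               # placeholder; obtain a key from PLOS

    resp = requests.get(
        "http://alm.plos.org/api/v5/articles",
        params={"ids": DOI, "api_key": API_KEY, "info": "summary"},
        timeout=30,
    )
    resp.raise_for_status()
    article = resp.json()["data"][0]
    # ALM groups its sources into viewed/saved/discussed/cited summary counts.
    for group in ("viewed", "saved", "discussed", "cited"):
        print(group, article.get(group))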
Name: PLoS Impact Explorer
Developer/company: Altmetric
Price: Free
URL: http://www.altmetric.com/demos/plos.html
Description: PLoS Impact Explorer is a web‑based tool that allows users to examine scholarly impact. Information on tweets, Facebook pages, articles, Google posts, news mentions, and blogging activity is available on the website. Users can monitor and track the research impact of a single person, a group, or an institution, and can also browse through articles based on frequency of mentions, journal of publication, and more. This tool may be best used in conjunction with other Altmetric tools, such as Altmetric Explorer and the Altmetric API.
Reviewer comment: Developed by Euan Adie, product manager at the Macmillan‑funded startup Digital Science, the PLoS Impact Explorer app is an extension of Adie's Altmetric service, which tracks and scores academic output (scientific articles and data sets) based on the mentions it has received in the press, on reference manager websites, on social media websites, and in literature reviews. This app features a clean, intuitive interface and nicely integrates the PLoS Search API, Mendeley reader counts, and Altmetric's scores for academic output (Konkiel, 2011).
Name: Scopus
Developer/company: Elsevier
Price: Annual fee
URL: http://www.elsevier.com/online-tools/scopus
Description: Scopus is a database containing abstracts and citations for academic articles; it describes itself as the largest abstract and citation database of peer‑reviewed literature. Scopus features smart tools to track and analyze research, and also delivers an overview of research output around the world, regardless of disciplinary field.
Reviewer comment: Scopus is a promising addition to the stable of workhorse databases now available to researchers in the STM subject categories, and its interdisciplinary content coupled with citation searching capability inevitably sets it up as a direct rival to Web of Science. Although definitive pricing information is not publicly available for these costly products, earlier estimates indicate a modest edge in favor of Scopus. However, prospective buyers must also factor in a host of performance and content factors to determine which of these products will better serve the needs of their user communities (Dess, 2006).
Name: Web of Science
Developer/company: Thomson Reuters
Price: Annual subscription based on FTE and other factors; one‑time fee for linking to past citations
URL: http://wokinfo.com/
Description: Web of Science connects publications and researchers through citations and controlled indexing in databases covering every discipline. Over 100 years' worth of content is fully indexed, with backfiles dating all the way back to 1898. Users are able to use cited reference search to track past research and monitor current developments.
Reviewer comment: Web of Science is a database that allows you to search bibliographic and citation information and create research metrics. Strengths: coverage spans 1955 to the present and includes over 12,000 high‑impact journals and over 150,000 conference proceedings, with stronger coverage in the sciences. Weaknesses: comparatively thin coverage of other disciplines such as the social sciences, arts, and humanities (La Trobe University, 2015).
12.6.4 Organize and Create Bibliographies
Name: ReadCube
Developer/company: Labtiva
Price: Free
URL: https://www.readcube.com/
Description: ReadCube is a desktop and browser‑based program that allows users to manage, annotate, and access academic research articles. ReadCube is available on Windows and Macintosh OS and features customizability; integrated search with Google Scholar, PubMed, and Microsoft Academic; citation export to EndNote and other reference managers; and more.
Reviewer comment: "I think that ReadCube is the best academic application I have ever used. It encompasses everything I need in a reference manager but in a very simple and stylish way, which is why I avidly recommend ReadCube to all my work colleagues." Vanessa Tubb, https://www.readcube.com/
Name: VIVO
Developer/company: VIVO project
Price: Open source/free
URL: http://www.vivoweb.org/
Description: VIVO is an open source web application first developed and implemented at Cornell. The application enables the discovery of research and scholarship across multiple disciplines at each institution through browsing and search functions that return results for rapid retrieval of information. VIVO is installed locally at each institution, and content may be maintained manually or through automated feeds from local systems (HR, grants, local faculty activity databases, database providers, etc.).
Reviewer comment: VIVO is a free, downloadable semantic web application designed to facilitate research collaboration both within and between institutions. Originally developed at Cornell, it invites institutions to upload data related to faculty profiles, which it crawls in order to draw meaningful connections between researchers. VIVO doesn't directly support user‑centered metrics, but has the potential to be a powerful tool in collecting university‑level research metrics. To date, only a few large institutions have implemented VIVO, as it requires significant programming knowledge and commitment (Roemer and Borchardt, 2012).
Name: CiteSeerX
Developer/company: Steve Lawrence, Lee Giles, and Kurt Bollacker at the NEC Research Institute, Princeton, New Jersey
Price: Free and open access
URL: http://citeseerx.ist.psu.edu/index
Description: CiteSeerX is a digital library and search engine that focuses primarily on scientific literature in computer and information science. Features include autonomous citation indexing for literature search and evaluation; automatic metadata extraction for analysis and document search; citation statistics for all articles cited in the database; automatically generated reference links; author disambiguation; citation context; article harvesting in addition to its submission system; and a personalized content portal that provides features like personal collections, RSS‑like notifications, social bookmarking, and more.
Reviewer comment: "The CiteSeerX digital library stores and indexes research articles in Computer Science and related fields. Although its main purpose is to make it easier for researchers to search for scientific information, CiteSeerX has been proven as a powerful resource in many data mining, machine learning and information retrieval applications that use rich metadata, e.g., titles, abstracts, authors, venues, references lists, etc." (Caragea et al., 2014).
Name: BibTeX
Developer/company: Oren Patashnik
Price: Free
URL: http://www.bibtex.org/
Description: BibTeX is reference management software used for formatting references; the tool is typically used with the LaTeX document preparation system. BibTeX helps users cite sources in a consistent manner by separating bibliographic information from the presentation of that information. Users maintain a plain‑text database of references, cite entries by a unique key within their LaTeX documents, and let BibTeX sort the references and generate a properly formatted bibliography.
Reviewer comment: Using BibTeX is a fast and easy way to keep track of referencing within a document. It uses a plain text file containing a database of all your references, each of which must be given a unique keyword (Johnson, 2013).
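To make the workflow concrete, here is a minimal sketch; the entry key and file names are chosen for illustration. The reference lives in a plain‑text .bib database and is cited by its key in the LaTeX source:

    % references.bib -- one entry in the plain-text reference database
    @article{priem2012wild,
      author  = {Priem, Jason and Piwowar, Heather A. and Hemminger, Bradley M.},
      title   = {Altmetrics in the Wild: Using Social Media to Explore
                 Scholarly Impact},
      journal = {arXiv preprint arXiv:1203.4745},
      year    = {2012}
    }

    % paper.tex -- cite by key; BibTeX sorts and formats the bibliography
    \documentclass{article}
    \begin{document}
    Altmetrics complement traditional citation counts \cite{priem2012wild}.
    \bibliographystyle{plain}
    \bibliography{references}
    \end{document}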
Name: RefWorks
Developer/company: ProQuest
Price: $100 annual subscription
URL: https://www.refworks.com/
Description: RefWorks is web‑based commercial reference management software. It allows users to easily gather, manage, store, and share information. References are imported from text files or online databases such as Google Scholar, Web of Science, or Scopus, and can then be used to format bibliographies and manuscripts with ease. RefWorks has also incorporated an RSS feed reader that allows users to connect to their favorite RSS feeds and import data from those feeds directly into RefWorks. RefWorks can also be used to count citations to assess the impact of research.
Reviewer comment: "I have used reference database software for many years. RefWorks is the most intuitive and least problematic of any software that I have used. It permits me to capture the reference information for the online resources more effectively than other tools. I am also impressed with the responsiveness of the RefWorks development team. Personally, I have found RefWorks invaluable!" (Olsen, n.d.).
Name: RefMan
Developer/company: Ernest Beutler/Thomson Reuters
Price: $249.95
URL: http://www.refman.com/
Description: Reference Manager (RefMan) is a software tool for publishing and managing bibliographies on Windows and Macintosh desktops. The tool helps users create bibliographies much more efficiently. The product includes an online reference searcher, database manager, web publisher, bibliography builder, and easy reference sharing.
Reviewer comment: With RefMan, authors can save time by exporting references directly from online resources such as ISI Web of Knowledge. Manuscripts formatted by RefMan can be submitted easily for publication using Manuscript Central, an online manuscript and peer‑review system for scholarly publishers from ScholarOne, a Thomson Reuters business (InfoToday, 2008).
Name: Zotero
Developer/company: Roy Rosenzweig Center for History and New Media, George Mason University
Price: Free
URL: https://www.zotero.org/
Description: Zotero is open source reference management software that allows users to manage bibliographic data and related research materials. Features include web browser integration, online syncing, and generation of in‑text citations, footnotes, and bibliographies. It is also integrated with the word processors Microsoft Word, LibreOffice, OpenOffice.org Writer, and NeoOffice. Zotero offers ways to connect and collaborate with other researchers as well, through Zotero Groups and Zotero People. Zotero is primarily a Mozilla Firefox browser plugin, but Zotero Standalone and Zotero Connectors allow users to use the software on Mac OS X, Windows, and Linux, and with Google Chrome and Apple Safari, respectively.
Reviewer comment: "While Zotero 3 has its bugs and limitations, this free, browser‑based plugin will ingratiate itself in your research, thanks to its simplicity, portability, and flexibility" (Fenton, 2012).
Name: Mendeley
Developer/company: Elsevier (originally founded by three German PhD students)
Price: Free
URL: http://www.mendeley.com/
Description: Mendeley is a reference manager and academic social network that helps users organize research, collaborate with others online, and keep up to date with the latest research. With Mendeley, users are able to use the reference manager; read and annotate works; add and organize PDFs; collaborate with colleagues; access papers on the web; and network and discover papers, people, and other public groups.
Reviewer comment: "A one stop shop. Excellent for collaboration and discovery. It is usually a task to find like‑minded individuals" (Weaver, 2014).
Name: CiteULike
Developer/company: Richard Cameron; CiteULike, Redland House, 157 Redland Road, Bristol, BS6 6YE, UK
Price: Free
URL: http://www.citeulike.org/
Description: CiteULike is a service that helps users store, organize, and share scholarly papers. Once a user adds a paper to their personal library, CiteULike automatically extracts the citation details. The service works within the user's web browser—no installation is needed, and the library is accessible from any computer with an Internet connection. Citations are saved and shareable, promoting collaboration among scientists and researchers.
Reviewer comment: This tool doesn't work exactly like a citation manager; however, it allows you to quickly (with a quick "Post to CiteULike" button) save online articles, tag them, organize them, and so forth. "I really liked that you can prioritize the articles into categories such as 'top priority' or 'I might read it!'" (InfoProMom, 2011).
Name: EasyBib Bibliography Creator
Developer/company: ImagineEasy Solutions
Price: Free
URL: http://www.easybib.com/
Description: EasyBib is an information literacy platform that provides citation, note‑taking, and research tools. The service is accurate and comprehensive and is most commonly used for citations: users input data for each citation, and the platform automatically cites the source based on the data provided and keeps the citations in order. EasyBib also provides a series of paid products tailored to different needs; for instance, the School Edition of EasyBib is tailored for librarians to teach students about research habits and enhance critical thinking skills.
Reviewer comment: "This site is awesome; I have found it to take all the pain out of creating bibliographies. I really like the new page layout and the automatic save feature. I was forced to restart my computer this afternoon, and was so overjoyed to not have lost all the information I had entered into EasyBib."
Name: Cite This For Me
Developer/company: Cite This For Me
Price: Free
URL: http://www.citethisforme.com/
Description: Cite This For Me is a simple and straightforward citation program. Users follow three simple steps: add information about sources, build the bibliography, and download the fully formatted bibliography. Cite This For Me also provides users with the option of sharing bibliographies with groups, promoting collaboration among users.
Reviewer comment: The site itself is visually appealing and easy to use. There are several options available to use for your paper. After you create a bibliography you can access it on the site. It is a very simple way to create the works cited page for a paper. I encourage everyone to take a look at something such as CiteThisForMe as a way to organize and create an accurate works cited page (Wells, 2013).
Name: refDot
Developer/company: Google
Price: Free
URL: https://chrome.google.com/webstore/detail/refdot/hdhekmbccpnbttkdoinkjmggbcpcflo
Description: refDot is a Google Chrome extension that allows users to keep track of and format references for bibliographic use. When viewing a website or online article of any sort, users can click the refDot icon in the browser to open a window into which they enter all the information needed for a bibliography.
Reviewer comment: refDot could be a very useful Chrome extension for students to use when [they're] performing research online. What I like about refDot is that you're reminded to record all the important information you need for most bibliography formats (Byrne, 2012).
Name: Citelighter
Developer/company: Saad Alam, Lee Jokl
Price: Free ($80 for annual subscription to Citelighter Pro)
URL: http://www.citelighter.com/
Description: Citelighter is an academic research platform that allows users to save, organize, and automatically cite online and offline information, and to store content privately or sort it by topic to be shared with the community through Knowledge Cards. Citelighter also provides a downloadable toolbar for easier use. All citations and bibliographic data are stored in the cloud for better accessibility.
Reviewer comment: Teaches a variety of things, from basic research (capture and store), to organization (digital notecards that are easily manipulated), to paraphrasing (the comments section under each fact is a great place for students to record thoughts) (West, 2013).
12.6.5 Choose a Journal in Which to Publish
Name: SCImago
Developer/company: SCImago
Price: Free
URL: http://www.scimagojr.com
Description: SCImago is a portal that includes journal and country‑specific indicators developed from information in Elsevier's Scopus database. It measures the scientific influence of scholarly journals by accounting for both the number of citations received and the importance or prestige of the journals those citations come from. In other words, SCImago provides users with a weighted rank of journal influence.
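The idea of weighting citations by the prestige of the citing journal can be sketched with a simplified, PageRank‑style recurrence. This is an illustration of the general approach, not SCImago's exact SJR formula, which adds further normalizations:

\[ P(j) = \frac{1-d}{N} + d \sum_{i \to j} P(i)\,\frac{c_{ij}}{C_i} \]

where \(P(j)\) is the prestige of journal \(j\), \(N\) is the number of journals, \(d\) is a damping factor (0.85 in classic PageRank), \(c_{ij}\) is the number of citations from journal \(i\) to journal \(j\), and \(C_i\) is the total number of references in journal \(i\). Prestige flows along citation links, so a citation from a highly cited journal counts for more than one from an obscure journal.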
REFERENCES
Article‑Level Metrics Information. Retrieved July 14, 2014 from http://www.plosone.org/static/almInfo.
Brody, T., Harnad, S., Carr, L., 2006. Earlier Web usage statistics as predictors of later citation impact. J. Assoc. Inf. Sci. Technol. 57 (8), 1060–1072. http://dx.doi.org/10.1002/asi.20373.
Buschman, M., Michalek, A., 2013. Are alternative metrics still alternative? Bull. Assoc. Inf. Sci. Technol. 39 (4), 35–39.
Byrne, R., 2012. RefDot: a Chrome extension for organizing reference materials [Blog post]. Retrieved from Free Technology for Teachers website: http://www.freetech4teachers.com/2012/04/refdot-chrome-extension-for-organizing.html#.U8QXRJRdWSo.
Byrne, R., 2014. 5 tools that help students organize research and create bibliographies [Blog post]. Retrieved from Free Technology for Teachers website: http://www.freetech4teachers.com/2014/04/5-tools-that-help-students-organize.html#.U7wzG5RdWSq.
Caragea, C., Wu, J., Ciobanu, A., Williams, K., Fernández‑Ramírez, J., Chen, H., Wu, Z., Giles, L., 2014. CiteSeerX: a scholarly big dataset. Advances in Information Retrieval: 36th European Conference on IR Research, ECIR, Amsterdam, The Netherlands, April 13–16, 2014. Proceedings, 311. http://dx.doi.org/10.1007/978-3-319-06028-6_26.
Citation Analysis: Measure Your Research Impact, 2014. Retrieved July 14, 2014 from http://latrobe.libguides.com/content.php?pid=460592&sid=3770409.
Cormier, D., 2012. Google Scholar Citations. Retrieved June 8, 2014 from A Quick Look at Google Scholar's 'My Citations' website: http://guides.library.cornell.edu/content.php?pid=422684&sid=3811338.
Cornell University Library, 2015. Measuring your research impact: Google Scholar Citations. Retrieved October 26, 2015, from Cornell University Library website: http://guides.library.cornell.edu/c.php?g=32272&p=203399.
DataCite Metadata Search, 2013. Retrieved July 14, 2014 from Digital Curation Centre website: http://www.dcc.ac.uk/resources/external/datacite-metadata-search#sthash.cfiqkILT.dpuf.
Davis, P., 2009. PLoS releases article‑level metrics [Blog post]. Retrieved from The Scholarly Kitchen website: http://scholarlykitchen.sspnet.org/2009/09/22/plos-releases-article-level-usage-data/.
Dess, H.M., 2006. Database reviews and reports: Scopus. Osorio, N. (Ed.), Rutgers University. http://dx.doi.org/10.5062/F4X0650T.
FAQ, 2014. Retrieved May 24, 2014 from PLOS API website: http://api.plos.org/alm/faq/.
Fenner, M., 2013. New DataCite/ORCID integration tool [Blog post]. Retrieved from PLoS Blogs website: http://blogs.plos.org/mfenner/2013/05/18/new-datacite-orcid-integration-tool/.
Fenton, W., 2012. Zotero 3. Retrieved July 1, 2014 from PC Magazine website: http://www.pcmag.com/article2/0,2817,2403446,00.asp.
Griffin, D., 2013. ImpactStory: tabulating tomorrow's research. Inf. Today 30 (8), 8.
Harzing, A.W., 2007. Publish or Perish. Available from http://www.harzing.com/pop.htm.
Hirsch, J.E., 2005. An index to quantify an individual's scientific research output. Proc. Natl. Acad. Sci. U.S.A. 102 (46), 16569–16572. Retrieved May 24, 2014 from http://www.pnas.org/content/102/46/16569.
Howard, J., 2013. Rise of "altmetrics" revives questions about how to measure impact of research. Chron. Higher Educ. 59 (38), A6–A7.
Impactstory, 2014. FAQ. Retrieved May 10, 2014 from https://impactstory.org/faq.
InfoProMom, 2011. Mendeley, CiteULike reviews [Blog post]. Retrieved from http://infopromom.wordpress.com/2011/08/23/thing-14-mendeley-citeulike-reviews/.
InfoToday, 2008. Thomson Reuters releases Reference Manager 12 for Windows. Retrieved July 14, 2014 from Information Today, Inc. website: http://newsbreaks.infotoday.com/Digest/Thomson-Reuters-Releases-Reference-Manager-12-for-Windows-50580.asp.
Johnson, 2013. Using BibTeX for referencing [Blog post]. Retrieved from Research Methods website: http://math65740.blogspot.com/2013/10/using-bibtex-for-referencing.html.
Konkiel, S., 2011. Binary battle finalists announced [Blog post]. Retrieved from http://blogs.plos.org/plos/2011/11/binary-battle-finalists-announced/.
Konkiel, S., 2014. At http://blog.impactstory.org/contest-winner/.
Kwok, R., 2013. Research impact: altmetrics make their mark. Nature 500 (7463), 491–493.
Laakso, M., Björk, B.‑C., 2013. Delayed open access: an overlooked high‑impact category of openly available scientific literature. J. Am. Soc. Inf. Sci. Technol. 64 (7), 1323–1329.
Lagace, N., 2013. National Information Standards Organization: NISO to develop standards and recommended practices for altmetrics [Press release].
La Trobe University, 2015, July 22. Citation analysis: measure your research impact. Retrieved October 26, 2015, from La Trobe University website: http://latrobe.libguides.com/content.php?pid=460592&sid=3770409.
Liu, C.L., Xue, Y.Q., Wu, H., Chen, S.S., Guo, J.J., 2013. Correlation and interaction visualization of altmetric indicators extracted from scholarly social network activities: dimensions and structure. J. Med. Internet Res. 15 (11), 6. http://dx.doi.org/10.2196/jmir.2707.
López‑Cózar, E.D., Robinson‑García, N., Torres‑Salinas, D., 2012. Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting. Retrieved May 17, 2014, from http://arxiv.org/abs/1212.0638.
MacRoberts, M.H., MacRoberts, B.R., 2010. Problems of citation analysis: a study of uncited and seldom‑cited influences. J. Am. Soc. Inf. Sci. Technol. 61 (1), 1–12.
Mounce, R., 2013. Open access and altmetrics: distinct but complementary. Bull. Assoc. Inf. Sci. Technol. 39 (4), 14–17.
Olsen, J., n.d. RefWorks testimonials. Retrieved July 1, 2014 from RefWorks website: https://www.refworks.com/content/testimonials/default.asp.
Piwowar, H., 2013. Altmetrics: value all research products. Nature 493 (7431), 159. http://dx.doi.org/10.1038/493159a.
Read, K., 2013. Altmetrics and evaluating scholarly impact: what's out there and how can we participate? [Blog post]. Retrieved from Kevin the Librarian website: http://kevinthelibrarian.wordpress.com/2013/03/09/altmetrics-and-evaluating-scholarly-impact-whats-out-there-and-how-can-we-participate/.
Roemer, R.C., Borchardt, R., 2012. From bibliometrics to altmetrics: a changing scholarly landscape. Coll. Res. Lib. News 73 (10), 596–600. Retrieved from http://crln.acrl.org/content/73/10/596.full.
Rousseau, R., Ye, F.Y., 2013. A multi‑metric approach for research evaluation. Chin. Sci. Bull. 58 (26), 3288–3290.
Shankland, S., 2013. Academia.edu raises funds to build a Facebook for scientists. Retrieved July 1, 2014 from CNET website: http://www.cnet.com/news/academia-edu-raises-funds-to-build-a-facebook-for-scientists/.
Swoger, B.J.M., 2013. Reference eReviews. Retrieved July 1, 2013 from Library Journal website: http://reviews.libraryjournal.com/2013/08/reference/ereviews/referene-ereviews-august-15-2013/.
Tarma Software Research, 2014. Citation metrics. Retrieved May 24, 2014 from http://www.harzing.com/pophelp/metrics.htm#gindex.
Terkko, 2014. ReaderMeter alpha. Retrieved October 26, 2015, from Terkko Navigator/ReaderMeter website: https://www.terkko.helsinki.fi/readermeter.
Thelwall, M., Haustein, S., Larivière, V., Sugimoto, C.R., 2013. Do altmetrics work? Twitter and ten other social web services. PLoS ONE 8 (5), 1–7. http://dx.doi.org/10.1371/journal.pone.0064841.
Torres, D., Cabezas, Á., Jiménez, E., 2013. Altmetrics: new indicators for scientific communication in Web 2.0. Comunicar 21 (41), 53–60. http://dx.doi.org/10.3916/C41-2013-05.
Wang, X., Mao, W., Xu, S., Zhang, C., 2014. Usage history of scientific literature: Nature metrics and metrics of Nature publications. Scientometrics 98 (3), 1923–1933.
Weaver, M., 2014. Our users. Retrieved July 1, 2014 from Mendeley website: http://www.mendeley.com/our-users/.
Wecker, M., 2014. Should you share your research on Academia.edu? Retrieved July 1, 2014 from Chronicle Vitae website: https://chroniclevitae.com/news/345-should-you-share-your-research-on-academia-edu.
Wells, J., 2013. Citing sources part 2: CiteThisForMe [Blog post]. Retrieved from http://madamewells.blogspot.com/2013/10/citing-sources-part-2-citethisforme.html.
West, K., 2013, October 30. Reviews: Citelighter [Online forum post]. Retrieved from Citelighter website: https://edshelf.com/tool/citelighter/.
Wheeler, L., 2014. Kudos integrates Altmetric data to help researchers see online dissemination of articles [Blog post]. Retrieved from Digital Science website: http://www.digital-science.com/blog/posts/kudos-integrates-altmetric-data-to-help-researchers-see-online-dissemination-of-articles.
Yang, S.Q., Dawson, P.H., 2014, November 6. Altmetrics: learn new metrics to showcase the impact of your research. Conference presentation at the 2014 Northeast E‑Learning Consortium Conference, Villanova University, PA, The Villanova Institute for Teaching and Learning.
CHAPTER 6
EBook discovery metadata
Donna E. Frederick
Synopsis
So you have your eBook collection: how do you make these resources discoverable to your patrons?
Abstract
Discovery metadata is essential for eBooks as these resources have no physical presence and can't be discovered through physical browsing of a collection. This chapter explores the difference between creating metadata for print and electronic resources and offers a new definition for discovery metadata.
6A Discovery metadata: An introduction
Discovery metadata for eBooks is a substantive topic for metadata librarians in academic libraries. It would not be surprising to hear that many librarians began to read this book expecting to find primarily information about and instruction in the creation of discovery records, and that many find it puzzling that eBook discovery metadata is discussed so far into the book.
The reality is that with eBooks, the metadata created for various steps in the workflow helps to inform subsequent steps. In addition, as we have already seen, eBooks aren't static resources in the library's collection. EBooks can automatically come and go from the collection and the DRM can change over time, to name just a few examples of how eBooks and eBook collections differ from hard copy resources. Ultimately, to be able to catalogue or otherwise create discovery metadata for eBooks, the eBooks must first be acquired and access must be established. As eBooks can't be received and physically processed through a workflow in the same manner as hard copy resources, having adequate metadata to represent the resource and track it through various workflows is essential. Therefore, much needs to be in place before a library is ready to make an eBook or eBook package discoverable. An eBook metadata management plan needs to ensure that all of the preliminary information about the resources, subscriptions, and platforms has been recorded in an adequate and useful form and location. Now that those issues have been addressed, attention can be given to the topic to which cataloguing and metadata librarians seem to be naturally drawn: the creation and management of discovery metadata.
6.1 Structure of the discovery metadata chapter and parts
Part A of this chapter constitutes an introduction to the topic of discovery metadata. It is expected that readers will approach this chapter with different types and amounts of experience and different interests in learning more about discovery metadata. While the experienced cataloguing or metadata librarian may find much of this introductory section rudimentary, there will hopefully be a few new and interesting tidbits of information. For other librarians who have not worked with cataloguing, have not done so in a very long time, or have only catalogued hard copy resources, this chapter is intended to provide a baseline of concepts and information.
For those librarians who have not been actively working in the area of cataloguing or are new to librarianship in general, the notes sections of the discovery metadata chapter may be of particular interest. The information presented in these notes has been placed there, rather than in the body of the chapter, for the benefit of currently active cataloguers and metadata librarians, who may wish to read through Part A briskly and then browse the notes at the end to see if there is anything of interest to them.
The subsequent parts of the discovery metadata chapter are divided into specific topics in a manner intended to facilitate reading and study as well as future reference once the library's metadata plan has been implemented. Not all topics will be relevant, or appear to be relevant, to all libraries; however, it may be useful for the reader to at least skim through the content to be aware of it in case its relevance becomes evident during the process of creating the metadata plan. As with the previous chapters, please take time to review the answers to previous surveys to see if anything needs to be updated or appended as topics related to discovery are explored.
6.2 What is discovery metadata?
As previously mentioned, in many library contexts discovery metadata for eBooks takes the form of MARC records in library catalogues. However, libraries may use other forms of discovery metadata instead of, or in addition to, traditional MARC records. A more useful answer to this question focuses not so literally on what form eBook metadata takes, as there are many possible useful forms, but on its purpose and how it functions, as well as why good quality eBook discovery metadata is essential.
Upon reflection, the author realizes that her introduction to the concept of discovery metadata came when she took her first cataloguing class in the 1980s. At the time the phrase "discovery metadata" was not used. In fact, the concept of discovery was not discussed and the term metadata was not mentioned. Instead, the course focused on creating "surrogates"1 for the resources purchased by the library, in the form of card catalogue records. There was a significant amount of emphasis on what is often called "descriptive cataloguing,"2 although subject analysis and classification were also taught. This class taught prospective teacher‑librarians how to create card catalogue cards for their school libraries. What were created were literally little cards onto which a significant amount of information needed to be typed. A round hole was punched into the bottom of each card and the cards would eventually be interfiled in card catalogue drawers. A rod would be pulled out of each drawer and the new cards would drop into place. The new catalogue records were then secured in their new home in the drawer when the rod was reinserted through the card holes. While card catalogues undoubtedly can still be found in school libraries around the world, in most academic libraries this type of cataloguing now appears to be a somewhat antiquated and arduous but also curiously quaint practice. Understanding how cataloguing has changed in the last three decades is a useful way to come to an appreciation of the importance of contemporary eBook discovery metadata.
The fact that the term "surrogate" was used repeatedly during the course the author took all those years ago is indicative of the believed purpose of metadata creation during that era. There was an idea that in creating the card catalogue record librarians were able to represent the book or other resource on a small card catalogue card. Librarians of that era were trying to represent what a person might see if she or he were to have the book in hand, in terms of characteristics such as author, title, number of pages, size of book, and whether or not it is illustrated. Rather than requiring patrons to browse through the entire library, the card catalogue was intended to allow them to flip through cards and do a simulated browsing, or at least limit the searching around patrons needed to do to the compact area of the card catalogue. There was a main card for each book, so to speak, in each card catalogue. The phrase "main entry"3 remains in cataloguing today as an artifact from that era. This is the card that had the most detail about the resource itself. Then there were "added entries"4 and "tracings,"5 which all redirected the patron to the information on the main entry card. There was what was called a "shelf list,"6 which was a set of cards filed according to classification number and thus reflected how the books would sit on the shelf. The shelf list was generally hidden away in a nonpublic area for the use of library staff. Of course, of greater interest to patrons would be the cards that were created according to the author, title, or subject of the book. For school libraries, the author was taught to integrate all of the cards into a single card catalogue but was also told that if the library is very large there could be separate card catalogues for subjects, titles, and authors. In order to get this complex system to work and all of the important information to fit on the cards, a system of rules and abbreviations was required, which precipitated the eventual creation of the Anglo‑American Cataloguing Rules (AACR2), AACR2 abbreviations,7 ISBD (including ISBD punctuation),8 and other rules and conventions. Given that librarians in most libraries during more than half of the twentieth century had to catalogue their collections without the benefit of computers, computer networks, or the web, the system that was developed was incredibly efficient and effective.
Considering the big picture of traditional cataloguing, the idea was that patrons would systematically search through a collection of surrogates, and cross references to surrogates, that were logically organized in a compact, efficient system of physical reference cards. The information retrieved from these cards would then direct patrons to the appropriate area within the physical collection from which they could retrieve the information or resources required. In such an environment, directly interacting with and manipulating a physical search tool was necessary to locate physical resources that were situated in a real space. Traditional cataloguing proved over the years to be very efficient in supporting the discovery of the library's resources and thus has come to be identified as the most important discovery metadata used in library contexts. To this day, as previously discussed, MARC records remain a critical element for supporting the effective discovery of eBook content in many academic libraries.
While the author believes that there is benefit in reflecting upon the nature and origin of traditional cataloguing, she believes that it is equally important to reflect upon the ways in which the nature of eBooks and eBook discovery is not supported in such an environment, and also to envision ways in which eBook discovery can be optimized. To begin with, eBooks do not have a physical presence in libraries and don't sit on a shelf or have a location that is relative to other resources in the library. Patrons can't browse through the shelves and serendipitously discover an eBook. EBooks don't have the majority of the physical characteristics of print books, and thus attempting to record such information is often irrelevant. This is particularly true for born digital eBooks, where even basic characteristics such as the number of pages cannot be determined. In Chapter 2, which discusses the disruptive nature of eBooks, a number of other characteristics were discussed that indicate how eBooks are distinctly different from their print counterparts. These include the fact that OPACs generally can't tell users when an eBook is in use by the maximum number of allowed users, or when the payment of fees to use an eBook platform is in arrears and access to the eBooks has been temporarily suspended. Essentially, within the context of the library's collection, eBooks are an invisible resource. Patrons can't walk in the door of a library and see them. Managers can't see piles of uncatalogued eBooks stacking up in the technical services department. Even if library staff and patrons are aware that the library purchased a certain eBook, it is not possible to know from looking around the library whether or not that eBook is available to use and what can be done with it. Just as metadata is necessary to process eBooks and eBook collections through acquisitions workflows, metadata is also the critical element in making eBooks discoverable for both library staff and patrons. Human beings don't have the ability to browse through electronic files as they exist, without any intermediation, in order to identify which files are eBooks and which contain other types of electronic information, let alone identify an eBook of interest and actually make use of that file. The physical nature of human beings makes it impossible for them to directly interact with electronic files in storage media the way that a human being can interact with the cards in a card catalogue or books on a shelf.
So, what does the inaccessible nature of eBooks mean for libraries? It means that discovery metadata is essential. Whereas traditional cataloguing is based on the model of creating a physical surrogate for a physical resource, eBook metadata should be based on a model of creating an electronic surrogate for an electronic resource. That electronic surrogate is still not directly usable by human beings but can be made useful by applications that read, process, and display the information found in the surrogates. Those applications include, but are not limited to, library OPACs and discovery systems. In fact, in recent years the author has noticed that a number of problematic records are confusing to patrons because they have been created on the model of physical surrogates for physical resources. When this model is used, time is wasted adding metadata that is unnecessary and/or confusing with regard to eBooks, while other information that would be helpful to patrons is not added. Unfortunately, there are many existing MARC records for eBooks that were created using the older cataloguing model. Many of them were created in the pre‑RDA era, when existing records for the hard copy version were converted to eBook records via an automated process. While such methods allowed for the rapid and efficient creation of metadata for large collections of eBook content, where key access points such as author, title, subjects, and other contributors were present and controlled to the same level of accuracy as the print catalogue records, the records aren't necessarily optimal for representing eBook content. It is important that metadata and cataloguing librarians recognize that many of their existing eBook records created during the period of about 2008 to 2012 and supplied to the library in record sets may have been created using a conversion process. These records may be quite functional in a traditional OPAC but may begin to show signs of being less than functional in the newer discovery systems, and will likely continue to be problematic in the future. Some eBook package vendors have recently offered replacement record sets for some or all of their eBook MARC records. Libraries should replace the old records with the new RDA ones, as the theoretical framework behind RDA (FRBR) does take into consideration important distinctions in the form and format of eBooks that are relevant to patrons and library staff alike. RDA records should prove more effective and functional in discovery systems, now and in the future, than the older "converted from print" MARC records.
Considering the discussion in this section, the author proposes a new definition for eBook discovery metadata that she believes will be useful for the purpose of creating discovery metadata that will be functional today and in the future. This definition is:
Discovery metadata are structured electronic representations of resources. Theserepresentations are intended to be used by an application (or have the potential to beused by multiple applications) that facilitates the process of assisting human beings inlocating, accessing, and using resources and information.
There are many key considerations in this definition. For readers who have read all of this book previous to this section, the majority of ideas will not be new. Considerations include the following:
(1) Metadata must be structured. Following one of the current or emerging metadata standards for eBook discovery metadata will address this concern.
(2) By following the newer standards, metadata records should be usable by applications that have been written to use metadata coded to the relevant standard. In today's library environment, metadata is increasingly being shared, migrated, and otherwise transferred between applications and environments. Interoperability of metadata is becoming increasingly important in academic library environments, and different products and services now use the same metadata that was once intended almost exclusively for use by an ILS and OPAC.
(3) The metadata that is created must first and foremost be parsed efficiently and effectively by computers. As RDA training typically points out, cataloguing standards are not display standards. Instead, cataloguing or metadata standards are intended to create systematic, standardized metadata. Programs are then created that search the metadata and display the results in a way that is suitable for the user community. The ability to create programs that perform as expected relies in part on the fact that those who created the metadata in the first place followed whichever standard or standards have been adopted for that metadata container. For library staff who have long been exposed to the appearance of ISBD‑influenced metadata, there may be a strong temptation to "tweak" metadata away from the current standard, such as RDA, to make the entries "look better." Ironically, these well‑meaning attempts to improve the display of the record can actually make the programming perform less well in the long run, or even malfunction, depending on the nature of the "tweak." Staff involved with cataloguing and other discovery metadata creation should therefore be trained in the principle that complying with international standards is essential. Instruction should emphasize that the metadata will be read by a computer; that consistency and compliance with the standards is an absolute requirement, since computer programs generally can't make the types of visual interpretations that a human being makes automatically; and that a computer program has no concern for the usual human aesthetics. It is in the interface where display, aesthetic, and readability concerns should be addressed, not in the metadata itself (see the sketch following this list).
(4) Finally, the bottom line is that the purpose of creating the metadata in the first place is to assist human beings in the tasks they need to perform. The FRBR theoretical model has identified the user tasks as "find," "identify," "select," and "obtain." The concepts related to the user tasks have been embedded in RDA and also have a presence in BIBFRAME.9 While those involved with creating metadata should not "tweak" the metadata they create away from the standard they are trying to apply, the experience of the user does ultimately need to be evaluated, and if the metadata appears to fail to support the key user tasks, then further investigation is required. The librarian should seek to understand the cause of the failure. Does the metadata appear to be adequate, while the design of the interface is confusing to the user? Or is the manner in which metadata is displayed incongruent with or inadequate for the needs of the user population? If so, options for updating the interface design or replacing it altogether may be appropriate. Perhaps the librarian discovers that metadata produced within the framework of the emerging metadata standards is still failing the needs of users and the tasks they need to perform. Given that many of the new and emerging metadata standards and practices have been designed with newer library resources such as eBooks in mind, it is reasonable that libraries will want to take advantage of the new approaches. While the newer ways of doing things have already shown some benefit to libraries, as will be discussed in later sections of this chapter, the new standards, practices, guidelines, and metadata containers are not yet mature and have not stood the test of time in various academic library contexts. It is reasonable to expect that librarians will discover problems and shortcomings. This is where the academic librarian can play an important role in furthering the development of the LIS discipline. This is particularly true for academic librarians who are actively involved with research and publishing; these librarians may use the shortcomings they discover as research topics. Finding solutions to the problems may also create opportunities for working collaboratively with other libraries to innovate solutions. Regardless of whether such options are possibilities, all academic librarians can report their findings to the organizations responsible for the development and maintenance of the standards or guidelines and follow the discussions in journals and at conferences.
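The point in item (3) about machine parsing and display‑layer punctuation can be illustrated with a small sketch using the open source pymarc library. The file name is hypothetical, and the fields shown assume records coded to the MARC 21 standard; the program depends entirely on consistent coding, and any punctuation the user sees is added by the program at display time, not stored in the metadata.

    # Sketch using the open-source pymarc library: machine parsing of MARC 21
    # records depends on consistent field/subfield coding, and the display
    # punctuation is added here, in the interface layer, not in the metadata.
    from pymarc import MARCReader

    with open("ebook_records.mrc", "rb") as fh:   # hypothetical record set file
        for record in MARCReader(fh):
            if record is None:
                continue  # skip records pymarc could not parse
            field_245 = record["245"]
            title = (field_245["a"] if field_245 else None) or "[no title]"
            # 856 $u holds the access URL for an online resource; a stray
            # "tweak" to the coding would silently break this extraction.
            field_856 = record["856"]
            url = field_856["u"] if field_856 else None
            # Trailing ISBD punctuation is stripped and layout is supplied
            # by the display program, not by the catalogue record itself.
            print(title.rstrip(" /:"), "--", url or "no link found")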
Hopefully, Part A has set the scene for the sections to come and provided both some background information and a useful definition for discovery metadata. The following sections will, no doubt, be of great interest to many readers.
Notes
1. The ODLIS defines surrogate as the following: "A substitute used in place of an original item, for example, a facsimile or photocopy of a document too rare or fragile to be handled by library users, or an abstract or summary that provides desired information without requiring the reader to examine the entire document. In preservation, a surrogate is usually made in a more durable medium. In a library catalog, the description provided in the bibliographic record serves as a surrogate for the actual physical item" (see: http://www.abc-clio.com/ODLIS/odlis_s.aspx). It is interesting to note that the cataloguing‑related definition of a surrogate is listed last in this definition, without significant explanation. It is difficult to say whether this is just a coincidence or a reflection of the fact that the concept of cataloguing as creating "surrogates" has gradually been falling out of favor as libraries increasingly catalogue nonphysical items.
2. Descriptive cataloguing is a somewhat problematic term in the sense that even within the field of LIS it has different meanings. In looking at the Library of Congress' Descriptive Cataloging Manual (DCM) (see http://www.loc.gov/catdir/cpso/dcmz1.pdf), it appears that pretty much every aspect of traditional library cataloguing is "descriptive cataloguing." In practice, many cataloguers and metadata librarians consider descriptive cataloguing to be the creation of metadata for title, author, publication, and physical description as relevant to the resource being catalogued, with other fields in MARC 21 records containing what is essentially access and technical metadata.
3. The ODLIS defines "main entry" as: "The entry in a library catalog that provides the fullest description of a bibliographic item, by which the work is to be uniformly identified and cited. In AACR2, the main entry is the primary access point. In the card catalog, it includes all the secondary headings under which the item is cataloged (called added entries). For most items, main entry is under name of author. When there is no author, main entry is under title" (see: http://www.abc-clio.com/ODLIS/odlis_m.aspx). The idea that a single resource would be listed on more than one card in more than one section of a card catalogue, but that only one card would have all of the details about that resource, has become somewhat lost in a MARC environment where there is only one bibliographic record and all of the detailed cataloguing is found on that record. Remnants of the old cataloguing days remain present in the technical details of the MARC 21 indicators, in the sense that a properly coded MARC record can still be used to print hard copy card catalogue cards. The author has noted that many nonlibrarian cataloguers who have only ever worked in MARC cataloguing environments meticulously code main and added entry fields in MARC records using the appropriate indicators, but when asked why this type of coding is necessary, respond either that they don't know or that "it's the way we've always done it." As eBook metadata creation gradually moves toward the creation of metadata that is useful in linked data environments, it is important to keep in mind that some cataloguers who have significant experience working only in a MARC environment may require extra instruction in order to feel secure that they are not losing something important as practices change and coding that no longer serves a function is gradually deprecated in the cataloguing standards.
4. The ODLIS defines added entry as simply, "A secondary entry, additional to the main entry, usually under a heading for a joint author, illustrator, translator, series, or subject, by which an item is represented in a library catalog" (see: http://www.abc-clio.com/ODLIS/odlis_a.aspx#addedentry). In the MARC environment, which is flat in nature, the original meaning of added entry has become somewhat muddled, in the sense that added entries are simply more access points in the same record and don't represent additional entries or areas of the card catalogue where the resource is catalogued, as was the case in the physical card catalogue.
5. The concept of tracings is an interesting relic from the days of the card catalogue. Their intention was to aid in the maintenance of a physical catalogue, and there has been relatively little need for them in computerized cataloguing environments, but the concept has persisted through the years. The ODLIS defines tracings as "A record of the additional headings under which a bibliographic item is listed in a library catalog, usually associated with the main entry, enabling the cataloger to 'trace' all the entries referring to the item whenever a change or correction is made or when the item is withdrawn from the collection" (see: http://www.abc-clio.com/ODLIS/odlis_t.aspx#tracings). Despite the fact that the original purpose of tracings has long been obsolete for the majority of academic libraries, the underlying idea, that a method is needed to allow library staff to draw together a subset of catalogue records for maintenance or deletion, is a critical need in eBook metadata management. This need will be discussed in detail in later sections of this chapter.
6. Like tracings, the creation and use of the shelf list is tied to the long‑term management of hard copy collections and metadata in a card catalogue environment. The ODLIS defines shelf list (or shelflist) as: "A nonpublic catalog of a library collection containing a single bibliographic record for each item, filed in the order in which the items are arranged on the shelf (usually by call number), used for inventory because it contains the most current information on copy and volume holdings. Card shelflists are being phased out by libraries that have converted their catalogs to machine‑readable records" (see: http://www.abc-clio.com/ODLIS/odlis_s.aspx). Many librarians would argue that the concept of the shelf list, and practices related to "shelf listing" or adjusting call numbers so that resources will file alphabetically by main entry, are completely irrelevant in a computerized cataloguing environment. The relevance of shelf listing is questionable in libraries where an ILS contains item and holdings records and allows for highly flexible sorting and display of records. Those libraries that do regular inventories of their collections will likely not see the need to attempt to print out an entire shelf list of their holdings, given all of the other options available to them. However, many experienced cataloguers cling to practices related to adjusting catalogue records to display in "proper shelf list order." In an eBook environment, where the eBooks do not sit on a physical shelf, the idea of attempting to create a shelf list is particularly puzzling. In fact, it is common for catalogue records for electronic resources not to be assigned classification numbers, because of the extra work involved in doing so without any practical need for it: eBooks do not sit on a shelf and thus don't need to be assigned a location. If the reader has not yet addressed the issue of applying call numbers to eBooks and adjusting those numbers to fit a "shelf list," it is something that should be addressed during the creation of the eBook metadata plan. This topic will be discussed in more detail in a later section of this chapter.
7. AACR2 abbreviations were originally listed in Appendix B of the AACR2 publication. The original idea behind these abbreviations was to create a systematic and standardized way to represent commonly found and repeated terms and phrases in card catalogue records. In the card catalogue environment this was a necessary and efficient way to address the limited amount of readable text that could be placed on card catalogue cards, and also to limit the amount of card space required, and thus the size of the card catalogue furniture and the drawers of cards contained within it. Even in early computerized cataloguing environments it could be argued that the cost of memory justified the use of abbreviations. These abbreviations have come under scrutiny in recent years because they are no longer required in contemporary computerized environments and are not widely understood in a diverse international context. In fact, "Anglo‑American" abbreviations may actually present a barrier for some library patrons. Despite the fact that RDA has largely eliminated the use of AACR2 abbreviations, there is no doubt that librarians who are managing eBook discovery metadata will encounter their use in their eBook discovery records.
8. ISBD, or the International Standard Bibliographic Description, is a 40+ year‑old cataloguing standard that predates AACR2 and, as such, has been integrated within AACR2. The ODLIS defines ISBD as:
A set of standards adopted in 1971 by the International Federation of Library Associations (IFLA), governing the bibliographic description of items collected by libraries. The general standard ISBD(G) serves as a guide for describing all types of library materials. Standards have also been developed for specific formats: ISBD(CM) for cartographic materials, ISBD(PM) for printed music, ISBD(S) for serials, etc. ISBDs have been integrated into several catalogue codes around the world, including AACR2 (see: http://www.abc‑clio.com/ODLIS/odlis_i.aspx).
ISBD was created in an information environment that was still largely print based and, as such, was intended for the human eye rather than for the computer. As a result, the most significant legacy of ISBD that we see today is a particular tradition in the use of spaces and punctuation intended to make catalogue records easier for human beings to read. Sections of information are therefore divided up, even in today’s MARC records, by spaces, slashes, colons, semicolons, commas, and periods. Traditional OPACs also tend to display information from MARC records according to the sections outlined by ISBD. For the human eye, this use of punctuation and division of blocks of text makes records easy to read and understand, but it is not ideal for the world of computing. Any irregularities in how ISBD is applied to a record make it impossible for programs that rely on what is found in those records to produce accurate and error‑free displays and search results. In the current context, it makes much more sense to have each bit of information stored in its own clearly defined field and to have the programming in the search interface insert the required spacing and punctuation into the results to make the text more readable for human beings. (A brief illustrative sketch of this display‑layer approach follows these notes.)
ISBD and/or ISBD punctuation is another topic that librarians will undoubtedly encounter and need to deal with when addressing the topic of discovery metadata for eBooks. For example, MARC records that have been crosswalked from other metadata standards into MARC typically don’t contain ISBD punctuation and, where it is possible to systematically insert some, it is not always done with 100% accuracy. Both MARC and RDA support records that lack ISBD punctuation, as do the new discovery systems. However, many cataloguers and librarians feel strongly about the traditional appearance of records that adhere to ISBD standards. On the other hand, ISBD is not the standard used by the academic community for citing resources; standards such as APA or MLA are more commonly recognized and preferred. If a library has not yet come to terms with a gradual movement away from various aspects of ISBD as library metadata moves increasingly toward linked data environments, the reckoning will need to occur at some point.
9. Cataloguing and metadata librarians who have not already become familiar with BIBFRAME would do well to develop at least a basic understanding of this new metadata framework for libraries, which has been proposed and is under development. A useful place to begin investigating BIBFRAME is http://www.loc.gov/bibframe/. For those who are not yet familiar with BIBFRAME or linked data in general, the importance of building a basic understanding of them before undertaking the design of a new metadata management plan cannot be stressed enough. BIBFRAME has been proposed as a linked data solution that will eventually replace MARC in libraries. However, BIBFRAME has much greater potential for the discovery of information and library resources, and for the overall use and management of library metadata, than simply repurposing old bibliographic records. It is important for librarians to consider the possibilities for the new information environment that BIBFRAME has the potential to bring to libraries. As with many of the emerging standards and practices in library metadata creation and management, BIBFRAME is not ready for implementation and is not supported in any currently available commercial ILS/LMS, OPAC, or discovery system. However, a number of libraries around the world are working toward a real‑life implementation of BIBFRAME, and it is reasonable to begin to follow its development now. By understanding what is happening and why, librarians can limit the negative impact of any disruptions BIBFRAME may bring to the larger LIS environment and can recognize and make use of opportunities to move the library closer to its future goals as those opportunities present themselves.
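As promised in note 8, here is a minimal sketch, in plain Python with invented data, of the display‑layer approach described there: the ISBD punctuation is not stored in the metadata itself but is inserted by the interface when the record is displayed.

# A minimal sketch (all data invented): the display layer assembles an
# ISBD-style title statement from discretely stored pieces of metadata,
# so the stored fields themselves can remain punctuation-free.

def isbd_title_statement(title, subtitle=None, responsibility=None):
    """Join title parts using ISBD spacing and punctuation for display."""
    statement = title
    if subtitle:
        statement += " : " + subtitle        # ISBD: space, colon, space
    if responsibility:
        statement += " / " + responsibility  # ISBD: space, slash, space
    return statement + "."

print(isbd_title_statement(
    "An example eBook",
    subtitle="a hypothetical subtitle",
    responsibility="Jane Author"))
# Displays: An example eBook : a hypothetical subtitle / Jane Author.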
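For readers encountering linked data for the first time, the following minimal sketch (Python with the rdflib library; the example.org URIs and the title are invented) shows a fragment of bibliographic information expressed as subject–predicate–object triples using the published BIBFRAME 2.0 vocabulary, rather than as fields in a flat record.

# A minimal sketch, assuming the rdflib library; the example.org URIs and
# the title are invented. Each g.add(...) call asserts one triple.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")

g = Graph()
instance = URIRef("http://example.org/instance/1")  # hypothetical eBook
title = URIRef("http://example.org/title/1")        # hypothetical title node

g.add((instance, RDF.type, BF.Instance))   # "this thing is a bf:Instance"
g.add((instance, BF.title, title))         # the instance has a title...
g.add((title, RDF.type, BF.Title))
g.add((title, BF.mainTitle, Literal("An example eBook")))

print(g.serialize(format="turtle"))        # human-readable Turtle output

Note that nothing here resembles a record: systems built on such data query a graph of statements directly.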
6B MARC 21 discovery metadata
6.3 Why MARC?
Part B of this chapter includes a series of sections that address the topic of MARC 21 discovery metadata. MARC 21 is expected to be gradually replaced as the container for bibliographic or eBook discovery metadata in academic libraries within the relatively near future by a new container, which likely will be BIBFRAME or some other container based on the linked data model. Even so, it is important to keep in mind that there are currently millions of eBook discovery records that have been created using the MARC 21 standard. Given that discovery metadata is typically costly to create, and that well‑formed MARC 21 records have the capacity to be transformed and upgraded into other types of metadata, it is reasonable to expect that the existing MARC records, along with the records created today and in the immediate future, will continue to play a significant role in supporting eBook discovery well into the future.
Some readers may question the amount of space in this book dedicated to MARC metadata considering that MARC is commonly viewed as being “on its way out.” The author does not dispute that, as this book goes to press, the MARC standard is nearing the end of its journey and is about to pass the torch of library discovery workhorse to a new standard that fits into the world of linked data, big data, and the semantic web. MARC is no longer as useful or as functional as a metadata container needs to be in today’s information environment and needs to be replaced. The reality, however, is that there are currently several decades’ worth of discovery metadata stored in MARC format. By understanding both where library metadata is today and where it has been, as well as studying the possibilities for the future, those who have to bring eBook discovery metadata from the twentieth to the twenty‑first century will have the information they need to make the best possible decisions and plans.
In addition to the task of shepherding eBook metadata into the twenty‑first century, there is the practical reality that many academic libraries currently use MARC metadata for eBook discovery, and it will be a while yet until libraries see a fully functional replacement for MARC that can be integrated into the larger library context. Even when such systems are developed, it will take time for them to be adopted widely. In the meantime many academic libraries will need to continue to use MARC as their primary source of eBook discovery metadata. At this point, it is not possible to know how long the transition to a non‑MARC environment will take, so it seems wise to make sure that a solid understanding of MARC is in the librarian’s toolkit.
As part of the toolkit readers are building as they work through this book, it is essential that all librarians who create and manage eBook metadata locally have an understanding of the MARC 21 standard. At a minimum, librarians must be able to use the written standard to interpret the fields, subfields, and indicators in records. For those librarians who are already well versed in MARC and/or traditional cataloguing, it may be useful to learn the basics of MARCXML[1] or of XML[2] in general, if they are not already familiar with XML metadata. While XML is not directly related to the content of this section, traditional cataloguers who are well versed in the information found on the following pages may like to take the opportunity to delve into topics that may be new to them by focusing their attention on the resources mentioned at the beginning of the notes section for this chapter. Librarians who are new to technical services and librarianship and who don’t have a programming background will likely benefit from reading all parts of Part B intensively.
Considering its age and the relatively limited computing environment in which it was originally created, the MARC standard has done well to last as long as it has. While some librarians may suggest that it is not worthwhile for students and new librarians to learn the MARC standard, there are multiple reasons why cataloguing and metadata librarians need to learn and understand it. The first reason has already been mentioned: the sheer amount of existing eBook discovery metadata in the MARC format. Librarians who aren’t able to understand and make use of this metadata are at a disadvantage. The second reason is that MARC cataloguing, including its terminology and concepts, permeates the culture and language of academic library cataloguing. Even if a new metadata librarian doesn’t expect ever to catalogue in MARC, it is useful to understand the language and concepts of MARC in order to communicate effectively with colleagues and to make sense of documentation. Some of that documentation has been in existence for a long time and has been tested in a number of different environments, which makes it worth reflecting upon when trying to discover what is and is not functional and efficient in a metadata container. With less mature metadata standards it is hard to tell what is a true limitation and what is merely the reflection of issues and problems that will eventually be resolved.
6.4 What is the MARC 21 standard?
MARC (MAchine‑Readable Cataloging) has already been mentioned numerous times in this book, but in this section it will be addressed at the practical level. As discussed, MARC is the machine‑readable bibliographic metadata container for traditional cataloguing, or at least that was its original purpose. MARC 21 is the most recent iteration of the MARC standard, and one of its most significant characteristics is that it combined the variations on the standard previously used in different countries into a single international standard. Another notable point about MARC 21 is that it is expected to be the last version of MARC. MARC is about as mature as any metadata standard for a computing environment,[3] and this maturity is likely a significant cause of its omnipresence in academic library bibliographic databases despite the fact that other, more modern metadata containers, such as Dublin Core (DC) and MODS, have been developed over the years.
The MARC standard currently applies to records for bibliographic information (metadata for resources), authority metadata (controlled headings for names and subject headings), holdings records (metadata about the particular holdings in a given library or group of libraries), and classification data (metadata related to various classification schemes). This book is only concerned with MARC records for bibliographic data, although a thorough metadata plan will also include consideration of how a library may manage both holdings information[4] and authority control,[5] that is, the use of authority records to optimize the discoverability and accessibility of eBooks within its collection.
Despite the intention that MARC 21 be a single international standard, there are two versions of the MARC 21 standard. The Library of Congress version for bibliographic records is located at http://www.loc.gov/marc/bibliographic/ while the OCLC version is located at http://oclc.org/bibformats/en.html. The author does not promote either version of the standard but encourages readers to scan through both versions and generally be aware that two versions exist. The significance of this difference will be discussed later, in the discussion of copy and original cataloguing. Another fact to keep in mind is that, despite MARC 21 being essentially the latest and likely last version of MARC, it remains under revision. These revisions are being made to reflect changes in cataloguing theory and practice brought about by the introduction of RDA. Essentially, they have added more granularity to the standard so that it can accommodate the greater detail and specificity RDA requires. Because the OCLC version of MARC was under revision at the time this book was written, the Library of Congress (2014b) version is the standard that will be referenced in this publication. However, readers should familiarize themselves with both versions at some point.
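For readers who have never looked inside a MARC record, the following minimal sketch (Python, assuming version 5 or later of the widely used pymarc library, whose constructor details have changed between major versions; all bibliographic data is invented) builds a single title field to show the basic anatomy of MARC 21: a three‑digit tag, two one‑character indicators, and coded subfields. It then serializes the record as MARCXML, the XML container mentioned in Section 6.3.

# A minimal sketch, assuming pymarc >= 5 (earlier versions used a
# different subfield syntax); the bibliographic data is invented.
from pymarc import Record, Field, Subfield, record_to_xml

record = Record()
record.add_field(
    Field(
        tag="245",                # 245 = title statement
        indicators=["1", "4"],    # second indicator 4: ignore "The " in filing
        subfields=[
            Subfield(code="a", value="The example eBook :"),
            Subfield(code="b", value="an invented title /"),
            Subfield(code="c", value="Jane Author."),
        ],
    )
)

print(record)                 # tagged, human-readable view of the record
print(record_to_xml(record))  # the same record expressed as MARCXML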
6.5 Other eBook metadata containers
In talking about discovery metadata for eBooks, and considering the relatively broad definition of eBooks this publication encompasses, many of the “eBooks” in academic library collections could be held in locally created and hosted digital collections and/or institutional repositories (IRs), such as those that archive and make discoverable theses, dissertations, and faculty research as well as other university‑generated publications. There is no question that electronic monographs that fit into these categories need the highest possible quality of discovery metadata. The resources may, in fact, be unique or rare. They are likely resources upon which the reputation of the university and its faculty and researchers is built. Locally hosted digital collections and IRs may also contain critical information for researchers. Thus, there is no question that it is reasonable to invest the effort and resources required to make such resources discoverable and accessible. It is also important that “eBooks” or electronic monographs held in digital collections, IRs, or otherwise hosted locally be considered when creating the metadata management plan.
While some libraries have created MARC records for their locally hosted collections, chances are that many use other metadata containers for these resources. If the digital collection is relatively mature, it is likely that DC has been used. However, it is also possible that another common metadata schema such as MODS or PREMIS, or even a combination of schemas, has been used. Some libraries may have built a locally created system and metadata container to manage digital collections. A discussion of metadata for locally hosted digital monographs has been placed in this section of the chapter because this type of metadata generally is what we will later describe as “original cataloguing.” However, a detailed discussion of how to manage metadata in these collections is beyond the scope of this book. That being said, the importance and value of the resources require that their metadata be considered as part of the larger metadata management plan. To help reconcile the disparity between this book’s limited coverage of metadata for digital collections and IRs and the relative importance of those resources in the academic context, a section of the toolkit is dedicated to providing resources that will be useful for readers whose plan includes metadata for digital collections and IRs. For those readers who are primarily traditional cataloguers, familiarizing themselves with the resources in this section may also help them to build and expand their overall metadata management toolkit.
6.6 Original and copy cataloguing
Original cataloguing refers to the process of creating a catalogue record for a resource from beginning to end without relying on metadata from an existing catalogue record. If the metadata is transferred from another metadata format/schema or container[6] into MARC, the process is called “crosswalking”; whereas, if the metadata is simply copied from a MARC record stored elsewhere, the process is called “copy cataloguing.” The ODLIS defines copy cataloguing as:
Adaptation of a preexisting bibliographic record (usually found in OCLC, NUC, or some other bibliographic database) to fit the characteristics of the item in hand, with modifications to correct obvious errors and minor adjustments to reflect locally accepted cataloging practice, as distinct from original cataloging (creating a completely new record from scratch). Synonymous with derived cataloging (see: http://www.abc‑clio.com/ODLIS/odlis_c.aspx#copycataloging).
This definition appears to accurately describe both the historical and current approach to copy cataloguing. This type of copy cataloguing assumes that the cataloguer examines the resource to be catalogued and potential existing records on a resource‑by‑resource basis. There is another type of copy cataloguing where records are selected as a group and copied as a batch. This method will be outlined in Part C of this chapter in the discussion of bulk record processing.
From a theoretical point of view, original cataloguing is the most straightforward approach. The cataloguer has a resource in hand and systematically works through the applicable cataloguing standards and guidelines to create a complete catalogue record. In reality, original cataloguing is the most labor‑intensive and time‑consuming of all of the metadata creation methods discussed in this chapter. In some academic libraries, the complexity and specificity of resources in the collection often mean that librarians or subject specialists must do most or all of the original cataloguing, because it is a task that requires more specialized training and/or knowledge than is expected from library technicians and other nonspecialist cataloguers. Ideally, original cataloguing is kept to a minimum at most libraries because of the cost involved in creating records this way. However, when no records exist or can be retrieved, or when the existing records are not suitable for the needs of the academic library environment, an original record must be created.
With technologies such as z39.50[7] catalogue searching and record retrieval, once an original record is created at one library, it can potentially be shared with and reused by libraries around the world. Thus, the time and effort put into creating a high‑quality, standards‑based MARC record to optimize the discoverability of and access to the library’s own resources not only brings a benefit to local patrons but also, potentially, to libraries and library patrons everywhere. In addition, the reader’s library can benefit significantly from the time and effort that other libraries have put into creating good quality records.
6.6.1 The importance of training for cataloguers in academic and research libraries
For those readers who must create original catalogue records in MARC but have never done so, or who last catalogued before 2012, it is recommended that the reader seek some supplementary training in MARC cataloguing and/or RDA instruction. Because of RDA, cataloguing has changed significantly in recent years, and eBook MARC records are more effective as discovery metadata when they reflect RDA instructions. Training is sometimes offered by professional organizations as workshops or as preconference sessions. For the librarian who has never catalogued, training gained through workshops of three days in length or less will likely not be adequate. Alternatives that may offer more intensive and in‑depth instruction for those who require it include professional development courses offered by library schools and iSchools, multiweek online courses offered by professional or nonprofit information organizations, and mentoring from experienced cataloguers. The toolkit for Part B contains some useful reference resources and suggestions for places to seek training.
The effort invested in taking the time to learn original cataloguing in MARC properly is well worth it in the academic environment. The author has had the unfortunate experience of discovering catalogue records for electronic resources in her library’s catalogue that couldn’t be discovered by patrons because of multiple problems with the way the MARC tags and indicators were coded. In one case, an electronic thesis could not be retrieved by title or author because the author’s name was coded with the wrong tag and the title began with “the,” but the indicators for the title tags weren’t coded to reflect this. The mistakes were made by a person who was trained in neither cataloguing nor the MARC standard. While highly intelligent and competent in other areas, the untrained cataloguer did not understand the MARC standard or have training in the basic practices. The individual likely was selected to create the records because of competencies in a technical field and, perhaps, may have overgeneralized principles from that field into MARC cataloguing. Some basic cataloguing knowledge would have prevented many of the problems, which persisted for years and may still lurk in the library’s catalogue today. Unfortunately, the MARC standard is not intuitive and does not necessarily fit in with “how things are done” in other disciplines. Until the various problems were discovered and rectified, many unique and costly resources were not discoverable or accessible to library patrons. This experience remains in the mind of the author as a strong example of why those who create original catalogue records for electronic resources in academic libraries need to be well‑trained cataloguers. If an original record is being created, chances are that the resource is either unique or rare. These are the sorts of resources that, while they may not be popular, are often very important to researchers; they may be part of a unique or specialized collection in which both the library and the university take pride, and they may contribute positively to the university’s international reputation. In the context of an academic or research library, investing in training and supporting cataloguers who can create high‑quality metadata is an investment in current and future research as well as in the overall reputation of the university.
There are two more points relevant to this topic that are important for the librarian creating the metadata plan to consider. First of all, regardless of whether the reader is experienced in working with traditional cataloguing or not, everyone involved with creating and implementing an eBook management plan should become familiar with the basics of the semantic web and linked data. In addition to developing a basic understanding of these, it is important that the librarian keep abreast of developments in how libraries are planning to implement linked data‑based solutions for the discovery of their resources, and of any new or potential changes in the infrastructure for purchasing and managing those resources. Because linked data does not have a traditional record structure as does currently existing metadata, we can expect that an entirely new set of technological disruptions will occur in practically every aspect of library operations, considering that practically all library functions rely in some way on MARC records. The metadata manager who understands both how linked data and new technologies such as BIBFRAME are implemented in the library context, and the ways in which MARC records are used within his or her particular context, will be a crucial player in helping libraries to navigate their way into the new library environment. Some resources have been included in the toolkit section of this chapter to assist readers who are not already familiar with the semantic web and linked data or are unsure where to begin following the new developments. In addition, there will be a dedicated discussion of BIBFRAME in the final chapter of this book, as well as an example of how one library documented its understanding of how its MARC records are used in various systems within and outside the library. Given that learning about linked data and BIBFRAME may represent a significant learning curve for some readers, it is hoped that by gradually introducing these topics within the context of various discussions in this book, that learning curve may be somewhat reduced.
The second point regards the conflicting priorities within technical services or similar departments in academic libraries. Specifically, when valuable resources are acquired by the library, librarians, faculty, students, researchers, and other patrons would like to see the resources made discoverable and usable as quickly as possible and may express a desire for expedient access to resources over the quality of the records produced. The desire to make resources discoverable and accessible as quickly as possible is not necessarily in conflict with the values of those managing metadata. In fact, timely, efficient creation of metadata is a reason for creating an effective metadata management plan. The conflict occurs when it becomes apparent that original cataloguing is necessary for an eBook or collection of eBooks. When library staff are busy learning about new technologies and standards, and this includes taking the time to essentially relearn how to catalogue, it is very difficult to maintain the same level of productivity as in the past. Thus, it appears that “something has to give” and some aspect of productivity may have to be somewhat reduced in the meantime. Some libraries may have multiyear backlogs of original cataloguing, while other libraries may churn out quick eBook records that are not highly effective for all of the contexts in which they are used. As an increasing number of libraries find themselves in this situation, libraries are also finding new solutions to these conflicting demands. For example, librarians at the University of Illinois have created a simple interface they call Metadata Maker[8] (previously called MARC Maker), which library staff who are not cataloguers can use to collect the essential information for creating a significant part of a catalogue record. The records can then be saved in a MARC format (or in a choice of other metadata formats if required), edited in bulk by copy cataloguers using a program such as MARCEdit, and then enriched by a cataloguer before loading into the local catalogue. Thus, a cataloguing librarian only needs to be involved with the aspects of original cataloguing that require specialized knowledge or training. Because of the potential Metadata Maker has for freeing up the time of cataloguing librarians and specialist library assistants, the librarians at the University of Illinois who created this application, and who have inspired other librarians to find ways to make original cataloguing more efficient, are identified as tiger tamers.
Part of keeping up‑to‑date with emerging developments includes keeping up with the further development of tools such as Metadata Maker. While this is an important consideration for librarians to keep in mind, tools that help free up the time of librarians won’t be included in the toolbox for Part B because they most appropriately belong in Part C of this chapter, which deals with bulk processing. The key message here is that when cataloguing librarians find themselves pressed for time and a number of eBooks require original cataloguing, a viable option for dealing with the conflicting demands is to use a newer tool such as Metadata Maker to shift some of the less specialized aspects of original cataloguing to support staff. Taking this approach is an excellent example of how libraries can make a healthy and proactive adjustment in the face of a disruptive change.
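To give a sense of what such a bulk edit looks like when scripted (MARCEdit performs comparable edits through its own interface), here is a minimal Python sketch using pymarc; the file names and proxy prefix are invented, and the code assumes a single $u per 856 field.

# A minimal, hypothetical sketch of one common bulk edit: prepend a local
# authentication proxy prefix to each 856 $u. All names are invented.
from pymarc import MARCReader, MARCWriter

PROXY = "https://proxy.example.edu/login?url="  # invented proxy prefix

with open("vendor_records.mrc", "rb") as infile, \
        open("edited_records.mrc", "wb") as outfile:
    writer = MARCWriter(outfile)
    for record in MARCReader(infile):
        if record is None:
            continue
        for field in record.get_fields("856"):  # electronic location fields
            urls = field.get_subfields("u")
            if urls and not urls[0].startswith(PROXY):
                field.delete_subfield("u")                # remove the bare URL
                field.add_subfield("u", PROXY + urls[0])  # add the proxied URL
        writer.write(record)
    writer.close()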
6.6.2 When to create an original catalogue record for an eBook and how to do it
Because of the cost of doing original cataloguing, most libraries see it as the approach of last resort, to be used when the following options have been exhausted in order of preference:
(1) Acceptable‑quality MARC records supplied by an electronic resource vendor as part of the eBook price. (This may include records received as part of an automated record delivery service such as OCLC’s Collection Manager. These types of services will be discussed in detail in the next section of this chapter.)
(2) Low‑cost records supplied by a third‑party cataloguing vendor.
(3) MARC records extracted as needed or in bulk from a trusted z39.50 target.
(4) Records extracted from a knowledge base or open‑source non‑MARC metadata and crosswalked into MARC.
(5) Traditional copy cataloguing.
Options 2, 3, and 4 can have different levels of priority depending on the library’s approach to metadata management, the services and applications it uses, and the number of eBooks to be catalogued. In fact, if only one eBook is being catalogued, the most efficient approach may be traditional copy cataloguing first, as long as doing so fits into the larger metadata management framework.
As eBooks are often acquired in packages, and the packages contain anywhere from tens to hundreds, thousands, or even tens of thousands of titles, processes through which records can be retrieved and processed in bulk are generally preferred to any of the methods described in Part B. Yet original cataloguing and copy cataloguing are the backbone of MARC cataloguing. Successful bulk processing depends on the cataloguer understanding MARC cataloguing well enough to do original cataloguing, or at least to interpret records that others have created. Thus, while a librarian would likely never select original cataloguing as their “first choice,” it is the method that needs to be discussed first.
While many academic libraries may outsource their original cataloguing to other agencies such as a cataloguing vendor, the original cataloguing of some eBooks may be problematic in such arrangements. This may be particularly true with licensed resources when the license agreement and DRM won’t allow the third party to view the resource in order to catalogue it and the electronic resource vendor won’t send metadata to the cataloguing vendor. Where the library has absolutely no capacity to do original cataloguing in‑house, another option may be needed, such as hiring a trained cataloguer from another university or organization who can work at the library that purchased the eBooks and catalogue them on a contract or casual basis. In considering who may be contacted to perform such cataloguing, think not only of librarians from other libraries in the area, but also of faculty from library schools and library technician programs.
6.6.3 Standards and guidelines for original cataloguing
As previously mentioned, metadata that follows international standards and uses controlled vocabularies is generally considered the gold standard for libraries. This remains true for eBook discovery metadata. In reviewing the library’s discovery metadata as part of creating the metadata management plan, it is important to revisit the best practices discussed in Chapter 3. In addition to those considerations, there are additional guidelines relevant to discovery metadata, and to eBook metadata in particular. These include RDA, the provider neutral (PN) guidelines, and various community guidelines.
In terms of how to perform the original cataloguing, it has already been mentioned that new original records ideally follow RDA guidelines. In addition to taking formal training, the Toolkit Tools section for this part of the chapter includes some resources to help librarians learn and apply RDA to the
cataloguing of eBooks. There are many reasons why libraries should adopt RDA as their cataloguing guidelines, but some are particularly important for eBooks. For example, RDA allows for the creation of records that specifically reflect the various technical, physical, and content‑related qualities of eBooks. Electronic monographs aren’t exclusively text resources. In fact, chances are that the typical academic library has purchased streaming audio and video resources and may also have collections of digitized photographs, maps, and other nontextual information. All of these resources are electronic monographs and thus “eBooks” according to the definition used in this book. In some cases, a resource may have content in multiple formats, such as textual content plus streaming video; the latter is very common in teaching resources for the health sciences. While MARC has for a number of years supported the recording of most of the details relevant to the variety of “eBook” content that might be found in an academic library, AACR2 and traditional cataloguing practices haven’t led cataloguers to take advantage of MARC’s capacity to record such information. Those who follow the RDA guidelines are given instructions on how to extract the relevant information for each type of resource; how to identify the roles that individuals, families, and organizations have played in the creation or distribution of the resource; and how to identify other resources related to the catalogued item and specify how those resources are related. That information can then be portioned off into the appropriate compartments of the MARC metadata container in ways that were not found in traditional AACR2 records. In addition, while many discovery systems still don’t make full use of RDA coding in MARC records, if this coding is created, its presence offers the potential for creating new, more powerful, and more functional discovery systems.
The fact that RDA was not created specifically for use with MARC, but that once RDA metadata is identified it must be transferred into MARC, is indicative of several key characteristics of RDA as a descriptive cataloguing standard. The first characteristic is that RDA is schema or metadata container neutral. It has been designed to be used with any and all metadata containers, and it is the job of the metadata creator to find a way to record RDA metadata within the container. This is beneficial in several ways: RDA can potentially be used with any existing or future metadata schema, and it brings some uniformity to the bits of information placed into the container, reducing compartment size and type mismatches when metadata is crosswalked between containers. In addition, RDA uses more controlled vocabularies than did AACR2, which is a further benefit in terms of creating more precise metadata. The second characteristic is that RDA is based on the principles of the Functional Requirements for Bibliographic Records (FRBR) and is intended to create metadata suitable for a linked data environment.[9] Unfortunately, MARC’s flat, linear record structure isn’t directly compatible with linked data. Fortunately, metadata created in MARC can be used in linked data by pulling fields apart and storing them as triples. That being said, there are a few problems with the “pulling apart” process. There are missing compartments in MARC, and some MARC compartments have the potential to turn into “junk drawers”[10] in a linked data environment. Therefore, it is important for the MARC cataloguer to recognize that MARC will never quite support true RDA cataloguing, and linked data derived from MARC records is bound to be problematic. Nonetheless, following RDA guidelines will help cataloguers create highly functional records, and records that are increasingly likely to transfer effectively into future metadata containers and eventually into linked data environments.
It is important that all cataloguers of electronic resources understand the Library of Congress’ and Program for Cooperative Cataloging’s (PCC) Provider Neutral e‑Resource guidelines. The idea behind creating the provider neutral or PN guidelines was to allow for the creation of a single record stripped of metadata specific to the various platforms on which an eBook is available, into which links to the eBook on those platforms can be inserted. PCC (2013) states in their guidelines that:
Libraries may make local policy decisions whether to use single or multiple records for their e‑resources. They may use a single provider‑neutral record that incorporates all specific package and other local information on one record—or use multiple records—each with one specific package/URL on it. Whatever decisions PCC member libraries make for their local catalogs, they still need to follow the provider‑neutral guidelines when coding master records in OCLC as PCC records. Any records added to OCLC are subject to having package‑specific information removed.
There are several important considerations when deciding whether or not to use PN eBook records:
(1) Not all ILS record loading processes and discovery systems support or function well with PN records. This is another reason why it is essential to understand how the library’s ILS and discovery systems work.
(2) Related to consideration 1, once records are loaded, it is not unusual for URLs or other details about an eBook to change over time. It is important to understand how easily and effectively those records can be updated in the local system once a change has happened. This depends on factors including, but not restricted to, how the ILS and its loaders function, how update information is received, and the technical knowledge and skills of the staff who need to make the updates. (A sketch of a simple automated check for outdated URLs appears after this list.)
(3) In some environments, attempts to create provider neutral or provider specific records can be very labor‑intensive and time‑consuming. It is important to understand what workflows would be necessary to create and maintain records that adhere to either guideline.
(4) Many of the record sets that are made freely available to libraries for use in their catalogues are only available in one format or the other. The author has yet to be offered the choice of getting a record set in either PN or vendor‑specific format.
(5) Because in many libraries the majority of eBook records are obtained through record sets or through either title‑by‑title or bulk copy cataloguing, and these records have been created by other libraries and agencies, it is inevitable that the library will receive and need to deal with records that reflect the cataloguing policies and practices of other libraries, which may or may not be compatible with its own systems, workflows, and practices.
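As promised in consideration 2, here is a minimal sketch (Python with pymarc; the file name and hostname are invented) of a simple automated check for records whose 856 $u still points at a provider’s old platform after a migration, so that staff can decide whether to update the records in bulk or to delete and reload the set.

# A minimal, hypothetical sketch: report records whose 856 $u still
# references an old platform hostname. File and host names are invented.
from pymarc import MARCReader

OLD_HOST = "ebooks.oldplatform.example.com"

with open("catalogue_export.mrc", "rb") as infile:
    for record in MARCReader(infile):
        if record is None:          # skip records pymarc cannot parse
            continue
        for field in record.get_fields("856"):
            for url in field.get_subfields("u"):
                if OLD_HOST in url:
                    f001 = record.get_fields("001")
                    control_no = f001[0].data if f001 else "(no 001)"
                    print(control_no, url)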
The implication of these considerations for original cataloguing is that the library must decide whether PN or vendor‑specific records will be used and then create original catalogue records that reflect this decision. If PN records are preferred, then the current version of the PN record guidelines should be followed. If vendor‑specific records are preferred, the general RDA guidelines and/or any other relevant community guidelines should be followed. The issue of PN records will be discussed again in the copy cataloguing section.
Finally, there is the issue of “community guidelines,” which have already been
mentioned but not clearly defined. While the phrase “community guidelines” is not generally recognized in the cataloguing and metadata community, the author has chosen it as a useful term to describe a growing number of documents being written to assist cataloguers with applying RDA to specific types of materials and formats. Within the cataloguing and metadata community these documents are variously described as “best practices,” “recommendations,” or “guidelines,” with no single phrase or term being predominant. With RDA there was a movement away from cataloguing rules to cataloguing instructions, and these instructions are often quite general in nature so as to make them flexible enough to accommodate the cataloguing of any current or future type of resource or material format. In addition, there are many instances where the instructions imply that the cataloguer should use judgment (cataloguer’s judgment) to decide what will be recorded and how to record that information. While the sentiment behind creating a very broad and flexible set of instructions remains highly valuable and a strength of RDA, it is this characteristic that sometimes creates problems for cataloguers who are working with specific forms of resources or types of information. In order to supplement the RDA instructions, librarians who are specialists in the respective areas in question have been creating guidelines that help cataloguers through the more generic aspects of RDA. As these guidelines are developed and used by cataloguers, many of their examples and supplemental instructions are incorporated into the RDA Toolkit.[11] The author has decided to use the term “community guidelines” because these guidelines have essentially arisen out of the cataloguing and specialist librarian community. As this book is written, the guidelines written by the Music Library Association are the most fully developed and are now found in the RDA Toolkit. Other similar community guidelines are available for use, have been presented in draft form, or are under discussion. It is important that cataloguers continue to monitor their Listservs or social media discussion groups for news about new and developing guidelines. It is reasonable to expect that many more community guidelines will be produced in the near future.
Now that the idea of community guidelines for those doing original
cataloguing has been introduced, the next consideration is how librarians might know when and why these guidelines could or should be applied. The author recommends that RDA be used when original cataloguing is required for eBooks. Whether or not to use community guidelines when they exist is highly dependent on the context in which the resource is being catalogued. If a library were to, for example, just follow RDA as it is found in the RDA Toolkit, an entirely adequate and acceptable record would be produced for use in most libraries. Yet there are many situations where using the community guidelines would clearly be the wisest choice. Ultimately, the metadata manager must consider the context of the library, the resource being catalogued, and the composition of the collection into which those resources will be added, and do a cost‑benefit analysis: the cost of learning and implementing a community guideline versus the potential benefit of creating metadata that reflects the best practices for materials on that subject or of that material format.
Looking at music resources is an excellent way to demonstrate, in a very simple
way, how an analysis of the costs and benefits of using a community guideline might work. First, consider a situation where a librarian is given a link to a recording of the university’s fight song,[12] which has been digitized and is hosted on one of the library’s servers, and is asked to catalogue the recording. This is a digital monograph and thus, according to the definition used in this book, an eBook. As a unique resource, this song will require original cataloguing. This particular university is a four‑year college that focuses on business and doesn’t have music or music education programs. It has relatively few musical recordings, sheet music, or books about music in its collection. In looking at the information provided along with the link to the file, the librarian sees that the file is a digitization of a band of unknown musicians playing the music sometime in the 1920s, when the university still had a football team. Upon listening to the recording, the librarian can immediately tell that this “fight song” is not unique but is a popular one used by sports teams elsewhere. There is nothing that the librarian can discern that is notable or known to be remarkable about the recording except for its historical significance to the university in question. While the librarian has been using RDA for two years, she has not had to create an original catalogue record for any form of musical recording or music score in all of that time, and it doesn’t sound like more historical recordings will be digitized in the near future. Given that the librarian rarely catalogues music, her university doesn’t have any sort of teaching or research mandate related to music, the university’s collection of music resources is negligible, and there is no evidence that the piece of music being catalogued will be of interest to musicologists, it is hard to justify taking the time to read through, learn, and apply the specific guidelines for cataloguing musical recordings in RDA. Instead, a more practical approach may be to use the RDA Toolkit to catalogue the item and then browse through some of the examples in the community guidelines related to music cataloguing (called “Best Practices for Music Cataloging Using RDA and MARC21”). While the “best practices” should not be ignored, it is hard to justify using them as more than a quick reference resource for answering any questions that arise during the cataloguing of the recording, or for matching the final product to the examples and tables given in the best practices document. In this situation it would not be a good investment of time and effort for the librarian to read the supplemental music cataloguing document in depth in order to catalogue what appears to be a one‑off music resource of relatively little significance to the larger academic and research environment, even though it is valuable to the university and its history and thus still requires a reasonable quality discovery record. The librarian finds a balance by not entirely ignoring the guidelines but limiting the amount of time she spends interacting with them, seeing as she may not need to catalogue another music resource for years.
Now consider an entirely different scenario, in a university library that has a
strong music program and a large collection of musical recordings. In this situation, the special collections area has digitized a large collection of performances of original compositions by former students. The librarian has a look at the collection that has been digitized, and it exceeds 100 recordings made over a period of nearly half a century. In scanning through the documentation for the collection, she sees names of former students who went on to be noted composers and musicians. She also notes that some of the pieces of music went on to be recorded and performed elsewhere; these are likely the earliest known recordings of those pieces. In this context, the musical recordings are significant on multiple levels and have the potential to be of significant interest to musicologists, musicians, historians, and others in the community. Given that the university is known for its music program, that it already has a significant music collection, and that the resources being catalogued are of interest to a large potential audience, it is clearly worth taking the time to read and learn the content of the best practices guidelines and to use them intensively while cataloguing these resources. In this context, the time spent interacting with and learning the best practices will undoubtedly build the knowledge and skill set of the cataloguer and bring a benefit not only to the digitized music being catalogued but also to the many other original catalogue records that the librarian will create for other music resources. In such a situation, it may be reasonable for the best practices to form the core document of instruction for RDA cataloguing, with the RDA Toolkit referred to on occasion for clarification and additional examples.
While the previous two examples may seem to be somewhat extreme cases, hopefully they help to illustrate that there is no one‑size‑fits‑all approach to deciding when and how to use community guidelines. It is the recommendation of the author that librarians take opportunities to learn and use the guidelines. However, given the pressure on cataloguing resources at many academic libraries, she also suggests that it is a reasonable part of the metadata plan to evaluate and make decisions on the extent to which certain guidelines will be used. Documenting those decisions and their rationale as supplementary documentation for the plan will undoubtedly be useful for making future decisions, as well as for evaluating past decisions as collections and research and teaching mandates change within academic environments.
6.6.4 Copy cataloguing
As already discussed, copy cataloguing is essentially copying, updating, and reusing metadata for resources. The specific mechanism for copy cataloguing will vary from library to library. Some ILSs have powerful built‑in functionality that supports searching external sources for existing metadata, viewing and comparing potentially useful records, and correcting and updating downloaded metadata. The external searching is often done using z39.50‑based technology, as previously discussed in this section. In other library contexts, records may need to be downloaded externally to the ILS and then imported either record‑by‑record or in bulk. The author has even seen library contexts where copy cataloguers have electronically cut and pasted text from a library’s OPAC display into the local ILS catalogue record. While the latter is a somewhat inefficient approach to copy cataloguing, it is important to recognize that this technique is sometimes the only option available to some cataloguers in some contexts. It is important that those who are involved with creating the metadata management plan understand how copy cataloguing on a title‑by‑title basis is carried out for eBooks in their library and also evaluate the quality and usefulness of the resulting records for the effective discovery of eBooks. If investigations reveal that there are problems with the resulting records, that practices are out‑of‑date, and/or that some copy cataloguers require supplemental training, addressing those issues should become part of the overall metadata management plan. This section addresses specifically the how and why of carrying out a library‑specific evaluation of copy cataloguing; it is outside the scope of this book to provide detailed instruction on how to perform copy cataloguing itself for eBooks.
For the librarian who is trained in original cataloguing, the issue of
copy cataloguing appears fairly straightforward. All of the knowledge used for original cataloguing can be applied to selecting, correcting, and upgrading existing metadata for inclusion in the local catalogue or discovery system. In situations where a fully trained and experienced original cataloguer is also performing copy cataloguing, no further evaluation of copy cataloguing practices may need to occur. However, in situations where the librarian who is doing both original and copy cataloguing is also struggling to find the time to learn and keep up with changes and emerging technologies in the field, transferring copy cataloguing duties to staff who are adequately trained and supported in carrying out the required tasks may be a viable solution. The key is that the staff who are reassigned or hired to do copy cataloguing must have both training and support. Given the amount of change in recent years, even library staff with technical training and previous cataloguing experience may not have the knowledge and skill set required to meet the current demands of creating useful eBook discovery metadata.
In many academic libraries, the person or people doing the original cataloguing
are often not the same staff doing copy cataloguing. In reality, the theoretical and practical training of some copy cataloguers can be relatively limited. Considering the rule‑based nature of AACR2, it is not surprising if a librarian discovers that many copy cataloguers in his or her library worked only in a highly prescriptive cataloguing environment prior to the introduction of RDA. More than one copy cataloguer has described her job to the author as being told “exactly what to do and how to do it” with little room for judgment and critical thinking, resulting in a rote, repetitive, and mechanical work experience. Some current or former copy cataloguers have reported to the author that they find comfort and security in the orderliness of a prescriptive approach and take pride in being able to produce what they consider to be a “perfect record,” meaning one that perfectly and literally conforms to the direction given to the cataloguer. Unfortunately, these are often also the cataloguers who find the transition to RDA the most distressing and disorienting. On the other hand, the author has also spoken with many former cataloguers who have a strong dislike of and bias against cataloguing because of the “mindless” and “boring” nature of the copy cataloguing they had done in the past. Once again this is an unfortunate situation, because their opinion of the work done by cataloguers is based on a time and situation that is not in line with the current dynamic environment of discovery metadata creation. In reality, if a librarian discovers evidence that staff copy cataloguing eBooks are doing a significant amount of rote and repetitive work, this should be documented for consideration during the creation of the metadata management plan. Such a discovery may be a sign that some of that work should move away from title‑by‑title copy cataloguing into the realm of bulk processing of records. Part C of this chapter will address the important topic of bulk processing for eBook records more fully.
academic libraries to characterize all of them as having a limited scope ofknowledge and training, nor is this accurate. In fact, it is important for thosecreating the metadata management plan to try to understand, if it is not alreadyknown, the strengths and limitations of the existing copy cataloguing staffcompliment as part of creating the metadata management plan. Painting all copycataloguers with the same brush not only does a disservice to library staff thatcould be, for example, highly skilled and knowledgeable or to those in need ofskill enrichment, it also bypasses an opportunity for finding ways to make thebest use of existing staff knowledge and evaluate the actual need for training onan employee‑by‑employee basis.As previously discussed, eBooks require higher‑quality discovery metadata
than hard copy resources because it is not possible to physically browse foreBooks. The tiger that needs to be tamed in the realm of copy cataloguinggenerally is to ensure that staff are using their time wisely and efficiently in termsof ge埄�ing the best value they can out of the time they have spent selecting andupdating copy catalogue records. Adequate time and effort must be spent toensure that records suitable for the discovery of electronic resources in currentand emerging discovery environments are selected. However, because mostlibraries have limited time resources, it is essential to ensure that time and effortare not wasted in the process.The copy cataloguing of eBooks may present a challenge in some academic
libraries, and those challenges may not be readily apparent at a superficial level. It is important for the librarian creating the metadata plan to understand both the assumptions upon which copy cataloguing practices have been based and the details of the instructions given to copy cataloguers, if these are not already known. While most libraries have likely adjusted their copy cataloguing practices over the years to adapt to changing standards and the requirements of the new discovery systems, it is possible that some libraries have not recently updated their practices or that specific staff have missed getting training in the newer requirements. The reality is that the author has encountered a number of discovery records, in her own library’s catalogue and in other libraries’ catalogues, that likely had been downloaded and massaged to function in the local library’s OPAC but in which some critical aspect of the record does not accurately describe the item in hand. For some librarians, the challenge lies in determining whether current copy cataloguing practices are appropriate for creating good quality eBook discovery metadata and whether the staff involved with copy cataloguing have adequate and appropriate training to do their work effectively. This training should include instruction in how to access the tools and resources they need for reference. The following list of considerations can be used as a guide for detecting possible problems in current practices:
(1) Copy cataloguers are instructed to examine relatively few fields when selecting copy catalogue records and to make relatively few changes to the records found. For example, copy cataloguers may be instructed to look at the 020, 1xx, 24x, and 300 fields and, if these match the item in hand, to consider the record acceptable without further inspection of the remaining MARC tags. Perhaps the copy cataloguer may also be instructed to look for specific library codes in subfield “a” of the 040 tag because the library either prefers records from certain libraries or chooses to exclude records from other libraries. In general, however, it is a red flag to discover that five or fewer tags are being inspected when eBook records are selected for copy cataloguing. It has been the experience of the author that practices such as this are quite common in libraries and represent an efficient way to select records for hard copy resources. A better practice is to continue to focus on the fields previously mentioned but also to consider the suitability and correctness of the record as a whole when selecting it for use and making corrections or updates. In particular, practices should include ensuring that the MARC leader and control fields are correctly formed for the resource and format being catalogued; these fields are sometimes overlooked in copy cataloguing workflows. (A sketch of a simple automated check of this sort appears at the end of this list.)
(2) There are many instructions about removing fields from records, or individual cataloguers may remove many fields as part of their routine. Occasionally, the author has encountered situations where copy cataloguers routinely remove MARC tags from records when the MARC tag either doesn’t display in the current OPAC or discovery system or the tag content is displayed in the OPAC in a way that is not perceived to be helpful to patrons. For example, one cataloguer would routinely remove the 001, 003, 035, 041, and 043 fields and all 7xx linking fields, in addition to any other tag she either didn’t recognize or understand. In observing this practice the author asked for an explanation as to why these fields should be removed. The response was that the fields weren’t required for display in the OPAC and that including them made the record “look messy.” With regard to the linking fields, the copy cataloguer expressed her feeling that these fields “don’t work” and “are just confusing to patrons.” Further conversation revealed that the copy cataloguer’s training had not been updated in decades. She was not able to make sense of new MARC tags as they appeared in MARC records over the years. A similar situation may occur with staff who want to remove RDA coding from records and attempt to convert them to AACR2 records. This should not be done, and its presence likely indicates a lack of RDA training.
(3) Copy cataloguers attempt either to piece together multiple records or to convert records for other formats of the resource rather than set the resource aside for original cataloguing. Copy cataloguers at the author’s current library have dubbed records created via such processes “frankenrecords” and actually have an image adapted from a Halloween decoration as part of a display in their work area to remind them of the need to avoid creating records that are “mishmashes” of information. While the practice of piecing together records or converting a record for a hard copy resource to one for an electronic resource may have been relatively effective and efficient in an AACR2 cataloguing environment, it is not proving to be a suitable practice for RDA cataloguing. Frankenrecords may have control fields that are formatted for an entirely different medium. The corresponding 33x tags may also be inappropriate for the resource being catalogued. Other problems that can occur include copying an 035 OCLC number from a record that doesn’t match the item the record is intended to represent; copying linking fields that are unrelated to or inappropriate for the item being catalogued; and copying other information that is not appropriate or accurate for describing the eBook. In general, it is better to start with a fresh record and work through it systematically. Certainly, elements such as call numbers, subject headings, and content summaries can be copied from other records if they are appropriate, but the cutting and pasting should be limited to those fields known to be correct and appropriate.
(4) Copy cataloguers are not opening and viewing the eBook in order to select a record to represent it. If cataloguers work from a spreadsheet of titles and URLs when performing copy cataloguing and are matching copy on the title of the resource alone, the chances of selecting inappropriate copy and/or not recognizing when an original record is needed go up significantly. Title pages, introductory screens, or the like must be viewed just as they would be when selecting copy catalogue records for hard copy resources.
(5) The presence and accuracy of important indicators are not examined or updated. While copy cataloguers may routinely check the second indicator in the 245 field to ensure that it properly reflects any “nonfiling characters,” they may not check and update indicators in relatively new fields such as the 264 or 856 tags. This is another situation where RDA training or updated information about the MARC standard may be lacking (the sketch following this list includes a simple nonfiling‑indicator check).
(6) Copy cataloguers aren’t able to tell the difference between PN (provider‑neutral) eBook records and those which are not PN. Copy cataloguers need to know whether the metadata management plan requires or prefers PN eBook records. They also need to know how to identify whether a record is PN and what to do if the only records they find do not conform to the format the library uses.
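Several of these considerations, particularly (1) and (5), lend themselves to simple scripted checks. What follows is a minimal sketch, assuming the open‑source Python pymarc library and a hypothetical file of copy catalogue records named ebook_copy.mrc; it illustrates the kind of check a library might script, not a complete validation routine.

from pymarc import MARCReader

# Leading English article -> expected 245 second indicator (nonfiling characters).
ARTICLES = {"the ": "4", "an ": "3", "a ": "2"}

with open("ebook_copy.mrc", "rb") as fh:  # hypothetical file name
    for record in MARCReader(fh):
        problems = []
        # Leader/06 should be 'a' (language material) for a textual eBook.
        if record.leader[6] != "a":
            problems.append("leader/06 is not 'a'")
        # For books, 008/23 (form of item) should be 'o' for online resources.
        f008 = record["008"]
        if f008 is None or len(f008.data) < 24 or f008.data[23] != "o":
            problems.append("008/23 is not coded 'o' (online)")
        # An eBook discovery record without an 856 link is suspect.
        if not record.get_fields("856"):
            problems.append("no 856 field")
        # Compare the 245 second indicator against a leading article.
        f245 = record["245"]
        if f245 is not None and f245["a"]:
            title = f245["a"].lower()
            for article, indicator in ARTICLES.items():
                if title.startswith(article) and f245.indicators[1] != indicator:
                    problems.append("245 nonfiling indicator looks wrong")
        if problems:
            f001 = record["001"]
            print(f001.value() if f001 else "(no 001)", "; ".join(problems))

A report like this only flags candidates for human review; the cataloguer still decides whether each flagged record is actually wrong for the item in hand.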
These considerations are suggestions for helping to detect areas the metadata plan may address in the realm of copy cataloguing. There may be other issues or questions librarians want to address in their library, depending on what they already know about copy cataloguing practices or according to issues that are uncovered when investigating any of the matters discussed.

A section of the Toolkit Survey for this chapter will assist librarians in summarizing issues and concerns surrounding the title‑by‑title copy cataloguing of eBooks.
6.7 Subject headings

As previously discussed, because it is not possible to “browse” shelves for eBooks, richer and better quality access points are required for eBook discovery. One of the best ways to make eBooks discoverable is by providing good subject access via subject headings. Subject analysis is particularly useful for monographs from disciplines within the arts and humanities, as keyword searching is often ineffective in situations where irony is used or, in general, where titles that are not to be taken literally are given to resources. In cases such as the latter, often the only way to discover the content in an electronic environment is if a subject heading that reflects that content has been applied. The need for subject analysis in medicine and the natural sciences, for example, may be less critical. However, it has been the experience of the author that there are enough exceptions to the rule to make it undesirable to suggest that, across the board, some types of nonfiction require subject headings and some don’t. Instead, the author prefers to recommend some best practices that are specific to eBook discovery metadata:

(1) Use as many subject headings as are required to reflect the content of the resource. While cataloguers have often been given guidelines as to how many subject headings should be assigned, such limitations appear out of step in the current information environment. If one subject heading thoroughly represents the resource, this is all that needs to be included. However, if the resource is complex and multidisciplinary, there is no reason to enforce an artificial limit such as the 3 or 4 headings often found in cataloguing policy manuals.
(2) Make good use of the subject heading options in use at your library. Some libraries only use one set of subject headings. For example, it is common for North American libraries to exclusively use Library of Congress Subject Headings (LCSH). However, the second indicator in 6xx MARC fields allows for multiple subject heading vocabularies to be used in the same record, and the second indicator as well as the $2 subfield can be coded to reflect the specific vocabulary in use. Therefore, if a library uses multiple controlled vocabularies, such as LCSH plus one or more vocabularies such as MeSH (U.S. National Library of Medicine subject headings), CSH (Canadian subject headings), or another national or institution‑specific set of subject headings, it is possible within MARC to incorporate them all within the same record. Because some controlled vocabularies are better suited to reflecting certain types of content, it is possible that subject access to a resource can be improved by including multiple subject headings from the various vocabularies in use at the reader’s library. The key consideration is to ensure that the second indicator of the MARC field has been properly configured for the subject heading list from which the term or terms have been selected (a short sketch of this coding appears after this list).
(3) Consider adding alternative formats of controlled vocabularies for use as subject headings. In addition to considering discipline‑specific controlled vocabularies, which may be relevant to the particular eBook content in the library collection, libraries that use a faceted discovery system or plan to adopt a system that uses facets may wish to consider using FAST (see the Toolkit Tools for more information) subject headings. These subject headings are ideally applied in addition to traditional subject headings in original catalogue records. Chances are that many copy catalogue records are retrieved with FAST subject headings already present. In such cases, even those libraries that can’t use the new headings in their existing discovery context may consider leaving these headings in the records for future use. If FAST headings are both nonfunctional and create a problematic display in older OPACs, check the OPAC documentation or ask the vendor whether there is a way to suppress the FAST headings from public display.
(4) Control subject headings. Controlling subject headings means to ensure that asubject heading inserted into a properly coded MARC field contains theexact string of text as found in the authority record (or the source file forauthority data) for that heading and that subfields have been codedcorrectly. If authority records are downloaded and maintained in the locallibrary, ensure that those files are up‑to‑date and that the library has a wayto manage the bibliographic records when there are changes to authorityrecords. Even if the local discovery system doesn’t use authority records,controlling subject headings can be very useful for collocation of eBooks onthe same topic because browsing shelves isn’t possible with eBooks. Whenthe discovery system can access authority records, controlled subject
headings represent a significant strength for both collocation anddisambiguation.
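To make the coding in point (2) concrete, the short sketch below builds a record carrying headings from two vocabularies and then verifies that any 6xx field coded with second indicator 7 also carries the required $2 source code. It assumes pymarc 5’s Field/Subfield/Indicators interface, and the headings themselves are invented for the example.

from pymarc import Field, Indicators, Record, Subfield

record = Record()
# LCSH heading: second indicator 0 identifies the vocabulary as LCSH.
record.add_field(
    Field(
        tag="650",
        indicators=Indicators(" ", "0"),
        subfields=[Subfield("a", "Metadata"), Subfield("x", "Management.")],
    )
)
# FAST heading: second indicator 7 means "source specified in subfield $2".
record.add_field(
    Field(
        tag="650",
        indicators=Indicators(" ", "7"),
        subfields=[Subfield("a", "Metadata"), Subfield("2", "fast")],
    )
)

# Sanity check: any 6xx with second indicator 7 must name its vocabulary in $2.
for field in record.get_fields("600", "610", "650", "651", "655"):
    if field.indicators[1] == "7" and field["2"] is None:
        print("Missing $2 source code:", field)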
6.8 Classification

The ODLIS (Reitz, J. 2014)13 defines classification as “The process of dividing objects or concepts into logically hierarchical classes, subclasses, and sub‑subclasses based on the characteristics they have in common and those that distinguish them.” In practice, classification generally involves the librarian considering the content of the resource as a whole, determining either the most overarching topic of the work or, if there are multiple but divergent topics, the predominant topic; assigning a general classification number to reflect that topic; and then further refining the classification as permitted by the rules of the classification system. Typically, refinements are by geographical region or common subdivisions within the topic. Thus, while as many subject headings may be assigned to a resource as the cataloguer feels will accurately reflect the subject coverage of that resource, only one classification number can be assigned.

The topic of assigning classification or call numbers, such as Library of Congress Classification (LCC) and Cutter numbers, to eBooks is controversial for some cataloguers. In the 10th edition of her classic textbook on cataloguing, Taylor (2006) suggests that:
. . . classification provides a logical, or at least a methodical, approach to the management of those documents. Classification traditionally provides formal, orderly access to the shelves. In online environments it is beginning to be used to bring order out of chaos and to provide hierarchical means for browsing for relevant resources (p. 391).
At the time Taylor made this statement, the explosion in the availability of eBooks in large packages and the general growth of electronic monographs in library collections were in their early stages. Her statement does reflect the idea that even though eBooks need no physical location, and thus need not be assigned a place on a shelf from which patrons can retrieve them, there may be some value in classification numbers from an information‑seeking point of view. While some degree of value remains, in the nearly 10 years since the publication of that cataloguing text, the introduction of new discovery systems, the adoption of new ways to record subject content for those systems, and the massive amount of eBook metadata that needs to be managed in many academic libraries are just a few of the changes that have occurred, and they shed new light on the value of classification numbers in eBook discovery metadata.
The author has heard some librarians state that they find call numbers in eBook records to be useful for collection management purposes. She has also spoken with sellers of eBooks that offer “approval plan”14 and DDA programs that automate the selection of resources for the library’s collection based on classification ranges. That is to say, whether a book is classified within a certain range determines whether or not the book will be selected for the library. Based on her previous experience as a selector and her current experience as a metadata librarian who assigns classification numbers, it is the opinion of the author that making selection decisions based on classification numbers alone is highly concerning, whether done by vendors or by librarians. The author has had to assign classification numbers to resources on new or esoteric topics for which no reasonable classification number exists and has also frequently found herself assigning classification numbers for multidisciplinary publications that easily could have been put into any of four different ranges. In the end, there are many times when classification is not precise in representing the subject content of the resource. For those libraries that have highly specialized and interdisciplinary collections, the resulting classification dilemma cataloguing librarians face on a regular basis is likely not fully appreciated by selectors and their electronic resource vendors. Therefore, the author generally recommends that selectors who prefer to organize their selection activities by classification number also make use of subject headings. Subject headings can bring a greater level of accuracy to the process of identifying the topic or topics addressed in a resource and reduce the impact of the imprecision that occurs when there is no single appropriate classification number to apply.

The key point is that those who are creating and managing eBook metadata
should understand some of the perceptions about the value and usefulness of classification numbers for eBooks that are held by noncataloguers. The cost of assigning classification numbers when doing original cataloguing must ultimately be balanced against the real, as opposed to perceived, value those classification numbers bring to the effectiveness and efficiency of both information seeking and library processes. Depending on the collection management practices of libraries and individual librarians, this is an issue the metadata management plan may need to address. Hopefully a fuller discussion of the key issues will be helpful for those creating the plan.

To begin with, it is important to recognize and accept that at this point both
library selectors and vendors do occasionally use classification numbers to aid with some aspect of collection management. While some librarians may rely little on classification numbers, others may use them as the main way of deciding whether a resource falls within their selection responsibility. With regard to browsing for eBooks using classification numbers, it is true that both library staff and patrons use this method to browse collections and make serendipitous discoveries of useful resources. In some contexts, continuing to add and manage classification numbers for eBook content may be manageable and worth the effort. Those working on the metadata management plan must study and determine the use of and reliance upon classification numbers within their context.

A second important consideration is to recognize that by necessity there may be
only one classification number assigned within a single classification system for each item. Because classification systems have been designed to bring a logical order to resources in physical space and to aid with the processes of retrieving resources from the collection and browsing for useful resources, someone needs to make the decision about where each resource will sit within the larger context of the collection. This process of selecting a location in space, while rational, is often imperfect and subject to the limitations of the classification system being used. New topics and interdisciplinary subjects are often very problematic in the sense that there often is no place for the resource within the system. In such cases, the resource may be classified within a related subject or in the most general category. The author recalls working in reference and often being struck by questions such as “why is this here?” and “why isn’t this over there?” when browsing through the shelves with patrons or doing collection management work. As someone who regularly assigns classification numbers, she now understands how a perplexed cataloguer may have been struggling with a resource that simply didn’t fit into the classification system or fit into too many places in the classification system. With physical resources, ultimately a place on the shelf must be assigned, and sometimes a cataloguer may make that assignment without being comfortable with the final choice but sees no other options. Unlike with subject headings, it is not possible to combine several classification numbers to more accurately reflect the resource. One of the enduring limitations of most classification systems is that they reflect the limitations of the physical world.

A third important consideration is that the process of assigning a classification
number to an originally catalogued resource is generally time‑consuming. For those libraries doing original cataloguing of large collections of digitized resources, it may be possible to retrieve and reuse classification numbers that were assigned to the original form of the resource. However, with “born digital” content, classification would need to be assigned by the cataloguer. Those working on the metadata plan need to document and consider the amount and type of classification work required if they are doing a significant amount of original cataloguing at their library.

A fourth consideration centers on discovery. While classification numbers can
assist with certain types of browsing, classification numbers fail at the task of collocation of resources within a discovery system. By assigning a classification number, the cataloguer essentially precollocates the resource. The classification number alone doesn’t support bringing together diverse resources from anywhere within the collection depending on the search query. In modern discovery contexts, searching by classification number provides an inflexible, linear experience. This is not to deny that there are times when a searcher may want to literally browse through resources in shelf order, but to point out that there are limitations in classification number searching. In fact, it has been the experience of the author that many current discovery layer products, such as Ex Libris’ Primo, either do not support classification number browsing or do not support it in an intuitive manner. Instead, these systems cut across various classification and subject heading systems to provide faceted searching that is flexible and allows users to interact with search results to either limit or broaden their searches. These systems are particularly useful for serendipitous discovery, which is one of the often‑stated purposes of browsing within a classification range. Discovery systems that make use of subject facets typically are useful for the collocation of resources as well. Unfortunately, disambiguation remains a problem in many systems, but this is a problem that is likely to be resolved as services such as VIAF (see: www.viaf.org), ORCID (see: http://orcid.org/), and ISNI (see: http://www.isni.org/) are integrated into discovery systems. The key point is that in our current academic information environment, which is characterized by an increasing amount of electronic information from diverse sources, call number searching is becoming increasingly irrelevant as a critical method for discovery.

While the author’s bias with regard to the use of classification numbers in
eBook discovery metadata is likely evident to readers by this point, and that bias does not favor the inclusion of classification numbers in eBook discovery records, it may be either too early or not appropriate for some libraries to share that view. What is appropriate for all libraries, however, is to study and document if and when classification is included in original and copy catalogued eBook discovery records and what is known about how classification numbers in eBook records are being used locally. With this information, libraries have a solid basis on which to make decisions and take action or, perhaps, change policies and educate library staff, if needed.

In conclusion to Part B of this chapter, the author would like to point out that
for many academic and research libraries relatively little original or title‑by‑title copy cataloguing will occur for eBook content. Nonetheless, a solid background in the principles, standards, and practices discussed in this section is essential for all libraries. Having a solid understanding of the MARC 21 standard, or at least being able to interpret its coding, is essential for carrying out effective bulk processing of record sets. It is hoped that the Toolkit Survey for this chapter will help those creating the metadata plan get a strong start on a discovery metadata section that is useful and relevant to their library.
Toolkit Survey
(1) What types of records are attached to your MARC bibliographic records for eBooks (order, check‑in, holdings, item, etc.)? Which type of record is attached for which scenario? What must be attached to an original or copy catalogue eBook discovery record? Are there any known inconsistencies or problems in current practice? Are there any probable or suspected inconsistencies or problems?
(2) What standards, controlled vocabularies, authority files, and guidelines areused in creating MARC eBook records in the library? Are they beingfollowed consistently and accurately? What “exceptions” and local practicesexist? Are there any conflicts between or among controlled vocabularies,authority files, and so forth? Is there any area that needs further study orimprovement in terms of creating good quality, standards‑based metadata?
(3) Does the library have locally created digital collections of monographs? Thismay take the form of an IR, ETDs, or other digital collections. Whatdiscovery metadata is used for this collection? What schema and descriptivestandards have been used? Does the discovery metadata seem adequate andeffective for the type of resources? Is any sort of authority control used? Ifnot, what authority control would be effective? Do the controlledvocabularies in use appear appropriate for both the resources and thepotential users of the discovery metadata?
(4) Is the library set up to retrieve MARC records through z39.50? If so, in what environment (often it is the ILS and/or another application such as MARCEdit)? If not, what are the reasons for and implications of not being set up? If it appears that the library should be retrieving MARC records from other libraries via the z39.50 protocol but currently is not, it may be necessary to investigate setting up this sort of access to improve the efficiency of cataloguing. Even those libraries that use a non‑MARC metadata container can benefit from MARC records retrieved from z39.50 searches, as large quantities of metadata can often be crosswalked into the required scheme more efficiently than attempting to create new metadata from scratch. If the library is already using z39.50 searches, are the target libraries providing adequate records or are there other libraries with which it may be useful to exchange metadata?
(5) Who is performing original cataloguing? How much of it is done? Does itappear that these records are compliant with international standards and arefunctional in contemporary discovery systems? Do(es) the originalcataloguer(s) have adequate training and support as well as access to thenecessary tools and information, such as the RDA instructions andstandards documentation, to carry out original cataloguing? Based on thelibrarian’s current understanding of the emerging developments to
transition discovery metadata away from MARC into another vehicle orcontainer, is there anything in current practices and procedures that may becreating metadata that would be problematic for such a transition (e.g.,removing OCLC or other control numbers from records or deviating fromMARC 21)? Is the time of the original cataloguer being put to its best use? Ifnot, are there any options that could be investigated?
(6) Who is performing copy cataloguing? Repeat the same questions as asked forquestion 5.
(7) Is there a significant amount of original and/or title‑by‑title copy cataloguing being done? If so, have alternatives for making the work more cost‑effective been considered? For example, have any tools for capturing basic metadata for original cataloguing been investigated? If not, using a manual or automated capture process to create basic metadata that is then passed along to a trained cataloguer may create efficiencies. In the case of copy cataloguing, it is possible that a vendor or third‑party cataloguing agency may offer free or low‑cost record sets for eBook content that is purchased in packages or in small batches of title‑by‑title purchases. Note that while a vendor may not have offered records when the library first purchased a package, vendors often begin offering records, or a third party is contracted to create them, after the fact. Therefore, it is a good practice to ask about the availability of record sets for eBook content from time to time, even if the repeated answer is that they are not available.
(8) Has the library recently documented or evaluated existing documentation of cataloguing policies, practices, and procedures? Are they adequate and appropriate? Do they conform to current international cataloguing standards, guidelines, and practices? What is out of sync, and what are the potential implications of this? What are the costs of updating policies, practices, and procedures, and the costs of not doing so? Which cost is greater? Are there some changes that are absolutely necessary for the ongoing creation of sustainable eBook discovery metadata?
(9) Is the library spending considerable time creating original and copycatalogued eBook metadata for resources that are essentially freely availableon the web? If so, has the library investigated other, less labor‑intensivemethods for making this content readily discoverable for library patrons? Ifthis has been investigated, how recent were those investigations? It ispossible that new harvestable metadata has become available in themeantime, for example.
(10) Are the eBook discovery records known to fail in the library’s current discovery system (e.g., they often display as print rather than electronic versions)? If so, are the problems isolated and random, or is there a pattern and consistency to the problems (a small scripted tally, sketched after this list, can help reveal such patterns)? Are the failures significant enough to make the eBooks undiscoverable or inaccessible? If so, part of the metadata management plan will likely include creating a solution to the problem and repairing or replacing problematic records. If the library has no current vehicle for recording and analyzing problems with eBook discovery in the current discovery system, designing and implementing one may be part of the metadata management plan.
(11) What controlled vocabularies for subject headings are in use at the library? Are they being used in eBook records? Are there policies for assigning subject headings? Do they require updating? Could another controlled vocabulary or FAST subject headings be added? Does the library have a regularly updated authority file for subject headings? If so, is there a mechanism for updating bibliographic records when there are changes to the authority records? Is the library using or considering a discovery system that uses faceted searching? If so, will the existing approach for creating subject headings be optimal in a new environment? Are the indicators and subfields in 6xx MARC tags formatted correctly in original records, and have copy cataloguers been trained to look for and correct problems in copy?
(12) Does the library insert, accept, or adjust classification/call numbers inoriginal and copy eBook discovery records? Is there a difference betweendigitized and born digital content? Are classification numbers used in anyautomated collection management system? If so, which systems and how arethey used? What is/would be the impact of including or excludingclassification numbers for eBooks? Does your current discovery systemsupport browsing by classification number? Searching by classificationnumber? Has the library done a recent cost‑benefit analysis of assigningclassification numbers to eBooks and other electronic content?
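With regard to question 10, the pattern‑versus‑random distinction can often be made with a quick tally of how the suspect records are coded. The following is a minimal sketch, again assuming the pymarc library and a hypothetical export file named ebook_records.mrc. Records for online books should normally carry “o” in 008/23 and an electronic‑resource 007, so clusters coded otherwise suggest a systematic problem with a record set rather than isolated errors.

from collections import Counter
from pymarc import MARCReader

tallies = Counter()
with open("ebook_records.mrc", "rb") as fh:  # hypothetical export of eBook records
    for record in MARCReader(fh):
        f008 = record["008"]
        form = f008.data[23] if f008 and len(f008.data) > 23 else "?"
        # An 007 beginning with 'c' marks an electronic resource.
        electronic = any(f.data.startswith("c") for f in record.get_fields("007"))
        tallies[(form, electronic)] += 1

for (form, electronic), count in sorted(tallies.items()):
    ok = form == "o" and electronic
    label = "looks electronic" if ok else "may display as print"
    print(f"008/23={form!r}, electronic 007={electronic}: {count} records ({label})")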
Toolkit Tools
Best Practices for Music Cataloguing using RDA and MARC: http://www.rdatoolkit.org/sites/default/files/rda_best_practices_for_music_cataloging-v1_0_1-140401.pdf. This document is useful for those cataloguing streaming audio, other remotely accessed music files, and digitized scores.

Cataloger’s Desktop: https://desktop.loc.gov. Access to this resource is available through a paid subscription. Librarians who do a significant amount of original cataloguing may find this resource particularly useful because it supports easy access to a number of freely available documents and tools as well as consolidated access to a number of additional resources that require a paid subscription (Classification Web, WebDewey, RDA Toolkit, etc.).

Cataloging Calculator: http://calculate.alptown.com/. This is a free tool useful for both copy and original cataloguers. It will calculate Cutter numbers for resources classified in LC.

Codes and Controlled Vocabularies: http://www.loc.gov/standards/valuelist/. This page brings together links to various codes used in cataloguing.

Dublin Core: http://dublincore.org/. Some libraries may have discovery metadata for digitized monographs created using the DC scheme. This may be particularly true for content held in institutional repositories and collections of digitized resources.

FAST (Faceted Application of Subject Terminology): The following website has a useful collection of information and links related to FAST: http://www.oclc.org/research/activities/fast.html. Of particular interest to those librarians who wish to begin adding FAST subject access to their eBook discovery records is OCLC’s tool assignFAST, which is free to use at http://experimental.worldcat.org/fast/assignfast/. When using this tool, select the “MARCbreaker” format to generate a generic MARC subject heading field for the subject heading selected.

Integrating Resources: See the document “Integrating Resources: A Cataloging Manual,” which is Appendix A to the BIBCO Participants’ Manual and Module 35 of the CONSER Cataloging Manual (see http://www.loc.gov/aba/pcc/bibco/documents/bpm.pdf).

Library of Congress Authorities: http://authorities.loc.gov/. For additional sources of name authority data, see VIAF (found later in this list of tools) as well as ORCID (see http://orcid.org/) and ISNI (see http://www.isni.org/).

Library of Congress MARC 21 FAQ: http://www.loc.gov/marc/faq.html. For those librarians who are not experienced with traditional or MARC cataloguing, this web page is an excellent portal for beginning to learn the standard. Librarians may wish to start with the tutorial “Understanding MARC Bibliographic.” Note that the information and training resources on this page reflect traditional cataloguing and both practices and terminology that were in vogue previous to the adoption of RDA and the revisions to MARC that reflect RDA. For example, the phrases “main entry” and “added entry” are commonly used, although these concepts are no longer present in RDA. For more information about traditional cataloguing and the principles upon which the terminology and practices are built, see Part A of this chapter.

Library of Congress RDA Page: http://www.loc.gov/aba/rda/. This page acts as a portal to information about current RDA implementations, updates, and training resources. For those libraries that don’t purchase the RDA Toolkit, this page provides essential reference resources. OCLC has a similar page, which can be accessed at http://www.oclc.org/rda/about.en.html.

Linked Data: Rather than suggesting a single resource, it would be helpful for cataloguing and technical services librarians to familiarize themselves with all of the following: (1) Tim Berners‑Lee: The next web: This TED talk is a short discussion of how the idea of linked data came to be and how it works (http://www.ted.com/talks/tim_berners_lee_on_the_next_web); (2) Linked Data for Libraries YouTube video: This is a short introductory video created by OCLC, which is a great starting point for learning the theory of linked data (https://www.youtube.com/watch?v=fWfEYcnk8Z8); (3) BIBFRAME: This is the linked data‑based technology being developed and tested by the Library of Congress in conjunction with the British Library, George Washington University, Princeton University, Deutsche Nationalbibliothek, National Library of Medicine, and OCLC. While BIBFRAME currently centers on developing a replacement for MARC, it has the potential to be the “vehicle” for all library metadata (http://www.loc.gov/bibframe/); and (4) BIBFLOW: At the time this book was written, BIBFLOW was an experimental project at UC Davis exploring how BIBFRAME and linked data technology in general might reinvent cataloguing and technical services. To read more about this project and its outcomes, visit this page and the links found on it: http://www.lib.ucdavis.edu/bibflow/about/. Once the reader has become familiar with these resources, it is recommended to follow discussions about linked data, BIBFRAME, and related technologies in library journals, at conferences, in listservs, in social media, and in professional development offerings. Having an understanding of trends and developments in this area can help those working on the metadata management plan to reduce the negative impact of the disruptive change that linked data and BIBFRAME may bring to libraries. In fact, rather than experiencing a negative impact, those librarians who understand the changing technologies in their field may be well positioned to recognize and seize opportunities as they present themselves.

MODS Metadata Schema: http://www.loc.gov/standards/mods/. Some libraries that have collections of digitized monographs may have discovery metadata records in the MODS standard. Useful related information on METS can be found at http://www.loc.gov/standards/mets/METSOverview.v2.html.

OCLC Bibformats: https://www.oclc.org/bibformats/en.html. This is OCLC’s version of the MARC 21 format. It is a useful companion to the Library of Congress (LC) MARC 21 FAQ listed previously as well as the LC version of the standard for bibliographic data itself, which can be found at http://www.loc.gov/marc/bibliographic/. Given that there are some slight differences between the OCLC and LC versions of the MARC standard, it is important for those who are doing original cataloguing to know which version is preferred by their library. For copy cataloguers, it is important to remember that these LC and OCLC MARC documents can be referenced when copy appears to contain unfamiliar or unusually formatted MARC tags and subfields. While it is possible that copy contains errors, it is also possible that the record reflects a different version of the MARC standard.

PCC RDA BIBCO Standard Record (BSR) Metadata Application Profile: http://www.loc.gov/aba/pcc/scs/documents/PCC-RDA-BSR.pdf. This is the documentation for the PCC instructions for cataloguers. “The BIBCO Standard Record (BSR) is a combination of RDA “Core,” RDA “Core if,” “PCC Core,” and “PCC Recommended” elements applicable to archival materials, audio recordings, cartographic resources, electronic resources (if cataloged in the computer file format), graphic materials, moving images, notated music, rare materials, and textual monographs.” (p. 3)

PREMIS metadata: http://www.loc.gov/standards/premis/. While PREMIS is generally considered a metadata scheme used for the preservation of digital objects, it is a robust scheme, and PREMIS records often contain discovery metadata.

Provider‑Neutral E‑Resource MARC Record Guide: P‑N/RDA version: http://www.loc.gov/aba/pcc/scs/documents/PN-RDA-Combined.docx. Even if a library decides not to adopt PN guidelines and/or is working in an environment that won’t support them, it is important for copy cataloguers to have a basic understanding of the guidelines to make sense of existing eBook records that could be used for copy cataloguing.

Virtual International Authority File: www.viaf.org. This online resource contains mainly name authority references from the authority files of national libraries around the world as well as select specialized libraries. For subject and uniform title authority information, libraries should consult the authority service of their national library or, in the case of the United States, the Library of Congress Authorities (see http://authorities.loc.gov/).

Z39.50: http://www.niso.org/standards/resources/Z39.50_Resources. This is NISO’s web page, which provides access to information about the z39.50 protocol, including listings of sites from which relevant software can be downloaded.
Notes

1. XML is eXtensible Mark‑up Language, and MARCXML is a version of MARC that is formatted in XML. A very helpful resource for learning about MARCXML, which also includes examples of discovery records in this container, is http://www.loc.gov/standards/marcxml/.
2. An excellent starting point for librarians who are completely new to XML is http://www.w3schools.com/xml/, which is the W3Schools lesson for learning XML. Note that the XML lesson is very generic and doesn’t deal with library‑specific uses of XML, and some of the examples and content aren’t relevant to the way XML is commonly used in library contexts. For example, when libraries use XML it is generally within the context of an existing metadata container, so there generally are already defined tags and the structure of the XML has been laid out in advance. However, the lesson and related documentation found on the website are an excellent starting point. For those who have not studied HTML, it may be useful to complete the W3Schools tutorial on that topic at http://www.w3schools.com/html/. For those who have difficulty navigating from one section to the next in these lessons, there is a green “next chapter” link, which allows the reader to move to the next page once they are finished reading the current page. In addition, within the lessons there are boxes where the learner can test what he or she is learning and then see the results. It is recommended that these opportunities to test coding be utilized, as they allow the traditional cataloguer to get a sense of what is involved in actually creating formatted text as well as the level of precision required for producing the desired outcomes.

In addition, it is not unusual for W3Schools to recommend that learners also know JavaScript prior to learning topics such as XML. In software and web development environments, this advice makes perfect sense. However, it is not 100% applicable to the library context. That being said, metadata librarians are increasingly recognizing the value of knowing how to do some type of coding, such as Python. While the issue of whether or not it is essential for librarians to be able to write or edit code remains controversial, arguments in favor of at least some librarians having good coding skills are becoming increasingly convincing. For those readers who are interested in learning more about this topic, a good starting place is the following article published in The Digital Shift in March 2013: http://www.thedigitalshift.com/2013/03/software/cracking-the-code/, which not only captures some of the discussion in the debate but also has links to useful websites.
3. For those who are interested in learning more about the history and development of the MARC standard, one of the best concise histories can be found in the ALA World Encyclopedia of Library and Information Services. At least three editions of this encyclopedia have been published over the years, and all the versions the author has viewed contain a useful article about the history of MARC. Unfortunately, it appears that the publication is out of print and no electronic version has ever been published. For those readers whose library doesn’t own a copy of this publication, hopefully a copy of the volume or a photocopy of the chapter can be obtained through their library’s interlibrary loan or document delivery service. It also appears that this publication may be available via a “print on demand” service in the United States. Or, for those libraries that have other comprehensive encyclopedias for the LIS discipline, chances are there is extensive coverage of the topic of MARC.
4. The Library of Congress’s standards for holdings records are located at http://www.loc.gov/marc/holdings/, and training on the purpose and creation of these records is at http://www.loc.gov/marc/umh/. For those libraries that are OCLC members, the relevant standards for holdings information are located at https://oclc.org/holdingsformat/en/Introduction.html, with a more detailed pdf training and instructional manual located at https://oclc.org/content/dam/support/local-holdings/documentation/primer/Holdings%20Primer%202008.pdf (note that the detail in this manual is up‑to‑date as of 2008, and thus detailed instructions do need to be confirmed before being implemented). An important consideration is that not all libraries use holdings records despite the benefits discussed in the documentation listed above. In fact, the functionality of some ILSs is such that the benefits and creation of holdings records aren’t as critical as in other environments. Even among libraries that use the same ILS, many libraries may opt for using holdings records while others do not. This is why it is important for librarians who are creating the metadata management plan not only to understand the functioning of their ILS but also to learn specifically how it has been implemented in their library.
5. Authority control is defined in the ODLIS as:

The procedures by which consistency of form is maintained in the headings (names, uniform titles, series titles, and subjects) used in a library catalog or file of bibliographic records through the application of an authoritative list (called an authority file) to new items as they are added to the collection. Authority control is available from commercial service providers (see: http://www.abc-clio.com/ODLIS/odlis_A.aspx).

The “names” controlled by authority file records include the names of individuals, families, companies, and other groups as well as meetings and conferences. The use of authority control in libraries is a practice that predates information seeking on the World Wide Web, and it is a major contribution that the field of LIS has made to the larger discipline of information studies (including computer science). Authority control allows for disambiguation, or the ability to differentiate among similar or identical words, names, or concepts. Authority control also supports the collocation of disparate resources according to controlled headings such as the name of an author or a subject heading. While the value of authority control for assisting library patrons in conducting both accurate and exhaustive searches for information has been understood by librarians for a long time, in recent years the principle has begun to be adopted by various agencies on the World Wide Web. The development of the Virtual International Authority File (http://www.viaf.org/), which brings together authority files from national and other libraries across the globe, has helped to facilitate the use of authority files in a web context. For more detailed information about VIAF, please see http://oclc.org/viaf.en.html.

While it is important for all metadata and cataloguing librarians to be familiar with VIAF and how it functions, the reality is that in many libraries that use authority control, the file in use is either that of the national library or a discipline‑specific file. Because MARC bibliographic and authority data records support it, many academic libraries will use multiple authority files if their ILS supports doing so. While the author has seen ILSs intended for school libraries that do not use authority control, it is neither likely nor recommended that a large academic library that needs to serve multiple programs and disciplines should attempt to avoid the use of authority control in its MARC records. Because of the size of many academic library collections and the specificity of the research that is conducted, the need to provide a means for collocation and disambiguation is significant.

For those readers who have not worked with authority records, training is available online via the Library of Congress website at http://www.loc.gov/marc/uma/index.html, and the Library of Congress MARC standard for authority data is also available at http://www.loc.gov/marc/authority/. Note that, like other MARC‑related training, this training reflects an era of cataloguing previous to the introduction of RDA and thus contains much of the older cataloguing terminology and many pre‑RDA concepts.
6. While technically the generally recognized term used to describe the different types of metadata is “schema,” it is not unusual to see the term “format” used in listservs, at conferences, and in other practical publications. The author prefers to use the term “container” when discussing different metadata schemas. This is particularly the case in the context of crosswalking metadata from one schema to another. The reason for this preference is that it is easier to visualize a container than a schema. A container can have a size and shape as well as various compartments of different sizes and shapes. As containers, one metadata schema has different compartments than another because the containers have been developed at different times, by different agencies, and for different purposes. The result is that when metadata is crosswalked from one container to another, the compartments aren’t always the same, or the bits of metadata weren’t consistently placed into the compartments in the same way. This creates a problem for the librarian creating the crosswalk and often results in problematic records in the new container. It is easier to conceptualize the process of crosswalking and understand the resulting problems by viewing the schema as a structured container into which metadata is placed. (A minimal crosswalk sketch illustrating this appears after these notes.)
7. OCLC defines the z39.50 protocol as “a computer‑to‑computer communications protocol designed to support searching and retrieval of information in a distributed network environment” (see: http://www.oclc.org/research/activities/z3950.html). Essentially, it allows for the discovery and exchange of MARC records between library systems. Not all libraries support or permit z39.50 searching and retrieval access to their MARC records, but for those that do, sharing records represents a significant efficiency in the sense that original cataloguing can be limited to only those records that can’t be downloaded from other libraries. Z39.50 MARC record retrieval is an established and mature practice for cataloguing in academic libraries. For those readers who are not familiar with the protocol and its use in libraries, Fay Turner’s 1995 article “An Overview of the Z39.50 Information Retrieval Standard” may be of interest (retrieve the article online from http://archive.ifla.org/VI/5/op/udtop3/udtop3.htm). Librarians who are involved with creating a metadata management plan must be knowledgeable about the basics of how z39.50 works, whether or not the library exchanges metadata with other libraries, the local mechanism(s) through which the exchange is done (typically through the ILS or software like MARCEdit), and from which libraries records are extracted.
8. The Metadata Maker was hosted by the University of Illinois at the following URL in February 2015: http://iisdev1.library.illinois.edu/marcmaker/?language=eng&country=nyu&vorp=pages&literature=yes&literature-dropdown=1&illustrations=yes.
9. The usefulness of the FRBR model and RDA in current and future metadata and discovery contexts has been debated at the Annual and Midwinter Conferences and Meetings of the American Library Association in recent years. For example, Shapiro and Myntti argued against the usefulness and relevance of FRBR at a 2013 FRBR Interest Group meeting (see the FRBR Interest Group Report 2013 at http://www.ala.org/alctsnews/reports/ac2013-div), whereas at the 2014 annual meeting Kelly McGrath and Jacob Nadal resumed the discussion with concrete demonstrations of where FRBR functions well and where it is not useful (see the FRBR Interest Group Report 2014 at http://www.ala.org/alctsnews/reports/ac2014-division). In that same meeting, there was some discussion about how FRBR might be enriched or transformed to overcome its shortcomings. While the author did not attend either of these meetings, she did note follow‑up discussion on social media where the question being posed was “Is FRBR dead?” In order to follow the provocative discussion, the author attended the FRBR Interest Group meeting at the ALA 2015 Midwinter Conference. There were no formal presentations at this meeting; it was largely an open discussion about the viability and usefulness of FRBR, led by the interest group chair. While documentation of the discussions at the 2015 meeting had not been recorded at the time this section of the book was written, the author’s summary of the discussion is that FRBR is not dead and is useful as the theoretical model upon which RDA has been created. The sentiment of some audience members was that in recent years there has been some confusion about the difference between descriptive standards and the “vehicles” that carry data or metadata. In this book the phrase “metadata container” has been used, but the term “vehicle” suggests a useful image for characterizing the movement of metadata in MARC environments, linked data, BIBFRAME, and so on, and it is also a helpful way to conceptualize a metadata container. Considering the most recent discussions about the relevance of FRBR leads the author to recommend that readers not be sidetracked by confusing debates. While FRBR as a theoretical model may have some limitations, or librarians may not yet have come to a full understanding of its implications, it remains useful as the basis for RDA. As a descriptive standard, RDA is still in its adoption stage and will take a while to reach maturity, so it is reasonable to expect some degree of debate and change for some time to come. It is the opinion of the author that RDA is going through a process that, while disconcerting to some cataloguers who had become accustomed to the relative stability of MARC and AACR2, is a normal and necessary process of gradual adjustment and tweaking.
10. The “junk drawer” refers to a tradition in many households where a drawer or cabinet is assigned a highly generic and disorganized storage function. The junk drawer may contain random household items ranging from paper clips and bottle‑cap openers to flashlights and lightbulbs. With no logic or organization to what is in the junk drawer, family members just rummage around in it until the desired object is located, or not located. Despite the granularity of MARC relative to other library metadata standards such as Dublin Core, there are still some fields and subfields that aren’t granular enough to be effective in a linked data environment. This is particularly true of the 245 field, which combines title information, material format information, and information about creators and contributors of the resource. There is a similar problem with the 300 field. Even the division of the subfields within the MARC tags doesn’t provide the needed separation of distinct elements of metadata. The problem doesn’t show up in the way MARC records are used in a typical OPAC but does show up in other environments. While the information can be rummaged through and makes sense to a human reader, some of the MARC ends up looking and functioning a bit like a junk drawer.
11. The RDA Toolkit is available as a subscription resource at www.rdatoolkit.org. It can also be accessed, again by subscription, via the Library of Congress Cataloger’s Desktop (for more information see http://www.loc.gov/cds/desktop/). These are the most commonly used tools for those doing original cataloguing in RDA. Information documenting the development and current status of the recommendations for RDA is found on the RDA Joint Steering Committee’s web page at http://www.rda-jsc.org/working1.html. Note that the actual text of the RDA instructions is copyrighted and not made freely accessible online; it must be purchased via a subscription, in an eBook format (not available in all countries), or in print.
12. “Fight songs” are commonly used at Canadian and American universities as well as by some professional sports teams in North America. These are generally simple, repetitive songs that are often used as a cheer or to enliven fans during games. It is not unusual for teams to reuse melodies from other teams’ fight songs. This is especially true if the teams compete in different leagues and/or geographical regions. For example, the same melody is used by the University of Wisconsin–Madison for their “On Wisconsin” fight song as is used by the Canadian Football League’s Saskatchewan Roughriders for “On Roughriders” (listen to the YouTube recordings for a comparison: “On Wisconsin” at https://www.youtube.com/watch?v=zOYus1BE7jk versus “On Roughriders” at https://www.youtube.com/watch?v=cg-9pULgbB0).
13. The ODLIS defines “nonfiling characters” as:

A character, such as the apostrophe, ignored in arrangement when it appears in a word, phrase, heading, or descriptor. For example, under most filing rules, the letters of the initial articles “a,” “an,” and “the” are ignored at the beginning of a title. In the MARC record, the number of nonfiling characters at the beginning of a title or heading is specified in the indicator at the beginning of the field. Synonymous with nonsorting character (see: http://www.abc-clio.com/ODLIS/odlis_n.aspx).
14. The ODLIS describes approval plans as:

A formal arrangement in which a publisher or wholesaler agrees to select and supply, subject to return privileges specified in advance, publications exactly as issued that fit a library’s pre‑established collection development profile. Approval profiles usually specify subject areas, levels of specialization or reading difficulty, series, formats, price ranges, languages, etc. (see: http://www.abc-clio.com/ODLIS/odlis_a.aspx).
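To make the container metaphor in note 6 concrete, the following minimal sketch crosswalks a few MARC compartments into Dublin Core ones, assuming the pymarc library and a hypothetical file named records.mrc. The mapping is deliberately lossy: MARC separates 1xx and 7xx roles and distinguishes subject vocabularies by indicator and $2, while this simple Dublin Core container flattens those distinctions, which is exactly the kind of mismatch that produces problematic records in the new container.

from pymarc import MARCReader

# A deliberately simplified mapping from MARC tags to Dublin Core elements.
MARC_TO_DC = {
    "title": ("245",),
    "creator": ("100", "110", "111"),
    "contributor": ("700", "710", "711"),
    "subject": ("600", "610", "650", "651"),
    "publisher": ("260", "264"),
}

def crosswalk(record):
    # Pour the $a compartment of selected MARC fields into DC elements.
    dc = {}
    for element, tags in MARC_TO_DC.items():
        values = [f["a"] for f in record.get_fields(*tags) if f["a"]]
        if values:
            dc[element] = values
    return dc

with open("records.mrc", "rb") as fh:  # hypothetical record set
    for record in MARCReader(fh):
        print(crosswalk(record))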
6C Bulk processing: Working with record sets and updating metadata

Many readers have likely chosen to read this book because they are responsible for dealing with the MARC record sets that their libraries receive or retrieve for their eBook collections and would like some tips for how to work with those record sets more efficiently and effectively. The interest that many cataloguing and metadata librarians have in developing their skills in this area reflects the fact that many libraries can purchase or otherwise gain access to anywhere from several hundred to over a hundred thousand eBooks in a given year. The number of eBooks added to the reader’s collection may be more significant than first thought if the definition of eBook used in this book is employed and all of the streaming audio and video, born digital documents, and digitized content, including both licensed and open access resources, are considered. The author has processed record sets containing upwards of 70,000 records for a single collection of digitized documents. These documents were for a collection that was part of a series in which each collection appeared to have a minimum of 68,000 records. While each resource in the collection was a digitized document of generally 10 pages or less, a MARC record was provided for each document just as it would be for a full‑length eBook. The bottom line is that once academic libraries start purchasing eBooks, librarians will soon find themselves in a situation where traditional approaches to cataloguing are simply not feasible because of the sheer volume of metadata that needs to be handled. This is another way in which eBooks have presented a disruption to academic libraries.1
For those tempted to begin reading this book at this chapter

Given the importance of this topic for many librarians, some may have skipped directly to this chapter without reading the sections leading up to it. Certainly, many if not most librarians who are currently charged with handling eBook record sets will not need to read the sections on copy and original cataloguing, although they may wish to skim through them and read the notes at the end of each part of each chapter. The author suggests that those who are tempted to skip directly to this chapter return to Chapter 1 through Chapter 3 and read them first. Some terminology is introduced in these chapters, as well as the idea that managing eBook metadata can be made more effective if the disruptive character of eBooks in academic libraries is considered and if managing record sets is integrated within the framework of a larger eBook metadata management plan.
6.9 What does bulk processing mean?

For librarians who are already experienced in working with record sets, the decision to call the type of work described in this chapter “bulk processing” may seem somewhat puzzling. The more commonly used phrases for working with record sets are “batch loading” and “batch processing.” The author has chosen the term “bulk” rather than “batch” for a few reasons.

The first is that the word “batch” has a long-standing use in information technology terminology and refers to practices and procedures that resemble what librarians often do with their record sets. However, the vast majority of processes carried out on record sets in libraries differ from what is generally considered “batch processing” in fundamental ways. Reserving the term “batch processing” for processes that are more in line with batch processing as it is performed in other fields helps to make the distinction between the approaches a little clearer.

The second reason is that while very large record sets are processed at once, the processes applied to these records often need to be done in a specific order or in conjunction with other processes or procedures. In many instances, library staff need to evaluate the results of a process and decide which process to apply next, so the traditional idea of batch processes running automatically in the background or overnight with relatively little human mediation is somewhat misleading.

The third reason is that there are times when some or all records within a record set need to be examined and edited on a record-by-record basis by library staff. Fortunately, the situations that call for this level of intervention have become increasingly rare in the author’s experience. Nonetheless, working with record sets that require any sort of record-by-record staff intervention should not be referred to as batch processing even if the record set is eventually loaded as a batch.

The fourth and final reason to avoid the “batch” terminology is that, in the author’s experience, the term seems to imply to some library staff that a process can be set up once and all record sets then handled through it automatically. While record sets and the manner in which they are supplied should ideally be so standardized within the eBook publishing industry that such an approach would be possible, this is not currently the case. Libraries that have highly diverse eBook collections likely do not have enough standardization among the record sets they receive to apply such an automated process to all of them. Treating all record sets from all vendors in a uniform fashion can lead to what is described in the final chapter of this book as a “metadata accident.” In her own library, the author has switched the predominant terminology from “batch” to “bulk” in order to prevent confusion and misunderstanding about what needs to occur when the library receives record sets.
So then, what is a useful definition of “bulk processing”? The author proposes that bulk processing be viewed as any process carried out on a group of records either before or after the records are loaded into the library’s local bibliographic database. In general, bulk processing occurs before records enter the local database, but there are situations where bulk processing must occur within the system. Bulk processing also involves a combination of human-guided and automated processes to efficiently and effectively prepare records for the specific discovery environment in which they will be used.
6.10 What is a record set?

While the term “record set” has already been used in this book, it is important that readers are completely familiar with what a record set is before proceeding with the remainder of this chapter. Record sets are collections of MARC record metadata that are generally provided to libraries in either the .mrc or .mrk file format. The former is often referred to as a “MARC file” and the latter as a “MARC text file” or mnemonic file. The difference between the two is that the MARC file is machine-readable, while the MARC text file can be edited by humans when the file is opened in a MARC editor. A record set may contain as little as a single record or, in theory, a limitless number of records. In practice, record sets containing hundreds of thousands of records become very difficult to transmit and process because of the limitations of the computing and communications environment in which the processing and file transmission need to occur.

The purpose of the records in most record sets is to provide libraries with discovery metadata for resources. The record set format is a convenient and efficient way to prepare and transmit that metadata. Because academic libraries generally use an ILS/LMS that draws upon metadata records in the MARC format, record sets are predominantly provided in the MARC 21 format. Libraries that no longer use a traditional OPAC for discovery may or may not have a use for the eBook record sets supplied by vendors. If, for example, the discovery metadata is fed into the knowledge base (KB) for the discovery system indirectly by the eBook vendor and there is no local need for MARC records for eBook acquisitions, preservation, or troubleshooting purposes, then the vendor-supplied MARC records would likely be considered superfluous. However, in the author’s conversations with academic librarians at conferences, in email discussion forums, and in social media, the majority of academic libraries still appear to find the presence of MARC eBook discovery metadata in their local bibliographic databases to be desirable. Some libraries that lack eBook discovery records in their local catalogue simply don’t have the resources to add and manage them. Hopefully, this chapter will be of particular assistance to those readers whose libraries are in that situation.
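Because so much of the work described in the rest of this chapter starts with simply opening a record set and looking at it, a minimal sketch may be helpful here. It uses the open-source pymarc library for Python; the file name is a hypothetical placeholder, and a MARC editor such as MARCEdit offers the same kind of quick inspection interactively.

    # A minimal sketch of inspecting a vendor-supplied record set (.mrc file)
    # before loading. Assumes the open-source pymarc library (pip install pymarc);
    # the file name "vendor_package.mrc" is a hypothetical example.
    from pymarc import MARCReader

    count = 0
    with open('vendor_package.mrc', 'rb') as fh:
        for record in MARCReader(fh):
            if record is None:
                continue  # pymarc yields None for records it cannot parse
            count += 1
            if count <= 5:
                # Spot-check the first few titles (MARC field 245, subfield $a).
                title_field = record['245']
                print(title_field['a'] if title_field else '[no 245 field]')
    print(count, 'records in the set')

Counting the records and spot-checking a few titles before loading is a cheap way to confirm that a downloaded file is the record set it claims to be.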
6.11 Sources of record sets

This section will address each of the eight general sources of record set metadata:
(1) A vendor’s website where vendor-generated files can be directly downloaded into a local file location
(2) A custom record set generator on the vendor’s website where library staff can configure the records they would like to download
(3) Direct provision of records from vendor or consortia via email
(4) Retrieval of record sets from an FTP site
(5) Customized delivery of records from a third party based on KB information
(6) Delivery of records from a third-party cataloguing vendor
(7) Record sets of harvested metadata
(8) Locally generated record sets

While some libraries will undoubtedly have additional sources of record set metadata, these eight sources are the most common. The author will provide some examples of each type and some general tips for addressing each mode of record set extraction.
(1) A vendor’s website where vendor-generated files can be directly downloaded into a local file location
When a vendor offers generic packages of eBooks for their customers and the URLs in these eBooks aren’t specific to any library, a vendor may have a location on their website from which files containing record sets for each package can be downloaded.

These vendor-generated files are generally found in one of three places. There may be a password-protected administrator site associated with a platform or product. There may be a page with a title that reads something along the lines of “resources for librarians” or “tools for librarians.” The third possibility is that some of the larger eBook providers will have separate websites dedicated specifically to providing information about all of the discovery options. See figures VIc.1 and VIc.2 for examples.

One of the most significant benefits when vendors provide access to these
record sets without having to log in is that librarians can view the record sets before a product is purchased. Librarians can examine not only the quality of the records but also the frequency of updates and whether corrections are provided. Ease of access to the MARC records, well-organized files that include clearly marked correction and deletion files, and helpful hints and instructions are positive signs with regard to the potential future ease of managing eBook metadata for the vendor’s products.

Considering that the author has assisted a number of librarians from other libraries with downloading OCLC record sets, it seems appropriate to include a tip about an unusual file extension that OCLC uses. If the reader encounters a MARC record set that has the extension “.bin,” chances are that record set has been produced by OCLC. In order to download that file from a website such as OCLC’s Product Services Web, the user should right-click on the file name and then change the file extension from “.bin” to “.mrc” before saving the file locally. If the file name is then double-clicked from its location on the local server, this action should automatically open the file in a MARC editor such as MARCEdit. For this to work properly, a MARC editing application that is independent from the ILS must be installed on the computer. Some libraries load the “.bin” file directly into the ILS; however, it is not generally recommended that libraries load record sets directly into the local system, for reasons that will be discussed in detail later in this chapter.
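For those who would rather script the renaming step than right-click each file, a two-line sketch (the file name is again a hypothetical example):

    # Rename an OCLC ".bin" download so it opens in a MARC editor.
    from pathlib import Path

    bin_file = Path('metacoll.oclc.bin')           # hypothetical downloaded file
    bin_file.rename(bin_file.with_suffix('.mrc'))  # now double-clicks into the editor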
One word of caution has to do with the care and attention that needs to be taken when downloading files. Some vendors offer many packages and many package options. There may be multiple options for record sets that can be downloaded for any given package. It is important to ensure that the appropriate record set for the package purchased is downloaded. It has been the author’s experience that the package title listed on the invoice doesn’t always match up with the record set names. If the record set is very small, it may be feasible to try the URLs in the 856s to see which URLs are accessible and which aren’t. MARCEdit, which will be discussed in more detail later, has a function that can help with this process. In the end, it may be necessary to contact a sales representative or a technical contact to determine which record set should be downloaded. Depending on how the library’s record set loaders function, it could be difficult to remove a record set that is loaded incorrectly, or an incorrect record set can potentially overlay records that should not have been overlaid.
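MARCEdit’s link-checking function was just mentioned; for readers who prefer a script, the following is a rough sketch of the same idea using pymarc and the Python standard library. The file name is hypothetical, and note that many platforms require authentication, so an error status does not necessarily mean a URL is wrong.

    # Spot-check the URLs in a small record set's 856 fields to help identify
    # which package the set belongs to. Assumes pymarc; file name is hypothetical.
    import urllib.request
    from pymarc import MARCReader

    with open('candidate_set.mrc', 'rb') as fh:
        for record in MARCReader(fh):
            if record is None:
                continue
            for field in record.get_fields('856'):
                for url in field.get_subfields('u'):
                    try:
                        # A HEAD-style request; platforms requiring a login may
                        # return 401/403 even when the URL is "correct".
                        req = urllib.request.Request(url, method='HEAD')
                        with urllib.request.urlopen(req, timeout=10) as resp:
                            print(resp.status, url)
                    except Exception as exc:
                        print('FAILED', url, exc)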
(2) A custom record set generator on the vendor’swebsite where library staff can configure the recordsthey would like to download
Record set generators may allow librarians to customize their record sets with regard to characteristics such as which packages to include, the time frame during which the records were added, specific titles or ISBNs, character encoding, and other characteristics. Some record set generators allow for the insertion of fields and subfields that the librarian specifies and/or the removal of others. Some record set generators are quite powerful and offer librarians many options for customizing both the records found within the record set and the fields within each record, while others are very basic and may allow only one or two customizations.

These generators are often located in the same places on the web as the vendor-generated files previously discussed. In fact, many vendors offer both a selection of generic record sets and a record set generator on the same page. A library may wish to download a generic record set the first time that they retrieve records for a collection and then use the record set generator to retrieve subsequent updates and corrections. Record set generators that include the option to define the titles or ISBNs that should be included in the record sets can be particularly useful for maintaining the accuracy of metadata for packages over time, as individual records may sometimes be missed in the generic record set. If, for example, the ISBNs of the missing eBook records can be determined, these can be entered into the record set generator to retrieve the missing records.
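As an illustration of the ISBN technique just described, the following sketch compares a vendor-supplied title list against the 020 fields in a local MARC export and prints the ISBNs that have no matching record; those ISBNs could then be entered into the record set generator. It assumes pymarc, and both file names are hypothetical.

    # Find eBooks that are missing discovery records by comparing a vendor's
    # ISBN list with the ISBNs already present in a local MARC export.
    from pymarc import MARCReader

    def isbns_in(marc_path):
        """Collect the 020 $a ISBNs from a MARC file."""
        found = set()
        with open(marc_path, 'rb') as fh:
            for record in MARCReader(fh):
                if record is None:
                    continue
                for field in record.get_fields('020'):
                    for isbn in field.get_subfields('a'):
                        if isbn.strip():
                            # Drop qualifiers such as "(electronic bk.)".
                            found.add(isbn.split()[0])
        return found

    with open('vendor_title_list.txt') as fh:   # one ISBN per line
        vendor_isbns = {line.strip() for line in fh if line.strip()}

    missing = vendor_isbns - isbns_in('local_export.mrc')
    print('\n'.join(sorted(missing)))           # paste into the record set generator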
Springer is an example of an eBook publisher which offers a customizable MARC record set generator for use by their customers.
In addition to the caution about ensuring that records for the correct package name are downloaded, record set generators should be used with additional care. The most significant concern is understanding what the date range options mean. The date ranges may mean one thing for one record set generator and another thing for another. Sometimes the date range refers to the publication date of the resource, sometimes to the date the record was added to the collection of records, and other times to the last time the record was updated. Each of these meanings has a different implication for how metadata for the collection is managed over time. For example, the author once assumed that the date range in a record set generator reflected the date on which a title was added to the collection. Based on this assumption, once the end of the year was reached and the last record set was downloaded, the author stopped selecting that year when downloading record sets. In fact, the date referred to the publication date of the resource, and the incorrect assumption led to a recurring problem: records were missing from the catalogue for eBooks recently added to the package but carrying older publication dates. When this discrepancy and its cause were discovered, a new workflow was created for that vendor so that the older publication date titles would not be missed. Given that there can be great variation among record set generators and that new features are added from time to time, the best approach is not to assume what a feature does or how it is used. Librarians should search for documentation or contact sales or technical representatives from the platform vendor if information is not available on the website. Whether or not information can be found, it is a good idea to test the results of a record set generator to ensure that the configuration produces the desired or expected results. If the acquisitions metadata gives an insight into how many records should have been retrieved, or if the particular titles within a collection are known, this information may be helpful in determining the accuracy of the record set.

Finally, the results of record set generators are often produced immediately, and the file can be downloaded directly onto the librarian’s computer. However, there are other times when a record set is emailed to the library. There is no concern in the latter situation if the user is prompted to enter an address to which the record set should be sent. However, if the record set generator is located on the administrative site and a password is needed to access it, there is a good possibility that the record set will be emailed to the administrator of the subscription. In such a case, librarians may need to do some investigation as to precisely where a record set has been sent. The author has had the experience of record sets being delivered to a selector because that librarian’s email address was established as the administrator of the account.
(3) Direct provision of records from vendor or consortiavia email
In some cases a vendor will email record sets for eBooks directly to the library. This often happens when eBook collections or purchases are relatively small and/or the vendor doesn’t have a web page from which to download records or doesn’t have a record set generator that customers can use. Other times, vendors with other record retrieval methods available may send records directly to supply a small number of records that were missed from a larger file the library previously retrieved.

Sometimes consortia have assigned an individual from one of the member libraries to coordinate retrieving and distributing record sets to the other libraries in the consortium. This person may also report problems with record sets to the vendor. With some eBook purchases made by consortia, the vendor may prefer that libraries deal with them directly rather than through the consortium representative, while still preferring a centralized method for dealing with MARC records. If a centralized approach is used, chances are that it will be the coordinator from the consortium who sends record sets to the library via email.

Readers should consider that with some eBook products, the vendor or
consortium is not set up to automatically generate MARC records when new titles are purchased or added to the platform. Often the fact that a sales representative or other person sends the record sets, rather than the email arriving via an automated process, is a sign that the vendor needs to retrieve and forward the records on a customer-by-customer basis. The author has encountered more than one situation where she had to request an updated record set after receiving notices that the content of the collection had changed. The vendor would not have produced new records without being prompted by the request.

When vendors don’t have automated ways to produce record sets, a number of additional problematic situations can occur. For example, the required record sets may stop arriving when there is a change of sales representatives or of library staff in the consortium. As with the updated content, the author has found that with some vendors she just needs to remind them to send the record sets from time to time. Fortunately, this occurs with a very small percentage of vendors, and it tends to be only those from whom a small number of eBooks are purchased each year. However, because of the specialized nature of these eBooks, they often are critical to patrons and may have a high cost per title. Therefore, it is important to many parties to keep the discovery metadata for these products as up-to-date as possible, and this may mean contacting the vendor from time to time to ask for record updates.

The email mode of retrieval for these record sets can be relatively simple in the
sense that the email just needs to be forwarded to the person who will process the records for loading into the local ILS. However, the author has found that being overly casual about handling record sets received via email can lead to eventual complications. Later in this chapter there will be a discussion about creating specific metadata that tracks which record sets have been retrieved or received from which vendors, as well as a discussion of tracking record sets as they pass through the various workflows. Not only is it easy to forget to record that records were received, there often is no metadata about the source of the records and the person who sent them. If there is a significant gap between when new content is announced and when updated record sets are received, as described in the previous paragraph, it is helpful to have easy access to information about who has been sending the record sets, when they are typically sent, and when the last one was received. This process of recording information for tracking purposes is somewhat similar to what libraries have done historically for the “check-in” of their print journals. In the case of eBook record sets, recording some basic information saves staff from scouring their email inboxes looking for messages and also prevents unnecessary concern that someone has forgotten to send a record set to the library. There is more than one vendor from which the author’s library receives MARC record updates either annually or biannually despite the fact that new eBook titles are added on a monthly basis. Having this information recorded as metadata used for managing eBook loading processes has proven helpful many times when selectors and other library staff are concerned that MARC records are missing from the catalogue.
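The “check-in” metadata described above does not require anything elaborate; even a spreadsheet or a small script appending rows to a delimited file will do. The following sketch is one hypothetical shape such a log could take; the column choices, file name, and sample values are illustrative assumptions only.

    # Append one row each time a record set arrives, recording who sent it,
    # what it covers, and when the next one is expected.
    import csv
    from datetime import date

    LOG = 'recordset_checkin.csv'

    def log_receipt(vendor, package, sender, filename, note=''):
        """Record the receipt of one record set in the check-in log."""
        with open(LOG, 'a', newline='') as fh:
            csv.writer(fh).writerow([date.today().isoformat(), vendor,
                                     package, sender, filename, note])

    log_receipt('Example Vendor', 'Nursing eBooks', 'rep@example.com',
                'nursing_update.mrc', 'annual update; next expected in a year')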
On a related note, selectors and other library staff sometimes find the infrequency of updates to discovery records in the catalogue out of step with how quickly the content becomes dated and with patrons’ need for timely information. This may be particularly true for resources in the health sciences and in some business and technology-related disciplines. Cataloguing and metadata librarians may wish to recommend to selectors that this issue be taken into consideration when purchasing eBook content. If there are no other options for sources of the eBook content, which is common when the content is specialized, it would be helpful for the selector to inform the sales representative of the inadequacy of annual or biannual record updates before a purchase is made. While the vendor may not be able to immediately improve the frequency of MARC record updates, if they receive feedback from a number of their academic library customers on this issue, the chances that the frequency will be improved may be greater.
(4) Retrieval of record sets from an FTP site
Rather than using a website for the retrieval of record sets, some vendors will make their record sets available for retrieval from an FTP site. While some vendors require that all of their record sets be retrieved from the FTP site, others will only use this mode of delivery for very large record sets. Seeing as vendors may use multiple modes of record delivery, one of which may be an FTP site, it is often not possible to establish a single workflow for retrieving records that applies to all content purchased from a vendor.

A large record set could contain anywhere from tens of thousands to hundreds of thousands of records. Given the limitations on the size of email attachments, there are some record sets for which an FTP file transfer is the only viable option. Even when very large files are retrieved in this manner, the files may be zipped in order to make them easier to download. Such files need to be unzipped locally before they can be edited.

Notices that records have been posted on FTP sites are generally sent through
an automated system. The person receiving the email should carefully read the details found within it. Important details include the address of the FTP site, the login information, the name and location of the file, the length of time the file or files will be available for download, and contact information if there are problems. Login information may change from time to time or it may remain the same. Sometimes staff have difficulty retrieving files when they haven’t noticed that the login information provided in the current email is different from what was used in the past. Other problems can occur when staff are not aware that they have downloaded a zipped folder and try to open it in the MARC editor without unzipping it first.
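As a concrete illustration of the retrieval steps described in such notice emails, here is a minimal sketch using Python’s standard ftplib and zipfile modules. The host, credentials, and file names are hypothetical placeholders that would come from the notice itself.

    # Retrieve a posted record set from a vendor FTP site, then unzip it
    # locally before opening anything in the MARC editor.
    import zipfile
    from ftplib import FTP

    with FTP('ftp.example-vendor.com') as ftp:
        ftp.login(user='libraryuser', passwd='from-the-notice-email')
        with open('bigset.zip', 'wb') as fh:
            ftp.retrbinary('RETR /outgoing/bigset.zip', fh.write)

    # Unzipping first avoids the "won't open in the editor" problem noted above.
    with zipfile.ZipFile('bigset.zip') as zf:
        zf.extractall('bigset_unzipped')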
The time limit for which files are available on FTP sites can be problematic for some academic libraries that are not fully staffed at certain times of the year, such as over the summer. Often the time limit is 90 days, which may seem adequate in most situations. However, if notice of the file is received in late June and a limited staffing complement doesn’t get around to attempting to download the file until the university opens again in September with its full staffing complement, the time limit may have expired in the meantime. While it is often possible to get record sets reposted, doing so may delay the delivery of other record sets and/or result in additional costs for the library. In addition, some very large record sets can take a surprisingly long time to process if the workflow is particularly complex. For example, if there are a number of validation errors and problems with diacritics in a record set, it can take the person using the editor an unusually long time to prepare it. If the record set then fails to load into the ILS because of undetected problems with the file, it may need to be passed back to the cataloguing department for more work. Large record sets may need to be passed back and forth multiple times depending on how large the record set is, how old the records are, and the original source of the records (i.e., whether they have been harvested from another metadata schema). If the library is closed, if a more pressing issue arises, or if a staffing change or some other event leaves the record set not actively worked on for an extended period, it can easily take a few months before the records are actually loaded into the ILS. There was one instance in the author’s library where the notice that a record set had been posted was misdirected and was not discovered by the cataloguing staff until nearly six months after the record set was posted. The notice said that the record set was only available for 90 days. Fortunately, when the cataloguing staff went to retrieve the record set it was still there and could be retrieved. However, there have been other situations where some very large record sets were removed after the time limit and it was difficult to get them reposted.
It is a good practice to keep an archived copy of every very large record set exactly as it was downloaded until the records have been successfully loaded into the catalogue. If a record set gets misdirected or corrupted in the process of editing and loading and the records need to be retrieved again so that the process can be restarted, the library can use the archived copy. This eliminates the risk of having the record set disappear from the FTP site before the records can be successfully loaded. Given the size of the files, the archived copy of the original file shouldn’t occupy regular server space indefinitely. When it appears that the records have loaded successfully, the backup file can be deleted to recover local file storage space. Or, as will be discussed later, the library may wish to compress and archive the file in a special location for potential use later on.
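Compressing the archived copy is straightforward to script. The following sketch keeps a gzipped copy of the original file in a separate archive directory until the load has been verified; the paths are hypothetical.

    # Keep a compressed, untouched copy of the original download until the
    # load into the catalogue has been verified.
    import gzip
    import shutil
    from pathlib import Path

    original = Path('bigset_unzipped/bigset.mrc')
    archive = Path('archive') / (original.name + '.gz')
    archive.parent.mkdir(exist_ok=True)

    with open(original, 'rb') as src, gzip.open(archive, 'wb') as dst:
        shutil.copyfileobj(src, dst)  # delete later, once the load is confirmed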
(5) Customized delivery of records from a third partybased on KB information
An increasingly common mode of retrieving eBook discovery metadata is to have the delivery of record sets mediated by a third party that maintains a KB into which the eBook vendors and aggregators feed metadata about both their products and what their customers have purchased. The third-party vendor can then use that metadata to extract the appropriate MARC records for each customer and combine them into record sets that the customer can retrieve. The two best-known examples of this type of service in the academic library sector are OCLC’s WorldShare Metadata Collection Manager (see: http://www.oclc.org/worldshare-metadata.en.html) and ProQuest’s 360 MARC Updates service (see: http://www.proquest.com/products-services/360-MARC-Updates.html).
This diagram shows the process through which eBook vendorscan use OCLC’s WorldShare Metadata Collection Manager todeliver eBook records and update records to their customers.
Given that eBook MARC record services such as these have been undergoing significant changes in recent years, describing how any particular service works in great detail is likely to result in something that will be out-of-date almost as soon as this book is printed. A number of companies, including Ex Libris (Alma) and EBSCO (EDS), have been developing their own systems for managing various aspects of eBook metadata and providing a discovery interface. It is likely that these developments, and other new services yet to be introduced, will have an impact on the overall eBook metadata environment for academic libraries in the near future. While the delivery of MARC records isn’t part of many vendors’ current services, librarians should keep an eye on news about new products and services as the eBook metadata environment continues to grow and shift.

The focus in this part of the chapter is on those services for the delivery of MARC
records to libraries based on information retrieved from a central KB. There are some things that all cataloguing and metadata librarians should take into consideration about MARC record services that use KBs of metadata that has been supplied and updated by vendors.

First of all, these services are intended to address the very problem that has
likely motivated many readers to pick up or download this book in the first place.That problem is that eBook discovery metadata can be very difficult to manage formany libraries. Managing eBook discovery metadata is definitely a tiger thatstresses and eats away at the resources of technical services staff in manyacademic and research libraries because it doesn’t fit neatly into existing technicalservices workflows and processes. As already discussed throughout this book, thecontent of eBook collections can be dynamic with titles coming and going out ofsome packages fairly regularly. In addition, URLs may need to be updated overtime and the MARC records themselves often need to be enriched to optimizeeBook discoverability in multiple discovery environments. Not only is it difficult
for libraries to manage all of the change, vendors must deal with the change as well. They have many customers, and each customer has a somewhat unique metadata or discovery environment. The disruption that has been created in a predominantly MARC-based discovery environment presents a significant challenge for libraries that have experienced shrinking technical services departments. The author suspects that vendors also find it challenging to deal with a library-specific technology outside of their expertise while remaining viable in a highly competitive eBook market. The idea that a third party would essentially accept metadata updates from vendors and enter the information into a multipurpose KB, from which MARC records and metadata updates can be automatically generated, appears to be a response to a disruptive innovation that can help both libraries and eBook vendors to continue to thrive rather than continue to suffer negative impacts from the disruption.

Second, as ideal as the solution of KB-based MARC record services sounds, there
is one limitation that many academic libraries with large eBook collections will undoubtedly encounter. This is particularly true if the eBooks include specialized and international content. Even with a robust service such as OCLC’s Collection Manager, not all eBook content is found in the KB, and even if metadata for that content can be added and an initial record set generated, the vendor may not update it regularly, if at all. That being said, even if only some of a library’s eBook metadata can be managed through a third-party service, doing so has the potential to improve the efficiency of managing eBook discovery metadata for many libraries.

A key point is that some libraries, if not many, will likely have to use mixed
methods for creating and managing their eBook metadata. Some of these mixed methods may still include original and copy cataloguing, which consume a lot of time and resources. Therefore, the larger the proportion of eBook metadata management that can be handled through automated processes, the better it is for the overall efficiency and effectiveness of eBook metadata management at that library.

Third, depending on the services the library already uses and the library’s
overall metadata environment, adopting one of the KB-based services may be a natural fit or it could be difficult and costly to implement. For example, many of the KB-based services make use of a KB that is also used by a link resolver and/or a non-MARC-based discovery service. If the library already uses a service offered by the vendor who manages the KB, getting a MARC delivery service may be an add-on that, while it has an associated cost, is well worth both the improved discoverability and the ease of MARC record maintenance that such a service represents. Or, in the case of OCLC’s Collection Manager, it is a service that is currently provided to the author’s library at no extra charge along with an OCLC cataloguing subscription. Many academic libraries in North America already have cataloguing subscriptions with OCLC, just as the author’s library
does, and thus beginning to use Collection Manager essentially means setting up record set delivery within WorldShare and adjusting workflows. In fact, for those libraries that already have OCLC memberships for cataloguing services, it would represent a significant lost opportunity not to at least try setting up record delivery for a handful of eBook collections to understand how the service works and to see how it might benefit the effectiveness of the library.

On the other hand, there may be a number of academic and research libraries
around the world where using one of these services is prohibitive for one reason or another. The author has spoken to librarians who are not able to select the vendors and products used at their local library. The decision may be made at a regional level, through a consortium or by the government, so that resources can be shared while reducing the overall work associated with purchasing and maintaining them. Given the high costs of the specialized products and services academic libraries sometimes purchase, the shrinking budgets with which many libraries need to contend, and the overall complexity of academic libraries with regard to their metadata environments as well as their administrative and funding structures and policies, attempting to implement a new service and/or process may be more challenging than it appears on the surface.

Fourth, given the complexity of the systems that libraries use, the more systems
libraries need to make work together, the more challenging managing the effective exchange of information between systems can become. Chapter 9 contains a section that discusses the importance of creating metadata flows documentation to assist with managing this complexity. Many KB-based MARC record delivery services can technically be used by libraries with any ILS, any link resolver, any discovery layer, and so on, but they don’t appear to perform consistently well in every environment or context. It has been the author’s experience that MARC delivery services are often much more effectively managed if other applications and services from the same vendor are in use as well, due to the reduced need to transfer metadata between systems. For example, as far as the author can tell from viewing the WorldShare training videos, many of the challenges discussed in this book would be eliminated if a library were to use the WorldShare LMS, because much of the complexity of metadata management, including the need to get multiple systems to work together, disappears when services are integrated either in the cloud or within a single system and a KB is shared by multiple services within that system. In fact, in a completely cloud-based LMS, there is generally little to no need to use MARC record sets, link resolvers become largely irrelevant, and other supplementary services such as outsourced authority control become simply redundant. Thus, some of the newer comprehensive LMSs theoretically hold the potential for significantly reducing the amount of complexity that the local library needs to manage.

While there are many benefits to adopting a single, comprehensive solution to
managing both library metadata and automated library functions through a single
service or a suite of services offered by the same vendor, it appears that academic libraries have yet to find a true panacea for the reality of their complex metadata and information technology environments. Or, if there is a panacea in the making, it may be too early to call it that. The bottom line is that for many libraries, it is a massive undertaking not just to migrate from one ILS to another but to completely revamp the entire model on which the library’s information technology and metadata environment is based. As a result, many academic libraries may wish to include following developments in the larger area of LMS innovation as part of their eBook metadata management plan. Collecting and recording information about the local system while simultaneously following developments and emerging trends in the larger academic library environment can help those creating the metadata management plan determine where local practices are in line with the direction in which libraries in general are moving and where they aren’t. The information about innovations, developments, and trends may also guide decisions about what changes to make and how to make them.

While some libraries may eventually be forced to make a gigantic leap from one
model of managing metadata and discovery to another, it may be possible for many libraries to gradually shift their policies and practices to align the library with the emerging trends in the library environment. Libraries are more likely to fit into the latter category if their librarians have been actively monitoring those trends and looking for opportunities to make small changes along the way.
(6) Delivery of records from a third-party cataloguing vendor
Some eBook vendors do not supply MARC records to customers free of charge but arrange with a third-party cataloguing vendor to produce record sets the library can purchase. Some vendors such as Cassidy Cataloguing (e.g., Westlaw records), Marcive (e.g., U.S. government publications), and OCLC (e.g., NAXOS Music) provide eBook records to libraries using this type of arrangement. The cost of the records may be calculated on a per-record or per-record-set basis. Depending on the vendor, there may also be a subscription fee on top of the cost of the records, and the subscription fee may be charged annually. In the author’s experience she has paid as little as $0.30 USD per record and up to a hefty $2 USD per record. If the library purchases a lot of eBooks from a vendor that uses this mode of record set delivery, the costs can add up quickly. However, for the most part, the cost of purchasing these records is still significantly less of a drain on the library’s resources than doing original and copy cataloguing for the eBooks.
The author has received some very high-quality record sets for eBooks from third-party cataloguing vendors. These records generally have controlled headings, many useful access points, and complete and accurate 505 and 520 fields. From the point of view of cataloguing and resource discovery, there is no question that purchasing these records is a good value. Of course, it would be better if the vendor sponsored the cost of the records. However, it is reasonable to assume that providing these records for free would likely just increase the price of the resource for all customers, regardless of whether they want the MARC records or not. Considering that the anecdotal evidence is that the majority of libraries still use MARC records, it’s not likely that many libraries would be put off by a small price increase.

Quality of the records aside, libraries must be somewhat cautious about third-
party cataloguing vendor records and the associated costs of purchasing them. The author recommends that cataloguing staff be involved with the purchase of new eBook packages, eBooks on new platforms, and eBooks from new publishers. This involvement is to ferret out the precise nature of the MARC records to be provided, including whether or not the records must be purchased and whether there is a subscription fee to be paid. In the author’s library she has encountered more than one situation where fees for record delivery were paid twice because the choice of the source for record sets wasn’t coordinated appropriately. She has also had the experience of discovering that the cost of purchasing records for an electronic document collection was significantly greater than the cost of the collection itself. Cataloguers didn’t find the high cost of the record sets unusual, considering they have been known to spend a considerable amount of time cataloguing resources that are either low-cost or free. However, this realization came as a surprise to the selector and had an impact on the electronic resource budget, as that is the budget from which the record set costs were paid. If a library has a particularly tight budget, it is important to know whether or not the library will need to pay for the MARC records for a new purchase and the estimated costs for those records.

The updating of records provided through third-party cataloguing vendors can
sometimes be an issue. The author has experienced situations where the updating has basically needed to be done on a record-by-record basis when either patrons or library staff noticed a problem, such as a URL not working. While some cataloguing services also offer a record update service, these services aren’t uniformly available for all eBook products. If the collection is large, it is especially important to know up front whether or not the vendor will provide update records. If the cataloguing vendor doesn’t supply the records automatically and/or if problems must be reported directly to the eBook seller on a title-by-title basis, the maintenance of these records over time may be somewhat more time-consuming than with other eBook discovery metadata. Fortunately, many cataloguing services do offer record update services as well. It has been the experience of the author that these services generally cost in the range of a few hundred dollars USD per year.

In terms of the actual retrieval of record sets, it has been the experience of the author that these records need to be retrieved from either a password-protected website or an FTP site.
(7) Record sets of harvested metadata
While the author has never had to harvest metadata in order to create MARC discovery records for purchased eBook content, she has harvested metadata for open access eBooks and other digital resource collections. A full discussion of harvesting metadata and the various processes and applications that can be used for this purpose is largely outside the scope of this book. However, given that many libraries find it useful to include metadata from digital repositories, thesis and dissertation collections, and other digital content in their local discovery systems because that content is relevant to local students and researchers, a rudimentary discussion of metadata harvesting for the purpose of creating MARC record sets is appropriate.

Those libraries and other information organizations that have locally hosted collections of digitized or born digital resources and have made their metadata harvestable by other libraries have generally implemented what is called OAI-PMH, which stands for Open Archives Initiative Protocol for Metadata Harvesting. This protocol allows for a standardized method of exposing the metadata for a digital collection for harvesting via HTTP (i.e., the web). OAI-PMH was originally designed to harvest Dublin Core (DC) metadata into a file of DC records. In the past decade some development has occurred that makes OAI more robust in terms of the different schemas it can handle, but in reality the vast majority of harvestable metadata that libraries may be interested in will be in the DC format.

For those libraries that have local digital collections held in products such as CONTENTdm, DSpace, or any of the newer products, OCLC has produced and updated a useful document that will guide libraries in the preparation of repository metadata that not only is OAI-PMH compliant but also will transfer well into other metadata containers such as MARC. Note that this document does not reflect many of the RDA considerations that libraries initiating a new digital repository may wish to build into the method for structuring and creating metadata.
OCLC (2013). Best practices for CONTENTdm and other OAI-PMH compliant repositories: Creating sharable metadata (Version 3.1). Retrieved from: http://www.oclc.org/content/dam/support/wcdigitalcollectiongateway/MetadataBestPractices.pdf
The actual methods and tools for harvesting OAI metadata are numerous and varied. In fact, it is highly likely that one or more of the applications already used locally at the reader’s library has the capacity to harvest records. Some ILSs and most discovery layers have functionality in this regard. Those working on the metadata management plan may wish to investigate and document what metadata is being harvested via OAI-PMH, how it is being processed, and where it is eventually stored and used locally. This activity may prove useful for identifying where undesired duplication of processes is occurring and for isolating processes that need updating.

For those libraries that don’t already have an established method for harvesting metadata and may not have a local metadata environment that supports harvesting external metadata sources, the author can suggest one tool likely to be useful to nearly all academic libraries. MARCEdit (www.marcedit.org/), which will be discussed in greater detail later in this chapter, has an OAI harvesting tool that will harvest metadata from an OAI-compliant source directly into a MARC file, which can then be edited for loading into the local bibliographic database or discovery system. The following video was created by Terry Reese, who is also the developer of MARCEdit. It explains step-by-step how the OAI tool can be used to harvest metadata:

https://www.youtube.com/watch?v=gvBrMVH6j7U (for readers of the print version of this book: go to YouTube and search for the title “Translating OAI metadata to MARC using MarcEdit”).
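For readers curious about what such a harvest looks like at the protocol level, the following sketch pages through an OAI-PMH ListRecords response and follows resumption tokens, saving the raw DC XML for later conversion; MARCEdit’s OAI tool performs equivalent steps (plus the MARC conversion) behind its interface. The repository base URL is a hypothetical placeholder.

    # Harvest DC metadata over OAI-PMH, following resumption tokens.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    BASE = 'https://repository.example.edu/oai'   # hypothetical repository
    NS = {'oai': 'http://www.openarchives.org/OAI/2.0/'}

    params = {'verb': 'ListRecords', 'metadataPrefix': 'oai_dc'}
    page = 0
    while True:
        url = BASE + '?' + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url, timeout=30) as resp:
            xml_bytes = resp.read()
        page += 1
        with open(f'harvest_{page:03}.xml', 'wb') as fh:
            fh.write(xml_bytes)
        # A resumptionToken is present while more records remain; an empty
        # or missing token means the harvest is complete.
        token = ET.fromstring(xml_bytes).find('.//oai:resumptionToken', NS)
        if token is None or not (token.text or '').strip():
            break
        params = {'verb': 'ListRecords', 'resumptionToken': token.text.strip()}
    print(f'harvested {page} response pages')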
As will be discussed later in this chapter, MARCEdit is an application that has new versions released every few months. While the basic functionality generally stays the same over time, the actual appearance of the interface does change. Therefore, the options and appearance of the tool demonstrated in the video may look or function somewhat differently than they do now or will in future versions. The key with all of the MARCEdit videos Terry Reese has created is that they introduce the tool in a way that helps users understand the basics of how the tools and functions can be used. The MARCEdit user community has a strong international presence, so searches for additional videos, other web content, and listserv discussions will likely produce more detailed information and discussion about any of the MARCEdit functions. This information also typically includes discussions about the changes that occur with each release.

One final note about harvested metadata has to do with the nature and quality
of the resulting record sets. When OAI metadata is harvested, most of the time that metadata was created using the DC schema. As discussed previously in this book, different schemas have different characteristics, and there is a significant difference between MARC and DC with regard to both granularity and robustness. The crosswalks built into many of the tools, such as MARCEdit, actually do a fairly good job of matching DC metadata to MARC fields and subfields considering the disparity between the two standards. That being said, harvested records will always require some local tweaking and enrichment to increase their usefulness as MARC discovery records. There is another issue with harvested records that is sometimes overlooked: many DC records that have been created over the years contain metadata that doesn’t conform to any descriptive metadata standard (AACR2, RDA, RAD). Given that DC records are so simple, the need for a descriptive standard is less obvious than with other schemas. However, it is useful to recognize that titles and the manner in which pages have been counted, for example, may vary from what is typically found in MARC records. In addition, the author has noticed that metadata from digital repositories often doesn’t make use of the controlled vocabularies used in other library metadata, including both subject and name authorities. More than once the author has discovered that harvested metadata has been transferred into MARC tags and coded as if headings were controlled according to LCNs or LCSHs when they were not. Depending on the local metadata environment, this could cause problems with automated heading processes, the effectiveness of “see” and “see also” references in the OPAC, and the accuracy of facets in a faceted discovery system. The bottom line is that all harvested metadata will likely need to be examined more closely and will typically require more editing and repair to function as useful MARC discovery records, relative to the other types of record sets discussed in this chapter.
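To make the granularity problem concrete, the following toy illustration (not MARCEdit’s actual crosswalk) maps a flat DC record onto a handful of MARC tags and prints MarcEdit-style mnemonic lines. The mapping choices and the sample record are illustrative assumptions; note that DC subjects land in an uncontrolled field precisely because of the authority-control issue described above.

    # A toy DC-to-MARC crosswalk showing how little structure DC carries over.
    DC_TO_MARC = {
        'title':       ('245', 'a'),
        'creator':     ('100', 'a'),
        'subject':     ('653', 'a'),  # 653 (uncontrolled index term) because DC
                                      # subjects often follow no authority; 650
                                      # would wrongly imply LCSH control
        'description': ('520', 'a'),
    }

    dc_record = {   # a hypothetical harvested DC record, flattened to a dict
        'title': 'Annual report 1912',
        'creator': 'Example Provincial Department of Agriculture',
        'subject': 'agriculture',
    }

    for element, value in dc_record.items():
        tag, code = DC_TO_MARC.get(element, ('500', 'a'))  # default: general note
        # Print a mnemonic (.mrk-style) line with blank indicators.
        print(f'={tag}  \\\\${code}{value}')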
(8) Locally generated record sets

Locally generated record sets are those containing records that have been gathered together for bulk processing by library staff. There are three key ways in which record sets can be extracted locally. The first is to use a report or query function built into the ILS/LMS to gather together records stored in the bibliographic database; functions that are part of the ILS are then used to apply changes to this group of records without having to reload the records into the local system. The second method is to take the same group of records and export them in a MARC file so that they can be edited in a MARC editor and then reloaded into the bibliographic database by overlaying the existing records. The third method is to use a z39.50 tool to query an external catalogue such as WorldCat in bulk and have those records exported via a single MARC file for editing in a MARC editor. After editing, the records are added to the local system.
Because the first method depends on the ILS/LMS used and the functionality that is specific to that system, a discussion of how this type of record set editing is done is outside the scope of this book. However, it is important for those creating the metadata plan to investigate when and how this approach to creating and editing record sets is used and to record the relevant practices and procedures. During those investigations, librarians should take note of whether the procedures pose any particular risk to the integrity of the records or the bibliographic database as a whole. If so, what would the benefits and drawbacks be of exporting those records to an editor to make the changes? It is possible that some of the existing practices and workflows were set up before MARC editors were in common use and the old practices have not been reexamined in light of more recent developments.
With regard to the second method, extracting records from the local database to edit them in a MARC editor (such as MARCEdit) is recommended in a few situations:
(1) When multiple types of edits must be made.
(2) When edits logically need to happen in a certain order (e.g., copy 440 to 490 and then delete 440; if not done in this order, there will be nothing to copy; see the sketch after this list). If a mistake is made in the logic of which processes need to happen in which order, it is easy to start over again with the extracted record set.
(3) When the local ILS/LMS doesn’t support the editing function that needs to be done.
(4) When the record set is very large and the criteria upon which records were gathered may not be reliable. If a mistake is made, there is a technique that will be explained shortly that can be used to reverse the process.
(5) When a significant amount of the editing can be done by a lower level of staff, or by staff who are more experienced in working with MARC record sets, if the records are edited externally.
(6) When staff are more comfortable with the MARC editor than they are with the bulk editing features of the ILS/LMS.
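The order-dependent edit in situation (2) can be made concrete with a short sketch. The following uses pymarc to copy each 440 field to a 490 and only then delete the original; performed in the reverse order, there would be nothing left to copy. The file names are hypothetical, and a real clean-up would also review indicators, which differ between the two tags.

    # Copy 440 fields to 490, then delete the originals, in that order.
    from copy import deepcopy
    from pymarc import MARCReader, MARCWriter

    with open('extracted.mrc', 'rb') as src:
        writer = MARCWriter(open('edited.mrc', 'wb'))
        for record in MARCReader(src):
            if record is None:
                continue
            for field in record.get_fields('440'):
                series = deepcopy(field)
                series.tag = '490'          # step 1: copy the 440 into a 490
                record.add_ordered_field(series)
                record.remove_field(field)  # step 2: only now delete the 440
            writer.write(record)
        writer.close()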
When doing this type of editing, it is generally useful to keep a copy of the original extracted MARC file and then save the working file under another name. Each time a successful operation is carried out, the file should be checked to ensure that the desired results were achieved and then resaved under the working file name. If a problem does occur along the way, the person working on the file can always close the file without saving it and then reopen the last-saved version. As a last resort, the copy of the original file can be opened and editing begun again if the working file becomes too “mixed up” during the editing process. It remains essential to always keep a copy of the original file exactly as it was extracted. Specifically, the problem is that sometimes it is discovered after the fact that the editing was applied to records inappropriately. If this occurs, the untouched record set can be used to overlay the records back to their original state, and then a new, more accurate query can be built to retrieve the correct records. In general, editing records outside of the ILS/LMS is a low-risk operation that may prove useful for many complex eBook record clean-up processes.
With regard to the third method, which is using a z39.50 tool to extract records in bulk, MARCEdit’s tool is one of the easier-to-use methods for gathering MARC records in groups from other libraries’ catalogues, provided the target library allows access via z39.50. The following is another of Terry Reese’s videos explaining this tool:

https://www.youtube.com/watch?v=y0YibTP1dIs (for readers of the print version of this book: go to YouTube and search for the title “MarcEdit’s Z39.50 Functionality”).

The above video provides a general overview of how the tool works and the different possibilities for extracting records from other catalogues using the z39.50 protocol. More detailed videos are available on YouTube, and Terry Reese has additional help files on the MARCEdit website at http://marcedit.reeset.net/help.

Readers may question why such a process may be desirable for extracting record sets for eBooks. The author has used the z39.50 tool in MARCEdit many times for different purposes. In one case she had heard that an academic library had already harvested metadata for some open access eBooks using the method described in the previous section; that library had spent months cleaning up the harvested metadata, and the author was able to extract good-quality MARC records from its catalogue for eventual loading into her own. In another instance she was able to extract records that another academic library had created for conference proceedings, and in yet another case she was able to extract a large number of records for government publications. Fortunately, in each case the library that created the catalogue records in the first place was able to supply the author with metadata that could be used to extract the correct records. While the same records could have been downloaded using traditional copy cataloguing methods, it would have taken several cataloguers literally months to complete one of the collections. Using the z39.50 tool, the author was able to complete the entire process and prepare the records for loading into the catalogue in less than a day.
6.12 Multiple modes for providing record setsFor those who work directly with retrieving record sets it is important torecognize that vendors often use more than one method for making record setsavailable to their customers. Some eBook vendors may offer their metadatathrough a variety of sources so that their customers can have a choice and canselect the option that best suits their overall metadata environment. Options mayinclude providing a record set generator on their website, record delivery througha third‑party cataloguing vendor, and metadata feeds to third‑party KBs. In otherinstances, a vendor may make record sets for their generic eBook packagesavailable for download on their website but may email records for customizedselections of eBooks and purchases through consortia directly to the library. Other
times a vendor may provide free records for some collections while record sets for other collections must be purchased through a third-party cataloguing vendor.
A best practice is to inquire about the record sets during the process of
considering a new eBook collection. It is important not to assume that record sets will automatically be made available to the library for free as part of the purchase, or that the mode of delivery will be the same as for other collections from the same publisher or on the same platform. If multiple options are available, the library should select the option that is most efficient and most likely to be effective within the context of the larger eBook metadata management plan.
The author has experienced a situation that demonstrated to her the need to
carefully select the mode of record delivery to avoid the costly and time-consuming results that can occur when a mode that may be an excellent choice for many libraries is a misfit with the local bulk processing environment. An eBook aggregator was preparing for the library what the vendor called "enriched" records on a charge-per-record basis plus an annual subscription for providing the service. This was set up before the author began working at this library, and there was no record of subscribing to the service in the existing metadata. Unbeknownst to the person who set up the record set delivery service from the aggregator, the publisher was also providing MARC records for that same content because the library had also purchased package eBooks directly from the publisher. Records for content purchased from the aggregator were being delivered to the library free of charge from the publisher in addition to the purchased records the aggregator supplied. The free records were duplicating the records sent by the aggregator and creating a highly confusing situation in the ILS. To complicate the situation, when there were URL updates or other corrections to the records, the publisher was sending updated records automatically, but the aggregator didn't send updated records unless the library requested an update on a title-by-title basis. It took the author and library staff nearly two years to sort out the mess created by selecting a record delivery service that was not appropriate for the larger metadata environment at the library. Sorting out record redundancies and resolving the loading errors these redundancies generated were costly in terms of library staff time. Considering that the library had paid thousands of dollars for the aggregator-supplied records over the years, the costliness of not integrating a record delivery method with the larger environment was particularly pronounced in this situation.
In the scenario just described, the cost-per-record service could have been a cost-effective mode of receiving record sets for eBooks purchased from the aggregator if the library didn't already purchase other content directly from the publishers on the same platforms and/or if the library didn't have a full OCLC cataloguing subscription. In reality, if a library were to initiate an OCLC cataloguing subscription just to receive, at the maximum, a few hundred records per year, the cost would be completely prohibitive. In such a situation the aggregator's service
would truly be an excellent choice, but this was not the case at the author's library. Given that the only options for receiving MARC records for the library's package purchases from the vendor were to use the MARC records supplied either directly from the vendor or via a MARC record subscription service from OCLC, the library needed to use these methods in their local workflows. The addition of records for individually purchased eBooks from the aggregator should have been investigated and then integrated into the existing workflows rather than being implemented in isolation from the larger eBook metadata context.
Some readers may question how inappropriate choices for record delivery
modes could happen in the first place, while other readers may have had experiences similar to the one described by the author and understand how it is relatively easy for this sort of thing to happen. The reality is that no single cause is responsible for all delivery mode mismatches, other than the lack of an overall eBook metadata plan or, if there is one, the fact that the plan has either not been reexamined often enough or library staff aren't aware that the plan exists. In the case of the author's example, the cause of the initial choice was that only selectors and acquisitions staff were involved with the initiation of record delivery, and the only information they had to inform their decision was that MARC records were required. Had someone who was directly involved with the eBook metadata management plan and/or who understood the bigger picture of bulk processing at the library been involved with the process, the same choice would not have been made. Those making the decision simply didn't have the information they needed to make an appropriate choice. This situation is another demonstration of the disruption eBooks have brought to academic libraries. In the past it was not necessary to include cataloguing and metadata staff in the process of selecting and acquiring library resources, but not doing so today can potentially have costly results.
Another thing that could happen is that a record delivery mode that was
appropriate at the time it was originally set up can become problematic as other factors change within the larger metadata environment. If there is no understanding of the bigger picture of how all of the different metadata-related processes interact and/or nobody monitors the impact of changes in those processes over time, a once functional process can begin to cause detrimental effects. Even when someone is monitoring processes, the author has seen firsthand that changes can have unexpected negative impacts. At least when these unexpected results occur, if someone is monitoring the situation the problem is more likely to be detected and remedied in a timely fashion. Those working with the record sets have information about the processes that are underway simultaneously. This information takes the form of the documentation found within the metadata management plan. While the author does not have as complete documentation of her metadata environment as she would like, she has still been able to successfully use the documentation that she does have to unsnarl
a few unpredicted problems brought about by what seemed like relatively small and innocuous changes to other processes. The power of even rudimentary documentation to assist with taming tigers has been proven repeatedly in the author's library in recent years.
6.13 When record sets aren't available
Another reality that sometimes catches eBook selectors by surprise is that some specialized eBook publishers do not supply record sets. Sometimes the content is so specialized that the vast majority of the vendor's customers are located in professional practice rather than academic libraries. In the case of customers who are situated within professional practice, MARC records are not required because law offices, engineering firms, medical clinics, and so on, do not typically have ILS/LMSs and thus have no use for MARC records. In such cases the eBook readers may search for content directly on the vendor's platform and may use direct links to eBook content from an intranet page. Considering that MARC record creation requires specialized knowledge and is generally costly, it seems reasonable that eBook vendors who have relatively few academic library customers are unlikely to readily provide free MARC records.
Another reason that record sets may not be available for some "eBooks" is that
the platform design and infrastructure of the information is not suitable for creating records. The author has found that two different scenarios can occur in this regard. In both cases, the publications were originally published in print as monographs and sometimes monographic series. Examples include dictionaries, directories, encyclopedias, consolidations of legal literature, and handbooks. In one case the content from numerous monographs was broken into parts and then recombined in a database format as a new resource. Often these new resources are no longer monographs but integrating resources, or what is often called an "updating website" where new content is integrated with the old rather than publishing new editions as would be the case with monographic publications. As such, the library has purchased information that is the equivalent of what was in a number of former print monographs, but the original publications no longer exist as distinct entities in their new electronic format. In that format it is not possible to catalogue the former constituent parts because they no longer exist as distinct manifestations. Consider the fictitious example whereby there are five print monographs: Encyclopedia of cat behavior, Encyclopedia of dog behavior, Encyclopedia of bird behavior, Encyclopedia of rabbit behavior, and Encyclopedia of guinea pig behavior. In print these encyclopedias were published as five separate volumes. In electronic format, the articles in each of the five volumes have been combined to create a new resource called The consolidated encyclopedia of domestic domiciliary animal behavior. The main page of
this resource features a search page as well as information about the contents of the database, but it is not possible to browse through any of the former publications as one might if they had been digitized as eBooks rather than a database. In such a situation, a catalogue record for each of the former monograph titles would not be appropriate because it is no longer possible to link to a page that will take the patron to just that content. In the new electronic resource, the Encyclopedia of bird behavior no longer exists as a distinct resource, although the contents of that resource are included in the new product. The only appropriate catalogue record to make in this situation is for the new resource, The consolidated encyclopedia of domestic domiciliary animal behavior.
Some vendors will occasionally supply catalogue records for each of the former
print monograph titles included in database-format integrating resources, but the 856 field in each of the records links to the same search page. The author has experimented with adding records such as this to the local catalogue, but the results have been highly disorienting for both staff and patrons. This was especially true when the MARC records were transferred over to the discovery layer, where the records completely failed to be useful for discovery. Given the way that MARC records are structured, it is not possible to accurately describe, within the context of a discrete bibliographic record, content that is nonsequentially distributed within a larger resource. Given that creating records for each former title that all link to the same database search page has proven confusing at the author's library, but selectors want the ability to discover the resource by searching the former print monograph titles, an alternative solution needed to be found. In most situations, the author has added the appropriate 76X-78X MARC linking fields to the catalogue record for the new databases to reflect what is generally a vertical relationship between the content from the print monographs and the content in the new database. This solution only works for those libraries with discovery systems that can make use of the MARC linking fields. It is also a solution that works best when the library has retained copies of the print resources and linking fields to the electronic resource are placed in those records as well.
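As an illustration of what adding one such linking field might look like in a scripted workflow, the following is a minimal sketch using the pymarc library (assuming pymarc 5.x; earlier versions build subfields from a flat list rather than Subfield objects). The file names are hypothetical placeholders, the title comes from the fictitious example above, and the 774 "Constituent Unit Entry" is one plausible choice among the 76X-78X linking fields.

```python
from pymarc import MARCReader, MARCWriter, Field, Subfield, Indicators

# Add a 774 (Constituent Unit Entry) linking field to the record for the
# new database, pointing at one of the former print monograph titles.
with open("database_record.mrc", "rb") as infile, open("linked.mrc", "wb") as outfile:
    writer = MARCWriter(outfile)
    for record in MARCReader(infile):
        record.add_ordered_field(
            Field(
                tag="774",
                indicators=Indicators("0", " "),
                subfields=[
                    Subfield(code="i", value="Contains:"),
                    Subfield(code="t", value="Encyclopedia of bird behavior"),
                ],
            )
        )
        writer.write(record)
    writer.close()
```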
The bottom line is that in scenarios such as the one just described, librarians should not expect to be supplied with MARC records or, if MARC records are provided, they may not be suitable in the local metadata and discovery context. Ultimately, libraries will likely need to work out something locally, and the solution will likely not be something that can be addressed through bulk processing of record set records.
A second situation the author has encountered more than once with highly
specialized resources is that formerly print resources are not only combined and integrated in the way described above but the content is also transformed into an interactive experience for users. In one situation, a Java-based application was used to combine encyclopedia articles with interactive diagrams from a diagram
collection and lecture podcasts on a dynamic page that is generated in response to a search query. In such an environment, stable URLs for the content are not available, even if the content can be generated in a way that lets users scan through a given resource from beginning to end, and thus there is no suitable URL to place in an 856 field of a MARC record. This type of resource is typically excellent for teaching and learning but extremely difficult to address from a cataloguing point of view if selectors and faculty expect to be able to link directly to the content within the interactive environment. Given that the way in which the content is displayed is dynamic and embedded within an application, creating granular records for the resource content is either impractical or impossible. This is a situation where having the cataloguing staff attend demonstrations and ask questions about products before they are purchased can be helpful. If the cataloguer has enough information to determine whether or not creating records for specific content within the resource is possible, that may be important information for selectors to consider before making a decision. If the resource being considered is intended for teaching and learning purposes, the inability to make the content discoverable and accessible at a more granular level via the discovery system may be irrelevant. However, if librarians expect to use the resource for reference purposes and faculty and researchers need to be able to cite specific sections of the resource, the resource format may be considered inappropriate. In the author's experience, there have been multiple occasions where researchers have found certain electronic resources unacceptable for their purposes because of the inability to refer to pages cited in articles and other publications or to cite their own sources.
6.14 Collaboration between library functions
Because of the complexity of options and the interrelated nature of systems within the metadata environment, the author recommends that when a purchase on a new eBook platform, a new collection, or a new publisher is being considered, part of the discussion should involve a collaborative team studying how the new purchase and its discovery metadata will fit into the existing system. This is particularly true for DDA/PDA programs, which will be discussed in more detail in the final chapter.
So, why does this collaboration need to happen and who needs to be involved?
Readers may have already come to some conclusions about differences between purchasing hard copy resources and eBooks that may make a collaborative effort essential with regard to making a decision about the cost and long-term feasibility of selecting one option over another. When libraries ordered primarily hard copy resources, the selector knew the approximate cost for obtaining the resource (depending on the region there may be taxes, exchange rates, and other factors that impact the final price), the acquisitions staff had established practices for
obtaining the resource, and cataloguing processes were more or less the same no matter the publisher, type of binding, and so forth. In such an environment newly selected resources could pass through various technical services processes without there being much need for staff performing the different functions to interact, except to pass on the resource between steps in the process. As we have already seen, there is no single way to select, acquire, catalogue, discover, access, and, as we will later see, preserve eBook content in the majority of academic libraries. Even if a library has a preferred method for purchasing and cataloguing eBooks, chances are that some product or platform the library requires won't fit the library's preferences. As the metadata environment of a library becomes increasingly complex, with increasing diversity in various vendors' approaches toward supporting the discovery of their products, the more important it is that selectors work together with acquisitions staff, cataloguing staff, and potentially information technology staff to ensure that the desired level and type of discoverability can be met at a reasonable cost and without being overly disruptive to existing workflows. The library, as a customer, is also more likely to be heard before a purchase if the vendor's approach to supporting discovery doesn't address the needs of the library. If it seems likely that the vendor will not provide MARC records, as discussed previously, the collaborative team can discuss in advance the different options for making the resource discoverable and explore whether there may be other products that would meet the same need and fit better with the existing discovery and metadata environments.
Of course, this type of collaboration would only need to be done when
contemplating getting eBook content on a new platform, from a series of packages the library has never previously purchased, from a new publisher, or any other purchase whose nature differs from what the library has acquired in the past. This team may meet initially to develop a checklist of questions to ask in order to determine whether or not a collaborative meeting is appropriate. In reality, these collaborations would occur rarely, but when they do occur they could potentially prevent considerable headache and disappointment.
6.15 KBART for eBooks
In discussing the bulk processing of eBook discovery metadata it would be negligent to focus exclusively on MARC discovery metadata. The reality is that web-based discovery systems, including the discovery layers used by many libraries, utilize KBART files as their main container for electronic resource discovery metadata. KBART stands for "knowledge bases and related tools." KBART files are best known for their use within discovery services for eJournal content, but KBART recommendations now include provisions for eBook
metadata as well. For those technical services librarians who aren't yet familiar with KBART files, the resources listed in this section may be useful for developing a basic understanding of what KBART files are and how they are used to store and exchange metadata. An interesting point to consider is that KBART files are not just intended to hold metadata for discovery purposes; they can also be used for transmitting acquisitions metadata for potential use in ERMs.
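To make the container concrete: a KBART file is a plain tab-delimited text file whose column names are defined in the KBART Recommended Practice. The short Python sketch below reads such a file and picks out the eBook rows; the file name is hypothetical, and only a few of the defined columns are used.

```python
import csv

# Read a KBART file (tab-delimited text with standard column names) and
# list the eBook rows. publication_title, publication_type, and title_url
# are among the columns defined in the KBART Phase II recommendation.
with open("sample_kbart.txt", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh, delimiter="\t"):
        # KBART Phase II distinguishes monographs (eBooks) from serials.
        if row.get("publication_type") == "monograph":
            print(row.get("publication_title", ""), row.get("title_url", ""))
```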
While the use of KBART files is a significant method for receiving and managing discovery metadata for eBooks, there are a number of reasons why metadata and cataloguing librarians don't generally require a detailed knowledge of KBART files or how to manage them. To begin with, the metadata in KBART files generally isn't created, managed, or updated at the local library; the management is generally done by the eResource vendors themselves as well as third-party companies that build the KBs. The second is that the creators of the KBART standard never intended for KBART to replace MARC and the existing methods libraries and electronic resource vendors use for exchanging metadata. For example, on the United Kingdom Serials Group/UKSG (2014) website, the following statement is made with regard to KBART:
Many content providers and knowledge base developers are already successfully exchanging metadata, and this report is not intended to detract from or interfere with such existing processes. However, it is evident that many others are unsure about how best to exchange metadata. Therefore, we propose entry-level guidelines and instructions to enable exchange of essential metadata.
Thus, while KBART may never be the predominant container for eBook discovery metadata, it is important that metadata librarians and other technical services librarians are familiar with the basics of how KBART works, because existing systems found in many academic libraries make use of KBART files either directly or indirectly. It is possible that future metadata containers will be influenced by the structure of KBART. In addition, many metadata librarians will undoubtedly find the ideas behind KBART to be interesting in a general way.
The final issue is that KBART has not been as widely adopted by eResource
vendors as one might expect. For example, librarians can have a look at the vendors who are found on NISO's list of KBART endorsers (those vendors who agree to provide KBART metadata) to see how many of their current eResource vendors are listed and how many are missing, at http://www.niso.org/workrooms/kbart/endorsement/. Most librarians will likely find that many of their major vendors are listed but some are not. This is not entirely surprising given that NISO and the UKSG have not actively promoted KBART as a container that needs to be widely adopted. As the previous UKSG quote suggests, KBART is not actually recommended for use unless an agency lacks a functional method of storing and exchanging metadata. The author
recognizes that many of the vendors on the endorsement list do use other methods of exchanging metadata and interprets their choice to also provide KBART metadata as part of a larger effort to give libraries options for how they receive metadata from them or via third parties.
For the benefit of readers who aren't yet familiar with KBART, the remainder of
this section includes some key resources for building a basic understanding.
This ALCTS2 webinar recording includes an introduction to KBART in the first
7 min. of the video as well as answers to questions about eBook metadata in various non-MARC metadata containers including KBART, which begins at about minute 36. The URL for this video is:
https://www.youtube.com/watch?v=POGzvWBJ7xs (for readers of the text version of this book: go to the YouTube website and search for the title "Standards for Collection Management - Part 2"). Readers who are interested in eBook acquisitions and licensing metadata may be interested in watching the full video. In particular, there is a discussion of DDA, which is a topic of discussion in the final chapter of this book.
ALCTS is one of the tiger tamers recognized in this chapter because of the high-quality and timely information, training, and documentation that this association provides to technical services librarians who are currently faced with the challenges of dynamic information and metadata environments. For more information about ALCTS, see the notes section for this chapter.
KBART files are explained in the following document:
NISO/UKSG KBART Working Group (2010). KBART: Knowledge Bases and Related Tools. Baltimore: NISO. Retrieved from: http://www.niso.org/publications/rp/RP-2010-09.pdf (this document reflects Phase I of the KBART project).
Phase II of the NISO/UKSG KBART project represents further developments to the Phase I report and also includes discussions of KBART information that is relevant to eBook content:
KBART Phase II Working Group (2014). Knowledge Bases and Related Tools: Recommended Practice. Baltimore: NISO. Retrieved from: http://www.niso.org/apps/group_public/download.php/12720/rp-9-2014_KBART.pdf.
6.16 Bulk processing of record sets
Cataloguing and metadata librarians often have an interest in improving their skills and techniques for dealing with record sets. Record sets are truly a disruptive technology in the manner that was discussed in Chapter 2. That chapter discussed how a disruptive technology is typically affordable, simpler, smaller,
and often relatively convenient to use. Chapter 2 also discussed how the quality of the innovation may be lesser than that of the traditional product or service. In addition, it was noted that the nature of the disruption may allow the business that adopts the disruptive innovation to tap into lower-end markets that traditionally may not have used the product or service or may have had limited utility for it. Finally, the example of Kodak demonstrated how a powerhouse in an industry can be brought to ruin if it doesn't adapt in an appropriate way to the disruptive innovation.
Record sets are definitely affordable, seeing as most of them are provided by vendors for free, and even those that need to be purchased are still more affordable than the same metadata would be if it were created locally using traditional cataloguing methods. Record sets are simpler in the sense that those working with them don't need to be as skilled at cataloguing and as knowledgeable about metadata standards as a professional cataloguer. However, those working with record sets still need a basic understanding of cataloguing and the MARC standard. Record sets are smaller in the sense that a very large number of records can be processed at once, making the amount of work that might have been done to a single record during copy cataloguing applicable to hundreds or thousands of records at a time. Record sets are also convenient because they offer a way to rapidly add, update, or remove MARC metadata as required. While the quality of record set records has been lamented by librarians as disappointingly poor, the author has noticed a marked increase in the overall quality of record set records over the past 4 years. That the quality is improving may indicate that libraries and eBook vendors have been adapting well to the disruption.
Finally, record sets have brought opportunities to libraries that otherwise would not have been able to support the level of discovery for eBooks made possible through the use of record sets. For example, at the author's library she is the only professional cataloguer who does original cataloguing, and there are a handful of library assistants who do copy cataloguing. With this staffing complement it would be impossible to create, let alone manage, the tens of thousands of eBook records that need to be handled each year. Without record sets, the author's library simply would not be able to make the large number of eBooks purchased by the library discoverable in the library's OPAC. The discovery of eBooks would be limited to patrons browsing for them on the vendor's website. Given the large number of vendors and different platforms that would need to be searched, even library staff, faculty, and researchers who extensively use library resources are not likely to know where to begin searching for certain eBooks.
It is possible that some of the readers of this book decided to read it because
they feel the impact of the disruptive pressure brought to the cataloguing department by the proliferation of record sets. The tiger they are dealing with may have come out of the jungle and is either directly behind the door or maybe already in the house. The fortunate news for those libraries that are trying to tame a tiger
is that there is a growing community of tiger tamers in academic libraries around the world who are willing to share what they know. The following section of this chapter will introduce readers to some of the basic tiger taming techniques and introduce a few more tiger tamers.
6.16.1 Mediating record sets
Due to a number of unfortunate experiences, the author recommends that cataloguing or metadata staff mediate the process of loading eBook record sets into the local catalogue rather than receiving a record set and loading it directly into the ILS. Given the diversity of ways the record sets can be created in the first place, the complexity of the metadata environment, and the fact that the metadata often needs to be transferred between systems in the library and/or exported into other systems, it's good to design and use some routine checks and processes on record sets.
For many years, staff at the author's library would load record sets received
from vendors directly into the catalogue with no checking, editing, or correcting of the record set content. These record sets included metadata for print monographs that had been catalogued by a cataloguing vendor and thus were relatively low-risk in terms of containing problematic coding or introducing large amounts of incorrect metadata into the system. Realistically, when record set loading first began at the author's library, MARC editing software was nowhere near as functional as it is today. It likely didn't occur to staff that putting a record set through an editor before loading it might be helpful. With the records for print monographs prepared by a cataloguing vendor, it may have seemed like an unnecessary activity. However, over the years the number of record sets increased, as did the number of problems with those record sets. Problems the author has found over the years include eBook records that lacked 856 fields; if put into the catalogue, these records would not link patrons to the actual eBook content. Another problem was discovered when the 830 and 710 fields didn't seem to match the product purchased. These fields were evidence that the record set was for eBooks the library didn't purchase. With these two problems, the supplier of the record set was contacted and the problem reported. New record sets were sent and the correct records were loaded into the catalogue. If these records had been loaded without mediation, the problem would likely have been reported by patrons and possibly handled on a title-by-title basis, which would both have been poor service to patrons and very time-consuming for library staff to resolve. However, by previewing the record set in an editor before loading and following a routine process, staff were able to detect the issue within a few minutes of downloading the files and thus avoided having to deal with the problem later on.
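A check like the missing-856 problem just described is straightforward to automate. The following is a minimal sketch using the pymarc library; the file name is hypothetical, and a real mediation routine would add further checks, such as verifying the 830/710 package fields mentioned above.

```python
from pymarc import MARCReader

# Flag any record in a vendor-supplied set that lacks an 856 $u (URL),
# since such a record cannot link patrons to the eBook content.
with open("vendor_set.mrc", "rb") as fh:
    for record in MARCReader(fh):
        urls = [u for f in record.get_fields("856") for u in f.get_subfields("u")]
        if not urls:
            f245 = record.get_fields("245")
            label = f245[0].value() if f245 else "[no 245 field]"
            print(f"No 856 $u found: {label}")
```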
Another type of problem the author has been able to address in advance of loading records originates from MARC metadata that may have been crosswalked from another metadata container, or from records that haven't been properly converted from MARC8 to Unicode or vice versa. While sometimes the problem is primarily cosmetic, such as HTML coding in 520 fields that displays as something like <p> or <li>, there are times when the inappropriate coding impacts the discoverability of the resource or the functioning of the software. For example, there is a problem that sometimes occurs with the character encoding in records, and it is a problem library staff should be trained to identify. The problem often appears when non-Latin character text (including diacritical characters) or symbols are present in MARC records. In terms of what can be seen in records, strings like "<u+. . . . ." and "&#… ." can show up in place of characters, or there can be some gibberish. Problems such as this can lead to both discovery and authority control processes failing because the ILS or OPAC doesn't find the intended character(s). In addition to the records potentially not being found, the presence of Unicode or HTML-related strings and gibberish can make a record that is located hard to read and understand.
The third type of problem, which happens from time to time, is the presence of a control character in the record set. Generally these can be found by searching for the "^" character, but they may also be present in "<u+. . . .>" coding. Depending upon the systems through which the metadata may need to pass, these characters may cause applications to fail or to perform erratically. Those who mediate record sets absolutely must search for and remove these characters if the records are to be integrated into complex metadata environments where the metadata will be exchanged among systems.
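Checks for this kind of debris can be run over a text (.mrk) export of the record set before loading. The sketch below is a rough Python illustration; the file name is hypothetical and the patterns are illustrative rather than exhaustive (a stray "^" can occasionally be legitimate, so hits should be reviewed rather than deleted blindly).

```python
import re

# Scan a text (.mrk) export for unconverted Unicode escapes, HTML
# entities and tags, control characters, and stray "^" markers.
SUSPECT = re.compile(
    r"<u\+[0-9a-fA-F]+>"               # unconverted Unicode escape
    r"|&#\d+;"                         # numeric HTML entity
    r"|</?(?:p|li|br)>"                # leftover HTML tags in notes fields
    r"|[\x00-\x08\x0b\x0c\x0e-\x1f]"   # control characters
    r"|\^"                             # stray caret/control-character marker
)

with open("records.mrk", encoding="utf-8", errors="replace") as fh:
    for lineno, line in enumerate(fh, start=1):
        hit = SUSPECT.search(line)
        if hit:
            print(f"line {lineno}: suspicious text {hit.group(0)!r}")
```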
There can be other problems that occur when metadata has been crosswalked from other metadata schema into MARC. One of the most common issues that can lead to problems with discovery is the incorrect coding of the second indicator of the 245 field. If metadata is known or believed to have not originally been created in MARC, it is good to examine the record set for patterns in problems with indicators and subfields. With regard to the 245 field problem, at the author's library cataloguers will routinely search record sets for coding such as "245 10 $aThe," "245 10 $aA," and "245 10 $aAn," as such coding will cause the library's OPAC to malfunction with some title searches. This may not be a problem for all libraries, depending upon how their discovery systems process initial articles. There may be other problems that are relevant or more critical to other libraries.
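The 245 check lends itself to scripting as well. The following pymarc sketch flags records whose titles begin with an English initial article but whose second indicator does not skip it (the indicator counts the article plus the following space, so "The " should be 4). The file name is hypothetical, and non-English articles would need to be added for many collections.

```python
from pymarc import MARCReader

# Expected nonfiling-character counts for English initial articles.
ARTICLES = {"The ": "4", "An ": "3", "A ": "2"}

with open("vendor_set.mrc", "rb") as fh:
    for record in MARCReader(fh):
        for f245 in record.get_fields("245"):
            title = (f245.get_subfields("a") or [""])[0]
            for article, expected in ARTICLES.items():
                if title.startswith(article) and f245.indicator2 != expected:
                    print(f"indicator2={f245.indicator2!r}, expected {expected}: {title}")
```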
Another example of the type of code that can be looked for when mediating records is the use of the pipe key "|". Many ILSs use the pipe key to indicate a subfield in MARC records, while most MARC editors (except OCLC's) use the dollar sign "$". The author has found record sets where the pipe key is used occasionally to indicate a subfield; for example, $c is written as |c. The pipe key may have been entered manually by accident by a cataloguer who is accustomed to using a system that uses the pipe key to mark subfields. While this mistake is often found when validating record sets, a process that will be discussed later, this
is a mistake known to carry through to the catalogue unless specifically searched for and corrected. Depending upon the system into which the records will be loaded, the subfield may not be identified and the record may malfunction.
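This, too, can be searched for programmatically in a text export of the set. A rough sketch, with a hypothetical file name: in a .mrk file where subfields are marked with "$", a pattern like "|c" inside field data suggests a delimiter typed in another system's convention, though a legitimate "|" in free text will also match and needs human review.

```python
import re

# Look for a pipe followed by a plausible subfield code, e.g. "|c".
PIPE_SUBFIELD = re.compile(r"\|[a-z0-9]")

with open("records.mrk", encoding="utf-8") as fh:
    for lineno, line in enumerate(fh, start=1):
        if PIPE_SUBFIELD.search(line):
            print(f"line {lineno}: possible pipe-delimited subfield: {line.rstrip()}")
```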
The examples discussed in this section represent some of the more significant and, unfortunately, persistent problems the author and the cataloguing staff at her library search for when they mediate the loading of record sets. Records that may "look" good to a noncataloguer may actually contain some highly problematic characteristics. Libraries should keep a log of problems found either before records are loaded or after the fact. Over time, patterns of problems will arise. It is possible to automate some routine searches and processes in MARC editors such as MARCEdit. Where patterns are evident, the library may wish to automate some of the checking. For checking that isn't suitable for automating, workflows should be developed and followed by everyone who prepares record sets for loading.
One final word about mediating record sets concerns the balance between automating all processes or, for those ILSs that have features to do this, allowing the local system to detect errors. A reasonable balance needs to be sought for the sake of efficiency and the potential benefit of having a human being scan through the record sets from time to time. For example, if vendor X provides record sets with consistent characteristics and consistent quality month after month, it is reasonable that the library would make use of an automated process set up in the MARC editor, and library staff wouldn't do more than a cursory glance at the content of the actual record set. This is an efficient use of time. However, the author recommends that cataloguing staff inspect these records from time to time to make sure that nothing has changed. Perhaps record sets that have been deemed to need no regular inspection could still be reviewed every 6 months or, at the very minimum, each year when the subscription is renewed. If the library is using some sort of workflow management software or system to assist with managing the retrieval and processing of eBook metadata, the schedule for routine reviews of the record sets can be included in those workflows. With regard to those record sets that have proven to be problematic, an automated process can be set up for dealing with those issues that are consistent over time; however, library staff should still continue to examine the record set until or unless its quality and consistency improve significantly. The author has more than once found fairly significant problems in a record set, including missing 856 fields, while examining the accuracy of bulk processes that were applied to other fields.
6.16.2. Creating record set profiles and workflows
At this point in the discussion about managing eBook metadata through the use of record sets, readers undoubtedly have a sense of how diverse the options and possibilities are, not only for the condition of the record sets received but also for the methods of receiving them and the variety of processes that may need to be applied to each record set. Keeping track of what needs to be done with each record set can be a complex and overwhelming task.
In the author's library, she once had an undergraduate summer student who
created an inventory of the distinct eBook collections for which her library had or required MARC record sets. She was surprised to see that the student's inventory was in excess of 800 package titles. Sometimes subscriptions that seemed fairly general and straightforward when viewing the vendor's online platform were actually complex, multilayered resources. For example, the library purchased eBooks from a particular scientific publisher and it appeared at first glance as if the eBooks were all part of a single package purchase. However, that purchase was not a single purchase represented by a single record set. Instead, the collection was divided into seven subject packages; the frontlist content for each package was offered as a new purchase each year, plus selected backfile or archival content was available for purchase as well. It was discovered that each year the seven packages were invoiced and paid for on two different invoices. The student identified that each year there were seven record sets the library needed to retrieve, one for each of the packages, but the library had only been retrieving five. The author was baffled by the fact that the eBook seller treated each package separately in terms of issuing record sets. It was also puzzling to see that the vendor was invoicing two of the packages separately, until she discovered that two of the seven packages had once been available on another platform and apparently were still under a separate license from the other packages. These two packages had migrated to the current platform two years previously. Therefore, the reason for the separation of invoices and licenses appeared to be historical. She also found out that because not all libraries purchase all seven packages, the vendor had to separate the records for each package into discrete record sets. Despite the fact that there ended up being logical explanations for what initially appeared to be unnecessary complexity in how the record sets needed to be handled, it became apparent that because there was no specific documentation giving instructions about how to deal with this complexity, two of the seven record sets had been overlooked ever since those two collections migrated to the new platform.
The numbers the student recorded during the summer project at the author's
library didn't account for the correction and deletion record sets that the library might get during the year. However, even without those numbers, it was clear to the author that many record sets had been falling through the cracks over the years. In scanning through the spreadsheet, some collections were one-time purchases with a single corresponding record set, but the majority of purchases were complex multiyear and multipart collections. The record sets most likely to
fall through the cracks were those that represented a package that was part of a complex purchase. The purpose of the project was to identify the backlog of eBook records that required loading, which ended up being projected to be in the hundreds of thousands of records, but the real value of the project was to demonstrate not so much the volume of records as the complexity and diversity of the methods for obtaining record sets themselves, as well as the ease with which certain eBook collection purchases and their corresponding record sets can get lost in the mass of electronic resource subscriptions.
Library assistants at the author's library are still uncovering record sets that
were never downloaded or, if downloaded, were not processed and loaded into the library's catalogue. With regard to these late discoveries, another issue was uncovered: the record sets for some of the digital document collections were not issued until years after the collection was purchased. In addition, there was no metadata indicating whether or not records for those collections had been received. Furthermore, there were changes in staff over time, and assumptions were made about which record sets had been loaded, further complicating and confusing the situation. The bottom line is that libraries need an effective and efficient way to keep track of their record sets and what needs to be done with them. Part of the "keeping track" includes recording when no record sets are available and/or a vendor has promised to provide metadata at a later date.
Given the realization described in the previous paragraph, the author began
trying to think of a way to manage the complexity created by the diversity of eBook record set workflows. Handling eBook record sets seemed to defy the creation of a single linear workflow process. While a general workflow for processing an eBook record set was understood, putting that process on paper and training library assistants to follow it was an elusive undertaking. Developing a method for determining what was required for each record set in the first place seemed to be critical for the effective management of eBook metadata, as was a method for tracking the status of the record sets. At the ALA Midwinter Conference held in the year following the summer project that led to this realization, the author was fortunate to hear a conference presentation by Roman Panchyshyn, who demonstrated a method of using a series of standard questions to ask when a new product or package was purchased. The content of that presentation was later published in the following article:
Panchyshyn, R. (2013). Asking the right questions: An e-resource checklist for documenting cataloging decisions for batch cataloging projects. Technical Services Quarterly, 30(1), 15–37. DOI: 10.1080/07317131.2013.735951.
Roman has shared the eBook checklist for librarians to use and modify at their local library. Because of the sharing of this helpful information, Roman Panchyshyn is also recognized as a tiger tamer. The link to the checklist is found at this location:
http://www.library.kent.edu/kent-state-university-libraries-technical-services-ebook-checklist (for those readers who are reading the print version of this book, search for the title "Kent State University Libraries Technical Services: EBook Checklist" in any internet search engine).
While this example checklist contains questions that are specific to Ohio and Kent State Libraries, this document is a useful real-life example of the types of questions that can be built into a form. In addition, libraries may want to divide up and organize their record sets and checklist profiles in a way that makes sense for the local environment. For example, some libraries may have no reason to divide up packages within a larger collection because they always get the complete collection and all of the record sets are treated the same way. Other libraries may divide up the packages because they may not get all of the packages in the collection, and some of the metadata may come directly from the vendor while some may be supplied through another library in a consortium. The bottom line is that rather than trying to create a "one size fits all" workflow for managing eBook record sets, the use of questions such as those used by Kent State leads to the creation of profiles that can guide library staff through the generic process of processing record sets. In addition, if the answers to the questions are recorded in a standardized way in a product such as Excel, the information can be used as metadata for other activities. For example, the resulting spreadsheet could be used to identify the most commonly applied actions, for which it may make sense to spend time creating an automated process.
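As a sketch of what recording the answers in a standardized, machine-readable way might look like, the following Python example stores a few hypothetical profile fields and writes them to a CSV file that can be sorted and counted to find the most commonly applied actions. The field names are illustrative; a real profile would mirror the library's own checklist questions.

```python
import csv
from dataclasses import dataclass, asdict, fields

# Illustrative profile fields; a real form would mirror the local checklist.
@dataclass
class RecordSetProfile:
    platform: str
    package: str
    delivery_mode: str     # e.g., "vendor website", "cataloguing vendor", "KB feed"
    update_frequency: str  # e.g., "monthly", "annual", "on request"
    local_series_590: str  # exact text to insert, or "" if none
    proxy_prefix_needed: bool

profiles = [
    RecordSetProfile("Example Platform", "Science 2015 frontlist",
                     "vendor website", "monthly",
                     "From package: Example Science eBooks", True),
]

# A CSV of profiles can be sorted and counted to identify the most
# commonly applied actions, i.e., the best candidates for automation.
with open("record_set_profiles.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(RecordSetProfile)])
    writer.writeheader()
    for p in profiles:
        writer.writerow(asdict(p))
```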
The author uses a simplified version of the Kent State form at her library. Each new purchase gets a new form, which is printed out and written on by hand. The forms are then placed into folders, which are organized according to platform. A vendor may have more than one platform, and multiple packages may have profiles included in a folder as long as all of the packages are on the same platform. As the record sets are retrieved and processed, the folders are moved to different physical locations. If a folder remains in a physical location for too long, this is an indication that something must have gone wrong and library staff need to follow up on the progress of that record set. Dates for each stage of work done to a record set are recorded in the folder, as are copies of the load reports (produced by the Sierra ILS) and any reports of problems associated with either the record sets themselves or individual records. The author considers this simple paper-based system to be a temporary one until both the ERM and a workflow management system can be fully implemented at her library. However, as simple and old-fashioned as this system seems, it has proven to dramatically improve the efficiency of record set loading processes and provides useful information when
attempting to resolve problems. There are some distinct strengths and weaknesses to this system. The main weakness is that the progress of the record sets isn't transparent to the rest of technical services or the library in general, relative to what might be possible if the information and workflow were handled electronically. In addition, the folders don't really support the necessary function of informing library staff when it is time to look for a new record set. That being said, the author heard a librarian mention at a conference that her library was recycling old serials check-in cards for the purpose of monitoring when new record sets should be available for pick-up. Unfortunately, the author doesn't recall the name of this librarian or her library to give her credit for the idea. While the author hasn't tried this type of approach herself, it seems like a plausible system and also an effective way to repurpose now defunct library supplies. The author also now wishes that her eBook record set loading folders had been set up at a more granular level: not all record sets for content on the same platform are available at the same time or necessarily processed at the same time, so it is easy to lose track of an individual package on a platform.
With regard to the strengths of the "folder" system, the fact that the folders are physical objects that can be observed to "get stuck" in a location, or appear to have "not been touched in a while" and thus need attention, is a definite strength. Unlike print resources, which tend to build up on shelves and carts in the cataloguing department, backlogs of eBook cataloguing are invisible and, as the author has found numerous times in her library, can remain unattended for literally years before either staff or patrons realize that there are no records in the catalogue for a particular product. This being said, the use of workflow management software would be helpful in this regard as well. In the end, the author would like to transition away from the use of the "folder system."
This discussion was intended to show that homegrown or what appear to be makeshift methods of managing both the information about what needs to be done with various record sets and the progress of record sets through workflows can be very effective. Rather than suggesting that a particular type of software or a specific application is "best" for recording metadata about record sets and tracking their progress, what is more important is that those managing eBook metadata understand the key goals that need to be achieved and then use whatever resources are available to come up with an efficient and effective method of achieving those goals. In fact, if using software prevents the library from recording necessary details because there is nowhere to record them effectively, or the software causes the library to bend practices in ways that lead to an outcome that isn't effective or efficient, a makeshift approach may be the best option for the time being. Makeshift approaches could include the use of paper or electronic methods or a combination thereof. In the case of the author's library, the intent is to eventually manage both the information about packages and the workflow in an electronic rather than paper format, but she has found that no single electronic product is adequate for the
job. Given the complexity that has been discovered over the years, the author is glad that she opted not to attempt to create an electronic system too quickly, because in the process of using the folders she has had a lot of flexibility to experiment with slight variations in her approach to managing how the information is stored and the manner in which record sets are handed off between processes. Experimenting with these changes in an electronic environment would have required adopting new software. Being able to observe the physical movement of folders through the various processes also helped in detecting and ironing out bottlenecks; the piles of folders may have looked a little untidy in the author's office, but they were very helpful in exposing inefficiencies. These situations would not have been so easily detected in an electronic environment.
In conclusion, for those readers who like simple and tidy solutions for their
metadata management problems, setting up profiles for the various record sets and managing workflows may prove to be a bit frustrating if the eBook collection is highly diverse. However, the author has seen firsthand both the need to document record sets and the value of that documentation over time. Therefore, the effort put into setting up systems for recording information about record sets and establishing a system to track the progress of record sets is worth it in the long term, with regard to the time it can save later on when library staff try to figure out why records are missing from the catalogue.
6.16.3. Record set editing
Having established that it is essential that library staff mediate the loading of record sets into their local bibliographic databases, and that it is equally essential that the library document the requirements for its various record sets and have a method for tracking their progress, it is now appropriate to talk about the nuts and bolts of actually editing record sets.
When planning to do record set editing there are three levels of editing that can
occur with any record set, and the library should decide on a package-by-package basis the level of editing required. The levels are progressive, and a particular level of editing includes the editing that occurs at the previous levels as well. These levels include:
(1) Clean-up: This level of editing involves searching for and dealing with some of
the problematic coding that can be found in MARC records, as was previously discussed in this chapter, such as problems with diacritics. This step would also typically include using a MARC validator, if the MARC editor has one, as well as checking for any recurring problems that have been noted in the profile for the particular record set being processed. Typically the "clean-up" level of record set processing is becoming increasingly
prevalent in the author's library because many vendors and services such as Collection Manager allow libraries to set up profiles whereby many of the locally required fields or subfields can be inserted or preedited in the record set before it is delivered or otherwise made available to the library.
(2) Local editing: Having a record set that is "cleaned up" from a technical point of view, most libraries will likely want to add fields that are specific to the local library and/or make any required changes that will optimize the records for use in local systems. Some examples of fields and subfields that libraries may wish to include:
(i) 040 $d: This subfield, while not required, is often helpful for copy
cataloguers. It indicates that the record has been downloaded from another source and modified for the library's local catalogue. Libraries put their library code in this subfield.3
(ii) 506: This field is used to record any local restrictions on access. Licensed eBooks typically have a 506 field. For remote access eBooks, the library may indicate that use is restricted to the students, faculty, and staff at the institution (i.e., name of the university or college). Or, if there is no remote access, the library would record text such as "For on-campus use only" or "For use in library only." Details about the 506 field can be found at http://www.loc.gov/marc/bibliographic/bd506.html.
(iii) 590: This field can be used if a local series title is required, preferred, or useful. Many libraries use the local series, sometimes in combination with 710 fields, in order to identify records for eBooks in certain collections and/or content on certain platforms. As a local field, libraries have flexibility in how they wish to apply it. However, there are a few recommended guidelines. The first is that the use of the 590 (and 710 if applicable) should be recorded in the metadata or profile for the record set. The second is that the text in the field must be entered consistently for all records intended to be included in that collection. If the library prefaces a collection name with text such as "From package:", that text string must not vary in any way from record to record. For this reason, using an automated process to insert local series information may be a good way to insert this field. Cataloguers need to keep in mind the reason for adding 590 (and possibly 710) fields. Often these fields are used primarily for the benefit of selectors so that they can collate eBook titles for certain collections or packages through doing title searches in the OPAC. For some libraries it is possible that this practice, if only done for the benefit of library employees, may become obsolete over time if other metadata and sources of information achieve the same result.
(iv) 850: For those libraries that either report their holdings or whose holdings information is harvested from their MARC records, the local library code (as described with regard to the 040) may need to be recorded in this field.
(v) 856 $u: Depending upon how authentication is handled, the library may need to insert a script before the URL found in the $u subfield content. (A scripted sketch of these local edits follows below.)
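To give a sense of how edits like these can be applied across a whole set outside the ILS, here is a minimal sketch using the pymarc library (assuming pymarc 5.x; earlier versions build subfields from a flat list rather than Subfield objects). The library code, access note, series text, proxy prefix, and file names are all hypothetical placeholders for local values; an equivalent task list could be built in a MARC editor such as MARCEdit instead.

```python
from pymarc import MARCReader, MARCWriter, Field, Subfield, Indicators

LIBRARY_CODE = "XXX"                            # placeholder MARC library code
PROXY = "https://proxy.example.edu/login?url="  # placeholder auth prefix

with open("cleaned_set.mrc", "rb") as infile, open("local_set.mrc", "wb") as outfile:
    writer = MARCWriter(outfile)
    for record in MARCReader(infile):
        # (i) 040 $d: note that this library modified the record.
        for f040 in record.get_fields("040"):
            f040.add_subfield("d", LIBRARY_CODE)
        # (ii) 506: local access restriction note.
        record.add_ordered_field(Field(
            tag="506", indicators=Indicators(" ", " "),
            subfields=[Subfield("a", "Access restricted to students, faculty, and staff.")]))
        # (iii) 590: local series note, entered identically in every record.
        record.add_ordered_field(Field(
            tag="590", indicators=Indicators(" ", " "),
            subfields=[Subfield("a", "From package: Example eBook Collection")]))
        # (v) 856 $u: prepend the authentication prefix to each URL.
        for f856 in record.get_fields("856"):
            urls = f856.get_subfields("u")
            for _ in urls:
                f856.delete_subfield("u")
            for url in urls:
                f856.add_subfield("u", url if url.startswith(PROXY) else PROXY + url)
        writer.write(record)
    writer.close()
```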
Local editing may occur entirely within the MARC editor, or it could occur partially within the editor and partially be carried out by the record loader built into the ILS. In addition, the editing done in the editor could be carried out step by step by library staff, or the steps could be built into an automated process created for record sets that fit a particular profile. In general, the author prefers to do as much editing as possible within the editing software, regardless of whether that editing is done using an automated process or not, because the results of the editing can be previewed by a cataloguer before being sent for loading. At her institution, cataloguers don't load records and thus aren't able to preview the results of processing carried out by loaders before those records enter the catalogue. However, at other libraries the situation may be different, which is why the decision about how to carry out local editing needs to be made within the context of what is appropriate for each library.
(3) Local enrichment: This type of enrichment refers to the addition of call
numbers, adding access points, and either hybridizing or upgrading records to RDA. Some local enrichment processes can be carried out using features within a robust MARC editing tool such as MARCEdit. Examples of the type of enrichment that can be done by MARCEdit include inserting call numbers, adding FAST subject headings, and either hybridizing or doing a rough conversion of the records to RDA. Other types of enrichment may need to be done through more direct staff interaction with either groups of records within a set or individual records. The purpose of local enrichment is generally to make the records more discoverable and/or functional within local discovery systems. The decision about whether or not to do local enrichment, as well as how much and what type, depends on several factors:
(i) Whether or not the enrichment can be done through a feature in the
editor: If it is quick and easy to do some enrichment and thatenrichment is believed to benefit patrons or library staff, enrichmentcould be done regardless of other factors.
(ii) The durability of the collection and metadata: Labor‑intensive localenrichment is not recommended for subscription eBooks. This isespecially the case when those eBooks will only remain in the cataloguefor 1 or 2 years. In cases where enrichment is desirable, the libraryshould speak to the vendor and encourage that enrichment be done onthe record sets before they are sent to the library. In the case of DDA(Demand Driven Acquisitions will be discussed in the final chapter ofthis book), it is in the vendor’s best interest to provide the best quality
discovery metadata possible to the library. In a DDA situation, the purchase of an eBook largely relies upon patrons being able to find the records for the vendor's eBooks in the local discovery system and to interact with the eBooks in a way that triggers a purchase of the DDA content. Without the patron discovering the eBook in the first place, there is no opportunity to sell the content to the library. Therefore, this is a situation where the library's specific suggestions for how the metadata can be improved may be particularly useful and welcome. However, if the library is adding metadata harvested from one of its digital repositories and would like to enrich that metadata with controlled access points, the time and effort spent doing this work locally would be worthwhile.
(iii) The potential impact of the enrichment: It has been the author's experience that library staff sometimes request enrichment of eBook records that will not improve the discoverability or performance of the metadata in the local discovery systems. Requests for the addition of call numbers that can't be added to the records either using a tool built into the MARC editor or through another bulk process, as well as generic information put into 500 fields, are examples of enrichment that typically has little impact overall. Concerns about the value of the enrichment versus the staff time required may also be affected by the fact that some records are frequently overlaid as the vendor or cataloguing source updates URLs and enriches the records.
For those record sets retrieved using a KB-based mode, a very effective method of enriching records can be to have the metadata in the KB itself enriched. Depending on the agency responsible for the KB, the approach to enriching the metadata varies. Sometimes a library can get special training in how to make changes directly within the KB, while at other times the agency that hosts the KB must make the changes on behalf of libraries.
6.16.4 MARC editors
While all ILS/LMSs allow cataloguers to download and edit MARC records on a record-by-record basis, and most have functionalities that allow existing records to be grouped and updated at once, editing within an ILS is not robust and efficient enough to deal with the mass of record sets many academic libraries need to handle on an ongoing basis. While ILS/LMS developers and vendors often call the cataloguing module within their software the "MARC editor," this is not what is intended when the term "MARC editor" is used in this book. MARC editors, in the context of this book, are a type of software that exists separate from the ILS/LMS and allows library staff, at the very minimum, to edit fields within
records, entire records, and the complete record set in a number of flexible ways. Given the number of record sets and the complexity of editing that sometimes needs to happen, ideally a MARC editor for eBook metadata would be much more capable than this minimum. An ideal editor will have a number of built-in functions that support the most commonly applied processes. In addition, an editor that is suitable for working with eBook record sets also allows libraries to program some of their own functions and/or piece together functions to run automatically. MARC editors should also help with the harvesting or retrieval of metadata and the transformation of records and record sets from one format to another.
The Library of Congress in the United States keeps a list of tools that can be
used for editing, enriching, and transforming MARC records and other related coding, which is located at the following web address: http://www.loc.gov/marc/marctools.htm. Not all of these tools are actually MARC editors. The "Cataloging Calculator," for example, is useful for generating Cutter numbers, looking up geographical codes, or finding AACR2 abbreviations but can't be used for editing MARC records. In addition, many of the resources require a subscription or need to be purchased. Only those marked as "free" are freely available for everyone to download and/or use. Cataloguing and metadata librarians may find it interesting to investigate the tools listed on this page even if they aren't MARC editors, and to visit the page every now and again to see if new tools have been added.
The author acknowledges that there are many additional MARC editors in
existence and used in academic libraries that aren't listed on the Library of Congress page. In addition, there are some MARC editors that have been designed for specific contexts, such as Mac computers, which aren't referenced in the list. Readers may already use a MARC editor, or multiple MARC editors, that they find useful. It is not possible within the context of this book to do justice to all of the helpful tools currently available for cataloguing and metadata librarians to use. That being said, there is one editor currently available that stands out as exceptional in a number of respects. That editor is MARCEdit, and it is exceptional in that it is highly robust, it is regularly updated, new tools are added from time to time, and it is free for anyone to download and use. Other exceptional factors include the significant number of training videos available online, the extensive "help" documentation, and the large community of users who can provide support and information. For these reasons, as well as the fact that the author has found MARCEdit to be critical to her own eBook metadata management processes, MARCEdit will be discussed in more detail in this chapter. For those librarians who already use another editing application and haven't looked at MARCEdit lately, it may be of use to look at MARCEdit again to see if any of the newly added features and functions might be useful supplements to existing practices and processes.
6.16.4.1 MARCEdit background
According to MARCEdit's developer, Terry Reese (2013), development on MARCEdit began in 1999 as a replacement for the Library of Congress's DOS-based MARCMaker/MARCBreaker software. Over time the program has undergone considerable changes and improvements. Today the application is written in C#, and for nearly a decade it has been made freely available for download by libraries around the globe. Reese, as of the writing of this book, is the Head of Digital Initiatives at The Ohio State University and previously was with Oregon State University. Because of his ongoing work with MARCEdit, he is another of our tiger tamers.
[Figure: MARCEdit editing and metadata processing applications, available for download from http://marcedit.reeset.net/downloads.]
To begin a discussion of MARCEdit, it must be recognized that it is a robust,complex, and customizable program. While it can be used “out of the box” so tospeak to do some basic editing functions without much of a learning curve at all,those who want to make good use of it for maximizing the effectiveness andefficiency of their eBook metadata management processes need to invest sometime and effort into learning about its features and options and then keeping up
with news and updates. The author has already tamed a number of what appeared to be hopelessly wild tigers by using MARCEdit, but she admits that there are many more that could be tamed at her library if there were more time to sit down with the program and documentation and work out a solution. The hope remains that as other metadata management snarls are sorted out, more time will be freed up to delve ever deeper into learning about MARCEdit and constructing ways to use its features.
With regard to training and documentation, the MARCEdit website has a page
dedicated to information about how to find training and help (see: http://marcedit.reeset.net/help). The YouTube videos, which aren't mentioned specifically on the help page but on another page listed as "tutorials," are particularly useful in terms of helping librarians get started with MARCEdit and learn about its new features and functions. While Terry Reese has created many YouTube videos himself, other librarians from around the world have created additional videos. For those who are completely new to MARCEdit, the MARCEdit 101 video may be a useful starting point (see: https://www.youtube.com/watch?v=zP4x-4hcVQ4; those reading the print version of the book can go to YouTube and search for the video title "MarcEdit 101: I have a MARC record, now what?", or a more recent version of the MARCEdit 101 training can be found at http://marcedit.reeset.net/marcedit-101-workshop). Reese's videos are found on the channel "tpreese," while videos done by others can be located by searching the name of the function or outcome and the term "MARCEdit." Those librarians who wish to keep up-to-date on changes to MARCEdit may wish to subscribe to the digest form of the listserv, which is listed on the help page mentioned above.
For those who have not yet downloaded MARCEdit, various versions of the
download as well as other tools can be found at this page: http://marcedit.reeset.net/downloads.
Also, the videos may be quite helpful for visual learners or those who find the
menus in MARCEdit overwhelming. However, it is important to keep in mindthat the MARCEdit interface and menu options have changed over the years andsome of the videos are now somewhat dated. Because of this, the author suggeststhat the listserv and listserv archive also be used to find the most up‑to‑dateinformation about the features as well as to learn about newly added features.
6.16.4.2 MARCEdit general overview
Fortunately, the information and training resources currently available for learning about MARCEdit and all of its features are plentiful and easy to access. The best that a comprehensive book such as this can do is give a general overview of MARCEdit and its features and direct readers to resources where more information and training can be located. Hopefully this overview will introduce readers to some tools they can use for working with record sets and doing other
bulk processing that they may need to do at their libraries.
(1) MARC tools
This set of tools allows the user to either break a record set for editing in
MARCEdit or save the record set in a form that can be loaded into an ILS/LMS. It also supports the translation of MARC record sets in either direction between containers including MARCXML, DC, FGDC, EAD, and MODS. This feature also contains a tool that will allow conversions between MARC-8 and UTF-8 (Unicode) character encoding.
[Figure: View of tools in the MARCEdit record set editing application. MARCEdit editing and metadata processing applications can be accessed through this MARCEdit window. MARCEdit is available for download from http://marcedit.reeset.net/downloads.]
The MARC Tools feature is generally the doorway through which library staff will bring a record set into MARCEdit for editing. The MARCEdit 101 video gives a demonstration of how this task might typically occur. Also, on computers where MARCEdit is already installed, double-clicking a file name with the extension ".mrc" will open the tools.
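As an aside for readers who prefer to see the idea expressed in code, rough equivalents of the "break" and "translate" operations can also be scripted outside MARCEdit. The sketch below assumes the third-party pymarc library for Python (not one of the tools covered in this book) and uses hypothetical file names.

# Sketch: rough analogues of MARCEdit's "break" and "translate" steps
# using the third-party pymarc library (assumed); file names are
# hypothetical.
from pymarc import MARCReader, XMLWriter

with open("records.mrc", "rb") as infile:
    records = list(MARCReader(infile, to_unicode=True))

# "Break": write a readable, mnemonic-style text rendering of each
# record, similar in spirit to MARCEdit's .mrk files.
with open("records.txt", "w", encoding="utf-8") as text_out:
    for record in records:
        text_out.write(str(record) + "\n")

# "Translate": write the same records out again as MARCXML.
with open("records.xml", "wb") as xml_out:
    writer = XMLWriter(xml_out)
    for record in records:
        writer.write(record)
    writer.close()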
(2) MARCEditor
The MARCEditor is the MARC editing tool that was a topic of discussion
previously in this book. This is the tool in which MARC record sets can be edited in a readable text format, and it is the feature in which most of the day-to-day work with eBook record sets typically occurs. At the time this book was written, the features built into the editor included:
(a) Inserting control fields and control numbers as well as characters andsymbols from various character‑sets
(b) Locating records for certain formats in order to extract them for specialor separate processing
(c) Validating MARC coding or ISBNs
(d) Automatically generating call numbers and Cutter numbers
(e) Automatically inserting RDA coding
(f) Finding appropriate linked data access points
[Figure: MARCEdit tools can be accessed via the Tools menu in the editor or from the menus on the MARCEdit window.]
(g) Inserting constant data into records
(h) Custom automation of a number of tasks
(i) Retrieving and inserting records via z39.50 searches
(j) Removing duplicate records within a record set
(k) Extracting and inserting OAI-harvested metadata
(l) Retrieving metadata from other remote URL locations
(m) Automated MARC record normalization
(n) Automatically removing empty subfields
(o) Adding, deleting, editing, copying, and swapping fields according to
criteria supplied by users
(p) Sorting record sets according to certain fields
Embedded within these features, there are a number of additional related
functions, including options that allow for precise control over which records will be acted upon when a feature is used and how certain actions will be carried out. For those librarians who only need to do some basic processing of their eBook record sets, chances are they can get up and running with MARCEdit by watching the "MARCEdit 101" video, searching around for other basic MARCEdit videos on YouTube, and reading some of the information on the sources of help, which can be accessed through MARCEdit's help menu. However, if the library does tasks repeatedly, efficiency can be increased by automating those tasks (see: https://www.youtube.com/watch?v=fnorN0MFFN0 or search for the YouTube video "MarcEdit Task Automation Management").
For those librarians who need very precise control over which records, fields, subfields, or strings within subfields are edited during an editing process, learning to use regular expressions (sometimes written as reg ex or regex) may be helpful. For an example of how regular expressions can be used in MARCEdit, see https://www.youtube.com/watch?v=qNJA9p2_-qU (search in YouTube for "Multiple field regular expression"). For those librarians who work in Python, Perl, or R and have some familiarity with these languages, chances are that working with regular expressions in MARCEdit will be quite straightforward, with the exception of a few idiosyncrasies.4 For those who have an interest in using regular expressions but have no previous experience working with them, a number of librarians in the MARCEdit community have posted instructions, videos, and sample expressions that can be used to achieve certain results. Once an account is created on the listserv, the archives can be searched at this location: http://metis3.gmu.edu/cgi-bin/wa?A0=MARCEDIT-L
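Because regular expression syntax is essentially the same across tools, an expression can be prototyped in a language such as Python before it is entered into MARCEdit's find-and-replace options. The following sketch is illustrative only; the pattern and the sample strings are hypothetical.

# Sketch: prototyping a regular expression before using it in MARCEdit.
# Hypothetical example: extract a bare 13-digit ISBN from the messy
# strings sometimes found in 020 $a, ignoring hyphens and trailing notes.
import re

ISBN13 = re.compile(r"97[89](?:-?\d){10}")  # 978/979 plus ten more digits

samples = [
    "9781234567897 (electronic bk.)",
    "978-1-234-56789-7 ;",
]
for value in samples:
    match = ISBN13.search(value)
    if match:
        print(match.group(0).replace("-", ""))  # prints 9781234567897 twice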
(3) MARC SQL Explorer
This tool allows users to perform queries on local or remote databases and to
output MARC data as either SQLite- or MySQL-formatted data. The results can be exported as MARC records or tab-delimited data (i.e., a spreadsheet). The tool was originally designed to evaluate HathiTrust metadata, so it is useful for searching and extracting metadata from large sources of metadata. The SQL Explorer may be potentially useful for a number of purposes, including locating problematic records for local database clean-up or extracting subsets of records when harvesting metadata from massive collections.
A simple demonstration of the Explorer is available on Terry Reese's YouTube
channel: https://www.youtube.com/watch?v=xmHAsF34qn0 (search for "MarcEdit: Using the MARC SQL Explorer" on YouTube).
(4) Delimited text translator
The delimited text translator allows librarians to convert or translate a file that
was originally created in a tab‑delimited or comma‑delimited format, including
Excel spreadsheets and Access tables, into MARC records. The ability of the translator to successfully perform this function depends upon each row containing information about a single eBook and each column containing consistently formatted elements that are consistently applied and relevant to the eBook referenced in that row.
The most common purpose for which the author uses this tool is in
situations where a vendor has no capacity to create and supply MARC records foreBooks but can supply a spreadsheet. The information can be crosswalked usingthe translator feature into MARC records that, while generally far from perfect,can be used as discovery records. Another possible situation is that the librarymay wish to create discovery records for a digital object collection createdelsewhere on campus and the metadata is not in a library‑friendly format butcould be exported in a tab‑ or comma‑delimited format.A useful demonstration of how the delimited text translator can be used is
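The underlying crosswalk is straightforward to picture in code. The sketch below, which assumes the third-party pymarc library (5.x) and entirely hypothetical column names, builds minimal discovery records from a vendor CSV in the same spirit as the delimited text translator.

# Sketch: crosswalk a vendor spreadsheet (saved as CSV) into minimal
# MARC discovery records. Assumes pymarc 5.x; the file names and the
# column names (ISBN, Author, Title, URL) are hypothetical and would
# need to match the vendor's actual spreadsheet.
import csv
from pymarc import Field, Indicators, MARCWriter, Record, Subfield

with open("vendor_titles.csv", newline="", encoding="utf-8") as infile:
    writer = MARCWriter(open("vendor_titles.mrc", "wb"))
    for row in csv.DictReader(infile):  # one eBook per row
        record = Record()
        record.add_field(
            Field("020", Indicators(" ", " "), [Subfield("a", row["ISBN"])]),
            Field("100", Indicators("1", " "), [Subfield("a", row["Author"])]),
            Field("245", Indicators("1", "0"), [Subfield("a", row["Title"])]),
            Field("856", Indicators("4", "0"), [Subfield("u", row["URL"])]),
        )
        writer.write(record)
    writer.close()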
A useful demonstration of how the delimited text translator can be used is found in the following YouTube video: https://www.youtube.com/watch?v=Kp_N3ncjS7Q (search YouTube for "MarcEdit Delimited Text Translator").
(5) Harvest OAI records
Harvesting OAI records was previously discussed in this chapter in terms of it
being one of the major methods by which an academic library may retrieve a record set. Readers may have noticed that there is an option to harvest OAI records built right into the MARC editor. There is also a standalone tool, which is the author's preferred tool for harvesting OAI metadata because of the different options built into it and because it is integrated with the other MARCEdit tools she uses.
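For comparison, the harvesting step itself can also be scripted outside MARCEdit. The following sketch assumes the third-party Sickle library for Python and a hypothetical repository endpoint and set name; note that MARCEdit's harvester goes further by crosswalking the harvested metadata to MARC.

# Sketch: harvest OAI metadata with the third-party Sickle library
# (assumed). The endpoint URL and set name are hypothetical.
from itertools import islice
from sickle import Sickle

sickle = Sickle("https://repository.example.edu/oai")  # hypothetical endpoint
records = sickle.ListRecords(metadataPrefix="oai_dc", set="ebooks")

# Show the first few harvested records; a real workflow would crosswalk
# the Dublin Core elements in record.metadata to MARC fields.
for record in islice(records, 5):
    print(record.header.identifier, record.metadata.get("title"))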
As with the other MARCEdit tools, Terry Reese has provided a YouTube video that demonstrates how the harvesting tool might be used. This video is located at https://www.youtube.com/watch?v=gvBrMVH6j7U (search for "Translating OAI metadata to MARC using MarcEdit").
(6) z39.50/SRU client
The z39.50 protocol has already been discussed in relation to being a method
for extracting MARC metadata from library and union catalogues. This is another helpful and powerful MARCEdit tool that the author uses on a regular basis to extract record sets when electronic resource creators or vendors aren't able to supply record sets but MARC records are known to exist elsewhere.
The video for this function is located at https://www.youtube.com/watch?
v=y0YibTP1dIs (search for "MarcEdit's z39.50 Functionality").
If libraries have OCLC accounts, for example, MARCEdit can be configured
with the library’s user name and password to access the z39.50 service at OCLC.Even without an OCLC account, libraries should be able to search the Library ofCongress and may be able to set up access to other local, regional, and nationallibraries as applicable to retrieve MARC metadata from those catalogues.
There are additional videos showing tips and tricks in using this tool created by other librarians in the MARCEdit community, which can be searched for on YouTube, plus there is a fair bit of documentation that can be accessed through searching the listserv archive.
(7) MARCNext
This is a suite of tools intended not so much for librarians to use in managing
their current eBook metadata as for experimenting with and learning about linked data and BIBFRAME. When this book was written, MARCNext was included for the benefit of those librarians who may be interested in these tools. For those librarians who are new to MARCEdit and bulk processing, this is a feature and topic that is not currently essential. The topics of BIBFRAME and linked data will be addressed in the final chapter of this book to help librarians plan for the future, but these are technologies that are not yet in active use in academic libraries.
over time. There are some existing YouTube videos that may be of interest:
For the Linked Records Tool, search YouTube for "MarcEdit MARCNext: Linked Records Tool".
https://www.youtube.com/watch?v=2BTkjjowF1s (search for "MarcEdit MARCNext: Bibframe Testbed").
https://www.youtube.com/watch?v=wyijGEn8sr0 (search for "MarcEdit MARCNext: JSON Object Viewer").
Readers may also note a new tool in MARCNext called SPARQL Browser. Forthose interested in learning more about SPARQL, see Terry Reese’s blog post:
Reese, T. (2014). "Working with SPARQL in MarcEdit" [blog post]. Retrieved from http://blog.reeset.net/archives/1632.
The blog post includes references to other documents on the web that readers may find interesting.
(8) MARC Spy
This tool is described as a hex editor, but in practical terms the author
has found this particular tool to be useful for finding problematic characters in record sets that can't be detected through the usual editing processes. In particular, there have been files that appear to be fine but cause problems during the process of being loaded into the local catalogue. If the ILS/LMS gives a clue as to which record and/or field within that record is problematic, the file can be run through the "Spy" tool and the problematic part of the record can be examined byte by byte until the irregularity is located. The problem character can then either be replaced with a valid one or deleted completely.
While the author uses this tool on very rare occasions, it is useful to know about it because it can be used to tame some vicious tigers that come out of the jungle when a record or record set refuses to load into the local system.
The video for this tool is located at https://www.youtube.com/watch?
v=FJbQYhV4M2Y (search for "MarcEdit—Using MARC Spy").
(9) Extract selected records
It is not unusual for a library to get a large record set that contains records that
either aren't needed or require some type of editing that the other records in the record set don't require. The "extract selected records" function can be used to separate out records from the larger record set when situations such as this occur. As well, there is a "delete selected records" function that may be used to solve problems with unwanted records.
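The logic of the feature is easy to demonstrate in code. The sketch below, which assumes the third-party pymarc library and uses a hypothetical selection criterion (records lacking an 856 field), splits a record set into two files in the same spirit as "extract selected records".

# Sketch: split a record set into records that are fine as-is and
# records that need special handling. Assumes pymarc; file names and
# the selection criterion are hypothetical.
from pymarc import MARCReader, MARCWriter

with open("big_set.mrc", "rb") as infile:
    keep = MARCWriter(open("big_set_ok.mrc", "wb"))
    extract = MARCWriter(open("big_set_needs_work.mrc", "wb"))
    for record in MARCReader(infile):
        # Hypothetical criterion: records with no 856 field can't link
        # to the eBook, so they are pulled out for separate processing.
        if record.get_fields("856"):
            keep.write(record)
        else:
            extract.write(record)
    keep.close()
    extract.close()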
The following video includes a demonstration of how the feature can be used to solve some very common problems experienced when processing eBook record sets. The video is quite useful for explaining when a librarian might decide to use the various approaches to dealing with record sets that contain subsets requiring processing. The video can be found at this location:
https://www.youtube.com/watch?v=A3xChRJ8OEQ (search for "MarcEdit -- Extract Selected Records: Working with Multiple fields and missing fields").
(10) Export records
By using MARCEdit it is possible to export MARC records either in the MARC
format or as tab-delimited data. There are many possible uses for this feature. An example may be when eBook metadata needs to be exported to a system for which the library lacks an existing method of metadata exchange. Systems of this type could include institutional repositories and other digital repositories. As long as the system can use delimited records, MARCEdit can be used to facilitate the metadata exchange. In addition, MARCEdit could also be used to extract records from a larger record set that could be shared with other libraries or edited in a context such as the ones described previously with regard to extracting selected records.
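A tab-delimited export can be pictured as nothing more than a loop that writes selected elements of each record as a row. The following sketch assumes the third-party pymarc library; the chosen columns and file names are hypothetical.

# Sketch: export selected elements of a record set as tab-delimited
# data. Assumes pymarc; the columns chosen are hypothetical examples.
import csv
from pymarc import MARCReader

with open("records.mrc", "rb") as infile, \
        open("records.tsv", "w", newline="", encoding="utf-8") as outfile:
    writer = csv.writer(outfile, delimiter="\t")
    writer.writerow(["title", "isbn", "url"])
    for record in MARCReader(infile):
        urls = [u for f in record.get_fields("856") for u in f.get_subfields("u")]
        # record.title and record.isbn are pymarc convenience properties.
        writer.writerow([record.title or "", record.isbn or "", urls[0] if urls else ""])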
The video that describes how this function works is located at https://www.youtube.com/watch?v=qkzJmNOvY00 (search for "MarcEdit: Export Tab Delimited Data").
(11) Batch process MARC records
The batch processing function allows librarians to apply an action to all of the
files with the same extension stored in a single folder, processing the files all at once. This is different from bulk processing, where a single action is applied to all records within a record set; batch processing, on the other hand, allows a number of record sets to be processed at once. Key differences between batch processing and bulk processing include the fact that bulk processing allows the librarian to do more granular and customized editing, while batch editing tends to involve relatively high-level automated metadata transformations based on existing algorithms. Batch processing can be useful when a number of files of metadata have been harvested from a non-MARC source or when existing record sets need to be exchanged with another system that has different technical requirements. There is even a specialized function that will transform the character encoding of records extracted from a MARC-8 system into characters usable in a Unicode system.
This is one feature that allows metadata and cataloguing librarians to somewhat magically transform metadata in ways that, before the development of MARCEdit, were very labor-intensive. In increasingly complex metadata and discovery environments, librarians are finding a greater need to move their metadata between systems, making this feature quite important for some academic libraries. The MARCEdit discussion list archive contains a number of discussions about how this tool is being used. In addition, readers may be interested in viewing Terry Reese's YouTube video that demonstrates the batch processing tool:
https://www.youtube.com/watch?v=nt2RChF_hgQ (search for: "MarcEdit: Batch Process Records (Example: MARCXML2 MARC)").
Because of the potential the batch processing tool has for taming particularly difficult tigers, references to articles and other sources of information about the tool and other related tools are included in the toolkit tools listed at the end of this chapter.
(12) Generate call numbers, FAST headings, and other handy tools
While librarians may generate call numbers and FAST headings from within the
MARCEdit editor, it is also possible to run these tools as batch processes. Without opening a record set, the same options are available for inserting classification numbers (LCC or DDC) and FAST subject headings when a match for the record is found in OCLC. These tools can be accessed via the MARCEdit "tools" menu, as are the other tools listed in this chapter.
For those who wish to deduplicate records within a file without actually opening the record set, the deduplication tool is also located on the tools menu. On this menu the feature is called "find duplicate records." The RDA helper is also located on this menu.
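The principle behind "find duplicate records" can be illustrated with a short script. The sketch below assumes the third-party pymarc library and uses the 001 control number as a hypothetical match point; a real deduplication profile might match on the 020 or 035 instead.

# Sketch: deduplicate a record set outside the ILS. Assumes pymarc;
# file names and the choice of match point are hypothetical.
from pymarc import MARCReader, MARCWriter

seen = set()
with open("set_with_dupes.mrc", "rb") as infile:
    writer = MARCWriter(open("set_deduped.mrc", "wb"))
    for record in MARCReader(infile):
        fields = record.get_fields("001")
        key = fields[0].data if fields else None  # control number as match point
        if key is not None and key in seen:
            continue  # duplicate: skip it
        if key is not None:
            seen.add(key)
        writer.write(record)
    writer.close()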
(13) MARCCompare, MARCJoin, MARCSplit, and merge records
These tools are useful for situations where either entire record sets or fields within record sets need to be merged or combined and also where record sets need to be split. The merge records tool is useful where fields or subfields from one record set need to be merged into an existing record set to enrich or correct the existing metadata.
The MARCJoin tool can be used as a batch process tool to combine all of the files contained within a folder. Or, it can be used to combine only specific files; when selecting specific files, these can come from multiple directories or folders. The tool can also be used to append files to an existing record set as records are received. This tool is used more frequently when dealing with eBook metadata than might be expected, as some eBook vendors will send individual MARC records in file folders rather than collate them into a single record set. Combining the separate eBook records into a single record set makes the processes of editing and preparing the records and loading them into the catalogue much more effective and less time-consuming than attempting to process and load each record individually.
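Joining a folder of single-record files into one set is conceptually simple, as the following sketch shows. It assumes the third-party pymarc library and a hypothetical folder of vendor files.

# Sketch: combine every single-record vendor file in a folder into one
# record set, as MARCJoin does. Assumes pymarc; names are hypothetical.
from pathlib import Path
from pymarc import MARCReader, MARCWriter

writer = MARCWriter(open("combined_set.mrc", "wb"))
for path in sorted(Path("vendor_files").glob("*.mrc")):
    with open(path, "rb") as infile:
        for record in MARCReader(infile):
            writer.write(record)
writer.close()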
MARCSplit is less frequently used when working with eBook record metadata but may be useful if existing record sets need to be broken up into separate records for reprocessing and transformation into another metadata format.
With regard to the MARCCompare tool, the author has never had a reason to
use this particular tool for any type of metadata processing. However, for thosewho are interested in reading about this tool, a link to a blog post by Terry Reeseexplaining the history of this tool and what it is intended to do is included in thelist of resources directly below.Helpful video examples and a blog post on these tools can be found at the
following:
https://www.youtube.com/watch?v=_a60t2I9Fqs (search for "MarcEdit: Merge MARC Records").
https://www.youtube.com/watch?v=wOIL435CxMI (search for "MarcEdit Example of using MARCJoin").
https://www.youtube.com/watch?v=M1J3QEyLzss (search for "MarcEdit Split and Batch Process Example").
Blog Post on MARCCompare:
Reese, T. (2014). "MarcEdit 6: Reintroduction of MARCCompare/RobertCompare" [blog post]. Retrieved 30 March 2015 from http://blog.reeset.net/archives/1341.
(14) Plugin manager
Plugins can be used to extend the functionality of an application. MARCEdit
allows for the creation of plugins for use within the MARC editor. There are a number of existing plugins available within MARCEdit that can be activated within the plugin manager so that they can be used in the editor. The existing plugins can also be used as examples of how additional plugins could be created. Examples of existing plugins include OCLC Connection, Biblios.net editor, and generate cutters.
The usefulness of plugins within MARCEdit depends on the larger metadata environment and on the types of processes and workflows that aren't being met by MARCEdit and the other applications in use at the library, or workflows that could be improved if a custom extension to the application were created.
Terry Reese's YouTube video that discusses the use of plugins in MARCEdit
from a general point of view is found at https://www.youtube.com/watch?v=ZTx-gL1BAmew (search for: "MarcEdit: Managing Plug-ins using the MarcEdit Plug-in Manager").
[Figure: Functions available on the MARCEdit Add-ins menu.]
(15) CDS-ISIS.iso => MARC translation
While CDS/ISIS database metadata is not one of the more common types of
metadata librarians need to harvest for use in the local discovery system, librariesthat require metadata from highly specialized collections from anywhere in theworld may need to work with CDS/ISIS metadata.Information about CDS/ISIS as well as ISISMarc can be located on the web by
navigating to UNESCO's web portal at http://portal.unesco.org/ and searching for CDS/ISIS on the website search engine.
In addition, those librarians who use this tool should expect to spend more time cleaning and reformatting metadata relative to the work typically done when metadata has been translated into MARC out of schema such as DC or MODS. In particular, the presence of the character "^" should specifically be searched for, and records that contain it should be carefully inspected. Not only may there be an error in a field that contains this character, the presence of the character may cause applications that need to process the records to malfunction.
(16) MARCEdit Script Wizard
The script wizard add-in tool can be used to automate some simple or routine
tasks. For example, a librarian can indicate what action will happen if a certain condition is found in a record. The author has used the wizard to create a script for replacing generic 506 fields with a restriction note that is specific to her library. The script is saved and then run automatically on relevant record sets from within the MARCEditor. The script wizard can also be applied in very specific situations through the use of regular expressions.
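The kind of routine edit just described is also simple enough to express directly in code. The sketch below is not the author's script; it assumes the third-party pymarc library (5.x) and a hypothetical restriction note, and it replaces any generic 506 fields with a library-specific one.

# Sketch: replace generic 506 restriction notes with a local note.
# Assumes pymarc 5.x; file names and the note text are hypothetical.
from pymarc import MARCReader, MARCWriter, Field, Indicators, Subfield

NOTE = "Access restricted to Example University students, faculty, and staff."

with open("new_set.mrc", "rb") as infile:
    writer = MARCWriter(open("new_set_edited.mrc", "wb"))
    for record in MARCReader(infile):
        if record.get_fields("506"):
            record.remove_fields("506")  # drop the generic vendor note(s)
            record.add_field(
                Field("506", Indicators("1", " "), [Subfield("a", NOTE)])
            )
        writer.write(record)
    writer.close()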
For those who have an interest in building more complex scripts, the basic scripts that are built using the script editor can be used as templates for building additional scripts. Even those who are not familiar with any type of scripting language will likely notice patterns in how scripts are constructed, because the results of the information put into the wizard are displayed in a preview window before the script is saved. The author suggests testing the behavior of scripts on a number of different
sample record sets before using them in the regular workflow. This is particularly true if the librarian intends the workflow to include several scripts applied serially. While the logic of a script order may appear correct, unanticipated factors or record set characteristics may lead to unexpected results that are only discovered when the scripts are applied. The author has found that with experience her scripts, and her ability to order them appropriately, have become increasingly effective. Many scripts are relatively easy to create, and she has been generally pleased with the results despite being somewhat frustrated by a few unexpected results and mistakes in her first attempts. In general, the author's conclusion is that the time and effort put into learning to create and use scripts in MARCEdit have been well worth it in terms of the increased efficiency and consistency in record set management that has resulted.
(17) Verify URLs
This tool, which is found on the "add-ins" menu, has a self-descriptive title. The
verify URLs tool can be run on a record set to check the URLs found in the 856 fields. The results of that check are reported in an HTML page, which is saved locally. Problems discovered during the process are returned with either a 400 or 404 code, while URLs that could be verified are typically coded 200 or 300. Occasionally, no code will be produced because the browser timed out during the search. The latter URLs are generally suspected by the author to be problematic and are tested manually on a URL-by-URL basis. There may be other codes that could be produced in the report, but these are the only ones the author has seen when she has verified URLs.
The author has used this tool on various occasions, including situations where URLs have either been retrieved from an external source and inserted into the records or have otherwise been generated through a process proven to have been problematic in the past. Of course, the tool isn't 100% foolproof, in the sense that a URL can point to the incorrect resource and the tool doesn't detect whether or not the library has access to the full content of a licensed resource, but at least the worst problems are identified in the report.
The report can help to identify when the process that was designed to generate or insert URLs failed, so that the process can be revised and run again. It can also be used to detect quality control problems with vendor-supplied record sets. If there are a small number of problematic URLs, the correct URLs can be substituted before the record set is loaded. If the problem is more substantial, the vendor may need to be contacted and the record set may need to be generated again using up-to-date information.
(18) Help menu
For those readers who have become cynical about the helpfulness of "help"
menus, the MARCEdit help menu will likely be a pleasant and welcome surprise. The help menu gives users access to tutorials, the MARCEdit listserv, information about known issues, and a link to the MARCEdit blog. It is a useful resource to use as a starting point for learning about the features, solving problems, and keeping up-to-date on developments and issues as they are discovered.
While this exploration of MARCEdit has taken a considerable amount of real
estate in this chapter, it's important to keep in mind that the discussion has been limited to a somewhat superficial survey of the features, functions, and tools most likely to be used by, or of interest to, readers. The intent has been to introduce readers to the scope of what is possible with MARCEdit. Some readers may have no experience with MARCEdit and may wish to begin by learning some basic functions in the MARC editor. Other readers may be MARCEdit users who have not experimented with some of the more advanced functions that, if implemented, could improve their ability to manage their eBook metadata more effectively and efficiently.
there are many features that have been designed for their systems or processessupported by their systems. There may be specific techniques using the generaltools that have been worked out by other libraries to achieve specific ends forlibraries that use either of these systems. There are a number of sources ofinformation about how MARCEdit can be specifically used with OCLC or Kohaincluding videos, blog posts, and conference presentations.
One final note about MARCEdit is to remind readers of two important characteristics of the application. The first is that functionality is added and the application is updated on a regular basis, so it is reasonable to expect that both the interface and the tools will change their appearance and functionality regularly. One way to avoid becoming disorientated by the changes is to follow the MARCEdit listserv and/or Terry Reese's blog, where news about changes is reported either in advance or as the changes appear in the updated versions. The second characteristic arises from the intersection of the nature of the application and the creative problem solvers who use it. Specifically, users innovate many clever and practical ways to apply and/or modify MARCEdit's functions and tools to tame tigers big and small. Many of these helpful solutions are shared in online discussions and conference presentations. The author has also learned a considerable amount from other cataloguing and metadata librarians about how MARCEdit might be used to solve common and/or particularly troublesome problems with record sets, simply by asking them, in person at conferences or virtually in social media and other online contexts, how they use MARCEdit at their libraries. Not only is the ongoing development of MARCEdit dynamic and responsive to the changing environments where metadata is managed in academic libraries, the nature of MARCEdit also fosters innovation and collaboration in the LIS community.
6.17 Record loading
Having covered issues surrounding the retrieval or creation of record sets and the topic of editing MARC record sets, it is now appropriate to talk about the process of loading a mediated record set into the local bibliographic database. This process is generally called record loading, or just loading, and the tool or application used to achieve the task is generally called a loader. Loaders are a functionality or application built into the ILS/LMS and can vary significantly from system to system. For example, when the author was a consultant she worked with an ILS that employed a loader that did nothing more than extract the records from the record set file and load them into the catalogue, with no option for customizing or adjusting the records during the loading process. At the extreme opposite end of the spectrum, at the library where the author currently works the "loader" is actually a highly customizable function that makes use of complex tables and profiles for various record-loading scenarios. In this context records can be reprocessed and edited during the loading process, and it is also possible to do the type of merging of records previously discussed in relation to the features available in MARCEdit. The former loading process only required the user to read a few lines of text instructions and click a few options, while running the latter ILS's loaders requires specialized training provided by the vendor.
Considering the complexity of the ILSs that are frequently used by academic libraries, the latter scenario of having a highly specialized loader is much more likely than the highly generic and easy-to-operate variety. Regardless, this contrast points out the fact that not all loaders work the same way once the discussion gets past the point of establishing that the function of the loader is to add records from record sets to the bibliographic database. Note that in some contexts, loaders are not just loading discovery metadata but may also be loading acquisitions metadata, ERM records or, depending on the ILS, any other type of record stored within the ILS.
because of the variety of possibilities readers may find in use at their library, it isessential that this topic be explored in a general sense for the benefit of thosecreating the eBook metadata management plan. If the record loader(s) were notdocumented when the reader was investigating the overall functionality of his orher local ILS/LMS, now is the time to begin to investigate the functionality forloading records. If some investigation was done and some documentation wascreated, now is the time to review and add to that documentation.
Questions to answer and documentation to include
(1) How does the ILS loader work? Can its functioning be reduced to a diagram?
Is there documentation explaining how it functions? Where is thatdocumentation located?
Sometimes vendors have charts in their documentation that illustrate the order in which processes occur and, if information is drawn from tables, what information is used at which point. Including either the document or access to it in the eBook metadata management plan can be extremely useful for troubleshooting and/or improving processes. If such a diagram doesn't exist, it may be worth the time to create a simple diagram to represent the process.
(2) Does setting up a loader and/or running it require specialized training? If so,
who has that training? Are enough people trained? Who does the training?
It is important that those creating the eBook metadata management plan not
make any assumptions in this regard. Having enough people with the right training is critical to ensuring that unnecessary backlogs of record loading and unresolved problems don't pile up and reduce the overall effectiveness of the metadata management plan.
(3) In addition to the diagram, have the options and functionality of the loader(s)
been documented? It is particularly important to have some level ofgranularity in the documentation if the cataloguing department is notresponsible for loading processes.
Documentation should include details such as:
• A threshold (maximum) record set size that can be loaded (this may include a threshold after which performance problems with the loader or ILS can begin to occur).
• Whether or not fields can automatically be added or deleted during theloading process (if there are any limitations and restrictions on this theyshould be noted as well).
• What options are available for inserting records, overlaying records, andrejecting records during the loading process.
• Whether it is possible to protect fields when records are overlaid (including any limitations or exceptions).
• Any special requirements for loading (i.e., what must typically be done to arecord set before it is loaded).
• What will happen when a record is rejected during a load process.
• What will cause a record to be rejected.
• What reports are or could be generated as a result of a record load and where those reports are distributed.
• If different aspects of creating and using loaders are carried out by multiple staff in multiple departments, documentation of who is responsible for which parts.
(4) If the loader uses load profiles, is there an inventory of profiles? This inventory should outline the characteristics of each profile as well as what types of records are typically loaded with it. ("Type" may mean record sets with specific characteristics or it may mean record sets from a particular source.)
If details about the loaders are documented and reviewed when the metadata management plan is reviewed, it may be easier to detect when loading processes require updating. In addition, this information can be useful when librarians are working on integrating the handling of records for a new eBook subscription or altering workflows to adjust to the addition of a new service or process.
(5) If error reports are produced, whom do they go to? Does someone follow up
on them? What is done?
The bottom line is that the records that are reported on an error report, if an ILS
produces one, represent records that were not loaded into the system or were not loaded in the same way as the other records. This can have an impact on the discoverability of eBooks within the collection and/or the accuracy of records in the catalogue. Those overseeing the eBook metadata management plan will need to ensure that if error reports are produced, they are followed up on, and that the person(s) responsible for dealing with the error reports have the information and training they require to resolve or report the problems.
reports that she and other library staff have uncovered the source of some
complex problems that had been occurring over time but had no easily detectable cause. Therefore, not only does dealing with error reports ensure records that were missed during the loading process get loaded into the catalogue, it also offers librarians and other library staff the opportunity to investigate when, where, and why bulk processes fail. This knowledge can be helpful for improving practices and preventing future problems.
(6) If various aspects of the loading process are handled by different staff and/or
different departments, is there a way to trace the progress of a record set through the steps of loading? If so, is there a way to detect when a record set has been held up somewhere in the process?
Unfortunately, when there are many record sets being passed from person to person and between departments, it is possible for record sets to essentially fall through the cracks. The issue of tracking record sets may have been addressed already in the metadata management plan, but perhaps not in enough detail to detect when a file has not been effectively passed from one person or department to another. The author has had the experience of a record set being misdirected for nearly two years before the problem was detected. In this case, update records were sent to overlay records that weren't present in the catalogue. The lost record set could be discarded, but until this problem was discovered the eBook content wasn't discoverable in the catalogue.
A local approach to using loaders within an ILS will need to be examined
within the larger metadata and library context. In particular, those managing eBook metadata will need to decide how much processing is done in the MARC editor and how much is done by the loader. Neither choice is automatically better, as both options can be equally automated and efficient. However, the author does tend to prefer doing most of the work in an editor at her library because the output of automated processes can be quickly scanned by cataloguers before the file is passed around for loading. Workflows and the ILS at another library may be such that cataloguers design and operate the loaders and thus can monitor record quality during the actual loading process. Decisions about what types of changes are done in the editing software and what is done by the loader would likely be made based on factors such as the time and skills available in different departments and among different staff; existing processes and workflows; processes that must occur simultaneously; and the availability of software that can handle the edits that need to occur.
6.18 Updating record set metadata
This topic will be addressed in more detail in an upcoming chapter. However, it is important to note that one of the key benefits of bulk processing eBook metadata is that when that metadata needs to be updated, it can be done via bulk process as
well. Often sellers of eBooks will send record sets that contain updated metadata. These record sets could be intended to achieve a number of different ends, ranging from fixing errors in the records originally provided and providing new records that have more access points or table of contents information, to providing new URLs after a major platform change. These update record sets are effective and efficient ways to rapidly correct and update discovery records in the catalogue.
Another type of "update" record set supplied by vendors is a "delete" file,
sometimes called "deletions." As discussed elsewhere in this book, delete files are simply MARC files with a "d" (deleted) coded in the record status position of the MARC leader. A loader can be created that will overlay, and possibly reformat, records that need to be removed from the catalogue. The typical automated workflow is to load a delete file that will suppress the records from public view, report changes in holdings to any union catalogues or OCLC, and then purge the records from the system if that last step is the desired eventual outcome.
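Because a mistake in a delete file can remove the wrong records, it can be worth verifying the leader coding before the file is loaded. The sketch below assumes the third-party pymarc library and a hypothetical file name; it simply reports any record in a vendor "delete" file that is not actually flagged with record status "d".

# Sketch: confirm that every record in a vendor "delete" file is
# coded "d" in leader position 05 (the record status). Assumes
# pymarc; the file name is hypothetical.
from pymarc import MARCReader

with open("vendor_deletes.mrc", "rb") as infile:
    for record in MARCReader(infile):
        status = record.leader[5]  # leader/05 is the record status
        fields = record.get_fields("001")
        control_number = fields[0].data if fields else "(no 001)"
        if status != "d":
            print("Not flagged for deletion:", control_number)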
When discussing the topic of bulk processing eBook discovery metadata, there are two key topics that those working on the eBook metadata management plan should consider. The first topic is relevant only to those libraries whose ILSs have loaders capable of protecting fields when a record from a new record set is set to overlay an existing record, and/or where library staff have the skills and time to create a loader that will protect fields. The considerations that need to be made include whether or not the library should be protecting any fields and, if so, which ones make sense to protect. Librarians need to review policies and practices in this regard, considering it is possible that the current practices were established at a time when the library was doing a significant amount of manual editing of bulk-loaded records. It is possible in some situations that the loading process should be switched to allow the new record to entirely replace the old record. As the quality of some record sets improves and/or the process of trying to protect certain fields proves either difficult or problematic, some libraries may decide that the time is appropriate to begin allowing a complete overlay of records for certain record sets. However, there may be times when libraries have added certain fields locally that can't easily be replaced using a bulk process. Examples may be locally added subject headings from controlled vocabularies that are used by the local discovery system but not commonly found in the record set records, or local call numbers created by a cataloguer. It is particularly important to record these exceptions and the reasons for making them. This information will reduce the chance of mistakes being made during a major system migration if the loaders need to be partially or completely recreated. Also, when the metadata plan is reviewed, library staff will have the opportunity to consider whether the reasons remain valid; if not, this may represent a new opportunity to simplify loading processes by eliminating a complex loader.
The second issue that should be considered has to do with how libraries might
be able to plan for long‑term metadata maintenance issues. While maintenance
will be discussed in an upcoming chapter, certain aspects of maintenance should be considered alongside discussions about eBook metadata bulk processing. The issue is that not all eBook vendors have the capacity to provide update or delete files. In addition, if the library has harvested or used metadata exported from another system within the library or elsewhere on campus, there is no vendor to produce update or delete record sets. In reality, this metadata may need to be updated, replaced, or deleted someday, and the process of doing this will likely need to be carried out entirely by library staff. Given the size of many record sets that can be obtained in the ways discussed in this chapter, library staff will undoubtedly want to be able to update these records in bulk. One approach, as previously alluded to, is to compress and archive locally created record sets in a well-indexed storage location so that they can be retrieved at a later date. While it may be possible to group the records within the ILS and maintain them there using built-in features and functions, working outside of the ILS in an editor such as MARCEdit allows the freedom to experiment with different approaches to carrying out the changes that need to happen without risking unintended damage to the records in question or to other records in the catalogue. For example, it may be possible to extract the updated metadata from another source and then merge it into a test version of the original record set. Once a merge is achieved, library staff could test the success of the results by overlaying a small sample of records in the live system. By working externally to the ILS in an application such as MARCEdit, library staff can experiment with different approaches and processes to work out bugs and find the most efficient and effective workflow.
In conclusion to this chapter's section on the bulk processing of eBook discovery
metadata, there is likely little doubt that readers now understand why the practices, processes, procedures, and applications used in managing MARC record sets are a central concern for metadata and cataloguing librarians. Bulk processing practices and procedures consequently will likely require a significant amount of study and consideration when creating the eBook metadata management plan. It's important for readers to keep in mind that this is an aspect of professional practice among cataloguing and metadata librarians characterized by a notable amount of innovation and creativity. While the innovation is often motivated by the need to survive some of the disruptive changes brought on by the presence of eBooks in academic libraries, many librarians, including the author herself, find the prospect of being able to innovate and develop new solutions to problems to be motivating in itself. In a profession that has many long-standing practices and standards, the ability to introduce something new can bring a new level of engagement to the work of many technical services librarians. Therefore, it is reasonable to expect that there will continue to be many more new developments and innovations in the near future. Many new tiger tamers may step forward and change the face of the work that libraries do today.
With regard to the face of change in the work done by cataloguing and metadata librarians as well as other technical services staff, it appears that the growing adoption of OCLC's WorldShare Metadata Collection Manager service may in fact be what was described as a "sustaining technology" in Chapter 2. However, the current discussions about the movement of library discovery away from systems based on MARC records and toward models based on linked data concepts such as BIBFRAME suggest that cataloguing and metadata librarians may see the core of their practice moving toward another disruptive change. Linked data, which is potentially much more powerful and flexible, lacks a traditional record structure and thus represents a dramatic shift from the way librarians think about discovery metadata. So as not to ignore the fact that libraries may shortly be moving from one disruption to the next, BIBFRAME, and how those working on the eBook metadata management plan might prepare themselves for whatever may happen in the near future, will be addressed in the final chapter of this book.
Toolkit survey: Bulk processing
(1) Does the library have an inventory of the record sets that it receives or has
received? Where is this inventory located? Is it kept up‑to‑date? Do the staffwho need the information from the record set have easy access to it? Is themetadata for the record sets integrated with other metadata (such as in theERM)? Is this metadata granular enough for keeping track of the record sets?If not, could the existing system be upgraded or would something new needto be created?
(2) Does the library have a checklist that can be used to create a profile for setting up an appropriate workflow for new record sets? Is the checklist effective? Could some questions be added, removed, or reworded to improve the quality and usefulness of the resulting profiles? Are the profiles reviewed for accuracy annually, when the subscription is renewed, or at some other point in time?
(3) Are there some eBook record set workflows that significantly differ from themajority of other eBook record set workflows? Are there options availablethat would allow the library to bring this workflow in line with what is donewith other eBook record sets? (Sometimes the workflows were set up so longago that the possible options have since improved.) If it is not possible tomodify the workflow, is the difference clearly documented and are the staffwho need to perform any part of the workflow aware of the special casespresented by these workflows?
(4) Is there a way to monitor the progress of record sets? Is it systematic, and does it apply to all stages of retrieving, editing, loading, and maintaining the records? Can staff easily detect when record sets are stalled at a particular point in the process? Do staff know what to do if or when a record set gets held up somewhere in the system?
(5) Is a cataloguer included in the group that considers the purchase of new electronic book collections, deals with new vendors, and/or sets up access to eBooks on new platforms? If so, does the cataloguer have adequate knowledge or information about bulk processing to be helpful? Also, have criteria been established to identify when it is appropriate to include a cataloguer in preliminary discussions? If cataloguers aren't involved at this stage in considering new resources, could a cataloguer be added to this team?
(6) Does the library have a system that alerts library staff when a record set should be available for pickup (i.e., for those record sets that are updated on a monthly, quarterly, annual, or other regular basis)?
(7) Does the library have a plan to systematically monitor the accuracy and correctness of record set profiles?
(8) Has the library reviewed the appropriateness of how loaders are set up within the ILS? Do the reviews take into consideration changes that may have taken place in the metadata environment since the loaders were first created, or changes in the characteristics of the record sets for which they were created?
(9) Does the library harvest eBook metadata from external sources and crosswalk it into MARC metadata for use in a local discovery system? Is metadata from other local systems converted into MARC for use in the discovery system? If the answer to either or both questions is "yes," is there an inventory of both the sources of this metadata and the processes that were or are used for capturing and importing that data? (A minimal crosswalk sketch is given after this list.)
(10) Does the library use one or more MARC editors that exist outside the ILS and offer advanced editing options? Are the library staff who must use these editors trained in the use of the most relevant options?
(11) Does at least one librarian or other member of library staff follow listservs, blogs, or other sources of information to learn about new developments that are relevant to the use of the library's MARC editor and methods for processing records in bulk? Is there a way to follow reports of known problems with the editor(s) and/or tricks and tips offered by other librarians?
(12) Does the ILS/LMS produce load reports or error reports? Is at least one person at the library assigned to follow up on error reports? Is that person trained in what to do when various types of problems are found and/or where certain problems should be reported? Is there somewhere to log new or repeated problems? Is this log useful to library staff, or could it be made useful?
(13) Has the library identified and automated the most common and routine processes involved in bulk processing eBook record sets? Are these processes documented? Are they effective? Are the results of these automated processes tested on a routine basis? Is there evidence that they are being applied appropriately? Is there an established method for determining where and when processes could be automated?
(14) If the library uses scripts to automate processes, are those scripts tested before they are implemented? Are they routinely checked for continued appropriateness as part of the processes described in question 13?
(15) Does the library use Koha? Does the library have an OCLC membership? Have the special features built into MARCEdit been explored and considered for local use? Or, have the special workflows that have been recommended for Koha libraries and/or for those who have OCLC accounts been investigated for possible local use?
(16) Does the library have an approach or various methods for maintaining eBook metadata over time? This topic will be revisited later in this book. At this point it is good to start documenting any known practices related to updating records or removing defunct ones from the local catalogue.
(17) Are the options available for record loading, overlaying, and field protection during overlay processes known to library staff and considered when customizing loaders and designing workflows? Is special training required to customize and run the loaders? If so, do the appropriate staff have that training, and are there an adequate number of trained staff?
(18) Does the library archive some or all of its record sets for possible future reuse? If so, is the rationale for archiving record sets documented? Is there an inventory of which record sets have been archived, and/or is the archive indexed so that the record sets can be easily retrieved if needed? Are record sets saved only for a short period of time, for example, until library staff have confirmed that a complex series of transformations and edits was successful, after which the original record sets are discarded? Or does the library save some record sets indefinitely? Is the purpose for saving these sets documented in the eBook metadata management plan? Reasons can include easing future maintenance processes or keeping a record set of metadata for locally digitized resources readily available to share with other libraries.
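As promised in question 9, the following is a minimal sketch of what crosswalking externally harvested metadata into MARC can look like in code. It is written in Python with the pymarc library (5.x subfield syntax is assumed); the harvested values and file name are invented for illustration, and a production crosswalk would of course map many more fields and handle many more edge cases.

    # Illustrative crosswalk sketch (assumes pymarc 5.x); data and file names are invented
    from pymarc import MARCWriter, Record, Field, Subfield

    # Hypothetical metadata harvested from an external source, e.g., a repository export
    harvested = [
        {"author": "Author, Ann", "title": "An example eBook", "url": "http://example.org/ebook1"},
    ]

    writer = MARCWriter(open("crosswalked.mrc", "wb"))
    for item in harvested:
        record = Record()
        record.add_field(
            Field(tag="100", indicators=["1", " "], subfields=[Subfield("a", item["author"])]),
            Field(tag="245", indicators=["1", "0"], subfields=[Subfield("a", item["title"])]),
            Field(tag="856", indicators=["4", "0"], subfields=[Subfield("u", item["url"])]),
        )
        writer.write(record)
    writer.close()  # also closes the underlying file

In practice, a script of this kind would typically sit between a harvesting step (such as an OAI-PMH request) and the record-loading workflow discussed earlier in the chapter, and its output would be inventoried like any other record set.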
Toolkit tools
• Slides from a conference presentation given by Terry Reese contain some useful overview information as well as examples of how regular expressions can be used in MARCEdit:
Reese, T. (2012). "Editing Records with the MARCEditor" [conference presentation slides]. Kansas Library Association 2012 Conference, Wichita, KS. Retrieved 30 March 2015 from http://kslibassoc.org/2012Conf/handouts/marceditsession_three.pdf. An updated version of this presentation can be found at http://marcedit.reeset.net/marcedit-101-workshop.
• This article outlines a way in which the power of the batch load tool in MARCEdit has been used in conjunction with Python to solve some problems with particularly tricky tigers. Because she has shared her ideas and helped other cataloguing and metadata librarians see how flexible and powerful MARCEdit can be with some problem solving and creative thinking, Heidi Frank is recognized as another of our tiger tamers. The article is:
Frank, H. (2013). "Augmenting the Cataloger's Bag of Tricks: Using MarcEdit, Python, and PyMARC for Batch-Processing MARC Records Generated From the Archivists' Toolkit". Code4Lib Journal, Issue 20, 2013-04-17. Retrieved 30 March 2015 from http://journal.code4lib.org/articles/8336.
For those who find this article of interest, a related blog post by Lauren Magnuson, which goes into more detail about PyMARC, may also be of interest:
Magnuson, L. (2014). "Hacking in Python with PyMARC" [blog post]. ACRL TechConnect, posted October 15, 2014. Retrieved 30 March 2015 from http://acrl.ala.org/techconnect/?p=4669.
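For readers who would like a concrete feel for this style of batch processing, here is a short sketch of one typical bulk edit written with pymarc. It is not taken from Frank's article; the file names and proxy prefix are invented for illustration. The edit prefixes every 856 $u in a record set with a proxy string, a common step when preparing vendor record sets for loading.

    # Illustrative bulk edit (assumes pymarc 5.x); file names and proxy prefix are invented
    from pymarc import MARCReader, MARCWriter, Subfield

    PROXY = "https://proxy.example.edu/login?url="

    reader = MARCReader(open("vendor_set.mrc", "rb"))
    writer = MARCWriter(open("vendor_set_edited.mrc", "wb"))
    for record in reader:
        for field in record.get_fields("856"):
            # Rebuild the subfield list, proxying each $u that is not already proxied
            field.subfields = [
                Subfield("u", PROXY + sub.value)
                if sub.code == "u" and not sub.value.startswith(PROXY)
                else sub
                for sub in field.subfields
            ]
        writer.write(record)
    writer.close()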
• The following are examples of projects where tiger tamers have taken on the task of harvesting metadata from large collections of open access digital documents, transforming that metadata into functional MARC records or MARCXML, and sharing the results with others in the form of record sets:
Project Gutenberg Catalogue Project, University of Adelaide: www.gutenberg.org/ebooks/. The University of Adelaide earns additional recognition as tiger tamers for their handy monthly MARC record set updates to their locally hosted open access eBook content: https://ebooks.adelaide.edu.au/meta/
OAPEN hosts a number of open access academic eBooks in the humanities and social sciences. The metadata file, which can be downloaded for the collection, is in MARCXML format and is easily converted into MARC 21 via MARCEdit. The download page for the metadata is located here: http://www.oapen.org/metadataexports?page=intro.
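MARCEdit handles the MARCXML-to-MARC 21 conversion through its built-in translation functions, but the same step can also be scripted for those who prefer it. The following sketch uses pymarc's parse_xml_to_array function; the file names are placeholders, and the MARCXML download is assumed to have already been saved locally.

    # Illustrative MARCXML-to-MARC 21 conversion with pymarc; file names are placeholders
    from pymarc import MARCWriter, parse_xml_to_array

    records = parse_xml_to_array("oapen_metadata.xml")  # previously downloaded MARCXML file
    writer = MARCWriter(open("oapen_metadata.mrc", "wb"))
    for record in records:
        writer.write(record)
    writer.close()
    print("Converted", len(records), "records to MARC 21")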
• OpenLibrary-Utilities (SCCLD), produced by the Santa Clara County Library District (SCCLD) and located at https://foss4lib.org/package/openlibrary-utilities-sccld, contains a search engine that provides access to numerous open access tools that libraries can use. For example, to find pages of open source tools for working with and transforming MARC, search for "marc*". Note that the tools range from links to software downloads to articles and information provided in training sessions. SCCLD are tiger tamers for pulling together in a single location such a wide variety of useful resources for the benefit of librarians around the world.
Notes
1. For a fuller discussion of disruption in the technical services functions of academic libraries, the author has written a post on the Brain-Work blog, which is hosted by the Centre for Evidence Based Library and Information Practice at the University of Saskatchewan. The post can be read at http://words.usask.ca/ceblipblog/2014/12/02/technological-disruption-in-technical-services/.
2. ALCTS stands for the Association for Library Collections and Technical Services, a division of the American Library Association (ALA) (see: http://www.ala.org/alcts/). Given the rapid rate of change in libraries, it is good practice for all technical services librarians working in academic and research libraries to follow the webinars, documents, and other information posted by ALCTS. In reality, it is nearly impossible to create a monograph such as this book that reflects the cutting edge of current thought and practice. However, ALCTS is one source of information that both new and experienced librarians can use to keep up-to-date with innovations, trends, emerging practices, and issues within the specialized subfield of technical services librarianship. It is not necessary to be a member of ALA and/or ALCTS, although membership has various benefits, including discounts on training, conferences, and publications. Much of the information available on the ALCTS website is provided for the benefit of the LIS community free of charge.
3. While not every library in the world has a code, for the libraries that do, the code included in the 040 field of MARC records can be looked up via one of the directories listed on this page: http://www.loc.gov/marc/organizations/.
4. For those who have not seen regex applied to solving common MARC record editing questions, have a look at Terry Reese's blog post of March 11, 2015, "Conditional Regular Expression Replacements using substitutions in MarcEdit", available at http://blog.reeset.net/archives/1659.
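For those who would rather experiment with the same idea outside of MarcEdit, a conditional regular expression replacement can also be sketched in Python with pymarc. The example below is invented for illustration and does not reproduce Reese's recipe: it strips a trailing parenthetical qualifier such as "(ebook)" from 020 $a, but only in subfields where such a qualifier is actually present.

    # Illustrative conditional regex replacement (assumes pymarc 5.x); invented example
    import re
    from pymarc import MARCReader, MARCWriter, Subfield

    reader = MARCReader(open("records.mrc", "rb"))
    writer = MARCWriter(open("records_fixed.mrc", "wb"))
    for record in reader:
        for field in record.get_fields("020"):
            field.subfields = [
                # Condition: only rewrite $a subfields that end in a parenthetical qualifier
                Subfield("a", re.sub(r"\s*\([^)]*\)\s*$", "", sub.value))
                if sub.code == "a" and re.search(r"\([^)]*\)\s*$", sub.value)
                else sub
                for sub in field.subfields
            ]
        writer.write(record)
    writer.close()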