MARC: Keystone for Library Automation



Sally H. McCallum
Library of Congress

Libraries’ most central and costly activity, cataloging material and maintaining the catalogs providing end-user access, had requirements that defied efficient automation until the mid-1960s, when the Library of Congress developed the MARC format for data records. The format became the foundation for automated systems for libraries that took data sharing to new levels and enabled exploitation of future computer developments to create today’s online catalog environment.

IEEE Annals of the History of Computing, 1058-6180/02/$17.00 © 2002 IEEE

The creation of the machine-readable cataloging (MARC) format for bibliographic data made libraries pioneers in the technology revolution that has been ongoing ever since the development of the computer. Electronic computing machines were essentially invented in the 1940s,[1] developed in the 1950s, and became widespread in the 1960s, at which time librarians began to join others in exploring the possibilities that computers might hold for their services. Initially, the focus was on using computers for library circulation and inventory processes. The most central and costly activity for libraries (the cataloging of material and the maintenance of the catalogs that provide end-user access) had such complex requirements that it was not until 1965, when the Library of Congress launched an intense automation effort, that experts in the new technology really began to tackle this application.[2]

Cataloging data is a shorthand description of items in a collection. Today we call that metadata: data (cataloging description) about data (the item from the collection). Much of the cataloging data are succinct, abundant, and diverse access points such as names, subjects, places, languages, and physical characteristics of the item. The data attempt to give potential users of library material a variety of ways to discover and locate information that might meet their needs. Cataloging data is also critical for efficient functioning of various library processes such as acquisitions and circulation.

The keystone for the development of automation in libraries was the simple but innovative MARC cataloging data format, developed in 1967–1968. This article describes the complexity of the library application, how the MARC format was innovative, and why it was the foundation of automated systems development in libraries. The text also discusses the environment that the development of MARC helped to create.

Technology setting

In the mid-1960s, when the automation evolution for libraries began, the computer environment differed sharply from today’s: There were no personal computers or networks; even the cathode-ray tube (CRT) computer terminal was not yet deployed. Computing was carried out on physically large mainframe machines, using transistor technology. Integrated circuits were new; chips and local area networks were still a few years off, 1971 and 1973, respectively. The first true family of computers, IBM’s successful System/360, was introduced in 1964. Computer input was largely via punched cards, and machine storage capacity was a serious concern. Magnetic tapes, paper tapes, and cards were used for data transfer and storage. Computing was a powerful new tool but essentially involved batch processes. Initially used primarily for research, by the 1960s the use of computers in business applications was being actively pursued.[3]

On another level, the environment was also strikingly different from today’s. Assembly language was heavily used for computer programming, although there was increasing use of higher level languages such as the new Cobol and Algol, and Fortran was well established for number-based applications. The days of working directly with binary encoding were not long past, however. Computer use typically involved complex numerical calculations; language and string manipulation techniques were just being explored. Some computer systems still worked with an all-uppercase Latin script, using 6-bit characters; the EBCDIC (Extended Binary Coded Decimal Interchange Code)[4] and ASCII (American Standard Code for Information Interchange)[5] sets, both of which include upper- and lowercase Latin alphabet characters, were not introduced until the mid-1960s. Data formats, formal or ad hoc, were usually fixed length with fixed-length data fields.

Changes were occurring rapidly, however. Universities and research groups were experimenting with applications, and IBM and other companies were sponsoring cutting-edge work to improve the computing infrastructure.

The library application

Librarians plunged into this 1960s computer environment with some special-use applications because they recognized that mechanization held great potential for library functions. The prospect of gaining efficiencies for work-processing streams, largely by sharing cataloging records faster and more efficiently, was a major driving force for experimentation with library automation. Moreover, with machine-readable cataloging records, catalog cards could be machine sorted and printed. Thus, automation could benefit both the library and the library patron through cost savings on cataloging and other manual processes. These savings could release funds that might be channeled into purchasing additional library resources (although it was recognized by some analysts, if not always taken into account by library directors, that the savings would initially be offset by automation development costs).

However, from the beginning, library leaders realized that automation would not be worthwhile unless enhancing end users’ ability to find information received focus. While expediting cataloging operations would be a significant advantage for users, another goal was better access options. Ultimately, the end user would be able to discover material more rapidly and completely via machine access to a much richer set of data from the bibliographic record than was available in the limited and static (albeit sturdy) card catalog. Although the computing environment of the time would not yet support such an “online catalog,” the possibility of its development was considered in the subsequent work on a format for bibliographic data.[6]

During this period in the 1960s, however, library data and the needs of the library community were not yet a good fit with the contemporary computing environment. Libraries had special characteristics that automation had to accommodate, as described next.

Variable length data elements

Library catalog records have text string data elements that are relatively short but highly variable in length. Truncation of data is unacceptable, but the variability of length for different data elements and for different instances of the same element means that specifying long, fixed-length format fields is wasteful. Yet both of these, truncation and fixed-length fields, were standard practices in the 1960s computer environment.

Record length variability

Bibliographic data, despite the obvious derivation of the term bibliographic from books, refers to a diverse family of metadata records, not just for books but for maps, serials, recorded sound, written music, motion pictures, photographs, and other graphic material, even artifacts. Although library materials tend to have the common characteristics of author, title, and subject, each mode of expression has its own important characteristics. For example, maps require specialized geographic elements and access points; recorded sound needs access and recognition for the performers in addition to the composers.

Thus a bibliographic application requires a large number of different data elements, and the elements appropriate to describe each resource will vary, resulting in records of many different lengths. Fixed-length records, the norm at that time, would be inefficient for library data.
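The cost of fixed-length fields can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the article: the sample titles, the function names, and the one-character overhead per variable-length value are all assumptions chosen for illustration.

```python
# Hypothetical titles of very different lengths (illustrative data only).
titles = [
    "Emma",
    "A History of Modern Computing",
    "The Life and Opinions of Tristram Shandy, Gentleman",
]


def fixed_length_size(values, width):
    """Characters used when every value is padded to the same field width."""
    return len(values) * width


def variable_length_size(values, overhead=1):
    """Characters used when each value is stored at its actual length plus
    a per-value overhead (assumed here to be one terminator character)."""
    return sum(len(v) + overhead for v in values)


# To avoid truncation, a fixed field must be as wide as the longest value.
width = max(len(t) for t in titles)
fixed = fixed_length_size(titles, width)
variable = variable_length_size(titles)
wasted = fixed - variable

print(f"fixed-length: {fixed}, variable-length: {variable}, padding: {wasted}")
```

With only three titles the padding already exceeds the space taken by the shortest two values; across a catalog of many elements per record the effect compounds.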

Large files

Storage is a perennial concern, but in the 1960s it was often controlled through limiting record (and hence file) size. Library catalogs have two formidable characteristics in relation to data storage. First, catalogs contain thousands and in some cases millions of records, at least one for each item the library holds. Second, libraries continuously integrate new records into the catalog, never having the luxury of stopping and starting a new, smaller file. A library’s patrons expect the catalog to provide them with access to the library’s complete holdings, not just the last few years’ worth, and patrons generally want to see the bibliographic records for all items matching a search so that they can determine the most relevant results. The implication for systems developers in the 1960s was that the library catalog application had to accommodate files with large and constantly growing numbers of variable-length records.

April–June 2002 35


Frequent record updates

An additional requirement is the need to perform frequent record updates. Librarians have long experience with keeping consistency in large, constantly growing files constructed over many years. To maintain consistency, librarians often need to make changes not only to recent records, during the editing and input process, but to retrospective records as well. Therefore, it is essential that the computerized bibliographic record support easy updating.

Data retrieval

Library cataloging records consist of many data elements describing different aspects of the items being cataloged. These data are recorded according to cataloging rules and some of the data are highly structured to assure that records for like or related items coalesce in different retrieval approaches.

Accordingly, data elements needed to be identified, as did, in some cases, the rules used to formulate them. Data tagging would assist not only various printing and sorting processes but also indexing, which enables the rich data retrieval made possible by the computer; in the card environment, indexing could be provided on only a few data elements. While retrieval was initially expected to be a batch process, some experts looked forward to emerging online possibilities.

Data extractions

Bibliographic data are a composite of many different types of data elements intended to support different applications (such as circulation, acquisitions, and record input/update) and different views of the data (such as brief citation lists, full item descriptions, and catalog card content). Specialized subsets of the record elements are required for these different purposes. Automated records, then, had to parse and identify data at sufficient granularity to support easy extraction of appropriate elements for the application or view needed.
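Once every element carries a tag, extracting a subset for a particular view reduces to filtering on tags. The sketch below illustrates the idea under stated assumptions: the record is modeled as a plain list of (tag, value) pairs, the tag numbers follow common MARC 21 usage (100 author, 245 title, 260 publication, 650 subject), and the view names, the field values, and the `extract` helper are hypothetical.

```python
# A record modeled as (tag, value) pairs. Values are illustrative only.
record = [
    ("100", "Ceruzzi, Paul E."),
    ("245", "A history of modern computing"),
    ("260", "MIT Press"),
    ("650", "Computers -- History"),
]

# Hypothetical views, each defined by the tags it needs.
VIEWS = {
    "brief_citation": {"100", "245"},
    "subject_index": {"245", "650"},
    "full_description": {"100", "245", "260", "650"},
}


def extract(fields, wanted_tags):
    """Return the fields whose tag is in wanted_tags, in record order."""
    return [(tag, value) for tag, value in fields if tag in wanted_tags]


brief = extract(record, VIEWS["brief_citation"])
subjects = extract(record, VIEWS["subject_index"])
print(brief)
print(subjects)
```

The same record serves every view; only the set of requested tags changes.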

Data sorting

Another crucial and exacting requirement for bibliographic data is complex sorting. Large, multifaceted files generally need more than simple alphabetic sorts, so structured headings and special sort rules have been developed over time to help users in browsing. These rules take into account the multilingual nature of bibliographic data and the different categories of access points (such as title, name, and subject). To add to the complexity, the professional community does not agree on sorting approaches: large files, small files, public library collections, research library collections, and special collections often had unique sorting requirements. Therefore, the cataloging record needed to support sorting by different rules. Several important products of catalog automation at that time would be presorted catalog cards, sorted printed book catalogs and lists, and computer-output-microfiche catalogs.
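To show how one set of records can be filed under different rules, here is a minimal sketch contrasting two well-known filing approaches, word-by-word and letter-by-letter. The rules are deliberately simplified and are not the article's or any library's actual filing code: the article list is English only, diacritics are folded with Python's standard `unicodedata` module, and all function names and sample titles are assumptions.

```python
import unicodedata

LEADING_ARTICLES = ("the ", "a ", "an ")  # English only; an assumption


def fold(text):
    """Lowercase the text and strip diacritics so that accented letters
    file with their unaccented base letters."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def filing_form(title):
    """Folded title with one leading English article skipped."""
    folded = fold(title)
    for article in LEADING_ARTICLES:
        if folded.startswith(article):
            return folded[len(article):]
    return folded


def word_by_word_key(title):
    """Compare whole words first, so 'New York' files before 'Newark'."""
    return filing_form(title).split()


def letter_by_letter_key(title):
    """Ignore spaces, so 'Newark' files before 'New York'."""
    return filing_form(title).replace(" ", "")


titles = ["New York", "Newark", "The New Atlantis", "Émile"]
by_word = sorted(titles, key=word_by_word_key)
by_letter = sorted(titles, key=letter_by_letter_key)
print(by_word)
print(by_letter)
```

Because the sort rule lives in the key function rather than in the data, the same records can be output in whichever order a given collection requires.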

Character sets

If libraries were not to take a step backward from what had been achieved in the pre-automation card production environment, computerized cataloging records data would have to be expressed in both upper- and lowercase alphabetic characters, would require an extension to the common English-centric Latin alphabet set, and, ideally, would provide character encodings for many other scripts. The Library of Congress holds material written in more than 350 languages, using more than 30 scripts, and many other large libraries have similar collections. Cataloging records contain titles, author names, and other information in the vernacular, transcribed as it appears on the items. In the 1960s, the Library of Congress was actually producing catalog cards (see Figure 1) in many different scripts with the help of the Government Printing Office and sharing them with libraries around the country.

The library community, however, has developed transliteration tables for all non-Latin scripts, and information usually appeared on the cards in both the vernacular and transliterated into Latin script. Transliteration of selected data into the Latin script supports sorting because interfiling of scripts was problematic then, as it still is. Besides the additional characters and diacritics used in some Latin script languages, diacritics are used extensively in transliteration. The community needed, at the least, an extended Latin character set with approximately 60 additional spacing and nonspacing characters that would enable librarians to encode additional characters used in non-English Latin script languages and many different character and diacritic combinations.

Interrelated files

Library material varies widely in presentation, making it difficult to provide consistent, predictable descriptive metadata. Descriptive cataloging of these disparate resources is thus carried out using standardized lists of names for authors and other related persons, places, organizations, and conferences. Subject access to library files uses controlled vocabularies and classification schemes to assist with topical collocation of bibliographic records in catalogs. The lists and thesauri help creators of the cataloging data to standardize the descriptions sufficiently to give end users an organized approach to finding material. Moreover, they are important for interinstitutional cooperation in cataloging. For effective catalog automation, these supporting lists and thesauri had to be captured in computer form and related to the bibliographic record creation process.

Interchange history

Cataloging has, for more than a century, taken place in a record-sharing environment. Before the 1970s, libraries required catalog cards. Because many of the items they collected were the same as those held by other libraries, the library community developed ingenious mechanisms for copying each other’s cataloging. “Copy” cataloging provided enormous savings to libraries even in the pre-automation era. The Library of Congress supplied major vehicles for this in its printed card distribution service, which began in 1901, and its printed union catalogs. The union catalogs included Library of Congress records and catalog records from other libraries for items not held by the Library of Congress, along with an indication of all the libraries that held an item. The card distribution service also provided cards for catalog records created by selected research libraries.

Using a Library of Congress card number, libraries from around the country, and indeed internationally, ordered sets of cards for the items they held for which cataloging was available. The cards were adapted on receipt to conform to local practices and then filed in their local card catalogs. This activity also created an impetus for national agreement on cataloging standards that facilitated the sharing of bibliographic data. Because the provision of Library of Congress records was central to copy cataloging nationally, a major focus for automation at the Library of Congress was the continuation of and improved support for this process. The card files from which the Library of Congress supplied this service had, by the 1960s, become enormous and the fulfillment process was labor-intensive. Computers held the potential to simultaneously provide efficiency in the process and build an electronic file for the future.

Budgetary constraints

Libraries and other information agencies do not have a history of large budgets. Budgets usually include funds for the purchase of collections (the key item), associated processing costs (acquisitions, cataloging, and so on), service costs (for patron assistance, circulation, and stack assistance), storage (shelving and building costs, for instance), and administration. Libraries have had to juggle the purchase, processing, and services budgets, trying to minimize the impact on users of any reductions, which many libraries have experienced again and again. So when the potential for harnessing computers to assist with library processing and enhance service was recognized, there was limited financial support available for development, experimentation, and testing in a real environment.

Development of the MARC format

Librarians had been interested from the late 1950s in applying computer technology to their operations. Several studies investigated the possibilities, including a general treatise in 1961–1963 that predicted some of the ways that computers might change library services.[7] An important 1962–1963 feasibility study of the Library of Congress recommended that the Library design automation systems for a number of its processes: cataloging, searching, indexing, and document retrieval.[8] This was a colossal task for that time, given the complexity of library activities and the embryonic nature of the computing environment.

Figure 1. A sample Library of Congress card containing non-Latin script and transliteration. (Courtesy of the author.)

The Library of Congress began the recommended general analysis of all processes, but fortunately it also undertook a more focused project in partnership with the general library community. This project was to develop a data format for the interchange of cataloging information in machine-readable form for multipurpose use. A computer expert named Henriette Avram was hired in 1964 by the Library of Congress to lead the project. She had had exceptional experience as a programmer with the National Security Agency in the 1950s, where cutting-edge computer technology was being developed and used. Avram was the essential ingredient in the development of library automation, for she had the background to understand the fundamental nature of a common data format as the springboard standard from which to build an automated environment, including its potential for coalescing a community. She also recognized the importance of working with the professional librarians in the field until she understood their point of view in order to make a useful (and acceptable) project and product. A rapid, but broadly consultative, development process was thus begun.[9]

After sponsoring a 1964 study on methods for “recording of Library of Congress bibliographic data in machine form”[10] and several exploratory meetings in 1965, an agency that assisted major foundations in channeling funding for library projects, the Council on Library Resources, became a major backer of the MARC format development work by funding a pilot project. The project, under Avram’s leadership, was to be carried out by the Library of Congress with a group of participating libraries.[11] The pilot project’s immediate goals were to develop a standard format, set up a record input system at the Library of Congress, and start a tape-based record distribution service from the Library. Avram stated that the expected use of MARC would “undoubtedly center around … producing traditional records such as catalog cards or book catalogs or in developing new on-line systems,”[12] and she also foresaw the format stimulating research in both offline and online areas including: “book catalog production, file organization, retrieval methods, and man-machine dialogues.”[13]

The schedule for the pilot was intense:

• January–April 1966: specification of a format for cataloging records.[14] This preliminary form became known as the MARC I format.
• March–November 1966: development of an input and distribution system and establishment of a weekly cataloging record tape distribution service for pilot participants.
• December 1966–June 1967: evaluation and decisions on next steps based on the pilot.

By June 1967, while the value of such a record distribution service was becoming apparent, the pilot systems set up in the participating libraries to use it were not working sufficiently well to yield conclusive results. Participants had had major problems assembling tools and expertise. The Library of Congress decided to extend the pilot for another year to refine the format and develop a production-level MARC record distribution service open to all libraries, not just participants in the original pilot project. During that year the MARC II format was finalized.[15] MARC II was the first complete and official version of the format and still forms the basis of the MARC 21 family of formats used today by thousands of libraries in the US and around the world for sharing bibliographic data.[16]

Interestingly, the major bibliographic record supply networks that developed over the next decade, several of which continue to exist successfully today offering expanded services, grew from the experience of several early pioneers in developing the format. Frederick Kilgour from Yale was one of the first to take action, leading the planning that began as early as 1968 for the institution that became the Online Computer Library Center (OCLC) in Ohio. Auto Graphics (AG) Canada in Toronto originated as part of the University of Toronto library automation program, becoming an independent network system in 1973. In the early 1970s, the Washington State Library launched the Western Library Network (WLN), now part of OCLC. The Research Libraries Group (RLG) grew out of a consortium of Harvard and Yale, to which Columbia University and New York Public Library were added. Creative individuals or groups at the pilot project institutions could see the potential of MARC records and set about developing organizations offering network services that, while similar to each other, had interesting and innovative differences. These networks were and still are based on MARC.

In addition to constant consultation with the pilot project participants, the Library of Congress also held many discussion sessions at library meetings, forums, and other venues, such as meetings of the American Library Association (ALA), to obtain the broadest possible input to the format and service development process.[17] Special interest from the British National Bibliography office in the United Kingdom, now a part of the British Library, added an element of international participation in finalizing MARC II in late 1967. Their interest alerted Avram and the pilot participants to the possibility of international data exchange that could be more timely and useful than the printed national bibliography book catalogs and card services that were then available.

International interest was high, leading to the success in the next few years of the effort to make the basic MARC format structure an international standard under the auspices of the International Organization for Standardization (ISO). The MARC format is considered an implementation of the generalized format structure that Avram and her team modeled. That structure, which was first approved as an American standard (ANSI Z39.2)[18] and quickly became an international standard (ISO 2709),[19] describes the framework for a record. Unfortunately, international standardization could not go beyond that point at that time, and nations tended over the next decade to develop their own national format versions of MARC, with the same ISO 2709 structure but having different tags and coded values, giving them names like CAN/MARC (Canada), UKMARC (United Kingdom), NorMARC (Norway), and AusMARC (Australia), to name a few. It was not yet really understood how international the exchange of data could become, so national borders seemed like the logical boundaries of a format for a batch-oriented environment based on magnetic-tape data exchange.

The original MARC II covered only records for books (monographs), and the first tapes in the MARC record distribution service launched by the Library of Congress in 1969 included only books in English. From 1969 through the late 1970s, work continued at the Library of Congress to expand the format to accommodate all forms of material and then to extend it to controlled vocabularies, thesauri of subjects, and lists of names. The MARC record distribution service also constantly broadened its scope to cover all languages, forms of material, and, finally, related files such as the Library of Congress Subject Headings and the Library of Congress Name Authority File.

These developments created an environment for the establishment and growth of the bibliographic service networks mentioned previously and for the entry of vendors into the library services arena. While some of these agencies did not at first see the value of the strict use of the MARC format for exporting bibliographic data, by 1980 MARC was recognized as the basic building block for a thriving and competitive library services industry, an industry that assists libraries to take advantage of the savings and expanded service associated with automation.

MARC innovations

The MARC format’s structure and tagging accommodated the requirements, already described, with a format design that was highly innovative for its time. The most critical challenges (data element and file length variability, large files, and update requirements) were addressed with a simple format structure that embedded a directory to the data content fields in the front of the fields.[20]

Briefly, the MARC format structure is composed of a short introductory fixed-length block, called the Leader; followed by a Directory giving the tag, length, and starting character position of each of the record’s data content fields; followed by the data content fields themselves (see the “MARC 21 Record Example” sidebar, next page). This is not unlike the structure of a typical book, with an introductory title page, then a table of contents that identifies and points to the book contents, followed by the content. One difference, however, was that reading the data content fields sequentially was not a requirement, so that while the table of contents would be ordered in the MARC record, the data fields could be in any order.
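The Leader, Directory, and data field layout can be sketched in a few lines of code. This is a simplified illustration rather than a conforming implementation: it assumes the MARC 21 conventions of a 24-character Leader with the record length in positions 0–4 and the base address of data in positions 12–16, and 12-character Directory entries (3-character tag, 4-digit length, 5-digit starting position). It leaves all other Leader positions blank, omits indicators and subfields, and counts lengths in characters. The function names and sample values are hypothetical.

```python
FIELD_END = "\x1e"   # end-of-field marker
RECORD_END = "\x1d"  # end-of-record marker
LEADER_LENGTH = 24
ENTRY_LENGTH = 12    # tag (3) + field length (4) + starting position (5)


def build_record(fields):
    """Serialize (tag, data) pairs as Leader + Directory + data fields."""
    directory = ""
    body = ""
    for tag, data in fields:
        field = data + FIELD_END
        # Starting positions are relative to the first data field.
        directory += f"{tag:>3}{len(field):04d}{len(body):05d}"
        body += field
    directory += FIELD_END
    base_address = LEADER_LENGTH + len(directory)
    record_length = base_address + len(body) + len(RECORD_END)
    # Simplified Leader: only the two numeric values are filled in.
    leader = f"{record_length:05d}" + " " * 7 + f"{base_address:05d}" + " " * 7
    return leader + directory + body + RECORD_END


def read_field(record, wanted_tag):
    """Find a field through the Directory instead of scanning the data."""
    base_address = int(record[12:17])
    directory = record[LEADER_LENGTH:base_address - 1]  # drop its FIELD_END
    for i in range(0, len(directory), ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        if tag == wanted_tag:
            begin = base_address + start
            return record[begin:begin + length - 1]  # strip FIELD_END
    return None


record = build_record([
    ("100", "Ceruzzi, Paul E."),
    ("245", "A history of modern computing"),
])
print(read_field(record, "245"))
```

Because each Directory entry carries its own length and starting position, a reader can jump straight to any field, which is the table-of-contents behavior the text describes.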

The actual data contained in the MARC record is formulated according to a set of rules followed by catalogers. The rules also indicate the data elements that are essential to include in a record. These rules are not a part of the MARC format, and, while some of them are shared and used internationally by multiple communities, there are a number of different cataloging conventions in existence. Because the format is intended to serve as the vehicle for encoding and transporting any cataloging data, data elements are defined as generically as possible so that the format will be useful for data derived from different cataloging rules.

As mentioned, the MARC format implements a generalized structure that has become an American and international standard. While aspects of the Leader, Directory, and field-ending and record-ending marks are specified in the standard, the specific data tags and some structural details of Directory entries and data fields are left to a MARC implementation. All data in the format are character encoded.


This format structure was very different from application data formats used in the mainstream of computer work at the time it was developed, but the structure efficiently accommodated variable-length data and record updates and effectively minimized file size. The following describes the format features and indicates how they helped satisfy the library data requirements.

MARC 21 Record Example

In this example, a MARC 21 record for Paul E. Ceruzzi’s book, A History of Modern Computing, the following substitutions have been made where needed for the control characters used in MARC records and for the space:

# = Space (ASCII hex 20)
$ = First character of each subfield tag (ASCII hex 1F)
@ = End of field marker (ASCII hex 1E)
% = End of record marker (ASCII hex 1D)

Figure A shows the example record as it might appear in an online catalog display.

Table A shows the fields in the example MARC 21 record.

Figure A. Online catalog display form of the example MARC 21 record.

Author: Ceruzzi, Paul E.
Title: A history of modern computing
Publication information: Cambridge, Mass.: MIT Press, 1998.
Pagination/size: x, 398 p. : ill. ; 24 cm.
Series: History of Computing
Note: Includes bibliographical references and index.
ISBN: 0262032554 (hardcover : alk. paper)
Subject: Computers--History
Subject: Electronic data processing--History
Call number: QA76.17.C47 1998

Table A. An example of a MARC 21 record, showing fields and data.

Field | Data | Comments
------ | ---- | --------
Leader | 00647cam##2200205#a#4500 | Coded and other data giving the length of the record, base address of data, record status, record characteristics, and a few important bibliographic data codes characterizing the bibliographic item being described and the conventions used to create the descriptive data.
001 | 3560569 | The record identification number used by the agency specified in field 003.
003 | DLC | Organization code for the Library of Congress.
005 | 19990615161503.7 | The date and time of the last update to this record.
008 | 980420s1998####maua#####b####001#0#eng## | Coded and other data indicating the date the record was created, date the work was published, language of the work, place of publication, and so on.
020 | ##$a0262032554 (hardcover : alk. paper) | Data
040 | ##$aDLC$cDLC$dDLC | Organization codes for the agency creating, keying, and updating the MARC record.
050 | 00$aQA76.17$b.C47 1998 | Data
100 | 1#$aCeruzzi, Paul E. | Data
245 | 12$aA history of modern computing /$cPaul E. Ceruzzi. | Data
260 | ##$aCambridge, Mass. :$bMIT Press,$c1998. | Data
300 | ##$ax, 398 p. :$bill. ;$c24 cm. | Data
440 | #0$aHistory of computing | Data
504 | ##$aIncludes bibliographical references and index. | Data
650 | #0$aComputers$xHistory. | Data
650 | #0$aElectronic data processing$xHistory. | Data

Leader

The first 24 bytes of a record give the record’s overall length, define the structural options chosen for the record, and give information on some basic characteristics of the data content. It was innovative in that it allowed the record to be partially self-defining. However, the MARC pilot project participants recommended that the MARC implementation adopt the same structural options for all records in the bibliographic application. These values are always the same in a MARC record, thus the flexibility of self-definition was not really used. The Leader carries the base address for the data fields, which is essential information for processing the record because it makes the Directory structure work and thus enables variable-length data fields. The content-related data found in the Leader is that which might affect the way an agency’s system treats the incoming record: the files to which a record might be sent or the programs and preprocessing routines through which the record should pass. The Leader has the familiar characteristic of fixed length with data elements defined according to position, but it is short and packed with essential information. Fortunately, however, the Leader allowed a few bytes for future definition, which have been valuable given the multidecade use of this format.
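To make the positional layout concrete, here is a minimal Python sketch that decodes the structural parts of the example Leader from the sidebar. The record length (first five positions) and base address come from the sidebar's description; the offsets used for the indicator count, subfield code length, and the two entry-map digits are taken from the published MARC 21 Leader layout rather than from this article, so treat them as assumptions. The names `Leader` and `parse_leader` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leader:
    record_length: int             # positions 0-4 (zero-based)
    indicator_count: int           # position 10 (assumed from MARC 21 layout)
    subfield_code_length: int      # position 11 (assumed from MARC 21 layout)
    base_address: int              # positions 12-16
    length_of_field_length: int    # position 20 (assumed from MARC 21 layout)
    length_of_start_position: int  # position 21 (assumed from MARC 21 layout)


def parse_leader(leader: str) -> Leader:
    """Decode the structural values of a 24-character MARC Leader."""
    if len(leader) != 24:
        raise ValueError("a MARC Leader is exactly 24 characters long")
    return Leader(
        record_length=int(leader[0:5]),
        indicator_count=int(leader[10]),
        subfield_code_length=int(leader[11]),
        base_address=int(leader[12:17]),
        length_of_field_length=int(leader[20]),
        length_of_start_position=int(leader[21]),
    )


# Leader of the example record, with '#' standing in for a space.
example = parse_leader("00647cam##2200205#a#4500")
print(example)
```

Because the structural values are the same in every MARC record, a reader could hard-code them; decoding them anyway is one way to honor the self-defining design.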

Directory

The Directory, which follows the Leader, is the first variable-length field. The data content fields in the record are found via a Directory entry, so the Directory contains an entry for each. Directory entries themselves are fixed in length for a record, but the Directory varies in length depending on the number of data content fields in the record. A Directory entry has five possible parts. The first, the tag that identifies the field, must be three characters. The lengths of the other four parts are set by values in the Leader. All MARC formats developed around the world have the same choice of 12 bytes for the Directory entries, with a three-character field tag, a four-character field length, and a five-character starting character position of the referenced field. The other two possible Directory parts are undefined and have zero length in MARC.

The Directory system gives the format a great deal of flexibility for efficiently carrying variable-length data without a need for padding or truncation. Whether the data for a field is only four characters, such as the title Babe, or 58 characters, such as the title The New Milton Cross Complete Stories of the Great Operas, the field length adjusts to the data length. Because of the implementation choices for the MARC record (only four character positions are allowed for each field length in the Directory, and the length is character encoded), there is a maximum length for each field of 9,999 bytes, but this is seldom a constraint. Bibliographic data rarely require great length for a single field, and most systems have field and record length limits that will be reached first. Because the Directory lets processing systems know how long a field is, systems can be prepared to handle the data, which is especially critical with longer fields because systems generally expect bibliographic data to be short.

Figure B. The example MARC 21 record.

Figure B shows how the example MARC 21 record would appear. The first 24 character positions in Figure B (underlined in the printed figure) are the record Leader. The first five positions contain the length of the record, 647 bytes. The digits in Leader positions 13-17 (italicized in the printed figure) indicate the base address of the data fields, 00205. That is the position, from the beginning of the record, of the first byte of the first data field after the Directory and the address from which all the variable data field addresses are calculated. This allows the data fields’ addresses to be independent of the number of entries, hence length, of the Directory.

The Directory entry for the field that contains the title, and the data field to which that entry points, are shown in bold in the printed figure. The Directory entry is composed of the field tag, 245, the length of the field, 0054, and the starting character position of the field relative to the base address of the data fields, 00172.

The title data field begins with two indicator values, 12. The first value indicates that the title of the work would be appropriate to index. The second indicates that there are two characters, A#, at the start of the title to ignore in indexing and sorting for displays. The data in the field are contained in two subfields. The first, identified by the subfield code $a, contains the title of the work. The second, $c, contains a transcription of the author’s name as found on the title page of the book.

00647cam##2200205#a#4500001000800000003000400008005001700012008004100029020004000070040001800110050002300128100002100151245005400172260004200226300003200268440002500300504005100325650002400376650004100400@3560569@DLC@19990615161503.7@980420s1998####maua#####b####001#0#eng##@##$a0262032554#(hardcover#:#alk.#paper)@##$aDLC$cDLC$dDLC@00$aQA76.17$b.C47#1998@1#$aCeruzzi,#Paul#E.@12$aA#history#of#modern#computing#/$cPaul#E.#Ceruzzi.@##$aCambridge,#Mass.#:$bMIT#Press,$c1998.@##$ax,#398#p.#:$bill.#;$c24#cm.@#0$aHistory#of#computing@##$aIncludes#bibliographical#references#and#index.@#0$aComputers$xHistory.@#0$aElectronic#data#processing$xHistory.@%

The starting character positions in the Directory entries are the keys to finding the fields. They are calculated from the first byte in the record after the Directory, called the base address of the data content. The Leader indicates which position from the first byte of the record is the base address.
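The address calculation just described can be sketched in a few lines of Python. This is an illustration rather than a production parser: it works on the printable form of the sidebar's example record (with `#`, `$`, `@`, and `%` standing in for the space and control characters), it assumes the 12-character Directory entry described above, and it counts characters rather than bytes, which is only equivalent for single-byte data. The names `RECORD` and `read_fields` are hypothetical.

```python
FIELD_END = "@"   # sidebar stand-in for the end-of-field marker (ASCII hex 1E)
ENTRY_LEN = 12    # 3-character tag + 4-character length + 5-character start

# The sidebar's example record, split into pieces here only for readability.
RECORD = (
    "00647cam##2200205#a#4500"
    "001000800000003000400008005001700012008004100029"
    "020004000070040001800110050002300128100002100151"
    "245005400172260004200226300003200268440002500300"
    "504005100325650002400376650004100400@"
    "3560569@DLC@19990615161503.7@"
    "980420s1998####maua#####b####001#0#eng##@"
    "##$a0262032554#(hardcover#:#alk.#paper)@"
    "##$aDLC$cDLC$dDLC@00$aQA76.17$b.C47#1998@1#$aCeruzzi,#Paul#E.@"
    "12$aA#history#of#modern#computing#/$cPaul#E.#Ceruzzi.@"
    "##$aCambridge,#Mass.#:$bMIT#Press,$c1998.@"
    "##$ax,#398#p.#:$bill.#;$c24#cm.@#0$aHistory#of#computing@"
    "##$aIncludes#bibliographical#references#and#index.@"
    "#0$aComputers$xHistory.@#0$aElectronic#data#processing$xHistory.@%"
)


def read_fields(record):
    """Return (tag, data) pairs by following each Directory entry."""
    base = int(record[12:17])        # base address of data, read from the Leader
    directory = record[24:base - 1]  # the Directory closes with one end-of-field marker
    fields = []
    for i in range(0, len(directory), ENTRY_LEN):
        entry = directory[i:i + ENTRY_LEN]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = record[base + start:base + start + length]
        fields.append((tag, data[:-1]))  # drop the trailing end-of-field marker
    return fields


# Pull out only the fields a given process needs, here author and title.
for tag, data in read_fields(RECORD):
    if tag in {"100", "245"}:
        print(tag, data)
```

Note that the loop never scans the data area for markers: every field is reached directly from its Directory entry, which is what makes selective extraction cheap.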

The Directory also enables easy record updating. While the entries in the record Directory may be in a convenient order, the data fields can be in any order, even random order. Thus an updated field can be simply added to the end of a record. The previous form of the field can even be left in the record with no Directory entry pointing to it, if necessary. Rearranging entries in the Directory does not affect the positions of the data content fields to which they point. This gives the format flexibility when constructing MARC records for communication to other systems.
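The independence of Directory order from data order can be illustrated with a small record builder. The sketch below is an assumption-laden illustration, not a conforming MARC writer: `build_record` and `LEADER_TEMPLATE` are hypothetical names, the template's coded values are placeholders, the printable stand-ins `@` and `%` are used for the control characters, and lengths are counted in characters (equal to bytes only for single-byte data).

```python
FIELD_END = "@"    # stand-in for the end-of-field marker (ASCII hex 1E)
RECORD_END = "%"   # stand-in for the end-of-record marker (ASCII hex 1D)
LEADER_TEMPLATE = "00000nam##2200000#a#4500"  # hypothetical; lengths are filled in below


def build_record(fields):
    """Serialize (tag, data) pairs, storing the data in the order given."""
    entries, bodies, offset = [], [], 0
    for tag, data in fields:
        body = data + FIELD_END
        if len(body) > 9999:  # a four-digit length cannot express more
            raise ValueError(f"field {tag} exceeds 9,999 characters")
        entries.append((tag, len(body), offset))
        bodies.append(body)
        offset += len(body)
    # Put the Directory in tag order; the stored data fields do not move.
    entries.sort(key=lambda entry: entry[0])
    directory = "".join(f"{t}{n:04d}{s:05d}" for t, n, s in entries) + FIELD_END
    base = 24 + len(directory)
    total = base + offset + len(RECORD_END)
    leader = f"{total:05d}{LEADER_TEMPLATE[5:12]}{base:05d}{LEADER_TEMPLATE[17:]}"
    return leader + directory + "".join(bodies) + RECORD_END


# The title data is stored first, yet the Directory lists field 001 first.
record = build_record([("245", "10$aBabe"), ("001", "12345")])
print(record)
```

In the resulting record the 245 entry points to offset 00000 and the 001 entry to offset 00009, so a reader that follows the Directory sees the fields in tag order regardless of how they were stored.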

The Directory is an efficient tool for extracting subsets or manipulating the bibliographic data. When a record is being incorporated into a system, different fields may be needed for the different processes: display, indexing, conversion to internal format, and feed for peripheral systems such as circulation systems, and, in the 1960s, card and catalog printing systems. The Directory serves as an efficient tool for selecting the information that might be needed for different processing objectives.

Data content fields

As has been described, bibliographic data are a collection of different pieces of information that describe various aspects of the item being cataloged. The many related but distinct types of data included in a record are accommodated by the field system, and each field is identified in the Directory through a specific field tag. Fields thus support access to the different data elements formulated by catalogers, making it possible to select individual elements or subsets for indexing, for transfer to various subsystems, or for retrieval and subsequent sorting. The data fields themselves are parsed into subelements of two types: indicators and subfields.

Indicators

MARC records reserve two bytes at the beginning of each data field to specify additional information about the field. Indicator definitions are field-dependent, and for some fields with no extra requirements they are undefined (carried as blanks), but where needed they assist in further characterizing the data in the field, in some cases assisting with indexing and sorting the data.

Subfields

Format fields can be further divided into subfields that separate and identify data subelements. The subfield tags depend on the field type for their definition. This device allows identification of data components at a relatively high granularity and is used effectively for precise information retrieval and sorting.

The fields, indicators, and subfield structures enable the format to carry and identify highly parsed, structured data; show relationships among data; and even assist with complex sorting. Fields and subfields easily accommodate the many different data elements needed to describe special aspects of diverse resources.
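As an illustration of how indicators and subfields support sorting, the following Python sketch splits a data field into its two indicators and its subfields, then builds a sort key for the title field of the sidebar example. It assumes, as the sidebar explains for field 245, that the second indicator gives the number of leading characters to ignore; the function names are hypothetical, `$` stands in for the subfield delimiter, and ordinary spaces are used instead of `#`.

```python
def parse_data_field(field, delimiter="$"):
    """Split a data field into its two indicators and (code, value) subfields."""
    indicators, rest = field[:2], field[2:]
    subfields = [(chunk[0], chunk[1:]) for chunk in rest.split(delimiter) if chunk]
    return indicators, subfields


def title_sort_key(field):
    """Build a sort key for a title field, skipping the characters that the
    second indicator marks as non-filing (an assumption based on the sidebar)."""
    indicators, subfields = parse_data_field(field)
    title = dict(subfields)["a"]
    skip = int(indicators[1]) if indicators[1].isdigit() else 0
    return title[skip:].lower()


title_field = "12$aA history of modern computing /$cPaul E. Ceruzzi."
indicators, subfields = parse_data_field(title_field)
print(indicators)                   # the two indicator characters
print(subfields)                    # subfield code and value pairs
print(title_sort_key(title_field))  # the title without its leading article
```

The leading article is skipped using data carried in the record itself, so no language-specific stop-word list is needed at sort time.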

Character set

A major component of the MARC format was the development of an extension set of Latin-based characters and diacritics. As mentioned, while libraries had enjoyed bibliographic data in vernacular scripts for many years, they also employed transliteration into the Latin alphabet for key elements on every vernacular catalog card to support integrated filing, sorting, and retrieval. The decision in the 1960s was to focus on an extended Latin set that enabled full transcription of all Latin script languages and full transliteration of non-Latin script languages, but not to attempt yet to develop machine-readable non-Latin scripts. A thorough analysis was carried out by the Library of Congress project staff to identify all special characters and diacritics required for the cataloging of the Library of Congress’s and several of the pilot participants’ multilingual collections.21 The project team developed a set of 27 additional spacing characters (such as the thorn used in the Icelandic language or the ae digraph used in several Scandinavian languages) and 29 diacritics to be used over or under alphabetic characters that fully met the character set requirement. This extension to ASCII later became an American standard.22

Because diacritics are used with many base alphabetic characters, and sometimes two or even three are needed above and below a single alphabetic character, especially for Southeast Asian languages using Latin script and transliterations of tonal and Indic languages, the decision was made by pilot project participants to “float” all diacritics rather than expand the character set to incorporate all possible combinations. Coding all combinations would have created multiple sets and hundreds of characters that would have been difficult to handle in the technical environment of the period. Thus, in the MARC set, a character with one diacritic was represented by two characters, with the diacritic encoded first and the alphabetic character following. Printing devices were expected to position the diacritic in the proper place over or under the alphabetic character, although that also proved to be a challenge, especially in the early years. IBM worked with the Library on a print train for the extended set that could also be used by other libraries. The pilot participants found that moving from 94 conventional graphic characters (ASCII) to 150 (with the new characters) was itself a challenge.

Using floating diacritics was innovative. It enabled the introduction of a large number of characters with a relatively small set and assisted data normalization. Computer routines commonly normalize data strings for certain sorting, matching, and indexing routines in library applications so that these operations are carried out on the base character without the diacritics. With the floating diacritics, normalization could be achieved simply by dropping the diacritics instead of employing character conversion.
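The normalization-by-dropping idea can be shown in a few lines of Python. This sketch assumes that the floating diacritics occupy the byte range hex E0 through FE, as in the extended Latin set's commonly published code table; that range, the two diacritic byte values in the example, and the function name are assumptions for illustration and are not taken from this article.

```python
# Assumed range of floating (non-spacing) diacritics in the extended Latin set.
FIRST_DIACRITIC = 0xE0
LAST_DIACRITIC = 0xFE


def strip_floating_diacritics(data: bytes) -> bytes:
    """Normalize a string by dropping every floating diacritic byte.

    Each diacritic is its own character encoded before the base letter,
    so no lookup table of precomposed characters is needed.
    """
    return bytes(b for b in data if not FIRST_DIACRITIC <= b <= LAST_DIACRITIC)


# A name with two diacritics, each encoded before its base letter
# (the specific byte values are illustrative assumptions).
encoded = b"Dvo\xe9r\xe2ak"
print(strip_floating_diacritics(encoded))
```

With precomposed characters, the same normalization would require a conversion table mapping every accented letter to its base letter.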

Development of an environment

The MARC format, released in 1968, was to be both a catalyst and linchpin for the development of a broad and diverse automated library environment. The original project team could hardly have imagined some of the technical advances of the 1970s, 1980s, and 1990s (first terminals, followed by personal computers, networks, and the Internet) that would be adopted by libraries to improve and transform the way they function. The MARC format proved itself resilient for carrying library data through constantly changing technology.

The 1970s

In the early 1970s, following the original objectives for the format’s use, the Library of Congress and others immediately began driving catalog card production from the new computer file. Services were offered to patrons that involved computerized batch record retrieval systems to mine the rich MARC record for specialized user needs. These same retriever programs were also applied to improve backroom library processing functions. Computer-generated book and microform catalogs were produced, and initial development of local online catalogs began.

The essential factor that propelled the format’s acceptance was the Library of Congress’s use of it to encode and distribute bibliographic records. By 1975, roughly 60 percent of the records produced annually by the Library of Congress were available as MARC records, with the number increasing rapidly as format requirements for new forms of material were implemented. By the mid-1970s, the format was a suite of coordinated bibliographic formats not just for books but also for serials, maps, films, music, sound recordings, and manuscripts. These formats were alike at the core but different in details specific to each medium.

For an effective bibliographic system it is essential to have companion files for the standardized forms of names and for subject thesauri. These standard name and term files provide catalog users with cross-references to the forms of names and terms that have been adopted as the preferred forms for use in bibliographic records. The importance, for cataloger productivity, of access to an automated file of name and subject “authority” records led to the development and refinement of a MARC Authority record format in the late 1970s, after which major file-building programs were launched. This format used the ISO 2709 structure, and the fields were borrowed or coordinated with related ones in the bibliographic format where logical.

One especially important development in the early 1970s was the setting up of format maintenance mechanisms, which persist to this day. The format was adopted by America’s national libraries, the National Agricultural Library and National Library of Medicine in addition to the Library of Congress; library associations, such as the American Library Association and the Music Library Association; and bibliographic networks such as OCLC, WLN, and RLG. A maintenance routine was worked out with the following characteristics: the Library of Congress took maintenance agency responsibility; all changes were documented with respect to background, need, options, and impact; open reviews were held in conjunction with open meetings twice a year; and agreed-upon changes were made to the format documentation, which was maintained and distributed by the Library of Congress.

This maintenance process has matured with the technology, and while today it follows the same general outline, it is more open and global through use of the Internet for posting change proposals and use of email and listservs for comment and discussion. Participation extending beyond American organizations became necessary since the format is now known to be used in more than 40 countries. In recent years, the Library of Congress maintenance agency responsibility has become a partnership with the National Library of Canada, and it will continue to evolve. Major stakeholders, such as vendors of large integrated systems that use MARC and developers of micro-based systems dependent on MARC, are key participants in the review and discussion process. The careful analysis of changes, open international discussion, and active input of a variety of users (from the highly technical to those with strong bibliographic and/or specialized expertise) have been important ingredients in the format’s continuing vital role in automated library services.

A library infrastructure development that sprang from the establishment of the MARC format and the availability of Library of Congress data was the creation of institutions offering access to records and a host of other bibliographic services for libraries. These new union catalogs were aptly called “bibliographic utilities” at the time, because they did not hold collections, like a library, but only offered bibliographic-record-related services.

Several of these utilities began development in the 1970s: OCLC, which was originally named the Ohio College Library Center, but as it became national in the 1980s and global in 1999, changed its name to Online Computer Library Center; RLG, which was formed on the East Coast in 1974 and then merged in the late 1970s with a system begun at Stanford University and moved to California; WLN, formed as the Washington Library Network in Washington state, which became the Western Library Network as it expanded and was finally absorbed into OCLC in the late 1990s; and UTLAS, the University of Toronto Library Automation System that is now called AG Canada. These networks proved important to libraries, giving them the opportunity to view and copy not only the Library of Congress cataloging records but also cataloging records created and input by other libraries directly into the network systems. Initially, in the 1970s, these networks primarily printed cards and shipped them to participating libraries, as only a few organizations were able to use electronic MARC records locally. As a (planned) by-product, the networks accumulated large union catalogs, with holdings attached to MARC bibliographic records, which they used to support a growing national, and eventually global, interlibrary loan program.23

These utilities were also early adopters and adapters of devices such as terminals based on the new CRT technology. OCLC developed a special CRT terminal in the mid-1970s that would work in the limited networked environment of that time, and it supported the extended MARC character set, 150 graphic characters, in an online mode.24

In summary, the MARC format’s role in these developments was like that of a currency. With a common standard and a critical mass of bibliographic records, libraries experimented with producing single-focus automated systems that took the records and used them, for example, for card printing, circulation control, book catalog printing, acquisition selection procedures, book preparation (such as binding labels and book pockets), or current purchase awareness lists. Bibliographic utilities were developed that facilitated record sharing and began to build union catalogs for interlibrary loan. The Library of Congress achieved a distribution service for all of its cataloging records and its related subject and name authority files in MARC format while also developing an internal online catalog. By the late 1970s, several libraries around the country had online catalogs under development, presaging the shift that would take place in the next decade.

The 1980s

During the 1980s, the local system development that started in the previous decade led to widespread availability of vendor software for library processing and online catalogs. Several vendor systems had their start as local systems for a university library. Important examples are NOTIS (Northwestern Online Total Integrated System) developed by Northwestern University and VTLS (Virginia Tech Library System), which originated as an online catalog development at the Virginia Polytechnic Institute and State University. While teams of university library and computing staff provided the initial expertise, the universities generally found it desirable to limit their vendor role. These systems were spun off as separate enterprises when other institutions began to want to purchase them.

The success of a few vendors of bibliographic systems attracted more development, until the marketplace offered large and small systems with a variety of special features, as well as a broad price range. The MARC communications format provided a data-rich record that system engineers used for innovative applications, retrieval programs, and user interfaces, but it did not dictate internal system design. MARC data within a system is usually carried in an internal format or configuration that is efficient for the system hardware or platform, but which is also highly compatible with the communications format. Inevitable transitions from a local system to a vendor or from one vendor system to another would take place over the next decades in libraries, but this migration has been eased by the standard data format all systems could export and import.

As more libraries obtained online catalogs, interest turned to retrospective conversion of bibliographic records. Libraries wanted to consolidate all holdings in their online catalogs and retire their card catalogs. Thus the many retrospective conversion projects in the 1980s resulted in an explosion of MARC records in union catalogs like OCLC and RLG. Even the Library of Congress undertook the conversion of its retrospective file of more than five million records, a project that had been explored by the Avram team as early as 1968.25 While early conversion of the Library of Congress’s mammoth catalog could have been valuable to later conversion projects of the nation’s libraries, the bibliographic utilities contributed greatly to reducing the cost of later conversions by making the records converted by one library available to other libraries.

The 1980s saw the exploration of new territory for the format itself, targeted toward integrating holdings and other data into library catalogs, and toward patron service operations. Thus the MARC Holdings, MARC Classification, and MARC Community Information formats were developed by special-interest groups and put through a review process to ensure compatibility with the bibliographic format. While a MARC bibliographic record indicates that a library holds an item by the record’s existence in a catalog, the MARC holdings record was designed to accommodate recording and display of holding details: exactly how many copies, in what physical formats, and, for serials, exactly which volumes and issues. The holdings format also contains sufficient detail to support serial check-in systems, including prediction of expected issues and automatic generation of claims for serial issues that are overdue from the publisher. The MARC holdings format is a close companion to the bibliographic format, and in fact was developed according to a model that allows the holdings data to be contained in separate records or embedded in bibliographic records. The classification format supports the online transmission and use of common classification schedule files, including the Library of Congress classification and the Dewey decimal classification. The most unusual format was that for community information data, which allows a library to integrate information about public events and community services into a bibliographic database.

A significant format maintenance and update initiative took place in the late 1980s, when the format had just reached 20 years of use. The bibliographic format had become a suite of coordinated formats for different forms of material. The trend at the time was to maintain commonality of the core data elements, such as names and titles, but to restrict use of specialized data elements to specific forms of material. The forms were primarily related to type of intellectual expression of the information (text, cartographic, music, visual), but other aspects of the material were also singled out: serial nature of the material, whether the material was considered appropriate for treatment in an archival manner, and the material’s electronic form.

After extensive cost and benefit studies and close examination of all changes for upward compatibility, the bibliographic format was fully integrated, eliminating earlier designations of field validity by type of material. Henceforth, not only the core data elements, but all fields formerly defined for specific subformats could be used in a record for any item, regardless of its type. This new freedom and flexibility enabled the format to be more readily responsive to changes in resource media and technology, especially for accommodating the description of modern multimedia material. At the same time, format simplification was attempted, although the main counter to any simplification was and is the persistent need to parse and structure bibliographic data to produce complex retrieval options for systems and, ultimately, end users. Libraries serve a varied clientele, and while a majority of their users may not have complex needs, librarians provide services to both the generalists and specialists.

Also in the 1980s, MARC broke out of the ASCII and extended Latin character sets when one of the major networks, RLG, developed standard character sets for Arabic, Hebrew, Cyrillic, Chinese, Japanese, and Korean (with more than 16,000 characters in the Chinese, Japanese, and Korean set) and implemented a networked non-Latin cataloging module. These sets either followed recently established ISO standards or were immediately taken through a standardization process. This was followed by the development of non-Latin capability in several vendor systems, using either the standard or local character sets to serve broader markets. This work predated Unicode by more than 10 years and, in turn, contributed to the eventual development and refinement of the global Unicode set.26

The past decade

During the 1990s, the MARC format has evolved in reaction to the exciting possibilities of Internet technology. Libraries had collected and cataloged, via MARC records, computer files and tangible electronic resources for two decades, but the successive and extensive development of online electronic resources, starting with gopher technology, followed by the open-access Web, and now subscription-based electronic publications, has required both cataloging and MARC format adjustments. The format has addressed several major issues, the most prominent being the need to provide linking to actual resources from the bibliographic record. In 1993, even before the uniform resource locator (URL) addressing scheme was completely developed, a field in the MARC format was established to contain pathway components for accessing digital resources from a MARC record. That field is still adjusted at least annually as the Internet and Web environment matures and becomes more standards based, and a URL/URN linking subfield has also been added to other relevant fields. The modern MARC-based catalog is thus able to retrieve material comprehensively, integrating access to descriptions for both tangible and intangible resources and, for the electronic material, to the resources themselves.

In the 1990s, the format developers also tackled the question of using Unicode rather than the original ASCII, extended Latin, and other script character encodings, since the library community already had an interest and considerable investment in non-Latin scripts data. With mapping assistance from the Unicode Consortium, all the MARC character sets now have defined mappings to Unicode.27 A special MARC committee also established rules and conventions for using Unicode in a MARC exchange record. Not unexpectedly, a few library system vendors have been early implementers of Unicode, even before the tools and methodologies for its use had been worked out (which gave them many learning experiences). As a result, however, several fully Unicode-compliant vendor library systems are starting to be deployed.

An additional development in the 1990s has been the attempt to separate the MARC data elements from the MARC structure (ISO 2709) to enable representations of the highly developed MARC data elements in Standard Generalized Markup Language (SGML) or Extensible Markup Language (XML) structures.28 An SGML document type definition (DTD) with format transformation scripts has been available on the Library of Congress MARC Web site since 1996, joined by an XML DTD in 2000. Others are also experimenting with XML versions of MARC data. These are explorations; other views of the MARC data in XML, or the markup language of the future, will be part of the format’s ongoing maintenance.
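The separation of data elements from structure can be sketched in a few lines. The example below packs two fields into a simplified ISO 2709 record (24-character leader, 12-character directory entries, and the usual field, subfield, and record delimiters), reads them back, and re-expresses the same data elements as XML. It assumes ASCII-only data so that character counts equal byte counts. The function names and the XML element names are my own choices for illustration; they are not taken from the DTDs mentioned above.

```python
import xml.etree.ElementTree as ET

FIELD_END = "\x1e"   # field terminator
RECORD_END = "\x1d"  # record terminator
SUBFIELD = "\x1f"    # subfield delimiter


def build_record(fields):
    """Pack (tag, data) pairs into a simplified ISO 2709 record (ASCII only)."""
    directory, body = "", ""
    for tag, data in fields:
        data += FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(data):04d}{len(body):05d}"
        body += data
    base = 24 + len(directory) + 1          # leader + directory + terminator
    total = base + len(body) + 1            # plus the record terminator
    leader = f"{total:05d}nam  22{base:05d}   4500"
    return leader + directory + FIELD_END + body + RECORD_END


def parse_record(raw):
    """Recover the leader and (tag, data) pairs using the directory."""
    leader = raw[:24]
    base = int(leader[12:17])               # base address of data
    directory = raw[24:base - 1]            # exclude the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop terminator
        fields.append((tag, data))
    return leader, fields


def to_xml(leader, fields):
    """Express the same data elements in an illustrative XML structure."""
    rec = ET.Element("record")
    ET.SubElement(rec, "leader").text = leader
    for tag, data in fields:
        if tag < "010":                     # control fields: no indicators
            ET.SubElement(rec, "controlfield", tag=tag).text = data
            continue
        head, *subs = data.split(SUBFIELD)  # head holds the two indicators
        df = ET.SubElement(rec, "datafield", tag=tag,
                           ind1=head[0], ind2=head[1])
        for sub in subs:
            ET.SubElement(df, "subfield", code=sub[0]).text = sub[1:]
    return rec


raw = build_record([
    ("001", "12345"),
    ("245", "10" + SUBFIELD + "aMARC: keystone for library automation"),
])
leader, fields = parse_record(raw)
xml_text = ET.tostring(to_xml(leader, fields), encoding="unicode")
print(xml_text)
```

The tags, indicators, and subfield codes pass through unchanged; only the carrier changes, which is the sense in which the data elements are independent of the ISO 2709 structure.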

Because of its widespread use for such an extended period, MARC has become both a communications format and a lingua franca for librarians, especially staff responsible for inputting or interpreting the content of records and for building systems and helper applications that use the data in MARC records. The MARC tags are familiar to librarians across different institutions, who talk among themselves in field tags instead of names. This language by-product of the standard format enables training to be transferable from job to job and system to system.

During the past 30 years, three main varieties of MARC formats developed: those similar to MARC 21, maintained by the Library of Congress; those similar to UKMARC, promulgated by the British Library; and those closer to UNIMARC, issued by the International Federation of Library Associations and Institutions. All three models have the same structure, ISO 2709, with the same structural options used, but they have differences in tagging at the field and subfield levels. MARC and UKMARC had common roots, so many field tags match but subfield structures may differ, whereas UNIMARC differs in subfield structures and also has altogether different tagging. In the 1990s, the strong availability of systems that fundamentally support MARC 21, and the MARC 21 orientation of several of the large record repositories such as OCLC, have been an incentive for countries to rethink or realign their formats with MARC 21. This globalization of the original MARC format has moved the international MARC community toward a new level of consistency through standardization of content designation that was not possible in the early years. The analysis and decision of South Africa in the mid-1990s to move from SAMARC (modeled on UNIMARC) to MARC 21, after 20 years of use of SAMARC, was a catalyst for others. The complete alignment of the MARC format used in the US with CAN/MARC from Canada in 1997 has been beneficial to North American libraries that already cooperated in many ways. The decision in 2001 of the British Library to cease maintenance of UKMARC in favor of MARC 21 is also having a major impact on global MARC standardization.29 These examples illustrate the trend.

In 1999, the MARC format family had a name change that better reflects its current status. In the early years, the original name, MARC II, eventually became just MARC. However, in the 1970s, because of the focus on the use of the format to distribute Library of Congress cataloging data, it was often called LCMARC. In the 1980s, it took the name USMARC in line with national format names in other countries, to clarify just which MARC it was, but in the 1990s two situations mandated a new name for the future. The format was obviously being used around the world, and a special relationship had been established with Canada when harmonization of the already similar CAN/MARC format with USMARC took place. The new century offered a suitable solution, and in 1999 the format was renamed MARC 21.30

Summing up

The MARC communications format was developed in the late 1960s and has been expanded, updated, and carefully maintained since that time. It has proved itself to be a foundation that enabled libraries to catch each new wave of computer technology and use it to help meet their goals and needs. The format was innovative and forward looking when it was introduced and has helped change thinking in the library community about data and automation. MARC has itself moved and changed throughout its history, which contributed to its ability to support extensive library system development and to have such an extended life.

Three factors stand out to explain why MARC became the keystone rather than just another experimental format. The first was, of course, its innovative design, but many good products are ignored by those who could benefit from them. The second and third are more practical factors: the collaborative way in which the format was developed, with broad library community involvement and librarians working hand-in-hand with systems staff; and the Library of Congress’s immediate development of systems to make its large volume of cataloging records available in MARC. The collaborative approach encouraged both imaginative local use of the records and development of new ideas for data exchange. The collaboration of librarians and systems staff encouraged librarians to accept change and technologists to produce acceptable systems. Making Library of Congress cataloging records available in MARC took advantage of more than 60 years of the record distribution service from the library, producing another avenue for obtaining the library’s high-quality, consistent records. Then, following on the heels of these early initiatives, came the development of the first shared cataloging utility, OCLC, which gave MARC high visibility and immediate utility to a broad spectrum of libraries.

Today, standard MARC data support simple and complex retrieval by end users and provide the basis for cost-saving record sharing. MARC has been the underpinning for the proliferation of interchangeable, modular systems that enable libraries to automate in an integrated manner, and it has served as the foundation on which a rich array of tools that help libraries do their work have been built. MARC has even become a language that thousands of library professionals use to input and discuss bibliographic control issues. These uses have been built over the years as systems, tools, training, and globalization developed around the format.

While the MARC format is simply a communications format, it turned out to be the key standard for the development of the vast infrastructure that supports libraries today, enabling them to provide users with retrieval and other services unheard of 30 years ago. Libraries have the responsibility to organize and provide consistent and integrated access to all of their resources (ancient manuscripts as well as today’s electronic documents), and MARC’s farsighted design, stability, and prompt, skillful maintenance have enabled libraries to meet these fundamental objectives.

References and notes

1. While there were a few computers such as the Atanasoff-Berry, Bell Labs Model I, and the Mark 1 machines in the late 1930s and early 1940s, the ENIAC in 1945–1946 is considered by many as the springboard for modern computer development.

2. In this article, I draw on my own lengthy experience with the more recent developments in MARC and my association over a long period at the Library of Congress with H.D. Avram, L.J. Rather, and others who led earlier developments.

3. An excellent source, used here, for computer development from 1945 to the late 1990s is P.E. Ceruzzi, A History of Modern Computing, MIT Press, Cambridge, Mass., 1998. Ceruzzi’s volume of pre-1945 “computing” history is also recommended: P.E. Ceruzzi, Reckoners: The Prehistory of the Digital Computer, from Relay to the Stored Program, 1935–1945, Greenwood Press, Westport, Conn., 1983.

4. Extended Binary Coded Decimal Interchange Code (EBCDIC) is an 8-bit Latin character set that IBM introduced in 1964 with its IBM System 360 series of computers.

5. American Standard Code for Information Interchange (ASCII) is a 7-bit set with 94 graphic characters including upper- and lowercase Latin alphabet characters, numbers, punctuation signs, and a few symbols. ASCII was first approved as an American National Standard in 1968.

6. Another concern in 1965 was the deteriorating condition of card catalogs. A study of the New York Public Library catalog in 1963–1965 indicated that, of the 8,000,000 cards in that venerable catalog, 2,296,000 needed replacement, which would cost an estimated $2 million. Converting the data to machine-readable form was suggested, and the question was asked: “If the new catalog were automated, should output be in the form of cards or books or should the data be stored in such a way that they could be called up and displayed graphically on a cathode ray tube?” New York Public Library, Research Libraries, Library Catalogs: Their Preservation and Maintenance by Photographic and Automated Techniques, A Study by the Research Libraries of the New York Public Library, MIT Press, Cambridge, Mass., 1968, p. vi.

7. J.C.R. Licklider, Libraries of the Future, MIT Press, Cambridge, Mass., 1965. This publication is based on a study sponsored by the Council on Library Resources and conducted by Bolt, Beranek, and Newman between Nov. 1961 and Nov. 1963.

8. G.W. King et al., Automation and the Library of Congress, Library of Congress, Washington, D.C., 1963.

9. An article by H.D. Avram titled “Machine-Readable Cataloging (MARC) Program” that was published in the Encyclopedia of Library and Information Science, Vol. 17, Marcel Dekker, Washington, D.C. (and later published in a revised form as a monograph MARC: Its History and Implications, Library of Congress, Washington, D.C., 1975), contains an excellent detailed description of the format development process from the 1960s through the early 1970s. It also contains an extensive bibliography that illustrates the immediate excitement and explorations inspired by the availability of MARC records.

10. L.F. Buckland, The Recording of Library of Congress Bibliographical Data in Machine Form; A Report Prepared for the Council on Library Resources Inc., revised, Council on Library Resources, Washington, D.C., 1965.

11. The participants in the pilot project, selected from volunteers, represented a diverse group of libraries: Argonne Nat’l Laboratory, California State Library, Cornell Univ., Georgia Inst. of Technology, Harvard Univ., Illinois State Library, Indiana Univ., Montgomery County Public Schools (Md.), Nassau County Library System (N.Y.), Nat’l Agricultural Library, Redstone Scientific Information Center, Rice Univ., State Univ. of New York Biomedical Comm. Network, Univ. of California Inst. of Library Research (Los Angeles), Univ. of Chicago, Univ. of Florida, Univ. of Missouri, Univ. of Toronto, Washington State Library, and Yale Univ.

12. H.D. Avram, “Implications of Project MARC,” Library Automation: A State of the Art Review, Am. Library Assoc. (ALA), Chicago, 1969, p. 83. This publication contains papers presented at the Preconference Institute on Library Automation held at San Francisco, California, 22–24 June 1967.

13. H.D. Avram, The MARC Pilot Project, Final Report, Library of Congress, Washington, D.C., 1968, p. 8.

14. H.D. Avram, R.S. Freitag, and K.D. Guiles, A Proposed Format for a Standardized Machine-Readable Catalog Record; A Preliminary Draft, Library of Congress, Washington, D.C., June 1965.

15. H.D. Avram, J.F. Knapp, and L.J. Rather, The MARC II Format: A Communications Format for Bibliographic Data, Library of Congress, Washington, D.C., Jan. 1968.

16. In this article, the term MARC refers to the continuously updated format that was originally called MARC II and is now called MARC 21. See also an explanatory paragraph about the changing name of the format in “The last decade” subsection.

17. Examples of such collaborations include the ALA Machine-Readable Catalog Format Committee that reviewed and approved MARC II prior to its release, and the ALA Standard Library Typewriter Keyboard Committee that helped develop the layout for the record input keyboard.

18. Am. Nat’l Standards Inst., American National Standard Format for Bibliographic Information Interchange on Magnetic Tape, New York, 1971 (ANSI Z39.2-1971). The standard has been reviewed and updated over the years and is now available as Information Interchange Format (ANSI/NISO Z39.2-1994).

19. Int’l Organization for Standardization, Documentation - Format for Bibliographic Data Interchange on Magnetic Tape (ISO 2709:1973). The standard has been reviewed and updated over the years and is now available as Format for Information Interchange (ISO 2709:1996).

20. Interesting “at the time” discussions of the structure can be found in J.F. Knapp, “Design Considerations for the MARC Magnetic Tape Formats,” Library Resources & Technical Services, vol. 12, no. 3, pp. 275-284, and H.D. Avram, J.F. Knapp, and L.J. Rather, The MARC II Format: A Communications Format for Bibliographic Data, Library of Congress, Washington, D.C., Jan. 1968.

21. It is remarkable that while the broader computer community had recently moved to a full set of Latin alphabetic characters, the MARC project was putting together the tools to implement a 56-character extension. The following article from 1968, the same year that ASCII was first standardized, describes the development process for the extended set and matches languages to characters: L.J. Rather, “Special Characters and Diacritical Marks Used in Roman Alphabets,” Library Resources & Technical Services, vol. 12, no. 3, 1968, pp. 285-295.

22. Am. Nat’l Standards Inst., Extended Latin Alphabet Coded Character Set for Bibliographic Use (ANSEL), (ANSI Z39.47-R1998).

23. As an example of size, in early 2002, OCLC alone held more than 50,000,000 MARC records in its union catalog, and OCLC member libraries held an estimated 800,000,000 MARC records in their local catalogs.

24. The OCLC 100 Display, manufactured by Beehive Medical Electronics, was difficult to engineer but proved itself in use with the OCLC system into the 1980s. See F.G. Kilgour, “Computerized Library Networks,” 2nd USA–Japan Computer Conf. Proc., Aug. 26–28, 1975, Tokyo, Japan, Am. Federation of Information Processing Societies, Montvale, N.J., 1975.

25. After several years of study and conversion test projects in the late 1960s and early 1970s, a task force concluded that a large-scale retrospective conversion project for the Library of Congress retrospective catalog should take place. Because of past years of copy cataloging from Library of Congress records, such a conversion would help libraries around the country in their conversions. However, funding was not found at the time. See the following: Recon Pilot Project; Final Report, Library of Congress, Washington, D.C., 1972, for the report on a major study and Avram’s MARC: Its History and Implications, Library of Congress, Washington, D.C., 1975, pp. 13-20, for a description of various investigations.

26. Unicode is a universal character encoding standard that includes all major scripts of the world. It is a single set able to encode more than a million characters (through fully specified single and multibyte encodings), without the use of control characters or special escapes to access additional characters as is necessary with conventional 7- and 8-bit sets. It is also synchronized with the ISO standard for the Universal Character Set, ISO 10646. See http://www.unicode.org for more information.

27. See http://www.loc.gov/marc/specifications/speccharintro.html.

28. S.H. McCallum, “Extending MARC for Bibliographic Control in the Web Environment: Challenges and Alternatives,” Proc. Bicentennial Conf. Bibliographic Control for the New Millennium: Confronting the Challenges of Networked Resources and the Web, Library of Congress, Washington, D.C., 2001, pp. 245-261.

29. “British Library to Adopt MARC 21,” The British Library, 2001. Available from the British Library Web site http://www.bl.uk.

30. The current full MARC format document is MARC 21 Format for Bibliographic Data, Library of Congress, Washington, D.C., 1999 (with annual updates). A concise version of the format, other versions, and related documentation are available from the MARC 21 Web site: http://www.loc.gov/marc/.

Sally H. McCallum is Chief of the Network Development and MARC Standards Office at the Library of Congress in Washington, D.C. In addition to the MARC standards, her office is responsible for digital and Web standards for the National Library component of the Library of Congress and maintains several important protocols and formats used by libraries globally, such as the Z39.50 information retrieval protocol, the Encoded Archival Description DTD, and the Metadata Encoding and Transmission Standard schema. She is a graduate of Rice University and the University of Chicago.

Readers may contact Sally H. McCallum [email protected].

For further information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.
