Creator Element Authority Control. Garbage In, Garbage Out: Input Standards and Metadata Scheme is...

Creator Element Authority Control


Page 1: Creator Element Authority Control. Garbage In, Garbage Out: Input Standards and Metadata Scheme is only half of the equation Consistency is key Controlled.

Creator Element Authority Control


Garbage In, Garbage Out: Input Standards and Metadata

• Scheme is only half of the equation

• Consistency is key

• Controlled vocabulary for all

– Subjects

– Names

– Common descriptive terms


Access Points - Purposes

• To identify (e.g., an entity known to the user)

• To collocate (i.e., bring together related entities/works)

• To aid in evaluating or selecting (e.g., Has this author written something newer on the subject? Which of several works with the same title do I want? What level of subject treatment is needed: a whole work on the subject? A chapter? A paragraph?)

• To locate the image, etc.


Access Points for Names and Titles - Purposes

• To facilitate the retrieval of names and titles that are imperfectly remembered

• To facilitate the retrieval of names and titles that are expressed differently in different information packages

• To facilitate the retrieval of names and titles that have changed over time

• To collocate expressions and manifestations of works

• To collocate works that are related to other works


Access Points for Names and Titles - How Accomplished

• Name and Title Authority Control

– All access points (whether main or added entries) need to be under authority control so that

• persons or entities with the same name can be distinguished from each other

• all names used by a person or body, or all manifestations of a name of a person or body, will be brought together

• all differing titles of the same work can be brought together

– Therefore, current practice dictates either the establishment of a “heading” for each name or title as an access point or the provision of pointers to draw different representations of names or titles together

– Headings are kept track of in authority files; RDF provides a model for linking entities
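
As a rough illustration of "pointers to draw different representations of names together", the sketch below models an authority file as an in-memory index from every known form of a name to one authorized heading. The class and method names (`AuthorityRecord`, `AuthorityFile`, `resolve`) are hypothetical, the normalization is deliberately simplistic, and the sample headings are the ones used as examples elsewhere in this deck.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """One authority record: an authorized heading plus its variant forms."""
    heading: str
    variants: list[str] = field(default_factory=list)


class AuthorityFile:
    """Hypothetical in-memory authority file keyed by normalized name form."""

    def __init__(self) -> None:
        self._index: dict[str, AuthorityRecord] = {}

    @staticmethod
    def _normalize(name: str) -> str:
        # Case-fold and collapse whitespace so trivial differences still match.
        return " ".join(name.casefold().split())

    def add(self, record: AuthorityRecord) -> None:
        # Index the heading and every variant so each form points to the record.
        for form in [record.heading, *record.variants]:
            self._index[self._normalize(form)] = record

    def resolve(self, name: str) -> str | None:
        """Return the authorized heading for any known form, else None."""
        record = self._index.get(self._normalize(name))
        return record.heading if record else None


authority_file = AuthorityFile()
authority_file.add(AuthorityRecord(
    heading="Taylor, Arlene G., 1941-",
    variants=["Dowell, Arlene Taylor, 1941-"],
))
authority_file.add(AuthorityRecord(
    heading="Smiraglia, Richard P., 1952-",
    variants=["Smiraglia, R.P.", "Smiraglia, Richard"],
))

print(authority_file.resolve("Dowell, Arlene Taylor, 1941-"))
print(authority_file.resolve("smiraglia,  r.p."))
```

Both lookups return the authorized heading, so records entered under a variant still collocate with the rest of that person's works. A form that is not in the file resolves to `None`, which is where human authority work would begin.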


Name Authority Standards

• LCNAF (Library of Congress Name Authority File) – constructed according to principles set out in AACR2R

• Getty Vocabulary tools (artist names; geographic names) – VRA Core Categories calls for use of the Getty vocabulary

• ISAAR(CPF) – International Standard Archival Authority Record for Corporate Bodies, Persons and Families

• EAC – Encoded Archival Context (for describing creators of archival collections)

• DCMI Agents – creators, contributors, and publishers – to be used in Dublin Core records


DCMI Agents: Working Definitions

• Agent: A person (author, publisher, sculptor, editor, director, etc.) or a group (organization, corporation, library, orchestra, country, federation, etc.) or an automaton (weather recording device, software translation program, etc.) that has a role in the lifecycle of a resource.

• Agent Record: A collection of elements describing an agent.

• Agent Authority Record: An agent record that includes the particular name that is preferred (considered authoritative) within a particular community (e.g., libraries).


Controlled Subject Terminology - Purposes

• To provide subject access to information packages in a catalog or index

• To collocate surrogate records for information packages of a like nature

• To provide suggested synonyms and syndetic structure to aid a user in subject searching

• To save the users’ time


Controlled Subject Terminology - How Accomplished

• Conceptual analysis – describe aboutness in natural language

• Translate that analysis into the framework of the controlled vocabulary system (e.g., use of single concept terms vs. use of phrases, compound concepts, and precoordinated subdivisions)

• Use controlled vocabulary system rules to create controlled subject access points to be added to metadata records


Controlled Vocabularies

• Subject heading lists

– LCSH (Library of Congress Subject Headings)

– FAST (Faceted Application of Subject Terminology)

– Sears List of Subject Headings

– MeSH (Medical Subject Headings)

• Thesauri

– AAT (Art & Architecture Thesaurus)

– Thesaurus of ERIC Descriptors

– Many more...


Names: What Do We Need

• Enable the user to retrieve all relevant items associated with a person or group

• Enable the user to retrieve all relevant items associated with a name regardless of the fullness or spelling used for the person or group

• Enable names to be browsed by either last name or first name but displayed in natural order
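
The last requirement (browse by either part of the name, display in natural order) can be sketched as two small helpers. This is a minimal illustration that assumes the stored form is a simple inverted "Family, Given" string with no dates or qualifiers; the function names are hypothetical.

```python
from __future__ import annotations


def natural_order(inverted: str) -> str:
    """Turn 'Family, Given' into 'Given Family' for display."""
    family, sep, given = inverted.partition(",")
    if not sep:
        # No comma: the name is not inverted, so display it unchanged.
        return inverted.strip()
    return f"{given.strip()} {family.strip()}"


def browse_keys(inverted: str) -> tuple[str, str]:
    """Return (family-name key, given-name key) for two browse indexes."""
    family, _, given = inverted.partition(",")
    return family.strip().casefold(), given.strip().casefold()


print(natural_order("Smiraglia, Richard P."))
print(browse_keys("Smiraglia, Richard P."))
```

Keeping the stored form inverted and deriving the display form at read time means one stored value can serve both browse indexes and the natural-order display.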


Names: Existing Tools

• ANAC (Automated Name Authority Control system)

• Perseus project developed its own named entity extractor optimized for Civil War–era names. Uses MADS

• Stanford Natural Language Processing tools


Authorities

• Authority Control governs usage of a controlled vocabulary. This is managed with

• Authority Files, which consist of

• Authority Records, each of which records a term and its variants as well as evidence. They are created using

• Authority Work, usually bibliographic detective work.


Authorities

• Each authority record exists to control a term, known in library cataloging as a “heading”

• The only “entity” is the controlled heading

• The relationships are among the heading and variant forms of the heading

• Everything else in the authority record is evidentiary or used for file control


Role of Authority Work

• Authority work, in which terms and names are verified and validated, is a critical part of documentation practice.

• The concept originated in the library cataloging domain in the days of manual card catalogs and indexes when strict consistency was necessary for minimal access.

• Today authority work has extended to other information management communities and its processes and procedures have benefited greatly from computerization.

• The development and application of standard controlled vocabularies is a significant outcome of authority work.


Authority Work Characteristics

• Authority files are compilations of authorized terms or headings used by a single organization or consortium in cataloging, indexing, or documentation

• Authority control is a system of procedures that maintains consistent information in database records.


Authority Work Characteristics

• An authority file is a controlled vocabulary, but not all controlled vocabularies are authority files.

• Authority files are an integral part of most automated information systems but you will find differing levels of implementation depending on the system.

• Authority work procedures may be automated, but the intellectual processes needed to create quality authority files are still best accomplished by humans.


Attributing Works in the Anglo-American Cataloguing Rules

• A work may be attributed to an individual creator, it may be attributed to a corporate emanator, or it may be entered under its title.

• Individuals: chiefly responsible for the creation of intellectual (artistic, etc.) content (21.1A1). Responsibility may be shared or mixed …

• Corporate body: an organization with a name that acts as an entity … and causes a work of collective thought or activity to emanate … (21.1B2). Governments, churches, universities, corporations, conferences, etc.


A “Heading” Contains, but is Not Equal to, A “Name”

• A heading includes:

– The authorized form of name (title, etc.)

– Manipulated in various ways (inverted, for instance)

– Qualifiers to make it unique

• The name is Richard P. Smiraglia

• The heading is Smiraglia, Richard P., 1952-


Constituting Headings: Personal Names

• The name of the creator as found in his published works.

• If more than one name, choose the latest.

• If more than one form, choose that found most often most recently.

• If all else fails, choose the fullest form.

• Add dates and middle names to resolve conflicts.
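
The mechanical part of these steps (invert the chosen name, then add dates only when needed to break a conflict) can be sketched as follows. This is an illustration, not an implementation of the cataloguing rules: choosing which name and which form to use still depends on examining the creator's published works. The function names and the `existing` set of headings already in the file are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def personal_heading(given: str, family: str,
                     birth_year: Optional[int] = None,
                     death_year: Optional[int] = None) -> str:
    """Build an inverted heading, optionally qualified by dates."""
    heading = f"{family}, {given}"
    if birth_year is not None:
        # An open date range ("1952-") marks a person with no death date recorded.
        death = "" if death_year is None else str(death_year)
        heading += f", {birth_year}-{death}"
    return heading


def unique_heading(given: str, family: str, birth_year: int,
                   existing: set[str]) -> str:
    """Add the birth date only when the plain heading is already taken."""
    plain = personal_heading(given, family)
    if plain not in existing:
        return plain
    return personal_heading(given, family, birth_year)


print(personal_heading("Richard P.", "Smiraglia", 1952))
print(unique_heading("Richard", "Smiraglia", 1952, {"Smiraglia, Richard"}))
```

The first call reproduces the heading shown on the previous slide. The second shows a date being added only because the undated form already exists in the file.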


Constituting Headings: Corporate Names

• The name of the corporate body as found in its published works.

• If more than one name, use all.

• If more than one form, choose the one found most often in its works.

• Add terms as qualifiers to resolve conflicts.

– Who (Musical group)

– Apollo (Spaceship)


Constituting Headings: Subordinate Entry

• Government or corporate entities with generic names or names implying subordination: “Department”, “Division”, “Bureau”, “Committee”, etc.

• Entered under the name of the intermediate unit with a distinctive name.

– California. Employment Data and Research Division.

– NOT: California. Employment Development Department. Employment Data and Research Division.


Authority Control

• Maintains consistency of usage of names of individuals, corporate bodies, and titles of works.

• Always:

– Smiraglia, Richard P., 1952-

– Not Smiraglia, R.P.

– Not Smiraglia, Richard

• Always:

– Taylor, Arlene G., 1941-

– Not Dowell, Arlene Taylor, 1941-


Authority Records

• Authority control works through the use of authority records

• Authority records record:

– Authority work: the actual decision-making process of the cataloger

– Variant forms found along the way

– References in the catalog from recognized variant forms


A new model of “authority file”

• The authority records of creators are meant to include a much more complex set of information than traditional bibliographic authority records, exactly because they are devoted to implementing the model of separate description of archives and creators

• Dates of existence, history and geography, functions, occupations, and activities … political, social, cultural context in which the creator worked


From a Data Modeling Standpoint ….

• Thus the only entity in an authority record is the authorized heading (or “term”)

• Its variants are attributes, but could also be seen as equivalents

• The rest is functional:

– Notes (two types: evidentiary and non-evidentiary)

– Usage

– Control

[Diagram: a flat file model, showing the Authority File (AF) and the Bibliographic File (BF)]

Headings in the Authority File govern usage in the Bibliographic File. One “Dickens” in the AF governs all “Dickens” in the BF. Usage is inferential.


Online, new models emerged

1. Online flat-file models simply used the authority file as an occasional filter. All headings from the bibliographic file were run against it periodically for validation.

2. An ER model separated the headings from their representations in bibliographic records. This reduced redundancy dramatically. Every heading is stored only in the authority file, and copied as needed into the displays arranged from the bibliographic file. All “Dickens” resides only in the AF, with links from the BF.
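
The ER model in item 2 can be sketched with two small structures: bibliographic records hold a link to an authority record instead of the heading text, and the heading is copied into the display at read time. The record ID `n1`, the field names, and the two titles are illustrative choices, not taken from any particular system.

```python
# Authority file (AF): each heading is stored exactly once, under a record ID.
authority_file = {
    "n1": "Dickens, Charles",
}

# Bibliographic file (BF): records store a link to the AF, not the heading text.
bibliographic_file = [
    {"title": "Bleak House", "creator_id": "n1"},
    {"title": "Hard Times", "creator_id": "n1"},
]


def display(record: dict) -> str:
    """Build a display line by copying the heading from the AF at read time."""
    return f"{authority_file[record['creator_id']]}. {record['title']}"


before = [display(record) for record in bibliographic_file]

# Revising the heading once in the AF changes every display that links to it.
authority_file["n1"] = "Dickens, Charles, 1812-1870"
after = [display(record) for record in bibliographic_file]

print(before)
print(after)
```

In the flat-file model the same revision would mean finding and editing every bibliographic record that carries the old string, which is why periodic validation runs were needed there.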



Authority Control

• Traditional Functions

– Ensures that access points are unique and consistent in content and form

– Provides a network of linkages for variant and related headings in the catalog

– Improves precision & recall for database searches


Reasons for Authority Control Success

• AC operates within a well-defined and bounded universe—the library catalog

• Creation of access points based on principles & standardized practices that guide the process

• Authority work is aided by reference to authoritative lists

• Performed by highly trained individuals

– Part of library culture

– Understand cause and effect in the information retrieval process


Functions of the Authority File

• Document decisions

• Serve as reference tool

• Control forms of access points

• Support access to bibliographic file

• Link bibliographic and authority files


Users

• Authority record creators and reference librarians

• Library patrons


Users and Tasks

• Users

– Authority record creators and reference librarians

– Library patrons

• User tasks

– Find: Find an entity or set of entities corresponding to stated criteria

– Identify: Identify an entity

– Contextualize: Place a person, corporate body, work, etc. in context

– Justify: Document the authority record creator’s reason for choosing the name or form of name on which an access point is based


Traditional Authority Control in Libraries

Which names do we control?

– Names of authors and some contributors of published books

– Composers of sheet music

– Names of corporate bodies responsible for official publications

– Names associated with resources catalogued since 1981

– Names associated with audio or audio visual resources, where possible

Which names do we exclude?

– Names of authors of journal articles or chapters of published books

– Contributors whose names fall towards the end of the alphabet or whose contribution we regard as insignificant

– Names associated with archival or manuscript material

– Names derived from older catalogues

– Names associated with most Web resources

– Names in the content management system / institutional repository


Expectations

• There is a gap between ambition and delivery

• Only some names on some types of resources are controlled

• User expectations are changing

• Silos:

– Libraries / Archives / Repositories / Museums

– National practices

– Institutional practices

– Variance over time

Is partial authority control acceptable to users? If not, will it be acceptable to administrators?


Workflows

• Current workflows are not scalable

• Retrospective

• Cataloger driven

• Decision making

– Is A. Rose PhD the same person as Dr. Alex Rose, University of London?

– What other information is available?

– Is it sufficient to match or disambiguate the identities?

– Is there a website / contact details?


Rethinking the Process

• Capture information about the person, family or corporate body at the time the resource is created

• Devolve responsibility to authors, publishers, researchers and academics

• Libraries and bibliographic agencies focus on quality control, complex relationships and conflict resolution.

• Capture information in a way that is machine intelligible.

– Identification of entities, not disambiguation of headings


It’s not just about libraries…

• FOAF: Friend of a Friend

– Social networking metadata

– Granularity of parts of a name

– http://xmlns.com/foaf/spec/

• EAC-CPF: Encoded Archival Context – Corporate Bodies, Persons, and Families

– Communication standard for exchange of authority records

– ISAAR (CPF)

– Draft Standard

– http://eac.staatsbibliothek-berlin.de/


Thoughts

• Controlling names remains important in the context of linked data and the Semantic Web

• Identification and collocation of variants is more important than establishing a preferred form

• Current techniques are not scalable

• Automation and participation are the way forward

• Web services for identification

• No simple solution

• Extension of the collaborative model


FRAD

• Functional Requirements for Authority Data

• IFLA Division of Bibliographic Control working group 1999-

• April 2007 draft for world-wide review

• Approved March 2009


FRAD Entities

• Name by which bibliographic entities are known (in the “real” world)

• Identifier assigned to those entities

• Controlled access point based on those names or identifiers

• These are the heart of the authority data


Name

• A character or group of words and/or characters by which an entity is known

• The basic name or term itself

• As found in the “real” world


Definition: Identifier

• A number, code, word, phrase, logo, device, etc. that is uniquely associated with an entity, and serves to differentiate that entity from other entities within the domain in which the identifier is assigned

• Not only bibliographic identifiers


Definition: Controlled Access Point

• A name, term, code, etc. under which a bibliographic or authority record or reference will be found

• Includes established or authorized headings and variant headings or references


Basic FRAD Model

BIBLIOGRAPHIC ENTITIES

known by

NAMES and / or IDENTIFIERS

basis for

CONTROLLED ACCESS POINTS
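
The "known by" and "basis for" relationships in this model can be sketched as a handful of data classes. This is an informal reading of the diagram, not the FRAD specification: the class names, fields, and the `add_access_point` check are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Name:
    """A name by which the entity is known in the "real" world."""
    value: str


@dataclass
class Identifier:
    """A code uniquely associated with the entity within one domain."""
    value: str
    domain: str


@dataclass
class ControlledAccessPoint:
    """An authorized or variant form built from a name or identifier."""
    value: str
    basis: Union[Name, Identifier]
    authorized: bool = True


@dataclass
class BibliographicEntity:
    """A person, corporate body, work, etc."""
    kind: str
    names: list[Name] = field(default_factory=list)
    identifiers: list[Identifier] = field(default_factory=list)
    access_points: list[ControlledAccessPoint] = field(default_factory=list)

    def known_by(self, item: Union[Name, Identifier]) -> None:
        """Record the "known by" relationship."""
        (self.names if isinstance(item, Name) else self.identifiers).append(item)

    def add_access_point(self, value: str, basis: Union[Name, Identifier],
                         authorized: bool = True) -> ControlledAccessPoint:
        """Record the "basis for" relationship."""
        # An access point must rest on a name or identifier of this entity.
        if basis not in self.names and basis not in self.identifiers:
            raise ValueError("basis must be a name or identifier of this entity")
        point = ControlledAccessPoint(value, basis, authorized)
        self.access_points.append(point)
        return point


person = BibliographicEntity(kind="person")
name = Name("Richard P. Smiraglia")
person.known_by(name)
heading = person.add_access_point("Smiraglia, Richard P., 1952-", basis=name)
variant = person.add_access_point("Smiraglia, R.P.", basis=name, authorized=False)
```

Unlike the traditional view, where the heading is the only entity, here the person, the name, and the access point are separate objects, so an authorized heading and a variant can both point back to the same name.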


More FRAD Entities

• Rules governing construction of a controlled access point

• Agency applying the rules, and creating/modifying the controlled access point


MADS

• MODS users kept asking for a compatible authority record

• Metadata Authority Description Schema

– April 2004, preliminary version out for review

– December 2004, new draft out for review

– April 2005, version 1.0 published


MADS schema design

• Highly coordinated with MODS

– Schema specifies high level elements and unique substructures

– But MADS points to substructures in MODS where possible

• Each heading is wrapped in an XML tag: <authority> or <related> or <variant>

• Each subpart of a heading has an authority list identifier


Components of MADS

• Authoritative heading

• Related heading(s) (see also)

• Variant heading(s) (see)

• Other elements


Heading elements

• <authority>

– <name>

– <titleInfo>

– <topic>

– <temporal>

– <genre>

– <geographic>

– <hierarchicalGeographic>

– <occupation>

• Same for <related> and <variant>


Examples

<authority>
  <geographic>Scotland</geographic>
  <topic>History</topic>
  <temporal>18th century</temporal>
</authority>

<authority>
  <genre authority="gsafd">Historical fiction</genre>
</authority>

<authority>
  <name type="personal">Law, Felicia</name>
  <titleInfo><title>Ways we move</title></titleInfo>
</authority>


Reference types

• <related>

– Attribute indicates: earlier, later, parentOrg, broader, narrower, equivalent, other

• <variant>

– Attribute indicates: acronym, abbreviation, translation, expansion, other


Other elements

• <notes>

• <affiliation>

• <fieldOfActivity>

• <url>

• <identifier>

• <extension>

• <recordInfo>


Example

<mads>
  <authority>
    <name type="corporate" authority="naf">
      <namePart>Unesco</namePart>
    </name>
  </authority>
  <related type="parentOrg">
    <name><namePart>United Nations</namePart></name>
  </related>
  <variant type="expansion">
    <name><namePart>United Nations Educational, Scientific and Cultural Organization</namePart></name>
  </variant>
</mads>
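
To show how such a record might be consumed, the sketch below parses the Unesco example with Python's standard `xml.etree.ElementTree` and pulls out the authorized heading, the "see" references, and the "see also" references. It assumes the record has no XML namespace, as on the slide; published MADS records declare one, and the element lookups would then need namespace-qualified names. The expansion is given in the organization's official word order, and the helper `name_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

MADS_RECORD = """
<mads>
  <authority>
    <name type="corporate" authority="naf">
      <namePart>Unesco</namePart>
    </name>
  </authority>
  <related type="parentOrg">
    <name><namePart>United Nations</namePart></name>
  </related>
  <variant type="expansion">
    <name><namePart>United Nations Educational, Scientific and Cultural Organization</namePart></name>
  </variant>
</mads>
"""


def name_of(element: ET.Element) -> str:
    """Join the text of every <namePart> found under the element."""
    return " ".join(
        part.text.strip() for part in element.iter("namePart") if part.text
    )


root = ET.fromstring(MADS_RECORD)

# The single <authority> wrapper holds the authorized heading.
authorized = name_of(root.find("authority"))

# <variant> wrappers are "see" references; <related> are "see also" references.
see_from = {name_of(v): v.get("type") for v in root.findall("variant")}
see_also = {name_of(r): r.get("type") for r in root.findall("related")}

print(authorized)
print(see_from)
print(see_also)
```

The `type` attribute carries the relationship, so a catalog could use it to label the reference, for example showing the parent organization differently from an earlier name.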


Features

• Word oriented tagging

– English word tags

– Same as corresponding MODS elements

– Easy to pick up and use?

– Record creation by technicians?


Features

• Rich linking possibilities

– <url> element to link out at record level

– xlink attribute for external links from elements

– ID attribute to enable linking to an element


Features

• Special attributes on all elements

– lang: MARC codes (ISO 639-2b)

– xml:lang: ISO 639-1

– script: ISO 15924

– transliteration: no controlled list

– authority: e.g., lcsh, naf


Subject heading example

<mads>
  <authority>
    <topic authority="lcsh">Computer programming</topic>
  </authority>
  <related type="broader">
    <topic>Computers</topic>
  </related>
  <related type="narrower">
    <topic>Programming languages</topic>
  </related>
  <related type="other">
    <topic>Systems analysis</topic>
  </related>
</mads>


MADS

• MADS is taking a fresh approach to authority records that is:

– Coordinated with MARC 21 authorities and MODS

– Accommodating to a variety of authority types and practices

– Taking advantage of the XML environment

– Web site: www.loc.gov/mads


Automated name metadata remediation

• Inconsistent name representation

• Metadata harvested from multiple providers

• Hand-crafted data is expensive

• Commercial alternatives are expensive


Johns Hopkins Project: Automated Name Authority Control (ANAC)

• 29,000 Levy sheet music records

• 13,764 unique names

• 3.5 million LC name authority records (at the time of the project)


ANAC

• The evidence used to determine the probability of a match between a name and an LC record is a set of Boolean tests involving the name, the Levy metadata associated with that name, and the LC record.

• The following fields were used by ANAC:

• Levy record:

– Given name: often abbreviated

– Middle names: often abbreviated

– Family name

– Modifiers: titles and suffixes

– Date: publication year

– Location: publication location (city)

• LC record:

– Given name: includes abbreviations

– Middle names: includes abbreviations

– Family name

– Modifiers: titles and suffixes

– Birth: year of birth

– Death: year of death

– Context: miscellaneous data


ANAC

• The tests used are: first name equality and consistency, middle name equality and consistency, music terms present in LC record context, name modifier consistency, Levy sheet music publication consistent with LC author birth and death, and Levy record publication location in LC record context
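
A few of these Boolean tests can be sketched as below. This is not ANAC's code: the field names, the list of music terms, the abbreviation check, and the date rule are all assumptions, the middle-name and modifier tests are left out for brevity, and the step that turns the test results into a match probability is not shown. The sample values are invented for illustration and do not come from the Levy or LC data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class LevyName:
    given: str
    family: str
    pub_year: Optional[int] = None
    pub_city: str = ""


@dataclass
class LCRecord:
    given: str
    family: str
    birth: Optional[int] = None
    death: Optional[int] = None
    context: str = ""


MUSIC_TERMS = ("composer", "lyricist", "songs", "music")  # illustrative list


def consistent(a: str, b: str) -> bool:
    """True when one form could abbreviate the other, e.g. 'A.' and 'Alex'."""
    a, b = a.rstrip(".").casefold(), b.rstrip(".").casefold()
    return bool(a) and bool(b) and (a.startswith(b) or b.startswith(a))


def evidence(levy: LevyName, lc: LCRecord) -> dict[str, bool]:
    """Run each Boolean test and return the results by test name."""
    context = lc.context.casefold()
    # Assumed rule: published after the author's birth and not after death.
    dates_ok = (
        levy.pub_year is not None
        and lc.birth is not None
        and lc.birth < levy.pub_year
        and (lc.death is None or levy.pub_year <= lc.death)
    )
    return {
        "given_equal": levy.given.casefold() == lc.given.casefold(),
        "given_consistent": consistent(levy.given, lc.given),
        "music_terms_in_context": any(term in context for term in MUSIC_TERMS),
        "dates_consistent": dates_ok,
        "location_in_context": bool(levy.pub_city)
        and levy.pub_city.casefold() in context,
    }


levy = LevyName(given="A.", family="Rose", pub_year=1890, pub_city="Baltimore")
lc = LCRecord(given="Alex", family="Rose", birth=1850, death=1920,
              context="Composer; active in Baltimore")
result = evidence(levy, lc)
print(result)
```

Separating "equal" from "consistent" matters because Levy given names are often abbreviated: an initial cannot confirm a match, but it can rule one out.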


ANAC

• In order to train the system, the Cataloging Department at the Sheridan Libraries generated ground truth data.

• For each name in 2,000 randomly selected Levy metadata records, catalogers recorded the authorized form of the name when a matching authority record was available.

• The entire process required 311 hours (approximately seven minutes per name).

• The human catalogers used much the same type of evidence as ANAC in establishing matches. Catalogers examined name similarity; compared publication dates from the Levy records to birth and death dates in the authority records; and examined authority record note fields for musical terms.

• In addition, the catalogers often searched for bibliographic records of other editions of a particular title to determine the authoritative name assigned to the subject.


ANAC

• Overall, ANAC was successful 58% of the time. When a name had an LC record, ANAC was successful 77% of the time, but when an LC record did not exist for a name ANAC was successful only 12% of the time. The reason for this discrepancy is that ANAC cannot learn whether or not a name has been added to the LC authority file.

• It took ANAC five hours and forty-five minutes to classify the 2,673 (2,841 minus 168) names, or about eight seconds per name. The database-bound process of retrieving the candidate set of MARC records given a family name consumed most of this time.
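
The timing and accuracy figures quoted for ANAC can be checked with a few lines of arithmetic. The last calculation is an inference rather than a reported result: it assumes the overall 58% is a weighted average of the 77% and 12% rates over the same set of names.

```python
# Machine time: 5 hours 45 minutes for 2,841 - 168 = 2,673 names.
names_classified = 2841 - 168
anac_seconds = 5 * 3600 + 45 * 60
seconds_per_name = anac_seconds / names_classified  # about 7.7 seconds

# Manual time: roughly seven minutes per name, as reported for the catalogers.
manual_seconds_per_name = 7 * 60
speedup = manual_seconds_per_name / seconds_per_name  # roughly 54 times faster

# Inferred share p of names that had an LC record, from
# 0.58 = 0.77 * p + 0.12 * (1 - p).
share_with_lc_record = (0.58 - 0.12) / (0.77 - 0.12)  # about 0.71

print(round(seconds_per_name, 1))
print(round(speedup))
print(round(share_with_lc_record, 2))
```

Under that assumption, roughly seven in ten names had an LC record, so the 12% success rate on the rest is what pulls the overall figure down to 58%.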


ANAC

• Matching very dependent on contextual data

• Machine matching much faster than manual

• Performance reasonable even with dirty metadata

• Machine matching could enhance manual work


ANAC: Conclusions

• Matching very dependent on contextual data

• Machine matching much faster than manual (8 sec. vs. 7 min.)

• Performance reasonable even with dirty metadata

• Machine matching could enhance manual work

• Combination of machine processing and human intervention produced best results

• Approach could be tweaked by comparing names to multiple authority files or domain-specific databases


Identifiers: People

• One area where growing interest in identifiers is very clear is that of people, particularly in their role as authors or creators.

• The benefits of using a consistent name are clear from a discovery point of view.

• So it is interesting that many people are inconsistent in how they identify themselves on their works.

• Search engines have probably made people more conscious of the distinctiveness, or otherwise, of their names.

• The additional step of unique identification would facilitate various services.


UK Names Project

• The project is going to scope the requirements of UK institutional and subject repositories for a service that will reliably and uniquely identify names of individuals and institutions.

• It will then go on to develop a prototype service which will test the various processes involved. This will include determining the data format, setting up an appropriate database, mapping data from different sources, populating the database with records and testing the use of the data.

• This will provide important information about the future usefulness of a name authority service for institutional and subject-based repositories, and other applications beyond the repository sector.


Virtual International Authority File (VIAF)

• Link authority records from national bibliographic agencies

• Build on their authority work

• Expand the concept of universal bibliographic control

– Allow national or regional variations in authorized form to co-exist

– Support needs for variations in preferred language, script, and spelling


VIAF

Demonstrate feasibility of linking personal names across:

• Personennamendatei (PND)

• Library of Congress Name Authority File (LCNAF)

• Bibliothèque nationale de France


What is VIAF?

• System

– Links between files

– Web browser access

– Multi-lingual and multi-scripts

• Maintenance

– National agencies control their records

– Records harvested from national systems

• Scalable

– Any number of national authority files


Matching Variations

In the LCNAF and PND authority files:

• Same name, same person

• Same name, different people

• Different names, same person

• Missing person in one file


Two Different People – One Name

Adams, Mike

• PND: a golfer

• LCNAF: author of a Beatles collector's guide

Same name, different people


One Person – Two Names

• LCNAF: Morel, Pierre

• PND: Morellus, Petrus

Same person, different names


Enhancing the Authorities

[Diagram: boxes labeled Bibliographic Record, Derived Authority, Authority Record, and Enhanced Authority]


Strong Matching Attributes

• A work (title) in common

• Common control numbers (ISBN, ISSN, or LCCN)

• Exact birth and death year

• Joint authors

• Name as subject


Weaker Attributes

• Only one of birth/death date(s) (allows some variation)

• Subject area of works (two levels)

• Format (books, films, musical scores, etc.)

• Language

• Publisher

• Partial title match

• Date of publication

• Country

• Role (author, illustrator, composer, etc.)
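One way to picture how strong and weaker attributes might be combined is a simple additive score. This is a hedged sketch, not VIAF's actual matching algorithm: the attribute keys, the weights, and the threshold are all assumptions chosen so that one strong attribute, or three weaker ones, is enough to accept a match.

```python
# Attribute names follow the two slides; the keys themselves are hypothetical.
STRONG = frozenset({
    "title_in_common",
    "common_control_number",       # ISBN, ISSN, or LCCN
    "exact_birth_and_death_year",
    "joint_author",
    "name_as_subject",
})
WEAK = frozenset({
    "one_life_date",
    "subject_area",
    "format",
    "language",
    "publisher",
    "partial_title",
    "publication_date",
    "country",
    "role",
})

# Assumed weights and threshold, for illustration only.
STRONG_WEIGHT = 3
WEAK_WEIGHT = 1
THRESHOLD = 3

def match_score(shared):
    """Sum the weights of the attributes two same-named records share."""
    unknown = set(shared) - STRONG - WEAK
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return sum(STRONG_WEIGHT if a in STRONG else WEAK_WEIGHT for a in shared)

def is_match(shared):
    """Accept the pair when the combined evidence reaches the threshold."""
    return match_score(shared) >= THRESHOLD

print(is_match({"common_control_number"}))          # True
print(is_match({"language", "publisher"}))          # False
print(is_match({"language", "publisher", "role"}))  # True
```

A real system would tune such weights against hand-checked matches rather than fix them by hand.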


OCLC Cooperative Identities Hub

• Bring together information about creators now hidden within library, archival, and museum contexts, using a social networking model.

• Broaden the view of "authority work" beyond NACO contributors.

• Increase metadata creation efficiency.

• Make it easier for users to identify works by or about the same creator regardless of language or discipline.

• Expose information about personal and corporate bodies beyond the confines of library, archival, and museum silos and bring them into the "network flow".


Names can be ambiguous…

“John Adams”

… the US president?

… the US composer?

… the British mathematician & astronomer?

… the British nuclear physicist?

… or someone else?


Names depend on context…

US: Chiang Kai-shek

France, Germany: Jiang Jieshi

China, Japan:蒋介石

蔣中正

Arabic-speaking countries:

شيانج كاي شيك

Tamil: சங்கை� செசக்
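The idea that one identity carries several context-dependent forms can be sketched as a small lookup. This is an illustration under stated assumptions: the context codes, the identity identifier, and the fallback rule are hypothetical, and only the Latin-script and simplified Chinese forms from the slide are used.

```python
# Forms taken from the slide; the two-letter context codes are an assumption.
FORMS_BY_CONTEXT = {
    "US": "Chiang Kai-shek",
    "FR": "Jiang Jieshi",
    "DE": "Jiang Jieshi",
    "CN": "蒋介石",
    "JP": "蒋介石",
}
IDENTITY_ID = "identity-0001"   # hypothetical identifier for the person

def build_index(identity_id, forms_by_context):
    """Map every known form to the same identity, so any form finds it."""
    return {form: identity_id for form in forms_by_context.values()}

def display_form(forms_by_context, context, default_context="US"):
    """Pick the form for the user's context; the fallback is the caller's choice."""
    return forms_by_context.get(context, forms_by_context[default_context])

INDEX = build_index(IDENTITY_ID, FORMS_BY_CONTEXT)

print(INDEX["蒋介石"] == INDEX["Chiang Kai-shek"])   # True
print(display_form(FORMS_BY_CONTEXT, "DE"))          # Jiang Jieshi
```

Searching goes through the index, where no form is preferred; only the display step depends on the context of the user.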


Cooperative Identities Hub

• Framework to concatenate and merge authoritative information

• Gateway to all forms of names without preferring one form over another

• Use social networking model

• Provide a switch to extract relevant information for re-use in own contexts

• Create federated trust environment to authenticate and authorize contributors


Hub Objectives

• Increase metadata creation efficiency

• Easier to identify identity regardless of language or discipline

• Determine preferred form within own context

• Enable contributing agencies to augment own data resources

• Expose information about personal and corporate bodies beyond original contexts


Hub Data Elements

• At least one form of name

• Life events, with dates if known: origin, place(s) of output, knowledge domains, institutional affiliations…

• Associated entities (role and what relationship is)

• At least some works

• Short biographical history

• Unique identifiers from each source
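The data elements above could be modeled as a simple record type. The following is a minimal sketch, assuming hypothetical class and field names; only the "at least one form of name" rule is enforced, and the source identifiers shown are placeholders, not real authority numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LifeEvent:
    kind: str            # e.g. "origin", "place of output", "affiliation"
    value: str
    date: str = ""       # left empty when the date is not known

@dataclass
class AssociatedEntity:
    name: str
    role: str
    relationship: str

@dataclass
class HubRecord:
    names: List[str]                                       # at least one form
    life_events: List[LifeEvent] = field(default_factory=list)
    associated: List[AssociatedEntity] = field(default_factory=list)
    works: List[str] = field(default_factory=list)
    biography: str = ""
    identifiers: Dict[str, str] = field(default_factory=dict)  # source -> id

    def __post_init__(self):
        if not self.names:
            raise ValueError("a hub record needs at least one form of name")

# Both name forms come from the "One Person, Two Names" slide;
# the identifier values are placeholders.
record = HubRecord(
    names=["Morel, Pierre", "Morellus, Petrus"],
    identifiers={"LCNAF": "placeholder-lcnaf-id", "PND": "placeholder-pnd-id"},
)
print(record.names)
```

Keeping one identifier per source lets each contributing agency trace the merged record back to its own file.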
