Challenges for a new era


Presentation delivered in Oslo, Norway, February 8, 2013.

Transcript of Challenges for a new era

Page 1: Challenges for a new era

Challenges for the New Era

Diane I. Hillmann, Metadata Management Associates

Oslo, February 8, 2013

Page 2: Challenges for a new era

Oslo 2013 2

Big Challenges/Big Ideas

O Changing our thinking from records to statements
O Will RDA help?
O Where you start affects where you end up
O Shifting our ways from ROI to Potlatch
O Recognizing that our human resources are limited
O So how do we manage this data-that-isn’t records?

2/7/13

Page 3: Challenges for a new era


Statements and Records

O Records are still important, but not as we’ve used them in the past
O We might want to think about records as the instantiation of a point of view
O News: traditional library data has a point of view
O MARC required consensus because of limitations built into the technology
O Now we need provenance, so we know “Who said?”
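A minimal sketch of the statement-oriented view, assuming each statement carries the source that asserted it (RDF systems commonly do this with named graphs; plain Python tuples stand in for that here). The identifiers book:1, libraryA, libraryB and communitySiteC are hypothetical. A "record" becomes one consumer's point of view over pooled statements, and provenance answers "Who said?":

```python
from collections import namedtuple

# One statement = subject, predicate, object, plus the source that asserted it.
Statement = namedtuple("Statement", "subject predicate obj source")

statements = [
    Statement("book:1", "title", "Moby Dick", "libraryA"),
    Statement("book:1", "title", "Moby-Dick; or, The Whale", "libraryB"),
    Statement("book:1", "subject", "Whaling", "libraryA"),
    Statement("book:1", "subject", "Sea stories", "communitySiteC"),
]

def record_view(statements, subject, trusted_sources):
    """Assemble a 'record' as one point of view: only statements from chosen sources."""
    view = {}
    for st in statements:
        if st.subject == subject and st.source in trusted_sources:
            view.setdefault(st.predicate, []).append(st.obj)
    return view

def who_said(statements, subject, predicate, obj):
    """Provenance lookup: which sources asserted this exact statement?"""
    return sorted({st.source for st in statements
                   if (st.subject, st.predicate, st.obj) == (subject, predicate, obj)})
```

Two consumers trusting different sources get two different "records" for book:1 from the same pool of statements, and neither view erases the other.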

Page 4: Challenges for a new era


Building RDVocab: Goals

O Bridge the XML and RDF worlds
O Ensure ability to map between RDA and other element sets
O Provide a sound platform for extension of RDA Vocabularies into new and specialized domains

O Consider methods for expressing AACR2 structures in technical ways to ease the pain of transition to RDA


Page 5: Challenges for a new era


RDVocab Structure, Simplified

O RDA properties declared in two separate hierarchies:
  O An ‘unconstrained’ vocabulary, with no explicit relationship to FRBR entities
  O A subset of classes, properties and subproperties with FRBR entities as ‘domains’

O Pros: retained usability in or out of libraries; better mapping to/from non-FRBR vocabularies

O Cons: still seems too complex to many SemWeb implementers (many using BIBO)


Page 6: Challenges for a new era


Why Unconstrained Properties?

O The ‘bounded’ properties should be seen as the official JSC-defined RDA Application Profile for libraries
O What’s still lacking is the addition of the necessary constraints: datatypes, cardinality, associated value vocabularies
O Extensions and mapping should be built from the unconstrained properties
O Unconstrained vocabularies are necessary for use in domains where FRBR is not assumed or is inappropriate
O Mapping from vocabularies not using the FRBR model directly to ones that do (and back) creates serious problems for the ‘Web of Data’

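One way to see why direct mapping into the FRBR-bounded properties causes trouble is the RDFS domain rule: if property P has domain C and a statement (s, P, o) exists, then s is inferred to be an instance of C. The sketch below applies only that rule; the names unc:title, manif:titleProper and obj:42 are hypothetical stand-ins, not actual RDA Registry URIs:

```python
# Hypothetical declarations: the bounded property has a FRBR domain,
# the unconstrained property has none.
DOMAIN = {"manif:titleProper": "frbr:Manifestation"}

def infer_types(triples, domain):
    """RDFS domain rule: if P has domain C and (s, P, o) holds, infer (s, rdf:type, C)."""
    return {(s, "rdf:type", domain[p]) for s, p, o in triples if p in domain}

# A museum object described without FRBR, mapped two different ways.
bounded = [("obj:42", "manif:titleProper", "Portrait of a Lady")]
unconstrained = [("obj:42", "unc:title", "Portrait of a Lady")]

print(infer_types(bounded, DOMAIN))        # {('obj:42', 'rdf:type', 'frbr:Manifestation')}
print(infer_types(unconstrained, DOMAIN))  # set()
```

The first mapping silently asserts that the museum object is a FRBR Manifestation, a claim its source never made; the unconstrained mapping carries the title without that commitment.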

Page 7: Challenges for a new era


The Simple Case: One Property, One FRBR Entity

[Diagram: a generalized property (no FRBR relationship), used for the Semantic Web, with a subproperty that has a relationship to one FRBR entity, used in library applications]


Page 8: Challenges for a new era


The Not-So-Simple Case: One Property, More Than One FRBR Entity

[Diagram: a generalized property (no FRBR relationship), used for the Semantic Web, with two subproperties, each with a relationship to a different FRBR entity, used in library applications]

Page 9: Challenges for a new era


Roles: Attributes or Properties?

O In 2005, the DC Usage Board worked with LC to build a formal representation of the MARC Relators so that these terms could be used with DC
O This work provided a template for the registration of the role terms in RDA (in Appendix I) and, by extension, the other RDA relationships

O Role and relationship properties are registered at the same level as elements, rather than as attributes (as MARC does with relators, and RDA does in its XML schemas)

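The difference between the two approaches can be sketched as follows. The sketch assumes a MARC-style heading that carries a relator code as an attribute ("ill" is the MARC relator code for Illustrator, "aut" for Author), and uses hypothetical predicates rel:illustrator and rel:author to stand for the registered role properties:

```python
# Role as an attribute (MARC-style): the role code rides along with the name.
heading = {"name": "Sendak, Maurice", "relator": "ill"}

# Hypothetical mapping from relator codes to role properties.
RELATOR_TO_PROPERTY = {"ill": "rel:illustrator", "aut": "rel:author"}

def role_to_statement(resource, heading, relator_map):
    """Role as a property: the role becomes the predicate of the statement itself."""
    return (resource, relator_map[heading["relator"]], heading["name"])

def who_has_role(triples, role_property):
    """Select agents by role using nothing but the predicate."""
    return [o for s, p, o in triples if p == role_property]
```

Once the role is a property, a generic statement processor can select all illustrators by predicate, without knowing how a particular schema nests relator attributes.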

Page 10: Challenges for a new era


Vocabulary Extension

O The inclusion of unconstrained properties provides a path for extension of RDA into specialized library communities and non-library communities
O They may have a different notion of how FRBR ‘aggregates’ (for example, a colorized version of a film may be viewed as a separate work)
O They may not wish to use FRBR at all
O They may have additional, domain-specific properties to add that could benefit from a relationship to the RDA properties


Page 11: Challenges for a new era


[Diagram: RDA:adaptedAs hasSubproperty RDA:adaptedAsARadioScript]


Page 12: Challenges for a new era


Extension using Unconstrained Properties

[Diagram: RDA:adaptedAs hasSubproperty RDA:adaptedAsARadioScript and KidLit:adaptedAsAPictureBook]


Page 13: Challenges for a new era


Extension using Unconstrained Properties

[Diagram: RDA:adaptedAs hasSubproperty RDA:adaptedAsARadioScript, KidLit:adaptedAsAPictureBook and KidLit:adaptedAsAChapterBook]
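A sketch of what the extension buys, assuming one reading of the diagrams above (all three adaptation properties declared as subproperties of RDA:adaptedAs) and simplifying to a single superproperty per property. A query on RDA:adaptedAs also finds statements made with the KidLit extension properties, which the RDA community never had to define:

```python
# Subproperty declarations as read from the diagrams (KidLit is the slide's
# example extension namespace; the dict encoding is an assumption).
SUBPROPERTY_OF = {
    "RDA:adaptedAsARadioScript": "RDA:adaptedAs",
    "KidLit:adaptedAsAPictureBook": "RDA:adaptedAs",
    "KidLit:adaptedAsAChapterBook": "RDA:adaptedAs",
}

def superproperties(prop, sub_of):
    """prop plus every property it is (transitively) a subproperty of; assumes no cycles."""
    chain = [prop]
    while prop in sub_of:
        prop = sub_of[prop]
        chain.append(prop)
    return chain

def query(triples, predicate, sub_of):
    """(subject, object) pairs for predicate, including statements made with its subproperties."""
    return sorted((s, o) for s, p, o in triples if predicate in superproperties(p, sub_of))
```

The work identifiers used with this sketch (for example work:Hobbit) are hypothetical.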

Page 14: Challenges for a new era


Where you start affects where you end up

O Simple metadata is more useful as output than input
O The ‘long tail’ of MARC’s lesser-used properties was built up over decades and shouldn’t be discarded
O Easier to dumb down than smarten up
O Dublin Core and MARC are examples of starting simple and trying to add on
O Distribution models are important

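The asymmetry behind "easier to dumb down than smarten up" can be shown with a hypothetical mapping from three RDA-style title properties to a single Dublin Core-style title. Going down is a simple lookup; going back up leaves several candidates and no basis in the data for choosing among them. The prefixed names are illustrative, not official URIs:

```python
# Hypothetical mapping: several rich properties collapse onto one simple one.
DUMB_DOWN = {
    "rda:titleProper": "dc:title",
    "rda:parallelTitle": "dc:title",
    "rda:variantTitle": "dc:title",
}

def dumb_down(triples, mapping):
    """Rich -> simple: a plain lookup; unmapped predicates pass through unchanged."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]

def smarten_up_candidates(simple_predicate, mapping):
    """Simple -> rich: every rich property that maps here is a candidate."""
    return sorted(p for p, simple in mapping.items() if simple == simple_predicate)
```

Once a parallel title has been flattened to a plain title, nothing in the output says which of the three it was.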

Page 15: Challenges for a new era


Values vs. Costs

O Machines cost less than people, but they can’t replace people
O Computers tend to require instructions from people to work well
O But they are more consistent than people!
O ROI culture vs. Potlatch culture

O Is ‘who pays for this?’ the right question?


Page 16: Challenges for a new era


The Management Conundrum

O Traditional ILSs haven’t worked for us for a long time
O They were built to create and manage catalog data
O We can no longer invest in the catalog paradigm
O Libraries are data builders, data managers, data distributors
O The centralized, master record model is as dead as MARC encoding

Page 17: Challenges for a new era


I Know It’s Hard

We’re getting our heads around linked data, but still can’t figure out what it means for creation and management of our data.

XML and RDF are more than different encodings; they are different views of the world.

In order to understand our challenges, we need to understand more about those views of the world.


Page 18: Challenges for a new era


Linked Data is Inherently Chaotic

O Requires creating and aggregating data in a broader context
O There is no one ‘correct’ record to be made from this, no objective ‘truth’
O This approach is different from the cataloging tradition
O BUT, the focus on vocabularies is familiar

O In the SemWeb world vocabularies are more complex than the thesauri we know


Page 19: Challenges for a new era


Model of ‘the World’: XML

O XML assumes a ‘closed’ world (domain), usually defined by a schema:
  O “We know all of the data describing this resource. The single description must be a valid document according to our schema. The data must be valid.”
O XML’s document model provides a neat equivalence to a metadata ‘record’ (and most of us are fairly comfortable with it)


Page 20: Challenges for a new era


Model of ‘the World’: RDF

O RDF assumes an ‘open’ world:
  O “There’s an infinite amount of unknown data describing this resource yet to be discovered. It will come from an infinite number of providers. There will be an infinite number of descriptions. Those descriptions must be consistent.”
O RDF’s statement-oriented data model has no notion of ‘record’ (rather, statements can be aggregated for a fuller description of a resource)

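The practical difference between the two world models shows up when asking about something the data does not mention. A simplified sketch (real RDF reasoning is richer than this; the identifiers are hypothetical): under the closed-world view a missing statement means "false", while under the open-world view it only means "not known yet", represented here as None:

```python
from typing import Optional

# Statements currently known about a resource (hypothetical identifiers).
known = {("book:1", "language", "eng")}

def closed_world_ask(facts, triple) -> bool:
    """Closed world (XML/record view): anything not in the document is false."""
    return triple in facts

def open_world_ask(facts, triple) -> Optional[bool]:
    """Open world (RDF view): absence proves nothing; None stands for 'unknown'."""
    return True if triple in facts else None
```

Asking whether book:1 is also in French gets "no" from the closed-world function and "unknown" from the open-world one; another provider may still supply that statement.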

Page 21: Challenges for a new era


The New Management Strategy

O Statement-level rather than record-level management

O Emphasis on evaluation coming in and provenance going out

O Shift in human effort from creating standard cataloging to knowledgeable human intervention in machine-based processes

O Extensive use of data created outside libraries

O Intelligent re-use of our legacy data


Page 22: Challenges for a new era

Is MARC Dead?

O The communication format is very dead (based on standards no longer updated)

O The semantics are not dead
O They represent the distillation of decades of descriptive experience
O As we move into a more machine-assisted world, our old concerns about the size of our legacy can be addressed

O Taking the legacy records with us should be based on solutions developed using open and transparent strategies

Page 23: Challenges for a new era


What’s our Distribution Model?

We don’t know what you want, so choose!

We know more about what you want than you do. Here it is!


Page 24: Challenges for a new era

Libraries as Data Publishers & Consumers

O Data from library ‘publishers’ should look like a supermarket—lots of choices, with decisions made by consumers

O Right now we seem to be operating as Soviet bakeries

O This is not what open linked data is supposed to be doing for us

O “Be conservative in what you send, liberal in what you accept” (the Robustness Principle)
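Applied to data, the Robustness Principle can be sketched with a single value type. The function names below are hypothetical, and extracting a year is only an illustration: the ingesting side tolerates the bracketed and prefixed forms common in transcribed dates, while the publishing side emits only one strict form:

```python
import re
from typing import Optional

def accept_year(raw: str) -> Optional[str]:
    """Liberal in what we accept: find a four-digit year inside transcribed variants."""
    match = re.search(r"(?<!\d)(\d{4})(?!\d)", raw)
    return match.group(1) if match else None

def send_year(year: str) -> str:
    """Conservative in what we send: emit only a bare four-digit year, or refuse."""
    if not re.fullmatch(r"\d{4}", year):
        raise ValueError(f"not a four-digit year: {year!r}")
    return year
```

For example, accept_year("c1998.") and accept_year("[2013?]") yield "1998" and "2013", while send_year refuses anything that is not exactly four digits.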

Page 25: Challenges for a new era

Our Goals as Data Publishers

O If we want people outside libraries to use our data, we need to offer them choices

O This strategy is based on mapping all of our legacy data

O Not a selection
O Filtering accomplished by data consumers, who know best what they need
O This requires active innovation and a new understanding of how to manage the data

Page 26: Challenges for a new era


Our Goals as Data Consumers

O As aggregators of relevant metadata content
O Developing methods to gather and redistribute without necessarily re-creating OCLC
O Modeling and documenting best practices in metadata creation, improvement and exposure
O Application profiles important in this effort

O As developers of vocabularies exposing a variety of bibliographic relationships

O As innovators in using social networks to enhance bibliographic description


Page 27: Challenges for a new era

Mapping Legacy Data for Re-distribution

O If we want data consumers to value our data, we should map it all

O We can distribute limited ‘flavors’ as well, as we gain experience and feedback

O Current mapping strategies are based on:
  O One-time, inflexible, programmatic methods that effectively hide the process from consumers
  O Assumptions that data must be improved at the time it is mapped, or never
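A sketch of "map it all" with selection left to consumers. It assumes a record reduced to (tag, value) pairs, a hypothetical specific mapping for two well-known MARC 21 fields (100 personal name main entry, 245 title statement), and a hypothetical generic predicate marc:fieldNNN for everything else, so long-tail fields such as 586 (Awards Note) are carried along rather than dropped:

```python
# Hypothetical predicates for a few well-known MARC 21 fields.
KNOWN = {"245": "ex:title", "100": "ex:creatorName"}

def map_all(record_id, fields, known):
    """Map every field; fields without a specific mapping get a generic, tag-based predicate."""
    return [(record_id, known.get(tag, f"marc:field{tag}"), value)
            for tag, value in fields]

def consumer_filter(triples, wanted_predicates):
    """Selection happens downstream, by the consumer, not by the publisher."""
    return [t for t in triples if t[1] in wanted_predicates]
```

The publisher decides nothing about relevance here; a consumer who only wants titles filters for ex:title, and one who cares about awards still finds the 586 data.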

Page 28: Challenges for a new era


Page 29: Challenges for a new era


Page 30: Challenges for a new era


If we don’t distribute our best data, how can anybody do cool stuff with it?

Isn’t that what we want?

We can use the cool stuff ourselves!

Page 31: Challenges for a new era


Page 32: Challenges for a new era


Page 33: Challenges for a new era


Page 34: Challenges for a new era


Harvest/Ingest Plan

O Choosing data sources
O There are known sources out there; some of them are of good quality, others are usable, with improvement
O Tools are needed to help pull data, validate it, cache it, and set it up for evaluation
O Most of these tasks can/should be set up with automated processes, with alerts to human minders when something goes wrong
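The pull, validate, cache and alert steps above might be sketched like this. Everything here is an assumption for illustration: sources are plain callables returning lists of dict records, validation is a caller-supplied predicate, and the 10% invalid-record threshold that triggers an alert is arbitrary:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class IngestResult:
    cached: Dict[str, list] = field(default_factory=dict)  # validated records, held for evaluation
    alerts: List[str] = field(default_factory=list)        # messages for human minders

def ingest(sources: Dict[str, Callable[[], list]],
           validate: Callable[[dict], bool],
           max_invalid_ratio: float = 0.1) -> IngestResult:
    """Pull from each source, validate, cache the valid records, and alert on problems."""
    result = IngestResult()
    for name, fetch in sources.items():
        try:
            records = fetch()
        except Exception as exc:  # e.g. network failure or malformed payload
            result.alerts.append(f"{name}: fetch failed ({exc})")
            continue
        if not records:
            result.alerts.append(f"{name}: no records returned")
            continue
        valid = [r for r in records if validate(r)]
        invalid_ratio = 1 - len(valid) / len(records)
        if invalid_ratio > max_invalid_ratio:
            result.alerts.append(f"{name}: {invalid_ratio:.0%} of records failed validation")
        result.cached[name] = valid
    return result
```

The process runs unattended; people only see the alerts list, which is where human judgment is needed.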

Page 35: Challenges for a new era


Page 36: Challenges for a new era


Metadata Evaluation

O Evaluation needs to scale well beyond random sampling
O Statistical and data mining tools need to be brought into the process, to provide both ‘overview’ and specifics of whole data sets

O Improvement specifications, techniques, quality criteria and tools need to be iterative, granular, and shareable

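A minimal example of evaluating a whole data set rather than a random sample, assuming records are dicts keyed by property name. The profile gives the overview (what share of records has each property); the second function drills down to the specific records behind a gap. Both function names are hypothetical:

```python
from collections import Counter

def profile(records):
    """Overview of a whole data set: the share of records that contain each property."""
    if not records:
        return {}
    counts = Counter(prop for rec in records for prop in set(rec))
    return {prop: n / len(records) for prop, n in sorted(counts.items())}

def records_missing(records, prop):
    """Specifics: positions of the records that lack a given property."""
    return [i for i, rec in enumerate(records) if prop not in rec]
```

Because the profile is just a small dictionary, it can be stored, shared, and compared between harvests.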

Page 37: Challenges for a new era


Page 38: Challenges for a new era


Testing, Monitoring & Re-evaluation

O Data will change, and processes must be able to detect that, based on data profiles
O Human intervention should be limited
O Tools need to be built so that non-programmers can run them
O Reading logs, monitoring error reports, checking results, writing specs, can/should be done by data specialists (a.k.a. catalogers w/training)

O Looking for opportunities for programmers and catalogers to learn together is essential!

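Detecting change "based on data profiles" could be as simple as comparing the share of records carrying each property between two harvests. This sketch assumes profiles are dicts mapping property names to shares between 0 and 1, and uses an arbitrary threshold of 10 percentage points; the report is plain text so a data specialist can read it without programming:

```python
def detect_drift(old_profile, new_profile, threshold=0.1):
    """Compare two data profiles and report properties whose share moved beyond the threshold."""
    report = []
    for prop in sorted(set(old_profile) | set(new_profile)):
        before = old_profile.get(prop, 0.0)
        after = new_profile.get(prop, 0.0)
        if abs(after - before) > threshold:
            report.append(f"{prop}: {before:.0%} -> {after:.0%}")
    return report
```

An empty report means the new harvest looks like the old one; a non-empty report is the point at which a person steps in.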

Page 39: Challenges for a new era


Page 40: Challenges for a new era


Re-distribution Plan

O If we improve data, we need to expose how we did it (and what we did), for the use of downstream consumers
O New metadata provenance efforts are designed to do this at the statement level
O This strategy can only exist successfully where open licenses allow innovation and wide re-use

O Ideally, distribution AND redistribution should be accomplished with Application Profiles

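Exposing "how we did it (and what we did)" at the statement level might look like the sketch below: an improved statement keeps a link to the statement it was derived from, who produced it, and a description of the process. Vocabularies such as W3C PROV-O offer properties like prov:wasDerivedFrom for this purpose; the plain dataclass here only mimics the idea, and its field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str                                  # who asserted this statement
    derived_from: Optional["Statement"] = None   # the statement this one improves
    process: Optional[str] = None                # what was done to it

def improve(st, new_obj, process, by):
    """Return an improved statement that keeps the original and records the process."""
    return Statement(st.subject, st.predicate, new_obj, by, derived_from=st, process=process)
```

A downstream consumer can follow derived_from back to the library's original value and decide whether to trust the improvement.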

Page 41: Challenges for a new era


Will This Shift Cost Too Much?

O It’s the human effort that costs us
O Cost of traditional cataloging is far too high, for increasingly dubious value
O Our current investments have reached the end of their usefulness
O All the possible efficiencies for traditional cataloging have already been accomplished
O Waiting for leadership from the big players costs us valuable time with no guarantees of results
O We need to figure out how to invest in more distributed innovation and focused collaboration


Page 42: Challenges for a new era


What About the Millions?

O Our legacy MARC data is already a ‘graph’, but the resources defined there have no Internet-resolvable identity

O But even the transcribed text can be hugely valuable, with effort and software to help
O Projects like the eXtensible Catalog have made an excellent start in demonstrating this point

O MARC 21 is already available as basic RDF


Page 43: Challenges for a new era

The Bottom Line

O Our big investment is (and has always been) in our data, not our systems
O Over many changes in format of materials, we’ve always struggled to keep our focus on the data content that endures, regardless of presentation format

O We are in a great position to have influence on how the future develops, but we can’t be afraid to change, or afraid to fail



Page 44: Challenges for a new era

Thank you! Questions?

Contact info: [email protected]

Metadata Matters:

http://managemetadata.com/blog

