Challenges for a new era
Challenges for the New Era
Diane I. Hillmann, Metadata Management Associates
Oslo, February 8, 2013
Oslo 2013 2
Big Challenges/Big Ideas
O Changing our thinking from records to statements
O Will RDA help?
O Where you start affects where you end up
O Shifting our ways from ROI to Potlatch
O Recognizing that our human resources are limited
O So how do we manage this data-that-isn’t records?
2/7/13
Statements and Records
O Records are still important, but not as we’ve used them in the past
O We might want to think about records as the instantiation of a point of view
O News: traditional library data has a point of view
O MARC required consensus because of limitations built into the technology
O Now we need provenance, so we know “Who said?”
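To make the shift from records to statements concrete, here is a minimal Python sketch, assuming each statement is a simple (subject, property, value) triple tagged with the source that asserted it. All identifiers (`ex:book1`, the `ex:` property names, the `ex:library…` sources) are hypothetical. A "record" then becomes one source's point of view selected from the pooled statements, and the source tag answers "Who said?".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    """One assertion about a resource, plus who asserted it (provenance)."""
    subject: str
    prop: str
    value: str
    source: str

# Hypothetical statements about one resource, pooled from two sources.
pool = [
    Statement("ex:book1", "ex:title", "Moby Dick", "ex:libraryA"),
    Statement("ex:book1", "ex:title", "Moby-Dick; or, The Whale", "ex:libraryB"),
    Statement("ex:book1", "ex:creator", "Melville, Herman", "ex:libraryB"),
]

def record_view(statements, subject, source):
    """A 'record' as one source's point of view: its statements about a subject."""
    view = {}
    for s in statements:
        if s.subject == subject and s.source == source:
            view.setdefault(s.prop, []).append(s.value)
    return view

def who_said(statements, subject, prop, value):
    """Answer 'Who said?' for one particular statement."""
    return sorted({s.source for s in statements
                   if (s.subject, s.prop, s.value) == (subject, prop, value)})

print(record_view(pool, "ex:book1", "ex:libraryB"))
print(who_said(pool, "ex:book1", "ex:title", "Moby Dick"))
```

Nothing is lost when two sources disagree about the title: both statements stay in the pool, and each "record" is just a query over it.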
Building RDVocab: Goals
O Bridge the XML and RDF worlds
O Ensure the ability to map between RDA and other element sets
O Provide a sound platform for extension of RDA Vocabularies into new and specialized domains
O Consider methods for expressing AACR2 structures in technical ways to ease the pain of transition to RDA
RDVocab Structure, Simplified
O RDA properties are declared in two separate hierarchies:
  O An ‘unconstrained’ vocabulary, with no explicit relationship to FRBR entities
  O A subset of classes, properties and subproperties with FRBR entities as ‘domains’
O Pros: retained usability in or out of libraries; better mapping to/from non-FRBR vocabularies
O Cons: still seems too complex to many SemWeb implementers (many using BIBO)
Why Unconstrained Properties?
O The ‘bounded’ properties should be seen as the official JSC-defined RDA Application Profile for libraries
O What’s still lacking is the addition of the necessary constraints: datatypes, cardinality, associated value vocabularies
O Extensions and mapping should be built from the unconstrained properties
O Unconstrained vocabularies are necessary for use in domains where FRBR is not assumed or is inappropriate
O Mapping from vocabularies not using the FRBR model directly to ones that do (and back) creates serious problems for the ‘Web of Data’
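One way to see why direct mapping onto FRBR-bound properties causes trouble is to apply the two RDFS entailment rules involved: `rdfs:subPropertyOf` (a statement using a subproperty also holds for its parent) and `rdfs:domain` (using a property implies the subject's class). The sketch below applies just those two rules to a toy vocabulary; the names `ex:titleProper`, `ex:titleProperManifestation` and `ex:Manifestation` are hypothetical stand-ins, not RDA Registry URIs.

```python
# Toy vocabulary (hypothetical names): an unconstrained property and a
# 'bounded' subproperty whose rdfs:domain is a FRBR entity class.
SUBPROPERTY_OF = {"ex:titleProperManifestation": "ex:titleProper"}
DOMAIN = {"ex:titleProperManifestation": "ex:Manifestation"}

def entail(triples):
    """Apply the RDFS subPropertyOf and domain rules until nothing new appears."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closure):
            new = set()
            if p in SUBPROPERTY_OF:   # (s p o), p subPropertyOf q  =>  (s q o)
                new.add((s, SUBPROPERTY_OF[p], o))
            if p in DOMAIN:           # (s p o), p domain C  =>  (s rdf:type C)
                new.add((s, "rdf:type", DOMAIN[p]))
            if not new <= closure:
                closure |= new
                changed = True
    return closure

# A resource from a non-FRBR source, mapped two different ways.
bound = entail({("ex:item42", "ex:titleProperManifestation", "Annual report")})
unbound = entail({("ex:item42", "ex:titleProper", "Annual report")})

print(("ex:item42", "rdf:type", "ex:Manifestation") in bound)    # True
print(("ex:item42", "rdf:type", "ex:Manifestation") in unbound)  # False
```

Mapping to the bounded subproperty silently types the resource as a FRBR Manifestation; mapping to the unconstrained property carries no such side effect, which is why extensions and mappings are better built from the unconstrained side.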
The Simple Case: One Property, One FRBR Entity
[Diagram: Property (generalized, no FRBR relationship); Subproperty (with relationship to one FRBR entity); FRBR Entity; layers labeled Semantic Web and Library Applications]
The Not-So-Simple Case: One Property, More than One FRBR Entity
[Diagram: Property (generalized, no FRBR relationship); two Subproperties, each with a relationship to one FRBR entity; two FRBR Entities; layers labeled Semantic Web and Library Applications]
Roles: Attributes or Properties?
O In 2005, the DC Usage Board worked with LC to build a formal representation of the MARC Relators so that these terms could be used with DC
O This work provided a template for the registration of the role terms in RDA (in Appendix I) and, by extension, the other RDA relationships
O Role and relationship properties are registered at the same level as elements, rather than as attributes (as MARC does with relators, and RDA does in its XML schemas)
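The difference between a role carried as an attribute and a role registered as a property fits in a few lines of Python. The relator codes `aut` (author) and `ill` (illustrator) come from the MARC Code List for Relators; the `ex:` property names they are mapped to, and the work and agent identifiers, are hypothetical placeholders rather than the registered RDA role properties.

```python
# Attribute style: a generic work-agent link with the role tucked inside
# as an attribute (roughly how MARC carries relator codes).
attribute_style = [
    {"work": "ex:work7", "agent": "ex:agent1", "relator": "aut"},
    {"work": "ex:work7", "agent": "ex:agent2", "relator": "ill"},
]

# Hypothetical mapping from MARC relator codes to role properties.
ROLE_PROPERTY = {"aut": "ex:author", "ill": "ex:illustrator"}

def roles_as_properties(rows, fallback="ex:contributor"):
    """Promote each role to the predicate of its own statement."""
    return [(r["work"], ROLE_PROPERTY.get(r["relator"], fallback), r["agent"])
            for r in rows]

for triple in roles_as_properties(attribute_style):
    print(triple)
```

Once the role is the predicate, a consumer can ask directly for illustrators without parsing attributes, and each role property can take part in subproperty relationships like any other element.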
Vocabulary Extension
O The inclusion of unconstrained properties provides a path for extension of RDA into specialized library communities and non-library communities
  O They may have a different notion of how FRBR ‘aggregates’ (for example, a colorized version of a film may be viewed as a separate work)
  O They may not wish to use FRBR at all
  O They may have additional, domain-specific properties to add that could benefit from a relationship to the RDA properties
[Diagram: RDA:adaptedAs hasSubproperty RDA:adaptedAsARadioScript]
Extension using Unconstrained Properties
[Diagram: RDA:adaptedAs hasSubproperty RDA:adaptedAsARadioScript and KidLit:adaptedAsAPictureBook]
Extension using Unconstrained Properties
[Diagram: RDA:adaptedAs hasSubproperty RDA:adaptedAsARadioScript, KidLit:adaptedAsAPictureBook and KidLit:adaptedAsAChapterBook]
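The extension slides show a community vocabulary (KidLit) hanging its own properties beneath RDA:adaptedAs. A minimal Python sketch of what that buys, assuming both KidLit properties sit directly under RDA:adaptedAs: a consumer that only knows RDA:adaptedAs still finds the KidLit statements by expanding its query to every subproperty, which is what a reasoner does with a transitive `rdfs:subPropertyOf`. The works and manifestations (`ex:` names) are hypothetical.

```python
# Subproperty tree from the slides; the KidLit branch is a community extension.
PARENT = {
    "RDA:adaptedAsARadioScript": "RDA:adaptedAs",
    "KidLit:adaptedAsAPictureBook": "RDA:adaptedAs",
    "KidLit:adaptedAsAChapterBook": "RDA:adaptedAs",
}

def descendants(prop):
    """Return prop plus all of its (transitive) subproperties."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

# Hypothetical statements from different communities.
data = [
    ("ex:aliceInWonderland", "KidLit:adaptedAsAPictureBook", "ex:pictureBook1"),
    ("ex:warOfTheWorlds", "RDA:adaptedAsARadioScript", "ex:radioScript1"),
    ("ex:aliceInWonderland", "ex:hasSubject", "ex:rabbits"),
]

def query(triples, prop):
    """Match statements that use prop or any subproperty of it."""
    wanted = descendants(prop)
    return [t for t in triples if t[1] in wanted]

print(query(data, "RDA:adaptedAs"))
```

The KidLit community never had to change RDA itself; declaring the subproperty link is enough for RDA-aware consumers to pick up its data.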
Where you start affects where you end up
O Simple metadata is more useful as output than input
O The ‘long tail’ of MARC’s lesser-used properties was built up over decades and shouldn’t be discarded
O Easier to dumb down than smarten up
O Dublin Core and MARC are examples of starting simple and trying to add on
O Distribution models are important
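Why is it easier to dumb down than smarten up? Mapping several specific properties onto one general one is a function; going back is not. This sketch collapses a hypothetical set of fine-grained title properties into a Dublin Core-style `dc:title`; the specific `ex:` property names are illustrative, not taken from MARC or RDA.

```python
# Hypothetical fine-grained properties collapsing onto one simple element.
DUMB_DOWN = {
    "ex:titleProper": "dc:title",
    "ex:parallelTitle": "dc:title",
    "ex:variantTitle": "dc:title",
}

def dumb_down(triples):
    """Rich -> simple: always possible, one target per source property."""
    return [(s, DUMB_DOWN.get(p, p), o) for s, p, o in triples]

def smarten_up_candidates(simple_prop):
    """Simple -> rich: every rich property that could have produced it."""
    return sorted(p for p, q in DUMB_DOWN.items() if q == simple_prop)

rich = [("ex:r1", "ex:titleProper", "Annual report"),
        ("ex:r1", "ex:parallelTitle", "Årsrapport")]
print(dumb_down(rich))
print(smarten_up_candidates("dc:title"))
```

After the first step both titles are plain `dc:title`, and nothing in the output says which one was the title proper; recovering that distinction needs information the simple form no longer carries.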
Values vs. Costs
O Machines cost less than people, but they can’t replace people
O Computers tend to require instructions from people to work well
O But they are more consistent than people!
O ROI culture vs. Potlatch culture
O Is ‘who pays for this?’ the right question?
The Management Conundrum
O Traditional ILSs haven’t worked for us for a long time
O They were built to create and manage catalog data
O We can no longer invest in the catalog paradigm
O Libraries are data builders, data managers, data distributors
O The centralized, master record model is as dead as MARC encoding
I Know It’s Hard
We’re getting our heads around linked data, but still can’t figure out what it means for creation and management of our data
XML and RDF are more than different encodings; they are different views of the world
In order to understand our challenges, we need to understand more about those views of the world
Linked Data is Inherently Chaotic
O Requires creating and aggregating data in a broader context
O There is no one ‘correct’ record to be made from this, no objective ‘truth’
O This approach is different from the cataloging tradition
O BUT, the focus on vocabularies is familiar
O In the SemWeb world, vocabularies are more complex than the thesauri we know
Model of ‘the World’ /XML
O XML assumes a 'closed' world (domain), usually defined by a schema:
  O "We know all of the data describing this resource. The single description must be a valid document according to our schema. The data must be valid."
O XML's document model provides a neat equivalence to a metadata 'record' (and most of us are fairly comfortable with it)
Model of ‘the World’ /RDF
O RDF assumes an 'open' world:
  O "There's an infinite amount of unknown data describing this resource yet to be discovered. It will come from an infinite number of providers. There will be an infinite number of descriptions. Those descriptions must be consistent."
O RDF's statement-oriented data model has no notion of 'record' (rather, statements can be aggregated for a fuller description of a resource)
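The closed-world/open-world contrast shows up most clearly in how a missing value is treated. In this sketch the XML-style check rejects a description that lacks a required element, while the RDF-style merge treats absence as "not known yet" and reports only contradictions, reflecting the requirement that descriptions be consistent. The required-element list, the sample descriptions, and the choice to treat two different values for a single-valued property as a conflict are all assumptions made for illustration.

```python
REQUIRED = {"title", "date"}   # hypothetical schema: elements every record must have
SINGLE_VALUED = {"date"}       # hypothetical: properties allowed only one value

def closed_world_valid(record):
    """XML-style check: the single document must contain every required element."""
    return REQUIRED.issubset(record)

def open_world_merge(*descriptions):
    """RDF-style pooling of descriptions from any number of providers.

    A missing property is simply not known yet; only contradictions are reported.
    """
    merged, conflicts = {}, set()
    for desc in descriptions:
        for prop, value in desc.items():
            values = merged.setdefault(prop, set())
            values.add(value)
            if prop in SINGLE_VALUED and len(values) > 1:
                conflicts.add(prop)
    return merged, sorted(conflicts)

a = {"title": "Annual report"}                 # no date: invalid as a closed-world record
b = {"date": "2012", "publisher": "ex:org1"}   # another provider fills the gap

print(closed_world_valid(a))                   # False
merged, conflicts = open_world_merge(a, b)
print(sorted(merged), conflicts)               # ['date', 'publisher', 'title'] []
```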
The New Management Strategy
O Statement level rather than record level management
O Emphasis on evaluation coming in and provenance going out
O Shift in human effort from creating standard cataloging to knowledgeable human intervention in machine-based processes
O Extensive use of data created outside libraries
O Intelligent re-use of our legacy data
Is MARC Dead?
O The communication format is very dead (based on standards no longer updated)
O The semantics are not dead
  O They represent the distillation of decades of descriptive experience
O As we move into a more machine-assisted world, our old concerns about the size of our legacy can be addressed
O Taking the legacy records with us should be based on solutions developed using open and transparent strategies
What’s our Distribution Model?
We don’t know what you want, so choose!
We know more about what you want than you do. Here it is!
Libraries as Data Publishers & Consumers
O Data from library ‘publishers’ should look like a supermarket—lots of choices, with decisions made by consumers
O Right now we seem to be operating as Soviet bakeries
O This is not what open linked data is supposed to be doing for us
O “Be conservative in what you send, liberal in what you accept” (Robustness Principle)
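A small illustration of the Robustness Principle applied to metadata values: accept dates in several common spellings, but always emit one strict form (ISO 8601). The list of accepted input formats is an assumption chosen for the sketch; a real ingest process would need its own.

```python
from datetime import datetime

# Liberal input: formats assumed for this sketch, not an exhaustive list.
INPUT_FORMATS = ["%Y-%m-%d", "%d.%m.%Y", "%d %B %Y", "%B %d, %Y"]

def normalize_date(text):
    """Accept several date spellings; emit only ISO 8601 (YYYY-MM-DD)."""
    cleaned = " ".join(text.split())          # tolerate stray whitespace
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None                               # unrecognized: leave for a human

for raw in ["2013-02-08", "08.02.2013", " 8  February 2013", "February 8, 2013", "spring 2013"]:
    print(repr(raw), "->", normalize_date(raw))
```

Ambiguous forms such as `2/7/13` are deliberately not accepted: being liberal in what you accept does not mean guessing, and a value that cannot be read safely is better passed to a person than silently misread.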
Our Goals as Data Publishers
O If we want people outside libraries to use our data, we need to offer them choices
O This strategy is based on mapping all of our legacy data
O Not a selection
O Filtering accomplished by data consumers, who know best what they need
O This requires active innovation and a new understanding of how to manage the data
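Under a "map it all, let consumers filter" strategy, the publisher emits every mapped statement and each consumer picks what it needs. A minimal consumer-side filter might look like this; the property names, values and the two consumers' wish lists are hypothetical.

```python
# Everything the publisher mapped (hypothetical properties and values).
published = [
    ("ex:r1", "ex:title", "Annual report"),
    ("ex:r1", "ex:extent", "112 p."),
    ("ex:r1", "ex:dimensions", "24 cm"),
    ("ex:r1", "ex:creator", "ex:org1"),
]

def select(statements, wanted_props):
    """Consumer-side filtering: keep only the properties this consumer needs."""
    return [t for t in statements if t[1] in wanted_props]

# A discovery application might want only titles and creators...
print(select(published, {"ex:title", "ex:creator"}))
# ...while a shelving application cares about physical details.
print(select(published, {"ex:extent", "ex:dimensions"}))
```

The selection logic lives with the consumer, so the publisher does not have to predict every use in advance.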
Our Goals as Data Consumers
O As aggregators of relevant metadata content
  O Developing methods to gather and redistribute without necessarily re-creating OCLC
  O Modeling and documenting best practices in metadata creation, improvement and exposure
  O Application profiles are important in this effort
O As developers of vocabularies exposing a variety of bibliographic relationships
O As innovators in using social networks to enhance bibliographic description
Mapping Legacy Data for Re-distribution
O If we want data consumers to value our data, we should map it all
O We can distribute limited ‘flavors’ as well, as we gain experience and feedback
O Current mapping strategies are based on:
  O One-time, inflexible, programmatic methods that effectively hide the process from consumers
  O Assumptions that data must be improved at the time it is mapped, or never
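As a contrast to one-time programmatic mappings that hide the process, a mapping can be kept as data and published alongside its output, so consumers can inspect, reuse or correct it. In this sketch the MARC 21 field/subfield pairs (245 $a title, 100 $a personal name, 260 $b publisher name) are real; the target `ex:` property names, the simplified record layout and the sample values are hypothetical.

```python
import json

# The mapping is data, not code buried in a program: it can be published,
# inspected and corrected. Target property names are hypothetical.
MAPPING = [
    {"tag": "245", "code": "a", "property": "ex:title"},
    {"tag": "100", "code": "a", "property": "ex:creatorName"},
    {"tag": "260", "code": "b", "property": "ex:publisherName"},
]

# A simplified MARC-like record (hypothetical data): (tag, [(code, value), ...]).
record = [
    ("100", [("a", "Ibsen, Henrik")]),
    ("245", [("a", "Et dukkehjem")]),
    ("260", [("a", "Oslo"), ("b", "Example Press")]),
]

def apply_mapping(record_id, fields, mapping):
    """Emit one statement per mapped subfield; report what was not mapped."""
    rules = {(m["tag"], m["code"]): m["property"] for m in mapping}
    statements, unmapped = [], []
    for tag, subfields in fields:
        for code, value in subfields:
            prop = rules.get((tag, code))
            if prop:
                statements.append((record_id, prop, value))
            else:
                unmapped.append(f"{tag}${code}")
    return statements, unmapped

statements, unmapped = apply_mapping("ex:rec1", record, MAPPING)
print(json.dumps(MAPPING, indent=2))   # publish the mapping alongside the output
print(statements)
print(unmapped)                        # unmapped subfields are visible, not silently dropped
```

Because the mapping is a separate artifact, improving it later means editing a table and re-running, rather than treating the first mapping as the only chance to get the data right.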
If we don’t distribute our best data, how can anybody do cool stuff with it?
Isn’t that what we want?
We can use the cool stuff ourselves!
Harvest/Ingest Plan
O Choosing data sources
O There are known sources out there; some of them are of good quality, others are usable with improvement
O Tools are needed to help pull data, validate it, cache it, and set it up for evaluation
O Most of these tasks can/should be set up with automated processes, with alerts to human minders when something goes wrong
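The harvest/ingest steps (pull, validate, cache, set up for evaluation, alert a human when something goes wrong) can be wired together as a small automated pipeline. This sketch keeps everything in memory; the source names, the `fetch` stand-in, the validation rule and the `alert` hook are hypothetical placeholders for real harvesting (for example over OAI-PMH) and real notification code.

```python
import hashlib
import json

def fetch(source):
    """Stand-in for a real harvester (hypothetical sources and records)."""
    samples = {
        "good-source": [{"id": "a1", "title": "Annual report"}],
        "messy-source": [{"id": "b1"}, {"id": "b2", "title": "Report"}],
    }
    if source not in samples:
        raise ConnectionError(f"cannot reach {source}")
    return samples[source]

def validate(record):
    """Minimal rule for the sketch: every record needs an id and a title."""
    return [f"missing {key}" for key in ("id", "title") if key not in record]

def harvest(sources, cache, alert):
    """Pull, validate and cache each source; alert a human instead of failing silently."""
    for source in sources:
        try:
            records = fetch(source)
        except ConnectionError as err:
            alert(source, str(err))
            continue
        problems = {}
        for rec in records:
            issues = validate(rec)
            if issues:
                problems[rec.get("id", "?")] = issues
        if problems:
            alert(source, f"{len(problems)} of {len(records)} records failed validation")
        payload = json.dumps(records, sort_keys=True)
        cache[source] = {
            "checksum": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
            "records": records,
            "problems": problems,   # kept for the evaluation step
        }

alerts, cache = [], {}
harvest(["good-source", "messy-source", "offline-source"], cache,
        lambda src, msg: alerts.append((src, msg)))
print(sorted(cache))
print(alerts)
```

The checksum gives a cheap way to notice, on the next run, that a source's data has changed; the human minder only hears about the sources that need attention.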
Metadata Evaluation
O Evaluation needs to scale well beyond random sampling
O Statistical and data mining tools need to be brought into the process, to provide both an ‘overview’ and specifics of whole data sets
O Improvement specifications, techniques, quality criteria and tools need to be iterative, granular, and shareable
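Scaling evaluation beyond random sampling can start with something as simple as profiling every record: how often each property appears and how many distinct values it has. The records below are hypothetical; a real profile would be computed over the full harvested set.

```python
from collections import Counter

def profile(records):
    """Per-property completeness and distinct-value counts over a whole data set."""
    total = len(records)
    present = Counter()
    values = {}
    for rec in records:
        for prop, value in rec.items():
            present[prop] += 1
            values.setdefault(prop, set()).add(value)
    return {
        prop: {"completeness": present[prop] / total, "distinct": len(values[prop])}
        for prop in sorted(present)
    }

records = [
    {"title": "Annual report", "language": "nor", "date": "2012"},
    {"title": "Annual report", "language": "nor"},
    {"title": "Budget", "language": "eng", "date": "2011"},
    {"title": "Minutes"},
]
for prop, stats in profile(records).items():
    print(f"{prop:10} {stats['completeness']:.0%} complete, {stats['distinct']} distinct")
```

A profile like this gives the "overview" of a data set; the specifics come from drilling into the records behind any suspicious number.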
Testing, Monitoring & Re-evaluation
O Data will change, and processes must be able to detect that, based on data profiles
O Human intervention should be limited
O Tools need to be built so that non-programmers can run them
O Reading logs, monitoring error reports, checking results and writing specs can/should be done by data specialists (a.k.a. catalogers w/training)
O Looking for opportunities for programmers and catalogers to learn together is essential!
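Detecting change "based on data profiles" can be sketched as a comparison between a stored baseline profile and the profile of the latest harvest, flagging only properties whose completeness moved by more than a tolerance. The profile shape (property mapped to a completeness fraction), the sample numbers and the 10-percentage-point threshold are assumptions for illustration; the report is meant to be readable by a data specialist, not only a programmer.

```python
def drift_report(baseline, current, tolerance=0.10):
    """Compare two {property: completeness} profiles; list changes worth a human look."""
    report = []
    for prop in sorted(baseline.keys() | current.keys()):
        before = baseline.get(prop, 0.0)
        after = current.get(prop, 0.0)
        if abs(after - before) > tolerance:
            report.append(f"{prop}: {before:.0%} -> {after:.0%}")
    return report

baseline = {"title": 1.00, "date": 0.95, "language": 0.80}
current = {"title": 1.00, "date": 0.40, "language": 0.78, "subject": 0.30}

for line in drift_report(baseline, current):
    print(line)
```

Small fluctuations (language dropping two points) stay quiet; a collapse in dates or a newly appearing property is surfaced for review.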
Re-distribution Plan
O If we improve data, we need to expose how we did it (and what we did), for the use of downstream consumers
O New metadata provenance efforts are designed to do this at the statement level
O This strategy can only exist successfully where open licenses allow innovation and wide re-use
O Ideally, distribution AND redistribution should be accomplished with Application Profiles
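Exposing "how we did it (and what we did)" at the statement level could mean that every improvement yields a new statement that carries the original value, the rule applied, and who applied it. The field names here (`derived_from`, `rule`, `agent`), the rule name and the lookup table are hypothetical and do not follow any particular provenance vocabulary; a real implementation might express the same information with a standard such as W3C PROV.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    value: str
    derived_from: Optional[str] = None   # original value, if this is an improvement
    rule: Optional[str] = None           # which improvement rule produced it
    agent: Optional[str] = None          # who ran the rule

def improve(stmt, rule_name, fn, agent):
    """Apply an improvement; keep the original value and describe what was done."""
    new_value = fn(stmt.value)
    if new_value == stmt.value:
        return stmt                      # nothing changed, nothing to report
    return Statement(stmt.subject, stmt.prop, new_value,
                     derived_from=stmt.value, rule=rule_name, agent=agent)

LANGUAGE_CODES = {"Norwegian": "nor", "English": "eng"}   # hypothetical lookup table

original = Statement("ex:r1", "ex:language", "Norwegian")
improved = improve(original, "language-name-to-code",
                   lambda v: LANGUAGE_CODES.get(v, v), "ex:libraryA")
print(asdict(improved))
```

A downstream consumer who disagrees with the rule can see exactly which statements it touched and fall back to the original values.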
Will This Shift Cost Too Much?
O It’s the human effort that costs us
O The cost of traditional cataloging is far too high, for increasingly dubious value
O Our current investments have reached the end of their usefulness
O All the possible efficiencies for traditional cataloging have already been accomplished
O Waiting for leadership from the big players costs us valuable time with no guarantee of results
O We need to figure out how to invest in more distributed innovation and focused collaboration
What About the Millions?
O Our legacy MARC data is already a ‘graph’, but the resources defined there have no internet resolvable identity
O But even the transcribed text can be hugely valuable, with effort and software to help
O Projects like the eXtensible Catalog have made an excellent start in demonstrating this point
O MARC 21 is already available as basic RDF
The Bottom Line
O Our big investment is (and has always been) in our data, not our systems
O Over many changes in format of materials, we’ve always struggled to keep our focus on the data content that endures, regardless of presentation format
O We are in a great position to have influence on how the future develops, but we can’t be afraid to change, or afraid to fail
Thank you! Questions?
Contact info: [email protected]
Metadata Matters:
http://managemetadata.com/blog