Developments in catalogues and data sharing
• How our catalogues are evolving
• Opening and sharing the data within them
• Ed Chamberlain
• Systems Development Librarian – Cambridge University Library
Systems Development Librarian at the other place
Data ‘munger’
Data consumer?
Control over data creation
Control over data consumption
Control over data environment
Control over data technology
No longer the single authority for content and data
Commercial, social and academic discovery mechanisms
Explosion of digital content
Illusion of ‘all on the web’
Studies into Google Generation / ‘Generation Y’ 1
Cambridge Arcadia IRIS report 2009 2
Preference for search engine over catalogue
Online over in-building
Trust tutors and peers over Librarian
Still respect the library ‘brand’
1) “The Google generation: the information behaviour of the researcher of the future”, Aslib Proceedings, V60, issue 4, 10.1108/00012530810887953
2) Arcadia IRIS Project report - http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf
So far …
Evolution of catalogues
Changes in exposure of data
To come? Greater sharing of data
Library data used in non-library environments
Keyword based discovery services
New ways to exploit old data
Relevancy ranking
Rich faceting
Greater linking
Search is the new browse
Repositories and archives
Is the OPAC dead?
Citations
Abstracts
Table of Contents
Tags
Public lists
Reader reviews
Dramatic growth in access points
Input from true subject specialists
o Lack of structure
o No quality control
o Compromise of sanctity?
Web scale - resource discovery concept taken further
Primo Central
Summon
Ebsco Discovery
Worldcat local
HathiTrust data can be used for full text searching of print collections
Catalogue data is now:
Consumed as keywords (not left anchored)
Faceted (not browsed)
Supplemented
Transformed
Merged
Amalgamated
Our local catalogues
National / international aggregations
Joe Public
Teenage software developer / hacker
Booksellers
Web start-ups
Search engines
Wikipedia
Other libraries
Research group website
Bibliographic data linked to many aspects of successful teaching and research
Citation lists – measure output
Shared bibliography – core of research group work
Reading lists – backbone of undergraduate teaching
High quality data needed for re-use
Not all possible whilst data resides in the library ‘silo’
“Library catalogues have imposed on them librarian or supplier-made decisions about what can/can’t be searched and in what way. Some of these decisions are limited by current cataloguing rules, but not all; often the data is recorded, but not in a usable way, or is there but isn’t tapped by the interface. For example, in most catalogues you can limit by publication type to newspapers, but you can’t limit by frequency of the issues.”
“Releasing data means that people can start to use it in the way they want to.”
Success of distributed access outside of cultural heritage
Single point of discovery?
Taxpayer generated – give it back!
Why not share?
Past few years have seen a massive release of public data in government and cultural heritage sectors
Open Government Data - http://data.gov.uk
Open Knowledge Foundation - http://okfn.org
EU Commission mandate to open data
Shared in ways for easy reuse and linking
RLUK and JISC initiative
Galleries, libraries, archives, museums
The Discovery principles propose that:
'Open metadata creates the opportunity for enhancing impact through the release of descriptive data about library, archival and museum resources. It allows such data to be made freely available and innovatively reused to serve researchers, teachers, students, service providers and the wider community in the UK and internationally.'
http://discovery.ac.uk
Why not?
WorldCat has done this for years
Schema.org microdata – some semantic structure
Use case for catalogue data in an advertising environment?
Google has taken 10% (so far)
<h1 itemprop="name">The Cambridge companion to Spenser edited by Andrew Hadfield. [electronic resource] /</h1>
<span style="display: none;" itemprop="publisher">Cambridge University Press,</span> <span style="display: none;" itemprop="datePublished">2001.</span>
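A minimal sketch, in Python, of how a consumer of this markup might pull out the itemprop values. The class name ItempropExtractor is hypothetical and only the standard library is used. It is not a full microdata parser: it ignores itemscope nesting and assumes every element is explicitly closed.

```python
from html.parser import HTMLParser


class ItempropExtractor(HTMLParser):
    """Collects the text of elements that carry an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}  # itemprop name -> collected text
        self._stack = []      # itemprop name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost enclosing itemprop, if any.
        for name in reversed(self._stack):
            if name:
                self.properties[name] = self.properties.get(name, "") + data
                break


# Markup modelled on the slide's example record.
sample = (
    '<h1 itemprop="name">The Cambridge companion to Spenser edited by '
    'Andrew Hadfield. [electronic resource] /</h1>'
    '<span style="display: none;" itemprop="publisher">Cambridge University Press,</span> '
    '<span style="display: none;" itemprop="datePublished">2001.</span>'
)

parser = ItempropExtractor()
parser.feed(sample)
parser.close()

for name, value in parser.properties.items():
    print(f"{name}: {value}")
```

The trailing cataloguing punctuation ("Press," and "2001.") comes through untouched, which is one reason a consumer may still need to clean the values.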
Application Programming Interface (API)
Layered over LMS
Expose catalogue data feeds for developers
Anyone can use them
Simple request, simple response
http://www.lib.cam.ac.uk/api
http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi?searchArg=darwin&databases=depfacaedb
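A minimal sketch of the "simple request" side, assuming only what the example URL shows: the searchArg and databases parameter names and the depfacaedb value are taken from that URL, while the function name build_search_url is hypothetical. The response format is not described here, so the sketch stops at composing the request and does not fetch or parse anything.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL on the slide.
BASE_URL = "http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi"


def build_search_url(search_arg, databases="depfacaedb"):
    """Compose a search request URL; the parameter names mirror the slide's example."""
    query = urlencode({"searchArg": search_arg, "databases": databases})
    return f"{BASE_URL}?{query}"


url = build_search_url("darwin")
print(url)
```

Using urlencode rather than string concatenation means search terms with spaces or punctuation are escaped correctly.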
COMET project
80% of CUL bib records converted to Resource Description Framework (RDF)
Enriched with direct links to the Library of Congress
Vocab in-line with British Library work
OCLC FAST and VIAF authority sources
http://data.lib.cam.ac.uk
Marc21 …
001 1000346
245 $aEarly medieval history of Kashmir : $b[with special reference to the Loharas] A.D. 1003-1171 /
DC XML …
<dc:identifier>1000346</dc:identifier>
<dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title>
RDF triples …<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171"
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> "1981" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://RDVocab.info/Elements/placeOfPublication> <http://id.loc.gov/vocabulary/countries/ii> .
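A minimal sketch of how a flat record could be written out as triples of the shape shown above. This is not the COMET conversion code: the function names are hypothetical, only Dublin Core terms are handled, and literal escaping is reduced to backslashes and double quotes.

```python
# Namespace prefixes copied from the example triples.
DCTERMS = "http://purl.org/dc/terms/"
ENTRY_BASE = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_"


def literal(value):
    """Quote a string literal (sketch: escapes backslash and double quote only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def uri(value):
    """Wrap a URI in angle brackets."""
    return f"<{value}>"


def record_to_ntriples(record_id, fields):
    """Emit one 'subject predicate object .' line per (term, object) pair."""
    subject = uri(ENTRY_BASE + record_id)
    return [f"{subject} {uri(DCTERMS + term)} {obj} ." for term, obj in fields]


triples = record_to_ntriples("1000346", [
    ("title", literal("Early medieval history of Kashmir : "
                      "[with special reference to the Loharas] A.D. 1003-1171")),
    ("identifier", literal("UkCU1000346")),
    ("issued", literal("1981")),
    ("language", uri("http://id.loc.gov/vocabulary/iso639-2/eng")),
])

for line in triples:
    print(line)
```

Note the difference between the two object types: a title is a plain string, whereas the language is a link to a Library of Congress vocabulary entry that other datasets can share.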
The Linking Open Data cloud diagram - http://richard.cyganiak.de/2007/10/lod
Wikipedia
Archives Hub
British Library BNB
British Museum
Library of Congress
LOD at Bibliothèque nationale de France
BBC Nature
University of Southampton
Open University
More data out there for cataloguers to reuse
More access points in records
Better mechanisms for record enrichment
Scope for revised cataloguing workflows
Records have a permanent identity on the web
Initial attempts with RDF
Newer lightweight formats and databases
Focus on citation metadata for the sciences
New ways for scientists to share and work with bibliography
http://openbiblio.net/
http://openbiblio.net/principles/
If developers are now consumers of our data …
Most Cambridge data could be released under a permissive license (PDDL)
Europeana Digital Library approves Creative Commons ‘Zero’ licensing of data
British Library BNB – Creative Commons ‘Zero’
OCLC looking at attribution only licensing
Move away from ‘non-commercial’ wording
Open Data Commons Public Domain Dedication and License (PDDL)
No one wants OCLC to go under (partners on COMET)
Valued partners
Focus on sharing ‘non-marc21’ formats of greater use to the non-Librarian
Vendors aim to profit from services based on data rather than data for its own sake?
Based on a 40 year old format
Based on a need to print a human readable card
Syntax, vocabulary, field names and content all intertwined
According to OCLC Research:
Only 10% of all Marc tags in Worldcat appear in 100% of all Worldcat records
65% of tags appear in less than 1% of records
AACR2 / MARC21 uses punctuation to denote content (100$d)
Mixed fields (text and numbers) (020$a)
Duplication of author name format
One hundred notes fields (or close enough)?
df100$aBradford, Gamaliel$d1863 - 1932.
<authorParsed>
<surname>Bradford</surname>
<restOfName> Gamaliel</restOfName>
<birthDate>1863</birthDate>
<birthDateNormalised>18630101</birthDateNormalised>
<deathDate>1932</deathDate>
<deathDateNormalised>19320101</deathDateNormalised>
</authorParsed>
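A minimal sketch of the kind of unpicking this example implies: splitting a 100 field on its subfield codes and punctuation to recover structured name and date values. The function name parse_author_100 is hypothetical, and padding a year with "0101" simply mirrors the normalised dates in the example. Real headings are messier (open dates, "d. 1932", "fl."), and this sketch would mishandle them.

```python
import re


def parse_author_100(field):
    """Split a MARC 100 field string such as '$aBradford, Gamaliel$d1863 - 1932.'"""
    # Each subfield is a '$', a one-character code, then text up to the next '$'.
    subfields = {
        code: value.strip()
        for code, value in re.findall(r"\$(\w)([^$]*)", field)
    }

    # $a holds 'Surname, Rest of name'; the comma is the only separator.
    surname, _, rest = subfields.get("a", "").partition(",")
    parsed = {"surname": surname.strip(), "restOfName": rest.strip()}

    # $d holds free-text dates; assume the first two 4-digit runs are birth and death.
    years = re.findall(r"\d{4}", subfields.get("d", ""))
    if years:
        parsed["birthDate"] = years[0]
        parsed["birthDateNormalised"] = years[0] + "0101"
    if len(years) > 1:
        parsed["deathDate"] = years[1]
        parsed["deathDateNormalised"] = years[1] + "0101"
    return parsed


author = parse_author_100("$aBradford, Gamaliel$d1863 - 1932.")
for key, value in author.items():
    print(f"{key}: {value}")
```

Every consumer of the raw field has to repeat this guesswork, which is the practical cost of encoding content in punctuation.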
Marc21 is binary encoded
Web-friendly standards are now the norm (XML/JSON)
Numbers for field names?
Bad character encoding allowed
LOC Bibliographic Framework Transition declares a shift away from Marc21
Is the delay in introduction of RDA until we get a ‘better container’ ?
No system vendor is going forward with Marc21
Will take 10+ years
What is to come next?
Steering for RDA and Marc replacement needs non-librarian input or ownership
Offer from NISO to take the work on
Karen Coyle criticises the Marc21 Bibliographic Framework Transition Initiative for not including museums, publishing, and IT professionals …
She argues that our data is not just for us to consume alone …
“The next data carrier for libraries needs to be developed as a truly open effort. It should be led by a neutral organization (possibly ad hoc) that can bring together the wide range of interested parties and make sure that all voices are heard. Technical development should be done by computer professionals with expertise in metadata design. The resulting system should be rigorous yet flexible enough to allow growth and specialization.”
http://kcoyle.blogspot.com/2011/08/bibliographic-framework-transition.html
It becomes (even) easier to go to Amazon
Our status as authoritative data providers will be (further) eroded
No-one will want to play with us if we cannot learn to share
http://www.discovery.ac.uk - Discovery
NGC4LIB mailing list
http://okfn.org - Open Knowledge Foundation
http://data.lib.cam.ac.uk
Ed Chamberlain
@edchamberlain
http://www.slideshare.net/EdmundChamberlain/