Online Presentation

Establishing the Connection: Creating the Linked Open British National Bibliography Neil Wilson Head of Metadata Services Online Information Conference 30 November 2011 twitter.com/#!/BLMetadata

description

Updated version of the presentation for Online 2011

Transcript of Online Presentation

Page 1: Online Presentation

Establishing the Connection: Creating the Linked Open British National Bibliography

Neil Wilson, Head of Metadata Services

Online Information Conference 30 November 2011

twitter.com/#!/BLMetadata

Page 2: Online Presentation

British Library Metadata Services: Background

Operated prior to the BL’s foundation as ‘The British National Bibliography’ (BNB) Ltd from 1950

Originally offered priced services for national & international libraries

Evolved through changes in format & delivery technologies

Offered free services from 2010 as part of the BL’s new open metadata strategy

Page 3: Online Presentation

BL Metadata Services Stakeholder Relationships

Page 4: Online Presentation

Library Sector Relevance?

Declining? Increasing?

“I did my PhD with only 12 visits to a library. That was 5 years ago; things have improved since then, now you don’t need to use one at all!”

“The release of library data offers the opportunity for it to be used in ways unthought-of by the library & information community…”

Page 5: Online Presentation

Changing Expectations: Putting Public Sector Data To Work

McKinsey forecasts that the benefits of open public data could be worth 250bn Euros

“Putting the Frontline First” required “the majority of government-published information to be reusable, linked data” by June 2011.

Public data will be released under the same open licence which enables free re-use, including commercial re-use

Page 6: Online Presentation

Library Metadata: The Promise of Linked Data

Better web integration of resources increasing visibility & reaching new users

A global pool of reusable data for organisations to add unique value

New library leadership opportunities due to persistence, stability & authority

Such benefits cross national & sectoral boundaries but require huge cultural changes

Page 7: Online Presentation

How Are We Meeting The Challenge?

Our new open metadata strategy aims to:

Enable increased innovation without unnecessary barriers

Break from library formats & use cross domain standards

Obtain attribution while offering more permissive licensing

Deliver with decreasing resources while maintaining revenue

Page 8: Online Presentation

What Have We Achieved?

Signed over 450 organisations in 71 countries to free data services

Supplied XML datasets of 3 to 15 million items under Creative Commons licences

Worked with JISC & linked open data implementers on technical, standards & licensing challenges

Created a linked data version of the British National Bibliography

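Chart: users by type. Categories: Academic, Charity, Commercial, Consortia, Government, Individual, Medical, National, Public, School. Segment values as extracted (pairing with categories not preserved): 42.7%, 1.5%, 9.9%, 1.0%, 3.5%, 6.2%, 2.0%, 4.2%, 6.7%, 22.3%.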

Page 9: Online Presentation

Our Linked Data Journey… Why the British National Bibliography?

We wanted to:

Advance debate from theory to practice via release of a critical mass of data

Show commitment by using a core dataset - niche examples are not as compelling

Create a foundational service others can build upon & not a dead end

Page 10: Online Presentation

Our Linked Data Journey… Preliminaries

We first identified:

The best licensing model for our objectives (CC0)

A proven hosting platform (Talis)

Sources of expert knowledge & feedback (e.g. W3C, Open Bibliography etc.)

…in order to concentrate effort on adding new value to our data

Page 11: Online Presentation

Our Linked Data Journey… Additional Objectives

The project would be a staff & organisational development opportunity using:

In-house personnel i.e. librarians rather than IT experts

Pre-existing tools & technologies

Library MARC21 data

Established & trusted linked resources

Page 12: Online Presentation

Our Linked Data Journey… Migrating From a Flat Catalogue Card Model…

We aimed to:

Start simple & develop in line with evolving staff expertise

Utilise staff training & mentoring from Talis in:

Linked data concepts

RDF modelling

Presentation options

… and use the opportunity to blend the best of traditional & new approaches

Page 13: Online Presentation

Our Linked Data Journey… To Something New…

Page 14: Online Presentation

Our Linked Data Journey… Selecting Sites To Link To (for mutual benefit)

To position our data in a wider context

We blended general linked resources i.e.:

GeoNames, Lexvo, RDF Book Mashup

With key linked library resources i.e.:

Dewey.info, LCSH SKOS, VIAF

Page 15: Online Presentation

Our Linked Data Journey… Matching & Generating Links

Three approaches used:

Automatic generation from data elements in records

Automated text matching with linked data resource dumps

Two stage crosswalk matching process for coded data
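The three approaches on this slide can be illustrated with a minimal Python sketch. It is not the British Library's actual code: the `example.org` URIs, the lookup tables and all function names are hypothetical stand-ins, and a real implementation would load its labels and codes from the linked data resource dumps mentioned above.

```python
import re
from typing import Optional

# Hypothetical label table standing in for a linked data resource dump.
LABEL_TO_URI = {
    "london": "http://example.org/place/london",
}

# Hypothetical two-stage crosswalk: local code -> standard code -> URI.
MARC_COUNTRY_TO_ISO = {"enk": "GB"}
ISO_TO_URI = {"GB": "http://example.org/country/GB"}


def uri_from_element(isbn: str) -> str:
    """Approach 1: generate a link directly from a data element in the record."""
    digits = re.sub(r"[^0-9Xx]", "", isbn).upper()
    return f"http://example.org/isbn/{digits}"


def normalise_label(label: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", label.lower())
    return " ".join(cleaned.split())


def uri_from_text(label: str) -> Optional[str]:
    """Approach 2: match normalised text against labels from a resource dump."""
    return LABEL_TO_URI.get(normalise_label(label))


def uri_from_code(marc_code: str) -> Optional[str]:
    """Approach 3: two-stage crosswalk for coded data."""
    iso = MARC_COUNTRY_TO_ISO.get(marc_code.strip().lower())
    return ISO_TO_URI.get(iso) if iso else None
```

Returning `None` for an unmatched label or code keeps failed matches visible, so they can be reviewed rather than silently linked to the wrong resource.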

Page 16: Online Presentation

Our Linked Data Journey… Embedding The Links

Page 17: Online Presentation

Our Linked Data Journey… The MARC to RDF XML Conversion Workflow

MARC to RDF XML conversion consists of multiple automated steps using a number of tools:

Selection: select single volume published books only from the full BNB MARC21 file

Pre-processing: normalise for improved matching & transforms

Character set conversion: convert to pre-composed UTF-8

URI generation: create BL URIs and add external URIs by matching

Data transformation: transform to RDF/XML using XSLT

The resulting BNB RDF/XML file is loaded to the Linked Data Platform and an RDF triple dump is generated
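The pre-processing stages of this workflow (selection, normalisation, character set conversion and URI creation) might look roughly like the Python sketch below. It makes several assumptions: "pre-composed" is taken to mean Unicode NFC normalisation, records are simplified to dictionaries rather than real MARC21, and the base URI, field names and function names are all hypothetical. The XSLT transformation step is not shown.

```python
import unicodedata
from urllib.parse import quote

# Hypothetical base URI; the real BL URI pattern is not given on the slide.
BASE_URI = "http://example.org/resource/"


def to_precomposed(text: str) -> str:
    """Convert decomposed characters (e + combining accent) to pre-composed form (NFC)."""
    return unicodedata.normalize("NFC", text)


def mint_uri(record_id: str) -> str:
    """Create a URI from a record identifier, percent-encoding unsafe characters."""
    return BASE_URI + quote(record_id.strip(), safe="")


def preprocess(records: list) -> list:
    """Select single volume books, normalise whitespace and characters, attach a URI."""
    prepared = []
    for record in records:
        # Selection: keep single volume published books only (flag is an assumed field).
        if not record.get("single_volume_book"):
            continue
        # Normalisation: collapse runs of whitespace to help later matching.
        title = " ".join(record["title"].split())
        prepared.append({
            "uri": mint_uri(record["id"]),
            "title": to_precomposed(title),
        })
    return prepared
```

Normalising before matching matters because two strings that look identical on screen can differ at the code point level, which would defeat the text matching used to generate links.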

Page 18: Online Presentation

Where Did We Get To?

Hosted on the Platform:

bnb.data.bl.uk/sparql

bnb.data.bl.uk/describe

bnb.data.bl.uk/search

BNB Books 1950-2011: 2.5 Million Records

80 Million Unique RDF Triples
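As an illustration of how a client could address the SPARQL endpoint listed above, the sketch below builds a query URL using the `query` parameter defined by the standard SPARQL protocol. The host and path come from the slide; the `http` scheme, the sample query and the function name are assumptions, and the service may no longer be available at this address. The code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Host and path as shown on the slide; the http scheme is an assumption.
ENDPOINT = "http://bnb.data.bl.uk/sparql"

# Generic sample query: fetch any ten triples from the store.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_query_url(sparql: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL carrying the SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


if __name__ == "__main__":
    print(build_query_url(SAMPLE_QUERY))
```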

Page 19: Online Presentation

What Does It Look Like?

Page 20: Online Presentation

Lessons Learned - It’s a new way of thinking…

Legacy data wasn’t designed for this so take care with data modelling & sustainability

Everyone is still learning so you may be the best judge

There are often tools or expertise out there so don’t reinvent the wheel

Page 21: Online Presentation

Lessons Learned – Data Issues

Offer sample access to the community for feedback

Expect criticism in addition to positive feedback & continually improve

Any conversion inevitably identifies hidden data issues…& creates new ones!

…but it’s often better to release an imperfect something than a perfect nothing!

Page 22: Online Presentation

Lessons Learned - Staff and Resource Issues

It can be a steep learning curve so:

Exploit external expertise to work with or guide your own domain experts

Cultivate a staff culture of enquiry & innovation to widen perspectives

Identify & use pre-existing tools to save development time & assist data validation

Page 23: Online Presentation

Lessons Learned – Was It Worth It?

The benefits have been significant & the initiative has:

Given us a presence without distorting revenue streams …& may even offer new options

Gained us a 1st mover advantage within our sector & advanced discussion as hoped

Shown that if you offer useful data, people will use it, with over 3 million transactions in the first 3 months

Page 24: Online Presentation

Our Linked Data Journey - Where Next?

Release of further BNB material

Refine & document the new data model

Identify further resources to link to

Monthly updates on completion

Identify what else can be offered

Page 25: Online Presentation

Final Thoughts…

It’s never going to be perfect first time

We expect to make mistakes

We aim to learn from them

We hope others will learn something too

& everyone benefits

So if anyone is thinking of undertaking a similar journey…

Just do it!

Page 26: Online Presentation

British Library Metadata Services

Images from

http://twitter.com/#!/BLMetadata
