Publishing the British National Bibliography as Linked Open Data

Corine Deliot, Metadata Standards Analyst, British Library

CIG Event, Birmingham, 25 November 2013

© The British Library Board 2013



www.bl.uk 2

Overview

• Motivations and approach

• The modelling process and the data model

• Technical process: from MARC 21 to RDF

• Linking to external datasets

• Outcomes – datasets/platform/access

• Plans for future developments

• Use of the BNB data

• Benefits

• Challenges


Motivations

• Publishing our data for others to re-use

• Looking beyond library audiences

• Taking part in the Linked Data conversation


How?

• Pragmatic, bottom-up approach

• Using existing staff

• Building on existing skills

• Using existing tools as much as possible


Why BNB?

• General bibliography - not a unique institutional catalogue

• Consistent format - over 60 years

• Size & range of content - 3 million records on all subjects in many languages

• Control of metadata – publishable as CC0.

© Waldir / Wikimedia Commons / CC BY-SA 3.0. Usage terms: http://creativecommons.org/licenses/by-sa/3.0/


The modelling process (I)

• Identify our objects of interest, i.e. what the MARC record says about “things in the world”

e.g. bibliographic resources, people, organizations, places, subjects, etc.

• Assign URIs to identify these objects of interest


URIs: Things to think about

• Create our own URIs or use existing ones? e.g. http://viaf.org/viaf/96994048 or http://id.loc.gov/authorities/names/n78095332

• Create opaque or transparent URIs? e.g. http://viaf.org/viaf/96994048 (opaque) or http://dbpedia.org/resource/William_Shakespeare (transparent)

• What pattern? URI pattern guidance from the UK Cabinet Office: “Designing URI Sets for the UK Public Sector”

• Create valid, i.e. syntax-conformant, URIs


URI patterns

• http://bnb.data.bl.uk/id/resource/{control-number}

• http://bnb.data.bl.uk/id/resource/{BNB-number}

• http://bnb.data.bl.uk/id/person/{person-name}

• http://bnb.data.bl.uk/id/organization/{organization-name}

• http://bnb.data.bl.uk/id/concept/lcsh/{topic}

• http://bnb.data.bl.uk/id/concept/ddc/{edition-number}/{dewey-number}
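To make the patterns above concrete, here is a minimal Python sketch that fills them in. The slug rule (strip diacritics, lower-case, replace every run of other characters with a hyphen) and all function names are hypothetical assumptions for illustration, not the rules the British Library actually used. Restricting slugs to a-z, 0-9 and "-" is one simple way of guaranteeing syntax-valid URIs.

```python
import re
import unicodedata
from urllib.parse import quote

BASE = "http://bnb.data.bl.uk/id"

def slugify(label: str) -> str:
    """Hypothetical slug rule: strip diacritics, lower-case, hyphenate."""
    # Decompose accented letters (NFD) and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", label)
    base_letters = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace each run of characters outside a-z/0-9 with a single hyphen,
    # so the result only contains unreserved (URI-safe) characters.
    return re.sub(r"[^a-z0-9]+", "-", base_letters.lower()).strip("-")

def resource_uri(number: str) -> str:
    # Control numbers / BNB numbers are used as-is; quote() guards odd input.
    return f"{BASE}/resource/{quote(number, safe='')}"

def person_uri(name: str) -> str:
    return f"{BASE}/person/{slugify(name)}"

def organization_uri(name: str) -> str:
    return f"{BASE}/organization/{slugify(name)}"

def lcsh_uri(topic: str) -> str:
    return f"{BASE}/concept/lcsh/{slugify(topic)}"

def ddc_uri(edition: int, dewey_number: str) -> str:
    # "." is unreserved, so a class number such as 823.914 needs no escaping.
    return f"{BASE}/concept/ddc/{edition}/{quote(dewey_number, safe='')}"

print(person_uri("Brontë, Charlotte"))  # http://bnb.data.bl.uk/id/person/bronte-charlotte
```

A rule like this makes URIs predictable, but because they are derived from strings, any change to a heading also changes the URI.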


URI patterns

• http://bnb.data.bl.uk/id/resource/008043929

• http://bnb.data.bl.uk/doc/resource/008043929

• http://bnb.data.bl.uk/doc/resource/008043929.rdf

• http://bnb.data.bl.uk/doc/resource/008043929.ttl

• http://bnb.data.bl.uk/doc/resource/008043929.json

• http://bnb.data.bl.uk/doc/resource/008043929.html
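The slide distinguishes the URI of the thing itself (/id/) from the URIs of documents describing it (/doc/), with a file extension selecting the serialization. The sketch below derives the document URLs from an /id/ URI. The media-type-to-extension table is an assumption based on common media type registrations, and the sketch says nothing about how the platform itself redirects or negotiates content.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumed mapping from media types to the extensions shown on the slide.
EXTENSIONS = {
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
    "application/json": "json",
    "text/html": "html",
}

def doc_uri(id_uri: str, media_type: Optional[str] = None) -> str:
    """Map an /id/ URI (the thing) to the /doc/ URI of a document about it."""
    scheme, netloc, path, query, fragment = urlsplit(id_uri)
    if not path.startswith("/id/"):
        raise ValueError(f"not an /id/ URI: {id_uri}")
    doc_path = "/doc/" + path[len("/id/"):]
    if media_type is not None:
        # An explicit extension pins the serialization.
        doc_path += "." + EXTENSIONS[media_type]
    return urlunsplit((scheme, netloc, doc_path, query, fragment))

thing = "http://bnb.data.bl.uk/id/resource/008043929"
print(doc_uri(thing, "text/turtle"))  # http://bnb.data.bl.uk/doc/resource/008043929.ttl
```

Keeping the two URI spaces apart avoids confusing statements about a book (its creator, its subject) with statements about a web page (its format, its last update).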


The modelling process (II)

• Describe these objects of interest, i.e. use classes

• and how they relate to each other, i.e. use properties

Use classes and properties from existing RDF vocabularies

Define our own classes and properties when required; documented in the British Library Terms RDF schema


RDF Vocabularies

• Bibliographic Ontology

• Bio: a Vocabulary for Biographical Information

• British Library Terms

• Dublin Core

• Event Ontology

• FOAF: Friend of a Friend

• ISBD

• Org: an Organisation Ontology

• OWL

• RDA

• RDF

• RDF Schema

• SKOS

• WGS84 Geo Positioning


RDF Vocabularies

• Bibliographic Resource: Dublin Core; Bibliographic Ontology; ISBD; British Library Terms

• Event: Event Ontology; British Library Terms

• Person/Organization: FOAF: Friend of a Friend; Bio: a Vocabulary for Biographical Information; Org: an Organisation Ontology; RDA

• Place: WGS84 Geo Positioning

• Concept: SKOS; British Library Terms

• RDF; RDF Schema; OWL


The British Library Terms RDF Schema

@prefix blt:<http://www.bl.uk/schemas/bibliographic/blterms#> .

• Existing property not quite right (e.g. not granular enough)

e.g. dcterms:identifier vs blt:bnb



• Property or class required by a specific feature of the model

e.g. blt:publication and blt:PublicationEvent (rdfs:subClassOf event:Event)



• For pragmatic reasons, e.g. to facilitate searching and navigating through the graph

e.g. blt:TopicLCSH and blt:TopicDDC

e.g. blt:hasCreated owl:inverseOf dcterms:creator
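Declaring blt:hasCreated as owl:inverseOf dcterms:creator means that every “resource dcterms:creator person” statement implies “person blt:hasCreated resource”, so a client can navigate from a person to their works directly. Below is a minimal sketch of materialising those inverse triples, using plain Python tuples of prefixed names instead of an RDF library. The example identifiers are hypothetical, and the slides do not say whether the BNB materialises the inverses in the published data or leaves this to reasoners.

```python
Triple = tuple[str, str, str]

# Declared inverse pair from the BL Terms schema: blt:hasCreated owl:inverseOf dcterms:creator.
INVERSES = {"dcterms:creator": "blt:hasCreated"}

def add_inverses(triples: set[Triple], inverses: dict[str, str]) -> set[Triple]:
    """Return the triples plus the inverse of each one whose predicate has a declared inverse."""
    # owl:inverseOf works in both directions, so index the pair both ways.
    both_ways = {**inverses, **{inv: prop for prop, inv in inverses.items()}}
    result = set(triples)
    for subject, predicate, obj in triples:
        inverse = both_ways.get(predicate)
        if inverse is not None:
            result.add((obj, inverse, subject))
    return result

data = {("ex:book1", "dcterms:creator", "ex:author1")}
print(add_inverses(data, INVERSES))
```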


The BNB data model - Books

http://www.bl.uk/bibliographic/pdfs/bldatamodelbook.pdf


Data Model Features (I): the Bibliographic Resource


Data Model Features (II): Publication as an event

Usual approach

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<BibResource> dc:publisher "Publisher" ;
    dcterms:issued "Date" ;
    ?:placeOfPublication "Place" .

Event-based approach

@prefix blt: <http://www.bl.uk/schemas/bibliographic/blterms#> .
@prefix event: <http://purl.org/NET/c4dm/event.owl#> .

<BibResource> blt:publication <PublicationEvent> .
<PublicationEvent> event:place <Place> ;
    event:agent <Publisher> ;
    event:time <Year> .
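To show how the event-based pattern restructures the same information, the sketch below turns a flat publication statement into blt:publication and event:* triples, with the event node typed as blt:PublicationEvent. It assumes the publisher, place and year have already been resolved to URIs (which, when starting from transcribed text, is the hard part), and the rule for naming the event node is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PublicationStatement:
    """Publication details already resolved to URIs (an assumption of this sketch)."""
    resource: str
    publisher: str
    place: str
    year: str

def to_event_triples(pub: PublicationStatement) -> list[tuple[str, str, str]]:
    """Rewrite a flat publication statement as an event-based set of triples."""
    # Hypothetical naming rule for the event node; the BNB's actual rule may differ.
    event = pub.resource + "/publicationevent"
    return [
        (pub.resource, "blt:publication", event),
        (event, "rdf:type", "blt:PublicationEvent"),
        (event, "event:place", pub.place),
        (event, "event:agent", pub.publisher),
        (event, "event:time", pub.year),
    ]

stmt = PublicationStatement("ex:book1", "ex:publisher1", "ex:london", "ex:year2013")
for triple in to_event_triples(stmt):
    print(*triple)
```

Making publication a node of its own lets place, agent and time each be linked to other data, at the cost of one extra node per resource.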


Data model features (III)

• Birth and death are modelled as biographical events

• Extensive use of foaf:focus to relate “things in the world” (e.g. people, organizations, places) to their SKOS concepts.

e.g. “London”, the capital of England and the UK, as a single “thing in the world”, may be the “focus” of multiple concepts belonging to different concept schemes, e.g. thesauri (LCSH, RAMEAU, etc.)

<Thing-as-Concept> foaf:focus <Thing in the World> .

http://efoundations.typepad.com/efoundations/2011/09/things-their-conceptualisations-skos-foaffocus-modelling-choices.html by Pete Johnston
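Because several concepts can share one foaf:focus, a consumer can find equivalent headings across schemes by grouping concepts on their focus. A minimal sketch, in which the scheme prefixes, labels and place identifiers are invented for illustration:

```python
from collections import defaultdict

def concepts_by_focus(focus_triples):
    """Index SKOS concepts by the real-world thing they are about (foaf:focus)."""
    index = defaultdict(set)
    for concept, predicate, thing in focus_triples:
        if predicate == "foaf:focus":
            index[thing].add(concept)
    return dict(index)

def equivalent_concepts(concept, focus_triples):
    """Concepts from any scheme that share a focus with the given concept."""
    related = set()
    for concepts in concepts_by_focus(focus_triples).values():
        if concept in concepts:
            related |= concepts - {concept}
    return related

triples = [
    ("lcsh:London (England)", "foaf:focus", "place:London"),
    ("rameau:Londres (Angleterre)", "foaf:focus", "place:London"),
    ("lcsh:Paris (France)", "foaf:focus", "place:Paris"),
]
print(equivalent_concepts("lcsh:London (England)", triples))  # {'rameau:Londres (Angleterre)'}
```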


MARC to RDF Conversion Workflow

Full BNB MARC21 file → Select records → MARC pre-processing (convert to pre-composed UTF-8; normalise for improved matching & transforms; create BL URIs and add external URIs by matching) → Transform to RDF/XML using XSLT → BNB RDF/XML file → Load to Linked Data Platform / Generate RDF triple dump → Load to BL Downloads page

Process

• Selection

• Character set conversion

• Pre-processing

• URI generation

• Data transformation

• Create & load triples

• Produce VoID descriptions

Tools

• Catalogue Bridge Utilities

• MARC Global/MARC Report http://www.marcofquality.com/

• Jena Eyeball http://jena.sourceforge.net/Eyeball/
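The pre-processing steps “convert to pre-composed UTF-8” and “normalise for improved matching” can be illustrated with Python's standard unicodedata module. Unicode Normalization Form C (NFC) replaces a base letter followed by a combining mark with the single pre-composed character. The matching key below (case-folding, dropping trailing ISBD-style punctuation, squeezing spaces) is an assumed example of normalisation, not the BL's actual rules, and conversion from other MARC character sets is left to dedicated MARC tools.

```python
import re
import unicodedata

def to_precomposed(text: str) -> str:
    """Character-set step: NFC merges base letter + combining mark into one code point."""
    return unicodedata.normalize("NFC", text)

def match_key(text: str) -> str:
    """Assumed normalisation for matching: NFC, case-fold, trim trailing ISBD punctuation."""
    key = to_precomposed(text).casefold()
    # Drop punctuation such as " :" or "," left at the end of transcribed fields.
    key = re.sub(r"[\s.,:;/=]+$", "", key)
    # Collapse internal runs of whitespace.
    return re.sub(r"\s+", " ", key).strip()

decomposed = "Bronte\u0308"  # "e" followed by COMBINING DIAERESIS
print(len(decomposed), len(to_precomposed(decomposed)))  # 7 6
```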


Linking to external sources (I)

To give our data broader context we linked to:

• General resources: GeoNames, Lexvo, RDF Book Mashup

• Library resources: LCSH, VIAF, Dewey.info, MARC language and country codes


Linking to external sources (II)

Techniques included:

• Automatic generation from record data

• Auto text match with linked data dumps

• Crosswalk matching for coded data

© Silverspoon / Wikimedia Commons / CC BY-SA 3.0. Usage terms: http://creativecommons.org/licenses/by-sa/3.0/
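Crosswalk matching for coded data can be as simple as a lookup table. For instance, the language code in MARC 008/35-37 uses MARC language codes (largely the ISO 639-2/B codes), some of which differ from the ISO 639-3 codes in Lexvo URIs. The sketch below assumes Lexvo's http://lexvo.org/id/iso639-3/{code} pattern and the id.loc.gov MARC languages vocabulary at http://id.loc.gov/vocabulary/languages/{code}; the table is a small illustrative subset, not a complete crosswalk.

```python
# Illustrative subset: MARC language code -> ISO 639-3 code.
MARC_TO_ISO639_3 = {
    "eng": "eng",
    "fre": "fra",
    "ger": "deu",
    "wel": "cym",
    "chi": "zho",
}

LEXVO_BASE = "http://lexvo.org/id/iso639-3/"          # assumed URI pattern
MARC_LANG_BASE = "http://id.loc.gov/vocabulary/languages/"  # assumed URI pattern

def language_links(marc_code: str) -> list[str]:
    """Return external URIs for a MARC language code (empty list if unmapped)."""
    code = marc_code.strip().lower()
    iso = MARC_TO_ISO639_3.get(code)
    if iso is None:
        return []
    return [MARC_LANG_BASE + code, LEXVO_BASE + iso]

print(language_links("fre"))
```

Unlike text matching, a crosswalk on controlled codes is deterministic, which is why it suits coded fields such as language and country.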


Outcomes

• Two datasets – Books and Serials – and their VoID descriptions, accessible at:

  • BNB Linked Data platform: http://bnb.data.bl.uk

  • SPARQL endpoint: http://bnb.data.bl.uk/sparql

  • SPARQL editor: http://bnb.data.bl.uk/flint

  • Bulk downloads: http://www.bl.uk/bibliographic/download.html

• Updated monthly

• Serializations available: RDF/XML, N-Triples

“Linking Open Data cloud diagram”, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/

Usage terms: http://creativecommons.org/licenses/by-sa/3.0/
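A minimal sketch of how a client might query the SPARQL endpoint listed above, using only the Python standard library and the SPARQL 1.1 Protocol convention of passing the query in a `query` parameter. It only builds the request; the query assumes BNB resources carry blt:bnb and dct:title, and the endpoint address may have changed since these slides were written.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://bnb.data.bl.uk/sparql"

# Assumed shape of the data: resources with a BNB number and a title.
QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX blt: <http://www.bl.uk/schemas/bibliographic/blterms#>
SELECT ?book ?title WHERE {
  ?book blt:bnb ?bnb ;
        dct:title ?title .
} LIMIT 10
"""

def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = build_request(QUERY)
print(req.get_method(), req.full_url[:60])
```

Sending it would be a matter of `urllib.request.urlopen(req)` followed by `json.load` on the response.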

[Screenshots: the BNB Linked Data platform (http://bnb.data.bl.uk), the Flint SPARQL editor (http://bnb.data.bl.uk/flint) and the bulk downloads page (http://www.bl.uk/bibliographic/download.html)]

Platform change

• 2011 – initial Talis platform

• 2013 – data migration to TSO platform: http://www.tso.co.uk/our-expertise/technology/openup-platform

Tendering process; migration of data and services over a couple of months


Plans for Future Developments

• Refine and extend the model

• Investigate FRBR-ization

• Link to other external sources:

  • GeoNames at city level

  • ISNI, LC/NACO, DBpedia

  • DNB bibliographic resources

• Expand scope beyond current BNB

• Improve developer support


Use of the BNB data

• Statistics, e.g. number of hits on the SPARQL endpoint, number of downloads from the BL web page

• BNB data used in pilot projects, e.g. Linked Open BNB data used as test data for a semantic search demonstrator

• Anecdotal evidence

• Use is difficult to assess; this is part and parcel of the data being open and available for all to use.


Benefits of Linked Open Data

• We have learnt a lot about the practical aspects of working with linked data.

• The data model has attracted some attention:

  • re-used by the Danish Bibliographic Centre (DBC)

  • Stanford Linked Data Workshop Technology Plan: “…ensure resulting model retains the BL’s high-level focus and its web derived, transparent structure for representing facts about people, organizations, places, events, and topics”

• LOD raised the Library’s profile internally and externally

• LOD helped us focus our legacy data enhancement activities


Challenges

Converting MARC data into RDF!

• Publication event approach: transforming transcribed text into data

• URI creation from strings may result in duplication; changes over time may also produce duplication

• Legacy data issues, e.g. inconsistent data, or cataloguers having used inadequate input tools for diacritics

• This is (relatively) new, nobody has all the answers
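The duplication problem with string-derived URIs is easy to reproduce: if a heading gains a death date, a slug built from it changes, and the same person ends up with two URIs. The sketch below, using a hypothetical slug rule and invented headings, shows the effect and one heuristic for flagging candidates for human review (grouping headings that differ only in a trailing date range). It is not a reliable merge rule, since different people can share a name.

```python
import re
from collections import defaultdict

def slug(heading: str) -> str:
    """Hypothetical slug rule: lower-case and hyphenate non-alphanumerics."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")

def candidate_duplicates(headings: list[str]) -> dict[str, set[str]]:
    """Group slugs whose headings differ only in a trailing date range (for review)."""
    groups = defaultdict(set)
    for heading in headings:
        # Strip a trailing ", 1950-" or ", 1950-2013" style date range.
        name_only = re.sub(r",?\s*\d{4}-(\d{4})?\.?$", "", heading)
        groups[slug(name_only)].add(slug(heading))
    return {name: slugs for name, slugs in groups.items() if len(slugs) > 1}

headings = ["Smith, John, 1950-", "Smith, John, 1950-2013", "Jones, Mary"]
print(candidate_duplicates(headings))
```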


For further information

http://bnb.data.bl.uk

http://www.bl.uk/bibliographic/datafree.html

Thank you.

Questions?

[email protected]

http://twitter.com/#!/BLMetadata