Do the LOCAH-Motion: How to Make Bibliographic and Archival Linked Data
www.bath.ac.uk
UKOLN is supported by:
Do the LOCAH-Motion: How to Make Archival and Bibliographic Linked Data
16th February 2011
Dev8D, University of London, UK
Adrian Stevenson
LOCAH Project Manager
What is the LOCAH Project?
• Linked Open Copac and Archives Hub
• Funded by #JiscEXPO 2/10 ‘Expose’ call
• 1 year project. Started August 2010
• Partners & Consultants:
– UKOLN – Adrian Stevenson, Julian Cheal
– Mimas – Jane Stevenson, Bethan Ruddock, Yogesh Patel
– Eduserv – Pete Johnston
– Talis – Leigh Dodds, Tim Hodson
– OCLC – Ralph LeVan, Thom Hickey
– Ed Summers
• http://blogs.ukoln.ac.uk/locah/ tag: #locah
What are the Archives Hub and Copac?
• The Archives Hub is an aggregation of archival descriptions from archive repositories across the UK - http://archiveshub.ac.uk
• Copac provides access to the merged library catalogues of libraries throughout the UK, including all national libraries - http://copac.ac.uk
What is Linked Data?
• URIs
• LD Design Issues
• Triples
http://www.w3.org/DesignIssues/LinkedData.html
What does Linked Data Offer?
• Haven’t we been putting linked data on the web for years?
– In CSV, relational databases, XML etc?
• Well yes, but these approaches are not easy to integrate
• Web 2.0 mashups work against a fixed set of data sources
• Linked Data applications operate on top of an unbound, global data space.
What is LOCAH Doing?
• Part 1: Exposing the Linked Data
• Part 2: Creating a prototype visualisation
• Part 3: Reporting on opportunities and barriers
How are we Exposing the LOCAH Linked Data?
1. Model our ‘things’ into RDF
2. Transform the existing data into RDF/XML
3. Enhance the data
4. Load the RDF/XML into a triple store
5. Create Linked Data Views
6. Document the process, opportunities and barriers on LOCAH Blog
1. Modelling ‘things’ into RDF
• Archives Hub data in ‘Encoded Archival Description’ (EAD) XML form
• Copac data in ‘Metadata Object Description Schema’ (MODS) XML form
• Take a step back from the data format
– Think about your ‘things’
– What is the EAD document “saying” about “things in the world”?
– What questions do we want to answer about those “things”?
• Can help make data more user-centric
http://www.loc.gov/ead/ http://www.loc.gov/standards/mods/
Triples
• Thinking falls naturally into ‘triple’ statements
– ‘Things’ have ‘properties’ with ‘values’
– Subject – Predicate – Object
• Triples are the basis of RDF
• More on all this at http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/
[Diagram: example triple, Repository – Provides Access To – ArchivalResource]
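As a rough illustration of the subject – predicate – object idea, the diagram's example can be written as a single statement. This is only a sketch: the three URIs are made up for illustration and are not the project's real identifiers, and the output uses N-Triples syntax simply because it is the easiest RDF serialisation to print by hand.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a subject, a predicate and an object, all URIs here."""
    subject: str
    predicate: str
    obj: str


# Hypothetical URIs, for illustration only.
REPOSITORY = "http://example.ac.uk/id/repository/gb1086"
RESOURCE = "http://example.ac.uk/id/archivalresource/gb1086skinner"
PROVIDES_ACCESS_TO = "http://example.ac.uk/def/providesAccessTo"

statement = Triple(REPOSITORY, PROVIDES_ACCESS_TO, RESOURCE)


def to_ntriples(triple: Triple) -> str:
    """Serialise a URI-only triple as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


print(to_ntriples(statement))
```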
Data Modelling Challenges
• Archival description is hierarchical and multi-level
• Information is provided about an aggregation of records, and then about its component parts
• Multi-level approach gives a strong sense of “context”
– “lower level” units interpreted in context of the higher levels of description
– Arguably “incomplete” without the contextual data
• Linked Data involves ‘bounded’ descriptions
– Relations are asserted, e.g. member-of/component-of
– But there is no requirement or expectation that data consumers will follow the links describing the relations
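To make the "context" point concrete, here is a small sketch of what a consumer has to do to recover the higher levels of description: follow the partOf links upwards. The unit names and the dictionary are invented for illustration; in the real data these would be URIs and the links would come from the triple store.

```python
# Hypothetical partOf links: an item inside a series inside a fonds.
PART_OF = {
    "item-1": "series-a",
    "series-a": "fonds-x",
}


def context_chain(unit: str, part_of: dict) -> list:
    """Return the ancestors of a unit, nearest first, by following partOf links."""
    chain = []
    seen = {unit}
    while unit in part_of:
        unit = part_of[unit]
        if unit in seen:
            # Guard against bad data that links a unit back to itself.
            raise ValueError("cycle in partOf links at " + unit)
        seen.add(unit)
        chain.append(unit)
    return chain


print(context_chain("item-1", PART_OF))
```

A consumer that reads only the bounded description of item-1 sees the partOf assertion but none of the series or fonds description; it gets that context only if it chooses to walk the chain.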
Data Modelling Challenges
• Hub: inconsistencies in data and lack of standardisation
– there's actually no content standard in the UK
• Copac: not a standard library catalogue
– merged catalogues, with de-duplication to an extent, but it cannot be done entirely
1. Modelling ‘things’ into RDF
• Decide on patterns for URIs we generate
• Following guidance from W3C ‘Cool URIs for the Semantic Web’ and UK Cabinet Office ‘Designing URI Sets for the UK Public Sector’
– E.g. http://example.ac.uk/id/findingaid/gb1086skinner ‘thing’ URI
– Use HTTP 303 ‘See Other’ to redirect to …
– E.g. http://example.ac.uk/doc/id/findingaid/gb1086skinner doc URI
– Content negotiates to …
– http://example.ac.uk/doc/…/doc.rdf , …/doc.html for documents about things
– More info at http://blogs.ukoln.ac.uk/locah/2010/11/16/identifying-the-things-uri-patterns-for-the-hub-linked-data/
http://www.w3.org/TR/cooluris/
http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector
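The redirect and content negotiation steps above can be sketched roughly as follows. This is not the project's server code: the function names are invented, the mapping from ‘thing’ URI to doc URI simply copies the example URIs on the slide, and the Accept header handling is a simplified reading of HTTP content negotiation that only knows about RDF/XML and HTML.

```python
from urllib.parse import urlsplit, urlunsplit


def doc_uri_for(thing_uri: str) -> str:
    """Map a 'thing' URI to its document URI by prefixing the path with /doc."""
    parts = urlsplit(thing_uri)
    if not parts.path.startswith("/id/"):
        raise ValueError("not a 'thing' URI: " + thing_uri)
    return urlunsplit((parts.scheme, parts.netloc, "/doc" + parts.path, "", ""))


def see_other(thing_uri: str):
    """Return the status code and headers for a 303 'See Other' response."""
    return 303, {"Location": doc_uri_for(thing_uri)}


def choose_format(accept: str) -> str:
    """Pick 'rdf' or 'html' from an Accept header; fall back to 'html'."""
    supported = {"application/rdf+xml": "rdf", "text/html": "html"}
    best, best_q = "html", 0.0
    for item in accept.split(","):
        fields = [field.strip() for field in item.split(";")]
        media, q = fields[0], 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        # Keep the supported media type with the highest q-value.
        if media in supported and q > best_q:
            best, best_q = supported[media], q
    return best


status, headers = see_other("http://example.ac.uk/id/findingaid/gb1086skinner")
print(status, headers["Location"], choose_format("application/rdf+xml"))
```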
1. Modelling ‘things’ into RDF
• Using existing RDF vocabularies:
– DC, SKOS, FOAF, BIBO, WGS84 Geo, Lexvo, ORE, LODE, Event and Time Ontologies
• Define additional RDF terms where required
– FindingAid
– ArchivalResource
– maintenanceAgency
• It can be hard to know where to look for vocabs and ontologies
• Decide on license – CC0, ODC PDDL
[Diagram: Archives Hub Model (as at 14/2/2011)]
Entities shown: ArchivalResource, Finding Aid, EAD Document, Biographical History, Agent, Family, Person, Organisation, Repository (Agent), Place, Concept, Genre, Function, ConceptScheme, Book, Language, Level, Object, PostcodeUnit, Extent, Creation, Birth, Death, TemporalEntity
Relationships shown: maintainedBy/maintains, origination, associatedWith, accessProvidedBy/providesAccessTo, administeredBy/administers, topic/page, hasPart/partOf, encodedAs/encodes, hasBiogHist/isBiogHistFor, foaf:focus, is-a, level, language, inScheme, representedBy, extent, participates in, at time, product of, in
[Diagram: Copac Model (as at November 2010 – work in progress)]
Feedback Requested!
• We would like feedback on the model
• Appreciate this will be easier when the data is available
• Via blog
– http://blogs.ukoln.ac.uk/locah/2010/09/28/model-a-first-cut/
– http://blogs.ukoln.ac.uk/locah/2010/11/08/some-more-things-some-extensions-to-the-hub-model/
– http://blogs.ukoln.ac.uk/locah/2010/10/07/modelling-copac-data/
• Via email, twitter, in person at Dev8d
2. Transforming into RDF/XML
• Need to transform data in EAD and MODS to RDF/XML, based on our models
• For Hub data, created an XSLT stylesheet and used the Saxon XSLT processor
– http://saxon.sourceforge.net/
– Saxon runs the XSLT against a set of EAD files and creates a set of RDF/XML files
• For Copac data, created an in-house Java transformation program
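To give a feel for the kind of mapping the transformation performs, here is a minimal sketch in Python. It is a stand-in, not the project's XSLT stylesheet or its Java program: it uses the standard library's ElementTree rather than XSLT, the EAD fragment is a made-up sample, the URI base is hypothetical, and mapping unittitle to dcterms:title is an assumption chosen only because DC is among the vocabularies listed.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dcterms", DCT_NS)

# Made-up EAD fragment, for illustration only.
EAD_SAMPLE = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid>gb1086skinner</unitid>
      <unittitle>Example papers</unittitle>
    </did>
  </archdesc>
</ead>"""


def ead_to_rdfxml(ead_xml: str,
                  base: str = "http://example.ac.uk/id/archivalresource/") -> str:
    """Turn the top-level unitid and unittitle of an EAD file into RDF/XML."""
    did = ET.fromstring(ead_xml).find("./archdesc/did")
    unit_id = did.findtext("unitid").strip()
    title = did.findtext("unittitle").strip()

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # One rdf:Description per archival resource, identified by a minted URI.
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": base + unit_id}
    )
    ET.SubElement(description, f"{{{DCT_NS}}}title").text = title
    return ET.tostring(root, encoding="unicode")


print(ead_to_rdfxml(EAD_SAMPLE))
```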
3. Enhancing our data
• Already have some links:
– lexvo.org URIs for languages of archival materials
– reference.data.gov.uk URIs for time periods
– URIs for postcodes, using both UK Postcodes URIs and Ordnance Survey URIs
• Currently also looking at:
– Virtual International Authority File
• Matches and links widely-used authority files - http://viaf.org/
– Library of Congress Subject Headings
– DBpedia
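The language links are the simplest kind of enhancement, because the target URI can be built directly from a code already in the data. A minimal sketch, assuming the code is already a three-letter ISO 639-3 code (codes taken from EAD may follow ISO 639-2 instead and would need mapping first, which is not shown); the function name is invented:

```python
LEXVO_BASE = "http://lexvo.org/id/iso639-3/"


def lexvo_uri(code: str) -> str:
    """Build a Lexvo language URI from a three-letter ISO 639-3 code."""
    code = code.strip().lower()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected a three-letter language code, got " + repr(code))
    return LEXVO_BASE + code


print(lexvo_uri("eng"))
```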
4. Load the RDF/XML into a triple store
• Using the Talis Platform triple store
• RDF/XML is HTTP POSTed
• We’re using Pynappl – Python client for the Talis Platform
– http://code.google.com/p/pynappl/
• Store provides us with a SPARQL query interface
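For readers who have not loaded a store this way, the sketch below shows what "HTTP POSTed" amounts to, using only the Python standard library. It does not use Pynappl and does not reflect the Talis Platform's actual API: the store URL is a placeholder, authentication is left out, and the request is only built, not sent.

```python
import urllib.request

# Placeholder URL; the real store address and any credentials are not shown here.
STORE_URL = "http://example.org/stores/example-store/meta"


def build_post(store_url: str, rdfxml: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying an RDF/XML document."""
    return urllib.request.Request(
        store_url,
        data=rdfxml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


request = build_post(STORE_URL, b"<rdf:RDF/>")
print(request.get_method(), request.full_url)
# Sending would be urllib.request.urlopen(request), left out of this sketch.
```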
5. Create Linked Data Views
• Expose ‘bounded’ descriptions from the triple store over the Web
• Make available as documents in both human-readable HTML and RDF formats (also JSON, Turtle, CSV)
• Using Paget ‘Linked Data Publishing Framework’
– http://code.google.com/p/paget/
– PHP scripts query SPARQL endpoint
• ‘Out-of-the-box’ Paget view
• Linkedhub.ac.uk domain just given as example
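One way to picture the query behind a view is a SPARQL DESCRIBE request for a single ‘thing’ URI, sent as an HTTP GET with a query parameter. The sketch below is in Python rather than Paget's PHP, the endpoint URL is a placeholder, and whether Paget itself issues DESCRIBE queries is an assumption; what a store returns for DESCRIBE is also implementation-dependent.

```python
from urllib.parse import urlencode

# Placeholder endpoint, for illustration only.
ENDPOINT = "http://example.org/sparql"


def describe_url(endpoint: str, thing_uri: str) -> str:
    """Build the GET URL for a SPARQL DESCRIBE query about one URI."""
    if any(ch in thing_uri for ch in "<> \n"):
        raise ValueError("unsafe character in URI: " + repr(thing_uri))
    query = f"DESCRIBE <{thing_uri}>"
    return endpoint + "?" + urlencode({"query": query})


print(describe_url(ENDPOINT, "http://example.ac.uk/id/findingaid/gb1086skinner"))
```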
Other Stuff We Might Try
• Linked Data API
– APIs, data formats and supporting tools to aid the adoption of linked data
– http://code.google.com/p/linked-data-api/
• Entity extraction from free text
– Open Calais
• “creates rich semantic metadata for the content you submit” - http://www.opencalais.com/
– DBpedia Spotlight (announced yesterday)
• “solution for linking unstructured information sources to the Linked Open Data”
• http://dbpedia.org/spotlight
Can I Access the LOCAH Linked Data?
• Not quite yet …
• Hoping to release the Hub data by end February 2011
• Copac data end March 2011
• Release will include Linked Data views, SPARQL endpoint details, example queries and supporting documentation
How are we creating the Visualisation Prototype?
• Based on researcher use cases
• Data queried from SPARQL endpoint
• Use tools such as Simile, Many Eyes, Google Charts
Visualisation Prototype
• Using Timemap
– Google Maps and Simile
– http://code.google.com/p/timemap/
• Early stages with this
• Will give location and ‘extent’ of archive.
• Will link through to Archives Hub
How are we reporting on opportunities and barriers?
• Recording these as we go along on the blog (tags: ‘opportunities’ ‘barriers’)
• Feed into #JiscEXPO synthesis work
• No time to go into these today
• More at:
– http://blogs.ukoln.ac.uk/locah/2010/09/22/creating-linked-data-more-reflections-from-the-coal-face/
– http://blogs.ukoln.ac.uk/locah/2010/12/01/assessing-linked-data
Questions?
• Contacts:
– Ade Stevenson @adrianstevenson
– Jane Stevenson @janestevenson
– Pete Johnston @ppetej
– Bethan Ruddock @bethanar
– Julian Cheal @juliancheal
– Yogesh Patel http://mimas.ac.uk/staff/
Attribution and CC License
• Sections of this presentation adapted from materials created by other members of the LOCAH Project
• This presentation is available under a Creative Commons Non Commercial-Share Alike licence: http://creativecommons.org/licenses/by-nc/2.0/uk/