British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentral.com
Transcript of British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentral.com
British Library Labs
http://labs.bl.uk
Saturday 26th October 2013, 1000 – 1100 (15 min slot)
Ed Tech Hackathon 2013 (Apps for Learning and Teaching)
hackathoncentral.com
Central Working Canteen, Google Campus
4–5 Bonhill Street, London, EC2A 4BX, UK
Mr Mahendra Mahey
British Library Labs Project Manager
http://labs.bl.uk #bl_labs
British Library Labs is about…
Encouraging scholars and developers to do research and development with and across British Library digital collections and data
What activities does Labs do?
• Encourage researchers / developers to do interesting things with BL digital content (+other) with and across collections (a data driven approach)
• Competitions and events (hack events)
• Creating an environment where scholars / developers can work intensively with the Library’s digital collections (winners will be resident)
• Work with your ideas
• Help develop tools and services to support digital scholarship
• Case studies for the library and research communities
How Labs works…
[Diagram: ideas drawing on BL Digital Collection / Data and Other Digital Collection; BL Labs (Competitions, Events, Contact, Software, Publications); Tools and services to support Digital Scholarship]
British Library Digital Collections
Over 600 digital collections and rising…
Filter is needed…
• Copyright cleared for research and for commercial or non-commercial use
• Curated (Is there someone who knows about the collection?)
• Collection / Item Level Metadata available?
• Where is it?
Available only in Reading Rooms due to ©
Available on site only at the moment due to ©
Digital but not online – various storage devices
Available only onsite at the moment
Hack Events, In residence
Digital and online
Some data / digital collections
• British National Bibliography
• UK Web Archive Data
• 19th Century Books
• Environmental Sounds
• Text-mining of electronic journals
• Book ordering and anonymised reader data
• Resonance FM 10 year Community Arts Radio Show
Datasets, Books / Text, Images / Music, Maps, Sounds, Multimedia
http://labs.bl.uk/Digital+Collections
Example Research Methods
• Corpus analysis tools
• Visualisations
• Location based searching
• Geotagging
• Annotation
• APIs for datasets e.g. Metadata, Images
• Crowdsourcing / Human Computation
• Natural Language Processing
• Transcribing
Ideas from first competition
• Text mining tool in the reading rooms
• Curatorial…repackaging metadata for teaching and learning in a CMS e.g. Drupal
• Visualising large collections of sound at a glance (thumbnails)
• Using sheet music and OMR software
• Working to re-use a radio archive
• The winners are…
Mixing the Library: The Disc Jockey & the Digital Collection
Dan Norton completed a PhD at the University of Dundee and is an Artist in Residence at Hangar, Centre for Art and Research, Barcelona.
His idea is to build a ‘mixing’ interface for interacting with BL digital collections and beyond, developed from the DJ's model of interaction with information.
Dan Norton’s prototype ‘mixing’ interface
Annotation
Preview ‘item’
Selected ‘right’ channel ‘item’
Selected ‘left’ channel ‘item’
Collection ‘stalks’ made of ‘items’. Each ‘item’ is a URL. The order of the ‘items’ can be ‘shuffled’ and sent to the ‘left’ or ‘right’ channels
‘Play back’ of ‘items’ (Blue) and annotations (Yellow)
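The collection ‘stalks’ described on this slide could be modelled roughly as below. This is only a minimal sketch, not Dan Norton's prototype: the names `Stalk`, `Mixer`, `shuffle` and `send` are hypothetical, and the only facts taken from the slide are that a stalk is made of items, each item is a URL, the order can be shuffled, and items can be sent to a left or right channel.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stalk:
    """A collection 'stalk': an ordered list of items, each item being a URL."""
    name: str
    items: List[str] = field(default_factory=list)

    def shuffle(self, seed=None):
        """Reorder the items in place; passing a seed makes the order repeatable."""
        random.Random(seed).shuffle(self.items)


@dataclass
class Mixer:
    """Two channels, each holding at most one selected item (hypothetical model)."""
    left: Optional[str] = None
    right: Optional[str] = None

    def send(self, stalk: Stalk, index: int, channel: str) -> None:
        """Send the item at `index` in `stalk` to the 'left' or 'right' channel."""
        if channel not in ("left", "right"):
            raise ValueError("channel must be 'left' or 'right'")
        setattr(self, channel, stalk.items[index])
```

For example, after `stalk.shuffle()` a call such as `Mixer().send(stalk, 0, "left")` would place the first item of the shuffled stalk on the left channel while the right channel stays empty.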
The Sample Generator for Digitised Texts
Pieter Francois is a Postdoctoral Researcher at the University of Oxford. The ‘Sample Generator’ connects one or more major catalogues or collections of digitized texts through metadata.
In this example, the ‘Sample Generator’ searches across 1.8 million bibliographic records from the 19th Century for items about ‘Travel Routes’ and, where possible (digitised items permitting), provides unbiased digital ‘samples’ for further research.
1. British Library Labs Sample Generator
Search terms: Travel Routes, From 1888 To 1899
Synonyms: Account, Tour, Adventure, Visit, Journey, Expedition, Excursion, Trip, Holiday, Guide, Plan, Route
Digitally available content only
Sample Size: 8 (Generate)
Distribution of items in catalogue
Generates a randomised unbiased sample
Generated sample URL (unique & citable after creation)
Work 1
Work 2
Work 3
Work 4
Work 5
Work 6
Work 7
Work 8
Terms used: ‘Travel Routes’ from ‘1888-1899’, sample size ‘8’. Set created on ‘16/10/2013’ by ‘Pieter Francois’
2. Researcher carries out research on works in the sample generated. Here it is used as the basis for generating travel routes, as shown in 3.
3. Travel route extracted from ‘Work 1’ for further research
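The sampling step of the ‘Sample Generator’ on this slide might look something like the sketch below. It is not Pieter Francois's implementation: the record fields (`title`, `year`, `digitised`) and the function names are assumptions made for illustration, and matching is reduced to a simple substring test on the title. ‘Unbiased’ is interpreted here as simple random sampling without replacement, so every matching record has the same chance of being drawn.

```python
import random


def matches(record, terms, year_from, year_to, digitised_only=True):
    """True if the record's title contains any term and its year is in range."""
    if digitised_only and not record["digitised"]:
        return False
    if not (year_from <= record["year"] <= year_to):
        return False
    title = record["title"].lower()
    return any(term.lower() in title for term in terms)


def generate_sample(records, terms, year_from, year_to, size, seed=None):
    """Draw a simple random sample (without replacement) from matching records."""
    pool = [r for r in records if matches(r, terms, year_from, year_to)]
    rng = random.Random(seed)
    # Never ask for more works than the pool contains.
    return rng.sample(pool, min(size, len(pool)))
```

A call in the spirit of the slide would be `generate_sample(records, ["travel", "journey", "tour"], 1888, 1899, 8)`, where the search term and its synonyms are passed together as `terms`. Recording the seed alongside the terms is one hypothetical way to make a sample repeatable.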
Next Competition
• Starts 11 November 2013 and ends around April 2014
• Submit an idea, and engage with us during this period to formulate a good one
• First prize £3000 and residency (expenses paid) and we will work with you to make your shiny thing between May and October 2014
• Work with us anyway and our content at Data / Hack events:
• 12 Dec 2013, 13 January 2014, 12 February 2014, 10 March 2014
Data / Items brought
• British National Bibliography in RDF Triples
• Digitised books from the 17th, 18th, 19th and 20th Centuries
• Image metadata
• 10 x USB Sticks
• 1 x 500 GB hard drive
British National Bibliographic Data
• http://bnb.data.bl.uk (part of data.bl.uk) – download here, SPARQL endpoint
• 2.8 million individual records
• Available as Linked Open Data, Basic RDF/XML and MARC21.
• On USB
• Hard Drive
Augmenting Author records – London Review of Books
Combining with other data sources?
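A query against the SPARQL endpoint mentioned on this slide could be prepared along these lines. Treat it as a sketch under stated assumptions: the slide gives only the host, so the `/sparql` path is a guess, and the use of the Dublin Core `dct:title` property for book titles is assumed rather than confirmed by the source. The code only builds the query and the request URL; it does not contact the service.

```python
from urllib.parse import urlencode

# Assumption: the endpoint path is not given on the slide; "/sparql" is a guess.
ENDPOINT = "http://bnb.data.bl.uk/sparql"


def build_title_query(keyword, limit=10):
    """Build a SPARQL query for resources whose title contains `keyword`."""
    # Escape backslashes and double quotes so the keyword stays inside the literal.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?book ?title WHERE {\n"
        "  ?book dct:title ?title .\n"  # assumed property for titles
        f'  FILTER(CONTAINS(LCASE(STR(?title)), LCASE("{safe}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request URL using the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})
```

The resulting URL can be fetched with any HTTP client, typically with an `Accept` header asking for a results format such as JSON; which formats the service offers is something to check against its own documentation.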
19th Century Digitised Books
• 65,000 digitised volumes. Many rare or inaccessible books published between the 17th and 20th Centuries, including philosophy, history, poetry and literature, travel
• 25 million pages (OCR text available, 75% accuracy) on hard drive as .txt, .json and metadata as .xml (50 GB) (metadata also as .tsv on USB stick), items identified by unique numbers
• 420,000 images / illustrations available on Flickr (around 70% and counting) http://goo.gl/OrCKZz (use their API) and on hard drive (100 GB – 20 mins? – illustrations and covers)
• See Mechanical Curator on Tumblr - http://goo.gl/uvE5Yw
• For images - Jigsaw, crowdsourcing metadata, image recognition (machine learning)
• For Text – dirty data, cleaning up exercise, with educational purpose?
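As a starting point for the text clean-up exercise suggested above, a few conservative rules can already tidy raw OCR output. The sketch below is illustrative only: it assumes the OCR text is plain text with line breaks preserved, which the slide does not specify, and the function name `clean_ocr_text` is made up for this example.

```python
import re


def clean_ocr_text(text):
    """Apply a few conservative clean-up rules to raw OCR text."""
    # Rejoin words hyphenated across a line break: "jour-\nney" -> "journey".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim whitespace at both ends of every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Reduce three or more consecutive newlines to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Note that the hyphen rule will also join genuinely hyphenated words that happen to break at a line end, so the output still needs checking. Misrecognised characters, the harder part of ‘dirty data’ at 75% accuracy, are not addressed here.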
Image Metadata
• .CSV files on USB stick and hard drive
• Contains links to images
• Re-purpose metadata and images?
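To re-purpose the image metadata, a first step is to pull the image links out of the .CSV files. Since the slide does not list the column names, this sketch avoids assuming any: it scans every cell and keeps those that look like web links. The function name is hypothetical.

```python
import csv
import io


def extract_image_links(csv_text):
    """Return (row_number, url) pairs for every cell that looks like a web link."""
    links = []
    reader = csv.reader(io.StringIO(csv_text))
    # Row numbers start at 1, so a header line (if present) counts as row 1.
    for row_number, row in enumerate(reader, start=1):
        for cell in row:
            value = cell.strip()
            if value.startswith(("http://", "https://")):
                links.append((row_number, value))
    return links
```

For a file on disk, the text can be read with `open(path, newline="", encoding="utf-8").read()` and passed in; the encoding of the actual files is an assumption to verify.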
What next?
Speak to me: 0207 412 7324
Email me: [email protected] or [email protected]
Website: http://labs.bl.uk/
Twitter: @BL_Labs
Hash Tag: #bl_labs
Jiscmail: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=BL-LABS
Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/