Post on 11-May-2015
Building Bridges: from Europeana Libraries to Europeana Newspapers
Susan Reilly, LIBER
Twitter: @skreilly
IFLA Newspapers/GENLOC, Helsinki, 13th Aug 2012
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp
Overview
• About LIBER
• Introduction to Europeana Newspapers
• The foundation stone: Europeana Libraries
Association of European Research Libraries
Our projects:
• Content: Europeana Libraries, Europeana Newspapers
• Policy: MEDOANET
• Infrastructure: APARSEN, AAA Study, ODE
LIBER & the European Digital Agenda
Europeana Newspapers
• 17 partner institutions
• 3 years (2012-2015)
• Aggregation of more than 18 million newspapers
• Will use refinement methods for OCR and OLR (article segmentation), plus named entity recognition (NER) and page class recognition
• Survey existing collections in Europe
• Make content accessible
Why newspapers?
“The museum (and the newspaper) today seeks whatever represents normal life in its own native locality and with infinite pains its collections are arranged in a manner which is natural to them in their own habitat”
Lucy Maynard Salmon (1976) in The Newspaper and the Historian
Europeana Newspapers: where the content comes from…
NLF
SBB
ONB
NLP
BnF
NLE
SUB HH
USAL
NLL
KB
LIBER
CCS
NLT
UB
UIBK
LFT
BL
We are looking for more libraries!
What we do with the content
• Select 10 million items to be OCR’d
• Structural information by UIBK, e.g. headings, table of contents
• Select 2 million items for OCR and OLR
• Article segmentation and page class recognition by CCS
• Libraries carry out manual correction of recognition and segmentation results
• Named entity recognition applied to English, Dutch and German material
Making the content accessible
• OCR enables full text searching
• OLR enables more targeted searching (titles and sections)
• NER enables searching by people and places, and the discovery of new relationships between entities
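The three levels of access listed above can be illustrated with a toy index. This is only a minimal sketch, not the project's actual search infrastructure: the sample articles, the field names (`text`, `title`, `section`, `entities`) and the helper functions `build_index`, `search` and `co_occurring` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical sample records, not project data. The fields mimic what each
# step might contribute: OCR -> text, OLR -> title/section, NER -> entities.
ARTICLES = [
    {"id": "a1", "title": "Harbour strike ends", "section": "News",
     "text": "Dock workers in Hamburg returned to work on Monday.",
     "entities": {"Hamburg"}},
    {"id": "a2", "title": "Opera season opens", "section": "Culture",
     "text": "The season opened in Vienna with a new production.",
     "entities": {"Vienna"}},
    {"id": "a3", "title": "Trade talks", "section": "News",
     "text": "Delegates from Vienna and Hamburg met to discuss tariffs.",
     "entities": {"Vienna", "Hamburg"}},
]


def words(s):
    """Very naive tokenizer: drop full stops and split on whitespace."""
    return s.replace(".", " ").split()


def build_index(articles, key):
    """Map each lower-cased term returned by key(article) to article ids."""
    index = defaultdict(set)
    for article in articles:
        for term in key(article):
            index[term.lower()].add(article["id"])
    return index


def search(index, term):
    """Return the sorted ids of articles indexed under term."""
    return sorted(index.get(term.lower(), set()))


def co_occurring(articles, entity):
    """Return the other entities named in the same articles as entity."""
    related = set()
    for article in articles:
        if entity in article["entities"]:
            related |= article["entities"] - {entity}
    return related


full_text = build_index(ARTICLES, lambda a: words(a["text"]))   # OCR level
titles = build_index(ARTICLES, lambda a: words(a["title"]))     # OLR level
sections = build_index(ARTICLES, lambda a: [a["section"]])      # OLR level
entities = build_index(ARTICLES, lambda a: a["entities"])       # NER level
```

With these indexes, `search(full_text, "tariffs")` looks a word up anywhere in the recognised text, `search(titles, "opera")` and `search(sections, "News")` restrict the lookup to the structure recovered by layout recognition, and `search(entities, "Vienna")` together with `co_occurring(ARTICLES, "Vienna")` shows how tagged names make it possible to search by place and to surface entities that appear together.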
No access without aggregation
Europeana Libraries:
• A single library domain aggregator
• Content from European research libraries
• Full-text search capabilities
• Portal for researchers
Access = Critical mass of content:
• 3,319,045 pages
• 598,130 books and theses
• 368,000 articles
• 848,078 images
• 1,200 film and video clips
• 34,000 mixed content objects
Access = Sustainability
Access = Visibility
Thank you for your attention!
http://www.libereurope.eu
http://www.europeana-newspapers.eu/
http://www.europeana-libraries.eu/
Hall 4/5, stand H104