Planning a digitisation project: a rough guide
Digiwiki seminar 12.3.2009, Helsinki, Finland
Koninklijke Bibliotheek – Nationale bibliotheek van Nederland
Stepping stones...
1. Writing a project proposal (incl. ‘business case’)
2. Acquiring finances
3. Writing a detailed project plan (incl. detailed specs)
4. Setting up a project organisation
5. Managing the project flow
6. Wrapping up the project (exploitation plan, long-term
preservation)
Sample 1: ANP Radionews bulletins
• 1,5 million hand-typed news items from Dutch radio (1937-1995)
• June 2007 – October 2008
• Text mass digitisation project
• Budget: 0,5 million Euros
• Funded by Memory of the Netherlands programme
• URL: http://anp.kb.nl (in Dutch only)
Sample 2: Databank of Digital Daily Newspapers
• 8 million newspaper pages
• Selection of Dutch local, regional, national and colonial
newspapers 1618-1995
• 2006 – 2011
• Budget: 12,5 million Euros
• Funded by the National Programme Investments in Large-Scale
Research Facilities
• 25 billion words, text mass digitisation
• URL: http://www.kb.nl/projectdagbladen/
Stepping stones...
1. Writing a project proposal (incl. ‘business case’)
2. Acquiring finances
3. Writing a detailed project plan (incl. detailed specs)
4. Setting up a project organisation
5. Managing the project flow
6. Concluding the project (exploitation plan, long-term
maintenance)
1. Writing a project proposal
- Business case: benefit for our organisation to start digitising
- Planning: how long will it take?
- Which resources are needed? Staff, equipment, etc. Estimate
of required budget.
- Risk assessment: what are the potential risks that could prevent
us from reaching our goals?
- Very down-to-earth description of final ‘deliverables’
- Who will be the ‘owner’ of the digital collection after the
project?
Sample case 1: ANP news bulletins
- Goal: make the collection accessible online for researchers
and the general public
- Output-oriented = ‘access project’
- Why? Task of our organisation
- Deliverables: 1,5 million JPEG (quality 10) files, 1,5 million ALTO
XML files, a website with full-text search-and-retrieval
functionality
- Estimated budget: 0,5 million Euro; easy material, so 0,33 Euro
per page
Sample case 2: DDD Dutch newspapers
- Goal: make the collection accessible online for researchers
and the general public.
- Not a preservation project, but the material is so vulnerable
that re-scanning in the future is not an option; hence a
‘production project’
- Why? Task of the KB.
- Article-level access
Sample case 2: DDD Dutch newspapers (2)
- Deliverables: 8 million JP2 master files, 8 million JP2 access
images, a PDF per issue, a text file per article, an MPEG-21 file
per issue, 8 million MIX files and a website with full-text and
advanced search
- Estimated budget: 12,5 million Euro
I digitise because
- my users frequently use this material
- I think my users will frequently use this material
- I want to save and protect my vulnerable originals
- my users give me money to do so
Digitisation on demand: Stadsarchief Amsterdam
• Threshold: information in the original should be readable
• 1 copy for the customer, 1 copy for the reading room
• No separate, uncompressed master files (JPEG quality 10)
• Now: 32 kilometers of archives digitally available
• € 0,50 per scan
Cost estimate
• Be realistic
• Calculate all costs
• Use ‘realisation’ data from other projects
• Beware of all the work BESIDES the actual scanning
Sample: ANP news bulletins and DDD newspapers
[Two pie charts: cost breakdown per project over staff; hardware and software; research and development; scanning, OCR and metadata]
Cost per page: DDD newspapers = 1,50 Euro; ANP news bulletins = 0,33 Euro
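The per-page figures above come from dividing the total budget by the number of pages. A minimal sketch of that arithmetic, using only budgets and page counts quoted in this talk; the function name `cost_per_page` and the dictionary layout are illustrative assumptions, not part of any KB tooling.

```python
def cost_per_page(budget_eur: float, pages: int) -> float:
    """Return the average all-in cost per page in euros."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return budget_eur / pages


# Figures quoted in this talk: (total budget in euros, number of pages/items).
projects = {
    "ANP news bulletins": (500_000, 1_500_000),
    "DDD newspapers": (12_500_000, 8_000_000),
    "KB digitisation projects to 2011": (54_000_000, 41_000_000),
}

for name, (budget, pages) in projects.items():
    print(f"{name}: {cost_per_page(budget, pages):.2f} Euro per page")
```

The point of the exercise is that the budget covers staff, hardware, software and research as well as scanning, so the per-page figure is an all-in cost and not the supplier's scanning price alone.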
Outsourcing digitisation: different prices
Outsourcing?
Pitfall: intellectual property rights
- Three ‘copyright’ moments:
1. Making a (digital) copy
2. Making a copy for an internal network
3. Making a copy for the internet
- Copyright runs until 70 years after the death of the ‘author’
and/or the date of publication
- Legal obligation to trace all rights holders: a time-consuming
activity in (mass) digitisation projects
- Commission Digiti©e: agency responsible for dealing with claims
and tracing rights holders?
Privacy laws?
Pitfall: know thy originals!
• How many?
• What condition?
• Where are they?
• Available metadata
• Dimensions
• Colour/greyscale/B&W
• Available alternatives (e.g. microfilm vs originals)
Case 1: ANP news bulletins
Case 2: DDD newspapers
• The Hague
• Stockholm
• Vatican Secret Archives
• Dresden
• Paramaribo
Acquiring finances
- Resources within own institution
- National government
- European Union
- Private funds
- Current figures: 2 to 3% of all Dutch ‘institutionalized’ cultural
heritage is available in digital format.
National government Netherlands:
Dutch government encourages:
- cross-sector cooperation between heritage institutions
- open standards
- service-oriented architecture (SOA)
- mass digitisation
- digitisation incorporated into overall policy (‘Digitaliseren met
beleid’)
- more uniformity of digitisation activities
- main target groups: education and research
National government Netherlands
- Dutch Ministry of Education, Culture and Science, but also
other ministries.
- Digitisation programmes:
* Erfgoed van de Oorlog
* Memory of the Netherlands
http://www.geheugenvannederland.nl
* Images for the future
http://www.beeldenvoordetoekomst.nl
National preservation programme: Metamorfoze
- KB, National Archives
- 1997-
- Funded by Ministry of Education, Culture and Science
- Preservation of Dutch paper-based heritage
- Project office
- 30% own contribution
- Before 2007: microfilming, after 2007: preservation imaging
- http://www.metamorfoze.nl/
Future: project Nederlands Erfgoed: Digitaal!
- Consortium of 10 Dutch heritage institutions
- Combining parts of their collections
- Cross-media, subject-oriented focus
- Target groups: education, research, tourism and creative
industry
- ‘Canon van Nederland’: highlights of Dutch history
- 2009-2014
- Overall estimated cost: 186 million Euro; estimated benefits:
172-223 million Euro (!)
Koninklijke Bibliotheek: Digital Library programme
- Current digitisation projects (to 2011): 41 million pages,
budget 54 million Euro
- Digital Library programme 2009-2013
- Target: 20% of all books, newspapers and journals published
in the Netherlands digitised by 2013
European projects
- Financial commitment from the institution (‘matching’ principle)
- Aimed at innovative initiatives
- Aimed at combining collections on an international level, e.g.
Europeana (http://www.europeana.eu) and European Digital
Library (http://www.theeuropeanlibrary.org/portal/index.htm)
Stepping stones...
1. Writing a project proposal (incl. ‘business case’)
2. Acquiring finances
3. Writing a detailed project plan (incl. detailed specs)
4. Setting up a project organisation
5. Managing the project flow
6. Concluding the project (exploitation plan, long-term
maintenance)
Project plan
- Detailed time schedule with milestones and deliverables
- Division into work packages (in case of large projects)
- Clear overview of dependencies
- Detailed risk assessment
- ‘Business case’: should be re-checked during the project
- Translation of project aim into detailed specifications
http://www.dbnl.org
Technical specifications affected by project aim
- Image quality: resolution, tonal range, detail reproduction,
polarity (B/W, greyscale, colour)
- Image format: lossy vs. lossless, compressed vs. uncompressed
- Metadata: a lot vs a little vs somewhere in between
- Image manipulation: yes vs no vs a little
- Good technical manuals available:
* JISC Digital Media at http://www.jiscdigitalmedia.ac.uk/
* Cornell University at
http://www.library.cornell.edu/preservation/tutorial/
Search-and-retrieval problems
- Large amounts of data: how to find your way
- Limited capacity of search engine
- Limitations of Optical Character Recognition (OCR) software
Optical character recognition (OCR)
Blijkens verschillende mededeeelingen in de dag-bladen is de Indische regeering den laatsten tijd rege-lend opgetreden ten aanzien van het Indische handels-verkeer, in het bijzonder ten aanzien van den uitvoervan Indische producten.
Optical character recognition (OCR)
Blijkena verachillende mededeeelingen in de dag-bladen is de Indische regeering den 1aatsten tijd rege-lond opgetreden ten aanzien van het Indische handels-verkeer, in het lijzonder ten a3nzien van den uitvoarvan Indische producten.
Word accuracy: 7 errors in 33 words = 79%
Character accuracy: 7 errors in 202 characters = 97%
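The two percentages above can be reproduced as the share of units recognised without error. A minimal sketch, assuming the counts from this sample (33 words, 202 characters, 7 errors in each case). Both helper names are hypothetical, and `count_word_errors` assumes the two word lists are already aligned one to one, which real OCR evaluation does not guarantee.

```python
def accuracy(total: int, errors: int) -> float:
    """Share of correctly recognised units (words or characters)."""
    if total <= 0 or not 0 <= errors <= total:
        raise ValueError("need total > 0 and 0 <= errors <= total")
    return (total - errors) / total


def count_word_errors(truth: list[str], ocr: list[str]) -> int:
    """Count positions where the OCR word differs from the ground truth.

    Assumes both lists are already aligned word for word.
    """
    if len(truth) != len(ocr):
        raise ValueError("word lists must be aligned and of equal length")
    return sum(t != o for t, o in zip(truth, ocr))


# Counts taken from the sample above: 7 wrong words, each with 1 wrong character.
print(f"Word accuracy: {accuracy(33, 7):.0%}")
print(f"Character accuracy: {accuracy(202, 7):.0%}")
```

The same seven mistakes cost 21 percentage points at word level but only 3 at character level, which is why a supplier's character accuracy figure says little about how well full-text search will work.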
Optical character recognition (OCR)
IINCOLXis strangely forgotten by b visitors to in Washington WashingtonThe Thesightseers who whotluck flock to the National ntionnl Capital at all sea seasons scaon8 seasons ¬ Lsons on8 of the year for some som unknown reason jeeni to find more moreinteresting moreintNe8ting moreinteresting interestingthe thing things of less historic importanethan the therelics thcreliC therelics relicspertaining pertainh g iu ti > the fmt martyred President whose un untimely untimely untimely ¬timelydeath was as mourned by the entire oiTiHzed world
Source: http://www.loc.gov/chroniclingamerica/
Automated OCR
- Pilot project ‘Historische kranten’ (bitonal, from microfilm): between 60% and 70% word accuracy
- Results for historical texts are very poor; this is the subject of the EU project IMPACT (Improving Access to Text, http://www.impact-project.eu/)
Stepping stones...
1. Writing a project proposal (incl. ‘business case’)
2. Acquiring finances
3. Writing a detailed project plan (incl. detailed specs)
4. Setting up a project organisation
5. Managing the project flow
6. Concluding the project (exploitation plan, long-term
maintenance)
Steering group
- representative of end-users
- representative of initiating party
- representative responsible for quality assurance
Project manager
- reports to steering group
- responsible for day-to-day work (budget etc.)
- management by exception
Project leader
- reports to project manager
- responsible for a work package
Selection: Scientific Advisory Committee
- Advises on titles to be selected
- Advises on search functionality on the website
(user perspective)
- Advises on content on the website
Stepping stones...
1. Writing a project proposal (incl. ‘business case’)
2. Acquiring finances
3. Writing a detailed project plan (incl. detailed specs)
4. Setting up a project organisation
5. Managing the project flow
6. Concluding the project (exploitation plan, long-term
maintenance)
Managing the project flow
- Are the specifications met within timeframe and budget?
- Are there any new developments that affect the business case
of the project?
Stepping stones...
1. Writing a project proposal (incl. ‘business case’)
2. Acquiring finances
3. Writing a detailed project plan (incl. detailed specs)
4. Setting up a project organisation
5. Managing the project flow
6. Concluding the project (exploitation plan, long-term
maintenance, lessons learned)
Exploitation plan
- Who will be responsible for maintaining the website?
- What possible future purposes can be served?
- What costs are involved in maintaining the website and in the
long-term preservation of the deliverables?
Pitfall: It ain’t over when it’s over...
- Koninklijke Bibliotheek: 41 million pages to be digitized up to
2011.
- Required storage space: 1 petabyte
- Current estimated storage costs, long-term preservation
system (e-depot): 1 TB = 8,500 Euro a year
- Current estimated storage costs, webserver: 1 TB = 7,500
Euro a year
- Structural costs in the long run: millions!
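To see why the structural costs run into millions, multiply the required storage by the yearly price per terabyte. A rough sketch, assuming 1 petabyte = 1000 terabytes and assuming, purely for illustration, that the full petabyte is held in each of the two systems. Neither assumption comes from the KB figures themselves, and the function name is hypothetical.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 petabyte = 1000 terabytes


def annual_storage_cost(terabytes: float, eur_per_tb_per_year: float) -> float:
    """Yearly storage cost in euros for a given volume and price per TB."""
    if terabytes < 0 or eur_per_tb_per_year < 0:
        raise ValueError("volume and price must be non-negative")
    return terabytes * eur_per_tb_per_year


required_tb = 1 * TB_PER_PB  # 1 petabyte, as quoted above

# Prices per TB per year as quoted above; full volume in each system is an assumption.
e_depot = annual_storage_cost(required_tb, 8_500)
webserver = annual_storage_cost(required_tb, 7_500)

print(f"e-depot:   {e_depot:,.0f} Euro a year")
print(f"webserver: {webserver:,.0f} Euro a year")
```

Under these assumptions either system alone costs several million Euro a year, every year, long after the project budget has been spent.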
Koninklijke Bibliotheek: big issues
- High scanning price: 1,3 Euro per page
- Intellectual property rights
- Quality control of files delivered by suppliers (> 2 million files
a month)
- Storage
- Long-term preservation of all files produced
- Inefficient search-and-retrieval software but …
we have already come a long way since 1999!