The Big Dutch 20 Year 730 Million Page Digitisation Challenge

The Big Dutch 20 Year 730 Million Page Digitisation Challenge
Image: w.alumni.ubc.ca/wp/wp-content/uploads/Gohagan_RLDutchWW_2012_01_HollandWindmillTulips.jpg
10th International Conference on the Book, 30th June 2012, Barcelona, Spain
Olaf Janssen, National Library of the Netherlands – [email protected] / @ookgezellig / slideshare.net/OlafJanssenNL

description

The National Library of the Netherlands (KB) is mass-digitizing all Dutch publications since 1470. This article outlines KB's strategy for making this output publicly available.

In the next 20 years, the Dutch national library (KB) will mass-digitize all Dutch printed books, newspapers and magazines since 1470, a total of 730 million pages. Until recently, this was done by public funding alone. To speed things up in a climate of ongoing budget cuts, KB entered into public-private partnerships with both Google and Proquest to digitize 42 million pages by 2013. Besides the availability of funding, digitization priority is determined by a mix of client and institutional needs such as copyright status, uniqueness, institutional capability and user demand.

At the same time, KB is answering user demand for centralized access and content distribution by streamlining its scattered online services portfolio. For this, KB is developing two strategic lines of action.

* The first is on metadata (searching FOR publications): in 2013, KB will unify metadata searching across all its paper and digital collections via OCLC's WorldCat Local.
* The second is on full-text (searching IN publications): for searching in full-text historic publications (i.e. mass digitization output), KB is currently developing its Platform for Digital Publications. Besides a search engine, it is also a:
  * Presentation environment, associating each full-text object with a standardized webpage and persistent URL, offering a uniform look and feel, and a unique reference for all KB's full-texts. This landing page enables third-party services (e.g. WorldCat Local, Europeana, Google) to refer to objects in a persistent way.
  * Delivery platform, enabling KB to deliver content in the workflows of users via APIs and expose it to research communities.
  * Aggregator, enabling KB to set up a network of partners to bring together all Dutch digital books, newspapers and magazines, at the same time supporting Europeana's content aggregation strategy.

Transcript of The Big Dutch 20 Year 730 Million Page Digitisation Challenge

Page 1: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

The Big Dutch 20 Year 730 Million Page Digitisation Challenge

w.alumni.ubc.ca/wp/wp-content/uploads/Gohagan_RLDutchWW_2012_01_HollandWindmillTulips.jpg

10th International Conference on the Book, 30th June 2012, Barcelona, Spain

Olaf Janssen, National Library of the Netherlands – [email protected] / @ookgezellig / slideshare.net/OlafJanssenNL

Presenter
Presenter notes
The National Library of the Netherlands (KB) is mass-digitizing all Dutch publications since 1470. This presentation outlines KB's strategy for making this output publicly available. Keywords: National Libraries, Digital Library Strategies, Mass Digitization, Public-Private Partnerships, Integrated Access, Cross-domain Cultural Heritage, Content Aggregation and Distribution, Interoperability, Europeana
Page 2: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Photo: KB

Hello, my name is Olaf Janssen

Page 3: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

I’m a project manager for KB, the

National Library of the Netherlands…

http://pinterest.com/pin/238901955203421002/

Hello, my name is Olaf Janssen

Page 4: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

These are my colleagues …

Source: KB intranet

Page 5: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Source: KB intranet

Every day we give our best …

Page 6: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

... because we’ve a task to accomplish

Source: NRC

Page 7: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

We are scanning all Dutch

Source: NRC

Presenter
Presenter notes
In the next 20 years, the Dutch national library (KB) will mass-digitize all Dutch printed books, newspapers and magazines since 1470, a total of 730 million pages.
Page 8: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

books

Page 9: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

books

newspapers

Page 10: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

books

http://www.corbisimages.com/stock-photo/rights-managed/42-20042070/1960s-1970s-boy-leaning-against-tree-reading?popup=1
http://www.corbisimages.com/stock-photo/rights-managed/42-26195211/humor-portrait-man-wearing-hat-sitting-on?popup=1
http://eu.art.com/products/p6901596700-sa-i5098387/posters.htm?ui=F9E1398DA4CC4D3DB105E128FBAB2C4D

newspapers & magazines

Page 11: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

since

1470

http://marksayers.files.wordpress.com/2011/05/charles-darwin-2.png

Page 12: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

A whopping 730 pages

http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg
http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg
Page 13: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

A whopping 730.000 pages

http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg
http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg
Page 14: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

A whopping 730.000.000 pages

http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg
http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg
Page 15: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

For the next 7300 days (approx.)

http://www.picturesfromourpast.com/gallery/RightsManaged/20110817/CKSA011_YC019.jpg
http://imgc.allpostersimages.com/images/P-473-488-90/56/5641/2RYMG00Z/posters/george-marks-surprised-woman-posing-portrait.jpg
Page 16: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

That’s 100.000 pages every single day!!

http://www.allposters.co.uk/-sp/Man-Wiping-Forehead-Posters_i8018953_.htm
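
The daily figure follows from simple arithmetic on the numbers quoted in the slides. The sketch below only checks that arithmetic; it assumes a 365-day year (so 20 years is roughly 7300 days) and scanning on every calendar day, neither of which the slides state explicitly.

```python
# Back-of-the-envelope check of the figures on the slides.
# Assumptions (not from the slides): 365 days per year, scanning every day.
TOTAL_PAGES = 730_000_000
YEARS = 20
DAYS_PER_YEAR = 365

total_days = YEARS * DAYS_PER_YEAR          # days available in the 20-year window
pages_per_day = TOTAL_PAGES // total_days   # pages to scan per day to finish on time

print(total_days)     # 7300
print(pages_per_day)  # 100000
```

With fewer working days per year, the required daily output would be correspondingly higher.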

Page 17: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

And of course, after digitisation, we want to make many people happy with our content.

http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html

Page 18: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

I work on this because I believe…

http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html

Page 19: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

ultimately people want to know who they are.

For that they explore their histories & origins.

I want to help them in exploring these worlds

http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html

Page 20: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html

ultimately people want to know who they are.

For that they explore their histories & origins.

I want to help them in exploring these worlds

Page 21: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html

ultimately people want to know who they are.

For that they explore their histories & origins.

I want to help them in exploring these worlds.

Page 22: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

The key idea I’d like to share with you today:

How KB goes about tackling these grand challenges…

http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg

Page 23: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

First, we are creating digital content ...

http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196

Page 24: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

http://www.corbisimages.com/stock-photo/rights-managed/42-20042210/1940s-man-in-suit-holding-up-index?popup=1

1995-1999

Page 25: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

1995-1999

Small scale digitisation

Page 26: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

1995-1999

Treasures & highlights of KB collection

(1000s pages)

Small scale digitisation

Presenter
Presenter notes
The KB started digitizing its holdings in 1995, for reasons of accessibility and long-term preservation. In the first years, small-scale efforts focused on scanning visually attractive materials: highlights of the collection, aimed at the widest possible audiences. One of the first projects was 100 highlights of the Koninklijke Bibliotheek,
Page 27: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

1995-1999

Memory of the Netherlands

(730K images)

Small scale digitisation

Presenter
Presenter notes
followed by Memory of the Netherlands, the national programme for digitizing Dutch cultural heritage, which focused on image-based materials.
Page 28: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

2000-2010

http://www.corbisimages.com/stock-photo/rights-managed/42-26194724/smiling-blond-nurse-with-surprised-expression-talking?popup=1

Presenter
Presenter notes
It was not until 1999 that the KB started digitizing historical textual publications (books, newspapers & magazines).
Page 29: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Large scale, public funding

2000-2010

Page 30: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Large scale, public funding

2000-2010

Early Dutch Books (2.2M pages, full-text, 1781-1800)

Presenter
Presenter notes
For the last 8 years the focus has been on large-scale digitization of text corpora for study and research in the humanities, using public funding. In 2003 a project took off to scan the complete run of Dutch Parliamentary Papers. Consisting of 2.3 million pages, this was at that time an unprecedented quantity for the Netherlands. In May 2011 the Early Dutch Books Online digitization effort delivered 2.1 million full-text pages from the special book collections of the KB and the university libraries of Amsterdam and Leiden. Furthermore, by the summer of 2012, some 1.5 million pages from the most frequently consulted old magazines (1840-1950) will have been converted into full-text and made searchable at the word level.
Page 31: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Large scale, public funding

2000-2010

Historic Newspapers (9.5M pages, full-text, 1618-1995)

Presenter
Presenter notes
At the end of 2006 the KB was awarded the Historical Newspapers project. By the middle of 2012, it will have scanned 9.5 million pages from popular Dutch regional, national and colonial newspapers from the period 1618-1995.
Page 32: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

http://www.corbisimages.com/stock-photo/rights-managed/42-26194844/smiling-woman-counting-on-fingers-wearing-pearls?popup=1

2010-today

Page 33: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

2010-today

Mass scale, private & public funding

Page 34: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

2010-today

Mass scale, private & public funding

Proquest partnership (12M pages, 1450-1700)

http://www.kb.nl/nieuws/2011/proquest-en.html

Presenter
Presenter notes
The KB cannot rely on public funding alone, especially at a time when government support for cultural & scientific heritage is on a downward trend. It has therefore entered into strategic public-private partnerships with both Google and Proquest to digitize 210.000 books (some 42M pages) from its public domain collections.
KB and Google sign book digitization agreement, http://www.kb.nl/nieuws/2010/google-en.html
Digitization by Proquest of early printed books in KB collection, http://www.kb.nl/nieuws/2011/proquest-en.html
Page 35: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

2010-today

Mass scale, private & public funding

Google partnership (35M pages, full-text, 1701-1871)

http://www.kb.nl/nieuws/2010/google-en.html

Page 36: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

OK, so we’re very busy creating loads of digital content …

http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196

Presenter
Presenter notes
Besides the availability of funding and the physical condition of the materials, the priorities for digitization of KB’s textual collections are determined by a mix of further client and institutional needs. These include:
1- Copyright status. To minimize legal hassle and optimize future (re)usability of its objects, KB mainly focuses on historic out-of-copyright content. For these reasons the Google and Proquest efforts deliberately focus on these types of objects. For materials that are still under copyright, typically post-WWII publications, the KB tries to clear rights by signing bulk agreements with publishers and collective rights organizations. In the Netherlands, Lira (for text) and Pictoright (for images) are the main parties the KB has signed bulk contracts with to make the Historic Newspaper website (5) possible. For orphan works the KB makes a reasonable effort (30 minutes per title) to track down and document rights holders. However, the search is not always successful. The KB has chosen not to delay the upcoming launch of its Historic Magazines website (7) until all IPR holders have been identified. When the site launches in June 2012, it will contain an opt-out page where rights holders can identify themselves and object to online publication of their articles, after which appropriate actions will be taken.
2- Uniqueness of materials. The KB has a lot of publications that are unique to the country. The original documents always have priority, as they are irreplaceable and carry the highest level of authenticity and integrity.
3- Institutional capability. There are many Dutch publications that have been collected not only by the KB, but by other libraries as well. However, not all of these are willing or able to digitize these works, because they lack resources, digitisation know-how or storage infrastructure. The KB can choose to digitize those works, as other parties are unlikely to do this.
4- User demand. By 2013 it will be possible to give priority to digitizing publications that clients explicitly ask for. In other words, next year KB will have an operational Digitisation on Demand service.
Page 37: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Houston, we’ve a problem!!

http://2.bp.blogspot.com/_BWzuYwiS6-I/TMgeRsFd3mI/AAAAAAAAElw/3cvgbZSPWcs/s1600/doctor+macro+judy+scared.jpg

Page 38: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Although we create & store our digital content in a strictly standardized process … (JP2, JPG, XML-OCR, MPEG21, ALTO, PDF … *)

* http://kb.nl/hrd/digitalisering/index-en.html

http://www.electrohype.org/press/pionjar/IBM_System360_Mod_50.jpg

Page 39: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

… this back-end standardisation is not reflected in the front-end

http://www.electrohype.org/press/pionjar/IBM_System360_Mod_50.jpg

Page 40: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge
Page 41: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

KB Treasures

Memory of the Netherlands

Page 42: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

KB full-text Historic books

KB full-text Historic newspapers

KB Treasures

Memory of the Netherlands

Page 43: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Google full-text Historic books

KB full-text Historic books

Proquest Historic books

KB full-text Historic newspapers

KB Treasures

Memory of the Netherlands

Page 44: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

To many people KB’s website portfolio feels something like this

http://berichtenuithetverleden.files.wordpress.com/2011/03/escher.jpg

Page 45: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Current KB websites are inconsistent in

Branding
Object presentation
URL-logic
Design
Search logic
User experience
Display of result set

Images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts

Page 46: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Current KB websites are inconsistent in

Branding
Object presentation
URL-logic
Design
Search logic
User experience
Display of result set

Scattered & unrelated collections

Images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts

Page 47: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Current KB websites are inconsistent in

Branding
Object presentation
URL-logic
Design
Search logic
User experience
Display of result set

Scattered & unrelated collections

Non-interoperability

Images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts

Presenter
Presenter notes
The KB has a large and growing portfolio of online services for consumers, non-commercial parties and businesses. The multitude of KB websites makes it practically impossible for users to get a complete picture of all the content, both metadata and full-texts, that the KB gives access to. Each of these sites has its own specific branding, URL, design and search & result display functionalities. For users it is not clear that e.g. the websites of the Memory of the Netherlands (3), the Parliamentary Papers (4), Historical Newspapers (5), Early Dutch Books Online (6) and ANP radio transcripts are all based on the same back-end open data standards and proven working methods, both for metadata and full-text. On the data level these services are interoperable, but this potential has not yet been optimized in the front-end presentation. For end-users the KB collections thus appear to be unrelated and scattered,
Page 48: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

In short:

Current KB websites don’t meet expectations of modern & future generations

http://www.corbisimages.com/stock-photo/rights-managed/NT3707756/depressed-cheerleader?popup=1
http://www.corbisimages.com/stock-photo/rights-managed/42-20036948/1960s-1970s-seated-baby-in-diaper-with?popup=1

Presenter
Presenter notes
making them relatively difficult to use given the expectations of the web2.0 generation. These users demand that all content is not only available centrally via a single point of entry (portal), but also distributed in their own social and professional workflows, networks and platforms. They expect to be able to apply multiple views & filters (by theme, by time, by geographical location, by object type etc.) to the interoperable, contextualized, enriched, sharable, taggable and re-usable content, with minimum copyright limitations. In addition, users are primarily interested in the digital content itself, and much less in the physical object or institution it was derived from.
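
The "multiple views & filters" expectation can be illustrated with a small sketch. Everything in it is hypothetical: the `Item` fields, the `apply_filters` function and the sample records are illustrative stand-ins, not KB's data model or any existing KB API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical digitized object with the facets named in the notes."""
    title: str
    theme: str
    year: int
    place: str
    object_type: str


def apply_filters(
    items: Iterable[Item],
    theme: Optional[str] = None,
    years: Optional[Tuple[int, int]] = None,
    place: Optional[str] = None,
    object_type: Optional[str] = None,
) -> List[Item]:
    """Return the items matching every filter that is not None."""
    selected = []
    for item in items:
        if theme is not None and item.theme != theme:
            continue
        if years is not None and not (years[0] <= item.year <= years[1]):
            continue
        if place is not None and item.place != place:
            continue
        if object_type is not None and item.object_type != object_type:
            continue
        selected.append(item)
    return selected


# Made-up sample records, for illustration only.
SAMPLE = [
    Item("Sample newspaper issue", "politics", 1848, "Amsterdam", "newspaper"),
    Item("Sample magazine issue", "fashion", 1925, "Leiden", "magazine"),
    Item("Sample book", "politics", 1795, "Amsterdam", "book"),
]

political = apply_filters(SAMPLE, theme="politics")
amsterdam_19th = apply_filters(SAMPLE, years=(1800, 1899), place="Amsterdam")
```

The filters combine freely because each one is optional, which is what lets the same content be viewed by theme, time, place or object type.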
Page 49: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

http://www.leninimports.com/cary_grant_new_7a.jpg

No panic!

Page 50: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

We are working on a solution..

http://www.leninimports.com/cary_grant_new_7a.jpg

Page 51: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

KB is implementing 3 lines of action

http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg
http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1

Page 52: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

KB is implementing 3 lines of action

1. Unified searching for publications

http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg
http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1

Page 53: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

KB is implementing 3 lines of action

1. Unified searching for publications

2. Unified searching in publications

http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg
http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1

Page 54: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

KB is implementing 3 lines of action

1. Unified searching for publications

2. Unified searching in publications

3. Unified object presentation

http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg
http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1

Presenter
Presenter notes
It is virtually impossible to meet all user demands in one single service, so the strategy of the KB until 2013 focuses on 3 lines of action: one for metadata search, one for full-text search, and one for object presentation.
Page 55: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Metadata

1. Unified searching for publications

Page 56: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Metadata

1. Unified searching for publications

KB General Catalogue: searching for • (e-)books • (e-)magazines • (e-)newspapers

Page 57: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Metadata

1. Unified searching for publications

KB General Catalogue: searching for • (e-)books • (e-)magazines • (e-)newspapers

MetaLib: searching for • scholarly e-journals • licensed 3rd party databases

Presenter
Presenter notes
Users who need specific publications (known-item search) will in general want to use metadata to find the objects. Currently the KB offers its General Catalogue, for searching for (e-)books, (e-)magazines and (e-)newspapers, and MetaLib, for searching for scholarly e-journals and licensed 3rd party databases.
Page 58: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Metadata

1. Unified searching for publications

KB General Catalogue: searching for • (e-)books • (e-)magazines • (e-)newspapers

MetaLib: searching for • scholarly e-journals • licensed 3rd party databases

WorldCat Local: KB’s single starting point for searching for publications

Presenter
Presenter notes
In 2012 the KB will unify metadata searching via OCLC’s discovery tool WorldCat Local, making this KB’s single point of entry for known-item search across all digital (and paper) collections.
Page 59: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Downsides of WorldCat Local

1. Unified searching for publications

Page 60: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Downsides of WorldCat Local

1. Unified searching for publications

1. No full-text searching

Page 61: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Downsides of WorldCat Local

1. Unified searching for publications

1. No full-text searching

2. No object presentation

http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=sad

Presenter
Presenter notes
The current version of WorldCat Local (WCL) has two limitations that are relevant for KB’s digital ambitions:
1. Full-text search: WCL does not support full-text searching, so parties that want to offer their users word-level discovery of their content must provide this in other services.
2. Object presentation: As WCL is only aimed at finding digital content, the actual presentation of objects referred to by WCL is left to 3rd parties.
Page 62: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

http://www.leninimports.com/cary_grant_new_7a.jpg

No panic!

Page 63: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

We are tackling this…

http://www.leninimports.com/cary_grant_new_7a.jpg

Page 64: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Full-text

2. Unified searching in publications

Page 65: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Full-text

2. Unified searching in publications

KB Platform for Digital Publications

Page 66: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Full-text

2. Unified searching in publications

KB full-text historic books

KB Platform for Digital Publications

Page 67: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Full-text

2. Unified searching in publications

Google full-text historic books

KB full-text historic books

KB Platform for Digital Publications

Page 68: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Full-text

2. Unified searching in publications

Google full-text historic books

KB full-text historic books

KB full-text historic newspapers

KB Platform for Digital Publications

Page 69: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Full-text

2. Unified searching in publications

Google full-text historic books

KB full-text historic books

KB full-text historic newspapers

All future KB full-text digitisation output

KB Platform for Digital Publications

Page 70: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Full-text

2. Unified searching in publications

Google full-text historic books

KB full-text historic books

KB full-text historic newspapers

All future KB full-text digitisation output

KB Platform for Digital Publications

KB full-text historic magazines (sept 2012)

Presenter
Presenter notes
Because of the two limitations of WCL mentioned, the KB is currently developing a website
- where users can search in publications. It will be KB’s single point of entry for full-text search in digitized historic materials.
- that WorldCat Local can refer to for presentation of the object. For every digital publication stored in the KB, a landing page with persistent URL is created on which the full-text book, magazine or newspaper is presented. This will create a user interface with a uniform look & feel for all KB's full-text objects. Not only WCL, but also other services (e.g. Google, Europeana, other library catalogues, Wikipedia) or persons (e.g. scientists citing a publication) can refer to the landing page/presentation environment using the persistent URL.

This website has the (working) name of KB Platform for Digital Publications. KB’s Platform for Digital Publications marks a turning point towards centralized access to Dutch digital publications. It will aggregate
- The content & functionalities of existing KB full-text websites. In the first instance Historic Newspapers (5) and EDBO (6). These individual websites will be phased out as soon as the content has been transferred into the Platform.
- Upcoming output of KB digitization projects. In the first instance Historic Magazines (7), Google (9) and Proquest (10).
- The future output of digitization projects under the Metamorfoze programme (8).
- Existing full-text collections from Dutch university libraries that have a tradition in the humanities, such as Leiden and Amsterdam.

This collaboration with content-delivering partners is of strategic importance for KB’s ambitions to position the Platform as a national aggregator for digital textual publications.

For the years 2012 and 2013 four content releases for the Platform are foreseen. The current planning is as follows:
Historic Magazines by June 2012
* Adding 1.5M pages from the period 1840-1950 + adding magazine-specific functionalities to the user interface
Historic Books by October 2012
* Transferring the 2.1M pages & book-specific functionalities of EDBO into the Platform
* Adding the e-books from the Google effort that have been digitized by that date
Dutch university collections by May 2013
* The exact number and nature of the objects are not yet known, as the investigation of eligible collections is currently underway.
Historic Newspapers by May 2013
* Transferring the 9.5M pages & newspaper-specific functionalities of http://kranten.kb.nl into the Platform
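
The release planning in these notes can be written down as a small data structure, which makes it easy to total the page counts that are actually stated. This is only a sketch: the `Release` type and its field names are illustrative assumptions, and releases without a stated figure (the university collections and the Google e-books) are deliberately left unknown rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical record for one planned content release of the Platform."""
    name: str
    due: str               # planned month (YYYY-MM), taken from the notes
    pages: Optional[int]   # None where the notes give no page count


RELEASES = [
    Release("Historic Magazines", "2012-06", 1_500_000),
    Release("Historic Books (EDBO)", "2012-10", 2_100_000),
    Release("Dutch university collections", "2013-05", None),
    Release("Historic Newspapers", "2013-05", 9_500_000),
]

# Sum only the page counts that the notes state explicitly.
known_pages = sum(r.pages for r in RELEASES if r.pages is not None)
# Share of the 730 million page target covered by those known counts.
share_of_target = round(100 * known_pages / 730_000_000, 1)

print(known_pages)      # 13100000
print(share_of_target)  # 1.8
```

On these stated figures, the four planned releases cover a little under 2% of the 730 million page target; the real share depends on the counts that are not yet known.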
Page 71: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Uniform look & feel, independent of object

3. Unified object presentation

Page 72: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Book (early wireframing stage)

Uniform look & feel, independent of object

3. Unified object presentation

Presenter
Presenter notes
Because of the two limitations of WCL mentioned, the KB is currently developing a website that WorldCat Local can refer to for presentation of the object. For every digital publication stored in the KB, a landing page with persistent URL is created on which the full-text book, magazine or newspaper is presented. This will create a user interface with a uniform look & feel for all KB's full-text objects. This website has the (working) name of KB Platform for Digital Publications.
Page 73: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Newspaper (early wireframing stage)

Uniform look & feel, independent of object

3. Unified object presentation

Page 74: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Uniform look & feel, independent of object

3. Unified object presentation

Magazine (early wireframing stage)

Page 75: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Landing page + persistent ID

3. Unified object presentation

Page 76: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Landing page + persistent ID

3. Unified object presentation

Landing page (within Platform for Digital Publications)

Page 77: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Landing page + persistent ID

3. Unified object presentation

Landing page (within Platform for Digital Publications)

persistent ID

Page 78: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Landing page + persistent ID

3. Unified object presentation

KB metadata search (via WCLocal)

Landing page (within Platform for Digital Publications)

Page 79: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Landing page + persistent ID

3. Unified object presentation

KB metadata search (via WCLocal)

KB full-text search (via Platform for Digital Publications)

Landing page (within Platform for Digital Publications)

Page 80: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Landing page + persistent ID

3. Unified object presentation

Scientist, student etc.

KB metadata search (via WCLocal)

KB full-text search (via Platform for Digital Publications)

Landing page (within Platform for Digital Publications)

Page 83: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Landing page + persistent ID

3. Unified object presentation

Scientist, student etc.

KB metadata search (via WCLocal)

KB full-text search (via Platform for Digital Publications)

Landing page (within Platform for Digital Publications)

Presenter
Presenter notes
Because of the two limitations of WCL mentioned, the KB is currently developing a website with a landing page for each object. Not only WCL, but also other services (e.g. Google, Europeana, other library catalogues, Wikipedia) or persons (e.g. scientists citing a publication) can refer to the landing page/presentation environment using the persistent URL. This website has the working name KB Platform for Digital Publications.
Page 84: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196

So… we’re very busy creating loads of digital content …

Page 85: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196

So… we’re very busy creating loads of digital content …

and we’re also creating unified discovery & presentation …

Page 86: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

making quite a few people happy…

http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=happy

Page 87: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

But we want more …

Page 88: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

KB wants to make MORE people

HAPPIER with its content!

Page 89: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

Some strategies…

http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg

Page 90: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

1) APIs / data services: OAI-PMH & SRU

http://www.ducatimeccanica.com/single_engine.jpg

Presenter
Presenter notes
The KB Platform for Digital Publications will be aimed at meeting the needs of the web 2.0 generation. The Platform will therefore not only offer full-text searching and presentation via a modern web 2.0 user interface, but will also act as a content distribution service with APIs. These can deliver content to users in their established social networks and platforms, on their mobile devices, in their professional virtual research environments & communities, and to the Open Data community. They also allow others (both businesses and consumers) to build their own applications based on the content in the Platform.
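As a rough sketch of what calling such APIs could look like, the Python below builds request URLs for the two protocols named on the slide: an OAI-PMH `ListRecords` harvest and an SRU `searchRetrieve` query. The parameter names come from the OAI-PMH 2.0 and SRU 1.2 specifications; the endpoint addresses `api.example.org` and the set name `magazines` are hypothetical, since the talk does not give the Platform's API details.

```python
from urllib.parse import urlencode

# Hypothetical endpoints; the Platform's real API addresses are not given in the talk.
OAI_BASE = "https://api.example.org/oai"
SRU_BASE = "https://api.example.org/sru"


def oai_list_records_url(metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request for bulk metadata harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec    # optional: restrict to one collection
    if from_date:
        params["from"] = from_date  # optional: only records changed since this date
    return OAI_BASE + "?" + urlencode(params)


def sru_search_url(cql_query, start_record=1, maximum_records=10):
    """Build an SRU 1.2 searchRetrieve request for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return SRU_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(oai_list_records_url(set_spec="magazines", from_date="2012-06-01"))
    print(sru_search_url("windmill and tulips"))
```

OAI-PMH suits third parties that want to harvest whole collections in bulk, while SRU suits applications that send individual search queries on behalf of a user.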
Page 91: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

2) Content via (social) networks

Page 92: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

3) Clear licensing information for use & reuse: CC-zero, unless (legal) restrictions apply

Presenter
Presenter notes
Obviously, distribution and re-use of content outside the KB web domain is bound by copyright conditions. Even for objects that are in the public domain because of their age (such as the EDBO collection (6)), the KB is not always free to proactively distribute them. For instance, the public-private partnerships agreed with Google and Proquest both carry conditions that prevent the KB from distributing the full-texts to third-party servers.
Page 93: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

4) Strategic partnerships: Europeana & Wikipedia

Presenter
Presenter notes
4 The cross-domain & international dimensions

As the national library, the KB has a very important facilitating and networking role in the Dutch scientific and cultural infrastructure. Using this position, it has the potential to set up and stimulate different levels of collaboration to make online heritage more accessible. This is illustrated by the 3-tier collaborative model.

Lower level: domain-specific collaboration & aggregation

The KB plans to position its Platform for Digital Publications as a national aggregator for Dutch full-texts, aiming to make the content - and the network of content-delivering partners - interoperable and ready for participation in cross-domain initiatives on national and international levels.

Besides the KB with its Platform, organizations from other domains are working on interoperability and aggregation for their specific sectors. Led by the Institute for Sound & Vision, institutions from the audio-visual domain collaborate to enable aggregation of AV materials. A similar initiative is taking place for the archival domain, with the National Archives as the facilitator. For the museum sector the Rijksdienst voor het Cultureel Erfgoed is the main player.

Content aggregation and its supporting technical and organizational structures are not set up uniformly, but differ across the domains. Based on sector-specific best practices, knowledge and culture, each aggregator is setting up domain interoperability in the way that suits its sector best. This is however not done in isolation; the domains are in regular contact to reach consensus on issues such as “which content goes where”, to learn from each other and to avoid overlapping work. This way responsibilities & roles are kept clear, while at the same time synergies are exploited where possible.

Middle level: national cross-domain collaboration & aggregation

To enable these sector-specific aggregation initiatives to come together, the results of the NED! project are used. It delivered a basic infrastructure for the interoperability of Dutch digital heritage, using open standards including XML, Dublin Core, OAI-PMH and SRU. It is now being expanded to build a cross-domain heritage aggregator that can become the national hub for content delivery to international initiatives.

Building a national aggregator is however a step-by-step process that is not finished overnight. Until that time domain-specific aggregators - in case of the library domain the National Platform for Digital Publications or The European Library - will continue to have an important role in routing Dutch library content directly to top-level services. Finally, it should be noted that the cross-domain hub is envisioned as a “dark aggregator”, i.e. a B2B service without an interface (website) for end users (however, see item 5 below).

Top level: international cross-country collaboration & aggregation

Having established national cross-domain aggregation and interoperability on as many levels as possible, Dutch content can be shown and used on international stages, most notably Europeana. This fast-growing, largely EU-funded metadata aggregator and display space for European digitized works enables people to explore the resources of Europe's museums, libraries, archives and audio-visual institutions. It promotes discovery and networking opportunities in a multilingual space where users can engage, share in and be inspired by the rich diversity of Europe's cultural and scientific heritage. Europeana always connects users to the original source of the material, so authenticity is ensured. The digital objects they can find are not stored centrally with Europeana, but remain hosted at the providing cultural institutions.

Europeana offers the following added values for (Dutch) content-holding institutions:
- It enriches the experience of their users by making relations between their objects and information from other countries and institutions. This enables cross-border and interdisciplinary research, as well as enriching the content by presenting it in a wider context.
- Users expect integrated content: they want to see videos, listen to sound recordings, look at images and read texts, all in one place. Using Europeana they can find related content in multiple formats, from different countries and from diverse domains and disciplines.
- Europeana makes their content findable in search engines and social platforms.
- Europeana generates extra visits to their holdings by redirecting users to the original source of the content (i.e. the content holders’ websites).
- Europeana offers a set of APIs. These not only enable reuse of Europeana content by third parties, but also allow the content that has been contextualized & enriched by Europeana to be given back to the providers for use in their own environments. The APIs, in other words, make it possible to create user interface elements for (dark) aggregation services on the lower and middle levels, as indicated in Figure 2 by the dotted blue API arrows.
- Knowledge transfer can be a major added value of becoming a participant in the Europeana network. Europeana collaborates with digital library professionals across Europe and the US. Knowledge generated by these experts is fed back into the network via presentations, workshops and seminars. This way valuable knowledge about the theory and practice of metadata standards, multilinguality, semantic web, information architectures, usability, geolocation, object modeling and many other subjects becomes available to content suppliers.
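To make the role of the open standards mentioned above (XML, Dublin Core, OAI-PMH) more concrete, here is a minimal Python sketch of how an aggregator might read Dublin Core records out of an OAI-PMH `ListRecords` response. The namespaces are the standard OAI-PMH and Dublin Core ones; the sample record, its identifiers and the choice to treat `dc:identifier` as the link back to the provider are invented for illustration and do not describe the actual NED! or Europeana pipelines.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response; identifiers and URL are hypothetical.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:0001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example magazine issue</dc:title>
          <dc:type>text</dc:type>
          <dc:identifier>https://resolver.example.org/resolve?id=example:0001</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Extract a few Dublin Core fields from each harvested record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            # Assumption: dc:identifier holds the link back to the provider's page.
            "source_url": record.findtext(
                ".//dc:identifier", default="", namespaces=NS),
        })
    return records


if __name__ == "__main__":
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec)
```

Only metadata and a link travel to the aggregator in this sketch, which matches the point made above that the digital objects themselves remain hosted at the providing institutions.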
Page 94: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

http://4.bp.blogspot.com/-QqBeVbbjrpY/T4csQk4dtcI/AAAAAAAAAYw/6btxopsuRsM/s1600/crowd2.jpg

5) Crowdsourcing: collaborative OCR correction

Page 95: The  Big Dutch 20 Year  730 Million Page Digitisation  Challenge

[email protected] - @ookgezellig - slideshare.net/OlafJanssenNL

Thanks for your attention!