The Big Dutch 20 Year 730 Million Page Digitisation Challenge

Post on 06-May-2015

The National Library of the Netherlands (KB) is mass-digitizing all Dutch publications published since 1470. This article outlines KB's strategy for making this output publicly available.

In the next 20 years, the Dutch national library (KB) will mass-digitize all Dutch printed books, newspapers and magazines since 1470, a total of 730 million pages. Until recently, this was done by public funding alone. To speed things up in a climate of ongoing budget cuts, KB entered into public-private partnerships with both Google and Proquest to digitize 42 million pages by 2013. Besides the availability of funding, digitization priority is determined by a mix of client and institutional needs such as copyright status, uniqueness, institutional capability and user demand.

At the same time, KB is answering user demand for centralized access and content distribution by streamlining its scattered online services portfolio. For this, KB develops two strategic lines of action.

* The first is on metadata (searching FOR publications): in 2013, KB will unify metadata searching across all its paper and digital collections via OCLC's WorldCat Local.
* The second is on full-text (searching IN publications): for searching in full-text historic publications (i.e. mass digitization output) KB is currently developing its Platform for Digital Publications.

Besides a search engine, the Platform is also a:

* Presentation environment, associating each full-text object with a standardized webpage and persistent URL, offering a uniform look and feel and a unique reference for all KB's full-texts. This landing page enables third-party services (e.g. WorldCat Local, Europeana, Google) to refer to objects in a persistent way.
* Delivery platform, enabling KB to deliver content in the workflows of users via APIs and expose it to research communities.
* Aggregator, enabling KB to set up a network of partners to bring together all Dutch digital books, newspapers and magazines, at the same time supporting Europeana's content aggregation strategy.

Transcript of The Big Dutch 20 Year 730 Million Page Digitisation Challenge

The Big Dutch 20 Year 730 Million Page Digitisation Challenge

w.alumni.ubc.ca/wp/wp-content/uploads/Gohagan_RLDutchWW_2012_01_HollandWindmillTulips.jpg

10th International Conference on the Book, 30th June 2012, Barcelona, Spain

Olaf Janssen, National Library of the Netherlands – olaf.janssen@kb.nl / @ookgezellig / slideshare.net/OlafJanssenNL

Presentation notes:
The National Library of the Netherlands (KB) is mass-digitizing all Dutch publications since 1470. This presentation outlines KB's strategy for making this output publicly available. Keywords: National Libraries, Digital Library Strategies, Mass Digitization, Public-Private Partnerships, Integrated Access, Cross-domain Cultural Heritage, Content Aggregation and Distribution, Interoperability, Europeana

Photo: KB

Hello, my name is Olaf Janssen

I’m a project manager for KB, the National Library of the Netherlands…

http://pinterest.com/pin/238901955203421002/

These are my colleagues …

Source: KB intranet

Every day we give our best …

... because we’ve a task to accomplish

Source: NRC

We are scanning

Source: NRC

all Dutch

Presentation notes:
In the next 20 years, the Dutch national library (KB) will mass-digitize all Dutch printed books, newspapers and magazines since 1470, a total of 730 million pages.

books, newspapers & magazines

http://www.corbisimages.com/stock-photo/rights-managed/42-20042070/1960s-1970s-boy-leaning-against-tree-reading?popup=1
http://www.corbisimages.com/stock-photo/rights-managed/42-26195211/humor-portrait-man-wearing-hat-sitting-on?popup=1
http://eu.art.com/products/p6901596700-sa-i5098387/posters.htm?ui=F9E1398DA4CC4D3DB105E128FBAB2C4D

since

1470

http://marksayers.files.wordpress.com/2011/05/charles-darwin-2.png

A whopping 730,000,000 pages

http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg
http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg

For the next 7300 days (approx.)

http://www.picturesfromourpast.com/gallery/RightsManaged/20110817/CKSA011_YC019.jpg
http://imgc.allpostersimages.com/images/P-473-488-90/56/5641/2RYMG00Z/posters/george-marks-surprised-woman-posing-portrait.jpg

That’s 100,000 pages every single day !!

http://www.allposters.co.uk/-sp/Man-Wiping-Forehead-Posters_i8018953_.htm
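
The daily figure follows directly from the totals quoted on the slides. The short sketch below redoes that arithmetic; the variable names are illustrative, and a flat 365-day year is an assumption (leap days are ignored, in line with the "approx." on the slide).

```python
# Figures taken from the slides: 730 million pages in 20 years.
TOTAL_PAGES = 730_000_000
PROGRAMME_YEARS = 20
DAYS_PER_YEAR = 365  # assumption: leap days ignored

# Length of the programme in days, and the scanning rate needed to finish in time.
programme_days = PROGRAMME_YEARS * DAYS_PER_YEAR
pages_per_day = TOTAL_PAGES / programme_days

print(f"{programme_days} days, {pages_per_day:,.0f} pages per day")
```

Under these assumptions the result matches the slides: 7300 days and 100,000 pages per day, counting every calendar day rather than working days only.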

And of course, after digitisation, we want to make many people happy with our content.

http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html

I work on this because I believe…

ultimately people want to know who they are.

For that they explore their histories & origins.

I want to help them in exploring these worlds

The key idea I’d like to share with you today:

How KB goes about tackling these grand challenges…

http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg

First, we are creating digital content ...

http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196

http://www.corbisimages.com/stock-photo/rights-managed/42-20042210/1940s-man-in-suit-holding-up-index?popup=1

1995-1999

Small scale digitisation

Treasures & highlights of KB collection (1000s pages)

Presentation notes:
The KB started digitizing its holdings in 1995, for reasons of accessibility and long-term preservation. In the first years small scale efforts focused on scanning visually attractive materials, highlights of the collection for the widest possible audiences. One of the first projects was 100 highlights of the Koninklijke Bibliotheek,

Memory of the Netherlands (730K images)

Presentation notes:
followed by Memory of the Netherlands, the national programme for digitizing Dutch cultural heritage, which focused on image-based materials.

2000-2010

http://www.corbisimages.com/stock-photo/rights-managed/42-26194724/smiling-blond-nurse-with-surprised-expression-talking?popup=1

Presentation notes:
It was not until 1999 that the KB started digitizing historical textual publications (books, newspapers & magazines).

Large scale, public funding

Early Dutch Books (2.2M pages, full-text, 1781-1800)

Presentation notes:
For the last 8 years the focus has been on large-scale digitization of text corpora for study and research in the humanities, using public funding. In 2003 a project took off to scan the complete run of Dutch Parliamentary Papers. Consisting of 2.3 million pages, this was at that time an unprecedented quantity for the Netherlands. In May 2011 the Early Dutch Books Online digitization effort delivered 2.1 million full-text pages from the special book collections of the KB and the university libraries of Amsterdam and Leiden. Furthermore, by the summer of 2012, some 1.5 million pages from the most frequently consulted old magazines (1840-1950) will have been converted into full-text and made searchable at the word level.

Historic Newspapers (9.5M pages, full-text, 1618-1995)

Presentation notes:
At the end of 2006 the KB was awarded the Historical Newspapers project. By the middle of 2012, it will have scanned 9.5 million pages from popular Dutch regional, national and colonial newspapers from the period 1618-1995.

http://www.corbisimages.com/stock-photo/rights-managed/42-26194844/smiling-woman-counting-on-fingers-wearing-pearls?popup=1

2010-today

Mass scale, private & public funding

Proquest partnership (12M pages, 1450-1700)

http://www.kb.nl/nieuws/2011/proquest-en.html

Presentation notes:
The KB cannot rely on public funding alone, especially in times when government support for cultural & scientific heritage is in a downward trend. It has therefore entered into strategic public-private partnerships with both Google and Proquest to digitize 210,000 books (some 42M pages) from its public domain collections.

KB and Google sign book digitization agreement, http://www.kb.nl/nieuws/2010/google-en.html
Digitization by Proquest of early printed books in KB collection, http://www.kb.nl/nieuws/2011/proquest-en.html

Google partnership (35M pages, full-text, 1701-1871)

http://www.kb.nl/nieuws/2010/google-en.html

OK, so we’re very busy creating loads of digital content …

http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196

Presentation notes:
Besides the availability of funding and the physical condition of the materials, the priorities for digitizing KB’s textual collections are determined by a mix of further client and institutional needs. These include:

1. Copyright status. To minimize legal hassle and optimize future (re)usability of its objects, KB mainly focuses on historic out-of-copyright content. For these reasons the Google and Proquest efforts deliberately focus on these types of objects. For materials that are still under copyright, typically post-WWII publications, the KB tries to clear rights by signing bulk agreements with publishers and collective rights organizations. In the Netherlands, Lira (for text) and Pictoright (for images) are the main parties the KB has signed bulk contracts with to make the Historic Newspapers website (5) possible. For orphan works the KB makes a reasonable effort (30 minutes per title) to track down and document rights holders. However, this search is not always successful. The KB has chosen not to delay the upcoming launch of its Historic Magazines website (7) until all IPR holders have been identified. When the site launches in June 2012, it will contain an opt-out page where rights holders can identify themselves and object to online publication of their articles, after which appropriate action will be taken.
2. Uniqueness of materials. The KB has a lot of publications that are unique to the country. The original documents always have priority, as they are irreplaceable and carry the highest level of authenticity and integrity.
3. Institutional capability. There are many Dutch publications that have been collected not only by the KB, but by other libraries as well. However, not all of these libraries are willing or able to digitize these works, because they lack resources, digitisation know-how or storage infrastructure. The KB can choose to digitize those works, as other parties are unlikely to do this.
4. User demand. By 2013 it will be possible to give priority to digitizing publications that clients explicitly ask for. In other words, next year KB will have an operational Digitisation on Demand service.

Houston, we’ve a problem!!

http://2.bp.blogspot.com/_BWzuYwiS6-I/TMgeRsFd3mI/AAAAAAAAElw/3cvgbZSPWcs/s1600/doctor+macro+judy+scared.jpg

Although we create & store our digital content in a strictly standardized process … (JP2, JPG, XML-OCR, MPEG21, ALTO, PDF … *)

* http://kb.nl/hrd/digitalisering/index-en.html

http://www.electrohype.org/press/pionjar/IBM_System360_Mod_50.jpg
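
To give an impression of what one of these back-end formats looks like in practice, here is a minimal sketch that reads the recognised text out of an ALTO file. It rests on several assumptions: the sample document is invented for illustration, it uses the ALTO version 2 namespace, and the function names are hypothetical. KB's actual files may use a different ALTO version or structure.

```python
import xml.etree.ElementTree as ET

# Invented sample in ALTO v2 style: words are String elements with a CONTENT
# attribute, grouped into TextLine elements.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Koninklijke"/><SP/><String CONTENT="Bibliotheek"/>
    </TextLine>
    <TextLine>
      <String CONTENT="Den"/><SP/><String CONTENT="Haag"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the OCR text of an ALTO document as a list of lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.attrib["CONTENT"]
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

Matching on the local tag name rather than a fixed namespace is a deliberate choice here, since the namespace URI differs between ALTO versions.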

… this back-end standardisation is not reflected in the front-end

KB Treasures Memory of the Netherlands

KB full-text Historic books

KB full-text Historic newspapers

Google full-text Historic books

Proquest Historic books

To many people KB’s website portfolio feels something like this

http://berichtenuithetverleden.files.wordpress.com/2011/03/escher.jpg

Current KB websites are inconsistent in

* Branding
* Object presentation
* URL-logic
* Design
* Search logic
* User experience
* Display of result set

Scattered & unrelated collections

Non-interoperability

Images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts

Presentation notes:
The KB has a large and growing portfolio of online services for consumers, non-commercial parties and businesses. The multitude of KB websites makes it practically impossible for users to get a complete picture of all the content - both metadata and full-texts - the KB gives access to. Each of these sites has its own specific branding, URL, design and search & result display functionalities. For users it is not clear that e.g. the websites of the Memory of the Netherlands (3), the Parliamentary Papers (4), Historical Newspapers (5), Early Dutch Books Online (6) and ANP radio transcripts are all based on the same back-end open data standards and proven working methods, both for metadata and full-text. On the data level these services are interoperable, but this potential has not yet been optimized in the front-end presentation. For end-users the KB collections thus appear to be unrelated and scattered,

In short:

Current KB websites don’t meet expectations of modern & future generations

http://www.corbisimages.com/stock-photo/rights-managed/NT3707756/depressed-cheerleader?popup=1
http://www.corbisimages.com/stock-photo/rights-managed/42-20036948/1960s-1970s-seated-baby-in-diaper-with?popup=1

Presentation notes:
making them relatively difficult to use given the expectations of the web2.0 generation. These users demand that all content be available not only centrally via a single point of entry (portal), but also distributed in their own social and professional workflows, networks and platforms. They expect to be able to apply multiple views & filters (by theme, by time, by geographical location, by object type etc.) to the interoperable, contextualized, enriched, sharable, taggable and re-usable content, with minimum copyright limitations. In addition, users are primarily interested in the digital content itself, much less in which physical object or institution it was derived from.

http://www.leninimports.com/cary_grant_new_7a.jpg

No panic!

We are working on a solution…

KB is implementing 3 lines of action

1. Unified searching for publications
2. Unified searching in publications
3. Unified object presentation

http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg
http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1

Presentation notes:
It is virtually impossible to meet all user demands in one single service, so the strategy of the KB until 2013 focuses on 3 lines of action: one for metadata search, one for full-text search, and one for object presentation.

1. Unified searching for publications

Metadata

KB General Catalogue: searching for
• (e-)books
• (e-)magazines
• (e-)newspapers

MetaLib: searching for
• scholarly e-journals
• licensed 3rd party databases

Presentation notes:
Users that need specific publications (known-item search) will in general want to use metadata for finding the objects. Currently the KB offers its General Catalogue - for searching for (e-)books, (e-)magazines and (e-)newspapers – and MetaLib for searching for scholarly e-journals and licensed 3rd party databases.

WorldCat Local: KB’s single starting point for searching for publications

Presentation notes:
In 2012 the KB will unify metadata searching via OCLC’s discovery tool WorldCat Local, making this KB’s single point of entry for known-item search across all digital (and paper) collections.

Downsides of WorldCat Local

1. No full-text searching
2. No object presentation

http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=sad

Presentation notes:
The current version of WorldCat Local (WCL) has two limitations that are relevant for KB’s digital ambitions:

1. Full-text search: WCL does not support full-text searching, so parties that want to offer their users word-level discovery of their content must provide this in other services.
2. Object presentation: as WCL is only aimed at finding digital content, the actual presentation of objects referred to by WCL is left to 3rd parties.

http://www.leninimports.com/cary_grant_new_7a.jpg

No panic!

We are tackling this…

2. Unified searching in publications

Full-text

KB Platform for Digital Publications

* KB full-text historic books
* Google full-text historic books
* KB full-text historic newspapers
* KB full-text historic magazines (sept 2012)
* All future KB full-text digitisation output

Presentation notes:
Because of the two limitations of WCL mentioned, the KB is currently developing a website

- where users can search in publications. It will be KB’s single point of entry for full-text search in digitized historic materials.
- that WorldCat Local can refer to for presentation of the object. For every digital publication stored in the KB, a landing page with persistent URL is created on which the full-text book, magazine or newspaper is presented. This will create a user interface with uniform look & feel to all KB's full-text objects.

Not only WCL, but also other services (e.g. Google, Europeana, other library catalogues, Wikipedia) or persons (e.g. scientists citing a publication) can refer to the landing page/presentation environment using the persistent URL.

This website has the (working) name of KB Platform for Digital Publications. KB’s Platform for Digital Publications marks a turning point towards centralized access to Dutch digital publications. It will aggregate

- The content & functionalities of existing KB full-text websites. In the first instance Historic Newspapers (5) and EDBO (6). These individual websites will be phased out as soon as the content has been transferred into the Platform.
- Upcoming output of KB digitization projects. In the first instance Historic Magazines (7), Google (9) and Proquest (10).
- The future output of digitization projects under the Metamorfoze programme (8).
- Existing full-text collections from Dutch university libraries that have a tradition in the humanities, such as Leiden and Amsterdam. This collaboration with content delivering partners is of strategic importance for KB’s ambitions to position the Platform as a national aggregator for digital textual publications.

For the years 2012 and 2013 four content releases for the Platform are foreseen. The current planning is as follows:

Historic Magazines by June 2012
* Adding 1.5M pages from the period 1840-1950 + adding magazine-specific functionalities to the user interface

Historic Books by October 2012
* Transferring the 2.1M pages & book-specific functionalities of EDBO into the Platform
* Adding the e-books from the Google effort that have been digitized by that date

Dutch university collections by May 2013
* The exact number and nature of the objects are not yet known, as the investigation of eligible collections is currently underway.

Historic Newspapers by May 2013
* Transferring the 9.5M pages & newspaper-specific functionalities of http://kranten.kb.nl into the Platform

3. Unified object presentation

Uniform look & feel, independent of object

Book (early wireframing stage)

Presentation notes:
Because of the two limitations of WCL mentioned, the KB is currently developing a website that WorldCat Local can refer to for presentation of the object. For every digital publication stored in the KB, a landing page with persistent URL is created on which the full-text book, magazine or newspaper is presented. This will create a user interface with uniform look & feel to all KB's full-text objects. This website has the (working) name of KB Platform for Digital Publications.

Newspaper (early wireframing stage)

Magazine (early wireframing stage)

Landing page + persistent ID

Scientist, student etc.

KB metadata search (via WCLocal)

KB full-text search (via Platform for Digital Publications)

Landing page (within Platform for Digital Publications)

persistent ID

Presentation notes:
Because of the two limitations of WCL mentioned, the KB is currently developing a website whose landing pages (the presentation environment) can be referred to via a persistent URL, not only by WCL but also by other services (e.g. Google, Europeana, other library catalogues, Wikipedia) or persons (e.g. scientists citing a publication). This website has the (working) name of KB Platform for Digital Publications.
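
The idea of a persistent URL in front of a landing page can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the resolver address, the identifier scheme, the landing page addresses and the function names are all hypothetical, and the talk does not describe how KB implements this.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical resolver address; the real one is not given in the talk.
RESOLVER_BASE = "https://resolver.example.org/resolve"

# Hypothetical lookup table from persistent identifier to current landing page.
LANDING_PAGES = {
    "demo:book:0001": "https://platform.example.org/books/0001",
    "demo:newspaper:0001": "https://platform.example.org/newspapers/0001",
}


def persistent_url(object_id):
    """Build the URL that catalogues, search engines or citations would store."""
    return f"{RESOLVER_BASE}?{urlencode({'id': object_id})}"


def resolve(url):
    """Map a persistent URL to wherever the landing page currently lives."""
    object_id = parse_qs(urlsplit(url).query)["id"][0]
    return LANDING_PAGES[object_id]


citation = persistent_url("demo:book:0001")
print(citation, "->", resolve(citation))

# If the landing page moves, only the table changes; the cited URL stays valid.
LANDING_PAGES["demo:book:0001"] = "https://platform.example.org/v2/books/0001"
print(citation, "->", resolve(citation))
```

The point of the indirection is that referring services store only the persistent URL, so the presentation environment can be reorganised without breaking their links.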

So… we’re very busy creating loads of digital content …

http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196

and we’re also creating unified discovery & presentation …

making quite a few people happy…

http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=happy

But we want more …

KB wants to make MORE people HAPPIER with its content!

Some strategies…

http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg

1) APIs / dataservices: OAI-PMH & SRU

http://www.ducatimeccanica.com/single_engine.jpg

Presentation notes:
The KB Platform for Digital Publications will be aimed at meeting the user needs of the web2.0 generation. The Platform will therefore not only have full-text searching and presentation functionalities via a modern web2.0 user interface, but will also act as a content distribution service with APIs. These can deliver content to users in their established social networks and platforms, on their mobile devices, in their professional virtual research environments & communities, and to the Open Data community, and they allow others (both businesses and consumers) to build their own applications based on the content in the Platform.
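
As a rough illustration of what calling such dataservices involves, the sketch below builds request URLs for the two protocols named on the slide. The query parameters are the standard ones from the OAI-PMH 2.0 and SRU 1.2 specifications; the base URLs and function names are hypothetical, since the talk does not give the Platform's actual endpoints. No network request is made.

```python
from urllib.parse import urlencode

# Hypothetical endpoints, used only to show the shape of the requests.
OAI_BASE = "https://services.example.org/oai"
SRU_BASE = "https://services.example.org/sru"


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                         resumption_token=None):
    """Build an OAI-PMH ListRecords request for harvesting metadata in bulk."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def sru_search_url(base_url, cql_query, maximum_records=10, start_record=1):
    """Build an SRU searchRetrieve request for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": maximum_records,
        "startRecord": start_record,
    }
    return f"{base_url}?{urlencode(params)}"


print(oai_list_records_url(OAI_BASE))
print(sru_search_url(SRU_BASE, "windmolen and tulpen"))
```

The two protocols complement each other: OAI-PMH suits partners who harvest whole collections, while SRU suits applications that search on demand.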

2) Content via (social) networks

3) Clear licensing information for use & reuse: CC-zero, unless (legal) restrictions apply

Presentation notes:
Obviously, distribution and re-use of content outside the KB web domain is bound to copyright conditions. Even for objects that are in the public domain because of their age (such as the EDBO collection (6)), the KB is not always free to pro-actively distribute them. For instance, the public-private partnerships agreed with Google and Proquest both carry conditions that prevent the KB from distributing the full-texts to 3rd party servers.

4) Strategic partnerships: Europeana & Wikipedia

Presentation notes:
4 The cross-domain & international dimensions

As the national library, the KB has a very important facilitating and networking role in the Dutch scientific and cultural infrastructure. Using this position, it has the potential to set up and stimulate different levels of collaboration to make online heritage more accessible. This is illustrated by the 3-tier collaborative model.

Lower level: domain-specific collaboration & aggregation

The KB plans to position its Platform for Digital Publications as a national aggregator for Dutch full-texts, aiming to make the content - and the network of content delivering partners - interoperable and ready for participation in cross-domain initiatives on national and international levels.

Besides the KB with its Platform, organizations from other domains are working on interoperability and aggregation for their specific sectors. Led by the Institute for Sound & Vision, institutions from the audio-visual domain collaborate to enable aggregation of AV materials. A similar initiative is taking place for the archival domain, with the National Archives as the facilitator. For the museum sector the Rijksdienst voor het Cultureel Erfgoed is the main player.

The ways content aggregation and the supporting technical and organizational structures are set up are not uniform, but differ across the domains. Based on sector-specific best practices, knowledge and culture, each aggregator is setting up domain interoperability in the best possible way. This is however not done in isolation; the domains are in regular contact to reach consensus on issues such as “which content goes where”, to learn from each other and to avoid overlapping work. This way responsibilities & roles are kept clear, while at the same time synergies are exploited where possible.

Middle level: national cross-domain collaboration & aggregation

To enable these sector-specific aggregation initiatives to come together, the results of the NED! project are used. It delivered a basic infrastructure for the interoperability of Dutch digital heritage, using open standards including XML, Dublin Core, OAI-PMH and SRU. It is now being expanded to build a cross-domain heritage aggregator that can become the national hub for content delivery to international initiatives.

Building a national aggregator is however a step-by-step process, not finished overnight. Until that time domain-specific aggregators - in the case of the library domain the National Platform for Digital Publications or The European Library - will continue to have an important role in routing Dutch library content directly to top-level services. Finally, it should be noted that the cross-domain hub is envisioned as a “dark aggregator”, i.e. a B2B service without an interface (website) for end users (however, see item 5 below).

Top level: international cross-country collaboration & aggregation

Having established national cross-domain aggregation and interoperability on as many levels as possible, Dutch content can be shown and used on international stages, most notably Europeana. This fast-growing, largely EU-funded metadata aggregator and display space for European digitized works enables people to explore the resources of Europe's museums, libraries, archives and audio-visual institutions. It promotes discovery and networking opportunities in a multilingual space where users can engage, share in and be inspired by the rich diversity of Europe's cultural and scientific heritage. Europeana always connects users to the original source of the material, so authenticity is ensured. The digital objects they can find are not stored centrally with Europeana, but remain hosted at the providing cultural institutions.

Europeana offers the following added values for (Dutch) content holding institutions:

- It enriches the experience of their users by making relations between their objects and information from other countries and institutions. This enables cross-border and interdisciplinary research, as well as enriching the content by presenting it in a wider context.
- Users expect integrated content – they want to see videos, listen to sound recordings, look at images and read texts, all in one place. Using Europeana they can find related content in multiple formats, from different countries and from diverse domains and disciplines.
- Europeana makes their content findable in search engines and social platforms.
- Europeana generates extra visits to their holdings by redirecting users to the original source of the content (i.e. the content holders’ websites).
- Europeana offers a set of APIs. These not only enable reuse of Europeana content by third parties, but also allow the content that has been contextualized & enriched by Europeana to be given back to the providers for use in their own environments. The APIs, in other words, make it possible to create user interface elements for (dark) aggregation services on the lower and middle levels, as indicated in Figure 2 by the dotted blue API arrows.
- Knowledge transfer can be a major added value of becoming a participant in the Europeana network. Europeana collaborates with digital library professionals across Europe and the US. Knowledge generated by these experts is fed back into the network via presentations, workshops and seminars. This way valuable knowledge about the theory and practice of metadata standards, multilinguality, semantic web, information architectures, usability, geolocation, object modeling and many other subjects becomes available for content suppliers.
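
The aggregation idea in these notes, where domain-specific providers deliver records in a shared open standard and a hub merges them, can be sketched with simple Dublin Core. This is only an illustration under stated assumptions: the sample record, the domain labels and the function names are invented, and the code says nothing about how NED! or Europeana are actually built. The two namespace URIs are the standard ones for oai_dc and Dublin Core elements.

```python
import xml.etree.ElementTree as ET

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"

# Invented sample record in the oai_dc format commonly served over OAI-PMH.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example historic newspaper issue</dc:title>
  <dc:type>text</dc:type>
  <dc:identifier>https://resolver.example.org/resolve?id=demo%3Anewspaper%3A0001</dc:identifier>
</oai_dc:dc>"""


def dublin_core_fields(xml_text):
    """Collect Dublin Core elements into a dict of lists (fields may repeat)."""
    root = ET.fromstring(xml_text)
    record = {}
    for element in root:
        if element.tag.startswith("{" + DC_NAMESPACE + "}"):
            name = element.tag.split("}", 1)[1]
            record.setdefault(name, []).append((element.text or "").strip())
    return record


def aggregate(records_by_domain):
    """Merge records from several domains, remembering where each came from."""
    merged = []
    for domain, xml_records in records_by_domain.items():
        for xml_text in xml_records:
            fields = dublin_core_fields(xml_text)
            fields["source_domain"] = [domain]
            merged.append(fields)
    return merged


hub = aggregate({"library": [SAMPLE_RECORD], "archive": [SAMPLE_RECORD]})
print(len(hub), hub[0]["title"], hub[0]["source_domain"])
```

Because every domain delivers the same small set of fields, the hub needs no domain-specific code, which is the practical benefit of agreeing on open standards first.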

5) Crowdsourcing: Collaborative OCR correction

http://4.bp.blogspot.com/-QqBeVbbjrpY/T4csQk4dtcI/AAAAAAAAAYw/6btxopsuRsM/s1600/crowd2.jpg

olaf.janssen@kb.nl - @ookgezellig - slideshare.net/OlafJanssenNL

Thanks for your attention!