Digital Newspaper Collections: If You Build One, Who Will...

Digital Newspaper Collections: If You Build One, Who Will Visit? By Frederick Zarndt (Global Connexions) Day 1: 8th April 2014 Session II Content Development: Accelerating and Enriching Digital Content Creation


Page 1: Digital Newspaper Collections: If You Build One, Who Will ...myrepositori.pnm.gov.my/bitstream/123456789/1224/1/IDLC2014_Pa… · Digital Newspaper Collections: If You Build One,

Digital Newspaper Collections:

If You Build One, Who Will Visit?

By Frederick Zarndt

(Global Connexions)

Day 1: 8th April 2014

Session II

Content Development: Accelerating and Enriching Digital Content Creation

Frederick Zarndt has worked with historic and contemporary newspaper, journal, magazine, book, and records digitisation since computer speeds, software, technology, storage, and costs first made it practical. Frederick has experience in every aspect of digitisation projects, including project requirements development, project management, conversion operations (both in-house and outsourced), acceptance testing, software development for production and delivery of digital data, and digital preservation. Frederick is the current secretary and former chair of the IFLA Newspapers Section. He is the administrative chair of the ALTO XML Editorial Board and a member of the METS Editorial Board. Frederick has 25+ years of experience in software development and is a member of ACM and IEEE and a Certified Software Development Professional (CSDP). He is a member of ALA and IFLA. Frederick has Master's degrees in Computer Science and Physics.

Frederick Zarndt Global Connexions


digital newspaper collections:

if you build one, who will visit?

Frederick Zarndt

IFLA Newspapers [email protected]

@cowboyMontana

hashtag #IFLAnewspaper


about digital newspapers

• programs

• collections

• users / crowdsourcing

San Francisco Call 21 April 1906


why digitize newspapers?

“News is only the first rough draft of history.”

Alan Barth, writing for the Washington Post, 1943

Wikipedia contributors, “Alan Barth," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/Alan_Barth (accessed March 2014).


to preserve

to provide access

why digitize newspapers?


• newspapers are deteriorating

• microfilm is dissolving

• no storage space or space is too expensive


the principal reason to digitize newspapers is to provide non-destructive, universal

access to newspapers for as many users as possible

reading rooms by the numbers*

Monthly averages

Country       Population    Reading room visitors   Microform requests   Print requests
Australia     22,876,000    5,130                   345                  240
France        65,350,000    3,000                   2,000                1,000
Netherlands   16,847,000    NA                      NA                   NA
New Zealand   4,414,000     NA                      NA                   NA
Norway        4,985,000     600                     400                  NA
Singapore     5,184,000     NA                      300                  NA
UK            62,262,000    2,000                   6,900                4,816
USA           313,292,000   NA                      NA                   NA

*numbers from 2012

Photo by DAVID ILIFF. License: CC-BY-SA 3.0

physical versus digital

monthly averages 2012

Country          Population    Requests for newspapers   Digitised historical newspapers
                               (paper + microform)       (unique visitors)
Australia        22,876,000    585                       150,000
(not labelled)   37,692,000    NA                        12,800
(not labelled)   5,405,000     NA                        NA
France           65,350,000    3,000                     22,000
Netherlands      16,847,000    NA                        50,000
New Zealand      4,414,000     NA                        83,333
Norway           4,985,000     400                       1,500
Singapore        5,184,000     300                       12,400
UK               62,262,000    11,716                    NA
USA              313,292,000   NA                        NA


Image from http://www.visualinsight.net/nc/gallery/pages/e-Preservation.html

• newspaper digitization is expensive

• newspaper digitization is complicated

• digital preservation is expensive

• digital preservation is untested

BUT …


programs

programs

National

Cooperative

Individual


national: a single (national) library which funds and manages a national newspapers digitization program.

• Papers Past, National Library of New Zealand

• Newspaper SG, National Library of Singapore

• Historiallinen Sanomalehtikirjasto, National Library of Finland

• and others …

programs


national: centrally funded and centrally managed program with several participants. strict standards for participants.

• National Digital Newspaper Program (Library of Congress)

• Australian Newspaper Digitisation Program

programs


cooperative: organizations collaborate to achieve a common goal but digitization programs are managed separately. flexible standards.

• Europeana newspapers

• Digital Public Library of America

programs

individual: an organization digitizes on its own. may follow open standards but more usually does not. all commercial organizations.

• ProQuest Historical Newspapers

• Newspapers.com

• Newsbank

• many others…

programs


• the design of a digitization program requires careful thought and must be adapted to local circumstances

• determine principal or targeted user demographic and use cases

• ask those who have gone before

• join the IFLA Newspapers Section! (ask me how)

programs

Image courtesy of Donald Zolan.


collections

digital historic newspaper collections (as of Mar 2014)

Library                                   Collection                     ~Size (pages)   Dates
National Library of Australia             Trove                          12,668,000      1803-1995
California Digital Newspaper Collection   CDNC                           545,000         1846-2012
National Library of Finland               Historical Newspaper Library   3,006,000       1771-1919
Bibliothèque nationale de France          Gallica                        2,200,000       1293-2000
Koninklijke Bibliotheek                   Historische Kranten            9,000,000       1618-1995
National Library of New Zealand           Papers Past                    3,109,000       1839-1945
National Library of Norway                NBDigital Aviser               12,000,000      1763-2012
Singapore National Library                Newspaper SG                   2,400,000       1831-2009
British Library                           British Newspaper Archive      7,598,000       1710-1954
Library of Congress                       Chronicling America            7,293,000       1836-1922

[Chart, 2013 monthly averages: unique visits and page views on a logarithmic scale (1 to 10,000,000) for Australian Newspapers; Books; Pictures and photos; Journal Articles; Music, sound and video; Maps; Archived websites; Diaries, letters, archives; People and organisations]

[Chart, 2013 monthly averages: unique visits and page views on a linear scale (0 to 7,500,000) for Australian Newspapers; Books; Pictures and photos; Journal Articles; Music, sound and video; Maps; Archived websites; Diaries, letters, archives; People and organisations]

[Chart, 2013 monthly averages: unique visits, number of visits, and page views for NewspaperSG, Infopedia, and iRememberSG (axis 0 to 1,000,000)]

[Chart, February 2014: unique visits and page views for Papers Past versus National Library except Papers Past (axis 0 to 3,000,000). Data labels on the chart: 123,889; 53,897; 2,527,926; 517,823]

[Chart, 2013 monthly averages: shares of 90%, 10%, and 0% across Historic Cambridge Newspapers (1846-1923), Cambridge City Directories (1848-1910), and Cambridge Chronicle (August 2005 to present)]


users

Newspaper collection user survey

• California Digital Newspaper Collection and Cambridge Public Library published a user survey in Mar 2013

• 604 / 32 responses, respectively

• surveys are (mostly) identical except for organization name

John Herbert and Randy Olsen. “Small town papers: Still delivering the news”. Paper given at 2012 World Library and

User demographic: genealogists and family historians

John Herbert and Randy Olsen. “Small town papers: Still delivering the news”. Paper given at 2012 World Library and

User demographic: no spring chickens

John Herbert and Randy Olsen. “Small town papers: Still delivering the news”. Paper given at 2012 World Library and

User demographic: reasons for use

John Herbert and Randy Olsen. “Small town papers: Still delivering the news”. Paper given at 2012 World Library and

User demographic: types of information

John Herbert and Randy Olsen. “Small town papers: Still delivering the news”. Paper given at 2012 World Library and

Utah Digital Newspapers: 2012 user survey

• 72% visit UDN for genealogical research

• 20% visit for various other types of historical research

• 87% find obituaries useful

• Over 60% find the other genealogical article types (birth and wedding announcements) useful

• Only 7% do not find genealogical articles useful

• Many are writing family histories and consequently also look for general background information

• Older content is much more highly valued than more recent content (see more detailed explanation that follows)

• 44% find smaller, rural papers more useful, while only 15% find larger, metropolitan papers more useful

John Herbert and Randy Olsen. Small town papers: still delivering the news. WLIC 2012, Helsinki Finland. http://conference.ifla.org/past-wlic/2012/119-herbert-en.pdf

“The ‘typical’ Trove user is a very well educated, highly paid, English speaking employed woman aged fifty or over, with a significant or primary interest in family or local history, who visits the Trove website very frequently. Users of Trove newspapers are older than the average Trove user; only 13% of newspaper users are under 40 years of age.”

Marie-Louise Ayres. ‘Singing for their supper’: Trove, Australian newspapers, and the crowd. WLIC 2013, Singapore. http://library.ifla.org/245/1/153-ayres-en.pdf.

Engaged users: who are they?

“Many of Trove’s user engagement features are very popular. More than 100,000 users have registered to date, and more than 2 million tags and nearly 60,000 comments had been added… [Trove] text correction, however, stands head and shoulders above any other user engagement features.”

Marie-Louise Ayres. ‘Singing for their supper’: Trove, Australian newspapers, and the crowd. WLIC 2013, Singapore. http://library.ifla.org/245/1/153-ayres-en.pdf.

Engaged users: who are they?

Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers. ... [It] is different from ordinary outsourcing since it is a task or problem that is outsourced to an undefined public rather than a specific, named group.

Wikipedia contributors, "Crowdsourcing," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Crowdsourcing (accessed March 17, 2013)


Why correct text?

Here’s why ...


Deaths. lln»rieff, Esq. of <c .. Qn.Sunday, the till. greatly Drandrellt, ofOrms4\irJi.- ~ ; ;✓ ' • * On ijfr r innljjjil F iij '11 f Havodivyd,Carnarvonshire, S ; **" *- ' « ' MarchOxford, F. Tfovmeud, Uerald. » • V .•On Tncsdav last, Mr. Charles.IWilinson, this 8 ; had vf thesis#,, a weekago, which tcrminate<i'iu his death. . / ' ■O'i Sunday, dJst nit. at. AsbtCnvHall,mar Lancaster, Mr.,Geo. Worn ick,many years house'steward hit late OnceThe Hamilton and Brandon. He lockedhimself h»oWn'r«wte<: soon. twelveo'clock" that dny, and fii»-d a loaded pistol"through Ins bead, 1 whichinstantaneously killed him. Coronet'sVerdict, shot himself in a temporary fit ofFriday week,

raw OCR text

Excerpt from The British Newspaper Archive, Chester Courant, Tuesday 6-Apr-1819, page 3.

newspaper image

Accuracy

• Edwin Klijn (Koninklijke Bibliotheek, the Netherlands) reports raw OCR character accuracies of 68% for early 20th century newspapers

• Rose Holley (National Library of Australia) reports that raw OCR character accuracy varied from 71% to 98% on a sample of Trove digitized newspapers

Rose Holley. How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation programs. D-Lib Magazine. March/April 2009.

Edwin Klijn. The current state-of-art in newspaper digitization. D-Lib Magazine. January/February 2008.

uncorrected OCR accuracy by newspaper title

Code   Title                                              Dates         OCR character accuracy   ~OCR word accuracy*
PRP    Pacific Rural Press                                1871 - 1922   92.6%                    68.1%
SFC    San Francisco Call                                 1890 - 1913   92.6%                    68.1%
LAH    Los Angeles Herald                                 1873 - 1910   88.7%                    54.9%
LH     Livermore Herald                                   1877 - 1899   88.6%                    54.6%
DAC    Daily Alta California                              1841 - 1891   88.2%                    53.4%
CFJ    California Farmer and Journal of Useful Sciences   1855 - 1880   86.5%                    48.4%
SN     Sausalito News                                     1885 - 1922   70.4%                    17.3%

*Word accuracy assumes average word length is 5 characters

OCR accuracy by newspaper title

Code   Title                                              Dates         OCR character accuracy   Corrected accuracy
PRP    Pacific Rural Press                                1871 - 1922   92.6%                    99.3%
SFC    San Francisco Call                                 1890 - 1913   92.6%                    99.6%
LAH    Los Angeles Herald                                 1873 - 1910   88.7%                    99.1%
LH     Livermore Herald                                   1877 - 1899   88.6%                    99.9%
DAC    Daily Alta California                              1841 - 1891   88.2%                    99.9%
CFJ    California Farmer and Journal of Useful Sciences   1855 - 1880   86.5%                    98.3%
SN     Sausalito News                                     1885 - 1922   70.4%                    100.0%

corrected accuracy by newspaper title

Code   Dates         OCR character accuracy   ~OCR word accuracy*   Corrected accuracy   ~Corrected word accuracy*
PRP    1871 - 1922   92.6%                    68.1%                 99.3%                96.5%
SFC    1890 - 1913   92.6%                    68.1%                 99.6%                98.0%
LAH    1873 - 1910   88.7%                    54.9%                 99.1%                95.6%
LH     1877 - 1899   88.6%                    54.6%                 99.9%                99.5%
DAC    1841 - 1891   88.2%                    53.4%                 99.9%                99.5%
CFJ    1855 - 1880   86.5%                    48.4%                 98.3%                91.8%
SN     1885 - 1922   70.4%                    17.3%                 100.0%               100.0%

*Word accuracy assumes average word length is 5 characters

correction accuracy by user

User   Average OCR accuracy   Correction accuracy
A      70.4%                  100.0%
B      87.1%                  99.5%
C      95.4%                  99.5%
D      86.5%                  98.3%
E      95.3%                  100.0%
F      91.0%                  100.0%
G      91.0%                  99.8%
H      90.5%                  99.0%
I      96.6%                  99.8%
J      94.8%                  100.0%
K      86.8%                  99.3%


How does low text accuracy affect search recall?

The Facts

• Average uncorrected OCR character accuracy of the CDNC sample data is ~89%

• Average length of an English word is 5 characters

• Average word accuracy is 89% x 89% x 89% x 89% x 89% = 55.8% - round up to 60% or 6 out of 10 words correct

Accuracy
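The word-accuracy arithmetic in these facts can be reproduced in a few lines of Python. This is a minimal sketch, assuming (as the slide does) that character errors are independent and that a word is 5 characters long. The function name `word_accuracy` and the selection of titles are illustrative only and are not part of the talk.

```python
def word_accuracy(char_accuracy: float, word_length: int = 5) -> float:
    """Chance that every character in a word is correct,
    assuming character errors are independent of each other."""
    return char_accuracy ** word_length


# Character accuracies taken from the uncorrected CDNC sample in this talk.
sample_titles = {"PRP": 0.926, "LAH": 0.887, "SN": 0.704}

for code, char_acc in sample_titles.items():
    print(f"{code}: character {char_acc:.1%} -> word {word_accuracy(char_acc):.1%}")

# Average over the CDNC sample (~89% character accuracy).
print(f"CDNC average: {word_accuracy(0.89):.1%}")
```

Because the word accuracy is the character accuracy raised to a power, a few points of character accuracy translate into a much larger loss at the word level.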

Search recall: no text correction

[Figure: scattered occurrences of “ARNDT”, each marked as either found or not found]


Accuracy

The Facts

• Average corrected character accuracy of the CDNC sample data is ~99.4%

• Average word accuracy of CDNC corrected text is 99.4% x 99.4% x 99.4% x 99.4% x 99.4% = 97.0%

Search recall: with text correction

[Figure: scattered occurrences of “ARNDT”, each marked as either found or not found]


A search for “Arndt” at Chronicling America gives 10,267 results*

• If Chronicling America text accuracy is 55.8% (same as uncorrected CDNC sample), then 8,133 instances of “Arndt” were not found

• If text accuracy is 97.0%, then 317 instances of “Arndt” were not found

Accuracy

* Search performed 31 Oct 2012
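One way to arrive at estimates like these: if only a fraction a of the true occurrences is found, the true total is found / a, so the number missed is found × (1 − a) / a. The sketch below assumes that every search result is a genuine match and that word accuracy equals search recall. `estimated_missed` is a hypothetical helper name, not something from the talk.

```python
def estimated_missed(found: int, word_accuracy: float) -> float:
    """Estimate how many true occurrences a search did not return.

    If only `word_accuracy` of all occurrences are searchable, then
    total = found / word_accuracy and missed = total - found.
    """
    return found * (1 - word_accuracy) / word_accuracy


found = 10_267  # "Arndt" results at Chronicling America, per the slide

for label, accuracy in [("uncorrected", 0.558), ("corrected", 0.970)]:
    print(f"{label}: about {estimated_missed(found, accuracy):.1f} instances missed")
```

The two estimates agree with the slide's figures of 8,133 and 317 to within rounding.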

Accuracy

Suppose the word/name is longer than 5 characters?

The Facts

• Assume that average uncorrected / corrected OCR character accuracy is ~89% / ~99%, the same as CDNC.

Name         Name length   Raw text accuracy   Corrected text accuracy
Eklund       6             49.7%               94.2%
Kennedy      7             44.2%               93.25%
Espinosa     8             39.4%               92.3%
Bonaparte    9             35%                 91.4%
Chatterjee   10            31.2%               90.4%
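The effect of name length can be sketched by raising the character accuracy to the length of the name. This assumes independent character errors and exactly 89% and 99% character accuracy, and the helper name `name_accuracy` is illustrative. Under these assumptions the raw column is reproduced, while the corrected column comes out within about a tenth of a percentage point of the slide's figures, which may rest on slightly different inputs or rounding.

```python
def name_accuracy(name: str, char_accuracy: float) -> float:
    """Chance that every character of `name` is correct,
    assuming independent character errors."""
    return char_accuracy ** len(name)


names = ["Eklund", "Kennedy", "Espinosa", "Bonaparte", "Chatterjee"]

print(f"{'name':<12}{'length':>6}{'raw':>9}{'corrected':>11}")
for name in names:
    raw = name_accuracy(name, 0.89)        # uncorrected OCR text
    corrected = name_accuracy(name, 0.99)  # after text correction
    print(f"{name:<12}{len(name):>6}{raw:>9.1%}{corrected:>11.1%}")
```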

Accuracy

Name         Number of search results   Missing results with raw text accuracy   Missing results with corrected text accuracy
Eklund       2,951                      2,987                                    182
Kennedy      360,723                    455,392                                  26,111
Espinosa     1,918                      2,950                                    160
Bonaparte    44,664                     82,947                                   4,203
Chatterjee   19                         42                                       2

Chronicling America searches done 19-Mar-2013 (6,025,474 pages from 1836 to 1922).


user lines corrected*

1 646,873

2 236,323

3 111,749

4 100,749

5 99,999

6 87,720

7 82,768

8 63,786

9 57,441

10 56,458

lines corrected* user

2,455,338 1

1,822,422 2

1,448,370 3

1,265,217 4

1,174,835 5

1,069,669 6

1,058,179 7

1,020,462 8

949,694 9

886,315 10

*numbers from Mar 2014

User   Lines corrected, Mar 2014   Lines corrected, Oct 2012
1      646,873                     242,965
2      236,323                     87,515
3      111,749                     31,318
4      100,749                     24,144
5      99,999                      23,184
6      87,720                      19,240
7      82,768                      18,898
8      63,786                      16,875
9      57,441                      11,784
10     56,458                      9,762


• “I enjoy the correction ‐ it’s a great way to learn more about past history and things of interest whilst doing a ‘service to the community’ by correcting text for the benefit of others.”

• “I have recently retired from IT and thought that I could be of some assistance to the project. It benefits me and other people. It helps with family research.”

Rose Holley. Many Hands Make Light Work. National Library of Australia March 2009.

motivation: Trove users’ report

“I am interested in all kinds of history. I have pursued genealogy as a hobby for many years. I correct text at CDNC because I see it as a constructive way to contribute to a worthwhile project. Because I am interested in history, I enjoy it.”

Wesley, California

Personal communications with CDNC text correctors.

motivation: CDNC users’ report

“I only correct the text on articles of local interest - nothing at state, national or international level, no advertisements, etc. The objective is to be able to help researchers to locate local people, places, organizations and events using the on-line search at CDNC. I correct local news & gossip, personal items, real estate transactions, superior court proceedings, county and local board of supervisors meetings, obituaries, birth notices, marriages, yachting news, etc.”

Ann, California

Personal communications with CDNC text correctors.

motivation: CDNC users’ report

“I have always been interested in history, especially the development of the American West, and nothing brings it alive better than newspapers of the time. I believe them to be an invaluable source of knowledge for us and future generations.”

David, United Kingdom

motivation: CDNC users’ report

Personal communications with CDNC text correctors.

“CDNC is an excellent source of information matching my personal interest in such topics as sea history, development of shipbuilding, clippers and other ships etc. ... Unfortunately, the quality of text ... is rather poor I’m afraid. This is why I started to do all corrections necessary for myself ... and to leave the corrected text for use of others. .... I am not doing this very regularly as this is just my hobby and pleasure.”

Jerzey, Poland

motivation: CDNC users’ report

Personal communications with CDNC text correctors.

“As an amateur historical researcher my time for research is very limited. Making time to travel to archives, libraries, and historical societies does not happen as often as I would like. The Cambridge Public Library’s online newspaper collection has been an invaluable resource and it is fun. I am very grateful for all the help I have received over the years from so many research organizations.

Correcting text has several benefits. It makes it much more likely that I will find a story if I decide to search for it in the future. It is a way of saying ‘thank you’ to the Cambridge Library for having such a great resource available and maybe I can make the next person’s research a little easier. It is my own little historical preservation project.”

Cambridge Historical Newspapers Text Corrector

motivation: Cambridge users’ report

Personal communications with Cambridge text correctors.

Hard-to-measure-but-shouldn’t-be-overlooked (HTMBSBO) benefits

Public domain photo “A useful instruction for young sailors from the Royal Hospital School, Greenwich” from the National Maritime Museum.

“when someone transcribes a document, they are actually better fulfilling the mission of a cultural heritage organization than someone who simply stops by to flip through the pages”

HTMBSBO benefit

Paraphrased from Trevor Owens’s blog, http://www.trevorowens.org/2012/03/crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013).

“in addition to increasing search accuracy or lowering the costs of document transcription, crowdsourcing is the single greatest advancement in getting people using and interacting with library collections”

HTMBSBO benefit

Paraphrased from Trevor Owens’s blog, http://www.trevorowens.org/2012/03/crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013).


conclusions

Conclusion of the Sonata for piano #32, opus 111 by Ludwig van Beethoven

• newspaper digitization may be difficult but there are many, many examples of successful digitization programs. ask for help! and join the IFLA Newspapers Section!

• digital newspaper collections are the most used digital library collections

• the benefits of crowdsourced text correction and tagging are multi-faceted: data accuracy, patron engagement, increased web traffic

• know your user community!!


• Library of Congress National Digital Newspaper Program http://www.loc.gov/ndnp/

• Australian Newspaper Digitisation Program http://www.nla.gov.au/content/newspaper-digitisation-program

• IFLA Newspapers Section Digitisation projects and best practices http://www.ifla.org/node/6777

• ICON: International Coalition on Newspapers http://icon.crl.edu/digitization.htm


Wikipedia contributors, "List of online newspaper archives," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/Wikipedia:List_of_online_newspaper_archives (accessed March 17, 2013).


Become a member of the IFLA Newspapers Section! See http://www.ifla.org/membership or ask me.

Frederick Zarndt, Secretary, IFLA Newspapers Section

[email protected]

Frederick Zarndt, Secretary, IFLA Newspapers Section

[email protected]

Photo held by John Oxley Library, State Library of Queensland. Original from Courier-mail, Brisbane, Queensland, Australia.