Indigenous Language Newspapers as a Digital Library Collection: The Niupepa Example

Department of Computer Science

Te Taka Keegan



Overview

NZ Language & Publishing History
The Niupepa Collection
Digital Libraries
Image Capture
Text Capture
NZDL environment
User Interface
Conclusion


New Zealand Encounter History

Some Māori say they were resident when the ‘land rose from the sea’. Other traditions place migrations between 900 and 1200 AD.

The first European sighting was in 1642, by Abel Tasman. In 1769 the islands were charted by James Cook.

In the 1790s whalers and sealers began to arrive, trading with Māori and often settling. They were followed by missionaries.


…New Zealand Encounter History

In 1840 the Treaty of Waitangi was signed, establishing the British colony.

From 1845 to 1872 the NZ Land Wars were fought, resulting in significant land confiscations.

For 100 years the Colonial Government pursued policies of assimilation and integration to ‘civilize the native savage’.

In the 1970s Government policies began to favour multiculturalism.


NZ Māori Language History…

When the Treaty was signed, the Māori language was used in everyday activities, e.g. trade, schooling, religion.

After 50 years of encounter history the Māori population had fallen to ~30% because of wars, diseases, and land alienation.

Colonial attitudes and legislation had a devastating effect on the Māori language. The Education Acts (1867 & 1871) prohibited the use of the Māori language in education.


…NZ Māori Language History

Following WWII there was a significant Māori urban shift, which often meant abandoning Māori language and custom.

Research undertaken in 1979 suggested that the death of Māori as a ‘living’ language was imminent.

The 1980s saw a significant resurgence in Māori language initiatives, especially in schooling, and in Government support.

Currently, 83% of Māori have little or no fluency, and one third of fluent speakers are over 60. However, the outlook is promising.


Māori Language Publishing…

Māori is traditionally an oral language; certain characteristics are lost in print.

It was the missionaries who initiated printing in the 1830s, to enable the reading of scripture.

It has been estimated that 50% of Māori were literate in the 1850-1900 period. Consequently, many periodicals, letters, and Government proceedings were written in Māori in that period.


…Māori Language Publishing

The Māori population recovery in the 1900s was not accompanied by a language recovery. Consequently, from 1900 to the 1950s there was a sharp decline in the amount of material published in Māori.

In the 1970s the Ministry of Education began publishing in the Māori language, and demand has arisen with the recent resurgence initiatives.

In 1987 the Māori Language Commission was established and the Māori language was given official status.


What is the Niupepa Collection?

Libraries throughout NZ hold various papers from the period when Māori language publishing was flourishing.

In 1988/1989 the Alexander Turnbull Library undertook to source the best copies of these periodicals and preserve them on microfilm.

This collection is called the “Niupepa Collection” and is available on 408 microfiche at most libraries.


What is the Niupepa Collection?


What’s in the Niupepa Collection?

There are approximately 18,000 pages in 40 periodicals. They were mostly written in Māori, or for a Māori audience.

They are written from 3 distinct perspectives: Missionaries, Government, and Māori themselves.

The language is written in a very clear style.

It is an invaluable source of historic information.

Examples to follow.


Digital Library considerations

There are two major stages to the digital capture of legacy print material:
1. capture of a digital facsimile image: a digital photograph of the original page, made up of individual dots or pixels;
2. extraction of text using OCR: electronic text is needed as a record of the actual characters if the text is to be searchable.


Image Capture

Images were available on 35mm film.

The photographs were of good quality, but the original print material varied substantially in quality and information density.

…examples


Image Capture

The quality or fidelity of an image depends on:
1. density of dots (dots per inch, dpi)
2. sensitivity of dot values (bits per pixel)

Both parameters influence the storage required:

A4 page at 72 dpi, black and white => 63 Kbytes

A4 page at 300 dpi, 256 grey levels => 9000 Kbytes
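Both figures are consistent with uncompressed storage, i.e. pixels across × pixels down × bits per pixel ÷ 8. A minimal sketch of that arithmetic, assuming an A4 page of 210 × 297 mm and no compression; the function names are illustrative, not part of any project tool.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> int:
    """Number of pixels when a page of the given size is scanned at `dpi`."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px


def uncompressed_bytes(pixels: int, bits_per_pixel: int) -> int:
    """Raw storage for `pixels` at `bits_per_pixel`, rounded up to whole bytes."""
    return (pixels * bits_per_pixel + 7) // 8


A4_MM = (210, 297)

# Black and white: 1 bit per pixel.
bw_72 = uncompressed_bytes(page_pixels(*A4_MM, dpi=72), bits_per_pixel=1)
# 256 grey levels: 8 bits per pixel.
grey_300 = uncompressed_bytes(page_pixels(*A4_MM, dpi=300), bits_per_pixel=8)

print(f"A4, 72 dpi, b/w:       {bw_72 / 1000:,.0f} Kbytes")
print(f"A4, 300 dpi, 256 grey: {grey_300 / 1000:,.0f} Kbytes")
```

This prints about 63 and 8,700 Kbytes, in line with the slide's rounded figures. Because the pixel count grows with the square of the dpi, going from 72 to 300 dpi multiplies the pixels by roughly 17, and moving from 1 to 8 bits per pixel multiplies the storage by a further 8.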


Image Capture

The large variation in quality and information density meant significant performance differences across periodicals.

Experiments showed that a scanning density of 300 dpi on the original page produced uniform OCR results.

For a large-format newspaper, this meant ~20 million pixels for a one-page image.
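Working backwards from that figure: at 300 dpi each square inch holds 300 × 300 = 90,000 pixels, so ~20 million pixels corresponds to a page of roughly 222 square inches. The 12 × 18.5 inch sheet below is only an illustrative size consistent with that area; the source does not give the actual page dimensions.

```latex
\frac{20 \times 10^{6}\ \text{pixels}}{(300\ \text{dpi})^{2}}
  = \frac{20 \times 10^{6}}{9 \times 10^{4}}\ \text{in}^{2}
  \approx 222\ \text{in}^{2},
\qquad
(12 \times 300)\,(18.5 \times 300) = 3600 \times 5550 \approx 2.0 \times 10^{7}\ \text{pixels}.
```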


Automated image capture

Automated capture from the 35mm film was carried out by New Zealand Micrographics.

Because of set-up costs, two images were captured at the same time:
1. a bitonal image for Internet delivery, ~200 Kbytes each
2. a grey-scale image for OCR, ~5-10 Mbytes each
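The slide does not say how the scanner produced the two images; they were captured together, not derived from one another. To illustrate the difference between the two representations, here is a minimal sketch that reduces an 8-bit grey-scale page (rows of 0-255 values) to a bitonal one with a simple global threshold and packs it at 1 bit per pixel. The threshold of 128 and the function names are assumptions for illustration; production scanners use more sophisticated binarisation.

```python
def to_bitonal(grey_rows: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Map each 0-255 grey value to 1 (ink) if darker than `threshold`, else 0 (paper)."""
    return [[1 if value < threshold else 0 for value in row] for row in grey_rows]


def pack_bits(bitonal_rows: list[list[int]]) -> bytes:
    """Pack each row at 1 bit per pixel, most significant bit first, padding rows to whole bytes."""
    out = bytearray()
    for row in bitonal_rows:
        for start in range(0, len(row), 8):
            chunk = row[start:start + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            byte <<= 8 - len(chunk)  # left-align a short final chunk
            out.append(byte)
    return bytes(out)


# A tiny 2 x 10 "page": dark strokes (low values) on light paper (high values).
grey = [
    [250, 240, 30, 20, 245, 250, 10, 255, 250, 5],
    [255, 25, 240, 250, 15, 250, 250, 250, 20, 250],
]
bitonal = to_bitonal(grey)
packed = pack_bits(bitonal)
print(len(grey) * len(grey[0]), "grey bytes ->", len(packed), "packed bitonal bytes")
```

Each pixel drops from 8 bits to 1, an 8-fold reduction before any compression is applied, at the cost of discarding the grey-level detail that OCR can make use of.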


Bitonal and grey-scale images


Image format

Images were stored as Tagged Image File Format (.tif) files, one file per frame of film (~19,000 frames).

8 CDs hold the entire collection in bitonal form.

90+ CDs hold the entire collection in grey-scale form.


Cropping and Splitting

Many images were of double-page spreads. For consistent granularity (one page per file), these images were split into two separate images prior to text extraction.

Extraneous image material (outside the text) was cropped, reducing file size.

Skewed images were straightened.
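The splitting and cropping steps can be sketched on a small bitonal page held as rows of 0/1 values (1 = ink). This is a minimal illustration, assuming the spread's gutter lies at the horizontal midpoint and that "extraneous material" means the blank margin around the inked area; the source does not name the tool used, the real images were grey-scale, and deskewing is omitted.

```python
Page = list[list[int]]  # rows of pixels, 1 = ink, 0 = paper


def split_spread(spread: Page) -> tuple[Page, Page]:
    """Split a double-page spread at the horizontal midpoint into (left, right)."""
    mid = len(spread[0]) // 2
    left = [row[:mid] for row in spread]
    right = [row[mid:] for row in spread]
    return left, right


def crop_to_ink(page: Page, margin: int = 0) -> Page:
    """Keep only the bounding box of inked pixels, plus an optional margin."""
    rows = [r for r, row in enumerate(page) if any(row)]
    if not rows:  # blank page: nothing to keep
        return []
    cols = [c for c in range(len(page[0])) if any(row[c] for row in page)]
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, len(page) - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, len(page[0]) - 1)
    return [row[left:right + 1] for row in page[top:bottom + 1]]


spread = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
left_page, right_page = split_spread(spread)
print(crop_to_ink(left_page))   # [[1, 1], [0, 1]]
print(crop_to_ink(right_page))  # [[0, 1], [1, 1]]
```

Keeping a small `margin` avoids clipping faint characters at the edge of the text block.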


Text Capture

OCR software identifies shapes in the image to create a corresponding page of text.

Recognition accuracy depends on image quality, the visual quality of the original, the font, and the sophistication of the software.

OCR can be time-consuming and expensive, depending on the quality of the images captured.

Electronic text has low storage requirements: say 4 Kbytes for a single-spaced A4 page.


Text Capture

An index file helps manage a large number of files with similar file names.

Each image can give rise to several different files:
1. Digital facsimile grey-scale image
2. Digital facsimile bitonal image
3. Cropped/split image ready for OCR
4. Text file
5. Reduced-resolution image for the WWW
6. Preview image for the WWW
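One way to realise such an index file is a CSV table with one row per page and one column per derived file. The sketch below is an assumption for illustration: the column names, directory layout and file extensions are hypothetical, not the project's actual conventions.

```python
import csv
import io

# Hypothetical locations/extensions for the six files derived from each image.
DERIVATIVES = {
    "greyscale": "grey/{id}.tif",
    "bitonal": "bitonal/{id}.tif",
    "ocr_ready": "ocr/{id}.tif",
    "text": "text/{id}.txt",
    "web_image": "web/{id}.gif",
    "preview": "preview/{id}.gif",
}


def index_rows(page_ids):
    """Yield one index record per page, mapping each derivative to its file path."""
    for page_id in page_ids:
        row = {"page_id": page_id}
        row.update({kind: pattern.format(id=page_id) for kind, pattern in DERIVATIVES.items()})
        yield row


def write_index(page_ids, stream) -> None:
    """Write the index as CSV to any text stream (an open file, or StringIO for testing)."""
    writer = csv.DictWriter(stream, fieldnames=["page_id", *DERIVATIVES])
    writer.writeheader()
    writer.writerows(index_rows(page_ids))


buffer = io.StringIO()
write_index(["01_01_02_05", "01_01_02_06"], buffer)
print(buffer.getvalue())
```

Because every derivative is named from the same page id, an interface only needs that id to locate both the text to search and the image to display.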


File Renaming

Renaming should occur before the OCR process, so that the correct text file names are generated.

A consistent naming convention will match image file names with text file names.

A self-explanatory naming convention assists interface programming, e.g. 01_01_02_05 is used to represent Niupepa 1, Volume 1, Issue 2, Page 5.

This is very time-consuming without a renaming program…
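The naming convention lends itself to a small helper that builds and parses these names, from which a batch-renaming plan follows directly. A minimal sketch, assuming two-digit zero-padded fields separated by underscores as in the example 01_01_02_05; the function names and the scanned file names are hypothetical, and values above 99 would need wider fields.

```python
from typing import NamedTuple


class PageRef(NamedTuple):
    paper: int
    volume: int
    issue: int
    page: int


def make_name(ref: PageRef, width: int = 2) -> str:
    """Build a name such as 01_01_02_05 from its four numeric parts."""
    return "_".join(f"{part:0{width}d}" for part in ref)


def parse_name(name: str) -> PageRef:
    """Parse a name such as 01_01_02_05 back into its parts."""
    parts = name.split("_")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a paper_volume_issue_page name: {name!r}")
    return PageRef(*(int(p) for p in parts))


def rename_plan(scanned_files, paper: int, volume: int, issue: int, ext: str = ".tif"):
    """Pair each scanned file (given in page order) with its new name; pages numbered from 1."""
    return [
        (old, make_name(PageRef(paper, volume, issue, page)) + ext)
        for page, old in enumerate(scanned_files, start=1)
    ]


ref = parse_name("01_01_02_05")
print(ref)  # PageRef(paper=1, volume=1, issue=2, page=5)
print(rename_plan(["frame0417.tif", "frame0418.tif"], paper=1, volume=1, issue=2))
```

The plan is returned rather than applied, so it can be reviewed before the files are actually renamed (for example with `os.rename`).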


File Renamer Example

File Renamer Ultra 2000, by Techalchemy


OCR software

We tested many OCR programs and finally settled on ABBYY FineReader 6.0 Corporate, mainly for the following reasons:
1. It supports the Māori language (though not fully) and 175 other languages
2. It does not try to write English words in non-English texts
3. It is very accurate
4. It is cost-effective


OCR software

Other characteristics of FineReader that are appropriate for this work:
1. You can train a ‘recognition pattern’
2. You can input user dictionaries and maintain them during the proofing process
3. You can define the character set
4. It is user-friendly and easy to run, so new staff require minimal training


OCR software


NZDL – Greenstone software

Greenstone supports:
a range of document styles, forms and languages;
a wide range of collection sizes;
different interface languages;
different browsing and searching structures;
different storage and delivery techniques.


Building the collection

The Niupepa collection comprises two main sets of documents:
1. extracted electronic text (for searching)
2. digital facsimile pages (for viewing)

Niupepa is one of the standard collections on the NZDL site, delivered over the Internet via a standard web browser.

Like all Greenstone collections, it can also be delivered on CD-ROM.


User Interface

nzdl.org/niupepa

The default language is Māori: the collection is primarily in Māori, it was funded as a Māori language resource, and it makes a statement about the use of Māori.

The user interface can easily be switched to English or one of nine other languages.


User Interface

There are 3 methods of access:
1. Browse by Title: selecting a particular newspaper, issue, and then the page
2. Browse by Date: selecting a particular time period, and then the 1st page of the issue
3. Search by Content: entering words or phrases for full-text searching
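The three access paths can be sketched over a handful of page records. This is an illustrative in-memory model, not how Greenstone implements browsing and searching; the record fields, the placeholder titles "Paper A"/"Paper B", the dates and the page texts are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    page_id: str      # e.g. 01_01_02_05 (paper_volume_issue_page)
    title: str        # newspaper title
    issue: str
    issue_date: date
    page_no: int
    text: str         # OCR text; empty if the page could not be OCR'd


def browse_by_title(pages):
    """title -> issue -> page ids, in date and page order."""
    tree = defaultdict(lambda: defaultdict(list))
    for p in sorted(pages, key=lambda p: (p.title, p.issue_date, p.page_no)):
        tree[p.title][p.issue].append(p.page_id)
    return tree


def browse_by_date(pages, start: date, end: date):
    """(date, page id) of the first page of every issue published between start and end."""
    return sorted((p.issue_date, p.page_id) for p in pages
                  if p.page_no == 1 and start <= p.issue_date <= end)


def build_index(pages):
    """Inverted index: case-folded word -> set of page ids containing it."""
    index = defaultdict(set)
    for p in pages:
        for word in re.findall(r"\w+", p.text.casefold()):
            index[word].add(p.page_id)
    return index


def search(index, query: str):
    """Page ids containing every word of the query (a simple AND search)."""
    words = re.findall(r"\w+", query.casefold())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


pages = [
    Page("01_01_01_01", "Paper A", "1", date(1842, 1, 1), 1, "he korero mo te whenua"),
    Page("01_01_01_02", "Paper A", "1", date(1842, 1, 1), 2, "te whenua me te iwi"),
    Page("02_01_01_01", "Paper B", "1", date(1860, 6, 1), 1, ""),
]
index = build_index(pages)
print(dict(browse_by_title(pages)["Paper A"]))
print(browse_by_date(pages, date(1840, 1, 1), date(1850, 12, 31)))
print(sorted(search(index, "te whenua")))
```

A page with empty text still appears when browsing by title or date but can never be returned by a content search, which is the situation for pages whose images were too poor to OCR.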


User interface issues

Because some images were of poor quality, OCR could not be carried out on every page. Full-text search is not available for less than 2% of the total series.

Indigenous languages often use non-ASCII characters. Unicode is used to display these characters correctly.

Indigenous languages often require new words to be coined for the user interface.
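Storing text as Unicode is only part of the picture: a macronised vowel such as ā can be encoded either as one precomposed code point (U+0101) or as a plain a followed by a combining macron (U+0304). The two look identical but compare unequal. A minimal Python sketch of normalising text to NFC before indexing or comparing, so that both spellings match; whether the Niupepa collection does exactly this is not stated in the source.

```python
import unicodedata

precomposed = "M\u0101ori"   # 'ā' as a single code point, U+0101
decomposed = "Ma\u0304ori"   # 'a' followed by COMBINING MACRON, U+0304


def normalise(text: str) -> str:
    """Case-fold, then compose to NFC, giving one canonical form for matching search terms."""
    return unicodedata.normalize("NFC", text.casefold())


print(precomposed == decomposed)                        # False: different code points
print(normalise(precomposed) == normalise(decomposed))  # True
print([f"U+{ord(c):04X}" for c in normalise(decomposed)])
```

Applying the same normalisation to both the indexed text and the user's query means a search typed with either encoding finds the same pages.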


User interface enhancements

We were fortunate with this collection to be able to make available 2 additional sets of information on the various newspaper publications:
1. Commentaries: historic research material, including bibliographic information, background, subject matter, and current physical locations
2. Abstracts in English: a hypertext-linked summary that gives non-Māori-speaking readers insight into what was written


User interface future developments

Possibilities exist for future development of:
1. A graphical timeline, where the time period may be selected by moving an adjustable slider along the timeline.
2. Generating a search by selecting a location on a map, returning all the pages associated with that area.
3. Highlighting the areas of the facsimile image that match the search criteria.
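The third idea depends on OCR output that records where each word sits on the page image. Assuming word-level bounding boxes are available (a hypothetical data format; the source does not describe one), finding the regions to highlight reduces to matching normalised words against the query:

```python
import unicodedata
from typing import NamedTuple


class WordBox(NamedTuple):
    word: str
    left: int
    top: int
    right: int
    bottom: int  # pixel coordinates on the facsimile image


def _norm(text: str) -> str:
    """Case-fold, compose to NFC and strip surrounding punctuation."""
    return unicodedata.normalize("NFC", text.casefold()).strip(".,;:!?\"'()")


def boxes_to_highlight(boxes: list[WordBox], query: str) -> list[tuple[int, int, int, int]]:
    """Rectangles of every word on the page that matches one of the query words."""
    wanted = {_norm(w) for w in query.split()}
    return [(b.left, b.top, b.right, b.bottom) for b in boxes if _norm(b.word) in wanted]


page_boxes = [
    WordBox("Ko", 40, 100, 70, 120),
    WordBox("te", 78, 100, 98, 120),
    WordBox("whenua,", 106, 100, 190, 120),
    WordBox("Whenua", 40, 130, 120, 150),
]
print(boxes_to_highlight(page_boxes, "whenua"))
```

The resulting rectangles could then be drawn as translucent overlays on the page image shown in the browser.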


Usage

48 of the top 50 search terms are in Māori

41 of the next 50 search terms are in Māori


Usage

[Chart: Māori - English Comparisons for Niupepa Site. Vertical axis 0 to 100,000; horizontal axis 2000 (283), 2001 (336), 2002 (297), 2003 (69); series: Māori, English.]


Usage


Conclusion…

A unique collection of historical indigenous language newspapers covering a 100-year period has been captured in digital form;

Not only is the collection preserved, but it is made much more widely and conveniently accessible;

Full-text search, although a costly option, adds significant utility and value to the collection;

There are difficulties in carrying out OCR with minority languages, but these can be overcome;


…Conclusion

The experience gained, and the techniques learned and developed, are equally applicable to a wide range of legacy text collections…

…however, they are particularly pertinent to collections of historical indigenous language documents…

…and digital collections of this type have the potential to contribute significantly to the promotion and preservation of language and culture.
