Supported by: EU Cultural Heritage

http://www.transcriptorium.eu

tranScriptorium aims to develop innovative, efficient and cost-effective solutions for the indexing, search and full transcription of historical handwritten document images, using modern, holistic Handwritten Text Recognition technology.

tranScriptorium will turn Handwritten Text Recognition (HTR) technology into a mature technology by addressing the following objectives:

1 Enhancing HTR technology for efficient transcription
Starting from state-of-the-art HTR approaches, tranScriptorium will capitalize on interactive-predictive techniques for effective and user-friendly computer-assisted transcription.

2 Bringing the HTR technology to users
Expected users of the HTR technology belong mainly to two groups: a) individual researchers with experience in transcribing handwritten documents who are interested in transcribing specific documents; b) volunteers who collaborate in large transcription projects.

3 Integrating the HTR results in public web portals
The HTR technology will become a support in the digitization of handwritten materials. The outcomes of the tranScriptorium tools will be attached to the published handwritten document images. This includes not only full, correct transcriptions, but also partially correct transcriptions and other kinds of automatically produced metadata, useful for indexing and searching.

Project coordinator: Joan Andreu Sánchez ([email protected])
Project no.: 600707
Start date: 1 January 2013
End date: 31 December 2015

Research groups and institutions:

• Pattern Recognition and Human Language Technology Research Center (PRHLT) from Universitat Politècnica de València (Spain)
Principal researcher: Enrique Vidal ([email protected])

• Department of German Language and Literature Studies (UIBK) from University of Innsbruck (Austria)
Principal researcher: Günter Mühlberger ([email protected])

• Computational Intelligence Laboratory (CIL) from National Center for Scientific Research "Demokritos" (Greece)
Principal researcher: Basilis Gatos ([email protected])

• Centre for Digital Humanities (UCLDH) from University College London (United Kingdom)
Principal researcher: Philip Schofield ([email protected])

• Institute for Dutch Lexicology (INL) (Netherlands)
Principal researcher: Katrien Depuydt ([email protected])

• Digital Archives & Repositories Department (ULCC) from University of London Computer Centre (United Kingdom)
Principal researcher: Richard Davis ([email protected])

Collections: English, Dutch, German, Spanish

Research results

• Document Image Analysis: pre-processing, image enhancement, layout analysis, skew correction, line detection.

• Interactive Handwritten Text Recognition: the user and the system interact to obtain the correct transcript.

• Key Word Spotting: query by string, query by sample.

• Linguistic Resources: language models, lexicon, abbreviations.
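The interactive recognition item above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a toy recognizer that has already produced a ranked list of candidate transcripts (`nbest`, best first), and it simulates the user with a known reference transcript. All function and variable names are hypothetical. At each step the user fixes the first wrong word, which validates the prefix up to that word, and the system proposes the best candidate consistent with that prefix.

```python
from typing import List, Optional, Tuple

Words = List[str]


def complete(nbest: List[Words], prefix: Words, fallback: Words) -> Words:
    """Return the best-ranked candidate that agrees with the validated prefix."""
    n = len(prefix)
    for hyp in nbest:  # nbest is assumed to be ordered best-first
        if hyp[:n] == prefix:
            return hyp
    # No candidate matches: keep the prefix and reuse the previous suffix.
    return prefix + fallback[n:]


def first_error(hyp: Words, ref: Words) -> Optional[int]:
    """Index of the first word the simulated user would have to fix."""
    for i in range(max(len(hyp), len(ref))):
        if i >= len(hyp) or i >= len(ref) or hyp[i] != ref[i]:
            return i
    return None


def interactive_transcribe(nbest: List[Words], ref: Words) -> Tuple[Words, int]:
    """Simulate the user/system loop; return the transcript and the number of corrections."""
    hyp = nbest[0]
    corrections = 0
    while True:
        pos = first_error(hyp, ref)
        if pos is None:
            return hyp, corrections
        corrections += 1
        if pos >= len(ref):
            # Only extra trailing words remain: the user deletes them.
            hyp = hyp[:len(ref)]
        else:
            # The user types the correct word; everything up to it is validated.
            hyp = complete(nbest, ref[:pos + 1], hyp)


# Toy data (invented for illustration, not from the project's datasets).
nbest = [
    "the rigt if the people".split(),
    "the right of the people".split(),
]
reference = "the right of the people".split()

final, corrections = interactive_transcribe(nbest, reference)
print(" ".join(final), corrections)  # the right of the people 1
```

In this toy case the first proposal contains two wrong words, but a single user correction is enough, because the system's next proposal given the validated prefix is already right. A real system would re-run the recognizer constrained by the prefix instead of filtering a fixed candidate list.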

Data and tool results

Datasets and ground truth (GT):

• English. Bentham: XVIII/XIX centuries, 80 000 pages, several hands. GT: 800 pages.

• Dutch. Four books: XV century, 2 000 pages, several hands. GT: 200 pages.

• German. Several collections: from XVI to XX centuries, 32 000 pages, several hands. GT: 200 pages.

• Spanish. Plantas: XVII century, 7 000 pages, one hand. GT: 1 000 pages. Esposalles: from XV to XX centuries, 291 books, several hands. GT: 2 books.

Tools: DIA, interactive HTR and KWS tools.