Post on 07-Oct-2020
Supported by: EU Cultural Heritage:
http://www.transcriptorium.eu
tranScriptorium aims to develop innovative, efficient and cost-effective solutions for the indexing, search and full transcription of historical handwritten document images, using modern, holistic Handwritten Text Recognition technology.
tranScriptorium will turn Handwritten Text Recognition (HTR) technology into a mature technology by addressing the following objectives:
1 Enhancing HTR technology for efficient transcription
Departing from state-of-the-art HTR approaches, tranScriptorium will capitalize on interactive-predictive techniques for effective and user-friendly computer-assisted transcription.
2 Bringing the HTR technology to users
Expected users of the HTR technology belong mainly to two groups: a) individual researchers with experience in transcribing handwritten documents who are interested in transcribing specific documents; b) volunteers who collaborate in large transcription projects.
3 Integrating the HTR results in public web portals
The HTR technology will support the digitization of handwritten materials. The outcomes of the tranScriptorium tools will be attached to the published handwritten document images. This includes not only full, correct transcriptions, but also partially correct transcriptions and other kinds of automatically produced metadata, useful for indexing and searching.
Project coordinator: Joan Andreu Sanchez (jandreu@prhlt.upv.es)
Project no.: 600707
Start date: 1 January 2013
End date: 31 December 2015
Research groups and institutions:
• Pattern Recognition and Human Language Technology Research Center (PRHLT) from Universitat Politecnica de Valencia (Spain)
Principal researcher: Enrique Vidal (evidal@prhlt.upv.es)
• Department of German Language and Literature Studies (UIBK) from University of Innsbruck (Austria)
Principal researcher: Gunter Muhlberger (guenter.muehlberger@uibk.ac.at)
• Computational Intelligence Laboratory (CIL) from National Center for Scientific Research “Demokritos” (Greece)
Principal researcher: Basilis Gatos (bgat@iit.demokritos.gr)
• Centre for Digital Humanities (UCLDH) from University College London (United Kingdom)
Principal researcher: Philip Schofield (p.schofield@ucl.ac.uk)
• Institute for Dutch Lexicology (INL) (Netherlands)
Principal researcher: Katrien Depuydt (Katrien.Depuydt@inl.nl)
• Digital Archives & Repositories Department (ULCC) from University London Computer Centre (United Kingdom)
Principal researcher: Richard Davis (r.davis@ulcc.ac.uk)
Collections
• English
• Dutch
• German
• Spanish
Research results
• Document Image Analysis: pre-processing, image enhancing, layout analysis, skew correction, line detection.
• Interactive Handwritten Text Recognition: the user and the system interact to obtain the correct transcript.
• Key Word Spotting: query by string and query by sample.
• Linguistic Resources: language models, lexicon, abbreviations.
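The interaction between user and system in interactive HTR can be illustrated with a small sketch. It is not the project's implementation: it assumes the recognizer output is available as a scored n-best list of word sequences, and the names `best_completion` and `interactive_transcribe` are hypothetical. The user is simulated by comparing each proposal against a reference transcript and correcting the first wrong word; the system then proposes the best hypothesis that agrees with the validated prefix.

```python
from typing import List, Sequence, Tuple

# Assumption: a hypothesis is a (score, words) pair from an n-best list.
Hypothesis = Tuple[float, List[str]]


def best_completion(nbest: Sequence[Hypothesis], prefix: List[str]) -> List[str]:
    """Return the best-scoring hypothesis that starts with the validated prefix.

    If no hypothesis agrees with the prefix, keep the prefix and append
    the remaining words of the top-scoring hypothesis.
    """
    matching = [h for h in nbest if h[1][:len(prefix)] == prefix]
    if matching:
        return max(matching, key=lambda h: h[0])[1]
    top = max(nbest, key=lambda h: h[0])[1]
    return prefix + top[len(prefix):]


def interactive_transcribe(
    nbest: Sequence[Hypothesis], reference: List[str]
) -> Tuple[List[str], int]:
    """Simulate the interactive loop; return (transcript, user corrections)."""
    prefix: List[str] = []
    corrections = 0
    while True:
        hyp = best_completion(nbest, prefix)
        # Simulated user: scan for the first word that differs from the reference.
        i = len(prefix)
        while i < len(reference) and i < len(hyp) and hyp[i] == reference[i]:
            i += 1
        if i == len(reference):
            if len(hyp) == len(reference):
                return hyp, corrections          # proposal accepted as is
            return list(reference), corrections + 1  # user deletes extra words
        # User fixes word i; everything up to and including it is now validated.
        prefix = list(reference[: i + 1])
        corrections += 1


nbest = [
    (0.6, ["the", "lam", "of", "mature"]),
    (0.4, ["the", "law", "of", "nature"]),
]
reference = ["the", "law", "of", "nature"]
transcript, corrections = interactive_transcribe(nbest, reference)
print(transcript, corrections)
```

In this toy case, one correction ("lam" to "law") lets the system recover the rest of the line, whereas post-editing the top hypothesis alone would need two corrections.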
Data and tool results
Datasets and ground-truth (GT):
• English. Bentham: XVIII/XIX centuries, 80 000 pages, several hands. GT: 800 pages.
• Dutch. Four books: XV century, 2 000 pages, several hands. GT: 200 pages.
• German. Several collections: from XVI to XX centuries, 32 000 pages, several hands. GT: 200 pages.
• Spanish. Plantas: XVII century, 7 000 pages, one hand. GT: 1 000 pages. Esposalles: from XV to XX centuries, 291 books, several hands. GT: 2 books.
Tools: DIA, interactive HTR and KWS tools.