KU Leuven - Words and numbers - ICoC

Words and numbers KU Leuven University Library Central Services Digitisation


Words and numbers

KU Leuven University Library Central Services Digitisation

Intro: KU Leuven Digitisation

• University Library Central Services

• Digitisation projects and programmes
  o Research, education, heritage
  o Coordination, facilitation

• Imaging Lab
  o Focus on quality
  o Focus on innovation

Intro: LIBIS

• IT solutions for collection management
  o Archives, libraries, museums
  o Development and support for a network larger than just KU Leuven
  o LIAS

• Solutions for researchers
  o Scientific data management, collaboration, sharing
  o Multiple environments

• Centre of expertise

• Project oriented

Lines of Work and Issues

• Output formats

• Historical languages: Latin

• Historical languages: Demotic and friends

• Printed statistical tables

• Manuscripts and handwritten materials

• Workflow management

Output formats

• SUCCEED

• OCR engines generate TEI that does not use all features of the standard.

• Reduces the value of OCR-generated TEI as a starting point for research.

• Looking for:
  o A way to improve the quality of TEI generated by OCR engines

• Possible input:
  o LIBIS expertise and know-how
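A first step towards judging how much of the TEI standard an OCR engine actually uses is to take an inventory of the elements in its output. The sketch below is only an illustration: the sample document is a simplified, hypothetical stand-in (not schema-valid and not real engine output), and the list of "wanted" elements is an assumption chosen for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable, List

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, simplified sample standing in for OCR-generated TEI.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <pb n="1"/>
    <p>Lectiones <lb/>de anima</p>
    <p>Secunda pagina</p>
  </body></text>
</TEI>"""


def element_inventory(tei_xml: str) -> Counter:
    """Count how often each element in the TEI namespace occurs."""
    counts: Counter = Counter()
    for element in ET.fromstring(tei_xml).iter():
        tag = element.tag
        # ElementTree writes namespaced tags as "{namespace}localname".
        if tag.startswith("{"):
            namespace, local_name = tag[1:].split("}", 1)
            if namespace == TEI_NS:
                counts[local_name] += 1
    return counts


def missing_elements(counts: Counter, wanted: Iterable[str]) -> List[str]:
    """Return the wanted element names that never occur, sorted by name."""
    return sorted(set(wanted) - set(counts))


counts = element_inventory(SAMPLE_TEI)
# The wanted list is an assumption: elements a researcher might hope to find.
print(missing_elements(counts, ["p", "pb", "persName", "table"]))
```

Running the same inventory over the output of several engines would give a simple basis for comparing them before any enrichment work starts.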

Historical languages: Latin

• Course notes by students of the old university of Leuven

• Western Europe: Latin essential for historical research

• Fragmented efforts, hard to track, difficult to establish cooperation

• Looking for:
  o Highly automated and accurate OCR = limited manual intervention
  o Lexica, NER

• Possible input:
  o Text material from different periods and locations
  o Academic input: Neo-Latin, …

Historical languages: other

• Latin is not the only important historical language

• Precursors of contemporary spoken languages

• No specific projects for now

• Certainly important for our researchers, Hebrew for instance

• Looking for:
  o Initiatives we might join

Printed statistical tables

• Recensement général des industries et des métiers (31 octobre 1896)

• Nineteenth-century statistical material

• Very hard to use for research due to sheer size and complexity

• Solution: digitisation followed by OCR

• Output: spreadsheets or functional equivalents

• Looking for:
  o Extremely accurate OCR for numeric materials
  o Correct translation of dense table layout
  o Tools for preparation of the digitised images and quality control

• Possible input:
  o Digitised source material
  o Expertise: Departments of Electrical Engineering, Economic History, Historical Demography

How to deal with complex layout, columns and ciphers?
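One possible form of quality control for numeric OCR is an arithmetic consistency check. The sketch below assumes that each table row ends with a printed total of the preceding cells, which is common in census tables but has not been verified for this source. The row layout, the separator handling and the sample values are all invented for illustration.

```python
from typing import List, Optional


def parse_count(cell: str) -> Optional[int]:
    """Parse an OCR'd cell as a non-negative integer, or return None.

    Assumption: spaces, non-breaking spaces and dots act as thousands
    separators, so they are removed before parsing.
    """
    cleaned = cell.replace(" ", "").replace("\u00a0", "").replace(".", "")
    if cleaned.isascii() and cleaned.isdigit():
        return int(cleaned)
    return None


def suspicious_rows(rows: List[List[str]]) -> List[int]:
    """Return indices of rows that need manual review.

    A row is flagged when it is empty, when a cell is not a clean number,
    or when the cells before the last one do not add up to the last cell.
    """
    flagged = []
    for index, row in enumerate(rows):
        values = [parse_count(cell) for cell in row]
        if not values or None in values or sum(values[:-1]) != values[-1]:
            flagged.append(index)
    return flagged


# Invented sample rows: two parts followed by a printed total.
rows = [
    ["12", "30", "42"],        # consistent
    ["1 204", "96", "1 300"],  # consistent, with thousands separators
    ["7", "l5", "22"],         # letter "l" misread for digit "1"
    ["18", "3", "27"],         # parts do not add up to the total
]
print(suspicious_rows(rows))
```

A check like this cannot say which cell is wrong, only that a row deserves a second look, so it limits manual intervention rather than replacing it.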

Manuscripts and handwritten material

• RICH + Bible of Anjou

• Ready to contribute material as content holder

• Working on a programme about letters

Workflow management

• Digicorder + Teamwork

• How do others deal with workflow management?

• Where to position enrichment in digitisation workflow?

• Ready to participate in the production of Webinars


Digicorder = tool to manage naming of projects and scans
Created by Diederik Lanoye using FileMaker
1 project = 1 instance of Digicorder

Options when creating unique names for scans and corresponding labels
Starting point = object to be digitised
Label = description of part of object or number of page or folio
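The idea of pairing a unique scan name with a human-readable label can be sketched as follows. This is not how Digicorder itself works, since its naming rules are not described here. The scheme (object identifier, underscore, zero-padded sequence number), the identifier `MS0001` and the function name `name_scans` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scan:
    filename: str  # unique name of the image file
    label: str     # description of the part, page or folio shown


def name_scans(object_id: str, labels: List[str], ext: str = "tif") -> List[Scan]:
    """Assign one unique filename per label, in scanning order.

    Hypothetical scheme: <object_id>_<sequence>.<ext>, with the sequence
    number zero-padded to at least four digits so that names sort correctly.
    """
    width = max(4, len(str(len(labels))))
    return [
        Scan(f"{object_id}_{number:0{width}d}.{ext}", label)
        for number, label in enumerate(labels, start=1)
    ]


# The object is the starting point; labels describe parts or folios.
scans = name_scans("MS0001", ["front cover", "f. 1r", "f. 1v"])
for scan in scans:
    print(scan.filename, "|", scan.label)
```

Keeping the label separate from the filename means that folio numbering can be corrected later without renaming any image files.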

Names for scans and corresponding labels

Information shown for each scanned image

Teamwork = workflow management tool
Dashboard lists projects, tasks, milestones and responsibilities

Inside a project: tasks on a timeline

Milestones are defined for important moments in the workflow
Often in case of transitions
More information: https://www.teamwork.com/projects/

You never walk alone

o Issues are not specific to KU Leuven
o Sharing expertise to cover all aspects is the only way to go
o Valuable expertise in specific fields
  • Neo- and humanist Latin
  • Historical demography and economic history
  • Imaging
o On our wishlist:
  • Cooperation in new and ongoing developments
  • Exchange of expertise
  • Above all: action

Cooperation

• Wiki as a starting point, interesting initiative

• Who wants to join forces?
  o Writing projects together
  o Searching for funding

• Important:
  o Automated
  o Accurate
  o Scalable and maintainable
  o Cost-effective


• Hoping to return to Leuven with names, specific suggestions, and appointments for meetings to discuss proposals

Appendix: Center for Processing Speech and Images

• The Center for Processing Speech and Images (PSI) is one of the units within the Department of Electrical Engineering (ESAT) at KU Leuven. It specialises in computer vision, and object and object class recognition is one of its most important research domains. Besides more general goals such as scene understanding, segmentation and invariant object recognition, it has experience with character recognition on licence plates and with automatic recognition of handwritten music scores for transcription into modern notation.

With more than 60 researchers, it is one of the biggest research groups of its kind in Europe and has extensive experience in national and international projects. Two of its professors have received ERC grants from the European Commission and have won several other prestigious prizes.