9 ODT2DAISY: Producing Digital Talking Books with Open-Source Software

FOSS-AMA Satellite event
odt2daisy: digital talking books with open-source software
Christophe Strobbe, Katholieke Universiteit Leuven, Belgium

Description

odt2daisy is an open-source add-on for OpenOffice.org that converts word processing documents to digital talking books in the DAISY (Digital Accessible Information System) format, standardised as ANSI/NISO Z39.86. Digital talking books make print material accessible to blind and other print-disabled persons. DAISY lets users navigate by headings or page numbers and provides a text version that is synchronised with the audio. odt2daisy produces both Full DAISY 3 (text synchronised with audio) and DAISY 3 XML (text without audio). For compatibility with older DAISY software, it also supports DAISY 2.02. odt2daisy also supports mathematical content expressed in MathML (Mathematical Markup Language). It runs on Microsoft Windows, Mac OS X, Linux and Solaris.

For audio production, odt2daisy relies on the DAISY Pipeline Lite (open-source software developed by the DAISY Consortium), the LAME MP3 encoder, and the operating system's text-to-speech (TTS) engine(s). The supported languages therefore depend on the TTS engines available on the user's system. On Unix-based systems odt2daisy relies on the open-source eSpeak TTS engine, which supports 27 languages. As a result, DAISY books can be produced with open-source software only: for example, Ubuntu Linux, OpenOffice.org, odt2daisy and eSpeak together form a completely open-source software stack.

The next step is the development of an accessibility evaluation and repair add-on for OpenOffice.org, so that documents produced with OpenOffice.org are more accessible and serve as a better basis for export to other formats such as DAISY, PDF and HTML.

Vincent Spiewak started working on odt2daisy at the Université Pierre et Marie Curie (Paris, France) and continued the work at the Katholieke Universiteit Leuven (Leuven, Belgium) in the framework of ÆGIS, a research and development project co-financed by the European Commission's 7th Framework Programme.
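In DAISY 3, navigation by headings is usually backed by the NCX navigation file. Its nested navPoint entries mirror the document's heading hierarchy. The sketch below shows one way to build such a nesting from a flat list of headings, using a stack of open levels. It assumes the (level, text) pairs have already been extracted from the document. The class and method names are hypothetical and this is not odt2daisy's own code. It also leaves out the content/audio references that a real NCX navPoint carries.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: builds a simplified NCX-like navMap from flat headings.
public class NavMapSketch {
    /** A heading from the source document: outline level (1 = top) and text. */
    public static final class Heading {
        public final int level;
        public final String text;
        public Heading(int level, String text) { this.level = level; this.text = text; }
    }

    public static String buildNavMap(List<Heading> headings) {
        StringBuilder sb = new StringBuilder("<navMap>");
        Deque<Integer> open = new ArrayDeque<>(); // levels of navPoints not yet closed
        int playOrder = 0;
        for (Heading h : headings) {
            // Close every open navPoint at the same or a deeper level.
            while (!open.isEmpty() && open.peek() >= h.level) {
                sb.append("</navPoint>");
                open.pop();
            }
            playOrder++;
            sb.append("<navPoint id=\"nav").append(playOrder)
              .append("\" playOrder=\"").append(playOrder).append("\">")
              .append("<navLabel><text>").append(escape(h.text)).append("</text></navLabel>");
            open.push(h.level); // this navPoint stays open for its subsections
        }
        while (!open.isEmpty()) { // close whatever is still open at the end
            sb.append("</navPoint>");
            open.pop();
        }
        return sb.append("</navMap>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(buildNavMap(Arrays.asList(
            new Heading(1, "Chapter 1"),
            new Heading(2, "Section 1.1"),
            new Heading(1, "Chapter 2"))));
    }
}
```

Because a heading closes every open entry at its own or a deeper level, "Section 1.1" nests inside "Chapter 1", and "Chapter 2" starts a new top-level entry.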

Transcript of 9 ODT2DAISY: Producing Digital Talking Books with Open-Source Software

Page 1: 9 ODT2DAISY: Producing Digital Talking Books with Open-Source Software

FOSS-AMA Satellite event

odt2daisy: digital talking books with open-source software

Christophe Strobbe, Katholieke Universiteit Leuven, Belgium

Page 2

27-28 March 2010, Paphos, Cyprus

Motivation & Problem Area

Digital talking books
• For persons with “print disabilities”
• DAISY – ANSI/NISO Z39.86
• Production: typically
  – by specialised production centres
  – for blind and visually impaired users
  – i.e. not by end users (in 2007)

Page 3

Objectives

Enable end users to produce DAISY
• In most European languages
• In a free and open-source office suite
• Support:
  – DAISY 3 (with or without audio)
  – DAISY 2.02 (for older players)
  – Multilingual content
  – Mathematical Markup Language (MathML)

Page 4

Methodology

• Build an OpenOffice.org extension
  – Odt2dtbook by Vincent Spiewak, available in 2008
  – Functionality available as an extension and as a reusable JAR (Java Archive)
  – Add:
    • DAISY 3 audio, DAISY 2.02
    • A comprehensive set of test documents (regression testing)
    • Support for multilingual content on Windows

Page 5

odt2daisy Components (1)

• Java Open Document Library (JODL)
  – For ODT / XML preprocessing

• odt2daisy library
  – Converts ODT to DAISY XML (XSLT)
  – Validates output
  – Reusable Java library
  – Command-line interface
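The XSLT step can be pictured with the standard `javax.xml.transform` API. The sketch below assumes a tiny, hypothetical stylesheet that maps ODF `text:h` headings and `text:p` paragraphs to DTBook-like `h1`–`h6` and `p` elements. It is not the stylesheet odt2daisy ships, and its output is not valid DTBook. A real ODT is a ZIP package, so a real converter would first read `content.xml` out of the archive (for example with `java.util.zip.ZipFile`). Here the XML is supplied inline so the example runs on its own.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of an ODT-to-DAISY-XML transformation via XSLT.
public class OdtToDtbookSketch {
    // Simplified stylesheet (an assumption, not odt2daisy's): text:h -> h{outline-level}, text:p -> p.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'"
        + " exclude-result-prefixes='text'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<dtbook><xsl:apply-templates select='//text:h | //text:p'/></dtbook>"
        + "</xsl:template>"
        + "<xsl:template match='text:h'>"
        + "<xsl:element name='{concat(\"h\", @text:outline-level)}'>"
        + "<xsl:value-of select='.'/></xsl:element>"
        + "</xsl:template>"
        + "<xsl:template match='text:p'><p><xsl:value-of select='.'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    // Minimal stand-in for the content.xml part of an ODT package.
    static final String SAMPLE_CONTENT =
        "<office:document-content"
        + " xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'"
        + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
        + "<office:body><office:text>"
        + "<text:h text:outline-level='1'>Introduction</text:h>"
        + "<text:p>Hello DAISY</text:p>"
        + "</office:text></office:body></office:document-content>";

    /** Applies the stylesheet to an ODF content.xml string and returns the result. */
    public static String transform(String contentXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(contentXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(SAMPLE_CONTENT));
    }
}
```

In this design the structural mapping lives in the stylesheet, and Java only loads and applies it. That makes it plausible to expose the same conversion both from an office extension and from a command line.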

Page 6

odt2daisy Components (2)

• odt2daisy extension
  – Wrapper for other components:
  – Uses OpenOffice.org UNO API
  – Uses odt2daisy library
  – Uses DAISY Pipeline Lite (speech synthesis)
  – Includes templates

• Templates with custom styles for DAISY production

Page 7

Results (1)

• odt2daisy released November 2009
  – Tutorials in various formats (text, DAISY, video)
  – Developer documentation
  – Test files for regression testing
  – TTS in 27 languages where eSpeak is available (Linux, Windows)
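The audio chain (TTS to WAV, then WAV to MP3) can be illustrated with the command-line tools involved. This is a hypothetical sketch, not the way DAISY Pipeline Lite actually drives the engines. It assumes the `espeak` and `lame` executables are on the PATH. It uses only eSpeak's `-v` (voice/language) and `-w` (write a WAV file) options and LAME's `-b` (bitrate in kbit/s) option. File names, the voice and the bitrate are illustrative.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one text fragment through eSpeak (TTS) and LAME (MP3 encoding).
public class TtsCommandSketch {
    /** eSpeak: -v selects the voice/language, -w writes a WAV file instead of speaking. */
    public static List<String> espeakCommand(String voice, String wavFile, String text) {
        return Arrays.asList("espeak", "-v", voice, "-w", wavFile, text);
    }

    /** LAME: encodes the WAV file to MP3; -b sets the bitrate in kbit/s. */
    public static List<String> lameCommand(String wavFile, String mp3File, int kbps) {
        return Arrays.asList("lame", "-b", String.valueOf(kbps), wavFile, mp3File);
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> steps = Arrays.asList(
            espeakCommand("fr", "phrase1.wav", "Bonjour"),
            lameCommand("phrase1.wav", "phrase1.mp3", 32));
        for (List<String> cmd : steps) {
            // Runs each step in turn; fails if the tool is missing or returns an error.
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            if (p.waitFor() != 0) {
                throw new IllegalStateException("Command failed: " + cmd);
            }
        }
    }
}
```

Choosing the voice per text fragment is also the natural hook for multilingual content: a fragment marked as French would be passed to a French voice.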

Page 8

Results (2)

• Support for ODT features
  – Heading, List, Table, Images, Captions, Notes, Foot/Rear notes, Math, TOC, Section, Frame, Bookmark, Metadata, ...
  – Page numbering (1, i, I, a, A; advanced)
  – Front / body / rear matter
  – “Complex text layout” and East Asian languages not supported
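The page-numbering styles listed above (1, i, I, a, A) correspond to ODF number-format codes. The sketch below shows how a page number might be rendered in each style. It is an illustration under stated assumptions, not odt2daisy's implementation. The class name is hypothetical. The alphabetic sequence assumes A…Z, AA, AB, … (ODF also allows an AA, BB, … "letter sync" variant, which is not handled here).

```java
import java.util.Locale;

// Hypothetical sketch: renders a page number in the 1 / i / I / a / A styles.
public class PageNumberFormatSketch {
    private static final int[] VALUES = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
        {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    /** Formats page number n (n >= 1) using a format code: "1", "i", "I", "a" or "A". */
    public static String format(int n, String numFormat) {
        if (n < 1) throw new IllegalArgumentException("Page numbers start at 1");
        switch (numFormat) {
            case "1": return Integer.toString(n);
            case "I": return roman(n);
            case "i": return roman(n).toLowerCase(Locale.ROOT); // ROOT avoids locale-specific casing
            case "A": return letters(n);
            case "a": return letters(n).toLowerCase(Locale.ROOT);
            default: throw new IllegalArgumentException("Unsupported format: " + numFormat);
        }
    }

    // Greedy Roman numerals using subtractive pairs (CM, XC, IV, ...).
    private static String roman(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            while (n >= VALUES[i]) {
                sb.append(SYMBOLS[i]);
                n -= VALUES[i];
            }
        }
        return sb.toString();
    }

    // Bijective base 26: 1 -> A, 26 -> Z, 27 -> AA, 28 -> AB (assumed sequence).
    private static String letters(int n) {
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            n--; // shift to 0-based so that 26 maps to Z rather than "A@"
            sb.append((char) ('A' + n % 26));
            n /= 26;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(format(4, "i"));    // iv
        System.out.println(format(1994, "I")); // MCMXCIV
        System.out.println(format(28, "a"));   // ab
    }
}
```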

Page 9

Conclusion and Outlook

• Some ODT features are hard to parse (e.g. multilingual text; “Asian” languages)

• Licensing: MP3 vs Ogg Vorbis for TTS
• TTS quality: TTS as an internet service / in cloud computing?
• Accessibility checking before export

Page 10

Start Using It!

• http://odt2daisy.sf.net/

• Developer site: http://sourceforge.net/projects/odt2daisy/