Zanichelli XML-based Dictionaries Editing System Daniele Fusi.

  • date post

    19-Dec-2015

Transcript of Zanichelli XML-based Dictionaries Editing System Daniele Fusi.

  • Slide 1
  • Zanichelli XML-based Dictionaries Editing System Daniele Fusi
  • Slide 2
  • 1 - System Requirements: multiple presentations, legacy content, operating environment
  • Slide 3
  • One content, multiple presentations: the same data feeds CD-ROM / DVD, web sites or services, paper books, and e-books
  • Slide 4
  • Existing environment: requirements
      • authors: accustomed to WYSIWYG editing in word processors; no technical training
      • IT point of view: text as a database; query and interactivity; multiple media and forms
      • editors: content validation and standardization; text-based tools; simple content structure
      • designers: DTP pagination; flattened structure; import / export
  • Slide 5
  • Existing content: conversion from word processor documents and 3rd-party formats
  • Slide 6
  • Digital format requirements
      • text-based storage, both machine- and user-readable
      • using standard technologies (portable & durable)
      • open to expansion and customization
      • easy to manipulate
      • easy to transform for import/export
      • focused on semantics: content rather than its presentation
  • Slide 7
  • Content and semantics: dictionary... lemma:
  • Slide 8
  • Marking semantics in text: fields (lemma, morphology, etymon, translation, sample, work, etc.)
  • Slide 9
  • Semantic markup: applications
      • marked fields: lemma, morphology, etymon, translation, sample, work
      • alphabetical lemmata list, normal or inverted
      • list of lemmata grouped by grammatical category
      • list of lemmata grouped by etymon (roots dictionary)
      • rudimentary bidirectional dictionary
      • quotation lookup
      • list of quoted works and authors
      • complex searches combining lemma, morphology, etymon, work, etc.
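Two of the applications listed on this slide can be sketched in a few lines once fields are marked. This is only an illustration: the entries below are made-up sample data, the dictionary-of-fields representation is an assumption, and "inverted" is read here as a reverse (a tergo) list sorted by word endings.

```python
from collections import defaultdict

# Made-up sample entries, not taken from any Zanichelli dictionary.
ENTRIES = [
    {"lemma": "casa", "grammar": "s.f."},
    {"lemma": "cane", "grammar": "s.m."},
    {"lemma": "bello", "grammar": "agg."},
    {"lemma": "libro", "grammar": "s.m."},
]

def group_by_field(entries, field):
    """Group lemmata by the value of one semantic field."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[field]].append(entry["lemma"])
    return dict(groups)

def inverted_list(entries):
    """Sort lemmata by their reversed spelling (reverse dictionary order)."""
    return sorted((e["lemma"] for e in entries), key=lambda lemma: lemma[::-1])

by_grammar = group_by_field(ENTRIES, "grammar")
reverse_order = inverted_list(ENTRIES)
```

Both listings come from the same marked-up content, which is the point of the slide: no presentation-specific copy of the text is needed.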
  • Slide 10
  • 2 - Solution overview: XML-based implementation
  • Slide 11
  • Implementation: XML
      • XML dictionary = Unicode text files
      • widely used standard
      • built for openness and transformation (XSLT)
      • represents any kind of data, independently of its presentation
      • hierarchical model, well suited to the hierarchy of existing works, which are typically stored as text: dictionary > letter > lemma > field
  • Slide 12
  • Sample: lemma and fields
      • fields: lemma = dizionàrio; date = 1965; grammar = s.m.; translation 1 = complesso dei lemmi di un dizionario e sim.; separator; translation 2 = lista dei lemmi
      • rendered entry: dizionàrio [1965] s.m. complesso dei lemmi di un dizionario e sim. lista dei lemmi
  • Slide 13
  • Hierarchical structure: dictionary > letters... > letter > lemmata... > lemma > fields (date, grammar, translation, separator, translation)
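The dictionary > letter > lemma > field hierarchy can be sketched with Python's standard `xml.etree.ElementTree`. The element names (`dictionary`, `letter`, `lemma`, `field`) and the `name` and `type` attributes are assumptions made for illustration; the slides do not show the actual schema. Field values are taken from the sample entry on the previous slide.

```python
import xml.etree.ElementTree as ET

def build_lemma(fields):
    """Return a <lemma> element with one <field> child per (type, text) pair."""
    lemma = ET.Element("lemma")
    for field_type, text in fields:
        field = ET.SubElement(lemma, "field", {"type": field_type})
        field.text = text  # None yields an empty element, e.g. a separator
    return lemma

# Hypothetical schema: one <letter> per file, lemmata inside it.
dictionary = ET.Element("dictionary")
letter = ET.SubElement(dictionary, "letter", {"name": "d"})
letter.append(build_lemma([
    ("lemma", "dizionàrio"),
    ("date", "1965"),
    ("grammar", "s.m."),
    ("translation", "complesso dei lemmi di un dizionario e sim."),
    ("separator", None),
    ("translation", "lista dei lemmi"),
]))

xml_text = ET.tostring(dictionary, encoding="unicode")
```

Note that the lemma holds a flat sequence of fields: the two translations and the separator between them are siblings, not nested elements.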
  • Slide 14
  • Minimalist structure: flat, yet extensible
      • hierarchy: dictionary > letter > lemma > field...
      • the smallest depth that satisfies practical requirements
      • fields vary at will according to the dictionary language and type
      • the variability of fields compensates for the relatively flat hierarchy
  • Slide 15
  • Structure and compromises: practical devices
      • fields define lemma parts: etymon, translation, grammar, samples,...
      • formatting is automatically derived from the semantic structure (lemma = bold, grammar = italic, author = small caps,...)
      • text escapes define specific formatting for portions of field values, whenever they are not considered semantically relevant
      • focus on semantics: "I came by cab" is 1 field (sample) in the lemma; the hierarchy need not be deeper, yet it still allows emphasis on "by"
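A minimal sketch of how formatting could be derived from field types, with a text escape for local emphasis. The field-to-style mapping follows the slide (lemma = bold, grammar = italic, author = small caps); the HTML output and the `{em}...{/em}` escape syntax are invented for this example, since the slides do not show the real escape notation.

```python
import html
import re

# Style per field type, as listed on the slide; the HTML tags are an assumption.
FIELD_STYLE = {
    "lemma": ("<b>", "</b>"),
    "grammar": ("<i>", "</i>"),
    "author": ('<span class="smallcaps">', "</span>"),
}

# Hypothetical escape syntax for formatting inside a field value.
ESCAPE = re.compile(r"\{em\}(.*?)\{/em\}")

def render_field(field_type, text):
    """Render one field: style comes from its type, escapes add local emphasis."""
    body = html.escape(text)
    body = ESCAPE.sub(r"<em>\1</em>", body)
    open_tag, close_tag = FIELD_STYLE.get(field_type, ("", ""))
    return f"{open_tag}{body}{close_tag}"

sample = render_field("sample", "I came {em}by{/em} cab")
grammar = render_field("grammar", "s.m.")
```

The sample stays a single field; the emphasis on "by" is carried by the escape rather than by a deeper element hierarchy.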
  • Slide 16
  • Storage: data
      • XML files: one file per letter
      • each dictionary has its own alphabet and sorting scheme
      • lemmata are automatically inserted in the proper file and at the proper position according to their content
      • the lemma ID can be overridden for special sorting, e.g. à côté (du) → acote; ABS → abiesse; 10 minutes → tenminutes
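The automatic placement described on this slide can be sketched as a sort key computed from a per-dictionary alphabet, with an optional ID that overrides the lemma text. Everything here is an assumption for illustration: the 26-letter alphabet, the function names, and the in-memory dictionary standing in for the per-letter XML files.

```python
import bisect
import unicodedata

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed alphabet for this sketch
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def sort_key(lemma, sort_id=None):
    """Rank tuple built from the override ID if given, else from the lemma."""
    source = sort_id if sort_id is not None else lemma
    # NFD splits accented letters so that diacritics can be ignored.
    decomposed = unicodedata.normalize("NFD", source.lower())
    return tuple(RANK[ch] for ch in decomposed if ch in RANK)

def insert_lemma(letters, lemma, sort_id=None):
    """Insert the lemma in its letter list at the sorted position."""
    key = sort_key(lemma, sort_id)
    if not key:
        raise ValueError(f"no sortable letters in {lemma!r}; supply a sort_id")
    letter = ALPHABET[key[0]]
    bisect.insort(letters.setdefault(letter, []), (key, lemma))
    return letter

letters = {}  # stands in for the per-letter XML files
insert_lemma(letters, "abbey")
insert_lemma(letters, "ABS", sort_id="abiesse")
insert_lemma(letters, "10 minutes", sort_id="tenminutes")
```

Without the override, "10 minutes" would be keyed on "minutes" and filed under m; the ID "tenminutes" places it under t instead.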
  • Slide 17
  • Storage: metadata
      • self-descriptive dictionary: additional XML files define:
      • fields list and types within each dictionary, e.g. prelemma, etymon, abbreviation, phonetics, translation, variant, grammar, category (A, B, C...), section (1, 2, 3...), separator ( ... )...
      • alphabet and sort order for each dictionary, including diacritics sensitivity, e.g. Croatian: a b c č ć d dž đ e f g h i j k l lj m n nj o p r s š t u v z ž
      • other dictionary-specific support resources (e.g. frequently typed symbols, preview styles)
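As a rough sketch of how such metadata might be used, the snippet below reads a list of allowed fields from an XML string and reports any field of a lemma that the dictionary does not define. The metadata layout (`metadata`, `fields`, `field` with a `name` attribute) is hypothetical; only the field names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata layout; the real files are not shown in the slides.
METADATA_XML = """
<metadata>
  <fields>
    <field name="prelemma"/>
    <field name="etymon"/>
    <field name="grammar"/>
    <field name="translation"/>
    <field name="separator"/>
  </fields>
</metadata>
"""

def allowed_fields(metadata_xml):
    """Return the set of field names defined for a dictionary."""
    root = ET.fromstring(metadata_xml)
    return {field.get("name") for field in root.iter("field")}

def unknown_fields(lemma_fields, allowed):
    """List the fields of a lemma that the metadata does not define."""
    return [name for name in lemma_fields if name not in allowed]

allowed = allowed_fields(METADATA_XML)
problems = unknown_fields(["grammar", "colour", "translation"], allowed)
```

Because the field list lives in data rather than in code, a new dictionary type would only need a new metadata file under this design.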
  • Slide 18
  • 3 - Editing: Authors
  • Slide 19
  • Visual Editing
      • visual UI: authors build lemmata visually by blocks, and are shielded from the underlying XML code
      • XML code integrity is guaranteed by the software
      • a typographical preview is provided for authors accustomed to WYSIWYG
      • XML data file = letter (letter > lemmata > fields), plus XML metadata
  • Slide 20
  • Editing software: editing by blocks (screenshot panels: lemmata list; visual editing of fields in lemma; typographical preview; letter selector)
  • Slide 21
  • Editing in distributed scenarios: Web-based visual editing
  • Slide 22
  • Web: distributed scenario
      • dictionaries are stored centrally on a web server
      • an ASP.NET web site manages access and versioning for different authors and works
      • visual editing is implemented as a Silverlight RIA, running on the author's own computer, yet inside a web page:
      • desktop-class responsiveness for the application
      • true platform independence (Mac / PC, IE / Mozilla / Safari)
      • no need for software distribution and installation
      • centralized software maintenance
  • Slide 23
  • Distributed editing
      • SQL database for managing access
      • ASP.NET server application manages users and versions of works
      • Silverlight application runs on the client computer for visual editing
      • diagram labels: XML, author, specialized author, editor
  • Slide 24
  • Visual editing in your web browser (screenshot panels: lemmata list; visual editing of fields in lemma; typographical preview; letter selector)
  • Slide 25
  • 4 - Revision: Editors
  • Slide 26
  • Content revisions and transformations
      • merging different versions (multiple-author scenarios)
      • editors: validation and standardization
      • DTP: pagination for printing
  • Slide 27
  • Automated revision and correction (screenshot panels: test selection; test description; results)
  • Slide 28
  • 5 - Publication: Editors
  • Slide 29
  • One content, multiple outputs: print, CD/DVD, mobile devices (Mobipocket), web sites
  • Slide 30
  • Extending the model. Sample: RTL languages and root-based dictionaries
  • Slide 31
  • Arabic-Italian dictionary
      • clashing RTL/LTR text flows
      • special alphabetical order: several letters share the same rank (U+0621...U+0627 = 1, U+0628 = 2, U+062A = 3...); different sorting according to level
      • root-based dictionary: letter > root > lemma > field
      • the structure of existing dictionaries must be kept unchanged even if a deeper hierarchy would be required
      • roots are sorted according to a predefined scheme; lemmata in roots are arbitrarily sorted by authors
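The shared-rank ordering can be sketched as a two-level sort key. The first level uses the ranks given on the slide (U+0621 through U+0627 share rank 1, U+0628 is rank 2, U+062A is rank 3). The second level, which breaks ties by code point, is purely an assumption: the slide only says that sorting differs by level, without giving the rule.

```python
# Ranks from the slide: U+0621...U+0627 share rank 1, U+0628 = 2, U+062A = 3.
RANKS = {chr(cp): 1 for cp in range(0x0621, 0x0627 + 1)}
RANKS["\u0628"] = 2
RANKS["\u062A"] = 3

def primary_key(word):
    """First-level key: letters with the same rank compare as equal."""
    return tuple(RANKS[ch] for ch in word if ch in RANKS)

def sort_key(word):
    """Two-level key; the code-point tie-breaker is an assumed second level."""
    return (primary_key(word), tuple(ord(ch) for ch in word))

words = ["\u0628\u0627", "\u0627\u0628", "\u0623\u0628"]
ordered = sorted(words, key=sort_key)
```

Here the two words starting with different alef forms get the same first-level key and are separated only at the second level, while the word starting with U+0628 sorts after both.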
  • Slide 32
  • Hierarchy depths
      • other dictionaries: letter > lemma...; item = lemma = set of fields
      • Arabic: letter > roots... > lemma...; item = root = set of fields, some delimiting lemmata boundaries
  • Slide 33
  • Deeper hierarchy illusion: special editor (Arabic-X editor)
      • XML structure unchanged: each file is a letter containing items, each item contains fields
      • items are roots, not lemmata
      • a special field defines lemmata boundaries within each root
      • the user sees letters, roots, lemmata in root, fields in lemmata; the XML structure remains letter > items > fields
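The boundary-field device can be sketched as a function that presents a root's flat field list as a list of lemmata. The field name `lemma-boundary`, the tuple representation, and the rule that a boundary field opens a new lemma are all assumptions for illustration; the slides do not specify how the special field is named or placed.

```python
def split_lemmata(fields, boundary="lemma-boundary"):
    """Split a root's flat fields into root-level fields and lemmata.

    `fields` is a list of (type, text) pairs. Each boundary field is assumed
    to open a new lemma; fields before the first boundary belong to the root.
    """
    root_fields = []
    lemmata = []
    current = None
    for field_type, text in fields:
        if field_type == boundary:
            current = []
            lemmata.append(current)
        elif current is None:
            root_fields.append((field_type, text))
        else:
            current.append((field_type, text))
    return root_fields, lemmata

# Placeholder values; the stored structure stays letter > item > fields.
root_item = [
    ("root", "R"),
    ("lemma-boundary", None),
    ("lemma", "L1"),
    ("translation", "T1"),
    ("lemma-boundary", None),
    ("lemma", "L2"),
]
root_fields, lemmata = split_lemmata(root_item)
```

Only the editor view gains a level; anything that reads the files still sees items made of fields.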
  • Slide 34
  • Specialized editor: Arabic (screenshot panels: letter selector; roots in letter; lemmata in root; visual editing of fields in lemma; typographical preview with bidirectional flows)
  • Slide 35
  • Arabic editor trick: advantages
      • user experience is almost unchanged (there are 2 lists instead of 1 to choose from for editing: roots and lemmata)
      • XML structure unchanged: all the other editorial processes require no change, so the new dictionary fits into them easily
      • field variability (already responsible for the structure's expandability) makes this trick possible
      • one model, several views
  • Slide 36
  • Daniele Fusi, http://www.fusisoft.it