Samuel Läubli, Martin-Dietrich Glessgen, Phoenix2. A Tool for Web-Based Annotation of Medieval Texts

  • 7/30/2019 Samuel Läubli, Martin-Dietrich Glessgen, Phoenix2. A Tool for Web-Based Annotation of Medieval Texts

    Phoenix2: A Tool for Web-Based Annotation of Medieval Texts

    COST Workshop Connecting Textual Corpora and Dictionaries

    Samuel Läubli (1, 2), Martin-Dietrich Glessgen (1)

    (1) Institute of Romance Studies, University of Zurich

    (2) Institute of Computational Linguistics, University of Zurich

    April 26, 2013

    Contents

    1. Background
       Corpus, Digital Edition, Tools

    2. Phoenix2 in Use
       Import, Querying, Annotation, External Editing

    3. Hands-On Session

    4. Conclusion

    1. Background

    Les plus anciens documents linguistiques de la France

    Corpus

    Les plus anciens documents linguistiques de la France (DocLing)

    Old French charters of the 13th century

    Collection founded by Jacques Monfrin (École Nationale des Chartes)

    Now pursued by Martin-Dietrich Glessgen (University of Zurich)

    Currently comprises over 2000 documents from different regions

    Corpus

    Départements             Editors [Adaptors]            # Doc.

    1. Published Volumes
    Oise                     Carolus-Barré [Tock, Grübl]   202
    Haute-Marne              Gigot [Tock, Kihaï]           142
    Vosges                   Lanher [Trotter]              285
    Aube, S.-et-M., Yonne    Coq                           103

    2. Revised Volumes
    Meurthe-et-Moselle       Arnod, Glessgen               290
    Douai                    Mestayer, Brunner             350

    3. New Volumes in Progress
    Jura                     Muller                        105
    Marne                    Kihaï                         230
    Meuse                    Matthey                       250
    Moselle                  Pitz                          180
    Nièvre                   Alletsgruber                  30
    Haute-Saône              Muller                        155
    Saône-et-Loire           Alletsgruber                  95
    Chancellerie royale      Videsott                      150 [+350]

    Adapted from [Glessgen, 2011]

    Digital Edition

    Project lead: Martin-Dietrich Glessgen

    Aimed at editing Old French charters of the 13th century

    Charters are manually transcribed into a machine-readable format

    Double encoding principle:

    a) Original (ancient) view

    b) Modern view

    Use the same data for print and online editions

    Digital Edition Requirements

    Functional Requirements:

    Editor for assisting editors in transcribing charters

    Storage and management of transcribed charters

    Querying of transcribed charters

    Annotation
      Text level (date, genre, regest, ...)
      Word level (lemma, PoS, morphology, ...)

    Export in distinct formats for:
      Print publication
      Web publication
      Research (working formats)
      Use within other tools (External Editing)

    Digital Edition Requirements

    Functional Requirements:

    [Figure: UML control flow of the working process. Programs: XML-EDITOR, TAGGING TOOL, LEXICOGRAPHIC TOOL. Entities/data: charter, xml-charter, enhanced xml-charter.]

    Digital Edition Requirements

    Quality Requirements:

    Powerful yet easy to use

    Fast querying

    Easily accessible (client-server architecture)

    Use of non-commercial technology

    Phoenix2: Architecture

    Phoenix2 is a web-based tool for managing, querying, and annotating medieval texts.

    [Figure: architecture diagram (informal)]

    PHOENIX2 web interface (browser-based)

    CSS: phoenix2-css (CSS framework)

    XHTML

    JavaScript: jQuery (JavaScript framework)

    PHP

    MySQL (RDBMS)

    Apache (web server)

    2. Phoenix2 in Use

    Live Demonstration

    Live Demonstration

    Importing Texts

    Machine-Readable Format: XML/XSD

    Phoenix2 builds upon texts encoded in an idiosyncratic XML format. We use three schemata:

    entry: Lightweight markup aimed at facilitating the initial transcription of charters (original format). Either tokenized or untokenized.

    storage: Main format for use within Phoenix2. Thoroughly tokenized; all tokens are typed (tok/num/punct).

    edit: Similar to storage, but slightly adapted for use in external XML editors.
      Extra attributes for word-level annotations
      Checksums for re-import into Phoenix2 (check-in)
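To make the idea of typed tokens and check-in checksums more concrete, here is a minimal Python sketch. It is not the Phoenix2 schema: the element name `text`, the `id` attribute, the tokenization pattern, and the use of SHA-256 are all assumptions chosen for illustration. Only the three token types (tok/num/punct) come from the slide.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

# Assumed tokenization rule: a run of digits, a run of letters,
# or a single punctuation character.
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|[^\w\s]")


def classify(token):
    """Map a token to one of the three types named on the slide."""
    if token.isdigit():
        return "num"
    if token.isalpha():
        return "tok"
    return "punct"


def to_storage_xml(text):
    """Build a hypothetical <text> element whose children are typed tokens."""
    root = ET.Element("text")
    for i, token in enumerate(TOKEN_RE.findall(text), start=1):
        element = ET.SubElement(root, classify(token), {"id": str(i)})
        element.text = token
    return root


def checksum(element):
    """Hash the serialized element (assumed scheme, here SHA-256)."""
    return hashlib.sha256(ET.tostring(element, encoding="utf-8")).hexdigest()


# Made-up sample string, not taken from the corpus.
doc = to_storage_xml("por 12 deniers.")
print(ET.tostring(doc, encoding="unicode"))
print(checksum(doc))
```

A checksum of this kind would let a check-in step notice whether an externally edited file still matches what was exported; how Phoenix2 actually computes and compares its checksums is not stated in the slides.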

    Indexing Texts in a Relational Database

    Why does importing texts take quite a while?

    Texts are indexed into a relational database

    We use a relational MySQL database. This allows for

    Fast querying

    Linking additional entities to texts without including them in the XML

    Storing system data (user accounts, settings, ...)
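The following Python sketch illustrates why indexing tokens into relational tables makes lookups fast and lets annotations live outside the XML. It is an illustration only: the table and column names are hypothetical, and it uses the standard-library `sqlite3` module as a self-contained stand-in for the MySQL database mentioned above.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical schema: texts, their tokens, and annotations."""
    conn.executescript(
        """
        CREATE TABLE text (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE token (
            id INTEGER PRIMARY KEY,
            text_id INTEGER REFERENCES text(id),
            position INTEGER,
            form TEXT,
            type TEXT
        );
        -- The index on the word form is what keeps lookups fast.
        CREATE INDEX token_form ON token(form);
        -- Annotations are linked to tokens instead of being stored in the XML.
        CREATE TABLE annotation (
            token_id INTEGER REFERENCES token(id),
            lemma TEXT,
            pos TEXT
        );
        """
    )


def index_text(conn, title, tokens):
    """Insert one text and its (form, type) tokens; return the text id."""
    cursor = conn.execute("INSERT INTO text (title) VALUES (?)", (title,))
    text_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO token (text_id, position, form, type) VALUES (?, ?, ?, ?)",
        [(text_id, i, form, typ) for i, (form, typ) in enumerate(tokens, start=1)],
    )
    return text_id


def find_form(conn, form):
    """Return (title, position) pairs for every occurrence of a word form."""
    rows = conn.execute(
        "SELECT text.title, token.position FROM token "
        "JOIN text ON text.id = token.text_id "
        "WHERE token.form = ? ORDER BY text.id, token.position",
        (form,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Made-up sample data, not taken from the corpus.
index_text(conn, "sample charter", [("por", "tok"), ("12", "num"), ("deniers", "tok")])
print(find_form(conn, "por"))
```

Writing one row per token is also a plausible reason why import takes a while: the cost is paid once at import time so that later queries can use the index.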

    Live Demonstration

    Querying Texts

    Regular Expressions

    Queries in Phoenix2 can be formulated using Regular Expressions.

    abbe finds all words that contain the string abbe: abbe, abbes, ...

    ^pou?r$ finds por and pour.

    [aeiou]{3} finds words that contain three consecutive vowels ...
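The three patterns above can be tried out with Python's `re` module. This is only a sketch of how such patterns behave when matched word by word: the slides do not say which regular expression engine Phoenix2 uses, and the word list and the helper `find_words` are made up for illustration.

```python
import re


def find_words(pattern, words):
    """Return the words in which the pattern matches anywhere."""
    regex = re.compile(pattern)
    return [word for word in words if regex.search(word)]


# Made-up word list, not taken from the corpus.
words = ["abbe", "abbes", "por", "pour", "poor", "pouoir", "roi"]

print(find_words(r"abbe", words))        # ['abbe', 'abbes']
print(find_words(r"^pou?r$", words))     # ['por', 'pour']
print(find_words(r"[aeiou]{3}", words))  # ['pouoir']
```

Note that `^` and `$` anchor the pattern to the start and end of the word, which is why `^pou?r$` does not match `pouoir`, while the unanchored `abbe` also matches `abbes`.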

    Live Demonstration

    Annotating Words

    Live Demonstration

    External Editing

    3. Hands-On Session

    Try it Yourself

    Log In

    All you need is

    Any modern internet browser

    Internet connection

    Log in via

    URL: tiny.uzh.ch/2A

    User: cost

    Password: action

    Enter login credentials twice

    Feel free to explore and manipulate whatever you want: it's just a copy.

    4. Conclusion

    Phoenix2: A Tool for Web-Based Annotation of Medieval Texts

    Conclusion

    Phoenix2 is an implementation based on the most recent computational and philological standards.

    It is aimed at

    Transparency of all data and source code (i.e., well-documented open-source technology)

    Connectivity through well-defined interfaces

    Persistence of all data and interfaces

    Usability for both experts and novices

    We pursue the stringent and uncompromising synthesis of philology, linguistics, and information technology based on a long-term, intensive cooperation between computational linguistics and special branches of academic knowledge.

    Feel free to try it and get in touch with us. Feedback is very welcome.

    Thank You

    These slides are available at www.cl.uzh.ch/people/team/laeubli.html

    Bibliography

    Glessgen, M.-D. (2011). Présentation générale: architecture et méthodologie du projet des Plus anciens documents linguistiques de la France, édition électronique. In Glessgen, M.-D., Kihaï, D., and Videsott, P., editors, L'élaboration philologique et linguistique des Plus anciens documents linguistiques de la France, Édition électronique (Bibliothèque de l'École des Chartes 168), pages 83-94.