Encoding and Presenting Interlinear Text Using XML Technologies

Baden Hughes, Steven Bird, Catherine Bow, University of Melbourne. Australasian Language Technology Workshop, December 10, 2003.


Paper at ALTW2003 (December 2003, Melbourne)


Page 1: Encoding and Presenting Interlinear Text Using XML Technologies

Encoding and Presenting Interlinear Text

Using XML Technologies

Baden Hughes, Steven Bird, Catherine Bow

University of Melbourne
Australasian Language Technology Workshop

December 10, 2003

Hughes, Bird, Bow (ALTW 03)2

Introduction / Outline

- What is interlinear text?
- EMELD Interlinear Text Model
- XML Representation
- Interlinear Text Styles
- XSL Rendering
- Prototype & Implementation
- Future Research


What is interlinear text?

- A standard presentational form for displaying a source text aligned with a variety of linguistic annotations
  - annotations may include phonological, morphological and syntactic analyses, glosses, translations and comments
- Variations in structure, alignment, display styles, mapping, wrapping, etc.
- Typical example of a three-line text:

Yidinj (Dixon 1977)


Interlinear text samples (contd)

Nivkh (Comrie 1981)

[Figure: sample with its parts labelled: text, metadata, notes, free translation]


Interlinear Text Samples

\_sh v3.0 485 SE Text
\itm kalsrap.mov
\nt Story from tape 20001bx told by Kalsarap Namaf.
\aud kalsrap.mov
\as 0
\ae 13.0002
\tx Akit tumaui tae esan ipi, go
\mr akit tu- mau tae esan i - pi go
\mg 1plincS 1plincRS- all know place 3sgRS - be and
\POS pron pron- quantifier vambi n pron - v conj
\fg We all know that place, and this Litrapong…
\fgb Yumi evriwan isave ples ia. Mo Litrapong (Lisepsep) ia.

South Efate (Namaf, 2001)
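The record above is in the backslash-marker format used by Shoebox. Each field begins with a marker, for example \tx for the transcribed text and \mg for morpheme glosses. The following is a minimal Python sketch of reading such a record into fields. It assumes that every field starts with a backslash marker and that values contain no backslashes. `parse_sfm_record` is a hypothetical helper, not part of any tool named here. It collapses the padding spaces, so it recovers the tokens but not the original column alignment.

```python
def parse_sfm_record(text: str) -> dict[str, str]:
    """Split a backslash-marked record into a {marker: value} dict.

    Assumptions (not stated on the slide): every field starts with a
    backslash marker, values contain no backslashes, and a repeated
    marker overwrites the earlier value.
    """
    fields: dict[str, str] = {}
    # Everything before the first backslash is ignored.
    for chunk in text.split("\\")[1:]:
        marker, _, value = chunk.partition(" ")
        # Collapse the whitespace that once padded the aligned columns.
        fields[marker.strip()] = " ".join(value.split())
    return fields


record = r"""\tx Akit tumaui tae esan ipi, go
\mr akit tu- mau tae esan i - pi go
\mg 1plincS 1plincRS- all know place 3sgRS - be and"""
fields = parse_sfm_record(record)
print(fields["mg"])  # 1plincS 1plincRS- all know place 3sgRS - be and
```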


EMELD Interlinear Text Model

[Figure: the EMELD model as a tree. A TEXT contains PHRASEs, each PHRASE contains WORDs, and each WORD contains morphs (M = Morph).]
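The nesting in the figure maps directly onto nested record types. The following is a minimal Python sketch. It assumes that every level carries a free-form set of typed annotations, written out as `<item type="...">` elements with the element names used in the XML representation of this model. The class names and `to_xml` are illustrative only, not an API defined by EMELD.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Morph:
    items: dict[str, str] = field(default_factory=dict)


@dataclass
class Word:
    items: dict[str, str] = field(default_factory=dict)
    morphs: list[Morph] = field(default_factory=list)


@dataclass
class Phrase:
    items: dict[str, str] = field(default_factory=dict)
    words: list[Word] = field(default_factory=list)


@dataclass
class InterlinearText:
    items: dict[str, str] = field(default_factory=dict)
    phrases: list[Phrase] = field(default_factory=list)


def _add_items(parent: ET.Element, items: dict[str, str]) -> None:
    # Each annotation becomes <item type="...">value</item>.
    for kind, value in items.items():
        ET.SubElement(parent, "item", type=kind).text = value


def to_xml(text: InterlinearText) -> ET.Element:
    """Serialise text > phrases > words > morphs as nested elements."""
    root = ET.Element("interlinear-text")
    _add_items(root, text.items)
    phrases_el = ET.SubElement(root, "phrases")
    for phrase in text.phrases:
        phrase_el = ET.SubElement(phrases_el, "phrase")
        _add_items(phrase_el, phrase.items)
        words_el = ET.SubElement(phrase_el, "words")
        for word in phrase.words:
            word_el = ET.SubElement(words_el, "word")
            _add_items(word_el, word.items)
            morphs_el = ET.SubElement(word_el, "morphs")
            for morph in word.morphs:
                morph_el = ET.SubElement(morphs_el, "morph")
                _add_items(morph_el, morph.items)
    return root
```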


XML Representation

<interlinear-text>
  <item type="user-defined">
    Content at the text level, such as metadata, an unaligned transcription
    of the entire text, or a pointer to an unaligned audio file
  </item>
  <phrases>
    Nested XML content to represent the phrasal constituents of the text
  </phrases>
</interlinear-text>

Each level of the model (text, phrase, word, morph) is represented as an element in the XML document.


<interlinear-text>
  <item type="title">A Yidinj Story</item>
  <phrases>
    <phrase>
      <item type="number">99</item>
      <item type="gls">Where have you come from?</item>
      <words>
        <word>
          <item type="txt">nundu</item>
          <morphs>
            <morph>
              <item type="gls">you-SA</item>
            </morph>
          </morphs>
        </word>
        <word>
          <item type="txt">wandam</item>
          <morphs>
            <morph>
              <item type="gls">where-ABL</item>
            </morph>
          </morphs>
        </word>
      </words>
    </phrase>
  </phrases>
</interlinear-text>

XML Representation – Yidinj text
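Because the encoding is plain XML, the aligned rows can be read back with a generic XML library. The following Python sketch recovers the word line, the gloss line and the free translation from the Yidinj phrase. It assumes that `txt` items hold word forms and that `gls` items hold glosses, as in the sample. Joining several morph glosses with "-" is also an assumption. `phrase_rows` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

YIDINJ = """<interlinear-text>
  <item type="title">A Yidinj Story</item>
  <phrases>
    <phrase>
      <item type="number">99</item>
      <item type="gls">Where have you come from?</item>
      <words>
        <word><item type="txt">nundu</item>
          <morphs><morph><item type="gls">you-SA</item></morph></morphs></word>
        <word><item type="txt">wandam</item>
          <morphs><morph><item type="gls">where-ABL</item></morph></morphs></word>
      </words>
    </phrase>
  </phrases>
</interlinear-text>"""


def phrase_rows(phrase: ET.Element) -> tuple[list[str], list[str], str]:
    """Return (word forms, glosses, free translation) for one <phrase>."""
    words = phrase.findall("words/word")
    forms = [w.findtext("item[@type='txt']", default="") for w in words]
    # Assumption: morph glosses within one word are joined with hyphens.
    glosses = [
        "-".join(m.findtext("item[@type='gls']", default="")
                 for m in w.findall("morphs/morph"))
        for w in words
    ]
    # Only a direct <item> child of <phrase> counts as the free translation.
    translation = phrase.findtext("item[@type='gls']", default="")
    return forms, glosses, translation


root = ET.fromstring(YIDINJ)
for phrase in root.iterfind("phrases/phrase"):
    print(phrase_rows(phrase))
```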


Interlinear Text Styles
- Row display
- Row styles
- Row ordering
- Grouping of content

\TEXT nyewøxi nyenæcyøje q syo q
\MNG (noun+Ø+vbs) (noun+n/j+acpl+cbs)(suf+genpl) (noun+jo+gbs) (suf+nompl)
\BASE nyewøxi nyenæcyøh syo
\MITA Traditional folk songs.

(1) Tundra Nenets (Paakkan, 1997)

Nyewºxiº nyenecyøyeq syoq.
ancient.ABS.NOM.SG person.ABS.GEN.PL song.ABS.NOM.PL
Traditional folk songs.

(2) Tundra Nenets (Susoi 1990)
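Row display, row ordering and column alignment, as contrasted in the two Nenets renderings, can be illustrated with a small layout function. The following is a minimal Python sketch, not the stylesheet approach described later. It assumes a monospaced display and one dict per word, keyed by row type. The keys `txt` and `gls` and the name `render_rows` are hypothetical. Leaving a key out of `order` hides that row, and every column is padded to its widest cell.

```python
def render_rows(words: list[dict[str, str]], order: list[str]) -> list[str]:
    """Render one line per selected row type, in the given order.

    Each word is a column; the column is padded to the widest of its
    visible cells so that annotations stay under their source word
    (assumes a monospaced font).
    """
    widths = [max(len(w.get(row, "")) for row in order) for w in words]
    return [
        " ".join(w.get(row, "").ljust(n) for w, n in zip(words, widths)).rstrip()
        for row in order
    ]


nenets = [
    {"txt": "Nyewºxiº", "gls": "ancient.ABS.NOM.SG"},
    {"txt": "nyenecyøyeq", "gls": "person.ABS.GEN.PL"},
    {"txt": "syoq", "gls": "song.ABS.NOM.PL"},
]
for line in render_rows(nenets, ["txt", "gls"]):  # source row first
    print(line)
```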


Key presentation challenge of interlinear text: complexity due to the relative lengths of the analysis and the source

Line-wrapping

Nyewºxiº nyenecyøyeq syoq
ancient.ABS.NOM.SG person.ABS.GEN.PL song.ABS.NOM.PL

Traditional folk songs

Implications for rendering technology

[Figure: hypothetical line length, hypothetical line wrapping, and correct line wrapping of the example above]
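Correct wrapping breaks the text between aligned columns, never inside one, so a word and all of its annotations move to the next line together. The following greedy Python sketch illustrates that rule. It assumes monospaced output and one dict per word, keyed by row type, and it lets a single column that is wider than the line overflow rather than split it. The function name and the row keys are hypothetical.

```python
def wrap_interlinear(
    words: list[dict[str, str]], order: list[str], width: int
) -> list[list[str]]:
    """Greedily pack word columns into blocks no wider than `width`.

    Returns one block per output line group; each block holds one
    rendered line per row type in `order`.
    """
    blocks: list[list[dict[str, str]]] = [[]]
    used = 0  # width of the current block's widest row so far
    for word in words:
        column = max(len(word.get(row, "")) for row in order)
        needed = used + 1 + column if blocks[-1] else column
        if blocks[-1] and needed > width:
            blocks.append([])  # the whole column moves to a new block
            needed = column
        blocks[-1].append(word)
        used = needed

    rendered: list[list[str]] = []
    for block in blocks:
        widths = [max(len(w.get(row, "")) for row in order) for w in block]
        rendered.append([
            " ".join(w.get(row, "").ljust(n) for w, n in zip(block, widths)).rstrip()
            for row in order
        ])
    return rendered


nenets = [
    {"txt": "Nyewºxiº", "gls": "ancient.ABS.NOM.SG"},
    {"txt": "nyenecyøyeq", "gls": "person.ABS.GEN.PL"},
    {"txt": "syoq", "gls": "song.ABS.NOM.PL"},
]
for block in wrap_interlinear(nenets, ["txt", "gls"], width=40):
    print("\n".join(block), end="\n\n")
```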


XSL Rendering

- Transforms XML documents into other formats
- Generates a variety of useful formats for
  - human consumption (e.g. HTML, PDF, JPG)
  - machine consumption (e.g. another XML format)
- Two stages:
  1. Convert the XML into a format specifying grouping, row ordering and styles
  2. Convert the XML into the formatting instructions of another language
     - conversion to XSL Formatting Objects (XSL-FO)
     - rendering into the delivery format


XSL Formatting Objects

XSL-FO is an XML application that describes how pages will look when presented to a reader

[Figure: XML + XSL → XSL-FO → OUTPUT. The XML is the abstract representation, the XSL is the stylesheet transformation, XSL-FO is the abstract presentational format, and the output is the rendered version (XML, PDF, JPG, etc.).]


XSL Implementation

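[Figure: XSL implementation. Stylesheets (XSL1, XSL2, XSL3, XSLFO, XSLPUB) take the abstract representation (XMLUR) to a surface representation (XMLSR) and an XSL-FO document (XMLFO), and then to the delivery formats PDF, HTML, RTF, SVG and JPEG. The diagram also notes "Rendered in XML".]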


<xsl:template match="phrase">
  <phrase>
    <!-- Output the aligned words first, then phrase-level items
         such as the free translation -->
    <xsl:apply-templates select="words"/>
    <xsl:apply-templates select="item"/>
  </phrase>
</xsl:template>

XSL Example - phrase
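The phrase template reorders content: the `words` children are processed before the phrase-level `item` children. The following Python sketch uses ElementTree to produce the same ordering. It is an analogue for illustration, not an XSLT processor. It assumes that the rest of the stylesheet copies `words` and `item` through unchanged; in real XSLT, what is emitted depends on the other templates. Other children of `phrase` are dropped, just as the template never selects them. `words_before_items` is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET


def words_before_items(phrase: ET.Element) -> ET.Element:
    """Return a new <phrase> with its <words> children before its <item> children.

    Assumption: the other templates copy these elements unchanged; any
    other child of <phrase> is dropped because it is never selected.
    """
    out = ET.Element("phrase")
    for child in phrase.findall("words") + phrase.findall("item"):
        out.append(copy.deepcopy(child))
    return out


src = ET.fromstring(
    '<phrase><item type="gls">Where have you come from?</item>'
    "<words><word/></words></phrase>"
)
print([child.tag for child in words_before_items(src)])  # ['words', 'item']
```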


<xsl:template match="document">
  <document>
    <interlinear-text>
      <phrases>
        <!-- Build a sorted wordlist: each word becomes its own one-word phrase -->
        <xsl:for-each select="interlinear-text/phrases/phrase/words/word">
          <xsl:sort select="."/>
          <phrase>
            <words>
              <xsl:copy-of select="."/>
            </words>
          </phrase>
        </xsl:for-each>
      </phrases>
    </interlinear-text>
  </document>
</xsl:template>

XSL Example - document
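The document template turns running text into a wordlist: every word is sorted and wrapped in a phrase of its own. The following Python sketch uses ElementTree to build the same output shape. Like `xsl:sort select="."`, it sorts on each word's full string value, that is, all of its descendant text. Python's default string ordering stands in for the XSLT processor's collation, so non-ASCII words may sort differently. `wordlist` is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET


def wordlist(document: ET.Element) -> ET.Element:
    """Build a new <document> in which every <word> is a one-word <phrase>.

    Words are sorted by their concatenated descendant text; whitespace
    text nodes count too, as they do for an unstripped XSLT string value.
    """
    words = document.findall("interlinear-text/phrases/phrase/words/word")
    words.sort(key=lambda w: "".join(w.itertext()))
    out = ET.Element("document")
    phrases = ET.SubElement(ET.SubElement(out, "interlinear-text"), "phrases")
    for word in words:
        words_el = ET.SubElement(ET.SubElement(phrases, "phrase"), "words")
        words_el.append(copy.deepcopy(word))
    return out


doc = ET.fromstring(
    "<document><interlinear-text><phrases><phrase><words>"
    '<word><item type="txt">wandam</item></word>'
    '<word><item type="txt">nundu</item></word>'
    "</words></phrase></phrases></interlinear-text></document>"
)
print(ET.tostring(wordlist(doc), encoding="unicode"))
```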


Example: Nenets interlinear (Susoi)


Example: Nenets (Susoi) structure


Example: Nenets (Susoi) wordlist


Prototype
- Underlying data
- Surface display
- Variant display
  - Simple display types
    - free translation as a separate block, or in a separate frame for synchronised scrolling and linking
  - Complex display types
    - metastructural display
    - row re-ordering
    - optional row display
    - wordlist linkage
    - concordance linkage


Implementation
- User interface
  - select the input text, display types and output format
- Parameterisation logic
  - the selections are processed by a script to determine the display type and result type
- Rendering engine
  - combines the source and option parameters to generate the appropriate output type for the browser to display
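The parameterisation step turns the user's choices into a concrete rendering job. The following is a minimal Python sketch of that step. It maps a display type and an output format to an ordered chain of stylesheets and a result (MIME) type for the browser. The slides do not list the real stylesheets, so every file name and display-type key below is hypothetical. The sketch also does not model the XSL-FO formatter that a PDF result would still need.

```python
from dataclasses import dataclass

# Hypothetical stylesheet names: the slides do not list the real files.
DISPLAY_STYLESHEETS = {
    "surface": ["surface.xsl"],
    "wordlist": ["wordlist.xsl", "surface.xsl"],
}
# Output format -> (final stylesheet, MIME type returned to the browser).
OUTPUT_FORMATS = {
    "html": ("publish-html.xsl", "text/html"),
    "pdf": ("publish-fo.xsl", "application/pdf"),
}


@dataclass
class RenderPlan:
    source: str
    stylesheets: list[str]
    mime_type: str


def plan_rendering(source: str, display_type: str, output_format: str) -> RenderPlan:
    """Resolve the user's selections into a stylesheet chain and result type."""
    if display_type not in DISPLAY_STYLESHEETS:
        raise ValueError(f"unknown display type: {display_type!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    publisher, mime_type = OUTPUT_FORMATS[output_format]
    # Display-specific transforms run first; the publishing step runs last.
    chain = DISPLAY_STYLESHEETS[display_type] + [publisher]
    return RenderPlan(source, chain, mime_type)


print(plan_rendering("nenets.xml", "wordlist", "html"))
```

Keeping the choices in lookup tables means that adding a display type or output format changes only data, not the dispatch code.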


Future Research
- Architectural extensions
  - linguistic ontologies
  - text mining and retrieval
  - compatibility with other schemata
- API for interlinear text manipulation
- Embedding interlinear functionality in application instances, e.g. AGTK


Conclusion
- Interlinear text is a pervasive data type in linguistics
  - various tools are available to create and edit it
  - but their outputs are tied to particular implementations
- Need for an open, extensible model
  - allows reuse of interlinear text in different output formats
  - XML-based structural encoding allows for manipulation and querying