Encoding and Presenting Interlinear Text Using XML Technologies
Encoding and Presenting Interlinear Text
Using XML Technologies
Baden Hughes, Steven Bird, Catherine Bow
University of Melbourne
Australasian Language Technology Workshop
December 10, 2003
Hughes, Bird, Bow (ALTW 03)
Introduction / Outline
What is interlinear text?
EMELD Interlinear Text Model
XML Representation
Interlinear Text Styles
XSL Rendering
Prototype & Implementation
Future Research
What is interlinear text?
A standard presentational form for displaying a source text aligned with a variety of linguistic annotations
  May include phonological, morphological, and syntactic analyses, glosses, translations, comments
Variations in structure, alignment, display styles, mapping, wrapping, etc.
Typical example of three-line text:
Yidinj (Dixon 1977)
Interlinear text samples (contd)
Nivkh (Comrie 1981)
[Figure labels: text, metadata, notes, free translation]
Interlinear Text Samples
\_sh v3.0 485 SE Text
\itm kalsrap.mov
\nt Story from tape 20001bx told by Kalsarap Namaf.
\aud kalsrap.mov
\as 0
\ae 13.0002
\tx Akit tumaui tae esan ipi, go
\mr akit tu- mau tae esan i - pi go
\mg 1plincS 1plincRS- all know place 3sgRS - be and
\POS pron pron- quantifier vambi n pron - v conj
\fg We all know that place, and this Litrapong…
\fgb Yumi evriwan isave ples ia. Mo Litrapong (Lisepsep) ia.
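A record in this backslash-marker style can be split into (marker, value) fields with very little code. The following is a minimal Python sketch, not the tooling used by the authors: the function name `parse_shoebox` and the continuation-line handling are assumptions made for illustration, and the column alignment between the `\tx`, `\mr` and `\mg` rows is not recovered.

```python
import re

# A field starts with a backslash marker (e.g. \tx) followed by its value.
FIELD = re.compile(r"\\(\S+)\s*(.*)")


def parse_shoebox(record):
    """Split a backslash-marked record into a list of (marker, value) pairs.

    Hypothetical helper: lines without a marker are treated as a
    continuation of the previous field.
    """
    fields = []
    for line in record.splitlines():
        line = line.strip()
        if not line:
            continue
        match = FIELD.match(line)
        if match:
            fields.append((match.group(1), match.group(2).strip()))
        elif fields:
            marker, value = fields[-1]
            fields[-1] = (marker, (value + " " + line).strip())
    return fields


# Three rows taken from the South Efate sample.
SAMPLE = r"""
\tx Akit tumaui tae esan ipi, go
\mr akit tu- mau tae esan i - pi go
\mg 1plincS 1plincRS- all know place 3sgRS - be and
"""

fields = parse_shoebox(SAMPLE)
for marker, value in fields:
    print(marker, "|", value)
```

Keeping the fields as an ordered list rather than a dictionary preserves the row order of the record, which matters once rows are re-ordered for display.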
South Efate (Namaf, 2001)
EMELD Interlinear Text Model
TEXT
  PHRASE PHRASE PHRASE
    WORD WORD WORD WORD
      M M M M M M M
(M = Morph)
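The four-level hierarchy (text, phrase, word, morph) maps naturally onto nested record types. The sketch below is one possible in-memory representation in Python, not the EMELD specification itself: the class names and the `items` dictionaries (keyed by annotation type such as `txt` or `gls`) are illustrative assumptions. The sample data is the Yidinj phrase used elsewhere in these slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Morph:
    # Annotations attached at the morph level, keyed by type (e.g. "gls").
    items: Dict[str, str] = field(default_factory=dict)


@dataclass
class Word:
    items: Dict[str, str] = field(default_factory=dict)
    morphs: List[Morph] = field(default_factory=list)


@dataclass
class Phrase:
    items: Dict[str, str] = field(default_factory=dict)
    words: List[Word] = field(default_factory=list)


@dataclass
class Text:
    items: Dict[str, str] = field(default_factory=dict)
    phrases: List[Phrase] = field(default_factory=list)


def count_morphs(text: Text) -> int:
    """Count every morph in the text by walking phrases and words."""
    return sum(len(w.morphs) for p in text.phrases for w in p.words)


story = Text(
    items={"title": "A Yidinj Story"},
    phrases=[
        Phrase(
            items={"number": "99", "gls": "Where have you come from?"},
            words=[
                Word({"txt": "nundu"}, [Morph({"gls": "you-SA"})]),
                Word({"txt": "wandam"}, [Morph({"gls": "where-ABL"})]),
            ],
        )
    ],
)

print(count_morphs(story))
```

Annotations can attach at any level, which is why every class carries its own `items` mapping.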
XML Representation
<interlinear-text>
  <item type="user-defined">
    Content at the text level, such as metadata, or an unaligned transcription of the entire text, or a pointer to an unaligned audio file
  </item>
  <phrases>
    Nested XML content to represent the phrasal constituents of the text
  </phrases>
</interlinear-text>
Each level is considered an element in an XML document
XML Representation – Yidinj text
<interlinear-text>
  <item type="title">A Yidinj Story</item>
  <phrases>
    <phrase>
      <item type="number">99</item>
      <item type="gls">Where have you come from?</item>
      <words>
        <word>
          <item type="txt">nundu</item>
          <morphs>
            <morph>
              <item type="gls">you-SA</item>
            </morph>
          </morphs>
        </word>
        <word>
          <item type="txt">wandam</item>
          <morphs>
            <morph>
              <item type="gls">where-ABL</item>
            </morph>
          </morphs>
        </word>
      </words>
    </phrase>
  </phrases>
</interlinear-text>
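Because each level is an ordinary XML element, the Yidinj encoding can be queried with any XML library. The following Python sketch uses the standard-library `xml.etree.ElementTree`. It assumes straight quotes around attribute values and that the final `phrase` tag is a closing tag. The function name `word_glosses` and the choice to join multiple morph glosses with a hyphen are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

YIDINJ_XML = """
<interlinear-text>
  <item type="title">A Yidinj Story</item>
  <phrases>
    <phrase>
      <item type="number">99</item>
      <item type="gls">Where have you come from?</item>
      <words>
        <word>
          <item type="txt">nundu</item>
          <morphs><morph><item type="gls">you-SA</item></morph></morphs>
        </word>
        <word>
          <item type="txt">wandam</item>
          <morphs><morph><item type="gls">where-ABL</item></morph></morphs>
        </word>
      </words>
    </phrase>
  </phrases>
</interlinear-text>
"""


def word_glosses(xml_source):
    """Return (word text, gloss) pairs, one per <word> element."""
    root = ET.fromstring(xml_source)
    pairs = []
    for word in root.iter("word"):
        # The word's own text item is a direct child of <word>.
        txt = word.findtext("item[@type='txt']")
        # Glosses live one level down, on the morphs.
        glosses = [m.findtext("item[@type='gls']") for m in word.iter("morph")]
        pairs.append((txt, "-".join(g for g in glosses if g)))
    return pairs


root = ET.fromstring(YIDINJ_XML)
free_translation = root.findtext("phrases/phrase/item[@type='gls']")
pairs = word_glosses(YIDINJ_XML)
print(pairs)
print(free_translation)
```

The same `type="gls"` attribute appears at both the phrase and morph levels. The element's position in the hierarchy, not the attribute alone, tells a free translation apart from a morph gloss.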
Interlinear Text Styles
Row display
Row styles
Row ordering
Grouping of content

\TEXT nyewøxi nyenæcyøje q syo q
\MNG (noun+Ø+vbs) (noun+n/j+acpl+cbs)(suf+genpl) (noun+jo+gbs) (suf+nompl)
\BASE nyewøxi nyenæcyøh syo
\MITA Traditional folk songs.
(1) Tundra Nenets (Paakkan, 1997)
Nyewºxiº nyenecyøyeq syoq.
ancient.ABS.NOM.SG person.ABS.GEN.PL song.ABS.NOM.PL
Traditional folk songs.
(2) Tundra Nenets (Susoi 1990)
Line-wrapping
Key presentation challenge of interlinear text
Complexity due to relative length of analysis & source
Implications for rendering technology

Nyewºxiº nyenecyøyeq syoq
ancient.ABS.NOM.SG person.ABS.GEN.PL song.ABS.NOM.PL
Traditional folk songs

[Figure labels: hypothetical line length, hypothetical line wrapping, correct line wrapping]
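The difference between naive and correct wrapping can be made concrete: a correct wrap breaks between aligned columns, so that a source word and its gloss always move to the next block together. The Python sketch below is one simple greedy way to do this for fixed-width output. It is not the rendering approach of the prototype, and the function name `wrap_interlinear`, the `gap` parameter and the width of 40 are assumptions for illustration.

```python
def wrap_interlinear(rows, width, gap=2):
    """Wrap aligned rows at column boundaries.

    rows  -- list of rows, each a list of cells; cell i of every row
             belongs to the same aligned column
    width -- maximum line length in characters
    gap   -- spaces between columns

    Returns a list of blocks; each block is a list of lines, one per row.
    A single column wider than `width` gets a block of its own and
    overflows, since it cannot be split without breaking alignment.
    """
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same number of columns")

    # Each column is as wide as its longest cell (source or analysis).
    col_widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]

    # Greedily pack whole columns into blocks.
    blocks, current, used = [], [], 0
    for i, w in enumerate(col_widths):
        needed = w if not current else used + gap + w
        if current and needed > width:
            blocks.append(current)
            current, needed = [], w
        current.append(i)
        used = needed
    if current:
        blocks.append(current)

    sep = " " * gap
    return [
        [sep.join(row[i].ljust(col_widths[i]) for i in block).rstrip()
         for row in rows]
        for block in blocks
    ]


source = ["Nyewºxiº", "nyenecyøyeq", "syoq"]
gloss = ["ancient.ABS.NOM.SG", "person.ABS.GEN.PL", "song.ABS.NOM.PL"]

blocks = wrap_interlinear([source, gloss], width=40)
for block in blocks:
    print("\n".join(block))
    print()
```

With a width of 40, the first two columns fit on one block and the third moves to a second block together with its gloss, rather than the gloss line being wrapped independently of the source line.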
XSL Rendering
Transforms XML docs into other formats
Generate a variety of useful formats for
  human consumption (e.g. html, pdf, jpg)
  machine consumption (e.g. to another XML format)
Two stages:
  Convert XML to format specifying grouping, row ordering, styles
  Convert XML into formatting instructions of another language
    Conversion to XSL Formatting Objects (XSL-FO)
    Rendering into delivery format
XSL Formatting Objects
XSL-FO is an XML application that describes how pages will look when presented to a reader
XML + XSL -> XSL-FO -> OUTPUT
  XML: abstract representation
  XSL: stylesheet transformation
  XSL-FO: abstract presentational format
  OUTPUT: rendered version (XML, PDF, JPG, etc.)
XSL Implementation
[Diagram labels: XMLUR (abstract representation); XSL1, XSL2, XSL3; XMLSR (surface representation); XSLFO; XMLFO; XSLPUB; delivery: HTML, RTF, SVG, JPEG; rendered in XML]
XSL Example - phrase
<xsl:template match="phrase">
  <phrase>
    <!-- Emit the words first, then the phrase-level items -->
    <xsl:apply-templates select="words"/>
    <xsl:apply-templates select="item"/>
  </phrase>
</xsl:template>
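The effect of this template is a row re-ordering: within each phrase, the `words` child is emitted before the phrase-level `item` children. For readers less familiar with XSLT, the same re-ordering can be sketched in Python with the standard library. This is only an analogue, not the stylesheet's exact behaviour: the helper name `reorder_phrase` is hypothetical, and the sketch copies the children verbatim, whereas in XSLT the output of `apply-templates` depends on whatever other templates are in play.

```python
import copy
import xml.etree.ElementTree as ET


def reorder_phrase(phrase):
    """Return a new <phrase> with <words> children before <item> children."""
    out = ET.Element("phrase")
    # Deep copies keep the input tree untouched.
    out.extend([copy.deepcopy(child) for child in phrase.findall("words")])
    out.extend([copy.deepcopy(child) for child in phrase.findall("item")])
    return out


phrase = ET.fromstring(
    '<phrase>'
    '<item type="number">99</item>'
    '<item type="gls">Where have you come from?</item>'
    '<words><word><item type="txt">nundu</item></word></words>'
    '</phrase>'
)

reordered = reorder_phrase(phrase)
print([child.tag for child in reordered])
```

In the input the free translation precedes the words. After the transformation it follows them, which is the usual layout of gloss-then-translation displays.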
XSL Example - document
<xsl:template match="document">
  <document>
    <interlinear-text>
      <phrases>
        <!-- Visit every word in the text, sorted by its string value -->
        <xsl:for-each select="interlinear-text/phrases/phrase/words/word">
          <xsl:sort select="."/>
          <!-- Wrap each word in its own phrase so the result is still valid interlinear text -->
          <phrase>
            <words>
              <xsl:copy-of select="."/>
            </words>
          </phrase>
        </xsl:for-each>
      </phrases>
    </interlinear-text>
  </document>
</xsl:template>
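This template builds a wordlist: it collects every word in the document, sorts the words by their string value, and wraps each one in its own single-word phrase so that the output is still interlinear text. A rough Python analogue using the standard library is sketched below. The function name `wordlist` is hypothetical. The sketch sorts by Unicode code point after normalising whitespace, whereas the ordering used by `xsl:sort` depends on the XSLT processor and language settings.

```python
import copy
import xml.etree.ElementTree as ET


def sort_key(word):
    """String value of a <word>: all descendant text, whitespace-normalised."""
    return " ".join("".join(word.itertext()).split())


def wordlist(document):
    """Build a new <document> holding one single-word phrase per word, sorted."""
    words = document.findall("interlinear-text/phrases/phrase/words/word")
    words.sort(key=sort_key)

    out = ET.Element("document")
    phrases = ET.SubElement(ET.SubElement(out, "interlinear-text"), "phrases")
    for word in words:
        wrapper = ET.SubElement(ET.SubElement(phrases, "phrase"), "words")
        wrapper.append(copy.deepcopy(word))
    return out


doc = ET.fromstring(
    "<document><interlinear-text><phrases><phrase><words>"
    '<word><item type="txt">wandam</item></word>'
    '<word><item type="txt">nundu</item></word>'
    "</words></phrase></phrases></interlinear-text></document>"
)

result = wordlist(doc)
sorted_words = [w.findtext("item") for w in result.iter("word")]
print(sorted_words)
```

Because the output reuses the same element names as the input, it can be passed through the same display stylesheets as any other interlinear text.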
Example: Nenets interlinear (Susoi)
Example: Nenets (Susoi) structure
Example: Nenets (Susoi) wordlist
Prototype
Underlying Data, Surface Display, Variant Display
Simple display types
  Free translation as separate block or separate frame for synchronised scrolling and linking
Complex display types
  Metastructural display
  Row re-ordering
  Optional row display
  Wordlist linkage
  Concordance linkage
Implementation
User Interface
  Select input text, display types, output format
Parameterisation Logic
  Processed by script to determine display type and result type
Rendering Engine
  Combines source and option parameters to generate appropriate output type for browser to display
Future Research
Architectural Extensions
  Linguistic ontologies
  Text mining and retrieval
  Compatibility with other schemata
API for interlinear text manipulation
Embedding interlinear functionality in application instances, e.g. AGTK
Conclusion
Interlinear text as a pervasive data type in linguistics
  Various tools available to create and edit
  Outputs tied to particular implementations
Need for open extensible model
  Allows reuse of interlinear text in different output formats
  XML-based structural encoding allows for manipulation and querying