osis linguistic annotation

osis linguistic annotation definitions and requirements kirk e. lowery westminster hebrew institute sbl computer-assisted research group


Transcript of osis linguistic annotation

Page 1: osis linguistic annotation

osis linguistic annotation

definitions and requirements

kirk e. lowery

westminster hebrew institute

sbl computer-assisted research group

Page 2: osis linguistic annotation

the context

why osis linguistic annotation?

Page 3: osis linguistic annotation

the goal of osis

- to exchange electronic bibles
  - any language, medium, presentation style
- to add “meta-information” to those texts
  - keywords: “link”, “hierarchy”, “pyramid”
- to easily transform these texts
  - the target transformation is unknown
- to cut costs: production, presentation, distribution of bibles plus “meta-data”
  - time, money, people

Page 4: osis linguistic annotation

why exchange bible texts?

- coordination within organizations
- cooperation between organizations and between individuals
- publish in multiple formats and media from one “canonical” source
- long-term archival
- the changing definition of “publish”
- documents have a life cycle!

Page 5: osis linguistic annotation

who wants to exchange texts?

- bible publishers
  - commercial publishing houses
  - denominations & bible societies
- bible translators
  - translation teams & editors
  - consultants & supervisors
- bible scholars
  - original languages, text criticism
  - text analysis and commentary

Page 6: osis linguistic annotation

text “meta-data”

what information needs to be captured?

Page 7: osis linguistic annotation

translators: managing the translation process

- document versions & responsibility
- comments & corrections by editors
- handling presentation issues
  - script direction
  - “rubies”
- linking source, relay & target translations
- linking supplementary information
  - notes, glossaries, maps

Page 8: osis linguistic annotation

translators & scholars: focus on the text

- manuscript collation & description
- text criticism: establishment of the original
- linguistic analysis
  - text segmentation
  - segment id: from phoneme to text structures
  - linguistic mapping of source & target
- alignment: parallel & synoptic texts

Page 9: osis linguistic annotation

linguistic annotation

how can we capture the information?

Page 10: osis linguistic annotation

required

- a way to segment the text
- a mechanism for associating labels with an arbitrary text-span
- a means to declare labels used in analysis
  - a common linguistic vocabulary
  - language-specific grammar terms
- a protocol for user redefinition

Page 11: osis linguistic annotation

segmenting text

<seg id="gn1:1,1.1">B.:</seg>
<seg id="gn1:1,1.2">R")$IYT</seg>
<seg id="gn1:1,2.1">B.FRF)</seg>
<seg id="gn1:1,3.1">):ELOHIYM </seg>
<seg id="gn1:1,4.1">)"T</seg>
<seg id="gn1:1,5.1">HA</seg>
<seg id="gn1:1,5.2">$.FMAYIM</seg>
<seg id="gn1:1,6.1">W:</seg>
<seg id="gn1:1,6.2">)"T</seg>
<seg id="gn1:1,7.1">HF</seg>
<seg id="gn1:1,7.2">)FREC</seg>

callouts: start tag, unique identification, hebrew text, end tag
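The segmented text above can be read back with a few lines of Python. This is a minimal sketch using only the standard library. The wrapping `<verse>` root element, the helper names, and the reading of the id as book, chapter, verse, word and morpheme are assumptions made for illustration; the slide does not spell out the id scheme.

```python
import re
import xml.etree.ElementTree as ET

# First three segments from the slide, wrapped in a hypothetical <verse>
# root so that the fragment parses as one XML document.
XML = '''<verse>
<seg id="gn1:1,1.1">B.:</seg>
<seg id="gn1:1,1.2">R")$IYT</seg>
<seg id="gn1:1,2.1">B.FRF)</seg>
</verse>'''

# Assumed id layout: book, chapter ":" verse "," word "." morpheme.
ID_PATTERN = re.compile(r"([a-z]+)(\d+):(\d+),(\d+)\.(\d+)")


def parse_seg_id(seg_id):
    """Split a segment id into its (assumed) components."""
    match = ID_PATTERN.fullmatch(seg_id)
    if match is None:
        raise ValueError(f"unrecognised segment id: {seg_id!r}")
    book, chapter, verse, word, morpheme = match.groups()
    return {
        "book": book,
        "chapter": int(chapter),
        "verse": int(verse),
        "word": int(word),
        "morpheme": int(morpheme),
    }


def read_segments(xml_text):
    """Return (id, text) pairs for every <seg> in document order."""
    root = ET.fromstring(xml_text)
    return [(seg.get("id"), (seg.text or "").strip()) for seg in root.iter("seg")]


segments = read_segments(XML)
```

Keeping the id parser separate from the reader means the id scheme can change without touching the code that walks the text.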

Page 12: osis linguistic annotation

adding annotation (1)

<seg id="gn1:1,1.1">B.:
  <lemma>B.</lemma>
  <particle type="preposition" />
</seg>
<seg id="gn1:1,1.2">R")$IYT
  <lemma>R")$IYT</lemma>
  <noun type="common" features="fsa" />
</seg>
<seg id="gn1:1,2.1">B.FRF)
  <lemma homonym="1">B.R)</lemma>
  <verb stem="q" conjugation="p" pgn="3ms" />
</seg>
<seg id="gn1:1,3.1">):ELOHIYM
  <lemma>):ELOHIYM</lemma>
  <noun type="common" features="mpa" />
</seg>
<seg id="gn1:1,4.1">)"T
  <lemma homonym="1">)"T</lemma>
  <particle type="object_marker" />
</seg>

callouts: content tag (<lemma>), “milestone” tag (the empty element, e.g. <particle type="preposition" />)
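A reader for this inline annotation might look like the following minimal Python sketch. It assumes a hypothetical `<verse>` wrapper around the segments and treats any child of `<seg>` other than `<lemma>` as the empty “milestone” tag whose element name is the part of speech. Both are illustrative assumptions, not part of the slide.

```python
import xml.etree.ElementTree as ET

# Two annotated segments from the slide inside a hypothetical <verse> root.
XML = '''<verse>
<seg id="gn1:1,1.1">B.:
  <lemma>B.</lemma>
  <particle type="preposition" />
</seg>
<seg id="gn1:1,2.1">B.FRF)
  <lemma homonym="1">B.R)</lemma>
  <verb stem="q" conjugation="p" pgn="3ms" />
</seg>
</verse>'''


def read_annotations(xml_text):
    """Collect surface text, lemma and grammatical tag for each segment."""
    root = ET.fromstring(xml_text)
    result = []
    for seg in root.iter("seg"):
        lemma = seg.find("lemma")
        # Assumption: every non-<lemma> child is an empty grammatical tag.
        tags = [child for child in seg if child.tag != "lemma"]
        result.append({
            "id": seg.get("id"),
            # seg.text is the text before the first child element.
            "text": (seg.text or "").strip(),
            "lemma": lemma.text if lemma is not None else None,
            "homonym": lemma.get("homonym") if lemma is not None else None,
            "pos": tags[0].tag if tags else None,
            "features": dict(tags[0].attrib) if tags else {},
        })
    return result


annotations = read_annotations(XML)
```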

Page 13: osis linguistic annotation

adding annotation (2)

<seg id="gn1:1,5.1">HA
  <lemma>H</lemma>
  <particle type="article" />
</seg>
<seg id="gn1:1,5.2">$.FMAYIM
  <lemma>$FMAYIM</lemma>
  <noun type="common" features="mpa" />
</seg>
<seg id="gn1:1,6.1">W:
  <lemma>W</lemma>
  <particle type="conjunction" />
</seg>
<seg id="gn1:1,6.2">)"T
  <lemma homonym="1">)"T</lemma>
  <particle type="object_marker" />
</seg>
<seg id="gn1:1,7.1">HF
  <lemma>H</lemma>
  <particle type="article" />
</seg>
<seg id="gn1:1,7.2">)FREC
  <lemma>)EREC</lemma>
  <noun type="common" features="fsa" />
</seg>

callouts: content tag (<lemma>), “milestone” tag (the empty element, e.g. <particle type="article" />)

Page 14: osis linguistic annotation

the hard part: linguistic labels

- must be standard
- must be applicable to any conceivable language
  - labels are the “linguistic inventory”
- must be compatible with current and future linguistic theories
  - labels must be linguistic theory-neutral
- must be redefinable by the user

Page 15: osis linguistic annotation

standard solutions: labels

- expert advisory group on language engineering standards (eagles) <http://www.ilc.pi.cnr.it/EAGLES/home.html>
  - an initiative of the european commission (1993)
  - standard grammar labels of morphology and syntax for european languages
- create osis standard labels for hebrew, aramaic and greek

Page 16: osis linguistic annotation

standard solutions: mechanism

- the text encoding initiative (tei) guidelines
  - chapter 14: linking, segmentation, & alignment
  - chapter 16: feature structures
  - chapter 26: feature system declaration
- “stand-off” markup (xlink) or “up-close-and-personal” (inline)?
  - separate meta-data about the text from the text itself?
  - “either-or” or “both-and”?
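The difference between the two options can be sketched in a few lines of Python. Here plain dictionaries stand in for the markup: the stand-off layer lives apart from the text and points at segments by id, and a merge step produces the inline view on demand. The function names and the dictionary layout are assumptions for illustration and say nothing about how XLink itself would be used.

```python
# Base text: segment id -> transliterated text (from the segmenting slide).
text = {
    "gn1:1,1.1": "B.:",
    "gn1:1,1.2": 'R")$IYT',
}

# Stand-off layer: kept separately, pointing at segments by id.
standoff = {
    "gn1:1,1.1": {"lemma": "B.", "pos": "particle", "type": "preposition"},
    "gn1:1,1.2": {"lemma": 'R")$IYT', "pos": "noun", "type": "common", "features": "fsa"},
}


def to_inline(text, annotations):
    """Merge a stand-off layer into the text, one record per segment."""
    return [
        {"id": seg_id, "text": surface, **annotations.get(seg_id, {})}
        for seg_id, surface in text.items()
    ]


def dangling_targets(text, annotations):
    """Ids that the stand-off layer refers to but the text does not contain."""
    return sorted(set(annotations) - set(text))


inline = to_inline(text, standoff)
```

The `dangling_targets` check shows the cost of stand-off markup: the links can go stale when the text changes, which inline markup avoids at the price of mixing analysis into the text.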

Page 17: osis linguistic annotation

formal requirements

what we must do, exactly

Page 18: osis linguistic annotation

labels

- claims made about the data itself vs claims about the claims that can be made!
  - the linguistic model vs the analysis allowed by the model
  - example: does Hebrew have “adverbs”?
- a library of labels as comprehensive as possible
  - definitions to clarify what “thing” is being labeled
  - labels are names for grammatical objects

Page 19: osis linguistic annotation

labels as objects

- grammatical “objects” have “attributes” or “features”
- features can vary over a range of “values”
- objects & features have defaults that could be changed
- objects & features could be easily extended
- objects & features can be arranged linearly or hierarchically
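These properties map naturally onto a small object model. The Python sketch below is one possible reading, not a proposed OSIS design: the class names, the chosen defaults, and the expansion of the state values to “absolute” and “construct” are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Feature:
    """A feature that varies over a fixed range of values."""
    name: str
    values: Tuple[str, ...]
    default: Optional[str] = None

    def check(self, value):
        if value not in self.values:
            raise ValueError(f"{value!r} is not a value of {self.name}")
        return value


@dataclass
class GrammaticalObject:
    """A label (e.g. "noun") together with the features it carries."""
    label: str
    features: Dict[str, Feature] = field(default_factory=dict)

    def extend(self, feature):
        """Add (or redefine) a feature on this object."""
        self.features[feature.name] = feature

    def describe(self, **chosen):
        """Fill in defaults for every feature not given explicitly."""
        unknown = set(chosen) - set(self.features)
        if unknown:
            raise KeyError(f"unknown features: {sorted(unknown)}")
        result = {}
        for name, feature in self.features.items():
            value = chosen.get(name, feature.default)
            result[name] = feature.check(value) if value is not None else None
        return result


# Illustrative defaults only; a real declaration would come from the user.
noun = GrammaticalObject("noun", {
    "gender": Feature("gender", ("masculine", "feminine", "neuter"), default="masculine"),
    "number": Feature("number", ("singular", "plural", "dual"), default="singular"),
})
noun.extend(Feature("state", ("absolute", "construct"), default="absolute"))
```

For example, `noun.describe(number="plural")` fills in the default gender and state, while an unknown value such as `number="trial"` is rejected.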

Page 20: osis linguistic annotation

mechanism

- user language declaration
  - all labels and their relationships
  - done by “exclusion”, not inclusion
  - sensitive to linguistic theory
- levels of language: resolution of ambiguity
  - lexical, semantic, phonemic, morphologic, phrase-, clause-, discourse-, theological levels
  - “context-free” and “context-bound” analysis
  - part-of-speech resolution

Page 21: osis linguistic annotation

tei feature structures

- the feature element
  - the most basic markup
  - requires a label and any number of values
  - <f name="feature name" value="feature value">
- the feature structure element
  - <fs type="feature structure name">
  - may contain any number of nested <f> and <fs>
  - models some grammatical object
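The nesting described here can be sketched by building a feature structure programmatically. The Python below is a minimal illustration with the standard library; the attribute names follow the examples on the later slides (`name` on `<f>`, `type` on `<fs>`), while the `make_fs` helper, the use of `<sym>` for atomic values, and the sample feature values are assumptions.

```python
import xml.etree.ElementTree as ET


def make_fs(fs_type, features):
    """Build an <fs> element; nested dictionaries become nested <fs>."""
    fs = ET.Element("fs", {"type": fs_type})
    for name, value in features.items():
        f = ET.SubElement(fs, "f", {"name": name})
        if isinstance(value, dict):
            # A structured value: nest another feature structure.
            f.append(make_fs(name, value))
        else:
            # An atomic value: record it as a symbol.
            ET.SubElement(f, "sym", {"value": value})
    return fs


# Hypothetical description of a verb form, for illustration only.
verb = make_fs("verb", {
    "stem": "qal",
    "conjugation": "perfect",
    "agreement": {"person": "3", "gender": "masculine", "number": "singular"},
})

xml_text = ET.tostring(verb, encoding="unicode")
```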

Page 22: osis linguistic annotation

tei feature example

<f name="conjugation">
  <vAlt mutExcl="Y">
    <sym id="pf" value="perfect" />
    <sym id="impf" value="imperfect" />
    <sym id="qppt" value="qal_passive_participle" />
    <sym id="wc" value="wayyiqtol" />
    <sym id="impv" value="imperative" />
    <sym id="inf" value="infinitive" />
    <sym id="pt" value="participle" />
  </vAlt>
</f>
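A declaration like this can drive validation of the codes used in annotation. The Python sketch below loads the alternatives and expands a code into its value. It reads `mutExcl="Y"` as “exactly one alternative may be chosen”, which is an interpretation made for this example, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = '''<f name="conjugation">
  <vAlt mutExcl="Y">
    <sym id="pf" value="perfect" />
    <sym id="impf" value="imperfect" />
    <sym id="qppt" value="qal_passive_participle" />
    <sym id="wc" value="wayyiqtol" />
    <sym id="impv" value="imperative" />
    <sym id="inf" value="infinitive" />
    <sym id="pt" value="participle" />
  </vAlt>
</f>'''


def load_alternatives(xml_text):
    """Return (feature name, {code: value}, mutually_exclusive)."""
    f = ET.fromstring(xml_text)
    v_alt = f.find("vAlt")
    codes = {sym.get("id"): sym.get("value") for sym in v_alt.iter("sym")}
    return f.get("name"), codes, v_alt.get("mutExcl") == "Y"


def expand(codes, exclusive, chosen):
    """Expand whitespace-separated codes, enforcing exclusivity."""
    ids = chosen.split()
    if exclusive and len(ids) != 1:
        raise ValueError(f"expected exactly one code, got {ids}")
    # An undeclared code raises KeyError.
    return [codes[i] for i in ids]


name, codes, exclusive = load_alternatives(XML)
```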

Page 23: osis linguistic annotation

tei feature structure example

<fs type="common noun features">
  <f name="gender" org="set" fVal="gm gf gn" />
  <f name="number" org="set" fVal="ns np nd" />
  <f name="state" org="set" fVal="sa sc" />
</fs>

Page 24: osis linguistic annotation

tei feature library example

<fvLib id="g" type="gender feature values">
  <vAlt mutExcl="N">
    <sym id="gm" value="masculine"/>
    <sym id="gf" value="feminine" />
    <sym id="gn" value="neuter" />
  </vAlt>
</fvLib>
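The library is what gives the short codes in a feature's `fVal` attribute their meaning. As a minimal Python sketch, the ids listed on the gender feature of the previous slide can be resolved against this library as follows. The function names are hypothetical, and treating `fVal` as a whitespace-separated list of `<sym>` ids is the reading assumed here.

```python
import xml.etree.ElementTree as ET

LIB = '''<fvLib id="g" type="gender feature values">
  <vAlt mutExcl="N">
    <sym id="gm" value="masculine"/>
    <sym id="gf" value="feminine" />
    <sym id="gn" value="neuter" />
  </vAlt>
</fvLib>'''

# The gender feature from the feature structure example.
FEATURE = '<f name="gender" org="set" fVal="gm gf gn" />'


def library_index(lib_xml):
    """Map each <sym> id in the library to its value."""
    lib = ET.fromstring(lib_xml)
    return {sym.get("id"): sym.get("value") for sym in lib.iter("sym")}


def resolve(feature_xml, index):
    """Replace the ids in fVal with the values they point to."""
    f = ET.fromstring(feature_xml)
    ids = f.get("fVal", "").split()
    missing = [i for i in ids if i not in index]
    if missing:
        raise KeyError(f"ids not declared in the library: {missing}")
    return f.get("name"), [index[i] for i in ids]


index = library_index(LIB)
resolved = resolve(FEATURE, index)
```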

Page 25: osis linguistic annotation

a different approach

<div type="x-tag" osisID="A_APFC" divTitle="A APFC">
  <p>Part of speech: adjective</p>
  <p>Case: accusative</p>
  <p>Number: plural</p>
  <p>Gender: feminine</p>
  <p>Degree: comparative</p>
</div>

source: Dictionary of Packard-Style Greek Morphology Codes
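In this approach the features are plain prose inside `<p>` elements, so software has to recover them by splitting each paragraph at the colon. A minimal Python sketch, with a hypothetical `read_tag_entry` helper:

```python
import xml.etree.ElementTree as ET

XML = '''<div type="x-tag" osisID="A_APFC" divTitle="A APFC">
  <p>Part of speech: adjective</p>
  <p>Case: accusative</p>
  <p>Number: plural</p>
  <p>Gender: feminine</p>
  <p>Degree: comparative</p>
</div>'''


def read_tag_entry(xml_text):
    """Return the morphology code and its features as a dictionary."""
    div = ET.fromstring(xml_text)
    fields = {}
    for p in div.iter("p"):
        # Each paragraph has the form "Name: value".
        key, _, value = (p.text or "").partition(":")
        fields[key.strip().lower()] = value.strip()
    return div.get("osisID"), fields


code, fields = read_tag_entry(XML)
```

Compared with feature structures, nothing here declares which feature names or values are legal, so that knowledge has to live in the consuming program.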

Page 26: osis linguistic annotation

what can we do with feature structure marked up text?

- self-organizing topic maps
- compare linguistic hypotheses with actual usage
- XSLT transforms
- automated tagging of new features
- comparative linguistic study
- source↔target language grammar mapping

Page 27: osis linguistic annotation

conclusions

where do we go from here?

Page 28: osis linguistic annotation

in the short-term

- complete a first pass of language modeling
- mark up real biblical text with annotation
- distribute to translators and scholars for feedback
  - does this meet your needs?
  - is it practical enough that you will use it?
  - is it flexible enough for your language(s) and linguistic theories?

Page 29: osis linguistic annotation

in the long-term

determine if tei feature structures are sufficient

decide whether to require “inline” or “standoff” markup, or to allow either

determine the best way of integrating linguistic markup with the osis core tag set

explore ideas for authoring software or, at least, linguistic annotation utility programs