Hypermedia Lexica and Lexicon Metadata
The MetaLex model in the ModeLex project
Dafydd Gibbon, U Bielefeld, Europe
E-MELD Workshop, Detroit, August 2002
Overview
Metalex goals
  Background: DATR, Hyprlex, Speech, Language Documentation
Metalex design: theory and practice
  Lexical documents & metadocuments
  Lexical objects, properties, structures
Metalex implementation
  Ivory Coast encyclopaedia project
  Ega documentation model project
  The Modelex (multimodal lexicon) project
  Ivory Coast + Nigeria documentation curriculum project
Extending metalex
  Modalities & submodalities
  Data-driven lexicography
  Data structures & algorithms: trees, lattices; induction, inference
General objectives:
  Versatile high quality spoken language lexicography
  Motivated balance of high-tech + low tech
  Good resources are data-driven and theory-informed
Specific project objectives:
  DATR/ILEX: formal lexicon theory and implementation
  VerbMobil: integrated HyprLex dissemination model
  HyprLex encyclopaedia model for Ivory Coast Languages
  Ega endangered language documentation model
  Modelex - theory and design of multimodal lexica
  Ivory Coast and Nigeria curricula for language documentation
Metalex goals: background
Data-driven data + metadata acquisition:
  Systematic metatext derived from and supporting ...
    Computational fieldwork
    Induction of lexica
Theory-informed data + metadata acquisition:
  Integrated Lexicon (ILEX) consisting of ...
    Abstract Lexicon (ALEX) - "theory" in the mathematical sense
    Object Lexicon (OLEX) - "model" in the mathematical sense
Metalex design: data and theory
Data-driven acquisition:
  Computational fieldwork
    Portable metadatabase with restricted vocabulary and general metatext, and
      Definition of and support for transcription + annotation
      Portable support for scenarios, scripts
      Portable support for lexicon processing
  Induction of lexica
    Lexicon tools for
      Extraction of macrostructural elements (lexeme elements)
      Induction of microstructural information (media concordance, POS, ...)
      Induction of mesostructural regularities and subregularities (grammar, ...)
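The first two lexicon-tool steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy corpus, the file name and the helper names `extract_lexemes` and `build_concordance` are hypothetical and are not taken from the Metalex tools.

```python
from collections import defaultdict

# Hypothetical time-stamped corpus: (file, start_s, end_s, token).
# All values are invented for illustration.
CORPUS = [
    ("rec01.wav", 0.00, 0.42, "anouman"),
    ("rec01.wav", 0.42, 0.80, "ba"),
    ("rec01.wav", 0.80, 1.25, "anouman"),
]

def extract_lexemes(corpus):
    """Macrostructure step: the sorted set of distinct tokens (candidate lexemes)."""
    return sorted({token for _, _, _, token in corpus})

def build_concordance(corpus):
    """Microstructure step: map each token to the signal intervals where it occurs."""
    concordance = defaultdict(list)
    for filename, start, end, token in corpus:
        concordance[token].append((filename, start, end))
    return dict(concordance)

lexemes = extract_lexemes(CORPUS)
concordance = build_concordance(CORPUS)
print(lexemes)
print(concordance["anouman"])
```

Each concordance item points back into the recording, which is one plausible reading of "media concordance"; induction of POS or grammar regularities would need further steps not shown here.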
Metalex design: data
Theory-informed formalisation:
  Abstract Lexicon (ALEX) - "theory" in the mathematical sense
    Decomposition (componential A-V description)
    Generalisation (inheritance)
    Composition (multilinear operations)
  Object Lexicon (OLEX) - "model" in the mathematical sense
    XML archiving and dissemination formats
    object-relational database acquisition and processing formats
  = Integrated Lexicon (ILEX)
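As a rough illustration of the ALEX/OLEX relation, the sketch below treats the abstract lexicon as attribute-value nodes with default inheritance and compiles it into a fully specified table. The node names, attribute values and the function name `interpret` are assumptions made for this example; the actual ILEX work uses DATR rather than Python.

```python
# Abstract lexicon (ALEX): each node has a parent and locally stated
# attribute-value pairs. Node names and values are invented for illustration.
ALEX = {
    "Noun":      {"parent": None,   "attrs": {"pos": "noun", "plural": "regular"}},
    "Anouman_1": {"parent": "Noun", "attrs": {"gloss": "bird"}},
    "Irregular": {"parent": "Noun", "attrs": {"plural": "suppletive"}},
}

def interpret(alex):
    """Compile ALEX into a fully specified object lexicon (OLEX)."""
    def resolve(name):
        node = alex[name]
        inherited = resolve(node["parent"]) if node["parent"] else {}
        # Locally stated values override inherited defaults.
        return {**inherited, **node["attrs"]}
    return {name: resolve(name) for name in alex}

OLEX = interpret(ALEX)
print(OLEX["Anouman_1"])
```

The resulting OLEX is a flat attribute-value table per entry, the kind of structure that can be archived in XML or loaded into a database.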
Metalex design: theory
Data model
Theory = shared lexicon architecture:
  Macrostructure: declarative and procedural components
    Lexicon architecture: relational, inheritance, text, ...
    Lexical objects: entry types
    Lexical access: fact query, semasiological / onomasiological indexing
  Mesostructure:
    Generalisations: grammar, phonetics, cultural background, ...
    Composition of lexicon object types: idioms, words, morphemes, ...
    Lexical access: inferential query
  Microstructure:
    Lexical entry (article, lemma structure - atom, string, tree, ...)
    Types of lexical information - standardly: "lexicon model"
Metalex implementation: architecture
Microstructure specification philosophy:
  Anybody can specify any kind of unpredictable detail
    Questionnaire / Experiment / Corpus / Archive dependence
    Lexicon architecture: relational, inheritance, text, ...
    Intelligent (semi-)automatic classification, not fixed attributes
  Theory-informed coarse grouping is possible
    Media attributes: visual, auditory, tactile, ...
    Meaning attributes: definition, gloss, lexical relations, ...
    Composition attributes: context/category, parts, operations
    Use attributes: style, register, concordance, media illustrations, ...
    Micrometadata attributes: lexicographer DB indices, source (e.g. fieldwork metadata) DB indices, modification, ...
Metalex implementation: microstructure
Metalex implementation: fieldwork metadata source (1)
Situation dimensions
  participant: fieldworker, partners, contacts
  channel: modalities, media
  locale: indoor/outdoor, spatial configuration
  temporal: date, time, calendar event
  functional: affiliation, role, occasion; observation (prompt, metadata management)
Language dimension
  affiliation
  discourse level: discourse type, genre + prosody
  phrase level: recursive phrasal categories/relations + prosody
  word level: clitics, inflexion, word formation + prosody
Metalex implementation: fieldwork metadata source (2)
Technical dimension
  physical characteristics of participants: age, sex, health
  physical characteristics of locale: indoor/outdoor, spatial configuration, temporal sequence, date (season), time (of day)
  audio: mike type, position, room; A/D; channels, f_sample, resolution; formats
  video: camera & microphone type, analogue/digital; filters, lenses; audio; formats
  other sensors: laryngograph, airflow, data glove, ...
Metalinguistic dimension
  empirical method: introspection, experiment, corpus elicitation
  materials: questionnaire, experiment layout, corpus scenario
  metadata specification: index, metatext type, metacatalogue type
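A fieldwork metadata record with a restricted vocabulary plus free metatext might look like the following Python sketch. The class name `FieldworkRecord`, the choice of which fields are controlled, and the sample values are assumptions for illustration; the value sets themselves are copied from the dimension lists above.

```python
from dataclasses import dataclass

# Restricted vocabularies. The values come from the dimension lists above;
# which fields are vocabulary-controlled is an assumption of this sketch.
LOCALE = {"indoor", "outdoor"}
METHOD = {"introspection", "experiment", "corpus elicitation"}

@dataclass
class FieldworkRecord:
    fieldworker: str
    date: str            # e.g. "2002-03"
    locale: str          # situation dimension, restricted
    method: str          # metalinguistic dimension, restricted
    metatext: str = ""   # free-text note ("general metatext")

    def __post_init__(self):
        # Reject values outside the restricted vocabularies.
        if self.locale not in LOCALE:
            raise ValueError(f"locale must be one of {sorted(LOCALE)}")
        if self.method not in METHOD:
            raise ValueError(f"method must be one of {sorted(METHOD)}")

# Hypothetical record; the fieldworker name is a placeholder.
record = FieldworkRecord("fieldworker-01", "2002-03", "outdoor",
                         "corpus elicitation", metatext="village recording")
print(record)
```

Validating at entry time keeps the metadatabase consistent across devices, while the `metatext` field leaves room for unpredictable detail.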
Metalex implementation: fieldwork metadata entry tool
LREC 2002, Workshop on Portability Issues
Metalex implementation: fieldwork metadata entry tool
HanDBase DBMS for PalmOS
Metalex objects
in conjunction with work in ISLE CLWG (Computational Lexicon Working Group)
(see Gibbon in reading list)
LEXICON: { < Macrostructure > , < Mesostructure > }
  Macrostructure: Ordering( {ENTRY, ...} )
  Mesostructure: < FrontmatterMetadata, Descriptions >
ENTRY: < Microstructure, HousekeepingMetadata >
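The object definitions above map fairly directly onto record types. The following Python sketch is one possible rendering, not the ISLE CLWG specification: the class and field names, the `order_key` attribute and the sample front matter value are assumptions, while the two headwords and glosses are taken from the Anyi example later in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Entry:
    """ENTRY: < Microstructure, HousekeepingMetadata >"""
    microstructure: Dict[str, str]                      # data category -> value
    housekeeping: Dict[str, str] = field(default_factory=dict)

@dataclass
class Mesostructure:
    """Mesostructure: < FrontmatterMetadata, Descriptions >"""
    frontmatter: Dict[str, str]
    descriptions: List[str] = field(default_factory=list)

@dataclass
class Lexicon:
    """LEXICON: { < Macrostructure > , < Mesostructure > }"""
    entries: List[Entry]
    mesostructure: Mesostructure
    order_key: str = "headword"                         # assumed ordering attribute

    def macrostructure(self) -> List[Entry]:
        """Ordering( {ENTRY, ...} ): entries sorted by the chosen attribute."""
        return sorted(self.entries, key=lambda e: e.microstructure[self.order_key])

lexicon = Lexicon(
    entries=[
        Entry({"headword": "Anouman_2", "E-gloss": "grandchild"}),
        Entry({"headword": "Anouman_1", "E-gloss": "bird"},
              housekeeping={"lexicographer": "S. Adouakou"}),
    ],
    mesostructure=Mesostructure(frontmatter={"medium": "web"}),
)
ordered = [e.microstructure["headword"] for e in lexicon.macrostructure()]
print(ordered)
```

Keeping the ordering as a method rather than as stored state reflects the idea that the macrostructure is an ordering imposed on a set of entries; a different `order_key` would give a different access structure over the same entries.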
The LEXICON object
Front Matter Metadata:
  Bibliographical: creator, publisher, title, date, ...
  Medium / format: paper, CD-ROM/DVD, web, ...
  Macrostructure type:
    access: semasiological/onomasiological, n-lingual/langue(s),
    special: taxonomy (thesaurus), concordance
    structure, e.g. tabular: f(type,attrib)=value
The ENTRY object: metadata
Entry Metadata: (see Gibbon & al. in reading list)
  Entry type (wrt macrostructure specification):
    encyclopaedic
    multiword unit, word, ...
  Microstructure data model specification:
    entry structure: flat, tree, graph (net), ...
    data categories specification (attribute, field, information type)
      DC groups - structural skeleton
      DCs
      DC substructure - homography, homophony, polysemy ...
The ENTRY object: DC groups
Media ("surface"):
  acoustic (phonetic, earcon, sonification, ...), visual (orthography, icon, gesture, ...)
Composition (structure):
  part (e.g. morphology for words), context (e.g. POS, subcat for words)
Meaning (definition, illustration):
  semantic (components, relations, senses, ontology)
  pragmatic (speech act, dialogue, disfluency, ...)
Use:
  typically: media (e.g. audio) concordance, ...
Metadata:
  lexicographer, ...
The ENTRY object: DCs
Countless Data Category models: (see reading list)
  every existing dictionary
  linguistic "types of lexical information"
  several European projects (GENELEX, MULTILEX, ACQUILEX, ...)
  ISO terminology norms (cf. MARTIF etc. ...)
The ENTRY object: DC structures
Computationally relevant properties of fields:
  type (atomic, complex: tree, string, xyz-formatted text)
  character encoding spec.: ASCII, Unicode, xyz
  tree (or other graph/net):
    finite depth
    flat, disjunctive tree
    recursive graph (net)
  table, non-tree graph, anchor/link/index structure
  generated text:
    print, hypertext (compiled vs. dynamic, i.e. generated on the fly)
Metalex microstructure application
Media ("surface"):
  phonemic & tonemic transcription (SAMPA ASCII - still waiting for Unicode...)
Composition (structure):
  morphemic substructure, category & subcategory
Meaning (definition, illustration):
  glosses (English, French, German)
  definitions, senses, relations, components; audio-visual illustration
Use:
  genres; examples (e.g. concordance link); free text notes
Metadata:
  first record; last field
Metalex field lexicon microstructure
Anouman_1:
  Media attributes:
    Phonemic tier: `an'U~m`'a~
    Skeletal tier: VNVNV
    Tonal tier: L H LH
    Signal tier: Audio
  Meaning attributes:
    F-gloss: Oiseau
    E-gloss: Bird
    G-gloss: Vogel
    Definition: avis
    Homophone full: Anouman_2: grandchild
    Homophone phonemic: Anouman_3: yesterday
  Use:
    < Concordance pointer >
    Genre: narrative
  Metadata:
    Lexicographer: S. Adouakou
    Source: Bielefeld-Anyi-Corpus, Adaou village, CI
    Date: March 2002
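To show how such an entry could be carried into an XML archiving format, here is a minimal Python sketch using the standard library. The element and attribute names (`entry`, `dcgroup`, `dc`, `name`) are invented for this example and do not reproduce any Metalex schema; the attribute values are a subset of the Anouman_1 entry above.

```python
import xml.etree.ElementTree as ET

# A subset of the Anouman_1 entry, keyed by DC group.
ENTRY = {
    "Media":    {"Skeletal tier": "VNVNV", "Tonal tier": "L H LH"},
    "Meaning":  {"E-gloss": "Bird", "F-gloss": "Oiseau", "G-gloss": "Vogel"},
    "Use":      {"Genre": "narrative"},
    "Metadata": {"Lexicographer": "S. Adouakou", "Date": "March 2002"},
}

def entry_to_xml(entry_id, entry):
    """Build <entry><dcgroup><dc>...</dc></dcgroup></entry> (hypothetical names)."""
    root = ET.Element("entry", id=entry_id)
    for group, attrs in entry.items():
        group_el = ET.SubElement(root, "dcgroup", name=group)
        for name, value in attrs.items():
            dc = ET.SubElement(group_el, "dc", name=name)
            dc.text = value
    return root

xml_text = ET.tostring(entry_to_xml("Anouman_1", ENTRY), encoding="unicode")
print(xml_text)
```

Storing data category names as attribute values rather than as element names keeps the XML vocabulary fixed even when lexicographers add new, unpredictable categories.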
Metalex portable lexical database
Relational database:
  Metalex specs flattened
  structure re-constitution via metalex specs
HanDBase for PalmOS
Features:
  standard full RelDBMS
  XML, CSV, text export
  export/import via GSM
  inexpensive (wrt laptop)
  stylus, keyboard, sync input
  light weight
  low power consumption
  inconspicuous in use
  interfaces to Scheme, C
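Flattening a structured entry into relational rows and rebuilding it afterwards can be sketched as below. This is an assumption-laden illustration: here the DC group is simply stored as a column in each row, whereas the slides say that reconstitution is driven by the metalex specs; the function names `flatten` and `reconstitute` are hypothetical.

```python
# A subset of the Anouman_1 entry, keyed by DC group.
ENTRY = {
    "Media":   {"Skeletal tier": "VNVNV", "Tonal tier": "L H LH"},
    "Meaning": {"F-gloss": "Oiseau", "E-gloss": "Bird", "G-gloss": "Vogel"},
    "Use":     {"Genre": "narrative"},
}

def flatten(entry_id, entry):
    """One relational row per (entry, group, attribute, value)."""
    return [(entry_id, group, attr, value)
            for group, attrs in entry.items()
            for attr, value in attrs.items()]

def reconstitute(rows):
    """Rebuild the nested entry structure from the flat rows."""
    entries = {}
    for entry_id, group, attr, value in rows:
        entries.setdefault(entry_id, {}).setdefault(group, {})[attr] = value
    return entries

rows = flatten("Anouman_1", ENTRY)
restored = reconstitute(rows)
print(rows[0])
print(restored == {"Anouman_1": ENTRY})
```

Rows of this shape also export directly to CSV, which matches the export options listed for the handheld database.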
Metalex extension
The Modelex project: "Theory and Design of Multimodal Lexica"
Goals:
  Data-driven, theory-informed lexicon models
  Formal properties of abstract data models for multimodal lexica
  Interpretation of abstract data models in XML
  Integration of parallel annotation lattices for modalities and submodalities
  Development of a prototype multimodal lexicon
The Modelex domain: modalities and submodalities
Modelex: data driven lexicography
Modelex: gesture annotation
Time Aligned Signal Corpus System (Java, GPL)
Jan-Torsten Milde, U Bielefeld
TASX annotator:
  Phonological tier
  ToBI tiers
  Gesture tier
  Speech Act tier
Anyi, Ega, German
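Parallel time-aligned tiers can be combined by interval overlap. The Python sketch below shows the idea on invented data; it does not use the TASX file format or API, and the tier contents, labels and the function names `overlaps` and `align` are hypothetical.

```python
# Hypothetical time-aligned tiers: lists of (start_s, end_s, label).
TIERS = {
    "Phonological": [(0.0, 0.5, "anouman"), (0.5, 0.9, "ba")],
    "Gesture":      [(0.3, 0.7, "point"), (1.0, 1.4, "rest")],
    "Speech Act":   [(0.0, 0.9, "assert")],
}

def overlaps(a, b):
    """True if two (start, end, label) intervals share more than a single point."""
    return a[0] < b[1] and b[0] < a[1]

def align(tiers, base="Phonological"):
    """For each base-tier interval, collect overlapping labels from the other tiers."""
    result = []
    for interval in tiers[base]:
        row = {base: interval[2]}
        for name, other in tiers.items():
            if name != base:
                row[name] = [o[2] for o in other if overlaps(interval, o)]
        result.append(row)
    return result

aligned = align(TIERS)
for row in aligned:
    print(row)
```

Each resulting row links a word to the gestures and speech acts that co-occur with it, which is the kind of information a multimodal concordance would attach to a lexical entry.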
Model-theoretic compilation in ILEX: INTERPRETATION ( ALEX ) = OLEX
Metalex in the Modelex project: Multimodal concordance as microstructure DC
Prototype: http://www.spectrum.uni-bielefeld.de/langdoc/PAX/
Metalex in the Modelex project: underspecified ALEX microstructure for gesture coordinates
Hand:
    <parts> == "Palm" "Digit"
    <vector> == "<name>" <coord "<name>">
    <coord> == "<x1>" "<y1>" "<x2>" "<y2>"
    <> ==.
Palm:
    <parts> == <vector>
    <name> == palm
    <width> == pw
    <height> == ph
    <x1 fore> == <x1>
    <x1 middle> == ( <x1> + ( <x2> - <x1> ) / 3 )
    <x1 ring> == ( <x1> + ( <x2> - <x1> ) * 2 / 3 )
    <x1 pinky> == <x2>
    <x1> == px1
    <y1> == py1
    <x2> == ( <x1> + <width> )
    <y2> == ( <y1> + <height> )
    <> == Hand.
Metalex in the Modelex project: fully specified ALEX microstructure for gesture coordinates
Hand:<parts> =
palm px1 py1 ( px1 + pw ) ( py1 + ph )
thumb px1 py1 ( px1 - lt ) py1
fore px1 py1 px1 ( py1 - lf )
middle ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) py1 ( px1 + ( ( px1 + pw ) - px1 ) / 3 ) ( py1 - lm )
ring ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 ) py1 ( px1 + ( ( px1 + pw ) - px1 ) * 2 / 3 ) ( py1 - lr )
pinky ( px1 + pw ) py1 ( px1 + pw ) ( py1 - lp )
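The fully specified formulas can be checked numerically. The Python sketch below evaluates them for one set of parameter values; the function name `hand_vectors` and the numbers are invented for illustration, and Python's `/` (true division) is an assumption about how the division in the formulas is meant.

```python
def hand_vectors(px1, py1, pw, ph, lt, lf, lm, lr, lp):
    """Return {part: (x1, y1, x2, y2)} following the fully specified formulas."""
    x2 = px1 + pw                        # right edge of the palm
    middle_x = px1 + (x2 - px1) / 3      # one third across the palm
    ring_x = px1 + (x2 - px1) * 2 / 3    # two thirds across the palm
    return {
        "palm":   (px1, py1, x2, py1 + ph),
        "thumb":  (px1, py1, px1 - lt, py1),
        "fore":   (px1, py1, px1, py1 - lf),
        "middle": (middle_x, py1, middle_x, py1 - lm),
        "ring":   (ring_x, py1, ring_x, py1 - lr),
        "pinky":  (x2, py1, x2, py1 - lp),
    }

# Invented parameter values: palm origin, palm width/height, digit lengths.
vectors = hand_vectors(px1=0, py1=10, pw=9, ph=12, lt=4, lf=7, lm=8, lr=7, lp=5)
for part, coords in vectors.items():
    print(part, coords)
```

With these values the digit bases are spread evenly along the top edge of the palm (x = 0, 3, 6, 9), and each digit extends upward from there by its own length.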
Metalex: conclusion & prospects
User complexity:
  demands an open, data-driven approach
Domain:
  demands a theory-informed approach with computational acquisition & inference
Data-driven and theory-informed lexica
  are possible (METALEX)
  need integrated model-theoretic approach (ILEX): INTERPRETATION (ALEX) = OLEX
  a formal problem remains: differing complexity of
    trees (archive): simulation of other graphs via semantics only
    annotation lattices (data), tables (lexica): regular relations if non-recursive, indexed grammars if recursive?