What Linguists Want
(we think)
Helen Aristar Dry & Anthony Aristar
LINGUIST List & E-MELD
Language Documentation Used
• Research:
  • Historical / comparative linguistics
  • Typology
  • Language description
  • Phonology & phonetics
  • Syntax
  • Psycholinguistics
  • Discourse analysis
  • Anthropological linguistics
  • Ethnomusicology
• Teaching of all of the above
So they want
• Access
  • Central index of available material that supports flexible searching
  • Ability to preview material
  • Clear indication of access rights
  • Fast permissions (24-hour turnaround)
• Stability
  • Cited versions of resources still available
  • Assembled sub-corpora available for a specified period of time, e.g., for the duration of a course
• Ease of use
  • Single interface: things work the same way in different archives (hard to misunderestimate the technical skill of academics)
  • Registration that persists, i.e., they don’t have to keep filling out registration forms
These desiderata are addressed in Scenarios 4 and 5
And they would like
• Ability to manipulate the data
  • To annotate the corpus & share annotations with co-researchers
  • To track their own annotations & additions (as opposed to those of others)
  • To use a concordance program or other text processing program on the corpus
  • To extract relevant portions of texts and create a sub-corpus; to share this sub-corpus with co-researchers or students
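As a rough illustration of the "concordance program" item, here is a minimal keyword-in-context (KWIC) sketch. The function name `concordance`, the `width` parameter, and the sample text are all hypothetical; a real tool would work on archive transcriptions rather than a plain lowercase word split.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    Assumption: tokens are simple word characters, compared case-insensitively.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Hypothetical sample text, only to show the call.
sample = "The dog saw the cat. The cat saw the bird."
for left, kw, right in concordance(sample, "saw", width=2):
    print(f"{left:>12} [{kw}] {right}")
```

Because the hits are plain tuples, the same result could be saved as an extracted sub-corpus or shared with co-researchers.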
They would REALLY like
• Ability to identify resources by searching for linguistic structures, e.g.
  • Morphosyntactic categories (classifiers)
  • Morphosyntactic features (paucal)
  • Phonetic features (nasalization)*
  • Suprasegmentals (tone)*
• I.e., to search not just the metadata, but the annotations and transcriptions of the archived material.
*transcriptions, not sound, though search by sound would be even better
Structures central to:
• Research:
  • Historical / comparative linguistics
  • Typology
  • Language description
  • Phonology & phonetics
  • Syntax
• Teaching of all of the above
Want to answer Qs like:
• Do all IE languages have a contrast between voiced and unvoiced consonants?
• Which languages have a distinction between trial and paucal number?
• Where can I find examples of voiceless nasals (e.g., for a phonology problem)?
Need to search for…
• Morphemes representing morphosyntactic categories and features
• Phonetic segments
• Co-occurrences of segments, categories, & features
Need to search by
• Language families and subgroups
• Feature classes (e.g. “stops”, not [ b ])
• Morphosyntactic concepts (not just terminology, as this varies)
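Searching by feature class rather than by individual segment can be sketched as below. The `FEATURES` table is a tiny illustrative stand-in for an ontology of phonetic features, not an actual one. The helper names are hypothetical, and the sketch assumes one character per segment.

```python
# Illustrative feature table for a handful of segments (NOT a full ontology).
FEATURES = {
    "p": {"stop", "voiceless", "bilabial"},
    "b": {"stop", "voiced", "bilabial"},
    "t": {"stop", "voiceless", "alveolar"},
    "d": {"stop", "voiced", "alveolar"},
    "k": {"stop", "voiceless", "velar"},
    "g": {"stop", "voiced", "velar"},
    "m": {"nasal", "voiced", "bilabial"},
    "n": {"nasal", "voiced", "alveolar"},
    "s": {"fricative", "voiceless", "alveolar"},
}

def segments_in_class(*required):
    """Expand a feature class (e.g. 'stop', 'voiceless') into its segments."""
    return {seg for seg, feats in FEATURES.items() if set(required) <= feats}

def find_class(transcription, *required):
    """Return (position, segment) for each segment belonging to the class.

    Assumption: each character of `transcription` is one segment.
    """
    wanted = segments_in_class(*required)
    return [(i, seg) for i, seg in enumerate(transcription) if seg in wanted]

print(sorted(segments_in_class("stop", "voiceless")))
print(find_class("pandit", "stop", "voiceless"))
```

The user asks for "voiceless stops" and the tool expands that class into concrete segments, so nobody has to enumerate [p], [t], [k] by hand.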
Requires enhanced
•Documentation
•Meta-information
•Search tools
Documentation
•Complete & transparent phonetic transcription
•Detailed & transparent morphosyntactic annotation
•Unambiguous language identification & classification
Meta-Information
• Unambiguous language identification system (language codes)
• Language classification system, organizing languages into families and subgroups
• Structured (graphic) taxonomy of phonetic features
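A classification system lets a search on a family or subgroup expand to every language beneath it. The sketch below assumes a nested-dictionary tree and three-letter language codes in the style of ISO 639-3. The tree is a deliberately tiny illustrative subset, and `languages_under` is a hypothetical helper.

```python
# Tiny illustrative subset of a classification; leaves are language codes.
FAMILY_TREE = {
    "Indo-European": {
        "Germanic": ["eng", "deu", "nld"],
        "Romance": ["fra", "spa", "ita"],
    },
}

def languages_under(node_name, tree=FAMILY_TREE):
    """Return all language codes below the named family or subgroup."""

    def collect(node):
        # A list is a leaf group of codes; a dict holds further subgroups.
        if isinstance(node, list):
            return list(node)
        codes = []
        for child in node.values():
            codes.extend(collect(child))
        return codes

    def find(name, node):
        if isinstance(node, dict):
            for key, child in node.items():
                if key == name:
                    return collect(child)
                found = find(name, child)
                if found is not None:
                    return found
        return None

    return find(node_name, tree) or []

print(languages_under("Romance"))
print(languages_under("Indo-European"))
```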
Meta-Information
• Structured taxonomy of morphosyntactic categories and features (concepts and definitions)
• Lists of morphosyntactic terminology in use by various groups
• Mapping of the different terminology sets to the concepts and definitions
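The three pieces on this slide (concepts, terminology lists, and the mapping between them) can be sketched as two lookup tables. All names below are hypothetical: the concept identifiers are not taken from any actual ontology, and `project_a` and `project_b` stand in for groups with different labelling habits.

```python
# Hypothetical concept inventory: identifier -> definition.
CONCEPTS = {
    "PluralNumber": "more than one referent",
    "PaucalNumber": "a few referents",
    "TrialNumber": "exactly three referents",
}

# Hypothetical terminology sets: each group's label -> shared concept.
TERMINOLOGY = {
    "project_a": {"PL": "PluralNumber", "PAUC": "PaucalNumber"},
    "project_b": {"plur": "PluralNumber", "few": "PaucalNumber", "TRI": "TrialNumber"},
}

def normalize(source, label):
    """Map one group's label to the shared concept, or None if unmapped."""
    return TERMINOLOGY.get(source, {}).get(label)

def labels_for(concept):
    """List, per group, the labels that express a given concept."""
    return {
        source: [label for label, c in terms.items() if c == concept]
        for source, terms in TERMINOLOGY.items()
    }

print(normalize("project_b", "few"))
print(labels_for("PaucalNumber"))
```

With a mapping like this, a search phrased in terms of the concept can reach annotations that use either group's terminology.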
Search tools that can
• Interpret meta-information
• Use it to construct intelligent searches
• Search
  • Annotation & transcription
  • OR language profiles
  • OR annotation indexes
What we have
• New documentation
  • Audio / video recordings w/ translation
  • Phonetic transcription
  • Little morphosyntactic annotation (sometimes)
• Legacy documentation
  • Detailed morphosyntactic annotation
  • Complete phonetic transcription
  • Non-transparent (idiosyncratic) markup
  • Inaccessible format (e.g., paper)
What we have
• Meta-information
  • Ontology of morphosyntactic concepts (GOLD, and others?)
  • Terminology sets (DatCat Registry)
  • Ontology of phonetic features
  • Language codes & associated family trees (Ethnologue based)
What we have
• Search
  • Prototype search of phonetic transcription using ontology of phonetic features, e.g. “Find all voiceless stops.”
  • Steps toward search of morphosyntactic features:
    • Language profiles which give the morphosyntactic categories and features used in a language (in XML)
    • Conversion path for
      • mapping idiosyncratic markup to the GOLD ontology (metaschemas + XSLT)
      • converting GOLD-compliant markup into RDF for searching via semantic web
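To suggest how XML language profiles could answer a question such as "Which languages have a distinction between trial and paucal number?", here is a minimal sketch. The XML layout, the element and attribute names, and the languages `lang-A` and `lang-B` are invented placeholders, not the actual profile format or real language data.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile format with placeholder languages (not real data).
PROFILES_XML = """
<profiles>
  <profile language="lang-A">
    <feature category="number" value="singular"/>
    <feature category="number" value="trial"/>
    <feature category="number" value="paucal"/>
  </profile>
  <profile language="lang-B">
    <feature category="number" value="singular"/>
    <feature category="number" value="plural"/>
  </profile>
</profiles>
"""

def languages_with(xml_text, category, values):
    """Return languages whose profile lists every given value for a category."""
    root = ET.fromstring(xml_text)
    matches = []
    for profile in root.findall("profile"):
        present = {
            feature.get("value")
            for feature in profile.findall("feature")
            if feature.get("category") == category
        }
        if set(values) <= present:
            matches.append(profile.get("language"))
    return matches

print(languages_with(PROFILES_XML, "number", ["trial", "paucal"]))
```

A profile search of this kind only needs one small summary per language, so it can work even where the underlying texts carry little morphosyntactic annotation.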
What we have: Tools
• For ontology-based morphosyntactic annotation
  • OntoElan (MPI’s Elan + ontology-based terminology mapper)
  • OntoGloss (ontology-aware stand-off annotation of web documents)
• For creating language profiles
  • FIELD
What we need
•Comprehensive, integrated system that supports this kind of searching
•“Architecture, not just tools”