A Common Ontology for Linguistic Concepts Scott Farrar University of Arizona.

Endangered Languages

• As many as half of the world’s languages are in danger of disappearing (LaPolla 1998).

• These include many languages in the Americas (Hopi), Africa, Australia (), and Southeast Asia (Biao Min).

EMELD

• EMELD (Electronic Metastructure for Endangered Languages Data)

• One application of EMELD: make endangered languages available on the Semantic Web

Linguistic Field Work

• Linguists collect data

• Datasets (grammars, dictionaries, or glossed corpora)

• Hopi example of kachina: sivu-’ikwiw-ta-qa [vessel-carry:on:back-DUR-REL]
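
The pairing of morphemes with glosses in a glossed form like the Hopi example can be sketched in code. This is only an illustration and not part of EMELD: the `Morph` class and the `align_word` function are hypothetical names, and the sketch assumes that morphemes are separated by hyphens and that each morpheme has exactly one gloss.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Morph:
    """One morpheme paired with its gloss (hypothetical structure)."""
    form: str
    gloss: str


def align_word(form: str, gloss: str) -> List[Morph]:
    """Pair hyphen-separated morphemes with hyphen-separated glosses."""
    forms = form.split("-")
    glosses = gloss.split("-")
    if len(forms) != len(glosses):
        raise ValueError(
            f"{len(forms)} morphemes but {len(glosses)} glosses in {form!r}"
        )
    return [Morph(f, g) for f, g in zip(forms, glosses)]


# The Hopi example from the slide, with the gloss written without spaces.
hopi = align_word("sivu-’ikwiw-ta-qa", "vessel-carry:on:back-DUR-REL")
for m in hopi:
    print(f"{m.form}\t{m.gloss}")
```

Raising an error on a length mismatch, rather than silently truncating, is a deliberate choice here: a mismatch usually signals a markup inconsistency in the dataset.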

Problems Concerning Data Interoperability

• Datasets can vary according to:
– markup
– theoretical style
– natural language semantics

Az épület-be mégy-ek.

the building-IllativeCase go-1P/SING

I am going into the building.

Problems Concerning Data Interoperability

• Linguistic data is dynamic:
– New data is collected.
– Datasets are revised.
– Theory changes.

Standardization is not Viable

• Text Encoding Initiative (TEI) (Sperberg-McQueen and Burnard 1994)

• Corpus Encoding Standard (CES) (Ide and Romary 2000)

Towards a Solution

• Data storage and distribution: local or distributed?

• Data model for linguistic datasets

• Linguistic ontology

EMELD Architecture

[Figure: architecture diagram showing the EMELD Search Engine, a GUI, the Hopi, Mocovi, and Biao Min datasets, the Linguistic Ontology, and the Semantic Web.]
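
One plausible reading of this architecture is that a search engine queries several datasets through a shared ontology, even though each dataset uses its own markup. The sketch below illustrates that idea only; it is not the EMELD implementation. The dataset names, the local tags (`ILL`, `GEN`, `iness.`), the placeholder forms, and the `search` function are all assumptions made for the example. Only the form `épület-be` and the concept names come from the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical per-dataset tag tables: local label -> ontology concept.
TAG_MAPS: Dict[str, Dict[str, str]] = {
    "dataset_a": {"IllativeCase": "IllativeCase"},
    "dataset_b": {"ILL": "IllativeCase", "GEN": "GenitiveCase"},
    "dataset_c": {"iness.": "InessiveCase"},
}

# Toy records: (dataset, word form, local tag). Forms in angle brackets
# are placeholders, not real language data.
RECORDS: List[Tuple[str, str, str]] = [
    ("dataset_a", "épület-be", "IllativeCase"),
    ("dataset_b", "<form-b1>", "ILL"),
    ("dataset_b", "<form-b2>", "GEN"),
    ("dataset_c", "<form-c1>", "iness."),
]


def search(concept: str) -> List[Tuple[str, str]]:
    """Return (dataset, form) pairs whose local tag maps to the concept."""
    hits = []
    for dataset, form, tag in RECORDS:
        if TAG_MAPS.get(dataset, {}).get(tag) == concept:
            hits.append((dataset, form))
    return hits


# One query in ontology terms reaches forms tagged differently per dataset.
print(search("IllativeCase"))
```

Keeping the tag tables separate from the records means a dataset can be revised, or its mapping corrected, without touching the other datasets.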

Linguistic Ontology

• Conceptual model for the linguistics domain (special focus on morpho-syntax)

• Built on top of the Suggested Upper Merged Ontology (SUMO) (Niles and Pease 2001)
– already includes a number of concepts relating to semiotics and linguistics
– incorporates concepts from a number of top-level ontologies
– peer-reviewed and freely available

Backbone Taxonomy

• Entity
    Physical
        Object
            ContentBearingObject
                Icon
                SymbolicString
                LinguisticExpression
                    WrittenLinguisticExpression
                        Text
                        Sentence
                        Phrase
                        Word
                        Morpheme
                    SpokenLinguisticExpression
                        Dialogue
                        Sentence
                        Phrase
                        Word
                        Morpheme

Backbone Taxonomy (continued)

    Abstract
        Class
            Relation
                Predicate
                    GrammaticalRelation
                        Aspect
                        Tense
                        Case
                        Agreement
        Attribute
            GrammaticalAttribute
                Gender
                Person
                Number

Morphosyntactic Case

Case
    InherentCase
        Spatio-KineticCase
            PositionalCase
                InessiveCase
            DirectionalCase
                IllativeCase
        ExistentialCase
            AbessiveCase
            PartitiveCase
        InstrumentalCase
    StructuralCase
        GenitiveCase
        ErgativeCase
        NominativeCase
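
A taxonomy like this supports subsumption queries, for example asking whether IllativeCase counts as a kind of InherentCase. The following is a minimal sketch of that idea, not the ontology's actual encoding. The parent-child nesting is an assumed reading of the flattened slide, and the `PARENT` table and the `ancestors`, `is_a`, and `subtypes` functions are hypothetical names.

```python
from typing import Dict, List, Optional

# child -> parent; the nesting is an assumed reading of the slide.
PARENT: Dict[str, Optional[str]] = {
    "Case": None,
    "InherentCase": "Case",
    "Spatio-KineticCase": "InherentCase",
    "PositionalCase": "Spatio-KineticCase",
    "InessiveCase": "PositionalCase",
    "DirectionalCase": "Spatio-KineticCase",
    "IllativeCase": "DirectionalCase",
    "ExistentialCase": "InherentCase",
    "AbessiveCase": "ExistentialCase",
    "PartitiveCase": "ExistentialCase",
    "InstrumentalCase": "InherentCase",
    "StructuralCase": "Case",
    "GenitiveCase": "StructuralCase",
    "ErgativeCase": "StructuralCase",
    "NominativeCase": "StructuralCase",
}


def ancestors(concept: str) -> List[str]:
    """Walk up the taxonomy from a concept to the root."""
    chain = []
    parent = PARENT[concept]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(concept: str, candidate: str) -> bool:
    """True if concept equals candidate or is subsumed by it."""
    return concept == candidate or candidate in ancestors(concept)


def subtypes(concept: str) -> List[str]:
    """All concepts strictly below the given one, in alphabetical order."""
    return sorted(c for c in PARENT if c != concept and is_a(c, concept))


print(ancestors("IllativeCase"))
print(subtypes("Spatio-KineticCase"))
```

With this structure, a search for a general concept such as DirectionalCase can also return data tagged with a more specific one such as IllativeCase.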

Future directions

• Include the domains of phonology and discourse analysis.

• The linguistics ontology has applications beyond the immediate EMELD project:
– as part of an expert system for reasoning about language data
– as part of an interlingua designed for machine translation systems

Contact Info

• Scott Farrar

• Will Lewis

• Terry Langendoen

• {farrar, wlewis, langendoen}@u.arizona.edu