Laying the Foundations for a Diachronic Dictionary of Tunis Arabic: A First Glance at an Evolving New Language Resource

  • Slide 1
  • Laying the Foundations for a Diachronic Dictionary of Tunis Arabic: A First Glance at an Evolving New Language Resource. Karlheinz Mörth¹, Stephan Procházka², Ines Dallaji². ¹ Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences); ² Department of Oriental Studies (University of Vienna)
  • Slide 2
  • Introduction: two projects
    - Vienna Corpus of Arabic Varieties (VICAV)
    - Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach (TUNICO)
    - Text technology + linguistics
  • Slide 3
  • Introduction: VICAV (Vienna Corpus of Arabic Varieties)
    - Digital language resources of a wide range of spoken Arabic varieties: dictionaries, corpora, bibliographies, language profiles, best practices
    - Cooperation of the University of Vienna and the Austrian Academy of Sciences
    - http://corpus3.aac.oeaw.ac.at/vicav2/
  • Slide 4
  • Introduction VICAV
  • Slide 5
  • Slide 6
  • Slide 7
  • Introduction: TUNICO (Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach)
    - Funded by the Austrian Science Fund (FWF, P 25706-G23)
    - Main objective: linguistic exploration of spoken, contemporary Arabic
    - Two digital language resources: a corpus of spoken youth language and a dictionary of Tunis Arabic
  • Slide 8
  • Arabic dialect lexicography
    - No comprehensive dictionary of the Arabic dialect of Tunis exists.
    - Bases for diachronic research:
      - Nicolas, A. (1911). Dictionnaire français-arabe.
      - Beaussier, M. (2006). Dictionnaire pratique arabe-français (arabe maghrébin).
      - Quéméneur, J. (1961). Notes sur quelques vocables du parler tunisien.
      - Quéméneur, J. (1962). Glossaire de dialectal.
      - Abdellatif, K. (2010). Dictionnaire le Karmous du Tunisien.
      - Marçais, W., Guîga, A. (1958-61). Textes arabes de Takroûna. II: Glossaire.
  • Slide 9
  • Dictionary of Tunis Arabic
    - micro-diachronic and machine-readable
    - up-to-date and easily accessible lexical information
    - incorporates (a) contemporary data from a digital corpus and (b) various historical sources (e.g. Stumme, H.)
    - all added information is kept traceable to its origin
    - basis: data taken from didactic materials
    - three other main sources: a newly created corpus, interviews and historical publications
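Keeping every piece of information traceable to its origin while combining several sources can be illustrated with a small Python sketch. It assumes, hypothetically (the slides do not describe the data model), that each item arrives as a (lemma, field, value, source) record; merging stores identical values only once but keeps every source that reported them. The field names and the labels "source A" and "source B" are placeholders, not the project's conventions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """One piece of lexical information plus the source that supplied it.

    Field names and source labels are hypothetical placeholders.
    """
    lemma: str
    field: str
    value: str
    source: str


def merge(data):
    """Group data by lemma, field and value, keeping every reporting source.

    Identical values from different sources are stored once, but the set of
    sources is preserved, so each value remains traceable to its origin.
    """
    merged = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for d in data:
        merged[d.lemma][d.field][d.value].add(d.source)
    # Convert to plain dicts with sorted source lists for stable output.
    return {
        lemma: {
            field: {value: sorted(srcs) for value, srcs in values.items()}
            for field, values in fields.items()
        }
        for lemma, fields in merged.items()
    }


records = [
    Datum("ktāb", "plural", "ktub", "source A"),
    Datum("ktāb", "plural", "ktub", "source B"),
    Datum("ktāb", "translation_en", "book", "source A"),
]
print(merge(records))
# {'ktāb': {'plural': {'ktub': ['source A', 'source B']}, 'translation_en': {'book': ['source A']}}}
```

Keeping the sources as a set per value means that agreement between sources is visible at a glance, while conflicting values for the same field simply appear side by side.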
  • Slide 10
  • Dictionary of Tunis Arabic: contemporary sources
    1) Corpus of spoken youth language (dialogues, narratives). This is an uncommon approach in Arabic dialectology, which has traditionally been interested in the language of older speakers, so that mostly older forms of particular varieties are known. The focus here is on modern language, contemporary usage and lexical neologisms.
    2) Additional interviews to complete the data gained from the corpus and historical sources
  • Slide 11
  • Dictionary of Tunis Arabic: historical sources
    - 800-page grammar of the Medina of Tunis by Hans-Rudolf Singer (1984): evaluation of the data and integration of excerpted lexicographic data into the dictionary
    - Verification and completion of the collected data with other historical resources
    - The diachronic dimension helps to understand processes in the development of the lexicon
    - The material gathered will allow analysis of recent developments (migration of parents from rural areas, influence of other Arabic varieties, influence of the revolution, foreign elements)
  • Slide 12
  • Dictionary of Tunis Arabic
  • Slide 13
  • Dictionary of Tunis Arabic: technical issues
    - Modelling the data
    - Tools
  • Slide 14
  • Dictionary of Tunis Arabic: technical issues
    - Single schema for a range of dictionaries
    - LMF, RDF, SKOS, TEI (P5)
  • Slide 15
  • Dictionary of Tunis Arabic: technical issues
    - Using the TEI dictionary module to encode digitised print dictionaries is fairly standard practice in the digital humanities.
    - The TEI dictionary module needs to be further constrained:
      - to enhance interoperability
      - to reduce alternate constructs
      - to achieve a high degree of compliance with LMF (ISO 24613)
    - Such constraints are easy to impose when creating born-digital dictionaries.
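One way to picture "reducing alternate constructs" is a whitelist that permits exactly one encoding for each kind of information. The Python sketch below checks an entry against such a whitelist. The table of allowed children is hypothetical (the slides do not list the actual constraints), and a real project would normally express such restrictions as a TEI ODD customisation compiled into a schema rather than in ad-hoc code. In this sketch, translation equivalents must be encoded as `<cit>` with `<quote>`, so an alternative such as `<def>` inside `<sense>` is reported.

```python
import xml.etree.ElementTree as ET

# Hypothetical constrained profile: the child elements each element may contain.
ALLOWED_CHILDREN = {
    "entry": {"form", "gramGrp", "sense"},
    "form": {"orth", "gramGrp"},
    "gramGrp": {"gram"},
    "sense": {"cit"},
    "cit": {"quote"},
    "orth": set(),
    "gram": set(),
    "quote": set(),
}


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


def violations(elem):
    """Return one message per element that falls outside the profile."""
    name = local(elem.tag)
    allowed = ALLOWED_CHILDREN.get(name)
    if allowed is None:
        return [f"<{name}> is not part of the profile"]
    problems = []
    for child in elem:
        child_name = local(child.tag)
        if child_name not in allowed:
            problems.append(f"<{child_name}> not allowed inside <{name}>")
        else:
            problems.extend(violations(child))  # check the subtree recursively
    return problems


ok = ('<entry xmlns="http://www.tei-c.org/ns/1.0"><form type="lemma"><orth>ktāb</orth></form>'
      '<sense><cit type="translation"><quote>book</quote></cit></sense></entry>')
alt = ('<entry xmlns="http://www.tei-c.org/ns/1.0"><form type="lemma"><orth>ktāb</orth></form>'
       '<sense><def>book</def></sense></entry>')
print(violations(ET.fromstring(ok)))   # []
print(violations(ET.fromstring(alt)))  # ['<def> not allowed inside <sense>']
```

Plain TEI accepts both encodings; the profile rejects one of them, which is exactly what makes data from different dictionaries easier to map onto a common model such as LMF.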
  • Slide 16
  • Dictionary of Tunis Arabic: basic schema
  • Slide 17
  • Dictionary of Tunis Arabic: basic schema
  • Slide 18
  • Dictionary of Tunis Arabic: basic schema (example entry): ktāb, plural ktub, noun, root ktb; translations: book (English), Buch (German), livre (French)
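The example entry can be sketched as TEI using Python's standard `xml.etree.ElementTree`. The elements used (`<form type="lemma">`, `<form type="inflected">` for the plural, `<gram type="pos">`, `<gram type="root">`, and `<cit type="translation">` with `xml:lang`) all belong to the TEI dictionary module. How exactly the project combines them is an assumption here, not a copy of the slide's schema, and the helper names `t` and `build_entry` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialised as xml:lang
ET.register_namespace("", TEI_NS)  # emit TEI as the default namespace


def t(name):
    """Return the namespace-qualified TEI tag name used by ElementTree."""
    return f"{{{TEI_NS}}}{name}"


def build_entry(lemma, plural, pos, root, translations):
    """Build a TEI <entry>; the element layout is an illustrative assumption."""
    entry = ET.Element(t("entry"))
    lemma_form = ET.SubElement(entry, t("form"), type="lemma")
    ET.SubElement(lemma_form, t("orth")).text = lemma
    plural_form = ET.SubElement(entry, t("form"), type="inflected")
    ET.SubElement(plural_form, t("orth")).text = plural
    number = ET.SubElement(plural_form, t("gramGrp"))
    ET.SubElement(number, t("gram"), type="number").text = "plural"
    grams = ET.SubElement(entry, t("gramGrp"))
    ET.SubElement(grams, t("gram"), type="pos").text = pos
    ET.SubElement(grams, t("gram"), type="root").text = root
    sense = ET.SubElement(entry, t("sense"))
    for lang, word in translations.items():
        cit = ET.SubElement(sense, t("cit"), {"type": "translation", XML_LANG: lang})
        ET.SubElement(cit, t("quote")).text = word
    return entry


entry = build_entry("ktāb", "ktub", "noun", "ktb",
                    {"en": "book", "de": "Buch", "fr": "livre"})
print(ET.tostring(entry, encoding="unicode"))
```

Building entries programmatically like this, rather than typing markup by hand, is one way a born-digital dictionary can enforce a single construct per kind of information.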
  • Slide 19
  • Dictionary of Tunis Arabic: representing diachrony (example references: Ritt-Benmimoun 2014; Singer 1958, 56)
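A minimal sketch of how dated source references might be attached to a form so that its attestations can be read in chronological order. It assumes (the slide does not say this) that each attestation carries a TEI `<bibl>` with `<author>`, `<date>` and optionally `<biblScope unit="page">`. The two references are the ones visible on the slide, with "56" read as a page number; the orthographic form "FORM" is a placeholder, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def t(name):
    """Return the namespace-qualified TEI tag name used by ElementTree."""
    return f"{{{TEI_NS}}}{name}"


def bibl(author, year, page=None):
    """Build a TEI <bibl> recording where a piece of data was found."""
    b = ET.Element(t("bibl"))
    ET.SubElement(b, t("author")).text = author
    ET.SubElement(b, t("date")).text = str(year)
    if page is not None:
        ET.SubElement(b, t("biblScope"), unit="page").text = str(page)
    return b


def attested_form(orth, sources):
    """Return a <form> whose <bibl> children are sorted oldest first.

    `sources` is a list of (author, year, page_or_None) tuples.
    """
    form = ET.Element(t("form"), type="variant")
    ET.SubElement(form, t("orth")).text = orth
    for author, year, page in sorted(sources, key=lambda s: s[1]):
        form.append(bibl(author, year, page))
    return form


form = attested_form("FORM", [("Ritt-Benmimoun", 2014, None), ("Singer", 1958, 56)])
print([d.text for d in form.iter(t("date"))])  # ['1958', '2014']
```

Sorting by year at encoding time is only one option; the dates could equally be stored in any order and sorted when the entry is displayed.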
  • Slide 20
  • Dictionary of Tunis Arabic: documentation at http://corpus3.aac.ac.at/vicav2/query/tools/dictionary_encoding_guidelines
  • Slide 21
  • Dictionary of Tunis Arabic: tools. Viennese Lexicographic Editor (VLE)
    - XML editor providing the functionalities typically needed in compiling lexicographic data
    - Web-based standalone application
    - Designed to process standards-based lexicographic and terminological data such as LMF, TBX, RDF or TEI
    - Automating procedures
    - Freely configurable visualisation (via XSLT)
    - Validation via MSXML (XML Schema)
    - Client-server architecture (PHP + MySQL)
    - Freely available and easy to set up
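The slide does not describe how VLE stores its data, so the following is only a guess at the general shape of a client-server lexicographic store: each entry is kept as an XML document in a relational table next to an indexed lemma column. Python's built-in `sqlite3` stands in for MySQL, and only well-formedness is checked, whereas VLE validates against a schema via MSXML. The table, column and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
LEMMA_PATH = f"{{{TEI_NS}}}form[@type='lemma']/{{{TEI_NS}}}orth"


def open_store():
    """Create an in-memory table of entries (sqlite3 as a stand-in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, "
               "lemma TEXT NOT NULL, xml TEXT NOT NULL)")
    db.execute("CREATE INDEX idx_lemma ON entries (lemma)")
    return db


def save_entry(db, xml_text):
    """Check well-formedness, extract the lemma for indexing, store the XML."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    orth = root.find(LEMMA_PATH)
    if orth is None or not orth.text:
        raise ValueError("entry has no lemma form")
    with db:  # commits on success, rolls back on error
        cur = db.execute("INSERT INTO entries (lemma, xml) VALUES (?, ?)",
                         (orth.text, xml_text))
    return cur.lastrowid


def find_entries(db, lemma):
    """Return the stored XML of all entries with the given lemma."""
    rows = db.execute("SELECT xml FROM entries WHERE lemma = ? ORDER BY id", (lemma,))
    return [xml for (xml,) in rows]


db = open_store()
doc = ('<entry xmlns="http://www.tei-c.org/ns/1.0">'
       '<form type="lemma"><orth>ktāb</orth></form></entry>')
save_entry(db, doc)
print(find_entries(db, "ktāb") == [doc])  # True
```

Storing the full XML alongside a few extracted index columns keeps the entries standards-based while still allowing fast lookups on the server side.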
  • Slide 22
  • Dictionary of Tunis Arabic: tools
  • Slide 23
  • Diagram components: corpus, dictionary interface, tokenEditor, specialised web browser
  • Slide 24
  • Dictionary of Tunis Arabic: tools. corpus_shell: a modular framework of reusable software components to access and publish heterogeneous and distributed language resources such as language corpora, dictionaries, encyclopaedic databases, prosopographic databases, bibliographies, metadata and schemata.
    - Language Resources Portal: clarin.oeaw.ac.at/ccv/corpus_shell, clarin.oeaw.ac.at/ccv/
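The idea of reusable components that hide heterogeneous resources behind one interface can be sketched in Python with a small adapter registry. This is not corpus_shell's actual API, which the slide does not describe: `Resource`, `Portal`, `WordList` and `LineCorpus` are hypothetical names, and the data are toy placeholders.

```python
from typing import Dict, List, Optional, Protocol


class Resource(Protocol):
    """Any component that can answer a plain-string query with a list of hits."""
    name: str

    def search(self, query: str) -> List[str]: ...


class WordList:
    """Toy dictionary: matches headwords containing the query."""

    def __init__(self, name: str, headwords: List[str]) -> None:
        self.name = name
        self.headwords = headwords

    def search(self, query: str) -> List[str]:
        return [h for h in self.headwords if query in h]


class LineCorpus:
    """Toy corpus: returns lines that contain the query as a whole token."""

    def __init__(self, name: str, lines: List[str]) -> None:
        self.name = name
        self.lines = lines

    def search(self, query: str) -> List[str]:
        return [line for line in self.lines if query in line.split()]


class Portal:
    """Registry that forwards one query to any subset of registered resources."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def register(self, resource: Resource) -> None:
        self._resources[resource.name] = resource

    def search(self, query: str, names: Optional[List[str]] = None) -> Dict[str, List[str]]:
        selected = names if names is not None else list(self._resources)
        return {n: self._resources[n].search(query) for n in selected}


portal = Portal()
portal.register(WordList("dictionary", ["ktāb", "ktub"]))
portal.register(LineCorpus("corpus", ["toy line with ktāb", "another toy line"]))
print(portal.search("ktāb"))
# {'dictionary': ['ktāb'], 'corpus': ['toy line with ktāb']}
```

New kinds of resources (bibliographies, prosopographic databases) would only need a class with a `name` and a `search` method to be added to such a portal.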
  • Slide 25
  • Dictionary of Tunis Arabic: status and outlook
    - CLARIN-ERIC (Common Language Resources and Technology Infrastructure)
    - Open access and open source
    - ~5,000 entries
  • Slide 26
  • Karlheinz Mörth¹, Stephan Procházka², Ines Dallaji². ¹ Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences); ² Department of Oriental Studies (University of Vienna). Thank you for your attention!