Computer-aided lexicography
Creation, publication, and use of dictionaries: our experience at the Ixa NLP Group
Xabier Artola
Faculty of Computer Science, Donostia
IULA - InfoLex (UPF), 2010-11-26
Using the dictionary is not always fun
How many legs has a fly?
This looks like a past participle of some verb!!!: shrunk
There must be a word for...
to remove the hair from the skin of goats and sheep
I need a verb now!:
the fire ...s
Is there any relationship between these words? Which one?:
to burn, to blacken
Which one is correct?: a quick shower or a fast shower
Using the dictionary is not always fun
Translating "buy for" into Spanish:
The company bought stock for investment purposes
They kept buying for several months
They bought stock for €3,000,000
The defendant said he bought it for his brother
look after: what does it mean?
Outline of the presentation
Creation
Computer-aided lexicography: text corpora and language databases
Dictionary editing environments
Knowledge representation issues
Publication
Print
Electronic (on-line or whatever)
From the editing application to the final product
Use
Use cases, users, and dictionary software functionality
Do we get from electronic dictionaries what we could expect from them?
Creation: dictionary making
Still in the 20th century: piles of index cards within shoeboxes
Word usage was compiled largely on paper slips or index cards, as the basis for the creation of dictionary entries
Computer technology
text corpora (concordances, KWIC) to:
acquire real language use examples
discover and ascertain word senses, extract definitions
find and verify collocations
find neologisms
find out multiword lexical units
databases (wide sense) to store dictionary contents
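The concordance (KWIC, keyword in context) use of corpora listed above can be illustrated with a minimal Python sketch. The function name `kwic`, the `width` parameter and the sample sentences are assumptions made for this example; a real concordancer would work on a tokenized and lemmatized corpus.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens on each side of a hit."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented sample text, echoing the "quick shower" question from the opening slides.
sample = "They took a quick shower. After the game she had a quick shower and left."
for line in kwic(sample, "shower"):
    print(line)
```

Aligning the hits this way lets a lexicographer see at a glance which words recur next to the keyword, which is the starting point for checking collocations.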
Creation: dictionary making
Today's electronic dictionaries: where do we get dictionary content from?
print dictionaries (legacy): scanning
OCR
parsing of typographic features
importing it from glossaries, entry lists, other electronic dictionaries...
from scratch: editing (lexicographer)
word processors
databases
XML editors
publishers' custom applications
dictionary editing software: TshwaneLex...
Creation: dictionary making
Building electronic dictionaries from legacy dictionaries: [scanning + OCR +] parsing of typographic features
Goal: to obtain a structural representation of the dictionary content (often in XML)
from text to a lexicographic database
Two real cases (Ixa NLP Group):
eEH: from RTF to TEI SGML / XML (Arregi et al., 2003, 2007)
DBE: from RTF to TEI XML (Alegria et al., 2006a, 2006b)
eEH: from RTF to TEI SGML / XML
Sarasola I. Euskal Hiztegia. Kutxa Fundazioa: Donostia, 1996.
Basque monolingual dictionary, reference for the standard Basque dictionary (Hiztegi Batua, Academy of the Basque Language)
33,111 entries, 41,699 senses
Typical examples illustrating the use of words, drawn from corpora
From RTF to TEI SGML (later to TEI XML): DCG written in Prolog
TEI DTD: select / customize / enhance
Manual correction of the automatically obtained output
eEH: from RTF to TEI SGML / XML
eEH: electronic Euskal Hiztegia (electronic dictionary prototype)
Sophisticated indexing system (no databases are used)
definition and example texts fully lemmatized
Users: ordinary
advanced (philologists, lexicographers, translators...)
Functionality
full hypertext utility (from definitions and examples to corresponding entries)
basic query
advanced query
• specially designed query language
• dictionary search as in a corpus
Problem: lack of editing environment
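Lemmatizing definition and example texts is what makes hypertext links and corpus-style search over the dictionary possible. The sketch below is a toy illustration in Python, not the eEH indexing system: the lookup table `LEMMA_OF` stands in for a real morphological analyser, and the three English entries and their definitions are invented.

```python
from collections import defaultdict

# Assumption: a tiny form-to-lemma table replaces a real lemmatizer.
LEMMA_OF = {"shrunk": "shrink", "burns": "burn", "burned": "burn", "blackened": "blacken"}

# Invented entries: headword -> definition text.
ENTRIES = {
    "blacken": "to make something black, as when it is burned",
    "burn": "to be on fire",
    "shrink": "to become smaller",
}

def lemmatize(form):
    """Lower-case a word form, strip punctuation and map it to its lemma."""
    form = form.lower().strip(".,;:!?")
    return LEMMA_OF.get(form, form)

def build_index(entries):
    """Map each lemma to the set of headwords whose definition mentions it."""
    index = defaultdict(set)
    for headword, definition in entries.items():
        for form in definition.split():
            index[lemmatize(form)].add(headword)
    return index

index = build_index(ENTRIES)
# Entries whose definition uses any inflected form of "burn".
print(sorted(index["burn"]))
```

Because the index is keyed by lemma, a query for "burn" also finds the definition that contains "burned", and any definition word that is itself a headword can be turned into a link to its entry.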
eEH: electronic dictionary prototype
[Screenshots: query interface, query language]
DBE: from RTF to TEI XML
Miyares Bermúdez E. (dir.) Diccionario Básico Escolar. Centro de Lingüística Aplicada, Santiago de Cuba. 2003.
School dictionary, monolingual
7,473 entries, 14,013 word senses (1st ed.)
From RTF to TEI P4 XML:
Word macros
Ferret (semi-automatic learning software)
TEI DTD: select / customize / enhance
Manual correction of the automatically obtained output
leXkit: dictionary editing environment
Three on-line versions, two CDs, three print editions
DBE: CD and on-line (3rd version)
[Screenshot labels: entry look-up, index look-up, orthographic help, letter indexes, image, request, response, cross-references, other functionality]
Dictionary editing environments
Essential if databases or markup languages are chosen for dictionary knowledge representation
Wish list
all kinds of editing facilities: XML-transparent, navigation facilities, cross-reference building, wizards...
integrity constraint checking and consistency
multimedia integration
import facilities
collaborative editing
Wiktionary
discussion forums
• Ultralingua (online discussion forum)
• Leo collaborative bilingual dictionaries
Dictionary editing environments
Wish list (cont'd)
customized output: dictionary publication
different dictionary products:
• unabridged dictionary
• student's dictionary
• ...
export formats:
• electronic versions: XML, HTML, other formats...
• print: PDF, desktop publishing software...
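One wish-list item, integrity constraint checking, can be sketched as a check that every cross-reference points to an existing headword. This is an illustrative Python example using the standard library's ElementTree; the `<dictionary>` wrapper element, the function name `dangling_refs` and the two-entry sample are assumptions, with element names borrowed from the TEI-style DBE entry shown at the end of the presentation.

```python
import xml.etree.ElementTree as ET

# Invented two-entry sample; "flaquear" is deliberately left without an entry.
DICTIONARY = """
<dictionary>
  <entry><form><orth>decaer</orth></form>
    <xr><ref>debilitar</ref><ref>flaquear</ref></xr></entry>
  <entry><form><orth>debilitar</orth></form></entry>
</dictionary>
"""

def dangling_refs(xml_text):
    """Return (source headword, target) pairs whose target has no entry."""
    root = ET.fromstring(xml_text)
    headwords = {orth.text for orth in root.iter("orth")}
    problems = []
    for entry in root.iter("entry"):
        source = entry.findtext("form/orth")
        for ref in entry.iter("ref"):
            if ref.text not in headwords:
                problems.append((source, ref.text))
    return problems

print(dangling_refs(DICTIONARY))
```

An editing environment could run a check of this kind whenever an entry is saved or deleted, so that broken cross-references are reported before publication.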
A real case: leXkit (Ixa NLP Group)
leXkit: a dictionary content management system (Alegria et al., 2006c)
Dictionary editing and maintenance
XML-based: Berkeley DB XML, a native XML database, for storage
Client-server architecture: SOAP-based communication
Suitable for different kinds of dictionaries
Main features:
Allows adding, deleting and modifying entries in a user-friendly way: XML details are transparent to the lexicographer
Provides the lexicographers with all the features of a full-fledged DBMS: full search capabilities, safe storage, concurrent access, etc.
leXkit
Main features (cont'd):
Maintains entry states (version control and tracking)
Automatically generates the files and components needed by a running application such as the current electronic DBE.
Tailored output is feasible: the data required for print editions, diversified electronic versions, etc. can be exported easily.
Architecture
Client
The component used by the lexicographer
Tool integration (corpora, other dictionaries...)
Server: database, concurrency, configuration files (dictionary schema definitions, wizards, etc.), import/export utilities, backups...
leXkit
[Screenshot of the leXkit client, with these labelled areas]
Index: dictionary entries, search results
Editor: edition tree, predefined tasks
Viewer: entry preview (WYSIWYG), integrated tools
edition textbox
dictionary tabs
leXkit
[Screenshot labels] Viewer: XML tab, entry info, session control, ...
views and info tabs
leXkit: system architecture
leXkit
Communication (client / server)
SOAP web services (RPC model + cookies)
Intermediate declarative layer (XML)
Dictionary specifications
Operations (context-dependent tasks)
Wizards (common editing operations, predefined searches...)
Other technical aspects
XSLT is widely used in the application
XSLTi: declarative language that adds interactivity to XSLT scripts
XML processing: Xerces + Xalan
Graphical interface: wxWidgets
HTML rendering: Mozilla (wxMozilla)
leXkit: wizards for the DBE
leXkit: conclusions
leXkit has been used at the CLA for editing the DBE's 2nd and 3rd editions: from 7,473 entries / 14,013 senses in the 1st edition to 10,557 entries / 19,374 senses in the 3rd one.
leXkit was a vital tool in the qualitative leap made by this work.
Dictionary editing applications are a must, especially if dictionaries are stored in databases or XML-encoded.
leXkit can be used by other lexicographical teams to create and update dictionaries. It is available as free software (open source) at http://sourceforge.net/projects/lexkit/.
Dictionary representation
Representation is the key factor for dictionary functionality
we won't get what is not stored and adequately represented in the dictionary
the representation we choose conditions what we will later be able to get from the dictionary
Physical level
text (no access facilities, deficient structuring)
plain or somehow structured (CSV, tabular...)
rich text: typography, word processors
even the entry concept is diluted sometimes
risk: vicious circle (to be avoided)
Dictionary representation
Physical level (cont'd)
database: relational (structure, indexing, query and update facilities)
one database = one dictionary
• is each pertinent information unit correctly represented in a field or column?
integrated dictionary system (publishers)
• publisher's general dictionary database
marked text
HTML: mark-up language, presentation-oriented
SGML / XML: mark-up metalanguage, content-oriented
Dictionary representation
content-oriented marked text constitutes a better data model for the representation of dictionary content and structure than the relational model
lexical information is inherently complex
apparently similar information is represented in dictionaries in structurally different ways
intra-entry hierarchical structure is not adequately represented using the relational model
the information must be split in several tables: redundancy, factorization problems
construction of user-friendly graphical user interfaces is not always easy
query languages are often complex and non-intuitive
Dictionary representation
content-oriented marked text... (cont'd)
content-oriented marked texts (SGML, XML...)
descriptive markup (structure, content)
more flexible data representation model
better reflects the lexicographic data model used in dictionaries
drawback: manageability and efficiency
XML native databases: indexing, query and update facilities
TEI (Text Encoding Initiative): a whole chapter full of recommendations on marking up human-oriented dictionaries
Dictionary representation
Physical level (cont'd)
dictionary knowledge bases: reasoning, artificial intelligence techniques, knowledge representation languages
the only way to extract implicit knowledge from dictionary structures
Dictionary representation
Conceptual level
what information/knowledge is represented?
orthography, pronunciation, grammar (mostly POS), register, definition...
morphology? irregular inflection paradigms?
• important in learner's dictionaries, highly inflected languages... but not only
• two real cases (Ixa NLP Group)
• Elhuyar eu-es (MS Word plugin): eu and es lemmatization
• UZEI synonyms (MS Word plugin): eu lemmatization
Dictionary representation
Conceptual level (cont'd)
dictionary typology
monolingual / bi- or multilingual
language dictionary / encyclopedic
general use / specific (terminology)
...
implicit knowledge: in definitions, examples, lexical semantics
WordNet, thesauri...
association lists, semantic networks
inference, reasoning
Publication: presentation, output
print
how to obtain the "file" to submit to the publisher?
electronic
typology
on-line
• on-line dictionaries (free, subscriptions...)
• dictionary directories: OneLook Dictionary Search
• multi-dictionary access tools: Euskalbar, a Firefox plugin that integrates ~30 dictionaries and corpora
• the web (corpus) as a dictionary
• translation memories, parallel corpora
Publication: presentation, output
typology (cont'd)
desktop dictionary software
• standalone applications: personal computers, small handheld devices, mobile phones...
• integrated dictionaries, plugins: in word processors, web browsers...
• Elhuyar eu-es (MS Word plugin)
• UZEI synonyms (MS Word plugin)
• multi-dictionary tools: Babylon
machine-readable dictionaries: PDF...
formats: HTML, XML, PDF, PS, electronic book formats, application proprietary formats...
Publication: presentation, output
Which is the way leading from the editing environment or database to the print or to the electronic version?
DBE
CD and on-line: XML to HTML (dynamic transformation, XSLT)
print: XML to PDF (XSL-FO)
Hiztegi Batua (Euskaltzaindia, Basque Language Academy):
on-line: XML to HTML (XSLT)
publishing: HTML to Quark (manually)
download: Quark to PDF
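The XML-to-HTML step can be illustrated with a short sketch. The pipelines above use XSLT; the Python code below is only a stand-in that uses the standard library's ElementTree to show the same idea, namely deriving presentation from content-oriented markup. The function name `entry_to_html`, the chosen HTML tags and the shortened definition text are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Abridged entry modelled on the DBE sample; the definition is shortened here.
ENTRY = (
    '<entry><form><orth>decaer</orth></form>'
    '<gramGrp><pos>vintr.</pos></gramGrp>'
    '<sense n="1"><def>Ir a menos.</def></sense></entry>'
)

def entry_to_html(xml_text):
    """Render headword in bold, part of speech in italics, then numbered senses."""
    entry = ET.fromstring(xml_text)
    parts = [
        f"<b>{escape(entry.findtext('form/orth'))}</b>",
        f"<i>{escape(entry.findtext('gramGrp/pos'))}</i>",
    ]
    for sense in entry.findall("sense"):
        parts.append(f"{escape(sense.get('n'))}. {escape(sense.findtext('def'))}")
    return "<p>" + " ".join(parts) + "</p>"

print(entry_to_html(ENTRY))
```

Since the typography is decided only at this stage, the same XML source can feed a web page, a CD application or a print layout by swapping the rendering rules.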
Publication: presentation, output
Which is the way... (cont'd)
other solutions:
[general dictionary]
• Oracle to HTML (web)
• Oracle to Quark (print)
[terminological dictionary]
• 4D to Quark (print)
• 4D to XML (TBX) to XHTML (web)
customized output: proprietary formats (mainly in desktop dictionary software)
The longer the way... the easier it is to get lost!
updates will be more costly
Use: functionality
Use cases
language input: typical lookup (definitions, multiword expressions...)
language output: is the dictionary well suited to language production situations?
much more information is needed when we want to actually use a word in speech or in writing than when we only want to understand a word in a passage.
translation tasks: language input and output
special information is needed: faux amis...
language learning activities: more information is needed about context of use, connotations of a word, collocations, etc.
Use: functionality
Users (models, profiles)
native speakers
language learners
translators
students, children...
specialists: scientists, technicians...
Functionality
do we get from electronic dictionaries what we could expect from them?
are they something more than their print counterparts?
Dictionaries of the future: http://www.oxforddictionaries.com/page/84
Print dictionaries have been joined by dictionaries in electronic form: these are often enriched with many additional features, such as sound recordings or sophisticated links to other related material.
...
It seems likely that by the middle of this century, if not before, all dictionaries will be in electronic form. This means that limitations of space, which have always been a serious issue for lexicographers and dictionary publishers, will be much less important. Dictionaries will be able to include more material: more words and definitions, interactive features, and multimedia content such as images, sound, and video. They will also be updated much more rapidly than ever before. But the general idea of a dictionary - a resource that provides explanations of words and how they are used - will probably remain the same.
Use: functionality
Functionality (cont'd)
what we get
search facilities: from basic lookup to advanced queries
speed, storage facilities
orthographic help (closeness)
integration: word processors, reading applications...
new features: multimedia (recorded sounds, images, videos), hyperlinks
interactivity?
wish list
definition and examples: corpus queries
navigation: fully hyperlinked (lemmatization of definitions, examples...)
morphology, grammar, derivation...
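The "orthographic help (closeness)" feature listed under "what we get" can be approximated with the Python standard library. This is a minimal sketch, not the method used in any of the dictionaries mentioned: the headword list and the function name `suggest` are invented, and `difflib.get_close_matches` ranks candidates by a generic string-similarity ratio rather than by language-specific spelling rules.

```python
import difflib

# Invented headword list, taken from words used on the opening slides.
HEADWORDS = ["shower", "shrink", "shrunk", "sheep", "skin"]

def suggest(query, headwords=HEADWORDS, n=3, cutoff=0.6):
    """Return up to `n` headwords whose spelling is close to the mistyped query."""
    return difflib.get_close_matches(query.lower(), headwords, n=n, cutoff=cutoff)

print(suggest("shrnk"))
```

Raising `cutoff` makes the suggestions stricter; lowering it returns more distant candidates.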
Use: functionality
wish list (cont'd)
use of words, lexical combinatorics, collocations
• dictionary and corpus integration?
find a word from its definition, explore related concepts...:
OneLook Reverse Dictionary (statistical language processing)
intelligent dictionary? why not integrate different kinds of information and tools (WordNet, thesauri, multimedia, collocations, thematic...) in powerful language help systems, and provide them with inference and reasoning capabilities?
• Hiztsua / SIAD (Artola, 1993; Agirre et al. 1994a, 1994b, 1997)
• AnHitz (Arregi, 1995; Agirre et al. 1996, 2000)
Have we sufficiently investigated how users use dictionaries?
Hiztsua / SIAD: Intelligent Dictionary Help System
Built from a small French dictionary: Le Plus Petit Larousse (Librairie Larousse. Paris, 1980).
Definitions parsed using NLP techniques: morphology, syntax, definition patterns, lexico-semantic relationships
Building procedure:
LPPL (typed directly into a DB GUI)
Dictionary Database (DDB, relational)
Dictionary Knowledge Base (DKB)
DKB: interrelated network of concepts (semantic network):
hypernymy/hyponymy
synonymy, antonymy
meronymy
semantic roles
Hiztsua / SIAD: Intelligent Dictionary Help System
Frame-based system, allowing inheritance, inference, composition of lexical relationships
Prototype conceived and designed for human users
from the study of questions that human users would like to have answered when consulting a dictionary
Functionality that makes it possible to extract and infer implicit knowledge hidden in the dictionary structures
definition queries, searches of alternative definitions
differences, relations and analogies between concepts
thesaurus-like word search
verification of concept properties and of interconceptual relationships
...
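How inheritance over hypernymy links lets such a system answer a question like "How many legs has a fly?" can be sketched in a few lines of Python. This is a toy illustration, not the Hiztsua / SIAD implementation: the concepts, properties and function names are invented and do not come from the LPPL data.

```python
# Invented toy network: each concept has at most one hypernym and some own properties.
HYPERNYM = {"fly": "insect", "insect": "animal"}
PROPERTIES = {
    "animal": {"alive": True},
    "insect": {"legs": 6},
    "fly": {"wings": 2},
}

def lookup(concept, prop):
    """Walk up the hypernym chain until some frame defines `prop`."""
    while concept is not None:
        if prop in PROPERTIES.get(concept, {}):
            return PROPERTIES[concept][prop]
        concept = HYPERNYM.get(concept)
    return None

def is_a(concept, ancestor):
    """Verify a hypernymy relationship, directly stated or inherited."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = HYPERNYM.get(concept)
    return False

print(lookup("fly", "legs"))    # inherited from "insect"
print(is_a("fly", "animal"))    # inferred through "insect"
```

The number of legs is never stated for "fly" itself; it is implicit knowledge recovered by following the hypernymy link, which is the kind of inference the slide refers to.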
Anhitz: A translator-oriented Dictionary System
Intelligent help system for human translators
the dictionary is conceived as an "active" tool that observes the activity of the user while he or she is working, providing him or her with "intelligent help"
Prototype based on two monolingual dictionaries (French and Basque):
two monolingual knowledge bases
one bilingual DKB establishes equivalence links between concepts from the monolingual dictionaries
diverse types of equivalence relationships: more general, more specific...
Anhitz: A translator-oriented Dictionary System
Functionality:
empirical observation and study, using protocols and questionnaires, of the activity of professional and non-professional translators
to model the translator-dictionary interaction when translating lexical units from the source language into the target language
user's goals and intentions, dictionary queries made, observations, etc. have been recorded
monolingual and bilingual, locution, synonym... dictionaries
real use cases
functions classified basically according to three main activities:
source text understanding
target text generation
search for translation equivalents
Anhitz: A translator-oriented Dictionary System
[Diagram (trans-lex), box labels: TRANSLATING THE SOURCE WORD; GETTING THE CONTEXT; SOURCE WORD UNDERSTANDING; SEARCHING FOR THE EQUIVALENT; TARGET WORD GENERATION; ACQUIRING THE MODEL OF THE TRANSLATOR]
Anhitz: A translator-oriented Dictionary System
[Diagram, box labels: TRANSLATING THE SOURCE WORD; TARGET WORD GENERATION; FINDING GENERATION HYPOTHESES; DISCRIMINATING GENERATION HYPOTHESES; GENERATION HYPOTHESIS VERIFICATION; FROM THE DICTIONARY ENTRY TO THE LEXICAL UNIT. Function labels: exam, rths, comp-sem, sint-pat, prod, colloc, ver-reg, dis-pro, dpro]
Anhitz: A translator-oriented Dictionary System
Primitive functions:
morphological analysis of a word form
choice of a dictionary entry / word sense in a given context
list of the possible senses that could be suitable for a word in a given context
definition request
reformulation of a definition
request of the properties of a concept
choice of a definition in a given context
request of differences or relationships between two concepts
verification of relationships between two concepts
definition verification
Anhitz: A translator-oriented Dictionary System
Primitive functions (cont'd):
verification of the properties of a concept
thesaurus-like search of concepts
request of examples
direct lexical translation of a word form
verification of translation equivalents
semantic compatibility between two word senses according to a given relationship
search for syntactic constructions corresponding to a given pattern
search of lexical collocations
request of the verb regime
search for potential translation equivalents
To finish...
Dictionary editing: provide the lexicographer with advanced tools
Stress the importance of dictionary knowledge representation: we will get what we keep, and we will get it if we represent it adequately for the purpose required
We should investigate how users actually use dictionaries, in order to build more "intelligent" systems, capable of anticipating users' needs and helping them better
The dictionary of the future should be a "different" thing, not merely a "faster" print dictionary
integration of different kinds of information and tools in powerful language help systems
rich and heterogeneous functionality and access ways to the lexicon
Bibliography
Miyares Bermúdez E., Ruiz Miyares L., Álamo Suárez C., Pérez Marqués C., Artola Zubillaga X., Alegria Loinaz I., Arregi Iparragirre X. 2010a. Las últimas ediciones del Diccionario Básico Escolar de Cuba. IV Congreso Internacional de Lexicografía Hispánica. Tarragona.
Miyares Bermúdez E., Ruiz Miyares L., Álamo Suárez C., Pérez Marqués C., Artola Zubillaga X., Alegria Loinaz I., Arregi Iparragirre X. 2010b. La segunda y tercera ediciones del Diccionario Básico Escolar. Euralex 2010. Leeuwarden (The Netherlands).
Arregi X., Arriola J.M., Artola X., Díaz de Ilarraza A., Garcia E., Lascurain V., Soroa A., Uria L. 2007. Semiautomatic Construction of the Electronic Euskal Hiztegia Basque Dictionary (eEHBD). The XVIth biennial conference of the Dictionary Society of North America, Chicago.
Alegria I., Arregi X., Artola X., Astiz M., Ruiz Miyares L. 2006a. Different issues in the design and development of the electronic Cuban Basic School Dictionary. E. Miyares, L. Ruiz eds., Linguistics in the Twenty First Century, 273-288. Cambridge Scholars Press, UK. ISBN: 1904303862.
Alegria I., Arregi X., Artola X., Astiz M., Ruiz Miyares L. 2006b. Building an Electronic Version of the Cuban Basic School Dictionary. Proceedings EURALEX 2006 I, 243-250 (Turin, Italy). (ISBN 88-7694-918-6).
Alegria I., Arregi X., Artola X., Astiz M., Ruiz Miyares L. 2006c. A Dictionary Content Management System. Proceedings EURALEX 2006 I, 105-109 (Turin, Italy). (ISBN 88-7694-918-6).
Soroa, A. Izaera heterogeneoko baliabide lexikalen integraziorako arkitektura baten proposamena. Datu-integrazioaren ikuspegitik egindako ekarpena. PhD Thesis. Informatika Fakultatea, UPV-EHU. 2004.
Arregi X., Arriola J., Artola X., Díaz de Ilarraza A., García E., Laskurain B., Sarasola K., Soroa A., Uria L. 2003. Semiautomatic conversion of the Euskal Hiztegia Basque Dictionary to a queryable electronic form. T.A.L. journal. vol 44, num 2 p 107-124 ISSN: 1248-9433.
Arriola J., Artola X., Soroa A. 2003. Automatic Extraction of verb patterns from Hauta-Lanerako Euskal Hiztegia. B. Oyharçabal ed., Inquiries into the lexicon-syntax relations in Basque. Supplements of ASJU no. XLVI (ISBN: 84-8373-580-6), 127-146. UPV/EHU, Bilbo.
E. Agirre, X. Arregi, X. Artola, A. Díaz de Ilarraza, F. Evrard, K. Sarasola, A. Soroa. 2003. An Intelligent Dictionary Help System. Encyclopedia of Library and Information Science, 2nd. Edition (ISSN/ISBN: 0-8247-2075-X [print]; 0-8247-4259-1 [web]), 1390-1401. Allen Kent (Marcel Dekker, Inc.), New York.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 2001. MLDS: A Translator-Oriented MultiLingual Dictionary System. Natural Language Engineering, 5 (4), 325-353. ISSN: 1351-3249. Cambridge University Press.
Agirre E., Ansa O., Arregi X., Artola X., Díaz de Ilarraza A., Lersundi M., Martinez D., Sarasola K., Urizar R. 2000. Extraction Of Semantic Relations From A Basque Monolingual Dictionary Using Constraint Grammar. Proceedings of Euralex Stuttgart (Germany). 2000. ISBN 3-00-006574-1.
Arriola, J.M. Euskal Hiztegia-ren azterketa eta egituratzea ezagutza lexikalaren eskuratze automatikoari begira. Aditz-adibideen analisia murriztapen-gramatika baliatuz, azpikategorizazioaren bidean. PhD Thesis. Filologia eta Historia-Geografia Fakultatea, UPV-EHU, 2000.
Patrick J., Zhang J., Artola X. 2000. An Architecture and Query Language for a Federation of Heterogeneous Dictionary Databases. Computers and the Humanities (ISSN: 0010-4817).
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 2000. A Methodology For Building Translator-Oriented Dictionary Systems. Machine Translation Journal. ISSN: 0922-6567. Kluwer Academic Publishers. V. 15 nº 4. pp. 295-310. 2000.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 1999. Un Diccionario activo vasco-castellano en un entorno de escritura. VI Simposio Internacional de Comunicación Social. Santiago de Cuba, 25-28 de Enero de 1999.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Soroa A. 1997. Constructing an intelligent dictionary help system. Natural Language Engineering 2(3): 229-252. ISSN: 1351-3249. Cambridge University Press. Cambridge. 1997.
Arriola J., Artola X., Soroa A. 1996. Hauta-lanerako Euskal Hiztegiaren analisi erdiautomatikoa. ASJU, Anuario del Seminario de Filología Vasca.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Ezeiza N., Sarasola K., Soroa A., A. Agirre, Patel H. 1996. Design of a translator-oriented dictionary: Enhancement of a dictionary knowledge base by task modelling. Le traitement automatique du langage et les applications industrielles/Natural Language Processing and Industrial Applications. (NLP + IA96), Volume I, pp 1-6. Moncton, Canada. 1996.
Arriola J., Artola X., Soroa A. 1996. Automatic extraction of lexical information from an ordinary dictionary. EURALEX'96, Göteborg (Sweden).
Patrick J., Zhang J., Artola X. 1996. An Architecture for a Federation of Heterogeneous Lexical and Dictionary Databases. Joint International Conference ALLC/ACH'96, 221-225. Bergen (Norway).
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F. 1995. IDHS, MLDS: Towards Dictionary Help Systems for Human Users. Semantics And Pragmatics Of Natural Language: Logical And Computational Aspects. K. Korta & J. M. Larrazabal (Eds.), ILCLI Series, n. 1. Donostia.
Arriola J., Artola X., Soroa A. 1995. Análisis automático del diccionario Hauta-Lanerako Euskal Hiztegia. Procesamiento del lenguaje natural (SEPLN), Revista no. 17, 173-181. Bilbo.
Arregi, X. ANHITZ: Itzulpenean laguntzeko hiztegi-sistema eleanitza. PhD Thesis. Informatika Fakultatea, UPV-EHU, 1995.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K. 1994a. Lexical Knowledge Representation in an Intelligent Dictionary Help System. Proceedings of COLING'94, vol. 1, 544-550. Kyoto (Japan).
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F. 1994b. Intelligent dictionary help systems. Applications and Implications of current LSP Research. Eds. Brekke, M.; Andersen, I.; Dahl, T. & Myking, J., v. 1., 174-183. Fagbokforlaget (Norway).
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K. 1994c. Analysing world-level translation activity to design a computerised dictionary. Proceedings of Euralex'94. Amsterdam.
Agirre E., Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K. 1994d. A methodology for the extraction of semantic knowledge from dictionaries using phrasal patterns. Proceedings of IBERAMIA'94. IV Congreso Iberoamericano de Inteligencia Artificial. McGraw-Hill, 263-270. Caracas (Venezuela).
Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F. 1993. Sistema Diccionarial Multilingüe: aproximación funcional. Revista de la Asociación Española para el Procesamiento del Lenguaje Natural. Vol: 14, pp: 313-335. ISSN: 1135-5948.
Artola, X. HIZTSUA: Hiztegi-sistema urgazle adimendunaren sorkuntza eta eraikuntza. Hiztegi-ezagumenduaren errepresentazioa eta arrazonamenduaren ezarpena. / Conception et construction d'un système intelligent d'aide dictionnariale (SIAD). Acquisition et représentation des connaissances dictionnariales, établissement de mécanismes de déduction et spécification des fonctionnalités de base. PhD Thesis. Informatika Fakultatea, UPV-EHU, 1993.
Artola X., Evrard F. 1992. Dictionnaire intelligent d'aide à la compréhension. Actas IV Congreso Internacional EURALEX'90 (Benalmádena), 45-57. Barcelona.
Arregi X., Artola X., Díaz de Ilarraza A., Sarasola K., Evrard F. 1991. Aproximación funcional a DIAC: diccionario inteligente de ayuda a la comprensión. Revista de la Asociación Española para el Procesamiento del Lenguaje Natural. Vol: 11, pp: 127-138. ISSN: 1135-5948.
RTF: Rich Text Format (MS Word)
{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose02020603050405020304}Times New Roman;}
…{\b\f69\fs16 aberastasun}{\fs14 .}{\b\i\fs14 }{\i\fs14 iz. }{\fs14 (1617; }{\i\fs14 abrastasun}{\fs14 1571).}{\b\fs14 1}{\i\fs14 . }{\fs14 Ondasun edo gauza baliotsuen ugaritasuna}{\i\fs14
. Aberastasunak ematen du aginpidea. }{\fs14 Ik. }{\b\fs14 diru}{\b\i\fs14 . }{\i\fs14Aberastasunez betea. Ez ohorerik ez aberastasunik. Garai hart
an Espainia guztian omen zen baso-oihanetan aberastasun handia. Basoetako aberastasuna. Zein zitezkeen gereziketa eta fruitu aberastasun horren iturburuak. }{\f69\fs12 II}{\fs14 }{\i\fs14 Pl. }{\fs14 Norbaitek dituen ondasun eta gauza baliozkoak}{
\i\fs14 . Herri baten aberastasunak eta baliabideak. Aberastasun galkorren ondoan ibiltzea. Euskarak bere baitan dituen aberastasunak. Aberastasunen banaketa zuzena. Aberastasunak hondatu. }{\b\fs14 2}{\fs14 .}{\i\fs14 }{\fs14
Aberatsa denaren nolakotasuna. Ant}{\i\fs14 . }{\b\fs14 pobretasun}{\fs14 ;}{\b\fs14behartasun}{\b\i\fs14 . }{\i\fs14 Aberastasunean bizi. Pobretasunetik aberastasunera. Aberastasunaren arriskuak.
\par }\pard \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 {
\par }}
Simplified DCG grammar to parse EH entries
Entry => Hdw [Relations] Category [date] [DefExamples].
Hdw => [Homograph] [NonStdHdw | StdHdw].
Homograph => bh number eh.
NonStdHdw => cross bb hdw eb.
StdHdw => bb hdw eb.
Category => [subc] Category.
Category => bi cat ei.
DefExamples => Def [Examples] DefExamples | ε.
Def => [SenseNumber][SenseGroup] def [Relations].
SenseNumber => bs number es.
SenseGroup => bsg grouptag esg.
Relations => [SynRel | AntRel] Relations [Examples] | ε.
SynRel => bsy synonyms esy.
AntRel => ba antonyms ea.
Examples => bi examples ei.
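To show how grammar rules of this kind turn a stream of typographic tokens into a structured entry, here is a small recursive-descent sketch in Python. It is not the Prolog DCG used for eEH: it covers only a fragment (StdHdw, Category, an optional date, and numbered definitions), and the class name, the (kind, text) token format and the token stream, hand-built from the "aberastasun" RTF sample, are all assumptions.

```python
class EntryParser:
    """Recursive-descent parser for a small fragment of the entry grammar."""

    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) pairs
        self.pos = 0

    def peek(self):
        """Kind of the next token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        """Consume a token of the given kind and return its text."""
        if self.peek() != kind:
            raise ValueError(f"expected {kind!r} at token {self.pos}, found {self.peek()!r}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def bracketed(self, start, inner, end):
        """Match `start inner end` and return the text of `inner`."""
        self.expect(start)
        text = self.expect(inner)
        self.expect(end)
        return text

    def sense(self):
        # Def => [SenseNumber] def ; SenseNumber => bs number es
        number = self.bracketed("bs", "number", "es") if self.peek() == "bs" else None
        return {"n": number, "def": self.expect("def")}

    def entry(self):
        # Simplified: Entry => StdHdw Category [date] {Def}
        result = {
            "hdw": self.bracketed("bb", "hdw", "eb"),   # StdHdw => bb hdw eb
            "cat": self.bracketed("bi", "cat", "ei"),   # Category => bi cat ei
            "date": self.expect("date") if self.peek() == "date" else None,
            "senses": [],
        }
        while self.peek() in ("bs", "def"):
            result["senses"].append(self.sense())
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r} at position {self.pos}")
        return result

# Hypothetical token stream for the "aberastasun" entry.
TOKENS = [
    ("bb", ""), ("hdw", "aberastasun"), ("eb", ""),
    ("bi", ""), ("cat", "iz."), ("ei", ""),
    ("date", "1617"),
    ("bs", ""), ("number", "1"), ("es", ""),
    ("def", "Ondasun edo gauza baliotsuen ugaritasuna"),
    ("bs", ""), ("number", "2"), ("es", ""),
    ("def", "Aberatsa denaren nolakotasuna"),
]

parsed = EntryParser(TOKENS).entry()
print(parsed["hdw"], parsed["cat"], len(parsed["senses"]))
```

Each method mirrors one grammar rule, so extending the fragment (homographs, relations, examples) means adding one method per rule.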
TEI XML encoding (DBE entry)
<entry id="d_d0e1701">
  <form>
    <orth>decaer</orth>
    <syll>de|ca|er</syll>
  </form>
  <gramGrp>
    <pos>vintr.</pos>
    <itype>(33)</itype>
  </gramGrp>
  <sense n="1">
    <def>Ir a menos, perder una persona o cosa parte de las propiedades que le daban su fuerza o valor.</def>
    <eg><q>Con el paso del tiempo, su interés <oVar type="?">decayó</oVar>.</q></eg>
    <xr><lbl>Sin.</lbl><ref>debilitar</ref><ref>disminuir</ref><ref>flaquear</ref><ref>desfallecer</ref></xr>
  </sense>
  <form type="infl">
    <orth>decaído</orth>
    <gram>(p.p.)</gram>
  </form>
</entry>
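Because the encoding is content-oriented, each piece of information can be retrieved by name. The following Python sketch uses the standard library's ElementTree on an abridged copy of the entry above; the `record` dictionary and its keys are assumptions made for the example, not part of the DBE software.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the DBE entry (definition, example and inflected form omitted).
ENTRY = """<entry id="d_d0e1701">
  <form><orth>decaer</orth><syll>de|ca|er</syll></form>
  <gramGrp><pos>vintr.</pos><itype>(33)</itype></gramGrp>
  <sense n="1">
    <xr><lbl>Sin.</lbl><ref>debilitar</ref><ref>disminuir</ref><ref>flaquear</ref><ref>desfallecer</ref></xr>
  </sense>
</entry>"""

entry = ET.fromstring(ENTRY)
record = {
    "headword": entry.findtext("form/orth"),
    "syllables": entry.findtext("form/syll").split("|"),
    "pos": entry.findtext("gramGrp/pos"),
    "synonyms": [ref.text for ref in entry.findall("sense/xr/ref")],
}
print(record)
```

None of these lookups depends on typography, which is the practical meaning of "we will get what we keep": the synonyms are retrievable only because they were marked up as cross-references.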
DBE: print version (3rd ed.)
[Page image with callouts: page markers, figure refs.]