Universal Networking Language: Advances in Theory and Applications
Research on Computing Science
Series Editorial Board
Chief Editors:
Juan Luis Díaz de León S. (Mexico), Gerhard Ritter (USA)
Jean Serra (France), Ulises Cortés (Spain)
Associate Editors:
Jesús Angulo (France), Oscar Camacho (Mexico)
Jihad El-Sana (Israel), Jesús Figueroa (Mexico), Alexander Gelbukh (Russia), Ioannis Kakadiaris (USA), Serguei Levachkine (Russia)
Petros Maragos (Greece), Julian Padget (UK), Miguel Torres (Mexico), Mateo Valero (Spain), Cornelio Yáñez (Mexico)
Editorial Coordination:
José Ángel Cu Tinoco, Miguel R. Silva Millán
Format:
Sulema Torres Ramos, Hiram Calvo Castro
Research on Computing Science is a quarterly publication with international circulation, published by the Centro de Investigación en Computación of IPN to disseminate advances in scientific research and technological development from the international scientific community. Volume 12, February 2005. Print run: 500 copies. Certificate of Reservation of Rights to the Exclusive Use of the Title No. 04-2004-062613250000-102, issued by the Instituto Nacional de Derecho de Autor. Certificate of Lawfulness of Title No. 12897 and Certificate of Lawfulness of Content No. 10470, issued by the Comisión Calificadora de Publicaciones y Revistas Ilustradas. The content of the articles is the sole responsibility of their respective authors. Total or partial reproduction by any means is prohibited without the express permission of the publisher, except for personal or study use with explicit citation on the first page of each document. Printed in Mexico City at the Talleres Gráficos del IPN, Dirección de Publicaciones, Tres Guerras 27, Centro Histórico, México, D.F. Distributed by the Centro de Investigación en Computación, Av. Juan de Dios Bátiz S/N, Esq. Av. Miguel Othón de Mendizábal, Col. Nueva Industrial Vallejo, C.P. 07738, México, D.F. Tel. 57 29 60 00, ext. 56571.
Responsible Editor: Juan Luis Díaz de León Santiago, DISJ 690604
Research on Computing Science is published by the Centre for Computing Research of IPN. Volume 12, February, 2005. Printing 500. The authors are responsible for the contents of their articles. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior permission of Centre for Computing Research. Printed in Mexico City, February, 2005, in the IPN Graphic Workshop – Publication Office.
Volume 12
Universal Networking Language: Advances in Theory and Applications
Volume Editors:
Jesús Cardeñosa, Alexander Gelbukh, Edmundo Tovar
Instituto Politécnico Nacional
Centro de Investigación en Computación
México 2005
ISBN: 970-36-0226-6
ISSN: 1665-9899
Copyright © Instituto Politécnico Nacional 2005
Copyright © by Instituto Politécnico Nacional
Instituto Politécnico Nacional (IPN)
Centro de Investigación en Computación (CIC)
Av. Juan de Dios Bátiz s/n esq. M. Othón de Mendizábal
Unidad Profesional “Adolfo López Mateos”, Zacatenco
07738, México D.F., México
http://www.ipn.mx
http://www.cic.ipn.mx
Printing: 500
Printed in Mexico
Preface
This volume [1] is an attempt to compile and illustrate all the open lines of research within the UNL initiative. The included papers are a selection of the most significant papers presented at several international conferences and workshops during the last four years that served as meeting points for the UNL consortium. In general, the papers are not restricted to UNL, although UNL papers clearly predominate; they illustrate the breadth and flexibility of the UNL initiative, launched by the United Nations with the aim of eliminating linguistic barriers.
Since the start of the UNL project in 1996, the participants in the project, initially representing 15 languages, have made substantial progress in technical matters and in the organizational aspects involved as well. This book attempts to provide a survey of the approaches and theoretical studies around UNL. Research on UNL is not devoted only to studies on interlinguas, MT or other NLP-related issues: the intrinsic properties of UNL make it a firm candidate to support a wide variety of applications, ranging from e-learning platforms to the management of multilingual document bases. This variety of applications, their theoretical basis and the subsequent methodological inquiries are at the core of this volume.
What is UNL? Its motivation and purpose
The emerging needs and use of the Internet for cultural and educational dissemination and commercial expansion of the peoples collide with linguistic diversity, which in principle diminishes the potential of the Internet as a vehicle of knowledge for everybody. Aware of this problem, the Institute of Advanced Studies of the United Nations University (UNU/IAS) launched the UNL project in 1996 with the initial participation of 15 languages (German, Arabic, Chinese, Spanish, French, Hindi, Indonesian, English, Italian, Japanese, Latvian, Mongolian, Portuguese, Russian, Thai). In short, the UNL Programme was initially conceived to support multilingual services on the Internet, as an alternative to classical machine translation systems.
The UNL system revolves around a single artificial language (Universal Networking Language) that aims to capture the meaning of written documents. This language is based on the representation of concepts and their relations. The definition of this language has been possible thanks to the collaboration of more than one hundred people, prestigious researchers and scientists from all around the world, who worked during the first three years of the project to produce a final version of the UNL specifications [2].
[1] Earlier versions of the papers at pages 10, 109, 117, 125, 145, 215, 230, 254, 268, 276, 309, 347, 359, 370, 380 have been published in the Proceedings of Convergences’03, Alexandria, Egypt. Earlier versions of the papers at pages 3, 10, 27, 38, 101, 261, 326 have been published in the Proceedings of LREC-2002.
[2] UNL Specifications, v.3.1, available at http://www.undl.org/unlsys/unl/UNL%20Specifications.htm
The UNL organization
The UNL initiative has often been regarded as a “hidden organization”. The first years of the project (1996-2000) were devoted to the definition of the interlingua and to the development of the essential components required to undertake the basic processes in UNL (mainly dictionaries and language generators). During this period, the organization was closed and limited to a number of participants, because of the need to define the specifications of the language.
By the end of this period, the UNL project had reached a significant degree of quality in the development of components, linguistic resources and technical specifications, and the specifications were finally produced. Once finished, they were made public and accessible to the whole international community, so that collaboration and participation in this initiative are completely open.
As a consequence of this degree of development, the Board of the United Nations University, in its fifth meeting in 2000, agreed on the creation of a new institution responsible for the organization and promotion of UNL in the future under the umbrella of the United Nations. This new entity was the UNDL Foundation, with headquarters in Geneva [3]. The development of the components of the different languages was assigned to the so-called Language Centres, constituted by the initial teams in each country in charge of the development of the essential components of UNL.
The year 2004 represents a turning point in the evolution of UNL for two main reasons. First, it is the year in which a new period begins, coordinated and fostered by the Language Centres, for the debugging, updating and expansion of the linguistic resources and developed components of their respective languages, in order to respond to the institutional and market challenges at a pre-competitive level in the support of multilingual services. Second, it is the year in which the UNL patent was approved in the USA for the UN (US Patent No. 6704700 B1, March 2004). It is the first software patent of the United Nations.
Open nature and scientific dissemination of UNL
Since 2002, an open annual conference on the convergence of language, culture and knowledge has been held as a meeting point for researchers, politicians, linguists and engineers. The most recent edition of this open conference was Convergences’03, held in Alexandria, Egypt. The most significant papers from this conference have been selected and included in this volume. Additionally, an international workshop on UNL and interlinguas was organized in 2002 (International Workshop on UNL, other Interlinguas and their Applications, held at Las Palmas de Gran Canaria, May 2002); papers from this workshop are also compiled in this volume. Finally, we include the papers of the current edition of the UNL Workshop, held in Mexico D.F., February 2005.
These conferences and workshops aim to be a forum where everyone interested in this initiative can find a vehicle for communication and exchange of knowledge. UNL is a great initiative that could never succeed and advance if the number of participants were limited to the initial ones. The heterogeneity of the authors and languages involved in this selection of papers shows the open nature of UNL.
[3] www.undl.org
Research on UNL: Current Trends
Apart from purely applied studies of UNL, there is currently an important trend towards theoretical studies of UNL, even though there is a final version of the specifications of the language, dating to July 2003.
The rationale for such theoretical research is the need for standardization and homogenization in the use of the Interlingua, both at the applied level and at the theoretical level. The UNL Specifications turned out to be subject to different personal interpretations, thus creating individual UNL dialects. This is not desirable for an interlingua that claims to be language independent and that, in fact, turned out to be “person-dependent”. For this reason, it is important and desirable to foster theoretical studies on UNL, both from the linguistic point of view and from the knowledge point of view.
From a scientific point of view, UNL follows the approach of the concept of Interlingua, as an “artificial” language aiming at the neutral representation of linguistic meaning. In this sense its roots can be sought in the tradition of MT interlinguas and in the tradition of Knowledge Representation formalisms.
When viewed as an interlingua, UNL differs from some of its predecessors and from current Interlinguas in its generality of application, that is, UNL is not restricted to a number of languages or to a given domain. Thus, its design aimed to show the highest degree of language independence while retaining natural language expressiveness in order to support multilingual generation tasks.
Of course, the deployment of UNL is so general an enterprise that it requires research and effort. This process can be divided into several periods:
– Creation of deconversion and enconversion modules (see Part 3), that is, development of the basic tools to undertake the basic architecture of the UNL system (enconversion and generation), along with dictionaries. Although basic, it is a conditio sine qua non to have powerful generation systems. This is a fruitful trend in the UNL consortium, with three different approaches:
1. The official one: those using a common engine provided by the UNL Center.
2. The integrative ones: those that have integrated UNL into pre-existing MT systems, following the transfer-based architecture, showing the flexibility of UNL with good results.
3. The new ones: those that have noticed the drawbacks of the official components, and have decided to create new architectures for generation.
It should be noted that emphasis is put on the deconversion process, as quantitatively shown by the number of papers devoted to generation. Teams usually develop generation systems rather than enconversion systems, although the integrative approach usually includes both processes in UNL.
– Application of UNL in other contexts (see Part 3). If UNL is considered as an interlingua, it can be applied in fields and tasks other than multilingual generation, the main ones being knowledge representation and knowledge management.
– Use of external lexical and ontological resources. Also important, and following the spirit of the integrative approaches, is the use of external lexical resources such as WordNet to enhance some of the processes of UNL, especially in the lexicographic part (see also Part 3). This is also a trend and the philosophy of UNL: integration and complementation of resources is encouraged, rather than confrontation. This is the spirit of the consortium and of every work in UNL.
From an engineering point of view, research is undertaken on:
– Creation of methodologies in the workflow.
– Standardization of UNL, integration of UNL into current standards.
Why such methodologies and standards? Because of the heterogeneity and diversity of the current consortium, a process of standardization and methodology definition is needed, since the short- and medium-term objective of UNL is its deployment in the market, where standards and methodologies are required in order to pursue higher productivity and quality. The areas of linguistic engineering and knowledge engineering are calling for such methodologies and processes of standardization.
The Future
After some time developing components and systems to support multilingual services, UNL researchers and new teams have discovered that UNL could support other applications such as cross-lingual information retrieval, knowledge repositories, automatic building of ontologies from texts once represented in UNL, and much more. UNL could be useful in new applications in areas where a common conceptual representation is needed, independent of any particular language. For this, new needs emerge, particularly when putting together semantics and multilingualism. More theoretical studies are needed, along with the tuning of resources and tools, the proper standardization of the interlingua and of the processes for enconverting and deconverting, and of course the integration and definition of the lexical component of UNL.
The Structure of the Book
The volume is divided into four parts.
Part 1. Introduction
This first part is an introduction to the language itself, and its purpose is to place the reader in the UNL context. These introductory papers set out the general philosophy of the language (paper at page 3) and provide a general introduction to the language itself and to the context of multilingual generation, one of the main and most basic “applications” supported by UNL (paper at page 10).
Part 2. Fundamentals
This part is dedicated to theoretical studies on UNL. As already said, UNL is mainly an interlingua. There are many aspects that have to be taken into account when designing an interlingua, such as its expressiveness, degree of language independence, accuracy and formality of the language, etc. Most of these issues are covered in this part. Thus, the part opens with an experiment on the common understandability of UNL by different humans and the admissible degree of indeterminacy and ambiguity in an Interlingua (paper at page 27). Purely theoretical studies on the universality of UNL and its adequacy from a representational and linguistic point of view follow (papers at pages 51 to 101). It has to be pointed out that this part is not exclusively devoted to UNL, but to the field of interlinguas in general (paper at page 38; paper at page 109).
All these papers address the proper design of the Interlingua. However, there is another important aspect worthy of consideration in any artificial language, namely, the syntactic formalism of the formal language and its adequacy to the declared purpose. These topics are addressed in the papers at page 117 and at page 125, where the emphasis is put on the syntactic properties of UNL expressions and their consequences for other issues such as analysis or proper deconversion. Finally, there is a (recurrent) thematic shift: UNL is not viewed as an interlingua to support linguistic tasks, but as a language for knowledge representation (papers at page 138 and at page 145).
These two sides of UNL (as an interlingua to support linguistic tasks and as a knowledge representation language) determine the nature of the applications dealt with in Part 3.
Part 3. Applications
The core applications of UNL are those that support the tasks of NL analysis and generation (enconversion and deconversion in the UNL jargon). When dealing with NLP tasks, the scene is quite heterogeneous: from the use of common generation tools provided by the UNL Center (as shown in papers at pages 215 and 241), to the integration of existing MT systems based on the transfer architecture to support an Interlingua architecture (papers at pages 157 and 230). Other languages are supported with new tools that differ in their configuration and architecture (maybe reflecting language variety, maybe reflecting different ways to support generation and, of course, as an advance over common tools like Deco). Chinese, Brazilian Portuguese, Arabic and Armenian are examples of this, where very different paradigms are illustrated in order to undertake the generation task (papers at pages 167, 175, 195, and 210, respectively).
Papers at pages 254 to 276 illustrate the development of workbenches to support the processes of editing, generation and training, and the creation of multilingual platforms within the UNL framework.
In parallel with the theoretical studies of Part 2, UNL also presents an applied dimension when conceived as a language for knowledge representation (papers at pages 337 and 359). These papers present the use of UNL as an extension (or complement) to the expressiveness of standard languages such as XML (illustrated in papers at pages 300 and 309), as the communication language among agents, developed in the paper at page 326, or as the support of case-based reasoning systems (paper at page 347). Also noteworthy is the possibility of complementation and integration with other lexical and ontological resources such as WordNet (papers at pages 370 and 380) to enhance the processes of knowledge acquisition and representation within the UNL context. Finally, the paper at page 286 shows how to extend the expressiveness of UNL in order to represent and formalize meaning coming from oral sources.
Part 4. Methodologies
Finally, the volume ends with the methodological work. These papers target the creation of methodologies to support multilingual services (papers at pages 395 and 413) and the optimization of knowledge-intensive tasks (paper at page 430). Needless to say, methodologies form an integral part of the UNL R&D activities, as long as productivity, quality and a real consolidation of UNL are pursued both at the scientific and commercial levels.
Mexico D.F., 16th February 2005
Jesús Cardeñosa
Alexander Gelbukh
Edmundo Tovar
Editors
Table of Contents
INTRODUCTION: Setting up UNL
A Rationale for Using UNL as an Interlingua and More in Various Domains ... 3
  Christian Boitet
Standardization of the Generation Process in a Multilingual Environment ... 10
  Jesús Cardeñosa, Carolina Gallardo and Edmundo Tovar
FOUNDATIONS
The UNL Distinctive Features: Inferences from a NL-UNL Enconverting Task ... 27
  Ronaldo Teixeira Martins, Lúcia Helena Machado Rino, Maria das Graças Volpe Nunes, Osvaldo Novais Oliveira Jr.
Issues in Generating Text from Interlingua Representations ... 38
  Stephan Busemann
On the Aboutness of UNL ... 51
  Ronaldo Teixeira Martins, Maria das Graças Volpe Nunes
A Comparative Evaluation of UNL Participant Relations using a Five-Language Parallel Corpus ... 64
  Brian Murphy and Carl Vogel
Some Controversial Issues of UNL: Linguistic Aspects ... 77
  Igor Boguslavsky
Some Lexical Issues of UNL ... 101
  Igor Boguslavsky
The Representation of Complex Telic Predicates in Wordnets: the Case of Lexical-Conceptual Structure Deficitary Verbs ... 109
  Palmira Marrafa
Remaining Issues that Could Prevent UNL to be Accepted as a Standard ... 117
  Gilles Sérasset and Étienne Blanc
Semantic Analysis through Ant Algorithms, Conceptual Vectors and Fuzzy UNL Graphs ... 125
  Mathieu Lafourcade
Term-Based Ontology Alignment ... 138
  Virach Sornlertlamvanich, Canasai Kruengkrai, Shisanu Tongchim, Prapass Srichaivattana, and Hitoshi Isahara
Universal Networking Language: A Tool for Language Independent Semantics? ... 145
  Amitabha Mukerjee, Achla M Raina, Kumar Kapil, Pankaj Goyal, Pushpraj Shukla
APPLICATIONS
About and Around the French Enconverter and the French Deconverter ... 157
  Étienne Blanc
A UNL Deconverter for Chinese ... 167
  Xiaodong Shi, Yidong Chen
Flexibility, Configurability and Optimality in UNL Deconversion via Multiparadigm Programming ... 175
  Jorge Marques Pelizzoni, Maria das Graças Volpe Nunes
Arabic Generation in the Framework of the Universal Networking Language ... 195
  Daoud Maher Daoud
Development of the User Interface Tools for Creation of National Language Modules ... 210
  Tigran Grigoryan, Vahan Avetisyan
Universal Networking Language Based Analysis and Generation for Bengali Case Structure Constructs ... 215
  Kuntal Dey, Pushpak Bhattacharyya
Interactive Enconversion by Means of the Etap-3 System ... 230
  Igor M. Boguslavsky, Leonid L. Iomdin and Victor G. Sizov
Prepositional Phrase Attachment and Interlingua ... 241
  Rajat Kumar Mohanty, Ashish Francis Almeida, and Pushpak Bhattacharyya
Hermeto: A NL–UNL Enconverting Environment ... 254
  Ronaldo Martins, Ricardo Hasegawa and M. Graças V. Nunes
A Platform for Experimenting UNL (Universal Networking Language) ... 261
  Wang-Ju TSAI
A Framework for the Development of Universal Networking Language E-Learning User Interfaces ... 268
  Alejandro Martins, Gabriela Tissiani and Ricardo Miranda Barcia
A WEB Platform Using UNL: CELTA’s Showcase ... 276
  Lumar Bértoli Jr., Rodolfo Pinto da Luz and Rogério Cid Bastos
Studies of Emotional Expressions in Oral Dialogues towards an Extension of Universal Networking Language ... 286
  Mutsuko Tomokiyo, Gérard Chollet, Solange Hollard
An XML-UNL Model for Knowledge-Based Annotation ... 300
  Jesús Cardeñosa, Carolina Gallardo and Luis Iraola
A Pivot XML-Based Architecture for Multilingual, Multiversion Documents: Parallel Monolingual Documents Aligned Through a Central Correspondence Descriptor and Possible Use of UNL ... 309
  Najeh Hajlaoui, Christian Boitet
UCL - Universal Communication Language ... 326
  Carlos A. Estombelo-Montesco and Dilvan A. Moreira
Knowledge Engineering Suite: A Tool to Create Ontologies for an Automatic Knowledge Representation in Intelligent Systems ... 336
  Tânia C. D. Bueno, Hugo C. Hoeschl, Andre Bortolon, Eduardo S. Mattos, Cristina Santos, Ricardo M. Barcia
Using Semantic Information to Improve Case Retrieval in Case-Based Reasoning Systems ... 346
  J. Akshay Iyer and Pushpak Bhattacharyya
Facilitating Communication Between Languages and Cultures: a Computerized Interface and Knowledge Base ... 358
  Claire-Lise Mottaz Jiang, Gabriela Tissiani, Gilles Falquet, Rodolfo Pinto da Luz
Using WordNet for linking UWs to the UNL UW System ... 369
  Luis Iraola
Automatic Generation of Multilingual Lexicon by Using Wordnet ... 379
  Nitin Verma and Pushpak Bhattacharyya
METHODOLOGIES
Gradable Quality Translations Through Mutualization of Human Translation and Revision, and UNL-Based MT and Coedition ... 393
  Christian Boitet
Towards a systematic process in the use of UNL to support multilingual services ... 411
  Jesús Cardeñosa, Carolina Gallardo, Edmundo Tovar
Knowledge Representation Issues and Implementation of Lexical Data Bases ... 428
  F. Sáenz and A. Vaquero
Author Index ... 443
Prologue
UNL is an ongoing worldwide initiative that started in 1996. Almost 10 years have passed, a long time span for a project. One could say that UNL did not meet its expectations. But let us take a closer look at UNL, the project, its basics and objectives. A closer look at its objectives will reveal that this claim is gratuitous and unfounded.
The Problem: Linguistic Diversity
UNL was launched by IAS/UNU to erase linguistic barriers. Linguistic barriers collide with the enhancement of linguistic diversity and with the value that native languages have as one of the main vehicles to express one’s cultural identity. Apart from socio-cultural issues, linguistic diversity also has an economic and political dimension. Institutions like the United Nations or the European Union have to face every day the barriers that linguistic diversity imposes. The enormous amount of documentation that these institutions produce every day is well known, and it has to be produced in all their official languages: 6 for the UN, 25 for the European Union. It is simply unfeasible to rely on human translators for the production of all this documentation.
Aware of this, the IAS/UNU launched the UNL project, aiming at real access to information in one’s own native language without resorting to dominant languages. UNL is basically an artificial language into which contents expressed in natural languages can be converted and from which, subsequently, contents written in UNL can be generated into any natural language, provided that adequate tools are built.
MT and Multilinguality
From the technological point of view, multilinguality has been tackled by Machine Translation. In the evolution of the area of MT, there has been a variety of architectures for the task of translating the contents of a text written in a given language into another language. Transfer-based systems could be regarded as the most productive and of the best quality. But they are hindered by the quadratic growth in the number of modules to be developed when the number of involved languages increases. A transfer-based system involving N languages needs N*(N-1) modules, a prohibitive number for creating real multilingual platforms.
Further, although there are some very good systems, the quality of these systems seems to be limited, since after years of refinement an MT system does not surpass a given degree of quality. Besides, the development of transfer-based MT systems is usually restricted to the so-called majority languages (English, French, German and even Spanish or Italian), and it is fairly rare to find a good-quality, wide-coverage MT system covering, let us say, English and Polish.
Transfer-based MT is not the only option: Interlingua-based systems represent an alternative to transfer systems. Interlingua-based MT does not work on pairs of languages; instead, translation is carried out to and from an artificial language that serves as a pivot for all the natural languages involved in the system. This architecture tries to overcome the quadratic growth of transfer-based systems, since the number of modules to develop for N languages is 2*N and the inclusion of new languages into the system does not affect the other language modules. In this way, UNL follows the architecture of Interlingua-based MT systems.
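The two module counts given above can be compared directly. The following is a minimal sketch, not part of the original text: the function names are hypothetical, and the sample sizes are simply the language counts mentioned in this volume (6 for the UN, 15 for the initial UNL project, 25 for the European Union). It assumes one transfer module per ordered language pair, and one enconverter plus one deconverter per language for the interlingua architecture.

```python
def transfer_modules(n: int) -> int:
    """Modules for a transfer architecture: one per ordered language pair."""
    return n * (n - 1)


def interlingua_modules(n: int) -> int:
    """Modules for an interlingua architecture: one analysis (enconversion)
    and one generation (deconversion) module per language."""
    return 2 * n


def cost_of_adding_language(n: int) -> tuple[int, int]:
    """New modules needed when one language joins a system of n languages,
    returned as (transfer, interlingua)."""
    return (transfer_modules(n + 1) - transfer_modules(n),
            interlingua_modules(n + 1) - interlingua_modules(n))


if __name__ == "__main__":
    for n in (3, 6, 15, 25):
        print(f"{n:>2} languages: transfer = {transfer_modules(n):>3}, "
              f"interlingua = {interlingua_modules(n):>2}")
    # Adding a 16th language to a 15-language system.
    print("new modules (transfer, interlingua):", cost_of_adding_language(15))
```

Under these assumptions the interlingua architecture needs fewer modules only when N is greater than 3, since N*(N-1) - 2*N = N*(N-3). The gap then widens quickly: for 15 languages the counts are 210 against 30, and a new language costs 2*N further transfer modules but only 2 interlingua modules.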
Usually, Interlinguas are abstract formal (or semi-formal) languages that capture the meaning of texts in a language-independent way. Ideally, the Interlingua should not be close to any particular language and should not include linguistic devices specific to natural languages. In this way, Interlingua-based systems seem the most plausible (and even the only) option to tackle massive multilinguality.
But Interlinguas have often been rejected within the scientific community and, since their boom in the 1980s, there have been no commercial applications of Interlinguas, and the systems developed under this trend were laboratory products. Why is this so? Let us have a look at the properties of interlinguas.
Problems with Interlinguas
Interlinguas are semantic languages designed to represent the meaning of any given
text, ideally satisfying the following conditions:
(a) They are language neutral.
(b) They are precise, unambiguous, formal languages.
Being so, they usually show the following characteristics:
− Interlinguas are intimately tied up with ideas about the representation of meaning,
meaning being the most abstract and deepest level of linguistic analysis (which
should be common to all languages and far enough from the surface representation of
languages).
− An Interlingua is “another language” in the sense that it has autonomy and thus its
components need to be defined: mainly vocabulary and “relations”. Besides, an
Interlingua is an artificial language that should be as expressive as natural
languages.
Here we find the main bottleneck of interlinguas: their proper design and definition.
Defining an Interlingua involves the following parameters:
(a) A language whose “atoms” are not dependent on any given natural language
so that the ambiguity of natural languages is eliminated.
(b) A language whose “atoms” are not dependent on a given natural language so
that the concepts and ideas expressed in different natural languages can be
easily and naturally expressed in the Interlingua.
(c) A language that is as expressive as a natural language so that what can be
expressed in natural languages can be transposed to the Interlingua, and from
the interlingua to other natural languages.
These three conditions make interlinguas hard to design. It is quite difficult to find
the equilibrium between language independence, degree of abstraction and
expressiveness in a formal device such as an Interlingua. Maybe this difficulty in the design of
interlinguas is the reason why they have not been successful, at least in open domains
within massively multilingual environments. The existing examples of interlingua-based
systems are domain dependent and quite limited in the number of languages.
Is UNL a Viable Solution?
The panorama appears quite discouraging. While Interlinguas are theoretically biased
and difficult to put into practice, transfer-based systems have proved to be
unattainable when dealing with massive multilinguality. Maybe the concept of Interlingua
should be revisited, and re-adapted to real necessities and to real scenarios. This is the
spirit of UNL. UNL, by its definition and by its most basic architecture, is definitely
an Interlingua-based system. Its target is the support of multilinguality, not
restricted to a given domain or to a given family of languages. Thus, the design of an
interlingua like UNL encounters all the possible barriers that an Interlingua may
encounter (especially that of finding a truly language-independent representation).
So why could we consider UNL different, a new viable technology, if interlinguas
were rejected a long time ago? First, let’s remember the main objectives of
UNL:
− to generate and produce contents in any natural language in any domain.
− to support multilingual services.
That is, there is a primacy of generation and of coverage of languages and domains,
which means that a very expressive formalism has to be designed in order to
represent such a variety of contents coming from any natural language.
Let’s illustrate this fact by having a closer look at the vocabulary of the Interlingua,
one of the most difficult and controversial issues of UNL and of any Interlingua. UNL
uses the so-called Universal Words as the semantic atoms of the Interlingua (they are
not decomposable). They exhibit the following main characteristic:
They are based on English headwords.
From this very simple definition, we can conclude that UNL is language biased
(English) and thus:
1. UNL is based on a natural language.
2. It hinders logical relations and inferences (facilitated by primitive-based solutions).
3. Its vocabulary is a potential source of ambiguity
4. Its vocabulary fosters lexical and conceptual mismatches among languages.
So is there any advantage in the UW system and in the overall essence of UNL?
Well, if theoretical reasons do not support the design of open-domain interlinguas,
let’s look at the practical or pragmatic ones.
(a) UNL is based on a natural language. At first sight this could be a drawback;
however, the expressiveness of a natural language is inherited by the
Interlingua, thus allowing for the representation of a variety of domains and
concepts.
(b) UNL shows an English-oriented vocabulary. At this moment, English is the
lingua franca, the most accessible language to work with for speakers of Indo-European
and Semitic languages, Japanese, Chinese, etc. Bilingual dictionaries usually have English as one of
their target/source languages, thus the development of lexicographic
resources is facilitated by choosing English as the basis for the atoms of the
language.
Of course, this approach (although supported by pragmatism) is far from perfect.
Even at first sight, it can be considered naïve, since it merely hints at well-known
problems in lexical semantics (like support verbs, compound expressions,
connotational meaning, etc.). For this reason, theoretical research on the UNL as a language in
itself should be fostered within the Consortium, while respecting the basic nature of the
language.
That is, UNL should be viewed not as a perfect Interlingua but as the pillar that
supports multilingual services. Its natural language orientation (apparently its
weakest point as an Interlingua) makes the language a candidate for supporting
multilinguality and facilitates converting contents to and from UNL. Several
aspects support this. First, the creation of generators of medium quality (where
post-editing is possible) is rather straightforward. Second, its flexibility and language
orientation make it possible to integrate UNL into other pre-existing MT systems (be they
transfer-based or of another architecture), which extends the range of application of
UNL and helps to alleviate the problem of quadratic growth in
transfer-based systems. And last, but not least, the processes of enconverting and deconverting
are independent, so that if generation is taken as a priority, generators are constructed
first; the process of enconversion can be done manually, thanks to the human readability
of the language.
At this point in the evolution of UNL, there appears a contradiction: UNL is still
not theoretically mature, but from an applied perspective, it is. In the short term it
is a priority for the UNL Consortium to get feedback from previous experiences in
Interlinguas and from Linguistic Theory (semantics, logic, and lexical semantics) in order
for UNL to grow and find a place in the scientific community and, why not, in the
market as a real approach to support multilinguality, once the applications and utilities
are clear and defined within the UNL Programme.
Prospective
So is it worth another attempt? Definitely yes: the real need to overcome linguistic
barriers (be it at the institutional level or at the social level) calls for a solution to
the problem of multilinguality. Transfer-based systems are simply out of the question if
used in isolation. This doesn’t mean that they are useless: they are not. An interlingua like
UNL is conceived as another autonomous language, close enough to the surface
form of natural languages; thus integration of the Interlingua into a transfer system
is possible and not a contradiction in terms.
After several years of experience, we know that knowledge and language
generation do not go hand in hand. Thus the final design has to be done bearing in mind the ultimate
purpose of the interlingua (the closer it is to language semantics, the better it is for generating
languages), which will probably lead to the success of the interlingua.
A Final Word
I would like to thank the editors of this book for their invitation to write a prologue to
this work and to collaborate with them in the selection and revision of the
papers presented in this volume. Hopefully it will provide a thorough understanding
of the UNL Programme, its meaning, its evolution, its shortcomings and its strengths.
Carolina Gallardo
INTRODUCTION: SETTING UP UNL
A Rationale for Using UNL as an Interlingua and More
in Various Domains
Christian Boitet
GETA, CLIPS, IMAG
385, Av. de la Bibliothèque, BP 53
F-38041 Grenoble cedex 9, [email protected]
Abstract. The UNL language of semantic graphs may be called a "semantico-linguistic"
interlingua. As a successor of the technically and commercially successful ATLAS-II
and PIVOT interlinguas, its potential to support various kinds of text MT is certain,
even if some improvements would be welcome, as always. It is also a strong candidate
for use in spoken dialogue translation systems when the utterances to be handled are
not only task-oriented and of limited variety, but become more free and truly
spontaneous. Finally, although it is not a true representation language such as KRL
and its frame-based and logic-based successors, and although its associated "knowledge
base" is not a true ontology, but rather a kind of immense thesaurus of (interlingual)
sets of word senses, it seems particularly well suited to the processing of multilingual
information in natural language (information retrieval, abstracting, gisting, etc.).
The UNL format of multilingual documents aligned at the level of utterances is
currently embedded in html (call it UNL-html), and used by various tools such as the
UNL viewer. By using a simple transformation, one obtains the UNL-xml format and
can profit from all tools currently developed around XML. In this context, UNL may
find another application in the localization of multilingual textual resources of
software packages (messages, menu items, help files, and examples of use in
multilingual dictionaries).
1 Introduction
UNL is the name of a project, of a meaning representation language, and of a format
for "perfectly aligned" multilingual documents. There is some hefty controversy
about the use of the UNL language as an "interlingua", be it for translation or for
other applications such as cross-lingual information retrieval. On the other hand,
there is almost no discussion on the UNL format, in its current form, embedded in
HTML, or some directly derivable form, embedded in XML.
We argue that the UNL language is indeed a good interlingua for automated
translation, ranging from fully automatic MT to interactive MT of several kinds through,
we believe, spoken translation of non task-oriented dialogues. It is also more than
that, due to the associated "knowledge base", and has a great potential in textual
information processing applications.
© J. Cardeñosa, A. Gelbukh, E. Tovar (Eds.)
Universal Network Language: Advances in Theory and Applications.
Research on Computing Science 12, 2005, pp. 3–9.
We will first give our view of what the UNL language is, and then develop a
"rationale" for using the UNL language along the previous lines. We will then
describe some interesting potential uses of the UNL format in an "XML-ized" form.
2 The UNL language
The UNL representation is made of "semantic graphs" where a graph expresses the
meaning of some natural language utterance. Nodes contain lexical units and
attributes, arcs bear semantic relations. Connected subgraphs may be defined as "scopes", so
that a UNL graph may be a hypergraph. Figure 1 illustrates a UNL graph.
[Figure 1: UNL graph. Node labels: score(icl>event,agt>human,fld>sport).@entry.@past.@complete,
Ronaldo, head(pof>body), corner, left, goal(icl>thing). Arc labels: agt, ins, plt, obj, mod, obj, pos.]
Fig. 1. A possible UNL graph for “Ronaldo has headed the ball into the left corner of the goal”
The lexical units, called Universal Words (in French, not "mot universel" but
better "Unité de Vocabulaire Virtuel" or UVV or UW), represent word meanings,
something less ambitious than concepts. Their denotations are built to be intuitively
understood by developers knowing English, that is, by all developers in NLP. A UW is an
English term or pseudo-term possibly completed by semantic restrictions.
A UW such as "process" represents all word meanings of that lemma, seen as
citation form (verb or noun here). The UW "process(icl>do, agt>person)" covers the
verbal meanings of processing, working on, etc.
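To make the shape of a UW concrete, here is a minimal Python sketch that splits a UW into its English headword and its list of semantic restrictions. The class and function names are ours, and the sketch assumes the simple surface form seen in the examples above (a headword optionally followed by comma-separated `relation>value` restrictions in parentheses); it is not a parser for the full UNL specification.

```python
import re
from dataclasses import dataclass, field


@dataclass
class UniversalWord:
    headword: str
    # Each restriction is a (relation, value) pair, e.g. ("icl", "do").
    restrictions: list[tuple[str, str]] = field(default_factory=list)


# Headword, then an optional parenthesised restriction list.
_UW_PATTERN = re.compile(r"^\s*([^()]+?)\s*(?:\((.*)\))?\s*$")


def parse_uw(text: str) -> UniversalWord:
    """Parse strings such as 'process' or 'process(icl>do, agt>person)'."""
    match = _UW_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognisable UW: {text!r}")
    headword, body = match.group(1), match.group(2)
    restrictions = []
    if body:
        for part in body.split(","):
            relation, _, value = part.strip().partition(">")
            restrictions.append((relation, value))
    return UniversalWord(headword, restrictions)


if __name__ == "__main__":
    print(parse_uw("process"))
    print(parse_uw("process(icl>do, agt>person)"))
```

An unrestricted UW simply yields an empty restriction list, which mirrors the idea that "process" alone stands for all meanings of the lemma.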
The attributes are the (semantic) number, gender, time, aspect, modality, etc.
The 40 or so semantic relations are traditional "deep cases" such as agent, (deep)
object, location, goal, time, etc.
One way of looking at a UNL graph corresponding to an utterance U-L in
language L is to say that it represents the abstract structure of an equivalent English
utterance U-E as "seen from L", meaning that semantic attributes not necessarily
expressed in L may be absent (e.g., aspect coming from French, determination or
number coming from Japanese, etc.).
3 Some arguments for using the UNL language in various contexts
To show that using UNL is not only a workable but a good or perhaps the best idea at
the moment, we can say that
− the "pivot" technique HAS BEEN not only experimented with but deployed
successfully (ATLAS, PIVOT, ULTRA, KANT).
− in particular, ATLAS-II (Fujitsu) is built on the basis of a pivot from which the
UNL representation has evolved. The main designer of UNL, H. Uchida, was also
the main designer of ATLAS-II.
− ATLAS-II has been recognized as the best EJ/JE MT system in Japan for over 10
years and has a very large coverage (586,000 words in English and Japanese).
− interlingual representations cannot in principle be used (alone) to achieve the
highest quality achievable by transfer systems, BUT they can give quite high
quality, as demonstrated by ATLAS-II.
− due to the precise nature of UNL, it is possible for human non-specialists to
improve a UNL representation interactively, a posteriori, from any UNL-related
language, and on demand (meaning partially: think of "lazy improvement").
− in many contexts other than translation, an interlingual, semantic-oriented
representation like UNL is actually the best solution. For example, all applications
related to information processing in multilingual contexts don't need a very precise
representation of the FORM of the information, they need a precise ENOUGH
representation of the INFORMATION CONTENT of the information.
− applications such as information retrieval and abstracting have already been
prototyped successfully with UNL. It is far easier to generate SQL or SQL-like queries
and answers from a UNL form than from text in many languages.
4 Applications of the UNL format
The UNL format of multilingual documents aligned at the level of utterances is
currently embedded in html (call it UNL-html). A sentence is represented between the [S]
and [/S] tags. Its original text is contained between {org:el} (English, here) and
{/org}, its UNL graph between {unl} and {/unl}, each French version between {fr}
and {/fr}, and analogously for other languages. Attributes such as version, date,
location, author, etc. may appear in the tags. Here is a slightly simplified example of a file
in UNL-html format.
Example 1 El/UNL
[D:dn=Mar Example 1, on= UNL French,[email protected]]
[P]
[S:1]
{org:el}I ran in the park yesterday.{/org}
{unl}
agt(run(icl>do).@entry.@past,i(icl>person))
plc(run(icl>do).@entry.@past,park(icl>place).@def)
tim(run(icl>do).@entry.@past,yesterday)
{/unl}
{cn dtime=20020130-2030, deco=man}我昨天在公園裡跑步{/cn}
{de dtime=20020130-2035, deco=man}Ich lief gestern im Park.{/de}
{es dtime=20020130-2031, deco=UNL-SP}Yo corri ayer en el parque.{/es}
{fr dtime=20020131-0805, deco=UNL-FR}J’ai couru dans le parc hier.{/fr}
[/S]
[S:2]
{org:el}My dog barked at me.{/org}
{unl}
agt(bark(icl>do).@entry.@past,dog(icl>animal))
gol(bark(icl>do).@entry.@past,i(icl>person))
pos(dog(icl>animal),i(icl>person))
{/unl}
{de dtime=20020130-2036, deco=man}Mein Hund bellte zu mir.{/de}
{fr dtime=20020131-0806, deco=UNL-FR}Mon chien aboya pour moi.{/fr}
[/S]
[/P]
[/D]
The French versions have been produced automatically while the German and
Chinese versions have been translated manually.
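The relation lines found between {unl} and {/unl} in the example can be read mechanically. The Python sketch below, whose function names are ours, assumes that each line holds exactly one binary relation of the form `rel(node1,node2)` and that attributes are appended to a UW with `.@`, as in the example; anything beyond that (scopes, node identifiers) is deliberately left out.

```python
def split_top_level(text: str, sep: str = ",") -> list[str]:
    """Split on sep, ignoring separators nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return [p.strip() for p in parts]


def parse_node(text: str) -> tuple[str, list[str]]:
    """Return (universal word, attributes), e.g. ('run(icl>do)', ['@entry', '@past'])."""
    uw, *attrs = text.split(".@")
    return uw.strip(), ["@" + a.strip() for a in attrs]


def parse_relation(line: str):
    """Parse one relation line into (relation, source node, target node)."""
    line = line.strip()
    name, _, rest = line.partition("(")
    if not name or not rest.endswith(")"):
        raise ValueError(f"not a relation: {line!r}")
    arguments = split_top_level(rest[:-1])
    if len(arguments) != 2:
        raise ValueError(f"expected two nodes in {line!r}")
    source, target = arguments
    return name.strip(), parse_node(source), parse_node(target)


if __name__ == "__main__":
    for line in (
        "agt(run(icl>do).@entry.@past,i(icl>person))",
        "plc(run(icl>do).@entry.@past,park(icl>place).@def)",
        "tim(run(icl>do).@entry.@past,yesterday)",
    ):
        print(parse_relation(line))
```

Tracking parenthesis depth matters because the restrictions inside a UW may themselves contain commas, so a plain `split(",")` on the relation body would cut a node in two.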
The output of the UNL viewer for French is:
Example 1 El/UNL
J’ai couru dans le parc hier.
Mon chien aboya pour moi.
and will probably be displayed by a browser as:
Example 1 El/UNL
J’ai couru dans le parc hier. Mon chien aboya pour moi.
and similarly for all other languages.
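What the viewer does for one language can be approximated with two regular expressions. The following Python sketch is our own illustration, not the actual UNL viewer: it assumes that every sentence is delimited by [S:n] and [/S] and that every language version has both an opening tag (possibly with attributes) and a closing tag.

```python
import re

# One sentence block: [S:<number>] ... [/S]
SENTENCE = re.compile(r"\[S:(\d+)\](.*?)\[/S\]", re.DOTALL)

SAMPLE = """[P][S:1]{org:el}I ran in the park yesterday.{/org}
{de dtime=20020130-2035, deco=man}Ich lief gestern im Park. {/de}
{fr dtime=20020131-0805, deco=UNL-FR}J’ai couru dans le parc hier. {/fr}[/S]
[S:2]{org:el}My dog barked at me.{/org}
{de dtime=20020130-2036, deco=man}Mein Hund bellte zu mir.{/de}
{fr dtime=20020131-0806, deco=UNL-FR}Mon chien aboya pour moi.{/fr}[/S][/P]"""


def language_version(document: str, lang: str) -> list[str]:
    """Return the text of each sentence in the requested language, in order.

    A sentence with no version in that language yields an empty string.
    """
    tag = re.escape(lang)
    version = re.compile(r"\{%s\b[^}]*\}(.*?)\{/%s\}" % (tag, tag), re.DOTALL)
    texts = []
    for _number, body in SENTENCE.findall(document):
        match = version.search(body)
        texts.append(match.group(1).strip() if match else "")
    return texts


if __name__ == "__main__":
    print("\n".join(language_version(SAMPLE, "fr")))
```

Calling the function once per selected language gives the per-language lists from which one html file per language could then be written.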
The UNL viewer produces on demand as many html files as languages selected
and sends them to any available browser.
The UNL-html format predates XML, hence the special tags like [S] and {unl},
but it is easy to derive from it an XML format and to transform the documents into an
equivalent "UNL-xml" format. Then, using DOM and javaScript, it is possible to
produce various views, including that of a classical viewer, a bilingual or multilingual
editable presentation, and a revision interface where not only the text but the UNL
graph and possibly other structures may be directly manipulated.
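As a rough illustration of such a transformation, the Python sketch below serialises already-parsed sentences as XML with the standard `xml.etree.ElementTree` module. The element and attribute names (`D`, `P`, `S`, `org`, `unl`, `version`, `num`, `lang`) and the input dictionary layout are assumptions made for this example only; the text does not define the actual UNL-xml schema.

```python
import xml.etree.ElementTree as ET


def to_unl_xml(sentences: list[dict]) -> str:
    """Serialise parsed sentences into a hypothetical UNL-xml layout.

    Each sentence dict is assumed to have the keys
    'num', 'org_lang', 'org', 'unl' (list of relation lines)
    and 'versions' (mapping language code -> text).
    """
    document = ET.Element("D")
    paragraph = ET.SubElement(document, "P")
    for s in sentences:
        sentence = ET.SubElement(paragraph, "S", num=str(s["num"]))
        org = ET.SubElement(sentence, "org", lang=s["org_lang"])
        org.text = s["org"]
        unl = ET.SubElement(sentence, "unl")
        # '>' inside universal words is escaped automatically on output.
        unl.text = "\n".join(s["unl"])
        for lang, text in s["versions"].items():
            version = ET.SubElement(sentence, "version", lang=lang)
            version.text = text
    return ET.tostring(document, encoding="unicode")


if __name__ == "__main__":
    print(to_unl_xml([{
        "num": 1,
        "org_lang": "el",
        "org": "I ran in the park yesterday.",
        "unl": ["agt(run(icl>do).@entry.@past,i(icl>person))"],
        "versions": {"fr": "J’ai couru dans le parc hier."},
    }]))
```

Once the document is well-formed XML, generic tools (DOM, XPath, XSLT) can select or rewrite a sentence, its graph or one of its versions without any UNL-specific parsing.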
Let us take an example from an experiment performed for the "Forum Barcelona
2004" on documents in Spanish, Italian, Russian, French and Hindi. Hindi and
Russian are not shown. The XML form is simplified (see figure 2).
agt(retrieve.@entry.@future, city)
tim(retrieve.@entry.@future, after)
obj(after, Forum)
obj(retrieve.@entry.@future, zone.@indef)
mod(zone.@indef, coastal)
After a Forum, a city will retrieve a coastal zone
Una ciudad recuperará una zona de costa después de Forum
Una cité retrouvera une zone côtière après un forum
Città ricuperarà une zone costiera dopo Forum
Fig. 2. Simplified XML form. Correct sentences are produced by the deconverters from
correct and complete UNL graphs. Suppose for the sake of illustration that some UNL graph has
been produced from a Chinese version, and does not contain definiteness and aspectual
information. All results may be wrong with respect to articles, and some with respect to aspect.
The idea of "coedition" is applicable if there is a UNL graph associated with a
segment one wants to modify. The goal is to share the revisions across languages, by
reflecting them on the UNL graph, e.g.
• add ".@def" on the nodes containing "city", "Forum".
• replace "retrieve" by "recover" and add ".@complete" on the node containing it.
It is not possible in principle to deduce the modification on the graph from a
modification on the text. For example, replacing "un" ("a") by "le" ("the") does not entail
that the following noun is determined (.@def), because it can also be generic ("il
aime la montagne" = "he likes mountains"). Hence, the technique envisaged is that:
• revision is not done by modifying directly the text, but by using a menu system,
• the menu items have a "language side" and a hidden "UNL side",
• when a menu item is chosen, only the graph is transformed, and the action to be
done on the text is stored and shown next to its focus in the "To Do" zone,
• at any time, the new graph may be sent to the L0 deconverter and the result shown.
If it is satisfactory, that shows that errors were due to the graph and not to the
deconverter, and the graph may be sent to deconverters in other languages. Versions
in some other languages known by the user may be displayed, so that
improvement sharing is visible and encouraging.
New versions will be added with appropriate tags and attributes in the original
multilingual document in UNL-xml format, or in a DBMS, so that nothing is ever
lost, and cooperative working on a document is feasible. UNL may find another
application in the localization of multilingual textual resources of software packages
(messages, menu items, help files, and examples of use in multilingual dictionaries).
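The two graph revisions given earlier as examples (adding ".@def" to "city" and "Forum", replacing "retrieve" by "recover" and adding ".@complete") can be sketched in Python as operations on a small in-memory graph. The `Node` and `Graph` classes and the two helper functions are hypothetical and stand only for the hidden "UNL side" of a menu item; the menu system and the "To Do" zone themselves are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    uw: str
    attrs: set[str] = field(default_factory=set)


@dataclass
class Graph:
    nodes: list[Node]
    arcs: list[tuple[str, int, int]]  # (relation, source index, target index)


def add_attribute(graph: Graph, uw: str, attr: str) -> int:
    """Add attr to every node carrying uw; return the number of nodes changed."""
    changed = 0
    for node in graph.nodes:
        if node.uw == uw and attr not in node.attrs:
            node.attrs.add(attr)
            changed += 1
    return changed


def replace_uw(graph: Graph, old: str, new: str) -> int:
    """Replace the universal word old by new; arcs are left untouched."""
    changed = 0
    for node in graph.nodes:
        if node.uw == old:
            node.uw = new
            changed += 1
    return changed


def figure2_graph() -> Graph:
    """The graph of Fig. 2, transcribed by hand."""
    nodes = [Node("retrieve", {"@entry", "@future"}), Node("city"), Node("after"),
             Node("Forum"), Node("zone", {"@indef"}), Node("coastal")]
    arcs = [("agt", 0, 1), ("tim", 0, 2), ("obj", 2, 3), ("obj", 0, 4), ("mod", 4, 5)]
    return Graph(nodes, arcs)


if __name__ == "__main__":
    graph = figure2_graph()
    add_attribute(graph, "city", "@def")
    add_attribute(graph, "Forum", "@def")
    replace_uw(graph, "retrieve", "recover")
    add_attribute(graph, "recover", "@complete")
    for node in graph.nodes:
        print(node.uw, sorted(node.attrs))
```

Because only the graph is edited, the same revised graph can afterwards be handed to the deconverter of each language, which is how one correction is shared across all versions.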
Apart from "coedition", there are many other potential applications of UNL, such
as:
• crosslingual information retrieval, on which we are currently working,
• abstracting & gisting, which has been prototyped at NecTec and in India,
• localization of software packages: messages in multiple languages could be
created from UNL graphs produced from a graphical interface or by enconversion,
and then sent to appropriate deconverters.
For this last point, we have found how to represent messages including variables
(such as integers, file names etc.), but not yet how to handle messages including
morphological or even lexical variants (as "4 goda / 5 let" for "4 years / 5 years" in
Russian).
5 Conclusion
The UNL language is an artificial interlingua, embeddable in html or xml formats for
multilingual document representation and processing. Because of its both abstract and
linguistic nature, the UNL language offers many more interesting potential
applications than other types of interlingua such as task and/or domain specific interlingua.
The history of MT shows that UNL will also be usable in the context of
high-quality MT, quality being obtained through typology specialization and/or interactive
improvement, a priori (interactive disambiguation after all-path robust analysis)
and/or a posteriori by coedition of the text in any language and the corresponding
UNL graph.
Standardization of the Generation Process
in a Multilingual Environment
Jesús Cardeñosa, Carolina Gallardo and Edmundo Tovar
Universidad Politécnica de Madrid, 28660 Madrid.
{carde,carolina,edmundo}@opera.dia.fi.upm.es
Abstract. Natural language generation has received less attention within the
field of natural language processing than natural language understanding. One
possible reason for this could be the lack of standardization of the inputs to
generation systems. This makes it difficult to plan the development of
generation systems systematically. The authors propose the use
of the UNL (Universal Networking Language) as a possible standard for the
normalization of inputs to generation processes.
1 Introduction
In natural language processing (from now on NLP) two areas can be differentiated:
analysis and generation. However, generation has not received the same attention as
analysis from the scientific community, which is why it can be considered the
“poor brother” of NLP. The reason for this lesser development is the different
nature of the input to analysis and generation systems. The input to analysis
systems is always natural language, whose casuistry and phenomena are known;
in a generation system, by contrast, the output is always known, but not what it is going to
be generated from [1].
The input to a generation system varies depending on whether it is a monolingual
generation system (dialogue systems) or a multilingual system (mainly machine translation
systems). In dialogue systems it is difficult to establish characteristics
common to all inputs, because “the problem” of generation is usually solved with
ad hoc solutions that depend on the application and the system language. In machine
translation systems, there are also many differences in the inputs to the generation
subcomponents, conditioned by the nature of the system architecture (transfer,
interlingua, etc.), the kind of grammars being used (declarative vs. procedural) [2], or the
number of languages in the system.
This difference in the input to the generators makes a systematic planning of their
development process impossible (the main cause of the lesser development of generation
compared to analysis). It is therefore necessary that the input to the “generator” be
supported by an appropriate model of content representation, independent of the
format or language, that ensures a standard process for the development of generation
systems.
In this article we propose the UNL as a possible standard for the generation inputs.
To achieve this, in section 2 we will introduce the main generation architectures.
Section 3 will describe in detail the UNL system, its qualities and basic architecture for
generation. Section 4 will establish the conditions required by any technology in order
to be considered a standard, and which ones are fulfilled by the UNL. The article will
end with the description of a real massively multilingual system (HEREIN) where
UNL has been formally studied and proposed as a de facto standard for generation of
contents in natural languages.
© J. Cardeñosa, A. Gelbukh, E. Tovar (Eds.)
Universal Network Language: Advances in Theory and Applications.
Research on Computing Science 12, 2005, pp. 10–24.
2 Generation Architectures
2.1 Dialogue Systems
Dialogue systems represent one of the main applications of natural language
generation. The most important goal of this kind of system is “to present information
to the users in an easy to understand format” [3] in very specific fields where the user
generally interacts with the system in the same language. The user asks the system
for specific information; once it is obtained, the system can show it through an answer in
natural language. This answer is very frequently obtained (with some success)
through the generation of a “built” language from a series of templates that keep a
predefined relationship with the templates that support the questions [4]; this means
the generation process takes as input a representation that depends on the way the user
asks the question. It could be said that there is neither a thorough analysis of the text
nor an abstract representation of the information that should be given to the user. The
strong dependency on the source language and the domain hinders the construction of
multilingual dialogue systems and the reuse of these systems in other domains.
2.2 Machine translation systems
Machine translation systems (from now on MT) are essentially multilingual because
their target is the “transformation” of a text written in language A into an equivalent
text in language B. In this section the main architectures of MT systems will be
described, because each architecture sets a series of conditions on the
characteristics of the inputs to the generation process.
2.2.1 Transfer systems
The basic tasks in a transfer system are analysis, transfer and generation. The analysis
component produces a syntactic representation (of varying depth) that depends on the
source language. This syntactic representation is the input to the transfer module,
whose task is to transform that representation into a structure closer to the target
language. The output of the transfer module forms the input to the generation
module, which finally produces the phrase in the target language. In transfer systems,
the components, inputs and outputs are strongly oriented to the source and target
languages.
The main problem of transfer systems is that it is almost impossible to reuse the
existing resources (transfer modules) and components in order to include new
language pairs in the system. In fact, if it were necessary to increase the number of
language pairs, a new system would have to be built. Generally, the strong orientation of
“transfer” systems towards the target language involves great accuracy in the
output, and a considerable difficulty in reusing components to include new languages in
the system.
2.2.2 Interlingua based Systems
Interlingua based systems form the second major paradigm of machine translation. They
include the “traditional” ATLAS-II [5] and PIVOT [6] as well as knowledge-based systems
such as KANT [7] or Mikrokosmos [8]. Their defining characteristics are:
• Unique intermediate representation. The abstract representation that results from
the analysis directly “feeds” the generation module. This intermediate representation
is the component named “interlingua”.
• Elimination of the transfer process. The system carries out two basic tasks:
analysis and generation.
The systems based on interlingua are oriented to cover the largest possible number
of languages: a system based on interlingua for n languages requires 2*n components
(one analyzer and one generator per language), remarkably fewer than the n*(n-1)
components that transfer systems require for the same number of languages.
The basic architecture of generation in an interlingua system is shown in the next
figure.
Interlingua based systems offer an important advantage over the transfer ones: the
architecture facilitates the inclusion of new languages, and its components are reusable.
However, during the conversion to the interlingua, some grammatical information that is
significant for generation may be lost; that is, the interlingua may carry less
information (grammatical, not conceptual) than a syntactic representation. To sum up,
the systems based on interlingua support a larger number of languages at the expense of
lower precision in the generated texts.
Fig. 1. Generation in interlingua systems. [Diagram: the interlingua feeds Generator A,
Generator B and Generator C, which output Language A, Language B and Language C.]
Standardization of the Generation Process in a Multilingual Environment 13
2.2.3 Fusion
Without any doubt, multilingualism is an added value for any generation system. The
transfer-interlingua dichotomy seems to imply a trade-off between precision and
number of languages. To take advantage of both, some transfer systems have
“interlingued” their architectures to support a larger number of languages [9] [10].
The common characteristic of these systems is the existence of a deep syntactic
representation that has some degree of independence from the source language.
Combining the interlingua architecture with a transfer system requires the construction
of a transfer module between the deep syntactic structure and an interlingua
representation [11].
3 The UNL approach
3.1 The UNL system
UNL [12] is an artificial language designed to reproduce the content of texts written
in any natural language. The UNL is provided with specifications that formally define
the language. A UNL expression is a hypergraph consisting of:
• Universal words. They define the vocabulary of the language, i.e., they can be
considered the lexical items of UNL. To be able to express any concept occurring
in a natural language, the UNL proposes the use of English words modified by a
series of semantic restrictions that eliminate the innate ambiguity of the
vocabulary in natural languages. In this way, the language gets the expressive
richness of natural languages but without their ambiguity. Take, for example, the
English word “construction”, which can mean both “the action of constructing” and
the “final product”. Thus, the word “construction” will be paired with two different
universal words:
construction1 construction(icl>action)
construction2 construction(icl>concrete thing)
where “icl” is the abbreviation for “included”.
• Relations. These are a group of 41 relations that define the semantic relations
among concepts. They include argumentative (agent, object, goal), circumstantial
(purpose, time, place) and logical (conjunction, disjunction) relations, etc. For
example, in a sentence like “The boy eats potatoes in the kitchen”, there is a main
predicate (“eats”) and three arguments: two of them are instances of argumentative
relations (“boy” is the agent of the predicate “eats”, whereas “potatoes” is the
object) and one is an instance of a circumstantial relation (“kitchen” is the place
where the action described in the sentence takes place).
• Attributes. They express the semantic information resulting from morphological
inflection and from the functional elements of the phrase (auxiliary verbs, articles,
etc.). They are attached to the universal words to complete their meaning
when they appear in a specific context. The attributes include information about
time or aspect of the event, number, polarity, modality, etc. In the previous
sentence, attributes are needed to express plurality in the object (“potatoes”),
definite reference in both the agent (“boy”) and the place (“kitchen”), and finally a
special attribute denoting which UW is the head of the whole expression (the entry
node).
Formally, a UNL expression has the form of a semantic net, where the nodes
(universal words) are linked by arcs labeled with the UNL concept relations. The
graphical representation of the sentence “the boy eats potatoes in the kitchen” in UNL
is shown in figure 2.
This sentence is written in UNL in the following manner:
agt(eat(icl>do).@entry, boy(icl>person).@def)
obj(eat(icl>do).@entry, potato(icl>food).@pl)
plc(eat(icl>do).@entry, kitchen(icl>facilities).@def)
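The textual form above can be read mechanically into the semantic net it describes. The following Python fragment is only a minimal sketch of such a reader, not part of any UNL tool: the names UniversalWord, Relation, parse_uw, parse_relation and entry_node are hypothetical, and the sketch assumes that every line is a binary relation whose restrictions contain neither commas nor nested parentheses.

```python
import re
from typing import NamedTuple, Optional


class UniversalWord(NamedTuple):
    headword: str      # e.g. "eat"
    restriction: str   # e.g. "icl>do" (empty string if absent)
    attributes: tuple  # e.g. ("entry",)


class Relation(NamedTuple):
    label: str         # e.g. "agt"
    source: UniversalWord
    target: UniversalWord


# Assumption: restrictions contain no commas and no nested parentheses.
_RELATION = re.compile(r"^(\w+)\(\s*(.+?)\s*,\s*(.+?)\s*\)$")
_UW = re.compile(r"^([^(.]+)(?:\(([^)]*)\))?((?:\.@\w+)*)$")


def parse_uw(text: str) -> UniversalWord:
    """Split 'head(restriction).@a.@b' into its three parts."""
    match = _UW.match(text.strip())
    if match is None:
        raise ValueError(f"not a universal word: {text!r}")
    headword, restriction, attrs = match.groups()
    attributes = tuple(a for a in attrs.split(".@") if a)
    return UniversalWord(headword.strip(), restriction or "", attributes)


def parse_relation(line: str) -> Relation:
    """Parse one line of the form 'label(source, target)'."""
    match = _RELATION.match(line.strip())
    if match is None:
        raise ValueError(f"not a binary relation: {line!r}")
    label, source, target = match.groups()
    return Relation(label, parse_uw(source), parse_uw(target))


def entry_node(relations) -> Optional[UniversalWord]:
    """Return the first node carrying the @entry attribute, if any."""
    for relation in relations:
        for node in (relation.source, relation.target):
            if "entry" in node.attributes:
                return node
    return None


EXAMPLE = """\
agt(eat(icl>do).@entry, boy(icl>person).@def)
obj(eat(icl>do).@entry, potato(icl>food).@pl)
plc(eat(icl>do).@entry, kitchen(icl>facilities).@def)
"""

relations = [parse_relation(line) for line in EXAMPLE.splitlines()]
for relation in relations:
    print(relation.label, relation.source.headword, "->", relation.target.headword)
print("entry:", entry_node(relations).headword)
```

Each parsed relation is one labeled arc of the net, and the node marked with the entry attribute is the head of the whole expression. A reader for full UNL documents would need to handle more than this sketch does.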
3.2 Basic characteristics of UNL
The UNL system represents a generic framework for the massive generation of
multilingual contents. Its main goal is to represent the contents of a document, web
page, database, etc., in a consensual and normalized structure that may be
transformed into a text in a natural language. The defining characteristics of the UNL
system are:
a) It is a system oriented to the generation of multilingual contents. A document
written in the UNL has its “own identity” and can be stored in a document database, etc.
b) The UNL does not involve the use of specific components or tools. The tools and
components, as well as the processes that may be defined to accomplish the editing
and the generation in the UNL, vary from one language to another. The use of the
UNL only involves the standardization of the input into a generation system [13].
Fig. 2. Representation of a UNL expression. [Graph: the node eat(icl>do).@entry is linked
by the arc “agent” to boy(icl>person).@def, by the arc “object” to potato(icl>food).@pl,
and by the arc “place” to kitchen(icl>facilities).@def.]
In spite of the emphasis given to language generation in the system, the UNL
framework includes the process of editing natural language into the UNL, named
“enconversion”, as well as the generation into natural languages, or “deconversion” (see
figure 3).
The UNL is an interlingua in essence, that is, a language suitable for representing
meaning independently of the natural languages. The UNL is not restricted to a specific
domain (as the KANT or Mikrokosmos interlinguas can be); the fact that the vocabulary
of the interlingua is not restricted guarantees that the UNL can be adapted to
represent contents in any language or domain.
3.3 Generation in the UNL framework.
There are several architectures for the generation of natural language from the UNL.
Next, the two generation architectures within the UNL framework will be described in
detail.
3.3.1 Direct Generation
The UNDL Foundation (http://www.undl.org) supplies a module that carries out the
generation in a single process. This module is known as DeCo (standing for
DeConverter). This module is completely language independent, since all the
grammatical knowledge necessary for the generation of the target language is included
in the dictionary and the rule set of that language.
Fig. 3. Architecture of the UNL system. [Diagram components: original text (any
language), UNL editor (edition module), UNL document base, UNL-local language
dictionaries, target language generator (generation module), generated text.]
Given that this module directly transforms the semantic UNL representation into
its morphological realization (that is, a sentence in natural language), the dictionary
must contain the most detailed information possible on the following aspects:
• Grammatical category and subcategories: the more hierarchically organized the
lexical level, the better the quality that can be expected from the generation.
• Argument structure and prepositions required by verbs, nouns, and adjectives.
• Semantic information that may be relevant for the syntactic configuration in the
target language.
With the help of the information included in the dictionary, the main task of the
generation rules is to transform the UNL expression into a phrase in the natural
language. Basically, the following tasks are carried out:
• Matching of the UNL relations with the grammatical relations of the language.
In the previous sentence, the agent of the predicate in UNL corresponds to the
grammatical subject in English or Spanish.
• “Translation” of the UNL attributes into their appropriate morphological or
syntactic realization. For example, the attribute “plural” has to be morphologically
realized as a plural noun in Spanish. The attribute “definite reference” is
translated into Spanish through the insertion of a definite article. There is not
always a direct translation between UNL attributes and morphological/pragmatic
information in natural languages. For instance, when dealing with time, UNL only
offers three possibilities (past, present and future). It is the “competence” of
the generation rules of each natural language to correctly select the tense and
verbal mood in languages that do not have this kind of time system
(for instance, Spanish).
• Generation of pronouns and anaphoric expressions. The UNL expression is
devoid of anaphoric elements: all concepts in UNL should be stated explicitly. It
is the task of the generation rules to insert pronouns and other anaphoric elements
in the generated texts.
• Morphological synthesis. Finally, generation rules should tackle aspects such as
agreement between verb and subject, or between adjectives and nouns, word
order, or the expression of the correct verb tense.
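The interplay of dictionary and generation rules listed above can be illustrated with a deliberately small Python sketch for the example sentence. It is not DeCo and does not use its dictionary or rule formats: the lexicon, the function names and the fixed English word order are all assumptions made for illustration, and the verb is always realized in the present tense.

```python
# Hypothetical toy lexicon (universal word -> English forms); NOT a real UNL dictionary.
LEXICON = {
    "eat(icl>do)": {"lemma": "eat"},
    "boy(icl>person)": {"lemma": "boy"},
    "potato(icl>food)": {"lemma": "potato", "plural": "potatoes"},
    "kitchen(icl>facilities)": {"lemma": "kitchen"},
}


def realize_noun(uw, attributes):
    """Attribute 'pl' -> plural form; attribute 'def' -> definite article."""
    entry = LEXICON[uw]
    form = entry["lemma"]
    if "pl" in attributes:
        form = entry.get("plural", form + "s")
    if "def" in attributes:
        form = "the " + form
    return form


def realize_verb(uw, subject_attributes):
    """Present tense only (an assumption); agree in number with the subject."""
    lemma = LEXICON[uw]["lemma"]
    return lemma if "pl" in subject_attributes else lemma + "s"


def generate(relations):
    """Map agt -> subject, obj -> direct object, plc -> 'in' adjunct (assumed rules)."""
    verb = None
    roles = {}
    for label, (head, head_attrs), dependent in relations:
        if "entry" in head_attrs:
            verb = head
        roles[label] = dependent
    if verb is None or "agt" not in roles:
        raise ValueError("expected an entry predicate with an agent")
    subject_uw, subject_attrs = roles["agt"]
    words = [realize_noun(subject_uw, subject_attrs),
             realize_verb(verb, subject_attrs)]
    if "obj" in roles:
        words.append(realize_noun(*roles["obj"]))
    if "plc" in roles:
        words.append("in " + realize_noun(*roles["plc"]))
    return " ".join(words)


# Each relation: (label, (head UW, head attributes), (dependent UW, dependent attributes))
SENTENCE = [
    ("agt", ("eat(icl>do)", {"entry"}), ("boy(icl>person)", {"def"})),
    ("obj", ("eat(icl>do)", {"entry"}), ("potato(icl>food)", {"pl"})),
    ("plc", ("eat(icl>do)", {"entry"}), ("kitchen(icl>facilities)", {"def"})),
]

print(generate(SENTENCE))  # the boy eats potatoes in the kitchen
```

Even this toy version shows why the dictionary must be detailed: the irregular plural “potatoes” and the preposition for the place relation cannot be derived from the UNL expression alone.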
Figure 4 shows the architecture for direct generation: the “bilingual” dictionary
Natural Language-UNL and the generation rules feed the DeCo module, which carries out
the generation of a natural language text from the UNL text.
3.3.2 Combined Generation (reuse of transfer components)
The treatment of the Russian and French languages inside the UNL system is the perfect
example of combined generation within the UNL framework. Both teams have
integrated the UNL system into their transfer systems: ETAP in the Russian case [11]
and Ariane in the French one [14].
These systems have chosen to reuse the available generators of the target languages
and to develop an additional module that converts the UNL representation into a format
that the generators of their “transfer” systems can process. An example of the
combined architecture is shown in figure 5.
The so-called “UNL transfer module” is undoubtedly a new component to develop.
However, the experience with the systems mentioned above has shown that the
development costs of this module are lower than the costs of developing a new
generator that takes the UNL code as its direct input.
Fig. 4. Architecture of the direct generation in the UNL framework. [Diagram: the UNL
document base, the dictionary Ln and the generation rules feed DeCo, which produces the
text in natural language.]
Fig. 5. Combined Generation. [Diagram: the UNL document base feeds the UNL transfer
module (new module), whose output goes to the transfer generator (reused generator),
which produces the text in natural language.]
4 Can the UNL be a generation standard?
4.1 What is a standard?
Setting aside the formal definitions of “standard”, it could be said that a standard
is a set of rules, criteria and recommendations that allow a product to be built, or a
service to be designed and offered, in a proper way that assures:
• The universalization of the work, that is, a unique way of doing something that at
the same time can be independently evaluated, no matter who does it or when.
• The quality. When products or services have been carried out following a standard,
there is a certainty that the processes are well implemented and the product
quality is not at risk.
• The assessment of the product or service provisions, meaning that it can be
determined in a unique way whether a product or a service fulfills the specifications
it has been designed and built for.
Many more could be enumerated, but we focus on these three, which may be
the most intuitive. As has already been mentioned, the limited development of some
products (in this case, language generators) is due to the lack of standards that could
assure these characteristics. Because the inputs to generators are so diverse and
scattered, the output becomes the only means of assessing a generator, and the
resulting evaluation is subjective.
Although some researchers have not neglected this side of generation [15], such a
standard has not yet been established, either formally or de facto.
4.2 The UNL as a standard
Technically speaking, the lack of uniformity in the inputs to language generators is
almost the only reason that holds back further development. Therefore, the support
systems for multilingual services see their action limited to the specific languages
for which translation services, automatic or not, can be offered. However, expanding
the number of languages is not feasible with these methods. If the input to language
generators is not standardized, this problem will not be solved in a global way.
The only standardization would then be the choice of a content support that expresses
content in a unique way, in a specific language. Actually, this concept has
existed for many years: it is the interlingua concept. It is within this context that
the UNL can play a role. The UNL has not been conceived as an interlingua, but it
can be used as one. The interlinguas had their historic moment when they faced the
same problems as the other systems created for machine translation during the 80’s.
At the beginning of the 90’s it was clear that the subject of languages was much
more complex than it had seemed during the technological development of the 80’s and
the exaggerated optimism of the time.
It is not the purpose of this paper to describe the economic advantages of an
interlingua over the traditional systems of machine translation regarding many languages
(the traditional approach requires 90 systems to support 10 languages, while one
based on interlingua requires only 20). In fact, the crossing point between the two
approaches is at three languages: for more than three languages, interlingua is cheaper.
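These figures follow from the formulas given in Section 2.2.2: n*(n-1) systems for the transfer approach and 2*n components for the interlingua approach. The short Python check below is only an illustration with hypothetical function names; it treats every component as having the same cost, which is a simplifying assumption.

```python
def transfer_systems(n: int) -> int:
    """One transfer system per ordered (source, target) language pair."""
    return n * (n - 1)


def interlingua_components(n: int) -> int:
    """One analyzer plus one generator per language."""
    return 2 * n


for n in (2, 3, 4, 10):
    print(n, transfer_systems(n), interlingua_components(n))

# Smallest n at which the transfer approach needs at least as many
# components as the interlingua approach.
crossing = next(
    n for n in range(2, 100)
    if transfer_systems(n) >= interlingua_components(n)
)
print("crossing point:", crossing)
```

Solving n*(n-1) = 2*n for n greater than zero gives n = 3, where both approaches need 6 components; from four languages on, the interlingua count is strictly smaller.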
However, historic matters at the beginning of the 90’s buried the interlinguas
(mainly those developed in Japan and the USA) because, while the interlingua based
systems were not well defined, the “transfer” machine translation systems began to
offer more positive results. Even so, within the group of language technologies,
machine translation became somewhat discredited. At the end of the 90’s, the United
Nations opted for models based on interlingua approaches to define the multilingual
support systems for the Internet. The result is what is today named UNL, already
described in this chapter. Apparently, it would be the ideal system to solve the problem
of the absence of a standard input to language generators. Nevertheless, a standard is
something more than a technological solution. In short, a standard is evaluated through
the concept of maturity, which is associated with organizational maturity: there has to
be an organization behind the standard that is able to maintain it, modify it, and
study its acceptance and real use, among other factors. Currently, it could be said that
the UNL has weak and strong points with respect to formally becoming a standard [16]:
Weak points:
– Relatively recent technology
– Not too much implemented
– Quality system not implemented
Strong points:
– Worldwide organization behind (dissemination assured)
– Business expectations increased by the incorporation of minority languages
– Quality system defined
However, independently of the global factors, the technological approach is nowadays
the only one able to solve the problem of automatic multilingual generation systems.
Regarding the business approach, the expansion of multilingual systems on the
Internet requires much more than traditional systems of machine translation. This is
why the UNL is not just an interlingua, but a language to support knowledge
repositories, different ontological approaches, and other matters. Summarizing, the UNL
(or something similar to it) is necessary and needed by others.
5 A real experience: HEREIN and UNL
5.1 Herein and standardization of form and structure.
The Herein system (IST-2000-29355) [17] is a perfect example of a massively multilingual
environment. It constitutes an Internet-based facility for improving cultural