CAP: A Hierarchical Lexical Function
Amalia Todirascu
Linguistique, Langues, Paroles (LILPA)
University of Strasbourg
The Project
Goals
• to study a specific lexical function, CAP, in several languages (French, English, German) and two domains (economy, politics)
• to provide a complete linguistic description of this function
• to extend a multilingual ontology, Prolexbase (Tran and Maurel, 2006)
The Project (II)
Collaboration with the CLARIN European project (http://www.clarin.eu)
– WP3 Humanities overview
  • WP3.3 Call for collaboration with Humanities projects
– Collaboration
  • access to existing corpora and tools
  • consultancy
CAP – a Lexical Function
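CAP lexical function (Mel'čuk 1984, 1988, 1992, 1999): hierarchical relations between
• two persons
  – François Fillon est premier ministre de Nicolas Sarkozy (FR: François Fillon is Nicolas Sarkozy's prime minister)
  – Sebek em war ein Oberpriester ca. 1780 v.Chr (DE: Sebek em was a high priest, ca. 1780 BC)
• two organisations
  – Swiss Private Aviation AG, a fully-owned subsidiary of Swiss International Air Lines AG
  – Peugeot est une firme sochalienne (FR: Peugeot is a company based in Sochaux)
• a person and an organisation or a country
  – SWISS Finanzchef Marcel Klaus (DE: SWISS chief financial officer Marcel Klaus)
  – Traian Băsescu is the Romanian president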
Context
• linguistics: noun classifications (Kleiber 1990, Kleiber 1999, Jonasson 1994)
• lexical databases: WordNet (Miller, 1995), EuroWordNet (Vossen, 1998), BalkaNet (Tufis, 2004), FrameNet (Baker et al., 1998)
• ontologies: Prolexbase (Tran and Maurel, 2006; Grass et al., 2004), SUMO (Niles and Pease, 2001)
• several applications: information extraction, QA systems
The Methodology
we identify existing monolingual and parallel corpora
• DE, EN, FR
• CLARIN language resource registry
• tagged and raw corpora
• annotation tools (both from the repository and on-line web services)
we create our own multilingual corpora
The Methodology (II)
we apply several data extraction strategies
• searching synonyms of "chef/head of/Vorsitzender";
• searching Named Entities related by the CAP relation (Martine Aubry – Parti Socialiste);
• searching annotated persons and organizations through aligned corpora
we analyse the contexts to classify the expressions and their arguments
we extend the Prolexbase ontology
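The first strategy (searching for seed terms and inspecting their contexts) can be sketched as a simple keyword-in-context lookup. This is only an illustrative Python sketch, not the project's tooling, which relied on existing corpus interfaces and annotation tools. The names `SEED_TERMS` and `find_contexts`, the `window` parameter and the sample sentence are assumptions introduced here.

```python
import re

# Seed terms quoted on the slide; a real list would be extended with synonyms.
SEED_TERMS = {"fr": ["chef"], "en": ["head of"], "de": ["Vorsitzender"]}


def find_contexts(sentences, lang, window=5):
    """Return (matched term, left context, right context) for each seed-term hit.

    Contexts are limited to `window` whitespace-separated tokens on each side.
    """
    hits = []
    for sentence in sentences:
        for term in SEED_TERMS[lang]:
            # Whole-word, case-insensitive match of the seed term.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(sentence):
                left = sentence[:match.start()].split()[-window:]
                right = sentence[match.end():].split()[:window]
                hits.append((match.group(0), " ".join(left), " ".join(right)))
    return hits


# Invented sample sentence, used only to show the call.
sample = ["She was appointed head of the delegation in 2009."]
for hit in find_contexts(sample, "en"):
    print(hit)
```

The returned contexts are what a linguist would then classify by hand, as described in the next step of the methodology.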
Corpora (I)
Available public data
• Web interfaces (CQP)
• Various domains and genres
• monolingual: Wortschatz (http://corpora.informatik.uni-leipzig.de), IULA (http://bwananet.iula.upf.edu), COSMAS (http://www.ids-mannheim.de/cosmas2), BNC (http://www.natcorp.ox.ac.uk/)
• multilingual: Oslo (http://www.hf.uio.no), CLUVI (http://sli.uvigo.es/CLUVI), DGT-TM (http://langtech.jrc.it/DGT-TM.html)
Corpora (II)
Corpora built for the project
• monolingual: party chiefs (DE, EN), French president (FR) (200,000 tokens/language)
• multilingual (parallel and comparable):
  – airline companies (51,000-54,000 tokens)
  – European Parliament (127,000-134,000 tokens)
  – European Commission (175,000-195,000 tokens)
Domains: politics, economy
Preprocessing the Corpora
Unitex tool (Paumier, 2000)
• resources available for the three languages
• tools: tokenizer, lemmatizer and tagger
• CasSys (Friburger and Maurel, 2004) to annotate French Named Entities
WebLicht platform
• NE annotations for German and English
Sentence aligner: Alinea (Kraif, 2001)
Data extraction
three strategies for data extraction
A. we identify synonyms/hyponyms for English (WordNet, FrameNet) and their equivalents in French and German
  • chef, président, PDG, directeur général
  • chief executive officer, president, head of
  • Vorsitzender, Direktor
B. we search pairs of entities which are related by a CAP relation
  • Barack Obama – United States of America
  • José Manuel Barroso – la Commission européenne
  • Marcel Klaus – SWISS
C. we use aligned corpora and the French NER tool CasSys (Friburger and Maurel, 2004) to obtain relevant contexts of Person or Organization
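Strategy B can be illustrated with a small Python sketch: given pairs of entities known to stand in a CAP relation, keep the sentences where both names occur and look at the material between them as a candidate CAP expression. The function names `between` and `contexts_for_pairs` are hypothetical, plain substring matching stands in for real Named Entity annotation, and the two-sentence corpus reuses examples from these slides.

```python
# Entity pairs taken from the slide (person, organisation or country).
CAP_PAIRS = [
    ("Barack Obama", "United States of America"),
    ("José Manuel Barroso", "Commission européenne"),
    ("Marcel Klaus", "SWISS"),
]


def between(sentence, first, second):
    """Return the text separating two mentions, whatever their order, or None."""
    i, j = sentence.find(first), sentence.find(second)
    if i < 0 or j < 0:
        return None
    if i <= j:
        return sentence[i + len(first):j].strip()
    return sentence[j + len(second):i].strip()


def contexts_for_pairs(sentences, pairs=CAP_PAIRS):
    """Yield (person, organisation, linking text) when both names co-occur."""
    for sentence in sentences:
        for person, organisation in pairs:
            if person in sentence and organisation in sentence:
                yield person, organisation, between(sentence, person, organisation)


corpus = [
    "SWISS Finanzchef Marcel Klaus",      # contains a known pair
    "Peugeot est une firme sochalienne",  # contains no known pair
]
for person, organisation, link in contexts_for_pairs(corpus):
    print(person, "|", organisation, "|", link)
```

In this sketch the linking text for the first sentence is "Finanzchef", which is the kind of lexical unit the project collects. Substring matching is deliberately naive and would need NE annotation to cope with name variants.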
Data Extraction (II)
Problems
• few contexts from existing corpora (30 to 50)
• various queries
  – CQP/web interface
  – raw texts
• various annotations
  – few tagged corpora
  – almost no NE-annotated corpora
• heterogeneous tools to preprocess corpora
'CAP' lexical units
various lexical categories
• nouns: positions (e.g. Finanzdirektor), professions (infirmière en chef), titles (Dr.), army ranks (General)
• verbs: to lead, to organize, to command
A trilingual ontology
• 95 lexical units (FR), 93 lexical units (EN), 67 lexical units (DE)
• from existing lexical databases
• from corpora
Linguistic Analysis
argument types
• persons, organizations, places
• common nouns: anaphoric references to organisations or persons in charge
• nationality adjectives
various linguistic expressions
• nouns: morpho-syntactic variations
• verbs
• complex verbo-nominal predicates (sous la gouverne de, unter der Leitung von, under the direction of, become president, être élu …)
Morpho-Syntactic Properties
Nouns
• affixation: général, généralissime (FR)
• composition: vice-roi (FR), viceroy (EN), Vizekönig (DE)
• modification
  – adjective (directeur général FR, Generaldirektor DE)
  – prepositional phrase (infirmière en chef FR, head nurse EN, Oberschwester DE)
  – noun being the possessor of another noun (du Conseil de Sécurité des Nations Unies FR, United Nations Security Council EN, des UN-Sicherheitsrates DE)
Conclusion and Further Work
Study from the lexical semantics field: a hierarchy relation in a multilingual perspective (CAP)
• various expressions and various argument types
• data from monolingual and multilingual corpora
• trilingual ontology (FR, DE, EN): extension of Prolexbase
Overall experience
• querying various interfaces
• heterogeneous annotation information
• heterogeneous tools
• combining linguists' and computational linguists' competences
The Lexico-Syntactic Patterns
French patterns
• <Organization> de <Organization>
  – Conseil d'Administration de SWISS
English patterns
• <CAP function> of <Organisation>, <Person>
  – Chief executive officer of the company TAROM, M. Gheorghe Birla
German patterns
• <Person> <sein> <tokens>* <CAP function> <Organisation>
  – Peter Siegenthaler ist seit Juli 2000 Direktor der Eidgenössischen Finanzverwaltung
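As a rough illustration of how the German pattern could be matched on raw text, here is a Python sketch using a regular expression. It is not the project's implementation: in the project, persons and organisations come from Named Entity annotation, whereas here a person is approximated by a run of capitalised words and the organisation by the rest of the sentence. The names `SEIN_FORMS`, `CAP_NOUNS`, `GERMAN_PATTERN` and `match_german` are assumptions, and the short CAP noun list is only a sample of units that appear in these slides.

```python
import re

# A few forms of "sein" and a tiny sample of CAP nouns (illustrative lists only).
SEIN_FORMS = r"(?:ist|war|sind|waren)"
CAP_NOUNS = r"(?:Direktor|Vorsitzender|Oberpriester)"

# <Person> <sein> <tokens>* <CAP function> <Organisation>
GERMAN_PATTERN = re.compile(
    r"^(?P<person>[A-ZÄÖÜ]\w+(?: [A-ZÄÖÜ]\w+)*) "  # crude stand-in for a Person NE
    + SEIN_FORMS
    + r" (?P<tokens>.*?)\b(?P<cap>" + CAP_NOUNS + r")\b"
    + r"(?: (?P<organisation>.+))?$"               # crude stand-in for an Organisation NE
)


def match_german(sentence):
    """Return the pattern slots as a dict, or None if the sentence does not match."""
    match = GERMAN_PATTERN.match(sentence)
    if match is None:
        return None
    return {
        "person": match.group("person"),
        "tokens": match.group("tokens").strip(),
        "cap": match.group("cap"),
        "organisation": match.group("organisation"),
    }


example = "Peter Siegenthaler ist seit Juli 2000 Direktor der Eidgenössischen Finanzverwaltung"
print(match_german(example))
```

On the slide's example, the sketch fills the slots with "Peter Siegenthaler", "seit Juli 2000", "Direktor" and "der Eidgenössischen Finanzverwaltung". The French and English patterns could be approximated the same way, with the same caveat about replacing the capitalisation heuristic by real NE annotations.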