Using a Generative Lexicon Resource to Compute Bridging Anaphora in Italian: preliminary...

Using a Generative Lexicon Resource to Compute Bridging Anaphora in Italian: preliminary observations and data Tommaso Caselli Istituto di Linguistica Computazionale – ILC-CNR Pisa Dip. Di Linguistica “T. Bolelli”, Università degli Studi di Pisa {tommaso (dot) caselli (at) ilc (dot) cnr (dot) it} CBA 2008, Barcelona, 14 November 2008



Using a Generative Lexicon Resource to Compute Bridging Anaphora in Italian: preliminary observations and data

Tommaso Caselli

Istituto di Linguistica Computazionale – ILC-CNR Pisa
Dip. di Linguistica “T. Bolelli”, Università degli Studi di Pisa
{tommaso (dot) caselli (at) ilc (dot) cnr (dot) it}

CBA 2008, Barcelona, 14 November 2008


Outline

Motivations

Bridging in Italian: corpus-study

Introducing a Different Resource: PAROLE/SIMPLE/CLIPS

Preliminary Experiments and Evaluation

Conclusion & Future Work


Motivations

• Bridging anaphora is a very challenging phenomenon, and its resolution is essential to improving the performance of many NLP applications (question answering, information retrieval, information extraction and summarization);

• So far, the use of (lexical) resources has concentrated on exploiting semantic relations (meronymy, synonymy, hyponymy ...), but the results show limitations: the relation between the bridging anaphor and the anchor is not always a semantic relation in the classical sense;

• Relations between words are not randomly created by speakers. This calls for resources grounded in strong theoretical frameworks that can account for the way words combine and are related:

Generative Lexicon (G.L.) & G.L.-based resources


Bridging anaphora: theoretical assumptions

• it is “a type of indirect textual reference whereby a new referent is introduced as an anaphoric not of but via the referent of an antecedent expression” [Kleiber 1999: 339];

Yesterday we went for a picnic, but I forgot to put the beers in the fridge.

• it is a class of inferences required to maintain the coherence of the discourse (Clark 1977);

• they give rise to three kinds of presupposition:

  • the Uniqueness Presupposition;

  • the Familiarity/Identifiability Presupposition; and

  • the Inferential Presupposition, i.e. “the [N1] R [N2]”, e.g. N1 = [the beers], N2 = [a picnic], R = is_a_member_of.
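As a rough illustration, the pattern “the [N1] R [N2]” can be written down as a small record. The class and field names below are assumptions made for this sketch; only the example values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferentialPresupposition:
    """The pattern 'the [N1] R [N2]' (class and field names are illustrative)."""
    n1: str        # the bridging anaphor
    n2: str        # the anchor
    relation: str  # R, the relation linking N1 to N2

    def paraphrase(self) -> str:
        # Spell the presupposition out as "N1 R N2".
        return f"{self.n1} {self.relation} {self.n2}"


beers = InferentialPresupposition(
    n1="the beers", n2="a picnic", relation="is_a_member_of"
)
print(beers.paraphrase())
```

Resolving a bridging anaphor then amounts to finding a value for `relation` once `n1` and a candidate `n2` are known.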


Bridging anaphora: theoretical assumptions (2)

• the identification of their antecedents depends on the local focus of the discourse (Sidner 1979, Poesio 2003);

• three pragma-cognitive dimensions can be identified for their interpretation (Korzen 2003):

  • Lexical Semantics Dimension;

  • Co-textual Dimension (discourse structure);

  • Con-textual Dimension (scripts, frames, world knowledge).


Bridging anaphora: corpus-study

Two-fold corpus study:

1) General corpus study on Full Definite Noun Phrases (FDNPs) in Italian;

2) A study of those FDNPs which are instances of bridging anaphora in Italian.

METHODOLOGICAL NOTE

a corpus of seventeen randomly chosen articles from the Italian financial newspaper “Il Sole-24 Ore” (a work package of the SI-TAL project);

processing requirements were used to classify both FDNPs in general and bridging anaphors;

Minimal vs Maximal NP (MUC-7);

all instances of NPs (pronouns, including zero anaphora, and lexical expressions), VPs and frames were considered as candidate anchors;

pre- and post-nominal modifiers (adjectives, non-finite verb forms, relative clauses and prepositional phrases) were considered as disambiguating clues.


Bridging anaphora: corpus-study (1)

Full Definite Noun Phrases in Italian:

CLASS              NUMBER OF ITEMS    PERCENTAGE
First Mention      833                58.61%
Direct Anaphora    170                12.03%
Bridging           299                21.17%
Possessives        36                 2.54%
Idiom              25                 1.62%
Doubt              49                 3.47%
Total              1412               100%


Bridging anaphora: corpus-study (2)

Bridging Anaphora in Italian:

CLASS OF BRIDGING FDNPs    NUMBER OF ITEMS    PERCENTAGE
Lexical                    119                39.79%
Event                      18                 6.02%
Rhetorical Relation        27                 9.03%
Inferential                109                36.45%
Discourse Topic            26                 8.69%
Total                      299                100%
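The percentage column of the bridging table can be recomputed from the item counts. The sketch below is only a check on the arithmetic: the counts are copied from the table, while the variable and function names are illustrative. The last digit can differ from the slide depending on whether values are rounded or truncated.

```python
# Item counts per class of bridging FDNP, copied from the table.
counts = {
    "Lexical": 119,
    "Event": 18,
    "Rhetorical Relation": 27,
    "Inferential": 109,
    "Discourse Topic": 26,
}

total = sum(counts.values())


def share(n: int, whole: int) -> float:
    """Percentage share of n in whole."""
    return 100.0 * n / whole


shares = {name: share(n, total) for name, n in counts.items()}

# Print the classes from most to least frequent.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {counts[name]:>4} {pct:6.2f}%")
```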

LEXICAL SEMANTICS > PRAGMATICS > DISCOURSE STRUCTURE

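• 221 anchors are nominal entities, and ~70% have a lookback in the range 0-2

• 53.84% (119/221) of the anchors are previous Cbs/Cps (Centering Theory)

• 25.33% are proper names

• 34.03% are NPs of postmodifying PPs, i.e. the explicit argument of the head noun of Löbner’s FC2, e.g.:

4) i due Paesi - i due partner commerciali [the two countries - the two trading partners]: I negoziatori dei due Paesi hanno annunciato che i colloqui “informali” in corso da giovedì scorso nella capitale Usa hanno portato all'alba di martedì al compromesso [...]. Doppiato questo scoglio [...] i due partner commerciali hanno promesso di procedere a passo spedito.

[Translation: The negotiators of the two countries announced that the “informal” talks under way since last Thursday in the US capital led to the compromise at dawn on Tuesday [...]. Having cleared this hurdle [...] the two trading partners promised to proceed at a brisk pace.]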


A Different Resource: PAROLE/SIMPLE/CLIPS

As the corpus study has shown, more than 45% of the relations between anchor and bridging anaphor are based on relations which are not strictly lexical.

WHY USE (AGAIN) A LEXICAL RESOURCE?

SIMPLE is based on the Generative Lexicon (Pustejovsky, 1995):

a formal framework which explains how senses are generated in the lexicon;

the basic qualia (telic, constitutive, agentive and formal) enable the description of the meaning of a word and capture orthogonal relations between semantic units;

the span of semantic relations in the G.L. framework is much wider, and it reduces the need for world/pragmatic knowledge to explain semantic relations between words

http://www.ilc.cnr.it/clips/


A Different Resource: PAROLE/SIMPLE/CLIPS (2)

PAROLE/SIMPLE/CLIPS is the largest computational lexical knowledge base of the Italian language:

SEMANTICS

Lemmas: 45,437
  verbs           2,830
  common nouns    14,088
  proper nouns    526
  adjectives      1,856

Semantic Units: 57,101
  verbs           5,351
  common nouns    19,123
  proper nouns    873
  adjectives      3,163


A Different Resource: PAROLE/SIMPLE/CLIPS (3)

SEMANTIC UNIT

Ontological Type

Domain

Event Type

Semantic Properties

FEATURES

RELATIONS

Extended Qualia

Synonymy

Derivation

Regular Polysemy
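One way to picture the components listed for a semantic unit is as a plain record. The following is a minimal sketch, not the CLIPS database schema: the class, its field names and the way relations are stored are all assumptions, and the ontological type given in the usage example is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SemanticUnit:
    """Hypothetical container for one SemU; not the actual resource format."""
    name: str
    ontological_type: str
    domain: Optional[str] = None
    event_type: Optional[str] = None
    # Semantic properties stored as feature name -> value.
    semantic_properties: Dict[str, str] = field(default_factory=dict)
    # Relations (extended qualia, synonymy, derivation, regular polysemy)
    # stored as relation name -> list of target SemU names.
    relations: Dict[str, List[str]] = field(default_factory=dict)

    def add_relation(self, relation: str, target: str) -> None:
        self.relations.setdefault(relation, []).append(target)

    def targets(self, relation: str) -> List[str]:
        return self.relations.get(relation, [])


# "Instrument" is a placeholder value, not taken from the resource.
pistola = SemanticUnit(name="pistola", ontological_type="Instrument")
pistola.add_relation("is_a", "arma")
print(pistola.targets("is_a"))
```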


A Different Resource: PAROLE/SIMPLE/CLIPS (4)


A Different Resource: PAROLE/SIMPLE/CLIPS (5)

Qualia structure:

the classical 4 qualia have been extended, up to 64 relations

finer-grained specification of meaning dimensions

from a single keyword it is possible to retrieve and extract a set of semantic units, regardless of their semantic type, which creates a rich semantic network in the text

FORMAL TELIC AGENTIVE CONSTITUTIVE

5 semantic relations 35 semantic relations 10 semantic relations 14 semantic relations

THESE SEMANTIC RELATIONS ARE TAKEN TO EXPRESS THE R ELEMENT OF THE INFERENTIAL PRESUPPOSITION TO RESOLVE BRIDGING ANAPHORS

PISTOLA (gun) – ARMA (weapon) SemRel= is_a

MORTE (death) – SUICIDIO (suicide) SemRel= resulting_state

BENZINA (petrol) – PETROLIO (oil) SemRel= derived_from

PROIETTILE (bullet) – COLPIRE (to hit) SemRel= used_for
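Taking a semantic relation as the R element means that, for a given anaphor and candidate anchor, the resource is asked which relation, if any, links the two words. A minimal sketch of such a lookup follows, using only the four pairs listed above. The function name and the table layout are assumptions, and the direction of each relation is deliberately ignored.

```python
from typing import Optional

# Word pairs and relations as listed on the slide.
_PAIRS = [
    ("pistola", "arma", "is_a"),
    ("morte", "suicidio", "resulting_state"),
    ("benzina", "petrolio", "derived_from"),
    ("proiettile", "colpire", "used_for"),
]

# Keyed by an unordered pair, so the lookup works in either direction.
RELATIONS = {frozenset((a, b)): rel for a, b, rel in _PAIRS}


def relation_between(word1: str, word2: str) -> Optional[str]:
    """Return the relation linking the two words, or None if none is listed."""
    return RELATIONS.get(frozenset((word1.lower(), word2.lower())))


print(relation_between("arma", "pistola"))
print(relation_between("pistola", "petrolio"))
```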


A Different Resource: PAROLE/SIMPLE/CLIPS (6)

1. i prezzi – al consumatore [the prices – to the consumer]; INFERENTIAL indirect_telic + agent_verb

2. il processo – gli imputati [the trial – the defendants]; INFERENTIAL member_of

3. essersi sparato – il suicidio [to shoot oneself – the suicide]; EVENT resulting_state

4. fatto esplodere – le macerie [exploded – the debris]; EVENT result_of

5. condannare – il pubblico ministero [to convict – the public prosecutor]; EVENT relates

6. il voto – l’elezione [the vote – the election]; RHET. RELATION purpose


Experiments and evaluation

Experiment: 129 bridging anaphor – anchor pairs were selected from the corpus study, belonging to the following classes:

Lexical

Event

Rhetorical Relations

Inferential

Anaphoric relations involving named entities (N.E.) were excluded


Experiments and evaluation (2)

Bridging anaphor Anchor

SIMPLE

WSD of the bridging anaphor

selection of the anchor

- automatic retrieval of the semantic relation

- a maximum of 2 semantic arcs allowed

- direct connection between the 2 SemUs or between the 2 SemTypes.
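The retrieval step with its limit of two semantic arcs can be sketched as a bounded breadth-first search over a relation graph. This is an illustration under stated assumptions, not the procedure actually run on SIMPLE: the graph format and the function name are hypothetical, arcs are followed in one direction only, the WSD step and the SemType-level connection are not modelled, and the arc from "petrolio" to "sostanza" is invented to show a two-arc path.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# node -> list of (relation, neighbour) arcs
Graph = Dict[str, List[Tuple[str, str]]]
Arc = Tuple[str, str, str]  # (source, relation, target)


def find_path(graph: Graph, start: str, goal: str,
              max_arcs: int = 2) -> Optional[List[Arc]]:
    """Return the shortest chain of arcs from start to goal using at most
    max_arcs arcs, or None if no such chain exists."""
    if start == goal:
        return []
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if len(path) == max_arcs:
            continue  # arc budget used up on this branch
        for relation, neighbour in graph.get(node, []):
            if neighbour in visited:
                continue
            new_path = path + [(node, relation, neighbour)]
            if neighbour == goal:
                return new_path
            visited.add(neighbour)
            queue.append((neighbour, new_path))
    return None


toy_graph: Graph = {
    "pistola": [("is_a", "arma")],
    "benzina": [("derived_from", "petrolio")],
    "petrolio": [("is_a", "sostanza")],  # invented arc, for illustration only
}

print(find_path(toy_graph, "pistola", "arma"))
print(find_path(toy_graph, "benzina", "sostanza"))
print(find_path(toy_graph, "benzina", "sostanza", max_arcs=1))
```

Breadth-first order means a direct connection is always preferred over a two-arc one when both exist.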


Experiments and evaluation: results

Resource    # Bridging      Lexical         Inferential     Event          Rhet. Relation
SIMPLE      22 (17.05%)     11 (50.00%)     7 (31.82%)      2 (9.09%)      2 (9.09%)
IWN         19 (14.72%)     12 (63.20%)     5 (26.31%)      2 (10.52%)     0

Unsatisfactory results BUT still better than using IWN (ItalWordNet)

Reason: many of the extended qualia relations have not been encoded in the resource

The classes of Inferential and Rhetorical Relations are mostly resolved by two types of qualia: CONSTITUTIVE & TELIC


Conclusion & Future Work

the use of a GL-based resource can be seen as a way of reducing the need for extralinguistic knowledge;

the problem of bridging anaphora resolution becomes part of a more general problem of identifying semantic relations between linguistic elements;

a resource that encodes GL qualia relations should not be compared with a world-knowledge database. GL-based relations are dynamic: they make it possible to discover new relations between lexical items and can account for the creative use of language.


Conclusion & Future Work (2)

qualia relations can serve as new features for machine learning approaches;

GL pattern induction from a corpus-based study can improve the resource by adding missing relations;

extensive exploitation of the SemTypes can remove the need to introduce single SemUs.

ESPLODERE (explode) - MACERIE (debris)

ESPLODERE Resulting_state SemU maceria SemType Cause_change_of_state

MACERIA result_of SemU esplodere SemType Cause_change_of_state

SemType Cause_change_of_state SemU DETRITO

SemU …………
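One possible reading of the ESPLODERE - MACERIE diagram is that a relation may point to a SemType instead of a single SemU, so that every SemU of that type satisfies it. The sketch below illustrates this reading only. The three tables and the function name are hypothetical, and which entry is stated at which level is an assumption made for the example.

```python
from typing import Dict, Set, Tuple

# SemU -> SemType (only Cause_change_of_state appears on the slide).
SEM_TYPE: Dict[str, str] = {"esplodere": "Cause_change_of_state"}

# Relations stated between single SemUs: (source, relation) -> target SemUs.
SEMU_RELATIONS: Dict[Tuple[str, str], Set[str]] = {
    ("maceria", "result_of"): {"esplodere"},
}

# Relations stated at type level: (source, relation) -> target SemTypes.
SEMTYPE_RELATIONS: Dict[Tuple[str, str], Set[str]] = {
    ("detrito", "result_of"): {"Cause_change_of_state"},
}


def is_related(source: str, relation: str, target: str) -> bool:
    """True if the relation holds directly between the two SemUs, or holds
    between the source SemU and the SemType of the target."""
    if target in SEMU_RELATIONS.get((source, relation), set()):
        return True
    target_type = SEM_TYPE.get(target)
    return target_type in SEMTYPE_RELATIONS.get((source, relation), set())


print(is_related("maceria", "result_of", "esplodere"))
print(is_related("detrito", "result_of", "esplodere"))
print(is_related("detrito", "result_of", "colpire"))
```

With the type-level entry, "detrito" is linked to "esplodere" without a dedicated SemU-to-SemU relation.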


Thanks


The Model:

2) Bill gave a book to Maria. The author is very famous.

Main DRS

Bill (Cb), book (Cp), Maria (Cf)

Bill(x1) book(x2) Maria(x3) give(x1,x3,x2) …………….

DRS 2

author (Cb)

author(y1) famous(y1)


Lexical Bridging

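la pistola (the gun) - l'arma (the weapon): E poi stupisce che nel tamburo della pistola mancasse un proiettile […]. Alcune tracce di ruggine, infatti, farebbero pensare che l'arma fu collocata nella cintura dei pantaloni almeno 4 o 5 giorni prima del ritrovamento del corpo.

[Translation: It is also surprising that a bullet was missing from the cylinder of the gun […]. Some traces of rust, in fact, would suggest that the weapon was placed in the trouser belt at least 4 or 5 days before the body was found.]

l’esplosivo (the explosive) – la bomba (the bomb): Gli agenti sono risaliti al furgone utilizzato per trasportare l'esplosivo nel garage e alla persona che l'aveva affittato, Salameh. Il suo arresto, anche per aver collaborato alla preparazione della bomba, fu seguito dalla cattura di Ayyad, un chimico

[Translation: The agents traced the van used to carry the explosive into the garage and the person who had rented it, Salameh. His arrest, also for having helped to prepare the bomb, was followed by the capture of Ayyad, a chemist]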


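Event

essersi sparato (to shoot oneself) - il suicidio (the suicide): dopo essersi sparato una prima volta con la sua “Smith e Wesson” calibro 38, carica a proiettili “rafforzati” […]. Ciò non esclude automaticamente l'ipotesi del suicidio, ma avvalora quella di successive manomissioni, effettuate subito dopo la morte.

[Translation: after shooting himself a first time with his .38-calibre “Smith e Wesson”, loaded with “reinforced” bullets […]. This does not automatically rule out the suicide hypothesis, but it supports that of subsequent tampering, carried out right after the death.]

rispose (he answered) - le domande (the questions): Nel 1993, invece, rispose positivamente alle domande degli inquirenti perché ritenne che il clima politico consentisse di parlare liberamente.

[Translation: In 1993, by contrast, he answered the investigators' questions positively because he believed that the political climate allowed him to speak freely.]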


Rhetorical Relations

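il voto (the vote) - l'elezione (the election): il voto di lista maggioritario per l'elezione in assemblea dei componenti del cda (che per altro verranno retribuiti anche in relazione ai risultati ottenuti dalla società)

[Translation: majority list voting for the election, at the shareholders' meeting, of the board members (who, moreover, will also be paid in relation to the results achieved by the company)]

due elementi (two elements) - il voto (the vote) / i limiti (the limits): [i tre ministri] che hanno voluto introdurre nello statuto due elementi finora sconosciuti nell'universo italiano delle privatizzazioni: il voto di lista maggioritario per l'elezione in assemblea dei componenti del cda (che per altro verranno retribuiti anche in relazione ai risultati ottenuti dalla società) e, soprattutto, i limiti imposti al tetto azionario che vanno ben oltre il vincolo del 5 per cento.

[Translation: [the three ministers] who wanted to introduce into the charter two elements so far unknown in the Italian world of privatizations: majority list voting for the election, at the shareholders' meeting, of the board members (who, moreover, will also be paid in relation to the results achieved by the company) and, above all, the limits imposed on the shareholding ceiling, which go well beyond the 5 per cent constraint.]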


Inferential

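quattro uomini (four men) - i quattro immigrati (the four immigrants): “Non potrò veder crescere mio figlio perché quattro uomini hanno deciso di far saltare simboli americani” [...]. Neppure la chiusura del processo ai quattro immigrati di origine araba promette però di scrivere la parola fine

[Translation: “I will not be able to see my son grow up because four men decided to blow up American symbols” [...]. Not even the close of the trial of the four immigrants of Arab origin, however, promises to write the final word]

il tribunale (the court) - il giudice (the judge): La decisione del tribunale era parsa scontata e non ha sorpreso neppure Mohammed Salameh, Ahmad Ajaj, Mahmud Abouhalima e Nidal Ayyad, il gruppo di fondamentalisti islamici sotto processo. “Mi aspetto il massimo della pena” aveva detto Ajaj poco prima di ascoltare il responso del giudice [...].

[Translation: The court's decision had seemed a foregone conclusion and did not surprise even Mohammed Salameh, Ahmad Ajaj, Mahmud Abouhalima and Nidal Ayyad, the group of Islamic fundamentalists on trial. “I expect the maximum sentence,” Ajaj had said shortly before hearing the judge's verdict [...].]

la Cina (China) – Pechino (Beijing): gli Stati Uniti sono parsi più vicini a trovare una soluzione di compromesso anche sulla controversia con la Cina sui diritti umani. Il segretario di Stato Warren Christopher avrebbe infatti stabilito che Pechino ha soddisfatto richieste specifiche alle quali gli Usa.

[Translation: the United States appeared closer to finding a compromise solution also on the dispute with China over human rights. Secretary of State Warren Christopher has reportedly determined that Beijing has met specific requests to which the US]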


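fatto esplodere (exploded) - le macerie (the debris): I terroristi hanno fatto esplodere una potentissima carica di esplosivo nel garage dei più alti grattacieli di New York: tra le macerie persero la vita sei persone e altre mille rimasero ferite,

[Translation: The terrorists detonated a very powerful explosive charge in the garage of the tallest skyscrapers in New York: six people lost their lives in the debris and another thousand were injured,]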