Annotating Attribution Relations Towards an Italian Discourse Treebank


Page 1: Annotating Attribution Relations Towards an Italian Discourse Treebank

Annotating Attribution Relations

Towards an Italian Discourse Treebank

Silvia Pareti

Irina Prodanof

Page 2: Annotating Attribution Relations Towards an Italian Discourse Treebank

Outline

• Introduction
• Related works
• Goal and methodology
• Proposed scheme
• Some issues
• Pilot annotation
• Attribution figures
• Conclusion and future work

Page 3: Annotating Attribution Relations Towards an Italian Discourse Treebank

Introduction

ATTRIBUTION in a text is ascribing the ownership of an attitude towards some linguistic material, i.e. the text itself, a portion of it or their semantic content, to an entity.

Recognising attribution relations is fundamental for Information Extraction, (Multi Perspective) Question Answering, Opinion Mining etc.

Different sources can differ in bias and reliability, and this deeply affects the way we perceive information.

Fiona says “This afternoon it will rain”

Page 4: Annotating Attribution Relations Towards an Italian Discourse Treebank

Introduction

Why should we identify the source of a portion of text?

NLP techniques

Question comprehension

Information Retrieval

Finding text fragments with the answer

Language Generation

Answer generation

Answer selection

ODQA

• visualize only authoritative answers

• collect different opinions, hearsay

• discard second-hand or anonymous information

• retrieve statements from a specific source over a given time span

• …
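The last use case, retrieving statements from a specific source over a given time span, can be sketched as a simple filter over attributed statements. This is only an illustration: the `Statement` record, its field names, and the toy data (apart from the first statement) are assumptions and not part of the annotation scheme.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Statement:
    source: str      # entity the content is attributed to
    content: str     # the attributed text
    published: date  # date of the enclosing document (hypothetical field)


def statements_by_source(statements: List[Statement], source: str,
                         start: date, end: date) -> List[Statement]:
    """Keep statements attributed to `source` and dated within [start, end]."""
    return [s for s in statements
            if s.source == source and start <= s.published <= end]


# Toy data: dates and the last two statements are invented for illustration.
corpus = [
    Statement("Fiona", "This afternoon it will rain", date(2010, 5, 3)),
    Statement("Fiona", "Tomorrow it will be sunny", date(2010, 6, 20)),
    Statement("anonymous", "It never rains here", date(2010, 5, 4)),
]

hits = statements_by_source(corpus, "Fiona", date(2010, 5, 1), date(2010, 5, 31))
print([s.content for s in hits])  # ['This afternoon it will rain']
```

A filter like this presupposes that every statement already carries its source, which is exactly what an attribution-annotated corpus would provide.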

Page 5: Annotating Attribution Relations Towards an Italian Discourse Treebank

Introduction

“È meglio vaccinarsi per l’influenza ‘suina’?”

Is it better to get the swine flu vaccine?

Page 6: Annotating Attribution Relations Towards an Italian Discourse Treebank

Introduction

“The vaccine is useless.”
orsetta90 (blogger: neither an authoritative nor a verifiable source)

“Everyone should get the vaccine.”
Novartis (pharmaceutical industry: authoritative but biased)

“Only persons at higher risk of complications from influenza should get the vaccine.”
Doctor association

“È meglio vaccinarsi per l’influenza ‘suina’?”

Is it better to get the swine flu vaccine?

Page 7: Annotating Attribution Relations Towards an Italian Discourse Treebank

Related works

Opinion holders identification projects

Bethard et al. (2004): Consider just opinion propositions (source = agent)

Kim and Hovy (2005): Identify all possible opinion holders, agentive and NPs (no pronouns)

Stoyanov and Cardie (2006): Identify NP sources

Choi et al. (2006): They do not consider implicit or multiple sources and test their system on the OPQA corpus

Opinion recognition has limited coverage and unsatisfactory precision: 60-70%

Page 8: Annotating Attribution Relations Towards an Italian Discourse Treebank

Related works

PDTB (Prasad et al., 2007)
• assertions, beliefs, facts, eventualities
• attribution of discourse connectives and their arguments only

Opinion Corpus (Wiebe, 2002)
• speech acts
• private states: opinions, beliefs, thoughts, feelings, emotions, goals, evaluations and judgements
• attribution considered as an intra-sentential phenomenon

GraphBank (Wolf and Gibson, 2005)
• attribution included as a directed coherence relation (satellite to nucleus)
• attribution of discourse segments

Page 9: Annotating Attribution Relations Towards an Italian Discourse Treebank

Goal and methodology

Designing an additional annotation level for attribution in the ISST (Italian Syntactic-Semantic Treebank) corpus.

• more complete and independent analysis of attribution
• development of an annotation schema
• pilot annotation of a portion of the ISST
• partial listing of possible attribution cues
• evaluation

Page 10: Annotating Attribution Relations Towards an Italian Discourse Treebank

Goal and methodology

ANALYSIS

SCHEMA DEFINITION

TOOL SELECTION

EVALUATION

ANNOTATION

•Scope definition

•Identification of characteristics and issues

•Selection of features to be annotated

•Design of the schema

•Annotation requirement definition

•Match tool characteristics and annotation requirements

•Setting the tool

•Evaluation of the schema applicability

•Pilot annotation and detection of issues

X

•Linguistic resource creation and release

Page 11: Annotating Attribution Relations Towards an Italian Discourse Treebank

Proposed schema

Attribution relation: SOURCE(S), CUE, CONTENT(S), (SUPPLEMENT)

Markables

Cue: verb, noun, adjective, preposition, prep. group, graphic marker

Source: noun phrase, adjective, prep. phrase

Content: word, phrase, clause, sentence, entire article

Supplement: cue modifier, indirect object, source of source, event specification

Page 12: Annotating Attribution Relations Towards an Italian Discourse Treebank

Proposed schema

Features

Attribution type
• assertion (e.g. dire 'say', osservare 'observe', sostenere 'maintain')
• belief (e.g. credere 'believe', pensare 'think', dubitare 'doubt')
• fact (e.g. ricordare 'remember', sapere 'know', sentire 'hear')
• eventuality (e.g. permettere 'allow', proibire 'forbid')

Source type
• writer
• other (e.g. il presidente 'the president', un uomo 'a man', Maria)
• arbitrary (e.g. uno 'one', la gente 'people', tutti 'everyone')
• mixed

Factuality
• factual
• non-factual

Scopal change
• none
• scopal change
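As a rough illustration of how the proposed schema could be held in memory, the sketch below encodes the four features as enumerations and one attribution relation as a record. The class and field names are assumptions made for this example, and the feature values chosen for the "Fiona says" sentence are illustrative rather than taken from the annotated corpus.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AttributionType(Enum):
    ASSERTION = "assertion"
    BELIEF = "belief"
    FACT = "fact"
    EVENTUALITY = "eventuality"


class SourceType(Enum):
    WRITER = "writer"
    OTHER = "other"
    ARBITRARY = "arbitrary"
    MIXED = "mixed"


class Factuality(Enum):
    FACTUAL = "factual"
    NON_FACTUAL = "non-factual"


class ScopalChange(Enum):
    NONE = "none"
    SCOPAL_CHANGE = "scopal change"


@dataclass
class AttributionRelation:
    cue: str                                              # e.g. an attributive verb
    contents: List[str]                                   # one or more content spans
    sources: List[str] = field(default_factory=list)      # empty if no source span is marked
    supplements: List[str] = field(default_factory=list)  # optional extra material
    attribution_type: AttributionType = AttributionType.ASSERTION
    source_type: SourceType = SourceType.OTHER
    factuality: Factuality = Factuality.FACTUAL
    scopal_change: ScopalChange = ScopalChange.NONE


# Fiona says "This afternoon it will rain" (feature values are illustrative).
rel = AttributionRelation(
    cue="says",
    contents=["This afternoon it will rain"],
    sources=["Fiona"],
)
print(rel.attribution_type.value, rel.source_type.value)  # assertion other
```

Keeping sources and contents as lists leaves room for multiple sources and multiple contents within a single relation.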

Page 13: Annotating Attribution Relations Towards an Italian Discourse Treebank

Source

Some issues

• Nested attribution

• Multiple sources

• Source of source

• Pronominal and bridging anaphora

Page 14: Annotating Attribution Relations Towards an Italian Discourse Treebank

[Sue said {that Mary believes (that Gore won the election)}].

Sources: [writer] {writer, Sue} (writer, Sue, Mary) (Wiebe, 2002:5 - with the addition of brackets)

Blinder, secondo voci riferite dal New York Times, sperava di succedere al presidente Greenspan quando a marzo scadrà la sua nomina. (ISST re070)

Blinder, according to rumours reported by the New York Times, hoped to succeed President Greenspan when his appointment expires in March.
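The nested sources in the Sue/Mary example follow a simple pattern: each level of embedding inherits the chain of the level above and appends its own source. A minimal sketch of that bookkeeping, with a hypothetical function name:

```python
from typing import List


def nested_source_chains(sources: List[str], writer: str = "writer") -> List[List[str]]:
    """Build the source chain for each nesting level, outermost first.

    `sources` lists the attributed sources from the outermost to the
    innermost attribution; the writer is always the first link.
    """
    chains = [[writer]]
    for source in sources:
        chains.append(chains[-1] + [source])
    return chains


chains = nested_source_chains(["Sue", "Mary"])
print(chains)  # [['writer'], ['writer', 'Sue'], ['writer', 'Sue', 'Mary']]
```

The three chains correspond to the square, curly and round brackets in the example sentence.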

Some issues

Source

• Nested attribution

• Multiple sources

• Source of source

• Pronominal and bridging anaphora

Page 15: Annotating Attribution Relations Towards an Italian Discourse Treebank

Tutti, incluse le autorità, conoscono la loro provenienza, ma nessuno dice e fa nulla per prevenire il massacro di capi selvatici. (cs.morph020)

Everyone, including the authorities, knows their provenance, but no one says or does anything to prevent the massacre of wild animals.

Arbitrary Other

Some issues

Source

• Nested attribution

• Multiple sources

• Source of source

• Pronominal and bridging anaphora

Page 16: Annotating Attribution Relations Towards an Italian Discourse Treebank

(Ø) Ho saputo della squalifica di Garciano da Maurizio Damilano, vi giuro, non pensavo di arrivare primo. (ISST cs071)

(I) heard of Garciano's disqualification from Maurizio Damilano; I swear, I didn’t think I would come first.

Poi però, tramite la figlia che sta a Santiago, prima limita la portata del colloquio con Gaston Salvatore (“non è stata una vera intervista, solo una conversazione”), poi smentisce. (ISST period005)

Afterwards, however, through the daughter who lives in Santiago, (she) first plays down the significance of the talk with Gaston Salvatore (“it wasn’t a real interview, just a conversation”), then denies it.

Some issues

Source

• Nested attribution

• Multiple sources

• Source of source

• Pronominal and bridging anaphora

Page 17: Annotating Attribution Relations Towards an Italian Discourse Treebank

La Fermenta, a sentire l' arabo, è organizzata in modo che oggi consegue un utile pari al 35 per cento del fatturato. Questo il vero traguardo che dovrà nel tempo raggiungere la Pierrel. Ma come? Con tagli di mano d'opera? Nemmeno per sogno, dice El Sayed. (ISST els001)

Fermenta, according to the Arab, is organised so that it currently earns a profit equal to 35 per cent of turnover. This is the real goal that Pierrel will have to reach over time. But how? By cutting the workforce? No way, says El Sayed.

Some issues

Source

• Nested attribution

• Multiple sources

• Source of source

• Pronominal and bridging anaphora

Page 18: Annotating Attribution Relations Towards an Italian Discourse Treebank

Cue

Some issues

• Type definition

• Multimodal cues

• Scopal change

Page 19: Annotating Attribution Relations Towards an Italian Discourse Treebank

Cue

assertion    belief     facts       eventualities
affermare    credere    ricordare   permettere
sostenere    pensare    sapere      sostenere
osservare    dubitare   osservare   desiderare

"Vi daremo le statistiche alla fine", promettono i generali croati. (ISST cs030)

“We’ll give you the statistics at the end”, promise the Croatian generals.

Assertion / Eventuality
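The type-definition issue can be made concrete by indexing the cue verbs in the table above by attribution type: a verb listed under more than one type cannot be classified from a lexicon alone. This is a minimal sketch; the column assignment follows the table as read from the slide, and the function name is hypothetical.

```python
from typing import Dict, List

# Cue verbs per attribution type, column by column from the table.
CUE_TYPES: Dict[str, List[str]] = {
    "assertion": ["affermare", "sostenere", "osservare"],
    "belief": ["credere", "pensare", "dubitare"],
    "fact": ["ricordare", "sapere", "osservare"],
    "eventuality": ["permettere", "sostenere", "desiderare"],
}


def types_by_verb(cue_types: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the lexicon: map each verb to every type it is listed under."""
    index: Dict[str, List[str]] = {}
    for attribution_type, verbs in cue_types.items():
        for verb in verbs:
            index.setdefault(verb, []).append(attribution_type)
    return index


index = types_by_verb(CUE_TYPES)
ambiguous = {verb: types for verb, types in index.items() if len(types) > 1}
print(ambiguous)
# {'sostenere': ['assertion', 'eventuality'], 'osservare': ['assertion', 'fact']}
```

For such verbs the attribution type has to be decided from context, as with promettere in the example above.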

Some issues

• Type definition

• Multimodal cues

• Scopal change

Page 20: Annotating Attribution Relations Towards an Italian Discourse Treebank

"Sì - si adombra Matt - Un ruolo interessante: con Tarantino eravamo a buon punto, poi é arrivato Bruce. I suoi film incassano un po' più dei miei, no? Hanno scelto lui” …(ISST cs060)

“Yes - Matt grows dark - An interesting role: with Tarantino we were at a good point, then Bruce arrived. His films gross a bit more than mine, don't they? They chose him” …

Arlacchi sorride: “Pura paranoia politica. Non ho partecipato ai lavori solo a causa di un impegno privato…”. (ISST re095)

Arlacchi smiles: “Pure political paranoia. I didn’t take part in the proceedings only because of a private engagement…”.

Cue

Some issues

• Type definition

• Multimodal cues

• Scopal change

Page 21: Annotating Attribution Relations Towards an Italian Discourse Treebank

Cue

Strano destino, quello di Civitavecchia: finire spesso, troppo spesso, sulle pagine dei giornali per eventi misteriosi, oppure per fatti che nessuno vorrebbe accadessero nella sua città. (ISST cs090)

Strange destiny, that of Civitavecchia: ending up often, too often, in the news because of mysterious events, or because of events that no one would like to happen in their town.

Se c’è, cioè, una maggioranza in Parlamento in grado di affrontare seriamente una fase di riforme anche elettorali, Ø penso che la legislatura possa utilmente proseguire. (ISST re075)

If, that is, there is a majority in Parliament able to seriously tackle a phase of reforms, including electoral ones, (I) think that the legislature could usefully continue.

? = tutti vorrebbero non accadessero (everyone would like them not to happen)

Some issues

• Type definition

• Multimodal cues

• Scopal change

Page 22: Annotating Attribution Relations Towards an Italian Discourse Treebank

Some issues

Content

• Multiple contents

• Discontinuous spans

• Event anaphora

Page 23: Annotating Attribution Relations Towards an Italian Discourse Treebank

(Ø) Ho detto |che ero dalla sua parte| e |che ritenevo giusta la sua protesta|. (ISST cs063)

(I) said |that I was on his side| and |that I considered his complaint fair|.

Some issues

Content

• Multiple contents

• Discontinuous spans

• Event anaphora

Page 24: Annotating Attribution Relations Towards an Italian Discourse Treebank

"There's no question that some of those workers and managers contracted asbestos-related diseases," said Darrell Phillips, vice president of human resources for Hollingsworth & Vose. "But you have to recognize that these events took place 35 years ago. It has no bearing on our work force today."

(PDTB 0003)

Some issues

• Multiple contents

• Discontinuous spans

• Event anaphora

Content

Page 25: Annotating Attribution Relations Towards an Italian Discourse Treebank

“…L’umanità deve proclamare uno storico sciopero ad oltranza fino alla distruzione di tutti gli armamenti nucleari.” Le parole registrate di Gheddafi, …(ISST cs039)

“…Humanity must proclaim a historic all-out strike until all nuclear weapons are destroyed.” Gheddafi’s recorded words,…

Some issues

Content

• Multiple contents

• Discontinuous spans

• Event anaphora

Page 26: Annotating Attribution Relations Towards an Italian Discourse Treebank

Pilot annotation

Subcorpus:
• 50 articles from the ISST
• balanced
• 37,000 word tokens
• 461 attribution relations

Tools
• GATE
• Knowtator
• Annotator
• MMAX2
• Callisto

Tool requirements
• Discontinuous text selection
• Nested selection
• Relations
• Multiple sources/contents
• Pre-defined values selection
• Display customizability
• Ease of setting a scheme
• Ease of annotation
• XML stand-off output
• Reference to word index

MMAX2
• Base Data (original text)
• Scheme (annotation schema)
• Style (display structure)
• Customization (preferences)
• Markable (annotation)
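Three of the tool requirements (XML stand-off output, reference to word index, discontinuous text selection) can be illustrated together: the text is stored once as indexed words, and each markable only points at word ids. The sketch below is a simplified, hypothetical format; the element names, the `role` attribute and the span syntax are assumptions for illustration and are not claimed to match the files produced by any of the tools listed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

WORDS_XML = """<words>
  <word id="word_1">Fiona</word>
  <word id="word_2">says</word>
  <word id="word_3">This</word>
  <word id="word_4">afternoon</word>
  <word id="word_5">it</word>
  <word id="word_6">will</word>
  <word id="word_7">rain</word>
</words>"""

# Stand-off layer: markables reference word ids instead of copying text.
MARKABLES_XML = """<markables>
  <markable id="m1" role="source" span="word_1"/>
  <markable id="m2" role="cue" span="word_2"/>
  <markable id="m3" role="content" span="word_3..word_4,word_6..word_7"/>
</markables>"""


def expand_span(span: str) -> List[str]:
    """Expand 'word_3..word_5,word_7' into the list of word ids it covers."""
    ids: List[str] = []
    for part in span.split(","):
        if ".." in part:
            first, last = part.split("..")
            start, end = int(first.split("_")[1]), int(last.split("_")[1])
            ids.extend(f"word_{i}" for i in range(start, end + 1))
        else:
            ids.append(part)
    return ids


def resolve(words_xml: str, markables_xml: str) -> Dict[str, str]:
    """Map each markable role to the text its span points at."""
    words = {w.get("id"): w.text for w in ET.fromstring(words_xml)}
    return {
        m.get("role"): " ".join(words[i] for i in expand_span(m.get("span")))
        for m in ET.fromstring(markables_xml)
    }


resolved = resolve(WORDS_XML, MARKABLES_XML)
print(resolved["content"])  # This afternoon will rain
```

The content markable deliberately skips `word_5` to show a discontinuous span; the base text is never modified by the annotation layer.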

Page 27: Annotating Attribution Relations Towards an Italian Discourse Treebank

Pilot annotation

Page 28: Annotating Attribution Relations Towards an Italian Discourse Treebank

Attribution figures

Source type
WRITER 23
OTHER 375
ARBITRARY 62
MIXED 1

Scopal change
NONE 429
SCOPAL-CHANGE 7

Markables
CUE 461
SOURCE 329
CONTENT 468
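The source-type counts can be turned into proportions with a few lines of arithmetic. The counts below are the ones reported on the slide; the helper name is hypothetical.

```python
from typing import Dict

SOURCE_TYPE_COUNTS: Dict[str, int] = {
    "WRITER": 23,
    "OTHER": 375,
    "ARBITRARY": 62,
    "MIXED": 1,
}


def percentage_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each label's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


total = sum(SOURCE_TYPE_COUNTS.values())
shares = percentage_shares(SOURCE_TYPE_COUNTS)
print(total)   # 461
print(shares)  # {'WRITER': 5.0, 'OTHER': 81.3, 'ARBITRARY': 13.4, 'MIXED': 0.2}
```

The source-type counts sum to 461, which matches the number of attribution relations and of CUE markables in the subcorpus.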

Page 29: Annotating Attribution Relations Towards an Italian Discourse Treebank

Attribution figures

Attribution type and Factuality

Page 30: Annotating Attribution Relations Towards an Italian Discourse Treebank

Conclusion and future work

Achievements:

• more complete analysis of attribution

• definition of an annotation schema

• identification of issues and possible solutions

• partial listing of possible attribution cues

• annotation of a portion of the ISST corpus

Future work:

• testing of the interannotator agreement for the proposed schema

• redefinition of problematic or underspecified attributes

• annotation of the whole ISST corpus

• expanding the list of attribution cues

• relation between attribution and discourse connectives/ anaphora/ …

Page 31: Annotating Attribution Relations Towards an Italian Discourse Treebank

Conclusion and future work

ANNOTATED CORPUS

• Training tools for ODQA/MPQA/IE
• Discourse generation
• Statistical and combinatory analysis
• Research on journalistic discourse
• Testing algorithms for the recognition of attribution
• Development of corpora in other languages

Thank you