Annotating Attribution Relations Towards an Italian Discourse Treebank
Annotating Attribution Relations
Towards an Italian Discourse Treebank
Silvia Pareti
Irina Prodanof
Outline
• Introduction
• Related works
• Goal and methodology
• Proposed scheme
• Some issues
• Pilot annotation
• Attribution figures
• Conclusion and future work
Introduction
ATTRIBUTION in a text is the act of ascribing to an entity the ownership of an attitude towards some linguistic material, i.e. the text itself, a portion of it, or its semantic content.
Recognising attribution relations is fundamental for Information Extraction, (Multi Perspective) Question Answering, Opinion Mining etc.
Different sources can differ in bias and reliability, and this deeply affects the way we perceive information.
Fiona says “This afternoon it will rain”
Introduction
Why should we identify the source of a portion of text?
NLP techniques
Question comprehension
Information Retrieval
Finding text fragments with the answer
Language Generation
Answer generation
Answer selection
ODQA
• visualize only authoritative answers
• collect different opinions, hearsay
• discard second-hand or anonymous information
• retrieve statements from a specific source over a given time span
• …
Introduction
“È meglio vaccinarsi per l’influenza ‘suina’?”
Is it better to get the swine flu vaccine?
Introduction
“The vaccine is useless.”
“Everyone should get the vaccine.”
“Only people at higher risk of complications from influenza should get the vaccine.”
orsetta90
Novartis
Doctor association
Blogger: neither an authoritative nor a verifiable source
Pharmaceutical industry: authoritative but biased
Related works
Opinion holder identification projects
Bethard et al. (2004): consider only opinion propositions (source = agent)
Kim and Hovy (2005): identify all possible opinion holders, agentive and NPs (no pronouns)
Stoyanov and Cardie (2006): identify NP sources
Choi et al. (2006): they do not consider implicit or multiple sources and test their system on the OPQA corpus
Opinion recognition has limited coverage and unsatisfactory precision: 60-70%
Related works
PDTB (Prasad et al., 2007)
assertions, beliefs, facts, eventualities
Opinion Corpus (Wiebe, 2002)
speech acts
private states: opinions, beliefs, thoughts, feelings, emotions, goals, evaluations and judgements
GraphBank (Wolf and Gibson, 2005)
attribution included as a directed coherence relation (satellite to nucleus)
Attribution of discourse connectives and their arguments only
Attribution considered as an intra-sentential phenomenon
Attribution of discourse segments
Goal and methodology
Designing the addition of a level of annotation for attribution to the ISST (Italian Syntactic-Semantic Treebank) corpus.
• more complete and independent analysis of attribution
• development of an annotation schema
• pilot annotation of a portion of the ISST
• partial listing of possible attribution cues
• evaluation
Goal and methodology
ANALYSIS
SCHEMA DEFINITION
TOOL SELECTION
EVALUATION
ANNOTATION
• Scope definition
• Identification of characteristics and issues
• Selection of features to be annotated
• Design of the schema
• Annotation requirement definition
• Match tool characteristics and annotation requirements
• Setting the tool
• Evaluation of the schema applicability
• Pilot annotation and detection of issues
• Linguistic resource creation and release
Proposed schema
SOURCE(S) CUE CONTENT(S)
relation
(SUPPLEMENT)
Markables
CUE: verb, noun, adjective, preposition, prep. group, graphic marker
SOURCE(S): noun phrase, adjective, prep. phrase
CONTENT(S): word, phrase, clause, sentence, entire article
(SUPPLEMENT): cue modifier, indirect object, source of source, event specification
Features
Attribution type
• assertion (e.g. dire, osservare, sostenere)
• belief (e.g. credere, pensare, dubitare)
• fact (e.g. ricordare, sapere, sentire)
• eventuality (e.g. permettere, proibire)
Source type
• writer
• other (e.g. il presidente, un uomo, Maria)
• arbitrary (e.g. uno, la gente, tutti)
• mixed
Factuality
• factual
• non-factual
Scopal change
• none
• scopal change
Proposed schema
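The schema's markables and features can be pictured as a small record type. The following is only a minimal sketch: the class and field names are illustrative assumptions, not part of the original scheme definition, and the example values for "dice El Sayed" (including its factuality) are an assumed reading of one sentence from the corpus.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class AttributionType(Enum):
    ASSERTION = "assertion"
    BELIEF = "belief"
    FACT = "fact"
    EVENTUALITY = "eventuality"


class SourceType(Enum):
    WRITER = "writer"
    OTHER = "other"
    ARBITRARY = "arbitrary"
    MIXED = "mixed"


@dataclass
class AttributionRelation:
    """One attribution relation: SOURCE(S), CUE, CONTENT(S), optional SUPPLEMENT."""
    cue: str
    contents: List[str]                                # one or more content spans
    sources: List[str] = field(default_factory=list)   # empty when the source is implicit
    supplement: Optional[str] = None
    attribution_type: AttributionType = AttributionType.ASSERTION
    source_type: SourceType = SourceType.OTHER
    factual: bool = True
    scopal_change: bool = False


# Hypothetical annotation of "Nemmeno per sogno, dice El Sayed."
relation = AttributionRelation(
    cue="dice",
    contents=["Nemmeno per sogno"],
    sources=["El Sayed"],
    attribution_type=AttributionType.ASSERTION,  # "dire" is listed as an assertion cue
    source_type=SourceType.OTHER,                # a named third party
)
print(relation)
```

Keeping sources and contents as lists leaves room for the multiple-source and multiple-content cases discussed under "Some issues".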
Source
Some issues
• Nested attribution
• Multiple sources
• Source of source
• Pronominal and bridging anaphora
[Sue said {that Mary believes (that Gore won the election)}].
Sources: [writer] {writer, Sue} (writer, Sue, Mary) (Wiebe, 2002:5 - with the addition of brackets)
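The nested source chains in this example can be derived mechanically: each embedded attribution extends the chain of the level that contains it. A minimal sketch follows; the function name is a hypothetical helper, not something defined in the scheme.

```python
def nested_source_chains(embedded_sources):
    """Return one source chain per nesting level, outermost first.

    The writer is always the outermost source; every embedded source
    extends the chain of the level that contains it.
    """
    chain = ["writer"]
    chains = [tuple(chain)]
    for source in embedded_sources:
        chain.append(source)
        chains.append(tuple(chain))
    return chains


# "Sue said that Mary believes that Gore won the election."
chains = nested_source_chains(["Sue", "Mary"])
for level in chains:
    print(", ".join(level))
```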
Blinder, secondo voci riferite dal New York Times, sperava di succedere al presidente Greenspan quando a marzo scadrà la sua nomina. (ISST re070)
Blinder, according to rumours reported by the New York Times, hoped to succeed President Greenspan when his appointment expires in March.
Some issues
Source
• Nested attribution
• Multiple sources
• Source of source
• Pronominal and bridging anaphora
Tutti, incluse le autorità, conoscono la loro provenienza, ma nessuno dice e fa nulla per prevenire il massacro di capi selvatici. (cs.morph020)
Everyone, including the authorities, knows their provenance, but no one says or does anything to prevent the massacre of wild animals.
Arbitrary Other
Some issues
Source
• Nested attribution
• Multiple sources
• Source of source
• Pronominal and bridging anaphora
(Ø) Ho saputo della squalifica di Garciano da Maurizio Damilano, vi giuro, non pensavo di arrivare primo. (ISST cs071)
(I) heard of the disqualification of Garciano from Maurizio Damilano, I swear, I didn’t think I would come first.
Poi però, tramite la figlia che sta a Santiago, prima limita la portata del colloquio con Gaston Salvatore (“non è stata una vera intervista, solo una conversazione”), poi smentisce. (ISST period005)
Afterwards, however, through the daughter who lives in Santiago, (she) first plays down the scope of the talk with Gaston Salvatore (“it wasn’t a real interview, just a conversation”), then denies it.
Some issues
Source
• Nested attribution
• Multiple sources
• Source of source
• Pronominal and bridging anaphora
La Fermenta, a sentire l' arabo, è organizzata in modo che oggi consegue un utile pari al 35 per cento del fatturato. Questo il vero traguardo che dovrà nel tempo raggiungere la Pierrel. Ma come? Con tagli di mano d'opera? Nemmeno per sogno, dice El Sayed. (ISST els001)
Fermenta, according to the Arab, is organised so that it currently earns a profit equal to 35 per cent of turnover. This is the real goal that Pierrel will have to reach over time. But how? By cutting the workforce? No way, says El Sayed.
Cue
Some issues
• Type definition
• Multimodal cues
• Scopal change
Cue
assertion belief facts eventualities
affermare credere ricordare permettere
sostenere pensare sapere sostenere
osservare dubitare osservare desiderare
"Vi daremo le statistiche alla fine", promettono i generali croati. (ISST cs030)
“We’ll give you the statistics at the end”, promise the Croatian generals.
Assertion Eventuality
Some issues
• Type definition
• Multimodal cues
• Scopal change
"Sì - si adombra Matt - Un ruolo interessante: con Tarantino eravamo a buon punto, poi é arrivato Bruce. I suoi film incassano un po' più dei miei, no? Hanno scelto lui” …(ISST cs060)
“Yes - Matt darkens - An interesting role: with Tarantino we were well along, then Bruce arrived. His films gross a bit more than mine, right? They chose him” …
Arlacchi sorride: “Pura paranoia politica. Non ho partecipato ai lavori solo a causa di un impegno privato…”. (ISST re095)
Arlacchi smiles: “Pure political paranoia. I missed the proceedings only because of a private engagement…”.
Cue
Some issues
• Type definition
• Multimodal cues
• Scopal change
Cue
Strano destino, quello di Civitavecchia: finire spesso, troppo spesso, sulle pagine dei giornali per eventi misteriosi, oppure per fatti che nessuno vorrebbe accadessero nella sua città. (ISST cs090)
Strange destiny, that of Civitavecchia: ending up often, too often, in the news because of mysterious events, or because of events that no one would like to happen in their town.
Se c’è, cioè, una maggioranza in Parlamento in grado di affrontare seriamente una fase di riforme anche elettorali, Ø penso che la legislatura possa utilmente proseguire. (ISST re075)
If, that is, there is a majority in Parliament able to seriously tackle a phase of reforms, including electoral ones, (I) think that the legislature could usefully continue.
? = tutti vorrebbero non accadessero (everyone would like them not to happen)
Some issues
Content
• Multiple contents
• Discontinuous spans
• Event anaphora
(Ø) Ho detto |che ero dalla sua parte| e |che ritenevo giusta la sua protesta|. (ISST cs063)
(I) said |that I was on his side| and |that I considered his complaint fair|.
Some issues
Content
• Multiple contents
• Discontinuous spans
• Event anaphora
"There's no question that some of those workers and
managers contracted asbestos-related diseases,"
said Darrell Phillips, vice president of human
resources for Hollingsworth & Vose.
"But you have to recognize that these events took
place 35 years ago. It has no bearing on our work
force today."
(PDTB 0003)
Some issues
• Multiple contents
• Discontinuous spans
• Event anaphora
Content
“…L’umanità deve proclamare uno storico sciopero ad oltranza fino alla distruzione di tutti gli armamenti nucleari.” Le parole registrate di Gheddafi, …(ISST cs039)
“…Humanity must proclaim a historic all-out strike until all nuclear weapons are destroyed.” Gheddafi’s recorded words,…
Subcorpus:
• 50 articles from the ISST
• balanced
• 37,000 word tokens
• 461 attribution relations
MMAX2
Base Data (original text)
Scheme (annotation schema)
Style (display structure)
Customization (preferences)
Markable (annotation)
Tools
GATE
Knowtator
Annotator
MMAX2
Callisto
…
Pilot annotation
Tool requirements
Discontinuous text selection
Nested selection
Relations
Multiple sources/contents
Pre-defined values selection
Display customizability
Ease of setting a scheme
Ease of annotation
XML stand-off output
Reference to word index
Pilot annotation
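Several of the requirements above (discontinuous selection, stand-off XML output, reference to word index) can be illustrated with a small sketch. The element and attribute names below are assumptions chosen for illustration and are not claimed to match the file format of MMAX2 or any other tool listed; the token list is one sentence from the ISST examples.

```python
import xml.etree.ElementTree as ET

# Base data: the tokenised text, addressed by 1-based word index.
tokens = ["Nemmeno", "per", "sogno", ",", "dice", "El", "Sayed", "."]


def markable(level, spans, **attrs):
    """Build a stand-off markable; spans is a list of inclusive (start, end) word indices."""
    parts = [f"word_{a}" if a == b else f"word_{a}..word_{b}" for a, b in spans]
    return ET.Element("markable", {"level": level, "span": ",".join(parts), **attrs})


def resolve(span):
    """Map a span string back to the words it points at (handles discontinuous spans)."""
    words = []
    for part in span.split(","):
        first, _, last = part.partition("..")
        start = int(first.split("_")[1])
        end = int(last.split("_")[1]) if last else start
        words.extend(tokens[start - 1:end])
    return words


root = ET.Element("markables")
root.append(markable("content", [(1, 3)]))
root.append(markable("cue", [(5, 5)], attribution_type="assertion"))
root.append(markable("source", [(6, 7)], source_type="other"))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
print(resolve("word_1..word_3,word_6..word_7"))
```

Because the annotation only stores word indices, the original text stays untouched and several annotation levels can point at the same tokens.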
Attribution figures
Source type
WRITER 23
OTHER 375
ARBITRARY 62
MIXED 1
Scopal change
NONE 429
SCOPAL-CHANGE 7
Markables
CUE 461
SOURCE 329
CONTENT 468
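As a quick check on the source-type counts, they can be turned into percentages. The sketch below uses only the figures reported on the slide; the variable names are illustrative.

```python
# Source-type counts from the pilot annotation.
source_type_counts = {"WRITER": 23, "OTHER": 375, "ARBITRARY": 62, "MIXED": 1}

total = sum(source_type_counts.values())
shares = {
    label: round(100 * count / total, 1)
    for label, count in source_type_counts.items()
}

print(f"total relations: {total}")
for label, pct in shares.items():
    print(f"{label:<10}{pct:>6.1f}%")
```

The four source-type counts sum to 461, which matches the number of attribution relations and of CUE markables reported for the subcorpus.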
Attribution figures
Attribution type and Factuality
Conclusion and future work
Achievements:
• more complete analysis of attribution
• definition of an annotation schema
• identification of issues and possible solutions
• partial listing of possible attribution cues
• annotation of a portion of the ISST corpus
Future work:
• testing of the interannotator agreement for the proposed schema
• redefinition of problematic or underspecified attributes
• annotation of the whole ISST corpus
• expanding the list of attribution cues
• relation between attribution and discourse connectives/ anaphora/ …
ANNOTATED CORPUS
Training tools for ODQA/ MPQA/ IE
Discourse generation
Statistical and combinatory analysis
Research on journalistic discourse
Testing algorithms for the recognition of attribution
Development of corpora in other languages
…
Conclusion and future work
Thank you