Multilingual Event Extraction and Semi-automatic Acquisition of Related Resources
Hristo Tanev
Joint Research Centre
Ispra, Italy
NEXUS: News Event eXtraction Using language Structures
Event Extraction
Event extraction was introduced as a language processing task at MUC-2 in 1989
An event is something that happens; an event description is a template that describes an event
The goal of automatic event extraction is automatic filling of an event description template from a text or a set of texts
An event description usually includes:
• Event type
• Time and place of the event
• Participating entities, which have specific roles that depend on the event type, e.g. perpetrator, victim, instrument
• Cause
Event Extraction in the Context of EMM
The purpose of the automatic event extraction from online news is to facilitate the crisis-management efforts of the European Commission and other related political institutions
• NEXUS detects security-related events and disasters
• NEXUS monitors online news in near real time in English, French, Spanish, Italian, Russian, Portuguese, and Arabic (after automatic translation into English)
• Medical NEXUS detects news about disease outbreaks in English (soon to be deployed in French)
EMM Event Extraction from Online News
News cluster:
Car bomb kills 50 in Iraq
HindustanTimes, Wednesday, June 18, 2008 5:07:00 AM CEST
A car bomb blast in northern Baghdad left more than 50 people dead and 80 wounded on Tuesday, a police source said…

Biggest blast in months leaves at least 50 dead in Iraq
reliefWeb, Wednesday, June 18, 2008 5:05:00 AM CEST
A car bomb blast in northern Baghdad, the largest in months, left more than 50 people dead and 80 wounded on Tuesday, a police source said...
EMM Event Extraction from Online News
Event Description
• Date: 18 June 2008
• Place: Baghdad, Iraq
• Event type: terrorist attack
• Number killed: 50
• Number wounded: 80
• Number kidnapped: 0
• Perpetrators: not reported
• Weapons: car bomb
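As a rough illustration, the event description above could be held in a small record type. This is only a sketch: the class name `EventDescription` and its field names are assumptions made for this example, not the actual NEXUS data model.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventDescription:
    """Hypothetical template for one extracted event (field names are assumed)."""
    event_date: dt.date
    place: str
    event_type: str
    killed: int = 0
    wounded: int = 0
    kidnapped: int = 0
    perpetrators: Optional[str] = None  # None stands for "not reported"
    weapons: List[str] = field(default_factory=list)


# Values taken from the Baghdad example on the slide.
baghdad_event = EventDescription(
    event_date=dt.date(2008, 6, 18),
    place="Baghdad, Iraq",
    event_type="terrorist attack",
    killed=50,
    wounded=80,
    kidnapped=0,
    perpetrators=None,
    weapons=["car bomb"],
)
```

Using `None` for an unreported slot keeps "not reported" distinct from an empty value such as zero kidnapped.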
NEXUS
EMM Event Extraction Architecture
[Diagram components: News; Entity Match, Geo-Tagging, Clustering; Text Processing (NER, Parsing, Pattern Matching); Information Aggregation; Visualization; Events]
Partial Parsing
Example of a multilingual rule that recognizes NPs such as "a French volunteer and an Italian military":
coordination_rule :> ( person_group & [NAME:#name1, AMOUNT:"1" #amount1]
    (token & [SURFACE: ","]? person_group & [NAME:#name2, AMOUNT:"1" #amount2])?
    (token & [SURFACE: ","]? person_group & [NAME:#name3, AMOUNT:"1" #amount3])?
    conjunction
    person_group & [NAME:#name4, AMOUNT:"1" #amount4]):c
  c: person_group & [NAME:#final, AMOUNT:#amount, NUMBER:"p"]
     & #final := ConcForSum(#name1,#name2,#name3,#name4)
     & #amount := ConcForSum(#amount1,#amount2,#amount3,#amount4).
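To make the effect of the rule concrete, here is a minimal Python sketch of the merge step. It assumes that `ConcForSum` joins the names and adds up the amounts; the slides do not define that function, so this reading, the dictionary keys and the function name `coordinate_person_groups` are all assumptions.

```python
def coordinate_person_groups(groups):
    """Merge two to four singular person groups into one plural group.

    Assumed reading of the rule: names are concatenated and amounts summed.
    """
    if not 2 <= len(groups) <= 4:
        raise ValueError("the rule covers two to four coordinated groups")
    if any(g["amount"] != "1" for g in groups):
        raise ValueError("the rule only matches groups whose AMOUNT is 1")
    return {
        "name": " and ".join(g["name"] for g in groups),
        "amount": str(sum(int(g["amount"]) for g in groups)),
        "number": "p",  # the merged group is marked plural, as in the rule
    }


# The noun phrase from the slide: "a French volunteer and an Italian military"
merged = coordinate_person_groups([
    {"name": "French volunteer", "amount": "1"},
    {"name": "Italian military", "amount": "1"},
])
```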
Annotating Participating Entities
One of the most important tasks is to label person groups and other phrases with event-specific semantic roles, e.g. Perpetrator, Dead victim, Displaced people, Weapons used, etc.
• Linear patterns work well for English
• We also use linear patterns for Russian
• More elaborate event extraction grammars are used for Arabic, Italian, French, Spanish and Portuguese
Event-specific Grammars
Rule: <person-group> [introduce-passive] Verb[baseform: rimanere]? Adv? Verb[sem: injured-obj, passive-voice] <person-group> : injured
Cinque persone sono state ferite (Five people were injured)
Cinque persone sono state gravemente ferite (Five people were seriously injured)
Cinque persone sono rimaste ferite (Five people were left injured)
For details see [Zavarella et al., Event Extraction for Italian, Using a Cascade of Finite State Grammars, FSMNLP 2008]
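The rule above can be approximated with a regular expression to show how its optional elements line up with the three example sentences. This is a deliberately simplified sketch, not the finite-state grammar itself: the person group is reduced to "<number word> persone", the number lexicon is a toy one, and the names `INJURED_RULE` and `extract_injured` are assumptions.

```python
import re

# Toy number lexicon; a real grammar would use a full person-group recognizer.
NUMBER_WORDS = {"due": 2, "tre": 3, "quattro": 4, "cinque": 5}

INJURED_RULE = re.compile(
    r"(?P<group>(?P<amount>\w+)\s+persone)\s+"  # <person-group> (simplified)
    r"sono(?:\s+state)?\s+"                     # [introduce-passive]
    r"(?:rimaste\s+)?"                          # Verb[baseform: rimanere]?
    r"(?:\w+mente\s+)?"                         # Adv? (adverbs ending in -mente)
    r"ferite\b",                                # Verb[sem: injured-obj, passive-voice]
    re.IGNORECASE,
)


def extract_injured(sentence):
    """Return the person group labelled with the 'injured' role, or None."""
    match = INJURED_RULE.search(sentence)
    if match is None:
        return None
    return {
        "role": "injured",
        "group": match.group("group"),
        "amount": NUMBER_WORDS.get(match.group("amount").lower()),
    }


examples = [
    "Cinque persone sono state ferite",
    "Cinque persone sono state gravemente ferite",
    "Cinque persone sono rimaste ferite",
]
results = [extract_injured(s) for s in examples]
```

All three slide sentences yield the same labelled group, which is the point of making the `rimanere` form and the adverb optional.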
Multilingual Lexical Acquisition
Automatic learning of language-specific lexical resources
Weakly supervised statistical approaches that make use of large quantities of unannotated news
Learning of patterns, keywords and keyphrases, which can be manually validated, rather than statistical models such as SVMs
• Pattern learning
• Learning domain-specific lexica
• Learning semantic classes
Linear Pattern Learning
For English, we use the linear patterns as the algorithm learns them
We learned more than 3000 linear patterns for English
For Italian and other languages, linear patterns are the starting point for grammar development
Learning Semantic Classes
Sometimes, it is necessary to learn specific semantic classes, e.g. vehicles, disasters, weapons, facilities
We built a statistical system for automatic acquisition of semantic classes
The system is language-independent; only a list of language-specific stop words is needed
Ontopopulis
INPUT:
feelings: hatred, love, fear, sadness
contrasting classes: taste, (style, outlook), character, thoughts
Extracting New Terms
Newly learnt terms are ranked and then given to the user for evaluation
Top 20 terms from the category feelings:
grief, sorrow, sadness, condolences, fear, disappointment, regret, sympathy, shock, hatred, gratitude, frustration, anger, deep sorrow, profound, dismay, condolence, satisfaction, profound grief, deep grief
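The slides do not spell out how Ontopopulis scores terms. Purely to illustrate the input and output (seed terms, contrasting classes and stop words in; ranked candidate terms out), the sketch below uses a simple positional context-feature heuristic. The scoring scheme, the function names and the toy corpus are assumptions for this example, not the actual system.

```python
from collections import Counter


def _contexts(tokens, i, window, stop_words):
    """Yield (offset, word) features around position i, skipping stop words."""
    for offset in range(-window, window + 1):
        j = i + offset
        if offset != 0 and 0 <= j < len(tokens) and tokens[j] not in stop_words:
            yield (offset, tokens[j])


def _feature_counts(sentences, terms, window, stop_words):
    """Count the context features observed around any of the given terms."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in terms:
                counts.update(_contexts(tokens, i, window, stop_words))
    return counts


def rank_candidates(corpus, seeds, contrast, stop_words, window=2):
    """Rank unseen words by how strongly their contexts resemble the seeds'."""
    sentences = [s.lower().split() for s in corpus]
    pos = _feature_counts(sentences, seeds, window, stop_words)
    neg = _feature_counts(sentences, contrast, window, stop_words)
    # Keep features seen more often with the seeds than with contrasting terms.
    weights = {f: c - neg[f] for f, c in pos.items() if c > neg[f]}
    known = set(seeds) | set(contrast) | set(stop_words)
    scores = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok not in known:
                for feat in _contexts(tokens, i, window, stop_words):
                    scores[tok] += weights.get(feat, 0)
    ranked = [(t, s) for t, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy corpus, invented for illustration only.
toy_corpus = [
    "he expressed deep sadness",
    "she expressed deep fear",
    "they expressed deep grief",
    "he has good taste",
    "she has good character",
    "they have good manners",
]
ranking = rank_candidates(
    toy_corpus,
    seeds={"sadness", "fear"},
    contrast={"taste", "character"},
    stop_words={"he", "she", "they", "has", "have"},
)
```

On this toy corpus only "grief" shares the seed contexts, so it is the single ranked candidate; "manners" is filtered out because its context belongs to the contrasting classes.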
Using Learnt Semantic Classes for Event Extraction
We use Ontopopulis to learn terms, which we then add to our domain-specific dictionaries
Some rules require a domain-specific dictionary:
• Rules for parsing person-reference noun phrases, such as "two engineers"
• Rules which detect weapons used: killed with a [WEAPON] (killed with a gun)
• Detection of vehicles used: [PEOPLE] in a [VEHICLE] were stopped (three men in a boat were stopped)
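A minimal sketch of how such dictionary-backed rules might be compiled from learnt term lists. The dictionaries here are tiny hand-written stand-ins, the [PEOPLE] slot is reduced to any two words, and the helper names are assumptions; the real NEXUS rules are written in its own grammar formalism.

```python
import re

# Stand-in dictionaries; in practice these would hold the validated learnt terms.
WEAPONS = {"gun", "knife", "car bomb"}
VEHICLES = {"boat", "car", "truck"}


def _alternation(terms):
    """Build a regex alternation, longest terms first so multiword terms win."""
    ordered = sorted(terms, key=lambda t: (-len(t), t))
    return "|".join(re.escape(t) for t in ordered)


def find_weapon(text, weapons=WEAPONS):
    """Apply the pattern 'killed with a [WEAPON]'."""
    pattern = rf"\bkilled with an? (?P<weapon>{_alternation(weapons)})\b"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group("weapon") if match else None


def find_vehicle(text, vehicles=VEHICLES):
    """Apply the pattern '[PEOPLE] in a [VEHICLE] were stopped'."""
    pattern = (
        rf"\b(?P<people>\w+ \w+) in an? "
        rf"(?P<vehicle>{_alternation(vehicles)}) were stopped\b"
    )
    match = re.search(pattern, text, re.IGNORECASE)
    return (match.group("people"), match.group("vehicle")) if match else None


weapon = find_weapon("The victim was killed with a gun")
stopped = find_vehicle("three men in a boat were stopped")
```

Because the patterns are built from the dictionaries, adding a newly validated term extends the rule without touching the pattern itself.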
NEXUS Evaluation for English
Detection Task              Accuracy
Dead counting               70%
Injured counting            57%
Event classification        80%
Geo-tagging (country)       90%
Geo-tagging (place name)    61%
NEXUS Multilingual Evaluation
F1 measure    Dead    Wounded    Kidnapped    Arrested
Italian       0.87    0.62       -            0.67
Portuguese    0.69    0.51       0.67         0.47
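For reference, the F1 measure reported here is the harmonic mean of precision and recall. The helper below is a minimal sketch; the function name and the counts in the usage line are made up for illustration and are not NEXUS results.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct slot fills, 2 spurious, 2 missed.
example_f1 = f1_score(8, 2, 2)
```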
Evaluation of Ontopopulis
Accuracy (%) top 20    Person    Weapon    Politician    Vehicle    Watercraft    Edged weapon    Crime    Building
Portuguese             90        60        75            85         70            20              85       75
Spanish                95        60        -             -          -             -               -        -