Semantic Role Labeling for Arabic using Kernel Methods Mona Diab Alessandro Moschitti Daniele...

Post on 02-Jan-2016

Semantic Role Labeling for Arabic using Kernel Methods

Mona Diab

Alessandro Moschitti

Daniele Pighin

What is SRL?

Proposition

John opened the door

[John]Agent [opened]Predicate [the door]Theme

Syntactic functions: John = Subject, the door = Object

[The door]Theme [opened]Predicate

Syntactic functions: The door = Subject

FrameNet roles: Agent, Container_portal

PropBank roles: ARG0, ARG1

Why SRL?

• Useful for information extraction

• Useful for Question Answering

• Useful for Machine Translation?

Our Goal

Last Sunday India to official visit Rongji Zhu the-Chinese the-Ministers president started

The Chinese Prime Minister Zhu Rongji started an official visit to India [last Sunday]ARGM-TMP

RoadMap

• Arabic Characteristics

• Our Approach

• Experiments & Results

• Conclusions & Future Directions

Morphology

• Rich, complex morphology
– Templatic, concatenative, derivational, inflectional
– Example: wbHsnAthm = w+b+Hsn+At+hm, glossed "and by virtue(s) their"
– Verbs are marked for tense, person, gender, aspect, mood, voice
– Nominals are marked for case, number, gender, definiteness

• Orthography is underspecified for short vowels and consonant doubling (diacritics)

Syntax

Characteristics relevant for SRL

• Typical underspecification of short vowels masks morphological features such as case and agreement

– Example (unvowelized, ambiguous):
rjl Albyt Alkbyr
Man_masc the-house_masc the-big_masc
“the big man of the house” or “the man of the big house”

– Vowelized, adjective in the genitive:
rjlu Albyti Alkbyri
Man_masc-Nom the-house_masc-Gen the-big_masc-Gen
“the man of the big house”

– Vowelized, adjective in the nominative:
rjlu Albyti Alkbyru
Man_masc-Nom the-house_masc-Gen the-big_masc-Nom
“the big man of the house”

Characteristics relevant for SRL

• Idafa constructions make indefinite nominals syntactically definite, hence allowing for agreement and therefore better scoping

– Example:
[rjlu Albyti] Alkbyru
Man_masc-Nom-Def the-house_masc-Gen the-big_masc-Nom-Def

the big man of the house

Characteristics relevant for SRL

• Passive constructions differ from English in that they cannot have an explicit non-instrument underlying subject; hence only ARG1 and ARG2 are allowed, and ARG0 is not.

– Ungrammatical example:
*qutil [Emru]ARG1 [bslmY]ARG0
*[Amr]ARG1 was killed [by SalmA]ARG0

– Grammatical example:
qutil [Emru]ARG1 [bslAHiK qAtliK]ARG2
[Amr]ARG1 was killed [by a deadly weapon]ARG2

Our Approach

Semantic Role Labeling Steps

• Given a sentence and an associated syntactic parse

• An SRL system identifies the arguments for a given predicate

• The arguments are identified in two steps
– Argument boundary detection
– Argument role classification

• For the overall system we apply a heuristic for argument label conflict resolution
– One label per argument
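The two steps and the conflict-resolution pass can be sketched as follows. This is a minimal illustration, not the authors' system: the two scoring functions stand in for the trained classifiers, and the specific heuristic (prefer the candidate with the higher boundary score when spans overlap, then take the best-scoring role) is an assumption, since the slides only say that each argument ends up with one label.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def label_arguments(
    candidates: List[Span],
    boundary_score: Callable[[Span], float],
    role_scores: Callable[[Span], Dict[str, float]],
) -> Dict[Span, str]:
    """Two-step SRL: boundary detection, then role classification."""
    # Step 1: boundary detection keeps candidates with a positive score.
    scored = [(boundary_score(span), span) for span in candidates]
    detected = [(score, span) for score, span in scored if score > 0]

    # Conflict resolution (assumed heuristic): visit candidates by descending
    # boundary score and drop any span that overlaps an accepted one.
    accepted: List[Span] = []
    for _, span in sorted(detected, reverse=True):
        if all(span[1] <= a[0] or a[1] <= span[0] for a in accepted):
            accepted.append(span)

    # Step 2: role classification assigns exactly one label per argument.
    labels: Dict[Span, str] = {}
    for span in sorted(accepted):
        scores = role_scores(span)
        labels[span] = max(scores, key=scores.get)
    return labels


# Toy scores (hypothetical) for the 14-token English example sentence.
toy_boundary = {(0, 6): 0.9, (7, 12): 0.8, (10, 12): 0.3, (12, 14): 0.7, (1, 3): -0.5}
toy_roles = {
    (0, 6): {"ARG0": 0.8, "ARG1": 0.1},
    (7, 12): {"ARG1": 0.7, "ARG2": 0.2},
    (12, 14): {"ARGM-TMP": 0.9, "ARGM-LOC": 0.05},
}
result = label_arguments(list(toy_boundary), toy_boundary.get, toy_roles.get)
print(result)
```

In this toy run the span (10, 12) is detected as a boundary but discarded because it overlaps the higher-scoring span (7, 12).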

The Sentence

The Chinese Prime Minister Zhu Rongji started an official visit to India last Sunday

The Parse Tree

Boundary Identification

Role Classification

Our Approach

• Experiment with different kernels

• Experiment with Standard Features (similar to English) and rich morphological features specific to Arabic

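Different Kernels

• Polynomial Kernels (degree 1-6) with standard features

• Tree Kernels

$K(t_1, t_2) = \sum_{n_1 \in N_{t_1}} \sum_{n_2 \in N_{t_2}} \Delta(n_1, n_2)$

where $N_{t_1}$ and $N_{t_2}$ are the sets of nodes in $t_1$ and $t_2$, and $\Delta(\cdot)$ evaluates the common substructures rooted in $n_1$ and $n_2$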
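Both kernel families can be sketched in a few lines. The tree kernel below sums Δ over all node pairs, with Δ written as the recursion of the Collins and Duffy subset-tree kernel; that choice, the decay factor `lam`, and the nested-tuple tree encoding are assumptions made for illustration, since the slide does not spell out Δ. The polynomial kernel uses the common form (x·z + c)^d.

```python
from itertools import product


def poly_kernel(x, z, degree, c=1.0):
    """Polynomial kernel (x . z + c) ** degree over flat feature vectors."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree


def internal_nodes(tree):
    """Yield every non-leaf node of a tree encoded as (label, child, ...)."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from internal_nodes(child)


def production(node):
    """The node label plus its children's labels, marking words vs. nodes."""
    return (node[0],) + tuple(
        (child[0], "node") if isinstance(child, tuple) else (child, "word")
        for child in node[1:]
    )


def delta(n1, n2, lam=1.0):
    """Weighted count of common substructures rooted at n1 and n2 (assumed form)."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple):  # c2 is a node too because productions match
            score *= 1.0 + delta(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=1.0):
    """K(t1, t2): sum of delta over all pairs of nodes."""
    return sum(
        delta(n1, n2, lam)
        for n1, n2 in product(internal_nodes(t1), internal_nodes(t2))
    )


talk = ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "talk")))
speech = ("VP", ("V", "delivers"), ("NP", ("D", "a"), ("N", "speech")))
print(tree_kernel(talk, talk), tree_kernel(talk, speech))
print(poly_kernel([1, 2], [3, 4], degree=2))
```

With `lam=1.0` the kernel of a tree with itself equals the number of its substructures under this definition, so the value drops as soon as one leaf differs.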

Argument Structure Trees (AST)

[Figure: parse tree (S (N Paul) (VP (V delivers) (NP (D a) (N talk)) (PP (IN in) (NP (jj formal) (N style))))), with the NP "a talk" marked as Arg. 1]

Defined as the minimal subtree encompassing the predicate and one of its arguments
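One way to read this definition is sketched below: find the lowest common ancestor of the predicate and argument nodes, then keep only the branches that lead to either of them. The nested-tuple encoding, the index paths, and the helper names are hypothetical, and the authors' exact construction may differ.

```python
def subtree_at(tree, path):
    """Follow child indices from the root; index 0 of a node is its label."""
    for i in path:
        tree = tree[i]
    return tree


def prune(node, paths):
    """Keep only the branches of `node` that lead to one of the target paths."""
    if any(len(p) == 0 for p in paths):
        return node  # a target node is kept whole
    kept = []
    for i, child in enumerate(node[1:], start=1):
        below = [p[1:] for p in paths if p[0] == i]
        if below:
            kept.append(prune(child, below))
    return (node[0],) + tuple(kept)


def argument_structure_tree(tree, pred_path, arg_path):
    """Minimal subtree covering the predicate and one argument (assumed reading)."""
    k = 0  # length of the common prefix, i.e. depth of the lowest common ancestor
    while k < min(len(pred_path), len(arg_path)) and pred_path[k] == arg_path[k]:
        k += 1
    root = subtree_at(tree, pred_path[:k])
    return prune(root, [pred_path[k:], arg_path[k:]])


sentence = (
    "S",
    ("N", "Paul"),
    ("VP",
     ("V", "delivers"),
     ("NP", ("D", "a"), ("N", "talk")),
     ("PP", ("IN", "in"), ("NP", ("jj", "formal"), ("N", "style")))),
)
predicate = (2, 1)  # S -> VP -> V
print(argument_structure_tree(sentence, predicate, (2, 2)))  # argument "a talk"
print(argument_structure_tree(sentence, predicate, (1,)))    # argument "Paul"
```

For the argument "a talk" the PP branch is dropped, which leaves the VP tree over "delivers a talk".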

Tree Substructure Representations

[Figure: example substructures of the tree (VP (V delivers) (NP (D a) (N talk))), ranging from the fully lexicalized tree to fragments such as (VP (V delivers) (NP D N)) and (VP V NP)]

The overall set of AST substructures

[Figure: the full set of substructures of the AST (VP (V delivers) (NP (D a) (N talk)))]

Explicit feature space

$\vec{x} = (0, \ldots, 0, \ldots, 1, \ldots, 1, \ldots, 1, \ldots, 0, \ldots, 0, \ldots, 0, \ldots, 1, \ldots, 1, \ldots, 1, \ldots, 0, \ldots, 0, \ldots)$

• $\vec{x} \cdot \vec{z}$ counts the number of common substructures

[Figure: the tree substructures corresponding to the non-zero components of $\vec{x}$]

Standard Features

• Predicate: lemmatization of the predicate

• Path: syntactic path linking the predicate and an argument, e.g. NN↑NP↑VP↓VBD

• Partial Path: Path feature limited to the branching of the argument

• No Direction Path: the path without the traversal directions

• Phrase type

• Last and first POS of words in the arguments

• Verb subcategorization frame: production expanding the predicate parent node

• Position of the argument relative to predicate

• Syntactic Frame: positions of the surrounding NPs relative to predicate
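As an illustration of the Path feature and its variants, the sketch below walks from the argument node up to the lowest common ancestor and then down to the predicate. The tree encoding, the index paths, the example tree, and the "-" separator for the undirected variant are all assumptions; only the general idea comes from the feature list above.

```python
def labels_along(tree, path):
    """Labels of the nodes visited from the root down to the node at `path`."""
    labels = [tree[0]]
    for i in path:
        tree = tree[i]
        labels.append(tree[0])
    return labels


def split_path(tree, arg_path, pred_path):
    """Return (upward labels from the argument, downward labels to the predicate)."""
    k = 0  # depth of the lowest common ancestor
    while k < min(len(arg_path), len(pred_path)) and arg_path[k] == pred_path[k]:
        k += 1
    up = labels_along(tree, arg_path)[k:][::-1]   # argument ... common ancestor
    down = labels_along(tree, pred_path)[k + 1:]  # below the ancestor ... predicate
    return up, down


def path_feature(tree, arg_path, pred_path, directed=True):
    up, down = split_path(tree, arg_path, pred_path)
    if not directed:
        return "-".join(up + down)  # "No Direction" variant (assumed format)
    return "↑".join(up) + "".join("↓" + label for label in down)


def partial_path(tree, arg_path, pred_path):
    """Only the upward part, from the argument to the branching point."""
    up, _ = split_path(tree, arg_path, pred_path)
    return "↑".join(up)


# Hypothetical English tree; index 0 of each tuple is the node label.
tree = (
    "S",
    ("NP", ("NN", "Paul")),
    ("VP", ("VBD", "delivered"), ("NP", ("DT", "a"), ("NN", "talk"))),
)
arg, pred = (2, 2, 2), (2, 1)  # the NN "talk" and the VBD "delivered"
print(path_feature(tree, arg, pred))
print(path_feature(tree, arg, pred, directed=False))
print(partial_path(tree, arg, pred))
```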

Extended Features for Arabic

Definiteness, Number, Gender, Case, Mood, Person, Lemma (vocalized), English Gloss, Unvocalized surface form, Vocalized surface form

• Expanded the leaf nodes in AST with 10 attribute value pairs creating EAST
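The leaf expansion might look like the sketch below, where every leaf word becomes a small subtree whose children are attribute=value strings. The exact shape of the expanded nodes is not given on the slide, so this layout, the attribute key names, and the toy tree are assumptions; the gender and case values are taken from the glosses of the "rjlu Albyti" example.

```python
# The ten attributes listed on the slide, under hypothetical key names.
ATTRIBUTES = [
    "definiteness", "number", "gender", "case", "mood",
    "person", "lemma", "gloss", "unvocalized", "vocalized",
]


def expand_leaves(tree, morphology):
    """Turn an AST into an EAST by expanding each leaf word (assumed layout)."""
    if isinstance(tree, tuple):
        return (tree[0],) + tuple(expand_leaves(c, morphology) for c in tree[1:])
    features = morphology.get(tree, {})
    pairs = tuple(f"{name}={features[name]}" for name in ATTRIBUTES if name in features)
    # A word without morphological analysis is left untouched.
    return (tree,) + pairs if pairs else tree


# Toy AST fragment and a partial morphological lexicon (illustrative only).
ast = ("NP", ("N", "rjl"), ("N", "Albyt"))
lexicon = {
    "rjl": {"gender": "masc", "case": "Nom", "vocalized": "rjlu"},
    "Albyt": {"definiteness": "def", "gender": "masc", "case": "Gen", "vocalized": "Albyti"},
}
east = expand_leaves(ast, lexicon)
print(east)
```

Because the attribute values become nodes of the tree, a tree kernel can match two arguments that share, say, case and gender even when the surface words differ.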

Arabic AST

Sample AST from our example

ARG0

Extended AST (EAST)

……

Experiments & Results

Experimental Set Up

• SemEval 2007 Task 18 data set, Pilot Arabic Propbank

• 95 most frequent verbs in ATB3v2

• Gold parses, unvowelized, Bies reduced POS tag set (25 tags)

• Number of sentences: Dev (886), Test (902), Train (8402)

• 26 role types (5 numbered ARGs)

Experimental Set Up

• Experimented only with 350k examples

• We use the SVM-Light-TK toolkit (Moschitti, 2004, 2006) with the SVM-Light default parameters

• Evaluation metrics of precision, recall and F measure are obtained using the CoNLL evaluator
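For reference, the three metrics can be computed over sets of labeled arguments as in the sketch below. This is not the CoNLL evaluator itself, only the standard definitions; the gold and predicted arguments are toy values.

```python
def precision_recall_f1(gold, predicted):
    """Precision, recall and F1 over sets of (start, end, role) arguments."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # both boundaries and role must match
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: one wrong role and one spurious argument among four predictions.
gold_args = {(0, 6, "ARG0"), (7, 12, "ARG1"), (12, 14, "ARGM-TMP")}
pred_args = {(0, 6, "ARG0"), (7, 12, "ARG2"), (12, 14, "ARGM-TMP"), (10, 12, "ARG2")}
p, r, f = precision_recall_f1(gold_args, pred_args)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```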

Boundary Detection Results

Role Classification Results

Overall Results

Observations-BD

• AST and EAST don’t differ much for boundary detection

• AST + EAST + Poly (3) gives the best BD results

• AST and EAST perform significantly better than Poly (1)

Observations – RC & SRL

Conclusions

• Explicitly encoding the rich morphological features helps with SRL in Arabic

• Tree kernels are indeed a feasible way of dealing with large feature spaces that are structural in nature

• Combining kernels yields better results

Future Directions

Thank You
