Textual Entailment as Syntactic Graph Distance: a rule based and a SVM based approach Fabio Massimo Zanzotto Dipartimento Informatica Sistemistica e Comunicazione Università di Milano Bicocca Italy Maria Teresa Pazienza and Marco Pennacchiotti Department of Computer Science, Systems and Production University of Roma “Tor Vergata”


Page 1

Textual Entailment as Syntactic Graph Distance: a rule based and a SVM based approach

Fabio Massimo Zanzotto
Dipartimento Informatica Sistemistica e Comunicazione
Università di Milano Bicocca, Italy

Maria Teresa Pazienza and Marco Pennacchiotti
Department of Computer Science, Systems and Production
University of Roma “Tor Vergata”

Page 2

Classifying Textual Entailment (TE)

Two dimensions

Semantic dimension
  paraphrasing (i.e., synonymy)
  strict entailment

Recognition dimension
  semantic subsumption
    American Airlines will lay off ...
    American Airlines will fire ...
  syntactic subsumption
    American Airlines began laying off hundreds of flight attendants on Tuesday
    American Airlines will fire hundreds of flight attendants
  direct implication
    American Airlines will fire flight attendants
    hundreds of flight attendants will lose their jobs

Page 3

Recognizing Textual Entailment (TE)

semantic subsumption, syntactic subsumption

TE is a Graph Matching problem!

[Figure: graphs for the text T and the hypothesis H]

Page 4

Graph Matching (GM)

GM is used, for instance, in Image Recognition

One Problem: distortions in the input graphs!!

Page 5

Textual Entailment as Graph Matching (GM)

Known limitations
  distortion in the input syntactic/semantic graphs (errors in parsing, word sense disambiguation, etc.)
  matching nodes is more complex than simple label matching
  matching should be invariant under syntactic transformations (nominalization, passivization, argument movement, ...)
  the textual entailment relation is an asymmetric relation

Textual Entailment Measure

Page 6

What’s next

Step 1: Definition of the syntactic representation model (Extended Dependency Graph, XDG)

Step 2: Rule-based approach. Definition of the Graph Matching measure for the textual entailment relation

Step 3: SVM-based approach. Using an SVM to estimate the parameters of the Graph Matching measure

Step 4: Preliminary analysis of the results on the development set

Page 7

Extended Dependency Graph (XDG)

C are constituents, each with
  a syntactic head
  a potential semantic governor

D are dependencies among constituents
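A minimal sketch of how such a graph could be held in memory, assuming Python dataclasses. The class names, the field names, the `subj`/`obj` labels and the hand-built toy graph are all hypothetical illustrations. They are not the data model of the Chaos parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Constituent:
    ident: int
    tokens: Tuple[str, ...]          # surface tokens covered by the constituent
    head: str                        # syntactic head
    governor: Optional[str] = None   # potential semantic governor


@dataclass(frozen=True)
class Dependency:
    source: int   # ident of the governing constituent
    target: int   # ident of the dependent constituent
    label: str    # hypothetical relation name


@dataclass
class XDG:
    constituents: List[Constituent] = field(default_factory=list)  # the set C
    dependencies: List[Dependency] = field(default_factory=list)   # the set D

    def dependents_of(self, ident: int) -> List[int]:
        """Idents of the constituents that depend on the given one."""
        return [d.target for d in self.dependencies if d.source == ident]


# Toy graph written by hand (not parser output).
h = XDG(
    constituents=[
        Constituent(0, ("American", "Airlines"), head="Airlines"),
        Constituent(1, ("will", "fire"), head="fire", governor="fire"),
        Constituent(2, ("flight", "attendants"), head="attendants"),
    ],
    dependencies=[Dependency(1, 0, "subj"), Dependency(1, 2, "obj")],
)
```

Keeping C and D as two flat lists mirrors the slide's definition and makes the later matching steps simple iterations over constituents and dependencies.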

Page 8

GM on XDG: definitions

Isomorphic subsumption: if two bijective functions fc and fd exist [...]

Subgraph isomorphic subsumption: if [...] exists so that [...]

Maximal Common Subsumption Subgraph (MCSS): given [...] and [...], [...] is the MCSS if [...] and [...] then [...]

Page 9

Finding the bijective function and evaluating the measure

Step 1: Constituent matching (fc: Ch → Ct bijective)

Step 2: Dependency matching (fd: Dh → Dt bijective)

Step 3: Define MCSS using fc and fd

Step 4: Evaluate Similarity Measure on MCSS
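The four steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the system's implementation. The token-overlap similarity, the greedy one-to-one assignment, the `min_sim` cutoff and the coverage scores are all placeholders chosen for brevity. Constituents are given as a dict from id to token tuple, and dependencies as (governor id, dependent id) pairs.

```python
def constituent_similarity(ch, ct):
    """Placeholder similarity: Jaccard overlap of lowercased tokens."""
    a = set(w.lower() for w in ch)
    b = set(w.lower() for w in ct)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def match_constituents(Ch, Ct, min_sim=0.5):
    """Step 1: greedy one-to-one map fc from H constituents to T constituents."""
    pairs = sorted(
        ((constituent_similarity(Ch[i], Ct[j]), i, j) for i in Ch for j in Ct),
        reverse=True,
    )
    fc, used_t = {}, set()
    for sim, i, j in pairs:
        if sim >= min_sim and i not in fc and j not in used_t:
            fc[i] = j
            used_t.add(j)
    return fc


def match_dependencies(Dh, Dt, fc):
    """Step 2: map fd from H dependencies to T dependencies, anchored by fc."""
    t_edges = set(Dt)
    fd = {}
    for src, dst in Dh:
        if src in fc and dst in fc and (fc[src], fc[dst]) in t_edges:
            fd[(src, dst)] = (fc[src], fc[dst])
    return fd


def mcss_score(Ch, Dh, fc, fd):
    """Steps 3-4: the matched part is the common subgraph; score = H coverage."""
    c_cov = len(fc) / len(Ch) if Ch else 0.0
    d_cov = len(fd) / len(Dh) if Dh else 0.0
    return c_cov, d_cov


# Toy pair, chunked by hand.
Ct = {0: ("American", "Airlines"), 1: ("began", "laying", "off"),
      2: ("hundreds", "of", "flight", "attendants"), 3: ("on", "Tuesday")}
Dt = [(1, 0), (1, 2), (1, 3)]
Ch = {0: ("American", "Airlines"), 1: ("will", "fire"),
      2: ("hundreds", "of", "flight", "attendants")}
Dh = [(1, 0), (1, 2)]

fc = match_constituents(Ch, Ct)
fd = match_dependencies(Dh, Dt, fc)
scores = mcss_score(Ch, Dh, fc, fd)
```

With this placeholder similarity the verb constituents stay unmatched, so no dependency is anchored. A lexical resource is needed to relate "fire" and "laying off", which is the point made earlier that matching nodes is more complex than simple label matching.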

Page 10

Constituent Similarity

Degree of similarity: [...]

Parameter Box: [...]

Page 11

Dependency Similarity

Degree of Similarity: [...]

Parameter Box: [...]

Page 12

Textual Entailment Measure

Finally....

textual entailment holds if the textual entailment measure is > t

Parameter Box: t

Components of the measure: constituents, dependencies
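The decision rule can be sketched as a threshold test on a score that combines a constituent part and a dependency part. The actual formula is not recoverable from this transcript, so the convex combination and the `weight` parameter below are assumptions made purely for illustration. Only the comparison against the threshold t comes from the slide.

```python
def entailment_measure(constituent_score, dependency_score, weight=0.5):
    """Hypothetical combination of the two parts (not the slide's formula)."""
    return weight * constituent_score + (1.0 - weight) * dependency_score


def entails(constituent_score, dependency_score, t, weight=0.5):
    """Textual entailment holds if the measure is greater than the threshold t."""
    return entailment_measure(constituent_score, dependency_score, weight) > t


high = entails(0.9, 0.7, t=0.5)   # well-matched pair
low = entails(0.4, 0.2, t=0.5)    # poorly matched pair
```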

Page 13

Some more details

Syntactic transformations
  nominalization
  passive form

Other phenomena
  be-sentences vs. appositions, e.g., "the president of XYZ is ..."
  treating the "not"

Page 14

Estimating Parameters with SVM

Main idea: divide the Graph Matching measure into many subparts

Assumptions
  The hypothesis H is a simple S-V-O sentence
  SVM must learn parameters and thresholds

A possibility: feature space divided into three parts
  Subject Related Features
  Main Verb Related Features
  Object Related Features

Page 15

Feature Spaces

[Figure: feature spaces shown on a T and H pair]

Page 16

Feature Spaces

Percent of common tokens and lemmas

Task

Structural (Graph) Features

Subgraph matching indicators

Mean number of commonly anchored dependencies within constituents
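As an illustration of the simplest feature in this list, the percent of common tokens might be computed as below. Normalising by the length of H, lowercasing, and whitespace tokenisation are assumptions of this sketch. The same function would apply to lemmas if a lemmatiser supplied them.

```python
def percent_common(h_items, t_items):
    """Percent of items of H (tokens or lemmas) that also occur in T."""
    h = [w.lower() for w in h_items]
    t = set(w.lower() for w in t_items)
    if not h:
        return 0.0
    return 100.0 * sum(1 for w in h if w in t) / len(h)


T = "American Airlines began laying off hundreds of flight attendants on Tuesday"
H = "American Airlines will fire hundreds of flight attendants"

# Six of the eight H tokens ("will" and "fire" excluded) occur in T.
pct = percent_common(H.split(), T.split())
```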

Page 17

Used Resources

Chaos: a modular and lexicalised parser for English and Italian (Basili & Zanzotto, 1998, 2002), based on the extended dependency graph (XDG) formalism

WordNet

SVMlight
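SVMlight reads one training example per line, written as a target value followed by `index:value` pairs in increasing index order. A small helper for producing such lines might look like the sketch below. The function name and the feature indices are hypothetical, and the numbers in the example are made up.

```python
def to_svmlight_line(label, features):
    """Format one example as an SVMlight input line.

    label: positive for entailment, otherwise negative.
    features: dict mapping a 1-based feature index to its value.
    Zero-valued features are omitted, as the format is sparse.
    """
    target = "+1" if label > 0 else "-1"
    pairs = " ".join(
        f"{i}:{v:g}" for i, v in sorted(features.items()) if v != 0
    )
    return f"{target} {pairs}".rstrip()


line = to_svmlight_line(+1, {2: 0.75, 1: 1.0, 3: 0.0})
```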

Page 18

Preliminary analysis (Rule-based System)

Analysis of [...] on dev1

we decided for: [...] = 0.85, [...] = 0.85, [...] = 0.5

[Plot: Prec against Threshold; axis ticks run from 0 to 0.8 and from 0 to 1]

Page 19

Preliminary analysis (SVM-based system)

Test Bed: dev1+dev2

Test Method: 3-fold cross validation repeated 10 times
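The test method can be sketched as a generator of train/test index splits: the data are reshuffled for each of the 10 repetitions and then cut into 3 folds. The function name, the fixed seed and the interleaved fold assignment are assumptions of this sketch. Training and scoring the SVM on each split are left out.

```python
import random


def repeated_kfold(n_items, k=3, repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_items))
        rng.shuffle(idx)                          # new random order per repetition
        folds = [idx[f::k] for f in range(k)]     # k disjoint folds
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train, test


# 10 repetitions x 3 folds = 30 train/test splits over 9 toy examples.
splits = list(repeated_kfold(9))
```

Averaging the accuracy over all 30 splits gives a steadier estimate than a single 3-fold run on a small development set.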

winning horse!

Page 20

Out from the Fairy Tale...

Page 21

... and back to real life!!!!

Comdex -- once among the world's largest trade shows, the launching pad for new computer and software products, and a Las Vegas fixture for 20 years -- has been canceled for this year.

Las Vegas hosted the Comdex trade show for 20 years.