
Sentence Compression Based on ILP Decoding Method

Hongling Wang, Yonglei Zhang, Guodong Zhou

NLP Lab, Soochow University


Outline

• Introduction
• Related Work
• Sentence Compression based on ILP
• Experiments
• Conclusion


Introduction(1)

• Definition of sentence compression
  – It aims to shorten a sentence x = l1, l2, …, ln into a subsequence y = c1, c2, …, cm, where ci ∈ {l1, l2, …, ln}.

• Example:
  – Original sentence: 据法新社报道，有目击者称，以军23日空袭加沙地带中部，目前尚无伤亡报告。 (According to an AFP report, witnesses said the Israeli army carried out an airstrike on the central Gaza Strip on the 23rd; there are no reports of casualties so far.)
  – Target sentence: 目击者称以军空袭加沙地带中部 (Witnesses said the Israeli army carried out an airstrike on the central Gaza Strip)
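
To make the definition concrete, here is a minimal Python sketch (not from the paper) that treats a compression as a per-word keep/drop mask over the slide's example; the word segmentation below is an illustrative assumption.

```python
# Sentence compression as a keep/drop decision per word.
# The segmentation of the example sentence is assumed for illustration.

def compress(words, keep_mask):
    """Return the subsequence of words whose mask bit is 1."""
    return [w for w, keep in zip(words, keep_mask) if keep]

original = ["据", "法新社", "报道", "有", "目击者", "称", "以军", "23日",
            "空袭", "加沙地带", "中部", "目前", "尚无", "伤亡", "报告"]
mask     = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0]
print("".join(compress(original, mask)))  # 目击者称以军空袭加沙地带中部
```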


Introduction(2)

• Sentence compression has been widely used in:
  – Summarization
  – Automatic title generation
  – Search engines
  – Topic detection
  – …


Related Work(1)

• Mainstream solution – corpus-driven supervised learning
  – Generative model
    • selects the optimal target sentence by estimating the joint probability P(x, y) of the original sentence x and the target sentence y
  – Discriminative model


Related Work(2)

• Generative model
  – Knight & Marcu (2002) first applied the noisy-channel model to sentence compression.
  – Shortcomings:
    • the source model is trained on uncompressed sentences – inaccurate data
    • the channel model requires aligned parse trees for both compressed and uncompressed sentences in the training set – alignment is difficult and the channel probability estimates are unreliable


Related Work(3)

• Discriminative model
  – McDonald (2006) used the Margin Infused Relaxed Algorithm (MIRA) to learn feature weights, ranked the candidate subtrees, and selected the highest-scoring tree as the optimal target sentence.
  – Cohn & Lapata (2007, 2008, 2009) formulated the compression problem as tree-to-tree rewriting using a synchronous grammar. Each grammar rule is assigned a weight which is learned discriminatively within a large-margin model.
  – Zhang et al. (2013) compressed sentences with a structured SVM model, treating compression as a structured learning problem.


Our Method

• Following Zhang et al. (2013), the sentence compression problem is treated as a structured learning problem:
  – Learning a subtree of the original sentence's parse tree as its compressed sentence
  – Formulating the search for the optimal subtree as an ILP decoding problem


The Framework of SC


Sentence Compression based on ILP

• Linear objective function:

  ŷ = argmax_{y ∈ Y(x)} score(y, x) = argmax_{y ∈ Y(x)} w · f(y, x)

  – x is the syntactic parse tree of the original sentence, y is the target subtree
  – f(y, x) is the feature function (bigram and trimming features) from x to y
  – w is the feature weight vector
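
As an illustration only, the following Python sketch scores a candidate y as the dot product w · f(y, x) and finds the argmax by enumerating Y(x) explicitly; the toy feature_fn is an assumption, and the ILP decoder described next replaces this brute-force enumeration.

```python
import numpy as np

def score(w, feature_fn, y, x):
    """score(y, x) = w · f(y, x): dot product of weights and features."""
    return float(np.dot(w, feature_fn(y, x)))

def decode_by_enumeration(w, feature_fn, candidates, x):
    """argmax over an explicitly enumerated candidate set Y(x)."""
    return max(candidates, key=lambda y: score(w, feature_fn, y, x))

# Toy usage: two features, two candidate compressions of a 3-word input.
w = np.array([1.0, -0.5])
feature_fn = lambda y, x: np.array([len(y), abs(len(y) - len(x))])
print(decode_by_enumeration(w, feature_fn, [["a"], ["a", "b"]], ["a", "b", "c"]))
```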


Linear constraints

• A binary variable ni for each non-terminal node
  – ni − nj ≥ 0, where ni is the parent node of nj (a node can be kept only if its parent is kept)
• A binary variable wi for each terminal node
  – wi = nj, where nj is the POS node of word wi
• A binary variable fi for the ith feature
  – fi = 1 if the ith feature appears; otherwise the feature does not appear
  – According to the definition of each feature value, the corresponding linear constraint is added, e.g. fi = 1 − wi for a word-deletion feature
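
A minimal sketch of this encoding with the PuLP ILP library (an assumed tool choice, not the authors' implementation), on a toy two-word tree; it shows the parent-child constraint ni − nj ≥ 0, the word/POS tie wi = nj, and a drop feature fi = 1 − wi, with w · f as the objective.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

# Toy tree: IP -> (NP -> NN, VP -> VV); two terminals w0 (NN), w1 (VV).
nonterminals = ["IP", "NP", "VP", "NN", "VV"]
parent_of = {"NP": "IP", "VP": "IP", "NN": "NP", "VV": "VP"}
pos_of_word = {"w0": "NN", "w1": "VV"}
weights = {"keep_w0": 1.0, "drop_w1": -0.5}   # toy feature weights

prob = LpProblem("sentence_compression", LpMaximize)
n = {t: LpVariable("n_" + t, cat=LpBinary) for t in nonterminals}
w = {t: LpVariable("w_" + t, cat=LpBinary) for t in pos_of_word}
f = {k: LpVariable("f_" + k, cat=LpBinary) for k in weights}

prob += lpSum(weights[k] * f[k] for k in weights)   # objective: w · f

prob += n["IP"] == 1                     # assumption: keep the root node
for child, parent in parent_of.items():
    prob += n[parent] - n[child] >= 0    # ni - nj >= 0 (parent dominates)
for word, pos in pos_of_word.items():
    prob += w[word] == n[pos]            # wi = nj (word kept iff POS node kept)
prob += f["keep_w0"] == w["w0"]          # keep-feature fires iff w0 is kept
prob += f["drop_w1"] == 1 - w["w1"]      # fi = 1 - wi (drop feature)

prob.solve()
print({v.name: int(v.varValue) for v in prob.variables()})
```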


Features – Word/POS Features

• the bigram POS of adjacent remaining words
  – PosBigram(目击者 称) = NN&VV
• whether the dropped word is a stop word
  – IsStop(据) = 1
• whether the dropped word is the headword of the original sentence
• the number of remaining words
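
A sketch of how such features could be extracted (feature names follow the slide; the stopword set and the head_index argument are assumptions):

```python
STOPWORDS = {"据", "的", "有"}   # assumed stopword list

def word_pos_features(tokens, pos_tags, keep_mask, head_index):
    """Indicator/count features over kept and dropped words."""
    feats = {}
    kept = [i for i, k in enumerate(keep_mask) if k]
    # bigram POS of adjacent remaining words, e.g. PosBigram(目击者 称)=NN&VV
    for a, b in zip(kept, kept[1:]):
        key = "PosBigram=%s&%s" % (pos_tags[a], pos_tags[b])
        feats[key] = feats.get(key, 0) + 1
    for i, keep in enumerate(keep_mask):
        if keep:
            continue
        if tokens[i] in STOPWORDS:        # IsStop(据)=1
            feats["DropStopWord"] = feats.get("DropStopWord", 0) + 1
        if i == head_index:               # dropped word is the headword
            feats["DropHeadword"] = 1
    feats["NumRemaining"] = len(kept)     # number of remaining words
    return feats

print(word_pos_features(["据", "目击者", "称"], ["P", "NN", "VV"], [0, 1, 1], 2))
# {'PosBigram=NN&VV': 1, 'DropStopWord': 1, 'NumRemaining': 2}
```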


Features – Syntax Features

• the parent–child label pair of each cut edge
  – del-Edge(PP) = IP-PP
• the number of cut edges
• the dependency relation between the dropped word and its head word
  – dep_type(有) = DEP
• the POS chain linking the dropped word's POS to its head word's POS
  – dep_link( , ) = PU-VMOD-VV
• whether the root of the dependency tree is deleted
  – del_ROOT(无) = 1
• whether each dropped word is a leaf of the dependency tree
  – del_Leaf(法新社) = 1
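
A similar sketch for a few of the syntax features; the dep_heads encoding and the precomputed cut_edges list are assumptions made for illustration.

```python
def syntax_features(dep_heads, keep_mask, cut_edges):
    """dep_heads[i]: index of word i's head, -1 for the dependency root.
    cut_edges: (parent, child) label pairs of deleted constituency edges."""
    feats = {}
    has_child = {h for h in dep_heads if h >= 0}
    for i, keep in enumerate(keep_mask):
        if keep:
            continue
        if dep_heads[i] == -1:                 # del_ROOT(无)=1
            feats["del_ROOT"] = 1
        if i not in has_child:                 # del_Leaf(法新社)=1
            feats["del_Leaf"] = feats.get("del_Leaf", 0) + 1
    for parent, child in cut_edges:            # del-Edge(PP) = IP-PP
        key = "del-Edge=%s-%s" % (parent, child)
        feats[key] = feats.get(key, 0) + 1
    feats["NumCutEdges"] = len(cut_edges)      # number of cut edges
    return feats

# Toy usage: word 0 is dropped, is a leaf, and one IP-PP edge is cut.
print(syntax_features([2, 2, -1], [0, 1, 1], [("IP", "PP")]))
```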


Loss Function

• Function 1: bigram loss-based function
  – the ratio of bigrams of the remaining words that are lost relative to the original sentence
• Function 2: word loss-based function
  – the sum of the number of words deleted by mistake and the number of words retained by mistake, between the predicted sentence and the gold target sentence
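
Since both the prediction and the gold compression can be written as keep/drop masks, the word loss reduces to a Hamming-style count; a minimal sketch:

```python
def word_loss(pred_mask, gold_mask):
    """Words deleted by mistake plus words retained by mistake."""
    assert len(pred_mask) == len(gold_mask)
    return sum(int(p != g) for p, g in zip(pred_mask, gold_mask))

# One word wrongly deleted (position 1) + one wrongly kept (position 3).
print(word_loss([1, 0, 1, 1], [1, 1, 1, 0]))  # 2
```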


Evaluation

• Manual evaluation
  – Importance
  – Grammaticality
• Automatic evaluation
  – Compression ratio (CR) (0.7~1.0)
  – BLEU score:

    BLEU = BP · exp( Σ_{n=1}^{N} w_n · log p_n )
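
A direct transcription of the formula above into Python; the n-gram precisions p_n are supplied directly, and BP is the standard brevity penalty (an assumption, since the slide does not spell it out):

```python
import math

def bleu(precisions, weights, cand_len, ref_len):
    """BLEU = BP * exp(sum_n w_n * log p_n)."""
    bp = 1.0 if cand_len > ref_len else math.exp(1.0 - ref_len / cand_len)
    return bp * math.exp(sum(w * math.log(p)
                             for w, p in zip(weights, precisions)))

print(round(bleu([0.8, 0.6], [0.5, 0.5], cand_len=10, ref_len=12), 3))  # 0.567
```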


Experimental settings

• Parallel corpus extracted from news documents

• Stanford Parser

• Alignment tool developed in-house

• Structured SVM


Experimental results

Compared to McDonald's decoding method, the system based on the ILP decoding method achieves comparable performance using simpler and fewer features.


Conclusions

• The problem of sentence compression is formulated as the problem of finding an optimal subtree using an ILP decoding method.
• Compared to the work using McDonald's decoding method, our system achieves comparable performance under the same conditions while using simpler and fewer features.