
Boosting-based parse re-ranking with subtree features

Taku Kudo, Jun Suzuki, Hideki Isozaki

NTT Communication Science Labs.

Discriminative methods for parsing

Discriminative methods have shown remarkable performance compared to traditional generative models, e.g., PCFG.

Two approaches:
- Re-ranking [Collins 00, Collins 02]: discriminative machine learning algorithms are used to re-rank the n-best outputs of generative/conditional parsers.
- Dynamic programming: max-margin parsing [Taskar 04]

Re-ranking

- Let x be an input sentence, and y be a parse tree for x.
- Let G(x) be a function that returns a set of n-best results for x.
- A re-ranker gives a score to each candidate y in G(x) and selects the result that has the highest score.

Example: x = "I buy cars with money"
[Figure: n-best results for x]

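Scoring with linear model

$score(y) = w \cdot \Phi(y)$
$\hat{y} = \operatorname{argmax}_{y \in G(x)} score(y)$

- $\Phi(y)$ is a feature function that maps output y into a feature space, e.g. $\Phi(y) = \{0,1,0,1,0,0,0,1,\ldots\}$
- $w$ is a parameter vector (weights) estimated from training data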
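A minimal sketch of the linear re-ranker described above, assuming binary features stored as sets of active feature ids and weights stored in a dict. The names `rerank`, `candidates`, and `w` are illustrative and do not come from the slides.

```python
def score(w, feats):
    # w: dict mapping feature id -> weight (assumed representation)
    # feats: set of feature ids that are active (value 1) in Phi(y)
    return sum(w.get(k, 0.0) for k in feats)


def rerank(candidates, w):
    # candidates: list of (parse, active_feature_ids) pairs, i.e. G(x)
    # returns the parse with the highest linear score w . Phi(y)
    best_parse, _ = max(candidates, key=lambda c: score(w, c[1]))
    return best_parse


# Hypothetical 2-best list for one sentence
candidates = [("tree_a", {0, 2}), ("tree_b", {1, 2})]
w = {0: 0.5, 1: -0.3, 2: 0.1}
print(rerank(candidates, w))
```

Because the feature vector is sparse and binary, scoring a candidate only touches the features that fire in it.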

Two issues in linear model [1/2]

$\hat{y} = \operatorname{argmax}_{y \in G(x)} w \cdot \Phi(y)$

How to estimate the weights $w$?
- Try to minimize a loss for the given training data.
- Definition of loss: Boosting

Two issues in linear model [2/2]

$\hat{y} = \operatorname{argmax}_{y \in G(x)} w \cdot \Phi(y)$

How to define the feature set $\Phi(y)$?
- Use all subtrees.
- Pros:
  - natural extension of CFG rules
  - can capture long contextual information
- Cons: naïve enumeration gives huge complexity

A question for all subtrees

Do we always need all subtrees?
- Only a small set of subtrees is informative.
- Most subtrees are redundant.

Goal: automatic feature selection from all subtrees
- can perform fast parsing
- can give good interpretation to selected subtrees

Boosting meets our demand!

Why Boosting?

Different regularization strategies for the weights $w$:
- L1 (Boosting): better when most given features are irrelevant; can remove redundant features
- L2 (SVMs): better when most given features are relevant; uses as many features as it can

Boosting meets our demand, because most subtrees are irrelevant and redundant.

RankBoost [Freund 03]

- Current weights → next weights: update feature k with an increment δ
- Select the optimal pair <k,δ> that minimizes the loss
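The selection of the pair <k,δ> can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes binary features and the exponential ranking loss, i.e. the sum over (correct parse, competing parse) pairs of exp(-(score(correct) - score(competitor))). Under that loss, changing only w_k by δ gives Loss(δ) = W0 + W+·e^(-δ) + W-·e^(δ), where W+ (W-) is the total pair weight for which feature k fires only in the correct (only in the competing) parse, so the minimizer is δ = ½·ln(W+/W-) and the loss reduction is (√W+ - √W-)². The function name `best_update` and the smoothing constant `eps` (which keeps δ finite when W+ or W- is zero, at the cost of an only approximately optimal δ) are assumptions.

```python
import math


def score(w, feats):
    # w: dict feature id -> weight; feats: set of active binary feature ids
    return sum(w.get(k, 0.0) for k in feats)


def best_update(pairs, w, eps=1e-6):
    # pairs: list of (good, bad) feature-id sets; 'good' is the correct parse,
    # 'bad' is a competing n-best candidate for the same sentence
    w_pos, w_neg = {}, {}
    for good, bad in pairs:
        # weight of this pair under the current model (its exp-loss term)
        d = math.exp(-(score(w, good) - score(w, bad)))
        for k in good - bad:   # feature fires only in the correct parse
            w_pos[k] = w_pos.get(k, 0.0) + d
        for k in bad - good:   # feature fires only in the competitor
            w_neg[k] = w_neg.get(k, 0.0) + d

    best_k, best_delta, best_gain = None, 0.0, -1.0
    for k in sorted(set(w_pos) | set(w_neg)):
        wp, wn = w_pos.get(k, 0.0), w_neg.get(k, 0.0)
        gain = abs(math.sqrt(wp) - math.sqrt(wn))
        if gain > best_gain:
            best_k, best_gain = k, gain
            best_delta = 0.5 * math.log((wp + eps) / (wn + eps))
    return best_k, best_delta


# One boosting iteration on hypothetical toy data
pairs = [({0, 1}, {1, 2}), ({0}, {2}), ({1}, {0})]
w = {}
k, delta = best_update(pairs, w)
w[k] = w.get(k, 0.0) + delta
```

Features that fire in both parses of a pair, or in neither, do not change that pair's margin, which is why only the set differences are accumulated.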

How to find the optimal subtree?

- The set of all subtrees is huge.
- Need to find the optimal subtree efficiently.

A variant of branch-and-bound:
- Define a search space in which the whole set of subtrees is given.
- Find the optimal subtree by traversing this search space.
- Prune the search space by proposing a criterion.

Ad-hoc techniques

- Size constraints: use subtrees whose size is less than s (s = 6~8)
- Frequency constraints: use subtrees that occur no less than f times in the training data (f = 2~5)
- Pseudo iterations: after every 5 or 10 iterations of boosting, we alternately perform 100 or 300 pseudo iterations, in which the optimal subtree is selected from the cache that maintains the features explored in the previous iterations.
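The size and frequency constraints above amount to a simple filter over candidate subtrees. The sketch below assumes each subtree is represented as a hashable tuple of nodes and that every occurrence in the training data has already been collected into a list; `filter_candidates`, `max_size`, and `min_freq` are hypothetical names standing in for s and f.

```python
from collections import Counter


def filter_candidates(occurrences, max_size, min_freq):
    # occurrences: iterable of subtrees (tuples of nodes), one entry per
    # occurrence in the training data
    freq = Counter(occurrences)
    return {
        t for t, n in freq.items()
        if len(t) < max_size    # size constraint: size is less than s
        and n >= min_freq       # frequency constraint: occurs no less than f times
    }


# Hypothetical occurrences
occ = (
    [("NP", "DT", "NN")] * 3
    + [("VP", "VB")]
    + [("S", "NP", "VP", "PP", "IN", "NP", "NN")] * 4
)
kept = filter_candidates(occ, max_size=6, min_freq=2)
```

In a real search these checks would more likely be applied while the search space is traversed, so that oversized or rare subtrees are never expanded.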

Relation to previous work

- Boosting vs kernel methods [Collins 00]
- Boosting vs Data Oriented Parsing [Bod 98]

Kernels [Collins 00]

Kernel methods reduce the problem to the dual form, which depends only on dot products of two instances (parsed trees).

Pros:
- No need to provide an explicit feature vector
- Dynamic programming is used to calculate dot products between trees, which is very efficient!

Cons:
- Requires a large number of kernel evaluations in testing
- Parsing is slow
- Difficult to see which features are relevant

DOP [Bod 98]

- DOP is not based on re-ranking.
- DOP deals with the all-subtrees representation explicitly, like our method.

Pros:
- high accuracy

Cons:
- exact computation is NP-complete
- cannot always provide a sparse feature representation
- very slow, since the number of subtrees DOP uses is huge

Kernels vs DOP vs Boosting

                                    Kernel      DOP                                               Boosting
How to enumerate all the subtrees?  implicitly  explicitly                                        explicitly
Complexity in training              polynomial  NP-hard                                           NP-hard (worst case); branch-and-bound
Sparse feature representations      No          No                                                Yes
Parsing speed                       slow        slow                                              fast
Can see relevant features?          No          Yes, but difficult because of redundant features  Yes

Experiments

- WSJ parsing
- Shallow parsing

Experiments

WSJ parsing
- Standard data: training: sections 2-21, test: section 23 of the PTB
- Model 2 of [Collins 99] was used to obtain n-best results
- exactly the same setting as [Collins 00 (Kernels)]

Shallow parsing
- CoNLL 2000 shared task: training: sections 15-18, test: section 20 of the PTB
- CRF-based parser [Sha 03] was used to obtain n-best results

Tree representations

WSJ parsing
- lexicalized tree: each non-terminal has a special node labeled with a head word

Shallow parsing
- right-branching tree where adjacent phrases are in a child/parent relation
- special nodes for right/left boundaries

Results: WSJ parsing

LR/LP = labeled recall/precision. CBs is the average number of crossing brackets per sentence. 0 CBs and 2 CBs are the percentages of sentences with 0 or at most 2 crossing brackets, respectively.

- Comparable to other methods
- Better than the kernel method, which uses the all-subtrees representation with a different parameter estimation

Results: Shallow parsing

- Comparable to other methods
- Our method is also comparable to Zhang’s method even without extra linguistic features

Fβ=1 is the harmonic mean of precision and recall.
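For reference, the Fβ measure can be computed as below; with β = 1 it reduces to the harmonic mean of precision and recall, 2PR/(P+R). The function name `f_beta` and the convention of returning 0 when both inputs are 0 are assumptions made for this sketch.

```python
def f_beta(precision, recall, beta=1.0):
    # General F-measure; beta = 1 weights precision and recall equally
    if precision == 0.0 and recall == 0.0:
        return 0.0  # assumed convention to avoid division by zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical precision and recall values
print(f_beta(0.5, 1.0))
```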

Advantages

Compact feature set
- WSJ parsing: ~8,000
- Shallow parsing: ~3,000
- Kernels implicitly use a huge number of features

Parsing is very fast
- WSJ parsing: 0.055 sec./sentence
- Shallow parsing: 0.042 sec./sentence
(n-best parsing time is NOT included)

Advantages, cont’d

Sparse feature representations allow us to analyze which kinds of subtrees are relevant.

[Figure: positive and negative subtrees selected for shallow parsing and for WSJ parsing]

Conclusions

- All subtrees are potentially used as features
- Boosting: L1 norm regularization performs automatic feature selection
- Branch-and-bound enables us to find the optimal subtrees efficiently
- Advantages:
  - comparable accuracy to other parsing methods
  - fast parsing
  - good interpretability

Efficient computation

Rightmost extension [Asai 02, Zaki 02]

- Extend a given tree of size (n-1) by adding a new node to obtain trees of size n
- a node is added to the rightmost path
- a node is added as the rightmost sibling

[Figure: rightmost extension of a tree along its rightmost path; each new node takes a label from L = {a, b, c}]

Rightmost extension, cont.

Recursive applications of rightmost extensions create a search space.
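The search space created by recursive rightmost extension can be sketched as follows. The representation is an assumption made for illustration: an ordered labeled tree is a tuple of (depth, label) pairs in preorder, with the root at depth 0. Appending a node at depth d, for 1 ≤ d ≤ (depth of the last node) + 1, attaches it as the rightmost child of the node at depth d-1 on the rightmost path, so every ordered tree is generated exactly once. The function names are hypothetical.

```python
def rightmost_extensions(tree, labels):
    # tree: tuple of (depth, label) pairs in preorder; root has depth 0
    last_depth = tree[-1][0]
    for depth in range(1, last_depth + 2):
        for label in labels:
            # new node becomes the rightmost child of the rightmost-path
            # node at depth - 1
            yield tree + ((depth, label),)


def enumerate_trees(labels, max_size):
    # depth-first traversal of the search space rooted at single-node trees
    stack = [((0, label),) for label in labels]
    found = []
    while stack:
        tree = stack.pop()
        found.append(tree)
        if len(tree) < max_size:
            stack.extend(rightmost_extensions(tree, labels))
    return found


# With a single label, the number of trees of size n is the Catalan
# number C(n-1): 1, 1, 2, 5 for n = 1..4
trees = enumerate_trees(["a"], 4)
print(len(trees))
```

Each tree has exactly one parent in this search space (the tree obtained by deleting its last preorder node), which is what makes pruning a node equivalent to discarding all of its supertrees reachable by extension.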

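Pruning

- For all $t' \supseteq t$, propose an upper bound $\mu(t)$ such that $gain(t') \le \mu(t)$
- Can prune the node t if $\mu(t) \le \tau$, where $\tau$ is a suboptimal gain

[Figure: search space annotated with example values μ(t) = 0.1 and μ(t) = 0.4 and gains 0.5 and 0.4]

Pruning strategy: μ(t) = 0.4 implies the gain of any supertree of t is no greater than 0.4

Upper bound of the gain: $gain(t') \le \mu(t)$ for all $t' \supseteq t$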
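The pruning rule can be illustrated with a small branch-and-bound sketch. Two simplifications are assumed and should not be read as the authors' method. First, item sets stand in for subtrees (a pattern "occurs" in a parse if it is a subset of the parse's items), and appending a larger item stands in for rightmost extension. Second, the gain is taken to be |√W+ - √W-| and the bound is one that follows from anti-monotonicity: any superpattern of t can only occur where t occurs, so gain(t') ≤ max(√U+, √U-), where U+ (U-) is the total weight of pairs in which t occurs in the correct (competing) parse. The exact bound used in the talk is not recoverable from the slides.

```python
import math
from itertools import combinations


def gain_and_bound(pattern, pairs, d):
    # pairs: list of (good, bad) item sets; d: weight of each pair
    p = set(pattern)
    w_pos = w_neg = u_pos = u_neg = 0.0
    for (good, bad), weight in zip(pairs, d):
        in_good, in_bad = p <= good, p <= bad
        if in_good:
            u_pos += weight
        if in_bad:
            u_neg += weight
        if in_good and not in_bad:
            w_pos += weight
        if in_bad and not in_good:
            w_neg += weight
    gain = abs(math.sqrt(w_pos) - math.sqrt(w_neg))
    bound = max(math.sqrt(u_pos), math.sqrt(u_neg))  # valid for all superpatterns
    return gain, bound


def find_best(items, pairs, d):
    best, best_gain, visited = None, 0.0, 0
    stack = [(i,) for i in items]
    while stack:
        pattern = stack.pop()
        visited += 1
        gain, bound = gain_and_bound(pattern, pairs, d)
        if gain > best_gain:
            best, best_gain = pattern, gain  # best_gain is the suboptimal gain
        if bound > best_gain:  # otherwise prune: no superpattern can do better
            stack.extend(pattern + (j,) for j in items if j > pattern[-1])
    return best, best_gain, visited


def exhaustive_best_gain(items, pairs, d):
    # brute-force reference used only to sanity-check the pruned search
    return max(
        gain_and_bound(c, pairs, d)[0]
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    )


# Hypothetical toy data
items = [0, 1, 2, 3]
pairs = [({0, 1, 2}, {1, 3}), ({0, 2}, {2, 3}), ({1, 2}, {0, 1}), ({0, 1, 2, 3}, {3})]
d = [1.0] * len(pairs)
best, best_gain, visited = find_best(items, pairs, d)
```

The pruned search returns the same maximum gain as the exhaustive one; how many nodes it skips depends on how early a good suboptimal gain is found.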
