
Inference Protocols for Coreference Resolution Kai-Wei Chang, Rajhans Samdani, Alla Rozovskaya, Nick Rizzolo, Mark Sammons, and Dan Roth

This research is supported by ARL and by DARPA under the Machine Reading Program.

Coreference

Coreference Resolution is the task of grouping all the mentions of entities into equivalence classes so that each class represents a discourse entity. In the example below, the mentions are colour-coded to indicate which mentions are co-referent (overlapping mentions have been omitted for clarity).

An American official announced that American President Bill Clinton met his Russian counterpart, Vladimir Putin, today. The president said that Russia was a great country.
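As a concrete picture of what "equivalence classes" means, one possible encoding of this example is sketched below. The colour coding is lost in this transcript, so the exact grouping shown (e.g., placing "The president" in the Putin chain) is an illustrative reading, not the poster's annotation.

    # One possible encoding of the example's equivalence classes as lists
    # of mention strings. The grouping is an illustrative reading: the
    # poster's colour coding is lost here, so e.g. whether "The president"
    # refers to Clinton or Putin is an assumption.
    clusters = [
        ["An American official"],                                      # singleton
        ["American President Bill Clinton", "his"],                    # Clinton
        ["his Russian counterpart, Vladimir Putin", "The president"],  # Putin
        ["Russia"],
    ]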

System Architecture

The system is a pipeline of three stages: Mention Detection; Coreference Resolution with a Pairwise Coreference Model (an Inference Procedure using a Best-Link or All-Link strategy, plus Knowledge-based Constraints); and Post-Processing. Each stage is described in the sections below.
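As a rough sketch of how these stages compose, the following skeleton wires them together. All function names here are hypothetical placeholders for illustration, not the actual system's API (which is built on Learning Based Java).

    # Minimal, hypothetical sketch of the three-stage pipeline; the stage
    # implementations are passed in as functions, so nothing here claims
    # to be the real system's interface.
    def coreference_pipeline(document, detect_mentions, pairwise_scorer, inference):
        # Stage 1: high-recall mention detection (candidate spans).
        mentions = detect_mentions(document)

        # Stage 2: score every ordered pair of mentions, then aggregate
        # the pairwise scores into entity clusters (Best-Link or All-Link).
        scores = {(u, v): pairwise_scorer(document, u, v)
                  for j, v in enumerate(mentions)
                  for u in mentions[:j]}
        clusters = inference(mentions, scores)

        # Stage 3: post-processing drops predicted mentions left in
        # singleton clusters (singletons are not annotated in OntoNotes).
        return [c for c in clusters if len(c) > 1]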

Experiments and Results

Results on DEV set with predicted mentions:

    Method               MD     MUC    BCUB   CEAF   AVG
    Best-Link            64.70  55.67  69.21  43.78  56.22
    Best-Link W/ Const.  64.69  55.80  69.29  43.96  56.35
    All-Link             63.30  54.56  68.50  42.15  55.07
    All-Link W/ Const.   63.39  54.56  68.46  42.20  55.07

Results on DEV set with gold mentions:

    Method               MUC    BCUB   CEAF   AVG
    Best-Link            80.58  75.68  64.68  73.65
    Best-Link W/ Const.  80.56  75.02  64.24  73.27
    All-Link             77.72  73.65  59.17  70.18
    All-Link W/ Const.   77.94  73.43  59.47  70.28

Official scores on TEST set:

    Task                                                       MD     MUC    BCUB   CEAF   AVG
    Pred. Mentions w/ Pred. Boundaries (Best-Link W/ Const.)   64.88  57.15  67.14  41.94  55.96
    Pred. Mentions w/ Gold Boundaries (Best-Link W/ Const.)    67.92  59.75  68.65  41.42  56.62
    Gold Mentions (Best-Link)                                  -      82.55  73.70  65.24  73.83


CoNLL Shared Task 2011

The task is coreference resolution on the OntoNotes-4.0 data set. Our system is based on Bengtson and Roth (2008) and built on Learning Based Java (Rizzolo and Roth, 2010). We participated in the "closed" track of the shared task.

Compared to the ACE 2004 corpus, the OntoNotes-4.0 data set has two main differences:

• Singleton mentions are not annotated in OntoNotes-4.0.
• OntoNotes-4.0 takes the largest logical span to represent a mention.

Mention Detection

We designed a rule-based mention detection system with high recall (~90%) and low precision (~35%). As a post-processing step, we remove all predicted mentions that remain in singleton clusters after the inference stage. The system achieves a 64.88% F1 score on the TEST set.
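The poster does not spell out the detection rules, so the sketch below only illustrates the high-recall, low-precision idea (take every NP, pronoun, and named-entity span as a candidate) together with the singleton-removal post-processing step. The candidate spans are assumed to come from a parser and an NER tagger.

    # Illustrative sketch only: the actual rules are not given here.
    # Candidate spans are assumed to come from a syntactic parser (NP
    # nodes), a pronoun list, and an NER tagger; taking their union
    # yields high recall at low precision.
    def detect_mentions(np_spans, pronoun_spans, ne_spans):
        """Union of all candidate spans, as (start, end) offsets."""
        return sorted(set(np_spans) | set(pronoun_spans) | set(ne_spans))

    def remove_singletons(clusters):
        """Post-processing: drop predicted mentions left in singleton
        clusters, since singletons are not annotated in OntoNotes-4.0."""
        return [c for c in clusters if len(c) > 1]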


Inference Procedure

The inference procedure takes as input a set of pairwise mention scores over a document and aggregates them into globally consistent cliques representing entities. We investigate two techniques: Best-Link and All-Link.

Best-Link Inference: For each mention, Best-Link considers the best mention on its left to connect to (according to the pairwise score) and creates a link between them if the pairwise score is above some threshold.
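A minimal sketch of this greedy procedure, assuming mentions are indexed left to right and score(i, j) returns the pairwise score; linked pairs are then merged into clusters with a simple union-find:

    def best_link(n, score, threshold=0.0):
        """Greedy Best-Link: each mention j links to its best-scoring
        preceding mention, if that score clears the threshold."""
        parent = list(range(n))            # union-find forest

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for j in range(1, n):
            best = max(range(j), key=lambda i: score(i, j))
            if score(best, j) > threshold:
                parent[find(best)] = find(j)   # merge the two clusters

        # Group mentions by their root to obtain the equivalence classes.
        clusters = {}
        for m in range(n):
            clusters.setdefault(find(m), []).append(m)
        return list(clusters.values())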

All-Link Inference: The All-Link inference approach scores a clustering of mentions by including all possible pairwise links in the score. It is also known as correlation clustering (Bansal et al., 2002).
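The All-Link objective can be stated compactly: a clustering's score is the sum of the pairwise scores over all within-cluster mention pairs. A sketch:

    from itertools import combinations

    def all_link_score(clusters, score):
        """Score of a clustering = sum of pairwise scores over all
        within-cluster mention pairs (correlation clustering objective)."""
        return sum(score(u, v)
                   for cluster in clusters
                   for u, v in combinations(sorted(cluster), 2))

Maximizing this objective exactly is NP-hard in general, which motivates the ILP formulation below.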

ILP Formulation: Both Best-Link and All-Link can be written as an integer linear programming (ILP) problem, where w_uv is the compatibility score of a pair of mentions and y_uv is a binary variable indicating whether u and v are linked.

Best-Link (each mention links to at most one preceding mention):

    max_y Σ_{u,v} w_uv y_uv   s.t.   Σ_{u: u precedes v} y_uv ≤ 1 for all v,   y_uv ∈ {0, 1}

All-Link (links must be transitively consistent):

    max_y Σ_{u,v} w_uv y_uv   s.t.   y_uk ≥ y_uv + y_vk − 1 for all u, v, k,   y_uv ∈ {0, 1}

For the All-Link clustering, we drop one of the three transitivity constraints for each triple of mention variables. Similar to Denis and Baldridge (2009), we observe that this improves accuracy.
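A sketch of the All-Link ILP using the PuLP modeling library (a choice made here for illustration; the poster does not name a solver). It keeps two of the three transitivity constraints per triple and drops the third; which one to drop is an arbitrary choice in this sketch.

    from itertools import combinations
    import pulp  # generic ILP modeling library, used here for illustration

    def all_link_ilp(n, w):
        """All-Link via ILP: w[(u, v)] is the pairwise score for u < v."""
        prob = pulp.LpProblem("all_link", pulp.LpMaximize)
        y = {(u, v): pulp.LpVariable(f"y_{u}_{v}", cat="Binary")
             for u, v in combinations(range(n), 2)}
        prob += pulp.lpSum(w[p] * y[p] for p in y)  # objective

        for u, v, k in combinations(range(n), 3):
            # Two of the three transitivity constraints; the third is
            # dropped, as in the poster (which one to drop is our choice).
            prob += y[(u, k)] >= y[(u, v)] + y[(v, k)] - 1
            prob += y[(v, k)] >= y[(u, v)] + y[(u, k)] - 1

        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return [p for p in y if y[p].value() == 1]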

Pairwise Mention Score

The pairwise compatibility score combines a learned model with constraint-based knowledge:

    w_uv = w · φ(u, v) + c(u, v) + t

where
• w is a weight vector learned from training data,
• φ(u, v) is the vector of extracted features,
• c(u, v) is a compatibility score given by constraints,
• t is a threshold parameter (to be tuned).
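Putting the pieces together, the score might be computed as below. The feature and constraint inputs are placeholders, and using a large positive or negative c(u, v) to force or forbid a link is one reading of how the constraint term operates, not something the poster states.

    def pairwise_score(w, phi_uv, c_uv, t):
        """w_uv = w . phi(u, v) + c(u, v) + t.

        phi_uv : feature vector for the mention pair (list of floats)
        c_uv   : constraint compatibility; a large positive (negative)
                 value effectively forces (forbids) the link -- one
                 plausible reading of the constraint term.
        t      : tuned threshold parameter, folded into the score.
        """
        return sum(wi * xi for wi, xi in zip(w, phi_uv)) + c_uv + t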

Training Procedure

We use the same features as Bengtson and Roth (2008), along with knowledge extracted from OntoNotes-4.0.

We explored two learning strategies for learning w in Best-Link and All-Link; the choice of learning strategy depends on the inference procedure.

1. Binary Classification: Following Bengtson and Roth (2008), we learn the pairwise scoring function w on the following examples (a sketch of this pair-generation scheme appears after the list):

• Positive examples: for each mention u, we construct a positive example (u, v), where v is the closest preceding mention in u's equivalence class.

• Negative examples: all mention pairs (u, v), where v is a preceding mention of u and u, v are in different classes.
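A sketch of this pair-generation scheme, assuming mentions are listed in document order and cluster_of maps each mention to its gold equivalence class:

    def make_training_pairs(mentions, cluster_of):
        """Bengtson-and-Roth-style pair generation.

        Positive: (u, closest preceding v in u's gold class).
        Negative: (u, v) for every preceding v in a different class.
        """
        positives, negatives = [], []
        for j, u in enumerate(mentions):
            for v in reversed(mentions[:j]):        # scan right to left
                if cluster_of[v] == cluster_of[u]:
                    positives.append((u, v))        # closest antecedent only
                    break
            negatives.extend((u, v) for v in mentions[:j]
                             if cluster_of[v] != cluster_of[u])
        return positives, negatives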

Because singleton mentions are not annotated, the sample distributions in the training and inference phases are inconsistent. We therefore apply the mention detector to the training set and train the classifier on the union of gold and predicted mentions.

2. Structured Learning (Structured Perceptron): We use a structured perceptron algorithm, similar to the supervised clustering algorithm of Finley and Joachims (2005), to learn w: inference is run on each training document with the current weights, and w is updated on the difference between the gold clustering and the predicted one.
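A sketch of that loop in the spirit of Finley and Joachims (2005). The update shown (add features of missed gold links, subtract features of wrong predicted links) is the standard structured perceptron; details such as the number of epochs or weight averaging are assumptions of this sketch.

    import numpy as np

    def structured_perceptron(docs, phi, inference, dim, epochs=5):
        """docs yields (mentions, gold_links) pairs; phi(u, v) returns an
        np.ndarray feature vector; inference(mentions, w) returns the set
        of predicted links under the current weights."""
        w = np.zeros(dim)
        for _ in range(epochs):
            for mentions, gold_links in docs:
                pred_links = inference(mentions, w)
                # Reward gold links the model missed, penalize predicted
                # links that are not gold.
                for u, v in gold_links - pred_links:
                    w += phi(u, v)
                for u, v in pred_links - gold_links:
                    w -= phi(u, v)
        return w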

Knowledge-based Constraints

We define three high-precision constraints that improve recall on NPs with definite determiners and on mentions whose heads are Named Entities. Examples of mention pairs that are correctly linked by the constraints: [Governor Bush] and [Bush]; [a crucial swing state, Florida] and [Florida]; [Sony itself] and [Sony].
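The three rules themselves are not spelled out in this transcript, so the sketch below only illustrates their flavor with a hypothetical head-match rule covering pairs like [Governor Bush] and [Bush]:

    def constraint_compatibility(u, v, head, is_named_entity):
        """Hypothetical high-precision rule, for illustration only:
        if both mentions share a named-entity head word (e.g. 'Bush' in
        'Governor Bush' / 'Bush'), return a large positive compatibility
        that effectively forces the link; otherwise contribute nothing."""
        if is_named_entity(head(u)) and head(u) == head(v):
            return 1000.0   # overwhelms the learned part of the score
        return 0.0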

Contributions

• We investigate two inference methods: Best-Link and All-Link.
• We provide a flexible architecture for incorporating constraints.
• We compare and evaluate the two inference approaches and the contribution of constraints.

Discussion and Conclusions

Best-Link outperforms All-Link. This raises a natural algorithmic question about which clustering style is inherently better suited to coreference resolution, and about how more knowledge can be infused into different coreference clustering styles.

Constraints improve recall on a subset of mentions. Other common system errors might be fixed by additional constraints.

Our approach accommodates the infusion of knowledge via constraints, and we have demonstrated its utility in an end-to-end coreference system.