Improved Temporal Relation Classification using Dependency Parses and Selective Crowdsourced Annotations

Jun-Ping Ng and Min-Yen Kan

COLING - 13 Dec 2012

Temporal Relations?

Two top aides to Netanyahu, political advisor Uzi Arad and Cabinet Secretary Danny Naveh, left for Europe on Sunday, apparently to investigate the Syrian issue, the newspaper said.

OVERLAP

AFTER

Goal

• Be able to classify event-temporal (E-T) relations within a sentence

Outline

• Brief look at state-of-the-art

• Proposed Approach

• Reducing size of feature space

• Smart acquisition of data via crowdsourcing

• Error Analysis

State-of-the-art

• Shared tasks TempEval-1 and TempEval-2 held in conjunction with SemEval in 2007 and 2010.

• State-of-the-art systems in TempEval-2 achieve around 65% accuracy

• We work with the TempEval-2 dataset to facilitate benchmarking and comparison

Data Sparsity

• Features typically employed include

• lexical cues

• context

• sentence structure

• Training set consists of around 959 instances

Proposal

• Reduce dimensionality of feature space

• Increase amount of annotated data available

A Kernel Hypothesis

... left for Europe on Sunday ... (OVERLAP)

... went to America on Monday ... (OVERLAP)

... partied at home on Wednesday ... (OVERLAP)

Convolution Kernels

• Allow us to capture structural similarities easily

• Tree structure is used directly as a feature for support vector machines (SVMs)

• No need to flatten structure representation into a set of real number features

Structure Similarity

Tree 1: (S NP VP PP)

Tree 2: (S NP VP)

Fragments of tree 1: (S NP), (S VP), (S PP)

Fragments of tree 2: (S NP), (S VP)
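
The idea of comparing two trees through shared fragments can be sketched as follows. This is a deliberately simplified illustration, not the kernel used in the talk: it counts only single parent-child fragments, whereas convolution tree kernels implicitly count much larger fragment sets. The tuple encoding of trees and the function names are assumptions made for this example.

```python
from collections import Counter

def edge_fragments(tree):
    """Count parent-child label pairs in a tree given as (label, [children])."""
    label, children = tree
    counts = Counter()
    for child in children:
        counts[(label, child[0])] += 1
        counts.update(edge_fragments(child))
    return counts

def fragment_kernel(tree_a, tree_b):
    """Dot product of the two fragment-count vectors (a valid kernel)."""
    frags_a = edge_fragments(tree_a)
    frags_b = edge_fragments(tree_b)
    # Counter returns 0 for fragments that tree_b does not contain.
    return sum(count * frags_b[frag] for frag, count in frags_a.items())

# The two trees from the slide: S over NP VP PP, and S over NP VP.
tree_1 = ("S", [("NP", []), ("VP", []), ("PP", [])])
tree_2 = ("S", [("NP", []), ("VP", [])])

similarity = fragment_kernel(tree_1, tree_2)
print(similarity)
```

The two trees share the fragments (S NP) and (S VP), so the similarity is 2. An SVM implementation that accepts a precomputed Gram matrix could consume such pairwise values directly, which is why no flattening into real-valued feature vectors is needed.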

Features

... met with his friends early December ...

Dependency parse (figure), nodes: met, with, friends, his, December, early

Extracted sub-structure: met, December, early

(I) dependency path

(II) dependency parse of time expression
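
A rough sketch of how the two structural features might be read off a dependency parse. The head map below is an assumed reading of the slide's figure (the original parse is not recoverable from the transcript), tokens are keyed by word form for brevity, and the helper names are hypothetical.

```python
def path_to_root(heads, token):
    """Chain of tokens from `token` up to the root of the parse."""
    path = [token]
    while heads[path[-1]] is not None:
        path.append(heads[path[-1]])
    return path

def dependency_path(heads, source, target):
    """Tokens on the tree path from `source` to `target`, endpoints included."""
    up = path_to_root(heads, source)
    down = path_to_root(heads, target)
    position = {node: i for i, node in enumerate(down)}
    for i, node in enumerate(up):
        if node in position:  # lowest common ancestor found
            return up[: i + 1] + down[: position[node]][::-1]
    raise ValueError("tokens are not in the same tree")

def subtree(heads, root):
    """All tokens dominated by `root`, including `root` itself."""
    return {tok for tok in heads if root in path_to_root(heads, tok)}

# Assumed parse of "... met with his friends early December ..."
heads = {
    "met": None,
    "with": "met",
    "friends": "with",
    "his": "friends",
    "December": "met",
    "early": "December",
}

event, timex = "met", "December"
feature_1 = dependency_path(heads, event, timex)  # (I) dependency path
feature_2 = subtree(heads, timex)                 # (II) parse of the time expression
print(feature_1, sorted(feature_2))
```

Under this assumed parse, feature (I) is the path from "met" to "December" and feature (II) is the sub-tree made of "December" and "early", which together match the three nodes singled out on the slide.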

Comparisons

System       Accuracy
ConvoDep     67.4%
TRIOS        65.0%
JU_CSE       63.0%
NCSU_indi    63.0%
NCSU_joint   63.0%
TRIPS        63.0%
USFD2        63.0%

✦ Trained on TempEval-2 training set, tested on TempEval-2 testing set

ConvoDep: Precision 0.828, Recall 0.512, F1 0.523

Getting More Data

• Crowdsourcing is a cheap, efficient avenue for large scale data annotation

• But temporal annotations are not trivial

• We want to investigate

• the quality of crowdsourced temporal annotations

• effective ways to gather the annotations

Task Setup

• Crowdsourcing via Crowdflower

• Data validation to improve data quality

• Raw data collected from news articles

• Event and time expressions extracted during pre-processing

Is It Useful?

• Collected initial dataset of 1000 instances

• Trained SVM classifier with convolution kernels

System         Accuracy   F1      Precision   Recall
ConvoDep       67.4%      0.523   0.828       0.512
CF-1000        65.2%      0.525   0.578       0.535
CF-1000 + TE   71.7%      0.615   0.726       0.598

✦ Tested on TempEval-2 testing set

A Smarter Way

• Are we able to collect less data but still remain effective?

• Insight: instances are not equally hard

Two top aides to Netanyahu, political advisor Uzi Arad and Cabinet Secretary Danny Naveh, left for Europe on Sunday, apparently to investigate the Syrian issue, the newspaper said.

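Hard Instances

• Order event expressions by increasing distance from the time expression in the dependency parse

Dependency parse (figure), nodes: said, left, on, for, Sunday

Ordering from the time expression: Sunday, left, said

• left: Level-0 instance

• said: Level-1 instance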
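
The level assignment could be sketched roughly as below. This is an illustration under assumptions rather than the authors' implementation: the head map is an assumed reading of the slide's parse figure, distance is taken as the number of edges in the undirected dependency tree, and ties are broken by input order.

```python
from collections import deque

def tree_distances(heads, source):
    """Breadth-first edge distances from `source` over the undirected parse tree."""
    neighbours = {}
    for child, head in heads.items():
        if head is not None:
            neighbours.setdefault(child, set()).add(head)
            neighbours.setdefault(head, set()).add(child)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def event_levels(heads, events, timex):
    """Rank events by distance from the time expression; rank 0 is Level-0."""
    dist = tree_distances(heads, timex)
    ordered = sorted(events, key=lambda e: dist[e])  # stable sort keeps input order on ties
    return {event: level for level, event in enumerate(ordered)}

# Assumed parse fragment of "... left for Europe on Sunday ..., the newspaper said."
heads = {"said": None, "left": "said", "on": "left", "for": "left", "Sunday": "on"}

levels = event_levels(heads, ["said", "left"], "Sunday")
print(levels)
```

With this parse, "left" is two edges from "Sunday" and "said" is three, so "left" paired with "Sunday" is the Level-0 instance and "said" paired with "Sunday" is the Level-1 instance.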

Easy Level-0 Instances

Accuracy (%)

Level-0 (59)   Level-1 (47)   Level-2 (21)   Level-3 (10)   Level-4 (1)
84.5           66.0           42.9           30.0           100.0

✦ Tested on TempEval-2 testing set

• Level-0 instances are much easier to get correct

Selective Annotation

System        Accuracy   F1      Precision   Recall
ConvoDep      67.4%      0.523   0.828       0.512
CF-NoLevel0   73.2%      0.639   0.659       0.643
CF-Full       73.2%      0.641   0.660       0.647

✦ Tested on TempEval-2 testing set

• Dropping Level-0 instances does not lead to a drop in performance
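
A minimal sketch of how selective annotation might be applied when building a crowdsourcing batch, assuming each candidate instance already carries its level. The `Instance` record, its field names, and the toy pool are hypothetical; the pool is sized to mirror the roughly 37/34/18/11 split across levels reported for the annotated data.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    event: str
    timex: str
    level: int  # hypothetical field: rank of the event by distance from the time expression

def select_for_annotation(pool):
    """Keep only instances above Level-0; Level-0 instances are skipped."""
    return [inst for inst in pool if inst.level > 0]

# Toy pool of 100 instances: 37 Level-0, 34 Level-1, 18 Level-2, 11 deeper.
pool = (
    [Instance("e", "t", 0)] * 37
    + [Instance("e", "t", 1)] * 34
    + [Instance("e", "t", 2)] * 18
    + [Instance("e", "t", 3)] * 11
)

selected = select_for_annotation(pool)
savings = 1 - len(selected) / len(pool)
print(f"annotate {len(selected)} of {len(pool)} instances, saving {savings:.0%}")
```

The filter itself is a single comparison; the practical cost lies in parsing the raw sentences beforehand so that a level is available before any annotation is requested.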

Annotation Savings

Share of annotated data (pie chart): Level-0 37%, Level-1 34%, Level-2 18%, Others 11%

• Level-0 instances form up to 37% of the annotated data

Analysis

• Why does leaving out 37% of the training instances cause no drop in performance?

• How can we approach the performance upper bound?

Performance Breakdown

              Overlap             Before              After
System        P     R     F1      P     R     F1      P     R     F1
CF-NoLevel0   0.72  0.96  0.82    0.56  0.45  0.50    0.70  0.52  0.60
CF-Full       0.72  0.95  0.81    0.57  0.40  0.47    0.70  0.60  0.64

• Classifier does better on OVERLAP relations

Label Distribution

Distribution of labels (%)

Label    Level-0   Level-1   Level-2
AFTER    10.1      21.2      23.6
BEFORE   5.1       13.7      16.1

• Level-0 instances contain fewer AFTER and BEFORE instances

Confusion Matrix

               Predicted label
Actual label   OVERLAP   BEFORE   AFTER
OVERLAP        78        2        1
BEFORE         7         9        4
AFTER          13        0        14

✦ Confusion matrix for CF-NoLevel0

• BEFORE mis-classified as AFTER
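
As a cross-check, per-label recall can be recomputed from the confusion matrix above. The snippet is a small sketch; the nested-dictionary layout (actual label, then predicted label) is just one convenient encoding chosen for this example.

```python
labels = ["OVERLAP", "BEFORE", "AFTER"]

# confusion[actual][predicted], copied from the CF-NoLevel0 matrix
confusion = {
    "OVERLAP": {"OVERLAP": 78, "BEFORE": 2, "AFTER": 1},
    "BEFORE": {"OVERLAP": 7, "BEFORE": 9, "AFTER": 4},
    "AFTER": {"OVERLAP": 13, "BEFORE": 0, "AFTER": 14},
}

def recall(confusion, label):
    """Correct predictions for `label` divided by all instances that truly carry it."""
    row = confusion[label]
    return row[label] / sum(row.values())

recalls = {label: round(recall(confusion, label), 2) for label in labels}
for label in labels:
    print(f"{label:8s} recall = {recalls[label]:.2f}")
```

The resulting values (0.96 for OVERLAP, 0.45 for BEFORE, 0.52 for AFTER) agree with the recall figures reported for CF-NoLevel0 in the performance breakdown.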

Copula Modifiers

He added that final guidelines to be published in early November will determine whether the bank is in compliance.

BEFORE

Dependency parse (figure), nodes: added, published, November

Missing Feature

He added that final guidelines published in early November will determine whether the bank is in compliance.

AFTER

Round Up

• Improved event-temporal relation classification

• By reducing input feature space

• By increasing amount of annotated data

• Demonstrated efficacy of crowdsourced annotations

• Proposed an optimization to reduce the annotation effort required

Questions?

Thank you!

Dataset can be downloaded at http://wing.comp.nus.edu.sg/~junping/etrcc/page/index.html

Breakdown of Data

Data Set              Relative size of partition (%)
                      Level-0   Level-1   Level-2   Others
TempEval-2 Training   40.9      35.2      15.1      8.8
TempEval-2 Testing    41.4      34.3      15.7      8.6
CF-Full               37.0      34.3      17.5      11.2
