SemEval-2012 Task 2: Measuring Degrees of Relational Similarity
David Jurgens Department of Computer Science
University of California, Los Angeles
Saif Mohammad Emerging Technologies
National Research Council Canada
Keith Holyoak Department of Psychology
University of California, Los Angeles
Peter Turney Emerging Technologies
National Research Council Canada
Talk Outline
• Motivating Example
• Task Description
• Data Annotation Gathering
• Systems and Performance
• Discussion
The relational search engine
List all things that are part of a ... car
Antenna, Hubcaps, Seats, Roof, Wheel, Engine, Tires, Windows
How might we rank these items?
Car:Antenna, Car:Hubcaps, Car:Seats, Car:Roof, Car:Wheel, Car:Engine, Car:Tires, Car:Windows
These are all analogous pairs, but they vary in how strong the relation is
What is the most prototypical example of the shared relation?
Task 2: Measuring Degrees of Relational Similarity
Car:Antenna, Car:Hubcaps, Car:Seats, Car:Roof, Car:Wheel, Car:Engine, Car:Tires, Car:Windows
Given example pairs having approximately the same relation:
1. Identify what the relation is
2. Rate each pair according to the degree that it expresses that relation
Task 2: Measuring Degrees of Relational Similarity
1. Identify what the relation is: “An X is made from a collection of Y”
2. Rate each pair according to the degree that it expresses that relation
51.7 bouquet:flower
50.0 army:soldiers
37.3 library:book
35.7 arsenal:weapons
23.6 herd:cow
21.1 troop:soldier
20.7 paragraph:word
18.2 album:photos
10.5 class:student
-7.5 beach:sand
-32.8 garden:plot
Task 2: Relation Taxonomy
10 relation categories, divided into 79 subcategories
Class Inclusion
• Taxonomic - flower:tulip
• Function - weapon:knife
Cause-Purpose
• Cause:Effect - joke:laughter
• Agent:Goal - climber:peak
Isaac I. Bejar, Roger Chaffin, and Susan Embretson. Cognitive and Psychometric Analysis of Analogical Problem Solving. 1991
Task 2: Relation Taxonomy
Includes some more challenging subcategories...
Similar
• Dimensional Naughty - copy:plagiarize
Contrast
• Asymmetric Contrary - hot:cool
Space-Time
• Contiguity - ocean:coast
Task Data
• Lists of example pairs for all 79 subcategories
• Pairs vary in quality
• Prototypicality ratings for 10 subcategories
• All materials used to crowdsource the ratings
• Includes an example description of each relation, e.g., “An X is a kind of Y”
Crowdsourcing Graded Relational Annotations
Seed pairs and example relation -> Phase 1: Generate Examples -> List of pairs -> Phase 2: Prototypicality Rating -> Ratings
Gathering Relation Examples
Consider the following word pairs:
flower:tulip, emotion:rage, poem:sonnet
What relation best describes these X:Y word pairs?
to X is to have a Y receive some object/service/idea
Y is an unacceptable form of X
a Y is a part of an X
Y is a kind/type/instance of X
• Question 1 asked Turkers to pick the relation shared by 3 seed pairs
• Question 2 asked Turkers to provide four additional examples with the same relation
Rating Prototypicality
• Question 1 same as Phase 1
• Question 2 used the MaxDiff format
Given prototypical examples of a subcategory: flower:tulip, emotion:rage, poem:sonnet
weapon:spear, bird:swan, automobile:van, hair:brown
Select which pair is the best example of the relation and which is the worst example
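One common way to turn MaxDiff answers into a numeric prototypicality rating is to score each pair by the percentage of times it was chosen as best minus the percentage of times it was chosen as worst. The sketch below assumes that scoring rule; the function name `maxdiff_scores` and the three annotator responses are made up for illustration and are not taken from the task materials.

```python
from collections import Counter

def maxdiff_scores(responses):
    """Score each pair as %best - %worst over the questions it appeared in.

    `responses` is a list of (choices, best, worst) tuples, one per answered
    MaxDiff question. The scoring rule is an assumption for illustration.
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for choices, chosen_best, chosen_worst in responses:
        shown.update(choices)        # how often each pair was offered
        best[chosen_best] += 1       # how often it was picked as best
        worst[chosen_worst] += 1     # how often it was picked as worst
    return {p: 100.0 * (best[p] - worst[p]) / shown[p] for p in shown}

# Hypothetical answers from three annotators to the same question.
question = ("weapon:spear", "bird:swan", "automobile:van", "hair:brown")
responses = [
    (question, "bird:swan", "hair:brown"),
    (question, "weapon:spear", "hair:brown"),
    (question, "bird:swan", "hair:brown"),
]

scores = maxdiff_scores(responses)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:7.1f}  {pair}")
```

Under this rule a pair that is always picked as worst gets -100, one that is never picked either way gets 0, and one that is always picked as best gets 100.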
Participants
• University of Texas, Dallas
  • two systems
• University of Minnesota, Duluth
  • three systems
• Benemérita Universidad Autónoma de Puebla (México)
Evaluation Metrics
• Use the ratings to answer MaxDiff questions
• Compare system ranking with Turker ranking using Spearman’s rank correlation
weapon:spear, bird:swan, automobile:van, hair:brown
Systems provide numerical ratings for each pair
Highest scoring is best example
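Both metrics can be sketched in a few lines. The example below is only an illustration: the ratings are invented, the helper names are hypothetical, and it assumes that the lowest-rated pair is taken as the worst example (the slide only states the rule for the best one). Spearman's rank correlation is computed here as the Pearson correlation of the rank vectors, with tied values sharing their average rank.

```python
def answer_maxdiff(ratings, choices):
    """Pick the highest-rated pair as best and the lowest-rated as worst."""
    best = max(choices, key=lambda p: ratings[p])
    worst = min(choices, key=lambda p: ratings[p])
    return best, worst

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical system ratings and gold (Turker) ratings for one question.
pairs = ["weapon:spear", "bird:swan", "automobile:van", "hair:brown"]
system = {"weapon:spear": 0.8, "bird:swan": 0.9,
          "automobile:van": 0.4, "hair:brown": 0.1}
gold = {"weapon:spear": 40.0, "bird:swan": 55.0,
        "automobile:van": -10.0, "hair:brown": 20.0}

best, worst = answer_maxdiff(system, pairs)
rho = spearman([system[p] for p in pairs], [gold[p] for p in pairs])
print(best, worst, round(rho, 2))
```

A MaxDiff answer counts as correct when the system's best and worst choices match the annotators' choices, while the correlation compares the full ordering of pairs within a subcategory.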
Baselines
• Generate a random ordering of pairs
• Score pairs according to the pair’s words’ Point-wise Mutual Information (PMI)
• a measure of statistical association of the pairs’ words
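As a rough sketch of the PMI baseline, the standard definition is PMI(x, y) = log( p(x, y) / (p(x) p(y)) ). The slides do not say which corpus or co-occurrence window was used, so the counts, the corpus size, and the function name below are all hypothetical.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), estimated from raw counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical corpus statistics, for illustration only.
total = 1_000_000
word_counts = {"car": 2000, "engine": 500, "antenna": 100}
pair_counts = {("car", "engine"): 80, ("car", "antenna"): 2}

# Score each pair by the PMI of its two words; higher means stronger association.
scores = {
    pair: pmi(count, word_counts[pair[0]], word_counts[pair[1]], total)
    for pair, count in pair_counts.items()
}
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:6.2f}  {pair[0]}:{pair[1]}")
```

Because PMI is symmetric in its two words, a baseline of this form gives X:Y and Y:X the same score, so it cannot distinguish a pair from its reversal.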
Average Correlation Performance
[Bar chart: Spearman’s rank correlation (y-axis, 0 to 0.3) for BUAP, Random, UMD-V2, UMD-V1, UMD-V0, PMI, UTD-SVM, UTD-NB]
Correlation Performance per Subcategory
[Bar chart: number of significant correlations (y-axis, 0 to 30) at p < 0.05 and p < 0.01 for BUAP, Random, UMD-V2, UMD-V1, UMD-V0, PMI, UTD-SVM, UTD-NB]
MaxDiff Performance
[Bar chart: % of MaxDiff questions answered correctly (y-axis, 0 to 40) for BUAP, Random, UMD-V2, UMD-V1, UMD-V0, PMI, UTD-SVM, UTD-NB]
Categorical Performance
• Were some subcategories harder than others?
Measuring the impact of pair reversals
[Bar chart: Spearman’s rank correlation (y-axis, -0.075 to 0.3), with reversals and without reversals, for BUAP, Random, UMD-V2, UMD-V1, UMD-V0, PMI, UTD-SVM, UTD-NB]
Future Work
• Relations aren’t simply binary
• Especially when relational reasoning comes into play
• Future SemEval task
• Dataset has many uses in psychology as well as computational linguistics
• Spark more interest
Thank you!
https://sites.google.com/site/semeval2012task2/
Questions?