The Topology of WordNet: some metrics Ann Devitt and Carl Vogel Computational Linguistics Group...

The Topology of WordNet: some metrics

Ann Devitt and Carl Vogel

Computational Linguistics Group

Trinity College Dublin, Ireland

Ann Devitt, TCD

Introduction

Measures
- WordNet “sub-hierarchies”
- Multiple inheritance
- Branching factor
- Depth versus height
- Cluster coefficients

Specificity pilot study

Terminology

WordNet is treated as a directed acyclic graph (DAG)

The terms “node” and “synset” are used interchangeably
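As a minimal sketch of this view, the hypernym relation can be stored as a map from each node to its parents, and the “acyclic” property checked with Kahn's topological-sort algorithm. The toy graph and the names `PARENTS` and `is_acyclic` are hypothetical illustrations, not WordNet data or the authors' code.

```python
from collections import deque

# Hypothetical toy hierarchy (not WordNet data): node -> list of parents.
# Node "e" has two parents, i.e. multiple inheritance.
PARENTS: dict[str, list[str]] = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a"],
    "e": ["a", "b"],
}


def is_acyclic(parents: dict[str, list[str]]) -> bool:
    """Kahn's algorithm: the graph is a DAG iff every node can be visited
    in topological order, each node only after all of its parents."""
    remaining = {node: len(ps) for node, ps in parents.items()}
    children: dict[str, list[str]] = {node: [] for node in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    queue = deque(node for node, n in remaining.items() if n == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:  # all parents already visited
                queue.append(child)
    # Nodes on a cycle never reach zero remaining parents.
    return visited == len(parents)


print(is_acyclic(PARENTS))  # True
```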

Dimensional distribution

Overlap between hierarchies

2072 synsets belong to more than one top-level hierarchy

35 synsets belong to more than two top-level hierarchies
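One way to obtain counts like these, assuming the hierarchy is stored as a child-to-parents map, is to collect for every node the set of top nodes (nodes with no parents) it reaches by following parent links. The graph below is a made-up toy, and `top_nodes` and `count_in_more_than` are hypothetical helper names.

```python
from functools import lru_cache

# Hypothetical toy hierarchy (not WordNet data): node -> list of parents.
PARENTS: dict[str, list[str]] = {
    "top1": [],
    "top2": [],
    "a": ["top1"],
    "b": ["top2"],
    "c": ["a", "b"],  # inherits from both top hierarchies
    "d": ["c"],       # inherits the overlap from its parent
    "e": ["a"],
}


@lru_cache(maxsize=None)
def top_nodes(node: str) -> frozenset[str]:
    """Top-level nodes (those with no parents) that `node` descends from."""
    parents = PARENTS[node]
    if not parents:
        return frozenset({node})
    # Union of the top nodes reachable through every parent.
    return frozenset().union(*(top_nodes(p) for p in parents))


def count_in_more_than(n: int) -> int:
    """Number of nodes that belong to more than `n` top hierarchies."""
    return sum(1 for node in PARENTS if len(top_nodes(node)) > n)


print(count_in_more_than(1))  # 2 ("c" and "d")
print(count_in_more_than(2))  # 0
```

Memoising `top_nodes` means each node's ancestor set is computed once, which keeps the pass linear in the number of parent links.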

Some overlap examples

Abstraction and Event: 948 synsets, e.g. group action

Entity and Group: 250 nodes, e.g. weaponry

Multiple inheritance

- 2.6% of nodes have more than one parent
- Normally distributed throughout depth
- Significantly different across taxonomies: χ²(8, N = 75180) = 324.27, p ≤ 0.001
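A minimal sketch of the two quantities on this slide: the share of nodes with more than one parent, and a Pearson chi-square test of independence on a contingency table (rows = sub-hierarchies, columns = single vs multiple parents). The slide does not show the table the authors used; the toy hierarchy and counts below are made up, and the function names are hypothetical.

```python
# Hypothetical toy hierarchy (not WordNet data): node -> list of parents.
PARENTS = {"root": [], "a": ["root"], "b": ["root"],
           "c": ["a"], "d": ["a", "b"], "e": ["b"]}


def share_multiple_parents(parents: dict[str, list[str]]) -> float:
    """Fraction of nodes with more than one parent."""
    return sum(len(ps) > 1 for ps in parents.values()) / len(parents)


def chi_square(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if parentage were independent of sub-hierarchy.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


print(share_multiple_parents(PARENTS))  # one node in six
# Made-up counts: two hypothetical sub-hierarchies x [single, multiple parents].
stat, dof = chi_square([[90, 10], [80, 20]])
print(round(stat, 3), dof)  # 3.922 1
```

`scipy.stats.chi2_contingency` returns the same statistic together with a p-value, but for a table with one degree of freedom it applies Yates' continuity correction unless `correction=False` is passed. A table with 9 rows and 2 columns, for example, has (9 − 1)(2 − 1) = 8 degrees of freedom, one shape consistent with the χ²(8) reported on this slide.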

Specificity examples

Parents = 1, depth < 3: damnation, office

Parents = 1, depth > 8: beagle, palomino

Parents > 1, depth < 3: person, artefact

Parents > 1, depth > 8: sea bass, self-condemnation, bombardon

Branching Factor

Branching factor = number of children + 1

- Including leaf nodes: range 1–573, average 2.023
- Excluding leaf nodes: average 5.793; 97% of nodes have a branching factor below 20
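The slide's definition translates directly into code. This is a minimal sketch over a made-up toy hierarchy (hypothetical names, not WordNet data) that reports minimum, maximum and mean branching factor with and without leaf nodes, plus the share below 20.

```python
# Hypothetical toy hierarchy (not WordNet data): node -> list of children.
CHILDREN: dict[str, list[str]] = {
    "root": ["a", "b"],
    "a": ["c", "d", "e"],
    "b": ["e"],
    "c": [],
    "d": [],
    "e": [],
}


def branching_factor(node: str) -> int:
    """Branching factor as defined on the slide: number of children + 1."""
    return len(CHILDREN[node]) + 1


def summary(nodes: list[str]) -> tuple[int, int, float]:
    """Minimum, maximum and mean branching factor over `nodes`."""
    factors = [branching_factor(n) for n in nodes]
    return min(factors), max(factors), sum(factors) / len(factors)


all_nodes = list(CHILDREN)
internal = [n for n in CHILDREN if CHILDREN[n]]  # exclude leaf nodes

print(summary(all_nodes))  # (1, 4, 2.0)
print(summary(internal))   # (2, 4, 3.0)
share_below_20 = sum(branching_factor(n) < 20 for n in internal) / len(internal)
print(share_below_20)      # 1.0
```

Under this definition every leaf scores exactly 1, which matches the lower end of the reported range; because leaves are so numerous, dropping them raises the average considerably.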

Branching factor

- Low branching factor overall
- Same distribution in all sub-hierarchies
- Combined with the large total number of nodes, a low branching factor implies greater overall path depth
- Not a shallow structure, despite 55,000 leaf nodes

Depth vs Height

Depth
- Maximum = 18
- Normal distribution

Height
- Maximum = 5
- 93.6% of nodes are 1 or 2 nodes from a leaf node
- Zipfian distribution
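The slides do not spell out how depth and height are counted. The sketch below assumes depth is the length, in edges, of the shortest path down from a root (found by breadth-first search) and height is the length of the shortest path from a node down to any leaf; with multiple inheritance, other choices such as the longest path would give different values. The toy graph and function names are hypothetical.

```python
from collections import Counter, deque

# Hypothetical toy hierarchy (not WordNet data): node -> list of children.
# "d" has two parents ("a" and "b").
CHILDREN: dict[str, list[str]] = {
    "root": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d"],
    "c": ["f"],
    "d": [],
    "f": [],
}


def depths(children: dict[str, list[str]]) -> dict[str, int]:
    """Shortest distance (in edges) from a root, via breadth-first search."""
    has_parent = {c for kids in children.values() for c in kids}
    roots = [n for n in children if n not in has_parent]
    depth = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:  # first visit in BFS = shortest path
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def heights(children: dict[str, list[str]]) -> dict[str, int]:
    """Shortest distance (in edges) from each node down to a leaf."""
    memo: dict[str, int] = {}

    def h(node: str) -> int:
        if node not in memo:
            kids = children[node]
            memo[node] = 0 if not kids else 1 + min(h(k) for k in kids)
        return memo[node]

    return {n: h(n) for n in children}


print(depths(CHILDREN))  # {'root': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2, 'f': 3}
print(Counter(heights(CHILDREN).values()))  # distribution of heights
```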

Depth vs Height

The reported distributions are the same across the different sub-hierarchies

Depth is the more informative measure

Clustering coefficient

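Measure of graph connectivity: for each node, the ratio of the number of connections between its neighbours to the possible number of such connections (Watts and Strogatz), averaged over all nodes:

$$C = \frac{1}{N}\sum_i \frac{2E_i}{k_i(k_i-1)}$$

where $N$ is the number of nodes, $k_i$ is the number of neighbours of node $i$, and $E_i$ is the number of edges among those neighbours.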
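A minimal sketch of this coefficient, treating each hypernym link as an undirected edge (an assumption; the slides do not say how edge direction is handled). Nodes with fewer than two neighbours get a coefficient of 0, which is one common convention. The toy edges and function names are hypothetical. In a pure tree no two neighbours of a node are linked, so every coefficient is 0; triangles can only arise through multiple inheritance, as with node "c" below.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy hierarchy as undirected edges (not WordNet data).
# "c" has parents "a" and "root", so root, a and c form a triangle.
EDGES = [("root", "a"), ("root", "b"), ("a", "c"), ("root", "c"), ("b", "d")]


def adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets."""
    adj: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def local_clustering(adj: dict[str, set[str]], node: str) -> float:
    """C_i = 2 E_i / (k_i (k_i - 1)); taken as 0 when k_i < 2."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # E_i: edges that directly join two neighbours of `node`.
    e = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))


adj = adjacency(EDGES)
coeffs = {n: local_clustering(adj, n) for n in adj}
average = sum(coeffs.values()) / len(coeffs)  # C, the mean over all N nodes
print(coeffs)
print(average)
```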

Cluster coefficients

First-order measure
- Not useful for WordNet
- Only 62 nodes have a coefficient > 0
- Does not form clusters readily

Cluster coefficients

Second-order measure
- Average 0.337
- Normal distribution
- May form clusters of wider diameter
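The slides do not define the second-order measure. Purely as an assumption, the sketch below reads it as the same 2E/(k(k − 1)) ratio computed over every node within two hops, instead of only direct neighbours. On a toy tree (hypothetical data and names) the first-order coefficient is 0 everywhere while this two-hop variant is positive, which is one way a hierarchy with almost no triangles could still show clusters of wider diameter.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy tree as undirected edges (not WordNet data); it has no
# triangles, so the first-order (radius 1) coefficient is 0 everywhere.
EDGES = [("root", "a"), ("root", "b"), ("a", "c"), ("a", "d")]


def adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets."""
    adj: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def neighbourhood(adj: dict[str, set[str]], node: str, radius: int) -> set[str]:
    """Nodes within `radius` hops of `node`, excluding `node` itself."""
    seen = {node}
    frontier = {node}
    for _ in range(radius):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return seen - {node}


def clustering(adj: dict[str, set[str]], node: str, radius: int) -> float:
    """2E / (k(k - 1)) over the radius-hop neighbourhood; 0 when k < 2.
    radius=1 is the usual Watts-Strogatz coefficient; radius=2 is a
    HYPOTHETICAL reading of the 'second-order' measure."""
    hood = neighbourhood(adj, node, radius)
    k = len(hood)
    if k < 2:
        return 0.0
    e = sum(1 for u, v in combinations(hood, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))


adj = adjacency(EDGES)
first = [clustering(adj, n, 1) for n in adj]
second = [clustering(adj, n, 2) for n in adj]
print(max(first))                # 0.0
print(sum(second) / len(second))
```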

Pilot Study Aims

1. Do people have a notion of generality/specificity for concepts?

2. Do people agree on what is more/less general/specific?

3. What features of WordNet do these judgments correlate with?

Sample ranking task I

Axis, axis of rotation – (the center around which something rotates)

River boat – (a boat used on rivers or to ply a river)

Remains – (any object that is left unused or still extant; “I threw out the remains of my dinner”)

Sample ranking task II

rational motive – (a motive that can be defended by reasoning or logical argument)

disapproval – (the act of disapproving or condemning)

harmony, concord, concordance – (agreement of opinions)

Do people agree on what is more/less general/specific?

Yes
- Cochran's Q statistic (Cochran 1950)
- H₀: any agreement between respondents is due to chance
- Overall, for 11 respondents:

Cochran's Q: 165.859
Degrees of freedom: 44
Asymp. Sig.: .000
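A minimal sketch of the statistic in its standard form, Q = (k − 1)(k ΣC_j² − T²) / (kT − ΣR_i²), where C_j are the column totals, R_i the row totals and T the grand total of a binary matrix with k columns. Under H₀, Q is approximately χ²-distributed with k − 1 degrees of freedom, so the 44 degrees of freedom reported on this slide correspond to 45 columns. How respondents and items were arranged in the authors' matrix is not stated; the toy data and the function name are hypothetical.

```python
def cochrans_q(x: list[list[int]]) -> tuple[float, int]:
    """Cochran's Q for a binary matrix: rows are blocks (e.g. respondents),
    columns are the k conditions being compared. Returns (Q, k - 1)."""
    k = len(x[0])
    col_totals = [sum(col) for col in zip(*x)]
    row_totals = [sum(row) for row in x]
    total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2)
    denominator = k * total - sum(r * r for r in row_totals)
    if denominator == 0:  # every row is all 0s or all 1s
        raise ValueError("Q is undefined: no row varies across columns")
    return numerator / denominator, k - 1


# Made-up binary judgments (1 = yes), 4 rows x 3 columns.
data = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
print(cochrans_q(data))  # (3.5, 2)
```

A p-value can then be read from the χ² survival function, for example `scipy.stats.chi2.sf(q, dof)`.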

What WN features correlate?

Depth: less deep = more general

Children: inconclusive

Sisters: fewer sisters = more general

Sub-hierarchy: did not seem to affect judgments, but did increase the difficulty of the task
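The slides do not say which statistic was used to relate the judgments to WordNet features. One common choice for ordinal judgments is Spearman's rank correlation, sketched below as an assumption. The depths and mean generality ranks are made-up values, not results from the study, and the function names are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


# Made-up example: synset depth vs mean rank given by respondents
# (1 = judged most general). A positive rho means deeper = judged more specific.
depth = [2, 5, 9, 12, 9]
mean_rank = [1.4, 2.1, 3.0, 3.8, 2.7]
print(spearman(depth, mean_rank))
```

The same function applies unchanged to sister or child counts; `scipy.stats.spearmanr` computes the same coefficient together with a p-value.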

Conclusion

WordNet metrics
- Inheritance: sub-hierarchy and parentage
- Branching factor
- Distance: depth and height
- Clustering

Pilot study
- Suggests where to go with a larger study

Bibliography

W. G. Cochran. The comparison of percentages in matched samples. Biometrika, 37:256–266, 1950.

D. S. Touretzky. The Mathematics of Inheritance Systems. Los Altos, CA: Morgan Kaufmann, 1986.

D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440–442, 1998.

Multiple Inheritance vs Depth