The Topology of WordNet: some metrics
Ann Devitt and Carl Vogel
Computational Linguistics Group
Trinity College Dublin, Ireland
Ann Devitt, TCD
Introduction
- Measures
  - WordNet “sub-hierarchies”
  - Multiple inheritance
  - Branching factor
  - Depth versus height
  - Cluster coefficients
- Specificity pilot study
Terminology
- WordNet as directed acyclic graph
- Node and synset interchangeable
Dimensional distribution
Overlap between hierarchies
- 2072 synsets: more than 1 top hierarchy
- 35 synsets: more than 2 top hierarchies
Some overlap examples
- Abstraction and Event
  - 948 synsets
  - group action
- Entity and Group
  - 250 nodes
  - weaponry
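The overlap counts above can be reproduced in principle by following parent links from each synset up to the roots and counting how many distinct roots are reached. The sketch below is a minimal illustration under stated assumptions: the `PARENTS` graph is hypothetical toy data (only the placement of "weaponry" under both Entity and Group comes from the slide), and the function names are invented for this example.

```python
from collections import Counter

# Toy hypernym graph (hypothetical data): each node maps to its parents.
# Nodes with no parents are treated as the roots of the top hierarchies.
PARENTS = {
    "entity": [],
    "group": [],
    "artefact": ["entity"],
    "collection": ["group"],
    "weapon": ["artefact"],
    "weaponry": ["artefact", "collection"],
}

def top_hierarchies(node, parents, cache=None):
    """Return the set of root nodes reachable from `node` via parent links."""
    if cache is None:
        cache = {}
    if node not in cache:
        ups = parents[node]
        if not ups:
            cache[node] = {node}
        else:
            roots = set()
            for parent in ups:
                roots |= top_hierarchies(parent, parents, cache)
            cache[node] = roots
    return cache[node]

def overlap_histogram(parents):
    """Count how many nodes belong to 1, 2, ... top hierarchies."""
    cache = {}
    return Counter(len(top_hierarchies(n, parents, cache)) for n in parents)

hist = overlap_histogram(PARENTS)
multi = sorted(n for n in PARENTS if len(top_hierarchies(n, PARENTS)) > 1)
print(hist)
print(multi)
```

On the real noun hierarchy the same histogram would give the number of synsets in one, two, or more top hierarchies; the memoising `cache` keeps the traversal linear in the number of parent links.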
Multiple inheritance
- 2.6% of nodes
- Normal distribution throughout depth
- Significantly different in different taxonomies: χ²(8, N = 75180) = 324.27, p ≤ 0.001
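A Pearson chi-square statistic of this kind can be computed from a contingency table of taxonomy by parent count. The sketch below is illustrative only: the counts in `toy` are hypothetical, and the layout (one row per taxonomy, columns for single versus multiple parents) is an assumption. A table with nine rows and two columns would give 8 degrees of freedom, which matches the reported χ²(8, N = 75180), but the slides do not state how the test was laid out.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: rows = taxonomies, columns = (single parent, multiple parents)
toy = [[90, 10], [80, 20], [70, 30]]
stat, dof = chi_square(toy)
print(stat, dof)
```

The statistic is then compared against the chi-square distribution with `dof` degrees of freedom to obtain a p-value.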
Specificity examples
- Parents = 1, depth < 3: damnation, office
- Parents = 1, depth > 8: beagle, palomino
- Parents > 1, depth < 3: person, artefact
- Parents > 1, depth > 8: sea bass, self-condemnation, bombardon
Branching Factor
- Number of children + 1
- Including leaf nodes
  - Range: 1 – 573
  - Average: 2.023
- Excluding leaf nodes
  - Average: 5.793
  - 97% less than 20
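With the definition above (number of children + 1), a leaf has a branching factor of 1, so including leaves pulls the average down sharply. The sketch below shows both averages on a hypothetical toy hierarchy; the `CHILDREN` data and helper names are assumptions made for illustration, and it is likewise an assumption that the "excluding leaf nodes" average uses the same children + 1 definition.

```python
# Toy hierarchy (hypothetical data): each node maps to its list of children.
CHILDREN = {
    "animal": ["dog", "horse"],
    "dog": ["beagle"],
    "horse": ["palomino"],
    "beagle": [],
    "palomino": [],
}

def branching_factors(children):
    """Branching factor per node, defined as number of children + 1."""
    return {node: len(kids) + 1 for node, kids in children.items()}

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

bf = branching_factors(CHILDREN)
avg_all = mean(bf.values())                                     # leaves included
avg_internal = mean(v for node, v in bf.items() if CHILDREN[node])  # leaves excluded
print(avg_all, avg_internal)
```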
Branching factor
- Overall low branching factor
- Same distribution in all sub-hierarchies
- Large number of nodes in total
- Greater overall depth in paths
- Not a shallow structure despite 55,000 leaf nodes
Depth vs Height
- Depth
  - Maximum = 18
  - Normal distribution
- Height
  - Maximum = 5
  - 93.6% 1 or 2 nodes from a leaf node
  - Zipfian distribution
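Depth and height can both be obtained with a breadth-first search, run downwards from the roots for depth and upwards from the leaves for height. The sketch below is a minimal illustration: the slides do not say how multiple paths are resolved in the DAG, so taking the shortest path in both directions is an assumption, and the `CHILDREN` graph is hypothetical toy data.

```python
from collections import deque

# Toy DAG (hypothetical data): each node maps to its list of children.
CHILDREN = {
    "entity": ["animal", "object"],
    "animal": ["dog"],
    "object": [],
    "dog": ["beagle"],
    "beagle": [],
}

def invert(children):
    """Build the node -> parents map from a node -> children map."""
    parents = {node: [] for node in children}
    for node, kids in children.items():
        for kid in kids:
            parents[kid].append(node)
    return parents

def shortest_distances(sources, neighbours):
    """Multi-source BFS: fewest edges from any source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

parents = invert(CHILDREN)
roots = [n for n in CHILDREN if not parents[n]]
leaves = [n for n in CHILDREN if not CHILDREN[n]]

depth = shortest_distances(roots, CHILDREN)   # edges down from the nearest root
height = shortest_distances(leaves, parents)  # edges up from the nearest leaf
print(depth)
print(height)
```

In the toy graph the root "entity" has height 1 because one of its children is a leaf. Under the shortest-path assumption, this is one way a hierarchy could show a small maximum height alongside a much larger maximum depth.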
Depth vs Height
- Reported distributions the same across the different sub-hierarchies
- Depth is a more informative measure
Clustering coefficient
- Measure of graph connectivity
- Ratio: number of connections between nodes / possible number of connections
- $\frac{2\,\Sigma_i}{k_i (k_i - 1)}$, with $\Sigma_i$ the number of connections among the $k_i$ neighbours of node $i$
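The ratio above can be computed per node by counting the links that exist among a node's neighbours and dividing by the k(k - 1)/2 links that could exist. The sketch below follows the standard local clustering coefficient of Watts and Strogatz on an undirected graph; treating WordNet's links as undirected is an assumption, and the `ADJ` graph is hypothetical toy data.

```python
def clustering_coefficient(node, adjacency):
    """Local clustering coefficient: 2 * links among neighbours / (k * (k - 1))."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        # Fewer than two neighbours: no pair exists, so define the value as 0
        return 0.0
    links = sum(
        1
        for a in neighbours
        for b in neighbours
        if a < b and b in adjacency[a]  # count each unordered pair once
    )
    return 2 * links / (k * (k - 1))

# Toy undirected graph (hypothetical data), stored as symmetric adjacency sets.
ADJ = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

coefficients = {node: clustering_coefficient(node, ADJ) for node in ADJ}
print(coefficients)
```

In a strict tree no two neighbours of a node are directly linked, so this value is 0 for every node; only nodes whose neighbours are themselves connected contribute a non-zero coefficient.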
Cluster coefficients
- First-order measure
- Not useful for WordNet
- Only 62 nodes have a coefficient > 0
- Does not form clusters readily
Cluster coefficients
- Second-order measure
- Average 0.337
- Normal distribution
- May form clusters of wider diameter
Pilot Study Aims
1. Do people have a notion of generality/specificity for concepts?
2. Do people agree on what is more/less general/specific?
3. What features of WordNet do these judgments correlate with?
Sample ranking task I
- Axis, axis of rotation - (the center around which something rotates)
- River boat - (a boat used on rivers or to ply a river)
- Remains - (any object that is left unused or still extant; “I threw out the remains of my dinner”)
Sample ranking task II
- rational motive - (a motive that can be defended by reasoning or logical argument)
- disapproval - (the act of disapproving or condemning)
- harmony, concord, concordance - (agreement of opinions)
Do people agree on what is more/less general/specific?
- YES
- Cochran Q statistic (Cochran 1950)
- H0: that any agreement between respondents is due to chance
- Overall: for 11 respondents
  - Cochran's Q: 165.859
  - 44 degrees of freedom
  - Asymp. Sig.: .000 (p < 0.001)
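Cochran's Q is computed from a table of binary outcomes with one row per block and one column per treatment. The sketch below implements the standard formula on hypothetical toy data; how the ranking judgments in the pilot study were coded as 0/1 values is not stated in the slides, so the table layout here is an assumption. Under this layout, 44 degrees of freedom would correspond to 45 columns.

```python
def cochran_q(rows):
    """Cochran's Q for a blocks x treatments table of 0/1 outcomes.

    Returns (Q, degrees of freedom). Raises ZeroDivisionError when every
    block is all zeros or all ones, because Q is undefined in that case.
    """
    k = len(rows[0])                         # number of treatments (columns)
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(row) for row in rows]
    n = sum(row_totals)                      # grand total of ones
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator, k - 1

# Hypothetical data: 4 blocks (rows) judged on 3 treatments (columns).
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
q, dof = cochran_q(toy)
print(q, dof)
```

Q is then referred to a chi-square distribution with `dof` degrees of freedom; a large value, as reported above, leads to rejecting H0.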
What WN features correlate?
- Depth
  - Less deep = more general
- Children
  - Inconclusive
- Sisters
  - Fewer sisters = more general
- Sub-hierarchy
  - Did not seem to affect judgments
  - Did increase the difficulty of the task
Conclusion
- WordNet metrics
  - Inheritance: sub-hierarchy and parentage
  - Branching factor
  - Distance: depth and height
  - Clustering
- Pilot study
  - Suggests where to go with a larger study
Bibliography
- W. G. Cochran: The comparison of percentages in matched samples. Biometrika, 37:256-266, 1950
- David S. Touretzky: The Mathematics of Inheritance Systems. Los Altos, CA: Morgan Kaufmann, 1986
- D. J. Watts and S. H. Strogatz: Collective dynamics of 'small-world' networks. Nature, 393:440-442, 1998
Multiple Inheritance vs Depth