Post on 08-Jan-2016
Rogelio Nazar & Maarten Janssen
IULA, Universitat Pompeu Fabra, Barcelona
Dictionaries are a good source of information
Long tradition of taxonomy extraction
◦ Calzolari (1977), Amsler (1981), Chodorow et al. (1985), Fox et al. (1988), Alshawi (1989), Boguraev (1991), Barrière & Popowich (1996), Chang (1998), Renau & Battaner (2008)
Exploiting Machine Readable Dictionaries
◦ Parsing definitional phrases
◦ Pattern extraction, shallow parsing
◦ Full treatment of a single dictionary
There is a lot of information available
◦ Hand-crafted, high-quality resources
Combining yields new data
◦ Taxonomy from multiple dictionaries
Language-independent shallow method
Combining definitions of the same word
Various dictionaries, online versions
◦ DRAE, DGLE, Clave, DEM
Frequency-based
Dictionaries differ
◦ Different lexicon and definitions
◦ Even if only for legal reasons
Hyperonym should be the same
◦ A cat is an animal
◦ Unless there is uncertainty in the hyperonym
Most dictionaries should use the same genus
◦ Statistically relevant
3x: ablandabrevas, persona
2x: com., inútil
1x: substantivo, común, fig.
Directly from harvested text
◦ With begin/end tags
No textual analysis
More than definitions
◦ Examples, multiple senses, etc.
Sense matching impossible
◦ Entries unsystematic
◦ Dictionaries do not match in senses
Minimum number of dictionaries
Raw frequency count
◦ Hyperonym tends to be repeated
Candidates have to be words
◦ Of the same word-class
Use of a stop-list
◦ Dictionary generated
◦ Words that occur in more than 10% of entries
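The filtering steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`tokenize`, `build_stoplist`, `hypernym_candidates`), the default thresholds `min_dicts=3` and `min_count=2`, and the optional `same_class` predicate standing in for the word-class check are all assumptions. Only the 10% stop-list cutoff and the raw frequency count come from the slides.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (no linguistic analysis)."""
    return re.findall(r"\w+", text.lower())


def build_stoplist(all_entries, threshold=0.10):
    """Collect words that occur in more than `threshold` of all harvested entries."""
    doc_freq = Counter()
    for entry in all_entries:
        doc_freq.update(set(tokenize(entry)))  # count each word once per entry
    total = len(all_entries)
    return {word for word, count in doc_freq.items() if count / total > threshold}


def hypernym_candidates(headword, entries, stoplist,
                        min_dicts=3, min_count=2, same_class=None):
    """Rank words by raw frequency over all entries harvested for `headword`.

    `entries` holds one harvested text per dictionary. `same_class` is a
    hypothetical predicate for the word-class check; it is skipped when None.
    """
    if len(entries) < min_dicts:
        return []  # too few dictionaries to trust the counts
    counts = Counter()
    for entry in entries:
        counts.update(tokenize(entry))  # raw count, repeats included
    return [
        (word, count)
        for word, count in counts.most_common()
        if count >= min_count
        and word != headword.lower()
        and word not in stoplist
        and (same_class is None or same_class(word))
    ]
```

With three invented toy entries for "cat" that all contain "animal", and a stop-list covering the function words, the only surviving candidate is "animal". In practice the stop-list would be generated from the full set of harvested entries with `build_stoplist`.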
# deconstrucción (3 dictionaries)
teoría 2 1
EWN: 0.desconstrucción; 0.deconstrucción; 1.teoría filosófica; 1.doctrina filosófica; 2.filosofía; 3.creencia; 4.contenido mental; 5.conocimiento; 5.cognición; 6.rasgo psicológico;
# descubrimiento (5 dictionaries)
acción 3 3
cosa 3 5
efecto 2 -
EWN: 0.descubrimiento; 1.logro; 1.presentación; 1.revelación; 2.realización; 2.información; 2.exposición; 3.acción; 3.hecho; 3.acto de habla; 3.comunicación visual; 4.acto; 4.actividad humana; 4.comunicación; 5.relación social; 6.relación; 7.abstracción;
# cumbia (5 dictionaries)
danza 2 -
EWN: 0.cumbiamba; 0.cumbia; 1.baile regional; 1.danza popular; 2.baile social; 3.baile; 4.recreación; 4.diversión; 5.actividad; 6.acto; 6.actividad humana;
# asta (5 dictionaries)
mar 6 -
lanza 6 -
media 5 -
toro 5 -
cuerno 5 -
bandera 4 -
EWN: 0.cuerno; 0.asta; 1.tomadero; 1.materia animal; 1.cogedero; 1.bastón; 1.agarradera; 1.asimiento; 1.asidero; 1.asa; 2.materia; 2.apéndice; 2.vara; 2.palo; 3.porción; 3.sustancia; 3.parte; 3.herramienta; 4.utillaje; 5.artefacto; 6.objeto físico; 6.cosa; 6.objeto; 6.objeto inanimado; 7.competente; 7.respirar; 7.capaz; 7.entidad;
WordNet (still) best available taxonomy
◦ Not the best resource for evaluation
Automatic verification
◦ 100 random nouns
◦ Best 5 hyperonymy candidates
◦ Match when candidate in chain
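The verification step can be sketched as follows. This is a hedged illustration, not the authors' script: the helper names (`parse_chain`, `first_match`, `accuracy`) are hypothetical, and the slides do not say whether a candidate must equal a whole chain term or may match one word of a multiword term, so exact matching on the whole term is assumed here. The chain format ("level.term;") follows the EWN lines shown in the result listings.

```python
def parse_chain(chain_text):
    """Parse an EWN chain such as '0.cumbia; 1.baile regional;' into (level, term) pairs."""
    pairs = []
    for item in chain_text.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        level, _, term = item.partition(".")
        pairs.append((int(level), term.strip()))
    return pairs


def first_match(candidates, chain, top_n=5):
    """Return (candidate, level) for the first of the best `top_n` candidates
    found in the chain, or None. Assumes exact matching on the whole term."""
    levels = {}
    for level, term in chain:
        levels.setdefault(term, level)  # keep the first level a term appears at
    for candidate in candidates[:top_n]:
        if candidate in levels:
            return candidate, levels[candidate]
    return None


def accuracy(sample):
    """Fraction of nouns with at least one matching candidate.

    `sample` maps each noun to a (ranked candidates, chain text) pair.
    """
    hits = sum(
        1 for candidates, chain_text in sample.values()
        if first_match(candidates, parse_chain(chain_text)) is not None
    )
    return hits / len(sample)


# Two records abridged from the listings, for illustration only.
sample = {
    "descubrimiento": (["acción", "cosa", "efecto"],
                       "0.descubrimiento; 1.logro; 2.realización; 3.acción; 3.hecho;"),
    "cumbia": (["danza"],
               "0.cumbiamba; 0.cumbia; 1.baile regional; 1.danza popular; 2.baile social;"),
}
```

Under the exact-match assumption, "acción" is found at level 3 of the chain for "descubrimiento", while "danza" does not match "danza popular". A looser token-level match would count the second case as a hit, which shows how much the measured accuracy depends on this choice.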
Only about 50% accuracy
WordNet
◦ Many intermediate/artificial levels
◦ Compulsory hyperonym
◦ Contains proper names
Dictionaries
◦ More word-senses
◦ Alternative definitions (synonymy, paraphrase, …)
Differences
◦ Different choice of hyperonym
◦ Different lexicon
Questions?