Concept Hierarchy Induction by Philipp Cimiano
![Page 1: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/1.jpg)
Concept Hierarchy Induction by Philipp Cimiano
![Page 2: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/2.jpg)
Objective
Structure information into categories
Provide a level of generalization to define relationships between data
Application: Backbone of any ontology
![Page 3: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/3.jpg)
Overview
Different approaches to acquiring concept hierarchies from a text corpus
Various clustering techniques
Evaluation
Related work
Conclusion
![Page 4: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/4.jpg)
Machine Readable Dictionaries
Entries: ‘a tiger is a mammal’, or ‘mammals such as tigers, lions or elephants’.
Exploit the regularity of dictionary entries.
The head of the first NP of the definition is taken as the hypernym.
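The head-of-first-NP heuristic can be sketched as below, assuming the definition text has already been separated from its headword. There is no parser here: the determiner and boundary-word lists are hypothetical stand-ins for a real NP chunker, so the sketch only handles simple definitions.

```python
import re

# Hypothetical word lists standing in for a real NP chunker.
DETERMINERS = {"a", "an", "the", "any", "some"}
NP_BOUNDARY = {"of", "that", "which", "who", "with", "in", "for", "from", "used", "having"}

def genus_term(definition):
    """Return the head of the first NP of a dictionary definition.

    Crude heuristic: skip leading determiners, read words up to the first
    boundary word or punctuation mark, and return the last word read.
    """
    words = re.findall(r"[a-z]+|[,;.()]", definition.lower())
    np = []
    for w in words:
        if not np and w in DETERMINERS:
            continue
        if w in NP_BOUNDARY or not w.isalpha():
            break
        np.append(w)
    return np[-1] if np else None
```

For example, `genus_term("A large wild animal of the cat family")` returns `"animal"`.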
![Page 5: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/5.jpg)
Example
![Page 6: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/6.jpg)
Exception
![Page 7: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/7.jpg)
Exception
is-a(corolla, part): NOT VALID
is-a(republican, member): NOT VALID
is-a(corolla, flower): NOT VALID
is-a(republican, political party): NOT VALID
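These failures come from definition heads such as ‘part’ or ‘member’ that do not name a hypernym. One simple treatment, sketched here under our own assumptions (the head list, the relation names and the regular expression are illustrative, not necessarily Alshawi's solution), is to detect such heads and emit a different relation whose target is the NP after ‘of’.

```python
import re

# Hypothetical list of heads that do not name a hypernym, mapped to the
# relation to emit instead; "is-a" means the NP after "of" is the real hypernym.
EMPTY_HEADS = {"part": "part-of", "member": "member-of",
               "kind": "is-a", "type": "is-a", "sort": "is-a"}

# "<det> <empty head> of <det> <NP>" up to a boundary word, punctuation or the end.
PATTERN = re.compile(
    r"^(?:(?:an?|the)\s+)?(part|member|kind|type|sort)\s+of\s+(?:(?:an?|the)\s+)?"
    r"([a-z ]+?)(?=\s+(?:that|which|who|with|in)\b|[,.;]|$)"
)

def empty_head_relation(term, definition):
    """Return (relation, term, np) if the definition starts with an empty head, else None."""
    match = PATTERN.match(definition.lower().strip())
    if match is None:
        return None
    head, np = match.groups()
    return (EMPTY_HEADS[head], term, np.strip())
```

With this, the corolla definition yields `("part-of", "corolla", "flower")` rather than is-a(corolla, part).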
![Page 8: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/8.jpg)
Exception
![Page 9: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/9.jpg)
Alshawi's solution
![Page 10: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/10.jpg)
Results using MRDs
Dolan et al.: 87% of the extracted hypernym relations are correct
Calzolari cites a precision of over 90%
Alshawi: precision of 77%
![Page 11: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/11.jpg)
Strengths And Weaknesses
Correct, explicit knowledge
Robust basis for ontology learning
Weakness: the extracted knowledge is domain-independent
![Page 12: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/12.jpg)
Lexico-Syntactic patterns
Task: automatically learn hyponym relations from corpora.
'Such injuries as bruises, wounds and broken bones'
hyponym (bruise, injury)
hyponym (wound, injury)
hyponym (broken bone, injury)
![Page 13: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/13.jpg)
Hearst patterns
'Such injuries as bruises, wounds and broken bones'
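A minimal regular-expression sketch of two Hearst patterns, ‘such NP as NP, …, and NP’ and ‘NP such as NP, …, or NP’. It assumes single-word hypernyms, naive plural stripping instead of lemmatization, and no NP chunking, so an enumeration simply runs to the next punctuation mark other than a comma.

```python
import re

def singularize(word):
    # Naive plural stripping (assumption); a real system would lemmatize.
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def normalize(np):
    # Lowercase and singularize the last word (the NP head).
    words = np.lower().split()
    words[-1] = singularize(words[-1])
    return " ".join(words)

def split_enumeration(text):
    # "bruises, wounds and broken bones" -> ["bruises", "wounds", "broken bones"]
    parts = re.split(r",?\s+(?:and|or)\s+|,\s*", text)
    return [p for p in parts if p.strip()]

# Two of Hearst's patterns; the hypernym NP is restricted to one word here.
PATTERNS = [
    re.compile(r"such (\w+) as ([\w ,]+)", re.IGNORECASE),   # such NP as NP, NP and NP
    re.compile(r"(\w+),? such as ([\w ,]+)", re.IGNORECASE),  # NP such as NP, NP or NP
]

def hearst_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs matched by the patterns above."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            hypernym = normalize(match.group(1))
            for hyponym in split_enumeration(match.group(2)):
                pairs.append((normalize(hyponym), hypernym))
    return pairs
```

Applied to the sentence above, this yields the three hyponym pairs (bruise, injury), (wound, injury) and (broken bone, injury).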
![Page 14: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/14.jpg)
Requirements
Occur frequently in many text genres.
Accurately indicate the relation of interest.
Be recognizable with little or no pre-encoded knowledge
![Page 15: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/15.jpg)
Strengths And Weaknesses
Patterns are easily identified and accurate
Weakness: the patterns appear rarely, and many is-a relations never appear in a Hearst-style pattern
![Page 16: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/16.jpg)
Distributional Similarity
‘You shall know a word by the company it keeps’ [Firth, 1957].
Semantic similarity of words is measured by the similarity of the contexts in which they occur.
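To make the idea concrete: each word is represented by counts of the contexts it occurs in, and similarity is the cosine of the angle between those vectors. The verb-object context features and counts below are hypothetical toy data, and cosine is only one of several possible similarity measures.

```python
from collections import Counter
from math import sqrt

# Hypothetical context counts: how often each term occurs as the object of a verb.
CONTEXTS = {
    "hotel": Counter({"book_obj": 10}),
    "apartment": Counter({"book_obj": 6, "rent_obj": 4}),
    "car": Counter({"book_obj": 2, "rent_obj": 7, "drive_obj": 9}),
    "bike": Counter({"book_obj": 1, "rent_obj": 5, "drive_obj": 3, "ride_obj": 8}),
}

def cosine(u, v):
    """Cosine of the angle between two sparse context vectors."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

Here `car` and `bike` (about 0.56) come out far more similar than `hotel` and `bike` (about 0.10).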
![Page 17: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/17.jpg)
![Page 18: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/18.jpg)
Using distributional similarity
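A sketch of bottom-up (agglomerative) clustering over such similarities, which yields a binary cluster tree. The average-linkage criterion and the similarity table in the usage example are assumptions made for illustration.

```python
from itertools import combinations

def agglomerative(terms, sim):
    """Bottom-up clustering with average linkage (an assumption here).

    sim(a, b) scores two terms; the result is a nested tuple (binary cluster tree).
    """
    clusters = [(t, (t,)) for t in terms]  # (tree, member terms)
    while len(clusters) > 1:
        best = None
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            # Average similarity over all cross-cluster term pairs.
            score = sum(sim(x, y) for x in a[1] for y in b[1]) / (len(a[1]) * len(b[1]))
            if best is None or score > best[0]:
                best = (score, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

For example, with `sim` scoring hotel/apartment at 0.8, car/bike at 0.7 and every other pair at 0.1, the tree is `(("hotel", "apartment"), ("car", "bike"))`.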
![Page 19: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/19.jpg)
Strengths And Weaknesses
Yields a reasonable concept hierarchy
Weaknesses: the cluster tree lacks a clear and formal interpretation; it does not provide any intensional description of concepts; similarities may be accidental (sparse data)
![Page 20: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/20.jpg)
Formal Concept Analysis (FCA)
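FCA starts from a formal context (objects × attributes) and derives formal concepts: pairs (extent, intent) where the intent is exactly the set of attributes shared by the extent, and the extent is exactly the set of objects having the intent. Ordering concepts by (A1, B1) ≤ (A2, B2) iff A1 ⊆ A2 gives the lattice that is read as the concept hierarchy. The context below is hypothetical toy data, and closing every subset of objects is exponential, so this brute-force sketch only suits tiny contexts.

```python
from itertools import combinations

# Hypothetical formal context: object -> set of attributes.
CONTEXT = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}
ATTRIBUTES = set().union(*CONTEXT.values())

def intent(objects):
    # Attributes shared by all given objects (all attributes for the empty set).
    return set.intersection(*(CONTEXT[o] for o in objects)) if objects else set(ATTRIBUTES)

def extent(attributes):
    # Objects that have every given attribute.
    return {o for o, attrs in CONTEXT.items() if attributes <= attrs}

def formal_concepts():
    """Brute force: close every subset of objects into an (extent, intent) pair."""
    concepts = set()
    objects = list(CONTEXT)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(set(subset))
            concepts.add((frozenset(extent(b)), frozenset(b)))
    return concepts
```

This context yields six concepts, e.g. ({car, bike}, {bookable, rentable, driveable}).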
![Page 21: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/21.jpg)
![Page 22: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/22.jpg)
FCA output
![Page 23: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/23.jpg)
Similarity measures
![Page 24: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/24.jpg)
Smoothing
![Page 25: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/25.jpg)
Evaluation
Semantic cotopy (SC).
Taxonomic overlap (TO)
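A simplified sketch of both measures, assuming taxonomies given as child → parent maps over shared concept labels. The semantic cotopy of a concept is the concept together with all its super- and subconcepts; TO averages the overlap of cotopies over the concepts of the first taxonomy. Scoring concepts missing from the second taxonomy as 0, and deriving precision and recall as TO(computed, reference) and TO(reference, computed), are our simplifying assumptions rather than the exact published definitions.

```python
def concepts(tax):
    # All concepts mentioned in a child -> parent map.
    return set(tax) | set(tax.values())

def ancestors(c, tax):
    # Walk up the child -> parent map (assumes the taxonomy is acyclic).
    found = set()
    while c in tax:
        c = tax[c]
        found.add(c)
    return found

def semantic_cotopy(c, tax):
    # c itself plus all of its super- and subconcepts.
    subs = {d for d in concepts(tax) if c in ancestors(d, tax)}
    return {c} | ancestors(c, tax) | subs

def taxonomic_overlap(t1, t2):
    # Mean cotopy overlap over the concepts of t1; concepts absent from t2 score 0
    # (simplification, assumption).
    scores = []
    for c in concepts(t1):
        if c not in concepts(t2):
            scores.append(0.0)
            continue
        a, b = semantic_cotopy(c, t1), semantic_cotopy(c, t2)
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores) if scores else 0.0

def precision_recall_f(computed, reference):
    # Assumed mapping of the asymmetric TO onto precision and recall.
    p = taxonomic_overlap(computed, reference)
    r = taxonomic_overlap(reference, computed)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```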
![Page 26: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/26.jpg)
Evaluation Measure
![Page 27: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/27.jpg)
100% Precision and Recall
![Page 28: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/28.jpg)
Low Recall
![Page 29: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/29.jpg)
Low Precision
![Page 30: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/30.jpg)
Results
![Page 31: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/31.jpg)
Results
![Page 32: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/32.jpg)
Results
![Page 33: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/33.jpg)
Results
![Page 34: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/34.jpg)
Strengths And Weaknesses
FCA generates formal concepts
Provides an intensional description of concepts
Weaknesses: the size of the lattice can become exponential in the size of the context; spurious clusters; finding appropriate labels for the clusters
![Page 35: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/35.jpg)
Problems with Unsupervised Approaches to Clustering
Data sparseness leads to spurious syntactic similarities
Produced clusters can’t be appropriately labeled
![Page 36: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/36.jpg)
Guided Clustering
Hypernyms are used directly to guide the clustering: WordNet, Hearst patterns
Agglomerative clustering
![Page 37: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/37.jpg)
![Page 38: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/38.jpg)
Similarity Computation
Ten most similar terms of the tourism reference taxonomy
![Page 39: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/39.jpg)
The Hypernym Oracle
Three sources: WordNet, Hearst patterns matched in a corpus, and Hearst patterns matched on the World Wide Web
Record the hypernyms and the amount of evidence found in support of each hypernym.
![Page 40: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/40.jpg)
WordNet
Collect the hypernyms found in any synset dominating a synset that contains the term t
Include the number of times each hypernym appears in a dominating synset
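The WordNet source can be sketched over a tiny hypothetical hypernym graph (a stand-in for WordNet, with made-up synset ids). With NLTK one would instead start from `wordnet.synsets(term)` and walk `Synset.hypernyms()`; the counting logic stays the same: every lemma in a dominating synset gets one unit of evidence per synset of the term that it dominates.

```python
from collections import Counter

# Hypothetical miniature WordNet: synset id -> (lemmas, hypernym synset ids).
MINI_WORDNET = {
    "s1": (["excursion", "jaunt"], ["s3"]),
    "s2": (["excursion", "outing"], ["s3"]),
    "s3": (["trip"], ["s4"]),
    "s4": (["journey", "travel"], []),
}

def wordnet_evidence(term):
    """Count how often each lemma appears in a synset dominating a synset of term."""
    evidence = Counter()
    for lemmas, hypernyms in MINI_WORDNET.values():
        if term not in lemmas:
            continue
        stack, seen = list(hypernyms), set()
        while stack:
            synset = stack.pop()
            if synset in seen:
                continue
            seen.add(synset)
            parent_lemmas, parents = MINI_WORDNET[synset]
            evidence.update(parent_lemmas)
            stack.extend(parents)
    return evidence
```

Because both senses of `excursion` lie under `trip`, every hypernym of `excursion` here receives an evidence count of 2.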
![Page 41: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/41.jpg)
Hearst Patterns (Corpus)
Record the number of isa-relations found between the two terms
![Page 42: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/42.jpg)
Hearst Patterns (WWW)
Download 100 Google abstracts for each concept and clue:
![Page 43: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/43.jpg)
Evidence
Total Evidence for Hypernyms:
•time: 4
•vacation: 2
•period: 2
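The oracle's total evidence is the sum of the per-source counts. The sketch below, with hypothetical counts for a term `hotel`, sums three sources and then picks the common hypernym of two terms with the greatest combined evidence (ties broken deterministically by name, an arbitrary choice).

```python
from collections import Counter

def total_evidence(*sources):
    """Sum hypernym evidence counts from several sources (e.g. WordNet, corpus, Web)."""
    total = Counter()
    for source in sources:
        total.update(source)
    return total

def best_common_hypernym(ev1, ev2):
    """Common hypernym of two terms with the greatest combined evidence, or None."""
    common = ev1.keys() & ev2.keys()
    return max(common, key=lambda h: (ev1[h] + ev2[h], h)) if common else None

# Hypothetical per-source evidence for "hotel".
hotel = total_evidence(Counter({"building": 1}),                      # WordNet
                       Counter({"accommodation": 3}),                 # corpus
                       Counter({"accommodation": 5, "building": 2}))  # Web
```

Here `hotel` ends up with `accommodation: 8` and `building: 3`.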
![Page 44: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/44.jpg)
Clustering Algorithm
1. Input a list of terms
2. Calculate the similarity between each pair of terms and sort from highest to lowest
3. For each potential pair to be clustered consult the oracle.
![Page 45: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/45.jpg)
Consulting the Oracle case 1
If term 1 is a hypernym of term 2 or vice versa: create the appropriate subconcept relationship.
![Page 46: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/46.jpg)
Consulting the Oracle case 2
Find the common hypernym h of both terms with the greatest evidence.
If one term has already been classified as t’, there are three cases:
t’ = h
h is a hypernym of t’
t’ is a hypernym of h
![Page 47: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/47.jpg)
Consulting the Oracle case 3
Neither term has been classified: each term becomes a subconcept of the common hypernym.
![Page 48: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/48.jpg)
Consulting the Oracle case 4
The terms do not share a common hypernym: set aside the terms for further processing.
![Page 49: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/49.jpg)
r-matches
For all unprocessed terms, check for r-matches (e.g. ‘credit card’ r-matches ‘international credit card’)
![Page 50: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/50.jpg)
Further Processing
If either term in a pair is already classified as t’, the other term is classified under t’ as well.
Otherwise place both terms under the hypernym of either term with the most evidence.
Any unclassified terms are added under the root concept.
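Putting the four cases and the further-processing steps together gives the sketch below. It is one reading of the slides, not the authors' implementation: the actions taken in case 2 (the slides list only its three conditions), the `min_sim` cut-off that keeps dissimilar pairs out of the loop, doing nothing when both terms are already classified, and the plain suffix test used for r-matches are all assumptions.

```python
from itertools import combinations

ROOT = "root"

def guided_clustering(terms, sim, oracle, min_sim=0.0):
    """Sketch of hypernym-guided clustering; returns a child -> parent map.

    sim(a, b) -> float and oracle(t) -> {hypernym: evidence} come from the caller.
    Pairs with sim <= min_sim are skipped; case 2 actions are assumptions.
    """
    parent, set_aside = {}, []
    pairs = [p for p in combinations(terms, 2) if sim(*p) > min_sim]
    for t1, t2 in sorted(pairs, key=lambda p: sim(*p), reverse=True):
        h1, h2 = oracle(t1), oracle(t2)
        if t1 in h2:                          # case 1: t1 is a hypernym of t2
            parent[t2] = t1
            continue
        if t2 in h1:                          # case 1: t2 is a hypernym of t1
            parent[t1] = t2
            continue
        common = h1.keys() & h2.keys()
        if not common:                        # case 4: set aside
            set_aside.append((t1, t2))
            continue
        h = max(common, key=lambda x: (h1[x] + h2[x], x))
        known = [t for t in (t1, t2) if t in parent]
        if not known:                         # case 3: both become subconcepts of h
            parent[t1] = parent[t2] = h
        elif len(known) == 1:                 # case 2: one term already under t'
            other = t2 if known[0] == t1 else t1
            t_prime = parent[known[0]]
            if t_prime == h:
                parent[other] = h
            elif h in oracle(t_prime):        # h is a hypernym of t'
                parent[other] = h
                parent.setdefault(t_prime, h)
            elif t_prime in oracle(h):        # t' is a hypernym of h
                parent[h] = t_prime
                parent[known[0]] = parent[other] = h
            else:
                parent[other] = t_prime
    for t1, t2 in set_aside:                  # further processing
        if t1 in parent and t2 in parent:
            continue
        if t2.endswith(" " + t1):             # r-match: t2 is a kind of t1
            parent.setdefault(t2, t1)
        elif t1.endswith(" " + t2):
            parent.setdefault(t1, t2)
        elif t1 in parent or t2 in parent:    # the other term joins the same t'
            known, other = (t1, t2) if t1 in parent else (t2, t1)
            parent[other] = parent[known]
        else:                                 # hypernym of either term with most evidence
            evidence = list(oracle(t1).items()) + list(oracle(t2).items())
            if evidence:
                h = max(evidence, key=lambda kv: (kv[1], kv[0]))[0]
                parent[t1] = parent[t2] = h
    for node in set(terms) | set(parent.values()):
        if node != ROOT and node not in parent:
            parent[node] = ROOT               # unclassified terms go under the root
    return parent
```

With toy inputs where hotel, guesthouse and apartment share the hypernym `accommodation`, car and bike share `vehicle`, and the credit-card pair has no oracle evidence, the r-match places `international credit card` under `credit card`, and `credit card`, `accommodation` and `vehicle` end up under `root`.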
![Page 51: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/51.jpg)
![Page 52: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/52.jpg)
Evaluation
Taxonomic overlap (TO), ignoring leaf nodes
Sibling overlap (SO): measures the quality of the clusters
![Page 53: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/53.jpg)
Evaluation
Tourism domain: Lonely Planet, Mecklenburg
Finance domain: Reuters-21578
![Page 54: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/54.jpg)
Tourism Results—TO
![Page 55: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/55.jpg)
Finance Results—TO
![Page 56: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/56.jpg)
Tourism Results—SO
![Page 57: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/57.jpg)
Finance Results—SO
![Page 58: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/58.jpg)
Human Evaluation
![Page 59: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/59.jpg)
Future Work
Take word sense into consideration for the WordNet source.
![Page 60: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/60.jpg)
Summary
Hypernym-guided agglomerative clustering works reasonably well
Better than the “Golden Standard”
Good results in the human evaluation
Provides labels for clusters
No spurious similarities
Faster than agglomerative clustering
![Page 61: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/61.jpg)
Learning from Heterogeneous Sources of Evidence
Many ways to learn concept hierarchies
Can we combine different paradigms?
Any manual attempt to combine strategies would be ad hoc
Use supervised learning to combine techniques
![Page 62: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/62.jpg)
Determining relationships with machine learning
Example: determine whether a pair of words stands in an “isa” relationship
![Page 63: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/63.jpg)
Feature 1: Matching patterns in a corpus
Given two terms t1 and t2, we record how many times a Hearst pattern indicating an isa-relation between t1 and t2 is matched in the corpus
Normalize by the maximum number of Hearst patterns found for t1
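Read literally, the normalization divides the match count for (t1, t2) by the largest count observed for t1 with any term. A sketch under that reading, with made-up counts:

```python
from collections import Counter

# Hypothetical Hearst-pattern match counts: (t1, t2) -> matches suggesting isa(t1, t2).
MATCHES = Counter({("conference", "event"): 4, ("conference", "meeting"): 2,
                   ("workshop", "event"): 1})

def isa_corpus(t1, t2, matches):
    """Matches for (t1, t2) divided by the largest match count of t1 with any term."""
    counts_for_t1 = [n for (a, _), n in matches.items() if a == t1]
    if not counts_for_t1 or max(counts_for_t1) == 0:
        return 0.0
    return matches[(t1, t2)] / max(counts_for_t1)
```

For example, `isa_corpus("conference", "meeting", MATCHES)` gives 0.5.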
![Page 64: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/64.jpg)
Example
This provided the best F-measure with a single-feature classifier
![Page 65: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/65.jpg)
Feature 2: Matching patterns on the Web
Use the Google API to count the matches of a certain expression on the Web
![Page 66: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/66.jpg)
Feature 3: Downloading webpages
Allows for matching expressions with a more complex linguistic structure
Assign functions to each of the Hearst patterns to be matched
Use these “clues” to decide which pages to download
Download 100 abstracts matching the query “such as conferences”
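The clue functions can be sketched as templates that turn a term into a query phrase. Only ‘such as conferences’ comes from the slide; the other four templates are further Hearst patterns chosen for illustration, pluralization is naive, and the download step itself is omitted.

```python
def plural(term):
    # Naive pluralization (assumption): just append "s".
    return term + "s"

# Hypothetical clue functions, one per Hearst pattern; each builds a query
# whose matches should contain a hypernym of the term.
CLUES = [
    lambda t: f"such as {plural(t)}",     # "<hypernym> such as conferences"
    lambda t: f"{plural(t)} and other",   # "conferences and other <hypernym>"
    lambda t: f"{plural(t)} or other",    # "conferences or other <hypernym>"
    lambda t: f"including {plural(t)}",   # "<hypernym> including conferences"
    lambda t: f"especially {plural(t)}",  # "<hypernym> especially conferences"
]

def queries_for(term):
    """Build one query phrase per clue function."""
    return [clue(term) for clue in CLUES]
```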
![Page 67: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/67.jpg)
Example
![Page 68: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/68.jpg)
Feature 4: WordNet – All senses
Is there a hypernym relationship between t1 and t2?
There can be more than one path from the synsets of t1 to the synsets of t2
![Page 69: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/69.jpg)
Feature 5: WordNet – First sense
Only consider the first sense of t1
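Features 4 and 5 differ only in which senses of t1 are searched. The sketch uses a tiny hypothetical sense inventory in which senses are listed in order, first sense first; with NLTK the senses would come from `wordnet.synsets(term)` and the upward walk from `Synset.hypernyms()`.

```python
# Hypothetical sense inventory (first sense listed first) and hypernym links.
SENSES = {
    "bank": ["bank#slope", "bank#institution"],
    "slope": ["slope#1"],
    "institution": ["institution#1"],
}
HYPERNYMS = {"bank#slope": ["slope#1"], "bank#institution": ["institution#1"]}

def dominated_by(sense, targets):
    """True if some path of hypernym links leads from sense to a sense in targets."""
    stack, seen = list(HYPERNYMS.get(sense, [])), set()
    while stack:
        s = stack.pop()
        if s in targets:
            return True
        if s not in seen:
            seen.add(s)
            stack.extend(HYPERNYMS.get(s, []))
    return False

def isa_wordnet_all(t1, t2):
    # Feature 4: any sense of t1 below any sense of t2.
    targets = set(SENSES.get(t2, []))
    return any(dominated_by(s, targets) for s in SENSES.get(t1, []))

def isa_wordnet_first(t1, t2):
    # Feature 5: only the first sense of t1.
    senses = SENSES.get(t1, [])
    return bool(senses) and dominated_by(senses[0], set(SENSES.get(t2, [])))
```

Here `isa_wordnet_all("bank", "institution")` is true, but `isa_wordnet_first("bank", "institution")` is false because the first sense of bank is the slope sense.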
![Page 70: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/70.jpg)
Feature 6: “Head” heuristic
If t1 r-matches t2, we derive the relation isa(t2, t1), e.g.
t1 = “conference”
t2 = “international conference”
isahead(“international conference”, “conference”)
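The r-match test and the resulting head feature can be sketched as a case-insensitive, word-level suffix check. The function names are hypothetical, and the argument order of `isa_head` mirrors isahead(t2, t1).

```python
def r_matches(t1, t2):
    """True if t1 matches the right end of the longer term t2 on word boundaries."""
    w1, w2 = t1.lower().split(), t2.lower().split()
    return 0 < len(w1) < len(w2) and w2[-len(w1):] == w1

def isa_head(sub, sup):
    """Feature value for isa(sub, sup): 1.0 if sup r-matches sub, else 0.0."""
    return 1.0 if r_matches(sup, sub) else 0.0
```

Matching on whole words keeps ‘redit card’ from matching ‘international credit card’.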
![Page 71: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/71.jpg)
Feature 7: Corpus-based subsumption
t1 is a subclass of t2 if all the syntactic contexts in which t1 appears are also shared by t2
![Page 72: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/72.jpg)
Feature 8: Document-based subsumption
t1 is a subclass of t2 if t2 appears in all documents in which t1 appears
Score = (# of pages where t1 and t2 occur) / (# of pages where t1 occurs)
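Both subsumption features reduce to the same ratio: the share of t1's items (syntactic contexts for feature 7, documents or pages for feature 8) that t2 also has, where 1.0 means full subsumption. Treating feature 7 as a graded ratio like feature 8 is an assumption, and the document ids below are made up.

```python
def subsumption(items_t1, items_t2):
    """Share of t1's items (contexts or documents) that t2 also has; 1.0 = full subsumption."""
    if not items_t1:
        return 0.0
    return len(items_t1 & items_t2) / len(items_t1)

# Hypothetical document ids in which each term occurs.
PAGES = {"conference": {1, 2, 3, 4}, "event": {1, 2, 3, 4, 5, 6}}
```

Here `subsumption(PAGES["conference"], PAGES["event"])` is 1.0, while the reverse direction gives 4/6.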
![Page 73: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/73.jpg)
Example
![Page 74: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/74.jpg)
Naïve Threshold Classifier
Used as a baseline
Classify an example as positive if the value of a given feature is above some threshold t
For each feature, the threshold was varied from 0 to 1 in steps of 0.01
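The baseline sweep can be sketched as follows: for one feature, try t = 0.00, 0.01, …, 1.00, label an example positive when its value is strictly above t, and keep the threshold with the best F-measure. The function names and the toy data are illustrative.

```python
def f_measure(predicted, gold):
    """F1 of boolean predictions against boolean gold labels."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(values, gold):
    """Try t = 0.00, 0.01, ..., 1.00 and keep the threshold with the highest F."""
    best_t, best_f = 0.0, -1.0
    for step in range(101):
        t = step / 100
        f = f_measure([v > t for v in values], gold)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```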
![Page 75: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/75.jpg)
Baseline Measures
![Page 76: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/76.jpg)
Evaluation
Classifiers: Naïve Bayes, Decision Tree, Perceptron, Multi-layer Perceptron
![Page 77: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/77.jpg)
Evaluation Strategies
Undersampling: remove a number of majority-class examples (non-isa examples)
Oversampling: add additional examples to the minority class
Varying the classification threshold: try threshold values other than 0.5
Introducing a cost matrix: different penalties for different types of misclassification
One-class SVMs: only consider positive examples
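The two resampling strategies can be sketched as below, assuming the isa examples are the minority class. Random duplication is only one way to add minority examples, and the fixed `seed` just makes the sketch reproducible.

```python
import random

def _split(examples, labels):
    # Separate positive (isa) and negative (non-isa) examples.
    pos = [e for e, y in zip(examples, labels) if y]
    neg = [e for e, y in zip(examples, labels) if not y]
    return pos, neg

def undersample(examples, labels, seed=0):
    """Randomly drop majority-class (non-isa) examples until the classes are balanced."""
    rng = random.Random(seed)
    pos, neg = _split(examples, labels)
    neg = rng.sample(neg, min(len(neg), len(pos)))
    return pos + neg, [True] * len(pos) + [False] * len(neg)

def oversample(examples, labels, seed=0):
    """Randomly duplicate minority-class (isa) examples until the classes are balanced."""
    rng = random.Random(seed)
    pos, neg = _split(examples, labels)
    if pos:
        pos = pos + [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    return pos + neg, [True] * len(pos) + [False] * len(neg)
```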
![Page 78: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/78.jpg)
Results
![Page 79: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/79.jpg)
Results (cont.)
![Page 80: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/80.jpg)
Discussion
The best results were achieved with the one-class SVM (F = 32.96%):
More than 10 points above the baseline classifier's average (F = 21.28%) and maximum (F = 21%) strategies
More than 14 points better than the best single-feature classifier (F = 18.84%), which uses the isawww feature
The second-best results were obtained with a multilayer perceptron using oversampling or undersampling
![Page 81: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/81.jpg)
Discussion
Gained insight by finding which features the classifiers used most
Used this information to modify features and rerun experiments
![Page 82: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/82.jpg)
Summary
Combining different approaches is useful
Machine learning approaches outperform naïve averaging
The unbalanced character of the dataset poses a problem
SVMs (which are not affected by the imbalance) produce the best results
This approach can show which features are the most reliable predictors
![Page 83: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/83.jpg)
Related Work
Taxonomy construction: lexico-syntactic patterns, clustering, linguistic approaches
Taxonomy refinement
Taxonomy extension
![Page 84: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/84.jpg)
Lexico-syntactic patterns
Hearst
Iwanska et al.: added extra patterns
Poesio et al.: anaphora resolution
Ahmad et al.: applied the patterns to specific domains
Etzioni et al.: patterns matched on the WWW
Cederberg and Widdows: precision improved with Latent Semantic Analysis
Others: learning patterns automatically
![Page 85: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/85.jpg)
Clustering
Hindle: groups nouns semantically; derives verb-subject and verb-object dependencies from a 6-million-word sample of Associated Press news stories
Pereira et al.: top-down soft clustering algorithm with deterministic annealing; words can appear in different clusters (multiple meanings of words)
Caraballo: bottom-up clustering approach to build a hierarchy of nouns; uses conjunctive and appositive constructions for nouns derived from the Wall Street Journal corpus
![Page 86: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/86.jpg)
Clustering (cont.)
The ASIUM system, the Mo'K workbench, Grefenstette, Gasperin et al., Reinberger et al., Lin et al., CobWeb, Crouch et al., Haav, Curran et al., Terascale Knowledge Acquisition
![Page 87: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/87.jpg)
Linguistic Approaches
Linguistic analysis is exploited more directly, rather than just for feature extraction
OntoLT: uses a shallow parser to label parts of speech and grammatical relations (e.g. HeadNounToClass-ModToSubClass, which maps a common noun to a concept or class)
OntoLearn: analyzes multi-word terms compositionally with respect to an existing semantic resource (WordNet)
Morin et al.: tackle the problem of projecting semantic relations between single terms to multi-word terms (e.g. projecting the isa-relation between apple and fruit to an isa-relation between apple juice and fruit juice)
![Page 88: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/88.jpg)
Linguistic Approaches
Sanchez and Moreno: download the first n hits for a search word and process the neighborhood linguistically to determine candidate modifiers for the search term
Sabou: induces concept hierarchies for the purpose of modeling web services (applies the methods not to full text but to the Java documentation of web services)
![Page 89: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/89.jpg)
Taxonomy Refinement
Hearst and Schütze; Widdows; Maedche, Pekar and Staab; Alfonseca et al.
![Page 90: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/90.jpg)
Taxonomy Extension
Agirre et al.; Faatz and Steinmetz; Turney
![Page 91: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/91.jpg)
Conclusions
Compared different hierarchical clustering approaches with respect to: effectiveness, speed, traceability
Set-theoretic approaches such as FCA can outperform similarity-based approaches.
![Page 92: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/92.jpg)
Conclusions
Presented an algorithm for clustering guided by a hypernym oracle.
More efficient than agglomerative clustering.
![Page 93: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/93.jpg)
Conclusions
Used machine learning techniques to effectively combine different approaches for learning taxonomic relations from text.
A learned model indeed outperforms all single approaches.
![Page 94: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/94.jpg)
Open Issues
Which similarity or weighting measure should be chosen?
Which features should be considered to represent a certain term?
Can features be aggregated to represent a term at a more abstract level?
How should we model the polysemy of terms?
Can we automatically induce lexico-syntactic patterns (unsupervised!)?
What other approaches are there for combining different paradigms, and how can we compare them?
![Page 95: Concept Hierarchy Induction by Philipp Cimiano](https://reader035.fdocuments.in/reader035/viewer/2022062309/568149cc550346895db6fb77/html5/thumbnails/95.jpg)
Questions