CONCEPT HIERARCHY INDUCTION

BY PHILIPP CIMIANO

PRESENTED BY JOSEPH PARK

Posted on 11-Jan-2016


CONCEPT HIERARCHIES

• Structure information into categories

• Provide a level of generalization

• Form the backbone of any ontology
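As a minimal sketch of the first two points, a concept hierarchy can be stored as is-a links from each concept to its parent, so that generalizing a concept means walking up those links. The concepts, the `PARENT` map and `generalizations` are hypothetical illustrations, not taken from the slides.

```python
# Hypothetical toy hierarchy: each concept maps to its direct parent (None = root).
PARENT = {
    "bruise": "injury",
    "wound": "injury",
    "injury": "condition",
    "condition": None,
}


def generalizations(concept):
    """Return every ancestor of `concept`, from most to least specific."""
    chain = []
    parent = PARENT.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain


print(generalizations("bruise"))  # ['injury', 'condition']
```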

COMMON APPROACHES

• Machine readable dictionaries

• Lexico-syntactic patterns

• Distributional similarity

• Co-occurrence analysis

MACHINE READABLE DICTIONARIES

• Exploit regularity of dictionaries

• Find a hypernym for the defined word

   • Head of the first NP (genus or kernel term)

• spring: "the season between winter and summer and in which leaves and flowers appear"

• hornbeam: "a type of tree with a hard wood, sometimes used in hedges"

• launch: "a large usu. motor-driven boat used for carrying people on rivers, lakes, harbors, etc."
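The genus-term heuristic can be approximated without a parser, as a rough sketch: drop a leading determiner and an empty head such as "type of", read the first NP up to a preposition or relative marker, and take its rightmost word as the head. The word lists and `genus_term` are hand-picked assumptions that happen to cover the three definitions above; a real system would use a POS tagger or NP chunker instead.

```python
DETERMINERS = {"a", "an", "the"}
# "a type of X": the genus is X, not "type" (assumed convention).
EMPTY_HEADS = {"type", "kind", "sort"}
# Words assumed to end the first NP ("used" marks a reduced relative clause).
NP_END = {"between", "with", "for", "of", "in", "on", "that", "which", "who", "used"}


def genus_term(definition):
    """Approximate the head of the first NP of a dictionary definition."""
    tokens = [t.strip(",;") for t in definition.lower().split()]
    if tokens and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    # Skip "type of", "kind of", ... and a determiner right after it.
    if len(tokens) > 2 and tokens[0] in EMPTY_HEADS and tokens[1] == "of":
        tokens = tokens[2:]
        if tokens and tokens[0] in DETERMINERS:
            tokens = tokens[1:]
    np = []
    for tok in tokens:
        if tok in NP_END:
            break
        np.append(tok)
    # In English NPs the head is usually the rightmost word.
    return np[-1] if np else ""


print(genus_term("a type of tree with a hard wood, sometimes used in hedges"))  # tree
```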

LEXICO-SYNTACTIC PATTERNS

• Hearst patterns

   • Hearst1: NP such as {NP,}* {(and | or)} NP

   • Hearst2: such NP as {NP,}* {(and | or)} NP

   • Hearst3: NP {,NP}* {,} or other NP

   • Hearst4: NP {,NP}* {,} and other NP

   • Hearst5: NP including {NP,}* NP {(and | or)} NP

   • Hearst6: NP especially {NP,}* {(and | or)} NP

• They should occur frequently and in many text genres

• They should accurately indicate the relation of interest

• They should be recognizable with little or no pre-encoded knowledge
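The six patterns can be transcribed into regular expressions, as a sketch. It assumes the text has already been NP-chunked and each NP written as `[...]` (a hypothetical input convention), and it does no lemmatization; `hearst_pairs` and the other names are illustrative.

```python
import re

NP = r"\[([^\]]+)\]"      # a pre-chunked NP, capturing its text
NP_ANY = r"\[[^\]]+\]"    # same, without a capture group (used inside lists)
# NP {, NP}* {,} {(and | or) NP}
NP_LIST = rf"{NP_ANY}(?:\s*,\s*{NP_ANY})*(?:\s*,?\s*(?:and|or)\s+{NP_ANY})?"

# name -> (regex, True if the hypernym comes before the enumeration)
PATTERNS = {
    "Hearst1": (rf"{NP}\s*,?\s*such\s+as\s+({NP_LIST})", True),
    "Hearst2": (rf"such\s+{NP}\s+as\s+({NP_LIST})", True),
    "Hearst3": (rf"({NP_LIST})\s*,?\s*or\s+other\s+{NP}", False),
    "Hearst4": (rf"({NP_LIST})\s*,?\s*and\s+other\s+{NP}", False),
    "Hearst5": (rf"{NP}\s*,?\s*including\s+({NP_LIST})", True),
    "Hearst6": (rf"{NP}\s*,?\s*especially\s+({NP_LIST})", True),
}


def hearst_pairs(text):
    """Return the set of (hyponym, hypernym) pairs matched by any pattern."""
    pairs = set()
    for regex, hypernym_first in PATTERNS.values():
        for m in re.finditer(regex, text, flags=re.IGNORECASE):
            if hypernym_first:
                hypernym, enumeration = m.group(1), m.group(2)
            else:
                enumeration, hypernym = m.group(1), m.group(2)
            for hyponym in re.findall(NP, enumeration):
                pairs.add((hyponym, hypernym))
    return pairs


text = "[Injuries] such as [bruises] , [wounds] and [broken bones] are common ."
print(sorted(hearst_pairs(text)))
# [('broken bones', 'Injuries'), ('bruises', 'Injuries'), ('wounds', 'Injuries')]
```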

EXAMPLE OF USING HEARST PATTERN

• 'Such injuries as bruises, wounds and broken bones...'

• hyponym(bruise, injury)

• hyponym(wound, injury)

• hyponym(broken bone, injury)
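The worked example can be reproduced on the raw sentence with the Hearst2 pattern alone, plus a naive singularizer (`-ies` to `-y`, drop a final `-s`) applied to each NP's head word. This is a sketch under two simplifying assumptions: the general NP is a single word, and the enumeration runs to the end of the sentence. `hearst2` and `singularize` are hypothetical helpers.

```python
import re


def singularize(np):
    """Naively lemmatize the head (last word) of an NP; illustrative only."""
    *modifiers, head = np.split()
    if head.endswith("ies"):
        head = head[:-3] + "y"
    elif head.endswith("s") and not head.endswith("ss"):
        head = head[:-1]
    return " ".join(modifiers + [head])


def hearst2(sentence):
    """Match 'such NP as NP, NP and NP' in a raw sentence (simplified)."""
    m = re.search(r"\bsuch\s+(\w+)\s+as\s+(.+)", sentence, flags=re.IGNORECASE)
    if m is None:
        return []
    hypernym = singularize(m.group(1).lower())
    enumeration = m.group(2).rstrip(". ")  # drop the trailing ellipsis/period
    items = re.split(r"\s*,\s*|\s+(?:and|or)\s+", enumeration)
    return [(singularize(item.lower()), hypernym) for item in items if item.strip()]


for hypo, hyper in hearst2("Such injuries as bruises, wounds and broken bones..."):
    print(f"hyponym({hypo}, {hyper})")
```

Run as is, this prints the three facts listed on the slide.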

DISTRIBUTIONAL SIMILARITY

• Distributional hypothesis

   • Words are similar to the extent they share the same context

   • ‘You shall know a word by the company it keeps’ – Firth
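One common way to operationalize the distributional hypothesis is to represent each word by counts of the words around it and compare those vectors with the cosine measure. This sketch assumes a window of two words on each side, raw counts (real systems usually weight them, e.g. with PMI, or use syntactic contexts) and a made-up three-sentence corpus.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            counts = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


corpus = [  # hypothetical toy corpus
    "you can book a hotel online",
    "you can book an apartment online",
    "you can drive a car",
]
vecs = context_vectors(corpus)
print(round(cosine(vecs["hotel"], vecs["apartment"]), 2))  # 0.67
print(round(cosine(vecs["hotel"], vecs["car"]), 2))        # 0.41
```

"hotel" and "apartment" share the contexts "book" and "online", so they come out more similar than "hotel" and "car".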

EXAMPLE

CO-OCCURRENCE ANALYSIS

• Collocation

• Document-based subsumption

   • A term t1 is more special than a term t2 if t2 also appears in all the documents in which t1 appears
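The subsumption rule reduces to a subset test on the sets of documents each term occurs in. In this sketch documents are whitespace-tokenized strings, and it adds one assumption not on the slide: t2 must occur in strictly more documents, so that terms with identical document sets are not paired in both directions. In practice the "all documents" condition is often relaxed to a high conditional-probability threshold.

```python
def document_subsumption(documents):
    """Return (t1, t2) pairs where t1 is more special than t2.

    Every document containing t1 also contains t2, and (assumption)
    t2 occurs in strictly more documents than t1.
    """
    occurs_in = {}
    for idx, doc in enumerate(documents):
        for term in set(doc.lower().split()):
            occurs_in.setdefault(term, set()).add(idx)
    return {
        (t1, t2)
        for t1, docs1 in occurs_in.items()
        for t2, docs2 in occurs_in.items()
        if docs1 < docs2  # proper subset of documents
    }


docs = ["tourism hotel", "tourism hotel spa", "tourism museum"]  # toy documents
print(sorted(document_subsumption(docs)))
# [('hotel', 'tourism'), ('museum', 'tourism'), ('spa', 'hotel'), ('spa', 'tourism')]
```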

THREE MORE APPROACHES

• Formal Concept Analysis (FCA)

• Guided Clustering

• Learning from heterogeneous sources of evidence

FORMAL CONCEPT ANALYSIS

• Set-theoretical approach

• Parse corpus (extract dependencies)

   • Verb-pp-complement

   • Verb-object

   • Verb-subject

• Extract surface dependencies (section 4.1.4)
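The set-theoretical core can be sketched as follows: nouns become objects, the verbs they occur with (plus the syntactic role) become attributes, and the formal concepts of that context, ordered by extent inclusion, form the hierarchy. The dependency triples and the `verb_role` attribute encoding are hypothetical, and the concepts are computed naively by closing the object intents under intersection (exponential in the worst case). This is an illustration, not Cimiano's own pseudocode.

```python
def build_context(triples):
    """Formal context: noun -> set of attributes, one per (verb, role)."""
    context = {}
    for verb, role, noun in triples:
        context.setdefault(noun, set()).add(f"{verb}_{role}")
    return context


def formal_concepts(context):
    """All (extent, intent) pairs: closed intents are the full attribute set
    plus every intersection of object intents."""
    intents = {frozenset().union(*context.values())}
    for attrs in context.values():
        intents |= {intent & attrs for intent in intents}
    return [
        (frozenset(obj for obj, attrs in context.items() if intent <= attrs), intent)
        for intent in intents
    ]


def subconcept_links(concepts):
    """Direct (sub-extent, super-extent) links with no concept strictly between."""
    extents = [extent for extent, _ in concepts]
    return [
        (e1, e2)
        for e1 in extents
        for e2 in extents
        if e1 < e2 and not any(e1 < e3 < e2 for e3 in extents)
    ]


TRIPLES = [  # hypothetical verb-object dependencies from a parsed corpus
    ("book", "obj", "hotel"),
    ("book", "obj", "apartment"), ("rent", "obj", "apartment"),
    ("book", "obj", "car"), ("rent", "obj", "car"), ("drive", "obj", "car"),
    ("book", "obj", "excursion"), ("join", "obj", "excursion"),
]
concepts = formal_concepts(build_context(TRIPLES))
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

With this toy context, "car" ends up below "apartment, car" (rentable things), which in turn sits below the top concept of bookable things.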

PSEUDOCODE

EXAMPLE

RESULTS

GUIDED CLUSTERING

• Uses hypernyms from WordNet and Hearst patterns
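One way to picture the guidance is a hypernym "oracle" consulted when bottom-up clustering merges the two most similar terms: it decides whether one term is a hypernym of the other, whether both share a hypernym that can label the new cluster, or whether the cluster stays unlabeled. This is a simplified sketch of that decision step only; the oracle contents, the similarity scores and `guided_label` are hypothetical, and a hand-made dict stands in for WordNet and Hearst-pattern lookups.

```python
def guided_label(t1, t2, hypernyms):
    """Link two terms being merged, using a hypernym oracle.

    `hypernyms` maps a term to its known hypernyms. Returns (child, parent)
    is-a edges; parent None means an unlabeled cluster.
    """
    h1, h2 = hypernyms.get(t1, set()), hypernyms.get(t2, set())
    if t2 in h1:  # t1 is-a t2
        return [(t1, t2)]
    if t1 in h2:  # t2 is-a t1
        return [(t2, t1)]
    common = h1 & h2
    if common:  # both go under a shared hypernym
        parent = sorted(common)[0]  # arbitrary but deterministic choice
        return [(t1, parent), (t2, parent)]
    return [(t1, None), (t2, None)]  # no evidence: unlabeled cluster


ORACLE = {  # hypothetical oracle entries
    "hotel": {"accommodation", "building"},
    "apartment": {"accommodation"},
    "car": {"vehicle"},
    "bike": {"vehicle"},
}
# Hypothetical candidate pairs, most similar first.
for a, b, _score in [("hotel", "apartment", 0.8), ("car", "bike", 0.7)]:
    print(guided_label(a, b, ORACLE))
```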

EXAMPLE

RESULTS

MORE RESULTS

HETEROGENEOUS SOURCES OF EVIDENCE

• Naïve threshold classifier

• Uses Hearst patterns for corpus patterns

• Uses Google API for web patterns

• Uses Hearst patterns over downloaded pages

• Uses WordNet senses

• Uses ‘head’-heuristic (r-match)

• Uses corpus-based subsumption

• Uses document-based subsumption
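A naive threshold classifier can be sketched per evidence source: each source yields a score for a candidate pair (t1, t2), and the classifier predicts is-a(t1, t2) for that source when the score reaches a threshold. The head heuristic is implemented under an assumed reading of "r-match" (t2 is the rightmost part of the compound t1). All source names, scores and thresholds are hypothetical; in practice the thresholds would be tuned on held-out data.

```python
def head_match(t1, t2):
    """Head heuristic (assumed reading of 'r-match'): t1 is-a t2 when t2 is
    the rightmost part of the compound t1, e.g. 'broken bone' is-a 'bone'."""
    return t1 != t2 and t1.endswith(" " + t2)


def threshold_classify(scores, thresholds):
    """One naive threshold classifier per evidence source: predict is-a for
    that source iff its score reaches the source's threshold."""
    return {source: scores.get(source, 0.0) >= t for source, t in thresholds.items()}


# Hypothetical evidence scores for the candidate pair ("broken bone", "bone").
scores = {
    "hearst_corpus": 2,       # Hearst-pattern matches in the corpus
    "hearst_web": 35,         # hits for Hearst-pattern queries on the web
    "wordnet": 0.0,           # 1.0 if some WordNet sense is a hypernym
    "head": float(head_match("broken bone", "bone")),
    "doc_subsumption": 0.6,   # share of t1's documents that also contain t2
}
thresholds = {"hearst_corpus": 1, "hearst_web": 10, "wordnet": 0.5,
              "head": 0.5, "doc_subsumption": 0.8}
print(threshold_classify(scores, thresholds))
```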

RESULTS

MORE RESULTS