Page 1:

Unsupervised Ontology Induction From Text

Hoifung Poon
Dept. Computer Science & Eng.

University of Washington

(Joint work with Pedro Domingos)

Page 2:

Extracting Knowledge From Text

……

Page 3:
Page 4:
Page 5:

Extracting Knowledge From Text

……

Page 6:

Extracting Knowledge From Text

Wanted: Automatic, end-to-end solution
Manual engineering: Costly and limited
Supervised learning
  Bottleneck: Labeled examples
  Infeasible for large-scale, open-domain knowledge extraction

Page 7:

Unsupervised Learning for Knowledge Extraction

TextRunner [Banko et al. 2007]
  State-of-the-art open information extraction
  Only extracts triples
  Extractions are largely unstructured and noisy
USP [Poon & Domingos 2009]
  Forms complete, detailed meaning representations
  More robust to noise
  Still limited to extractions with substantial evidence
  Lacks ontological structures

Page 8:

Why Ontology?

Compact representation and efficient reasoning [Staab & Studer 2004]
Better generalization

Q: What does IL-2 regulate?

A: The DEX-mediated IkappaBalpha induction

Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.

[Diagram: INHIBIT ISA REGULATE]
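The generalization shown on this slide can be sketched in a few lines of Python. This is only an illustration, not the OntoUSP implementation: the `ISA` table, the `FACTS` list, and the helper names `is_a` and `answer` are hypothetical, and the single fact is hand-written from the example sentence rather than extracted.

```python
# Hypothetical toy ontology: child relation -> parent relation (ISA links).
ISA = {"INHIBIT": "REGULATE"}

# Hypothetical extracted facts as (relation, agent, theme), written by hand
# from the example sentence on the slide.
FACTS = [
    ("INHIBIT", "IL-2", "the DEX-mediated IkappaBalpha induction"),
]


def is_a(relation, ancestor):
    """Return True if `relation` equals `ancestor` or reaches it via ISA links."""
    while relation is not None:
        if relation == ancestor:
            return True
        relation = ISA.get(relation)
    return False


def answer(query_relation, agent):
    """Return the themes of all facts whose relation ISA the queried relation."""
    return [theme for rel, who, theme in FACTS
            if who == agent and is_a(rel, query_relation)]


print(answer("REGULATE", "IL-2"))
```

Without the ISA link, a question phrased with "regulate" would find no match for a sentence that only says "inhibited".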

Page 9:

Ontology Learning

Attracts increasing interest [Snow et al. 2006, Cimiano 2006, Suchanek et al. 2008, Wu & Weld 2008]
Induction: Construct an ontology
Population: Map textual expressions to concepts and relations in the ontology
Limitations in existing approaches
  Require heuristic patterns or existing KBs
  Pursue each task in isolation

Knowledge representation NLP

Page 10:

This Talk: OntoUSP

Jointly conducts: Ontology induction, population, and knowledge extraction
Learns ISA hierarchy over logical expressions
Populates it by translating sentences into logical forms
Extends USP with hierarchical clustering
Hierarchical smoothing
Encoded in a few high-order formulas in Markov Logic [Richardson & Domingos, 2006]
Sole input is dependency trees
Five times as many correct answers as TextRunner
Improves on the recall of USP by 47%

Page 11:

Outline

Background: USP
Unsupervised ontology induction
Conclusion

Page 12:

Semantic Parsing

Sentence: IL-4 protein induces CD11b

Dependency tree: induces -nsubj-> protein, induces -dobj-> CD11b, protein -nn-> IL-4

Clusters: INDUCE, with properties INDUCER (IL-4) and INDUCED (CD11B)

Logical form: INDUCE(e1) ^ INDUCER(e1,e2) ^ INDUCED(e1,e3) ^ IL-4(e2) ^ CD11B(e3)

Structured prediction: Partition + Assignment
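As a rough illustration of "Partition + Assignment", the sketch below builds the logical form for the example sentence from a hand-written partition of the dependency tree and a hand-written assignment to clusters. The variable and function names (`EDGES`, `PARTS`, `PART_CLUSTER`, `EDGE_PROPERTY`, `logical_form`) are hypothetical; in USP both the partition and the assignment are predicted, not given.

```python
# Dependency edges for "IL-4 protein induces CD11b": (head, label, dependent).
EDGES = [("induces", "nsubj", "protein"),
         ("induces", "dobj", "CD11b"),
         ("protein", "nn", "IL-4")]

# Partition (hand-written here): group tree nodes into parts.
PARTS = {"e1": ("induces",), "e2": ("protein", "IL-4"), "e3": ("CD11b",)}

# Assignment (hand-written here): each part gets a cluster, and each
# dependency that crosses two parts gets a property cluster.
PART_CLUSTER = {"e1": "INDUCE", "e2": "IL-4", "e3": "CD11B"}
EDGE_PROPERTY = {"nsubj": "INDUCER", "dobj": "INDUCED"}


def logical_form(edges, parts, part_cluster, edge_property):
    """Build a conjunction of cluster atoms and property atoms."""
    node_to_part = {node: p for p, nodes in parts.items() for node in nodes}
    unary = [f"{part_cluster[p]}({p})" for p in sorted(parts)]
    binary = []
    for head, label, dep in edges:
        h, d = node_to_part[head], node_to_part[dep]
        if h != d:  # edges inside one part (here: nn) produce no atom
            binary.append(f"{edge_property[label]}({h},{d})")
    return " ^ ".join(unary + binary)


print(logical_form(EDGES, PARTS, PART_CLUSTER, EDGE_PROPERTY))
```

The `nn` edge disappears because "protein" and "IL-4" fall into the same part, which is how the two-word expression ends up as the single atom IL-4(e2).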

Page 13:

Challenge: Same Meaning, Many Variations

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is induced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

……

Page 14:

Unsupervised Semantic Parsing

USP: Recursively cluster arbitrary expressions composed with / by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

Page 15:

Unsupervised Semantic Parsing

USP: Recursively cluster arbitrary expressions composed with / by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

Cluster same forms at the atom level

Page 16:

Unsupervised Semantic Parsing

USP: Recursively cluster arbitrary expressions composed with / by similar expressions

IL-4 induces CD11b

Protein IL-4 enhances the expression of CD11b

CD11b expression is enhanced by IL-4 protein

The cytokine interleukin-4 induces CD11b expression

IL-4’s up-regulation of CD11b, …

Cluster forms in composition with same forms
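The recursive idea (forms that compose with the same clusters get clustered together, which in turn makes more contexts match) can be sketched with a deterministic toy. This is a deliberate simplification and not USP itself: USP scores candidate clusterings probabilistically, whereas the sketch merges whenever two triples differ in exactly one slot. The triples are hypothetical simplifications of the slide's sentences, and `find` and `cluster` are illustrative names.

```python
from itertools import combinations


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster(triples):
    """Merge forms that appear in the same slot with otherwise identical clusters."""
    parent = {form: form for triple in triples for form in triple}
    changed = True
    while changed:  # repeat: earlier merges can make new contexts match
        changed = False
        for a, b in combinations(triples, 2):
            ca = [find(parent, f) for f in a]
            cb = [find(parent, f) for f in b]
            diff = [i for i in range(3) if ca[i] != cb[i]]
            if len(diff) == 1:  # same context except for one slot
                parent[ca[diff[0]]] = cb[diff[0]]
                changed = True
    groups = {}
    for form in parent:
        groups.setdefault(find(parent, form), set()).add(form)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical (relation, agent, theme) triples, simplified by hand.
TRIPLES = [("induces", "IL-4", "CD11b"),
           ("enhances", "IL-4", "CD11b"),
           ("induces", "interleukin-4", "CD11b"),
           ("up-regulates", "interleukin-4", "CD11b")]

print(cluster(TRIPLES))
```

Here "up-regulates" only joins the other verbs because "IL-4" and "interleukin-4" were clustered first, which is the recursive effect the slide describes.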

Page 20:

Probabilistic Model for USP

Joint probability distribution over input dependency trees and their semantic parses
Use Markov logic
A Markov Logic Network (MLN) is a set of pairs (Fi, wi) where
  Fi is a formula in higher-order logic
  wi is a real number

$$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i N_i(x) \right)$$

N_i(x): Number of true groundings of Fi
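To make the formula concrete, here is a minimal sketch that enumerates a tiny set of worlds and normalizes exp(sum_i w_i N_i(x)) by Z. Everything specific in it is an assumption for illustration: the single ground formula, its weight of 2.0, the two Boolean variables that make up a world, and the function names.

```python
import math
from itertools import product


def n_conj(world):
    """N(x) for one hypothetical ground formula:
    InClust(e, INDUCE) ^ HasValue(e, induces)."""
    in_clust, has_value = world
    return 1 if (in_clust and has_value) else 0


# (weight, counting function) pairs; the weight 2.0 is arbitrary.
FORMULAS = [(2.0, n_conj)]

# A world assigns truth values to the two ground atoms.
WORLDS = list(product([False, True], repeat=2))


def mln_distribution(worlds, formulas):
    """Return P(x) = exp(sum_i w_i * N_i(x)) / Z for every world x."""
    scores = [math.exp(sum(w * n(x) for w, n in formulas)) for x in worlds]
    z = sum(scores)  # partition function Z
    return {x: s / z for x, s in zip(worlds, scores)}


P = mln_distribution(WORLDS, FORMULAS)
for world, prob in P.items():
    print(world, round(prob, 3))
```

Brute-force enumeration is only feasible for toy cases like this one; it is used here purely to show what each symbol in the formula stands for.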

Page 21:

Unsupervised Semantic Parsing

Exponential prior on number of parameters
Cluster mixtures: InClust(e,+c) ^ HasValue(e,+v)

Object/Event Cluster: INDUCE
  induces 0.1
  enhances 0.4
Property Cluster: INDUCER
  nsubj 0.5, agent 0.4, …
  IL-4 0.2, IL-8 0.1
  None 0.1, One 0.8

Page 22:

Inference: Hill-Climb Probability

Initialize
Search Operator: Lambda reduction

[Figure: dependency tree for "IL-4 protein induces CD11B" (edges nsubj, dobj, nn) with unassigned nodes marked "?"; the search operator is illustrated on the "protein -nn-> IL-4" subtree]

Page 23:

Learning: Hill-Climb Likelihood

Initialize: enhances 1 | induces 1 | protein 1 | IL-4 1 | …
Search Operator
  MERGE: (induces 1), (enhances 1) -> (induces 0.2, enhances 0.8)
  COMPOSE: (protein 1), (IL-4 1) -> (IL-4 protein 1)
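The two search operators can be sketched on count-based clusters. This is a toy under stated assumptions: clusters are represented as `Counter` objects of form counts, the counts 1 and 4 are made up so that the merged distribution reproduces the 0.2 / 0.8 shown on the slide, and the function names are hypothetical. The likelihood test that decides whether to accept an operation is omitted.

```python
from collections import Counter


def merge(cluster_a, cluster_b):
    """MERGE: pool the form counts of two clusters into one cluster."""
    return cluster_a + cluster_b


def compose(head_form, dependent_form, count):
    """COMPOSE: treat a head and its dependent as one new composite form."""
    return Counter({f"{dependent_form} {head_form}": count})


def distribution(cluster):
    """Turn form counts into the cluster's multinomial over forms."""
    total = sum(cluster.values())
    return {form: n / total for form, n in cluster.items()}


# Initialization: every form starts in its own cluster (probability 1).
induces = Counter({"induces": 1})    # hypothetical count
enhances = Counter({"enhances": 4})  # hypothetical count

merged = merge(induces, enhances)
il4_protein = compose("protein", "IL-4", 1)

print(distribution(merged))
print(distribution(il4_protein))
```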

Page 24:

Outline

Background: USP
Unsupervised ontology induction
Conclusion

Page 25:

OntoUSP

USP + Hierarchical clustering + Shrinkage
Modify the cluster mixture formula: InClust(e,c) ^ ISA(c,+d) ^ HasValue(e,+v)
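One way to read the modified formula is that a weight is learned per (d, v) pair, so the score of value v for an entity in cluster c sums the weights of c and all of its ISA ancestors d. The sketch below illustrates that reading only. It assumes ISA is reflexive and transitive, and the hierarchy, the weights, and the function names are all hypothetical.

```python
import math

# Hypothetical hierarchy: cluster -> parent cluster.
PARENT = {"INDUCE": "REGULATE", "INHIBIT": "REGULATE"}

# Hypothetical weights, one per (cluster d, value v) pair.
WEIGHTS = {
    ("REGULATE", "induces"): 0.5,
    ("REGULATE", "inhibits"): 0.5,
    ("INDUCE", "induces"): 1.0,
    ("INHIBIT", "inhibits"): 1.0,
}


def ancestors(cluster):
    """Return the cluster itself followed by its ISA ancestors."""
    chain = []
    while cluster is not None:
        chain.append(cluster)
        cluster = PARENT.get(cluster)
    return chain


def score(cluster, value):
    """Sum the weights of every formula grounding with ISA(cluster, d)."""
    return sum(WEIGHTS.get((d, value), 0.0) for d in ancestors(cluster))


def value_distribution(cluster, values):
    """Normalize exp(score) over a fixed list of candidate values."""
    exps = {v: math.exp(score(cluster, v)) for v in values}
    z = sum(exps.values())
    return {v: e / z for v, e in exps.items()}


print(score("INDUCE", "induces"), score("INDUCE", "inhibits"))
print(value_distribution("INDUCE", ["induces", "inhibits"]))
```

In this toy, INDUCE still gives some score to "inhibits" through its parent, which is the smoothing effect that shrinkage is meant to provide for sparsely observed clusters.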

Page 26:

New Operator: Abstraction

INDUCE: induces 0.6, up-regulates 0.2
INHIBIT: inhibits 0.4, suppresses 0.2
Abstract cluster (ISA parent of INDUCE and INHIBIT): induces 0.3, enhances 0.1, inhibits 0.2, suppresses 0.1
MERGE with REGULATE?
Captures substantial similarities
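The difference between abstraction and a flat merge is that the child clusters survive and are linked to a new parent. A minimal sketch, assuming clusters are stored as form counts and ISA links as a child-to-parent dictionary; the counts, the parent name REGULATE, and the `abstract` function are hypothetical, and the scoring that decides whether to apply the operator is left out.

```python
from collections import Counter


def abstract(clusters, isa, child_a, child_b, parent):
    """Create `parent` with pooled counts and link both children to it.

    Unlike a flat merge, the two child clusters are kept unchanged.
    """
    new_clusters = dict(clusters)
    new_clusters[parent] = clusters[child_a] + clusters[child_b]
    new_isa = dict(isa)
    new_isa[child_a] = parent
    new_isa[child_b] = parent
    return new_clusters, new_isa


# Hypothetical form counts for the two clusters on the slide.
clusters = {
    "INDUCE": Counter({"induces": 6, "up-regulates": 2}),
    "INHIBIT": Counter({"inhibits": 4, "suppresses": 2}),
}

clusters2, isa2 = abstract(clusters, {}, "INDUCE", "INHIBIT", "REGULATE")
print(isa2)
print(clusters2["REGULATE"])
```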

Page 27:

Experiments

Evaluate on an end task: Question answering
Applied OntoUSP to extract knowledge from text and answer questions
Evaluation: Number of answers and accuracy
GENIA dataset: 1999 PubMed abstracts
Questions
  Use simple questions in this paper, e.g.:
    What does anti-STAT1 inhibit?
    What regulates MIP-1 alpha?
  Sample 2000 questions according to frequency

Page 28:

Total vs. Correct Answers

[Bar chart: total vs. correct answers (axis 0 to 500) for KW-SYN, TextRunner, RESOLVER, DIRT, USP, OntoUSP]

Improves recall over USP by 47%
Five times as many correct answers as TextRunner
Highest accuracy of 91%

Page 29:

Induced Ontology (Partial)

Page 30:

Question-Answer: Example

Q: What does IL-2 control?
A: The DEX-mediated IkappaBalpha induction
Sentence:

Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.

Page 31:

Conclusion

OntoUSP: Unsupervised ontology induction
USP + hierarchical clustering / smoothing
Jointly conducts ontology induction, population, and knowledge extraction
See you at poster 46