Niket Tandon - Max Planck Society (people.mpi-inf.mpg.de/.../defense-niket-tandon.pdf)

Post on 26-Jul-2020


Commonsense Knowledge Acquisition and Applications

Niket Tandon

Ph.D. Supervisor: Gerhard Weikum

Max Planck Institute for Informatics

Towards Commonsense Enriched Machines

Humans
• Rock: hard, brown (property)
• Person: hand, leg (part of)
• Climber is a Person
• Climbing a rock: Adventurous Activity (scene)

Machines
• 1 Rock
• 2 Hands
• 2 Legs
• 1 Person

Human-Machine Knowledge Gap
• Commonsense of objects
• Commonsense of relationships
• Commonsense of interactions

How will machines become smarter if we fill this knowledge gap?

• Smarter Robots: "Get me a coffee" (where?)
• Smarter Vision: better classifiers. Monitor or TV, given mouse and keyboard?
• Smarter IR: "Adventurous activities"

Encyclopedic Knowledge: facts about instances/events
• Facts about instances: (A. Honnold, married, Lisa Honnold)
• Their events: (A. Honnold, married on, 19.08.2016)

Commonsense Knowledge: facts about classes/activities

Can we fill the human-machine knowledge gap using existing encyclopedic KBs like Freebase?

Encyclopedic Knowledge: facts about instances
1. EKB acquisition: unimodal
2. EKB curation: textual verification
3. EKB completion: negative training assumptions hold.
   If $(e_i, r_k, e_j)$ holds, then $(e_i, r_k, e_{j'})$ with $e_{j'} \neq e_j$ is a negative example.
   Example: (A. Honnold, bornIn, US) holds, so (A. Honnold, bornIn, UK) is negative.

Commonsense Knowledge: facts about classes
1. CKB acquisition: multimodal
2. CKB curation: textual + visual
3. CKB completion: negative training assumptions fail.
   Example: (climber, at location, {mountain, university})
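The negative-training assumption on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `closed_world_negatives`, the candidate lists, and the set of functional relations are all hypothetical and not taken from the talk.

```python
def closed_world_negatives(facts, candidates, functional_relations):
    """Generate negatives (e_i, r_k, e_j') with e_j' != e_j.

    Only relations listed as functional are treated this way; for any
    other relation no negative can be inferred from a known fact.
    """
    known = set(facts)
    negatives = []
    for subj, rel, obj in facts:
        if rel not in functional_relations:
            continue  # e.g. a climber can be at many locations
        for other in candidates.get(rel, []):
            triple = (subj, rel, other)
            if other != obj and triple not in known:
                negatives.append(triple)
    return negatives


# Toy data (hypothetical): one encyclopedic fact, one commonsense fact.
facts = [
    ("A. Honnold", "bornIn", "US"),
    ("climber", "atLocation", "mountain"),
]
candidates = {
    "bornIn": ["US", "UK"],
    "atLocation": ["mountain", "university"],
}
negatives = closed_world_negatives(facts, candidates, functional_relations={"bornIn"})
print(negatives)
```

Because `atLocation` is not functional, the sketch produces no negative for the climber: (climber, atLocation, university) may well be true, which is why this assumption breaks down for commonsense KBs.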

EKBs have several functional relations, hence the assumption holds.

[Figure: bar chart of functional vs. non-functional relations in EKB and CKB, on a scale from 0 to 1]

Commonsense knowledge acquisition is different and harder

Humans hardly express the obvious: Scarce & Implicit

Spread across multiple modalities: Multimodal

Unusual reported more than usual: Reporting Bias

Culture specific, Location specific: Contextual


KBs possessing commonsense knowledge

Need: automatically constructed, semantically organized Commonsense KB

KB | Supervision | Pros | Cons
Cyc | manually curated | accuracy | cost, coverage
ConceptNet | semi-automated | coverage | accuracy, less organized
Tandon et al. AAAI’11 | bootstrapped using ConceptNet | coverage | noise, less organized
Desiderata | minimal supervision | organized, high accuracy > 80%, high coverage > 10M | ---

Need: robust techniques to automatically construct semantically organized Commonsense KB

Three research questions: investigate robust techniques to acquire:

RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.

RQ 2. Commonsense of relationships between objects: part-whole relation, comparative relation, …

RQ 3. Commonsense of interactions between objects: activities and their semantic attributes.

Research question 1

RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.

Previous work:
• lump together these properties
• do not distinguish the meanings of the words
• have low coverage

Input: large text corpus, containing e.g. "summit is crisp"

Output: triples $\langle w_{1n}^{s},\ r,\ w_{2a}^{s} \rangle$, e.g. $\langle \mathit{summit}_n^2,\ \mathit{hasTemperature},\ \mathit{crisp}_a^3 \rangle$
• $w_{1n}^{s}$: disambiguated noun (n)
• $r \in R$: fine-grained relations such as hasAppearance, hasSound, hasTaste, hasTemperature, evokesEmotion
• $w_{2a}^{s}$: disambiguated adjective (a)

Our approach

1. Extract generic hasProperty triples over input.
   Patterns: <noun> verb [adv] <adj> and <adj> <noun>, e.g. "summit is crisp".
   Example pairs: (summit, crisp), (mountain, cold), (chili, hot)
2. Disambiguate args and classify triple.
   Typically requires training data.
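The two surface patterns from the extraction step can be illustrated with a minimal Python sketch. It assumes a tiny hand-made word list in place of a real part-of-speech tagger; the lexicon contents and the function name `extract_pairs` are hypothetical, not the system described in the talk.

```python
import re

# Toy lexicon (hypothetical); a real pipeline would use a POS tagger instead.
NOUNS = {"summit", "mountain", "chili"}
ADJECTIVES = {"crisp", "cold", "hot"}
VERBS = {"is", "was", "looks", "feels"}
ADVERBS = {"very", "quite"}


def extract_pairs(sentence):
    """Return generic (noun, adjective) hasProperty candidates from one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    pairs = []
    for i, tok in enumerate(tokens):
        # Pattern 1: <noun> verb [adv] <adj>
        if tok in NOUNS and i + 1 < len(tokens) and tokens[i + 1] in VERBS:
            j = i + 2
            if j < len(tokens) and tokens[j] in ADVERBS:
                j += 1  # skip the optional adverb
            if j < len(tokens) and tokens[j] in ADJECTIVES:
                pairs.append((tok, tokens[j]))
        # Pattern 2: <adj> <noun>
        if tok in ADJECTIVES and i + 1 < len(tokens) and tokens[i + 1] in NOUNS:
            pairs.append((tokens[i + 1], tok))
    return pairs


print(extract_pairs("The summit is crisp."))
print(extract_pairs("The chili is very hot."))
print(extract_pairs("A cold mountain"))
```

The output of this step is still noisy and ambiguous: nothing yet says which sense of "crisp" is meant or which fine-grained relation applies, which is what the second step addresses.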

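Disambiguate args and classify triple: domain(r), range(r), assertion(r) inference

Suppose r = hasTemperature and the extracted pair $\langle w_{1n}, w_{2a} \rangle$ is (summit, crisp).
• range(r) inference, $\langle *, r, w_{2a}^{s} \rangle$: $\mathit{crisp}_a^3$, $\mathit{hot}_a^1$, $\mathit{cold}_a^1$, $\mathit{icy}_a^2$, …
• domain(r) inference, $\langle w_{1n}^{s}, r, * \rangle$: $\mathit{beach}_n^3$, $\mathit{summit}_n^2$, $\mathit{metal}_n^1$, $\mathit{metal}_n^2$, …
• assertion(r) inference, $\langle w_{1n}^{s}, r, w_{2a}^{s} \rangle$: $\langle \mathit{summit}_n^2, \mathit{crisp}_a^3 \rangle$, $\langle \mathit{beach}_n^1, \mathit{hot}_a^1 \rangle$, …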

Noisy, surface-form candidates for r → Graph construction → Graph inference

An instance of the problem: range(r)

adjective / noun | summit | mountain | dancer
cold | 20 | 50 | 3
hot | 30 | 40 | 10
crisp | 15 | 15 | 1

$\mathit{crisp}_a^1$: clearly defined
$\mathit{crisp}_a^3$: cold and invigorating temperature
$\mathit{cold}_a^1$: low or inadequate temperature

sense #1 | sense #2 | sense #3
1/2 | 1/3 | 1/4

Label propagation for graph inference, given few seeds.
• Label per node = in / not in range of hasTemperature
• Similar nodes, similar labels
• But: limited training data, e.g. (summit, crisp), (mountain, cold), (salsa, hot)

Label Propagation: loss function (Talukdar et al. 2009)
• Seed label loss
• Similar node, different label loss
• Label prior loss (high-degree nodes are noise)
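To make the idea of propagating a few seed labels through a graph concrete, here is a deliberately simplified Python sketch. It is a plain clamped neighbor-averaging scheme, not the loss-based algorithm of Talukdar et al. cited on the slide, and it has no label-prior term. The graph, the node `sharp-a1`, and the weight 1.0 on sense-to-sense edges are hypothetical; the word-to-sense weights 1/2 and 1/4 borrow the sense #1 and sense #3 values shown earlier, on the assumption that those values act as edge weights.

```python
def propagate(edges, seeds, iterations=50):
    """Spread seed scores over a weighted undirected graph.

    edges: {(u, v): weight}; seeds: {node: score in [0, 1]}.
    Seed nodes stay clamped; every other node takes the weighted
    average of its neighbors' scores in each iteration.
    """
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    scores = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # keep seed labels fixed
            else:
                total = sum(nbrs.values())
                updated[node] = sum(w * scores[n] for n, w in nbrs.items()) / total
        scores = updated
    return scores


# Score = degree of membership in range(hasTemperature).
edges = {
    ("crisp", "crisp-a1"): 1 / 2,    # word to its sense #1
    ("crisp", "crisp-a3"): 1 / 4,    # word to its sense #3
    ("crisp-a3", "cold-a1"): 1.0,    # similar senses (hypothetical weight)
    ("crisp-a1", "sharp-a1"): 1.0,   # similar senses (hypothetical node and weight)
}
seeds = {"cold-a1": 1.0, "sharp-a1": 0.0}  # one positive and one negative seed
scores = propagate(edges, seeds)
print(round(scores["crisp-a3"], 3), round(scores["crisp-a1"], 3))
```

On this toy graph the temperature sense `crisp-a3` converges to about 0.875 and the "clearly defined" sense `crisp-a1` to about 0.125, so two seeds are enough to separate the senses of an unlabeled word.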

WebChild: Model recap

Noisy, surface-form candidates for r → Graph construction → Graph inference → Clean, disambiguated triples in r

Resulting KB

Domain (hasShape): mountain-n1, leaf-n1, ...
Range (hasShape): triangular-a1, tapered-a1, ...
Assertions (hasShape): (lens-n1, spherical-a2), (palace-n2, domed-a1), ...

WebChild: large (~5 million), semantically organized, accurate (0.82 sampled precision)

Summary of property commonsense

WebChild: First commonsense KB with fine-grained relations and disambiguated arguments; 4.6 million assertions including domain and range for 19 relations.

Take-away message: Transductive methods help overcome sparsity of commonsense in text.

Research question 3

RQ 3. Commonsense of interactions between objects: activities and their semantic attributes.

Previous work:
• largely discuss events, but activities only at small scale
• do not organize the attributes of the activities
• do not distinguish the meanings of the attribute values

An activity frame

{Climb up a mountain, Hike up a hill}
  Participants: climber, boy, rope
  Location: camp, forest, sea shore
  Time: day, holiday
  Visuals: [images]

Semantic organization of activity frames

  Parent activity: Go up an elevation
  Previous activity: Get to village
  Next activity: Reach the top
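One way to picture an activity frame and its links as a data structure is the following Python sketch. The class `ActivityFrame` and its field names are hypothetical and chosen only to mirror the slide; the talk does not specify a storage format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActivityFrame:
    names: list                                        # synonymous activity phrases
    participants: set = field(default_factory=set)
    locations: set = field(default_factory=set)
    times: set = field(default_factory=set)
    parent: Optional[str] = None                       # more general activity
    previous_activity: Optional[str] = None            # typically happens before
    next_activity: Optional[str] = None                # typically happens after


climb = ActivityFrame(
    names=["climb up a mountain", "hike up a hill"],
    participants={"climber", "boy", "rope"},
    locations={"camp", "forest", "sea shore"},
    times={"day", "holiday"},
    parent="go up an elevation",
    previous_activity="get to village",
    next_activity="reach the top",
)
print(climb.parent, "|", climb.previous_activity, "|", climb.next_activity)
```

Keeping the links as names of other frames turns a collection of such objects into a graph, which matches the parent, previous, and next organization shown on the slide.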

Hollywood narratives are good

Contain events but not activity knowledge

May contain activities but no visuals and varying granularity of scene boundaries, transitions.

Semantic parsing of scripts → Graph construction

Input: text in a scene taken from a semi-structured movie script, e.g. "He began to shoot a video on the summit"

Output: disambiguated semantic roles, e.g.
  the man: agent
  began to shoot: action
  a video: patient
  summit: location

SRL systems are computationally expensive and domain-specific.

State-of-the-art WSD customized for phrases; syntactic and semantic role semantics from VerbNet

Phrases and candidate WordNet senses:
• the man: man.1, man.2
• began to shoot: shoot.1, shoot.4
• a video: video.1

VerbNet contains curated semantic roles for verbs:
• shoot.vn.1: NP VP NP; agent.animate, patient.animate (selectional restriction)
• shoot.vn.3: NP VP NP; agent.animate, patient.inanimate (selectional restriction)

Can we use two different information sources to perform SRL given no training data?

Jointly leverage:
• WordNet class hierarchy (Thing / inanimate)
• WordNet-VerbNet linkage

Components of the joint model:
• Binary decision variable
• WSD prior, WN prior
• Sense, VN syntactic match score
• Sense, VN semantic match score

Joint WSD and SRL

$x_{ij}$ = binary decision variable for word i mapped to WN sense j

Objective terms:
• WSD prior
• WN prior
• Word, VN match score
• Selectional restriction score

Constraints:
• One VN sense per verb
• WN, VN sense consistency
• Selectional restriction constraints
• Binary decision
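The interplay of sense priors and selectional restrictions can be illustrated on the running example with a small Python sketch. It replaces the integer linear program with exhaustive enumeration, which is only feasible for toy inputs. All prior values, the animacy table, and the WordNet-to-VerbNet linkage below are made-up assumptions for illustration.

```python
from itertools import product

# Candidate WordNet senses with prior scores (numbers are made up).
SENSE_PRIOR = {
    "man": {"man.1": 0.6, "man.2": 0.4},
    "shoot": {"shoot.1": 0.6, "shoot.4": 0.4},
    "video": {"video.1": 1.0},
}
# Hypothetical animacy per noun sense, standing in for the WordNet class hierarchy.
ANIMATE = {"man.1": True, "man.2": True, "video.1": False}
# VerbNet-style frames: required animacy of agent and patient.
VN_FRAMES = {
    "shoot.vn.1": {"agent": True, "patient": True},
    "shoot.vn.3": {"agent": True, "patient": False},
}
# Hypothetical WordNet-VerbNet linkage (each verb sense maps to one VN sense).
WN_TO_VN = {"shoot.1": "shoot.vn.1", "shoot.4": "shoot.vn.3"}


def best_assignment():
    """Pick the highest-scoring sense assignment that satisfies all constraints."""
    best, best_score = None, float("-inf")
    for agent, verb, patient in product(
        SENSE_PRIOR["man"], SENSE_PRIOR["shoot"], SENSE_PRIOR["video"]
    ):
        frame = VN_FRAMES[WN_TO_VN[verb]]
        # Selectional restrictions act as hard constraints.
        if ANIMATE[agent] != frame["agent"] or ANIMATE[patient] != frame["patient"]:
            continue
        score = (
            SENSE_PRIOR["man"][agent]
            + SENSE_PRIOR["shoot"][verb]
            + SENSE_PRIOR["video"][patient]
        )
        if score > best_score:
            best = {"agent": agent, "action": verb, "patient": patient}
            best_score = score
    return best


print(best_assignment())
```

Although `shoot.1` has the higher prior in this toy setup, its frame demands an animate patient, so the inanimate `video.1` rules it out and `shoot.4` is selected. This is the kind of effect a joint model can capture that a WSD step run in isolation cannot.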

Joint WSD and SRL output for "the man began to shoot a video":
  Agent: man.1
  Action: shoot.4
  Patient: video.1

Semantic parsing of scripts → Graph construction

Climb up a mountain
  Participants: climber, rope
  Location: summit, forest
  Time: day

Hike up a hill
  Participants: climber
  Location: sea shore
  Time: holiday

Go up an elevation
  ...

Reach top
  ...

Construct a graph of activity frames with three edge types:
• Similar: S(a, b)
• Previous: P(a, b)
• TypeOf: T(a, b)

Similarity: S(climb up a mountain, hike up a hill)

Attribute similarity + Activity similarity

Climb up a mountain
  Participants: climber, rope
  Location: forest
  Time: day

Hike up a hill
  Participants: climber
  Location: woods
  Time: holiday
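A rough Python sketch of combining attribute similarity with activity similarity follows. It assumes plain Jaccard overlap and an equal weighting `alpha=0.5`; both choices are assumptions for illustration, and the function names are hypothetical. The real measures are not given on the slide and presumably use lexical resources, so that "forest" and "woods" would count as similar, which exact string overlap cannot do.

```python
def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def frame_similarity(f1, f2, alpha=0.5):
    """Weighted sum of attribute similarity and activity (name) similarity."""
    attributes = ("participants", "location", "time")
    attr_sim = sum(jaccard(f1[k], f2[k]) for k in attributes) / len(attributes)
    name_sim = jaccard(f1["name"].lower().split(), f2["name"].lower().split())
    return alpha * attr_sim + (1 - alpha) * name_sim


climb = {
    "name": "Climb up a mountain",
    "participants": {"climber", "rope"},
    "location": {"forest"},
    "time": {"day"},
}
hike = {
    "name": "Hike up a Hill",
    "participants": {"climber"},
    "location": {"woods"},
    "time": {"holiday"},
}
similarity = frame_similarity(climb, hike)
print(round(similarity, 2))
```

With these toy frames the score is 0.25: the shared participant and the shared words "up" and "a" contribute, while "forest" versus "woods" contributes nothing, which shows why a semantic rather than string-based comparison matters here.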

TypeOf: T(climb up a mountain, go up an elevation)

Attribute hypernymy + Activity hypernymy

Climb up a mountain
  Participants: climber, rope
  Location: forest
  Time: day

Go up an elevation
  Participants: Person
  Location: Exterior
  Time: day

Previous: P(reach the top, climb up a mountain)

Climb up a mountain → Reach the top

Allow gaps between activities within one scene. PMI-style counting to suppress generic activities.

Scene: Carrie and Big start out early to head to the village. They climb up the beautiful mountain which felt as if they were in a different world. After several hours they eventually reach the top.
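The idea of PMI-style counting over ordered activity pairs can be sketched in Python as follows. This assumes the standard pointwise mutual information formula estimated over scenes; the exact statistic used in the talk is not given, and the function name `previous_pmi` and the toy scenes are hypothetical. In the returned dictionary, the key (a, b) means that a occurs before b in a scene, with gaps allowed.

```python
import math
from collections import Counter
from itertools import combinations


def previous_pmi(scenes):
    """PMI for ordered activity pairs (a before b) that co-occur within a scene."""
    pair_counts = Counter()
    activity_counts = Counter()
    for scene in scenes:
        activity_counts.update(set(scene))
        # combinations() keeps list order, so (a, b) means a precedes b;
        # intermediate activities between a and b are allowed.
        pair_counts.update(set(combinations(scene, 2)))
    n = len(scenes)
    return {
        (a, b): math.log((c / n) / ((activity_counts[a] / n) * (activity_counts[b] / n)))
        for (a, b), c in pair_counts.items()
    }


scenes = [
    ["get to village", "climb up a mountain", "reach the top"],
    ["walk", "climb up a mountain", "reach the top"],
    ["walk", "eat lunch"],
    ["walk", "get to village"],
]
pmi = previous_pmi(scenes)
print(pmi[("climb up a mountain", "reach the top")])
print(pmi[("walk", "climb up a mountain")])
```

In this toy data the generic activity "walk" appears in most scenes, so its pairing with "climb up a mountain" gets a negative score, while the specific sequence of climbing and then reaching the top scores log 2.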

Resulting KB: Knowlywood

Knowlywood statistics
  Scenes: 1,708,782
  Activity synsets: 505,788
  Accuracy: 0.85 ± 0.01
  #Images from scenes: 30,000

Summary of activity commonsense

Knowlywood: First organized commonsense activity KB with activity attributes and disambiguated values, containing nearly 1 million activities with visuals.

Take-away message: Jointly leveraging different annotated resources helps overcome sparsity of training data.

The overall KB: WebChild KB

> 3M concepts, > 18M triples, >1000 relations

Conclusions and take-home messages: Knowledge to make machines smarter can be acquired with robust techniques that jointly leverage global information.

• RQ1, Properties (WSDM’14): range, domain, assertions of fine-grained relations
• RQ2, Comparatives, part-whole (AAAI’14, AAAI’16): fine-grained comparative, part-whole relations
• RQ3, Activities (WWW’15, CIKM’15): activity frames with semantic attributes

WebChild KB applications (CVPR’15, ACL’15, ISWC’16..)

ML + NLP community
• limited training data can be overcome by jointly leveraging multiple cues

Computer Vision community
• commonsense helps computer vision
• vision helps commonsense acquisition

AI community
• semantically organized knowledge is a step towards filling the human-machine gap