Commonsense Knowledge Acquisition and Applications
Niket Tandon
Ph.D. Supervisor: Gerhard Weikum
Max Planck Institute for Informatics
Towards Commonsense Enriched Machines
Hard Rock
Hand, leg
Climbing a rock
brown
Person
Adventurous Activity
property
part of
scene
Climber is a Person
Humans: all of the knowledge above
Machines:
1 Rock
2 Hands
2 Legs
1 Person
Human-Machine Knowledge Gap
Commonsense of objects
Commonsense of relationships
Commonsense of interactions
How will machines become smarter if we fill this knowledge gap?
Smarter robots: "Get me a coffee" (where?)
Smarter vision: better classifiers, e.g. monitor or TV, given mouse and keyboard?
Smarter IR: adventurous activities
Encyclopedic Knowledge: facts about instances/events
Facts about instances: A. Honnold, married, Lisa Honnold
Their events: A. Honnold, married on, 19.08.2016
Commonsense Knowledge: facts about classes/activities
Can we fill the human-machine knowledge gap using existing encyclopedic KBs like Freebase?
Encyclopedic Knowledge
Facts about instances
1. EKB acquisition: unimodal
2. EKB curation: textual verification
3. EKB completion: negative training assumptions hold
If (ei, rk, ej) holds, then (ei, rk, ej' != ej) is negative.
Example: (A. Honnold, bornIn, USA) holds, so (A. Honnold, bornIn, UK) is negative.
Commonsense Knowledge
Facts about classes
1. CKB acquisition: multimodal
2. CKB curation: textual + visual
3. CKB completion: negative training assumptions fail
Example: climber, at location, {mountain, university}
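The contrast between the two completion settings can be illustrated with a small sketch. It generates negatives by swapping the object of an observed triple, which is the assumption stated above, and then checks how many of those negatives are in fact true. The function names and the tiny "truth" sets are hypothetical and serve only as an illustration.

```python
def corrupt_objects(triples, candidates):
    """Generate negatives by swapping the object of each observed triple.

    Assumes the relation is functional: any object other than the observed
    one is taken to be false for that (subject, relation) pair.
    """
    negatives = []
    for s, r, o in triples:
        for other in candidates:
            if other != o:
                negatives.append((s, r, other))
    return negatives


def false_negatives(negatives, truth):
    """Return generated negatives that are actually true statements."""
    return [t for t in negatives if t in truth]


# Functional relation (encyclopedic): one birthplace per person.
ekb_truth = {("A. Honnold", "bornIn", "USA")}
ekb_neg = corrupt_objects([("A. Honnold", "bornIn", "USA")], ["USA", "UK"])

# Non-functional relation (commonsense): a climber can be at many places.
ckb_truth = {
    ("climber", "atLocation", "mountain"),
    ("climber", "atLocation", "university"),
}
ckb_neg = corrupt_objects(
    [("climber", "atLocation", "mountain")], ["mountain", "university"]
)

print(false_negatives(ekb_neg, ekb_truth))  # no wrongly generated negatives
print(false_negatives(ckb_neg, ckb_truth))  # a true fact was marked negative
```

For the functional relation the generated negative is safe; for the non-functional one the procedure mislabels a true statement, which is why the assumption fails for commonsense relations.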
EKBs have several functional relations, hence the assumption holds.
[Figure: share of functional vs. non-functional relations in EKBs and CKBs, on a scale from 0 to 1]
Commonsense knowledge acquisition is different and harder
Humans hardly express the obvious: Scarce & Implicit
Spread across multiple modalities: Multimodal
The unusual is reported more than the usual: Reporting Bias
Culture-specific, location-specific: Contextual
KBs possessing commonsense knowledge
Need: automatically constructed, semantically organized commonsense KB
KB                      Supervision                    Pros                                                     Cons
Cyc                     manually curated               accuracy                                                 cost, coverage
ConceptNet              semi-automated                 coverage                                                 accuracy, less organized
Tandon et al., AAAI'11  bootstrapped using ConceptNet  coverage                                                 noise, less organized
Desiderata              minimal supervision            organized, high accuracy (> 80%), high coverage (> 10M)  ---
Need: robust techniques to automatically construct semantically organized Commonsense KB
Three research questions: investigate robust techniques to acquire:
RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.
RQ 2. Commonsense of relationships between objects: part-whole relations, comparative relations, ...
RQ 3. Commonsense of interactions between objects: activities and their semantic attributes.
Research question 1
RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.
Previous work:
• lumps these properties together
• does not distinguish the meanings of the words
• has low coverage
Input: large text corpus, containing e.g. "summit is crisp"
Output triples: $\langle w1_{n}^{s},\ r,\ w2_{a}^{s} \rangle$, e.g. $\langle \text{summit}_{n}^{2},\ \text{hasTemperature},\ \text{crisp}_{a}^{3} \rangle$
Disambiguated n: each noun is mapped to one of its senses (1., 2., 3., ...)
Disambiguated a: each adjective is mapped to one of its senses (1., 2., 3., ...)
Fine-grained relations r ∈ R: hasAppearance, hasSound, hasTaste, hasTemperature, evokesEmotion, ...
Our approach
1. Extract generic hasProperty triples over the input, using patterns such as "<noun> verb [adv] <adj>" and "<adj> <noun>", e.g. "summit is crisp".
2. Disambiguate the arguments and classify the triple. This step typically requires training data.
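The extraction step can be sketched as follows. This is a minimal illustration of the two surface patterns named above, not the extractor used for WebChild: the word lists are hypothetical toy lexicons standing in for a part-of-speech tagger, and `extract_candidates` is an assumed name.

```python
import re

# Hypothetical toy lexicons; a real system would rely on a POS tagger.
NOUNS = {"summit", "mountain", "dancer", "coffee"}
ADJECTIVES = {"crisp", "cold", "hot", "brown"}
ADVERBS = {"very", "really", "quite"}
COPULAS = {"is", "was", "feels", "looks"}


def extract_candidates(text):
    """Return generic (noun, 'hasProperty', adjective) surface-form triples."""
    tokens = re.findall(r"[a-z]+", text.lower())
    triples = []
    for i, tok in enumerate(tokens):
        # Pattern 1: <noun> verb [adv] <adj>
        if tok in NOUNS and i + 1 < len(tokens) and tokens[i + 1] in COPULAS:
            j = i + 2
            if j < len(tokens) and tokens[j] in ADVERBS:
                j += 1  # skip the optional adverb
            if j < len(tokens) and tokens[j] in ADJECTIVES:
                triples.append((tok, "hasProperty", tokens[j]))
        # Pattern 2: <adj> <noun>
        if tok in ADJECTIVES and i + 1 < len(tokens) and tokens[i + 1] in NOUNS:
            triples.append((tokens[i + 1], "hasProperty", tok))
    return triples


print(extract_candidates("The summit is crisp."))
print(extract_candidates("She drank very hot coffee."))
```

The output is still noisy and undisambiguated: "crisp" is a surface form, and nothing yet says that the relation is hasTemperature rather than, say, hasSound.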
Suppose r = hasTemperature and the extracted pair is <summit, crisp>.
Disambiguating the arguments and classifying the triple splits into three sub-problems:
range(r): $\langle *,\ r,\ a^{s} \rangle$, the adjective senses that r can take
domain(r): $\langle n^{s},\ r,\ * \rangle$, the noun senses that r applies to
assertion(r): $\langle n^{s},\ r,\ a^{s} \rangle$, the sense pairs for which r holds
Noisy, surface-form candidates for r
Graph construction
Graph inference
An instance of the problem: range(r)
Counts for adjective (rows) and noun (columns) candidates:
        summit  mountain  dancer
cold    20      50        3
hot     30      40        10
crisp   15      15        1
Adjective sense glosses: "clearly defined"; "cold and invigorating" (temperature); "low or inadequate temperature"
Weights by sense rank:
sense #1  sense #2  sense #3
1/2       1/3       1/4
Label propagation for graph inference, given few seeds.
Label per node = in / not in the range of hasTemperature
Similar nodes should get similar labels, but training data is limited.
Label propagation: loss function (Talukdar et al. 2009)
• Seed label loss
• Similar node, different label loss
• Label prior loss (high-degree nodes are noise)
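The three loss terms can be made concrete with a simplified sketch. It is not the algorithm of Talukdar et al.: it minimises a plain quadratic loss with one seed term, one smoothness term and one uniform prior term, using Jacobi-style updates, and it omits the degree-dependent weighting that penalises high-degree nodes. The node names, edge weights and the `mu_*` parameters are hypothetical.

```python
def propagate(edges, seeds, prior=0.0, mu_seed=1.0, mu_smooth=1.0,
              mu_prior=0.1, iters=100):
    """Score each node in [0, 1] for membership in the range of a relation.

    edges: {(u, v): weight} for an undirected similarity graph
    seeds: {node: label}, with 1.0 = in range and 0.0 = not in range
    """
    neighbours = {}
    for (u, v), w in edges.items():
        neighbours.setdefault(u, []).append((v, w))
        neighbours.setdefault(v, []).append((u, w))

    scores = {n: seeds.get(n, prior) for n in neighbours}
    for _ in range(iters):
        updated = {}
        for n, nbrs in neighbours.items():
            # Label prior loss: pull every node towards the prior.
            num, den = mu_prior * prior, mu_prior
            # Seed label loss: pull seed nodes towards their given label.
            if n in seeds:
                num += mu_seed * seeds[n]
                den += mu_seed
            # Similar node, different label loss: pull towards neighbours.
            for m, w in nbrs:
                num += mu_smooth * w * scores[m]
                den += mu_smooth * w
            updated[n] = num / den
        scores = updated
    return scores


# Hypothetical graph over adjective senses for hasTemperature.
edges = {
    ("cold", "crisp_cold_sense"): 1.0,
    ("crisp_cold_sense", "crisp_clear_sense"): 0.1,
    ("crisp_clear_sense", "loud"): 1.0,
}
seeds = {"cold": 1.0, "loud": 0.0}
scores = propagate(edges, seeds)
print(scores)
```

With only two seeds, the sense of "crisp" that is strongly connected to "cold" ends up with a high score, while the other sense stays close to the negative seed.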
WebChild: model recap
Noisy, surface-form candidates for r
Graph construction
Graph inference
Clean, disambiguated triples in r
Resulting KB
Domain (hasShape): mountain-n1, leaf-n1, ...
Range (hasShape): triangular-a1, tapered-a1, ...
Assertions (hasShape): (lens-n1, spherical-a2), (palace-n2, domed-a1), ...
WebChild: large (~5 million), semantically organized, accurate (0.82 sampled precision)
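One way to picture the resulting organization is a per-relation record holding domain, range and assertions. The sketch below is an assumed in-memory layout for illustration only; `RelationKB`, `range_` and `properties_of` are hypothetical names, and the entries are the hasShape examples listed above.

```python
from dataclasses import dataclass, field


@dataclass
class RelationKB:
    """Hypothetical container for one fine-grained relation."""
    name: str
    domain: set = field(default_factory=set)      # disambiguated noun senses
    range_: set = field(default_factory=set)      # disambiguated adjective senses
    assertions: set = field(default_factory=set)  # (noun sense, adjective sense)


def properties_of(kb, noun_sense):
    """Return all adjective senses asserted for a noun sense, sorted."""
    return sorted(adj for noun, adj in kb.assertions if noun == noun_sense)


has_shape = RelationKB(
    name="hasShape",
    domain={"mountain-n1", "leaf-n1"},
    range_={"triangular-a1", "tapered-a1"},
    assertions={("lens-n1", "spherical-a2"), ("palace-n2", "domed-a1")},
)

print(properties_of(has_shape, "lens-n1"))
```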
Summary of property commonsense
WebChild: first commonsense KB with fine-grained relations and disambiguated arguments; 4.6 million assertions, including domain and range for 19 relations.
Take-away message: transductive methods help overcome the sparsity of commonsense in text.
Research question 3
RQ 3. Commonsense of interactions between objects.- activities and their semantic attributes.
Previous work:
• largely discusses events, but activities only at small scale
• does not organize the attributes of the activities
• does not distinguish the meanings of the attribute values
An activity frame
{Climb up a mountain, Hike up a hill}
Participants: climber, boy, rope
Location: camp, forest, sea shore
Time: day, holiday
Visuals
Semantic organization of activity frames
Parent activity: Go up an elevation
Previous activity: Get to village
Next activity: Reach the top
Contain events but not activity knowledge
May contain activities, but no visuals, and varying granularity of scene boundaries and transitions
Hollywood narratives are good
Semantic parsing of scripts
Graph construction
Input: text in a scene taken from a semi-structured movie script, e.g.: He began to shoot a video on the summit
Output: disambiguated semantic roles, e.g.
the man: agent
began to shoot: action
a video: patient
summit: location
SRL systems are computationally expensive and domain specific
State-of-the-art WSD customized for phrases
the man: man.1, man.2
began to shoot: shoot.1, shoot.4
a video: video.1
VerbNet contains curated semantic roles for verbs
shoot.vn.1: NP VP NP; agent.animate, patient.animate (selectional restriction)
shoot.vn.3: NP VP NP; agent.animate, patient.inanimate (selectional restriction)
Can we use two different information sources to perform SRL given no training data?
Jointly leverage:
• State-of-the-art WSD customized for phrases: WSD prior, WN prior
• Syntactic and semantic role semantics from VerbNet: sense-to-VN syntactic match score, sense-to-VN semantic match score
• WordNet class hierarchy (e.g. Thing / inanimate)
• WordNet-VerbNet linkage
Joint WSD and SRL
xij = binary decision variable for word i mapped to WN sense j
Objective: WSD prior, WN prior, word-to-VN match score, selectional restriction score
Constraints: one VN sense per verb; WN and VN sense consistency; selectional restriction constraints; binary decisions
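The joint decision can be sketched for the running example. Instead of an optimisation solver, the sketch enumerates all sense assignments, which is feasible only for toy inputs. All priors, animacy labels and WordNet-VerbNet links below are made-up illustrative values, not entries from WordNet or VerbNet, and the objective keeps only the prior terms.

```python
from itertools import product

# All values below are hypothetical illustrations.
WN_SENSES = {
    "man": ["man.1", "man.2"],
    "shoot": ["shoot.1", "shoot.4"],
    "video": ["video.1"],
}
PRIOR = {"man.1": 0.7, "man.2": 0.3, "shoot.1": 0.6, "shoot.4": 0.4, "video.1": 1.0}
ANIMACY = {"man.1": "animate", "man.2": "animate", "video.1": "inanimate"}
VN_FRAMES = {
    "shoot.vn.1": {"agent": "animate", "patient": "animate"},
    "shoot.vn.3": {"agent": "animate", "patient": "inanimate"},
}
WN_VN_LINK = {("shoot.1", "shoot.vn.1"), ("shoot.4", "shoot.vn.3")}


def joint_wsd_srl(agent_word, verb, patient_word):
    """Pick one sense per word and one VN frame, maximising the summed priors."""
    best, best_score = None, float("-inf")
    for a, v, p, vn in product(WN_SENSES[agent_word], WN_SENSES[verb],
                               WN_SENSES[patient_word], VN_FRAMES):
        if (v, vn) not in WN_VN_LINK:
            continue  # WN and VN sense consistency
        frame = VN_FRAMES[vn]
        if ANIMACY[a] != frame["agent"] or ANIMACY[p] != frame["patient"]:
            continue  # selectional restriction constraints
        score = PRIOR[a] + PRIOR[v] + PRIOR[p]
        if score > best_score:
            best = {"agent": a, "action": v, "patient": p, "vn": vn}
            best_score = score
    return best


result = joint_wsd_srl("man", "shoot", "video")
print(result)
```

Although shoot.1 has the higher prior in this toy setup, the inanimate patient rules out its linked frame, so the joint solution selects shoot.4. This is the effect of combining the two information sources.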
Joint WSD and SRL output
Agent: man.1
Action: shoot.4
Patient: video.1
Climb up a mountain
Participants: climber, rope
Location: summit, forest
Time: day
Hike up a hill
Participants: climber
Location: sea shore
Time: holiday
Go up an elevation
...
Reach top
...
Construct a graph of activity frames with three edge types:
Similar: S(a, b)
Previous: P(a, b)
TypeOf: T(a, b)
Similarity: S(climb up a mountain, hike up a hill)
Attribute similarity + activity similarity
Climb up a mountain
Participants: climber, rope
Location: forest
Time: day
Hike up a hill
Participants: climber
Location: woods
Time: holiday
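The attribute part of the similarity edge can be illustrated with a small sketch. The slides do not specify the similarity measure, so the averaged Jaccard overlap used here is an assumption, as is the tiny synonym table that stands in for a synset lookup. The activity-level similarity is not covered.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 if both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical synonym normalisation standing in for a WordNet synset lookup.
CANONICAL = {"woods": "forest"}


def normalise(values):
    return {CANONICAL.get(v, v) for v in values}


def attribute_similarity(f1, f2, attributes=("participants", "location", "time")):
    """Average the per-attribute overlap of two activity frames."""
    sims = [jaccard(normalise(f1[k]), normalise(f2[k])) for k in attributes]
    return sum(sims) / len(sims)


climb = {"participants": ["climber", "rope"], "location": ["forest"], "time": ["day"]}
hike = {"participants": ["climber"], "location": ["woods"], "time": ["holiday"]}

print(attribute_similarity(climb, hike))
```

Here the participants overlap by one half, the locations match after normalisation, and the times do not match.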
TypeOf: T(climb up a mountain, go up an elevation)
Attribute hypernymy + activity hypernymy
Climb up a mountain
Participants: climber, rope
Location: forest
Time: day
Go up an elevation
Participants: person
Location: exterior
Time: day
Previous: P(reach the top, climb up a mountain)
Allow gaps between activities within one scene.
PMI-style counting to suppress generic activities.
Scene:
Carrie and Big start out early to head to the village. They climb up the beautiful mountain, which felt as if they were in a different world. After several hours they eventually reach the top.
...
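The PMI-style counting can be sketched as below. This is an assumed formulation, not the exact statistic used for Knowlywood: each scene is a list of activities in script order, every ordered pair within a scene is counted once (so gaps are allowed), and the score is the pointwise mutual information of the pair over scenes. Keys are (earlier, later) pairs; the scene data is invented.

```python
import math
from collections import Counter
from itertools import combinations


def previous_edge_pmi(scenes):
    """Score ordered activity pairs (earlier, later) by PMI over scenes."""
    single = Counter()
    ordered = Counter()
    for seq in scenes:
        single.update(set(seq))
        # combinations() preserves order, so (a, b) means a occurs before b;
        # non-adjacent pairs are included, which allows gaps within a scene.
        ordered.update({(a, b) for a, b in combinations(seq, 2) if a != b})
    n = len(scenes)
    return {
        (a, b): math.log((c / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), c in ordered.items()
    }


# Hypothetical scenes; "look around" plays the role of a generic activity.
scenes = [
    ["get to village", "climb up a mountain", "reach the top"],
    ["look around", "climb up a mountain", "reach the top"],
    ["look around", "open door"],
    ["look around", "sit down"],
]
pmi = previous_edge_pmi(scenes)
print(pmi[("climb up a mountain", "reach the top")])
print(pmi[("look around", "climb up a mountain")])
```

The specific pair scores well above the pair involving the generic activity, which occurs in many unrelated scenes and is therefore discounted.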
Resulting KB: Knowlywood
Knowlywood statistics
Scenes: 1,708,782
Activity synsets: 505,788
Accuracy: 0.85 ± 0.01
#Images from scenes: 30,000
Summary of activity commonsense
Knowlywood: first organized commonsense activity KB with activity attributes and disambiguated values, containing nearly 1 million activities with visuals.
Take-away message: jointly leveraging different annotated resources helps overcome the sparsity of training data.
The overall KB: WebChild KB
> 3M concepts, > 18M triples, > 1000 relations
Conclusions and take-home messages: knowledge to make machines smarter can be acquired with robust techniques that jointly leverage global information
• RQ1, Properties (WSDM'14): range, domain and assertions of fine-grained relations
• RQ2, Comparatives and part-whole (AAAI'14, AAAI'16): fine-grained comparative and part-whole relations
• RQ3, Activities (WWW'15, CIKM'15): activity frames with semantic attributes
WebChild KB applications (CVPR'15, ACL'15, ISWC'16, ...)
ML + NLP community: limited training data can be overcome by jointly leveraging multiple cues
Computer vision community: commonsense helps computer vision; vision helps commonsense acquisition
AI community: semantically organized knowledge is a step towards filling the human-machine gap