Post on 21-Dec-2015
On WordNet, Text Mining, and Knowledge Bases of the Future
Peter Clark, March 2006
Knowledge Systems, Boeing Phantom Works
Introduction
• Interested in text understanding & question-answering
– use of world knowledge to go beyond text
• Used WordNet as (part of) the knowledge repository
– got some leverage
– can we get more?
– what would a WordNet KB look like?
Outline
• Machine understanding and question-answering
• An initial attempt
• From WordNet to a Knowledge Base
– Representation
– Reasoning
– Text Mining for Possibilistic Knowledge
– A Knowledge Base of the Future?
On Machine Understanding
• Consider:
“China launched a meteorological satellite into orbit Wednesday, the first of five weather guardians to be sent into the skies before 2008.”
• Suggests:
– there was a rocket launch
– China owns the satellite
– the satellite is for monitoring weather
– the orbit is around the Earth
– etc.
None of these are explicitly stated in the text
On Machine Understanding
• Understanding = creating a situation-specific model (SSM), coherent with data & background knowledge
– Data suggests background knowledge which may be appropriate
– Background knowledge suggests ways of interpreting data
[Figure: fragmentary, ambiguous inputs → ? → coherent model (situation-specific)]
On Machine Understanding
[Figure: fragmentary, ambiguous inputs → ? → coherent model (situation-specific); the step in between is assembly of pieces, assessment of coherence, and inference, drawing on World Knowledge]
On Machine Understanding
• Conjectures about the nature of the beast (World Knowledge):
– “Small” number of core theories
  • space, time, movement, …
  • can encode directly
– Large amount of “mundane” facts
  • a dictionary contains many of these facts
Caption-Based Video Retrieval
[Figure: English captions describing a video segment (partial, ambiguous) → ? → coherent representation of the scene (elaborated, disambiguated), drawing on World Knowledge; the representation supports question-answering, search, etc.]
Illustration: Caption-Based Video Retrieval
[Figure: video → captions (manual authoring), e.g. “A man opens an airplane door”, “A lever is rotated to the unarmed position” → caption text interpretation (Open, with agent Man and object Door; Door is-part-of Airplane) → elaboration (inference, scene-building) using World Knowledge → search against a query (Touch, Person, Door)]
Semantic Retrieval
• Query: “A person walking”
→ Result: “A man carries a box across a room”
• Query: “Someone injured”
→ Result: “An employee was drilling a hole in a piece of wood. The drill bit of the drill broke. The drill twisted out of the employee's right hand. The drill injured the employee's right thumb.”
• Query: “An object was damaged”
→ Result: the above caption (x 2)
→ Result: “Someone broke the side mirrors of a Boeing truck.”
The Knowledge Base
• Representation:
– Horn-clause rules
  • plus add/delete lists for “before” and “after” rules
– Authored in simplified English
  • NLP system interactively translates to logic
– WordNet + UT Austin relations as the ontology
– ~1000 rules authored
  • just a drop in the ocean!
• Reasoning:
– depth-limited forward chaining
– preconditions/effects just asserted (no situation-calculus simulation)
Some of the Rules in the KB:
IF a person is carrying an entity that is inside a room THEN (almost) always the person is in the room.
IF a person is picking an object up THEN (almost) always the person is holding the object.
IF an entity is near a 2nd entity AND the 2nd entity contains a 3rd entity THEN usually the 1st entity is near the 3rd entity.
ABOUT boxes: usually a box has a lid.
BEFORE a person gives an object, (almost) always the person possesses the object.
AFTER a person closes a barrier, (almost) always the barrier is shut.
…1000 more…
Some of the Rules in the KB:
IF a person is carrying an entity that is inside a room THEN (almost) always the person is in the room.
isa(_Person1, person_n1), isa(_Carry1, carry_v1), isa(_Entity1, entity_n1), isa(_Room1, room_n1),
agent(_Carry1, _Person1), object(_Carry1, _Entity1), is-inside(_Entity1, _Room1),
==== (almost) always ===>
is-inside(_Person1, _Room1).
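The rule above, together with the depth-limited forward chaining named on the Knowledge Base slide, can be sketched in Python. This is only an illustration under stated assumptions, not the system's actual implementation: the triple encoding, the helper names (`match`, `solutions`, `forward_chain`), and the example facts (`man1`, `box1`, and so on) are invented here, and the "(almost) always" qualifier is treated as a strict implication.

```python
# Facts are (relation, arg1, arg2) triples; terms starting with "_" are variables.
# The encoding and all helper names are assumptions made for this sketch.
RULE = {
    "if": [("isa", "_P", "person_n1"), ("isa", "_C", "carry_v1"),
           ("isa", "_E", "entity_n1"), ("isa", "_R", "room_n1"),
           ("agent", "_C", "_P"), ("object", "_C", "_E"),
           ("is-inside", "_E", "_R")],
    "then": [("is-inside", "_P", "_R")],
}

def match(pattern, fact, binding):
    """Extend binding so that pattern equals fact, or return None."""
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("_"):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new

def solutions(conditions, facts, binding=None):
    """Yield every variable binding that satisfies all conditions."""
    binding = binding or {}
    if not conditions:
        yield binding
        return
    for fact in facts:
        extended = match(conditions[0], fact, binding)
        if extended is not None:
            yield from solutions(conditions[1:], facts, extended)

def forward_chain(facts, rules, max_depth=3):
    """Apply rules for at most max_depth rounds, or until nothing new appears."""
    facts = set(facts)
    for _ in range(max_depth):
        new = set()
        for rule in rules:
            for b in solutions(rule["if"], facts):
                for rel, x, y in rule["then"]:
                    new.add((rel, b.get(x, x), b.get(y, y)))
        if new <= facts:
            break
        facts |= new
    return facts

facts = {
    ("isa", "man1", "person_n1"), ("isa", "carry1", "carry_v1"),
    ("isa", "box1", "entity_n1"), ("isa", "room1", "room_n1"),
    ("agent", "carry1", "man1"), ("object", "carry1", "box1"),
    ("is-inside", "box1", "room1"),
}
result = forward_chain(facts, [RULE])
```

After chaining, `result` contains the inferred triple `("is-inside", "man1", "room1")` alongside the seven input facts. A fuller version would also walk the WordNet taxonomy so that a box counts as an entity without an explicit fact.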
Critique: 2 Big Questions Hanging
• Representation: The Knowledge Base
– Unscalable to build the KB from scratch
– WordNet helped a lot
– Could it be extended to help more?
– What would that WordNet KB look like?
– How could it be built?
• Reasoning:
– Deductive inference is insufficient
– What does reasoning look like with large, noisy, uncertain knowledge?
What Knowledge Do We Need?
"A dawn bomb attack devastated a major Shiite shrine in Iraq..."
Would like the system to infer that:
– The bomb exploded
– The explosion caused the devastation
– The shrine was damaged
– …
System needs to know:
– Bombs can explode
– Explosions can destroy things
– Destruction ≈ devastation
– Attacks are usually done by people
– …
What Knowledge Do We Need?
"Israeli troops were engaged in a fierce gun battle with militants in a West Bank town. An Israeli soldier was killed."
Would like the system to infer that:
– There was a fight.
– The soldier died.
– The soldier was shot.
– The soldier was a member of the Israeli troops.
– …
System needs to know:
– A battle involves a fight.
– Soldiers use guns.
– Guns can kill.
– If you are killed you are dead.
– Soldiers belong to troops
– …
WordNet (Princeton Univ)
– Is not a word net; is a concept net
– 117,000 lexically motivated concepts (synsets)
– organized into a taxonomy (hypernymy)
– massively used in AI (~7000 downloads/month)
201378060 (mix_v6): "shuffle", "ruffle", "mix": (mix so as to make a random order or arrangement; "shuffle the cards")
[Figure: synset 201378060 (mix_v6) linked to 201174946 (manipulate_v2) and then 201173984 (handle_v4); links labeled superclass / genls / supertype]
The Evolution of WordNet
• v1.0 (1986)
– synsets (concepts) + hypernym (isa) links
• v1.7 (2001)
– add in additional relationships
  • has-part
  • causes
  • member-of
  • entails-doing (“subevent”)
• v2.0 (2003)
– introduce the instance/class distinction
  • Paris isa Capital-City is-type-of City
– add in some derivational links
  • explode related-to explosion
• …
• v10.0 (2010?)
– ?????
[Figure: arrow alongside the versions, from “lexical resource” to “knowledge base?”]
WordNet as a Knowledge Base
Got: just “isa” and “part-of” knowledge
But still need:
I. Axioms about each concept!
– From definitions and examples (?)
– shallow extraction has been done (LCC and ISI)
– getting close to useful logic
II. Relational vocabulary (case roles, semantic relations)
– could take from: FrameNet, Cyc, UT Austin
III. Relations between word senses:
• bank (building) vs. bank (institution) vs. bank (staff)
• cut (separate) vs. cut (sweeping motion)
I. Knowledge in the word sense definitions: How much knowledge is in WordNet?
• Ide & Veronis:
– dictionaries have no broad contextual/world knowledge
– e.g., no connection between “lawn” and “house”
• Not true! (examples labeled WN1.7.1 and WN1.6)
garden -- (a yard or lawn adjoining a house)
1 sense of lawn
Sense 1
lawn#1 -- (a field of cultivated and mowed grass)
  -> field#1 -- (a piece of land cleared of trees and usually enclosed)
  => yard#2, grounds#2 -- (the land around a house or other building; "it was a small house with almost no yard at all")
I. Knowledge in the word sense definitions: How much knowledge is in WordNet?
"lawn". WordNet seems to "know", among other things, that lawns
– need watering
– can have games played on them
– can be flattened, mowed
– can have chairs on them and other furniture
– can be cut/mowed
– things grow on them
– have grass ("lawn grass" is a common compound)
– leaves can get on them
– can be seeded
"accident" (ignoring derivatives like "accidentally")
– accidents can block traffic
– you can be prone to accidents
– accidents happen
– result from violent impact; passengers can be killed
– involve vehicles, e.g., trains
– results in physical damage or hurt, or death
– there are victims
– you can be blamed for accidents
I. Knowledge in the word sense definitions: Generating Logic from Glosses
• Definitions appear deceptively simple
– really, huge representational challenges underneath
hammer_n2: (a hand tool with a heavy rigid head and a handle; used to deliver an impulsive force by striking)
launch_v3: (launch for the first time; "launch a ship")
cut_n1: (the act of reducing the amount or number)
love_v1: (have a great affection or liking for)
theater_n5: (a building where theatrical performances can be held)
• Want logic to be faithful but also simple (usable)
• Claim: We can get away with a “shallow” encoding
– all knowledge as Horn clauses
– some loss of fidelity
– gain in syntactic simplicity and reusability
I. Knowledge in the word sense definitions: Simplifying
1. “Timeless” representations
– No tagging of facts with situations
– Representation doesn’t handle change
break_v4: (render inoperable or ineffective; "You broke the alarm clock when you took it apart!")
∀x,y isa(x,Break_v4) & isa(y,Thing) & object(x,y) → property(y,Inoperable)
[Figure: Break --object--> Thing --property--> Inoperable]
I. Knowledge in the word sense definitions: Simplifying
2. For statements about types, use instances instead:
“hammer_n2: (… used to deliver an impulsive force by striking)”
∀x isa(x,Hammer_n2) → ∃d,f,s,y,z … & isa(d, Deliver_v9) & isa(s, Hit_v2) & isa(f, Force_n3) & purpose(x, d) & object(d, f) & subevent(d, s).
[Figure: Hammer has-part Handle and Head; Head has properties Rigid and Heavy; Hammer --purpose--> Deliver --object--> Force; Deliver --subevent--> Strike]
Strictly, should be purpose(x,Deliver-Impulsive-Force)
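The "use instances instead" step amounts to replacing each existential variable with a fresh named instance whenever the axiom is applied to a particular hammer. A minimal Python sketch follows. The `?`-prefixed variable convention, the `instantiate` helper, and the generated instance names are assumptions for illustration, and only the part of the hammer_n2 consequent shown on the slide is encoded.

```python
from itertools import count

_fresh = count(1)  # source of unique suffixes for invented instances

# Consequent of the hammer_n2 axiom as shown on the slide (elided parts omitted).
# Terms starting with "?" are variables; "?x" is the hammer itself.
HAMMER_AXIOM = [
    ("isa", "?d", "Deliver_v9"), ("isa", "?s", "Hit_v2"), ("isa", "?f", "Force_n3"),
    ("purpose", "?x", "?d"), ("object", "?d", "?f"), ("subevent", "?d", "?s"),
]

def instantiate(axiom, instance):
    """Bind ?x to instance and invent a fresh constant for every other variable."""
    binding = {"?x": instance}
    facts = []
    for rel, a, b in axiom:
        row = []
        for term in (a, b):
            if term.startswith("?") and term not in binding:
                binding[term] = f"{term[1:]}{next(_fresh)}"
            row.append(binding.get(term, term))
        facts.append((rel, row[0], row[1]))
    return facts

facts = instantiate(HAMMER_AXIOM, "hammer1")
```

Each call yields ground facts about one hammer and its own deliver, hit, and force instances, which keeps the knowledge in Horn-clause form at the cost of the fidelity noted on the slide.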
II: Relational Vocabulary
• Is this enough?
• No, also need relational vocabulary
• Which relational vocabulary to use?
– agent, patient, has-part, contains, destination, …
• Possible sources:
– UT Austin’s Slot Dictionary (~100 relations)
– Cyc (~1000 relations)
– FrameNet (??)
III. Relations between word senses: Nouns
• Nouns often have multiple, related senses
School_n1: an institution
School_n2: a building
School_n3: the process of being educated
School_n4: staff and students
School_n5: a time period of instruction
• Reasoner needs to know these are related
“The school declared that the teacher’s strike was over. Students should arrive at 9:15am tomorrow morning.”
[Figure: the example text annotated with School_n1 (institution), School_n4 (staff, students), and School_n2 (building)]
III. Relations between word senses: Nouns
[Figure: graph over staff and students (School_n4), educational process (School_n3), institution (School_n1), building (School_n2), and time period of instruction (School_n5); edges labeled participants, constituent, location, constituent, during]
• Can hand-code these relationships (slow)
• BUT: The patterns repeat (Buitelaar)
– can encode and reuse the patterns
[Figure: the same graph generalized to members, process, institution, building, and time period, with the same edge labels]
III. Relations between word senses: Verbs
• WordNet’s verb senses:
– 41 senses of “cut”
– linguistically, not representationally, motivated
  • “cut grass” (cut_v18) ≠ “cut timber” (cut_v31) ≠ “cut grain” (cut_v28) (“mow”, “chop”, “harvest”)
  • cut_v1 (separate) ≠ cut_v3 (slicing movement)
  • fails to capture commonality
• Better:
– Organize verbs into “mini taxonomy”
– “Supersenses”, to group same meanings
– Identify facets of verbs, use multiple inheritance
  • result of action
  • style of action
The Myth of Common-Sense: All you need is knowledge…
“We don’t believe that there’s any shortcut to being intelligent; the “secret” is to have lots of knowledge.” Lenat & Guha ‘86
“Knowledge is the primary source of the intellectual power of intelligent agents, both human and computer.” Feigenbaum ‘96
The Myth of Common-Sense
• Common, implicit assumption (belief?) in AI:
– Knowledge is the key to intelligence
– Acquisition of knowledge is the bottleneck
• Spawned from:
– ’70s experience with Expert Systems
  • Feigenbaum’s “knowledge acquisition bottleneck”
– Introspection
Thought Experiment…
• Suppose we had
  • good logical translations of the WordNet definitions
  • good relational vocabulary
  • rich relationships between related word senses
– How would these be used?
– Would they be enough?
– What else would be needed?
Initial Scenario Sentence
"A dawn bomb attack devastated a major Shiite shrine in Iraq..."
[Figure: graph with nodes Attack, Bomb, Dawn, Devastate, Shrine; edges labeled instrument, time, causes, object]
One Elaboration Step (knowledge of bomb)
"A dawn bomb attack devastated a major Shiite shrine in Iraq..."
WordNet: “bomb: an explosive device fused to detonate”
[Figure: the gloss graph (Bomb, Device, Explosive, Detonate; edges labeled contains and purpose) is attached to the Bomb node of the scenario graph]
Additional, Relevant Knowledge in WordNet
“bomb: an explosive device fused to detonate”
[Figure: Bomb, Device, Explosive, Detonate; edges labeled contains and purpose]
“bombing: the use of bombs for sabotage; a tactic frequently used by terrorists”
[Figure: Bombing, Terrorist, Bomb; edges labeled agent and instrument]
“plastic explosive: an explosive material … intended to destroy”
[Figure: Explosive --purpose--> Destroy]
“explode: destroy by exploding”
[Figure: Explode --causes--> Destroy]
“destroy: damage irreparably”
[Figure: Destroy --causes--> Damage]
Multiple Elaboration Steps
"A dawn bomb attack devastated a major Shiite shrine in Iraq..."
WordNet glosses applied in turn:
“bomb: an explosive device fused to detonate”
“bombing: the use of bombs for sabotage; a tactic frequently used by terrorists”
“plastic explosive: an explosive material … intended to destroy”
“destroy: damage irreparably”
“explode: destroy by exploding”
[Figure: the scenario graph grows with each gloss. Final graph nodes: Attack, Bomb, Dawn, Shrine, {Devastate, Destroy}, {Detonate, Explode}, Explosive, Damage, Terrorist; edges labeled instrument, time, object, contains, purpose, agent, and several causes links]
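The elaboration steps can be sketched as repeatedly merging gloss-derived triples into the scenario graph. The Python below is a toy under stated assumptions: the edge directions in `scenario` are one reading of the slide's figure, the `gloss_facts` table is hand-written rather than parsed from WordNet, and the `synonyms` map stands in for whatever mechanism equates Detonate with Explode and Devastate with Destroy.

```python
# Scenario graph as (source, relation, target) triples.
# Edge directions are an assumed reading of the figure.
scenario = {
    ("Attack", "instrument", "Bomb"),
    ("Attack", "time", "Dawn"),
    ("Attack", "causes", "Devastate"),
    ("Devastate", "object", "Shrine"),
}

# Hand-written stand-ins for logic derived from the quoted glosses.
gloss_facts = {
    "Bomb": {("Bomb", "contains", "Explosive"), ("Bomb", "purpose", "Detonate")},
    "Explosive": {("Explosive", "purpose", "Destroy")},
    "Explode": {("Explode", "causes", "Destroy")},
    "Destroy": {("Destroy", "causes", "Damage")},
}

# Concepts merged into one node, as in {Detonate, Explode} on the slide.
synonyms = {"Detonate": "Explode", "Devastate": "Destroy"}

def canon(node):
    return synonyms.get(node, node)

def normalize(triples):
    return {(canon(s), r, canon(t)) for s, r, t in triples}

def elaborate(graph, steps):
    """Add gloss knowledge for every node in the graph, up to `steps` rounds."""
    graph = normalize(graph)
    for _ in range(steps):
        nodes = {s for s, _, _ in graph} | {t for _, _, t in graph}
        added = set()
        for node in nodes:
            added |= normalize(gloss_facts.get(node, set()))
        if added <= graph:
            break
        graph |= added
    return graph

elaborated = elaborate(scenario, steps=3)
```

After elaboration the graph links Bomb to Explode, Explode to Destroy, and Destroy to Damage, which is the chain needed to conclude that the shrine was damaged. This neat merging is the idealized case; the sketch ignores uncertainty and competing word senses.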
How this really works…
• Pieces may not “fit together” so neatly
– multiple ways of saying the same thing
– uncertainty at all stages of the process
  • definitions are often only typical facts
  • errors in both English and translations
• Process is not a chain of deductions, rather
– is a search of possible elaborations
– looking for the most “coherent” elaboration
• More “crystallization” than “deduction”
1. Reasoning as a Search for Coherence
“A bomb attack devastated a shrine...”
[Figure: search tree of candidate elaborations, each marked “?”]
– Bomb = explosive which detonates?
  • Detonate = explode; explode causes destroy?
  • Detonate = explode; explode = increase in population?
– Bomb = calorimeter; measures heat?
  • Measure = assess quantity?
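One very simple way to picture the choice between competing elaborations is to score each candidate by how many of its concepts connect back to the rest of the sentence. The sketch below uses plain set overlap, which is a crude stand-in chosen for illustration (in the spirit of gloss-overlap disambiguation) and not the coherence measure of the talk. The concept lists paraphrase the candidate readings, and all names are assumptions.

```python
# Candidate readings of "bomb", each with the concepts its elaboration brings in.
candidates = {
    "bomb=explosive device": {"explosive", "detonate", "explode", "destroy"},
    "bomb=calorimeter": {"calorimeter", "measure", "heat", "quantity"},
}

# Concepts present in, or equated with, the sentence's own representation.
context = {"attack", "devastate", "destroy", "shrine"}

def coherence(concepts, context):
    """Count how many elaborated concepts connect back to the context."""
    return len(concepts & context)

def most_coherent(candidates, context):
    """Return the candidate reading with the highest overlap score."""
    return max(candidates, key=lambda name: coherence(candidates[name], context))

best = most_coherent(candidates, context)
```

Here the explosive-device reading wins because its elaboration reaches "destroy", while the calorimeter reading connects to nothing else in the sentence. A realistic scorer would search several steps deep and weigh the other coherence criteria as well.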
Matching pieces of the representation
• Problem:
– There are additional, implied facts
– Need to compute and match against these also
• For example:
– (X in state S) & (X part-of Y) ~→ (Y in state S)
  • S = broken, injured, valuable, …
– (X causes Y) & (Y causes Z) ~→ (X causes Z)
– (X does Y) & (Y causes Z) ~→ (X does Z)
[Figure: graph with nodes Man, Throw, Bomb, Destroy, Shrine and edges labeled agent, object, causes; a second copy adds an implied causes edge]
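The three example rules can be applied mechanically to compute implied facts before matching. The Python sketch below treats the defeasible "~→" as a strict implication, which is a simplification, and uses an assumed `(relation, a, b)` triple encoding; the door and airplane facts are an invented example of the state rule.

```python
def implied_closure(facts):
    """Add facts implied by the three example rules until nothing changes."""
    facts = set(facts)
    while True:
        new = set()
        for r1, a, b in facts:
            for r2, c, d in facts:
                # (X causes Y) & (Y causes Z) => (X causes Z)
                if r1 == "causes" and r2 == "causes" and b == c:
                    new.add(("causes", a, d))
                # (X does Y) & (Y causes Z) => (X does Z)
                if r1 == "does" and r2 == "causes" and b == c:
                    new.add(("does", a, d))
                # (X in state S) & (X part-of Y) => (Y in state S)
                if r1 == "state" and r2 == "part-of" and a == c:
                    new.add(("state", d, b))
        if new <= facts:
            return facts
        facts |= new

facts = {
    ("does", "Man", "Throw"),
    ("causes", "Throw", "Destroy"),
    ("causes", "Destroy", "Damage"),
    ("state", "Door", "broken"),      # invented example for the state rule
    ("part-of", "Door", "Airplane"),
}
closed = implied_closure(facts)
```

The closure adds, among others, `("does", "Man", "Destroy")` and `("state", "Airplane", "broken")`. Because the real rules are only usually true, a fuller treatment would attach a confidence to each derived fact rather than asserting it outright.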
Assessing Coherence
• Does a representation seem “sensible”?
– Minsky: We proactively ask certain questions
• Coherence criteria (examples)
– No contradictions
– Agents perform actions in pursuit of their goals
– Agents have resources for their actions
– Events have a cause (including randomness)
– Artifacts are used for their purpose
– Structures are physically possible
– Observation not unusual (“sanctioned” by experience)
Assessing Coherence: “Sanctioning” and Possibilistic Knowledge
• We know from experience what is “usual”
– Cats can drink milk
– Rockets can be launched
– Helicopters can land
– etc.
• If we see these, we are comfortable
– These statements sanction our tentative conclusions
– logically, these are strange beasts
• How to accumulate this “database of possibilities”?
Knowledge Mining: Acquiring Possibilistic Knowledge
Schubert’s Conjecture: There is a largely untapped source of general knowledge in texts, lying at a level beneath the explicit assertional content, and which can be harnessed.
“The camouflaged helicopter landed near the embassy.”
→ helicopters can land
→ helicopters can be camouflaged
Our attempt: “lightweight” LFs generated from Reuters
LF forms:
(S subject verb object (prep noun) (prep noun) …)
(NN noun … noun)
(AN adj noun)
Knowledge Mining: Acquiring Possibilistic Knowledge
Newswire Article:
HUTCHINSON SEES HIGHER PAYOUT. HONG KONG. Mar 2. Li said Hong Kong’s property market remains strong while its economy is performing better than forecast. Hong Kong Electric reorganized and will spin off its non-electricity related activities. Hongkong Electric shareholders will receive one share in the new subsidiary for every owned share in the sold company. Li said the decision to spin off …
Implicit, tacit knowledge:
– Shareholders may receive shares.
– Companies may be sold.
– Shares may be owned.
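Turning lightweight LFs into possibilistic statements can be sketched as a small mapping over tuples. In the Python below the LF tuples are written by hand (no parser is included), and the output relation names `can`, `can-be`, and `can-be-object-of` are assumptions made for this illustration, not the talk's actual output format.

```python
from collections import Counter

def possibilities(lf):
    """Turn one lightweight LF tuple into zero or more possibility triples."""
    kind = lf[0]
    if kind == "S":
        subject, verb, obj = lf[1], lf[2], lf[3]
        found = []
        if subject is not None:
            found.append(("can", subject, verb))
        if obj is not None:
            found.append(("can-be-object-of", obj, verb))
        return found
    if kind == "AN":
        adjective, noun = lf[1], lf[2]
        return [("can-be", noun, adjective)]
    return []  # other forms (e.g. NN compounds) are ignored in this sketch

# Hand-written LFs for the example sentences; None marks a missing argument.
lfs = [
    ("S", "helicopter", "land", None, ("near", "embassy")),
    ("AN", "camouflaged", "helicopter"),
    ("S", "shareholder", "receive", "share"),
    ("S", None, "sell", "company"),
    ("S", None, "own", "share"),
]

# Counting occurrences lets frequent possibilities outweigh parser noise.
counts = Counter(p for lf in lfs for p in possibilities(lf))
```

Over a large corpus the counts give a rough measure of how well "sanctioned" each possibility is, so a statement seen once can be weighted differently from one seen thousands of times.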
Knowledge Bases of the Future
0. Core WordNet
1. Gloss Axioms
2. Relations
3. Related senses
4. Core Rules
5. Possibilities
Knowledge Bases of the Future
1. Machine-sensible glosses and examples
1. File wordnet3.0/glosses.wn
;;; "Bomb: An explosive device fused to detonate"
isa(_Bomb1, bomb_n1)
------->
isa(_Bomb1, device_n1)
isa(_Explosive1, explosive_n2)
isa(_Fuse1, fuse_v1)
isa(_Detonate1, detonate_v1)
contains(_Bomb1,_Explosive1)
purpose(_Bomb1,_Detonate1)
object(_Fuse1, _Bomb1)
;;; "The bomb exploded"
isa(_Bomb1, device_n1)
isa(_Explode1, explode_v1)
explode(_Bomb1, _Explode1)
Knowledge Bases of the Future
2. Extra relational tables
2a. File wordnet3.0/purpose.wn
purpose(bomb_n1,detonate_v1).
purpose(knife_n1,cut_v4).
purpose(car_n1,transport_v5).
...
2b. File wordnet3.0/contains.wn
contains(bomb_n1,explosive_n4).
contains(river_n1,water_n2).
contains(body_n1,blood_n2).
...
2c. File wordnet3.0/instrument.wn
...
Knowledge Bases of the Future
3. Related Senses
3a. File wordnet3.0/polysemy-patterns.wn
pattern(p1, _Process1, _Building1, _Members1, _TimePeriod1)
location(_Process1, _Building1)
participants(_Process1, _Members1)
during(_Process1, _TimePeriod1)
pattern(p1, school_n1, school_n4, school_n2, school_n6).
pattern(p1, university_n1, university_n2, university_n6, university_n4).
pattern(p1, government_n1, government_n4, government_n2, government_n3).
...
3b. File wordnet3.0/verb-facets.wn
hypernym(cut_v3,cut_v5).
hypernym(cut_v9,cut_v5).
hypernym(cut_v11,cut_v5).
...
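The point of polysemy pattern p1 is that one set of relation templates is reused across many nouns. A minimal Python sketch of that expansion follows. The role names and the `expand` helper are assumptions made here, and the sense tuples are copied as given on the slide without checking them against the School sense list shown earlier.

```python
# Role names in the argument order of pattern p1.
ROLES = ("process", "building", "members", "time_period")

# Relation templates attached to the pattern, written over role names.
TEMPLATES = [
    ("location", "process", "building"),
    ("participants", "process", "members"),
    ("during", "process", "time_period"),
]

# Sense tuples copied from the slide's pattern(p1, ...) entries.
INSTANCES = [
    ("school_n1", "school_n4", "school_n2", "school_n6"),
    ("university_n1", "university_n2", "university_n6", "university_n4"),
    ("government_n1", "government_n4", "government_n2", "government_n3"),
]

def expand(instance):
    """Instantiate every relation template for one tuple of word senses."""
    binding = dict(zip(ROLES, instance))
    return [(rel, binding[a], binding[b]) for rel, a, b in TEMPLATES]

facts = [fact for inst in INSTANCES for fact in expand(inst)]
```

Three pattern entries yield nine sense-to-sense relations, so each new noun that fits the pattern costs one line rather than a hand-coded graph.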
Knowledge Bases of the Future
4. General Rules
4. File wordnet3.0/rules.wn
has-part(_X,_Y)
has-part(_Y,_Z)
------->
has-part(_X,_Z)

does(_X,_A)
causes(_A,_B)
------->
does(_X,_B)
...
Knowledge Bases of the Future
5. Possibilistic Statements
5. File wordnet3.0/possibilities.wn
can(cat_n1, sit_v1).
can(cat_n1, drink_v1).
can(airplane_n1, fly_v2).
can(airplane_n1, land_v4).
...
Summary
• WordNet is on the path to being a knowledge base
– Needs logical definitions of its word senses
– Relational vocabulary
– More relationships between word senses
• Notions of reasoning have to change too
– Search for coherence (crystallization process)
• Also need:
– Core rules
– Possibilistic knowledge
• Is this doable?
– Yes! Is getting close with work on glosses
– But, result will never be perfect