Distant Supervision and MultiR - IIT Delhi · 2017. 3. 7.
-
Distant Supervision and MultiR
Happy Mittal
-
We will discuss
• Distant Supervision [Mintz et al, 2009]
• MultiR [Hoffmann et al, 2011]
-
Relation Instance Extraction
• Fully Supervised Learning
  • Labeled corpora of sentences.
  • Suffers from small datasets, domain bias.
• Unsupervised Learning
  • Cluster patterns to identify relations.
  • Large corpora available.
  • Can't give names to the relations identified.
• Bootstrap Learning
  • Give initial seed patterns and facts.
  • Generate more facts and patterns.
  • Suffers from semantic drift.
• Distant Supervision
  • Combines advantages of the above approaches.
Hrithik Roshan's movie Kaabil features a love affair between two blind people.
Actor(Hrithik Roshan, Kaabil)
-
Distant Supervision [Mintz et al 2009]
Sentences (e.g. Wikipedia articles)

Knowledge base (e.g. Freebase):
Person          Birth Place
Edwin Hubble    Marshfield
….              ….

Generate training data. HOW?

Assumption : A fact r(e1, e2) => every sentence having entities e1 and e2 specifies relation r.
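The generation step above can be sketched as follows. This is a toy illustration, not the Mintz et al. pipeline: the KB facts and sentences are invented, and plain substring matching stands in for real entity recognition and linking.

```python
# Toy KB facts and sentences; real pipelines use Freebase facts and
# NER-tagged text instead of substring matching.
kb = {("Edwin Hubble", "Marshfield"): "Birthplace"}

sentences = [
    "Astronomer Edwin Hubble was born in Marshfield , Missouri .",
    "Edwin Hubble spent his childhood in Marshfield .",
]

def label_sentences(kb, sentences):
    """Label every sentence containing both entities of a KB fact
    with that fact's relation (the distant-supervision assumption)."""
    training = []
    for sent in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in sent and e2 in sent:
                training.append((sent, e1, e2, rel))
    return training

for sent, e1, e2, rel in label_sentences(kb, sentences):
    print(rel, "<-", sent)
```

Note that the second sentence gets labeled Birthplace even though it does not express that relation: this is exactly the noise the strong assumption introduces.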
-
Distant Supervision (Generating training data)
• Sentence: Astronomer Edwin Hubble was born in Marshfield, Missouri.
• Features : Lexical Features
  o Entity types of both entities.
NE1 NE2 Label
PER LOC Birthplace
-
Distant Supervision (Generating training data)
• Sentence: Astronomer Edwin Hubble was born in Marshfield, Missouri.
• Features : Lexical Features
  o Words between entities and their POS tags.
NE1 Middle NE2 Label
PER [was/VERB born/VERB in/CLOSED] LOC Birthplace
-
Distant Supervision (Generating training data)
• Sentence: Astronomer Edwin Hubble was born in Marshfield, Missouri.
• Features : Lexical Features
  o Window of k words to the left and right, k ∈ {0,1,2}.
Left Window NE1 Middle NE2 Right window Label
[] PER [was/VERB born/VERB in/CLOSED] LOC [] Birthplace
[Astronomer] PER [was/VERB born/VERB in/CLOSED] LOC [,] Birthplace
[#,Astronomer] PER [was/VERB born/VERB in/CLOSED] LOC [,Missouri] Birthplace
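The feature tables above (middle words with POS tags plus k-word windows) can be sketched as one extraction routine. The function name and the (word, tag) token format are assumptions for illustration, not the paper's code; entity heads are treated as single tokens for brevity.

```python
# Illustrative sketch of Mintz-style lexical features: for each window
# size k in {0, ..., k_max}, emit (left window, NE1 type, middle
# words/POS, NE2 type, right window).
def lexical_features(tokens, i1, i2, t1, t2, k_max=2):
    """tokens: list of (word, pos); i1/i2: entity token indices;
    t1/t2: named-entity types of the two entities."""
    middle = ["%s/%s" % wp for wp in tokens[i1 + 1:i2]]
    feats = []
    for k in range(k_max + 1):
        left = [w for w, _ in tokens[max(0, i1 - k):i1]]
        right = [w for w, _ in tokens[i2 + 1:i2 + 1 + k]]
        feats.append((tuple(left), t1, tuple(middle), t2, tuple(right)))
    return feats

toks = [("Astronomer", "NOUN"), ("Edwin Hubble", "PER"), ("was", "VERB"),
        ("born", "VERB"), ("in", "CLOSED"), ("Marshfield", "LOC"),
        (",", "PUNCT"), ("Missouri", "NOUN")]
for f in lexical_features(toks, 1, 5, "PER", "LOC"):
    print(f)
```

With k = 0 this reproduces the first table row (empty windows); k = 1 and k = 2 add the surrounding words, as in the slide.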
-
Distant Supervision (Generating training data)
• Sentence: Astronomer Edwin Hubble was born in Marshfield, Missouri.
• Features : Syntactic Features
  o Dependency path between entities.
  o Window nodes in the dependency path.
-
Distant supervision
• Strong assumption : If a fact r(e1,e2) is seen in the KB, then every sentence having e1 and e2 specifies relation r.
• Relaxed assumption : At least one sentence having e1 and e2 specifies relation r [Riedel et al, 2010].
-
Relaxing the assumption [Riedel et al 2010]
Y ∈ R : relation variable; here Y = Founded
Z1, Z2 ∈ {0,1} : relation mention variables; here Z1 = 1, Z2 = 0
X1 : Steve Jobs founded Apple
X2 : Steve Jobs is the CEO of Apple
• Model the joint distribution P(Y = y, Z = z | x)
-
Relaxing the assumption [Riedel et al 2010]
• Model the joint distribution P(Y = y, Z = z | x)
• Problem : doesn't allow overlapping relations.
• MultiR solves that problem.
-
MultiR [Hoffmann et al 2011]
Y ∈ {0,1}^|R| : relation variables (capture aggregate-level prediction); here Founded = 1, CEO-of = 1
Zi ∈ R : relation mention variables (capture sentence-level prediction)
X1 : Steve Jobs founded Apple → Z1 = Founded
X2 : Steve Jobs is the CEO of Apple → Z2 = CEO-of
X3 : Steve Jobs left Apple → Z3 = None
…
-
MultiR [Hoffmann et al 2011]
• Probability Distribution

P(Y = y, Z = z | x) = (1/Z_x) ∏_r φ^join(y_r, z) ∏_i φ^extract(z_i, x_i)

where φ^join(y_r, z) = 1 if at least one z_i mentions relation y_r, and φ^extract(z_i, x_i) is defined over [Mintz et al] features.
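A minimal sketch of this factorized (unnormalized) score, assuming a placeholder feature function `phi` that returns a sparse feature dict; the deterministic-OR join factor zeroes out any assignment where y_r disagrees with the mentions.

```python
import math

# Sketch (not the paper's code) of the unnormalized MultiR score:
# deterministic-OR join factors times log-linear extraction factors
# over [Mintz et al]-style features.
def joint_score(y, z, x, theta, phi):
    # phi_join(y_r, z): 1 iff y_r agrees with "at least one z_i mentions r".
    for r, y_r in y.items():
        if bool(y_r) != any(z_i == r for z_i in z):
            return 0.0  # some join factor is zero
    # phi_extract(z_i, x_i) = exp(sum_j theta_j * phi_j(z_i, x_i))
    score = 1.0
    for z_i, x_i in zip(z, x):
        score *= math.exp(sum(theta.get(f, 0.0) * v
                              for f, v in phi(z_i, x_i).items()))
    return score
```

Normalizing by Z_x would require summing this over all (y, z), which is what makes exact learning expensive and motivates the argmax approximations below.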
-
MultiR [Hoffmann et al 2011]
• Parameter Learning

P(Y = y, Z = z | x; θ) = (1/Z_x) ∏_r φ^join(y_r, z) ∏_i φ^extract(z_i, x_i)
                       = (1/Z_x) ∏_r φ^join(y_r, z) ∏_i exp(Σ_j θ_j φ_j(z_i, x_i))

where φ^join(y_r, z) = 1 if at least one z_i mentions relation y_r, and the φ_j are [Mintz et al] features.

• Treat the Z variables as latent variables.
• Interested in maximizing the likelihood

L(θ) = ∏_i P(y_i | x_i; θ) = ∏_i Σ_z P(y_i, z | x_i; θ)

or equivalently the log-likelihood

ℓ(θ) = Σ_i log Σ_z P(y_i, z | x_i; θ)
-
MultiR [Hoffmann et al 2011]
• Parameter learning: uses online training (per-instance updates).
-
MultiR [Hoffmann et al 2011]
• Parameter learning
• The sum over latent z is difficult to compute; compute the argmax assignment instead.
-
MultiR [Hoffmann et al 2011]
• Learning Algorithm
• Needs two inference procedures.
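Under the online scheme, a perceptron-style sketch of the update that combines both inferences might look like this. The function names and feature interface are illustrative placeholders, not the paper's code: `argmax_joint` stands for inference 1 (argmax over y and z), `argmax_cond` for inference 2 (argmax over z given the gold y), and `phi` maps an assignment to a sparse feature dict.

```python
# Hypothetical sketch of MultiR's mistake-driven online update.
def multir_update(theta, x, y_gold, argmax_joint, argmax_cond, phi, lr=1.0):
    y_pred, z_pred = argmax_joint(theta, x)        # inference 1
    if y_pred != y_gold:                           # update only on mistakes
        z_star = argmax_cond(theta, x, y_gold)     # inference 2
        for f, v in phi(x, z_star).items():        # promote gold-consistent z*
            theta[f] = theta.get(f, 0.0) + lr * v
        for f, v in phi(x, z_pred).items():        # demote the predicted z
            theta[f] = theta.get(f, 0.0) - lr * v
    return theta
```

The two argmax calls are exactly the two inference problems worked through on the following slides.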
-
MultiR Inference 1 : argmax_{y,z} P(y, z | x; θ)
Y ∈ {0,1}^|R| : relation variables (capture aggregate-level prediction)
Zi ∈ R : relation mention variables (capture sentence-level prediction)

Extraction scores:
          X1     X2     X3
Founder   10.5   12.5   4.5
CEO-of    8.9    8.7    8.5
Capital   6.3    4.5    0.5

X1 : Steve Jobs founded Apple → Z1 = Founder
X2 : Apple was founded by Steve Jobs → Z2 = Founder
X3 : Steve Jobs is the CEO of Apple → Z3 = CEO-of
Y : Founder = 1, CEO-of = 1, Capital = 0

Each Zi takes its highest-scoring relation, and y_r = 1 iff some Zi = r. Complexity : O(|R||S|).
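Because the join factor is a deterministic OR, inference 1 reduces to picking each mention's best relation independently. A sketch using the slide's score table:

```python
# Inference 1 sketch: scores[r][i] is the extraction score for mention
# X_{i+1} under relation r (the slide's illustrative numbers).
scores = {
    "Founder": [10.5, 12.5, 4.5],
    "CEO-of":  [8.9,  8.7,  8.5],
    "Capital": [6.3,  4.5,  0.5],
}

def inference1(scores, n_mentions):
    z = []
    for i in range(n_mentions):
        z.append(max(scores, key=lambda r: scores[r][i]))  # best relation per mention
    y = {r: int(r in z) for r in scores}                   # deterministic OR over mentions
    return y, z

y, z = inference1(scores, 3)
print(z)  # ['Founder', 'Founder', 'CEO-of']
print(y)  # {'Founder': 1, 'CEO-of': 1, 'Capital': 0}
```

Each mention needs one scan of |R| relations, giving the O(|R||S|) bound on the slide.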
-
MultiR Inference 2 : argmax_z P(z | x, y; θ)
Given y (here Founder = 1, CEO-of = 1, Capital = 0), find the best mention assignment consistent with y.
• Variant of the weighted edge cover problem.
• Potentials serve as edge weights (ignore edges for relations with y_r = 0).
• Each y_r = 1 must be covered by at least one edge; each Zi takes exactly one edge.

Extraction scores:
          X1     X2     X3
Founder   10.5   12.5   4.5
CEO-of    8.9    8.7    8.5
Capital   6.3    4.5    0.5

X1 : Steve Jobs founded Apple → Z1 = Founder
X2 : Apple was founded by Steve Jobs → Z2 = Founder
X3 : Steve Jobs is the CEO of Apple → Z3 = CEO-of

• Exact solution : O(V(E + V log V))
• Approximate solution : O(|R||S|)
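A sketch of the O(|R||S|) approximation for inference 2, using the slide's score table. The greedy repair step is a simplification of the paper's procedure, shown only to convey the edge-cover constraints.

```python
# Inference 2 sketch: assign each mention its best *active* relation
# (edges to relations with y_r = 0 are ignored), then repair any active
# relation left uncovered by reassigning the cheapest mention.
scores = {
    "Founder": [10.5, 12.5, 4.5],
    "CEO-of":  [8.9,  8.7,  8.5],
    "Capital": [6.3,  4.5,  0.5],
}

def inference2(scores, active, n_mentions):
    # Each z takes exactly one edge: its best-scoring active relation.
    z = [max(active, key=lambda r: scores[r][i]) for i in range(n_mentions)]
    # Each active y must get at least one edge: repair uncovered relations.
    for r in active:
        if r not in z:
            # Move the mention that loses the least score to relation r.
            i = max(range(n_mentions), key=lambda j: scores[r][j] - scores[z[j]][j])
            z[i] = r
    return z

print(inference2(scores, ["Founder", "CEO-of"], 3))  # ['Founder', 'Founder', 'CEO-of']
```

On the slide's example no repair is needed; with active = {Founder, Capital} the repair step would move X3 to Capital, the cheapest way to cover it.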
-
Experiments
• Data
• NY Times sentences : NER tagged
• Used Freebase as KB.
• Evaluation Metric
  • Challenging:
    • Only 3% of sentences match facts in the KB.
    • Number of matches across relations is highly unbalanced.
  • Aggregate Extraction
    • Matched extracted relations against Freebase relations.
    • Underestimates accuracy because many true relations are not in Freebase.
  • Sentential Extraction
    • Sampled sentences from the union of two sets:
      • Sentences from which some relation is extracted.
      • Sentences whose arguments match entities in Freebase.
    • Manually labelled them correct or incorrect.
    • Overestimates the recall.
-
Experiments
• Systems compared
  • Original implementation of Riedel et al [2010]
• SoloR : Reimplementation of Riedel et al [2010]
• MultiR
• Metrics
  • Aggregate and sentential extraction results (PR curves)
• Relation specific results
• Running time
-
Experiments
• Results
  • Aggregate extraction
    • MultiR: high precision over the full recall range.
    • MultiR: recall improved from 20% to 25%.
    • Low precision in the 0-1% recall range.
    • To investigate, inspected the top 10 extractions marked wrong: they were correct but not present in Freebase.
-
Experiments
• Results
  • Sentential extraction
    • Riedel et al didn't report sentential results.
    • MultiR: high precision and recall.
    • MultiR: F1 score of 60.5%.
-
Experiments
• Results
  • Relation-specific results
    • Take the 10 most frequent relations.
    • S_r^M : sentences from which MultiR extracted relation r.
    • S_r^F : sentences whose arguments match Freebase for relation r.
    • Sample 100 sentences from each set.
    • Compute accuracy, precision, and recall.
-
Experiments
Effect of modeling overlapping relations
-
Discussion
• Relies only on Freebase for experimental evaluation [Nupur et al]
• Assumes that if a fact is present in text, then it must be present in KB [Dinesh Raghu]
• Only one relation in a sentence [Barun]
• Assume entities occur as NP only. [Gagan]
• Should use sampling instead of argmax as done in Riedel et al. [Happy, Barun]
• Evaluation problem : Only 3% sentences match in Freebase [Gagan]
• For sentential extraction evaluation, sampled only 1000 sentences.
• Separate graph for every entity pair : Scaling issue [Prachi]
-
Possible Extensions
• Evaluate on some other datasets as well, like Google knowledge graph [Anshul, Rishabh]
• Bootstrapping like NELL [Gagan et al]
• Iteratively correct the facts during learning for 0-1% recall range [Surag]
• Extract entity mentions spanning multiple sentences [Anshul]
• Relation to MLNs : Apply Lifting [Ankit]