Inducing Relations
Document 1: ... Boston was founded on November 17, 1630, by Puritan colonists from England ...
Document 2: ... New York City was settled by Europeans from The Netherlands in 1624 ...
Document 3: ... San Francisco was founded in 1776 by the Spanish conquerors ...
Goal: Discover types of information salient to a domain and extract short phrases representing them
Application: Creating Metadata
• Machine-readable access mechanism for searching, browsing, and retrieving text
Medical Records
– Location: Rotator cuff
– Severity: Mild
– Tear Length: 2mm
Disaster Reports
– Injuries: None
– Location: Melbourne, FL
– Time: Tuesday morning
Application: Generating Infoboxes
• Exploring the important attributes of new domains automatically
[Figure: example infoboxes for Cambridge and Seattle]
Regularities for Learning Relations
• Local lexical and orthographic similarity in expression of relation instances
– Examples: "… injured six people", "… injured 16 relief workers", "… four were hurt"
– Evoking word: injur*
– Relation phrase: {number}
• Recurring specific syntactic patterns in relation occurrence
– Examples: injured / six, killed / Three
• Similar document-level positioning of relations
Example of document-level positioning:
Beginning of document: A strong earthquake with a magnitude of 5.6 rocked the easternmost province of Irian Jaya on Friday.
End of document: An earthquake of magnitude 6 is considered “severe,” capable of widespread damage near the epicenter.
Highlights of the Approach
• Novel source of supervision: declarative human knowledge about constraints
• Rich combination of information sources that combines multiple layers of linguistic analysis
• Mathematical formalism that guides unsupervised learning with human knowledge
Input Representation
Each potential indicator word and argument phrase is encoded with features.
Indicators:
  is_verb      0 1 0
  earthquake   1 0 0
  hit          0 1 0
  ...
Arguments:
  has_capital  0 1 1
  is_number    0 0 0
  height       1 1 2
  ...
[Figure: the indicator "injured" (POS tag VBN) and the argument "six people", with features from syntactic context (VP, S, NP) and discourse context]
Output Representation
Relation instances as pairings of an indicator word and an argument phrase.
Our Two-Pronged Approach
• Model Structure: A generative model of hidden indicator and argument structure
– Models local lexical and syntactic similarity
– Biases toward consistent document-level structure
• Soft declarative constraints: Enforced during inference via posterior regularization
– Restricts global syntactic patterns
– Enforces relation instance uniqueness
Generating Indicators and Arguments
For a single relation type, indicators and arguments are drawn from relation-specific feature distributions.
Parameters:
– parameters of indicator feature distributions
– parameters of argument feature distributions
Backoff Distributions
Remaining constituents are generated from backoff feature distributions.
Parameters:
– parameters of indicator feature distributions
– parameters of argument feature distributions
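Multiple Relations
Each relation has its own distributions. Constituent features are drawn from a pointwise product over all relations:
– for each potential indicator word: either the indicator or the backoff distribution for each relation
– for each potential argument phrase: either the argument or the backoff distribution for each relation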
Selecting Relation Locations
• Relation instance locations within document drawn from shared distribution
• Indicator and argument within sentence selected uniformly at random
Document 1: Sentence 1. Sentence 2. Sentence 3. Sentence 4.
Document 2: Sentence 1. Sentence 2.
Document 3: Sentence 1. Sentence 2. Sentence 3.
Document 4: Sentence 1. Sentence 2. Sentence 3.
Summary of Generative Process
1. For each relation:
   a. Draw indicator, argument, and backoff distributions
   b. Draw location distribution
2. For each document:
   a. For each relation:
      i. Select a sentence (or null)
      ii. Draw argument and indicator positions uniformly at random within the sentence
   b. For each potential indicator word:
      i. Draw indicator features, using the indicator distribution if this word was selected as an indicator and the backoff distribution otherwise
   c. For each potential argument phrase:
      i. Draw argument features
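The document-level part of the generative process (step 2) can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the distributions are taken as given rather than drawn as in step 1, each location distribution is assumed to be a list over sentence slots whose final entry stands for "null", and all function names are hypothetical.

```python
import math
import random


def draw_categorical(rng, probs):
    """Draw an index from a list of probabilities that sums to 1."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def select_instances(rng, sentences, location_dists):
    """Step 2a: choose a sentence (or None) and positions for each relation.

    sentences: list of (n_indicator_candidates, n_argument_candidates).
    location_dists: per relation, probabilities over sentence slots; the
        final entry is the "null" outcome (an assumed encoding).
    """
    instances = {}
    for k, dist in enumerate(location_dists):
        slot = draw_categorical(rng, dist)
        is_null = slot == len(dist) - 1 or slot >= len(sentences)
        if is_null or 0 in sentences[slot]:
            instances[k] = None
            continue
        n_ind, n_arg = sentences[slot]
        # Positions within the chosen sentence are uniform at random.
        instances[k] = (slot, rng.randrange(n_ind), rng.randrange(n_arg))
    return instances


def pointwise_product(dists):
    """Normalized pointwise product of distributions over the same values.

    Assumes at least one value has nonzero probability under every input.
    """
    unnorm = {v: math.prod(d[v] for d in dists) for v in dists[0]}
    total = sum(unnorm.values())
    return {v: p / total for v, p in unnorm.items()}


def draw_indicator_feature(rng, selected_by, indicator_dists, backoff_dists):
    """Step 2b: a relation contributes its indicator distribution if it
    selected this word as its indicator, and its backoff otherwise."""
    per_relation = [
        indicator_dists[k] if k in selected_by else backoff_dists[k]
        for k in range(len(indicator_dists))
    ]
    combined = pointwise_product(per_relation)
    values = list(combined)
    return values[draw_categorical(rng, [combined[v] for v in values])]
```

Argument features (step 2c) would follow the same shape as `draw_indicator_feature`, with argument distributions in place of indicator distributions. Passing a seeded `random.Random` instance keeps a run reproducible.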
Model Properties
• Lexical similarity
– Via features
• Recurring syntactic patterns
– Via features and constraints during learning
• Regularities in document-level structure
– Via document location distribution
• Issue: how do we break symmetry between relations?
– Via constraints during learning
Variational Inference with Declarative Constraints
• Desired posterior: over model parameters and hidden structure (relations), given observed data (words, trees)
• Optimize variational objective with mean field factorization
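The formulas on this slide did not survive extraction. As a hedged reconstruction in assumed notation (the symbols \theta, z, x, f, and b are placeholders chosen here, not taken from the slides), the desired posterior and a mean-field objective with posterior-regularization constraints generally take the following form:

```latex
% Assumed notation:
%   \theta = model parameters
%   z      = hidden structure (relations)
%   x      = observed data (words, trees)
%   f, b   = constraint features and their bounds (placeholders)

% Desired posterior
\[
  p(\theta, z \mid x)
\]

% Mean-field factorization q(\theta, z) = q(\theta)\, q(z),
% with declarative constraints expressed as expectations under q(z)
\[
  \min_{q(\theta),\, q(z)} \;
  \mathrm{KL}\bigl( q(\theta)\, q(z) \,\big\|\, p(\theta, z \mid x) \bigr)
  \quad \text{s.t.} \quad
  \mathbb{E}_{q(z)}\bigl[ f(z) \bigr] \le b
\]
```

Constraints of the "at least" kind, such as a minimum share of relations matching a syntactic pattern, fit the same template by negating both f and b.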
Syntactic Constraints
• Constraint function: counts the number of relations that match a canonical syntactic pattern
• Threshold: share of relations that must match a syntactic pattern (80%)
• Biases toward relations that are syntactically plausible
– Indicator is verb and argument is object of indicator
– Indicator is noun and argument is modifier
– Indicator and argument are subject/object of same verb
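A minimal sketch of how the 80% syntactic constraint could be checked against a posterior over candidate pairings. The `Candidate` fields, the dependency labels ("obj", "mod", "subj"), and the function names are all hypothetical. The sketch only evaluates the constraint and does not perform the posterior-regularization update itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate indicator/argument pairing (field names hypothetical)."""
    indicator_pos: str               # coarse POS of the indicator, e.g. "VERB"
    arg_dep: str                     # dependency label of the argument
    arg_attaches_to_indicator: bool  # argument's parent is the indicator
    indicator_dep: str = ""          # dependency label of the indicator
    same_verb: bool = False          # both attach to the same verb


def matches_pattern(c):
    """True if the pairing fits one of the three canonical patterns."""
    verb_object = (c.indicator_pos == "VERB"
                   and c.arg_attaches_to_indicator and c.arg_dep == "obj")
    noun_modifier = (c.indicator_pos == "NOUN"
                     and c.arg_attaches_to_indicator and c.arg_dep == "mod")
    shared_verb = (c.same_verb
                   and {c.indicator_dep, c.arg_dep} == {"subj", "obj"})
    return verb_object or noun_modifier or shared_verb


def expected_match_fraction(posteriors):
    """posteriors: one dict per relation instance, Candidate -> probability."""
    matched = sum(p for q in posteriors for c, p in q.items()
                  if matches_pattern(c))
    total = sum(p for q in posteriors for p in q.values())
    return matched / total if total else 0.0


def satisfies_syntactic_constraint(posteriors, threshold=0.8):
    """Check the soft constraint in expectation (80% is the slide's value)."""
    return expected_match_fraction(posteriors) >= threshold
```

Working with expected counts rather than hard counts is what makes the constraint soft: individual relation instances may violate the patterns as long as the expectation stays above the threshold.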
Separation Constraints (Argument)
• Constraint function: counts the number of relations whose arguments include a given word
• Bound: no more than one relation should share the same argument word
• Encourages relations to be diverse
– Arguments cannot be shared...
Separation Constraints (Indicator)
• Constraint function: counts the number of relations whose indicators include a given word
• Bound: allow some relations to share indicator words
• Encourages relations to be diverse
– Indicators can be shared to an extent
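The two separation constraints differ only in their bound, which a short sketch can make concrete. It works on hard word assignments for simplicity, whereas the slides describe soft constraints enforced during inference. The function names are hypothetical, and the indicator bound of 2 in the usage note is an illustrative value, since the slides give no number.

```python
from collections import defaultdict


def relations_per_word(words_by_relation):
    """Map each word to the set of relations that use it."""
    usage = defaultdict(set)
    for relation, words in words_by_relation.items():
        for word in words:
            usage[word].add(relation)
    return usage


def separation_violations(words_by_relation, max_relations):
    """Return words used by more than `max_relations` relations."""
    usage = relations_per_word(words_by_relation)
    return {word: sorted(relations)
            for word, relations in usage.items()
            if len(relations) > max_relations}
```

For arguments the bound is 1, so `separation_violations(argument_words, max_relations=1)` flags any argument word claimed by two or more relations. For indicators a looser bound such as `max_relations=2` lets a few relations share an evoking word.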
Experimental Setup
Experiments on two news domains

Corpus      Number of Documents  Sentences/Document  Words/Document  Relation Types
Finance     100                  12.1                262.9           15
Earthquake  200                  9.3                 210.3           9

Example Document
A strong earthquake rocked the Philippines island of Mindoro early Tuesday, killing at least two people and causing some damage, authorities said. The 3:15 am quake had a preliminary magnitude of 6.7 and was centered near Baco on northern Mindoro Island, about 75 miles south of Manila, according to the Philippine Institute of Vulcanology and Seismology. The U.S. Geological Survey in Menlo Park, Calif., put the quake's preliminary magnitude at 7.1. Gov. Rodolfo Valencia of the island's Oriental Mindoro province said two people reportedly were killed and that several buildings and bridges were damaged by the quake. Several homes near the shore reportedly were washed away by large waves, Valencia told Manila radio station DZBB. Telephone service was cut, he said. The quake swayed tall buildings in Manila. Institute spokesman Aris Jimenez said the quake occurred on the Lubang fault, one of the area's most active. A magnitude 6 quake can cause severe damage if centered under a populated area, while a magnitude 7 quake indicates a major quake capable of widespread, heavy damage.
Extracted Relations
Location: A strong earthquake rocked the Philippines island of Mindoro early Tuesday, killing at least two people and causing some damage, authorities said.
Magnitude: The 3:15 am quake had a preliminary magnitude of 6.7 and was centered near Baco on northern Mindoro Island...
Time: The 3:15 am quake had a preliminary magnitude of 6.7 and was centered near Baco on northern Mindoro Island...
Generic versus Domain-specific Knowledge
• Generic Feature Representation
– Indicator: word, POS, word stem
– Argument: word, syntax label, headword of parent, dependency label to parent
• Domain-specific knowledge (relation independent)
– Finance: prefer arguments with numbers
– Earthquake: prefer relations in first two sentences of each document
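The feature lists above translate fairly directly into code. The sketch below is a hedged illustration: the dictionary keys, the toy `naive_stem` suffix stripper (a stand-in for whatever stemmer was actually used), and the two preference predicates are all assumptions made for the example.

```python
import re


def naive_stem(word):
    """Toy suffix stripper standing in for a real stemmer (hypothetical)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def indicator_features(word, pos):
    """Generic indicator features: word, POS, word stem."""
    lowered = word.lower()
    return {"word": lowered, "pos": pos, "stem": naive_stem(lowered)}


def argument_features(word, syntax_label, parent_headword, dep_label):
    """Generic argument features: word, syntax label, headword of parent,
    dependency label to parent."""
    return {
        "word": word.lower(),
        "syntax_label": syntax_label,
        "parent_headword": parent_headword.lower(),
        "dep_to_parent": dep_label,
    }


def has_number(phrase):
    """Finance preference: does the argument contain a digit?"""
    return re.search(r"\d", phrase) is not None


def in_first_two_sentences(sentence_index):
    """Earthquake preference, using a 0-based sentence index."""
    return sentence_index < 2
```

Note that `has_number` only detects digits, so spelled-out numbers such as "six" would need a separate check.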
Main Results (Sentence F-score)
[Bar chart: sentence-level F-score by system on the Finance and Earthquake corpora. Bar labels as extracted, Finance: 70, 57.5, 63.1, 69.2, 84.4; Earthquake: 29.5, 29.7, 38.2, 60.4, 66]
• USP: Unsupervised semantic parsing (Poon and Domingos 2009)
• CLUTO: CLUTO sentence clustering
• Mallows: Mallows content model sentence clustering (Chen et al. 2009)
Main Results (Token F-score)

Corpus      USP   Model  Model+DSC
Finance     19.3  30.5   38
Earthquake  12.6  18.3   22.8

• USP: Unsupervised semantic parsing (Poon and Domingos 2009)
Constraint Ablation Analysis
What happens as we modify declarative constraints?
• No-sep: No separation constraints
• No-syn: No syntactic constraints
• Hard-syn: Always enforce syntactic constraints

Corpus      Model  No-sep  No-syn  Hard-syn
Finance     69.2   52      59.3    42.9
Earthquake  60.4   28.8    57.1    60.1
What if we had Annotated Data?