Global Inference via Linear Programming Formulation
Presenter: Natalia Prytkova
Tutor: Maximilian Dylla
14.07.2011
2
Outline
• Motivation
• Naïve Algorithm
• LP Formulation
  – Constraints
  – Objective Function
• Applications of LP
• Experiments
• Discussion
3
Inference with Classifiers
Recognize entities
Recognize relations
Inference
4
Example
Book Author
6
Properties of Extracted Items
BalletWrittenBy(Ballet, Composer)
BookWrittenBy(Book, Author)
Ballet
Composer
Book
Author
7
Properties of Extracted Items
BalletWrittenBy(Ballet, Composer)
BookWrittenBy(Book, Author)
ShownInTheater(Ballet, Theater)
GraduatedFrom(Composer, Conservatory)
BookPublishedBy(Book, Publisher)
MemberOfUnion(Author, WritersUnion)
Ballet
Composer
Theater
Book
Author
WritersUnion
Conservatory
Publisher
8
Example
BalletWrittenBy
Ballet Composer
10
Properties of Extracted Items
• Many relation types
• Many entity types
• Mutually dependent
13
Key Idea
Recognize entities
Recognize relations
Inference
14
Naïve Algorithm
16
Naïve Algorithm
P(Book BalletWrittenBy Composer) = 0.07
P(Book BalletWrittenBy Author) = 0.07
P(Book BookWrittenBy Composer) = 0.12
P(Book BookWrittenBy Author) = 0.03
P(Ballet BalletWrittenBy Composer) = 0.28
P(Ballet BalletWrittenBy Author) = 0.28
P(Ballet BookWrittenBy Composer) = 0.12
P(Ballet BookWrittenBy Author) = 0.12
…
n entities: $O(n^2)$ binary relations
l labels: $l^{n^2}$ assignments
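The naïve algorithm can be sketched as exhaustive enumeration: score every joint labeling of entities and relations and keep the best one whose entity labels fit the relation label. This is a minimal sketch under stated assumptions: the probabilities and the names `PROBS`, `ARGS`, `SIGNATURES` and `naive_inference` are hypothetical, and the joint score is assumed to be the product of independent per-variable classifier probabilities. Its point is the loop count, which is (number of labels) to the power (number of variables).

```python
import itertools
import math

# Hypothetical classifier probabilities for one sentence (not from the slides).
PROBS = {
    "E1": {"Ballet": 0.6, "Book": 0.4},
    "E2": {"Composer": 0.45, "Author": 0.55},
    "R12": {"BalletWrittenBy": 0.7, "BookWrittenBy": 0.3},  # R12 = (E1, E2)
}
ARGS = {"R12": ("E1", "E2")}
# Argument types each relation label requires: (first argument, second argument).
SIGNATURES = {
    "BalletWrittenBy": ("Ballet", "Composer"),
    "BookWrittenBy": ("Book", "Author"),
}


def naive_inference():
    """Score every joint assignment; return the best consistent one and the count."""
    names = list(PROBS)
    best, best_score, count = None, -math.inf, 0
    for labels in itertools.product(*(PROBS[v] for v in names)):
        count += 1
        f = dict(zip(names, labels))
        # Skip assignments whose entity labels do not fit the relation label.
        if any(SIGNATURES[f[r]] != (f[a], f[b]) for r, (a, b) in ARGS.items()):
            continue
        score = math.prod(PROBS[v][f[v]] for v in names)
        if score > best_score:
            best, best_score = f, score
    return best, count
```

Here three variables with two labels each already give 2^3 = 8 assignments. Locally, E2 would be labeled Author (0.55), but the only consistent assignment containing Author scores lower, so the best consistent labeling makes E2 a Composer.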
18
Some Useful Properties
• Relations impose restrictions on entities
• Each entity or relation can be labeled with only one label
• Relations can be directed (BookWrittenBy) or undirected (SpouseOf)
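These properties can be made concrete in a small sketch. The signature table and the helper names `candidate_pairs` and `allowed` are hypothetical (the types are taken from the relation names on the earlier slides). It shows that a relation label restricts its arguments' labels, that directedness decides whether (E_i, E_j) and (E_j, E_i) are separate candidates, and that a labeling stored as a dictionary holds exactly one label per item.

```python
from itertools import combinations, permutations

# Hypothetical argument types for the relation labels used on these slides.
SIGNATURES = {
    "BookWrittenBy": ("Book", "Author"),
    "BalletWrittenBy": ("Ballet", "Composer"),
}


def candidate_pairs(entities, directed):
    """Entity pairs that need a relation variable.

    For a directed relation, (Ei, Ej) and (Ej, Ei) are different candidates;
    for an undirected one (such as SpouseOf) one unordered pair suffices.
    """
    pairs = permutations(entities, 2) if directed else combinations(entities, 2)
    return list(pairs)


def allowed(relation, first_label, second_label):
    """A relation label restricts the labels its two arguments may take."""
    return SIGNATURES[relation] == (first_label, second_label)


# A labeling stored as a dict holds exactly one label per entity or relation.
labeling = {"E1": "Book", "E2": "Author", ("E1", "E2"): "BookWrittenBy"}
```

With three entities, a directed relation needs six candidate pairs and an undirected one only three.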
20
Key Idea
• Obtain a set of possible labels for entities/relations
• Optimize the global decision given a set of constraints
21
Definitions
• Sentence S
  – A linked list of words and entities. Entity boundaries are given: "Piotr Ilyich Tchaikovsky" is one entity.
• Entities $\varepsilon = \{E_1, E_2, \dots, E_n\}$
  – Observed variables
• Relations
  – Binary relations between entities, e.g. $R_{21} = (E_2, E_1)$
• Classes
  – Predefined sets of entity and relation labels: $L_e = \{\text{Composer}, \text{Author}, \text{Ballet}, \text{Book}\}$, $L_r = \{\text{BalletWrittenBy}, \text{BookWrittenBy}\}$
Example: $E_1$ = Piotr Ilyich Tchaikovsky, $L_{E_1}$ = composer; $E_2$ = The Nutcracker, $L_{E_2}$ = ballet; $R_{21} = (E_2, E_1)$, $L_{R_{21}}$ = BalletWrittenBy
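The definitions above can be sketched as a tiny data model: entity boundaries are given as token spans, and one relation variable R_ij is created for every ordered pair of entities. The sentence text (with the verb "wrote"), the `Entity` class and `build_variables` are hypothetical illustrations; the slide only gives the two entity strings.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entity:
    index: int    # i in E_i
    tokens: tuple  # the words the entity covers (boundaries are given)


def build_variables(words, spans):
    """Create entities E_1..E_n from given spans and one relation R_ij per ordered pair."""
    entities = [
        Entity(i, tuple(words[begin:stop]))
        for i, (begin, stop) in enumerate(spans, start=1)
    ]
    relations = {(a.index, b.index): (a, b) for a, b in permutations(entities, 2)}
    return entities, relations


words = "Piotr Ilyich Tchaikovsky wrote The Nutcracker".split()
entities, relations = build_variables(words, [(0, 3), (4, 6)])
```

Here `relations[(2, 1)]` is the pair (E_2, E_1), the relation the slide labels BalletWrittenBy.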
22
Constraints
Indicator variables (each takes values in $\{0,1\}$):
$x_{E_i,e} = 1$ iff entity $E_i$ was labeled as $e$, $e \in L_e$
$x_{R_{ij},r} = 1$ iff relation $R_{ij}$ was labeled as $r$, $r \in L_r$
$x_{R_{ij},r,E_i,e} = 1$ iff relation $R_{ij}$ was labeled as $r$ and it takes the entity with the label $e$ as its first argument
All indicator variables are 0 otherwise.
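To see how many indicator variables the formulation needs, they can be enumerated directly. This is a sketch: the tuple encoding and `indicator_variables` are hypothetical, and a second-argument variable x_{R_ij,r,E_j,e} is assumed to exist alongside the first-argument one defined on the slide.

```python
from itertools import permutations


def indicator_variables(entities, entity_labels, relation_labels):
    """Enumerate the 0/1 indicator variables as tuples.

    ("E", i, e)        -> x_{E_i,e}
    ("R", i, j, r)     -> x_{R_ij,r}
    ("R1", i, j, r, e) -> x_{R_ij,r,E_i,e}  (first argument labeled e)
    ("R2", i, j, r, e) -> x_{R_ij,r,E_j,e}  (second argument, assumed analogous)
    """
    variables = [("E", i, e) for i in entities for e in entity_labels]
    for i, j in permutations(entities, 2):  # one relation per ordered pair
        for r in relation_labels:
            variables.append(("R", i, j, r))
            for e in entity_labels:
                variables.append(("R1", i, j, r, e))
                variables.append(("R2", i, j, r, e))
    return variables


L_E = ["Composer", "Author", "Ballet", "Book"]
L_R = ["BalletWrittenBy", "BookWrittenBy"]
```

With these four entity labels and two relation labels, two entities need 44 variables and three need 120: the count grows quadratically with the number of entities.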
23
Constraints
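Example: $R_{12} = (E_1, E_2)$ with $L_{E_1}$ = ballet, $L_{E_2}$ = composer, $L_{R_{12}}$ = BalletWrittenBy
$x_{E_1,\text{ballet}} = 1$, $x_{E_1,\text{book}} = 0$
$x_{R_{12},\text{BalletWrittenBy}} = 1$, $x_{R_{12},\text{BookWrittenBy}} = 0$
$x_{R_{12},\text{BalletWrittenBy},E_1,\text{ballet}} = 1$, $x_{R_{12},\text{BalletWrittenBy},E_1,\text{book}} = 0$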
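The constraints illustrated by this example can be written in general form over the indicator variables. This is a sketch: the second-argument variables $x_{R_{ij},r,E_j,e}$ are assumed to be defined analogously to the first-argument ones.

```latex
\begin{aligned}
&\sum_{e \in L_e} x_{E_i,e} = 1 && \forall i\\
&\sum_{r \in L_r} x_{R_{ij},r} = 1 && \forall i \neq j\\
&x_{R_{ij},r} = \sum_{e \in L_e} x_{R_{ij},r,E_i,e}, \quad
 x_{E_i,e} = \sum_{r \in L_r} x_{R_{ij},r,E_i,e} && \forall i \neq j,\ r,\ e\\
&x_{R_{ij},r} = \sum_{e \in L_e} x_{R_{ij},r,E_j,e}, \quad
 x_{E_j,e} = \sum_{r \in L_r} x_{R_{ij},r,E_j,e} && \forall i \neq j,\ r,\ e\\
&x \in \{0,1\}
\end{aligned}
```

The first two lines say that every entity and every relation gets exactly one label. The last two tie each joint variable to both the relation's label and its argument's label, so for 0/1 values $x_{R_{ij},r,E_i,e} = 1$ exactly when both $x_{R_{ij},r}$ and $x_{E_i,e}$ are 1.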
24
Constraints
• Each entity or relation can be labeled only with one label
• Assignment to each entity or relation variable is consistent with the assignments to its neighboring variables
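The two kinds of constraints can be checked on a candidate 0/1 assignment with a short sketch. The dictionary encoding, the tuple keys and `satisfies_constraints` are hypothetical, and second-argument variables ("R2") are assumed analogous to the first-argument ones. The check covers only "one label" and "consistent with neighbors". Whether BalletWrittenBy may take a ballet as its first argument is handled by the constraint cost in the objective function, not here.

```python
def satisfies_constraints(x, entities, relations, entity_labels, relation_labels):
    """Check a 0/1 assignment against the one-label and consistency constraints.

    x maps variable tuples to 0/1 (missing keys count as 0); keys follow
    ("E", i, e), ("R", i, j, r), ("R1", i, j, r, e), ("R2", i, j, r, e).
    relations is an iterable of ordered entity pairs (i, j).
    """
    def val(*key):
        return x.get(key, 0)

    # Each entity takes exactly one label.
    if any(sum(val("E", i, e) for e in entity_labels) != 1 for i in entities):
        return False
    for i, j in relations:
        # Each relation takes exactly one label.
        if sum(val("R", i, j, r) for r in relation_labels) != 1:
            return False
        for tag, arg in (("R1", i), ("R2", j)):
            # Joint variables agree with the relation's label ...
            for r in relation_labels:
                if val("R", i, j, r) != sum(val(tag, i, j, r, e) for e in entity_labels):
                    return False
            # ... and with the argument entity's label.
            for e in entity_labels:
                if val("E", arg, e) != sum(val(tag, i, j, r, e) for r in relation_labels):
                    return False
    return True


LE = ["Ballet", "Book", "Composer", "Author"]
LR = ["BalletWrittenBy", "BookWrittenBy"]
# The example assignment: E1 = ballet, E2 = composer, R12 = BalletWrittenBy.
example = {
    ("E", "E1", "Ballet"): 1,
    ("E", "E2", "Composer"): 1,
    ("R", "E1", "E2", "BalletWrittenBy"): 1,
    ("R1", "E1", "E2", "BalletWrittenBy", "Ballet"): 1,
    ("R2", "E1", "E2", "BalletWrittenBy", "Composer"): 1,
}
```

Relabeling E1 as book while keeping the joint variable for "first argument is a ballet" makes the check fail, because the joint variable no longer agrees with the entity's label.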
25
Objective Function
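• Assignment cost
  – Cost of deviating from the assignments given by the classifiers
  – e.g. $c_v(l) = -\log p$, where $p$ is the probability the classifier assigns to label $l$ for variable $v$
• Constraint cost
  – Cost of breaking constraints between neighboring variables (a relation and one of its argument entities)
  – e.g. $d^1(f_{R_{ij}}, f_{E_i}) = 0$ if $(f_{R_{ij}}, f_{E_i}) \in C^1$, and $\infty$ otherwise; $C^1$ is the set of allowed (relation label, first-argument label) pairs, and $d^2$, $C^2$ are defined analogously for the second argument
Objective:
$$\min_f C(f) = \min_f \Big[ \sum_{v \in V} c_v(f_v) + \sum_{R_{ij} \in R} \big( d^1(f_{R_{ij}}, f_{E_i}) + d^2(f_{R_{ij}}, f_{E_j}) \big) \Big]$$
Examples: $c_{E_1}(\text{ballet}) = -\log(0.8)$, $d^1(\text{BalletWrittenBy}, \text{ballet}) = 0$, $d^2(\text{BalletWrittenBy}, \text{author}) = \infty$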
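The objective can be evaluated and minimized by brute force on a toy problem. This is a sketch: apart from $c_{E_1}(\text{ballet}) = -\log(0.8)$, which matches the slide's example, the probabilities are hypothetical, and the names `PROBS`, `ARGS`, `C1`, `C2`, `total_cost` and `minimize` are illustrative. A real system would hand the equivalent integer program to a solver instead of enumerating.

```python
import itertools
import math

# Hypothetical classifier probabilities; only P(E1 = Ballet) = 0.8 is from the slide.
PROBS = {
    "E1": {"Ballet": 0.8, "Book": 0.2},
    "E2": {"Composer": 0.4, "Author": 0.6},
    "R12": {"BalletWrittenBy": 0.7, "BookWrittenBy": 0.3},
}
ARGS = {"R12": ("E1", "E2")}
# C1 / C2: allowed (relation label, first / second argument label) pairs.
C1 = {("BalletWrittenBy", "Ballet"), ("BookWrittenBy", "Book")}
C2 = {("BalletWrittenBy", "Composer"), ("BookWrittenBy", "Author")}


def assignment_cost(v, label):
    """c_v(l) = -log p, with p the classifier's probability of label l for v."""
    return -math.log(PROBS[v][label])


def d(allowed, pair):
    """Constraint cost: 0 if the pair is allowed, infinity otherwise."""
    return 0.0 if pair in allowed else math.inf


def total_cost(f):
    """C(f): assignment costs plus d^1 and d^2 for every relation."""
    cost = sum(assignment_cost(v, f[v]) for v in f)
    for r, (i, j) in ARGS.items():
        cost += d(C1, (f[r], f[i])) + d(C2, (f[r], f[j]))
    return cost


def minimize():
    """Return the assignment f with the smallest C(f) by enumeration."""
    names = list(PROBS)
    candidates = (
        dict(zip(names, labels))
        for labels in itertools.product(*(PROBS[v] for v in names))
    )
    return min(candidates, key=total_cost)
```

Because a sum of $-\log p$ terms is minimal exactly when the product of the probabilities is maximal, the minimum is the most probable consistent labeling; here it keeps E2 = Composer even though the E2 classifier alone prefers Author.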
27
Useful Property
ILP is NP-hard in general, but some instances can be solved in polynomial time, e.g. when the LP relaxation is guaranteed to have an integral optimal solution.
29
Viterbi
Shortest path
30
Viterbi
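$$\min \sum_{i=0}^{n} \sum_{y',y} -\log M_i(y',y)\, x_{i,y',y}$$
s.t.
$$\sum_{y \in [0,m-1]} x_{0,\text{start},y} = 1$$
$$\sum_{y \in [0,m-1]} x_{n,y,\text{end}} = 1$$
$$\sum_{y'} x_{i,y',y} = \sum_{y''} x_{i+1,y,y''} \quad \forall i \in [0,n-1],\ y \in [0,m-1]$$
$$x_{i,y',y} \in \{0,1\}$$
– $x_{i,y',y} = 1$ iff the path uses the edge between $v_{i-1,y'}$ and $v_{i,y}$; for $i = 0$ the edge leaves the start node ($y' = \text{start}$), and for $i = n$ it enters the end node ($y = \text{end}$)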
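Since the constraints above describe a start-to-end path in a layered graph, the same optimum can be found without an ILP solver by relaxing the edges layer by layer. Doing this with edge costs $-\log M$ is exactly Viterbi decoding. This is a sketch: the potentials are hypothetical positive numbers, and the argument layout and `viterbi_shortest_path` are assumptions.

```python
import math


def viterbi_shortest_path(start, trans, end):
    """Most probable label sequence as a shortest path in a layered graph.

    start[y]:        M_0(start, y), edge start -> v_{0,y}
    trans[k][y'][y]: M_{k+1}(y', y), edge v_{k,y'} -> v_{k+1,y}
    end[y]:          M_n(y, end), edge v_{n-1,y} -> end
    Every edge costs -log M, so the shortest path maximizes the product of M.
    """
    m = len(start)
    dist = [-math.log(start[y]) for y in range(m)]  # cheapest path to v_{0,y}
    back = []                                       # back-pointers per layer
    for M in trans:  # layers are already in topological order
        new_dist, pointers = [], []
        for y in range(m):
            prev = min(range(m), key=lambda yp: dist[yp] - math.log(M[yp][y]))
            new_dist.append(dist[prev] - math.log(M[prev][y]))
            pointers.append(prev)
        dist = new_dist
        back.append(pointers)
    last = min(range(m), key=lambda y: dist[y] - math.log(end[y]))
    cost = dist[last] - math.log(end[last])
    path = [last]
    for pointers in reversed(back):  # follow back-pointers to position 0
        path.append(pointers[path[-1]])
    return path[::-1], cost
```

For example, with `start = [0.6, 0.4]`, `trans = [[[0.7, 0.3], [0.4, 0.6]], [[0.45, 0.55], [0.2, 0.8]]]` and `end = [1.0, 1.0]`, the best sequence is `[0, 0, 1]` with product 0.6 · 0.7 · 0.55 = 0.231. Solving the LP relaxation would return the same path, because shortest-path LPs have integral optimal solutions.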
31
Phrase Identification
32
Phrase Identification
33
Phrase Identification
$p_i$ is the probability that pair $i$ is a phrase
$x$: $(t_1,t_3), (t_1,t_5), (t_1,t_6), (t_2,t_3), (t_2,t_5), (t_2,t_6), (t_4,t_5), (t_4,t_6)$
$$\min \sum_{i=1}^{n} -p_i\, x_i$$
s.t. shortest path constraints
$x_i \in \{0,1\}$
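One way to realize "shortest path constraints" for phrase selection is a path over token boundaries: each candidate pair becomes an edge with cost $-p_i$, and zero-cost edges let tokens stay outside every phrase. Any start-to-end path then selects non-overlapping phrases. This is a sketch under assumptions: the graph construction, the reading of $(t_a, t_b)$ as a phrase covering tokens a..b inclusive, the probabilities and `select_phrases` are all hypothetical. The code solves the shortest path directly by dynamic programming over the DAG.

```python
def select_phrases(n_tokens, candidates):
    """Choose non-overlapping phrases by a shortest path over token boundaries.

    candidates maps (a, b) -> p: the pair (t_a, t_b) opens a phrase at token a
    and closes it at token b (1-based, inclusive); p is its phrase probability.
    Node k is the boundary after token k. Phrase (a, b) is an edge a-1 -> b with
    cost -p; an edge k-1 -> k with cost 0 leaves token k outside any phrase.
    """
    dist = [0.0] + [float("inf")] * n_tokens
    choice = [None] * (n_tokens + 1)  # (predecessor node, phrase or None)
    for node in range(1, n_tokens + 1):  # boundaries in topological order
        dist[node], choice[node] = dist[node - 1], (node - 1, None)
        for (a, b), p in candidates.items():
            if b == node and dist[a - 1] - p < dist[node]:
                dist[node], choice[node] = dist[a - 1] - p, (a - 1, (a, b))
    phrases, node = [], n_tokens
    while node > 0:  # walk the shortest path backwards
        node, phrase = choice[node]
        if phrase is not None:
            phrases.append(phrase)
    return sorted(phrases), -dist[n_tokens]


# The candidate pairs from the slide with hypothetical probabilities.
CANDIDATES = {(1, 3): 0.6, (1, 5): 0.3, (1, 6): 0.2, (2, 3): 0.5,
              (2, 5): 0.4, (2, 6): 0.1, (4, 5): 0.7, (4, 6): 0.3}
```

With these numbers, `select_phrases(6, CANDIDATES)` picks (t_1, t_3) and (t_4, t_5), with total probability 1.3.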
35
Experiments
[Figure: results for the approaches Separate, E -> R, R -> E, E <-> R and Omniscient; each approach is shown with bars labeled E, R and I]
36
Experiments
37
Experiments
• 5,336 entities
• 19,048 pairs of entities
• 1,437 sentences
• Running time < 30 sec on a Pentium III 800 MHz
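A quick calculation puts these dataset figures in perspective. It assumes, without the slide saying so, that each sentence is solved as a separate inference problem. Under that assumption, a single problem is small on average.

```python
# Dataset sizes from the slide.
entities, pairs, sentences = 5336, 19048, 1437

# Average size of one inference problem, assuming every sentence is solved
# as a separate ILP (an assumption, not stated on the slide).
entities_per_sentence = entities / sentences
pairs_per_sentence = pairs / sentences
print(f"{entities_per_sentence:.2f} entities, {pairs_per_sentence:.2f} pairs per sentence")
# prints: 3.71 entities, 13.26 pairs per sentence
```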
39
Discussion
• Guarantees optimality
• Supports correct decisions by imposing constraints
• LP solvers are available
• Not scalable
  – CPLEX accepts at most $2^{31}$ variables and constraints: ~46,000 entities, since the number of relation variables grows quadratically with the number of entities
  – The student edition accepts only 500 (~20 entities) =)
• No feedback to extractors
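The entity counts above follow from the quadratic growth of relation variables. If the model needs roughly one variable per entity pair, a limit of V variables allows about √V entities. This back-of-the-envelope sketch (`max_entities` is a hypothetical helper) ignores the per-label variables, which would lower the bound further.

```python
from math import isqrt


def max_entities(variable_limit):
    """Largest n with n * n <= variable_limit.

    Assumes roughly one variable per entity pair; the per-label variables of the
    full formulation would lower the bound further.
    """
    return isqrt(variable_limit)


print(max_entities(2**31))  # 46340
print(max_entities(500))    # 22
```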
40
References
• Dan Roth and Wen-tau Yih: A Linear Programming Formulation for Global Inference in Natural Language Tasks. CoNLL 2004.
• Dan Roth and Wen-tau Yih: Global Inference for Entity and Relation Identification via a Linear Programming Formulation. In: Introduction to Statistical Relational Learning, 2007.