Random Walk Inference Over Knowledge Bases and...
Transcript of Random Walk Inference Over Knowledge Bases and...
Random Walk Inference Over Knowledge Bases and Text
11-805 class presentationMatt Gardner
work done by: Ni Lao, Matt Gardner, and collaborators1
So we do inference
- Predict missing facts given what we know
- Lots of ways to do this, today we’ll talk about random walk inference, or the Path Ranking Algorithm (PRA)
11
So we do inference
- Predict missing facts given what we know
- Lots of ways to do this, today we’ll talk about random walk inference, or the Path Ranking Algorithm (PRA)
- [Lao, Mitchell, Cohen, EMNLP 2011]
12
Inference
If: x1 competes with(x1,x2)
x2 economic sector (x2, x3)
x3
Then: economic sector (x1, x3)
[Lao et al, EMNLP 2011]
13
Pittsburgh
PennsylvaniaCityInState
Philadelphia Harrisburg
U.S.
DeltaPPG
AtLocation
Atlanta DallasTokyo
Japan
CityLocatedInCountry - Selecting path features
City
Loca
tedI
nCou
ntry
Ohio River
Cincinatti
City
Lies
OnR
iver
StateInCountry
Path type Count
[Lao et al, EMNLP 2011]
14
Pittsburgh
PennsylvaniaCityInState
Philadelphia Harrisburg
U.S.
DeltaPPG
AtLocation
Atlanta DallasTokyo
Japan
City
Loca
tedI
nCou
ntry
Ohio River
Cincinatti
City
Lies
OnR
iver
CityLocatedInCountry - Selecting path features
StateInCountry
Path type Count
[Lao et al, EMNLP 2011]
15
Pittsburgh
PennsylvaniaCityInState
Philadelphia Harrisburg
U.S.
DeltaPPG
AtLocation
Atlanta DallasTokyo
Japan
City
Loca
tedI
nCou
ntry
Ohio River
Cincinatti
City
Lies
OnR
iver
CityLocatedInCountry - Selecting path features
StateInCountry
Path type CountCityInState, StateInCountry 1
[Lao et al, EMNLP 2011]
16
Pittsburgh
PennsylvaniaCityInState
Philadelphia Harrisburg
U.S.
DeltaPPG
AtLocation
Atlanta DallasTokyo
Japan
City
Loca
tedI
nCou
ntry
Ohio River
Cincinatti
City
Lies
OnR
iver
CityLocatedInCountry - Selecting path features
StateInCountry
Path type CountCityInState, StateInCountry 1CityInState, CityInState-1, CityLocatedInCountry 1
[Lao et al, EMNLP 2011]
17
Pittsburgh
PennsylvaniaCityInState
Philadelphia Harrisburg
U.S.
DeltaPPG
AtLocation
Atlanta DallasTokyo
Japan
City
Loca
tedI
nCou
ntry
Ohio River
Cincinatti
City
Lies
OnR
iver
Path type CountCityInState, StateInCountry 1CityInState, CityInState-1, CityLocatedInCountry 2
CityLocatedInCountry - Selecting path features
StateInCountry
[Lao et al, EMNLP 2011]
18
Pittsburgh
PennsylvaniaCityInState
Philadelphia Harrisburg
U.S.
PPG
AtLocation
Atlanta DallasTokyo
Japan
City
Loca
tedI
nCou
ntry
Ohio River
Cincinatti
City
Lies
OnR
iver
Path type CountCityInState, StateInCountry 1CityInState, CityInState-1, CityLocatedInCountry 2AtLocation-1, AtLocation, CityLocatedInCountry 1
CityLocatedInCountry - Selecting path features
StateInCountry
Delta
[Lao et al, EMNLP 2011]
19
Pittsburgh
PennsylvaniaCityInState
Philadelphia Harrisburg
U.S.
DeltaPPG
AtLocation
Atlanta DallasTokyo
Japan
City
Loca
tedI
nCou
ntry
Ohio River
Cincinatti
City
Lies
OnR
iver
CityLocatedInCountry - Selecting path features
StateInCountry
Path type CountCityInState, StateInCountry 2CityInState, CityInState-1, CityLocatedInCountry 24AtLocation-1, AtLocation, CityLocatedInCountry 10CityLiesOnRiver, CityLiesOnRiver-1, CityLocatedInCountry 1… …
[Lao et al, EMNLP 2011]
20
Pittsburgh
PennsylvaniaCityInState
Philadelphia
U.S.
DeltaPPG
AtLocation
Atlanta DallasTokyo
Japan
City
Loca
tedI
nCou
ntry
Ohio River
Cincinatti
City
Lies
OnR
iver
CityLocatedInCountry - Selecting path features
StateInCountry
Path type CountCityInState, StateInCountry 3,892CityInState, CityInState-1, CityLocatedInCountry 234AtLocation-1, AtLocation, CityLocatedInCountry 1,543CityLiesOnRiver, CityLiesOnRiver-1, CityLocatedInCountry 123… …
Harrisburg
[Lao et al, EMNLP 2011]
21
Pittsburgh
PennsylvaniaCityInState
Philadelphia Harrisburg
U.S.
DeltaPPG
AtLocation
Atlanta DallasTokyo
Japan
City
Loca
tedI
nCou
ntry
Ohio River
Cincinatti
City
Lies
OnR
iver
Path type CountCityInState, StateInCountry 8,172CityInState, CityInState-1, CityLocatedInCountry 2,234AtLocation-1, AtLocation, CityLocatedInCountry 5,273CityLiesOnRiver, CityLiesOnRiver-1, CityLocatedInCountry 298… …
CityLocatedInCountry - Selecting path features
Select the most frequent path types, and keep them as features in the model
StateInCountry
[Lao et al, EMNLP 2011]
22
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry
Pittsburgh
Feature Value
[Lao et al, EMNLP 2011]
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
23
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry
Pittsburgh
Pennsylvania
CityInS
tate
Feature Value
[Lao et al, EMNLP 2011]
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
24
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry
Pittsburgh
Pennsylvania
CityInS
tate
CityInState-1 CityInState -1
Philadelphia Harrisburg
…(14)
Feature Value
[Lao et al, EMNLP 2011]
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
25
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry
Pittsburgh
Pennsylvania
CityInS
tate
CityInState-1 CityInState -1
Philadelphia Harrisburg
…(14)
U.S.
Feature Value
CityLocatedInCountry
[Lao et al, EMNLP 2011]
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
26
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry 1.0
Pittsburgh
Pennsylvania
CityInS
tate
CityInState-1 CityInState -1
Philadelphia Harrisburg
…(14)
U.S.
Feature Value
CityLocatedInCountry
Pr(U.S. | Pittsburgh, TypedPath)
[Lao et al, EMNLP 2011]
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
27
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry 1.0 AtLocation-1, AtLocation, CityLocatedInCountry
Pittsburgh
Pennsylvania
CityInS
tate
CityInState-1 CityInState -1
Philadelphia Harrisburg
…(14)
U.S.
Feature Value
CityLocatedInCountry
[Lao et al, EMNLP 2011]
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
28
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry 1.0 AtLocation-1, AtLocation, CityLocatedInCountry
Pittsburgh
Pennsylvania
CityInS
tate
CityInState-1 CityInState -1
Philadelphia Harrisburg
…(14)
U.S.
Feature Value
CityLocatedInCountry
DeltaPPG
AtLocation -1
[Lao et al, EMNLP 2011]
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
29
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry 1.0 AtLocation-1, AtLocation, CityLocatedInCountry
Pittsburgh
Pennsylvania
CityInS
tate
CityInState-1 CityInState -1
Philadelphia Harrisburg
…(14)
U.S.
Feature Value
CityLocatedInCountry
DeltaPPG
AtLocation -1 AtLocationAtlanta Dallas
Tokyo
[Lao et al, EMNLP 2011]
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
30
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry 1.0 AtLocation-1, AtLocation, CityLocatedInCountry 0.6
Pittsburgh
Pennsylvania
CityInS
tate
CityInState-1 CityInState -1
Philadelphia Harrisburg
…(14)
U.S.
Feature Value
CityLocatedInCountry
DeltaPPG
AtLocation -1 AtLocationAtlanta Dallas
Tokyo
Japan
City
Loca
tedI
nCou
ntry
[Lao et al, EMNLP 2011]
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
31
Feature = Typed Path CityInState, CityInState-1, CityLocatedInCountry 1.0 AtLocation-1, AtLocation, CityLocatedInCountry 0.6 … …
Pittsburgh
Pennsylvania
CityInS
tate
CityInState-1 CityInState -1
Philadelphia Harrisburg
…(14)
U.S.
Feature Value
CityLocatedInCountry
DeltaPPG
AtLocation -1 AtLocationAtlanta Dallas
Tokyo
Japan
City
Loca
tedI
nCou
ntry
[Lao et al, EMNLP 2011]
This is a row in a feature matrix! Use standard logistic regression
CityLocatedInCountry - Computing feature values for (Pittsburgh, USA)
32
PRA Feature Matrix
CityInState, C
ityInState
-1 , CityL
ocatedInCountry
A
tLocation-1 , A
tLocation, CityL
ocatedInCountry
…
…
(Pittsburgh, USA)(Pittsburgh, Japan)…(Tokyo, USA)(Tokyo, Japan)…
1.0 0.6 0.0 0.4 … 0.0 0.2 0.2 0.1 … … … … …
CityLocatedInCountry
33
PRA Feature Matrix
CityInState, C
ityInState
-1 , CityL
ocatedInCountry
A
tLocation-1 , A
tLocation, CityL
ocatedInCountry
…
…
(Pittsburgh, USA)(Pittsburgh, Japan)…(Tokyo, USA)(Tokyo, Japan)…
1.0 0.6 0.0 0.4 … 0.0 0.2 0.2 0.1 … … … … …
CityLocatedInCountry
34
Large data sets?
PRA Feature Matrix
CityInState, C
ityInState
-1 , CityL
ocatedInCountry
A
tLocation-1 , A
tLocation, CityL
ocatedInCountry
…
…
(Pittsburgh, USA)(Pittsburgh, Japan)…(Tokyo, USA)(Tokyo, Japan)…
1.0 0.6 0.0 0.4 … 0.0 0.2 0.2 0.1 … … … … …
CityLocatedInCountry
35
Large data sets?
PRA Feature Matrix
CityInState, C
ityInState
-1 , CityL
ocatedInCountry
A
tLocation-1 , A
tLocation, CityL
ocatedInCountry
…
…
(Pittsburgh, USA)(Pittsburgh, Japan)…(Tokyo, USA)(Tokyo, Japan)…
1.0 0.6 0.0 0.4 … 0.0 0.2 0.2 0.1 … … … … …
CityLocatedInCountry
36
Large data sets?
PRA Feature Matrix
CityInState, C
ityInState
-1 , CityL
ocatedInCountry
A
tLocation-1 , A
tLocation, CityL
ocatedInCountry
…
…
(Pittsburgh, USA)(Pittsburgh, Japan)…(Tokyo, USA)(Tokyo, Japan)…
1.0 0.6 0.0 0.4 … 0.0 0.2 0.2 0.1 … … … … …
CityLocatedInCountry
37
Large data sets?
Implementation
- Small graph: in memory
- Larger graph: GraphChi
- Vertex-centric computation. Go through the graph
sequentially, processing each walk at each vertex
and sending it to the next stop.
41
Implementation
- Small graph: in memory
- Larger graph: GraphChi
- Vertex-centric computation. Go through the graph
sequentially, processing each walk at each vertex
and sending it to the next stop.
- On a cluster: have a graph server
42
Inference over KB plus text
47
- Augment NELL or Freebase with information automatically extracted from text
Inference over KB plus text
48
- Augment NELL or Freebase with information automatically extracted from text
- From the combined graph, predict new NELL (or Freebase) relations
Inference over KB plus text
49
- Augment NELL or Freebase with information automatically extracted from text
- From the combined graph, predict new NELL (or Freebase) relations
- Basically “aggregate” relation extraction
“Steel City overlooks the Allegheny”“Pittsburgh lies on the Mon”“Pittsburgh sits on the Monongahela”
51
“Steel City overlooks the Allegheny”“Pittsburgh lies on the Mon”“Pittsburgh sits on the Monongahela”
“Steel City”
“Pittsburgh”
“the Allegheny”
“the Mon”
“the Monongahela”52
“Steel City overlooks the Allegheny”“Pittsburgh lies on the Mon”“Pittsburgh sits on the Monongahela”
“Steel City”
“Pittsburgh”
“the Allegheny”
“the Mon”
“overlooks”
“lies on”
“sits on”
“the Monongahela”53
“Steel City overlooks the Allegheny”“Pittsburgh lies on the Mon”“Pittsburgh sits on the Monongahela”
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
“the Allegheny”
“the Mon”
“overlooks”
“lies on”
“sits on”
“the Monongahela”54
“Steel City overlooks the Allegheny”“Pittsburgh lies on the Mon”“Pittsburgh sits on the Monongahela”
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
“the Allegheny”
“the Mon” canReferTo
canReferTo“overlooks”
“lies on”
“sits on”
“the Monongahela”canReferTo
55
canReferTo
Relation: cityLiesOnRiver
Path: canReferTo-1, “lies on”, canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“overlooks”
“lies on”
“sits on”
“the Monongahela”canReferTo
KB + Text
57
Relation: cityLiesOnRiver
Path: canReferTo-1, “lies on”, canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“overlooks”
“lies on”
“sits on”
“the Monongahela”canReferTo
KB + Text
58
Relation: cityLiesOnRiver
Path: canReferTo-1, “lies on”, canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“overlooks”
“lies on”
“sits on”
“the Monongahela”canReferTo
- Large data problem: verb forms are sparse!
- Can clustering help? [Gardner et al., EMNLP 2013]
- “lies on” -> C1- “sits on” -> C1- “overlooks” -> C2
KB + Text
59
Relation: cityLiesOnRiver
Path: canReferTo-1, “C1”, canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“C2”
“C1”
“C1”
“the Monongahela”canReferTo
KB + Clustered Text
61
Relation: cityLiesOnRiver
Path: canReferTo-1, “C1”, canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“C2”
“C1”
“C1”
“the Monongahela”canReferTo
KB + Clustered Text
62
Relation: cityLiesOnRiver
Path: canReferTo-1, “C1”, canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“C2”
“C1”
“C1”
“the Monongahela”canReferTo
KB + Clustered Text
63
Relation: cityLiesOnRiver
Path: canReferTo-1, “C1”, canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“C2”
“C1”
“C1”
“the Monongahela”canReferTo
- Much better, but still can be sparse
- Use vector space similarity directly [Gardner et al., EMNLP 2014]
- “lies on” -> [.4, .3, -.1, .5]- “sits on” -> [.35, .3, -.2, .4]- “overlooks” -> [.2, .4, -.05, .2]
KB + Clustered Text
64
Relation: cityLiesOnRiver
Path: canReferTo-1, “lies on”: [.4, .3, -.1, .5], canReferTo
KB + Text Vectors
65
Relation: cityLiesOnRiver
Path: canReferTo-1, “lies on”: [.4, .3, -.1, .5], canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“overlooks”: [.2, .4, -.05, .2]
“lies on”: [.4, .3, -.1, .5]
“sits on”: [.35, .3, -.2, .4]
“the Monongahela”canReferTo
KB + Text Vectors
66
Relation: cityLiesOnRiver
Path: canReferTo-1, “lies on”: [.4, .3, -.1, .5], canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“overlooks”: [.2, .4, -.05, .2]
“lies on”: [.4, .3, -.1, .5]
“sits on”: [.35, .3, -.2, .4]
“the Monongahela”canReferTo
KB + Text Vectors
67
Relation: cityLiesOnRiver
Path: canReferTo-1, “lies on”: [.4, .3, -.1, .5], canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“overlooks”: [.2, .4, -.05, .2]
“lies on”: [.4, .3, -.1, .5]
“sits on”: [.35, .3, -.2, .4]
“the Monongahela”canReferTo
KB + Text Vectors
68
Relation: cityLiesOnRiver
Path: canReferTo-1, “lies on”: [.4, .3, -.1, .5], canReferTo
Pittsburgh
Allegheny River
Monongahela River
“Steel City”
“Pittsburgh”
canR
efer
To
canReferTo
“the Allegheny”
“the Mon” canReferTo
canReferTo“overlooks”: [.2, .4, -.05, .2]
“lies on”: [.4, .3, -.1, .5]
“sits on”: [.35, .3, -.2, .4]
“the Monongahela”canReferTo
KB + Text Vectors
69