Poincaré Embeddings for Learning Hierarchical Representations
Maximilian Nickel, Douwe Kiela
Facebook AI Research
Presented by Ke (Becky) Bai
Nov. 30th, 2018
Introduction
• Symbolic data often exhibit a latent hierarchy (tree-like structure, power-law-distributed data).
• The goal is to capture similarity and hierarchy simultaneously in the embedding space via unsupervised learning.
• The paper introduces a novel approach for learning hierarchical representations by embedding entities into hyperbolic space.
Motivations

The distances between symbolic data in the embedding space should reflect their semantic similarity.
• The number of nodes in a tree with branching factor b > 1 grows exponentially with depth.
• In hyperbolic space, disc area and circle length likewise grow exponentially with the radius.
Embedding Space

Poincaré Ball
The Poincaré ball model of hyperbolic space is the Riemannian manifold (B^d, g_x), where

    B^d = {x ∈ R^d | ‖x‖ < 1}    (1)

is the d-dimensional open unit ball (‖·‖ denotes the Euclidean norm), and

    g_x = ( 2 / (1 − ‖x‖²) )² g^E,    (2)

where x ∈ B^d and g^E denotes the Euclidean metric tensor. Let k_x = ( 2 / (1 − ‖x‖²) )².
Distance
The distance between points θ, x ∈ B^d is

    d(θ, x) = arcosh( 1 + 2 ‖θ − x‖² / ( (1 − ‖θ‖²)(1 − ‖x‖²) ) ).    (3)
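Equation (3) translates directly into code. A minimal NumPy sketch (the function name and the eps guard against division by zero at the boundary are my own):

```python
import numpy as np

def poincare_distance(theta, x, eps=1e-9):
    """Hyperbolic distance between two points in the open unit ball, Eq. (3)."""
    sq_diff = np.dot(theta - x, theta - x)
    denom = (1.0 - np.dot(theta, theta)) * (1.0 - np.dot(x, x))
    # eps keeps the denominator positive for points numerically on the boundary.
    return np.arccosh(1.0 + 2.0 * sq_diff / (denom + eps))
```

For a point at Euclidean radius r from the origin this reduces to 2 artanh(r), so distances blow up near the boundary — exactly the property that gives the ball room for exponentially growing trees.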
Optimization

    Θ′ ← argmin_Θ L(Θ)  s.t.  ∀θ_i ∈ Θ : ‖θ_i‖ < 1.    (4)

    θ_{t+1} = R_{θ_t}( −η_t ∇_R L(θ_t) )    (5)

where R_{θ_t} denotes the retraction onto B^d at θ_t and η_t denotes the learning rate at time t.
The Riemannian gradient ∇_R can be derived from the Euclidean gradient ∇_E by rescaling with the inverse of the Poincaré ball metric tensor, i.e., ∇_R = k_θ^{−1} ∇_E, where

    ∇_E = ( ∂L(θ) / ∂d(θ, x) ) ( ∂d(θ, x) / ∂θ ).    (6)

Combining these, the update becomes

    θ_{t+1} ← proj( θ_t − η_t ( (1 − ‖θ_t‖²)² / 4 ) ∇_E ).    (7)
Optimization in more detail

    ∂d(θ, x)/∂θ = ( 4 / ( β √(γ² − 1) ) ) ( ( ‖x‖² − 2⟨θ, x⟩ + 1 ) / α² · θ − x/α ),    (8)

where α = 1 − ‖θ‖², β = 1 − ‖x‖², and γ = 1 + (2 / (αβ)) ‖θ − x‖².

    proj(θ) = { θ/‖θ‖ − ε   if ‖θ‖ ≥ 1    (9)
              { θ           otherwise,

which keeps the embeddings inside the unit ball.
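One Riemannian SGD step combining Eqs. (7)–(9) can be sketched as follows (NumPy; function names are mine, and proj is implemented as the common "rescale to just inside the ball" reading of Eq. (9)):

```python
import numpy as np

def dist_grad(theta, x):
    """Euclidean gradient of d(theta, x) with respect to theta, Eq. (8)."""
    alpha = 1.0 - np.dot(theta, theta)
    beta = 1.0 - np.dot(x, x)
    gamma = 1.0 + (2.0 / (alpha * beta)) * np.dot(theta - x, theta - x)
    coef = 4.0 / (beta * np.sqrt(gamma ** 2 - 1.0))
    return coef * ((np.dot(x, x) - 2.0 * np.dot(theta, x) + 1.0) / alpha ** 2 * theta
                   - x / alpha)

def proj(theta, eps=1e-5):
    """Eq. (9): pull points that left the unit ball back just inside it."""
    norm = np.linalg.norm(theta)
    return (1.0 - eps) * theta / norm if norm >= 1.0 else theta

def rsgd_step(theta, x, dloss_ddist, lr=0.01):
    """Eq. (7): rescale the Euclidean gradient by (1 - |theta|^2)^2 / 4, step, retract."""
    grad_e = dloss_ddist * dist_grad(theta, x)  # chain rule of Eq. (6)
    return proj(theta - lr * (1.0 - np.dot(theta, theta)) ** 2 / 4.0 * grad_e)
```

The gradient can be checked against finite differences of Eq. (3), which is a useful sanity test when implementing Eq. (8) by hand.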
Comparisons
u and v are embedding vectors, analogous to θ and x above.

Euclidean Distance

    d(u, v) = ‖u − v‖²

Translational Distance

    d(u, v) = ‖u − v + r‖²

where r is a learned global translation vector designed for asymmetric data.
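The translation makes the baseline distance asymmetric, which is the point for modeling asymmetric relations. A two-line sketch (the function name is mine):

```python
import numpy as np

def translational_distance(u, v, r):
    """Squared distance after a learned global translation r; asymmetric in u and v."""
    diff = u - v + r
    return np.dot(diff, diff)
```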
Application 1: Embedding Taxonomies

Let D = {(u, v)} be the set of observed hypernymy relations between noun pairs from WordNet. The loss function is

    Σ_{(u,v)∈D} log ( e^{−d(u,v)} / Σ_{v′∈N(u)} e^{−d(u,v′)} ),

where N(u) = {v′ | (u, v′) ∉ D} ∪ {v} is the set of negative examples for u (including the positive example v).
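One term of this objective with sampled negatives might look like the following (a NumPy sketch; the names and the negative-sampling interface are my own, and the log-softmax is negated so that smaller is better, matching the argmin in Eq. (4)):

```python
import numpy as np

def poincare_distance(u, v):
    sq = np.dot(u - v, u - v)
    return np.arccosh(1.0 + 2.0 * sq / ((1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))))

def pair_loss(emb, u, v, neg_ids):
    """Negative log-softmax over the positive v and sampled negatives for one (u, v) pair."""
    d_pos = poincare_distance(emb[u], emb[v])
    # N(u): the positive v plus sampled non-related nodes.
    d_all = np.array([d_pos] + [poincare_distance(emb[u], emb[n]) for n in neg_ids])
    # -log( e^{-d(u,v)} / sum e^{-d(u,v')} ) = d_pos + log sum e^{-d}
    return d_pos + np.log(np.sum(np.exp(-d_all)))
```

Minimizing this pulls related pairs together while pushing sampled unrelated pairs apart.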
Application 1: Results
Application 2: Network Embeddings

Let D = {(u, v)} represent co-authorship relations: (u, v) ∈ D if two people co-authored a paper. In this social network, the probability of a co-author edge is

    P((u, v) = 1) = 1 / ( e^{(d(u,v)−r)/t} + 1 ),

where r and t are hyperparameters. The loss is the cross-entropy loss based on this probability.
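This is a Fermi-Dirac-style squashing of distance into a probability. A one-function sketch (default values for r and t are mine, for illustration only):

```python
import numpy as np

def edge_prob(d_uv, r=1.0, t=0.1):
    """P((u, v) = 1): near 1 for pairs much closer than r, near 0 far beyond r."""
    return 1.0 / (np.exp((d_uv - r) / t) + 1.0)
```

The radius r sets the distance at which the probability crosses 0.5, and the temperature t controls how sharply it falls off.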
Application 2: Results

Table 1: Mean average precision for Reconstruction and Link Prediction on network data.

                                          Reconstruction              Link Prediction
Dataset                Dimensionality    10    20    50    100       10    20    50    100
AstroPh                Euclidean       0.376 0.788 0.969 0.989     0.508 0.815 0.946 0.960
(N=18,772; E=198,110)  Poincaré        0.703 0.897 0.982 0.990     0.671 0.860 0.977 0.988
CondMat                Euclidean       0.356 0.860 0.991 0.998     0.308 0.617 0.725 0.736
(N=23,133; E=93,497)   Poincaré        0.799 0.963 0.996 0.998     0.539 0.718 0.756 0.758
GrQc                   Euclidean       0.522 0.931 0.994 0.998     0.438 0.584 0.673 0.683
(N=5,242; E=14,496)    Poincaré        0.990 0.999 0.999 0.999     0.660 0.691 0.695 0.697
HepPh                  Euclidean       0.434 0.742 0.937 0.966     0.642 0.749 0.779 0.783
(N=12,008; E=118,521)  Poincaré        0.811 0.960 0.994 0.997     0.683 0.743 0.770 0.774
Application 3: Lexical Entailment

HyperLex quantifies to what degree X is a type of Y via ratings on a scale of [0, 10], which allows evaluating how well semantic models capture graded lexical entailment.

    score(is-a(u, v)) = −(1 + α(‖v‖ − ‖u‖)) d(u, v)

where α is a hyperparameter controlling the severity of the penalty term ‖v‖ − ‖u‖.

Training procedure
− Train the embeddings on WordNet as in Application 1.
− Use the score above to rank all noun pairs in HYPERLEX.
− Compute Spearman's rank correlation with the ground-truth ranking.
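The score exploits the fact that general terms end up near the origin of the ball, so norms act as a proxy for generality. A sketch (the α value is mine, for illustration):

```python
import numpy as np

def poincare_distance(u, v):
    sq = np.dot(u - v, u - v)
    return np.arccosh(1.0 + 2.0 * sq / ((1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))))

def entailment_score(u, v, alpha=1000.0):
    """score(is-a(u, v)) = -(1 + alpha * (|v| - |u|)) * d(u, v)."""
    return -(1.0 + alpha * (np.linalg.norm(v) - np.linalg.norm(u))) * poincare_distance(u, v)
```

With large α, the sign of the score is dominated by which of the two words sits closer to the origin, so is-a(u, v) scores high exactly when v is the more general term.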
Application 3: Results

Table 2: Spearman's ρ for Lexical Entailment on HyperLex.

       FR     SLQS-Sim  WN-Basic  WN-WuP  WN-LCh  Vis-ID  Euclidean  Poincaré
  ρ  0.283    0.229     0.240     0.214   0.214   0.253   0.389      0.512
Thanks