Prediction in Dynamic Graph Sequences
Emile Richard
CMLA-ENS Cachan & 1000mercis
Supervisors: Th. Evgeniou (INSEAD) and N. Vayatis (CMLA-ENS Cachan)
January 20, 2012
Table of contents
Context: Motivation, Data Description
Problem Formulation: Random Graph Models, Link Prediction Heuristics, Framework
Algorithms: Two-stage Optimization, Joint Optimization in W and S, Variants
Discussion
References
Context
Motivation
From Big Data to Business Decisions
1000mercis: interactive marketing and advertising (emailing, mobile, viral games)
1. Send fewer ads: email is free → easy to overwhelm consumers
2. Make consumers happy: serendipity
3. Act sustainably: avoid long-term fatigue
4. Earn more: up to 5 times!
Prediction in Relational Databases?
- Recommender systems
  - Links: to select recommendations, offline fine-tuning
  - Sales volumes: prepare for or push trends
- Resource allocation: consumers and contributors in UGC [Zhang11], stock management
- Understanding of data through relevant feature extraction
[Figures: weekly log-counts of returning and new buyers over ~300 weeks; marketplace diagrams linking Sellers, Products, Buyers, and Commission]
Similar Problems
- The Netflix Prize: $1M for a 10% improvement in accuracy
- Amazon: 35% of sales generated by recommendations [Linden03]
- CRM optimization: acquisition, cross-selling, churn management, prediction of top-selling items, etc.
Other Web Applications
Similar Problems in Computational Biology*
- Understanding the underlying mechanisms of biological systems
- Inference procedures for analysis of the effects of biological pathways in cancer progression
- Study of the effect of potential drugs/treatments on gene regulatory networks in cancer cells
* After a discussion with Ali Shohaie
Data Description
Case Study
- Data: C-to-C website
- Recommendation newsletters and banners
- Management of promotional assets and pressure on users
| Domain      | Users | Products | Daily sales |
|-------------|-------|----------|-------------|
| Music       | 0.4M  | 60K      | 2K          |
| Books       | 1.2M  | 1.7M     | 18K         |
| Electronics | 0.5M  | 60K      | 2K          |
| Video Games | 0.9M  | 0.2M     | 9K          |
Heterogeneous Domains
[Figures: per-domain density plots (Video Games, Music, Electronic Devices, Books) of log(clustering coefficient), log(degree), log(d(2)/degree), and log(d(3)/d(2)), each on the user side and the product side; joint User × Product degree distributions for Books and for Music]
Problem Formulation
Dynamic Graphs
- Nodes linked by edges that appear over time
- Web applications, economics, biology, drug discovery
  - (Social network users, friendship)
  - (Users and products, purchases or clicks)
  - (Websites, hyperlinks)
  - (Proteins, interaction)
Prediction at Descriptor (macro) and Edge (micro) Levels
- Network effect: cause and symptom of the evolution of node features, e.g. popularity, homophily, centrality, diffusion level
- Simultaneously predict node features and future links
Complex Networks?
- Degrees of freedom ∼ n², n: # nodes
- Latent factors r ≪ n, r: # latent factors
- Intrinsic dimensionality reduced to ∼ rn ≪ n²
- Kepler's Laws of networks
Random Graph Models
- Erdős–Rényi [Bollobas01]: nodes connected with uniform probability. No prediction chance
- Preferential Attachment [Albert02]: reproduces power-law degree distributions. Rich-get-richer
- Block models [Nowicki01]: k blocks or clusters form the structure of the graph. Community structure
- Latent factor model [Hoff02, Krivitsky10]: node latent factors z_i, z_j, pairwise covariate descriptors x_{i,j}

  P(Y | X, Z, θ) = ∏_{i≠j} P(Y_{i,j} | X_{i,j}, Z_i, Z_j, θ)

  log-odds(y_{i,j} = 1 | x_{i,j}, z_i, z_j, α, β) ∝ α − β x_{i,j} + ‖z_i − z_j‖²
Parameter Estimation
Exponential Random Graph Families [Wasserman96]
- Graph z: realization of a random variable Z
- P_θ(Z = z) = exp(θ⊤ω(z) − Ψ(θ))
- θ ∈ R^Q: vector of parameters
- ω: sufficient statistics on the graph z, ω(z) ∈ R^Q
- Ψ: a normalization factor
- Parameter estimation by maximizing the log-likelihood
Link Prediction Heuristics
Nearest Neighbors and Walks
Hypothesis: a graph G is partially observed; we aim to find the hidden edges [Kleinberg07]
- Friends of my friends are likely to be my friends.
- A ∈ {0,1}^{n×n}: the social adjacency matrix
- (A²)_{i,j} = Σ_{k=1}^{n} A_{i,k} A_{k,j} = # paths of length 2 from i to j = # common friends of i and j
- Random walks: take W = D⁻¹A, where D is the diagonal matrix of degrees
- Katz = Σ_{k=1}^{∞} β^k W^k = (I_n − βW)⁻¹ − I_n
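As a concrete sketch of these heuristics (the toy graph and the value of β are illustrative assumptions, not from the talk), the common-neighbor and Katz scores are a few lines of numpy:

```python
import numpy as np

# Toy undirected friendship graph (illustrative)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
n = A.shape[0]

# (A^2)_{i,j} = number of paths of length 2 = number of common friends of i and j
common_friends = A @ A

# Random-walk matrix W = D^{-1} A
W = np.diag(1.0 / A.sum(axis=1)) @ A

# Katz scores: sum_{k>=1} beta^k W^k = (I_n - beta W)^{-1} - I_n,
# which converges since beta < 1 = spectral radius of the stochastic W
beta = 0.5
katz = np.linalg.inv(np.eye(n) - beta * W) - np.eye(n)
```

Pairs not linked in A but with a high `common_friends` or `katz` score are the predicted links.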
Bipartite Graphs of Marketplaces
[Figure: bipartite graph linking users u1–u4 to products p1–p5]
- Who bought this also bought that.
- M ∈ {0,1}^{#users×#products}: transactions
- (M M⊤ M)_{i,j}: number of times product j was purchased by users having purchased the same products as a given user i
- Random walks: apply the unipartite formula to the block matrix
  ( 0   M )
  ( M⊤  0 )
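A minimal sketch of the bipartite heuristic (the transaction matrix below is a made-up toy):

```python
import numpy as np

# Toy user x product transaction matrix (illustrative)
M = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 1]], dtype=float)
n_u, n_p = M.shape

# (M M^T M)_{i,j}: co-purchase score, counting paths
# user i -> product -> co-purchasing user -> product j
scores = M @ M.T @ M

# Unipartite embedding: symmetric block adjacency [[0, M], [M^T, 0]]
A = np.block([[np.zeros((n_u, n_u)), M],
              [M.T, np.zeros((n_p, n_p))]])
```

Odd powers of A reproduce the user-to-product walks (the top-right block of A³ is exactly M M⊤ M), so any unipartite walk-based score applied to A yields its bipartite counterpart.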
Low-Rank
- SVD: A = U diag(σ_i) V⊤
- Define ‖X‖_* = Σ_i σ_i(X) and D_τ(A) = U diag(max(σ_i − τ, 0)) V⊤: the shrinkage operator
- The rank-r matrix closest to A is U diag(σ_1, …, σ_r, 0, …, 0) V⊤
- Fact: argmin_X ½‖X − A‖²_F + τ‖X‖_* = D_τ(A)
[Figure: spy plot of a 60 × 60 block-wise adjacency matrix, nz = 1400]
- Matrix completion [Srebro05, Candes08, Koltchinskii11]: estimate A by minimizing
  ½‖ω(A) − ω(X)‖²₂ + τ‖X‖_*
  for a linear mapping ω: R^{n×n} → R^Q
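The shrinkage operator is itself short to implement; the sketch below checks the stated prox fact numerically on random data (the matrix size and τ are arbitrary choices):

```python
import numpy as np

def shrink(A, tau):
    """Shrinkage operator D_tau: soft-threshold the singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
tau = 1.0
X = shrink(A, tau)   # minimizer of 0.5*||X - A||_F^2 + tau*||X||_*
```

Since the objective is strongly convex, D_τ(A) is its unique minimizer; shrinking also lowers the rank whenever some singular values fall below τ.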
Link Prediction: Statistical and Spectral Properties
- Statistics on the number of triangles and the length of paths in the graph are stable
- Spectral functions [Kunegis09] of the adjacency and stochastic matrices, killing low eigenvalues

If A = U diag(σ_i) V⊤ is the SVD, then U diag(f(σ_i)) V⊤ is called a spectral function.
[Figure: spectral functions f(σ) on [0, 1]: σ², ∝ (1 − βσ)⁻¹ − 1, and max(σ − τ, 0)]
Leading Insight
Link prediction heuristics implicitly suggest that:
1. the graph sequence fits some slowly varying feature map
2. the spectrum of the graphs is regular

Define a regularization formulation of the problem in order to leverage the trade-offs and select the best features.
Obstacle to matrix completion: ω(A) is to be predicted.
Framework
Notations
- Time steps t ∈ {1, 2, …, T}
- Adjacency matrices A_t ∈ {0,1}^{n×n}: the graph sequence
- Feature map ω: R^{n×n} → R^Q, linear (degree, clusters), with Q ≪ n²
- Prediction of A_{T+1}: score matrix S ∈ R^{n×n}
Assumptions
1. Stationarity of successive feature vectors:
   ∃f: R^Q → R^Q, ∀t, ω(A_{t+1}) = f(ω(A_t)) + ε_t
2. Simplicity of S:
   - S low rank [Srebro05]
   - Penalize the trace norm ‖S‖_*
Quantities to control
1. Features predictor:
   J₁(f) = Σ_{t=1}^{T−1} ℓ(ω(A_{t+1}), f(ω(A_t))) + κ‖f‖_H
2. Predicted features matching the predicted graph features (coupling term):
   J₂(f, S) = ℓ(ω(S), f(ω(A_T)))
3. Penalty on S:
   J₃(S) = τ‖S‖_*
Convex Optimization Problem
Let

  X = [ω(A₁)⊤; …; ω(A_{T−1})⊤], Y = [ω(A₂)⊤; …; ω(A_T)⊤] ∈ R^{(T−1)×Q}

We take linear predictors, f(ω) = ω⊤W, and define the convex objective

  L := J₁ + J₂ + J₃ = ½‖XW − Y‖²_F + (κ/2)‖W‖²_F + ½‖ω(A_T)⊤W − ω(S)⊤‖²₂ + τ‖S‖_*
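A direct transcription of this objective (the feature map ω, the data, and the hyper-parameters below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q, T = 10, 4, 6
kappa, tau = 0.1, 0.1

B = rng.standard_normal((n * n, Q))   # hypothetical linear feature map
def omega(A):                         # omega: R^{n x n} -> R^Q
    return A.reshape(-1) @ B

A_seq = [(rng.random((n, n)) < 0.2).astype(float) for _ in range(T)]
X = np.stack([omega(A) for A in A_seq[:-1]])   # (T-1) x Q past features
Y = np.stack([omega(A) for A in A_seq[1:]])    # (T-1) x Q next features

def objective(S, W):
    """L(S, W) = J1 + J2 + J3 with the linear predictor f(omega) = omega^T W."""
    j1 = (0.5 * np.linalg.norm(X @ W - Y, 'fro') ** 2
          + 0.5 * kappa * np.linalg.norm(W, 'fro') ** 2)
    j2 = 0.5 * np.linalg.norm(omega(A_seq[-1]) @ W - omega(S)) ** 2
    j3 = tau * np.linalg.norm(S, 'nuc')
    return j1 + j2 + j3
```

Each term is convex in (S, W) jointly, which is what the optimization algorithms of the next section exploit.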
Algorithms
Optimization Strategies
Goal: minimize L(S, W)
1. Two-stage optimization
2. Joint optimization in W and S
3. Variant 1: graph regularization
4. Variant 2: sparsity constraint
Two-stage Optimization [Richard10]
- Solve Ŵ := argmin_{W ∈ R^{Q×Q}} J₁(W) (regression)
- Minimize J₂(Ŵ, S) + J₃(S)
- Optimal algorithms due to Nesterov
- ε-optimal solution after O(1/√ε) iterations instead of O(1/ε²) [Goldfarb09]
| (r, noise) \ alg. | Proposed | Static | P. A. | Katz |
|---|---|---|---|---|
| (5, 0.000) | 0.671 ± 0.008 | 0.648 ± 0.008 | 0.627 ± 0.015 | 0.616 ± 0.015 |
| (5, 0.250) | 0.675 ± 0.009 | 0.642 ± 0.007 | 0.602 ± 0.016 | 0.592 ± 0.016 |
| (5, 0.750) | 0.519 ± 0.007 | 0.525 ± 0.005 | 0.497 ± 0.007 | 0.491 ± 0.007 |
| (500, 0.000) | 0.592 ± 0.008 | 0.587 ± 0.007 | 0.671 ± 0.010 | 0.667 ± 0.009 |
| (500, 0.250) | 0.607 ± 0.011 | 0.588 ± 0.009 | 0.649 ± 0.009 | 0.643 ± 0.009 |
| (500, 0.750) | 0.601 ± 0.010 | 0.583 ± 0.007 | 0.645 ± 0.017 | 0.641 ± 0.017 |
Split and Alternately Minimize
- Splitting: L_η(S, S̄) := τ‖S‖_* + h(S̄, ν), subject to S = S̄
- Alternately minimize in S and S̄:
  - m_G(S̄) = argmin_S { τ‖S‖_* + ⟨∇h(S̄), S − S̄⟩ + (1/2μ)‖S − S̄‖²_F }
  - m_H(S) = argmin_{S̄} { h(S̄, ν) + ⟨∇(τ‖S‖_*), S̄ − S⟩ + (1/2μ)‖S̄ − S‖²_F }

Algorithm 1: Link Discovery Algorithm
Parameters: τ, ν, η
Initialization: W₀ = Z₁ = A_T, α₁ = 0
for k = 1, 2, … do
  S_k ← m_G(Z_k) and S̄_k ← m_H(S_k)
  W_k ← ½(S_k + S̄_k)
  α_{k+1} ← ½(1 + √(1 + 4α_k²))
  Z_{k+1} ← W_k + (1/α_{k+1})(α_k(S̄_k − W_{k−1}) − (W_k − W_{k−1}))
end for
Joint Optimization in W and S
Minimization of L by proximal gradient descent

L(S, W) = g(S, W) + Γ(S, W)
- g(S, W) := ½‖XW − Y‖²_F + ½‖ω(A_T)⊤W − ω(S)⊤‖²₂: smoothly differentiable fit term
- Γ(S, W) := (κ/2)‖W‖²_F + τ‖S‖_*: convex penalty
- Explicit proximal operator:
  prox_{θΓ}(S, W) := argmin_{(Z,V)} θΓ(Z, V) + ½‖S − Z‖²_F + ½‖W − V‖²_F = (D_{θτ}(S), W/(1 + θκ))
- Iterate: (S_{k+1}, W_{k+1}) = prox_{θ_kΓ}((S_k, W_k) − θ_k ∇g(S_k, W_k))
- FISTA [Beck09] for the optimal convergence rate
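A runnable sketch of this proximal gradient scheme on toy data (plain ISTA rather than the accelerated FISTA variant; the feature map, the data, and all sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, Q = 8, 3
tau, kappa = 0.5, 0.1

B = rng.standard_normal((n * n, Q))   # hypothetical linear feature map
def omega(A):
    return A.reshape(-1) @ B

X = rng.standard_normal((5, Q))       # stacked past feature vectors (toy)
Y = rng.standard_normal((5, Q))
A_T = (rng.random((n, n)) < 0.3).astype(float)
a = omega(A_T)

def shrink(M, t):                     # singular value soft-thresholding D_t
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def grad_g(S, W):
    """Gradient of the smooth fit term g(S, W)."""
    r = a @ W - omega(S)              # coupling residual, in R^Q
    gW = X.T @ (X @ W - Y) + np.outer(a, r)
    gS = -(B @ r).reshape(n, n)       # chain rule through the linear omega
    return gS, gW

# Safe step size: 1 / (upper bound on the Lipschitz constant of grad g)
L_g = np.linalg.norm(X, 2) ** 2 + (np.linalg.norm(B, 2) + np.linalg.norm(a)) ** 2
theta = 1.0 / L_g

S, W = np.zeros((n, n)), np.zeros((Q, Q))
for _ in range(300):
    gS, gW = grad_g(S, W)
    # prox of theta*Gamma: singular-value shrinkage on S, scaling on W
    S = shrink(S - theta * gS, theta * tau)
    W = (W - theta * gW) / (1.0 + theta * kappa)
```

Swapping the plain iterate for FISTA's momentum sequence improves the rate from O(1/ε) to O(1/√ε) iterations without changing the prox or gradient computations.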
Variants
Variant 1: Graph Regularization Constraint
- Want: i ∼_S j ⇒ f(i) ∼_H f(j)
- Control the Laplacian-like [Chen10] inner product
  J₄(f, S) = Σ_{i,j} S_{i,j} ‖f(i) − f(j)‖²_H = ⟨S, (‖f(i) − f(j)‖²_H)_{i,j}⟩
  (S_{i,j} encodes i ∼ j; ‖f(i) − f(j)‖²_H encodes f(i) ∼ f(j))
- Other possibility: J₄(f, S) = ⟨S, Gram(f)⟩
- L_graph regularization = L + λJ₄
- Issue: non-convex regularizers
- Algorithms:
  1. Gradient descent with hyper-parameters that keep the objective inside the convexity domain
  2. Projected gradient descent inside the convexity domain
Gradient Descent Convergence Area
Empirical Results
| Method \ Error | Marketing ΔSales | Marketing ΔGraph | Synthetic ΔSales | Synthetic ΔGraph |
|---|---|---|---|---|
| Our solution | 0.62 | 0.28 | 0.13 ± .002 | 0.21 ± .003 |
| Rank-free prediction | 0.64 | 0.31 | 0.19 ± .008 | 0.24 ± .01 |
| AR | 0.80 | - | 0.66 ± .007 | - |
| ARIMA | 0.78 | - | 0.17 ± .02 | - |
| VAR | 1.02 | - | 0.42 ± .09 | - |
| MC with shrinkage | - | 0.38 | - | 0.22 ± .003 |

- Sales prediction metric (to be minimized): ΔSales = ‖ω(A_{T+1}) − f(ω(A_T))‖₂ / ‖ω(A_{T+1})‖₂
- Graph completion metric (to be minimized): ΔGraph = ‖A_{T+1} − S‖_F / ‖A_{T+1}‖_F
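Both metrics are one-liners; a quick sanity check with perfect and all-zero predictions (shapes are made up):

```python
import numpy as np

def delta_sales(omega_next, omega_pred):
    """Relative feature error ||w(A_{T+1}) - f(w(A_T))||_2 / ||w(A_{T+1})||_2."""
    return np.linalg.norm(omega_next - omega_pred) / np.linalg.norm(omega_next)

def delta_graph(A_next, S):
    """Relative completion error ||A_{T+1} - S||_F / ||A_{T+1}||_F."""
    return np.linalg.norm(A_next - S, 'fro') / np.linalg.norm(A_next, 'fro')
```

A perfect prediction scores 0 and predicting all zeros scores 1, so values below 1 beat the trivial baseline.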
Convexity Domain
[Figures: 3D surfaces over (s, w) illustrating J₄ (∝ s·w²), κ|f|² + ν|S − A_T|² (∝ s² + w²), and their sum λJ₄ + κ|f|² + ν|S − A_T|²]
- J₄ is not jointly convex in (S, f)
- λJ₄ + κ‖W‖²_F + ν‖S − A_T‖²_F is convex inside
  E = { S ∈ R^{n×n}₊, W ∈ R^{n×d} : ‖W‖²_F ≤ √(νκ)/(2λ) }
Empirical Results
[Figure: relative errors as a function of log(ν) for HYBRID (Regression), HYBRID (Graph Completion), Rank Free Regression, Rank Free Graph Completion, Regression Only, and Graph Only]
Variant 2: Sparsity Constraint
- L_sparse(S, W) := L(S, W) + γ‖S‖_{1,1} (lasso)
- Split S into S and S̄ and add an equality constraint
- Synthetic data: n = 100, Q = 15, T = 200
- 10 runs for cross-validation, 10 runs for test
- AUC on S reported

| Nearest Neighbors | Static Low Rank | L_sparse | L |
|---|---|---|---|
| 0.9767 ± 0.0076 | 0.9751 ± 0.0362 | 0.9812 ± 0.0008 | 0.9778 ± 0.0071 |
Discussion
Synthetic Data Generation
Let, for all k ∈ {1, …, r},

  U_t^{(i,k)} = (1/(σ_{i,k}√(2π))) exp(−(t − μ_{i,k})² / (2σ_{i,k}²)) + ε_{i,k}

quantify the taste of user i for feature k at time t, let V_t^{(i,k)} be the weight of feature k for item i, and take

  A_t^{(i,j)} = 1{U^{(i)}(t) > θ} · 1{V^{(j)}(t) > θ}⊤

A_t is:
1. Sparse
2. Of rank at most r
3. Such that its latent factors evolve slowly, provided the σ's are not too small.
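A generator following this recipe (all constants — the sizes, θ, and the ranges of μ and σ — are illustrative choices, and the ε noise term is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, r, T = 40, 30, 3, 80
theta = 0.03                                  # activation threshold

# Gaussian taste/weight profiles: mu = peak time, sigma = evolution speed
mu_u, sig_u = rng.uniform(0, T, (n_users, r)), rng.uniform(5, 15, (n_users, r))
mu_v, sig_v = rng.uniform(0, T, (n_items, r)), rng.uniform(5, 15, (n_items, r))

def profiles(t, mu, sig):
    return np.exp(-(t - mu) ** 2 / (2 * sig ** 2)) / (sig * np.sqrt(2 * np.pi))

A_seq = []
for t in range(T):
    U = profiles(t, mu_u, sig_u)              # taste of user i for feature k
    V = profiles(t, mu_v, sig_v)              # weight of feature k for item j
    # edge (i, j) when user and item share at least one active latent feature
    counts = (U > theta).astype(float) @ (V > theta).astype(float).T
    A_seq.append(np.minimum(counts, 1.0))     # binarize
```

The count matrix has rank at most r by construction; the binarization can raise the rank slightly but keeps the sequence sparse and slowly varying.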
Scalability
- D_τ(A) is dense, even for sparse A
- Fact [Srebro05]: ‖S‖_* = ½ min_{UV⊤=S} (‖U‖²_F + ‖V‖²_F)
- Instead of fixing τ, fix r and take U, V ∈ R^{n×r}
- Define
  J(U, V, W) := ‖XW − Y‖²_F + ‖ω(A_T)⊤W − ω(UV⊤)⊤‖²₂ + (κ/2)‖W‖²_F + (λ/2)(‖U‖²_F + ‖V‖²_F)
- Parallel stochastic gradient algorithms [Recht11]
Store Recommendation Lists
- Each feature leads to a specific list of recommendations
- Store top-k lists
- Learn optimal combinations / aggregations
... work in progress
Conclusion
- Introduction of a regularization formulation for link prediction in graph sequences
- Several variants detailed and empirically tested
- Perspectives for scalable algorithms
- Perspectives for theoretical analysis and understanding of the problem
Thanks
Mercis !
References
Reka Albert and Albert-Laszlo Barabasi. Statistical mechanics of complex networks. Reviews of Modern Physics, 74:47–97, 2002.
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
B. Bollobas. Random Graphs, volume 73 of Cambridge Studies in Advanced Mathematics, 2nd edition. Cambridge University Press, Cambridge, 2001.
Emmanuel J. Candes and Terence Tao. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2008.
Xi Chen, Seyoung Kim, Qihang Lin, Jaime G. Carbonell, and Eric P. Xing. Graph-structured multi-task regression and an efficient optimization method for general fused lasso. arXiv, 2010.
Donald Goldfarb and Shiqian Ma. Fast alternating linearization methods for minimizing the sum of two convex functions. Technical report, Department of IEOR, Columbia University, 2009.
P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97, 2002.
David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
Vladimir Koltchinskii, Karim Lounici, and Alexandre Tsybakov. Nuclear norm penalization and optimal rates for noisy matrix completion. Annals of Statistics, 2011.
P. N. Krivitsky and M. S. Handcock. A separable model for dynamic networks. arXiv e-prints, November 2010.
Jerome Kunegis and Andreas Lommatzsch. Learning spectral graph transformations for link prediction. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pages 561–568, New York, NY, USA, 2009. ACM.
G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 2003.
K. Nowicki and T. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96:1077–1087, 2001.
Benjamin Recht and Christopher Re. Parallel stochastic gradient algorithms for large-scale matrix completion. Submitted for publication, 2011.
Emile Richard, Nicolas Baskiotis, Theodoros Evgeniou, and Nicolas Vayatis. Link discovery using graph feature tracking. In Proceedings of Neural Information Processing Systems (NIPS), 2010.
Nathan Srebro, Jason D. M. Rennie, and Tommi S. Jaakkola. Maximum-margin matrix factorization. In Lawrence K. Saul, Yair Weiss, and Leon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1329–1336. MIT Press, Cambridge, MA, 2005.
Stanley Wasserman and Philippa Pattison. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61(3):401–425, September 1996.
K. Zhang, Th. Evgeniou, V. Padmanabhan, and E. Richard. Content contributor management and network effects in a UGC environment. Marketing Science, 2011.