Composite Likelihood Data Augmentation for Within-Network Statistical Relational Learning
Joseph J. Pfeiffer III¹, Jennifer Neville¹, Paul Bennett²
¹Purdue University, ²Microsoft Research
ICDM 2014, Shenzhen
Relational Machine Learning

• Items: stock brokers (social network users, papers, movies, products, etc.)
• Label: fraudulent (buy product, sales rank, etc.)
• Relationships: phone calls (friendships, emails, citations, co-purchases, etc.)
• Attributes: other observable item traits
Within-Network Relational Machine Learning

Learn the model parameters Θ_Y using the observed data.
Infer (predict) the remaining labels using the learned parameters Θ_Y and the available data.

Will show: learning and inference approximations prevent semi-supervised algorithms from converging.
Will develop/demonstrate: methods that model parameter uncertainty overcome these approximation errors.
Outline

• Introduction
• Background: Semi-Supervised Relational Machine Learning
• Analysis of Relational EM convergence in real-world networks
• Relational Stochastic EM and Relational Data Augmentation
• Experiments
• Conclusions
Background - Semi-Supervised Relational Learning

• Relational Machine Learning
  - Approximate learning (pseudolikelihood):
      Θ̂_Y = argmax_{Θ_Y} Σ_{v_i ∈ V_L} log P_Y(y_i | Y_{MB_L(v_i)}, x_i, Θ_Y)
  - Approximate inference: collective inference
• Many conditional forms to choose from (e.g., generative, logistic)

[Figure: a partially labeled network with labels, edges, and attributes: vertex v_i with label y_i and attributes x_i, neighbor label y_j, and Markov blanket MB(v_i).]
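As a concrete illustration of the pseudolikelihood learning step, here is a minimal, hypothetical sketch (not the paper's implementation): a binary-label toy graph where the only learned quantities are the neighbor-label conditionals P(neighbor label | y), estimated by counting over each labeled vertex's neighborhood. The graph, labels, and smoothing are invented for illustration.

```python
from collections import Counter

# Toy pseudolikelihood estimation for a binary-label relational model:
# each labeled vertex contributes log P(y_i | labels of its neighbors),
# and the neighbor-label conditionals are estimated by counting.
labels = {0: 1, 1: 1, 2: 0, 3: 0}   # observed labels on V_L
edges = [(0, 1), (1, 2), (2, 3)]    # undirected edges

adj = {v: set() for v in labels}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Count neighbor labels conditioned on the vertex's own label.
counts = Counter()
for v, y in labels.items():
    for u in adj[v]:
        counts[(y, labels[u])] += 1

def p_nbr_pos(y):
    """Smoothed MLE of P(neighbor label = 1 | y)."""
    pos, neg = counts[(y, 1)] + 1, counts[(y, 0)] + 1  # Laplace smoothing
    return pos / (pos + neg)

print(p_nbr_pos(1), p_nbr_pos(0))  # -> 0.6 0.4 (homophily in the toy graph)
```

Each conditional is estimated independently from the Markov blankets of the labeled vertices, which is exactly what makes the pseudolikelihood tractable compared to the full joint likelihood.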
Semi-Supervised Relational EM (Xiang & Neville, 2009)

E-Step (for all unlabeled instances): infer each label from its Markov blanket,
    P_Y(y_i | Y_{MB(v_i)}, x_i, Θ_Y),
approximating the full joint P(Y | X, E, Θ_Y).

M-Step: maximize the composite likelihood (GO) / pseudolikelihood (GL):
    Θ̂_Y = argmax_{Θ_Y} Σ_{Y_U ∈ 𝒴_U} P_Y(Y_U) Σ_{v_i ∈ V_L} log P_Y(y_i | Ỹ_{MB(v_i)}, x_i, Θ_Y)

Both learning and inference require approximations.
How do these approximations affect semi-supervised relational learning?
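Collective inference over the unlabeled vertices is itself approximated in practice, commonly by Gibbs sampling. A hypothetical sketch, assuming a one-parameter binary homophily model (p = probability that a neighbor shares the label) in place of the paper's RNB/RLR conditionals; the toy graph and parameter value are invented:

```python
import random

# Collective inference by Gibbs sampling in a toy binary homophily model
# with a fixed parameter p = P(a neighbor shares the vertex's label).
def gibbs_infer(p, labeled, unlabeled, edges, sweeps=2000, burn=500, seed=0):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    rng = random.Random(seed)
    state = dict(labeled)
    for v in unlabeled:
        state[v] = rng.randint(0, 1)           # random initialization
    counts = {v: 0 for v in unlabeled}
    for t in range(sweeps):
        for v in unlabeled:
            # P(y_v = c | MB(v)) is proportional to the product over
            # neighbors of (p if the neighbor agrees with c, else 1 - p).
            w1 = w0 = 1.0
            for u in adj[v]:
                w1 *= p if state[u] == 1 else 1 - p
                w0 *= p if state[u] == 0 else 1 - p
            state[v] = 1 if rng.random() < w1 / (w1 + w0) else 0
        if t >= burn:                          # average after burn-in
            for v in unlabeled:
                counts[v] += state[v]
    return {v: counts[v] / (sweeps - burn) for v in unlabeled}

# Vertices 0, 1 are labeled positive and 4, 5 negative;
# vertices 2 and 3 are inferred jointly (they share an edge).
marginals = gibbs_infer(0.8, {0: 1, 1: 1, 4: 0, 5: 0}, [2, 3],
                        [(0, 2), (1, 2), (2, 3), (3, 4), (3, 5)])
print(marginals)  # vertex 2 leans positive, vertex 3 negative
```

Note that each unlabeled vertex conditions on the current (possibly wrong) imputed labels of its unlabeled neighbors, which is the opening for the over-propagation errors analyzed next.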
Impact of Approximations in Semi-Supervised Relational EM

Collective inference approximation (Relational EM E-Step, for all unlabeled instances):
    P_Y(y_i | Y_{MB(v_i)}, x_i, Θ_Y)
• Over-propagation error: both neighboring predictions are yellow, but yellow neighbors are not predictive.
• Over-correction.

Composite likelihood approximation (M-Step):
    Θ̂_Y = argmax_{Θ_Y} Σ_{Y_U ∈ 𝒴_U} P_Y(Y_U) Σ_{v_i ∈ V_L} log P_Y(y_i | Ỹ_{MB(v_i)}, x_i, Θ_Y)

Example: P(Purple | Neighbors) ≈ 0.61, while P(Purple | Purple Neighbor) ≈ 0.96.
• Does over-propagation during prediction happen in real-world, sparsely labeled networks?

Impact of Approximations in Semi-Supervised Relational Learning
[Figure: total negatives vs. total positives of the predicted labels across Relational EM iterations, Trials 1-5, with the class prior marked; left: Relational Naive Bayes, right: Relational Logistic Regression. The prediction counts drift away from the class prior.]

Yes: over-propagation does occur.
• Does over-correction happen during parameter estimation in real-world, sparsely labeled networks?

Impact of Approximations in Semi-Supervised Relational Learning
Yes: over-correction does occur.

[Figure: parameter trajectories during Relational EM. Left, Relational Naive Bayes: P(+|y) vs. P(−|y), with the conditionals P(·|−) and P(·|+) marked. Right, Relational Logistic Regression: weights w[1] vs. w[0], with w marked.]
Relational Stochastic EM and Relational Data Augmentation

                             Fixed-Point Parameters      Stochastic Parameters
  Fixed-Point Predictions    Relational EM               —
  Stochastic Predictions     Relational Stochastic EM    Relational Data Augmentation
Relational Stochastic EM

• A stochastic approximation to the Relational EM algorithm
• Sample from the joint conditional distribution of the labels, then maximize the composite likelihood
• Contrasts with Relational EM, which uses expectations of the unlabeled items
• Average over the parameters to reduce the variance (Celeux et al., 2001)
• Inference is still performed using a single, fixed-point set of parameters
Alternate between:
  Gibbs sampling of the labels:   Y_U^t ~ P_Y^t(Y_U | Y_L, X, E, Θ_Y^{t−1})
  Maximizing the parameters:      Θ̃_Y^t = argmax_{Θ_Y} Σ_{v_i ∈ V_L} log P_Y(y_i | Ỹ^t_{MB(v_i)}, x_i, Θ_Y)
Aggregate the parameters:         Θ̄_Y = (1/T) Σ_t Θ_Y^t
Final inference:                  P_Y(Y_U | Y_L, X, E, Θ̄_Y)
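The Relational SEM loop might be sketched as follows. This is a hypothetical toy, using a one-parameter binary homophily model (p = probability that a neighbor shares the label) with a closed-form M-step instead of the paper's RNB/RLR conditionals; the graph, seed, and smoothing are invented for illustration.

```python
import random

# Relational Stochastic EM on a toy one-parameter homophily model:
# alternate a Gibbs sweep over the unlabeled labels (stochastic E-step)
# with a closed-form M-step, then average the per-round parameter
# estimates to reduce their variance.
labeled = {0: 1, 1: 1, 4: 0, 5: 0}
unlabeled = [2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

rng = random.Random(1)
state = dict(labeled)
state.update({v: rng.randint(0, 1) for v in unlabeled})
p, history = 0.6, []
for t in range(200):
    # Stochastic E-step: one Gibbs sweep over the unlabeled vertices.
    for v in unlabeled:
        w1 = w0 = 1.0
        for u in adj[v]:
            w1 *= p if state[u] == 1 else 1 - p
            w0 *= p if state[u] == 0 else 1 - p
        state[v] = 1 if rng.random() < w1 / (w1 + w0) else 0
    # M-step: maximum-likelihood p = fraction of agreeing edges (smoothed).
    agree = sum(state[u] == state[v] for u, v in edges)
    p = (agree + 1) / (len(edges) + 2)
    history.append(p)

# A single, fixed-point set of parameters is used for final inference.
p_bar = sum(history) / len(history)
print(round(p_bar, 3))
```

Averaging the per-round estimates is what distinguishes this from plain stochastic EM with the last iterate: a single round's maximizer depends on one sampled labeling and can swing widely in sparsely labeled graphs.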
Relational Data Augmentation

• Data Augmentation is a Bayesian viewpoint of EM
  - Parameters are random variables
  - Computes the posterior predictive distribution (Tanner & Wong, 1987)
• We develop a version for relational semi-supervised learning
• Final inference is over a distribution of parameter values
• Requires prior distributions over the parameters and corresponding sampling methods
  - RNB: Beta (conjugate prior)
  - RLR: Normal (Metropolis-Hastings sampler)
Alternate between:
  Gibbs sampling of the labels:   Y_U^t ~ P_Y^t(Y_U | Y_L, X, E, Θ_Y^{t−1})
  Sampling the parameters:        Θ_Y^t ~ P^t(Θ_Y | Y_U^t, Y_L, X, E)
Final parameters:                 Θ̄_Y = (1/T) Σ_t Θ_Y^t
Final inference:                  Ȳ_U = (1/T) Σ_t Y_U^t
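By contrast with Relational SEM, the Data Augmentation loop samples the parameter rather than maximizing it. A hypothetical sketch on the same style of toy homophily model: the agreement parameter gets a Beta prior (Beta being the conjugate choice, as the paper uses for RNB), is resampled from its posterior each round, and the final prediction averages the sampled labels as a posterior predictive estimate. Graph, prior, and seed are invented.

```python
import random

# Relational Data Augmentation on a toy one-parameter homophily model:
# Gibbs-sample the unlabeled labels, then *sample* the parameter from
# its Beta posterior (conjugate prior) instead of maximizing it, and
# average the sampled labels for a posterior predictive estimate.
labeled = {0: 1, 1: 1, 4: 0, 5: 0}
unlabeled = [2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

rng = random.Random(2)
state = dict(labeled)
state.update({v: rng.randint(0, 1) for v in unlabeled})
p, T = 0.6, 500
label_sums = {v: 0 for v in unlabeled}
for t in range(T):
    # Gibbs sweep over the unlabeled labels given the current parameter.
    for v in unlabeled:
        w1 = w0 = 1.0
        for u in adj[v]:
            w1 *= p if state[u] == 1 else 1 - p
            w0 *= p if state[u] == 0 else 1 - p
        state[v] = 1 if rng.random() < w1 / (w1 + w0) else 0
    # Parameter draw: Beta(1, 1) prior plus edge-agreement counts.
    agree = sum(state[u] == state[v] for u, v in edges)
    p = rng.betavariate(agree + 1, len(edges) - agree + 1)
    for v in unlabeled:
        label_sums[v] += state[v]

# Posterior predictive estimate for each unlabeled vertex.
y_bar = {v: label_sums[v] / T for v in unlabeled}
print(y_bar)
```

Averaging sampled labels, rather than plugging one parameter vector into a final inference pass, is what lets Data Augmentation propagate parameter uncertainty into the predictions.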
Experiments - Setup

• Compare on four real-world networks
• Competing models: Independent; Relational (no CI and CI); Relational EM; Relational SEM; Relational DA
• Conditional models: Relational Naive Bayes; Relational Logistic Regression
• Error measures: zero-one loss (0-1 Loss); mean absolute error (MAE)
Dataset Vertices Edges Attributes
Facebook 5,906 73,374 2
IMDB 11,280 426,167 28
DVD 16,118 75,596 28
Music 56,891 272,544 26
Experiments - DVD

[Figure: 0-1 Loss vs. percentage of the graph labeled (0.1 to 0.9) for Relational EM and Relational DA; left: Naive Bayes, right: Logistic Regression.]

Relational EM's instability in sparsely labeled domains causes poor performance.
Experiments - Facebook

[Figure: MAE vs. percentage of the graph labeled (0.1 to 0.9) for Relational EM, Relational SEM, and Relational DA; left: Naive Bayes, right: Logistic Regression.]

Relational Data Augmentation can outperform Relational Stochastic EM.

(In Paper): Cast collective inference as a nonlinear dynamical system to analyze the convergence of Relational SEM.
(Finding): Like Relational EM, Relational SEM can sometimes learn parameters that result in an unstable system.
Conclusions

• Findings
  - Fixed-point approximation error led Relational EM to not converge: over-propagation and over-correction
  - (In Paper) Fixed-point EM methods can result in unstable systems during inference
• Developed
  - Relational Stochastic EM, which has lower variance in its parameter estimates
  - Relational Data Augmentation, which computes a posterior predictive distribution for the unlabeled instances
    - Models the uncertainty over the parameter estimates, so it can outperform the Relational Stochastic EM approach
  - Both work well in conjunction with the composite likelihood approximation
  - Both significantly outperformed a variety of competitors across a number of testing scenarios