Composite Likelihood Data Augmentation for Within-Network Statistical Relational Learning
Joseph J. Pfeiffer III¹, Jennifer Neville¹, Paul Bennett²
¹Purdue University, ²Microsoft Research
ICDM 2014, Shenzhen
Relational Machine Learning

• Items: stock brokers (social network users, papers, movies, products, etc.)
• Label: fraudulent (buy product, sales rank, etc.)
• Relationships: phone calls (friendships, emails, citations, co-purchases, etc.)
• Attributes: other observable item traits
Within-Network Relational Machine Learning

Learn the model parameters Θ_Y using the observed data.
Infer (predict) the remaining labels using the learned parameters Θ_Y and the available data.

Will show: learning and inference approximations prevent semi-supervised algorithms from converging.
Will develop/demonstrate: methods that model parameter uncertainty overcome these approximation errors.
Outline

• Introduction
• Background: Semi-Supervised Relational Machine Learning
• Analysis of Relational EM convergence in real-world networks
• Relational Stochastic EM and Relational Data Augmentation
• Experiments
• Conclusions
Background - Semi-Supervised Relational Learning

• Relational Machine Learning
  - Approximate learning (pseudolikelihood):
      Θ̂_Y = argmax_{Θ_Y} Σ_{v_i ∈ V_L} log P_Y(y_i | Y_{MB_L(v_i)}, x_i, Θ_Y)
  - Approximate inference: collective inference
• Many conditional forms to choose from (e.g., generative, logistic)

[Figure: a partially labeled network with labels, edges, and attributes: vertex v_i with label y_i and attributes x_i, neighbor label y_j, and Markov blanket MB(v_i).]
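As a concrete illustration of the pseudolikelihood learning step, here is a minimal, hypothetical sketch (not the paper's implementation): a binary-label toy graph where the only learned quantities are the neighbor-label conditionals P(neighbor label | y), estimated by counting over each labeled vertex's neighborhood. The graph, labels, and smoothing are invented for illustration.

```python
from collections import Counter

# Toy pseudolikelihood estimation for a binary-label relational model:
# each labeled vertex contributes log P(y_i | labels of its neighbors),
# and the neighbor-label conditionals are estimated by counting.
labels = {0: 1, 1: 1, 2: 0, 3: 0}   # observed labels on V_L
edges = [(0, 1), (1, 2), (2, 3)]    # undirected edges

adj = {v: set() for v in labels}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Count neighbor labels conditioned on the vertex's own label.
counts = Counter()
for v, y in labels.items():
    for u in adj[v]:
        counts[(y, labels[u])] += 1

def p_nbr_pos(y):
    """Smoothed MLE of P(neighbor label = 1 | y)."""
    pos, neg = counts[(y, 1)] + 1, counts[(y, 0)] + 1  # Laplace smoothing
    return pos / (pos + neg)

print(p_nbr_pos(1), p_nbr_pos(0))  # -> 0.6 0.4 (homophily in the toy graph)
```

Each conditional is estimated independently from the Markov blankets of the labeled vertices, which is exactly what makes the pseudolikelihood tractable compared to the full joint likelihood.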
Semi-Supervised Relational EM (Xiang & Neville, 2009)

E-Step (for all unlabeled instances): infer each label from its Markov blanket,
    P_Y(y_i | Y_{MB(v_i)}, x_i, Θ_Y),
approximating the full joint P(Y | X, E, Θ_Y).

M-Step: maximize the composite likelihood (GO) / pseudolikelihood (GL):
    Θ̂_Y = argmax_{Θ_Y} Σ_{Y_U ∈ 𝒴_U} P_Y(Y_U) Σ_{v_i ∈ V_L} log P_Y(y_i | Ỹ_{MB(v_i)}, x_i, Θ_Y)

Both learning and inference require approximations.
How do these approximations affect semi-supervised relational learning?
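Collective inference over the unlabeled vertices is itself approximated in practice, commonly by Gibbs sampling. A hypothetical sketch, assuming a one-parameter binary homophily model (p = probability that a neighbor shares the label) in place of the paper's RNB/RLR conditionals; the toy graph and parameter value are invented:

```python
import random

# Collective inference by Gibbs sampling in a toy binary homophily model
# with a fixed parameter p = P(a neighbor shares the vertex's label).
def gibbs_infer(p, labeled, unlabeled, edges, sweeps=2000, burn=500, seed=0):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    rng = random.Random(seed)
    state = dict(labeled)
    for v in unlabeled:
        state[v] = rng.randint(0, 1)           # random initialization
    counts = {v: 0 for v in unlabeled}
    for t in range(sweeps):
        for v in unlabeled:
            # P(y_v = c | MB(v)) is proportional to the product over
            # neighbors of (p if the neighbor agrees with c, else 1 - p).
            w1 = w0 = 1.0
            for u in adj[v]:
                w1 *= p if state[u] == 1 else 1 - p
                w0 *= p if state[u] == 0 else 1 - p
            state[v] = 1 if rng.random() < w1 / (w1 + w0) else 0
        if t >= burn:                          # average after burn-in
            for v in unlabeled:
                counts[v] += state[v]
    return {v: counts[v] / (sweeps - burn) for v in unlabeled}

# Vertices 0, 1 are labeled positive and 4, 5 negative;
# vertices 2 and 3 are inferred jointly (they share an edge).
marginals = gibbs_infer(0.8, {0: 1, 1: 1, 4: 0, 5: 0}, [2, 3],
                        [(0, 2), (1, 2), (2, 3), (3, 4), (3, 5)])
print(marginals)  # vertex 2 leans positive, vertex 3 negative
```

Note that each unlabeled vertex conditions on the current (possibly wrong) imputed labels of its unlabeled neighbors, which is the opening for the over-propagation errors analyzed next.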
Impact of Approximations in Semi-Supervised Relational EM

Collective inference approximation (Relational EM E-Step, for all unlabeled instances):
    P_Y(y_i | Y_{MB(v_i)}, x_i, Θ_Y)
• Over-propagation error: both neighboring predictions are yellow, but yellow neighbors are not predictive.
• Over-correction.

Composite likelihood approximation (M-Step):
    Θ̂_Y = argmax_{Θ_Y} Σ_{Y_U ∈ 𝒴_U} P_Y(Y_U) Σ_{v_i ∈ V_L} log P_Y(y_i | Ỹ_{MB(v_i)}, x_i, Θ_Y)

Example: P(Purple | Neighbors) ≈ 0.61, while P(Purple | Purple Neighbor) ≈ 0.96.
• Does over-propagation during prediction happen in real-world, sparsely labeled networks?

Impact of Approximations in Semi-Supervised Relational Learning
[Figure: total negatives vs. total positives of the predicted labels across Relational EM iterations, Trials 1-5, with the class prior marked; left: Relational Naive Bayes, right: Relational Logistic Regression. The prediction counts drift away from the class prior.]

Yes: over-propagation does occur.
• Does over-correction happen during parameter estimation in real-world, sparsely labeled networks?

Impact of Approximations in Semi-Supervised Relational Learning
Yes: over-correction does occur.

[Figure: parameter trajectories during Relational EM. Left, Relational Naive Bayes: P(+|y) vs. P(−|y), with the conditionals P(·|−) and P(·|+) marked. Right, Relational Logistic Regression: weights w[1] vs. w[0], with w marked.]
Relational Stochastic EM and Relational Data Augmentation

                             Fixed-Point Parameters      Stochastic Parameters
  Fixed-Point Predictions    Relational EM               —
  Stochastic Predictions     Relational Stochastic EM    Relational Data Augmentation
Relational Stochastic EM

• A stochastic approximation to the Relational EM algorithm
• Sample from the joint conditional distribution of the labels, then maximize the composite likelihood
• Contrasts with Relational EM, which uses expectations of the unlabeled items
• Average over the parameters to reduce the variance (Celeux et al., 2001)
• Inference is still performed using a single, fixed-point set of parameters
Alternate between:
  Gibbs sampling of the labels:   Y_U^t ~ P_Y^t(Y_U | Y_L, X, E, Θ_Y^{t−1})
  Maximizing the parameters:      Θ̃_Y^t = argmax_{Θ_Y} Σ_{v_i ∈ V_L} log P_Y(y_i | Ỹ^t_{MB(v_i)}, x_i, Θ_Y)
Aggregate the parameters:         Θ̄_Y = (1/T) Σ_t Θ_Y^t
Final inference:                  P_Y(Y_U | Y_L, X, E, Θ̄_Y)
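The Relational SEM loop might be sketched as follows. This is a hypothetical toy, using a one-parameter binary homophily model (p = probability that a neighbor shares the label) with a closed-form M-step instead of the paper's RNB/RLR conditionals; the graph, seed, and smoothing are invented for illustration.

```python
import random

# Relational Stochastic EM on a toy one-parameter homophily model:
# alternate a Gibbs sweep over the unlabeled labels (stochastic E-step)
# with a closed-form M-step, then average the per-round parameter
# estimates to reduce their variance.
labeled = {0: 1, 1: 1, 4: 0, 5: 0}
unlabeled = [2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

rng = random.Random(1)
state = dict(labeled)
state.update({v: rng.randint(0, 1) for v in unlabeled})
p, history = 0.6, []
for t in range(200):
    # Stochastic E-step: one Gibbs sweep over the unlabeled vertices.
    for v in unlabeled:
        w1 = w0 = 1.0
        for u in adj[v]:
            w1 *= p if state[u] == 1 else 1 - p
            w0 *= p if state[u] == 0 else 1 - p
        state[v] = 1 if rng.random() < w1 / (w1 + w0) else 0
    # M-step: maximum-likelihood p = fraction of agreeing edges (smoothed).
    agree = sum(state[u] == state[v] for u, v in edges)
    p = (agree + 1) / (len(edges) + 2)
    history.append(p)

# A single, fixed-point set of parameters is used for final inference.
p_bar = sum(history) / len(history)
print(round(p_bar, 3))
```

Averaging the per-round estimates is what distinguishes this from plain stochastic EM with the last iterate: a single round's maximizer depends on one sampled labeling and can swing widely in sparsely labeled graphs.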
Relational Data Augmentation

• Data Augmentation is a Bayesian viewpoint of EM
  - Parameters are random variables
  - Computes the posterior predictive distribution (Tanner & Wong, 1987)
• We develop a version for relational semi-supervised learning
• Final inference is over a distribution of parameter values
• Requires prior distributions over the parameters and corresponding sampling methods
  - RNB: Beta (conjugate prior)
  - RLR: Normal (Metropolis-Hastings sampler)
Alternate between:
  Gibbs sampling of the labels:   Y_U^t ~ P_Y^t(Y_U | Y_L, X, E, Θ_Y^{t−1})
  Sampling the parameters:        Θ_Y^t ~ P^t(Θ_Y | Y_U^t, Y_L, X, E)
Final parameters:                 Θ̄_Y = (1/T) Σ_t Θ_Y^t
Final inference:                  Ȳ_U = (1/T) Σ_t Y_U^t
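By contrast with Relational SEM, the Data Augmentation loop samples the parameter rather than maximizing it. A hypothetical sketch on the same style of toy homophily model: the agreement parameter gets a Beta prior (Beta being the conjugate choice, as the paper uses for RNB), is resampled from its posterior each round, and the final prediction averages the sampled labels as a posterior predictive estimate. Graph, prior, and seed are invented.

```python
import random

# Relational Data Augmentation on a toy one-parameter homophily model:
# Gibbs-sample the unlabeled labels, then *sample* the parameter from
# its Beta posterior (conjugate prior) instead of maximizing it, and
# average the sampled labels for a posterior predictive estimate.
labeled = {0: 1, 1: 1, 4: 0, 5: 0}
unlabeled = [2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

rng = random.Random(2)
state = dict(labeled)
state.update({v: rng.randint(0, 1) for v in unlabeled})
p, T = 0.6, 500
label_sums = {v: 0 for v in unlabeled}
for t in range(T):
    # Gibbs sweep over the unlabeled labels given the current parameter.
    for v in unlabeled:
        w1 = w0 = 1.0
        for u in adj[v]:
            w1 *= p if state[u] == 1 else 1 - p
            w0 *= p if state[u] == 0 else 1 - p
        state[v] = 1 if rng.random() < w1 / (w1 + w0) else 0
    # Parameter draw: Beta(1, 1) prior plus edge-agreement counts.
    agree = sum(state[u] == state[v] for u, v in edges)
    p = rng.betavariate(agree + 1, len(edges) - agree + 1)
    for v in unlabeled:
        label_sums[v] += state[v]

# Posterior predictive estimate for each unlabeled vertex.
y_bar = {v: label_sums[v] / T for v in unlabeled}
print(y_bar)
```

Averaging sampled labels, rather than plugging one parameter vector into a final inference pass, is what lets Data Augmentation propagate parameter uncertainty into the predictions.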
Experiments - Setup

• Compare on four real-world networks
• Competing models: Independent; Relational (no CI and CI); Relational EM; Relational SEM; Relational DA
• Conditional models: Relational Naive Bayes; Relational Logistic Regression
• Error measures: zero-one loss (0-1 Loss); mean absolute error (MAE)
Dataset Vertices Edges Attributes
Facebook 5,906 73,374 2
IMDB 11,280 426,167 28
DVD 16,118 75,596 28
Music 56,891 272,544 26
Experiments - DVD

[Figure: 0-1 Loss vs. percentage of the graph labeled (0.1 to 0.9) for Relational EM and Relational DA; left: Naive Bayes, right: Logistic Regression.]

Relational EM's instability in sparsely labeled domains causes poor performance.
Experiments - Facebook

[Figure: MAE vs. percentage of the graph labeled (0.1 to 0.9) for Relational EM, Relational SEM, and Relational DA; left: Naive Bayes, right: Logistic Regression.]

Relational Data Augmentation can outperform Relational Stochastic EM.

(In Paper): Cast collective inference as a nonlinear dynamical system to analyze the convergence of Relational SEM.
(Finding): Like Relational EM, Relational SEM can sometimes learn parameters that result in an unstable system.
Conclusions

• Findings
  - Fixed-point approximation error led Relational EM to not converge: over-propagation and over-correction
  - (In Paper) Fixed-point EM methods can result in unstable systems during inference
• Developed
  - Relational Stochastic EM, which has lower variance in its parameter estimates
  - Relational Data Augmentation, which computes a posterior predictive distribution for the unlabeled instances
    - Models the uncertainty over the parameter estimates, so it can outperform the Relational Stochastic EM approach
  - Both work well in conjunction with the composite likelihood approximation
  - Both significantly outperformed a variety of competitors across a number of testing scenarios