
Transcript of PPT slides

Page 1: PPT slides

Relational Learning with Gaussian Processes

By Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. Sathiya Keerthi (Columbia, Chicago, Cambridge, Yahoo!)

Presented by Nesreen Ahmed, Nguyen Cao, Sebastian Moreno, Philip Schatz

Page 2: PPT slides

Outline

• Introduction
• Relational Gaussian Processes
• Application
  – Linkage prediction
  – Semi-supervised learning
• Experiments & Results
• Conclusion & Discussion


Page 3: PPT slides

Introduction

• Many domains involve relational data
  – Web: document links
  – Document categorization: citations
  – Computational biology: protein interactions

• Inter-relationships between instances can be informative for learning tasks

• Relations reflect network structure and enrich the way instances are correlated


Page 4: PPT slides

Introduction

• Relational information is represented by a graph G = (V, E)
• Supervised learning:
  – Relations provide structural knowledge
• Also for semi-supervised learning: the graph can be derived from the input attributes
• The graph estimates the global geometric structure of the data


Page 5: PPT slides

Gaussian Processes

• A Gaussian process is a joint Gaussian distribution over the set of function values {f_x} of any arbitrary set of n instances x:

$$P(\mathbf{f}) = \frac{1}{(2\pi)^{n/2} \det(\Sigma)^{1/2}} \exp\left(-\frac{1}{2}\, \mathbf{f}^\top \Sigma^{-1} \mathbf{f}\right)$$

where $\mathbf{f} = [f_{x_1}, \ldots, f_{x_n}]^\top$ and $\Sigma = [K(x_i, x_j)]_{n \times n}$.
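As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of building the prior covariance Σ from a kernel and drawing one set of function values; the RBF kernel choice, the helper name rbf_kernel, and the jitter term are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    # K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))              # n = 30 one-dimensional instances
Sigma = rbf_kernel(X, X) + 1e-8 * np.eye(30)      # prior covariance, with jitter
f = rng.multivariate_normal(np.zeros(30), Sigma)  # one draw of f = [f_x1, ..., f_xn]
```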

Page 6: PPT slides

Relational Gaussian Processes

• Linkages: an observed relation ε_ij between a pair of instances x_i and x_j:

$$\epsilon_{ij} = \begin{cases} +1 & \text{if } x_i \text{ and } x_j \text{ are "positively tied"} \\ -1 & \text{if } x_i \text{ and } x_j \text{ are "negatively tied"} \end{cases}$$

• The uncertainty in observing ε_ij induces Gaussian noise N(0, σ²) in observing the values of the corresponding instances' function values, giving the linkage likelihood

$$P(\epsilon_{ij} = 1 \mid f_{x_i}, f_{x_j}) = \int \Phi\!\left(z + \frac{f_{x_i}}{\sigma}\right) \Phi\!\left(z + \frac{f_{x_j}}{\sigma}\right) \mathcal{N}(z; 0, 1)\, dz + \int \Phi\!\left(z - \frac{f_{x_i}}{\sigma}\right) \Phi\!\left(z - \frac{f_{x_j}}{\sigma}\right) \mathcal{N}(z; 0, 1)\, dz$$

i.e. the probability that the two noisy function values share the same sign. A numerical-integration sketch follows.
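A sketch of evaluating this likelihood by numerical integration, assuming the same-sign noise model as reconstructed above; the function name, grid bounds, and grid size are arbitrary choices, not from the slides.

```python
import numpy as np
from scipy.stats import norm

def linkage_likelihood(f_i, f_j, sigma=1.0, n_grid=2001):
    # P(eps_ij = +1 | f_i, f_j): probability that the two noisy function
    # values share the same sign, integrating out the shared noise z.
    z, dz = np.linspace(-8.0, 8.0, n_grid, retstep=True)
    both_pos = norm.cdf(z + f_i / sigma) * norm.cdf(z + f_j / sigma)
    both_neg = norm.cdf(z - f_i / sigma) * norm.cdf(z - f_j / sigma)
    return float(np.sum((both_pos + both_neg) * norm.pdf(z)) * dz)

print(linkage_likelihood(0.0, 0.0))  # ~2/3: two uninformative values, correlated noise
```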

Page 7: PPT slides

Relational Gaussian Processes

• Approximate inference: the posterior over function values given the observed linkages is

$$P(\mathbf{f} \mid \mathcal{E}) = \frac{1}{P(\mathcal{E})}\, P(\mathbf{f}) \prod_{ij} P(\epsilon_{ij} \mid f_{x_i}, f_{x_j})$$

where (i, j) runs over the set of observed undirected linkages.

• The EP algorithm approximates each linkage factor, giving

$$Q(\mathbf{f}) \propto P(\mathbf{f}) \prod_{ij} s_{ij} \exp\!\left(-\frac{1}{2}\, \mathbf{f}_{ij}^\top \Lambda_{ij}\, \mathbf{f}_{ij}\right)$$

where $\mathbf{f}_{ij} = [f_{x_i}, f_{x_j}]^\top$ and each $\Lambda_{ij}$ is a 2×2 symmetric matrix.

Page 8: PPT slides

Relational Gaussian Processes

• The approximate posterior is then a zero-mean Gaussian,

$$P(\mathbf{f} \mid \mathcal{E}) \approx \mathcal{N}(\mathbf{0}, \tilde{\Sigma}), \qquad \tilde{\Sigma} = \left(\Sigma^{-1} + \sum_{ij} \tilde{\Lambda}_{ij}\right)^{-1}$$

where each $\tilde{\Lambda}_{ij}$ is an n×n matrix with four non-zero entries, augmented from the 2×2 matrix $\Lambda_{ij}$. A short assembly sketch follows.
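A minimal sketch of assembling the posterior covariance from the 2×2 EP site matrices, following the formula above; representing the sites as a dict keyed by (i, j) is an assumption for illustration.

```python
import numpy as np

def posterior_covariance(Sigma, sites):
    # Sigma_tilde = (Sigma^{-1} + sum_ij Lambda_tilde_ij)^{-1}; each 2x2 site
    # Lambda_ij is scattered into the n x n precision at rows/columns (i, j).
    A = np.linalg.inv(Sigma)
    for (i, j), Lam in sites.items():
        idx = np.ix_([i, j], [i, j])
        A[idx] += Lam
    return np.linalg.inv(A)
```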

Page 9: PPT slides

Relational Gaussian Processes

• For any finite collection of data points X, the set of random variables {f_x} conditioned on ε has a multivariate Gaussian distribution:

$$P(\mathbf{f} \mid \mathcal{E}) = \mathcal{N}(\mathbf{0}, \tilde{\Sigma})$$

where the elements of the covariance matrix are given by evaluating the following (covariance) kernel function:

$$\tilde{K}(x, z) = K(x, z) - \mathbf{k}_x^\top (I + M \Sigma)^{-1} M\, \mathbf{k}_z$$

with $M = \sum_{ij} \tilde{\Lambda}_{ij}$ and $\mathbf{k}_x = [K(x, x_1), \ldots, K(x, x_n)]^\top$.
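A sketch of evaluating the corrected kernel for two new points, following the reconstruction above; K_fn is any base kernel (e.g. the rbf_kernel sketch from Page 5), and the helper name and calling convention are assumptions.

```python
import numpy as np

def relational_kernel(K_fn, X, M, x, z):
    # K_tilde(x, z) = K(x, z) - k_x^T (I + M Sigma)^{-1} M k_z,
    # where M is the sum of the augmented EP site precisions.
    n = X.shape[0]
    Sigma = K_fn(X, X)                    # prior covariance on the n instances
    k_x = K_fn(x[None, :], X).ravel()     # [K(x, x_1), ..., K(x, x_n)]
    k_z = K_fn(z[None, :], X).ravel()
    correction = k_x @ np.linalg.solve(np.eye(n) + M @ Sigma, M @ k_z)
    return K_fn(x[None, :], z[None, :])[0, 0] - correction
```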

Page 10: PPT slides

Linkage Prediction

• Joint probability of a linkage ε_rs and the function values at X_r and X_s:

$$P(\epsilon_{rs}, \mathbf{f}_{rs} \mid \mathcal{E}) = P(\epsilon_{rs} \mid \mathbf{f}_{rs})\, \mathcal{N}(\mathbf{f}_{rs}; \mathbf{0}, \tilde{\Sigma}_{rs})$$

• Probability of an edge between X_r and X_s:

$$P(\epsilon_{rs} = 1 \mid \mathcal{E}) = \int P(\epsilon_{rs} = 1 \mid \mathbf{f}_{rs})\, \mathcal{N}(\mathbf{f}_{rs}; \mathbf{0}, \tilde{\Sigma}_{rs})\, df_{x_r}\, df_{x_s}$$

which in the ideal (noise-free) case reduces to

$$P_{ideal}(\epsilon_{rs} = 1) = \frac{1}{2} + \frac{1}{\pi} \arcsin\!\left(\frac{\tilde{K}(x_r, x_s)}{\sqrt{\tilde{K}(x_r, x_r)\, \tilde{K}(x_s, x_s)}}\right)$$
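The ideal-case expression is a one-liner; a minimal sketch (the function name is an illustrative assumption):

```python
import numpy as np

def linkage_probability(K_rs, K_rr, K_ss):
    # P_ideal(eps_rs = 1) = 1/2 + arcsin(rho)/pi, with rho the posterior
    # correlation of f at x_r and x_s under the corrected kernel K_tilde.
    rho = K_rs / np.sqrt(K_rr * K_ss)
    return 0.5 + np.arcsin(rho) / np.pi
```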

Page 11: PPT slides

Semi-supervised learning


[Figure: a toy dataset with two labeled points (+1 and −1) and several unlabeled points marked "?".]

Page 12: PPT slides

Semi-supervised learning


[Figure: the same toy dataset with linkages derived from the nearest-neighbor graph, K = 1.]

Page 13: PPT slides

Semi-supervised learning


[Figure: the same toy dataset with linkages derived from the nearest-neighbor graph, K = 2.]

A graph-construction sketch follows.
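A sketch of how such a nearest-neighbor graph can be built to derive undirected positive linkages; the brute-force distance computation and helper name are illustrative assumptions.

```python
import numpy as np

def knn_linkages(X, K=1):
    # Connect each point to its K nearest neighbors; return undirected edges.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)               # exclude self-links
    edges = set()
    for i in range(X.shape[0]):
        for j in np.argsort(d2[i])[:K]:
            edges.add((min(i, j), max(i, j)))  # store each edge once
    return sorted(edges)
```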

Page 14: PPT slides

Semi-supervised learning

• Apply RGP to obtain the posterior $P(\mathbf{f}_l \mid \mathcal{E})$ over the labeled instances

• The variables $f_{z_l}$ and $y_l$ are related through probit noise:

$$P(y_l \mid f_{z_l}) = \Phi\!\left(\frac{y_l\, f_{z_l}}{\sigma_n}\right)$$

• Applying Bayes' rule:

$$P(\mathbf{f}_l \mid \mathbf{y}, \mathcal{E}) = \frac{1}{P(\mathbf{y} \mid \mathcal{E})}\, P(\mathbf{f}_l \mid \mathcal{E}) \prod_l P(y_l \mid f_{z_l})$$

where $P(\mathbf{f}_l \mid \mathcal{E}) = \mathcal{N}(\mathbf{0}, C)$ is the RGP posterior restricted to the labeled points.

Page 15: PPT slides

Semi-supervised learning

• Predictive distribution at a test point z_t:

$$P(f_{z_t} \mid \mathbf{y}, \mathcal{E}) = \mathcal{N}(\mu_t, \sigma_t^2)$$

the usual GP predictive distribution, computed with the relational kernel $\tilde{K}$ through $\tilde{\mathbf{k}}_t = [\tilde{K}(z_t, z_1), \ldots, \tilde{K}(z_t, z_l)]^\top$ and the EP approximation on the labeled data.

• Obtaining a Bernoulli distribution for classification, by integrating the probit likelihood against the Gaussian predictive:

$$P(y_t = 1 \mid \mathbf{y}, \mathcal{E}) = \Phi\!\left(\frac{\mu_t}{\sqrt{\sigma_n^2 + \sigma_t^2}}\right)$$

A closed-form sketch follows.
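The class probability is available in closed form; a minimal sketch, with the default sigma_n matching the label-noise value quoted in the experiments (the helper name is an assumption).

```python
import numpy as np
from scipy.stats import norm

def class_probability(mu_t, var_t, sigma_n=1e-4):
    # P(y_t = +1 | y, eps) = Phi(mu_t / sqrt(sigma_n^2 + sigma_t^2))
    return norm.cdf(mu_t / np.sqrt(sigma_n ** 2 + var_t))
```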

Page 16: PPT slides

Experiments

• Experimental setup
  – Kernel function
    • Centralized kernel: a linear or Gaussian kernel shifted to the empirical mean (formulas below)
  – Noise level
    • Label noise = 10⁻⁴ (for RGP and GPC)
    • Edge noise selected from the range [0.05, 5]


Gaussian kernel:

$$K(x, z) = \exp\!\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$$

Centralized kernel:

$$K_c(x, z) = K(x, z) - \frac{1}{n} \sum_i K(x, x_i) - \frac{1}{n} \sum_i K(z, x_i) + \frac{1}{n^2} \sum_i \sum_j K(x_i, x_j)$$
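A sketch of applying the centering formula to a precomputed Gram matrix over the n training points (the matrix form below is equivalent to the element-wise sums above):

```python
import numpy as np

def centered_kernel(K):
    # Kc = K - 1K/n - K1/n + 1K1/n^2: the kernel matrix shifted to the
    # empirical mean in feature space, where 1 is the n x n matrix of 1/n.
    n = K.shape[0]
    one = np.ones((n, n)) / n
    return K - one @ K - K @ one + one @ K @ one
```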

Page 17: PPT slides


Results

[Figure: 30 samples drawn from a Gaussian mixture with two components along the x-axis; the two labeled samples are indicated by a diamond and a circle. K = 3.]

Best value = 0.4, based on approximate model evidence.

Page 18: PPT slides


Results

[Figure: the posterior covariance matrix of the RGP learnt from the data; it captures the density information of the unlabeled data.]

Using the posterior covariance matrix learnt from the data as the new prior, supervised learning is carried out; the curves represent the predictive distribution for each class.

Page 19: PPT slides

Results

• Real-world experiment
  – Subset of the WebKB dataset
    • Collected from the CS departments of 4 universities
    • Contains pages with hyperlinks interconnecting them
    • Pages classified into 7 categories (e.g., student, course, other)
  – Documents are preprocessed as vectors of input attributes
  – Hyperlinks are translated into undirected positive linkages
    • 2 pages are likely to be positively correlated if they are hyperlinked by the same hub page
    • No negative linkages
  – Compared with GPC & LapSVM (Sindhwani et al. 2005)


Page 20: PPT slides

Results

• Two classification tasks
  – Student vs. non-student, Other vs. non-other
• Randomly selected 10% of samples as labeled data
• Selection repeated 100 times
• Linear kernel
• Table shows average AUC for predicting the labels of unlabeled cases


                    Student or Not                          Other or Not
Univ.       GPC          LapSVM       RGP          GPC          LapSVM       RGP
Cornell     0.825±0.016  0.987±0.008  0.989±0.009  0.708±0.021  0.865±0.038  0.884±0.025
Texas       0.899±0.016  0.994±0.007  0.999±0.001  0.799±0.021  0.932±0.026  0.906±0.026
Washington  0.839±0.018  0.957±0.014  0.961±0.009  0.782±0.023  0.828±0.025  0.877±0.024
Wisconsin   0.883±0.013  0.976±0.029  0.992±0.008  0.839±0.014  0.812±0.030  0.899±0.015

Page 21: PPT slides

Conclusion

• A novel Bayesian framework to learn from relational data, based on Gaussian processes

• The RGP provides a data-dependent covariance function for supervised learning tasks (classification)

• Applied to semi-supervised learning tasks

• The RGP requires very few labels to generalize on unseen test points
  – Incorporates unlabeled data into model selection


Page 22: PPT slides

Discussion

• The proposed framework can be extended to model:
  – Directed (asymmetric) relations as well as undirected relations
  – Multiple classes of relations
  – Graphs with weighted edges

• The model should be compared to other relational models

• The results can be sensitive to the choice of K in the nearest-neighbor graph


Page 23: PPT slides


Thanks

Questions?