Improving Response Prediction for Dyadic Data


Nik Tuzov

April 2008

http://www.stat.purdue.edu/~ntuzov/

Dyadic Data

• Dyadic data: a “response value” is associated with each pair (dyad) of objects

Applications:

• Social networks

• Internet advertising

• Recommendation systems

Unsupervised learning

• Example: Collaborative filtering (MovieLens project)

                Movie
                1    2    3    4    5
User    1       A    B    C    D    A
        2       A    B    C    C    A
        3       A    B    C    X?   A
        4       Y?                  B
        5

• Movie 1 is “similar” to movie 5, hence Y is likely “B”

• Users 1, 2, 3 are “similar” to each other, hence X is likely “C” or “D”
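The intuition on this slide can be made concrete with a small sketch. It assumes a simple agreement score (the share of co-rated positions on which two rating vectors agree), which is a hypothetical choice for illustration rather than the measure used in the project; missing ratings are `None`, and `similarity` and `predict` are illustrative names.

```python
# Toy rating matrix from the slide: rows = users 1..5, columns = movies 1..5.
# None marks a missing rating (X? and Y? are the entries to predict).
R = [
    ["A", "B", "C", "D", "A"],
    ["A", "B", "C", "C", "A"],
    ["A", "B", "C", None, "A"],        # X? = user 3, movie 4
    [None, None, None, None, "B"],     # Y? = user 4, movie 1
    [None, None, None, None, None],
]

def similarity(u, v):
    """Hypothetical agreement score: share of co-rated positions where u and v match."""
    both = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not both:
        return 0.0
    return sum(a == b for a, b in both) / len(both)

def predict(vectors, target, pos):
    """Candidate values for vectors[target][pos], read from the most similar
    other vectors that do have an entry at position pos."""
    scored = [(similarity(vectors[target], v), v[pos])
              for k, v in enumerate(vectors)
              if k != target and v[pos] is not None]
    if not scored:
        return set()
    best = max(s for s, _ in scored)
    return {val for s, val in scored if s == best}

columns = [list(col) for col in zip(*R)]  # movie-by-movie view of R

# Y?: compare movie 1 with the other movies, read user 4's rating of the best match.
print(predict(columns, target=0, pos=3))      # {'B'}
# X?: compare user 3 with the other users, read their ratings of movie 4.
print(sorted(predict(R, target=2, pos=3)))    # ['C', 'D']
```

The tie between users 1 and 2 is exactly why the slide says X is “C” or “D”: both neighbours agree with user 3 everywhere they overlap.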

Co-clustering with Bregman divergences

• K*L rectangular clusters – direct products of row/column clusters

Movie rating $y_{ij}$ belongs to cluster $(\rho(i), \gamma(j))$, where

$\rho(i) = I$, $I$ takes values from 1 to K

$\gamma(j) = J$, $J$ takes values from 1 to L
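A minimal sketch of how the assignments ρ and γ induce the K*L rectangular blocks. It assumes 0-based cluster labels (0..K-1 and 0..L-1 instead of 1..K and 1..L) and a list of observed (i, j) index pairs; `block_of` and `block_sizes` are hypothetical helper names.

```python
def block_of(i, j, rho, gamma):
    """Co-cluster (I, J) of dyad (i, j): row cluster of i and column cluster of j."""
    return rho[i], gamma[j]

def block_sizes(observed, rho, gamma, K, L):
    """Number of observed dyads falling into each of the K*L rectangular blocks."""
    sizes = [[0] * L for _ in range(K)]
    for i, j in observed:
        I, J = block_of(i, j, rho, gamma)
        sizes[I][J] += 1
    return sizes

# 4 users in 2 row clusters, 3 movies in 2 column clusters, all 12 dyads observed.
rho, gamma = [0, 0, 1, 1], [0, 1, 1]
observed = [(i, j) for i in range(4) for j in range(3)]
print(block_sizes(observed, rho, gamma, K=2, L=2))  # [[2, 4], [2, 4]]
```

Because a block is the direct product of a row cluster and a column cluster, its size (with full data) is simply the product of the two cluster sizes; with missing ratings only observed dyads are counted.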

Co-clustering with Bregman divergences (example from http://videolectures.net/kdd07_agarwal_pdlfm/)

PDLF-GLM Model (Agarwal & Merugu’07)

$Y = [y_{ij}]$ - response matrix (movie ratings), $i = 1, \dots, m$, $j = 1, \dots, n$

$x_{ij}$ - s-dimensional observed covariate

K*L - number of clusters or "blocks" in the response matrix

$\delta_{IJ}$ - scalar "offset" or "interaction term" for cluster (I, J), $I = 1, \dots, K$, $J = 1, \dots, L$

$\pi_{IJ}$ - proportion of $[y_{ij}]$ belonging to cluster (I, J)

$f(y; \theta)$ - probability density of a GLM model; $\theta$ is a scalar parameter

$\beta$ - s-dimensional vector of regression coefficients of the GLM model

Then the conditional density of $y_{ij}$ is represented as:

$$p(y_{ij} \mid x_{ij}) = \sum_{I, J} \pi_{IJ} \, f\bigl(y_{ij};\ \beta^t x_{ij} + \delta_{IJ}\bigr)$$
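To make the mixture concrete, here is a sketch for a binary response. It assumes a Bernoulli GLM with logit link as f(y; θ), which matches the PDLF-Logistic variant in the results but is an assumption here, since the slide leaves f generic; `pdlf_density` and `bernoulli_glm_density` are illustrative names.

```python
import math

def sigmoid(v):
    """Numerically stable logistic function 1 / (1 + e^(-v))."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def bernoulli_glm_density(y, theta):
    """Assumed GLM family: Bernoulli with logit link, y in {0, 1}."""
    p = sigmoid(theta)
    return p if y == 1 else 1.0 - p

def pdlf_density(y, x, beta, delta, pi):
    """p(y | x) = sum over blocks (I, J) of pi[I][J] * f(y; beta^t x + delta[I][J])."""
    lin = sum(b * xk for b, xk in zip(beta, x))  # beta^t x, shared by all blocks
    return sum(pi[I][J] * bernoulli_glm_density(y, lin + delta[I][J])
               for I in range(len(pi)) for J in range(len(pi[I])))
```

Only the offset δ_IJ differs between mixture components; the regression part β^t x is shared, so the blocks act as cluster-specific shifts of the linear predictor.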

Neural Network as an Alternative to GLM

Denote: $y_{ij}$ - observed (non-missing) response

$\hat{y}(x_{ij} \mid \alpha, \beta)$ - fitted response from the neural network, given covariate $x_{ij}$ and estimated weights $\alpha, \beta$

The predicted response for dyad (i, j) is: $\hat{y}(x_{ij} \mid \alpha, \beta) + \delta_{\rho(i)\gamma(j)}$

Input: Response matrix $Y = [y_{ij}]$, $i = 1, \dots, m$, $j = 1, \dots, n$

Covariates $X = [x_{ij}]$, $x_{ij}$ is s-dimensional

Number of row clusters: K

Number of column clusters: L

Output: Neural network weights $\alpha, \beta$ (dimension depends on the network),

K*L offsets $\{\delta_{IJ}\}$,

$\rho(\cdot), \gamma(\cdot)$ - row and column cluster assignments, m- and n-dimensional,

that minimize the overall sum of squared differences between the observed and predicted responses, taken over all observed dyads (i, j).
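The prediction rule and the quantity being minimized can be written as a short sketch. Here `net` stands for any fitted network: a hypothetical callable mapping a covariate vector to ŷ(x | α, β). Missing responses are `None` and are skipped in the sum; cluster labels are 0-based.

```python
def predict_dyad(i, j, x, net, rho, gamma, delta):
    """Predicted response: network output for x_ij plus the offset of block (rho(i), gamma(j))."""
    return net(x[i][j]) + delta[rho[i]][gamma[j]]

def objective(y, x, net, rho, gamma, delta):
    """Sum of squared differences over the observed (non-None) dyads only."""
    total = 0.0
    for i, row in enumerate(y):
        for j, y_ij in enumerate(row):
            if y_ij is not None:
                total += (y_ij - predict_dyad(i, j, x, net, rho, gamma, delta)) ** 2
    return total

# Tiny example: 2 users, 2 movies, 1-dimensional covariates, K = 2, L = 1.
y = [[1.0, None], [0.0, 1.0]]
x = [[[1.0], [0.0]], [[0.0], [2.0]]]

def net(v):  # hypothetical stand-in for a fitted network
    return 0.5 * v[0]

print(objective(y, x, net, rho=[0, 1], gamma=[0, 0], delta=[[0.25], [-0.5]]))  # 0.5625
```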

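Algorithm

Method:

Initialize:

$\rho(\cdot), \gamma(\cdot)$ - randomly

$\delta_{IJ} = 0$

$\alpha, \beta$ - by fitting the network based just on Y and X

Repeat

Step 1: Update offsets (interaction effects) $\delta_{IJ}$:

For each $I = 1, \dots, K$, $J = 1, \dots, L$

$$\delta_{IJ} = \arg\min_{\delta} \sum_{i:\,\rho(i) = I,\; j:\,\gamma(j) = J} \bigl(y_{ij} - \hat{y}(x_{ij} \mid \alpha, \beta) - \delta\bigr)^2,$$

that is,

$$\delta_{IJ} = \operatorname{mean}_{i:\,\rho(i) = I,\; j:\,\gamma(j) = J} \bigl(y_{ij} - \hat{y}(x_{ij} \mid \alpha, \beta)\bigr)$$

Step 2: Update neural network weights $\alpha, \beta$:

$$(\alpha, \beta) = \arg\min_{\alpha, \beta} \sum_{\text{all } i, j} \bigl(y_{ij} - \hat{y}(x_{ij} \mid \alpha, \beta) - \delta_{\rho(i)\gamma(j)}\bigr)^2,$$

which amounts to fitting the network with covariates $x_{ij}$ and "adjusted" response values $(y_{ij} - \delta_{\rho(i)\gamma(j)})$

Step 3: Update row cluster assignments: for each $i = 1, \dots, m$

$$\rho(i) = \arg\min_{I} \sum_{j = 1}^{n} \bigl(y_{ij} - \hat{y}(x_{ij} \mid \alpha, \beta) - \delta_{I\gamma(j)}\bigr)^2$$

Step 4: Update column cluster assignments (similar to Step 3)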
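The four steps can be sketched end to end. To keep the sketch short and dependency-free, it replaces the neural network with a one-variable least-squares line (`fit_line`), so each covariate x_ij is a scalar; this is a stand-in for illustration, not the author's Matlab implementation. Empty blocks keep an offset of 0 and the loop runs a fixed number of iterations; both are assumptions the slides do not specify, and `co_cluster` is a hypothetical name.

```python
import random

def fit_line(pairs):
    """Least-squares fit of t ~ a + b*u; a stand-in for 'fitting the network'."""
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pairs)
    sxy = sum((u - mean_u) * (t - mean_t) for u, t in pairs)
    b = sxy / sxx if sxx > 0 else 0.0
    a = mean_t - b * mean_u
    return lambda u: a + b * u

def co_cluster(y, x, K, L, n_iter=20, seed=0):
    """y[i][j]: response or None if missing; x[i][j]: scalar covariate.
    Returns (model, delta, rho, gamma) with 0-based cluster labels."""
    rng = random.Random(seed)
    m, n = len(y), len(y[0])
    obs = [(i, j) for i in range(m) for j in range(n) if y[i][j] is not None]

    # Initialize: random assignments, zero offsets, model fitted on Y and X only.
    rho = [rng.randrange(K) for _ in range(m)]
    gamma = [rng.randrange(L) for _ in range(n)]
    delta = [[0.0] * L for _ in range(K)]
    net = fit_line([(x[i][j], y[i][j]) for i, j in obs])

    def row_cost(i, I):  # squared error of row i if it were in row cluster I
        return sum((y[i][j] - net(x[i][j]) - delta[I][gamma[j]]) ** 2
                   for j in range(n) if y[i][j] is not None)

    def col_cost(j, J):  # squared error of column j if it were in column cluster J
        return sum((y[i][j] - net(x[i][j]) - delta[rho[i]][J]) ** 2
                   for i in range(m) if y[i][j] is not None)

    for _ in range(n_iter):
        # Step 1: delta_IJ = mean residual y_ij - net(x_ij) over observed dyads of block (I, J).
        sums = [[0.0] * L for _ in range(K)]
        counts = [[0] * L for _ in range(K)]
        for i, j in obs:
            sums[rho[i]][gamma[j]] += y[i][j] - net(x[i][j])
            counts[rho[i]][gamma[j]] += 1
        delta = [[sums[I][J] / counts[I][J] if counts[I][J] else 0.0  # empty block -> 0
                  for J in range(L)] for I in range(K)]
        # Step 2: refit the model on adjusted responses y_ij - delta_{rho(i) gamma(j)}.
        net = fit_line([(x[i][j], y[i][j] - delta[rho[i]][gamma[j]]) for i, j in obs])
        # Step 3: move each row to the cluster with the smallest squared error.
        rho = [min(range(K), key=lambda I: row_cost(i, I)) for i in range(m)]
        # Step 4: same for columns, using the updated row assignments.
        gamma = [min(range(L), key=lambda J: col_cost(j, J)) for j in range(n)]
    return net, delta, rho, gamma
```

Because every step minimizes the same squared-error objective with the other quantities held fixed, the objective cannot increase from one step to the next in this sketch; with a real neural network in Step 2 this holds only to the extent that the network fit actually reaches a minimum.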

Data: MovieLens

• 20603 ratings, 346 users, 966 movies

• From 1 to 198 ratings per movie, 32 to 105 ratings per user.

• 50 covariates for each (user, movie) pair

• 5700 observations held out for validation

• Performance is measured by the area under the Receiver Operating Characteristic (ROC) curve
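For reference, the ROC area equals the probability that a randomly chosen positive dyad receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal sketch, assuming 0/1 labels (the slides do not say how ratings were turned into a binary response); `roc_auc` is an illustrative name, and the pairwise loop is quadratic, which is workable for a 5700-observation validation set but not for much larger ones.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic:
    share of (positive, negative) pairs ranked correctly, ties counting 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1]))  # 0.75
```

A constant score gives exactly 0.5 (every pair is a tie), which is the natural baseline for the validation values reported in the results.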

Neural Network Topology

$Z_l = \sigma(\alpha_{0l} + \alpha_l^t X)$, $l = 1, \dots, r$ - "derived features"

$\hat{y} = \sigma(\beta_0 + \beta^t Z)$ - fitted probability of $\{y = 1\}$

$\sigma(v) = 1/(1 + e^{-v})$ - sigmoid function

Total number of parameters ("weights"): $(s + 1) r + (r + 1)$
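A sketch of the forward pass and the weight count in the same α, β notation; plain Python lists stand in for the weight matrices, and `forward` and `n_weights` are hypothetical names. With s = 50 covariates and r = 40 hidden nodes, as in the MovieLens experiments, the count is 51 * 40 + 41 = 2081 weights.

```python
import math

def sigmoid(v):
    """Numerically stable sigma(v) = 1 / (1 + e^(-v))."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def forward(x, alpha0, alpha, beta0, beta):
    """Single hidden layer with r nodes.
    alpha0: r biases; alpha: r weight vectors of length s; beta0: bias; beta: r weights."""
    z = [sigmoid(a0 + sum(a * xk for a, xk in zip(a_l, x)))  # derived features Z_l
         for a0, a_l in zip(alpha0, alpha)]
    return sigmoid(beta0 + sum(b * z_l for b, z_l in zip(beta, z)))  # fitted P(y = 1)

def n_weights(s, r):
    """(s + 1) * r input-to-hidden weights plus (r + 1) hidden-to-output weights."""
    return (s + 1) * r + (r + 1)

print(n_weights(50, 40))  # 2081
```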

Number of nodes?

• 40 hidden nodes appear to be enough (larger networks overfit to a similar degree)

Results

                   Logistic    Neural    PDLF-      PDLF-     PDLF-     PDLF-
                   regression  network   Logistic   Neural    Neural    Neural
Clusters           N/A         N/A       4 * 4      4 * 4     6 * 6     3 * 4
Hidden nodes       1           40        1          40        40        40
Validation ROC     0.62        0.6742    0.6913     0.7128    0.6919    0.708
Max cluster size   N/A         N/A       2022       1913      5184      1847
Min cluster size   N/A         N/A       274        412       5         709
Max delta          N/A         N/A       0.25       0.13      0.23      0.02
Min delta          N/A         N/A       -0.4       -0.57     -0.36     -0.62

New Covariates?

Sample movies from the cluster with delta = -0.57:

Title                                                       Release date
Nosferatu (Nosferatu, eine Symphonie des Grauens) (1922)    1-Jan-22
Blue Angel, The (Blaue Engel, Der) (1930)                   1-Jan-30
Pinocchio (1940)                                            1-Jan-40
Dial M for Murder (1954)                                    1-Jan-54
8 1/2 (1963)                                                1-Jan-63
Carrie (1976)                                               1-Jan-76
Top Gun (1986)                                              1-Jan-86
Bram Stoker's Dracula (1992)                                1-Jan-92
Mortal Kombat: Annihilation (1997)                          1-Jan-97
Sphere (1998)                                               13-Feb-98

• 756 ratings; 23 females and 55 males; no documentaries

Contribution to ROC

Is the Neural Network Useful?

• The gain in ROC area depends on the order: if the extra linear features (neural network) are added first, the gain from co-clustering is reduced

• The converse also holds: if co-clustering is added first, the gain from the neural network is reduced

• Hence, the information in the linear features is similar to that in the clusters, so

• For this dataset, the neural network is not very helpful, but…

• For other dyadic datasets, the neural network can be much more useful

Related Work

• What if we want to predict the response for a triple such as (Web page, Search query, Web user)?

• B. Long, X. Wu, Z. Zhang, and P. S. Yu. Unsupervised learning on k-partite graphs. In KDD, 2006.

Additional Info

• To obtain a detailed report and Matlab code, please visit my website:


• The project is posted in “Software skills / Matlab” section

• Questions? Contact me on [email protected]

