Final Submission to KDD Cup 2011
7/31/2019 Final Submission to KDD Cup 2011
TritonMiners: Ensemble of LFL and ALS
August 14, 2011
Abstract
This document describes our final submission for Track 1 of KDD Cup
2011. We achieved a final RMSE of 23.5797 on the test set using an ensemble of an Alternating
Least Squares (ALS) and a Latent Feature Log-Linear (LFL) approach, placing 38th on the
Track 1 leaderboard. Our main contribution is the parallelization of LFL training using the
joint SGD update by grouping strategy.
Contents
1 Notation
2 Alternating least squares based Matrix Factorization (ALS)
3 Parallelism for ALS training
3.1 Alternating update and grouping strategy
4 Latent feature log-linear model
5 Parallelism for LFL training
5.1 Joint SGD update by grouping strategy
6 Results
7 Timing Information
1 Notation
r_{u,i}        True rating for user u and item i
\hat{r}_{u,i}  Predicted rating for user u and item i
U_u            Latent feature vector for user u
I_i            Latent feature vector for item i
k              Size of the feature vector, i.e. the number of latent factors
U              Concatenated feature matrix for all users
I              Concatenated feature matrix for all items
N_u            Number of users
N_i            Number of items
\eta           Learning rate parameter
\lambda        Regularization parameter
\sigma(x)      Sigmoid of x
2 Alternating least squares based Matrix Factorization (ALS)
This method was first presented in [2]. The main differences compared to the previously discussed
methods are that a) the update rule for U_u or I_i is the least-squares solution, and b) the regularization
parameter is multiplied by the number of ratings for that user (n_u) or item (n_i).
Objective Function:
E = \sum_{(u,i)} (r_{u,i} - U_u^T I_i)^2 + \lambda \left( \sum_u n_u \|U_u\|^2 + \sum_i n_i \|I_i\|^2 \right)
Least-squares solution for U_u and I_i:
(M_{I(u)} M_{I(u)}^T + \lambda n_u E) U_u = V_u
where M_{I(u)} is the sub-matrix of I whose columns correspond to the items that user u has rated,
E is the identity matrix, and V_u = M_{I(u)} R^T(u, I(u)).
Optimization Type: LS
Update Rule:
U_u \leftarrow A_u^{-1} V_u, where A_u = M_{I(u)} M_{I(u)}^T + \lambda n_u E
I_i \leftarrow B_i^{-1} Y_i (derivation similar to U_u)
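The per-user least-squares solve can be sketched as follows; this is a minimal NumPy sketch, not the submission's code, and the function name and argument layout (`M_I_u` as the k x n_u sub-matrix of I for the items user u rated) are our own:

```python
import numpy as np

def als_user_update(M_I_u, r_u, lam, n_u):
    """Least-squares update for one user's latent vector U_u.

    M_I_u : (k, n_u) sub-matrix of I, columns = items rated by user u
    r_u   : (n_u,) ratings given by user u
    lam   : regularization parameter (lambda)
    n_u   : number of ratings by user u
    """
    k = M_I_u.shape[0]
    # A_u = M_{I(u)} M_{I(u)}^T + lambda * n_u * E
    A_u = M_I_u @ M_I_u.T + lam * n_u * np.eye(k)
    # V_u = M_{I(u)} R^T(u, I(u))
    V_u = M_I_u @ r_u
    # U_u = A_u^{-1} V_u, solved directly instead of forming the inverse
    return np.linalg.solve(A_u, V_u)
```

Solving the k x k system directly (rather than inverting A_u) is the standard numerically stable choice and is cheap since k is small relative to the number of ratings.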
3 Parallelism for ALS training
3.1 Alternating update and grouping strategy
In this scheme, the updates for U and I are decoupled: the U matrix is updated while holding
I fixed, and vice versa (alternating). Because each U_u depends only on the fixed I, the row updates
are independent, which lets us exploit the inherent parallelism in the matrix updates. The matrix
being updated is split into N groups and each group is updated independently.
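The grouping can be sketched as below; this is our own illustration, assuming a hypothetical `ratings` container that maps each user index to (item indices, rating values). Since I is fixed during the U sweep, each group of users can be solved on its own worker without conflicts:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def update_all_users(I, ratings, lam, n_groups=4):
    """Update every U_u independently while I is held fixed.

    I       : (k, N_i) item feature matrix (fixed during this sweep)
    ratings : dict user_index -> (item_indices, rating_values)
    """
    k = I.shape[0]
    users = list(ratings)
    U = np.zeros((k, len(users)))

    def update_group(group):
        # Each user's solve touches only its own column of U,
        # so groups never write to the same memory.
        for u in group:
            items, vals = ratings[u]
            M = I[:, items]                               # sub-matrix of rated items
            A = M @ M.T + lam * len(items) * np.eye(k)    # regularized normal matrix
            U[:, u] = np.linalg.solve(A, M @ vals)

    groups = np.array_split(np.array(users), n_groups)
    with ThreadPoolExecutor(max_workers=n_groups) as ex:
        list(ex.map(update_group, groups))
    return U
```

The item sweep is symmetric: fix U, split the items into groups, and solve each I_i the same way.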
Figure 1: Each of the blocks of User matrix is updated independently.
4 Latent feature log-linear model
In the LFL model [1] we restrict output ratings to the set R_c = {0, 10, 20, ..., 100}, each value
corresponding to one of the classes c = {0, ..., 10}, and learn latent features for each rating class.
We fix U^0 and I^0 to zero, i.e. class 0 is the base class.
Objective Function:
E = \sum_{(u,i)} \left( r_{u,i} - \frac{\sum_c R_c \exp(U_u^c \cdot I_i^c)}{Z} \right)^2 + \lambda \sum_c \left( \|U_u^c\|^2 + \|I_i^c\|^2 \right)
Z = \sum_c \exp(U_u^c \cdot I_i^c) - Normalization term
p(c \mid U^c, I^c) = \frac{\exp(U_u^c \cdot I_i^c)}{Z}
\hat{r} = \frac{\sum_c R_c \exp(U_u^c \cdot I_i^c)}{Z} = \sum_c R_c \, p(c \mid U^c, I^c)
Derivative with respect to each example, for each c:
\frac{\partial E}{\partial U_{uk}^c} = -2 \left( r_{u,i} - \sum_{c'} R_{c'} p(c' \mid U^{c'}, I^{c'}) \right) p(c \mid U^c, I^c) \left( R_c - \sum_{c'} R_{c'} p(c' \mid U^{c'}, I^{c'}) \right) I_{ik}^c
\frac{\partial E}{\partial I_{ik}^c} = -2 \left( r_{u,i} - \sum_{c'} R_{c'} p(c' \mid U^{c'}, I^{c'}) \right) p(c \mid U^c, I^c) \left( R_c - \sum_{c'} R_{c'} p(c' \mid U^{c'}, I^{c'}) \right) U_{uk}^c
Optimization Type: SGD
Update Rule:
U_{uk}^c \leftarrow U_{uk}^c - \eta \left( \frac{\partial E}{\partial U_{uk}^c} + \lambda U_{uk}^c \right)
I_{ik}^c \leftarrow I_{ik}^c - \eta \left( \frac{\partial E}{\partial I_{ik}^c} + \lambda I_{ik}^c \right)
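One SGD step for a single (u, i) observation can be sketched as follows. This is our own minimal sketch with hypothetical names; for simplicity it updates and regularizes all classes, whereas the submission keeps the base class U^0, I^0 fixed at zero:

```python
import numpy as np

def lfl_sgd_step(Uu, Ii, r, R, eta, lam):
    """One SGD step of the LFL model for a single observed rating r.

    Uu, Ii : (C, k) per-class latent vectors for user u and item i
    R      : (C,) rating value of each class, e.g. [0, 10, ..., 100]
    eta    : learning rate, lam : regularization parameter
    """
    s = np.einsum('ck,ck->c', Uu, Ii)          # per-class scores U_u^c . I_i^c
    p = np.exp(s - s.max())                    # softmax p(c | U^c, I^c),
    p /= p.sum()                               # shifted by max(s) for stability
    r_hat = R @ p                              # predicted rating sum_c R_c p(c)
    # dE/ds_c = -2 (r - r_hat) p_c (R_c - r_hat); multiplying by I (resp. U)
    # gives the per-component gradients from the derivation above
    g = -2.0 * (r - r_hat) * p * (R - r_hat)
    Uu_new = Uu - eta * (g[:, None] * Ii + lam * Uu)
    Ii_new = Ii - eta * (g[:, None] * Uu + lam * Ii)
    return Uu_new, Ii_new, r_hat
```

Note the gradient with respect to U uses the old I (and vice versa), matching a plain joint SGD step on both factors at once.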
5 Parallelism for LFL training
5.1 Joint SGD Update by grouping strategy
In this scheme, the SGD updates for U and I are parallelized by creating two disjoint sets of
(u,i) pairs, as illustrated in the figure below. The scheme can be applied recursively to each of the
disjoint sets for further levels of parallelism. To create the disjoint sets we partitioned the (u,i)
pairs using the modulo operator; on this dataset the modulo operator happens to split the pairs into
sets of almost equal size. One of the main advantages of this strategy over the alternating strategy is
that the trained model is identical to the model one would obtain from sequential SGD training,
whereas the alternating strategy produces a different model altogether.
Figure 2: Joint SGD update by grouping independent U and I
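One plausible reading of the modulo partitioning can be sketched as below (the function name and container are ours, not the submission's): pairs are bucketed by (u mod m, i mod m), and buckets that share neither user rows nor item rows, e.g. (0,0) and (1,1), can be updated concurrently without conflicting writes:

```python
def joint_sgd_partition(pairs, m=2):
    """Split (u, i) rating pairs into disjoint buckets via the modulo operator.

    Pairs in bucket (a, b) satisfy u % m == a and i % m == b, so buckets
    whose residues differ in BOTH coordinates touch disjoint rows of U and
    of I and can run SGD in parallel; the remaining buckets run in a later
    phase (or are themselves split recursively).
    """
    groups = {}
    for u, i in pairs:
        groups.setdefault((u % m, i % m), []).append((u, i))
    return groups
```

With m = 2 this yields the two-way split described above; applying the same partition inside each bucket gives the recursive levels of parallelism.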
6 Results
The results below are from experiments that used both the training and the validation set during
training. The ensemble coefficients were learned by linear regression on the validation set, using a
model trained on the training set.
Method (parameters)                      Test RMSE
ALS with validation set (1/-/200)        23.88
LFL with validation set (10/.0001/120)   23.87
Ensemble of LFL and ALS                  23.57
Table 4: Current results on the test set
7 Timing Information
All these runs used 8 cores on the same node. It takes around 250 seconds to load all the files
for Track 1 into memory on a single compute node; on vSMP the loading time is around 400 seconds.
Method (k)   Time per epoch (sec)
ALS (200)    4000
LFL (120)    1200
Table 5: Run times on a single node
References
1. Aditya Krishna Menon, Charles Elkan. A log-linear model with latent features for dyadic
prediction. In IEEE International Conference on Data Mining (ICDM), Sydney, Australia,
2010.
2. Zhou, Y., Wilkinson, D.M., Schreiber, R., Pan, R. Large-Scale Parallel Collaborative
Filtering for the Netflix Prize. In AAIM (2008), 337-348.