Final Submission to KDD Cup 2011
7/31/2019 Final Submission to KDD Cup 2011
TritonMiners: Ensemble of LFL and ALS
August 14, 2011
Abstract
This document describes our final submission for Track 1 of KDD Cup
2011. We achieved a final RMSE of 23.5797 on the test set using an ensemble of an Alternating
Least Squares (ALS) and a Latent Feature Log-Linear (LFL) approach, placing 38th on the
Track 1 leaderboard. Our main contribution is the parallelization of LFL training using the
joint SGD update by grouping strategy.
Contents
1 Notation
2 Alternating least squares based Matrix Factorization (ALS)
3 Parallelism for ALS training
3.1 Alternating update and grouping strategy
4 Latent feature log-linear model
5 Parallelism for LFL training
5.1 Joint SGD update by grouping strategy
6 Results
7 Timing Information
1 Notation
r_{u,i}        True rating for user u and item i
\hat{r}_{u,i}  Predicted rating for user u and item i
U_u            Latent feature vector for user u
I_i            Latent feature vector for item i
k              Size of the feature vector, i.e. the number of latent factors
U              Concatenated feature matrix for all users
I              Concatenated feature matrix for all items
N_u            Number of users
N_i            Number of items
\eta           Learning rate parameter
\lambda        Regularization parameter
\sigma(x)      Sigmoid of x
2 Alternating least squares based Matrix Factorization (ALS)
This method was first presented in [2]. The main differences compared to the previously discussed
methods are that a) the update rule for U_u or I_i is the least-squares solution, and b) the regularization
parameter is multiplied by the number of ratings for that user (n_u) or item (n_i).
Objective Function:
E = \sum_{(u,i)} (r_{u,i} - U_u^T I_i)^2 + \lambda \left( \sum_u n_u \|U_u\|^2 + \sum_i n_i \|I_i\|^2 \right)
Least-squares solution for U_u and I_i:
(M_{I(u)} M_{I(u)}^T + \lambda n_u E) U_u = V_u
where M_{I(u)} is the sub-matrix of I whose columns correspond to the items that user u has rated,
E is the identity matrix, and V_u = M_{I(u)} R^T(u, I(u)).
Optimization Type: LS
Update Rule:
U_u \leftarrow A_u^{-1} V_u, where A_u = M_{I(u)} M_{I(u)}^T + \lambda n_u E
I_i \leftarrow B_i^{-1} Y_i (derivation similar to U_u)
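The per-user least-squares solve can be sketched as follows; this is a minimal NumPy sketch, not the submission's code, and the function name and argument layout (`M_I_u` as the k x n_u sub-matrix of I for the items user u rated) are our own:

```python
import numpy as np

def als_user_update(M_I_u, r_u, lam, n_u):
    """Least-squares update for one user's latent vector U_u.

    M_I_u : (k, n_u) sub-matrix of I, columns = items rated by user u
    r_u   : (n_u,) ratings given by user u
    lam   : regularization parameter (lambda)
    n_u   : number of ratings by user u
    """
    k = M_I_u.shape[0]
    # A_u = M_{I(u)} M_{I(u)}^T + lambda * n_u * E
    A_u = M_I_u @ M_I_u.T + lam * n_u * np.eye(k)
    # V_u = M_{I(u)} R^T(u, I(u))
    V_u = M_I_u @ r_u
    # U_u = A_u^{-1} V_u, solved directly instead of forming the inverse
    return np.linalg.solve(A_u, V_u)
```

Solving the k x k system directly (rather than inverting A_u) is the standard numerically stable choice and is cheap since k is small relative to the number of ratings.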
3 Parallelism for ALS training
3.1 Alternating update and grouping strategy
In this scheme, the updates for U and I are decoupled: the U matrix is updated while holding
I fixed, and vice versa (alternating). Because each U_u depends only on the fixed I, the row updates
are independent, which lets us exploit the inherent parallelism in the matrix updates. The matrix
being updated is split into N groups and each group is updated independently.
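The grouping can be sketched as below; this is our own illustration, assuming a hypothetical `ratings` container that maps each user index to (item indices, rating values). Since I is fixed during the U sweep, each group of users can be solved on its own worker without conflicts:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def update_all_users(I, ratings, lam, n_groups=4):
    """Update every U_u independently while I is held fixed.

    I       : (k, N_i) item feature matrix (fixed during this sweep)
    ratings : dict user_index -> (item_indices, rating_values)
    """
    k = I.shape[0]
    users = list(ratings)
    U = np.zeros((k, len(users)))

    def update_group(group):
        # Each user's solve touches only its own column of U,
        # so groups never write to the same memory.
        for u in group:
            items, vals = ratings[u]
            M = I[:, items]                               # sub-matrix of rated items
            A = M @ M.T + lam * len(items) * np.eye(k)    # regularized normal matrix
            U[:, u] = np.linalg.solve(A, M @ vals)

    groups = np.array_split(np.array(users), n_groups)
    with ThreadPoolExecutor(max_workers=n_groups) as ex:
        list(ex.map(update_group, groups))
    return U
```

The item sweep is symmetric: fix U, split the items into groups, and solve each I_i the same way.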
Figure 1: Each of the blocks of User matrix is updated independently.
4 Latent feature log-linear model
In the LFL model [1] we restrict output ratings to the set R_c = {0, 10, 20, ..., 100}, each value
corresponding to one of the classes c = {0, ..., 10}, and learn latent features for each rating class.
We fix U^0 and I^0 to zero, i.e. class 0 is the base class.
Objective Function:
E = \sum_{(u,i)} \left( r_{u,i} - \frac{\sum_c R_c \exp(U_u^c \cdot I_i^c)}{Z} \right)^2 + \lambda \sum_c \left( \|U_u^c\|^2 + \|I_i^c\|^2 \right)
Z = \sum_c \exp(U_u^c \cdot I_i^c) - Normalization term
p(c \mid U^c, I^c) = \frac{\exp(U_u^c \cdot I_i^c)}{Z}
\hat{r} = \frac{\sum_c R_c \exp(U_u^c \cdot I_i^c)}{Z} = \sum_c R_c \, p(c \mid U^c, I^c)
Derivative with respect to each example, for each c:
\frac{\partial E}{\partial U_{uk}^c} = -2 \left( r_{u,i} - \sum_{c'} R_{c'} p(c' \mid U^{c'}, I^{c'}) \right) p(c \mid U^c, I^c) \left( R_c - \sum_{c'} R_{c'} p(c' \mid U^{c'}, I^{c'}) \right) I_{ik}^c
\frac{\partial E}{\partial I_{ik}^c} = -2 \left( r_{u,i} - \sum_{c'} R_{c'} p(c' \mid U^{c'}, I^{c'}) \right) p(c \mid U^c, I^c) \left( R_c - \sum_{c'} R_{c'} p(c' \mid U^{c'}, I^{c'}) \right) U_{uk}^c
Optimization Type: SGD
Update Rule:
U_{uk}^c \leftarrow U_{uk}^c - \eta \left( \frac{\partial E}{\partial U_{uk}^c} + \lambda U_{uk}^c \right)
I_{ik}^c \leftarrow I_{ik}^c - \eta \left( \frac{\partial E}{\partial I_{ik}^c} + \lambda I_{ik}^c \right)
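One SGD step for a single (u, i) observation can be sketched as follows. This is our own minimal sketch with hypothetical names; for simplicity it updates and regularizes all classes, whereas the submission keeps the base class U^0, I^0 fixed at zero:

```python
import numpy as np

def lfl_sgd_step(Uu, Ii, r, R, eta, lam):
    """One SGD step of the LFL model for a single observed rating r.

    Uu, Ii : (C, k) per-class latent vectors for user u and item i
    R      : (C,) rating value of each class, e.g. [0, 10, ..., 100]
    eta    : learning rate, lam : regularization parameter
    """
    s = np.einsum('ck,ck->c', Uu, Ii)          # per-class scores U_u^c . I_i^c
    p = np.exp(s - s.max())                    # softmax p(c | U^c, I^c),
    p /= p.sum()                               # shifted by max(s) for stability
    r_hat = R @ p                              # predicted rating sum_c R_c p(c)
    # dE/ds_c = -2 (r - r_hat) p_c (R_c - r_hat); multiplying by I (resp. U)
    # gives the per-component gradients from the derivation above
    g = -2.0 * (r - r_hat) * p * (R - r_hat)
    Uu_new = Uu - eta * (g[:, None] * Ii + lam * Uu)
    Ii_new = Ii - eta * (g[:, None] * Uu + lam * Ii)
    return Uu_new, Ii_new, r_hat
```

Note the gradient with respect to U uses the old I (and vice versa), matching a plain joint SGD step on both factors at once.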
5 Parallelism for LFL training
5.1 Joint SGD Update by grouping strategy
In this scheme, the SGD updates for U and I are parallelized by creating two disjoint sets of
(u,i) pairs, as illustrated in the figure below. The scheme can be applied recursively to each of the
disjoint sets for further levels of parallelism. To create the disjoint sets we partitioned the (u,i)
pairs using the modulo operator; on this dataset the modulo operator happens to split the pairs into
sets of almost equal size. One of the main advantages of this strategy over the alternating strategy is
that the trained model is identical to the model one would obtain from sequential SGD training,
whereas the alternating strategy produces a different model altogether.
Figure 2: Joint SGD update by grouping independent U and I
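One plausible reading of the modulo partitioning can be sketched as below (the function name and container are ours, not the submission's): pairs are bucketed by (u mod m, i mod m), and buckets that share neither user rows nor item rows, e.g. (0,0) and (1,1), can be updated concurrently without conflicting writes:

```python
def joint_sgd_partition(pairs, m=2):
    """Split (u, i) rating pairs into disjoint buckets via the modulo operator.

    Pairs in bucket (a, b) satisfy u % m == a and i % m == b, so buckets
    whose residues differ in BOTH coordinates touch disjoint rows of U and
    of I and can run SGD in parallel; the remaining buckets run in a later
    phase (or are themselves split recursively).
    """
    groups = {}
    for u, i in pairs:
        groups.setdefault((u % m, i % m), []).append((u, i))
    return groups
```

With m = 2 this yields the two-way split described above; applying the same partition inside each bucket gives the recursive levels of parallelism.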
6 Results
The results below are from experiments that used both the training and the validation set during
training. The ensemble coefficients were learned by linear regression on the validation set, using a
model trained on the training set.
Method (parameters)                      Test RMSE
ALS with validation set (1/-/200)        23.88
LFL with validation set (10/.0001/120)   23.87
Ensemble of LFL and ALS                  23.57
Table 4: Current results on the test set
7 Timing Information
All these runs used 8 cores on the same node. It takes around 250 seconds to load all the files
for Track 1 into memory on a single compute node; on vSMP the loading time is around 400 seconds.
Method (k)   Time per epoch (sec)
ALS (200)    4000
LFL (120)    1200
Table 5: Run times on a single node
References
1. Aditya Krishna Menon, Charles Elkan. A log-linear model with latent features for dyadic
prediction. In IEEE International Conference on Data Mining (ICDM), Sydney, Australia,
2010.
2. Zhou, Y., Wilkinson, D.M., Schreiber, R., Pan, R. Large-Scale Parallel Collaborative
Filtering for the Netflix Prize. In AAIM (2008), 337-348.