Final Submission to KDD Cup 2011


    TritonMiners: Ensemble of LFL and ALS

    August 14, 2011

    Abstract

This document describes our final submission for Track 1 of KDD Cup 2011. We achieved a final RMSE of 23.5797 on the test set using an ensemble of an Alternating Least Squares (ALS) model and a Latent Feature Log-Linear (LFL) model, which placed us 38th on the Track 1 leaderboard. Our main contribution is the parallelization of LFL training using the Joint SGD Update by grouping strategy.


    Contents

1 Notation

2 Alternating least squares based Matrix Factorization (ALS)

3 Parallelism for ALS training

3.1 Alternating update and grouping strategy

4 Latent Feature log linear model

5 Parallelism for LFL training

5.1 Joint SGD Update by grouping strategy

6 Results

7 Timing Information


    1 Notation

$r_{u,i}$ — True rating for user $u$ and item $i$

$\hat{r}_{u,i}$ — Predicted rating for user $u$ and item $i$

$U_u$ — Latent feature vector for user $u$

$I_i$ — Latent feature vector for item $i$

$k$ — Size of the feature vector, i.e. the number of latent factors

$U$ — Concatenated feature matrix for all users

$I$ — Concatenated feature matrix for all items

$N_u$ — Number of users

$N_i$ — Number of items

$\eta$ — Learning rate parameter

$\lambda$ — Regularization parameter

$\sigma(x)$ — Sigmoid on $x$

    2 Alternating least squares based Matrix Factorization (ALS)

This method was first presented in [4]. The main differences compared to the previously discussed methods are that a) the update rule for $U_u$ or $I_i$ is the least squares solution, and b) the regularization parameter is multiplied by the number of ratings for that user ($n_u$) or item ($n_i$).

Objective Function:
$$E = \sum_{(u,i)} \left( r_{u,i} - U_u^T I_i \right)^2 + \lambda \left( \sum_u n_u \|U_u\|^2 + \sum_i n_i \|I_i\|^2 \right)$$

Least squares solution for $U_u$ and $I_i$:
$$\left( M_{I(u)} M_{I(u)}^T + \lambda n_u E \right) U_u = V_u$$
where $M_{I(u)}$ is the sub-matrix of $I$ whose columns correspond to the items that user $u$ has rated, $E$ is the identity matrix, and $V_u = M_{I(u)} R^T(u, I(u))$.

Optimization Type: LS

Update Rule: $U_u \leftarrow A_u^{-1} V_u$ where $A_u = M_{I(u)} M_{I(u)}^T + \lambda n_u E$; similarly $I_i \leftarrow B_i^{-1} Y_i$, with a derivation analogous to that of $U_u$.
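The per-user solve can be written compactly. The following is a minimal NumPy sketch of the $U_u$ update under the equations above; the function name and argument layout (`item_factors` as a $k \times N_i$ matrix, `lam` for $\lambda$) are illustrative rather than taken verbatim from our implementation.

    import numpy as np

    def als_update_user(item_factors, rated_items, ratings, lam):
        # item_factors: k x Ni matrix I (item latent vectors as columns)
        # rated_items:  indices I(u) of the items rated by this user
        # ratings:      the user's ratings R(u, I(u)) for those items
        # lam:          regularization parameter lambda
        M = item_factors[:, rated_items]                  # M_I(u)
        n_u = len(rated_items)                            # number of ratings by user u
        A = M @ M.T + lam * n_u * np.eye(M.shape[0])      # A_u = M M^T + lambda n_u E
        V = M @ ratings                                   # V_u = M_I(u) R(u, I(u))^T
        return np.linalg.solve(A, V)                      # U_u = A_u^{-1} V_u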

    3 Parallelism for ALS training

    3.1 Alternating update and grouping strategy

In this scheme, the updates for $U$ and $I$ are decoupled: the $U$ matrix is updated while $I$ is held fixed, and vice versa (alternating). This allows us to exploit the inherent parallelism in the matrix updates. The matrix being updated is split into $N$ groups, and each group is updated independently, as sketched below.


    Figure 1: Each of the blocks of User matrix is updated independently.
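A minimal sketch of this grouping, reusing `als_update_user` from the previous sketch; the thread pool and the `N_GROUPS` constant are illustrative assumptions (any worker pool would do, since the groups touch disjoint columns of $U$).

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    N_GROUPS = 8  # e.g. one group per core

    def update_user_group(users, U, I, ratings_by_user, lam):
        # Each group updates only its own columns of U while I is held fixed,
        # so groups never write to the same memory and can run independently.
        for u in users:
            rated_items, ratings = ratings_by_user[u]
            U[:, u] = als_update_user(I, rated_items, ratings, lam)

    def parallel_user_pass(U, I, ratings_by_user, lam):
        groups = np.array_split(np.arange(U.shape[1]), N_GROUPS)  # split U into N blocks
        with ThreadPoolExecutor(max_workers=N_GROUPS) as pool:
            list(pool.map(lambda g: update_user_group(g, U, I, ratings_by_user, lam), groups))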

    4 Latent Feature log linear model

In the LFL model [1] we restrict the output ratings to be in the set $R_c = \{0, 10, 20, 30, \ldots, 100\}$, each value corresponding to one of the classes $c \in \{0, \ldots, 10\}$, and we learn latent features for each of the ratings. We fix $U^0$ and $I^0$ to be zero, i.e. class 0 serves as the base class.

Objective Function:
$$E = \sum_{(u,i)} \left( r_{u,i} - \frac{\sum_c R_c \exp(U_u^c \cdot I_i^c)}{Z} \right)^2 + \lambda \left( \sum_c \|U_u^c\|^2 + \sum_c \|I_i^c\|^2 \right)$$

$$Z = \sum_c \exp(U_u^c \cdot I_i^c) \quad \text{(normalization term)}$$

$$p(c \mid U^c, I^c) = \frac{\exp(U_u^c \cdot I_i^c)}{Z}$$

$$\hat{r}_{u,i} = \frac{\sum_c R_c \exp(U_u^c \cdot I_i^c)}{Z} = \sum_c R_c \, p(c \mid U^c, I^c)$$

Derivative with respect to each example, for each class $c$:
$$\frac{\partial}{\partial U_{uk}^c} \left( r_{u,i} - \hat{r}_{u,i} \right)^2 = -2 \left( r_{u,i} - \hat{r}_{u,i} \right) p(c \mid U^c, I^c) \left( R_c - \hat{r}_{u,i} \right) I_{ik}^c$$

$$\frac{\partial}{\partial I_{ik}^c} \left( r_{u,i} - \hat{r}_{u,i} \right)^2 = -2 \left( r_{u,i} - \hat{r}_{u,i} \right) p(c \mid U^c, I^c) \left( R_c - \hat{r}_{u,i} \right) U_{uk}^c$$

where $\hat{r}_{u,i} = \sum_c R_c \, p(c \mid U^c, I^c)$ as defined above.

Optimization Type: SGD

Update Rule:
$$U_{uk}^c \leftarrow U_{uk}^c - \eta \left( \frac{\partial E}{\partial U_{uk}^c} + \lambda \, U_{uk}^c \right)$$
$$I_{ik}^c \leftarrow I_{ik}^c - \eta \left( \frac{\partial E}{\partial I_{ik}^c} + \lambda \, I_{ik}^c \right)$$
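A minimal sketch of one SGD step of this model for a single $(u, i, r_{u,i})$ example follows; the class-major array layout and the names `R_c`, `eta`, `lam` are illustrative assumptions, not our exact implementation.

    import numpy as np

    def lfl_sgd_step(U, I, u, i, r_ui, R_c, eta, lam):
        # U: (C, Nu, k) user latent features, one k-vector per class c and user u
        # I: (C, Ni, k) item latent features
        # R_c: (C,) rating value attached to each class (0, 10, ..., 100)
        scores = np.einsum('ck,ck->c', U[:, u], I[:, i])   # U_u^c . I_i^c for every class
        scores -= scores.max()                              # for numerical stability
        p = np.exp(scores)
        p /= p.sum()                                         # p(c | U^c, I^c)
        r_hat = R_c @ p                                      # predicted rating

        # dE/dU_uk^c = -2 (r_ui - r_hat) p(c) (R_c - r_hat) I_ik^c, and symmetrically for I
        coeff = -2.0 * (r_ui - r_hat) * p * (R_c - r_hat)
        coeff[0] = 0.0                                       # class 0 is the fixed base class
        grad_U = coeff[:, None] * I[:, i]
        grad_I = coeff[:, None] * U[:, u]

        U[:, u] -= eta * (grad_U + lam * U[:, u])
        I[:, i] -= eta * (grad_I + lam * I[:, i])
        return r_hat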


    5 Parallelism for LFL training

    5.1 Joint SGD Update by grouping strategy

In this scheme, the SGD updates for $U$ and $I$ are parallelized by creating two disjoint sets of $(u, i)$ pairs, as illustrated in the figure below. The scheme can be applied recursively to each of the disjoint sets for further levels of parallelism. To create the disjoint sets we used the modulo operator to partition the $(u, i)$ pairs; it turns out that on this dataset the modulo operator yields disjoint sets of almost equal size. One of the main advantages of this strategy over the alternating strategy is that the trained model is identical to the one obtained from sequential SGD training, whereas the alternating strategy produces a different model altogether.

    Figure 2: Joint SGD update by grouping independent U and I
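A minimal sketch of one plausible modulo-based partition consistent with the description above; the exact grouping used in our runs is the one shown in Figure 2, so the cell scheduling below, keyed by $(u \bmod m, i \bmod m)$, should be read as an illustrative assumption.

    from collections import defaultdict

    def modulo_rounds(examples, m=2):
        # Bucket (u, i, r) examples into cells keyed by (u mod m, i mod m).
        cells = defaultdict(list)
        for (u, i, r) in examples:
            cells[(u % m, i % m)].append((u, i, r))
        # Within one round, the m cells {(a, (a + shift) mod m)} share no user rows
        # and no item rows, so their SGD updates can run in parallel without
        # conflicting writes; rounds are processed one after another.
        rounds = []
        for shift in range(m):
            rounds.append([cells[(a, (a + shift) % m)] for a in range(m)])
        return rounds

With m = 2 this yields two rounds of two mutually independent blocks each; applying the idea with a larger m, or recursively inside each cell, gives the further levels of parallelism mentioned above.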

    6 Results

The table below shows the results from our experiments using both the training and validation sets during training. The ensemble coefficients were learned by linear regression on the validation set, using models trained on the training set; a minimal sketch of this fit is given after the table.

Method (parameters)                              Test RMSE
ALS with validation set (1 / - / 200)            23.88
LFL with validation set (10 / 0.0001 / 120)      23.87
Ensemble of LFL and ALS                          23.57

Table 4: Current results on the test set
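A minimal sketch of fitting such ensemble coefficients by linear regression on the validation predictions; the inclusion of a bias term and the names `pred_als` and `pred_lfl` are illustrative assumptions.

    import numpy as np

    def fit_ensemble(pred_als, pred_lfl, true_ratings):
        # Stack the two models' validation-set predictions plus a bias column,
        # then solve the least-squares problem for the mixing coefficients.
        X = np.column_stack([pred_als, pred_lfl, np.ones_like(pred_als)])
        coeffs, *_ = np.linalg.lstsq(X, true_ratings, rcond=None)
        return coeffs  # [w_als, w_lfl, bias]

    def ensemble_predict(coeffs, pred_als, pred_lfl):
        return coeffs[0] * pred_als + coeffs[1] * pred_lfl + coeffs[2]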


    7 Timing Information

All these runs used 8 cores on the same node. It takes around 250 seconds to load all the files into memory for Track 1 on a single compute node; on vSMP the loading time is around 400 seconds.

Method (k)      Time per epoch (sec)
ALS (200)       4000
LFL (120)       1200

Table 5: Run times on a single node

    References

1. A. K. Menon and C. Elkan. A log-linear model with latent features for dyadic prediction. In IEEE International Conference on Data Mining (ICDM), Sydney, Australia, 2010.

2. Y. Zhou, D. M. Wilkinson, R. Schreiber, and R. Pan. Large-scale parallel collaborative filtering for the Netflix Prize. In AAIM, pages 337-348, 2008.
