Transcript of "Accelerated Inexact Soft-Impute for Fast Large-Scale Matrix Completion"


    Accelerated Inexact Soft-Impute for Fast Large-Scale Matrix Completion

    Quanming Yao

    Department of Computer Science and Engineering
    Hong Kong University of Science and Technology
    Hong Kong

    Joint work with James Kwok

    Quanming Yao AIS-Impute for Matrix Completion


    Outline

    1 Introduction

    2 Related Work

    3 Proposed Algorithm

    4 Experiments


    Motivating Applications

    Recommender systems: predict rating by user i on item j


    Motivating Applications

    Similarity among users and items: low-rank assumption


    Motivating Applications

    Image inpainting: fill in missing pixels

    A natural image can be well approximated by a low-rank matrix


    Matrix Completion

    min_X ½‖PΩ(X − O)‖F² + λ‖X‖∗

    X ∈ R^(m×n): low-rank matrix to be recovered (m ≤ n)
    O ∈ R^(m×n): observed elements
    [PΩ(A)]ij = Aij if Ωij = 1, and 0 otherwise

    ‖X‖∗: nuclear norm (sum of X's singular values, non-smooth): ‖X‖∗ = ∑ᵢ₌₁..m σi(X)

    find X which is low-rank and consistent with the observations
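As a concrete reading of this objective, a minimal numpy sketch (function and variable names are illustrative, not from the paper's code; `mask` plays the role of Ω):

```python
import numpy as np

def mc_objective(X, O, mask, lam):
    """0.5 * ||P_Omega(X - O)||_F^2 + lam * ||X||_* for dense numpy arrays."""
    resid = mask * (X - O)                              # P_Omega keeps only observed entries
    f = 0.5 * np.sum(resid ** 2)                        # smooth data-fitting term
    nuclear = np.linalg.svd(X, compute_uv=False).sum()  # sum of singular values
    return f + lam * nuclear
```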


    Proximal Gradient Descent

    min_x f(x) + λ g(x)

    f(·): convex and smooth
    g(·): convex, can be non-smooth

    x_(t+1) = arg min_x f(xt) + ⟨x − xt, ∇f(xt)⟩ + ½‖x − xt‖² + λ g(x)
            = arg min_x ½‖x − zt‖² + λ g(x)   ← proximal step (where zt = xt − ∇f(xt))

    often has simple closed-form solution

    convergence rate: O(1/T ), where T is number of iterations
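A minimal sketch of this iteration for the ℓ1-regularized least-squares case, where the proximal step has the closed-form soft-thresholding solution (step size 1/L; all names here are illustrative):

```python
import numpy as np

def soft_threshold(z, lam):
    # closed-form proximal step of lam * ||x||_1
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def proximal_gradient(A, b, lam, T=500):
    # minimize f(x) + lam*g(x) with f(x) = 0.5*||Ax - b||^2, g(x) = ||x||_1
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    for _ in range(T):
        z = x - (A.T @ (A @ x - b)) / L    # gradient step on the smooth part
        x = soft_threshold(z, lam / L)     # proximal step on the non-smooth part
    return x
```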


    Proximal Gradient Descent - Acceleration

    min_x f(x) + λ g(x)

    can be accelerated to O(1/T²) [Nesterov, 2013]

    yt = (1 + θt)xt − θt xt−1
    zt = yt − ∇f(yt)
    x_(t+1) = arg min_x ½‖x − zt‖² + λ g(x)

    e.g., θt = (t − 1)/(t + 2)
    can be seen as a momentum method with a specified weight
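The same ℓ1 sketch with the momentum step added, using θt = (t − 1)/(t + 2) as above (illustrative names; yt = (1 + θt)xt − θt·xt−1 is written equivalently as xt + θt(xt − xt−1)):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def accelerated_proximal_gradient(A, b, lam, T=500):
    # accelerated proximal gradient for 0.5*||Ax - b||^2 + lam*||x||_1
    L = np.linalg.norm(A, 2) ** 2
    x_prev = x = np.zeros(A.shape[1])
    for t in range(1, T + 1):
        theta = (t - 1) / (t + 2)
        y = x + theta * (x - x_prev)       # momentum extrapolation
        z = y - (A.T @ (A @ y - b)) / L    # gradient step at the extrapolated point
        x_prev, x = x, soft_threshold(z, lam / L)
    return x
```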


    Proximal Gradient Descent for Matrix Completion

    min_X ½‖PΩ(X − O)‖F² + λ‖X‖∗,  with f(X) = ½‖PΩ(X − O)‖F² and g(X) = ‖X‖∗

    Let the SVD of matrix Z be UΣVᵀ.

    Proximal Step for Matrix Completion

    arg min_X ½‖X − Z‖F² + λ‖X‖∗ = U (Σ − λI)₊ Vᵀ ≡ SVTλ(Z)

    [(A)₊]ij = max(Aij, 0)

    singular value thresholding (SVT): shrinks singular values no bigger than λ to 0

    Acceleration can be used [Ji and Ye, 2009; Toh and Yun, 2010].
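A direct numpy sketch of SVTλ via a full SVD (exact, not yet exploiting any structure):

```python
import numpy as np

def svt(Z, lam):
    # proximal step of lam*||X||_*: soft-threshold the singular values of Z
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - lam, 0.0)           # (Sigma - lam*I)_+
    return (U * s) @ Vt                    # U @ diag(s) @ Vt
```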


    Soft-Impute [Mazumder et al., 2010]

    Zt = PΩ(O) + P⊥Ω(Xt),  Xt+1 = SVTλ(Zt).

    [P⊥Ω(A)]ij = Aij if Ωij = 0, and 0 otherwise (complement of PΩ(A))

    To compute the SVD, the basic operations are matrix multiplications of the form Zt u and Ztᵀ v

    Key observation: Zt is sparse + low-rank

    Let Xt = UtΣtVtᵀ. For any u ∈ R^n,

    Zt u = PΩ(O − Xt) u  [sparse: O(‖Ω‖₁)]  +  UtΣt(Vtᵀ u)  [low-rank: O((m+n)k)]

    Rank-k SVD takes O(‖Ω‖₁k + (m + n)k²) time, instead of O(mnk) (similarly for Ztᵀ v)

    k is much smaller than m and n; ‖Ω‖₁ is much smaller than mn
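A sketch of the "sparse + low-rank" matrix-vector product: `S` stands for the sparse part PΩ(O − Xt) (kept as a dense array here for brevity; in practice it would be a sparse matrix), and `(U, s, Vt)` for the factors of Xt:

```python
import numpy as np

def splr_matvec(S, U, s, Vt, u):
    # Z @ u with Z = S + U @ diag(s) @ Vt, without ever forming Z:
    # the sparse part costs O(nnz(S)), the low-rank part O((m + n)k)
    return S @ u + U @ (s * (Vt @ u))
```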


    Soft-Impute is Proximal Gradient

    Zt = Xt − ∇f(Xt)                          (proximal gradient)
       = Xt − PΩ(Xt − O) = P⊥Ω(Xt) + PΩ(O)    (Soft-Impute)

    Soft-Impute = Proximal Gradient

    Possible to use acceleration and obtain O(1/T 2) rate

    Previous work suggested that this is not useful

    “sparse + low-rank” structure no longer exists
    increase in iteration complexity > gain in convergence rate


    Main Contributions

    Acceleration is useful!

    1 “sparse + low-rank” structure can still be used

    maintain low iteration complexity

    improve convergence rate to O(1/T 2)

    2 Speed up SVT using the power method

    further reduces iteration complexity

    use of approximation still yields O(1/T 2) convergence rate


    “Sparse + Low-Rank” Structure

    With acceleration,

    Zt = PΩ(O − Yt) + Yt
       = PΩ(O − Yt)  [sparse]  +  (1 + θt)Xt − θt Xt−1  [sum of two low-rank matrices]

    For any u,

    Zt u = PΩ(O − Yt) u  [O(‖Ω‖₁)]  +  (1 + θt) UtΣtVtᵀ u  [O((m+n)k)]  −  θt Ut−1Σt−1Vt−1ᵀ u  [O((m+n)k)].

    rank-k SVD takes O(‖Ω‖₁k + (m + n)k²) time (same as Soft-Impute)

    but rate is improved to O(1/T 2) (because of acceleration)


    Approximate SVT - Motivation

    The iterative procedure becomes

    Yt = (1 + θt)Xt − θt Xt−1
    Zt = PΩ(O − Yt) + Yt
    Xt+1 = SVTλ(Zt)

    Motivations

    in SVT, only the singular vectors with singular values ≥ λ are needed, yet the partial SVD still has to be solved exactly

    due to the iterative nature of proximal gradient descent, warm-starting can be helpful

    → approximate the subspace spanned by those singular vectors using the power method


    Power Method

    Let the rank-k SVD of Z̃ be UkΣkVkᵀ. The power method is

    simple but efficient for approximating the subspace spanned by Uk

    an iterative algorithm that can be warm-started (using R)

    PowerMethod(Z̃, R, ε̃) [Halko et al., 2011]

    Require: Z̃ ∈ R^(m×n), initial R ∈ R^(n×k) for warm-start, tolerance ε̃;
    1: initialize Q0 = QR(Z̃R);
    2: for j = 0, 1, . . . do
    3:   Qj+1 = QR(Z̃(Z̃ᵀQj)); // QR decomposition of a matrix
    4:   Δj+1 = ‖Qj+1Qj+1ᵀ − QjQjᵀ‖F;
    5:   if Δj+1 ≤ ε̃ then break;
    6: end for
    7: return Qj+1;
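A numpy sketch of this pseudocode (names are illustrative):

```python
import numpy as np

def power_method(Z, R, tol=1e-6, max_iter=100):
    # approximate the span of the top-k left singular vectors of Z,
    # warm-started from R (n x k); stop when the projector Q Q^T stalls
    Q, _ = np.linalg.qr(Z @ R)
    for _ in range(max_iter):
        Q_new, _ = np.linalg.qr(Z @ (Z.T @ Q))
        if np.linalg.norm(Q_new @ Q_new.T - Q @ Q.T, 'fro') <= tol:
            return Q_new
        Q = Q_new
    return Q
```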


    Power Method - Case with k = 1

    PowerMethod(Z̃, r)

    1: initialize q0 = Z̃r;
    2: for j = 0, 1, . . . do
    3:   qj = qj/‖qj‖; // QR becomes normalization of a vector
    4:   qj+1 = Z̃(Z̃ᵀqj);
    5: end for

    Let Z̃ = UΣVᵀ. The recursion can be seen as

    qj = (Z̃Z̃ᵀ)^j Z̃r = U diag(1, (σ2/σ1)^(2j), . . . , (σm/σ1)^(2j)) Uᵀ Z̃r   (up to a scaling by σ1^(2j))

    For i = 2, · · · , m, lim_(j→∞) (σi/σ1)^(2j) = 0, so the power method captures the span of u1 (first column of U)


    Obtain SVT(Z̃t) from a much smaller SVT

    With the obtained Q, an approximate SVT can be constructed as

    X̂t = Q SVTλ(QᵀZ̃t).

    QᵀZ̃t ∈ R^(k×n), thus is much smaller than Z̃t ∈ R^(m×n)

    Approx-SVT(Z̃t, R, λ, ε̃)

    Require: Z̃t ∈ R^(m×n), R ∈ R^(n×k), thresholds λ and ε̃.
    1: Q = PowerMethod(Z̃t, R, ε̃);
    2: [U, Σ, V] = SVD(QᵀZ̃t);
    3: U = {ui | σi > λ}, V = {vi | σi > λ}, Σ = (Σ − λI)₊;
    4: return QU, Σ and V.

    still O(‖Ω‖1k + (m + n)k2), but is cheaper than exact SVD
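Putting the two pieces together, a numpy sketch of Approx-SVT (the power-method loop is inlined so the block is self-contained; names are illustrative):

```python
import numpy as np

def approx_svt(Z, R, lam, tol=1e-6, max_iter=100):
    # 1) power method: Q approximates the span of the top left singular vectors
    Q, _ = np.linalg.qr(Z @ R)
    for _ in range(max_iter):
        Q_new, _ = np.linalg.qr(Z @ (Z.T @ Q))
        done = np.linalg.norm(Q_new @ Q_new.T - Q @ Q.T, 'fro') <= tol
        Q = Q_new
        if done:
            break
    # 2) SVD of the small k x n matrix Q^T Z, then threshold
    U, s, Vt = np.linalg.svd(Q.T @ Z, full_matrices=False)
    keep = s > lam                         # keep singular values above the threshold
    return Q @ U[:, keep], s[keep] - lam, Vt[keep]
```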


    Complete Algorithm

    Accelerated Inexact Soft-Impute (AIS-Impute)

    Require: partially observed matrix O, parameter λ, decay parameter ν ∈ (0, 1), threshold ε;
    1: [U0, λ0, V0] = rank-1 SVD(PΩ(O));
    2: initialize c = 1, ε̃0 = ‖PΩ(O)‖F, X0 = X1 = λ0U0V0ᵀ;
    3: for t = 1, 2, . . . do
    4:   λt = ν^t(λ0 − λ) + λ;
    5:   θt = (c − 1)/(c + 2);
    6:   Yt = Xt + θt(Xt − Xt−1);
    7:   Z̃t = Yt + PΩ(O − Yt);
    8:   ε̃t = ν^t ε̃0;
    9:   Vt−1 = Vt−1 − Vt(VtᵀVt−1), remove zero columns;
    10:  Rt = QR([Vt, Vt−1]);
    11:  [Ut+1, Σt+1, Vt+1] = Approx-SVT(Z̃t, Rt, λt, ε̃t);
    12:  if F(Ut+1Σt+1Vt+1ᵀ) > F(UtΣtVtᵀ) then c = 1 else c = c + 1;
    13: end for
    14: return Xt+1 = Ut+1Σt+1Vt+1ᵀ.
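An end-to-end numpy sketch of the algorithm above (dense arrays, a fixed number of power iterations in place of the tolerance ε̃t, and illustrative names throughout; a practical implementation would use sparse matrices and the sparse-plus-low-rank matvecs):

```python
import numpy as np

def ais_impute(O, mask, lam, nu=0.5, T=50):
    """AIS-Impute sketch: accelerated inexact Soft-Impute with continuation."""
    PO = mask * O
    def F(X):                              # objective used for the restart test
        return 0.5 * np.sum((mask * (X - O)) ** 2) \
               + lam * np.linalg.svd(X, compute_uv=False).sum()
    # steps 1-2: rank-1 SVD of P_Omega(O), initialization
    U0, s0, Vt0 = np.linalg.svd(PO, full_matrices=False)
    lam0 = s0[0]
    X_prev = X = lam0 * np.outer(U0[:, 0], Vt0[0])
    V_prev = V = Vt0[:1].T                 # right singular vectors (n x k)
    c = 1
    for t in range(1, T + 1):
        lam_t = nu ** t * (lam0 - lam) + lam         # step 4: continuation
        theta = (c - 1) / (c + 2)                    # step 5
        Y = X + theta * (X - X_prev)                 # step 6: momentum
        Z = Y + mask * (O - Y)                       # step 7
        R = np.linalg.qr(np.hstack([V, V_prev]))[0]  # steps 9-10: warm start
        # step 11: approx-SVT with a few power iterations
        Q, _ = np.linalg.qr(Z @ R)
        for _ in range(3):
            Q, _ = np.linalg.qr(Z @ (Z.T @ Q))
        Us, ss, Vts = np.linalg.svd(Q.T @ Z, full_matrices=False)
        keep = ss > lam_t
        if not keep.any():
            keep[:1] = True                # keep at least one component
        X_new = (Q @ Us[:, keep] * (ss[keep] - lam_t)) @ Vts[keep]
        c = 1 if F(X_new) > F(X) else c + 1          # step 12: adaptive restart
        V_prev, V = V, Vts[keep].T
        X_prev, X = X, X_new
    return X
```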


    Accelerated Inexact Soft-Impute (AIS-Impute).


    core steps: 5–7 (acceleration)


    Accelerated Inexact Soft-Impute (AIS-Impute).


    core steps: 8–11 (approximate SVT)
    the right singular vectors from the last two iterations (Vt and Vt−1) are used to warm-start the power method
    the error ε̃t allowed in the approximate SVT is decreased linearly


    Accelerated Inexact Soft-Impute (AIS-Impute).


    step 12: adaptively restarts the algorithm if F(X) starts to increase


    Accelerated Inexact Soft-Impute (AIS-Impute).


    step 4 (continuation strategy): λt is initialized to a large value and then decreased gradually; this allows further speedup


    Error in Approximate SVT

    Let hλg(X; Zt) ≡ ½‖X − Zt‖F² + λg(X). If the power method exits after j iterations, and assuming k ≥ k̂, ηt < 1 and ε̃ ≥ αt ηt^j √(1 + ηt²), then

    hλ‖·‖∗(X̂t; Z̃t) ≤ hλ‖·‖∗(SVTλ(Z̃t); Z̃t) + [ηt/(1 − ηt)] βt γt ε̃,

    where the last term is controlled by ε̃ and X̂t is the approximate solution.

    αt, βt, γt and ηt are constants that depend on Z̃t

    k̂ is the number of singular values > λ; k is the input rank for Approx-SVT

    ε̃ is the tolerance of the power method

    The approximation error in Approx-SVT can be controlled by ε̃t


    Convergence of AIS-Impute

    Theorem

    With controlled approximation error on the SVT, Algorithm 3 converges to the optimal solution at a rate of O(1/T²).

    Since the approximation error ε̃t of the proximal step (approx-SVT) decreases to 0 faster than O(1/T²), the convergence rate is the same as with exact SVT


    Synthetic Data

    m × m data matrix O = UV + G
    U ∈ R^(m×5), V ∈ R^(5×m): entries sampled i.i.d. from N(0, 1)
    G: sampled from N(0, 0.05)

    ‖Ω‖₁ = 15 m log(m) random elements in O are observed
    half for training, half for parameter tuning

    Testing on the unobserved (missing) elements

    Performance criteria:

    NMSE = ‖P⊥Ω(X − X̃)‖F / ‖P⊥Ω(X̃)‖F
    rank obtained
    time
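The NMSE criterion as a small sketch (`mask` marks the observed entries; the evaluation is on the unobserved ones):

```python
import numpy as np

def nmse(X, X_true, mask):
    # normalized error on the unobserved entries (P_Omega-perp)
    miss = 1.0 - mask
    return np.linalg.norm(miss * (X - X_true)) / np.linalg.norm(miss * X_true)
```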


    Synthetic Data - Compared Methods

    Compare the proposed AIS-Impute with

    accelerated proximal gradient algorithm (“APG”) [Ji and Ye,2009; Toh and Yun, 2010];

    Soft-Impute [Mazumder et al., 2010]

    Algorithm      Iteration Complexity      Rate      SVT
    APG            O(mnk)                    O(1/T²)   Exact
    Soft-Impute    O(k‖Ω‖₁ + k²(m + n))      O(1/T)    Exact
    AIS-Impute     O(k‖Ω‖₁ + k²(m + n))      O(1/T²)   Approximate

    Code can be downloaded from https://github.com/quanmingyao/AIS-impute


    Results

                 m = 500 (sparsity 18.64%)     m = 1000 (10.36%)
                 NMSE    rank  time (sec)      NMSE    rank  time (sec)
    APG          0.0183  5     5.1             0.0223  5     45.5
    Soft-Impute  0.0183  5     1.3             0.0223  5     4.4
    AIS-Impute   0.0183  5     0.3             0.0223  5     1.1

                 m = 1500 (7.31%)              m = 2000 (5.70%)
                 NMSE    rank  time (sec)      NMSE    rank  time (sec)
    APG          0.0251  5     172.7           0.0273  5     483.9
    Soft-Impute  0.0251  5     13.3            0.0273  5     18.7
    AIS-Impute   0.0251  5     2.0             0.0273  5     2.9

    All algorithms are equally good on recovery, while AIS-Impute isthe fastest


    Convergence Speeds

    (a) objective vs #iterations. (b) objective vs time.

    W.r.t. #iterations

    APG and AIS-Impute are much faster than Soft-Impute
    AIS-Impute has a slightly higher objective than APG

    W.r.t. time

    APG is the slowest (does not use “sparse plus low-rank”)
    AIS-Impute is the fastest


    Recommendation - MovieLens Data

    Task: Recommend movies based on users’ historical ratings

                    #users   #movies   #ratings
    MovieLens-100K  943      1,682     100,000
    MovieLens-1M    6,040    3,449     999,714
    MovieLens-10M   69,878   10,677    10,000,054

    ratings (from 1 to 5) of different users on movies

    50% of the observed ratings for training

    25% for validation and the rest for testing


    MovieLens Data - Compared Methods

    Besides proximal algorithms, we also compare with

    active subspace selection (“active”) [Hsieh and Olsen, 2014]

    Frank-Wolfe algorithm (“boost”) [Zhang et al., 2012]

    variant of Soft-Impute (“ALT-Impute”) [Hastie et al., 2014]

    second-order trust-region algorithm (“TR”) [Mishra et al.,2013]


    Objective w.r.t. Time

    AIS-Impute is in black

    (a) MovieLens-100K. (b) MovieLens-10M.

    MovieLens-10M: TR and APG are very slow, and thus not shown


    Testing RMSE w.r.t. Time

    AIS-Impute is in black

    (a) MovieLens-100K. (b) MovieLens-10M.


    Results

                 MovieLens-100K           MovieLens-1M             MovieLens-10M
                 RMSE   rank  time        RMSE   rank  time        RMSE   rank  time
    active       1.037  70    59.5        0.925  180   1431.4      0.918  217   29681.4
    boost        1.038  71    19.5        0.925  178   616.3       0.917  216   13873.9
    ALT-Impute   1.037  70    29.1        0.925  179   797.1       0.919  215   17337.3
    TR           1.037  71    1911.4      —      —     > 10⁶       —      —     > 10⁶
    APG          1.037  70    83.4        0.925  180   2060.3      —      —     > 10⁶
    Soft-Impute  1.037  70    337.6       0.925  180   8821.0      —      —     > 10⁶
    AIS-Impute   1.037  70    5.8         0.925  179   129.7       0.916  215   2817.5

    All algorithms are equally good at recovering the missing matrix elements

    TR is the slowest

    ALT-Impute has the same convergence rate as Soft-Impute,but is faster (than Soft-Impute)

    AIS-Impute is the fastest


    Conclusion

    AIS-Impute

    accelerates proximal gradient descent without losing the“sparse plus low-rank” structure

    power method produces good approximation to SVT efficiently

    fast convergence rate + low iteration complexity

    empirically, much faster than the state-of-the-art
