Transcript of Talwalkar MLconf

Page 1

Divide-and-Conquer Matrix Factorization

Ameet Talwalkar, UC Berkeley

November 15th, 2013

Collaborators: Lester Mackey², Michael I. Jordan¹, Yadong Mu³, Shih-Fu Chang³

¹UC Berkeley    ²Stanford University    ³Columbia University

Pages 2-5

Three Converging Trends

✦ Big Data
✦ Distributed Computing
✦ Machine Learning

Pages 6-10

Three Converging Trends: Big Data, Distributed Computing, Machine Learning

Goal: Extend ML to the Big Data Setting

Challenge: ML not developed with scalability in mind
✦ Does not naturally scale / leverage distributed computing

Our approach: Divide-and-conquer
✦ Apply existing base algorithms to subsets of data and combine
✓ Build upon existing suites of ML algorithms
✓ Preserve favorable algorithm properties
✓ Naturally leverage distributed computing

✦ E.g.:
  ✦ Matrix factorization (DFC) [MTJ, NIPS11; TMMFJ, ICCV13]
  ✦ Assessing estimator quality (BLB) [KTSJ, ICML12; KTSJ, JRSS13; KTASJ, KDD13]
  ✦ Genomic variant calling [BTTJPYS13, submitted; CTZFJP13, submitted]

Pages 11-16

Matrix Completion

Goal: Recover a matrix from a subset of its entries

Can we do this at scale?
✦ Netflix: 30M users, 100K+ videos
✦ Facebook: 1B users
✦ Pandora: 70M active users, 1M songs
✦ Amazon: Millions of users and products
✦ ...

Pages 17-20

Reducing Degrees of Freedom

✦ Problem: Impossible without additional information
✦ mn degrees of freedom for an m x n matrix

✦ Solution: Assume a small number of factors determine preference
✦ O(m + n) degrees of freedom
✦ Linear storage costs

[Diagram: m x n matrix = (m x r) times (r x n), 'low-rank']
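
To make the storage claim concrete, here is a minimal numpy sketch (the sizes are illustrative assumptions, not numbers from the talk): a rank-r factorization stores r(m + n) values instead of mn, and any entry can be reconstructed on demand.

```python
import numpy as np

m, n, r = 100_000, 50_000, 10
rng = np.random.default_rng(0)

# Full matrix: m * n = 5e9 entries (~40 GB as float64) -- impractical to store.
# Low-rank form X = A @ B stores only r * (m + n) = 1.5e6 entries (~12 MB).
A = rng.standard_normal((m, r))   # m x r factor
B = rng.standard_normal((r, n))   # r x n factor

# Reconstruct a single entry X[i, j] without ever forming X:
i, j = 123, 456
x_ij = A[i] @ B[:, j]
```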

Pages 21-22

Bad Sampling

✦ Problem: We have no rating information about some users or items (e.g., an entirely unobserved row or column)

✦ Solution: Assume Ω̃(r(n + m)) observed entries drawn uniformly at random

Pages 23-25

Bad Information Spread

✦ Problem: Other ratings don't inform us about the missing rating (bad spread of information)

✦ Solution: Assume incoherence with the standard basis [Candes and Recht, 2009]

Pages 26-29

Matrix Completion

Input = Low-rank + 'noise'

Goal: Recover a matrix from a subset of its entries, assuming
✦ low-rank, incoherent
✦ uniform sampling

✦ Nuclear-norm heuristic
  + strong theoretical guarantees
  + good empirical results
  - very slow computation

Goal: Scale MC algorithms and preserve guarantees
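
To ground the "very slow computation" point, below is a minimal sketch of a nuclear-norm solver in the SoftImpute style (Mazumder et al., 2010); it is an illustrative stand-in, not necessarily the solver used in the talk. Each iteration imputes the missing entries with the current estimate and soft-thresholds the singular values, so a full SVD per iteration dominates the cost.

```python
import numpy as np

def soft_impute(M, mask, lam=1.0, n_iter=100):
    """Nuclear-norm heuristic for matrix completion (SoftImpute-style):
        min_L  0.5 * ||P_Omega(M - L)||_F^2 + lam * ||L||_*
    M:    matrix holding the observed values
    mask: boolean array, True where an entry is observed
    """
    L = np.zeros(M.shape)
    for _ in range(n_iter):
        filled = np.where(mask, M, L)        # impute missing entries
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)         # soft-threshold singular values
        L = (U * s) @ Vt                     # the O(mn min(m, n)) bottleneck
    return L
```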

Pages 30-34

Divide-Factor-Combine (DFC) [MTJ, NIPS11]

✦ D step: Divide input matrix into submatrices

✦ F step: Factor in parallel using a base MC algorithm

✦ C step: Combine submatrix estimates

Advantages:
✦ Submatrix factorization is much cheaper and easily parallelized
✦ Minimal communication between parallel jobs
✦ Retains comparable recovery guarantees (with proper choice of division / combination strategies)

Pages 35-44

DFC-Proj

✦ D step: Randomly partition observed entries into t submatrices

✦ F step: Complete the submatrices in parallel
  ✦ Reduced cost: Expect t-fold speedup per iteration
  ✦ Parallel computation: Pay cost of one cheaper MC

✦ C step: Project onto single low-dimensional column space
  ✦ Roughly, share information across sub-solutions
  ✦ Minimal cost: linear in n, quadratic in rank of sub-solutions

✦ Ensemble: Project onto column space of each sub-solution and average
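
Putting the three steps together, here is a hedged sketch of DFC-Proj that composes with the soft_impute sketch above. The random column partition, the independent base-MC calls, and the projection onto the first sub-solution's column space follow the slide's description; the function names and details are assumptions of this sketch.

```python
import numpy as np

def dfc_proj(M, mask, t, base_mc, rng=None):
    """DFC-Proj sketch.
    M, mask: (m x n) matrix of observations and boolean observation mask
    t:       number of column submatrices
    base_mc: base MC algorithm, e.g. soft_impute(M_sub, mask_sub) above
    """
    rng = rng or np.random.default_rng()

    # D step: randomly partition the n columns into t submatrices.
    blocks = np.array_split(rng.permutation(M.shape[1]), t)

    # F step: complete each submatrix with the base algorithm; the
    # iterations are independent, so they parallelize with essentially
    # no communication between jobs.
    subs = [base_mc(M[:, b], mask[:, b]) for b in blocks]

    # C step: project every sub-solution onto the column space of the
    # first one (cost linear in n, quadratic in the sub-solution rank).
    U, _, _ = np.linalg.svd(subs[0], full_matrices=False)
    L = np.empty(M.shape)
    for b, sub in zip(blocks, subs):
        L[:, b] = U @ (U.T @ sub)
    return L
```

The ensemble variant sketched on the slide (DFC-Proj-Ens) would repeat the C step with each sub-solution's column space in turn and average the resulting t estimates.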

Pages 45-47

Does It Work? Yes, with high probability.

Theorem: Assume:
✦ L₀ is low-rank and incoherent,
✦ Ω̃(r(n + m)) entries are sampled uniformly at random,
✦ the nuclear norm heuristic is the base algorithm.

Then L̂ = L₀ with (slightly less) high probability.

✦ Noisy setting: (2 + ε)-approximation of the original bound

✦ Can divide into an increasing number of subproblems (t → ∞) when the number of observed entries is ω̃(r²(n + m))

Page 48

DFC Noisy Recovery

✦ Noisy recovery relative to base algorithm (n = 10K, r = 10)

[Plot: MC RMSE vs. % revealed entries (0-10%) for Proj-10%, Proj-Ens-10%, and Base-MC]

Page 49

DFC Speedup

✦ Speedup over APG for random matrices with 4% of entries revealed and r = 0.001n

[Plot: MC time (s) vs. m (up to 5 x 10⁴) for Proj-10%, Proj-Ens-10%, and Base-MC]

Pages 50-53

Matrix Completion

Netflix Prize:
✦ 100 million ratings in {1, ..., 5}
✦ 18K movies, 480K users
✦ Issues: Full-rank; noisy, non-uniform observations

Method           Error    Time
Nuclear Norm     0.8433   2653.1s
DFC, t=4         0.8436   689.5s
DFC, t=10        0.8484   289.7s
DFC-Ens, t=4     0.8411   689.5s
DFC-Ens, t=10    0.8433   289.7s

Pages 54-56

Robust Matrix Factorization
[Chandrasekaran, Sanghavi, Parrilo, and Willsky, 2009; Candes, Li, Ma, and Wright, 2011; Zhou, Li, Wright, Candes, and Ma, 2010]

Matrix Completion:              Input = Low-rank + 'noise'

Principal Component Analysis:   Input = Low-rank + 'noise'

Robust Matrix Factorization:    Input = Low-rank + Sparse Outliers + 'noise'
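
As a concrete, hedged illustration of the robust decomposition above: principal component pursuit solves min ||L||_* + λ||S||_1 subject to M = L + S, and a basic augmented-Lagrangian scheme alternates two shrinkage steps. The λ and step-size choices below follow Candes et al. (2011); this generic sketch is not the talk's implementation.

```python
import numpy as np

def shrink(X, tau):
    # entrywise soft-thresholding (prox operator of the l1 norm)
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_pcp(M, n_iter=200):
    """Principal Component Pursuit: M = L (low-rank) + S (sparse outliers)."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))         # standard regularization choice
    mu = m * n / (4.0 * np.abs(M).sum())   # common step-size heuristic
    L, S, Y = np.zeros(M.shape), np.zeros(M.shape), np.zeros(M.shape)
    for _ in range(n_iter):
        # L update: singular value thresholding
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S update: entrywise shrinkage
        S = shrink(M - L + Y / mu, lam / mu)
        # dual update on the constraint M = L + S
        Y = Y + mu * (M - L - S)
    return L, S
```

In the video-surveillance slides that follow, each column of M would be a vectorized frame, so L recovers the static (low-rank) background and S the sparse movement.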

Pages 57-60

Video Surveillance

✦ Goal: separate foreground from background
✦ Store video as a matrix
✦ Low-rank = background
✦ Outliers = movement

[Frames: Original Frame | Nuclear Norm (342.5s) | DFC-5% (24.2s) | DFC-0.5% (5.2s)]

Pages 61-63

Subspace Segmentation [Liu, Lin, and Yu, 2010]

Matrix Completion:              Input = Low-rank + 'noise'

Principal Component Analysis:   Input = Low-rank + 'noise'

Subspace Segmentation:          Input = Low-rank + 'noise'

Pages 64-66

Motivation: Face Images

Principal Component Analysis:   Input = Low-rank + 'noise'

✦ Model images of one person via one low-dimensional subspace

Pages 67-75

Motivation: Face Images

Subspace Segmentation:          Input = Low-rank + 'noise'

✦ Model images of five people via five low-dimensional subspaces
✦ Recover subspaces → cluster images

Pages 76-77

Motivation: Face Images

Subspace Segmentation:          Input = Low-rank + 'noise'

✦ Nuclear norm heuristic provably recovers subspaces
✦ Guarantees are preserved with DFC [TMMFJ, ICCV13]

✦ Toy Experiment: Identify images corresponding to the same person (10 people, 640 images)
✦ DFC Results: Linear speedup, state-of-the-art accuracy
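
As a hedged sketch of the segmentation step (not necessarily the pipeline used in [TMMFJ, ICCV13]): for noiseless data, the nuclear-norm problem min ||Z||_* s.t. X = XZ has the closed-form solution Z* = V_r V_rᵀ built from the top-r right singular vectors of X, and spectral clustering on its magnitudes groups the images by person. The function name and rank estimate r are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def segment_subspaces(X, k, r):
    """Cluster the columns of X (e.g. vectorized face images) into k subspaces.
    X: (d x n) data matrix; k: number of people; r: total rank estimate
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = Vt[:r].T @ Vt[:r]             # shape-interaction matrix V_r V_r^T
    W = np.abs(Z) + np.abs(Z).T       # symmetric affinity between columns
    return SpectralClustering(n_clusters=k,
                              affinity='precomputed').fit_predict(W)
```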

Pages 78-83

Video Event Detection

✦ Input: videos, some of which are associated with events
✦ Goal: predict events for unlabeled videos
✦ Idea:
  ✦ Featurize each video
  ✦ Learn video clusters via nuclear norm heuristic
  ✦ Given labeled nodes and cluster structure, make predictions

Can do this at scale with DFC!
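
The last step ("given labeled nodes and cluster structure, make predictions") could be as simple as a majority vote within each recovered cluster; the sketch below is an illustrative simplification, with all names assumed.

```python
import numpy as np
from collections import Counter

def predict_events(cluster_ids, labels):
    """cluster_ids: (n,) cluster index per video (e.g. from subspace segmentation)
    labels: (n,) event label per video, with -1 marking unlabeled videos"""
    preds = labels.copy()
    for c in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == c)
        known = [labels[i] for i in members if labels[i] != -1]
        if known:  # propagate the cluster's majority event label
            majority = Counter(known).most_common(1)[0][0]
            preds[members] = np.where(labels[members] == -1,
                                      majority, labels[members])
    return preds
```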

Page 84

DFC Summary

✦ DFC: distributed framework for matrix factorization
  ✦ Similar recovery guarantees
  ✦ Significant speedups

✦ DFC applied to 3 classes of problems:
  ✦ Matrix completion
  ✦ Robust matrix factorization
  ✦ Subspace recovery

✦ Extend DFC to other MF methods, e.g., ALS, SGD?

Pages 85-89

Big Data and Distributed Computing are valuable resources, but ...

✦ Challenge 1: ML not developed with scalability in mind
  → Divide-and-Conquer (e.g., DFC)

✦ Challenge 2: ML not developed with ease-of-use in mind
  → MLbase (www.mlbase.org)