ParCube : Sparse Parallelizable Tensor Decompositions


Transcript of ParCube : Sparse Parallelizable Tensor Decompositions

Page 1: ParCube : Sparse Parallelizable Tensor Decompositions

School of Computer Science, Carnegie Mellon University

Dept. of ECE, University of Minnesota

ParCube: Sparse Parallelizable Tensor Decompositions

Evangelos E. Papalexakis (1), Christos Faloutsos (1), Nikos Sidiropoulos (2)

(1) Carnegie Mellon University, School of Computer Science
(2) University of Minnesota, ECE Department

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Bristol, UK, September 24th–28th, 2012.

Page 2: ParCube : Sparse Parallelizable Tensor Decompositions


Outline

• Introduction
  Problem Statement
  Method
  Experiments
  Conclusions

Page 3: ParCube : Sparse Parallelizable Tensor Decompositions


Introduction

• Facebook has ~800 million users
• The network evolves over time
• How do we spot interesting patterns & anomalies in this very large network?

Page 4: ParCube : Sparse Parallelizable Tensor Decompositions


Introduction

• Suppose we have Knowledge Base data, e.g. the Read the Web project at CMU
• Subject–verb–object triplets, mined from the web
• Many gigabytes or terabytes of data!
• How do we find potential new synonyms to a word using this knowledge base?

Page 5: ParCube : Sparse Parallelizable Tensor Decompositions


Introduction to Tensors

• Tensors are multidimensional generalizations of matrices
• The previous problems can be formulated as tensors: time-evolving graphs/social networks, multi-aspect data (e.g. subject, object, verb)
• Focus on 3-way tensors
  – Can be viewed as data cubes
  – Indexed by 3 variables (I×J×K)

[Figure: a data cube whose three modes are subject × object × verb]
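
To make the data-cube view concrete, here is a minimal toy sketch (our own Python, not code from the talk) of filling a 3-way tensor with mined (subject, verb, object) triples:

    import numpy as np

    # Toy mode sizes (I x J x K); real data is far larger and very sparse.
    I, J, K = 1000, 50, 1000                # subjects, verbs, objects

    X = np.zeros((I, J, K))                 # the 3-way tensor / data cube
    triples = [(42, 7, 314), (42, 7, 99), (5, 3, 314)]   # made-up id triples
    for s, v, o in triples:
        X[s, v, o] += 1.0                   # count each mined occurrence

    print(X.shape, X.sum())                 # (1000, 50, 1000) 3.0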

Page 6: ParCube : Sparse Parallelizable Tensor Decompositions


Introduction to Tensors

• PARAFAC decomposition
  – Decomposes a tensor into a sum of outer products / rank-1 tensors
  – Each rank-1 tensor is a different group/“concept”
  – “Similar” to the Singular Value Decomposition in the matrix case
  – Store the factor vectors a_i, b_i, c_i as columns of matrices A, B, C

[Figure: PARAFAC as a sum of rank-1 terms over the subject × verb × object cube; example concepts: “leaders/CEOs”, “products”]
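
In standard notation (the formula is implied by the slide's figure rather than written on it), the R-component PARAFAC model of an I × J × K tensor is

    \underline{\mathbf{X}} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
    \qquad \text{i.e.,} \qquad
    x_{ijk} \approx \sum_{r=1}^{R} a_{ir} \, b_{jr} \, c_{kr},

where ∘ denotes the vector outer product and A = [a_1 … a_R], B, C hold the factor vectors as columns.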

Page 7: ParCube : Sparse Parallelizable Tensor Decompositions


Outline

  Introduction
• Problem Statement
  Method
  Experiments
  Conclusions

Page 8: ParCube : Sparse Parallelizable Tensor Decompositions


Why not PARAFAC?

• Today’s datasets are on the order of terabytes; e.g. Facebook has ~800 million users!
• Explosive complexity/run time for truly large datasets!
• Also, the data is very sparse, and we need the decomposition factors to be sparse too
  – Better interpretability / less noise
  – Can do multi-way soft co-clustering this way!
• But PARAFAC is dense!

Page 9: ParCube : Sparse Parallelizable Tensor Decompositions


Problem Statement

• Wish-list:
  – Significantly drop the dimensionality, ideally by 1 or more orders of magnitude
  – Parallelize the computation, ideally splitting the problem into independent parts that run in parallel
  – Yield sparse factors
  – Don’t lose much in the process

Page 10: ParCube : Sparse Parallelizable Tensor Decompositions

Previous work

• A.H. Phan et al., Block decomposition for very large-scale nonnegative tensor factorization
  – Partition & merge parallel algorithm for non-negative PARAFAC; no sparsity
• Q. Zhang et al., A parallel nonnegative tensor factorization algorithm for mining global climate data
• D. Nion et al., Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor & J. Sun et al., Beyond streams and graphs: dynamic tensor analysis
  – Tensor is a stream; both methods seek to track the decomposition
• C.E. Tsourakakis, MACH: Fast randomized tensor decompositions & J. Sun et al., MultiVis: Content-based social network exploration through multi-way visual analysis
  – Sampling-based TUCKER models
• E.E. Papalexakis et al., Co-clustering as multilinear decomposition with sparse latent factors
  – Sparse PARAFAC algorithm applied to co-clustering


None combines all requirements!

Page 11: ParCube : Sparse Parallelizable Tensor Decompositions


Our proposal

• We introduce PARCUBE and set the following goals:
• Goal 1: Fast
  – Scalable & parallelizable
• Goal 2: Sparse
  – Ability to yield sparse latent factors and a sparse tensor approximation
• Goal 3: Accurate
  – Provable correctness in merging partial results, under appropriate conditions

Page 12: ParCube : Sparse Parallelizable Tensor Decompositions


Outline

  Introduction
  Problem Statement
• Method
  Experiments
  Conclusions

Page 13: ParCube : Sparse Parallelizable Tensor Decompositions


PARCUBE: The big picture

• Sampling selects a small portion of the indices
• The PARAFAC vectors a_i, b_i, c_i will be sparse by construction

1. Break up the tensor into small pieces using sampling
2. Fit a dense PARAFAC decomposition on the small sampled tensors
3. Match columns and distribute non-zero values to the appropriate indices in the original (non-sampled) space

[Figure: two sampled sub-tensors; matched factor groups G1 and G2 are mapped back to the original index space]

Page 14: ParCube : Sparse Parallelizable Tensor Decompositions


The PARCUBE method

• Key ideas:
  – Use biased sampling, according to a sampling weight, to sample rows, columns & fibers (see the sketch after this list)
  – During sampling, always keep a common portion of indices across samples
  – For each smaller tensor, do the PARAFAC decomposition
• Need to specify 2 parameters:
  – Sampling rate s: the initial dimensions I, J, K become I/s, J/s, K/s
  – Number of repetitions / different sampled tensors: r
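
Below is a minimal sketch of the biased sampling step, under the assumption (our reading of the slide's "sampling weight", which is not spelled out here) that each index is weighted by the total mass of its slice; all names are ours, not the paper's:

    import numpy as np

    def sample_mode(X, mode, s, common, rng):
        """Biased sample of one mode's indices, always keeping `common`."""
        axes = tuple(ax for ax in range(X.ndim) if ax != mode)
        weights = np.abs(X).sum(axis=axes) + 1e-12   # slice mass per index
        pool = np.setdiff1d(np.arange(X.shape[mode]), common)
        p = weights[pool] / weights[pool].sum()      # favor dense indices
        n_extra = max(X.shape[mode] // s - len(common), 0)
        extra = rng.choice(pool, size=min(n_extra, len(pool)),
                           replace=False, p=p)
        return np.sort(np.concatenate([np.asarray(common), extra]))

    # Draw r sampled sub-tensors that all share the same common indices,
    # so their factors can be aligned later; each piece is independent work.
    rng = np.random.default_rng(0)
    X = rng.random((60, 40, 50))
    X *= rng.random((60, 40, 50)) < 0.05             # keep ~5%: sparse-ish toy
    common = [np.arange(3), np.arange(3), np.arange(3)]
    s, r = 2, 4
    pieces = []
    for _ in range(r):
        idx = [sample_mode(X, m, s, common[m], rng) for m in range(3)]
        pieces.append((idx, X[np.ix_(*idx)]))        # size ~ I/s x J/s x K/s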

Page 15: ParCube : Sparse Parallelizable Tensor Decompositions


Putting the pieces together: Details

• Say we have matrices A_s from each sample
• The factors may be re-ordered (column permutation) across samples
• Each matrix corresponds to a different sampled index set of the original index space
• All factors share the “upper” (common) part, by construction

Proposition: Under mild conditions, the algorithm will stitch the components correctly & output what exact PARAFAC would (proof in the paper).

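Here is a sketch of the stitching step for one mode, assuming (as on the slide) that every sample kept the same common indices; the greedy column matching and the overwrite-on-overlap policy are our simplifications, not necessarily the paper's exact merging procedure:

    import numpy as np

    def merge_factors(A_list, idx_list, n, common):
        """Stitch per-sample factors into one n x R factor."""
        R = A_list[0].shape[1]
        A = np.zeros((n, R))

        def common_rows(As, idx):
            cp = As[np.isin(idx, common), :]         # the shared "upper" part
            return cp / (np.linalg.norm(cp, axis=0) + 1e-12)

        ref = common_rows(A_list[0], idx_list[0])    # reference column order
        for As, idx in zip(A_list, idx_list):
            sim = np.abs(ref.T @ common_rows(As, idx))  # R x R column similarity
            perm = np.zeros(R, dtype=int)
            for _ in range(R):                       # greedy best-pair matching
                i, j = np.unravel_index(np.argmax(sim), sim.shape)
                perm[j] = i
                sim[i, :] = -1.0
                sim[:, j] = -1.0
            A[np.ix_(idx, perm)] = As                # scatter values back
        return A

Because only sampled indices ever receive values, rows outside every sample stay exactly zero, which is what makes the merged factors sparse by construction.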

Page 16: ParCube : Sparse Parallelizable Tensor Decompositions


Outline

  Introduction
  Problem Statement
  Method
• Experiments
  Conclusions

Page 17: ParCube : Sparse Parallelizable Tensor Decompositions


Experiments

• We use the Tensor Toolbox for Matlab for the baseline and core PARAFAC implementation
• Evaluation of performance:
  – Algorithm correctness
  – Execution speedup
  – Factor sparsity


Page 18: ParCube : Sparse Parallelizable Tensor Decompositions


Experiments – Correctness for multiple repetitions

• Relative cost = PARCUBE approximation cost / PARAFAC approximation cost

• The more samples we get, the closer we are to exact PARAFAC
• Experimental validation of our theoretical result

Page 19: ParCube : Sparse Parallelizable Tensor Decompositions


Experiments – Correctness & Speedup for 1 repetition

• Relative cost = PARCUBE approximation cost / PARAFAC approximation cost
• Speedup = PARAFAC execution time / PARCUBE execution time
• Extrapolation to parallel execution for 4 repetitions yields a 14.2× speedup (and improves accuracy)
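
Written out (assuming, as is standard, that the approximation cost is the squared Frobenius-norm residual, which the slide does not state explicitly):

    \text{relative cost} =
      \frac{\| \underline{\mathbf{X}} - \hat{\underline{\mathbf{X}}}_{\text{ParCube}} \|_F^2}
           {\| \underline{\mathbf{X}} - \hat{\underline{\mathbf{X}}}_{\text{PARAFAC}} \|_F^2},
    \qquad
    \text{speedup} = \frac{t_{\text{PARAFAC}}}{t_{\text{ParCube}}}.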

Page 20: ParCube : Sparse Parallelizable Tensor Decompositions


Experiments – Correctness & Sparsity

• Output size = NNZ(A) + NNZ(B) + NNZ(C) (see the helper below)
• 90% sparser than PARAFAC, while maintaining the same approximation error as PARAFAC
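
The output-size metric from the slide, as a small helper (the numerical tolerance is our addition, since entries of dense factors are rarely exactly zero):

    import numpy as np

    def output_size(A, B, C, tol=1e-10):
        """Output size = NNZ(A) + NNZ(B) + NNZ(C), counting entries above tol."""
        return sum(int((np.abs(M) > tol).sum()) for M in (A, B, C))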

Page 21: ParCube : Sparse Parallelizable Tensor Decompositions


Experiments

• Knowledge Discovery datasets:
  – ENRON email/social network: 186 × 186 × 44
  – Network traffic data (LBNL): 65170 × 65170 × 65327
  – FACEBOOK wall posts: 63891 × 63890 × 1847
  – Knowledge Base data (Never Ending Language Learner, NELL): 14545 × 14545 × 28818

Page 22: ParCube : Sparse Parallelizable Tensor Decompositions


Discovery – ENRON

• Who-emailed-whom data from the ENRON email dataset
  – Spans 44 months: a 184 × 184 × 44 tensor
  – We picked s = 2, r = 4

• We were able to identify social cliques and spot spikes that correspond to actual important events in the company’s timeline

Page 23: ParCube : Sparse Parallelizable Tensor Decompositions


Discovery – LBNL Network Data

• Network traffic data of the form (src IP, dst IP, port #)
  – A 65170 × 65170 × 65327 tensor
  – We picked s = 5, r = 10

• We were able to identify a possible Port Scanning Attack

[Figure: a component concentrated on 1 src IP and 1 dst IP across many ports]

Page 24: ParCube : Sparse Parallelizable Tensor Decompositions


Discovery – FACEBOOK Wall posts

• A small portion of Facebook’s users: 63890 users over 1847 days
  – We picked s = 100, r = 10
• Data is in the form (wall owner, poster, timestamp)
• Downloaded from http://socialnetworks.mpi-sws.org/data-wosn2009.html
• We were able to identify a birthday-like event

[Figure: a component concentrated on 1 wall during 1 day, the birthday-like event]

Page 25: ParCube : Sparse Parallelizable Tensor Decompositions


Discovery – NELL

• Knowledge base data, taken from the Read the Web project at CMU: http://rtw.ml.cmu.edu/rtw/
  – Special thanks to Tom Mitchell for the data
• Noun phrase × context × noun phrase triplets, e.g. ‘Obama’ – ‘is’ – ‘the president of the United States’
• Discover words that may be used in the same context
• We picked s = 500, r = 10

Page 26: ParCube : Sparse Parallelizable Tensor Decompositions


Outline

  Introduction
  Problem Statement
  Method
  Experiments
• Conclusions

Page 27: ParCube : Sparse Parallelizable Tensor Decompositions


Conclusions

• Goal 1: Fast
  – Scalable & parallelizable
• Goal 2: Sparse
  – Ability to yield sparse latent factors and a sparse tensor approximation
• Goal 3: Accurate
  – Provable correctness in merging partial results, under appropriate conditions
• Experiments also demonstrate that PARCUBE:
  – Enables processing of tensors that don’t fit in memory
  – Yields interesting findings in diverse Knowledge Discovery settings

Page 28: ParCube : Sparse Parallelizable Tensor Decompositions


The End

Thank you!

Any questions?

Evangelos E. Papalexakis, Email: [email protected], Web: http://www.cs.cmu.edu/~epapalex

Christos Faloutsos, Email: [email protected], Web: http://www.cs.cmu.edu/~christos

Nicholas Sidiropoulos, Email: [email protected], Web: http://www.ece.umn.edu/users/nikos/