Part 2: Communication Costs of Tensor Decompositions
Grey Ballard
CS 294/Math 270: Communication-Avoiding Algorithms, UC Berkeley
March 28, 2016
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
Plan
- Introduction to tensor decompositions [KB09]: nomenclature and notation; popular decompositions: CP and Tucker
- We're seeking comm-optimal sequential and parallel algorithms: few known lower bounds; few standard libraries or HPC implementations
- Many flavors of problems: dense or sparse; sequential or parallel; CP or Tucker (or alternatives like tensor train); choices of mathematical algorithm
- We'll do two case studies of parallel algorithms: computing a CP decomposition of a sparse tensor [KU15]; computing a Tucker decomposition of a dense tensor [ABK15]
Grey Ballard CA Algorithms 1
Outline
1 Tensor Notation
2 Tensor Decompositions
3 Computing CP via Alternating Least Squares (Mathematical Background; Parallel Algorithm for Sparse Tensors)
4 Computing Tucker via Sequentially Truncated Higher-Order SVD (Mathematical Background; Parallel Algorithm for Dense Tensors)
5 Open Problems
Tensors
Vector N = 1
Matrix N = 2
3rd-Order Tensor N = 3
4th-Order Tensor N = 4
5th-Order Tensor N = 5
An Nth-order tensor has N modes. Notation convention: vector v, matrix M, tensor T.
Fibers
Mode-1 Fibers Mode-2 Fibers Mode-3 Fibers
A tensor can be decomposed into the fibers of each mode (fix all indices but one).
Slices
From [KB09], Fig. 2.1 (fibers of a 3rd-order tensor): mode-1 (column) fibers x_{:jk}, mode-2 (row) fibers x_{i:k}, mode-3 (tube) fibers x_{ij:}. Fig. 2.2 (slices of a 3rd-order tensor): horizontal slices X_{i::}, lateral slices X_{:j:}, frontal slices X_{::k} (or X_k).

The inner product of two same-sized tensors X, Y ∈ ℝ^{I_1×I_2×···×I_N} is the sum of the products of their entries, i.e.,

⟨X, Y⟩ = Σ_{i_1=1}^{I_1} Σ_{i_2=1}^{I_2} · · · Σ_{i_N=1}^{I_N} x_{i_1 i_2 ··· i_N} y_{i_1 i_2 ··· i_N},

and it follows immediately that ⟨X, X⟩ = ‖X‖². An N-way tensor X ∈ ℝ^{I_1×I_2×···×I_N} is rank one if it can be written as the outer product of N vectors, i.e.,

X = a^(1) ∘ a^(2) ∘ · · · ∘ a^(N),

where "∘" denotes the vector outer product. This means each element of the tensor is the product of the corresponding vector elements:

x_{i_1 i_2 ··· i_N} = a^(1)_{i_1} a^(2)_{i_2} · · · a^(N)_{i_N} for all 1 ≤ i_n ≤ I_n.

Figure 2.3 of [KB09] illustrates X = a ∘ b ∘ c, a third-order rank-one tensor.
A tensor can also be decomposed into the slices of each mode (fix one index).
Unfoldings
A tensor can be reshaped into matrices, called unfoldings or matricizations, for different modes (fibers form columns, slices form rows).
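As a concrete sketch of a mode-n unfolding (a NumPy illustration, not from the slides; note that the column ordering here follows NumPy's C order and may differ from the convention in [KB09]):

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding: mode-n fibers become the columns of the result."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

X = np.arange(24).reshape(2, 3, 4)
print(unfold(X, 1).shape)  # a 2x3x4 tensor has a mode-2 unfolding of shape (3, 8)
```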
Outline
1 Tensor Notation
2 Tensor Decompositions
3 Computing CP via Alternating Least Squares (Mathematical Background; Parallel Algorithm for Sparse Tensors)
4 Computing Tucker via Sequentially Truncated Higher-Order SVD (Mathematical Background; Parallel Algorithm for Dense Tensors)
5 Open Problems
Low-rank approximations of tensors
Tensor “decompositions” are usually low-rank approximations
They generalize matrix approximations from two viewpoints: as a sum of outer products (think PCA), or as a product of two rectangular matrices (think high-variance subspaces).
Some applications seek exact decompositions, but that is less common.
Sum of outer products
Matrix: A ≈ u_1 v_1ᵀ + · · · + u_R v_Rᵀ (a sum of R rank-one matrices)
Tensor: T ≈ u_1 ∘ v_1 ∘ w_1 + · · · + u_R ∘ v_R ∘ w_R (a sum of R rank-one tensors)
This is known as the CANDECOMP/PARAFAC (CP) decomposition
CP Notation
T ≈ u_1 ∘ v_1 ∘ w_1 + · · · + u_R ∘ v_R ∘ w_R,   T ∈ ℝ^{I×J×K}

T ≈ ⟦U, V, W⟧,   where U ∈ ℝ^{I×R}, V ∈ ℝ^{J×R}, W ∈ ℝ^{K×R} are factor matrices

t_{ijk} ≈ Σ_{r=1}^{R} u_{ir} v_{jr} w_{kr},   1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ k ≤ K
Notation convention: scalar dimension N, index n with 1 ≤ n ≤ N
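The elementwise formula above maps directly onto a single contraction; a minimal NumPy sketch (illustrative, not from the slides):

```python
import numpy as np

def cp_full(U, V, W):
    """Reconstruct a 3rd-order tensor from CP factors: t_ijk = sum_r u_ir v_jr w_kr."""
    return np.einsum('ir,jr,kr->ijk', U, V, W)
```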
Applications of CP
CP is often used like PCA for multi-dimensional data: interpretable components separated from noise.
Sample applications:
- chemometrics [AB03]: data is excitation wavelengths × emission wavelengths × time; components correspond to chemical species' signatures
- neuroscience [AABB+07]: data is electrode × frequency × time; components help to describe the origin of a seizure
- text analysis [BBB08]: data is term × author × time; components discover conversations
High-variance subspaces
Matrix: A ≈ U B Vᵀ, where U and V have orthonormal columns spanning high-variance subspaces (think truncated SVD)
Tensor: T ≈ G ×1 U ×2 V ×3 W
This is known as the Tucker decomposition
Tucker Notation
T ≈ G ×1 U ×2 V ×3 W,   T ∈ ℝ^{I×J×K}, G ∈ ℝ^{P×Q×R} is the core tensor

T ≈ ⟦G; U, V, W⟧,   where U ∈ ℝ^{I×P}, V ∈ ℝ^{J×Q}, W ∈ ℝ^{K×R} are factor matrices

t_{ijk} ≈ Σ_{p=1}^{P} Σ_{q=1}^{Q} Σ_{r=1}^{R} g_{pqr} u_{ip} v_{jq} w_{kr},   1 ≤ i ≤ I, 1 ≤ j ≤ J, 1 ≤ k ≤ K
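Likewise, the Tucker elementwise formula is a single contraction; a hypothetical NumPy sketch (a diagonal core recovers the CP formula):

```python
import numpy as np

def tucker_full(G, U, V, W):
    """Reconstruct a 3rd-order tensor from a Tucker model:
    t_ijk = sum_pqr g_pqr u_ip v_jq w_kr."""
    return np.einsum('pqr,ip,jq,kr->ijk', G, U, V, W)
```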
Tensor-Times-Matrix (TTM)
Tensor version: Y = X ×2 M,   Y ∈ ℝ^{I×Q×K}, X ∈ ℝ^{I×J×K}, M ∈ ℝ^{Q×J}

Matrix version: Y_(2) = M X_(2),   Y_(2) ∈ ℝ^{Q×IK}, X_(2) ∈ ℝ^{J×IK}

Element version: y_{iqk} = Σ_{j=1}^{J} m_{qj} x_{ijk}

TTM is matrix multiplication with a certain unfolding.
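A mode-n TTM can be sketched as a tensordot followed by a moveaxis (an illustration under NumPy's conventions, not the slides' code):

```python
import numpy as np

def ttm(X, M, n):
    """Mode-n tensor-times-matrix: contract the columns of M with mode n of X."""
    Y = np.tensordot(M, X, axes=(1, n))  # contracted mode becomes mode 0
    return np.moveaxis(Y, 0, n)          # move it back to position n
```

Equivalently, one can form the unfolding X_(n) explicitly and call a dense GEMM, which is how the matrix version Y_(n) = M X_(n) is realized.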
Applications of Tucker
Tucker can be viewed as a richer form of CP, so it is also used like PCA; a diagonal core tensor corresponds to a CP decomposition.
Sample application: TensorFaces [VT02], a facial recognition system in computer vision benefiting from varying lighting, expression, and viewpoint.
Tucker is typically more efficient than CP for compression.
Sample application: visual data compression [BRP15] for image, video, and 3D volume data.
Ambiguities
There are several ambiguities that have to be handled carefully
CP scaling ambiguity:

T ≈ Σ_{r=1}^{R} u_r ∘ v_r ∘ w_r  →  T ≈ Σ_{r=1}^{R} λ_r · u_r ∘ v_r ∘ w_r,   where ‖u_r‖₂ = ‖v_r‖₂ = ‖w_r‖₂ = 1

Tucker basis ambiguity:

T ≈ G ×1 U ×2 V ×3 W,   where UᵀU = I_P, VᵀV = I_Q, WᵀW = I_R
Survey Paper
Notation can be a huge obstacle to working with tensors; standardization can help.
I recommend following the conventions of the following paper:
"Tensor Decompositions and Applications," Tammy Kolda and Brett Bader, SIAM Review, 2009.
http://epubs.siam.org/doi/abs/10.1137/07070111X
Outline
1 Tensor Notation
2 Tensor Decompositions
3 Computing CP via Alternating Least Squares (Mathematical Background; Parallel Algorithm for Sparse Tensors)
4 Computing Tucker via Sequentially Truncated Higher-Order SVD (Mathematical Background; Parallel Algorithm for Dense Tensors)
5 Open Problems
CP Optimization Problem
For fixed rank R, we want to solve
min_{U,V,W} ‖X − Σ_{r=1}^{R} u_r ∘ v_r ∘ w_r‖

which is a nonlinear, nonconvex optimization problem:
in the matrix case, the SVD gives us the optimal solution
in the tensor case, uniqueness/convergence to optimum not guaranteed
Alternating Least Squares (ALS)
Fixing all but one factor matrix, we have a linear least squares problem:
min_V ‖X − Σ_{r=1}^{R} u_r ∘ v_r ∘ w_r‖

or equivalently

min_V ‖X_(2) − V (W ⊙ U)ᵀ‖_F

where ⊙ is the Khatri-Rao product, a column-wise Kronecker product.

ALS works by alternating over the factor matrices, updating one at a time by solving the corresponding linear least squares problem.
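The Khatri-Rao product itself is easy to form via broadcasting; a small sketch (an illustrative helper, not from the slides):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column r of the result is kron(A[:, r], B[:, r])."""
    I, R = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)
```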
CP-ALS
Repeat:
1. Solve U(VᵀV ∗ WᵀW) = X_(1)(W ⊙ V) for U
2. Normalize the columns of U
3. Solve V(UᵀU ∗ WᵀW) = X_(2)(W ⊙ U) for V
4. Normalize the columns of V
5. Solve W(UᵀU ∗ VᵀV) = X_(3)(V ⊙ U) for W
6. Normalize the columns of W and store the norms in λ

The linear least squares problems are solved via the normal equations, using the identity (A ⊙ B)ᵀ(A ⊙ B) = AᵀA ∗ BᵀB, where ∗ is the element-wise (Hadamard) product.
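The six steps above can be sketched for a dense 3rd-order tensor as follows (a NumPy illustration that computes the MTTKRP right-hand sides via einsum rather than explicit unfoldings; not a reference implementation):

```python
import numpy as np

def cp_als_sweep(X, U, V, W):
    """One CP-ALS sweep: solve each normal-equations system, then normalize columns."""
    U = np.linalg.solve((V.T @ V) * (W.T @ W),
                        np.einsum('ijk,jr,kr->ir', X, V, W).T).T
    U /= np.linalg.norm(U, axis=0)
    V = np.linalg.solve((U.T @ U) * (W.T @ W),
                        np.einsum('ijk,ir,kr->jr', X, U, W).T).T
    V /= np.linalg.norm(V, axis=0)
    W = np.linalg.solve((U.T @ U) * (V.T @ V),
                        np.einsum('ijk,ir,jr->kr', X, U, V).T).T
    lam = np.linalg.norm(W, axis=0)  # store the norms in lambda
    W /= lam
    return U, V, W, lam
```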
Outline
1 Tensor Notation
2 Tensor Decompositions
3 Computing CP via Alternating Least Squares (Mathematical Background; Parallel Algorithm for Sparse Tensors)
4 Computing Tucker via Sequentially Truncated Higher-Order SVD (Mathematical Background; Parallel Algorithm for Dense Tensors)
5 Open Problems
Matricized Tensor Times Khatri-Rao Product
CP-ALS spends most of its time in MTTKRP (dense or sparse), which corresponds to setting up the right-hand side of the normal equations, e.g., M^(V) = X_(2)(W ⊙ U).

In the dense case, it usually makes sense to
1. form the Khatri-Rao product explicitly, then
2. call dense matrix multiplication.

In the sparse case, it usually makes sense [BK07] to use the

element-wise formula: m^(V)_{jr} = Σ_{i=1}^{I} Σ_{k=1}^{K} x_{ijk} u_{ir} w_{kr}, or the

row-wise formula: m^(V)_{j,:} = Σ_{i=1}^{I} Σ_{k=1}^{K} x_{ijk} (u_{i,:} ∗ w_{k,:})
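The row-wise formula translates directly to a loop over nonzeros in COO form; a hypothetical sequential sketch (the names and data layout here are assumptions, not [BK07]'s code):

```python
import numpy as np

def sparse_mttkrp(coords, vals, U, W, J):
    """Row-wise sparse MTTKRP for M^(V) = X_(2)(W kr U).

    coords: iterable of (i, j, k) index triples of the nonzeros; vals: their values.
    """
    M = np.zeros((J, U.shape[1]))
    for (i, j, k), x in zip(coords, vals):
        M[j] += x * (U[i] * W[k])  # x_ijk * (u_i,: * w_k,:)
    return M
```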
Coarse-Grain Distribution for CP-ALS [KU15]
[Figure: U ∈ ℝ^{I×R}, V ∈ ℝ^{J×R}, W ∈ ℝ^{K×R} each split by rows among processors P1–P4, alongside the tensor X; a nonzero x_{ijk} touches rows u_{i,:}, v_{j,:}, w_{k,:}.]

Rows of each factor matrix are distributed across processors. Each tensor nonzero is copied to each process that will need it.
Coarse-Grain Parallelization for MTTKRP [KU15]
To update V, we need to compute M^(V) = X_(2)(W ⊙ U).

Let I_p, J_p, K_p be the subsets of rows of U, V, W owned by processor p.
Main loop:
for each j ∈ J_p:
    for each nonzero x_{ijk} in slice j:
        m^(V)_{j,:} ← m^(V)_{j,:} + x_{ijk} · (u_{i,:} ∗ w_{k,:})

In the inner loop, u_{i,:} or w_{k,:} require communication if i ∉ I_p or k ∉ K_p.
Fine-Grain Distribution of CP-ALS [KU15]
[Figure: U ∈ ℝ^{I×R}, V ∈ ℝ^{J×R}, W ∈ ℝ^{K×R} each split by rows among processors P1–P4; the nonzeros (∗) of X are individually distributed.]

Rows of each factor matrix are distributed across processors. Tensor nonzeros are distributed across processors.
Fine-Grain Parallelization for MTTKRP [KU15]
To update V, we need to compute M^(V) = X_(2)(W ⊙ U).

Let X_p be the subset of nonzeros of X owned by processor p, and let I_p, J_p, K_p be the subsets of rows of U, V, W owned by processor p.
Main loop:
for each x_{ijk} ∈ X_p:
    m^(V)_{j,:} ← m^(V)_{j,:} + x_{ijk} · (u_{i,:} ∗ w_{k,:})

Inside the loop, u_{i,:} or w_{k,:} require communication if i ∉ I_p or k ∉ K_p.

After the loop, m^(V)_{j,:} for j ∉ J_p needs to be sent to its owner processor.
Minimizing Communication
Algorithms defined for any distributions of factor matrices / tensor
Distributions determine computational load balance and communication costs
Finding the optimal distribution for each algorithm is a hypergraph partitioning problem (subject to a load balance constraint)
Even if the hypergraph is optimally partitioned, there are no guarantees that either algorithm is communication optimal
Coarse-Grain vs Fine-Grain
Coarse-grain: owner computes; communicates only inputs within MTTKRP; requires replication of X; generalizes the row-wise algorithm for SpMV (for multiple vectors).

Fine-grain: communicates inputs and outputs within MTTKRP; no replication of X; generalizes the fine-grain algorithm for SpMV.
Performance Comparison [KU15]
[Figure 2 from [KU15]: average time per CP-ALS iteration (seconds) on Netflix and NELL-B, and speedup over sequential execution on Flickr and Delicious, for 1–1024 MPI processes, comparing ht-finegrain-hp, ht-finegrain-random, ht-coarsegrain-hp, ht-coarsegrain-block, and DFacTo.]
Table 2: Statistics for the computation and communication requirements in one CP-ALS iteration for 512-way partitionings of the Netflix tensor.

| Method | Mode | Comp. load max | Comp. load avg | Comm. vol. max | Comm. vol. avg | Num. msg. max | Num. msg. avg |
|---|---|---|---|---|---|---|---|
| ht-finegrain-hp | 1 | 196672 | 196251 | 21079 | 6367 | 734 | 316 |
| | 2 | 196672 | 196251 | 18028 | 5899 | 1022 | 1016 |
| | 3 | 196672 | 196251 | 3545 | 2492 | 1022 | 1018 |
| ht-finegrain-random | 1 | 197507 | 196251 | 272326 | 252118 | 1022 | 1022 |
| | 2 | 197507 | 196251 | 29282 | 22715 | 1022 | 1022 |
| | 3 | 197507 | 196251 | 7766 | 4300 | 1013 | 1003 |
| ht-coarsegrain-hp | 1 | 364181 | 196251 | 302001 | 136741 | 511 | 511 |
| | 2 | 349123 | 196251 | 59523 | 12228 | 511 | 511 |
| | 3 | 737570 | 196251 | 23524 | 2000 | 511 | 507 |
| ht-coarsegrain-block | 1 | 198602 | 196251 | 239337 | 142006 | 448 | 447 |
| | 2 | 367966 | 196251 | 33889 | 12458 | 511 | 445 |
| | 3 | 737570 | 196251 | 24659 | 2049 | 511 | 394 |
The experiments in [KU15], on up to 1024 cores, showed that the proposed fine-grain MTTKRP achieves the best performance among the alternatives given a good partitioning, reaching up to 194x speedup on 512 cores. The authors identified communication latency as the dominant hindrance to further scalability, and note that the size of the hypergraphs involved challenges existing partitioning tools.
Outline
1 Tensor Notation
2 Tensor Decompositions
3 Computing CP via Alternating Least Squares (Mathematical Background; Parallel Algorithm for Sparse Tensors)
4 Computing Tucker via Sequentially Truncated Higher-Order SVD (Mathematical Background; Parallel Algorithm for Dense Tensors)
5 Open Problems
Tucker Optimization Problem

For fixed ranks P, Q, R, we want to solve

$$\min_{\hat{\mathcal{X}}} \left\| \mathcal{X} - \hat{\mathcal{X}} \right\|^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \left( x_{ijk} - \hat{x}_{ijk} \right)^2 \quad \text{subject to} \quad \hat{\mathcal{X}} = [\![\mathcal{G}; \mathbf{U}, \mathbf{V}, \mathbf{W}]\!]$$

which turns out to be equivalent to

$$\max_{\mathbf{U},\mathbf{V},\mathbf{W}} \|\mathcal{G}\| \quad \text{subject to} \quad \mathcal{G} = \mathcal{X} \times_1 \mathbf{U}^T \times_2 \mathbf{V}^T \times_3 \mathbf{W}^T$$

which is a nonlinear, nonconvex optimization problem.
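The equivalence above follows from the Pythagorean identity $\|\mathcal{X} - \hat{\mathcal{X}}\|^2 = \|\mathcal{X}\|^2 - \|\mathcal{G}\|^2$, which holds when U, V, W have orthonormal columns. A minimal NumPy sketch (the dimensions and random data are illustrative assumptions) that verifies the identity numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 6, 5, 4   # tensor dimensions (assumed for illustration)
P, Q, R = 3, 2, 2   # Tucker ranks

X = rng.standard_normal((I, J, K))
# Random matrices with orthonormal columns, as HOOI/ST-HOSVD would produce
U, _ = np.linalg.qr(rng.standard_normal((I, P)))
V, _ = np.linalg.qr(rng.standard_normal((J, Q)))
W, _ = np.linalg.qr(rng.standard_normal((K, R)))

# Core G = X x1 U^T x2 V^T x3 W^T
G = np.einsum('ijk,ip,jq,kr->pqr', X, U, V, W)
# Reconstruction Xhat = [[G; U, V, W]]
Xhat = np.einsum('pqr,ip,jq,kr->ijk', G, U, V, W)

# ||X - Xhat||^2 = ||X||^2 - ||G||^2, so minimizing the error
# over U, V, W is the same as maximizing ||G||
err2 = np.linalg.norm(X - Xhat) ** 2
assert np.isclose(err2, np.linalg.norm(X) ** 2 - np.linalg.norm(G) ** 2)
```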
Higher-Order Orthogonal Iteration (HOOI)

Fixing all but one factor matrix, we have a matrix problem:

$$\max_{\mathbf{V}} \left\| \mathcal{X} \times_1 \mathbf{U}^T \times_2 \mathbf{V}^T \times_3 \mathbf{W}^T \right\|$$

or equivalently

$$\max_{\mathbf{V}} \left\| \mathbf{V}^T \mathbf{Y}_{(2)} \right\|_F \quad \text{where} \quad \mathcal{Y} = \mathcal{X} \times_1 \mathbf{U}^T \times_3 \mathbf{W}^T$$

HOOI works by alternating over the factor matrices, updating one at a time by computing leading left singular vectors.
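A single HOOI update for V can be sketched in NumPy as follows; the `unfold` helper and the small dimensions are illustrative assumptions, not part of the original slides:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the given mode to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

rng = np.random.default_rng(1)
I, J, K = 6, 5, 4
P, Q, R = 3, 2, 2
X = rng.standard_normal((I, J, K))
U, _ = np.linalg.qr(rng.standard_normal((I, P)))
W, _ = np.linalg.qr(rng.standard_normal((K, R)))

# One HOOI update of V: contract the other modes with the current U and W,
# then take the leading Q left singular vectors of the mode-2 unfolding
Y = np.einsum('ijk,ip,kr->pjr', X, U, W)       # Y = X x1 U^T x3 W^T
Vnew = np.linalg.svd(unfold(Y, 1))[0][:, :Q]   # leading left singular vectors
assert Vnew.shape == (J, Q)
```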
Sequentially Truncated Higher-Order SVD

HOOI is very sensitive to initialization, so the Truncated Higher-Order SVD (T-HOSVD) is typically used to initialize it.

ST-HOSVD [VVM12] is more efficient than T-HOSVD; it works by
    initializing with identity matrices $\mathbf{U} = \mathbf{I}_I$, $\mathbf{V} = \mathbf{I}_J$, $\mathbf{W} = \mathbf{I}_K$
    applying one iteration of HOOI
where the ranks P, Q, R can be chosen based on an error tolerance.
ST-HOSVD Algorithm
1. $S^{(1)} \leftarrow X_{(1)} X_{(1)}^T$
2. $U \leftarrow$ leading eigenvectors of $S^{(1)}$
3. $\mathcal{Y} \leftarrow \mathcal{X} \times_1 U^T$
4. $S^{(2)} \leftarrow Y_{(2)} Y_{(2)}^T$
5. $V \leftarrow$ leading eigenvectors of $S^{(2)}$
6. $\mathcal{Z} \leftarrow \mathcal{Y} \times_2 V^T$
7. $S^{(3)} \leftarrow Z_{(3)} Z_{(3)}^T$
8. $W \leftarrow$ leading eigenvectors of $S^{(3)}$
9. $\mathcal{G} \leftarrow \mathcal{Z} \times_3 W^T$
Left singular vectors of A are computed as eigenvectors of the Gram matrix $AA^T$.
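The nine steps above can be sketched in NumPy. This is an illustrative serial implementation (the `unfold` helper and the test dimensions are assumptions), not the parallel algorithm discussed later:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the given mode to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def st_hosvd(X, ranks):
    """ST-HOSVD sketch: truncate each mode in sequence, shrinking the
    tensor before processing the next mode."""
    factors = []
    G = X
    for mode, r in enumerate(ranks):
        A = unfold(G, mode)
        S = A @ A.T                   # Gram matrix for this mode
        w, Q = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
        U = Q[:, ::-1][:, :r]         # leading r eigenvectors
        factors.append(U)
        # G <- G x_mode U^T: truncate this mode before moving on
        G = np.moveaxis(np.tensordot(U.T, G, axes=(1, mode)), 0, mode)
    return G, factors

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 5, 4))
G, (U, V, W) = st_hosvd(X, (3, 2, 2))
Xhat = np.einsum('pqr,ip,jq,kr->ijk', G, U, V, W)
# Sanity check: the Pythagorean identity for the orthogonal projection
assert np.isclose(np.linalg.norm(X - Xhat) ** 2,
                  np.linalg.norm(X) ** 2 - np.linalg.norm(G) ** 2)
```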
Parallel Block Tensor Distribution

For an N-mode tensor, use a logical N-mode processor grid.
Example: processor grid $P_I \times P_J \times P_K = 3 \times 5 \times 2$.

[Figure: the tensor partitioned into blocks along the I, J, and K modes.]

Local tensors have dimensions $\frac{I}{P_I} \times \frac{J}{P_J} \times \frac{K}{P_K}$.
Unfolded Tensor Distribution

Key idea: each unfolded matrix is 2D block distributed.
Example: processor grid $P_I \times P_J \times P_K = 3 \times 5 \times 2$.

[Figure: the mode-2 unfolding $X_{(2)}$, of dimensions $J \times IK$, distributed over a logical mode-2 2D processor grid of size $P_J \times P_I P_K$.]

Local unfolded matrices have dimensions $\frac{J}{P_J} \times \frac{IK}{P_I P_K}$.
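A small sketch of the dimension bookkeeping, assuming the slide's 3×5×2 grid over an illustrative 30×25×16 tensor:

```python
import numpy as np

# Illustrative sketch: a 3x5x2 processor grid over an I x J x K tensor
I, J, K = 30, 25, 16
PI, PJ, PK = 3, 5, 2

# Each processor owns a block of dimensions (I/PI) x (J/PJ) x (K/PK)...
local = np.zeros((I // PI, J // PJ, K // PK))
assert local.shape == (10, 5, 8)

# ...and its mode-2 unfolding is a (J/PJ) x (IK/(PI*PK)) block of X_(2),
# i.e. the unfolded matrix is 2D block distributed on a PJ x (PI*PK) grid
local_unf = np.moveaxis(local, 1, 0).reshape(J // PJ, -1)
assert local_unf.shape == (J // PJ, (I * K) // (PI * PK))
```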
Kernel Matrix Computations
Key computations in ST-HOSVD are
    Gram: computing $X_{(2)} X_{(2)}^T$
    TTM: computing $Y_{(2)} = V^T X_{(2)}$

These are just matrix computations, done for each mode in sequence:
    we can determine lower bounds and optimal algorithms for the individual computations
    but how do we minimize communication across all of the computations?
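Both kernels reduce to ordinary matrix products on the mode-2 unfolding; a NumPy sketch with assumed small dimensions:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the given mode to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 5, 4))                  # I x J x K (assumed)
V, _ = np.linalg.qr(rng.standard_normal((5, 2)))    # J x Q factor

# Gram: S = X_(2) X_(2)^T is a small J x J matrix
S = unfold(X, 1) @ unfold(X, 1).T
assert S.shape == (5, 5)

# TTM: Y_(2) = V^T X_(2) is the mode-2 unfolding of Y = X x2 V^T
Y2 = V.T @ unfold(X, 1)
Y = np.moveaxis(Y2.reshape(2, 6, 4), 0, 1)   # fold back to 6 x 2 x 4
assert np.allclose(unfold(Y, 1), Y2)
```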
Parameter Tuning
[Figure: two stacked-bar timing breakdowns into Gram, eigenvector (Evecs), and TTM costs. Left: varying the processor grid for a tensor of size 384×384×384×384 with reduced size 96×96×96×96. Right: varying the mode order for a tensor of size 25×250×250×250 with reduced size 10×10×100×100.]
Parallel Scaling
[Figure: left, weak scaling of ST-HOSVD in GFLOPS per core for a 200k×200k×200k×200k tensor with reduced size 20k×20k×20k×20k, using k⁴ nodes for 1 ≤ k ≤ 6. Right, strong scaling (time in seconds, log scale) for a 200×200×200×200 tensor with reduced size 20×20×20×20, using 2^k nodes for 0 ≤ k ≤ 9.]
Application: Compression of Scientific Simulation Data
We applied ST-HOSVD to compress multidimensional data from numerical simulations of combustion, including the following data sets:

HCCI:
    Dimensions: 672 × 672 × 33 × 627 (672 × 672 spatial grid, 33 variables over 627 time steps)
    Total size: 70 GB

TJLR:
    Dimensions: 460 × 700 × 360 × 35 × 16 (460 × 700 × 360 spatial grid, 35 variables over 16 time steps)
    Total size: 520 GB

SP:
    Dimensions: 500 × 500 × 500 × 11 × 50 (500 × 500 × 500 spatial grid, 11 variables over 50 time steps)
    Total size: 550 GB
Application: Compression of Scientific Simulation Data
[Figure: compression ratio (log scale, $10^0$ to $10^4$) versus relative normwise error ($10^{-6}$ to $10^{-2}$) for the HCCI, TJLR, and SP data sets.]

Compression ratio: $\frac{IJK}{PQR + IP + JQ + KR}$. Relative normwise error: $\frac{\|\mathcal{X} - \hat{\mathcal{X}}\|}{\|\mathcal{X}\|}$.
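As a sanity check of the compression-ratio formula, here is a hypothetical example (a 500³ tensor truncated to a 50³ core; the sizes are made up, not one of the data sets above):

```python
# Compression ratio IJK / (PQR + IP + JQ + KR) for a 3-way Tucker
# decomposition: full storage over (core + factor matrices)
def compression_ratio(I, J, K, P, Q, R):
    return (I * J * K) / (P * Q * R + I * P + J * Q + K * R)

# Hypothetical 500^3 tensor with a 50^3 core
ratio = compression_ratio(500, 500, 500, 50, 50, 50)
assert ratio == 625.0  # 1.25e8 / (1.25e5 + 3 * 2.5e4)
```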
Numerical Questions
CP-ALS solves its least squares problems using the normal equations; ST-HOSVD computes singular vectors using the Gram matrix.

Are there applications that require better numerical stability?
Can more numerically stable methods be implemented efficiently?
CA Questions
What are the communication lower bounds for MTTKRP?
    the computation can be expressed as nested loops
    is there a tradeoff between computation and communication?
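The nested-loop view of MTTKRP (matricized tensor times Khatri–Rao product) can be sketched in NumPy and checked against the unfolded matrix form; the dimensions and the `unfold`/Khatri–Rao index ordering below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, K, R = 6, 5, 4, 3
X = rng.standard_normal((I, J, K))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# MTTKRP as nested loops: M[i, r] = sum_{j,k} X[i,j,k] * B[j,r] * C[k,r]
M = np.einsum('ijk,jr,kr->ir', X, B, C)

# The same result via the mode-1 unfolding and an explicit
# Khatri-Rao product (column-wise Kronecker), C kr B, of shape KJ x R
KR = np.einsum('kr,jr->kjr', C, B).reshape(K * J, R)
X1 = X.transpose(0, 2, 1).reshape(I, K * J)   # mode-1 unfolding, j fastest
M_unf = X1 @ KR
assert np.allclose(M, M_unf)
```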
What are the communication lower bounds for ST-HOSVD?
    we've already improved the communication costs of the published algorithm
    can the parameter tuning problems be solved analytically?
For more details:
Scalable Sparse Tensor Decompositions in Distributed Memory Systems
Oguz Kaya and Bora Uçar
International Conference for High Performance Computing, Networking, Storage and Analysis (SC) 2015
http://doi.acm.org/10.1145/2807591.2807624

Parallel Tensor Compression for Large-Scale Scientific Data
Woody Austin, Grey Ballard, and Tamara G. Kolda
International Parallel and Distributed Processing Symposium (IPDPS) 2016
http://arxiv.org/abs/1510.06689
References I
Evrim Acar, Canan Aykut-Bingol, Haluk Bingol, Rasmus Bro, and Bülent Yener. Multiway analysis of epilepsy tensors. Bioinformatics, 23(13):i10–i18, 2007.

C. M. Andersen and R. Bro. Practical aspects of PARAFAC modeling of fluorescence excitation-emission data. Journal of Chemometrics, 17(4):200–215, 2003.

Woody Austin, Grey Ballard, and Tamara G. Kolda. Parallel tensor compression for large-scale scientific data. Technical Report 1510.06689, arXiv, 2015. To appear in IPDPS.
References II
Brett W. Bader, Michael W. Berry, and Murray Browne. Discussion tracking in Enron email using PARAFAC. In Survey of Text Mining II: Clustering, Classification, and Retrieval, pages 147–163. Springer London, London, 2008.

Brett W. Bader and Tamara G. Kolda. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing, 30(1):205–231, December 2007.

Rafael Ballester-Ripoll and Renato Pajarola. Lossy volume compression using Tucker truncation and thresholding. The Visual Computer, pages 1–14, 2015.

T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, September 2009.
References III
Oguz Kaya and Bora Uçar. Scalable sparse tensor decompositions in distributed memory systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '15, pages 77:1–77:11, New York, NY, USA, 2015. ACM.

M. Alex O. Vasilescu and Demetri Terzopoulos. Multilinear analysis of image ensembles: TensorFaces. In Computer Vision — ECCV 2002: 7th European Conference on Computer Vision, Proceedings, Part I, pages 447–460. Springer Berlin Heidelberg, 2002.

Nick Vannieuwenhoven, Raf Vandebril, and Karl Meerbergen. A new truncation strategy for the higher-order singular value decomposition. SIAM Journal on Scientific Computing, 34(2):A1027–A1052, 2012.