
Transcript of Tensor Decompositions for Signal Processing Applications


Tensor Decompositions for Signal Processing Applications

From Two-way to Multiway Component Analysis

A. Cichocki, D. Mandic, A-H. Phan, C. Caiafa, G. Zhou, Q. Zhao, and L. De Lathauwer

Summary

The widespread use of multi-sensor technology and the emergence of big datasets have highlighted the limitations of standard flat-view matrix models and the necessity to move towards more versatile data analysis tools. We show that higher-order tensors (i.e., multiway arrays) enable such a fundamental paradigm shift towards models that are essentially polynomial and whose uniqueness, unlike the matrix methods, is guaranteed under very mild and natural conditions. Benefiting from the power of multilinear algebra as their mathematical backbone, data analysis techniques using tensor decompositions are shown to have great flexibility in the choice of constraints that match data properties, and to find more general latent components in the data than matrix-based methods. A comprehensive introduction to tensor decompositions is provided from a signal processing perspective, starting from the algebraic foundations, via basic Canonical Polyadic and Tucker models, through to advanced cause-effect and multi-view data analysis schemes. We show that tensor decompositions enable natural generalizations of some commonly used signal processing paradigms, such as canonical correlation and subspace techniques, signal separation, linear regression, feature extraction and classification. We also cover computational aspects, and point out how ideas from compressed sensing and scientific computing may be used for addressing the otherwise unmanageable storage and manipulation problems associated with big datasets. The concepts are supported by illustrative real world case studies illuminating the benefits of the tensor framework, as efficient and promising tools for modern signal processing, data analysis and machine learning applications; these benefits also extend to vector/matrix data through tensorization.

INTRODUCTION

Historical notes. The roots of multiway analysis can be traced back to studies of homogeneous polynomials in the 19th century; contributors include Gauss, Kronecker, Cayley, Weyl and Hilbert — in modern day interpretation these are fully symmetric tensors. Decompositions of non-symmetric tensors have been studied since the early 20th century [1], whereas the benefits of using more than two matrices in factor analysis [2] became apparent in several communities since the 1960s. The Tucker decomposition for tensors was introduced in psychometrics [3], [4], while the Canonical Polyadic Decomposition (CPD) was independently rediscovered and put into an application context under the names of Canonical Decomposition (CANDECOMP) in psychometrics [5] and Parallel Factor Model (PARAFAC) in linguistics [6]. Tensors were subsequently adopted in diverse branches of data analysis such as chemometrics, the food industry and social sciences [7], [8]. When it comes to Signal Processing, the early 1990s saw a considerable interest in Higher-Order Statistics (HOS) [9] and it was soon realized that for the multivariate case HOS are effectively higher-order tensors; indeed, algebraic approaches to Independent Component Analysis (ICA) using HOS [10]–[12] were inherently tensor-based. Around 2000, it was realized that the Tucker decomposition represents a MultiLinear Singular Value Decomposition (MLSVD) [13]. Generalizing the matrix SVD, the workhorse of numerical linear algebra, the MLSVD spurred the interest in tensors in applied mathematics and scientific computing in very high dimensions [14]–[16]. In parallel, CPD was successfully adopted as a tool for sensor array processing and deterministic signal separation in wireless communication [17], [18]. Subsequently, tensors have been used in audio, image and video processing, machine learning and biomedical applications, to name but a few.


The significant interest in tensors and their fast emerging applications is reflected in books [7], [8], [12], [19]–[21] and tutorial papers [22]–[29] covering various aspects of multiway analysis.

From a matrix to a tensor. Approaches to two-way (matrix) component analysis are well established, and include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Nonnegative Matrix Factorization (NMF) and Sparse Component Analysis (SCA) [12], [19], [30]. These techniques have become standard tools for, e.g., blind source separation (BSS), feature extraction, or classification. On the other hand, large classes of data arising from modern heterogeneous sensor modalities have a multiway character and are therefore naturally represented by multiway arrays or tensors (see Section Tensorization).

Early multiway data analysis approaches reformatted the data tensor as a matrix and resorted to methods developed for classical two-way analysis. However, such a "flattened" view of the world and the rigid assumptions inherent in two-way analysis are not always a good match for multiway data. It is only through higher-order tensor decomposition that we have the opportunity to develop sophisticated models capturing multiple interactions and couplings, instead of standard pairwise interactions. In other words, we can only discover hidden components within multiway data if the analysis tools account for intrinsic multi-dimensional patterns present — motivating the development of multilinear techniques.

In this article, we emphasize that tensor decompositions are not just matrix factorizations with additional subscripts — multilinear algebra is structurally much richer than linear algebra. For example, even basic notions such as rank have a more subtle meaning, uniqueness conditions of higher-order tensor decompositions are more relaxed and accommodating than those for matrices [31], [32], while matrices and tensors also have completely different geometric properties [20]. This boils down to matrices representing linear transformations and quadratic forms, while tensors are connected with multilinear mappings and multivariate polynomials [29].

NOTATIONS AND CONVENTIONS

A tensor can be thought of as a multi-index numerical array, whereby the order of a tensor is the number of its "modes" or "dimensions"; these may include space, time, frequency, trials, classes, and dictionaries. A real-valued tensor of order N is denoted by A ∈ R^(I1×I2×···×IN) and its entries by a_{i1,i2,...,iN}. Then, an N × 1 vector a is considered a tensor of order one, and an N × M matrix A a tensor of order two. Subtensors are parts of the original data tensor, created when only a fixed subset of indices is used. Vector-valued subtensors are called fibers, defined by fixing every index but one, and matrix-valued subtensors are called slices, obtained by fixing all but two indices (see Table I). Manipulation of tensors often requires their reformatting (reshaping); a particular case of reshaping tensors to matrices is termed matrix unfolding or matricization (see Figure 4 (left)). Note that a mode-n multiplication of a tensor A with a matrix B amounts to the multiplication of all mode-n vector fibers with B, and that in linear algebra the tensor (or outer) product appears in the expression for a rank-1 matrix: a b^T = a ◦ b. Basic tensor notations are summarized in Table I, while Table II outlines several types of products used in this paper.
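To make these conventions concrete, the following NumPy sketch (our own illustration, not code from the paper) extracts a fiber and a slice from a third-order tensor, forms a mode-n matricization, and vectorizes the tensor. The helper name unfold and the particular column ordering (remaining indices grouped with the earlier mode varying fastest, consistent with the column-major vec(·) definition used later) are our choices; the exact ordering convention differs between references.

import numpy as np

I1, I2, I3 = 3, 4, 5
A = np.arange(I1 * I2 * I3, dtype=float).reshape(I1, I2, I3)

# A mode-2 fiber a(i1, :, i3): fix every index but the second one.
fiber = A[0, :, 2]                     # shape (I2,)

# A frontal slice A(:, :, i3): fix all but the first two indices.
frontal_slice = A[:, :, 1]             # shape (I1, I2)

def unfold(T, n):
    # Mode-n matricization: mode-n index along the rows; remaining indices
    # are grouped into columns, earliest mode varying fastest.
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1, order='F')

A1 = unfold(A, 0)                      # shape (I1, I2*I3)
A2 = unfold(A, 1)                      # shape (I2, I1*I3)

# Vectorization with the first index varying fastest (column-major order).
vecA = A.reshape(-1, order='F')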

INTERPRETABLE COMPONENTS IN TWO-WAY DATA ANALYSIS

The aim of blind source separation (BSS), factor analysis (FA) and latent variable analysis (LVA) is to decompose a data matrix X ∈ R^(I×J) into the factor matrices A = [a1, a2, . . . , aR] ∈ R^(I×R) and B = [b1, b2, . . . , bR] ∈ R^(J×R) as:

X = A D B^T + E = ∑_{r=1}^{R} λr ar br^T + E = ∑_{r=1}^{R} λr ar ◦ br + E,    (1)

where D = diag(λ1, λ2, . . . , λR) is a scaling (normalizing) matrix, the columns of B represent the unknown source signals (factors or latent variables, depending on the task at hand), the columns of A represent the associated mixing vectors (or factor loadings), while E is noise due to an unmodelled data part or model error. In other words, model (1) assumes that the data matrix X comprises hidden components br (r = 1, 2, . . . , R) that are mixed together in an unknown manner through the coefficients A, or, equivalently, that the data contain factors that have an associated loading for every data channel. Figure 2 (top) depicts the model (1) as a dyadic decomposition, whereby the terms ar ◦ br = ar br^T are rank-1 matrices.


TABLE I: Basic notation.

A, A, a, a : tensor, matrix, vector, scalar

A = [a1, a2, . . . , aR] : matrix A with column vectors ar

a(:, i2, i3, . . . , iN) : fiber of tensor A obtained by fixing all but one index

A(:, :, i3, . . . , iN) : matrix slice of tensor A obtained by fixing all but two indices

A(:, :, :, i4, . . . , iN) : tensor slice of A obtained by fixing some indices

A(I1, I2, . . . , IN) : subtensor of A obtained by restricting indices to belong to subsets In ⊆ {1, 2, . . . , In}

A_(n) ∈ R^(In × I1 I2···In−1 In+1···IN) : mode-n matricization of tensor A ∈ R^(I1×I2×···×IN) whose entry at row in and column (i1 − 1) I2···In−1 In+1···IN + ··· + (iN−1 − 1) IN + iN is equal to a_{i1 i2...iN}

vec(A) ∈ R^(IN IN−1···I1) : vectorization of tensor A ∈ R^(I1×I2×···×IN) with the entry at position i1 + ∑_{k=2}^{N} [(ik − 1) I1 I2···Ik−1] equal to a_{i1 i2...iN}

D = diag(λ1, λ2, . . . , λR) : diagonal matrix with drr = λr

D = diag_N(λ1, λ2, . . . , λR) : diagonal tensor of order N with d_{rr···r} = λr

A^T, A^−1, A^† : transpose, inverse, and Moore–Penrose pseudo-inverse

The well-known indeterminacies intrinsic to this model are: (i) arbitrary scaling of components, and (ii) permutation of the rank-1 terms. Another indeterminacy is related to the physical meaning of the factors: if the model in (1) is unconstrained, it admits infinitely many combinations of A and B. Standard matrix factorizations in linear algebra, such as the QR-factorization, Eigenvalue Decomposition (EVD), and Singular Value Decomposition (SVD), are only special cases of (1), and owe their uniqueness to hard and restrictive constraints such as triangularity and orthogonality. On the other hand, certain properties of the factors in (1) can be represented by appropriate constraints, making possible unique estimation or extraction of such factors. These constraints include statistical independence, sparsity, nonnegativity, exponential structure, uncorrelatedness, constant modulus, finite alphabet, smoothness and unimodality. Indeed, the first four properties form the basis of Independent Component Analysis (ICA) [12], [33], [34], Sparse Component Analysis (SCA) [30], Nonnegative Matrix Factorization (NMF) [19], and harmonic retrieval [35].

TENSORIZATION — BLESSING OF DIMENSIONALITY

While one-way (vectors) and two-way (matrices) algebraic structures were respectively introduced as natural representations for segments of scalar measurements and measurements on a grid, tensors were initially used purely for the mathematical benefits they provide in data analysis; for instance, it seemed natural to stack together excitation–emission spectroscopy matrices in chemometrics into a third-order tensor [7].

The procedure of creating a data tensor from lower-dimensional original data is referred to as tensorization, and we propose the following taxonomy for tensor generation:

1) Rearrangement of lower dimensional data structures. Large-scale vectors or matrices are readily tensorized to higher-order tensors, and can be compressed through tensor decompositions if they admit a low-rank tensor approximation; this principle facilitates big data analysis [21], [27], [28]


TABLE II: Definition of products.

C = A ×n B : mode-n product of A ∈ R^(I1×I2×···×IN) and B ∈ R^(Jn×In) yields C ∈ R^(I1×···×In−1×Jn×In+1×···×IN) with entries c_{i1···in−1 jn in+1···iN} = ∑_{in=1}^{In} a_{i1···in−1 in in+1···iN} b_{jn in}, and matrix representation C_(n) = B A_(n)

C = ⟦A; B^(1), B^(2), . . . , B^(N)⟧ : full multilinear product, C = A ×1 B^(1) ×2 B^(2) ··· ×N B^(N)

C = A ◦ B : tensor or outer product of A ∈ R^(I1×I2×···×IN) and B ∈ R^(J1×J2×···×JM) yields C ∈ R^(I1×I2×···×IN×J1×J2×···×JM) with entries c_{i1 i2···iN j1 j2···jM} = a_{i1 i2···iN} b_{j1 j2···jM}

X = a^(1) ◦ a^(2) ◦ ··· ◦ a^(N) : tensor or outer product of vectors a^(n) ∈ R^(In) (n = 1, . . . , N) yields a rank-1 tensor X ∈ R^(I1×I2×···×IN) with entries x_{i1 i2...iN} = a^(1)_{i1} a^(2)_{i2} ··· a^(N)_{iN}

C = A ⊗ B : Kronecker product of A ∈ R^(I1×I2) and B ∈ R^(J1×J2) yields C ∈ R^(I1 J1 × I2 J2) with entries c_{(i1−1)J1+j1, (i2−1)J2+j2} = a_{i1 i2} b_{j1 j2}

C = A ⊙ B : Khatri–Rao product of A = [a1, . . . , aR] ∈ R^(I×R) and B = [b1, . . . , bR] ∈ R^(J×R) yields C ∈ R^(IJ×R) with columns cr = ar ⊗ br
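The following NumPy sketch (our own illustration under the conventions above, not code from the paper) realizes the mode-n product, the Khatri–Rao product and the outer product, and checks the matricized identity C_(n) = B A_(n) from Table II.

import numpy as np

def unfold(T, n):
    # Mode-n matricization (mode-n index along the rows).
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1, order='F')

def fold(M, n, shape):
    # Inverse of unfold for a tensor of the given shape.
    full = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape(full, order='F'), 0, n)

def mode_n_product(T, B, n):
    # C = T x_n B: multiply every mode-n fiber of T by the matrix B.
    shape = list(T.shape)
    shape[n] = B.shape[0]
    return fold(B @ unfold(T, n), n, tuple(shape))

def khatri_rao(A, B):
    # Column-wise Kronecker product: columns c_r = a_r (kron) b_r.
    return (A[:, None, :] * B[None, :, :]).reshape(A.shape[0] * B.shape[0], -1)

X = np.random.randn(3, 4, 5)
B = np.random.randn(6, 4)                             # acts on mode 2 (size 4)
C = mode_n_product(X, B, 1)                           # shape (3, 6, 5)
assert np.allclose(unfold(C, 1), B @ unfold(X, 1))    # C_(n) = B A_(n)

A2, B2 = np.random.randn(3, 4), np.random.randn(5, 4)
assert np.allclose(khatri_rao(A2, B2)[:, 0], np.kron(A2[:, 0], B2[:, 0]))

a1, a2, a3 = np.random.randn(3), np.random.randn(4), np.random.randn(5)
rank1 = np.einsum('i,j,k->ijk', a1, a2, a3)           # outer product a1 o a2 o a3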

Figure 1: Construction of tensors. Top: Tensorization of a vector or matrix into the so-called quantized format; in scientific computing this facilitates super-compression of large-scale vectors or matrices. Bottom: Tensor formed through the discretization of a trivariate function f(x, y, z).

(see Figure 1 (top)). For instance, a one-way exponential signal x(k) = a z^k can be rearranged into a rank-1 Hankel matrix or a Hankel tensor [36]:

H = [ x(0)  x(1)  x(2)  ···
      x(1)  x(2)  x(3)  ···
      x(2)  x(3)  x(4)  ···
       ⋮     ⋮     ⋮        ] = a b ◦ b,    (2)

where b = [1, z, z^2, ···]^T (a small numerical sketch of this construction is given after this list). Also, in sensor array processing, tensor structures naturally emerge when combining snapshots from identical subarrays [17].

2) Mathematical construction. Among many such examples, the Nth-order moments (cumulants) of a vector-valued random variable form an Nth-order tensor [9], while in second-order ICA snapshots of data statistics (covariance matrices) are effectively slices of a third-order tensor [12], [37]. Also, a (channel × time) data matrix can be transformed into a (channel × time × frequency) or (channel × time × scale) tensor via time-frequency or wavelet representations, a powerful procedure in multichannel EEG analysis in brain science [19], [38].

3) Experiment design. Multi-faceted data can be naturally stacked into a tensor; for instance, in wireless communications the so-called signal diversity (temporal, spatial, spectral, . . . ) corresponds to the order of the tensor [18]. In the same spirit, the standard EigenFaces can be generalized to TensorFaces by combining images with different illuminations, poses, and expressions [39], while the common modes in EEG recordings across subjects, trials, and conditions are best analyzed when combined together into a tensor [26].

4) Naturally tensor data. Some data sources are readily generated as tensors (e.g., RGB color images, videos, 3D light field displays) [40]. Also in scientific computing we often need to evaluate a discretized multivariate function; this is a natural tensor, as illustrated in Figure 1 (bottom) for a trivariate function f(x, y, z) [21], [27], [28].
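As a small numerical sketch of the Hankel rearrangement in item 1 (our own illustration; the helper hankelize is hypothetical, not from the paper), the code below builds the Hankel matrix of an exponential signal x(k) = a z^k and confirms that it has rank 1, in agreement with (2).

import numpy as np

def hankelize(x, n_rows):
    # Hankel matrix H[i, j] = x[i + j] built from a one-way signal x.
    n_cols = len(x) - n_rows + 1
    return np.array([x[i:i + n_cols] for i in range(n_rows)])

a, z, K = 2.0, 0.9, 16
x = a * z ** np.arange(K)          # exponential signal x(k) = a z^k

H = hankelize(x, n_rows=8)         # 8 x 9 Hankel matrix
print(np.linalg.matrix_rank(H))    # -> 1, i.e. H = a * b b^T with b = [1, z, z^2, ...]^T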

The high dimensionality of the tensor format is associated with blessings — these include possibilities to obtain compact representations, uniqueness of decompositions, flexibility in the choice of constraints, and generality of components that can be identified.

CANONICAL POLYADIC DECOMPOSITION

Definition. A Polyadic Decomposition (PD) represents an Nth-order tensor X ∈ R^(I1×I2×···×IN) as a linear combination of rank-1 tensors in the form

X = ∑_{r=1}^{R} λr b_r^(1) ◦ b_r^(2) ◦ ··· ◦ b_r^(N).    (3)

Equivalently, X is expressed as a multilinear product with a diagonal core:

X = D ×1 B^(1) ×2 B^(2) ··· ×N B^(N) = ⟦D; B^(1), B^(2), . . . , B^(N)⟧,    (4)

where D = diag_N(λ1, λ2, . . . , λR) (cf. the matrix case in (1)). Figure 2 (bottom) illustrates these two interpretations for a third-order tensor. The tensor rank is defined as the smallest value of R for which (3) holds exactly; the minimum-rank PD is called canonical (CPD) and is desired in signal separation. The term CPD may also be considered as an abbreviation of CANDECOMP/PARAFAC decomposition, see Historical notes. The matrix/vector forms of CPD can be obtained via the Khatri–Rao products as:

X_(n) = B^(n) D (B^(N) ⊙ ··· ⊙ B^(n+1) ⊙ B^(n−1) ⊙ ··· ⊙ B^(1))^T,    (5)
vec(X) = [B^(N) ⊙ B^(N−1) ⊙ ··· ⊙ B^(1)] d,

where d = (λ1, λ2, . . . , λR)^T.
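The following self-contained NumPy check (our illustration, with arbitrary small sizes) builds a random rank-R third-order tensor from its factors and weights, and numerically verifies the matricized and vectorized CPD forms in (5); the unfolding convention used here (first remaining index varying fastest) is the one consistent with the Khatri–Rao expressions in Table III.

import numpy as np

I, J, K, R = 4, 5, 6, 3
B1, B2, B3 = np.random.randn(I, R), np.random.randn(J, R), np.random.randn(K, R)
lam = np.random.randn(R)

# Rank-R tensor X = sum_r lam_r * b_r^(1) o b_r^(2) o b_r^(3), cf. (3).
X = np.einsum('r,ir,jr,kr->ijk', lam, B1, B2, B3)

def khatri_rao(A, B):
    # Column-wise Kronecker product, columns c_r = a_r (kron) b_r.
    return (A[:, None, :] * B[None, :, :]).reshape(A.shape[0] * B.shape[0], -1)

# Mode-1 unfolding with columns ordered as in the column-major vec(.) convention.
X1 = X.reshape(I, J * K, order='F')

# Matricized CPD, cf. (5): X_(1) = B^(1) D (B^(3) kr B^(2))^T.
assert np.allclose(X1, B1 @ np.diag(lam) @ khatri_rao(B3, B2).T)

# Vectorized CPD: vec(X) = (B^(3) kr B^(2) kr B^(1)) d.
assert np.allclose(X.reshape(-1, order='F'),
                   khatri_rao(khatri_rao(B3, B2), B1) @ lam)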

Rank. As mentioned earlier, rank-related properties are very different for matrices and tensors. For instance, the number of complex-valued rank-1 terms needed to represent a higher-order tensor can be strictly less than the number of real-valued rank-1 terms [20], while the determination of tensor rank is in general NP-hard [41]. Fortunately, in signal processing applications, rank estimation most often corresponds to determining the number of tensor components that can be retrieved with sufficient accuracy, and often there are only a few data components present. A pragmatic first assessment of the number of components may be made through inspection of the multilinear singular value spectrum (see Section Tucker Decomposition), which indicates the size of the core tensor in Figure 2 (bottom-right). Existing techniques for rank estimation include the CORCONDIA algorithm (core consistency diagnostic), which checks whether the core tensor is (approximately) diagonalizable [7], while a number of techniques operate by balancing the approximation error versus the number of degrees of freedom for a varying number of rank-1 terms [42]–[44].

Uniqueness. Uniqueness conditions give theoretical bounds for exact tensor decompositions. A classical uniqueness condition is due to Kruskal [31], which states that for third-order tensors the CPD is unique up to unavoidable scaling and permutation ambiguities, provided that k_{B^(1)} + k_{B^(2)} + k_{B^(3)} ≥ 2R + 2, where the Kruskal rank k_B of a matrix B is the maximum value ensuring that any subset of k_B columns is linearly independent. In sparse modeling, the term (k_B + 1) is also known as the spark [30]. A generalization to Nth-order tensors is due to Sidiropoulos and Bro [45], and is given by:

∑_{n=1}^{N} k_{B^(n)} ≥ 2R + N − 1.    (6)

More relaxed uniqueness conditions can be obtained when one factor matrix has full column rank [46]–[48]; for a thorough study of the third-order case, we refer to [32]. This all shows that, compared to matrix decompositions, CPD is unique under more natural and relaxed conditions, that only require the components to be "sufficiently different" and their number not unreasonably large. These conditions do not have a matrix counterpart, and are at the heart of tensor-based signal separation.
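Since the Kruskal rank is central to condition (6), here is a brute-force NumPy sketch (our own, practical only for small matrices) that computes k_B by checking the linear independence of all column subsets, and evaluates the bound for a third-order example.

import numpy as np
from itertools import combinations

def kruskal_rank(B, tol=1e-10):
    # Largest k such that every subset of k columns is linearly independent.
    R = B.shape[1]
    for k in range(R, 0, -1):
        if all(np.linalg.matrix_rank(B[:, list(idx)], tol=tol) == k
               for idx in combinations(range(R), k)):
            return k
    return 0

R = 3
B1, B2, B3 = np.random.randn(4, R), np.random.randn(5, R), np.random.randn(6, R)
k_sum = kruskal_rank(B1) + kruskal_rank(B2) + kruskal_rank(B3)
print(k_sum >= 2 * R + 2)   # Kruskal condition for N = 3, cf. (6)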

Computation. Certain conditions, including Kruskal's, enable explicit computation of the factor matrices in (3) using linear algebra (essentially, by solving sets of linear equations and by computing a (generalized) Eigenvalue Decomposition) [6], [47], [49], [50]. The presence of noise in data means that CPD is rarely exact, and we need to fit a CPD model to the data by minimizing a suitable cost function. This is typically achieved by minimizing the Frobenius norm of the difference between the given data tensor and its CP approximation, or alternatively by least absolute error fitting when the noise is Laplacian [51].


Figure 2: Analogy between dyadic (top) and polyadic (bottom) decompositions; the Tucker format has a diagonal core. The uniqueness of these decompositions is a prerequisite for blind source separation and latent variable analysis.

Theoretical Cramér–Rao Lower Bound (CRLB) and Cramér–Rao Induced Bound (CRIB) expressions for the assessment of CPD performance were derived in [52] and [53].

Since the computation of CPD is intrinsically multilinear, we can arrive at the solution through a sequence of linear sub-problems as in the Alternating Least Squares (ALS) framework, whereby the LS cost function is optimized for one component matrix at a time, while keeping the other component matrices fixed [6]. As seen from (5), such a conditional update scheme boils down to solving overdetermined sets of linear equations.
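A minimal ALS sketch for a third-order CPD follows (our own illustration, without the normalization, convergence tests and initialization strategies a practical implementation would need); each update solves the overdetermined linear system implied by the matricized form (5).

import numpy as np

def khatri_rao(A, B):
    return (A[:, None, :] * B[None, :, :]).reshape(A.shape[0] * B.shape[0], -1)

def unfold(T, n):
    # Columns ordered with the first remaining index varying fastest.
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1, order='F')

def als_cpd(X, R, n_iter=200):
    I, J, K = X.shape
    A, B, C = np.random.randn(I, R), np.random.randn(J, R), np.random.randn(K, R)
    for _ in range(n_iter):
        # Each step is a linear LS problem in one factor matrix, cf. (5).
        A = np.linalg.lstsq(khatri_rao(C, B), unfold(X, 0).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(C, A), unfold(X, 1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(B, A), unfold(X, 2).T, rcond=None)[0].T
    return A, B, C

# Quick check on a synthetic rank-3 tensor (random restarts may help in practice).
R = 3
A0, B0, C0 = np.random.randn(6, R), np.random.randn(7, R), np.random.randn(8, R)
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = als_cpd(X, R)
print(np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(X))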

While ALS is attractive for its simplicity and satisfactory performance for a few well separated components and at sufficiently high SNR, it also inherits the problems of alternating algorithms and is not guaranteed to converge to a stationary point. This can be rectified by only updating the factor matrix for which the cost function has most decreased at a given step [54], but this results in an N-fold increase in computational cost per iteration. The convergence of ALS is not yet completely understood — it is quasi-linear close to the stationary point [55], while it becomes rather slow for ill-conditioned cases; for more detail we refer to [56], [57].

Conventional all-at-once algorithms for numerical optimization, such as nonlinear conjugate gradients, quasi-Newton or nonlinear least squares [58], [59], have been shown to often outperform ALS for ill-conditioned cases and to be typically more robust to overfactoring, but come at the cost of a much higher computational load per iteration. More sophisticated versions use the rank-1 structure of the terms within CPD to perform efficient computation and storage of the Jacobian and (approximate) Hessian; their complexity is on par with ALS, while for ill-conditioned cases the performance is often superior [60], [61].

An important difference between matrices and tensors is that the existence of a best rank-R approximation of a tensor of rank greater than R is not guaranteed [20], [62], since the set of tensors whose rank is at most R is not closed. As a result, cost functions for computing factor matrices may only have an infimum (instead of a minimum), so that their minimization will approach the boundary of that set without ever reaching the boundary point. This will cause two or more rank-1 terms to go to infinity upon convergence of an algorithm; however, numerically the diverging terms will almost completely cancel one another while the overall cost function will still decrease along the iterations [63]. These diverging terms indicate an inappropriate data model: the mismatch between the CPD and the original data tensor may arise due to an underestimated number of components, not all tensor components having a rank-1 structure, or data being too noisy.

Constraints. As mentioned earlier, under quite mild conditions the CPD is unique by itself, without requiring additional constraints. However, in order to enhance the accuracy and robustness with respect to noise, prior knowledge of data properties (e.g., statistical independence, sparsity) may be incorporated into the constraints on factors so as to facilitate their physical interpretation, relax the uniqueness conditions, and even simplify computation [64]–[66]. Moreover, the orthogonality and nonnegativity constraints ensure the existence of the minimum of the optimization criterion used [63], [64], [67].

Applications. The CPD has already been established as an advanced tool for signal separation in vastly diverse branches of signal processing and data analysis, such as in audio and speech processing, biomedical engineering, chemometrics, and machine learning [7], [22], [23], [26]. Note that algebraic ICA algorithms are effectively based on the CPD of a tensor of the statistics of recordings; the statistical independence of the sources is reflected in the diagonality of the core tensor in Figure 2, that is, in vanishing cross-statistics [11], [12]. The CPD is also heavily used in exploratory data analysis, where the rank-1 terms capture essential properties of dynamically complex signals [8]. Another example is in wireless communication, where the signals transmitted by different users correspond to rank-1 terms in the case of line-of-sight propagation [17]. Also, in harmonic retrieval and direction-of-arrival type applications, real or complex exponentials have a rank-1 structure, for which the use of CPD is natural [36], [65].

Example 1. Consider a sensor array consisting of K displaced but otherwise identical subarrays of Ī sensors, with I = KĪ sensors in total. For R narrowband sources in the far field, the baseband equivalent model of the array output becomes X = A S^T + E, where A ∈ C^(I×R) is the global array response, S ∈ C^(J×R) contains J snapshots of the sources, and E is noise. A single source (R = 1) can be obtained from the best rank-1 approximation of the matrix X; however, for R > 1 the decomposition of X is not unique, and hence the separation of sources is not possible without incorporating additional information. Constraints on the sources that may yield a unique solution are, for instance, constant modulus or statistical independence [12], [68].

Consider a row-selection matrix Jk ∈ C^(Ī×I) that extracts the rows of X corresponding to the k-th subarray, k = 1, . . . , K. For two identical subarrays, the generalized EVD of the matrices J1 X and J2 X corresponds to the well-known ESPRIT [69]. For the case K > 2, we shall consider Jk X as slices of the tensor X ∈ C^(Ī×J×K) (see Section Tensorization). It can be shown that the signal part of X admits a CPD as in (3)–(4), with λ1 = ··· = λR = 1, Jk A = B^(1) diag(b^(3)_{k1}, . . . , b^(3)_{kR}) and B^(2) = S [17], and the consequent source separation under rather mild conditions — its uniqueness does not require constraints such as statistical independence or constant modulus. Moreover, the decomposition is unique even in cases when the number of sources R exceeds the number of subarray sensors Ī, or even the total number of sensors I. Notice that particular array geometries, such as linearly and uniformly displaced subarrays, can be converted into a constraint on CPD, yielding a further relaxation of the uniqueness conditions, reduced sensitivity to noise, and often faster computation [65].

TUCKER DECOMPOSITION

Figure 3 illustrates the principle of Tucker decomposition, which treats a tensor X ∈ R^(I1×I2×···×IN) as a multilinear transformation of a (typically dense but small) core tensor G ∈ R^(R1×R2×···×RN) by the factor matrices B^(n) = [b^(n)_1, b^(n)_2, . . . , b^(n)_{Rn}] ∈ R^(In×Rn), n = 1, 2, . . . , N [3], [4], given by

X = ∑_{r1=1}^{R1} ∑_{r2=1}^{R2} ··· ∑_{rN=1}^{RN} g_{r1 r2···rN} (b^(1)_{r1} ◦ b^(2)_{r2} ◦ ··· ◦ b^(N)_{rN})    (7)

or equivalently

X = G ×1 B^(1) ×2 B^(2) ··· ×N B^(N) = ⟦G; B^(1), B^(2), . . . , B^(N)⟧.    (8)

Via the Kronecker products (see Table II), the Tucker decomposition can be expressed in a matrix/vector form as:

X_(n) = B^(n) G_(n) (B^(N) ⊗ ··· ⊗ B^(n+1) ⊗ B^(n−1) ⊗ ··· ⊗ B^(1))^T
vec(X) = [B^(N) ⊗ B^(N−1) ⊗ ··· ⊗ B^(1)] vec(G).

Although Tucker initially used the orthogonality and ordering constraints on the core tensor and factor matrices [3], [4], we can also employ other meaningful constraints (see below).

Multilinear rank. For a core tensor of minimal size, R1 is the column rank (the dimension of the subspace spanned by mode-1 fibers), R2 is the row rank (the dimension of the subspace spanned by mode-2 fibers), and so on. A remarkable difference from matrices is that the values of R1, R2, . . . , RN can be different for N ≥ 3.


Figure 3: Tucker decomposition of a third-order tensor. The column spaces of A, B, C represent the signal subspaces for the three modes. The core tensor G is nondiagonal, accounting for possibly complex interactions among tensor components.

The N-tuple (R1, R2, . . . , RN) is consequently called the multilinear rank of the tensor X.

Links between CPD and Tucker decomposition. Eq. (7) shows that Tucker decomposition can be considered as an expansion in rank-1 terms (polyadic but not necessarily canonical), while (4) represents CPD as a multilinear product of a core tensor and factor matrices (but the core is not necessarily minimal); Table III shows various other connections. However, despite the obvious interchangeability of notation, the CP and Tucker decompositions serve different purposes. In general, the Tucker core cannot be diagonalized, while the number of CPD terms may not be bounded by the multilinear rank. Consequently, in signal processing and data analysis, CPD is typically used for factorizing data into easy to interpret components (i.e., the rank-1 terms), while the goal of unconstrained Tucker decompositions is most often to compress data into a tensor of smaller size (i.e., the core tensor) or to find the subspaces spanned by the fibers (i.e., the column spaces of the factor matrices).

Uniqueness. The unconstrained Tucker decomposition is in general not unique, that is, factor matrices B^(n) are rotation invariant. However, physically, the subspaces defined by the factor matrices in Tucker decomposition are unique, while the bases in these subspaces may be chosen arbitrarily — their choice is compensated for within the core tensor. This becomes clear upon realizing that any factor matrix in (8) can be post-multiplied by any nonsingular (rotation) matrix; in turn, this multiplies the core tensor by its inverse, that is

X = ⟦G; B^(1), B^(2), . . . , B^(N)⟧ = ⟦H; B^(1) R^(1), B^(2) R^(2), . . . , B^(N) R^(N)⟧,
H = ⟦G; (R^(1))^−1, (R^(2))^−1, . . . , (R^(N))^−1⟧,    (9)

where R^(n) are invertible.

TABLE III: Different forms of CPD and Tucker representations of a third-order tensor X ∈ R^(I×J×K).

Tensor representation, outer products:
  CPD:    X = ∑_{r=1}^{R} λr ar ◦ br ◦ cr
  Tucker: X = ∑_{r1=1}^{R1} ∑_{r2=1}^{R2} ∑_{r3=1}^{R3} g_{r1 r2 r3} a_{r1} ◦ b_{r2} ◦ c_{r3}

Tensor representation, multilinear products:
  CPD:    X = D ×1 A ×2 B ×3 C
  Tucker: X = G ×1 A ×2 B ×3 C

Matrix representations:
  CPD:    X_(1) = A D (C ⊙ B)^T,  X_(2) = B D (C ⊙ A)^T,  X_(3) = C D (B ⊙ A)^T
  Tucker: X_(1) = A G_(1) (C ⊗ B)^T,  X_(2) = B G_(2) (C ⊗ A)^T,  X_(3) = C G_(3) (B ⊗ A)^T

Vector representation:
  CPD:    vec(X) = (C ⊙ B ⊙ A) d
  Tucker: vec(X) = (C ⊗ B ⊗ A) vec(G)

Scalar representation:
  CPD:    x_{ijk} = ∑_{r=1}^{R} λr a_{ir} b_{jr} c_{kr}
  Tucker: x_{ijk} = ∑_{r1=1}^{R1} ∑_{r2=1}^{R2} ∑_{r3=1}^{R3} g_{r1 r2 r3} a_{i r1} b_{j r2} c_{k r3}

Matrix slices Xk = X(:, :, k):
  CPD:    Xk = A diag(c_{k1}, c_{k2}, . . . , c_{kR}) B^T
  Tucker: Xk = A (∑_{r3=1}^{R3} c_{k r3} G(:, :, r3)) B^T


Multilinear SVD (MLSVD). Orthonormal bases in a constrained Tucker representation can be obtained via the SVD of the mode-n matricized tensor X_(n) = Un Σn Vn^T (i.e., B^(n) = Un, n = 1, 2, . . . , N). Due to the orthonormality, the corresponding core tensor becomes

S = X ×1 U1^T ×2 U2^T ··· ×N UN^T.    (10)

Then, the singular values of X_(n) are the Frobenius norms of the corresponding slices of the core tensor S: (Σn)_{rn,rn} = ‖S(:, :, . . . , rn, :, . . . , :)‖, with slices in the same mode being mutually orthogonal, i.e., their inner products are zero.


Figure 4: Multiway Component Analysis (MWCA) for a third-order tensor, assuming that the components are: principal and orthogonal in the first mode, nonnegative and sparse in the second mode and statistically independent in the third mode.

The columns of Un may thus be seen as multilinear singular vectors, while the norms of the slices of the core are multilinear singular values [13]. As in the matrix case, the multilinear singular values govern the multilinear rank, while the multilinear singular vectors allow, for each mode separately, an interpretation as in PCA [8].

Low multilinear rank approximation. Analogous to PCA, a large-scale data tensor X can be approximated by discarding the multilinear singular vectors and slices of the core tensor that correspond to small multilinear singular values, that is, through truncated matrix SVDs. Low multilinear rank approximation is always well-posed; however, the truncation is not necessarily optimal in the LS sense, although a good estimate can often be made as the approximation error corresponds to the degree of truncation. When it comes to finding the best approximation, the ALS type algorithms exhibit similar advantages and drawbacks to those used for CPD [8], [70]. Optimization-based algorithms exploiting second-order information have also been proposed [71], [72].
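The sketch below (our own, not the authors' code) computes MLSVD factors via SVDs of the mode-n unfoldings, forms the core as in (10), and truncates to a prescribed multilinear rank; a practical implementation would typically add the iterative refinement mentioned above.

import numpy as np

def unfold(T, n):
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1, order='F')

def mode_n_product(T, M, n):
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, n, 0), axes=(1, 0)), 0, n)

def truncated_mlsvd(X, ranks):
    # Factor matrices: leading left singular vectors of each mode-n unfolding.
    U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    # Core tensor S = X x1 U1^T x2 U2^T ... xN UN^T, cf. (10).
    S = X
    for n, Un in enumerate(U):
        S = mode_n_product(S, Un.T, n)
    return S, U

X = np.random.randn(10, 12, 14)
S, U = truncated_mlsvd(X, ranks=(3, 4, 5))
X_hat = S
for n, Un in enumerate(U):
    X_hat = mode_n_product(X_hat, Un, n)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # relative approximation error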

Constraints and Tucker-based multiway component analysis (MWCA). Besides orthogonality, constraints that may help to find unique basis vectors in a Tucker representation include statistical independence, sparsity, smoothness and nonnegativity [19], [73], [74]. Components of a data tensor seldom have the same properties in its modes, and for a physically meaningful representation different constraints may be required in different modes, so as to match the properties of the data at hand. Figure 4 illustrates the concept of MWCA and its flexibility in choosing the mode-wise constraints; a Tucker representation of MWCA naturally accommodates such diversities in different modes.

Other applications. We have shown that Tucker decomposition may be considered as a multilinear extension of PCA [8]; it therefore generalizes signal subspace techniques, with applications including classification, feature extraction, and subspace-based harmonic retrieval [25], [39], [75], [76]. For instance, a low multilinear rank approximation achieved through Tucker decomposition may yield a higher Signal-to-Noise Ratio (SNR) than the SNR in the original raw data tensor, making Tucker decomposition a very natural tool for compression and signal enhancement [7], [8], [24].

BLOCK TERM DECOMPOSITIONS

We have already shown that CPD is unique under quite mild conditions. A further advantage of tensors over matrices is that it is even possible to relax the rank-1 constraint on the terms, thus opening completely new possibilities in, e.g., BSS. For clarity, we shall consider the third-order case, whereby, by replacing the rank-1 matrices b^(1)_r ◦ b^(2)_r = b^(1)_r (b^(2)_r)^T in (3) by low-rank matrices Ar Br^T, the tensor X can be represented as (Figure 5, top):

X = ∑_{r=1}^{R} (Ar Br^T) ◦ cr.    (11)

Figure 5 (bottom) shows that we can even use terms that are only required to have a low multilinear rank (see also Section Tucker Decomposition), to give:

X = ∑_{r=1}^{R} Gr ×1 Ar ×2 Br ×3 Cr.    (12)


Figure 5: Block Term Decompositions (BTDs) find data components that are structurally more complex than the rank-1 terms in CPD. Top: Decomposition into terms with multilinear rank (Lr, Lr, 1). Bottom: Decomposition into terms with multilinear rank (Lr, Mr, Nr).


These so-called Block Term Decompositions (BTD) admit the modelling of more complex signal components than CPD, and are unique under more restrictive but still fairly natural conditions [77]–[79].
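As a small illustration of (11) (our own sketch with arbitrary sizes), the code below synthesizes a third-order tensor from R multilinear rank-(L, L, 1) block terms; fitting such a model to data is what dedicated BTD algorithms [77]–[79] do.

import numpy as np

I, J, K, R, L = 8, 9, 5, 2, 3      # R terms, each with multilinear rank (L, L, 1)
terms = [(np.random.randn(I, L), np.random.randn(J, L), np.random.randn(K))
         for _ in range(R)]

# X = sum_r (A_r B_r^T) o c_r, cf. (11).
X = sum(np.einsum('ij,k->ijk', Ar @ Br.T, cr) for Ar, Br, cr in terms)
print(X.shape)                      # -> (8, 9, 5)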

Example 3. To compare some standard and tensor approaches for the separation of short duration correlated sources, BSS was performed on five linear mixtures of the sources s1(t) = sin(6πt) and s2(t) = exp(10t) sin(20πt), which were contaminated by white Gaussian noise, to give the mixtures X = AS + E ∈ R^(5×60), where S(t) = [s1(t), s2(t)]^T and A ∈ R^(5×2) was a random matrix whose columns (mixing vectors) satisfy a1^T a2 = 0.1, ‖a1‖ = ‖a2‖ = 1. The 3 Hz sine wave did not complete a full period over the 60 samples, so that the two sources had a correlation degree of |s1^T s2| / (‖s1‖2 ‖s2‖2) = 0.35.

The tensor approaches, CPD, Tucker decomposition and BTD, employed a third-order tensor X of size 24 × 37 × 5 generated from five Hankel matrices whose elements obey X(i, j, k) = X(k, i + j − 1) (see Section Tensorization). The average squared angular error (SAE) was used as the performance measure. Figure 6 shows the simulation results, illustrating that:

• PCA failed since the mixing vectors were not orthogonal and the source signals were correlated, both violating the assumptions for PCA.

• ICA (using the JADE algorithm [10]) failed because the signals were not statistically independent, as assumed in ICA.

• Low rank tensor approximation: a rank-2 CPD was used to estimate A as the third factor matrix, which was then inverted to yield the sources. The accuracy of CPD was compromised as the components of tensor X cannot be represented by rank-1 terms.

• Low multilinear rank approximation: Tucker decomposition (TKD) for the multilinear rank (4, 4, 2) was able to retrieve the column space of the mixing matrix but could not find the individual mixing vectors due to the non-uniqueness of TKD.

• BTD in multilinear rank-(2, 2, 1) terms matched the data structure [78], and it is remarkable that the sources were recovered using as few as 6 samples in the noise-free case.

Figure 6: Blind separation of the mixture of a pure sine wave and an exponentially modulated sine wave using PCA, ICA, CPD, Tucker decomposition (TKD) and BTD. The sources s1 and s2 are correlated and of short duration; the symbols ŝ1 and ŝ2 denote the estimated sources. (Panels: estimated waveforms over time (seconds) for s1 and s2; SAE (dB) versus SNR (dB) for PCA, ICA, CPD, TKD and BTD.)

HIGHER-ORDER COMPRESSED SENSING

The aim of Compressed Sensing (CS) is to provide faithful reconstruction of a signal of interest when the set of available measurements is (much) smaller than the size of the original signal [80]–[83]. Formally, we have available M (compressive) data samples y ∈ R^M, which are assumed to be linear transformations of the original signal x ∈ R^I (M < I). In other words, y = Φx, where the sensing matrix Φ ∈ R^(M×I) is usually random. Since the projections are of a lower dimension than the original data, the reconstruction is an ill-posed inverse problem, whose solution requires knowledge of the physics of the problem converted into constraints. For example, a 2D image X ∈ R^(I1×I2) can be vectorized as a long vector x = vec(X) ∈ R^I (I = I1 I2) that admits a sparse representation in a known dictionary B ∈ R^(I×I), so that x = Bg, where the matrix B may be a wavelet or discrete cosine transform (DCT) dictionary. Then, faithful recovery of the original signal x requires finding the sparsest vector g such that:

y = Wg, with ‖g‖0 ≤ K, W = ΦB,    (13)

where ‖·‖0 is the ℓ0-norm (number of non-zero entries) and K ≪ I.

Since the ℓ0-norm minimization is not practical, alternative solutions involve iterative refinements of the estimates of vector g using greedy algorithms such as the Orthogonal Matching Pursuit (OMP) algorithm, or the ℓ1-norm minimization algorithms (‖g‖1 = ∑_{i=1}^{I} |gi|) [83]. Low coherence of the composite dictionary matrix W is a prerequisite for a satisfactory recovery of g (and hence x) — we need to choose Φ and B so that the correlation between the columns of W is minimum [83].

When extending the CS framework to tensor data, we face two obstacles:

• Loss of information, such as spatial and contextual relationships in data, when a tensor X ∈ R^(I1×I2×···×IN) is vectorized.

• Data handling, since the size of vectorized data and the associated dictionary B ∈ R^(I×I) easily becomes prohibitively large (see Section Curse of Dimensionality), especially for tensors of high order.

Fortunately, tensor data are typically highly structured – a perfect match for compressive sampling – so that the CS framework relaxes data acquisition requirements, enables compact storage, and facilitates data completion (inpainting of missing samples due to a broken sensor or unreliable measurement).

Kronecker-CS for fixed dictionaries. In many applications, the dictionary and the sensing matrix admit a Kronecker structure (Kronecker-CS model), as illustrated in Figure 7 (top) [84]. In this way, the global composite dictionary matrix becomes W = W^(N) ⊗ W^(N−1) ⊗ ··· ⊗ W^(1), where each term W^(n) = Φ^(n) B^(n) has a reduced dimensionality, since B^(n) ∈ R^(In×In) and Φ^(n) ∈ R^(Mn×In). Denote M = M1 M2 ··· MN and I = I1 I2 ··· IN; then, since Mn ≤ In, n = 1, 2, . . . , N, this reduces storage requirements by a factor of (∑n In Mn)/(MI). The computation of Wg is affordable since g is sparse; however, computing W^T y is expensive but can be efficiently implemented through a sequence of products involving much smaller matrices W^(n) [85]. We refer to [84] for links between the coherence of the factors W^(n) and the coherence of the global composite dictionary matrix W.

Figure 7: Compressed sensing with a Kronecker-structured dictionary. Top: Vector representation. Bottom: Tensor representation; Orthogonal Matching Pursuit (OMP) can perform faster if the sparse entries belong to a small subtensor, up to permutation of the columns of W^(1), W^(2), W^(3).


Figure 7 and Table III illustrate that the Kronecker-CS model is effectively a vectorized Tucker decomposition with a sparse core. The tensor equivalent of the CS paradigm in (13) is therefore to find the sparsest core tensor G such that:

Y ≅ G ×1 W^(1) ×2 W^(2) ··· ×N W^(N),    (14)

with ‖G‖0 ≤ K, for a given set of mode-wise dictionaries B^(n) and sensing matrices Φ^(n) (n = 1, 2, . . . , N). Working with several small dictionary matrices, appearing in a Tucker representation, instead of a large global dictionary matrix, is an example of the use of tensor structure for efficient representation; see also Section Curse of Dimensionality.
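The following NumPy sketch (our own, with arbitrary small sizes) checks numerically that the vectorized Kronecker-CS model y = (W^(3) ⊗ W^(2) ⊗ W^(1)) vec(G) coincides with the multilinear measurement model (14) for a sparse core G.

import numpy as np

def mode_n_product(T, M, n):
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, n, 0), axes=(1, 0)), 0, n)

I1, I2, I3 = 6, 7, 8
M1, M2, M3 = 3, 4, 5
W1, W2, W3 = np.random.randn(M1, I1), np.random.randn(M2, I2), np.random.randn(M3, I3)

# Sparse core tensor G with a few non-zero entries.
G = np.zeros((I1, I2, I3))
idx = np.random.choice(G.size, size=4, replace=False)
G.flat[idx] = np.random.randn(4)

# Multilinear measurements, cf. (14): Y = G x1 W1 x2 W2 x3 W3.
Y = mode_n_product(mode_n_product(mode_n_product(G, W1, 0), W2, 1), W3, 2)

# Equivalent vectorized Kronecker-CS model: vec(Y) = (W3 kron W2 kron W1) vec(G).
W = np.kron(W3, np.kron(W2, W1))
assert np.allclose(Y.reshape(-1, order='F'), W @ G.reshape(-1, order='F'))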

A higher-order extension of the OMP algorithm, referred to as the Kronecker-OMP algorithm [85], requires K iterations to find the K non-zero entries of the core tensor G. Additional computational advantages can be gained if it can be assumed that the K non-zero entries belong to a small subtensor of G, as shown in Figure 7 (bottom); such a structure is inherent to, e.g., hyperspectral imaging [85], [86] and 3D astrophysical signals. More precisely, if the K = L^N non-zero entries are located within a subtensor of size (L × L × ··· × L), where L ≪ In, then the so-called N-way Block OMP algorithm (N-BOMP) requires at most NL iterations, which is linear in N [85]. The Kronecker-CS model has been applied in Magnetic Resonance Imaging (MRI), hyper-spectral imaging, and in the inpainting of multiway data [84], [86].

Approaches without fixed dictionaries. In Kronecker-CS the mode-wise dictionaries B^(n) ∈ R^(In×In) can be chosen so as best to represent physical properties or prior knowledge about the data. They can also be learned from a large ensemble of data tensors, for instance in an ALS-type fashion [86]. Instead of the total number of sparse entries in the core tensor, the size of the core (i.e., the multilinear rank) may be used as a measure for sparsity, so as to obtain a low-complexity representation from compressively sampled data [87], [88]. Alternatively, a PD representation can be used instead of a Tucker representation. Indeed, early work in chemometrics involved excitation–emission data for which part of the entries was unreliable because of scattering; the CPD of the data tensor is then computed by treating such entries as missing [7]. While CS variants of several CPD algorithms exist [59], [89], the "oracle" properties of tensor-based models are still not as well understood as for their standard models; a notable exception is CPD with sparse factors [90].

Example 2. Figure 8 shows an original 3D (1024 × 1024 × 32) hyperspectral image X which contains scene reflectance measured at 32 different frequency channels, acquired by a low-noise Peltier-cooled digital camera in the wavelength range of 400–720 nm [91]. Within the Kronecker-CS setting, the tensor of compressive measurements Y was obtained by multiplying the frontal slices by random Gaussian sensing matrices Φ^(1) ∈ R^(M1×1024) and Φ^(2) ∈ R^(M2×1024) (M1, M2 < 1024) in the first and second mode, respectively, while Φ^(3) ∈ R^(32×32) was the identity matrix (see Figure 8 (top)). We used Daubechies wavelet factor matrices B^(1) = B^(2) ∈ R^(1024×1024) and B^(3) ∈ R^(32×32), and employed N-BOMP to recover the small sparse core tensor and, subsequently, reconstruct the original 3D image as shown in Figure 8 (bottom).

Figure 8: Multidimensional compressed sensing of a 3D hyperspectral image using Tucker representation with a small sparse core in wavelet bases. (Panels: Kronecker-CS of a 32-channel hyperspectral image; original hyperspectral image (1024 × 1024 × 32), RGB display; reconstruction (SP = 33%, PSNR = 35.51 dB), RGB display.)

For the Sampling Ratio SP = 33% (M1 = M2 = 585) this gave the Peak Signal-to-Noise Ratio (PSNR) of 35.51 dB, while taking 71 minutes to compute the required Niter = 841 sparse entries. For the same quality of reconstruction (PSNR = 35.51 dB), the more conventional Kronecker-OMP algorithm found 0.1% of the wavelet coefficients as significant, thus requiring Niter = K = 0.001 × (1024 × 1024 × 32) = 33,555 iterations and days of computation time.

LARGE-SCALE DATA AND CURSE OF DIMENSIONALITY

The sheer size of tensor data easily exceeds the memory or saturates the processing capability of standard computers; it is therefore natural to ask ourselves how tensor decompositions can be computed if the tensor dimensions in all or some modes are large or, worse still, if the tensor order is high. The term curse of dimensionality, in a general sense, was introduced by Bellman to refer to various computational bottlenecks when dealing with high-dimensional settings. In the context of tensors, the curse of dimensionality refers to the fact that the number of elements of an Nth-order (I × I × ··· × I) tensor, I^N, scales exponentially with the tensor order N. For example, the number of values of a discretized function in Figure 1 (bottom) quickly becomes unmanageable in terms of both computations and storage as N increases. In addition to their standard use (signal separation, enhancement, etc.), tensor decompositions may be elegantly employed in this context as efficient representation tools. The first question is then which type of tensor decomposition is appropriate.

Efficient data handling. If all computations are performed on a CP representation and not on the raw data tensor itself, then instead of the original I^N raw data entries, the number of parameters in a CP representation reduces to NIR, which scales linearly in N (see Table IV). This effectively bypasses the curse of dimensionality, while giving us the freedom to choose the rank R as a function of the desired accuracy [14]; on the other hand the CP approximation may involve numerical problems (see Section Canonical Polyadic Decomposition).

Compression is also inherent to Tucker decomposition, as it reduces the size of a given data tensor from the original I^N to (NIR + R^N), thus exhibiting an approximate compression ratio of (I/R)^N. We can then benefit from the well understood and reliable approximation by means of the matrix SVD; however, this is only meaningful for low N.

TABLE IV: Storage complexities of tensor models for an Nth-order tensor X ∈ R^(I×I×···×I), whose original storage complexity is O(I^N).

1. CPD:    O(NIR)
2. Tucker: O(NIR + R^N)
3. TT:     O(NIR^2)
4. QTT:    O(NR^2 log2(I))

Tensor networks. A numerically reliable way to tackle the curse of dimensionality is through a concept from scientific computing and quantum information theory, termed tensor networks, which represents a tensor of a possibly very high order as a set of sparsely interconnected matrices and core tensors of low order (typically, order 3). These low-dimensional cores are interconnected via tensor contractions to provide a highly compressed representation of a data tensor. In addition, existing algorithms for the approximation of a given tensor by a tensor network have good numerical properties, making it possible to control the error and achieve any desired accuracy of approximation. For example, tensor networks allow for the representation of a wide class of discretized multivariate functions even in cases where the number of function values is larger than the number of atoms in the universe [21], [27], [28].

Figure 9: Tensor Train (TT) decomposition of a fifth-order tensor X ∈ R^(I1×I2×···×I5), consisting of two matrix carriages and three third-order tensor carriages. The five carriages are connected through tensor contractions, which can be expressed in a scalar form as x_{i1,i2,i3,i4,i5} = ∑_{r1=1}^{R1} ∑_{r2=1}^{R2} ··· ∑_{r4=1}^{R4} a_{i1,r1} g^(1)_{r1,i2,r2} g^(2)_{r2,i3,r3} g^(3)_{r3,i4,r4} b_{r4,i5}.


Examples of tensor networks are the hierarchical Tucker (HT) decompositions and Tensor Trains (TT) (see Figure 9) [15], [16]. The TTs are also known as Matrix Product States (MPS) and have been used by physicists for more than two decades (see [92], [93] and references therein). The PARATREE algorithm was developed in signal processing and follows a similar idea; it uses a polyadic representation of a data tensor (in a possibly nonminimal number of terms), whose computation then requires only the matrix SVD [94].
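As a small illustration of the TT format in Figure 9 (our own sketch with arbitrary ranks), the code below generates random TT carriages for a fifth-order tensor and reconstructs a single entry by the chain of contractions given in the caption.

import numpy as np

I = (3, 4, 5, 4, 3)                 # mode sizes I1..I5
R = (2, 3, 3, 2)                    # TT ranks R1..R4

A  = np.random.randn(I[0], R[0])                 # first (matrix) carriage
G1 = np.random.randn(R[0], I[1], R[1])           # third-order carriages
G2 = np.random.randn(R[1], I[2], R[2])
G3 = np.random.randn(R[2], I[3], R[3])
B  = np.random.randn(R[3], I[4])                 # last (matrix) carriage

def tt_entry(idx):
    # x_{i1..i5} = sum_{r1..r4} A[i1,r1] G1[r1,i2,r2] G2[r2,i3,r3] G3[r3,i4,r4] B[r4,i5]
    i1, i2, i3, i4, i5 = idx
    v = A[i1, :]                                 # length R1
    v = v @ G1[:, i2, :]                         # length R2
    v = v @ G2[:, i3, :]                         # length R3
    v = v @ G3[:, i4, :]                         # length R4
    return v @ B[:, i5]

# Full tensor via one contraction, for comparison.
X = np.einsum('ia,ajb,bkc,cld,dm->ijklm', A, G1, G2, G3, B)
print(np.isclose(X[1, 2, 3, 0, 1], tt_entry((1, 2, 3, 0, 1))))   # -> True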

For very large-scale data that exhibit a well-defined structure, an even more radical approach can be employed to achieve a parsimonious representation — through the concept of quantized or quantic tensor networks (QTN) [27], [28]. For example, a huge vector x ∈ R^I with I = 2^L elements can be quantized and tensorized through reshaping into a (2 × 2 × ··· × 2) tensor X of order L, as illustrated in Figure 1 (top). If x is an exponential signal, x(k) = a z^k, then X is a symmetric rank-1 tensor that can be represented by two parameters: the scaling factor a and the generator z (cf. (2) in Section Tensorization).


Figure 10: Efficient computation of the CP and Tucker decompositions, whereby tensor decompositions are computed in parallel for sampled blocks; these are then merged to obtain the global components A, B, C and a core tensor G.

Non-symmetric terms provide further opportunities, beyond the sum-of-exponential representation by symmetric low-rank tensors. Huge matrices and tensors may be dealt with in the same manner. For instance, an Nth-order tensor X ∈ R^(I1×···×IN), with In = q^(Ln), can be quantized in all modes simultaneously to yield a (q × q × ··· × q) quantized tensor of higher order. In QTN, q is small, typically q = 2, 3, 4; for example, the binary encoding (q = 2) reshapes an Nth-order tensor with (2^(L1) × 2^(L2) × ··· × 2^(LN)) elements into a tensor of order (L1 + L2 + ··· + LN) with the same number of elements. The tensor train decomposition applied to quantized tensors is referred to as the quantized TT (QTT); variants for other tensor representations have also been derived [27], [28]. In scientific computing, such formats provide the so-called super-compression — a logarithmic reduction of storage requirements: O(I^N) → O(N log_q(I)).
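A minimal quantization sketch (ours) follows: a length-2^L exponential vector is reshaped into an order-L (2 × 2 × ··· × 2) tensor, and the rank-1 structure claimed above is confirmed by checking that every mode-n unfolding has matrix rank 1.

import numpy as np

L = 10
a, z = 1.5, 0.97
x = a * z ** np.arange(2 ** L)                    # x(k) = a z^k, length 2^L

# Quantize: reshape the vector into an order-L (2 x 2 x ... x 2) tensor.
X = x.reshape((2,) * L)

# For an exponential signal the quantized tensor is rank-1:
# every mode-n unfolding has matrix rank 1.
ranks = [np.linalg.matrix_rank(np.moveaxis(X, n, 0).reshape(2, -1)) for n in range(L)]
print(ranks)                                      # -> [1, 1, ..., 1]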

Computation of the decomposition/representation. Now that we have addressed the possibilities for efficient tensor representation, the question that needs to be answered is how these representations can be computed from the data in an efficient manner. The first approach is to process the data in smaller blocks rather than in a batch manner [95]. In such a "divide-and-conquer" approach, different blocks may be processed in parallel and their decompositions carefully recombined (see Figure 10) [95], [96]. In fact, we may even compute the decomposition through recursive updating, as new data arrive [97]. Such recursive techniques may be used for efficient computation and for tracking decompositions in the case of nonstationary data.

The second approach would be to employ compressed sensing ideas (see Section Higher-Order Compressed Sensing) to fit an algebraic model with a limited number of parameters to possibly large data. In addition to completion, the goal here is a significant reduction of the cost of data acquisition, manipulation and storage — breaking the Curse of Dimensionality being an extreme case.

While algorithms for this purpose are available both for low rank and low multilinear rank representation [59], [87], an even more drastic approach would be to directly adopt sampled fibers as the bases in a tensor representation. In the Tucker decomposition setting we would choose the columns of the factor matrices B^(n) as mode-n fibers of the tensor, which requires addressing the following two problems: (i) how to find fibers that allow us to best represent the tensor, and (ii) how to compute the corresponding core tensor at a low cost (i.e., with minimal access to the data). The matrix counterpart of this problem (i.e., representation of a large matrix on the basis of a few columns and rows) is referred to as the pseudoskeleton approximation [98], where the optimal representation corresponds to the columns and rows that intersect in the submatrix of maximal volume (maximal absolute value of the determinant). Finding the optimal submatrix is computationally hard, but quasi-optimal submatrices may be found by heuristic so-called “cross-approximation” methods that only require a limited, partial exploration of the data matrix. Tucker variants of this approach have been derived in [99]–[101] and are illustrated in Figure 11, while cross-approximation for the TT format has been derived in [102]. Following a somewhat different idea, a tensor generalization of the CUR decomposition of matrices samples fibers on the basis of statistics derived from the data [103].

Figure 11: Tucker representation through fiber sampling and cross-approximation: the columns of factor matrices are sampled from the fibers of the original data tensor X. Within MWCA the selected fibers may be further processed using BSS algorithms.
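For the matrix case, the cross (pseudoskeleton) idea can be sketched in a few lines: greedily pick pivots, keep the corresponding rows and columns, and approximate A by C W^+ R, with W the intersection submatrix. For clarity this toy version searches the full residual for each pivot, whereas the practical cross-approximation methods cited above explore only a small part of the matrix; the function name and the pivoting rule are our own simplifications.

```python
import numpy as np

def cross_approximation(A, rank):
    """Greedy skeleton sketch: returns C @ pinv(W) @ R built from `rank`
    greedily chosen rows and columns (a stand-in for a maximal-volume submatrix)."""
    E = A.copy()                          # running residual
    rows, cols = [], []
    for _ in range(rank):
        i, j = np.unravel_index(np.argmax(np.abs(E)), E.shape)
        if E[i, j] == 0:                  # residual exhausted
            break
        rows.append(i); cols.append(j)
        E = E - np.outer(E[:, j], E[i, :]) / E[i, j]   # remove the chosen cross
    C, R, W = A[:, cols], A[rows, :], A[np.ix_(rows, cols)]
    return C @ np.linalg.pinv(W) @ R

# a rank-5 matrix is recovered exactly from 5 of its rows and 5 of its columns
A = np.random.randn(200, 5) @ np.random.randn(5, 300)
print(np.allclose(cross_approximation(A, 5), A))   # True
```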

MULTIWAY REGRESSION — HIGHER ORDER PLS (HOPLS)

Multivariate regression. Regression refers to the modelling of one or more dependent variables (responses), Y, by a set of independent data (predictors), X. In the simplest case of conditional MSE estimation, ŷ = E(y|x), the response y is a linear combination of the elements of the vector of predictors x; for multivariate data the Multivariate Linear Regression (MLR) uses a matrix model, Y = XP + E, where P is the matrix of coefficients (loadings) and E the residual matrix. The MLR solution gives P = (X^T X)^{-1} X^T Y, and involves inversion of the moment matrix X^T X. A common technique to stabilize the inverse of the moment matrix X^T X is principal component regression (PCR), which employs a low rank approximation of X.
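In code, the MLR solution and its PCR stabilization amount to a few lines; this is a hedged sketch in which the shapes follow the notation above (X of size I × N, Y of size I × M) and the function names are our own.

```python
import numpy as np

def mlr(X, Y):
    """Multivariate linear regression, P = (X^T X)^{-1} X^T Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def pcr(X, Y, R):
    """Principal component regression: regress Y on the leading R principal
    components of X, which stabilizes the inverse of the moment matrix X^T X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:R].T @ ((U[:, :R].T @ Y) / s[:R, None])   # coefficients in the original space
```

Here np.linalg.solve is used instead of forming the explicit inverse; when X^T X is ill-conditioned, the PCR route (or a ridge-type regularization) is the safer option.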

Figure 12: The basic PLS model performs joint sequential low-rank approximation of the matrix of predictors X and the matrix of responses Y, so as to share (up to the scaling ambiguity) the latent components — columns of the score matrices T and U. The matrices P and Q are the loading matrices for predictors and responses, and E and F are the corresponding residual matrices.

Modelling structure in data — the PLS. Notice that in stabilizing multivariate regression PCR uses only information in the X-variables, with no feedback from the Y-variables. The idea behind the Partial Least Squares (PLS) method is to account for structure in data by assuming that the underlying system is governed by a small number, R, of specifically constructed latent variables, called scores, that are shared between the X- and Y-variables; in estimating the number R, PLS compromises between fitting X and predicting Y. Figure 12 illustrates the PLS procedure, which: (i) uses eigenanalysis to perform contraction of the data matrix X to the principal eigenvector score matrix T = [t_1, . . . , t_R] of rank R; (ii) ensures that the t_r components are maximally correlated with the u_r components in the approximation of the responses Y; this is achieved when the u_r's are scaled versions of the t_r's. The Y-variables are then regressed on the matrix U = [u_1, . . . , u_R]. Therefore, PLS is a multivariate model with inferential ability that aims to find a representation of X (or a part of X) that is relevant for predicting Y, using the model

X = T P^T + E = ∑_{r=1}^{R} t_r p_r^T + E,   (15)

Y = U Q^T + F = ∑_{r=1}^{R} u_r q_r^T + F.   (16)

The score vectors t_r provide an LS fit of the X-data, while at the same time the maximum correlation between t- and u-scores ensures a good predictive model for the Y-variables. The predicted responses Y_new are then obtained from new data X_new and the loadings P and Q.

In practice, the score vectors t_r are extracted sequentially, by a series of orthogonal projections followed by the deflation of X. Since the rank of Y is not necessarily decreased with each new t_r, we may continue deflating until the rank of the X-block is exhausted, so as to balance between prediction accuracy and model order.
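A compact NIPALS-style sketch of the sequential extraction and deflation just described is given below, for the two-way model (15)–(16); the fixed number of inner iterations and the variable names are illustrative choices of ours rather than a production implementation.

```python
import numpy as np

def pls_nipals(X, Y, R, n_iter=100):
    """Extract R pairs of scores (t_r, u_r) and loadings (p_r, q_r) sequentially,
    deflating X after each component, as in the PLS model (15)-(16)."""
    X, Y = X.copy(), Y.copy()
    T, U, P, Q = [], [], [], []
    for _ in range(R):
        u = Y[:, [0]].copy()
        for _ in range(n_iter):                    # power-type inner iterations
            w = X.T @ u;  w /= np.linalg.norm(w)   # weight vector
            t = X @ w                              # X-score
            q = Y.T @ t;  q /= np.linalg.norm(q)   # Y-loading
            u = Y @ q                              # Y-score (a scaled version of t)
        p = X.T @ t / (t.T @ t)                    # X-loading
        X = X - t @ p.T                            # deflate the predictors
        T.append(t); U.append(u); P.append(p); Q.append(q)
    return [np.hstack(M) for M in (T, U, P, Q)]
```

The regression of the u-scores on the t-scores (or of Y on T) then yields the coefficients used to predict Y_new from X_new.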

The PLS concept can be generalized to tensors in the following ways:

1) By unfolding multiway data. For example, X(I × J × K) and Y(I × M × N) can be flattened into long matrices X(I × JK) and Y(I × MN), so as to admit matrix-PLS (see Figure 12). However, the flattening prior to standard bilinear PLS obscures structure in multiway data and compromises the interpretation of latent components.

2) By low rank tensor approximation. The so-called N-PLS attempts to find score vectors having maximal covariance with response variables, under the constraints that tensors X and Y are decomposed as a sum of rank-one tensors [104].

3) By a BTD-type approximation, as in the Higher Order PLS (HOPLS) model shown in Figure 13 [105]. The use of block terms within HOPLS equips it with additional flexibility, together with a more realistic analysis than unfolding-PLS and N-PLS.

The principle of HOPLS can be formalized as a set of sequential approximate decompositions of the independent tensor X ∈ R^{I_1×I_2×···×I_N} and the dependent tensor Y ∈ R^{J_1×J_2×···×J_M} (with I_1 = J_1), so as to ensure maximum similarity (correlation) between the scores t_r and u_r within the loading matrices T and U, based on

X ≅ ∑_{r=1}^{R} G_X^{(r)} ×_1 t_r ×_2 P_r^{(1)} · · · ×_N P_r^{(N−1)},   (17)

Y ≅ ∑_{r=1}^{R} G_Y^{(r)} ×_1 u_r ×_2 Q_r^{(1)} · · · ×_M Q_r^{(M−1)}.   (18)
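The multilinear products in (17)–(18) reduce to repeated mode-n multiplications, sketched below with numpy; the helper names mode_n_product and hopls_term are ours, and the core G is assumed to carry a leading singleton mode that absorbs the score vector.

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M: contracts mode n of T with the columns of M."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, n, 0), axes=(1, 0)), 0, n)

def hopls_term(G, t, Ps):
    """One block term G x_1 t x_2 P^(1) ... of the model (17); G has shape
    (1, L2, ..., LN), t has length I1, and P^(n) has shape (I_{n+1}, L_{n+1})."""
    term = mode_n_product(G, t.reshape(-1, 1), 0)     # attach the score in mode 1
    for n, P in enumerate(Ps, start=1):
        term = mode_n_product(term, P, n)             # attach the loadings
    return term

# X is then approximated by the sum of R such block terms, e.g.
# X_hat = sum(hopls_term(G, t, Ps) for G, t, Ps in zip(cores, scores, loadings))
```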

A number of data-analytic problems can be reformulated as either regression or “similarity analysis” (ANOVA, ARMA, LDA, CCA), so that both the matrix and tensor PLS solutions can be generalized across exploratory data analysis.

Figure 13: The principle of Higher Order PLS (HOPLS) for third-order tensors. The core tensors G_X and G_Y are block-diagonal. The BTD-type structure allows for the modelling of general components that are highly correlated in the first mode.

Example 4: Decoding of a 3D hand movement trajectory from the electrocorticogram (ECoG). The predictive power of tensor-based PLS is illustrated on a real-world example of the prediction of arm movement trajectory from ECoG. Figure 14 (left) illustrates the experimental setup, whereby the 3D arm movement of a monkey was captured by an optical motion capture system with reflective markers affixed to the left shoulder, elbow, wrist, and hand; for full detail see (http://neurotycho.org). The predictors (32 ECoG channels) naturally build a fourth-order tensor X (time × channel no × epoch length × frequency), while the movement trajectories for the four markers (response) can be represented as a third-order tensor Y (time × 3D marker position × marker no). The goal of the training stage is to identify the HOPLS parameters: G_X^{(r)}, G_Y^{(r)}, P_r^{(n)}, Q_r^{(n)}; see also Figure 13. In the test stage, the movement trajectories, Y*, for the new ECoG data, X*, are predicted through multilinear projections: (i) the new scores, t_r*, are found from the new data, X*, and the existing model parameters: G_X^{(r)}, P_r^{(1)}, P_r^{(2)}, P_r^{(3)}; (ii) the predicted trajectory is calculated as Y* ≈ ∑_{r=1}^{R} G_Y^{(r)} ×_1 t_r* ×_2 Q_r^{(1)} ×_3 Q_r^{(2)} ×_4 Q_r^{(3)}. In the simulations, standard PLS was applied in the same way to the unfolded tensors.

Figure 14: Prediction of arm movement from brain electrical responses. Left: Experiment setup. Middle: Construction of the data and response tensors and training. Right: The new data tensor (bottom) and the predicted 3D arm movement trajectories (X, Y, Z coordinates) obtained by tensor-based HOPLS and standard matrix-based PLS (top).

Figure 14 (right) shows that although the standard PLS was able to predict the movement corresponding to each marker individually, such prediction is quite crude as the two-way PLS does not adequately account for mutual information among the four markers. The enhanced predictive performance of the BTD-based HOPLS (red line in Figure 14 (right)) is therefore attributed to its ability to model interactions between complex latent components of both predictors and responses.

LINKED MULTIWAY COMPONENT ANALYSIS AND TENSOR DATA FUSION

Data fusion concerns joint analysis of an ensemble of data sets, such as multiple “views” of a particular phenomenon, where some parts of the “scene” may be visible in only one or a few data sets. Examples include fusion of visual and thermal images in low visibility conditions, or the analysis of human electrophysiological signals in response to a certain stimulus but from different subjects and trials; these are naturally analyzed together by means of matrix/tensor factorizations. The “coupled” nature of the analysis of multiple datasets ensures that there may be common factors across the datasets, and that some components are not shared (e.g., processes that are independent of excitations or stimuli/tasks).

The linked multiway component analysis (LMWCA) [106], shown in Figure 15, performs such a decomposition into shared and individual factors, and is formulated as a set of approximate joint Tucker decompositions of a set of data tensors X^{(k)} ∈ R^{I_1×I_2×···×I_N}, (k = 1, 2, . . . , K):

X^{(k)} ≅ G^{(k)} ×_1 B^{(1,k)} ×_2 B^{(2,k)} · · · ×_N B^{(N,k)},   (19)

where each factor matrix B^{(n,k)} = [B_C^{(n)}, B_I^{(n,k)}] ∈ R^{I_n×R_n} has: (i) components B_C^{(n)} ∈ R^{I_n×C_n} (with 0 ≤ C_n ≤ R_n) that are common (i.e., maximally correlated) to all tensors, and (ii) components B_I^{(n,k)} ∈ R^{I_n×(R_n−C_n)} that are tensor-specific. The objective is to estimate the common components B_C^{(n)}, the individual components B_I^{(n,k)}, and, via the core tensors G^{(k)}, their mutual interactions. As in MWCA (see Section Tucker Decomposition), constraints may be imposed to match data properties [73], [76]. This enables a more general and flexible framework than group ICA and Independent Vector Analysis, which also perform linked analysis of multiple data sets but assume that: (i) there exist only common components and (ii) the corresponding latent variables are statistically independent [107], [108], both quite stringent and limiting assumptions. As an alternative to Tucker decompositions, coupled tensor decompositions may be of a polyadic or even block term type [89], [109].

Figure 15: Coupled Tucker decomposition for linked multiway component analysis (LMWCA). The data tensors have both shared and individual components. Constraints such as orthogonality, statistical independence, sparsity and non-negativity may be imposed where appropriate.
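The structure of the factor matrices in (19) is easily visualized by synthesizing coupled tensors that follow the model; the sizes below are arbitrary, and the snippet only generates data with shared and individual components (estimating them from real data is the task of the LMWCA algorithms in [106]).

```python
import numpy as np

I1, I2, I3 = 30, 20, 10          # tensor dimensions (third-order case, illustrative)
R, C, K = 5, 3, 4                # R components per mode, C of them common, K tensors

B_C = [np.random.randn(I, C) for I in (I1, I2, I3)]      # common parts B_C^(n)
tensors = []
for k in range(K):
    # each factor matrix is B^(n,k) = [B_C^(n), B_I^(n,k)]
    B = [np.hstack([B_C[n], np.random.randn((I1, I2, I3)[n], R - C)]) for n in range(3)]
    G = np.random.randn(R, R, R)                         # core tensor G^(k)
    tensors.append(np.einsum('abc,ia,jb,kc->ijk', G, B[0], B[1], B[2]))   # model (19)
```

An LMWCA algorithm applied to such a collection should then aim to recover, up to the usual indeterminacies, the shared matrices B_C^{(n)} together with the K sets of individual components.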

Example 5: Feature extraction and classification of objects using LMWCA. Classification based on common and distinct features of natural objects from the ETH-80 database (http://www.d2.mpi-inf.mpg.de/Datasets) was performed using LMWCA, whereby the discrimination among objects was performed using only the common features. This dataset consists of 3280 images in 8 categories, each containing 10 objects with 41 views per object. For each category, the training data were organized in two distinct fourth-order (128 × 128 × 3 × I_4) tensors, where I_4 = 10 × 41 × 0.5p, with p the fraction of training data. LMWCA was applied to these two tensors to find the common and individual features, with the number of common features set to 80% of I_4. In this way, eight sets of common features were obtained for each category. The test sample label was assigned to the category whose common features matched the new sample best (evaluated by canonical correlations) [110]. Figure 16 shows the results over 50 Monte Carlo runs and compares LMWCA with the standard K-NN and LDA classifiers, the latter using 50 principal components as features. The enhanced classification results for LMWCA are attributed to the fact that the classification only makes use of the common components and is not hindered by components that are not shared across objects or views.

Figure 16: Classification of color objects belonging to different categories. Due to using only common features, LMWCA achieves a high classification rate, even when the training set is small.

SOFTWARE

The currently available software resources for tensor decompositions include:

• The Tensor Toolbox, a versatile framework for basic operations on sparse and dense tensors, including CPD and Tucker formats [111].

• The TDALAB and TENSORBOX, which provide a user-friendly interface and advanced algorithms for CPD, nonnegative Tucker decomposition and MWCA [112], [113].

• The Tensorlab toolbox builds upon the complex optimization framework and offers numerical algorithms for computing the CPD, BTD and Tucker decompositions. The toolbox includes a library of constraints (e.g., nonnegativity, orthogonality) and the possibility to combine and jointly factorize dense, sparse and incomplete tensors [89].

• The N-Way Toolbox, which includes (constrained) CPD, Tucker decomposition and PLS in the context of chemometrics applications [114]. Many of these methods can handle constraints (e.g., nonnegativity, orthogonality) and missing elements.

• The TT Toolbox, the Hierarchical Tucker Toolbox and the Tensor Calculus library provide tensor tools for scientific computing [115]–[117].

• Code developed for multiway analysis is also available from the Three-Mode Company [118].

CONCLUSIONS AND FUTURE DIRECTIONS

We live in a world overwhelmed by data, from multiple pictures of Big Ben on various social web links to terabytes of data in multiview medical imaging, while we may need to repeat the scientific experiments many times to obtain ground truth. Each snapshot gives us a somewhat incomplete view of the same object, and involves different angles, illumination, lighting conditions, facial expressions, and noise.

We have cast a light on tensor decompositions as a perfect match for exploratory analysis of such multifaceted data sets, and have illustrated their applications in multi-sensor and multi-modal signal processing. Our emphasis has been to show that tensor decompositions and multilinear algebra open completely new possibilities for component analysis, as compared with the “flat view” of standard two-way methods.

Unlike matrices, tensors are multiway arrays of data samples whose representations are typically overdetermined (fewer parameters in the decomposition than the number of data entries). This gives us an enormous flexibility in finding hidden components in data and the ability to enhance both robustness to noise and tolerance to missing data samples and faulty sensors.

We have also discussed multilinear variants of several standard signal processing tools such as multilinear SVD, ICA, NMF and PLS, and have shown that tensor methods can operate in a deterministic way on signals of very short duration.

At present the uniqueness conditions of standard tensor models are relatively well understood and efficient computation algorithms do exist; however, for future applications several challenging problems remain to be addressed in more depth:

• A whole new area emerges when several decompositions which operate on different datasets are coupled, as in multiview data where some details of interest are visible in only one mode. Such techniques need theoretical support in terms of existence, uniqueness, and numerical properties.

• As the complexity of advanced models increases, their computation requires efficient iterative algorithms, extending beyond the ALS class.

• Estimation of the number of components in data, and the assessment of their dimensionality, would benefit from automation, especially in the presence of noise and outliers.

• Both new theory and algorithms are needed to further extend the flexibility of tensor models, e.g., for the constraints to be combined in many ways, and tailored to the particular signal properties in different modes.

• Work on efficient techniques for saving and/or fast processing of ultra large-scale tensors is urgent; these now routinely occupy terabytes, and will soon require petabytes of memory.

• Tools for rigorous performance analysis and rule-of-thumb performance bounds need to be further developed across tensor decomposition models.

• Our discussion has been limited to tensor models in which all entries take values independently of one another. Probabilistic versions of tensor decompositions incorporate prior knowledge about complex variable interaction, various data alphabets, or noise distributions, and so promise to model data more accurately and efficiently [119], [120].

It is fitting to conclude with a quote from Marcel Proust: “The voyage of discovery is not in seeking new landscapes but in having new eyes”. We hope to have helped to bring to the eyes of the Signal Processing Community the multidisciplinary developments in tensor decompositions, and to have shared our enthusiasm about tensors as powerful tools to discover new landscapes. The future computational, visualization and interpretation tools will be important next steps in supporting the different communities working on large-scale and big data analysis problems.

BIOGRAPHICAL NOTES

Andrzej Cichocki received the Ph.D. and Dr.Sc. (habilitation) degrees, all in electrical engineering, from the Warsaw University of Technology (Poland). He is currently a Senior Team Leader of the Laboratory for Advanced Brain Signal Processing at RIKEN Brain Science Institute (Japan) and Professor at the Systems Research Institute, Polish Academy of Science (Poland). He has authored more than 400 publications and 4 monographs in the areas of signal processing and computational neuroscience. He serves as Associate Editor for the IEEE Transactions on Signal Processing and the Journal of Neuroscience Methods.

Danilo P. Mandic is a Professor of signal processing at Imperial College London, London, U.K. and has been working in the area of nonlinear and multidimensional adaptive signal processing and time-frequency analysis. His publication record includes two research monographs titled Recurrent Neural Networks for Prediction (West Sussex, U.K.: Wiley, August 2001) and Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models, an edited book titled Signal Processing for Information Fusion, and more than 200 publications on signal and image processing.

Anh Huy Phan received the Ph.D. degree from the Kitakyushu Institute of Technology, Japan, in 2011. He worked as Deputy Head of the Research and Development Department, Broadcast Research and Application Center, Vietnam Television, and is currently a Research Scientist at the Laboratory for Advanced Brain Signal Processing and a Visiting Research Scientist with the Toyota Collaboration Center, Brain Science Institute, RIKEN. He has served on the Editorial Board of the International Journal of Computational Mathematics. His research interests include multilinear algebra, tensor computation, blind source separation, and brain computer interface.

Cesar F. Caiafa received the Ph.D. degree in engineering from the Faculty of Engineering, University of Buenos Aires, in 2007. He is currently Adjunct Researcher with the Argentinean Radioastronomy Institute (IAR) - CONICET and Assistant Professor with the Faculty of Engineering, University of Buenos Aires. He is also Visiting Scientist at the Laboratory for Advanced Brain Signal Processing, BSI - RIKEN, Japan.

Guoxu Zhou received his Ph.D. degree in intelligent signal and information processing from South China University of Technology, Guangzhou, China, in 2010. He is currently a Research Scientist of the Laboratory for Advanced Brain Signal Processing at RIKEN Brain Science Institute, Japan. His research interests include statistical signal processing, tensor analysis, intelligent information processing, and machine learning.

Qibin Zhao received his Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, in 2009. He is currently a research scientist at the Laboratory for Advanced Brain Signal Processing in RIKEN Brain Science Institute, Japan, and a visiting research scientist in the BSI TOYOTA Collaboration Center, RIKEN-BSI. His research interests include multiway data analysis, brain computer interface and machine learning.

Lieven De Lathauwer received the Ph.D. degree from the Faculty of Engineering, KU Leuven, Belgium, in 1997. From 2000 to 2007 he was Research Associate with the Centre National de la Recherche Scientifique, France. He is currently Professor with KU Leuven. He is affiliated with both the Group Science, Engineering and Technology of Kulak, with the Stadius Center for Dynamical Systems, Signal Processing and Data Analytics of the Electrical Engineering Department (ESAT) and with the iMinds Future Health Department. He is Associate Editor of the SIAM Journal on Matrix Analysis and Applications and has served as Associate Editor for the IEEE Transactions on Signal Processing. His research concerns the development of tensor tools for engineering applications.

REFERENCES

[1] F. L. Hitchcock, “Multiple invariants and generalizedrank of a p-way matrix or tensor,” Journal of Mathe-matics and Physics, vol. 7, pp. 39–79, 1927.

[2] R. Cattell, “Parallel proportional profiles and otherprinciples for determining the choice of factors byrotation,” Psychometrika, vol. 9, pp. 267–283, 1944.

[3] L. R. Tucker, “The extension of factor analysis tothree-dimensional matrices,” in Contributions to Math-ematical Psychology, H. Gulliksen and N. Frederiksen,Eds. New York: Holt, Rinehart and Winston, 1964,pp. 110–127.

[4] ——, “Some mathematical notes on three-mode factoranalysis,” Psychometrika, vol. 31, no. 3, pp. 279–311,September 1966.

[5] J. Carroll and J.-J. Chang, “Analysis of individ-ual differences in multidimensional scaling via ann-way generalization of ’Eckart-Young’ decomposi-tion,” Psychometrika, vol. 35, no. 3, pp. 283–319,September 1970.

[6] R. A. Harshman, “Foundations of the PARAFAC pro-cedure: Models and conditions for an explanatorymultimodal factor analysis,” UCLA Working Papers inPhonetics, vol. 16, pp. 1–84, 1970.

[7] A. Smilde, R. Bro, and P. Geladi, Multi-way Analysis:Applications in the Chemical Sciences. New York: JohnWiley & Sons Ltd, 2004.

[8] P. Kroonenberg, Applied Multiway Data Analysis. NewYork: John Wiley & Sons Ltd, 2008.

[9] C. Nikias and A. Petropulu, Higher-Order Spectra Anal-ysis: A Nonlinear Signal Processing Framework. PrenticeHall, 1993.

[10] J.-F. Cardoso and A. Souloumiac, “Blind beamformingfor non-Gaussian signals,” in IEE Proceedings F (Radarand Signal Processing), vol. 140, no. 6. IET, 1993, pp.362–370.

[11] P. Comon, “Independent component analysis, a newconcept?” Signal Processing, vol. 36, no. 3, pp. 287–314,1994.

[12] P. Comon and C. Jutten, Eds., Handbook of Blind SourceSeparation: Independent Component Analysis and Appli-cations. Academic Press, 2010.

[13] L. De Lathauwer, B. De Moor, and J. Vandewalle,“A multilinear singular value decomposition,” SIAMJournal of Matrix Analysis and Applications, vol. 24, pp.1253–1278, 2000.

[14] G. Beylkin and M. Mohlenkamp, “Algorithms fornumerical analysis in high dimensions,” SIAM J. Sci-entific Computing, vol. 26, no. 6, pp. 2133–2159, 2005.

[15] J. Ballani, L. Grasedyck, and M. Kluge, “Black box ap-proximation of tensors in hierarchical Tucker format,”Linear Algebra and its Applications, vol. 438, no. 2, pp.639–657, 2013.


[16] I. V. Oseledets, “Tensor-train decomposition,” SIAMJ. Scientific Computing, vol. 33, no. 5, pp. 2295–2317,2011.

[17] N. Sidiropoulos, R. Bro, and G. Giannakis, “Paral-lel factor analysis in sensor array processing,” IEEETransactions on Signal Processing, vol. 48, no. 8, pp.2377–2388, 2000.

[18] N. Sidiropoulos, G. Giannakis, and R. Bro, “BlindPARAFAC receivers for DS-CDMA systems,” IEEETransactions on Signal Processing, vol. 48, no. 3, pp.810–823, 2000.

[19] A. Cichocki, R. Zdunek, A.-H. Phan, and S. Amari,Nonnegative Matrix and Tensor Factorizations: Applica-tions to Exploratory Multi-way Data Analysis and BlindSource Separation. Chichester: Wiley, 2009.

[20] J. Landsberg, Tensors: Geometry and Applications.AMS, 2012.

[21] W. Hackbusch, Tensor Spaces and Numerical TensorCalculus, ser. Springer series in computational math-ematics. Heidelberg: Springer, 2012, vol. 42.

[22] E. Acar and B. Yener, “Unsupervised multiway dataanalysis: A literature survey,” IEEE Transactions onKnowledge and Data Engineering, vol. 21, pp. 6–20,2009.

[23] T. Kolda and B. Bader, “Tensor decompositions andapplications,” SIAM Review, vol. 51, no. 3, pp. 455–500, September 2009.

[24] P. Comon, X. Luciani, and A. L. F. de Almeida,“Tensor decompositions, Alternating Least Squaresand other Tales,” Jour. Chemometrics, vol. 23, pp. 393–405, 2009.

[25] H. Lu, K. Plataniotis, and A. Venetsanopoulos, “Asurvey of multilinear subspace learning for tensordata,” Pattern Recognition, vol. 44, no. 7, pp. 1540–1551, 2011.

[26] M. Mørup, “Applications of tensor (multiway array)factorizations and decompositions in data mining,”Wiley Interdisc. Rew.: Data Mining and Knowledge Dis-covery, vol. 1, no. 1, pp. 24–40, 2011.

[27] B. Khoromskij, “Tensors-structured numerical meth-ods in scientific computing: Survey on recent ad-vances,” Chemometrics and Intelligent Laboratory Sys-tems, vol. 110, no. 1, pp. 1–19, 2011.

[28] L. Grasedyck, D. Kessner, and C. Tobler, “A litera-ture survey of low-rank tensor approximation tech-niques,” CGAMM-Mitteilungen, vol. 36, pp. 53–78,2013.

[29] P. Comon, “Tensors: A brief survey,” IEEE SignalProcessing Magazine, p. (accepted), 2014.

[30] A. Bruckstein, D. Donoho, and M. Elad, “From sparsesolutions of systems of equations to sparse modelingof signals and images,” SIAM Review, vol. 51, no. 1,pp. 34–81, 2009.

[31] J. Kruskal, “Three-way arrays: rank and uniquenessof trilinear decompositions, with application to arith-metic complexity and statistics,” Linear Algebra and itsApplications, vol. 18, no. 2, pp. 95 – 138, 1977.

[32] I. Domanov and L. De Lathauwer, “On the unique-ness of the canonical polyadic decomposition of third-order tensors — part i: Basic results and uniquenessof one factor matrix and part ii: Uniqueness of theoverall decomposition,” SIAM J. Matrix Anal. Appl.,vol. 34, no. 3, pp. 855–903, 2013.

[33] A. Cichocki and S. Amari, Adaptive Blind Signal andImage Processing. John Wiley, Chichester, 2003.

[34] A. Hyvarinen, “Independent component analysis: re-cent advances,” Philosophical Transactions of the RoyalSociety A: Mathematical, Physical and Engineering Sci-ences, vol. 371, no. 1984, 2013.

[35] M. Elad, P. Milanfar, and G. H. Golub, “Shape from moments – an estimation theory perspective,” IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1814–1829, 2004.

[36] N. Sidiropoulos, “Generalizing Caratheodory’suniqueness of harmonic parameterization to Ndimensions,” IEEE Trans. Information Theory, vol. 47,no. 4, pp. 1687–1690, 2001.

[37] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, andE. Moulines, “A blind source separation techniqueusing second-order statistics,” IEEE Trans. Signal Pro-cessing, vol. 45, no. 2, pp. 434–444, 1997.

[38] F. Miwakeichi, E. Martnez-Montes, P. Valds-Sosa,N. Nishiyama, H. Mizuhara, and Y. Yamaguchi, “De-composing EEG data into space−time−frequencycomponents using parallel factor analysis,” NeuroIm-age, vol. 22, no. 3, pp. 1035–1045, 2004.

[39] M. Vasilescu and D. Terzopoulos, “Multilinear anal-ysis of image ensembles: Tensorfaces,” in Proc. Eu-ropean Conf. on Computer Vision (ECCV), vol. 2350,Copenhagen, Denmark, May 2002, pp. 447–460.

[40] M. Hirsch, D. Lanman, G.Wetzstein, and R. Raskar,“Tensor displays,” in Int. Conf. on Computer Graphicsand Interactive Techniques, SIGGRAPH 2012, Los An-geles, CA, USA, Aug. 5-9, 2012, Emerging TechnologiesProceedings, 2012, pp. 24–42.

[41] J. Hastad, “Tensor rank is NP-complete,” Journal ofAlgorithms, vol. 11, no. 4, pp. 644–654, 1990.

[42] M. Timmerman and H. Kiers, “Three mode principalcomponents analysis: Choosing the numbers of com-ponents and sensitivity to local optima,” British Jour-nal of Mathematical and Statistical Psychology, vol. 53,no. 1, pp. 1–16, 2000.

[43] E. Ceulemans and H. Kiers, “Selecting among three-mode principal component models of different typesand complexities: A numerical convex-hull basedmethod,” British Journal of Mathematical and StatisticalPsychology, vol. 59, no. 1, pp. 133–150, May 2006.

[44] M. Mørup and L. K. Hansen, “Automaticrelevance determination for multiway models,”Journal of Chemometrics, Special Issue: In Honorof Professor Richard A. Harshman, vol. 23, no.7-8, pp. 352 – 363, 2009. [Online]. Available:http://www2.imm.dtu.dk/pubdb/p.php?5806

[45] N. Sidiropoulos and R. Bro, “On the uniquenessof multilinear decomposition of N-way arrays,” J.Chemometrics, vol. 14, no. 3, pp. 229–239, 2000.

[46] T. Jiang and N. D. Sidiropoulos, “Kruskal’s per-mutation lemma and the identification of CANDE-COMP/PARAFAC and bilinear models,” IEEE Trans.Signal Processing, vol. 52, no. 9, pp. 2625–2636, 2004.

[47] L. De Lathauwer, “A link between the canonicaldecomposition in multilinear algebra and simultane-ous matrix diagonalization,” SIAM J. Matrix AnalysisApplications, vol. 28, no. 3, pp. 642–666, 2006.

[48] A. Stegeman, “On uniqueness conditions for Cande-comp/Parafac and Indscal with full column rank inone mode,” Linear Algebra and its Applications, vol. 431,no. 1–2, pp. 211–227, 2009.

[49] E. Sanchez and B. Kowalski, “Tensorial resolution: adirect trilinear decomposition,” J. Chemometrics, vol. 4,pp. 29–45, 1990.

[50] I. Domanov and L. De Lathauwer, “Canonicalpolyadic decomposition of third-order tensors: Re-duction to generalized eigenvalue decomposition,”ESAT, KU Leuven, ESAT-SISTA Internal Report 13-36,2013.

[51] S. Vorobyov, Y. Rong, N. Sidiropoulos, and A. Gersh-man, “Robust iterative fitting of multilinear models,”IEEE Transactions Signal Processing, vol. 53, no. 8, pp.2678–2689, 2005.

[52] X. Liu and N. Sidiropoulos, “Cramer-Rao lower bounds for low-rank decomposition of multidimensional arrays,” IEEE Transactions on Signal Processing, vol. 49, no. 9, pp. 2074–2086, Sep. 2001.

[53] P. Tichavsky, A. Phan, and Z. Koldovsky, “Cramer-rao-induced bounds for candecomp/parafac tensordecomposition,” IEEE Transactions on Signal Process-ing, vol. 61, no. 8, pp. 1986–1997, 2013.

[54] B. Chen, S. He, Z. Li, and S. Zhang, “Maximum blockimprovement and polynomial optimization,” SIAMJournal on Optimization, vol. 22, no. 1, pp. 87–107, 2012.

[55] A. Uschmajew, “Local convergence of the alternatingleast squares algorithm for canonical tensor approx-imation,” SIAM J. Matrix Anal. Appl., vol. 33, no. 2,pp. 639–652, 2012.

[56] M. J. Mohlenkamp, “Musings on multilinear fitting,”Linear Algebra and its Applications, vol. 438, no. 2, pp.834–852, 2013.

[57] M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A unifiedconvergence analysis of block successive minimiza-tion methods for nonsmooth optimization,” SIAMJournal on Optimization, vol. 23, no. 2, pp. 1126–1153,2013.

[58] P. Paatero, “The multilinear engine: A table-drivenleast squares program for solving multilinear prob-lems, including the n-way parallel factor analysismodel,” Journal of Computational and Graphical Statis-tics, vol. 8, no. 4, pp. 854–888, Dec. 1999.

[59] E. Acar, D. Dunlavy, T. Kolda, and M. Mørup,“Scalable tensor factorizations for incomplete data,”Chemometrics and Intelligent Laboratory Systems, vol.106 (1), pp. 41–56, 2011. [Online]. Available:http://www2.imm.dtu.dk/pubdb/p.php?5923

[60] A.-H. Phan, P. Tichavsky, and A. Cichocki, “Lowcomplexity Damped Gauss-Newton algorithms forCANDECOMP/PARAFAC,” SIAM Journal on MatrixAnalysis and Applications (SIMAX), vol. 34, no. 1, pp.126–147, 2013.

[61] L. Sorber, M. Van Barel, and L. De Lathauwer,“Optimization-based algorithms for tensor decompo-sitions: Canonical Polyadic Decomposition, decompo-sition in rank-(Lr, Lr, 1) terms and a new generaliza-tion,” SIAM J. Optimization, vol. 23, no. 2, 2013.

[62] V. de Silva and L.-H. Lim, “Tensor rank and the ill-posedness of the best low-rank approximation prob-lem,” SIAM J. Matrix Anal. Appl., vol. 30, pp. 1084–1127, September 2008.

[63] W. Krijnen, T. Dijkstra, and A. Stegeman, “On thenon-existence of optimal solutions and the occurrenceof “degeneracy” in the Candecomp/Parafac model,”Psychometrika, vol. 73, pp. 431–439, 2008.

[64] M. Sørensen, L. De Lathauwer, P. Comon, S. Icart,and L. Deneire, “Canonical Polyadic Decompositionwith orthogonality constraints,” SIAM J. Matrix Anal.Appl., vol. 33, no. 4, pp. 1190–1213, 2012.

[65] M. Sørensen and L. De Lathauwer, “Blind signalseparation via tensor decomposition with Vander-monde factor: Canonical polyadic decomposition,”IEEE Trans. Signal Processing, vol. 61, no. 22, pp. 5507–5519, Nov. 2013.

[66] G. Zhou and A. Cichocki, “Canonical Polyadic De-composition based on a single mode blind sourceseparation,” IEEE Signal Processing Letters, vol. 19,no. 8, pp. 523–526, 2012.

[67] L.-H. Lim and P. Comon, “Nonnegative approxima-tions of nonnegative tensors,” Journal of Chemometrics,vol. 23, no. 7-8, pp. 432–441, 2009.

[68] A. van der Veen and A. Paulraj, “An analytical con-stant modulus algorithm,” IEEE Transactions SignalProcessing, vol. 44, pp. 1136–1155, 1996.

[69] R. Roy and T. Kailath, “Esprit-estimation of signal pa-rameters via rotational invariance techniques,” Acous-tics, Speech and Signal Processing, IEEE Transactions on,vol. 37, no. 7, pp. 984–995, 1989.

[70] L. De Lathauwer, B. De Moor, and J. Vandewalle, “Onthe best rank-1 and rank-(R1, R2, . . . , RN) approxima-tion of higher-order tensors,” SIAM Journal of MatrixAnalysis and Applications, vol. 21, no. 4, pp. 1324–1342,2000.

[71] B. Savas and L.-H. Lim, “Quasi-Newton methodson Grassmannians and multilinear approximations oftensors,” SIAM J. Scientific Computing, vol. 32, no. 6,pp. 3352–3393, 2010.

[72] M. Ishteva, P.-A. Absil, S. Van Huffel, and L. De Lath-auwer, “Best low multilinear rank approximation ofhigher-order tensors, based on the Riemannian trust-region scheme,” SIAM J. Matrix Analysis Applications,vol. 32, no. 1, pp. 115–135, 2011.

[73] G. Zhou and A. Cichocki, “Fast and unique Tuckerdecompositions via multiway blind source separa-tion,” Bulletin of Polish Academy of Science, vol. 60,no. 3, pp. 389–407, 2012.

[74] A. Cichocki, “Generalized Component Analysis andBlind Source Separation Methods for AnalyzingMulitchannel Brain Signals,” in in Statistical andProcess Models for Gognitive Neuroscience and Aging.Lawrence Erlbaum Associates, 2007, pp. 201–272.

[75] M. Haardt, F. Roemer, and G. D. Galdo, “Higher-order SVD based subspace estimation to improve theparameter estimation accuracy in multi-dimensionalharmonic retrieval problems,” IEEE Trans. Signal Pro-cessing, vol. 56, pp. 3198 – 3213, Jul. 2008.

[76] A. Phan and A. Cichocki, “Tensor decompositions forfeature extraction and classification of high dimen-sional datasets,” Nonlinear Theory and Its Applications,IEICE, vol. 1, no. 1, pp. 37–68, 2010.

[77] L. De Lathauwer, “Decompositions of a higher-ordertensor in block terms – Part I and II,” SIAMJournal on Matrix Analysis and Applications (SIMAX),vol. 30, no. 3, pp. 1022–1066, 2008, special Issue onTensor Decompositions and Applications. [Online].Available: http://publi-etis.ensea.fr/2008/De08e

[78] L. De Lathauwer, “Blind separation of exponentialpolynomials and the decomposition of a tensor inrank-(Lr,Lr,1) terms,” SIAM J. Matrix Analysis Appli-cations, vol. 32, no. 4, pp. 1451–1474, 2011.

[79] L. De Lathauwer, “Block component analysis, a newconcept for blind source separation,” in Proc. 10thInternational Conf. LVA/ICA, Tel Aviv, March 12-15,,2012, pp. 1–8.

[80] E. Candes, J. Romberg, and T. Tao, “Robust un-certainty principles: exact signal reconstruction fromhighly incomplete frequency information,” IEEETransactions on Information Theory, vol. 52, no. 2, pp.489–509, 2006.

[81] E. J. Candes and T. Tao, “Near-optimal signal recoveryfrom random projections: Universal encoding strate-gies?” Information Theory, IEEE Transactions on, vol. 52,no. 12, pp. 5406–5425, 2006.

[82] D. L. Donoho, “Compressed sensing,” InformationTheory, IEEE Transactions on, vol. 52, no. 4, pp. 1289–1306, 2006.

[83] Y. Eldar and G. Kutyniok, “Compressed Sensing:Theory and Applications,” New York: Cambridge Univ.Press, vol. 20, p. 12, 2012.

[84] M. F. Duarte and R. G. Baraniuk, “Kronecker com-pressive sensing,” IEEE Transactions on Image Process-ing, vol. 21, no. 2, pp. 494–504, 2012.

[85] C. Caiafa and A. Cichocki, “Computing sparse rep-resentations of multidimensional signals using Kro-necker bases,” Neural Computaion, vol. 25, no. 1, pp.186–220, 2013.

[86] ——, “Multidimensional compressed sensing andtheir applications,” WIREs Data Mining and KnowledgeDiscovery, 2013 (accepted).


[87] S. Gandy, B. Recht, and I. Yamada, “Tensor com-pletion and low-n-rank tensor recovery via convexoptimization,” Inverse Problems, vol. 27, no. 2, 2011.

[88] M. Signoretto, Q. T. Dinh, L. De Lathauwer, and J. A.Suykens, “Learning with tensors: a framework basedon convex optimization and spectral regularization,”Machine Learning, pp. 1–49, 2013.

[89] L. Sorber, M. Van Barel, and L. De Lathauwer,“Tensorlab v1.0,” Feb. 2013. [Online]. Available:http://esat.kuleuven.be/sista/tensorlab/

[90] N. Sidiropoulos and A. Kyrillidis, “Multi-way com-pressed sensing for sparse low-rank tensors,” IEEESignal Processing Letters, vol. 19, no. 11, pp. 757–760,2012.

[91] D. Foster, K. Amano, S. Nascimento, and M. Foster,“Frequency of metamerism in natural scenes.” Journalof the Optical Society of America A, vol. 23, no. 10, pp.2359–2372, 2006.

[92] A. Cichocki, “Era of big data processing: A newapproach via tensor networks and tensor decomposi-tions, (invited talk),” in Proc. of Int. Workshop on SmartInfo-Media Systems in Asia (SISA2013), Nagoya, Japan,Sept.30–Oct.2, 2013.

[93] R. Orus, “A Practical Introduction to Tensor Net-works: Matrix Product States and Projected EntangledPair States,” The Journal of Chemical Physics, 2013.

[94] J. Salmi, A. Richter, and V. Koivunen, “Sequentialunfolding SVD for tensors with applications in arraysignal processing,” IEEE Transactions on Signal Process-ing, vol. 57, pp. 4719–4733, 2009.

[95] A.-H. Phan and A. Cichocki, “PARAFAC algorithmsfor large-scale problems,” Neurocomputing, vol. 74,no. 11, pp. 1970–1984, 2011.

[96] S. K. Suter, M. Makhynia, and R. Pajarola, “Tamresh- tensor approximation multiresolution hierarchy forinteractive volume visualization,” Comput. Graph. Fo-rum, vol. 32, no. 3, pp. 151–160, 2013.

[97] D. Nion and N. Sidiropoulos, “Adaptive algorithmsto track the PARAFAC decomposition of a third-ordertensor,” IEEE Trans. on Signal Processing, vol. 57, no. 6,pp. 2299–2310, Jun. 2009.

[98] S. A. Goreinov, N. L. Zamarashkin, and E. E. Tyrtysh-nikov, “Pseudo-skeleton approximations by matricesof maximum volume,” Mathematical Notes, vol. 62,no. 4, pp. 515–519, 1997.

[99] C. Caiafa and A. Cichocki, “Generalizing the column-row matrix decomposition to multi-way arrays,” Lin-ear Algebra and its Applications, vol. 433, no. 3, pp. 557–573, 2010.

[100] S. A. Goreinov, “On cross approximation of multi-index array,” Doklady Math., vol. 420, no. 4, pp. 404–406, 2008.

[101] I. Oseledets, D. V. Savostyanov, and E. Tyrtysh-nikov, “Tucker dimensionality reduction of three-dimensional arrays in linear time,” SIAM J. MatrixAnalysis Applications, vol. 30, no. 3, pp. 939–956, 2008.

[102] I. Oseledets and E. Tyrtyshnikov, “TT-cross approx-imation for multidimensional arrays,” Linear Algebraand its Applications, vol. 432, no. 1, pp. 70–88, 2010.

[103] M. W. Mahoney, M. Maggioni, and P. Drineas,“Tensor-CUR decompositions for tensor-based data,”SIAM Journal on Matrix Analysis and Applications,vol. 30, no. 3, pp. 957–987, 2008.

[104] R. Bro, “Multiway calibration. Multilinear PLS,” Jour-nal of Chemometrics, vol. 10, pp. 47–61, 1996.

[105] Q. Zhao, C. Caiafa, D. Mandic, Z. Chao, Y. Nagasaka,N. Fujii, L. Zhang, and A. Cichocki, “Higher-orderpartial least squares (HOPLS): A generalized multi-linear regression method,” IEEE Trans on Pattern Anal-ysis and machine Intelligence (PAMI), vol. 35, no. 7, pp.1660–1673, 2013.

[106] A. Cichocki, “Tensors decompositions: New conceptsfor brain data analysis?” Journal of Control, Measure-ment, and System Integration (SICE), vol. 47, no. 7, pp.507–517, 2011.

[107] V. Calhoun, J. Liu, and T. Adali, “A review of groupICA for fMRI data and ICA for joint inference ofimaging, genetic, and ERP data,” Neuroimage, vol. 45,pp. 163–172, 2009.

[108] Y.-O. Li, T. Adali, W. Wang, and V. Calhoun, “Jointblind source separation by multiset canonical corre-lation analysis,” IEEE Transactions on Signal Processing,vol. 57, no. 10, pp. 3918 –3929, oct. 2009.

[109] E. Acar, T. Kolda, and D. Dunlavy, “All-at-once op-timization for coupled matrix and tensor factoriza-tions,” CoRR, vol. abs/1105.3422, 2011.

[110] G. Zhou, A. Cichocki, S. Xie, and D. Mandic,“Beyond Canonical Correlation Analysis: Com-mon and individual features analysis,” IEEETransactions on PAMI, 2013. [Online]. Available:http://arxiv.org/abs/1212.3913,2012

[111] B. Bader, T. G. Kolda et al., “MAT-LAB tensor toolbox version 2.5,” Avail-able online, Feb. 2012. [Online]. Available:http://www.sandia.gov/∼tgkolda/TensorToolbox

[112] G. Zhou and A. Cichocki, “TDALAB:Tensor Decomposition Laboratory,” LABSP,Wako-shi, Japan, 2013. [Online]. Available:http://bsp.brain.riken.jp/TDALAB/

[113] A.-H. Phan, P. Tichavsky, and A. Cichocki, “Tensor-box: a matlab package for tensor decomposition,”LABSP, RIKEN, Japan, 2012. [Online]. Available:http://www.bsp.brain.riken.jp/∼phan/tensorbox.php

[114] C. Andersson and R. Bro, “The N-way toolboxfor MATLAB,” Chemometrics Intell. Lab. Systems,vol. 52, no. 1, pp. 1–4, 2000. [Online]. Available:http://www.models.life.ku.dk/nwaytoolbox

[115] I. Oseledets, “TT-toolbox 2.2,” 2012. [Online].Available: https://github.com/oseledets/TT-Toolbox

[116] D. Kressner and C. Tobler, “htucker—A MATLABtoolbox for tensors in hierarchical Tucker format,”MATHICSE, EPF Lausanne, 2012. [Online]. Available:http://anchp.epfl.ch/htucker

[117] M. Espig, M. Schuster, A. Killaitis, N. Waldren,P. Wahnert, S. Handschuh, and H. Auer,“Tensor Calculus library,” 2012. [Online]. Available:http://gitorious.org/tensorcalculus

[118] P. Kroonenberg, “The three-mode company. Acompany devoted to creating three-mode softwareand promoting three-mode data analysis.” [Online].Available: http://three-mode.leidenuniv.nl/

[119] S. Zhe, Y. Qi, Y. Park, I. Molloy, and S. Chari,“DinTucker: Scaling up Gaussian process models onmultidimensional arrays with billions of elements,”PAMI (in print) arXiv preprint arXiv:1311.2663, 2014.

[120] K. Yilmaz and A. T. Cemgil, “Probabilistic latenttensor factorisation,” in Proc. of International Conferenceon Latent Variable analysis and Signal Separation, vol.6365, 2010, pp. 346–353, cPCI-S.