  • Low-Rank Tensor Techniques for High-Dimensional Problems

    Daniel Kressner, CADMOS Chair for Numerical Algorithms and HPC

    MATHICSE, EPFL

    1

  • Contents

    - What is a tensor?
    - Applications
    - Matrices and low rank
    - CP and Tucker
    - Hierarchical Tucker
    - Algorithms based on low-rank tensors
    - Conclusions

    2

  • What is a tensor?

    - Vectors, matrices, and tensors
    - Basic calculus with tensors
    - Vectorization and matricization
    - µ-mode matrix products
    - Two classes of tensor problems

    3

  • Vectors, matrices, and tensors

    [Figure: a vector, a matrix, and a third-order tensor]

    - scalar = tensor of order 0
    - (column) vector = tensor of order 1
    - matrix = tensor of order 2
    - tensor of order 3 = n1 · n2 · n3 numbers arranged in an n1 × n2 × n3 array

    4

  • Tensors of arbitrary order

    A d-th order tensor X of size n1 × n2 × · · · × nd is a d-dimensional array with entries

    Xi1,i2,...,id , iµ ∈ {1, . . . ,nµ} for µ = 1, . . . ,d .

    In the following, the entries of X are assumed to be real (for simplicity):

    X ∈ Rn1×n2×···×nd .

    Multi-index notation:

    I = {1, . . . ,n1} × {1, . . . ,n2} × · · · × {1, . . . ,nd}.

    Then i ∈ I is a tuple of d indices:

    i = (i1, i2, . . . , id ).

    This allows us to write the entries of X as Xi for i ∈ I.

    5

  • Two important points

    1. A matrix A ∈ Rm×n has a natural interpretation as a linear operator in terms of matrix-vector multiplications:

       A : Rn → Rm, A : x 7→ A · x .

       There is no such (unique and natural) interpretation for tensors!
       ⇒ fundamental difficulty to define a meaningful general notion of eigenvalues and singular values of tensors.

    2. The number of entries in a tensor grows exponentially with d ⇒ curse of dimensionality.

       Example: A tensor of order 30 with n1 = n2 = · · · = nd = 10 has 10^30 entries = 8 × 10^12 exabytes of storage!¹

    For d ≫ 1: Cannot afford to store the tensor explicitly (in terms of its entries).

    ¹Global data storage calculated at 295 exabytes, see http://www.bbc.co.uk/news/technology-12419672.

    6


  • Basic calculus

    - Addition of two equal-sized tensors X , Y:

      Z = X + Y ⇔ Zi = Xi + Yi ∀i ∈ I.

    - Scalar multiplication with α ∈ R:

      Z = αX ⇔ Zi = αXi ∀i ∈ I.

      ⇒ vector space structure.

    - Inner product of two equal-sized tensors X , Y:

      ⟨X ,Y⟩ := ∑_{i∈I} Xi Yi .

      Induced norm: ‖X‖ := ( ∑_{i∈I} Xi² )^(1/2).

      For a 2nd-order tensor (= matrix) this corresponds to the Frobenius norm.

    7

  • Vectorization

    A tensor X of size n1 × n2 × · · · × nd has n1 · n2 · · · nd entries
    ⇒ many ways to stack the entries in a (loooong) column vector.

    One possible choice: The vectorization of X is denoted by vec(X ), where

    vec : Rn1×n2×···×nd → Rn1·n2···nd

    stacks the entries of a tensor in reverse lexicographical order into a long column vector.

    Remark: For d = 2, this is the usual way matrices are vectorized:

    A = [ a11 a12 ; a21 a22 ; a31 a32 ]  ⇒  vec(A) = (a11, a21, a31, a12, a22, a32)ᵀ

    8

  • Vectorization

    Example: d = 3, n1 = 3, n2 = 2, n3 = 3.

    vec(X ) = (x111, x211, x311, x121, . . . , x223, x323)ᵀ

    9
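
    The reverse lexicographical ordering above is exactly MATLAB's column-major storage, so vec(X) is a single reshape. A minimal sketch (not part of the original slides) for the d = 3 example:

    % Vectorization of a 3 x 2 x 3 tensor: MATLAB stores arrays in column-major
    % order, which matches the reverse lexicographical ordering on the slide.
    X = reshape(1:18, [3 2 3]);   % example tensor with entries 1,...,18
    v = X(:);                     % vec(X) as a column vector of length 18
    % equivalent: v = reshape(X, [], 1);
    % check one entry: X(i1,i2,i3) sits at position i1 + (i2-1)*3 + (i3-1)*3*2
    i1 = 2; i2 = 1; i3 = 3;
    assert(v(i1 + (i2-1)*3 + (i3-1)*3*2) == X(i1,i2,i3))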

  • Matricization

    - A matrix has two modes (column mode and row mode).
    - A d-th order tensor X has d modes (µ = 1, µ = 2, . . . , µ = d).

    Let us fix all but one mode, e.g., µ = 1: Then

    X (:, i2, i3, . . . , id ) (abuse of MATLAB notation)

    is a vector of length n1 for each choice of i2, . . . , id .

    View tensor X as a bunch of column vectors:

    10

  • Matricization

    Stack the vectors into an n1 × (n2 · · · nd ) matrix:

    X ∈ Rn1×n2×···×nd  →  X (1) ∈ Rn1×(n2n3···nd )

    For µ = 1, . . . , d , the µ-mode matricization of X is a matrix

    X (µ) ∈ Rnµ×(n1···nµ−1nµ+1···nd )

    with entries

    (X (µ))iµ,(i1,...,iµ−1,iµ+1,...,id ) = Xi ∀i ∈ I.

    11

  • Matricization

    In MATLAB: a = rand(2,3,4,5);

    - 1-mode matricization: reshape(a,2,3*4*5)
    - 2-mode matricization: b = permute(a,[2 1 3 4]); reshape(b,3,2*4*5)
    - 3-mode matricization: b = permute(a,[3 1 2 4]); reshape(b,4,2*3*5)
    - 4-mode matricization: b = permute(a,[4 1 2 3]); reshape(b,5,2*3*4)

    For a matrix A ∈ Rn1×n2 :

    A(1) = A, A(2) = AT .

    12
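
    The four cases above follow one pattern: bring mode µ to the front and reshape. A small helper capturing this pattern (the name matricize is only illustrative, not a toolbox function):

    function Xmu = matricize(X, mu)
    % MATRICIZE  mu-mode matricization of a full tensor X:
    % an n_mu x (product of the remaining dimensions) matrix.
      sz   = size(X);
      d    = ndims(X);
      perm = [mu, 1:mu-1, mu+1:d];            % move mode mu to the front
      Xmu  = reshape(permute(X, perm), sz(mu), []);
    end

    For a matrix A this reproduces the two cases above: matricize(A,1) = A and matricize(A,2) = A.'.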

  • µ-mode matrix products

    Consider the 1-mode matricization X (1) ∈ Rn1×(n2···nd ).

    It seems to make sense to multiply an m × n1 matrix A from the left:

    Y (1) := A X (1) ∈ Rm×(n2···nd ).

    Can rearrange Y (1) back into an m × n2 × · · · × nd tensor Y.
    This is called the 1-mode matrix multiplication:

    Y = A ◦1 X ⇔ Y (1) = AX (1)

    More formally (and more ugly):

    Yi1,i2,...,id = ∑_{k=1}^{n1} ai1,k Xk,i2,...,id .

    13

  • µ-mode matrix products

    General definition of the µ-mode matrix product with A ∈ Rm×nµ :

    Y = A ◦µ X ⇔ Y (µ) = AX (µ).

    More formally (and more ugly):

    Yi1,i2,...,id = ∑_{k=1}^{nµ} aiµ,k Xi1,...,iµ−1,k,iµ+1,...,id .

    For matrices:

    - 1-mode multiplication = multiplication from the left:

      Y = A ◦1 X = A X .

    - 2-mode multiplication = transposed multiplication from the right:

      Y = A ◦2 X = X AT .

    14
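
    For full arrays the µ-mode product can be coded directly from Y (µ) = A X (µ): matricize, multiply, fold back. A plain-MATLAB sketch (mimicking, but not identical to, the Tensor Toolbox function ttm mentioned later):

    function Y = mu_mode_product(A, X, mu)
    % MU_MODE_PRODUCT  Y = A o_mu X for a full tensor X and a matrix A (m x n_mu).
      sz     = size(X);
      d      = ndims(X);
      perm   = [mu, 1:mu-1, mu+1:d];
      Xmu    = reshape(permute(X, perm), sz(mu), []);   % mu-mode matricization
      Ymu    = A * Xmu;                                 % Y^(mu) = A * X^(mu)
      sz(mu) = size(A, 1);                              % new mu-th dimension
      Y      = ipermute(reshape(Ymu, sz(perm)), perm);  % fold back into a tensor
    end

    For d = 2 and mu = 2 this returns X*A.', consistent with the matrix case above.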

  • Kronecker product

    For an m × n matrix A and a k × ℓ matrix B, the Kronecker product is defined as

    B ⊗ A := [ b11 A · · · b1ℓ A ; ... ; bk1 A · · · bkℓ A ] ∈ Rkm×ℓn.

    Most important properties (for our purposes):
    1. vec(A X ) = (I ⊗ A) vec(X ).
    2. vec(X AT ) = (A ⊗ I) vec(X ).
    3. (B ⊗ A)(D ⊗ C) = (BD ⊗ AC).
    4. Im ⊗ In = Imn.

    15
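
    These identities translate one-to-one into MATLAB via kron; a quick numerical check of properties 1–3 (a sketch with random small matrices):

    % Numerical check of the Kronecker product identities on the slide.
    m = 3; n = 4; p = 5;
    A = randn(m, n); X = randn(n, p);
    vec = @(M) M(:);
    err1 = norm(vec(A*X) - kron(eye(p), A) * vec(X));        % vec(A X)   = (I (x) A) vec(X)
    Z = randn(p, n);
    err2 = norm(vec(Z*A.') - kron(A, eye(p)) * vec(Z));       % vec(Z A^T) = (A (x) I) vec(Z)
    B = randn(2, 6); D = randn(6, 3); C = randn(n, 7);
    err3 = norm(kron(B, A)*kron(D, C) - kron(B*D, A*C), 'fro');  % mixed product rule
    disp([err1 err2 err3])   % all of the order of machine precision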

  • µ-mode matrix products and vectorization

    By definition,

    vec(X ) = vec(X (1)).

    Consequently, also

    vec(A ◦1 X ) = vec(A X (1)).

    Vectorized version of the 1-mode matrix product:

    vec(A ◦1 X ) = (In2···nd ⊗ A) vec(X ) = (Ind ⊗ · · · ⊗ In2 ⊗ A) vec(X ).

    Relation between µ-mode matrix product and matrix-vector product:

    vec(A ◦µ X ) = (Ind ⊗ · · · ⊗ Inµ+1 ⊗ A⊗ Inµ−1 ⊗ · · · ⊗ In1 ) vec(X )

    16

  • Two classes of tensor problems

    Class 1: function-related tensors

    Consider a function u(ξ1, . . . , ξd ) ∈ R in d variables ξ1, . . . , ξd .
    The tensor U ∈ Rn1×···×nd represents a discretization of u:

    - U contains function values of u evaluated on a grid; or
    - U contains coefficients of a truncated expansion in tensorized basis functions:

      u(ξ1, . . . , ξd ) ≈ ∑_{i∈I} Ui φi1 (ξ1)φi2 (ξ2) · · ·φid (ξd ).

    Typical setting:
    - U only given implicitly, e.g., as the solution of a discretized PDE;
    - seek approximations to U with very low storage and tolerable accuracy;
    - d may become very large.

    The focus of this lecture is on function-related tensors!

    17

  • Discretization of a function in d variables

    ξ1, . . . , ξd ∈ [0,1]  ⇒  # function values grows exponentially with d

    18

  • Separability helps

    Ideal situation: the function f is separable,

    f (ξ1, ξ2, . . . , ξd ) = f1(ξ1) f2(ξ2) · · · fd (ξd ).

    [Figure: discretized f (O(n^d) memory) vs. Kronecker product of the discretized fj (O(dn) memory)]

    Of course: Exact separability is rarely satisfied in practice.

    19

  • Two classes of tensor problems

    Class 2: data-related tensors

    The tensor U ∈ Rn1×···×nd contains multi-dimensional data.

    Example 1: U2011,3,2 denotes the number of papers published in 2011 by author 3 in mathematical journal 2.

    Example 2: A video of 1000 frames with resolution 640 × 480 can be viewed as a 640 × 480 × 1000 tensor.

    Typical setting:
    - entries of U given explicitly (at least partially);
    - extraction of dominant features from U;
    - usually moderate values of d.

    20

  • Summary

    - A tensor X ∈ Rn1×···×nd is a d-dimensional array.
    - There are various ways of reshaping the entries of a tensor X into a vector or matrix.
    - µ-mode matrix multiplication can be expressed with Kronecker products.

    Further reading:
    - T. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev. 51 (2009), no. 3, 455–500.

    Software:
    - MATLAB offers basic functionality to work with d-dimensional arrays.
    - MATLAB Tensor Toolbox: http://www.csmr.ca.sandia.gov/~tgkolda/TensorToolbox/

    21


  • Applications in scientific computing

    - High-dimensional elliptic PDEs
    - High-dimensional PDE-eigenvalue problems
    - Quantum many-body problems
    - Stochastic Automata Networks
    - Further applications

    22

  • High-dimensional elliptic PDEs: 3D model problem

    - Consider

      −∆u = f in Ω, u|∂Ω = 0,

      on the unit cube Ω = [0,1]3.

    - Discretize on a tensor grid. Uniform grid for simplicity:

      ξµ(j) = jh, h = 1/(n + 1), for µ = 1,2,3.

    - Approximate solution tensor U ∈ Rn×n×n:

      Ui1,i2,i3 ≈ u(ξ1(i1), ξ2(i2), ξ3(i3)).

    23

  • High-dimensional elliptic PDEs: 3D model problem

    - Discretization of the 1D Laplace operator:

      −∂xx ≈ tridiag(−1, 2, −1) =: A.

    - Application in each coordinate direction:

      −∂ξ1ξ1 u(ξ1, ξ2, ξ3) ≈ A ◦1 U ,
      −∂ξ2ξ2 u(ξ1, ξ2, ξ3) ≈ A ◦2 U ,
      −∂ξ3ξ3 u(ξ1, ξ2, ξ3) ≈ A ◦3 U .

    - Hence,

      −∆u ≈ A ◦1 U + A ◦2 U + A ◦3 U ,

      or in vectorized form with u = vec(U):

      −∆u ≈ (I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u.

    24
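
    The Kronecker-sum operator above can be assembled verbatim with kron; a small sketch (the 1/h² scaling, which the slide absorbs into A, is left out here as well):

    % (I (x) I (x) A + I (x) A (x) I + A (x) I (x) I) for the 3D model problem.
    n = 20;
    e = ones(n, 1);
    A = spdiags([-e 2*e -e], -1:1, n, n);   % 1D stencil
    I = speye(n);
    L3 = kron(I, kron(I, A)) + kron(I, kron(A, I)) + kron(A, kron(I, I));
    f = ones(n^3, 1);                        % some right-hand side
    u = L3 \ f;                              % only feasible for small n!
    U = reshape(u, [n n n]);                 % fold the solution back into a tensor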

  • High-dimensional elliptic PDEs: 3D model problem

    Finite difference discretization of the model problem

    −∆u = f in Ω, u|∂Ω = 0

    for Ω = [0,1]3 takes the form

    (I ⊗ I ⊗ A + I ⊗ A⊗ I + A⊗ I ⊗ I)u = f.

    Similar structure for finite element discretization with tensorized FEs:

    V ⊗ W ⊗ Z = { ∑ αijk vi (ξ1) wj (ξ2) zk (ξ3) : αijk ∈ R }

    with

    V = {v1(ξ1), . . . , vn(ξ1)}, W = {w1(ξ2), . . . , wn(ξ2)}, Z = {z1(ξ3), . . . , zn(ξ3)}.

    Galerkin discretization

    (KV ⊗MW ⊗MZ + MV ⊗ KW ⊗MZ + MV ⊗MW ⊗ KZ )u = f,

    with 1D mass/stiffness matrices MV, MW, MZ, KV, KW, KZ.

    25

  • High-dimensional elliptic PDEs: Arbitrary dimensions

    Finite difference discretization of the model problem

    −∆u = f in Ω, u|∂Ω = 0

    for Ω = [0,1]d takes the form

    ( ∑_{j=1}^{d} I ⊗ · · · ⊗ I ⊗ A ⊗ I ⊗ · · · ⊗ I ) u = f.

    To obtain such Kronecker structure in general, one needs:
    - a tensorized domain;
    - a highly structured grid;
    - coefficients that can be written/approximated as a sum of separable functions.

    26
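
    For general d the same Kronecker sum can be built in a loop; a sketch (only feasible for small n^d, which is exactly the curse of dimensionality discussed above):

    % Sum_{j=1}^{d} I (x) ... (x) A (x) ... (x) I for general d.
    n = 6; d = 4;
    e = ones(n, 1);
    A = spdiags([-e 2*e -e], -1:1, n, n);
    I = speye(n);
    L = sparse(n^d, n^d);
    for j = 1:d
        T = sparse(1);
        for mu = 1:d
            % mode-1 factor ends up rightmost, matching the vec convention above
            if mu == j, F = A; else F = I; end
            T = kron(F, T);
        end
        L = L + T;
    end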

  • High-dimensional PDE-eigenvalue problems

    PDE-eigenvalue problem:

    ∆u(ξ) + V (ξ)u(ξ) = λu(ξ) in Ω = [0,1]d ,
    u(ξ) = 0 on ∂Ω.

    Assumption: Potential represented as

    V (ξ) = ∑_{j=1}^{s} Vj(1)(ξ1) Vj(2)(ξ2) · · · Vj(d)(ξd ).

    ⇒ finite difference discretization

    𝒜 u = (𝒜L + 𝒜V ) u = λ u,

    with

    𝒜L = ∑_{j=1}^{d} I ⊗ · · · ⊗ I (d−j times) ⊗ AL ⊗ I ⊗ · · · ⊗ I (j−1 times),

    𝒜V = ∑_{j=1}^{s} AV,j(d) ⊗ · · · ⊗ AV,j(2) ⊗ AV,j(1).

    27

  • Quantum many-body problems

    - spin-1/2 particles: proton, neutron, electron, and quark
    - two states: spin-up, spin-down
    - quantum state of each spin represented by a vector in C2 (spinor)
    - quantum state of a system of d spins represented by a vector in C^(2^d)
    - quantum mechanical operators expressed in terms of Pauli matrices

      Px = [ 0 1 ; 1 0 ], Py = [ 0 −i ; i 0 ], Pz = [ 1 0 ; 0 −1 ].

    - spin Hamiltonian: sum of Kronecker products of Pauli matrices and identities;
      each term describes a physical (inter)action of spins
    - interaction of spins described by a graph
    - Goal: Compute the ground state of the spin Hamiltonian.

    28

  • Quantum many-body problems

    Example: 1d chain of 5 spins with periodic boundary conditions

    [Figure: ring of 5 coupled spins 1–2–3–4–5]

    Hamiltonian describing the pairwise interaction between nearest neighbors:

    H = Pz ⊗ Pz ⊗ I ⊗ I ⊗ I
      + I ⊗ Pz ⊗ Pz ⊗ I ⊗ I
      + I ⊗ I ⊗ Pz ⊗ Pz ⊗ I
      + I ⊗ I ⊗ I ⊗ Pz ⊗ Pz
      + Pz ⊗ I ⊗ I ⊗ I ⊗ Pz

    29
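
    This Hamiltonian is just a sum of Kronecker products and can be formed verbatim in MATLAB; a sketch:

    % Nearest-neighbor ZZ Hamiltonian for a ring of 5 spins (periodic BC),
    % i.e., the five Kronecker-product terms listed on the slide.
    Pz = [1 0; 0 -1];
    I2 = eye(2);
    d  = 5;
    H  = zeros(2^d);
    for k = 1:d
        term = 1;
        for site = 1:d
            if site == k || site == mod(k, d) + 1
                term = kron(term, Pz);      % Pz on sites k and k+1 (mod d)
            else
                term = kron(term, I2);
            end
        end
        H = H + term;
    end
    % ground state = eigenvector for the smallest eigenvalue of H
    [V, E] = eig(H);  [~, idx] = min(diag(E));  ground_state = V(:, idx);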

  • Quantum many-body problems

    - Ising (ZZ) model for a 1d chain of d spins with open boundary conditions:

      H = ∑_{k=1}^{d−1} I ⊗ · · · ⊗ I ⊗ Pz ⊗ Pz ⊗ I ⊗ · · · ⊗ I
        + λ ∑_{k=1}^{d} I ⊗ · · · ⊗ I ⊗ Px ⊗ I ⊗ · · · ⊗ I

      λ = ratio between the strength of the magnetic field and the pairwise interactions.

    - 1d Heisenberg (XY) model
    - Current research: 2d models.
    - More details in:
      Huckle/Waldherr/Schulte-Herbrüggen: Computations in Quantum Tensor Networks.
      Schollwöck: The density-matrix renormalization group in the age of matrix product states.

    30

  • Stochastic Automata Networks (SANs)

    - 3 stochastic automata A1, A2, A3 having 3 states each.
    - Vector xt(i) ∈ R3 describes the probabilities of states (1), (2), (3) in Ai at time t.
    - No coupling between automata ⇒ local transition xt(i) 7→ xt+1(i) described by a Markov chain:

      xt+1(i) = Ei xt(i),

      with a stochastic matrix Ei.
    - Stationary distribution of Ai = Perron vector of Ei (eigenvector for eigenvalue 1).

    31

  • Stochastic Automata Networks (SANs)

    - 3 stochastic automata A1, A2, A3 having 3 states each.
    - Coupling between automata ⇒ local transition xt(i) 7→ xt+1(i) not described by a Markov chain.
    - Need to consider all possible combinations of states in (A1, A2, A3):

      (1,1,1), (1,1,2), (1,1,3), (1,2,1), (1,2,2), . . . .

    - Vector xt ∈ R^(3^3) (or tensor X(t) ∈ R3×3×3) describes the probabilities of the combined states.

    32

  • Stochastic Automata Networks (SANs)

    - Transition xt 7→ xt+1 described by a Markov chain:

      xt+1 = E xt ,

      with a large stochastic matrix E.
    - Oversimplified example:

      E = I ⊗ I ⊗ Ẽ1 + I ⊗ Ẽ2 ⊗ I + Ẽ3 ⊗ I ⊗ I   (local transitions)
        + I ⊗ E21 ⊗ E12                            (interaction between A1, A2)
        + E32 ⊗ E23 ⊗ I                            (interaction between A2, A3)

    - Goal: Compute the stationary distribution = Perron vector of E.
    - More details in:
      Stewart: Introduction to the Numerical Solution of Markov Chains. Chapter 9.
      Buchholz: Product Form Approximations for Communicating Markov Processes.

    33

  • Further applications

    Other applications in scientific computing featuring low-rank tensor concepts:

    - Boltzmann equation [Ibragimov/Rjasanow'2009].
    - Dynamical systems [Koch/Lubich'2009].
    - Parabolic PDEs [Andreev/Tobler'2011], [Khoromskij'2009].
    - Stochastic PDEs [Khoromskij/Schwab'2010], [Matthies/Zander'2011], [Kressner/Tobler'2011], [Ballani/Grasedyck/Kluge'2011], . . .
    - Electronic structure calculation [Chinnamsetty et al.'2007], [Flad et al.'2009], [Khoromskij/Khoromskaja'2009], [Limpanuparb/Gill'2009], [Benedikt et al.'2011], [Mohlenkamp'2011], . . .
    - Evaluation of boundary integrals (in BEM): [Grasedyck], [Khoromskij/Sauter/Veit'2011].
    - . . .

    34

  • Summary

    - Large diversity of applications leading to linear systems / eigenvalue problems with Kronecker product structure.
    - For many problems of practical interest: explicit storage / computation of the solution is infeasible.
    - Increasing use of low-rank tensor techniques. Heaviest use currently: DMRG for quantum many-body problems.
    - Remark: For PDE-related applications, high dimensionality can also be addressed during the discretization phase (sparse grids, adaptive sparse discretization, . . .). This has advantages and disadvantages.

    35

  • Approximate low-rank matrices

    - Singular value decomposition
    - Separability and low rank
    - Separability by polynomial interpolation
    - Separability by exponential sums
    - Low rank of snapshot matrices

    36

  • Low-rank approximation

    Setting: Matrix X ∈ Rn×m, m and n too large to compute/store X explicitly.
    Idea: Replace X by RST with R ∈ Rn×r , S ∈ Rm×r and r ≪ m, n.

                  X            RST
    Memory        nm           nr + rm
    Cost          ops(m,n)     ops(m,n) × r/min{m,n} (?)

    min{ ‖X − RST‖2 : R ∈ Rn×r , S ∈ Rm×r } = σr+1,

    with singular values σ1 ≥ σ2 ≥ · · · ≥ σmin{m,n} of X .

    37

  • Construction from singular value decomposition

    SVD: Let X ∈ Rn×m and k = min{m,n}. Then there exist orthonormal matrices

    U = [ u1, u2, . . . , uk ] ∈ Rn×k , V = [ v1, v2, . . . , vk ] ∈ Rm×k ,

    such that

    X = U Σ V T , Σ = diag(σ1, σ2, . . . , σk ).

    Choose r ≤ k and partition

    X = [ U1, U2 ] [ Σ1 0 ; 0 Σ2 ] [ V1, V2 ]T = U1 Σ1 V1T + U2 Σ2 V2T ,

    with R := U1 Σ1 and ST := V1T .

    Then ‖X − RST‖2 = ‖Σ2‖2 = σr+1.

    Good low rank approximation if singular values decay sufficiently fast.

    Also: span(X ) ≈ span(R), span(X T ) ≈ span(ST )

    38
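
    In MATLAB, this rank-r truncation is a few lines; a sketch using svd (for a large sparse X one would typically use svds instead):

    % Best rank-r approximation of a matrix X from its SVD.
    X = randn(200, 50) * randn(50, 300);   % a matrix of rank (at most) 50
    r = 20;
    [U, S, V] = svd(X, 'econ');
    R  = U(:, 1:r) * S(1:r, 1:r);          % R   = U1 * Sigma1
    St = V(:, 1:r)';                       % S^T = V1^T
    err2  = norm(X - R*St, 2);             % equals sigma_{r+1}
    sigma = diag(S);
    disp([err2, sigma(r+1)])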

  • Discretization of a bivariate function

    - Bivariate function f (x , y) : [xmin, xmax] × [ymin, ymax] → R.
    - Function values on the tensor grid [x1, . . . , xm] × [y1, . . . , yn]:

      F = [ f (x1, y1) f (x1, y2) · · · f (x1, yn) ;
            f (x2, y1) f (x2, y2) · · · f (x2, yn) ;
            ...
            f (xm, y1) f (xm, y2) · · · f (xm, yn) ]

    Basic but crucial observation: if f (x , y) = g(x)h(y), then

      F = [ g(x1)h(y1) · · · g(x1)h(yn) ; ... ; g(xm)h(y1) · · · g(xm)h(yn) ]
        = [ g(x1) ; ... ; g(xm) ] [ h(y1) · · · h(yn) ]

    Separability implies rank 1.

    39

  • Separability and low rank

    Approximation by a sum of separable functions:

    f (x , y) = g1(x)h1(y) + · · · + gr (x)hr (y) + error, with fr (x , y) := g1(x)h1(y) + · · · + gr (x)hr (y).

    Define

    Fr = [ fr (x1, y1) · · · fr (x1, yn) ; ... ; fr (xm, y1) · · · fr (xm, yn) ].

    Then Fr has rank ≤ r and ‖F − Fr‖F ≤ √(mn) × error. Hence

    σr+1(F ) ≤ ‖F − Fr‖2 ≤ ‖F − Fr‖F ≤ √(mn) × error.

    ⇒ Semi-separable approximation implies low-rank approximation.

    40

  • Semi-separable approximation by polynomials

    Solution of the approximation problem

    f (x , y) = g1(x)h1(y) + · · · + gr (x)hr (y) + error

    is not trivial; gj , hj can be chosen arbitrarily!

    General construction by polynomial interpolation:

    1. Lagrange interpolation of f (x , y) in the y-coordinate:

       Iy [f ](x , y) = ∑_{j=1}^{r} f (x , θj ) Lj (y)

       with Lagrange polynomials Lj of degree r − 1 on [ymin, ymax].

    2. Interpolation of Iy [f ] in the x-coordinate:

       Ix [Iy [f ]](x , y) = ∑_{i,j=1}^{r} f (ξi , θj ) Li (x) Lj (y) =̂ ∑_{i=1}^{r} L̃i,x (x) L̃i,y (y),

       where the matrix [f (ξi , θj )]i,j is “diagonalized” by SVD.

    41

  • Semi-separable approximation by polynomials

    error ≤ ‖f − Ix [Iy [f ]]‖∞
          = ‖f − Ix [f ] + Ix [f ] − Ix [Iy [f ]]‖∞
          ≤ ‖f − Ix [f ]‖∞ + ‖Ix‖∞ ‖f − Iy [f ]‖∞

    with Lebesgue constant ‖Ix‖∞ ∼ log r when using Chebyshev interpolation nodes.

    The polynomial interpolation error bound is typically much too pessimistic.

    - Lebesgue constants hit hard in high dimensions: (log r)^(d−1).
    - Severe theoretical barriers for general smooth multivariate functions:
      E. Novak and H. Woźniakowski: Tractability of Multivariate Problems, Volume I and II. EMS.

    42

  • Semi-separable approximation of 1/(x + y)

    Consider

    f (x , y) = 1/(x + y), x , y ∈ [α, β], 0 < α < β.

    Apply numerical quadrature:

    1/z = ∫_0^∞ e^(−tz) dt = ∑_{j=1}^{r} ωj e^(−γj z) + error.

    Inserting z = x + y:

    1/(x + y) = ∑_{j=1}^{r} ωj e^(−γj (x+y)) + error = ∑_{j=1}^{r} ωj e^(−γj x) e^(−γj y) + error.

    Choice of nodes γj > 0 and weights ωj > 0 as in [Stenger'93, Braess'86, Braess/Hackbusch'05] ⇒

    error ≤ (8/|α|) exp( −rπ² / log(8β/α) ).

    43
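
    The bound above predicts fast singular value decay for the matrix with entries 1/(xi + yj). This is easy to observe numerically; a sketch that simply looks at the singular values (rather than constructing the tabulated quadrature nodes γj, ωj from the cited references):

    % Singular value decay of F(i,j) = 1/(x_i + y_j) on [alpha, beta]^2.
    alpha = 1; beta = 100; n = 200;
    x = linspace(alpha, beta, n)';
    y = linspace(alpha, beta, n);
    F = 1 ./ (x * ones(1, n) + ones(n, 1) * y);   % n x n matrix
    s = svd(F);
    semilogy(s / s(1), '.-');    % rapid (exponential) decay of sigma_k / sigma_1
    xlabel('k'); ylabel('\sigma_k / \sigma_1');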

  • Semi-separable approximation by exponential sums

    - Consider the more general case of a function f (x , y) := g(x + y).
    - Approximation of g(z) with z := x + y by an exponential sum

      g(z) ≈ ∑_{j=1}^{r} ωj exp(γj z) (1)

      for some coefficients γj , ωj ∈ R.
    - (1) gives a semi-separable approximation for f :

      f (x , y) = g(x + y) ≈ ∑_{j=1}^{r} ωj exp(γj (x + y)) = ∑_{j=1}^{r} ωj exp(γj x) exp(γj y).

    - Naturally extends to arbitrarily many variables.
    - Problem: (1) is a nontrivial approximation problem [Braess'1986], [Hackbusch'2006], . . .

    44

  • Low-rank approximation of snapshot matrices

    Vector-valued function

    x(α) : [αmin, αmax] → Rn

    Sampling at α1, . . . , αm ∈ [αmin, αmax]:

    Snapshot matrix X = [ x(α1), x(α2), . . . , x(αm) ]

    45

  • Example: Baking 1 cookie

    Stationary heat equation with piecewise constant heat conductivity σ(x , α):

    −∇ · (σ(x , α)∇u) = f in Ω = [−1,1]2,
    u = 0 on ∂Ω,

    - σ(baking tray) = 1
    - σ(cookie) = 1 + α
    - Undetermined parameter α ∈ [αmin, αmax].

    [Figure: FE mesh of the domain; # vertices: 455, # elements: 825, # edges: 1279]

    Standard FE discretization results in the linearly parameter-dependent linear system

    (A0 + αA1)x(α) = b.

    46
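
    Given the assembled FE matrices A0, A1, the load vector b, and the parameter interval [alpha_min, alpha_max] (all assumed here, not constructed), the snapshot matrix and the singular value decay observed on the next slide are computed as follows; a sketch:

    % Snapshot matrix for the parameter-dependent system (A0 + alpha*A1) x(alpha) = b.
    m      = 101;
    alphas = linspace(alpha_min, alpha_max, m);
    X      = zeros(size(A0, 1), m);
    for j = 1:m
        X(:, j) = (A0 + alphas(j) * A1) \ b;   % one snapshot per parameter value
    end
    s = svd(X);
    semilogy(s, '.-');   % observed: rapid singular value decay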

  • Singular value decay – observation

    - 1 cookie: n = 371, m = 101.

    [Figure: log10(singular values of the snapshot matrix), showing rapid decay]

    - Foundation of Proper Orthogonal Decomposition and Reduced Basis Methods.

    47

  • Singular value decay – explanation

    Polynomial approximation:

    x(α) = x0 + αx1 + α²x2 + · · · + α^(k−1) x_(k−1) + error.

    Approximation error:
    - Assume b(·), A(·) analytic ⇒ x(·) analytic.
    - Then

      error ≲ ρ^(−k),

      where ρ > 1 depends on the domain of analyticity of A, b.
      (Proof: Direct extension of the classical result for scalar-valued functions.)

    48

  • Singular value decay – explanation

    Polynomial approximation:

    x(α) = x0 + αx1 + α²x2 + · · · + α^(k−1) x_(k−1) + error.

    Snapshot matrix:

    X = [ x(α1), x(α2), . . . , x(αm) ]
      = [ x0, x1, . . . , x_(k−1) ] · [ 1 1 . . . 1 ; α1 α2 . . . αm ; ... ; α1^(k−1) α2^(k−1) . . . αm^(k−1) ] + error
      = matrix of rank ≤ k + error

    ⇒ σ_(k+1)(X ) ≤ error ≲ ρ^(−k)

    Remark: Trivially extends to the piecewise analytic case.

    49

  • Singular value decay – piecewise analytic case

    Example: Consider the smallest singular value σ(z) and corresponding right singular vector v(z) of B(z) = A − izI for z ∈ [−1,1].

    - σ(z) only Lipschitz continuous, but piecewise analytic.
    - v(z) discontinuous, but piecewise analytic.
    - A = 2 × 2 block diagonal randn, n = 400.
    - Snapshot matrix of singular vectors:

      X = [ v(z1), v(z2), . . . , v(z100) ]

      for equidistant samples zj ∈ [−1,1].

    [Figure: left, σ(z) over z ∈ [−1,1]; right, singular values of X, decaying rapidly]

    50

  • Summary

    Need strong singular value decay for good low-rank approximations.

    For function-related matrices/tensors: Strong link to semi-separable approximations.

    Smoothness seems to be important... at least somehow.
    - Fortunately, smoothness is not necessary. Piecewise smoothness can be enough.
    - Unfortunately, smoothness is not sufficient for higher-order tensors.
    - Need to impose stronger regularity as the dimension/order d increases, based, e.g., on mixed weak derivatives [Yserentant: Regularity and approximability of electronic wave functions. 2010].

    51

  • Low-rank tensors: CP and Tucker

    - CP
    - Tucker
    - Higher-order SVD
    - Tensor networks

    52

  • CP decomposition

    - Aim: Generalize the concept of low rank from matrices to tensors.
    - One possibility, motivated by

      X = [ a1, a2, . . . , aR ] [ b1, b2, . . . , bR ]T = a1 b1T + a2 b2T + · · · + aR bRT ,

      and its vectorization

      vec(X ) = b1 ⊗ a1 + b2 ⊗ a2 + · · · + bR ⊗ aR .

    Canonical Polyadic (CP) decomposition of a tensor X ∈ Rn1×n2×n3 defined via

    vec(X ) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + · · · + cR ⊗ bR ⊗ aR

    for vectors aj ∈ Rn1 , bj ∈ Rn2 , cj ∈ Rn3 .

    CP directly corresponds to semi-separable approximation.
    Tensor rank of X = minimal possible R.

    53
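
    For a third-order tensor, the CP definition above can be checked entry-wise with kron; a small sketch building vec(X) from random CP factors:

    % Build vec(X) of a CP tensor with factors a_j, b_j, c_j (third order).
    n1 = 4; n2 = 5; n3 = 6; R = 3;
    A = randn(n1, R); B = randn(n2, R); C = randn(n3, R);
    vecX = zeros(n1*n2*n3, 1);
    for j = 1:R
        vecX = vecX + kron(C(:, j), kron(B(:, j), A(:, j)));
    end
    X = reshape(vecX, [n1 n2 n3]);
    % entry-wise: X(i1,i2,i3) = sum_j A(i1,j) * B(i2,j) * C(i3,j)
    i1 = 2; i2 = 3; i3 = 4;
    assert(abs(X(i1,i2,i3) - sum(A(i1,:) .* B(i2,:) .* C(i3,:))) < 1e-12)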

  • CP decomposition

    Illustration of the CP decomposition

    vec(X ) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + · · · + cR ⊗ bR ⊗ aR .

    [Figure: X written as a sum of R rank-1 tensors with factor vectors aj, bj, cj]

    54

  • CP decomposition

    - CP decomposition offers low data complexity; for constant R: linear complexity in d .
    - For matrices:
      - rank r is upper semi-continuous ⇒ closedness property: a sequence of rank-r matrices can only converge to a matrix of rank ≤ r.
      - best low-rank approximation possible by successive rank-1 approximations.
      - Robust black-box algorithms/software available (svd, Lanczos).

    For tensors of order d ≥ 3:
    - tensor rank R is not upper semi-continuous ⇒ lack of closedness
    - successive rank-1 approximations fail
    - all algorithms based on optimization techniques (ALS, Gauss-Newton)

    [Picture taken from [Kolda/Bader'2009].]

    55

  • Tucker decomposition

    - Aim: Generalize the concept of low rank from matrices to tensors.
    - Alternative possibility, motivated by

      A = U · Σ · V T , U ∈ Rn1×r , V ∈ Rn2×r , Σ ∈ Rr×r ,

      and its vectorization

      vec(X ) = (V ⊗ U) · vec(Σ).

      Ignore the diagonal structure of Σ and call it C.

    Tucker decomposition of a tensor X ∈ Rn1×n2×n3 defined via

    vec(X ) = (W ⊗ V ⊗ U) · vec(C)

    with U ∈ Rn1×r1 , V ∈ Rn2×r2 , W ∈ Rn3×r3 , and core tensor C ∈ Rr1×r2×r3 .

    In terms of µ-mode matrix products:

    X = U ◦1 V ◦2 W ◦3 C =: (U,V ,W ) ◦ C.

    56

  • Tucker decomposition

    Illustration of the Tucker decomposition

    X = (U,V ,W ) ◦ C

    [Figure: core tensor C multiplied by factor matrices U, V, W in modes 1, 2, 3]

    57

  • Tucker decomposition

    Consider all three matricizations:

    X (1) = U · C(1) · (W ⊗ V )T ,
    X (2) = V · C(2) · (W ⊗ U)T ,
    X (3) = W · C(3) · (V ⊗ U)T .

    These are low-rank decompositions ⇒

    rank(X (1)) ≤ r1, rank(X (2)) ≤ r2, rank(X (3)) ≤ r3.

    Multilinear rank of a tensor X ∈ Rn1×n2×n3 defined by the tuple

    (r1, r2, r3), with ri = rank(X (i)).

    58

  • Higher-order SVD (HOSVD)

    Goal: Approximate a given tensor X by a Tucker decomposition with prescribed multilinear rank (r1, r2, r3).

    1. Calculate SVDs of the matricizations:

       X (µ) = Ũµ Σ̃µ ṼµT for µ = 1,2,3.

    2. Truncate the basis matrices:

       Uµ := Ũµ(:, 1 : rµ) for µ = 1,2,3.

    3. Form the core tensor:

       vec(C) := (U3T ⊗ U2T ⊗ U1T) · vec(X ).

    Truncated tensor produced by the HOSVD [De Lathauwer/De Moor/Vandewalle'2000]:

    vec(X̃ ) := (U3 ⊗ U2 ⊗ U1) · vec(C).

    Remark: Orthogonal projection X̃ := (π1 ◦ π2 ◦ π3) X with πµ X := Uµ UµT ◦µ X .

    59
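
    The three HOSVD steps above map directly to a few lines of plain MATLAB; a sketch for a third-order array (using the same permute/reshape-based matricization as earlier and kron to form the core):

    % Higher-order SVD of a 3rd-order tensor X with target multilinear rank (r1,r2,r3).
    n = [20 25 30]; r = [5 6 7];
    X = randn(n);                        % in practice X would have low multilinear rank
    U = cell(1, 3);
    for mu = 1:3
        perm = [mu, setdiff(1:3, mu)];
        Xmu  = reshape(permute(X, perm), n(mu), []);   % mu-mode matricization
        [Utilde, ~, ~] = svd(Xmu, 'econ');
        U{mu} = Utilde(:, 1:r(mu));                    % truncated basis matrix
    end
    vecC  = kron(U{3}', kron(U{2}', U{1}')) * X(:);    % vec(C)  = (U3' (x) U2' (x) U1') vec(X)
    vecXt = kron(U{3}, kron(U{2}, U{1})) * vecC;       % vec(Xt) = (U3  (x) U2  (x) U1 ) vec(C)
    Xt = reshape(vecXt, n);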

  • Higher-order SVD (HOSVD)

    The tensor X̃ resulting from the HOSVD satisfies the quasi-optimality condition

    ‖X − X̃‖ ≤ √d ‖X − Xbest‖,

    where Xbest is the best approximation of X with multilinear ranks (r1, . . . , rd ).

    Proof:

    ‖X − X̃‖² = ‖X − (π1 ◦ π2 ◦ π3)X‖²
              = ‖X − π1X‖² + ‖π1X − (π1 ◦ π2)X‖² + ‖(π1 ◦ π2)X − (π1 ◦ π2 ◦ π3)X‖²
              ≤ ‖X − π1X‖² + ‖X − π2X‖² + ‖X − π3X‖²

    Using

    ‖X − πµX‖ ≤ ‖X − Xbest‖ for µ = 1,2,3

    leads to

    ‖X − X̃‖² ≤ 3 · ‖X − Xbest‖².

    Best approximation: See [Kolda/Bader'09].

    60

  • Tucker decomposition – Summary

    For general tensors:

    - multilinear rank is upper semi-continuous ⇒ closedness property.
    - HOSVD – simple and robust algorithm to obtain a quasi-optimal low-rank approximation.
    - quasi-optimality good enough for most applications in scientific computing.
    - robust black-box algorithms/software available (e.g., Tensor Toolbox).

    Drawback: Storage of the core tensor ∼ r^d ⇒ curse of dimensionality

    61

  • Tensor network diagrams

    Tensor network = undirected graph where:

    - each node is a tensor;
    - each outgoing edge is a mode;
    - each connected edge represents a contraction; example:

      Zi1,i2,i3,i4 = ∑_{j=1}^{r} Xi1,i2,j Yj,i3,i4 .

      [Figure: two nodes X and Y joined by the contracted edge j, with free edges for modes 1, 2 and 3, 4]

    - number of free edges = order of the tensor represented by the entire network

    Researchers on quantum many-body problems think² in terms of tensor networks!

    ²and dream

    62

  • Tensor network diagrams

    Examples:

    [Figure: five tensor network diagrams]

    (i) vector; (ii) matrix; (iii) matrix-matrix multiplication; (iv) Tucker decomposition; (v) hierarchical Tucker decomposition.

    63

  • Low-rank tensors: Hierarchical Tucker

    - Introduction to the Hierarchical Tucker Decomposition (HTD)
    - MATLAB toolbox htucker
    - Basic operations: µ-mode matrix multiplication, addition, . . .
    - Advanced operations: inner product, elementwise multiplication, . . .

    64

  • Introduction

    - CP offers low data complexity but difficult truncation;
    - Tucker offers simple truncation but high data complexity.

    Recently developed formats:
    - Matrix Product States (MPS),
    - TT decomposition,
    - Hierarchical Tucker decomposition (HTD).

    These aim to offer a compromise between CP and Tucker.

    Focus in this lecture: HTD.
    - L. Grasedyck. Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl., 31(4):2029–2054, 2010.
    - W. Hackbusch and S. Kühn. A new scheme for the tensor representation. J. Fourier Anal. Appl., 15(5):706–722, 2009.
    - D. Kressner and C. Tobler. htucker – A MATLAB toolbox for the hierarchical Tucker decomposition. In preparation. See http://www.math.ethz.ch/~ctobler.

    65


  • More general matricizations

    Recall: µ-mode matricization for a tensor X ,

    X (µ) ∈ Rnµ×(n1···nµ−1nµ+1···nd ), µ = 1, . . . ,d .

    It is getting ugly...

    General matricization for a mode decomposition {1, . . . ,d} = t ∪ s:

    X (t) ∈ R(nt1 ···ntk )×(ns1 ···nsd−k )

    with

    (X (t))(it1 ,...,itk ),(is1 ,...,isd−k ) := Xi1,...,id .

    [Figure: X and its matricizations X (1) and X (1,2)]

    66

  • Hierarchical construction

    Singular value decomposition: X (t) = Ut Σt UsT .

    Column spaces are nested:

    t = t1 ∪ t2 ⇒ span(Ut ) ⊂ span(Ut2 ⊗ Ut1 ) ⇒ ∃ Bt : Ut = (Ut2 ⊗ Ut1 )Bt .

    Size of Ut : Ut ∈ Rnt1 ···ntk ×rt with rt = rank(X (t)).

    For d = 4:

    U12 = (U2 ⊗ U1)B12 ,
    U34 = (U4 ⊗ U3)B34 ,
    vec(X ) = X (1234) = (U34 ⊗ U12)B1234
    ⇒ vec(X ) = (U4 ⊗ U3 ⊗ U2 ⊗ U1)(B34 ⊗ B12)B1234.

    67

  • Dimension tree

    Tree structure for d = 4:

    [Figure: binary dimension tree with root node B1234 (r12 r34 × 1), inner nodes B12 (r1 r2 × r12) and B34 (r3 r4 × r34), and leaf matrices U1 (n1 × r1), U2 (n2 × r2), U3 (n3 × r3), U4 (n4 × r4)]

    Reshape:

    B12 ∈ Rr1r2×r12 ⇒ B12 ∈ Rr1×r2×r12

    B34 ∈ Rr3r4×r34 ⇒ B34 ∈ Rr3×r4×r34

    B1234 ∈ Rr12r34×1 ⇒ B1234 ∈ Rr12×r34

    68

  • Dimension tree

    [Figure: dimension tree with leaf matrices U1, U2, U3, U4 and transfer tensors B12, B34, B1234]

    - Often, U1, U2, U3, U4 are orthonormal. This is advantageous but not required.
    - Storage requirements for general d :

      O(dnr) + O(dr³),

      where r = max{rt}, n = max{nµ}.

    69

  • Constructors for MATLAB class htensor

    x = htensor([4 5 6 7]) constructs a zero htensor of size 4 × 5 × 6 × 7, with a balanced dimension tree.

    x = htensor([4 5 6 7], 'TT') constructs a zero htensor of size 4 × 5 × 6 × 7, with a TT-style dimension tree.

    x = htensor({U1, U2, U3}) constructs an htensor from a tensor in CP decomposition, X (i1, i2, i3) = ∑_j U1(i1, j) U2(i2, j) U3(i3, j).

    x = htenrandn([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7, with random ranks and random entries.

    x = htenones([4 5 6 7]) constructs an htensor of size 4 × 5 × 6 × 7, with all entries one.

    ...

    70

  • Basic functionality for MATLAB class htensor

    Example: x is an htensor of order 4.

    x(1, 3, 4, 2) returns an entry of X .
    x(1, 3, :, :) returns a slice of X as an htensor.
    full(x) returns the full tensor represented by X . (use with care)
    disp_tree(htenrand([5 4 6 3])) returns the tree structure/ranks:

    ans is an htensor of size 5 x 4 x 6 x 3
    1-4  1;  6 3 1
    1-2  2;  3 4 6
    1    4;  5 3
    2    5;  4 4
    3-4  3;  3 3 3
    3    6;  6 3
    4    7;  3 3

    spy(x) displays spy plots of Ut , Bt on the dimension tree.
    change_root(x, i) switches the root node.

    71

  • Singular value tree

    plot_sv(x) plots the singular values of the corresponding matricizations in the dimension tree of a tensor X .

    Example: Singular value tree of the solution to an elliptic PDE with 4 parameters.

    [Figure: singular value plots arranged on the dimension tree, with panels for Dim. 1, 2 / Dim. 3, 4, 5 / Dim. 1 / Dim. 2 / Dim. 3 / Dim. 4, 5 / Dim. 4 / Dim. 5]

    Remark: Singular values are computed from Gramians.

    72

  • Basic ops: µ-mode matrix multiplication

    Application of a matrix A ∈ Rm×nµ to mode µ of X ∈ Rn1×···×nd :

    Y = A ◦µ X ⇔ Y (µ) = AX (µ).

    Nearly trivial if X is in H-Tucker format:

    A ◦µ X = A ◦µ ((U1, . . . ,Ud ) ◦ C) = (U1, . . . ,Uµ−1, AUµ, Uµ+1, . . . ,Ud ) ◦ C

    - Almost no operations required.
    - Ranks stay the same.
    - Orthogonality destroyed.

    ttm(x, A, 2) applies matrix A to htensor X in mode 2.
    y = ttm(x, {A, B, C}, [2, 3, 4])
    y = ttm(x, @(x)(fft(x)), 2) applies the FFT in mode 2.
    y = ttm(x, {A, B, C}, [2, 3, 4], 'h') successively applies the matrices AT, BT, CT in modes 2, 3, 4.

    73

  • Addition of low-rank matrices

    Addition of two matrices in low-rank format:

    A = U1 ΣA U2T , B = V1 ΣB V2T

    ⇒ A + B = [ U1 V1 ] [ ΣA 0 ; 0 ΣB ] [ U2 V2 ]T

    - No operations required.
    - Rank increases.
    - Orthogonality destroyed.

    74

  • Addition of low-rank tensors

    Addition of four tensors X1, X2, X3, X4 in H-Tucker format:

    X1 + X2 + X3 + X4.

    Proceed as in the matrix case by embedding the factors in larger matrices.
    - No operations required.
    - H-Tucker rank increases.
    - Orthogonality destroyed.

    Command in htucker: x1 + x2 + x3 + x4

    75

  • [Figure: block structure of the H-Tucker factors when adding four tensors X1, . . . , X4; the leaf matrices U[1]i , . . . , U[4]i are concatenated and the transfer tensors B[1]t , . . . , B[4]t are arranged block-diagonally]

    76

  • Orthogonalization

    Any tensor X in H-Tucker format can be orthogonalized in the sense that all factors in the dimension tree, except for the root node, contain orthonormal columns.

    Example: vec(X ) = (U4 ⊗ U3 ⊗ U2 ⊗ U1)(B34 ⊗ B12)B1234.

    Step 1: QR decompositions Ut = QtRt

    vec(X ) = (Q4 ⊗Q3 ⊗Q2 ⊗Q1)(B̃34 ⊗ B̃12)B1234

    with B̃34 := (R4 ⊗ R3)B34, B̃12 := (R2 ⊗ R1)B12.

    Step 2: QR decompositions B̃34 = Q34R34, B̃12 = Q12R12

    vec(X ) = (Q4 ⊗Q3 ⊗Q2 ⊗Q1)(Q34 ⊗Q12)B̃1234

    with B̃1234 := (R34 ⊗ R12)B1234.

    Computational requirements for general d : O(dnr²) + O(dr⁴).

    Command in htucker: x = orthog(x)

    77

  • Norms and inner products

    Inner product of two tensors X , Y ∈ Rn1×···×nd :

    ⟨X ,Y⟩ = ⟨vec(X ), vec(Y)⟩ = ∑_{i1=1}^{n1} · · · ∑_{id=1}^{nd} Xi1,...,id Yi1,...,id .

    Can be performed efficiently in H-Tucker, provided that X , Y have compatible dimension trees.

    Example: Two tensors of order 4:

    ⟨X ,Y⟩ = (Bx1234)T (Bx34 ⊗ Bx12)T (Ux4 ⊗ Ux3 ⊗ Ux2 ⊗ Ux1 )T (Uy4 ⊗ Uy3 ⊗ Uy2 ⊗ Uy1 )(By34 ⊗ By12) By1234

    Norm: After X has been orthogonalized:

    ‖X‖ = √⟨X ,X⟩ = ‖Bx12···d‖F .

    Possibly the most accurate way to compute the norm. Used in norm(x).

    78

  • Computation of inner products

    ⟨X ,Y⟩ = ∑_{i1=1}^{n1} · · · ∑_{id=1}^{nd} Xi1,...,id Yi1,...,id .

    [Figures, slides 79–83: step-by-step contraction of the two tensor networks for X and Y]

  • Computation of inner products – contraction step

    [Figure: contraction at node t using the already contracted children t1, t2]

    (Uxt )T Uyt = (Bxt )T ( (Uxt2 )T Uyt2 ⊗ (Uxt1 )T Uyt1 ) Byt .

    - htucker command: innerprod(x,y)
    - Overall cost: O(dnr²) + O(dr⁴).

    84

  • Reduced Gramians in H-Tucker

    [Figure: node t with factor Ut and reduced Gramian Gt]

    X (t) = Ut VtT ⇒ X (t) (X (t))T = Ut (VtT Vt) UtT = Ut Gt UtT , with Gt := VtT Vt .

    If Ut is orthonormal ⇒ svd(X (t)) = √eig(Gt ) (used in plot_sv).

    85

  • Reduced Gramians in H-Tucker

    [Figures, slides 86–90: step-by-step computation of the reduced Gramians Gt on the dimension tree]

  • Reduced Gramians in H-Tucker

    Implemented in htucker command gramians(x).

    91

  • Advanced operations

    - Truncation
    - Combined addition + truncation
    - Elementwise multiplication
    - Elementwise reciprocal

    92

  • Truncation of an explicit tensor

    Let X ∈ Rn1×n2×···×nd be explicitly given.

    - For each tree node t , let Wt contain the rt dominant left singular vectors of X (t) and define the projection

      πtX = Wt WtT ◦t X ⇔ πtX (t) = Wt WtT X (t).

    - Truncated tensor:

      X̃ := ( ∏_{t∈TL} πt ) · · · ( ∏_{t∈T1} πt ) X ,

      where Tℓ contains all nodes on level ℓ.

    - [Grasedyck'2010]: ‖X − X̃‖ ≤ √(2d − 3) ‖X − Xbest‖.
      Proof similar as for the HOSVD.

    93

  • Truncation of an explicit tensor

    Example:

    vec X̃ = (W4W4T ⊗ W3W3T ⊗ W2W2T ⊗ W1W1T)(W34W34T ⊗ W12W12T) vec X
          = (W4 ⊗ W3 ⊗ W2 ⊗ W1) ( [W4T ⊗ W3T]W34 ⊗ [W2T ⊗ W1T]W12 ) ( [W34T ⊗ W12T] vec X ),

    with B34 := [W4T ⊗ W3T]W34 , B12 := [W2T ⊗ W1T]W12 , B1234 := [W34T ⊗ W12T] vec X .

    opts.max_rank = 10 maximal rank at truncation.
    opts.rel_eps = 1e-6 maximal relative truncation error.
    opts.abs_eps = 1e-6 maximal absolute truncation error.
    The condition max_rank takes precedence over rel_eps and abs_eps.
    xt = htensor.truncate_rtl(x, opts) returns the truncated tensor X̃ of a multidimensional array.

    Remark: There is also a significantly faster htensor.truncate_ltr (proceeds successively from the leaves to the root), for which the same error bound holds [Tobler'10].

    94

  • Truncation of an H-Tucker tensor

    Let X ∈ Rn1×n2×···×nd be in H-Tucker format and orthogonalized.

    - Compute the left singular vectors of X (t) = Ut VtT from the eigenvectors of

      X (t) (X (t))T = Ut (VtT Vt) UtT = Ut Gt UtT ,

      with reduced Gramian Gt . If St contains the rt dominant eigenvectors of Gt ⇒ Wt = Ut St .

    - Traverse the tree from the root to the leaves. In each step:

      [Figure: St StT is inserted on the edge between a node Bt and its parent Btp, yielding the updates StT ◦ Btp and St ◦ Bt]

    - In htucker: truncate(x,opts). Complexity O(dnr² + dr⁴).

    95

  • Combined addition + truncation

    Sum of more than two tensors:

    Y = X1 + X2 + · · · + Xs.

    Two possibilities to incorporate the truncation operator T :
    1. Y ≈ T (X1 + X2 + X3 + · · · + Xs)
    2. Y ≈ T (· · · T (T (X1 + X2) + X3) + · · · + Xs)

    Option 2 is usually significantly cheaper but may suffer from severe cancellation.

    Artificial example: X1, X2, X3 ∈ R101×101×101 truncated tensor grid discretizations of the summands of

    f (x1, x2, x3) = tan(x1 + x2 + x3) + (x1 + x2 + x3)−1 − tan(x1 + x2 + x3).

    Error(Option 1) ≈ 10−7. Error(Option 2) ≈ 1.3.

    What is wrong with Option 1?

    96

  • Combined addition + truncation

    [Figure: block structure of the H-Tucker factors of the sum X1 + · · · + X4, with block-diagonal transfer tensors]

    - Orthogonalization (needed before truncation) destroys the block diagonal structure.
    - Complexity O(dns²r² + ds⁴r⁴) for s summands.

    97

  • Combined addition + truncation

    Idea: A new variant delays orthogonalization to keep the block diagonal structure in the transfer tensors as long as possible.

    Reduces O(dns²r² + ds⁴r⁴) to O(dns²r² + ds²r⁴ + ds³r³).

    [Figure: run time [s] vs. number of summands for the standard, summed, and successive truncation variants, with reference lines O(t⁴), O(t²), O(t)]

    - htucker command: add_truncate(x1 x2 x3 x4, opts).

    98

  • Elementwise multiplication

    Elementwise multiplication (also called Hadamard or Schur product) of two low-rank matrices A = U1 ΣA U2T , B = V1 ΣB V2T :

    A ⋆ B = (U1 ⊙̃ V1)(ΣA ⊗ ΣB)(U2 ⊙̃ V2)T ,

    with the row-wise Khatri-Rao product

    C ⊙̃ D = [ c1T ; . . . ; cnT ] ⊙̃ [ d1T ; . . . ; dnT ] = [ c1T ⊗ d1T ; . . . ; cnT ⊗ dnT ].

    - Orthogonality destroyed.
    - Rank increases significantly.

    But: the singular value decay of ΣA ⊗ ΣB may become significantly stronger ⇒ additional opportunities for truncation.

    99
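
    A quick numerical check of the Hadamard-product formula above, with the row-wise Khatri-Rao product written out explicitly (a sketch; the helper krt is not an htucker function):

    % Verify A .* B = (U1 krt V1) (SigA (x) SigB) (U2 krt V2)' for low-rank A, B,
    % where krt denotes the row-wise Khatri-Rao product.
    n = 8; m = 9; rA = 2; rB = 3;
    U1 = randn(n, rA); U2 = randn(m, rA); SigA = diag(rand(rA, 1));
    V1 = randn(n, rB); V2 = randn(m, rB); SigB = diag(rand(rB, 1));
    A = U1 * SigA * U2';  B = V1 * SigB * V2';
    krt = @(C, D) cell2mat(arrayfun(@(i) kron(C(i,:), D(i,:)), ...
                  (1:size(C,1)).', 'UniformOutput', false));
    AB = krt(U1, V1) * kron(SigA, SigB) * krt(U2, V2)';
    disp(norm(A .* B - AB, 'fro'))   % close to machine precision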

  • Elementwise multiplication

    Elementwise multiplication of two tensors X , Y in H-Tucker format:

    - Row-wise Khatri-Rao product of the leaf matrices.
    - "Kronecker product" of the non-leaf tensors.
    - Optional: Products are only formed after suitable truncation to avoid excessive memory requirements.

    Commands in htucker:
    x.*y (without truncation)
    x.^2 (without truncation)
    elem_mult( x, y, opt ) (with truncation)

    100

  • Elementwise reciprocal

    Goal: Compute the reciprocal of each entry in a tensor X .

    Basic idea: the Newton-Schulz iteration

    y0 = 1, yi+1 = yi + yi (1 − x yi ), (2)

    converges to 1/x for 0 < x < 2.

    Apply (2) simultaneously to all entries.

    Code snippet of elem_reciprocal( x, opt ) in htucker:

    all_ones = htenones(size(x));
    y = all_ones;
    for it=1:maxit
      xy = elem_mult( x, y );
      xy = truncate( all_ones - xy );
      xy = elem_mult( xy, y );
      y = truncate( y + xy );
    end

    See also [Oseledets et al. 2009].

    101
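
    The same iteration can be run on an ordinary MATLAB array to see the convergence behavior before any low-rank truncation is involved (a plain-array sketch, not htucker code):

    % Newton-Schulz iteration for elementwise reciprocals of a full array:
    % y_{k+1} = y_k + y_k .* (1 - x .* y_k), converging to 1./x for 0 < x < 2.
    x = rand(10, 10, 10) + 0.5;          % entries in (0.5, 1.5), inside the convergence region
    y = ones(size(x));
    for it = 1:40
        y = y + y .* (1 - x .* y);
    end
    disp(norm(y(:) - 1 ./ x(:), inf))    % close to machine precision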

  • Elementwise reciprocal

    Example: (x1 + x2 + x3 + x4)−1 with xi ∈ [10−3,1].

    c = laplace_core(4);
    U = [ones(100, 1), linspace(1e-3, 1, 100)'];
    x = ttm(c, {U, U, U, U});
    inv_x = elem_reciprocal(x, opts);

    [Figure left: convergence of ‖X ⋆ Yk − 1‖ over the iterations. Figure right: singular value tree upon convergence.]

    102

  • Summary

    - HTD offers a good compromise between CP and Tucker.
    - Algorithms are often quite technical but conceptually simple.
    - Computational complexity ∼ d but often ∼ r⁴: curse of dimensionality ⇒ curse of rank?
    - Important to keep in mind: Unless d is tiny, the tensor X can/should never be formed explicitly. All operations need to be performed implicitly in HTD.
      This can pose severe problems even for seemingly simple operations: min(X ), max(X ), abs(X ), 1./X , . . .

    103

  • 104

  • Algorithms based on low-rank tensors

    - Inexact LOBPCG
    - ALS / MALS

    105

  • Strategies for solving tensor equations

    - In many practical situations, the tensor X is given implicitly as the solution to a linear system A(X ) = B, an eigenvalue problem A(X ) = λX , a nonlinear system, an ODE, . . .

    Two main strategies to use low-rank tensor techniques:

    1. Combine an existing iterative solver (e.g., CG, LOBPCG, GMRES) with repeated low-rank truncation of the iterates (⇒ inexact CG).
       - Straightforward to derive and implement (based, e.g., on htucker).
       - Hard to analyze the impact of non-negligible truncations on accuracy and convergence.
       - Intermediate rank growth may result in excessive computing times and/or harm accuracy and convergence.

    2. Formulate an optimization problem, constrain it to low-rank tensors, and iteratively optimize with respect to the individual factors of the low-rank format.
       - Works well in practice.
       - Convergence theory not well understood.
       - Not straightforward to implement.

    106

  • Example: PDE-eigenvalue problem

    Goal: Compute the smallest eigenvalue for

    ∆u(ξ) + V (ξ)u(ξ) = λu(ξ) in Ω = [0,1]d ,
    u(ξ) = 0 on ∂Ω.

    Assumption: Potential represented as

    V (ξ) = ∑_{j=1}^{s} Vj(1)(ξ1) Vj(2)(ξ2) · · · Vj(d)(ξd ).

    ⇒ finite difference discretization

    𝒜 u = (𝒜L + 𝒜V ) u = λ u,

    with

    𝒜L = ∑_{j=1}^{d} I ⊗ · · · ⊗ I (d−j times) ⊗ AL ⊗ I ⊗ · · · ⊗ I (j−1 times),

    𝒜V = ∑_{j=1}^{s} AV,j(d) ⊗ · · · ⊗ AV,j(2) ⊗ AV,j(1).

    107

  • LOBPCG method

    LOBPCG with block size 1 [Knyazev'01] for computing the smallest eigenvalue of

    Ax = λx , A symmetric.

    λ0 = ⟨x0, x0⟩A, p0 = 0
    for k = 0,1, . . . (until converged) do
      rk = B−1(A xk − λk xk)
      U = [ xk , rk , pk ]
      Ã = UT A U, M̃ = UT U
      Find the eigenpair (λk+1, y), with ‖y‖2 = 1, for the smallest eigenvalue of the matrix pencil Ã − λM̃.
      pk+1 = y2 · rk + y3 · pk
      xk+1 = y1 · xk + pk+1
      xk+1 ← xk+1/‖xk+1‖2
    end for
    Return (λmin, x) = (λk+1, xk+1).

    108

  • Tensor low-rank LOBPCG

    Truncated LOBPCG with block size 1 for computing the smallest eigenvalue of

    A(X ) = λX , A symmetric, X tensor.

    λ0 = ⟨X0,X0⟩A, P0 = 0 · X
    for k = 0,1, . . . (until converged) do
      Rk = B−1(A(Xk ) − λkXk ), Rk ← T (Rk )
      U1 = Xk , U2 = Rk , U3 = Pk
      Ãij = ⟨Ui ,Uj⟩A, M̃ij = ⟨Ui ,Uj⟩
      Find the eigenpair (λk+1, y), with ‖y‖2 = 1, for the smallest eigenvalue of the matrix pencil Ã − λM̃.
      Pk+1 = y2 · Rk + y3 · Pk , Pk+1 ← T (Pk+1)
      Xk+1 = y1 · Xk + Pk+1 , Xk+1 ← T (Xk+1)
      Xk+1 ← Xk+1/√⟨Xk+1,Xk+1⟩
    end for
    Return (λmin,X ) = (λk+1,Xk+1).

    T = truncation to hierarchical low rank

    109

  • Implementation details

    Orthogonalization
    In standard LOBPCG, orthogonalization of U is recommended [Knyazev 2010]. This is not practical with low-rank tensors, as the ranks would grow and truncation would destroy orthogonality.

    Truncation
    Xk , Rk , Pk are truncated in every step. Moreover, the application of A(·) and the preconditioner B−1(·) may involve truncation during the application of these operators.

    Inner product
    The reduced matrix Ã is very sensitive to truncation in A(·). The computation of Ãi,j = ⟨Ui ,Uj⟩A must be exact.

    110

  • Numerical Experiments - Sine potential

    PDE-eigenvalue problem with Ω = [0, π]d and sine potential

    V (ξ) = q · ∏_{i=1}^{d} sin(ξi )

    for some constant q > 0. We choose d = 10, n = 128.

    Preconditioner: [Grasedyck 2004]

    𝒜L^(−1) = ∫_0^∞ exp(−t 𝒜L) dt ≈ ∑_{j=−M}^{M} ωj exp(−αj AL(d)) ⊗ · · · ⊗ exp(−αj AL(1)) =: B^(−1),

    for a certain, optimized and tabulated choice of coefficients αj , ωj > 0. We choose M = 10.

    111

  • Numerical Experiments - Sine potential

    [Figure: for q = 1 and q = 1000, residual vs. iterations and maximal rank vs. iterations of the truncated LOBPCG, for truncation tolerances eps = 1e−2, 1e−4, 1e−8]

    112

  • ALS

    Originally from computational quantum physics [Schollwöck 2011], recently investigated by [Huckle et al. 2010; Oseledets, Khoromskij 2010; Holtz et al. 2010; Dolgov, Oseledets 2011].

    Goal:

    min{ ⟨X ,A(X )⟩ / ⟨X ,X⟩ : X ∈ H-Tucker((rt )t∈T ), X ≠ 0 }

    Method: Choose one node t , fix all other nodes, and set the new tensor at node t to minimize the Rayleigh quotient ⟨X ,A(X )⟩/⟨X ,X⟩. This is done for all nodes (a sweep), and sweeps are continued until convergence.

    Sketch:

    X (t) = Ut VtT = (Utr ⊗ Utl )Bt VtT ,
    vec(X ) = (Vt ⊗ Utr ⊗ Utl ) vec(Bt ) = Ut vec(Bt ).

    ⇒ min{ yT (UtT A Ut )y / yT (UtT Ut )y : y ∈ R^(rtl rtr rt), y ≠ 0 }.

    113

  • Computation of reduced matrices

    Consider A = Ad ⊗ · · · ⊗ A1 (Other operators can be treated similarly)

    Compute

    Ãt := UtT A Ut = (Vt ⊗ Utr ⊗ Utl )T A (Vt ⊗ Utr ⊗ Utl ) = Ât ⊗ Ãtr ⊗ Ãtl ,

    where

    Ãtl = UtlT ( ⊗_{i∈tl} Ai ) Utl , Ãtr = UtrT ( ⊗_{i∈tr} Ai ) Utr , Ât = VtT ( ⊗_{i∉t} Ai ) Vt .

    Additionally,

    M̃t := UtT Ut = VtT Vt ⊗ UtrT Utr ⊗ UtlT Utl = Mt ⊗ Mtr ⊗ Mtl ,

    114

  • Computation of reduced matrices

    [Figure: dimension tree with leaf operators A1, . . . , A8, reduced matrices Ã12, Ã34 at inner nodes, and Â1234 at the root]

    115

  • MALS

    Method:
    - Select an edge of the tensor network.
    - Combine the tensors at the adjacent nodes to form a higher-order tensor.
    - Set this tensor to minimize the Rayleigh quotient.
    - Use a low-rank approximation to split the new combined tensor into two tensors at the adjacent nodes of the selected edge.

    116

  • MALS - Illustration

    117

  • Numerical Experiments – Sine potential

    PDE-eigenvalue problem with Ω = [0, π]d and sine potential

    V (ξ) = q · ∏_{i=1}^{d} sin(ξi )

    for some constant q > 0. Choose d = 10, n = 128, q = 1000.

    Preconditioner: [Grasedyck 2004]

    𝒜L^(−1) = ∫_0^∞ exp(−t 𝒜L) dt ≈ ∑_{j=−M}^{M} ωj exp(−αj AL(d)) ⊗ · · · ⊗ exp(−αj AL(1)) =: B^(−1),

    for a certain, optimized choice of coefficients αj , ωj > 0. We choose M = 10.

    118

  • Numerical Experiments – Sine potential

    [Figure left (ALS): eigenvalue error, residual, and iteration count vs. execution time [s]; hierarchical ranks 40.
     Figure right (MALS): eigenvalue error, residual, eps, rank, and iteration count vs. execution time [s]; maximal hierarchical rank 30.]

    119

  • Conclusions and Outlook

    120

  • Conclusions and Outlook

    - Scientific computing with low-rank tensors is a rapidly evolving and highly technical field.
    - The precise scope of applications is far from clear; many applications remain to be explored. More analysis and comparison to alternative techniques (sparse grids, adaptive tensor discretization, Monte Carlo, . . .) is needed.

    Some current trends:
    - Tensorization of vectors + low rank (discrete Chebfun?) by Hackbusch, Khoromskij, Oseledets, Tyrtyshnikov, . . .
    - Computational differential geometry on low-rank tensor manifolds by Koch, Lubich, Schneider, Uschmajew, Vandereycken, . . .
    - Robust low rank (Candès et al.) for tensors ⇒ a suitable way of dealing with singularities?
    - . . .

    Acknowledgments: This presentation heavily benefited from joint work with Christine Tobler (ETH Zurich).

    121