
Numer. Math. (2018) 138:301–330. https://doi.org/10.1007/s00211-017-0907-5

Numerische Mathematik

RADI: a low-rank ADI-type algorithm for large scale algebraic Riccati equations

Peter Benner¹ · Zvonimir Bujanović² · Patrick Kürschner¹ · Jens Saak¹

Received: 23 November 2015 / Revised: 26 September 2016 / Published online: 24 July 2017
© The Author(s) 2017. This article is an open access publication

Abstract This paper introduces a new algorithm for solving large-scale continuous-time algebraic Riccati equations (CARE). The advantage of the new algorithm is in its immediate and efficient low-rank formulation, which is a generalization of the Cholesky-factored variant of the Lyapunov ADI method. We discuss important implementation aspects of the algorithm, such as reducing the use of complex arithmetic and shift selection strategies. We show that there is a very tight relation between the new algorithm and three other algorithms for CARE previously known in the literature—all of these seemingly different methods in fact produce exactly the same iterates when used with the same parameters: they are algorithmically different descriptions of the same approximation sequence to the Riccati solution.

    Mathematics Subject Classification 15A24 · 15A18 · 65F15 · 65F30

✉ Patrick Kürschner, [email protected]

Peter Benner, [email protected]

Zvonimir Bujanović, [email protected]

Jens Saak, [email protected]

    1 Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany

    2 Department of Mathematics, Faculty of Science, University of Zagreb, Zagreb, Croatia


    1 Introduction

    The continuous-time algebraic Riccati equation,

A^*X + XA + Q − XGX = 0,   (1)

    where

Q = C^*C,  G = BB^*,  A ∈ R^{n×n},  B ∈ R^{n×m},  C ∈ R^{p×n},

appears frequently in various aspects of control theory, such as linear-quadratic optimal regulator problems, H2 and H∞ controller design and balancing-related model reduction. While the equation may have many solutions, for such applications one is interested in finding a so-called stabilizing solution: the unique positive semidefinite solution X ∈ C^{n×n} such that the matrix A − GX is stable (i.e. all of its eigenvalues belong to C−, the left half of the complex plane). If the pair (A, G) is stabilizable (i.e. rank[A − λI, G] = n for all λ in the closed right half plane), and the pair (A, Q) is detectable (i.e. (A^*, Q^*) is stabilizable), then such a solution exists [11,19]. These conditions are fulfilled generically, and we assume they hold throughout the paper.

There are several algorithms for finding the numerical solution of (1). In the case when n is small enough, one can compute the eigen- or Schur decomposition of the associated Hamiltonian matrix

H = [A  G; Q  −A^*],   (2)

and use an explicit formula for X, see [11,20]. However, if the dimensions of the involved matrices prohibit the computation of a full eigenvalue or Schur decomposition, specialized large-scale algorithms have to be constructed. In such scenarios, Riccati equations arising in applications have additional properties: A is sparse, and p, m ≪ n, thus making the matrices Q and G of very low rank compared to n. In practice, this often implies that the sought-after solution X will have a low numerical rank [3], and allows for construction of iterative methods that approximate X with a series of matrices stored in low-rank factored form. Most of these methods are engineered as generalized versions of algorithms for solving a large-scale Lyapunov equation [10,33], which is a special case of (1) with G = 0.

The alternating directions implicit (ADI) method [35] is a well established iterative approach for computing solutions of Lyapunov and other linear matrix equations. There exists an array of ADI methods [7,8,21,22,27,30,35,36], covering both the ordinary and the generalized case. All of these methods have simple statements and efficient implementations [27]. One particular advantage of ADI methods is that they are very well suited for large-scale problems: the default formulation which works with full-size dense matrices can be transformed into a series of iteratively built approximations to the solution. Such approximations are represented in factored form, each factor having a very small rank compared to the dimensions of the input matrices. This makes ADI methods very suitable for large-scale applications.


Recently, Wong and Balakrishnan [37,38] suggested a so-called quadratic ADI method (qADI) for solving the algebraic Riccati equation (1). Their method is a direct generalization of the Lyapunov ADI method, but only when considering the formulation working with full-size dense matrices. However, in the setting of the qADI algorithm, it appears impossible to apply a so-called "Li–White trick" [22], which is the usual method of obtaining a low-rank formulation of an ADI method. Wong and Balakrishnan do provide a low-rank variant of their algorithm, but this variant has an important drawback: in each step, all the low-rank factors have to be rebuilt from scratch. This has a large negative impact on the performance of the algorithm.

Apart from the qADI method, there are several other methods for solving the large-scale Riccati equation that have appeared in the literature recently. Amodei and Buchot [1] derive an approximation of the solution by computing small-dimensional invariant subspaces of the associated Hamiltonian matrix (2). Lin and Simoncini [23] also consider the Hamiltonian matrix, and construct the solution by running subspace iterations on its Cayley transforms. Massoudi et al. [24] have shown that the latter method can be obtained from the control theory point of view as well.

In this paper, we introduce a new ADI-type iteration for Riccati equations, RADI. The derivation of RADI is not related to qADI, and it immediately gives the low-rank algorithm which overcomes the drawback from [37,38]. The low-rank factors are built incrementally in the new algorithm: in each step, each factor is expanded by several columns and/or rows, while keeping the elements from the previous steps intact. By setting the quadratic coefficient B in (1) to zero, our method reduces to the low-rank formulation of the Lyapunov ADI method, see, e.g., [4,7,22,27].

A surprising result is that, despite their completely different derivations, all of the Riccati methods we mentioned so far are equivalent: the approximations they produce in each step are the same. This was already shown [3] for the qADI algorithm and the algorithm of Amodei and Buchot. In this paper we extend this equivalence to our new low-rank RADI method and the method of Lin and Simoncini. Among all these different formulations of the same approximation sequence, RADI offers a compact and efficient implementation, and is very well suited for effective computation.

This paper is organized as follows: in Sect. 2, we recall the statements of the Lyapunov ADI method and the various Riccati methods, and introduce the new low-rank RADI algorithm. The equivalence of all aforementioned methods is shown in Sect. 3. In Sect. 4 we discuss important implementation issues, and in particular, various strategies for choosing shift parameters. Finally, Sect. 5 compares the effect of different options for the algorithm on its performance via several numerical experiments. We compare RADI with other algorithms for computing low-rank approximate solutions of (1) as well: the extended [16] and rational Krylov subspace methods [34], and the low-rank Newton–Kleinman ADI iteration [7,9,15,29].

The following notation is used in this paper: C− and C+ are the open left and right half plane, respectively, while Re(z), Im(z), z̄ = Re(z) − i Im(z), |z| denote the real part, imaginary part, complex conjugate, and absolute value of a complex quantity z. For the matrix A, we use A^* and A^{-1} for the complex conjugate transpose and the inverse, respectively. In most situations, expressions of the form x = A^{-1}b are to be understood as solving the linear system Ax = b for x. The relations A > (≥) 0, A < (≤) 0 stand for the matrix A being positive or negative (semi)definite.


Likewise, A ≥ (≤) B refers to A − B ≥ (≤) 0. If not stated otherwise, ‖ · ‖ is the Euclidean vector or subordinate matrix norm, and κ(·) is the associated condition number.

    2 A new low-rank factored iteration

We start this section by stating various methods for solving Lyapunov and Riccati equations, which will be used throughout the paper. First, consider the Lyapunov equation

A^*X^{lya} + X^{lya}A + Q = 0,  Q = C^*C,  A ∈ R^{n×n},  C ∈ R^{p×n}.   (3)

Here we assume that n is much larger than p. When A is a stable matrix, the solution X^{lya} is positive semidefinite. The Lyapunov ADI algorithm [36] generates a sequence of approximations (X_k^{lya})_k to X^{lya} defined by

X_{k+1/2}^{lya}(A + σ_{k+1}I) = −Q − (A^* − σ_{k+1}I)X_k^{lya},
(A^* + σ_{k+1}I)X_{k+1}^{lya} = −Q − X_{k+1/2}^{lya}(A − σ_{k+1}I).   } Lyapunov ADI (4)

We will assume that the initial iterate X_0^{lya} is the zero matrix, although it may be set arbitrarily. The complex numbers σ_k ∈ C− are called shifts, and the performance of ADI algorithms depends strongly on the appropriate selection of these parameters [30]; this is further discussed in the context of the RADI method in Sect. 4.5. Since each iteration matrix X_k^{lya} is of order n, formulation (4) is unsuitable for large values of n, due to the amount of memory needed for storing X_k^{lya} ∈ C^{n×n} and to the computational time needed for solving n linear systems in each half-step. The equivalent low-rank algorithm [4,6] generates the same sequence, but represents X_k^{lya} in factored form X_k^{lya} = Z_k^{lya}(Z_k^{lya})^* with Z_k^{lya} ∈ C^{n×pk}:

R_0^{lya} = C^*,
V_k^{lya} = √(−2Re(σ_k)) · (A^* + σ_kI)^{-1}R_{k−1}^{lya},
R_k^{lya} = R_{k−1}^{lya} + √(−2Re(σ_k)) · V_k^{lya},
Z_k^{lya} = [Z_{k−1}^{lya}  V_k^{lya}].   } low-rank Lyapunov ADI (5)

Initially, the matrix Z_0^{lya} is empty. This formulation is far more efficient for large values of n, since the right-hand side of the linear system in each step involves only the tall-and-skinny matrix R_{k−1}^{lya} ∈ C^{n×p}.
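To make the structure of (5) concrete, the following is a minimal NumPy sketch of the low-rank Lyapunov ADI iteration. The dense solves, the precomputed shift list and the residual-based stopping rule are illustrative assumptions of this sketch; in practice a sparse factorization of A^* + σ_kI would be used.

```python
import numpy as np

def lyapunov_adi_lowrank(A, C, shifts, tol=1e-10):
    """Sketch of the low-rank Lyapunov ADI iteration (5).

    Approximates the solution of A^* X + X A + C^* C = 0 as X ~ Z Z^*.
    `shifts` is a list of complex numbers with negative real part.
    """
    n = A.shape[0]
    R = C.conj().T.astype(complex)          # R_0 = C^*, an n x p block
    Z = np.zeros((n, 0), dtype=complex)
    res0 = np.linalg.norm(R.conj().T @ R, 2)
    for sigma in shifts:
        # V_k = sqrt(-2 Re(sigma)) * (A^* + sigma I)^{-1} R_{k-1}
        V = np.sqrt(-2 * sigma.real) * np.linalg.solve(
            A.conj().T + sigma * np.eye(n), R)
        R = R + np.sqrt(-2 * sigma.real) * V    # residual factor update
        Z = np.hstack([Z, V])                   # Z_k = [Z_{k-1}  V_k]
        if np.linalg.norm(R.conj().T @ R, 2) <= tol * res0:
            break
    return Z    # X ~ Z @ Z.conj().T
```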

We now turn our attention to the Riccati equation (1). Once again we assume that p, m ≪ n, and seek to approximate the stabilizing solution X. Wong and Balakrishnan [37,38] suggest the following quadratic ADI iteration (abbreviated as qADI):


X_{k+1/2}^{adi}(A + σ_{k+1}I − GX_k^{adi}) = −Q − (A^* − σ_{k+1}I)X_k^{adi},
(A^* + σ_{k+1}I − X_{k+1/2}^{adi}G)X_{k+1}^{adi} = −Q − X_{k+1/2}^{adi}(A − σ_{k+1}I).   } quadratic ADI (6)

Again, the initial approximation is usually set to X_0^{adi} = 0. Note that by inserting G = 0 in the quadratic iteration we obtain the Lyapunov ADI algorithm (4). As mentioned in the introduction, we will develop a low-rank variant of this algorithm such that inserting G = 0 will reduce it precisely to (5). To prepare the terrain, we need to introduce two more methods for solving the Riccati equation.

In the small scale setting, the Riccati equation (1) is usually solved by computing the stable invariant subspace of the associated 2n × 2n Hamiltonian matrix (2). To be more precise, let 1 ≤ k ≤ n, and

H [P_k; Q_k] = [A  G; Q  −A^*] [P_k; Q_k] = [P_k; Q_k] Λ_k,   (7)

where P_k, Q_k ∈ C^{n×k} and the matrix Λ_k ∈ C^{k×k} is stable. For k = n, the stabilizing solution of (1) is given by X = −Q_kP_k^{-1}. In the large scale setting, it is computationally too expensive to compute the entire n-dimensional stable invariant subspace of H. Thus an alternative approach was suggested in [1]: for k ≪ n, one can compute only a k-dimensional, stable, invariant subspace and use an approximation given by the formula

X_k^{inv} = −Q_k(Q_k^*P_k)^{-1}Q_k^*.   } invariant subspace approach (8)

Clearly, X_n^{inv} = X. This approach was further studied and justified in [3], where it was shown that

X_k^{inv} = X_k^{adi},  for all k,

if p = 1 and the shifts used for the qADI iteration coincide with the Hamiltonian eigenvalues of the matrix Λ_k. In fact, properties of the approximate solution X_k^{inv} given in [3] have led us to the definition of the low-rank variant of the qADI iteration that is described in this paper.

The final method we consider also uses the Hamiltonian matrix H. The Cayley transformed Hamiltonian subspace iteration introduced in [23] generates a sequence of approximations (X_k^{cay})_k for the stabilizing solution of the Riccati equation defined by

[M_k; N_k] = (H − σ_kI)^{-1}(H + σ_kI) [I; −X_{k−1}^{cay}],
X_k^{cay} = −N_kM_k^{-1}.   } Cayley subspace iteration (9)

Here X_0^{cay} is some initial approximation and σ_k ∈ C− are any chosen shifts (here we have adapted the notation to fit the one of this paper). In Sect. 3 we will show that this method is also equivalent to the qADI and the new low-rank RADI iterations.


    2.1 Derivation of the algorithm

The common way of converting ADI iterations into their low-rank variants is to perform a procedure similar to the one originally done by Li and White [22] for the Lyapunov ADI method. A crucial assumption for this procedure to succeed is that the matrices participating in the linear systems in each of the half-steps mutually commute for all k. For the Lyapunov ADI method this obviously holds true for the matrices A^* + σ_{k+1}I. However, in the case of the quadratic ADI iteration (6), the matrices A^* + σ_{k+1}I − X_k^{adi}G do not commute in general, for all k, and neither do A^* + σ_{k+1}I − X_{k+1/2}^{adi}G.

Thus we take a different approach in constructing the low-rank version. A common way to measure the quality of the matrix Ξ ∈ C^{n×n} as an approximation to the Riccati solution is to compute the norm of its residual matrix

R(Ξ) = A^*Ξ + ΞA + Q − ΞGΞ.

The idea for our method is to repetitively update the approximation Ξ by forming a so-called residual equation, until its solution converges to zero. The background is given in the following simple result.

Theorem 1 Let Ξ ∈ C^{n×n} be an approximation to a solution of (1).

(a) Let X = Ξ + X̃ be an exact solution of (1). Then X̃ is a solution to the residual equation

Ã^*X̃ + X̃Ã + Q̃ − X̃GX̃ = 0,   (10)

where Ã = A − GΞ and Q̃ = R(Ξ).
(b) Conversely, if X̃ is a solution to (10), then X = Ξ + X̃ is a solution to the original Riccati equation (1). Moreover, if Ξ ≥ 0 and X̃ is a stabilizing solution to (10), then X = Ξ + X̃ is the stabilizing solution to (1).
(c) If Ξ ≥ 0 and R(Ξ) ≥ 0, then the residual equation (10) has a unique stabilizing solution.
(d) If Ξ ≥ 0 and R(Ξ) ≥ 0, then Ξ ≤ X, where X is the stabilizing solution of (1).

Proof
(a) This is a straightforward computation which follows by inserting X̃ = X − Ξ and the formula for the residual of Ξ into (10), see also [2,25].
(b) The first part follows as in (a). If Ξ ≥ 0 and X̃ is a stabilizing solution to (10), then X = Ξ + X̃ ≥ 0 and A − GX = Ã − GX̃ is stable, which makes X the stabilizing solution to (1).
(c), (d) The claims follow directly from (a) and [19, Theorem 9.1.1].   □

Our algorithm will have the following form:

1. Let Ξ = 0.
2. Form the residual equation (10) for the approximation Ξ.
3. Compute an approximation X̃_1 ≈ X̃, where X̃ is the stabilizing solution of (10).
4. Accumulate Ξ ← Ξ + X̃_1, and go to Step 2.

To complete the derivation, we need to specify Step 3 in a way that X̃_1 ≥ 0 and R(Ξ + X̃_1) ≥ 0. With these two conditions imposed, Theorem 1 ensures that the residual equation in Step 2 always has a unique stabilizing solution and that the approximation Ξ is kept positive semidefinite and monotonically increasing towards the stabilizing solution of (1). The matrix X̃_1 fulfilling these conditions can be obtained by computing a 1-dimensional invariant subspace for the Hamiltonian matrix associated with the residual equation, and plugging it into formula (8).

More precisely, assume that R(Ξ) = C̃^*C̃ ≥ 0, G = BB^*, and that r, q ∈ C^n satisfy

[Ã  BB^*; C̃^*C̃  −Ã^*] [r; q] = λ [r; q],

where λ ∈ C− is such that −λ is not an eigenvalue of Ã. Equivalently,

Ãr + BB^*q = λr,
C̃^*C̃r − Ã^*q = λq.

From the second equation we get q = (Ã^* + λI)^{-1}C̃^*(C̃r). Let

Ṽ_1 = √(−2Re(λ)) (Ã^* + λI)^{-1}C̃^*.

Multiply the first equation by q^* from the left, and the transpose of the second by r from the right; then add the terms to obtain

q^*r = (1/(2Re(λ))) (q^*BB^*q + r^*C̃^*C̃r)
     = (1/(2Re(λ))) (C̃r)^* (I − (1/(2Re(λ))) (Ṽ_1^*B)(Ṽ_1^*B)^*) (C̃r).

Expression (8) has the form X̃_1 = −q(q^*r)^{-1}q^*. When p = 1 and C̃r ≠ 0, the terms containing C̃r cancel out, and we get

Ṽ_1 = √(−2Re(λ)) · (Ã^* + λI)^{-1}C̃^*,
Ỹ_1 = I − (1/(2Re(λ))) (Ṽ_1^*B)(Ṽ_1^*B)^*,
X̃_1 = Ṽ_1Ỹ_1^{-1}Ṽ_1^*.   (11)

The derivation above is valid when λ is an eigenvalue of the Hamiltonian matrix, similarly as in [3, Theorem 7] which also studies residual Riccati equations related to invariant subspaces of Hamiltonian matrices. Nevertheless, the expression (11) is well-defined even when λ is not an eigenvalue of the Hamiltonian matrix, and for p > 1 as well. The Hamiltonian argument here serves only as a motivation for introducing (11), and the following proposition shows that the desired properties of the updated matrix Ξ + X̃_1 still hold for any λ in the left half-plane which is not an eigenvalue of −Ã, and for all p.

Proposition 1 Let Ξ ≥ 0 be such that R(Ξ) = C̃^*C̃ ≥ 0, and let λ ∈ C− not be an eigenvalue of −Ã. The following holds true for the update matrix X̃_1 as defined in (11):

(a) X̃_1 ≥ 0, i.e. Ξ + X̃_1 ≥ 0.
(b) R(Ξ + X̃_1) = Ĉ^*Ĉ ≥ 0, where Ĉ^* = C̃^* + √(−2Re(λ)) · Ṽ_1Ỹ_1^{-1}.

Proof
(a) Positive definiteness of Ỹ_1 (and then the semi-definiteness of X̃_1 as well) follows directly from Re(λ) < 0.
(b) Note that Ã^*Ṽ_1 = √(−2Re(λ)) · C̃^* − λṼ_1, and (Ṽ_1^*B)(Ṽ_1^*B)^* = 2Re(λ)I − 2Re(λ)Ỹ_1. We use these expressions to obtain:

R(Ξ + X̃_1) = Ã^*X̃_1 + X̃_1Ã + R(Ξ) − X̃_1BB^*X̃_1
= (Ã^*Ṽ_1)Ỹ_1^{-1}Ṽ_1^* + Ṽ_1Ỹ_1^{-1}(Ã^*Ṽ_1)^* + C̃^*C̃ − Ṽ_1Ỹ_1^{-1}(Ṽ_1^*B)(Ṽ_1^*B)^*Ỹ_1^{-1}Ṽ_1^*
= √(−2Re(λ)) · C̃^*Ỹ_1^{-1}Ṽ_1^* + √(−2Re(λ)) · Ṽ_1Ỹ_1^{-1}C̃ + C̃^*C̃ − 2Re(λ) Ṽ_1Ỹ_1^{-1}Ỹ_1^{-1}Ṽ_1^*
= (C̃^* + √(−2Re(λ)) · Ṽ_1Ỹ_1^{-1}) · (C̃^* + √(−2Re(λ)) · Ṽ_1Ỹ_1^{-1})^*.   □

We are now ready to state the new RADI algorithm. Starting with the initial approximation X_0 = 0 and the residual R(X_0) = C^*C, we continue by selecting a shift σ_k ∈ C− and computing the approximation X_k = Z_kY_k^{-1}Z_k^* with the residual R(X_k) = R_kR_k^*, for k = 1, 2, .... The transition from X_{k−1} to X_k = X_{k−1} + V_kỸ_k^{-1}V_k^* is computed via (11), adapted to approximate the solution of the residual equation with Ξ = X_{k−1}, i.e. with Ã = A − BB^*X_{k−1} and C̃^* = R_{k−1}. Proposition 1 provides a very efficient update formula for the low-rank factor R_k of the residual. The whole procedure reduces to the following:

R_0 = C^*,
V_k = √(−2Re(σ_k)) · (A^* − X_{k−1}BB^* + σ_kI)^{-1}R_{k−1},
Ỹ_k = I − (1/(2Re(σ_k))) (V_k^*B)(V_k^*B)^*;  Y_k = blkdiag(Y_{k−1}, Ỹ_k),
R_k = R_{k−1} + √(−2Re(σ_k)) · V_kỸ_k^{-1},
Z_k = [Z_{k−1}  V_k].   } RADI iteration (12)

Note that any positive semi-definite X_0 can be used as an initial approximation, as long as its residual is positive semi-definite as well, and its low-rank Cholesky factorization can be computed. From the derivation of the RADI algorithm we have that

X_k = Σ_{i=1}^{k} V_iỸ_i^{-1}V_i^*;

in formulation (12) of the method we have collected V_1, ..., V_k into the matrix Z_k, and Ỹ_1, ..., Ỹ_k into the block-diagonal matrix Y_k.

When p = 1 and all the shifts are chosen as eigenvalues of the Hamiltonian matrix associated with the initial Riccati equation (1), the update described in Proposition 1 reduces to [3, Theorem 5]. Thus in that case, the RADI algorithm reduces to the invariant subspace approach (8).

Furthermore, iteration (12) clearly reduces to the low-rank Lyapunov ADI method (5) when B = 0; in that case Y_k = I. The relation to the original qADI iteration (6) is not clear unless p = 1 and the shifts are chosen as eigenvalues of H, in which case both of these methods coincide with the invariant subspace approach. We discuss this further in the following section.

    3 Equivalences with other Riccati methods

In this section we prove that all Riccati solvers introduced in Sect. 2 in fact compute exactly the same iterations, which we will refer to as the Riccati ADI iterations in the remaining text. This result is collected in Theorem 2; we begin with a simple technical lemma that provides different representations of the residual factor.

Lemma 1 Let

R_k^{(1)} = (1/√(−2Re(σ_k))) · (A^* − X_kG − σ̄_kI)V_k,
R_k^{(2)} = (1/√(−2Re(σ_{k+1}))) · (A^* − X_kG + σ_{k+1}I)V_{k+1}.

Then R_k^{(1)} = R_k^{(2)} = R_k.

Proof From the definition of the RADI iteration (12) it is obvious that R_k^{(2)} = R_k. Using X_k = X_{k−1} + V_kỸ_k^{-1}V_k^*, and V_k^*GV_k = 2Re(σ_k)I − 2Re(σ_k)Ỹ_k, we have

R_k = R_{k−1} + √(−2Re(σ_k)) V_kỸ_k^{-1}
= (1/√(−2Re(σ_k))) · (A^* − X_{k−1}G + σ_kI)V_k + √(−2Re(σ_k)) V_kỸ_k^{-1}
= (1/√(−2Re(σ_k))) · (A^* − X_kG − σ̄_kI)V_k + (1/√(−2Re(σ_k))) V_kỸ_k^{-1}V_k^*GV_k − √(−2Re(σ_k)) V_k + √(−2Re(σ_k)) V_kỸ_k^{-1}
= R_k^{(1)}.   □


Theorem 2 If the initial approximation in all algorithms is zero, and the same shifts are used, then for all k,

X_k = X_k^{adi} = X_k^{cay}.

If rank C = 1 and the shifts are equal to distinct eigenvalues of H, then for all k,

X_k = X_k^{adi} = X_k^{cay} = X_k^{inv}.

Proof We first use induction to show that X_k = X_k^{adi}, for all k. Assume that X_{k−1} = X_{k−1}^{adi}. We need to show that X_k = X_{k−1} + V_kỸ_k^{-1}V_k^* satisfies the defining equality (6) of the qADI iteration, i.e. that

(A^* + σ_kI − X_{k−1/2}^{adi}G)(X_{k−1} + V_kỸ_k^{-1}V_k^*) = −Q − X_{k−1/2}^{adi}(A − σ_kI),   (13)

where
X_{k−1/2}^{adi}(A + σ_kI − GX_{k−1}) = −Q − (A^* − σ_kI)X_{k−1}.   (14)

First, note that (14) can be rewritten as

A^*X_{k−1} + X_{k−1/2}^{adi}A + Q − X_{k−1/2}^{adi}GX_{k−1} + σ_k(X_{k−1/2}^{adi} − X_{k−1}) = 0.

Subtracting this from the expression for the Riccati residual,

A^*X_{k−1} + X_{k−1}A + Q − X_{k−1}GX_{k−1} = R(X_{k−1}),

we obtain
X_{k−1/2}^{adi} − X_{k−1} = −R(X_{k−1}) · (A − GX_{k−1} + σ_kI)^{-1}.   (15)

Equation (13) can be reorganized as

(A^* + σ_kI)(X_{k−1} + V_kỸ_k^{-1}V_k^*) − X_{k−1/2}^{adi}GV_kỸ_k^{-1}V_k^* = −Q − X_{k−1/2}^{adi}(A + σ_kI − GX_{k−1}) + 2Re(σ_k)X_{k−1/2}^{adi}.

Replace the second term on the right-hand side with the right-hand side of (14). Thus, it remains to prove

(A^* + σ_kI)(X_{k−1} + V_kỸ_k^{-1}V_k^*) − X_{k−1/2}^{adi}GV_kỸ_k^{-1}V_k^* = (A^* − σ_kI)X_{k−1} + 2Re(σ_k)X_{k−1/2}^{adi},

or after some rearranging, and by using (15),

(A^* − X_{k−1}G + σ_kI)V_kỸ_k^{-1}V_k^*
= (X_{k−1/2}^{adi} − X_{k−1}) · (2Re(σ_k)I + GV_kỸ_k^{-1}V_k^*)
= −R(X_{k−1}) · (A − GX_{k−1} + σ_kI)^{-1} · (2Re(σ_k)I + GV_kỸ_k^{-1}V_k^*).   (16)

Next we use the expression R(X_{k−1}) = R_{k−1}^{(2)}(R_{k−1}^{(2)})^* of Lemma 1. The right-hand side of (16) is, thus, equal to

(1/(2Re(σ_k))) · (A^* − X_{k−1}G + σ_kI)V_kV_k^* · (2Re(σ_k)I + GV_kỸ_k^{-1}V_k^*),

which turns out to be precisely the same as the left-hand side of (16) once we use the identity

V_k^*GV_k = 2Re(σ_k)I − 2Re(σ_k)Ỹ_k.

This completes the proof of X_k = X_k^{adi}. Next, we use induction once again to show X_k^{cay} = X_k^{adi}, for all k. For k = 0, the claim is trivial; assume that X_{k−1}^{cay} = X_{k−1}^{adi} for some k ≥ 1. To show that X_k^{cay} = X_k^{adi}, let us first multiply (9) by H − σ_kI from the left:

[A − σ_kI  G; Q  −A^* − σ_kI] [M_k; N_k] = [A + σ_kI  G; Q  −A^* + σ_kI] [I; −X_{k−1}^{cay}] =: [M_{k−1/2}; N_{k−1/2}],   (17)

and suggestively introduce X_{k−1/2}^{cay} := −N_{k−1/2}M_{k−1/2}^{-1}. We thus have

A + σ_kI − GX_{k−1}^{cay} = M_{k−1/2},   (18)
Q − (−A^* + σ_kI)X_{k−1}^{cay} = N_{k−1/2},   (19)

and

X_{k−1/2}^{cay}(A + σ_kI − GX_{k−1}^{cay}) = −N_{k−1/2}M_{k−1/2}^{-1}M_{k−1/2} = −Q − (A^* − σ_kI)X_{k−1}^{cay}.

This is the same relation as the one defining X_{k−1/2}^{adi}, and thus X_{k−1/2}^{cay} = X_{k−1/2}^{adi}. Next, equating the leftmost and the rightmost matrix in (17), it follows that

(A − σ_kI)M_k + GN_k = M_{k−1/2},   (20)
QM_k + (−A^* − σ_kI)N_k = N_{k−1/2}.   (21)

Multiply (21) from the right by M_k^{-1} to obtain

(A^* + σ_kI)X_k^{cay} = −Q + N_{k−1/2}M_k^{-1},   (22)

and multiply (20) from the left by X_{k−1/2}^{cay} and from the right by M_k^{-1} to get

−X_{k−1/2}^{cay}GX_k^{cay} = −X_{k−1/2}^{cay}(A − σ_kI) − N_{k−1/2}M_k^{-1}.   (23)

Adding (22) and (23) yields

(A^* + σ_kI − X_{k−1/2}^{cay}G)X_k^{cay} = −Q − X_{k−1/2}^{cay}(A − σ_kI),

which is the same as the defining equation for X_k^{adi}. Thus X_k^{cay} = X_k^{adi}, so both the Cayley subspace iteration and the qADI iteration generate the same sequences.

In the case of rank C = 1 and shifts equal to the eigenvalues of H, the equality X_k^{inv} = X_k^{adi} is already shown in [3]. Equality among the iterates generated by the other methods is a special case of what we have proved above.   □

It is interesting to observe that [23] also provides a low-rank variant of the Cayley subspace iteration algorithm: there, formulas for updating the factors of X_k^{cay} = Z_k^{cay}(Y_k^{cay})^{-1}(Z_k^{cay})^*, where Z_k^{cay} ∈ C^{n×pk} and Y_k^{cay} ∈ C^{pk×pk}, are given. The contribution of [24] was to show that the same formulas can be derived from a control-theory point of view. The main difference in comparison to our RADI variant of the low-rank Riccati ADI iterations is that, in order to compute Z_k^{cay}, one uses the matrix (A^* + σ_kI)^{-1}, instead of (A^* − X_{k−1}G + σ_kI)^{-1}, when computing Z_k. This way, the need for using the Sherman–Morrison–Woodbury formula is avoided. However, as a consequence, the matrix Y_k^{cay} loses the block-diagonal structure, and its update formula becomes much more involved. Also, it is very difficult to derive a version of the algorithm that would use real arithmetic. Another disadvantage is the computation of the residual: along with Z_k^{cay} and Y_k^{cay}, one needs to maintain a QR-factorization of the matrix [C^*  A^*Z_k^{cay}  Z_k^{cay}], which adds significant computational complexity to the algorithm.

Each of these different statements of the same Riccati ADI algorithm may contribute when studying theoretical properties of the iteration. For example, directly from our definition (12) of the RADI iteration it is obvious that

0 ≤ X_1 ≤ X_2 ≤ ... ≤ X_k ≤ ... ≤ X.

Also, the fact that the residual matrix R(X_k) is low-rank and its explicit factorization follows naturally from our approach. On the other hand, approaching the iteration from the control theory point of view as in [24] is more suitable for proving that the non-Blaschke condition for the shifts,

Σ_{k=1}^{∞} Re(σ_k)/(1 + |σ_k|²) = −∞,

is sufficient for achieving the convergence when A is stable, i.e.

lim_{k→∞} X_k = X.


We conclude this section by noting a relation between the Riccati ADI iteration and the rational Krylov subspace method [34]. It is easy to see that the RADI iteration also uses the rational Krylov subspaces as the basis for approximation. This fact also follows from the low-rank formulation for X_k^{cay} as given in [23], so we only state it here without proof.

Proposition 2 For a matrix M, (block-)vector v and a tuple σ⃗_k = (σ_1, ..., σ_k) ∈ C_−^k, let

K(M, v, σ⃗_k) = span{(M + σ_1I)^{-1}v, (M + σ_2I)^{-1}v, ..., (M + σ_kI)^{-1}v}

denote the rational Krylov subspace generated by M and the initial vector v. Then the columns of X_k belong to K(A^*, C^*, σ⃗_k).

From the proposition we conclude the following: if U_k contains a basis for the rational Krylov subspace K(A^*, C^*, σ⃗_k), then both the approximation X_k^{kry} of the Riccati solution obtained by the rational Krylov subspace method and the approximation X_k^{adi} obtained by a Riccati ADI iteration satisfy

X_k^{kry} = U_kY_k^{kry}U_k^*,  X_k^{adi} = U_kY_k^{adi}U_k^*,

for some matrices Y_k^{kry}, Y_k^{adi} ∈ C^{pk×pk}. The columns of both X_k^{kry} and X_k^{adi} belong to the same subspace, and the only difference between the methods is the choice of the linear combination of columns of U_k, i.e. the choice of the small matrix Y_k. The rational Krylov subspace method [34] generates its Y_k^{kry} by solving the projected Riccati equation, while the Riccati ADI methods do it via direct formulas such as the one in (12).

    4 Implementation aspects of the RADI algorithm

There are several issues with the iteration (12), stated as is, that should be addressed when designing an efficient computational routine: how to decide when the iterates X_k have converged, how to solve linear systems with matrices A^* − X_{k−1}G + σ_kI, and how to minimize the usage of complex arithmetic. In this section we also discuss the various shift selection strategies.

    4.1 Computing the residual and the stopping criterion

Tracking the progress of the algorithm and deciding when the iterates have converged is very simple, and can be computed cheaply thanks to the expression ‖R(X_k)‖ = ‖R_kR_k^*‖ = ‖R_k^*R_k‖. This is an advantage compared to the Cayley subspace iteration, where computing ‖R(X_k)‖ is more expensive because a low-rank factorization along the lines of Proposition 1(b) is currently not known. The RADI iteration is stopped once the residual norm has decreased sufficiently relative to the initial residual norm ‖CC^*‖ of the approximation X_0 = 0.
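In code, the observation amounts to forming the small p × p Gram matrix instead of any n × n quantity; a minimal sketch (NumPy assumed):

```python
import numpy as np

def residual_norm(R):
    """||R(X_k)||_2 = ||R_k R_k^*||_2 = ||R_k^* R_k||_2; the Gram matrix
    R_k^* R_k is only p x p, so no n x n matrix is ever formed."""
    return np.linalg.norm(R.conj().T @ R, 2)
```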


Algorithm 1: The RADI iteration using complex arithmetic.
Input: matrices A ∈ R^{n×n}, B ∈ R^{n×m}, C ∈ R^{p×n}.
Output: approximation X ≈ ZY^{-1}Z^* for the solution of A^*X + XA + C^*C − XBB^*X = 0.
1  R = C^*; K = 0; Y = [ ];
2  while ‖R^*R‖ ≥ tol · ‖CC^*‖ do
3      Obtain the next shift σ;
4      if first pass through the loop then
5          Z = V = √(−2Re(σ)) · (A^* + σI)^{-1}R;
6      else
7          V = √(−2Re(σ)) · (A^* − KB^* + σI)^{-1}R;   // Use SMW if necessary
8          Z = [Z  V];
9      end
10     Ỹ = I − (1/(2Re(σ))) · (V^*B)(V^*B)^*;  Y = blkdiag(Y, Ỹ);
11     R = R + √(−2Re(σ)) · (VỸ^{-1});
12     K = K + (VỸ^{-1}) · (V^*B);
13 end
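A direct NumPy transcription of Algorithm 1 may help fix the data flow. The dense solves and the shift_fn callback stand in for the sparse solver and for any of the shift strategies of Sect. 4.5; both are assumptions of this sketch, not part of the algorithm itself:

```python
import numpy as np
from scipy.linalg import block_diag

def radi(A, B, C, shift_fn, tol=1e-11, maxiter=200):
    """Sketch of Algorithm 1. Returns Z, Y, K with X ~ Z Y^{-1} Z^* and
    K ~ X B; `shift_fn(R)` is a placeholder returning the next shift
    sigma with Re(sigma) < 0."""
    n, m = B.shape
    p = C.shape[0]
    R = C.conj().T.astype(complex)          # R = C^*
    K = np.zeros((n, m), dtype=complex)     # feedback approximation X B
    Z = np.zeros((n, 0), dtype=complex)
    Y_blocks = []
    res0 = np.linalg.norm(R.conj().T @ R, 2)
    for _ in range(maxiter):
        sigma = shift_fn(R)
        # V = sqrt(-2 Re(sigma)) (A^* - K B^* + sigma I)^{-1} R
        M = A.conj().T - K @ B.conj().T + sigma * np.eye(n)
        V = np.sqrt(-2 * sigma.real) * np.linalg.solve(M, R)
        VB = V.conj().T @ B                                    # V^* B
        Ytil = np.eye(p) - (VB @ VB.conj().T) / (2 * sigma.real)
        VYinv = np.linalg.solve(Ytil.conj().T, V.conj().T).conj().T  # V Ytil^{-1}
        R = R + np.sqrt(-2 * sigma.real) * VYinv               # residual factor
        K = K + VYinv @ VB                                     # feedback update
        Z = np.hstack([Z, V])
        Y_blocks.append(Ytil)
        if np.linalg.norm(R.conj().T @ R, 2) < tol * res0:
            break
    return Z, block_diag(*Y_blocks), K
```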

    4.2 Solving linear systems in RADI

During the iteration, one has to evaluate the expression (A^* − X_{k−1}G + σ_kI)^{-1}R_{k−1}. Here the matrix A is assumed to be sparse, while X_{k−1}G = (X_{k−1}B)B^* is low-rank. There are different options on how to solve this linear system; if one wants to use a direct sparse solver, the initial expression can be adapted by using the Sherman–Morrison–Woodbury (SMW) formula [14]. We introduce the approximate feedback matrix K_k := X_kB and update it during the RADI iteration: K_k = K_{k−1} + (V_kỸ_k^{-1})(V_k^*B). Note that V_kỸ_k^{-1} also appears in the update of the residual factor, and that V_k^*B appears in the computation of Ỹ_k, so both have to be computed only once. The initial expression is rewritten as

(A^* − K_{k−1}B^* + σ_kI)^{-1}R_{k−1} = L_k + N_k(I_m − B^*N_k)^{-1}(B^*L_k),
[L_k, N_k] = (A^* + σ_kI)^{-1}[R_{k−1}, K_{k−1}].

Thus, in each RADI step one needs to solve a linear system with the coefficient matrix A^* + σ_kI and p + m right hand sides. A very similar technique is used in the low-rank Newton ADI solver for the Riccati equation [7,9,15,29]. In the equivalent Cayley subspace iteration [23,24], linear systems defined by A^* + σ_kI and only p right hand sides have to be solved, which makes their solution less expensive than their counterparts in RADI.

The RADI algorithm, implementing the techniques described above, is listed in Algorithm 1. Note that, if only the feedback matrix K is of interest, e.g. if the CARE arises from an optimal control problem, there is no need to store the whole low-rank factors Z, Y, since Algorithm 1 requires only the latest blocks to continue. This is again similar to the low-rank Newton ADI solver [7], and not possible in the current version of the Cayley subspace iteration.


    4.3 Reducing the use of complex arithmetic

To increase the efficiency of Algorithm 1, we reduce the use of complex arithmetic. We do so by taking the shift σ_{k+1} = σ̄_k immediately after the shift σ_k ∈ C \ R has been used, and by merging these two consecutive RADI steps into a single one. This entire procedure will have only one operation involving complex matrices: a linear solve with the matrix A^* − X_{k−1}G + σ_kI to compute V_k. There are two key observations to be made here. First, by modifying the iteration slightly, one can ensure that the matrices K_{k+1}, R_{k+1}, and Y_{k+1} are real and can be computed by using real arithmetic only, as shown in the upcoming technical proposition. Second, there is no need to compute V_{k+1} at all to proceed with the iteration: the next matrix V_{k+2} will once again be computed by using the residual R_{k+1}, the same way as in Algorithm 1.

Proposition 3 Let X_{k−1} = Z_{k−1}Y_{k−1}^{-1}Z_{k−1}^* ∈ R^{n×n} denote the Riccati approximate solution computed after k − 1 RADI steps. Assume that R_{k−1}, V_{k−1}, K_{k−1} are real and that σ := σ_k = σ̄_{k+1} ∈ C \ R. Let:

V_r = (Re(V_k))^*B,  V_i = (Im(V_k))^*B,
F_1 = [−Re(σ)V_r − Im(σ)V_i; Im(σ)V_r − Re(σ)V_i],  F_2 = [V_r; V_i],  F_3 = [Im(σ)I_p; Re(σ)I_p].

Then X_{k+1} = Z_{k+1}Y_{k+1}^{-1}Z_{k+1}^*, where:

Z_{k+1} = [Z_{k−1}  Re(V_k)  Im(V_k)],
Y_{k+1} = blkdiag(Y_{k−1}, Ŷ_{k+1}),
Ŷ_{k+1} = blkdiag(I_p, (1/2)I_p) − (1/(4|σ|²Re(σ))) F_1F_1^* − (1/(4Re(σ))) F_2F_2^* − (1/(2|σ|²)) F_3F_3^*.

The residual factor R_{k+1} and the matrix K_{k+1} can be computed as

R_{k+1} = R_{k−1} + √(−2Re(σ)) · ([Re(V_k)  Im(V_k)]Ŷ_{k+1}^{-1})(:, 1:p),
K_{k+1} = K_{k−1} + [Re(V_k)  Im(V_k)]Ŷ_{k+1}^{-1}[V_r; V_i].

Proof The basic idea of the proof is similar to [5, Theorem 1]; however, there is a major complication involving the matrix Y, which in the Lyapunov case is simply equal to the identity. Due to the technical complexity of the proof, we only display key intermediate results. To simplify notation, we use indices 0, 1, 2 instead of k − 1, k, k + 1, respectively.

We start by taking the imaginary part of the defining relation for R_0^{(2)} = R_0 in Lemma 1:

0 = (A^* − X_0G + Re(σ)I) · Im(V_1) + Im(σ) · Re(V_1);

thus

V_1 = Re(V_1) + i Im(V_1) = −(1/Im(σ)) · A_0 Im(V_1),  where A_0 := A^* − X_0G + σ̄I.

We use this and the Sherman–Morrison–Woodbury formula to compute V_2:

V_2 = √(−2Re(σ)) (A^* − X_1G + σ̄I)^{-1}R_1
= √(−2Re(σ)) (A^* − X_1G + σ̄I)^{-1} · (1/√(−2Re(σ))) · (A^* − X_1G − σ̄I)V_1
= V_1 − 2σ̄ (A^* − X_1G + σ̄I)^{-1}V_1
= V_1 − 2σ̄ (A_0 − V_1Ỹ_1^{-1}(V_1^*G))^{-1}V_1
= V_1 − 2σ̄ (A_0^{-1}V_1 + A_0^{-1}V_1(Ỹ_1 − (V_1^*G)A_0^{-1}V_1)^{-1}(V_1^*G)A_0^{-1}V_1)
= V_1 + 2σ̄ Im(V_1) S^{-1}Ỹ_1,  where S := Im(σ)Ỹ_1 + (V_1^*B)V_i^*,   (24)

where we used Im(σ)A_0^{-1}V_1 = −Im(V_1) to obtain the last line. Next,

X_2 = X_0 + [V_1  V_2] · blkdiag(Ỹ_1^{-1}, Ỹ_2^{-1}) · [V_1  V_2]^*
= X_0 + [Re(V_1)  Im(V_1)] T · blkdiag(Ỹ_1^{-1}, Ỹ_2^{-1}) · ([Re(V_1)  Im(V_1)] T)^*,  with T := [I  I; iI  iI + 2σ̄S^{-1}Ỹ_1],
= X_0 + [Re(V_1)  Im(V_1)] Ŷ_2^{-1} [Re(V_1)  Im(V_1)]^*,  with Ŷ_2 := T^{−*} blkdiag(Ỹ_1, Ỹ_2) T^{-1}.

We first compute T^{-1} by using the SMW formula once again:

T^{-1} = [I + (i/(2σ̄))Ỹ_1^{-1}S   −(1/(2σ̄))Ỹ_1^{-1}S;  −(i/(2σ̄))Ỹ_1^{-1}S   (1/(2σ̄))Ỹ_1^{-1}S],

and, applying the congruence transformation with T^{-1} to blkdiag(Ỹ_1, Ỹ_2), yields Ŷ_2 with the blocks

Ŷ_2(1,1) = Ỹ_1 + (i/(2σ̄))S − (i/(2σ))S^* + (1/(4|σ|²))S^*Ỹ_1^{-1}S + (1/(4|σ|²))S^*Ỹ_1^{-1}Ỹ_2Ỹ_1^{-1}S,
Ŷ_2(1,2) = −(1/(2σ̄))S + (i/(4|σ|²))S^*Ỹ_1^{-1}S + (i/(4|σ|²))S^*Ỹ_1^{-1}Ỹ_2Ỹ_1^{-1}S,
Ŷ_2(2,1) = −(1/(2σ))S^* − (i/(4|σ|²))S^*Ỹ_1^{-1}S − (i/(4|σ|²))S^*Ỹ_1^{-1}Ỹ_2Ỹ_1^{-1}S,
Ŷ_2(2,2) = (1/(4|σ|²))S^*Ỹ_1^{-1}S + (1/(4|σ|²))S^*Ỹ_1^{-1}Ỹ_2Ỹ_1^{-1}S.

By using (24), it is easy to show

Ỹ_2 = I − (1/(2Re(σ))) (V_2^*B)(V_2^*B)^*
= −Ỹ_1 − (1/(2Re(σ))) · (−2σ Im(σ)Ỹ_1S^{−*}Ỹ_1 − 2σ̄ Im(σ)Ỹ_1S^{-1}Ỹ_1 + 4|σ|²Ỹ_1S^{−*}V_iV_i^*S^{-1}Ỹ_1).

Inserting this into the formula for Ŷ_2, all terms containing inverses of Ỹ_1 and S cancel out. By rearranging the terms that do appear in the formula, we get the expression from the claim of the proposition.

Deriving the formulae for R_2 and K_2 is straightforward:

K_2 = K_0 + [V_1  V_2] · blkdiag(Ỹ_1^{-1}, Ỹ_2^{-1}) · [V_1  V_2]^*B
= K_0 + ([Re(V_1)  Im(V_1)] Ŷ_2^{-1}) · ([Re(V_1)  Im(V_1)]^*B),

R_2 = R_0 + √(−2Re(σ)) [V_1  V_2] [Ỹ_1^{-1}; Ỹ_2^{-1}]
= R_0 + √(−2Re(σ)) [Re(V_1)  Im(V_1)] (T [Ỹ_1^{-1}; Ỹ_2^{-1}])
= R_0 + √(−2Re(σ)) [Re(V_1)  Im(V_1)] Ŷ_2^{-1}(:, 1:p),

where the last line is due to the structure of the matrix T. Since the matrix Ŷ_2 is real, so are K_2 and R_2.   □

    4.4 RADI iteration for the generalized Riccati equation

Before we state the final implementation, we shall briefly mention the adaptation of the RADI algorithm for handling generalized Riccati equations

A^*XE + E^*XA + Q − E^*XGXE = 0.   (25)

Multiplying (25) by E^{−*} from the left and by E^{−1} from the right leads to

(AE^{−1})^*X + X(AE^{−1}) + E^{−*}C^*CE^{−1} − XBB^*X = 0.   (26)

The generalized RADI algorithm is then easily derived by running ordinary RADI iterations for the Riccati equation (26), and deploying standard rearrangements to lower the computational expense (such as solving systems with the matrix A^* + σE^* instead of (AE^{−1})^* + σI). Algorithm 2 shows the final implementation, taking into account Proposition 3 and the handling of the generalized equation (25).

Algorithm 2: The RADI iteration, with reduced use of complex arithmetic.
Input: matrices A, E ∈ R^{n×n}, B ∈ R^{n×m}, C ∈ R^{p×n}.
Output: approximation X ≈ ZY^{-1}Z^* for the solution of the generalized equation A^*XE + E^*XA + C^*C − E^*XBB^*XE = 0. The matrices Z and Y are real.
1  R = C^*; K = 0; Y = [ ]; Z = [ ];
2  while ‖R^*R‖ ≥ tol · ‖CC^*‖ do
3      Obtain the next shift σ;
4      if first pass through the loop then
5          V = √(−2Re(σ)) · (A^* + σE^*)^{-1}R;
6      else
7          V = √(−2Re(σ)) · (A^* − KB^* + σE^*)^{-1}R;   // Use SMW if necessary
8      end
9      if σ ∈ R then
10         Z = [Z  V];
11         Ỹ = I − (1/(2Re(σ))) · (V^*B)(V^*B)^*;  Y = blkdiag(Y, Ỹ);  R = R + √(−2Re(σ)) · (E^*VỸ^{-1});
12         K = K + (E^*VỸ^{-1}) · (V^*B);
13     else
14         Z = [Z  Re(V)  Im(V)];
15         V_r = (Re(V))^*B;  V_i = (Im(V))^*B;
16         F_1 = [−Re(σ)V_r − Im(σ)V_i; Im(σ)V_r − Re(σ)V_i];  F_2 = [V_r; V_i];  F_3 = [Im(σ)I_p; Re(σ)I_p];
17         Ỹ = blkdiag(I_p, (1/2)I_p) − (1/(4|σ|²Re(σ)))F_1F_1^* − (1/(4Re(σ)))F_2F_2^* − (1/(2|σ|²))F_3F_3^*;
18         Y = blkdiag(Y, Ỹ);
19         R = R + √(−2Re(σ)) · E^*([Re(V)  Im(V)]Ỹ^{-1})(:, 1:p);
20         K = K + E^*[Re(V)  Im(V)]Ỹ^{-1}[V_r; V_i];
21     end
22 end

    4.5 Shift selection

The problem of choosing the shifts in order to accelerate the convergence of the Riccati ADI iteration is very similar to the one for the Lyapunov ADI method. Thus we apply and discuss the techniques presented in [3,6,27,30] in the context of the Riccati equation, and compare them in several numerical experiments. It appears natural to employ the heuristic Penzl shifts [27]. There, a small number of approximate eigenvalues of A are generated. From this set the values which lead to the smallest magnitude of the rational function associated to the ADI iteration are selected in a heuristical manner. Simoncini and Lin [23] have shown that the convergence of the Riccati ADI iteration is related to a rational function built from the stable eigenvalues of H. This suggests to carry out the Penzl approach, but to use approximate eigenvalues for the Hamiltonian matrix H instead of A. Note that, due to the low rank of Q and G, we can expect that most of the eigenvalues of A are close to the eigenvalues of H, see the discussion in [3]. Thus in many cases the Penzl shifts generated by A should suffice as well. Penzl shifts require significant preprocessing computation: in order to approximate the eigenvalues of M = A or M = H, one has to build Krylov subspaces with matrices M and M^{-1}. All the shifts are computed in this preprocessing stage, and then simply cycled during the RADI iteration. Here, we will mainly focus on alternative approaches generating each shift just before it is used. This way we hope to compute a shift which will better adapt to the current stage of the algorithm.

    4.5.1 Residual Hamiltonian shifts

One such approach is motivated by Theorem 1 and the discussion about shift selection in [3]. The Hamiltonian matrix

H̃ = [Ã  G; C̃^*C̃  −Ã^*]

is associated to the residual equation (10), where Ξ = X_k is the approximation after k steps of the RADI iteration. If (λ, [r; q]) is a stable eigenpair of H̃, and σ_{k+1} = λ is used as the shift, then

X_k ≤ X_{k+1} = X_k − q(q^*r)^{-1}q^* ≤ X.

In order to converge as fast as possible to X, it is better to choose such an eigenvalue λ for which the update is largest, i.e. the one that maximizes ‖q(q^*r)^{-1}q^*‖. Note that as the RADI iteration progresses and the residual matrix R(Ξ) = C̃^*C̃ converges to zero, the structure of eigenvectors of H̃ that belong to its stable eigenvalues is such that ‖q‖ becomes smaller and smaller. Thus, one can further simplify the shift optimality condition, and use the eigenvalue λ such that the corresponding q has the largest norm—this is also in line with the discussion in [3].

However, in practice it is computationally very expensive to determine such an eigenvalue, since the matrix H̃ is of order 2n. We can approximate its eigenpairs through projection onto some subspace. If U is an orthonormal basis of the chosen subspace, then

(U^*ÃU)^*X̃^{proj} + X̃^{proj}(U^*ÃU) + (U^*C̃^*)(U^*C̃^*)^* − X̃^{proj}(U^*B)(U^*B)^*X̃^{proj} = 0

is the projected residual Riccati equation with the associated Hamiltonian matrix

H̃^{proj} = [U^*ÃU  (U^*B)(U^*B)^*; (U^*C̃^*)(U^*C̃^*)^*  −(U^*ÃU)^*].   (27)


Approximate eigenpairs of H̃ are (λ̂, [Ur̂; Uq̂]), where (λ̂, [r̂; q̂]) are eigenpairs of H̃^{proj}. Thus, a reasonable choice for the next shift is such λ̂, for which ‖Uq̂‖ = ‖q̂‖ is the largest.

We still have to define the subspace span{U}. One option is to use V_k (or, equivalently, the last p columns of the matrix Z_k), which works very well in practice unless p = 1. When p = 1, all the generated shifts are real, which can make the convergence slow in some cases. Then it is better to choose the last ℓ columns of the matrix Z_k; usually already ℓ = 2 or ℓ = 5 or a small multiple of p will suffice. An ultimate option is to use the entire Z_k, which we denote as ℓ = ∞. This is obviously more computationally demanding, but it provides fast convergence in all cases we tested.
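The following sketch generates one residual Hamiltonian shift from an orthonormal basis U (e.g. the last ℓ columns of Z_k, orthonormalized); the function name and the dense eigensolver are assumptions of this illustration:

```python
import numpy as np

def residual_hamiltonian_shift(A, B, R, K, U):
    """One residual Hamiltonian shift (Sect. 4.5.1). With K = X_k B the
    closed-loop matrix is A_tilde = A - B K^*, and C_tilde^* = R is the
    current residual factor. Returns the stable eigenvalue of the
    projected Hamiltonian (27) whose eigenvector has the largest lower
    ('q') part."""
    ell = U.shape[1]
    AtU = A @ U - B @ (K.conj().T @ U)        # A_tilde * U, A_tilde never formed
    H11 = U.conj().T @ AtU                    # U^* A_tilde U
    UB = U.conj().T @ B                       # U^* B
    UC = U.conj().T @ R                       # U^* C_tilde^*
    Hproj = np.block([[H11,               UB @ UB.conj().T],
                      [UC @ UC.conj().T, -H11.conj().T]])   # matrix (27)
    lam, W = np.linalg.eig(Hproj)
    stable = np.where(lam.real < 0)[0]        # assumes a stable eigenvalue exists
    qnorms = np.linalg.norm(W[ell:, stable], axis=0)
    return lam[stable[np.argmax(qnorms)]]
```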

    4.5.2 Residual minimizing shifts

The two successive residual factors are connected via the formula

R_{k+1} = (A^* − X_{k+1}G − σ̄_{k+1}I)(A^* − X_kG + σ_{k+1}I)^{-1}R_k.

Our goal is to choose the shifts so that the residual drops to zero as quickly as possible. Locally, once X_k is computed, this goal is achieved by choosing σ_{k+1} ∈ C− so that ‖R_{k+1}‖ is minimized. This concept was proposed in [6] for the low-rank Lyapunov and Sylvester ADI methods and refined later in [18]. In complete analogy, we define a rational function f in the variable σ by

f(σ) := ‖(A^* − X_{k+1}(σ)G − σ̄I)(A^* − X_kG + σI)^{-1}R_k‖²,   (28)

and wish to find

argmin_{σ∈C−} f(σ);

note that X_{k+1}(σ) = X_k + V_{k+1}(σ)Ỹ_{k+1}^{-1}(σ)V_{k+1}^*(σ) is also a function of σ. Since f involves large matrices, we once again project the entire equation to a chosen subspace U, and solve the optimization problem defined by the matrices of the projected problem. The optimization problem is solved numerically. Efficient optimization solvers use the gradient of the function f; after a laborious computation one can obtain an explicit formula for the case p = 1:

∇f(σ_R, σ_I) = [ 2Re(R_{k+1}^* · (((1/σ_R)I − Ã^{-1} − (1/(2σ_R))(X_{k+1}GÃ^{-1} + X_{k+1}Ã^{−*}G))Δ)) ;
               −2Im(R_{k+1}^* · ((−Ã^{-1} − (1/(2σ_R))(X_{k+1}GÃ^{-1} − X_{k+1}Ã^{−*}G))Δ)) ].

Here σ = σ_R + iσ_I, Ã = A^* − X_kG − σI, X_{k+1} = X_{k+1}(σ), R_{k+1} = R_{k+1}(σ), and Δ = R_{k+1}(σ) − R_k.
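As an illustration of the projected optimization, the sketch below evaluates f on the projected problem by performing one small RADI step, and minimizes it with a derivative-free method. Using Nelder–Mead instead of the gradient formula above is a simplification of this sketch, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize

def residual_minimizing_shift(Ah, Bh, Rh, sigma0):
    """Residual minimizing shift (Sect. 4.5.2) on the projected problem.
    Ah = U^*(A - BB^*X_k)U, Bh = U^*B, Rh = U^*R_k are the projections of
    the closed-loop matrix, the input matrix and the residual factor.
    sigma0 (e.g. the residual Hamiltonian shift) is the starting point."""
    k, p = Ah.shape[0], Rh.shape[1]

    def f(x):
        sr, si = x
        if sr >= 0:                    # restrict the search to C-
            return np.inf
        sigma = complex(sr, si)
        # one projected RADI step with shift sigma
        V = np.sqrt(-2 * sr) * np.linalg.solve(Ah.conj().T + sigma * np.eye(k), Rh)
        VB = V.conj().T @ Bh
        Ytil = np.eye(p) - (VB @ VB.conj().T) / (2 * sr)
        Rnext = Rh + np.sqrt(-2 * sr) * np.linalg.solve(
            Ytil.conj().T, V.conj().T).conj().T
        return np.linalg.norm(Rnext, 2) ** 2   # f(sigma) = ||R_{k+1}(sigma)||^2

    res = minimize(f, x0=[sigma0.real, sigma0.imag], method="Nelder-Mead")
    return complex(res.x[0], res.x[1])
```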

For p > 1, a similar formula can be derived, but one should note that the function f is not necessarily differentiable at every point σ, see, e.g., [26]. Thus, a numerically more reliable heuristic [18] is to artificially reduce the problem once again to the case p = 1. This can be done in the following way: let v denote the right singular vector corresponding to the largest singular value of the matrix R_k. Then R_k ∈ C^{n×p} in (28) is replaced by the vector R_kv ∈ C^n. Since numerical optimization algorithms usually require a starting point for the optimization, the two shift generating approaches may be combined: the residual Hamiltonian shift can be used as the starting point in the optimization for the second approach. However, from our numerical experience we conclude that the additional computational effort invested in the post-optimization of the residual Hamiltonian shifts often does not contribute to the convergence. The main difficulty is the choice of an adequate subspace U such that the projected objective function approximates (28) well enough. This issue requires further investigation. The rationale is given in the following example.

Example 1 Consider the Riccati equation given in Example 5.2 of [32]: the matrix A is obtained by the centered finite difference discretization of the differential equation

∂_tu = Δu − 10x∂_xu − 1000y∂_yu − 10∂_zu + b(x, y)f(t),

on a unit cube with 22 nodes in each direction. Thus A is of order n = 10648; the matrices B ∈ R^{n×m} and C ∈ R^{p×n} are generated at random, and in this example we set m = p = 1 and B = C^*.

Suppose that 13 RADI iterations have already been computed, and that we need to compute a shift to be used in the 14th iteration. Let the matrix U contain an orthonormal basis for Z_13.

Figure 1a shows a region of the complex plane; stars are at locations of the stable eigenvalues of the projected Hamiltonian matrix (27). The one eigenvalue chosen as the residual Hamiltonian shift σ_ham is shown as 'x'. The residual minimizing shift σ_opt is shown as 'o'. Each point σ of the complex plane is colored according to the ratio ρ_proj(σ) = ‖R_14^{proj}(σ)‖/‖R_13^{proj}‖, where R^{proj} is the residual for the projected Riccati equation. The ratio ρ_proj(σ_ham) ≈ 0.54297 is not far from the optimal ratio ρ_proj(σ_opt) ≈ 0.53926.

[Fig. 1 Each point σ of the complex plane is colored according to the residual reduction obtained when σ is taken as the shift in the 14th iteration of RADI. a Ratios ρ_proj(σ) = ‖R_14^{proj}(σ)‖/‖R_13^{proj}‖ for the projected equation of dimension 13. b Ratios ρ(σ) = ‖R_14(σ)‖/‖R_13‖ for the original equation of dimension 10648]

On the other hand, Fig. 1b shows the complex plane colored according to ratios for the original system of order 10648, ρ(σ) = ‖R_14(σ)‖/‖R_13‖. Neither of the values ρ(σ_ham) ≈ 0.71510 and ρ(σ_opt) ≈ 0.71981 is optimal, but they both offer a reasonable reduction of the residual norm in the next step. In this case, σ_ham turns out even to give a slightly better residual reduction for the original equation than σ_opt, making the extra effort in running the numerical optimization algorithm futile.

    5 Numerical experiments

In this section we show a number of numerical examples, with several objectives in mind. First, our goal is to compare different low-rank implementations of the Riccati ADI algorithm mentioned in this paper: the low-rank qADI proposed in [37,38], the Cayley transformed subspace iteration [23,24], and the complex and real variants of the RADI iteration (12). Second, we compare performance of the RADI approach against other methods for solving large-scale Riccati equations, namely the rational Krylov subspace method (RKSM) [34], the extended block Arnoldi (EBA) method [16,32], and the Newton-ADI algorithm [7,9,15]. Finally, we discuss various shift strategies for the RADI iteration described in the previous section.

The numerical experiments are run on a desktop computer with a four-core Intel Core i5-4690K processor and 16 GB RAM. All algorithms and testing routines are implemented and executed in MATLAB R2014a, running on Microsoft Windows 8.1.

Example 2 Consider again the Riccati benchmark CUBE from Example 1. We use three versions of this example: the previous setting with n = 10648, m = p = 1 and m = p = 10, and later on a finer discretization with n = 74088 and m = 10, p = 1.

Table 1 collects timings in seconds for four different low-rank implementations of the Riccati ADI algorithm. The table shows only the time needed to run 80 iteration steps; time spent for computation of the shifts used by all four variants is not included (in this case, 20 precomputed Penzl shifts were used). All four variants compute exactly the same iterates, as we have proved in Theorem 2.

Table 1 Results obtained with different implementations for CUBE with n = 10648

Implementation | Time, m = p = 1 | Time, m = p = 10
Wong and Balakrishnan [37,38] | 127.61 | 750.89
Cayley subspace iteration [23,24] | 21.67 | 167.02
RADI—Algorithm 1 | 21.51 | 51.92
RADI—Algorithm 2 | 11.14 | 26.35


Clearly, the real variant of iteration (12), implemented as in Algorithm 2, outperforms all the others.¹ Thus we use this implementation in the remaining numerical experiments. The RADI algorithms mostly obtain the advantage over the Cayley subspace iteration because of the cheap computation of the residual norm. In the latter algorithm, costly orthogonalization procedures are required for this task, and after some point these compensate the computational gains from the easier linear systems (cf. Sect. 4.2). Also, the times for the algorithm of Wong and Balakrishnan shown in the table do not include the (very costly) computation of the residuals at all, so their actual execution times are even higher.

Next, we compare various shift strategies for RADI, as well as the EBA, RKSM, and Newton-ADI algorithms. For RADI, we have the following strategies:

– 20 precomputed Penzl shifts ("RADI—Penzl"), generated by using the Krylov subspaces of dimension 40 with matrices A and A^{-1};
– residual Hamiltonian shifts ("RADI—Ham"), with ℓ = 2p, ℓ = 6p, and ℓ = ∞;
– residual minimizing shifts ("RADI—Ham+Opt"), with ℓ = 2p, ℓ = 6p, and ℓ = ∞.

For the RKSM, we have implemented the algorithm so that the use of complex arithmetic is minimized by merging two consecutive steps with complex conjugate shifts [28]. We use the adaptive shift strategy, implemented as described in [13]. EBA and RKSM require the solution of a projected CARE which can become expensive if p > 1. Hence, this small scale solution is carried out only periodically in every 5th or every 10th step—in the results, we display the variant that was faster.

For all methods, the threshold for declaring convergence is reached once the relative residual is less than tol = 10^{−11}. A summary of the results for all the different methods and strategies is shown in Table 3. The column "final subspace dimension" displays the number of columns of the matrix Z, where X ≈ ZZ^* is the final computed approximation. Dividing this number by p (for EBA, by 2p), we obtain the number of iterations used in a particular method. Just for the sake of completeness, we have also included a variant of the Newton-ADI algorithm [7] with Galerkin projection [9]. Without the Galerkin projection, the Newton-ADI algorithm could not compete with the other methods. The recent developments from [15], which make the Newton-ADI algorithm more competitive, are beyond the scope of this study.

It is interesting to analyze the timing breakdown for the RADI and RKSM methods. These timings are listed in Table 2 for the CUBE example with m = p = 10, where a significant amount of time is spent for tasks other than solving linear systems.

As ℓ increases, the cost of computing shifts in RADI increases as well—the projection subspace gets larger, and more effort is needed to orthogonalize its basis and compute the eigenvalue decomposition of the projected Hamiltonian matrix. This effort is, in the CUBE benchmark, rewarded by a decrease in the number of iterations. However, there is a trade-off here: the extra computation does outweigh the saving in the number of iterations for sufficiently large ℓ. Convergence history for CUBE is plotted in Fig. 2; to reduce the clutter, only a selected few methods are shown.

¹ Note that the generated Penzl shifts come in complex conjugate pairs.


Table 2 Time spent in different subtasks in the RADI iteration and RKSM for CUBE with m = p = 10

Method | Subtask | Time
RADI—Penzl: 135 iterations | Precompute shifts | 5.31
 | Solve linear systems | 43.24
 | Total | 49.19
RADI—Ham, ℓ = 2p: 139 iterations | Solve linear systems | 46.42
 | Compute shifts dynamically | 1.76
 | Total | 48.79
RADI—Ham, ℓ = 6p: 100 iterations | Solve linear systems | 32.73
 | Compute shifts dynamically | 2.75
 | Total | 35.92
RADI—Ham, ℓ = ∞: 74 iterations | Solve linear systems | 24.74
 | Compute shifts dynamically | 32.44
 | Total | 57.51
RKSM—adaptive: 79 iterations | Solve linear systems | 18.45
 | Orthogonalization | 4.14
 | Compute shifts dynamically | 12.02
 | Solve projected equations | 15.98
 | Total | 53.82

[Fig. 2 Algorithm performances for benchmark CUBE (n = 10648, m = p = 10). a Relative residual versus the subspace dimension used by an algorithm. b Relative residual versus time]

The fact that in each step RADI solves linear systems with p + m right hand side vectors, compared to only p vectors in RKSM, may become noticeable when m is larger than p. This effect is shown in Table 3 for CUBE with m = 10 and p = 1. Unlike these two methods, EBA can precompute the LU factorization of A, and wins by a large margin in this test case.


Table 3 Results of the numerical experiments

Example | Method | No. iterations | Final subspace dim. | Time

CUBE, n = 10648, m = p = 1
  RADI—Penzl | 97 | 97 | 18.96
  RADI—Ham, ℓ = 2p | 119 | 119 | 17.10
  RADI—Ham, ℓ = 6p | 99 | 99 | 14.15
  RADI—Ham, ℓ = ∞ | 75 | 75 | 11.60
  RADI—Ham+Opt, ℓ = 2p | 122 | 122 | 17.87
  RADI—Ham+Opt, ℓ = 6p | 103 | 103 | 16.70
  RADI—Ham+Opt, ℓ = ∞ | 108 | 108 | 18.44
  RKSM—adaptive | 83 | 83 | 14.80
  EBA | 111 | 222 | 6.23
  Newton-ADI | 2 outer, 296 inner | 192 | 42.11

CUBE, n = 10648, m = p = 10
  RADI—Penzl | 135 | 1350 | 49.19
  RADI—Ham, ℓ = 2p | 139 | 1390 | 48.79
  RADI—Ham, ℓ = 6p | 100 | 1000 | 35.92
  RADI—Ham, ℓ = ∞ | 74 | 740 | 57.51
  RADI—Ham+Opt, ℓ = 2p | 87 | 870 | 30.59
  RADI—Ham+Opt, ℓ = 6p | 90 | 900 | 33.20
  RADI—Ham+Opt, ℓ = ∞ | 90 | 900 | 119.55
  RKSM—adaptive | 79 | 790 | 53.82
  EBA | 91 | 1820 | 230.57
  Newton-ADI | 2 outer, 202 inner | 1960 | 75.60

CUBE, n = 74088, m = 10, p = 1
  RADI—Penzl | 139 | 139 | 1048.60
  RADI—Ham, ℓ = 2p | 97 | 97 | 617.62
  RADI—Ham, ℓ = 6p | 81 | 81 | 506.37
  RADI—Ham, ℓ = ∞ | 72 | 72 | 446.64
  RADI—Ham+Opt, ℓ = 2p | 101 | 101 | 621.38
  RADI—Ham+Opt, ℓ = 6p | 93 | 93 | 571.34
  RADI—Ham+Opt, ℓ = ∞ | 63 | 63 | 387.43
  RKSM—adaptive | 73 | 73 | 338.78
  EBA | 81 | 162 | 30.45
  Newton-ADI | 2 outer, 288 inner | 968 | 1546.29

CHIP, n = 20082, m = 1, p = 5
  RADI—Penzl | 33 | 165 | 51.57
  RADI—Ham, ℓ = 2p | 36 | 180 | 30.32
  RADI—Ham, ℓ = 6p | 29 | 145 | 24.36
  RADI—Ham, ℓ = ∞ | 26 | 130 | 22.64
  RADI—Ham+Opt, ℓ = 2p | 29 | 145 | 23.97
  RADI—Ham+Opt, ℓ = 6p | 26 | 130 | 22.26
  RADI—Ham+Opt, ℓ = ∞ | 25 | 125 | 22.33
  RKSM—adaptive | 26 | 130 | 23.33
  EBA | 26 | 260 | 6.69
  Newton-ADI | 2 outer, 64 inner | 204 | 54.04

IFISS, n = 66049, m = p = 5
  RADI—Penzl | >50 | >250 |
  RADI—Ham, ℓ = 2p | 22 | 110 | 17.21
  RADI—Ham, ℓ = 6p | 19 | 95 | 15.37
  RADI—Ham, ℓ = ∞ | 20 | 100 | 17.46
  RADI—Ham+Opt, ℓ = 2p | 27 | 135 | 21.12
  RADI—Ham+Opt, ℓ = 6p | Did not converge | |
  RADI—Ham+Opt, ℓ = ∞ | Did not converge | |
  RKSM—adaptive | 26 | 130 | 22.28
  EBA | 11 | 110 | 9.26
  Newton-ADI | 2 outer, 46 inner | 250 | 38.05

RAIL, n = 317377, m = 7, p = 6
  RADI—Penzl | 66 | 396 | 182.60
  RADI—Ham, ℓ = 2p | 49 | 294 | 131.34
  RADI—Ham, ℓ = 6p | 43 | 258 | 127.11
  RADI—Ham, ℓ = ∞ | 46 | 276 | 197.06
  RADI—Ham+Opt, ℓ = 2p | 46 | 276 | 124.13
  RADI—Ham+Opt, ℓ = 6p | 40 | 240 | 120.04
  RADI—Ham+Opt, ℓ = ∞ | 39 | 234 | 158.89
  RKSM—adaptive | 41 | 246 | 188.60
  EBA | 91 | 1092 | 916.21
  Newton-ADI | 1 outer, 62 inner | 372 | 279.90

LUNG, n = 109460, m = p = 10
  RADI—Penzl | Did not converge | |
  RADI—Ham, ℓ = 2p | 31 | 310 | 30.03
  RADI—Ham, ℓ = 6p | 28 | 280 | 30.22
  RADI—Ham, ℓ = ∞ | 26 | 260 | 34.83
  RADI—Ham+Opt, ℓ = 2p | 25 | 250 | 22.33
  RADI—Ham+Opt, ℓ = 6p | 17 | 170 | 17.74
  RADI—Ham+Opt, ℓ = ∞ | 17 | 170 | 19.02
  RKSM—adaptive | 61 | 610 | 114.22
  EBA | Did not converge | |
  Newton-ADI | Did not converge | |

Numbers in bold indicate the smallest subspace dimensions and execution times

Example 3 Next, we run the Riccati solvers on the well-known benchmark example CHIP. All coefficient matrices for the Riccati equation are taken as they are found in the Oberwolfach Model Reduction Benchmark Collection [17]. Here we solve the generalized Riccati equation (25).

The cost of precomputing shifts is very high in the case of CHIP. One fact not visible in the table: all algorithms that compute shifts dynamically have already solved the Riccati equation before "RADI-Penzl" has even started iterating.

Example 4 We use the IFISS 3.2 finite element package [31] to generate the coefficient matrices for a generalized Riccati equation. We choose the provided example T-CD 2, which represents a finite element discretization of a two-dimensional convection-diffusion equation on a square domain. The leading dimension is n = 66049, with E symmetric positive definite and A nonsymmetric. The matrix B consists of m = 5 randomly generated columns, and C = [C1, 0] with random C1 ∈ R^{5×5} (p = 5).
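For concreteness, the random factors of this example can be generated as in the following sketch; the IFISS matrices E and A are assumed to be exported to the scripting environment beforehand, and the seed is our choice.

```python
import numpy as np

# The IFISS matrices E and A (n = 66049) are assumed to be available;
# only the random factors described in the text are generated here.
rng = np.random.default_rng(0)              # fixed seed, our choice
n, m, p = 66049, 5, 5
B = rng.standard_normal((n, m))             # m randomly generated columns
C1 = rng.standard_normal((p, p))            # random leading block
C = np.hstack([C1, np.zeros((p, n - p))])   # C = [C1, 0] as described above
```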

In this example, the RADI iteration with Penzl shifts converges very slowly. The RADI iterations with dynamically generated shifts are quite fast, and their final subspace dimension is the smallest among all methods. On the other hand, the version with residual minimizing shifts does not converge for ℓ = 6p and ℓ = ∞: it quickly reaches a relative residual of about 10^{-7}, and then gets stuck by continually using shifts very close to zero. Figure 3 shows the convergence history for some of the shift strategies used.

Example 5 The example RAIL is a larger version of the steel profile cooling model from the Oberwolfach Model Reduction Benchmark Collection [17]. A finer finite element discretization of the heat equation was used, resulting in a generalized CARE with n = 317377, m = 7, p = 6, and E and A symmetric positive and negative definite, respectively. Once again, there is a trade-off between (questionably) better shifts with larger ℓ and faster computation with lower ℓ.

Example 6 The final example LUNG, from the UF Sparse Matrix Collection [12], models temperature and water vapor transport in the human lung. It provides matrices with leading dimension n = 109460, E = I, and A nonsymmetric; B and C are generated as random matrices with m = p = 10. This example shows the importance of proper shift generation: precomputed shifts are completely useless, while dynamically generated ones show different rates of success. The projection-based methods (RKSM, EBA) encountered problems in the numerical solution of the projected ARE: either the complete algorithm broke down, or the convergence speed was reduced. Similar issues were encountered at the Galerkin acceleration stage of the Newton-ADI method.
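A possible setup of this example is sketched below; the file name lung2.mat and the SuiteSparse struct layout are assumptions on our part, while the dimensions follow the description above.

```python
import numpy as np
import scipy.io as sio

# The file name and struct layout follow the usual SuiteSparse .mat export
# and are assumptions; only n = 109460 and m = p = 10 come from the text.
mat = sio.loadmat("lung2.mat")
A = mat["Problem"]["A"][0, 0].tocsr()       # nonsymmetric system matrix, E = I
n, m, p = A.shape[0], 10, 10
rng = np.random.default_rng(0)
B = rng.standard_normal((n, m))             # random input matrix
C = rng.standard_normal((p, n))             # random output matrix
```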

Fig. 3  Algorithm performance for benchmark IFISS (n = 66049, m = p = 5): (a) relative residual versus the subspace dimension used by an algorithm; (b) relative residual versus time

Let us summarize the findings from these and a number of other numerical examples we used to test the algorithms. Clearly, using dynamically generated shifts for the RADI iteration has many benefits compared to the precomputed Penzl shifts: not only are the number of iterations and the running time reduced, but the convergence is also more reliable. Further, there frequently exists a small value of ℓ for which one or both of the dynamic shift strategies converge in a number of iterations comparable to runs with ℓ = ∞, and in far less time. However, an a priori method of determining a sufficiently small ℓ with such properties is still to be found, and it is a topic of our future research. RADI appears to be quite competitive with other state-of-the-art algorithms for solving large-scale CAREs. It frequently generates solutions using the lowest-dimensional subspace. Since RADI iterations require neither orthogonalization nor the solution of projected CAREs, the algorithm may outperform RKSM and EBA on problems where the final subspace dimension is high. On the other hand, the latter methods may have an advantage when the running time is dominated by solving linear systems. It seems that, for now, there is no single algorithm of choice that would consistently and reliably run fastest.

    6 Conclusion

In this paper, we have presented a new low-rank RADI algorithm for computing solutions of large-scale Riccati equations. We have shown that this algorithm produces exactly the same iterates as three previously known methods (for which we suggest the common name "Riccati ADI methods"), but it does so in a computationally far more efficient way. As with other Riccati solvers, the performance depends heavily on the choice of shift parameters. We have suggested several strategies for how this may be done; some of them show very promising results, making the RADI algorithm competitive with the fastest large-scale Riccati solvers.

Acknowledgements Open access funding provided by Max Planck Society. Part of this work was done while the second author was a postdoctoral researcher at the Max Planck Institute Magdeburg, Germany. The second author would also like to acknowledge the support of the Croatian Science Foundation under Grant HRZZ-9345.



Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

    References

1. Amodei, L., Buchot, J.M.: An invariant subspace method for large-scale algebraic Riccati equation. Appl. Numer. Math. 60(11), 1067–1082 (2010). doi:10.1016/j.apnum.2009.09.006
2. Benner, P.: Theory and numerical solution of differential and algebraic Riccati equations. In: Benner, P., Bollhöfer, M., Kressner, D., Mehl, C., Stykel, T. (eds.) Numerical Algebra, Matrix Theory, Differential-Algebraic Equations and Control Theory, pp. 67–105. Springer, Berlin (2015). doi:10.1007/978-3-319-15260-8_4
3. Benner, P., Bujanović, Z.: On the solution of large-scale algebraic Riccati equations by using low-dimensional invariant subspaces. Linear Algebra Appl. 488, 430–459 (2016). doi:10.1016/j.laa.2015.09.027
4. Benner, P., Kürschner, P., Saak, J.: An improved numerical method for balanced truncation for symmetric second order systems. Math. Comput. Model. Dyn. Syst. 19(6), 593–615 (2013). doi:10.1080/13873954.2013.794363
5. Benner, P., Kürschner, P., Saak, J.: Efficient handling of complex shift parameters in the low-rank Cholesky factor ADI method. Numer. Algorithms 62(2), 225–251 (2013). doi:10.1007/s11075-012-9569-7
6. Benner, P., Kürschner, P., Saak, J.: Self-generating and efficient shift parameters in ADI methods for large Lyapunov and Sylvester equations. Electron. Trans. Numer. Anal. 43, 142–162 (2014)
7. Benner, P., Li, J.R., Penzl, T.: Numerical solution of large Lyapunov equations, Riccati equations, and linear-quadratic control problems. Numer. Linear Algebra Appl. 15(9), 755–777 (2008)
8. Benner, P., Li, R.C., Truhar, N.: On the ADI method for Sylvester equations. J. Comput. Appl. Math. 233(4), 1035–1045 (2009)
9. Benner, P., Saak, J.: A Galerkin-Newton-ADI method for solving large-scale algebraic Riccati equations. Preprint SPP1253-090, SPP1253 (2010). http://www.am.uni-erlangen.de/home/spp1253/wiki/index.php/Preprints
10. Benner, P., Saak, J.: Numerical solution of large and sparse continuous time algebraic matrix Riccati and Lyapunov equations: a state of the art survey. GAMM Mitt. 36(1), 32–52 (2013). doi:10.1002/gamm.201310003
11. Bini, D., Iannazzo, B., Meini, B.: Numerical Solution of Algebraic Riccati Equations. Fundamentals of Algorithms. SIAM, Philadelphia (2012)
12. Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1), 1:1–1:25 (2011). doi:10.1145/2049662.2049663
13. Druskin, V., Simoncini, V.: Adaptive rational Krylov subspaces for large-scale dynamical systems. Syst. Control Lett. 60(8), 546–560 (2011). doi:10.1016/j.sysconle.2011.04.013
14. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013)
15. Heinkenschloss, M., Weichelt, H.K., Benner, P., Saak, J.: An inexact low-rank Newton-ADI method for large-scale algebraic Riccati equations. Appl. Numer. Math. 108, 125–142 (2016)
16. Heyouni, M., Jbilou, K.: An extended block Arnoldi algorithm for large-scale solutions of the continuous-time algebraic Riccati equation. Electron. Trans. Numer. Anal. 33, 53–62 (2009)
17. Korvink, J.G., Rudnyi, E.B.: Oberwolfach benchmark collection. In: Benner, P., Sorensen, D.C., Mehrmann, V. (eds.) Dimension Reduction of Large-Scale Systems, Lecture Notes in Computational Science and Engineering, vol. 45, pp. 311–315. Springer, Berlin (2005). doi:10.1007/3-540-27909-1_11
18. Kürschner, P.: Efficient low-rank solution of large-scale matrix equations. Ph.D. thesis, Otto-von-Guericke-Universität Magdeburg (2016)
19. Lancaster, P., Rodman, L.: The Algebraic Riccati Equation. Oxford University Press, Oxford (1995)
20. Laub, A.J.: A Schur method for solving algebraic Riccati equations. IEEE Trans. Autom. Control AC-24, 913–921 (1979)
21. Levenberg, N., Reichel, L.: A generalized ADI iterative method. Numer. Math. 66(1), 215–233 (1993). doi:10.1007/BF01385695
22. Li, J.R., White, J.: Low rank solution of Lyapunov equations. SIAM J. Matrix Anal. Appl. 24(1), 260–280 (2002)
23. Lin, Y., Simoncini, V.: A new subspace iteration method for the algebraic Riccati equation. Numer. Linear Algebra Appl. 22(1), 26–47 (2015). doi:10.1002/nla.1936
24. Massoudi, A., Opmeer, M.R., Reis, T.: Analysis of an iteration method for the algebraic Riccati equation. SIAM J. Matrix Anal. Appl. 37(2), 624–648 (2016). doi:10.1137/140985792
25. Mehrmann, V., Tan, E.: Defect correction methods for the solution of algebraic Riccati equations. IEEE Trans. Autom. Control 33, 695–698 (1988)
26. Overton, M.L.: Large-scale optimization of eigenvalues. SIAM J. Optim. 2(1), 88–120 (1992). doi:10.1137/0802007
27. Penzl, T.: Lyapack users guide. Technical Report SFB393/00-33, Sonderforschungsbereich 393, Numerische Simulation auf massiv parallelen Rechnern, TU Chemnitz, 09107 Chemnitz, Germany (2000). http://www.tu-chemnitz.de/sfb393/sfb00pr.html
28. Ruhe, A.: The rational Krylov algorithm for nonsymmetric eigenvalue problems. III: Complex shifts for real matrices. BIT 34, 165–176 (1994)
29. Saak, J.: Efficient numerical solution of large scale algebraic matrix equations in PDE control and model order reduction. Ph.D. thesis, TU Chemnitz (2009). http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200901642
30. Sabino, J.: Solution of large-scale Lyapunov equations via the block modified Smith method. Ph.D. thesis, Rice University, Houston, Texas (2007). http://www.caam.rice.edu/tech_reports/2006/TR06-08.pdf
31. Silvester, D., Elman, H., Ramage, A.: Incompressible Flow and Iterative Solver Software (IFISS), version 3.2 (2012). http://www.manchester.ac.uk/ifiss
32. Simoncini, V.: A new iterative method for solving large-scale Lyapunov matrix equations. SIAM J. Sci. Comput. 29(3), 1268–1288 (2007). doi:10.1137/06066120X
33. Simoncini, V.: Computational methods for linear matrix equations. SIAM Rev. 58(3), 377–441 (2016)
34. Simoncini, V., Szyld, D., Monsalve, M.: On two numerical methods for the solution of large-scale algebraic Riccati equations. IMA J. Numer. Anal. 34(3), 904–920 (2014)
35. Wachspress, E.: Iterative solution of the Lyapunov matrix equation. Appl. Math. Lett. 1, 87–90 (1988)
36. Wachspress, E.: The ADI Model Problem. Springer, New York (2013). doi:10.1007/978-1-4614-5122-8
37. Wong, N., Balakrishnan, V.: Quadratic alternating direction implicit iteration for the fast solution of algebraic Riccati equations. In: Proceedings of the International Symposium on Intelligent Signal Processing and Communication Systems, pp. 373–376 (2005)
38. Wong, N., Balakrishnan, V.: Fast positive-real balanced truncation via quadratic alternating direction implicit iteration. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 26(9), 1725–1731 (2007)
