A. Further Background on Gauss Quadrature

We present below a more detailed summary of material on Gauss quadrature to make the paper self-contained.

    A.1. Selecting weights and nodes

We have described that the Riemann–Stieltjes integral can be expressed as
\[
I[f] := Q_n + R_n = \sum_{i=1}^{n} \omega_i f(\theta_i) + \sum_{i=1}^{m} \nu_i f(\tau_i) + R_n[f],
\]
where $Q_n$ denotes the $n$th-degree approximation and $R_n$ denotes a remainder term. The weights $\{\omega_i\}_{i=1}^{n}$, $\{\nu_i\}_{i=1}^{m}$ and nodes $\{\theta_i\}_{i=1}^{n}$ are chosen such that for all polynomials of degree less than $2n+m-1$, denoted $f \in \mathcal{P}_{2n+m-1}$, we have exact interpolation $I[f] = Q_n$. One way to compute the weights and nodes is to set $f(x) = x^i$ for $i \le 2n+m-1$ and then solve the resulting nonlinear system exactly. But there is an easier way to obtain weights and nodes, namely by using polynomials orthogonal with respect to the measure $\alpha$. Specifically, we construct a sequence of orthogonal polynomials $p_0(\lambda), p_1(\lambda), \ldots$ such that $p_k(\lambda)$ is a polynomial in $\lambda$ of degree exactly $k$, and $p_i$, $p_j$ are orthogonal, i.e., they satisfy
\[
\int_{\lambda_{\min}}^{\lambda_{\max}} p_i(\lambda)\, p_j(\lambda)\, d\alpha(\lambda) =
\begin{cases}
1, & i = j,\\
0, & \text{otherwise.}
\end{cases}
\]
The roots of $p_n$ are distinct, real, and lie in the interval $[\lambda_{\min}, \lambda_{\max}]$; they form the nodes $\{\theta_i\}_{i=1}^{n}$ for Gauss quadrature (see, e.g., (Golub & Meurant, 2009, Ch. 6)).

Consider the two monic polynomials whose roots serve as quadrature nodes:
\[
\pi_n(\lambda) = \prod_{i=1}^{n} (\lambda - \theta_i), \qquad \rho_m(\lambda) = \prod_{i=1}^{m} (\lambda - \tau_i),
\]
where $\rho_0 \equiv 1$ for consistency. We further denote $\rho_m^{+} = \pm\rho_m$, where the sign is taken to ensure $\rho_m^{+} \ge 0$ on $[\lambda_{\min}, \lambda_{\max}]$. Then, for $m > 0$, we calculate the quadrature weights as
\[
\omega_i = I\!\left[\frac{\rho_m^{+}(\lambda)\,\pi_n(\lambda)}{\rho_m^{+}(\theta_i)\,\pi_n'(\theta_i)\,(\lambda - \theta_i)}\right], \qquad
\nu_j = I\!\left[\frac{\rho_m^{+}(\lambda)\,\pi_n(\lambda)}{(\rho_m^{+})'(\tau_j)\,\pi_n(\tau_j)\,(\lambda - \tau_j)}\right],
\]
where $f'(\lambda)$ denotes the derivative of $f$ with respect to $\lambda$. When $m = 0$ the quadrature degenerates to Gauss quadrature and we have
\[
\omega_i = I\!\left[\frac{\pi_n(\lambda)}{\pi_n'(\theta_i)\,(\lambda - \theta_i)}\right].
\]

Although we have specified how to select nodes and weights for quadrature, these ideas cannot be applied to our problem because the measure $\alpha$ is unknown. Indeed, calculating the measure explicitly would require knowing the entire spectrum of $A$, which is as good as explicitly computing $f(A)$, hence untenable for us. The next section shows how to circumvent the difficulties due to the unknown $\alpha$.

    A.2. Gauss Quadrature Lanczos (GQL)

The key idea to circumvent our lack of knowledge of $\alpha$ is to recursively construct polynomials called Lanczos polynomials. The construction ensures their orthogonality with respect to $\alpha$. Concretely, we construct Lanczos polynomials via the following three-term recurrence:
\[
\beta_i p_i(\lambda) = (\lambda - \alpha_i)\,p_{i-1}(\lambda) - \beta_{i-1} p_{i-2}(\lambda), \quad i = 1, 2, \ldots, n; \qquad p_{-1}(\lambda) \equiv 0, \;\; p_0(\lambda) \equiv 1, \qquad \text{(A.1)}
\]
while ensuring $\int_{\lambda_{\min}}^{\lambda_{\max}} d\alpha(\lambda) = 1$. We can express (A.1) in matrix form by writing
\[
\lambda P_n(\lambda) = J_n P_n(\lambda) + \beta_n p_n(\lambda) e_n,
\]


where $P_n(\lambda) := [p_0(\lambda), \ldots, p_{n-1}(\lambda)]^\top$, $e_n$ is the $n$th canonical unit vector, and $J_n$ is the tridiagonal matrix
\[
J_n = \begin{bmatrix}
\alpha_1 & \beta_1 & & & \\
\beta_1 & \alpha_2 & \beta_2 & & \\
 & \beta_2 & \ddots & \ddots & \\
 & & \ddots & \alpha_{n-1} & \beta_{n-1} \\
 & & & \beta_{n-1} & \alpha_n
\end{bmatrix}. \qquad \text{(A.2)}
\]

This matrix is known as the Jacobi matrix, and it is closely related to Gauss quadrature. The following well-known theorem makes this relation precise.

Theorem 10 ((Wilf, 1962; Golub & Welsch, 1969)). The eigenvalues of $J_n$ form the nodes $\{\theta_i\}_{i=1}^{n}$ of Gauss-type quadratures. The weights $\{\omega_i\}_{i=1}^{n}$ are given by the squares of the first elements of the normalized eigenvectors of $J_n$.

Thus, if $J_n$ has the eigendecomposition $J_n = P_n^\top \Lambda P_n$, then for Gauss quadrature Theorem 10 yields
\[
Q_n = \sum_{i=1}^{n} \omega_i f(\theta_i) = e_1^\top P_n^\top f(\Lambda) P_n e_1 = e_1^\top f(J_n) e_1. \qquad \text{(A.3)}
\]
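To make Theorem 10 and (A.3) concrete, here is a minimal NumPy sketch of the Golub–Welsch procedure for the normalized Legendre measure $d\alpha(\lambda) = d\lambda/2$ on $[-1, 1]$, chosen only for illustration because its recurrence coefficients $\alpha_i = 0$, $\beta_i = i/\sqrt{4i^2 - 1}$ are known in closed form; the matrix size and test function are our own choices:

```python
import numpy as np

# Golub-Welsch for the normalized Legendre measure on [-1, 1]:
# recurrence coefficients are alpha_i = 0, beta_i = i / sqrt(4 i^2 - 1).
n = 8
beta = np.array([i / np.sqrt(4.0 * i**2 - 1.0) for i in range(1, n)])
J = np.diag(beta, 1) + np.diag(beta, -1)    # Jacobi matrix (A.2) with all alpha_i = 0

theta, P = np.linalg.eigh(J)                # nodes = eigenvalues of J_n (Theorem 10)
w = P[0, :] ** 2                            # weights = squared first eigenvector entries

f = lambda x: 1.0 / (x + 3.0)               # a smooth test function on [-1, 1]
Q_n = np.sum(w * f(theta))                  # sum_i omega_i f(theta_i)
e1 = np.zeros(n); e1[0] = 1.0
Q_n_alt = e1 @ (P @ np.diag(f(theta)) @ P.T) @ e1   # e_1^T f(J_n) e_1, as in (A.3)

print(Q_n, Q_n_alt, 0.5 * np.log(2.0))      # both match the exact integral log(2)/2
```

The two expressions for $Q_n$ agree to machine precision, and for this smooth integrand eight nodes already reproduce the exact value to roughly twelve digits.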

Specialization. We now specialize to our main focus, $f(A) = A^{-1}$, for which we prove more precise results. In this case, (A.3) becomes $Q_n = [J_n^{-1}]_{1,1}$. The task now is to compute $Q_n$ and, given $A$ and $u$, to obtain the Jacobi matrix $J_n$.

Fortunately, we can efficiently calculate $J_n$ iteratively using the Lanczos algorithm (Lanczos, 1950). Suppose we have an estimate $J_i$; in iteration $(i+1)$ of Lanczos, we compute the tridiagonal coefficients $\alpha_{i+1}$ and $\beta_{i+1}$ and add them to this estimate to form $J_{i+1}$. As to $Q_n$, assuming we have already computed $[J_i^{-1}]_{1,1}$, letting $j_i = J_i^{-1} e_i$ and invoking the Sherman–Morrison identity (Sherman & Morrison, 1950) we obtain the recursion:
\[
[J_{i+1}^{-1}]_{1,1} = [J_i^{-1}]_{1,1} + \frac{\beta_i^2\,([j_i]_1)^2}{\alpha_{i+1} - \beta_i^2\,[j_i]_i}, \qquad \text{(A.4)}
\]
where $[j_i]_1$ and $[j_i]_i$ can be recursively computed using a Cholesky-like factorization of $J_i$ (Golub & Meurant, 2009, p. 31).
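Recursion (A.4) is just the block-inverse (Schur-complement) update for the bordered matrix $J_{i+1}$. The following small NumPy check, with arbitrary made-up tridiagonal coefficients, is a sketch verifying it numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
alpha = rng.uniform(2.0, 3.0, size=n)      # diagonal entries (arbitrary test values)
beta = rng.uniform(0.1, 0.5, size=n - 1)   # off-diagonal entries

def J_of(i):
    """Leading i-by-i Jacobi matrix built from alpha, beta."""
    return np.diag(alpha[:i]) + np.diag(beta[:i - 1], 1) + np.diag(beta[:i - 1], -1)

for i in range(1, n):
    Ji = J_of(i)
    ji = np.linalg.solve(Ji, np.eye(i)[:, -1])   # j_i = J_i^{-1} e_i
    lhs = np.linalg.inv(J_of(i + 1))[0, 0]       # [J_{i+1}^{-1}]_{1,1} directly
    rhs = (np.linalg.inv(Ji)[0, 0]               # recursion (A.4)
           + beta[i - 1] ** 2 * ji[0] ** 2 / (alpha[i] - beta[i - 1] ** 2 * ji[-1]))
    assert np.isclose(lhs, rhs)
```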

For Gauss–Radau quadrature, we need to modify $J_i$ so that it has a prescribed eigenvalue. More precisely, we extend $J_i$ to $J_i^{lr}$ for left Gauss–Radau ($J_i^{rr}$ for right Gauss–Radau) with $\beta_i$ on the off-diagonal and $\alpha_i^{lr}$ ($\alpha_i^{rr}$) on the diagonal, so that $J_i^{lr}$ ($J_i^{rr}$) has the prescribed eigenvalue $\lambda_{\min}$ ($\lambda_{\max}$).

For Gauss–Lobatto quadrature, we extend $J_i$ to $J_i^{lo}$ with values $\beta_i^{lo}$ and $\alpha_i^{lo}$ chosen to ensure that $J_i^{lo}$ has the prescribed eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$. For more details on the construction, see (Golub, 1973).

For all methods, the approximated values are calculated as $[(J_i')^{-1}]_{1,1}$, where $J_i' \in \{J_i^{lr}, J_i^{rr}, J_i^{lo}\}$ is the modified Jacobi matrix. Here $J_i'$ is constructed at the $i$-th iteration of the algorithm.

The algorithm for computing Gauss, Gauss–Radau, and Gauss–Lobatto quadrature rules with the help of the Lanczos iteration is called Gauss Quadrature Lanczos (GQL) and is shown in (Golub & Meurant, 1997). We recall its pseudocode in Algorithm 5 to make our presentation self-contained (and for our proofs in Section 4).

The error of approximating $I[f]$ by Gauss-type quadratures can be expressed as
\[
R_n[f] = \frac{f^{(2n+m)}(\xi)}{(2n+m)!}\, I[\rho_m \pi_n^2],
\]
for some $\xi \in [\lambda_{\min}, \lambda_{\max}]$ (see, e.g., (Stoer & Bulirsch, 2013)). Note that $\rho_m$ does not change sign in $[\lambda_{\min}, \lambda_{\max}]$; but with different values of $m$ and $\tau_j$ we obtain different (but fixed) signs for $R_n[f]$ using $f(\lambda) = 1/\lambda$ and $\lambda_{\min} > 0$. Concretely, for Gauss quadrature $m = 0$ and $R_n[f] \ge 0$; for left Gauss–Radau $m = 1$ and $\tau_1 = \lambda_{\min}$, so we have $R_n[f] \le 0$; for right Gauss–Radau we have $m = 1$ and $\tau_1 = \lambda_{\max}$, thus $R_n[f] \ge 0$; while for Gauss–Lobatto we have $m = 2$, $\tau_1 = \lambda_{\min}$ and $\tau_2 = \lambda_{\max}$, so that $R_n[f] \le 0$. This behavior of the errors clearly shows the ordering relations between the target values and the approximations made by the different quadrature rules. Lemma 11 (see, e.g., (Meurant, 1997)) makes this claim precise.


Algorithm 5 Gauss Quadrature Lanczos (GQL)
Input: $u$ and $A$ the corresponding vector and matrix; $\lambda_{\min}$ and $\lambda_{\max}$ lower and upper bounds for the spectrum of $A$
Output: $g_i$, $g_i^{rr}$, $g_i^{lr}$ and $g_i^{lo}$, the Gauss, right Gauss–Radau, left Gauss–Radau and Gauss–Lobatto quadratures computed at the $i$-th iteration
Initialize: $u_{-1} = 0$, $u_0 = u/\|u\|$, $\alpha_1 = u_0^\top A u_0$, $\beta_1 = \|(A - \alpha_1 I)u_0\|$, $g_1 = \|u\|/\alpha_1$, $c_1 = 1$, $\delta_1 = \alpha_1$, $\delta_1^{lr} = \alpha_1 - \lambda_{\min}$, $\delta_1^{rr} = \alpha_1 - \lambda_{\max}$, $u_1 = (A - \alpha_1 I)u_0/\beta_1$, $i = 2$
while $i \le N$ do
  $\alpha_i = u_{i-1}^\top A u_{i-1}$ {Lanczos iteration}
  $\tilde{u}_i = A u_{i-1} - \alpha_i u_{i-1} - \beta_{i-1} u_{i-2}$
  $\beta_i = \|\tilde{u}_i\|$
  $u_i = \tilde{u}_i/\beta_i$
  $g_i = g_{i-1} + \|u\|\,\beta_{i-1}^2 c_{i-1}^2 \,/\, \big(\delta_{i-1}(\alpha_i\delta_{i-1} - \beta_{i-1}^2)\big)$ {update $g_i$ with the Sherman–Morrison formula}
  $c_i = c_{i-1}\beta_{i-1}/\delta_{i-1}$
  $\delta_i = \alpha_i - \beta_{i-1}^2/\delta_{i-1}$,\quad $\delta_i^{lr} = \alpha_i - \lambda_{\min} - \beta_{i-1}^2/\delta_{i-1}^{lr}$,\quad $\delta_i^{rr} = \alpha_i - \lambda_{\max} - \beta_{i-1}^2/\delta_{i-1}^{rr}$
  $\alpha_i^{lr} = \lambda_{\min} + \beta_i^2/\delta_i^{lr}$,\quad $\alpha_i^{rr} = \lambda_{\max} + \beta_i^2/\delta_i^{rr}$ {solve for $J_i^{lr}$ and $J_i^{rr}$}
  $\alpha_i^{lo} = \frac{\delta_i^{lr}\delta_i^{rr}}{\delta_i^{rr} - \delta_i^{lr}}\big(\frac{\lambda_{\max}}{\delta_i^{lr}} - \frac{\lambda_{\min}}{\delta_i^{rr}}\big)$,\quad $(\beta_i^{lo})^2 = \frac{\delta_i^{lr}\delta_i^{rr}}{\delta_i^{rr} - \delta_i^{lr}}(\lambda_{\max} - \lambda_{\min})$ {solve for $J_i^{lo}$}
  $g_i^{lr} = g_i + \beta_i^2 c_i^2\|u\| \,/\, \big(\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)\big)$,\quad $g_i^{rr} = g_i + \beta_i^2 c_i^2\|u\| \,/\, \big(\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2)\big)$,\quad $g_i^{lo} = g_i + (\beta_i^{lo})^2 c_i^2\|u\| \,/\, \big(\delta_i(\alpha_i^{lo}\delta_i - (\beta_i^{lo})^2)\big)$ {update $g_i^{rr}$, $g_i^{lr}$ and $g_i^{lo}$ with the Sherman–Morrison formula}
  $i = i + 1$
end while
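For concreteness, here is a minimal NumPy sketch of the GQL idea. Instead of the $O(1)$ scalar recursions of Algorithm 5 it forms each small $J_i$ explicitly and obtains the Gauss–Radau modification by prescribing an eigenvalue through a bordered-matrix solve, which is the construction from (Golub, 1973) that the recursions implement; it uses the normalization $u^\top f(A)u = \|u\|^2 e_1^\top f(J_i) e_1$, and the function names and demo matrix are our own:

```python
import numpy as np

def gql_bounds(A, u, lam_min, lam_max, max_iter):
    """Sketch of GQL: per Lanczos step, return (g_i, g_i^rr, g_i^lr), where
    g_i, g_i^rr lower-bound and g_i^lr upper-bounds u^T A^{-1} u (Lemma 11)."""
    unorm2 = u @ u
    q_prev, q, beta = np.zeros_like(u), u / np.sqrt(unorm2), 0.0
    alphas, betas, out = [], [], []
    for _ in range(max_iter):
        w = A @ q - beta * q_prev                 # Lanczos three-term recurrence
        a = q @ w; w -= a * q
        alphas.append(a)
        i = len(alphas)
        J = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        e1 = np.zeros(i); e1[0] = 1.0
        g = unorm2 * np.linalg.solve(J, e1)[0]    # Gauss: ||u||^2 [J_i^{-1}]_{1,1}
        beta = np.linalg.norm(w)
        ei = np.zeros(i); ei[-1] = 1.0
        def radau(z):
            # Border J so that z is an eigenvalue of the extended matrix J_i'.
            phi = z + beta ** 2 * np.linalg.solve(J - z * np.eye(i), ei)[-1]
            Jx = np.block([[J, beta * ei[:, None]], [beta * ei[None, :], phi]])
            e1x = np.zeros(i + 1); e1x[0] = 1.0
            return unorm2 * np.linalg.solve(Jx, e1x)[0]
        out.append((g, radau(lam_max), radau(lam_min)))   # (g_i, g_i^rr, g_i^lr)
        if beta < 1e-12:                          # Krylov space exhausted
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    return out

# Demo: the iterates sandwich u^T A^{-1} u and tighten monotonically.
rng = np.random.default_rng(0)
M = rng.standard_normal((40, 40))
A = M @ M.T + 40 * np.eye(40)                     # a well-conditioned SPD matrix
u = rng.standard_normal(40)
lam = np.linalg.eigvalsh(A)
truth = u @ np.linalg.solve(A, u)
for g, g_rr, g_lr in gql_bounds(A, u, lam[0], lam[-1], 6):
    print(f"{g:.6f} <= {g_rr:.6f} <= {truth:.6f} <= {g_lr:.6f}")
```

In floating-point arithmetic a practical implementation would also re-orthogonalize the Lanczos vectors; the sketch omits this for brevity.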

Lemma 11. Let $g_i$, $g_i^{lr}$, $g_i^{rr}$, and $g_i^{lo}$ be the approximations at the $i$-th iteration of Gauss, left Gauss–Radau, right Gauss–Radau, and Gauss–Lobatto quadrature, respectively. Then, $g_i$ and $g_i^{rr}$ provide lower bounds on $u^\top A^{-1} u$, while $g_i^{lr}$ and $g_i^{lo}$ provide upper bounds.

The final connection we recall as background is the method of conjugate gradients. This helps us analyze the speed at which quadrature converges to the true value (assuming exact arithmetic).

    A.3. Relation with Conjugate Gradient

While Gauss-type quadratures relate to the Lanczos algorithm, Lanczos itself is closely related to conjugate gradient (CG) (Hestenes & Stiefel, 1952), a well-known method for solving $Ax = b$ for positive definite $A$.

We recap this connection below. Let $x_k$ be the estimated solution at the $k$-th CG iteration. If $x^*$ denotes the true solution to $Ax = b$, then the error $\varepsilon_k$ and residual $r_k$ are defined as
\[
\varepsilon_k := x^* - x_k, \qquad r_k := A\varepsilon_k = b - Ax_k. \qquad \text{(A.5)}
\]

At the $k$-th iteration, $x_k$ is chosen such that $r_k$ is orthogonal to the $k$-th Krylov space, i.e., the linear space $\mathcal{K}_k$ spanned by $\{r_0, Ar_0, \ldots, A^{k-1} r_0\}$. It can be shown (Meurant, 2006) that $r_k$ is a scaled Lanczos vector from the $k$-th iteration of Lanczos started with $r_0$. Noting the relation between Lanczos and Gauss quadrature applied to approximate $r_0^\top A^{-1} r_0$, one obtains the following theorem that relates CG with GQL.

Theorem 12 (CG and GQL; (Meurant, 1999)). Let $\varepsilon_k$ be the error as in (A.5), and let $\|\varepsilon_k\|_A^2 := \varepsilon_k^\top A \varepsilon_k$. Then, it holds that
\[
\|\varepsilon_k\|_A^2 = \|r_0\|^2\left([J_N^{-1}]_{1,1} - [J_k^{-1}]_{1,1}\right),
\]
where $J_k$ is the Jacobi matrix at the $k$-th Lanczos iteration starting with $r_0$.
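A quick numerical sanity check of Theorem 12, with a textbook CG loop and plain Lanczos on a made-up SPD system; by Lemma 15 below, $\|r_0\|^2 [J_N^{-1}]_{1,1} = b^\top A^{-1} b$, which we use for the first term. This is only a sketch, and a robust implementation would re-orthogonalize:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)          # SPD test matrix
b = rng.standard_normal(N)
x_star = np.linalg.solve(A, b)

# Lanczos on (A, r0 = b): collect tridiagonal coefficients for J_k.
q_prev, q, beta = np.zeros(N), b / np.linalg.norm(b), 0.0
alphas, betas = [], []
for _ in range(8):
    w = A @ q - beta * q_prev
    a = q @ w; w -= a * q
    alphas.append(a)
    beta = np.linalg.norm(w)
    betas.append(beta)
    q_prev, q = q, w / beta

def g_k(k):  # ||r0||^2 [J_k^{-1}]_{1,1}
    J = np.diag(alphas[:k]) + np.diag(betas[:k - 1], 1) + np.diag(betas[:k - 1], -1)
    e1 = np.zeros(k); e1[0] = 1.0
    return (b @ b) * np.linalg.solve(J, e1)[0]

# Standard CG from x0 = 0; compare ||eps_k||_A^2 with Theorem 12.
x, r = np.zeros(N), b.copy()
p = r.copy()
for k in range(1, 6):
    Ap = A @ p
    gamma = (r @ r) / (p @ Ap)
    x = x + gamma * p
    r_new = r - gamma * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    eps = x_star - x
    print(k, eps @ A @ eps, b @ x_star - g_k(k))   # the two columns agree
```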

Finally, the rate at which $\|\varepsilon_k\|_A^2$ shrinks has also been well studied, as noted below.

Theorem 13 (CG rate, see e.g. (Shewchuk, 1994)). Let $\varepsilon_k$ be the error made by CG at iteration $k$ when started with $x_0$. Let $\kappa$ be the condition number of $A$, i.e., $\kappa = \lambda_N/\lambda_1$ (with the eigenvalues of $A$ in increasing order). Then, the error norm at iteration $k$ satisfies
\[
\|\varepsilon_k\|_A \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{k} \|\varepsilon_0\|_A.
\]

Due to these explicit relations between CG and Lanczos, as well as between Lanczos and Gauss quadrature, we readily obtain the following convergence rate for the relative error of Gauss quadrature.

Theorem 14 (Gauss quadrature rate). The $i$-th iterate of Gauss quadrature satisfies the relative error bound
\[
\frac{g_N - g_i}{g_N} \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i}.
\]

Proof. This is obtained by exploiting the relations among CG, Lanczos and Gauss quadrature. Set $x_0 = 0$ and $b = u$. Then $\varepsilon_0 = x^*$ and $r_0 = u$. An application of Theorem 12 and Theorem 13 thus yields the bound
\[
\|\varepsilon_i\|_A^2 = \|u\|^2\left([J_N^{-1}]_{1,1} - [J_i^{-1}]_{1,1}\right) = g_N - g_i \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i} \|\varepsilon_0\|_A^2 = 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i} u^\top A^{-1} u = 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i} g_N,
\]
where we used $\|\varepsilon_0\|_A^2 = (x^*)^\top A x^* = u^\top A^{-1} u$, and the last equality draws from Lemma 15.

    In other words, Theorem 14 shows that the iterates of Gauss quadrature converge linearly.
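As a sketch, the linear rate of Theorem 14 is easy to observe numerically; the spectrum below is fabricated so that $\kappa = 10$ is known exactly, and in floating point one would normally re-orthogonalize the Lanczos vectors:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 40
lam = np.linspace(1.0, 10.0, N)             # prescribed spectrum, kappa = 10
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
A = (Q * lam) @ Q.T
u = rng.standard_normal(N)

# Gauss iterates g_i = ||u||^2 [J_i^{-1}]_{1,1} via Lanczos.
g = []
q_prev, q, beta = np.zeros(N), u / np.linalg.norm(u), 0.0
alphas, betas = [], []
for _ in range(20):
    w = A @ q - beta * q_prev
    a = q @ w; w -= a * q
    alphas.append(a)
    J = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    e1 = np.zeros(len(alphas)); e1[0] = 1.0
    g.append((u @ u) * np.linalg.solve(J, e1)[0])
    beta = np.linalg.norm(w)
    betas.append(beta)
    q_prev, q = q, w / beta

gN = u @ np.linalg.solve(A, u)              # exact value (= g_N by Lemma 15)
rate = (np.sqrt(10.0) - 1) / (np.sqrt(10.0) + 1)
for i in (1, 5, 10, 15):
    print(i, (gN - g[i - 1]) / gN, 2 * rate ** i)   # observed error vs bound
```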

B. Proofs for Main Theoretical Results

We begin by proving an exactness property of Gauss and Gauss–Radau quadrature.

Lemma 15 (Exactness). With $A$ symmetric positive definite with simple eigenvalues, the iterates $g_N$, $g_N^{lr}$, and $g_N^{rr}$ are exact. Namely, after $N$ iterations they satisfy
\[
g_N = g_N^{lr} = g_N^{rr} = u^\top A^{-1} u.
\]

Proof. Observe that the Jacobi tridiagonal matrix can be computed via the Lanczos iteration, and Lanczos is essentially an iterative tridiagonalization of $A$. At the $i$-th iteration we have $J_i = V_i^\top A V_i$, where $V_i \in \mathbb{R}^{N\times i}$ contains the first $i$ Lanczos vectors (i.e., a basis for the $i$-th Krylov space). Thus, $J_N = V_N^\top A V_N$ where $V_N$ is an $N \times N$ orthonormal matrix, showing that $J_N$ has the same eigenvalues as $A$. As a result $\pi_N(\lambda) = \prod_{i=1}^{N} (\lambda - \lambda_i)$, and since the measure $\alpha$ is supported on the spectrum of $A$, on which $\pi_N^2$ vanishes, it follows that the remainder
\[
R_N[f] = \frac{f^{(2N)}(\xi)}{(2N)!}\, I[\pi_N^2] = 0,
\]
for some scalar $\xi \in [\lambda_{\min}, \lambda_{\max}]$, which shows that $g_N$ is exact for $u^\top A^{-1} u$. For left and right Gauss–Radau quadrature, we have $\beta_N = 0$, $\alpha_N^{lr} = \lambda_{\min}$, and $\alpha_N^{rr} = \lambda_{\max}$, while all other elements of the $(N+1)$-th row and column of $J_N'$ are zero. Thus, the eigenvalues of $J_N'$ are $\lambda_1, \ldots, \lambda_N, \tau_1$, and $\pi_N(\lambda)$ again equals $\prod_{i=1}^{N} (\lambda - \lambda_i)$. As a result, the remainder satisfies
\[
R_N[f] = \frac{f^{(2N+1)}(\xi)}{(2N+1)!}\, I[(\lambda - \tau_1)\pi_N^2] = 0,
\]
from which it follows that both $g_N^{rr}$ and $g_N^{lr}$ are exact.

The convergence rate in Theorem 13 and the final exactness of the iterates in Lemma 15 do not by themselves indicate that we make progress at each iteration. However, by exploiting the relations to CG we can indeed conclude that Gauss quadrature makes progress in each iteration.

Theorem 16. The approximation $g_i$ generated by Gauss quadrature is monotonically nondecreasing, i.e.,
\[
g_i \le g_{i+1}, \quad \text{for } i < N.
\]


Proof. At each iteration $r_i$ is taken to be orthogonal to the $i$-th Krylov space $\mathcal{K}_i = \mathrm{span}\{u, Au, \ldots, A^{i-1}u\}$. Let $\Pi_i$ be the projection onto the complement of $\mathcal{K}_i$. The residual then satisfies
\[
\|\varepsilon_{i+1}\|_A^2 = \varepsilon_{i+1}^\top A \varepsilon_{i+1} = r_{i+1}^\top A^{-1} r_{i+1} = (\Pi_{i+1} r_i)^\top A^{-1}\, \Pi_{i+1} r_i = r_i^\top\left(\Pi_{i+1}^\top A^{-1} \Pi_{i+1}\right) r_i \le r_i^\top A^{-1} r_i,
\]
where the last inequality follows from $\Pi_{i+1}^\top A^{-1} \Pi_{i+1} \preceq A^{-1}$. Thus $\|\varepsilon_i\|_A^2$ is monotonically nonincreasing, whereby $g_N - g_i \ge 0$ is monotonically nonincreasing and thus $g_i$ is monotonically nondecreasing.

Before we proceed to Gauss–Radau, let us recall a useful theorem and its corollary.

Theorem 17 (Lanczos Polynomial (Golub & Meurant, 2009)). Let $u_i$ be the vector generated by Algorithm 5 at the $i$-th iteration, and let $p_i$ be the Lanczos polynomial of degree $i$. Then we have
\[
u_i = p_i(A)\, u_0, \quad \text{where } p_i(\lambda) = (-1)^i\, \frac{\det(J_i - \lambda I)}{\prod_{j=1}^{i} \beta_j}.
\]

From this expression for the Lanczos polynomial we have the following corollary specifying the sign of the polynomial at specific points.

Corollary 18. Assume $i < N$. If $i$ is odd, then $p_i(\lambda_{\min}) < 0$; for even $i$, $p_i(\lambda_{\min}) > 0$; while $p_i(\lambda_{\max}) > 0$ for any $i < N$.

Proof. Since $J_i = V_i^\top A V_i$ with $V_i$ orthonormal, the spectrum of $J_i$ is bounded by $\lambda_{\min}$ and $\lambda_{\max}$ from left and right; in fact, for $i < N$ the eigenvalues of $J_i$ lie strictly between $\lambda_{\min}$ and $\lambda_{\max}$. Thus $J_i - \lambda_{\min} I$ is positive definite, and $J_i - \lambda_{\max} I$ is negative definite. Taking the factor $(-1)^i$ into consideration we obtain the desired conclusions.

We are ready to state our main result that compares (right) Gauss–Radau with Gauss quadrature.

Theorem 19 (Theorem 4 in the main text). Let $i < N$. Then, $g_i^{rr}$ gives better bounds than $g_i$ but worse bounds than $g_{i+1}$; more precisely,
\[
g_i \le g_i^{rr} \le g_{i+1}, \quad i < N. \qquad \text{(B.1)}
\]

Proof. We prove inequality (B.1) using the recurrences satisfied by $g_i$ and $g_i^{rr}$ (see Algorithm 5).

Upper bound: $g_i^{rr} \le g_{i+1}$. The iterative quadrature algorithm uses the recursive updates
\[
g_i^{rr} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2)}, \qquad
g_{i+1} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)}.
\]
It thus suffices to compare $\alpha_i^{rr}$ and $\alpha_{i+1}$. The three-term recursion for Lanczos polynomials shows that
\[
\beta_{i+1}\, p_{i+1}(\lambda_{\max}) = (\lambda_{\max} - \alpha_{i+1})\, p_i(\lambda_{\max}) - \beta_i\, p_{i-1}(\lambda_{\max}) > 0,
\]
\[
\beta_{i+1}\, p^*_{i+1}(\lambda_{\max}) = (\lambda_{\max} - \alpha_i^{rr})\, p_i(\lambda_{\max}) - \beta_i\, p_{i-1}(\lambda_{\max}) = 0,
\]
where $p_{i+1}$ is the original Lanczos polynomial, and $p^*_{i+1}$ is the modified polynomial that has $\lambda_{\max}$ as a root. Noting that $p_i(\lambda_{\max}) > 0$, we see that $\alpha_{i+1} \le \alpha_i^{rr}$. Moreover, from Theorem 16 we know that the $g_i$'s are monotonically nondecreasing, whereby $\delta_i(\alpha_{i+1}\delta_i - \beta_i^2) > 0$. It follows that
\[
0 < \delta_i(\alpha_{i+1}\delta_i - \beta_i^2) \le \delta_i(\alpha_i^{rr}\delta_i - \beta_i^2),
\]
and from this inequality it is clear that $g_i^{rr} \le g_{i+1}$.

Lower bound: $g_i \le g_i^{rr}$. Since $\beta_i^2 c_i^2 \ge 0$ and $\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2) \ge \delta_i(\alpha_{i+1}\delta_i - \beta_i^2) > 0$, we readily obtain
\[
g_i \le g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2)} = g_i^{rr}.
\]


Combining Theorem 19 with the convergence rate of the relative error for Gauss quadrature (Theorem 14) immediately yields the following convergence rate for right Gauss–Radau quadrature:

Theorem 20 (Relative error of right Gauss–Radau, Theorem 5 in the main text). For each $i$, the right Gauss–Radau iterates $g_i^{rr}$ satisfy
\[
\frac{g_N - g_i^{rr}}{g_N} \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i}.
\]

This result shows that with the same number of iterations, right Gauss–Radau gives a superior approximation over Gauss quadrature, though they share the same relative error convergence rate.

Our second main result compares Gauss–Lobatto with (left) Gauss–Radau quadrature.

Theorem 21 (Theorem 6 in the main text). Let $i < N$. Then, $g_i^{lr}$ gives better upper bounds than $g_i^{lo}$ but worse than $g_{i+1}^{lo}$; more precisely,
\[
g_{i+1}^{lo} \le g_i^{lr} \le g_i^{lo}, \quad i < N.
\]

Proof. We prove these inequalities using the recurrences for $g_i^{lr}$ and $g_i^{lo}$ from Algorithm 5.

$g_i^{lr} \le g_i^{lo}$: From Algorithm 5 we observe that $\alpha_i^{lo} = \lambda_{\min} + (\beta_i^{lo})^2/\delta_i^{lr}$. Thus we can write $g_i^{lr}$ and $g_i^{lo}$ as
\[
g_i^{lr} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)} = g_i + \frac{\beta_i^2 c_i^2}{\lambda_{\min}\delta_i^2 + \beta_i^2(\delta_i^2/\delta_i^{lr} - \delta_i)},
\]
\[
g_i^{lo} = g_i + \frac{(\beta_i^{lo})^2 c_i^2}{\delta_i(\alpha_i^{lo}\delta_i - (\beta_i^{lo})^2)} = g_i + \frac{(\beta_i^{lo})^2 c_i^2}{\lambda_{\min}\delta_i^2 + (\beta_i^{lo})^2(\delta_i^2/\delta_i^{lr} - \delta_i)}.
\]
To compare these quantities, as before it is helpful to begin with the original three-term recursion for the Lanczos polynomial, namely
\[
\beta_{i+1}\, p_{i+1}(\lambda) = (\lambda - \alpha_{i+1})\, p_i(\lambda) - \beta_i\, p_{i-1}(\lambda).
\]
In the construction of Gauss–Lobatto, to make a new polynomial of order $i+1$ that has roots $\lambda_{\min}$ and $\lambda_{\max}$, we add $\gamma_1 p_i(\lambda)$ and $\gamma_2 p_{i-1}(\lambda)$ to the original polynomial to ensure
\[
\beta_{i+1}\, p_{i+1}(\lambda_{\min}) + \gamma_1 p_i(\lambda_{\min}) + \gamma_2 p_{i-1}(\lambda_{\min}) = 0,
\]
\[
\beta_{i+1}\, p_{i+1}(\lambda_{\max}) + \gamma_1 p_i(\lambda_{\max}) + \gamma_2 p_{i-1}(\lambda_{\max}) = 0.
\]
Since $\beta_{i+1}$, $p_{i+1}(\lambda_{\max})$, $p_i(\lambda_{\max})$ and $p_{i-1}(\lambda_{\max})$ are all greater than $0$, we must have $\gamma_1 p_i(\lambda_{\max}) + \gamma_2 p_{i-1}(\lambda_{\max}) < 0$. To determine the sign of the polynomials at $\lambda_{\min}$, consider the two cases:

1. Odd $i$. In this case $p_{i+1}(\lambda_{\min}) > 0$, $p_i(\lambda_{\min}) < 0$, and $p_{i-1}(\lambda_{\min}) > 0$;
2. Even $i$. In this case $p_{i+1}(\lambda_{\min}) < 0$, $p_i(\lambda_{\min}) > 0$, and $p_{i-1}(\lambda_{\min}) < 0$.

Thus, if $S = (\mathrm{sgn}(\gamma_1), \mathrm{sgn}(\gamma_2))$, where the signs take values in $\{0, \pm 1\}$, then $S \ne (1,1)$, $S \ne (-1,1)$ and $S \ne (0,1)$. Hence $\gamma_2 \le 0$ must hold, and thus $(\beta_i^{lo})^2 = (\beta_i - \gamma_2)^2 \ge \beta_i^2$, given that $\beta_i > 0$ for $i < N$.

Using $(\beta_i^{lo})^2 \ge \beta_i^2$ together with $\lambda_{\min} c_i^2 \delta_i^2 \ge 0$, an application of the monotonicity of the univariate function $g(x) = \frac{ax}{b + cx}$ for $ab \ge 0$ to the recurrences defining $g_i^{lr}$ and $g_i^{lo}$ yields the desired inequality $g_i^{lr} \le g_i^{lo}$.

$g_{i+1}^{lo} \le g_i^{lr}$: From the recursion formulas we have
\[
g_i^{lr} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)}, \qquad
g_{i+1}^{lo} = g_{i+1} + \frac{(\beta_{i+1}^{lo})^2 c_{i+1}^2}{\delta_{i+1}(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2)}.
\]


Establishing $g_i^{lr} \ge g_{i+1}^{lo}$ thus amounts to showing that (noting the relations among $g_i$, $g_i^{lr}$ and $g_i^{lo}$):
\begin{align*}
&\frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)} \;\ge\; \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} + \frac{(\beta_{i+1}^{lo})^2 c_{i+1}^2}{\delta_{i+1}\big(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2\big)} \\
\iff\;& \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)} \;\ge\; \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} + \frac{(\beta_{i+1}^{lo})^2 c_i^2 \beta_i^2}{\delta_i^2\,\delta_{i+1}\big(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2\big)} \\
\iff\;& \frac{1}{\alpha_i^{lr}\delta_i - \beta_i^2} - \frac{1}{\alpha_{i+1}\delta_i - \beta_i^2} \;\ge\; \frac{(\beta_{i+1}^{lo})^2}{\delta_i\,\delta_{i+1}\big(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2\big)} \\
\iff\;& \frac{1}{(\alpha_{i+1} - \delta_{i+1}^{lr}) - \beta_i^2/\delta_i} - \frac{1}{\alpha_{i+1} - \beta_i^2/\delta_i} \;\ge\; \frac{1}{\delta_{i+1}\big(\alpha_{i+1}^{lo}\delta_{i+1}/(\beta_{i+1}^{lo})^2 - 1\big)} \qquad \text{(Lemma 23)} \\
\iff\;& \frac{1}{\delta_{i+1} - \delta_{i+1}^{lr}} - \frac{1}{\delta_{i+1}} \;\ge\; \frac{1}{\delta_{i+1}}\cdot\frac{1}{\lambda_{\min}\delta_{i+1}/(\beta_{i+1}^{lo})^2 + \delta_{i+1}/\delta_{i+1}^{lr} - 1} \\
\iff\;& \frac{\lambda_{\min}\delta_{i+1}}{(\beta_{i+1}^{lo})^2} + \frac{\delta_{i+1}}{\delta_{i+1}^{lr}} - 1 \;\ge\; \frac{\delta_{i+1}}{\delta_{i+1}^{lr}} - 1 \\
\iff\;& \frac{\lambda_{\min}\delta_{i+1}}{(\beta_{i+1}^{lo})^2} \;\ge\; 0,
\end{align*}
where the second step uses $c_{i+1} = c_i\beta_i/\delta_i$, the fourth uses Lemma 23, and the fifth uses $\delta_{i+1} = \alpha_{i+1} - \beta_i^2/\delta_i$ and $\alpha_{i+1}^{lo} = \lambda_{\min} + (\beta_{i+1}^{lo})^2/\delta_{i+1}^{lr}$; the last inequality is obviously true, hence the proof is complete.

In summary, we have the following corollary for all four quadrature rules:

Corollary 22 (Monotonicity of Lower and Upper Bounds, Corr. 7 in the main text). As the iteration proceeds, $g_i$ and $g_i^{rr}$ give increasingly better lower bounds and $g_i^{lr}$ and $g_i^{lo}$ give increasingly better upper bounds, namely
\[
g_i \le g_{i+1}; \qquad g_i^{rr} \le g_{i+1}^{rr}; \qquad g_i^{lr} \ge g_{i+1}^{lr}; \qquad g_i^{lo} \ge g_{i+1}^{lo}.
\]

Proof. Directly drawn from Theorem 16, Theorem 19 and Theorem 21.

Before proceeding further to our analysis of the convergence rates of left Gauss–Radau and Gauss–Lobatto, we note two technical results that we will need.

Lemma 23. Let $\alpha_{i+1}$ and $\alpha_i^{lr}$ be as in Algorithm 5. The difference $\Delta_{i+1} = \alpha_{i+1} - \alpha_i^{lr}$ satisfies $\Delta_{i+1} = \delta_{i+1}^{lr}$.

Proof. From the Lanczos polynomials in the definition of left Gauss–Radau quadrature we have
\begin{align*}
\beta_{i+1}\, p^*_{i+1}(\lambda_{\min}) &= \left(\lambda_{\min} - \alpha_i^{lr}\right) p_i(\lambda_{\min}) - \beta_i\, p_{i-1}(\lambda_{\min}) \\
&= \left(\lambda_{\min} - (\alpha_{i+1} - \Delta_{i+1})\right) p_i(\lambda_{\min}) - \beta_i\, p_{i-1}(\lambda_{\min}) \\
&= \beta_{i+1}\, p_{i+1}(\lambda_{\min}) + \Delta_{i+1}\, p_i(\lambda_{\min}) = 0.
\end{align*}
Rearranging this equation we write $\Delta_{i+1} = -\beta_{i+1}\,\frac{p_{i+1}(\lambda_{\min})}{p_i(\lambda_{\min})}$, which by Theorem 17 can be further rewritten as
\[
\Delta_{i+1} = -\beta_{i+1}\,\frac{(-1)^{i+1}\det(J_{i+1} - \lambda_{\min} I)\big/\prod_{j=1}^{i+1}\beta_j}{(-1)^{i}\det(J_{i} - \lambda_{\min} I)\big/\prod_{j=1}^{i}\beta_j} = \frac{\det(J_{i+1} - \lambda_{\min} I)}{\det(J_{i} - \lambda_{\min} I)} = \delta_{i+1}^{lr}.
\]

Remark 24. Lemma 23 has an implication beyond its utility for the subsequent proofs: it provides a new way of calculating $\alpha_{i+1}$ given the quantities $\delta_{i+1}^{lr}$ and $\alpha_i^{lr}$; this saves calculation in Algorithm 5.

The following lemma relates $\delta_i$ to $\delta_i^{lr}$, which will prove useful in the subsequent analysis.

Lemma 25. Let $\delta_i^{lr}$ and $\delta_i$ be computed in the $i$-th iteration of Algorithm 5. Then we have the following:
\[
\delta_i^{lr} < \delta_i, \qquad \text{(B.2)}
\]
\[
\frac{\delta_i^{lr}}{\delta_i} \le 1 - \frac{\lambda_{\min}}{\lambda_N}. \qquad \text{(B.3)}
\]


Proof. We prove (B.2) by induction. Since $\lambda_{\min} > 0$, $\delta_1 = \alpha_1 > \lambda_{\min}$ and $\delta_1^{lr} = \alpha_1 - \lambda_{\min}$, we know that $\delta_1^{lr} < \delta_1$. Assume that $\delta_i^{lr} < \delta_i$ holds for all $i \le k$ and consider the $(k+1)$-th iteration:
\[
\delta_{k+1}^{lr} = \alpha_{k+1} - \lambda_{\min} - \frac{\beta_k^2}{\delta_k^{lr}} < \alpha_{k+1} - \frac{\beta_k^2}{\delta_k} = \delta_{k+1}.
\]
To prove (B.3), simply observe the following:
\[
\frac{\delta_i^{lr}}{\delta_i} = \frac{\alpha_i - \lambda_{\min} - \beta_{i-1}^2/\delta_{i-1}^{lr}}{\alpha_i - \beta_{i-1}^2/\delta_{i-1}} \overset{\text{(B.2)}}{\le} \frac{\alpha_i - \lambda_{\min}}{\alpha_i} \le 1 - \frac{\lambda_{\min}}{\lambda_N}.
\]

With the aforementioned lemmas we can now show how fast the difference between $g_i^{lr}$ and $g_i$ decays. Note that $g_i^{lr}$ gives an upper bound on the objective while $g_i$ gives a lower bound.

Lemma 26. The difference between $g_i^{lr}$ and $g_i$ decreases linearly. More specifically, we have
\[
g_i^{lr} - g_i \le 2\kappa^{+}\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i} g_N,
\]
where $\kappa^{+} = \lambda_N/\lambda_{\min}$ and $\kappa$ is the condition number of $A$, i.e., $\kappa = \lambda_N/\lambda_1$.

Proof. We rewrite the difference $g_i^{lr} - g_i$ as follows:
\[
g_i^{lr} - g_i = \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)}
= \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{\alpha_{i+1} - \beta_i^2/\delta_i}{\alpha_i^{lr} - \beta_i^2/\delta_i}
= \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{1}{1 - \Delta_{i+1}/\delta_{i+1}},
\]
where $\Delta_{i+1} = \alpha_{i+1} - \alpha_i^{lr}$ and we used $\alpha_{i+1} - \beta_i^2/\delta_i = \delta_{i+1}$. Next, recall that $\frac{g_N - g_i}{g_N} \le 2\big(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\big)^i$. Since $g_i$ lower bounds $g_N$, we have
\[
\left(1 - 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i}\right) g_N \le g_i \le g_N, \qquad
\left(1 - 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i+1}\right) g_N \le g_{i+1} \le g_N.
\]
Thus, we can conclude that
\[
\frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} = g_{i+1} - g_i \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N.
\]
Now we focus on the term $\left(1 - \Delta_{i+1}/\delta_{i+1}\right)^{-1}$. Using Lemma 23 we know that $\Delta_{i+1} = \delta_{i+1}^{lr}$. Hence, by Lemma 25,
\[
1 - \Delta_{i+1}/\delta_{i+1} = 1 - \delta_{i+1}^{lr}/\delta_{i+1} \ge 1 - \left(1 - \lambda_{\min}/\lambda_N\right) = \lambda_{\min}/\lambda_N =: \frac{1}{\kappa^{+}}.
\]
Finally we have
\[
g_i^{lr} - g_i = \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{1}{1 - \Delta_{i+1}/\delta_{i+1}} \le 2\kappa^{+}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N.
\]


Theorem 27 (Relative error of left Gauss–Radau, Theorem 8 in the main text). For left Gauss–Radau quadrature, where the preassigned node is $\lambda_{\min}$, we have the following bound on the relative error:
\[
\frac{g_i^{lr} - g_N}{g_N} \le 2\kappa^{+}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i},
\]
where $\kappa^{+} := \lambda_N/\lambda_{\min}$ and $i < N$.

Proof. Write $g_i^{lr} = g_i + (g_i^{lr} - g_i)$. Since $g_i \le g_N$, using Lemma 26 to bound the second term we obtain
\[
g_i^{lr} \le g_N + 2\kappa^{+}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N,
\]
from which the claim follows upon rearrangement.

Due to the relations between left Gauss–Radau and Gauss–Lobatto, we have the following corollary:

Corollary 28 (Relative error of Gauss–Lobatto, Corr. 9 in the main text). For Gauss–Lobatto quadrature, we have the following bound on the relative error:
\[
\frac{g_i^{lo} - g_N}{g_N} \le 2\kappa^{+}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i-1}, \qquad \text{(B.4)}
\]
where $\kappa^{+} := \lambda_N/\lambda_{\min}$ and $i < N$.

C. Generalization: Symmetric Matrices

In this section we consider the case where $u$ lies in the column space of several top eigenvectors of $A$, and discuss how the aforementioned theorems change. In particular, note that the previous analysis assumes that $A$ is positive definite. With the analysis in this section we relax this assumption to the more general case where $A$ is symmetric with simple eigenvalues, though we require $u$ to lie in the space spanned by eigenvectors of $A$ corresponding to positive eigenvalues.

We consider the case where $A$ is symmetric and has the eigendecomposition $A = Q\Lambda Q^\top = \sum_{i=1}^{N} \lambda_i q_i q_i^\top$, where the $\lambda_i$'s are the eigenvalues of $A$ in increasing order and the $q_i$'s are the corresponding eigenvectors. Assume that $u$ lies in the column space spanned by the top $k$ eigenvectors of $A$, where all these $k$ eigenvectors correspond to positive eigenvalues. Namely, we have $u \in \mathrm{Span}\{q_i\}_{i=N-k+1}^{N}$ and $0 < \lambda_{N-k+1}$.

Since we only assume that $A$ is symmetric, it is possible that $A$ is singular, and thus we consider the value of $u^\top A^\dagger u$, where $A^\dagger$ is the pseudo-inverse of $A$. Due to the constraints on $u$ we have
\[
u^\top A^\dagger u = u^\top Q \Lambda^\dagger Q^\top u = u^\top Q_k \Lambda_k^\dagger Q_k^\top u = u^\top B^\dagger u,
\]
where $B = \sum_{i=N-k+1}^{N} \lambda_i q_i q_i^\top$. Namely, if $u$ lies in the column space spanned by the top $k$ eigenvectors of $A$, then it is equivalent to substitute $B$ for $A$, where $B$ is the truncation of $A$ to its top $k$ eigenvalues and corresponding eigenvectors.

Another key observation is that, given that $u$ lies only in the space spanned by $\{q_i\}_{i=N-k+1}^{N}$, the Krylov space starting at $u$ becomes
\[
\mathrm{Span}\{u, Au, A^2 u, \ldots\} = \mathrm{Span}\{u, Bu, B^2 u, \ldots, B^{k-1} u\}. \qquad \text{(C.1)}
\]
This indicates that the Lanczos iteration starting with matrix $A$ and vector $u$ will finish constructing the corresponding Krylov space after the $k$-th iteration. Thus, under this condition, Algorithm 5 will run at most $k$ iterations and then stop. At that point, the eigenvalues of $J_k$ are exactly the eigenvalues of $B$, i.e., exactly $\{\lambda_i\}_{i=N-k+1}^{N}$ of $A$. Using a proof similar to that of Lemma 15, we obtain the following generalized exactness result.

Corollary 29 (Generalized Exactness). $g_k$, $g_k^{rr}$ and $g_k^{lr}$ are exact for $u^\top A^\dagger u = u^\top B^\dagger u$, namely
\[
g_k = g_k^{rr} = g_k^{lr} = u^\top A^\dagger u = u^\top B^\dagger u.
\]
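A small NumPy sketch of the two observations above (the truncation identity and the early breakdown of Lanczos after $k$ steps); the spectrum and subspace below are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 12, 4
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.sort(rng.uniform(-1.0, 1.0, N))
lam[-k:] = np.sort(rng.uniform(1.0, 2.0, k))     # top-k eigenvalues positive
A = (Q * lam) @ Q.T                              # symmetric, indefinite
u = Q[:, -k:] @ rng.standard_normal(k)           # u in the span of the top-k eigenvectors

# u^T A^+ u equals u^T B^+ u for the rank-k truncation B.
B = (Q[:, -k:] * lam[-k:]) @ Q[:, -k:].T
print(u @ np.linalg.pinv(A) @ u, u @ np.linalg.pinv(B) @ u)

# Lanczos on (A, u) exhausts the Krylov space (C.1) after k iterations.
q_prev, q, beta = np.zeros(N), u / np.linalg.norm(u), 0.0
for i in range(1, N + 1):
    w = A @ q - beta * q_prev
    w -= (q @ w) * q
    beta = np.linalg.norm(w)
    if beta < 1e-10:
        print("Lanczos breakdown at iteration", i)   # expect i == k
        break
    q_prev, q = q, w / beta
```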


The monotonicity of, and the relations between, the bounds given by the various Gauss-type quadratures remain the same as in the original case of Section 4, but the original convergence rate no longer applies, because now we may have $\lambda_{\min}(B) = 0$, making $\kappa$ undefined. This breakdown of the convergence rate stems from the breakdown of the convergence analysis of the corresponding conjugate gradient method for solving $Ax = u$. However, by examining the proof of, e.g., (Shewchuk, 1994), and noting that $\lambda_1(B) = \cdots = \lambda_{N-k}(B) = 0$, a slight modification of the proof actually yields the bound
\[
\|\varepsilon_i\|_A^2 \le \min_{P_i}\; \max_{\lambda \in \{\lambda_j\}_{j=N-k+1}^{N}} [P_i(\lambda)]^2\; \|\varepsilon_0\|_A^2,
\]
where $P_i$ is a polynomial of order $i$. By using properties of Chebyshev polynomials and following the original proof (e.g., (Golub & Meurant, 2009) or (Shewchuk, 1994)) we obtain the following lemma for conjugate gradient.

Lemma 30. Let $\varepsilon_k$ be as before (for conjugate gradient). Then,
\[
\|\varepsilon_k\|_A \le 2\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{k} \|\varepsilon_0\|_A, \qquad \text{where } \kappa' := \lambda_N/\lambda_{N-k+1}.
\]

Following this new convergence rate and the connections between conjugate gradient, Lanczos iterations and Gauss quadrature mentioned in Section 4, we have the following convergence bounds.

Corollary 31 (Convergence Rate for the Special Case). Under the above assumptions on $A$ and $u$, due to the connection between Gauss quadrature, the Lanczos algorithm and conjugate gradient, the relative convergence rates of $g_i$, $g_i^{rr}$, $g_i^{lr}$ and $g_i^{lo}$ are given by
\[
\frac{g_k - g_i}{g_k} \le 2\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{i}, \qquad
\frac{g_k - g_i^{rr}}{g_k} \le 2\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{i},
\]
\[
\frac{g_i^{lr} - g_k}{g_k} \le 2\kappa'_m\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{i}, \qquad
\frac{g_i^{lo} - g_k}{g_k} \le 2\kappa'_m\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{i},
\]
where $\kappa'_m = \lambda_N/\lambda'_{\min}$ and $0 < \lambda'_{\min} < \lambda_{N-k+1}$ is a lower bound on the nonzero eigenvalues of $B$.

D. Accelerating MCMC for k-DPP

We present details of a retrospective Markov chain Monte Carlo (MCMC) sampler in Algorithm 6 and Algorithm 7 for efficiently drawing samples from a k-DPP, accelerated using our results on Gauss-type quadratures.

Algorithm 6 Gauss-kDPP($L$, $k$)
Input: $L$ the kernel matrix we want to sample the DPP from; $k$ the size of the subset; $\mathcal{Y} = [N]$ the ground set
Output: $Y$ sampled from the exact k-DPP($L$) with $|Y| = k$
Randomly initialize $Y \subseteq \mathcal{Y}$ with $|Y| = k$
while not mixed do
  Pick $v \in Y$ and $u \in \mathcal{Y}\setminus Y$ uniformly at random
  Pick $p \in (0, 1)$ uniformly at random
  $Y' = Y \setminus \{v\}$
  Get lower and upper bounds $\lambda_{\min}$, $\lambda_{\max}$ of the spectrum of $L_{Y'}$
  if kDPP-JudgeGauss($p L_{v,v} - L_{u,u}$, $p$, $L_{Y',u}$, $L_{Y',v}$, $L_{Y'}$, $\lambda_{\min}$, $\lambda_{\max}$) = True then
    $Y = Y' \cup \{u\}$
  end if
end while


Algorithm 7 kDPP-JudgeGauss($t$, $p$, $u$, $v$, $A$, $\lambda_{\min}$, $\lambda_{\max}$)
Input: $t$ the target value; $p$ the scaling factor; $u$, $v$ and $A$ the corresponding vectors and matrix; $\lambda_{\min}$ and $\lambda_{\max}$ lower and upper bounds for the spectrum of $A$
Output: True if $t < p\,(v^\top A^{-1} v) - u^\top A^{-1} u$, False otherwise
Initialize: $u_{-1} = 0$, $u_0 = u/\|u\|$, $i_u = 1$, $\beta_0^{u} = 0$, $d_u = \infty$; $v_{-1} = 0$, $v_0 = v/\|v\|$, $i_v = 1$, $\beta_0^{v} = 0$, $d_v = \infty$
while True do
  if $d_u > p\, d_v$ then
    Run one more iteration of Gauss–Radau on $u^\top A^{-1} u$ to get tighter $(g^{lr})_u$ and $(g^{rr})_u$
    $d_u = (g^{lr})_u - (g^{rr})_u$
  else
    Run one more iteration of Gauss–Radau on $v^\top A^{-1} v$ to get tighter $(g^{lr})_v$ and $(g^{rr})_v$
    $d_v = (g^{lr})_v - (g^{rr})_v$
  end if
  if $t < p\,\|v\|^2 (g^{rr})_v - \|u\|^2 (g^{lr})_u$ then
    Return True
  else if $t \ge p\,\|v\|^2 (g^{lr})_v - \|u\|^2 (g^{rr})_u$ then
    Return False
  end if
end while
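Below is a minimal Python sketch of this retrospective test. It reuses the bordered-Jacobi construction from the GQL sketch in Appendix A, here packaged as a generator yielding the pair $(\|x\|^2 g_i^{rr}, \|x\|^2 g_i^{lr})$ per Lanczos step; names such as radau_bounds and judge_gauss are our own, and a production version would use the $O(1)$ recursions of Algorithm 5 instead of dense solves:

```python
import numpy as np

def radau_bounds(A, x, lam_min, lam_max):
    """Yield successively tighter (lower, upper) bounds on x^T A^{-1} x,
    i.e. (||x||^2 g_i^rr, ||x||^2 g_i^lr), one pair per Lanczos step."""
    xnorm2 = x @ x
    q_prev, q, beta = np.zeros_like(x), x / np.sqrt(xnorm2), 0.0
    alphas, betas = [], []
    while True:
        w = A @ q - beta * q_prev
        a = q @ w; w -= a * q
        alphas.append(a)
        i = len(alphas)
        J = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        beta = np.linalg.norm(w)
        ei = np.zeros(i); ei[-1] = 1.0
        def radau(z):  # border J so that z is an eigenvalue of the extension
            phi = z + beta ** 2 * np.linalg.solve(J - z * np.eye(i), ei)[-1]
            Jx = np.block([[J, beta * ei[:, None]], [beta * ei[None, :], phi]])
            e1 = np.zeros(i + 1); e1[0] = 1.0
            return xnorm2 * np.linalg.solve(Jx, e1)[0]
        yield radau(lam_max), radau(lam_min)  # lower (right Radau), upper (left Radau)
        if beta < 1e-12:        # bounds are now exact (Lemma 15); stop refining
            return
        betas.append(beta)
        q_prev, q = q, w / beta

def judge_gauss(t, p, u, v, A, lam_min, lam_max):
    """Sketch of Algorithm 7: lazily decide t < p v^T A^{-1} v - u^T A^{-1} u,
    refining whichever of the two quadratures has the looser gap."""
    bu = radau_bounds(A, u, lam_min, lam_max)
    bv = radau_bounds(A, v, lam_min, lam_max)
    (lu, uu), (lv, uv) = next(bu), next(bv)
    while True:
        if t < p * lv - uu:
            return True
        if t >= p * uv - lu:
            return False
        if (uu - lu) > p * (uv - lv):
            lu, uu = next(bu)
        else:
            lv, uv = next(bv)
```

Once the bounds become exact the two decision tests are exhaustive, so the loop always terminates, typically long before the Krylov space is exhausted.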

E. Accelerating Stochastic Double Greedy

We present details of the retrospective stochastic double greedy method in Algorithm 8 and Algorithm 9, which efficiently selects a subset $X \subseteq \mathcal{Y}$ that approximately maximizes $\log\det(L_X)$.

Algorithm 8 Gauss-DG($L$)
Input: $L$ the kernel matrix and $\mathcal{Y} = [N]$ the ground set
Output: $X \subseteq \mathcal{Y}$ that approximately maximizes $\log\det(L_X)$
$X_0 = \emptyset$, $Y_0 = \mathcal{Y}$
for $i = 1, 2, \ldots, N$ do
  $Y_i' = Y_{i-1} \setminus \{i\}$
  Sample $p \in (0, 1)$ uniformly at random
  Get lower and upper bounds $\lambda_{\min}^{-}$, $\lambda_{\max}^{-}$, $\lambda_{\min}^{+}$, $\lambda_{\max}^{+}$ of the spectra of $L_{X_{i-1}}$ and $L_{Y_i'}$, respectively
  if DG-JudgeGauss($L_{X_{i-1}}$, $L_{Y_i'}$, $L_{X_{i-1},i}$, $L_{Y_i',i}$, $L_{i,i}$, $p$, $\lambda_{\min}^{-}$, $\lambda_{\max}^{-}$, $\lambda_{\min}^{+}$, $\lambda_{\max}^{+}$) = True then
    $X_i = X_{i-1} \cup \{i\}$
  else
    $Y_i = Y_i'$
  end if
end for


Algorithm 9 DG-JudgeGauss($A$, $B$, $u$, $v$, $t$, $p$, $\lambda_{\min}^{A}$, $\lambda_{\max}^{A}$, $\lambda_{\min}^{B}$, $\lambda_{\max}^{B}$)
Input: $t$ the target value; $p$ the scaling factor; $u$, $v$, $A$ and $B$ the corresponding vectors and matrices; $\lambda_{\min}^{A}$, $\lambda_{\max}^{A}$, $\lambda_{\min}^{B}$, $\lambda_{\max}^{B}$ lower and upper bounds for the spectra of $A$ and $B$
Output: True if $p\,|\log(t - u^\top A^{-1} u)|_{+} \le (1-p)\,|-\log(t - v^\top B^{-1} v)|_{+}$, False otherwise
$d_u = \infty$, $d_v = \infty$
while True do
  if $p\, d_u > (1-p)\, d_v$ then
    Run one more iteration of Gauss–Radau on $u^\top A^{-1} u$ to get tighter lower and upper bounds $l_u$, $u_u$ for $|\log(t - u^\top A^{-1} u)|_{+}$
    $d_u = u_u - l_u$
  else
    Run one more iteration of Gauss–Radau on $v^\top B^{-1} v$ to get tighter lower and upper bounds $l_v$, $u_v$ for $|-\log(t - v^\top B^{-1} v)|_{+}$
    $d_v = u_v - l_v$
  end if
  if $p\, u_u \le (1-p)\, l_v$ then
    Return True
  else if $p\, l_u > (1-p)\, u_v$ then
    Return False
  end if
end while
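The only new ingredient relative to Algorithm 7 is converting two-sided quadrature bounds on $x^\top A^{-1} x$ into bounds on the clipped log-gains. A small sketch of that conversion follows; the helper name and the guard against a non-positive argument are our own choices:

```python
import numpy as np

def log_gain_bounds(t, quad_lower, quad_upper):
    """Given quad_lower <= x^T A^{-1} x <= quad_upper, bound the clipped
    log-gain |log(t - x^T A^{-1} x)|_+ from below and above.

    Since z -> |log(t - z)|_+ is nonincreasing in z, the upper quadrature
    bound gives the lower log bound and vice versa."""
    def clipped_log(z):
        return max(np.log(z), 0.0) if z > 0.0 else 0.0
    return clipped_log(t - quad_upper), clipped_log(t - quad_lower)

# Example: bounds 0.8 <= x^T A^{-1} x <= 0.9 with diagonal entry t = 2.0.
lo, hi = log_gain_bounds(2.0, 0.8, 0.9)
print(lo, hi)   # 0.0953... <= |log(t - x^T A^{-1} x)|_+ <= 0.1823...
```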