A. Further Background on Gauss Quadrature

We present below a more detailed summary of material on Gauss quadrature to make the paper self-contained.

    A.1. Selecting weights and nodes

We have described that the Riemann–Stieltjes integral can be expressed as
\[
I[f] := Q_n + R_n = \sum_{i=1}^{n} \omega_i f(\theta_i) + \sum_{i=1}^{m} \nu_i f(\tau_i) + R_n[f],
\]
where $Q_n$ denotes the $n$th-degree approximation and $R_n$ denotes a remainder term. The weights $\{\omega_i\}_{i=1}^{n}$, $\{\nu_i\}_{i=1}^{m}$ and nodes $\{\theta_i\}_{i=1}^{n}$ are chosen such that for all polynomials of degree less than $2n+m-1$, denoted $f \in \mathcal{P}_{2n+m-1}$, we have exact interpolation $I[f] = Q_n$. One way to compute the weights and nodes is to set $f(x) = x^i$ for $i \le 2n+m-1$ and then solve the resulting nonlinear system exactly. But there is an easier way to obtain weights and nodes, namely by using polynomials orthogonal with respect to the measure $\alpha$. Specifically, we construct a sequence of orthogonal polynomials $p_0(\lambda), p_1(\lambda), \ldots$ such that $p_k(\lambda)$ is a polynomial in $\lambda$ of degree exactly $k$, and $p_i$, $p_j$ are orthogonal, i.e., they satisfy
\[
\int_{\lambda_{\min}}^{\lambda_{\max}} p_i(\lambda)\, p_j(\lambda)\, d\alpha(\lambda) =
\begin{cases}
1, & i = j,\\
0, & \text{otherwise.}
\end{cases}
\]
The roots of $p_n$ are distinct, real, and lie in the interval $[\lambda_{\min}, \lambda_{\max}]$; they form the nodes $\{\theta_i\}_{i=1}^{n}$ for Gauss quadrature (see, e.g., (Golub & Meurant, 2009, Ch. 6)).

Consider the two monic polynomials whose roots serve as quadrature nodes:
\[
\pi_n(\lambda) = \prod_{i=1}^{n} (\lambda - \theta_i), \qquad \rho_m(\lambda) = \prod_{i=1}^{m} (\lambda - \tau_i),
\]
where $\rho_0 \equiv 1$ for consistency. We further denote $\rho_m^{+} = \pm\rho_m$, where the sign is taken to ensure $\rho_m^{+} \ge 0$ on $[\lambda_{\min}, \lambda_{\max}]$. Then, for $m > 0$, we calculate the quadrature weights as
\[
\omega_i = I\!\left[\frac{\rho_m^{+}(\lambda)\,\pi_n(\lambda)}{\rho_m^{+}(\theta_i)\,\pi_n'(\theta_i)\,(\lambda - \theta_i)}\right], \qquad
\nu_j = I\!\left[\frac{\rho_m^{+}(\lambda)\,\pi_n(\lambda)}{(\rho_m^{+})'(\tau_j)\,\pi_n(\tau_j)\,(\lambda - \tau_j)}\right],
\]
where $f'(\lambda)$ denotes the derivative of $f$ with respect to $\lambda$. When $m = 0$ the quadrature degenerates to Gauss quadrature and we have
\[
\omega_i = I\!\left[\frac{\pi_n(\lambda)}{\pi_n'(\theta_i)\,(\lambda - \theta_i)}\right].
\]

Although we have specified how to select nodes and weights for quadrature, these ideas cannot be applied to our problem because the measure $\alpha$ is unknown. Indeed, calculating the measure explicitly would require knowing the entire spectrum of $A$, which is as good as explicitly computing $f(A)$, hence untenable for us. The next section shows how to circumvent the difficulties due to the unknown $\alpha$.

    A.2. Gauss Quadrature Lanczos (GQL)

The key idea to circumvent our lack of knowledge of $\alpha$ is to recursively construct polynomials called Lanczos polynomials. The construction ensures their orthogonality with respect to $\alpha$. Concretely, we construct Lanczos polynomials via the following three-term recurrence:
\[
\beta_i p_i(\lambda) = (\lambda - \alpha_i)\,p_{i-1}(\lambda) - \beta_{i-1} p_{i-2}(\lambda), \quad i = 1, 2, \ldots, n; \qquad p_{-1}(\lambda) \equiv 0, \;\; p_0(\lambda) \equiv 1, \qquad \text{(A.1)}
\]
while ensuring $\int_{\lambda_{\min}}^{\lambda_{\max}} d\alpha(\lambda) = 1$. We can express (A.1) in matrix form by writing
\[
\lambda P_n(\lambda) = J_n P_n(\lambda) + \beta_n p_n(\lambda) e_n,
\]


where $P_n(\lambda) := [p_0(\lambda), \ldots, p_{n-1}(\lambda)]^\top$, $e_n$ is the $n$th canonical unit vector, and $J_n$ is the tridiagonal matrix
\[
J_n = \begin{bmatrix}
\alpha_1 & \beta_1 & & & \\
\beta_1 & \alpha_2 & \beta_2 & & \\
 & \beta_2 & \ddots & \ddots & \\
 & & \ddots & \alpha_{n-1} & \beta_{n-1} \\
 & & & \beta_{n-1} & \alpha_n
\end{bmatrix}. \qquad \text{(A.2)}
\]

This matrix is known as the Jacobi matrix, and it is closely related to Gauss quadrature. The following well-known theorem makes this relation precise.

Theorem 10 ((Wilf, 1962; Golub & Welsch, 1969)). The eigenvalues of $J_n$ form the nodes $\{\theta_i\}_{i=1}^{n}$ of Gauss-type quadratures. The weights $\{\omega_i\}_{i=1}^{n}$ are given by the squares of the first elements of the normalized eigenvectors of $J_n$.

Thus, if $J_n$ has the eigendecomposition $J_n = P_n^\top \Lambda P_n$, then for Gauss quadrature Theorem 10 yields
\[
Q_n = \sum_{i=1}^{n} \omega_i f(\theta_i) = e_1^\top P_n^\top f(\Lambda) P_n e_1 = e_1^\top f(J_n) e_1. \qquad \text{(A.3)}
\]
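To make Theorem 10 and (A.3) concrete, here is a minimal NumPy sketch of the Golub–Welsch procedure for the normalized Legendre measure $d\alpha(\lambda) = d\lambda/2$ on $[-1, 1]$, chosen only for illustration because its recurrence coefficients $\alpha_i = 0$, $\beta_i = i/\sqrt{4i^2 - 1}$ are known in closed form; the matrix size and test function are our own choices:

```python
import numpy as np

# Golub-Welsch for the normalized Legendre measure on [-1, 1]:
# recurrence coefficients are alpha_i = 0, beta_i = i / sqrt(4 i^2 - 1).
n = 8
beta = np.array([i / np.sqrt(4.0 * i**2 - 1.0) for i in range(1, n)])
J = np.diag(beta, 1) + np.diag(beta, -1)    # Jacobi matrix (A.2) with all alpha_i = 0

theta, P = np.linalg.eigh(J)                # nodes = eigenvalues of J_n (Theorem 10)
w = P[0, :] ** 2                            # weights = squared first eigenvector entries

f = lambda x: 1.0 / (x + 3.0)               # a smooth test function on [-1, 1]
Q_n = np.sum(w * f(theta))                  # sum_i omega_i f(theta_i)
e1 = np.zeros(n); e1[0] = 1.0
Q_n_alt = e1 @ (P @ np.diag(f(theta)) @ P.T) @ e1   # e_1^T f(J_n) e_1, as in (A.3)

print(Q_n, Q_n_alt, 0.5 * np.log(2.0))      # both match the exact integral log(2)/2
```

The two expressions for $Q_n$ agree to machine precision, and for this smooth integrand eight nodes already reproduce the exact value to roughly twelve digits.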

Specialization. We now specialize to our main focus, $f(A) = A^{-1}$, for which we prove more precise results. In this case, (A.3) becomes $Q_n = [J_n^{-1}]_{1,1}$. The task now is to compute $Q_n$ and, given $A$ and $u$, to obtain the Jacobi matrix $J_n$.

Fortunately, we can efficiently calculate $J_n$ iteratively using the Lanczos algorithm (Lanczos, 1950). Suppose we have an estimate $J_i$; in iteration $(i+1)$ of Lanczos, we compute the tridiagonal coefficients $\alpha_{i+1}$ and $\beta_{i+1}$ and add them to this estimate to form $J_{i+1}$. As to $Q_n$, assuming we have already computed $[J_i^{-1}]_{1,1}$, letting $j_i = J_i^{-1} e_i$ and invoking the Sherman–Morrison identity (Sherman & Morrison, 1950) we obtain the recursion:
\[
[J_{i+1}^{-1}]_{1,1} = [J_i^{-1}]_{1,1} + \frac{\beta_i^2\,([j_i]_1)^2}{\alpha_{i+1} - \beta_i^2\,[j_i]_i}, \qquad \text{(A.4)}
\]
where $[j_i]_1$ and $[j_i]_i$ can be recursively computed using a Cholesky-like factorization of $J_i$ (Golub & Meurant, 2009, p. 31).
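Recursion (A.4) is just the block-inverse (Schur-complement) update for the bordered matrix $J_{i+1}$. The following small NumPy check, with arbitrary made-up tridiagonal coefficients, is a sketch verifying it numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
alpha = rng.uniform(2.0, 3.0, size=n)      # diagonal entries (arbitrary test values)
beta = rng.uniform(0.1, 0.5, size=n - 1)   # off-diagonal entries

def J_of(i):
    """Leading i-by-i Jacobi matrix built from alpha, beta."""
    return np.diag(alpha[:i]) + np.diag(beta[:i - 1], 1) + np.diag(beta[:i - 1], -1)

for i in range(1, n):
    Ji = J_of(i)
    ji = np.linalg.solve(Ji, np.eye(i)[:, -1])   # j_i = J_i^{-1} e_i
    lhs = np.linalg.inv(J_of(i + 1))[0, 0]       # [J_{i+1}^{-1}]_{1,1} directly
    rhs = (np.linalg.inv(Ji)[0, 0]               # recursion (A.4)
           + beta[i - 1] ** 2 * ji[0] ** 2 / (alpha[i] - beta[i - 1] ** 2 * ji[-1]))
    assert np.isclose(lhs, rhs)
```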

For Gauss–Radau quadrature, we need to modify $J_i$ so that it has a prescribed eigenvalue. More precisely, we extend $J_i$ to $J_i^{lr}$ for left Gauss–Radau ($J_i^{rr}$ for right Gauss–Radau) with $\beta_i$ on the off-diagonal and $\alpha_i^{lr}$ ($\alpha_i^{rr}$) on the diagonal, so that $J_i^{lr}$ ($J_i^{rr}$) has the prescribed eigenvalue $\lambda_{\min}$ ($\lambda_{\max}$).

For Gauss–Lobatto quadrature, we extend $J_i$ to $J_i^{lo}$ with values $\beta_i^{lo}$ and $\alpha_i^{lo}$ chosen to ensure that $J_i^{lo}$ has the prescribed eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$. For more details on the construction, see (Golub, 1973).

For all methods, the approximated values are calculated as $[(J_i')^{-1}]_{1,1}$, where $J_i' \in \{J_i^{lr}, J_i^{rr}, J_i^{lo}\}$ is the modified Jacobi matrix. Here $J_i'$ is constructed at the $i$-th iteration of the algorithm.

The algorithm for computing Gauss, Gauss–Radau, and Gauss–Lobatto quadrature rules with the help of the Lanczos iteration is called Gauss Quadrature Lanczos (GQL) and is shown in (Golub & Meurant, 1997). We recall its pseudocode in Algorithm 5 to make our presentation self-contained (and for our proofs in Section 4).

The error of approximating $I[f]$ by Gauss-type quadratures can be expressed as
\[
R_n[f] = \frac{f^{(2n+m)}(\xi)}{(2n+m)!}\, I[\rho_m \pi_n^2],
\]
for some $\xi \in [\lambda_{\min}, \lambda_{\max}]$ (see, e.g., (Stoer & Bulirsch, 2013)). Note that $\rho_m$ does not change sign in $[\lambda_{\min}, \lambda_{\max}]$; but with different values of $m$ and $\tau_j$ we obtain different (but fixed) signs for $R_n[f]$ using $f(\lambda) = 1/\lambda$ and $\lambda_{\min} > 0$. Concretely, for Gauss quadrature $m = 0$ and $R_n[f] \ge 0$; for left Gauss–Radau $m = 1$ and $\tau_1 = \lambda_{\min}$, so we have $R_n[f] \le 0$; for right Gauss–Radau we have $m = 1$ and $\tau_1 = \lambda_{\max}$, thus $R_n[f] \ge 0$; while for Gauss–Lobatto we have $m = 2$, $\tau_1 = \lambda_{\min}$ and $\tau_2 = \lambda_{\max}$, so that $R_n[f] \le 0$. This behavior of the errors clearly shows the ordering relations between the target values and the approximations made by the different quadrature rules. Lemma 11 (see, e.g., (Meurant, 1997)) makes this claim precise.


Algorithm 5 Gauss Quadrature Lanczos (GQL)
Input: $u$ and $A$ the corresponding vector and matrix; $\lambda_{\min}$ and $\lambda_{\max}$ lower and upper bounds for the spectrum of $A$
Output: $g_i$, $g_i^{rr}$, $g_i^{lr}$ and $g_i^{lo}$, the Gauss, right Gauss–Radau, left Gauss–Radau and Gauss–Lobatto quadratures computed at the $i$-th iteration
Initialize: $u_{-1} = 0$, $u_0 = u/\|u\|$, $\alpha_1 = u_0^\top A u_0$, $\beta_1 = \|(A - \alpha_1 I)u_0\|$, $g_1 = \|u\|/\alpha_1$, $c_1 = 1$, $\delta_1 = \alpha_1$, $\delta_1^{lr} = \alpha_1 - \lambda_{\min}$, $\delta_1^{rr} = \alpha_1 - \lambda_{\max}$, $u_1 = (A - \alpha_1 I)u_0/\beta_1$, $i = 2$
while $i \le N$ do
  $\alpha_i = u_{i-1}^\top A u_{i-1}$ {Lanczos iteration}
  $\tilde{u}_i = A u_{i-1} - \alpha_i u_{i-1} - \beta_{i-1} u_{i-2}$
  $\beta_i = \|\tilde{u}_i\|$
  $u_i = \tilde{u}_i/\beta_i$
  $g_i = g_{i-1} + \|u\|\,\beta_{i-1}^2 c_{i-1}^2 \,/\, \big(\delta_{i-1}(\alpha_i\delta_{i-1} - \beta_{i-1}^2)\big)$ {update $g_i$ with the Sherman–Morrison formula}
  $c_i = c_{i-1}\beta_{i-1}/\delta_{i-1}$
  $\delta_i = \alpha_i - \beta_{i-1}^2/\delta_{i-1}$,\quad $\delta_i^{lr} = \alpha_i - \lambda_{\min} - \beta_{i-1}^2/\delta_{i-1}^{lr}$,\quad $\delta_i^{rr} = \alpha_i - \lambda_{\max} - \beta_{i-1}^2/\delta_{i-1}^{rr}$
  $\alpha_i^{lr} = \lambda_{\min} + \beta_i^2/\delta_i^{lr}$,\quad $\alpha_i^{rr} = \lambda_{\max} + \beta_i^2/\delta_i^{rr}$ {solve for $J_i^{lr}$ and $J_i^{rr}$}
  $\alpha_i^{lo} = \frac{\delta_i^{lr}\delta_i^{rr}}{\delta_i^{rr} - \delta_i^{lr}}\big(\frac{\lambda_{\max}}{\delta_i^{lr}} - \frac{\lambda_{\min}}{\delta_i^{rr}}\big)$,\quad $(\beta_i^{lo})^2 = \frac{\delta_i^{lr}\delta_i^{rr}}{\delta_i^{rr} - \delta_i^{lr}}(\lambda_{\max} - \lambda_{\min})$ {solve for $J_i^{lo}$}
  $g_i^{lr} = g_i + \beta_i^2 c_i^2\|u\| \,/\, \big(\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)\big)$,\quad $g_i^{rr} = g_i + \beta_i^2 c_i^2\|u\| \,/\, \big(\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2)\big)$,\quad $g_i^{lo} = g_i + (\beta_i^{lo})^2 c_i^2\|u\| \,/\, \big(\delta_i(\alpha_i^{lo}\delta_i - (\beta_i^{lo})^2)\big)$ {update $g_i^{rr}$, $g_i^{lr}$ and $g_i^{lo}$ with the Sherman–Morrison formula}
  $i = i + 1$
end while
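For concreteness, here is a minimal NumPy sketch of the GQL idea. Instead of the $O(1)$ scalar recursions of Algorithm 5 it forms each small $J_i$ explicitly and obtains the Gauss–Radau modification by prescribing an eigenvalue through a bordered-matrix solve, which is the construction from (Golub, 1973) that the recursions implement; it uses the normalization $u^\top f(A)u = \|u\|^2 e_1^\top f(J_i) e_1$, and the function names and demo matrix are our own:

```python
import numpy as np

def gql_bounds(A, u, lam_min, lam_max, max_iter):
    """Sketch of GQL: per Lanczos step, return (g_i, g_i^rr, g_i^lr), where
    g_i, g_i^rr lower-bound and g_i^lr upper-bounds u^T A^{-1} u (Lemma 11)."""
    unorm2 = u @ u
    q_prev, q, beta = np.zeros_like(u), u / np.sqrt(unorm2), 0.0
    alphas, betas, out = [], [], []
    for _ in range(max_iter):
        w = A @ q - beta * q_prev                 # Lanczos three-term recurrence
        a = q @ w; w -= a * q
        alphas.append(a)
        i = len(alphas)
        J = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        e1 = np.zeros(i); e1[0] = 1.0
        g = unorm2 * np.linalg.solve(J, e1)[0]    # Gauss: ||u||^2 [J_i^{-1}]_{1,1}
        beta = np.linalg.norm(w)
        ei = np.zeros(i); ei[-1] = 1.0
        def radau(z):
            # Border J so that z is an eigenvalue of the extended matrix J_i'.
            phi = z + beta ** 2 * np.linalg.solve(J - z * np.eye(i), ei)[-1]
            Jx = np.block([[J, beta * ei[:, None]], [beta * ei[None, :], phi]])
            e1x = np.zeros(i + 1); e1x[0] = 1.0
            return unorm2 * np.linalg.solve(Jx, e1x)[0]
        out.append((g, radau(lam_max), radau(lam_min)))   # (g_i, g_i^rr, g_i^lr)
        if beta < 1e-12:                          # Krylov space exhausted
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    return out

# Demo: the iterates sandwich u^T A^{-1} u and tighten monotonically.
rng = np.random.default_rng(0)
M = rng.standard_normal((40, 40))
A = M @ M.T + 40 * np.eye(40)                     # a well-conditioned SPD matrix
u = rng.standard_normal(40)
lam = np.linalg.eigvalsh(A)
truth = u @ np.linalg.solve(A, u)
for g, g_rr, g_lr in gql_bounds(A, u, lam[0], lam[-1], 6):
    print(f"{g:.6f} <= {g_rr:.6f} <= {truth:.6f} <= {g_lr:.6f}")
```

In floating-point arithmetic a practical implementation would also re-orthogonalize the Lanczos vectors; the sketch omits this for brevity.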

Lemma 11. Let $g_i$, $g_i^{lr}$, $g_i^{rr}$, and $g_i^{lo}$ be the approximations at the $i$-th iteration of Gauss, left Gauss–Radau, right Gauss–Radau, and Gauss–Lobatto quadrature, respectively. Then, $g_i$ and $g_i^{rr}$ provide lower bounds on $u^\top A^{-1} u$, while $g_i^{lr}$ and $g_i^{lo}$ provide upper bounds.

The final connection we recall as background is the method of conjugate gradients. This helps us analyze the speed at which quadrature converges to the true value (assuming exact arithmetic).

    A.3. Relation with Conjugate Gradient

While Gauss-type quadratures relate to the Lanczos algorithm, Lanczos itself is closely related to conjugate gradient (CG) (Hestenes & Stiefel, 1952), a well-known method for solving $Ax = b$ for positive definite $A$.

We recap this connection below. Let $x_k$ be the estimated solution at the $k$-th CG iteration. If $x^*$ denotes the true solution to $Ax = b$, then the error $\varepsilon_k$ and residual $r_k$ are defined as
\[
\varepsilon_k := x^* - x_k, \qquad r_k := A\varepsilon_k = b - Ax_k. \qquad \text{(A.5)}
\]

At the $k$-th iteration, $x_k$ is chosen such that $r_k$ is orthogonal to the $k$-th Krylov space, i.e., the linear space $\mathcal{K}_k$ spanned by $\{r_0, Ar_0, \ldots, A^{k-1} r_0\}$. It can be shown (Meurant, 2006) that $r_k$ is a scaled Lanczos vector from the $k$-th iteration of Lanczos started with $r_0$. Noting the relation between Lanczos and Gauss quadrature applied to approximate $r_0^\top A^{-1} r_0$, one obtains the following theorem that relates CG with GQL.

Theorem 12 (CG and GQL; (Meurant, 1999)). Let $\varepsilon_k$ be the error as in (A.5), and let $\|\varepsilon_k\|_A^2 := \varepsilon_k^\top A \varepsilon_k$. Then, it holds that
\[
\|\varepsilon_k\|_A^2 = \|r_0\|^2\left([J_N^{-1}]_{1,1} - [J_k^{-1}]_{1,1}\right),
\]
where $J_k$ is the Jacobi matrix at the $k$-th Lanczos iteration starting with $r_0$.
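A quick numerical sanity check of Theorem 12, with a textbook CG loop and plain Lanczos on a made-up SPD system; by Lemma 15 below, $\|r_0\|^2 [J_N^{-1}]_{1,1} = b^\top A^{-1} b$, which we use for the first term. This is only a sketch, and a robust implementation would re-orthogonalize:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)          # SPD test matrix
b = rng.standard_normal(N)
x_star = np.linalg.solve(A, b)

# Lanczos on (A, r0 = b): collect tridiagonal coefficients for J_k.
q_prev, q, beta = np.zeros(N), b / np.linalg.norm(b), 0.0
alphas, betas = [], []
for _ in range(8):
    w = A @ q - beta * q_prev
    a = q @ w; w -= a * q
    alphas.append(a)
    beta = np.linalg.norm(w)
    betas.append(beta)
    q_prev, q = q, w / beta

def g_k(k):  # ||r0||^2 [J_k^{-1}]_{1,1}
    J = np.diag(alphas[:k]) + np.diag(betas[:k - 1], 1) + np.diag(betas[:k - 1], -1)
    e1 = np.zeros(k); e1[0] = 1.0
    return (b @ b) * np.linalg.solve(J, e1)[0]

# Standard CG from x0 = 0; compare ||eps_k||_A^2 with Theorem 12.
x, r = np.zeros(N), b.copy()
p = r.copy()
for k in range(1, 6):
    Ap = A @ p
    gamma = (r @ r) / (p @ Ap)
    x = x + gamma * p
    r_new = r - gamma * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    eps = x_star - x
    print(k, eps @ A @ eps, b @ x_star - g_k(k))   # the two columns agree
```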

Finally, the rate at which $\|\varepsilon_k\|_A^2$ shrinks has also been well studied, as noted below.

Theorem 13 (CG rate, see e.g. (Shewchuk, 1994)). Let $\varepsilon_k$ be the error made by CG at iteration $k$ when started with $x_0$. Let $\kappa$ be the condition number of $A$, i.e., $\kappa = \lambda_N/\lambda_1$ (with the eigenvalues of $A$ in increasing order). Then, the error norm at iteration $k$ satisfies
\[
\|\varepsilon_k\|_A \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{k} \|\varepsilon_0\|_A.
\]

Due to these explicit relations between CG and Lanczos, as well as between Lanczos and Gauss quadrature, we readily obtain the following convergence rate for the relative error of Gauss quadrature.

Theorem 14 (Gauss quadrature rate). The $i$-th iterate of Gauss quadrature satisfies the relative error bound
\[
\frac{g_N - g_i}{g_N} \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i}.
\]

Proof. This is obtained by exploiting the relations among CG, Lanczos and Gauss quadrature. Set $x_0 = 0$ and $b = u$. Then $\varepsilon_0 = x^*$ and $r_0 = u$. An application of Theorem 12 and Theorem 13 thus yields the bound
\[
\|\varepsilon_i\|_A^2 = \|u\|^2\left([J_N^{-1}]_{1,1} - [J_i^{-1}]_{1,1}\right) = g_N - g_i \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i} \|\varepsilon_0\|_A^2 = 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i} u^\top A^{-1} u = 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i} g_N,
\]
where we used $\|\varepsilon_0\|_A^2 = (x^*)^\top A x^* = u^\top A^{-1} u$, and the last equality draws from Lemma 15.

    In other words, Theorem 14 shows that the iterates of Gauss quadrature converge linearly.
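As a sketch, the linear rate of Theorem 14 is easy to observe numerically; the spectrum below is fabricated so that $\kappa = 10$ is known exactly, and in floating point one would normally re-orthogonalize the Lanczos vectors:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 40
lam = np.linspace(1.0, 10.0, N)             # prescribed spectrum, kappa = 10
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
A = (Q * lam) @ Q.T
u = rng.standard_normal(N)

# Gauss iterates g_i = ||u||^2 [J_i^{-1}]_{1,1} via Lanczos.
g = []
q_prev, q, beta = np.zeros(N), u / np.linalg.norm(u), 0.0
alphas, betas = [], []
for _ in range(20):
    w = A @ q - beta * q_prev
    a = q @ w; w -= a * q
    alphas.append(a)
    J = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    e1 = np.zeros(len(alphas)); e1[0] = 1.0
    g.append((u @ u) * np.linalg.solve(J, e1)[0])
    beta = np.linalg.norm(w)
    betas.append(beta)
    q_prev, q = q, w / beta

gN = u @ np.linalg.solve(A, u)              # exact value (= g_N by Lemma 15)
rate = (np.sqrt(10.0) - 1) / (np.sqrt(10.0) + 1)
for i in (1, 5, 10, 15):
    print(i, (gN - g[i - 1]) / gN, 2 * rate ** i)   # observed error vs bound
```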

B. Proofs for Main Theoretical Results

We begin by proving an exactness property of Gauss and Gauss–Radau quadrature.

Lemma 15 (Exactness). With $A$ symmetric positive definite with simple eigenvalues, the iterates $g_N$, $g_N^{lr}$, and $g_N^{rr}$ are exact. Namely, after $N$ iterations they satisfy
\[
g_N = g_N^{lr} = g_N^{rr} = u^\top A^{-1} u.
\]

Proof. Observe that the Jacobi tridiagonal matrix can be computed via the Lanczos iteration, and Lanczos is essentially an iterative tridiagonalization of $A$. At the $i$-th iteration we have $J_i = V_i^\top A V_i$, where $V_i \in \mathbb{R}^{N\times i}$ contains the first $i$ Lanczos vectors (i.e., a basis for the $i$-th Krylov space). Thus, $J_N = V_N^\top A V_N$ where $V_N$ is an $N \times N$ orthonormal matrix, showing that $J_N$ has the same eigenvalues as $A$. As a result $\pi_N(\lambda) = \prod_{i=1}^{N} (\lambda - \lambda_i)$, and since the measure $\alpha$ is supported on the spectrum of $A$, on which $\pi_N^2$ vanishes, it follows that the remainder
\[
R_N[f] = \frac{f^{(2N)}(\xi)}{(2N)!}\, I[\pi_N^2] = 0,
\]
for some scalar $\xi \in [\lambda_{\min}, \lambda_{\max}]$, which shows that $g_N$ is exact for $u^\top A^{-1} u$. For left and right Gauss–Radau quadrature, we have $\beta_N = 0$, $\alpha_N^{lr} = \lambda_{\min}$, and $\alpha_N^{rr} = \lambda_{\max}$, while all other elements of the $(N+1)$-th row and column of $J_N'$ are zero. Thus, the eigenvalues of $J_N'$ are $\lambda_1, \ldots, \lambda_N, \tau_1$, and $\pi_N(\lambda)$ again equals $\prod_{i=1}^{N} (\lambda - \lambda_i)$. As a result, the remainder satisfies
\[
R_N[f] = \frac{f^{(2N+1)}(\xi)}{(2N+1)!}\, I[(\lambda - \tau_1)\pi_N^2] = 0,
\]
from which it follows that both $g_N^{rr}$ and $g_N^{lr}$ are exact.

The convergence rate in Theorem 13 and the final exactness of the iterates in Lemma 15 do not by themselves indicate that we make progress at each iteration. However, by exploiting the relations to CG we can indeed conclude that Gauss quadrature makes progress in each iteration.

Theorem 16. The approximation $g_i$ generated by Gauss quadrature is monotonically nondecreasing, i.e.,
\[
g_i \le g_{i+1}, \quad \text{for } i < N.
\]


Proof. At each iteration $r_i$ is taken to be orthogonal to the $i$-th Krylov space $\mathcal{K}_i = \mathrm{span}\{u, Au, \ldots, A^{i-1}u\}$. Let $\Pi_i$ be the projection onto the complement of $\mathcal{K}_i$. The residual then satisfies
\[
\|\varepsilon_{i+1}\|_A^2 = \varepsilon_{i+1}^\top A \varepsilon_{i+1} = r_{i+1}^\top A^{-1} r_{i+1} = (\Pi_{i+1} r_i)^\top A^{-1}\, \Pi_{i+1} r_i = r_i^\top\left(\Pi_{i+1}^\top A^{-1} \Pi_{i+1}\right) r_i \le r_i^\top A^{-1} r_i,
\]
where the last inequality follows from $\Pi_{i+1}^\top A^{-1} \Pi_{i+1} \preceq A^{-1}$. Thus $\|\varepsilon_i\|_A^2$ is monotonically nonincreasing, whereby $g_N - g_i \ge 0$ is monotonically nonincreasing and thus $g_i$ is monotonically nondecreasing.

Before we proceed to Gauss–Radau, let us recall a useful theorem and its corollary.

Theorem 17 (Lanczos Polynomial (Golub & Meurant, 2009)). Let $u_i$ be the vector generated by Algorithm 5 at the $i$-th iteration, and let $p_i$ be the Lanczos polynomial of degree $i$. Then we have
\[
u_i = p_i(A)\, u_0, \quad \text{where } p_i(\lambda) = (-1)^i\, \frac{\det(J_i - \lambda I)}{\prod_{j=1}^{i} \beta_j}.
\]

From this expression for the Lanczos polynomial we have the following corollary specifying the sign of the polynomial at specific points.

Corollary 18. Assume $i < N$. If $i$ is odd, then $p_i(\lambda_{\min}) < 0$; for even $i$, $p_i(\lambda_{\min}) > 0$; while $p_i(\lambda_{\max}) > 0$ for any $i < N$.

Proof. Since $J_i = V_i^\top A V_i$ with $V_i$ orthonormal, the spectrum of $J_i$ is bounded by $\lambda_{\min}$ and $\lambda_{\max}$ from left and right; in fact, for $i < N$ the eigenvalues of $J_i$ lie strictly between $\lambda_{\min}$ and $\lambda_{\max}$. Thus $J_i - \lambda_{\min} I$ is positive definite, and $J_i - \lambda_{\max} I$ is negative definite. Taking the factor $(-1)^i$ into consideration we obtain the desired conclusions.

We are ready to state our main result that compares (right) Gauss–Radau with Gauss quadrature.

Theorem 19 (Theorem 4 in the main text). Let $i < N$. Then, $g_i^{rr}$ gives better bounds than $g_i$ but worse bounds than $g_{i+1}$; more precisely,
\[
g_i \le g_i^{rr} \le g_{i+1}, \quad i < N. \qquad \text{(B.1)}
\]

Proof. We prove inequality (B.1) using the recurrences satisfied by $g_i$ and $g_i^{rr}$ (see Algorithm 5).

Upper bound: $g_i^{rr} \le g_{i+1}$. The iterative quadrature algorithm uses the recursive updates
\[
g_i^{rr} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2)}, \qquad
g_{i+1} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)}.
\]
It thus suffices to compare $\alpha_i^{rr}$ and $\alpha_{i+1}$. The three-term recursion for Lanczos polynomials shows that
\[
\beta_{i+1}\, p_{i+1}(\lambda_{\max}) = (\lambda_{\max} - \alpha_{i+1})\, p_i(\lambda_{\max}) - \beta_i\, p_{i-1}(\lambda_{\max}) > 0,
\]
\[
\beta_{i+1}\, p^*_{i+1}(\lambda_{\max}) = (\lambda_{\max} - \alpha_i^{rr})\, p_i(\lambda_{\max}) - \beta_i\, p_{i-1}(\lambda_{\max}) = 0,
\]
where $p_{i+1}$ is the original Lanczos polynomial, and $p^*_{i+1}$ is the modified polynomial that has $\lambda_{\max}$ as a root. Noting that $p_i(\lambda_{\max}) > 0$, we see that $\alpha_{i+1} \le \alpha_i^{rr}$. Moreover, from Theorem 16 we know that the $g_i$'s are monotonically nondecreasing, whereby $\delta_i(\alpha_{i+1}\delta_i - \beta_i^2) > 0$. It follows that
\[
0 < \delta_i(\alpha_{i+1}\delta_i - \beta_i^2) \le \delta_i(\alpha_i^{rr}\delta_i - \beta_i^2),
\]
and from this inequality it is clear that $g_i^{rr} \le g_{i+1}$.

Lower bound: $g_i \le g_i^{rr}$. Since $\beta_i^2 c_i^2 \ge 0$ and $\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2) \ge \delta_i(\alpha_{i+1}\delta_i - \beta_i^2) > 0$, we readily obtain
\[
g_i \le g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{rr}\delta_i - \beta_i^2)} = g_i^{rr}.
\]


Combining Theorem 19 with the convergence rate of the relative error for Gauss quadrature (Theorem 14) immediately yields the following convergence rate for right Gauss–Radau quadrature:

Theorem 20 (Relative error of right Gauss–Radau, Theorem 5 in the main text). For each $i$, the right Gauss–Radau iterates $g_i^{rr}$ satisfy
\[
\frac{g_N - g_i^{rr}}{g_N} \le 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i}.
\]

This result shows that with the same number of iterations, right Gauss–Radau gives a superior approximation over Gauss quadrature, though they share the same relative error convergence rate.

Our second main result compares Gauss–Lobatto with (left) Gauss–Radau quadrature.

Theorem 21 (Theorem 6 in the main text). Let $i < N$. Then, $g_i^{lr}$ gives better upper bounds than $g_i^{lo}$ but worse than $g_{i+1}^{lo}$; more precisely,
\[
g_{i+1}^{lo} \le g_i^{lr} \le g_i^{lo}, \quad i < N.
\]

Proof. We prove these inequalities using the recurrences for $g_i^{lr}$ and $g_i^{lo}$ from Algorithm 5.

$g_i^{lr} \le g_i^{lo}$: From Algorithm 5 we observe that $\alpha_i^{lo} = \lambda_{\min} + (\beta_i^{lo})^2/\delta_i^{lr}$. Thus we can write $g_i^{lr}$ and $g_i^{lo}$ as
\[
g_i^{lr} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)} = g_i + \frac{\beta_i^2 c_i^2}{\lambda_{\min}\delta_i^2 + \beta_i^2(\delta_i^2/\delta_i^{lr} - \delta_i)},
\]
\[
g_i^{lo} = g_i + \frac{(\beta_i^{lo})^2 c_i^2}{\delta_i(\alpha_i^{lo}\delta_i - (\beta_i^{lo})^2)} = g_i + \frac{(\beta_i^{lo})^2 c_i^2}{\lambda_{\min}\delta_i^2 + (\beta_i^{lo})^2(\delta_i^2/\delta_i^{lr} - \delta_i)}.
\]
To compare these quantities, as before it is helpful to begin with the original three-term recursion for the Lanczos polynomial, namely
\[
\beta_{i+1}\, p_{i+1}(\lambda) = (\lambda - \alpha_{i+1})\, p_i(\lambda) - \beta_i\, p_{i-1}(\lambda).
\]
In the construction of Gauss–Lobatto, to make a new polynomial of order $i+1$ that has roots $\lambda_{\min}$ and $\lambda_{\max}$, we add $\gamma_1 p_i(\lambda)$ and $\gamma_2 p_{i-1}(\lambda)$ to the original polynomial to ensure
\[
\beta_{i+1}\, p_{i+1}(\lambda_{\min}) + \gamma_1 p_i(\lambda_{\min}) + \gamma_2 p_{i-1}(\lambda_{\min}) = 0,
\]
\[
\beta_{i+1}\, p_{i+1}(\lambda_{\max}) + \gamma_1 p_i(\lambda_{\max}) + \gamma_2 p_{i-1}(\lambda_{\max}) = 0.
\]
Since $\beta_{i+1}$, $p_{i+1}(\lambda_{\max})$, $p_i(\lambda_{\max})$ and $p_{i-1}(\lambda_{\max})$ are all greater than $0$, we must have $\gamma_1 p_i(\lambda_{\max}) + \gamma_2 p_{i-1}(\lambda_{\max}) < 0$. To determine the sign of the polynomials at $\lambda_{\min}$, consider the two cases:

1. Odd $i$. In this case $p_{i+1}(\lambda_{\min}) > 0$, $p_i(\lambda_{\min}) < 0$, and $p_{i-1}(\lambda_{\min}) > 0$;
2. Even $i$. In this case $p_{i+1}(\lambda_{\min}) < 0$, $p_i(\lambda_{\min}) > 0$, and $p_{i-1}(\lambda_{\min}) < 0$.

Thus, if $S = (\mathrm{sgn}(\gamma_1), \mathrm{sgn}(\gamma_2))$, where the signs take values in $\{0, \pm 1\}$, then $S \ne (1,1)$, $S \ne (-1,1)$ and $S \ne (0,1)$. Hence $\gamma_2 \le 0$ must hold, and thus $(\beta_i^{lo})^2 = (\beta_i - \gamma_2)^2 \ge \beta_i^2$, given that $\beta_i > 0$ for $i < N$.

Using $(\beta_i^{lo})^2 \ge \beta_i^2$ together with $\lambda_{\min} c_i^2 \delta_i^2 \ge 0$, an application of the monotonicity of the univariate function $g(x) = \frac{ax}{b + cx}$ for $ab \ge 0$ to the recurrences defining $g_i^{lr}$ and $g_i^{lo}$ yields the desired inequality $g_i^{lr} \le g_i^{lo}$.

$g_{i+1}^{lo} \le g_i^{lr}$: From the recursion formulas we have
\[
g_i^{lr} = g_i + \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)}, \qquad
g_{i+1}^{lo} = g_{i+1} + \frac{(\beta_{i+1}^{lo})^2 c_{i+1}^2}{\delta_{i+1}(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2)}.
\]


Establishing $g_i^{lr} \ge g_{i+1}^{lo}$ thus amounts to showing that (noting the relations among $g_i$, $g_i^{lr}$ and $g_i^{lo}$):
\begin{align*}
&\frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)} \;\ge\; \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} + \frac{(\beta_{i+1}^{lo})^2 c_{i+1}^2}{\delta_{i+1}\big(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2\big)} \\
\iff\;& \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)} \;\ge\; \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} + \frac{(\beta_{i+1}^{lo})^2 c_i^2 \beta_i^2}{\delta_i^2\,\delta_{i+1}\big(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2\big)} \\
\iff\;& \frac{1}{\alpha_i^{lr}\delta_i - \beta_i^2} - \frac{1}{\alpha_{i+1}\delta_i - \beta_i^2} \;\ge\; \frac{(\beta_{i+1}^{lo})^2}{\delta_i\,\delta_{i+1}\big(\alpha_{i+1}^{lo}\delta_{i+1} - (\beta_{i+1}^{lo})^2\big)} \\
\iff\;& \frac{1}{(\alpha_{i+1} - \delta_{i+1}^{lr}) - \beta_i^2/\delta_i} - \frac{1}{\alpha_{i+1} - \beta_i^2/\delta_i} \;\ge\; \frac{1}{\delta_{i+1}\big(\alpha_{i+1}^{lo}\delta_{i+1}/(\beta_{i+1}^{lo})^2 - 1\big)} \qquad \text{(Lemma 23)} \\
\iff\;& \frac{1}{\delta_{i+1} - \delta_{i+1}^{lr}} - \frac{1}{\delta_{i+1}} \;\ge\; \frac{1}{\delta_{i+1}}\cdot\frac{1}{\lambda_{\min}\delta_{i+1}/(\beta_{i+1}^{lo})^2 + \delta_{i+1}/\delta_{i+1}^{lr} - 1} \\
\iff\;& \frac{\lambda_{\min}\delta_{i+1}}{(\beta_{i+1}^{lo})^2} + \frac{\delta_{i+1}}{\delta_{i+1}^{lr}} - 1 \;\ge\; \frac{\delta_{i+1}}{\delta_{i+1}^{lr}} - 1 \\
\iff\;& \frac{\lambda_{\min}\delta_{i+1}}{(\beta_{i+1}^{lo})^2} \;\ge\; 0,
\end{align*}
where the second step uses $c_{i+1} = c_i\beta_i/\delta_i$, the fourth uses Lemma 23, and the fifth uses $\delta_{i+1} = \alpha_{i+1} - \beta_i^2/\delta_i$ and $\alpha_{i+1}^{lo} = \lambda_{\min} + (\beta_{i+1}^{lo})^2/\delta_{i+1}^{lr}$; the last inequality is obviously true, hence the proof is complete.

In summary, we have the following corollary for all four quadrature rules:

Corollary 22 (Monotonicity of Lower and Upper Bounds, Corr. 7 in the main text). As the iteration proceeds, $g_i$ and $g_i^{rr}$ give increasingly better lower bounds and $g_i^{lr}$ and $g_i^{lo}$ give increasingly better upper bounds, namely
\[
g_i \le g_{i+1}; \qquad g_i^{rr} \le g_{i+1}^{rr}; \qquad g_i^{lr} \ge g_{i+1}^{lr}; \qquad g_i^{lo} \ge g_{i+1}^{lo}.
\]

Proof. Directly drawn from Theorem 16, Theorem 19 and Theorem 21.

Before proceeding further to our analysis of the convergence rates of left Gauss–Radau and Gauss–Lobatto, we note two technical results that we will need.

Lemma 23. Let $\alpha_{i+1}$ and $\alpha_i^{lr}$ be as in Algorithm 5. The difference $\Delta_{i+1} = \alpha_{i+1} - \alpha_i^{lr}$ satisfies $\Delta_{i+1} = \delta_{i+1}^{lr}$.

Proof. From the Lanczos polynomials in the definition of left Gauss–Radau quadrature we have
\begin{align*}
\beta_{i+1}\, p^*_{i+1}(\lambda_{\min}) &= \left(\lambda_{\min} - \alpha_i^{lr}\right) p_i(\lambda_{\min}) - \beta_i\, p_{i-1}(\lambda_{\min}) \\
&= \left(\lambda_{\min} - (\alpha_{i+1} - \Delta_{i+1})\right) p_i(\lambda_{\min}) - \beta_i\, p_{i-1}(\lambda_{\min}) \\
&= \beta_{i+1}\, p_{i+1}(\lambda_{\min}) + \Delta_{i+1}\, p_i(\lambda_{\min}) = 0.
\end{align*}
Rearranging this equation we write $\Delta_{i+1} = -\beta_{i+1}\,\frac{p_{i+1}(\lambda_{\min})}{p_i(\lambda_{\min})}$, which by Theorem 17 can be further rewritten as
\[
\Delta_{i+1} = -\beta_{i+1}\,\frac{(-1)^{i+1}\det(J_{i+1} - \lambda_{\min} I)\big/\prod_{j=1}^{i+1}\beta_j}{(-1)^{i}\det(J_{i} - \lambda_{\min} I)\big/\prod_{j=1}^{i}\beta_j} = \frac{\det(J_{i+1} - \lambda_{\min} I)}{\det(J_{i} - \lambda_{\min} I)} = \delta_{i+1}^{lr}.
\]

Remark 24. Lemma 23 has an implication beyond its utility for the subsequent proofs: it provides a new way of calculating $\alpha_{i+1}$ given the quantities $\delta_{i+1}^{lr}$ and $\alpha_i^{lr}$; this saves calculation in Algorithm 5.

The following lemma relates $\delta_i$ to $\delta_i^{lr}$, which will prove useful in the subsequent analysis.

Lemma 25. Let $\delta_i^{lr}$ and $\delta_i$ be computed in the $i$-th iteration of Algorithm 5. Then we have the following:
\[
\delta_i^{lr} < \delta_i, \qquad \text{(B.2)}
\]
\[
\frac{\delta_i^{lr}}{\delta_i} \le 1 - \frac{\lambda_{\min}}{\lambda_N}. \qquad \text{(B.3)}
\]


Proof. We prove (B.2) by induction. Since $\lambda_{\min} > 0$, $\delta_1 = \alpha_1 > \lambda_{\min}$ and $\delta_1^{lr} = \alpha_1 - \lambda_{\min}$, we know that $\delta_1^{lr} < \delta_1$. Assume that $\delta_i^{lr} < \delta_i$ holds for all $i \le k$ and consider the $(k+1)$-th iteration:
\[
\delta_{k+1}^{lr} = \alpha_{k+1} - \lambda_{\min} - \frac{\beta_k^2}{\delta_k^{lr}} < \alpha_{k+1} - \frac{\beta_k^2}{\delta_k} = \delta_{k+1}.
\]
To prove (B.3), simply observe the following:
\[
\frac{\delta_i^{lr}}{\delta_i} = \frac{\alpha_i - \lambda_{\min} - \beta_{i-1}^2/\delta_{i-1}^{lr}}{\alpha_i - \beta_{i-1}^2/\delta_{i-1}} \overset{\text{(B.2)}}{\le} \frac{\alpha_i - \lambda_{\min}}{\alpha_i} \le 1 - \frac{\lambda_{\min}}{\lambda_N}.
\]

With the aforementioned lemmas we can now show how fast the difference between $g_i^{lr}$ and $g_i$ decays. Note that $g_i^{lr}$ gives an upper bound on the objective while $g_i$ gives a lower bound.

Lemma 26. The difference between $g_i^{lr}$ and $g_i$ decreases linearly. More specifically, we have
\[
g_i^{lr} - g_i \le 2\kappa^{+}\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{i} g_N,
\]
where $\kappa^{+} = \lambda_N/\lambda_{\min}$ and $\kappa$ is the condition number of $A$, i.e., $\kappa = \lambda_N/\lambda_1$.

Proof. We rewrite the difference $g_i^{lr} - g_i$ as follows:
\[
g_i^{lr} - g_i = \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_i^{lr}\delta_i - \beta_i^2)}
= \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{\alpha_{i+1} - \beta_i^2/\delta_i}{\alpha_i^{lr} - \beta_i^2/\delta_i}
= \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{1}{1 - \Delta_{i+1}/\delta_{i+1}},
\]
where $\Delta_{i+1} = \alpha_{i+1} - \alpha_i^{lr}$ and we used $\alpha_{i+1} - \beta_i^2/\delta_i = \delta_{i+1}$. Next, recall that $\frac{g_N - g_i}{g_N} \le 2\big(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\big)^i$. Since $g_i$ lower bounds $g_N$, we have
\[
\left(1 - 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i}\right) g_N \le g_i \le g_N, \qquad
\left(1 - 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i+1}\right) g_N \le g_{i+1} \le g_N.
\]
Thus, we can conclude that
\[
\frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} = g_{i+1} - g_i \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N.
\]
Now we focus on the term $\left(1 - \Delta_{i+1}/\delta_{i+1}\right)^{-1}$. Using Lemma 23 we know that $\Delta_{i+1} = \delta_{i+1}^{lr}$. Hence, by Lemma 25,
\[
1 - \Delta_{i+1}/\delta_{i+1} = 1 - \delta_{i+1}^{lr}/\delta_{i+1} \ge 1 - \left(1 - \lambda_{\min}/\lambda_N\right) = \lambda_{\min}/\lambda_N =: \frac{1}{\kappa^{+}}.
\]
Finally we have
\[
g_i^{lr} - g_i = \frac{\beta_i^2 c_i^2}{\delta_i(\alpha_{i+1}\delta_i - \beta_i^2)} \cdot \frac{1}{1 - \Delta_{i+1}/\delta_{i+1}} \le 2\kappa^{+}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N.
\]


Theorem 27 (Relative error of left Gauss–Radau, Theorem 8 in the main text). For left Gauss–Radau quadrature, where the preassigned node is $\lambda_{\min}$, we have the following bound on the relative error:
\[
\frac{g_i^{lr} - g_N}{g_N} \le 2\kappa^{+}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i},
\]
where $\kappa^{+} := \lambda_N/\lambda_{\min}$ and $i < N$.

Proof. Write $g_i^{lr} = g_i + (g_i^{lr} - g_i)$. Since $g_i \le g_N$, using Lemma 26 to bound the second term we obtain
\[
g_i^{lr} \le g_N + 2\kappa^{+}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i} g_N,
\]
from which the claim follows upon rearrangement.

Due to the relations between left Gauss–Radau and Gauss–Lobatto, we have the following corollary:

Corollary 28 (Relative error of Gauss–Lobatto, Corr. 9 in the main text). For Gauss–Lobatto quadrature, we have the following bound on the relative error:
\[
\frac{g_i^{lo} - g_N}{g_N} \le 2\kappa^{+}\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{i-1}, \qquad \text{(B.4)}
\]
where $\kappa^{+} := \lambda_N/\lambda_{\min}$ and $i < N$.

C. Generalization: Symmetric Matrices

In this section we consider the case where $u$ lies in the column space of several top eigenvectors of $A$, and discuss how the aforementioned theorems change. In particular, note that the previous analysis assumes that $A$ is positive definite. With the analysis in this section we relax this assumption to the more general case where $A$ is symmetric with simple eigenvalues, though we require $u$ to lie in the space spanned by eigenvectors of $A$ corresponding to positive eigenvalues.

We consider the case where $A$ is symmetric and has the eigendecomposition $A = Q\Lambda Q^\top = \sum_{i=1}^{N} \lambda_i q_i q_i^\top$, where the $\lambda_i$'s are the eigenvalues of $A$ in increasing order and the $q_i$'s are the corresponding eigenvectors. Assume that $u$ lies in the column space spanned by the top $k$ eigenvectors of $A$, where all these $k$ eigenvectors correspond to positive eigenvalues. Namely, we have $u \in \mathrm{Span}\{q_i\}_{i=N-k+1}^{N}$ and $0 < \lambda_{N-k+1}$.

Since we only assume that $A$ is symmetric, it is possible that $A$ is singular, and thus we consider the value of $u^\top A^\dagger u$, where $A^\dagger$ is the pseudo-inverse of $A$. Due to the constraints on $u$ we have
\[
u^\top A^\dagger u = u^\top Q \Lambda^\dagger Q^\top u = u^\top Q_k \Lambda_k^\dagger Q_k^\top u = u^\top B^\dagger u,
\]
where $B = \sum_{i=N-k+1}^{N} \lambda_i q_i q_i^\top$. Namely, if $u$ lies in the column space spanned by the top $k$ eigenvectors of $A$, then it is equivalent to substitute $B$ for $A$, where $B$ is the truncation of $A$ to its top $k$ eigenvalues and corresponding eigenvectors.

Another key observation is that, given that $u$ lies only in the space spanned by $\{q_i\}_{i=N-k+1}^{N}$, the Krylov space starting at $u$ becomes
\[
\mathrm{Span}\{u, Au, A^2 u, \ldots\} = \mathrm{Span}\{u, Bu, B^2 u, \ldots, B^{k-1} u\}. \qquad \text{(C.1)}
\]
This indicates that the Lanczos iteration starting with matrix $A$ and vector $u$ will finish constructing the corresponding Krylov space after the $k$-th iteration. Thus, under this condition, Algorithm 5 will run at most $k$ iterations and then stop. At that point, the eigenvalues of $J_k$ are exactly the eigenvalues of $B$, i.e., exactly $\{\lambda_i\}_{i=N-k+1}^{N}$ of $A$. Using a proof similar to that of Lemma 15, we obtain the following generalized exactness result.

Corollary 29 (Generalized Exactness). $g_k$, $g_k^{rr}$ and $g_k^{lr}$ are exact for $u^\top A^\dagger u = u^\top B^\dagger u$, namely
\[
g_k = g_k^{rr} = g_k^{lr} = u^\top A^\dagger u = u^\top B^\dagger u.
\]
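A small NumPy sketch of the two observations above (the truncation identity and the early breakdown of Lanczos after $k$ steps); the spectrum and subspace below are fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 12, 4
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = np.sort(rng.uniform(-1.0, 1.0, N))
lam[-k:] = np.sort(rng.uniform(1.0, 2.0, k))     # top-k eigenvalues positive
A = (Q * lam) @ Q.T                              # symmetric, indefinite
u = Q[:, -k:] @ rng.standard_normal(k)           # u in the span of the top-k eigenvectors

# u^T A^+ u equals u^T B^+ u for the rank-k truncation B.
B = (Q[:, -k:] * lam[-k:]) @ Q[:, -k:].T
print(u @ np.linalg.pinv(A) @ u, u @ np.linalg.pinv(B) @ u)

# Lanczos on (A, u) exhausts the Krylov space (C.1) after k iterations.
q_prev, q, beta = np.zeros(N), u / np.linalg.norm(u), 0.0
for i in range(1, N + 1):
    w = A @ q - beta * q_prev
    w -= (q @ w) * q
    beta = np.linalg.norm(w)
    if beta < 1e-10:
        print("Lanczos breakdown at iteration", i)   # expect i == k
        break
    q_prev, q = q, w / beta
```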


The monotonicity of, and the relations between, the bounds given by the various Gauss-type quadratures remain the same as in the original case of Section 4, but the original convergence rate no longer applies, because now we may have $\lambda_{\min}(B) = 0$, making $\kappa$ undefined. This breakdown of the convergence rate stems from the breakdown of the convergence analysis of the corresponding conjugate gradient method for solving $Ax = u$. However, by examining the proof of, e.g., (Shewchuk, 1994), and noting that $\lambda_1(B) = \cdots = \lambda_{N-k}(B) = 0$, a slight modification of the proof actually yields the bound
\[
\|\varepsilon_i\|_A^2 \le \min_{P_i}\; \max_{\lambda \in \{\lambda_j\}_{j=N-k+1}^{N}} [P_i(\lambda)]^2\; \|\varepsilon_0\|_A^2,
\]
where $P_i$ is a polynomial of order $i$. By using properties of Chebyshev polynomials and following the original proof (e.g., (Golub & Meurant, 2009) or (Shewchuk, 1994)) we obtain the following lemma for conjugate gradient.

Lemma 30. Let $\varepsilon_k$ be as before (for conjugate gradient). Then,
\[
\|\varepsilon_k\|_A \le 2\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{k} \|\varepsilon_0\|_A, \qquad \text{where } \kappa' := \lambda_N/\lambda_{N-k+1}.
\]

Following this new convergence rate and the connections between conjugate gradient, Lanczos iterations and Gauss quadrature mentioned in Section 4, we have the following convergence bounds.

Corollary 31 (Convergence Rate for the Special Case). Under the above assumptions on $A$ and $u$, due to the connection between Gauss quadrature, the Lanczos algorithm and conjugate gradient, the relative convergence rates of $g_i$, $g_i^{rr}$, $g_i^{lr}$ and $g_i^{lo}$ are given by
\[
\frac{g_k - g_i}{g_k} \le 2\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{i}, \qquad
\frac{g_k - g_i^{rr}}{g_k} \le 2\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{i},
\]
\[
\frac{g_i^{lr} - g_k}{g_k} \le 2\kappa'_m\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{i}, \qquad
\frac{g_i^{lo} - g_k}{g_k} \le 2\kappa'_m\left(\frac{\sqrt{\kappa'} - 1}{\sqrt{\kappa'} + 1}\right)^{i},
\]
where $\kappa'_m = \lambda_N/\lambda'_{\min}$ and $0 < \lambda'_{\min} < \lambda_{N-k+1}$ is a lower bound on the nonzero eigenvalues of $B$.

D. Accelerating MCMC for k-DPP

We present details of a retrospective Markov chain Monte Carlo (MCMC) sampler in Algorithm 6 and Algorithm 7 for efficiently drawing samples from a k-DPP, accelerated using our results on Gauss-type quadratures.

Algorithm 6 Gauss-kDPP($L$, $k$)
Input: $L$ the kernel matrix we want to sample the DPP from; $k$ the size of the subset; $\mathcal{Y} = [N]$ the ground set
Output: $Y$ sampled from the exact k-DPP($L$) with $|Y| = k$
Randomly initialize $Y \subseteq \mathcal{Y}$ with $|Y| = k$
while not mixed do
  Pick $v \in Y$ and $u \in \mathcal{Y}\setminus Y$ uniformly at random
  Pick $p \in (0, 1)$ uniformly at random
  $Y' = Y \setminus \{v\}$
  Get lower and upper bounds $\lambda_{\min}$, $\lambda_{\max}$ of the spectrum of $L_{Y'}$
  if kDPP-JudgeGauss($p L_{v,v} - L_{u,u}$, $p$, $L_{Y',u}$, $L_{Y',v}$, $L_{Y'}$, $\lambda_{\min}$, $\lambda_{\max}$) = True then
    $Y = Y' \cup \{u\}$
  end if
end while


Algorithm 7 kDPP-JudgeGauss($t$, $p$, $u$, $v$, $A$, $\lambda_{\min}$, $\lambda_{\max}$)
Input: $t$ the target value; $p$ the scaling factor; $u$, $v$ and $A$ the corresponding vectors and matrix; $\lambda_{\min}$ and $\lambda_{\max}$ lower and upper bounds for the spectrum of $A$
Output: True if $t < p\,(v^\top A^{-1} v) - u^\top A^{-1} u$, False otherwise
Initialize: $u_{-1} = 0$, $u_0 = u/\|u\|$, $i_u = 1$, $\beta_0^{u} = 0$, $d_u = \infty$; $v_{-1} = 0$, $v_0 = v/\|v\|$, $i_v = 1$, $\beta_0^{v} = 0$, $d_v = \infty$
while True do
  if $d_u > p\, d_v$ then
    Run one more iteration of Gauss–Radau on $u^\top A^{-1} u$ to get tighter $(g^{lr})_u$ and $(g^{rr})_u$
    $d_u = (g^{lr})_u - (g^{rr})_u$
  else
    Run one more iteration of Gauss–Radau on $v^\top A^{-1} v$ to get tighter $(g^{lr})_v$ and $(g^{rr})_v$
    $d_v = (g^{lr})_v - (g^{rr})_v$
  end if
  if $t < p\,\|v\|^2 (g^{rr})_v - \|u\|^2 (g^{lr})_u$ then
    Return True
  else if $t \ge p\,\|v\|^2 (g^{lr})_v - \|u\|^2 (g^{rr})_u$ then
    Return False
  end if
end while
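Below is a minimal Python sketch of this retrospective test. It reuses the bordered-Jacobi construction from the GQL sketch in Appendix A, here packaged as a generator yielding the pair $(\|x\|^2 g_i^{rr}, \|x\|^2 g_i^{lr})$ per Lanczos step; names such as radau_bounds and judge_gauss are our own, and a production version would use the $O(1)$ recursions of Algorithm 5 instead of dense solves:

```python
import numpy as np

def radau_bounds(A, x, lam_min, lam_max):
    """Yield successively tighter (lower, upper) bounds on x^T A^{-1} x,
    i.e. (||x||^2 g_i^rr, ||x||^2 g_i^lr), one pair per Lanczos step."""
    xnorm2 = x @ x
    q_prev, q, beta = np.zeros_like(x), x / np.sqrt(xnorm2), 0.0
    alphas, betas = [], []
    while True:
        w = A @ q - beta * q_prev
        a = q @ w; w -= a * q
        alphas.append(a)
        i = len(alphas)
        J = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        beta = np.linalg.norm(w)
        ei = np.zeros(i); ei[-1] = 1.0
        def radau(z):  # border J so that z is an eigenvalue of the extension
            phi = z + beta ** 2 * np.linalg.solve(J - z * np.eye(i), ei)[-1]
            Jx = np.block([[J, beta * ei[:, None]], [beta * ei[None, :], phi]])
            e1 = np.zeros(i + 1); e1[0] = 1.0
            return xnorm2 * np.linalg.solve(Jx, e1)[0]
        yield radau(lam_max), radau(lam_min)  # lower (right Radau), upper (left Radau)
        if beta < 1e-12:        # bounds are now exact (Lemma 15); stop refining
            return
        betas.append(beta)
        q_prev, q = q, w / beta

def judge_gauss(t, p, u, v, A, lam_min, lam_max):
    """Sketch of Algorithm 7: lazily decide t < p v^T A^{-1} v - u^T A^{-1} u,
    refining whichever of the two quadratures has the looser gap."""
    bu = radau_bounds(A, u, lam_min, lam_max)
    bv = radau_bounds(A, v, lam_min, lam_max)
    (lu, uu), (lv, uv) = next(bu), next(bv)
    while True:
        if t < p * lv - uu:
            return True
        if t >= p * uv - lu:
            return False
        if (uu - lu) > p * (uv - lv):
            lu, uu = next(bu)
        else:
            lv, uv = next(bv)
```

Once the bounds become exact the two decision tests are exhaustive, so the loop always terminates, typically long before the Krylov space is exhausted.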

E. Accelerating Stochastic Double Greedy

We present details of the retrospective stochastic double greedy method in Algorithm 8 and Algorithm 9, which efficiently selects a subset $X \subseteq \mathcal{Y}$ that approximately maximizes $\log\det(L_X)$.

Algorithm 8 Gauss-DG($L$)
Input: $L$ the kernel matrix and $\mathcal{Y} = [N]$ the ground set
Output: $X \subseteq \mathcal{Y}$ that approximately maximizes $\log\det(L_X)$
$X_0 = \emptyset$, $Y_0 = \mathcal{Y}$
for $i = 1, 2, \ldots, N$ do
  $Y_i' = Y_{i-1} \setminus \{i\}$
  Sample $p \in (0, 1)$ uniformly at random
  Get lower and upper bounds $\lambda_{\min}^{-}$, $\lambda_{\max}^{-}$, $\lambda_{\min}^{+}$, $\lambda_{\max}^{+}$ of the spectra of $L_{X_{i-1}}$ and $L_{Y_i'}$, respectively
  if DG-JudgeGauss($L_{X_{i-1}}$, $L_{Y_i'}$, $L_{X_{i-1},i}$, $L_{Y_i',i}$, $L_{i,i}$, $p$, $\lambda_{\min}^{-}$, $\lambda_{\max}^{-}$, $\lambda_{\min}^{+}$, $\lambda_{\max}^{+}$) = True then
    $X_i = X_{i-1} \cup \{i\}$
  else
    $Y_i = Y_i'$
  end if
end for


Algorithm 9 DG-JudgeGauss($A$, $B$, $u$, $v$, $t$, $p$, $\lambda_{\min}^{A}$, $\lambda_{\max}^{A}$, $\lambda_{\min}^{B}$, $\lambda_{\max}^{B}$)
Input: $t$ the target value; $p$ the scaling factor; $u$, $v$, $A$ and $B$ the corresponding vectors and matrices; $\lambda_{\min}^{A}$, $\lambda_{\max}^{A}$, $\lambda_{\min}^{B}$, $\lambda_{\max}^{B}$ lower and upper bounds for the spectra of $A$ and $B$
Output: True if $p\,|\log(t - u^\top A^{-1} u)|_{+} \le (1-p)\,|-\log(t - v^\top B^{-1} v)|_{+}$, False otherwise
$d_u = \infty$, $d_v = \infty$
while True do
  if $p\, d_u > (1-p)\, d_v$ then
    Run one more iteration of Gauss–Radau on $u^\top A^{-1} u$ to get tighter lower and upper bounds $l_u$, $u_u$ for $|\log(t - u^\top A^{-1} u)|_{+}$
    $d_u = u_u - l_u$
  else
    Run one more iteration of Gauss–Radau on $v^\top B^{-1} v$ to get tighter lower and upper bounds $l_v$, $u_v$ for $|-\log(t - v^\top B^{-1} v)|_{+}$
    $d_v = u_v - l_v$
  end if
  if $p\, u_u \le (1-p)\, l_v$ then
    Return True
  else if $p\, l_u > (1-p)\, u_v$ then
    Return False
  end if
end while
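The only new ingredient relative to Algorithm 7 is converting two-sided quadrature bounds on $x^\top A^{-1} x$ into bounds on the clipped log-gains. A small sketch of that conversion follows; the helper name and the guard against a non-positive argument are our own choices:

```python
import numpy as np

def log_gain_bounds(t, quad_lower, quad_upper):
    """Given quad_lower <= x^T A^{-1} x <= quad_upper, bound the clipped
    log-gain |log(t - x^T A^{-1} x)|_+ from below and above.

    Since z -> |log(t - z)|_+ is nonincreasing in z, the upper quadrature
    bound gives the lower log bound and vice versa."""
    def clipped_log(z):
        return max(np.log(z), 0.0) if z > 0.0 else 0.0
    return clipped_log(t - quad_upper), clipped_log(t - quad_lower)

# Example: bounds 0.8 <= x^T A^{-1} x <= 0.9 with diagonal entry t = 2.0.
lo, hi = log_gain_bounds(2.0, 0.8, 0.9)
print(lo, hi)   # 0.0953... <= |log(t - x^T A^{-1} x)|_+ <= 0.1823...
```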