Numerical Methods Rafał Zdunek Iterative...

NumericalNumerical

MethodsMethods

Rafał ZdunekRafał Zdunek

Iterative MethodsIterative Methods

(4h.)(4h.)

Introduction

• Stationary basic iterative methods, • Krylov subspace methods,• Nonnegative matrix factorization,• Multi-dimensional array decomposition

methods (tensor decompositions).

Bibliography[1] A. Bjorck, Numerical Methods for Least Squares Problems, SIAM,

Philadelphia, 1996, [2] G. Golub, C. F. Van Loan, Matrix Computations, The John Hopkins

University Press, (Third Edition), 1996, [3] J. Stoer

R. Bulirsch, Introduction to Numerical Analysis (Second Edition),

Springer-Verlag, 1993, [4] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2000,[5] Ch. Zarowski, An Introduction to Numerical Analysis for Electrical and

Computer Engineers, Wiley, 2004,[6] Cichocki, R. Zdunek, A. H. Phan, S.-I. Amari, Nonnegative Matrix and

Tensor Factorization: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, Wiley and Sons, UK, 2009

DefinitionIterative linear solvers attempt to iteratively approximate a solution [ ] N

jx ℑ∈=*x to a system of linear equations:

bAx = ,

where [ ] NMija ×ℑ∈=A is a coefficient matrix and [ ] M

ib ℑ∈=b is a data vector, using following updates:

( )( 1) ( ) , ,k kf+ =x x A b ,

where ( )kx is an approximation to the solution x in the k-th iterative step, and ( ), ,f ⋅ ⋅ ⋅ is an

update function determined by an underlying iterative method, where ( )( ) *lim , ,k

kf

→∞→x A b x .

Stationary basic iterative methods

Let ,=Ax b ,n n×∈ℜAwhere (nonsingular) ,n∈ℜx n∈ℜb

,= −A S TAssume the splitting: thus 1 ,k k+ = +Sx Tx b (Basic Iterative Methods)

We have 1 ,k k+ = +x Gx c where 1 1 ,− −= = −G S T I S A 1−=c S b

Theorem: The iterations {xk } are convergent to x = A-1b for any starting guess x0 if and only if every eigenvalue λ

of S-1T satisfies | λ

| <1. Its convergence rate depends

on the maximum value of | λ

|, which is known as a spectral radius of S-1T:

( ) .max1iiλρ =− TS

.

.

Stationary basic iterative methods

Proof: Let ek =xk -x* denote an error in the k-th iteration. Since Sxk+1 = Txk +b, we have S(xk+1 – x*) = T(xk – x*) +b, and the error in xk+1 is given by ek+1 =S-1T ek

= (S-1T)k+1 e0 . If ρ(S-1T)<1, (S-1T)k → 0.

The matrix S can be regarded as a preconditioner. There are several choices for splitting A. For example:

1. Jacobi method: S = diagonal part of A;

2. Gauss-Seidel method: S = lower triangular part of A;

3. Sucessive Over-Relaxation (SOR) method: combination of 1 and 2.

Jacobi methodLet ,n n×∈ℜA

11

22

0 00

,0

0 0

n n

nn

aa

a

×

⎡ ⎤⎢ ⎥⎢ ⎥= ∈ℜ⎢ ⎥⎢ ⎥⎣ ⎦

S

,= −A S T

12 1

21

( 1)

1 ( 1)

00

.

0

n

n n

n n

n n n

a aa

aa a

×

−

−

⎡ ⎤⎢ ⎥⎢ ⎥= − ∈ℜ⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

T

Thus ( )11 ,k k

−+ = +x S Tx b or ( 1) ( )1 , 1, 2, , .

nk k

i i ij jj iii

x b a x i na

+

≠

⎛ ⎞= − =⎜ ⎟

⎝ ⎠∑ …

Remark: if ∃i: aii = 0, a permutation of rows or columns is necessary.

Gauss-Seidel methodAssume

11

21 22

1 ( 1)

0 0

,0

n n

n n n nn

aa a

a a a

×

−

⎡ ⎤⎢ ⎥⎢ ⎥= ∈ℜ⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

S

12 1

( 1)

00 0

.

0 0 0

n

n n

n n

a a

a×

−

⎡ ⎤⎢ ⎥⎢ ⎥= − ∈ℜ⎢ ⎥⎢ ⎥⎣ ⎦

T

Thus ( )11 ,k k

−+ = +x S Tx b or

1( 1) ( 1) ( )

1 1

1 . i n

k k ki i ij j ij j

j j iii

x b a x a xa

−+ +

= = +

⎛ ⎞= − −⎜ ⎟

⎝ ⎠∑ ∑

Remark: Note that the Gauss-Seidel method is a cyclic method and uses also the elements xi

(k+1) that have been already updated in the current iterative cycle.However, unlike the Jacobi method, the computation for each element cannot be done in parallel.

Successive Over-Relaxation (SOR)

Let ,n n×∈ℜA ,= + +A L D U

11

22

0 00

,0

0 0 nn

aa

a

⎡ ⎤⎢ ⎥⎢ ⎥=⎢ ⎥⎢ ⎥⎣ ⎦

D

12 1

( 1)

00 0

.

0 0 0

n

n n

a a

a −

⎡ ⎤⎢ ⎥⎢ ⎥=⎢ ⎥⎢ ⎥⎣ ⎦

U21

1 ( 1)

0 0 00

,00n n n

a

a a −

⎡ ⎤⎢ ⎥⎢ ⎥=⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

L

Assuming1 ,ω

= +S L D 1 .ωω−⎛ ⎞= − +⎜ ⎟

⎝ ⎠T U D ( )1

1 .k k−

+ = +x S Tx bUsing

we have ( ) ( )( )( )11 1 ,k kω ω ω ω−+ = + − + −x L D b U D x 0 < ω

< 2 - relaxation

parameter

Successive Over-Relaxation (SOR)

Finally( )

1( 1) ( ) ( 1) ( )

1 11 .

i nk k k k

i i i ij j ij jj j iii

x x b a x a xaωω

−+ +

= = +

⎛ ⎞= − + − −⎜ ⎟

⎝ ⎠∑ ∑

For ω

=1, the SOR simplifies to the Gauss-Seidel method.

If :T n n×= ∈ℜA A (symmetric) ,T=U L11 ,

2

T

ω ω ω ω

−⎛ ⎞⎛ ⎞ ⎛ ⎞= + +⎜ ⎟⎜ ⎟ ⎜ ⎟− ⎝ ⎠⎝ ⎠ ⎝ ⎠

D D DP L L

( )11 ,k k kγ −+ = − −x x P Ax b (SSOR method)

Stationary basic iterative methods for LS problems

Let ,m n×∈ℜA

,T = −A A S T

where ,m n≥

Splitting:

We have 1 ,k k+ = +x Gx c where 1 ,Tn

−= −G I S A A 1 T−=c S A b

and consider the normal equations of the first kind:

,T T=A Ax A b

In general: ( )1 ,k k k+ = + −x x B b Ax

The matrix B can take various forms, depending on the underlying method.

Landweber iterations,T = −A A S TLet 1 ,nα

=S I 1 ,Tnα

= −T I A A for the splitting

and is a relaxation parameter.0α >

We have 1 ,k k+ = +x Gx c where ,Tn α= −G I A A .Tα=c A b

Finally ( )1 ,Tk k kα+ = + −x x A b Ax where .Tα=B A

The Landweber iterations are also known as the Richardson’s first-order method.

If (range of AT), then the Landweber iterations are convergent to x = A+b if

( )0 ,TR∈x A

( ) 1

max0 2 .Tα λ−

< < A A

Jacobi method for LS problemsLet [ ]1 2, , , ,m n

n×= ∈ℜA a a a…

21 2

22 2

2

2

0 0

0,

0

0 0

n n

n

×

⎡ ⎤⎢ ⎥⎢ ⎥

= ∈ℜ⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

a

aS

a

,T = −A A S T

1 2 1

2 1

1

1 1

00

.

0

T Tn

Tn n

Tn n

T Tn n n

×

−

−

⎡ ⎤⎢ ⎥⎢ ⎥= − ∈ℜ⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

a a a aa a

Ta a

a a a a

From ( )11 ,k k k

−+ = + = +x S Tx b Gx c where

( )( 1) ( )2

2

, j 1, 2, , .Tjk k

j j k

j

x x n+ = + − =a

b Axa

…

1 ,Tn

−= −G I S A A 1 T−=c S A b

we have:

Jacobi method for LS problemsIn the parallel mode:

( ) ( )11 ,T

k k k k k−

+ = + − = + −x x B b Ax x S A b Ax where 1 T−=B S A

and assuming that all columns in A are non-zero, we have : 0.jjj s∀ >

The Jacobi method is symmetrizable since:

( )1/2 1/2 1/2 1/2.Tn

− − −− =S I G S S A AS

Gauss-Seidel method for LS problems

Assume 21 2

22 1 2 2

21 1 2

0 0

,0

Tn n

T Tn n n n

×

−

⎡ ⎤⎢ ⎥⎢ ⎥

= ∈ℜ⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

a

a a aS

a a a a a

1 2 1

1

00 0

.

0 0 0

T Tn

n nTn n

×

−

⎡ ⎤⎢ ⎥⎢ ⎥= − ∈ℜ⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

a a a a

Ta a

Thus ( )11 ,k k

−+ = +x S Tx b leads to

( )( )11 1 .T T

k k k k−

+ += + − + +x x S A b Tx S T x

Kaczmarz algorithm

1 2

2

, Ti i kk k i

i

bα+

−= +

a xx x aa

In the sequential mode the Kaczmarz algorithm (devised by a Polish mathematician: Stefan Kaczmarz, and published in 1937) is given by:

Let1

,m n

m

×

⎡ ⎤⎢ ⎥= ∈ℜ⎢ ⎥⎢ ⎥⎣ ⎦

aA

awhere is the i-th row vector of A.1 n

i×∈ℜa

mod 1i k m= −

In image reconstruction, this algorithm is known as the unconstrained Algebraic Reconstruction Technique (ART).

0 2α< < - relaxation

Kaczmarz algorithmHyperplane defined by i-th equation:

The algorithm converges to:

1 2 mH H H∩ ∩ ∩…

Kaczmarz algorithm

For consistent case:

Limit point:

where is an initial guess and G is a generalized inverse of A.

( ) ( )( ) LSNkkNP xAGbxxxx A +=+==

∞→)(lim 00

*

0x

(K. Tanabe, Numerische Mathematik, 1971)

( )R∈b A

(nullspace) (minimal norm LS solution)

Kaczmarz algorithm

For inconsistent case:Limit point:

where

⇒

where:

( ) ( )( ) bGxxxx A~lim 00

* +==∞→ Nkk

P

,r δ= +b b b ,r nδ δ δ= +b b b ( ) ,r Rδ ∈b A ( ) ,Tn Nδ ∈b A

( ) ( )( )220

* ;, nrLSSd bGbGbGbAxx δδδ +==

( ) 2, .d = −u v u v

(C. Popa, R. Zdunek, Mathematics and Computers in Simulations, 2004)

( )R∉b A

Kaczmarz algorithm (example from image reconstruction)

Krylov subspace methods1k k kα+ = +x x r

k k= −r b Ax

The Richardson method: - solution update

- residual update

For α

= 1, we have ( )1 0 0 1 2 01

.k

ik k

i+

=

= + + + + + = − ⋅∑x x r r r r I A r…

Thus { } ( )11 0 0 0 0 0span , , , ; .k k

k K ++ − ∈ ⋅ ⋅ =x x r A r A r A r… - Krylov subspace

1

1 0 01

.k

ik i

iα

+

+=

= +∑r r A r

Assuming 1 1,k k+ += −r b Ax we have

11

1 0 01

,k

ik i

iα

+−

+=

= +∑x x A r

(Linear combinationcoefficients)

Krylov subspace methodsA family of nonparametric polynomial iterative methods based on the general formula:

( )1 1 0p ,k k+ += ⋅r A r

where ( ) 11p ++ Π∈ kk A (space of polynomials with (k+1) degrees ) and ( ) 10p 1 =+k

(Residual polynomial)

( )1 0 0q ,k k+ − = ⋅x x A r

(Iterative polynomial)

( ) ( )1p 1 q .k k+ = − ⋅A A A

Assuming * :k k= −ε x x ( ) ( )1 10 0p p .k k k k

− −= = =ε A r A A r A ε

Krylov subspace methodsThe classes of nonparametric polynomial iterative methods:

• Orthogonal Residual (OR) polynomials:

( ) ( ) 0, for p ,p

0, for OR ORk l

k lt t

k l> =⎧

= ⎨ ≠⎩where ( ) ( )p 0 p 0 1.OR OR

k l= =

Methods: CG, Lanczos, GENCG and FOM.

Minimal error criterion: ( ){ }* * 0 0min : , .OR kk K− = − ∈ +

AAx x x x x x A r

The Ritz-Galerkin condition: ( )0 , .OR kk K⊥r A r

• Minimal Residual (MR) polynomials: ( ) ( )p , p 0,MR MRk lt t t = .k l≠for

Minimal error criterion: ( ){ }0 0min : , .MR kk K− = − ∈ +b Ax b Ax x x A r

The Petrov-Galerkin condition: ( )0,rAAr kMRk K ⊥ Methods: CR, GMRES, MINRES

Krylov subspace methods• Minimal Error (ME) polynomials: ( ) ( ) 0p,p =ttME

k for ( ) 1p ,kt −∀ ∈Π

( )p 0 1,MEk = ( ) ( )p 0 1.ME

k

′=where

( ){ }* * 0 0min : , .ME kk K− = − ∈ +x x x x x x A A rMinimal error criterion:

These methods satisfy the Ritz-Galerkin condition: ( )0 , .ME kk K⊥r A r

• Minimal Residual (MR) polynomials for normal equations: 2 ,=A x Ab

( ) ( )2 2 2p , p 0MR MRk l =A A A for .k l≠

Minimal error criterion: ( ){ }20 0min : , .LSQR k

k K− = − ∈ +b Ax b Ax x x A A r

Methods: LSQR

Stieltjes algorithmThe OR polynomials satisfy the 3-term recurrences and can be computed by the Stieltjes algorithm:

( ) 0p 1 =− t( )0p 1,t =

( ) ( )( ) ( )( ) ( )( ) ( )

( )( ) ( ) ( ) ( )

1 1

1 1

1 2

2 2

1 2 1,2, ,

, p,

p , p

p ,p,

p , p

,

p p p ,

k kk

k k

k kk

k k

k k k

k k k k k k k K

t t tt t

t t tt t

t t t t

η

ξ

γ η ξ

γ η ξ

− −

− −

− −

− −

− − =

⎧ ⎫=⎪ ⎪

⎪ ⎪⎪ ⎪⎪ ⎪

=⎨ ⎬⎪ ⎪⎪ ⎪= − +⎪ ⎪⎪ ⎪= − −⎩ ⎭ …

( ) ( ) ( ){ }ttt kp,,p,p 10 …

Conjugate Gradients (CG) algorithm

Let be an additional polynomial for determining a search direction for ( )tk 1φ − ( )p .k t

The search direction is tangential in the point to the hyperplane spanned by

( ){ }tk k 1p,1 −−

( ) ( ) ( ) ( ){ }0 1 2 1 p , p , p , p .k kt t t t− −… Thus

( ) ( ) ( ) ( )1 21 1 1

p pφ p .k k

k k k k

t tt t

tξ − −

− − −

⎛ ⎞−= + ∈Π⎜ ⎟⎝ ⎠

( ) ( )( ) ( )( ) ( )( ) ( )

( ) ( ) ( )( ) ( ) ( )

1 1

1 1

1 1

1 2 2

1 2 1

1 1 1,2, ,

p , p1 ,φ ,φ

p ,p,

p ,p

φ φ p ,

p p φ ,

k kk

k k k

k kkk

k k k

k k k k

k k k k k K

t tt t t

t tt t

t t t

t t t t

αγ

ξβγ

β

α

− −

− −

− −

− − −

− − −

− − =

⎧ ⎫= =⎪ ⎪

⎪ ⎪⎪ ⎪⎪ ⎪

= =⎨ ⎬⎪ ⎪⎪ ⎪= +⎪ ⎪⎪ ⎪= −⎩ ⎭ …

From the orthogonality of OR polynomials:

( )( ) ( )p p 0,T

l kt t = for kl ≠

( ) ( ) ( ) ( ){ }tttt kk 1101 p,,p,pspanφ −− ∈ …

( ) ( ) ( ){ }1 1φ span p , pk k kt t t t− −∈


1 1

2 2

1 2 1

1 1

1 1

1 1

1 1

1,2,

,

,

,

,,

Tk k

k Tk k

k k k kTk k

k Tk k

k k k k

k k k k

k

β

β

α

αα

− −

− −

− − −

− −

− −

− −

− −

=

⎧ ⎫=⎪ ⎪

⎪ ⎪⎪ ⎪= +⎪ ⎪⎪ ⎪=⎨ ⎬⎪ ⎪⎪ ⎪= +⎪ ⎪

= −⎪ ⎪⎪ ⎪⎩ ⎭

r rr r

z z r

r rz Az

x x zr r Az

…

[ ]1, ,1 ,T n= ∈ℜe … 1 ,− =r e 0z =−1

- initial guess,0x 0 0= −r b Ax n n×∈ℜA- initial residual vector, where is a symmetric and positive-definite matrix.

Remark: ( )1 1p ,k kt− −r ( )1 1φk kt− −z

From the orthogonality of OR polynomials:

( )( ) ( )p p 0,T

l kt t = for we have:,l k≠ 0.Tl k =r r

From the t-orthogonality of the search direction polynomials:

for we have:,l k≠( )( ) ( )φ φ 0T

l kt t t = 0.Tl k =z Az


Theorem:

For any the sequence generated by the conjugate direction algorithm converges to the solution x* of the linear system in at most n iterations.

0n∈ℜx { }kx

Convergence rate: ( )( )

* *0

1,

1

k

k

κ

κ

⎛ ⎞−⎜ ⎟− ≤ −⎜ ⎟+⎝ ⎠

A A

Ax x x x

A

where is a condition number of A. ( )κ A

CG algorithm for normal equations

( )( )( )

( )

1

1 1

1

10,1,

,

,

,

,

,

Tk k

Tk k

k Tk k

k k k kTk k

k Tk k

k k k kk

β

β

α

α

−

− −

−

+=

⎧ ⎫= −⎪ ⎪⎪ ⎪

=⎪ ⎪⎪ ⎪⎪ ⎪= −⎨ ⎬⎪ ⎪⎪ ⎪=⎪ ⎪⎪ ⎪

= +⎪ ⎪⎩ ⎭

q A b Ax

Aq Aw

Aw Aww q w

w qAw Aw

x x w…

( )1 0 ,T− = −w A b Ax

- initial guess,0x

1 0,k k− =q w

0Tl k =w A Aw

kzThe search direction is replaced with the correction vector that satisfies the following orthogonality conditions:

kw

1.

2. for .l k≠

CGNR algorithmConjugate Gradient Normal Residual (CGNR) algorithm for normal equations:

( )

( )

1 1

2 2

1 2 1

1 1

1 1

1 1

1,2,

,

,

,

,

,

Tk k

k Tk k

k k k kTk k

k Tk k

k k k kT

k k

k

β

β

α

α

− −

− −

− − −

− −

− −

− −

=

⎧ ⎫=⎪ ⎪

⎪ ⎪⎪ ⎪= +⎪ ⎪⎪ ⎪=⎨ ⎬⎪ ⎪⎪ ⎪= +⎪ ⎪⎪ ⎪= −⎪ ⎪⎩ ⎭

r rr r

z z r

r rAz Az

x x z

r A y Ax…

[ ]1, ,1 ,T n= ∈ℜe … 1 ,− =r e 0z =−1

- initial guess,0x ( )0 0 ,T= −r A b Ax - initial residual vector,

Preconditioning• Left preconditioning: 1 1 .− −=M Ax M b

• Right preconditioning: 1 ,− =AM y b 1 .−=x M y

In general, a good preconditioner M should meet the following requirements:• the preconditioned system should be easy to solve,• the inverse of the preconditioner should be cheap to construct and apply,• the condition number of the preconditioned matrix is considerably lower,• eigenvalues of of the preconditioned matrix are clustered,• the computational cost of calculating M-1 is much lower than for A-1.

Usually the incomplete Cholesky factorization or LU factorization are used to compute M from A.

PCG algorithm

11 1

1 1

2 2

1 2 1

1 1

1 1

1 1

1 1 1,2,

ˆ ,ˆ

,ˆ

ˆ ,ˆ

,

,,

k kTk k

k Tk k

k k k kTk k

k Tk k

k k k k

k k k k k

β

β

α

αα

−− −

− −

− −

− − −

− −

− −

− −

− − =

⎧ ⎫=⎪ ⎪⎪ ⎪=⎪ ⎪⎪ ⎪

= +⎪ ⎪⎨ ⎬⎪ ⎪=⎪ ⎪⎪ ⎪

= +⎪ ⎪⎪ ⎪= −⎩ ⎭

r M r

r rr r

z z r

r rz Az

x x zr r Az

…

( ){ }1 1* 0 0min : , .kK − −− ∈ +

Ax x x x M A M r

[ ]1, ,1 ,T n= ∈ℜe … 1 ,− =r e

1 ,− =z 0

- initial guess,0x 0 0= −r b Ax n n×∈ℜA- initial residual vector, where is a symmetric and positive-definite matrix.

M - preconditioner1ˆ .k k

−=r M rPreconditioning for the residual vector:

Minimal error criterion for the preconditioned vectors:

Lanczos algorithmThe Lanczos algorithm generates the orthogonal vectors {v1 , v2 , ..., vK } that span the Krylov subspace for the symmetrix matrix A and the residual vector r0 :

( ) { } { }10 0 0 0 1 2, span , , , span , , , .K K

KK −= =A r r Ar A r v v v… …

( )1 1

1 1

11

1 1, ,

, ,

,

,

.

k k k

k k n k k k

k k

kk

k k K

α

α β

β

β

+ −

+ +

++

+ =

⎧ ⎫=⎪ ⎪

= − −⎪ ⎪⎪ ⎪⎨ ⎬=⎪ ⎪⎪ ⎪=⎪ ⎪⎩ ⎭

Av v

v A I v v

vvv

…

0 ,=v 01 0 ,β = r 01

1

,β

=rv n n

n×∈ℜI

0 0= −r b AxAlgorithm:

Remark: Note that the Lanczos algorithm can be easily obtained from the 3-term recurrence Stieltjes algorithm by replacing t with A and the orthogonal polynomials

, ...., with the vectors:( )t0p ( )tkp 1, , .kv v…

Lanczos algorithmThe vectors satisfy the recurrence formula:

11

,KK K T

K Kβ++

⎡ ⎤= ⎢ ⎥⋅⎣ ⎦

JAV V

e

1 2

2 2

0 0

.0 0

0 0

K

K

K K

α ββ α

ββ α

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥=⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

J

[ ]0 0 1 T KK = ∈ℜe …

[ ]KK vvvV ,,, 21 …=

where and the tridiagonal Jacobi matrix:

Exact solution can be obtained for K ≤

n.

Lanczos algorithmAssuming the Ritz-Galerkin condition: T

k k =V r 0 for k = 0, 1, ..., K.

Considering and the orthogonality of the Lanczos vectors, we have0

01 r

rv =

0 1Tk k =V Ax r e where [ ]1 1,0, ,0 .T k= ∈ℜe …

From the minimal error criterion for the Ritz-Galerkin condition and

( ) { }0 1 2span , , , ,k k− ∈x x v v v… we have 0 ,k k k= +x x V ζ

where ξk is a vector of the coefficients of a linear combination of the Lanczos vectors. For one obtains0 0,=x

0 1.k k =J ζ r e

Lanczos algorithm

( )1 1

1 1

11

1 1, ,

, ,

,

,

.

k k k

k k n k k k

k k

kk

k k K

α

α β

β

β

+ −

+ +

++

+ =

⎧ ⎫=⎪ ⎪

= − −⎪ ⎪⎪ ⎪⎨ ⎬=⎪ ⎪⎪ ⎪=⎪ ⎪⎩ ⎭

Av v

v A I v v

vvv

…

0 ,=v 01 0 ,β = r 01

1

,β

=rv n n

n×∈ℜI0 0= −r b Ax

Lanczos algorithm for the system Ax = b:

[ ]1, , ,K K=V v v…

1 1

1 2

1

1

0 00

00 0

KK

K K

α ββ α

ββ α

−

−

⎡ ⎤⎢ ⎥⎢ ⎥=⎢ ⎥⎢ ⎥⎣ ⎦

J[ ] KT ℜ∈= 0,,0,11 …e1

0 1K K−=ζ J r e

0 K K= +x x V ζ

Arnoldi algorithmThe Arnoldi algorithm computes orthogonal vectors for the Krylov subspace for the non-symmetrix matrix A and the residual vector r0 with the modified Gram-Schmidt orthogonalization.

,

, 1, ,

1,

11, 1, ,

,

,,

,

.

k

Tn k n

n k n n k

k k

kk k k K

hh

h

h

=

+

++ =

=⎧ ⎫⎪ ⎪⎧ ⎫=⎪ ⎪⎪ ⎪⎨ ⎬⎪ ⎪← −⎪ ⎪⎪ ⎪⎩ ⎭⎨ ⎬

=⎪ ⎪⎪ ⎪⎪ ⎪=⎪ ⎪⎩ ⎭

w Av

v ww w v

wwv

…

…

0

01 r

rv =

[ ]KK vvV ,,1 …=The orthogonal Arnoldi vectors:

satisfy the formula: 1 1,K K K K+ +=AV V H

11 12 13 1,

21 22 23 2,( 1)

32 331,

,

( 1),( 1) ( 1),

0

0 0

K

KK K

K K

K K

K K K K

h h h hh h h h

h hh

h h

+ ×+

+ − +

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥= ∈ℜ⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

H

where HK+1,K is an upper Hessenberg matrix:

GMRES algorithmIf A is a symmetric matrix, the Arnoldi algorithm simplifies to the Lanczos algorithm.

In GMRES algorithm, the Petrov-Galerkin condition is used, and from the Minimal Residual (MR) polynomial criterion, we have:

{ }02 2min min ,

KK K− = −

x ζb Ax r AV ζ

where 0 .K K= +x x V ζ

( )0 1 0 1 1, 0 1 1,2 2 2 22.K K K K K K K K K+ + +− = − = −r AV ζ V r e H ζ r e H ζ

From the relation 1 1, :K K K K+ +=AV V H

Finally0 1 1,2 2

min .K

K K K+−ζ

r e H ζ

GMRES algorithmIn GMRES, the (K+1)-th row of the upper Hessenberg matrix H(K+1),K is neglected, and the square matrix HK,K is transformed to the upper triangular one using the QR factorization with the Givens rotations:

, .K K K K=H Q R

Since the upper triangular system can be easily solved with

the Gausian eliminations.2

1,K =Q

GMRES algorithm1

01 1, ,

1 1 ,1

1k k

11 1

22

,,

,

, , , ,

, ,

If: : Elseif: : 11 , ,, ,11

Elseif:

Tn n n

kn n n n n k

Tnk n k k k k

k

k

kk k

k k k kk k k k

kk

hh

h rh

hh

h hc s cs c s

ρ

ρθ θρ

ρ ρθθ

θθ

−

− =

+ ++

+

+

+ +

⎧ ⎫=⎪ ⎪= ⎨ ⎬= −⎪ ⎪⎩ ⎭

= = = =

⎧ ⎧= =⎪ ⎪⎪ ⎪> ≤⎨ ⎨⎪ ⎪ = = −= = −⎪ ⎪ ++ ⎩⎩

v uu Av

u u v

uu v r Q h

…

{1

, , 1

, , , 1 ,

, 1 1, 1

1

1

,1

0,

1

0: 1, 0,,

, ,, ,

, ,

,If: : Elseif: :

,

k k k

k k k k k k k

k k k k k k k k k k

k k k k k k

k k k k k k

k

k n k nn

k k kk k

k k k k

h c sr r c h sq c q q s qq s q cf c s

rk m k mr

f

ϕ ϕ ϕ

+

+

+

+ + +

+

−

=

−

= = =

← −

← ←

← − ←

= =

⎧−⎪⎪ =< = = +⎨

⎪⎪ = +⎩

∑v ww x x V R

x x w

,k

⎧ ⎫⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎨ ⎬⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎩ ⎭

f

Initialization:

0 0= −r b Ax

1 0 ,h = r1

01 hr

v =

1,1 1,q = 11 h=φ

(Saad and Schultz, 1986)

Bi-Conjugate GradientsFor a non-symmetrix matrix A, the 3-term Stieltjes recurrences are applied to generate 2 Krylov bases:

( ) { }0 0 1; , , ,kkK span −=A r r r…

( ) { }0 0 1; , , .k TkK span −=A r r r…

Tk k k=R R D

[ ]0 1, , .k k−=R r r…[ ]0 1, , ,k k−=R r r…

k kk

×∈ℜD

- the shifted Krylov base

Let

The bases are mutually orthogonal: - Petrov-Galerkin condition,

- diagonal matrix.

BiCG algorithm1 1

2 2

1 2 1

1 2 1

1 1

1 1

1 1

1 1

1 1

1,2,

,

,,

,

,,

,

Tk k

k Tk k

k k k k

k k k kTk k

k Tk k

k k k k

k k k kT

k k k k

k

β

ββ

α

αα

α

− −

− −

− − −

− − −

− −

− −

− −

− −

− −

=

⎧ ⎫=⎪ ⎪

⎪ ⎪⎪ ⎪= +⎪ ⎪

= +⎪ ⎪⎪ ⎪⎪ ⎪=⎨ ⎬⎪ ⎪⎪ ⎪= +⎪ ⎪

= −⎪ ⎪⎪ ⎪= −⎪ ⎪⎪ ⎪⎩ ⎭

r rr r

z z rz z r

r rz Az

x x zr r Az

r r A z

…

Initialization:(Fletcher, 1975)

0 0= −r b Ax

0 0 ,=r r [ ]1, ,1 ,T n= ∈ℜe …

1 ,− =r e 1 ,− =r e

1 ,− =z 0 0z =−1~

A – non-singular (n x n) matrix

If the denominator equals to zero, a serious breakdown occurs.

CGS algorithmIn the BiCG method, we have: ( ) 0p ,k k=r A r ( ) 0p .T

k k=r A r

From the orthogonality condition:

( ) ( ) ( ) ( ) ( )20 0 0 0 0 0, p , p , p p , p 0.T

k k k k k k k= = = =r r A r A r r A A r r A r

Equivalently, in the CGS (CG Square) method the residual vector is defined as follows:

( )20p .k k=r A r

Starting from the orthogonality condition:0 0 ,=r r 0 , 0.k =r r

CGS algorithm

( )

( )( )

0 1

0 2

1 1

1 1

0 1

0

1

1

1,2,

,

,,

,

,,

.

Tk

k Tk

k k k k

k k k k k k

Tk

k Tk

k k k k

k k k k k

k k k k k

k

β

β

β β

α

αα

α

−

−

− −

− −

−

−

−

=

⎧ ⎫=⎪ ⎪

⎪ ⎪⎪ ⎪= +⎪ ⎪

= + +⎪ ⎪⎪ ⎪⎪ ⎪=⎨ ⎬⎪ ⎪⎪ ⎪= −⎪ ⎪

= + +⎪ ⎪⎪ ⎪

= − +⎪ ⎪⎪ ⎪⎩ ⎭

r rr r

u z rq u z q

r rr Aq

z u Aqx x u z

r r A u z

…

Initialization:

0 0 ,= −r b Ax [ ]1, ,1 ,T n= ∈ℜe …

1 ,− =r e 0 ,=z 0 0 0.=q r

A – non-singular (n x n) matrix,

(Sonneveld, 1989)

BiCGSTAB algorithmThe residual vector is defined as follows: ( ) ( ) 0q pk k k=r A A r

where ( ) ( )( ) ( )1 2q 1 1 1 .k kω ω ω= − − −A A A A…

( )( )( )

( )( )

0 11

0

1

0

0 1

1 1 1,2,

, , ,

, ,

,

.

TTk kk

k k k k k kT Tk k k

k k k k k k k k k k

Tk k

k Tkk

k k k k k k k

α α ω

α ω ω

αβω

β ω

−−

−

−

− − =

⎧ ⎫= = − =⎪ ⎪

⎪ ⎪⎪ ⎪

= + + = −⎪ ⎪⎨ ⎬⎛ ⎞⎛ ⎞⎪ ⎪⎜ ⎟= ⎜ ⎟⎪ ⎪⎜ ⎟⎝ ⎠⎝ ⎠⎪ ⎪⎪ ⎪= + −⎩ ⎭

As sr r s r AzAr Az As As

x x z s r s As

r rAr Ar

z r z Az…

Initialization:

0 0 ,= −r b Ax

0 0.=z r

A – non-singular (n x n) matrix,

If ωk = 0, a serious breakdown

(Van der Vorst, 1992)

QMR methodSimilarly to the GMRES algorithm, the Quasi-Minimal Residual (QMR) method uses the Petrov-Galerkin condition but the Minimal Residual (MR) polynomial criterion takes the form:

{ }0min mink

k k− = −x ζb Ax r AR ζ

where [ ]10 ,, −= kk rrR … contains the successive residual vectors , and

1 1, .k k k k+ +=AR R H

Thus the MR criterion can be simplified to 0 1 1,mink

k k k+−ζ

r e H ζ and

0 .k k= +x x R ζ

QMR method

( )

1 1

1 1

1 1

1 1 1 1

21 1

221 1

1

, ,

,

, ,

,

, ,

1, , ,1

k kk k

k k

Tk k

k T Tk k

k k k k k k k k k k

T Tk k

k Tk k

Tk k k k k k k k

k k k kk k k

k k k kk

k k k k k

α

α α

β

β β

ε γϑ γ ε

γ β β γϑ

ε ϑ γ

− −

− −

− −

− − − −

− −

− −

−

= =

=

= − = −

=

= − = −

= = = −+

= +

v wv wv w

w vq A Ap

p v w p q w v q

q A Apw v

v Ap v w A q w

v v

d p ( )2 21 1 1

1 1 1,2,

, ,, .

k k k k k k k

k k k k k k k

ε ϑ γ− − −

− − =

⎧ ⎫⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎨ ⎬⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪= +⎪ ⎪⎪ ⎪= + = −⎩ ⎭

d s Ap sx x d r r s

…

Initialization:

0 0 ,= −r b AxA – non-singular (n x n) matrix,

(Freund and Nachtigal, 1991)

0 0 ,=v r 0 0 ,=w r

0 ,=p 0 0 ,=q 0 0 ,=d 0 0 .=s 0

Remark: numerically stable

LSQR method

{ }

1

11 1

1 1

11 1 1

1 Bi diagonalization

2 21k 1 1 1 1 2 1

QR factorization

1 1k

1 1

,

, ,

,

, .

, , ,

c , ,

k k k k

kk k k

k

k k k k

kk k k

k

k k k k k k k k

k kk

k k

r c r r r s

r sr r

β

αα

α

ββ

α β α

β

+

++ +

+ +

++ + +

+ −

− + −

+

= −⎧ ⎫⎪ ⎪⎪ ⎪= =⎪ ⎪⎨ ⎬= −⎪ ⎪⎪ ⎪

= =⎪ ⎪⎩ ⎭

= = + =

= =

v Au vvv v

u Av uuu u

( )Givens rotations

2 11 1

1

1 1 Updates 1, ,

, ,

, ,

k k kk k k k k k

k

k k k k k kk K

rc

rs s

η

η η

−− −

− −=

⎧ ⎫⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎪ ⎪⎨ ⎬⎪ ⎪⎪ ⎪⎧ ⎫⎪ ⎪⎨ ⎬⎪ ⎪⎩ ⎭⎪ ⎪⎧ ⎫⎪ ⎪− ⋅

= = + ⋅ ⋅⎪ ⎪⎪ ⎪⎨ ⎬⎪ ⎪⎪ ⎪⎪ ⎪= − ⋅ = ⋅⎩ ⎭⎩ ⎭

v ww x x w

r r…

Initialization:

0 0 ,= −r b Ax

A – non-singular (m x n) matrix

1 0 ,=u r

1 1,=v Au 1 1 ,α = v

1 1,β = u11

1

,β

=uu

0 1,c = 0 0,s =

0 ,=w 0 0 1.η β=

(Paige and Saunders, 1982)

(normal equations),

Nonnegative Matrix Factorization (NMF)

Nonnegative matrix factorization (NMF) solves the following problem: estimate (up to scale and permutation ambiguity) such nonnegative matrices and that

given only the assigned lower-rank J, and possibly the prior knowledge on the estimated factors or noise distribution.

I J×+∈ℜA T J×

+∈ℜB

,I T×+∈ℜX

T≅X AB

J I T≤Usually

Rank-1 matrix factorization

1 1

J JT

j j j jj j= =

= + = +∑ ∑X a b V a b V

(Outer product)(Noise or error matrix)

Feature extraction

Published in Nature, 401, (1999): D.D. Lee from Bell Lab, H.S. Seung from MIT

(X)

(A)

(BT)

NMF for BSSy1 y2 y3

y4 y5 y6

y7 y8 y9

s1 s2

s3 s4

Original images Mixed images

Alternating minimization• First step (Update for B): • Second step (Update for A):

Iterative algorithm:

T =AB XT T=BA X

For 1, 2,t = … doInitialize randomly: (0) (0),A B

( )( )( ) ( 1) ( 1)

0arg min , ,

Tt t tD − −

≥←

BB X A B

( )( )( ) ( 1) ( )

0arg min , ,

Tt t tD −

≥←

AA X A B

End

( )|| TD X AB - objective function

Objective functions• Square Euclidean distance (square Frobenius norm):

• Generalized Kullback-Leibler divergence (I-divergence):

( )22

1 1

1 1( || ) || || [ ]2 2

I TT T T

F F it iti t

D x= =

= − = −∑∑X AB X AB AB

s t 0 0 .ij tja b i j t. . ≥ , ≥ ∀ , ,

( || ) log [ ][ ]

T TitKL it it itT

it it

xD x x⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

= + −∑X AB ABAB

11

s t 0 0 || || 1I

jtj ij iji

b a a=

. . ≥ , ≥ , = = .∑a

Multiplicative algorithms• Euclidean distance ⇒ ISRA

• I-divergence ⇒ EMML

[ ] [ ],

[ ] [ ]

Tij tj

ij ij tj tjT Tij tj

a a b b← , ←XB X AAB B BA A

1 1

1 1

( [ ] ) ( [ ] )

I TT Tij it it tj it iti t

tj tj ij ijI Tqj pjq p

a x b xb b a a

a b= =

= =

/ /← , ← ,∑ ∑

∑ ∑AB AB

(Image Space Reconstruction Algorithm)

(Expectation-Maximization Maximum Likelihood)

ALS algorithmEuclidean distance ⇒ ALS (Alternating Least-Squares)

( ) ( )|| 0,T

T T TFD∇ = − ≡

BX AB A AB X ( ) ( )|| 0,T T

FD∇ = − ≡A

X AB AB X B

Stationary points:

ALS algorithm:

( ) ( )1,

TT T − +← =B X A A A A X

( ) ( )1,

TT T − +← =A XB B B X B

( ) ,T

P +Ω⎡ ⎤← ⎢ ⎥⎣ ⎦

B A X

( ) ,T

P +Ω⎡ ⎤← ⎢ ⎥⎣ ⎦

A X B

Projected ALS :

( )22

1 1

1 1( || ) || || [ ]2 2

I TT T T

F F it iti t

D x= =

= − = −∑∑X AB X AB AB

TensorsDefinition: Let

I1

, I2

, . . . , IN

∈ N denote index upper bounds. A tensor of

order N is an

N-way array where elements

are indexed

by

in

∈ {1, 2, . . . , In

} for 1 ≤ n ≤ N.

1 2 NI I I× × ×∈ℜX …1 2, , , ni i ix

Tensors are obviously generalizations of vectors and matrixes, for example, a third-order tensor (or 3-way array) has 3 modes (or indices or dimensions). A zero-order tensor is a scalar, a first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three and higher are called higher- order tensors.

Tensor fibers

Tensor slices

Unfolding

Mode-1 unfolding

Mode-2 unfolding Mode-3 unfolding

Unfolding

PARAFAC

1

J

j j jj=

= +∑X a b c V

PARAFAC

1 2 31

J

j j jj

I=

= + = × × × +∑X a b c V A B C V

Harshman’s PARAFAC

( )1

J

j j j jjλ

=

= +∑X a b c V

TUCKER decomposition

( )1 1 1

J R P

jrp j r pj r p

g= = =

= +∑∑∑X a b c V

Numerical Methods Rafał Zdunek Iterative...

Documents

Transcript of Numerical Methods Rafał Zdunek Iterative...