02 Linear equations
2 Notation
Consider the following brief reminder and introduction to notation to be used in the course.
Vectors: $\vec{x} = \mathbf{x} = \underline{x} = x_i$; vector norm $|x| = \sqrt{\sum_i x_i^2}$; row vector $(x)_{1\times n}$; column vector $(x)_{n\times 1}$; scalar product $x \cdot y = x^T y = \sum_i x_i y_i = x_i y_i = |x|\,|y| \cos\theta$;
Matrices: $A = \underset{\sim}{A} = a_{ij}$; matrix multiplication $(C)_{n\times p} = (A)_{n\times m}(B)_{m\times p}$ where $c_{ik} = \sum_{j=1}^{m} a_{ij}b_{jk} = a_{ij}b_{jk}$. Matrix multiplication is associative but not commutative. This means that $(AB)C = A(BC) \equiv ABC$, but $AB \not\equiv BA$. Example (Strang, p. 25),
$$EA = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 1 & 1 \\ 4 & -6 & 0 \\ -2 & 7 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 \\ 0 & -8 & -2 \\ -2 & 7 & 2 \end{pmatrix}$$
Note that twice the first row of A has been subtracted from the second row, an example of row manipulation via matrix multiplication.
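This elimination step can be checked numerically; a minimal numpy sketch (not part of the original notes):

```python
import numpy as np

# E subtracts twice row 1 from row 2 of A (Strang, p. 25)
E = np.array([[ 1, 0, 0],
              [-2, 1, 0],
              [ 0, 0, 1]])
A = np.array([[ 2,  1, 1],
              [ 4, -6, 0],
              [-2,  7, 2]])
EA = E @ A
# EA == [[2, 1, 1], [0, -8, -2], [-2, 7, 2]]
```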
Permutation matrices are used to exchange the rows of a given matrix, e.g.,
$$PA = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 1 & 1 \\ 4 & -6 & 0 \\ -2 & 7 & 2 \end{pmatrix} = \begin{pmatrix} 4 & -6 & 0 \\ 2 & 1 & 1 \\ -2 & 7 & 2 \end{pmatrix},$$
and note that while PA exchanges rows of A, AP exchanges its columns.
Unit/identity matrix: $I = \delta_{ij}$, $AI = IA = A$. It is a diagonal matrix with 1s across the diagonal; $I^{-1} = I$.
Transpose: $A^T = a_{ji}$; transpose of a product $(AB)^T = B^T A^T$; conjugate transpose $A^\dagger = a^*_{ji}$, where $z^* = (a + ib)^* = a - ib$.
Inverse: $A^{-1}A = AA^{-1} = I$.
Eigenvalues $\lambda_i$, and the corresponding eigenvectors, $e_i$, of a matrix $A$ will be further discussed in the next chapter but are still needed to deal with some aspects of linear equations. They satisfy the equation $Ae_i = \lambda_i e_i$.
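A minimal numerical illustration of $Ae_i = \lambda_i e_i$ in numpy (the matrix here is an arbitrary example, not from the notes):

```python
import numpy as np

# Arbitrary symmetric example matrix with eigenvalues 3 and 1
A = np.array([[2., 1.],
              [1., 2.]])
lam, E = np.linalg.eig(A)   # eigenvalues lam[i], eigenvectors E[:, i]
for i in range(len(lam)):
    # verify A e_i = lambda_i e_i for each eigenpair
    assert np.allclose(A @ E[:, i], lam[i] * E[:, i])
```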
3.4 Medical tomography
This is an interesting and useful example application, and it may lead to either under- or over-determined systems (that is, more unknowns than equations, or more equations than unknowns).
Objective: calculate tissue density (brain, bone. . . ) in a 2d section from multiple X-rays through the tissue. Idea: the X-ray intensity transmitted over a distance $s$, $I(s)$, is a function of the tissue density along the ray path, $\rho(x, y)$. If the density were constant, $\rho_0$, the intensity would be an exponential function of the path length, $I(s) = I(0)\exp(-\mu\rho_0 s)$, where $\mu$ is some appropriate decay constant. When the density is not constant, this is replaced by an integral along the path as follows.
Each path can be described by $(x(u), y(u))$ with $u$ going from 0 to $s$; 2d examples:
$(x, y) = (u, 0)$, a path along the x-axis
$(x, y) = (0, u)$, a path along the y-axis
$(x, y) = (u, u)$, a diagonal path
Intensity is related to tissue density as
$$I(s) = I(0)\exp\left(-\int_0^s \mu\,\rho(x(u), y(u))\,du\right),$$
or
$$\log(I(s)/I(0)) = -\int_0^s \mu\,\rho(x(u), y(u))\,du.$$
Represent the density values on a grid in each two-dimensional section, $\rho(x_i, y_j) = \rho_{ij}$, as a vector $\vec\rho$,
$$\vec\rho = \{\rho_k\}, \quad (k = 1, \ldots, mn) = (\rho_{11}, \rho_{12}, \ldots, \rho_{1n}, \rho_{21}, \rho_{22}, \ldots, \rho_{mn})$$
and write the integral as a sum,
$$-\int_0^s \mu\,\rho(x(u), y(u))\,du = -\sum_{k=1}^{mn} \mu B_{lk}\rho_k\,\Delta u$$
where $B_{lk}$ is one when X-ray $l$ goes through location $x_k = (x_i, y_j)$ and zero otherwise. This leads to a matrix equation $A\vec\rho = \vec b$. The rhs $\vec b$ is made of $\log(I(s)/I(0))$ entries and the matrix $A$ is $(\mu\Delta u)B$, where each row represents one trajectory through the tissue. Here is an example of one ray and the corresponding lhs of the equation,
Figure 3: Tomography schematics and brain scan images from here, here and here.
In this example of a ray passing through several grid cells, the corresponding lhs of the equation is
$$\int \mu\rho\,du = \rho_3\Delta u_3 + \rho_4\Delta u_4 + \rho_9\Delta u_9 + \rho_{15}\Delta u_{15} = \mathrm{RHS} = \log(I(s)/I(0))$$
and in matrix form this is,
$$\begin{pmatrix} \vdots \\ 0 \;\; 0 \;\; \Delta u_3 \;\; \Delta u_4 \;\; 0 \;\cdots\; 0 \;\; \Delta u_9 \;\; 0 \;\cdots\; 0 \;\; \Delta u_{15} \;\cdots\; 0 \\ \vdots \end{pmatrix}\begin{pmatrix} \rho_1 \\ \vdots \\ \rho_{20} \end{pmatrix} = \mathrm{RHS}$$
Note that a given location $x_k = (x_i, y_j)$ is encountered by many rays. The dimension of the matrix $A$ is $L \times K = (\text{number of X-ray trajectories}) \times (\text{number of grid points})$. Increasing the resolution of $(x_i, y_j)$ to get a detailed image, we would always have $K > L$, i.e., more unknowns $\rho(x_i, y_j)$ than equations, leading to an under-determined system which can be solved using SVD.
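A toy sketch of this setup in numpy, with a made-up 2-ray, 4-cell system (the geometry and measurement values are illustrative assumptions, not real scan data):

```python
import numpy as np

# Two "rays" (equations), four density unknowns: under-determined
A = np.array([[1., 1., 0., 0.],    # ray 1 crosses cells 1 and 2
              [0., 0., 1., 1.]])   # ray 2 crosses cells 3 and 4
b = np.array([2., 4.])             # made-up log-intensity measurements

rho = np.linalg.pinv(A) @ b        # minimum-norm solution via SVD
# rank equals the number of rays, so the measurements are fit exactly
assert np.allclose(A @ rho, b)
```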
Throughout this section, and in particular for LU and iterative methods, a good reference is Strang (2006).
2 Row and column geometric interpretations for linear equations
(Strang, 2006, §1.2) In solving a linear system of equations, $Ax = b$, we can interpret the system using the rows or the columns of $A$. The interpretation helps us understand why there may be no solutions, a unique solution, or an infinite number of solutions. For example, if we have the equations,
$$\begin{aligned} 5x + 3y - 2z &= 6 \\ 2x + 4y - z &= -2 \\ x + 3y + 3z &= -1, \end{aligned}$$
this system of equations can be re-written in matrix form as $Ax = b$,
$$\begin{pmatrix} 5 & 3 & -2 \\ 2 & 4 & -1 \\ 1 & 3 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 6 \\ -2 \\ -1 \end{pmatrix}$$
We can start by examining the rows of this system. We see that each equation represents a plane in 3D space: row 1, $5x + 3y - 2z = 6$; row 2, $2x + 4y - z = -2$; and row 3, $x + 3y + 3z = -1$. Plotting these three planes, we see that they intersect at a particular point, which is the solution to the system of equations. When two or more of the planes are parallel (but not identical), or when two pairs of planes intersect along parallel lines, there is no intersection of all three planes and therefore no solution. When the three planes intersect along a line instead of at a point, there is an infinite number of solutions.
Alternatively, we can interpret the columns of the system geometrically. We can rewrite the system of equations as a sum over three column vectors that is equal to the column vector on the RHS,
$$x\begin{pmatrix} 5 \\ 2 \\ 1 \end{pmatrix} + y\begin{pmatrix} 3 \\ 4 \\ 3 \end{pmatrix} + z\begin{pmatrix} -2 \\ -1 \\ 3 \end{pmatrix} = \begin{pmatrix} 6 \\ -2 \\ -1 \end{pmatrix}.$$
We are looking for the three numbers $x, y, z$ such that the linear combination above equals the RHS. We plot each of the above column vectors such that each starts at the end of the previous one. We then plot the RHS column vector and find that it ends at the same point as the third vector from the LHS. The $x, y, z$ for which the sum of the LHS vectors and the RHS vector meet is the solution to the linear system of equations. When the three vectors on the LHS are not linearly independent, it may not be possible to find a combination of these vectors that is equal to the RHS, and in that case there is no solution. An infinite number of solutions may occur if, say, two of the LHS vectors are parallel and the RHS vector is such that it can still be expressed in terms of the LHS vectors. In that case one can find different combinations of the two parallel LHS vectors that can be part of a combination that is equal to the RHS, yielding an infinite number of solutions.
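Both interpretations can be checked numerically; a short numpy sketch for the system above (an illustration, not part of the original notes):

```python
import numpy as np

A = np.array([[5., 3., -2.],
              [2., 4., -1.],
              [1., 3.,  3.]])
b = np.array([6., -2., -1.])

x = np.linalg.solve(A, b)   # the intersection point of the three planes
# Column view: the same x, y, z combine the columns of A into b
combo = A[:, 0] * x[0] + A[:, 1] * x[1] + A[:, 2] * x[2]
assert np.allclose(combo, b)
```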
[Figure: column interpretation of the system, shown in four panels: unique solution; no solution; many solutions 1; many solutions 2.]
3 Direct solution to linear equations by LU decomposition
In the following subsections we describe the LU algorithm, demonstrate it with a specific example, show how it is used to efficiently solve systems of linear equations with many right hand sides, discuss its computational cost, and explain why the algorithm works.
3.1 LU decomposition algorithm
The algorithm for LU decomposition of a matrix A is straightforward and best understood by practice. The steps are,
1. Initialize U = A and L, P to the identity matrix.
2. Starting with the first column, exchange rows in U such that the entry in the column with the largest absolute value becomes the diagonal value (the pivot). Similarly exchange the same rows in P, and similarly exchange them in L but only below the diagonal.
3. Zero the column under the pivot in U by subtracting from each row the appropriate multiple of the pivot row. Insert this "subtraction factor" (the multiplier) in the appropriate entry of L.
4. Move to the next column of U and repeat the above steps until you arrive at an upper triangular U and a lower triangular L matrix.
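The steps above can be sketched in numpy as follows (an illustrative implementation with partial pivoting, not the course's reference code):

```python
import numpy as np

def lu_decompose(A):
    """PA = LU with partial pivoting, following steps 1-4 above."""
    n = A.shape[0]
    U = A.astype(float).copy()   # step 1: U starts as A
    L = np.eye(n)                # L and P start as the identity
    P = np.eye(n)
    for k in range(n - 1):
        # step 2: bring the largest |entry| at or below the diagonal to the pivot
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:
            U[[k, p], :] = U[[p, k], :]
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]   # exchange L only below the diagonal
        # step 3: zero the column under the pivot, storing multipliers in L
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, :] -= L[i, k] * U[k, :]
    return P, L, U

A = np.array([[2., 1., 1.],
              [4., -6., 0.],
              [-2., 7., 2.]])
P, L, U = lu_decompose(A)
assert np.allclose(P @ A, L @ U)
```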
1.3 An Example of Gaussian Elimination
The way to understand elimination is by example. We begin in three dimensions:
Original system
$$\begin{aligned} 2u + v + w &= 5 \\ 4u - 6v &= -2 \\ -2u + 7v + 2w &= 9. \end{aligned} \qquad (1)$$
The problem is to find the unknown values of u, v, and w, and we shall apply Gaussian elimination. (Gauss is recognized as the greatest of all mathematicians, but certainly not because of this invention, which probably took him ten minutes. Ironically, it is the most frequently used of all the ideas that bear his name.) The method starts by subtracting multiples of the first equation from the other equations. The goal is to eliminate u from the last two equations. This requires that we
(a) subtract 2 times the first equation from the second
(b) subtract −1 times the first equation from the third.
Equivalent system
$$\begin{aligned} 2u + v + w &= 5 \\ -8v - 2w &= -12 \\ 8v + 3w &= 14. \end{aligned} \qquad (2)$$
The coefficient 2 is the first pivot. Elimination is constantly dividing the pivot into the numbers underneath it, to find out the right multipliers.
The pivot for the second stage of elimination is −8. We now ignore the first equation. A multiple of the second equation will be subtracted from the remaining equations (in this case there is only the third one) so as to eliminate v. We add the second equation to the third or, in other words, we
(c) subtract −1 times the second equation from the third.
The elimination process is now complete, at least in the “forward” direction:
Triangular system
$$\begin{aligned} 2u + v + w &= 5 \\ -8v - 2w &= -12 \\ 1w &= 2. \end{aligned} \qquad (3)$$
This system is solved backward, bottom to top. The last equation gives w = 2. Substituting into the second equation, we find v = 1. Then the first equation gives u = 1. This process is called back-substitution.
To repeat: Forward elimination produced the pivots 2, −8, 1. It subtracted multiples of each row from the rows beneath. It reached the "triangular" system (3), which is solved in reverse order: substitute each newly computed value into the equations that are waiting.
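Back-substitution on the triangular system (3) can be sketched in numpy (an illustration, not part of Strang's text):

```python
import numpy as np

# Triangular system (3): 2u + v + w = 5, -8v - 2w = -12, 1w = 2
U = np.array([[2., 1., 1.],
              [0., -8., -2.],
              [0., 0., 1.]])
c = np.array([5., -12., 2.])

def back_substitute(U, c):
    """Solve Ux = c bottom to top, substituting each new value as we go."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

u, v, w = back_substitute(U, c)   # gives u = 1, v = 1, w = 2
```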
[...]timated the final roundoff error. It was Wilkinson who found the right way to answer the question, and his books are now classics.
Two simple examples will illustrate three important points about roundoff error. Theexamples are
Ill-conditioned
$$A = \begin{pmatrix} 1. & 1. \\ 1. & 1.0001 \end{pmatrix} \qquad \text{Well-conditioned} \qquad B = \begin{pmatrix} .0001 & 1. \\ 1. & 1. \end{pmatrix}.$$
A is nearly singular whereas B is far from singular. If we slightly change the last entry of A to $a_{22} = 1$, it is singular. Consider two very close right-hand sides b:
$$\begin{aligned} u + v &= 2 \\ u + 1.0001v &= 2 \end{aligned} \qquad \text{and} \qquad \begin{aligned} u + v &= 2 \\ u + 1.0001v &= 2.0001 \end{aligned}$$
The solution to the first is u = 2, v = 0. The solution to the second is u = v = 1. A change in the fifth digit of b was amplified to a change in the first digit of the solution. No numerical method can avoid this sensitivity to small perturbations. The ill-conditioning can be shifted from one place to another, but it cannot be removed. The true solution is very sensitive, and the computed solution cannot be less so.
The second point is as follows.
Even a well-conditioned matrix like B can be ruined by a poor algorithm.
We regret to say that for the matrix B, direct Gaussian elimination is a poor algorithm. Suppose .0001 is accepted as the first pivot. Then 10,000 times the first row is subtracted from the second. The lower right entry becomes −9999, but roundoff to three places would give −10,000. Every trace of the entry 1 would disappear:
Elimination on B with small pivot
$$\begin{aligned} .0001u + v &= 1 \\ u + v &= 2 \end{aligned} \qquad \longrightarrow \qquad \begin{aligned} .0001u + v &= 1 \\ -9999v &= -9998. \end{aligned}$$
Roundoff will produce −10,000v = −10,000, or v = 1. This is correct to three decimal places. Back-substitution with the right v = .9999 would leave u = 1:
Correct result .0001u + .9999 = 1, or u = 1.
Instead, accepting v = 1, which is wrong only in the fourth place, we obtain u = 0:
Wrong result .0001u + 1 = 1, or u = 0.
The computed u is completely mistaken. B is well-conditioned but elimination is violently unstable. L, D, and U are completely out of scale with B:
$$B = \begin{pmatrix} 1 & 0 \\ 10{,}000 & 1 \end{pmatrix}\begin{pmatrix} .0001 & 0 \\ 0 & -9999 \end{pmatrix}\begin{pmatrix} 1 & 10{,}000 \\ 0 & 1 \end{pmatrix}.$$
The small pivot .0001 brought instability, and the remedy is clear: exchange rows.
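The instability can be mimicked in Python by rounding every intermediate result to three significant digits; `round3` below is an illustrative stand-in for the "roundoff to three places" in the text:

```python
# Round to three significant digits, mimicking three-place arithmetic
def round3(x):
    return float(f"{x:.3g}")

# Without row exchange: pivot .0001, multiplier 10,000
m = round3(1.0 / 0.0001)        # 10000
a22 = round3(1.0 - m * 1.0)     # -9999 rounds to -10000
c2 = round3(2.0 - m * 1.0)      # -9998 rounds to -10000
v = round3(c2 / a22)            # v = 1
u = round3((1.0 - v) / 0.0001)  # u = 0, completely wrong

# With row exchange (u + v = 2 becomes the pivot row): multiplier .0001
m2 = round3(0.0001 / 1.0)
a22x = round3(1.0 - m2 * 1.0)   # .9999 rounds to 1
c2x = round3(1.0 - m2 * 2.0)    # .9998 rounds to 1
vx = round3(c2x / a22x)         # v = 1
ux = round3(2.0 - vx)           # u = 1, correct to this precision
```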
2 Least squares, over-determined systems
Consider M equations in N unknowns, in the form Ax = b, where there may be more equations than unknowns (over-determined systems), and even inconsistent equations. That is, the M × N matrix A is not square, and M > N. An example with M = 3 and N = 2,
x = 1, y = 2, x + y = 4,
which may be written in the form Ax = b, where
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}.$$
An exact solution does not exist, but we can look for a solution that minimizes the norm $e^T e$ of the error $e = Ax - b$. For this, look for $x$ that satisfies
$$\begin{aligned}
0 &= \frac{\partial}{\partial x}\left(e^T e\right) = \frac{\partial}{\partial x}\left[(Ax - b)^T (Ax - b)\right] \\
&= \frac{\partial}{\partial x}\sum_{i=1}^{M}(Ax - b)_i(Ax - b)_i \\
&= \frac{\partial}{\partial x_n}\sum_{i=1}^{M}\left(\sum_{j=1}^{N}A_{ij}x_j - b_i\right)\left(\sum_{k=1}^{N}A_{ik}x_k - b_i\right) \\
&= \frac{\partial}{\partial x_n}\sum_{i=1}^{M}\left(\sum_{j,k=1}^{N}A_{ij}A_{ik}x_j x_k - \sum_{k=1}^{N}A_{ik}x_k b_i - \sum_{j=1}^{N}A_{ij}x_j b_i + b_i b_i\right) \\
&= \sum_{i=1}^{M}\left(\sum_{k=1}^{N}A_{in}A_{ik}x_k + \sum_{j=1}^{N}A_{ij}A_{in}x_j - A_{in}b_i - A_{in}b_i\right) \\
&= 2A^T A x - 2A^T b
\end{aligned}$$
so that the optimal solution satisfies
$$A^T A x = A^T b,$$
and the least-squares solution is, therefore,
$$x = (A^T A)^{-1} A^T b.$$
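A quick numpy check of the normal equations on the example earlier in this section (a sketch; `lstsq` is used only to confirm the result):

```python
import numpy as np

# Over-determined example: x = 1, y = 2, x + y = 4
A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
b = np.array([1., 2., 4.])

# Least squares via the normal equations A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)   # x = [4/3, 7/3]
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```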
Example: 2x = 3, 3x = 4. To solve, write A = [2; 3], b = [3; 4], $A^T A = 13$, $A^T b = 18$. The optimal solution is therefore 18/13. The residual, while optimally small, is not zero if the equations are not consistent, and is equal to $Ax - b$, in this case,
$$r = Ax - b = \begin{pmatrix} 2 \\ 3 \end{pmatrix}(18/13) - \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} -0.23 \\ 0.15 \end{pmatrix}.$$
3 Solving under-determined systems: pseudo-inverse using SVD
If $A^T A$ is singular and cannot be inverted, or when in $A_{(n\times m)}x_{(m\times 1)} = b_{(n\times 1)}$ there are fewer equations than unknowns, there may be many possible solutions. A unique solution may be found by adding a requirement that $x$ has the smallest possible norm, and it can then be found using SVD (Strang §6, p. 370). Let the SVD of an $n$ by $m$ matrix be $A_{(n\times m)} = U_{(n\times n)}\Sigma_{(n\times m)}V^T_{(m\times m)}$; noting that the columns of $V$ have the same dimension as the unknown $x$,
we can expand
$$x = \sum_{i=1}^{r} a_i v_i + \sum_{j=r+1}^{m} a_j v_j = x^\dagger + x^{\mathrm{null}},$$
where $r$ is the rank of $A$, or number of nonzero singular values. Then a solution may be written using the pseudo-inverse, $A^\dagger_{(m\times n)} = V\Sigma^\dagger U^T$. First, note that
$$\begin{aligned}
(A^\dagger A)_{(m\times m)} &= \left(V_{(m\times m)}\Sigma^\dagger_{(m\times n)}U^T_{(n\times n)}\right)\left(U_{(n\times n)}\Sigma_{(n\times m)}V^T_{(m\times m)}\right) \\
&= V\Sigma^\dagger U^T U\Sigma V^T \\
&= V\Sigma^\dagger\Sigma V^T \\
&= V_{(m\times m)}\,\operatorname{diag}(\underbrace{1,\ldots,1}_{r},0,\ldots,0)_{(m\times m)}\,V^T_{(m\times m)}
\end{aligned}$$
Similarly,
$$\begin{aligned}
(AA^\dagger)_{(n\times n)} &= \left(U_{(n\times n)}\Sigma_{(n\times m)}V^T_{(m\times m)}\right)\left(V_{(m\times m)}\Sigma^\dagger_{(m\times n)}U^T_{(n\times n)}\right) \\
&= U\Sigma V^T V\Sigma^\dagger U^T \\
&= U\Sigma\Sigma^\dagger U^T \\
&= U_{(n\times n)}\,\operatorname{diag}(\underbrace{1,\ldots,1}_{r},0,\ldots,0)_{(n\times n)}\,U^T_{(n\times n)}
\end{aligned}$$
Note that the number of nonzero elements in the diagonal matrix is $r$, which may be different from both $n$ and $m$ in general, and is the rank of the matrix $A$. Consider an under-determined system, with fewer equations than unknowns, $n < m$. If the rank of $A$ is equal to the number of equations, $r = n$, then $(AA^\dagger)_{(n\times n)} = I_{(n\times n)}$ although $(A^\dagger A)_{(m\times m)} \neq I_{(m\times m)}$.
Given this, it is easy to see that a solution is given by $x = A^\dagger b$, because if we substitute $x = A^\dagger b$ in our equation we find $Ax = AA^\dagger b = Ib = b$.
This solution to the original problem $Ax = b$ does not contain any of the null vectors of $V$ and can now be written as $x^\dagger = A^\dagger b$. Any other solution that contains the null vectors of $V$ will be of larger magnitude because, using the fact that the null and non-null parts are orthogonal (based on orthogonality of the $V$ vectors) and therefore satisfy Pythagoras' law,
$$\|x\|^2 = \|x^\dagger + x^{\mathrm{null}}\|^2 = \|x^\dagger\|^2 + \|x^{\mathrm{null}}\|^2 \geq \|x^\dagger\|^2.$$
Example: 2x + 3y = 3. Copy-paste into Matlab:
close all; clear all; clc;
A=[2, 3], b=[3], disp('A''*A= (singular!)'), disp(A'*A),
disp('rank(A''*A)='), disp(rank(A'*A)),
disp('[U,S,V]=svd(A):'),[U,S,V]=svd(A),Splus=[1/S(1);0];
disp('Splus=[1/S(1);0]:'); disp(Splus);
disp('verify SVD: U*S*V''-A:'); disp(U*S*V'-A);
x=V*Splus*U'*b;
disp('x=V*Splus*U''*b:'); disp(x)
disp('A*x-b='),disp(A*x-b)
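For comparison, a numpy equivalent of the Matlab snippet above (a sketch using `np.linalg.pinv`, which computes the same SVD-based pseudo-inverse):

```python
import numpy as np

# Single equation 2x + 3y = 3: one equation, two unknowns
A = np.array([[2., 3.]])
b = np.array([3.])

x = np.linalg.pinv(A) @ b     # minimum-norm solution via SVD
assert np.allclose(A @ x, b)  # the equation is satisfied exactly
# x = [6/13, 9/13] is proportional to A^T: no null-space component
```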
A sparse matrix is one in which most of the elements are zero; a matrix in which most elements are nonzero is dense. The sparsity (or density) of a matrix is the fraction of its elements that are zero (or nonzero).
1 Storing a sparse matrix
1.1 Dictionary of keys (DOK)
1.2 List of lists (LIL)
1.3 Coordinate list (COO)
1.4 Yale
⎛ 0 0 0 0 ⎞
⎜ 5 8 0 0 ⎟
⎜ 0 0 3 0 ⎟
⎝ 0 6 0 0 ⎠
⎛ 10 20  0  0  0  0 ⎞
⎜  0 30  0 40  0  0 ⎟
⎜  0  0 50 60 70  0 ⎟
⎝  0  0  0  0  0 80 ⎠
•
•
1.5 Compressed row storage (CRS or CSR)
1.6 Compressed sparse column (CSC or CCS)
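The second example matrix above can be stored in CSR form with three arrays; a hand-worked numpy sketch (the array names follow common convention):

```python
import numpy as np

# Dense form of the 4x6 example matrix above
dense = np.array([
    [10, 20,  0,  0,  0,  0],
    [ 0, 30,  0, 40,  0,  0],
    [ 0,  0, 50, 60, 70,  0],
    [ 0,  0,  0,  0,  0, 80],
])

# CSR arrays: nonzero values, their column indices, and row pointers
# (ROW_INDEX[i]:ROW_INDEX[i+1] slices V and COL_INDEX for row i)
V = [10, 20, 30, 40, 50, 60, 70, 80]
COL_INDEX = [0, 1, 1, 3, 2, 3, 4, 5]
ROW_INDEX = [0, 2, 4, 7, 8]

def csr_to_dense(V, COL_INDEX, ROW_INDEX, ncols):
    """Expand the three CSR arrays back into a dense matrix."""
    nrows = len(ROW_INDEX) - 1
    out = np.zeros((nrows, ncols), dtype=int)
    for i in range(nrows):
        for p in range(ROW_INDEX[i], ROW_INDEX[i + 1]):
            out[i, COL_INDEX[p]] = V[p]
    return out

assert (csr_to_dense(V, COL_INDEX, ROW_INDEX, 6) == dense).all()
```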
2 Special structure
2.1 Banded
⎛ X X X · · · · ⎞
⎜ X X · X X · · ⎟
⎜ X · X · X · · ⎟
⎜ · X · X · X · ⎟
⎜ · X X · X X X ⎟
⎜ · · · X X X · ⎟
⎝ · · · · X · X ⎠
2.2 Diagonal
2.3 Symmetric
3 Reducing fill-in
4 Solving sparse matrix equations