MATH 22A: LINEAR ALGEBRA, Chapter 4
Steffen Borgwardt, UC Davis
original version of these slides: Jesús De Loera, UC Davis
February 20, 2015


Orthogonality and Least Squares Approximation (4.1-4.4)


QUESTION:

Suppose Ax = b has no solution... Then what do we do? Can we find an approximate solution?


Say a police officer is interested in clocking the speed of a vehicle by using measurements of its relative distance. At different times t_i we measure distance b_i.

Assuming the vehicle is traveling at constant speed, we know a linear formula for this, but there are errors!

Suppose we expect the output b to be a linear function of the input t, b = α + βt, but we need to determine α, β.

At t = t_i, the error between the measured value b_i and the value predicted by the function is e_i = b_i − (α + βt_i).

We can write this as e = b − Ax where x = (α, β). Here e is the error vector, b is the data vector, and A is an m × 2 matrix.

We seek the line that minimizes the total squared error, i.e. the Euclidean norm ‖e‖ = √(e_1² + ⋯ + e_m²).
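As a sketch of this fit (with made-up measurements, not data from the slides), the 2 × 2 normal equations for the line b = α + βt can be solved directly:

```python
# Least-squares fit of b = alpha + beta * t by solving the 2x2 normal
# equations A^T A x = A^T b by hand. The measurements below are
# hypothetical, chosen so the true speed is about 20.
t = [0.0, 0.5, 1.0, 1.5, 2.0]
b = [10.1, 20.3, 29.8, 40.2, 50.1]
m = len(t)

# Entries of A^T A and A^T b for A with columns (1, t_i).
s1, st, stt = m, sum(t), sum(ti * ti for ti in t)
sb, stb = sum(b), sum(ti * bi for ti, bi in zip(t, b))

# Solve [[s1, st], [st, stt]] (alpha, beta) = (sb, stb) by Cramer's rule.
det = s1 * stt - st * st
alpha = (sb * stt - st * stb) / det
beta = (s1 * stb - st * sb) / det
print(alpha, beta)  # slope beta estimates the (constant) speed
```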


KEY GOAL We are looking for x that minimizes ‖b − Ax‖.


If b ∈ C(A), then we can make the value ‖b − Ax‖ = 0.

Note that ‖b − Ax‖ is the distance from b to the point Ax, which is an element of the column space!

The error vector e = b − Ax is perpendicular to the column space of A.

Thus for each column a_i we have a_i^T(b − Ax) = 0. In matrix notation: A^T(b − Ax) = 0. This gives

A^T A x = A^T b

We are going to study this system A LOT!

KEY PROPERTY If A has independent columns, then A^T A is square, symmetric and invertible.
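The perpendicularity behind the normal equation can be checked numerically. A minimal sketch with a made-up 3 × 2 system: solve A^T A x = A^T b and verify that the residual e = b − Ax is orthogonal to every column of A.

```python
# Small overdetermined system: 3 equations, 2 unknowns.
A = [[1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
b = [1.0, 2.0, 2.0]

def dot(u, v): return sum(p * q for p, q in zip(u, v))

cols = list(zip(*A))

# Normal equations for the 2-column case, solved by Cramer's rule.
g11, g12, g22 = dot(cols[0], cols[0]), dot(cols[0], cols[1]), dot(cols[1], cols[1])
r1, r2 = dot(cols[0], b), dot(cols[1], b)
det = g11 * g22 - g12 * g12
x = [(r1 * g22 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det]

# Residual e = b - Ax is perpendicular to each column of A.
Ax = [dot(row, x) for row in A]
e = [bi - pi for bi, pi in zip(b, Ax)]
for c in cols:
    print(dot(c, e))  # each ~ 0
```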


Recall

We say vectors x, y are perpendicular when they create a 90-degree angle. When that happens, the triangle they define is a right triangle!

Lemma Two vectors x, y in R^n are perpendicular if and only if

x_1 y_1 + ⋯ + x_n y_n = x^T y = 0

When this last equation holds we say that x, y are orthogonal.

Orthogonal Bases: A basis u_1, …, u_n of V is orthogonal if ⟨u_i, u_j⟩ = 0 for all i ≠ j.

Lemma If v_1, v_2, …, v_k are nonzero and orthogonal, then they are linearly independent.
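The lemma's criterion is a one-liner in code; a tiny sketch with made-up vectors:

```python
# x and y are orthogonal iff x^T y = 0 (the dot product vanishes).
def dot(u, v): return sum(a * b for a, b in zip(u, v))

x, y = [1, 2, -1], [3, -1, 1]
print(dot(x, y))  # 3 - 2 - 1 = 0, so x and y are orthogonal
```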


The Orthogonality of the Subspaces

Definition We say two subspaces V, W of R^n are orthogonal if for all u ∈ V and w ∈ W we have u^T w = 0.

Can you see a way to detect when two subspaces are orthogonal? Through their bases!

Theorem: The row space and the nullspace are orthogonal. Similarly, the column space is orthogonal to the left nullspace.

proof: The rows of A^T are the columns of A, and the dot product of each with a vector y ∈ N(A^T) is zero. Therefore the columns of A are perpendicular to any y ∈ N(A^T):

A^T y = ( (column 1 of A) · y, …, (column n of A) · y )^T = (0, …, 0)^T

where y ∈ N(A^T).
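A numerical check of the theorem, with a made-up matrix: for the A below, y = (1, −2, 1) lies in N(A), and dotting each row of A with y indeed gives zero (row space ⟂ nullspace).

```python
# Every row of A is perpendicular to any vector in the nullspace of A.
def dot(u, v): return sum(a * b for a, b in zip(u, v))

A = [[1, 1, 1],
     [1, 2, 3]]
y = [1, -2, 1]   # Ay = 0, so y is in N(A)

print([dot(row, y) for row in A])  # [0, 0]
```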


There is a stronger relation: for a subspace V of R^n, the set of all vectors orthogonal to V is the orthogonal complement of V, denoted V^⊥.

Warning Spaces can be orthogonal without being complements!

Exercise Let W be a subspace. Its orthogonal complement W^⊥ is a subspace, and W ∩ W^⊥ = {0}.

Exercise If V ⊂ W are subspaces, then W^⊥ ⊂ V^⊥.


Theorem (Fundamental theorem, part II) C(A^T)^⊥ = N(A) and N(A)^⊥ = C(A^T).

proof: The first equation holds because x is orthogonal to all vectors of the row space ⇔ x is orthogonal to each of the rows ⇔ x ∈ N(A). The other equality follows from the above exercises.

Corollary Given an m × n matrix A, the nullspace is the orthogonal complement of the row space in R^n. Similarly, the left nullspace is the orthogonal complement of the column space inside R^m.

WHY is this such a big deal?


Theorem Given an m × n matrix A, every vector x in R^n can be written in a unique way as x = x_n + x_r, where x_n is in the nullspace and x_r is in the row space of A.

proof Pick x_n to be the orthogonal projection of x onto N(A) and x_r the orthogonal projection onto C(A^T). Clearly x is the sum of both, but why are they unique?

If x_n + x_r = x'_n + x'_r, then x_n − x'_n = x'_r − x_r. This vector lies in both N(A) and C(A^T), which are orthogonal, so it must be the zero vector.

This has a beautiful consequence: every matrix A, when we think of it as a linear map, transforms the row space into its column space!
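A sketch of this splitting for the simplest nontrivial case, the 1 × 3 matrix A = [1 1 1] (our own example): the row space is the line through (1, 1, 1), the nullspace is the plane x_1 + x_2 + x_3 = 0.

```python
# Unique splitting x = x_n + x_r for A = [1 1 1].
def dot(u, v): return sum(a * b for a, b in zip(u, v))

a = [1.0, 1.0, 1.0]          # spans the row space C(A^T)
x = [1.0, 2.0, 3.0]

c = dot(a, x) / dot(a, a)    # projection coefficient onto the row space
x_r = [c * ai for ai in a]                     # row-space component
x_n = [xi - ri for xi, ri in zip(x, x_r)]      # nullspace component

print(x_r, x_n)
print(sum(x_n))              # 0: A x_n = 0, so x_n is in the nullspace
print(dot(x_r, x_n))         # 0: the two pieces are orthogonal
```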


An important picture


Projections onto Subspaces


QUESTION: Given a subspace S, what is the formula for the projection p of a vector b onto S?

Key idea of LEAST SQUARES for regression analysis

Think of b as data from experiments; b is not in S, due to errors of measurement.

The projection p is the best choice to replace b.

How do we do this projection for a line?


b is projected onto the line L given by the vector a in this picture:

The projection of the vector b onto the line in the direction a is

p = (a^T b / a^T a) a.

Exercise We wish to project the vector b = (2, 3, 4) onto the z-axis... Find it!
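The line-projection formula in code, applied to made-up vectors (not the exercise, so you can still do that one by hand):

```python
# p = (a^T b / a^T a) a: projection of b onto the line through a.
def dot(u, v): return sum(x * y for x, y in zip(u, v))

def project_onto_line(b, a):
    c = dot(a, b) / dot(a, a)
    return [c * ai for ai in a]

p = project_onto_line([3.0, 0.0, 3.0], [1.0, 2.0, 2.0])
print(p)  # [1.0, 2.0, 2.0]
```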


Projections in the general case

Suppose the subspace S = C(A) is the column space of A. Now b is a vector that is outside the column space!

We want x that minimizes ‖b − Ax‖. Then p = Ax is the projection of b onto C(A).

Note that ‖b − Ax‖ is the distance from b to the point Ax, which is an element of the column space!

The vector w = b − Ax must be perpendicular to the column space (like before).

For each column a_i we have a_i^T(b − Ax) = 0.

Thus in matrix notation: A^T(b − Ax) = 0. This gives the normal equation or least-squares equation:

A^T A x = A^T b.


Theorem The solution x = (A^T A)^{−1} A^T b gives the coordinates of the projection p in terms of the columns of A. The projection of b onto C(A) is

p = A (A^T A)^{−1} A^T b.

Theorem The matrix P = A (A^T A)^{−1} A^T is a projection matrix. It has the properties P^T = P and P² = P.

Lemma A^T A is a symmetric matrix, and A^T A has the same nullspace as A. proof If x ∈ N(A), then clearly A^T A x = 0. Conversely, if A^T A x = 0, then x^T A^T A x = ‖Ax‖² = 0, thus Ax = 0.

Corollary If A has independent columns, then A^T A is square, symmetric and invertible.

Example 1 We wish to project the vector b = (2, 3, 4, 1) onto the subspace x + y = 0. What is the distance?
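A quick numerical check of the projection-matrix properties, in the one-column case (our own simplification) where (A^T A)^{−1} is just the scalar 1/(a^T a), so P = a a^T / (a^T a):

```python
# Build P = a a^T / (a^T a) and verify P^T = P and P^2 = P.
def dot(u, v): return sum(x * y for x, y in zip(u, v))

a = [1.0, 2.0, 2.0]
n = len(a)
aa = dot(a, a)
P = [[a[i] * a[j] / aa for j in range(n)] for i in range(n)]

# P is symmetric by construction; check idempotence entrywise.
P2 = [[sum(P[i][k] * P[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
print(max(abs(P2[i][j] - P[i][j]) for i in range(n) for j in range(n)))  # ~0
```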


Example 2 Consider the problem Ax = b with

A =
[  1   2   0 ]
[  3  −1   1 ]
[ −1   2   1 ]
[  1  −1  −2 ]
[  2   1  −1 ]

b^T = (1, 0, −1, 2, 2).

There is no EXACT solution to Ax = b, so we use the "NORMAL EQUATION" with

A^T A =
[ 16  −2  −2 ]
[ −2  11   2 ]
[ −2   2   7 ]

A^T b = (8, 0, −7)^T

Solving A^T A x = A^T b we get the least squares solution x* ≈ (0.4119, 0.2482, −0.9532)^T with error ‖b − Ax*‖ ≈ 0.1799.
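The whole computation of Example 2 can be re-derived in a few lines: form A^T A and A^T b from the data above, solve the 3 × 3 normal equations by Gaussian elimination, and measure the error.

```python
import math

A = [[ 1,  2,  0],
     [ 3, -1,  1],
     [-1,  2,  1],
     [ 1, -1, -2],
     [ 2,  1, -1]]
b = [1, 0, -1, 2, 2]

cols = list(zip(*A))
# G = A^T A (Gram matrix), r = A^T b.
G = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(3)]
     for i in range(3)]
r = [sum(c * bi for c, bi in zip(cols[i], b)) for i in range(3)]

# Gaussian elimination with back-substitution on the augmented matrix [G | r].
M = [G[i][:] + [r[i]] for i in range(3)]
for k in range(3):
    for i in range(k + 1, 3):
        f = M[i][k] / M[k][k]
        M[i] = [mij - f * mkj for mij, mkj in zip(M[i], M[k])]
x = [0.0] * 3
for i in (2, 1, 0):
    x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]

# Least-squares error ||b - A x*||.
err = math.sqrt(sum((bi - sum(ai * xi for ai, xi in zip(row, x))) ** 2
                    for row, bi in zip(A, b)))
print(x, err)  # x ~ (0.4119, 0.2482, -0.9532), err ~ 0.1799
```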


Example 3 A sample of lead-210 gave the following radioactivity measurements at the given times (time in days). Can YOU predict how long it will take until one percent of the original amount remains?

time in days:  0    4    8    10   14   18
mg:            10   8.8  7.8  7.3  6.4  6.4

A linear model does not work here!!

The material undergoes exponential decay, m(t) = m_0 e^{βt}, where m_0 is the initial amount of radioactive material and β is the decay rate.

Taking logarithms, we get

y(t) = log(m(t)) = log(m_0) + βt

We can use the usual least squares to fit the logarithms y_i = log(m_i) of the radioactive mass data.
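Carrying this out on the table's data (the numeric answer below is our own computation, not stated on the slides): fit log(m) = log(m_0) + βt with the same 2 × 2 normal-equation solve as for a straight line, then solve m(t)/m_0 = 0.01 for t.

```python
import math

t = [0, 4, 8, 10, 14, 18]
mg = [10, 8.8, 7.8, 7.3, 6.4, 6.4]
y = [math.log(mi) for mi in mg]   # fit a line to the logarithms

# Normal equations for y = c + beta * t, solved by Cramer's rule.
n, st, stt = len(t), sum(t), sum(ti * ti for ti in t)
sy, sty = sum(y), sum(ti * yi for ti, yi in zip(t, y))
det = n * stt - st * st
c = (sy * stt - st * sty) / det       # estimate of log(m_0)
beta = (n * sty - st * sy) / det      # decay rate (negative)

# One percent remains when e^{beta t} = 0.01, i.e. t = ln(0.01) / beta.
t_one_percent = math.log(0.01) / beta
print(beta, t_one_percent)            # roughly -0.0265 and ~174 days
```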

MATH 22A: LINEAR ALGEBRA Chapter 4


In this case we have

Aᵀ =
[ 1  1  1   1   1   1
  0  4  8  10  14  18 ],

bᵀ = (yi) = (2.30258, 2.17475, 2.05412, 1.98787, 1.85630, 1.85630).

Thus AᵀA =
[  6   54
  54  700 ].

Solving the NORMAL system we get log(m0) = 2.277327661 and β = −0.0265191683.

The original amount was 10 mg. After roughly 174 days (t ≈ 173.7) less than one percent of the radioactive material remains.
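The whole fit can be sketched in plain Python: take logarithms, solve the 2 × 2 normal system (here by Cramer's rule, an equivalent shortcut for a 2 × 2 system), and solve e^(βt) = 0.01 for t. The variable names are my own.

```python
import math

# Lead-210 data from the slide: times (days) and measured mass (mg).
t = [0, 4, 8, 10, 14, 18]
m = [10, 8.8, 7.8, 7.3, 6.4, 6.4]
y = [math.log(mi) for mi in m]   # linearize: log m = log m0 + beta*t

# Normal equations for fitting y = alpha + beta*t:
#   [ n      sum t   ] [alpha]   [ sum y   ]
#   [ sum t  sum t^2 ] [beta ] = [ sum t*y ]
n = len(t)
st, stt = sum(t), sum(ti * ti for ti in t)
sy, sty = sum(y), sum(ti * yi for ti, yi in zip(t, y))
det = n * stt - st * st
alpha = (sy * stt - st * sty) / det   # = log(m0)
beta = (n * sty - st * sy) / det      # decay rate

# Time until only one percent of m0 remains: e^(beta*t) = 0.01.
t_one_percent = math.log(0.01) / beta
print(alpha, beta, t_one_percent)
```

This reproduces log(m0) ≈ 2.27733, β ≈ −0.02652, and a crossing time of about 173.7 days.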

MATH 22A: LINEAR ALGEBRA Chapter 4


Orthogonal Bases and Gram-Schmidt

MATH 22A: LINEAR ALGEBRA Chapter 4

Not all bases of a vector space are created equal! Some are better than others!

A basis u1, . . . , un of a vector space V is orthonormal if it is orthogonal and each vector has unit length.

Observation If the vectors u1, . . . , un are an orthogonal basis, their normalizations ui/‖ui‖ form an orthonormal basis.

Example The standard unit vectors are orthonormal.

Example The vectors (1, 2, −1)ᵀ, (0, 1, 2)ᵀ, (5, −2, 1)ᵀ are an orthogonal basis of R³.

MATH 22A: LINEAR ALGEBRA Chapter 4


Why do we care about orthonormal bases?

Theorem Let u1, . . . , un be an orthonormal basis for a vector space V with inner product. Then one can write any element v ∈ V as a linear combination v = c1u1 + · · · + cnun where ci = ⟨v, ui⟩, for i = 1, . . . , n.

Example Let us rewrite the vector v = (1, 1, 1)ᵀ in terms of the orthonormal basis

u1 = (1/√6, 2/√6, −1/√6)ᵀ, u2 = (0, 1/√5, 2/√5)ᵀ, u3 = (5/√30, −2/√30, 1/√30)ᵀ

Computing the dot products vᵀu1 = 2/√6, vᵀu2 = 3/√5, and vᵀu3 = 4/√30, we get

v = (2/√6)u1 + (3/√5)u2 + (4/√30)u3

The best reason to like matrices that have orthonormal columns: The least-squares equations are even nicer!
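The coefficients in this example really are just dot products; a minimal plain-Python check of the theorem on the slide's basis:

```python
import math

s6, s5, s30 = math.sqrt(6), math.sqrt(5), math.sqrt(30)
# The orthonormal basis from the slide's example.
u = [
    [1 / s6, 2 / s6, -1 / s6],
    [0.0, 1 / s5, 2 / s5],
    [5 / s30, -2 / s30, 1 / s30],
]
v = [1.0, 1.0, 1.0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# The coefficients come for free: c_i = <v, u_i>.
c = [dot(v, ui) for ui in u]

# Reconstruct v = c1*u1 + c2*u2 + c3*u3 to verify the theorem.
w = [sum(c[i] * u[i][k] for i in range(3)) for k in range(3)]
print(c, w)
```

The reconstruction w recovers (1, 1, 1) to floating-point accuracy; no linear system had to be solved.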

MATH 22A: LINEAR ALGEBRA Chapter 4


Lemma If Q is a rectangular matrix with orthonormal columns, then the normal equations simplify because QᵀQ = I :

QᵀQx = Qᵀb simplifies to x = Qᵀb.
The projection matrix simplifies to Q(QᵀQ)⁻¹Qᵀ = QIQᵀ = QQᵀ.
Thus the projection point is p = QQᵀb, that is,

p = (q1ᵀb)q1 + (q2ᵀb)q2 + · · · + (qnᵀb)qn
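A quick sketch of this formula, using two small hypothetical orthonormal columns of my own choosing (they are not from the slides): each coefficient is a single dot product, and the residual b − p is orthogonal to every column.

```python
# Two hypothetical orthonormal columns q1, q2 and a vector b to project.
q1 = [1.0, 0.0, 0.0]
q2 = [0.0, 0.6, 0.8]   # 0.36 + 0.64 = 1 and q1.q2 = 0, so orthonormal
b = [1.0, 2.0, 3.0]

def dot(a, c):
    return sum(x * y for x, y in zip(a, c))

# No system to solve: p = (q1.b)q1 + (q2.b)q2.
p = [dot(q1, b) * q1[k] + dot(q2, b) * q2[k] for k in range(3)]
# The residual b - p is orthogonal to both columns.
r = [bk - pk for bk, pk in zip(b, p)]
print(p, r)
```

Here p = (1, 2.16, 2.88), and both dot(r, q1) and dot(r, q2) vanish, as the lemma predicts.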

To compute orthogonal/orthonormal bases for a space, we use the GRAM-SCHMIDT ALGORITHM.

MATH 22A: LINEAR ALGEBRA Chapter 4


To compute orthogonal/orthonormal bases for a space, we use the GRAM-SCHMIDT ALGORITHM.

input: linearly independent vectors a1, . . . , an
output: orthogonal vectors q1, . . . , qn.

Step 1: q1 = a1
Step 2: q2 = a2 − (a2ᵀq1/q1ᵀq1)q1
Step 3: q3 = a3 − (a3ᵀq1/q1ᵀq1)q1 − (a3ᵀq2/q2ᵀq2)q2
Step 4: q4 = a4 − (a4ᵀq1/q1ᵀq1)q1 − (a4ᵀq2/q2ᵀq2)q2 − (a4ᵀq3/q3ᵀq3)q3
...
Step j: qj = aj − (ajᵀq1/q1ᵀq1)q1 − (ajᵀq2/q2ᵀq2)q2 − · · · − (ajᵀqj−1/qj−1ᵀqj−1)qj−1

At the end NORMALIZE all vectors if you wish to have unit vectors! (Divide them by their length!)
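The steps above translate almost line-for-line into plain Python. A minimal sketch (function names are my own), run on the spanning set of the subspace W from the worked example in these slides:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: returns orthogonal (not unit) vectors."""
    qs = []
    for a in vectors:
        v = list(a)
        for prev in qs:
            # Step j: subtract the component of a_j along each earlier q_i.
            coeff = dot(a, prev) / dot(prev, prev)
            v = [vk - coeff * pk for vk, pk in zip(v, prev)]
        qs.append(v)
    return qs

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# Spanning set of W from the example: (1,-2,0,1), (-1,0,0,-1), (1,1,0,0).
a = [[1, -2, 0, 1], [-1, 0, 0, -1], [1, 1, 0, 0]]
q = gram_schmidt(a)                 # orthogonal basis
u = [normalize(qi) for qi in q]     # orthonormal basis
print(u)
```

The output matches the ANSWER on the example slide: (1/√6, −2/√6, 0, 1/√6), (−1/√3, −1/√3, 0, −1/√3), (1/√2, 0, 0, −1/√2).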

MATH 22A: LINEAR ALGEBRA Chapter 4


EXAMPLE

Consider the subspace W spanned by (1, −2, 0, 1), (−1, 0, 0, −1) and (1, 1, 0, 0). Find an orthonormal basis for the space W.

ANSWER:

(1/√6, −2/√6, 0, 1/√6), (−1/√3, −1/√3, 0, −1/√3), (1/√2, 0, 0, −1/√2)

MATH 22A: LINEAR ALGEBRA Chapter 4


In this way, the original basis vectors a1, . . . , an can be written in a "triangular" way! If q1, q2, . . . , qn are orthogonal, just think of rij = ajᵀqi :

a1 = r11(q1/q1ᵀq1),                                        (1)
a2 = r12(q1/q1ᵀq1) + r22(q2/q2ᵀq2)                         (2)
a3 = r13(q1/q1ᵀq1) + r23(q2/q2ᵀq2) + r33(q3/q3ᵀq3)         (3)
...                                                        (4)
an = r1n(q1/q1ᵀq1) + r2n(q2/q2ᵀq2) + · · · + rnn(qn/qnᵀqn). (5)

Write these equations in matrix form! We obtain A = QR where A = (a1 . . . an), Q = (q1 . . . qn) and R = (rij).

MATH 22A: LINEAR ALGEBRA Chapter 4


Theorem (QR decomposition) Every m × n matrix A with independent columns can be factorized as A = QR where the columns of Q are orthonormal and R is upper triangular and invertible.

NOTE: A and Q have the same column space. R is an invertible and upper triangular matrix.

The simplest way to compute this decomposition:
1 Use Gram-Schmidt to get orthonormal vectors qi.
2 Matrix Q has columns q1, . . . , qn.
3 The matrix R is filled with the dot products rij = ajᵀqi.

NOTE: Such a matrix has two important decompositions, LU and QR. They are both useful for different reasons! One is for solving equations, the other is good for least-squares.
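The three-step recipe above can be sketched in plain Python, storing A column by column. This is classical Gram-Schmidt with normalization folded in (the function name qr_columns is my own); it is a teaching sketch, not a numerically robust QR routine. As test data it reuses the W vectors from the Gram-Schmidt example as the columns of a 4 × 3 matrix.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def qr_columns(cols):
    """QR via Gram-Schmidt on a list of columns: A = QR, Q orthonormal."""
    n = len(cols)
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        v = list(a)
        for i, qi in enumerate(q_cols):
            R[i][j] = dot(a, qi)   # r_ij = a_j . q_i (q_i already unit length)
            v = [vk - R[i][j] * qk for vk, qk in zip(v, qi)]
        R[j][j] = math.sqrt(dot(v, v))       # length of what is left
        q_cols.append([vk / R[j][j] for vk in v])
    return q_cols, R

# Columns of a 4x3 matrix with independent columns (the W example vectors).
cols = [[1, -2, 0, 1], [-1, 0, 0, -1], [1, 1, 0, 0]]
Q, R = qr_columns(cols)

# Reconstruct each column: a_j = sum_i R[i][j] * q_i, i.e. A = QR.
recon = [[sum(R[i][j] * Q[i][k] for i in range(len(Q))) for k in range(4)]
         for j in range(3)]
print(R)
```

R comes out upper triangular with positive diagonal, and multiplying back gives the original columns, confirming A = QR.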

MATH 22A: LINEAR ALGEBRA Chapter 4
