unit 3 understanding matrices - web.stanford.edu

Understanding Matrices

Transcript of unit 3 understanding matrices - web.stanford.edu

Page 1

Understanding Matrices

Page 2

Eigensystems

• Eigenvectors - special directions $v_k$ in which a matrix only applies scaling
• Eigenvalues - the amount $\lambda_k$ of that scaling
• Right eigenvectors (or simply eigenvectors) satisfy $Av_k = \lambda_k v_k$
• Eigenvectors represent directions, so $A(\alpha v_k) = \lambda_k \alpha v_k$ is also true for all $\alpha$
• Left eigenvectors satisfy $u_k^T A = \lambda_k u_k^T$ (or $A^T u_k = \lambda_k u_k$)
• Diagonal matrices have eigenvalues on the diagonal, and eigenvectors $\hat{e}_k$, e.g. $\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = 3\begin{bmatrix} 0 \\ 1 \end{bmatrix}$
• Upper/lower triangular matrices also have eigenvalues on the diagonal, e.g. $\begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix}$
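A quick numerical check of the triangular example above; this is a minimal sketch assuming NumPy is available (the matrix is the one from the slide).

```python
import numpy as np

# Upper triangular matrix from the slide: its eigenvalues sit on the diagonal (2 and 3).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

# numpy.linalg.eig returns the eigenvalues and unit-length eigenvectors (as columns).
lam, V = np.linalg.eig(A)
print(lam)   # 2 and 3 (order may vary)
print(V)     # columns are multiples of [1, 0] and [1, 1]

# Check A v_k = lambda_k v_k for each eigenpair.
for k in range(2):
    assert np.allclose(A @ V[:, k], lam[k] * V[:, k])
```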

Page 3

Complex Numbers

• Complex numbers may appear in both eigenvalues and eigenvectors, e.g. $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix} = i\begin{bmatrix} 1 \\ i \end{bmatrix}$
• Recall the complex conjugate: $(a + bi)^* = a - bi$
• Hermitian matrix: $A^{*T} = A$ (often, $A^{*T}$ is written as $A^H$)
• $Av = \lambda v$ implies $(Av)^{*T} = (\lambda v)^{*T}$ or $v^{*T}A = \lambda^* v^{*T}$
• Using this, $Av = \lambda v$ implies $v^{*T}Av = v^{*T}\lambda v$ or $\lambda^* v^{*T}v = \lambda v^{*T}v$ or $\lambda^* = \lambda$
• Thus, Hermitian matrices have $\lambda \in \mathbb{R}$ (no complex eigenvalues)
• Symmetric real-valued matrices have real-valued eigenvalues/eigenvectors
• However, complex eigenvectors work too, e.g. $A(\alpha v_k) = \lambda_k \alpha v_k$ with $\alpha$ complex
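A small NumPy sketch of the two claims above; the Hermitian matrix used here is an illustrative choice, not from the slides.

```python
import numpy as np

# The rotation-like matrix from the slide: real entries, complex eigenpairs.
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
lam, V = np.linalg.eig(A)
print(lam)                                          # i and -i (a conjugate pair)
print(np.allclose(A @ V[:, 0], lam[0] * V[:, 0]))   # True

# A Hermitian matrix (A^H = A) with complex entries still has real eigenvalues.
H = np.array([[2.0, 1.0j],
              [-1.0j, 3.0]])
print(np.allclose(H, H.conj().T))   # True: H is Hermitian
print(np.linalg.eigvalsh(H))        # real eigenvalues (about 1.38 and 3.62)
```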

Page 4

Spatial Deformation

• Let $c = \sum_k \alpha_k v_k$, so that $Ac = \sum_k \alpha_k A v_k = \sum_k \alpha_k \lambda_k v_k$
• Then, $A$ tilts $c$ away from directions with smaller eigenvalues and towards directions with larger eigenvalues
• Large $\lambda_k$ stretch components in their associated $v_k$ directions
• Small $\lambda_k$ squish components in their associated $v_k$ directions
• Negative $\lambda_k$ flip the sign of components in their associated $v_k$ directions

Page 5

Spatial Deformation

• Consider every point on the unit circle (green) as a vector $c = \sum_k \alpha_k v_k$, and remap each point via $Ac = \sum_k \alpha_k \lambda_k v_k$
• The remapped shape (blue) is more elliptical than the original circle (green)
• The circle is stretched along the axis with the larger eigenvalue, and compressed along the axis with the smaller eigenvalue
• The larger the ratio of eigenvalues, the more elliptical the circle becomes
• This is true for all circles (representing all points in the plane)
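A sketch of the circle-to-ellipse picture in NumPy; the diagonal matrix is an illustrative choice with eigenvalues 2 and 3, not from the slides.

```python
import numpy as np

# Eigenvalues 2 and 3, eigenvectors along the coordinate axes.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# Points on the unit circle, stored as the columns of a 2 x N array.
theta = np.linspace(0.0, 2.0 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])

# Remapping every point turns the circle into an ellipse.
ellipse = A @ circle
radii = np.linalg.norm(ellipse, axis=0)
print(radii.min(), radii.max())      # about 2 and 3: squished/stretched by the eigenvalues
print(radii.max() / radii.min())     # about 1.5, the ratio of the eigenvalues
```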

Page 6

Solving Linear Systems

• Perturb the right hand side from $b$ to $\hat{b}$, and solve $A\hat{c} = \hat{b}$ to find $\hat{c}$
• Note how $c$ and $\hat{c}$ are more separated than $b$ and $\hat{b}$, i.e. the solution is more perturbed than the right hand side is
• Small changes in the right hand side lead to larger changes in the solution
• Small algorithmic errors are also amplified, since they change $A^{-1}b$ to $\tilde{A}^{-1}b$, which is similar to changing $A^{-1}b$ to $A^{-1}\hat{b}$
• The amount of amplification is proportional to the ratio of the eigenvalues
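A numerical illustration of the amplification, assuming NumPy; the matrix and perturbation are illustrative choices with an eigenvalue ratio of 1000.

```python
import numpy as np

A = np.diag([1.0, 1e-3])             # eigenvalue ratio 1000

b = np.array([1.0, 0.0])
b_hat = np.array([1.0, 1e-3])        # small perturbation of the right hand side

c = np.linalg.solve(A, b)            # [1, 0]
c_hat = np.linalg.solve(A, b_hat)    # [1, 1]

rel_b = np.linalg.norm(b_hat - b) / np.linalg.norm(b)   # about 1e-3
rel_c = np.linalg.norm(c_hat - c) / np.linalg.norm(c)   # about 1
print(rel_c / rel_b)                 # about 1000: the eigenvalue ratio
```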

Page 7

Preconditioning

• Suppose $A$ has large eigenvalue ratios that make $Ac = b$ difficult to solve
• Suppose one had an approximate guess for the inverse, i.e. an $\tilde{A}^{-1} \approx A^{-1}$
• Then, transform $Ac = b$ into $\tilde{A}^{-1}Ac = \tilde{A}^{-1}b$ or $\tilde{I}c = \tilde{b}$
• Typically, a bit more involved than this, but conceptually the same
• $\tilde{I}$ is not the identity, so there is still more work to do in order to find $c$
• However, $\tilde{I}$ has eigenvalues with similar magnitudes (clusters work too), making $\tilde{I}c = \tilde{b}$ far easier to solve than a poorly conditioned $Ac = b$

Preconditioning works GREAT!

• It is best to re-scale ellipsoids along eigenvector axes, but scaling along the coordinate axes (diagonal/Jacobi preconditioning) can work well too
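A minimal sketch of diagonal (Jacobi) preconditioning in NumPy; the matrix is an illustrative badly-scaled example, and, as noted above, practical preconditioned solvers are more involved than this.

```python
import numpy as np

A = np.array([[1e4, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0])

# Approximate A^{-1} by the inverse of its diagonal (Jacobi preconditioning).
M_inv = np.diag(1.0 / np.diag(A))

# Same solution c, but the transformed system is far better conditioned.
A_tilde = M_inv @ A
b_tilde = M_inv @ b

print(np.linalg.cond(A))         # roughly 1e4
print(np.linalg.cond(A_tilde))   # only a few
print(np.allclose(np.linalg.solve(A, b), np.linalg.solve(A_tilde, b_tilde)))  # True
```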

Page 8

Rectangular Matrices (Rank)

• An $m \times n$ rectangular matrix has $m$ rows and $n$ columns
• (Note: these comments also hold for square matrices with $m = n$)
• The columns span a space, and the unknowns are weights on each column (recall $Ac = \sum_k c_k a_k$)
• A matrix with $n$ columns has maximum rank $n$
• The actual rank depends on how many of the columns are linearly independent from one another
• Each column has length $m$ (which is the number of rows)
• Thus, the columns live in an $m$ dimensional space, and at best can span that whole space
• That is, there is a maximum of $m$ independent columns (that could exist)
• Overall, a matrix at most has rank equal to the minimum of $m$ and $n$
• Both considerations are based on looking at the columns (which are scaled by the unknowns)
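A small rank check with numpy.linalg.matrix_rank; the matrix is an illustrative choice whose third column is the sum of the first two.

```python
import numpy as np

# 3 x 3 matrix: the third column equals the first column plus the second,
# so only 2 of the 3 columns are linearly independent.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 1.0, 3.0]])

print(np.linalg.matrix_rank(A))   # 2 (at most min(m, n) = 3, reduced by the dependent column)
```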

Page 9

Rows vs. Columns

• One can find discussions on rows, row spaces, etc. that are used for various purposes
• Although these are fine discussions in regards to matrices/mathematics, they are unnecessary for an intuitive understanding of high dimensional vector spaces (and as such can be ignored)
• The number of columns is identical to the number of variables, which depends on the parameters of the problem
• E.g. the unknown parameters that govern a neural network architecture
• The number of rows depends on the amount of data used, and adding/removing data does not intrinsically affect the nature of the problem
• E.g. it does not change the network architecture, but merely perturbs the ascertained values of the unknown parameters

Page 10

Singular Value Decomposition (SVD)

• Factorization of any size $m \times n$ matrix: $A = U\Sigma V^T$
• $\Sigma$ is $m \times n$ diagonal with non-negative diagonal entries (called singular values)
• $U$ is $m \times m$ orthogonal, $V$ is $n \times n$ orthogonal (their columns are called singular vectors)
• Orthogonal matrices have orthonormal columns (an orthonormal basis), so their transpose is their inverse. They preserve inner products, and thus are rotations, reflections, and combinations thereof
• If $A$ has complex entries, then $U$ and $V$ are unitary (conjugate transpose is their inverse)
• Introduced and rediscovered many times: Beltrami 1873, Jordan 1875, Sylvester 1889, Autonne 1913, Eckart and Young 1936. Pearson introduced principal component analysis (PCA) in 1901, which uses SVD. Numerical methods by Chan, Businger, Golub, Kahan, etc.

Page 11

(Rectangular) Diagonal Matrices

• All off-diagonal entries are 0
• Diagonal entries are $a_{kk}$, and off-diagonal entries are $a_{ki}$ with $k \neq i$
• E.g. $\begin{bmatrix} 5 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} 10 \\ -1 \\ \alpha \end{bmatrix}$ has $5c_1 = 10$ and $2c_2 = -1$, so $c_1 = 2$ and $c_2 = -0.5$
• Note that $\alpha \neq 0$ imposes a "no solution" condition (even though $c_1$ and $c_2$ are well-specified)
• E.g. $\begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 10 \\ -1 \end{bmatrix}$ has $5c_1 = 10$ and $2c_2 = -1$, so $c_1 = 2$ and $c_2 = -0.5$
• Note that there are "infinite solutions" for $c_3$ (even though $c_1$ and $c_2$ are well-specified)
• A zero on the diagonal indicates a singular system, which has either no solution (e.g. $0c_1 = 10$) or infinite solutions (e.g. $0c_1 = 0$)

Page 12

Singular Value Decomposition (SVD)

• $A^TA = V\Sigma^T U^T U\Sigma V^T = V(\Sigma^T\Sigma)V^T$, so $(A^TA)v = \lambda v$ gives $(\Sigma^T\Sigma)(V^Tv) = \lambda(V^Tv)$
• $\Sigma^T\Sigma$ is $n \times n$ diagonal with eigenvectors $\hat{e}_k$, so $\hat{e}_k = V^Tv$ and $v = V\hat{e}_k$
• That is, the columns of $V$ are the eigenvectors of $A^TA$
• $AA^T = U\Sigma V^T V\Sigma^T U^T = U(\Sigma\Sigma^T)U^T$, so $(AA^T)v = \lambda v$ gives $(\Sigma\Sigma^T)(U^Tv) = \lambda(U^Tv)$
• $\Sigma\Sigma^T$ is $m \times m$ diagonal with eigenvectors $\hat{e}_k$, so $\hat{e}_k = U^Tv$ and $v = U\hat{e}_k$
• That is, the columns of $U$ are the eigenvectors of $AA^T$
• When $m \neq n$, either $\Sigma^T\Sigma$ or $\Sigma\Sigma^T$ is larger and contains extra zeros on the diagonal
• Their other diagonal entries are the squares of the singular values
• That is, the singular values are the (non-negative) square roots of the non-extra eigenvalues of $A^TA$ and $AA^T$
• Note that both $A^TA$ and $AA^T$ are symmetric positive semi-definite, and thus easy to work with
• E.g. symmetry means their eigensystem (and thus the SVD) has no complex numbers when $A$ doesn't
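A NumPy check that the columns of V are eigenvectors of A^T A and that the singular values are the square roots of its eigenvalues; the matrix here is an illustrative choice.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

U, s, Vt = np.linalg.svd(A)          # singular values s; rows of Vt are the columns v_k of V
lam, W = np.linalg.eigh(A.T @ A)     # eigenvalues of A^T A in ascending order

# Singular values are the non-negative square roots of the eigenvalues of A^T A.
print(np.allclose(np.sort(s**2), lam))   # True

# Each column v_k of V satisfies (A^T A) v_k = sigma_k^2 v_k.
for k in range(2):
    v_k = Vt[k]                      # k-th row of V^T = k-th column of V
    print(np.allclose(A.T @ A @ v_k, s[k]**2 * v_k))   # True
```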

Page 13

Example (Tall Matrix)

• Consider the size $4 \times 3$ matrix $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix}$
• Label the columns $a_1 = \begin{bmatrix} 1 \\ 4 \\ 7 \\ 10 \end{bmatrix}$, $a_2 = \begin{bmatrix} 2 \\ 5 \\ 8 \\ 11 \end{bmatrix}$, $a_3 = \begin{bmatrix} 3 \\ 6 \\ 9 \\ 12 \end{bmatrix}$
• Since $a_1$ and $a_2$ point in different directions, $A$ is at least rank 2
• $a_3 = 2a_2 - a_1$, so the third column is in the span of the first two columns
• Thus, $A$ is only rank 2 (not rank 3)

Page 14

Example (SVD)

$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} .141 & .825 & -.420 & -.351 \\ .344 & .426 & .298 & .782 \\ .547 & .028 & .644 & -.509 \\ .750 & -.371 & -.542 & .079 \end{bmatrix}\begin{bmatrix} 25.5 & 0 & 0 \\ 0 & 1.29 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} .504 & .574 & .644 \\ -.761 & -.057 & .646 \\ .408 & -.816 & .408 \end{bmatrix}$

• Singular values are 25.5, 1.29, and 0
• Singular value of 0 indicates that the matrix is rank deficient
• The rank of a matrix is equal to its number of nonzero singular values
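The same factorization computed with NumPy (numpy.linalg.svd); the tiny third singular value is zero up to roundoff.

```python
import numpy as np

A = np.array([[ 1.0,  2.0,  3.0],
              [ 4.0,  5.0,  6.0],
              [ 7.0,  8.0,  9.0],
              [10.0, 11.0, 12.0]])

U, s, Vt = np.linalg.svd(A)
print(s)                          # about [25.46, 1.29, 1e-15]: the slide's 25.5, 1.29, 0
print(np.sum(s > 1e-10))          # 2 "nonzero" singular values, so rank 2
print(np.linalg.matrix_rank(A))   # 2

# Rebuild the m x n diagonal Sigma and confirm A = U Sigma V^T.
S = np.zeros((4, 3))
np.fill_diagonal(S, s)
print(np.allclose(U @ S @ Vt, A))  # True
```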

Page 15

Derivation from $A^TA$ and $AA^T$

$A = U\Sigma V^T$ (the same numerical factorization shown on Page 14)

• $A^TA$ is size $3 \times 3$ and has 3 eigenvectors (seen in $V$)
• The square roots of the 3 eigenvalues of $A^TA$ are seen in $\Sigma$ (color coded to the eigenvectors)
• $AA^T$ is size $4 \times 4$ and has 4 eigenvectors (seen in $U$)
• The square roots of 3 of the eigenvalues of $AA^T$ are seen in $\Sigma$
• The 4th eigenvalue of $AA^T$ is an extra eigenvalue of 0

Page 16

Understanding $Ac$

$A = U\Sigma V^T$ (the same numerical factorization shown on Page 14)

• $A$ maps from $\mathbb{R}^3$ to $\mathbb{R}^4$
• $Ac$ first projects $c \in \mathbb{R}^3$ onto the 3 basis vectors in $V$
• Then, the associated singular values (diagonally) scale the results
• Lastly, those scaled results are used as weights on the basis vectors in $U$

Page 17

Understanding $Ac$

$Ac = U\Sigma V^T\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = U\Sigma\begin{bmatrix} v_1^Tc \\ v_2^Tc \\ v_3^Tc \end{bmatrix} = U\begin{bmatrix} \sigma_1 v_1^Tc \\ \sigma_2 v_2^Tc \\ \sigma_3 v_3^Tc \\ 0 \end{bmatrix} = u_1\sigma_1 v_1^Tc + u_2\sigma_2 v_2^Tc + u_3\sigma_3 v_3^Tc + u_4 \cdot 0$

(with $U$, $\Sigma$, and $V$ the numerical factors from Page 14)

• $Ac$ projects $c$ onto the basis vectors in $V$, scales by the associated singular values, and uses those results as weights on the basis vectors in $U$
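A numerical check of the expansion above for the slide's 4×3 matrix, using an arbitrary illustrative input vector c.

```python
import numpy as np

A = np.array([[ 1.0,  2.0,  3.0],
              [ 4.0,  5.0,  6.0],
              [ 7.0,  8.0,  9.0],
              [10.0, 11.0, 12.0]])
U, s, Vt = np.linalg.svd(A)

c = np.array([0.3, -1.2, 2.5])     # any input vector

# A c = sum_k u_k sigma_k (v_k^T c); the fourth column of U is multiplied by 0.
Ac = sum(U[:, k] * s[k] * (Vt[k] @ c) for k in range(3))
print(np.allclose(Ac, A @ c))      # True
```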

Page 18

Extra Dimensions

$A = U\Sigma V^T$ (the same numerical factorization shown on Page 14)

• The 3D space of vector inputs can only span a 3D subspace of $\mathbb{R}^4$
• The last (green) column of $U$ represents the unreachable dimension, orthogonal to the range of $A$, and is always multiplied by 0
• One can delete this column and the associated portion of $\Sigma$ (and still obtain a valid factorization)

Page 19

Zero Singular Values

$A = U\Sigma V^T$ (the same numerical factorization shown on Page 14)

• The 3rd singular value is 0, so $A$ has a 1D null space that reduces the 3D input vectors to only 2 dimensions
• The associated (pink) terms make no contribution to the final result, and can also be deleted (still obtaining a valid factorization)
• The first 2 columns of $U$ span the 2D subset of $\mathbb{R}^4$ that comprises the range of $A$
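A NumPy check that dropping the columns tied to zero singular values still gives a valid factorization of the slide's matrix.

```python
import numpy as np

A = np.array([[ 1.0,  2.0,  3.0],
              [ 4.0,  5.0,  6.0],
              [ 7.0,  8.0,  9.0],
              [10.0, 11.0, 12.0]])
U, s, Vt = np.linalg.svd(A)

r = int(np.sum(s > 1e-10))                       # 2 "nonzero" singular values
A_rebuilt = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]   # keep only the nonzero-singular-value terms
print(np.allclose(A_rebuilt, A))                 # True: still a valid factorization
```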

Page 20

Approximating $A$

$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix} \approx u_1\sigma_1 v_1^T$ (keeping only the terms of the Page 14 factorization that involve the largest singular value)

• The first singular value is much bigger than the second, and so represents the vast majority of what $A$ does (note, the vectors in $U$ and $V$ are unit length)
• Thus, one could approximate $A$ quite well by only using the terms associated with the largest singular value
• This is not a valid factorization, but an approximation (and the idea behind PCA)
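A sketch of that rank-1 approximation in NumPy, keeping only the largest singular value of the slide's matrix.

```python
import numpy as np

A = np.array([[ 1.0,  2.0,  3.0],
              [ 4.0,  5.0,  6.0],
              [ 7.0,  8.0,  9.0],
              [10.0, 11.0, 12.0]])
U, s, Vt = np.linalg.svd(A)

# Keep only u_1, sigma_1, v_1.
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(A - A1) / np.linalg.norm(A))   # about 0.05: a small relative error
```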

Page 21

Summary

• The columns of $V$ that do not correspond to "nonzero" singular values form an orthonormal basis for the null space of $A$
• The remaining columns of $V$ form an orthonormal basis for the space perpendicular to the null space of $A$ (parameterizing meaningful inputs)
• The columns of $U$ corresponding to "nonzero" singular values form an orthonormal basis for the range of $A$
• The remaining columns of $U$ form an orthonormal basis for the (unattainable) space perpendicular to the range of $A$
• One can drop the columns of $U$ and $V$ that do not correspond to "nonzero" singular values and still obtain a valid factorization of $A$
• One can drop the columns of $U$ and $V$ that correspond to "small/smaller" singular values and still obtain a reasonable approximation of $A$

Page 22

Example (Wide Matrix)

$A = \begin{bmatrix} 1 & 4 & 7 & 10 \\ 2 & 5 & 8 & 11 \\ 3 & 6 & 9 & 12 \end{bmatrix} = \begin{bmatrix} .504 & -.761 & .408 \\ .574 & -.057 & -.816 \\ .644 & .646 & .408 \end{bmatrix}\begin{bmatrix} 25.5 & 0 & 0 & 0 \\ 0 & 1.29 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} .141 & .344 & .547 & .750 \\ .825 & .426 & .028 & -.371 \\ -.420 & .298 & .644 & -.542 \\ -.351 & .782 & -.509 & .079 \end{bmatrix}$

• $A$ maps from $\mathbb{R}^4$ to $\mathbb{R}^3$ and so has at least a 1D null space (green)
• The 3rd singular value is 0, and the associated (pink) terms make no contribution to the final result

Page 23

Example (Wide Matrix)

$A = U\Sigma V^T$ (the same numerical factorization shown on Page 22)

• Only a 2D subspace of $\mathbb{R}^4$ matters, with the rest of $\mathbb{R}^4$ in the null space of $A$
• Only a 2D subspace of $\mathbb{R}^3$ is in the range of $A$

Page 24

Notes

• The SVD is often unwieldy for computational purposes
• However, replacing matrices by their SVD can be quite useful/enlightening for theoretical pursuits
• Moreover, its theoretical underpinnings are often used to devise computational algorithms
• The SVD is unique under certain assumptions, such as all $\sigma_k \geq 0$ and in descending order
• However, one can make both a $\sigma_k$ and its associated column in $U$ negative for an "alternate SVD" (see e.g. "Invertible Finite Elements For Robust Simulation of Large Deformation", Irving et al. 2004)

Page 25

SVD Construction (Important Detail)

• Let $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ so that $A^TA = AA^T = I$, and thus $U = V = \Sigma = I$
• But $A \neq U\Sigma V^T = I$. What's wrong?
• Given a column vector $v_k$ of $V$, $Av_k = U\Sigma V^Tv_k = U\Sigma\hat{e}_k = U\sigma_k\hat{e}_k = \sigma_k u_k$ where $u_k$ is the corresponding column of $U$
• $Av_1 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = u_1$, but $Av_2 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix} \neq \begin{bmatrix} 0 \\ 1 \end{bmatrix} = u_2$
• Since $U$ and $V$ are orthonormal, their columns are unit length
• However, there are still two choices for the direction of each column
• Multiplying $u_2$ by $-1$ to get $u_2 = \begin{bmatrix} 0 \\ -1 \end{bmatrix}$ makes $U = A$, and thus $A = U\Sigma V^T$ as desired
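A NumPy version of the fix above for the same 2×2 example: take V and the singular values from A^T A, then define each u_k as A v_k / sigma_k so the signs come out right.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, -1.0]])

# A^T A = I suggests V = I and singular values [1, 1] ...
V = np.eye(2)
sigma = np.array([1.0, 1.0])

# ... but also taking U = I would give U Sigma V^T = I != A.
# Choosing u_k = A v_k / sigma_k fixes the direction of each column of U.
U = (A @ V) / sigma
print(U)                                           # equals A here
print(np.allclose(U @ np.diag(sigma) @ V.T, A))    # True: a valid SVD of A
```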

Page 26

SVD Construction (Important Detail)

• An orthogonal matrix has determinant equal to $\pm 1$, where $-1$ indicates a reflection of the coordinate system
• If $\det V = -1$, flip the direction of any column to make $\det V = 1$ (so $V$ does not contain a reflection)
• Then, for each $v_k$, compare $Av_k$ to $\sigma_k u_k$ and flip the direction of $u_k$ when necessary in order to make $Av_k = \sigma_k u_k$
• $\det U = \pm 1$, and $U$ may contain a reflection
• When $\det U = -1$, one can flip the sign of the smallest singular value in $\Sigma$ to be negative, whilst also flipping the direction of the corresponding column in $U$ so that $\det U = 1$
• This embeds the reflection into $\Sigma$ and is called the polar-SVD (Irving et al. 2004)

Page 27

Solving Linear Systems

• $Ac = b$ becomes $U\Sigma V^Tc = b$ or $\Sigma(V^Tc) = (U^Tb)$ or $\Sigma\hat{c} = \hat{b}$
• The unknowns $c$ are remapped into the space spanned by $V$, and the right hand side $b$ is remapped into the space spanned by $U$
• Every matrix is a diagonal matrix, when viewed in the right space
• Solve the diagonal system $\Sigma\hat{c} = \hat{b}$ by dividing the entries of $\hat{b}$ by the singular values $\sigma_k$; then, $c = V\hat{c}$
• The SVD transforms the problem into an inherently diagonal space with eigenvectors along the coordinate axes
• Circles becoming ellipses (discussed earlier) is still problematic
• Eccentricity is caused by ratios of singular values (since $U$ and $V$ are orthogonal matrices)
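A minimal NumPy sketch of solving a nonsingular square system through the SVD; the matrix and right hand side are illustrative choices.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

U, s, Vt = np.linalg.svd(A)

b_hat = U.T @ b       # remap the right hand side into the space spanned by U
c_hat = b_hat / s     # solve the diagonal system: divide by the singular values
c = Vt.T @ c_hat      # map back: c = V c_hat

print(c)                                        # [2, 3]
print(np.allclose(c, np.linalg.solve(A, b)))    # True
```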

Page 28

Condition Number

• The condition number of $A$ is $\sigma_{max}/\sigma_{min}$ and measures closeness to being singular
• For a square matrix, it measures the difficulty in solving $Ac = b$
• For a rectangular (and square) matrix, it measures how close the columns are to being linearly dependent
• For a wide (rectangular) matrix, it ignores the extra columns that are guaranteed to be linearly dependent (which is fine, because the associated variables lack any data)
• The condition number does not depend on the right hand side
• The condition number is always at least 1, and approaches $\infty$ for nearly singular matrices
• Singular matrices have condition number equal to $\infty$, since $\sigma_{min} = 0$
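A quick check that NumPy's (2-norm) condition number is the singular value ratio; the matrix is an illustrative choice.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

s = np.linalg.svd(A, compute_uv=False)
print(s.max() / s.min())     # sigma_max / sigma_min
print(np.linalg.cond(A))     # the same value (numpy's default uses the 2-norm)
```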

Page 29

Singular Matrices

• Diagonalize $Ac = b$ to $\Sigma(V^Tc) = (U^Tb)$, e.g. $\begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}\begin{bmatrix} \hat{c}_1 \\ \hat{c}_2 \end{bmatrix} = \begin{bmatrix} \hat{b}_1 \\ \hat{b}_2 \end{bmatrix}$ with $\hat{c}_1 = \hat{b}_1/\sigma_1$, $\hat{c}_2 = \hat{b}_2/\sigma_2$
• Suppose $\sigma_1 \neq 0$ and $\sigma_2 = 0$; then, there is no unique solution:
• When $\hat{b}_2 = 0$, there are infinite solutions for $\hat{c}_2$ (but $\hat{c}_1$ is still uniquely determined)
• When $\hat{b}_2 \neq 0$, there is no solution for $\hat{c}_2$, and $b$ is not in the range of $A$ (but $\hat{c}_1$ is still uniquely determined)
• Consider: $\begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} \hat{c}_1 \\ \hat{c}_2 \end{bmatrix} = \begin{bmatrix} \hat{b}_1 \\ \hat{b}_2 \\ \hat{b}_3 \end{bmatrix}$ with $\hat{c}_1 = \hat{b}_1/\sigma_1$, $\hat{c}_2 = \hat{b}_2/\sigma_2$
• When $\hat{b}_3 = 0$, the last row adds no new information (one has extra redundant data)
• When $\hat{b}_3 \neq 0$, the last row is false and there is no solution (but $\hat{c}_1$ and $\hat{c}_2$ are still uniquely determined)
• Consider: $\begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \end{bmatrix}\begin{bmatrix} \hat{c}_1 \\ \hat{c}_2 \\ \hat{c}_3 \end{bmatrix} = \begin{bmatrix} \hat{b}_1 \\ \hat{b}_2 \end{bmatrix}$ with $\hat{c}_1 = \hat{b}_1/\sigma_1$, $\hat{c}_2 = \hat{b}_2/\sigma_2$
• Infinite solutions work for $\hat{c}_3$ (but $\hat{c}_1$ and $\hat{c}_2$ are still uniquely determined)

Page 30

Understanding Variables

• Consider any column $k$ of $\Sigma$
• When $\sigma_k \neq 0$, one can state a value for $\hat{c}_k$
• When $\sigma_k = 0$ or there is no $\sigma_k$, then there is no information in the data for $\hat{c}_k$
• This does not mean that other parameters cannot be adequately determined!
• Consider a row $i$ of $\Sigma$ that is identically zero
• When $\hat{b}_i = 0$, this row indicates that there is extra redundant data
• When $\hat{b}_i \neq 0$, this row indicates that there is conflicting information in the data
• Conflicting information doesn't necessarily imply that all is lost, i.e. "no solution"; rather, it might merely mean that the data contains a bit of noise
• Regardless, in spite of any conflicting information, the determinable $\hat{c}_k$ represent the "best" that one can do

Page 31

Norms

• Common norms: $\|c\|_1 = \sum_k |c_k|$, $\|c\|_2 = \sqrt{\sum_k c_k^2}$, $\|c\|_\infty = \max_k |c_k|$
• "All norms are interchangeable" is a theoretically valid statement (only)
• In practice, the "worst case scenario" ($L^\infty$) and the "average" ($L^1$, $L^2$, etc.) are not interchangeable
• E.g. (100 people × 98.6° + 1 person × 105°) / (101 people) = 98.66°
• Their average temperature is 98.66°, but not everyone is "ok"
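The three vector norms above, computed with NumPy for an illustrative vector.

```python
import numpy as np

c = np.array([3.0, -4.0, 1.0])
print(np.linalg.norm(c, 1))       # 8.0        = |3| + |-4| + |1|
print(np.linalg.norm(c, 2))       # about 5.10 = sqrt(9 + 16 + 1)
print(np.linalg.norm(c, np.inf))  # 4.0        = max_k |c_k|
```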

Page 32

Matrix Norms

• Define the norm of a matrix: $\|A\| = \max_{c \neq 0} \frac{\|Ac\|}{\|c\|}$, so:
• $\|A\|_1$ is the maximum absolute value column sum
• $\|A\|_\infty$ is the maximum absolute value row sum
• $\|A\|_2$ is the square root of the maximum eigenvalue of $A^TA$, i.e. the maximum singular value of $A$
• The condition number for solving (square matrix) $Ac = b$ is $\|A\|_2\,\|A^{-1}\|_2$
• Since $A^{-1} = V\Sigma^{-1}U^T$, where $\Sigma^{-1}$ has diagonal entries $1/\sigma_k$, $\|A^{-1}\|_2 = 1/\sigma_{min}$
• Thus, $\|A\|_2\,\|A^{-1}\|_2 = \sigma_{max}/\sigma_{min}$