Understanding Matrices
Eigensystems
• Eigenvectors - special directions $v_i$ in which a matrix only applies scaling
• Eigenvalues - the amount $\lambda_i$ of that scaling
• Right eigenvectors (or simply eigenvectors) satisfy $Av_i = \lambda_i v_i$
• Eigenvectors represent directions, so $A(\alpha v_i) = \lambda_i \alpha v_i$ is also true for all $\alpha$
• Left eigenvectors satisfy $u_i^T A = \lambda_i u_i^T$ (or $A^T u_i = \lambda_i u_i$)
• Diagonal matrices have eigenvalues on the diagonal, and eigenvectors $\hat{e}_i$:
$$\begin{pmatrix}2&0\\0&3\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = 2\begin{pmatrix}1\\0\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}2&0\\0&3\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix} = 3\begin{pmatrix}0\\1\end{pmatrix}$$
• Upper/lower triangular matrices also have eigenvalues on the diagonal:
$$\begin{pmatrix}2&1\\0&3\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = 2\begin{pmatrix}1\\0\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}2&1\\0&3\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix} = 3\begin{pmatrix}1\\1\end{pmatrix}$$
Complex Numbers
• Complex numbers may appear in both eigenvalues and eigenvectors:
$$\begin{pmatrix}0&1\\-1&0\end{pmatrix}\begin{pmatrix}1\\i\end{pmatrix} = i\begin{pmatrix}1\\i\end{pmatrix}$$
• Recall the complex conjugate: $(a+bi)^* = a-bi$
• Hermitian matrix: $A^{*T} = A$ (often, $A^{*T}$ is written as $A^H$)
• $Av = \lambda v$ implies $(Av)^{*T} = (\lambda v)^{*T}$, i.e. $v^{*T}A = \lambda^* v^{*T}$ (using $A^{*T} = A$)
• Using this, $Av = \lambda v$ implies $v^{*T}Av = v^{*T}\lambda v$, or $\lambda^* v^{*T}v = \lambda v^{*T}v$, or $\lambda^* = \lambda$
• Thus, Hermitian matrices have $\lambda \in \mathbb{R}$ (no complex eigenvalues)
• Symmetric real-valued matrices have real-valued eigenvalues/eigenvectors
• However, complex eigenvectors work too, e.g. $A(\alpha v_i) = \lambda_i \alpha v_i$ with $\alpha$ complex
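A small numpy sketch of both facts, using the rotation-like matrix above and an arbitrary symmetric example:

```python
import numpy as np

# A real matrix can still have complex eigenvalues/eigenvectors...
R = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(np.linalg.eig(R)[0])   # [0.+1.j 0.-1.j]

# ...but a symmetric real matrix has a fully real eigensystem.
S = np.array([[2.0, 1.0], [1.0, 3.0]])
w, V = np.linalg.eigh(S)     # eigh exploits symmetry and returns real output
print(w, V.dtype)            # real eigenvalues, float64 eigenvectors
```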
Spatial Deformation
• Let $x = \sum_i \alpha_i v_i$, so that $Ax = \sum_i \alpha_i A v_i = \sum_i \alpha_i \lambda_i v_i$
• Then, $A$ tilts $x$ away from directions with smaller eigenvalues and towards directions with larger eigenvalues
• Large $\lambda_i$ stretch components in their associated $v_i$ directions
• Small $\lambda_i$ squish components in their associated $v_i$ directions
• Negative $\lambda_i$ flip the sign of components in their associated $v_i$ directions
Spatial Deformation
• Consider every point on the unit circle (green) as a vector $x = \sum_i \alpha_i v_i$, and remap each point via $Ax = \sum_i \alpha_i \lambda_i v_i$
• The remapped shape (blue) is more elliptical than the original circle (green)
• The circle is stretched along the axis with the larger eigenvalue, and compressed along the axis with the smaller eigenvalue
• The larger the ratio of eigenvalues, the more elliptical the circle becomes
• This is true for all circles (representing all points in the plane)
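A minimal sketch of this remapping, assuming a diagonal example matrix so the eigenvector directions are simply the coordinate axes:

```python
import numpy as np

A = np.array([[3.0, 0.0], [0.0, 0.5]])   # eigenvalues 3 and 0.5 along the axes
theta = np.linspace(0.0, 2.0 * np.pi, 400)
circle = np.stack([np.cos(theta), np.sin(theta)])  # unit circle, points as columns
ellipse = A @ circle                               # remapped (elliptical) shape

# The ellipse extends ~3 units along x (stretched) and ~0.5 along y (squished).
print(np.abs(ellipse[0]).max(), np.abs(ellipse[1]).max())
```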
Solving Linear Systems
• Perturb the right hand side from $b$ to $\hat{b}$, and solve $A\hat{x} = \hat{b}$ to find $\hat{x}$
• Note how $x$ and $\hat{x}$ are more separated than $b$ and $\hat{b}$, i.e. the solution is more perturbed than the right hand side is
• Small changes in the right hand side lead to larger changes in the solution
• Small algorithmic errors are also amplified, since they change $A^{-1}b$ to $\widetilde{A^{-1}}b$, which is similar to changing $A^{-1}b$ to $A^{-1}\tilde{b}$
• The amount of amplification is proportional to the ratio of the eigenvalues
Preconditioning
• Suppose $A$ has large eigenvalue ratios that make $Ax = b$ difficult to solve
• Suppose one had an approximate guess for the inverse, i.e. an $\widetilde{A^{-1}} \approx A^{-1}$
• Then, transform $Ax = b$ into $\widetilde{A^{-1}}Ax = \widetilde{A^{-1}}b$, or $\tilde{I}x = \tilde{b}$
• Typically, a bit more involved than this, but conceptually the same
• $\tilde{I}$ is not the identity, so there is still more work to do in order to find $x$
• However, $\tilde{I}$ has eigenvalues with similar magnitudes (clusters work too), making $\tilde{I}x = \tilde{b}$ far easier to solve than a poorly conditioned $Ax = b$
Preconditioning works GREAT!
• It is best to re-scale ellipsoids along eigenvector axes, but scaling along the coordinate axes (diagonal/Jacobi preconditioning) can work well too, as in the sketch below
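A minimal sketch of diagonal (Jacobi) preconditioning; the matrix here is a made-up example with badly scaled rows, and the conditioning is measured with numpy's `cond`:

```python
import numpy as np

# A hypothetical matrix with badly scaled rows (condition number ~1000).
A = np.array([[1000.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0])

# Jacobi preconditioner: approximate A^{-1} by the inverse of A's diagonal.
M_inv = np.diag(1.0 / np.diag(A))
I_tilde = M_inv @ A   # "almost identity" system matrix
b_tilde = M_inv @ b   # transformed right hand side

print(np.linalg.cond(A))        # ~1000
print(np.linalg.cond(I_tilde))  # ~2.6: far easier to solve
```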
Rectangular Matrices (Rank)
• An $m \times n$ rectangular matrix has $m$ rows and $n$ columns
• (Note: these comments also hold for square matrices with $m = n$)
• The columns span a space, and the unknowns are weights on each column (recall $Ax = \sum_i x_i c_i$)
• A matrix with $n$ columns has maximum rank $n$
• The actual rank depends on how many of the columns are linearly independent from one another
• Each column has length $m$ (which is the number of rows)
• Thus, the columns live in an $m$ dimensional space, and at best can span that whole space
• That is, there is a maximum of $m$ independent columns (that could exist)
• Overall, a matrix at most has rank equal to the minimum of $m$ and $n$
• Both considerations are based on looking at the columns (which are scaled by the unknowns)
Rows vs. Columns
• One can find discussions on rows, row spaces, etc. that are used for various purposes
• Although these are fine discussions in regards to matrices/mathematics, they are unnecessary for an intuitive understanding of high dimensional vector spaces (and as such can be ignored)
• The number of columns is identical to the number of variables, which depends on the parameters of the problem
• E.g. the unknown parameters that govern a neural network architecture
• The number of rows depends on the amount of data used, and adding/removing data does not intrinsically affect the nature of the problem
• E.g. it does not change the network architecture, but merely perturbs the ascertained values of the unknown parameters
Singular Value Decomposition (SVD)
• Factorization of any size $m \times n$ matrix: $A = U\Sigma V^T$
• $\Sigma$ is $m \times n$ diagonal with non-negative diagonal entries (called singular values)
• $U$ is $m \times m$ orthogonal, $V$ is $n \times n$ orthogonal (their columns are called singular vectors)
• Orthogonal matrices have orthonormal columns (an orthonormal basis), so their transpose is their inverse. They preserve inner products, and thus are rotations, reflections, and combinations thereof
• If $A$ has complex entries, then $U$ and $V$ are unitary (conjugate transpose is their inverse)
• Introduced and rediscovered many times: Beltrami 1873, Jordan 1875, Sylvester 1889, Autonne 1913, Eckart and Young 1936. Pearson introduced principal component analysis (PCA) in 1901, which uses SVD. Numerical methods by Chan, Businger, Golub, Kahan, etc.
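A quick numpy illustration of the factorization and the shapes involved (a random test matrix; `full_matrices=True` yields the square $U$ and $V$ described above):

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)        # (4, 4) (3,) (3, 3)

Sigma = np.zeros((4, 3))                 # rectangular diagonal Sigma
np.fill_diagonal(Sigma, s)
print(np.allclose(A, U @ Sigma @ Vt))    # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(4)))   # True: U's transpose is its inverse
```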
(Rectangular) Diagonal Matrices
• All off-diagonal entries are 0
• Diagonal entries are $d_{ii}$, and off-diagonal entries are $d_{ij}$ with $i \ne j$
• E.g.
$$\begin{pmatrix}5&0\\0&2\\0&0\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}10\\-1\\\alpha\end{pmatrix}$$
has $5x_1 = 10$ and $2x_2 = -1$, so $x_1 = 2$ and $x_2 = -.5$
• Note that $\alpha \ne 0$ imposes a "no solution" condition (even though $x_1$ and $x_2$ are well-specified)
• E.g.
$$\begin{pmatrix}5&0&0\\0&2&0\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix}10\\-1\end{pmatrix}$$
has $5x_1 = 10$ and $2x_2 = -1$, so $x_1 = 2$ and $x_2 = -.5$
• Note that there are "infinite solutions" for $x_3$ (even though $x_1$ and $x_2$ are well-specified)
• A zero on the diagonal indicates a singular system, which has either no solution (e.g. $0x_1 = 10$) or infinite solutions (e.g. $0x_1 = 0$)
Singular Value Decomposition (SVD)
• $A^TA = V\Sigma^T U^T U\Sigma V^T = V(\Sigma^T\Sigma)V^T$, so $(A^TA)v = \lambda v$ gives $\Sigma^T\Sigma(V^Tv) = \lambda(V^Tv)$
• $\Sigma^T\Sigma$ is $n \times n$ diagonal with eigenvectors $\hat{e}_i$, so $\hat{e}_i = V^Tv$ and $v = V\hat{e}_i$
• That is, the columns of $V$ are the eigenvectors of $A^TA$
• $AA^T = U\Sigma V^T V\Sigma^T U^T = U(\Sigma\Sigma^T)U^T$, so $(AA^T)u = \lambda u$ gives $\Sigma\Sigma^T(U^Tu) = \lambda(U^Tu)$
• $\Sigma\Sigma^T$ is $m \times m$ diagonal with eigenvectors $\hat{e}_i$, so $\hat{e}_i = U^Tu$ and $u = U\hat{e}_i$
• That is, the columns of $U$ are the eigenvectors of $AA^T$
• When $m \ne n$, either $\Sigma^T\Sigma$ or $\Sigma\Sigma^T$ is larger and contains extra zeros on the diagonal
• Their other diagonal entries are the squares of the singular values
• That is, the singular values are the (non-negative) square roots of the non-extra eigenvalues of $A^TA$ and $AA^T$
• Note that both $A^TA$ and $AA^T$ are symmetric positive semi-definite, and thus easy to work with
• E.g. symmetry means their eigensystem (and thus the SVD) has no complex numbers when $A$ doesn't
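A sketch verifying these relationships on a random test matrix:

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of A^T A (symmetric PSD) are the squared singular values...
print(np.sort(np.linalg.eigvalsh(A.T @ A))[::-1])   # descending
print(s**2)                                          # same values

# ...and A A^T has the same nonzero eigenvalues plus one extra 0 (since m > n).
print(np.sort(np.linalg.eigvalsh(A @ A.T))[::-1])
```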
Example (Tall Matrix)
• Consider the size $4 \times 3$ matrix
$$A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\\10&11&12\end{pmatrix}$$
• Label the columns $c_1 = (1,4,7,10)^T$, $c_2 = (2,5,8,11)^T$, $c_3 = (3,6,9,12)^T$
• Since $c_1$ and $c_2$ point in different directions, $A$ is at least rank 2
• $c_3 = 2c_2 - c_1$, so the third column is in the span of the first two columns
• Thus, $A$ is only rank 2 (not rank 3)
Example (SVD)
$$A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\\10&11&12\end{pmatrix} = \underbrace{\begin{pmatrix}.141&.825&-.420&-.351\\.344&.426&.298&.782\\.547&.028&.644&-.509\\.750&-.371&-.542&.079\end{pmatrix}}_{U}\underbrace{\begin{pmatrix}25.5&0&0\\0&1.29&0\\0&0&0\\0&0&0\end{pmatrix}}_{\Sigma}\underbrace{\begin{pmatrix}.504&.574&.644\\-.761&-.057&.646\\.408&-.816&.408\end{pmatrix}}_{V^T}$$
• Singular values are 25.5, 1.29, and 0
• A singular value of 0 indicates that the matrix is rank deficient
• The rank of a matrix is equal to its number of nonzero singular values
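The same numbers fall out of numpy (the third singular value is zero only up to roundoff):

```python
import numpy as np

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
s = np.linalg.svd(A, compute_uv=False)

print(s)                         # ~[25.46, 1.29, ~1e-15]
print(np.linalg.matrix_rank(A))  # 2: counts singular values above a tolerance
```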
Derivation from $A^TA$ and $AA^T$
$$A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\\10&11&12\end{pmatrix} = \underbrace{\begin{pmatrix}.141&.825&-.420&-.351\\.344&.426&.298&.782\\.547&.028&.644&-.509\\.750&-.371&-.542&.079\end{pmatrix}}_{U}\underbrace{\begin{pmatrix}25.5&0&0\\0&1.29&0\\0&0&0\\0&0&0\end{pmatrix}}_{\Sigma}\underbrace{\begin{pmatrix}.504&.574&.644\\-.761&-.057&.646\\.408&-.816&.408\end{pmatrix}}_{V^T}$$
• $A^TA$ is size $3 \times 3$ and has 3 eigenvectors (seen in $V$)
• The square roots of the 3 eigenvalues of $A^TA$ are seen in $\Sigma$ (color coded to the eigenvectors)
• $AA^T$ is size $4 \times 4$ and has 4 eigenvectors (seen in $U$)
• The square roots of 3 of the eigenvalues of $AA^T$ are seen in $\Sigma$
• The 4th eigenvalue of $AA^T$ is an extra eigenvalue of 0
Understanding $Ax$
$$A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\\10&11&12\end{pmatrix} = \underbrace{\begin{pmatrix}.141&.825&-.420&-.351\\.344&.426&.298&.782\\.547&.028&.644&-.509\\.750&-.371&-.542&.079\end{pmatrix}}_{U}\underbrace{\begin{pmatrix}25.5&0&0\\0&1.29&0\\0&0&0\\0&0&0\end{pmatrix}}_{\Sigma}\underbrace{\begin{pmatrix}.504&.574&.644\\-.761&-.057&.646\\.408&-.816&.408\end{pmatrix}}_{V^T}$$
• $A$ maps from $\mathbb{R}^3$ to $\mathbb{R}^4$
• $Ax$ first projects $x \in \mathbb{R}^3$ onto the 3 basis vectors in $V$
• Then, the associated singular values (diagonally) scale the results
• Lastly, those scaled results are used as weights on the basis vectors in $U$
Understanding $Ax$
$$Ax = U\Sigma V^T\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = U\Sigma\begin{pmatrix}v_1^Tx\\v_2^Tx\\v_3^Tx\end{pmatrix} = U\begin{pmatrix}\sigma_1 v_1^Tx\\\sigma_2 v_2^Tx\\\sigma_3 v_3^Tx\\0\end{pmatrix} = u_1\sigma_1 v_1^Tx + u_2\sigma_2 v_2^Tx + u_3\sigma_3 v_3^Tx + u_4\cdot 0$$
(using the numeric $U$, $\Sigma$, and $V^T$ shown above)
• $Ax$ projects $x$ onto the basis vectors in $V$, scales by the associated singular values, and uses those results as weights on the basis vectors in $U$
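A sketch of this expansion, computing $Ax$ as $\sum_i u_i \sigma_i (v_i^T x)$ for the example matrix:

```python
import numpy as np

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
U, s, Vt = np.linalg.svd(A)      # full U is 4x4; s has 3 entries
x = np.array([1.0, -2.0, 0.5])

# Ax as a weighted sum of U's columns: weight i is sigma_i * (v_i . x).
Ax = sum(s[i] * (Vt[i] @ x) * U[:, i] for i in range(len(s)))
print(np.allclose(Ax, A @ x))    # True; u_4's weight is the 0 from Sigma's last row
```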
Extra Dimensions
$$A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\\10&11&12\end{pmatrix} = \underbrace{\begin{pmatrix}.141&.825&-.420&-.351\\.344&.426&.298&.782\\.547&.028&.644&-.509\\.750&-.371&-.542&.079\end{pmatrix}}_{U}\underbrace{\begin{pmatrix}25.5&0&0\\0&1.29&0\\0&0&0\\0&0&0\end{pmatrix}}_{\Sigma}\underbrace{\begin{pmatrix}.504&.574&.644\\-.761&-.057&.646\\.408&-.816&.408\end{pmatrix}}_{V^T}$$
• The 3D space of vector inputs can only span a 3D subspace of $\mathbb{R}^4$
• The last (green) column of $U$ represents the unreachable dimension, orthogonal to the range of $A$, and is always multiplied by 0
• One can delete this column and the associated portion of $\Sigma$ (and still obtain a valid factorization)
Zero Singular Values
$$A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\\10&11&12\end{pmatrix} = \underbrace{\begin{pmatrix}.141&.825&-.420&-.351\\.344&.426&.298&.782\\.547&.028&.644&-.509\\.750&-.371&-.542&.079\end{pmatrix}}_{U}\underbrace{\begin{pmatrix}25.5&0&0\\0&1.29&0\\0&0&0\\0&0&0\end{pmatrix}}_{\Sigma}\underbrace{\begin{pmatrix}.504&.574&.644\\-.761&-.057&.646\\.408&-.816&.408\end{pmatrix}}_{V^T}$$
• The 3rd singular value is 0, so $A$ has a 1D null space that reduces the 3D input vectors to only 2 dimensions
• The associated (pink) terms make no contribution to the final result, and can also be deleted (still obtaining a valid factorization)
• The first 2 columns of $U$ span the 2D subspace of $\mathbb{R}^4$ that comprises the range of $A$
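numpy's reduced SVD (`full_matrices=False`) performs exactly this column-dropping, and truncating further to the nonzero singular values remains exact (a sketch):

```python
import numpy as np

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# full_matrices=False already drops the unreachable 4th column of U.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)   # (4, 3) (3,) (3, 3)

# Dropping the terms for the zero singular value (r = 2) is still exact.
r = 2
print(np.allclose(A, U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]))  # True
```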
Approximating $A$
$$A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\\10&11&12\end{pmatrix} \approx \underbrace{\begin{pmatrix}.141\\.344\\.547\\.750\end{pmatrix}}_{u_1}\underbrace{(25.5)}_{\sigma_1}\underbrace{\begin{pmatrix}.504&.574&.644\end{pmatrix}}_{v_1^T}$$
• The first singular value is much bigger than the second, and so represents the vast majority of what $A$ does (note, the vectors in $U$ and $V$ are unit length)
• Thus, one could approximate $A$ quite well by only using the terms associated with the largest singular value
• This is not a valid factorization, but an approximation (and the idea behind PCA)
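A sketch of this rank-1 approximation; the 2-norm error equals the first discarded singular value:

```python
import numpy as np

A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

A1 = s[0] * np.outer(U[:, 0], Vt[0])   # keep only the largest singular value
print(np.linalg.norm(A - A1, 2))       # ~1.29: the dropped sigma_2
```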
Summary
• The columns of $V$ that do not correspond to "nonzero" singular values form an orthonormal basis for the null space of $A$
• The remaining columns of $V$ form an orthonormal basis for the space perpendicular to the null space of $A$ (parameterizing meaningful inputs)
• The columns of $U$ corresponding to "nonzero" singular values form an orthonormal basis for the range of $A$
• The remaining columns of $U$ form an orthonormal basis for the (unattainable) space perpendicular to the range of $A$
• One can drop the columns of $U$ and $V$ that do not correspond to "nonzero" singular values and still obtain a valid factorization of $A$
• One can drop the columns of $U$ and $V$ that correspond to "small/smaller" singular values and still obtain a reasonable approximation of $A$
Example (Wide Matrix)
$$A = \begin{pmatrix}1&4&7&10\\2&5&8&11\\3&6&9&12\end{pmatrix} = \underbrace{\begin{pmatrix}.504&-.761&.408\\.574&-.057&-.816\\.644&.646&.408\end{pmatrix}}_{U}\underbrace{\begin{pmatrix}25.5&0&0&0\\0&1.29&0&0\\0&0&0&0\end{pmatrix}}_{\Sigma}\underbrace{\begin{pmatrix}.141&.344&.547&.750\\.825&.426&.028&-.371\\-.420&.298&.644&-.542\\-.351&.782&-.509&.079\end{pmatrix}}_{V^T}$$
• $A$ maps from $\mathbb{R}^4$ to $\mathbb{R}^3$ and so has at least a 1D null space (green)
• The 3rd singular value is 0, and the associated (pink) terms make no contribution to the final result
Example (Wide Matrix)
$$A = \begin{pmatrix}1&4&7&10\\2&5&8&11\\3&6&9&12\end{pmatrix} = \underbrace{\begin{pmatrix}.504&-.761&.408\\.574&-.057&-.816\\.644&.646&.408\end{pmatrix}}_{U}\underbrace{\begin{pmatrix}25.5&0&0&0\\0&1.29&0&0\\0&0&0&0\end{pmatrix}}_{\Sigma}\underbrace{\begin{pmatrix}.141&.344&.547&.750\\.825&.426&.028&-.371\\-.420&.298&.644&-.542\\-.351&.782&-.509&.079\end{pmatrix}}_{V^T}$$
• Only a 2D subspace of $\mathbb{R}^4$ matters, with the rest of $\mathbb{R}^4$ in the null space of $A$
• Only a 2D subspace of $\mathbb{R}^3$ is in the range of $A$
Notes
• The SVD is often unwieldy for computational purposes
• However, replacing matrices by their SVD can be quite useful/enlightening for theoretical pursuits
• Moreover, its theoretical underpinnings are often used to devise computational algorithms
• The SVD is unique under certain assumptions, such as all $\sigma_i \ge 0$ and in descending order
• However, one can make both a $\sigma_i$ and its associated column in $U$ negative for an "alternate SVD" (see e.g. "Invertible Finite Elements For Robust Simulation of Large Deformation", Irving et al. 2004)
SVD Construction (Important Detail)
• Let $A = \begin{pmatrix}1&0\\0&-1\end{pmatrix}$ so that $A^TA = AA^T = I$, and thus $U = V = \Sigma = I$
• But $A \ne U\Sigma V^T = I$. What's wrong?
• Given a column vector $v_i$ of $V$, $Av_i = U\Sigma V^Tv_i = U\Sigma\hat{e}_i = U\sigma_i\hat{e}_i = \sigma_i u_i$ where $u_i$ is the corresponding column of $U$
• $Av_1 = \begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix} = u_1$, but $Av_2 = \begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}0\\-1\end{pmatrix} \ne \begin{pmatrix}0\\1\end{pmatrix} = u_2$
• Since $U$ and $V$ are orthonormal, their columns are unit length
• However, there are still two choices for the direction of each column
• Multiplying $u_2$ by $-1$ to get $u_2 = \begin{pmatrix}0\\-1\end{pmatrix}$ makes $U = A$, and thus $A = U\Sigma V^T$ as desired
SVD Construction (Important Detail)
• An orthogonal matrix has determinant equal to $\pm 1$, where $-1$ indicates a reflection of the coordinate system
• If $\det V = -1$, flip the direction of any column to make $\det V = 1$ (so $V$ does not contain a reflection)
• Then, for each $v_i$, compare $Av_i$ to $\sigma_i u_i$ and flip the direction of $u_i$ when necessary in order to make $Av_i = \sigma_i u_i$
• $\det U = \pm 1$ and may contain a reflection
• When $\det U = -1$, one can flip the sign of the smallest singular value in $\Sigma$ to be negative, whilst also flipping the direction of the corresponding column in $U$ so that $\det U = 1$
• This embeds the reflection into $\Sigma$ and is called the polar-SVD (Irving et al. 2004)
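A sketch of the sign-fixing recipe above for a square matrix (conventions vary; this follows the slide's steps on top of numpy's SVD rather than any library routine):

```python
import numpy as np

def signed_svd(A):
    """SVD of a square A with det(V) = +1, fixing each u_i so A v_i = sigma_i u_i."""
    U, s, Vt = np.linalg.svd(A)
    if np.linalg.det(Vt) < 0:        # flip one column of V (one row of V^T)
        Vt[-1] *= -1.0
    # Flip each u_i when needed so that A v_i = sigma_i u_i with the chosen v_i.
    for i in range(len(s)):
        if s[i] > 0 and np.dot(A @ Vt[i], U[:, i]) < 0:
            U[:, i] *= -1.0
    return U, s, Vt

A = np.array([[1.0, 0.0], [0.0, -1.0]])
U, s, Vt = signed_svd(A)
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True; the reflection now sits in U
```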
Solving Linear Systems
• $Ax = b$ becomes $U\Sigma V^Tx = b$, or $\Sigma(V^Tx) = (U^Tb)$, or $\Sigma\hat{x} = \hat{b}$
• The unknowns $x$ are remapped into the space spanned by $V$, and the right hand side $b$ is remapped into the space spanned by $U$
• Every matrix is a diagonal matrix, when viewed in the right space
• Solve the diagonal system $\Sigma\hat{x} = \hat{b}$ by dividing the entries of $\hat{b}$ by the singular values $\sigma_i$; then, $x = V\hat{x}$
• The SVD transforms the problem into an inherently diagonal space with eigenvectors along the coordinate axes
• Circles becoming ellipses (discussed earlier) is still problematic
• Eccentricity is caused by ratios of singular values (since $U$ and $V$ are orthogonal matrices)
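A sketch of this diagonal-space solve:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([5.0, 5.0])

U, s, Vt = np.linalg.svd(A)
b_hat = U.T @ b          # remap the right hand side into U's space
x_hat = b_hat / s        # solve the diagonal system by division
x = Vt.T @ x_hat         # remap the solution back from V's space
print(np.allclose(A @ x, b))  # True
```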
Condition Number
• The condition number of $A$ is $\frac{\sigma_{max}}{\sigma_{min}}$ and measures closeness to being singular
• For a square matrix, it measures the difficulty in solving $Ax = b$
• For a rectangular (and square) matrix, it measures how close the columns are to being linearly dependent
• For a wide (rectangular) matrix, it ignores the extra columns that are guaranteed to be linearly dependent (which is fine, because the associated variables lack any data)
• The condition number does not depend on the right hand side
• The condition number is always at least 1, and approaches $\infty$ for nearly singular matrices
• Singular matrices have condition number equal to $\infty$, since $\sigma_{min} = 0$
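numpy computes this ratio directly:

```python
import numpy as np

# An easy matrix: singular values 3 and 2, condition number 1.5.
print(np.linalg.cond(np.array([[2.0, 0.0], [0.0, 3.0]])))   # 1.5

# The rank-2 example: sigma_min is 0 up to roundoff, so the ratio blows up.
A = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[-1])   # astronomically large: the matrix is (numerically) singular
```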
Singular Matrices
• Diagonalize $Ax = b$ to $\Sigma(V^Tx) = (U^Tb)$, e.g.
$$\begin{pmatrix}\sigma_1&0\\0&\sigma_2\end{pmatrix}\begin{pmatrix}\hat{x}_1\\\hat{x}_2\end{pmatrix} = \begin{pmatrix}\hat{b}_1\\\hat{b}_2\end{pmatrix} \quad\text{with}\quad \hat{x}_1 = \frac{\hat{b}_1}{\sigma_1},\ \hat{x}_2 = \frac{\hat{b}_2}{\sigma_2}$$
• Suppose $\sigma_1 \ne 0$ and $\sigma_2 = 0$; then, there is no unique solution:
• When $\hat{b}_2 = 0$, there are infinite solutions for $\hat{x}_2$ (but $\hat{x}_1$ is still uniquely determined)
• When $\hat{b}_2 \ne 0$, there is no solution for $\hat{x}_2$, and $b$ is not in the range of $A$ (but $\hat{x}_1$ is still uniquely determined)
• Consider:
$$\begin{pmatrix}\sigma_1&0\\0&\sigma_2\\0&0\end{pmatrix}\begin{pmatrix}\hat{x}_1\\\hat{x}_2\end{pmatrix} = \begin{pmatrix}\hat{b}_1\\\hat{b}_2\\\hat{b}_3\end{pmatrix} \quad\text{with}\quad \hat{x}_1 = \frac{\hat{b}_1}{\sigma_1},\ \hat{x}_2 = \frac{\hat{b}_2}{\sigma_2}$$
• When $\hat{b}_3 = 0$, the last row adds no new information (one has extra redundant data)
• When $\hat{b}_3 \ne 0$, the last row is false and there is no solution (but $\hat{x}_1$ and $\hat{x}_2$ are still uniquely determined)
• Consider:
$$\begin{pmatrix}\sigma_1&0&0\\0&\sigma_2&0\end{pmatrix}\begin{pmatrix}\hat{x}_1\\\hat{x}_2\\\hat{x}_3\end{pmatrix} = \begin{pmatrix}\hat{b}_1\\\hat{b}_2\end{pmatrix} \quad\text{with}\quad \hat{x}_1 = \frac{\hat{b}_1}{\sigma_1},\ \hat{x}_2 = \frac{\hat{b}_2}{\sigma_2}$$
• Infinite solutions work for $\hat{x}_3$ (but $\hat{x}_1$ and $\hat{x}_2$ are still uniquely determined)
Understanding Variables
• Consider any column $i$ of $\Sigma$
• When $\sigma_i \ne 0$, one can state a value for $\hat{x}_i$
• When $\sigma_i = 0$ or there is no $\sigma_i$, then there is no information in the data for $\hat{x}_i$
• This does not mean that other parameters cannot be adequately determined!
• Consider a row $i$ of $\Sigma$ that is identically zero
• When $\hat{b}_i = 0$, this row indicates that there is extra redundant data
• When $\hat{b}_i \ne 0$, this row indicates that there is conflicting information in the data
• Conflicting information doesn't necessarily imply that all is lost, i.e. "no solution"; rather, it might merely mean that the data contains a bit of noise
• Regardless, in spite of any conflicting information, the determinable $\hat{x}_i$ represent the "best" that one can do
Norms
• Common norms: $\|x\|_1 = \sum_i |x_i|$, $\|x\|_2 = \sqrt{\sum_i x_i^2}$, $\|x\|_\infty = \max_i |x_i|$
• "All norms are interchangeable" is a theoretically valid statement (only)
• In practice, the "worst case scenario" ($L^\infty$) and the "average" ($L^1$, $L^2$, etc.) are not interchangeable
• E.g. (100 people × 98.6°F + 1 person × 105°F)/(101 people) ≈ 98.66°F
• Their average temperature is 98.66°F, but everyone is not "ok"
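The three norms, and the temperature example, in numpy:

```python
import numpy as np

x = np.array([3.0, -4.0])
print(np.linalg.norm(x, 1))        # 7.0  (sum of absolute values)
print(np.linalg.norm(x, 2))        # 5.0  (Euclidean length)
print(np.linalg.norm(x, np.inf))   # 4.0  (largest absolute value)

# Averages hide worst cases: 100 people at 98.6F plus one at 105F.
temps = np.array([98.6] * 100 + [105.0])
print(temps.mean(), temps.max())   # ~98.66 vs 105: the max is what matters here
```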
Matrix Norms
• Define the norm of a matrix as $\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|}$, so:
• $\|A\|_1$ is the maximum absolute value column sum
• $\|A\|_\infty$ is the maximum absolute value row sum
• $\|A\|_2$ is the square root of the maximum eigenvalue of $A^TA$, i.e. the maximum singular value of $A$
• The condition number for solving (square matrix) $Ax = b$ is $\|A\|_2\|A^{-1}\|_2$
• Since $A^{-1} = V\Sigma^{-1}U^T$ where $\Sigma^{-1}$ has diagonal entries $\frac{1}{\sigma_i}$, $\|A^{-1}\|_2 = \frac{1}{\sigma_{min}}$
• Thus, $\|A\|_2\|A^{-1}\|_2 = \frac{\sigma_{max}}{\sigma_{min}}$
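A sketch verifying the last identity:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
s = np.linalg.svd(A, compute_uv=False)

norm_A = np.linalg.norm(A, 2)                      # sigma_max
norm_A_inv = np.linalg.norm(np.linalg.inv(A), 2)   # 1 / sigma_min
print(norm_A * norm_A_inv)                         # equals sigma_max / sigma_min...
print(s[0] / s[-1], np.linalg.cond(A))             # ...i.e. the condition number
```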