Machine Learning for Signal Processing
Fundamentals of Linear Algebra - 2
Class 3. 8 Sep 2016
Instructor: Bhiksha Raj
Overview
• Vectors and matrices
• Vector spaces
• Basic vector/matrix operations
• Various matrix types
• Projections
• More on matrix types
• Matrix determinants
• Matrix inversion
• Eigenanalysis
• Singular value decomposition
• Matrix calculus
The importance of Bases
• Conventional 3D representation
– Each point (vector) is just a triplet of coordinates
– In reality, the coordinates are weights!
– X = x.u1 + y.u2 + z.u3
– u1 = [1 0 0], u2 = [0 1 0], u3 = [0 0 1]
• Unit vectors in each of the three directions
[Figure: a point (x, y, z) in 3-D space, with unit vectors u1, u2, u3 along the three axes]
The importance of Bases
• Specialty of u1, u2, u3
– Every point in the space can be expressed as some
x.u1 + y.u2 + z.u3
– All three “bases” u1, u2, u3 are required
[Figure: the same point (x, y, z) and the unit vectors u1, u2, u3]

$\mathbf{u}_1 = \begin{bmatrix}1\\0\\0\end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix}0\\1\\0\end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix}0\\0\\1\end{bmatrix}$

$\mathbf{X} = \begin{bmatrix}x\\y\\z\end{bmatrix} = x\begin{bmatrix}1\\0\\0\end{bmatrix} + y\begin{bmatrix}0\\1\\0\end{bmatrix} + z\begin{bmatrix}0\\0\\1\end{bmatrix}$
The importance of Bases
• Is there any other set v1, v2, … vn which shares this property?
– Any point can be expressed as a.v1 + b.v2 + c.v3 …
– How many "v"s will we require?
[Figure: the same point expressed as coordinates (a, b, c) in the basis v1, v2, v3, alongside the standard basis u1, u2, u3]

$\mathbf{X} = a\mathbf{V}_1 + b\mathbf{V}_2 + c\mathbf{V}_3$
Basis based representation
• A “good” basis captures data structure
• Here u1, u2 and u3 all take large values for data in the set
• But in the (v1 v2 v3) set, coordinate values along v3 are always small for data on the blue sheet – v3 likely represents a “noise subspace” for these data
Basis based representation
• The most important challenge in ML: Find the best set of bases for a given data set
[Figure: the same data viewed in the (u1, u2, u3) axes and in the (v1, v2, v3) axes]
Matrix as a Basis transform
• A matrix transforms a representation in terms of the standard basis u1, u2, u3 into a representation in terms of a different basis v1, v2, v3
• Finding the best bases: find the matrix that transforms the standard representation into these bases

$\mathbf{X} = x\mathbf{u}_1 + y\mathbf{u}_2 + z\mathbf{u}_3, \qquad \mathbf{X} = a\mathbf{v}_1 + b\mathbf{v}_2 + c\mathbf{v}_3, \qquad \begin{bmatrix}a\\b\\c\end{bmatrix} = \mathbf{T}\begin{bmatrix}x\\y\\z\end{bmatrix}$
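A minimal NumPy sketch of this change of basis (the specific vectors below are made up for illustration): with the new basis vectors as the columns of a matrix V, the transform T is simply V⁻¹, and applying it yields the coordinates (a, b, c).

```python
import numpy as np

# Point expressed in the standard basis u1, u2, u3
x = np.array([2.0, 1.0, 3.0])

# Columns are an illustrative new basis v1, v2, v3 (linearly independent)
V = np.column_stack([[1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 1.0]])

T = np.linalg.inv(V)      # transform from standard coordinates to v-coordinates
abc = T @ x               # the new coordinates (a, b, c)

# Reconstruct the point as a*v1 + b*v2 + c*v3 and confirm it matches
print(np.allclose(V @ abc, x))   # True
```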
• Going on to more mundane stuff..
Orthogonal/Orthonormal vectors
• Two vectors are orthogonal if they are perpendicular to one another
– A.B = 0
– A vector that is perpendicular to a plane is orthogonal to every vector on the plane
• Two vectors are orthonormal if
– They are orthogonal
– The length of each vector is 1.0
– Orthogonal vectors can be made orthonormal by scaling their lengths to 1.0
[Figure: two orthogonal vectors A = [x y z]ᵀ and B = [u v w]ᵀ]

$\mathbf{A}\cdot\mathbf{B} = xu + yv + zw = 0$
Orthogonal matrices
• Orthogonal Matrix : AAT = ATA = I
– The matrix is square
– All row vectors are orthonormal to one another
  • Every vector is perpendicular to the hyperplane formed by all other vectors
– All column vectors are also orthonormal to one another
– Observation: In an orthogonal matrix if the length of the row vectors is 1.0, the length of the column vectors is also 1.0
– Observation: In an orthogonal matrix no more than one row can have all entries with the same polarity (+ve or –ve)
[Figure: an example 3×3 orthogonal matrix; its three rows are mutually perpendicular (all at 90° to one another)]
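As a quick check of these statements (this example is added here, not from the slides), a 2-D rotation matrix is orthogonal, and it preserves vector lengths:

```python
import numpy as np

theta = np.deg2rad(30.0)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation matrices are orthogonal

print(np.allclose(A @ A.T, np.eye(2)))   # True: A A^T = I
print(np.allclose(A.T @ A, np.eye(2)))   # True: A^T A = I

x = np.array([3.0, 4.0])
print(np.linalg.norm(A @ x))             # 5.0, the same length as x
```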
Orthogonal Matrices
• Orthogonal matrices will retain the length and relative angles between transformed vectors
– Essentially, they are combinations of rotations, reflections and permutations
– Rotation matrices and permutation matrices are all orthogonal
Orthogonal and Orthonormal Matrices
• If the vectors in the matrix are not unit length, it cannot be orthogonal – AAT != I, ATA != I
– AAT = Diagonal or ATA = Diagonal, but not both
– If all the vectors are the same length, we can get AAT = ATA = Diagonal, though
• A non-square matrix cannot be orthogonal – AAT=I or ATA = I, but not both
Matrix Rank and Rank-Deficient Matrices
• Some matrices will eliminate one or more dimensions during transformation
– These are rank deficient matrices
– The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object
[Figure: P × (a 3-D cone) = a flattened, lower-dimensional figure]
Matrix Rank and Rank-Deficient Matrices
• Some matrices will eliminate one or more dimensions during transformation – These are rank deficient matrices
– The rank of the matrix is the dimensionality of the transformed version of a full-dimensional object
[Figure: transformed versions of the cone with Rank = 2 and Rank = 1]
Projections are often examples of rank-deficient transforms
• P = W(WᵀW)⁻¹Wᵀ; Projected Spectrogram = P·M
• The original spectrogram can never be recovered: P is rank deficient
• P explains all vectors in the new spectrogram as a mixture of only the 4 vectors in W
– There are only a maximum of 4 linearly independent bases
• The rank of P is 4
[Figure: spectrogram M and the four note bases W]
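A small NumPy sketch of the rank argument, using random stand-ins for the four note bases W and the spectrogram M (the dimensions here are arbitrary): the projection matrix built from 4 linearly independent bases has rank 4, and so does everything it produces.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 4))           # stand-in for 4 "note" bases of dimension 50
P = W @ np.linalg.inv(W.T @ W) @ W.T       # P = W (W^T W)^-1 W^T

print(np.linalg.matrix_rank(P))            # 4: P is 50x50 but rank deficient

M = rng.standard_normal((50, 20))          # stand-in for a spectrogram with 20 frames
print(np.linalg.matrix_rank(P @ M))        # 4: every projected frame is a mix of the 4 bases
```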
Non-square Matrices
• Non-square matrices add or subtract axes
– More rows than columns add axes
  • But do not increase the dimensionality of the data
– Fewer rows than columns reduce axes
  • May reduce the dimensionality of the data
$\mathbf{X} = \begin{bmatrix} x_1 & x_2 & \cdots & x_N \\ y_1 & y_2 & \cdots & y_N \end{bmatrix}$ (2-D data)

$\mathbf{P}$ = a 3×2 transform

$\mathbf{P}\mathbf{X} = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 & \cdots & \hat{x}_N \\ \hat{y}_1 & \hat{y}_2 & \cdots & \hat{y}_N \\ \hat{z}_1 & \hat{z}_2 & \cdots & \hat{z}_N \end{bmatrix}$ (3-D data, rank 2)
Non-square Matrices
• Non-square matrices add or subtract axes
– More rows than columns add axes
  • But do not increase the dimensionality of the data
– Fewer rows than columns reduce axes
  • May reduce the dimensionality of the data
$\mathbf{X} = \begin{bmatrix} x_1 & x_2 & \cdots & x_N \\ y_1 & y_2 & \cdots & y_N \\ z_1 & z_2 & \cdots & z_N \end{bmatrix}$ (3-D data, rank 3)

$\mathbf{P}$ = a 2×3 transform

$\mathbf{P}\mathbf{X} = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 & \cdots & \hat{x}_N \\ \hat{y}_1 & \hat{y}_2 & \cdots & \hat{y}_N \end{bmatrix}$ (2-D data, rank 2)
The Rank of a Matrix
• The matrix rank is the dimensionality of the transformation of a full-dimensioned object in the original space
• The matrix can never increase dimensions – Cannot convert a circle to a sphere or a line to a circle
• The rank of a matrix can never be greater than the lower of its two dimensions
The Rank of a Matrix
• Projected Spectrogram = P · M
– Every vector in it is a combination of only 4 bases
• The rank of the matrix is the smallest number of bases required to describe the output
– E.g. if note no. 4 in P could be expressed as a combination of notes 1, 2 and 3, it provides no additional information
– Eliminating note no. 4 would give us the same projection
– The rank of P would be 3!
Matrix rank is unchanged by transposition
• If an N-dimensional object is compressed to a K-dimensional object by a matrix, it will also be compressed to a K-dimensional object by the transpose of the matrix
Matrix Determinant
• The determinant is the “volume” of a matrix
• Actually the volume of a parallelepiped formed from its row vectors
– Also the volume of the parallelepiped formed from its column vectors
• Standard formula for the determinant: in the textbook
[Figure: the parallelogram spanned by row vectors r1 and r2, with diagonal r1 + r2]
Matrix Determinant: Another Perspective
• The determinant is the ratio of N-volumes
– If V1 is the volume of an N-dimensional sphere “O” in N-dimensional space
• O is the complete set of points or vertices that specify the object
– If V2 is the volume of the N-dimensional ellipsoid specified by A*O, where A is a matrix that transforms the space
– |A| = V2 / V1
[Figure: a sphere of volume V1 is transformed by A into an ellipsoid of volume V2]
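A tiny NumPy illustration of the volume-ratio view (matrices chosen arbitrarily): a diagonal scaling multiplies every volume by the product of its scale factors, and a rank-deficient transform collapses volumes to zero.

```python
import numpy as np

A = np.diag([2.0, 3.0, 0.5])
print(np.linalg.det(A))          # 3.0: volumes are scaled by 2 * 3 * 0.5

B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])  # third column is zero -> rank 2
print(np.linalg.det(B))          # 0.0: 3-D objects are flattened, so V2 = 0
```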
Matrix Determinants
• Matrix determinants are only defined for square matrices
– They characterize volumes in linearly transformed space of the same dimensionality as the vectors
• Rank deficient matrices have determinant 0
– Since they compress full-volumed N-dimensional objects into zero-volume N-dimensional objects
• E.g. a 3-D sphere into a 2-D ellipse: The ellipse has 0 volume (although it does have area)
• Conversely, all matrices of determinant 0 are rank deficient
– Since they compress full-volumed N-dimensional objects into zero-volume objects
Multiplication properties
• Properties of vector/matrix products
– Associative: $\mathbf{A}(\mathbf{B}\mathbf{C}) = (\mathbf{A}\mathbf{B})\mathbf{C}$
– Distributive: $\mathbf{A}(\mathbf{B}+\mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}$
– NOT commutative!!! $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$
  • left multiplications ≠ right multiplications
– Transposition: $(\mathbf{A}\mathbf{B})^\mathsf{T} = \mathbf{B}^\mathsf{T}\mathbf{A}^\mathsf{T}$
Determinant properties
• Associative for square matrices: $|\mathbf{A}\mathbf{B}\mathbf{C}| = |\mathbf{A}|\,|\mathbf{B}|\,|\mathbf{C}|$
– Scaling volume sequentially by several matrices is equal to scaling once by the product of the matrices
• Volume of sum != sum of volumes: $|\mathbf{A}+\mathbf{B}| \neq |\mathbf{A}| + |\mathbf{B}|$
• Commutative: $|\mathbf{A}\mathbf{B}| = |\mathbf{B}\mathbf{A}| = |\mathbf{A}|\,|\mathbf{B}|$
– The order in which you scale the volume of an object is irrelevant
Matrix Inversion
• A matrix transforms an N-dimensional object to a different N-dimensional object
• What transforms the new object back to the original?
– The inverse transformation
• The inverse transformation is called the matrix inverse
[Figure: a transform T maps the original object to a new object; the inverse transform Q = T⁻¹, whose entries ("???") are yet to be determined, maps it back]
Matrix Inversion
• The product of a matrix and its inverse is the identity matrix
– Transforming an object, and then inverse transforming it gives us back the original object
$\mathbf{T}^{-1}(\mathbf{T}\,\mathbf{D}) = (\mathbf{T}^{-1}\mathbf{T})\,\mathbf{D} = \mathbf{I}\,\mathbf{D} = \mathbf{D}$

$\mathbf{T}\,\mathbf{T}^{-1} = \mathbf{T}^{-1}\,\mathbf{T} = \mathbf{I}$
The Inverse Transform and Simultaneous Equations
• Given the transform T and the transformed vector Y, how do we determine X?
[Figure: T maps X to Y]
Inverse Transform and Simultaneous Equation
• Inverting the transform is identical to solving simultaneous equations
$\begin{bmatrix}a\\b\\c\end{bmatrix} = \mathbf{T}\begin{bmatrix}x\\y\\z\end{bmatrix}, \qquad \mathbf{T} = \begin{bmatrix}T_{11} & T_{12} & T_{13}\\ T_{21} & T_{22} & T_{23}\\ T_{31} & T_{32} & T_{33}\end{bmatrix}$

$a = T_{11}x + T_{12}y + T_{13}z$
$b = T_{21}x + T_{22}y + T_{23}z$
$c = T_{31}x + T_{32}y + T_{33}z$

Given $(a, b, c)$, find $(x, y, z)$.
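A NumPy sketch of this inversion with made-up numbers: recovering (x, y, z) from (a, b, c) is the same as solving the three simultaneous equations, either with the explicit inverse or with a linear solver.

```python
import numpy as np

T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])   # an invertible 3x3 transform (illustrative values)
xyz = np.array([1.0, 2.0, 3.0])
abc = T @ xyz                     # the "observed" transformed vector

print(np.linalg.inv(T) @ abc)     # [1. 2. 3.]
print(np.linalg.solve(T, abc))    # same answer, numerically preferable
```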
Inverting rank-deficient matrices
• Rank-deficient matrices "flatten" objects
– In the process, multiple points in the original object get mapped to the same point in the transformed object
• It is not possible to go "back" from the flattened object to the original object
– Because of the many-to-one forward mapping
• Rank-deficient matrices have no inverse
[Figure: an example rank-deficient 3×3 matrix and the flattened object it produces]
Inverse Transform and Simultaneous Equation
• Inverting the transform is identical to solving simultaneous equations
• Rank-deficient transforms result in too few equations
– Cannot be inverted to obtain a unique solution

$\begin{bmatrix}a\\b\end{bmatrix} = \mathbf{T}\begin{bmatrix}x\\y\\z\end{bmatrix}, \qquad \mathbf{T} = \begin{bmatrix}T_{11} & T_{12} & T_{13}\\ T_{21} & T_{22} & T_{23}\end{bmatrix}$

$a = T_{11}x + T_{12}y + T_{13}z$
$b = T_{21}x + T_{22}y + T_{23}z$

Given $(a, b)$, find $(x, y, z)$: two equations, three unknowns.
Inverse Transform and Simultaneous Equation
• Inverting the transform is identical to solving simultaneous equations
• Rank-deficient transforms result in too-few equations
– Cannot be inverted to obtain a unique solution
• Or too many equations
– Cannot be inverted to obtain an exact solution

$\begin{bmatrix}a\\b\\c\end{bmatrix} = \mathbf{T}\begin{bmatrix}x\\y\end{bmatrix}, \qquad \mathbf{T} = \begin{bmatrix}T_{11} & T_{12}\\ T_{21} & T_{22}\\ T_{31} & T_{32}\end{bmatrix}$

$a = T_{11}x + T_{12}y$
$b = T_{21}x + T_{22}y$
$c = T_{31}x + T_{32}y$

Given $(a, b, c)$, find $(x, y)$: three equations, two unknowns.
Rank Deficient Matrices
The projection matrix is rank deficient
You cannot recover the original spectrogram from the projected one..
Revisiting Projections and Least Squares
• Projection computes a least squared error estimate
• For each vector V in the music spectrogram matrix
– Approximation: Vapprox = a.note1 + b.note2 + c.note3..
– Error vector E = V – Vapprox
– Squared error energy for V: e(V) = ||E||²
• Projection computes Vapprox for all vectors such that the total error is minimized
• But WHAT ARE "a", "b" and "c"?
$\mathbf{V}_{\text{approx}} = \begin{bmatrix}\mathbf{note}_1 & \mathbf{note}_2 & \mathbf{note}_3\end{bmatrix}\begin{bmatrix}a\\b\\c\end{bmatrix} = \mathbf{T}\begin{bmatrix}a\\b\\c\end{bmatrix}$
The Pseudo Inverse (PINV)
• We are approximating spectral vectors V as the transformation of the vector [a b c]T
– Note – we’re viewing the collection of bases in T as a transformation
• The solution is obtained using the pseudo inverse
– This gives us a LEAST SQUARES solution
• If T were square and invertible Pinv(T) = T-1, and V=Vapprox
$\mathbf{V}_{\text{approx}} = \mathbf{T}\begin{bmatrix}a\\b\\c\end{bmatrix}, \qquad \mathbf{V} \approx \mathbf{T}\begin{bmatrix}a\\b\\c\end{bmatrix}, \qquad \begin{bmatrix}a\\b\\c\end{bmatrix} = \operatorname{Pinv}(\mathbf{T})\,\mathbf{V}$
Generalization to matrices
• When T is square and invertible, a unique exact solution exists:
– Left multiplication: $\mathbf{X} = \mathbf{T}\mathbf{Y} \;\Rightarrow\; \mathbf{Y} = \mathbf{T}^{-1}\mathbf{X}$
– Right multiplication: $\mathbf{X} = \mathbf{Y}\mathbf{T} \;\Rightarrow\; \mathbf{Y} = \mathbf{X}\mathbf{T}^{-1}$
• When no unique exact solution exists (T may or may not be square; at least one, if not both, of the forward and backward equations may be inexact):
– Left multiplication: $\mathbf{X} = \mathbf{T}\mathbf{Y} \;\Rightarrow\; \mathbf{Y} = \operatorname{Pinv}(\mathbf{T})\,\mathbf{X}$
– Right multiplication: $\mathbf{X} = \mathbf{Y}\mathbf{T} \;\Rightarrow\; \mathbf{Y} = \mathbf{X}\,\operatorname{Pinv}(\mathbf{T})$
The Pseudo Inverse
• Case 1: Too many solutions
• The pseudo-inverse picks the shortest solution

$\begin{bmatrix}a\\b\end{bmatrix} = \mathbf{T}\begin{bmatrix}x\\y\\z\end{bmatrix}, \qquad a = T_{11}x + T_{12}y + T_{13}z, \quad b = T_{21}x + T_{22}y + T_{23}z$

$\begin{bmatrix}x\\y\\z\end{bmatrix} = \operatorname{Pinv}(\mathbf{T})\begin{bmatrix}a\\b\end{bmatrix}$

[Figure: the set of solutions in (X, Y, Z) space and the shortest solution]
• The figure is only meant for illustration; for the above equations the actual set of solutions is a line, not a plane. Pinv(T)·[a b]ᵀ will be the point on the line closest to the origin.
The Pseudo Inverse
• Case 2: No exact solution
• The pseudo-inverse picks the solution that results in the lowest error

$\begin{bmatrix}a\\b\\c\end{bmatrix} = \mathbf{T}\begin{bmatrix}x\\y\end{bmatrix}, \qquad \mathbf{T} = \begin{bmatrix}T_{11} & T_{12}\\ T_{21} & T_{22}\\ T_{31} & T_{32}\end{bmatrix}$

$a = T_{11}x + T_{12}y, \quad b = T_{21}x + T_{22}y, \quad c = T_{31}x + T_{32}y$

$\begin{bmatrix}x\\y\end{bmatrix} = \operatorname{Pinv}(\mathbf{T})\begin{bmatrix}a\\b\\c\end{bmatrix}$

[Figure: the squared error as a function of the entries of Pinv(T), with the "optimal" Pinv(T) at the minimum]
• The figure is only meant for illustration; for the above equations Pinv(T) will actually have 6 components, and the error is a quadratic in 6 dimensions.
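A NumPy sketch of both cases with made-up matrices: for the underdetermined system the pseudo-inverse returns the exact solution of minimum norm, and for the overdetermined system it returns the least-squares solution.

```python
import numpy as np

# Case 1: 2 equations, 3 unknowns -- infinitely many exact solutions
T1 = np.array([[1.0, 2.0, 0.0],
               [0.0, 1.0, 1.0]])
ab = np.array([3.0, 2.0])
xyz = np.linalg.pinv(T1) @ ab
print(np.allclose(T1 @ xyz, ab))   # True: an exact solution
print(np.linalg.norm(xyz))         # the shortest such solution

# Case 2: 3 equations, 2 unknowns -- generally no exact solution
T2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
abc = np.array([1.0, 2.0, 0.0])
xy = np.linalg.pinv(T2) @ abc      # minimizes the squared error ||abc - T2 xy||^2
print(T2 @ xy)                     # the closest achievable right-hand side
```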
Explaining music with one note
• Recap: P = W(WᵀW)⁻¹Wᵀ; Projected Spectrogram = P·M
• Approximation: M ≈ W·X
• The amount of W in each vector: X = Pinv(W)·M, where Pinv(W) = (WᵀW)⁻¹Wᵀ
• W·Pinv(W)·M = Projected Spectrogram
• W·Pinv(W) = Projection matrix!!
[Figure: spectrogram M and the single-note basis W]
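A quick NumPy confirmation of the recap above, with random stand-ins for W and M: W·Pinv(W) is exactly the projection matrix P = W(WᵀW)⁻¹Wᵀ, and W·X reproduces the projected spectrogram.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((40, 4))              # stand-in for 4 note spectra (40-dim)
M = rng.standard_normal((40, 25))             # stand-in for a 25-frame spectrogram

P = W @ np.linalg.inv(W.T @ W) @ W.T
print(np.allclose(P, W @ np.linalg.pinv(W)))  # True: W Pinv(W) is the projection matrix

X = np.linalg.pinv(W) @ M                     # amount of each note in each frame
print(np.allclose(P @ M, W @ X))              # True: projected spectrogram = W X
```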
Explanation with multiple notes
X = Pinv(W).M; Projected matrix = W.X = W.Pinv(W).M
How about the other way?
• W = M·Pinv(V); U = W·V
[Figure: given the spectrogram M and the matrix V, solve for the unknown bases W (shown as "?"); U = W·V is the reconstruction]
Pseudo-inverse (PINV)
• Pinv() applies to non-square matrices
• Pinv(Pinv(A)) = A
• A.Pinv(A)= projection matrix!
– Projection onto the columns of A
• If A = K x N matrix and K > N, A projects N-D vectors into a higher-dimensional K-D space
– Pinv(A) = NxK matrix
– Pinv(A).A = I in this case
• Otherwise A.Pinv(A) = I
Finding the Transform
• Given examples
– T.X1 = Y1
– T.X2 = Y2
– …
– T.XN = YN
• Find T
Finding the Transform
• Pinv works here too
$\mathbf{X} = [X_1\ X_2\ \cdots\ X_N], \qquad \mathbf{Y} = [Y_1\ Y_2\ \cdots\ Y_N]$ (examples as columns)

$\mathbf{Y} = \mathbf{T}\mathbf{X} \;\Rightarrow\; \mathbf{T} = \mathbf{Y}\,\operatorname{Pinv}(\mathbf{X})$
Finding the Transform: Inexact
• Even works for inexact solutions
• We desire to find a linear transform T that maps X to Y
– But such a linear transform doesn’t really exist
• Pinv will give us the "best guess" for T that minimizes the total squared error between Y and TX

$\mathbf{X} = [X_1\ X_2\ \cdots\ X_N], \qquad \mathbf{Y} = [Y_1\ Y_2\ \cdots\ Y_N]$

$\mathbf{Y} \approx \mathbf{T}\mathbf{X}; \qquad \mathbf{T} = \mathbf{Y}\,\operatorname{Pinv}(\mathbf{X}) \text{ minimizes } \sum_i \|Y_i - \mathbf{T}X_i\|^2$
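A NumPy sketch with synthetic data (all values made up): when Y really equals TX the pseudo-inverse recovers T exactly, and when it does not, T = Y·Pinv(X) is the least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 50))          # 50 example input vectors as columns
T_true = rng.standard_normal((3, 3))

Y = T_true @ X
print(np.allclose(Y @ np.linalg.pinv(X), T_true))   # True: exact recovery

Y_noisy = Y + 0.1 * rng.standard_normal(Y.shape)
T_fit = Y_noisy @ np.linalg.pinv(X)       # minimizes sum_i ||Y_i - T X_i||^2
print(np.linalg.norm(Y_noisy - T_fit @ X))           # small but non-zero residual
```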
Matrix inversion (division)
• The inverse of matrix multiplication – Not element-wise division!!
• Provides a way to “undo” a linear transformation – Inverse of the unit matrix is itself
– Inverse of a diagonal is diagonal
– Inverse of a rotation is a (counter)rotation (its transpose!)
– Inverse of a rank deficient matrix does not exist! • But pseudoinverse exists
• For square matrices: Pay attention to multiplication side!
• If the matrix is not square, or the matrix is not invertible, use a matrix pseudoinverse:

$\mathbf{A}\mathbf{B} = \mathbf{C} \;\Rightarrow\; \mathbf{A} = \mathbf{C}\mathbf{B}^{-1},\quad \mathbf{B} = \mathbf{A}^{-1}\mathbf{C}$ (square, invertible case)
$\mathbf{A}\mathbf{B} = \mathbf{C} \;\Rightarrow\; \mathbf{A} = \mathbf{C}\,\operatorname{Pinv}(\mathbf{B}),\quad \mathbf{B} = \operatorname{Pinv}(\mathbf{A})\,\mathbf{C}$ (general case)
Eigenanalysis
• If something can go through a process mostly unscathed in character it is an eigen-something
– Sound example:
• A vector that can undergo a matrix multiplication and keep pointing the same way is an eigenvector
– Its length can change though
• How much its length changes is expressed by its corresponding eigenvalue
– Each eigenvector of a matrix has its eigenvalue
• Finding these “eigenthings” is called eigenanalysis
EigenVectors and EigenValues
• Vectors that do not change angle upon transformation
– They may change length
– V = eigenvector
– λ = eigenvalue

$\mathbf{M}\mathbf{V} = \lambda\mathbf{V}, \qquad \mathbf{M} = \begin{bmatrix}1.5 & 0.7\\ 0.7 & 1.0\end{bmatrix}$

[Figure: black vectors are eigenvectors]
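A NumPy check of M·V = λ·V for the 2×2 example above (assuming the reconstructed entries 1.5, 0.7, 0.7, 1.0): each eigenvector keeps its direction and is only scaled by its eigenvalue.

```python
import numpy as np

M = np.array([[1.5, 0.7],
              [0.7, 1.0]])
eigvals, eigvecs = np.linalg.eig(M)

for i in range(2):
    v, lam = eigvecs[:, i], eigvals[i]
    print(np.allclose(M @ v, lam * v))   # True: same direction, scaled by lambda
```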
Eigen vector example
Matrix multiplication revisited
• Matrix transformation “transforms” the space
– Warps the paper so that the normals to the two vectors now lie along the axes
A stretching operation
• Draw two lines
• Stretch / shrink the paper along these lines by factors λ1 and λ2
– The factors could be negative – implies flipping the paper
• The result is a transformation of the space
[Figure: stretch factors 1.4 and 0.8 along the two drawn lines]
Physical interpretation of eigen vector
• The result of the stretching is exactly the same as transformation by a matrix
• The axes of stretching/shrinking are the eigenvectors
– The degree of stretching/shrinking are the corresponding eigenvalues
• The EigenVectors and EigenValues convey all the information about the matrix
Physical interpretation of eigen vector
$\mathbf{M}\mathbf{V} = \mathbf{V}\boldsymbol{\Lambda}, \qquad \mathbf{V} = [\mathbf{V}_1\ \mathbf{V}_2], \qquad \boldsymbol{\Lambda} = \begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix}$
Eigen Analysis
• Not all square matrices have nice eigenvalues and eigenvectors
– E.g. consider a rotation matrix
– This rotates every vector in the plane
  • No vector remains unchanged
• In these cases the eigenvectors and eigenvalues are complex

$\mathbf{R}_\theta = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix}x\\y\end{bmatrix}, \qquad \mathbf{X}_{\text{new}} = \begin{bmatrix}x'\\y'\end{bmatrix} = \mathbf{R}_\theta\,\mathbf{X}$
Singular Value Decomposition
• Matrix transformations convert circles to ellipses
• Eigen vectors are vectors that do not change direction in the process
• There is another key feature of the ellipse to the left that carries information about the transform
– Can you identify it?
Singular Value Decomposition
• The major and minor axes of the transformed ellipse define the ellipse
– They are at right angles
• These are transformations of right-angled vectors on the original circle!
Singular Value Decomposition
• U and V are orthonormal matrices
– Columns are orthonormal vectors
• S is a diagonal matrix
• The right singular vectors in V are transformed to the left singular vectors in U
– And scaled by the singular values that are the diagonal entries of S
matlab: [U,S,V] = svd(A)

$\mathbf{A} = \mathbf{U}\,\mathbf{S}\,\mathbf{V}^\mathsf{T}$

[Figure: the unit circle with right singular vectors V1, V2, and the transformed ellipse whose axes are s1·U1 and s2·U2]
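A NumPy counterpart of the MATLAB call above (the example entries are arbitrary): numpy.linalg.svd returns U, the singular values, and Vᵀ, and each right singular vector is mapped to the correspondingly scaled left singular vector.

```python
import numpy as np

A = np.array([[1.0, 0.7],
              [0.5, 1.2]])               # an illustrative 2x2 transform
U, s, Vt = np.linalg.svd(A)

print(np.allclose(A, U @ np.diag(s) @ Vt))           # True: A = U S V^T

for i in range(2):
    print(np.allclose(A @ Vt[i], s[i] * U[:, i]))    # True: A V_i = s_i U_i
```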
Singular Value Decomposition
• The left and right singular vectors are not the same
– If A is not a square matrix, the left and right singular vectors will be of different dimensions
• The singular values are always real
• The largest singular value is the largest amount by which a vector is scaled by A
– Max (|Ax| / |x|) = smax
• The smallest singular value is the smallest amount by which a vector is scaled by A
– Min (|Ax| / |x|) = smin
– This can be 0 (for low-rank or non-square matrices)
The Singular Values
• Square matrices: product of singular values = determinant of the matrix
– This is also the product of the eigen values
– I.e. there are two different sets of axes whose products give you the area of an ellipse
• For any “broad” rectangular matrix A, the largest singular value of any square submatrix B cannot be larger than the largest singular value of A
– An analogous rule applies to the smallest singular value
– This property is utilized in various problems, such as compressive sensing
SVD vs. Eigen Analysis
• Eigen analysis of a matrix A:
– Find two vectors such that their absolute directions are not changed by the transform
• SVD of a matrix A:
– Find two vectors such that the angle between them is not changed by the transform
• For one class of matrices, these two operations are the same
A matrix vs. its transpose
• Multiplication by matrix A:
– Transforms right singular vectors in V to left singular vectors U
• Multiplication by its transpose AT:
– Transforms left singular vectors U to right singular vector V
• A AT : Converts V to U, then brings it back to V
– Result: Only scaling
Symmetric Matrices
• Matrices that do not change on transposition
– Row and column vectors are identical
• The left and right singular vectors are identical: U = V
– A = U S Uᵀ
• They are identical to the eigenvectors of the matrix
• Symmetric matrices do not rotate the space
– Only scaling and, if eigenvalues are negative, reflection
$\begin{bmatrix}1.5 & 0.7\\ 0.7 & 1.0\end{bmatrix}$
Symmetric Matrices
• Matrices that do not change on transposition – Row and column vectors are identical
• Symmetric matrix: Eigen vectors and Eigen values are always real
• Eigen vectors are always orthogonal – At 90 degrees to one another
Symmetric Matrices
• Eigen vectors point in the direction of the major and minor axes of the ellipsoid resulting from the transformation of a spheroid
– The eigen values are the lengths of the axes
Symmetric matrices
• Eigenvectors Vi are orthonormal
– ViᵀVi = 1
– ViᵀVj = 0 for i ≠ j
• Listing all eigenvectors in matrix form V:
– Vᵀ = V⁻¹
– VᵀV = I
– V Vᵀ = I
• M Vi = λi Vi
• In matrix form: M V = V Λ
– Λ is a diagonal matrix of all the eigenvalues
• M = V Λ Vᵀ
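A NumPy sketch for the symmetric case (using the same 2×2 example as before): numpy.linalg.eigh returns real eigenvalues and orthonormal eigenvectors, and M = V Λ Vᵀ.

```python
import numpy as np

M = np.array([[1.5, 0.7],
              [0.7, 1.0]])
eigvals, V = np.linalg.eigh(M)          # eigh is meant for symmetric matrices
L = np.diag(eigvals)

print(np.allclose(V.T @ V, np.eye(2)))  # True: eigenvectors are orthonormal
print(np.allclose(M, V @ L @ V.T))      # True: M = V Lambda V^T
```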
Square root of a symmetric matrix
$\mathbf{C} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^\mathsf{T}$

$\operatorname{Sqrt}(\mathbf{C}) = \mathbf{V}\cdot\operatorname{Sqrt}(\boldsymbol{\Lambda})\cdot\mathbf{V}^\mathsf{T}$

$\operatorname{Sqrt}(\mathbf{C})\cdot\operatorname{Sqrt}(\mathbf{C}) = \mathbf{V}\operatorname{Sqrt}(\boldsymbol{\Lambda})\mathbf{V}^\mathsf{T}\,\mathbf{V}\operatorname{Sqrt}(\boldsymbol{\Lambda})\mathbf{V}^\mathsf{T} = \mathbf{V}\operatorname{Sqrt}(\boldsymbol{\Lambda})\operatorname{Sqrt}(\boldsymbol{\Lambda})\mathbf{V}^\mathsf{T} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^\mathsf{T} = \mathbf{C}$
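Continuing the same construction in NumPy (the symmetric, positive-definite example matrix is reused purely for illustration): building Sqrt(C) = V·Sqrt(Λ)·Vᵀ and multiplying it by itself recovers C.

```python
import numpy as np

C = np.array([[1.5, 0.7],
              [0.7, 1.0]])                       # symmetric and positive definite
eigvals, V = np.linalg.eigh(C)
sqrtC = V @ np.diag(np.sqrt(eigvals)) @ V.T      # Sqrt(C) = V Sqrt(Lambda) V^T

print(np.allclose(sqrtC @ sqrtC, C))             # True: Sqrt(C) Sqrt(C) = C
```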
Definiteness..
• SVD: Singular values are always positive!
• Eigen Analysis: Eigen values can be real or imaginary
– Real, positive Eigen values represent stretching of the space along the Eigen vector
– Real, negative Eigen values represent stretching and reflection (across origin) of Eigen vector
– Complex Eigen values occur in conjugate pairs
• A square (symmetric) matrix is positive definite if all Eigen values are real and positive, and are greater than 0
– Transformation can be explained as stretching and rotation
– If any Eigen value is zero, the matrix is positive semi-definite
Positive Definiteness..
• Property of a positive definite matrix: defines inner product norms
– xᵀAx is always positive for any vector x if A is positive definite
• Positive definiteness is a test for validity of Gram matrices
– Such as correlation and covariance matrices
– We will encounter these and other Gram matrices later
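A quick NumPy spot-check of both statements (example matrix as above, test vectors random): the eigenvalues of a symmetric positive-definite matrix are all positive, and xᵀAx comes out positive for every nonzero x tried.

```python
import numpy as np

A = np.array([[1.5, 0.7],
              [0.7, 1.0]])
print(np.all(np.linalg.eigvalsh(A) > 0))   # True: A is positive definite

rng = np.random.default_rng(3)
for _ in range(5):
    x = rng.standard_normal(2)
    print(x @ A @ x > 0)                   # True for each random nonzero x
```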
SVD on data-container matrices
• We can also perform SVD on matrices that are data containers
• X is a d x N rectangular matrix
– N vectors of dimension d
• U is an orthogonal matrix of d vectors of size d
– All vectors are length 1
• V is an orthogonal matrix of N vectors of size N
• S is a d x N diagonal matrix with non-zero entries only on the diagonal

$\mathbf{X} = [X_1\ X_2\ \cdots\ X_N], \qquad \mathbf{X} = \mathbf{U}\,\mathbf{S}\,\mathbf{V}^\mathsf{T}$
SVD on data-container matrices
$\mathbf{X} = [X_1\ X_2\ \cdots\ X_N] = \mathbf{U}\,\mathbf{S}\,\mathbf{V}^\mathsf{T}$

[Figure: X decomposed into U, the diagonal S (padded with zeros), and Vᵀ]
|Ui| = 1.0 for every vector in U
|Vi| = 1.0 for every vector in V
SVD on data-container matrices
$\mathbf{X} = \mathbf{U}\,\mathbf{S}\,\mathbf{V}^\mathsf{T} = \sum_i s_i\,U_i\,V_i^\mathsf{T}$

[Figure: the decomposition written as a sum of rank-1 terms $s_i U_i V_i^\mathsf{T}$]
Expanding the SVD
• Each left singular vector and the corresponding right singular vector contribute one "basic" component to the data
• The "magnitude" of its contribution is the corresponding singular value

$\mathbf{X} = [X_1\ X_2\ \cdots\ X_N] = \mathbf{U}\,\mathbf{S}\,\mathbf{V}^\mathsf{T}$
$\mathbf{X} = s_1U_1V_1^\mathsf{T} + s_2U_2V_2^\mathsf{T} + s_3U_3V_3^\mathsf{T} + s_4U_4V_4^\mathsf{T} + \cdots$
Expanding the SVD
• Each left singular vector and the corresponding right singular vector contribute one "basic" component to the data
• The "magnitude" of its contribution is the corresponding singular value
• Low singular-value components contribute little, if anything
– Carry little information
– Are often just "noise" in the data

$\mathbf{X} = s_1U_1V_1^\mathsf{T} + s_2U_2V_2^\mathsf{T} + s_3U_3V_3^\mathsf{T} + s_4U_4V_4^\mathsf{T} + \cdots$
Expanding the SVD
• Low singular-value components contribute little, if anything
– Carry little information
– Are often just “noise” in the data
• Data can be recomposed using only the “major” components with minimal change of value
– Minimum squared error between original data and recomposed data
– Sometimes eliminating the low-singular-value components will, in fact “clean” the data
$\mathbf{X} = s_1U_1V_1^\mathsf{T} + s_2U_2V_2^\mathsf{T} + \cdots \;\approx\; s_1U_1V_1^\mathsf{T} + s_2U_2V_2^\mathsf{T}$
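A NumPy sketch of keeping only the major components, on synthetic low-rank-plus-noise data (all numbers made up): truncating the sum after the largest singular values reproduces the data with small squared error.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 100))   # rank-2 structure
X += 0.05 * rng.standard_normal(X.shape)                           # plus a little noise

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # keep the k largest components

print(s[:4])                                         # two large values, the rest near zero
print(np.linalg.norm(X - X_k) / np.linalg.norm(X))   # small relative error
```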
An audio example
• The spectrogram has 974 vectors of dimension 1025
– A 1024x974 matrix!
• Decompose: M = USVT = Si siUi ViT
• U is 1024 x 1024
• V is 974 x 974
• There are 974 non-zero singular values Si
Singular Values
• Singular values for spectrogram M
– Most singular values are close to zero
– The corresponding components are "unimportant"
An audio example
• The same spectrogram constructed from only the 25 highest singular-value components
– Looks similar
  • With 100 components, it would be indistinguishable from the original
– Sounds pretty close
– Background "cleaned up"
With only 5 components
• The same spectrogram constructed from only the 5 highest-valued components
– Corresponding to the 5 largest singular values
– Highly recognizable
– Suggests that there are actually only 5 significant unique note combinations in the music
Trace
• The trace of a matrix is the sum of the diagonal entries
• It is equal to the sum of the Eigen values!
$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ a_{31} & a_{32} & a_{33} & a_{34}\\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$

$\operatorname{Tr}(\mathbf{A}) = a_{11} + a_{22} + a_{33} + a_{44}$

$\operatorname{Tr}(\mathbf{A}) = \sum_i a_{ii} = \sum_i \lambda_i$
Trace
• Often appears in Error formulae
• Useful to know some properties..
Given two matrices C and D of the same size:

$\mathbf{E} = \mathbf{C} - \mathbf{D}$

$\text{error} = \sum_{i,j} E_{i,j}^2 = \operatorname{Tr}(\mathbf{E}\,\mathbf{E}^\mathsf{T})$
Properties of a Trace
• Linearity: Tr(A+B) = Tr(A) + Tr(B)
Tr(c.A) = c.Tr(A)
• Cyclic invariance:
– Tr(ABCD) = Tr(DABC) = Tr(CDAB) = Tr(BCDA)
– Tr(AB) = Tr(BA)
• Frobenius norm F(A) = $\sum_{i,j} a_{ij}^2$ = Tr(AAᵀ)
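A NumPy spot-check of these properties with random matrices (purely illustrative): linearity, cyclic invariance, and the squared Frobenius norm written as a trace.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))   # linearity
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))             # cyclic invariance
print(np.isclose(np.sum(A**2), np.trace(A @ A.T)))              # sum_ij a_ij^2 = Tr(A A^T)
```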
Decompositions of matrices
• Square A: LU decomposition
– Decompose A = L U
– L is a lower triangular matrix
  • All elements above the diagonal are 0
– U is an upper triangular matrix
  • All elements below the diagonal are zero
– Cholesky decomposition: A is symmetric, L = Uᵀ
• QR decomposition: A = QR
– Q is orthogonal: QQᵀ = I
– R is upper triangular
• Generally used as tools to compute Eigen decompositions or least squares solutions
Calculus of Matrices
• Derivative of scalar w.r.t. vector
• For any scalar z that is a function of a vector x
• The dimensions of dz / dx are the same as the dimensions of x
$\mathbf{x} = \begin{bmatrix} x_1\\ \vdots\\ x_N \end{bmatrix}$ (an N × 1 vector), $\qquad \frac{dz}{d\mathbf{x}} = \begin{bmatrix} \dfrac{dz}{dx_1}\\ \vdots\\ \dfrac{dz}{dx_N} \end{bmatrix}$ (also an N × 1 vector)
Calculus of Matrices
• Derivative of scalar w.r.t. matrix
• For any scalar z that is a function of a matrix X
• The dimensions of dz / dX are the same as the dimensions of X
$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{bmatrix}, \qquad \frac{dz}{d\mathbf{X}} = \begin{bmatrix} \dfrac{dz}{dx_{11}} & \dfrac{dz}{dx_{12}} & \dfrac{dz}{dx_{13}}\\ \dfrac{dz}{dx_{21}} & \dfrac{dz}{dx_{22}} & \dfrac{dz}{dx_{23}} \end{bmatrix}$ (both are N × M matrices)
Calculus of Matrices
• Derivative of vector w.r.t. vector
• For any Mx1 vector y that is a function of an Nx1 vector x
• dy / dx is an MxN matrix
$\mathbf{x} = \begin{bmatrix} x_1\\ \vdots\\ x_N \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1\\ \vdots\\ y_M \end{bmatrix}, \qquad \frac{d\mathbf{y}}{d\mathbf{x}} = \begin{bmatrix} \dfrac{dy_1}{dx_1} & \cdots & \dfrac{dy_1}{dx_N}\\ \vdots & \ddots & \vdots\\ \dfrac{dy_M}{dx_1} & \cdots & \dfrac{dy_M}{dx_N} \end{bmatrix}$ (an M × N matrix)
Calculus of Matrices
• Derivative of vector w.r.t. matrix
• For any M×1 vector y that is a function of an N×L matrix X
• dy / dX is an M×L×N tensor (note order)

$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1\\ \vdots\\ y_M \end{bmatrix}$

$\dfrac{d\mathbf{y}}{d\mathbf{X}}$ is an M × 3 × 2 tensor whose $(i,j,k)$-th element is $\dfrac{dy_i}{dx_{k,j}}$
Calculus of Matrices
• Derivative of matrix w.r.t. matrix
• For any M×K matrix Y that is a function of an N×L matrix X
• dY / dX is an M×K×L×N tensor (note order)

$\mathbf{Y} = \begin{bmatrix} y_{11} & y_{12} & y_{13}\\ y_{21} & y_{22} & y_{23} \end{bmatrix}$; each element of the tensor is a derivative of the form $\dfrac{dy_{i,j}}{dx_{k,l}}$
In general
• The derivative of an N1 × N2 × N3 × … tensor w.r.t. an M1 × M2 × M3 × … tensor
• Is an N1 × N2 × N3 × … × ML × ML-1 × … × M1 tensor
Compound Formulae
• Let Y = f ( g ( h ( X ) ) )
• Chain rule (note order of multiplication)
• The # represents a transposition operation
– That is appropriate for the tensor
$\mathbf{Y} = f(g(h(\mathbf{X})))$

$\frac{d\mathbf{Y}}{d\mathbf{X}} = \frac{df(g(h(\mathbf{X})))}{dg(h(\mathbf{X}))} \;\#\; \frac{dg(h(\mathbf{X}))}{dh(\mathbf{X})} \;\#\; \frac{dh(\mathbf{X})}{d\mathbf{X}}$
Example
• y is N x 1
• x is M x 1
• A is N x M
• Compute dz/dA
– On board
$z = \|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2$
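One way to carry out the board computation, written out here for reference (this derivation is added, not from the slides), using the convention above that $dz/d\mathbf{A}$ has the same shape as $\mathbf{A}$:

$z = \|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 = (\mathbf{y} - \mathbf{A}\mathbf{x})^\mathsf{T}(\mathbf{y} - \mathbf{A}\mathbf{x})$

Let $\mathbf{e} = \mathbf{y} - \mathbf{A}\mathbf{x}$. Only the $i$-th component of $\mathbf{A}\mathbf{x}$ depends on $A_{ij}$, with $\dfrac{d(\mathbf{A}\mathbf{x})_i}{dA_{ij}} = x_j$, so

$\frac{dz}{dA_{ij}} = 2\,\mathbf{e}^\mathsf{T}\frac{d\mathbf{e}}{dA_{ij}} = -2\,e_i\,x_j
\qquad\Longrightarrow\qquad
\frac{dz}{d\mathbf{A}} = -2\,(\mathbf{y} - \mathbf{A}\mathbf{x})\,\mathbf{x}^\mathsf{T}$

which is an N × M matrix, the same shape as $\mathbf{A}$.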