Computazione per l’interazione naturale: Richiami di algebra lineare (3) (e primi esempi di Machine Learning)



Page 1 (boccignone.di.unimi.it/IN_2016_files/LezIN_AlgebraLineare_3.key.pdf)

Corso di Interazione Naturale

Prof. Giuseppe Boccignone

Dipartimento di Informatica, Università di Milano

[email protected] · boccignone.di.unimi.it/IN_2016.html

Computazione per l’interazione naturale: Richiami di algebra lineare (3)

(e primi esempi di Machine Learning)

• A square matrix A is diagonalizable if there exists an invertible matrix Q giving the decomposition A = Q Λ Q^-1, where Λ is diagonal with the eigenvalues of A and the columns of Q are the corresponding eigenvectors

• A real, symmetric square matrix A is always diagonalizable: its eigenvalues are real and its eigenvectors can be chosen mutually orthogonal. Normalizing them and collecting them as the columns of Q makes Q orthonormal, so that A = Q Λ Q^T

Spectral theorem

A bit of basic linear algebra // eigenvectors and eigenvalues
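As a quick numerical check of the spectral theorem, the following minimal Python/NumPy sketch (the small symmetric matrix is an arbitrary illustrative choice, not taken from the slides) builds Q from the normalized eigenvectors returned by numpy.linalg.eigh and verifies that Q is orthonormal and that A = Q Λ Q^T:

    import numpy as np

    # a small real symmetric matrix (illustrative choice)
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

    # eigh is specialized for symmetric matrices: it returns real eigenvalues
    # and orthonormal eigenvectors as the columns of Q
    eigvals, Q = np.linalg.eigh(A)
    Lam = np.diag(eigvals)

    print(np.allclose(Q.T @ Q, np.eye(3)))   # Q is orthonormal: Q^T Q = I
    print(np.allclose(A, Q @ Lam @ Q.T))     # spectral theorem: A = Q Lam Q^T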

Page 2

A bit of basic linear algebra // Singular Value Decomposition (SVD)

What if the matrix is not square?

A bit of basic linear algebra // Singular Value Decomposition (SVD)

• The following theorem holds

[Figure: the SVD factors acting on a vector: A x = (hanger)(stretcher)(aligner) x, i.e. A x = U (Σ (V^T x)) with V^T the aligner, Σ the stretcher and U the hanger]
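The aligner/stretcher/hanger reading can be made concrete with a short Python/NumPy sketch (the 2×3 matrix below is the one used in the worked example later in these notes; the vector x is arbitrary): V^T first aligns x with the right singular directions, Σ then stretches the coordinates by the singular values, and U finally hangs the result onto the left singular directions.

    import numpy as np

    A = np.array([[ 3.0, 1.0, 1.0],
                  [-1.0, 3.0, 1.0]])
    x = np.array([1.0, 2.0, 3.0])

    U, s, Vt = np.linalg.svd(A)          # A = U @ Sigma @ Vt
    Sigma = np.zeros(A.shape)
    Sigma[:len(s), :len(s)] = np.diag(s)

    aligned   = Vt @ x                   # aligner:   rotate x into the right singular basis
    stretched = Sigma @ aligned          # stretcher: scale by the singular values
    hung      = U @ stretched            # hanger:    rotate into the output basis

    print(np.allclose(hung, A @ x))      # same result as applying A directly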

5.2.3 Symmetric

Assume A is symmetric, then

    V V^T = I (i.e. V is orthogonal)                         (260)
    λ_i ∈ R (i.e. λ_i is real)                               (261)
    Tr(A^p) = Σ_i λ_i^p                                      (262)
    eig(I + cA) = 1 + cλ_i                                   (263)
    eig(A − cI) = λ_i − c                                    (264)
    eig(A^-1) = λ_i^-1                                       (265)

For a symmetric, positive matrix A,

    eig(A^T A) = eig(A A^T) = eig(A) ∘ eig(A)                (266)

5.2.4 Characteristic polynomial

The characteristic polynomial for the matrix A is

    0 = det(A − λI)                                          (267)
      = λ^n − g_1 λ^(n−1) + g_2 λ^(n−2) − ... + (−1)^n g_n   (268)

Note that the coefficients g_j for j = 1, ..., n are the n invariants under rotation of A. Thus, g_j is the sum of the determinants of all the sub-matrices of A taken j rows and columns at a time. That is, g_1 is the trace of A, and g_2 is the sum of the determinants of the n(n−1)/2 sub-matrices that can be formed from A by deleting all but two rows and columns, and so on – see [17].

5.3 Singular Value Decomposition

Any n × m matrix A can be written as

    A = U D V^T,                                             (269)

where

    U = eigenvectors of A A^T         (n × n)
    D = sqrt(diag(eig(A A^T)))        (n × m)
    V = eigenvectors of A^T A         (m × m)                (270)

5.3.1 Symmetric Square decomposed into squares

Assume A to be n × n and symmetric. Then

    [A] = [V] [D] [V^T],                                     (271)

where D is diagonal with the eigenvalues of A, and V is orthogonal and contains the eigenvectors of A.

Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30
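Equations (269)-(270) are easy to verify numerically. The sketch below (Python/NumPy; the random 4×3 matrix is just an example) reconstructs A from U D V^T and checks that the squared singular values coincide with the nonzero eigenvalues of A A^T and of A^T A:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 3))                  # any n x m matrix; here n = 4, m = 3

    U, s, Vt = np.linalg.svd(A)                  # s holds the singular values, descending
    D = np.zeros(A.shape)
    D[:len(s), :len(s)] = np.diag(s)
    print(np.allclose(A, U @ D @ Vt))            # eq. (269): A = U D V^T

    # eq. (270): D = sqrt(diag(eig(A A^T))), up to ordering
    w_left  = np.linalg.eigvalsh(A @ A.T)        # eigenvalues of A A^T, ascending
    w_right = np.linalg.eigvalsh(A.T @ A)        # eigenvalues of A^T A, ascending
    print(np.allclose(np.sort(s)**2, w_left[-len(s):]))
    print(np.allclose(np.sort(s)**2, w_right[-len(s):]))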


SVD decomposition


Chapter 12. Latent linear models, p. 392

12.2.3 Singular value decomposition (SVD)

We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However, there is another way to obtain the solution, based on the singular value decomposition, or SVD. This basically generalizes the notion of eigenvectors from square matrices to any kind of matrix.

In particular, any (real) N × D matrix X can be decomposed as follows

    X      =  U      S      V^T                              (12.46)
    (N×D)     (N×N)  (N×D)  (D×D)

where U is an N × N matrix whose columns are orthonormal (so U^T U = I_N), V is a D × D matrix whose rows and columns are orthonormal (so V^T V = V V^T = I_D), and S is an N × D matrix containing the r = min(N, D) singular values σ_i ≥ 0 on the main diagonal, with 0s filling the rest of the matrix. The columns of U are the left singular vectors, and the columns of V are the right singular vectors. See Figure 12.8(a) for an example.

Since there are at most D singular values (assuming N > D), the last N − D columns of U are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoids computing these unnecessary elements. Let us denote this decomposition by USV. If N > D, we have

    X      =  U      S      V^T                              (12.47)
    (N×D)     (N×D)  (D×D)  (D×D)

as in Figure 12.8(a). If N < D, we have

    X      =  U      S      V^T                              (12.48)
    (N×D)     (N×N)  (N×N)  (N×D)

Computing the economy-sized SVD takes O(ND min(N, D)) time (Golub and van Loan 1996, p254).

The connection between eigenvectors and singular vectors is the following. For an arbitrary real matrix X, if X = U S V^T, we have

    X^T X = V S^T U^T U S V^T = V (S^T S) V^T = V D V^T       (12.49)

where D = S^2 is a diagonal matrix containing the squared singular values. Hence

    (X^T X) V = V D                                           (12.50)

so the eigenvectors of X^T X are equal to V, the right singular vectors of X, and the eigenvalues of X^T X are equal to D, the squared singular values. Similarly

    X X^T = U S V^T V S^T U^T = U (S S^T) U^T                 (12.51)
    (X X^T) U = U (S S^T) = U D                               (12.52)

so the eigenvectors of X X^T are equal to U, the left singular vectors of X. Also, the eigenvalues of X X^T are equal to the squared singular values. We can summarize all this as follows:

    U = evec(X X^T),   V = evec(X^T X),   S^2 = eval(X X^T) = eval(X^T X)    (12.53)
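The thin SVD of eq. (12.47) and the summary in eq. (12.53) can be reproduced with a minimal Python/NumPy sketch (a random N × D matrix stands in for the data X; full_matrices=False is NumPy's economy-sized SVD):

    import numpy as np

    rng = np.random.default_rng(1)
    N, D = 100, 5
    X = rng.normal(size=(N, D))

    # economy-sized (thin) SVD, eq. (12.47): U is N x D, S is D x D, V^T is D x D
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    print(U.shape, s.shape, Vt.shape)            # (100, 5) (5,) (5, 5)
    print(np.allclose(X, U @ np.diag(s) @ Vt))

    # eq. (12.53): S^2 = eval(X^T X); the columns of Vt.T are evec(X^T X)
    evals = np.linalg.eigvalsh(X.T @ X)          # ascending order
    print(np.allclose(np.sort(s**2), evals))     # squared singular values = eigenvalues of X^T X
    # (eigenvectors match the right singular vectors up to sign and ordering)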


Page 3


A bit of basic linear algebra // Singular Value Decomposition (SVD)

• Example

Figure 1: Best-fit regression line reduces data from two dimensions into one.

… original data more clearly and orders it from most variation to the least. What makes SVD practical for NLP applications is that you can simply ignore variation below a particular threshold to massively reduce your data but be assured that the main relationships of interest have been preserved.

8.1 Example of Full Singular Value Decomposition

SVD is based on a theorem from linear algebra which says that a rectangular matrix A can be broken down into the product of three matrices: an orthogonal matrix U, a diagonal matrix S, and the transpose of an orthogonal matrix V. The theorem is usually presented something like this:

    A_(m×n) = U_(m×m) S_(m×n) V^T_(n×n)

where U^T U = I and V^T V = I; the columns of U are orthonormal eigenvectors of A A^T, the columns of V are orthonormal eigenvectors of A^T A, and S is a diagonal matrix containing the square roots of the eigenvalues of A A^T (equivalently of A^T A), in descending order.

The following example merely applies this definition to a small matrix in order to compute its SVD. In the next section, I attempt to interpret the application of SVD to document classification.

Start with the matrix

    A = [  3  1  1 ]
        [ -1  3  1 ]

Figure 2: Regression line along second dimension captures less variation in original data.

In order to find U, we have to start with A A^T. The transpose of A is

    A^T = [  3 -1 ]
          [  1  3 ]
          [  1  1 ]

so

    A A^T = [  3  1  1 ] [  3 -1 ]   =  [ 11  1 ]
            [ -1  3  1 ] [  1  3 ]      [  1 11 ]
                         [  1  1 ]

Next, we have to find the eigenvalues and corresponding eigenvectors of A A^T. We know that eigenvectors are defined by the equation A v = λ v, and applying this to A A^T gives us

    [ 11  1 ] [ x1 ]   =  λ [ x1 ]
    [  1 11 ] [ x2 ]        [ x2 ]

We rewrite this as the set of equations

    11 x1 + x2 = λ x1
    x1 + 11 x2 = λ x2

and rearrange to get

    (11 − λ) x1 + x2 = 0


ROUGH DRAFT - BEWARE suggestions [email protected]

Solve for λ by setting the determinant of the coefficient matrix to zero,

\begin{vmatrix} 11 − λ & 1 \\ 1 & 11 − λ \end{vmatrix} = 0

which works out as

(11 − λ)(11 − λ) − 1 · 1 = 0
λ^2 − 22λ + 120 = 0
(λ − 10)(λ − 12) = 0
λ = 10, λ = 12

to give us our two eigenvalues λ = 10 and λ = 12. Plugging λ back into the original equations gives us our eigenvectors. For λ = 10 we get

(11 − 10)x1 + x2 = 0
x1 = −x2

which is true for lots of values, so we'll pick x1 = 1 and x2 = −1 since those are small and easy to work with. Thus we have the eigenvector [1, −1] corresponding to the eigenvalue λ = 10. For λ = 12 we have

(11 − 12)x1 + x2 = 0
x1 = x2

and for the same reason as before we'll take x1 = 1 and x2 = 1. So for λ = 12 we have the eigenvector [1, 1]. These eigenvectors become the column vectors of a matrix, ordered by the size of the corresponding eigenvalue: the eigenvector of the largest eigenvalue is column one, the eigenvector of the next largest eigenvalue is column two, and so on, until the eigenvector of the smallest eigenvalue is the last column. In the matrix below, the eigenvector for λ = 12 is column one and the eigenvector for λ = 10 is column two:

\begin{bmatrix} 1 & 1 \\ 1 & −1 \end{bmatrix}

Finally, we have to convert this matrix into an orthogonal matrix, which we do by applying the Gram-Schmidt orthonormalization process to the column vectors. Begin by normalizing v1:

u1 = v1 / |v1| = [1, 1] / √(1^2 + 1^2) = [1, 1] / √2 = [1/√2, 1/√2]
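The orthonormalization can also be sketched in a few lines of NumPy (a hand-rolled classical Gram-Schmidt written only to mirror the derivation; the helper name gram_schmidt is mine). It produces the full matrix U that the next step derives by hand:

```python
import numpy as np

def gram_schmidt(cols):
    """Orthonormalize the columns of `cols` with classical Gram-Schmidt."""
    basis = []
    for v in cols.T:
        w = v.astype(float)
        for q in basis:
            w = w - (q @ v) * q        # remove the component along each earlier direction
        basis.append(w / np.linalg.norm(w))
    return np.column_stack(basis)

# eigenvectors of A A^T as columns, ordered by decreasing eigenvalue (12, then 10)
E = np.array([[1,  1],
              [1, -1]])
print(gram_schmidt(E))               # [[0.707, 0.707], [0.707, -0.707]], i.e. the matrix U below
```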


A bit of basic linear algebra // Singular Value Decomposition (SVD)

• We orthonormalize the matrix (Gram-Schmidt)


Then compute

w2 = v2 − (u1 · v2) u1 = [1, −1] − ([1/√2, 1/√2] · [1, −1]) [1/√2, 1/√2]
   = [1, −1] − 0 · [1/√2, 1/√2] = [1, −1] − [0, 0] = [1, −1]

and normalize

u2 = w2 / |w2| = [1/√2, −1/√2]

to give

U = \begin{bmatrix} 1/√2 & 1/√2 \\ 1/√2 & −1/√2 \end{bmatrix}

The calculation of V is similar. V is based on A^T A, so we have

A^T A = \begin{bmatrix} 3 & −1 \\ 1 & 3 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 3 & 1 & 1 \\ −1 & 3 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 0 & 2 \\ 0 & 10 & 4 \\ 2 & 4 & 2 \end{bmatrix}

Find the eigenvalues of A^T A from

\begin{bmatrix} 10 & 0 & 2 \\ 0 & 10 & 4 \\ 2 & 4 & 2 \end{bmatrix} \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} = λ \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix}

which represents the system of equations

10x1 + 2x3 = λx1
10x2 + 4x3 = λx2
2x1 + 4x2 + 2x3 = λx3

which we rewrite as

(10 − λ)x1 + 2x3 = 0
(10 − λ)x2 + 4x3 = 0
2x1 + 4x2 + (2 − λ)x3 = 0

and which are solved by setting

\begin{vmatrix} 10 − λ & 0 & 2 \\ 0 & 10 − λ & 4 \\ 2 & 4 & 2 − λ \end{vmatrix} = 0
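The roots of this 3×3 determinant can be checked numerically (a sketch; eigh returns eigenvalues in ascending order, and eigenvector signs are arbitrary):

```python
import numpy as np

A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
AtA = A.T @ A                      # [[10, 0, 2], [0, 10, 4], [2, 4, 2]]

vals, vecs = np.linalg.eigh(AtA)
print(vals)                        # approximately [ 0., 10., 12.]
print(vecs)                        # columns proportional to [1, 2, -5], [2, -1, 0], [1, 2, 1]
```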


A bit of basic linear algebra // Singular Value Decomposition (SVD)

• Same procedure for V (Gram-Schmidt)

Expanding the determinant gives the eigenvalues λ = 12, λ = 10 and λ = 0 of A^T A. Picking an eigenvector for each, in decreasing order of eigenvalue, as the columns of a matrix, we use the Gram-Schmidt orthonormalization process to convert that to an orthonormal matrix:

u1 = v1 / |v1| = [1/√6, 2/√6, 1/√6]

w2 = v2 − (u1 · v2) u1 = [2, −1, 0]

u2 = w2 / |w2| = [2/√5, −1/√5, 0]

w3 = v3 − (u1 · v3) u1 − (u2 · v3) u2 = [−2/3, −4/3, 10/3]

u3 = w3 / |w3| = [1/√30, 2/√30, −5/√30]   (the overall sign of an eigenvector is arbitrary)

All this gives us

V = \begin{bmatrix} 1/√6 & 2/√5 & 1/√30 \\ 2/√6 & −1/√5 & 2/√30 \\ 1/√6 & 0 & −5/√30 \end{bmatrix}

when we really want its transpose

V^T = \begin{bmatrix} 1/√6 & 2/√6 & 1/√6 \\ 2/√5 & −1/√5 & 0 \\ 1/√30 & 2/√30 & −5/√30 \end{bmatrix}

For S we take the square roots of the non-zero eigenvalues and populate the diagonal with them, putting the largest in s11, the next largest in s22 and so on, until the smallest value ends up in smm. The non-zero eigenvalues of AA^T and A^T A are always the same, which is why it does not matter which of the two we take them from. Because we are doing a full SVD, instead of a reduced SVD (next section), we have to add a zero column to S so that the product U S V^T has the proper dimensions. The diagonal entries of S are the singular values of A, the columns of U are called the left singular vectors, and the columns of V are called the right singular vectors.

S = \begin{bmatrix} √12 & 0 & 0 \\ 0 & √10 & 0 \end{bmatrix}

Now we have all the pieces of the puzzle:

A_{mn} = U_{mm} S_{mn} V^T_{nn}
       = \begin{bmatrix} 1/√2 & 1/√2 \\ 1/√2 & −1/√2 \end{bmatrix}
         \begin{bmatrix} √12 & 0 & 0 \\ 0 & √10 & 0 \end{bmatrix}
         \begin{bmatrix} 1/√6 & 2/√6 & 1/√6 \\ 2/√5 & −1/√5 & 0 \\ 1/√30 & 2/√30 & −5/√30 \end{bmatrix}
       = \begin{bmatrix} √12/√2 & √10/√2 & 0 \\ √12/√2 & −√10/√2 & 0 \end{bmatrix}
         \begin{bmatrix} 1/√6 & 2/√6 & 1/√6 \\ 2/√5 & −1/√5 & 0 \\ 1/√30 & 2/√30 & −5/√30 \end{bmatrix}
       = \begin{bmatrix} 3 & 1 & 1 \\ −1 & 3 & 1 \end{bmatrix}
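Assembling the same pieces numerically and comparing with NumPy's built-in SVD is a useful sanity check (a sketch; np.linalg.svd may flip the sign of individual singular vectors, but the singular values and the product must agree):

```python
import numpy as np

U  = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
S  = np.array([[np.sqrt(12), 0, 0],
               [0, np.sqrt(10), 0]])
Vt = np.array([[1/np.sqrt(6),  2/np.sqrt(6),  1/np.sqrt(6)],
               [2/np.sqrt(5), -1/np.sqrt(5),  0.0],
               [1/np.sqrt(30), 2/np.sqrt(30), -5/np.sqrt(30)]])

print(np.round(U @ S @ Vt, 10))              # [[ 3.  1.  1.], [-1.  3.  1.]]

A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
print(np.linalg.svd(A, compute_uv=False))    # [sqrt(12), sqrt(10)] ~ [3.4641, 3.1623]
```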



• Take the square roots of the eigenvalues: these are the singular values on the diagonal of S



A bit of basic linear algebra // Singular Value Decomposition (SVD)

• In general

12.2.3 Singular value decomposition (SVD)   (from Chapter 12, Latent linear models, p. 392)

We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However, there is another way to obtain the solution, based on the singular value decomposition, or SVD. This basically generalizes the notion of eigenvectors from square matrices to any kind of matrix.

In particular, any (real) N × D matrix X can be decomposed as follows

X_{N×D} = U_{N×N} S_{N×D} V^T_{D×D}     (12.46)

where U is an N × N matrix whose columns are orthonormal (so U^T U = I_N), V is a D × D matrix whose rows and columns are orthonormal (so V^T V = V V^T = I_D), and S is an N × D matrix containing the r = min(N, D) singular values σi ≥ 0 on the main diagonal, with 0s filling the rest of the matrix. The columns of U are the left singular vectors, and the columns of V are the right singular vectors.

Since there are at most D singular values (assuming N > D), the last N − D columns of U are irrelevant, since they will be multiplied by 0. The economy-sized SVD, or thin SVD, avoids computing these unnecessary elements. If N > D, we have

X_{N×D} = U_{N×D} S_{D×D} V^T_{D×D}     (12.47)

If N < D, we have

X_{N×D} = U_{N×N} S_{N×N} V^T_{N×D}     (12.48)

Computing the economy-sized SVD takes O(ND min(N, D)) time (Golub and van Loan 1996, p. 254).

The connection between eigenvectors and singular vectors is the following. For an arbitrary real matrix X, if X = U S V^T, we have

X^T X = V S^T U^T U S V^T = V (S^T S) V^T = V D V^T     (12.49)

where D = S^2 is a diagonal matrix containing the squared singular values. Hence

(X^T X) V = V D     (12.50)

so the eigenvectors of X^T X are equal to V, the right singular vectors of X, and the eigenvalues of X^T X are equal to D, the squared singular values. Similarly

X X^T = U S V^T V S^T U^T = U (S S^T) U^T     (12.51)
(X X^T) U = U (S S^T) = U D     (12.52)

so the eigenvectors of X X^T are equal to U, the left singular vectors of X, and the eigenvalues of X X^T are equal to the squared singular values. We can summarize all this as follows:

U = evec(X X^T), V = evec(X^T X), S^2 = eval(X X^T) = eval(X^T X)     (12.53)
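These identities are easy to verify numerically on a random matrix (a minimal sketch; only the r = min(N, D) leading eigenvalues are non-zero, so they are sorted before comparing with the squared singular values):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))              # N = 5, D = 3

U, s, Vt = np.linalg.svd(X)                  # s holds the r = 3 singular values

eig_XtX = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]       # 3 eigenvalues of X^T X
eig_XXt = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1][:3]   # keep the 3 non-zero ones of X X^T

print(np.allclose(s**2, eig_XtX))            # True
print(np.allclose(s**2, eig_XXt))            # True
```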


• The last N − D columns of U are irrelevant (they are multiplied by zeros in S)


Thin SVD o Economy sized

SVD

392 Chapter 12. Latent linear models

12.2.3 Singular value decomposition (SVD)

We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.

In particular, any (real) N × D matrix X can be decomposed as follows

X!"#$N×D

= U!"#$N×N

S!"#$N×D

VT!"#$D×D

(12.46)

where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U

are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have

X!"#$N×D

= U!"#$N×D

S!"#$D×D

VT!"#$D×D

(12.47)

as in Figure 12.8(a). If N < D, we have

X!"#$N×D

= U!"#$N×N

S!"#$N×N

VT!"#$N×D

(12.48)

Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).

The connection between eigenvectors and singular vectors is the following. For an arbitraryreal matrix X, if X = USVT , we have

XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)

where D = S2 is a diagonal matrix containing the squares singular values. Hence

(XTX)V = VD (12.50)

so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly

XXT = USVT VSTUT = U(SST )UT (12.51)

(XXT )U = U(SST ) = UD (12.52)

so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:

U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)

Page 6: Computazione per l’interazione naturale: Richiami di ...boccignone.di.unimi.it/IN_2016_files/LezIN_AlgebraLineare_3.key.pdf · Un po’ di algebra lineare di base //Singular Value

Un po’ di algebra lineare di base //Singular Value Decomposition (SVD)

392 Chapter 12. Latent linear models

12.2.3 Singular value decomposition (SVD)

We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.In particular, any (real) N × D matrix X can be decomposed as follows

X!"#$N×D

= U!"#$N×N

S!"#$N×D

VT!"#$D×D

(12.46)

where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U

are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have

X!"#$N×D

= U!"#$N×D

S!"#$D×D

VT!"#$D×D

(12.47)

as in Figure 12.8(a). If N < D, we have

X!"#$N×D

= U!"#$N×N

S!"#$N×N

VT!"#$N×D

(12.48)

Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).The connection between eigenvectors and singular vectors is the following. For an arbitrary

real matrix X, if X = USVT , we have

XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)

where D = S2 is a diagonal matrix containing the squares singular values. Hence

(XTX)V = VD (12.50)

so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly

XXT = USVT VSTUT = U(SST )UT (12.51)

(XXT )U = U(SST ) = UD (12.52)

so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:

U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)

le ultime N-D colonne di U sono irrilevanti

392 Chapter 12. Latent linear models

12.2.3 Singular value decomposition (SVD)

We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.In particular, any (real) N × D matrix X can be decomposed as follows

X!"#$N×D

= U!"#$N×N

S!"#$N×D

VT!"#$D×D

(12.46)

where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U

are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have

X!"#$N×D

= U!"#$N×D

S!"#$D×D

VT!"#$D×D

(12.47)

as in Figure 12.8(a). If N < D, we have

X!"#$N×D

= U!"#$N×N

S!"#$N×N

VT!"#$N×D

(12.48)

Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).The connection between eigenvectors and singular vectors is the following. For an arbitrary

real matrix X, if X = USVT , we have

XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)

where D = S2 is a diagonal matrix containing the squares singular values. Hence

(XTX)V = VD (12.50)

so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly

XXT = USVT VSTUT = U(SST )UT (12.51)

(XXT )U = U(SST ) = UD (12.52)

so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:

U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)

5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS

5.2.3 Symmetric

Assume A is symmetric, then

VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)

Tr(Ap) =P

i�pi (262)

eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)

eig(A�1) = ��1

i (265)

For a symmetric, positive matrix A,

eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)

5.2.4 Characteristic polynomial

The characteristic polynomial for the matrix A is

0 = det(A� �I) (267)= �n � g

1

�n�1 + g2

�n�2 � ... + (�1)ngn (268)

Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g

1

is the trace of A, and g2

is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].

5.3 Singular Value Decomposition

Any n⇥m matrix A can be written as

A = UDVT , (269)

whereU = eigenvectors of AAT n⇥ n

D =p

diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m

(270)

5.3.1 Symmetric Square decomposed into squares

Assume A to be n⇥ n and symmetric. Then⇥

A⇤

=⇥

V⇤ ⇥

D⇤ ⇥

VT⇤, (271)

where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.

Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30

392 Chapter 12. Latent linear models

12.2.3 Singular value decomposition (SVD)

We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.

In particular, any (real) N × D matrix X can be decomposed as follows

X!"#$N×D

= U!"#$N×N

S!"#$N×D

VT!"#$D×D

(12.46)

where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U

are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have

X!"#$N×D

= U!"#$N×D

S!"#$D×D

VT!"#$D×D

(12.47)

as in Figure 12.8(a). If N < D, we have

X!"#$N×D

= U!"#$N×N

S!"#$N×N

VT!"#$N×D

(12.48)

Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).

The connection between eigenvectors and singular vectors is the following. For an arbitraryreal matrix X, if X = USVT , we have

XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)

where D = S2 is a diagonal matrix containing the squares singular values. Hence

(XTX)V = VD (12.50)

so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly

XXT = USVT VSTUT = U(SST )UT (12.51)

(XXT )U = U(SST ) = UD (12.52)

so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:

U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)

392 Chapter 12. Latent linear models

12.2.3 Singular value decomposition (SVD)

We have defined the solution to PCA in terms of eigenvectors of the covariance matrix. However,there is another way to obtain the solution, based on the singular value decomposition, orSVD. This basically generalizes the notion of eigenvectors from square matrices to any kind ofmatrix.In particular, any (real) N × D matrix X can be decomposed as follows

X!"#$N×D

= U!"#$N×N

S!"#$N×D

VT!"#$D×D

(12.46)

where U is an N × N matrix whose columns are orthornormal (so UTU = IN ), V is D × Dmatrix whose rows and columns are orthonormal (so VTV = VVT = ID ), and S is a N × Dmatrix containing the r = min(N,D) singular values σi ≥ 0 on the main diagonal, with 0sfilling the rest of the matrix. The columns of U are the left singular vectors, and the columnsof V are the right singular vectors. See Figure 12.8(a) for an example.Since there are at most D singular values (assuming N > D), the last N − D columns of U

are irrelevant, since they will be multiplied by 0. The economy sized SVD, or thin SVD, avoidscomputing these unnecessary elements. Let us denote this decomposition by USV. If N > D,we have

X!"#$N×D

= U!"#$N×D

S!"#$D×D

VT!"#$D×D

(12.47)

as in Figure 12.8(a). If N < D, we have

X!"#$N×D

= U!"#$N×N

S!"#$N×N

VT!"#$N×D

(12.48)

Computing the economy-sized SVD takes O(NDmin(N,D)) time (Golub and van Loan 1996,p254).The connection between eigenvectors and singular vectors is the following. For an arbitrary

real matrix X, if X = USVT , we have

XTX = VSTUT USVT = V(STS)VT = VDVT (12.49)

where D = S2 is a diagonal matrix containing the squares singular values. Hence

(XTX)V = VD (12.50)

so the eigenvectors of XTX are equal to V, the right singular vectors of X, and the eigenvaluesof XTX are equal to D, the squared singular values. Similarly

XXT = USVT VSTUT = U(SST )UT (12.51)

(XXT )U = U(SST ) = UD (12.52)

so the eigenvectors of XXT are equal to U, the left singular vectors of X. Also, the eigenvaluesof XXT are equal to the squared singular values. We can summarize all this as follows:

U = evec(XXT ), V = evec(XTX), S2 = eval(XXT ) = eval(XTX) (12.53)

5.3 Singular Value Decomposition5 SOLUTIONS AND DECOMPOSITIONS

5.2.3 Symmetric

Assume A is symmetric, then

VVT = I (i.e. V is orthogonal) (260)�i 2 R (i.e. �i is real) (261)

Tr(Ap) =P

i�pi (262)

eig(I + cA) = 1 + c�i (263)eig(A� cI) = �i � c (264)

eig(A�1) = ��1

i (265)

For a symmetric, positive matrix A,

eig(AT A) = eig(AAT ) = eig(A) � eig(A) (266)

5.2.4 Characteristic polynomial

The characteristic polynomial for the matrix A is

0 = det(A� �I) (267)= �n � g

1

�n�1 + g2

�n�2 � ... + (�1)ngn (268)

Note that the coe�cients gj for j = 1, ..., n are the n invariants under rotationof A. Thus, gj is the sum of the determinants of all the sub-matrices of A takenj rows and columns at a time. That is, g

1

is the trace of A, and g2

is the sumof the determinants of the n(n� 1)/2 sub-matrices that can be formed from Aby deleting all but two rows and columns, and so on – see [17].

5.3 Singular Value Decomposition

Any n⇥m matrix A can be written as

A = UDVT , (269)

whereU = eigenvectors of AAT n⇥ n

D =p

diag(eig(AAT )) n⇥mV = eigenvectors of AT A m⇥m

(270)

5.3.1 Symmetric Square decomposed into squares

Assume A to be n⇥ n and symmetric. Then⇥

A⇤

=⇥

V⇤ ⇥

D⇤ ⇥

VT⇤, (271)

where D is diagonal with the eigenvalues of A, and V is orthogonal and theeigenvectors of A.

Petersen & Pedersen, The Matrix Cookbook, Version: November 14, 2008, Page 30

Thin SVD o Economy sized

SVD


Some basic linear algebra //Truncated Singular Value Decomposition (SVD)


[Figure 12.8 (a) SVD decomposition of non-square matrices X = USVᵀ. The shaded parts of S, and all the off-diagonal terms, are zero. The shaded entries in U and S are not computed in the economy-sized version, since they are not needed. (b) Truncated SVD approximation of rank L.]

Since the eigenvectors are unaffected by linear scaling of a matrix, we see that the right singular vectors of X are equal to the eigenvectors of the empirical covariance Σ. Furthermore, the eigenvalues of Σ are a scaled version of the squared singular values. This means we can perform PCA using just a few lines of code (see pcaPmtk).

However, the connection between PCA and SVD goes deeper. From Equation 12.46, we can represent a rank r matrix as follows:

\[ X = \sigma_1 u_1 v_1^T + \cdots + \sigma_r u_r v_r^T \qquad (12.54) \]

If the singular values die off quickly as in Figure 12.10, we can produce a rank L approximation to the matrix as follows:

\[ X \approx U_{:,1:L}\, S_{1:L,1:L}\, V_{:,1:L}^T \qquad (12.55) \]

This is called a truncated SVD (see Figure 12.8(b)). The total number of parameters needed to represent an N × D matrix using a rank L approximation is

\[ NL + LD + L = L(N + D + 1) \qquad (12.56) \]

It is thus possible to construct an approximation of rank L < r.
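A minimal MATLAB sketch of the truncated SVD in (12.55) and of the parameter count in (12.56); the data matrix here is just a random placeholder:

N = 200; D = 320; L = 20;
X = randn(N, D);                              % placeholder data matrix
[U, S, V] = svd(X, 'econ');
XL = U(:, 1:L) * S(1:L, 1:L) * V(:, 1:L)';    % rank-L approximation, as in (12.55)
numel(X)                                      % N*D numbers in the full matrix
L * (N + D + 1)                               % L(N+D+1) numbers in the rank-L form, as in (12.56)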



Example: dimensionality reduction


% Read the picture of the faces, and convert to black and white.
faces = rgb2gray(imread('faces.png'));
% Downsample, just to avoid dealing with high-res images.
faces = im2double(imresize(faces, 0.5));

% Compute the SVD of the faces image.
[U, D, V] = svd(faces);

% Plot the magnitude of the singular values (log scale).
sigmas = diag(D);
figure; plot(log10(sigmas)); title('Singular Values (Log10 Scale)');
figure; plot(cumsum(sigmas) / sum(sigmas)); title('Cumulative Percent of Total Sigmas');

% Show the full-rank image.
figure;
subplot(4, 2, 1), imshow(faces), title('Full-Rank Faces');

% Compute low-rank approximations of the faces, and show them.
ranks = [100, 50, 30, 20, 10, 3, 2];
for i = 1:length(ranks)
    % Keep the first ranks(i) singular values, and nullify the others.
    approx_sigmas = sigmas;
    approx_sigmas(ranks(i)+1:end) = 0;
    % Form the singular value matrix, padded as necessary.
    ns = length(sigmas);
    approx_S = D;
    approx_S(1:ns, 1:ns) = diag(approx_sigmas);
    % Compute the low-rank approximation by multiplying out the component matrices.
    approx_faces = U * approx_S * V';
    % Plot the approximation.
    subplot(4, 2, i + 1), imshow(approx_faces), title(sprintf('Rank %d Faces', ranks(i)));
end



As an example, consider the 200 × 320 pixel image in Figure 12.9 (top left). This has 64,000 numbers in it. We see that a rank 20 approximation, with only (200 + 320 + 1) × 20 = 10,420 numbers, is a very good approximation.

One can show that the error in this approximation is given by

\[ \|X - X_L\|_F \approx \sigma_{L+1} \qquad (12.57) \]

Furthermore, one can show that the SVD offers the best rank L approximation to a matrix (best in the sense of minimizing the above Frobenius norm).

Let us connect this back to PCA. Let X = USVᵀ be a truncated SVD of X. We know that W = V, and that Z = XW, so

\[ Z = U S V^T V = U S \qquad (12.58) \]

Furthermore, the optimal reconstruction is given by X̂ = ZWᵀ, so we find

\[ \hat{X} = U S V^T \qquad (12.59) \]

This is precisely the same as a truncated SVD approximation! This is another illustration of the fact that PCA is the best low rank approximation to the data.
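The approximation error statement (12.57) can be probed numerically; a sketch on a synthetic matrix with a rapidly decaying spectrum (all names and sizes are illustrative):

rng(0);
N = 100; D = 50; r = min(N, D); L = 5;
[Q1, ~] = qr(randn(N, r), 0);                 % random orthonormal factors
[Q2, ~] = qr(randn(D, r), 0);
s = 2.^(-(0:r-1))';                           % fast-decaying singular values
X = Q1 * diag(s) * Q2';
[U, S, V] = svd(X, 'econ');
XL = U(:, 1:L) * S(1:L, 1:L) * V(:, 1:L)';    % truncated SVD of rank L
norm(X - XL, 'fro')                           % Frobenius error of the approximation
S(L+1, L+1)                                   % sigma_{L+1}: dominates the error when the spectrum decays quickly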

12.2.4 Probabilistic PCA

We are now ready to revisit PPCA. One can show the following remarkable result.

Theorem 12.2.2 ((Tipping and Bishop 1999)). Consider a factor analysis model in which Ψ = σ²I and W is orthogonal. The observed data log likelihood is given by

\[ \log p(X \mid W, \sigma^2) = -\frac{N}{2}\ln|C| - \frac{1}{2}\sum_{i=1}^{N} x_i^T C^{-1} x_i = -\frac{N}{2}\left[\ln|C| + \mathrm{tr}(C^{-1}\hat{\Sigma})\right] \qquad (12.60) \]

where C = WWᵀ + σ²I and Σ̂ = (1/N) Σ_{i=1}^N x_i x_iᵀ = (1/N) XᵀX is the empirical covariance. (We are assuming centered data, for notational simplicity.) The maxima of the log-likelihood are given by

\[ W = V(\Lambda - \sigma^2 I)^{\frac{1}{2}} R \qquad (12.61) \]

where R is an arbitrary L × L orthogonal matrix, V is the D × L matrix whose columns are the first L eigenvectors of Σ̂, and Λ is the corresponding diagonal matrix of eigenvalues. Without loss of generality, we can set R = I. Furthermore, the MLE of the noise variance is given by

\[ \sigma^2 = \frac{1}{D - L}\sum_{j=L+1}^{D} \lambda_j \qquad (12.62) \]

which is the average variance associated with the discarded dimensions.

Thus, as σ² → 0, we have W → V, as in classical PCA. What about Z? It is easy to see that the posterior over the latent factors is given by

\[ p(z_i \mid x_i, \theta) = \mathcal{N}\!\left(z_i \mid F^{-1} W^T x_i,\; \sigma^2 F^{-1}\right) \qquad (12.63) \]
\[ F \triangleq W^T W + \sigma^2 I \qquad (12.64) \]
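A sketch of the closed-form PPCA fit implied by Theorem 12.2.2 (taking R = I); the synthetic data and variable names are only illustrative:

N = 500; D = 10; L = 2;
X = randn(N, L) * randn(L, D) + 0.1 * randn(N, D);   % low-rank signal plus isotropic noise
X = X - mean(X, 1);                                   % center the data
Sigma = (X' * X) / N;                                 % empirical covariance
[V, Lam] = eig(Sigma);
[lam, idx] = sort(diag(Lam), 'descend');              % sort eigenpairs
V = V(:, idx);
sigma2 = mean(lam(L+1:end));                          % (12.62): average discarded variance
W = V(:, 1:L) * sqrt(diag(lam(1:L)) - sigma2 * eye(L));   % (12.61) with R = I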



Some basic linear algebra //Quadratic forms

Important: quadratic forms appear in cost functions and in the Gaussian distribution.

A is positive definite if xᵀAx > 0 for every x ≠ 0.

A is positive semi-definite if xᵀAx ≥ 0 for every x.
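A small MATLAB sketch for checking definiteness on a concrete matrix (the matrix here is just an example):

A = [2 -1; -1 2];                 % a symmetric example matrix
x = randn(2, 1);
x' * A * x                        % the quadratic form x'Ax; positive for this A
eig(A)                            % all eigenvalues > 0  <=>  A is positive definite
[~, p] = chol(A);                 % p == 0 is another positive-definiteness test
% For positive semi-definiteness the eigenvalues are only required to be >= 0.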