Lecture 9: Linear algebra
DS GA 1002 Statistical and Mathematical Models
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall15
Carlos Fernandez-Granda
11/16/2015
Projections
Matrices
Eigendecomposition
Principal Component Analysis
Orthogonal projection
The orthogonal projection of x onto a subspace S is a vector P_S x such that x − P_S x ∈ S⊥

For any orthonormal basis b1, . . . , bm of S,

P_S x = ∑_{i=1}^{m} 〈x, bi〉 bi

The projection of x onto the span of any vector v is

P_S x = 〈x, v / ||v||_{〈·,·〉}〉 v / ||v||_{〈·,·〉}
Lemma: The orthogonal projection is unique
Orthogonal projection
P_S x is the vector in S that is closest to x

It is the solution to the optimization problem

minimize_u   ||x − u||_{〈·,·〉}
subject to   u ∈ S
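Both properties can be checked numerically; a minimal sketch with a hypothetical x and an orthonormal basis of the xy-plane in R^3:

```python
import numpy as np

# Hypothetical example: project x onto the subspace S spanned by
# the orthonormal vectors b1 and b2 (here, the xy-plane in R^3).
x = np.array([3.0, 1.0, 2.0])
b1 = np.array([1.0, 0.0, 0.0])
b2 = np.array([0.0, 1.0, 0.0])

# P_S x = <x, b1> b1 + <x, b2> b2
proj = (x @ b1) * b1 + (x @ b2) * b2

# The residual x - P_S x is orthogonal to S, and proj is the point of S
# closest to x
residual = x - proj
```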
Linear minimum-MSE estimation
Aim: Estimate X from Y using a linear estimator
Assumption: We know the means µX, µY, the variances σ²_X, σ²_Y, and the correlation coefficient ρXY

The best linear estimate of X given Y in terms of MSE is

g_LMMSE(y) = ρXY σX (y − µY) / σY + µX
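A small simulation sketch of this formula; the joint distribution and sample size below are made up for illustration, with the moments estimated from samples:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical model: Y is a noisy observation of X.
x = rng.normal(2.0, 1.0, 100_000)
y = x + rng.normal(0.0, 0.5, 100_000)

mu_x, mu_y = x.mean(), y.mean()
s_x, s_y = x.std(), y.std()
rho = np.corrcoef(x, y)[0, 1]

def g_lmmse(v):
    # g_LMMSE(y) = rho_XY * sigma_X * (y - mu_Y) / sigma_Y + mu_X
    return rho * s_x * (v - mu_y) / s_y + mu_x

mse_lmmse = np.mean((g_lmmse(y) - x) ** 2)
mse_mean = np.mean((mu_x - x) ** 2)  # baseline that ignores Y
```

The linear estimator should beat the constant estimate µX whenever X and Y are correlated.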
Matrices
A matrix A ∈ R^{m×n} is a rectangular array of numbers

A =
  [ A11 A12 · · · A1n ]
  [ A21 A22 · · · A2n ]
  [       · · ·       ]
  [ Am1 Am2 · · · Amn ]
Notation:
- Ai: is the ith row of A
- A:j is the jth column of A

The transpose A^T ∈ R^{n×m} of A satisfies (A^T)_ij = A_ji
Matrix-vector multiplication
The product of a matrix A ∈ Rm×n and a vector x ∈ Rn equals
(Ax)_i = ∑_{j=1}^{n} A_ij x(j),   1 ≤ i ≤ m
Matrix Vector Multiplication

Ax =
  [ A11 A12 . . . A1n ] [ x1 ]   [ v1 ]
  [ A21 A22 . . . A2n ] [ x2 ] = [ v2 ]
  [       . . .       ] [ .. ]   [ .. ]
  [ Am1 Am2 . . . Amn ] [ xn ]   [ vm ]
Matrix-vector multiplication
Row interpretation:

Ax =
  [ 〈A1:, x〉 ]
  [ 〈A2:, x〉 ]
  [   · · ·   ]
  [ 〈Am:, x〉 ]

Column interpretation:

Ax = ∑_{j=1}^{n} x(j) A:j
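Both interpretations can be verified numerically; a small sketch with a made-up A and x:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])

# Row interpretation: the ith entry of Ax is the inner product <A_i:, x>
rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column interpretation: Ax is a linear combination of the columns of A,
# weighted by the entries of x
cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))
```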
Dot product between x and y
x · y = xT y = yT x
Identity matrix
I =
  [ 1 0 · · · 0 ]
  [ 0 1 · · · 0 ]
  [    · · ·    ]
  [ 0 0 · · · 1 ]
It maps any vector to itself: for all x ∈ Rn,
Ix = x
Matrix product
The product of A ∈ Rm×n and B ∈ Rn×p is a matrix AB ∈ Rm×p,
(AB)_ij = ∑_{k=1}^{n} A_ik B_kj = 〈Ai:, B:j〉
Matrix product
AB =
  [ A11 . . . A1n ] [ B11 . . . B1p ]   [ (AB)11 . . . (AB)1p ]
  [ A21 . . . A2n ] [ B21 . . . B2p ] = [ (AB)21 . . . (AB)2p ]
  [     . . .     ] [     . . .     ]   [        . . .        ]
  [ Am1 . . . Amn ] [ Bn1 . . . Bnp ]   [ (AB)m1 . . . (AB)mp ]
Matrix product
Column interpretation:

AB = [ AB:1  AB:2  · · ·  AB:p ]

The inverse A^{-1} of a square matrix A satisfies
AA−1 = A−1A = I
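A quick numerical check of the column interpretation and of the inverse identity, on randomly generated matrices (the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))  # random square matrix, invertible almost surely
B = rng.normal(size=(3, 4))

# Column interpretation: the jth column of AB is A times the jth column of B
AB = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# Inverse of a nonsingular square matrix: A A^{-1} = A^{-1} A = I
A_inv = np.linalg.inv(A)
```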
Orthogonal matrix
An orthogonal matrix is a square matrix such that
UTU = UUT = I
The columns U:1,U:2, . . . ,U:n form an orthonormal basis
For any vector x,

x = UU^T x = ∑_{i=1}^{n} 〈U:i, x〉 U:i
UT x contains the basis coefficients of x
Applying an orthogonal matrix rotates (or reflects) a vector, preserving its norm:
||Ux ||2 = ||x ||2
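A minimal sketch of these properties; for illustration the orthogonal matrix is built from the QR decomposition of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
# The Q factor of a QR decomposition is an orthogonal matrix
U, _ = np.linalg.qr(rng.normal(size=(4, 4)))

x = rng.normal(size=4)

# U^T x contains the coefficients of x in the orthonormal basis given by
# the columns of U, and x = U U^T x recovers the vector
coeffs = U.T @ x
x_back = U @ coeffs
```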
Eigenvectors and eigenvalues
An eigenvector v ≠ 0 of A satisfies

Av = λv

for a scalar λ, which is the corresponding eigenvalue
Even if A is real, its eigenvectors and eigenvalues can be complex
Eigendecomposition
If a square matrix A ∈ R^{n×n} has n linearly independent eigenvectors v1, . . . , vn with eigenvalues λ1, . . . , λn, then

A = [ v1 v2 · · · vn ] diag(λ1, λ2, . . . , λn) [ v1 v2 · · · vn ]^{-1} = QΛQ^{-1}
Usually, by convention λ1 ≥ λ2 ≥ · · · ≥ λn
This is the eigendecomposition of A
Not all matrices have an eigendecomposition, e.g.

[ 0 1 ]
[ 0 0 ]
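For a matrix that does have one, the factorization can be verified numerically; a minimal sketch with a hypothetical symmetric 2×2 matrix:

```python
import numpy as np

# Hypothetical example with two linearly independent eigenvectors
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, Q = np.linalg.eig(A)  # columns of Q are eigenvectors
Lam = np.diag(lam)

# A = Q Lambda Q^{-1}
A_rebuilt = Q @ Lam @ np.linalg.inv(Q)
```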
Power method
Let A = QΛQ−1
For an arbitrary vector x, express x in terms of Q:1, Q:2, . . . , Q:n:

x = ∑_{i=1}^{n} αi Q:i

If we apply A to x k times,

A^k x = ∑_{i=1}^{n} αi λi^k Q:i

If α1 ≠ 0, as k → ∞ the term α1 λ1^k Q:1 dominates
Power method
Input: Matrix A, vector x
Output: Eigenvector corresponding to dominant eigenvalue
Initialization: Set u1 := x/ ||x ||2
For i = 2, . . . ,m compute
u_i := A u_{i−1} / ||A u_{i−1}||_2
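The algorithm above can be sketched directly; the matrix and starting vector below are hypothetical (dominant eigenvalue 3, eigenvector (1, 1)/√2):

```python
import numpy as np

def power_method(A, x, m=100):
    # u_1 := x / ||x||_2, then u_i := A u_{i-1} / ||A u_{i-1}||_2
    u = x / np.linalg.norm(x)
    for _ in range(m - 1):
        u = A @ u
        u /= np.linalg.norm(u)
    return u

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
u = power_method(A, np.array([1.0, 0.0]))  # alpha_1 != 0 for this start
lam_est = u @ A @ u  # Rayleigh quotient estimates the dominant eigenvalue
```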
Markov chains
Consider a sequence of discrete random variables X0,X1, . . . such that
pXk+1|X0,X1,...,Xk (xk+1|x0, x1, . . . , xk) = pXk+1|Xk (xk+1|xk)
The sequence is a Markov chain
Let X_k be restricted to {α1, . . . , αn}; the Markov chain is time homogeneous if

Pij := pXk+1|Xk(αj | αi)

only depends on i and j, not on k, for all 1 ≤ i, j ≤ n, k ≥ 0
Time-homogeneous Markov chains
Pij can be interpreted as entries of a transition matrix P
Consider the vector of probabilities
πk =
  [ pXk(α1) ]
  [ pXk(α2) ]
  [  · · ·  ]
  [ pXk(αn) ]

πk = P P · · · P π0 = P^k π0
If P has an eigendecomposition and a dominant eigenvalue, then as k → ∞

lim_{k→∞} πk = α v1,   α ∈ R
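A small simulation with a hypothetical two-state chain. Note that the code stores the transition probabilities column-wise (T[j, i] = P(next state = αj | current state = αi)), so that the update is the matrix-vector product π_{k+1} = T π_k:

```python
import numpy as np

# Hypothetical two-state chain; each column of T sums to 1
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])
pi = np.array([1.0, 0.0])  # start in state alpha_1 with probability 1

for _ in range(200):
    pi = T @ pi  # pi_k = T^k pi_0

# pi converges to the eigenvector of T with eigenvalue 1, normalized to sum
# to one (here (2/3, 1/3))
```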
Row and column space
The row space row (A) of a matrix A is the span of its rows
The column space col (A) is the span of its columns
Lemma:
rank (A) := dim (col (A)) = dim (row (A))
Number of linearly independent rows or columns is the same!
Singular-value decomposition
Every real matrix has a singular-value decomposition (SVD); the singular values are uniquely determined

A = [ u1 u2 · · · ur ] diag(σ1, σ2, . . . , σr) [ v1 v2 · · · vr ]^T = UΣV^T
The singular values are σ1 ≥ σ2 ≥ · · · ≥ σr ≥ 0
The left singular vectors u1, u2, . . . , ur form a basis for the column space
The right singular vectors v1, v2, . . . , vr form a basis for the row space
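A quick numerical sketch on a random matrix, using the compact SVD:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3))

# Compact SVD: A = U diag(sigma) V^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(s) @ Vt
```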
Principal Component Analysis
Aim: Find a basis for a subspace S of low dimension such that
xi ≈ PS xi
Idea: Greedy approach
1. Find a unit-norm vector u1 such that the projection of the data onto its span is as large as possible
2. Find a unit-norm vector u2, orthogonal to u1, such that the projection of the data onto its span is as large as possible
3. Find a unit-norm vector u3, orthogonal to u1 and u2, such that the projection of the data onto its span is as large as possible
4. . . .
Principal Component Analysis
We group the data x1, x2, . . . , xn as columns of a matrix X
X = [ x1 x2 · · · xn ]

For a unit-norm vector u, X^T u contains the projections of x1, . . . , xn onto the span of u

What u maximizes the norm of the projection? The top left singular vector, because

σ1 = max_{||u||_2 = 1} ||X^T u||_2

u1 = arg max_{||u||_2 = 1} ||X^T u||_2
Principal Component Analysis
Similarly,

σ2 = max_{||u||_2 = 1, u ⊥ u1} ||X^T u||_2

u2 = arg max_{||u||_2 = 1, u ⊥ u1} ||X^T u||_2

and in general

σi = max_{||u||_2 = 1, u ⊥ u1, . . . , ui−1} ||X^T u||_2

ui = arg max_{||u||_2 = 1, u ⊥ u1, . . . , ui−1} ||X^T u||_2

Dimensionality reduction / compression using the truncated SVD
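The PCA-via-SVD recipe above can be sketched as follows; the data set is synthetic, chosen so that most of the variance lies along the first coordinate:

```python
import numpy as np

rng = np.random.default_rng(4)
# 200 hypothetical points in R^3 with axis standard deviations (3, 1, 0.1),
# stored as the columns of X
X = (rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])).T
X = X - X.mean(axis=1, keepdims=True)  # center the data

U, s, Vt = np.linalg.svd(X, full_matrices=False)
u1 = U[:, 0]  # first principal direction

# sigma_1 = max over unit-norm u of ||X^T u||_2; any other unit vector gives
# a smaller projection norm
v = rng.normal(size=3)
v /= np.linalg.norm(v)
```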