Dimensionality Reduction
Shannon Quinn
(with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Dimensionality Reduction
• Assumption: Data lies on or near a low d-dimensional subspace
• The axes of this subspace are an effective representation of the data
Rank of a Matrix
• Q: What is the rank of a matrix A?
• A: The number of linearly independent columns of A
• For example:
  – The matrix A with rows [1 2 1], [-2 -3 1], [3 5 0] has rank r = 2
  – Why? The first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first is equal to the sum of the second and third), so the rank must be less than 3.
• Why do we care about low rank?
  – We can write the rows of A in terms of two “basis” vectors: [1 2 1] and [-2 -3 1]
  – The rows then have new coordinates: [1 0], [0 1], [1 -1]
Rank is “Dimensionality”
• Cloud of points in 3D space:
  – Think of the point positions as a matrix, 1 row per point:
    A: [1 2 1]
    B: [-2 -3 1]
    C: [3 5 0]
• We can rewrite the coordinates more efficiently!
  – Old basis vectors: [1 0 0], [0 1 0], [0 0 1]
  – New basis vectors: [1 2 1], [-2 -3 1]
  – Then the points have new coordinates: A: [1 0], B: [0 1], C: [1 -1]
• Notice: We reduced the number of coordinates per point!
Dimensionality Reduction
• The goal of dimensionality reduction is to discover the true axes of the data!
Rather than representing every point with 2 coordinates, we represent each point with 1 coordinate (corresponding to the position of the point's projection on the red line shown on the slide).
By doing this we incur a bit of error, as the points do not lie exactly on the line.
Why Reduce Dimensions?
• Discover hidden correlations/topics
  – e.g., words that commonly occur together
• Remove redundant and noisy features
  – Not all words are useful
• Easier interpretation and visualization
• Easier storage and processing of the data
SVD - Definition
A[m x n] = U[m x r] Σ[r x r] (V[n x r])T
• A: Input data matrix
  – m x n matrix (e.g., m documents, n terms)
• U: Left singular vectors
  – m x r matrix (m documents, r concepts)
• Σ: Singular values
  – r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix A)
• V: Right singular vectors
  – n x r matrix (n terms, r concepts)
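As a concrete sketch of these shapes (assuming NumPy; the small example matrix is just an illustration, and here r = min(m, n)):

```python
import numpy as np

# A: m x n input data matrix (think m documents x n terms)
A = np.array([[1., 1., 1., 0., 0.],
              [3., 3., 3., 0., 0.],
              [0., 2., 0., 4., 4.],
              [0., 0., 0., 5., 5.]])
m, n = A.shape

# Reduced SVD: U is m x r, s holds the singular values, Vt is r x n
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)            # (4, 4) (4,) (4, 5)
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True: A = U Sigma VT exactly
```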
SVD
A (m x n) = U (m x r) × Σ (r x r) × VT (r x n)
SVD
A ≈ σ1 u1 v1T + σ2 u2 v2T + …
σi … scalar (singular value)
ui … (left singular) vector
vi … (right singular) vector
SVD - Properties
It is always possible to decompose a real matrix A into A = U Σ VT, where:
• U, Σ, V: unique
• U, V: column orthonormal
  – UT U = I; VT V = I (I: identity matrix)
  – (Columns are orthogonal unit vectors)
• Σ: diagonal
  – Entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0)
Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf
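These properties are easy to check numerically; a minimal sketch (assuming NumPy, with an arbitrary random test matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))              # arbitrary real matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(4)))       # True: UT U = I
print(np.allclose(Vt @ Vt.T, np.eye(4)))     # True: VT V = I
# Singular values are non-negative and sorted in decreasing order
print(bool(np.all(s >= 0) and np.all(np.diff(s) <= 0)))  # True
```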
SVD – Example: Users-to-Movies
• A = U Σ VT - example: Users to Movies
• Rows: users (the first four prefer SciFi, the last three Romance); columns: the movies Matrix, Alien, Serenity, Casablanca, Amelie

          Matrix  Alien  Serenity  Casablanca  Amelie
SciFi        1      1       1          0         0
SciFi        3      3       3          0         0
SciFi        4      4       4          0         0
SciFi        5      5       5          0         0
Romance      0      2       0          4         4
Romance      0      0       0          5         5
Romance      0      1       0          2         2

The “concepts” are also known as latent dimensions or latent factors.
SVD – Example: Users-to-Movies
• A = U Σ VT - example (SciFi rows 1–4, Romance rows 5–7):

A                   U                                Σ                     VT
1 1 1 0 0           0.13  0.02 -0.01
3 3 3 0 0           0.41  0.07 -0.03         12.4   0    0        0.56  0.59  0.56  0.09  0.09
4 4 4 0 0           0.55  0.09 -0.04          0    9.5   0        0.12 -0.02  0.12 -0.69 -0.69
5 5 5 0 0     =     0.68  0.11 -0.05    ×     0     0   1.3   ×   0.40 -0.80  0.40  0.09  0.09
0 2 0 4 4           0.15 -0.59  0.65
0 0 0 5 5           0.07 -0.73 -0.67
0 1 0 2 2           0.07 -0.29  0.32
SVD – Example: Users-to-Movies (interpretation)
• The first concept is the “SciFi-concept” and the second is the “Romance-concept”
• U is the “user-to-concept” similarity matrix: the first column of U is large (in magnitude) for the four SciFi users, the second column for the three Romance users
• The diagonal of Σ gives the “strength” of each concept: 12.4 for the SciFi-concept, 9.5 for the Romance-concept, and only 1.3 for the weak third concept
• V is the “movie-to-concept” similarity matrix: the first row of VT is large (in magnitude) for Matrix, Alien, and Serenity; the second row for Casablanca and Amelie
SVD - Interpretation #1
‘movies’, ‘users’ and ‘concepts’:
• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the ‘strength’ of each concept
SVD - Interpretation #2
More details
• Q: How exactly is dimensionality reduction done?
• A: Set the smallest singular values to zero. In the example, zeroing the smallest singular value σ3 = 1.3 drops the third column of U and the third row of VT, leaving a rank-2 approximation:

A  ≈
0.13  0.02           12.4   0          0.56  0.59  0.56  0.09  0.09
0.41  0.07            0    9.5         0.12 -0.02  0.12 -0.69 -0.69
0.55  0.09
0.68  0.11     ×                  ×
0.15 -0.59
0.07 -0.73
0.07 -0.29
SVD - Interpretation #2
The rank-2 reconstruction B is close to the original A:

A:                    B:
1 1 1 0 0              0.92  0.95  0.92  0.01  0.01
3 3 3 0 0              2.91  3.01  2.91 -0.01 -0.01
4 4 4 0 0              3.90  4.04  3.90  0.01  0.01
5 5 5 0 0       ≈      4.82  5.00  4.82  0.03  0.03
0 2 0 4 4              0.70  0.53  0.70  4.11  4.11
0 0 0 5 5             -0.69  1.34 -0.69  4.78  4.78
0 1 0 2 2              0.32  0.23  0.32  2.01  2.01

Frobenius norm: ǁMǁF = √(Σij Mij²)
The reconstruction error ǁA−BǁF = √(Σij (Aij−Bij)²) is “small”
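The rank-2 approximation and its Frobenius error can be reproduced with a short sketch (assuming NumPy; A is the ratings matrix from the slides). For a truncated SVD, the Frobenius error is exactly the norm of the discarded singular values:

```python
import numpy as np

# The 7-users x 5-movies ratings matrix from the slides
A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0], [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4], [0, 0, 0, 5, 5], [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the two largest singular values; zero the rest
k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The error equals sqrt of the sum of squared discarded singular values
err = np.linalg.norm(A - B, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))   # True
print(err < 2.0)   # True: roughly the discarded sigma_3
```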
SVD – Best Low Rank Approx.
A = U Σ VT (full decomposition)
B = U S VT, where S equals Σ except that the smallest singular values are set to zero
Then B is the best low-rank approximation of A: among all matrices of the same rank, B minimizes ǁA−BǁF.
SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix:

A = σ1 u1 v1T + σ2 u2 v2T + …

Each term σi ui viT is a rank-1 matrix built from one singular value and its pair of singular vectors.
SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix:

A = σ1 u1 v1T + σ2 u2 v2T + … (keep k terms; each ui is m x 1, each viT is 1 x n)

Assume: σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0

Why is setting small σi to 0 the right thing to do?
Vectors ui and vi are unit length, so σi scales them.
So, zeroing small σi introduces less error.
SVD - Interpretation #2
• Q: How many σs to keep?
• A: Rule of thumb: keep enough terms to retain 80–90% of the ‘energy’ (= Σi σi²)

A ≈ σ1 u1 v1T + σ2 u2 v2T + …, assuming σ1 ≥ σ2 ≥ σ3 ≥ …
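The energy rule of thumb can be sketched as follows (assuming NumPy; the 90% threshold and the ratings matrix are from the slides):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0], [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4], [0, 0, 0, 5, 5], [0, 1, 0, 2, 2]], dtype=float)

s = np.linalg.svd(A, compute_uv=False)
energy = s ** 2                           # 'energy' of each concept
frac = np.cumsum(energy) / energy.sum()   # retained energy for k = 1, 2, ...

k = int(np.searchsorted(frac, 0.90) + 1)  # smallest k retaining >= 90% energy
print(k)   # 2
```

For this matrix the first two concepts already carry over 90% of the energy, which is why the third singular value can be dropped.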
Computing SVD
(especially in a MapReduce setting…)
1. Stochastic Gradient Descent (today)
2. Stochastic SVD (not today)
3. Other methods (not today)
talk pilfered from … (KDD 2011)
for image denoising
Matrix factorization as SGD
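A minimal single-machine sketch of matrix factorization by SGD (assuming NumPy; the factor names W and H, the learning rate eta, and squared loss over observed cells are assumptions, not the exact formulation on the slide):

```python
import numpy as np

def sgd_mf(A, rank=2, eta=0.01, epochs=2000, seed=0):
    """Factor A ~ W @ H by SGD on the squared error of observed (nonzero) cells."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = 0.1 * rng.standard_normal((m, rank))    # user factors
    H = 0.1 * rng.standard_normal((rank, n))    # movie factors
    obs = list(zip(*np.nonzero(A)))             # observed (user, movie) cells
    for _ in range(epochs):
        for t in rng.permutation(len(obs)):     # random order each epoch
            i, j = obs[t]
            err = A[i, j] - W[i] @ H[:, j]      # residual on one cell
            wi = W[i].copy()
            W[i]    += eta * err * H[:, j]      # step on the user row...
            H[:, j] += eta * err * wi           # ...and on the movie column
    return W, H

A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0], [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4], [0, 0, 0, 5, 5], [0, 1, 0, 2, 2]], dtype=float)
W, H = sgd_mf(A)
obs = A != 0
rmse = np.sqrt(np.mean((W @ H - A)[obs] ** 2))
print(rmse)   # should be small: the observed ratings are fit closely
```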
Matrix factorization as SGD - why does this work?
Matrix factorization as SGD - why does this work? Here’s the key claim:
Checking the claim
• Think of SGD for logistic regression:
  – The LR loss compares y and ŷ = dot(w, x)
  – Matrix factorization is similar, but now we update both w (the user weights) and x (the movie weights)
ALS = alternating least squares
Similar to McDonnell et al with perceptron learning
Slow convergence…..
More detail….
• Randomly permute the rows/cols of the matrix
• Chop V, W, H into blocks of size d x d
  – m/d blocks in W, n/d blocks in H
• Group the data:
  – Pick a set of blocks with no overlapping rows or columns (a stratum)
  – Repeat until all blocks in V are covered
• Train with SGD:
  – Process strata in series
  – Process blocks within a stratum in parallel
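The stratum construction above can be sketched with a diagonal-rotation scheme (the (i, (i+s) mod b) block assignment is an assumption that satisfies the "no overlapping rows or columns" constraint; the paper's scheme permutes more generally):

```python
# Strata for a b x b grid of blocks: stratum s takes block (i, (i+s) mod b)
# for every row-block i, so no two blocks in a stratum share a row-block or
# a column-block, and they can be processed in parallel.
b = 3
strata = [[(i, (i + s) % b) for i in range(b)] for s in range(b)]

for s, stratum in enumerate(strata):
    print(s, stratum)

# Within a stratum, all row- and column-blocks are distinct; across the
# b strata, every one of the b*b blocks is covered exactly once.
for stratum in strata:
    assert len({i for i, _ in stratum}) == b and len({j for _, j in stratum}) == b
assert len({blk for st in strata for blk in st}) == b * b
```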
More detail…. (note: Z below denotes the matrix called V above)
More detail….
• Initialize W, H randomly
  – not at zero
• Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”
• Pick the strata sequence by permuting rows and columns of M, and using M’[k,i] as the column index of row i in sub-epoch k
• Use “bold driver” to set the step size:
  – increase the step size when the loss decreases (in an epoch)
  – decrease the step size when the loss increases
• Implemented in Hadoop and R/Snowfall
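The "bold driver" heuristic can be sketched in a few lines (the growth/shrink factors 1.05 and 0.5 are assumptions; the slide only specifies grow-on-decrease, shrink-on-increase):

```python
def bold_driver(step, prev_loss, loss, up=1.05, down=0.5):
    """Grow the step size while the loss falls; cut it sharply when it rises."""
    return step * (up if loss < prev_loss else down)

step = 0.1
step = bold_driver(step, prev_loss=10.0, loss=8.0)
print(round(step, 3))    # 0.105  (loss fell, step grew by 5%)
step = bold_driver(step, prev_loss=8.0, loss=9.0)
print(round(step, 4))    # 0.0525 (loss rose, step halved)
```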
Varying rank (100 epochs for all)
Hadoop scalability: Hadoop process setup time starts to dominate
SVD - Conclusions so far
• SVD: A = U Σ VT: unique
  – U: user-to-concept similarities
  – V: movie-to-concept similarities
  – Σ: strength of each concept
• Dimensionality reduction:
  – keep the few largest singular values (80–90% of the ‘energy’)
  – SVD picks up linear correlations
Relation to Eigen-decomposition
• SVD gives us: A = U Σ VT
• Eigen-decomposition: A = X L XT
  – here A is symmetric
  – U, V, X are orthonormal (UT U = I)
  – Σ, L are diagonal
• Now let’s calculate A AT and AT A:
Relation to Eigen-decomposition
• A AT = (U Σ VT)(U Σ VT)T = U Σ VT V ΣT UT = U Σ ΣT UT, which has the form X L² XT
• AT A = V ΣT UT U Σ VT = V ΣT Σ VT, again of the form X L² XT
This shows how to compute the SVD using an eigenvalue decomposition: the eigenvectors of A AT give U, the eigenvectors of AT A give V, and the eigenvalues are the squared singular values σi².
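The relation is easy to verify numerically; a sketch (assuming NumPy, using the ratings matrix from the earlier slides):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0], [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4], [0, 0, 0, 5, 5], [0, 1, 0, 2, 2]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# AT A = V Sigma^2 VT, so its eigenvalues are the squared singular values
evals = np.linalg.eigvalsh(A.T @ A)[::-1]     # eigvalsh sorts ascending; reverse
print(np.allclose(evals, s ** 2, atol=1e-8))  # True
```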
SVD: Drawbacks
+ Optimal low-rank approximation in terms of Frobenius norm
- Interpretability problem:
  – A singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity:
  – Singular vectors are dense, even when the input matrix A is sparse!
CUR Decomposition
• Goal: Express A as a product of matrices C, U, R such that ǁA − C·U·RǁF is small
• “Constraints” on C and R: C consists of actual columns of A, and R consists of actual rows of A

Frobenius norm: ǁXǁF = √(Σij Xij²)
CUR Decomposition (continued)
• U is the pseudo-inverse of the “intersection” of C and R
CUR: How it Works
• Sampling columns (similarly for rows): columns are sampled randomly, with probability proportional to their squared norm
• Note this is a randomized algorithm; the same column can be sampled more than once
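A sketch of the column-sampling step (assuming NumPy; the 1/√(c·P(j)) rescaling of each picked column is the standard CUR convention and may differ from the slide's exact formulation):

```python
import numpy as np

def sample_columns(A, c, seed=0):
    """Sample c columns of A with replacement, P(j) proportional to ||A[:, j]||^2,
    rescaling each picked column by 1/sqrt(c * P(j))."""
    rng = np.random.default_rng(seed)
    p = (A ** 2).sum(axis=0) / (A ** 2).sum()   # column-sampling probabilities
    idx = rng.choice(A.shape[1], size=c, p=p)   # the same column may repeat
    return A[:, idx] / np.sqrt(c * p[idx]), idx

A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0], [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4], [0, 0, 0, 5, 5], [0, 1, 0, 2, 2]], dtype=float)
C, idx = sample_columns(A, c=4)
print(C.shape)   # (7, 4)
```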
Computing U
• Let W be the “intersection” of the sampled columns C and rows R
  – Let the SVD of W be: W = X Z YT
• Then: U = W+ = Y Z+ XT
  – Z+: reciprocals of the non-zero singular values: Z+ii = 1/Zii
  – W+ is the “pseudoinverse” of W

Why does the pseudoinverse work?
If W = X Z YT, then W-1 = (YT)-1 Z-1 X-1.
Due to orthonormality, X-1 = XT and Y-1 = YT.
Since Z is diagonal, Z-1 has entries 1/Zii.
Thus, if W is nonsingular, the pseudoinverse equals the true inverse.
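The construction U = W+ can be sketched directly from the SVD of W (assuming NumPy; the deliberately singular W below is a made-up example, chosen so that a true inverse does not exist):

```python
import numpy as np

# W: the "intersection" of sampled columns and rows; rank 1, so not invertible
W = np.array([[1., 1.],
              [2., 2.]])

X, z, Yt = np.linalg.svd(W)
z_plus = np.array([1.0 / v if v > 1e-10 else 0.0 for v in z])  # reciprocals of nonzero sv's
W_plus = Yt.T @ np.diag(z_plus) @ X.T                          # U = W+ = Y Z+ XT

# Matches NumPy's built-in Moore-Penrose pseudoinverse
print(np.allclose(W_plus, np.linalg.pinv(W)))   # True
```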
CUR: Pros & Cons
+ Easy interpretation
  • Since the basis vectors are actual columns and rows
+ Sparse basis
  • Since the basis vectors are actual columns and rows (an actual column is as sparse as the data, whereas a singular vector is dense)
- Duplicate columns and rows
  • Columns of large norms will be sampled many times
Solution
• If we want to get rid of the duplicates:
  – Throw them away
  – Scale (multiply) the remaining columns/rows by the square root of the number of duplicates
• This shrinks the duplicated Cd and Rd down to smaller Cs and Rs, from which we construct a small U
SVD vs. CUR
SVD: A = U Σ VT
  – A: huge but sparse; U, V: big and dense; Σ: sparse and small
CUR: A = C U R
  – A: huge but sparse; C, R: big but sparse; U: dense but small
Stochastic SVD
• “Randomized methods for computing low-rank approximations of matrices”, Nathan Halko, 2012
• “Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions”, Halko, Martinsson, and Tropp, 2011
• “An Algorithm for the Principal Component Analysis of Large Data Sets”, Halko, Martinsson, Shkolnisky, and Tygert, 2010