Transcript of Lecture 16: 3 June, 2021
Madhavan Mukund
https://www.cmi.ac.in/~madhavan
Data Mining and Machine Learning, April–July 2021
The curse of dimensionality

ML data is often high dimensional — especially images
A 1000 × 1000 pixel image has 10^6 features

Data behaves very differently in high dimensions
2D unit square: 0.4% probability of a random point being near the border (within 0.001)
10^4-dimensional hypercube: 99.999999% probability of being near the border

Distances between items
2D unit square: mean distance between 2 random points is 0.52
3D unit cube: mean distance between 2 random points is 0.66
10^6-dimensional unit hypercube: mean distance between 2 random points is approximately 408.25

There’s a lot of “space” in higher dimensions!
Higher danger of overfitting
Madhavan Mukund Lecture 16: 3 June, 2021 DMML Apr–Jul 2021 2 / 14
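The distance figures on this slide can be checked with a short Monte Carlo simulation. This is a sketch, not code from the lecture; `mean_pairwise_distance` is a name chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_pairwise_distance(dim, n_pairs=20000):
    """Estimate the mean distance between two uniformly random
    points in the unit hypercube of the given dimension."""
    a = rng.random((n_pairs, dim))
    b = rng.random((n_pairs, dim))
    return np.linalg.norm(a - b, axis=1).mean()

print(mean_pairwise_distance(2))   # ~0.52
print(mean_pairwise_distance(3))   # ~0.66
# For large dimension d the mean approaches sqrt(d/6);
# at d = 10^6 that is about 408.25.
print(np.sqrt(1e6 / 6))
```

Sampling pairs directly in 10^6 dimensions is expensive, so the last figure is taken from the sqrt(d/6) asymptotic instead.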
Dimensionality reduction

Remove unimportant features by projecting to a smaller dimension
Example: project blue points in 3D to black points in a 2D plane
Principal Component Analysis — transform d-dimensional input to k-dimensional input, preserving essential features
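The 3D-to-2D projection described above can be sketched in NumPy: generate synthetic points lying near a plane and project them onto the best-fit 2D plane obtained from the SVD. This is an illustration with made-up data, not the lecture's example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 3D points that lie close to a 2D plane.
points = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 3))
points += 0.01 * rng.normal(size=points.shape)   # small noise off the plane

# Centre the data, then take the top-2 right singular vectors
# as the plane that best fits the points.
centred = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
projected = centred @ vt[:2].T        # 2D coordinates of each point

print(projected.shape)                 # (100, 2)
```

Because the points were nearly planar, mapping the 2D coordinates back via `vt[:2]` reconstructs the centred data almost exactly.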
Singular Value Decomposition (SVD)

Input matrix M, dimensions n × d
Rows are items, columns are features

Decompose M as U D V^T
D is a k × k diagonal matrix with positive real entries
U is n × k, V is d × k
Columns of U and V are orthonormal — unit vectors, mutually orthogonal

Interpretation
Columns of V correspond to new abstract features
Rows of U describe the decomposition of the items across the new features
M = Σ_i D_ii (u_i v_i^T)
For columns u_i of U and v_i of V, the outer product u_i v_i^T is an n × d matrix, like M
u_i v_i^T describes the components of the rows of M along direction v_i
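The decomposition and its rank-one expansion can be checked directly in NumPy; this is a sketch on an arbitrary random matrix, not the lecture's code.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(6, 4))          # n = 6 items, d = 4 features

# full_matrices=False gives the compact form: U is n x k, Vt is k x d,
# with k = min(n, d); s holds the diagonal entries of D.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Columns of U and V are orthonormal.
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(4))

# M equals the sum of the rank-one terms D_ii * (u_i v_i^T).
recon = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(M, recon)
```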
Singular vectors

Unit vectors passing through the origin
Want to find the “best” k singular vectors to represent the feature space
Suppose we project a_i = (a_i1, a_i2, . . . , a_id) onto a unit vector v through the origin
Minimizing the distance of a_i from the line along v is equivalent to maximizing the projection of a_i onto v, since |a_i|^2 = distance^2 + projection^2 is fixed
The length of the projection is a_i · v
Singular vectors . . .

Sum of squares of the lengths of the projections of all rows of M onto v — |Mv|^2

First singular vector — the unit vector through the origin that maximizes the sum of squared projections of the rows of M:
v_1 = arg max_{|v| = 1} |Mv|

Second singular vector — the unit vector through the origin, perpendicular to v_1, that maximizes the same quantity:
v_2 = arg max_{v ⊥ v_1, |v| = 1} |Mv|

Third singular vector — the unit vector through the origin, perpendicular to v_1 and v_2:
v_3 = arg max_{v ⊥ v_1, v_2, |v| = 1} |Mv|
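One standard way to compute v_1 in practice (not covered on this slide) is power iteration on M^T M, which converges to the unit vector maximizing |Mv|. A sketch, compared against NumPy's SVD:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(8, 5))

# Power iteration: repeatedly apply M^T M and renormalize;
# the iterate converges to the top right singular vector v_1.
v = rng.normal(size=5)
for _ in range(500):
    v = M.T @ (M @ v)
    v /= np.linalg.norm(v)

# Compare with v_1 from the library SVD (the sign is arbitrary).
v1 = np.linalg.svd(M)[2][0]
assert np.isclose(abs(v @ v1), 1.0)
print(np.linalg.norm(M @ v))   # the maximized value |Mv_1|, i.e. sigma_1
```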
Singular vectors . . .

With each singular vector v_j, the associated singular value is σ_j = |M v_j|
Repeat r times, until max_{v ⊥ v_1, v_2, . . . , v_r, |v| = 1} |Mv| = 0
r turns out to be the rank of M
The vectors {v_1, v_2, . . . , v_r} are orthonormal — the right singular vectors

Our greedy strategy provably produces the “best-fit” dimension-r subspace for M — the dimension-r subspace that maximizes the content of M projected onto it

The corresponding left singular vectors are given by u_i = (1/σ_i) M v_i
Can show that {u_1, u_2, . . . , u_r} are also orthonormal
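The relation u_i = (1/σ_i) M v_i and the orthonormality of the u_i can both be verified numerically; a sketch on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(7, 4))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Each left singular vector satisfies u_i = (1/sigma_i) M v_i ...
U_from_v = np.column_stack([M @ Vt[i] / s[i] for i in range(len(s))])
assert np.allclose(U, U_from_v)

# ... and the u_i so obtained are themselves orthonormal.
assert np.allclose(U_from_v.T @ U_from_v, np.eye(len(s)))
```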
Singular Value Decomposition

M, of dimension n × d and rank r, uniquely decomposes as M = U D V^T
V = [v_1 v_2 · · · v_r] are the right singular vectors
D is a diagonal matrix with D[i, i] = σ_i, the singular values
U = [u_1 u_2 · · · u_r] are the left singular vectors

M (n × d) = U (n × r) D (r × r) V^T (r × d)
Rank-k approximation

M has rank r; the SVD gives a rank-r decomposition
Singular values are non-increasing — σ_1 ≥ σ_2 ≥ · · · ≥ σ_r
Suppose we retain only the k largest ones

We have
The matrix of the first k right singular vectors V_k = [v_1 v_2 · · · v_k], with corresponding singular values σ_1, σ_2, . . . , σ_k
The matrix of the first k left singular vectors U_k = [u_1 u_2 · · · u_k]
Let D_k be the k × k diagonal matrix with entries σ_1, σ_2, . . . , σ_k

Then U_k D_k V_k^T is the best-fit rank-k approximation of M
In other words, by truncating the SVD, we can focus on the k most significant features implicit in M
Madhavan Mukund Lecture 16: 3 June, 2021 DMML Apr–Jul 2021 9 / 14
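The truncation can be sketched in NumPy; the matrix and the choice k = 2 here are arbitrary. The "best fit" claim is the Eckart–Young theorem: the Frobenius error of the best rank-k approximation is sqrt(σ_{k+1}^2 + · · · + σ_r^2).

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(10, 6))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2
# Rank-k truncation U_k D_k V_k^T: keep the k largest singular values.
Mk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# The Frobenius error equals the norm of the discarded singular values.
err = np.linalg.norm(M - Mk)
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```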
![Page 36: Madhavan MukundMadhavan Mukund Lecture 16: 3 June, 2021 DMML Apr–Jul 2021 2/14 The curse of dimensionality ML data is often high dimensional — especially images A 1000 1000 pixel](https://reader035.fdocuments.in/reader035/viewer/2022071607/6144b95d34130627ed508804/html5/thumbnails/36.jpg)
Rank-k approximation
M has rank r , SVD gives rank r decomposition
Singular values are non-increasing — �1 � �2 � · · · � �r
Suppose we retain only k largest ones
We have
Matrix of first k right singular vectors Vk = [v1 v2 · · · vk ],Corresponding singular values �1,�2, . . . ,�k
Matrix of k left singular vectors Uk = [u1 u2 · · · uk ]
Let Dk be the k ⇥ k diagonal matrix with entries �1,�2, . . . ,�k
Then UkDkV>k is the best fit rank-k approximation of M
In other words, by truncating the SVD, we can focus on k most significant featuresimplicit in M
Madhavan Mukund Lecture 16: 3 June, 2021 DMML Apr–Jul 2021 9 / 14
PCA and variance

Interpret PCA in terms of preserving variance

Different projections have different variance

SVD orders projections in decreasing order of variance

Criterion for choosing when to stop

Choose k so that a desired fraction of the variance is "explained"
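This stopping criterion can be sketched with NumPy. The data below is random and illustrative, and the 95% threshold is an assumed example value, not one fixed by the lecture:

```python
import numpy as np

# Illustrative data: 200 points in 10 dimensions (random, for demonstration)
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 10))
Xc = X - X.mean(axis=0)           # centre the data first, as PCA requires

# Squared singular values of the centred data are proportional to the
# variance captured by each principal component
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)   # fraction of variance per component
cumulative = np.cumsum(explained)

# Smallest k whose first k components explain at least 95% of the variance
k = int(np.searchsorted(cumulative, 0.95)) + 1
```

Since the singular values are non-increasing, `cumulative` is sorted, so `searchsorted` finds the first index where the running fraction crosses the threshold.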
Manifold learning

Projection may not always help

Swiss roll dataset

Projection onto 2 dimensions is not useful

Better to unroll the image

Discover the manifold along which the data lies
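The Swiss roll example can be reproduced with scikit-learn, assuming it is available. Locally linear embedding is one standard manifold learner that "unrolls" the data by preserving local neighbourhoods; the parameter values here are illustrative choices, not prescribed by the lecture:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# 3D Swiss roll: the points actually lie on a 2D sheet rolled up in 3D
X, t = make_swiss_roll(n_samples=1000, random_state=0)   # X has shape (1000, 3)

# Dropping a coordinate (a linear projection) squashes the roll's layers
# onto each other; LLE instead preserves each point's local neighbourhood
# and recovers the unrolled 2D sheet
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_unrolled = lle.fit_transform(X)                        # shape (1000, 2)
```

Colouring `X_unrolled` by the roll parameter `t` shows whether the embedding has separated the layers that a naive projection would merge.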