Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning,...
Transcript of Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning,...
![Page 1: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/1.jpg)
Advanced Introduction to
Machine Learning, CMU-10715
Vapnik–Chervonenkis Theory
Barnabás Póczos
![Page 2: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/2.jpg)
We have explored many ways of learning from data
But…
– How good is our classifier, really?
– How much data do we need to make it “good enough”?
Learning Theory
2
![Page 3: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/3.jpg)
Review of what we have
learned so far
3
![Page 4: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/4.jpg)
Notation
4
This is what the learning algorithm produces
We will need these definitions, please copy it!
![Page 5: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/5.jpg)
Big Picture
5
Bayes risk
Estimation error Approximation error
Bayes risk
Ultimate goal:
Approximation error
Estimation error
![Page 6: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/6.jpg)
Big Picture: Illustration of Risks
6
Upper bound
Goal of Learning:
![Page 7: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/7.jpg)
Learning Theory
7
![Page 8: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/8.jpg)
Outline
8
These results are useless if N is big, or infinite. (e.g. all possible hyper-planes)
Today we will see how to fix this with the
Shattering coefficient and VC dimension
Theorem:
From Hoeffding’s inequality, we have seen that
![Page 9: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/9.jpg)
Outline
9
Theorem:
From Hoeffding’s inequality, we have seen that
After this fix, we can say something meaningful about this too:
This is what the learning algorithm produces and its true risk
![Page 10: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/10.jpg)
Hoeffding inequality
10
Theorem:
Observation!
Definition:
![Page 11: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/11.jpg)
McDiarmid’s
Bounded Difference Inequality
It follows that
11
![Page 12: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/12.jpg)
Bounded Difference Condition
12
Our main goal is to bound
Lemma:
Let g denote the following function:
Observation:
Proof:
) McDiarmid can be applied for g!
![Page 13: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/13.jpg)
Bounded Difference Condition
13
Corollary:
The Vapnik-Chervonenkis inequality does that
with the shatter coefficient (and VC dimension)!
![Page 14: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/14.jpg)
Concentration and
Expected Value
14
![Page 15: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/15.jpg)
Vapnik-Chervonenkis inequality
15
We already know:
Vapnik-Chervonenkis inequality:
Corollary: Vapnik-Chervonenkis theorem:
Our main goal is to bound
![Page 16: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/16.jpg)
Shattering
16
![Page 17: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/17.jpg)
How many points can a linear
boundary classify exactly in 1D?
- +
2 pts 3 pts - + +
- - +
- + - ?? There exists placement s.t. all labelings can be classified
- +
17
The answer is 2
![Page 18: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/18.jpg)
- + 3 pts 4 pts
- +
+ - +
- +
- ??
- +
- +
How many points can a linear
boundary classify exactly in 2D?
There exists placement s.t. all labelings can be classified
18
The answer is 3
![Page 19: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/19.jpg)
How many points can a linear
boundary classify exactly in d-dim?
19
The answer is d+1
How many points can a linear
boundary classify exactly in 3D?
The answer is 4
tetraeder
+
+ -
-
![Page 20: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/20.jpg)
Growth function,
Shatter coefficient
Definition
0 0 0
0 1 0
1 1 1
1 0 0
0 1 1
0 1 0
1 1 1
(=5 in this example)
Growth function, Shatter coefficient
maximum number of behaviors on n points
20
![Page 21: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/21.jpg)
Growth function,
Shatter coefficient
Definition
Growth function, Shatter coefficient
maximum number of behaviors on n points
- +
+
Example: Half spaces in 2D
+ + -
21
![Page 22: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/22.jpg)
VC-dimension
Definition
Growth function, Shatter coefficient
maximum number of behaviors on n points
Definition: VC-dimension
# b
eha
vio
rs
Definition: Shattering
Note: 22
![Page 23: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/23.jpg)
VC-dimension
Definition
# b
eha
vio
rs
23
![Page 24: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/24.jpg)
- +
- +
VC-dimension
24
![Page 25: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/25.jpg)
Examples
25
![Page 26: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/26.jpg)
VC dim of decision stumps
(axis aligned linear separator) in 2d
What’s the VC dim. of decision stumps in 2d?
- +
+
- +
-
+ +
-
There is a placement of 3 pts that can be shattered ) VC dim ≥ 3
26
![Page 27: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/27.jpg)
What’s the VC dim. of decision stumps in 2d? If VC dim = 3, then for all placements of 4 pts, there exists a labeling that
can’t be shattered
3 collinear 1 in convex hull of other 3
quadrilateral
- + - - +
-
-
+ -
+ -
VC dim of decision stumps
(axis aligned linear separator) in 2d
27
![Page 28: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/28.jpg)
VC dim. of axis parallel
rectangles in 2d What’s the VC dim. of axis parallel rectangles in 2d?
- +
+
- +
- There is a placement of 3 pts that can be shattered ) VC dim ≥ 3
28
![Page 29: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/29.jpg)
VC dim. of axis parallel
rectangles in 2d
There is a placement of 4 pts that can be shattered ) VC dim ≥ 4 29
![Page 30: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/30.jpg)
VC dim. of axis parallel
rectangles in 2d What’s the VC dim. of axis parallel rectangles in 2d?
+ +
-
-
- -
- + - - +
-
- -
- +
-
+
-
If VC dim = 4, then for all placements of 5 pts, there exists a labeling that
can’t be shattered
4 collinear
2 in convex hull 1 in convex hull
pentagon
30
![Page 31: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/31.jpg)
Sauer’s Lemma
31
The VC dimension can be used to upper bound
the shattering coefficient.
Sauer’s lemma:
Corollary:
We already know that [Exponential in n]
[Polynomial in n]
![Page 32: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/32.jpg)
Proof of Sauer’s Lemma
Write all different behaviors on a sample
(x1,x2,…xn) in a matrix:
0 0 0
0 1 0
1 1 1
1 0 0
0 1 0
1 1 1
0 1 1
32
0 0 0
0 1 0
1 1 1
1 0 0
0 1 1
![Page 33: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/33.jpg)
Proof of Sauer’s Lemma
We will prove that
33
0 0 0
0 1 0
1 1 1
1 0 0
0 1 1
Shattered subsets of columns:
Therefore,
![Page 34: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/34.jpg)
Proof of Sauer’s Lemma
Lemma 1
34
0 0 0
0 1 0
1 1 1
1 0 0
0 1 1
for any binary matrix with no repeated rows.
Shattered subsets of columns:
Lemma 2
In this example: 5· 6
In this example: 6· 1+3+3=7
![Page 35: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/35.jpg)
Proof of Lemma 1
Lemma 1
35
0 0 0
0 1 0
1 1 1
1 0 0
0 1 1
Shattered subsets of columns:
Proof
In this example: 6· 1+3+3=7
![Page 36: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/36.jpg)
Proof of Lemma 2
36
for any binary matrix with no repeated rows.
Lemma 2
Induction on the number of columns Proof
Base case: A has one column. There are three cases:
) 1 · 1
) 1 · 1
) 2 · 2
![Page 37: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/37.jpg)
Proof of Lemma 2
37
Inductive case: A has at least two columns.
We have,
By induction (less columns)
0 0 0
0 1 0
1 1 1
1 0 0
0 1 1
![Page 38: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/38.jpg)
Proof of Lemma 2
38
because
0 0 0
0 1 0
1 1 1
1 0 0
0 1 1
![Page 39: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/39.jpg)
Vapnik-Chervonenkis inequality
39
Vapnik-Chervonenkis inequality:
From Sauer’s lemma:
Since
Therefore,
[We don’t prove this]
Estimation error
![Page 40: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/40.jpg)
Linear (hyperplane) classifiers
40
Estimation error
We already know that
Estimation error
Estimation error
![Page 41: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/41.jpg)
Vapnik-Chervonenkis Theorem
41
We already know from McDiarmid:
Corollary: Vapnik-Chervonenkis theorem: [We don’t prove them]
Vapnik-Chervonenkis inequality:
Hoeffding + Union bound for finite function class:
![Page 42: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/42.jpg)
PAC Bound for the Estimation
Error
42
VC theorem:
Inversion:
Estimation error
![Page 43: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/43.jpg)
Structoral Risk Minimization
43
Bayes risk
Estimation error Approximation error
Ultimate goal:
Approximation error
Estimation error
So far we studied when estimation error ! 0, but we also want approximation error ! 0
Many different variants…
penalize too complex models to avoid overfitting
![Page 44: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/44.jpg)
What you need to know
Complexity of the classifier depends on number of points that can
be classified exactly
Finite case – Number of hypothesis
Infinite case – Shattering coefficient, VC dimension
PAC bounds on true error in terms of empirical/training error and
complexity of hypothesis space
Empirical and Structural Risk Minimization
44
![Page 45: Introduction to Machine Learningepxing/Class/10715/lectures/VCTheory.pdf · Machine Learning, CMU-10715 Vapnik–Chervonenkis Theory Barnabás Póczos . ... What you need to know](https://reader033.fdocuments.in/reader033/viewer/2022042600/5f4a3414b173c72ba4275788/html5/thumbnails/45.jpg)
Thanks for your attention
45