BREAK POINTS
Matthieu R Bloch Tuesday, March 3, 2020
LOGISTICS
TAs and office hours
  Tuesday: Dr. Bloch (College of Architecture Cafe) - 11am - 11:55am
  Tuesday: TJ (VL C449 Cubicle D) - 1:30pm - 2:45pm
  Thursday: Hossein (VL C449 Cubicle B) - 10:45pm - 12:00pm
  Friday: Brighton (TSRB 523a) - 12pm - 1:15pm
Projects
  Proposals due Friday March 13, 2020
Midterm
  Thursday March 5th
  Sample midterm posted (do not share)
  Open notes but not open electronics (no space on the desks :()
RECAP: DICHOTOMY
The union bound naturally depends on the size of $\mathcal{H}$, but this is not what matters.
We will consider the number of hypotheses that lead to distinct labelings for a dataset.
Definition. (Dichotomy)
For a dataset $\mathcal{D} \triangleq \{x_i\}_{i=1}^N$ and a set of hypotheses $\mathcal{H}$, the set of dichotomies generated by $\mathcal{H}$ on $\mathcal{D}$ is
\[ \mathcal{H}(\{x_i\}_{i=1}^N) \triangleq \left\{ \{h(x_i)\}_{i=1}^N : h \in \mathcal{H} \right\} \]
By definition $|\mathcal{H}(\{x_i\}_{i=1}^N)| \leq 2^N$, and in general $|\mathcal{H}(\{x_i\}_{i=1}^N)| \ll |\mathcal{H}|$.
Definition. (Growth function)
For a set of hypotheses $\mathcal{H}$, the growth function of $\mathcal{H}$ is
\[ m_{\mathcal{H}}(N) \triangleq \max_{\{x_i\}_{i=1}^N} |\mathcal{H}(\{x_i\}_{i=1}^N)| \]
The growth function does not depend on the datapoints $\{x_i\}_{i=1}^N$, and $m_{\mathcal{H}}(N) \leq 2^N$.
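The gap between the number of hypotheses and the number of dichotomies can be checked numerically. The sketch below is only an illustration: the finite set of 1001 thresholds, the three datapoints, the helper names `dichotomies` and `make_ray`, and the convention sgn(0) = +1 are all assumptions made for this example and do not come from the slides.

```python
def dichotomies(hypotheses, points):
    """Return the distinct labelings {(h(x_1), ..., h(x_N)) : h in hypotheses}."""
    return {tuple(h(x) for x in points) for h in hypotheses}


def make_ray(a):
    """Threshold classifier x -> sgn(x - a), with the assumed convention sgn(0) = +1."""
    return lambda x: 1 if x >= a else -1


# Hypothetical finite hypothesis set: thresholds a = -5.00, -4.99, ..., 5.00.
hypotheses = [make_ray(k / 100) for k in range(-500, 501)]
# Hypothetical dataset of N = 3 points on the real line.
points = [-1.5, 0.25, 2.0]

labelings = dichotomies(hypotheses, points)
print("number of hypotheses :", len(hypotheses))
print("number of dichotomies:", len(labelings))
print("upper bound 2^N      :", 2 ** len(points))
```

Here 1001 hypotheses collapse to only 4 distinct labelings of the 3 points, which is below both $|\mathcal{H}|$ and $2^N = 8$.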
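RECAP: EXAMPLES
Positive rays:
\[ \mathcal{H} \triangleq \{h : \mathbb{R} \to \{\pm 1\} : x \mapsto \mathrm{sgn}(x - a) \mid a \in \mathbb{R}\} \]
\[ m_{\mathcal{H}}(N) = N + 1 \]
Positive intervals:
\[ \mathcal{H} \triangleq \{h : \mathbb{R} \to \{\pm 1\} : x \mapsto \mathbb{1}\{x \in [a; b]\} - \mathbb{1}\{x \notin [a; b]\} \mid a < b \in \mathbb{R}\} \]
\[ m_{\mathcal{H}}(N) = \binom{N+1}{2} + 1 = \frac{1}{2}N^2 + \frac{1}{2}N + 1 \]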
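Both growth functions can be confirmed by brute force for small $N$. The following is a minimal sketch, assuming that it suffices to try one threshold in each gap between consecutive sorted points (plus one below and one above all points). The function names are hypothetical.

```python
from math import comb


def cut_points(xs):
    """One candidate threshold below all points, one per gap, one above all points."""
    return [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]


def ray_dichotomies(points):
    """Dichotomies generated by positive rays x -> sgn(x - a) on distinct points."""
    xs = sorted(points)
    return {tuple(1 if x >= a else -1 for x in xs) for a in cut_points(xs)}


def interval_dichotomies(points):
    """Dichotomies generated by positive intervals [a; b] on distinct points."""
    xs = sorted(points)
    cuts = cut_points(xs)
    result = {tuple(-1 for _ in xs)}  # an interval that contains no datapoint
    for i, a in enumerate(cuts):
        for b in cuts[i + 1:]:
            result.add(tuple(1 if a <= x <= b else -1 for x in xs))
    return result


for n in range(1, 8):
    points = list(range(n))  # any n distinct reals give the same counts
    rays = len(ray_dichotomies(points))
    intervals = len(interval_dichotomies(points))
    print(n, rays, n + 1, intervals, comb(n + 1, 2) + 1)
```

Each printed row lists $N$, the brute-force count for rays next to $N + 1$, and the brute-force count for intervals next to $\binom{N+1}{2} + 1$.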
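EXAMPLES
Convex sets:
\[ \mathcal{H} \triangleq \{h : \mathbb{R}^2 \to \{\pm 1\} \mid \{x \in \mathbb{R}^2 : h(x) = +1\} \text{ is convex}\} \]
\[ m_{\mathcal{H}}(N) = 2^N \]
because we can generate all dichotomies.
Definition. (Shattering)
If $\mathcal{H}$ can generate all dichotomies on $\{x_i\}_{i=1}^N$, we say that $\mathcal{H}$ shatters $\{x_i\}_{i=1}^N$.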
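One way to see that convex sets can generate every dichotomy is to place the points on a circle and label as +1 the convex hull of the positive points. The sketch below checks this exhaustively for 8 points. The dataset (integer points on the circle $x^2 + y^2 = 25$), the hull-based choice of hypothesis, and the helper names are assumptions made for this illustration and are not taken from the slides.

```python
from itertools import product

# Assumed dataset: integer points on x^2 + y^2 = 25, listed counter-clockwise.
CIRCLE = [(5, 0), (4, 3), (3, 4), (0, 5), (-3, 4), (-4, 3), (-5, 0), (-4, -3)]


def cross(o, p, q):
    """Positive if q is strictly to the left of the directed line o -> p."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def outside_hull(vertices, q):
    """True if q is strictly outside the convex polygon with counter-clockwise vertices.

    With two vertices the polygon is a segment; a q collinear with it is
    conservatively reported as not outside (this never happens on a circle).
    """
    if len(vertices) < 2:
        return q not in vertices
    edges = zip(vertices, vertices[1:] + vertices[:1])
    return any(cross(u, v, q) < 0 for u, v in edges)


def convex_sets_shatter(points):
    """Check every labeling: the hull of the +1 points must exclude every -1 point."""
    for labels in product((1, -1), repeat=len(points)):
        positives = [p for p, y in zip(points, labels) if y == 1]
        negatives = [p for p, y in zip(points, labels) if y == -1]
        if not all(outside_hull(positives, q) for q in negatives):
            return False
    return True


print(convex_sets_shatter(CIRCLE))
```

Because the growth function is a maximum over datasets, exhibiting one dataset of size $N$ that is shattered is enough to conclude $m_{\mathcal{H}}(N) = 2^N$ for that $N$.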
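EXAMPLES
Linear classifiers:
\[ \mathcal{H} \triangleq \{h : \mathbb{R}^2 \to \{\pm 1\} : x \mapsto \mathrm{sgn}(w^\intercal x + b) \mid w \in \mathbb{R}^2, b \in \mathbb{R}\} \]
The growth function is a worst-case measure (a maximum over all datasets of size $N$), hence $m_{\mathcal{H}}(3) = 8$ even though three collinear points cannot be shattered.
No set of 4 points can be shattered, and $m_{\mathcal{H}}(4) = 14 < 2^4$.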
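The values 8 and 14 can be reproduced by enumeration. The sketch below relies on an argument that is not in the slides. It assumes the points are in general position (no three collinear). Any separating line can then be moved until it passes through two datapoints, and a small perturbation of a line through two datapoints can put those two points on either side independently. The function names are hypothetical.

```python
from itertools import combinations, product


def cross(o, p, q):
    """Positive if q is strictly to the left of the directed line o -> p."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def linear_dichotomies(points):
    """Dichotomies of 2-D points in general position realizable by sgn(w^T x + b)."""
    n = len(points)
    found = {tuple([1] * n), tuple([-1] * n)}
    for i, j in combinations(range(n), 2):
        # Side of every other point relative to the line through points i and j.
        side = [0] * n
        for k in range(n):
            if k not in (i, j):
                c = cross(points[i], points[j], points[k])
                if c == 0:
                    raise ValueError("points must be in general position")
                side[k] = 1 if c > 0 else -1
        # Perturbing the line lets points i and j take any pair of labels.
        for a, b in product((1, -1), repeat=2):
            labels = list(side)
            labels[i], labels[j] = a, b
            found.add(tuple(labels))
            found.add(tuple(-s for s in labels))  # same line, flipped orientation
    return found


triangle = [(0, 0), (1, 0), (0, 1)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(len(linear_dichotomies(triangle)), len(linear_dichotomies(square)))
```

For the square, the two missing dichotomies are the ones that give opposite corners the same label and adjacent corners different labels.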
WISHFUL THINKING
The growth function $m_{\mathcal{H}}$ can sometimes be smaller than $|\mathcal{H}|$.
What if we could use $m_{\mathcal{H}}$ instead of $|\mathcal{H}|$ in our analysis?
We know that with probability at least $1 - \delta$
\[ R(h^*) \leq \hat{R}_N(h^*) + \sqrt{\frac{1}{2N} \log \frac{2|\mathcal{H}|}{\delta}} \]
What if we could show that with probability at least $1 - \delta$
\[ R(h^*) \leq \hat{R}_N(h^*) + \sqrt{\frac{1}{2N} \log \frac{2 m_{\mathcal{H}}(N)}{\delta}} \; ? \]
Our ability to generalize then depends on the behavior of the growth function.
  If $m_{\mathcal{H}}(N)$ is exponential in $N$, we can't say much.
  If $m_{\mathcal{H}}(N)$ is polynomial in $N$, then $R(h^*) \approx \hat{R}_N(h^*)$ for $N$ large enough.
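The difference between the polynomial and exponential cases shows up directly in the square-root term. The sketch below evaluates that term for the hypothetical bound written with $m_{\mathcal{H}}(N)$, which the slide only poses as a question. The value $\delta = 0.05$, the two example growth functions, and the function name `penalty` are assumptions chosen for illustration.

```python
from math import log, sqrt


def penalty(log_count, n, delta):
    """sqrt((1 / (2n)) * log(2 * count / delta)), computed from log(count).

    Working with log(count) avoids overflow when count = 2^n is huge.
    """
    return sqrt((log(2) + log_count - log(delta)) / (2 * n))


delta = 0.05  # assumed confidence parameter
rows = []
for n in (10, 100, 1000, 10000):
    poly = penalty(log(n + 1), n, delta)   # m_H(N) = N + 1, as for positive rays
    expo = penalty(n * log(2), n, delta)   # m_H(N) = 2^N, as for convex sets
    rows.append((n, poly, expo))
    print(f"N={n:>6}  polynomial: {poly:.3f}  exponential: {expo:.3f}")
```

With a polynomial growth function the term shrinks toward zero as $N$ grows, while with $m_{\mathcal{H}}(N) = 2^N$ it stays above $\sqrt{\log(2)/2} \approx 0.59$ however large $N$ is.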
BREAK POINTS
Definition. (Break point)
If no data set of size $k$ can be shattered by $\mathcal{H}$, then $k$ is a break point for $\mathcal{H}$.
If $k$ is a break point then $m_{\mathcal{H}}(k) < 2^k$.
Examples
  Positive rays: break point $k = 2$
  Positive intervals: break point $k = 3$
  Convex sets: break point $k = \infty$
  Linear classifiers: break point $k = 4$
We will spend some time proving the following result.
Proposition.
If there exists any break point for $\mathcal{H}$, then $m_{\mathcal{H}}(N)$ is polynomial in $N$.
If there is no break point for $\mathcal{H}$, then $m_{\mathcal{H}}(N) = 2^N$.
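The break points listed in the examples follow mechanically from the growth functions: the smallest break point is the first $k$ with $m_{\mathcal{H}}(k) < 2^k$. A minimal sketch, assuming a finite search range `max_k` and a hypothetical helper name, is given below. The linear-classifier table only uses values up to $N = 4$, where $m_{\mathcal{H}}(1) = 2$ and $m_{\mathcal{H}}(2) = 4$ follow from the fact that three points can be shattered.

```python
from math import comb


def smallest_break_point(growth, max_k=20):
    """Smallest k <= max_k with growth(k) < 2**k, or None if none is found in range."""
    for k in range(1, max_k + 1):
        if growth(k) < 2 ** k:
            return k
    return None


linear_2d = {1: 2, 2: 4, 3: 8, 4: 14}  # growth function values for N = 1..4 only

results = {
    "positive rays": smallest_break_point(lambda n: n + 1),
    "positive intervals": smallest_break_point(lambda n: comb(n + 1, 2) + 1),
    "convex sets": smallest_break_point(lambda n: 2 ** n),
    "linear classifiers": smallest_break_point(lambda n: linear_2d[n], max_k=4),
}
for name, k in results.items():
    print(f"{name}: {k}")
```

A result of `None` only means that no break point was found up to `max_k`. For convex sets this agrees with $k = \infty$, but a finite search is not a proof.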
BREAK POINTS
Definition.
Assume $\mathcal{H}$ has break point $k$. $B(N, k)$ is the maximum number of dichotomies of $N$ points such that no subset of size $k$ can be shattered by the dichotomies.
$B(N, k)$ is a purely combinatorial quantity; it does not depend on $\mathcal{H}$.
Example.
Assume $\mathcal{H}$ has break point $2$. How many dichotomies can we generate on a set of size 3?
By definition, if $k$ is a break point for $\mathcal{H}$, then $m_{\mathcal{H}}(N) \leq B(N, k)$.
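For very small $N$, the quantity $B(N, k)$ can be computed by exhaustive search, which answers the question in the example. The sketch below is a brute-force illustration with hypothetical function names. It tries families of dichotomies from largest to smallest, so it is only practical for $N \leq 3$ or so.

```python
from itertools import combinations, product


def shatters_some_subset(family, n, k):
    """True if, on some k of the n positions, the family shows all 2**k patterns."""
    for idx in combinations(range(n), k):
        patterns = {tuple(d[i] for i in idx) for d in family}
        if len(patterns) == 2 ** k:
            return True
    return False


def brute_force_B(n, k):
    """Largest family of dichotomies on n points that shatters no subset of size k."""
    all_dichotomies = list(product((-1, 1), repeat=n))
    for size in range(len(all_dichotomies), 0, -1):
        for family in combinations(all_dichotomies, size):
            if not shatters_some_subset(family, n, k):
                return size, family
    return 0, ()


size, family = brute_force_B(3, 2)
print("B(3, 2) =", size)
for d in family:
    print(d)
```

With break point 2 on 3 points, the search returns 4: adding any fifth dichotomy to a maximal family makes some pair of points exhibit all four label patterns.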
SAUER'S LEMMA
Lemma.
$B(N, 1) = 1$, $B(1, k) = 2$ for $k > 1$, and
\[ \forall k > 1 \quad B(N, k) \leq B(N-1, k) + B(N-1, k-1) \]
Lemma (Sauer's lemma)
\[ B(N, k) \leq \sum_{i=0}^{k-1} \binom{N}{i} \]
$B(N, k)$ is polynomial in $N$ of degree $k - 1$.
Conclusion: if $\mathcal{H}$ has a break point, $m_{\mathcal{H}}(N)$ is polynomial in $N$.
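The link between the recursion and the binomial sum can be checked numerically. The sketch below treats the recursive inequality as an equality, which gives the largest values the recursion allows, and compares the result with the sum in Sauer's lemma. The function names are hypothetical and the check is an illustration, not a proof.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def recursion_bound(n, k):
    """Upper bound on B(n, k) from the boundary values and the recursion, taken with equality."""
    if k == 1:
        return 1
    if n == 1:
        return 2
    return recursion_bound(n - 1, k) + recursion_bound(n - 1, k - 1)


def sauer_bound(n, k):
    """Right-hand side of Sauer's lemma: sum of C(n, i) for i = 0, ..., k - 1."""
    return sum(comb(n, i) for i in range(k))


for k in (2, 3, 4):
    values = [sauer_bound(n, k) for n in range(1, 9)]
    print(f"k={k}: {values}")
```

The two functions agree on every pair tested, as expected from Pascal's rule $\binom{N}{i} = \binom{N-1}{i} + \binom{N-1}{i-1}$. For linear classifiers in the plane ($k = 4$), the bound at $N = 4$ is 15, consistent with $m_{\mathcal{H}}(4) = 14$.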