The Power of Selective Memory
Transcript of The Power of Selective Memory
Power of Selective Memory. Slide 1
The Power of Selective Memory
Shai Shalev-Shwartz
Joint work with
Ofer Dekel, Yoram Singer
Hebrew University, Jerusalem
Slide 2
Outline
• Online learning, loss bounds, etc.
• Hypothesis space – PSTs
• Margin of prediction and hinge loss
• An online learning algorithm
• Trading margin for depth of the PST
• Automatic calibration
• A self-bounded online algorithm for learning PSTs
Slide 3
Online Learning
• For t = 1, 2, …
  • Get an instance x_t
  • Predict a target ŷ_t based on x_t
  • Get the true target y_t and suffer a loss
  • Update the prediction mechanism
Slide 4
Analysis of Online Algorithms
• Relative loss bounds (external regret): for any fixed hypothesis h, the cumulative loss of the online algorithm is at most the cumulative loss of h plus a regret term.
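In generic form (the transcript omits the slide's exact bound), a relative loss bound can be written as:

```latex
\underbrace{\sum_{t=1}^{T} \ell(\hat{y}_t, y_t)}_{\text{loss of the algorithm}}
\;\le\;
\underbrace{\sum_{t=1}^{T} \ell\bigl(h(x_t), y_t\bigr)}_{\text{loss of } h}
\;+\; R(h, T)
```

where the regret term R(h, T) typically grows with the complexity of the competing hypothesis h.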
Slide 5
Prediction Suffix Tree (PST)
Each hypothesis is parameterized by a triplet that includes a context function g, which assigns a real weight to each context (a node of the tree).
Slide 6
PST Example
[Figure: a PST whose nodes carry real-valued weights — root 0; children -3 and 1; deeper nodes -1, 4, -2, and 7]
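The prediction mechanism of a PST can be written down concretely. The following is a minimal sketch: the dictionary representation and the placement of the example's weights in the tree are illustrative assumptions, not the authors' exact formulation.

```python
def pst_predict(g, history):
    """Return the real-valued PST prediction: the sum of the weights of
    all suffixes of `history` that appear in the tree (the root is the
    empty context ''). The predicted label is the sign of this score."""
    score = g.get("", 0.0)  # root weight
    for i in range(len(history)):
        score += g.get(history[i:], 0.0)  # weight of each suffix, if present
    return score

# A toy tree loosely mirroring the slide's example weights
g = {"": 0.0, "+": -3.0, "-": 1.0, "-+": -1.0, "+-": 4.0}
print(pst_predict(g, "+-"))  # → 5.0 (root 0 + weight of '+-' + weight of '-')
```

The score only depends on the suffixes of the observed history that are stored in the tree, which is what makes a shallow tree a compact hypothesis.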
Slide 7
Margin of Prediction
• Margin of prediction: y·ŷ, the real-valued prediction ŷ signed by the true label y
• Hinge loss: max{0, 1 − y·ŷ}
[Figure: the hinge loss as a function of the margin, upper-bounding the 0-1 loss]
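These two definitions can be stated as a small sketch (standard forms; the slides' exact margin parameter is not shown in the transcript):

```python
def margin(y, y_hat):
    # y in {-1, +1}, y_hat a real-valued prediction
    return y * y_hat

def hinge_loss(y, y_hat):
    # Upper-bounds the 0-1 loss: it is >= 1 whenever sign(y_hat) != y
    return max(0.0, 1.0 - margin(y, y_hat))
```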
Slide 8
Complexity of Hypothesis
• Define the complexity of a hypothesis as the norm of its context function g
• We can also extend g so that it assigns a weight of zero to every context outside the tree, and get an equivalent hypothesis defined over all strings
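One concrete choice, consistent with the Hilbert-space view used later in the proof sketch, is the squared norm of the context function (an assumption; the slide's exact definition is not shown in the transcript):

```latex
\operatorname{comp}(h) \;=\; \|g\|^2 \;=\; \sum_{s} g(s)^2
```

where the sum ranges over all contexts s; extending g with zeros outside the tree leaves this sum unchanged.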
Slide 9
Algorithm I: Learning Unbounded-Depth PST
• Init: start with the empty tree
• For t = 1, 2, …
  • Get the current context and predict ŷ_t
  • Get y_t and suffer the loss
  • Set the update step size from the loss
  • Update the weight vector
  • Update the tree along the suffix path of the current context
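A hedged sketch of what such an algorithm might look like on a binary sequence; the step size, the per-depth decay, and the exact update rule are illustrative assumptions, not the constants from the slides:

```python
def pst_score(g, context):
    # Sum the weights of the root and of every stored suffix of the context
    return g.get("", 0.0) + sum(g.get(context[i:], 0.0) for i in range(len(context)))

def algorithm1(labels, eta=0.5, decay=0.5):
    """Online learning of an unbounded-depth PST over labels in {-1, +1}.
    On every round with positive hinge loss, grow the tree along the
    suffix path of the current context and push each node's weight
    toward the correct label, shrinking geometrically with depth."""
    g = {"": 0.0}
    mistakes = 0
    for t, y in enumerate(labels):
        context = "".join("+" if s > 0 else "-" for s in labels[:t])
        y_hat = pst_score(g, context)
        if (1 if y_hat >= 0 else -1) != y:
            mistakes += 1
        loss = max(0.0, 1.0 - y * y_hat)
        if loss > 0:
            for depth in range(len(context) + 1):
                suffix = context[len(context) - depth:]
                g[suffix] = g.get(suffix, 0.0) + eta * y * decay ** depth
    return g, mistakes
```

Note that this sketch grows the tree as deep as the full context on every lossy round — exactly the behavior that motivates the depth-bounded Algorithm II later in the talk.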
Slide 10
Example
y =
ŷ = ?
[Figure: the initial PST; root weight 0]
Slide 11
Example
y = +
ŷ = ?
[Figure: the PST after this round; root weight 0]
Slide 12
Example
y = +
ŷ = ? ?
[Figure: the PST after this round; root weight 0]
Slide 13
Example
y = + -
ŷ = ? ?
[Figure: the PST after this round; node weights 0 (root) and -.23 on the '+' branch]
Slide 14
Example
y = + -
ŷ = ? ? ?
[Figure: the PST after this round; node weights 0 (root) and -.23 on the '+' branch]
Slide 15
Example
y = + - +
ŷ = ? ? ?
[Figure: the PST after this round; node weights 0 (root), -.23, .23, and .16]
Slide 16
Example
y = + - +
ŷ = ? ? ? -
[Figure: the PST after this round; node weights 0 (root), -.23, .23, and .16]
Slide 17
Example
y = + - + -
ŷ = ? ? ? -
[Figure: the PST after this round; node weights 0 (root), -.42, .23, .16, -.14, and -.09]
Slide 18
Example
y = + - + -
ŷ = ? ? ? - +
[Figure: the PST after this round; node weights 0 (root), -.42, .23, .16, -.14, and -.09]
Slide 19
Example
y = + - + - +
ŷ = ? ? ? - +
[Figure: the PST after this round; node weights 0 (root), -.42, .41, .29, -.14, -.09, .09, and .06]
Slide 20
Analysis
• Let (x₁, y₁), …, (x_T, y_T) be a sequence of examples, and assume the instances are bounded
• Let h be an arbitrary hypothesis
• Let L(h) be the cumulative loss of h on the sequence of examples. Then, the loss of the algorithm is bounded in terms of L(h) and the complexity of h
Slide 21
Proof Sketch
• Define a progress measure Δ_t (e.g. ‖g_t − g‖² − ‖g_{t+1} − g‖² for a competing context function g)
• Upper bound the sum of the Δ_t
• Lower bound each Δ_t in terms of the loss
• Combining the upper and lower bounds gives the bound in the theorem
Slide 22
Proof Sketch (Cont.)
Where does the lower bound come from?
• For simplicity, assume that … and …
• Define a Hilbert space of context functions
• The context function g_{t+1} is the projection of g_t onto the half-space of functions attaining margin on the current example, where f is the function representing the current context
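Under one standard formalization (an assumption; the transcript omits the formulas), the Hilbert space and the projection can be written as:

```latex
\langle f, g \rangle \;=\; \sum_{s} f(s)\, g(s),
\qquad
g_{t+1} \;=\; \operatorname*{argmin}_{g \,:\, y_t \langle g, f \rangle \,\ge\, 1} \|g - g_t\|^2
```

that is, g_{t+1} is the Euclidean projection of g_t onto the set of context functions achieving margin at least 1 on round t.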
Slide 23
Example revisited
• The following hypothesis has cumulative loss of 2 and complexity of 2. Therefore, the number of mistakes is bounded above by 12.
y = + - + - + - + -
Slide 24
Example revisited
• The following hypothesis has a cumulative loss of 1 and a complexity of 4. Therefore, the number of mistakes is bounded above by 18. But this tree is very shallow:
[Figure: a depth-1 PST; root 0, with child weights 1.41 and -1.41 on the '+' and '−' branches]
y = + - + - + - + -
Problem: the tree we learned is much deeper!
Slide 25
Geometric Intuition
Slide 26
Geometric Intuition (Cont.)
Let's force g_{t+1} to be sparse by "canceling" the new coordinate
Slide 27
Geometric Intuition (Cont.)
Now we can show that: …
Slide 28
Trading Margin for Sparsity
• We got that …
• If … is much smaller than …, we can get a loss bound!
• Problem: what happens if the margin is very small, and therefore …? Solution: tolerate small margin errors!
• Conclusion: if we tolerate small margin errors, we can get a sparser tree
Slide 29
Automatic Calibration
• Problem: the value of … is unknown
• Solution: use the data itself to estimate it! More specifically:
• Denote …
• If we keep …, then we get a mistake bound
Slide 30
Algorithm II: Learning Self-Bounded-Depth PST
• Init: …
• For t = 1, 2, …
  • Get the current context and predict ŷ_t
  • Get y_t and suffer the loss
  • If the example already has sufficient margin, do nothing! Otherwise:
    • Set …
    • Set …
    • Set the depth bound d_t
    • Update w and the tree as in Algorithm I, up to depth d_t
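A hedged sketch of the self-bounded variant, building on the Algorithm I sketch above: updates are skipped when the example already has enough margin, and each update only grows the tree up to a depth bound. The margin threshold `theta` and the depth rule are illustrative assumptions, not the slides' calibrated quantities.

```python
def algorithm2(labels, eta=0.5, decay=0.5, theta=0.25, max_extra_depth=2):
    """Online PST learning over labels in {-1, +1} with selective memory:
    rounds with margin >= theta trigger no update, and lossy rounds grow
    the tree at most `max_extra_depth` levels past its current depth."""
    g = {"": 0.0}
    for t, y in enumerate(labels):
        context = "".join("+" if s > 0 else "-" for s in labels[:t])
        y_hat = g.get("", 0.0) + sum(g.get(context[i:], 0.0)
                                     for i in range(len(context)))
        if y * y_hat >= theta:
            continue  # enough margin: do nothing (the "selective memory")
        # Depth bound d_t: current tree depth plus a small allowance (assumed rule)
        tree_depth = max(len(s) for s in g)
        d_t = min(len(context), tree_depth + max_extra_depth)
        for depth in range(d_t + 1):
            suffix = context[len(context) - depth:]
            g[suffix] = g.get(suffix, 0.0) + eta * y * decay ** depth
    return g
```

On the alternating sequence + - + - …, this sketch stops updating once the stored suffixes predict with sufficient margin, so the tree stays shallow rather than growing with the sequence length.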
Slide 31
Analysis – Loss Bound
• Let (x₁, y₁), …, (x_T, y_T) be a sequence of examples, and assume the instances are bounded
• Let h be an arbitrary hypothesis
• Let L(h) be the cumulative loss of h on the sequence of examples. Then, the loss of the algorithm is bounded in terms of L(h) and the complexity of h
Slide 32
Analysis – Bounded Depth
• Under the previous conditions, the depth of all the trees learned by the algorithm is bounded above by …
Slide 33
Example Revisited: Performance of Algorithm II
• y = + - + - + - + - …
• Only 3 mistakes
• The last PST is of depth 5
• The margin is 0.61 (after normalization)
• The margin of the max-margin tree (of infinite depth) is 0.7071
[Figure: the final PST; node weights 0 (root), -.55, +.55, .39, -.22, -.07, .07, .05, .03, and -.05]
Slide 34
Conclusions
• Discriminative online learning of PSTs
• Loss bounds
• Trading margin for sparsity
• Automatic calibration

Future Work
• Experiments
• Feature selection and extraction
• Support vector selection