The Power of Selective Memory


Shai Shalev-Shwartz

Joint work with Ofer Dekel and Yoram Singer

Hebrew University, Jerusalem

Slide 2: Outline

• Online learning, loss bounds, etc.
• Hypothesis space: prediction suffix trees (PSTs)
• Margin of prediction and hinge loss
• An online learning algorithm
• Trading margin for depth of the PST
• Automatic calibration
• A self-bounded online algorithm for learning PSTs

Slide 3: Online Learning

• For t = 1, 2, …
• Get an instance x_t
• Predict a target ŷ_t based on x_t
• Get the true target y_t and suffer a loss ℓ(ŷ_t, y_t)
• Update the prediction mechanism
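
A minimal sketch of this protocol in Python; the learner interface, stream, and loss are generic placeholders rather than anything from the slides:

```python
def online_learning(learner, stream, loss):
    """Run the generic online protocol: predict, observe, suffer loss, update."""
    total_loss = 0.0
    for x_t, y_t in stream:             # round t: an instance and its true target
        y_hat = learner.predict(x_t)    # predict based on x_t alone
        total_loss += loss(y_hat, y_t)  # suffer loss once the truth is revealed
        learner.update(x_t, y_t)        # adjust the prediction mechanism
    return total_loss
```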

Slide 4: Analysis of Online Algorithms

• Relative loss bounds (external regret):

For any fixed hypothesis h, the cumulative loss of the online algorithm exceeds the cumulative loss of h by at most an additive regret term.

Slide 5: Prediction Suffix Tree (PST)

Each hypothesis is parameterized by a triplet; among its components is a context function, which assigns a real weight to each context (a node of the tree).
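
A minimal Python sketch of a PST hypothesis. Two assumptions, since the triplet itself is not spelled out in the transcript: the context function is stored as a map from suffix strings to weights, and the prediction on a history is the sign of the summed weights of the suffixes present in the tree (the reading consistent with the worked examples on slides 23 and 24).

```python
class PST:
    """Prediction suffix tree: contexts (strings over {+,-}) mapped to weights."""

    def __init__(self):
        self.g = {"": 0.0}   # context function; the empty context is the root

    def score(self, history: str) -> float:
        # Sum g over every suffix of the history that exists in the tree.
        return sum(self.g[history[i:]]
                   for i in range(len(history) + 1)
                   if history[i:] in self.g)

    def predict(self, history: str) -> int:
        return 1 if self.score(history) >= 0 else -1
```

Storing the tree as a suffix-to-weight map keeps both prediction and update linear in the length of the context, which is all the algorithms below need.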

Slide 6: PST Example

[Figure: an example PST; root weight 0, node weights -3, 1, -1, 4, -2, 7]

Slide 7: Margin of Prediction

• Margin of prediction: y h(x), the true target times the predicted score
• Hinge loss: max{0, 1 - y h(x)}

[Plot: the 0-1 loss and the hinge loss as functions of the margin, over margins from -3 to 3]
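
The two curves in the plot as a small Python sketch; the unit margin threshold in the hinge loss is an assumption, consistent with the worked examples later in the deck:

```python
def zero_one_loss(margin: float) -> float:
    """1 for a wrong prediction (non-positive margin), else 0."""
    return 1.0 if margin <= 0 else 0.0

def hinge_loss(margin: float) -> float:
    """max(0, 1 - margin): upper-bounds the 0-1 loss, zero only beyond margin 1."""
    return max(0.0, 1.0 - margin)
```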

Slide 8: Complexity of a Hypothesis

• Define the complexity of a hypothesis as …

• We can also extend g s.t. … and get …
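
The definition itself is not legible in the transcript; the squared Euclidean norm of the context function is the reading consistent with the numbers on slides 23 and 24 (for example, the depth-1 tree with weights ±1.41 has complexity about 4):

```latex
\[
  \|g\|^2 \;=\; \sum_{s} g(s)^2 ,
  \qquad \text{e.g.}\quad 1.41^2 + (-1.41)^2 \approx 4 .
\]
```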

Slide 9: Algorithm I: Learning an Unbounded-Depth PST

• Init: start from the all-zero context function (a single root with weight 0)
• For t = 1, 2, …
  • Get x_t and predict ŷ_t
  • Get y_t and suffer loss ℓ_t
  • Set the update step size
  • Update the weight vector
  • Update the tree
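
A sketch of one plausible reading of Algorithm I in Python, since the slide's formulas are missing. Assumptions: a Perceptron-style additive update is applied only when the hinge loss is positive, the correction is spread over the suffixes of the current history with a 2^(-d/2) damping at depth d (the geometric decay visible in the node weights of the example slides), the root is left at 0 as in those figures, and the step size tau is a placeholder.

```python
def algorithm_one(stream):
    """Online learning of an unbounded-depth PST (hedged reconstruction)."""
    g = {"": 0.0}          # context function; the empty context is the root
    history = ""
    for y_t in stream:     # y_t in {+1, -1}
        # Predict with the summed weights of all suffixes stored in the tree.
        score = sum(g.get(history[i:], 0.0) for i in range(len(history) + 1))
        y_hat = 1 if score >= 0 else -1      # predicted target
        loss = max(0.0, 1.0 - y_t * score)   # hinge loss
        if loss > 0:                         # update only on a margin error
            tau = 1.0                        # placeholder step size
            for i in range(len(history)):    # every non-empty suffix
                s = history[i:]
                g[s] = g.get(s, 0.0) + y_t * tau * 2.0 ** (-len(s) / 2.0)
        history += "+" if y_t > 0 else "-"   # the tree can grow with t: unbounded depth
    return g
```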

Slide 10: Example

y =
ŷ = ?
[Figure: the initial PST, a single root node with weight 0]

Slide 11: Example

y = +
ŷ = ?
[Figure: PST with root weight 0]

Slide 12: Example

y = +
ŷ = ? ?
[Figure: PST with root weight 0]

Slide 13: Example

y = + -
ŷ = ? ?
[Figure: PST with root weight 0 and a depth-1 node on the '+' branch with weight -.23]

Slide 14: Example

y = + -
ŷ = ? ? ?
[Figure: unchanged PST; root weight 0, '+' branch weight -.23]

Slide 15: Example

y = + - +
ŷ = ? ? ?
[Figure: PST with root weight 0 and node weights -.23, .23, .16 along the '+'/'-' branches]

Slide 16: Example

y = + - +
ŷ = ? ? ? -
[Figure: unchanged PST; root weight 0, node weights -.23, .23, .16]

Slide 17: Example

y = + - + -
ŷ = ? ? ? -
[Figure: PST with root weight 0 and node weights -.42, .23, .16, -.14, -.09 along the '+'/'-' branches]

Slide 18: Example

y = + - + -
ŷ = ? ? ? - +
[Figure: unchanged PST; root weight 0, node weights -.42, .23, .16, -.14, -.09]

Slide 19: Example

y = + - + - +
ŷ = ? ? ? - +
[Figure: PST with root weight 0 and node weights -.42, .41, .29, -.14, -.09, .09, .06 along the '+'/'-' branches]

Slide 20: Analysis

• Let (x_1, y_1), …, (x_T, y_T) be a sequence of examples, and assume that …
• Let h be an arbitrary hypothesis with context function g
• Let L be the cumulative loss of h on the sequence of examples. Then the number of mistakes of Algorithm I is bounded in terms of L and ‖g‖²

Slide 21: Proof Sketch

• Define a per-round progress measure
• Upper bound the total progress
• Lower bound the progress of each round
• The upper and lower bounds together give the bound in the theorem
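
The slide's formulas are missing; the standard potential-function skeleton for proofs of this shape, which the projection language on the next slide supports, runs as follows (the exact constants are an assumption):

```latex
% Per-round progress toward a fixed competitor g:
\[
  \Delta_t \;=\; \|g_t - g\|^2 \;-\; \|g_{t+1} - g\|^2 .
\]
% Summing over t telescopes, and g_1 = 0 gives the upper bound
\[
  \sum_t \Delta_t \;=\; \|g_1 - g\|^2 - \|g_{T+1} - g\|^2 \;\le\; \|g\|^2 .
\]
% A per-round lower bound on \Delta_t in terms of the algorithm's loss and
% the competitor's loss, combined with this upper bound, gives the theorem.
```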

Slide 22: Proof Sketch (Cont.)

Where does the lower bound come from?
• For simplicity, assume that …
• Define a Hilbert space of context functions
• The context function g_{t+1} is the projection of g_t onto a half-space determined by the current example, where f is the function representing that example in the Hilbert space
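
One concrete form of the projection statement; the unit margin in the constraint is carried over from the hinge loss and should be read as an assumption:

```latex
\[
  g_{t+1} \;=\; \mathop{\mathrm{argmin}}_{g'} \;\|g' - g_t\|^2
  \quad \text{s.t.} \quad y_t \,\langle g', f \rangle \;\ge\; 1 ,
\]
% i.e., g_{t+1} is the Euclidean projection of g_t onto the half-space of
% context functions that attain margin at least 1 on the current example,
% where f represents the example in the Hilbert space.
```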

Slide 23: Example Revisited

• The following hypothesis has cumulative loss of 2 and complexity of 2. Therefore, the number of mistakes is bounded above by 12.

y = + - + - + - + -

Slide 24: Example Revisited

• The following hypothesis has cumulative loss of 1 and complexity of 4. Therefore, the number of mistakes is bounded above by 18. But this tree is very shallow.

[Figure: a depth-1 PST with root weight 0 and leaf weights 1.41 and -1.41 on its two branches]

y = + - + - + - + -

Problem: the tree we learned is much deeper!
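
The theorem's formula is not legible in the transcript, but both numeric examples fit a mistake bound of the form M ≤ 2L + 4‖g‖²; treat the constants as an inference from the slides, not as the paper's verbatim statement:

```latex
\[
  M \;\le\; 2L + 4\|g\|^2 :
  \qquad 2 \cdot 2 + 4 \cdot 2 = 12 ,
  \qquad 2 \cdot 1 + 4 \cdot 4 = 18 .
\]
```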

Slide 25: Geometric Intuition

[Figure: geometric illustration of the projection update]

Slide 26: Geometric Intuition (Cont.)

Let's force g_{t+1} to be sparse by "canceling" the new coordinate.

Slide 27: Geometric Intuition (Cont.)

Now we can show that: …

Slide 28: Trading Margin for Sparsity

• We got that …
• If … is much smaller than …, we can still get a loss bound!
• Problem: what happens if … is very small and therefore …? Solution: tolerate small margin errors!
• Conclusion: if we tolerate small margin errors, we can get a sparser tree.

Slide 29: Automatic Calibration

• Problem: the value of … is unknown
• Solution: use the data itself to estimate it! More specifically:
• Denote …
• If we keep … then we get a mistake bound

Slide 30: Algorithm II: Learning a Self-Bounded-Depth PST

• Init: start from the all-zero context function
• For t = 1, 2, …
  • Get x_t and predict ŷ_t
  • Get y_t and suffer loss ℓ_t
  • If …, do nothing! Otherwise:
    • Set …
    • Set …
    • Set the depth bound d_t
    • Update w and the tree as in Algorithm I, up to depth d_t
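
A matching sketch of the self-bounded variant. The three "Set" formulas are missing from the transcript, so the margin test and the depth rule below are loud placeholders; the code only illustrates the control flow: skip rounds where the margin is already comfortable, otherwise update just the top d_t levels of the suffix path.

```python
import math

def algorithm_two(stream, theta=0.5):
    """Self-bounded-depth PST learning (control-flow sketch only)."""
    g = {"": 0.0}                        # context function; root = empty context
    history = ""
    for t, y_t in enumerate(stream, start=1):
        score = sum(g.get(history[i:], 0.0) for i in range(len(history) + 1))
        if y_t * score >= theta:         # placeholder margin test: do nothing
            pass
        else:                            # margin error: bounded-depth update
            tau = 1.0                                # placeholder step size
            d_t = int(math.ceil(math.log2(t + 1)))   # placeholder depth rule
            for d in range(1, min(d_t, len(history)) + 1):
                s = history[-d:]                     # suffix of depth d
                g[s] = g.get(s, 0.0) + y_t * tau * 2.0 ** (-d / 2.0)
        history += "+" if y_t > 0 else "-"
    return g
```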

Slide 31: Analysis: Loss Bound

• Let (x_1, y_1), …, (x_T, y_T) be a sequence of examples, and assume that …
• Let h be an arbitrary hypothesis with context function g
• Let L be the cumulative loss of h on the sequence of examples. Then Algorithm II satisfies a relative loss bound in terms of L and ‖g‖²

Slide 32: Analysis: Bounded Depth

• Under the previous conditions, the depth of all the trees learned by the algorithm is bounded above by …

Slide 33: Example Revisited: Performance of Algorithm II

• y = + - + - + - + - …
• Only 3 mistakes
• The last PST is of depth 5
• The margin is 0.61 (after normalization)
• The margin of the max-margin tree (of infinite depth) is 0.7071

[Figure: the final PST; root weight 0, node weights -.55, .55, .39, -.22, -.07, .07, .05, .03, -.05 along the '+'/'-' branches]
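
A small sanity check on the 0.7071 figure: assuming the normalized margin is the prediction score divided by the norm of the context function, the shallow tree from slide 24 attains exactly this value:

```latex
\[
  \frac{1.41}{\sqrt{1.41^2 + (-1.41)^2}}
  \;\approx\; \frac{1.41}{2}
  \;\approx\; 0.7071
  \;=\; \tfrac{1}{\sqrt{2}} .
\]
```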

Slide 34: Conclusions

• Discriminative online learning of PSTs
• Loss bound
• Trading margin for sparsity
• Automatic calibration

Future work
• Experiments
• Feature selection and extraction
• Support vector selection