Tutorial: PART 2 - Princeton University Computer Science · 2019-07-31

Tutorial: PART 2. Online Convex Optimization, A Game-Theoretic Approach to Learning. Elad Hazan (Princeton University), Satyen Kale (Yahoo Research)

Transcript of Tutorial: PART 2

Page 1:

Tutorial: PART 2

Online Convex Optimization, A Game-Theoretic Approach to Learning

Elad Hazan (Princeton University)        Satyen Kale (Yahoo Research)

Page 2:

Exploiting curvature: logarithmic regret

Page 3:

• Some natural problems admit better regret:
  • Least-squares linear regression
  • Soft-margin SVM
  • Portfolio selection

• α-strong convexity (e.g. f(x) = ∥x∥²):

  ∇²f(x) ⪰ αI

• α-exp-concavity (e.g. f(x) = −log(a⊤x)):

  ∇²f(x) ⪰ α ∇f(x) ∇f(x)⊤

Logarithmic regret in both cases.

Page 4:

Online Soft-Margin SVM

• Features a ∈ ℝⁿ, labels b ∈ {−1, 1}
• Hypothesis class H is a set of linear predictors:
  • Parameters: weight vectors x ∈ ℝⁿ
  • Prediction of weight vector x for feature vector a is a⊤x
• Loss: f_t(x) = max(0, 1 − b_t a_t⊤x) + λ∥x∥²
  (the hinge term is convex but non-smooth; the regularizer is strongly convex)
• Thm [Hazan, Agarwal, Kale 2007]: OGD with step sizes η_t ≈ 1/t attains O(log T) regret
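To make the theorem concrete, here is a minimal sketch (the helper names and the 1-D toy stream are mine, not from the slides) of OGD with step sizes η_t = 1/(λt) on the soft-margin objective; with λ-strong convexity the iterates settle near the regularized-hinge minimizer:

```python
import numpy as np

def ogd_strongly_convex(data, lam=0.1):
    """OGD with eta_t = 1/(lam*t) on f_t(x) = max(0, 1 - b*a.x) + lam*||x||^2."""
    n = data[0][0].shape[0]
    x = np.zeros(n)
    for t, (a, b) in enumerate(data, start=1):
        # Subgradient of the hinge term, plus the gradient of the regularizer.
        hinge_grad = -b * a if b * (a @ x) < 1 else np.zeros(n)
        g = hinge_grad + 2 * lam * x
        x = x - g / (lam * t)          # eta_t = 1/(lam*t): decaying step sizes
    return x

# 1-D toy stream: alternating examples (a, b) = ([1], +1) and ([-1], -1).
# The minimizer of the average loss max(0, 1 - x) + 0.1*x^2 is x = 1.
data = [(np.array([1.0]), 1), (np.array([-1.0]), -1)] * 25
x_final = ogd_strongly_convex(data)
```

The decaying η_t ≈ 1/(λt) schedule is what yields the O(log T) bound here, in contrast to the η_t ≈ 1/√t used for general convex losses.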

Page 5:

Universal Portfolio Selection

• Loss function f_t(x) = −log(r_t⊤x)
• Convex, but with no strongly convex component
• But exp-concave: exp(−f_t(x)) = r_t⊤x, a concave (in fact, linear) function
• Definition: f is called α-exp-concave if exp(−αf(x)) is concave
• Theorem [Hazan, Agarwal, Kale 2007]: For exp-concave losses, there is an algorithm, Online Newton Step, that obtains O(log T) regret

Page 6:

Online Newton Step algorithm

• Use x₁ = arbitrary point in K in round 1
• For t > 1: use x_t defined by the following equations:

  ∇_{t−1} = ∇f_{t−1}(x_{t−1})

  C_{t−1} = Σ_{τ=1}^{t−1} ∇_τ ∇_τ⊤                    ≈ cumulative Hessian

  g_{t−1} = Σ_{τ=1}^{t−1} ∇_τ ∇_τ⊤ x_τ − c ∇_τ        ≈ cumulative gradient step

  y_t = C_{t−1}⁻¹ g_{t−1}                              ≈ Newton update

  x_t = argmin_{x∈K} (x − y_t)⊤ C_{t−1} (x − y_t)      (projection in the norm induced by C_{t−1})

• c is a constant depending on α and other bounds
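As an illustration, here is a minimal sketch of these equations for two-asset portfolio selection (the simplex parametrization, the eps*I stabilizer, and all names are my additions, not from the slides):

```python
import numpy as np

def ons_portfolio(returns, c=1.0, eps=1.0):
    """Online Newton Step sketch for a 2-asset portfolio, losses f_t(x) = -log(r_t.x)."""
    e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    x = np.array([0.5, 0.5])
    C = eps * np.eye(2)            # eps*I keeps C invertible before gradients accumulate
    g = np.zeros(2)
    for r in returns:
        grad = -r / (r @ x)                 # gradient of -log(r.x)
        C += np.outer(grad, grad)           # ~ cumulative Hessian
        g += grad * (grad @ x) - c * grad   # ~ cumulative gradient step
        y = np.linalg.solve(C, g)           # ~ Newton update
        # Projection onto the simplex in the norm induced by C: write x = e2 + p*(e1-e2)
        # and minimize the quadratic (x - y)^T C (x - y) over p in [0, 1].
        d = e1 - e2
        p = float(np.clip((d @ C @ (y - e2)) / (d @ C @ d), 0.0, 1.0))
        x = e2 + p * d
    return x

# Asset 0 doubles, asset 1 halves, every round: weight should move to asset 0.
weights = ons_portfolio([np.array([2.0, 0.5])] * 100)
```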

Page 7:

Regularization: follow the regularized leader, online mirror descent

Page 8:

Follow-The-Leader

• Regret = Σ_t f_t(x_t) − min_{x*∈K} Σ_t f_t(x*)

• Most natural: follow the leader,

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} f_i(x)

• The be-the-leader algorithm, x'_t = argmin_{x∈K} Σ_{i=1}^{t} f_i(x) = x_{t+1}, has zero regret [Kalai-Vempala, 2005]

• So if f_t(x_t) ≈ f_t(x_{t+1}), we get a regret bound

• Instability: f_t(x_t) − f_t(x_{t+1}) can be large!
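The instability is easy to see numerically. A standard 1-D construction (my rendering of the usual alternating-loss stream, not from the slides): on K = [−1, 1] with f_t(x) = c_t·x and c = (0.5, −1, +1, −1, …), FTL commits each round to the endpoint that the next loss punishes, so its regret grows linearly in T:

```python
def ftl_regret(T):
    """Regret of Follow-The-Leader on K = [-1, 1] with alternating linear losses."""
    cs = [0.5] + [(-1) ** t for t in range(1, T)]   # 0.5, -1, +1, -1, ...
    cum = 0.0        # cumulative loss coefficient seen so far
    ftl_loss = 0.0
    for c in cs:
        # FTL plays argmin over [-1, 1] of cum * x.
        x = 0.0 if cum == 0 else (-1.0 if cum > 0 else 1.0)
        ftl_loss += c * x
        cum += c
    best_fixed = min(sum(cs) * x for x in (-1.0, 0.0, 1.0))
    return ftl_loss - best_fixed

# FTL suffers loss ~1 every round after the first, while the fixed point x = 0
# suffers none: regret is about T.
```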

Page 9:

Fixing FTL: Follow-The-Regularized-Leader (FTRL)

• First, it suffices to replace f_t by a linear function, ⟨∇f_t(x_t), x⟩

• Next, add regularization:

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} ∇f_i(x_i)⊤x + (1/η) R(x)

• R(x) is a strongly convex function, η is a regularization constant

• Under certain conditions, this ensures stability:

  ∇f_t(x_t)·x_t − ∇f_t(x_t)·x_{t+1} ≈ O(η)

Page 10:

Specific instances of FTRL

• R(x) = ∥x∥²

• Here, with the Euclidean projection Π_K(y) = argmin_{x∈K} ∥y − x∥,

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} ∇f_i(x_i)⊤x + (1/η) R(x) = Π_K( −η Σ_{i=1}^{t−1} ∇f_i(x_i) )

• Almost like OGD: starting with y₁ = 0, for t = 1, 2, …

  x_t = Π_K(y_t)
  y_{t+1} = y_t − η ∇f_t(x_t)
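A minimal sketch of this equivalence (the unit-ball choice of K and all names are mine, not from the slides): the closed-form FTRL iterate Π_K(−η · sum of past gradients) coincides with the "lazy" recursion y_{t+1} = y_t − η g_t, x_t = Π_K(y_t):

```python
import numpy as np

def proj_ball(y, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def ftrl_quadratic(grads, eta):
    """FTRL with R(x) = ||x||^2 over the unit ball: x_t = Proj(-eta * sum of past grads)."""
    xs, s = [], np.zeros_like(grads[0])
    for g in grads:
        xs.append(proj_ball(-eta * s))   # play before seeing the round's gradient
        s = s + g
    return xs

grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0]), np.array([-1.0, 1.0])]
xs = ftrl_quadratic(grads, eta=0.5)
```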

Page 11:

Specific instances of FTRL

• Experts setting: K = Δₙ, the distributions over n experts
• f_t(x) = c_t⊤x, where c_t is the vector of losses
• R(x) = Σᵢ xᵢ log xᵢ : negative entropy

• Here,

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} ∇f_i(x_i)⊤x + (1/η) R(x) = exp( −η Σ_{i=1}^{t−1} c_i ) / Z_t

  (entrywise exponential; Z_t is a normalization constant)

• Exactly RWM! Starting with y₁ = the uniform distribution, for t = 1, 2, …

  x_t = y_t / ∥y_t∥₁                ("projection" = normalization)
  y_{t+1} = y_t ⊙ exp(−η c_t)       (entrywise product with an entrywise exponential)
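A minimal sketch of the multiplicative form (the two-expert loss stream is mine, not from the slides): normalize, then multiply entrywise by exp(−η c_t):

```python
import numpy as np

def rwm(loss_vectors, eta=0.5):
    """Entropic FTRL in multiplicative form (RWM); returns the played distributions."""
    y = np.ones(len(loss_vectors[0]))   # uniform initial weights
    xs = []
    for c in loss_vectors:
        xs.append(y / y.sum())                    # "projection" = normalization
        y = y * np.exp(-eta * np.asarray(c))      # entrywise multiplicative update
    return xs

# Expert 0 never suffers loss; the distribution concentrates on it exponentially fast.
dist = rwm([[0.0, 1.0]] * 20)[-1]
```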

Page 12:

FTRL ⇔ Online Mirror Descent

FTRL:

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} ∇f_i(x_i)⊤x + (1/η) R(x)

Online Mirror Descent:

  x_t = Π^R_K(y_t)
  y_{t+1} = (∇R)⁻¹( ∇R(y_t) − η ∇f_t(x_t) )

Bregman projection:

  Π^R_K(y) = argmin_{x∈K} B_R(x∥y)

  B_R(x∥y) := R(x) − R(y) − ∇R(y)⊤(x − y)

Page 13:

Advanced regularization: follow the perturbed leader, AdaGrad

Page 14:

Randomized regularization: Follow-The-Perturbed-Leader

• Special case of OCO, Online Linear Optimization: f_t(x) = c_t⊤x

• Follow-The-Perturbed-Leader [Kalai-Vempala 2005]:

• Regularization p⊤x, with the vector p drawn u.a.r. from [−√T, √T]ⁿ:

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} c_i⊤x + p⊤x

• Expected regret is the same if p is chosen afresh every round
• Re-randomizing every round ⇒ x_t = x_{t+1} w.p. at least 1 − 1/√T
• ⇒ regret = O(√T)
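A minimal sketch in the experts setting (the vertex domain, the toy loss stream, and all names are mine, not from the slides): draw one perturbation up front and, each round, solve a single linear optimization over the vertices:

```python
import numpy as np

def ftpl(loss_vectors, rng):
    """Follow-The-Perturbed-Leader over the n vertices (experts); returns total loss."""
    T, n = len(loss_vectors), len(loss_vectors[0])
    p = rng.uniform(-np.sqrt(T), np.sqrt(T), size=n)  # single perturbation, drawn once
    cum = np.zeros(n)
    total = 0.0
    for c in loss_vectors:
        i = int(np.argmin(cum + p))   # linear optimization: the perturbed leader
        total += c[i]
        cum += np.asarray(c, dtype=float)
    return total

# Expert 0 never suffers loss; after at most ~2*sqrt(T) perturbation-induced
# mistakes, FTPL locks onto it.
losses = [[0.0, 1.0]] * 100
total = ftpl(losses, np.random.default_rng(0))
```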

Page 15:

Follow-The-Perturbed-Leader

• Algorithm: p is a random vector with entries in [−√T, √T]:

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} c_i⊤x + p⊤x

• Implementation requires only linear optimization over K, easier than the projections needed by OMD

• The O(√T) regret bound holds for arbitrary sets K: convexity is unnecessary!

• Can be extended to convex losses using the gradient trick + using the expected perturbed leader in each round

Page 16:

Adaptive Regularization: AdaGrad

• Consider the use of OGD in learning a generalized linear model
  • For parameter x ∈ ℝⁿ and example (a, b) ∈ ℝⁿ × ℝ, the prediction is some function of a⊤x
  • Thus, ∇f_t(x) = ℓ′(a_t⊤x, b_t) a_t
• OGD update: x_{t+1} = x_t − η ∇f_t(x_t) = x_t − η ℓ′(a_t⊤x_t, b_t) a_t
  • Thus, all features are treated equally in updating the parameter vector
• In typical text-classification tasks, the feature vectors a_t are very sparse
  • Slow learning!
• Adaptive regularization: implements per-feature learning rates

Page 17:

AdaGrad (diagonal form) [Duchi, Hazan, Singer '10]

• Set x₁ ∈ K arbitrarily
• For t = 1, 2, …:
  1. use x_t, obtain f_t
  2. compute x_{t+1} as follows:

  G_t = diag( Σ_{i=1}^{t} ∇f_i(x_i) ∇f_i(x_i)⊤ )

  y_{t+1} = x_t − η G_t^{−1/2} ∇f_t(x_t)

  x_{t+1} = argmin_{x∈K} (y_{t+1} − x)⊤ G_t (y_{t+1} − x)

• Regret bound: O( Σᵢ √( Σ_t ∇f_t(x_t)ᵢ² ) )

• Infrequently occurring, or small-scale, features have a small influence on the regret (and therefore on convergence to the optimal parameter)
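A minimal unconstrained sketch (so the G_t-projection is a no-op; the synthetic gradient stream and names are mine, not from the slides): coordinate 1 appears in only 1 of every 10 rounds, yet its per-occurrence step stays large:

```python
import numpy as np

def adagrad_diag(grads, eta=1.0, delta=1e-12):
    """Diagonal AdaGrad, unconstrained: per-coordinate steps eta/sqrt(sum of g_i^2)."""
    x = np.zeros_like(grads[0])
    G = np.zeros_like(x)           # running sum of squared gradients, per coordinate
    for g in grads:
        G = G + g * g
        x = x - eta * g / (np.sqrt(G) + delta)
    return x

# Coordinate 0: gradient 1.0 every round. Coordinate 1: gradient 1.0 every 10th round.
grads = [np.array([1.0, 1.0 if t % 10 == 0 else 0.0]) for t in range(100)]
x = adagrad_diag(grads)
```

The rare feature's k-th update uses step 1/√k over only 10 occurrences, so its average per-occurrence movement is much larger than the frequent feature's.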

Page 18:

Bandit Convex Optimization

Page 19:

Bandit Optimization

• Motivating example: ad selection on websites

• Simplest form: in each round t
  • Choose one ad i_t out of n
  • Simultaneously, the user chooses a loss vector c_t
    (c_t(i) = −revenue if ad i would be clicked if shown, 0 otherwise)
  • Observe only the loss of the chosen ad, c_t(i_t)

• Essentially, the experts problem with partial feedback
  • Modeled as online linear optimization over the n-simplex of distributions
  • Available choices are usually called "arms" (from multi-armed bandits)
  • Randomization is necessary, so consider the expected regret:

  Regret = Σ_{t=1}^{T} c_t(i_t) − min_i Σ_{t=1}^{T} c_t(i)

Page 20:

Constructing cost estimators

(Figure: the simplex with vertices e₁, e₂, …, eₙ.)

Exploration:
• Choose arm i_t from {1, 2, …, n} u.a.r.
• Observe c_t(i_t) and set c̃_t = n c_t(i_t) e_{i_t}, so that E[c̃_t] = c_t

Exploitation:

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} c̃_i⊤x + (1/η) R(x)

Page 21:

Separate explore-exploit template

Algorithm:
1. w.p. q, explore (as before), creating an estimator with E[c̃_t] = c_t
2. otherwise, exploit:

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} c̃_i⊤x + (1/η) R(x)

Regret is essentially (ignoring the dependence on n) bounded by

  ≈ q·T + (1 − q)·Regret_FTRL = O( q·T + √(T/q) ) = O(T^{2/3})   (taking q = T^{−1/3})

Can we do better?

Page 22:

Simultaneous exploration-exploitation [Auer, Cesa-Bianchi, Freund, Schapire 2002]

Exploration + Exploitation:
• Choose arm i_t from the distribution x_t
• Observe c_t(i_t) and set c̃_t = ( c_t(i_t) / x_t(i_t) ) e_{i_t}, so that E[c̃_t] = c_t

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} c̃_i⊤x + (1/η) R(x)

With R = negative entropy, this algorithm (called EXP3) has O(√T) regret
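A minimal EXP3 sketch (the toy loss matrix and learning-rate choice are mine, not from the slides): sample from the current distribution, importance-weight the single observed loss, and apply the multiplicative update to the estimator:

```python
import numpy as np

def exp3(loss_matrix, eta, rng):
    """EXP3 sketch: returns the algorithm's total realized loss."""
    T, n = loss_matrix.shape
    w = np.ones(n)
    total = 0.0
    for t in range(T):
        x = w / w.sum()
        i = int(rng.choice(n, p=x))      # sample an arm from x_t
        loss = loss_matrix[t, i]
        total += loss
        c_hat = np.zeros(n)
        c_hat[i] = loss / x[i]           # importance weighting: E[c_hat] = c_t
        w = w * np.exp(-eta * c_hat)     # RWM update on the estimator
    return total

rng = np.random.default_rng(1)
T, n = 2000, 3
loss_matrix = rng.uniform(size=(T, n))
loss_matrix[:, 0] *= 0.2                 # arm 0 is clearly best
total = exp3(loss_matrix, eta=np.sqrt(np.log(n) / (n * T)), rng=rng)
```

Over 2000 rounds the algorithm's total loss ends up far below that of either bad fixed arm.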

Page 23:

Bandit Linear Optimization

• Generalize from multi-armed bandits
• K: a convex set in ℝⁿ
• Linear cost functions: f_t(x) = c_t⊤x

• Approach:
  1. (Exploration) construct unbiased estimators: E[c̃_t] = c_t
  2. (Exploitation) plug them into FTRL:

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} c̃_i⊤x + (1/η) R(x)

Page 24:

Bandit Linear Optimization

• K: a convex set in ℝⁿ, linear cost functions f_t(x) = c_t⊤x
• Assume K contains an "exploration basis" e₁, e₂, …, eₙ
• In round t:
  1. W.p. q: choose i_t ∈ {1, 2, …, n} u.a.r. and play e_{i_t}. Observe c_t(i_t) and set

     c̃_t = ( c_t(i_t) / (q/n) ) e_{i_t},   so that E[c̃_t] = c_t

  2. W.p. 1 − q: play x_t obtained from FTRL,

     x_t = argmin_{x∈K} Σ_{i=1}^{t−1} c̃_i⊤x + (1/η) R(x),

     and set c̃_t = 0

• Gives a regret bound of O(T^{2/3})

Page 25:

Geometric regularization via barriers [Abernethy, Hazan, Rakhlin 2008]

• R = a self-concordant barrier for K, used in FTRL:

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} c̃_i⊤x + (1/η) R(x)

• The Dikin ellipsoid at any point x lies in K: { y : (y − x)⊤ ∇²R(x) (y − x) ≤ 1 } ⊆ K

• Algorithm: play a random endpoint of the axes of the ellipsoid at x_t
  • ≡ using x_t in expectation (exploitation)
  • the 2n endpoints suffice to construct c̃_t (exploration)

• Thm: Regret = O(√T)

Page 26:

Bandit Online Convex Optimization [Flaxman, Kalai, McMahan 2005]

f_t: arbitrary convex functions

Approach:
1. Construct an estimator c̃_t of ∇f_t(x_t) from a single function value, using

  ∇f(x) ≈ (n/δ) E_{u∈S^{n−1}}[ f(x + δu) u ]

2. Plug it into FTRL:

  x_t = argmin_{x∈K} Σ_{i=1}^{t−1} c̃_i⊤x + (1/η) R(x)

• Near the boundary, the estimators have large variance
• Hence, x_t and x_{t+1} can be very different
• Regret bound = O(T^{3/4})
• Recent developments: O(√T) regret is possible!
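The spherical estimator is easy to verify empirically (the quadratic test function, the n/δ scaling, and all names are mine, not from the slides): averaging (n/δ)·f(x + δu)·u over random unit directions u recovers the gradient:

```python
import numpy as np

def sphere_grad_estimate(f, x, delta, num_samples, rng):
    """Monte-Carlo average of (n/delta) * f(x + delta*u) * u, u uniform on S^{n-1}."""
    n = len(x)
    total = np.zeros(n)
    for _ in range(num_samples):
        u = rng.normal(size=n)
        u /= np.linalg.norm(u)       # uniform random direction on the unit sphere
        total += f(x + delta * u) * u
    return (n / delta) * total / num_samples

f = lambda z: z @ z                  # f(z) = ||z||^2, with gradient 2z
x = np.array([1.0, -2.0, 0.5])
est = sphere_grad_estimate(f, x, delta=0.25, num_samples=50000,
                           rng=np.random.default_rng(0))
```

For a quadratic the smoothed function has exactly the same gradient, so the only error here is Monte-Carlo noise.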

Page 27:

Projection-free algorithms

Page 28:

Matrix completion

In round t:
• Obtain a (user, movie) pair, and output a recommended rating
• Then observe the true rating, and suffer the square loss

Comparison class: all matrices with bounded trace norm

Page 29:

Bounded trace norm matrices

• Trace norm of a matrix = sum of its singular values
• K = { X : X is a matrix with trace norm at most D }
• Optimizing over K promotes low-rank solutions

• The Online Matrix Completion problem is an OCO problem over K
  • FTRL, OMD etc. require computing projections onto K
  • Expensive! Requires computing an eigendecomposition of the matrix to be projected: an O(n³) operation

• But: linear optimization over K is easier
  • Requires computing only the top singular vector; can typically be done in O(sparsity) time

Page 30:

Projections → linear optimization

1. Matrix completion (K = bounded trace norm matrices):
   eigendecomposition → largest singular vector computation

2. Online routing (K = flow polytope):
   conic optimization over the flow polytope → shortest path computation

3. Rotations (K = rotation matrices):
   conic optimization over the rotation set → Wahba's algorithm (linear time)

4. Matroids (K = matroid polytope):
   convex optimization via the ellipsoid method → greedy algorithm (nearly linear time!)

Page 31:

Projection-Free Online Learning

• Is online learning possible given access to a linear optimization oracle rather than a projection oracle?

• Yes: projection can be implemented via polynomially many linear optimizations
  • But for efficiency, we would like only a few (1 or 2) calls to the linear optimization oracle per round

• Follow-The-Perturbed-Leader does the job for linear loss functions

• What about general convex loss functions? Is OCO possible with only 1 or 2 calls to a linear optimization oracle per round?

Page 32:

Conditional Gradient algorithm [Frank, Wolfe '56]

Convex optimization problem: min_{x∈K} f(x)

Assume:
• f is smooth and convex
• linear optimization over K is easy

  v_t = argmin_{x∈K} ∇f(x_t)⊤x
  x_{t+1} = x_t + η_t (v_t − x_t)

1. At iteration t: x_t is a convex combination of at most t vertices (sparsity)
2. No learning rate to tune: η_t ≈ 1/t (independent of the diameter, gradients, etc.)
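A minimal sketch over the probability simplex (the quadratic objective and all names are mine, not from the slides), where the linear-optimization oracle just selects the coordinate with the smallest gradient:

```python
import numpy as np

def frank_wolfe(grad_f, x0, T):
    """Conditional gradient over the probability simplex."""
    x = x0.astype(float)
    for t in range(1, T + 1):
        g = grad_f(x)
        v = np.zeros_like(x)
        v[int(np.argmin(g))] = 1.0       # linear optimization oracle: best vertex
        eta = 2.0 / (t + 1)              # step size ~ 1/t, no tuning
        x = x + eta * (v - x)            # stay inside K by convex combination
    return x

# Minimize f(x) = ||x - z||^2 over the simplex; z lies inside, so the optimum is z.
z = np.array([0.2, 0.3, 0.5])
x_star = frank_wolfe(lambda x: 2 * (x - z), np.array([1.0, 0.0, 0.0]), 2000)
```

Note the iterate is always a convex combination of vertices, so no projection is ever needed.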

Page 33:

Online Conditional Gradient [Hazan, Kale '12]

• Set x₁ ∈ K arbitrarily
• For t = 1, 2, …:
  1. Use x_t, obtain f_t
  2. Compute x_{t+1} as follows:

  v_t = argmin_{x∈K} ( Σ_{i=1}^{t} ∇f_i(x_i) + λ_t x_t )⊤ x

  x_{t+1} = (1 − t^{−α}) x_t + t^{−α} v_t

Theorem: Regret = O(T^{3/4})

Page 34:

Linearly-convergent CG [Garber, Hazan 2013]

Theorem: For polytopes, strongly-convex and smooth losses:
1. Offline: the optimality gap after t steps is exp(−ct)
2. Online: Regret = O(√T)

Page 35:

Directions for future research

1. Efficient algorithms & optimal dependence on the dimension for bandit OCO
2. Faster online matrix algorithms
3. Optimal regret / running time tradeoffs
4. Optimal-regret and projection-free algorithms for general convex sets
5. Time series / dependence / adaptivity (vs. static regret)
6. Adding state / online Reinforcement Learning / MDPs / stochastic games [lots of existing work…]
7. Tensor/spectral learning in the online setting
8. Game-theoretic aspects: stronger notions, approachability, …

Page 36:

Bibliography & more information, see:
http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm

Thank you!