Transcript of "Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient" (Avinash Atreya, Feb 9, 2011; kamalika/teaching/CSE291W11/Feb9.pdf)

Page 1

Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient

-Avinash Atreya

Feb 9 2011

Page 2

Outline

• Introduction

– The Problem

– Example

– Background

– Notation

– Results

• One Point Estimate

• Main Theorem

• Extensions and Related Work

Page 3

The Problem

At time $t$:

– We choose an input vector $x_t \in S \subset \mathbb{R}^d$

– $S$ is a convex set

– Nature reveals only the cost $c_t(x_t)$

– $c_t : \mathbb{R}^d \to \mathbb{R}$ is convex (not necessarily differentiable)

Our Goal:

– Minimize the expected regret:

$$\mathbb{E}\left[\sum_{t=1}^{n} c_t(x_t)\right] - \min_{x \in S} \sum_{t=1}^{n} c_t(x)$$

Page 4

Example

Allocating online advertising spend each day

Each component of $x_t$: spend on a search engine, in dollars

[Figure: example daily spends across search engines, e.g. 100, 200, and 50 dollars]

At the end of the day, we learn the number of clicks

Page 5

Background

Online Convex Optimization

– We learn the whole function $c_t$ after we pick $x_t$

Bandit Setting

– We learn only the outcome of our action

Online Convex Optimization in the Bandit Setting

– We learn only the single value $c_t(x_t)$

Page 6

Notation I

$D$ : diameter: $\|x - y\|_2 \le D \quad \forall x, y \in S$

$G$ : gradient upper bound: $\|\nabla c_t(x_t)\|_2 \le G \quad \forall t,\ 1 \le t \le n$

Page 7

Notation II

$C$ : bound on absolute value: $|c_t(x)| \le C \quad \forall t, \forall x$

$L$ : Lipschitz constant: $|c_t(x) - c_t(y)| \le L\, \|x - y\|_2 \quad \forall t,\ \forall x, y \in S$

Page 8

Notation III

Unit ball $\mathbb{B}$ and unit sphere $\mathbb{S}$:

$$\mathbb{B} = \{ x \in \mathbb{R}^d : \|x\| \le 1 \}, \qquad \mathbb{S} = \{ x \in \mathbb{R}^d : \|x\| = 1 \}$$

Projection onto the convex set $S$:

$$P_S(x) = \operatorname{argmin}_{z \in S} \|x - z\|$$
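For concreteness, the projection has a simple closed form when $S$ is a Euclidean ball. A minimal sketch in Python (my own illustration, not from the slides):

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection P_S(x) onto S = {z : ||z||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

# A point outside the unit ball is rescaled onto its boundary.
print(project_ball(np.array([3.0, 4.0])))   # -> [0.6 0.8]
```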

Page 9

Key Results

Online Convex Optimization (Zinkevich):

$$\sum_{t=1}^{n} c_t(x_t) - \min_{x \in S} \sum_{t=1}^{n} c_t(x) \le DG\sqrt{n}$$

Bandit Setting:

$$\mathbb{E}\left[\sum_{t=1}^{n} c_t(x_t)\right] - \min_{x \in S} \sum_{t=1}^{n} c_t(x) \le 6\, n^{5/6}\, d\, C$$

Page 10

Outline

• Introduction

• One Point Estimate

– Key Challenge

– Projected Gradient Descent

– Expected Gradient Descent

– One Point Estimate

• Main Theorem

• Extensions and Related Work

Page 11

Key Challenge

Approach

– Projected gradient descent: $x_{t+1} = P_S(x_t - \nu \nabla c_t(x_t))$

Challenge

– How do we estimate the gradient when we observe only $c_t(x_t)$?

Page 12

Gradient Estimate

We need at least $d + 1$ points in $d$ dimensions

1-d: $f'(x) \approx \dfrac{f(x + \delta) - f(x)}{\delta}$

Prior work exists on using two-point estimates in $d$ dimensions

[Figure: finite-difference picture labeled with $f(x)$ and $f(x + \delta)$]

Page 13

Projected Gradient Descent

Due to Zinkevich (seen in class)

$x_1 = 0$; at time $t + 1$:

– $c_t$ is revealed (convex and differentiable)

– $x_{t+1} = P_S(x_t - \eta \nabla c_t(x_t))$

Regret bound:

$$\sum_{t=1}^{n} c_t(x_t) - \min_{x \in S} \sum_{t=1}^{n} c_t(x) \le RG\sqrt{n}$$
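A minimal sketch of this update rule (the ball-shaped $S$, the quadratic costs, and the step size below are my own hypothetical choices, not from the slides):

```python
import numpy as np

def project_ball(x, radius=1.0):
    """P_S for S = a Euclidean ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def projected_gradient_descent(grad_fns, eta, d, project=project_ball):
    """Zinkevich's update: x_{t+1} = P_S(x_t - eta * grad c_t(x_t)), x_1 = 0."""
    x = np.zeros(d)
    plays = []
    for grad in grad_fns:          # grad c_t becomes available after we play x_t
        plays.append(x.copy())
        x = project(x - eta * grad(x))
    return plays

# Toy run with hypothetical costs c_t(x) = ||x - z_t||^2.
rng = np.random.default_rng(0)
grad_fns = [lambda x, z=rng.uniform(-0.5, 0.5, 2): 2 * (x - z) for _ in range(100)]
plays = projected_gradient_descent(grad_fns, eta=0.1, d=2)
```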

Page 14

Expected Gradient Descent

$x_1 = 0$; at time $t + 1$:

– $x_{t+1} = P_S(x_t - \eta g_t)$

– $g_t$: a random vector

– $\mathbb{E}[g_t \mid x_t] = \nabla c_t(x_t)$

The same bound holds in expectation:

$$\mathbb{E}\left[\sum_{t=1}^{n} c_t(x_t)\right] - \min_{x \in S} \sum_{t=1}^{n} c_t(x) \le RG\sqrt{n}$$

Page 15

Key Challenge Revisited

Challenge

– Estimate the gradient $\nabla c_t(x_t)$ from the single observation $c_t(x_t)$

Somewhat easier

– Come up with $\hat{c}_t, g_t$ so that $\mathbb{E}[g_t \mid x_t] = \nabla \hat{c}_t(x_t)$

– That is, come up with a function $\hat{c}_t$ (close to $c_t$) whose gradient is easy to estimate in expectation using only $c_t$

Page 16

One Point Estimate I

Fundamental theorem of calculus:

$$\int_{-\delta}^{+\delta} \frac{d}{dx}\, c_t(x + y)\, dy = c_t(x + \delta) - c_t(x - \delta)$$

With a uniform random variable $v \in [-1, +1]$:

$$\frac{d}{dx} \int_{-1}^{1} \frac{1}{2}\, c_t(x + v\delta)\, dv = \frac{c_t(x + \delta) - c_t(x - \delta)}{2\delta}$$

Page 17

One Point Estimate II

Random variable $u \in \{-1, +1\}$:

$$\frac{d}{dx}\,\mathbb{E}_{v \sim \mathcal{U}[-1,1]}\left[c_t(x + \delta v)\right] = \frac{\mathbb{E}_{u \sim \{-1,+1\}}\left[c_t(x + \delta u)\, u\right]}{\delta}$$

$\hat{c}_t(x) = \mathbb{E}[c_t(x + \delta v)]$ (a smoothed version of $c_t$) – the function we are looking for!

– $g_t = \frac{1}{\delta}\, c_t(x + \delta u_t)\, u_t$

$v$ is drawn from the whole interval, $u$ only from its endpoints
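A quick numerical check of the 1-d identity, using the hypothetical test function $c(x) = |x|$, which is convex but not differentiable at 0 (my own illustration, not from the slides):

```python
import numpy as np

c = np.abs                         # convex, not differentiable at 0
x, delta, h = 0.3, 0.1, 1e-6

# Left side: differentiate the smoothed function c-hat(y) = E_v[c(y + delta*v)].
smoothed = lambda y: np.mean(c(y + delta * np.linspace(-1.0, 1.0, 100001)))
lhs = (smoothed(x + h) - smoothed(x - h)) / (2 * h)

# Right side: exact expectation over u uniform on {-1, +1}.
rhs = (c(x + delta) * (+1) + c(x - delta) * (-1)) / (2 * delta)

print(lhs, rhs)                    # both ~1.0: away from 0, |.| has slope 1
```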

Page 18

One Point Estimate III

$d$ dimensions:

– $v \sim \mathbb{B}$ (uniform on the unit ball)

– $u \sim \mathbb{S}$ (uniform on the unit sphere)

$$\nabla\, \mathbb{E}_{v \sim \mathbb{B}}\left[c_t(x + \delta v)\right] = \frac{d}{\delta}\, \mathbb{E}_{u \sim \mathbb{S}}\left[c_t(x + \delta u)\, u\right]$$

Follows from Stokes' theorem (the generalization of the fundamental theorem of calculus to $d$ dimensions)

[Figure: $v$ sampled inside the ball, $u$ on the sphere]
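A Monte Carlo sanity check of this identity on a hypothetical smooth quadratic cost, for which $\nabla \hat{c}_t(x) = 2Ax$ exactly; subtracting the baseline $c(x)$ is my own variance-reduction tweak, valid because $\mathbb{E}[u] = 0$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, delta = 5, 0.05
A = rng.normal(size=(d, d)); A = A @ A.T       # random PSD matrix (assumption)
c = lambda z: z @ A @ z                         # c(z) = z^T A z
x = rng.normal(size=d)

samples = []
for _ in range(200_000):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                      # u ~ uniform on the unit sphere
    # Subtracting c(x) leaves the mean unchanged (E[u] = 0) but cuts variance.
    samples.append((d / delta) * (c(x + delta * u) - c(x)) * u)

print(np.mean(samples, axis=0))                 # estimate of grad c-hat(x)
print(2 * A @ x)                                # exact gradient, for comparison
```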

Page 19

Putting Things Together

Expected gradient descent on $\hat{c}_t : (1 - \alpha)S \to [-C, C]$:

– $g_t = \frac{d}{\delta}\, c_t(x_t + \delta u_t)\, u_t$, with $u_t \sim \mathbb{S}$

– $x_{t+1} = P_{(1-\alpha)S}(x_t - \eta g_t)$

– $\mathbb{E}[g_t \mid x_t] = \nabla \hat{c}_t(x_t)$

Bound on regret:

$$\mathbb{E}\left[\sum_{t=1}^{n} \hat{c}_t(x_t)\right] - \min_{x \in (1-\alpha)S} \sum_{t=1}^{n} \hat{c}_t(x) \le RG\sqrt{n}$$

Page 20

Outline

• Introduction

• One point estimate

• Main Theorem

– Algorithm

– Observations

– Proof Sketch

– Results

• Extensions and Related Work

Page 21

The Algorithm

Bandit-Gradient-Descent($\alpha, \delta, \nu$)

– $x_1 = 0$

– At time $t$:

• Select $u_t \sim \mathbb{S}$

• Play $x_t + \delta u_t$

• Observe $c_t(x_t + \delta u_t)$

• $x_{t+1} = P_{(1-\alpha)S}\big(x_t - \nu\, c_t(x_t + \delta u_t)\, u_t\big)$
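A sketch of the full loop in Python. Only the update rule comes from the slide; the cost-oracle interface, the choice of $S$ as the unit ball, and the parameter values are hypothetical stand-ins:

```python
import numpy as np

def bandit_gradient_descent(cost_oracle, d, n, alpha, delta, nu, project_S):
    """Bandit-Gradient-Descent(alpha, delta, nu): one cost value per round."""
    rng = np.random.default_rng(0)
    # Project onto (1 - alpha)S via P_{(1-a)S}(y) = (1 - a) * P_S(y / (1 - a)).
    project_shrunk = lambda y: (1 - alpha) * project_S(y / (1 - alpha))

    x = np.zeros(d)                              # x_1 = 0
    losses = []
    for t in range(n):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                   # select u_t ~ S (unit sphere)
        y = x + delta * u                        # play x_t + delta * u_t
        loss = cost_oracle(t, y)                 # observe c_t(x_t + delta * u_t)
        losses.append(loss)
        x = project_shrunk(x - nu * loss * u)    # gradient-free descent step
    return losses

# Toy usage: S = unit ball, fixed hypothetical cost c_t(y) = ||y - z*||^2.
project_S = lambda y: y / max(1.0, float(np.linalg.norm(y)))
z_star = np.array([0.3, -0.2, 0.1])
losses = bandit_gradient_descent(lambda t, y: float(np.sum((y - z_star) ** 2)),
                                 d=3, n=20_000, alpha=0.2, delta=0.1, nu=0.01,
                                 project_S=project_S)
print(losses[0], np.mean(losses[-100:]))         # average loss should drop
```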

Page 22

Terms in the bound

Expected gradient for $\hat{c}_t$

Difference between the minimum over $(1 - \alpha)S$ and over $S$

Difference between $c(x)$ for $x \in (1 - \alpha)S$ and $c(y)$ for $y \in S$

[Figure: $(1 - \alpha)S$ nested inside $S$]

Page 23

Observation I

If we take a step of size $\alpha r$ from $x \in (1 - \alpha)S$, we stay in $S$

Bounds on $S$: $r\mathbb{B} \subset S \subset R\mathbb{B}$

$S$ contains the origin

Consider $\alpha r \mathbb{B}$ centered at $x \in (1 - \alpha)S$. Since $r\mathbb{B} \subset S$,

$$(1 - \alpha)S + \alpha r \mathbb{B} \;\subset\; (1 - \alpha)S + \alpha S \;=\; S$$

[Figure: $S$ sandwiched between balls of radius $r$ and $R$]

Page 24

Observation II

From the expected gradient descent bound (with $\eta = \nu\delta/d$):

$$\mathbb{E}\left[\sum_{t=1}^{n} \hat{c}_t(x_t)\right] - \min_{x \in (1-\alpha)S} \sum_{t=1}^{n} \hat{c}_t(x) \le RG\sqrt{n}$$

Gradient bound $G$:

$$\|g_t\| = \left\|\frac{d}{\delta}\, c_t(x_t + \delta u_t)\, u_t\right\| \le \frac{dC}{\delta}$$

Regret bound: $\dfrac{RdC\sqrt{n}}{\delta}$

Page 25

Observation III

The optimum over $(1 - \alpha)S$ is near the optimum over $S$

By convexity (Jensen's inequality):

$$c_t\big((1 - \alpha)x + \alpha \cdot 0\big) \le (1 - \alpha)\, c_t(x) + \alpha\, c_t(0)$$

$$c_t\big((1 - \alpha)x\big) - c_t(x) \le \alpha\big(c_t(0) - c_t(x)\big) \le 2\alpha C$$

Summing up:

$$\min_{x \in (1-\alpha)S} \sum_{t=1}^{n} c_t(x) - \min_{x \in S} \sum_{t=1}^{n} c_t(x) \le 2\alpha C n$$

Page 26

Observation IV

Lipschitz-like bound across $(1 - \alpha)S$ and $S$:

For $x \in S$, $y \in (1 - \alpha)S$,

$$|c_t(x) - c_t(y)| \le \frac{2C}{\alpha r}\, \|x - y\|$$

Obvious when $\Delta = \|x - y\| > \alpha r$, since the right-hand side then exceeds $2C$

Otherwise we pick a point $z \in S$ in the direction of $\Delta$ and use Jensen's inequality

Page 27

Proof Sketch I

Combining all the observations:

$$\mathbb{E}\left[\sum_{t=1}^{n} c_t(x_t)\right] - \min_{x \in S} \sum_{t=1}^{n} c_t(x) \;\le\; \underbrace{\frac{RdC\sqrt{n}}{\delta}}_{\text{expected gradient}} \;+\; \underbrace{\frac{6\delta C n}{\alpha r}}_{\text{effective Lipschitz}} \;+\; \underbrace{2\alpha C n}_{\text{difference in min}}$$

Page 28

Proof Sketch II

The bound is of the form

$$\frac{a}{\delta} + \frac{b\delta}{\alpha} + c\alpha$$

Setting $\delta = \sqrt[3]{\dfrac{a^2}{bc}}$ and $\alpha = \sqrt[3]{\dfrac{ab}{c^2}}$ makes all three terms equal and gives a bound of

$$3\sqrt[3]{abc}$$

Note: $a = RdC\sqrt{n}$, $b = \dfrac{6Cn}{r}$, $c = 2Cn$
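A three-line numeric check, with arbitrary positive constants of my own choosing, that these settings indeed balance the three terms:

```python
a, b, c = 7.0, 3.0, 11.0                        # arbitrary positive constants
delta = (a**2 / (b * c)) ** (1 / 3)
alpha = (a * b / c**2) ** (1 / 3)
print(a / delta, b * delta / alpha, c * alpha)  # all equal (a*b*c)^(1/3)
print(3 * (a * b * c) ** (1 / 3))               # the resulting bound
```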

Page 29

Theorem

For $n \ge \left(\dfrac{3Rd}{2r}\right)^2$ and

$$\nu = \frac{R}{C\sqrt{n}}, \qquad \delta = \sqrt[3]{\frac{rR^2 d^2}{12n}}, \qquad \alpha = \sqrt[3]{\frac{3Rd}{2r\sqrt{n}}}$$

we can show a bound of

$$3C n^{5/6} \sqrt[3]{\frac{dR}{r}}$$
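Plugging the theorem's parameter choices into code, with arbitrary illustrative problem constants (my own numbers, not from the slides):

```python
n, R, r, d, C = 10**6, 1.0, 0.5, 3, 1.0         # hypothetical problem constants
assert n >= (3 * R * d / (2 * r)) ** 2          # the theorem's condition on n

nu    = R / (C * n ** 0.5)
delta = (r * R**2 * d**2 / (12 * n)) ** (1 / 3)
alpha = (3 * R * d / (2 * r * n ** 0.5)) ** (1 / 3)
bound = 3 * C * n ** (5 / 6) * (d * R / r) ** (1 / 3)

print(f"nu={nu:.2e}  delta={delta:.3f}  alpha={alpha:.3f}  bound={bound:,.0f}")
```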

Page 30

Outline

• Introduction

• One Point Estimate

• Main Theorem

• Extensions and Related Work

– Bound with a Lipschitz Constant

– Reshaping to Isotropic Position

– Related Work

Page 31

Bound with a Lipschitz Constant

When each $c_t$ is $L$-Lipschitz, for suitable values of $\alpha, \delta, \nu$ we can show a bound of

$$2 n^{3/4} \sqrt{3RdC\left(L + \frac{C}{r}\right)}$$

Intuition: use the true Lipschitz constant instead of the effective one

Page 32

Reshaping

Dependence on $R/r$ is not ideal

Transform $S$ into its isotropic position

– An affine transformation so that the covariance (of the uniform distribution over $S$) equals $I$

– Then $r' = 1$, $R' = 1.01\,d$, $L' = LR$, $C' = C$
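A minimal sketch of the reshaping step, assuming we can sample uniformly from $S$ (here a hypothetical skewed box stands in for a sampler): estimate the covariance of $S$ and whiten it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Uniform samples from a hypothetical skewed box S (stand-in for a sampler).
pts = rng.uniform(-1, 1, size=(100_000, 3)) * np.array([5.0, 1.0, 0.2])

Sigma = np.cov(pts, rowvar=False)               # covariance of uniform law on S
W = np.linalg.inv(np.linalg.cholesky(Sigma))    # whitening affine map
iso = pts @ W.T                                 # T(S) is (near-)isotropic

print(np.cov(iso, rowvar=False).round(2))       # approximately the identity
```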

Page 33

Related Work

Kleinberg (independently): an $O(n^{3/4})$ bound for the same problem

– Phases of length $d + 1$

– Random one-point gradient estimates

– Handles only oblivious adversaries

Online linear optimization in the bandit setting

– Kalai and Vempala show a bound of $O(\sqrt{n})$