Efficient Sequential Decision-Making in Structured Problems
Adam Tauman Kalai
Georgia Institute of Technology, Weizmann Institute, Toyota Technological Institute, National Institute of Corrections

BANDITS AND REGRET

[Figure: table of rewards for each machine over TIME, with per-machine averages (AVG)]

REGRET = AVG REWARD OF BEST DECISION – AVG REWARD = 8 – 5 = 3
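
The regret arithmetic above can be reproduced with a short Python sketch. The reward table is hypothetical (it is not the slide's table); its numbers are chosen so that the best machine averages 8 and the player averages 5. Rows are periods, columns are machines, and `choices` lists which machine was pulled on each period.

```python
def average_regret(rewards, choices):
    """Average reward of the best fixed machine minus the player's average reward.

    rewards[t][i] is the reward machine i would have paid on period t;
    choices[t] is the index of the machine actually pulled on period t.
    """
    T = len(rewards)
    n_machines = len(rewards[0])
    # Average reward of each machine had it been pulled on every period
    machine_avgs = [sum(row[i] for row in rewards) / T for i in range(n_machines)]
    # Average reward the player actually collected
    player_avg = sum(rewards[t][choices[t]] for t in range(T)) / T
    return max(machine_avgs) - player_avg


# Hypothetical data (not from the slides): 4 periods, 3 machines
rewards = [
    [1, 9, 5],
    [1, 8, 8],
    [8, 6, 9],
    [3, 9, 2],
]
choices = [0, 2, 2, 2]
print(average_regret(rewards, choices))  # → 3.0
```

Machine 1 averages 8, the player collects 1 + 8 + 9 + 2 = 20 over 4 periods (average 5), so the regret is 8 − 5 = 3.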

TWO APPROACHES

Bayesian setting [Robbins52]: independent prior probability distribution over payoff sequences for each machine.

Thm: (Discounted) expected reward is maximized by pulling the arm with the largest “Gittins index”.

Nonstochastic setting [Auer, Cesa-Bianchi, Freund, Schapire 95]:

Thm: For any sequence of [0,1] costs on N machines, their algorithm achieves expected regret of O

STRUCTURED COMB-OPT

[Figure: routing example (Route vs. Time: 25 min, 17 min, 44 min) and clustering example (Clustering vs. Errors: 40, 55, 19)]

Online examples: routing, compression, binary search trees, PCFGs, pruning decision trees, poker, auctions, classification.

Problems not included: portfolio selection (nonlinear), online sudoku.

STRUCTURED COMB-OPT

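Known decision set S.

Known linear cost function $c: S \times [0,1]^d \to [0,1]$.

Unknown $w_1, w_2, \dots, w_T \in [0,1]^d$.

On period t = 1, 2, …, T:

Alg. picks $s_t \in S$.

Alg. pays and finds out $c(s_t, w_t)$.

REGRET = $\frac{1}{T}\sum_{t=1}^{T} c(s_t, w_t) - \min_{s \in S} \frac{1}{T}\sum_{t=1}^{T} c(s, w_t)$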
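
A minimal simulation of this protocol, assuming each decision is already given as a vector in $[0,1]^d$ so that the linear cost is $c(s,w) = \bar s \cdot w$ (the vector representation introduced later in the talk). The decision set, the cost vectors, and the `pick` strategy are hypothetical placeholders; any online algorithm that sees only the costs it paid can be plugged in.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cost(s: Vector, w: Vector) -> float:
    """Linear cost c(s, w) = s . w (assumes s is the vector representation)."""
    return sum(si * wi for si, wi in zip(s, w))


def run_protocol(
    S: List[Vector],
    ws: List[Vector],
    pick: Callable[[int, List[float]], int],
) -> float:
    """Play T periods and return the average regret.

    pick(t, observed) returns the index of s_t in S; it sees only the
    costs c(s_1, w_1), ..., c(s_{t-1}, w_{t-1}) paid so far (bandit feedback).
    """
    T = len(ws)
    observed: List[float] = []
    for t in range(T):
        s_t = S[pick(t, observed)]
        observed.append(cost(s_t, ws[t]))  # pay and find out c(s_t, w_t)
    our_avg = sum(observed) / T
    best_avg = min(sum(cost(s, w) for w in ws) / T for s in S)
    return our_avg - best_avg


# Hypothetical instance: three decisions in d = 2; the strategy always plays decision 0
S = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
ws = [(0.2, 0.6), (0.4, 0.0), (0.6, 0.2)]
print(run_protocol(S, ws, lambda t, observed: 0))
```

Here decision 0 averages 0.4 per period while decision 1 averages 0.8/3, so always playing decision 0 has average regret 2/15.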

MAIN POINTS

Offline optimization $M: [0,1]^d \to S$, $M(w) = \arg\min_{s \in S} c(s,w)$, e.g., shortest path. Easier than sequential decision-making!?

EXPLORATION: Automatically find an “exploration basis” using M.

LOW REGRET: Dimension matters more than the number of decisions.

EFFICIENCY: The online algorithm uses the offline optimizer M as a black box.
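
To make the offline optimizer concrete for the shortest-path example: a decision s is a path, its vector is the 0/1 indicator of the edges it uses, and M(w) is a shortest-path computation with edge weights w. The sketch below assumes a small directed acyclic graph whose nodes are listed in topological order; the graph, weights, and function name are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def shortest_path_oracle(
    nodes: List[str], edges: List[Edge], w: List[float], source: str, target: str
) -> List[int]:
    """Offline optimizer M(w) for a shortest-path instance on a DAG.

    nodes must be in topological order; w[k] is the weight of edges[k].
    Returns the 0/1 edge-indicator vector of a minimum-weight source-target path.
    """
    INF = float("inf")
    dist: Dict[str, float] = {v: INF for v in nodes}
    parent: Dict[str, int] = {}  # node -> index of the edge used to reach it
    dist[source] = 0.0
    for u in nodes:  # relax outgoing edges in topological order
        if dist[u] == INF:
            continue
        for k, (a, b) in enumerate(edges):
            if a == u and dist[u] + w[k] < dist[b]:
                dist[b] = dist[u] + w[k]
                parent[b] = k
    if dist[target] == INF:
        raise ValueError("target not reachable from source")
    # Walk back from the target to build the indicator vector
    s = [0] * len(edges)
    v = target
    while v != source:
        k = parent[v]
        s[k] = 1
        v = edges[k][0]
    return s


# Hypothetical graph and weights
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("B", "C")]
w = [0.2, 0.5, 0.6, 0.1, 0.1]
print(shortest_path_oracle(nodes, edges, w, "A", "D"))  # → [1, 0, 0, 1, 1]
```

The returned path A→B→C→D has cost equal to the dot product of its indicator vector with w (0.2 + 0.1 + 0.1), which is why path costs are linear in w.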

MAIN RESULT

An algorithm that achieves:

For any set S, any linear $c: S \times [0,1]^d \to [0,1]$, any $T \ge 1$, and any sequence $w_1, \dots, w_T \in [0,1]^d$,

$E[\text{regret of alg}] \le 15\, d\, T^{-1/3}$

Each update requires linear time and calls the offline optimizer M with probability $O(d\, T^{-1/3})$.

[AK04, MB04, DH06]
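
Reading the bound as a sample-size requirement (simple arithmetic on the stated bound, not from the slides): the guarantee is nontrivial only once $15\,d\,T^{-1/3} < 1$, and reaching a target average regret $\varepsilon$ requires

```latex
15\,d\,T^{-1/3} \le \varepsilon
\iff T^{1/3} \ge \frac{15d}{\varepsilon}
\iff T \ge \left(\frac{15d}{\varepsilon}\right)^{3},
\qquad \text{e.g. } d = 10,\ \varepsilon = 0.1:\quad T \ge 1500^{3} = 3.375 \times 10^{9}.
```

The requirement depends on the dimension d, not on |S|, which can be exponentially larger (for example, the number of paths in a graph with d edges).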

EXPLORE vs EXPLOIT

Find a good “exploration basis” using M.

On period t = 1, 2, …, T:

Explore with probability ,
Play $s_t :=$ a random element of the exploration basis.
Estimate $v_t$ somehow.

Exploit with probability 1-,
Play $s_t := M(\sum_{i<t} v_i + p)$, where p is a random perturbation [Hannan57].
$v_t := 0$.

Key property: $E[v_t] = w_t$.

E[calls to M] = .

[AK04, MB04]
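
The loop above can be sketched in Python for a small, explicitly listed decision set. Several details are assumptions, not taken from the slides: the exploration probability `gamma` and perturbation scale `eta` are hypothetical parameters, p is drawn once at the start, M is a brute-force argmin, and $v_t$ uses a standard unbiased estimator for a basis whose rows form the matrix B: if basis element $b_i$ is explored (probability gamma/d), set $v_t = (d/\gamma)\, c(b_i, w_t)\, B^{-1} e_i$. Averaging over i gives $B^{-1} B w_t = w_t$, which is the key property $E[v_t] = w_t$.

```python
import numpy as np


def estimate(B_inv, i, observed_cost, gamma):
    """Unbiased estimate v_t after exploring basis element i (chosen w.p. gamma/d)."""
    d = B_inv.shape[0]
    return (d / gamma) * observed_cost * B_inv[:, i]


def explore_exploit(S, B, ws, gamma, eta, rng):
    """Sketch of the explore/exploit loop with a perturbed leader.

    S: (n, d) array of decision vectors; B: (d, d) array whose rows are the
    exploration basis (assumed to be rows of S); ws: (T, d) cost vectors.
    gamma (exploration probability) and eta (perturbation scale) are
    hypothetical parameters. Returns (average cost paid, calls to M).
    """
    d = S.shape[1]
    B_inv = np.linalg.inv(B)
    p = rng.uniform(0.0, eta, size=d)  # random perturbation, drawn once (assumption)
    total_v = np.zeros(d)              # running sum of the estimates v_i

    def M(w):                          # brute-force offline optimizer over S
        return S[np.argmin(S @ w)]

    current, calls, paid = M(total_v + p), 1, 0.0
    for w_t in ws:
        if rng.random() < gamma:       # explore: play a random basis element
            i = int(rng.integers(d))
            observed = float(B[i] @ w_t)  # only this scalar is revealed
            paid += observed
            total_v += estimate(B_inv, i, observed, gamma)
            current = M(total_v + p)   # the sum changed, so re-run M
            calls += 1
        else:                          # exploit: v_t = 0, cached M output stays valid
            paid += float(current @ w_t)
    return paid / len(ws), calls


# Hypothetical instance: d = 2, decision 0 is cheapest on average
rng = np.random.default_rng(0)
S = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
B = S[:2]  # the two unit vectors serve as the exploration basis here
T = 2000
ws = rng.uniform(size=(T, 2)) * np.array([0.2, 1.0])
avg_cost, calls = explore_exploit(S, B, ws, gamma=0.1, eta=np.sqrt(T), rng=rng)
print(avg_cost, calls)
```

Because exploit steps leave $\sum_i v_i + p$ unchanged, this sketch re-runs M only after explore steps, so it makes about gamma·T + 1 calls in expectation.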

REMAINDER OF TALK

EXPLORATION: good “exploration basis” definition; finding one.

EXPLOITATION: perturbation (randomized regularization); stability analysis.

OTHER DIRECTIONS: approximation algorithms; convex problems.

EXPLORATION

GOING TO d-DIMENSIONS

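Linear cost function $c: S \times [0,1]^d \to [0,1]$. Mapping $S \to [0,1]^d$:

$\bar{s} = \big(c(s,(1,0,\dots,0)),\ c(s,(0,1,\dots,0)),\ \dots,\ c(s,(0,\dots,0,1))\big)$, so that $c(s,w) = \bar{s} \cdot w$.

$\bar{S} = \{\bar{s} \mid s \in S\}$, $K = \text{convex-hull}(\bar{S})$, WLOG $\dim(\bar{S}) = d$.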

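
The mapping can be computed mechanically: evaluate $c(s,\cdot)$ at the d standard basis vectors to get $\bar s$, after which every cost is a dot product. The cost function below (a decision is a subset of d items and pays the weights of the items it contains) is a hypothetical stand-in for any linear c.

```python
from typing import Callable, FrozenSet, List, Sequence


def vector_representation(c: Callable, s, d: int) -> List[float]:
    """s_bar = (c(s, e_1), ..., c(s, e_d)) for the standard basis vectors e_j."""
    basis = [[1.0 if j == k else 0.0 for j in range(d)] for k in range(d)]
    return [c(s, e) for e in basis]


def subset_cost(s: FrozenSet[int], w: Sequence[float]) -> float:
    """Hypothetical linear cost: total weight of the chosen items."""
    return sum(w[j] for j in s)


d = 4
s = frozenset({0, 2})
s_bar = vector_representation(subset_cost, s, d)
print(s_bar)  # → [1.0, 0.0, 1.0, 0.0]

# Linearity means c(s, w) equals the dot product s_bar . w
w = [0.1, 0.4, 0.25, 0.2]
print(subset_cost(s, w), sum(a * b for a, b in zip(s_bar, w)))
```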

EXPLORATION BASIS

Def: An exploration basis $b_1, b_2, \dots, b_d \in S$ is a 2-barycentric spanner if, for every $s \in S$, $\bar{s} = \sum_i \lambda_i \bar{b}_i$ for some $\lambda_1, \lambda_2, \dots, \lambda_d \in [-2,2]$.

It is possible to find an exploration basis efficiently using the offline optimizer $M(w) = \arg\min_{s \in S} c(s,w)$ [AK04].

[Figure: K = convex-hull(S̄) with a bad and a good exploration basis]
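
One way to find such a basis with only a linear optimizer is a determinant-swapping procedure in the spirit of [AK04]; the sketch below makes simplifying assumptions: S is given explicitly as rows of a NumPy array (so M can be a brute-force argmin over arbitrary real directions w, not only $w \in [0,1]^d$), the vectors span $\mathbb{R}^d$, and C = 2. The key fact is that $\det$ with one column replaced by x is linear in x, so the best replacement is found with two calls to M; at termination, Cramer's rule bounds every coefficient $\lambda_i$ by C.

```python
import itertools

import numpy as np


def make_oracle(points):
    """Brute-force M(w) = argmin over S of s_bar . w (assumed explicit point set)."""
    pts = np.asarray(points, dtype=float)

    def M(w):
        return pts[np.argmin(pts @ w)]

    return M


def det_with_column(X, i, x):
    """det of X after replacing column i by x (a linear function of x)."""
    Y = X.copy()
    Y[:, i] = x
    return np.linalg.det(Y)


def best_for_column(M, X, i):
    """Point of S maximizing |det(X with column i replaced)|, via two calls to M."""
    d = X.shape[0]
    # Coefficient vector of the linear map x -> det(X with column i := x)
    ell = np.array([det_with_column(X, i, e) for e in np.eye(d)])
    candidates = [M(ell), M(-ell)]  # minimizer and maximizer of ell . x
    return max(candidates, key=lambda x: abs(ell @ x))


def barycentric_spanner(M, d, C=2.0):
    """Return a d x d array whose rows b_1..b_d form a C-barycentric spanner."""
    X = np.eye(d)  # columns hold the current basis; start from the identity
    for i in range(d):  # phase 1: replace each identity column with a point of S
        X[:, i] = best_for_column(M, X, i)
    improved = True
    while improved:  # phase 2: swap while |det| grows by more than a factor C
        improved = False
        for i in range(d):
            x = best_for_column(M, X, i)
            if abs(det_with_column(X, i, x)) > C * abs(np.linalg.det(X)):
                X[:, i] = x
                improved = True
    return X.T


# Hypothetical decision set: all nonzero 0/1 vectors in R^3
S = np.array([p for p in itertools.product([0, 1], repeat=3) if any(p)], dtype=float)
B = barycentric_spanner(make_oracle(S), d=3)
lambdas = np.linalg.solve(B.T, S.T)  # column k holds the coefficients of S[k]
print(np.abs(lambdas).max() <= 2 + 1e-9)
```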

EXPLOITATION

INSTABILITY

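Define $z_t = M(\sum_{i \le t} w_i) = \arg\min_{s \in S} \sum_{i \le t} c(s, w_i)$.

Natural idea: use $z_{t-1}$ on period t?

REGRET = 1!

Costs of the two decisions on periods 1 to 5: (½, 0), (0, 1), (1, 0), (0, 1), (1, 0)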
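
The instability is easy to reproduce by running “use $z_{t-1}$ on period t” (follow the leader). This sketch assumes the cost vectors keep alternating (0, 1), (1, 0) beyond the five periods shown, that decisions are indexed 0 and 1, and that ties go to decision 0; it compares the leader's average cost with the best fixed decision.

```python
def follow_the_leader(costs):
    """Play z_{t-1} = argmin of cumulative cost so far; return per-period costs paid."""
    n = len(costs[0])
    cumulative = [0.0] * n
    paid = []
    for w in costs:
        leader = min(range(n), key=lambda i: cumulative[i])  # ties -> lowest index
        paid.append(w[leader])
        for i in range(n):
            cumulative[i] += w[i]
    return paid


# Assumed continuation of the slide's sequence: (1/2, 0), then alternating (0, 1), (1, 0)
T = 1000
costs = [(0.5, 0.0)] + [((0.0, 1.0) if t % 2 == 1 else (1.0, 0.0)) for t in range(1, T)]
paid = follow_the_leader(costs)
ftl_avg = sum(paid) / T
best_avg = min(sum(w[i] for w in costs) / T for i in range(2))
print(ftl_avg, best_avg)  # → 0.9995 0.4995
```

After the first period the leader always switches to the decision that is about to cost 1, while either fixed decision pays about ½ per period.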

STABILITY ANALYSIS [KV03]

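Define $z_t = M(\sum_{i \le t} w_i) = \arg\min_{s \in S} \sum_{i \le t} c(s, w_i)$.

Lemma: The regret of using $z_t$ on period t is at most 0.

Proof:

$$\begin{aligned}
\min_{s \in S}\; c(s,w_1) + c(s,w_2) + \dots + c(s,w_T)
&= c(z_T,w_1) + \dots + c(z_T,w_{T-1}) + c(z_T,w_T) \\
&\ge c(z_{T-1},w_1) + \dots + c(z_{T-1},w_{T-1}) + c(z_T,w_T) \\
&\ge \dots \\
&\ge c(z_1,w_1) + c(z_2,w_2) + \dots + c(z_T,w_T).
\end{aligned}$$

Each inequality holds because $z_{t-1}$ minimizes $\sum_{i \le t-1} c(s, w_i)$ over S.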
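
The lemma can be sanity-checked numerically: on random cost sequences over a few decisions, “be the leader” (playing $z_t$, which already includes $w_t$) never pays more in total than the best fixed decision. The random instances are hypothetical; this is a check, not a proof.

```python
import random


def be_the_leader_gap(costs):
    """Total cost of playing z_t on period t minus total cost of the best fixed decision."""
    n = len(costs[0])
    cumulative = [0.0] * n
    btl_total = 0.0
    for w in costs:
        for i in range(n):
            cumulative[i] += w[i]  # include w_t before choosing: z_t uses sum over i <= t
        z_t = min(range(n), key=lambda i: cumulative[i])
        btl_total += w[z_t]
    return btl_total - min(cumulative)


rng = random.Random(0)
gaps = []
for _ in range(200):
    T, n = rng.randint(1, 30), rng.randint(2, 5)
    costs = [[rng.random() for _ in range(n)] for _ in range(T)]
    gaps.append(be_the_leader_gap(costs))
print(max(gaps) <= 1e-12)  # → True (the lemma says every gap is <= 0)
```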

STABILITY ANALYSIS [KV03]

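Define $z_t = M(\sum_{i \le t} w_i) = \arg\min_{s \in S} \sum_{i \le t} c(s, w_i)$.

Lemma: The regret of using $z_t$ on period t is at most 0.

$\Rightarrow$ Regret of using $z_{t-1}$ on each period t $\le \sum_{t \le T} \big[c(z_{t-1}, w_t) - c(z_t, w_t)\big]$.

Idea: regularize to achieve stability.

Let $y_t = M(\sum_{i \le t} w_i + p)$, for random $p \in [0,1]^d$.

$E[\text{Regret of } y_{t-1} \text{ on } t] \le \sum_{t \le T} E\big[c(y_{t-1}, w_t) - c(y_t, w_t)\big] +$

Strange: randomized regularization!

$y_t$ can be computed using M.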
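
A minimal sketch of the perturbed leader $y_{t-1} = M(\sum_{i<t} w_i + p)$ on the alternating sequence from the instability slide. Assumptions for illustration only: full-information costs, decisions represented as unit vectors (so M reduces to an argmin over coordinates), a fresh uniform perturbation on each period, and a perturbation scale `eta` of $\sqrt{T}$ rather than the slide's $p \in [0,1]^d$.

```python
import numpy as np


def follow_perturbed_leader(costs, eta, rng):
    """Play y_{t-1} = argmin_s (sum_{i<t} w_i + p)[s] with p ~ Uniform[0, eta]^n."""
    costs = np.asarray(costs, dtype=float)
    T, n = costs.shape
    cumulative = np.zeros(n)
    paid = 0.0
    for w in costs:
        p = rng.uniform(0.0, eta, size=n)  # fresh random perturbation (assumption)
        y = int(np.argmin(cumulative + p))  # M applied to the perturbed sum
        paid += w[y]
        cumulative += w
    return paid / T


# Assumed continuation of the slide's sequence: (1/2, 0), then alternating (0, 1), (1, 0)
T = 1000
costs = [(0.5, 0.0)] + [((0.0, 1.0) if t % 2 == 1 else (1.0, 0.0)) for t in range(1, T)]
rng = np.random.default_rng(0)
fpl_avg = follow_perturbed_leader(costs, eta=np.sqrt(T), rng=rng)
best_avg = np.asarray(costs).sum(axis=0).min() / T
print(fpl_avg, best_avg)
```

On this sequence the unperturbed leader pays cost 1 on every period after the first; with the perturbation, the cumulative costs differ by only ½ relative to the noise, so the choice is close to a fair coin and the average cost lands near the ½ achieved by either fixed decision.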

OTHER DIRECTIONS

BANDIT CONVEX OPT.

Convex feasible set $S \subseteq \mathbb{R}^d$.

Unknown sequence of concave functions $f_1, \dots, f_T: S \to [0,1]$.

On period t = 1, 2, …, T:

Algorithm chooses $x_t \in S$.

Algorithm receives and finds out $f_t(x_t)$.

Thm. For all concave $f_1, f_2, \dots: S \to [0,1]$ and all $T_0, T \ge 1$, the bacterial ascent algorithm achieves:

MOTIVATING EXAMPLE

A company has to decide how much to spend on advertising across d channels, within a budget.

Feedback is total profit, affected by external factors.

[Figure: $PROFIT vs. $ADVERTISING, showing concave profit functions f1, …, f4, the chosen points x1, …, x4 with observed profits f1(x1), …, f4(x4), and the optimum x*]

BACTERIAL ASCENT

[Figure, built up over three slides: iterates x0, x1, x2, x3 in S, with EXPLORE and EXPLOIT steps labeled]

APPROXIMATION ALGORITHMS

What if offline optimization is NP-hard? Example: the repeated traveling salesman problem.

Suppose you have an α-approximation algorithm A:

$c(A(w), w) \le \alpha \min_{s \in S} c(s, w)$ for all $w \in [0,1]^d$.

We would like to achieve low α-regret = our cost − α (min cost of best $s \in S$).

This is possible using the convex optimization approach above and transformations of approximation algorithms [KKL07].

CONCLUSIONS

Can extend bandit algorithms to structured problems:
Guarantee worst-case low regret
Linear combinatorial optimization problems
Convex optimization

Remarks:
Works against adaptive adversaries as well
Online efficiency = offline efficiency
Can handle approximation algorithms
Can achieve $\text{cost} \le (1+\varepsilon)\,\text{min cost} + O(1/\varepsilon)$