Adaptive Regret Minimization in Bounded Memory Games

Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha
GameSec 2013 – Invited Paper

Transcript of Adaptive Regret Minimization in Bounded Memory Games

Page 1: Adaptive Regret Minimization in Bounded Memory Games

Adaptive Regret Minimization in Bounded Memory Games

Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha

GameSec 2013 – Invited Paper

Page 2

Motivating Example: Cheating Game

(Timeline: Semester 1, Semester 2, Semester 3)

Page 3

Motivating Example: Speeding Game

(Timeline: Week 1, Week 2, Week 3)

Page 4

Motivating Example: Speeding Game

Actions: Speed, Behave (adversary); High Inspection, Low Inspection (defender)

Outcomes: (Table: one outcome for each pair of actions — Speed/Behave against High/Low Inspection)

Questions: Appropriate game model for this interaction? Defender strategies?

Page 5

Game Elements

o Repeated interaction
o Two players: Defender and Adversary
o Imperfect information: Defender only observes the outcome
o Short-term adversaries
o Adversary incentives unknown to Defender (last presentation! [JNTP 13])
o Adversary may be uninformed/irrational

Repeated Game Model?

Stackelberg

Page 6

Additional Game Elements

o History-dependent actions: Adversary adapts behavior following an unknown strategy. How should the defender respond?
o History-dependent rewards: point system; reputation of the defender depends both on its history and on the current outcome

Standard Regret Minimization

Repeated Game Model?

Page 7

Outline

Motivation
Background
  - Standard Definition of Regret
  - Regret Minimization Algorithms
  - Limitations
Our Contributions
  - Bounded Memory Games
  - Adaptive Regret
  - Results

Page 8

Speeding Game: Repeated Game Model (Example)

Defender's (D) expected utility:

                  Speed   Behave
High Inspection   0.19    0.7
Low Inspection    0.2     1

Page 9

Regret Minimization Example

Experts: Low Inspection, High Inspection

What should I do?

Page 10

Regret Minimization Example

Defender's utility matrix:

                  Speed   Behave
High Inspection   0.19    0.7
Low Inspection    0.2     1

Experts: Aristotle, Plato

            Day 1   Day 2   Day 3
Adversary   Speed   Behave  Behave
Aristotle   High    Low     High    (0.19 + 1 + 0.7 = 1.89)
Plato       Low     Low     Low     (0.2 + 1 + 1 = 2.2)

Utility: Aristotle 1.89, Plato 2.2, Defender 1.59
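The day-by-day sums on this slide can be checked with a short script. The payoff values come from the matrix above; the day-by-day action sequences are as reconstructed above and are an assumption about the original animation:

```python
# Defender's payoff matrix from the slides: (defender action, adversary action).
payoff = {
    ("High", "Speed"): 0.19, ("High", "Behave"): 0.7,
    ("Low",  "Speed"): 0.2,  ("Low",  "Behave"): 1.0,
}

# Reconstructed three-day play: the adversary's actions and each expert's advice.
days      = ["Speed", "Behave", "Behave"]
aristotle = ["High",  "Low",    "High"]
plato     = ["Low",   "Low",    "Low"]

def total_utility(advice, play):
    """Defender's cumulative utility if the expert's advice had been followed."""
    return sum(payoff[(d, a)] for d, a in zip(advice, play))

print(round(total_utility(aristotle, days), 2))  # 0.19 + 1 + 0.7 = 1.89
print(round(total_utility(plato, days), 2))      # 0.2 + 1 + 1 = 2.2
```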

Page 11

Regret Minimization Example

Defender's utility matrix:

                  Speed   Behave
High Inspection   0.19    0.7
Low Inspection    0.2     1

Utility: Aristotle 1.89, Plato 2.2; Defender's regret: 0.59

Page 12

Regret Minimization Example (continued)

Page 13

Regret Minimization Example

Defender's utility matrix:

                  Speed   Behave
High Inspection   0.19    0.7
Low Inspection    0.2     1

Regret Minimization Algorithm (A):

$\lim_{T \to \infty} \frac{\mathrm{Regret}(A, \mathrm{Experts}, T)}{T} \le 0$

Page 14

Regret Minimization: Basic Idea

Weights: Low Inspection 1.0, High Inspection 1.0

Choose action probabilistically based on weights.

Defender's utility matrix:

                  Speed   Behave
High Inspection   0.19    0.7
Low Inspection    0.2     1

Page 15

Regret Minimization: Basic Idea

Updated weights: Low Inspection 0.5, High Inspection 1.5

Defender's utility matrix:

                  Speed   Behave
High Inspection   0.19    0.7
Low Inspection    0.2     1
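The weight scheme on these two slides can be sketched as a small multiplicative-weights loop. This is a generic randomized-weighted-majority sketch; the exponential update rule and the learning rate are illustrative assumptions, not the paper's exact algorithm:

```python
import random

# Defender's payoff matrix from the slides: (defender action, adversary action).
payoff = {
    ("High", "Speed"): 0.19, ("High", "Behave"): 0.7,
    ("Low",  "Speed"): 0.2,  ("Low",  "Behave"): 1.0,
}
weights = {"Low": 1.0, "High": 1.0}  # start with equal weights, as on the slide
eta = 0.5                            # learning rate (assumed)

def sample_action(weights):
    """Choose an action with probability proportional to its weight."""
    actions, w = zip(*weights.items())
    return random.choices(actions, weights=w)[0]

for adversary_action in ["Speed", "Behave", "Speed"]:
    action = sample_action(weights)  # the defender's randomized play
    # Every expert's weight grows with the payoff its advice would have earned.
    for e in weights:
        weights[e] *= (1 + eta) ** payoff[(e, adversary_action)]

# "Low Inspection" accumulates more weight, matching the trend on the slides.
```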

Page 16

Speeding Game (Example)

Defender's utility matrix:

                  Speed   Behave
High Inspection   0.19    0.7
Low Inspection    0.2     1

Defender's Strategy:
Nash Equilibrium: Low Inspection
Regret Minimization: Low Inspection
Dominant Strategy

Weights: Low Inspection 0.5, High Inspection 1.5

Page 17

Speeding Game (Example)

Defender's Strategy:
Nash Equilibrium: Low Inspection
Regret Minimization: Low Inspection
Dominant Strategy

Updated weights: Low Inspection 0.3, High Inspection 1.7

Page 18

Speeding Game (Example)

Defender's Strategy:
Nash Equilibrium: Low Inspection
Regret Minimization: Low Inspection
Dominant Strategy

Updated weights: Low Inspection 0.1, High Inspection 1.9

Page 19

Philosophical Argument

"See! My advice was better!"

"We need a better game model!"

Page 20

Unmodeled Game Elements

o Adversary incentives unknown to Defender (last presentation! [JNTP 13])
o Adversary may be uninformed/irrational
o History-dependent rewards: point system; reputation of the defender depends both on its history and on the current outcome
o History-dependent actions: Adversary adapts behavior following an unknown strategy. How should the defender respond?

Page 21

Outline

Motivation
Background
Our Contributions
  - Bounded Memory Games
  - Adaptive Regret
  - Results

Page 22

Bounded Memory Games

State s: encodes the last m outcomes

States can capture history-dependent rewards.

$(O_{i-m}, \ldots, O_{i-2}, O_{i-1}) \xrightarrow{O_i} (O_{i-m+1}, \ldots, O_{i-1}, O_i)$

Defender payoff when actions d, a are played at state s
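The state transition on this slide can be sketched with a fixed-length deque; the outcome names are illustrative:

```python
from collections import deque

m = 3  # memory: the state is the last m outcomes

# Initial state (O_{i-m}, ..., O_{i-1}); a maxlen deque drops the oldest entry.
state = deque(["Behave"] * m, maxlen=m)

def step(state, outcome):
    """Apply the transition (O_{i-m},...,O_{i-1}) --O_i--> (O_{i-m+1},...,O_i)."""
    state.append(outcome)  # with maxlen=m, the oldest outcome falls out of memory
    return tuple(state)

print(step(state, "Speed"))
```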

Page 23

Bounded Memory Games

State s: encodes the last m outcomes

The current outcome depends only on the current actions.

Defender payoff when actions d, a are played at state s

Page 24

Bounded Memory Games – Experts

Expert advice may depend on the last m outcomes, e.g.: "If no violations have been detected in the last m rounds then play High Inspection, otherwise Low Inspection."

Fixed Defender Strategy: State → Action
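A fixed defender strategy is a map from states to actions. The sketch below encodes the slide's example expert, under the simplifying assumption that a "Speed" outcome means a detected violation:

```python
def expert(state):
    """The slide's expert: High Inspection iff no violation was detected
    in the last m rounds. state is a tuple of the last m outcomes."""
    return "High" if "Speed" not in state else "Low"

print(expert(("Behave", "Behave", "Behave")))  # no recent violation -> High
print(expert(("Behave", "Speed", "Behave")))   # a recent violation  -> Low
```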

Page 25

Outline

Motivation
Background
Our Contributions
  - Bounded Memory Games
  - Adaptive Regret
  - Results

Page 26

k-Adaptive Strategy

Decision tree for the next k rounds.

(Figure: decision tree over Day 1, Day 2, Day 3; each node branches on Speed/Behave)

Page 27

k-Adaptive Strategy

Decision tree for the next k rounds (Week 1, Week 2, Week 3):

o "I will never speed while I am on vacation."
o "I will speed until I get caught. If I ever get a ticket then I will stop."
o "I will keep speeding until I get two tickets. If I ever get two tickets then I will stop."

"If violations have been detected in the last 7 rounds then play High Inspection, otherwise Low Inspection."
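The three driver strategies above are decision trees over the next k rounds; they can be sketched as policies mapping the number of tickets received so far to an action. The encoding is an illustrative assumption:

```python
def never_speed(tickets):
    """'I will never speed while I am on vacation.'"""
    return "Behave"

def speed_until_caught(tickets):
    """'I will speed until I get caught. If I ever get a ticket I will stop.'"""
    return "Speed" if tickets < 1 else "Behave"

def speed_until_two_tickets(tickets):
    """'I will keep speeding until I get two tickets.'"""
    return "Speed" if tickets < 2 else "Behave"

# After one ticket, the second strategy stops speeding but the third does not.
print(speed_until_caught(0), speed_until_caught(1))  # Speed Behave
print(speed_until_two_tickets(1))                    # Speed
```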

Page 28

k-Adaptive Regret

$\mathrm{Regret}(D, \mathrm{Expert}, T) = \sum_{i=1}^{T} (r_i' - r_i)$

Defender: initial state …, O_{-1}, O_0; actions (a_1,d_1), (a_2,d_2), …, (a_{k+1},d_{k+1}); outcomes O_1, O_2, …, O_{k+1}, …; rewards r_1, r_2, …, r_{k+1}

Expert: initial state …, O_{-1}, O_0; actions (a_1,d_1'), (a_2',d_2'), …, (a_{k+1},d_{k+1}'); outcomes O_1', O_2', …, O_{k+1}', …; rewards r_1', r_2', …, r_{k+1}'

Page 29

k-Adaptive Regret Minimization

Definition: An algorithm D is a γ-approximate k-adaptive regret minimization algorithm if, for any bounded memory-m game and any fixed set of experts EXP,

$\lim_{T \to \infty} \left( \max_{E \in \mathrm{EXP}} \frac{\mathrm{Regret}(D, E, T)}{T} \right) \le \gamma.$

Page 30

Outline

Motivation
Background
Bounded Memory Games
Adaptive Regret
Results

Page 31

k-Adaptive Regret Minimization

Definition: An algorithm D is a γ-approximate k-adaptive regret minimization algorithm if, for any bounded memory-m game and any fixed set of experts EXP,

$\lim_{T \to \infty} \left( \max_{E \in \mathrm{EXP}} \frac{\mathrm{Regret}(D, E, T)}{T} \right) \le \gamma.$

Theorem: For any γ > 0 there is an inefficient γ-approximate k-adaptive regret minimization algorithm.

Page 32

Inefficient Regret Minimization Algorithm

o Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05]

(Diagram: the bounded memory-m game is reduced to a repeated game whose experts are the fixed strategies f1, f2, …)

Page 33

Inefficient Regret Minimization Algorithm

o Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05]

(Diagram: bounded memory-m game reduced to a repeated game over the fixed strategies f1, f2, …)

Expected reward in the original game given:
1. Defender follows fixed strategy f2 for the next m·k·t rounds of the original game
2. Defender sees the sequence of k-adaptive adversaries below

Page 34

Inefficient Regret Minimization Algorithm

The current outcome depends only on the current actions.

(Diagram: Start state, then Stage_i of m·k·t rounds; real game outcomes … O1 … Om … aligned with repeated game outcomes … O1 … Om …)

Page 35

Inefficient Regret Minimization Algorithm

(Diagram: Start state, then Stage_i of m·k·t rounds; real game outcomes … O1 … Om … aligned with repeated game outcomes … O1 … Om …)

o After m rounds in Stage_i, View 1 and View 2 must converge to the same state.

Page 36

Inefficient Regret Minimization Algorithm

o Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05]

(Diagram: bounded memory-m game reduced to a repeated game over the fixed strategies f1, f2, …)

Standard regret minimization algorithms maintain a weight for each expert.

Inefficient: exponentially many fixed strategies!
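Why maintaining one weight per fixed strategy is infeasible: a fixed strategy is a map from states to defender actions, and with n possible outcomes and memory m there are n^m states, hence |actions|^(n^m) fixed strategies. The concrete numbers below are illustrative:

```python
n_outcomes = 4   # possible outcomes per round (illustrative)
m = 3            # memory of the game
n_actions = 2    # defender actions, e.g. High/Low Inspection

n_states = n_outcomes ** m        # every length-m outcome history is a state
n_fixed = n_actions ** n_states   # one action choice per state

print(n_states)  # 64
print(n_fixed)   # 2**64 = 18446744073709551616 fixed strategies
```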

Page 37

Summary of Technical Results

                        Imperfect Information                Perfect Information
Oblivious Regret        Hard (Theorem 1), APX (Theorem 5)    APX (Theorem 4)
k-Adaptive Regret       Hard (Theorem 1)                     Hard (Remark 2)
Fully Adaptive Regret   X (Theorem 6)                        X (Theorem 6)

Easier →

X – no regret minimization algorithm exists
Hard – unless RP = NP, no regret minimization algorithm is efficient in n
APX – efficient approximate regret minimization algorithm

Page 38

Summary of Technical Results

                        Imperfect Information                Perfect Information
Oblivious Regret        Hard (Theorem 1), APX (Theorem 5)    APX (Theorem 4)
k-Adaptive Regret       Hard (Theorem 1), APX (New!)         Hard (Remark 2), APX (New!)
Fully Adaptive Regret   X (Theorem 6)                        X (Theorem 6)

Easier →

X – no regret minimization algorithm exists
Hard – unless RP = NP, no regret minimization algorithm is efficient in n
APX – efficient approximate regret minimization algorithm in n, k

Page 39

Summary of Technical Results

Fully Adaptive Regret: X (Theorem 6) under both imperfect and perfect information.

Ideas: implicit weight representation + dynamic programming

Warning! f(k) is a very large constant!

Page 40

Implicit Weights: Outcome Tree

(Figure: outcome tree with Behave/Speed branches; edge weights $w_{uv}$; $O(\ln n / \gamma)$ nodes)

How often is edge (s,t) relevant?

Page 41

Implicit Weights: Outcome Tree

(Figure: outcome tree for expert E, with Behave/Speed branches; edge weights $w_{uv}$; $O(\ln n / \gamma)$ nodes)

$w_E = \sum_{(u,v) \in E} R_{uv} w_{uv}$

Page 42

Open Questions

o Perfect information: efficient γ-approximate k-adaptive regret minimization algorithm when k = 0 and γ = 0?
o γ-approximate k-adaptive regret minimization algorithm with more efficient running time?

                        Imperfect Information                Perfect Information
Oblivious Regret        Hard (Theorem 1), APX (Theorem 5)    APX (Theorem 4)
k-Adaptive Regret       Hard (Theorem 1), APX                Hard (Remark 2), APX
Fully Adaptive Regret   X (Theorem 6)                        X (Theorem 6)

Thanks for Listening!

Page 44

THEOREM 3

Unless RP = NP there is no efficient regret minimization algorithm for bounded memory games, even against an oblivious adversary.

Reduction from MAX 3-SAT (7/8 + ε) [Hastad01], similar to the reduction in [EKM05] for MDPs.

Page 45

THEOREM 3: SETUP

Defender actions A: {0,1} × {0,1}

m = O(log n)

States: two states for each variable: S0 = {s1, …, sn}, S1 = {s'1, …, s'n}

Intuition: a fixed strategy corresponds to a variable assignment.

Page 46

THEOREM 3: OVERVIEW

o The adversary picks a clause uniformly at random for the next n rounds.
o The defender can earn reward 1 by satisfying this unknown clause in the next n rounds.
o The game will "remember" whether a reward has already been given, so the defender cannot earn a reward multiple times during the n rounds.

Page 47

THEOREM 3: STATE TRANSITIONS

Adversary actions B: {0,1} × {0,1,2,3}, written b = (b1, b2)

g(a,b) = b1

f(a,b) = S1 if a2 = 1 or b2 = a1 (reward already given); S0 otherwise (no reward given)

Page 48

THEOREM 3: REWARDS

b = (b1, b2); no reward whenever B plays b2 = 2, and no reward whenever s ∈ S1.

r(a,b,s) =
  1   if s ∈ S0 and a = b2
  -5  if s ∈ S1 and f(a,b) = S0 and b2 ≠ 3
  0   otherwise

Page 49

THEOREM 3: OBLIVIOUS ADVERSARY

(d1, …, dn) – binary De Bruijn sequence of order n

1. Pick a clause C uniformly at random.
2. For i = 1, …, n: play b = (di, b2), where
   b2 = 1 if xi ∈ C; 0 if ¬xi ∈ C; 3 if i = n; 2 otherwise.
3. Repeat Step 1.

Page 50

ANALYSIS

o The defender can never be rewarded from s ∈ S1.
o Getting a reward ⇒ transition to s ∈ S1.
o The defender is punished for leaving S1, unless the adversary plays b2 = 3 (i.e. when i = n).

f(a,b) = S1 if a2 = 1 or b2 = a1; S0 otherwise

r(a,b,s) =
  1   if s ∈ S0 and a = b2
  -5  if s ∈ S1 and f(a,b) = S0 and b2 ≠ 3
  0   otherwise

Page 51

THEOREM 3: ANALYSIS

φ – an assignment satisfying a ρ fraction of clauses; fφ – average score ρ/n

Claim: No strategy (fixed or adaptive) can obtain an average expected score better than ρ*/n.

Regret minimization algorithm: run until the expected average regret is < ε/n; then the expected average score is > (ρ* − ε)/n.