Adaptive Regret Minimization in Bounded Memory Games

Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha

GameSec 2013 – Invited Paper

Motivating Example: Cheating Game

[Figure: timeline across Semester 1, Semester 2, Semester 3]

Motivating Example: Speeding Game

[Figure: timeline across Week 1, Week 2, Week 3]

Motivating Example: Speeding Game

Example

Actions:
Defender: High Inspection, Low Inspection
Adversary: Speed, Behave

Outcomes:

Questions:
o Appropriate Game Model for this Interaction?
o Defender Strategies?

Game Elements

o Repeated Interaction
o Two Players: Defender and Adversary
o Imperfect Information
  o Defender only observes outcome
o Short Term Adversaries
o Adversary Incentives Unknown to Defender
  o Last presentation! [JNTP 13]
o Adversary may be uninformed/irrational

Repeated Game Model?
Stackelberg

Additional Game Elements

o History-dependent Actions
  o Adversary adapts behavior following unknown strategy
  o How should defender respond?
o History-dependent Rewards:
  o Point System
  o Reputation of defender depends both on its history and on the current outcome

Standard Regret Minimization
Repeated Game Model?

Outline

Motivation
Background
  Standard Definition of Regret
  Regret Minimization Algorithms
  Limitations
Our Contributions
  Bounded Memory Games
  Adaptive Regret
  Results

Speeding Game: Repeated Game Model

Example

Defender’s (D) Expected Utility:

                   Speed   Behave
High Inspection    0.19    0.7
Low Inspection     0.2     1
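
To make the matrix concrete, here is a minimal Python sketch that encodes it and checks which inspection level dominates. It assumes the columns are the adversary's actions Speed and Behave, in that order; the names `UTILITY` and `strictly_dominates` are illustrative, not from the talk.

```python
# Defender's expected utility; rows are the defender's inspection level and
# columns are the adversary's action (column order assumed: Speed, Behave).
UTILITY = {
    "High": {"Speed": 0.19, "Behave": 0.7},
    "Low": {"Speed": 0.2, "Behave": 1.0},
}


def strictly_dominates(row_a, row_b):
    """True if row_a earns strictly more than row_b against every adversary action."""
    return all(UTILITY[row_a][col] > UTILITY[row_b][col] for col in UTILITY[row_a])


print(strictly_dominates("Low", "High"))  # True
print(strictly_dominates("High", "Low"))  # False
```

Low Inspection earns more against both adversary actions, which is why the repeated-game analysis later settles on it.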

Regret Minimization Example

Example

Experts' advice: Low Inspection / High Inspection

What should I do?

Regret Minimization Example

Example

[Figure: the adversary's actions (Speed/Behave) and the High/Low Inspection choices of the defender and of each expert over three days]

Experts: Aristotle and Plato

Total utility over Day 1, Day 2, Day 3:
Aristotle   0.19 + 1 + 0.7 = 1.89
Plato       0.2 + 1 + 1 = 2.2
Defender    1.59

Regret Minimization Example

Example

Regret

Utility: Aristotle 1.89, Plato 2.2, Defender 1.59
Regret of the defender with respect to the best expert (Plato): 2.2 − 1.59 = 0.61
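
The slide gives only the daily sums. One play sequence that reproduces all three totals with the utility matrix is: the adversary speeds on Day 1 and behaves on Days 2 and 3; Aristotle plays High, Low, High; Plato always plays Low; the defender always plays High. That sequence is our reconstruction, not something shown on the slide. The Python sketch below recomputes the totals and the defender's regret against the best expert.

```python
UTILITY = {
    "High": {"Speed": 0.19, "Behave": 0.7},
    "Low": {"Speed": 0.2, "Behave": 1.0},
}

# Hypothetical day-by-day play, chosen so that the totals match the slide.
ADVERSARY = ["Speed", "Behave", "Behave"]
PLAYS = {
    "Aristotle": ["High", "Low", "High"],
    "Plato": ["Low", "Low", "Low"],
    "Defender": ["High", "High", "High"],
}


def total_utility(actions):
    # Sum the defender-side utility of each action against that day's move.
    return sum(UTILITY[d][a] for d, a in zip(actions, ADVERSARY))


totals = {name: total_utility(acts) for name, acts in PLAYS.items()}
best_expert = max(totals["Aristotle"], totals["Plato"])
regret = best_expert - totals["Defender"]

for name, value in totals.items():
    print(f"{name}: {value:.2f}")  # Aristotle: 1.89, Plato: 2.20, Defender: 1.59
print(f"Regret: {regret:.2f}")     # Regret: 0.61
```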

Regret Minimization Example

Example

Regret

Regret Minimization Example

Example

Regret Minimization Algorithm (A):

$$\lim_{T \to \infty} \frac{\mathrm{Regret}(A, \mathrm{Experts}, T)}{T} \le 0$$

Regret Minimization: Basic Idea

Weights: Low Inspection 1.0, High Inspection 1.0

Choose action probabilistically based on weights

Regret Minimization: Basic Idea

Updated weights: Low Inspection 1.5, High Inspection 0.5
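
The slides do not spell out the weight-update rule (the numbers shown keep the total weight at 2.0). As a stand-in, the sketch below uses the standard exponential-weights (Hedge) update in the full-information setting, where the defender learns the adversary's action after each round; that is a simplifying assumption, since the talk's setting is imperfect information. The step size `eta`, the seed, and the adversary sequence are hypothetical.

```python
import math
import random

# Defender utility from the speeding-game matrix (columns assumed Speed, Behave).
UTILITY = {
    "High": {"Speed": 0.19, "Behave": 0.7},
    "Low": {"Speed": 0.2, "Behave": 1.0},
}
ACTIONS = list(UTILITY)


def hedge(adversary_moves, eta=0.5, seed=0):
    """Exponential weights (Hedge) over the two inspection levels.

    Returns the final action probabilities and the average regret against
    the best fixed action in hindsight.
    """
    rng = random.Random(seed)
    weights = {d: 1.0 for d in ACTIONS}  # equal initial weights
    earned = 0.0
    for move in adversary_moves:
        total = sum(weights.values())
        probs = [weights[d] / total for d in ACTIONS]
        # Choose an action probabilistically based on the weights.
        choice = rng.choices(ACTIONS, weights=probs)[0]
        earned += UTILITY[choice][move]
        # Full-information update: every action is rewarded by what it
        # would have earned this round.
        for d in ACTIONS:
            weights[d] *= math.exp(eta * UTILITY[d][move])
        # Renormalise so the weights cannot overflow on long runs.
        total = sum(weights.values())
        weights = {d: w / total for d, w in weights.items()}
    best_fixed = max(sum(UTILITY[d][m] for m in adversary_moves) for d in ACTIONS)
    return dict(weights), (best_fixed - earned) / len(adversary_moves)


probs, avg_regret = hedge(["Speed", "Behave", "Behave"] * 100)
print(probs)       # almost all weight on "Low"
print(avg_regret)  # small; Hedge's expected average regret tends to 0 as T grows
```

Because Low Inspection earns more against every adversary action, its weight grows geometrically relative to High Inspection, mirroring the drift toward Low Inspection on the slides.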

Speeding Game

Example

Defender’s Strategy

Nash Equilibrium: Low Inspection
Regret Minimization: Low Inspection
Dominant Strategy: Low Inspection

Weights: Low Inspection 1.5, High Inspection 0.5

Weights: Low Inspection 1.7, High Inspection 0.3

Weights: Low Inspection 1.9, High Inspection 0.1

Philosophical Argument

See! My advice was better!

We need a better game model!

Unmodeled Game Elements

o Adversary Incentives Unknown to Defender
  o Last presentation! [JNTP 13]
o Adversary may be uninformed/irrational
o History-dependent Rewards:
  o Point System
  o Reputation of defender depends both on its history and on the current outcome
o History-dependent Actions
  o Adversary adapts behavior following unknown strategy
  o How should defender respond?

Outline

Motivation
Background
Our Contributions
  Bounded Memory Games
  Adaptive Regret
  Results

Bounded Memory Games

State s: Encodes last m outcomes

States can capture history-dependent rewards

$$(O_{i-m}, \ldots, O_{i-2}, O_{i-1}) \xrightarrow{\;O_i\;} (O_{i-m+1}, \ldots, O_{i-1}, O_i)$$

- Defender payoff when actions d,a are played at state s
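
A state that remembers only the last m outcomes behaves like a fixed-length sliding window. The Python sketch below models it with `collections.deque(maxlen=m)`. The outcome labels `"ok"`/`"violation"` and the history-dependent `payoff` (a 0.1 penalty per remembered violation, standing in for a point system) are hypothetical illustrations, not the paper's payoff function.

```python
from collections import deque


class BoundedMemoryState:
    """State s_i = (O_{i-m}, ..., O_{i-1}): the last m outcomes."""

    def __init__(self, m, initial_outcomes):
        if len(initial_outcomes) != m:
            raise ValueError("expected exactly m initial outcomes")
        self._window = deque(initial_outcomes, maxlen=m)

    def observe(self, outcome):
        # Appending to a full deque discards the oldest outcome O_{i-m},
        # giving the next state (O_{i-m+1}, ..., O_{i-1}, O_i).
        self._window.append(outcome)

    def as_tuple(self):
        return tuple(self._window)


def payoff(state, d, a):
    # Hypothetical history-dependent reward: base utility minus 0.1 for
    # every violation remembered in the state.
    base = {("High", "Speed"): 0.19, ("High", "Behave"): 0.7,
            ("Low", "Speed"): 0.2, ("Low", "Behave"): 1.0}
    return base[(d, a)] - 0.1 * state.count("violation")


s = BoundedMemoryState(3, ["ok", "ok", "ok"])
s.observe("violation")
print(s.as_tuple())  # ('ok', 'ok', 'violation')
print(payoff(s.as_tuple(), "Low", "Behave"))
```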

Bounded Memory Games

State s: Encodes last m outcomes

Current outcome is only dependent on current actions

- Defender payoff when actions d, a are played at state s

Bounded Memory Games - Experts

Expert advice may depend on the last m outcomes

Fixed Defender Strategy: State → Action

Example: If no violations have been detected in the last m rounds, then play High Inspection; otherwise play Low Inspection.
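
Because a fixed defender strategy is just a map from states to actions, it can be written out as a lookup table. This Python sketch tabulates the example rule (High Inspection only when no violation is remembered) for a hypothetical two-letter outcome alphabet and m = 2.

```python
from itertools import product

OUTCOMES = ("ok", "violation")  # hypothetical outcome alphabet


def no_recent_violation(state):
    # Example rule: High Inspection if no violation appears among the last
    # m outcomes, otherwise Low Inspection.
    return "High" if "violation" not in state else "Low"


def as_lookup_table(strategy, m):
    # A fixed defender strategy is a map State -> Action; tabulate it over
    # every state, i.e. every length-m sequence of outcomes.
    return {state: strategy(state) for state in product(OUTCOMES, repeat=m)}


table = as_lookup_table(no_recent_violation, m=2)
for state, action in table.items():
    print(state, "->", action)
```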

Outline

Motivation
Background
Our Contributions
  Bounded Memory Games
  Adaptive Regret
  Results

k-Adaptive Strategy

Decision tree for the next k rounds

[Figure: decision tree over Day 1, Day 2, Day 3, … whose branches are labelled Speed and Behave]

k-Adaptive Strategy

Decision tree for the next k rounds

[Figure: timeline across Week 1, Week 2, Week 3]

I will never speed while I am on vacation.

I will speed until I get caught. If I ever get a ticket then I will stop.

I will keep speeding until I get two tickets. If I ever get two tickets then I will stop.

If violations have been detected in the last 7 rounds, then play High Inspection; otherwise play Low Inspection.
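
The two ticket-based adversaries quoted above can be written as functions of how many tickets they have received so far. This Python sketch assumes, hypothetically, that a speeder is ticketed exactly when the defender plays High Inspection; the talk does not give a detection model. The vacation rule is omitted because it depends on a calendar that is not described.

```python
def speed_until_caught(tickets):
    # "I will speed until I get caught."
    return "Speed" if tickets < 1 else "Behave"


def speed_until_two_tickets(tickets):
    # "I will keep speeding until I get two tickets."
    return "Speed" if tickets < 2 else "Behave"


def play(adversary, defender_actions):
    """Return the adversary's moves against a fixed sequence of inspections."""
    tickets = 0
    moves = []
    for d in defender_actions:
        a = adversary(tickets)
        # Hypothetical detection model: High Inspection always catches a speeder.
        if d == "High" and a == "Speed":
            tickets += 1
        moves.append(a)
    return moves


print(play(speed_until_caught, ["Low", "High", "Low", "Low"]))
# ['Speed', 'Speed', 'Behave', 'Behave']
print(play(speed_until_two_tickets, ["High", "High", "High", "Low"]))
# ['Speed', 'Speed', 'Behave', 'Behave']
```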

k-Adaptive Regret

$$\mathrm{Regret}(D, \mathrm{Expert}, T) = \sum_{i=1}^{T} \left(r_i' - r_i\right)$$

Defender (initial state …, O-1, O0):
  Actions:  (a1,d1)  (a2,d2)  …  (ak+1,dk+1)  …
  Outcomes: O1  O2  …  Ok+1  …
  Rewards:  r1  r2  …  rk+1  …

Expert (initial state …, O-1, O0):
  Actions:  (a1,d1')  (a2',d2')  …  (ak+1,dk+1')  …
  Outcomes: O1'  O2'  …  Ok+1'  …
  Rewards:  r1'  r2'  …  rk+1'  …
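
The definition compares two separate trajectories: the adversary reacts to the defender in one and to the expert in the other. The Python sketch below computes Regret(D, Expert, T) = Σ (r'_i − r_i) that way with the speeding-game utilities. The k-adaptive adversary (a fresh "speed until caught" driver every k rounds) and the always-catches detection rule are hypothetical. For contrast, it also computes the standard (oblivious) regret, which scores the expert against the adversary moves the defender actually saw.

```python
UTILITY = {
    "High": {"Speed": 0.19, "Behave": 0.7},
    "Low": {"Speed": 0.2, "Behave": 1.0},
}


def run(defender_actions, k):
    """Play against a k-adaptive adversary; return (rewards, adversary moves).

    Every k rounds a fresh adversary starts with zero tickets and speeds
    until it is caught (hypothetical: High Inspection always catches it).
    """
    rewards, moves, tickets = [], [], 0
    for i, d in enumerate(defender_actions):
        if i % k == 0:
            tickets = 0  # a new adversary for the next k rounds
        a = "Speed" if tickets == 0 else "Behave"
        if d == "High" and a == "Speed":
            tickets += 1
        rewards.append(UTILITY[d][a])
        moves.append(a)
    return rewards, moves


def adaptive_regret(defender_actions, expert_actions, k):
    # Each side faces its own copy of the adaptive adversary.
    r, _ = run(defender_actions, k)
    r_prime, _ = run(expert_actions, k)
    return sum(rp - ri for rp, ri in zip(r_prime, r))


def oblivious_regret(defender_actions, expert_actions, k):
    # Standard regret: the expert is scored against the defender's adversary moves.
    r, moves = run(defender_actions, k)
    r_prime = [UTILITY[e][a] for e, a in zip(expert_actions, moves)]
    return sum(rp - ri for rp, ri in zip(r_prime, r))


T, k = 10, 5
always_low, always_high = ["Low"] * T, ["High"] * T
print(round(adaptive_regret(always_low, always_high, k), 2))   # 3.98
print(round(oblivious_regret(always_low, always_high, k), 2))  # -0.1
```

Under standard regret, always playing Low Inspection looks best (negative regret), but once the adversary's reaction is counted, the High Inspection expert earns 3.98 more over 10 rounds, echoing the "See! My advice was better!" exchange earlier in the talk.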

k-Adaptive Regret Minimization

Definition: An algorithm D is a γ-approximate k-adaptive regret minimization algorithm if, for any bounded memory-m game and any fixed set of experts EXP,

$$\lim_{T \to \infty} \max_{E \in \mathrm{EXP}} \frac{\mathrm{Regret}(D, E, T)}{T} \le \gamma.$$

Outline

Motivation
Background
Bounded Memory Games
Adaptive Regret
Results

k-Adaptive Regret Minimization

Definition: An algorithm D is a γ-approximate k-adaptive regret minimization algorithm if, for any bounded memory-m game and any fixed set of experts EXP,

$$\lim_{T \to \infty} \max_{E \in \mathrm{EXP}} \frac{\mathrm{Regret}(D, E, T)}{T} \le \gamma.$$

Theorem: For any γ > 0 there is an inefficient γ-approximate k-adaptive regret minimization algorithm.

Inefficient Regret Minimization Algorithm

• Use standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05]

[Figure: each fixed strategy (f1, f2, …) of the Bounded Memory-m Game becomes an action of a Repeated Game]

Inefficient Regret Minimization Algorithm

Expected reward in the original game, given that:

1. Defender follows fixed strategy f2 for the next m*k*t rounds of the original game
2. Defender sees the sequence of k-adaptive adversaries below

Inefficient Regret Minimization Algorithm

Current outcome is only dependent on current actions

Start State, then Stage i (m*k*t rounds)
Real Game:     … O1 … Om …
Repeated Game: … O1 … Om …

Inefficient Regret Minimization Algorithm

Start State, then Stage i (m*k*t rounds)
Real Game:     … O1 … Om …
Repeated Game: … O1 … Om …

• After m rounds in Stage i, View 1 and View 2 must converge to the same state
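
The convergence claim follows from the state being only the last m outcomes: if both views see the same outcomes during the stage (which holds when they play the same actions, since the current outcome depends only on the current actions), their windows coincide after m rounds whatever the start state. A small Python check, with hypothetical outcome labels and m = 3:

```python
from collections import deque


def state_after(start_state, outcomes):
    """Slide a length-m window (m = len(start_state)) over the new outcomes."""
    window = deque(start_state, maxlen=len(start_state))
    for outcome in outcomes:
        window.append(outcome)  # drops the oldest outcome once full
    return tuple(window)


real_start = ("violation", "ok", "violation")  # hypothetical start states
repeated_start = ("ok", "ok", "ok")
stage = ["ok", "violation", "ok"]              # outcomes shared by both views

# After fewer than m rounds the two views can still disagree ...
print(state_after(real_start, stage[:2]) == state_after(repeated_start, stage[:2]))  # False
# ... but after m rounds both windows hold exactly the shared outcomes.
print(state_after(real_start, stage) == state_after(repeated_start, stage))  # True
```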

Inefficient Regret Minimization Algorithm

Standard Regret Minimization algorithms maintain a weight for each expert.

Inefficient: Exponentially many fixed strategies!
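
To see how quickly the number of experts grows: if a state is exactly the last m outcomes, there are |O|^m states for an outcome alphabet O, and a fixed strategy picks one of |D| defender actions per state, giving |D|^(|O|^m) fixed strategies. The sizes used below (two actions, two outcomes) are hypothetical.

```python
def num_states(num_outcomes, m):
    # A state is a sequence of the last m outcomes.
    return num_outcomes ** m


def num_fixed_strategies(num_defender_actions, num_outcomes, m):
    # A fixed strategy chooses one defender action for every state.
    return num_defender_actions ** num_states(num_outcomes, m)


for m in range(1, 6):
    print(m, num_fixed_strategies(2, 2, m))
# 1 4
# 2 16
# 3 256
# 4 65536
# 5 4294967296
```

Keeping one explicit weight per fixed strategy is therefore out of reach even for small m.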

Summary of Technical Results

                        Imperfect Information               Perfect Information
Oblivious Regret        Hard (Theorem 1), APX (Theorem 5)   APX (Theorem 4)
k-Adaptive Regret       Hard (Theorem 1)                    Hard (Remark 2)
Fully Adaptive Regret   X (Theorem 6)                       X (Theorem 6)

Easier

X – No Regret Minimization Algorithm Exists

Hard – unless RP = NP, no regret minimization algorithm is efficient

APX – efficient approximate regret minimization algorithm

Summary of Technical Results

                        Imperfect Information               Perfect Information
Oblivious Regret        Hard (Theorem 1), APX (Theorem 5)   APX (Theorem 4)
k-Adaptive Regret       Hard (Theorem 1), APX (New!)        Hard (Remark 2), APX (New!)
Fully Adaptive Regret   X (Theorem 6)                       X (Theorem 6)

Easier

X – No Regret Minimization Algorithm Exists

Hard – unless RP = NP, no regret minimization algorithm is efficient

APX – efficient approximate regret minimization algorithm in n, k

Summary of Technical Results

                        Imperfect Information   Perfect Information
Oblivious Regret
k-Adaptive Regret
Fully Adaptive Regret   X (Theorem 6)           X (Theorem 6)

Easier

Ideas: Implicit weight representation + Dynamic Programming

Warning! f(k) is a very large constant!

Implicit Weights: Outcome Tree

[Figure: outcome tree branching on Speed and Behave, with a weight w_uv on each edge (u,v); annotated "O(ln n / γ) nodes"]

How often is edge (s,t) relevant?

Implicit Weights: Outcome Tree

Expert: E

[Figure: outcome tree branching on Speed and Behave, with edge weights w_uv; annotated "O(ln n / γ) nodes"]

$$w_E = \sum_{(u,v) \in E} R_{uv} \, w_{uv}$$

Open Questions

Perfect Information: efficient γ-Approximate k-Adaptive Regret Minimization Algorithm when k = 0 and γ = 0?

γ-Approximate k-Adaptive Regret Minimization Algorithm with more efficient running time?

                        Imperfect Information               Perfect Information
Oblivious Regret        Hard (Theorem 1), APX (Theorem 5)   APX (Theorem 4)
k-Adaptive Regret       Hard (Theorem 1), APX               Hard (Remark 2), APX
Fully Adaptive Regret   X (Theorem 6)                       X (Theorem 6)

Thanks for Listening!

THEOREM 3

Unless RP = NP, there is no efficient Regret Minimization algorithm for Bounded Memory Games, even against an oblivious adversary.

Reduction from MAX 3-SAT (7/8 + ε) [Hastad01], similar to the reduction in [EKM05] for MDPs.

THEOREM 3: SETUP

Defender Actions A: {0,1} × {0,1}

m = O(log n)

States: Two states for each variable: S0 = {s1, …, sn}, S1 = {s'1, …, s'n}

Intuition: A fixed strategy corresponds to a variable assignment

THEOREM 3: OVERVIEW

The adversary picks a clause uniformly at random for the next n rounds.

The defender can earn reward 1 by satisfying this unknown clause in the next n rounds.

The game “remembers” whether a reward has already been given, so the defender cannot earn a reward multiple times during the n rounds.

THEOREM 3: STATE TRANSITIONS

Adversary Actions B: {0,1} × {0,1,2,3}

b = (b1, b2)

g(a,b) = b1

$$f(a,b) = \begin{cases} S_1 & \text{if } a_2 = 1 \text{ or } b_2 = a_1 \text{ (reward already given)} \\ S_0 & \text{otherwise (no reward given)} \end{cases}$$

THEOREM 3: REWARDS

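b = (b1, b2)

$$r(a,b,s) = \begin{cases} 1 & \text{if } s \in S_0 \text{ and } a = b_2 \\ -5 & \text{if } s \in S_1,\ f(a,b) = S_0, \text{ and } b_2 \neq 3 \\ 0 & \text{otherwise} \end{cases}$$

No reward whenever B plays b2 = 2

No reward whenever s ∈ S1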

THEOREM 3: OBLIVIOUS ADVERSARY

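(d1, …, dn): binary De Bruijn sequence of order n

1. Pick a clause C uniformly at random
2. For i = 1, …, n: play b = (di, b2)
3. Repeat Step 1

where

$$b_2 = \begin{cases} 1 & \text{if } x_i \in C \\ 0 & \text{if } \neg x_i \in C \\ 3 & \text{if } i = n \\ 2 & \text{otherwise} \end{cases}$$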
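
The adversary's first coordinate follows a binary De Bruijn sequence: a cyclic sequence of length 2^n in which every binary string of length n appears exactly once as a window. A standard way to generate one is the recursive Lyndon-word construction, sketched below in Python; the function names `de_bruijn` and `cyclic_windows` are illustrative.

```python
def de_bruijn(k, n):
    """De Bruijn sequence of order n over the alphabet {0, ..., k-1}.

    Standard construction: concatenate, in lexicographic order, the Lyndon
    words whose length divides n.
    """
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


def cyclic_windows(seq, n):
    # Every length-n window, wrapping around the end of the sequence.
    length = len(seq)
    return {tuple(seq[(i + j) % length] for j in range(n)) for i in range(length)}


d = de_bruijn(2, 3)
print(d)                                    # [0, 0, 0, 1, 0, 1, 1, 1]
print(len(cyclic_windows(d, 3)) == 2 ** 3)  # True
```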

ANALYSIS

Defender can never be rewarded from s ∈ S1

Get Reward => Transition to s ∈ S1

Defender is punished for leaving S1, unless the adversary plays b2 = 3 (i.e., when i = n)

THEOREM 3: ANALYSIS

φ – assignment satisfying a ρ fraction of clauses

fφ – average score ρ/n

Claim: No strategy (fixed or adaptive) can obtain an average expected score better than ρ*/n

Regret Minimization Algorithm: run until expected average regret < ε/n => expected average score > (ρ* − ε)/n