
Adaptive Regret Minimization in Bounded Memory Games

Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha

GameSec 2013 – Invited Paper

Motivating Example: Cheating Game

[Timeline: Semester 1, Semester 2, Semester 3]

Motivating Example: Speeding Game

[Timeline: Week 1, Week 2, Week 3]

Motivating Example: Speeding Game

Example

Actions: High Inspection, Low Inspection (Defender); Speed, Behave (Adversary)
Outcomes: determined by the current actions

Questions:
o What is an appropriate game model for this interaction?
o What are good Defender strategies?

Game Elements

o Repeated Interaction
o Two Players: Defender and Adversary
o Imperfect Information
  o Defender only observes the outcome
o Short-Term Adversaries
o Adversary Incentives Unknown to Defender
  o Last presentation! [JNTP 13]
  o Adversary may be uninformed/irrational

Repeated Game Model? Stackelberg?

Additional Game Elements

o History-dependent Actions
  o Adversary adapts behavior following an unknown strategy
  o How should the Defender respond?
o History-dependent Rewards
  o Point System
  o Reputation of the Defender depends both on its history and on the current outcome

Standard Regret Minimization? Repeated Game Model?

Outline

Motivation
Background
  Standard Definition of Regret
  Regret Minimization Algorithms
  Limitations
Our Contributions
  Bounded Memory Games
  Adaptive Regret
  Results

Speeding Game: Repeated Game Model

Example: Defender's (D) Expected Utility

                          Speed   Behave
  High Inspection         0.19    0.7
  Low Inspection          0.2     1

Regret Minimization Example

Example

Experts

[Figure: over three days, the Defender asks "What should I do?" while two experts each recommend High or Low Inspection and the adversary chooses Speed or Behave.]

Regret Minimization Example

Example: Defender's Utility

                          Speed   Behave
  High Inspection         0.19    0.7
  Low Inspection          0.2     1

Experts: Aristotle and Plato each advise the Defender (High or Low Inspection) on Day 1, Day 2, Day 3.

  Aristotle's advice would have earned: 0.19 + 1 + 0.7 = 1.89
  Plato's advice would have earned:     0.2 + 1 + 1 = 2.2
  Defender's utility:                   1.59

Regret Minimization Example (continued)

  Regret of the Defender relative to the best expert: 0.59

Regret Minimization Algorithm (A)

An algorithm A is regret minimizing if, for the given set of Experts,

  lim_{T→∞} Regret(A, Experts, T) / T ≤ 0

Regret Minimization: Basic Idea

Weights: Low Inspection 1.0, High Inspection 1.0
Choose an action probabilistically based on the weights.

(Payoff matrix as above.)

Regret Minimization: Basic Idea

Updated weights after one round: High Inspection 0.5, Low Inspection 1.5

(Payoff matrix as above; Low Inspection earned more, so its weight increased.)
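To make the weight update concrete, here is a minimal multiplicative-weights ("hedge") sketch in Python. The learning rate ETA, the exact update rule, and the full-information feedback are assumptions for illustration; the 0.5/1.5 numbers on the slide come from the talk's own (unspecified) update.

    import random

    # Minimal multiplicative-weights sketch for the two defender actions above.
    # Assumed: learning rate ETA and full-information feedback (the defender
    # sees what each action would have earned); payoffs are the slide's matrix.
    PAYOFF = {("High", "Speed"): 0.19, ("High", "Behave"): 0.7,
              ("Low",  "Speed"): 0.2,  ("Low",  "Behave"): 1.0}
    ETA = 1.0
    weights = {"High": 1.0, "Low": 1.0}

    def choose_action():
        """Pick an action with probability proportional to its weight."""
        total = sum(weights.values())
        return "High" if random.random() < weights["High"] / total else "Low"

    def update(adversary_action):
        """Grow each action's weight according to the utility it would have earned."""
        for d in weights:
            weights[d] *= 1 + ETA * PAYOFF[(d, adversary_action)]

    print(choose_action())   # round 1: uniform over High and Low
    update("Speed")          # suppose the driver sped this round
    print(weights)           # Low Inspection's weight is now larger, and keeps growing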

Speeding Game

Example (payoff matrix as above)

Defender's Strategy:
  o Nash Equilibrium: Low Inspection (Low Inspection is a dominant strategy)
  o Regret Minimization: converges to Low Inspection

The weights drift toward Low Inspection over the rounds:
  High Inspection 0.5, Low Inspection 1.5
  High Inspection 0.3, Low Inspection 1.7
  High Inspection 0.1, Low Inspection 1.9
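The drift matters because of the adaptive drivers from the motivating example. The sketch below is a hypothetical simulation, not the paper's algorithm: it assumes the slide's payoffs, short-term "speed until caught" drivers who are replaced every WEEK rounds, and a full-information multiplicative-weights defender. Committing to High Inspection deters speeding and earns roughly 0.63 per round, while the regret-minimizing defender drifts to Low Inspection and earns far less, even though its external regret is small.

    import random

    # Illustrative simulation only; the adversary model and constants are assumptions.
    PAYOFF = {("High", "Speed"): 0.19, ("High", "Behave"): 0.7,
              ("Low",  "Speed"): 0.2,  ("Low",  "Behave"): 1.0}
    WEEK, ROUNDS, ETA = 7, 2000, 0.1

    def run(defender):
        total, caught = 0.0, False
        for t in range(ROUNDS):
            if t % WEEK == 0:
                caught = False                     # a fresh driver arrives
            adv = "Behave" if caught else "Speed"  # speed until caught, then stop
            d = defender(adv)
            if d == "High" and adv == "Speed":
                caught = True                      # ticket issued
            total += PAYOFF[(d, adv)]
        return total / ROUNDS

    def always_high(adv):
        return "High"                              # commit to High Inspection

    weights = {"High": 1.0, "Low": 1.0}
    def regret_minimizer(adv):
        """Standard multiplicative-weights defender (full-information update)."""
        d = "High" if random.random() < weights["High"] / sum(weights.values()) else "Low"
        for a in weights:
            weights[a] *= 1 + ETA * PAYOFF[(a, adv)]
        return d

    print("Commit to High Inspection  :", round(run(always_high), 2))        # about 0.63
    print("Standard regret minimization:", round(run(regret_minimizer), 2))  # much lower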

Philosophical Argument

"See! My advice was better!"
"We need a better game model!"

Unmodeled Game Elements

o Adversary Incentives Unknown to Defender
  o Last presentation! [JNTP 13]
  o Adversary may be uninformed/irrational
o History-dependent Rewards
  o Point System
  o Reputation of the Defender depends both on its history and on the current outcome
o History-dependent Actions
  o Adversary adapts behavior following an unknown strategy
  o How should the Defender respond?

Outline

Motivation
Background
Our Contributions
  Bounded Memory Games
  Adaptive Regret
  Results

Bounded Memory Games

State s: encodes the last m outcomes
States can capture history-dependent rewards.

Transition on outcome Oi:  (Oi-m, …, Oi-2, Oi-1)  →  (Oi-m+1, …, Oi-1, Oi)

r(d, a, s) - Defender payoff when actions d, a are played at state s

Bounded Memory Games

State s: encodes the last m outcomes
The current outcome depends only on the current actions.

r(d, a, s) - Defender payoff when actions d, a are played at state s

Bounded Memory Games - Experts

Expert advice may depend on the last m outcomes.

Fixed Defender Strategy (a map from State to Action):
  If no violations have been detected in the last m rounds, play High Inspection; otherwise play Low Inspection.
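A minimal sketch of this encoding in Python: a state is just the window of the last m outcomes, and the fixed strategy above is a function of that window. The outcome labels ("Ticket"/"NoTicket") are assumptions for illustration.

    # Sketch: a state of a bounded memory-m game is the tuple of the last m outcomes.
    M = 3

    def next_state(state, outcome):
        """Slide the window: drop the oldest outcome, append the newest."""
        return state[1:] + (outcome,)

    def expert_advice(state):
        """Fixed strategy above: no violation detected in the last m rounds
        -> High Inspection, otherwise Low Inspection."""
        return "High Inspection" if "Ticket" not in state else "Low Inspection"

    state = ("NoTicket",) * M
    for outcome in ["NoTicket", "Ticket", "NoTicket", "NoTicket"]:
        print(state, "->", expert_advice(state))
        state = next_state(state, outcome)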

Outline

Motivation
Background
Our Contributions
  Bounded Memory Games
  Adaptive Regret
  Results

k-Adaptive Strategy

A k-adaptive adversary commits to a decision tree for the next k rounds.

[Figure: decision tree over Day 1, Day 2, Day 3; at each node the adversary chooses Speed or Behave depending on the outcomes observed so far.]

k-Adaptive Strategy

Decision tree for the next k rounds (Week 1, Week 2, Week 3). Example k-adaptive adversary strategies:

o "I will never speed while I am on vacation."
o "I will speed until I get caught. If I ever get a ticket then I will stop."
o "I will keep speeding until I get two tickets. If I ever get two tickets then I will stop."

Example Defender strategy: if violations have been detected in the last 7 rounds, play High Inspection; otherwise play Low Inspection.
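As an illustration, the "speed until caught" strategy above can be written down as such a decision tree; the outcome labels and the dictionary representation are assumptions for this sketch, not the paper's notation.

    OUTCOMES = ("Ticket", "NoTicket")   # assumed outcome labels

    def behave_tree(depth):
        """Subtree that plays Behave no matter what is observed."""
        if depth == 0:
            return None
        return {"action": "Behave",
                "children": {o: behave_tree(depth - 1) for o in OUTCOMES}}

    def speed_until_caught(k):
        """Decision tree for 'I will speed until I get caught; if I ever get a
        ticket then I will stop', covering the next k rounds."""
        if k == 0:
            return None
        return {"action": "Speed",
                "children": {"Ticket": behave_tree(k - 1),
                             "NoTicket": speed_until_caught(k - 1)}}

    tree = speed_until_caught(3)
    print(tree["action"])                           # Speed on day 1
    print(tree["children"]["Ticket"]["action"])     # Behave after a ticket
    print(tree["children"]["NoTicket"]["action"])   # keep speeding if not caught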

k-Adaptive Regret

Regret(D, Expert, T) = Σ_{i=1..T} (ri' - ri)

Defender's play:
  Initial state: …, O-1, O0
  Actions:   (a1, d1), (a2, d2), …, (ak+1, dk+1), …
  Outcomes:  O1, O2, …, Ok+1, …
  Rewards:   r1, r2, …, rk+1, …

Expert's play (the k-adaptive adversary adapts, so its later actions may differ):
  Initial state: …, O-1, O0
  Actions:   (a1, d1'), (a2', d2'), …, (ak+1', dk+1'), …
  Outcomes:  O1', O2', …, Ok+1', …
  Rewards:   r1', r2', …, rk+1', …

k-Adaptive Regret Minimization

Definition: An algorithm D is a γ-approximate k-adaptive regret minimization algorithm if, for any bounded memory-m game and any fixed set of experts EXP,

  lim_{T→∞} ( max_{E ∈ EXP} Regret(D, E, T) / T ) ≤ γ.

Outline

Motivation
Background
Bounded Memory Games
Adaptive Regret
Results

k-Adaptive Regret Minimization

Definition: An algorithm D is a γ-approximate k-adaptive regret minimization algorithm if, for any bounded memory-m game and any fixed set of experts EXP,

  lim_{T→∞} ( max_{E ∈ EXP} Regret(D, E, T) / T ) ≤ γ.

Theorem: For any γ > 0 there is an inefficient γ-approximate k-adaptive regret minimization algorithm.

Inefficient Regret Minimization Algorithm

• Use a standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04, K05, FKM05].

[Diagram: each fixed strategy f1, f2, … of the bounded memory-m game becomes a single action of a repeated game.]

Inefficient Regret Minimization Algorithm

43

, ¿ ,…, , …

f1

f2

Bounded Memory-mGame

fixed

str

ateg

y

Repeated Game

Expected reward in original game given:

1. Defender follows fixed strategy f2 for next mkt rounds of original game

2. Defender sees sequence of k-adaptive adversaries below

• Use standard regret minimization algorithm for repeated games of imperfect information [AK04, McMahanB04,K05,FKM05]

Current outcome is only dependent on current actions

Inefficient Regret Minimization Algorithm

Start State, then Stage_i (m·k·t rounds):

[Diagram: one round of the repeated game corresponds to one stage of m·k·t rounds of the real game; both views produce the outcome sequence …, O1, …, Om, …]

• After m rounds in Stage_i, View 1 and View 2 must converge to the same state.
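A tiny check of the convergence claim, under the assumed window encoding of states: since a state is just the last m outcomes, two views that start in different states agree after m common rounds.

    M = 3

    def next_state(state, outcome):
        return state[1:] + (outcome,)

    view1 = ("Ticket", "NoTicket", "Ticket")       # arbitrary, different start states
    view2 = ("NoTicket", "NoTicket", "NoTicket")
    for o in ["Ticket", "NoTicket", "NoTicket"]:   # m = 3 shared outcomes
        view1, view2 = next_state(view1, o), next_state(view2, o)
    print(view1 == view2)                          # True: the views have converged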

Inefficient Regret Minimization Algorithm

[Diagram as above: fixed strategies f1, f2, … of the bounded memory-m game as actions of a repeated game, solved with a standard regret minimization algorithm for repeated games of imperfect information.]

Standard regret minimization algorithms maintain a weight for each expert.
Inefficient: exponentially many fixed strategies!
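A back-of-the-envelope sketch of the blow-up: if every fixed strategy (a map from states to defender actions) becomes one expert, the expert set is exponential in the number of states, which is itself exponential in m. The action and outcome labels are assumed.

    from itertools import product

    defender_actions = ["High Inspection", "Low Inspection"]
    outcomes = ["Ticket", "NoTicket"]   # assumed labels
    m = 3                               # memory of the game

    states = list(product(outcomes, repeat=m))                    # |O|**m states
    fixed_strategies = list(product(defender_actions, repeat=len(states)))

    print(len(states))            # 8
    print(len(fixed_strategies))  # 256 = |D| ** (|O| ** m): one weight per expert is infeasible

This is exactly the cost that the implicit weight representation on the following slides avoids.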

Summary of Technical Results

                          Imperfect Information               Perfect Information
  Oblivious Regret        Hard (Theorem 1), APX (Theorem 5)   APX (Theorem 4)
  k-Adaptive Regret       Hard (Theorem 1)                    Hard (Remark 2)
  Fully Adaptive Regret   X (Theorem 6)                       X (Theorem 6)

Easier (top rows) to harder (bottom rows).

  X    - no regret minimization algorithm exists
  Hard - unless RP = NP, no regret minimization algorithm is efficient in n
  APX  - an efficient approximate regret minimization algorithm exists

Summary of Technical Results

                          Imperfect Information               Perfect Information
  Oblivious Regret        Hard (Theorem 1), APX (Theorem 5)   APX (Theorem 4)
  k-Adaptive Regret       Hard (Theorem 1), APX (New!)        Hard (Remark 2), APX (New!)
  Fully Adaptive Regret   X (Theorem 6)                       X (Theorem 6)

Easier (top rows) to harder (bottom rows).

  X    - no regret minimization algorithm exists
  Hard - unless RP = NP, no regret minimization algorithm is efficient in n
  APX  - an efficient approximate regret minimization algorithm (efficient in n, k)

Summary of Technical Results

[Table as above; Fully Adaptive Regret remains X (Theorem 6) in both the imperfect and perfect information settings.]

Ideas: implicit weight representation + dynamic programming
Warning! f(k) is a very large constant!

Implicit Weights: Outcome Tree

[Figure: an outcome tree with O(ln n / γ) nodes; branches are labeled Behave / Speed.]

Edge weight w_uv: how often is edge (u, v) relevant?

Implicit Weights: Outcome Tree

Expert E

[Figure: the same outcome tree with O(ln n / γ) nodes; the edges used by expert E are marked.]

The weight of expert E is maintained implicitly:

  w_E = Σ_{(u,v) ∈ E} R_uv · w_uv
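A minimal sketch of my reading of the implicit-weight idea: keep per-edge statistics on one shared outcome tree and recover any expert's cumulative score on demand, instead of storing a weight per expert. The data layout and the toy edges are assumptions; the actual algorithm additionally needs dynamic programming over the tree to sample actions from these implicit weights.

    from collections import defaultdict

    w = defaultdict(int)   # w[(u, v)]: how often edge (u, v) has been relevant
    R = {}                 # R[(u, v)]: reward associated with edge (u, v)

    def record(edge, reward):
        """Update the shared statistics after a round in which `edge` was relevant."""
        w[edge] += 1
        R[edge] = reward

    def implicit_score(expert_edges):
        """w_E = sum of R_uv * w_uv over the edges (u, v) used by the expert."""
        return sum(R[e] * w[e] for e in expert_edges if e in R)

    # Toy usage with hypothetical outcome-tree edges:
    record(("root", "Behave"), 1.0)
    record(("root", "Speed"), 0.2)
    record(("root", "Behave"), 1.0)
    print(implicit_score([("root", "Behave")]))   # 2.0
    print(implicit_score([("root", "Speed")]))    # 0.2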

Open Questions

Perfect Information: is there an efficient γ-approximate k-adaptive regret minimization algorithm when k = 0 and γ = 0?

Is there a γ-approximate k-adaptive regret minimization algorithm with a more efficient running time?

                          Imperfect Information               Perfect Information
  Oblivious Regret        Hard (Theorem 1), APX (Theorem 5)   APX (Theorem 4)
  k-Adaptive Regret       Hard (Theorem 1), APX               Hard (Remark 2), APX
  Fully Adaptive Regret   X (Theorem 6)                       X (Theorem 6)

Thanks for Listening!

THEOREM 3

Unless RP = NP there is no efficient regret minimization algorithm for Bounded Memory Games, even against an oblivious adversary.

Reduction from MAX 3-SAT (7/8 + ε) [Hastad01]; similar to the reduction in [EKM05] for MDPs.

THEOREM 3: SETUP

Defender Actions A: {0,1} x {0,1}
m = O(log n)
States: two states for each variable: S0 = {s1, …, sn}, S1 = {s'1, …, s'n}
Intuition: a fixed strategy corresponds to a variable assignment.

THEOREM 3: OVERVIEW

The adversary picks a clause uniformly at random for the next n rounds.
The Defender can earn reward 1 by satisfying this unknown clause during those n rounds.
The game "remembers" whether a reward has already been given, so the Defender cannot earn a reward multiple times during the n rounds.

THEOREM 3: STATE TRANSITIONS

Adversary Actions B: {0,1} x {0,1,2,3},  b = (b1, b2)

g(a, b) = b1

f(a, b) = S1  if a2 = 1 or b2 = a1   (reward already given)
          S0  otherwise              (no reward given)

THEOREM 3: REWARDS

b = (b1, b2)

r(a, b, s) =  1   if s ∈ S0 and a1 = b2
             -5   if s ∈ S1 and f(a, b) = S0 and b2 ≠ 3
              0   otherwise

No reward is given whenever B plays b2 = 2, and no reward is given whenever s ∈ S1.
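A literal Python transcription of g, f and r as reconstructed above, as a sketch only: the state is abstracted to the labels "S0"/"S1", and the slide's "a = b2" is read as a1 = b2 (both are my assumptions).

    def g(a, b):
        """Outcome rule from the slides: g(a, b) = b1."""
        return b[0]

    def f(a, b):
        """Next state: S1 if a2 = 1 or b2 = a1 (reward already given), else S0."""
        a1, a2 = a
        b1, b2 = b
        return "S1" if (a2 == 1 or b2 == a1) else "S0"

    def r(a, b, s):
        """Defender reward at state s (a = defender action, b = adversary action)."""
        a1, a2 = a
        b1, b2 = b
        if s == "S0" and a1 == b2:                       # clause literal satisfied from S0
            return 1
        if s == "S1" and f(a, b) == "S0" and b2 != 3:    # punished for leaving S1
            return -5
        return 0

    print(r((1, 0), (0, 1), "S0"))   # 1
    print(r((0, 0), (0, 1), "S1"))   # -5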

THEOREM 3: OBLIVIOUS ADVERSARY

(d1, …, dn) - a binary De Bruijn sequence of order n

1. Pick a clause C uniformly at random.
2. For i = 1, …, n: play b = (di, b2), where
     b2 = 1  if xi ∈ C
          0  if ¬xi ∈ C
          3  if i = n
          2  otherwise
3. Repeat Step 1.
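For reference, a standard Lyndon-word construction of a binary de Bruijn sequence (every length-n binary string appears as a substring of the cyclic sequence). The slides do not say how (d1, …, dn) is generated, so this is only one common way to produce such a sequence, not necessarily the paper's.

    def de_bruijn(k, n):
        """Return a de Bruijn sequence B(k, n) as a list of symbols 0..k-1
        (Lyndon-word concatenation); for the binary case use k = 2."""
        a = [0] * (k * n)
        seq = []

        def db(t, p):
            if t > n:
                if n % p == 0:
                    seq.extend(a[1:p + 1])
            else:
                a[t] = a[t - p]
                db(t + 1, p)
                for j in range(a[t - p] + 1, k):
                    a[t] = j
                    db(t + 1, t)

        db(1, 1)
        return seq

    print(de_bruijn(2, 3))   # [0, 0, 0, 1, 0, 1, 1, 1]: every 3-bit string appears cyclically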

ANALYSIS

The Defender can never be rewarded from s ∈ S1.
Getting a reward ⇒ transition to s ∈ S1.
The Defender is punished for leaving S1, unless the adversary plays b2 = 3 (i.e., when i = n).

f(a, b) = S1  if a2 = 1 or b2 = a1
          S0  otherwise

r(a, b, s) =  1   if s ∈ S0 and a1 = b2
             -5   if s ∈ S1 and f(a, b) = S0 and b2 ≠ 3
              0   otherwise

THEOREM 3: ANALYSIS

φ - an assignment satisfying a ρ fraction of clauses
fφ - the corresponding fixed strategy, with average score ρ/n

Claim: no strategy (fixed or adaptive) can obtain an average expected score better than ρ*/n.

Regret Minimization Algorithm: run until the expected average regret is < ε/n; the expected average score is then > (ρ* - ε)/n.