15-382 COLLECTIVE INTELLIGENCE – S18gdicaro/15382-Spring18/...lecture28: game theory 3 instructor:...

Page 1:

LECTURE 28: GAME THEORY 3

INSTRUCTOR: GIANNI A. DI CARO

15-382 COLLECTIVE INTELLIGENCE – S18

Page 2:

15781 Fall 2016: Lecture 22

MIXED NASH EQUILIBRIUM

        R       P       S
R     0,0    -1,1    1,-1
P    1,-1     0,0    -1,1
S    -1,1    1,-1     0,0

Mixed NE: $(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$, $(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$

Finding ME:

§ Let $\pi_R^i, \pi_P^i, \pi_S^i$ be the probabilities of the pure strategy mix for player $i = 1,2$, with $\pi_R^i + \pi_P^i + \pi_S^i = 1$

§ A mixed strategy equilibrium needs to make player $i$ indifferent among all three of his strategies (i.e., same expected utility)

§ → Find player $i$'s expected utilities as a function of the parameters of the mixed strategy and set the parameters so that the previous requirement is satisfied

        R       P       S
R     0,0    -2,2    1,-1
P    2,-2     0,0    -1,1
S    -1,1    1,-1     0,0

§ In symmetric zero-sum games the expected utility of the players at equilibrium is zero → This property can be used to rule out equilibrium candidates (just check if one player has a positive utility!)
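The indifference requirement above can be checked mechanically. The following is a minimal sketch, not part of the slides: the helper name `pure_payoffs` and the candidate mix (1/4, 1/4, 1/2) for the second game are assumptions introduced for illustration. It computes the row player's expected payoff for each pure strategy against a given column mix. A candidate is consistent with equilibrium only if all three payoffs are equal (and, in a symmetric zero-sum game, equal to zero).

```python
from fractions import Fraction as F

def pure_payoffs(A, y):
    """Expected payoff of each row strategy against the column mix y."""
    return [sum(a * q for a, q in zip(row, y)) for row in A]

# Row player's payoffs, rows and columns ordered R, P, S.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
modified = [[0, -2, 1], [2, 0, -1], [-1, 1, 0]]

uniform = [F(1, 3)] * 3
candidate = [F(1, 4), F(1, 4), F(1, 2)]  # assumed candidate, not taken from the slide

rps_uniform = pure_payoffs(rps, uniform)            # all equal: indifferent
mod_uniform = pure_payoffs(modified, uniform)       # unequal: ruled out
mod_candidate = pure_payoffs(modified, candidate)   # all zero: indifferent
print(rps_uniform, mod_uniform, mod_candidate)
```

In the second game the uniform mix leaves Paper with a positive expected payoff, so by the zero-utility property it cannot be an equilibrium, while the assumed candidate passes the check.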

Page 3:

GAME OF CHICKEN

http://youtu.be/u7hZ9jKrwvo

§ Each player, in attempting to secure his best outcome, risks the worst

§ Every player wants to dare, but only if the other chickens out!

§ A mediator would help…

Page 4:

GAME OF CHICKEN

§ Social welfare is the sum of utilities

§ Optimal social welfare = 6

§ Pure NE: (C,D) and (D,C), social welfare = 5

§ Mixed NE: both $(\tfrac{1}{2}, \tfrac{1}{2})$, social welfare = 4

§ Can we do better? Players are independent so far …

           Dare    Chicken
Dare       0,0     4,1
Chicken    1,4     3,3
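The welfare figures above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the dictionary `u`, the helper `is_pure_ne`, and the variable names are introduced here for illustration. It enumerates the pure Nash equilibria of the payoff table and evaluates the social welfare of the symmetric mixed equilibrium.

```python
from fractions import Fraction as F
from itertools import product

ACTIONS = ["D", "C"]  # Dare, Chicken
# u[(row, col)] = (row player's utility, column player's utility)
u = {("D", "D"): (0, 0), ("D", "C"): (4, 1),
     ("C", "D"): (1, 4), ("C", "C"): (3, 3)}

def is_pure_ne(r, c):
    """True if neither player gains by unilaterally switching action."""
    row_ok = all(u[(r, c)][0] >= u[(r2, c)][0] for r2 in ACTIONS)
    col_ok = all(u[(r, c)][1] >= u[(r, c2)][1] for c2 in ACTIONS)
    return row_ok and col_ok

pure_ne = [rc for rc in product(ACTIONS, ACTIONS) if is_pure_ne(*rc)]
welfare = {rc: sum(u[rc]) for rc in u}

# Symmetric mixed NE: the opponent dares with probability q that makes
# Dare and Chicken equally good: 4(1-q) = q + 3(1-q), hence q = 1/2.
q = F(1, 2)
mix = {"D": q, "C": 1 - q}
mixed_welfare = sum(mix[r] * mix[c] * welfare[(r, c)] for r, c in u)

print(pure_ne, max(welfare.values()), mixed_welfare)
```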

Page 5:

CORRELATED EQUILIBRIUM

§ A “trusted” authority / mediator chooses a pair of strategies $(s_1, s_2)$ according to a probability distribution $p$ over $S^2$ (it can be generalized to $n$ players)

§ The mediator “flips a coin” / draws according to the distribution $p(s_1, s_2)$ and, based on the outcome, tells the players which pure strategy to use

Robert Aumann, Nobel prize, 2005

Page 6:

CORRELATED EQUILIBRIUM

§ → The trusted party only tells each player what to do, but it does not reveal what the other party is supposed to do!

§ The distribution $p$ is known to the players: each player knows the probability of observing a strategy profile and assumes the other player will follow the mediator's instructions

§ → Posterior conditional probability is known: $\Pr[s_i \mid s_j]$

§ It is a Correlated Equilibrium (CE) if no player wants to deviate from the trusted party's instructions, so that the players' choices are correlated

§ → Find a distribution $p$ that guarantees a CE

Page 7:

CORRELATED EQUILIBRIUM

           Dare    Chicken
Dare       0,0     7,2
Chicken    2,7     6,6

§ Common knowledge: Distribution $p$ (is CE)
o (D,D): 0
o (D,C): $\tfrac{1}{3}$
o (C,D): $\tfrac{1}{3}$
o (C,C): $\tfrac{1}{3}$

§ If Player 2 is told to play D, then P2 knows that the outcome must be (C,D) and that Player 1 will obey the instructions → P1 plays C

✓ Based on this, Player 2 has no incentive to change from playing D as instructed: D yields 7, while switching to C would yield 6

Page 8:

CORRELATED EQUILIBRIUM

§ Distribution $p$ (is CE)
o (D,D): 0
o (D,C): $\tfrac{1}{3}$
o (C,D): $\tfrac{1}{3}$
o (C,C): $\tfrac{1}{3}$

           Dare    Chicken
Dare       0,0     7,2
Chicken    2,7     6,6

§ If Player 2 is told to play C, then Player 2 knows that the outcome must be (D,C) or (C,C) with equal probability.

§ Player 2's expected utility on playing C, conditioned on the fact that he is told to play C (and Player 1 will obey instructions), is:

$\tfrac{1}{2} u_2(D, C) + \tfrac{1}{2} u_2(C, C) = \tfrac{1}{2} \cdot 2 + \tfrac{1}{2} \cdot 6 = 4$

§ If Player 2 deviates from instructions and plays D: $u_2 = \tfrac{1}{2} u_2(D, D) + \tfrac{1}{2} u_2(C, D) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 7 = 3.5 < 4$

✓ It's better to follow the instructions!
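The obey-versus-deviate comparison for Player 2 can be reproduced numerically. The sketch below is illustrative only: the function name `p2_utility` and the dictionary layout are assumptions introduced here. It conditions the mediator's distribution on the recommendation Player 2 receives and compares the expected utility of obeying with that of deviating, assuming Player 1 obeys.

```python
from fractions import Fraction as F

# Payoffs u[(s1, s2)] = (player 1, player 2); D = Dare, C = Chicken.
u = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
     ("C", "D"): (2, 7), ("C", "C"): (6, 6)}
# Mediator's distribution p over (s1, s2), as given on the slide.
p = {("D", "D"): F(0), ("D", "C"): F(1, 3),
     ("C", "D"): F(1, 3), ("C", "C"): F(1, 3)}

def p2_utility(told, played):
    """Player 2's expected utility when told `told` but playing `played`,
    assuming player 1 obeys the mediator."""
    total = sum(p[(s1, told)] for s1 in "DC")  # Pr[player 2 is told `told`]
    return sum(p[(s1, told)] / total * u[(s1, played)][1] for s1 in "DC")

obey_c, deviate_c = p2_utility("C", "C"), p2_utility("C", "D")
obey_d, deviate_d = p2_utility("D", "D"), p2_utility("D", "C")
print(obey_c, deviate_c, obey_d, deviate_d)
```

In both cases obeying is at least as good as deviating, which is the condition the slides check by hand.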

Page 9:

CORRELATED EQUILIBRIUM

§ Distribution $p$ (is CE)
o (D,D): 0
o (D,C): $\tfrac{1}{3}$
o (C,D): $\tfrac{1}{3}$
o (C,C): $\tfrac{1}{3}$

§ Player 2 does not have an incentive to deviate

§ Since the game is symmetric, Player 1 does not have an incentive to deviate either

§ → Correlated equilibrium!

§ Expected reward per player: (1/3)*7 + (1/3)*2 + (1/3)*6 = 5

§ Mixed strategy NE: expected reward per player is 4 2/3 = 14/3 ≈ 4.67, which is < 5

§ Social welfare: 30/3 = 10

           Dare    Chicken
Dare       0,0     7,2
Chicken    2,7     6,6

Page 10:

CORRELATED EQUILIBRIUM

§ Let $N = \{1,2\}$ for simplicity

§ A mediator chooses a pair of strategies $(s_1, s_2)$ according to a distribution $p$ over $S^2$

§ Reveals $s_1$ to player 1 and $s_2$ to player 2

§ When player 1 gets $s_1 \in S$, he knows that the distribution over strategies of 2 is

$$\Pr[s_2 \mid s_1] = \frac{\Pr[s_1 \wedge s_2]}{\Pr[s_1]} = \frac{p(s_1, s_2)}{\sum_{s_2' \in S} p(s_1, s_2')}$$

Page 11:

COMPUTING CE STRATEGY

§ Player 1's strategy $s_1$ is a best response if its expected utility cannot be unilaterally improved based on what he knows:

$$\sum_{s_2 \in S} \Pr[s_2 \mid s_1]\, u_1(s_1, s_2) \ge \sum_{s_2 \in S} \Pr[s_2 \mid s_1]\, u_1(s_1', s_2), \quad \forall s_1' \in S$$

§ Equivalently, replacing using Bayes' rule

$$\sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2) \ge \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2)$$

§ $p$ is a correlated equilibrium (CE) if both players are best responding
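The step from the conditional form to the joint form can be spelled out as follows. This is a short derivation added for clarity, assuming the recommendation $s_1$ is issued with positive probability. When $\Pr[s_1] = 0$, both sides of the joint form are zero and the constraint holds trivially.

```latex
% Posterior of player 1 after being told s_1 (from the previous slide)
\Pr[s_2 \mid s_1] = \frac{p(s_1, s_2)}{\Pr[s_1]},
\qquad
\Pr[s_1] = \sum_{s_2' \in S} p(s_1, s_2') > 0

% Substitute into the best-response condition
\sum_{s_2 \in S} \frac{p(s_1, s_2)}{\Pr[s_1]}\, u_1(s_1, s_2)
\;\ge\;
\sum_{s_2 \in S} \frac{p(s_1, s_2)}{\Pr[s_1]}\, u_1(s_1', s_2)

% Multiply both sides by the positive constant Pr[s_1]
\sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2)
\;\ge\;
\sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2)
```

The resulting inequality is linear in the unknowns $p(s_1, s_2)$, which is what makes a linear-programming formulation possible.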

Page 12:

CE AS LP

§ Can compute CE via linear programming in polynomial time!

find $p(s_1, s_2)$

s.t.

$\forall s_1, s_1' \in S$: $\sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2) \ge \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2)$

$\forall s_2, s_2' \in S$: $\sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2) \ge \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2')$

$\sum_{s_1, s_2 \in S} p(s_1, s_2) = 1$

$\forall s_1, s_2 \in S$: $p(s_1, s_2) \in [0,1]$

Page 13:

BEST WELFARE CE

§ Adding a (linear) objective function $f$, the best correlated equilibrium (e.g., max welfare) can be found

$\max f(p(s_1, s_2); u_1, u_2)$

s.t.

$\forall s_1, s_1' \in S$: $\sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1, s_2) \ge \sum_{s_2 \in S} p(s_1, s_2)\, u_1(s_1', s_2)$

$\forall s_2, s_2' \in S$: $\sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2) \ge \sum_{s_1 \in S} p(s_1, s_2)\, u_2(s_1, s_2')$

$\sum_{s_1, s_2 \in S} p(s_1, s_2) = 1$

$\forall s_1, s_2 \in S$: $p(s_1, s_2) \in [0,1]$
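To make the program above concrete without depending on an LP solver, the sketch below replaces the solver with a coarse brute-force grid search. This is a stand-in chosen for illustration, not the polynomial-time method the slide refers to. It uses the 7/2/6 Chicken payoffs from the earlier slides. The grid resolution `N` and the helper names `is_ce` and `welfare` are assumptions. The CE constraints are checked in the joint form shown above, and social welfare is used as the objective $f$.

```python
from fractions import Fraction as F
from itertools import product

S = ["D", "C"]
u = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
     ("C", "D"): (2, 7), ("C", "C"): (6, 6)}
PROFILES = list(product(S, S))  # (D,D), (D,C), (C,D), (C,C)

def is_ce(p):
    """Check every (recommendation s, deviation d) constraint for both players."""
    for s, d in product(S, S):
        gain1 = sum(p[(s, t)] * (u[(s, t)][0] - u[(d, t)][0]) for t in S)
        gain2 = sum(p[(t, s)] * (u[(t, s)][1] - u[(t, d)][1]) for t in S)
        if gain1 < 0 or gain2 < 0:
            return False
    return True

def welfare(p):
    """Expected sum of both players' utilities under p."""
    return sum(p[k] * (u[k][0] + u[k][1]) for k in PROFILES)

N = 12  # grid resolution (assumption): probabilities are multiples of 1/12
best = None
for a, b, c in product(range(N + 1), repeat=3):
    d = N - a - b - c
    if d < 0:
        continue
    p = dict(zip(PROFILES, (F(a, N), F(b, N), F(c, N), F(d, N))))
    if is_ce(p) and (best is None or welfare(p) > welfare(best)):
        best = p

print(best, welfare(best))
```

On this grid the search returns a distribution with welfare 21/2, slightly above the welfare of 10 obtained with the uniform distribution over (D,C), (C,D), (C,C). A real implementation would hand the same constraints to an LP solver instead of enumerating a grid.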

Page 14:

IMPLEMENTATION OF CE

§ Instead of a mediator, use a hat!

§ Balls in hat are labeled with “chicken” or “dare”, each blindfolded player takes a ball

Which balls implement the distribution $p$ from before?

1. 1 chicken, 1 dare
2. 2 chicken, 1 dare
3. 2 chicken, 2 dare
4. 3 chicken, 2 dare

[Figure: balls labeled C and D]

E.g., an automatic trusted authority can be implemented using cryptographic algorithms
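The hat question can be settled by enumeration. The sketch below rests on assumptions not stated on the slide: the two players draw one ball each, uniformly at random and without replacement, and the helper name `draw_distribution` is introduced here. It computes the induced distribution over (player 1's ball, player 2's ball) for each option and compares it with the distribution $p$ that puts probability 1/3 on each of (D,C), (C,D), (C,C).

```python
from fractions import Fraction as F
from itertools import permutations

def draw_distribution(n_chicken, n_dare):
    """Distribution over (player 1's ball, player 2's ball) when two balls
    are drawn uniformly at random without replacement."""
    balls = ["C"] * n_chicken + ["D"] * n_dare
    draws = list(permutations(range(len(balls)), 2))  # ordered pairs of distinct balls
    dist = {}
    for i, j in draws:
        key = (balls[i], balls[j])
        dist[key] = dist.get(key, F(0)) + F(1, len(draws))
    return dist

# Option number -> (number of chicken balls, number of dare balls)
options = {1: (1, 1), 2: (2, 1), 3: (2, 2), 4: (3, 2)}
target = {("D", "C"): F(1, 3), ("C", "D"): F(1, 3), ("C", "C"): F(1, 3)}
matches = [k for k, (c, d) in options.items() if draw_distribution(c, d) == target]
print(matches)
```

Any hat with two or more dare balls gives (D,D) positive probability, which the target distribution excludes.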

Page 15:

CE VS. NE

What is the relation between CE and NE?

1. CE ⇒ NE
2. NE ⇒ CE
3. NE ⇔ CE
4. NE ∥ CE

§ For any pure strategy NE, there is a corresponding correlated equilibrium yielding the same outcome.

§ For any mixed strategy NE, there is a corresponding correlated equilibrium yielding the same distribution of outcomes.

§ By Nash's theorem, every finite game has a mixed strategy NE. Since a NE implies a CE, a CE always exists

Page 16:

A DIFFERENT TYPE OF GAMES: STACKELBERG GAMES

        L      R
U     1,1    3,0
D     0,0    2,1

§ Playing up is a dominant strategy for row player

§ Row player plays up

§ Column player would then play left

§ Therefore, (1,1) is the only Nash equilibrium outcome

Page 17:

COMMITMENT IS GOOD

§ Suppose the game is played sequentially as follows:

o Row player commits to playing a row

o Column player observes the commitment and chooses a column

§ Row player can commit to playing Down: Column player will then play Right, and the Row player now gets a better reward (2 instead of 1)!

        L      R
U     1,1    3,0
D     0,0    2,1

Page 18:

COMMITMENT TO MIXED STRATEGY

        L      R
U     1,1    3,0      $\pi_U = 0.49$
D     0,0    2,1      $\pi_D = 0.51$

Column player: $\pi_L = 0$, $\pi_R = 1$

§ By committing to a mixed strategy, row player can do even better and guarantee a reward of almost 2.5: 0.49×3 + 0.51×2 = 2.49

§ Stackelberg strategy (1934)

§ Rooted in duopoly scenarios

§ Player 1 (Leader) moves at the start of the game. Then use backward induction to find the subgame perfect equilibrium.

§ First, for any output of the leader, find the strategy of the follower that maximizes its payoff (its expected best reply).

§ Next, find the strategy of the leader that maximizes the leader's utility, given the strategy of the follower
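The two backward-induction steps can be sketched in code for the 2×2 game above. This is an illustrative sketch: the function name `commit` and the tie-breaking rule (ties go to L, an arbitrary choice) are assumptions. Given the leader's probability of playing Up, the follower picks the column with the highest expected payoff, and the leader's expected utility is then evaluated for that column.

```python
from fractions import Fraction as F

# u[(row, col)] = (leader/row utility, follower/column utility)
u = {("U", "L"): (1, 1), ("U", "R"): (3, 0),
     ("D", "L"): (0, 0), ("D", "R"): (2, 1)}

def commit(pi_u):
    """Leader commits to Up with probability pi_u; the follower observes the
    commitment and best responds. Returns (follower's column, leader's utility)."""
    mix = {"U": pi_u, "D": 1 - pi_u}

    def expected(col, who):
        return sum(mix[r] * u[(r, col)][who] for r in mix)

    # max() keeps the first maximizer, so ties go to L (arbitrary assumption).
    col = max(["L", "R"], key=lambda c: expected(c, 1))
    return col, expected(col, 0)

print(commit(F(1)))         # pure Up
print(commit(F(0)))         # pure Down
print(commit(F(49, 100)))   # the mixed commitment from the slide
```

With this tie-breaking rule, committing to exactly 1/2 would let the follower choose L, which is why the slide uses 0.49 to keep the follower strictly preferring R.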

Page 19:

COMPUTING STACKELBERG

§ Theorem [Conitzer and Sandholm, 2006]: In 2-player normal form games, an optimal Stackelberg strategy can be found in polynomial time

§ Theorem: The problem is NP-hard when the number of players is ≥ 3

Page 20:

TRACTABILITY: 2 PLAYERS

§ For each pure strategy $s_2$ of the follower, we compute via the LP below a mixed strategy $x_1$ for the leader such that:

o Playing $s_2$ is a best response for the follower

o Under this constraint, $x_1$ is optimal for the leader (given that the follower plays $s_2$)

§ Choose the $x_1^*$ that maximizes the leader's utility over all choices of $s_2$

$\max \sum_{s_1 \in S} x_1(s_1)\, u_1(s_1, s_2)$

s.t.

$\forall s_2' \in S$: $\sum_{s_1 \in S} x_1(s_1)\, u_2(s_1, s_2) \ge \sum_{s_1 \in S} x_1(s_1)\, u_2(s_1, s_2')$

$\sum_{s_1 \in S} x_1(s_1) = 1$

$\forall s_1 \in S$: $x_1(s_1) \in [0,1]$
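The one-LP-per-follower-strategy scheme can be illustrated on the 2×2 game from the previous slides. With only two leader strategies, each LP has a single variable $x = x_1(U)$, so its optimum lies at an endpoint of the feasible interval and no solver is needed. The sketch below is an illustration for that special case only, not a general implementation. The function name `solve_for` is an assumption, and so is the convention that the follower breaks ties in the leader's favor.

```python
from fractions import Fraction as F

COLS = ["L", "R"]
# u[(row, col)] = (leader utility, follower utility)
u = {("U", "L"): (1, 1), ("U", "R"): (3, 0),
     ("D", "L"): (0, 0), ("D", "R"): (2, 1)}

def solve_for(s2):
    """1-D LP over x = Pr[leader plays U], subject to s2 being a follower
    best response. Returns (leader value, x), or None if infeasible."""
    lo, hi = F(0), F(1)
    for other in COLS:
        a = u[("U", s2)][1] - u[("U", other)][1]  # follower's gain if leader plays U
        b = u[("D", s2)][1] - u[("D", other)][1]  # follower's gain if leader plays D
        # Constraint a*x + b*(1-x) >= 0, i.e. (a-b)*x >= -b
        slope = a - b
        if slope > 0:
            lo = max(lo, F(-b, slope))
        elif slope < 0:
            hi = min(hi, F(-b, slope))
        elif b < 0:
            return None
    if lo > hi:
        return None
    value = lambda x: x * u[("U", s2)][0] + (1 - x) * u[("D", s2)][0]
    return max((value(x), x) for x in (lo, hi))  # linear objective: check endpoints

results = {s2: solve_for(s2) for s2 in COLS}
best = max(r + (s2,) for s2, r in results.items() if r is not None)
print(results, best)
```

The best commitment found is $x_1(U) = 1/2$ with the follower playing R and a leader value of 5/2, the limit that the 0.49/0.51 commitment approaches.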

Page 21:

APPLICATION: SECURITY

§ Airport security: deployed at LAX

§ Federal Air Marshals

§ Coast Guard

§ Idea:

o Defender commits to mixed strategy

o Attacker observes and best responds

§ Attacker monitors defender and tries to maximize damage, while a defender deploys resources to minimize damage based on knowledge of what the attacker would like to obtain

Page 22:

SECURITY GAMES

§ Set of targets $T = \{1, \dots, n\}$

o E.g., Entrance gates of an airport

§ Set $\Omega$ of $r$ security resources available to the defender (leader)

o E.g., Video cameras looking at specific directions

o E.g., Air marshals flying on specific flights

§ Set $\Sigma$ of feasible partitions of target set: $\Sigma \subseteq 2^T$

§ Resource $\omega \in \Omega$ can be assigned to one of the partitions in $A(\omega) \subseteq \Sigma$ → Schedule

§ Defender (leader): how to deploy the resources over the targets → How to assign resources to targets

§ Attacker (follower) chooses targets to attack

[Figure: resources assigned to targets]

Page 23:

SECURITY GAMES

§ For each target $t$, there are four numbers defining the payoffs of defender and attacker in case of successful and unsuccessful attack:

o $u_d^+(t)$ = Defender's payoff when target $t$ is attacked and target was covered by at least one resource

o $u_d^-(t)$ = Defender's payoff when target $t$ is attacked and target was not covered

o $u_a^-(t)$ = Attacker's payoff when target $t$ is attacked and target was not covered

o $u_a^+(t)$ = Attacker's payoff when target $t$ is attacked and target was covered by at least one resource

§ $u_d^+(t) \ge u_d^-(t)$, and $u_a^+(t) \le u_a^-(t)$

§ For each target $t$ there's a coverage probability $c_t$ by the defender using the resources in $\Omega$, such that the set of targets defines $\mathbf{c} = (c_1, \dots, c_n)$, a vector of coverage probabilities

Page 24:

SECURITY GAMES

§ The expected utilities to the defender/attacker under coverage $\mathbf{c}$ if target $t$ is attacked are:

$u_d(t, \mathbf{c}) = u_d^+(t) \cdot c_t + u_d^-(t)\,(1 - c_t)$

$u_a(t, \mathbf{c}) = u_a^+(t) \cdot c_t + u_a^-(t)\,(1 - c_t)$
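The two formulas translate directly into code. In the sketch below, the payoff numbers, the coverage vector, and the names `payoffs`, `u_d`, `u_a`, and `attacked_target` are all hypothetical, chosen only to respect $u_d^+ \ge u_d^-$ and $u_a^+ \le u_a^-$; they are not the payoff example from the slides. It evaluates both expected utilities and selects the target a best-responding attacker would choose.

```python
from fractions import Fraction as F

# Hypothetical payoffs for two targets (not the slide's example):
# target -> (u_d_plus, u_d_minus, u_a_plus, u_a_minus)
payoffs = {1: (1, -1, -1, 1), 2: (0, -2, -1, 3)}

def u_d(t, c):
    """Defender's expected utility if target t is attacked under coverage c."""
    ud_plus, ud_minus, _, _ = payoffs[t]
    return ud_plus * c[t] + ud_minus * (1 - c[t])

def u_a(t, c):
    """Attacker's expected utility if target t is attacked under coverage c."""
    _, _, ua_plus, ua_minus = payoffs[t]
    return ua_plus * c[t] + ua_minus * (1 - c[t])

def attacked_target(c):
    """Attacker observes coverage c and attacks the target with highest u_a."""
    return max(payoffs, key=lambda t: u_a(t, c))

c = {1: F(1, 2), 2: F(1, 2)}  # one resource split evenly (assumption)
t = attacked_target(c)
print(t, u_a(t, c), u_d(t, c))
```

With these made-up numbers, even coverage leaves target 2 more attractive to the attacker, which suggests the defender would want to shift coverage toward it.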

Payoffs example with two targets:

Page 25:

BAYESIAN SECURITY GAMES

§ There are multiple types of potential attackers, each type with different payoffs

§ The defender knows about their payoffs, as well as the distribution of attacker types

§ Bayesian extensions are used to model uncertainty over the payoffs and preferences of the players, where more uncertainty can be expressed with increasing number of types.

Payoffs example with two targets and two types of attackers:

Page 26:

SOLVING SECURITY GAMES

§ It is a 2-player Stackelberg game, so we can compute an optimal strategy for the defender in polynomial time…?

§ Consider the case of $\Sigma = T$, i.e., resources are assigned to individual targets, i.e., schedules have size 1

§ Nevertheless, the number of leader strategies is exponential: $2^r$

§ → Representation of the linear program is exponential

§ Theorem [Korzhyk et al. 2010]: Optimal leader strategy can be computed in poly time

§ A number of smart algorithms have been developed …

§ Jain M., An B., Tambe M. (2013) Security Games Applied to Real-World: Research Contributions and Challenges. In: Jajodia S., Ghosh A., Subrahmanian V., Swarup V., Wang C., Wang X. (eds) Moving Target Defense II. Advances in Information Security, vol 100. Springer, New York, NY

Page 27:

IN THE NEWS …