Competition between adaptive agents: learning and collective efficiency

Damien Challet

Oxford University

Matteo Marsili

ICTP-Trieste (Italy)

● My definition of the Minority Game

● Simple worlds (M = 0)

● Markovian behavior

● Neural networks

● Reinforcement learning

● Multistate worlds (M > 0)

● Cause of large inefficiencies

● Remedies

● From El Farol to MG and back

'Truth is always in the minority'

Kierkegaard

Zig-Zag-Zoug

● Game played by Swiss children

● 3 players, 3 feet, 3 magic words

● “Ziiig” ... “Zaaag” ... “ZOUG!”

Minority Game

● Zig-Zag-Zoug with N players

● Aim: to be in the minority

● Outcome = #UP - #DOWN = #A - #B

● Model of competition between adaptive players

Challet and Zhang (1997), from El Farol's bar problem (Arthur 1994)

Initial goals of the MG

El Farol (1994): impossible to understand

Drastic simplification, keeping key ingredients

Bounded rationality

Reinforcement learning

Symmetrize the problem: 60/100 -> 50/50

Understand the symmetric problem

Generalize results to the asymmetric problem

Repeated games

Why play again?

Frustration

Losers in majority

How to play?

Deduction

Rationality

Best answer

All lose!

Induction

Limited capabilities

Beliefs, strategies, personality

Trial and error

Learning

Minority Game

N agents, i = 1, ..., N

Choice of agent i: $a_i(t) \in \{+1, -1\}$

Actions: $a_1(t), a_2(t), \ldots, a_N(t)$

Aggregate outcome: $A(t) = \sum_i a_i(t)$

Payoff to player i: $-a_i(t)\,A(t)$

Total losses = $A^2$
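The definitions on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `play_round`, the seed and the choice N = 7 are illustrative assumptions, not part of the talk.

```python
import random

def play_round(actions):
    """Return the aggregate outcome A and each player's payoff -a_i * A."""
    A = sum(actions)
    payoffs = [-a * A for a in actions]
    return A, payoffs

random.seed(0)
N = 7  # odd N (an assumption) so that A is never zero
actions = [random.choice((-1, 1)) for _ in range(N)]
A, payoffs = play_round(actions)

# Players whose action is opposite in sign to A (the minority) earn |A|;
# players in the majority lose |A|. The payoffs always sum to -A**2,
# which is the "total losses = A^2" statement.
print(actions, A, sum(payoffs))
```

Because the majority is always larger than the minority, the sum of payoffs is negative for any nonzero A, which is why smaller |A| means higher collective efficiency.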

Markovian learning

'If it ain't broken, don't fix it' (Reents et al., Physica A 2000):

If I won, I stick to my previous choice

If I lost, I change to the other choice with prob p

Results ($\sigma^2 = \langle A^2 \rangle$):

● $pN = x$ = const (small p): $\sigma^2 = 1 + 2x\,(1 + x/6)$

● $p \sim N^{-1/2}$: $\sigma^2 \sim N$

● $p \sim 1$: $\sigma^2 \sim N^2$
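The win-stay, lose-shift rule is easy to simulate. The sketch below is one possible implementation under stated assumptions: odd N, synchronous updates, a burn-in period before averaging, and hypothetical names (`simulate_markov`, `burn_in`). It prints the measured fluctuations next to the small-p formula and next to N^2 for p = 1.

```python
import random

def simulate_markov(N, p, steps=3000, burn_in=1000, seed=0):
    """Estimate <A^2> for the rule: winners stay, losers switch with prob p."""
    rng = random.Random(seed)
    a = [rng.choice((-1, 1)) for _ in range(N)]
    total = 0.0
    for t in range(steps):
        A = sum(a)
        if t >= burn_in:
            total += A * A
        for i in range(N):
            # a_i * A > 0 means player i was in the majority, i.e. lost
            if a[i] * A > 0 and rng.random() < p:
                a[i] = -a[i]
    return total / (steps - burn_in)

N = 101  # odd, so A is never zero
x = 1.0
# Small p (pN = x): compare the estimate with 1 + 2x(1 + x/6)
print(simulate_markov(N, x / N), 1 + 2 * x * (1 + x / 6))
# p = 1: every loser switches, the whole population oscillates
print(simulate_markov(N, 1.0), N ** 2)
```

For p = 1 the outcome is deterministic after the first step: all players end up on the same side and flip together, so A alternates between +N and -N.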

Markovian learning II

Problem: if N is unknown, p = ?

Try: p = f(t), e.g. $p = t^{-k}$

Convergence for any N

Freezing

When to stop?
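A time-dependent switching probability can be tried in the same kind of simulation. This is a sketch assuming the reading p(t) = t^(-k) with t starting at 1; the function name `simulate_decaying_p` and the parameter values are illustrative.

```python
import random

def simulate_decaying_p(N, k=1.0, steps=3000, seed=0):
    """Win-stay, lose-shift with a switching probability p(t) = t**(-k)."""
    rng = random.Random(seed)
    a = [rng.choice((-1, 1)) for _ in range(N)]
    history = []
    for t in range(1, steps + 1):
        A = sum(a)
        history.append(A)
        p = t ** (-k)  # decays with time, no knowledge of N needed
        for i in range(N):
            # losers (majority side) switch with probability p(t)
            if a[i] * A > 0 and rng.random() < p:
                a[i] = -a[i]
    return history

hist = simulate_decaying_p(N=101)
# Early outcomes swing widely; late outcomes change only rarely
print(hist[:6], hist[-6:])
```

Since p(t) keeps shrinking, switches become ever rarer and the configuration freezes, which is the trade-off behind the "When to stop?" question.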

Neural networks

Simple perceptrons, learning rate R (Metzler ++ 1999)

$\sigma^2 = N + N(N-1)\,F(N,R)$

$\sigma^2_{\min} = N\,(1 - 2/\pi) = 0.363\ldots\,N$
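For scale, the 0.363 N minimum can be compared with N players who choose +1 or -1 independently with equal probability. This benchmark is a standard reference point and is not stated on the slide:

```latex
\sigma^2_{\mathrm{random}} = \sum_{i=1}^{N} \operatorname{Var}(a_i) = N,
\qquad
\frac{\sigma^2_{\min}}{\sigma^2_{\mathrm{random}}} = 1 - \frac{2}{\pi} \approx 0.363 .
```

So at the optimal learning rate the perceptrons do better than coin tossing, with fluctuations reduced to roughly 36% of the random value.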

Reinforcement learning

● Each player has a register $D_i$

● $D_i > 0$: + is better

● $D_i < 0$: - is better

● $D_i(t+1) = D_i(t) - A(t)$

● Choice: $\mathrm{prob}(+ \mid D_i) = f(D_i)$, with $f'(x) > 0$ (RL)

Reinforcement learning II

● Central result:

agents minimize $\langle A \rangle^2$ (predictability) for all f

● Stationary state: $\langle A \rangle = 0$

● Fluctuations = ?

● Ex: $f(x) = (1 + \tanh(Kx))/2$ (exponential learning), K = learning rate

● $K < K_c$: $\sigma^2 \sim N$

● $K > K_c$: $\sigma^2 \sim N^2$
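The two regimes can be reproduced with a short simulation of the register dynamics and the tanh choice rule. This is a sketch under assumptions: all registers start at zero, N is odd, and the two values of K are simply picked far below and far above the transition (the value of K_c itself is not computed here). The name `simulate_rl` is hypothetical.

```python
import math
import random

def simulate_rl(N, K, steps=3000, burn_in=500, seed=0):
    """Estimate <A^2> for naive RL agents with prob(+|D) = (1 + tanh(K D)) / 2."""
    rng = random.Random(seed)
    D = [0.0] * N  # one register per player (identical start is an assumption)
    total = 0.0
    for t in range(steps):
        a = [1 if rng.random() < 0.5 * (1.0 + math.tanh(K * d)) else -1
             for d in D]
        A = sum(a)
        if t >= burn_in:
            total += A * A
        # Naive update: every register moves by -A(t)
        D = [d - A for d in D]
    return total / (steps - burn_in)

N = 101
print("small K:", simulate_rl(N, 0.1 / N), "compare with N =", N)
print("large K:", simulate_rl(N, 10.0), "compare with N^2 =", N ** 2)
```

With a large K the choice rule is nearly deterministic, so all players follow the sign of a common register and the population herds from one side to the other.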

Market impact: each agent has an influence on the outcome

● Naive agents: payoff $-A = -A_{-i} - a_i$

● Non-naive agents: payoff $-A + c\,a_i$

● Smart agents: payoff $-A_{-i}$

cf. WLU, AU

● Central result 2:

non-naive agents minimize $\langle A^2 \rangle$ (fluctuations) for all f

-> Nash equilibrium
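Why accounting for one's own impact matters can be seen in a tiny worked example. The sketch below uses the payoff $-a_i A$ defined earlier and compares, for each player, the gain from switching sides as a naive player would estimate it (A held fixed) with the actual gain (A recomputed after the switch). The function names and the 3-versus-2 configuration are illustrative assumptions.

```python
def payoff(a_i, A):
    """Payoff -a_i * A of a player choosing a_i when the outcome is A."""
    return -a_i * A

def deviation_gains(actions):
    """For each player, return (naive_gain, actual_gain) from switching sides.

    naive_gain keeps A fixed, i.e. ignores the player's own impact;
    actual_gain uses the outcome after the switch, A - 2 * a_i.
    """
    A = sum(actions)
    gains = []
    for a in actions:
        current = payoff(a, A)
        naive = payoff(-a, A) - current
        actual = payoff(-a, A - 2 * a) - current
        gains.append((naive, actual))
    return gains

# N = 5 players split 3 versus 2, so A = +1 (smallest |A| for odd N)
actions = [1, 1, 1, -1, -1]
for a, (naive, actual) in zip(actions, deviation_gains(actions)):
    print(a, naive, actual)
# Majority players: naive gain 2, actual gain 0.
# Minority players: naive gain -2, actual gain -4.
```

No player can actually improve by switching, so this configuration is a Nash equilibrium, yet a naive majority player believes that switching pays. That mistaken belief is what keeps naive populations fluctuating.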

Reinforcement learning III

$\sigma^2 \sim 1$

Summary

Rate     Markov   NN     RL naive   RL non-naive   NN non-naive
Small    1        N      N          1              1?
Medium   N                          1              1?
Large    N^2      N^2    N^2        1              1?

Minority Games with memory

If an agent believes that the outcome depends on the past results, the outcome will depend on the past results.

Sun spot effect

Self-fulfilling prophecies

Fallacies of causal inference

Consequence:

The other agents will change their behavior accordingly

[Figure: $\sigma^2/N$ and $\alpha = P/N$]

Minority Games with memory: naïve agents

Fixed randomly drawn strategies = quenched disorder

Tools of statistical physics give the exact solution in principle

Agents minimize the predictability

Predictability = Hamiltonian

Optimization problem

Numeric: Savit++ PRL99

Analytic: Challet++ PRL99, Coolen+ J. Phys A 2002
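A minimal numerical version of the game with memory can be sketched as follows. Several choices here are assumptions rather than statements from the talk: S = 2 strategies per agent, P = 2^M histories built from the real sequence of winning sides, ties between scores broken in favour of the first strategy, and a frequency-weighted estimate of the predictability H. The function name `simulate_mg` is hypothetical.

```python
import random

def simulate_mg(N, M, steps=4000, burn_in=1000, seed=0):
    """Minority Game with memory M; returns (sigma^2 / N, H / N)."""
    rng = random.Random(seed)
    P = 2 ** M
    # Quenched disorder: strategies[i][s][mu] is a fixed random action
    strategies = [[[rng.choice((-1, 1)) for _ in range(P)]
                   for _ in range(2)] for _ in range(N)]
    scores = [[0, 0] for _ in range(N)]
    mu = rng.randrange(P)  # current history, encoded as an integer
    sum_A2 = 0.0
    sum_A_mu = [0.0] * P
    count_mu = [0] * P
    for t in range(steps):
        A = 0
        for i in range(N):
            s = 0 if scores[i][0] >= scores[i][1] else 1  # best strategy
            A += strategies[i][s][mu]
        if t >= burn_in:
            sum_A2 += A * A
            sum_A_mu[mu] += A
            count_mu[mu] += 1
        # Naive score update: each strategy earns -a * A, played or not
        for i in range(N):
            for s in range(2):
                scores[i][s] -= strategies[i][s][mu] * A
        winning_bit = 1 if A < 0 else 0  # minority side of this round
        mu = ((mu << 1) | winning_bit) % P
    T = steps - burn_in
    sigma2 = sum_A2 / T
    # Predictability: average over visited histories of <A | mu>^2
    H = sum((count_mu[m] / T) * (sum_A_mu[m] / count_mu[m]) ** 2
            for m in range(P) if count_mu[m] > 0)
    return sigma2 / N, H / N

for N, M in ((51, 3), (11, 8)):  # odd N so that A is never zero
    print("alpha =", 2 ** M / N, simulate_mg(N, M))
```

Scanning N and M and plotting both returned quantities against P/N is the usual way to display the results of such a simulation.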

?

Minority Games with memory: low efficiency

[Figure: axis label $\alpha = P/N$]

Minority Games with memory: low efficiency

P/N is not the right scaling for large fluctuations

Minority Games with memory: origin of low efficiency

Stochastic dynamical equation for strategy score $U_i$:

slowly varying part + correlated noise

I: size independent

II = $K P^{-1/2}$

When I << II, large fluctuations

Transition at I/K = $G / P^{1/2}$

Critical signal-to-noise ratio = $G / P^{1/2}$


Check:

Determine G

Predict critical points

[Figure: axis labels I/K and $G / P^{1/2}$; panels labeled BEFORE and AFTER]

Minority Games with memory: sophisticated agents

Agents minimize fluctuations

Optimization problem again

Reverse problem

Many variations, different global utility functions

● Grand canonical game (play or not play)

● Time window of scores (exponential moving average)

● Any payoff

Hence, given a task (global utility function), one knows how to design agents (local utility).

Example: optimal defects combinations (cf. Neil's talk)

From El Farol to MG and back

El Farol: [axis from 0 to N, with L marked]

MG: [axis from 0 to N, with L = N/2]

Differences, similarities?

Which results from MG are valid for El Farol?

From El Farol to MG and back

[Axis from 0 to N, with L and $N\langle a \rangle$ marked]

Theorem: all results from MG apply to El Farol

Everything scales like (L/N – < a>)/ = P ½

The El Farol problem with P states of the world is solved.

From El Farol to MG and back: new results

If (L/N – < a>)/ = P ½ 0,

P>Pc = 2 / [(L/N-< a>)2]: no more phase transition.

Summary

• AU/WLU suppresses large fluctuations -> Nash equilibrium

• Design: agents must know they have an impact.

• The knowledge of the exact impact is not crucial

• Reverse problem also possible

• MG: simple, rich, fun, and useful

www.unifr.ch/econophysics/minority

102 commented references
