
Page 1:

Multi-Agent Learning Mini-Tutorial

Gerry Tesauro

IBM T.J. Watson Research Center
http://www.research.ibm.com/infoecon
http://www.research.ibm.com/massdist

Page 2:

Outline

- Statement of the problem
- Tools and concepts from RL & game theory
- “Naïve” approaches to multi-agent learning:
  - ordinary single-agent RL; no-regret learning
  - fictitious play
  - evolutionary game theory
- “Sophisticated” approaches:
  - minimax-Q (Littman), Nash-Q (Hu & Wellman)
  - tinkering with learning rates: WoLF (Bowling), multiple-timescale Q-learning (Leslie & Collins)
  - “strategic teaching” (Camerer talk)
- Challenges and Opportunities

Page 3:

Normal single-agent learning

- Assume that the environment has observable states, characterizable expected rewards and state transitions, and that all of the above is stationary (MDP-ish).
- Non-learning, theoretical solution to the fully specified problem: the DP formalism.
- Learning: solve by trial and error without a full specification: RL + exploration, Monte Carlo, ... (a minimal sketch follows).
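As a concrete illustration of the RL option (my addition, not part of the slides), here is a minimal tabular Q-learning sketch with epsilon-greedy exploration; the `env` object with reset()/step() and an `actions` list is a hypothetical stand-in for an MDP-ish environment.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration (sketch).

    Assumes a hypothetical env with reset() -> state,
    step(action) -> (next_state, reward, done), and env.actions (a list).
    """
    Q = defaultdict(float)  # Q[(state, action)] -> value estimate
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # explore with probability epsilon, otherwise act greedily
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # one-step backup toward r + gamma * max_a' Q(s', a')
            best_next = max(Q[(s2, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```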

Page 4:

Page 5:

Multi-Agent Learning

- Problem: the agent tries to solve its learning problem while the other agents in the environment are also trying to solve their own learning problems.
- Non-learning, theoretical solution to the fully specified problem: game theory.

Page 6:

Basics of game theory

- A game is specified by: players (1…N), actions, and payoff matrices (functions of joint actions). [Figure: payoff bimatrix with A’s action indexing the rows and B’s action the columns; each cell lists A’s payoff and B’s payoff.]
- If the payoff matrices are identical, the game is cooperative; otherwise it is non-cooperative (zero-sum = purely competitive).

[Figure: example 3x3 payoff matrices over the actions S, P, R; the same entries are shown for each player:

        S  P  R
    S   0  1  1
    P   1  0  1
    R   1  1  0  ]

Page 7:

Basic lingo… (2)

- Games with no states: (bi-)matrix games. Games with states: stochastic games or Markov games (state transitions are functions of joint actions).
- Games with simultaneous moves: normal form. Games with alternating turns: extensive form.
- Number of rounds = 1: one-shot game. Number of rounds > 1: repeated game.
- Deterministic action choice: pure strategy. Non-deterministic action choice: mixed strategy.

Page 8:

Basic Analysis

- A joint strategy x is Pareto-optimal if there is no x’ that improves everybody’s payoffs.
- An agent’s x_i is a dominant strategy if it is always best regardless of the others’ actions.
- x_i is a best response to the others’ x_-i if it maximizes the agent’s payoff given x_-i.
- A joint strategy x is an equilibrium if each agent’s strategy is simultaneously a best response to everyone else’s strategy, i.e. no one has an incentive to deviate (Nash, correlated).
- A Nash equilibrium (in mixed strategies) always exists, but there may be exponentially many of them, and they are not easy to compute. (A sketch for pure-strategy equilibria follows.)
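To make best responses and equilibria concrete (my addition, not from the slides), the sketch below enumerates the pure-strategy Nash equilibria of a two-player matrix game by checking the best-response condition cell by cell. Finding the mixed equilibria whose existence Nash's theorem guarantees takes more machinery (e.g. the Lemke-Howson algorithm).

```python
import numpy as np

def pure_nash_equilibria(A, B):
    """Enumerate pure-strategy Nash equilibria of a two-player game.

    A[i, j] = row player's payoff, B[i, j] = column player's payoff
    when row plays action i and column plays action j.
    """
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            row_best = A[i, j] >= A[:, j].max()   # i is a best response to j
            col_best = B[i, j] >= B[i, :].max()   # j is a best response to i
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

# Example: Prisoner's Dilemma (action 0 = cooperate, 1 = defect)
A = np.array([[3, 0], [5, 1]])
B = A.T
print(pure_nash_equilibria(A, B))   # [(1, 1)] -- mutual defection
```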

Page 9:

What about imperfect information games?

Computing a Nash equilibrium requires knowledge of all payoffs. For imperfect-information games, the corresponding concept is the Bayes-Nash equilibrium (Nash plus Bayesian inference over the hidden information). This is even more intractable than regular Nash.

Page 10:

Can we make game theory more tractable?

- An active area of research.
- Symmetric games: payoffs are invariant under swapping of the player labels. One can look for symmetric equilibria, where all agents play the same mixed strategy.
- Network games: agent payoffs depend only on interactions with a small number of neighbors.
- Summarization games: payoffs are simple summarization functions of the population’s joint actions (e.g. voting).

Page 11:

Summary: pros and cons of game theory

- Game theory provides a nice conceptual/theoretical framework for thinking about multi-agent learning.
- Game theory is appropriate provided that: the game is stationary and fully specified; there is enough computer power to compute an equilibrium; one can assume the other agents are also game theorists; and one can solve the equilibrium coordination problem.
- The above conditions rarely hold in real applications. Multi-agent learning is not only a fascinating problem, it may be the only viable option.

Page 12:

Naïve Approaches to Multi-Agent Learning

Basic idea: the agent adapts, ignoring the non-stationarity of the other agents’ strategies.

1. Fictitious play: the agent observes the time-average frequency of the other players’ action choices and models

    prob(action k) = (# times k observed) / (total # of observations)

The agent then plays a best response to this model.

Variants of fictitious play: exponential recency weighting, “smoothed” best response (~softmax), small adjustment toward the best response, ... (a sketch follows)
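A minimal sketch of two-player fictitious play built from these definitions (my own illustration; the payoff matrices and the uniform initial counts are assumptions, not part of the tutorial):

```python
import numpy as np

def fictitious_play(A, B, rounds=1000):
    """Two-player fictitious play (sketch).

    Each player keeps empirical counts of the opponent's past actions,
    treats the normalized counts as the opponent's mixed strategy, and
    plays a best response to that model. A and B are the row and column
    players' payoff matrices.
    """
    n, m = A.shape
    counts_col = np.ones(m)   # row player's counts of column's actions
    counts_row = np.ones(n)   # column player's counts of row's actions
    for _ in range(rounds):
        # model each opponent by its empirical action frequencies
        col_model = counts_col / counts_col.sum()
        row_model = counts_row / counts_row.sum()
        i = int(np.argmax(A @ col_model))     # row player's best response
        j = int(np.argmax(row_model @ B))     # column player's best response
        counts_row[i] += 1
        counts_col[j] += 1
    return counts_row / counts_row.sum(), counts_col / counts_col.sum()

# Matching pennies: the empirical frequencies approach (0.5, 0.5) even
# though actual play cycles (cf. the penny-matching remark on the next slide).
A = np.array([[1, -1], [-1, 1]])
print(fictitious_play(A, -A))
```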

Page 13:

What if all agents use fictitious play?

- Strict Nash equilibria are absorbing points for fictitious play.
- The typical result is limit-cycle behavior of the strategies, with the period increasing with N.
- In certain cases the product of the empirical distributions converges to Nash even though actual play cycles (penny-matching example).

Page 14:

More Naïve Approaches…

2. Evolutionary game theory: “Replicator Dynamics” models a large population of agents using different strategies; the fittest agents breed more copies. Let x = the population strategy vector and x_k = the fraction of the population playing strategy k. The growth rate is then

    dx_k/dt = x_k [ u(e_k, x) - u(x, x) ]

The above equation can also be derived from an “imitation” model. Nash equilibria are fixed points of the above equation, but not necessarily attractors (they can be unstable or only neutrally stable). (A simulation sketch follows.)
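A simple forward-Euler simulation of this replicator equation for a symmetric matrix game (my own sketch; the payoff matrix, step size, and initial population mix are assumptions):

```python
import numpy as np

def replicator_dynamics(U, x0, dt=0.01, steps=5000):
    """Euler integration of dx_k/dt = x_k * [u(e_k, x) - u(x, x)],
    where u(e_k, x) = (U x)_k is strategy k's payoff against the
    population mix x, and u(x, x) = x . U x is the population average.
    """
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        fitness = U @ x              # u(e_k, x) for each pure strategy k
        avg = x @ fitness            # u(x, x)
        x += dt * x * (fitness - avg)
        x = np.clip(x, 0.0, None)
        x /= x.sum()                 # keep x on the probability simplex
    return x

# Rock-paper-scissors payoffs: (1/3, 1/3, 1/3) is a fixed point but not an
# attractor -- trajectories cycle around it rather than converging to it.
U = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(replicator_dynamics(U, [0.5, 0.3, 0.2]))
```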

Page 15:

Many possible dynamic behaviors...

[Figure: example phase portraits showing limit cycles, attractors, and unstable fixed points.]

Also saddle points, chaotic orbits, ...

Page 16:

Replicator dynamics: auction bidding strategies

Page 17:

More Naïve Approaches…

3. Iterated Gradient Ascent (Singh, Kearns and Mansour): again performs a myopic adaptation to the other players’ current strategy:

    dx_i/dt = ∂u_i(x_i, x_-i) / ∂x_i

This gives a coupled system of linear equations, since u_i is linear in x_i and in x_-i.

Analysis for two-player, two-action games: the dynamics either converges to a Nash fixed point on the boundary (at least one pure strategy), or produces limit cycles. (A sketch follows.)
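A sketch of this gradient dynamics for a two-player, two-action game, using a finite step size and projection onto [0, 1] (my additions; the slide's analysis concerns the continuous-time, infinitesimal-step version):

```python
import numpy as np

def iga(A, B, p=0.5, q=0.5, eta=0.01, steps=20000):
    """Iterated gradient ascent for a 2x2 bimatrix game (sketch).

    p = Pr(row plays action 0), q = Pr(column plays action 0).
    The payoff u1 = [p, 1-p] A [q, 1-q]^T is bilinear, so du1/dp is
    linear in q (and symmetrically for the column player).
    """
    for _ in range(steps):
        du1_dp = (A[0, 0] - A[1, 0]) * q + (A[0, 1] - A[1, 1]) * (1 - q)
        du2_dq = (B[0, 0] - B[0, 1]) * p + (B[1, 0] - B[1, 1]) * (1 - p)
        p = float(np.clip(p + eta * du1_dp, 0.0, 1.0))
        q = float(np.clip(q + eta * du2_dq, 0.0, 1.0))
    return p, q

# Prisoner's Dilemma: the gradients drive both players to defect (p = q = 0).
A = np.array([[3, 0], [5, 1]])
print(iga(A, A.T))
```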

Page 18:

Further Naïve Approaches…

4. Dumb Single-Agent Learning: use a single-agent algorithm in a multi-agent problem and hope that it works.

- No-regret learning by pricebots (Greenwald & Kephart)
- Simultaneous Q-learning by pricebots (Tesauro & Kephart)

In many cases this actually works: the learners converge either exactly or approximately to self-consistent optimal strategies. (A toy sketch follows.)
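A toy version of this “dumb” approach (my own illustration, not the pricebot model of the cited papers): two independent Q-learners repeatedly play a fixed bimatrix game, each treating the other as just part of a stationary environment.

```python
import random
import numpy as np

def independent_q_learning(A, B, rounds=50000, alpha=0.1, epsilon=0.1):
    """Two single-agent Q-learners in a repeated bimatrix game (sketch).

    The state is ignored (single-state repeated game), so Q[a] just
    estimates the average payoff of action a against the other learner.
    """
    Qa, Qb = np.zeros(A.shape[0]), np.zeros(B.shape[1])
    for _ in range(rounds):
        # epsilon-greedy action choices for both learners
        i = random.randrange(len(Qa)) if random.random() < epsilon else int(Qa.argmax())
        j = random.randrange(len(Qb)) if random.random() < epsilon else int(Qb.argmax())
        # each learner updates toward the payoff it just received
        Qa[i] += alpha * (A[i, j] - Qa[i])
        Qb[j] += alpha * (B[i, j] - Qb[j])
    return int(Qa.argmax()), int(Qb.argmax())
```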

Page 19:

“Sophisticated” approaches

These take into account the possibility that the other agents’ strategies might change.

5. Multi-Agent Q-learning:

- Minimax-Q (Littman): a convergent algorithm for two-player zero-sum stochastic games. (A sketch of its minimax value step follows.)
- Nash-Q (Hu & Wellman): a convergent algorithm for two-player general-sum stochastic games; requires use of a Nash equilibrium solver.
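The step that distinguishes minimax-Q from ordinary Q-learning is the value backup: instead of max_a Q(s, a), it uses the minimax (maximin) value of the state's Q-matrix over joint actions, obtained from a small linear program. Below is a hedged sketch of that step using scipy's linprog (my formulation of the LP, not code from Littman's paper).

```python
import numpy as np
from scipy.optimize import linprog

def minimax_value(Q):
    """Solve max_pi min_o sum_a pi(a) * Q[a, o] for one state's Q-matrix,
    as used in minimax-Q for two-player zero-sum stochastic games.
    Returns the maximin mixed policy pi and the state value v.
    """
    n_a, n_o = Q.shape
    # Decision variables: pi(0..n_a-1) and the scalar value v; maximize v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                      # linprog minimizes, so minimize -v
    # For every opponent action o: v - sum_a pi(a) Q[a, o] <= 0
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # Probabilities sum to 1.
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_a], res.x[-1]

# Matching-pennies Q-values: maximin policy (0.5, 0.5), value 0.
print(minimax_value(np.array([[1.0, -1.0], [-1.0, 1.0]])))
```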

Page 20:

More sophisticated approaches...

6. Varying learning rates

- WoLF, “Win or Learn Fast” (Bowling): the agent reduces its learning rate when performing well and increases it when doing badly. This improves the convergence of IGA and of policy hill-climbing. (A sketch follows.)
- Multiple-timescale Q-learning (Leslie): different agents use different power laws t^(-n) for learning-rate decay; this achieves simultaneous convergence where ordinary Q-learning doesn’t.
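A single-state sketch of the WoLF idea (my simplification of Bowling's WoLF policy hill-climbing, with made-up learning rates): compare the current policy's expected value against that of a running-average policy to decide whether the agent is “winning”, and choose the step size accordingly.

```python
import numpy as np

def wolf_phc_step(Q, pi, avg_pi, delta_win=0.01, delta_lose=0.04):
    """One WoLF policy-hill-climbing update for a single state (sketch).

    Q: estimated action values; pi: current mixed policy (a numpy array,
    modified in place); avg_pi: running-average policy, assumed to be
    maintained elsewhere. Learn slowly (delta_win) when winning and
    fast (delta_lose) when losing.
    """
    winning = pi @ Q >= avg_pi @ Q
    delta = delta_win if winning else delta_lose
    best = int(np.argmax(Q))
    # Move probability mass toward the greedy action at rate delta.
    for a in range(len(pi)):
        if a == best:
            pi[a] = min(1.0, pi[a] + delta)
        else:
            pi[a] = max(0.0, pi[a] - delta / (len(pi) - 1))
    pi /= pi.sum()   # renormalize to keep a valid distribution
    return pi
```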

Page 21:

More sophisticated approaches...

7. “Strategic Teaching”: recognizes that the other players’ strategies are adaptive.

“A strategic teacher may play a strategy which is not myopically optimal (such as cooperating in Prisoner’s Dilemma) in the hope that it induces adaptive players to expect that strategy in the future, which triggers a best-response that benefits the teacher.” (Camerer, Ho and Chong)

Page 22:

Theoretical Research Challenges

- What is the proper theoretical formulation?
- “No short-cut” hypothesis: massive on-line search a la Deep Blue to maximize expected long-term reward.
- (Bayesian) Model and predict the behavior of the other players, including how they learn based on my actions (beware of infinite model recursion).
- Trial-and-error exploration; continual Bayesian inference using all evidence over all uncertainties (Boutilier: Bayesian exploration).
- When can you get away with simpler methods?

Page 23:

Real-World Opportunities

Multi-agent systems where you can’t do game theory (covers everything :-)):

- Electronic marketplaces (Kephart)
- Mobile networks (Chang)
- Self-managing computer systems (Kephart)
- Teams of robots (Bowling, Stone)
- Video games
- Military/counter-terrorism applications