Slides: mlanctot.info/open_spiel-tutorial-kuleuven-mar11-2020.pdf

Introduction to OpenSpiel

Marc Lanctot

Joint work with Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis, and several external contributors!

Many, many great collaborators!

Intro to OpenSpiel (Released Aug ‘19)

● Open source framework for research on RL, search, and planning in games

● Main impl in C++ and Python. Also:
  ○ Swift
  ○ Julia (contributed post-release)

● >25 games
● >10 algorithms

OpenSpiel

Supports:

● n-player games
● Zero-sum, coop, general-sum
● Perfect / imperfect info
● Simultaneous-move games

Tour of OpenSpiel

Main web site: github.com/deepmind/open_spiel/

(Link to open colab on the main site)

● Contributors
● Games
● Algorithms

OpenSpiel: Example Viz (Kuhn Poker)

OpenSpiel: Example Viz (Replicator dynamics)

Motivation: Why another games / RL library?

1. Promote work on general multiagent RL
   a. “Arcade Learning Environment” of multiagent/games
   b. General game-learning

2. Games have specific requirements and use cases:
   a. Illegal moves, turn-based, etc.

3. Connecting research communities!
4. Open code, metrics, communication, progress
5. Reproducibility in research


Design Philosophy

1. Keep it simple.
2. Keep it light.

Main structure:

● C++ core + Python API
● Swift port
● Julia API
● Go API (in the works)
● Games in C++
● Algs in C++ and Python
● Many examples / colab

Example

OpenSpiel: Design & Code

Object-Oriented API

[Class diagram: SimMoveGame, NormalFormGame, MatrixGame; GameType with fields dynamics, information, utility, reward_model; NewInitialState()]
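The four GameType fields named on the slide can be pictured as a small record of enumerations. The sketch below is only an illustration in plain Python: the class names, enum members, and the `GameTypeSketch` record are hypothetical and are not the actual OpenSpiel classes.

```python
from dataclasses import dataclass
from enum import Enum


class Dynamics(Enum):
    SEQUENTIAL = "sequential"      # players act one at a time
    SIMULTANEOUS = "simultaneous"  # players act at the same time


class Information(Enum):
    PERFECT = "perfect"
    IMPERFECT = "imperfect"


class Utility(Enum):
    ZERO_SUM = "zero_sum"
    GENERAL_SUM = "general_sum"
    IDENTICAL = "identical"        # fully cooperative


class RewardModel(Enum):
    REWARDS = "rewards"            # rewards may arrive at every step
    TERMINAL = "terminal"          # rewards only at the end of an episode


@dataclass(frozen=True)
class GameTypeSketch:
    """Hypothetical stand-in for the GameType record shown on the slide."""
    short_name: str
    dynamics: Dynamics
    information: Information
    utility: Utility
    reward_model: RewardModel


# Two illustrative descriptions (the classifications are standard for these games).
kuhn = GameTypeSketch("kuhn_poker", Dynamics.SEQUENTIAL, Information.IMPERFECT,
                      Utility.ZERO_SUM, RewardModel.TERMINAL)
tic_tac_toe = GameTypeSketch("tic_tac_toe", Dynamics.SEQUENTIAL, Information.PERFECT,
                             Utility.ZERO_SUM, RewardModel.TERMINAL)
```

An algorithm can then check such a record before running, for example refusing to run a perfect-information search on a game whose `information` field is `IMPERFECT`.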

OpenSpiel Live Demo, Part 1

● Showcase basic OpenSpiel core API by example via python interpreter.
● Feel free to follow along in colab or locally!
● Transcript of demo: demo1.txt

Multi-Agent and AI

Singh, Kearns & Mansour ‘03, Infinitesimal Gradient Ascent (IGA)

Multiagent Learning Dynamics

Formalize optimization as a dynamical system: policy gradients

Analyze using well-established techniques

Image from Singh, Kearns, & Mansour ‘03

→ Evolutionary Game Theory: replicator dynamics

Replicator Dynamics

$$\dot{\pi}(a) = \pi(a)\,\bigl[u(a, \pi) - \bar{u}(\pi)\bigr]$$

● $\dot{\pi}(a)$: time derivative
● $u(a, \pi)$: utility of action a against the joint policy / population of other players
● $\bar{u}(\pi)$: expected / average utility of the joint policy / population
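The three labelled terms (time derivative, utility of an action against the population, average utility of the population) can be turned into a small numerical sketch. The code below is a minimal single-population version using Euler steps in plain Python; the function name, step size, and the two payoff matrices are illustrative assumptions, and it is not the OpenSpiel implementation.

```python
def replicator_step(x, payoff, dt=0.01):
    """One Euler step of single-population replicator dynamics.

    x is a probability vector over actions; payoff[a][b] is the utility
    of playing action a against an opponent playing action b.
    """
    n = len(x)
    # Utility of each action a against the current population x.
    fitness = [sum(payoff[a][b] * x[b] for b in range(n)) for a in range(n)]
    # Expected / average utility of the population.
    average = sum(x[a] * fitness[a] for a in range(n))
    # dx(a)/dt = x(a) * (u(a, x) - average utility)
    return [x[a] + dt * x[a] * (fitness[a] - average) for a in range(n)]


# Rock-Paper-Scissors payoffs for the row player (illustrative).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

# Prisoner's Dilemma payoffs for the row player: actions (cooperate, defect).
PD = [[3, 0],
      [5, 1]]

x = [0.6, 0.3, 0.1]
for _ in range(100):
    x = replicator_step(x, RPS)
```

Actions that do better than the population average gain probability mass and the rest lose it, so the uniform strategy in Rock-Paper-Scissors is a fixed point while defection grows in the Prisoner's Dilemma.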


Bloembergen et al. 2015

Phase Portraits

● Matrix games and learning dynamics
● Feel free to follow along in colab or locally!
● Transcript of demo: demo2.txt


A simple MDP

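[Tree diagram: root state A with actions a and b; successor states B, C, D, E]

Transitions: Pr(B | A, a) = 0.75, Pr(C | A, a) = 0.25, Pr(D | A, b) = 0.4, Pr(E | A, b) = 0.6

Leaf actions: c  d  e  f  g  h  i  j
Rewards:      3 -1  0  2  4  2 -3  2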


A simple MDP as a Multiagent System


Chance is a player with a fixed stochastic policy!
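Treating chance as a player with a fixed stochastic policy can be sketched directly from the numbers on the slide. The example below is plain Python, not OpenSpiel code. It assumes the leaf rewards pair with the leaf actions from left to right (c=3, d=-1, e=0, f=2, g=4, h=2, i=-3, j=2) and that each successor state offers two of those actions; the function names are hypothetical.

```python
import random

# Chance's fixed policy: for each agent action at A, a distribution
# over successor states (probabilities taken from the slide).
CHANCE_POLICY = {
    "a": {"B": 0.75, "C": 0.25},
    "b": {"D": 0.4, "E": 0.6},
}

# Assumed pairing of leaf actions and rewards, read left to right.
LEAF_REWARDS = {
    "B": {"c": 3, "d": -1},
    "C": {"e": 0, "f": 2},
    "D": {"g": 4, "h": 2},
    "E": {"i": -3, "j": 2},
}


def state_value(state):
    """Best reward the agent can obtain from a successor state."""
    return max(LEAF_REWARDS[state].values())


def action_value(action):
    """Expected value of an action at A, averaging over chance's policy."""
    return sum(p * state_value(s) for s, p in CHANCE_POLICY[action].items())


def sample_chance(action, rng):
    """Sample chance's move the same way one would sample any player's policy."""
    states = list(CHANCE_POLICY[action])
    weights = [CHANCE_POLICY[action][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


best_action = max(CHANCE_POLICY, key=action_value)
successor = sample_chance(best_action, random.Random(0))
```

Under the assumed pairing, action a is worth 0.75 * 3 + 0.25 * 2 = 2.75 and action b is worth 0.4 * 4 + 0.6 * 2 = 2.8, so b is slightly better at A.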

Terminal history A.K.A. Episode

(A, a, F, 1, B, c) is a terminal history. (A, b, G, 3, D, g) is another terminal history.

Prefix (non-terminal) Histories

(A, a, F, 2, C) is a history. It is a prefix of (A, a, F, 2, C, e) and (A, a, F, 2, C, f).
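Histories and prefixes can be made concrete by enumerating the tree as tuples. The sketch below is plain Python and uses the chance nodes F and G and the outcomes 1, 2, 3 that appear in the example histories; the mapping of outcome 4 to state E and the helper names are assumptions for illustration.

```python
# Agent action at A -> chance node that follows it.
CHANCE_NODE = {"a": "F", "b": "G"}

# Chance outcome -> resulting state (outcome 4 -> E is assumed).
OUTCOMES = {"F": {1: "B", 2: "C"}, "G": {3: "D", 4: "E"}}

# Final agent actions available in each state.
LEAF_ACTIONS = {"B": "cd", "C": "ef", "D": "gh", "E": "ij"}


def terminal_histories():
    """List every terminal history (episode) as a tuple."""
    result = []
    for action, node in CHANCE_NODE.items():
        for outcome, state in OUTCOMES[node].items():
            for leaf in LEAF_ACTIONS[state]:
                result.append(("A", action, node, outcome, state, leaf))
    return result


def is_prefix(history, full):
    """True if `history` is an initial segment of `full`."""
    return len(history) <= len(full) and full[:len(history)] == history


def extensions(history):
    """Terminal histories that have `history` as a prefix."""
    return [t for t in terminal_histories() if is_prefix(history, t)]


print(extensions(("A", "a", "F", 2, "C")))
```

For the non-terminal history (A, a, F, 2, C) this lists exactly the two terminal histories ending in e and f.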

Partially Observable Zero-Sum Games

● Players start w/ 2 chips
● Each: ante 1 chip
● 3-card deck
● 2 actions: pass, bet
● Reward: money diff

Kuhn (simplified) poker
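The rules listed for Kuhn poker are enough to write down its terminal payoffs. The sketch below is plain Python rather than the OpenSpiel game; it assumes the standard convention that the higher card wins at showdown, and the encoding (cards as integers, "p" for pass, "b" for bet) is chosen only for illustration.

```python
from itertools import permutations

CARDS = (0, 1, 2)  # three-card deck; a higher number beats a lower one
TERMINAL_SEQUENCES = ("pp", "pbp", "pbb", "bp", "bb")  # p = pass, b = bet


def player1_return(card1, card2, actions):
    """Chips won (+) or lost (-) by player 1; player 2 gets the negative."""
    showdown = 1 if card1 > card2 else -1
    if actions == "pp":
        return showdown       # both pass: showdown for the antes
    if actions == "bp":
        return 1              # player 2 passes after a bet and forfeits the ante
    if actions == "pbp":
        return -1             # player 1 passes after a bet and forfeits the ante
    if actions in ("bb", "pbb"):
        return 2 * showdown   # bet is called: showdown for ante plus bet
    raise ValueError(f"not a terminal action sequence: {actions!r}")


# Every way to deal one card to each player.
deals = list(permutations(CARDS, 2))
```

No return exceeds 2 in magnitude, which matches each player starting with 2 chips (1 ante plus at most 1 bet).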

Terminology

● An information state corresponds to a sequence of observations
  ○ with respect to the player to play at that state

Ante: 1 chip per player, , P1 bets (raise)

private observation

Environment is in one of many world states


full history of actions (including nature’s!!)
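The gap between an information state and the world states behind it can be sketched for Kuhn poker. The example below is plain Python; the string format "card|actions" and both function names are hypothetical and are not the format OpenSpiel uses for its information state strings.

```python
from itertools import permutations

CARDS = ("J", "Q", "K")


def information_state(player, deal, actions):
    """What `player` knows: their private card plus the public actions."""
    return f"{deal[player]}|{actions}"


def consistent_world_states(player, info_state):
    """All full histories (chance's deal + actions) the player cannot tell apart."""
    return [
        (deal, actions)
        for deal in permutations(CARDS, 2)
        for actions in ("", "p", "b", "pb")
        if information_state(player, deal, actions) == info_state
    ]


# The second player (index 1) holds Q and has seen the first player bet.
print(consistent_world_states(1, "Q|b"))
# → [(('J', 'Q'), 'b'), (('K', 'Q'), 'b')]
```

The player sees one information state, but the environment is in one of two world states that differ only in chance's hidden deal to the opponent.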

● Imperfect information games, information state strings + vectors
● Feel free to follow along in colab or locally!
● Transcript of demo: demo3.txt

Query API

● OpenSpiel designed to be a generic API (breadth vs. depth)
● However, sometimes domain-specific knowledge is required.

OpenSpiel provides a query API to get knowledge about states:

- query.h, query.cc
- python/pybind11/pyspiel.cc

Currently only one game uses this.

Best File References

First example and API references:

● examples/example.cc
● python/examples/example.py

● python/examples/matrix_game_example.py

● python/egt/dynamics_test.py

● python/examples/kuhn_policy_gradient.py

● python/examples/tic_tac_toe_qlearner.py

● python/examples/independent_tabular_qlearning.py

Demo transcripts: demo1.txt, demo2.txt, demo3.txt

Thank You!

● Paper: https://arxiv.org/abs/1908.09453
● Github: github.com/deepmind/open_spiel/

10/03/2020

The end and thank you

Marc Lanctot
lanctot@google.com

mlanctot.info
