Upper Confidence Trees for Game AI Chahine Koleejan.
Upper Confidence Trees for Game AI
Chahine Koleejan
Background on Game AI
• For many years, computer chess was considered an ideal sandbox for testing AI algorithms
• Simple rules and clear benchmarks of performance against human intelligence
• The dominance of alpha-beta search programs over human players changed this
The Game of Go
• Researchers moved on to Go as their new challenge
• The game of Go is much harder to crack:
1. Massive search space
– 19x19 board -> up to 361 possible moves per turn
– More than 10^170 possible states
2. Game itself is very complex
– Hard to find good heuristics
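As a rough sanity check on the size of the search space, the sketch below counts board colourings rather than legal positions: each of the 361 points is empty, black or white, so 3^361 is an upper bound on the number of board configurations. This is a back-of-the-envelope illustration only, and the variable names are made up for this example.

```python
import math

BOARD_SIZE = 19
points = BOARD_SIZE * BOARD_SIZE      # 361 intersections on the board

# Each point is empty, black or white, so 3**points bounds the number of
# board configurations from above (many of them are not legal Go positions).
upper_bound = 3 ** points
digits = len(str(upper_bound))        # decimal digits in the upper bound
exponent = points * math.log10(3)     # upper bound is roughly 10**exponent

print(points)
print(digits)
print(round(exponent, 2))
```

The bound comes out a little above the "more than 10^170" figure, which is what one would expect, since only a fraction of the colourings are legal positions.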
Example of a Game of Go
Honinbo Shusaku (Black) vs Gennan Inseki (White), 1846
The Multi-arm Bandit Setting
• Hypothetical probability setting
• Gambler is at a row of k “bandits”
• When a bandit is pulled the gambler gets some amount of money
• Each bandit has a different probability distribution
• The gambler must decide which bandits to pull to maximise his reward
Exploitation and Exploration
• We need to balance the exploitation of the action currently believed to be optimal with the exploration of other actions that may be better in the long run
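• Upper Confidence Bound:
– We want to maximise this value for an arm j:
$$\mathrm{UCB1} = \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}$$
– Here \bar{x}_j is the average reward from arm j, n_j is the number of times arm j has been played, and n is the total number of plays so far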
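A minimal sketch of how the UCB1 rule might be used to choose arms, assuming Bernoulli bandits that pay 1 with a fixed probability and 0 otherwise. The function names, the payout probabilities and the seed are all assumptions made for illustration, not part of the slides.

```python
import math
import random

def ucb1_select(counts, sums, total_plays):
    """Return the index of the arm with the highest UCB1 value."""
    # Play every arm once first so that the play count is never zero.
    for arm, n_j in enumerate(counts):
        if n_j == 0:
            return arm

    def ucb1(arm):
        mean = sums[arm] / counts[arm]                                # exploitation term
        bonus = math.sqrt(2 * math.log(total_plays) / counts[arm])    # exploration term
        return mean + bonus

    return max(range(len(counts)), key=ucb1)

def run_bandit(probs, plays, seed=0):
    """Simulate `plays` pulls on Bernoulli arms; return pulls per arm."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    sums = [0.0] * len(probs)
    for n in range(plays):
        arm = ucb1_select(counts, sums, n)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = run_bandit([0.2, 0.5, 0.9], plays=2000)
print(counts)
```

Rarely played arms get a large exploration bonus, so every arm keeps being tried occasionally, while the arm with the best average reward should end up with most of the pulls.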
Why do we care?
• Sequential decision making games are basically a multi-arm bandit problem!
• …But worse.
• …But it’s close enough so we can use the math.
Monte Carlo Tree Search (MCTS)
• A tree search method which has revolutionised computer Go
• Works by simulating thousands of random games
• Does not need any prior knowledge of the game beyond its rules
• Does not need heuristics or evaluation functions, just observes the outcome of the simulation
UCT Algorithm
• We have a tree where each node has a value given by the UCB1 bound
• Steps of the algorithm:
1. Selection
2. Expansion
3. Simulation
4. Backpropagation
Selection and Expansion
• Starting at the root node, recursively choose the child with the highest value until we reach an expandable node
• A node is expandable if it is non-terminal and has unvisited children
• One of its unvisited children is then added to our tree
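The selection and expansion steps can be sketched as follows. The game here is a hypothetical toy (a single pile of stones from which a player removes 1, 2 or 3), chosen only so the block runs on its own. `Node`, `select`, `expand` and `legal_moves` are illustrative names, not taken from any library.

```python
import math

def legal_moves(stones):
    """Hypothetical toy game: remove 1, 2 or 3 stones from a single pile."""
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones                 # game state at this node
        self.parent = parent
        self.move = move                     # move that led here
        self.children = []
        self.untried = legal_moves(stones)   # moves without a child node yet
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return self.stones == 0

    def is_expandable(self):
        # Non-terminal and at least one child not yet in the tree.
        return not self.is_terminal() and bool(self.untried)

def ucb1(parent, child):
    mean = child.total_reward / child.visits
    return mean + math.sqrt(2 * math.log(parent.visits) / child.visits)

def select(node):
    """Follow the highest-UCB1 child until an expandable or terminal node."""
    while not node.is_terminal() and not node.is_expandable():
        node = max(node.children, key=lambda c: ucb1(node, c))
    return node

def expand(node):
    """Add exactly one child for an untried move and return it."""
    move = node.untried.pop()
    child = Node(node.stones - move, parent=node, move=move)
    node.children.append(child)
    return child

root = Node(5)
first_child = expand(select(root))   # root is expandable, so it is selected
print(first_child.move, first_child.stones)
```

UCB1 is only evaluated at nodes whose children have all been added, so each child has at least one visit by then, assuming every new child is simulated and backed up in the same iteration that creates it.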
Simulation
• A simulation is run from the new node to the end of the game according to our defined default policy
• At the most basic level the default policy is just random legal play
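A random default policy might look like the sketch below, again on a hypothetical single-pile game where players alternately remove 1, 2 or 3 stones and whoever takes the last stone wins. The function names and the seed are assumptions for illustration.

```python
import random

def legal_moves(stones):
    """Hypothetical toy game: remove 1, 2 or 3 stones from a single pile."""
    return [m for m in (1, 2, 3) if m <= stones]

def rollout(stones, player, rng):
    """Play uniformly random legal moves to the end of the game.

    `player` (0 or 1) is the one to move. Returns the winner, i.e. the
    player who takes the last stone.
    """
    if stones == 0:
        return 1 - player            # the previous player took the last stone
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player            # this player just took the last stone
        player = 1 - player          # otherwise the turn passes

rng = random.Random(42)
trials = 10000
wins = sum(rollout(4, 0, rng) == 0 for _ in range(trials))
win_rate = wins / trials
print(win_rate)
```

A single rollout says very little on its own. The average over many rollouts is what serves as the value estimate for the node the simulation started from.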
Backpropagation
• The simulation result is “backed up” (i.e. backpropagated) up the tree through the selected nodes to update their values
• For example, +1 if we won and -1 if we lost
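Putting the four steps together, a compact UCT loop could be sketched as below. It assumes a hypothetical toy game (one pile, remove 1 to 3 stones, last stone wins) and stores each node's reward from the point of view of the player who moved into it. Both are choices made for this example, and all names are illustrative.

```python
import math
import random

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones          # stones left in the pile
        self.player = player          # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move              # move that led here from the parent
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.total_reward = 0.0       # from the view of the player who moved here

def ucb1(parent, child):
    mean = child.total_reward / child.visits
    return mean + math.sqrt(2 * math.log(parent.visits) / child.visits)

def tree_policy(node):
    # Selection: descend while the node is non-terminal and fully expanded.
    while node.stones > 0 and not node.untried:
        node = max(node.children, key=lambda c: ucb1(node, c))
    # Expansion: add one child for an untried move, if there is one.
    if node.untried:
        move = node.untried.pop()
        child = Node(node.stones - move, 1 - node.player, node, move)
        node.children.append(child)
        node = child
    return node

def rollout(stones, player, rng):
    """Simulation: random legal play; returns the winner."""
    if stones == 0:
        return 1 - player
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player
        player = 1 - player

def backpropagate(node, winner):
    # Walk back to the root, updating every node on the selected path.
    while node is not None:
        node.visits += 1
        mover = 1 - node.player       # player who moved into this node
        node.total_reward += 1.0 if winner == mover else -1.0
        node = node.parent

def uct_search(stones, iterations=5000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player=0)
    for _ in range(iterations):
        leaf = tree_policy(root)
        winner = rollout(leaf.stones, leaf.player, rng)
        backpropagate(leaf, winner)
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda c: c.visits).move

print(uct_search(5))
```

In this toy game the winning strategy is to leave the opponent a multiple of 4 stones, so from 5 stones the search would be expected to settle on taking 1. The reward is flipped at each level because the players alternate: a result that is +1 for one player is -1 for the other.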
Example
References
• A Survey of Monte Carlo Tree Search Methods, Cameron B. Browne et al., IEEE Transactions on Computational Intelligence and AI in Games, 2012
• Monte-Carlo tree search and rapid action value estimation in computer Go, Sylvain Gelly & David Silver, Artificial Intelligence 175, 2011
• If you’re interested in Go, talk to me!
• It’s really cool!
Othello Demo