Upper Confidence Trees for Game AI
Chahine Koleejan
Background on Game AI
• For many years, computer chess was considered an ideal sandbox for testing AI algorithms
• Simple rules and clear benchmarks of performance against human intelligence
• The domination of human players by alpha-beta search programs changed this
The Game of Go
• Researchers moved on to Go as their new challenge
• The game of Go is much harder to crack:
1. Massive search space
– 19x19 board -> up to 361 possible moves per turn
– More than 10^170 possible states
2. Game itself is very complex
– Hard to find good heuristics
Example of a Game of Go
Honinbo Shusaku (Black) vs Gennan Inseki (White), 1846
The Multi-arm Bandit Setting
• Hypothetical probability setting
• A gambler sits at a row of k "bandits" (slot machines)
• When a bandit is pulled, the gambler receives some amount of money
• Each bandit has a different payout probability distribution
• The gambler must decide which bandits to pull to maximise his reward
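The setting above can be sketched in a few lines of Python (a toy illustration; the arm count and payout probabilities are invented for the example and are hidden from the gambler):

```python
import random

random.seed(42)

# Hypothetical 3-armed bandit: each arm pays 1 with its own hidden probability.
payout_probs = [0.2, 0.5, 0.8]

def pull(arm):
    """One pull of a bandit arm; reward is 1 or 0."""
    return 1 if random.random() < payout_probs[arm] else 0

# Pull each arm 1000 times to estimate its mean reward.
estimates = []
for arm in range(3):
    rewards = [pull(arm) for _ in range(1000)]
    estimates.append(sum(rewards) / 1000)
print(estimates)  # empirical means approach the hidden probabilities
```

In practice the gambler cannot afford 1000 pulls per arm, which is exactly why a smarter pulling strategy is needed.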
Exploitation and Exploration
• We need to balance the exploitation of the action currently believed to be optimal with the exploration of other actions that may be better in the long run
• Upper Confidence Bound:– We want to maximise this value for an arm j:
UCB1 = x̄_j + √[(2 ln n)/n_j]

where x̄_j is the average reward from arm j, n_j is the number of times arm j has been pulled, and n is the total number of pulls so far
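As a sketch, the bound translates directly into code (the function name and the default exploration constant c = √2, which recovers the 2 ln n form above, are my own):

```python
import math

def ucb1(mean_reward, n_parent, n_child, c=math.sqrt(2)):
    """UCB1 for arm j: with c = sqrt(2) this equals x̄_j + sqrt(2 ln n / n_j)."""
    if n_child == 0:
        return float("inf")  # unpulled arms are always tried first
    return mean_reward + c * math.sqrt(math.log(n_parent) / n_child)
```

The first term favours exploitation (arms with high observed reward); the second grows for rarely pulled arms, forcing exploration.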
Why do we care?
• Sequential decision making games are basically a multi-arm bandit problem!
• …But worse.
• …But it’s close enough so we can use the math.
Monte Carlo Tree Search (MCTS)
• A tree search method which has revolutionised computer Go
• Works by simulating thousands of random games
• Does not need any prior knowledge of the game
• Does not need heuristics or evaluation functions, just observes the outcome of the simulation
UCT Algorithm
• We have a tree where each node has a value given by the UCB1 bound
• Steps of the algorithm:
1. Selection
2. Expansion
3. Simulation
4. Backpropagation
Selection and Expansion
• Starting at root node, recursively choose the child with the highest value until we reach an expandable node
• A node is expandable if it is non-terminal and has unvisited children
• One child node is added to our tree
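A minimal sketch of these two steps, using a toy take-1-or-2-stones game so the code is runnable (all names are my own, not from any particular MCTS library):

```python
import math

# Toy game: take 1 or 2 stones from a pile; terminal when the pile is empty.
def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

def apply_move(pile, move):
    return pile - move

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.untried_moves = legal_moves(state)  # empty list => not expandable
        self.visits = 0
        self.total_reward = 0.0

def ucb1(node, c=math.sqrt(2)):
    if node.visits == 0:
        return float("inf")  # unvisited children are selected first
    return (node.total_reward / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def select(node):
    """Recursively pick the highest-UCB1 child until an expandable node."""
    while not node.untried_moves and node.children:
        node = max(node.children, key=ucb1)
    return node

def expand(node):
    """Attach one new child for an as-yet untried move."""
    move = node.untried_moves.pop()
    child = Node(apply_move(node.state, move), parent=node)
    node.children.append(child)
    return child
```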
Simulation
• A simulation is run from the new node to the end of the game according to our defined default policy
• At the most basic level the default policy is just random legal play
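For the same toy take-1-or-2-stones game, a purely random default policy might look like this (a sketch; the names are my own):

```python
import random

random.seed(1)

def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

def simulate(pile, player):
    """Random playout from `pile` (assumed > 0), players 0 and 1 alternating.
    Whoever takes the last stone wins; returns +1 if `player` wins, else -1."""
    to_move = player
    while True:
        pile -= random.choice(legal_moves(pile))
        if pile == 0:
            return 1 if to_move == player else -1
        to_move = 1 - to_move
```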
Backpropagation
• The simulation result is "backed up" (i.e. backpropagated) through the tree via the selected nodes to update their values
• For example, +1 if we won and -1 if we lost
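Backpropagation itself is just a walk up the parent pointers; in a two-player game the reward sign flips at each ply (a sketch with invented field names):

```python
# Minimal node with just the statistics backpropagation touches.
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.total_reward = 0.0

def backpropagate(node, reward):
    """Walk from the simulated node to the root, updating statistics.
    The sign alternates because the players alternate at each level."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        reward = -reward  # +1 for us is -1 for the opponent
        node = node.parent
```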
Example
References
• A Survey of Monte Carlo Tree Search Methods, Cameron B. Browne et al., IEEE Transactions on Computational Intelligence and AI in Games, 2012
• Monte-Carlo tree search and rapid action value estimation in computer Go, Sylvain Gelly & David Silver, Artificial Intelligence 175, 2011
• If you’re interested in Go talk to me!
• It’s really cool!
Othello Demo