Parallel Monte-Carlo Tree Search with Simulation Servers
HIDEKI KATO†‡ and IKUO TAKEUCHI†
† The University of Tokyo, ‡ Fixstars Corporation
November 7th, 2008
Contents
Computer Go
Monte-Carlo Tree Search
Parallel Monte-Carlo Tree Search
Client-Server Approach
Experiments and Discussion
Conclusion and Future Work
Computer Go
• The game of Go
  – Task par excellence for AI (H. Berliner 1978)
  – Most challenging; largest search space (19 x 19: 10^171, 9 x 9: 10^38, cf. chess: 10^50)
  – Minimax tree search with a static evaluation function based on domain knowledge has been used so far, without major success
• The Monte-Carlo Go revolution
  – MoGo beat an 8-dan professional player on 9 x 9
  – Crazy Stone beat a 4-dan professional player with an 8-stone handicap
Monte-Carlo Tree Search (MCTS)
• Descend the tree from the root to a leaf
• Add a node
• Simulate a game
• Update the values of the moves
• Repeat until time-up
• Play the most visited move at the root
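The loop above can be sketched in Python as follows. This is a minimal sketch, not the authors' program: it assumes UCB1 (as in UCT) for choosing moves during the descent, a fixed number of iterations in place of a time limit, and a toy game (remove 1 to 3 stones; whoever takes the last stone wins) in place of Go. The names `Node`, `simulate` and `mcts` are illustrative.

```python
import math
import random

MOVES = (1, 2, 3)  # toy game (assumption): remove 1 to 3 stones; taking the last stone wins


class Node:
    """Search-tree node; `wins` counts wins for the player who moved into this node."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = [m for m in MOVES if m <= stones]
        self.visits = 0
        self.wins = 0.0

    def ucb_child(self, c=1.4):
        # UCB1 (assumed selection rule): win rate plus an exploration bonus.
        log_n = math.log(self.visits)
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits + c * math.sqrt(log_n / ch.visits),
        )


def simulate(stones, rng):
    """Random playout; True if the player to move at `stones` ends up winning."""
    first_player_moves = True
    while stones > 0:
        stones -= rng.choice([m for m in MOVES if m <= stones])
        if stones == 0:
            return first_player_moves
        first_player_moves = not first_player_moves
    return False  # no stones left: the previous player took the last one


def mcts(stones, iterations=5000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):  # stands in for "repeat until time-up"
        node = root
        # Descend the tree from the root to a leaf.
        while not node.untried and node.children:
            node = node.ucb_child()
        # Add a node for one untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # Simulate a game; the player who moved into `node` wins if the player to move loses.
        mover_won = not simulate(node.stones, rng)
        # Update the values of the moves on the path back to the root.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_won else 0.0
            mover_won = not mover_won
            node = node.parent
    # Play the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

In this toy game, leaving the opponent a multiple of 4 stones is the known winning strategy, so `mcts(6)` should settle on removing 2 stones.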
Parallel MCTS (PMCTS)
• Symmetrical multi-thread (SMT) PMCTS
  – Commonly used, straightforward implementation
  – MCTS threads share a search tree
[Figure: Threads 1 to 4 access a shared search tree protected by a lock]
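To make the lock contention concrete, here is a minimal sketch of SMT PMCTS in Python. It assumes a single global lock around both the descent and the update, and simplifies the shared tree to root-level statistics, with biased coin flips standing in for playouts; `SharedTree`, `worker` and `smt_pmcts` are hypothetical names. CPython's global interpreter lock prevents a real parallel speed-up here; the sketch only shows where every thread has to synchronize.

```python
import math
import random
import threading


class SharedTree:
    """One-level 'tree' (root moves only) shared by all threads and guarded by one lock."""

    def __init__(self, moves):
        self.lock = threading.Lock()
        self.visits = {m: 0 for m in moves}
        self.wins = {m: 0.0 for m in moves}
        self.total = 0

    def select(self, c=1.4):
        with self.lock:  # every descent must take the lock...
            for m, n in self.visits.items():
                if n == 0:
                    return m
            log_total = math.log(self.total)
            return max(
                self.visits,
                key=lambda m: self.wins[m] / self.visits[m]
                + c * math.sqrt(log_total / self.visits[m]),
            )

    def update(self, move, won):
        with self.lock:  # ...and so must every update
            self.visits[move] += 1
            self.wins[move] += 1.0 if won else 0.0
            self.total += 1


def worker(tree, win_prob, playouts, seed):
    rng = random.Random(seed)  # per-thread generator, no sharing needed
    for _ in range(playouts):
        move = tree.select()
        won = rng.random() < win_prob[move]  # stand-in for a real random playout
        tree.update(move, won)


def smt_pmcts(win_prob, n_threads=4, playouts_per_thread=2000):
    tree = SharedTree(win_prob)
    threads = [
        threading.Thread(target=worker, args=(tree, win_prob, playouts_per_thread, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(tree.visits, key=tree.visits.get), tree
```

Every playout takes the lock twice, so adding threads adds contention on the same lock; this is the overhead the "Problems" slide refers to.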
Related Work
• S. Gelly et al. introduced SMT PMCTS for shared-memory SMP systems (2006)
• T. Cazenave et al. proposed and evaluated three PMCTS algorithms on an MPI cluster of 16 Intel Pentium 4 processors (2007)
• G. Chaslot et al. evaluated root, leaf and tree parallelization on a 2 x 8-core IBM Power5 system (2008)
• S. Gelly et al. proposed SMT PMCTS for MPI clusters of shared-memory SMP nodes (2008)
Problems
• Number of processors
  – Shared-tree PMCTS can run only on shared-memory systems, which currently have up to 16 or 32 processors
  – PMCTS algorithms for clusters of computers connected through networks are needed
  – As in other parallel applications, longer communication times decrease performance
  – Increasing the number of threads increases the overhead of the locks used to share the search tree
MoGo’s Solution
• Combine fine- and coarse-grained PMCTS
  – For MPI clusters with shared-memory SMP nodes (S. Gelly et al. 2008)
  – Runs SMT PMCTS on each node
  – Periodically exchanges and merges values in the tree
  – Excellent performance
• MoGoTitan beat an 8-dan Korean professional Go player with a 9-stone handicap (2008)
  – Huygens supercomputer at SARA in Amsterdam, the Netherlands
  – 25 of its 104 SMP nodes were used; each node consists of 16 dual-core Power6 processors at 4.7 GHz
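The periodic "exchange and merge" step can be illustrated with a minimal sketch. It assumes that each cluster node keeps, per tree node, the visit and win counts gathered since the last exchange, and that merging means summing those deltas across all nodes; with MPI this summation would correspond to an all-reduce (`MPI_Allreduce` with `MPI_SUM`). MoGo's actual implementation, including which part of the tree it shares, is not reproduced here, and `NodeStats` and `synchronize` are hypothetical names.

```python
from collections import Counter


class NodeStats:
    """Tree statistics held by one cluster node, keyed by a hashable position id."""

    def __init__(self):
        self.visits, self.wins = Counter(), Counter()
        # Statistics gathered locally since the last exchange.
        self.delta_visits, self.delta_wins = Counter(), Counter()

    def record(self, key, won):
        self.visits[key] += 1
        self.delta_visits[key] += 1
        if won:
            self.wins[key] += 1
            self.delta_wins[key] += 1


def synchronize(nodes):
    """Periodic exchange: sum every node's deltas (an all-reduce) and merge them everywhere."""
    sum_visits, sum_wins = Counter(), Counter()
    for n in nodes:
        sum_visits.update(n.delta_visits)
        sum_wins.update(n.delta_wins)
    for n in nodes:
        # Add only what the other nodes contributed; the node's own delta is already counted.
        for key, v in sum_visits.items():
            n.visits[key] += v - n.delta_visits[key]
        for key, w in sum_wins.items():
            n.wins[key] += w - n.delta_wins[key]
        n.delta_visits.clear()
        n.delta_wins.clear()
```

Exchanging only deltas keeps each message proportional to the work done since the last exchange rather than to the size of the whole tree.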
MoGo’s Solution (cont’d)
• Disadvantages
  – Expensive: high-speed network interfaces such as InfiniBand are very expensive (so are the clusters)
  – Lack of flexibility: MPI does not allow computers to be added or removed on the fly, and it requires a special, pre-configured setup
• Applicable to non-MPI clusters on moderate-speed networks?
  – Nobody has tried yet
Client-Server Approach
• Recent success of grid computing
  – Folding@home achieved one petaflop, largely thanks to 41,145 Sony PlayStation 3 consoles all over the world (2007)
  – A less expensive, massively parallel approach
  – Applicable to PMCTS?
• Basic idea
  – Separate the tree search part from the simulation part
  – Broadcast the positions to be simulated using UDP/IP
  – Don’t wait for slow simulations to finish
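Broadcasting positions over UDP/IP can be sketched as a small message format plus a sending helper. The slides do not give the wire format, so the layout below (a 32-bit request id, the board size, then one byte per point) and the port number are hypothetical. The request id is what lets a result that arrives late still be matched to the position it came from.

```python
import socket
import struct

HEADER = struct.Struct("!IB")  # request id (uint32), board size (uint8), network byte order
EMPTY, BLACK, WHITE = 0, 1, 2
PORT = 50007  # hypothetical port number


def encode_position(request_id, size, board):
    """Pack a position; `board` is a flat sequence of size*size values in {0, 1, 2}."""
    if len(board) != size * size:
        raise ValueError("board length does not match board size")
    return HEADER.pack(request_id, size) + bytes(board)


def decode_position(data):
    """Inverse of encode_position: returns (request_id, size, board as a list)."""
    request_id, size = HEADER.unpack_from(data)
    board = list(data[HEADER.size:HEADER.size + size * size])
    return request_id, size, board


def broadcast_position(message, port=PORT):
    """Send one datagram to every host on the local subnet; no reply is awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(message, ("<broadcast>", port))
```

With this layout a 13 x 13 position takes 174 bytes, well within a single datagram. A real client would keep one socket open instead of creating one per message.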
Client-Server Approach (cont’d)
• Client-server PMCTS
  – A client searches the tree and sends a position; a server simulates a game from that position and sends back the result
  – Runs on a cluster of loosely coupled computers
  – Servers can run on computers with little memory, even if the tree grows huge
  – No special setup for servers; just a small application
  – Longer communication time due to moderate-speed networks
  – Performance? Does it scale well?
Client-Server PMCTS
Client (owns the search tree), looping until time-up:
• Descend the tree from the root to a leaf
• Add a node
• Broadcast the position
• Receive a result (no wait)
• Update the values of the moves
• Repeat until time-up, then select the most visited move at the root
Each server (Server 1, Server 2, and so on), looping forever:
• Receive positions
• Simulate a game
• Send the result
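The two loops can be sketched with in-process queues standing in for the UDP network. Several details here are assumptions, not taken from the slides: the broadcast is replaced by a shared work queue so each position is simulated once, results carry the request id of their position, the client keeps a table of pending requests and caps how many are in flight, the tree is flattened to root moves, and the "simulation" is a biased coin flip. All names are hypothetical.

```python
import math
import queue
import random
import threading

WIN_PROB = {"a": 0.2, "b": 0.7, "c": 0.4}  # toy stand-in for the outcome of real playouts


def server(positions, results, stop, seed):
    """Server loop: receive a position, simulate a game, send the result back."""
    rng = random.Random(seed)
    while not stop.is_set():
        try:
            request_id, move = positions.get(timeout=0.01)
        except queue.Empty:
            continue
        results.put((request_id, rng.random() < WIN_PROB[move]))


def ucb(wins, visits, n, c=1.4):
    return math.inf if visits == 0 else wins / visits + c * math.sqrt(math.log(n) / visits)


def client(n_servers=3, total=3000, max_in_flight=8):
    positions, results, stop = queue.Queue(), queue.Queue(), threading.Event()
    servers = [
        threading.Thread(target=server, args=(positions, results, stop, i))
        for i in range(n_servers)
    ]
    for s in servers:
        s.start()
    visits = dict.fromkeys(WIN_PROB, 0)
    wins = dict.fromkeys(WIN_PROB, 0)
    pending = {}  # request id -> move, so a late result is credited to the right path
    next_id = done = 0
    while done < total:
        can_send = len(pending) < max_in_flight and next_id < total
        if can_send:
            # Descend (one level in this toy) and send the position to the servers.
            n = sum(visits.values()) + 1
            move = max(WIN_PROB, key=lambda m: ucb(wins[m], visits[m], n))
            pending[next_id] = move
            positions.put((next_id, move))
            next_id += 1
        try:
            # Receive a result without waiting; block briefly only if nothing can be sent.
            request_id, won = results.get_nowait() if can_send else results.get(timeout=0.1)
        except queue.Empty:
            continue
        move = pending.pop(request_id)
        visits[move] += 1
        wins[move] += won
        done += 1
    stop.set()
    for s in servers:
        s.join()
    return max(visits, key=visits.get), visits
```

Because the client never blocks while it can still send work, slow servers delay only their own results, not the search loop.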
Experimental System
• PC1 (1 client and 3 servers): CPU Q9550/3GHz (400 x 7.5); OS Ubuntu Linux 8.04; M/B ASUS P5K-VM (G33); RAM PC3200 4GiB; NIC Intel EXP9300PT (PCI-Ex x1)
• PC2 (4 servers): CPU Q9550/3GHz (400 x 7.5); OS Ubuntu Linux 8.04; M/B DFI LP JR P45-T2RS (P45); RAM PC3200 4GiB; NIC Intel EXP9300PT (PCI-Ex x1); RTT 151±22 µs @ 1 kB
• PC3 (4 servers): CPU Q6600/3GHz (333 x 9); OS Ubuntu Linux 8.04; M/B ASUS P5K-VM (G33); RAM PC3200 4GiB; NIC Intel EXP9300PT (PCI-Ex x1); RTT 154±20 µs @ 1 kB
• PC4 (4 servers): CPU Q6600/3GHz (333 x 9); OS Ubuntu Linux 8.04; M/B ASUS P5WDG2-WS Pro (975X); RAM PC3200 4GiB; NIC Intel EXP9300GT (PCI); RTT 159±22 µs @ 1 kB
• Switch: Allied Telesis GS908XL (switching delay: 2.2 µs @ 64 bytes)
Experiments
• Each tree searcher or simulator runs exclusively on its own core
• On the client computer, one core runs the tree searcher thread and each of the other cores runs a simulator thread
• The simulators on the server computers run as individual processes
• All results are Elo ratings against GNU Go 3.7.11 at level 0
• Settings
  – Games: 2,000 on 9 x 9, 500 on 13 x 13
  – Time per move: 0.005 to 0.64 s on 9 x 9, 0.05 to 6.4 s on 13 x 13
  – Simulation servers: 1 to 15 on both board sizes
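Elo ratings against a fixed opponent are usually derived from the observed win rate with the standard logistic Elo model, taking the opponent (here GNU Go) as 0. The slides do not say exactly how the ratings or their error bars were computed, so the sketch below assumes that model, with a delta-method standard error; the function names are hypothetical.

```python
import math


def elo_from_win_rate(wins, games):
    """Elo difference D such that 1 / (1 + 10**(-D / 400)) equals the observed win rate."""
    p = wins / games
    if not 0 < p < 1:
        raise ValueError("win rate must be strictly between 0 and 1")
    return 400 * math.log10(p / (1 - p))


def elo_std_error(wins, games):
    """Approximate standard error of the Elo estimate (delta method on a binomial win rate)."""
    p = wins / games
    se_p = math.sqrt(p * (1 - p) / games)
    # dD/dp = 400 / (ln(10) * p * (1 - p))
    return 400 / (math.log(10) * p * (1 - p)) * se_p
```

At a 50% win rate, 2,000 games (9 x 9) give a standard error of about 7.8 Elo, while 500 games (13 x 13) give about 15.5 Elo.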
How to evaluate the results?
• Simulations per second?
  – Commonly used for shared-memory SMP systems, but not a good measure for clusters
  – Not all simulations contribute equally to playing strength
• Use equivalent-strength speed-up
  – The ratio of the time-per-move settings that give the same strength with different numbers of simulators
  – “Equivalent speed-up” for short
• Number of simulators or cores
  – The number of simulators is used to evaluate scalability, while the total number of cores is used to evaluate performance
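One way to compute an equivalent speed-up from two measured Elo-versus-time curves is to find, for a target Elo, the time per move each configuration needs, and then take the ratio of the two times. The sketch assumes linear interpolation in log2(time) between measured points and curves whose Elo increases with time; this is not necessarily the authors' procedure, and the function names are hypothetical.

```python
import math
from bisect import bisect_left


def time_for_elo(curve, target):
    """curve: list of (time_per_move, elo) sorted by time, with elo increasing.
    Returns the time at which the curve reaches `target`, interpolating in log2(time)."""
    times = [t for t, _ in curve]
    elos = [e for _, e in curve]
    if not elos[0] <= target <= elos[-1]:
        raise ValueError("target Elo outside the measured range")
    i = bisect_left(elos, target)
    if elos[i] == target:
        return times[i]
    (t0, e0), (t1, e1) = curve[i - 1], curve[i]
    frac = (target - e0) / (e1 - e0)
    return 2 ** (math.log2(t0) + frac * (math.log2(t1) - math.log2(t0)))


def equivalent_speedup(base_curve, parallel_curve, target):
    """Ratio of the time-per-move settings that give the same strength (Elo) on both systems."""
    return time_for_elo(base_curve, target) / time_for_elo(parallel_curve, target)
```

As a sanity check, if the parallel curve is the base curve with every time halved, the equivalent speed-up comes out as 2 at any Elo within the measured range.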
Equivalent Speed-up
[Figure: Elo rating versus time per move (s, log scale) for 4 cores and 16 cores on 9 x 9 and 13 x 13]
Performance (4 core vs. 16 core)
[Figure: Equivalent speed-up versus Elo rating for 9 x 9 and 13 x 13; linear fit y = 0.0001x + 1.5969, i.e. a nearly constant equivalent speed-up of about 1.6]
Scalability
[Figure: Elo rating versus number of simulators (1 to 15) for 9 x 9 (0.08 s/move) and 13 x 13 (0.4 s/move); fitted trend y = 86.864x - 281.43, R² = 0.9794]
Conclusion and Future Work
• Client-server parallel Monte-Carlo tree search
  – Runs on a cluster of loosely coupled computers
  – Computers with little memory, such as game consoles, can be used as simulation servers
  – Allows servers to connect or disconnect on the fly
  – Reduced communication by broadcasting
  – No overhead for sharing the search tree
  – Scales well on 13 x 13 with up to 15 simulators
• Future work
  – Multiple clients for single or multiple users