Large-Scale Platform for MOBA Game AI
28th March 2018
Bin Wu & Qiang Fu
Outline
• Introduction
• Learning algorithms
• Computing platform
• Demonstration
Game AI Development
• 1950s–1960s (early exploration): a checkers program beats a state champion
• 1970s–1980s (transition): Chess 4.5 beats human players
• 1990s–2000s (rapid development): Deep Blue (IBM) beats Garry Kasparov
• 2010s (explosive growth): AlphaGo (DeepMind) defeats Lee Sedol and Ke Jie
Applications of Game AI
• Research: an ideal testbed for general AI research
◇ Massive data from human players
◇ Low experimental costs
◇ General abilities for perception and decision-making
◇ From the virtual world to the real world
• Gaming: core applications in the gaming industry
◇ Pre-game procedures, e.g., game design
◇ Player experience, e.g., AI teammates and enemies
◇ Others, e.g., e-sports
Game AI Research Topic
• Game AI has become a hot research topic after the success of AlphaGo
• Many AI giants have joined game AI research, moving from Go to RTS, MOBA, etc.
◇ Released a StarCraft AI platform; preliminary results in simple scenarios [3]
◇ Released the StarCraft II AI platform; not yet able to defeat the built-in AI [4]
◇ DOTA 2 1v1 AI beat top human players; 5v5 planned for 2018 [5]
MOBA Game
• 5 vs. 5 game: obtain gold/exp → gain equipment advantages → win fights → destroy the enemy's base
• Goal: destroy the enemy's base, defended by turrets
• Controls: movement, attack/skills, equipment purchase
• Neutral creeps: a source of money/power/levels/…
MOBA Game
• Micro combat
◇ Movement
◇ Use of skills
MOBA Game
• Macro strategies
◇ Backing up
◇ Laning
◇ Ganking
◇ Stealing the base
MOBA AI - Key Challenge
Computing Platform Learning Algorithm
Learning Algorithms
Learning Algorithms - Challenges
1. Complexity: ~10^20000
2. Multi-agent: 5v5 coordination
3. Imperfect information: partially observable
4. Sparse and delayed rewards: 20,000+ frames per game
Learning Algorithms - Challenges
• Complexity >> Go
◇ End-to-end solutions (SL/RL) do not work well: agents cannot even finish basic movement/attack
◇ Similar observations made by DeepMind

Go: state space 3^360 ≈ 10^170 (361 positions, 3 states each); action space 250^150 ≈ 10^360 (~250 positions available, ~150 decisions per game on average)
MOBA: state space ~10^20000 (10 heroes, 2000+ positions × 10+ states each); action space 20^20000 (20 actions — left, right, …, skill 1/2/3 + position/target, recover, return, etc. — over 20,000 frames per game)
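These magnitudes can be sanity-checked with a few lines of arithmetic. The sketch below (not from the talk) computes the base-10 exponents directly, so the exact values differ slightly from the slide's rounded figures:

```python
import math

def exp10(base, power):
    """Base-10 exponent of base**power, i.e. log10(base**power)."""
    return power * math.log10(base)

# Go: 3 states on each of 361 intersections -> upper bound on the state space
go_states = exp10(3, 361)       # ~172, commonly rounded to 10^170
# Go: ~250 legal moves, ~150 moves per game -> game-tree size
go_tree = exp10(250, 150)       # ~360
# MOBA: ~20 candidate actions over 20,000+ frames per game
moba_tree = exp10(20, 20000)    # tens of thousands, far beyond Go's 10^360

print(round(go_states), round(go_tree), round(moba_tree))
```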
Learning Algorithms - Challenges
• Multi-agent
◇ Macro strategy level: four defending while one steals the base
◇ Micro combat level: tanks protecting assassins
Learning Algorithms - Challenges
• Sparse and delayed rewards
◇ Go: < 360 steps per game
◇ MOBA: > 20,000 steps per game
Learning Algorithms - Challenges
• Imperfect information
◇ Maps are partially observable
◇ Guess the enemy's positions/strategy
◇ Actively explore to gain vision
Model Architecture
• Divide and conquer: split the model into Transfer, Strategy, and Combat modules
• Splitting for simplification reduces the solution space from ~10^20000 to ~10^2000
Model - Transfer
• Where to send heroes?
◇ Compared to Go: treat heroes as stones and the map as the board
◇ Predict good positions: hotspot prediction drives the transfer decision
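As a toy illustration of this board-like encoding (not the talk's actual model, which learns the predictor from replay data, e.g. with convolutional networks such as ConvLSTM [10]): heroes are placed as "stones" on a coarse grid, and a smoothed heatmap stands in for the learned hotspot prediction. The grid size, channels, and weights here are all invented for the sketch:

```python
import numpy as np

GRID = 24  # assumed coarse map resolution; the real resolution is not given in the talk

# Gaussian-like 3x3 kernel used to smooth the stone planes into a heatmap
KERNEL = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float32)
KERNEL /= KERNEL.sum()

def encode_state(ally_xy, enemy_xy, grid=GRID):
    """Board-like encoding: one channel per side, heroes as 'stones'."""
    planes = np.zeros((2, grid, grid), dtype=np.float32)
    for x, y in ally_xy:
        planes[0, y, x] = 1.0
    for x, y in enemy_xy:
        planes[1, y, x] = 1.0
    return planes

def smooth(plane):
    """3x3 weighted blur, a stand-in for the learned convolutional layers."""
    padded = np.pad(plane, 1)
    out = np.zeros_like(plane)
    h, w = plane.shape
    for i in range(3):
        for j in range(3):
            out += KERNEL[i, j] * padded[i:i + h, j:j + w]
    return out

def hotspot(planes, w_ally=0.5, w_enemy=1.0):
    """Combine the planes into a 'where to go' heatmap and pick its peak."""
    heat = w_ally * smooth(planes[0]) + w_enemy * smooth(planes[1])
    return np.unravel_index(np.argmax(heat), heat.shape)  # (y, x) target

target = hotspot(encode_state([(3, 3)], [(10, 10)]))
```

With the enemy channel weighted higher, the heatmap's peak lands on the enemy cluster, i.e. the transfer decision "move toward the fight".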
Model - Strategy
• Key resources in MOBA
◇ Modeling macro objectives: describe the hotspot transition series leading up to the destruction of a key resource
Model - Strategy
• Example of macro session segmentation (figure): a transition graph over key objectives — Start, Dragon, Dark Dragon, the mid-lane 1st/2nd/3rd turrets, the bottom-lane 1st turret, and the Base
• Within a session, micro behaviors such as stealing the blue creep, killing bottom-lane creeps, and attacking the bottom 1st turret
• Describe the hotspot transition series leading up to the destruction of the key resource
Model - Transfer Network with Macro Strategy
• Key resources → hotspots
Model - Combat
• Multi-task on buttons
◇ Action space: movement directions, skill release positions
Learning Framework
• Imitation + Reinforcement Learning
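A minimal tabular sketch of this two-stage recipe — supervised imitation of replays, then policy-gradient fine-tuning. The states, actions, and reward here are invented for illustration; the real system trains deep networks on replay data and self-play:

```python
import math, random

random.seed(0)
ACTIONS = ["move", "attack", "skill"]   # toy action set (assumed)
STATES = ["laning", "teamfight"]        # toy macro states (assumed)
LR = 0.5

# logits[s][a] parameterize one softmax policy per state
logits = {s: {a: 0.0 for a in ACTIONS} for s in STATES}

def policy(s):
    z = {a: math.exp(v) for a, v in logits[s].items()}
    total = sum(z.values())
    return {a: v / total for a, v in z.items()}

# --- Stage 1: imitation learning on (state, action) pairs from replays ---
replays = [("laning", "move")] * 8 + [("laning", "attack")] * 2 \
        + [("teamfight", "skill")] * 9 + [("teamfight", "move")]
for _ in range(300):
    s, demo = random.choice(replays)
    p = policy(s)
    for a in ACTIONS:                   # cross-entropy gradient step
        logits[s][a] += LR * ((a == demo) - p[a])

# --- Stage 2: REINFORCE fine-tuning on a toy reward signal ---
def reward(s, a):                       # invented reward for the sketch
    return 1.0 if (s, a) in {("laning", "move"), ("teamfight", "skill")} else 0.0

for _ in range(300):
    s = random.choice(STATES)
    p = policy(s)
    a = random.choices(ACTIONS, weights=[p[x] for x in ACTIONS])[0]
    for b in ACTIONS:                   # policy-gradient step, scaled by reward
        logits[s][b] += LR * reward(s, a) * ((b == a) - p[b])
```

Imitation gives the policy a sensible starting point cheaply; reinforcement learning then sharpens it on outcomes that replays alone cannot teach.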
Computing Platform
• Computing Platform ◇ Computational power – large-scale CPU/GPU virtualization
◇ Learning platform – Efficient and easy-to-use platform
MOBA Game AI Platform
• Computational units
◇ Millions of CPUs: online services plus an idle resource pool, using Docker with a mixed online/offline deployment technique
◇ Thousands of GPUs: online and offline services, using Docker with GPU virtualization for shared resources
• Resource allocation: elastic computation, Kubernetes resource allocation, Tencent Cloud Function
• Service layer: feature extraction, game environment deployment, model training, reinforcement learning, machine learning, task management
Computational Power
• Computational costs: MOBA AI demands thousands of GPUs and millions of CPU cores — the more, the better
• Challenge: improve resource utilization efficiency without additional costs
• Solution: CPU/GPU virtualization for shared resources
CPU Virtualization
• Elastic and dynamic resource pool: millions of CPU cores, at ~20% average utilization before sharing
◇ 70% from the idle resource pool: new resources not yet delivered, old resources not yet reclaimed, returned resources
◇ 30% from idle slots in online services: online-service resource usage raised from 20% to 65% using Docker isolation
GPU Virtualization [12]
• Goal: improve GPU usage efficiency
• Resource usage: thousands of GPUs; 65% of machines under low load; 28% average GPU usage
• Optimization: share each GPU among jobs, either in parallel (CUDA Multi-Process Service [12]) or by time slicing
Learning Platform
Core technique — version update frequency:
• Feature extraction: hours
• Model training: one day
• RL training: one day
Learning Platform - Feature Extraction Platform
• Demand 1: feature extraction from up to hundreds of thousands of replays
◇ Challenge: demands up to 210,000 CPU cores per day
◇ Solution: CPU virtualization; Docker-based elastic and dynamic resource pool
• Demand 2: multiple tasks, each with millions of entries
◇ Challenge: parallel task scheduling
◇ Solution: Tencent Serverless Cloud Function
• Pipeline: game replays → gamecore → raw data → feature extraction → shuffle → training samples → training → models → evaluation
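The per-replay work is embarrassingly parallel: each replay becomes one independent task that the platform can fan out to virtualized CPU cores or serverless functions. A minimal local sketch of that pattern — the extractor body and its interface are invented here, and threads stand in for the distributed workers:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_features(replay_id):
    """Stand-in for parsing one replay with the gamecore (assumed interface)."""
    # a real extractor would decode frames and emit (state, action) samples
    return {"replay": replay_id, "samples": [replay_id * 10 + i for i in range(3)]}

def run_extraction(replay_ids, workers=8):
    # one task per replay; the platform fans these out as serverless functions
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(extract_features, replay_ids))
    # the shuffle step would reorder samples here before training
    return [s for r in results for s in r["samples"]]

out = run_extraction(range(4))
```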
Learning Platform - Serverless Cloud Function
• Architecture layers
◇ Application layer: SDK
◇ Access layer: API, SDK, COS, CMQ, …
◇ Control layer: function call, function configuration, function coordination
◇ Execution layer: parallel function instances
• Advantages of Cloud Function
◇ Function as a Service
◇ Millions of CPU cores available
◇ Free of charge in idle slots — ~30% of costs on average
Learning Platform - Model Training Platform
1.Requirement ◇ Billions of samples per task ◇ Fast model training
2.Solution ◇ Multi-GPU, multi-machine ◇ Machine learning platform
3.Challenges ◇ IO Efficient data inputs Efficient computation ◇ Communication Efficient parameters exchange
Training Platform
Big Data
Result
Model Training Platform - IO
• Data IO: multiprocessing; "lock-free" queues
• Efficient computation: data pre-caching; op speed-up through multi-threading
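The IO side is a classic producer–consumer pipeline: workers decode and pre-cache samples into a bounded queue while the trainer consumes them. A minimal sketch — threads are used here for brevity (the slide names multiprocessing), and the decode function is a stand-in:

```python
import queue, threading

def decode(sample_id):
    """Stand-in for replay decoding / feature parsing (assumed workload)."""
    return [sample_id * 0.1] * 4          # a fake feature vector

def producer(sample_ids, q):
    for sid in sample_ids:
        q.put(decode(sid))                # pre-cache decoded samples
    q.put(None)                           # sentinel: this producer is done

def consume(q, n_producers):
    batches, done = 0, 0
    while done < n_producers:
        item = q.get()
        if item is None:
            done += 1
        else:
            batches += 1                  # a training step would consume `item` here
    return batches

q = queue.Queue(maxsize=64)               # bounded queue decouples IO from compute
threads = [threading.Thread(target=producer, args=(range(i, 100, 4), q))
           for i in range(4)]
for t in threads:
    t.start()
n = consume(q, n_producers=4)
for t in threads:
    t.join()
```

The bounded queue applies back-pressure: decoding never runs unboundedly ahead of the GPUs, and the GPUs never stall waiting on a single reader.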
Model Training Platform - Communication
• Parameter exchange
◇ NCCL2 [11]: efficient communication between GPUs within a node
◇ RDMA: efficient communication across nodes
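NCCL implements collectives such as all-reduce, typically with a ring algorithm: each GPU reduces and forwards one chunk per step, so bandwidth is balanced across links. A single-threaded simulation of ring all-reduce (illustrative only, not NCCL's API):

```python
def ring_allreduce(values):
    """Simulate ring all-reduce over n workers, each holding n chunks."""
    n = len(values)
    chunks = [list(v) for v in values]            # chunks[worker][chunk]
    # phase 1: scatter-reduce — after n-1 steps, worker i holds the
    # fully reduced chunk (i + 1) % n
    for step in range(n - 1):
        snap = [row[:] for row in chunks]         # all sends happen "simultaneously"
        for i in range(n):
            c = (i - step) % n                    # chunk worker i forwards this step
            chunks[(i + 1) % n][c] = snap[(i + 1) % n][c] + snap[i][c]
    # phase 2: all-gather — circulate the reduced chunks to every worker
    for step in range(n - 1):
        snap = [row[:] for row in chunks]
        for i in range(n):
            c = (i + 1 - step) % n
            chunks[(i + 1) % n][c] = snap[i][c]
    return chunks

reduced = ring_allreduce([[1, 2], [10, 20]])      # every worker now holds [11, 22]
```

After 2(n−1) steps every worker holds the elementwise sum of all inputs, having only ever talked to its ring neighbor — the property that makes the algorithm bandwidth-optimal.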
Model Training Platform - Performance
• Figure: optimization results (acceleration ratios) for IO, computation, and communication
• Figure: multi-GPU, multi-machine speed-up at 1, 8, 16, 32, and 64 GPUs — before vs. after optimization, against the linear upper bound
Learning Platform - Reinforcement Learning Platform
• Demands
◇ Hierarchical RL: various scenarios (jungling, laning, team combat)
◇ Large-scale parallel self-play: millions of games
◇ Automatic task management: a unified framework for model analysis and evaluation
RL Platform - Hierarchical RL
• Problem: hierarchical RL approaches tend to be scenario-specific
• Solution: a general hierarchical RL framework
• Features
◇ Macro task selection
◇ Micro task selection
◇ Effectively handles long-term planning and delayed rewards
◇ Value network for guiding sub-task policy learning
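Schematically, the hierarchy separates "which sub-task" from "which button": a macro policy picks a sub-task from the game state, and a micro policy picks concrete actions conditioned on that sub-task. A toy rule-based sketch of the control flow — in the real system both levels are learned and a value network scores macro choices; all names and rules here are invented:

```python
SUB_TASKS = ("farm", "push", "fight")   # invented macro task set

def macro_policy(state):
    """Pick a sub-task; stands in for the learned macro network + value net."""
    if state["enemy_visible"]:
        return "fight"
    return "push" if state["gold"] > 500 else "farm"

def micro_policy(task, state):
    """Pick a concrete action conditioned on the chosen sub-task."""
    return {"farm": "attack_creep",
            "push": "attack_turret",
            "fight": "cast_skill"}[task]

def step(state):
    task = macro_policy(state)                  # macro task selection
    return task, micro_policy(task, state)      # micro action selection

# one macro choice persists over many frames, bridging the delayed reward
trace = [step({"enemy_visible": False, "gold": g}) for g in (100, 800)]
trace.append(step({"enemy_visible": True, "gold": 800}))
```

Because rewards are credited to the macro choice rather than to each of the 20,000+ frames, the credit-assignment horizon shrinks from frames to sub-tasks.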
RL Platform – Parallel Training
• Large-scale parallel self-play
• Solution
◇ Docker images for gamecore version management
◇ Parallel training framework
RL Platform – Automatic Task Management
• Unified framework for model analysis and evaluation
◇ Task submission, start, and stop
◇ Results visualization: reward curves, radar charts, prediction distributions, self-play results
RL Platform – Performance
• Ten million scenarios per day: ~20 s per scenario with 16 GPUs
• Millions of full games: 10+ min per game with 128 GPUs
Demonstration
Visualization
Demo – Quadra-kill Under Turret
• Micro combat
◇ Fighting against mid-to-high-level testers
◇ Securing kills while avoiding turret damage
Demo – Pentakill
• Micro combat ◇ Fight against mid-high level testers
Demo – Transfer & Strategy
• Opening
Demo – Transfer & Strategy
• First Dragon appears at 2:00
Demo – Transfer & Strategy
• Besiege and Destroy the Base
Demo – RL
• Side-by-side comparison: before vs. after reinforcement learning
Summary
• Pursue general AI via game AI research
• MOBA AI
◇ Algorithm
· Imitation + Reinforcement Learning
◇ Computing platform
· Feature extraction platform
Millions of CPUs
· Model training platform
Thousands of GPUs
· Reinforcement learning platform
Hierarchical RL
Tencent Game AI Research
• Future work ◇ Algorithm
· Tactic-level search and planning
· Multi-agent RL
◇ Computational power
· Search/planning platform
MCTS
· Reinforcement learning platform
Multi-agent RL
About Tencent AI Lab
Our journey
2016.4 Tencent establishes its corporate-level AI Lab
2017.3 Tencent announces leading AI researcher Dr. Tong Zhang as Director of Tencent AI Lab
2017.3 "Jueyi" (Fine Art) wins the UEC Cup world computer Go championship
2017.5 Tencent establishes its Seattle AI Lab and announces leading speech recognition expert Dr. Dong Yu as Deputy Director
2017.11 Tencent is selected by China's Ministry of Science and Technology to build the national open innovation platform for AI medical imaging
Today
Our team consists of 70 world-class AI scientists and 300 research engineers.
• Game AI: an environment for AGI, backed by a diverse game ecosystem
• Content AI: perceiving the world and generating content on China's leading news, video, music, and literature platforms
• Social AI: new ways to communicate for a massive user base (WeChat: ~1 billion MAU; QQ: 850 million MAU)
• Medical AI: impacting and advancing the industry by building a national open innovation platform for AI medical imaging
Thank you
References
• [1] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
• [2] Artificial Intelligence Startup Landscape Trends and Insights - Q4 2016. NOVEMBER 20, 2016 VENTURE SCANNER. https://www.venturescanner.com/blog/2016/artificial-intelligence-startup-landscape-trends-and-insights-q4-2016
• [3] Tian, Yuandong, et al. "ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games." arXiv preprint arXiv:1707.01067 (2017).
• [4] O Vinyals et al. StarCraft II: A New Challenge for Reinforcement Learning. https://deepmind.com/research/publications/starcraft-ii-new-challenge-reinforcement-learning/. Aug. 9, 2017
• [5] “We've created an AI which beats the world's top professionals at 1v1 matches of Dota 2”. https://blog.openai.com/dota-2/
• [6] Ontañón, Santiago, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David Churchill, and Mike Preuss. "RTS AI: Problems and Techniques." (2015): 1-12.
• [7] Miles, Chris, and Sushil J. Louis. "Co-evolving real-time strategy game playing influence map trees with genetic algorithms." Proceedings of the International Congress on Evolutionary Computation, Portland, Oregon. IEEE Press, 2006.
• [8] Jang, Su-Hyung, and Sung-Bae Cho. "Evolving neural NPCs with layered influence map in the real-time simulation game 'Conqueror'." Computational Intelligence and Games, 2008. CIG'08. IEEE Symposium on. IEEE, 2008.
• [9] Weber, Ben George, Michael Mateas, and Arnav Jhala. "Building Human-Level AI for Real-Time Strategy Games." AAAI Fall Symposium: Advances in Cognitive Systems. Vol. 11. 2011.
• [10] Shi, Xingjian, et al. "Convolutional LSTM network: A machine learning approach for precipitation nowcasting." Advances in Neural Information Processing Systems. 2015.
• [11] Nathan Luehr. NCCL: ACCELERATED COLLECTIVE COMMUNICATIONS FOR GPUS. April 5, 2016. GPU Technology Conference 2016.
• [12] CUDA MULTI-PROCESS SERVICE. https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf.