Large-Scale Platform for MOBA Game AI
28th March 2018
Bin Wu & Qiang Fu
Outline
• Introduction
• Learning algorithms
• Computing platform
• Demonstration
Game AI Development
• 1950s–1960s (early exploration): a checkers program beats a state champion
• 1970s–1980s (transition): Chess 4.5 beats human players
• 1990s–2000s (rapid development): Deep Blue (IBM) beats Garry Kasparov
• 2010s (explosive growth): AlphaGo (DeepMind) defeats Lee Sedol and Ke Jie
Applications of Game AI
• Research: an ideal testbed for general AI research
◇ Massive data from human players
◇ Low experimental costs
◇ General abilities for perception and decision-making
◇ From the virtual world to the real world
• Gaming: core applications in the gaming industry
◇ Pre-game procedures, e.g., game design
◇ Player experience, e.g., AI teammates and enemies
◇ Others, e.g., e-sports
Game AI Research Topic
• Game AI has become a hot research topic after the success of AlphaGo
• Many AI giants have joined game AI research, moving from Go to RTS, MOBA, etc.
◇ Released a StarCraft AI platform; preliminary results in simple scenarios [3]
◇ Released the StarCraft II AI platform; not yet able to defeat the built-in AI [4]
◇ DOTA 2 1v1 AI beat top human players; 5v5 planned for 2018 [5]
MOBA Game
• 5 vs. 5 game: obtain gold/exp → gain equipment advantages → win fights → destroy the enemy's base
• Goal: destroy the enemy's base, defended by turrets
• Controls: movement, attack/skills, equipment purchase
• Neutral creeps: a source of money/power/levels/…
MOBA Game
• Micro combat
◇ Movement
◇ Use of skills
MOBA Game
• Macro strategies
◇ Backing up
◇ Laning
◇ Ganking
◇ Stealing the base
MOBA AI - Key Challenge
Computing Platform Learning Algorithm
Learning Algorithms
Learning Algorithms - Challenges
1. Complexity: ~10^20000
2. Multi-agent: 5v5 coordination
3. Imperfect information: partially observable
4. Sparse and delayed rewards: 20,000+ frames per game
Learning Algorithms - Challenges
• Complexity >> Go
◇ End-to-end solutions (SL/RL) do not work well: agents cannot even finish basic movement/attack
◇ Similar observations made by DeepMind

Go: state space 3^360 ≈ 10^170 (361 positions, 3 states each); action space 250^150 ≈ 10^360 (~250 positions available, ~150 decisions per game on average)
MOBA: state space ~10^20000 (10 heroes, 2000+ positions × 10+ states each); action space 20^20000 (20 actions — left, right, …, skill 1/2/3 + position/target, recover, return, etc. — over 20,000 frames per game)
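These magnitudes can be sanity-checked with a few lines of arithmetic. The sketch below (not from the talk) computes the base-10 exponents directly, so the exact values differ slightly from the slide's rounded figures:

```python
import math

def exp10(base, power):
    """Base-10 exponent of base**power, i.e. log10(base**power)."""
    return power * math.log10(base)

# Go: 3 states on each of 361 intersections -> upper bound on the state space
go_states = exp10(3, 361)       # ~172, commonly rounded to 10^170
# Go: ~250 legal moves, ~150 moves per game -> game-tree size
go_tree = exp10(250, 150)       # ~360
# MOBA: ~20 candidate actions over 20,000+ frames per game
moba_tree = exp10(20, 20000)    # tens of thousands, far beyond Go's 10^360

print(round(go_states), round(go_tree), round(moba_tree))
```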
Learning Algorithms - Challenges
• Multi-agent
◇ Macro strategy level: four defending while one steals the base
◇ Micro combat level: tanks protecting assassins
Learning Algorithms - Challenges
• Sparse and delayed rewards
◇ Go: < 360 steps per game
◇ MOBA: > 20,000 steps per game
Learning Algorithms - Challenges
• Imperfect information
◇ Maps are partially observable
◇ Guess the enemy's positions/strategy
◇ Actively explore to gain vision
Model Architecture
• Divide and conquer: split the model into Transfer, Strategy, and Combat modules
• Splitting for simplification reduces the solution space from ~10^20000 to ~10^2000
Model - Transfer
• Where to send heroes?
◇ Compared to Go: treat heroes as stones and the map as the board
◇ Predict good positions: hotspot prediction drives the transfer decision
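As a toy illustration of this board-like encoding (not the talk's actual model, which learns the predictor from replay data, e.g. with convolutional networks such as ConvLSTM [10]): heroes are placed as "stones" on a coarse grid, and a smoothed heatmap stands in for the learned hotspot prediction. The grid size, channels, and weights here are all invented for the sketch:

```python
import numpy as np

GRID = 24  # assumed coarse map resolution; the real resolution is not given in the talk

# Gaussian-like 3x3 kernel used to smooth the stone planes into a heatmap
KERNEL = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float32)
KERNEL /= KERNEL.sum()

def encode_state(ally_xy, enemy_xy, grid=GRID):
    """Board-like encoding: one channel per side, heroes as 'stones'."""
    planes = np.zeros((2, grid, grid), dtype=np.float32)
    for x, y in ally_xy:
        planes[0, y, x] = 1.0
    for x, y in enemy_xy:
        planes[1, y, x] = 1.0
    return planes

def smooth(plane):
    """3x3 weighted blur, a stand-in for the learned convolutional layers."""
    padded = np.pad(plane, 1)
    out = np.zeros_like(plane)
    h, w = plane.shape
    for i in range(3):
        for j in range(3):
            out += KERNEL[i, j] * padded[i:i + h, j:j + w]
    return out

def hotspot(planes, w_ally=0.5, w_enemy=1.0):
    """Combine the planes into a 'where to go' heatmap and pick its peak."""
    heat = w_ally * smooth(planes[0]) + w_enemy * smooth(planes[1])
    return np.unravel_index(np.argmax(heat), heat.shape)  # (y, x) target

target = hotspot(encode_state([(3, 3)], [(10, 10)]))
```

With the enemy channel weighted higher, the heatmap's peak lands on the enemy cluster, i.e. the transfer decision "move toward the fight".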
Model - Strategy
• Key resources in MOBA
◇ Modeling macro objectives: describe the hotspot transition series leading up to the destruction of a key resource
Model - Strategy
• Example of macro session segmentation (figure): a transition graph over key objectives — Start, Dragon, Dark Dragon, the mid-lane 1st/2nd/3rd turrets, the bottom-lane 1st turret, and the Base
• Within a session, micro behaviors such as stealing the blue creep, killing bottom-lane creeps, and attacking the bottom 1st turret
• Describe the hotspot transition series leading up to the destruction of the key resource
Model - Transfer Network with Macro Strategy
• Key resources → hotspots
Model - Combat
• Multi-task on buttons
◇ Action space: movement directions, skill release positions
Learning Framework
• Imitation + Reinforcement Learning
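A minimal tabular sketch of this two-stage recipe — supervised imitation of replays, then policy-gradient fine-tuning. The states, actions, and reward here are invented for illustration; the real system trains deep networks on replay data and self-play:

```python
import math, random

random.seed(0)
ACTIONS = ["move", "attack", "skill"]   # toy action set (assumed)
STATES = ["laning", "teamfight"]        # toy macro states (assumed)
LR = 0.5

# logits[s][a] parameterize one softmax policy per state
logits = {s: {a: 0.0 for a in ACTIONS} for s in STATES}

def policy(s):
    z = {a: math.exp(v) for a, v in logits[s].items()}
    total = sum(z.values())
    return {a: v / total for a, v in z.items()}

# --- Stage 1: imitation learning on (state, action) pairs from replays ---
replays = [("laning", "move")] * 8 + [("laning", "attack")] * 2 \
        + [("teamfight", "skill")] * 9 + [("teamfight", "move")]
for _ in range(300):
    s, demo = random.choice(replays)
    p = policy(s)
    for a in ACTIONS:                   # cross-entropy gradient step
        logits[s][a] += LR * ((a == demo) - p[a])

# --- Stage 2: REINFORCE fine-tuning on a toy reward signal ---
def reward(s, a):                       # invented reward for the sketch
    return 1.0 if (s, a) in {("laning", "move"), ("teamfight", "skill")} else 0.0

for _ in range(300):
    s = random.choice(STATES)
    p = policy(s)
    a = random.choices(ACTIONS, weights=[p[x] for x in ACTIONS])[0]
    for b in ACTIONS:                   # policy-gradient step, scaled by reward
        logits[s][b] += LR * reward(s, a) * ((b == a) - p[b])
```

Imitation gives the policy a sensible starting point cheaply; reinforcement learning then sharpens it on outcomes that replays alone cannot teach.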
Computing Platform
• Computing Platform ◇ Computational power – large-scale CPU/GPU virtualization
◇ Learning platform – Efficient and easy-to-use platform
MOBA Game AI Platform
• Computational units
◇ Millions of CPUs: online services plus an idle resource pool, using Docker with a mixed online/offline deployment technique
◇ Thousands of GPUs: online and offline services, using Docker with GPU virtualization for shared resources
• Resource allocation: elastic computation, Kubernetes resource allocation, Tencent Cloud Function
• Service layer: feature extraction, game environment deployment, model training, reinforcement learning, machine learning, task management
Computational Power
• Computational costs: MOBA AI demands thousands of GPUs and millions of CPU cores — the more, the better
• Challenge: improve resource utilization efficiency without additional costs
• Solution: CPU/GPU virtualization for shared resources
CPU Virtualization
• Elastic and dynamic resource pool: millions of CPU cores, at ~20% average utilization before sharing
◇ 70% from the idle resource pool: new resources not yet delivered, old resources not yet reclaimed, returned resources
◇ 30% from idle slots in online services: online-service resource usage raised from 20% to 65% using Docker isolation
GPU Virtualization [12]
• Goal: improve GPU usage efficiency
• Resource usage: thousands of GPUs; 65% of machines under low load; 28% average GPU usage
• Optimization: share each GPU among jobs, either in parallel (CUDA Multi-Process Service [12]) or by time slicing
Learning Platform
Core technique — version update frequency:
• Feature extraction: hours
• Model training: one day
• RL training: one day
Learning Platform - Feature Extraction Platform
• Demand 1: feature extraction from up to hundreds of thousands of replays
◇ Challenge: demands up to 210,000 CPU cores per day
◇ Solution: CPU virtualization; Docker-based elastic and dynamic resource pool
• Demand 2: multiple tasks, each with millions of entries
◇ Challenge: parallel task scheduling
◇ Solution: Tencent Serverless Cloud Function
• Pipeline: game replays → gamecore → raw data → feature extraction → shuffle → training samples → training → models → evaluation
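The per-replay work is embarrassingly parallel: each replay becomes one independent task that the platform can fan out to virtualized CPU cores or serverless functions. A minimal local sketch of that pattern — the extractor body and its interface are invented here, and threads stand in for the distributed workers:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_features(replay_id):
    """Stand-in for parsing one replay with the gamecore (assumed interface)."""
    # a real extractor would decode frames and emit (state, action) samples
    return {"replay": replay_id, "samples": [replay_id * 10 + i for i in range(3)]}

def run_extraction(replay_ids, workers=8):
    # one task per replay; the platform fans these out as serverless functions
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(extract_features, replay_ids))
    # the shuffle step would reorder samples here before training
    return [s for r in results for s in r["samples"]]

out = run_extraction(range(4))
```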
Learning Platform - Serverless Cloud Function
• Architecture layers
◇ Application layer: SDK
◇ Access layer: API, SDK, COS, CMQ, …
◇ Control layer: function call, function configuration, function coordination
◇ Execution layer: parallel function instances
• Advantages of Cloud Function
◇ Function as a Service
◇ Millions of CPU cores available
◇ Free of charge in idle slots — ~30% of costs on average
Learning Platform - Model Training Platform
1.Requirement ◇ Billions of samples per task ◇ Fast model training
2.Solution ◇ Multi-GPU, multi-machine ◇ Machine learning platform
3.Challenges ◇ IO Efficient data inputs Efficient computation ◇ Communication Efficient parameters exchange
Training Platform
Big Data
Result
Model Training Platform - IO
• Data IO: multiprocessing; "lock-free" queues
• Efficient computation: data pre-caching; op speed-up through multi-threading
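The IO side is a classic producer–consumer pipeline: workers decode and pre-cache samples into a bounded queue while the trainer consumes them. A minimal sketch — threads are used here for brevity (the slide names multiprocessing), and the decode function is a stand-in:

```python
import queue, threading

def decode(sample_id):
    """Stand-in for replay decoding / feature parsing (assumed workload)."""
    return [sample_id * 0.1] * 4          # a fake feature vector

def producer(sample_ids, q):
    for sid in sample_ids:
        q.put(decode(sid))                # pre-cache decoded samples
    q.put(None)                           # sentinel: this producer is done

def consume(q, n_producers):
    batches, done = 0, 0
    while done < n_producers:
        item = q.get()
        if item is None:
            done += 1
        else:
            batches += 1                  # a training step would consume `item` here
    return batches

q = queue.Queue(maxsize=64)               # bounded queue decouples IO from compute
threads = [threading.Thread(target=producer, args=(range(i, 100, 4), q))
           for i in range(4)]
for t in threads:
    t.start()
n = consume(q, n_producers=4)
for t in threads:
    t.join()
```

The bounded queue applies back-pressure: decoding never runs unboundedly ahead of the GPUs, and the GPUs never stall waiting on a single reader.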
Model Training Platform - Communication
• Parameter exchange
◇ NCCL2 [11]: efficient communication between GPUs within a node
◇ RDMA: efficient communication across nodes
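NCCL implements collectives such as all-reduce, typically with a ring algorithm: each GPU reduces and forwards one chunk per step, so bandwidth is balanced across links. A single-threaded simulation of ring all-reduce (illustrative only, not NCCL's API):

```python
def ring_allreduce(values):
    """Simulate ring all-reduce over n workers, each holding n chunks."""
    n = len(values)
    chunks = [list(v) for v in values]            # chunks[worker][chunk]
    # phase 1: scatter-reduce — after n-1 steps, worker i holds the
    # fully reduced chunk (i + 1) % n
    for step in range(n - 1):
        snap = [row[:] for row in chunks]         # all sends happen "simultaneously"
        for i in range(n):
            c = (i - step) % n                    # chunk worker i forwards this step
            chunks[(i + 1) % n][c] = snap[(i + 1) % n][c] + snap[i][c]
    # phase 2: all-gather — circulate the reduced chunks to every worker
    for step in range(n - 1):
        snap = [row[:] for row in chunks]
        for i in range(n):
            c = (i + 1 - step) % n
            chunks[(i + 1) % n][c] = snap[i][c]
    return chunks

reduced = ring_allreduce([[1, 2], [10, 20]])      # every worker now holds [11, 22]
```

After 2(n−1) steps every worker holds the elementwise sum of all inputs, having only ever talked to its ring neighbor — the property that makes the algorithm bandwidth-optimal.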
Model Training Platform - Performance
• Figure: optimization results (acceleration ratios) for IO, computation, and communication
• Figure: multi-GPU, multi-machine speed-up at 1, 8, 16, 32, and 64 GPUs — before vs. after optimization, against the linear upper bound
Learning Platform - Reinforcement Learning Platform
• Demands
◇ Hierarchical RL: various scenarios (jungling, laning, team combat)
◇ Large-scale parallel self-play: millions of games
◇ Automatic task management: a unified framework for model analysis and evaluation
RL Platform - Hierarchical RL
• Problem: hierarchical RL approaches tend to be scenario-specific
• Solution: a general hierarchical RL framework
• Features
◇ Macro task selection
◇ Micro task selection
◇ Effectively handles long-term planning and delayed rewards
◇ Value network for guiding sub-task policy learning
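Schematically, the hierarchy separates "which sub-task" from "which button": a macro policy picks a sub-task from the game state, and a micro policy picks concrete actions conditioned on that sub-task. A toy rule-based sketch of the control flow — in the real system both levels are learned and a value network scores macro choices; all names and rules here are invented:

```python
SUB_TASKS = ("farm", "push", "fight")   # invented macro task set

def macro_policy(state):
    """Pick a sub-task; stands in for the learned macro network + value net."""
    if state["enemy_visible"]:
        return "fight"
    return "push" if state["gold"] > 500 else "farm"

def micro_policy(task, state):
    """Pick a concrete action conditioned on the chosen sub-task."""
    return {"farm": "attack_creep",
            "push": "attack_turret",
            "fight": "cast_skill"}[task]

def step(state):
    task = macro_policy(state)                  # macro task selection
    return task, micro_policy(task, state)      # micro action selection

# one macro choice persists over many frames, bridging the delayed reward
trace = [step({"enemy_visible": False, "gold": g}) for g in (100, 800)]
trace.append(step({"enemy_visible": True, "gold": 800}))
```

Because rewards are credited to the macro choice rather than to each of the 20,000+ frames, the credit-assignment horizon shrinks from frames to sub-tasks.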
RL Platform – Parallel Training
• Large-scale parallel self-play
• Solution
◇ Docker images for gamecore version management
◇ Parallel training framework
RL Platform – Automatic Task Management
• Unified framework for model analysis and evaluation
◇ Task submission, start, and stop
◇ Results visualization: reward curves, radar charts, prediction distributions, self-play results
RL Platform – Performance
• Ten million scenarios per day: ~20 s per scenario with 16 GPUs
• Millions of full games: 10+ min per game with 128 GPUs
Demonstration
Visualization
Demo – Quadra-kill Under Turret
• Micro combat
◇ Fighting against mid-to-high-level testers
◇ Securing kills while avoiding turret damage
Demo – Pentakill
• Micro combat ◇ Fight against mid-high level testers
Demo – Transfer & Strategy
• Opening
Demo – Transfer & Strategy
• First Dragon appears at 2:00
Demo – Transfer & Strategy
• Besiege and Destroy the Base
Demo – RL
• Side-by-side comparison: before vs. after reinforcement learning
Summary
• Pursue general AI via game AI research
• MOBA AI
◇ Algorithm
· Imitation + Reinforcement Learning
◇ Computing platform
· Feature extraction platform
Millions of CPUs
· Model training platform
Thousands of GPUs
· Reinforcement learning platform
Hierarchical RL
Tencent Game AI Research
• Future work ◇ Algorithm
· Tactic-level search and planning
· Multi-agent RL
◇ Computational power
· Search/planning platform
MCTS
· Reinforcement learning platform
Multi-agent RL
About Tencent AI Lab
Our journey
2016.4 Tencent establishes its corporate-level AI Lab
2017.3 Tencent announces leading AI researcher Dr. Tong Zhang as Director of Tencent AI Lab
2017.3 "Jueyi" (Fine Art) wins the UEC Cup world computer Go championship
2017.5 Tencent establishes its Seattle AI Lab and announces leading speech recognition expert Dr. Dong Yu as Deputy Director
2017.11 Tencent is selected by China's Ministry of Science and Technology to build the national open innovation platform for AI medical imaging
Today
Our team consists of 70 world-class AI scientists and 300 research engineers.
• Game AI: an environment for AGI, backed by a diverse game ecosystem
• Content AI: perceiving the world and generating content on China's leading news, video, music, and literature platforms
• Social AI: new ways to communicate for a massive user base (WeChat: ~1 billion MAU; QQ: 850 million MAU)
• Medical AI: impacting and advancing the industry by building a national open innovation platform for AI medical imaging
Thank you
References
• [1] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
• [2] Artificial Intelligence Startup Landscape Trends and Insights - Q4 2016. NOVEMBER 20, 2016 VENTURE SCANNER. https://www.venturescanner.com/blog/2016/artificial-intelligence-startup-landscape-trends-and-insights-q4-2016
• [3] Tian, Yuandong, et al. "ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games." arXiv preprint arXiv:1707.01067 (2017).
• [4] O Vinyals et al. StarCraft II: A New Challenge for Reinforcement Learning. https://deepmind.com/research/publications/starcraft-ii-new-challenge-reinforcement-learning/. Aug. 9, 2017
• [5] “We've created an AI which beats the world's top professionals at 1v1 matches of Dota 2”. https://blog.openai.com/dota-2/
• [6] Ontañón, Santiago, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David Churchill, and Mike Preuss. "RTS AI: Problems and Techniques." (2015): 1-12.
• [7] Miles, Chris, and Sushil J. Louis. "Co-evolving real-time strategy game playing influence map trees with genetic algorithms." Proceedings of the International Congress on Evolutionary Computation, Portland, Oregon. IEEE Press, 2006.
• [8] Jang, Su-Hyung, and Sung-Bae Cho. "Evolving neural NPCs with layered influence map in the real-time simulation game 'Conqueror'." Computational Intelligence and Games, 2008. CIG'08. IEEE Symposium on. IEEE, 2008.
• [9] Weber, Ben George, Michael Mateas, and Arnav Jhala. "Building Human-Level AI for Real-Time Strategy Games." AAAI Fall Symposium: Advances in Cognitive Systems. Vol. 11. 2011.
• [10] Shi, Xingjian, et al. "Convolutional LSTM network: A machine learning approach for precipitation nowcasting." Advances in Neural Information Processing Systems. 2015.
• [11] Nathan Luehr. NCCL: ACCELERATED COLLECTIVE COMMUNICATIONS FOR GPUS. April 5, 2016. GPU Technology Conference 2016.
• [12] CUDA MULTI-PROCESS SERVICE. https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf.