COEVOLVING ROBUST STRATEGIES FOR REAL-TIME STRATEGY GAMES


Evolutionary Computing Systems Lab (ECSL), University of Nevada, Reno


Christopher Ballinger
http://www.cse.unr.edu/~caballinger

Outline

Artificial Intelligence
  Game AI
    Board Games
    RTS Games
      StarCraft
      WaterCraft
Motivation
Prior Work
Methodology
  Evolutionary Methods
  Representation
    Encoding
    AI Behavior
Current Progress
Conclusions
Future Work



Artificial Intelligence

(Broadly) understanding and building intelligent agents. (Russell & Norvig)

Intelligent Agent
  An autonomous entity which observes through sensors, acts upon an environment using actuators, and directs its activity towards achieving goals. (Russell & Norvig)
Computational Intelligence
  A set of nature-inspired computational methodologies and approaches to address complex problems. (Kahraman)
Game AI
  Decision-making process of computer-controlled opponents/NPCs. (Ponsen et al.)


Board Games

Present challenging problems
  Complex state space
  Adversarial planning

State space
  Checkers: 10^20
  Chess: 10^50
  Go: 10^170

A lot of AI research in the past used board games
Board game AIs play competitively against humans
RTS games present even more difficult challenges


Real-Time Strategy

Much more complex than board games
State space is orders of magnitude larger
  (10^50)^36,000 to (10^200)^36,000 for an entire game match
  MUCH more than the number of protons in the observable universe

RTS Games             Board Games
Simultaneous moves    Turn-based moves
Durative actions      Instant actions
Partially observable  Fully observable
Non-deterministic     Deterministic
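
The size of these numbers is easier to see by working with exponents only. A minimal sketch, assuming the slide's figures are per-state estimates of 10^50 to 10^200 raised to the power 36,000; the proton count of about 10^80 is a commonly quoted estimate and not a figure from the slides.

```python
# Exponents quoted on the slide: estimates of 10^50 to 10^200,
# each raised to the power 36,000.
LOW_EXP = 50
HIGH_EXP = 200
POWER = 36_000

# Assumption (not from the slides): the commonly quoted estimate of about
# 10^80 protons in the observable universe.
PROTONS_EXP = 80


def combined_exponent(base_exp: int, power: int = POWER) -> int:
    """Return e such that (10**base_exp)**power == 10**e."""
    return base_exp * power


low = combined_exponent(LOW_EXP)
high = combined_exponent(HIGH_EXP)
print(f"State space: 10^{low:,} to 10^{high:,}")
print(f"Larger than 10^{PROTONS_EXP} by a factor of at least 10^{low - PROTONS_EXP:,}")
```

Only the exponents are multiplied, so nothing here needs arbitrary-precision arithmetic.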


Real-Time Strategy

Several categories of challenges for Game AI
  Resource management
  Decisions under uncertainty
  Spatial/temporal reasoning
  Collaboration
  Opponent modeling/learning
  Adversarial real-time planning

Remains a challenge for AI, but not an impossible problem
  Human players are capable of overcoming these challenges
  Humans adapt to these challenges so well that professional RTS players can make a living playing “e-sports”
  The best-known pro league is for StarCraft


StarCraft

Objectives
  Manage economy
  To build army
    Many types of units
    Each type has strengths and weaknesses
    Getting the right mix is key
    Research upgrades/abilities
  To destroy enemy


StarCraft

Development Problems
  StarCraft
    3rd-party API can be used for AI development
    Runs (relatively) slow
    Hard to run multiple instances in parallel
  StarCraft II
    No API


WaterCraft

WaterCraft†
  Modeled after StarCraft II
  Easy to run in parallel
  Runs quicker by disabling graphics

† Source code can be found on Christopher Ballinger’s website

Motivation

RTS games are good testbeds for AI research
  Present many challenging aspects
  Intransitive relationships between strategies, similar to rock-paper-scissors
    No one optimal strategy
    Robustness of strategies
We believe designing a good RTS game player will advance AI research significantly (like chess and checkers did)

[Figure: strategies S1, S2, and S3]

Previous Work

Case-based reasoning (Ontañón, 2006)
Genetic Algorithms + Case-based Reasoning (Miles, 2005)
Reinforcement Learning (Spronck, 2007)
Studies on specific aspects
  Combat (Churchill, 2012)
  Economy (Chan, 2007)
  Coordination (Keaveney, 2011)
Case-Injection into population (Miles, Sushil 2005)


What we’ve done

Focus on build-orders (our ‘strategy’)
  Robustness against multiple opponents
  Defeat known/common strategies
    Case-injection
Compare two evolutionary methods
  Genetic Algorithm (GA)
  Coevolutionary Algorithm (CA)


Evolutionary Methods

Terminology
  Chromosome
    A possible solution
  Population
    A set of chromosomes
    Typically, initial population contains completely random chromosomes
  Fitness
    A measure of how well a solution/chromosome solves a problem
  Generation
    The number of iterations we repeat the process

[Figure: cycle of Evaluation, Selection, Crossover, Mutation, and New Population, applied to a population of chromosomes C0, C1, C2, ..., Cn]
[Figure: plot of Fitness against Time]
[Figure: chromosome C0 = 1 0 0 1 0; each position is a gene with alleles 0 and 1]


Evaluation

Assign a fitness to all chromosomes in the population

[Figure: the Evaluator evaluates each chromosome C0, C1, C2, ..., Cn in the population and assigns it a fitness F0, F1, F2, ..., Fn]


Selection

A method for choosing which chromosomes are used for crossover, and how often.
  Roulette Wheel Selection

[Figure: roulette wheel over the fitnesses F0, F1, F2, ..., Fn]
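
Roulette wheel selection can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `roulette_select`, the non-negative fitness assumption, and the example fitness values are all hypothetical.

```python
import random
from typing import Sequence


def roulette_select(fitnesses: Sequence[float], rng: random.Random) -> int:
    """Return an index with probability proportional to its fitness.

    Assumes every fitness is non-negative.
    """
    total = sum(fitnesses)
    if total <= 0:
        # Degenerate wheel (all zeros): fall back to a uniform pick.
        return rng.randrange(len(fitnesses))
    spin = rng.random() * total  # a point in [0, total)
    running = 0.0
    for index, fitness in enumerate(fitnesses):
        running += fitness
        if spin < running:
            return index
    return len(fitnesses) - 1  # guard against floating-point round-off


rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [roulette_select([1.0, 3.0, 6.0], rng) for _ in range(10_000)]
counts = [picks.count(i) for i in range(3)]
print(counts)  # roughly in the ratio 1 : 3 : 6
```

A chromosome with a larger slice of the wheel is picked more often, which is the "how often" part of the definition above.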


Crossover

A method to exchange information between two chromosomes (parents), attempting to produce more effective chromosomes (children)

[Figure: the bit strings of Parent 1 and Parent 2 are split at a randomly selected index and recombined into Child 1 and Child 2]
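
The figure on this slide shows a cut at a randomly selected index, which corresponds to one-point crossover. A minimal sketch of that variant, with hypothetical names and all-ones/all-zeros parents chosen so the exchange is easy to see; the parameter slide lists Uniform Crossover for the experiments themselves, which this sketch does not implement.

```python
import random
from typing import List, Tuple

Bits = List[int]


def one_point_crossover(
    parent1: Bits, parent2: Bits, rng: random.Random
) -> Tuple[Bits, Bits]:
    """Swap the tails of two equal-length parents at a random cut point."""
    if len(parent1) != len(parent2) or len(parent1) < 2:
        raise ValueError("parents must have the same length, at least 2")
    cut = rng.randrange(1, len(parent1))  # 1 <= cut <= len - 1
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


rng = random.Random(1)  # fixed seed so the run is repeatable
p1 = [1] * 10
p2 = [0] * 10
c1, c2 = one_point_crossover(p1, p2, rng)
print(c1)
print(c2)
```

Because the cut always falls strictly inside the string, each child carries material from both parents.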


Mutation

A method to make sure certain patterns/capabilities do not permanently go extinct in the entire population

[Figure: example population of bit strings 1 0 0 1 0, 1 0 1 0 0, 1 0 0 1 1, ..., 1 0 1 1 1, with one bit changing between 0 and 1]
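
Bit-flip mutation on a bit string can be sketched as below. The function name `mutate` and the mutation rate are hypothetical; the slides do not state the rate used.

```python
import random
from typing import List


def mutate(chromosome: List[int], rate: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(2)  # fixed seed so the run is repeatable
original = [1, 0, 0, 1, 0]
mutated = mutate(original, rate=0.1, rng=rng)  # 0.1 is an illustrative rate
print(original, mutated)
```

Even if every chromosome in the population has lost a particular bit value at some position, a flip can reintroduce it.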


New Population

All Children become the Parents for the start of the next generation

[Figure: the Old Population (Parents) with fitnesses F0, F1, F2, ..., Fn is replaced by the New Population (Children) C0, C1, C2, ..., Cn, and the cycle of Evaluation, Selection, Crossover, Mutation, and New Population repeats]


Evolutionary Methods

Differences between our GA and CA?

GA: Population plays against the same hand-tuned baselines every generation

[Figure: in Generations 1, 2, and 3, the population C0, C1, ..., Cn is evaluated against the same teachset (evaluators): Baseline 1, Baseline 2, Baseline 3]

CA: Population plays against chromosomes from previous generations

[Figure: in Generations 1, 2, and 3, the population C0, C1, ..., Cn is evaluated against Parent 1, Parent 2, Parent 3]
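
The difference comes down to where each generation's opponents come from. A rough sketch under stated assumptions: chromosomes are represented by placeholder strings, and the CA draws opponents uniformly at random from earlier generations, which is a simplification and not the Hall of Fame and Shared Selection scheme the deck lists later.

```python
import random
from typing import List, Sequence


def ga_teachset(baselines: Sequence[str]) -> List[str]:
    """GA: the evaluators are the same hand-tuned baselines every generation."""
    return list(baselines)


def ca_teachset(
    previous_generations: Sequence[Sequence[str]], size: int, rng: random.Random
) -> List[str]:
    """CA: the evaluators are chromosomes taken from earlier generations.

    Uniform sampling is an assumption made for this sketch.
    """
    pool = [c for generation in previous_generations for c in generation]
    return rng.sample(pool, min(size, len(pool)))


baselines = ["baseline1", "baseline2", "baseline3"]
history = [["g1c0", "g1c1", "g1c2"], ["g2c0", "g2c1", "g2c2"]]
rng = random.Random(3)  # fixed seed so the run is repeatable

ga_opponents = ga_teachset(baselines)          # identical in every generation
ca_opponents = ca_teachset(history, 3, rng)    # changes as history grows
print(ga_opponents, ca_opponents)
```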


Evolutionary Methods

Identical parameters
  Pop. size 50, 50 generations
  Scaled Fitness, CHC selection, Uniform Crossover, Mutation
  Teachset
  Shared Fitness

    f_i^{shared} = \sum_{j \in D_i} \left[ \left( \frac{1}{l_j} \right) F_{ij} \right]

CA
  Teachset (8 Opponents)
  Hall of Fame (HOF)
  Shared Selection
GA
  Teachset (3 Opponents)
  Opponents never change
  Baseline strategies

    F_{ij} = SR_i + 2 \sum_{k \in UD_j} UC_k + 3 \sum_{k \in BD_j} BC_k
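
The two fitness formulas on this slide can be sketched in code. The reading of the symbols is an assumption, since the slide does not define them: SR_i is taken as the resources spent by chromosome i, UC_k and BC_k as the costs of the opponent units and buildings it destroyed, D_i as the set of opponents i defeated, and l_j as the number of chromosomes that defeated opponent j. All example numbers are made up.

```python
from typing import List, Sequence


def score(
    spent: float, unit_costs: Sequence[float], building_costs: Sequence[float]
) -> float:
    """F_ij = SR_i + 2 * sum(UC_k) + 3 * sum(BC_k), under the assumed reading."""
    return spent + 2 * sum(unit_costs) + 3 * sum(building_costs)


def shared_fitness(scores: List[List[float]], wins: List[List[bool]]) -> List[float]:
    """f_i = sum over opponents j defeated by i of F_ij / l_j.

    scores[i][j] is F_ij; wins[i][j] is True when chromosome i beat opponent j.
    """
    opponents = range(len(scores[0]))
    # l_j: how many chromosomes defeated opponent j.
    losses = [sum(1 for row in wins if row[j]) for j in opponents]
    return [
        sum(scores[i][j] / losses[j] for j in opponents if wins[i][j])
        for i in range(len(scores))
    ]


example_score = score(100, [50, 50], [150])  # 100 + 2*100 + 3*150
scores = [[100, 200], [100, 50], [10, 20]]   # made-up F_ij values
wins = [[True, True], [True, False], [False, False]]
fitnesses = shared_fitness(scores, wins)
print(example_score, fitnesses)
```

Under this reading, a win over an opponent that few others can beat is worth more than a win over an opponent that everyone beats.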

Metric - Baseline Build-Orders

Provide a diverse set of challenges
  Fast Build
    Quickly builds 5 Marines and attacks
    Doesn’t need much infrastructure
  Medium Build
    Builds 10 Marines and attacks
  Slow Build
    Builds 5 Vultures and attacks
    Slow, requires a lot of infrastructure
Encoded as a chromosome


Representation - Encoding

Bitstring
  3 bits per action
  Decoded sequentially
  Inserts required prerequisites

[Figure: example bitstring 0 1 0 1 0 0]

Bit Sequence  Action                Prereq.
000-001       Build SCV (Minerals)  None
010           Build Marine          Barracks
011-100       Build Firebat         Barracks, Refinery, Academy
101           Build Vulture         Barracks, Refinery, Factory
110           Build SCV (Gas)       Refinery
111           Attack                None
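
The decoding table can be turned into a short decoder. This is a sketch, not the WaterCraft implementation: it assumes each prerequisite building is built once, in the order listed, immediately before the first action that needs it, and the names `ACTIONS` and `decode` are hypothetical.

```python
from typing import Dict, List, Tuple

# 3-bit code -> (action, prerequisite buildings), copied from the table.
ACTIONS: Dict[str, Tuple[str, List[str]]] = {
    "000": ("Build SCV (Minerals)", []),
    "001": ("Build SCV (Minerals)", []),
    "010": ("Build Marine", ["Barracks"]),
    "011": ("Build Firebat", ["Barracks", "Refinery", "Academy"]),
    "100": ("Build Firebat", ["Barracks", "Refinery", "Academy"]),
    "101": ("Build Vulture", ["Barracks", "Refinery", "Factory"]),
    "110": ("Build SCV (Gas)", ["Refinery"]),
    "111": ("Attack", []),
}


def decode(bits: str) -> List[str]:
    """Decode a bitstring into a build queue, inserting missing prerequisites."""
    if len(bits) % 3 != 0:
        raise ValueError("bitstring length must be a multiple of 3")
    queue: List[str] = []
    built = set()
    for start in range(0, len(bits), 3):
        action, prerequisites = ACTIONS[bits[start:start + 3]]
        for building in prerequisites:
            if building not in built:  # assumption: each building built once
                built.add(building)
                queue.append(f"Build {building}")
        queue.append(action)
    return queue


build_order = decode("010100")  # the six-bit example from the slide
print(build_order)
```

For the example string, the Marine pulls in a Barracks, and the Firebat then only needs the Refinery and Academy added.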



Representation - AI Behavior

Execute actions in the queue as quickly as possible
  Do not skip ahead in the queue
“Attack” action
  All Marines, Firebats, and Vultures move to attack the opponent’s Command Center
  Attack any other opponent units/buildings along the way
  If a nearby allied unit is attacked, assist it by attacking the opponent’s unit
If the Command Center is attacked, send SCVs to defend
  Once all threats have been eliminated, SCVs return to their tasks


Experiment #1

Want to show that GAs and CAs find good build-orders
Ran GA and CA 10 times
  Evolved 15-bit (5-action) build-orders
  CA never trained against the baselines
  Ran multiple times to see if results could be repeated reliably
    GA always found the same two build-orders
    CA always found the same single build-order
Exhaustive Search

Ballinger, C.; Louis, S., "Comparing Heuristic Search Methods for Finding Effective Real-Time Strategy Game Plans"
Ballinger, C.; Louis, S., "Comparing Coevolution, Genetic Algorithms, and Hill-Climbers for Finding Real-Time Strategy Game Plans"

Exhaustive Search

Exhaustive search against all three baselines
  15 bits was the maximum solution length we could exhaustively search
    Takes 20 hours to do all evaluations
    Baselines are encoded in 24-39 bits, providing them with a large advantage
  Shows how frequently the best solutions occur
  Ranks all solutions by how many baselines they defeat

[Figure: Solution 1, Solution 2, ..., Solution N are each evaluated against Baseline 1, Baseline 2, and Baseline 3]

Ballinger, C.; Louis, S., "Comparing Heuristic Search Methods for Finding Effective Real-Time Strategy Game Plans"
Ballinger, C.; Louis, S., "Comparing Coevolution, Genetic Algorithms, and Hill-Climbers for Finding Real-Time Strategy Game Plans"

Results

Exhaustive Search
  32,768 (2^15) possible solutions
  80% of possible solutions lose to all three baselines
  19.9% of possible solutions beat only one of the three baselines
  Only 30 solutions (0.1% of possible solutions) can defeat two baselines
  Zero solutions could beat all three

[Figure: Number of Chromosomes (log scale, 1 to 100,000) by Number of Wins (0 to 3)]
[Figure: Avg. Score Difference (axis from -1800 to 0) for Exhaustive]
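
The size of the search space and the share of two-win solutions can be checked by enumeration. A small sketch; the assumption that each solution plays one game per baseline is ours and is not stated on the slide.

```python
from itertools import product

BITS = 15       # solution length from the slide
BASELINES = 3   # number of baseline opponents

# Enumerate every 15-bit solution as a string of "0" and "1".
solutions = ["".join(bits) for bits in product("01", repeat=BITS)]
total = len(solutions)

# Assumption: one game per (solution, baseline) pairing.
games = total * BASELINES

# 30 solutions defeat two baselines, per the slide.
two_win_share = 100 * 30 / total

print(total, games, f"{two_win_share:.2f}%")  # 32768 98304 0.09%
```

The 0.09% figure rounds to the 0.1% quoted above.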


Results

CA
  Always found the same solution
    Four Vultures and a Firebat
  Never defeats any baselines
    Doesn’t plan for opponents that take more than 5 actions
  Still improves score
    Beats many other 15-bit strategies

[Figure: bar chart (axis from -1800 to 0) comparing Exhaustive and CA]

Results

GA
  Found solutions that could beat two baselines 100% of the time
  Strategy 1
    Two SCVs, Two Firebats, One Vulture
    Quick but weak defense
  Strategy 2
    Four Firebats, One Vulture
    Strong but slow defense

[Figure: bar chart (axis from -1800 to 0) comparing Exhaustive, CA, and GA]




Discussion #1

GA reliably produces high-quality solutions
CA improves against baselines not seen during training
15 bits is very limited
  Huge disadvantage against the baselines



Experiment #2

Increased bit-string length to 39
  Same length as our longest baseline
  Will CA perform better on a level playing field?
Ran GA and CA 10 times
  GA found one build-order
  CA found 3 build-orders
Selected 3 random Hall of Fame (HOF) build-orders
Generated 10 random build-orders
All GA, CA, HOF, Random, and Baseline build-orders competed against each other

Ballinger, C.; Louis, S., "Robustness of Coevolved Strategies in a Real-Time Strategy Game"

Results - Score

Set          Baseline(3)  GA(1)  CA(3)  HOF(3)  Rand(10)
Baseline(3)  1733         2341   1975   1775    2935
GA(1)        4591         2875   2175   2833    3573
CA(3)        2600         3925   2830   3322    3775
HOF(3)       2611         3533   2355   2877    3379
Rand(10)     1124         2017   1456   1498    1851

GA fitness highest against Baselines
CA fitness highest against all other build-orders

Ballinger, C.; Louis, S., "Robustness of Coevolved Strategies in a Real-Time Strategy Game"
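
The two summary statements can be checked against the score table directly. A small sketch, assuming each row is the set being scored and each column is the opponent set it played against; that orientation is our reading of the table.

```python
SETS = ["Baseline", "GA", "CA", "HOF", "Rand"]

# Rows: the set being scored. Columns: the opponent set, in SETS order.
SCORES = {
    "Baseline": [1733, 2341, 1975, 1775, 2935],
    "GA": [4591, 2875, 2175, 2833, 3573],
    "CA": [2600, 3925, 2830, 3322, 3775],
    "HOF": [2611, 3533, 2355, 2877, 3379],
    "Rand": [1124, 2017, 1456, 1498, 1851],
}


def best_against(opponent: str) -> str:
    """Return the set with the highest score in the given opponent's column."""
    column = SETS.index(opponent)
    return max(SETS, key=lambda name: SCORES[name][column])


winners = {opponent: best_against(opponent) for opponent in SETS}
print(winners)
```

Under this reading, GA tops only the Baseline column and CA tops every other column.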


Results - Wins

Set          Baseline(3)  GA(1)  CA(3)  HOF(3)  Rand(10)
Baseline(3)  33%          33%    44%    33%     90%
GA(1)        100%         100%   0%     33%     80%
CA(3)        66%          100%   44%    66%     100%
HOF(3)       66%          66%    33%    44%     100%
Rand(10)     10%          40%    0%     0%      45%

GA always wins against the baselines
CA beats two of the three baselines
  Never appeared during training

Results – Command Centers

Set          Baseline(3)  GA(1)  CA(3)  HOF(3)  Rand(10)
Baseline(3)  22%          33%    44%    33%     86%
GA(1)        100%         0%     0%     33%     60%
CA(3)        44%          66%    11%    44%     66%
HOF(3)       44%          33%    0%     11%     60%
Rand(10)     0%           0%     0%     0%      0%

Percentages of Command Centers destroyed were very similar
Only two of the three CA build-orders attack



Discussion #2

GA produces high-quality solutions for known opponents
  Highest score against the opponents used for training
CA produces more robust solutions
  Defeats opponents not seen during training
How difficult are these strategies for a human player?
Can we bias a CA to defeat a human?

Ballinger, C.; Louis, S., "Robustness of Coevolved Strategies in a Real-Time Strategy Game"

Experiment #3

Recorded actions of a human player against a previously coevolved strategy.
  Coevolved strategy was 39 bits (13 actions)
  Human (me) selected which units to build in real-time
  Unit actions were determined by the same rules used by the GA and CA
  Human strategies took 75 bits (25 actions) to encode
Very hard to find winning 39-bit strategies without “peeking”
  39-bit strategies can still defeat 75-bit strategies

Ballinger, C.; Louis, S., "Finding Robust Strategies to Defeat Specific Opponents Using Case-Injected Coevolution"

Metric – Human Cases

We used two strategies for picking actions
  Easy Human (EH) Strategy (75 bits, 25 actions)
    Quickly build 2 Marines, attack, repeat
    Slows down opponent and chips away at the base
  Hard Human (HH) Strategy (75 bits, 25 actions)
    Build 9 SCVs, then build Firebats and Vultures in parallel until opponent sends attack force
    Defend Command Center and send remaining units to destroy opponent’s defenseless base
    Slow, requires a lot of infrastructure




Case-Injection

Injected human replays into CA teachset
  2 of the 8 teachset spaces are permanently replaced with the human cases
  Not injecting human cases into the population (yet)
GA only trains against the human cases
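
The teachset arrangement can be sketched as follows. The names are hypothetical, build-orders are represented by placeholder strings, and filling the six remaining spaces with the first six coevolved opponents of a list is an assumption made for illustration.

```python
from typing import List, Sequence

TEACHSET_SIZE = 8  # CA teachset size from the parameter slide


def build_ca_teachset(
    coevolved: Sequence[str], human_cases: Sequence[str], size: int = TEACHSET_SIZE
) -> List[str]:
    """Reserve teachset spaces for the human cases, fill the rest with coevolved opponents."""
    free_spaces = size - len(human_cases)
    if free_spaces < 0:
        raise ValueError("more human cases than teachset spaces")
    if len(coevolved) < free_spaces:
        raise ValueError("not enough coevolved opponents to fill the teachset")
    return list(human_cases) + list(coevolved[:free_spaces])


human_cases = ["EH", "HH"]                       # Easy Human and Hard Human cases
coevolved = [f"opponent{i}" for i in range(8)]   # placeholder coevolved opponents

ca_teachset = build_ca_teachset(coevolved, human_cases)
ga_teachset = list(human_cases)                  # GA trains against the human cases only
print(ca_teachset)
print(ga_teachset)
```

The human cases stay in the CA teachset every generation, while the other six spaces are free to change.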



Results

Ran GA and CA 10 times
  GA always found one build-order
  CA always found one build-order
Averaged the GA’s and CA’s population performance against each human strategy for each generation



Results – Score

[Figure: Avg. Score (1000 to 5500) by Generation (0 to 45) for CA vs EH, CA vs HH, GA vs EH, and GA vs HH]

GA got the highest scores against the EH strategy
CA got the highest scores against the HH strategy



Results - Wins

[Figure: Avg. Wins (0 to 50) by Generation (0 to 45) for CA vs EH, CA vs HH, GA vs EH, and GA vs HH]

Trivial to beat the EH strategy
GA never learns to defeat HH
  Over-specializes against EH strategy
CA quickly learns to defeat HH
  Still defeats the EH strategy as often as the GA, though the score isn’t as high



Discussion #3

GA with fitness sharing can be misled by a large difficulty gap
CA produces high-quality robust solutions
  Can be biased towards known opponents
  Less prone to being misled



Conclusion

CAs are suitable for finding RTS strategies
  Produce robust strategies
  Can defeat multiple opponents
  Can defeat opponents not seen during training
  Can learn to defeat known opponents without becoming over-specialized


Future Work

Case-Injection into population
  Learn to play like a known player/strategy
Strategy identification and counter-strategy selection
  What strategies might the current opponent be using?
  What strategies in my case database might be useful to learn from to defeat the current opponent?
System for perpetual Coevolution and Case-Injection
  The more people play, the more new and useful strategies we can coevolve
Future-Future Work
  More Flexible Encoding
  Complete game player
  Better opponent modeling

Acknowledgements

This research is supported by ONR grant N000014-12-c-0522.

More information (papers, movies)
  http://www.cse.unr.edu/~caballinger
  http://www.cse.unr.edu/~sushil


Publications

In Preparation:
  Identifying Pro StarCraft II players and strategies (IEEE T-CIAIG)

• Liu, S.; Ballinger, C.; Louis, S.; "Player Identification from RTS Game Replays", Computers and Their Applications (CATA), 2013 28th International Conference on, 4-6 March 2013

• Ballinger, C.; Louis, S., "Comparing Heuristic Search Methods for Finding Effective Real-Time Strategy Game Plans", IEEE Symposium Series on Computational Intelligence (SSCI) 2013, 16-19 April 2013

• Ballinger, C.; Louis, S., "Comparing Coevolution, Genetic Algorithms, and Hill-Climbers for Finding Real-Time Strategy Game Plans", Genetic and Evolutionary Computation Conference (GECCO) 2013, 6-10 July 2013

• Ballinger, C.; Louis, S., "Robustness of Coevolved Strategies in a Real-Time Strategy Game", IEEE Congress on Evolutionary Computation (CEC) 2013, 20-23 June 2013

• Ballinger, C.; Louis, S., "Finding Robust Strategies to Defeat Specific Opponents Using Case-Injected Coevolution", IEEE Conference on Computational Intelligence and Games (CIG) 2013, 11-13 August 2013
