Evolving Multi-modal Behavior in NPCs


Page 1: Evolving Multi-modal Behavior in NPCs

Jacob Schrum – schrum2@cs.utexas.edu
Risto Miikkulainen – risto@cs.utexas.edu

University of Texas at Austin
Department of Computer Sciences

Page 2: Introduction

Goal: discover NPC behavior automatically

Benefits
  Save production time/effort
  Learn counterintuitive behaviors
  Find weaknesses in static scripts
  Tailor behavior to human players

Page 3: Introduction

Challenges
  Games are complex
  Multiple objectives
  Multi-modal behavior required

RL & Evolution popular approaches
How to encourage multi-modal behavior?

Page 4: Typical Agent Architecture

One policy

Why not several policies?

[Diagram: the Environment sends sensor input to the Agent; the Agent's single policy sends actions back to the Environment.]

Page 5: Agent With Multiple Policies

[Diagram: the Environment sends sensor input to the Agent; the Agent arbitrates among policy 1, policy 2, ..., policy n, and the chosen policy sends actions back to the Environment.]

Policy for each mode
Individual policies simpler than monolithic policy
Must choose which policy to use
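A minimal Python sketch of the multiple-policy agent described on this slide. The function names, the one-element sensor vector, and the hand-written arbitration rule are illustrative assumptions and do not come from the slides.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def make_agent(policies: Sequence[Policy],
               arbitrate: Callable[[Sequence[float]], int]) -> Policy:
    """Build an agent that uses exactly one of its policies per time step."""
    def act(sensors: Sequence[float]) -> List[float]:
        index = arbitrate(sensors)         # choose which policy to use
        return policies[index](sensors)    # only the chosen policy acts
    return act


# Hypothetical setup: sensor 0 is 1.0 when the opponent holds a weapon.
def flee(sensors: Sequence[float]) -> List[float]:
    return [-1.0]   # move away from the opponent


def chase(sensors: Sequence[float]) -> List[float]:
    return [1.0]    # move toward the opponent


agent = make_agent([flee, chase], lambda s: 0 if s[0] > 0.5 else 1)
```

Here the arbitration rule is fixed by hand; the deck's question is how to have evolution discover both the policies and the arbitration.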

Page 6: Multi-modal Game

Game to test multi-modal architecture
  Make task delineation clear
  Same NPCs perform two distinct tasks
  Must determine their task from sensors
New Game: “Fight or Flight”

Page 7: Fight or Flight

Fight Task
  Player fights with bat
  NPCs avoid bat
  NPCs fight back

Flight Task
  Player has no weapon
  Player runs away
  NPCs confine/attack

Page 8: NPC Objectives

Fight Task
  Deal damage
  Avoid damage
  Stay alive

Flight Task
  Deal damage
  Not the same objective as in the Fight task!

How do we deal with multiple, competing objectives?

Page 9: Multi-Objective Optimization

Imagine game with two objectives:
  Damage Dealt
  Health Remaining

A dominates B iff A is strictly better in at least one objective and at least as good in the others

Population of points not dominated are best: Pareto Front

[Plot: trade-off between objectives. One end of the front: high health but did not deal much damage. Other end: dealt a lot of damage, but lost lots of health.]
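The dominance relation and the Pareto front can be sketched in a few lines of Python. The score tuples below are invented (damage dealt, health remaining) pairs, and both objectives are assumed to be maximized.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and
    strictly better in at least one (all objectives maximized)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Invented scores: (damage dealt, health remaining)
scores = [(10, 90), (80, 20), (50, 50), (40, 40), (10, 20)]
front = pareto_front(scores)
# front == [(10, 90), (80, 20), (50, 50)]
```

(40, 40) drops out because (50, 50) dominates it, and (10, 20) drops out because (10, 90) dominates it; the three survivors are different trade-offs between the two objectives.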

Page 10: NSGA-II

Evolution: natural approach for finding optimal population
Non-Dominated Sorting Genetic Algorithm II*
  Population P with size N; Evaluate P
  Use mutation to get P´ size N; Evaluate P´
  Calculate non-dominated fronts of {P ∪ P´} size 2N
  New population size N from highest fronts of {P ∪ P´}

*K. Deb et al. 2000
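A simplified Python sketch of one generation of the loop listed on this slide. It is not a full NSGA-II: the real algorithm breaks ties within the last admitted front by crowding distance, whereas this sketch simply takes that front's members in index order. The `evaluate` and `mutate` callbacks are hypothetical placeholders, and objectives are assumed to be maximized.

```python
def dominates(a, b):
    """a is at least as good everywhere and strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def nondominated_fronts(scores):
    """Split indices into fronts: front 0 is undominated, front 1 is
    undominated once front 0 is removed, and so on."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i])
                            for j in remaining)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def next_generation(parents, evaluate, mutate):
    """Parents P (size N) plus mutated children P' compete; the best N survive."""
    n = len(parents)
    children = [mutate(p) for p in parents]      # P', size N
    combined = parents + children                # P and P' together, size 2N
    scores = [evaluate(x) for x in combined]
    survivors = []
    for front in nondominated_fronts(scores):    # best fronts first
        room = n - len(survivors)
        if room <= 0:
            break
        # Simplification: no crowding-distance tie-break in the last front.
        survivors.extend(combined[i] for i in front[:room])
    return survivors
```

For example, with `evaluate = lambda x: (x, x)` and `mutate = lambda x: x + 10`, the parents `[1, 2, 3, 4]` are all replaced by their strictly better children.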

Page 11: Neuroevolution

Genetic Algorithms + Artificial Neural Networks
NNs good at generating behavior
GA creates new nets, evaluates them
Four basic mutations (no crossover used)
  Perturb Weight
  Add Connection
  Add Neuron
  Merge Neurons
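As a rough illustration of two of these mutations, the Python sketch below treats a network as a dictionary mapping (source, target) neuron pairs to weights. That genome encoding, the noise scales, and the choice to allow any missing pair (including recurrent and self connections) are assumptions made for the example; the slides do not specify them.

```python
import random


def perturb_weight(genome, rng, scale=0.1):
    """Return a copy with Gaussian noise added to one random weight."""
    child = dict(genome)
    key = rng.choice(sorted(child))          # pick one existing connection
    child[key] += rng.gauss(0.0, scale)
    return child


def add_connection(genome, neurons, rng):
    """Return a copy with one new randomly weighted connection, if any
    pair of neurons is still unconnected."""
    child = dict(genome)
    missing = [(s, t) for s in neurons for t in neurons
               if (s, t) not in child]
    if missing:
        child[rng.choice(missing)] = rng.gauss(0.0, 1.0)
    return child


rng = random.Random(0)
parent = {("in", "out"): 0.5}
child_a = perturb_weight(parent, rng)
child_b = add_connection(parent, ["in", "out"], rng)
```

Both functions copy the parent before changing it, so the parent can still compete against its child during selection.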

Page 12: New Mode Mutation

New mode with inputs from preexisting mode
Maximum preference neuron determines mode

[Diagram: network before and after the mutation.]
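How the maximum preference neuron picks a mode can be sketched as follows. The output layout (each mode contributes its action outputs followed by one preference output) is an assumption made for this example, as are the function name and the numbers.

```python
def select_mode(outputs, actions_per_mode):
    """Pick the mode whose preference neuron has the highest activation.

    Assumed layout: outputs = [mode 0 actions..., mode 0 preference,
                               mode 1 actions..., mode 1 preference, ...]
    Returns (mode index, that mode's action outputs).
    """
    block = actions_per_mode + 1
    if len(outputs) % block != 0:
        raise ValueError("output count does not match the mode layout")
    modes = [outputs[i:i + block] for i in range(0, len(outputs), block)]
    best = max(range(len(modes)), key=lambda m: modes[m][-1])
    return best, modes[best][:-1]


# Two modes, two action outputs each; preferences are 0.3 and 0.7.
mode, actions = select_mode([0.1, 0.2, 0.3, 0.9, -0.5, 0.7], 2)
# mode == 1, actions == [0.9, -0.5]
```

Under this layout, a new-mode mutation would append one more block of outputs, and the new mode only takes control on time steps where its preference neuron is the largest.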

Page 13: Experiment

Compare 1Mode vs. ModeMutation
  10 trials each
What to evolve against?
  Bot with static policy (instead of player)
  Bot has a first person perspective

Fight Task
  Swing bat constantly
  Approach nearest bot in front

Flight Task
  Back away from nearest bot in front

Page 14: Incremental Evolution

Hard to evolve against proposed bot strategies
  Could easily fail to evolve interesting behavior
Incremental evolution against increasing speeds
  0%, 40%, 80%, 100%
  Increase speed when all goals are met
  End when goals met at 100%
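The speed schedule can be sketched as a small control loop in Python. The callbacks `evolve_one_generation` and `goals_met` and the generation budget are hypothetical placeholders; only the four speed stages and the advance/stop rules come from the slide.

```python
SPEEDS = [0.0, 0.4, 0.8, 1.0]   # bot speed as a fraction of full speed


def incremental_evolution(evolve_one_generation, goals_met, max_generations):
    """Run evolution against increasing bot speeds.

    Returns the number of generations used, or None if the goals were
    not met at 100% speed within the budget.
    """
    stage = 0
    for generation in range(1, max_generations + 1):
        speed = SPEEDS[stage]
        evolve_one_generation(speed)
        if goals_met(speed):
            if stage == len(SPEEDS) - 1:
                return generation        # goals met at 100%: done
            stage += 1                   # all goals met: increase speed
    return None
```

With a stub that reports the goals as met every generation, the loop visits each speed once and returns 4.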

Page 15: Goals

Average population performance high enough?
  Then increase speed
Each objective has a goal:
  Fight
    At least 50 damage to bot (1 kill)
    Less than 20 damage per NPC on average (2 hits)
    Survive at least 800 time steps (80% of trial)
  Flight
    At least 100 damage to bot (2 kills)

Average population objective score met goal value? → Goal met
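A small Python sketch of the goal check, using the thresholds listed on this slide. The objective names and the two-individual population are made up for illustration.

```python
# Goal tests applied to the population average of each objective.
GOALS = {
    "fight_damage_dealt":    lambda avg: avg >= 50,    # 1 kill
    "fight_damage_received": lambda avg: avg < 20,     # fewer than 2 hits
    "fight_time_alive":      lambda avg: avg >= 800,   # 80% of trial
    "flight_damage_dealt":   lambda avg: avg >= 100,   # 2 kills
}


def goals_met(population_scores):
    """Map each objective name to whether its population average meets the goal."""
    met = {}
    for name, test in GOALS.items():
        total = sum(individual[name] for individual in population_scores)
        met[name] = test(total / len(population_scores))
    return met


# Invented scores for a population of two.
population = [
    {"fight_damage_dealt": 60, "fight_damage_received": 10,
     "fight_time_alive": 900, "flight_damage_dealt": 120},
    {"fight_damage_dealt": 50, "fight_damage_received": 20,
     "fight_time_alive": 800, "flight_damage_dealt": 60},
]
status = goals_met(population)
ready_to_speed_up = all(status.values())
```

In this example the three Fight averages (55, 15, 850) meet their goals, but the Flight average of 90 does not, so the speed would not be increased yet.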

Page 17: Mode Mutation Results

Performs well in both tasks

Fight Task
  Baiting behavior
    One NPC takes damage so others can sneak up behind
    Bot knocked back and forth

Flight Task
  Corralling behavior
    Keep bot confined in ring of NPCs
    Move to scare the bot into enclosure

Page 18: Use of Multiple Modes

Different modes for baiting and attacking
Similar elements of modes co-opted for different tasks
Many unselected modes
  As many as 7 unused modes
  Still have outward connections
  Are they vestigial?

Page 19: 1 Mode Results

Only performs well in one task
Example 1
  Runs away in Fight task
  Corralling behavior in Flight task
Example 2
  Overly aggressive in Fight task
  Lets bot escape in Flight task

Population averages of individual objectives are high enough, but few individuals do well in all objectives
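A worked example in Python of how this can happen. The population is invented: half the individuals are Fight specialists and half are Flight specialists, scored only on damage dealt in each task, with the damage goals of 50 (Fight) and 100 (Flight).

```python
GOALS = (50, 100)   # (Fight damage goal, Flight damage goal)

# Invented population: (Fight damage dealt, Flight damage dealt)
population = [(100, 0)] * 5 + [(0, 200)] * 5

averages = tuple(
    sum(individual[i] for individual in population) / len(population)
    for i in range(len(GOALS))
)
averages_meet_goals = all(a >= g for a, g in zip(averages, GOALS))

individuals_meeting_all = sum(
    all(score >= goal for score, goal in zip(individual, GOALS))
    for individual in population
)
# averages == (50.0, 100.0), averages_meet_goals is True,
# individuals_meeting_all == 0
```

The averages satisfy both goals even though no single individual does, so a progression rule based on population averages cannot tell a population of specialists from a population of generalists.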

Page 20: Why Different Behaviors?

Progression method
  Numerically similar performance
  Drastically different distribution of behaviors
1Mode evolves groups for subsets of objectives
ModeMutation biases towards solving all objectives
  Changes shape of fitness landscape

Page 21: Future Work

Improve progression
  More granularity in tougher end of task sequence
  Can incremental evolution be avoided?
Improve multiobjective selection
  Bias towards middle of trade-off surface
  Other algorithms:
    SPEA2
    PESA-II

Page 22: Future Work

Improve ModeMutation
  Should new modes be strongly differentiated?
  Different arbitration mechanism?
  Better option than randomly applying mutation?
  Different initial connectivity?

[Diagram labels: P(x), P(y)]

Page 23: Conclusion

ModeMutation encourages multi-modal behavior
  Biases search toward multi-modal solutions
ModeMutation better than 1Mode
  More successes in shorter amount of time
Lead to multi-modal behavior in future games

Page 24: Questions?

Movies: http://nn.cs.utexas.edu/?multimodal09

E-mail: [email protected]

Page 25: Auxiliary Slides

Page 26: Ignore Achieved Goals for Objectives

Goal is met → Drop objective
  Focus selection on most difficult objectives
  Prevents stagnation
Reshaping fitness landscape helps escape peaks
Project scores into lower dimension
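Dropping achieved objectives amounts to projecting each score vector onto the objectives whose goals are still unmet. A minimal Python sketch, with invented scores and an invented objective order of (Fight damage dealt, negated Fight damage received, Fight time alive, Flight damage dealt):

```python
def project(scores, goal_met):
    """Keep only the objectives whose goals have not been met yet."""
    keep = [i for i, met in enumerate(goal_met) if not met]
    return [tuple(score[i] for i in keep) for score in scores]


# Invented scores for two individuals.
scores = [(60, -10, 900, 40), (55, -30, 950, 90)]
goal_met = [True, False, True, False]    # goals 0 and 2 already achieved

projected = project(scores, goal_met)
# projected == [(-10, 40), (-30, 90)]
```

Selection would then compare individuals only on the projected tuples, so an individual can no longer stay on the Pareto front purely by excelling at an objective the population has already achieved.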