Evolving Multi-modal Behavior in NPCs Jacob Schrum – [email protected] [email protected]...
Evolving Multi-modal Behavior in NPCs
Jacob Schrum – [email protected]
Risto Miikkulainen – [email protected]
University of Texas at Austin, Department of Computer Sciences
Introduction
Goal: discover NPC behavior automatically
Benefits:
  Save production time/effort
  Learn counterintuitive behaviors
  Find weaknesses in static scripts
  Tailor behavior to human players
Introduction
Challenges:
  Games are complex
  Multiple objectives
  Multi-modal behavior required
RL & evolution are popular approaches
How to encourage multi-modal behavior?
Typical Agent Architecture
One policy
Why not several policies?
[Figure: an agent with a single policy that maps sensor input from the environment to actions]
Agent With Multiple Policies
[Figure: an agent with policies 1 … n; an arbitration step selects which policy maps sensor input from the environment to actions]
Policy for each mode
Individual policies simpler than a monolithic policy
Must choose which policy to use
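A minimal Python sketch of the multi-policy architecture, for illustration only. The class name `MultiPolicyAgent`, the two toy policies, and the rule-based arbitrator (which reads a hypothetical "player is armed" flag from `sensors[0]`) are all assumptions; in the work described here the choice of policy is learned, not hand-coded.

```python
from typing import Callable, List, Sequence

# Hypothetical types: a policy maps sensor readings to action outputs, and
# an arbitrator maps the same readings to the index of the policy to use.
Policy = Callable[[Sequence[float]], List[float]]
Arbitrator = Callable[[Sequence[float]], int]


class MultiPolicyAgent:
    """Agent with one policy per mode; an arbitrator picks the active one."""

    def __init__(self, policies: List[Policy], arbitrate: Arbitrator) -> None:
        if not policies:
            raise ValueError("need at least one policy")
        self.policies = policies
        self.arbitrate = arbitrate
        self.last_mode = -1  # index of the policy used on the last step

    def act(self, sensors: Sequence[float]) -> List[float]:
        mode = self.arbitrate(sensors)
        if not 0 <= mode < len(self.policies):
            raise IndexError(f"arbitrator chose invalid mode {mode}")
        self.last_mode = mode
        return self.policies[mode](sensors)


# Toy policies returning (forward, turn); their behavior is made up.
def retreat(sensors: Sequence[float]) -> List[float]:
    return [-1.0, 0.0]


def pursue(sensors: Sequence[float]) -> List[float]:
    return [1.0, 0.0]


# Hand-coded arbitration on a hypothetical "player is armed" flag in sensors[0].
agent = MultiPolicyAgent([retreat, pursue], lambda s: 0 if s[0] > 0.5 else 1)
print(agent.act([1.0, 0.3]), agent.last_mode)  # [-1.0, 0.0] 0
```

Each policy only has to handle its own situation, which is the sense in which individual policies can be simpler than one monolithic policy; the cost is that something must decide which one runs.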
Multi-modal Game
Game to test multi-modal architecture:
  Make task delineation clear
  Same NPCs perform two distinct tasks
  Must determine their task from sensors
New game: “Fight or Flight”
Fight or Flight
Fight task:
  Player fights with bat
  NPCs avoid bat
  NPCs fight back
Flight task:
  Player has no weapon
  Player runs away
  NPCs confine/attack
NPC Objectives
Fight task:
  Deal damage
  Avoid damage
  Stay alive
Flight task:
  Deal damage
  Not the same objective as in the Fight task!
How do we deal with multiple, competing objectives?
Multi-Objective Optimization
Imagine a game with two objectives:
  Damage Dealt
  Health Remaining
A dominates B iff A is at least as good as B in every objective and strictly better in at least one
Points not dominated by any other point in the population are best: the Pareto front
[Figure: Pareto front over Damage Dealt and Health Remaining; at one end, high health but not much damage dealt; at the other, a lot of damage dealt but lots of health lost; the front shows the tradeoff between objectives]
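The dominance test and the Pareto front can be written directly from the definitions on this slide. This sketch assumes both objectives are maximized; the score pairs are made-up illustrative values, not results.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Points that no other point in the population dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (damage dealt, health remaining): made-up illustrative scores
scores = [(10, 90), (80, 20), (50, 50), (40, 40), (10, 80)]
print(pareto_front(scores))  # [(10, 90), (80, 20), (50, 50)]
```

(40, 40) is dropped because (50, 50) dominates it, and (10, 80) because (10, 90) does; the three survivors each trade damage for health.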
NSGA-II
Evolution: a natural approach for finding a Pareto-optimal population
Non-Dominated Sorting Genetic Algorithm II*:
  Population P with size N; evaluate P
  Use mutation to get P´ with size N; evaluate P´
  Calculate non-dominated fronts of P ∪ P´ (size 2N)
  New population of size N from the highest fronts of P ∪ P´
*K. Deb et al. 2000
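A sketch of NSGA-II's survivor-selection step on the combined population P ∪ P´, assuming all objectives are maximized. Fast non-dominated sorting and the crowding-distance tie-break for the last, partially admitted front follow Deb et al.'s published algorithm; the function names are hypothetical, and the mutation step that produces P´ is left out.

```python
import math
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated_fronts(scores: List[Sequence[float]]) -> List[List[int]]:
    """Fast non-dominated sort: returns fronts as lists of indices, best first."""
    n = len(scores)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # indices i dominates
    counts = [0] * n                                         # how many dominate i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if dominates(scores[i], scores[j]):
                dominated_by[i].append(j)
            elif dominates(scores[j], scores[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(scores: List[Sequence[float]], front: List[int]) -> Dict[int, float]:
    """Larger distance = more isolated point; boundary points get infinity."""
    dist = {i: 0.0 for i in front}
    for obj in range(len(scores[front[0]])):
        ordered = sorted(front, key=lambda i: scores[i][obj])
        lo, hi = scores[ordered[0]][obj], scores[ordered[-1]][obj]
        dist[ordered[0]] = dist[ordered[-1]] = math.inf
        if hi == lo:
            continue
        for a, i, b in zip(ordered, ordered[1:], ordered[2:]):
            dist[i] += (scores[b][obj] - scores[a][obj]) / (hi - lo)
    return dist


def nsga2_select(scores: List[Sequence[float]], n: int) -> List[int]:
    """Pick n survivors from the combined parent + child population."""
    chosen: List[int] = []
    for front in nondominated_fronts(scores):
        if len(chosen) + len(front) <= n:
            chosen.extend(front)          # whole front fits
        else:
            dist = crowding_distance(scores, front)
            front = sorted(front, key=lambda i: dist[i], reverse=True)
            chosen.extend(front[: n - len(chosen)])  # most spread-out points first
            break
    return chosen


scores = [(10, 90), (80, 20), (50, 50), (40, 40), (10, 80), (5, 5)]
print(nondominated_fronts(scores))  # [[0, 1, 2], [4, 3], [5]]
print(nsga2_select(scores, 3))      # [0, 1, 2]
```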
Neuroevolution
Genetic algorithms + artificial neural networks:
  NNs are good at generating behavior
  GA creates new nets, evaluates them
  Four basic mutations (no crossover used): Perturb Weight, Add Connection, Add Neuron, Merge Neurons
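A toy genome illustrating three of the four mutations. This is a hedged sketch: the `Genome`/`Link` representation, the Gaussian perturbation size, and the NEAT-style link split used for Add Neuron are assumptions; node types (input/output/hidden) are ignored, recurrent links and self-loops are allowed, and Merge Neurons is omitted because the slide does not describe how it works.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    src: int
    dst: int
    weight: float


@dataclass
class Genome:
    num_nodes: int
    links: List[Link] = field(default_factory=list)


def perturb_weight(g: Genome, rng: random.Random, sigma: float = 0.5) -> None:
    """Add Gaussian noise to the weight of one randomly chosen link."""
    if g.links:
        rng.choice(g.links).weight += rng.gauss(0.0, sigma)


def add_connection(g: Genome, rng: random.Random) -> None:
    """Link two random nodes with a random weight, unless already linked."""
    src, dst = rng.randrange(g.num_nodes), rng.randrange(g.num_nodes)
    if not any(link.src == src and link.dst == dst for link in g.links):
        g.links.append(Link(src, dst, rng.uniform(-1.0, 1.0)))


def add_neuron(g: Genome, rng: random.Random) -> None:
    """Split a random link a->b into a->new->b (NEAT-style)."""
    if not g.links:
        return
    old = rng.choice(g.links)
    g.links.remove(old)
    new = g.num_nodes
    g.num_nodes += 1
    g.links.append(Link(old.src, new, 1.0))         # weight 1 into the new node
    g.links.append(Link(new, old.dst, old.weight))  # keeps the old weight out


rng = random.Random(0)
g = Genome(num_nodes=3, links=[Link(0, 2, 0.5), Link(1, 2, -0.3)])
add_neuron(g, rng)
perturb_weight(g, rng)
add_connection(g, rng)
print(g.num_nodes, len(g.links))
```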
New Mode Mutation
New mode with inputs from a preexisting mode
Maximum preference neuron determines mode
[Figure: network before and after a New Mode Mutation]
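One way to read "maximum preference neuron determines mode", sketched under the assumption that each mode occupies a contiguous block of output neurons: its action outputs followed by one preference neuron. `select_mode` and this layout are hypothetical; the slides do not give the exact output ordering.

```python
from typing import List, Sequence, Tuple


def select_mode(outputs: Sequence[float], actions_per_mode: int) -> Tuple[int, List[float]]:
    """Assumed layout: each mode is `actions_per_mode` action outputs followed
    by one preference neuron. Returns the index of the mode with the highest
    preference and that mode's action outputs."""
    block = actions_per_mode + 1
    if not outputs or len(outputs) % block != 0:
        raise ValueError("output layer size must be a positive multiple of the mode size")
    modes = [list(outputs[i:i + block]) for i in range(0, len(outputs), block)]
    best = max(range(len(modes)), key=lambda m: modes[m][-1])  # ties -> lowest index
    return best, modes[best][:-1]


# Two modes with two action outputs each: mode 1's preference (0.8) wins.
print(select_mode([0.1, 0.9, 0.2, 0.7, -0.4, 0.8], actions_per_mode=2))
# (1, [0.7, -0.4])
```

Under this layout, a New Mode Mutation simply appends another block of `actions_per_mode + 1` output neurons, so arbitration needs no separate controller network.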
Experiment
Compare 1Mode vs. ModeMutation, 10 trials each
What to evolve against?
  Bot with static policy (instead of a player)
  Bot has a first-person perspective
Fight task:
  Swing bat constantly
  Approach nearest NPC in front
Flight task:
  Back away from nearest NPC in front
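The static opponent can be sketched in a few lines of geometry. Everything here is assumed for illustration: 2D positions, "in front" meaning a positive dot product with the bot's heading, and an action triple of (turn angle, forward speed, swing). In both tasks the bot turns to face its target; only the direction of movement differs.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Vec:
    x: float
    y: float


def nearest_in_front(pos: Vec, heading: Vec, npcs: List[Vec]) -> Optional[Vec]:
    """Closest NPC whose offset from the bot has a positive dot product with its heading."""
    ahead = [n for n in npcs
             if (n.x - pos.x) * heading.x + (n.y - pos.y) * heading.y > 0]
    return min(ahead, key=lambda n: math.hypot(n.x - pos.x, n.y - pos.y), default=None)


def bot_action(task: str, pos: Vec, heading: Vec, npcs: List[Vec]) -> Tuple[float, float, bool]:
    """Return (turn angle in radians, forward speed, swing bat?)."""
    swing = task == "fight"        # the bat is swung constantly in the Fight task
    target = nearest_in_front(pos, heading, npcs)
    if target is None:
        return 0.0, 0.0, swing     # assumption: stand still if nobody is in front
    dx, dy = target.x - pos.x, target.y - pos.y
    # Signed angle from the heading to the target direction.
    turn = math.atan2(heading.x * dy - heading.y * dx, heading.x * dx + heading.y * dy)
    forward = 1.0 if task == "fight" else -1.0  # approach vs. back away
    return turn, forward, swing


npcs = [Vec(2, 0), Vec(-1, 0), Vec(5, 1)]
print(bot_action("fight", Vec(0, 0), Vec(1, 0), npcs))   # (0.0, 1.0, True)
print(bot_action("flight", Vec(0, 0), Vec(1, 0), npcs))  # (0.0, -1.0, False)
```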
Incremental Evolution
Hard to evolve against the proposed bot strategies
  Could easily fail to evolve interesting behavior
Incremental evolution against increasing speeds: 0%, 40%, 80%, 100%
  Increase speed when all goals are met
  End when goals are met at 100%
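The speed schedule amounts to a small loop. In this sketch `run_generation` and `goals_met` are hypothetical callbacks standing in for one generation of evolution and the goal test; only the speed levels come from the slide.

```python
from typing import Callable, List

SPEEDS = [0.0, 0.4, 0.8, 1.0]  # bot speed as a fraction of full speed


def incremental_evolution(run_generation: Callable[[float], List[float]],
                          goals_met: Callable[[List[float]], bool],
                          max_generations: int) -> List[int]:
    """Raise the bot's speed each time all goals are met; stop once they are
    met at full speed. Returns the generations at which goals were met."""
    level = 0
    milestones: List[int] = []
    for gen in range(max_generations):
        averages = run_generation(SPEEDS[level])  # one generation at this speed
        if goals_met(averages):
            milestones.append(gen)
            if level == len(SPEEDS) - 1:
                break                             # goals met at 100%: done
            level += 1                            # otherwise make the bot faster
    return milestones


# Toy stand-in: the "population average" is just a counter, and the goal is
# met on every third generation.
calls: List[float] = []


def fake_generation(speed: float) -> List[float]:
    calls.append(speed)
    return [float(len(calls))]


print(incremental_evolution(fake_generation, lambda avg: avg[0] % 3 == 0, 100))
# [2, 5, 8, 11]
```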
Goals
Average population performance high enough? Then increase speed
Each objective has a goal:
  Fight:
    At least 50 damage to bot (1 kill)
    Less than 20 damage per NPC on average (2 hits)
    Survive at least 800 time steps (80% of trial)
  Flight:
    At least 100 damage to bot (2 kills)
Average population objective score met goal value? Goal met
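The goal test on population averages might look like the following. The goal values come from the slide, but the objective key names and the one-dictionary-per-individual layout are assumptions.

```python
from statistics import mean
from typing import Dict, List

# Goal values from the Goals slide; the key names are hypothetical.
GOALS = {
    "fight_damage_dealt": (">=", 50.0),    # at least 1 kill
    "fight_damage_received": ("<", 20.0),  # per NPC on average (20 damage = 2 hits)
    "fight_time_alive": (">=", 800.0),     # 80% of the trial
    "flight_damage_dealt": (">=", 100.0),  # at least 2 kills
}


def goals_met(population: List[Dict[str, float]]) -> Dict[str, bool]:
    """Compare the population average of each objective with its goal."""
    result = {}
    for name, (op, target) in GOALS.items():
        avg = mean(ind[name] for ind in population)
        result[name] = avg >= target if op == ">=" else avg < target
    return result


population = [
    {"fight_damage_dealt": 60, "fight_damage_received": 10,
     "fight_time_alive": 900, "flight_damage_dealt": 120},
    {"fight_damage_dealt": 50, "fight_damage_received": 25,
     "fight_time_alive": 700, "flight_damage_dealt": 90},
]
print(all(goals_met(population).values()))  # True: averages 55, 17.5, 800, 105
```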
Mode Mutation Results
Performs well in both tasks
Fight task: baiting behavior
  One NPC takes damage so others can sneak up behind
  Bot knocked back and forth
Flight task: corralling behavior
  Keep bot confined in a ring of NPCs
  Move to scare the bot into the enclosure
Use of Multiple Modes
Different modes for baiting and attacking
Similar elements of modes co-opted for different tasks
Many unselected modes:
  As many as 7 unused modes
  Still have outward connections
  Are they vestigial?
1 Mode Results
Only performs well in one task
Example 1:
  Runs away in Fight task
  Corralling behavior in Flight task
Example 2:
  Overly aggressive in Fight task
  Lets bot escape in Flight task
Population averages of individual objectives are high enough, but few individuals do well in all objectives
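A small made-up example of how this can happen: every population average meets its goal, yet no single individual meets all of them, because the population splits into specialists. The two goal values reuse the Fight-task thresholds from the Goals slide; the scores are invented.

```python
from statistics import mean

# Made-up scores: (fight damage dealt, fight time alive) for two groups of
# specialists, attackers that die early and survivors that deal no damage.
population = [(100, 600), (100, 650), (0, 1000), (10, 1000)]
goals = (50, 800)  # damage >= 50, time alive >= 800

averages = [mean(col) for col in zip(*population)]
avg_ok = all(a >= g for a, g in zip(averages, goals))
individuals_ok = [all(s >= g for s, g in zip(ind, goals)) for ind in population]
print(averages, avg_ok, any(individuals_ok))  # [52.5, 812.5] True False
```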
Why Different Behaviors?
Progression method:
  Numerically similar performance
  Drastically different distribution of behaviors
1Mode evolves groups for subsets of objectives
ModeMutation biases towards solving all objectives
  Changes shape of fitness landscape
Future Work
Improve progression:
  More granularity in tougher end of task sequence
  Can incremental evolution be avoided?
Improve multiobjective selection:
  Bias towards middle of trade-off surface
  Other algorithms: SPEA2, PESA-II
Future Work
Improve ModeMutation:
  Should new modes be strongly differentiated?
  Different arbitration mechanism?
  Better option than randomly applying mutation?
  Different initial connectivity?
Conclusion
ModeMutation encourages multi-modal behavior:
  Biases search toward multi-modal solutions
ModeMutation better than 1Mode:
  More successes in a shorter amount of time
Lead to multi-modal behavior in future games
Questions?
Movies: http://nn.cs.utexas.edu/?multimodal09
E-mail: [email protected]
Auxiliary Slides
Ignore Achieved Goals for Objectives
Goal is met → drop objective
  Focus selection on most difficult objectives
  Prevents stagnation
  Reshaping fitness landscape helps escape peaks
Project scores into lower dimension
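Dropping achieved objectives amounts to projecting each score vector onto the objectives whose goals are still unmet before non-dominated sorting. A minimal sketch; the function name and the choice to keep every objective when all goals are met are assumptions.

```python
from typing import List, Sequence, Tuple


def project_scores(scores: List[Sequence[float]],
                   achieved: Sequence[bool]) -> List[Tuple[float, ...]]:
    """Keep only the objectives whose goal has not been achieved yet."""
    keep = [i for i, done in enumerate(achieved) if not done]
    if not keep:  # assumption: if all goals are met, keep all objectives
        keep = list(range(len(achieved)))
    return [tuple(s[i] for i in keep) for s in scores]


# (fight damage, fight time alive, flight damage): made-up scores; the first
# two goals are already met, so selection only looks at the third objective.
scores = [(60, 900, 40), (10, 950, 120)]
print(project_scores(scores, [True, True, False]))  # [(40,), (120,)]
```

In the full three-objective space neither score dominates the other; after projection the second one does, so selection pressure shifts to the objective that is still difficult.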