Inductive Agent Modeling in Games - part 1
![Page 1: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/1.jpg)
Inductive Agent Modeling in Games
Bobby D. Bryant
Neuroevolution and Behavior Laboratory
Department of Computer Science and Engineering
University of Nevada, Reno
![Page 2: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/2.jpg)
Today’s Plan
• Introduction to Inductive Agent Modeling
◦ concepts and issues
• In-depth example
![Page 3: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/3.jpg)
Why model agents?
• Competitive reasons
• Automated content creation
![Page 4: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/4.jpg)
How complicated is an agent model?
• from the very simple
◦ e.g., drama manager sets parameter for whether
the player wants to see cat-fights
• to the very complex
◦ e.g., controller emulates the Red Baron’s use of
his airplane and guns in a dog-fight
![Page 5: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/5.jpg)
Relationship of Modeler to Modeled
• Foe (AKA opponent modeling)
• Ally (e.g., human teammate)
• Neutral (e.g., NPCs for richer game environment)
![Page 6: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/6.jpg)
Who/what do we model?
• Human player
• Game AI
◦ Virtual player (e.g., Ms. Pacman)
◦ Autonomous in-game agent (e.g., NPC in FRPG)
![Page 7: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/7.jpg)
Autonomous In-Game Agents
Virtual Player Embedded Agents
![Page 8: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/8.jpg)
What if You Want to Model a Specific Kind of Behavior?
• E.g., The Red Baron (vs. “a fighter pilot”)
• Behavior may not be optimal (in the normal sense)
◦ this notion ’challenges’ many researchers
• Hand-code the behavior?
• Use RL with a complex/subtle reward function?
• Derive policy from examples?
◦ bypasses traditional knowledge engineering methods
· allows indirect, intuitive expression of behavior
◦ uses subject-matter experts, not technical experts
![Page 9: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/9.jpg)
Now here’s the plan...
Human Plays Games...
...While Machine Learning System Captures Example Decisions...
...And Uses Them to Train Game-Playing Agents
“Foolish Human, Prepare to Die!”
[Plot: Average Score on Test Set vs. Generation]
![Page 10: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/10.jpg)
The Inductive Agent Modeling Challenge
• Capture examples
• Create an agent model
◦ Conceive as a function:
· for reactive control -
$m_a : \text{observable state} \rightarrow \text{action}$
· more generally -
$m_a : \text{observable state} \times \text{context} \rightarrow \text{action}$
◦ Derive by inductive machine learning mechanism
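The two function signatures above can be written down as types. The sketch below is only an illustration: the names `State`, `Context`, `Action`, and `table_model` are hypothetical and do not come from the slides, and the lookup-table model merely memorizes examples rather than generalizing from them.

```python
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical representations; the slides do not fix any of these.
State = Tuple[int, ...]   # the observable state, e.g. discretized sensor readings
Context = Hashable        # extra information such as memory or a current goal
Action = str

ReactiveModel = Callable[[State], Action]              # m_a : state -> action
ContextualModel = Callable[[State, Context], Action]   # m_a : state x context -> action


def table_model(examples: Dict[State, Action], default: Action) -> ReactiveModel:
    """Build a reactive model that replays memorized examples.

    It does not generalize to unseen states; supplying that generalization
    is the job of the inductive learning mechanism.
    """
    def model(state: State) -> Action:
        return examples.get(state, default)
    return model


m_a = table_model({(0, 1): "West", (1, 0): "East"}, default="Stay")
print(m_a((0, 1)))  # a state that appears in the examples
print(m_a((1, 1)))  # an unseen state falls back to the default action
```

An inductive learner would replace `table_model` with something that also produces sensible actions for states it has never seen.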
![Page 11: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/11.jpg)
Terminology
• Exemplar - the agent creating the examples
• Observer - the agency for capturing the examples
• Learner - the model that will emulate the exemplar
On-line vs. Off-line
• On-line: the learner and observer may be the same
• Off-line: the learner and observer are (probably)
distinct
![Page 12: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/12.jpg)
Challenge: Induction
• Deriving a generality from a collection of instances
• A hard problem...
◦ known to be a logical problem since Hume (18th C.)
◦ no-free-lunch theorems
• But we do it all the time anyway...
◦ Quine: Creatures inveterately wrong in their inductions
have a pathetic but praise-worthy tendency to die
before reproducing their kind.
◦ works because the universe isn’t a random place(?)
![Page 13: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/13.jpg)
Induction in Machine Learning (i)
Learning Classifier Systems (LCS)
• Map feature sets onto discrete classifications
$c : \langle f_1, f_2, f_3, \ldots, f_n \rangle \rightarrow \text{class}$
• Learn general rule from examples
• Large body of research
◦ we can adopt these methods directly,
when the agent to be modeled has
discrete action space
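As a minimal sketch of mapping feature vectors onto discrete classes learned from examples, the code below uses a 1-nearest-neighbor rule. This is a swapped-in technique chosen for brevity, not a learning classifier system and not a method named in the slides; all identifiers are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector <f1..fn>, class label)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_classifier(examples: List[Example]) -> Callable[[Sequence[float]], str]:
    """Return c : <f1..fn> -> class, answering with the label of the closest example."""
    def classify(features: Sequence[float]) -> str:
        _, label = min(examples, key=lambda ex: squared_distance(ex[0], features))
        return label
    return classify


# Hypothetical training data: features are two sensor values, classes are moves.
c = nearest_neighbor_classifier([((0.9, 0.1), "West"), ((0.1, 0.8), "East")])
print(c((0.7, 0.2)))
```

Because the output is a class label, this shape fits an agent whose actions form a discrete set.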
![Page 14: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/14.jpg)
Induction in Machine Learning (ii)
Artificial Neural Networks (ANN)
• Map input patterns onto (continuous) output patterns
$n : \mathbb{R}^m \rightarrow \mathbb{R}^n$
• Learn general rule from examples
• Large body of research
◦ we can adopt these methods directly,
when the agent to be modeled has
continuous action spaces
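The mapping from an input pattern in $\mathbb{R}^m$ to a continuous output pattern in $\mathbb{R}^n$ can be sketched as a small feed-forward pass. The layer sizes, the tanh activation, and the bias-as-last-weight convention below are assumptions made for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row of weights per neuron; last entry is the bias


def layer(weights: Matrix, inputs: List[float]) -> List[float]:
    """One fully connected layer with tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
        for row in weights
    ]


def network(hidden: Matrix, output: Matrix, inputs: List[float]) -> List[float]:
    """Propagate an input pattern through one hidden layer and one output layer."""
    return layer(output, layer(hidden, inputs))


# m = 2 inputs, 2 hidden neurons, n = 3 outputs (all weights are made up).
hidden_w = [[0.5, -0.5, 0.0], [1.0, 1.0, -1.0]]
output_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
y = network(hidden_w, output_w, [1.0, 0.0])
print(len(y))
```

Training would adjust the weight matrices so that the outputs match the exemplar's recorded actions.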
![Page 15: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/15.jpg)
Induction in Machine Learning (iii)
<Your Favorite Method Goes Here>
• Map ??? onto ???
• Learn general rule from examples
• Your body of research
◦ you can adopt these methods directly,
when the agent to be modeled has
appropriate input/output types
• [Discuss!]
![Page 16: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/16.jpg)
Challenge: Change in POV (i)
Observer and/or learner may have a different observable state (or view thereof) than the exemplar has
• e.g., Legion-II example (later)
◦ exemplar is human player with “God’s-eye” view
◦ observer/learner is in-game agent with egocentric view
• may make the induction harder
• can make the induction impossible, if critical state information is not visible
◦ e.g., trying to model a driver when observer or learner cannot see street lights
![Page 17: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/17.jpg)
Challenge: Change in POV (ib)
Human Player In-Game Agents
![Page 18: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/18.jpg)
Challenge: Change in POV (ii)
Observer’s and/or learner’s view of state may have a completely different modality than the exemplar’s
• e.g., Parker & Bryant (in press) work on emulating Quake II bot
◦ exemplar (bot) has direct access to game’s state variables
· distances, directions, etc.
◦ observer/learner has only low-resolution rendered visual input
• presumably makes the induction harder
![Page 19: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/19.jpg)
Challenge: Measuring Success
• Nix “I don’t think the Red Baron would do it that way.”
• Approach based on Behavior Analysis
◦ 2007 workshop
◦ anecdote (if time allows)
• Holding back training examples for testing
◦ conventional ML technique
◦ best suggestion so far
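Holding back examples for testing can be sketched as follows. The split fraction, the seed, and every function name here are assumptions for illustration; the slides only say that some training examples are held back and used to test the learned model.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (state, action) pair


def split_examples(examples: List[Example], test_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle a copy of the examples and hold back a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def error_rate(model: Callable[[Sequence[float]], str], test_set: List[Example]) -> float:
    """Fraction of held-back examples on which the model disagrees with the exemplar."""
    wrong = sum(1 for state, action in test_set if model(state) != action)
    return wrong / len(test_set)


# Hypothetical data: ten one-element states with alternating actions.
examples = [((float(i),), "West" if i % 2 else "East") for i in range(10)]
train_set, test_set = split_examples(examples)
always_west = lambda state: "West"
print(error_rate(always_west, examples))
```

A low error rate on the held-back set shows agreement with the exemplar's recorded choices, which is a weaker claim than "behaves like the exemplar".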
![Page 20: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/20.jpg)
Detailed Example Using Lamarckian
Neuroevolution
• But let’s talk about this a bit more first...
![Page 21: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/21.jpg)
References
Bryant, B. D., and Miikkulainen, R. (2007). Acquiring visibly intelligent behavior with example-guided neuroevolution. In Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI-07), 801–808. Menlo Park, CA: AAAI Press.
![Page 22: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/22.jpg)
Acquiring Visibly Intelligent Behavior with
Example-Guided Neuroevolution
Bobby D. Bryant
Department of Computer Science and Engineering
University of Nevada, Reno
Risto Miikkulainen
Department of Computer Sciences
The University of Texas at Austin
Based on the AAAI’07 Presentation, July 25, 2007
![Page 23: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/23.jpg)
Introduction
Visibly Intelligent Behavior
Focus on agents in environments where -
• The agents’ behavior is
directly observable
• Humans have intuitions about
what is and is not intelligent
Obvious examples: games and simulators
![Page 24: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/24.jpg)
Background
Example Game Environment: Legion-II
• Discrete-state strategy game with video display
• Pits legions vs. barbarians
• Legions must learn to cooperate to minimize the barbarians’ pillage
◦ pre-programmed barbarians
◦ legions trained by neuroevolution
• Designed as multi-agent testbed
◦ complex enough to produce interesting phenomena
◦ transparent enough for analysis
◦ scalable in complexity
![Page 25: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/25.jpg)
Background
The Legions’ Sensors
• Three egocentric sensor arrays detect cities, barbarians, and other legions
• Sub-arrays provide range and direction information for objects in six 60° “pie slices”:
[Figure: Sensor Array = Local Sense (X) + Sense Adjacent (NE, E, SE, SW, W, NW) + Sense Distant (NE, E, SE, SW, W, NW)]
• Each ranged element computed as $\sum_i 1/d_i$
• 3 × (1 + 6 + 6) = 39 FP sensor elements total
• Still only provides a fuzzy view of the map
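The sensor arithmetic on this slide can be sketched in code. The formula $\sum_i 1/d_i$ and the 3 × (1 + 6 + 6) = 39 layout come from the slide; the data layout, the function names, and the choice to apply the ranged formula to both the adjacent and the distant sub-arrays are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

SLICES = ["NE", "E", "SE", "SW", "W", "NW"]          # six 60-degree pie slices
OBJECT_TYPES = ["cities", "barbarians", "legions"]   # one sensor array per type

# (local value, distances per slice for adjacent objects, same for distant objects)
Readings = Tuple[float, Dict[str, List[float]], Dict[str, List[float]]]


def ranged_element(distances: List[float]) -> float:
    """Sum of 1/d_i over the objects in one slice; 0.0 when the slice is empty."""
    return sum((1.0 / d for d in distances), 0.0)


def sensor_array(readings: Readings) -> List[float]:
    """One array of 1 + 6 + 6 = 13 elements for a single object type."""
    local, adjacent, distant = readings
    values = [local]
    values += [ranged_element(adjacent.get(s, [])) for s in SLICES]
    values += [ranged_element(distant.get(s, [])) for s in SLICES]
    return values


def sensor_vector(per_type: Dict[str, Readings]) -> List[float]:
    """Concatenate the three arrays into the full 39-element input vector."""
    vector: List[float] = []
    for object_type in OBJECT_TYPES:
        vector += sensor_array(per_type[object_type])
    return vector


# Hypothetical situation: standing in a city, two barbarians to the west.
situation = {
    "cities": (1.0, {}, {}),
    "barbarians": (0.0, {}, {"W": [2.0, 4.0]}),
    "legions": (0.0, {}, {}),
}
vector = sensor_vector(situation)
print(len(vector))
```

Summing reciprocals merges several objects in a slice into one number, which is one reason the view of the map stays fuzzy: two barbarians at distances 2 and 4 produce the same reading as other combinations that sum to the same value.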
![Page 26: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/26.jpg)
Background
The Legions’ Controllers
• Sensor activations are propagated through an artificial neural network:
[Figure: Sensor Inputs (39 elements) → Hidden Layer Neurons → Output Neurons → Controller Outputs (X, NE, E, SE, SW, W, NW); key: neuron, scalar; arrow: signal propagation]
• Then the network’s output activations are decoded to choose an action
• The network must be trained to the task
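Decoding the output activations into an action could look like the sketch below. It assumes a winner-take-all rule (pick the output with the highest activation) and assumes that `X` means staying in place; the slide states only that the activations are decoded, so both points are illustrative guesses.

```python
from typing import List, Sequence

ACTIONS: List[str] = ["X", "NE", "E", "SE", "SW", "W", "NW"]  # one per output neuron


def decode_action(output_activations: Sequence[float]) -> str:
    """Return the action whose output neuron has the highest activation."""
    if len(output_activations) != len(ACTIONS):
        raise ValueError("expected one activation per controller output")
    best = max(range(len(ACTIONS)), key=lambda i: output_activations[i])
    return ACTIONS[best]


print(decode_action([0.1, 0.2, 0.0, 0.1, 0.3, 0.9, 0.2]))
```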
![Page 27: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/27.jpg)
Background
Neuroevolution with Enforced Sub-Populations (ESP)
(Gomez 2003)
[Figure: Breeding Populations P1, P2, ..., P17 each supply one neuron of the network; Sensor Inputs (39 elements) → network → Controller Outputs (X, NE, E, SE, SW, W, NW); arrow: signal propagation]
• Direct-encoding neuroevolution mechanism
• Separate breeding population for each neuron
◦ populations co-evolve to produce network
• Use game scores for fitness function
◦ score = pillage rate (lower is better)
◦ evaluation noise reduced by averaging
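The idea of one breeding population per neuron can be sketched as follows. This is a simplified illustration and not Gomez's implementation: it covers only assembling networks from the sub-populations and averaging each neuron's score over the networks it took part in, and it leaves out breeding altogether. All names and sizes are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Neuron = List[float]            # a neuron is encoded directly as its weight vector
SubPopulation = List[Neuron]    # one breeding population per network position


def make_subpopulations(n_neurons: int, pop_size: int, n_weights: int,
                        rng: random.Random) -> List[SubPopulation]:
    """Create one random sub-population for each neuron position."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
             for _ in range(pop_size)]
            for _ in range(n_neurons)]


def assemble_network(subpops: List[SubPopulation],
                     rng: random.Random) -> Tuple[List[int], List[Neuron]]:
    """Draw one neuron from each sub-population to form a complete network."""
    picks = [rng.randrange(len(pop)) for pop in subpops]
    return picks, [subpops[k][i] for k, i in enumerate(picks)]


def evaluate_generation(subpops: List[SubPopulation],
                        score_network: Callable[[List[Neuron]], float],
                        trials: int,
                        rng: random.Random) -> List[List[Optional[float]]]:
    """Average each neuron's score over the networks it participated in."""
    totals = [[0.0] * len(pop) for pop in subpops]
    counts = [[0] * len(pop) for pop in subpops]
    for _ in range(trials):
        picks, net = assemble_network(subpops, rng)
        score = score_network(net)  # e.g. pillage rate in a test game
        for k, i in enumerate(picks):
            totals[k][i] += score
            counts[k][i] += 1
    return [[t / c if c else None for t, c in zip(ts, cs)]
            for ts, cs in zip(totals, counts)]


rng = random.Random(1)
subpops = make_subpopulations(n_neurons=3, pop_size=4, n_weights=2, rng=rng)
# Stand-in score: a real run would play games and measure the pillage rate.
stand_in_score = lambda net: sum(abs(w) for neuron in net for w in neuron)
averages = evaluate_generation(subpops, stand_in_score, trials=20, rng=rng)
print(len(averages), len(averages[0]))
```

Averaging over several assembled networks is what lets a neuron be judged apart from the particular partners it was drawn with, and it also smooths evaluation noise.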
![Page 28: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/28.jpg)
Background
How well does it work?
• Quantitative results – see prior publications
◦ visit http://www.cse.unr.edu/~bdbryant/#ref-research-publications
• Qualitative results – see movie
![Page 29: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/29.jpg)
Experiments
Learning from Examples
• Play a dozen games
• Capture examples as <state,action> pairs
◦ use egocentric sensor readings for state
◦ use the human’s choice of move for action
[Figure: game screenshot] =⇒ <[sensor readings], “West”>
![Page 30: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/30.jpg)
Experiments
Target Policies
(Used for Example Generation)
• Policy family $L_d$, where $d$ is an integer distance
◦ garrison may not move > $d$ from city
◦ must return to city when no barbarians within $d$
• Safety condition
◦ garrison may not end with barbarian equally near city
◦ must move directly to city if unavoidable
◦ side effect: two barbarians can lock a garrison in
• Examined only $d \in \{0, 1\}$ (limited sensor resolution)
◦ notice that $L_0$ is degenerate (trivial)
• No constraints on ’rovers’
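A violation count for the distance rule of the policy family (a garrison may not be more than d from its city) might be sketched like this. Only that one rule is checked; the return rule and the safety condition are omitted. The data layout and function names are hypothetical, and the garrison-to-city distances are assumed to be supplied by the game.

```python
from typing import List


def count_distance_violations(garrison_distances: List[int], d: int) -> int:
    """Number of garrisons that are farther than d from their city."""
    return sum(1 for dist in garrison_distances if dist > d)


def average_violations_per_turn(games: List[List[List[int]]], d: int) -> List[float]:
    """Average violation count at each game turn across several games.

    games[g][t] holds the garrison-to-city distances in game g at turn t.
    """
    n_turns = min(len(game) for game in games)
    return [
        sum(count_distance_violations(game[t], d) for game in games) / len(games)
        for t in range(n_turns)
    ]


# Two hypothetical games, two turns each, two garrisons each.
games = [
    [[0, 1], [2, 0]],
    [[0, 0], [1, 1]],
]
print(average_violations_per_turn(games, d=0))
print(average_violations_per_turn(games, d=1))
```

With d = 0 any garrison outside its city counts as a violation, which is why that policy is degenerate: the garrison simply never moves.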
![Page 31: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/31.jpg)
Experiments
Lamarckian Neuroevolution
[Figure: evolutionary loop]
• population of representations → instantiate → S (ANN)
• S produces a stream of contextual decisions in the environment
• human-generated examples are isomorphic to S’s decisions (use them to partially train S before evaluating)
• tune up, then evaluate
• conditionally return to gene pool (for Lamarckian NE, return the tuned network)
![Page 32: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/32.jpg)
Experiments
Lamarckian Neuroevolution With ESP!
[Figure: the same evolutionary loop as on the previous slide]
• Tuning is done with backpropagation
• ESP and Lamarckian mechanisms are orthogonal
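The tune-then-evaluate loop can be sketched with a deliberately small stand-in. The assumptions are significant: a single linear layer trained with the delta rule replaces the backpropagation-trained network, selection and breeding are omitted, and the scoring function is a placeholder for playing test games. The sketch shows only the Lamarckian step, in which the tuned weights (not the original genome) are what goes back to the gene pool.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], Sequence[float]]  # (inputs, target outputs)
Genome = List[List[float]]                         # one weight row per output unit


def forward(weights: Genome, inputs: Sequence[float]) -> List[float]:
    """Linear outputs for one input pattern."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def tune(weights: Genome, examples: Sequence[Example], rate: float = 0.1) -> Genome:
    """One delta-rule pass over the examples; returns a tuned copy of the weights."""
    tuned = [row[:] for row in weights]
    for inputs, targets in examples:
        outputs = forward(tuned, inputs)
        for row, out, target in zip(tuned, outputs, targets):
            for j, x in enumerate(inputs):
                row[j] += rate * (target - out) * x
    return tuned


def lamarckian_step(population: List[Genome], examples: List[Example],
                    evaluate: Callable[[Genome], float], sample_size: int,
                    rng: random.Random) -> List[Tuple[float, Genome]]:
    """Tune each genome on a sample of examples, then score the tuned version.

    Returns (score, tuned genome) pairs sorted with the lowest score first.
    """
    scored = []
    for genome in population:
        sample = rng.sample(examples, min(sample_size, len(examples)))
        tuned = tune(genome, sample)
        scored.append((evaluate(tuned), tuned))  # Lamarckian: keep the tuned weights
    scored.sort(key=lambda pair: pair[0])
    return scored


# Hypothetical examples and a stand-in score (squared error instead of game play).
examples = [((1.0, 0.0), (1.0,)), ((0.0, 1.0), (0.0,))]


def squared_error(weights: Genome) -> float:
    return sum((t - o) ** 2
               for inputs, targets in examples
               for t, o in zip(targets, forward(weights, inputs)))


population = [[[0.0, 0.0]], [[1.0, 1.0]]]
scored = lamarckian_step(population, examples, squared_error, 2, random.Random(0))
print(len(scored))
```

In a non-Lamarckian variant the tuned copy would be used only for scoring and the original genome would be returned to the gene pool unchanged.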
![Page 33: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/33.jpg)
Experiments
Comparanda
• Unguided evolution (as before)
◦ 5,000 generations
◦ fitness = 1 / pillage rate
• Lamarckian neuroevolution
◦ 5,000 generations
◦ various amounts of training per generation
· sample sizes: 5, 10, 20, 50, 100, 200, ... 10,000
· only report results for 5,000 in the paper
· in general, more examples → better results, higher cost
• Backpropagation
◦ 20,000 epochs
◦ full 11,000 examples per epoch
![Page 34: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/34.jpg)
Experiments
Results (i)
Coarse Behavior Metrics
[Plot: Average Test Game Score (pillage rate) vs. Test Example Classification Error Rate (%); series: Unguided evolution, $L_0$ human play, $L_0$ backpropagation, $L_0$ guided evolution, $L_1$ human play, $L_1$ backpropagation, $L_1$ guided evolution]
![Page 35: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/35.jpg)
Experiments
Results (ii)
Rule Induction – $L_0$ Results
[Plot: Average $L_0$ Violations vs. Game Turn; series: Unguided, Backpropagation, Lamarckian]
![Page 36: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/36.jpg)
Experiments
Results (iii)
Rule Induction – $L_1$ Results
[Plot: Average $L_1$ Violations vs. Game Turn; series: Unguided, Backpropagation, Lamarckian]
![Page 37: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/37.jpg)
Experiments
Results (iv)
Rule Induction – $L_0$ Results
[Plot: Average Safety Violations vs. Game Turn; series: Unguided, Backpropagation, Lamarckian]
![Page 38: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/38.jpg)
Experiments
Results (v)
Rule Induction – $L_1$ Results
[Plot: Average Safety Violations vs. Game Turn; series: Unguided, Backpropagation, Lamarckian]
![Page 39: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/39.jpg)
Experiments
Results (vi)
Rule Induction – $L_0$ Results
[Plot: Average Legion Distance vs. Game Turn; series: Unguided, Backpropagation, Lamarckian]
![Page 40: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/40.jpg)
Experiments
Results (vii)
Rule Induction – $L_1$ Results
[Plot: Average Legion Distance vs. Game Turn; series: Unguided, Backpropagation, Lamarckian]
![Page 41: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/41.jpg)
Related Work
• Policy induction with rule-based systems
◦ behavioral cloning (Sammut et al. 1992)
◦ KnoMic (van Lent and Laird 2001)
◦ Style Machines (Brand and Hertzmann 2000)
• Social robotics / mimetic algorithms
◦ surveyed in Nicolescu (2003)
• Advice systems
◦ RL advice unit (Kuhlmann et al. 2004)
◦ applique networks (Yong et al. 2005)
• Inverse reinforcement learning (Ng and Russell 2000)
• User modeling
◦ adaptive user interfaces
◦ drama management for interactive fiction (AIIDE’07)
![Page 42: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/42.jpg)
Conclusions
• Some applications require intuitively correct behaviors
• Policy induction can simplify creating such behaviors
• Lamarckian neuroevolution can implement PI for some applications
◦ beats backpropagation on enforcing rule conformance
◦ more power and efficiency are needed
◦ raises questions about how success can be measured
Related papers and movies can be found at www.cse.unr.edu/~bdbryant/#ref-research-publications
![Page 43: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/43.jpg)
Acknowledgments
This research was sponsored in part by the Digital Media
Collaboratory at the IC2 Institute at the University of Texas at Austin.
It builds on earlier research supported in part by the
National Science Foundation under grant IIS-0083776
and the Texas Higher Education Coordinating Board
under grant ARP-003658-476-2001.
CPU time for the experiments was made possible by
NSF grant EIA-0303609, using the Mastodon cluster at UT-Austin.
Some of the images are taken from the open-source game Freeciv.
![Page 44: Inductive Agent Modeling in Games - part 1](https://reader034.fdocuments.in/reader034/viewer/2022042807/6268a8e1855e34613d0c11d3/html5/thumbnails/44.jpg)
References
Gomez, F. (2003). Robust Non-Linear Control Through Neuroevolution. PhD thesis, Department of Computer Sciences, The University of Texas at Austin.
Kuhlmann, G., Stone, P., Mooney, R., and Shavlik, J. (2004). Guiding a reinforcement learner with natural language advice: Initial results in RoboCup soccer. In The AAAI-2004 Workshop on Supervisory Control of Learning and Adaptive Systems.
Ng, A. Y., and Russell, S. (2000). Algorithms for inverse reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, 663–670. San Francisco: Morgan Kaufmann.
Nicolescu, M. N. (2003). A Framework for Learning From Demonstration, Generalization and Practice in Human-Robot Domains. PhD thesis, University of Southern California.
Sammut, C., Hurst, S., Kedzier, D., and Michie, D. (1992). Learning to fly. In Proceedings of the Ninth International Conference on Machine Learning, 385–393. Aberdeen, UK: Morgan Kaufmann.
van Lent, M., and Laird, J. E. (2001). Learning procedural knowledge through observation. In Proceedings of the International Conference on Knowledge Capture, 179–186. New York: ACM.
Yong, C. H., Stanley, K. O., and Miikkulainen, R. (2005). Incorporating advice into evolution of neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2005): Late Breaking Papers.