Testbed for Integrating and Evaluating Learning Techniques


Page 1: Testbed for Integrating and Evaluating Learning Techniques

Integrating Learning in Interactive Gaming Simulators

Testbed for Integrating and Evaluating Learning Techniques

David W. Aha¹ & Matthew Molineaux²

¹Intelligent Decision Aids Group, Navy Center for Applied Research in AI, Naval Research Laboratory; Washington, DC

²ITT Industries; AES Division; Alexandria, VA

first.surname@nrl.navy.mil

17 November 2004

TIELT

Page 2: Testbed for Integrating and Evaluating Learning Techniques

Outline

1. Motivation: Learning in cognitive systems
2. Objectives:
• Encourage machine learning research on complex tasks that require knowledge-intensive approaches
• Provide industry & military with access to the results
3. Design: TIELT functionality & components
4. Example: Knowledge base content
5. Status:
• Implementation & documentation
• Collaborations & events
• Task list
6. Summary

Thanks to our sponsor:

Page 3: Testbed for Integrating and Evaluating Learning Techniques

DARPA

Defense Advanced Research Projects Agency (~$2.3B/yr)

Offices: IPTO, IXO, MTO, …

Information Processing Technology Office (IPTO)
• Selected previous achievements: Timesharing, Internet, Email, Speech Understanding, LISP, …
• Current focus: Cognitive Systems

Page 4: Testbed for Integrating and Evaluating Learning Techniques

Cognitive Systems

• A cognitive system is one that
– can reason, using substantial amounts of appropriately represented knowledge
– can learn from its experience so that it performs better tomorrow than it did today
– can explain itself and be told what to do
– can be aware of its own capabilities and reflect on its own behavior
– can respond robustly to surprise

“Systems that know what they’re doing”

Page 5: Testbed for Integrating and Evaluating Learning Techniques

Anatomy of a Cognitive Agent

[Diagram of a cognitive agent (Brachman, 2003): sensors and effectors link the external environment to perception and action; reactive, deliberative, and reflective processes operate over short- and long-term memory (concepts, sentences), with communication (language, gesture, image), prediction/planning, other reasoning, affect, attention, and learning throughout.]

Page 6: Testbed for Integrating and Evaluating Learning Techniques

Learning in Cognitive Systems (Langley & Laird, 2002)

Capability: Knowledge Container(s)
• Recognition & Categorization: Patterns, pattern recognizer; categories, pattern categorizer
• Decision Making & Choice: Space of possible decisions; decision selector, conflict resolver; decision application procedure
• Perception & Situation Assessment: Situation categories, situation categorization; information fuser
• Prediction & Monitoring: Environment model; monitoring focus
• Problem Solving & Planning: Plans, plan generator (e.g., search method); plan adaptor
• Reasoning & Belief Maintenance: Beliefs & belief relations; inferencing knowledge and procedures
• Execution & Action: Action executer; action utility; action preconditions; action effects; resource allocater
• Interaction & Communication: NL interpretation; dialogue coordination
• Remembering & Reflection: Recall procedure; explanation generation

Many opportunities exist for learning in cognitive systems.

Page 7: Testbed for Integrating and Evaluating Learning Techniques

Problem

Status of Learning in Cognitive Systems

Few deployed cognitive systems integrate techniques that exhibit rapid & enduring learning behavior on complex tasks
– It's costly to integrate & evaluate embedded learning techniques

Complication

Machine learning (ML) researchers tend to investigate:
¬Rapid: knowledge-poor algorithms
¬Enduring: learning over a short time period
¬Embedded: stand-alone evaluations

Page 8: Testbed for Integrating and Evaluating Learning Techniques

TIELT Motivation

We want cognitive agents that learn
• rapidly,
• in context, and
• over the long term.

We have few (if any) of them.

Page 9: Testbed for Integrating and Evaluating Learning Techniques

TIELT Objective

Encourage research on learning in cognitive systems, with subsequent transition goals

ML Researchers → Learning Modules → Cognitive Agents That Learn → Military & Industry

Page 10: Testbed for Integrating and Evaluating Learning Techniques

Current ML Research Focus

Benchmark studies of multiple algorithms on simple (e.g., supervised) learning tasks from many static datasets

[Diagram: an ML researcher runs n ML systems (System1 … Systemn) against m databases (Database1 … Databasem), yielding m results per system, which feed a benchmark analysis.]

This was encouraged (in part) by the availability of datasets in a standard (interface) format.
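To make the workflow concrete, here is a minimal sketch (ours, not tied to any particular ML toolkit) of the m-datasets-by-n-systems benchmark loop the diagram depicts; the toy systems and datasets are hypothetical.

```python
from statistics import mean

def evaluate(classify, dataset):
    # Placeholder metric: accuracy of one trained classifier on one dataset.
    return mean(classify(x) == y for x, y in dataset)

def benchmark(systems, datasets):
    # n systems x m datasets -> m accuracy scores per system.
    return {name: [evaluate(clf, d) for d in datasets]
            for name, clf in systems.items()}

# Toy example: two hypothetical 'systems' on two tiny datasets.
data1 = [((0,), 0), ((1,), 1)]
data2 = [((1,), 0), ((0,), 0)]
systems = {"majority": lambda x: 0, "first-feature": lambda x: x[0]}
print(benchmark(systems, [data1, data2]))
```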

Page 11: Testbed for Integrating and Evaluating Learning Techniques

Previous API for ML Investigations

[Diagram: Databasei (standard format, e.g., UCI Repository) ↔ Interface ↔ supervised-learning ML Systemj → Decision Systemk]

Inspiration

UC Irvine Repository of Machine Learning (ML) Databases
• An interface for empirical benchmarking studies on supervised learning
• 1525 citations (and many publications use it w/o citing) since 1986

Limitation

• Only useful for isolated ML studies
• Has not encouraged studies of ML in cognitive systems

Page 12: Testbed for Integrating and Evaluating Learning Techniques

Accomplishing TIELT's Objective

One approach: Shift ML research focus from static datasets to dynamic simulators of rich environments

[Diagram: the previous pipeline (Databasei in a standard format, e.g., the UCI Repository of ML Databases, ↔ Interface ↔ supervised-learning ML Systemj → Decision Systemk) is replaced by one in which a cognitive, learning Decision Systemk containing ML Modulej connects through a standard-API interface (e.g., TIELT) to the sensors and effectors of Worldi (simulated or real).]

Page 13: Testbed for Integrating and Evaluating Learning Techniques

Refining TIELT's Objective

Objective

Develop a tool for evaluating decision systems in simulators
– Specific support for evaluating learning techniques
– Demonstrate research utility prior to approaching industry/military

Benefits

1. Reduces system-simulator integration costs from m*n to m+n (see next)
2. Permits benchmark studies on selected simulator tasks
3. Encourages study of ML for knowledge-intensive problems
4. Provides support for DARPA Challenge Problems on Cognitive Learning

Page 14: Testbed for Integrating and Evaluating Learning Techniques

Reducing Integration Costs

Integrating a simulator & cognitive system: it's expensive (time, $)!

Problem: Prohibitive integration costs retard research progress
– Pairing each of m simulators (Simulator1 … Simulatorm) directly with each of n cognitive systems (Cognitive System1 … Cognitive Systemn) requires m*n integrations.

Proposed solution: Standardize integrations to reduce costs
– With TIELT mediating between the simulators and the cognitive systems, only m+n integrations are needed.
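The cost argument is essentially the adapter pattern: wrap each simulator and each decision system once against a shared interface, and a single mediation loop serves every pairing. A minimal sketch with hypothetical interface names (this is not TIELT's actual API):

```python
from abc import ABC, abstractmethod

class SimulatorAdapter(ABC):
    # Written once per simulator (m adapters total).
    @abstractmethod
    def read_state(self) -> dict: ...
    @abstractmethod
    def apply_action(self, action: dict) -> None: ...

class DecisionSystemAdapter(ABC):
    # Written once per decision system (n adapters total).
    @abstractmethod
    def decide(self, state: dict) -> dict: ...

def mediate(sim: SimulatorAdapter, ds: DecisionSystemAdapter, steps: int):
    # One testbed loop works for any (simulator, decision system) pair.
    for _ in range(steps):
        state = sim.read_state()
        sim.apply_action(ds.decide(state))

# With m simulator adapters and n decision-system adapters written once
# each, all m*n pairings run through mediate(): m+n integrations total.
```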

Page 15: Testbed for Integrating and Evaluating Learning Techniques

What Domain?

Desiderata

1. Available implementations (cheap to acquire & run)
2. Challenging problems for CogSys/ML research
3. Significant interest (academia, military, industry, funding, public)

Simulation Games?

Page 16: Testbed for Integrating and Evaluating Learning Techniques

Gaming Genres of Interest (modified from Laird & van Lent, 2001)

Genre: Action
• Example: Quake, Unreal
• Description: Control a character
• Sub-genres: 1st vs. 3rd person; solo vs. team play
• AI roles: Control enemies

Genre: Role-Playing
• Example: Temple of Elemental Evil
• Description: Be a character (includes puzzle solving, etc.)
• Sub-genres: Solo vs. (massively) multi-player
• AI roles: Control enemies, partners, and supporting characters

Genre: Strategy (real-time, discrete)
• Example: Empire Earth 2, AoE, Civilization
• Description: Controlling at multiple levels (e.g., strategic, tactical warfare)
• Sub-genres: God, first-person perspectives
• AI roles: Control all units and strategic enemies

Genre: Individual Sports
• Example: Many (e.g., driving games)
• Description: Individual competition
• Sub-genres: 1st vs. 3rd person
• AI roles: Control enemy

Genre: Team Sports
• Example: Madden NFL Football
• Description: Act as coach and a key player
• AI roles: Control units and strategic enemy (i.e., other coach), commentator

Page 17: Testbed for Integrating and Evaluating Learning Techniques

Some Game Environment Challenges

• Significant background knowledge available
– e.g., processes, tasks, objects, actions
– Use: provides opportunities for rapid learning
• Adversarial
• Collaborative
• Multiple reasoning levels (e.g., strategic, tactical)
• Real-time
• Uncertainty (“Fog of War”)
• Noise (e.g., imprecision)
• Relational (e.g., social networks)
• Temporal
• Spatial

Page 18: Testbed for Integrating and Evaluating Learning Techniques

Focus: Broad interests

Academia: Learning in Simulation Games

Evidence of commitment

• Interactive Computer Games: Human-Level AI's Killer Application (Laird & van Lent, AAAI'00 invited talk)
• Meetings
– AAAI symposia (several in recent years)
– International Conference on Computers and Games
– AAAI'04 Workshop on Challenges in Game AI
– AI in Interactive Digital Entertainment Conference (2005-)
– …
• New journals focusing on (e.g., real-time) simulation games
– J. of Game Development
– Int. J. of Intelligent Games and Simulation
• Game engines (e.g., GameBots, ORTS, RoboCup Soccer Server); use of (other) open-source engines (e.g., FreeCiv, Stratagus)
• Representation (e.g., Forbus et al., 2001; Houk, 2004; Munoz-Avila & Fisher, 2004)
• Learning opponent unit models (e.g., Laird, 2001; Hill et al., 2002)
• … (see table)

Page 19: Testbed for Integrating and Evaluating Learning Techniques

Survey: Selected Previous Work on Learning & Gaming Simulators

Each entry: name + reference; method; what is learned; performance task; test plan & metrics (independent variables to vary and dependents to measure).

• (Goodman, AAAI'93): Projective visualization; learns one TDIDT per feature cluster; task: predict amount of inflicted damage; evaluation: vary training amount & projection length; predict summed pain.
• MAYOR (Fasciano, 1996 M.S. thesis): Case-based planning; learns plan execution conditions; task: maximize SimCity game score; evaluation: online; vary whether learning was used; measure % successful plan executions.
• (Fogel et al., CCGFBR'96): Genetic algorithm; rule learning; task: 1x1 tank battles; evaluation: vary locations/space of routes; measure damage.
• KnoMic (van Lent & Laird, ICML'98): Production rules; learns rule conditions & goals; task: racetrack mission for TacAir-SOAR; evaluation: measure speed with which KnoMic learned correct control rules.
• (Agogino et al., 1999 NPL): Neuro-evolution; weight & genetic learning; task: 30 gold-collecting peons vs. 1 human; evaluation: vary learning methodology; measure survival rate of peons.
• (Laird, ICAA'01): SOAR chunking; rule learning; task: predict enemy behavior; evaluation: none; would focus on speedup.
• (Geisler, 2002 M.S. thesis): NB, TDIDT, BP, ensembles; learns: depends on the method; task: 4 simple classification tasks; evaluation: vary training set size & #ensembles; measure classification accuracy.
• (Bryant & Miikkulainen, CEC'03): Neuroevolution; learns NN weights, etc.; task: discrete Legions vs. Barbarians; evaluation: offline; vary training set size; measure a game-specific function.
• (Chia & Williams, BRIMS'03): Naïve Bayes; learning to add/delete rules; task: 1x1 tank battles; evaluation: vary adversarial aggressiveness & whether learning occurs; measure #wins.
• (Fagan & Cunningham, ICCBR'03): Case-based prediction; learns which plans to save; task: predict a player's action; evaluation: vary the #stored plans and the user; measure accuracy & prediction frequency.
• (Guestrin et al., IJCAI'03): Relational MDPs; partitions objects; task: beat enemy in 3x3 Freecraft games; evaluation: simplistic; one run.
• (Sweetser & Dennis, 2003 Ent. Computing: Tech. & Applications): Advice giving; learns regression weights; task: just-in-time hints to a human player; evaluation: vary with vs. without providing hints; measure % hints that were useful.
• (Spronck et al., 2004 IJIGS): Dynamic scripting; learns rule weights; task: beat NWN AI in simple scenarios; evaluation: offline; measure average turning point & speed, effectiveness, robustness, & efficiency.
• (Ponsen, 2004 M.S. thesis): Dynamic scripting & GA for rule learning; learns rule weights and new rules; task: defeat Wargus opponent; evaluation: offline; vary map size, learning algorithm, and opponent control algorithm; measure % wins.
• (Ulam et al., AAAI'04 workshop): Self-adaptation; learns task edits; task: defend city (FreeCiv); evaluation: offline; vary trace size; measure % successes.

Page 20: Testbed for Integrating and Evaluating Learning Techniques

Industry: Learning in Simulation Games

Focus: Increase sales via enhanced gaming experience

• USA: $7B in sales in 2003 (ESA, 2004)
– Strategy games: $0.3B
• Simulators: Many! (e.g., SimCity, Quake, SoF, UT)
• Target: Control avatars, unit behaviors

Evidence of commitment

• Developers: “keenly interested in building AIs that might learn, both from the player & environment around them” (GDC'03 Roundtable Report)
• Middleware products that support learning (e.g., MASA, SHAI, LearningMachine)
• Long-term investments in learning (e.g., iKuni, Inc.)
• Conferences:
– Game Developers Conference
– Computer Game Technology Conference

Page 21: Testbed for Integrating and Evaluating Learning Techniques

Industry: Learning in Simulation Games

Some Promising Techniques (Rabin, 2004)
• Belief networks for probabilistic inference
• Decision tree learning
• Genetic algorithms (e.g., for offline parameter tuning)
• Statistical prediction (e.g., using N-grams to predict future events)
• Neural networks (e.g., for offline applications)
• Player modeling (e.g., to regulate game difficulty, model reputation)
• Reinforcement learning
• Weakness modification learning (e.g., don't repeat failed strategies)

Status

• Few deployed systems have used learning (Kirby, 2004): e.g.,
1. Black & White: on-line, explicit (player immediately reinforces behavior)
2. C&C Renegade: on-line, implicit (agent updates set of legal paths)
3. Re-Volt: off-line, implicit (GA tunes racecar behaviors prior to shipping)
• Problems: performance, constraints (preventing learning “something dumb”), trust in the learning system

Page 22: Testbed for Integrating and Evaluating Learning Techniques

Military: Learning in Simulation Games

Focus: Training, analysis, & experimentation

• Learning: Acquisition of new knowledge or behaviors
• Simulators: JWARS, OneSAF, Full Spectrum Command, etc.
• Target: Control strategic opponent or own units

Evidence of commitment

• “Learning is an essential ability of intelligent systems” (NRC, 1998)
• “To realize the full benefit of a human behavior model within an intelligent simulator, …the model should incorporate learning” (Hunter et al., CCGBR'00)
• “Successful employment of human behavior models…requires that [they] possess the ability to integrate learning” (Banks & Stytz, CCGBR'00)
• Conferences: BRIMS, I/ITSEC

Status: No CGF simulator has been deployed with learning (D. Reece, 2003)

Some problems (Petty, CGFBR'01):
• Cost of training phase
• Loss of training control
• Learning non-doctrinal behaviors
• Learning unpredictable behaviors

Page 23: Testbed for Integrating and Evaluating Learning Techniques

Analysis: Conclusions

State-of-the-art

1. Research on learning in complex gaming simulators is in its infancy
• Knowledge-poor approaches are limited to simple performance tasks
• Knowledge-intensive approaches require huge knowledge bases, which to date have been manually encoded
2. Existing approaches have many simplifying assumptions
• Scenario limitations (e.g., on number and/or capabilities of adversaries)
• Learning is (usually) performed only off-line
• Learned knowledge is not transferred (e.g., to playing other games)

Significant advances would include:

1. Fast acquisition approaches for a large amount of domain knowledge
• This would enable rapid learning without requiring manual encoding
2. Demonstrations of on-line learning (i.e., within a single simulation run)
3. Increasing knowledge transfer among tasks & simulators over time
• e.g., knowledge of processes, strategies, tasks, roles, objects, & actions

Page 24: Testbed for Integrating and Evaluating Learning Techniques

TIELT Specification

1. Simplifies integration & evaluation!
• Learning-embedded decision systems & gaming simulators
• Supports communications, game model, performance task, evaluation
• Free & available
2. Learning foci
• Task (e.g., learn how to execute, or advise on, a task)
• Player (e.g., accept advice, predict a player's strategies)
• Game (e.g., learn/refine its objects, their relations, & behaviors)
3. Learning methods
• Supervised/unsupervised, immediate/delayed feedback, analytic, active/passive, online/offline, direct/indirect, automated/interactive
• Learning results should be available for inspection
4. Gaming simulators: Those with challenging learning tasks
5. Reuse:
• Communications are separated from the game model & performance task
• Provides access to libraries of simulators & decision systems

Page 25: Testbed for Integrating and Evaluating Learning Techniques

Distinguishing TIELT

Each entry: system (focus; game engines; prominent feature; reasoning activity).
• DirectIA (MASA): AI SDK; FPS, RTS, etc.; behavior authoring; sense-act, …
• SimBionic (SHAI): AI SDK; FPS, etc.; behavior authoring; sense-act, …
• FEAR: AI SDK; Quake 2, etc.; behavior authoring; sense-act, …
• RoboCup: research testbed; RoboCup Soccer; game play; sense-act, coaching, etc.
• GameBots: research testbed; UT (FPS); UT game play; sense-act
• ORTS: research testbed; RTS games; hack-free MM RTS; sense-act, strategy
• TIELT: research testbed; several genres; experimentation for evaluating learning & learned behaviors; sense-act, advice processing, prediction, model updating, etc.

TIELT additionally:
1. Provides an interface for message-passing interfaces
2. Supports composable system-level interfaces

Page 26: Testbed for Integrating and Evaluating Learning Techniques

TIELT: Integration Architecture

[Diagram: TIELT's internal communication modules sit between a selected game engine (drawn from a game engine library, e.g., Stratagus, Full Spectrum Command, played by one or more game players) and a selected decision system (drawn from a decision system library of reasoning systems with learning modules). The TIELT user works through TIELT's user interface, which provides KB editors plus evaluation, prediction, coordination, and advice interfaces. Knowledge base libraries hold the five selected/developed knowledge bases: Game Model (GM), Game Interface Model (GIM), Decision System Interface Model (DSIM), Agent Description (AD), and Experiment Methodology (EM). Learned knowledge is inspectable.]

Page 27: Testbed for Integrating and Evaluating Learning Techniques

TIELT's Knowledge Bases

• Game Model: Defines the interpretation of the game (e.g., initial state, classes, operators, behaviors (rules)); behaviors could be used to provide constraints on learning
• Game Interface Model: Defines communication processes with the game engine
• Decision System Interface Model: Defines communication processes with the decision system
• Agent Description: Defines what decision tasks (if any) TIELT must support
• Experiment Methodology: Defines selected performance tasks (taken from the Game Model description) and the experiment to conduct

Page 28: Testbed for Integrating and Evaluating Learning Techniques

TIELT: Supported Performance Tasks

Types of problem-solving tasks:
• Analysis: classification, diagnosis
• Synthesis: planning, design (structural, parametric), scheduling
• Decision support

Performance vs. learning tasks
• Performance: application of the learned knowledge (e.g., classification)
• Learning: activity of the learning system (e.g., updating weights in a neural net)

TIELT users will define complex, user-configurable performance tasks.

Page 29: Testbed for Integrating and Evaluating Learning Techniques

An Example Complex Learning Task

Task description: Win a real-time strategy game. This involves several challenging learning tasks.

Subtasks and supporting operations

1. Diagnosis: Identify (computer and/or human) opponent strategies & goals
• Classification: Opponent recognition
• Recording: Actions of opponents and their effects (this repeatedly involves classification)
• Diagnosis: Identify goal(s) being solved by these effects
• Classification: Identify goal(s) that, if solved, prevent opponent goals
2. Planning: Select/adapt or create a plan to achieve goals and win the game
• Classification: Select top-level actions to achieve goals; iteratively identify necessary sub-goals and, finally, primitive actions
• Design (parametric): Identify a good initial layout of controllable assets
3. Execute the plan
• Recording: Collect measures of effectiveness, to provide feedback
• Planning: If needed, re-plan at Step 2, based on feedback

Page 30: Testbed for Integrating and Evaluating Learning Techniques

Use: Controlling a Game Character

[Diagram: the integration architecture of slide 26, annotated with the data flow for character control: the game engine sends the raw state to TIELT, which produces a processed state for the decision system; the decision system returns a decision, which TIELT translates into an action for the game engine.]

Page 31: Testbed for Integrating and Evaluating Learning Techniques

UT Example: Game Model

Classes
Player
  Team: String
  Number: Integer
  Position: Location
Location
  x: Integer
  y: Integer
  z: Integer

State Description
  Players: Array[ ] of Player
  Self: Player
  Score: Integer
  …

Operators
Shoot(Player)
  Preconditions: Player.isVisible
  Effects: Player.Health -= rand(10)
MoveTo(Location)
  Preconditions: Location.isReachable()
  Effects: Self.Position == Location
…

Rules
GetShotBy(Player)
  Preconditions: Player.hasLineOfSight(Self)
  Effects: Self.Health -= rand(10)
EnemyMovements(Enemy, Location1, Location2)
  Preconditions: Location2.isReachableFrom(Location1), Enemy.Position == Location1
  Effects: Enemy.Position == Location2
…
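As an illustration, the declarative model above might be encoded roughly as follows; the class and operator names mirror the slide, but the encoding itself is a hypothetical sketch, not TIELT's game-model format.

```python
import random
from dataclasses import dataclass

@dataclass
class Location:
    x: int
    y: int
    z: int

@dataclass
class Player:
    team: str
    number: int
    position: Location
    health: int = 100
    is_visible: bool = True

def shoot(target: Player) -> bool:
    # Operator Shoot(Player): precondition Player.isVisible,
    # effect Player.Health -= rand(10).
    if not target.is_visible:
        return False                        # precondition failed
    target.health -= random.randint(0, 10)
    return True

def move_to(self_player: Player, loc: Location, reachable: bool = True) -> bool:
    # Operator MoveTo(Location): precondition Location.isReachable(),
    # effect Self.Position == Location. Reachability is stubbed as a flag.
    if not reachable:
        return False
    self_player.position = loc
    return True
```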

Page 32: Testbed for Integrating and Evaluating Learning Techniques

UT Example: Game Interface Model

Action Templates
TURN(Pitch: real, Yaw: real, Roll: real)
SETWALK(Walk: boolean) // Start walking or running
RUNTO(Target: integer) // ID of object in world
…

Sensor Templates
CWP(Weapon: integer) // Change weapon to the weapon with this ID
FLG(Id: integer, Reachable: boolean, State: Symbol <held, dropped, home>)
…

Example interface messages from the GameBots API
• http://www.planetunreal.com/gamebots/docapi.html

Communication
Medium: TCP/IP, port 3000
Message format: <name> {<attr1> <value1>} {<attr2> <value2>} …
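A minimal parser for the message format quoted above ("<name> {<attr1> <value1>} …"); this is a sketch for illustration, not the official GameBots client code.

```python
import re

# In GameBots the lines would arrive over TCP/IP (port 3000, per the slide);
# here we just parse one line of the "<name> {<attr> <value>} ..." format.
_PAIR = re.compile(r"\{(\S+)\s+([^}]*)\}")

def parse_message(line: str):
    # 'FLG {Id 42} {Reachable True}' -> ('FLG', {'Id': '42', 'Reachable': 'True'})
    name, _, rest = line.strip().partition(" ")
    return name, dict(_PAIR.findall(rest))

print(parse_message("FLG {Id 42} {Reachable True} {State held}"))
```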

Page 33: Testbed for Integrating and Evaluating Learning Techniques

UT Example: Decision System Interface Model

Example Template Messages Sent by TIELT
InitializeGameRules(ruleSet: Array[ ] of Rule)
SendStateUpdates(CurrentState: Array[ ] of Object)
LoadScenario(SavedGameFilename: String)
…

Template Messages Received by TIELT
GiveAdvice(AdviceMessage: String)
PerformAction(OperatorName: String, Parameters: Array[ ] of String)
AskForValue(AttributeName: String)
…

Communication
Medium: standard I/O
Message format: (<name> <value1> <value2> <value3> … )
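A sketch of reading and writing the standard-I/O message format above; whitespace-delimited values are assumed, and the message names come from the templates listed (the filename is hypothetical).

```python
import sys

def send(name, *values):
    # Write one "(<name> <value1> <value2> ...)" message to standard output.
    sys.stdout.write(f"({name} {' '.join(map(str, values))})\n")
    sys.stdout.flush()

def receive(line):
    # '(PerformAction MovePod 3 7)' -> ('PerformAction', ['MovePod', '3', '7'])
    parts = line.strip().lstrip("(").rstrip(")").split()
    return parts[0], parts[1:]

send("LoadScenario", "Save01.sav")            # hypothetical filename
print(receive("(PerformAction MovePod 3 7)"))
```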

Page 34: Testbed for Integrating and Evaluating Learning Techniques

UT Example: Agent Description

Think-Act Cycle (choose one each cycle):
• Shoot Something → call the Shoot operator
• Pick up a Healthpack → call the Pickup operator
• Go Somewhere Else → ask the decision system: “Where do I go?”
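A sketch of how such a think-act cycle might look in code; the state fields and the decision-system query are hypothetical stand-ins for what the Agent Description would specify.

```python
def think_act(state, decision_system):
    # One pass: shoot, grab health, or ask the decision system where to go.
    if state.get("enemy_visible"):
        return ("Shoot", state["enemy"])            # call the Shoot operator
    if state.get("health", 100) < 30 and state.get("healthpack_nearby"):
        return ("Pickup", state["healthpack"])      # call the Pickup operator
    # Otherwise defer: ask the decision system "Where do I go?"
    return ("MoveTo", decision_system.where_do_i_go(state))
```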

Page 35: Testbed for Integrating and Evaluating Learning Techniques

UT Example: Experiment Methodology

Initialization
Game Model: Unreal Tournament.xml
Game Interface: GameBots.xml
Decision System: MyUTBot.xml
Runs: 100
Call slowdown(0.5)

Metrics
FragCount: Self.kills
FragsPerSecond: Self.kills / LengthOfGame
AverageHealth: mean of Self.health

Plot
FragCount vs. runs
AverageHealth vs. # of players
FragsPerSecond vs. outdegree of net nodes
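A sketch of the experiment driver this methodology implies: repeat the scenario for the configured number of runs and accumulate the three metrics for plotting. run_episode() is a hypothetical stand-in for one TIELT-mediated game.

```python
def run_experiment(run_episode, runs: int = 100):
    metrics = {"FragCount": [], "FragsPerSecond": [], "AverageHealth": []}
    for _ in range(runs):
        kills, game_length, health_samples = run_episode()
        metrics["FragCount"].append(kills)
        metrics["FragsPerSecond"].append(kills / game_length)
        metrics["AverageHealth"].append(sum(health_samples) / len(health_samples))
    return metrics   # e.g., plot metrics["FragCount"] against the run index
```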

Page 36: Testbed for Integrating and Evaluating Learning Techniques

Use: Predicting Opponent Actions

[Diagram: the same integration architecture, annotated with the prediction data flow: the game engine sends the raw state to TIELT, which passes a processed state to the decision system; the decision system returns a prediction through TIELT's prediction interface.]

Page 37: Testbed for Integrating and Evaluating Learning Techniques

Use: Updating a Game Model

[Diagram: the same integration architecture, annotated with the model-update data flow: the game engine sends the raw state to TIELT, which passes a processed state to the decision system; the decision system returns edits that update the Game Model knowledge base.]

Page 38: Testbed for Integrating and Evaluating Learning Techniques

TIELT: A Researcher Use Case

[Diagram: the integration architecture, highlighting which knowledge bases a researcher defines versus selects.]

1. Define/store decision system interface model
2. Select game simulator & interface
3. Select game model
4. Select/define performance task(s)
5. Define/select experiment methodology
6. Run experiments
7. Analyze displayed results

Page 39: Testbed for Integrating and Evaluating Learning Techniques

TIELT: A Game Developer Use Case

[Diagram: the integration architecture, highlighting which knowledge bases a game developer defines versus selects.]

1. Define/store game interface model
2. Define/store game model
3. Select decision system/interface
4. Define performance task(s)
5. Define/select experiment methodology
6. Run experiments
7. Analyze displayed results

Page 40: Testbed for Integrating and Evaluating Learning Techniques

TIELT's Internal Communication Modules

[Diagram: the user edits the five knowledge bases through dedicated editors (Game Interface Model, Decision System Interface Model, Game Model, Agent Description, Experiment Methodology). At run time, percepts from the selected game engine flow through a Model Updater into the Current State; a Controller coordinates with a Learning Translator (Mapper), which sends a translated model subset and learning task to the selected decision system; learning outputs return through an Action Translator (Mapper), which issues actions/control back to the game engine. An Evaluator (with the Evaluation Interface) measures the performance task, an Advice Interface surfaces advice, and databases hold the stored state and engine state.]

Page 41: Testbed for Integrating and Evaluating Learning Techniques

Sensing the Game State (city placement example, inspired by Alpha Centauri, etc.)

[Diagram: the Game Engine sends sensor messages into TIELT, where the Model Updater, guided by the Game Interface Model (authored via its editor), updates the Current State and informs the Controller; the Action Translator returns actions to the Game Engine.]

1. In the Game Engine, the game begins; a colony pod is created and placed.
2. The Game Engine sends a “See” sensor message identifying the pod's location.
3. The Model Updater receives the sensor message and finds the corresponding message template in the Game Interface Model.
4. This message template provides updates (instructions) to the Current State, telling it that there is a pod at the location See describes.
5. The Model Updater notifies the Controller that the See event has occurred.
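Steps 2-5 amount to: match the incoming sensor message against a Game Interface Model template, apply the template's updates to the Current State, and notify the Controller. A minimal sketch with hypothetical mirrors of those components:

```python
def handle_sensor_message(msg, templates, current_state, controller):
    name, attrs = msg                  # e.g., ("See", {"object": "pod", "x": 3, "y": 7})
    template = templates[name]         # step 3: find the message template
    for update in template(attrs):     # step 4: template -> Current State updates
        current_state.update(update)
    controller.notify(name)            # step 5: tell the Controller

# A "See" template placing the seen object at its (x, y) map cell.
see_template = lambda a: [{(a["x"], a["y"]): a["object"]}]

state, events = {}, []
class Controller:
    def notify(self, event): events.append(event)

handle_sensor_message(("See", {"object": "pod", "x": 3, "y": 7}),
                      {"See": see_template}, state, Controller())
print(state, events)   # {(3, 7): 'pod'} ['See']
```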

Page 42: Testbed for Integrating and Evaluating Learning Techniques

Fetching Decisions from the Decision System (city placement example)

[Diagram: inside TIELT, the Controller and Current State feed the Learning Translator, which, guided by the Decision System Interface Model and Agent Description (authored via their editors), exchanges messages with the selected Decision System and its learning modules; outputs return to the Action Translator.]

1. The Controller notifies the Learning Translator that it has received a See message.
2. The Learning Translator finds a city-location task, which is triggered by the See message. It queries the Controller for the learning mode, then creates a TestInput message to send to the reasoning system with information on the pod's location and the map from the Current State.
3. The Learning Translator transmits the TestInput message to the Decision System.
4. The Decision System transmits output to the Action Translator.
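A sketch of steps 1-3: a task triggered by the See message packages the pod's location and the map from the Current State into a TestInput message for the decision system. Names are hypothetical mirrors of the slide's components.

```python
def on_see(current_state, learning_mode, send_to_decision_system):
    # Step 2: the city-location task fires; gather the pod's location
    # and the map from the Current State, then build a TestInput message.
    pod_at = next(pos for pos, obj in current_state.items() if obj == "pod")
    test_input = ("TestInput", {"task": "city-location",
                                "mode": learning_mode,
                                "pod": pod_at,
                                "map": dict(current_state)})
    send_to_decision_system(test_input)   # step 3; step 4 is the reply

on_see({(3, 7): "pod"}, "online", print)
```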

Page 43: Testbed for Integrating and Evaluating Learning Techniques

Acting in the Game World (city placement example)

[Diagram: the Decision System's output flows through TIELT's Action Translator, guided by the Decision System Interface Model and Game Interface Model, out to the Game Engine, the Advice Interface, or the Prediction Interface.]

1. The Action Translator receives a TestOutput message from the Decision System.
2. The Action Translator finds the TestOutput message template, determines it is associated with the city-location task, and builds a MovePod operator (defined by the Current State) with the parameters of TestOutput.
3. The Action Translator determines that the Move action from the Game Interface Model is triggered by the MovePod operator and binds Move using information from MovePod.
4.a. The Game Engine receives Move and updates the game to move the pod toward its destination, or
4.b/c. The Advice Interface receives Move and displays advice to a human player on what to do next, or a prediction is made via the Prediction Interface.
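A sketch of steps 1-4a: the TestOutput parameters are bound first to the MovePod operator and then to the game's Move action. Again, the names are hypothetical mirrors of the slide's components.

```python
def on_test_output(test_output, send_to_game_engine):
    _, params = test_output                    # step 1: ("TestOutput", {...})
    # Step 2: build the MovePod operator from TestOutput's parameters.
    move_pod = {"operator": "MovePod", "dest": (params["x"], params["y"])}
    # Step 3: MovePod triggers the Game Interface Model's Move action.
    move_action = ("MOVE", move_pod["dest"])   # hypothetical action template
    send_to_game_engine(move_action)           # step 4a (or advice/prediction)

on_test_output(("TestOutput", {"x": 5, "y": 2}), print)
```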

Page 44: Testbed for Integrating and Evaluating Learning Techniques

Implementation

TIELT Status (November 2004)

• TIELT (v0.5) available
• Features
– Message protocols
• Current: Console I/O, TCP/IP, UDP
• Future: Library calls, HLA interface, RMI (possibly)
– Message content: configurable
• Instantiated templates tell it how to communicate with other modules
– Initialization messages: Start, Stop, Load Scenario, Set Speed
– Game Model representations (w/ Lehigh University)
• Simple programs
• TMK process models
• PDDL (language used in planning competitions)

Page 45: Testbed for Integrating and Evaluating Learning Techniques

TIELT Status (November 2004)

Documentation

• TIELT User's Manual (82 pages)
1. TIELT Overview
2. The TIELT User Interface
3. Scripting in TIELT
4. Theory of the Game Model
5. Communications
6. TMK Models
7. Experiments

• TIELT Tutorial (45 pages)
1. The Game Model
2. The Game Interface Model
3. Decision System Interface Model
4. Agent Description
5. Experiment Methodology

Page 46: Testbed for Integrating and Evaluating Learning Techniques

TIELT Status (November 2004)

Access

• TIELT www site (new)
• Selected components
– Documents: documentation, publications, XML spec
– Status
– Forum: a full-featured web forum/bulletin board
– Bug Tracker: TIELT bug/feature tracking facility
– FAQ-o-Matic: questions and problem solutions; user-driven
– Download

Page 47: Testbed for Integrating and Evaluating Learning Techniques

TIELT Issues (November 2004)

1. Communication
[Diagram: TIELT connects over TCP/IP and, via SWIG, through library calls.]
TIELT is a multilingual application; this provides interfacing with many different games.

2. Resources for learning to use TIELT
• TIELT Scripting syntax highlighting
• Map of TIELT component interactions (thanks, Megan)
• Typed script interface

Page 48: Testbed for Integrating and Evaluating Learning Techniques

TIELT Issues (November 2004)

3. Game Model formatting
To no one's surprise, everyone agrees that TIELT's Game Model representation is inadequate.

Requests have been made for:
• 3D maps (Quake)
• A different programming language
• A relational operator representation
• Standardized events

“We're working on it”

Page 49: Testbed for Integrating and Evaluating Learning Techniques

TIELT Collaborations (2004-05)

[Diagram: the integration architecture populated with 2004-05 collaborators.]
• Game library: EE2 (Mad Doc), ToEE (Troika), FreeCiv (NWU), ISLE
• Platform library: Stratagus (Lehigh U.), FSC/R (USC/ICT), UrbanTerror (UT Arl.), RoboCup (U. Minn-D.)
• Decision system library (learning modules): Soar (U. Mich), ICARUS (ISLE), DCA (UT Arlington), Neuroevolution (UT Austin), others (many)

Page 50: Testbed for Integrating and Evaluating Learning Techniques

TIELT Collaboration Projects (2004-05)

Each entry: organization; game interface and model; decision system; tasks and evaluation methodology.
• Mad Doc Software: Empire Earth 2 (RTS)
• Troika Games: Temple of Elemental Evil (RPG)
• ISLE: SimCity (~RTS); ICARUS; ICARUS w/ FreeCiv, design
• Lehigh U.: Stratagus/Wargus (RTS) and HTN/TMK designs; case-based planner (CBP); Wargus/CBP
• NWU: FreeCiv (discrete strategy) and qualitative game representations
• U. Michigan: SOAR; SOAR w/ 2 games (e.g., FSW, ToEE), design
• U. Minnesota-Duluth: RoboCup (team sports); advice-taking components; advice processing
• USC/ICT: Full Spectrum Command (RTS); SOAR with FSC
• UT Arlington: Urban Terror (FPS); DCA (lite version)
• UT Austin: Neuroevolution; e.g., Neuroevolution/EE2

Page 51: Testbed for Integrating and Evaluating Learning Techniques

Games Being Integrated with TIELT

Commercial
1. Empire Earth II (Mad Doc S/W): RTS; civilization focus; god perspective
2. Temple of Elemental Evil (Troika): role-playing; solve quests; 1st-person perspective
3. SimCity (ISLE): RTS; city manager; god perspective

Freeware
1. FreeCiv (NWU) (~Civilization): discrete strategy; civilization focus; god perspective
2. Wargus (Lehigh U.) (~Warcraft II): RTS; civilization focus; god perspective
3. Urban Terror (UT Arlington): FPS; shooter; 1st-person perspective
4. RoboCup Soccer (UW): team sports; team of agents; behavior-designer perspective

Military
• Full Spectrum Command (USC/Inst. for Creative Technologies): RTS; leading an Army light infantry company; 1st-person perspective

Page 52: Testbed for Integrating and Evaluating Learning Techniques

Promising Learning Strategies

Each entry: learning strategy; description; when to use; justification.

• Advice Giving: An expert explains how to perform in a given state (the only interactive strategy listed here). Use when speedup is needed & an expert is available. Permits quick acquisition of specific and general domain knowledge.
• Backpropagation: Trains a 3-layer neural network (NN) of sigmoidal hidden units. Use when the target is a non-linear function and offline training is OK. Many learning tasks are non-linear, and some can be performed off-line.
• Case-Based Reasoning: Uses/adapts solutions from experience to solve similar problems. Use when cases complement an incomplete domain model and problem-solving speed is crucial. Quicker to adapt cases than to reason from scratch, but requires domain-specific adaptation knowledge.
• Chunking: Compiles a sequence of steps into a macro. Use for tasks requiring speedup. Transforms a complex reasoning task into a fast retrieval task.
• Dynamic Scripting: RL for tasks whose large state spaces can, with domain knowledge, be collapsed into a smaller set. Use when a small set of states exists, with a set of rules for each. Greatly speeds up the RL approach, but requires analysis of task states.
• Evolutionary Computation: Evolutionary (genetic) selection on a population of genomes, where the application dictates their representation. Use when the search space is huge and training can be done offline. Genome representations can be task-specific, so this powerful search method can be tuned for the task.
• Meta-Reasoning: After a failure, identifies its type & the task that failed, retrieves a task-specific strategy to avoid this failure, and updates its model. Use to support self-adaptation. Although knowledge-intensive, this is an excellent method for changing problem-solving strategies.
• Neuroevolution: Uses a separate genetic-algorithm population to learn each hidden unit's weights in a NN. Use to support cooperating heterogeneous agents. A good offline agent-based learning approach for multi-agent gaming.
• Reinforcement Learning (RL): Reinforces a sequence of decisions after problem solving is completed. Use when the reward is known only after the sequence ends and blame can be ascribed. A well-understood paradigm for learning action policies (i.e., what action to perform in a given state).
• Relational MDPs: Learn a Markov decision process over objects & their relations using probabilistic relational models. Use when seeking knowledge transfer (KT) to similar environments. KT is crucial for learning quickly, and feasibly, for some tasks.
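Since dynamic scripting recurs in both the survey and this table, here is a minimal sketch of the idea (after Spronck et al., 2004): select a script's rules in proportion to their weights, then reward or punish the weights of the rules actually used. The rulebase, reward scheme, and parameters are hypothetical.

```python
import random

def select_script(weights: dict, size: int) -> list:
    # Sample rules for this episode's script in proportion to their weights.
    rules = list(weights)
    return random.choices(rules, weights=[weights[r] for r in rules], k=size)

def update_weights(weights: dict, used: list, won: bool,
                   delta: float = 0.1, floor: float = 0.01) -> None:
    # Reward the rules in a winning script, punish those in a losing one.
    for rule in set(used):
        weights[rule] = max(floor, weights[rule] + (delta if won else -delta))

weights = {"rush": 1.0, "defend": 1.0, "expand": 1.0}   # hypothetical rulebase
script = select_script(weights, size=2)
update_weights(weights, script, won=True)
print(script, weights)
```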

Page 53: Testbed for Integrating and Evaluating Learning Techniques

TIELT-General Game Player Integration (with Stanford University's Michael Genesereth)

GGP provides:
• Logical game formalisms
• Access to remote players
• WWW access

TIELT provides:
• Experiment design/control capabilities
• Common game engine interface
• Support for several learning approaches

GGP-TIELT will:
• Play an entire class of general games as well as TIELT-integrated gaming simulators
• Compete remotely against reference players and other GGP systems
• Define evaluation methodologies for learning experimentation
• Participate in the AAAI'05 GGP Competition

Integration architecture
[Diagram: the GGP test bed and GGP competitors connect over the WWW to TIELT, TIELT-ready GGP competitors, and reference opponents.]

Page 54: Testbed for Integrating and Evaluating Learning Techniques

Upcoming Events

1. National Conference on AI (AAAI'05; 24-28 July; Pittsburgh)
– General Game Playing Competition ($10K prize)
2. Int. Joint Conference on AI (IJCAI'05; 30 July-5 August; Edinburgh)
– Workshop: Reasoning, Representation, and Learning in Gaming Simulation Tasks (tentative title)
3. Int. Conference on ML (ICML'05; 7-11 August; Bonn)
– Workshop submission in progress
4. Int. Conference on CBR (ICCBR'05; 23-26 August; Chicago)
– Workshop & Competition: CBR in Games

Page 55: Testbed for Integrating and Evaluating Learning Techniques

Summary

TIELT: Mediates between a (gaming) simulator and a learning-embedded decision system
• Goals:
– Simplify running learning experiments with cognitive systems
– Support DARPA challenge problems in learning
• Designed to work with many types of simulators & decision systems

Status:
• TIELT (v0.5 alpha) completed in 10/04
– User's Manual, Tutorial, & www site exist
• 10 collaborating organizations (1-year contracts)
– Enhances the probability that TIELT will achieve its goals
• We're planning several TIELT-related events

Page 56: Testbed for Integrating and Evaluating Learning Techniques


Backup Slides

Page 57: Testbed for Integrating and Evaluating Learning Techniques


Metrics

Industry perspective

1. Ability to develop learned/learning behaviors of interest
2. Time required to:
   • develop game interface & model KBs, and
   • develop these behaviors
3. Availability of learning-embedded reasoning systems
4. Support for both off-line and on-line learning


Research perspective

1. Time required to develop a reasoning interface KB
2. Ability to design/facilitate a selected evaluation methodology
3. Expressiveness of KB representation
4. Breadth of learning techniques supported
5. Breadth of learning and performance tasks supported
6. Availability of integrated gaming simulators & challenges


Page 58: Testbed for Integrating and Evaluating Learning Techniques


Some Expected User Metrics

Performance tasks

1. Some standards
   • e.g., classification accuracy, ROC analyses, precision & recall
2. Decision-making speed and accuracy
3. Plan execution quality (e.g., time to execute, mission-specific Measures of Effectiveness)
4. Number of constraint violations
5. Ability to transfer learned knowledge


Page 59: Testbed for Integrating and Evaluating Learning Techniques


TIELT: Potential Learning Challenge Problems

1. Learn to win a game (i.e., accomplish an objective)
   • e.g., solve a challenging diplomacy task, provide a realistic military training course facing intelligent adversaries, or help users to develop real-time cognitive reasoning skills for a defined role in support of a multi-echelon mission


2. Learn an adversary’s strategy
   • e.g., predict a terrorist group’s plan and/or tactics, suggest appropriate responses to prevent adversarial goals, help users identify characteristics of adversarial strategies


3. Learn crucial processes of an environment
   • e.g., learn to improve an incorrect/incomplete game model so that it more accurately/reliably defines objects/agents in the game, their behaviors, their capabilities, and their limitations


4. Intelligent situation assessment
   • e.g., learn which factors in the simulation require attention to accomplish different types of tasks


Page 60: Testbed for Integrating and Evaluating Learning Techniques


Example Game: FreeCiv (Discrete-time strategy)

http://www.freeciv.org

Civilization II (MicroProse)

• Civilization II (1996-): 850K+ copies sold
  – PC Gamer: Game of the Year Award winner
  – Many other awards
• Civilization series (1991-): Introduced the civilization-based game genre


FreeCiv (Civ II clone)

• Open source freeware
• Discrete strategy game
• Goal: Defeat opponents, or build a spaceship
• Resource management
  – Economy, diplomacy, science, cities, buildings, world wonders
  – Units (e.g., for combat)
• Up to 7 opponent civs
• Partial observability


Page 61: Testbed for Integrating and Evaluating Learning Techniques


Previous FreeCiv/Learning Research

(Ulam et al., AAAI’04 Workshop on Challenges in Game AI)

• Title: Reflection in Action: Model-Based Self-Adaptation in Game Playing Agents
• Scenarios:
  – City defense: Defend a city for 3000 years


Page 62: Testbed for Integrating and Evaluating Learning Techniques


FreeCiv CP Scenario

General description

• Game initialization: Your only unit, a “settler”, is placed randomly on a random world (see Game Options below). Players cyclically alternate play.
• Objective: Obtain the highest score, conquer all opponents, or build the first spaceship
• Scoring: The “basic” goal is to obtain 1000 points. Game options affect the score.
  – Citizens: 2 pts per happy citizen, 1 per content citizen
  – Advances: 20 pts per World Wonder, 5 per “futuristic” advance
  – Peace: 3 pts per turn of world peace (no wars or combat)
  – Pollution: -10 pts per square currently polluted
• Top-level tasks (to achieve a high score):
  – Develop an economy
  – Increase population
  – Pursue research advances
  – Opponent interactions: Diplomacy and defense/combat


Game Option                  Y1             Y2            Y3
World size                   Small          Normal        Large
Difficulty level             Warlord (2/6)  Prince (3/6)  King (4/6)
#Opponent civilizations      5              5             7
Level of barbarian activity  Low            Medium        High
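The scoring rules above are concrete enough to express as arithmetic. Below is a minimal sketch of the basic scoring computation; the GameSummary fields are our own naming, not FreeCiv’s internal representation.

```python
# Minimal sketch of the basic scoring rules listed above. The dataclass
# fields are invented names, not FreeCiv's internal state representation.
from dataclasses import dataclass

@dataclass
class GameSummary:
    happy_citizens: int
    content_citizens: int
    world_wonders: int
    futuristic_advances: int
    peace_turns: int
    polluted_squares: int

def basic_score(g: GameSummary) -> int:
    return (2 * g.happy_citizens        # 2 pts per happy citizen
            + 1 * g.content_citizens    # 1 pt per content citizen
            + 20 * g.world_wonders      # 20 pts per World Wonder
            + 5 * g.futuristic_advances # 5 pts per "futuristic" advance
            + 3 * g.peace_turns         # 3 pts per turn of world peace
            - 10 * g.polluted_squares)  # -10 pts per polluted square
```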

Page 63: Testbed for Integrating and Evaluating Learning Techniques


FreeCiv CP Information Sources

Concepts in an Initial Knowledge Base

• Resources: Collection and use
  o Food, production, trade (money)
• Terrain:
  o Resources gained per turn
  o Movement requirements
• Units:
  o Type (military, trade, diplomatic, settlers, explorers)
  o Health
  o Combat: Offense & defense
  o Movement constraints (e.g., land, sea, air)
• Government types (e.g., anarchy, despotism, monarchy, democracy)
• Research network: Identifies constraints on what can be studied at any time
• Buildings (e.g., cost, capabilities)
• Cities
  o Population growth
  o Happiness
  o Pollution
• Civilizations (e.g., military strength, aggressiveness, finances, cities, units)
• Diplomatic states & negotiations

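As a rough sketch of how a few of these concepts might be encoded in an initial knowledge base, the classes below use invented names and fields; they are illustrative only, not TIELT’s Game Model schema.

```python
# Rough sketch of a few knowledge-base concepts from the list above.
# All class and field names are invented for illustration.
from dataclasses import dataclass, field
from enum import Enum

class UnitType(Enum):
    MILITARY = "military"
    TRADE = "trade"
    DIPLOMATIC = "diplomatic"
    SETTLER = "settler"
    EXPLORER = "explorer"

@dataclass
class Unit:
    type: UnitType
    health: int
    offense: int
    defense: int
    movement: set          # e.g., {"land"}, {"sea"}, {"air"}

@dataclass
class City:
    population: int
    happiness: float
    pollution: int
    buildings: list = field(default_factory=list)
```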

Page 64: Testbed for Integrating and Evaluating Learning Techniques


FreeCiv CP Decisions

Civilization decisions
• Choice of government type (e.g., democracy)
• Distribution of income devoted to research, entertainment, and wealth goals
• Strategic decisions affecting other decisions (e.g., coordinated unit movement for trade)

City decisions
• Production choice (i.e., what to create, including city buildings and units)
• Citizen roles (e.g., laborers, entertainers, or specialists), and laborer placement
  – Note: Locations vary in their terrain, which generates different amounts of food, income, and production capability

Unit decisions
• Task (e.g., where to build a city, whether/where to engage in combat, espionage)
• Movement

Diplomacy decisions
• Whether to sign a proffered peace treaty with another civilization
• Whether to offer a gift

Page 65: Testbed for Integrating and Evaluating Learning Techniques


FreeCiv CP Decision Space

Variables
• Civilization-wide variables
  o N: Number of civilizations encountered
  o D: Number of diplomatic states (that you can have with an opponent)
  o G: Number of government types available to you
  o R: Number of research advances that can be pursued
  o I: Number of partitions of income into entertainment, money, & research
• U: #Units
  o L: Number of locations a unit can move to in a turn
• C: #Cities
  o Z: Number of citizens per city
  o S: Citizen status (i.e., laborer, entertainer, doctor)
  o B: Number of choices for city production

Decision complexity per turn (for a typical game state)

• O(D^N · G·R·I · L^U · (S^Z·B)^C); this ignores both other variables and domain knowledge
  o This becomes large with the number of units and cities
  o Example: N=3; D=5; G=3; R=4; I=10; U=25; L=4; C=8; Z=10; S=3; B=10
  o Size of decision space (i.e., possible next states): 2.5×10^65 (in one turn!)
  o Comparison: The decision space of chess per turn is well below 140 (e.g., 20 at the first move)
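Because the exponents in this formula are easy to misread in flat text, a quick numeric check is useful. The sketch below (the function name is our own) evaluates the formula for the example values above and reproduces the ~2.5×10^65 figure.

```python
# Evaluate the per-turn decision-space size O(D^N * G*R*I * L^U * (S^Z * B)^C).
def decision_space(N, D, G, R, I, U, L, C, Z, S, B):
    return D**N * G * R * I * L**U * (S**Z * B)**C

# Typical state from this slide: prints ~2.5e+65 possible next states
print(f"{decision_space(N=3, D=5, G=3, R=4, I=10, U=25, L=4, C=8, Z=10, S=3, B=10):.1e}")
```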

Page 66: Testbed for Integrating and Evaluating Learning Techniques


FreeCiv CP: A Simple Example Learning Task

Situation
• We’re England (e.g., London)
• Barbarians are north (in red)
• Two other civs exist
• Our military is weak

What should we do?
• Ally with Wales? If so, how?
• Build a military unit? Which?
• Improve defenses?
• Increase the city’s production rate?
• Build a new city to the south? Where?
• Research “Gun Powder”? Or…?
• Move our diplomat back to London?
• A combination of these?

What information could help with this decision?
• Previous similar experiences
• Generalizations of those experiences
• Similarity knowledge
• Adaptation knowledge
• Opponent model
• Statistics on barbarian strength, etc.
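These information sources are the raw material of a case-based approach: retrieve the most similar prior experience and reuse (or adapt) its decision. A minimal retrieval sketch follows, where the feature encoding and stored cases are invented for illustration.

```python
# Minimal sketch of case retrieval over previous similar experiences.
# The feature encoding and the stored cases are invented for illustration.
def similarity(a: dict, b: dict) -> float:
    """Inverse Manhattan distance over shared numeric features."""
    dist = sum(abs(a[k] - b[k]) for k in a if k in b)
    return 1.0 / (1.0 + dist)

case_base = [
    ({"military": 2, "barbarians_near": 1, "allies": 0}, "improve defenses"),
    ({"military": 5, "barbarians_near": 0, "allies": 1}, "build new city south"),
]

def retrieve(situation: dict) -> str:
    """Return the stored decision from the most similar prior experience."""
    return max(case_base, key=lambda case: similarity(situation, case[0]))[1]

print(retrieve({"military": 1, "barbarians_near": 1, "allies": 0}))  # "improve defenses"
```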

Page 67: Testbed for Integrating and Evaluating Learning Techniques


Decision Space Size

Analysis of the Example Learning Task

Situation
• D: 3 (war, neutral, peace)
• N: Only 1 other civilization contacted (i.e., Wales)
• G: 2 government types known
• R: 4 research advances available
• I: 5 partitions of income available
• L: ~14 per unit
• U: 3 units (1 external, 2 in city)
• C: 1 city
  – S: 3 (entertainer, laborer, doctor)
  – Z: 6 citizens
  – B: 5 units/buildings it can produce

• 1.2×10^9 possible next states
• This reduces to ~32 sensible choices after applying some domain knowledge
  – e.g., don’t change diplomatic status now, keep units in the city for defense, don’t change government now (because it’ll slow production), keep the external unit away from danger

Complexity function

• O(D^N · G·R·I · L^U · (S^Z·B)^C)
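As a check, plugging this slide’s values into the decision_space sketch shown on the previous decision-space slide (N=1, D=3, G=2, R=4, I=5, U=3, L=14, C=1, Z=6, S=3, B=5) yields 1,200,225,600 ≈ 1.2×10^9, matching the figure above and making the gap to the ~32 knowledge-pruned choices concrete.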

Page 68: Testbed for Integrating and Evaluating Learning Techniques


FreeCiv CP: Learning Opportunities

Learn to keep citizens happy
• Citizens in a city who are unhappy will revolt; this temporarily eliminates city production
• Several factors influence happiness (e.g., entertainment, military presence, gov’t type)

Learn to obtain diplomatic advantages
• Countries at war tend to have decreased trade, lose units and cities, etc.
• Diplomats can sometimes obtain peace treaties or otherwise end wars
• Unit movement decisions can also impact opponents’ diplomatic decisions

Learn how to wage war successfully
• Good military decisions can yield new cities/citizens/trade, but losses can be huge
• Unit decisions can benefit from learning tactical coordinated behaviors
• The selection of military unit(s) for a task depends on the opponent’s capabilities

Learn how to increase territory size
• Initially, unexplored areas are unknown; their resources (e.g., gold) cannot be harvested
• Exploration needs to be balanced with security
• City placement decisions influence territory expansion

Page 69: Testbed for Integrating and Evaluating Learning Techniques


FreeCiv CP: Example Learned Knowledge

Learn what playing strategy to use in each adversarial situation

[Table: rows = combat strength advantage (None, Unfavorable, Favorable); columns = current diplomatic status with opponent (Allied, Peace, Neutral, Distrustful, War); each cell gives the learned strategy. Legend: Attack, Retreat!, Fortify, Trade, Seek Peace, Bribe.]

Strategy to use per adversarial situation

• Situations are defined by relative military strength, diplomatic status, whether the opponent has strong alliances, locations of forces, etc.
• Selecting a good playing strategy depends on many of these variables

Page 70: Testbed for Integrating and Evaluating Learning Techniques


What Techniques Could Learn the Task of Selecting a Playing Strategy?

Meta-reasoning (e.g., Ulam et al., AAAI’04 Wkshp on Challenges in Game AI)

• Requires knowledge on:
  1. Tasks being performed
  2. Types of failures that can occur when performing these tasks
     • T2: Overestimate own strength, underestimate enemy strength, …
     • T3: Incorrect assessment of enemy’s diplomatic status, …
  3. Strategies for adapting these tasks
     • S1: Increase military strength
     • S2: Assess distribution of enemy forces
     • S3: Consider enemy’s diplomatic history
  4. Mapping of failure types in (2) to adaptation strategies in (3)
     • Example: We decided to Attack, but underestimated enemy strength. This failure is indexed to strategy S2, which we’ll apply from now on in T2. (See the sketch after the diagram below.)


[Diagram: task decomposition. T1: Determine Playing Strategy comprises T2: Assess Military Advantage, T3: Assess Diplomatic Status, and T4: Select Strategy (Attack, Retreat!, Fortify, Trade, Seek Peace, Bribe).]
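One simple reading of item (4) is a table from (task, failure type) pairs to adaptation strategies that grows as failures are diagnosed. A minimal sketch, with all identifiers invented to echo the T2/S2 example above:

```python
# Minimal sketch of meta-reasoning bookkeeping: map diagnosed (task, failure)
# pairs to adaptation strategies. Identifiers echo the T2/S2 example above
# and are otherwise invented for illustration.
adaptation_map = {
    ("T2", "underestimated enemy strength"): "S2",  # re-assess enemy force distribution
    ("T3", "wrong diplomatic assessment"): "S3",    # consult enemy's diplomatic history
}

def adapt(task, failure):
    """Return the adaptation strategy indexed for a diagnosed failure, if any."""
    return adaptation_map.get((task, failure))

def record_failure(task, failure, strategy):
    """Index a newly diagnosed failure to the strategy that repaired it."""
    adaptation_map[(task, failure)] = strategy
```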

Page 71: Testbed for Integrating and Evaluating Learning Techniques


Challenges for Using Learning via Meta-Reasoning

How can its background knowledge be learned (efficiently)?

• i.e., tasks, failure types, failure adaptation strategies, mappings
• Also, the agent needs to understand how to diagnose an error (i.e., identify which task failed and its failure type)

Can we scale it to more challenging learning problems?

• Currently, it has only been applied to simpler tasks
  – “Defend a City” (in FreeCiv)
• More difficult would be “Play Entire Game”

What if only incomplete background knowledge exists?

• Could complementary learning techniques apply it?
  – e.g., Relational MDPs (which handle uncertainty)
• Could learning techniques be used to extend/correct it?
  – e.g., Learning from advice, case-based reasoning

Page 72: Testbed for Integrating and Evaluating Learning Techniques


Full Spectrum Command & Warrior (http://www.ict.usc.edu/disp.php?bd=proj_games)

Focus: US Army training tools (deployed @ Ft Benning & Afghanistan)

1. Full Spectrum Command (PC-based simulator)
   – Role: Commander of a U.S. Army light infantry Company (120 soldiers)
   – Tasks: Interpret the assigned mission, organize the force, plan strategically, & coordinate the actions of the Company
2. Full Spectrum Warrior (MS Xbox-based simulator)
   – Role: Light infantry squad leader
   – Tasks: Complete assigned missions safely

Organization: USC’s Institute for Creative Technologies

• POC: Michael van Lent (Editor-in-Chief, Journal of Game Development)
• Goal: Develop immersive, interactive, real-time training simulations to help the Army create decision-making & leadership-development tools

Page 73: Testbed for Integrating and Evaluating Learning Techniques


METAGAME (Pell, 1992)

Focus: Learn strategies to win any game in a pre-defined category

• Initial category: “Chess-like” games
  – Games are produced by a game generator
• Input: Rules on how to play the game
  – A move grammar is used to communicate actions
• Output (desired): A winning playing strategy

e.g., Knight-Zone Chess

Annual Competition based on METAGAME

• Title: General Game Playing (games.stanford.edu)
• Champion: Michael Genesereth (Stanford U.)
• AAAI’05 Prize: $10K

[Diagram: a Game Manager, holding games, records, and temporary state data, exchanges percepts, actions, and clocks with each Player and renders graphics for spectators.]

Page 74: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: Mad Doc Software

Summary

• PI: Ron Rosenberg (Producer)
• Experience:
  – Mad Doc is a leader in real-time strategy games; Empire Earth II is expected to sell in the millions of copies
  – CEO Ian Davis (CMU PhD in Robotics) is a well-known collaborator with the AI research community, and gave an invited presentation at AAAI’04. He will work with Ron on this contract.
• Deliverables: Mad Doc (RTS) game simulator API
  – This will be used by multiple other collaborators


Page 75: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: Troika Games

Summary

• PI: Tim Cain, Joint-CEO
• Experience:
  – Troika has outstanding experience with developing state-of-the-art role-playing games, including Temple of Elemental Evil (ToEE)
  – A game developer since 1982, Tim obtained an M.S. with a focus on machine learning at UC Irvine in the late 1980s.
• Deliverables: ToEE (RPG) game simulator API
  – This will be used by some other collaborators (e.g., U. Michigan)


Page 76: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: ISLE

Summary

• PIs: Dr. Seth Rogers, Dr. Pat Langley
• Experience:
  – ISLE (Institute for the Study of Learning and Expertise) is known for its ICARUS cognitive architecture, which is distinguished in part by its commitment to ground every symbol with a physical world object
  – Pat Langley, founder of the journal Machine Learning, is known for his expertise in cognitive architectures and evaluation methodologies for learning systems.
• Deliverables:
  – ICARUS reasoning system API
  – FreeCiv agent (with assistance from NWU) and SimCity agent
    • This will also be used by USC/ICT
  – SimCity (RTS) game simulator API


Page 77: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: Lehigh U.

Summary

• PI: Prof. Héctor Muñoz-Avila
• Experience:
  – Héctor is an expert on hierarchical planning technology, and in particular has expertise in case-based planning
  – Collaborating with NRL on TIELT during CY04 on (1) Game Model description representations, (2) a Stratagus/Wargus game simulator API, and (3) feedback on TIELT usage
• Deliverables:
  – Software for translating among Game Model representations
  – Stratagus/Wargus (RTS) game simulator API
    • This may be used by UT Austin
  – Case-based planning reasoning system API


Page 78: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: NWU

Summary

• PIs: Prof. Ken Forbus, Prof. Tom Hinrichs
• Experience:
  – Ken is a leading AI/games researcher. He is also the leading worldwide researcher in computational approaches to reasoning by analogy.
  – Ken’s group has extensive experience with qualitative reasoning approaches and with using the FreeCiv gaming simulator.
• Deliverables:
  – FreeCiv (Discrete Strategy) game simulator API
    • This will be used by ISLE
  – Qualitative spatial reasoning system for FreeCiv API


Page 79: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: U. Michigan

Summary

• PI: Prof. John Laird
• Experience:
  – John is the best-known AI/games researcher, and has extensive experience with integrating many commercial, freeware, and military game simulators with the Soar cognitive architecture.
• Deliverables:
  – Soar reasoning system API
    • This will be used by USC/ICT
  – Applications of Soar to two game simulators (e.g., ToEE, Wargus)


Page 80: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: USC/ICT

Summary

• PI: Dr. Michael van Lent
• Experience:
  – Extensive implementation experience with AI/game research; his PhD advisor was John Laird.
  – Led ICT’s development of Full Spectrum Warrior and Full Spectrum Command (FSC) in collaboration with Quicksilver Software and the Army’s PEO STRI. FSC is deployed at Ft. Benning and in Afghanistan.
  – Editor-in-Chief, Journal of Game Development
• Deliverables:
  – FSC (RTS) game simulator API
  – Applications of FSC with U. Michigan’s Soar and ISLE’s ICARUS


Page 81: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: UT Arlington

Summary

• PIs: Prof. Larry Holder, G. Michael Youngblood
• Experience:
  – Larry has extensive experience with developing unsupervised machine learning systems that use relational representations, and has led efforts on developing the D’Artagnan cognitive architecture.
• Deliverables:
  – Urban Terror (FPS) game simulator API
  – D’Artagnan reasoning system API (partial)


Page 82: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: UT Austin

Summary

• PI: Prof. Risto Miikkulainen
• Experience:
  – Risto has significant experience with integrating neuro-evolution and similar approaches with game simulators.
  – Collaborating on the UT Austin Digital Media Laboratory’s development of the NERO (FPS) game simulator
• Deliverables:
  – Knowledge-intensive neuro-evolution reasoning system API
  – Applications of this API using other simulators (e.g., FSC, Wargus) and U. Wisconsin’s advice processing module


Page 83: Testbed for Integrating and Evaluating Learning Techniques


Collaborator: U. Wisconsin

Summary

• PIs: Prof. Jude Shavlik (UW), Prof. Richard Maclin (U. Minn-Duluth)
• Experience:
  – Jude advised the first significant M.S. thesis on applying machine learning to FPS game simulators (Geisler, 2002)
  – Maclin, who will be on sabbatical at U. Wisconsin during this project, has performed extensive work applying AI techniques (e.g., advice processing) to the RoboCup game simulator
• Deliverables:
  – RoboCup (team sports) game simulator API
  – Advice processing module
  – WWW-based repository for TIELT software components (e.g., APIs)
