Transfer Learning Site Visit August 4, 2006
Report of the ISLE Team
Pat Langley, Tom Fawcett, Daniel Shapiro
Institute for the Study of Learning and Expertise
Results from Year 1 for the ISLE Team
ISLE: Transfer in ICARUS (PI: Pat Langley)
ICARUS Architecture
Architecture Components• Conceptual inference: Icarus performs bottom-up inference from relational ground state literals to higher level state concepts.
• Skill execution: Icarus retrieves relevant skills for goals and executes them reactively.
• Skill learning: Icarus acquires general hierarchical reactive skills that explain/generate successful solution paths.
• Value learning: Icarus employs reinforcement learning to acquire a value function over game states using a factored state representation (hierarchy of first-order predicates)
[ICARUS architecture diagram]
• Long-Term Conceptual Memory: contains relational and hierarchical knowledge about relevant concepts
• Conceptual Inference: generates beliefs using the observed environment and long-term conceptual knowledge
• Perception: creates an internal description of the perceived environment
• Perceptual Buffer: contains descriptions of the perceived objects
• Short-Term Conceptual Memory: contains inferred beliefs about the environment
• Long-Term Skill Memory: contains hierarchical knowledge about executable skills
• Problem Solving: finds novel solutions for achieving goals
• Skill Learning: acquires new skills based on successful problem-solving traces
• Skill Retrieval: selects relevant skills based on beliefs and goals
• Short-Term Goal/Skill Memory: contains goals and intentions
• Skill Execution: executes skills on the environment
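The value-learning component described above estimates a value function over a factored, predicate-based state representation. A minimal sketch of that idea, using TD(0) with a linear value function (the predicates and states are hypothetical, not drawn from ICARUS):

```python
# Sketch of value learning over a factored state representation:
# TD(0) with a linear value function, one weight per state predicate.
# Feature names and states are illustrative, not from ICARUS itself.

def features(state):
    """Map a state (a set of true ground predicates) to a 0/1 feature dict."""
    predicates = ["holds-flag", "near-goal", "enemy-visible"]
    return {p: 1.0 if p in state else 0.0 for p in predicates}

def value(weights, state):
    return sum(weights[p] * v for p, v in features(state).items())

def td_update(weights, state, reward, next_state, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    delta = reward + gamma * value(weights, next_state) - value(weights, state)
    for p, v in features(state).items():
        weights[p] += alpha * delta * v
    return weights

weights = {"holds-flag": 0.0, "near-goal": 0.0, "enemy-visible": 0.0}
s, s2 = {"near-goal"}, {"near-goal", "holds-flag"}
weights = td_update(weights, s, reward=1.0, next_state=s2)
```

Only the weights of features active in the current state move, which is what makes the factored representation cheap to update.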
Testbeds: Urban Combat and GGP
• Urban Combat: first-person real-time shooter game; goal: find and defuse IEDs; addressed by learning new skills
• GGP: logically defined, arbitrary rules of play; addressed by learning a value function over game states
Results
• Urban Combat: evaluation ongoing
• GGP: transfer ratio of 1.3 on TL 7, jump start of roughly 20
TL Metrics (Reward)

Metric                        Score       P Value
Transfer ratio                1.3009      0.2715
Transfer ratio (truncated)    1.4375      0.1660
Transfer ratio (smoothed)     -0.6753     0.8055
Jump start                    28.5714     0.0900
Jump start (smoothed)         19.2857     0.1735
ARR (narrow)                  0.6833      0.1900
ARR (wide)                    -INFINITY   0.4005
Asymptotic advantage          -INFINITY   0.5885
Ratio (area under curves)     0.7687      0.6645
Transfer difference           464.2869    0.3335
Transfer difference (scaled)  (not reported)
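The metrics in this table are computed from paired learning curves (with and without transfer). A sketch of two of them, using common textbook formulations that may differ in detail from the evaluation's exact definitions:

```python
# Sketch of two transfer-learning metrics computed from learning curves
# (lists of per-trial reward). These are common formulations; the
# evaluation's exact definitions may differ.

def jump_start(transfer_curve, scratch_curve):
    """Initial-performance advantage of the transfer learner."""
    return transfer_curve[0] - scratch_curve[0]

def transfer_ratio(transfer_curve, scratch_curve):
    """Ratio of areas under the two learning curves."""
    return sum(transfer_curve) / sum(scratch_curve)

# Illustrative curves: the transfer learner starts higher and converges
# to the same asymptote as learning from scratch.
with_transfer = [30, 40, 50, 55]
from_scratch = [10, 25, 40, 55]
print(jump_start(with_transfer, from_scratch))     # 20
print(transfer_ratio(with_transfer, from_scratch))
```

Here the jump start is the difference in initial reward, and the transfer ratio compares areas under the two curves.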
University of Michigan: Transfer in Soar (PI: John Laird)
Payoff
Problem/Objective
Solution Approach/Accomplishments
• Study transfer learning using multiple online architectural learning mechanisms:
  • Chunking (EBL)
  • Reinforcement learning
  • Semantic learning
  • Episodic learning
• Determine strengths and weaknesses
• Develop reasoning strategies that maximize transfer
• Fair comparison of learning mechanisms: all use the same performance system
• Integration and synthesis of multiple learning mechanisms and reasoning strategies on the same problem
• Not reliant on one mechanism: the best technique is used for a given problem; positive interaction between methods
• Integrated Soar & Urban Combat Testbed
• Three learning approaches in UCT (levels 0-2); significant transfer
[Soar architecture diagram: Body; Long-Term Memories (Procedural, Semantic, Episodic); Short-Term Memory; Decision Procedure; learning mechanisms: Chunking, Episodic Learning, Semantic Learning, Reinforcement Learning; Perception and Action]
Level   Memory   Search   RL
0       31.3     38.7     22.8
1       10.4     n/a      6.8
2       9.6      8.1      1.1
Northwestern University: Companions (PIs: Kenneth D. Forbus, Thomas Hinrichs)
Payoff
Problem/Objective
Solution Approach/Accomplishments
• Extend Companion Cognitive Systems architecture to achieve transfer learning
• Advance analogical processing technology
• Develop techniques for learning self-models
• Test using ETS Physics testbed
• New techniques for robust near and far transfer learning
• Advances can be incorporated in other cognitive architectures, systems
• Near term: Analogy Servers
• Long-range: Companions architecture used in military/intelligence systems
• Today's cluster is tomorrow's multi-core laptop
• Analogy approach based on how humans seem to do transfer
• Study worked solutions to learn equations, modeling assumptions
• e.g., when could something be a point mass?
• Pilot experiment: Achieved transfer levels 1, 3, & 5
[Diagram: learning to solve problems by studying worked solutions. An ETS-generated AP Physics test and worked solutions (sketches included in Y2-3) feed learned strategies, encoding rules, and cases]
UT Arlington: Urban Combat Testbed (UCT) (PIs: L. Holder, M. Youngblood, D. Cook)
• Develop Urban Combat Testbed (UCT), a simulated, real-time, urban combat domain
• Agent interface provides detailed, real-time perceptual information and command execution
• Human interface provides compelling video interface and keyboard/mouse command interface
• Develop scenarios for human and agent trials for each level of transfer
• Execute human and agent trials; compare transfer learning performance
• Investigate other approaches to transfer learning:
  • Human transfer learning
  • Hierarchical reinforcement learning
  • Agent-based cognitive architectures
• UCT version 1.0 available
• Based on the Quake 3 Arena first-person shooter (FPS) game
• Enhanced to include realistic urban combat environments
• Agent version provides an interface to game percepts and commands
• Human-player version provides a standard interface, as in commercial FPS games
• Under development:
  • Set of scenarios to evaluate different levels of transfer learning
  • Random generation of scenarios
  • Ability to log game interaction
Technical Details / Highlights
Vision/Goals
• Develop Urban Combat Testbed (UCT) capable of generating tasks to evaluate transfer learning performance
• Conduct significant human trials to evaluate human transfer learning performance
• Disseminate UCT to the community as a benchmarking tool for cognitive performance
• Investigate novel cognitive architectures for achieving transfer learning in Urban Combat and similar domains
• Achieve 70% of human transfer learning performance
UT Arlington: Reinforcement Learning (PI: M. Huber)
Technical Approach / Benefits and New Capabilities
Integration and Deliverables / Example and Performance
Transfer of skill and concept hierarchies from training to transfer tasks
• Transfer skills and concepts are found automatically and carry probability and value attributes
• Transfer skills are extracted based on local system characteristics in the task domain
  • Sub-skills are reward-independent
  • Transfer skills have an associated probabilistic model
• Hierarchical concepts capture the capabilities of the skill set
  • Concepts capture the probabilistic behavior of skills
  • Concepts capture value attributes of the task domain
• The generated representation hierarchy and refinement process have bounded-optimality properties
  • Policies learned on the representation are within a bound of optimal
• The approach provides skill and concept hierarchies for use as representations by reasoning systems
• Provides probabilistic and utility information to representation hierarchies in ICARUS
  • Explicit tie between reasoning structure and reinforcement learning
• Generates new, capability-specific concepts that could serve as new predicates in Markov Logic Networks (MLNs)
  • Probabilistic attributes can facilitate fast integration into MLNs
Integration and Delivery Milestones

              Year 1                     Year 2                    Year 3
Integration:  ICARUS int.                MLN                       MLN/ICARUS
Development:  Skill utility              Skill generalization      Skill extension
Deliverables: Prototype w. UCT interface Prototype w. skill gen.   Final system
Urban Combat Testbed (UCT)
• Training task: go to a flag; transfer task: retrieve a different flag
• Transfer from the training to the transfer task: 29 sub-skills and associated concepts; reduction from 20,000 to 81 states
• Transfer performance (transfer ratio, TR):
  • TR 2.5 with skill transfer
  • TR 5 with skill and concept transfer
[Diagram: skill hierarchy and concept hierarchy feed selective, task-specific state space construction and a hierarchical state representation]
• Task learning: skill and concept extraction
  • Extraction of sub-skills using subgoal discovery
  • Learning of concepts characterizing skill capabilities
• Transferred concepts and skills are used to construct a more abstract bounded-parameter state representation
• Learning on the new, more compact representation leads to improved learning performance
University of Washington: Markov Logic (PI: Pedro Domingos)
Payoff
Problem/Objective
Approach/Accomplishments
- Transfer learning requires:
  - Relational inference & learning
  - Uncertain inference & learning
- Markov logic provides this
  - Simple, general, unified framework
- Needs:
  - Scaling to large problems
  - Online, "lifelong" operation
  - Extension to continuous data
  - Extension to decision-making
- Key approaches:
  - Representation mapping
  - Statistical predicate invention
- Accomplishments to date:
  - LazySAT: efficient use of memory (400,000x less than WalkSAT on BibServ)
  - MC-SAT: fast mixed inference (>1000x faster than Gibbs sampling or tempering)
  - Alchemy system
  - Collaborated on integration with Icarus, etc.
- Enables the highest levels of transfer
  - Between relational structures, as opposed to surface descriptions
- Enables transfer "in the wild"
  - Noisy, rich, real-world domains
  - As opposed to shoehorning problems into standard machine learning form
- Broadly applicable AI technology
  - Greatly increases speed of adaptation
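In Markov logic, each first-order clause carries a weight, and a world's probability is proportional to exp of the weighted count of satisfied clause groundings. A toy propositionalized sketch (the clause, constants, and weight are illustrative; this is not Alchemy's API):

```python
import math
from itertools import product

# Toy Markov logic network: one weighted clause, Smokes(x) => Cancer(x),
# grounded over two constants. P(world) is proportional to
# exp(w * n(world)), where n counts satisfied groundings.
people = ["A", "B"]
w = 1.5  # clause weight (illustrative)

def n_true_groundings(world):
    """Count satisfied groundings of Smokes(x) => Cancer(x)."""
    return sum(1 for x in people
               if (not world[f"Smokes({x})"]) or world[f"Cancer({x})"])

atoms = [f"{p}({x})" for p in ("Smokes", "Cancer") for x in people]
worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=4)]
Z = sum(math.exp(w * n_true_groundings(wd)) for wd in worlds)  # partition fn

def prob(world):
    return math.exp(w * n_true_groundings(world)) / Z
```

Worlds that violate groundings are not impossible, only exponentially less likely, which is what lets Markov logic tolerate noisy, real-world domains.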
[Diagram: Markov logic unifies ILP, weight learning, WalkSAT, and MCMC, supporting mapping from a source domain to a target domain. Plot: number of clauses (millions) vs. number of records (0-500), comparing WalkSAT against LazySAT memory use]
Cycorp (PI: Michael Witbrock)
Payoff
Problem/Objective: knowledge-based transfer learning
• Supply background knowledge and well-encoded, logically meaningful domains and problem spaces
• Elaborate on background knowledge and knowledge gathered from source tasks and domains
• Informed by existing background knowledge in Cyc
[Diagram: the Cyc KB (background knowledge) sits between a source testbed and a target testbed, exchanging situation, status, and queries for advice and support. The system collects knowledge relevant to a task, domain, or problem; elaborates on that knowledge via inferential expansion, probabilistic weighting, and rule formation (ILP); and execution agents perform inference and supply advice, query results, and background knowledge]
• Information flow among complementary learning and transfer mechanisms and approaches
• Establish a well-founded, mutually compatible base of assumptions and facts, which is necessary for transfer
• Allow systems to communicate observations, conclusions, skills, memories, and intentions
• Learning can take full advantage of existing background knowledge, including knowledge from less obviously related domains and problems
• New high-level, semantically connected knowledge within a context of existing knowledge: understanding
Solution Approach/Accomplishments
• Representation of initial domains and solutions
• Existing knowledge relevant to the domains identified
• Physics testing domain: encoding developed; first transfer-level problems represented in Cyc
• Urban Combat (FPS) testbed: map space semantics defined; distribution being developed
• Initial integration of probabilistic reasoning
• System integrated; scalability testing underway
• Alchemy system extended
• Rule and skill learning underway
• First automatically-generated results from evaluation domains
• Application of work from the BUTLER seedling
• New rules and skills: rule induction
• New facts: automated knowledge acquisition
• Expanded knowledge: inference & Markov logic
Maryland/Lehigh: Hierarchical Task Nets (PIs: Dana Nau, Héctor Muñoz-Avila)
Problem/Objective
Solution Approach/Accomplishments
• Learn applicability conditions of HTN methods, which tell how to decompose tasks into subtasks
• Input: plan traces produced by an expert problem-solver
  • Reflects abstraction levels in the game
• Output: methods consistent with the plan traces
  • Can be transferred across different games
• HTNs represent knowledge of different granularity at different levels
  • Facilitates transfer to different games
• Increasingly capable HTN learning algorithms
  • Y1: transfer levels 1-3
  • Y2: transfer levels 4-7
  • Y3: transfer levels 8-10
• Approach: our new HDL algorithm
  • Can start with no prior information
  • Can start with information transferred from a previous learning session
• Accomplishments:
  • Development of the HDL algorithm
  • Theoretical conditions under which HDL achieves full convergence [paper at ICAPS-06]
  • Experiments: even when only halfway to convergence, HDL solved > 3/4 of the test set
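One simple way to learn applicability conditions from plan traces can be sketched as intersecting the state literals observed at each application of a method (a deliberate simplification; HDL itself is more sophisticated, and the literals below are hypothetical):

```python
# Sketch: induce a method's applicability condition as the intersection
# of the state literals that held each time the method was applied in
# the plan traces. Illustrative only; not the HDL algorithm itself.

def learn_condition(trace_states):
    """trace_states: list of sets of literals observed at method application."""
    condition = set(trace_states[0])
    for state in trace_states[1:]:
        condition &= state  # keep only literals true in every example
    return condition

# Hypothetical states from two traces where the same method fired.
traces = [
    {"(at unit base)", "(has unit ammo)", "(daytime)"},
    {"(at unit base)", "(has unit ammo)", "(raining)"},
]
print(sorted(learn_condition(traces)))
```

Each additional trace can only shrink the condition, so the learned preconditions generalize monotonically as more expert traces arrive.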
[Diagram: the HDL++ learning agent connects to the MadRTS real-time strategy game through TIELT, with a scenario generator and statistical methods to compare learning curves; payoff for TL]
Rutgers University: Relational Templates (PI: Michael Pazzani)
Constraint     Clauses evaluated
None           320,968
Unique         195,489
Commutative    165,601
Both            88,230
Approach
Payoff
Problem/Objective
Solution Approach/Accomplishments
• Learn templates from Markov Logic Networks (MLNs)
• Learn Markov Logic Networks (MLNs) from templates
• Learn general concepts and strategies applicable across many domains:
  • Transitivity
  • Thwarting, feigning
• Constrain learning of MLN clauses (speed-up)
• Create templates from MLN clauses by least general generalization
MLN clauses:
  SameVenue(a1,a2) v !SameVenue(a2,a3) v !SameVenue(a3,a1)
  SameTitle(a1,a2) v !SameTitle(a2,a3) v !SameTitle(a3,a1)
Template (least general generalization):
  P(a1,a2) v !P(a2,a3) v !P(a3,a1)
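The least-general-generalization step above can be sketched as anti-unification over predicate symbols: clauses that share signs and argument structure but differ in predicate names collapse into one template with a predicate variable. The clause encoding here is an assumption for illustration:

```python
# Sketch: least general generalization of same-shape MLN clauses.
# Each clause is a list of (sign, predicate, args). If all clauses agree
# on signs and argument patterns, replace the predicate with a variable P.
# Encoding and checks are simplified for illustration.

def lgg_template(clauses):
    first = clauses[0]
    for c in clauses[1:]:
        assert len(c) == len(first), "clauses differ in length"
        for (s1, _, a1), (s2, _, a2) in zip(first, c):
            assert s1 == s2 and a1 == a2, "clauses differ in shape"
    # All literals within each clause here use one predicate, so a
    # single predicate variable "P" suffices.
    return [(sign, "P", args) for (sign, _, args) in first]

venue = [("+", "SameVenue", ("a1", "a2")),
         ("-", "SameVenue", ("a2", "a3")),
         ("-", "SameVenue", ("a3", "a1"))]
title = [("+", "SameTitle", ("a1", "a2")),
         ("-", "SameTitle", ("a2", "a3")),
         ("-", "SameTitle", ("a3", "a1"))]
print(lgg_template([venue, title]))
```

Applied to the two transitivity clauses above, this yields the slide's template P(a1,a2) v !P(a2,a3) v !P(a3,a1).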
UT Austin: Theory Refinement (PI: Mooney)
Summary / Problem/Objective
• Faster learning in the target domain by efficiently transferring probabilistic relational knowledge using bottom-up theory refinement
• Determine an appropriate predicate mapping by searching possible mappings to find the most accurate for the target domain
• Use relational path-finding to more effectively construct new clauses in the target domain
Develop transfer learning methods for Markov Logic Networks (MLNs) that:
• Efficiently revise structure and parameters of learned knowledge from source domain to fit novel target domain.
• Automatically recover an effective mapping of the predicates in the source domain to those in the target domain.
Approach
1. Find an effective predicate mapping.
2. Determine which parts of the source structure are still valid in the target domain and which need to be revised; annotate the source MLN accordingly.
3. Specialize only overly-general clauses and generalize only overly-specific ones, leaving the good ones unchanged.
4. Search for additional clauses using relational path-finding.

Experimental Results (Mihalkova & Mooney, ICML-06 TL workshop)
• Alchemy and our transfer algorithm equally improve accuracy over learning from scratch.
• Our approach decreases learning time and the number of revision candidates significantly.
UT Austin: Reinforcement Learning (PI: Peter Stone)
[Diagram: inter-task mappings β(A')→A over actions and γ(S')→S over states, discovered by I-TAC and SME-QDBN; illustrated with small game boards]
Problem/Objective
• Develop core, architecture-independent, unified transfer learning technology for reinforcement learning
• Key technical idea: transfer via inter-task mapping
  – Generalization of value-function-based transfer
• Automatic discovery of inter-task mappings
  – I-TAC (inter-task action correlation)
  – SME-QDBN (structure mapping + qualitative dynamic Bayes networks)
• Value-function-based transfer and policy-based transfer
• Focus on results in many domains
• Transfer of knowledge among reinforcement learning tasks (within the same domain/testbed): RoboCup Soccer, GGP
• Compare with Icarus GGP performance
Technical Approach
• Automatic discovery of inter-task mappings
  – I-TAC (inter-task action correlation)
    • Data-centered approach
    • Train a classifier to map state-transition pairs to actions in the source
    • Use the classifier and the state mapping to obtain the action mapping
  – SME-QDBN (structure mapping + qualitative dynamic Bayes networks)
    • Knowledge/model-centered approach
    • Represent action models using qualitative DBNs
    • Specialized and optimized SME for QDBNs, using heuristic search
Results
• RoboCup soccer
  – Value-function-based transfer: Sarsa learning, function approximators
  – Policy-based transfer: neuro-evolution (NEAT)
• GGP: value-function-based
  – Using symmetry to scale up the same type of game
  – Identifying game-tree features to transfer among different types of games
[Figure: confusion matrix of target actions (a') vs. source actions (a) for the learned action mapping (the full table appears on the Mapping Value Functions slide), plus RoboCup and GGP results for I-TAC and SME-QDBN: transfer ratios (t.r.) from 4.3 to 5.8 and truncated transfer ratios (t.t.r.) from 73% to 88% on Connect-3 (4x4, same opponent), CaptureGo (3x3, same opponent), and Minichess (5x5)]
• Model/knowledge-oriented approach
• Uses knowledge about how actions affect state variables and how state variables relate to each other
• Uses structure mapping to find similarities between source and target tasks
• Discovers β and γ together
Objective
Technical Approach
Results: Keepaway match scores
• Representation: qualitative dynamic Bayes networks
• Specialized and optimized SME for QDBNs
• SME-QDBN uses heuristic search to find the mapping with the maximal score:
  1. Generate local matches and calculate the conflict set for each local match
  2. Generate initial global mappings based on immediate relations of local matches
  3. Merge global mappings with common structures
  4. Search for a maximal global mapping with the highest score
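The merge-and-search steps can be sketched greedily: take scored local matches and assemble a global mapping by accepting the best matches that do not conflict with ones already chosen. This simplifies SME's exhaustive merge to a single greedy pass, purely for illustration; the match scores below are made up:

```python
# Greedy sketch of the merge/search steps: local matches are scored
# candidate pairs (score, source item, target item); a global mapping is
# built from the best local matches that respect the one-to-one
# constraint. A simplification of SME's merge, for illustration only.

def merge_global_mapping(local_matches):
    """local_matches: list of (score, source, target), higher is better."""
    mapping, used_src, used_tgt, total = {}, set(), set(), 0.0
    for score, src, tgt in sorted(local_matches, reverse=True):
        if src in used_src or tgt in used_tgt:
            continue  # conflicting match: each item maps at most once
        mapping[src] = tgt
        used_src.add(src)
        used_tgt.add(tgt)
        total += score
    return mapping, total

# Hypothetical local matches between source and target actions.
matches = [(0.9, "pass", "kick"), (0.8, "hold", "dribble"),
           (0.4, "pass", "dribble")]
print(merge_global_mapping(matches))
```

The conflict set from step 1 corresponds here to the one-to-one check: once "pass" maps to "kick", the weaker "pass"/"dribble" match is discarded.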
Structure Mapping
• Transfer can be decomposed into two parts
  – Mappings of states (γ) and actions (β)
  – Transforming the representation of value functions (table-based or function approximation)
• Current work focuses on automatic discovery of mappings of state variables and actions
  – Data-oriented approach (I-TAC)
  – Model/knowledge-oriented approach (structure mapping)
• I-TAC is a data-oriented approach to automatic discovery of mappings
• Considers mappings of states (γ) and actions (β) separately
γ(S')→S, β(A')→A
• Assume that γ is given. How can we learn β?

Inter-Task Action Correlation (I-TAC)
Technical Approach
1. Collect transition data in the source domain
2. Train a classifier C from state pairs to actions
3. Collect transition data in the target domain, and define
   β(a') = arg max_a #{all tuples with a' | C(γ(s'1), γ(s'2)) = a}
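The β definition above can be sketched directly: train a classifier on source transitions, label γ-mapped target transitions with it, and take the majority source action for each target action a'. A 1-nearest-neighbor rule stands in for a real classifier, and the transition data is illustrative:

```python
from collections import Counter

# Sketch of I-TAC: learn the action mapping beta from transition data,
# assuming the state mapping gamma is given. The 1-NN "classifier" and
# the toy numeric states are illustrative stand-ins.

def train_classifier(source_transitions):
    """source_transitions: list of ((s1, s2), a) with numeric states."""
    def classify(s1, s2):
        # 1-nearest-neighbor over (s1, s2) pairs
        key = lambda item: abs(item[0][0] - s1) + abs(item[0][1] - s2)
        return min(source_transitions, key=key)[1]
    return classify

def learn_beta(target_transitions, gamma, classify):
    """beta(a') = argmax_a #{(s'1, a', s'2) | C(gamma(s'1), gamma(s'2)) = a}"""
    votes = {}
    for s1p, a_prime, s2p in target_transitions:
        a = classify(gamma(s1p), gamma(s2p))
        votes.setdefault(a_prime, Counter())[a] += 1
    return {a_prime: c.most_common(1)[0][0] for a_prime, c in votes.items()}

source = [((0.0, 0.0), "hold"), ((0.0, 1.0), "pass")]
gamma = lambda s: s / 10.0  # toy state mapping: target scale is 10x source
target = [(0.0, "hold'", 0.5), (0.0, "pass'", 9.0), (1.0, "pass'", 10.0)]
print(learn_beta(target, gamma, train_classifier(source)))
```

The confusion-matrix results below are exactly the vote counts this procedure accumulates before taking the argmax.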
Results (rows: target actions a'; columns: source actions a)

Target action   3v2 Hold     3v2 Pass1   3v2 Pass2
3v2 Hold        382 (100%)   0 (0%)      0 (0%)
3v2 Pass1       0 (0%)       330 (93%)   26 (7%)
3v2 Pass2       2 (<1%)      25 (8%)     297 (92%)
4v3 Hold        227 (76%)    0 (0%)      71 (24%)
4v3 Pass1       1 (<1%)      174 (64%)   97 (36%)
4v3 Pass2       0 (0%)       133 (50%)   133 (50%)
4v3 Pass3       1 (<1%)      51 (24%)    163 (76%)
UT Austin: Mapping Value Functions (PI: Stone)
UT Austin: Feature Construction (PI: Stone)
• Scale up from a small to a large version of the same game
  – Simultaneous update of isomorphic states
  – Exploit symmetry to scale up RL in board games
• Transfer between different small games
  – Table-based learning, but transfer in feature space
  – Automated discovery of state features
  – Initialization by feature matching
  – Two-person, complete-information, turn-taking games
Technical Approach / Results (random opponent)
Objective
[Results vs. random opponent on Connect-3 (4x4), Minichess (5x5), and Othello (4x4); e.g., t.r. = 4.3, t.t.r. = 73%. Figures: feature discovery; features discovered]
• Verify the presence of symmetries on the smaller task (the larger task requires too much memory)
• Transfer knowledge to the larger task (simultaneous backups for up to 8 transitions)
• Feature extraction/matching based on abstract game-tree expansion up to 2 levels
Findings: limited-lookahead features are quick to extract and match, few in number (a manageable knowledge base), highly common/reusable, and faster than minimax lookahead against suboptimal opponents.
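Simultaneous update of isomorphic states can be sketched by pushing each value backup to every symmetric image of a position; for a 3x3 board there are up to 8 such images (rotations and reflections). The board encoding is an assumption for illustration:

```python
# Sketch: simultaneous update of isomorphic states by applying one value
# backup to all symmetries of a 3x3 board (rotations and reflections).
# Boards are row-major 9-tuples; the encoding is illustrative.

def rotate(board):
    """Rotate a 3x3 board 90 degrees clockwise: new[r][c] = old[2-c][r]."""
    return tuple(board[6 - 3 * (i % 3) + i // 3] for i in range(9))

def reflect(board):
    """Mirror the board left-right: new[r][c] = old[r][2-c]."""
    return tuple(board[3 * (i // 3) + 2 - i % 3] for i in range(9))

def symmetries(board):
    result, b = set(), board
    for _ in range(4):
        b = rotate(b)
        result.add(b)
        result.add(reflect(b))
    return result

def update_value(values, board, target, alpha=0.5):
    """Move V toward target for the state and all its symmetric images."""
    for b in symmetries(board):
        values[b] = values.get(b, 0.0) + alpha * (target - values.get(b, 0.0))
    return values

board = ("X", "", "", "", "O", "", "", "", "")
values = update_value({}, board, target=1.0)
```

For this corner-plus-center position the reflections coincide with rotations, so only 4 distinct states receive the backup; asymmetric positions receive all 8.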
Future Plan: Abstraction Matching
[Diagram: Source Game → Abstract MDP via Abstraction Discovery (Jong & Stone, 05); Abstract MDP → Target Game via Online Abstraction Matching (ongoing work); transfer from source to target]
Minimax lookahead: tr = 1.66, ttr = 56.7%; tr ~ 40, ttr ~ 99%
Transfer Learning Site Visit August 4, 2006
Proposal for Year 2 from the ISLE Team
Changes from Initial Plans
Year 1
• Full integration did not happen in Y1
  – Component systems (Icarus, Soar, Companions, LUTA, CaMeL) were developed independently and did not emerge as a single system
  – Ideas and/or subsystems of component efforts are to be integrated in later years
• Little use of background knowledge in Y1
  – We still believe it is critical for taking full advantage of transfer opportunities, but...
  – Y1 concentrated on basic navigation and problem solving without exploiting deep semantic domain knowledge
• Markov logic not used in Y1 testbed evaluations
  – Initial integration with ICARUS is finished, but efficiency issues advised against its use for Y1 tasks
  – Improving efficiency is a top Y2 priority
Year 2
• Continuing with three main architectures
  – Development of component systems will continue
  – Evaluation will focus on comparing and contrasting agent architectures
• Focus on the highest transfer levels in all three testbeds: Urban Combat, Physics, and GGP
• More interesting scientific results linked to key claims, but fewer total experimental conditions and less engineering
• The project's management structure will change to a matrix organization
Year 2 Matrix Management Structure
Technology Development (oversight: ISLE, Langley)
• ISLE (Langley): ICARUS
• UW (Domingos): Markov logic
• UW (Domingos): Alchemy
• Rutgers (Pazzani): relational templates
• UT (Mooney): theory revision
• Michigan (Laird): Soar extension
• NU (Forbus): Companions
• UT (Stone): LUTA extension
• Maryland (Nau): HTN planning
• ISLE (Konik): skill learning
• Cycorp (Witbrock): CYC integration

Experimental Evaluations (oversight: ISLE, Langley)
• ISLE (Shapiro): Urban Combat
• WSU (Holder): UC extensions
• ISLE (Stracuzzi): ICARUS on GGP
• NU (Forbus): Companions Physics
• UT (Stone): LUTA on GGP
• ISLE (Stracuzzi): GGP evaluation
• Cycorp (Matuszek): Physics evaluation
• Michigan (Laird): UC evaluation
• ISLE (Choi): UC evaluation
• ISLE (Konik): ICARUS Physics
• WSU (Holder): humans on UC
Technology work breaks down into extending Markov logic, integrating Markov logic and HTNs into ICARUS, and extending other agent architectures
Evaluation efforts focus on GGP (external), Urban Combat (internal), and ETS Physics (external), each used on two agent architectures
Expected Year 2 Products
Extended Alchemy software that includes:
Techniques for inventing new predicates that support mapping across domains
Methods for revising inference rules based on observed regularities (from UT Austin)
Methods for using relational templates to learn from few instances (from Rutgers)
Ability to access background knowledge from CYC (from Cycorp)
Extended ICARUS software that includes:
Techniques for learning goal-oriented mappings that support transfer
More flexible inference using Alchemy as a central module (from Washington)
Extended methods for learning skills in adversarial contexts (from Maryland)
Methods for combining skill learning with value learning (from UT Austin)
Extended versions of software for:
Soar (Michigan) that supports transfer by semantic learning and chunking
Companions (Northwestern) that supports transfer by deep structural analogy
LUTA (UT Austin) that achieves transfer by knowledge-based feature construction
Extended Urban Combat testbed that:
Includes a richer variety of objects, activities, and spatial settings
Supports multi-agent coordination and multi-agent competition
Allows tests of high-level transfer from urban military operations to search and rescue activities
Claims about Transfer Learning
Claim: Transfer that produces human rates of learning depends on reusing structures that are relational and composable
Test: Design source/target scenarios which involve shared relational structures that satisfy specified classes of transformations
Example: Draw source and target problems from branches of physics with established relations among statements and solutions
Claim: Deep transfer depends on the ability to discover mappings between superficially different representations
Test: Design source/target scenarios that use different predicates and distinct formulations of states, rules, and goals
Example: Define two games in GGP that are nearly equivalent but have no superficial relationship
Meta-Claim: These claims hold for domains that involve reactive execution, problem-solving search, and conceptual inference
Test: Demonstrate deep transfer in testbeds that need these aspects of cognitive systems
Example: Develop transfer learning agents for Urban Combat, GGP, and Physics
We will explore four paths to deep transfer:
• Predicate invention for representation mapping in Markov logic (Washington)
• Goal-directed solution analysis for hierarchical skill mapping (ISLE)
• Representation mapping through deep structural analogy (Northwestern)
• Semantic learning augmented with procedural chunking (Michigan)
ISLE Year 2 Plans for ICARUS
Combine rapid analytic creation of hierarchical skills with statistical estimation of their utilities
Learn relational concepts that characterize the conditions under which skills achieve goals
Retrieve relevant skills even when the goals that index them match only incompletely
Acquire mappings among domain representations based on analysis of problem solution traces
Use these capabilities to support deep transfer
We will demonstrate deep transfer in three separate testbeds with distinct characteristics.
ICARUS’ Unique Capabilities
Plans for Evaluation
Integration Plans
Mapping Concepts and Skills
[ICARUS architecture diagram, annotated with integration plans]
• Replace with Alchemy inference software (Washington)
• Augment with the CYC knowledge base (Cycorp)
• Incorporate HTN planning methods (Maryland)
• Add methods for learning value functions (UT Austin)
[Diagram: mappings from source concepts and source skills to target concepts and target skills]
ICARUS will not only learn hierarchical skills and concepts, but also how they map across different settings
Urban Combat / Problem Solving in Physics / General Game Playing
New Technology: Concept Revision
1. Learn new domain-specific concepts
2. Generalize these concepts to expand possible transfer opportunities
3. Specialize again in target domain to increase utility
Details vary, but underlying structure unchanged
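The three-step revision loop can be sketched over concepts represented as sets of conditions: generalization drops a condition, specialization adds one. The concept literals below are hypothetical, not ICARUS syntax:

```python
# Sketch of the concept-revision loop: learn, generalize, specialize.
# Concepts are sets of condition literals; the literals are illustrative.

def generalize(concept, condition):
    """Step 2: drop a condition to expand possible transfer opportunities."""
    return concept - {condition}

def specialize(concept, condition):
    """Step 3: add a target-domain condition to increase utility."""
    return concept | {condition}

# Step 1: a domain-specific concept learned in the source game.
source_concept = {"(in-line ?c1 ?c2 ?c3)", "(same-column ?c1 ?c2 ?c3)"}

general = generalize(source_concept, "(same-column ?c1 ?c2 ?c3)")
target_concept = specialize(general, "(same-diagonal ?c1 ?c2 ?c3)")
print(sorted(target_concept))
```

The intermediate generalized concept is the transferable artifact: it matches in both games, while the specialized versions restore utility in each.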
Y2 Plans for ICARUS on GGP
Challenges
• Domain independence
  – Remove the assumption of "chess-like" games
  – Expand beyond common board games; consider puzzles or games with many players
• Concept learning and revision
  – Remove the assumption that domain-specific concepts will be provided
  – The agent must discover new concepts or revise existing ones

Goals
• Demonstrate discovery and transfer of structural domain knowledge
  – Build on Y1 success with first-order concepts
  – Learn relationships among concepts to capture domain structure
  – Expand learning of relative concept utility to revise concepts and improve utility
    • Generalize existing concepts to expand coverage
    • Specialize general concepts to improve utility
  – Derive new concepts from the game description
Domain-specific concept (source) → generalized concept → specialized concept (target)
New Technology: Concept Derivation
1. Derive basic concepts from game description
2. Evaluate utility through experience
3. Construct more complex structures by combining concepts and expanding the derivation

[Diagram: GGP game description → simple derived concepts → complex structures; return to the description and derived concepts for further expansion]
Michigan Year 2 Plans for Soar

[Soar architecture diagram: Body; Long-Term Memories (Procedural, Semantic, Episodic); Short-Term Memory; Decision Procedure; Appraisal Detector; learning mechanisms: Chunking, Episodic Learning, Semantic Learning, Reinforcement Learning; Perception and Action. Transfer cycle from source problems to target problems: Experience → Identify → Generalize/Abstract → Store → Retrieve → Transform → Use]

Soar provides
• Extreme flexibility in every phase of transfer
• Multiple performance methods
• Task-dependent knowledge for abstraction, transformation, instantiation
• Multiple learning mechanisms

Source side
• Perform the source task: generate behavior; sense the environment; create an internal situational assessment
• Identify elements that might be useful: everything, but literal (episodic); categories, structures (semantic); results of processing (chunking); explicit analysis (reflection)
• Create a general concept/skill/...: generalization based on multiple examples; abstraction based on prior semantic knowledge
• Store in memory for later recall: different memories for different types of knowledge (procedural, semantic, episodic)

Target side
• Perform the target task: generate behavior; sense the environment; create an internal situational assessment
• Transform the current situation when expecting or searching for transfer
• Retrieve from memory a related concept based on the transformed situation: for some memories this is automatic (procedural); for others, deliberate (semantic/episodic)
• Transform/map the retrieved memory: explicitly map it to the current situation, or instantiate it for the current situation
• Use transfer memory to impact behavior: control selection of actions; decide on a strategy or tactic
Level 9 Transfer in Soar
[Soar architecture diagram with the transfer cycle from source problems to target problems: Experience → Identify → Generalize/Abstract → Store → Retrieve → Transform → Use]
Source: the hunted dies after getting trapped in a dead end
• Learns the spatial configuration of the dead end
• Learns that a dead end is deadly to the hunted
Target: the hunter tries to chase the hunted to a location it has recognized as a dead end.
• Perform the source task: the hunted dies after getting trapped in a dead end
• Identify elements that might be useful: death is feedback that a mistake was made
• Create a general concept: uses episodic knowledge to recall the behavior that led up to death; analyzes the spatial configuration; causal knowledge determines the critical features
• Store in memory for later recall: stores the dead-end concept in semantic memory and associates the bad result
• Perform the target task: as hunter, tries to develop a strategy for killing the hunted
• Transform the current situation: creates an internal model of the hunted
• Retrieve a memory based on the transformed situation: queries memory ("what would be bad if I imagine myself as the hunted?"); retrieves the memory of the dead end
• Use transfer memory to impact behavior: searches for dead ends; tries to "herd" the hunted into dead ends
Level 10 Transfer in Soar
[Soar architecture diagram with the transfer cycle from source problems to target problems: Experience → Identify → Generalize/Abstract → Store → Retrieve → Transform → Use]
Source: 1v1 combat
• Learns to pick up ammo to deny it to the enemy
Target: fire rescue
• Transforms to removing gasoline near a fire

• Perform the source task: tries to kill the enemy
• Identify elements that might be useful: encounters an experience where it can pick up enemy ammo and realizes that doing so would deny the enemy ammo
• Create a general concept: uses background knowledge to generalize to a concept of denying the enemy the resources it needs to hurt me
• Store in memory for later recall: stores the general concept in semantic memory
• Perform the target task: as fire rescuer, tries to search the building (and avoid dying, flames, etc.)
• Transform the current situation: analyzes the situation; determines that the fire is its enemy
• Retrieve a memory based on the transformed situation: queries memory for ways to defeat an enemy; retrieves the general concept about resources
• Use transfer memory to impact behavior: instantiates the general concept in the current situation (resources map to air and fuel: wood, gasoline, ...); takes actions to eliminate fuel
26
Northwestern Year 2 Plans for Companions
Foundation: Analogical Processing
• Northwestern’s technology is based on how humans seem to do transfer: by analogy and similarity
• Based on Gentner’s (1983) Structure-Mapping theory
• Simulations of cognitive processes engineered into components in prior DARPA research
  – SME: analogical matching, similarity estimation, comparison
  – SEQL: generalization
  – MAC/FAC: similarity-based retrieval

Companions Cognitive Systems Architecture
• Structure-mapping operations appear to be heavily used throughout human reasoning and learning
• Hypothesis: human-like reasoning and learning can be achieved by making structure-mapping operations central in a cognitive architecture

Approach
• Extend the Companions Cognitive Systems architecture by
  – Creating and incorporating advances in analogical processing
  – Developing techniques to learn self-models that help formulate the system’s own knowledge goals
• Compare Companions and ICARUS in the physics testbed
  – Help ISLE and Cycorp integrate our representations and support libraries into ICARUS
  – Extend as necessary (e.g., sketching support)

Metrics
• Coverage = fraction of time an answer is generated
• Accuracy = whether the answer is right (including partial credit)
Year | Coverage (Near) | Coverage (Far) | Accuracy (Near) | Accuracy (Far)
  1  |       50%       |        0%      |       50%       |        0%
  2  |       80%       |       50%      |       90%       |       80%
  3  |       90%       |       80%      |       90%       |       90%
27
Northwestern Year 2 Plans for Far Transfer (7-10)
“A battery is like a pump”

• Advice can be about appropriate analogs, mappings, and analogical inferences
• Analogical encoding will let Companions work with more abstract advice
• Metamappings will guide cross-domain analogies by first matching general knowledge in the KB
• Persistent mappings store an ongoing understanding of a cross-domain analogy
• Expanded self-modeling capabilities to improve skills and knowledge (e.g., “Need to study pulley problems more”, “Need to figure out trig inverses”)
28
University of Washington Year 2 Plans
Unique Contribution
- Representation mapping: entities, attributes, relations, ontologies, situations, events
- Based on statistical predicate invention: discover abstract relations, etc., and transfer them
- Infrastructure:
  - Efficient inference and learning
  - Online, “lifelong” operation
  - Extension to continuous data
  - Extension to decision-making

Technologies
- Predicate invention
- Representation mapping
- Infrastructure
- Alchemy supplying inferences to ICARUS from percepts

Integration and Evaluation
- Integrate into ICARUS
- Apply to Physics and Urban Combat
- Transfer level 8: 60% of human performance
- Transfer levels 9-10: 30% of human performance
- Infrastructure: component-wise, “white box” evaluation
29
UT Austin Year 2 Plans (Mooney)
Theory Refinement
Improve the system’s ability to revise the structure of the source Markov Logic Network (MLN) to fit the target domain.
• Improve efficiency of clause generalization and specialization procedures by using bottom-up search to directly identify productive changes rather than blindly searching the space of possible refinements.
• Improve generation of new clauses in the target domain by exploiting advanced ILP methods.

Predicate Mapping
Improve the system’s ability to accurately map predicates from the source to the target domain.
• Use schema-mapping techniques from information integration to suggest predicate mappings by analyzing source and target data.
• Use lexical knowledge (e.g., WordNet) to guide matching of predicate names in source and target.
• Use heuristic search to improve efficiency of finding the best overall predicate mapping.
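As a concrete illustration of the heuristic-search step, the sketch below greedily pairs source and target predicates by name similarity and arity. Everything here is hypothetical: the predicate names are invented, and `difflib` string similarity is only a stand-in for the WordNet-based matching described above. This is not the UT Austin implementation.

```python
# Hypothetical sketch of heuristic predicate mapping between a source
# and target MLN vocabulary. Scores combine string similarity (a
# stand-in for lexical matching) with an arity-compatibility check.
from difflib import SequenceMatcher

def score(src_pred, tgt_pred):
    # Predicates are (name, arity) pairs; incompatible arities score 0.
    (sn, sa), (tn, ta) = src_pred, tgt_pred
    if sa != ta:
        return 0.0
    return SequenceMatcher(None, sn.lower(), tn.lower()).ratio()

def greedy_mapping(source_preds, target_preds):
    """Greedily pair each source predicate with its best unused target."""
    pairs = sorted(
        ((score(s, t), s, t) for s in source_preds for t in target_preds),
        reverse=True)
    mapping, used_s, used_t = {}, set(), set()
    for sc, s, t in pairs:
        if sc > 0 and s not in used_s and t not in used_t:
            mapping[s[0]] = t[0]
            used_s.add(s); used_t.add(t)
    return mapping

src = [("professor", 1), ("advises", 2), ("publication", 2)]
tgt = [("director", 1), ("supervises", 2), ("movie", 2)]
print(greedy_mapping(src, tgt))
```

A real system would score candidate mappings by how well the revised MLN fits target data, not just by surface similarity; the greedy loop stands in for that heuristic search.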
Integration with Alchemy and ICARUS
Incorporate MLN transfer learning methods into Alchemy and ICARUS.
• Integrate predicate mapping and theory refinement methods into the UW Alchemy MLN software package.
• Integrate our transfer learning methods for MLNs into ICARUS+Alchemy to provide transfer of static inferential knowledge from the source to the target domain.

Evaluation on Testbeds
Evaluate MLN TL methods on ISLE testbeds.
• In Urban Combat and other ISLE testbeds, measure the accuracy of transfer learning at making within-state inferences (using AUC), compared to learning an MLN from scratch, by adapting knowledge from source to target tasks for several levels of transfer.
• Measure training time of our system versus existing Alchemy to demonstrate improved efficiency.
• Compare an ablated version of ICARUS without MLN transfer learning to the enhanced version on final testbed performance metrics and demonstrate improved performance.

[Pipeline diagram: source training data → MLN learner (Alchemy) → source MLN → predicate mapping → MLN revision with target training data → target MLN.]
30
Rutgers University Year 2 Plans
Unique Contributions
- Learning and instantiating templates
  - Based on second-order learning
  - Discover general regularities and transfer them
- Entailment learning
  - Combining inductive and deductive learning
  - Discover simple rules and combine them (e.g., OCCAM)
- Infrastructure
  - Template learner for Markov Logic Networks
  - Deductive learner for Markov Logic Networks

Technologies
- Templates and MLNs in Alchemy, supplying inferences to ICARUS from percepts
- Template learning and instantiation
- Learning by entailment

Integration and Evaluation
- Alchemy integration into ICARUS
- Apply to Physics and Urban Combat
- Transfer levels 9-10: 50% of human performance
- Infrastructure: component-wise evaluation of Alchemy
31
Cycorp Year 2 Plans: Technologies and Capabilities
Problem/Objective: Knowledge-based transfer learning
• Supply formalized domain expertise and well-encoded, logically meaningful domains and problem spaces
• Elaborate on background knowledge via ILP and inference, provide advice, and extend knowledge gathered from source tasks and domains
• Informed by existing background knowledge in Cyc

Outputs:
• New rules and skills: rule induction via ILP
• New facts: domain and general knowledge
• Expanded knowledge: inference, advice, and probabilities
Tasks
• Technical development
  • Provide domain knowledge for use by Urban Combat, Physics, and GGP performers
  • Provide inference capabilities, including query support, goal advice, and knowledge elaboration, for UCT, Physics, and GGP performers
  • Pursue knowledge gathering and elaboration via ILP over domain and background knowledge
  • Pursue inference speedup and results improvement via reinforcement learning of inference pathways
• Integration and coordination
  • Integrate Alchemy and other probabilistic reasoning approaches with Cyc’s inference capabilities
  • Responsibility for technical integration of Alchemy, Cyc, and other inference approaches
  • Responsibility for technical coordination of the groups developing on the Physics testbed

[Diagram: Cyc background knowledge and inference capability mediates between the source and target testbeds, receiving situation, status, and queries and returning advice, support, and elaboration. It collects knowledge relevant to a task, domain, or problem; develops knowledge through inferential expansion, probabilistic weighting, and rule formation (ILP); and performs inference for execution agents, supplying advice, query results, background, skills, and memories.]

Payoff
• Information flow among complementary learning and transfer mechanisms and approaches
• Establish a well-founded, mutually compatible base of assumptions and facts, necessary for transfer
• Allow systems to communicate observations, conclusions, skills, memories, and intentions
• Learning can take full advantage of existing background knowledge, including knowledge from less obviously related domains and problems
• High-level, semantically connected knowledge, within a context of existing knowledge = understanding
32
Cycorp Year 2 Plans: Evaluation and Integration
Representation
• Coverage: in each testbed,
  • How many problems are represented?
  • How many types of problem? What novel problem categories?
  • How many, and what type of, obstacles, goals, percepts, and actions?
  • What novel types of solution information?
• Accuracy:
  • Well-represented domains are critical for successful performance; accurate representations are demonstrated by successful agent evaluations.

[Diagram: the Cyc KB sits at the center of the testbeds (Urban Combat, Physics, GGP) and the architectures (ICARUS, Soar, Companions), coordinating inference engines and approaches, reasoning over queries and testbeds, and coordinating inferences. It exchanges inferences (queries, goals, search paths, elaborations); queries and inference needs; skills, concepts, and LTMs; domain knowledge; coordination and semantic content; background and problem representation; LTMs and follow-up queries; large-scale ILP, knowledge seeking, and generalization; and knowledge, goals, analysis, and advice.]

Coordination & Integration
• How many formal representations of problems and queries in different testbeds are shared by different architectures?
• How many inference requests, of how many types, go through a common interface?
• How effectively can knowledge be probabilistically qualified (as measured by crossfold validation)?
• Learning & Transfer:
  • What novel fact-level knowledge gathered for the source is reused in the target space? How many facts, in what domains?
  • How many rules can be obtained via ILP over gathered and domain knowledge, in what domains?
  • What agent skills are obtainable by ILP within Cyc?
• Advice-giving and query results:
  • What appropriate, novel goals are presented?
  • What improvement on random search can be obtained through advice?
• Skills, abilities, and long-term memories:
  • What novel abilities can agents demonstrate with knowledge and inference support?
  • What new problems are solvable that could not be solved without that support?

[Diagram: knowledge and inference flow between Cyc and Soar, ICARUS, and Companions on the Urban Combat, GGP, and Physics testbeds.]
33
Maryland/Lehigh Year 2 Plans
New Technologies
1. Mapping between ICARUS’ hierarchical representations and HTNs
2. Techniques for systematically extending planners to work in adversarial domains (i.e., multiple possible responses from an adversary)
3. Extensions to ICARUS to learn in such domains

Solution and Evaluation
• A mapping between ICARUS’ hierarchical representations and Hierarchical Task Networks
• New algorithms will provide capabilities to reason about adversaries, i.e., to learn about them and to plan against them
• This will provide high-level transfer in adversarial environments via learning abstract strategies/models of the behaviors of single adversaries or groups of adversaries in one scenario and transferring this knowledge to another scenario

Contributions
1. How: generalize our planner-modification techniques to deal with adversaries
2. Work with ISLE to generalize ICARUS’ learning to learn about adversaries
3. When:
   1. September-December 2006: develop the theory and implement the new algorithms
   2. January-April 2007: work with the ISLE team to incorporate the algorithms into ICARUS
   3. May 2007: evaluation, using the GGP testbed for Year 2

Capabilities
• ICARUS does plan abstraction by grouping actions
• The groups are analogous to Hierarchical Task Network (HTN) decomposition templates (e.g., as in SHOP2)
• “Planner-modification” techniques for systematically generalizing planners to work with nondeterministic actions (i.e., multiple possible outcomes)

[Diagram: an action from State 1 leads to possible outcomes 1 and 2; analogously, our action from State 1 leads to adversary responses 1 and 2, each followed by further actions.]
34
UT Austin Year 2 Plans (Stone)
[Diagram: inter-task mappings β: A′ → A and γ: S′ → S discovered by I-TAC and SME-QDBN, illustrated with tic-tac-toe boards of different sizes.]
Unique Abilities
• Automatic discovery of inter-task mappings
  – I-TAC (inter-task action correlation)
    • Train a classifier to map state-transition pairs to actions in the source
    • Use the classifier and the state mapping to obtain the action mapping
  – SME-QDBN (structure mapping + qualitative dynamic Bayes nets)
    • Knowledge/model-centered approach
    • Represent action models using QDBNs
    • Specialized and optimized SME for QDBNs, using heuristic search
• Policy-based transfer
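The I-TAC bullets can be illustrated with a toy sketch: learn a classifier from source (state, next-state) pairs to source actions, then push target transitions through an assumed state mapping γ to recover the action mapping β. The domains, the mapping, and the table-lookup "classifier" below are all invented for illustration; the actual system is not shown on this slide.

```python
# Minimal illustration of the I-TAC idea on a hypothetical 1-D domain.

def train_classifier(transitions):
    """transitions: list of (s, s_next, action); returns a classifier."""
    table = {}
    for s, s_next, a in transitions:
        table[s_next - s] = a        # classify by the state delta
    return lambda s, s_next: table.get(s_next - s)

# Source task: a 1-D walk with step size 1.
source = [(0, 1, "right"), (1, 0, "left"), (3, 4, "right")]
classify = train_classifier(source)

# Target task: the same walk on a doubled grid; gamma maps target
# states back into source states.
gamma = lambda s: s // 2
target_transitions = {"fwd": (4, 6), "back": (6, 4)}

# Action mapping: label each target action by classifying its mapped
# transition in source terms.
action_mapping = {
    a_t: classify(gamma(s), gamma(s_next))
    for a_t, (s, s_next) in target_transitions.items()
}
print(action_mapping)
```

The real approach would use a trained statistical classifier over observed transition pairs rather than an exact delta table, but the composition (γ on states, then classification, yielding β on actions) is the same shape.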
Capabilities/Technologies
Integration
• Incorporate RL into ICARUS and/or Soar
– Focus on leveraging action-value functions into generalizable planning knowledge
– Abstract learned RL knowledge to relational representations
• ISLE team comparisons: compare value function transfer vs. Icarus approach in GGP
Evaluation
• Evaluate the same core algorithms in multiple domains
• GGP: value-function-based
  – Use symmetry to scale up within same game types
  – Game-tree features to transfer among different types of games
  – Automatic abstraction discovery
• RoboCup Soccer
  – Value-function-based transfer: Sarsa, function approximators
  – Policy-based transfer: neuroevolution (NEAT)
• Urban Combat: continued evaluation of the Year 1 effort
Core architecture-independent TL for reinforcement learning
[Figure: qualitative DBN for the Hold action in Keepaway, relating distance and angle state variables (Dist, Ang, DAng over keepers K1, K2, takers T, T1, and center C) at times t and t+1; the 3v2 task has 13 inputs and 3 outputs, the 4v3 task 19 inputs and 4 outputs.]
35
• Generalization of skills and concepts
  • Policy homomorphisms as general skills
  • Relational concepts and features from homomorphic mappings
• Relational learning
  • Reinforcement learning with relational concepts and generalized skills
• Estimation of skill/concept utility
  • Skill utility to regulate exploration
  • Concept utility to improve state hierarchy construction
UT Arlington Year 2 Plans (Huber)
Technical Approach
Generalization of skills and concepts, and estimation of skill and concept utilities, to improve transfer.

Novel Capabilities
Automatic definition of relevant representational concepts and utility-based guidance for efficient hierarchy construction and skill exploration in RL.
• Learning of generalized, “parametric” skills
  • Generalized policies apply in novel situations and environments
  • Skills have operator descriptions with utilities and probabilities
• Automatic generation of useful representational concepts
  • Generation of task-relevant relations in the form of predicates
  • Discovery of relevant feature sets and object types
• Automatic derivation of skill and concept utilities
  • Concept utilities allow construction of appropriate representations
  • Skill utilities guide exploration or planning

Integration
Provides RL-based creation of hierarchical skills and concepts with symbolic representations, and skill and concept utilities to provide guidance on their use.
• RL-based skill learning component for use in ICARUS
  • Learned operator representations facilitate integration of skills
  • Learned features and concepts can augment the concept hierarchy
• Skill and concept utilities for search and planning guidance
  • Skill utility estimates can guide operator selection
  • Concept utility can inform which representation is investigated
Evaluation Plans
New capabilities will extend the set of transfer levels the hierarchical RL system can address.
• Evaluation within the Urban Combat Testbed (UCT)
  • Application of the standalone system to transfer levels 1-6
  • Evaluation focus on tasks with significant change of the environment and of the task objective
• Evaluation of performance using the Transfer Ratio (TR)
  • Target TR values larger than 2
• Evaluation of capability use via the frequency and task utility of generalized skills

[Development and integration timeline: skill generalization is developed in Year 2 and integrated in Year 3, with skill/concept utility following; the skill and concept hierarchies build on selective, task-specific state-space construction, hierarchical state representation, task learning, and skill and concept extraction, generalization, and utility estimation.]
36
Year 1 Evaluation Plans
Comparison among architectures should reveal the conditions for successful transfer learning.
But implementing agents that can operate in multiple testbeds takes considerable time and resources.
Instead, we will develop agents within two architectures for each testbed, with only one (ICARUS) being applied to all three of them.
Experiments will evaluate how well each pair of frameworks supports transfer involving quite different forms of knowledge.
[Diagram: testbeds (Urban Combat, ETS Physics, GGP) paired with architectures (Soar, Companions, LUTA), with ICARUS applied to all three testbeds.]
37
Year 2 Evaluation Plans
Comparison among architectures should reveal the conditions for successful transfer learning.
But implementing agents that can operate in multiple testbeds takes considerable time and resources.
Instead, we will develop agents within two architectures for each testbed, with only one (ICARUS) being applied to all three of them.
Experiments will evaluate how well each pair of frameworks supports transfer involving quite different forms of knowledge.
[Diagram: testbeds (Urban Combat, ETS Physics, GGP) paired with architectures (Soar, Companions, LUTA), with ICARUS applied to all three testbeds.]
38
Urban Combat
Level 9 Transfer: Hunted to Hunter
Hunted → Hunter (transfer of tactical reasoning and strategies; symbolic and spatial representations):
• Learn that there is a path with very low visibility → learn to check the hidden path periodically
• Learn that getting caught in a dead end is deadly → learn to try to trap hunted in a dead end
• Avoid paths that go near ambush places → discover a place that makes a good ambush

Scenarios in Urban Combat Testbed (UCT)
39
Urban Combat
Level 10 Transfer: 1v1 to Fire Rescue
1 vs. 1 → Fire Rescue (transfer of tactical reasoning and strategies; symbolic and spatial representations):
• Pick up enemy ammo / consume enemy resources → set a backburn or backdraft; remove wood from the fire’s path
• Avoid being seen or shot; use doors and walls for protection → take advantage of terrain
• Don’t get caught in a dead end; always have multiple exits → always leave an out

Scenarios in Urban Combat Testbed (UCT)
40
Year 2 Plans for Urban Combat Evaluation
[Diagram: agents train on the source problems (1v1, 2v2) in UCT with background knowledge (BK), yielding performance Perf(TL0); transfer learning extracts transferred knowledge (TK), which is tactical, terrain, resource, and spatial (e.g., Ammunition → Combustible, Deadend → NoExit), and combines it with target background knowledge on the target problems (Fire Rescue) to yield Perf(TL1).]

Transfer Ratio > 30% (Y2 Go/No-Go)

Performance:
• Transfer ratio (go/no-go)
• Demonstrate deep transfer
• Comparison to human trials
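A minimal sketch of how a transfer ratio of this kind might be computed, assuming the common area-under-the-learning-curve definition; the slide does not give the program's exact formula, so the definition and the sample curves below are illustrative only.

```python
# Hedged sketch of a transfer-ratio metric: compare the area under the
# learning curve with transfer against the curve without it.

def transfer_ratio(with_transfer, without_transfer):
    """Both args: performance per trial. TR = (area_T - area_N) / area_N."""
    area_t = sum(with_transfer)
    area_n = sum(without_transfer)
    return (area_t - area_n) / area_n

baseline = [10, 20, 30, 40]   # performance learning from scratch
transfer = [25, 35, 40, 45]   # performance after transfer
tr = transfer_ratio(transfer, baseline)
print(f"transfer ratio = {tr:.2f}")
```

Under this definition the 30% go/no-go criterion would correspond to a returned value above 0.30.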
41
Year 2 Plans for Urban Combat Evaluation
• Metrics
  – Hunter: time it takes the agent to kill the opponent, plus time penalties for health loss
  – Hunted: inverse of the time before the opponent kills the agent, plus time penalties for health loss
  – 1v1: time to kill N opponents, plus time penalties for health loss and for fewer than N kills
  – Fire Rescue: time to rescue an ally from fire, plus time penalties for health loss
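These metrics can be made concrete with illustrative scoring functions. The penalty weights below are invented for the sketch; the slide does not specify the actual constants used in the evaluation.

```python
# Illustrative scoring for the Urban Combat metrics (hypothetical
# penalty weights; lower scores are better).

def hunter_score(kill_time, health_lost, penalty_per_hp=2.0):
    """Time to kill the opponent plus a time penalty for health loss."""
    return kill_time + penalty_per_hp * health_lost

def one_v_one_score(total_time, health_lost, kills, n_required,
                    penalty_per_hp=2.0, penalty_per_miss=60.0):
    """Time to kill N opponents, penalized for health loss and misses."""
    missed = max(0, n_required - kills)
    return total_time + penalty_per_hp * health_lost + penalty_per_miss * missed

print(hunter_score(120.0, 15))          # 120 s plus 30 s of penalty
print(one_v_one_score(300.0, 10, 1, 2))  # one of two required kills missed
```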
42
Year 2 Plans for Urban Combat Evaluation
• Transferred knowledge
  – Hunter ↔ Hunted (TL level 9)
    • Spatial: visibility, dead ends, ambush places, terrain
    • Tactical:
      – Check (hunter) / seek (hunted) low-visibility areas
      – Trap in (hunter) / avoid (hunted) dead ends
      – Seek (hunter) / avoid (hunted) ambush places
  – 1v1 ↔ Fire Rescue (TL level 10)
    • Spatial: accessibility of resources (ammunition / combustibles); dead ends, exits, terrain
    • Tactical: consume enemy resources; use terrain for protection; always leave an out
    • Taxonomic: types of terrain, spaces, resources, and tactics
43
Year 2 Plans for Urban Combat Evaluation
• Performance milestones
  – Based on TL levels 9 and 10
  – Go/No-Go: transfer ratio > 30%
  – Demonstrate achievement of specific deep-transfer opportunities (e.g., ammunition → combustible)
  – Comparison to human trials
    • Transfer ratios
    • Deep transfer
Level 10: Differing
Source / target game graphs share substructure corresponding to a transferable strategy.
Year 2 Experimental Plans for GGP
GGP Terms
• Game: defines the environment in which the agent operates
  – Includes initial state, terminal states, transition function, goals
  – Scores associated with goal and terminal states
• Match: competition between two agents in a game
• Scenario: source / target pairing of games
  – Source / target may vary in one or more ways (initial state, terminal states, goals, transitions)
  – Typically exactly one source and one target game
Protocol
• Two transfer levels: 9 (Reformulating) and 10 (Differing)
• Seven scenarios per level
• Multiple consecutive matches in each game
• Fixed opponent (non-learning)
• Players receive a score according to their goal
• Domain performance metric: score from the satisfied goal
• Domain performance goal: maximize score
Source → Target: structural transfer
Level 9: Reformulating
Source / target game graphs are isomorphic.
• Source / target game descriptions fundamentally different
• Different axioms
• Different structural representation
• Equivalent meaning (same state-graph structure)
• Transfer must occur at the structural level
Example: Generalized Fork
• Opponent choices can lead to several possible successor states
• Current goal value is 50
• Regardless of the opponent’s move, the agent can reach a state with a higher goal value (e.g., 65, 70, or 75 rather than 50 or 45)
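The fork argument is a one-ply worst-case computation: the opponent picks the reply, then the agent picks the best successor it can still reach. A sketch with goal values loosely following the slide's example tree (the exact game states are not given, so the replies below are hypothetical):

```python
# Generalized-fork sketch: for each opponent reply, the goal values of
# the states the agent can then reach (illustrative values).
agent_options_after_reply = {
    "reply_a": [75, 50],
    "reply_b": [70, 45],
    "reply_c": [65, 50],
}

current_value = 50
# Agent maximizes per reply; the opponent minimizes over replies.
guaranteed = min(max(vals) for vals in agent_options_after_reply.values())
print(guaranteed)
```

Because the max-then-min value exceeds the current goal value of 50, the position is a fork: the agent improves its goal value no matter what the opponent does.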
45
Year 2 Plans for Physics Testbed
• All of Newtonian dynamics
  – Requires calculus, higher-degree polynomials, graphs
• Two areas from Dynamical Analogies
  – Well-explored cross-domain analogies in various physical domains
  – Excellent venue for exploring distant transfer
  – Example: Domain A = linear motion; Domain B = rotational motion, thermal systems, hydraulics, electricity, …
46
Physics Testbed: Dynamical Analogies in Detail
47
BACKUP SLIDES
48
Transfer Using Structure Mapping
Objective
• Model/knowledge-oriented approach
• Uses knowledge about how actions affect state variables and how state variables relate to each other
• Use structure mapping to find similarities between source and target tasks
• Discover β and γ together

Results: Keepaway

Summary
• Works nicely for Keepaway
• Strong demand for domain knowledge
• Provides similarity measures for source and target
• Future work
  – Improve efficiency
  – Learn QDBNs from data
  – Apply to GGP and Urban Combat

Technical Approach
• Qualitative DBNs
  – Dynamic Bayes networks are a structured representation for actions: an action (directly) affects a small number of state variables
  – Probabilities are less relevant; qualitative properties matter more: no change, increase/decrease, etc.
• Specialized and optimized SME for QDBNs
  – Fixed types of entities
  – How do entities match?
  – How are mappings evaluated?
• SME-QDBN uses heuristic search to find the mapping with the maximal score
  – Prunes with upper bounds
Algorithm
1. Generate local matches and calculate the conflict set for each local match
2. Generate initial global mappings based on immediate relations of local matches
3. Merge global mappings with common structures
4. Search for a maximal global mapping with the highest score
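A heavily simplified sketch of steps 1-4: qualitative edge-label matching stands in for SME's local-match generation, and a greedy merge stands in for the heuristic search with pruning. The QDBN edges are invented; this is not the SME-QDBN code itself.

```python
# Toy version of the SME-QDBN matching loop on made-up QDBN edges.
from itertools import product

# QDBN edges: (state_variable, qualitative_effect_of_action)
source = [("dist_k1_t", "decrease"), ("ang_k2", "increase")]
target = [("dist_r1_b", "decrease"), ("ang_r2", "increase"),
          ("dist_r2_b", "decrease")]

# Step 1: local matches = edge pairs with the same qualitative label.
local = [(s, t) for s, t in product(source, target) if s[1] == t[1]]

# Steps 2-4: greedily merge non-conflicting local matches into one
# global mapping (conflict = mapping the same variable twice), scoring
# every match equally instead of searching with upper-bound pruning.
mapping, used_s, used_t = {}, set(), set()
for (sv, _), (tv, _) in local:
    if sv not in used_s and tv not in used_t:   # conflict check
        mapping[sv] = tv
        used_s.add(sv); used_t.add(tv)
print(mapping)
```

The real algorithm scores structured global mappings and searches among merges; the greedy pass here only conveys the shape of local-match generation, conflict checking, and merging.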
[Figure: qualitative DBN for the Hold action in Keepaway, relating distance and angle variables (Dist, Ang, DAng over keepers K1, K2, takers T, T1, and center C) at times t and t+1.]
49
Policy Transfer Using NEAT
• An alternative to value-function-based transfer
• Direct policy transfer based on the mappings of state variables (β) and actions (γ)
• NEAT (NeuroEvolution of Augmenting Topologies)
  – Uses genetic algorithms to evolve neural networks
  – Neural networks are used as action selectors
Results for Keepaway
• NEAT evolves 3v2 players (13 inputs, 3 outputs)
• Use a mapping (from β & γ) to transform organisms
• NEAT evolves 4v3 (19 inputs, 4 outputs) with the population from 3v2

Results for Scheduling
• An autonomic computing task
• Task: determine in what order to process jobs
• Goal: maximize aggregate utility
• Source task: 2 job types (8 state variables, 8 actions)
• Target task: 4 job types (16 state variables, 16 actions)
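The organism-transformation step can be sketched for a plain linear policy: each target input and output inherits the weights of the source element the β/γ mappings point it at. The dimensions and index maps below are illustrative, not the actual 13-input/19-input Keepaway networks, and NEAT's topology evolution is omitted entirely.

```python
# Sketch of policy transfer via inter-task mappings: seed a target
# policy's weight matrix from a source policy's weights.

def transfer_weights(w_src, input_map, output_map):
    """w_src[o][i]; each map gives, per target index, the source index
    whose weights that target input/output should inherit."""
    return [[w_src[output_map[o]][input_map[i]]
             for i in range(len(input_map))]
            for o in range(len(output_map))]

# Source policy: 2 inputs, 2 outputs.
w_src = [[0.5, -0.2],
         [0.1, 0.9]]
# Target: 3 inputs, 3 outputs; the third input/output reuse source index 1.
input_map = [0, 1, 1]
output_map = [0, 1, 1]
w_tgt = transfer_weights(w_src, input_map, output_map)
print(w_tgt)
```

In the actual approach the transformed organisms only initialize the 4v3 population; evolution then continues from that seed rather than using the copied weights directly.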
[Plot: cost vs. episodes, learning from scratch vs. with transfer. Reported results: t.t.r. = 80%, t.r. = 35; and t.t.r. = 84%, t.r. = 5.8.]
Comparison of Sarsa and NEAT
• Taylor, Whiteson, & Stone (GECCO-06)
50
Evaluation Process
• Year 1: near transfer (levels 1-6)
  – A = set of basic problems, B = transfer variations
  – Training runs include a quiz of four problems, followed by worked solutions
  – Experiment design worked out by ETS and NU
• ETS provided training and test examples for NU’s research needs
  – Novel problems from the same templates were used for evaluation
• Tests were carried out on a sequestered NU cluster
  – 5 nodes to ETS for Physics
  – Scripting language developed to facilitate creation of experiments
  – Code frozen at the start of evaluation
  – Efforts to make Companions usable by others have been an important step toward making the architecture a robust product
51
Good News from Evaluation
• With two-thirds of the data in, we are likely to achieve our 50% near-transfer goal
• A simple model works surprisingly well
  – Study a worked solution = store it
  – Extracted equations, modeling conditions, and sanity checks from prior problems
• Analogical mapping problems did arise
  – Some straightforward re-representation techniques may suffice for near transfer of concrete cases
• Most failures were due to limitations in parts of the system where learning currently does not take place
  – Provides clear examples to help drive Y2 research
52
Research Questions Raised by the Evaluation So Far
• Self-monitoring to learn self-models is important
  – 25% of restructuring problems failed because of hard-wired resource bounds
    • A smarter system would recognize that it was making reasonable progress, and dynamically increasing the bounds would enable it to solve the problem
    • A really smart system would look over the pattern of activity, see a lot of redundancy, and figure out how to change its strategies to be more efficient
• Even black-box subsystems need to be extensible by learning
  – Example: ArcSineFn and ArcCosineFn were left out of the algebra system, causing 25% of the restructuring problems to fail
  – In the real world, such gaps are always possible
  – Cognitive systems must be adaptable enough to work around them
53
Common materials developed for Physics Testbed
• Representational infrastructure
  – Starting with ResearchCyc plus Northwestern’s representations
  – Extended to include representation of worked solutions (NU, Cycorp, ETS)
• Support libraries
  – Algebra package for symbolic and numerical equation solving
    • With Johan de Kleer (PARC, Inc.)
  – Units package, tightly integrated with Cycorp representations
• Sketching tool for creating sketches associated with worked solutions and problems
  – Modification of the Sketching Knowledge Entry Associate (sKEA) developed in earlier DARPA programs
  – Not deployed this year due to the tight evaluation schedule
54
Understanding Strengths and Weaknesses of Soar / ICARUS
• UCT stresses
  – Real time
  – Integration of reaction, decision making, and planning
  – Spatial reasoning
• Level 9 and 10 transfer stresses
  – Flexibility in reasoning
  – Mixing task performance and deliberate reflection
  – Using multiple learning mechanisms
  – Multiple strategies for transfer
  – Knowledge-based transfer
55
Michigan Year 2 Plans for Urban Combat
• What capabilities/technologies will you provide?
  – Transfer across variations at the highest levels: 9 (Reformulation) and 10 (Differing)
  – Transfer of tactical reasoning and strategies across symbolic and spatial representations
  – Use synergy across multiple learning mechanisms and general strategy-discovery methods
  – Compare and contrast across two cognitive architectures (Soar and ICARUS)
• How do you plan to evaluate those capabilities?
  – In variations of UCT scenarios and maps across very different goals
    • 1v1 hunter/hunted
    • 1v1 and 2v2 combat engagements
    • Search/rescue
  – How fast and how safely can the agents perform their tasks?
• How and when will they be integrated into the larger system(s)?
  – From day 1 they will be integrated in Soar & ICARUS, full cognitive architectures
  – All scenarios require complete end-to-end behavior
• What unique ability will your technologies add?
  – Transfer across tasks requiring a combination of bottom-up, knowledge-based, and spatial reasoning
  – Real-time, online learning and transfer
  – Learning mechanisms that require small numbers of source trials
56
Example Level 10 Transfer
Examples of general tactics learned from source: 1v1 engagements
1. Consume opponent’s resources
   – Pick up enemy’s ammo
2. Divide and conquer
   – Attack one enemy at a time
3. Attack from a distance
   – Attack with rifle at distance
4. Minimize exposure
   – Take advantage of terrain
5. Always leave yourself an out
6. Sacrifice for the ultimate goal

Examples of transfer to target: rescue from a burning building
1. Consume the fire’s resources
   – Set a backburn/backfire
   – Remove fuel (wood)
2. Divide and conquer
   – Put out one fire at a time
3. Avoid getting close to the fire
   – Use ropes & tools to work at a distance
4. Minimize exposure to fire
   – Use barriers/doors
5. Always have a safe exit available
6. One fights (to the death) while the other rescues
57
Transfer of Spatial Knowledge
• Why spatial reasoning and knowledge?
  – Ubiquitous across all military domains; inherent to military tactics and strategy
  – Requires integration of symbolic and metric data
• Examples:
  – Types of surfaces that facilitate/hinder travel
    • Not only travel speed, but also likelihood of unobstructed paths
    • Roads and sidewalks vs. grassy areas and buildings
    • Bridges over water
  – Placement of IEDs relative to structures (fixed and dynamic)
    • Best places to search for IEDs; areas that can be ignored
  – Common structures and organization of buildings that aid searching (next slide)
    • Transfer basic structure (in two-story houses, bedrooms are usually on the second floor)
  – Locations for attacking/defending from enemies
    • Exposure to detection and fire
    • Locations for setting an ambush / being ambushed
    • Sniper positions
    • How to outflank an opponent
  – Spatial organization of groups of agents
    • Too difficult for Year 2 (maybe reachable in Year 3)
    • Ability to provide cover to teammates
    • How to search as a group
    • How to attack and defend as a group
58
Combinations of Spatial and Symbolic
• Common spatial layouts of specific types of buildings
  – Movie theaters
  – Schools
  – Restaurants
  – Office buildings
  – Homes
• Common spatial layouts of specific types of rooms
  – Bathrooms
  – Offices
  – Classrooms
  – Theaters
• Correlation of signs and spatial structures
  – Exit signs
  – Street signs
  – Restrooms
Will add graphics