
A Gentle Introduction to Soar: A Cognitive Architecture for Human-Level AI

Bob Marinier, based on the paper by Lehman, Laird, and Rosenbloom

NSF, DARPA, ONR
University of Michigan

April 18, 2008


What is Artificial Intelligence?


What is Human-Level AI?


How to Achieve Human-Level AI?

• Create separate systems for each capability
  – e.g., language, planning, learning, etc.

• Create single system of simpler mechanisms out of which these capabilities arise
  – Cognitive architecture
  – Inspired by psychology (study of how people think)


What is Architecture?

• Computer architectures
  – Differ in processor type, memory size, commands, etc.
  – Differences reflect designs intended to be optimal under different assumptions about usage


[Figure: a layered diagram. The PC (HARDWARE) is the architecture for the word processor (APPLICATION), which is its content; the word processor in turn is the architecture for the TASK of writing a paper, which is its content.]

BEHAVIOR = ARCHITECTURE + CONTENT


What is Cognitive Architecture?

• A theory of the fixed mechanisms and structures that underlie human cognition

• Said another way: a theory of what is common to the wide array of behaviors we think of as intelligent

• Soar is one such theory (there are others)
  – Soar is a computational theory (it actually runs on computers)


What Cognitive Behaviors Have in Common

• Goal-oriented
• Takes place in rich, complex, detailed environment
• Requires a large amount of knowledge
• Requires use of symbols and abstractions
• Flexible, and a function of the environment
• Requires learning from the environment and experience


[Figure: a baseball diamond showing the pitcher (Joe), the batter (Sam), the catcher, the first, second, and third basemen, the shortstop, and the left-, center-, and right-fielders.]


• Behaves in goal-oriented manner
  – Joe's goal is to win the game
  – He adopts several subgoals to help him achieve this
• Operates in a rich, complex, detailed environment
  – Positions and movements of the players, current state of the game, etc.
• Uses a large amount of knowledge
  – Choosing the pitch draws on his own pitching record, Sam's batting record, etc.
• Behaves flexibly as a function of the environment
  – Choosing the pitch takes into account handedness of the batter, etc.
  – When the pitch is hit, Joe must change his subgoal to respond to the new situation
• Uses symbols and abstractions
  – Since Joe has never played this particular game before, he must draw on previous experience by abstracting away from this day and place
• Learns from environment and experience
  – Joe needs to learn from this experience in order to do better when Sam bats in the future


Content is Knowledge

• K1: Knowledge of the objects in the game
  – E.g., baseball, infield, base line, inning, out, etc.
• K2: Knowledge of abstract events and particular episodes
  – E.g., how batters hit, how this guy batted last time
• K3: Knowledge of rules of the game
  – E.g., number of outs, balk, infield fly
• K4: Knowledge of objectives
  – E.g., get batter out, throw strikes
• K5: Knowledge of actions or methods for attaining objectives
  – E.g., use a curve ball, throw to first, walk batter
• K6: Knowledge of when to choose actions or methods
  – E.g., if behind in the count, throw a fast ball
• K7: Knowledge of the component physical actions
  – E.g., how to throw a curve ball, catch, run


Problem Spaces

• Knowledge is organized as a sequence of decisions through a problem space


[Figure: a branching sequence of decisions. Joe is standing on the mound, Sam is at bat, and Joe has the goal of getting Sam out. Joe can choose a curve ball, a fast ball, or a slider; each pitch can end in a strike, a ball, a foul, a hit, a single, or a home run. Depending on the outcome, Joe faces the next batter, chooses another fast ball, changes to a curve ball, catches the ball, and so on.]


[Figure: a problem space. Each state (S0, S1, S2, S3, S12, S30, S80, S91, …) is a set of feature-value pairs over features f1 and f2; operators link one state to another. S0 is the initial state, and some states are goal states.]


[Figure: one path through the problem space: operators take the initial state S0 (f1 v1, f2 v2) through S1 (f1 v1, f2 v1) and S12 (f1 v5, f2 v1) to the goal state S91 (f1 v3, f2 v6).]
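Read this way, a problem space is a search problem: states are sets of feature-value pairs, operators transform one state into another, and behavior is a path from the initial state to a goal state. The sketch below is a minimal, hypothetical rendering of that idea in plain Python (not Soar syntax): the operators are invented, and the goal test targets a state with f1 v3 and f2 v6, like S91 in the figure.

```python
from collections import deque

# A state is a set of feature-value pairs, as in the figure (features f1, f2).
initial_state = frozenset({("f1", "v1"), ("f2", "v2")})

def set_feature(feature, value):
    """Build an operator that rewrites a single feature of a state."""
    def apply(state):
        return frozenset({(f, v) for f, v in state if f != feature} | {(feature, value)})
    return apply

# Invented operators for the illustration.
operators = {
    "set-f1-v3": set_feature("f1", "v3"),
    "set-f1-v5": set_feature("f1", "v5"),
    "set-f2-v6": set_feature("f2", "v6"),
}

def is_goal(state):
    # Goal test: the feature values of state S91 in the figure.
    return state == frozenset({("f1", "v3"), ("f2", "v6")})

def search(start):
    """Breadth-first search for a sequence of operators reaching a goal state."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for name, op in operators.items():
            nxt = op(state)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

print(search(initial_state))  # ['set-f1-v3', 'set-f2-v6']
```

Soar itself does not enumerate the space up front like this; it makes one decision at a time, using whatever knowledge it has about the current state, which is what the following slides walk through.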


Tying the Content to the Architecture


Current State:
  batter name: Sam
  batter status: not out
  balls: 0
  strikes: 0
  outs: 0
  …
  goal: batter out
  problem space: pitch

Operator: throw-curve
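As a concrete illustration, the slide's state and operator could be written as ordinary feature-value pairs, here as a hypothetical Python dictionary (the attribute names mirror the slide; this is not Soar's working-memory format):

```python
# The current state from the slide, as feature-value pairs (illustrative names).
current_state = {
    "batter-name": "Sam",
    "batter-status": "not out",
    "balls": 0,
    "strikes": 0,
    "outs": 0,
    # ... further attributes elided on the slide
    "goal": "batter-out",
    "problem-space": "pitch",
}

# The operator under consideration for this state.
current_operator = "throw-curve"
```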


Operator Proposal

• How do operators get proposed and compared?
  – Knowledge determines when an operator is relevant to the current goal and state
    • Joe's goal is to get the batter out, and he's the pitcher, so his available operators are kinds of pitches
    • Knowledge represented in the state may influence the choice of pitch (e.g., is the batter right or left handed)
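One way to picture a proposal rule is as a function that inspects the goal and state and returns candidate operators. Everything below (attribute names, the handedness preference) is invented for illustration and is plain Python, not Soar code:

```python
def propose_pitches(state):
    """Proposal knowledge (invented): if the goal is to get the batter out and
    the agent is the pitcher, the relevant operators are kinds of pitches."""
    proposals = []
    if state.get("goal") == "batter-out" and state.get("role") == "pitcher":
        proposals = ["throw-curve", "throw-fast", "throw-slider"]
        # Knowledge on the state can influence the comparison, e.g. handedness.
        if state.get("batter-handedness") == "left":
            proposals.remove("throw-slider")  # a purely illustrative preference
    return proposals

state = {"goal": "batter-out", "role": "pitcher", "batter-handedness": "left"}
print(propose_pitches(state))  # ['throw-curve', 'throw-fast']
```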


Operator Selection

• How are operators selected?
  – Principle of Rationality: "If an agent has knowledge that an operator application will lead to one of its goals then the agent will select that operator"
  – That is: rational agents behave in a goal-oriented way
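A toy version of that selection step, continuing the same illustrative Python style (the knowledge table is made up):

```python
# Invented selection knowledge: which proposed operators the agent believes
# will lead toward the goal of getting the batter out.
leads_to_goal = {"throw-curve": True, "throw-fast": False}

def select_operator(proposed):
    """Principle of Rationality: if the agent knows an operator application
    will lead to one of its goals, select that operator."""
    for op in proposed:
        if leads_to_goal.get(op):
            return op
    # No knowledge favors any candidate: fall back to an arbitrary choice.
    return proposed[0] if proposed else None

print(select_operator(["throw-fast", "throw-curve"]))  # throw-curve
```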


Operator Application

• How is a selected operator applied?
  – Can execute the operator in the external world
    • Joe throws a pitch
  – Can result in internal changes to the state
    • Joe thinks about throwing a pitch
  – Using states and operators allows us to model both acting and thinking as a function of knowledge
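A sketch of the two kinds of application, again as illustrative Python rather than Soar's actual mechanism:

```python
def apply_operator(operator, state, external=True):
    """Apply a selected operator either by acting in the world or by
    changing the internal state (a toy model, not Soar's mechanism)."""
    if external:
        # External application: an action goes out to the environment.
        print(f"motor command: {operator}")
    else:
        # Internal application: the agent only thinks about the pitch,
        # recording the imagined choice on its own state.
        state["imagined-pitch"] = operator
    return state

state = {"goal": "batter-out"}
apply_operator("throw-curve", state, external=True)   # Joe throws the pitch
apply_operator("throw-curve", state, external=False)  # Joe thinks about it
```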


Goals

• How do we know if execution of an operator has achieved the goal?
  – In baseball, rely on knowledge of the rules of the game
  – Can also have external signals (e.g., umpire)

• How do goals and problem spaces change over time?
  – Via the application of operators


Tying the Content to the Architecture

• So how do we:
  – Represent knowledge so the agent acts in a goal-oriented way?
  – Represent knowledge in a way that is independent of baseball?

• Represent knowledge in terms of problem spaces, goals, states, and operators

• Guide operator choice by the principle of rationality


Defining the Architecture

• What are the architectural processes for using knowledge to create and change states and operators?


Long-term vs. Short-term Knowledge


[Figure: two memories. Some knowledge is not specific to the current situation and lives in Long-term (Procedural) Memory; some is, and lives in Short-term (Working) Memory.]


Soar


[Figure: the basic Soar architecture. Procedural Memory and Working Memory are connected through the Decision Procedure; Perception feeds into working memory and Action flows out of it.]


Rules (knowledge in long-term memory)

• IF I am the pitcher, the other team is at bat, and I perceive that I am at the mound,
  THEN suggest a goal to get the batter out via pitching (Pitch).

• IF the problem space is Pitch and I perceive a new batter who is left/right handed,
  THEN add batter not out, balls 0, strikes 0, and batter left/right-handed to the state.

• IF the problem space is Pitch and the batter is not out,
  THEN suggest the throw-curve-ball operator.
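To show the shape of these rules, here is a hypothetical rendering of the same three IF-THEN rules as condition-action functions over a working-memory dictionary (plain Python, not Soar's production syntax; the attribute names are invented):

```python
def propose_pitch_goal(wm):
    # Rule 1: pitcher, other team at bat, perceiving the mound -> Pitch goal.
    if (wm.get("role") == "pitcher" and wm.get("at-bat") == "other-team"
            and wm.get("perceived-location") == "mound"):
        wm["goal"] = "get-batter-out"
        wm["problem-space"] = "pitch"

def elaborate_new_batter(wm):
    # Rule 2: in the Pitch problem space, a newly perceived batter resets the count.
    if wm.get("problem-space") == "pitch" and "new-batter-handedness" in wm:
        wm.update({"batter-status": "not out", "balls": 0, "strikes": 0,
                   "batter-handedness": wm.pop("new-batter-handedness")})

def propose_throw_curve(wm):
    # Rule 3: in the Pitch problem space, if the batter is not out, suggest a curve.
    if wm.get("problem-space") == "pitch" and wm.get("batter-status") == "not out":
        wm.setdefault("proposed-operators", []).append("throw-curve-ball")

# Working memory as a flat dictionary of attribute-value pairs (illustrative).
wm = {"role": "pitcher", "at-bat": "other-team",
      "perceived-location": "mound", "new-batter-handedness": "left"}
for rule in (propose_pitch_goal, elaborate_new_batter, propose_throw_curve):
    rule(wm)  # fire each rule whose conditions match
print(wm["proposed-operators"])  # ['throw-curve-ball']
```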


• Each matching rule “maps” from current goal, state and operator to changes to those objects

• There can be dependencies among rules
  – Can't choose a pitch until you've decided to pitch to the batter, which you can't do if you're not on the mound, etc.
  – Soar doesn't recognize dependencies; it just "fires" rules as they match

• All parts of rules are expressed in terms of perceptions, actions, states and operators


Decision Cycle (how Soar controls interactions between its parts)


[Figure: one pass through the decision cycle: Input → Elaboration → Decide → Application → Output. Procedural memory (rules) and the perception/action interface both read from and write to working memory, and the decision procedure selects among proposed operators. In the example, a new batter arrives on input; rules elaborate the state (goal get-batter-out, problem space pitch, count 0/0, left-handed batter) and propose throw-fast and throw-curve; the decision procedure selects throw-curve; applying it produces the "throw curve" output; and the result (a hit) arrives on the next input.]
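A compact way to see how the pieces fit is a single hypothetical loop through the cycle, in the same illustrative Python used above (stand-in names throughout; not Soar's implementation):

```python
def decision_cycle(working_memory, rules, perceive, act):
    """One pass of input -> elaborate -> decide -> apply -> output.
    A toy rendering of the idea, not Soar's actual implementation."""
    working_memory.update(perceive())                  # Input
    for rule in rules["elaborate"]:                    # Elaboration / proposal
        rule(working_memory)
    proposed = working_memory.pop("proposed-operators", [])
    selected = proposed[0] if proposed else None       # Decide (trivial policy)
    if selected:
        for rule in rules["apply"]:                    # Application
            rule(working_memory, selected)
    act(working_memory.pop("motor-command", None))     # Output
    return working_memory

# Example wiring with trivial stand-ins for rules, perception, and action.
wm = {}
rules = {
    "elaborate": [lambda m: m.setdefault("proposed-operators", ["throw-curve"])],
    "apply": [lambda m, op: m.update({"motor-command": op})],
}
decision_cycle(wm, rules,
               perceive=lambda: {"new-batter": True},
               act=lambda cmd: print("act:", cmd))     # prints: act: throw-curve
```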


Summary

• How is (ST) knowledge represented in the state?
  – As sets of features and values
• How is general (LT) knowledge represented?
  – Rules that map one set of feature-values to another
• What are the architectural processes for using knowledge in LTM?
  – Decision cycle (input, elaborate, decide, apply, output)
• What are the mechanisms for interacting with the world?
  – Perception and action go through interfaces embedded in the decision cycle


But wait, there’s more…

• Soar also has ways to deal with a lack of knowledge, including learning

• Recent work on Soar has focused on new mechanisms to accommodate new kinds of problems
  – New long-term memories with different properties
  – New learning mechanisms
  – Non-symbolic ways of representing knowledge


Extending Soar

• Learn from rewards
  – Reinforcement learning
• Learn facts
  – What you know
  – Semantic memory
• Learn events
  – What you remember
  – Episodic memory
• Basic drives and …
  – Emotions, feelings, mood
• Non-symbolic reasoning
  – Mental imagery
• Working memory relevance
  – Activation
• Learn from regularities
  – Spatial and temporal clusters


[Figure: the extended Soar architecture. Symbolic long-term memories (procedural, semantic, episodic) connect to the symbolic short-term memory and decision procedure; learning mechanisms include chunking, reinforcement learning, semantic learning, episodic learning, and clustering; additional modules include visual imagery, feeling generation, and the perception/action interface.]


How to Learn More About Soar

• Soar homepage
  – http://sitemaker.umich.edu/soar/
  – Read the full Gentle Introduction to Soar
  – Download Soar and tutorials

• 28th Soar Workshop
  – May 5-7 (Mon-Wed) in Ann Arbor
  – Invited speakers on Cognitive Robotics
    • Greg Trafton (NRL) and Paul Benjamin (Pace University)
  – It's not too late to register!
  – http://winter.eecs.umich.edu/workshop/

• John Laird’s new book: The Soar Cognitive Architecture (due out Summer 2009)
