Lesson 5 –Low-level control and learning Anders Lyhne...

INTRODUCTION TO AUTONOMOUS MOBILE ROBOTS

Lesson 5 – Low-level control and learning

Anders Lyhne Christensen, D6.05, [email protected]

Overview

• Low-level control
   • Ad-hoc
   • Sense-think-act loop
   • Event-driven control
• Learning and adaptation I
   • Types of learning
   • Issues in learning
   • Example: Q-learning applied to a real robot

(Next time, we will discuss an interesting approach to learning called evolutionary robotics in more detail.)

Low-level control

We will cover three types of low-level control:

• Stream of instructions
• Classic control loop
• Event-driven languages

Other approaches, such as logic programming, exist, but we will not cover them in this course.

Stream of instructions

Example:

// move forward for 2 seconds:
moveForward(speed = 10)
sleep(2000)
if (obstacleAhead()) {
    turnLeft(speed = 10)
    sleep(1000)
} else {
    …
}

• Suitable for industrial, assembly-line robots
   • Easy to describe a fixed, predefined task as a recipe
   • Little branching
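The recipe style can be made concrete in Java. This is a minimal sketch, not code from the course: `LoggingRobot` is a hypothetical stand-in for a robot API whose `moveForward`, `turnLeft`, `obstacleAhead`, and `sleep` methods mirror the pseudocode above. It records commands instead of driving motors, so the fixed sequence can be inspected. The elided else-branch is left empty.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical robot API: records each command instead of driving real motors. */
class LoggingRobot {
    final List<String> log = new ArrayList<>();
    private final boolean obstacle;

    LoggingRobot(boolean obstacle) { this.obstacle = obstacle; }

    void moveForward(int speed) { log.add("forward " + speed); }
    void turnLeft(int speed)    { log.add("left " + speed); }
    boolean obstacleAhead()     { return obstacle; }
    void sleep(long ms)         { log.add("sleep " + ms); } // a real robot would block here
}

class StreamOfInstructions {
    /** A fixed recipe: a straight-line sequence with a single branch. */
    static void recipe(LoggingRobot robot) {
        robot.moveForward(10);      // move forward for 2 seconds
        robot.sleep(2000);
        if (robot.obstacleAhead()) {
            robot.turnLeft(10);
            robot.sleep(1000);
        }
        // else-branch: elided on the slide
    }

    public static void main(String[] args) {
        LoggingRobot robot = new LoggingRobot(true);
        recipe(robot);
        System.out.println(robot.log); // [forward 10, sleep 2000, left 10, sleep 1000]
    }
}
```

The whole behavior is visible in one straight-line method. That makes the approach easy to write for a fixed task but hard to extend once the robot must react to many different situations.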

Classic control loop

Sense → Think → Act

Each iteration of the loop usually has a fixed duration, e.g., 100 ms, and the loop is executed repeatedly.


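while (!Button.ESCAPE.isPressed()) {
    long startTime = System.currentTimeMillis();

    sense(); // read sensors
    think(); // plan next action
    act();   // do next action

    // sleep for whatever is left of the 100 ms period;
    // if this iteration overran, start the next one immediately
    long remaining = 100 - (System.currentTimeMillis() - startTime);
    if (remaining > 0) {
        try {
            Thread.sleep(remaining);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
    }
}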
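One subtlety in the loop above is timing drift. Measuring the 100 ms period from each iteration's own start time lets any oversleep accumulate from cycle to cycle. In addition, `System.currentTimeMillis()` can jump if the wall clock is adjusted. The self-contained sketch below addresses both issues. It schedules against absolute deadlines on the monotonic `System.nanoTime()` clock and uses stub sensing and acting. The names `FixedRateLoop` and `Controller` are hypothetical.

```java
/** Self-contained sketch of a fixed-period sense-think-act loop (hypothetical names). */
class FixedRateLoop {
    interface Controller {
        void sense();
        void think();
        void act();
    }

    static final long PERIOD_NS = 100_000_000L; // 100 ms, as on the slide

    /** Runs the given number of cycles; returns how many cycles overran the period. */
    static int run(Controller c, int cycles) {
        int overruns = 0;
        long deadline = System.nanoTime();          // monotonic clock
        for (int i = 0; i < cycles; i++) {
            deadline += PERIOD_NS;                  // absolute deadline: errors do not accumulate
            c.sense();
            c.think();
            c.act();
            long remaining = deadline - System.nanoTime();
            if (remaining > 0) {
                try {
                    Thread.sleep(remaining / 1_000_000, (int) (remaining % 1_000_000));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    return overruns;
                }
            } else {
                overruns++;
                deadline = System.nanoTime();       // resynchronize instead of catching up in a burst
            }
        }
        return overruns;
    }

    public static void main(String[] args) {
        int[] cycles = {0};
        Controller c = new Controller() {
            public void sense() { /* read sensors */ }
            public void think() { /* plan next action */ }
            public void act()   { cycles[0]++; }
        };
        long start = System.nanoTime();
        int overruns = run(c, 5);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(cycles[0] + " cycles, " + overruns + " overruns, " + elapsedMs + " ms");
    }
}
```

On an overrun, this sketch resynchronizes rather than running several catch-up iterations back to back. That is one reasonable choice for a robot controller, but not the only one.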

Event-driven languages

URBI script – examples:

Ball tracking:

whenever (ball.visible) {
    headYaw.val += camera.xfov * ball.x
    &
    headPitch.val += camera.yfov * ball.y
};

Interaction:

at (speech.hear("hello")) {
    voice.say("How are you?")
    &
    robot.standup();
}
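In URBI, `whenever` keeps executing its body for as long as the condition holds, whereas `at` fires once each time the condition becomes true. The Java sketch below imitates that difference by polling every condition on each tick. It illustrates the semantics under that polling assumption; the class `EventRules` and its methods are hypothetical, and this is not how the URBI runtime is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Polling approximation of two URBI event constructs (hypothetical class):
 *  - whenever(cond, body): runs body on every tick while cond holds (level-triggered)
 *  - at(cond, body):       runs body once each time cond becomes true (edge-triggered)
 */
class EventRules {
    private static final class Rule {
        final BooleanSupplier condition;
        final Runnable body;
        final boolean edgeTriggered;
        boolean wasTrue = false;   // condition value at the previous tick

        Rule(BooleanSupplier condition, Runnable body, boolean edgeTriggered) {
            this.condition = condition;
            this.body = body;
            this.edgeTriggered = edgeTriggered;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void whenever(BooleanSupplier cond, Runnable body) { rules.add(new Rule(cond, body, false)); }

    void at(BooleanSupplier cond, Runnable body) { rules.add(new Rule(cond, body, true)); }

    /** Evaluates every rule once; a runtime would call this repeatedly. */
    void tick() {
        for (Rule r : rules) {
            boolean now = r.condition.getAsBoolean();
            boolean fire = r.edgeTriggered ? (now && !r.wasTrue) : now;
            if (fire) r.body.run();
            r.wasTrue = now;
        }
    }

    public static void main(String[] args) {
        boolean[] ballVisible = {false, true, true, false, true};
        int[] step = {0};
        int[] wheneverRuns = {0};
        int[] atRuns = {0};
        EventRules rules = new EventRules();
        rules.whenever(() -> ballVisible[step[0]], () -> wheneverRuns[0]++);
        rules.at(() -> ballVisible[step[0]], () -> atRuns[0]++);
        for (step[0] = 0; step[0] < ballVisible.length; step[0]++) rules.tick();
        System.out.println("whenever ran " + wheneverRuns[0] + " times, at ran " + atRuns[0] + " times");
    }
}
```

For the signal false, true, true, false, true, the `whenever` body runs three times (once per tick on which the ball is visible). The `at` body runs twice (once per transition from not visible to visible).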

Distributed and event-driven

(Figure: microcontrollers for the right motor, the left motor, the proximity sensors, and others (….), all connected by an event bus.)
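The figure suggests nodes that communicate only by broadcasting events on a shared bus. The sketch below is a rough, in-process illustration of that idea: the proximity node publishes an event, and each motor node reacts independently. The `EventBus`, `ProximityNode`, and `MotorNode` classes, the "obstacle" topic, and the 20 cm threshold are all hypothetical. A real system would exchange messages between microcontrollers over a physical bus.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntConsumer;

/** In-process stand-in for the event bus: topics map to subscriber callbacks. */
class EventBus {
    private final Map<String, List<IntConsumer>> subscribers = new HashMap<>();

    void subscribe(String topic, IntConsumer handler) {
        subscribers.computeIfAbsent(topic, k -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, int value) {
        for (IntConsumer h : subscribers.getOrDefault(topic, Collections.emptyList())) {
            h.accept(value);
        }
    }
}

/** Proximity-sensor node: publishes an event only when something is close. */
class ProximityNode {
    static final int THRESHOLD_CM = 20; // hypothetical trigger distance
    private final EventBus bus;

    ProximityNode(EventBus bus) { this.bus = bus; }

    void onReading(int distanceCm) {
        if (distanceCm < THRESHOLD_CM) bus.publish("obstacle", distanceCm);
    }
}

/** Motor node: cruises at a default speed and switches speed when an obstacle event arrives. */
class MotorNode {
    int speed;

    MotorNode(EventBus bus, int cruiseSpeed, int avoidSpeed) {
        this.speed = cruiseSpeed;
        bus.subscribe("obstacle", distanceCm -> speed = avoidSpeed);
    }
}

class EventBusDemo {
    public static void main(String[] args) {
        EventBus bus = new EventBus();
        MotorNode left = new MotorNode(bus, 10, -10);  // reverse on obstacle
        MotorNode right = new MotorNode(bus, 10, 10);  // keep going, so the robot turns left
        ProximityNode proximity = new ProximityNode(bus);
        proximity.onReading(50);                       // far away: no event
        proximity.onReading(15);                       // close: "obstacle" event
        System.out.println("left=" + left.speed + " right=" + right.speed); // left=-10 right=10
    }
}
```

No node calls another directly. Adding a further node, such as another sensor, only requires subscribing or publishing on the bus.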

LEARNING AND ADAPTATION

Based on slides from Prof. Lynn E. Parker

What is Learning/Adaptation?

• Many definitions:
   • Modification of behavioral tendency by experience. (Webster 1984)
   • A learning machine, broadly defined, is any device whose actions are influenced by past experiences. (Nilsson 1965)
   • Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population. (Simon 1983)
   • An improvement in information processing ability that results from information processing activity. (Tanimoto 1990)
• Our operational definition:
   • Learning produces changes within an agent that over time enable it to perform more effectively within its environment.

What is the Relationship between Learning and Adaptation?

• Evolutionary adaptation: Descendants change over long time scales based on the success or failure of their ancestors in the environment
• Structural adaptation: Agents adapt their morphology with respect to the environment
• Sensor adaptation: An agent’s perceptual system becomes more attuned to its environment
• Behavioral adaptation: An agent’s individual behaviors are adjusted relative to one another
• Learning: Essentially anything else that results in a more ecologically fit agent (can include adaptation)

Habituation and Sensitization

• Adaptation may produce habituation or sensitization
   • Habituation:
      • An eventual decrease in, or cessation of, a behavioral response when a stimulus is presented numerous times
      • Useful for eliminating spurious or unnecessary responses
      • Generally associated with relatively insignificant stimuli, such as loud noise
   • Sensitization:
      • The opposite: an increase in the probability of a behavioral response when a stimulus is repeated frequently
      • Generally associated with “dire” stimuli, like electrical shocks

(Figures: Sensitization; Example of habituation)

Learning

• Learning, on the other hand, can improve performance in additional ways:
   • Introducing new knowledge (facts, behaviors, rules) into the system
   • Generalizing concepts from multiple examples
   • Specializing concepts for particular instances that are in some way different from the mainstream
   • Reorganizing the information within the system to be more efficient
   • Creating or discovering new concepts
   • Creating explanations of how things function
   • Reusing past experiences

AI Research has Generated Several Learning Approaches

• Reinforcement learning: Rewards and/or punishments are used to alter numeric values in a controller
• Evolutionary learning: Genetic operators such as crossover and mutation are used over populations of controllers, leading to more efficient control strategies
• Neural networks: Specialized architectures in which learning occurs as the result of alterations in synaptic weights
• Learning from experience:
   • Memory-based learning: Myriad individual records of past experiences are used to derive function approximators for control laws
   • Case-based learning: Specific experiences are organized and stored as a case structure, then retrieved and adapted as needed based on the current situational context

Learning Approaches (cont.)

• Inductive learning: Specific training examples are used, each in turn, to generalize and/or specialize concepts or controllers
• Explanation-based learning: Specific domain knowledge is used to guide the learning process
• Multistrategy learning: Multiple learning methods compete and cooperate with each other, each specializing in what it does best

Challenges with Learning

• Credit assignment problem: How is credit or blame assigned to a particular piece or pieces of knowledge in a large knowledge base, or to the components of a complex system responsible for either the success or failure of an attempt to accomplish a task?
• Saliency problem: What features in the available input stream are relevant to the learning task?
• New term problem: When does a new representational construct (concept) need to be created to capture some useful feature effectively?
• Indexing problem: How can a memory be efficiently organized to provide effective and timely recall to support learning and improved performance?
• Utility problem: How does a learning system determine that the information it contains is still relevant and useful? When is it acceptable to forget things?

Example: Q-Learning Algorithm

• Provides the ability to learn which behavioral actions are most appropriate for a given situation
• State-action table: one row per state x and one column per action a, with entry Q(x,a)
• E(y) = utility of the next state y that results from taking action a in state x

Update function for Q(x,a)

• $Q(x,a) \leftarrow Q(x,a) + \beta\,\bigl(r + \lambda E(y) - Q(x,a)\bigr)$
• Where:
   • β is the learning rate parameter
   • r is the payoff (reward or punishment)
   • λ is a parameter, called the discount factor, ranging between 0 and 1
   • E(y) is the utility of the state y that results from the action and is computed by:
     $E(y) = \max_{a} Q(y,a)$, i.e., the maximum of Q(y,a) over all actions a
• Rewards are propagated across states, so that rewards received in similar states can facilitate learning, too.
• What is a “similar state”? One approach: weighted Hamming distance
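The update rule translates directly into code. Below is a minimal tabular sketch in Java that keeps the slide's symbols: `beta` for β, `lambda` for λ, and `utility` for E(y). The class and method names are hypothetical, and integer indices stand in for states and actions.

```java
import java.util.Arrays;

/** Tabular Q-learning update in the slide's notation (hypothetical class). */
class QTable {
    private final double[][] q;     // q[x][a] = Q(x,a), initialized to 0
    private final double beta;      // learning rate
    private final double lambda;    // discount factor, between 0 and 1

    QTable(int numStates, int numActions, double beta, double lambda) {
        this.q = new double[numStates][numActions];
        this.beta = beta;
        this.lambda = lambda;
    }

    /** E(y) = max over all actions a of Q(y,a). */
    double utility(int y) {
        return Arrays.stream(q[y]).max().getAsDouble();
    }

    /** Q(x,a) <- Q(x,a) + beta * (r + lambda * E(y) - Q(x,a)). */
    void update(int x, int a, double r, int y) {
        q[x][a] += beta * (r + lambda * utility(y) - q[x][a]);
    }

    double get(int x, int a) { return q[x][a]; }

    public static void main(String[] args) {
        QTable table = new QTable(2, 2, 0.5, 0.9);
        table.update(0, 1, 1.0, 1);          // reward 1 for action 1 in state 0, landing in state 1
        System.out.println(table.get(0, 1)); // 0.5
    }
}
```

Starting from an all-zero table with β = 0.5, a single reward of 1 moves Q(0,1) halfway toward the target r + λE(y) = 1. Repeated updates keep closing the gap between the current estimate and that target.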

Utility Function Used to Modify Robot’s Behavioral Responses

Initialize all Q(x,a) to 0.

Do Forever
   • Determine the current world state x via sensing
   • 90% of the time, choose the action a that maximizes Q(x,a); otherwise, pick a random action
   • Execute a
   • Determine reward r
   • Update Q(x,a) as described
   • Update Q(x’,a) for all states x’ similar to x
End Do
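Putting the pieces together, here is a self-contained Java sketch of the loop above on a deliberately tiny problem. The problem is hypothetical: 16 states encoded as 4-bit patterns, 3 actions, and an environment in which action 2 always earns reward 1 and the next state is random. "Similar" is taken to mean Hamming distance 1. The environment, the parameter values (β = 0.2, λ = 0.5), and the random seed are assumptions chosen so that the outcome is easy to check. They are not taken from the Obelix experiments.

```java
import java.util.Random;

/**
 * Sketch of the slide's Q-learning loop on a hypothetical toy problem:
 * 16 states (4-bit patterns), 3 actions, action 2 always earns reward 1,
 * the next state is random, and "similar" means Hamming distance 1.
 */
class QLearningLoop {
    static final int BITS = 4, STATES = 1 << BITS, ACTIONS = 3;
    static final double BETA = 0.2, LAMBDA = 0.5, GREEDY = 0.9;

    final double[][] q = new double[STATES][ACTIONS]; // all Q(x,a) start at 0
    final Random rng = new Random(42);

    /** Action that maximizes Q(x,a); ties go to the lowest index. */
    int greedyAction(int x) {
        int best = 0;
        for (int a = 1; a < ACTIONS; a++) if (q[x][a] > q[x][best]) best = a;
        return best;
    }

    double utility(int y) { return q[y][greedyAction(y)]; } // E(y) = max_a Q(y,a)

    void update(int x, int a, double r, int y) {
        q[x][a] += BETA * (r + LAMBDA * utility(y) - q[x][a]);
    }

    void run(int steps) {
        int x = rng.nextInt(STATES);                       // initial "sensed" state
        for (int t = 0; t < steps; t++) {
            // 90% of the time exploit, otherwise pick a random action
            int a = rng.nextDouble() < GREEDY ? greedyAction(x) : rng.nextInt(ACTIONS);
            double r = (a == 2) ? 1.0 : 0.0;               // toy reward
            int y = rng.nextInt(STATES);                   // toy next state
            update(x, a, r, y);
            // also update the similar states x' (one bit flipped)
            for (int bit = 0; bit < BITS; bit++) update(x ^ (1 << bit), a, r, y);
            x = y;
        }
    }

    public static void main(String[] args) {
        QLearningLoop learner = new QLearningLoop();
        learner.run(20_000);
        for (int x = 0; x < STATES; x++)
            System.out.println("state " + x + " -> action " + learner.greedyAction(x));
    }
}
```

After training, the greedy action in every state should be 2, with Q(x,2) close to 1/(1 − λ) = 2. Because the toy rewards do not depend on the state, propagating updates to similar states only speeds learning here. On a real robot, the choice of similarity measure decides whether this propagation helps or misleads.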

Example of Using Q-Learning: Teaching Box-Pushing

• Robot (Obelix):
   • 8 sonar sensors (4 look forward, 2 look right, 2 look left)
   • Sonar readings quantized into two ranges:
      • NEAR (9-18 inches)
      • FAR (18-30 inches)
   • Forward-looking infrared (IR) sensor:
      • Binary response at 4 inches, indicating when the robot is in the BUMP state
   • Current to the drive motors is monitored to determine whether the robot is STUCK (i.e., the input current exceeds a threshold)
   • Total of 18 bits of sensor information: 16 sonar bits (a NEAR and a FAR bit per sonar), plus one each for BUMP and STUCK
• Motor control outputs, five choices:
   • Moving forward
   • Turning left 22 degrees
   • Turning right 22 degrees
   • Turning more sharply left, at 45 degrees
   • Turning more sharply right, at 45 degrees

Obelix robot and box, 1991
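The 18 sensor bits listed above can be packed into a single integer, which then serves as the state index x. The minimal sketch below assumes a bit layout that the slide does not specify: sonar i uses bit 2i for NEAR and bit 2i + 1 for FAR, bit 16 is BUMP, and bit 17 is STUCK. `ObelixState.encode` is a hypothetical helper.

```java
/** Hypothetical packing of Obelix-style sensor readings into an 18-bit state. */
class ObelixState {
    static final int SONARS = 8;
    static final int BUMP_BIT = 16, STUCK_BIT = 17; // assumed positions

    /**
     * near[i] / far[i]: whether sonar i reports something in the NEAR (9-18 in)
     * or FAR (18-30 in) band. Sonar i uses bits 2i (NEAR) and 2i+1 (FAR).
     */
    static int encode(boolean[] near, boolean[] far, boolean bump, boolean stuck) {
        if (near.length != SONARS || far.length != SONARS)
            throw new IllegalArgumentException("expected " + SONARS + " sonar readings");
        int state = 0;
        for (int i = 0; i < SONARS; i++) {
            if (near[i]) state |= 1 << (2 * i);
            if (far[i])  state |= 1 << (2 * i + 1);
        }
        if (bump)  state |= 1 << BUMP_BIT;
        if (stuck) state |= 1 << STUCK_BIT;
        return state;
    }

    public static void main(String[] args) {
        boolean[] near = new boolean[SONARS];
        boolean[] far = new boolean[SONARS];
        near[0] = true;                                      // sonar 0 sees something NEAR
        System.out.println(encode(near, far, true, false));  // 1 + 65536 = 65537
    }
}
```

Any fixed layout works as long as it is used consistently. The resulting integer can index a Q-table directly or be compared bitwise with other states.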

Robot’s Learning Problem

• Learning problem:
   • Deciding, for any of the approximately 250,000 perceptual states, which of the 5 possible actions will enable the robot to find and push boxes around a room without getting stuck

250,000 perceptual states × 5 actions = 1,250,000 state/action pairs to explore!
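The 250,000 figure is presumably a rounded count of the distinct 18-bit sensor patterns. Using the exact count, the arithmetic is:

```latex
2^{18} = 262{,}144 \approx 250{,}000 \text{ states}, \qquad
2^{18} \times 5 = 1{,}310{,}720 \approx 1.3 \times 10^{6} \text{ state/action pairs}
```

The slide's 1,250,000 follows from the rounded 250,000. Either way, the table is far too large to explore exhaustively on a physical robot.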

State Diagram of Behavior

(Figure: transitions among the Finder, Pusher, and Unwedger behaviors, with arcs labeled BUMP, BUMP + ∆t, STUCK, STUCK + ∆t, and “Anything else”.)

• Finder: moves the robot toward possible boxes
• Pusher: takes over after a BUMP that results from finding a box
• Unwedger: extricates the robot when the box is no longer pushable
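One way to read the diagram's labels is as a priority scheme. Unwedger runs while the robot is STUCK and for ∆t afterwards. Otherwise, Pusher runs while BUMP is set and for ∆t afterwards. In any other case, Finder runs ("anything else"). The Java sketch below encodes that reading. The priority order, the hold time, and the class `BoxPushingArbiter` are assumptions for illustration, not details confirmed by the slide.

```java
/** Hypothetical arbitration among the three box-pushing behaviors. */
class BoxPushingArbiter {
    enum Behavior { FINDER, PUSHER, UNWEDGER }

    private final long holdMs;                      // the ∆t from the diagram (assumed value)
    private long lastBumpMs = Long.MIN_VALUE / 2;   // "far in the past"
    private long lastStuckMs = Long.MIN_VALUE / 2;

    BoxPushingArbiter(long holdMs) { this.holdMs = holdMs; }

    /** Picks the active behavior from the current sensor flags at time nowMs. */
    Behavior select(boolean bump, boolean stuck, long nowMs) {
        if (stuck) lastStuckMs = nowMs;
        if (bump) lastBumpMs = nowMs;
        if (nowMs - lastStuckMs <= holdMs) return Behavior.UNWEDGER; // STUCK, or STUCK + ∆t
        if (nowMs - lastBumpMs <= holdMs) return Behavior.PUSHER;    // BUMP, or BUMP + ∆t
        return Behavior.FINDER;                                      // anything else
    }

    public static void main(String[] args) {
        BoxPushingArbiter arbiter = new BoxPushingArbiter(500);
        System.out.println(arbiter.select(false, false, 0));   // FINDER
        System.out.println(arbiter.select(true, false, 100));  // PUSHER
        System.out.println(arbiter.select(false, false, 400)); // PUSHER (still within ∆t)
        System.out.println(arbiter.select(false, false, 700)); // FINDER
    }
}
```

Holding a behavior for ∆t after its trigger disappears keeps the robot from flickering between behaviors when a sensor reading toggles briefly.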

Measurement of “State Nearness”

• Use the 18-bit representation of state (16 bits for sonar, two for BUMP and STUCK)
• Compute the Hamming distance between states
   • Recall: Hamming distance = number of bits in which the two states differ
• For this example, states were considered “near” if their Hamming distance was < 3
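On an integer-encoded state, the Hamming distance is a single XOR followed by a bit count. The minimal Java sketch below uses hypothetical helper names. It also includes the weighted variant mentioned earlier, with one caller-supplied weight per bit, and it enumerates the "near" states that an update would be propagated to.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for "state nearness" on 18-bit Obelix-style states. */
class StateNearness {
    static final int STATE_BITS = 18;     // 16 sonar bits + BUMP + STUCK
    static final int NEAR_THRESHOLD = 3;  // "near" if Hamming distance < 3

    /** Number of bit positions in which the two states differ. */
    static int hamming(int s1, int s2) {
        return Integer.bitCount(s1 ^ s2);
    }

    /** Weighted variant: each differing bit i contributes weights[i] instead of 1. */
    static double weightedHamming(int s1, int s2, double[] weights) {
        int diff = s1 ^ s2;
        double d = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (((diff >> i) & 1) != 0) d += weights[i];
        }
        return d;
    }

    static boolean near(int s1, int s2) {
        return hamming(s1, s2) < NEAR_THRESHOLD;
    }

    /** Every state x' != x that is near x, i.e., x with one or two bits flipped. */
    static List<Integer> similarStates(int x) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < STATE_BITS; i++) {
            result.add(x ^ (1 << i));
            for (int j = i + 1; j < STATE_BITS; j++) {
                result.add(x ^ (1 << i) ^ (1 << j));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hamming(0b1011, 0b0010));   // 2
        System.out.println(similarStates(0).size());   // 171
    }
}
```

With 18 bits, each state has 18 + (18 × 17)/2 = 171 near neighbors other than itself. Each update to Q(x,a) can therefore touch up to 171 additional table entries.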

Robotic Results

• Q-learning strategy tested on the Obelix robot
• Observations:
   • Q-learning substantially improved box pushing compared with a random agent
   • Performance was also compared with a hand-coded solution: the Q-learning approach performed close to, or better than, the hand-coded solution
• Importance of this work:
   • Its empirical demonstration of Q-learning’s feasibility as a useful approach to behavior-based robotic learning

Summary of Learning/Adaptation so far

• Robots need to learn in order to adapt effectively to a changing and dynamic environment
• Behavior-based robots can learn in a variety of ways:
   • They can learn entire new behaviors
   • They can learn more effective responses
   • They can learn to associate more appropriate or broader stimuli with a particular response
   • They can learn new combinations of behaviors
   • They can learn more effective coordination of existing behaviors
• Learning can be either continuous (on-line) or batch (off-line)
• Q-Learning is a form of Reinforcement Learning in which actions and states are evaluated together.

A Challenge: Getting RL to Work on Real Robots

• When is learning appropriate?
   • When the task is originally underspecified or difficult to code exactly by hand
   • When the task has parameters that are likely to change over time in unpredictable ways
   • When the time taken to learn a control policy is less than that for hand-coding a comparable policy
   • When the learned policy can be executed more efficiently than a hand-coded one

Problems with RL on Robots

• Huge number of states to explore, with a large number of possible actions in each state
   • E.g., 24 sonar sensors, each quantized into 3 range bands → 282 billion possible states
   • If the possible actions in each state are to go forward or backward → more than 560 billion state-action combinations to try
• The robot is physical, so it takes time to perform each action
   • At 1 second per action → roughly 20,000 years to try each combination
• During early learning, the robot’s actions may be dangerous
   • “Let’s try rolling down the stairwell to see what next state I end up in …”
   • One possible safeguard: give the robot reflexes that stop dangerous actions
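The figures in the first two bullets can be checked with a few lines of arithmetic. The small Java sketch below assumes 365.25-day years; the class and method names are hypothetical.

```java
/** Back-of-the-envelope check of the slide's state-space numbers (hypothetical helper). */
class StateSpaceCost {
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 3600; // assumes 365.25-day years

    /** bands^sensors, with overflow checking. */
    static long stateCount(int sensors, int bands) {
        long count = 1;
        for (int i = 0; i < sensors; i++) count = Math.multiplyExact(count, bands);
        return count;
    }

    /** Years needed to try every state-action pair once. */
    static double yearsToTryAll(long pairs, double secondsPerAction) {
        return pairs * secondsPerAction / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        long states = stateCount(24, 3);   // 24 sonars, 3 range bands each
        long pairs = states * 2;           // two actions: forward or backward
        System.out.println("states: " + states);
        System.out.println("pairs:  " + pairs);
        System.out.printf("years:  %.0f%n", yearsToTryAll(pairs, 1.0));
    }
}
```

The sketch yields about 282 billion states, about 565 billion state-action pairs, and roughly 17,900 years at one action per second, which the slide rounds to 20,000 years.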

Today’s task

• Work on projects
