INTRODUCTION TO AUTONOMOUS
MOBILE ROBOTS
Lesson 5 – Low-level control and learning
Anders Lyhne Christensen, D6.05, [email protected]
Overview
• Low-level control
  • Ad-hoc
  • Sense-think-act loop
  • Event-driven control
• Learning and adaptation I
  • Types of learning
  • Issues in learning
  • Example: Q-learning applied to a real robot
(Next time, we will discuss an interesting approach to learning, called evolutionary robotics, in more detail.)
Low-level control
We will cover three types of low-level control:
• Stream of instructions
• Classic control loop
• Event-driven languages
Other approaches, such as logic programming, exist, but we will not cover them in this course.
Stream of instructions
Example:
// move forward for 2 seconds:
moveForward(speed = 10)
sleep(2000)
if (obstacleAhead()) {
    turnLeft(speed = 10)
    sleep(1000)
} else {
    …
}
• Suitable for industrial, assembly-line robots
  • Easy to describe a fixed, predefined task as a recipe
  • Little branching
Classic control loop
Sense → Think → Act
The loop usually has a fixed duration, e.g. 100 ms, and is executed repeatedly.
Classic control loop
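while (!Button.ESCAPE.isPressed()) {
    long startTime = System.currentTimeMillis();
    sense();  // read sensors
    think();  // plan next action
    act();    // do next action
    // sleep for the rest of the 100 ms period (not at all if the iteration overran)
    long remaining = 100 - (System.currentTimeMillis() - startTime);
    try {
        Thread.sleep(Math.max(0, remaining));
    } catch (InterruptedException e) {
        break;  // stop the loop if the thread is interrupted
    }
}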
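The timing part of the loop can be isolated into a small helper. The sketch below is only an illustration: the names `LoopTiming`, `remainingMillis` and `runLoop` are hypothetical, and the `Runnable` stands in for the robot's sense, think and act steps. It shows one way to keep a fixed period while tolerating an iteration that takes longer than the period.

```java
/** Hypothetical helper for a fixed-period sense-think-act loop. */
final class LoopTiming {

    /** Time left in the current period; 0 if the iteration overran the period. */
    static long remainingMillis(long periodMillis, long elapsedMillis) {
        return Math.max(0L, periodMillis - elapsedMillis);
    }

    /** Runs the given step a fixed number of times, one step per period. */
    static int runLoop(int iterations, long periodMillis, Runnable step)
            throws InterruptedException {
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.currentTimeMillis();
            step.run(); // stand-in for sense(); think(); act();
            long elapsed = System.currentTimeMillis() - start;
            Thread.sleep(remainingMillis(periodMillis, elapsed));
            completed++;
        }
        return completed;
    }
}
```

Clamping at zero matters because `Thread.sleep` throws an `IllegalArgumentException` when given a negative duration.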
Event-driven languages
URBI script examples:
Ball tracking:
whenever (ball.visible) {
    headYaw.val += camera.xfov * ball.x
    &
    headPitch.val += camera.yfov * ball.y
};
Interaction:
at (speech.hear("hello")) {
    voice.say("How are you?")
    &
    robot.standup();
}
Distributed and event-driven
[Figure: separate microcontrollers for the right motor, the left motor, the proximity sensors, …, connected to a shared event bus]
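The idea of components communicating over an event bus can be sketched in a few lines. This is a minimal in-process illustration, not the firmware of any particular robot: the class `EventBus`, the topic names and the integer payload are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical publish/subscribe bus; topics are strings, payloads are ints. */
final class EventBus {
    private final Map<String, List<Consumer<Integer>>> handlers = new HashMap<>();

    /** Registers a handler that is called whenever an event is published on the topic. */
    void subscribe(String topic, Consumer<Integer> handler) {
        handlers.computeIfAbsent(topic, k -> new ArrayList<>()).add(handler);
    }

    /** Delivers the value to every handler of the topic; returns how many were notified. */
    int publish(String topic, int value) {
        List<Consumer<Integer>> subscribers =
                handlers.getOrDefault(topic, Collections.emptyList());
        for (Consumer<Integer> handler : subscribers) {
            handler.accept(value);
        }
        return subscribers.size();
    }
}
```

A motor controller could, for instance, subscribe to a `"proximity"` topic and stop when the published distance drops below a threshold, without the proximity sensor knowing who is listening.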
LEARNING AND ADAPTATION
Based on slides from Prof. Lynn E. Parker
What is Learning/Adaptation?
• Many definitions:
  • Modification of behavioral tendency by experience. (Webster 1984)
  • A learning machine, broadly defined, is any device whose actions are influenced by past experiences. (Nilsson 1965)
  • Any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population. (Simon 1983)
  • An improvement in information processing ability that results from information processing activity. (Tanimoto 1990)
• Our operational definition:
  • Learning produces changes within an agent that over time enable it to perform more effectively within its environment.
What is the Relationship between Learning and Adaptation?
• Evolutionary adaptation: Descendants change over long time scales based on the success or failure of their ancestors in the environment
• Structural adaptation: Agents adapt their morphology with respect to the environment
• Sensor adaptation: An agent’s perceptual system becomes more attuned to its environment
• Behavioral adaptation: An agent’s individual behaviors are adjusted relative to one another
• Learning: Essentially anything else that results in a more ecologically fit agent (can include adaptation).
Habituation and Sensitization
• Adaptation may produce habituation or sensitization
• Habituation:
  • An eventual decrease in or cessation of a behavioral response when a stimulus is presented numerous times
  • Useful for eliminating spurious or unnecessary responses
  • Generally associated with relatively insignificant stimuli, such as loud noise
• Sensitization:
  • The opposite: an increase in the probability of a behavioral response when a stimulus is repeated frequently
  • Generally associated with “dire” stimuli, like electrical shocks
[Figure: example of habituation, with sensitization labeled]
Learning
• Learning, on the other hand, can improve performance in additional ways:
  • Introducing new knowledge (facts, behaviors, rules) into the system
  • Generalizing concepts from multiple examples
  • Specializing concepts for particular instances that are in some way different from the mainstream
  • Reorganizing the information within the system to be more efficient
  • Creating or discovering new concepts
  • Creating explanations of how things function
  • Reusing past experiences
AI Research has Generated Several Learning Approaches
• Reinforcement learning: Rewards and/or punishments are used to alter numeric values in a controller
• Evolutionary learning: Genetic operators such as crossover and mutation are used over populations of controllers, leading to more efficient control strategies
• Neural networks: A form of reinforcement learning that uses specialized architectures in which learning occurs as the result of alterations in synaptic weights
• Learning from experience:
  • Memory-based learning: Myriad individual records of past experiences are used to derive function approximators for control laws
  • Case-based learning: Specific experiences are organized and stored as a case structure, then retrieved and adapted as needed based on the current situational context
Learning Approaches (cont.)
• Inductive learning: Specific training examples are used, each in turn, to generalize and/or specialize concepts or controllers
• Explanation-based learning: Specific domain knowledge is used to guide the learning process
• Multistrategy learning: Multiple learning methods compete and cooperate with each other, each specializing in what it does best
Challenges with Learning
• Credit assignment problem: How is credit or blame assigned to a particular piece or pieces of knowledge in a large knowledge base, or to the components of a complex system responsible for either the success or failure of an attempt to accomplish a task?
• Saliency problem: What features in the available input stream are relevant to the learning task?
• New term problem: When does a new representational construct (concept) need to be created to capture some useful feature effectively?
• Indexing problem: How can a memory be efficiently organized to provide effective and timely recall to support learning and improved performance?
• Utility problem: How does a learning system determine that the information it contains is still relevant and useful? When is it acceptable to forget things?
Example: Q-Learning Algorithm
• Provides the ability to learn by determining which behavioral actions are most appropriate for a given situation
• State-action table Q(x,a)
• E(y) = utility of state y
[Figure: state-action table indexed by state x and action a; taking an action leads to the next state y with utility E(y)]
Update function for Q(x,a)
• $Q(x,a) \leftarrow Q(x,a) + \beta \, \bigl( r + \lambda E(y) - Q(x,a) \bigr)$
• Where:
  • β is the learning rate parameter
  • r is the payoff (reward or punishment)
  • λ is a parameter, called the discount factor, ranging between 0 and 1
  • E(y) is the utility of the state y that results from the action, computed as $E(y) = \max_a Q(y,a)$ over all actions a
• Rewards are propagated across states so that rewards from similar states can facilitate learning, too.
• What is a “similar state”? One approach: weighted Hamming distance
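The update rule can be written directly as code. The sketch below assumes a dense table `q[state][action]` of doubles with integer state and action indices. That layout and the names `QUpdate`, `utility` and `update` are illustrative choices, not something prescribed by the slides.

```java
/** Hypothetical tabular Q-learning update; q[x][a] holds Q(x,a). */
final class QUpdate {

    /** E(y) = max over all actions a of Q(y,a). */
    static double utility(double[][] q, int y) {
        double best = q[y][0];
        for (double value : q[y]) {
            best = Math.max(best, value);
        }
        return best;
    }

    /** Q(x,a) <- Q(x,a) + beta * (r + lambda * E(y) - Q(x,a)). */
    static void update(double[][] q, int x, int a, double r, int y,
                       double beta, double lambda) {
        q[x][a] += beta * (r + lambda * utility(q, y) - q[x][a]);
    }
}
```

For example, with Q(x,a) = 0, r = 1, E(y) = 2, β = 0.5 and λ = 0.9, the new value is 0.5 · (1 + 0.9 · 2 − 0) = 1.4.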
Utility Function Used to Modify Robot’s Behavioral Responses
Initialize all Q(x,a) to 0.
Do Forever
  • Determine current world state x via sensing
  • 90% of the time, choose the action a that maximizes Q(x,a); otherwise pick a random action
  • Execute a
  • Determine reward r
  • Update Q(x,a) as described
  • Update Q(x’,a) for all states x’ similar to x
End Do
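The action-selection step of this loop (mostly greedy, occasionally random) might look as follows. This is a sketch under assumptions: `qRow` is the row of Q values for the current state, an exploration probability of 0.1 corresponds to the 90%/10% split, and the names `ActionSelection`, `greedy` and `choose` are hypothetical.

```java
import java.util.Random;

/** Hypothetical action selection over one row of the Q table. */
final class ActionSelection {

    /** Index of the action with the highest Q value (first one on ties). */
    static int greedy(double[] qRow) {
        int best = 0;
        for (int a = 1; a < qRow.length; a++) {
            if (qRow[a] > qRow[best]) {
                best = a;
            }
        }
        return best;
    }

    /** With probability exploreProb pick a random action, otherwise the greedy one. */
    static int choose(double[] qRow, double exploreProb, Random rng) {
        if (rng.nextDouble() < exploreProb) {
            return rng.nextInt(qRow.length);
        }
        return greedy(qRow);
    }
}
```

Calling `ActionSelection.choose(qRow, 0.1, rng)` on every iteration gives the behavior described in the loop; the random choices are what let the robot discover actions whose Q values are still at their initial 0.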
Example of Using Q-Learning: Teaching Box-Pushing
• Robot (Obelix):
  • 8 sonar sensors (4 look forward, 2 look right, 2 look left)
  • Sonar quantized into two ranges:
    • NEAR (from 9-18 inches)
    • FAR (from 18-30 inches)
  • Forward-looking infrared (IR):
    • Binary response at a range of 4 inches, indicating when the robot is in the BUMP state
  • Current to drive motors monitored to determine if robot is STUCK (i.e., input current exceeds a threshold)
  • Total of 18 bits of sensor information available: 16 sonar bits (NEAR, FAR), two for BUMP and STUCK
• Motor control outputs, five choices:
  • Moving forward
  • Turning left 22 degrees
  • Turning right 22 degrees
  • Turning more sharply left at 45 degrees
  • Turning more sharply right at 45 degrees
[Figure: Obelix robot and box, 1991]
Robot’s Learning Problem
• Learning Problem:
  • Deciding, for any of the approximately 250,000 perceptual states, which of the 5 possible actions will enable it to find and push boxes around a room without getting stuck
250,000 perceptual states × 5 actions = 1,250,000 state/action pairs to explore!
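The figure of roughly 250,000 states follows from the 18 bits of sensor information. A quick check, assuming every combination of the 18 bits counts as a distinct perceptual state (the class and method names are hypothetical):

```java
/** Hypothetical helper that counts states for a binary sensor vector. */
final class ObelixStateSpace {

    /** Number of distinct states representable with the given number of bits. */
    static long perceptualStates(int sensorBits) {
        return 1L << sensorBits;
    }

    /** Number of (state, action) pairs in the Q table. */
    static long stateActionPairs(int sensorBits, int actions) {
        return perceptualStates(sensorBits) * actions;
    }
}
```

With 18 bits this gives 2^18 = 262,144 states and 1,310,720 state/action pairs for 5 actions, in line with the rounded 250,000 and 1,250,000 quoted above.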
State Diagram of Behavior Transitions
[Figure: state diagram with three behaviors (Finder, Pusher, Unwedger); transition labels: BUMP, BUMP + ∆t, STUCK, STUCK + ∆t, anything else]
• Finder: moves robot toward possible boxes
• Pusher: occurs after BUMP results from box find
• Unwedger: removes robot when box is no longer pushable
Measurement of “State Nearness”
• Use 18-bit representation of state (16 for sonar, two for BUMP and STUCK)
• Compute Hamming distance between states
• Recall: Hamming distance = number of bits in which the two states differ
• For this example, states were considered “near” if Hamming distance < 3
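With the state packed into the low 18 bits of an `int`, the nearness test is a one-liner. The packing and the names `StateNearness`, `hammingDistance` and `isNear` are assumptions made for this sketch; `Integer.bitCount` is the standard Java method that counts set bits.

```java
/** Hypothetical nearness test for states packed into the bits of an int. */
final class StateNearness {

    /** Number of bit positions in which the two states differ. */
    static int hammingDistance(int stateA, int stateB) {
        return Integer.bitCount(stateA ^ stateB);
    }

    /** States count as near if they differ in fewer than 3 bits. */
    static boolean isNear(int stateA, int stateB) {
        return hammingDistance(stateA, stateB) < 3;
    }
}
```

This is the unweighted distance; a weighted variant would give some bits (for example BUMP and STUCK) more influence than others.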
Robotic Results
• Q-learning strategy tested on the Obelix robot
• Observations:
  • Using Q-learning instead of a random agent substantially improved box pushing
  • Also compared performance to a hand-coded solution: performance of the Q-learning approach was close to or better than that of the hand-coded solution
• Importance of this work:
  • Its empirical demonstration of Q-learning’s feasibility as a useful approach to behavior-based robotic learning
Summary of Learning/Adaptation so far
• Robots need to learn in order to adapt effectively to a changing and dynamic environment
• Behavior-based robots can learn in a variety of ways:
  • They can learn entire new behaviors
  • They can learn more effective responses
  • They can learn to associate more appropriate or broader stimuli with a particular response
  • They can learn new combinations of behaviors
  • They can learn more effective coordination of existing behaviors
• Learning can be either continuous (on-line) or batch (off-line)
• Q-Learning is a form of Reinforcement Learning in which actions and states are evaluated together.
A Challenge: Getting RL to Work on Real Robots
• When is learning appropriate?
  • When the task is originally under-specified or difficult to code exactly by hand
  • When the task has parameters that are likely to change over time in unpredictable ways
  • When the time taken to learn a control policy is less than that for hand-coding a comparable policy
  • When the learned policy can be executed more efficiently than a hand-coded one
Problems with RL on Robots
• Huge number of states to explore, with a large number of possible actions in each state.
  • E.g., 24 sonar sensors, quantized into 3 range bands → 3^24 ≈ 282 billion possible states
  • If the possible actions in each state are to go forwards or backwards → > 560 billion state-action combinations to try
• Robot is physical, thus it takes time to perform an action
  • 1 second per action → on the order of 20,000 years to try each combination once
• During early learning, robot’s actions may be dangerous
  • “Let’s try rolling down the stairwell to see what next state I end up in …”
  • One possible safeguard: give robot reflexes to stop dangerous actions
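The numbers in the sonar example can be reproduced with a few lines of integer arithmetic. The sketch assumes 365-day years and one trial per state-action combination; the names `ExplorationCost`, `power`, `states` and `yearsToTryAll` are hypothetical.

```java
/** Hypothetical back-of-the-envelope estimate of exhaustive exploration time. */
final class ExplorationCost {
    private static final long SECONDS_PER_YEAR = 365L * 24 * 3600;

    /** base^exp by repeated multiplication (small inputs only, no overflow check). */
    static long power(long base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }

    /** Number of states for the given number of sensors, each with `bands` readings. */
    static long states(int sensors, int bands) {
        return power(bands, sensors);
    }

    /** Whole years needed to try every combination once. */
    static long yearsToTryAll(long combinations, long secondsPerAction) {
        return combinations * secondsPerAction / SECONDS_PER_YEAR;
    }
}
```

For 24 sensors with 3 bands this gives 282,429,536,481 states, 564,859,072,962 combinations with 2 actions, and about 17,900 years at one second per action, which matches the rough figure of 20,000 years.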
Today’s task
• Work on projects