Reinforcement Learning Chapter 13 What is Reinforcement Learning? Q-Learning Examples 1.

Reinforcement Learning

Chapter 13

• What is Reinforcement Learning?

• Q-Learning

• Examples

Machine Learning Categories

What is Reinforcement Learning?

• An autonomous agent should learn to choose optimal actions in each state to achieve its goals.

• The agent learns how to achieve that goal by trial-and-error interactions with its environment.

Example: Learning to ride a bike

• Suppose: In the first trial, the RL system begins riding the bicycle and performs a series of actions that result in the bicycle being tilted 45 degrees to the right.

• At this point, there are two possible actions:

– turn the handlebars right: crashing to the ground (a negative reinforcement)

– turn the handlebars left: crashing to the ground (a negative reinforcement)

Example: Learning to ride a bike

• At this point, the RL system has learned not only that turning the handlebars right or left when tilted 45 degrees to the right is bad, but also that the "state" of being tilted 45 degrees to the right is bad.

• Again, the RL system begins another trial and performs a series of actions that result in the bicycle being tilted 40 degrees to the right. ……

Reinforcement Learning: Suitable for state-action problems

• Board games, e.g. backgammon, chess, 8-puzzle, … (Imran Ghory, "Reinforcement Learning in Board Games", 2004)

[Figure: tree of board states (s0, s1, s2, s3, s5, s6, s7, s8) connected by actions a1 to a7]

What is Reinforcement Learning?

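[Figure: agent-environment interaction loop. The environment gives the agent a state and a reward, the agent responds with an action, and the interaction unfolds as s0, a0, r0, s1, a1, r1, s2, a2, r2, ...]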

s : state

a : action

r : a reward function

control policy π : S -> A

Example: TD-Gammon

• Tesauro (1995)

• Used RL to learn to play backgammon at world-champion level

• Immediate reward

– +100 if win

– -100 if lose

– 0 for all other states

• Trained by playing 1.5 million games against itself

• Now approximately equal to the best human player

An Example of Reward Function

The Goal in Reinforcement Learning

• Goal: learn to choose actions that maximize:

$r_0 + \gamma r_1 + \gamma^2 r_2 + \dots$,

• where $0 \le \gamma < 1$

• The discount factor is used to exponentially decrease the weight of reinforcements received in the future

• It’s called: Discounted Cumulative Reward
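
The discounted sum above can be computed directly for any finite reward sequence. The following is a minimal sketch; the function name `discounted_return` and the sample rewards are illustrative assumptions, not part of the slides.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r0 + gamma*r1 + gamma^2*r2 + ... for a finite reward sequence."""
    if not 0 <= gamma < 1:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    total = 0.0
    # Walk backwards so each step is total = r + gamma * total (Horner's rule).
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: no reward for two steps, then a reward of 100.
print(discounted_return([0, 0, 100], gamma=0.9))
```

With these sample rewards the only contribution is the third term, 0.9² · 100, which shows how a reward two steps away is worth less than the same reward received immediately.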

Discounted Cumulative Reward

$\gamma = 0.9$

Other Options

• Finite-horizon model:

• Average-reward model:

• Average discounted reward model:
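
As a rough sketch of the first two options, assume their usual textbook readings: the finite-horizon model sums undiscounted rewards over a fixed number of steps h, and the average-reward model takes the mean reward per step over a long horizon. The function names and the exact indexing (first h rewards) are assumptions made for illustration. The third option is not sketched because its definition is not given in the text.

```python
def finite_horizon_return(rewards, h):
    """Undiscounted sum of the first h rewards (assumed reading of the model)."""
    return sum(rewards[:h])


def average_reward(rewards, h):
    """Mean reward per step over the first h steps (assumed reading of the model)."""
    if h <= 0:
        raise ValueError("h must be positive")
    return sum(rewards[:h]) / h


# Hypothetical reward sequence.
rewards = [1, 2, 3, 4]
print(finite_horizon_return(rewards, 2), average_reward(rewards, 4))
```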

Different Types of Learning Tasks

• Agent's actions:

– Deterministic, or

– Nondeterministic

• The agent may or may not be able to predict the next state that will result from each action

• Trainer of the agent:

– Expert (who shows it examples of optimal action sequences), or

– The agent itself (it trains itself by performing actions of its own choice)

Q-Learning for Simple Deterministic Worlds
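
A minimal sketch of tabular Q-learning in a deterministic world, assuming the update rule Q(s, a) ← r + γ max over a' of Q(s', a') and uniform random action selection. The corridor environment, the function names (`q_learning`, `delta`, `reward`), and the episode count are hypothetical and chosen only for illustration.

```python
import random


def q_learning(states, actions, delta, reward, goal, gamma=0.9,
               episodes=500, seed=0):
    """Tabular Q-learning for a deterministic world.

    delta(s, a) gives the next state, reward(s, a) the immediate reward.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in states for a in actions}  # table starts at zero
    start_states = [s for s in states if s != goal]
    for _ in range(episodes):
        s = rng.choice(start_states)
        while s != goal:
            a = rng.choice(actions)  # uniform random exploration
            s_next, r = delta(s, a), reward(s, a)
            # Deterministic update: Q(s,a) <- r + gamma * max_a' Q(s',a')
            q[(s, a)] = r + gamma * max(q[(s_next, b)] for b in actions)
            s = s_next
    return q


# Hypothetical 1-D corridor: states 0..3, state 3 is an absorbing goal.
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
GOAL = 3


def delta(s, a):
    return min(s + 1, GOAL) if a == "right" else max(s - 1, 0)


def reward(s, a):
    return 100 if delta(s, a) == GOAL else 0


q = q_learning(STATES, ACTIONS, delta, reward, GOAL)
print(q[(2, "right")], q[(1, "right")], q[(0, "right")])
```

In this toy world the learned values shrink by a factor of γ for each extra step between a state and the goal, which is the pattern the discounted cumulative reward predicts.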

Example

$Q(s_1, a_{right}) \leftarrow r + \gamma \max_{a'} Q(s_2, a')$

$\leftarrow 0 + 0.9 \max\{63, 81, 100\}$

$\leftarrow 90$
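
The single update in this example can be checked numerically. This is a small sketch; the helper name `q_update` is an assumption, while the reward 0, the discount 0.9, and the Q-values 63, 81, 100 are the ones in the example.

```python
GAMMA = 0.9


def q_update(r, next_q_values, gamma=GAMMA):
    """One deterministic Q-learning backup: r + gamma * max_a' Q(s', a')."""
    return r + gamma * max(next_q_values)


# Immediate reward 0; Q-values of the actions available in s2 are 63, 81, 100.
new_q = q_update(0, [63, 81, 100])
print(round(new_q, 6))
```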

RL as a function approximation method

• Learning the control policy (π) is very similar to the function approximation problem, except:

1. Delayed reward

– In RL, the trainer provides only a sequence of immediate reward values, so the agent faces the problem of temporal credit assignment.

2. Exploration or exploitation (next slide)

– Exploration to collect new information, or exploitation of what it has already learned to maximize the cumulative reward.

– In RL, the agent influences the distribution of training examples through the action sequence it chooses.

Explore or Exploit?

• Q-learning itself does not specify how to choose among the possible actions. Some options:

– Uniform random selection

– Selecting the action with the highest Q-value

– Selection based on the following probability:

– Small k => exploration, large k => exploitation

– Common choice: small k at the beginning of the learning process, then gradually increasing k
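
The probability formula itself does not appear in the slide text. A form with this behaviour in k, and the one assumed in the sketch below, is the one given in Mitchell's Chapter 13: P(a_i | s) = k^Q(s, a_i) / Σ_j k^Q(s, a_j), with k > 0. The function names and sample Q-values are hypothetical.

```python
import random


def action_probabilities(q_values, k):
    """P(a_i | s) = k**Q(s, a_i) / sum_j k**Q(s, a_j), with k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    weights = {a: k ** q for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def select_action(q_values, k, rng=random):
    """Sample one action according to the probabilities above."""
    probs = action_probabilities(q_values, k)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


# Hypothetical Q-values for the actions available in one state.
q_s = {"left": 1.0, "right": 2.0, "up": 3.0}
print(action_probabilities(q_s, k=1.0))   # k = 1: every action equally likely
print(action_probabilities(q_s, k=10.0))  # larger k: mass shifts to the highest Q
```

Under this assumed formula, k = 1 gives uniform selection, and raising k concentrates probability on the highest-valued action. Very large Q-values would overflow a float in `k ** q`, so a practical version would rescale the exponents first.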

RL vs. Other Function Approximation (continued)

3. Partially observable states

– In many practical situations, the sensors provide only partial information (like the camera in front of a robot).

– Solution: consider previous observations together with the current sensor data.

4. Life-long learning

– Unlike the function approximation task, in RL robots need to learn many tasks simultaneously, and the online learning process continues indefinitely.

RL Convergence

• Proved in Mitchell, pp. 377-378.

• Three conditions for convergence:

– Deterministic Markov Decision Process (MDP)

– Bounded, positive immediate rewards

– The agent selects every state-action pair infinitely often.

Markov Decision Process

• Finite set of states: S; set of actions: A

– t: discrete time step

– $s_t$: the state at time t

– $a_t$: the action at time t

• At each discrete time step, the agent observes state $s_t \in S$ and chooses action $a_t \in A$.

• It then receives an immediate reward $r_t$, and the state changes to $s_{t+1}$.

• Markov assumption: $s_{t+1} = \delta(s_t, a_t)$ and $r_t = r(s_t, a_t)$

– i.e., $r_t$ and $s_{t+1}$ depend only on the current state and action

• The functions $\delta$ and $r$ may be nondeterministic.

• The functions $\delta$ and $r$ are not necessarily known to the agent.

[Figure: state sequence s_t, s_{t+1}, s_{t+2}, ... with the action a_t and reward r_t labelling each transition]
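
The interaction described on this slide can be sketched as a rollout in a deterministic MDP, where the next state and reward depend only on the current state and action. The two-state world, the policy, and the names `rollout`, `delta`, `r`, and `policy` are hypothetical and serve only as an illustration.

```python
def rollout(s0, policy, delta, r, steps):
    """Follow a policy pi: S -> A for a fixed number of steps.

    Returns the trajectory as a list of (s_t, a_t, r_t) triples.
    """
    trajectory = []
    s = s0
    for _ in range(steps):
        a = policy(s)        # a_t = pi(s_t)
        reward = r(s, a)     # r_t = r(s_t, a_t)
        trajectory.append((s, a, reward))
        s = delta(s, a)      # s_{t+1} = delta(s_t, a_t)
    return trajectory


# Hypothetical two-state world: "go" moves A -> B, anything else keeps the state.
def delta(s, a):
    return "B" if (s == "A" and a == "go") else s


def r(s, a):
    return 1 if (s == "A" and a == "go") else 0


def policy(s):
    return "go" if s == "A" else "stay"


print(rollout("A", policy, delta, r, steps=3))
```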

Other issues in RL (pp. 381-386)

• Reinforcement learning for nondeterministic rewards and actions

• Temporal Difference Learning

• Generalizing from examples

• Relationship to dynamic programming

• Continuous reinforcement learning (state-of-the-art)

Homework

• 13.3: Tic-Tac-Toe