University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement...
![Page 1: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/1.jpg)
University Paderborn
16 January 2009
RG Knowledge Based Systems
Hans Kleine Büning
Reinforcement Learning
![Page 2: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/2.jpg)
Reinforcement Learning, Prof. Dr. Hans Kleine Büning, University Paderborn
Outline
• Motivation
• Applications
• Markov Decision Processes
• Q-learning
• Examples
![Page 3: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/3.jpg)
![Page 4: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/4.jpg)
Reinforcement Learning: The Idea
• A way of programming agents by reward and punishment without specifying how the task is to be achieved
![Page 5: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/5.jpg)
Learning to Ride a Bicycle
[Figure: agent-environment diagram with the labels "Environment", "state", and "action"]
![Page 6: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/6.jpg)
Learning to Ride a Bicycle
• States:
– Angle of handle bars
– Angular velocity of handle bars
– Angle of bicycle to vertical
– Angular velocity of bicycle to vertical
– Acceleration of angle of bicycle to vertical
![Page 7: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/7.jpg)
Learning to Ride a Bicycle
[Figure: agent-environment diagram with the labels "Environment", "state", and "action"]
![Page 8: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/8.jpg)
Learning to Ride a Bicycle
• Actions:
– Torque to be applied to the handle bars
– Displacement of the center of mass from the bicycle’s plane (in cm)
![Page 9: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/9.jpg)
Learning to Ride a Bicycle
[Figure: agent-environment diagram with the labels "Environment", "state", and "action"]
![Page 10: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/10.jpg)
Angle of bicycle to vertical is greater than 12°?
– no: Reward = 0
– yes: Reward = -1
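The reward rule above can be written as a one-line function. This is only a minimal sketch: the name `bicycle_reward`, the use of degrees, and the symmetric treatment of the tilt (absolute value) are assumptions that the slide does not state.

```python
def bicycle_reward(tilt_deg: float, limit_deg: float = 12.0) -> float:
    """Return -1 once the bicycle tilts past the limit, otherwise 0."""
    # Assumption: the 12 degree limit applies to both sides, hence abs().
    return -1.0 if abs(tilt_deg) > limit_deg else 0.0
```

The agent is never told how to steer; it only sees this signal, which stays at 0 as long as the bicycle stays upright.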
![Page 11: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/11.jpg)
Learning To Ride a Bicycle
Reinforcement Learning
![Page 12: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/12.jpg)
Reinforcement Learning: Applications
• Board Games
– TD-Gammon program, based on reinforcement learning, has become a world-class backgammon player
• Mobile Robot Controlling
– Learning to Drive a Bicycle
– Navigation
– Pole-balancing
– Acrobot
• Sequential Process Controlling
– Elevator Dispatching
![Page 13: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/13.jpg)
Key Features of Reinforcement Learning
• Learner is not told which actions to take
• Trial and error search
• Possibility of delayed reward:
– Sacrifice of short-term gains for greater long-term gains
• Explore/Exploit trade-off
• Considers the whole problem of a goal-directed agent interacting with an uncertain environment
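One common way to handle the explore/exploit trade-off is epsilon-greedy action selection. The sketch below is illustrative only: the slides do not name a specific strategy, and `epsilon_greedy`, the value estimates, and the choice of epsilon = 0.1 are assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index from a list of estimated action values."""
    if rng.random() < epsilon:
        # Explore: try any action, regardless of its current estimate.
        return rng.randrange(len(q_values))
    # Exploit: take the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(0)
estimates = [0.1, 0.5, 0.2]          # hypothetical value estimates
choices = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
greedy_share = choices.count(1) / len(choices)
```

With a small epsilon the agent mostly takes the action it currently believes is best, but still samples the others often enough to correct wrong estimates.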
![Page 14: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/14.jpg)
The Agent-Environment Interaction
• Agent and environment interact at discrete time steps: t = 0, 1, 2, …
– Agent observes state at step t: $s_t \in S$
– produces action at step t: $a_t \in A$
– gets resulting reward: $r_{t+1} \in \mathbb{R}$
– and resulting next state: $s_{t+1} \in S$
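The interaction protocol above can be sketched as a simple loop. The environment here (`CorridorEnv`, a five-cell corridor with the goal at the right end) and the function `run_episode` are hypothetical and exist only to make the loop runnable.

```python
class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, goal at position 4."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); position is clipped to 0..4.
        self.state = min(4, max(0, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

def run_episode(env, choose_action, max_steps=100):
    """Run the loop s_t -> a_t -> r_{t+1}, s_{t+1} and log every step."""
    trajectory = []
    state = env.reset()                                # s_0
    for t in range(max_steps):
        action = choose_action(state)                  # a_t
        next_state, reward, done = env.step(action)    # r_{t+1}, s_{t+1}
        trajectory.append((t, state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory

trajectory = run_episode(CorridorEnv(), lambda s: 1)   # always move right
```

Each logged tuple corresponds to one time step: the observed state, the chosen action, and the reward and next state returned by the environment.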
![Page 15: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/15.jpg)
The Agent’s Goal:
• Coarsely, the agent’s goal is to get as much reward as it can over the long run
Policy is
• a mapping from states to actions: $\pi(s) = a$
• Reinforcement learning methods specify how the agent changes its policy as a result of experience
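For a small, discrete problem a deterministic policy can be stored as a lookup table. In the sketch below the state and action names are hypothetical, and the greedy update in `improve` is just one simple way a policy might change with experience, not necessarily the rule used in the lecture.

```python
# Hypothetical states and actions, chosen only for illustration.
policy = {"s1": "right", "s2": "right", "s3": "up"}

def act(policy, state):
    """Apply the policy: look up the action assigned to the state."""
    return policy[state]

def improve(policy, state, action_values):
    """Return a new policy that picks the best-valued action in `state`.

    `action_values` maps actions to value estimates gathered from experience.
    """
    updated = dict(policy)
    updated[state] = max(action_values, key=action_values.get)
    return updated

new_policy = improve(policy, "s1", {"right": 0.2, "up": 0.7})
```

After the update the agent behaves differently in `s1` only; all other entries of the mapping are unchanged.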
![Page 16: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/16.jpg)
Deterministic Markov Decision Process
![Page 17: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/17.jpg)
Example
![Page 18: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/18.jpg)
Example: Corresponding MDP
![Page 19: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/19.jpg)
Example: Corresponding MDP
![Page 20: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/20.jpg)
Example: Corresponding MDP
![Page 21: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/21.jpg)
Example: Policy
![Page 22: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/22.jpg)
Value of Policy and Rewards
![Page 23: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/23.jpg)
Value of Policy and Agent’s Task
![Page 24: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/24.jpg)
Nondeterministic Markov Decision Process
P = 0.8
P = 0.1
P = 0.1
![Page 25: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/25.jpg)
Nondeterministic Markov Decision Process
![Page 26: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/26.jpg)
Nondeterministic Markov Decision Process
![Page 27: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/27.jpg)
Example with South-Eastern Wind
![Page 28: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/28.jpg)
Example with South-Eastern Wind
![Page 29: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/29.jpg)
Methods
• Model (reward function and transition probabilities) is known
– discrete states: Dynamic Programming
– continuous states: Value Function Approximation + Dynamic Programming
• Model (reward function or transition probabilities) is unknown
– discrete states: Reinforcement Learning, Monte Carlo Methods
– continuous states: Value Function Approximation + Reinforcement Learning
![Page 30: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/30.jpg)
Q-learning Algorithm
![Page 31: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/31.jpg)
Q-learning Algorithm
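The algorithm itself appears only on the slide image. For orientation, the following is a minimal sketch of standard tabular Q-learning with the usual update Q(s, a) ← Q(s, a) + α (r + γ max Q(s', ·) − Q(s, a)); it may differ in detail from the variant shown on the slide. The names `q_learning` and `chain_step`, the 4-state chain, and the values of α, γ, and ε are all assumptions made for illustration.

```python
import random

def q_learning(n_states, n_actions, step, episodes=200,
               alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; `step(s, a)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False      # assumption: every episode starts in state 0
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)          # explore
            else:
                best = max(q[state])                       # exploit, random tie-break
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Terminal states contribute no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

def chain_step(state, action):
    """Hypothetical 4-state chain: action 1 moves right, action 0 moves left."""
    next_state = min(3, state + 1) if action == 1 else max(0, state - 1)
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done

q_table = q_learning(n_states=4, n_actions=2, step=chain_step)
greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(3)]
```

Note that the update uses only observed transitions and rewards, so no model of the transition probabilities is needed. In this toy chain the greedy policy read off the learned table moves right in every non-terminal state.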
![Page 32: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/32.jpg)
Example
![Page 33: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/33.jpg)
Example: Q-table Initialization
![Page 34: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/34.jpg)
Example: Episode 1
![Page 35: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/35.jpg)
Example: Episode 1
![Page 36: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/36.jpg)
Example: Episode 1
![Page 37: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/37.jpg)
Example: Episode 1
![Page 38: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/38.jpg)
Example: Episode 1
![Page 39: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/39.jpg)
Example: Q-table
![Page 40: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/40.jpg)
Example: Episode 1
![Page 41: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/41.jpg)
Episode 1
![Page 42: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/42.jpg)
Example: Q-table
![Page 43: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/43.jpg)
Example: Episode 2
![Page 44: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/44.jpg)
Example: Episode 2
![Page 45: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/45.jpg)
Example: Episode 2
![Page 46: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/46.jpg)
Example: Q-table after Convergence
![Page 47: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/47.jpg)
Example: Value Function after Convergence
![Page 48: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/48.jpg)
Example: Optimal Policy
![Page 49: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/49.jpg)
Example: Optimal Policy
![Page 50: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/50.jpg)
Q-learning
![Page 51: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/51.jpg)
Convergence of Q-learning
![Page 52: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/52.jpg)
Blackjack
• Standard rules of blackjack hold
• State space:
– element[0] - current value of player's hand (4-21)
– element[1] - value of dealer's face-up card (2-11)
– element[2] - player does not have usable ace (0/1)
• Starting states:
– player has any 2 cards (uniformly distributed), dealer has any 1 card (uniformly distributed)
• Actions:
– HIT
– STICK
• Rewards:
– -1 for a loss
– 0 for a draw
– 1 for a win
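The state encoding and rewards listed above can be sketched as follows. This is an illustration under assumptions: the names `BlackjackState`, `is_valid`, and `final_reward` are hypothetical, the loss reward is taken to be -1, and the end-of-hand comparison ignores special cases such as naturals.

```python
from typing import NamedTuple

class BlackjackState(NamedTuple):
    player_sum: int      # element[0]: current value of player's hand (4-21)
    dealer_card: int     # element[1]: value of dealer's face-up card (2-11)
    ace_flag: int        # element[2]: ace indicator as given on the slide (0/1)

ACTIONS = ("HIT", "STICK")
REWARDS = {"loss": -1, "draw": 0, "win": 1}

def is_valid(state: BlackjackState) -> bool:
    """Check that each element lies in the range listed on the slide."""
    return (4 <= state.player_sum <= 21
            and 2 <= state.dealer_card <= 11
            and state.ace_flag in (0, 1))

def final_reward(player_sum: int, dealer_sum: int) -> int:
    """Reward at the end of a hand; a player bust always loses."""
    if player_sum > 21:
        return REWARDS["loss"]
    if dealer_sum > 21 or player_sum > dealer_sum:
        return REWARDS["win"]
    if player_sum == dealer_sum:
        return REWARDS["draw"]
    return REWARDS["loss"]

# 18 hand values x 10 dealer cards x 2 ace flags
n_states = 18 * 10 * 2
```

With two actions per state, a Q-table for this encoding has 360 × 2 = 720 entries, which is small enough for tabular methods.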
![Page 53: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/53.jpg)
Blackjack: Optimal Policy
![Page 54: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/54.jpg)
Reinforcement Learning: Example
• States
– Grid cells
• Actions
– Left
– Up
– Right
– Down
• Rewards
– Bonus 20
– Food 1
– Predator -10
– Empty grid -0.1
• Transition probabilities
– 0.80 – agent goes where it intends to go
– 0.20 – to any other adjacent grid cell or remains where it was (if the agent is on the border of the grid world, it goes to the other side)
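The transition rule above can be sketched as a sampling function. Several details are assumptions: the remaining 0.20 is spread uniformly over the three other directions and staying in place, the reward is paid for the cell the agent enters, and the 3x3 layout is invented because the layout on the slides is not part of the transcript.

```python
import random

MOVES = {"Left": (0, -1), "Up": (-1, 0), "Right": (0, 1), "Down": (1, 0)}
REWARDS = {"bonus": 20.0, "food": 1.0, "predator": -10.0, "empty": -0.1}

def step(position, action, grid, rng):
    """Sample the next cell and reward for a (row, col) position."""
    rows, cols = len(grid), len(grid[0])
    if rng.random() < 0.80:
        d_row, d_col = MOVES[action]                 # intended move
    else:
        # Assumption: uniform choice among the other directions and staying.
        others = [m for name, m in MOVES.items() if name != action]
        d_row, d_col = rng.choice(others + [(0, 0)])
    # Leaving the grid on one side re-enters on the opposite side.
    next_position = ((position[0] + d_row) % rows,
                     (position[1] + d_col) % cols)
    cell = grid[next_position[0]][next_position[1]]
    return next_position, REWARDS[cell]

# Hypothetical 3x3 layout, for illustration only.
grid = [["empty", "food", "empty"],
        ["empty", "predator", "empty"],
        ["empty", "empty", "bonus"]]
rng = random.Random(0)
samples = [step((0, 0), "Left", grid, rng) for _ in range(2000)]
wrapped_share = sum(1 for pos, _ in samples if pos == (0, 2)) / len(samples)
```

Moving left from the left border lands in the rightmost column of the same row, so roughly 80% of the sampled transitions end in cell (0, 2).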
![Page 55: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/55.jpg)
Reinforcement Learning: Example
![Page 56: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/56.jpg)
Reinforcement Learning: Example
![Page 57: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/57.jpg)
Reinforcement Learning: Example
![Page 58: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/58.jpg)
Reinforcement Learning: Example
![Page 59: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/59.jpg)
Reinforcement Learning: Example
![Page 60: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/60.jpg)
Reinforcement Learning: Example
![Page 61: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/61.jpg)
Reinforcement Learning: Example
![Page 62: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/62.jpg)
Reinforcement Learning: Example
![Page 63: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/63.jpg)
Reinforcement Learning: Example
![Page 64: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/64.jpg)
Reinforcement Learning: Example
![Page 65: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/65.jpg)
Reinforcement Learning: Example
![Page 66: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/66.jpg)
Reinforcement Learning: Example
![Page 67: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/67.jpg)
Reinforcement Learning: Example
![Page 68: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/68.jpg)
Reinforcement Learning: Example
![Page 69: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/69.jpg)
Reinforcement Learning: Example
![Page 70: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/70.jpg)
Reinforcement Learning: Example
![Page 71: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/71.jpg)
Reinforcement Learning: Example
![Page 72: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/72.jpg)
Reinforcement Learning: Example
![Page 73: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/73.jpg)
Reinforcement Learning: Example
![Page 74: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/74.jpg)
Reinforcement Learning: Example
![Page 75: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/75.jpg)
Reinforcement Learning: Example
![Page 76: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/76.jpg)
Reinforcement Learning: Example
![Page 77: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/77.jpg)
Reinforcement Learning: Example
![Page 78: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/78.jpg)
Reinforcement Learning: Example
![Page 79: University Paderborn 16 January 2009 RG Knowledge Based Systems Hans Kleine Büning Reinforcement Learning.](https://reader036.fdocuments.in/reader036/viewer/2022070305/55141ec9550346ec488b5709/html5/thumbnails/79.jpg)
Reinforcement Learning: Example