
Kunstmatige Intelligentie / RuG

KI2 - 11

Reinforcement Learning

Johan Everts

What is Learning?

Learning takes place as a result of interaction between an agent and the world. The idea behind learning is that percepts received by an agent should be used not only for acting, but also for improving the agent's ability to behave optimally in the future, so that it can achieve its goal.

Learning Types

Supervised learning: sample (input, output) pairs of the function to be learned can be perceived or are given

Reinforcement learning: the agent acts on its environment and receives some evaluation of its actions (reinforcement), but is not told which action is the correct one to achieve its goal

Unsupervised learning: no information at all is given about the desired output

Reinforcement Learning

Task: learn how to behave successfully so as to achieve a goal while interacting with an external environment; learn through experience.

Examples:

Game playing: the agent knows whether it has won or lost, but it does not know the appropriate action in each state

Control: a traffic system can measure the delay of cars, but does not know how to decrease it

Elements of RL

Transition model δ: how actions influence states

Reward r: immediate value of a state-action transition

Policy π: maps states to actions

[Figure: the agent-environment interaction loop. The environment sends a state and a reward to the agent; the agent, following its policy, sends an action back to the environment.]

s_0 --(a_0, r_0)--> s_1 --(a_1, r_1)--> s_2 --(a_2, r_2)--> ...

Elements of RL

r(state, action): immediate reward values

[Figure: a 2 x 3 grid world with goal state G in the upper-right corner. Arrows entering G carry immediate reward 100; all other arrows carry reward 0.]
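To make the example concrete, here is a minimal Python sketch of a grid world with exactly this reward structure. The 2 x 3 layout, the state numbering and the names ACTIONS, GOAL and step are illustrative assumptions, not part of the slides.

# Minimal sketch of the assumed 2 x 3 grid world: states 0, 1, 2 on the
# top row (G = state 2 in the upper-right corner), states 3, 4, 5 on the
# bottom row. Moving into G yields reward 100, every other move yields 0.

ACTIONS = ["up", "down", "left", "right"]
ROWS, COLS = 2, 3
GOAL = 2

def step(state, action):
    """Deterministic transition delta(s, a) and immediate reward r(s, a)."""
    row, col = divmod(state, COLS)
    if action == "up":
        row = max(row - 1, 0)
    elif action == "down":
        row = min(row + 1, ROWS - 1)
    elif action == "left":
        col = max(col - 1, 0)
    else:  # "right"
        col = min(col + 1, COLS - 1)
    next_state = row * COLS + col
    reward = 100 if next_state == GOAL and state != GOAL else 0
    return next_state, reward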

Elements of RL

Value function: maps states to state values

Discount factor γ ∈ [0, 1) (here γ = 0.9)

V^π(s_t) ≡ r_t + γ r_{t+1} + γ^2 r_{t+2} + ...

[Figure: the same grid world shown twice. Left: the immediate reward values r(state, action). Right: the resulting V*(state) values for γ = 0.9, namely 90, 100, 0 on the top row (with G in the corner) and 81, 90, 100 on the bottom row.]
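As a sanity check (not on the slides), the V*(state) values above follow from this reward structure and γ = 0.9. The sketch below reuses step, ACTIONS and GOAL from the grid-world sketch earlier and computes V* by simple value iteration.

GAMMA = 0.9  # discount factor used on the slides

def value_iteration(num_states=6, tol=1e-6):
    """Compute V*(s) for the deterministic grid world by value iteration."""
    V = [0.0] * num_states
    while True:
        biggest_change = 0.0
        for s in range(num_states):
            if s == GOAL:  # absorbing goal state: V*(G) = 0
                continue
            best = max(r + GAMMA * V[s2]
                       for s2, r in (step(s, a) for a in ACTIONS))
            biggest_change = max(biggest_change, abs(best - V[s]))
            V[s] = best
        if biggest_change < tol:
            return V

print(value_iteration())  # top row: 90, 100, 0; bottom row: 81, 90, 100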

RL task (restated)

Execute actions in the environment and observe the results.

Learn an action policy π : state → action that maximizes the expected discounted reward

E[ r_t + γ r_{t+1} + γ^2 r_{t+2} + ... ]

from any starting state in S.
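As a tiny illustration (not on the slides) of this objective, the discounted return of a finite reward sequence can be computed directly; the function name and the example sequence are assumptions.

def discounted_return(rewards, gamma=0.9):
    """r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for a finite sequence."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

# Reaching the goal (reward 100) after two zero-reward steps:
print(discounted_return([0, 0, 100]))  # 0.9**2 * 100, i.e. approximately 81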

Reinforcement Learning

The target function is π : state → action.

RL differs from other function approximation tasks:

Partially observable states

Exploration vs. exploitation

Delayed reward → temporal credit assignment

Reinforcement Learning

The target function is π : state → action.

However, we have no training examples of the form

<state, action>

Training examples are of the form

<<state, action>, reward>

Utility-based agents

Try to learn V^π* (abbreviated V*), then perform a one-step lookahead search to choose the best action from any state s:

π*(s) = argmax_a [ r(s, a) + γ V*(δ(s, a)) ]

This works well if the agent knows

δ : state × action → state

r : state × action → ℝ

When the agent does not know δ and r, it cannot choose actions this way.
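A short sketch of this one-step lookahead, assuming a known model: it reuses step and ACTIONS from the grid-world sketch and a value table V such as the one produced by value_iteration above. All of these names are illustrative assumptions.

def greedy_policy_from_V(state, V, gamma=0.9):
    """pi*(s) = argmax_a [ r(s, a) + gamma * V*(delta(s, a)) ].

    Requires knowing the model, i.e. being able to call step(s, a).
    """
    def backed_up_value(action):
        next_state, reward = step(state, action)
        return reward + gamma * V[next_state]
    return max(ACTIONS, key=backed_up_value)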

Q-learning

Define a new function, very similar to V*:

Q(s, a) ≡ r(s, a) + γ V*(δ(s, a))

If the agent learns Q, it can choose the optimal action even without knowing δ or r.

Using the learned Q:

π*(s) = argmax_a Q(s, a)
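By contrast, acting from a learned Q needs no model at all. A sketch, assuming Q is stored as a dictionary mapping (state, action) pairs to estimated values:

def greedy_action(Q, state, actions=ACTIONS):
    """pi*(s) = argmax_a Q(s, a): pick the action with the highest Q-value.

    No transition model delta and no reward function r are needed.
    """
    return max(actions, key=lambda a: Q.get((state, a), 0.0))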

Learning the Q-value

Note that Q and V* are closely related:

V*(s) = max_{a'} Q(s, a')

This allows us to write Q recursively as

Q(s_t, a_t) = r(s_t, a_t) + γ V*(δ(s_t, a_t))
            = r(s_t, a_t) + γ max_{a'} Q(s_{t+1}, a')

Learning the Q-value

FOR each <s, a> DO
    Initialize the table entry: Q̂(s, a) ← 0

Observe the current state s

WHILE (true) DO
    Select an action a and execute it
    Receive the immediate reward r
    Observe the new state s'
    Update the table entry for Q̂(s, a):
        Q̂(s, a) ← r + γ max_{a'} Q̂(s', a')
    Move: s ← s'
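A runnable sketch of this loop for the deterministic grid world, reusing step, ACTIONS, GOAL and greedy_action from the earlier sketches. The episode structure and the ε-greedy exploration are assumptions added to make the loop terminate and explore; the slides only specify the update rule itself.

import random

GAMMA = 0.9    # discount factor
EPSILON = 0.2  # assumed exploration probability (not on the slides)

def q_learning(episodes=500):
    """Tabular Q-learning for the deterministic grid world."""
    Q = {(s, a): 0.0 for s in range(6) for a in ACTIONS}  # initialize Q-hat to 0
    for _ in range(episodes):
        s = random.randrange(6)            # random start state
        while s != GOAL:                   # run until the goal is reached
            # select an action: mostly greedy, sometimes random (exploration)
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = greedy_action(Q, s)
            s_next, r = step(s, a)         # execute it, observe r and s'
            # deterministic-environment update from the slide:
            # Q-hat(s, a) <- r + gamma * max_a' Q-hat(s', a')
            Q[(s, a)] = r + GAMMA * max(Q[(s_next, a2)] for a2 in ACTIONS)
            s = s_next                     # move: s <- s'
    return Q

Q = q_learning()
print(round(Q[(3, "right")], 1))  # approaches 81, cf. the Q-values on the next slide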

r(state, action): immediate reward values; Q(state, action) values; V*(state) values

[Figure: the grid world shown three times. Left: the immediate rewards r(state, action), with 100 on the arrows entering G and 0 on all others. Middle: the learned Q(state, action) values on the arrows (100, 90, 81 and 72). Right: the resulting V*(state) values, 90, 100, 0 on the top row and 81, 90, 100 on the bottom row.]

Q-learning

Q-learning learns the expected utility of taking a particular action a in a particular state s (the Q-value of the pair (s, a)).

Q-learning

Demonstration

http://iridia.ulb.ac.be/~fvandenb/qlearning/qlearning.html

eps: the probability of taking a random action instead of following the current greedy policy

gam: the discount factor γ; the closer it is to 1, the more weight is given to future reinforcements

alpha: the learning rate
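The applet's alpha parameter does not appear in the deterministic update shown earlier. In a stochastic or noisy setting the new estimate is usually blended with the old one using a learning rate; the sketch below shows that standard variant of the update, which is an addition here rather than something stated on the slides.

def q_update_with_learning_rate(Q, s, a, r, s_next, actions,
                                alpha=0.1, gamma=0.9):
    """Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))."""
    target = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * target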

Q-learning estimates a one-time-step difference:

Q^(1)(s_t, a_t) = r_t + γ max_a Q̂(s_{t+1}, a)

Why not do the same for n steps?

Q^(n)(s_t, a_t) = r_t + γ r_{t+1} + ... + γ^{n-1} r_{t+n-1} + γ^n max_a Q̂(s_{t+n}, a)
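A sketch of the n-step estimate for a recorded stretch of experience; the list-based representation of the rewards and of the bootstrap values is an assumption for illustration.

def n_step_estimate(rewards, bootstrap_values, n, gamma=0.9):
    """Q^(n)(s_t, a_t) computed from a recorded trajectory.

    rewards[k]          -- r_{t+k}
    bootstrap_values[k] -- max_a Q-hat(s_{t+k}, a); needs len(rewards) + 1 entries
    """
    discounted_rewards = sum(gamma**k * rewards[k] for k in range(n))
    return discounted_rewards + gamma**n * bootstrap_values[n]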

Temporal Difference Learning:

TD(λ) formula

Intuitive idea: use a constant 0 ≤ λ ≤ 1 to combine estimates from various lookahead distances (note the normalization factor (1 − λ)):

Q^λ(s_t, a_t) = (1 − λ) [ Q^(1)(s_t, a_t) + λ Q^(2)(s_t, a_t) + λ^2 Q^(3)(s_t, a_t) + ... ]
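A sketch of that weighted combination, built on n_step_estimate from the previous sketch. The infinite sum is truncated at the end of the recorded trajectory, which is a simplification for illustration rather than the exact TD(λ) definition.

def td_lambda_target(rewards, bootstrap_values, lam=0.5, gamma=0.9):
    """(1 - lambda) * sum over n >= 1 of lambda^(n-1) * Q^(n)(s_t, a_t),
    truncated after len(rewards) recorded steps."""
    horizon = len(rewards)
    weighted = sum(lam**(n - 1) * n_step_estimate(rewards, bootstrap_values, n, gamma)
                   for n in range(1, horizon + 1))
    return (1 - lam) * weighted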


Genetic algorithms

Imagine the individuals as agent functions.

The fitness function acts as the performance measure or reward function.

No attempt is made to learn the relationship between the rewards and the actions taken by an agent.

The algorithm simply searches directly in the space of individuals to find one that maximizes the fitness function.

Genetic algorithms

Represent an individual as a binary string.

Selection works like this: if individual X scores twice as high as Y on the fitness function, then X is twice as likely to be selected for reproduction as Y.

Reproduction is accomplished by crossover and mutation.
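A minimal sketch of one generation of such a genetic algorithm on binary strings, with fitness-proportional selection, single-point crossover and bit-flip mutation. The fitness function and every parameter value here are arbitrary placeholders.

import random

def next_generation(population, fitness, crossover_rate=0.7, mutation_rate=0.01):
    """One generation: fitness-proportional selection, crossover, mutation.

    population -- list of equal-length binary strings, e.g. "10110"
    fitness    -- function mapping a binary string to a positive score
    """
    scores = [fitness(ind) for ind in population]
    new_population = []
    while len(new_population) < len(population):
        # selection: probability proportional to fitness
        parent_a, parent_b = random.choices(population, weights=scores, k=2)
        # single-point crossover
        if random.random() < crossover_rate:
            point = random.randrange(1, len(parent_a))
            child = parent_a[:point] + parent_b[point:]
        else:
            child = parent_a
        # bit-flip mutation
        child = "".join(bit if random.random() > mutation_rate else
                        ("1" if bit == "0" else "0")
                        for bit in child)
        new_population.append(child)
    return new_population

# Example: evolve strings of five 1s from a random initial population.
pop = ["".join(random.choice("01") for _ in range(5)) for _ in range(20)]
for _ in range(30):
    pop = next_generation(pop, fitness=lambda ind: 1 + ind.count("1"))
print(max(pop, key=lambda ind: ind.count("1")))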

Cart-Pole balancing

Demonstration

http://www.bovine.net/~jlawson/hmc/pole/sane.html

Summary

RL addresses the problem of learning control strategies for autonomous agents

In Q-learning an evaluation function over states and actions is learned

TD-algorithms learn by iteratively reducing the differences between the estimates produced by the agent at different times

In the genetic approach, the relation between rewards and actions is not learned; the space of individuals is simply searched for one that maximizes the fitness function.