Lecture 4: Q-learning (table)
Exploit & exploration and discounted future reward
Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>

Transcript of Lecture 4: Q-learning (table) (hunkim.github.io/ml/RL/rl04.pdf, 2017-10-02)

Page 1:

Lecture 4: Q-learning (table): exploit & exploration and discounted future reward

Reinforcement Learning with TensorFlow & OpenAI Gym
Sung Kim <[email protected]>

Page 2:

Dummy Q-learning algorithm
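As a preview of what the algorithm does on each step, here is a minimal sketch of the dummy update rule: take the immediate reward plus the best Q-value of the next state, with no discount factor and no learning rate yet. It assumes Q is a NumPy table of shape [n_states, n_actions]; the helper name dummy_q_update is mine, not from the slides.

import numpy as np

def dummy_q_update(Q, s, a, r, s1):
    # Dummy Q-learning update: Q(s, a) <- r + max_a' Q(s', a')
    # (no discount factor and no learning rate yet)
    Q[s, a] = r + np.max(Q[s1, :])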

Page 3:

Learning Q(s, a)?

Page 4:

Learning Q(s, a) Table: one success!

[Figure: grid world with Q-values of 1 marked along the path taken in the one successful episode]

Page 5:

Exploit VS Exploration

[Figure: the grid world again, with Q-values of 1 along the already-learned path]

http://home.deib.polimi.it/restelli/MyWebSite/pdf/rl5.pdf

Page 6:

Exploit VS Exploration

Q values: 0.0 0.0 0.0 0.0 0.0

Page 7:

Exploit (weekday) VS Exploration (weekend)

Q values: 0.5 0.6 0.3 0.2 0.5

Page 8:

Exploit VS Exploration: E-greedy

e = 0.1
if random.random() < e:
    a = env.action_space.sample()   # explore: pick a random action
else:
    a = np.argmax(Q[s, :])          # exploit: pick the best-known action

Page 9:

Exploit VS Exploration: decaying E-greedy

for i in range(1000):
    e = 0.1 / (i + 1)                    # epsilon shrinks as episode i grows
    if random.random() < e:
        a = env.action_space.sample()    # explore
    else:
        a = np.argmax(Q[s, :])           # exploit

Page 10:

Exploit VS Exploration: add random noise

Q values: 0.5 0.6 0.3 0.2 0.5

Page 11:

Exploit VS Exploration: add random noise

Q values: 0.5 0.6 0.3 0.2 0.5

a = argmax(Q[s, :] + random_values)

a = argmax([0.5, 0.6, 0.3, 0.2, 0.5] + [0.1, 0.2, 0.7, 0.3, 0.1])
  = argmax([0.6, 0.8, 1.0, 0.5, 0.6])   # element-wise sum: the noise lets the third action win

Page 12:

Exploit VS Exploration: add random noise

Q values: 0.5 0.6 0.3 0.2 0.5

for i in range(1000):
    a = argmax(Q[s, :] + random_values / (i + 1))   # the added noise decays over episodes
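As a small helper, this is one way the random_values above can be produced in practice, assuming NumPy; the name noisy_action and the choice of Gaussian noise (np.random.randn) are mine, matching common table-based Gym examples rather than anything spelled out on the slide.

import numpy as np

def noisy_action(Q, s, n_actions, i):
    # Add Gaussian noise to the Q row for state s; dividing by (i + 1)
    # makes the noise, and hence the exploration, fade over episodes.
    noise = np.random.randn(n_actions) / (i + 1)
    return int(np.argmax(Q[s, :] + noise))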

Page 13:

Exploit VS Exploration

[Figure: grid world with Q-values of 1 along the learned path]

Page 14:

Dummy Q-learning algorithm

Page 15:

Discounted future reward

[Figure: grid world showing two different paths to the goal, both marked with Q-values of 1]

Page 16:

Learning Q(s, a) with discounted reward

Page 17:

Discounted future reward

Page 18:

Learning Q(s, a) with discounted reward
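Written out, the discounted update replaces the dummy rule Q(s, a) <- r + max_a' Q(s', a') with the following, where γ is the discount factor (0 ≤ γ < 1):

    \hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')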

Page 19:

Discounted reward (γ = 0.9)

[Figure: grid world path in which the goal reward of 1 is discounted by γ at each step back from the goal]
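A quick worked example of what γ = 0.9 does: a reward of 1 that lies k steps in the future is worth 0.9^k now, so the values propagated back along a path to the goal fall off as

    1, \quad 0.9, \quad 0.9^2 = 0.81, \quad 0.9^3 = 0.729, \quad 0.9^4 = 0.6561, \ \dots

which is why, with discounting, a shorter path to the same reward ends up with a higher value than a longer one.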

Page 20:

Q-learning algorithm
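Putting the pieces together, here is a compact sketch of the table-based Q-learning loop this slide summarizes and that the upcoming lab will implement. It assumes the classic OpenAI Gym API in which env.reset() returns a state and env.step() returns (state, reward, done, info); the environment id, episode count, and decaying-noise exploration are illustrative choices, not a transcription of the slide.

import gym
import numpy as np

# Illustrative sketch; the lab may register its own deterministic
# FrozenLake variant, so the exact environment id can differ.
env = gym.make("FrozenLake-v0")

Q = np.zeros([env.observation_space.n, env.action_space.n])
gamma = 0.9            # discount factor for future reward
num_episodes = 2000

for i in range(num_episodes):
    s = env.reset()
    done = False
    while not done:
        # Explore with decaying random noise added to the Q row for s
        a = np.argmax(Q[s, :] + np.random.randn(env.action_space.n) / (i + 1))
        s1, r, done, _ = env.step(a)
        # Q(s, a) <- r + gamma * max_a' Q(s', a')
        Q[s, a] = r + gamma * np.max(Q[s1, :])
        s = s1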

Page 21:

Q-Table Policy

Q(s, a)

(1) state, s
(2) action, a
(3) quality (reward) for the given action
    (e.g., LEFT: 0.5, RIGHT: 0.1, UP: 0.0, DOWN: 0.8)

[Figure: grid world illustrating the state s, the available actions a, and the Q(s, a) entries read from the table]
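Reading the policy out of the table is then just a greedy lookup: in state s, take the action with the largest Q(s, a). A tiny sketch, assuming a NumPy table; the action ordering below simply follows the slide's listing (LEFT, RIGHT, UP, DOWN) for illustration, not Gym's actual encoding.

import numpy as np

def greedy_action(Q, s):
    # Q-table policy: pick the action with the highest quality in state s
    return int(np.argmax(Q[s, :]))

# Slide's example row: LEFT 0.5, RIGHT 0.1, UP 0.0, DOWN 0.8 -> DOWN wins
q_row = np.array([[0.5, 0.1, 0.0, 0.8]])
print(greedy_action(q_row, 0))   # prints 3, the index of the DOWN entry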

Page 22:

Convergence

Q-learning converges to the true Q-values (a sketch of the argument follows the citation):

• In deterministic worlds
• With finite state and action spaces

Machine Learning, Tom Mitchell, 1997
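Following Mitchell (1997), the guarantee is roughly: in a deterministic world with finitely many states and actions, bounded rewards, 0 ≤ γ < 1, and every (s, a) pair visited infinitely often, the learned table \hat{Q} converges to the true Q. The key step, paraphrased rather than quoted, is a contraction in the worst-case table error (determinism makes r and s' the same in both terms below):

    \Delta_n = \max_{s,a} \bigl| \hat{Q}_n(s,a) - Q(s,a) \bigr|

    \bigl| \hat{Q}_{n+1}(s,a) - Q(s,a) \bigr|
      = \bigl| \bigl(r + \gamma \max_{a'} \hat{Q}_n(s',a')\bigr) - \bigl(r + \gamma \max_{a'} Q(s',a')\bigr) \bigr|
      \le \gamma \max_{a'} \bigl| \hat{Q}_n(s',a') - Q(s',a') \bigr|
      \le \gamma \Delta_n

so once every entry has been updated at least once, the worst-case error has shrunk by a factor of γ, and repeating this drives it to zero.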

Page 23:

Next

Lab: Q-learning Table