Practice Theory · 2020. 1. 3. · Practice Theory Powerful modeling, simple exploration...
Practice: Powerful modeling, simple exploration
e.g.: Atari Deep Reinforcement Learning
Theory: Sophisticated exploration in small-state MDPs
e.g. $E^3$, R-MAX algorithms
Limited theory for rich observations
Goal
Develop Reinforcement Learning approaches guaranteed to learn an optimal policy with a small number of samples despite rich observations.
| Model | PAC Guarantees |
|---|---|
| Small-state MDPs | Known |
| Structured large-state MDPs | New |
| Reactive POMDPs | Extended |
| Reactive PSRs | New |
| LQR (continuous actions) | Known |
[Figures not recovered; surviving labels: $H$, $\pi(x_h)$, $x$]
$$V^\star(x) = \max_a \Big\{ \mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma_{x,a}}[V^\star(x')] \Big\}$$
Slide annotations: distribution of initial state; distribution of next state; instantaneous reward; optimal action.
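$$V^\star(x) = \max_a \underbrace{\mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma_{x,a}}[V^\star(x')]}_{Q^\star(x,a)}$$
$$\pi^\star(x) = \operatorname{argmax}_a Q^\star(x,a)$$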
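The recursion above can be evaluated directly when the state space is tiny and layered. The following is a minimal sketch, not the method from the talk: the three-state MDP, the tables `MEAN_REWARD` and `NEXT_STATE`, and the action names are all hypothetical, chosen only so that the undiscounted recursion terminates.

```python
from functools import lru_cache

# Hypothetical layered MDP: s0 leads to s1 or s2, where the episode ends.
ACTIONS = ("left", "right")

# MEAN_REWARD[(x, a)] plays the role of E_{r ~ R_x}[r(a)].
MEAN_REWARD = {
    ("s0", "left"): 0.0, ("s0", "right"): 0.1,
    ("s1", "left"): 1.0, ("s1", "right"): 0.0,
    ("s2", "left"): 0.0, ("s2", "right"): 0.5,
}

# NEXT_STATE[(x, a)] plays the role of Gamma_{x,a}; an empty dict ends the episode.
NEXT_STATE = {
    ("s0", "left"): {"s1": 0.5, "s2": 0.5},
    ("s0", "right"): {"s2": 1.0},
    ("s1", "left"): {}, ("s1", "right"): {},
    ("s2", "left"): {}, ("s2", "right"): {},
}


def q_star(x, a):
    """Q*(x, a) = E[r(a)] + E_{x' ~ Gamma_{x,a}}[V*(x')]."""
    future = sum(p * v_star(nx) for nx, p in NEXT_STATE[(x, a)].items())
    return MEAN_REWARD[(x, a)] + future


@lru_cache(maxsize=None)
def v_star(x):
    """V*(x) = max_a Q*(x, a)."""
    return max(q_star(x, a) for a in ACTIONS)


def pi_star(x):
    """pi*(x) = argmax_a Q*(x, a)."""
    return max(ACTIONS, key=lambda a: q_star(x, a))


print(v_star("s0"), pi_star("s0"))
```

This enumeration only works because every state can be listed; with rich observations the talk's goal is to avoid exactly that.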
$$\begin{aligned}
Q^\star(x,a) &= \mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma_{x,a}}[V^\star(x')] \\
&= \mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma_{x,a}}\big[\max_{a'} Q^\star(x',a')\big] \\
&= \mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma_{x,a}}\big[Q^\star(x', \pi^\star(x'))\big]
\end{aligned}$$
$$\mathbb{E}\big[ f(x_h, a_h) - r_h - f(x_{h+1}, a_{h+1}) \big]$$
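The expectation above can be estimated from logged transitions by a sample mean. The sketch below is illustrative only: the function names, the tuple layout `(x_h, a_h, r_h, x_next)`, and the toy data are hypothetical, and it assumes that $a_{h+1}$ is the greedy action of $f$ at $x_{h+1}$, which the slide text does not state.

```python
def greedy_value(f, x, actions):
    """max_a f(x, a); defined as 0 once the episode has ended (x is None)."""
    if x is None:
        return 0.0
    return max(f(x, a) for a in actions)


def average_bellman_error(f, transitions, actions):
    """Sample mean of f(x_h, a_h) - r_h - f(x_{h+1}, a_{h+1}).

    Assumption: a_{h+1} is the greedy action of f at x_{h+1}.
    """
    residuals = [
        f(x, a) - r - greedy_value(f, x_next, actions)
        for (x, a, r, x_next) in transitions
    ]
    return sum(residuals) / len(residuals)


# Hypothetical data: one state, one action, reward 0 or 1 equally often,
# and the episode ends after the step (x_next is None).
ACTIONS = (0,)
TRANSITIONS = [("x", 0, 1.0, None), ("x", 0, 0.0, None)]


def f_true(x, a):
    return 0.5  # matches the mean reward, so the average error vanishes


def f_optimistic(x, a):
    return 0.9  # overestimates the mean reward


print(average_bellman_error(f_true, TRANSITIONS, ACTIONS))
print(average_bellman_error(f_optimistic, TRANSITIONS, ACTIONS))
```

Note that the per-sample residuals of `f_true` are nonzero; only their average is zero, which is why the quantity is an expectation rather than a pointwise condition.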
Validity condition
$$\mathbb{E}_{x \sim \Gamma_1}\big[\max_a Q^\star(x,a)\big]$$
$$\mathbb{E}_{x \sim \Gamma_1}\big[Q^\star(x, \pi^\star(x))\big]$$
$$V_f = \mathbb{E}_{x \sim \Gamma_1}\big[f(x, \pi_f(x))\big]$$
Optimism under uncertainty, guess for $V^{\pi^\star}$ if $f = Q^\star$
Checking our optimistic belief
Prune the possible solutions
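The three steps above (optimistic guess, check, prune) can be sketched as a loop over a finite set of candidate functions. This is a simplified illustration under strong assumptions, not the algorithm as specified in the referenced paper: the environment `STEP` is a hypothetical deterministic two-level problem, so every expectation is computed exactly instead of being estimated from samples, and the candidate tables, `eps`, and the pruning threshold `eps / H` are made up for the example.

```python
H = 2
ACTIONS = (0, 1)
START = "s0"
# Hypothetical deterministic environment:
# STEP[(state, action)] = (reward, next_state); next_state None ends the episode.
STEP = {
    ("s0", 0): (0.0, "s1"), ("s0", 1): (0.0, "s2"),
    ("s1", 0): (0.2, None), ("s1", 1): (0.0, None),
    ("s2", 0): (0.0, None), ("s2", 1): (1.0, None),
}

Q_STAR = {("s0", 0): 0.2, ("s0", 1): 1.0, ("s1", 0): 0.2,
          ("s1", 1): 0.0, ("s2", 0): 0.0, ("s2", 1): 1.0}
CANDIDATES = {
    "q_star": Q_STAR,
    "f1": {**Q_STAR, ("s0", 0): 2.0},                  # wrong at level 1
    "f2": {**Q_STAR, ("s0", 0): 1.5, ("s1", 0): 1.5},  # wrong at level 2
}


def greedy(f, x):
    """pi_f(x) = argmax_a f(x, a)."""
    return max(ACTIONS, key=lambda a: f[(x, a)])


def value(f, x):
    """f(x, pi_f(x)), with value 0 after the episode ends."""
    return 0.0 if x is None else f[(x, greedy(f, x))]


def roll_in(policy_f, h):
    """State x_h reached by following pi_{policy_f} for h - 1 steps."""
    x = START
    for _ in range(h - 1):
        _, x = STEP[(x, greedy(policy_f, x))]
    return x


def bellman_error(g, policy_f, h):
    """g(x_h, a_h) - r_h - g(x_{h+1}, a_{h+1}); roll-in by policy_f, then greedy in g."""
    x = roll_in(policy_f, h)
    a = greedy(g, x)
    r, x_next = STEP[(x, a)]
    return g[(x, a)] - r - value(g, x_next)


def policy_return(f):
    """Total reward actually collected by pi_f."""
    x, total = START, 0.0
    while x is not None:
        r, x = STEP[(x, greedy(f, x))]
        total += r
    return total


def optimistic_elimination(candidates, eps=0.1):
    survivors = dict(candidates)
    rounds = 0
    while survivors:
        rounds += 1
        # Optimism: pick the surviving candidate that promises the most.
        name = max(survivors, key=lambda n: value(survivors[n], START))
        f = survivors[name]
        # Check: does pi_f actually deliver (almost) what f promises?
        if value(f, START) - policy_return(f) <= eps:
            return name, rounds
        # Prune: find the level where f is inconsistent, then drop every
        # candidate whose Bellman error is large on that roll-in.
        h = max(range(1, H + 1), key=lambda k: abs(bellman_error(f, f, k)))
        survivors = {n: g for n, g in survivors.items()
                     if abs(bellman_error(g, f, h)) <= eps / H}
    raise ValueError("no candidate survived; is Q* in the class?")


print(optimistic_elimination(CANDIDATES))
```

Because the gap between the promised and the achieved value telescopes into the sum of the per-level errors, the chosen candidate always fails the pruning test at the selected level, so each unsuccessful round removes at least one candidate.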
Details at: https://arxiv.org/abs/1610.09512