© The AnyLogic Company | www.anylogic.com
Practical Applications of Deep Reinforcement Learning Using AnyLogic
The AnyLogic Conference 2019, Austin, TX
Arash Mahdavi, Program Lead, The AnyLogic Company
Ty Wang, Vice President of Business Development, Skymind
Learning and decision making from a simulation model
[Figure: diagram with the labels "Final model" and "Learn"]
A simulation model is an extension of someone's mental model.
Simulation as the reinforcement learning environment
SIMULATED WORLD (Simulation Model)
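Treating the simulation model as the RL environment comes down to a small contract. The learner resets a run, then repeatedly sends an action and gets back the next observation, a reward, and a done flag. The sketch below illustrates that contract under stated assumptions. `SimEnvironment`, `StepResult`, `ToyQueueEnvironment`, and `EpisodeRunner` are hypothetical names; they are not the RL4J or AnyLogic API. A toy queue stands in for the AnyLogic model.

```java
import java.util.Random;
import java.util.function.ToIntFunction;

// Hypothetical result of advancing the simulation to the next decision point.
final class StepResult {
    final double[] observation; // state the agent sees after the step
    final double reward;        // feedback for the action just taken
    final boolean done;         // true when the simulation run (episode) ends

    StepResult(double[] observation, double reward, boolean done) {
        this.observation = observation;
        this.reward = reward;
        this.done = done;
    }
}

// Hypothetical contract between a learner and a simulation model.
interface SimEnvironment {
    double[] reset();              // start a fresh run and return the first observation
    StepResult step(int action);   // apply an action and run the model to the next decision point
}

// Toy stand-in for a simulation model: one queue with random arrivals.
// Action 1 serves up to 3 entities; action 0 does nothing.
final class ToyQueueEnvironment implements SimEnvironment {
    private final Random rng;
    private final int maxSteps;
    private int queue;
    private int t;

    ToyQueueEnvironment(long seed, int maxSteps) {
        this.rng = new Random(seed);
        this.maxSteps = maxSteps;
    }

    @Override
    public double[] reset() {
        queue = 0;
        t = 0;
        return new double[] {queue};
    }

    @Override
    public StepResult step(int action) {
        if (action == 1) {
            queue = Math.max(0, queue - 3);
        }
        queue += rng.nextInt(3); // 0, 1, or 2 new arrivals per step
        t++;
        double reward = -queue;  // penalize entities left waiting
        return new StepResult(new double[] {queue}, reward, t >= maxSteps);
    }
}

final class EpisodeRunner {
    // Runs one episode with the given policy and returns the total reward collected.
    static double run(SimEnvironment env, ToIntFunction<double[]> policy) {
        double[] obs = env.reset();
        double total = 0.0;
        while (true) {
            StepResult r = env.step(policy.applyAsInt(obs));
            total += r.reward;
            obs = r.observation;
            if (r.done) {
                return total;
            }
        }
    }
}
```

In a real setup the learning library, not a hand-written loop, drives `reset`/`step`. The point is that the simulation only has to expose observations, rewards, and episode boundaries.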
Traffic Light Example
Eduardo Gonzalez, VP Engineering, Skymind
Samuel Audet, Deep Learning Engineer, Skymind
Tyler Wolfe-Adam, Technical Support Specialist, The AnyLogic Company
Traffic Light Example
[Figure: arrival rates (per hour) vs. time (seconds)]
Cars enter the intersection from four directions and move toward the opposite side.
The objective of the training experiment is to learn a policy that optimally controls the traffic light based on the current state of the traffic.
[Figure: intersection with approaches labeled N, S, W, and E]
Implementation Architecture
AnyLogic Model
Imported RL4J library
Custom Experiment
What is inside the Custom experiment?
Hyperparameters
Network configuration
Training
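Several hyperparameters typically belong to a value-based (DQN-style) learner. The sketch below collects them in a plain Java class under stated assumptions. The class, field names, and example values are hypothetical; they are not the values used in the talk and not RL4J's own configuration class. The one piece of behavior it includes is a linear epsilon schedule for exploration.

```java
// Hypothetical hyperparameters for a DQN-style learner; every value passed in is a placeholder.
final class Hyperparameters {
    final double gamma;          // discount factor for future rewards, in [0, 1]
    final double learningRate;   // optimizer step size
    final int batchSize;         // transitions sampled per gradient update
    final int replayBufferSize;  // maximum transitions kept for experience replay
    final int targetUpdateFreq;  // steps between copies of the online network into the target network
    final double minEpsilon;     // exploration rate reached at the end of annealing
    final int epsilonSteps;      // steps over which epsilon decays linearly from 1.0 to minEpsilon

    Hyperparameters(double gamma, double learningRate, int batchSize, int replayBufferSize,
                    int targetUpdateFreq, double minEpsilon, int epsilonSteps) {
        if (gamma < 0.0 || gamma > 1.0) throw new IllegalArgumentException("gamma must be in [0, 1]");
        if (learningRate <= 0.0) throw new IllegalArgumentException("learningRate must be positive");
        if (batchSize <= 0 || batchSize > replayBufferSize)
            throw new IllegalArgumentException("batchSize must be in (0, replayBufferSize]");
        if (targetUpdateFreq <= 0 || epsilonSteps <= 0)
            throw new IllegalArgumentException("step counts must be positive");
        if (minEpsilon < 0.0 || minEpsilon > 1.0) throw new IllegalArgumentException("minEpsilon must be in [0, 1]");
        this.gamma = gamma;
        this.learningRate = learningRate;
        this.batchSize = batchSize;
        this.replayBufferSize = replayBufferSize;
        this.targetUpdateFreq = targetUpdateFreq;
        this.minEpsilon = minEpsilon;
        this.epsilonSteps = epsilonSteps;
    }

    // Linear annealing: full exploration at step 0, minEpsilon from epsilonSteps onward.
    double epsilonAt(int step) {
        if (step <= 0) return 1.0;
        if (step >= epsilonSteps) return minEpsilon;
        return 1.0 - (1.0 - minEpsilon) * step / (double) epsilonSteps;
    }
}
```

Validating the values in one place catches inconsistent settings (for example, a batch larger than the replay buffer) before a long training run starts.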
What is inside the Custom experiment?
Network configuration
Input (10 nodes) → Hidden 1 (300 nodes) → Hidden 2 (300 nodes) → Output (2 nodes)
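The network shape is 10 inputs, two hidden layers of 300 nodes, and 2 outputs. Under a value-based reading, there is one output per action (one Q-value each). The plain-Java sketch below implements a forward pass and a parameter count for that shape. It assumes ReLU on the hidden layers, a linear output layer, and He-style random initialization. In the talk the network is built with DL4J/RL4J rather than by hand, and `DenseQNetwork` is a hypothetical name.

```java
import java.util.Random;

// Hypothetical fully connected network, e.g. sizes {10, 300, 300, 2}.
final class DenseQNetwork {
    private final int[] sizes;
    private final double[][][] weights; // weights[layer][out][in]
    private final double[][] biases;    // biases[layer][out], initialized to zero

    DenseQNetwork(int[] sizes, long seed) {
        if (sizes.length < 2) throw new IllegalArgumentException("need at least input and output sizes");
        this.sizes = sizes.clone();
        Random rng = new Random(seed);
        int layers = sizes.length - 1;
        weights = new double[layers][][];
        biases = new double[layers][];
        for (int l = 0; l < layers; l++) {
            int in = sizes[l], out = sizes[l + 1];
            double scale = Math.sqrt(2.0 / in); // He-style scale, suited to ReLU layers
            weights[l] = new double[out][in];
            biases[l] = new double[out];
            for (int o = 0; o < out; o++) {
                for (int i = 0; i < in; i++) {
                    weights[l][o][i] = rng.nextGaussian() * scale;
                }
            }
        }
    }

    // Weights plus biases across all layers.
    int parameterCount() {
        int total = 0;
        for (int l = 0; l < sizes.length - 1; l++) {
            total += sizes[l] * sizes[l + 1] + sizes[l + 1];
        }
        return total;
    }

    // Returns one value per output node (one Q-value estimate per action).
    double[] forward(double[] input) {
        if (input.length != sizes[0]) throw new IllegalArgumentException("expected " + sizes[0] + " inputs");
        double[] a = input;
        for (int l = 0; l < weights.length; l++) {
            boolean hidden = l < weights.length - 1;
            double[] z = new double[sizes[l + 1]];
            for (int o = 0; o < z.length; o++) {
                double sum = biases[l][o];
                for (int i = 0; i < a.length; i++) {
                    sum += weights[l][o][i] * a[i];
                }
                z[o] = hidden ? Math.max(0.0, sum) : sum; // ReLU on hidden layers, linear output
            }
            a = z;
        }
        return a;
    }
}
```

For the 10-300-300-2 shape the count is (10·300 + 300) + (300·300 + 300) + (300·2 + 2) = 3,300 + 90,300 + 602 = 94,202 trainable parameters. Nearly all of them sit between the two hidden layers.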
What is inside the Custom experiment?
Network configuration
Training
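The slides do not spell out the training algorithm. Assuming a DQN-style learner (Q-learning is one of the algorithm families RL4J provides), two small pieces sit at the core of each training step. The first is epsilon-greedy action selection. The second is the temporal-difference target y = r + γ · max over a' of Q_target(s', a'), which reduces to y = r when the episode has ended. The library-free sketch below shows both; `DqnUpdate` and its methods are hypothetical names, not RL4J API.

```java
import java.util.Random;

final class DqnUpdate {
    // Index of the largest Q-value (the greedy action).
    static int argmax(double[] q) {
        int best = 0;
        for (int a = 1; a < q.length; a++) {
            if (q[a] > q[best]) best = a;
        }
        return best;
    }

    // Epsilon-greedy: explore uniformly with probability epsilon, otherwise act greedily.
    static int selectAction(double[] q, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(q.length);
        return argmax(q);
    }

    // TD target for one transition; nextQTarget comes from the target network at s'.
    static double tdTarget(double reward, double[] nextQTarget, boolean done, double gamma) {
        if (done) return reward;
        return reward + gamma * nextQTarget[argmax(nextQTarget)];
    }

    // Squared TD error for the action actually taken; a gradient step reduces this.
    static double squaredTdError(double[] q, int action, double target) {
        double diff = target - q[action];
        return diff * diff;
    }
}
```

Only the Q-value of the action actually taken contributes to the error. The other outputs are left untouched by that transition.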
What is inside the Custom experiment?
Observation: an array with 10 elements
[Figure: diagram numbering the observation array elements]
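The observation passed to the network is a 10-element array, matching the 10 input nodes. The slides do not list what each element holds, so the encoder below is hypothetical. It assumes nine per-location vehicle counts plus one light-phase flag, and it clips and scales counts by an assumed capacity so that every input lies in [0, 1].

```java
// Hypothetical observation encoder for the traffic-light example.
final class ObservationEncoder {
    static final int SIZE = 10;    // matches the network's 10 inputs
    private final double capacity; // assumed maximum vehicles per location, used for scaling

    ObservationEncoder(double capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    // counts: 9 hypothetical vehicle counts; phaseIsNorthSouthGreen: assumed phase indicator.
    double[] encode(int[] counts, boolean phaseIsNorthSouthGreen) {
        if (counts.length != SIZE - 1) {
            throw new IllegalArgumentException("expected " + (SIZE - 1) + " counts");
        }
        double[] obs = new double[SIZE];
        for (int i = 0; i < counts.length; i++) {
            obs[i] = Math.min(counts[i], capacity) / capacity; // clip, then scale to [0, 1]
        }
        obs[SIZE - 1] = phaseIsNorthSouthGreen ? 1.0 : 0.0;
        return obs;
    }
}
```

Keeping inputs on a common scale generally makes the network easier to train than feeding raw counts next to a 0/1 flag.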
What is inside the Custom experiment?
Action == 0: do nothing
Action == 1: change the traffic light phase if it is not yellow
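The two-action space can be applied to a traffic light modeled as a small state machine. Only the rule itself comes from the slide: 0 does nothing, and 1 switches the phase unless the light is yellow. The phase names, the tick-based yellow duration, and the `TrafficLight` class are assumptions made for illustration.

```java
// Hypothetical phases: green for one axis, with a yellow transition before switching.
enum Phase { NS_GREEN, NS_YELLOW, EW_GREEN, EW_YELLOW }

final class TrafficLight {
    private final int yellowTicks; // assumed yellow duration, in simulation ticks
    private Phase phase = Phase.NS_GREEN;
    private int yellowLeft = 0;    // ticks of yellow remaining

    TrafficLight(int yellowTicks) {
        this.yellowTicks = yellowTicks;
    }

    Phase phase() {
        return phase;
    }

    boolean isYellow() {
        return phase == Phase.NS_YELLOW || phase == Phase.EW_YELLOW;
    }

    // Action 0: do nothing. Action 1: start a phase change, ignored while yellow.
    void apply(int action) {
        if (action != 0 && action != 1) throw new IllegalArgumentException("unknown action " + action);
        if (action == 0 || isYellow()) return;
        phase = (phase == Phase.NS_GREEN) ? Phase.NS_YELLOW : Phase.EW_YELLOW;
        yellowLeft = yellowTicks;
    }

    // Advance one tick; when yellow runs out, the other axis turns green.
    void tick() {
        if (!isYellow()) return;
        yellowLeft--;
        if (yellowLeft <= 0) {
            phase = (phase == Phase.NS_YELLOW) ? Phase.EW_GREEN : Phase.NS_GREEN;
        }
    }
}
```

Ignoring action 1 during yellow keeps every action legal in every state. The agent never has to learn which actions are forbidden, and the light can never skip its yellow interval.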
Comparison of results (Base vs. Optimized vs. Policy)
Real systems: Dynamic + Stochastic (exogenous inputs / system internals)
Optimization: Optimal fixed input parameters
Policy: Optimal (or near-optimal) decisions over time
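The difference between optimization and a policy shows up clearly as two controllers for the same light. An optimization experiment tunes a fixed input, such as a switching interval, that is then applied regardless of the traffic. A policy maps the observed state to an action at every decision point. In this hypothetical sketch the `Controller` interface, the interval, and the threshold rule standing in for a trained network are all assumptions.

```java
import java.util.function.ToIntFunction;

// A controller returns an action (0 = keep phase, 1 = switch) at each decision point.
interface Controller {
    int decide(int timeStep, double[] observation);
}

// "Optimized" controller: a fixed cycle whose single parameter would be tuned offline.
final class FixedCycleController implements Controller {
    private final int interval;

    FixedCycleController(int interval) {
        if (interval <= 0) throw new IllegalArgumentException("interval must be positive");
        this.interval = interval;
    }

    @Override
    public int decide(int timeStep, double[] observation) {
        return (timeStep > 0 && timeStep % interval == 0) ? 1 : 0; // ignores the observation
    }
}

// "Policy" controller: the decision depends on the current state.
final class PolicyController implements Controller {
    private final ToIntFunction<double[]> policy;

    PolicyController(ToIntFunction<double[]> policy) {
        this.policy = policy;
    }

    @Override
    public int decide(int timeStep, double[] observation) {
        return policy.applyAsInt(observation);
    }
}
```

Only the second controller can react to a stochastic surge in arrivals. The fixed cycle can at best be tuned for the average case.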
Reinforcement learning decision points
Hyperparameters
Observation space
Action space
Reward
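Of the four decision points, the reward most directly encodes the objective. The slides do not state the reward used in the traffic-light example, so the following is only one hypothetical choice. It penalizes the total waiting time accumulated since the last decision. It also applies an optional penalty per phase switch to discourage rapid flickering. Both terms and the penalty weight are assumptions.

```java
// Hypothetical reward: negative waiting time since the last decision, minus a switch penalty.
final class WaitingTimeReward {
    private final double switchPenalty; // assumed cost of changing the phase

    WaitingTimeReward(double switchPenalty) {
        if (switchPenalty < 0.0) throw new IllegalArgumentException("switchPenalty must be non-negative");
        this.switchPenalty = switchPenalty;
    }

    // waitingSeconds: per-vehicle waiting time accumulated during the last decision interval.
    double compute(double[] waitingSeconds, boolean switched) {
        double total = 0.0;
        for (double w : waitingSeconds) {
            total += w;
        }
        return -total - (switched ? switchPenalty : 0.0);
    }
}
```

Because the agent maximizes cumulative reward, a penalty on waiting time turns "serve traffic well" into something the learner can optimize directly. The switch penalty is a tunable trade-off, not a requirement.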
How are learned policies used?
Trained policies can be deployed on many types of devices and equipment to complete tasks adaptively and autonomously.
Edge devices can serve as controllers that run the learned policies.
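On an edge controller, a trained Q-network is typically used greedily. The controller reads the sensors, builds the observation, picks the action with the highest Q-value, and actuates, with no exploration at deployment time. In this hypothetical sketch, `Sensors`, `Actuator`, and the `qFunction` are stand-ins for device I/O and for the exported network.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical device interfaces.
interface Sensors { double[] readObservation(); }
interface Actuator { void apply(int action); }

final class EdgeController {
    private final Function<double[], double[]> qFunction; // exported network: observation -> Q-values
    private final Sensors sensors;
    private final Actuator actuator;

    EdgeController(Function<double[], double[]> qFunction, Sensors sensors, Actuator actuator) {
        this.qFunction = Objects.requireNonNull(qFunction);
        this.sensors = Objects.requireNonNull(sensors);
        this.actuator = Objects.requireNonNull(actuator);
    }

    // One control cycle: observe, choose the greedy action, act. Returns the chosen action.
    int controlStep() {
        double[] q = qFunction.apply(sensors.readObservation());
        int best = 0;
        for (int a = 1; a < q.length; a++) {
            if (q[a] > q[best]) best = a;
        }
        actuator.apply(best);
        return best;
    }
}
```

The controller only needs inference, not training. That is what makes small edge devices a realistic deployment target once the policy has been learned in simulation.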
Export model and text file
Text Export File Format
Export AnyLogic Model to Train
Add the model to the Skymind Intelligence Layer (SKIL)
.jar File Transfer
Create Experiment
Ready-to-Use Machine Learning Notebooks, Libraries, and Workflows
Train Model
Notebook Integration
Web or Command Line Interface
Compute and Storage Resource Management
Analytics
Deploy Model
Ready-to-Use Deployment Workflow
Multiple Model Language Support: Java, Python, Endpoints, RPA
Manage history and versions
Version History with Rollback
Machine Learning powered by Skymind
http://www.skymind.ai/anylogic
What should simulation modelers know about this new application?
• The great news for simulation modelers is that their skills now have a new and exciting application!
• To implement reinforcement learning (or DRL), a team of DRL experts and simulation modelers can collaborate. In theory, neither group needs in-depth knowledge of the other group's tasks.
• When developing simulation models that will be used as training environments, the stakes are higher because the human buffer is no longer there: the policy learns directly from the model, so no person interprets the results and catches modeling errors before they shape decisions.
Can simulation modelers' jobs be replaced with AI too?
At least in the near future, there is NO way to automate the process of abstracting reality into a simulation model, because it has two aspects that [current] machines are not good at:
• The process of abstracting reality is an art.
• Simulation models are fundamentally based on uncovering causality and how something works.
AnyLogic-AI integration roadmap
April 2019: DL4J and RL4J libraries; SKIL & AnyLogic
Summer 2019: AnyLogic Cloud Python API (RL-ready)
June 2019: RL capabilities for the current AnyLogic Cloud Java API; DRL examples with instructions
End of 2019: AL-AI book (first draft)
We are here now
• Integration with other AI platforms
• DRL in the Cloud (DRL experiment)
• AL Python API (AnyLogic 9)
• Providing DRL-compatible example models
Fall 2019: Preset learning algorithms/architectures in SKIL