Building an Artificial Cerebellum using a System of Distributed Q-Learning Agents Miguel Angel Soto Santibanez
Posted: 21-Dec-2015
- Slide 1
- Miguel Angel Soto Santibanez
- Slide 2
- Overview Why they are important Previous Work Advantages and Shortcomings
- Slide 3
- Overview Advantages and Shortcomings New Technique Illustration
- Slide 4
- Overview Illustration Centralized Learning Agent Issue New Techniques
- Slide 5
- Overview New Techniques Evaluate Set of Techniques Summary of Contributions
- Slide 6
- Overview Summary of Contributions Software Tool
- Slide 7
- Early Sensorimotor Neural Networks Sensory System (receptors) Motor System (motor executors)
- Slide 8
- Modern Sensorimotor Neural Networks Sensory System Motor System Cerebellum
- Slide 9
- Natural Cerebellum FAST PRECISELY SYNCHRONIZED
- Slide 10
- Artificial Cerebellum
- Slide 11
- Previous Work Cerebellatron CMAC FOX SpikeFORCE SENSOPAC
- Slide 12
- Previous work strengths Cerebellatron improved movement smoothness in robots.
- Slide 13
- Previous work strengths CMAC provided function approximation with fast convergence.
- Slide 14
- Previous work strengths FOX improved convergence by making use of eligibility vectors.
- Slide 15
- Previous work strengths SpikeFORCE and SENSOPAC have improved our understanding of the natural cerebellum.
- Slide 16
- Previous work strengths May allow us to: Treat nervous system ailments. Discover biological mechanisms.
- Slide 17
- Previous work strengths LWPR is a step in the right direction towards tackling the scalability issue.
- Slide 18
- Previous work issues Cerebellatron is difficult to use and requires very complicated control input
- Slide 19
- Previous work issues CMAC and FOX depend on fixed-size tiles and therefore do not scale well.
- Slide 20
- Previous work issues Methods proposed by SpikeFORCE and SENSOPAC require rare skills.
- Slide 21
- Previous work issues The LWPR method proposed by SENSOPAC only works well if the problem has only a few non-redundant and non-irrelevant dimensions.
- Slide 22
- Previous work issues Two Categories: 1) Framework Usability Issues: 2)Building Blocks Incompatibility Issues:
- Slide 23
- Previous work issues Two Categories: 1) Framework Usability Issues: very difficult to use; requires very specialized skills
- Slide 24
- Previous work issues Two Categories: 2) Building Blocks Incompatibility Issues: memory incompatibility processing incompatibility
- Slide 25
- Proposed technique. Two Categories: Framework Usability Issues: new development framework
- Slide 26
- Proposed technique. Two Categories: Building Blocks Incompatibility Issues new I/O mapping algorithm
- Slide 27
- Proposed Technique Provides a shorthand notation.
- Slide 28
- Proposed Technique Provides a recipe.
- Slide 29
- Proposed Technique Provides simplification rules.
- Slide 30
- Proposed Technique Provides a more compatible I/O mapping algorithm. Moving Prototypes
- Slide 31
- Proposed Technique The shorthand notation symbols: a sensor:
- Slide 32
- Proposed Technique The shorthand notation symbols: an actuator:
- Slide 33
- Proposed Technique The shorthand notation symbols: a master learning agent:
- Slide 34
- Proposed Technique The shorthand notation symbols: the simplest artificial cerebellum:
- Slide 35
- Proposed Technique The shorthand notation symbols: an encoder:
- Slide 36
- Proposed Technique The shorthand notation symbols: a decoder:
- Slide 37
- Proposed Technique The shorthand notation symbols: an agent with sensors and actuators:
- Slide 38
- Proposed Technique The shorthand notation symbols: a sanity point: S
- Slide 39
- Proposed Technique The shorthand notation symbols: a slave sensor learning agent:
- Slide 40
- Proposed Technique The shorthand notation symbols: a slave actuator learning agent:
- Slide 41
- Proposed Technique The Recipe: 1) Become familiar with the problem at hand. 2) Enumerate significant factors. 3) Categorize factors as either sensors, actuators, or both. 4) Specify sanity points.
- Slide 42
- Proposed Technique The Recipe: 5) Simplify overloaded agents. 6) Describe the system using the proposed shorthand notation. 7) Apply simplification rules. 8) Specify a reward function for each agent.
- Slide 43
- Proposed Technique The simplification rules: 1) Two agents in series can be merged into a single agent:
- Slide 44
- Proposed Technique The simplification rules: 2) It is OK to apply simplification rules as long as no sanity point is destroyed. S
- Slide 45
- Proposed Technique The simplification rules: 3) If a decoder and an encoder share the same output and input signals respectively, they can be deleted
- Slide 46
- Proposed Technique The simplification rules: 4)Decoders with a single output signal can be deleted
- Slide 47
- Proposed Technique The simplification rules: 5)Encoders with a single output signal can be deleted
- Slide 48
- Proposed Technique The simplification rules: 6) If several agents receive signals from a single decoder and send their signals to a single encoder:
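The simplification rules above can also be sketched programmatically. The following is a minimal, hypothetical sketch (the `Node` class and the node names are ours, not the dissertation's, whose notation is graphical) of rule 1, merging two agents in series, while honoring rule 2 by never merging across a sanity point:

```python
# Hypothetical sketch: a cerebellum layout as a chain of nodes, with
# simplification rule 1 (merge two agents in series) applied while
# honoring rule 2 (never merge across a sanity point).

from dataclasses import dataclass

@dataclass
class Node:
    kind: str   # "sensor", "agent", "sanity", "actuator", ...
    name: str

def simplify(chain):
    """Merge adjacent agents in series unless a sanity point separates them."""
    out = []
    for node in chain:
        if node.kind == "agent" and out and out[-1].kind == "agent":
            # Rule 1: two agents in series collapse into a single agent.
            out[-1] = Node("agent", out[-1].name + "+" + node.name)
        else:
            # Sanity points and other nodes are kept as-is, which
            # prevents merging across them (rule 2).
            out.append(node)
    return out

chain = [Node("sensor", "S1"), Node("agent", "A"), Node("agent", "B"),
         Node("sanity", "S"), Node("agent", "C"), Node("actuator", "M1")]
print([n.name for n in simplify(chain)])  # ['S1', 'A+B', 'S', 'C', 'M1']
```

Note that agent C survives on its own: the sanity point between B and C blocks the merge, exactly as rule 2 requires.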
- Slide 49
- Proposed Technique Q-learning: Off-policy control algorithm. Temporal-Difference algorithm. Can be applied online.
- Slide 50
- Proposed Technique L. A. R Q-learning:
- Slide 51
- Proposed Technique Q-learning: How does it do it? By learning an estimate of the long-term expected reward for every state-action pair.
- Slide 52
- Proposed Technique state long term expected reward action Q-learning: value function
- Slide 53
- Proposed Technique Q-learning: state long term expected reward action
- Slide 54
- Proposed Technique Q-learning: state, long-term expected reward, action. Example table (state | action): 64 | 140, 72 | 140, 80 | 140, 88 | 140, 96 | 140, 104 | 140, 112 | 140
- Slide 55
- Proposed Technique Q-learning: state, long-term expected reward, action. Example table (state | long-term expected reward): 0.2 | 0.70, 0.4 | 0.75, 0.6 | 0.90, 0.8 | 0.95, 1.0 | 0.80, 1.2 | 0.90, 1.4 | 0.55
- Slide 56
- Proposed Technique How is the Q-function updated? Initialize Q(s, a) arbitrarily. Repeat (for each episode): Initialize s. Repeat (for each step of the episode): Choose a for s using an exploratory policy; take action a, observe r, s′; Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)]; s ← s′.
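The update loop on this slide can be sketched in code. This is a minimal tabular Q-learning illustration of the rule Q(s,a) ← Q(s,a) + α[r + γ max Q(s′,a′) − Q(s,a)]; the toy three-state chain environment, the constants, and the function names are invented for the example:

```python
# Minimal tabular Q-learning sketch. The chain environment (states 0..2,
# reward 1.0 for reaching state 2) is a toy invented for illustration.

import random
from collections import defaultdict

random.seed(0)                  # deterministic run for the example
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = defaultdict(float)          # Q[(state, action)], initialized to 0
actions = [0, 1]                # 0 = move left, 1 = move right

def step(s, a):
    """Toy chain: move left/right within 0..2; reward 1 at state 2."""
    s2 = max(0, min(2, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 2 else 0.0)

def choose(s):
    """Epsilon-greedy exploratory policy, as the slide requires."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

for episode in range(200):
    s = 0
    for _ in range(20):
        a = choose(s)
        s2, r = step(s, a)
        best_next = max(Q[(s2, b)] for b in actions)
        # The update rule from the slide:
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2
        if s == 2:
            break

# After training, moving right should look best from every non-goal state.
print(Q[(0, 1)] > Q[(0, 0)], Q[(1, 1)] > Q[(1, 0)])
```

With γ = 0.9 the learned values approach 1.0 for (1, right) and 0.9 for (0, right), reflecting the one-step discount to the goal.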
- Slide 57
- Proposed Technique Q-learning: (figure: a tabular value function indexed by states 1-9 and actions 1-9)
- Slide 58
- Proposed Technique Tile Coding Kanerva Coding Proposed alternatives:
- Slide 59
- Proposed Technique Distribution of memory resources:
- Slide 60
- Proposed Technique Moving Prototypes:
- Slide 61
- Proposed Technique Tile Based vs. Moving Prototypes:
- Slide 62
- Proposed Technique CMAC vs. Moving Prototypes:
- Slide 63
- Proposed Technique LWPR vs. Moving Prototypes:
- Slide 64
- Hidden Input Output Proposed Technique LWPR vs. Moving Prototypes:
- Slide 65
- Proposed Technique LWPR vs. Moving Prototypes: (figure: input, hidden, and output layers) lookup cost O(n) vs. O(log n)
- Slide 66
- Proposed Technique LWPR vs. Moving Prototypes: (figure: input, hidden, and output layers) one trillion interactions vs. a few dozen interactions
- Slide 67
- Centralized L.A. issue Steering angle L. A. R R R Obstacles positions
- Slide 68
- Centralized L.A. issue L. A. R R R
- Slide 69
- Centralized L.A. issue L. A. R R R
- Slide 70
- Centralized L.A. issue L. A. R R R
- Slide 71
- Centralized L.A. issue ( possible solution ) Centralized L.A.: Distributed L.A.: L. A.
- Slide 72
- Distributed L.A. Additional Tasks: 1)Analyze Distributed L.A. Systems. 2)Identify most important bottlenecks. 3)Propose amelioration techniques. 4)Evaluate their performance.
- Slide 73
- Distributed L.A. DEVS Java:
- Slide 74
- Distributed L.A. Some DEVS Java details: 1) Provides continuous-time support. 2) Models are easy to define. 3) No need to worry about the simulator. 4) Supports concurrency.
- Slide 75
- Distributed L.A. First technique: Learning will occur faster if we split the value function into slices along the action axis. (figure: the state-action table split into three slices)
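The first technique can be sketched as follows. This illustrative code (the `QSlice` class is our invention, not the dissertation's implementation) shows how slicing the value function along the action axis lets each learning agent keep only its own actions, with the global greedy action computed from the per-slice maxima:

```python
# Hedged sketch: the Q-function split into slices along the action axis,
# one slice per learning agent. Each agent updates only its own actions;
# a greedy choice only needs each agent's local maximum.

class QSlice:
    def __init__(self, actions):
        self.actions = list(actions)
        self.q = {}  # (state, action) -> value, only for this slice's actions

    def value(self, s, a):
        return self.q.get((s, a), 0.0)

    def local_max(self, s):
        """Best (value, action) pair among this slice's actions."""
        return max((self.value(s, a), a) for a in self.actions)

# Three agents, each owning a third of a 9-action space.
slices = [QSlice(range(0, 3)), QSlice(range(3, 6)), QSlice(range(6, 9))]
slices[1].q[("s0", 4)] = 2.5   # pretend agent 1 has learned something

# The global greedy action is the best of the per-slice maxima.
best_value, best_action = max(sl.local_max("s0") for sl in slices)
print(best_action)  # 4
```

Because each slice is independent, the three agents can run their updates concurrently, which is the source of the ~50% speedup reported on the next slide.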
- Slide 76
- Distributed L.A. First Technique Results: 1) 3500 time units to learn to balance pole. 2) 1700 time units to learn to balance pole. ~ 50% improvement.
- Slide 77
- Distributed L.A. Second technique: We suspect that we may be able to ameliorate this problem by making sluggish learning agents pass some of their load to speedy learning agents. Sluggish agent Speedy agent
- Slide 78
- Distributed L.A. Second Technique Results: 1) 52000 time units to learn to balance pole. 2) 42000 time units to learn to balance pole. ~ 20% improvement.
- Slide 79
- Distributed L.A. Third Technique: Giving each distributed LA its own local simulator should improve learning performance significantly.
- Slide 80
- Distributed L.A. Third Technique Results: 1) 29000 time units to learn to balance pole. 2) 14000 time units to learn to balance pole. ~ 50% improvement.
- Slide 81
- Distributed L.A. Fourth Technique: Using a push policy instead of a poll policy to propagate max-values among the learning agents should improve learning performance significantly.
- Slide 82
- Distributed L.A. Fourth Technique Results: 1) 14000 time units to learn to balance pole. 2) 6000 time units to learn to balance pole. ~ 60% improvement.
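A push policy of this kind can be sketched as follows; the `Agent` class and its wiring are illustrative assumptions, not the dissertation's implementation. Each agent notifies its subscribed peers the moment its local max Q-value for a state improves, so peers never have to poll:

```python
# Hedged sketch of a push policy for propagating max Q(s,a) values
# among distributed learning agents.

class Agent:
    def __init__(self, name):
        self.name = name
        self.max_q = {}          # state -> best value known locally
        self.subscribers = []    # peers that depend on our max-values
        self.remote = None       # last max-value pushed to us by a peer

    def update(self, state, value):
        """Called after a local Q-learning update improves a value."""
        if value > self.max_q.get(state, float("-inf")):
            self.max_q[state] = value
            # Push policy: notify interested peers immediately on change,
            # instead of waiting for them to poll us.
            for peer in self.subscribers:
                peer.receive(self.name, state, value)

    def receive(self, sender, state, value):
        self.remote = (sender, state, value)

a, b = Agent("a"), Agent("b")
a.subscribers.append(b)
a.update("s0", 1.5)
print(b.remote)  # ('a', 's0', 1.5)
```

Under a poll policy, agent b would repeatedly query a even when nothing changed; pushing only on improvement removes that wasted traffic, consistent with the ~60% improvement reported above.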
- Slide 83
- Distributed L.A. How does the Push policy scale?
- Slide 84
- Distributed L.A. Results: 1) 24000 time units to learn to balance pole. 2) 12000 time units to learn to balance pole. ~ 50% improvement.
- Slide 85
- Distributed L.A. Results Suggest the use of a distributed system that: 1) divides value function along action axis. 2) auto balances load from slow agents to fast ones. 3) allows each agent access to its own simulator. 4) uses a Push policy to propagate maxQ(s,a) information.
- Slide 86
- Contributions Explained importance of Artificial Cerebellums:
- Slide 87
- Contributions Provided Analysis of previous work: Cerebellatron CMAC FOX SpikeFORCE SENSOPAC
- Slide 88
- Contributions Extracted most important shortcomings: 1) Framework usability. 2) Building blocks incompatibility.
- Slide 89
- Contributions Proposed to use a new framework
- Slide 90
- Contributions Proposed the use of Moving Prototypes: CMAC, FOX Moving Prototypes
- Slide 91
- Contributions Proposed the use of Moving Prototypes Hidden Input Output O(n)O(log n)
- Slide 92
- Contributions Illustrated new technique using a real-life problem
- Slide 93
- Contributions Proposed using distributed agent system L. A.
- Slide 94
- Contributions Proposed a set of techniques to ameliorate bottlenecks.
- Slide 95
- Future Work Development Tool:
- Slide 96
- Future Work Automation: Extract Policy Train
- Slide 97
- Future Work Intuitive: Extract Policy Train Components
- Slide 98
- Future Work Simple component configuration: Extract Policy Train Hardware Details Signal Table Details . Components
- Slide 99
- Future Work Integrated simulator: Extract Policy Train Components
- Slide 100
- Extract Policy Train Components Library Water Components Space Components Land Components Air Components Future Work Available library:
- Slide 101
- Questions or Comments
- Slide 102
- Slide 103
- Slide 104
- Asteroids Problem
- Slide 105
- Asteroids Problem (step 1)
- Slide 106
- spacecraft
- Slide 107
- Asteroids Problem (step 1) spacecraft
- Slide 108
- Asteroids Problem (step 2) 1) Distance to nearby asteroids 2) Angle to nearby asteroids 3) Angle to goal direction d a g
- Slide 109
- Asteroids Problem (step 3) 1)Distance to nearby Asteroids Sensor d
- Slide 110
- Asteroids Problem (step 3) 2)Angle to nearby Asteroids Sensor Actuator a
- Slide 111
- Asteroids Problem (step 3) 3)Angle to goal direction Sensor Actuator
- Slide 112
- Asteroids Problem (step 3) Sensor 1 find distance to nearby Asteroids Sensor 2 find angle to nearby Asteroids Sensor 3 find angle to goal direction Actuator 1 Steer spacecraft (to avoid collision) Actuator 2 Steer spacecraft (to match goal)
- Slide 113
- Asteroids Problem (step 3) Sensor 1 find distance to nearby Asteroids Sensor 2 find angle to nearby Asteroids Sensor 3 find angle to goal direction Actuator 1 Steer spacecraft (to avoid collision and to match goal)
- Slide 114
- Asteroids Problem (step 3) The learning agent receives information about the environment from sensors S1, S2 and S3 and instructs actuator A1 what to do in this case
- Slide 115
- Asteroids Problem (step 4) One learning agent receives information about the environment from sensors S1, S2 and S3 and generates the desired heading angle signal. Another learning agent receives the desired delta heading angle signal and instructs actuator A1 what to do in this case. S desired delta heading angle
- Slide 116
- Asteroids Problem (step 5) One learning agent receives information about the environment from sensors S1, S2 and S3 and generates the desired heading angle signal. Another learning agent receives the desired delta heading angle signal and instructs actuator A1 what to do in this case. S desired delta heading angle
- Slide 117
- Asteroids Problem (step 6) S S1 S2 S3 A1 desired delta heading angle
- Slide 118
- Asteroids Problem (step 6) S radar angles to nearby asteroids distances to nearby asteroids desired delta heading angle angle to goal
- Slide 119
- Asteroids Problem (step 6)
- Slide 120
- Slide 121
- Asteroids Problem distance signal range [0,4]. signal | distance (m): 0 → [0,19), 1 → [19,39), 2 → [39,59), 3 → [59,79), 4 → [79,∞). signal | angle (°): 0 → [0,22.5), …
- Slide 122
- Asteroids Problem (step 6) S Radar angles to nearby asteroids [0,7] angle to goal distances to nearby asteroids [0,390624] desired delta heading angle
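The encoding used here can be illustrated in code. Since [0, 390624] equals [0, 5^8 − 1], the encoder plausibly packs eight 5-level quantized distance signals into one base-5 number; the sector count of eight is our inference from that range, and the function names are ours:

```python
# Hedged sketch: quantize each radar distance into a signal 0..4 (the
# table on the slides) and pack several signals into one encoder output.
# Eight readings is an assumption inferred from [0, 390624] = [0, 5**8 - 1].

BINS = [19, 39, 59, 79]  # metres; signal 4 means 79 m or beyond

def quantize(distance):
    """Map a distance in metres to a signal in 0..4."""
    for signal, upper in enumerate(BINS):
        if distance < upper:
            return signal
    return 4

def encode(distances):
    """Pack eight 5-level signals into a single value in [0, 390624]."""
    code = 0
    for d in distances:
        code = code * 5 + quantize(d)
    return code

readings = [10, 25, 45, 60, 90, 90, 90, 90]
print(encode(readings))       # signals 0,1,2,3,4,4,4,4 packed base-5
print(encode([100] * 8))      # all signal 4 -> 390624
```

Combining this distance code with the two [0,7] angle signals would give 390625 × 8 × 8 = 25,000,000 joint inputs, which matches the [0,24999999] range shown for the master agent.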
- Slide 123
- Asteroids Problem (step 6) spacecraft
- Slide 124
- Asteroids Problem (step 6) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] desired delta heading angle
- Slide 125
- Asteroids Problem (step 6) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999][0,20] desired delta heading angle. signal | angle (°): 0 → −10, 1 → −9, …, 10 → 0, …, 19 → 9, 20 → 10
- Slide 126
- Asteroids Problem (step 6)
- Slide 127
- Slide 128
- Slide 129
- S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999][0,20] propulsion system propulsion signal [0,100] [0,20] desired delta heading angle
- Slide 130
- Asteroids Problem (step 7) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999][0,20] propulsion system propulsion signal [0,100] [0,20] desired delta heading angle
- Slide 131
- Asteroids Problem (step 8) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999][0,20] propulsion system propulsion signal [0,100] [0,20] desired delta heading angle
- Slide 132
- Asteroids Problem (step 8) spacecraft environment [0,24999999][0,20] desired delta heading angle propulsion signal [0,20] [0,100] FIRST AGENT:SECOND AGENT:
- Slide 133
- Asteroids Problem (step 8) FIRST AGENT: REWARD FUNCTION: Reward = -10000 if distance to an asteroid < 10 m; else (180 - |angleToGoal|). Spacecraft environment [0,24999999][0,20] Desired delta heading angle
- Slide 134
- Asteroids Problem (step 8) SECOND AGENT: REWARD FUNCTION: Reward = -|achievedAngle - desiredAngle|. desired delta heading angle propulsion signal [0,20] [0,100]
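The two reward functions can be sketched directly. The threshold and constants follow the slides; the function and variable names are ours:

```python
# Sketch of the two reward functions from the slides. The 10 m collision
# threshold and the -10000 penalty come from the slides; names are ours.

def master_reward(distance_to_nearest_asteroid, angle_to_goal):
    """Master agent: large penalty near collision, else reward alignment."""
    if distance_to_nearest_asteroid < 10:          # metres
        return -10000
    return 180 - abs(angle_to_goal)                # degrees

def slave_reward(achieved_angle, desired_angle):
    """Slave actuator agent: penalize missing the commanded delta heading."""
    return -abs(achieved_angle - desired_angle)

print(master_reward(5, 0))      # -10000 (too close to an asteroid)
print(master_reward(50, 30))    # 150 (safe; partially aligned with goal)
print(slave_reward(12, 10))     # -2
```

The collision penalty dominates the alignment term by two orders of magnitude, so the master agent learns obstacle avoidance before goal seeking.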
- Slide 135
- Asteroids Problem (training)
- Slide 136
- spacecraft
- Slide 137
- Asteroids Problem (issues) S Radar angles to nearby asteroids [0,7] angle to goal [0,7] distances to nearby asteroids [0,390624] [0,24999999] desired delta heading angle [0,20] propulsion system propulsion signal [0,100] [0,20]
- Slide 138
- Asteroids Problem (issues) 1)Master L.A. took much longer to converge.
- Slide 139
- Asteroids Problem (issues) 2) Trained system not as good as an experienced human driver.
- Slide 140
- Asteroids Problem (issues) 3)Slave L.A. should probably react differently to a given command depending on current angular velocity.
- Slide 141
- Asteroids Problem ( possible solutions ) 1)Should simplify Master Agent. L. A.
- Slide 142
- Asteroids Problem ( possible solutions ) 2)Use more precise angles in order to characterize a nearby asteroid.
- Slide 143
- Asteroids Problem ( possible solutions ) 3)Allow the slave agent to know the current angular velocity before selecting the appropriate steering signal.
- Slide 144
- Asteroids Problem (step 1) Radars are not very precise with respect to predicting distance to target. Signal Strength
- Slide 145
- Asteroids Problem (step 1)
- Slide 146
- Asteroids Problem (step 2) New factors: 1) angular velocity around azimuth of the spacecraft
- Slide 147
- Asteroids Problem (step 3) Angular velocity around azimuth Sensor
- Slide 148
- Asteroids Problem (step 3) Angular velocity around azimuth Sensor
- Slide 149
- Asteroids Problem (step 3) One learning agent receives information about the environment from sensors S1, S2 and S3 and generates the desired heading angle signal. Another learning agent receives the desired delta heading angle signal and the current angular velocity from S4, and then instructs actuator A1 what to do in this case. S desired delta heading angle
- Slide 150
- Asteroids Problem (step 4) One learning agent receives the position of the closest asteroids and the current heading from sensor S3 and generates the desired heading angle signal. Another learning agent receives the desired delta heading angle signal and the current angular velocity, and then instructs actuator A1 what to do in this case. desired delta heading angle S position of closest asteroids S A further learning agent receives information about the environment from sensor S1 and generates the distance to closest asteroids signal
- Slide 151
- Asteroids Problem (step 4) Laser Verification
- Slide 152
- Asteroids Problem (step 5) 1) Slave Sensor Learning Agent: signal from radar → distance to closest asteroids. 2) Master Learning Agent: closest asteroids' positions and angle to goal → desired delta heading angle. 3) Slave Actuator Learning Agent: desired delta heading angle → steering signal
- Slide 153
- Asteroids Problem (step 6) radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar
- Slide 154
- Asteroids Problem (step 6)
- Slide 155
- Signal Strength
- Slide 156
- Asteroids Problem (step 6) Signal Strength
- Slide 157
- Asteroids Problem (step 6) Signal Strength
- Slide 158
- Asteroids Problem (step 6) Signal Strength
- Slide 159
- Asteroids Problem (step 6) Signal Strength
- Slide 160
- Asteroids Problem (step 6) Signal Strength
- Slide 161
- Asteroids Problem (step 6) distance signal range [0,4], angle signal range [0,4]. signal | distance (m): 0 → [0,19), 1 → [19,39), 2 → [39,59), 3 → [59,79), 4 → [79,∞). signal | angle (°): 0 → [0,2), 1 → [2,4), 2 → [4,6), 3 → [6,8), 4 → [8,10)
- Slide 162
- Asteroids Problem (step 6) (figure: sample radar distances mapped to signals — [19,39) → 1, [39,59) → 2, [79,∞) → 4) Signal Strength
- Slide 163
- Asteroids Problem (step 6) Signal Strength (figure: sample angles mapped to signals — [2,4) → 1, [4,6) → 2, [8,10) → 4)
- Slide 164
- Asteroids Problem (step 6) radar desired delta heading angle S Position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope Solar wind detector radar [0,15624] [0,124] [0,124999] distance to closest asteroids angle to closest asteroids
- Slide 165
- Asteroids Problem (step 6) Signal Strength
- Slide 166
- Asteroids Problem (step 6) signal | amplitude: 0 → [0,0.01), 1 → [0.01,0.02), …, 497 → [4.97,4.98), 498 → [4.98,4.99), 499 → [4.99,∞)
- Slide 167
- Asteroids Problem (step 6) radar desired delta heading angle S Position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope Solar wind detector radar [0,15624] [0,124] [0,499] [0,124999] distance to closest asteroids angle to closest asteroids
- Slide 168
- Asteroids Problem (step 6) angular speed signal range [0,8]. signal | angular speed (°/s): 0 → (−∞,−7], 1 → (−7,−5], 2 → (−5,−3], 3 → (−3,−1], 4 → (−1,1), 5 → [1,3), 6 → [3,5), 7 → [5,7), 8 → [7,∞)
- Slide 169
- Asteroids Problem (step 6) radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar [0,15624] [0,124] [0,124999] [0,8] [0,188]
- Slide 170
- Asteroids Problem (step 7) radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar [0,15624] [0,124] [0,124999] [0,8] [0,188]
- Slide 171
- Asteroids Problem (step 8) radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar [0,15624] [0,124] [0,124999] [0,8] [0,188]
- Slide 172
- Asteroids Problem (step 8) FIRST AGENT: [0,124] [0,499] radar signal distances to closest asteroids
- Slide 173
- Asteroids Problem (step 8) position of closest asteroids [0,124999][0,20] desired delta heading angle SECOND AGENT:
- Slide 174
- Asteroids Problem (step 8) desired delta heading angle and current angular velocity propulsion signal [0,188] [0,100] THIRD AGENT:
- Slide 175
- Asteroids Problem (step 8) FIRST AGENT: [0,124] radar signal distances to closest asteroids REWARD FUNCTION: for each distance: reward = -10 * deltaDistance, where deltaDistance = |distance - laserDistance|
- Slide 176
- Asteroids Problem (step 8) position of closest asteroids [0,124999][0,20] desired delta heading angle SECOND AGENT: REWARD FUNCTION: reward = -10000 if distance to an asteroid < 10 m; else (180 - |angleToGoal|)
- Slide 177
- Asteroids Problem (step 8) desired delta heading angle propulsion signal [0,209] [0,100] THIRD AGENT: REWARD FUNCTION: reward = -|achievedAngle - desiredAngle|
- Slide 178
- Asteroids Problem radar desired delta heading angle S position of closest asteroids angle to goal [0,7] [0,20] propulsion system propulsion signal [0,100] [0,20] S gyroscope solar wind detector radar [0,15624] [0,124] [0,124999] [0,8] [0,188]
- Slide 179