Reinforcement Learning in Simulated Soccer with Kohonen Networks
Chris White and David Brogan
University of Virginia
Department of Computer Science

Simulated Soccer
- How does an agent decide what to do with the ball?
- Complexities
  - Continuous inputs
  - High dimensionality

Reinforcement Learning (RL)
- Learning to associate utility values with state-action pairs
- Agent incrementally updates the value associated with each state-action pair based on interaction with the environment (Russell & Norvig)

Problems
- State space explodes exponentially in terms of dimensionality
- Current methods of managing state space explosion lack automation
- RL does not scale well to problems with the complexities of simulated soccer…

Quantization
- Divide state space into regions of interest
  - Tile Coding (Sutton & Barto, 1998)
- No automated method for regions:
  - Granularity
  - Heterogeneity
  - Location
- Prefer a learned abstraction of state space
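
For reference, here is a minimal sketch of tile coding on a 2-D input. The function name `tile_indices`, the number of tilings, the tile counts, and the offsets are all illustrative assumptions, not details from the slides. Every one of those parameters is hand-picked, which is the lack of automation noted above.

```python
import math

def tile_indices(x, y, n_tilings=4, tiles_per_dim=8, lo=0.0, hi=1.0):
    """Return one active tile index per tiling for a point in [lo, hi]^2."""
    width = (hi - lo) / tiles_per_dim
    side = tiles_per_dim + 1  # shifted grids need one extra row and column
    active = []
    for t in range(n_tilings):
        # Each tiling is the same grid, shifted by a fraction of a tile.
        offset = t * width / n_tilings
        col = min(int(math.floor((x - lo + offset) / width)), side - 1)
        row = min(int(math.floor((y - lo + offset) / width)), side - 1)
        # Give every tiling its own block of indices.
        active.append(t * side * side + row * side + col)
    return active

print(tile_indices(0.0, 0.0))  # [0, 81, 162, 243]
```

Each input activates exactly one tile per tiling, so nearby inputs share most of their active tiles and therefore most of their learned value.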

Kohonen Networks
- Clustering algorithm
- Data driven
[Figure: Voronoi diagram with regions labeled "Agent near opponent goal", "Teammate near opponent goal", and "No nearby opponents"]
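
A minimal sketch of a one-dimensional Kohonen map in plain Python may help make the clustering step concrete. The names `train_som` and `nearest_unit`, the learning-rate and radius schedules, and the sample-based initialization are assumptions for illustration; the slides do not give the network's configuration.

```python
import math
import random

def nearest_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))

def train_som(samples, n_units=4, epochs=30, lr0=0.5, radius0=1.5, seed=0):
    """Train a 1-D Kohonen map and return its weight vectors (cluster centers)."""
    rng = random.Random(seed)
    samples = list(samples)
    # Assumption: start each unit at a randomly chosen training sample.
    weights = [list(rng.choice(samples)) for _ in range(n_units)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                      # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 1e-3)   # shrinking neighborhood
            bmu = nearest_unit(weights, x)               # best-matching unit
            for j, w in enumerate(weights):
                # Units near the winner on the map move toward the sample too.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights
```

After training, `nearest_unit(weights, x)` maps any continuous input to a cluster index; the set of inputs mapped to one unit is that unit's Voronoi cell.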

State Space Reduction
- 90 continuous-valued inputs describe the state of a soccer game
  - Naïve discretization: 2^90 states
  - Filter out unnecessary inputs: still 2^18 states
  - Clustering algorithm: only 5000 states
- Big win!!!
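
The scale of the reduction is easy to check, assuming the state counts on this slide are powers of two (2^90 and 2^18, i.e. two bins per input). The two-bin reading is an assumption; the slides do not state the bin count.

```python
naive_states = 2 ** 90      # assumed: two bins for each of the 90 inputs
filtered_states = 2 ** 18   # assumed: two bins for each of 18 retained inputs
clustered_states = 5000     # cluster count given on the slide

print(f"{naive_states:.3e}")               # 1.238e+27
print(filtered_states)                     # 262144
print(filtered_states / clustered_states)  # 52.4288
```

Under that reading, clustering cuts the filtered state count by a further factor of about 52.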

Two-Pass Algorithm
- Pass 1: Use a Kohonen network and a large training set to learn the state space
- Pass 2: Use reinforcement learning (SARSA) to learn utilities for states
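
Pass 2 can be sketched with the standard tabular SARSA update, treating the cluster index from Pass 1 as the state. The action names, `alpha`, `gamma`, and `epsilon` below are placeholder assumptions, not values from the slides.

```python
import random
from collections import defaultdict

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)); return the new value."""
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]

def epsilon_greedy(q, s, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(s, a)])

# States are cluster indices; the action names are hypothetical.
q = defaultdict(float)
print(sarsa_update(q, s=3, a="pass", r=1.0, s_next=7, a_next="shoot"))  # 0.1
```

Because the table is keyed by cluster index, it holds at most one entry per cluster-action pair, regardless of how many continuous inputs describe the game.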

Fragility of Learned Actions
- What happens to the attacker's utility if the goalie crosses the dotted line?

Unresolved Issues
- Increased generalization leads to frequency aliasing…
- This becomes a sampling problem…
- Example: Riemann sum, few samples vs. many samples
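
The Riemann sum comparison can be reproduced in a few lines. The integrand (sin on [0, π], exact integral 2) and the sample counts are chosen only for illustration.

```python
import math

def midpoint_riemann(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

exact = 2.0  # integral of sin(x) from 0 to pi
few = midpoint_riemann(math.sin, 0.0, math.pi, 2)
many = midpoint_riemann(math.sin, 0.0, math.pi, 100)

print(abs(few - exact))   # coarse sampling: error of roughly 0.22
print(abs(many - exact))  # fine sampling: error below 1e-3
```

A coarse partition of the state space plays the role of the two-rectangle sum: each region reports a single value for a function that varies inside it.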

Aliasing & Sampling
- Utility function is not band-limited
- How can we sample to reduce error?
  - Uniformly increase sampling rate? (not the best idea)
  - Adaptively supersample?
  - Choose sample points based on special criteria?

Forcing Functions
- Use a forcing function to sample an action in a state only when it is likely to be effective (valleys are ignored)
- Reduces variance in experienced reward for a state-action pair
- How do we create such a forcing function?
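
One way to picture a forcing function is as a filter applied before action selection. The sketch below is purely hypothetical: the slides leave the construction of the forcing function open, and the rule `shoot_only_when_close`, the `distance_to_goal` field, and the threshold are invented for illustration.

```python
def allowed_actions(state, actions, forcing):
    """Keep only the actions the forcing function marks as likely to be effective."""
    eligible = [a for a in actions if forcing(state, a)]
    # Fall back to the full action set if the filter rejects everything.
    return eligible or list(actions)

def shoot_only_when_close(state, action, max_shot_distance=20.0):
    """Hypothetical hand-written rule: only sample 'shoot' near the goal."""
    if action == "shoot":
        return state["distance_to_goal"] <= max_shot_distance
    return True

actions = ["pass", "shoot", "dribble"]
print(allowed_actions({"distance_to_goal": 35.0}, actions, shoot_only_when_close))
# ['pass', 'dribble']
```

With a filter like this, "shoot" is only ever sampled in the part of a cluster where it can plausibly succeed, so the rewards recorded for that state-action pair vary less.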

Results
- Evaluate three systems
  - Control: random action selection
  - SARSA
  - Forcing function
- Evaluation criteria
  - Goals scored
  - Time of possession

Cumulative Score: SARSA vs. Random Policy
[Chart: cumulative goals scored (y-axis, 0 to 900) vs. games played (x-axis, 1 to 919); series: Learning Team, Random Team]

Time of Possession
[Chart: time of possession (y-axis, 0 to 6000) vs. games played (x-axis, 1 to 945); series: Time of Possession]

Team with Forcing Functions
[Chart "SARSA with Forcing Function vs. Random Policy": cumulative score (y-axis, 0 to 1200) vs. games played (x-axis, 1 to 897); series: Learning Team with Forcing Functions, Random Team Against Team with Forcing Functions]

With Forcing vs. Without
[Chart "Performance With Forcing Functions vs. Performance Without Forcing Functions": cumulative score (y-axis, 0 to 1200) vs. games played (x-axis, 1 to 937); series: Learning Team Without Forcing Functions, Random Team Against Team Without Forcing Functions, Learning Team with Forcing Functions, Random Team Against Team with Forcing Functions]

Summary
- Two-pass learning algorithm for simulated soccer
- State space abstraction is automated
- Data-driven technique
- Improved state of the art for simulated soccer

Future Work
- Learned distance metric
- Additional automation in process
- Better generalization