By Luigi Cardamone, Daniele Loiacono and Pier Luca Lanzi.

Learning Drivers through Imitation using Supervised Methods

The outline

Introduction
Related work
TORCS
Imitation learning: What sensors? What actions? What learning method? What data?
Experimental results
Discussion, conclusions and future work

Introduction

What is imitation learning?
▪ Supervised learning
▪ Neuroevolution

Two main methods:
▪ Direct methods
▪ Indirect methods

Introduction

Direct methods are known to perform very poorly.

Our methods produce drivers whose performance is only 15% lower than that of the best bot in TORCS.

The key is predicting “human-like” high-level actions rather than low-level effector commands.

Related work

Imitation learning in computer games:
▪ Rule-based NPC for Quake III via a two-step process
▪ Quake II NPC via reinforcement learning, fuzzy clustering and Bayesian motion modeling
▪ Neural networks with backpropagation for Legion II and Motocross The Force
▪ Drivatar training for Forza Motorsport

What input sensors?

The rangefinder sensor

The lookahead sensor
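As an illustration only, the sketch below models a rangefinder on an idealized straight track: it casts 19 rays and returns the distance to the track edge along each one. The angular layout (every 10° from −90° to +90°), the 200 m maximum range, and the straight-corridor geometry are assumptions, not details given in the slides; the lookahead sensor is not modeled here.

```python
import math


def rangefinder(lateral_offset, track_width, num_rays=19, max_range=200.0):
    """Distances (m) to the track edges along rays spread from -90 to +90 degrees.

    Hypothetical model of a straight track: the car sits `lateral_offset`
    metres from the centre line (positive = left), points along the track
    axis, and is assumed to be on the track.
    """
    half_width = track_width / 2.0
    readings = []
    for i in range(num_rays):
        # Evenly spaced angles (positive = left); 19 rays give a 10-degree step.
        angle = math.radians(-90.0 + i * 180.0 / (num_rays - 1))
        lateral = math.sin(angle)
        if lateral > 1e-9:        # ray points left: it hits the left edge
            dist = (half_width - lateral_offset) / lateral
        elif lateral < -1e-9:     # ray points right: it hits the right edge
            dist = (half_width + lateral_offset) / -lateral
        else:                     # ray points straight ahead: no edge on a straight
            dist = max_range
        readings.append(min(dist, max_range))
    return readings
```

For a car in the middle of a 10 m wide straight, `rangefinder(0.0, 10.0)` returns 5 m for the two side rays, 200 m (the cap) for the forward ray, and symmetric values in between.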

What actions?

4 low-level effectors in TORCS:
▪ Steering wheel
▪ Gas pedal
▪ Brake pedal
▪ Gear change

2 high-level actions in this work:
▪ Speed
▪ Trajectory
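The slides do not say how the predicted speed and trajectory are turned into effector commands. One plausible scheme, shown here only as a hypothetical sketch, is a pair of proportional controllers: steering follows the gap between the predicted target position and the car's current lateral position, while gas and brake follow the gap between target and current speed. The function name, gains, and the [-1, 1] position convention are all assumptions, and gear change is left out.

```python
from dataclasses import dataclass


@dataclass
class Effectors:
    steer: float  # -1 (full right) .. +1 (full left)
    gas: float    # 0 .. 1
    brake: float  # 0 .. 1


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def high_level_to_effectors(target_speed, target_pos, speed, pos, angle,
                            steer_gain=0.5, angle_gain=1.0, speed_gain=0.1):
    """Hypothetical mapping from predicted (speed, trajectory) to pedals and wheel.

    target_pos, pos: lateral position on the track, -1 = right edge, +1 = left edge.
    angle: car heading relative to the track axis in radians (positive = left).
    """
    # Steer towards the target lateral position while straightening the heading.
    steer = clamp(steer_gain * (target_pos - pos) - angle_gain * angle, -1.0, 1.0)
    # Accelerate when below the target speed, brake when above it.
    speed_error = target_speed - speed
    if speed_error >= 0:
        gas, brake = clamp(speed_gain * speed_error, 0.0, 1.0), 0.0
    else:
        gas, brake = 0.0, clamp(-speed_gain * speed_error, 0.0, 1.0)
    return Effectors(steer, gas, brake)
```

Separating prediction (what the learned model outputs) from control (how it is executed) keeps the learned part focused on the two high-level quantities.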

What learning methods?

k-nearest neighbor
Training:
▪ No training is needed
How was it applied?
▪ Directly during the TORCS race
▪ At each tick, the logged data is searched to find the k most similar instances
▪ The k most similar instances are selected and their outputs are averaged
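The k-nearest neighbor step above can be sketched as follows. This is an illustration, not the authors' code: it assumes each logged instance pairs a sensor vector with the recorded outputs (target speed and target position), uses Euclidean distance as the similarity measure, and averages the outputs of the k closest instances with equal weights. A linear scan is used for clarity; the slides do not say how the search was implemented.

```python
import heapq
import math


def knn_predict(log, query, k=20):  # k = 20 is the value used in the experiments
    """Average the outputs of the k logged instances closest to `query`.

    log: list of (sensor_vector, output_vector) pairs recorded from the target bot.
    query: the current sensor vector.
    """
    nearest = heapq.nsmallest(k, log, key=lambda inst: math.dist(inst[0], query))
    n_out = len(nearest[0][1])
    return [sum(out[j] for _, out in nearest) / len(nearest) for j in range(n_out)]
```

For example, with `log = [([0.0, 0.0], [100.0, 0.0]), ([1.0, 0.0], [80.0, 0.5]), ([10.0, 10.0], [20.0, -1.0])]`, the call `knn_predict(log, [0.2, 0.0], k=2)` averages the first two instances and returns `[90.0, 0.25]`.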

What learning methods?

Neural networks
Training:
▪ NeuroEvolution of Augmenting Topologies (NEAT) to evolve both the weights and the topology of a neural network
How was it applied?
▪ 2 networks, one for speed prediction and one for target position prediction
▪ Rangefinder networks with 19 angle inputs + 1 bias input
▪ Lookahead networks with 8 segment inputs + 1 bias input
▪ The fitness was defined as the prediction error
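Evolving topologies with NEAT does not fit in a short example, but the fitness evaluation can be sketched. In this hypothetical stand-in, a candidate network is any callable that maps an input vector (with the bias input appended) to a single prediction. The slides only say the fitness was the prediction error; the use of mean squared error, and the 1 / (1 + error) conversion that turns an error to be minimized into a fitness to be maximized, are assumptions.

```python
def with_bias(inputs, bias=1.0):
    """Append a constant bias input: 19 rangefinder readings -> 20 inputs,
    8 lookahead segments -> 9 inputs."""
    return list(inputs) + [bias]


def prediction_error(network, dataset):
    """Mean squared error of `network` over (inputs, target) pairs."""
    total = 0.0
    for inputs, target in dataset:
        prediction = network(with_bias(inputs))
        total += (prediction - target) ** 2
    return total / len(dataset)


def fitness(network, dataset):
    """Higher is better: an error of 0 gives fitness 1, large errors approach 0."""
    return 1.0 / (1.0 + prediction_error(network, dataset))
```

With one network per predicted quantity, the speed network and the target position network would each be scored on their own target column of the logged data.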

What data?

Inferno bot driven on 3 tracks for 3 laps each:
▪ A simple fast track
▪ A difficult track with many fast turns
▪ A difficult track with many slow, sharp turns

Only the data from the second lap was recorded:
▪ 3 data sets with 1982, 3899 and 3619 examples
▪ An additional all-in set combining them, with 9500 examples
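The recording protocol can be illustrated with a small hypothetical sketch: keep only the samples logged during the second of the three laps, then concatenate the three per-track sets into the all-in set. The record layout (a dict with a `lap` field, laps numbered from 1) is an assumption. The sizes on the slide are consistent with the all-in set being exactly the union of the three sets: 1982 + 3899 + 3619 = 9500.

```python
def second_lap_only(samples):
    """Keep the samples logged during lap 2 (laps assumed numbered from 1)."""
    return [s for s in samples if s["lap"] == 2]


def all_in(*datasets):
    """Concatenate the per-track data sets into one combined training set."""
    combined = []
    for data in datasets:
        combined.extend(data)
    return combined
```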

Experimental results

Overall, we obtained 16 models:
▪ 2 learning algorithms
▪ 3 + 1 datasets
▪ 2 types of sensors
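The model count is the product of the three factors: 2 × (3 + 1) × 2 = 16. The sketch below simply enumerates the configurations; the labels are illustrative, with the three per-track sets named after the track descriptions on the data slide.

```python
from itertools import product

algorithms = ["k-nearest neighbor", "NEAT"]
datasets = ["simple fast track", "many fast turns", "many slow sharp turns", "all-in"]
sensors = ["rangefinder", "lookahead"]

# One model per (algorithm, dataset, sensor) combination.
models = list(product(algorithms, datasets, sensors))
print(len(models))  # 16
```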

The k-nearest neighbor algorithm was applied with k = 20

NEAT was run with a population of 100 individuals for 100 generations

All the experiments were conducted using TORCS 1.3.1

Experimental Results - Evaluation

Each model was evaluated by using it to drive a car on each track for 10,000 game ticks.

The tracks:
▪ The 3 tracks used for training
▪ 2 unseen tracks:
  ▪ A simple fast track
  ▪ A track with many fast and difficult turns

The driver was also equipped with a standard recovery policy.

Experimental results - Results

Inferno was better than its imitations

Lookahead sensors are better than rangefinders

k-nearest neighbor is better than NEAT

One of the models performed only 15% worse than the Inferno bot.

The summary of the results

Experimental results - Execution time

Direct methods result in low computational cost

Our approach needs 30 times less CPU time to obtain reasonable results

Increasing the lookahead

How much lookahead is useful?

A second series of tests with lookahead values of 8 and 16 showed overfitting

Discussion

Good drivers
▪ Perform close to the target bot
▪ Run off the track in difficult turns, as a result of prediction errors or slow steering reactions

Bad drivers
▪ Many discontinuities in the predicted trajectories
▪ These cause the car to swing quickly from one side of the track to the other

Perceptual aliasing

Two different places can be perceived as the same

Usually happens on long straights

Can be solved via special treatment of straights, full throttle, or a bigger lookahead
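Perceptual aliasing is easy to reproduce with a toy lookahead sensor. In this hypothetical sketch the track is a list of segment curvatures (0 on straights) and the sensor returns the curvatures of the next n segments; this encoding is an assumption, not the paper's definition of the lookahead sensor. Two positions on a long straight produce identical readings, so any model that sees only these readings must predict the same speed and trajectory for both. A longer lookahead can tell them apart once one of the positions comes within range of the next turn.

```python
def lookahead(track, position, n):
    """Curvatures of the n segments ahead of `position` on a closed track."""
    return [track[(position + i) % len(track)] for i in range(1, n + 1)]


# Hypothetical track: a 20-segment straight followed by a 4-segment turn.
track = [0.0] * 20 + [0.3] * 4

# Segments 2 and 8 both lie on the straight.
print(lookahead(track, 2, 8) == lookahead(track, 8, 8))    # True: aliased
print(lookahead(track, 2, 16) == lookahead(track, 8, 16))  # False: turn visible from 8
```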

Summary

Supervised learning to imitate a driver

High-level aspects of driving (speed and trajectory) rather than low-level effectors

Novel lookahead sensor

Good results with k-nearest neighbor

The Inferno bot is still better, because the learned drivers suffer from perceptual aliasing and slow steering during abrupt turns

Future work

Exploit structural symmetry on the track

Increase the robustness to noise

Reduce computational cost

Improve steering reaction to abrupt turns