
2020-01-0737 Published 14 Apr 2020

Using Reinforcement Learning and Simulation to Develop Autonomous Vehicle Control Strategies

Anthony Navarro PolySync Technologies

Sahika Genc and Premkumar Rangarajan Amazon Web Services

Rana Khalil Voyage Auto Inc.

Nick Goberville, Johan Fanas Rojas, and Zachary Asher Western Michigan University

Citation: Navarro, A., Genc, S., Rangarajan, P., Khalil, R. et al., “Using Reinforcement Learning and Simulation to Develop Autonomous Vehicle Control Strategies,” SAE Technical Paper 2020-01-0737, 2020, doi:10.4271/2020-01-0737.

Abstract

While the use of machine learning in autonomous vehicle development has increased significantly in the past few years, reinforcement learning (RL) methods have only recently been applied. Convolutional Neural Networks (CNNs) became common for their powerful object detection and identification and have even provided end-to-end control of an autonomous vehicle. However, a CNN requires a large amount of labeled data to inform and train the neural network. While data is becoming more accessible, these networks remain sensitive to the format and collection environment, which makes reusing others' data difficult. In contrast, RL develops solutions in a simulation environment through trial and error without labeled data. Our research expands upon previous work on RL and Proximal Policy Optimization (PPO) and the application of these algorithms to 1/18th-scale cars by extending this control strategy to a full-sized passenger vehicle. By using this method of unsupervised learning, our research demonstrates the ability to learn new control strategies in a simulated environment without the need for large amounts of real-world data. The use of simulation environments for RL is important because the methodology requires many trials to learn the desired behavior. Running these trials in the real world would be expensive and impractical; the simulation, however, enables solutions to be developed at low cost and in little time because the process can be accelerated beyond real time. The simulation environment provides high-fidelity vehicle dynamics as well as rendering capability for domain adaptation, enabling successful simulation-to-real-world transfer. Traditional control algorithms are applied to the learned strategies to ensure a proper mapping to the physical vehicle. This approach results in a low-cost, low-data solution that enables control of a full-sized, self-driving passenger vehicle.

Introduction

The goal of this experiment was to demonstrate the application of reinforcement learning algorithms trained in a virtual environment and deployed in the real world in a closed-course setting. Autonomous vehicle racing has become increasingly popular over the past few years. From local events for RC-sized cars hosted regularly in Oakland, CA, to sporadic events all over the world, these races have allowed interested developers to send a smaller-scale car around a track using various machine learning techniques. Full-sized vehicles have seen their own races as well, with the Self Racing Cars event hosted annually in California. There are even commercial companies, such as RoboRace, that have created a sport out of autonomous vehicle racing. These are all exciting applications of machine learning in closed-course racing that have become possible in recent years.

While all of these applications rely on Convolutional Neural Networks (CNNs) as their machine learning approach, CNNs require a large amount of data for the system to learn from. This data has to be gathered from similar setups and can be sensitive to many factors such as lighting, sensor placement, and calibration. Here we present reinforcement learning as an alternative that allows the development of new control models without the need for extensive data gathering. Reinforcement learning is an approach in machine learning and artificial intelligence in which a system is allowed to explore a solution space and systematically find solutions that work. The training of reinforcement learning algorithms benefits greatly from a simulation environment, since agents are able to explore their environment and solution space with little or no supervision and without consequence. Amazon Web Services (AWS) has developed a platform for developers to start using reinforcement learning in automotive applications. New technologies such as the PolySync DriveKit and the AWS DeepRacer project demonstrate the ability to lower the barrier of entry to both autonomous vehicle and reinforcement learning algorithm development.

A few different challenges had to be overcome to make this experiment a success. One challenge in transitioning from a simulated environment to a real-world environment is recreating conditions such as system-level latency, environmental changes, and other unpredictable behaviors or scenarios that a vehicle will experience in the field. Simulations have grown more sophisticated, using techniques such as procedurally generated scenarios with tunable permutations and actors, as well as enabling the ingestion of real-world data and scenarios to learn from. Accurately modeling sensor calibration, faults, and performance is also a major challenge to recreate or account for in simulations, as hardware has grown increasingly complex and rich in features that aid an autonomous system with more information and redundancy. Simulations, however, are a valuable tool to test all components of the system in isolation in both good and bad conditions, as they offer an opportunity to introduce fault injection and specific failures that could happen to a functioning autonomous system.

The technology both systems use and how they are integrated is discussed before demonstrating the deployment of a reinforcement learning model on a full-sized passenger vehicle on an actual race track. We also analyze the results of this experiment and compare them to a previously published SAE paper describing an experiment that used a single camera and a deep neural network solution [1].

Production Vehicle Control Using PolySync's DriveKit

One of the challenges in autonomous vehicle development is seamless command and control of the vehicle, as a driver would expect, through Drive-By-Wire (DBW). For most of automotive history, vehicles have been controlled with mechanical and electrical controls directly linked to their respective buses and systems, requiring manual input from the driver to command and control the vehicle. In recent years, the automotive industry has begun adopting fly-by-wire technology [2] from the aviation industry. These systems use indirect electronic control of mechanical components. This key change allows autonomous vehicles to be controlled electronically without the need for a specialized mechanical robotic system to physically move the vehicle's controls.

PolySync's DriveKit is designed to enable vehicle control and abstract the vehicle as a connected device for developers. Most vehicles designed today use a Controller Area Network (CAN) to facilitate communication across the multiple systems in the vehicle. The CAN also provides status and control information about the vehicle. Using a proprietary method of reverse engineering, PolySync utilizes the vehicle's OBD-II port, CAN, and a custom-designed wiring harness to read vehicle status and write control commands to the vehicle.
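As a rough illustration of the read/write pattern on a CAN bus (and only that: the arbitration IDs, signal encoding, and interface name below are placeholders, not DriveKit's proprietary protocol), a python-can sketch might look like:

```python
# Illustrative only: arbitration IDs, scaling, and channel are placeholders,
# not the DriveKit or OSCC protocol.
import can

STEERING_STATUS_ID = 0x2B0   # hypothetical ID for a steering-angle report
STEERING_CMD_ID = 0x2B4      # hypothetical ID for a steering command

def read_steering_angle(bus):
    """Block until a steering status frame arrives and decode it."""
    while True:
        msg = bus.recv()
        if msg is not None and msg.arbitration_id == STEERING_STATUS_ID:
            # Hypothetical encoding: signed 16-bit value in 0.1-degree units.
            raw = int.from_bytes(msg.data[0:2], "little", signed=True)
            return raw * 0.1

def send_steering_command(bus, angle_deg):
    """Encode and transmit a steering command frame."""
    raw = int(angle_deg * 10)
    frame = can.Message(
        arbitration_id=STEERING_CMD_ID,
        data=raw.to_bytes(2, "little", signed=True),
        is_extended_id=False,
    )
    bus.send(frame)

if __name__ == "__main__":
    bus = can.interface.Bus(channel="can0", bustype="socketcan")
    try:
        print("steering angle:", read_steering_angle(bus))
        send_steering_command(bus, 5.0)
    finally:
        bus.shutdown()
```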

In order to maintain the manufacturer's safety standards, sensor imposition is used to enable control while allowing all driver commands to be passed through to the vehicle for normal operation and safety monitoring. DriveKit utilizes multiple microcontrollers (MCUs) to enable seamless, reliable communication with any computer by accepting position commands for acceleration and braking and torque or angle commands for steering. Examples of how to communicate with DriveKit and its communication procedures can be found on the OSCC GitHub [3].

Enabling Autonomous Vehicle Algorithms Using DeepRacer and AWS Services

The AWS DeepRacer robocar is an educational platform for learning and developing deep reinforcement learning algorithms for autonomous driving using cloud services. Users configure the robocar before deploying it into a simulation environment, where it learns how to drive by trial and error while exploring a discrete action space of finite actions. The range of the action space is defined by the maximum speed and the absolute value of the maximum steering angle; the granularity defines the number of speeds and steering angles available to the agent. The platform utilizes a high-fidelity, physics-engine-based simulation environment driven by visual observations to optimize a long-term objective, i.e., completing a lap as fast as possible (see Figure 2). For autonomous driving, the AWS DeepRacer vehicle receives input images streamed at 15 frames per second from the front camera. The raw input is downsized to 160x120 pixels and converted to grayscale. Responding to an input observation, the vehicle reacts with a well-defined action of specific speed and steering angle from the action space. The actions are converted to low-level motor controls to enable movement in the physical AWS DeepRacer robocar. Reinforcement Learning (RL) has been used to accomplish diverse robotic tasks: manipulation [4, 5, 6, 7], locomotion [8, 9], navigation [10, 11, 12, 13], flight [14, 15], interaction [16, 17], motion planning [18, 19], and more.
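For illustration, a hypothetical helper that enumerates such a discrete action space from a maximum speed, a maximum steering angle, and granularity settings might look like the following; the parameter names and units are assumptions, not the DeepRacer console's exact schema.

```python
# Hypothetical sketch of a DeepRacer-style discrete action space.
# Parameter names and units are illustrative, not the console's exact schema.
import numpy as np

def build_action_space(max_speed_mps, max_steering_deg,
                       speed_granularity, steering_granularity):
    """Enumerate (steering_angle, speed) pairs the agent may choose from."""
    speeds = np.linspace(max_speed_mps / speed_granularity,
                         max_speed_mps, speed_granularity)
    steering = np.linspace(-max_steering_deg, max_steering_deg,
                           steering_granularity)
    return [(float(s), float(v)) for s in steering for v in speeds]

# Example: 3 speeds up to 4 m/s and 5 steering angles within +/-30 degrees
# give a 15-action space; the policy's output is a distribution over these.
actions = build_action_space(4.0, 30.0, 3, 5)
print(len(actions), actions[0])
```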

 FIGURE 1  PolySync DriveKit architecture.



There are multiple cloud services used to train an RL agent for AWS DeepRacer. The main services that enable training and simulation are Amazon SageMaker and AWS RoboMaker, respectively. The block diagram in Figure 3 illustrates how Amazon SageMaker and AWS RoboMaker interact to create a distributed deep reinforcement learning environment that trains and tests autonomous agents at scale. AWS RoboMaker is a cloud service to develop, test, and deploy robot software. We initialize our agent, i.e., our neural network model, in Amazon SageMaker and store a persistent copy in Amazon S3, an object storage service. We initialize our environment in AWS RoboMaker and copy the agent model from S3. The agent interacts with the environment, and we store the collected experience data in Redis, an in-memory database. Amazon SageMaker trains the neural network with the data collected in Redis and updates the neural network models in S3. AWS RoboMaker copies the updated model from S3 and creates more experience data. The cycle continues until training stops based on a convergence criterion. The same setup can launch multiple simulations to train faster or to train the agent to traverse multiple tracks simultaneously.
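Conceptually, the cycle can be summarized with the schematic sketch below, in which a dictionary stands in for Amazon S3, a list stands in for Redis, and stub functions stand in for the simulator and the trainer; it illustrates the data flow only, not the actual service integration.

```python
# Schematic sketch of the training cycle described above. Illustrative only.
import random

model_store = {}        # stands in for Amazon S3
experience_buffer = []  # stands in for Redis

def collect_experience(weights, n_episodes=5):
    """RoboMaker side: roll out the current policy in simulation (stubbed)."""
    return [{"reward": random.random(), "weights_used": weights}
            for _ in range(n_episodes)]

def update_policy(weights, batch):
    """SageMaker side: one training update from collected experience (stubbed)."""
    mean_reward = sum(e["reward"] for e in batch) / len(batch)
    return weights + 0.1 * mean_reward, mean_reward > 0.95

model_store["latest"] = 0.0  # toy stand-in for network weights
for iteration in range(100):
    experience_buffer.extend(collect_experience(model_store["latest"]))
    new_weights, converged = update_policy(model_store["latest"], experience_buffer)
    experience_buffer.clear()
    model_store["latest"] = new_weights
    if converged:  # convergence criterion reached
        break
```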

Amazon SageMaker is a platform to train and deploy machine learning models at scale using Jupyter Notebook as an interface. It integrates with popular libraries such as scikit-learn [20] and frameworks such as TensorFlow [21], and it eases development with tools for hyper-parameter optimization and web-service-based hosting of trained models. AWS DeepRacer utilizes Amazon SageMaker RL libraries for clipped Proximal Policy Optimization (PPO), such as Intel RL Coach [22] and Ray RLlib [23], which build on top of existing deep learning frameworks. The algorithms integrate with environments using the OpenAI Gym [24] interface, which specifies a standard way to define the state, action, reward, and episode so that an agent can interact with the environment. A number of simulators, such as PyBullet for robotics, EnergyPlus [25] for buildings, and Matlab for industrial control, are supported by Amazon SageMaker. For AWS DeepRacer, we use the RL Coach library, which implements the PPO algorithm. Policy optimization methods with parametrized policies provide the best results in simulation environments. These methods do not require modeling the dynamics of the system to determine the optimal policy. On the other hand, when the simulation fidelity is low and contains unmodeled dynamics, a policy trained in simulation performs poorly in the real world.
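For reference, the clipped surrogate objective at the heart of PPO can be written as a small NumPy function; this is a hedged illustration of the formula, not the RL Coach implementation.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Clipped PPO surrogate, to be maximized (negate for a loss):
    mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- pi_new(a|s) / pi_old(a|s) per sample
    advantage -- estimated advantage per sample
    epsilon   -- clipping range (0.2 is a common default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return np.mean(np.minimum(unclipped, clipped))

# Example: a sample whose ratio drifts far from 1 has its contribution clipped.
print(ppo_clipped_objective(np.array([0.9, 1.5]), np.array([1.0, 1.0])))
```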

The simulation environment is developed with AWS RoboMaker and uses OpenAI Gym to interface with the RL Coach library. We built the robocar assets, including the camera, actuators, and Ackermann steering model, in Gazebo [26]. Gazebo consists of several components that work together to create a high-fidelity robotics simulation. A robot model describes each component of the DeepRacer car - the chassis, wheels, and camera - their dimensions, how they link with each other, and their properties such as mass and the field of view of the camera. We specify the model using the Unified Robot Description Format (URDF), and our model is inspired by the MIT Racecar [27]. We create our tracks and background environment using 3D modeling software and import these assets into Gazebo. A physics engine simulates the laws of physics with the robot model, taking into account factors such as collision, friction, and acceleration of the car. We use the ODE physics engine. A rendering engine, OGRE, visualizes the graphics of the simulation, i.e., the car moving along the track. In addition, Gazebo integrates with various plugins that allow us to use the camera as a sensor and to simulate multiple light sources.

Optimization of Reinforcement Learning for DeepRacer

Due to high sample complexity and safety requirements, it is common to train the RL agent in simulation [4, 8, 28]. To reduce training time and encourage exploration, the agent is usually trained with distributed rollouts [29, 30, 31, 32]. For a successful transfer to the real world, researchers use calibration [5, 33], domain randomization [34, 35, 36, 15], fine-tuning with real-world data [12], and learning features from a combination of simulation and real data [37, 38].

 FIGURE 2  AWS DeepRacer robocar: an educational platform to learn and develop deep reinforcement learning policies trained in simulation for autonomous 1/18th-scale robocar racing in the real world.


 FIGURE 3  The block diagram of how several cloud services are utilized to train a reinforcement learning agent that learns to drive in a simulation environment using deep neural networks.



Transformation of DeepRacer Inference and PID Optimization for Full-Size Autonomous Vehicle Control

It was important to understand how the AWS DeepRacer inference process works. First, the state of the environment (a raw image) is processed and sent to the inference engine. The inference engine is a ROS node that has access to the trained RL model and the action space settings for that model. The inference result is a vector of class labels and a probability for each class. The class labels are mapped to the action space settings. The class label with the highest probability is chosen and sent to a ROS node for navigation. The navigation ROS node translates the selected action space setting (the class with the highest probability chosen during inference) into steering angle and throttle position values, which are published to topics that enable autonomous vehicle control in the DeepRacer vehicle.
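A sketch of that mapping step is shown below; the action table and its values are placeholders standing in for the action space settings stored with the trained model.

```python
# Sketch of mapping an inference output vector to a control action.
# The action table below is a placeholder for the model's action-space settings.
import numpy as np

# Each class label indexes a (steering_angle_deg, throttle) pair.
ACTION_SPACE = [
    (-30.0, 0.5), (-15.0, 0.7), (0.0, 1.0), (15.0, 0.7), (30.0, 0.5),
]

def select_action(class_probabilities):
    """Pick the action whose class label has the highest probability."""
    best_class = int(np.argmax(class_probabilities))
    steering_deg, throttle = ACTION_SPACE[best_class]
    return steering_deg, throttle

# Example inference result: class 2 (straight, full throttle) wins.
print(select_action([0.05, 0.10, 0.60, 0.20, 0.05]))
```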

To use the DeepRacer RL models on our production Kia Niro, we began with an analysis of the Robot Operating System (ROS) libraries created by PolySync [39] for autonomous vehicle control. ROS was selected as the preferred middleware since DeepRacer was developed with it and DriveKit supports it. The PolySync ROSCCO node provides a list of ROS topics that enable control of a full-size car: /SteeringCommand, /ThrottleCommand, and /BrakeCommand. Using these topics, we wrote an integration and transformation node to send drive predictions from the navigation node of the DeepRacer to the ROSCCO topics.

Approach for Interpreting DeepRacer Inference Predictions

An RL model was trained using the AWS DeepRacer console with the re:Invent 2018 track as the agent's interaction world. We subsequently loaded this model onto a physical 1/18th-scale robocar while disabling the robocar's servo controls. We wrote a simple Python script to subscribe to DeepRacer's "Auto Drive" topic and send the output to a log. Once we started the DeepRacer robocar, "Steering Angle" and "Throttle" Float64 messages could be seen in the logs. "Steering Angle" is a real number between -1 and +1, and "Throttle" is a value between 0 and 1.
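A minimal version of such a logging script might look like the following; the topic names and the use of std_msgs/Float64 are assumptions for illustration rather than the exact DeepRacer message definitions.

```python
#!/usr/bin/env python
# Minimal ROS logging node; topic names and message types are assumptions.
import rospy
from std_msgs.msg import Float64

def on_steering(msg):
    rospy.loginfo("Steering Angle: %.3f", msg.data)  # expected range [-1, 1]

def on_throttle(msg):
    rospy.loginfo("Throttle: %.3f", msg.data)        # expected range [0, 1]

if __name__ == "__main__":
    rospy.init_node("auto_drive_logger")
    rospy.Subscriber("/auto_drive/steering", Float64, on_steering)
    rospy.Subscriber("/auto_drive/throttle", Float64, on_throttle)
    rospy.spin()
```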

Control Design for Autonomous Operation of the Full-Size Car

For steering, we noticed that when values were sent as-is from DeepRacer to ROSCCO, the result was erratic, wild steering behavior in the full-size car. Because the DeepRacer vehicle is 1/18th scale, abrupt steering changes are acceptable as it autonomously drives around the track with the goal of keeping all wheels on the track. For the full-size car, however, this erratic movement was not acceptable driving behavior. After extensive testing and experimenting with various configurations, we set up a proportional-integral-derivative (PID) controller for steering (a sketch of such a controller follows these settings) with the following values, which corrected the steering behavior to acceptable limits:

Kp = 0.20

Ki = 0.0025

Kd = 0.0025

Upper Limit = 0.20

Lower Limit = −0.20
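A generic discrete-time PID with output clamping, shown with these gains, is sketched below; it illustrates the idea rather than reproducing the exact ROS node and error-signal wiring used on the vehicle.

```python
# Minimal PID with output clamping; a sketch of the steering smoother, not the
# exact ROS node used on the vehicle.
class PID:
    def __init__(self, kp, ki, kd, lower, upper):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.lower, self.upper = lower, upper
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(self.lower, min(self.upper, output))

# Steering PID with the gains above, stepped at the 25 Hz loop rate.
steering_pid = PID(kp=0.20, ki=0.0025, kd=0.0025, lower=-0.20, upper=0.20)
command = steering_pid.step(setpoint=0.35, measurement=0.10, dt=1.0 / 25.0)
print(command)
```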

 FIGURE 4  RL and Command and Control chart.


 FIGURE 5  Polysync and DeepRacer Software Stacks Integrated on the Vehicle.



Max Loop Frequency = 25

AWS DeepRacer predictions provide control values for steering and throttle only, as the car used is a 1/18th-scale electric car with a servo motor. This works well at that scale because the momentum of the robocar does not require an active braking system. With a full-size car, however, properly controlling both throttle and braking is critical. To use the DeepRacer predictions, braking values had to be derived from test data. The solution consisted of setting up two additional PID controllers, one each for throttle and brake. The inputs to these PID controllers were derived from the differential between two consecutive throttle predictions from the DeepRacer node. If the differential (current DeepRacer throttle minus last DeepRacer throttle) was positive, the new throttle position was sent to the input topic of the throttle PID controller. If the differential was negative, the new throttle position was sent to the input of the brake PID controller (a sketch of this routing follows the values below). After experimentation with several configurations and tests, the following values were chosen for the PID setup:

Kp = 0.20

Ki = 0.0020

Kd = 0.0015

Upper Limit = 0.2

Lower Limit = −0.5

Max Loop Frequency = 25
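The routing logic can be sketched as follows, reusing the PID class from the steering sketch above; the feedback wiring and ROS publishing are simplified, and applying the same gains to both controllers is an assumption for illustration.

```python
# Sketch of routing DeepRacer throttle predictions to the throttle or brake
# PID, reusing the PID class sketched earlier. Feedback wiring is simplified:
# 0.0 stands in for a measured pedal position.
throttle_pid = PID(kp=0.20, ki=0.0020, kd=0.0015, lower=-0.5, upper=0.2)
brake_pid = PID(kp=0.20, ki=0.0020, kd=0.0015, lower=-0.5, upper=0.2)

last_prediction = 0.0
DT = 1.0 / 25.0  # 25 Hz loop

def route_throttle_prediction(prediction):
    """Rising predicted throttle feeds the throttle PID, falling feeds the brake PID."""
    global last_prediction
    differential = prediction - last_prediction
    last_prediction = prediction
    if differential >= 0:
        return "throttle", throttle_pid.step(prediction, 0.0, DT)
    return "brake", brake_pid.step(prediction, 0.0, DT)

print(route_throttle_prediction(0.4))  # rising throttle -> throttle command
print(route_throttle_prediction(0.2))  # falling throttle -> brake command
```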

An additional ROS node was written to subscribe to the PID control outputs for steering, throttle, and brake and to publish them to the respective ROSCCO topics, enabling autonomous vehicle control.

Other Considerations

We conducted tests to extract the steering and throttle position feedback from the full-size car and use it as a parameter to compute the differential for the PID control inputs; however, the tests remained inconclusive. We will continue to run these tests to determine the optimal configuration settings for our experimentation.

Results

A two-day test event was conducted at the MAC Track, a technical kart track in Oregon. It was well suited to our application: it offered a technical layout, and we limited the vehicle to 20 MPH for safety and controllability. The AWS DeepRacer RL model controlled the steering, throttle, and braking during the test, while a safety driver ensured safe operation and a computer operator set up and ran the system.

Additional tests were conducted on the images collected during the track test to quantify the success of the simulation-to-real-world transfer. We trained 17 RL models on various tracks in the simulation environment without augmenting the training data with real-world images or a digital version of the real-world track. A detailed discussion of the training and evaluation of these models in the simulation environment is given in [40]. We ran each RL model in inference mode with real-world images as inputs, taking as output the discrete action with the highest probability of achieving a reward.

The number of images for a single loop around the track was around 1700. We annotated the images with respect to manual driving to obtain the ground truth. To evaluate the performance of the simulation-to-real-world transfer of the models, we calculated the ratio of images for which the output of the RL model matched the ground truth. The percentage of matching discrete actions for each model is shown in Figure 6. The best model matches the discrete actions to the ground truth annotations with 92% probability without having trained on real-world images from the track. All of the models match the discrete actions to the ground truth annotations at a minimum of 60%.
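The transfer metric itself is straightforward; a minimal sketch, assuming two equal-length lists of discrete action labels, is:

```python
# Sketch of the simulation-to-real transfer metric: the fraction of real-world
# frames where the model's discrete action matches the annotated ground truth.
def action_match_percentage(predicted_actions, ground_truth_actions):
    assert len(predicted_actions) == len(ground_truth_actions)
    matches = sum(p == g for p, g in zip(predicted_actions, ground_truth_actions))
    return 100.0 * matches / len(ground_truth_actions)

# Toy example over 5 of the ~1700 frames in one loop of the track.
print(action_match_percentage([2, 2, 3, 1, 2], [2, 2, 3, 2, 2]))  # 80.0
```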

The best model was trained on two simulated racing tracks, namely the Tokyo and London loops available in the AWS DeepRacer console, and shadows were added with 20% probability during training in simulation to increase the robustness of the model. This indicates that certain types of noise are more effective than others; for example, salt-and-pepper noise or sharpening did not result in better performance. More interestingly, the next four models, at roughly 80% match probability, did not include any noise augmentation. Another interesting result is that the deeper neural network performs worse than the shallow neural networks with noise or regularization.
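As an illustration of this kind of augmentation, a random shadow can be applied to a frame with 20% probability as in the sketch below; the actual training renders shadows inside the simulator rather than compositing them onto captured images.

```python
# Illustrative shadow augmentation applied with 20% probability.
import numpy as np

_rng = np.random.default_rng()

def maybe_add_shadow(image, probability=0.2, darkness=0.5):
    """Darken a random vertical band of a grayscale frame with the given probability."""
    if _rng.random() >= probability:
        return image
    h, w = image.shape[:2]
    left = int(_rng.integers(0, w // 2))
    right = int(_rng.integers(w // 2, w))
    shadowed = image.astype(np.float32)
    shadowed[:, left:right] *= darkness
    return shadowed.astype(image.dtype)

frame = np.full((120, 160), 200, dtype=np.uint8)  # 160x120 grayscale input
augmented = maybe_add_shadow(frame)
```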

 FIGURE 6  The percentage of matching discrete actions for each RL model for a single loop around the real world track



To better understand the differences between the models and determine the best model for future testing, we use Gradient-weighted Class Activation Mapping (Grad-CAM) [41] visualization. Grad-CAM applies to CNNs used in reinforcement learning without any architectural changes or re-training. Grad-CAM uses the gradients of any target concept flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept. We generated the localization maps for the best model with the shallow network as well as for the deeper neural network model, as shown in Figure 7. The warmer the colors at a particular location, the more impact (higher gradients) that location has on the concept. According to the localization maps in Figure 7, the deeper model (maps in the right column) is frequently distracted by objects outside the track, whereas the shallow network is better at tuning these out. Therefore, a trade-off between shallow and deeper networks exists with respect to the dependence of the concept on the granularity of the task.
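A hedged TensorFlow/Keras sketch of the Grad-CAM computation is shown below; the model, layer name, and usage line are placeholders rather than the DeepRacer network definition or the exact tooling used here.

```python
# Hedged Grad-CAM sketch with TensorFlow/Keras; model and layer names are
# placeholders, not the DeepRacer network definition.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index):
    """Return a coarse localization map for `class_index` over `image`."""
    grad_model = tf.keras.Model(
        inputs=model.inputs,
        outputs=[model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_output, predictions = grad_model(image[np.newaxis, ...])
        class_score = predictions[:, class_index]
    grads = tape.gradient(class_score, conv_output)    # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))       # global-average-pool the gradients
    cam = tf.reduce_sum(weights[:, tf.newaxis, tf.newaxis, :] * conv_output, axis=-1)
    cam = tf.nn.relu(cam)[0]                           # keep positive influence only
    cam = cam / (tf.reduce_max(cam) + 1e-8)            # normalize to [0, 1]
    return cam.numpy()

# Hypothetical usage: heatmap = grad_cam(policy_cnn, frame, "conv2d_2", predicted_class)
```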

Conclusion

Machine learning has an exciting and relevant application in autonomous vehicles. Reinforcement learning specifically has many key advantages over previously attempted deep neural network approaches. The deep neural network approach required thousands of real images and many hours of training to obtain results that worked but were less than ideal. Using reinforcement learning, we expect to accomplish better results without the need to collect real imagery. Collecting that imagery alone can be a large hurdle: a vehicle, sensors, and a test track all need to be obtained first, and then all of the testing has to be done on a physical track to evaluate the model's performance. Reinforcement learning has allowed us to start in software, evaluate results, and minimize the time on the track. Since we are able to get an understanding of the expected results before testing on the track, we are also able to better utilize our limited track time and focus on improvements and results.

We plan to continue expanding this research by building on the work this team has done as well as on what the DeepRacer team is doing. By taking the data recorded on the track and creating a simulation of the track, we hope to achieve much better performance and to start generalizing the results. The DeepRacer team has expanded its work to create head-to-head racing vehicles trained with reinforcement learning. Expanding research in both of these areas and applying everything to a full-size vehicle should continue to prove that reinforcement learning has a unique advantage in autonomous vehicles. Future research could be conducted to develop additional control methodologies, improve strategies for autonomous vehicle racing, or adapt control strategies to new scenarios where the system is able to apply past knowledge and prior experience to the current scenario. With developer-focused systems for autonomous vehicles, these topics can be explored further in future research.

References

1. Navarro, A., Joerdening, J., Khalil, R., Brown, A., and Asher, Z., "Development of an Autonomous Vehicle Control Strategy Using a Single Camera and Deep Neural Networks," SAE Technical Paper 2018-01-0035, 2018, https://doi.org/10.4271/2018-01-0035.

2. Traverse, P., Lacaze, I., and Souyris, J., "Airbus Fly-by-Wire: A Total Approach to Dependability," in Jacquart, R. (ed.), Building the Information Society (Boston, MA: Springer US, 2004), 191-212.

3. "Polysync oscc," https://github.com/PolySync/oscc.

4. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J. et al., "Hindsight Experience Replay," Advances in Neural Information Processing Systems, 5048-5058, 2017.

5. Andrychowicz, M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A. et al., “Learning Dexterous Inhand Manipulation,” arXiv preprint arXiv:1808.00177, 2018.

6. Gu, S., Holly, E., Lillicrap, T., and Levine, S., “Deep Reinforcement Learning for Robotic Manipulation with Asynchronous off-Policy Updates,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, 3389-3396, IEEE.

 FIGURE 7  The comparison of the localization maps highlighting the important regions in the image for predicting the concept between the best model with shallow network (left column) versus the deeper neural network model (right column).

7. Rusu, A.A., Večerík, M., Rothörl, T., Heess, N., Pascanu, R., and Hadsell, R., "Sim-to-Real Robot Learning from Pixels with Progressive Nets," in Proceedings of the 1st Annual Conference on Robot Learning, Proceedings of Machine Learning Research, Levine, S., Vanhoucke, V., and Goldberg, K., Eds., 78, PMLR, Nov. 13-15, 2017, 262-270, available at: http://proceedings.mlr.press/v78/rusu17a.html.

8. Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., and Hutter, M., “Learning Agile and Dynamic Motor Skills for Legged Robots,” Science Robotics 4(26), 2019, [Online], available at: https://robotics.sciencemag.org/content/4/26/eaau5872.

9. Xie, Z., Berseth, G., Clary, P., Hurst, J., and van de Panne, M., “Feedback Control for Cassie with Deep Reinforcement Learning,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, 1241-1246, IEEE.

10. Hsu, S.-H., Chan, S.-H., Wu, P.-T., Xiao, K., and Fu, L.-C., “Distributed Deep Reinforcement Learning Based Indoor Visual Navigation,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, 2532-2537, IEEE.

11. Choi, J., Park, K., Kim, M., and Seok, S., “Deep Reinforcement Learning of Navigation in a Complex and Crowded Environment with a Limited Field of View,” in 2019 International Conference on Robotics and Automation (ICRA), 2019, 5993-6000, IEEE.

12. Zhu, Y., Mottaghi, R., Kolve, E., Lim, J.J., Gupta, A., Fei-Fei, L., and Farhadi, A., "Target-Driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning," in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, 3357-3364, IEEE.

13. Kahn, G., Villaflor, A., Ding, B., Abbeel, P., and Levine, S., “Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2018, 1-8.

14. Kim, H.J., Jordan, M.I., Sastry, S., and Ng, A.Y., “Autonomous Helicopter Flight Via Reinforcement Learning,” in Advances in Neural Information Processing Systems, 2004, 799-806.

15. Sadeghi, F. and Levine, S., "CAD2RL: Real Single-Image Flight without a Single Real Image," arXiv preprint arXiv:1611.04201, 2016.

16. Chen, C., Liu, Y., Kreiss, S., and Alahi, A., "Crowd-Robot Interaction: Crowd-Aware Robot Navigation with Attention-Based Deep Reinforcement Learning," in 2019 International Conference on Robotics and Automation (ICRA), 2019, 6015-6022, IEEE.

17. Christen, S., Stevsic, S., and Hilliges, O., “Guided Deep Reinforcement Learning of Control Policies for Dexterous Human-Robot Interaction,” arXiv preprint arXiv:1906.11695, 2019.

18. Everett, M., Chen, Y.F., and How, J.P., “Motion Planning among Dynamic, Decision-Making Agents with Deep Reinforcement Learning,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2018, 3052-3059.

19. Sartoretti, G., Kerr, J., Shi, Y., Wagner, G. et al., “Primal: Pathfinding Via Reinforcement and Imitation Multi-Agent Learning,” IEEE Robotics and Automation Letters 4(3):2378-2385, 2019.

20. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V. et al., “Scikit-Learn: Machine Learning in Python,” Journal of Machine Learning Research 12:2825-2830, 2011.

21. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems,” 2015, software available from tensorflow.org, [Online], available at: https://www.tensorflow.org/.

22. Caspi, I., Leibovich, G., Novik, G., and Endrawis, S., “Reinforcement Learning Coach,” Dec. 2017. [Online]. Available: https://doi.org/10.5281/zenodo.1134899.

23. Liang, E., Liaw, R., Nishihara, R., Moritz, P., Fox, R., Goldberg, K., Gonzalez, J.E., Jordan, M.I., and Stoica, I., “RLlib: Abstractions for Distributed Reinforcement Learning,” in International Conference on Machine Learning (ICML), 2018.

24. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W., “OpenAI Gym,” 2016.

25. Crawley, D.B., Lawrie, L.K., Winkelmann, F.C., Buhl, W.F. et al., “EnergyPlus: Creating a New-Generation Building Energy Simulation Program,” Energy and Buildings 33(4):319-331, 2001.

26. Koenig, N. and Howard, A., “Design and Use Paradigms for Gazebo, an Open-Source Multi-Robot Simulator,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, Sep 2004, 2149-2154.

27. Karaman, S., Anders, A., Boulet, M., Connor, J., Gregson, K., Guerra, W., Guldner, O., Mohamoud, M., Plancher, B., Shin, R. et al., "Project-Based, Collaborative, Algorithmic Robotics for High School Students: Programming Self-Driving Race Cars at MIT," in 2017 IEEE Integrated STEM Education Conference (ISEC), IEEE, 2017, 195-203.

28. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W., “OpenAI Gym,” arXiv preprint arXiv:1606.01540, 2016.

29. Fan, L., Zhu, Y., Zhu, J., Liu, Z., Zeng, O., Gupta, A., Creus-Costa, J., Savarese, S., and Fei-Fei, L., “Surreal: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark,” in Conference on Robot Learning, 2018, 767-782.

30. Liang, J., Makoviychuk, V., Handa, A., Chentanez, N., Macklin, M., and Fox, D., “Gpu-Accelerated Robotic Simulation for Distributed Reinforcement Learning,” in Proceedings of the 2nd Conference on Robot Learning, ser. Proceedings of Machine Learning Research, Billard, A., Dragan, A., Peters, J., and Morimoto, J., 87, PMLR, Oct 29-31, 2018, 270-282, [Online], available at: http://proceedings.mlr.press/v87/liang18a.html.

31. Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., Legg, S., and Kavukcuoglu, K., "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures," in Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, Dy, J. and Krause, A., Eds., 80, Stockholmsmässan, Stockholm, Sweden, PMLR, Jul 10-15, 2018, 1407-1416, available at: http://proceedings.mlr.press/v80/espeholt18a.html.

32. Liang, E., Liaw, R., Nishihara, R., Moritz, P., Fox, R., Goldberg, K., Gonzalez, J., Jordan, M., and Stoica, I., “RLlib: Abstractions for Distributed Reinforcement Learning,” in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, Dy, J. and Krause, A., Eds., 80, Stockholmsmässan, Stockholm, Sweden, PMLR, Jul 10-15, 2018, 3053-3062, [Online], available at: http://proceedings.mlr.press/v80/liang18b.html.

33. Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S., and Vanhoucke, V., “Sim-to-Real: Learning Agile Locomotion for Quadruped Robots,” in Proceedings of Robotics: Science and Systems, Pittsburgh, PA, June 2018.

34. Peng, X.B., Andrychowicz, M., Zaremba, W., and Abbeel, P., “Sim-to-Real Transfer of Robotic Control with Dynamics Randomization,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018, 1-8, IEEE.

35. Muratore, F., Treede, F., Gienger, M., and Peters, J., “Domain Randomization for Simulation-Based Policy Optimization with Transferability Assessment,” in Conference on Robot Learning, 2018, 700-713.

36. Mandlekar, A., Zhu, Y., Garg, A., Fei-Fei, L., and Savarese, S., "Adversarially Robust Policy Learning: Active Construction of Physically-Plausible Perturbations," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, 3932-3939, IEEE.

37. Higgins, I., Pal, A., Rusu, A., Matthey, L., Burgess, C., Pritzel, A., Botvinick, M., Blundell, C., and Lerchner, A., “Darla: Improving Zero-Shot Transfer in Reinforcement Learning,” in Proceedings of the 34th International Conference on Machine Learning, Vol. 70, JMLR.org, 2017, 1480-1490.

38. Bharadhwaj, H., Wang, Z., Bengio, Y., and Paull, L., "A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies," in 2019 International Conference on Robotics and Automation (ICRA), 2019, 782-788, IEEE.

39. “Polysync Ros Oscc Bridge,” https://github.com/PolySync/roscco.

40. Balaji, B., Mallya, S., Genc, S., Gupta, S., Dirac, L., Khare, V., Roy, G., Sun, T., Tao, Y., Townsend, B., Calleja, E., Muralidhara, S., and Karuppasamy, D., “Deepracer: Educational Autonomous Racing Platform for Experimentation with sim2real Reinforcement Learning,” 2019.

41. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D., “Grad-CAM: Visual Explanations from Deep Networks Via Gradient-Based Localization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, 618-626.
