When Artificial Intelligence (AI) Meets Autonomous Vehicles (AV)


Ching-Yao Chan, Berkeley DeepDrive, UC Berkeley

Cooperative Interacting Vehicles Summer School 2018, Domaine de Chalès, Nouan-le-Fuzelier, France

September 4, 2018


Presentation Outline

• Berkeley DeepDrive, Brief Introduction
• Emergence of AV and AI
• AI in AV, Why and How?
• Reinforcement Learning (RL) and Inverse Reinforcement Learning (IRL)
  • Topic to be covered by Pin Wang
• AI for Deployment
• The Ultimate Driving Machine
• Concluding Remarks

Deep Learning at Berkeley

• Berkeley Vision and Learning Center
  • A consortium that started in 2012
  • Tremendous advances in computer vision and deep learning
  • Open-source Caffe, widely used globally
• Now Berkeley Artificial Intelligence Research (BAIR)
  • https://bair.berkeley.edu/
• Berkeley DeepDrive (BDD) Center
  • A consortium that started in Spring 2016
  • Seeking to apply AI and deep learning technologies to automotive applications

Berkeley DeepDrive

• Current industrial members include (as of August 2018):
  – Automakers and suppliers: Ford, GM, Honda, Hyundai, SF Motors, Toyota; Continental, ZF
  – Mobility operators and providers: Didi Chuxing, Meituan-Dianping, UISEE, Zenity Mobility
  – Technology providers: Autobrain, Baidu, Huawei, Mapillary, Nexar, Nvidia, NXP, Panasonic, Samsung, Sony

Our Mission: We seek to merge deep learning with automotive perception and bring computer vision technology to the forefront.

Berkeley DeepDrive
See deepdrive.berkeley.edu for lists of projects and researchers.

Pushing the scientific forefronts of:
• Computer Vision / Autonomous Perception
• Automated Driving Systems
• Robotics
• A.I. / Machine Learning

BDD Research Themes

[Figure slides: Berkeley DeepDrive: Deep Learning and Autonomy; BDD Research: Intelligence for Autonomy; Skill Sets of Intelligent Dynamic Systems; BDD Research and Applications: Autonomy for Intelligent Systems]

BDD-100K Data Release, 05/2018: 100K videos. See bdd-data.berkeley.edu for details and the archived paper.

“Autonomous” Vehicles for Real in 2018-2021?

AV Testing in California

As of August 23, 2018,

• There are 56 Autonomous Vehicle Testing permit holders.

• More than 400 test vehicles.

Latest News about Vehicle Automation
• Toyota invests $500M in Uber, aiming for deployment in 2021 (08/2018)
• Waymo pilot program shows self-driving cars can boost transit (07/2018)
• Drive.ai self-driving cars hit the road in Frisco, Texas (07/2018)
• Ford hives off self-driving operations (07/2018)
• Waymo partners with Walmart to shuttle customers in self-driving cars (07/2018)
• Mercedes (+Nvidia+Bosch) will launch self-driving taxi in California next year (07/2018)
• Uber, Waymo in talks about self-driving partnership: Uber CEO (05/2018)
• Ford's self-driving car network will launch "at scale" in 2021 (05/2018)
• Apple reportedly working with Volkswagen on self-driving vans (05/2018)
• Aptiv, Lyft launch Las Vegas fleet of self-driving cars (05/2018)
• Waymo and Honda reportedly will build a self-driving delivery vehicle (04/2018)
• Auto parts maker Magna invests $200 million in Lyft (03/2018)
• …

The (Fourth) Wave of A.I.

Doing Better and Better with Deeper and Deeper Networks

*End-to-End Training of Deep Visuomotor Policies, Levine et al., 2015

Deep Learning: From Image to Control

How Can Deep Learning (AI) Help (Self-Driving) Vehicles?

A.I. is a great enabler for automobiles, and automated driving is a fitting challenge for machine learning and A.I. Where and how best to utilize it?

Automated Driving Systems (ADS) - Functional Block Diagram

[Block diagram: Driving Environment → Sensing (camera, radar, lidar, etc.) → Autonomous Perception → Mapping & Localization → Route Planning → Trajectory Planning → Control Commands → Actuation → Vehicle Kinematic & Dynamic Model; ego vehicle states are fed back, and the driver interacts with the loop]

Automated Driving Systems (ADS) - Feedforward and Feedback in Control Systems

[Same block diagram, annotated in the conventional vehicle-control discipline: route planning acts as feedforward, while ego-vehicle state measurements provide feedback]

Automated Driving Systems (ADS) - DNN End-to-End Learning for ADS


*End-to-end Learning for Self-Driving Cars, Nvidia, 2016

End-to-End Learning for Self-Driving Cars (NVIDIA, 2016)

• A convolutional neural network (CNN) maps raw pixels from a single front-facing camera directly to steering commands.
• With minimal training data, the system learns to drive in traffic on local roads, with or without lane markings, and on highways.
• The system learns internal representations, such as detecting useful road features, with only the human steering angle as the training signal. (A sketch of such a network follows.)
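To make the idea concrete, here is a minimal PyTorch sketch of a PilotNet-style network; the layer sizes follow the architecture described in the NVIDIA paper, but the input preprocessing and training loop are omitted, and the flattened dimension assumes the paper's 66x200 input resolution.

```python
import torch
import torch.nn as nn

class PilotNet(nn.Module):
    """CNN mapping a single front-camera frame to a steering command,
    following the layer sizes described in the NVIDIA paper.
    Input: (N, 3, 66, 200); normalization/preprocessing omitted."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),  # 64x1x18 for a 66x200 input
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),  # steering command
        )

    def forward(self, x):
        return self.head(self.features(x))

# Training regresses against the recorded human steering angle, e.g.:
# loss = nn.functional.mse_loss(PilotNet()(frames), steering_angles)
```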

Automated Driving Systems (ADS) - End-to-End Learning for Self-Driving Cars

[Same block diagram, with a question mark over which of the modular pipeline's blocks the end-to-end network actually subsumes]

*End-to-end Learning for Self-Driving Cars, NVIDIA, 2016

Automated Driving Systems (ADS) - End-to-End Prediction of Future Egomotion (UCB Darrell's Group)


An end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion

*End-to-end Learning of Driving Models from Large-scale Video Datasets, Xu et al, CVPR 2017

End-to-End Learning of Driving Models (UCB Darrell’s Group, 2017)

• Exploiting large scale online and/or crowdsourced datasets.

• Learning a driving model or policy from uncalibrated sources.

• Predicting the distribution over feasible future actions, as sketched below.

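As a concrete illustration, here is a minimal PyTorch sketch of a prediction head that outputs a distribution over a discretized set of future egomotion actions; the 4-way action set, feature dimension, and single-LSTM head are simplifying assumptions, not the paper's full dilated-FCN-plus-LSTM architecture.

```python
import torch
import torch.nn as nn

ACTIONS = ["straight", "stop", "turn_left", "turn_right"]  # assumed discretization

class EgomotionHead(nn.Module):
    """Map a sequence of per-frame visual features to a distribution
    over future egomotion actions."""
    def __init__(self, feat_dim=512, hidden=64, n_actions=len(ACTIONS)):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_actions)

    def forward(self, feats):                 # feats: (N, T, feat_dim)
        h, _ = self.rnn(feats)
        return self.out(h[:, -1])             # logits for the next action

logits = EgomotionHead()(torch.randn(2, 8, 512))
probs = torch.softmax(logits, dim=-1)         # distribution over future egomotion
# Training signal: cross-entropy against the action the human driver took.
```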

Automated Driving Systems (ADS) - End-to-End Navigation by RL (Deep Mind 2018)


*Learning to Navigate in Cities without a Map, DeepMind, 2018

An end-to-end deep reinforcement learning approach that can be applied at city scale.

End-to-End Navigation by Reinforcement Learning (DeepMind, 2018)

• Real-world grounded content is built on top of the publicly available Google Street View.
• The agent never sees the underlying graph, only the RGB images.
• The goal is represented in terms of its proximity to a set L of fixed landmarks, as sketched below.
• The aim is to show that a neural network can learn to traverse entire cities (London, Paris, and New York) using only visual observations.
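A minimal sketch of that goal representation, assuming a normalized exponential-decay-over-distances form with an illustrative decay constant (the paper's exact parameterization may differ):

```python
import numpy as np

def goal_code(goal_xy, landmarks_xy, alpha=0.002):
    """Encode a goal location by its proximity to fixed landmarks.

    goal_xy: (2,) goal position in metres; landmarks_xy: (L, 2).
    alpha is an illustrative decay constant (an assumption)."""
    d = np.linalg.norm(landmarks_xy - goal_xy, axis=1)  # distance to each landmark
    e = np.exp(-alpha * d)
    return e / e.sum()  # normalized proximity vector over the landmark set
```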

Automated Driving Systems (ADS) - Reinforcement Learning for AV (Wang & Chan, 2017)


Maneuver Control based on Reinforcement Learning for Automated Vehicles in an Interactive Environment

*P. Wang and C.-Y. Chan, ITSC 2017; IV 2018

Reinforcement Learning for a driving policy in an interactive driving environment (Wang and Chan, 2017-2018)

The immediate reward combines a safety term f_d(distance), a promptness term f_v(speed), and a smoothness term f_a(acceleration); a sketch follows.
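A minimal sketch of such an immediate reward; the functional forms, desired speed, and weights below are illustrative assumptions, not the papers' exact terms.

```python
import numpy as np

def immediate_reward(gap_m, speed_mps, accel_mps2,
                     v_des=30.0, w=(1.0, 0.5, 0.2)):
    """Immediate reward = weighted safety + promptness + smoothness terms.
    The functional forms, v_des, and weights are illustrative assumptions."""
    f_d = -np.exp(-gap_m / 10.0)            # safety: penalize small gaps
    f_v = -abs(speed_mps - v_des) / v_des   # promptness: track desired speed
    f_a = -accel_mps2 ** 2                  # smoothness: penalize harsh accel
    return w[0] * f_d + w[1] * f_v + w[2] * f_a

print(immediate_reward(gap_m=25.0, speed_mps=28.0, accel_mps2=0.5))
```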

Application of Reinforcement Learning and Inverse Reinforcement Learning for Autonomous Driving

Pin Wang, Team Leader
Ching-Yao Chan, Associate Director, Berkeley DeepDrive

Reinforcement Learning for Autonomous Driving

• Use cases: ramp merge and lane change

Reinforcement Learning – Problem Formulation

• Find a safe, comfortable, efficient driving policy under dynamic traffic by maximizing a long-term reward (sketched below)
• Continuous state space
• Continuous action space
• Continuous reward function
• Vehicle control: longitudinal and lateral
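The "long-term reward" being maximized is the discounted return; a minimal sketch, where the discount factor gamma = 0.99 is an illustrative choice:

```python
def discounted_return(rewards, gamma=0.99):
    """G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    This is the long-term objective the driving policy maximizes."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]  # return-to-go at each time step

print(discounted_return([1.0, 0.5, -2.0]))
```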

Reinforcement Learning Algorithms: An Overview

• Discrete action space: Q-learning, Dueling Networks
• Continuous action space:
  – Stochastic continuous action space: stochastic policy gradient, actor-critic, trust region policy gradient, natural policy gradients
  – Deterministic continuous action space: deterministic policy gradient (DPG), on-policy DPG, off-policy DPG, Normalized Advantage Functions (with a quadratic Q-function approximator)

Reward Function

• Reward function terms: safety, comfort, efficiency
• Time sequence
• Q-function approximator design

Quadratic Q-function Approximation

Q(s, a) = V(s) − ½ (a − μ(s))ᵀ M(s) (a − μ(s)), where μ(s), M(s), and V(s) are values learned from neural networks (a sketch follows).
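A numpy sketch of this quadratic form, assuming the standard NAF parameterization; the stand-in values of μ, M, and V below are illustrative, not network outputs.

```python
import numpy as np

# Quadratic (NAF-style) Q-function: Q(s,a) = V(s) - 0.5 (a - mu)^T M (a - mu).
# mu(s), M(s), V(s) would come from network heads; stand-ins here.
def q_value(a, mu, M, V):
    d = a - mu
    return V - 0.5 * d @ M @ d

mu, M, V = np.array([0.5, -0.1]), np.eye(2), 1.5   # illustrative outputs
print(q_value(np.zeros(2), mu, M, V))
a_star = mu  # with M positive definite, argmax_a Q(s, a) = mu(s)
```

Because M(s) is kept positive definite, the greedy continuous action is simply μ(s), which is what makes Q-learning tractable in a continuous action space.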

• A Reinforcement Learning Based Approach for Automated Lane Change Maneuvers, 2018 IEEE Intelligent Vehicles Symposium.
• Formulation of Deep Reinforcement Learning Architecture Toward Autonomous Driving for On-Ramp Merge, 2017 IEEE International Conference on Intelligent Transportation Systems.

Simulation Platform

(1) Scenarios: ramp merging and lane changing
(2) Traffic on highway and ramp: random departure time, random initial speed, individual speed limits
(3) Vehicle behaviors: highway vehicles follow car-following behavior; the ego vehicle performs ramp merging and lane changing
(4) Simulation rules: vehicle interactions, accepted gap, lane change commands

Training Results

• Lane change: 600,000 training steps; 6,000 lane-changing vehicles; trained on CPU; training time 150 min. Loss decreases and reward increases over training.
• Ramp merge: 400,000 training steps; 15,000 ramp-merging vehicles; trained on CPU; training time 100 min. Loss decreases and reward increases over training.

Model Verification

• Save 10 models (checkpoints) during training.
• Play each model with 100 vehicles running.
• Calculate the averaged total reward for each model (a sketch follows).

[Chart: averaged total reward vs. training steps]
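A sketch of that verification loop; load_model and run_episode are hypothetical stand-ins for the actual checkpoint-loading and simulation code.

```python
def verify(checkpoints, n_vehicles=100):
    """Average episode reward per saved model.
    load_model and run_episode are hypothetical stand-ins."""
    scores = {}
    for ckpt in checkpoints:
        policy = load_model(ckpt)                                   # hypothetical
        rewards = [run_episode(policy) for _ in range(n_vehicles)]  # hypothetical
        scores[ckpt] = sum(rewards) / len(rewards)
    return scores  # pick the checkpoint with the highest average reward
```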

Verification of Vehicle Performance

Inverse Reinforcement Learning for Reward Function Learning

Inverse Reinforcement Learning: infer the reward function from roll-outs of an expert policy/demonstrations.

• Given: states, actions, the transition model p(s'|s, a) (sometimes), and samples from a policy π
• Learn: the reward function r_φ(s, a), either a linear combination of features or a neural network
• Then: use the learned reward function to learn the optimal policy π*(a|s), as sketched below
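A high-level sketch of that given/learn/then loop; init_reward_params, solve_rl, and fit_reward are hypothetical stand-ins for the corresponding steps, not functions from the papers.

```python
def irl(expert_rollouts, n_iters=10):
    """Alternate between solving the RL problem under the current reward
    and re-fitting the reward to better explain the expert demonstrations."""
    phi = init_reward_params()                          # hypothetical
    for _ in range(n_iters):
        policy = solve_rl(reward=phi)                   # learn pi under r_phi
        phi = fit_reward(phi, expert_rollouts, policy)  # update r_phi(s, a)
    return phi, solve_rl(reward=phi)                    # learned reward and pi*
```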

Two Main Methods

• Maximum margin based (Ng & Abbeel, 2004)
  – Reward function design: R(s) = w · f(s)
  – Feature function expectation: μ_E
  – Maximize the margin and update w
  – Drawback: ambiguity; different policies may lead to the same feature values
• Maximum entropy based (Ziebart, 2008)
  – Learn the reward parameters θ from observations
  – Based on maximum entropy; uses maximum likelihood as an approximation, as sketched below
  – Drawback: the approximation has bias
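A sketch of one max-entropy IRL update for a linear reward R(s) = θ · f(s): the likelihood gradient is the expert's empirical feature expectation minus the feature expectation induced by the current reward. Estimating the induced expectation by sampling, and the learning rate, are assumptions for illustration.

```python
import numpy as np

def maxent_irl_step(theta, expert_feats, sampled_feats, lr=0.01):
    """One gradient step on the max-entropy IRL likelihood:
    grad = E_expert[f] - E_theta[f]."""
    grad = expert_feats.mean(axis=0) - sampled_feats.mean(axis=0)
    return theta + lr * grad

theta = maxent_irl_step(np.zeros(7),
                        np.random.rand(100, 7),   # expert trajectory features
                        np.random.rand(100, 7))   # features sampled under theta
```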

Proposed Method

• Maximum entropy based
• Incorporate prior knowledge: prior information on vehicle kinematics, via a kinematic model (sketched below)
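A minimal sketch of such a kinematics prior, assuming a standard kinematic bicycle model; the time step and wheelbase values are illustrative.

```python
import numpy as np

def bicycle_step(x, y, heading, v, steer, dt=0.1, wheelbase=2.7):
    """One step of a kinematic bicycle model (dt and wheelbase are
    illustrative values)."""
    x += v * np.cos(heading) * dt
    y += v * np.sin(heading) * dt
    heading += v / wheelbase * np.tan(steer) * dt
    return x, y, heading
```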

Feature Functions

• Features (a computation sketch follows this list):
  – Front vehicle time headway: THW_f = (y_front − y_ego) / v
  – Rear vehicle time headway: THW_r = (y_ego − y_rear) / v
  – AV longitudinal acceleration
  – AV lateral acceleration
  – AV steering angle rate
  – Speed difference between current and desired speed: |v − v_des|
  – Lateral deviation from the target lane: |y − y_des|
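A sketch of computing this feature vector; the VehicleState fields and the split into longitudinal (s) and lateral (d) coordinates are illustrative assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class VehicleState:        # field names are illustrative assumptions
    s: float               # longitudinal position (m)
    d: float               # lateral position (m)
    v: float               # speed (m/s)
    a_long: float = 0.0    # longitudinal acceleration
    a_lat: float = 0.0     # lateral acceleration
    steer_rate: float = 0.0

def irl_features(ego, front, rear, v_des, d_target):
    """Feature vector matching the list above."""
    thw_f = (front.s - ego.s) / max(ego.v, 1e-3)  # front time headway
    thw_r = (ego.s - rear.s) / max(ego.v, 1e-3)   # rear time headway
    return np.array([thw_f, thw_r, ego.a_long, ego.a_lat,
                     ego.steer_rate, abs(ego.v - v_des),
                     abs(ego.d - d_target)])
```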

Training

• NGSIM Data
  – Naturalistic traffic data on I-80
  – Coverage of rush hour (5:00-5:30 pm) and a transition period (4:00-4:15 pm)
  – 5,000+ vehicle trajectories, 200 lane changes
• Extracted Scenario
  – Lane change between two lanes
  – Four vehicles as a pair
  – Target vehicle (blue) is changing lane

[Figures: bird's-eye view of naturalistic traffic recorded on the I-80 freeway, with lanes numbered and the driving direction marked, and an illustration of the extracted lane-change scenario]

• Generated trajectories of left and right lane changes based on the learned reward function

[Plots: original, filtered, and IRL-generated trajectories (X/m vs. Y/m) for left and right lane changes between Lane I and Lane II]

Technical Approach

• Research topics:
  – Different formats of reward functions
  – Diverse situations to make the model more robust
  – Comparison with other IRL methods

Applying AI to Production Cars

Software 1.0
• Written in code (C++, …)
• Requires domain expertise:
  1. Decompose problems
  2. Design algorithms
  3. Compose into a system
• Measure performance

Software 2.0
• Requires much less domain expertise:
  1. Design a code skeleton
• Measure performance
• "Fill in the blanks" programming

*"Building the Software 2.0 Stack", Andrej Karpathy, Tesla, 05/2018

[Diagram: sensor inputs (cameras, radar, ultrasonic, IMU) map to vehicle outputs (steering, acceleration), with a progressively larger share of the stack handled by learned 2.0 code rather than hand-written 1.0 code]

*"Building the Software 2.0 Stack", Andrej Karpathy, Tesla, 05/2018

How to Expedite Learning and Testing?

• The consensus is that it is too resource-consuming, and not feasible, to conduct ADS testing "completely" through physical test cases (>10^8 km).
• Practices of safety-assurance testing:
  • Learn from a database of "corner cases": collections of challenging scenarios and probable test cases for specifications
  • "Fleet" learning: e.g., Tesla (hundreds of millions of miles of on-road data)
  • "Simulated" learning: e.g., Waymo (8M simulated miles daily, 2.5B miles yearly)

Applying AI in Achieving Safe and Robust AV Performance

Testing/Validation:
• Proving ground
• Road testing
• Simulation

AI & ML:
• Supervised learning
• Imitation + reinforcement learning
• RL + supervised learning

Toward general intelligence, all situations, and uncharted territory: domain adaptation, transfer learning, learning to learn.

Philosophically Speaking ….

What Are We (Humans) and Machines Good at?

Human:
• Expression and Gesture
• Intuitive Reflex
• Imagination
• Adaptation
• System One*

Machine:
• Complex & Fast Computation
• Rational Reasoning
• Rule-Abiding
• Vast Data Capacity
• System Two*

*Thinking, Fast and Slow, Daniel Kahneman

Man and Machine are quite complementary

H(orse) Metaphor for Automated Driving Systems (ADS)

[Diagram: a spectrum from tight rein (high intervention) to loose rein (high autonomy), analogizing horse riding to car driving]

*The H-Metaphor as a Guideline for Vehicle Automation and Interaction, F. Flemisch et al., 2003

H-Metaphor for Automated Driving Systems (ADS)

Horse riding: the horse can run a course well on its own; it also behaves well even if the rider pulls the rein or uses the whip occasionally.

Car driving: the car can run the course well on its own; it also behaves well even if the driver steers the wheel or pushes the pedal occasionally.

The Ultimate Driving Machine

The Ultimate Driving Machine?

[Chart: level of automation vs. level of driver inputs across the five SAE J3016 automation levels (I-V), illustrating switching of automation levels]

Supervisory Control in Automated (Driving) Systems

• Supervisory control*: human-machine systems can exist in a spectrum of automation, and shift across the spectrum of control levels in real time to suit the situation at hand.

* T. Sheridan, Telerobotics, Automation, and Human Supervisory Control, Cambridge, MA: MIT Press, 1992.

The Ultimate Driving Machine?

[Same chart: supervisory control at varying automation levels across SAE J3016 Levels I-V]

Research Questions in Supervisory Concept

Given the foundation of a vehicle state measurement module, detection and perception modules, and actuation control modules, and the need for safe and effective interaction with the surroundings: if there is a lack of clarity and certainty, can an arbitration module learn to make decisions, weighing AV controller inputs against driver inputs, to achieve its goal? (An illustrative sketch follows.)
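To make the research question concrete, here is a purely illustrative sketch of one possible arbitration rule; the confidence input and the linear blending are assumptions, not a proposed solution.

```python
def arbitrate(driver_cmd, av_cmd, av_confidence):
    """Blend driver and AV controller commands by an authority weight.
    The confidence signal and linear blending rule are assumptions meant
    only to illustrate the arbitration question."""
    w = min(max(av_confidence, 0.0), 1.0)  # authority given to automation
    return w * av_cmd + (1.0 - w) * driver_cmd
```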

Operational Design Domain (ODD, per SAE)

[Diagram: domains surrounding the automation ODD, including a minimum-risk domain, an automation-lock domain, an automatic-transition domain, and a singularity domain, with numbered transitions: 1. request in; 2. request out; 3. auto transition in; 4. minimum-risk move; 5. driver takeover at will; 6. automation lock-in]

Concluding Remarks

Opportunities in AI for AV

• Significant advancements in deep learning in the 2010s
  • Text, voice, image
  • Robotics, autonomous driving

• Still a long way to go to achieve general intelligence, but it is an exciting era for AI+AV

Intelligence ≠ Perfection

Artificial or Human

We, as a society, have a high tolerance of what humans do. Can we accept and live with what machines do?

Human Behaviors:
• Distraction
• Fatigue
• Poor Judgment
• Mistakes
• Not Knowing What Is in Others' Minds
• Misinformation

Machine Performance:
• Reliability
• Consistency
• Fail-Safe
• Not Understanding Algorithms?

(What we have now is) not A.I. but I.A.: Intelligence Augmentation. (Michael Jordan, UC Berkeley)

Thank you.

Ching-Yao Chan, cychan@berkeley.edu