Post on 16-Apr-2017
Autonomous Vehicles Webinar
The intersection of robotics and artificial intelligence
Streaming live via Hangouts, 8pm CT, August 28th, 2016
Undergraduate student at University of Illinois at Urbana-Champaign, Class of 2017
B.S. Mechanical Engineering, Minor in Electrical Engineering
Previous: PwC, Cummins, UIUC RA
Overview
I. What is an AV?
II. Technology
   A. AI + Robotics = AVs
   B. “Self-Driving Stack”
      1. Sensing
      2. Processing
      3. Actuation
III. Up Next
What is an autonomous vehicle (AV)?
Within the context of this discussion, we are focusing on roadway motor vehicles.
At its simplest, an AV is a car with cruise-control capability; at its most complex, an entirely driverless vehicle.
Much like everything else in tech, there is a lot of contention over how the classification should be structured. What counts as ‘full autonomy’? Thankfully, the U.S. Dept. of Transportation developed an official tiering with very clear distinctions.
Autonomous vehicles (AVs) are vehicles that are capable of movement with limited or no outside instruction or intervention.
Autonomy, per the U.S. Dept. of Transportation:
SOURCE: http://www.nhtsa.gov/About+NHTSA/Press+Releases/U.S.+Department+of+Transportation+Releases+Policy+on+Automated+Vehicle+Development
Level 1: Automation at this level involves one or more specific control functions. Examples include electronic stability control or pre-charged brakes, where the vehicle automatically assists with braking to enable the driver to regain control of the vehicle or stop faster than possible by acting alone.
Level 2: This level involves automation of at least two primary control functions designed to work in unison to relieve the driver of control of those functions. An example of combined functions enabling a Level 2 system is adaptive cruise control in combination with lane centering.
Level 3: Vehicles at this level of automation enable the driver to cede full control of all safety-critical functions under certain traffic or environmental conditions, and in those conditions to rely heavily on the vehicle to monitor for changes requiring transition back to driver control. The driver is expected to be available for occasional control, but with sufficiently comfortable transition time.
Level 4: The vehicle is designed to perform all safety-critical driving functions and monitor roadway conditions for an entire trip. The driver may provide destination input but is not expected to be available for control at any time during the trip. This includes unoccupied vehicles.
AI + robotics = AVs
The intersection of artificial intelligence and robotics:
● AI: an intelligent system capable of taking in information/data, acting upon that data, and learning how to draw further insight
● Robotics: the study of the design and control of mechanical systems; in a closed loop, these systems are capable of controlling themselves using sensory information
● Modern machine learning and AI techniques are capable of this for specific tasks (AlphaGo, Image Classification)
● These same techniques, especially deep learning, could be applied to vehicles to teach them to drive given high volumes of data
● Robotics is a well understood field of study with decades of research and progress
● Has been applied to planes, cars, etc, but in an extremely limited fashion
● Autonomy cannot be “hard-coded”, must be “learned”
[Venn diagram: AI ∩ Robotics]
The intersection of artificial intelligence and robotics: where the magic happens
Autonomous vehicles have long been a scientific dream. Planes have had auto-pilot, “self-flying” features for decades, so why is it taking so long for cars? Existing infrastructure and roads cannot support rule-based robotic systems: there are too many possible scenarios while driving, so the rules for robotic vehicles cannot be “hard-coded”.
True autonomy requires artificial intelligence. Intelligence that resembles the human capability to decipher 3D space changing in time. With decades of advances in machine learning and artificial intelligence we are nearing a time when machines are better at understanding roads than we are.
Technology Deep-Dive
There is a lot going on under the hood, let’s try to simplify it
[Diagram: pose graph, LIDAR, Graph SLAM]
1. Sensing → 2. Processing → 3. Actuation
The “Self-Driving Stack”: the architecture of autonomy
Sensor data is passed on to algorithms and processed locally (GPUs) or over a distributed network (the Cloud).
Commands are then sent to the control unit, which tells the engine/motor to speed up or slow down. An analogous process occurs for vehicle steering.
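The sense → process → actuate flow can be sketched as a skeleton loop. This is purely illustrative; the function names and stub data are invented for the example, not from any real AV codebase:

```python
def drive_loop(sense, process, actuate, cycles):
    """Skeleton of the self-driving stack's main loop:
    1. gather environment data, 2. compute a command locally or in
    the cloud, 3. forward the command to the control unit."""
    for _ in range(cycles):
        data = sense()           # sensing stage
        command = process(data)  # processing stage
        actuate(command)         # actuation stage

# Stub stages showing the data flow through the stack
log = []
drive_loop(sense=lambda: {"speed": 10.0},
           process=lambda d: "accelerate" if d["speed"] < 15 else "hold",
           actuate=log.append,
           cycles=3)
```

In a real vehicle this loop runs at a fixed rate and each stage is a subsystem of its own; the point here is only the direction of data flow.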
Autonomous Vehicle Architecture
Video Camera (still images processing, pixels)
LIDAR (light-radar, point clouds)
Specific sensors (e.g. red light detection, pedestrian detection)
[Diagram: autonomous vehicle architecture: 1. Sensing → 2. Processing/Computation → 3. Electromechanical Actuation]
Technology Deep-Dive: Sensing
LIDAR, video cameras, and radar/sonic sensors are most commonly used for gathering vehicle environment data
Video Camera (still images processing, pixels)
LIDAR (light-radar, point clouds)
Specific sensors (e.g. red light detection, stop signs)
● “Light radar” (LIDAR)
● Generates point clouds that are 3D representations of the driving environment
● Seen as the high-resolution input data that is integral to SLAM + RRT techniques
● Simple video cameras input feeds of still images that can be processed for lanes, obstacles, pedestrians, etc
● Cheap and effective, now being heavily adopted as the data of choice for deep learning
● Case-specific sensors are heavily leveraged to provide insight in areas that LiDAR and cameras cannot handle in a general way
● Ex) a specific camera pointed at where stoplights are, feeding directly into a specific algorithm for sensing red, yellow, and green colors
A deep-dive on LIDAR
● LIDAR has quickly become a go-to sensor for autonomous applications. Velodyne is an industry leader with relatively cheap, easy to calibrate units
● LIDAR units send out pulses of light and measure the time to return, which can be used to compute the distance of an object
● A rotating LIDAR sensor gathering distances of objects at different angles can gather enough points of data to construct a “point cloud”
● It is evident how useful point clouds are: like the human eye, they provide a 3D representation of space in real time
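The time-of-flight arithmetic behind those bullets fits in a few lines. A minimal sketch (illustrative only, not any vendor's API): convert round-trip pulse times into distances, and a sweep of angle/time pairs into a 2D slice of a point cloud:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_to_distance(round_trip_s):
    """Distance to an object from a pulse's round-trip time, in meters.
    Halved because the pulse travels out and back."""
    return C * round_trip_s / 2.0

def scan_to_points(measurements):
    """Turn (angle_rad, round_trip_s) pairs from one sweep of a rotating
    unit into 2D (x, y) points -- a horizontal slice of the point cloud."""
    points = []
    for angle, t in measurements:
        r = tof_to_distance(t)
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points
```

An object 10 m away returns its pulse after roughly 67 nanoseconds, which is why LIDAR timing electronics must resolve nanosecond intervals.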
Researchers at MIT in collaboration with DARPA have been able to fabricate and implement a solid-state LIDAR chip:
“Our lidar chips promise to be orders of magnitude smaller, lighter, and cheaper than lidar systems available on the market today. They also have the potential to be much more robust because of the lack of moving parts, with a non-mechanical beam steering 1,000 times faster than what is currently achieved in mechanical lidar systems.”
“At the moment, our on-chip lidar system can detect objects at ranges of up to 2 meters, though we hope to achieve a 10-meter range within a year. The minimum range is around 5 centimeters. We have demonstrated centimeter longitudinal resolution and expect 3-cm lateral resolution at 2 meters. There is a clear development path towards lidar on a chip technology that can reach 100 meters, with the possibility of going even farther.”
Massive size and price reduction of LIDAR sensors could fundamentally change approach to autonomous vehicles, drones, prosthetics, etc.
“MIT and DARPA pack LIDAR sensor onto single chip,” IEEE Spectrum, Aug 4, 2016
A new, cheaper, solid-state LIDAR is emerging
SOURCE: http://spectrum.ieee.org/tech-talk/semiconductors/optoelectronics/mit-lidar-on-a-chip
The sensing stage needs to gather lots of data from different sources in order to fully understand the environment
Technology Deep-Dive: Processing
The Processing Stack

Input Data:
● LIDAR point cloud data
● Video camera feed

Computational Methods:
● Motion planning / mapping: RRT*, SLAM, kinematics
● Machine learning / deep learning: end-to-end, DNN, CNN
● Rule-based systems: intersections, left turns

Computational Muscle:
● Local: CPUs, GPUs, SoCs on board; large amounts of flash memory
● Distributed: “Cloud” compute; powerful endpoints, limited only by the speed of data communication

Output Commands
Motion Planning - Algorithm 1: SLAM

What is the world around me? (mapping)
● Sense from various positions
● Integrate measurements to produce a map

Where am I in the world? (localization)
● Sense
● Relate sensor readings to a world model (a priori maps)
● Compute a (probabilistic) location relative to the model
**above points taken from CMU paper cited below
Depicted to the right is a Kalman filter being applied to position measurements and sensory information, which in turn generates a Gaussian distribution of the possible positions
Simultaneous localization and mapping (SLAM)
SOURCE: http://www.cs.cmu.edu/~motionplanning/lecture/Chap8-Kalman-Mapping_howie.pdf
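The Kalman-filter idea is easiest to see in one dimension. A toy sketch (invented numbers, nowhere near a full SLAM implementation): each cycle predicts the new position from the commanded motion, then fuses in a noisy range reading, weighting whichever source is more certain:

```python
def kalman_1d(mu, var, motion, motion_var, z, z_var):
    """One predict+update cycle of a 1D Kalman filter.
    mu/var: current Gaussian belief over position
    motion/motion_var: commanded movement and its noise
    z/z_var: sensor reading and its noise
    Returns the new Gaussian belief (mean, variance)."""
    # Predict: shift the belief by the motion; uncertainty grows
    mu, var = mu + motion, var + motion_var
    # Update: the Kalman gain k weights the measurement against the prediction
    k = var / (var + z_var)
    mu = mu + k * (z - mu)
    var = (1 - k) * var
    return mu, var

belief = (0.0, 1000.0)  # almost no idea where we start
for z in [5.0, 6.0, 7.0, 8.0]:  # successive noisy range readings
    belief = kalman_1d(*belief, motion=1.0, motion_var=0.5, z=z, z_var=1.0)
# The mean tracks the readings; the variance shrinks far below 1000
```

That shrinking Gaussian is exactly the localization distribution described above: the vehicle never knows its position, only an increasingly tight probability over it.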
SLAM Walkthrough (steps 1-7)
[Figure sequence: a robot moving among landmarks; location likelihood distribution]
SOURCE: http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-412j-cognitive-robotics-spring-2005/projects/1aslam_blas_repo.pdf
Motion Planning - Algorithm 2: RRTs
● Rapidly-exploring Random Trees (RRTs) are a set of exploratory algorithms that are useful for trajectory planning
● With a set of polygonal obstacles, an RRT can generate a possible path from the starting configuration to the ending (goal) configuration
● Sample paths are then input to a controller/model representation of the vehicle dynamics, and the predicted trajectory of the vehicle is computed
● The runtime of these algorithms can vary since accuracy is based on samples taken
Once a probabilistic localization is realized, a probabilistic path can be generated using RRTs
SOURCE: http://acl.mit.edu/papers/KuwataTCST09.pdf
http://www.staff.science.uu.nl/~gerae101/pdf/compare.pdf
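A minimal RRT makes the sampling idea concrete. This is an illustrative 2D sketch only (real AV planners such as the RRT* cited above also optimize path cost and respect vehicle dynamics): sample a point, steer the nearest tree node a fixed step toward it, and keep the new node if it is collision-free:

```python
import math, random

def rrt(start, goal, obstacles, step=0.5, iters=2000, goal_tol=0.5):
    """Grow a tree from `start` toward random samples in a 10x10 world;
    return a path to `goal`, or None. Obstacles are (cx, cy, r) circles."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    nodes = [start]
    parent = {start: None}
    for _ in range(iters):
        # Goal biasing: 10% of samples are the goal itself
        sample = goal if random.random() < 0.1 else (
            random.uniform(0, 10), random.uniform(0, 10))
        near = min(nodes, key=lambda n: math.dist(n, sample))
        theta = math.atan2(sample[1] - near[1], sample[0] - near[0])
        new = (near[0] + step * math.cos(theta),
               near[1] + step * math.sin(theta))
        if any(math.dist(new, (cx, cy)) < r for cx, cy, r in obstacles):
            continue  # new state collides, discard it
        nodes.append(new)
        parent[new] = near
        if math.dist(new, goal) < goal_tol:
            path, n = [], new  # walk parents back to recover the path
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
    return None

path = rrt((0.0, 0.0), (9.0, 9.0), obstacles=[(5.0, 5.0, 1.5)])
```

Note the probabilistic flavor: more iterations buy a better path, which is exactly the compute-versus-accuracy tradeoff the bullets above describe.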
Motion Planning: SLAM + RRTs = advanced guesswork
● In order to obtain a higher-resolution probabilistic model of the ideal trajectory more samples need to be taken and more computations performed, hence the need for massive compute power!
● It is understandable that a car driving 60mph would have issues performing this depth of computation in a rapidly changing environment
For a more in-depth understanding of how algorithmic robotic motion planning works, check out SLAM for Dummies
A probabilistic path generated from probabilistic input poses issues for vehicles moving at high speeds
SOURCE: http://workshops.acin.tuwien.ac.at/clutter2014/papers/ric2014_submission_9.pdf
http://acl.mit.edu/papers/KuwataTCST09.pdf
**white spots represent sampled points used to generate RRT
Artificial Intelligence (ML/Deep Learning)

● Newly emerging methodologies all revolve around deep learning via neural nets
   ○ RNNs, CNNs, GANs, autoencoding, etc.
● Two main forces are driving adoption of these methods:
   ○ Cheaper and more powerful local and cloud computing (GPUs)
   ○ Open-source deep learning platforms (TensorFlow)
These deep learning methodologies are injecting intelligence into vehicles, feeding them massive amounts of data, and letting them learn
Please check out this Deep Learning Playground for a better visualization of the concept
Artificial Intelligence Methods
[Figure: feature extraction performed by a CNN on video from a forward-facing camera; the model was able to determine road edges with relative accuracy (via NVIDIA)]
[Figure: lane-centering generator that predicts the path of vehicles based on video input from a front-facing camera (via Comma.ai)]
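The feature extraction in these examples is built from repeated 2D convolutions. A dependency-free sketch of that core operation, with a hand-written vertical-edge kernel standing in for the kernels a CNN would learn from data:

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution over nested lists -- the operation a
    CNN layer applies many times over with learned kernels."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# Synthetic 6x8 "road" image: dark pavement (0) with a bright lane line (1)
image = [[1 if 3 <= j <= 4 else 0 for j in range(8)] for i in range(6)]
# Hand-written vertical-edge kernel (Sobel-style); a CNN learns such kernels
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
edges = conv2d(image, sobel_x)
# Strong responses appear exactly where pavement meets the lane line
```

Stacking many such filters, with learned weights and nonlinearities between layers, is what lets the NVIDIA and Comma.ai models pick out road edges from raw pixels.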
Important Academic Papers Regarding Deep Learning

● NVIDIA, “End to End Learning for Self-Driving Cars”: video input from a forward-facing camera is trained against steering wheel position; deep learning networks are capable of detecting important road features with little additional nudging in the right direction
● Comma.ai, “Learning a Driving Simulator”: using video input with no additional training metadata (IMU, wheel angle), auto-encoded video was generated, predicting many frames into the future while maintaining road features
● Radford et al. (Facebook AI), “Unsupervised Representation Learning with Deep Convolutional GANs”: seminal work on deep learning auto-encoding that enabled the Comma.ai breakthrough and similar work, e.g. “Autoencoding Blade Runner”
● NYU & Facebook AI, “Deep Multi-Scale Video Prediction Beyond Mean Square Error”
These papers indicate that deep learning is a highly promising approach for AVs
Computational Muscle: CPUs, GPUs, SoCs (onboard); distributed computing (Cloud)

Computational muscle is limited to local compute, for now
● Current self-driving solutions are all implemented with local compute due to the need for simplicity, focusing on software first
● Utilizing GPUs and special SoCs to perform simple operations (i.e. with pixels and point clouds) at massive scale in parallel
● New TPUs (tensor processing units) are being designed specifically for the purpose of machine learning and AI, as well as new platforms emerging specifically for AVs
● A distributed network offering massive computational muscle would be ideal, but does not offer immediate simplicity due to latency, security, reliability, ...
● Movement toward an “AWS for AVs” is a huge opportunity that many companies are actively working on
Two paradigms currently, local compute (CPUs, SoCs, GPUs) and distributed computation over a network (Cloud)
Google’s new TPU that powered AlphaGo
Technology Deep-Dive: Actuation
The actuation stage is primarily based on the fields of controls and electromechanical systems
● The control unit is circuit hardware that manages electromechanical systems within a car
● A large amount of low-level control has been standardized into protocols like CAN
● Most well-studied and understood portion of the self-driving technology stack, high feasibility relative to other parts of the “stack”
● Companies like Delphi and Bosch are large players in this space and have invested decades of time and research into vehicle controls
● Innovation in this space is much more iterative, positioning incumbents to dominate the controls hardware/software for AVs
The processing stage sends commands via a bus like CAN or similar architectures to the engine control units/modules
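As a toy illustration of what a command over CAN looks like at the byte level: the arbitration ID and payload layout below are invented for the example (real layouts are vehicle-specific and live in proprietary DBC files), but the 8-byte data-field limit is a property of classic CAN:

```python
import struct

# Hypothetical arbitration ID for a throttle command
THROTTLE_CMD_ID = 0x2E4

def encode_throttle_frame(percent):
    """Pack a throttle command (0-100%) into an 8-byte CAN data payload.
    Invented layout: big-endian uint16 scaled by 100, then 6 reserved
    (zero) bytes."""
    raw = max(0, min(10000, round(percent * 100)))
    return struct.pack(">H6x", raw)

frame = encode_throttle_frame(42.5)
# Classic CAN frames carry at most 8 data bytes
assert len(frame) == 8
```

The control unit on the other end of the bus decodes the payload and drives the actual electromechanical actuators; the standardization of this layer is why it is the most mature part of the stack.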
Up Next
High level trends, “Self-Driving Stack” trends, general comments
Sensing
● The cost of sensors is falling through the floor
● No “best sensor” yet; converging toward LIDAR and video cameras, dependent on processing approaches
● Accuracy limits, distance limits, and data-feed latency (LIDAR especially) are improving exponentially with cost

Processing
● Models vs. neural vs. mixed: no “best practice” yet
● Local-compute-only implementations so far; will transition toward the “Cloud” the same way software did
● Mapping is important, but the AI vector bank is the new data network effect
● V2V, V2I communication cannot be relied upon

Actuation / Controls
● Actuation/controls is out in front of the rest of the tech, not a limiting factor
● Mission-critical safety and reliability need to be investigated more heavily, beyond “Six Sigma”
● Incumbents are well positioned
● Security has not been investigated thoroughly; it will emerge as a large space later on
My Thoughts
1. Data network effects for AI systems are the single most important factor for long-term success. Advantage: Uber and Tesla.
2. LIDAR and GPU companies will become important OEMs, providing hardware as a service to Big Auto; this is the only non-commodity hardware that matters to enable “AV”.
3. The inherently difficult problems are software-related, and Big Auto is not positioned to “win” at software. Defer to startups with ex-researchers.
Companies to pay attention to
- Otto: recently acquired by Uber for ~$600M
- Zoox: a $200M fundraise without even a landing page, talk about stealthy! Team consists of the “fathers of AVs”
- Comma.ai: attempting to offer autonomy enablement to vehicle manufacturers
- Drive.ai: software for AVs, not much info, rockstar team with a very deep background
- Peloton Tech: a more immediate use case for semi-autonomy with platooning; strategic investors, and UPS’s venture arm is a positive signal
- NuTonomy: released a functioning product in Singapore, great team
Thank You