Page 1

Distributed Inference Between Mobile Edge Devices and the Cloud

Sandeep Chinchali*, Jenya Pergament*, Eyal Cidon*, Marco Pavone, Sachin Katti

[Figure: neural net]

Page 2

Can robot perception tasks be done in the cloud?

• Automated sensing from video/LIDAR
• Compute-intensive deep neural nets (DNNs)
• Can resource-constrained robots scalably use “the cloud”?

[Figure annotation: uplink-limited]

Credit: Alexander Kazeka, https://www.youtube.com/watch?v=1j_3fh34E44

Page 3

[Figure: a mobile robot runs a robot model on its sensory input (local compute); offload logic decides whether to send the input over a limited network to a cloud model (offload compute) with access to image and map databases.]

Query the cloud for better accuracy? Latency vs. accuracy vs. power …

Page 4

Outline

“Learning-Based Approach to Cloud Offloading in Robotics,” Sandeep Chinchali, Apoorva Sharma, James Harrison, Amine Elhafsi, Daniel Kang, Jenya Pergament, Eyal Cidon, Sachin Katti, Marco Pavone [accepted to Robotics: Science and Systems (RSS) 2019]

1. Accuracy vs. Compute-Efficiency Trade-offs of DNNs
2. Network Costs of Streaming Video/LIDAR
3. A Learning-Based Approach to Cloud Offloading
4. Simulation and Hardware Experiments

Page 5

Accuracy of Robot and Cloud DNNs

[Figure: accuracy of the cloud model vs. the robot model]

Page 6

If embedded AI gets better, will I still need the cloud?

Cloud is still useful to:
1. Pool video from multiple robots
2. Access large map and image databases
3. Query models trained on more/newer data

“Cloud”: could even be a bigger on-board model

[Figure: Jetson TX2 GPU (~$480), Google Edge TPU (~$150), Jetson Nano (~$99)]

Device                                 SSD MobileNet-v2 (300x300)
Raspberry Pi 3                         1 FPS
R-Pi 3 + Intel Neural Compute Stick    11 FPS
Jetson Nano                            39 FPS
Edge TPU                               48 FPS

Source: https://devblogs.nvidia.com/jetson-nano-ai-computing/
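Reading these throughputs as per-frame latency makes the gap between devices concrete. A minimal sketch, assuming one frame is processed at a time and ignoring pre- and post-processing; the 30 FPS camera rate (`CAMERA_FPS`) is a hypothetical value, not from the slide.

```python
# Throughputs transcribed from the SSD MobileNet-v2 (300x300) numbers on this slide.
FPS = {
    "Raspberry Pi 3": 1,
    "R-Pi 3 + Intel Neural Compute Stick": 11,
    "Jetson Nano": 39,
    "Edge TPU": 48,
}

CAMERA_FPS = 30  # hypothetical camera frame rate; not stated on the slide


def ms_per_frame(fps: float) -> float:
    """Average time per inference in milliseconds, assuming frames are processed one at a time."""
    return 1000.0 / fps


def keeps_up(fps: float, camera_fps: float = CAMERA_FPS) -> bool:
    """True if the device can run the model on every frame of the camera stream."""
    return fps >= camera_fps


for device, fps in FPS.items():
    print(f"{device}: {ms_per_frame(fps):.1f} ms/frame, keeps up with {CAMERA_FPS} FPS: {keeps_up(fps)}")
```

Under this assumption only the Jetson Nano and Edge TPU can label every frame of a 30 FPS stream; the others must skip frames or offload.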

Page 8

Network Costs of Cloud Communication

1. Congested Wireless Links
2. High Bandwidth: Designed for Human, Not Robot, Perception

[Figure annotation: uplink-limited]

J. Emmons, S. Fouladi, G. Ananthanarayanan, S. Venkataraman, S. Savarese, K. Winstein, “Cracking Open the DNN blackbox”
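A back-of-the-envelope calculation shows why streams formatted for human viewing strain a congested uplink. The sketch computes uncompressed bitrate only: the 640x480, 30 FPS camera is a hypothetical example, and real links carry compressed video, so these figures are upper bounds rather than measurements.

```python
def raw_bitrate_mbps(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Uncompressed video bitrate in Mbps (1 Mbps = 1e6 bits per second)."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / 1e6


# Hypothetical camera stream: 640x480 RGB (3 bytes per pixel) at 30 FPS.
camera = raw_bitrate_mbps(640, 480, 3, 30)
# Same frame rate, but only the 300x300 input that SSD MobileNet-v2 consumes.
dnn_input = raw_bitrate_mbps(300, 300, 3, 30)
print(f"camera: {camera:.1f} Mbps, DNN-sized input: {dnn_input:.1f} Mbps")
```

The DNN-sized input is roughly 30% of the camera stream's raw bits, which is one way to read "designed for human, not robot, perception".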

Page 9

Our Network Congestion Experiments

“ROS Ate My Network Bandwidth!” (ROS user forums)

[Figure annotation: ~70 Mbps]

Page 11

Cloud Offloading as a Decision Problem

Contending goals:
• Maximize accuracy
• Minimize latency
• Limited network share

[Figure: cloud queries vs. robot confidence; labels: Robot Correct, Wasted Queries, Limited Cloud Queries, Optimal Control]
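As a point of comparison for a learned policy, a simple non-learned heuristic queries the cloud whenever the robot's confidence falls below a threshold, until a query budget runs out. This is a hypothetical sketch, not the policy from the talk; `ThresholdOffloader`, the threshold, and the budget are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ThresholdOffloader:
    """Hypothetical baseline: query the cloud when the on-board model is unsure.

    threshold: robot confidence below which the cloud is queried.
    budget:    maximum number of cloud queries (a stand-in for the limited network share).
    """
    threshold: float
    budget: int
    queries_used: int = 0

    def decide(self, robot_confidence: float) -> str:
        if robot_confidence < self.threshold and self.queries_used < self.budget:
            self.queries_used += 1
            return "cloud"
        return "robot"


policy = ThresholdOffloader(threshold=0.8, budget=2)
decisions = [policy.decide(c) for c in [0.95, 0.40, 0.70, 0.30, 0.99]]
print(decisions)  # ['robot', 'cloud', 'cloud', 'robot', 'robot']
```

The fourth frame is the least confident, but the budget is already spent on earlier frames. A fixed threshold cannot tell whether a query will change the answer, so it can spend its budget on wasted queries.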

Page 12

RL Approach to Cloud Offloading

[Figure: DNN, edge, cloud]

Page 13

Reinforcement Learning (RL)

Goal: Maximize the total reward

Agent-environment loop: the agent observes state $s_t$, takes action $a_t$, and receives reward $r_t$.

Adapted from Pensieve (SIGCOMM ’18, Mao et al.)

Exploration vs. Exploitation Tradeoff

Exploit: On-board Robot Model

Explore: Learn the Utility of the Cloud
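The agent-environment loop and the exploration/exploitation trade-off can be illustrated with tabular Q-learning on a toy problem. This is a minimal sketch, not the algorithm used in the talk: the two states (`high_conf`, `low_conf`), the two actions, and the rewards (the robot model is right only when confident; the cloud is always right but costs 0.5) are hypothetical. Because the toy's next state does not depend on the action, it trains with `gamma=0.0`, which reduces to a contextual bandit; a discount matters once actions change future state, for example when past predictions can be reused.

```python
import random
from collections import defaultdict

ACTIONS = ["robot", "cloud"]  # hypothetical toy: trust the on-board model or query the cloud
STATES = ["high_conf", "low_conf"]


def epsilon_greedy(q, state, epsilon, rng):
    """Explore a random action with probability epsilon; otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (target - q[(state, action)])


def toy_reward(state, action):
    """Hypothetical rewards: cloud = correct (1.0) minus query cost (0.5); robot = correct only if confident."""
    if action == "cloud":
        return 1.0 - 0.5
    return 1.0 if state == "high_conf" else 0.0


def train(steps=2000, epsilon=0.2, gamma=0.0, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    state = rng.choice(STATES)
    for _ in range(steps):
        action = epsilon_greedy(q, state, epsilon, rng)
        reward = toy_reward(state, action)
        next_state = rng.choice(STATES)  # toy: next frame's confidence ignores the action
        q_update(q, state, action, reward, next_state, gamma=gamma)
        state = next_state
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
print(policy)
```

The learned greedy policy keeps confident frames on the robot and sends unconfident ones to the cloud; the exploration rate `epsilon` is what lets it discover the cloud's value in the first place.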

Page 14

[Figure: robot and cloud connected over a limited network; labels: Offload, Cloud Model Predict, Past Predictions ({1, 3}), Reward, State]

Page 15

The Robot Offloading MDP

[Figure: robot and cloud models connected over a limited network; labels: Offload, Past Predictions ({1, 3}), Reward, State]

Page 16

The Robot Offloading MDP: Action Space

[Figure: the offloading MDP diagram from the previous slide, highlighting the action space]

Page 17

The Robot Offloading MDP: State Space

[Figure: the offloading MDP diagram from the previous slide, highlighting the state space]

Page 18

The Robot Offloading MDP: Reward

[Figure: the offloading MDP diagram from the previous slide, highlighting the reward]
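The state, action, and reward on these slides can be written down as plain data types. A minimal sketch under stated assumptions: the four named actions mirror the split between querying a model and reusing past predictions, but the action names, the per-action costs in `COST`, the query budget in `State`, and the weighted reward `-alpha * loss - beta * cost` are illustrative choices rather than the talk's exact definitions.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical labels: two actions query a model, two reuse a past prediction.
    USE_PAST_ROBOT = "use past robot prediction"
    QUERY_ROBOT = "query robot model"
    USE_PAST_CLOUD = "use past cloud prediction"
    QUERY_CLOUD = "query cloud model"


@dataclass(frozen=True)
class State:
    robot_confidence: float
    past_robot_pred: str
    past_cloud_pred: str
    queries_left: int  # hypothetical budget standing in for the limited network


# Hypothetical per-action costs (compute or network use), arbitrary units.
COST = {
    Action.USE_PAST_ROBOT: 0.0,
    Action.QUERY_ROBOT: 0.1,
    Action.USE_PAST_CLOUD: 0.0,
    Action.QUERY_CLOUD: 1.0,
}


def reward(loss: float, action: Action, alpha: float = 1.0, beta: float = 0.5) -> float:
    """Assumed reward shape: penalize prediction loss and action cost with weights alpha and beta."""
    return -alpha * loss - beta * COST[action]


def allowed_actions(state: State) -> list:
    """Cloud queries are allowed only while the hypothetical query budget lasts."""
    return [a for a in Action if a is not Action.QUERY_CLOUD or state.queries_left > 0]


print(reward(0.0, Action.QUERY_CLOUD), reward(1.0, Action.USE_PAST_ROBOT))
```

The weights `alpha` and `beta` encode the contending goals: raising `beta` makes an RL agent query the cloud more sparingly.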

Page 20

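[Figure: face-recognition example. On the robot, FaceNet embeds the face and an SVM classifier labels it (e.g., Face A, 90% confidence); alternatively, the robot can query the cloud. Labels: Query Cloud, Robot Model, Coherence Time.]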

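The coherence-time label suggests why reusing past predictions pays off: consecutive frames often show the same face. A hypothetical sketch that builds a stream in which each face stays in view for `coherence_time` frames and counts the queries needed if a query is made only when the content changes; it assumes an oracle change detector, which a real robot does not have.

```python
def face_stream(faces, coherence_time):
    """Hypothetical stream: each face stays in view for coherence_time consecutive frames."""
    return [face for face in faces for _ in range(coherence_time)]


def queries_with_reuse(stream):
    """Query only when the content changes and reuse the past prediction otherwise.

    Assumes an oracle change detector; a real robot only sees its own confidence.
    """
    queries, prev = 0, None
    for frame in stream:
        if frame != prev:
            queries += 1
        prev = frame
    return queries


stream = face_stream(["A", "B", "A"], coherence_time=10)
print(len(stream), queries_with_reuse(stream))  # 30 3
```

Here 30 frames need only 3 queries instead of 30; the longer the coherence time, the more a policy gains from reusing past predictions.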
Page 21

RL beats benchmark offloading policies: > 2.6x the reward of benchmarks

RL: 70% of oracle reward

All-Robot: today’s de facto approach

Page 22

RL queries the cloud intelligently but sparingly

Page 23

Hardware Experiments on Live Video + Embedded Compute Platform

Page 24

RL for Cloud Offloading in Robotics

• Compute model size and sensory data will grow

• Judicious use of Cloud in Robotics

• RL: General Two-Stage Decision Problem

[Figure: the system diagram from the start of the talk: a mobile robot's robot model (local compute) and offload logic, connected over a limited network to a cloud model (offload compute) with image and map databases]

Query the cloud for better accuracy? Latency vs. accuracy vs. power …

Thanks! Please see Sandeep, Eyal, and Jenya.

Page 25

Sensor Representation for Machine Perception

1. Human Eye -> High Bandwidth
2. All-edge/All-cloud restrictive

Can we send fewer, relevant bits for the same accuracy?

[Figure annotation: uplink-limited]

Emmons et al., “Neural Networks Are Networks Too”

Google Edge TPU ($150), Nvidia Jetson Nano ($99), TX2 ($600)

Page 26

Future Directions


Page 27

Network Costs of Cloud Communication

1. Congested Wireless Links
2. High Bandwidth: Designed for Human Perception

[Figure annotation: uplink-limited]

Emmons et al., “Neural Networks Are Networks Too”

Page 28

System Architecture

[Figure: DNN, edge, cloud]

Page 29

Should we split Vision DNNs between edge/cloud?

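[Figure: the edge runs the first layers of an off-the-shelf DNN and sends intermediates instead of pixels to Google, which runs the remaining layers and predicts. With Google's model v1 the split is at layer 5; with v2 it moves to layer 10.]

Do not split rapidly-evolving DNNs! (NeuroSurgeon, ASPLOS ’17)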
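The latency trade-off behind split inference can be sketched as choosing the layer after which to hand off to the cloud: edge compute up to the split, plus transfer of that layer's output, plus cloud compute for the rest. This is a simplified model in the spirit of NeuroSurgeon, not its implementation; the three-layer profile, byte counts, and 10 Mbps uplink are hypothetical.

```python
def transfer_ms(num_bytes: int, uplink_mbps: float) -> float:
    """Time in ms to send num_bytes over an uplink of uplink_mbps megabits per second."""
    return num_bytes * 8 / (uplink_mbps * 1e6) * 1000


def best_split(edge_ms, cloud_ms, out_bytes, input_bytes, uplink_mbps):
    """Return (k, latency_ms) for the split that minimizes end-to-end latency.

    Layers 0..k-1 run on the edge and layers k..n-1 in the cloud.
    k = 0 sends raw pixels; k = n runs everything on the edge and sends nothing.
    """
    n = len(edge_ms)
    best_k, best_ms = None, float("inf")
    for k in range(n + 1):
        latency = sum(edge_ms[:k])
        if k < n:
            sent = input_bytes if k == 0 else out_bytes[k - 1]
            latency += transfer_ms(sent, uplink_mbps) + sum(cloud_ms[k:])
        if latency < best_ms:
            best_k, best_ms = k, latency
    return best_k, best_ms


# Hypothetical 3-layer profile: per-layer latency (ms) on each side and bytes out of each layer.
edge_ms = [5, 20, 40]
cloud_ms = [1, 2, 4]
out_bytes = [200_000, 20_000, 1_000]
print(best_split(edge_ms, cloud_ms, out_bytes, input_bytes=300 * 300 * 3, uplink_mbps=10))
```

With these made-up numbers the best choice is k = 2 (about 45 ms), ahead of all-edge (65 ms) and sending raw pixels (about 223 ms). The answer depends entirely on the per-layer profile, so a new model version (v1 to v2) needs re-profiling and can move the split, which is the maintenance burden behind the advice above.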

Page 30

Idea: Keep Vision DNNs Intact

[Figure: an edge encoder sends data to a decoder that reconstructs pixels for an off-the-shelf, black-box DNN behind an API (Google, FB), which predicts]

Benefit: Extends beyond video or DNNs (e.g., robotic map-making)

Page 31

Learning-based Approach

[Figure: DNN, edge, cloud]

Page 32

System Architecture

[Figure: the edge sends coded features of its video to a decoder, which produces a pixel estimate for an off-the-shelf DNN to predict on; a feedback reward is used during training]

Page 33

Many Open Questions

• Machines (DNNs) will watch most future video

• Research Avenues:
  • Small-scale RL simulations [HotNets ’18]
  • Practical systems prototype [Under review]
  • Active Learning to query the cloud [Under review]
  • Deep RL with Real Vision DNNs: next!

Page 34

Simplified Systems Prototype

[Figure: DNN, edge, cloud]

Page 35

1. Active Edge Encoders

Dynamically encode task-relevant content: modify sub-image resolution, crop regions, “machine” features, …

[Figure: an edge device encodes its video into coded features for the cloud DNN and receives feature feedback]
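One simple form of active encoding keeps a region of interest at full resolution and sends the rest of the frame coarsely. A minimal sketch on a grayscale frame stored as nested lists; the region of interest (which feature feedback from the cloud might supply) and the subsampling factor are hypothetical, and plain subsampling stands in for whatever encoding a real system uses.

```python
def encode_frame(frame, roi, factor):
    """Keep the region of interest at full resolution and subsample the whole frame coarsely.

    frame:  2D list of grayscale values (rows of equal length).
    roi:    (top, left, bottom, right) with half-open bounds.
    factor: keep every factor-th row and column for the coarse background.
    Returns (roi_crop, coarse_background).
    """
    top, left, bottom, right = roi
    crop = [row[left:right] for row in frame[top:bottom]]
    coarse = [row[::factor] for row in frame[::factor]]
    return crop, coarse


frame = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 frame
crop, coarse = encode_frame(frame, roi=(2, 2, 4, 4), factor=4)
sent = sum(len(row) for row in crop) + sum(len(row) for row in coarse)
print(crop, coarse, f"{sent} of 64 values sent")
```

For this 8x8 frame, a 2x2 crop plus a 4x-subsampled background sends 8 values instead of 64; the decoder on the other side must then fill in what was not sent.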

Page 36

2. Centralized Active Decoder

Estimate edge scenes and “fill in” missing pixels with memory.

[Figure: codes from camera 1 and camera 2 enter a stateful decoder, whose pixel estimates feed a DNN that predicts]
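The simplest stateful decoder keeps its last estimate and overwrites only the pixels that arrive in the current step, so missing pixels are filled from memory. This is a hypothetical sketch (`StatefulDecoder` is an illustrative name); a learned decoder would predict missing content rather than just hold the last received value.

```python
class StatefulDecoder:
    """Keeps a running pixel estimate; pixels not received this step are filled from memory."""

    def __init__(self, height, width, init=0):
        self.estimate = [[init] * width for _ in range(height)]

    def update(self, received):
        """received: dict mapping (row, col) -> pixel value for pixels sent this step.

        Returns a copy of the current full-frame estimate.
        """
        for (r, c), value in received.items():
            self.estimate[r][c] = value
        return [row[:] for row in self.estimate]


decoder = StatefulDecoder(2, 2)
print(decoder.update({(0, 0): 5}))  # [[5, 0], [0, 0]]
print(decoder.update({(1, 1): 7}))  # [[5, 0], [0, 7]]: pixel (0, 0) is remembered
```

Returning a copy keeps downstream consumers from mutating the decoder's memory.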

Page 37

3. Feature Feedback from the Cloud

What content matters? Content priorities, camera angle, …

[Figure: the cloud DNN predicts on pixels decoded from the edge device's codes and sends feature feedback to the edge device]

Page 38

Mobile Offloading for Vision

[Figure: audio and video on the edge run through MobileNet; an AI offloader chooses between “Don’t Offload” (low-latency edge result) and “Offload” over a bandwidth-limited link to the cloud model (accurate cloud result)]

AI Offloader:
• New content?
• BW sufficient?
• Edge correct?

1.2-2.1x accuracy of all-edge, 60-90% BW savings compared to all-cloud
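The three questions the AI offloader asks can be written as a hand-coded rule for illustration. This is a hypothetical sketch, not the learned offloader from the talk: `FrameInfo`, the novelty score, and all thresholds are assumptions, and edge confidence stands in for "edge correct" because ground truth is unavailable at run time.

```python
from dataclasses import dataclass


@dataclass
class FrameInfo:
    novelty: float          # hypothetical score in [0, 1]: how different from the last offloaded frame
    edge_confidence: float  # edge model's confidence in its own prediction
    frame_bits: int         # size of the frame if offloaded
    available_bps: float    # current uplink estimate in bits per second


def should_offload(info, novelty_min=0.2, conf_min=0.8, deadline_s=0.1):
    """Offload only new content that fits the bandwidth budget and that the edge likely gets wrong."""
    new_content = info.novelty >= novelty_min
    bw_sufficient = info.frame_bits / info.available_bps <= deadline_s
    edge_correct = info.edge_confidence >= conf_min
    return new_content and bw_sufficient and not edge_correct


print(should_offload(FrameInfo(0.5, 0.6, 400_000, 10e6)))    # True: new, fits in 40 ms, edge unsure
print(should_offload(FrameInfo(0.5, 0.95, 400_000, 10e6)))   # False: edge is confident
print(should_offload(FrameInfo(0.5, 0.6, 4_000_000, 10e6)))  # False: 400 ms transfer misses the deadline
```

A fixed rule like this needs hand-tuned thresholds; learning the decision instead lets the offloader adapt as the edge model (e.g., MobileNet v1 vs. v2) or the network changes.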

Page 40

Results: Mobile Offloading for Vision

1. Trade off accuracy for BW savings
2. Adapt to edge model accuracy

Results (normalized to all-cloud):
1. 60-90% BW savings
2. 80-90% accuracy of oracle
3. 1.2-2.1x accuracy of all-edge

[Figure: accuracy with edge MobileNet v1 and v2]

Page 41

Insight: Bandwidth and Task-Aware Delivery

1. Human Eye -> High Bandwidth
2. All-edge/All-cloud restrictive
3. Use Off-the-Shelf DNNs

[Figure: feature extractor/filter, decoder/estimator, black-box DNN]

Page 42

Problem Insights

1. Human Eye -> High Bandwidth
2. All-edge/All-cloud restrictive
3. Use Off-the-Shelf DNNs

Proposal: Bandwidth and Task-Aware Video Delivery

[Figure: machine perception]

Page 43

Deep Dive into Components

[Figure: edge, cloud]

Page 44

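1. Distributed Edge Encoders

[Figure: each edge device $i$ encodes its input at time $t$ with an encoder $f_{\text{encode}}(\cdot)$ and sends the coded features over a wireless network to the data center, which returns feature feedback]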

Page 45

2. Centralized Active Decoder

[Figure: inputs from two edge devices reach the decoder in the data center; its estimate is passed to a pretrained model that predicts]

$\text{estimate}_t = f_{\text{decode}}(\text{estimate}_{t-1}, \ldots)$

$\text{prediction}_t = f_{\text{pretrain}}(\text{estimate}_t)$

Page 46

3. Feature Feedback from the Cloud

[Figure: in the cloud, an active decoder $f_{\text{decode}}$ and a pretrained model produce predictions; feature feedback is sent back to the edge device]