Distributed Edge Learning for Big Data Analytics:Challenges and Trends
Song Guo The Hong Kong Polytechnic University
[email protected]
https://www4.comp.polyu.edu.hk/~cssongguo/
Acknowledgement
RGC Research Impact Fund (RIF), “Edge Learning: the Enabling Technology for Distributed Big Data Analytics in Cloud-Edge Environment”, 2020-2025, Project Coordinator, HKD 7,640,000.
Outline
• Background and Preliminaries
• Challenge Analysis
• Approaches and Results
• Future Directions
Booming Era of Intelligence
Smart Home · Self-Driving · Smart Health · Smart Grid · Multimedia Service · Smart Surveillance
Smart Decision Making, Automation, and Optimization: Big Data + AI
From Cloud Intelligence to Edge Intelligence
• Cloud intelligence: model training & inference in the cloud.
• Edge intelligence: model training & inference at both the cloud and the edge.
• Save bandwidth: 847 ZB vs. 19.5 ZB per year
• Reduce latency: from s-level to ms/μs-level
• Ensure privacy
Distributed Machine Learning
Single-machine training (weakness: slow)
• A single machine has limited hardware resources.
• The training data set is large.
• The machine learning model is large.
Multiple-machine training (advantage: fast)
• Train the model on multiple machines in parallel.
Challenge: how to configure these machines?
Distributed Edge Learning
[Figure: Workers 1–4 exchanging updates with parameter Servers 1–2]
• From distributed machine learning in a data center to distributed machine learning with edge devices.
• The principle of edge computing naturally facilitates distributed edge learning by leveraging edge/device resources.
Basic Algorithm and Architecture
• Gradient aggregation and model synchronization: Stochastic Gradient Descent (SGD)
• Architecture: Parameter Server
Synchronization Methods
• BSP (Bulk Synchronous Parallel): models at all workers are strictly synchronized in every iteration.
• ASP (Asynchronous Parallel): workers update the global model asynchronously, subject to a given staleness threshold.
[Figure: BSP vs. ASP iteration timelines]
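The parameter-server workflow above (local SGD gradients, server-side aggregation) can be sketched as a minimal single-process BSP simulation. This is an illustrative sketch with hypothetical function names, not the talk's implementation; in practice the workers run in parallel on separate edge devices.

```python
import numpy as np

def local_gradient(w, X, y):
    """Least-squares gradient computed on one worker's local data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def bsp_round(w, shards, lr=0.1):
    """One BSP iteration: every worker sends its gradient,
    the parameter server averages them and updates the model."""
    grads = [local_gradient(w, X, y) for X, y in shards]  # parallel in practice
    return w - lr * np.mean(grads, axis=0)                # server update

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
# Four edge workers, each holding a private data shard
shards = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    shards.append((X, X @ w_true))

w = np.zeros(2)
for _ in range(200):
    w = bsp_round(w, shards)
print(np.round(w, 3))  # converges toward w_true
```

An ASP variant would apply each worker's gradient to the global model as soon as it arrives, trading synchronization delay for gradient staleness.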
Outline
• Background and Preliminaries
• Challenge Analysis
• Approaches and Results
• Future Directions
Challenges
[Figure: data streams flow from edge devices through the network to the cloud]
• Hard to train: constrained communication, constrained computation, unstable environment.
• Hard to customize: hardware diversity, data heterogeneity, QoS diversity.
• Hard to sustain: no incentive for participation.
• Hard to protect: vulnerable edge devices, privacy of data.
It is quite challenging to realize distributed edge learning in an efficient, secure, sustainable, and customized manner due to the inherent characteristics of the cloud-edge environment.
Hard to Train: Constrained Resources
• Limited bandwidth vs. increasing communication overhead
• Limited power, computing capacity, and memory space
[Figure: image throughput for the Inception-V3 model on CPU & GPU (AI Benchmark); growth prediction by Deloitte Insights]
Hard to Train: Unstable Environment
• Unguaranteed network Quality of Service (QoS)
• Unstable runtime on the edge device (computation/OS/battery/mobility)
Hard to Protect: Security
• The cloud-edge environment suffers from malicious attacks.
[Figure: clients upload local updates ΔW1 … ΔWN to the server's global model; poisoned training data turns it into an attacked model]
• More malicious workers, lower accuracy; higher poisoning rate, higher error.
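One common line of defense against poisoned updates (an illustrative technique, not necessarily the mechanism in the works cited later) is robust aggregation: the server replaces the plain mean with a coordinate-wise median, so a minority of malicious gradients cannot drag the global update arbitrarily far. A minimal sketch:

```python
import numpy as np

def mean_aggregate(updates):
    """Plain averaging: a single outlier shifts the result arbitrarily."""
    return np.mean(updates, axis=0)

def median_aggregate(updates):
    """Coordinate-wise median: robust while honest workers form a majority."""
    return np.median(updates, axis=0)

honest = [np.array([1.0, 1.0]) for _ in range(4)]
poisoned = [np.array([100.0, -100.0])]   # one malicious worker
updates = honest + poisoned

print(mean_aggregate(updates))    # dragged far from the honest value
print(median_aggregate(updates))  # stays at the honest value [1. 1.]
```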
Hard to Protect: Privacy
• Two mainstream approaches for privacy protection in cloud-based learning:
• Differential privacy: add a perturbation to the gradient, at the cost of a degraded convergence rate.
• Homomorphic encryption: execute on encrypted data, with high complexity and support for only simple operations.
• How to design lightweight mechanisms to protect data privacy on vulnerable and resource-constrained edge devices in distributed edge learning?
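The gradient-perturbation approach above is typically realized as per-example clipping followed by Gaussian noise, in the style of DP-SGD. A hedged sketch (the clip norm and noise multiplier are illustrative parameters, not values from the talk):

```python
import numpy as np

def dp_gradient(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip each example's gradient to norm <= clip_norm, average them,
    then add Gaussian noise scaled to the clipping bound (DP-SGD style)."""
    rng = rng or np.random.default_rng()
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(clipped),
                       size=mean.shape)
    return mean + noise

grads = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]  # norms 5.0 and ~0.22
g = dp_gradient(grads, clip_norm=1.0, rng=np.random.default_rng(0))
# every clipped gradient has norm <= 1, so the noisy mean stays bounded
```

The added noise is what degrades the convergence rate noted above: privacy budget and model accuracy must be traded off.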
Hard to Sustain: Lack of Participation
• Clients trade their own resource consumption and data assets for training, but will not participate without incentives.
• Hard to avoid malicious behavior.
• Hard to quantify the contributions of different clients (e.g., Client A vs. Client B).
• Hard to guarantee the sustainability of contributions over time under different learning strategies.
• How to build a benign ecosystem for the sustainable development of edge learning?
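The "Client A vs. Client B" contribution problem is often approximated with leave-one-out (or, more generally, Shapley-style) evaluation: measure how much utility is lost when a client is removed. The sketch below is a toy illustration; `evaluate()` is a hypothetical stand-in for a real validation-set evaluation.

```python
def evaluate(clients):
    """Toy utility: the amount of non-redundant data the clients hold.
    An illustrative stand-in for evaluating a trained model on a
    validation set."""
    return len(set().union(*(clients[c] for c in clients))) if clients else 0

def leave_one_out(clients):
    """Credit each client with the utility lost when it is removed."""
    full = evaluate(clients)
    return {c: full - evaluate({k: v for k, v in clients.items() if k != c})
            for c in clients}

clients = {
    "A": {1, 2, 3, 4},  # unique, useful data
    "B": {3, 4},        # fully redundant with A
}
print(leave_one_out(clients))  # A gets credit, B gets none
```

Payments proportional to such contribution scores are one basis for the incentive mechanisms listed in the approaches slide.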
Hard to Customize: Edge Diversity
Hard to adapt to different hardware and environments:
• Hardware diversity
• Data heterogeneity
• QoS diversity
Outline
• Background and Preliminaries
• Challenge Analysis
• Approaches and Results
• Future Directions
Approaches
• Hard to train / constrained resources: OSP [Wang@ICPP’19], Falcon [Zhou@ICDCS’19], Petrel [Zhou@ICDCS’20], [Zhou@TC’20], [Zhou@ICDCS’19], [Zhang@UbiComp’20]
• Hard to train / unstable environment: heterogeneity-aware gradient coding [Wang@ICDCS’19], L4L [Zhan@IPDPS’20]
• Hard to protect / security guarantee: DGSB [Wang@20], [Yu@TBD'20]; privacy protection: [Du@TSC'19], [Yu@CPS'18]
• Hard to sustain / lack of participation: incentive mechanism with DRL [Zhan@INFOCOM’20], [Zhan@IoT’20], [Zhan@NETWORK’19], [Zhan@NETWORK’20]
Critical Scientific Issues
• Performance: how to improve learning performance under system constraints (e.g., high communication cost, device-level resource constraints)?
• Security and Privacy: how to enhance data/model privacy and how to design lightweight security mechanisms?
• Incentive: how to stimulate effective and efficient collaboration among all organizations of the federated learning ecosystem?
Goals: Fast, Secure, Win-Altogether
Hardware Diversity
• Edge devices are equipped with various processing units of different computing capabilities: CPU (Central Processing Unit), GPU (Graphics Processing Unit), NPU (Neural Processing Unit).
• How to make edge learning adapt to different hardware environments, balancing flexibility and efficiency?
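One simple way to accommodate such hardware diversity (an illustrative sketch, not a method from the talk) is to size each device's local workload in proportion to its measured throughput, so CPU, GPU, and NPU workers finish a training round at roughly the same time. The throughput figures below are made up for illustration.

```python
def allocate_batches(throughputs, global_batch):
    """Split a global batch proportionally to device throughput
    (e.g., images/sec), so all devices finish a round together."""
    total = sum(throughputs.values())
    alloc = {d: round(global_batch * t / total)
             for d, t in throughputs.items()}
    # Fix rounding drift so the shares sum exactly to global_batch
    drift = global_batch - sum(alloc.values())
    fastest = max(throughputs, key=throughputs.get)
    alloc[fastest] += drift
    return alloc

devices = {"cpu": 20.0, "gpu": 200.0, "npu": 180.0}  # illustrative img/s
print(allocate_batches(devices, 256))
```

Under BSP, this kind of capability-aware allocation reduces the straggler effect caused by the slowest device.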
Data Heterogeneity
• Data distributions across workers (e.g., Datasets 1–4 at different edges) are not identical.
• Applying the same learning strategy to all workers fails to work efficiently.
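Non-identical worker distributions of this kind are commonly simulated with a Dirichlet label split (a standard benchmarking trick, not taken from the talk): a small concentration parameter alpha gives each worker a heavily skewed label mix.

```python
import numpy as np

def dirichlet_partition(labels, num_workers, alpha=0.5, seed=0):
    """Assign sample indices to workers using per-class Dirichlet shares.
    Smaller alpha -> more skewed (more heterogeneous) label mixes."""
    rng = np.random.default_rng(seed)
    workers = [[] for _ in range(num_workers)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        shares = rng.dirichlet([alpha] * num_workers)
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for w, part in enumerate(np.split(idx, cuts)):
            workers[w].extend(part.tolist())
    return workers

labels = np.repeat([0, 1, 2], 100)       # 3 classes, 100 samples each
parts = dirichlet_partition(labels, num_workers=4, alpha=0.3)
print([len(p) for p in parts])           # uneven, skewed shards
```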
QoS Diversity
• Data are generated separately by the distributed edge devices, which means different training environments and different data qualities.
• But current edge learning systems usually use one model for all clients.
• Different clients have different measurements of QoS in terms of accuracy, latency, cost, etc.; one model will never be the optimal choice for all clients.
Wireless Optimization for Edge Learning
• How to jointly optimize the edge learning process and the wireless transmission across diverse modes of wireless communication over a complex network topology?
• Over-the-air computation and gradient compression: simultaneous transmission and decoding of the average/sum of gradients.
• Channel allocation and transmission scheduling in complex wireless environments: jointly optimize the training algorithm and the transmission scheduling.
• Synchronization mechanisms in multi-layer architectures: decentralized synchronization and layer-by-layer synchronization.
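Gradient compression, mentioned above, is commonly realized as top-k sparsification: transmit only the k largest-magnitude coordinates (with their indices) and keep the residual locally to be added into the next round's gradient. A minimal sketch, not tied to any specific system from the talk:

```python
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries; return (indices, values)
    plus the residual the sender accumulates for the next round."""
    idx = np.argsort(np.abs(grad))[-k:]
    values = grad[idx]
    residual = grad.copy()
    residual[idx] = 0.0
    return idx, values, residual

def topk_decompress(idx, values, dim):
    """Rebuild the sparse gradient on the receiver side."""
    out = np.zeros(dim)
    out[idx] = values
    return out

g = np.array([0.05, -3.0, 0.1, 2.0, -0.2])
idx, vals, res = topk_compress(g, k=2)
print(topk_decompress(idx, vals, len(g)))  # only the two largest survive
```

Because the residual is carried over rather than discarded, no gradient information is permanently lost, which is what keeps convergence close to the uncompressed baseline.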
Thank you!
Edge Learning: a revolutionary learning paradigm enabling ubiquitous intelligence!
[email protected]
https://www4.comp.polyu.edu.hk/~cssongguo/