Federated Learning
Min Du, Postdoc, UC Berkeley
cs294-163/fa19/slides/... · 2019-10-02

Page 1

Federated Learning

Min Du

Postdoc, UC Berkeley

Page 2

Outline

❑ Preliminary: deep learning and SGD

❑ Federated learning: FedSGD and FedAvg

❑ Related research in federated learning

❑ Open problems

Page 3

Outline

❑ Preliminary: deep learning and SGD

❑ Federated learning: FedSGD and FedAvg

❑ Related research in federated learning

❑ Open problems

Page 4

• Find a function that produces a desired output given a particular input.

Example task          | Given input                   | Desired output
Image classification  | (an image)                    | 8
Playing GO            | (a board position)            | next move
Next-word prediction  | "Looking forward to your ?"   | reply

𝑤 is the set of parameters contained by the function

The goal of deep learning

Page 5

• Given one input sample pair $(x_0, y_0)$, the goal of deep learning model training is to find a set of parameters $w$ that maximizes the probability of outputting $y_0$ given $x_0$.

Given input: $x_0$    Maximize: $p(5 \mid x_0, w)$

Finding the function: model training

Page 6

• Given a training dataset containing $n$ input-output pairs $(x_i, y_i)$, $i \in [1, n]$, the goal of deep learning model training is to find a set of parameters $w$ such that the average of $p(y_i \mid x_i, w)$ is maximized.


Finding the function: model training

Page 7

• That is,

$$\text{maximize} \quad \frac{1}{n}\sum_{i=1}^{n} p(y_i \mid x_i, w)$$

which is equivalent to

$$\text{minimize} \quad \frac{1}{n}\sum_{i=1}^{n} -\log\, p(y_i \mid x_i, w)$$

A basic component for the loss function $l(x_i, y_i, w)$ given sample $(x_i, y_i)$: $-\log\, p(y_i \mid x_i, w)$.

Let $f_i(w) = l(x_i, y_i, w)$ denote the loss function.
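To make this concrete, here is a minimal sketch (an illustration, not code from the slides) of the per-sample negative log-likelihood for a linear softmax classifier in NumPy; the parameter layout of `w` and the helper names are assumptions made for the example.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over class scores
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def per_sample_loss(w, x_i, y_i):
    """l(x_i, y_i, w) = -log p(y_i | x_i, w) for a linear softmax classifier."""
    scores = w @ x_i            # w: (num_classes, num_features), x_i: (num_features,)
    probs = softmax(scores)     # p(. | x_i, w)
    return -np.log(probs[y_i])  # negative log-likelihood of the true label y_i
```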

• Given a training dataset containing $n$ input-output pairs $(x_i, y_i)$, $i \in [1, n]$, the goal of deep learning model training is to find a set of parameters $w$ such that the average of $p(y_i \mid x_i, w)$ is maximized.

Finding the function: model training

Page 8

Deep learning model training

For a training dataset containing $n$ samples $(x_i, y_i)$, $1 \le i \le n$, the training objective is:

$$\min_{w \in \mathbb{R}^d} f(w) \quad \text{where} \quad f(w) \triangleq \frac{1}{n}\sum_{i=1}^{n} f_i(w)$$

$f_i(w) = l(x_i, y_i, w)$ is the loss of the prediction on example $(x_i, y_i)$.

No closed-form solution: in a typical deep learning model, $w$ may contain millions of parameters.

Non-convex: multiple local minima exist.

[Figure: a non-convex loss $f(w)$ plotted over $w$, with multiple local minima.]

Page 9

Solution: Gradient Descent

[Figure: loss $f(w)$ plotted against $w$; starting from a randomly initialized weight $w$, follow the negative gradient downhill.]

Compute the gradient $\nabla f(w)$ and update:

$$w_{t+1} = w_t - \eta \nabla f(w_t) \quad \text{(Gradient Descent)}$$

At the local minimum, ∇𝑓(𝑤) is close to 0.

Learning rate 𝜂 controls the step size

How to stop? When the update is small enough, i.e., the model has converged:

$$\| w_{t+1} - w_t \| \le \epsilon \quad \text{or} \quad \| \nabla f(w_t) \| \le \epsilon$$

Problem: usually the number of training samples $n$ is large, leading to slow convergence.

Page 10

Solution: Stochastic Gradient Descent (SGD)

● At each step of gradient descent, instead of computing the gradient over all training samples, randomly pick a small subset (mini-batch) of training samples $(x_b, y_b)$.

● Compared to gradient descent, SGD takes more steps to converge, but each step is much faster.

$$w_{t+1} \leftarrow w_t - \eta \nabla f(w_t;\, x_b, y_b)$$
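As an illustration of the update rule above, a minimal NumPy sketch of SGD on random mini-batches; the `grad_loss` function computing $\nabla f(w; x_b, y_b)$ is an assumed helper, not something defined in the slides.

```python
import numpy as np

def sgd_step(w, x_batch, y_batch, grad_loss, lr=0.01):
    """One SGD update: w_{t+1} <- w_t - eta * grad f(w_t; x_b, y_b)."""
    return w - lr * grad_loss(w, x_batch, y_batch)   # gradient on the mini-batch only

def sgd(w, X, y, grad_loss, lr=0.01, batch_size=32, steps=1000):
    """Repeat SGD steps on randomly sampled mini-batches."""
    n = len(X)
    for _ in range(steps):
        idx = np.random.choice(n, size=batch_size, replace=False)  # random mini-batch
        w = sgd_step(w, X[idx], y[idx], grad_loss, lr)
    return w
```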

Page 11

Outline

❑ Preliminary: deep learning and SGD

❑ Federated learning: FedSGD and FedAvg

❑ Related research in federated learning

❑ Open problems

Page 12

"The biggest obstacle to using advanced data analysis isn’t skill base or technology; it’s plain old access to the data."

– Edd Wilder-James, Harvard Business Review

The importance of data for ML

Page 13

“Data is the New Oil”

Page 14

Google, Apple, ......

ML model

Private data: all the photos a user takes and everything they type on their mobile keyboard, including passwords, URLs, messages, etc.

Image classification: e.g., to predict which photos are most likely to be viewed multiple times in the future;

Language models: e.g., voice recognition, next-word prediction, and auto-reply in Gmail.

Page 15

Google, Apple, ......

Instead of uploading the raw data, train a model locally and upload the model.

Addressing privacy: model parameters will never contain more information than the raw training data.

Addressing network overhead: the size of the model is generally smaller than the size of the raw training data.

[Diagram: each device trains an ML model locally and uploads it to the server for MODEL AGGREGATION.]

Page 16

Federated optimization

● Characteristics (major challenges)

○ Non-IID

■ The data generated by each user are quite different

○ Unbalanced

■ Some users produce significantly more data than others

○ Massively distributed

■ # mobile device owners >> avg # training samples on each device

○ Limited communication

■ Unstable mobile network connections

Page 17

A new paradigm – Federated Learning

a synchronous update scheme that proceeds in rounds of communication

McMahan, H. Brendan, Eider Moore, Daniel Ramage, and Seth Hampson. "Communication-efficient learning of deep networks from decentralized data." AISTATS, 2017.

Page 18

[Diagram, round i: the central server holds the global model M(i) and sends it to the clients; each client holds its local data and a copy of model M(i), and computes gradient updates for M(i).]

Deployed by Google, Apple, etc.

Federated learning – overview

Page 19

[Diagram, round i: each client sends its gradient updates for M(i) to the central server, which performs Model Aggregation to produce M(i+1).]

Federated learning – overview

Page 20

[Diagram, round i+1: the central server broadcasts the new global model M(i+1) to the clients, and the process continues.]

Federated learning – overview

Page 21

Federated learning – detail

For efficiency, at the beginning of each round, a random fraction C of clients is selected, and the server sends the current model parameters to each of these clients.

Page 22

Federated learning – detail

● Recall in traditional deep learning model training

○ For a training dataset containing $n$ samples $(x_i, y_i)$, $1 \le i \le n$, the training objective is:

$$\min_{w \in \mathbb{R}^d} f(w) \quad \text{where} \quad f(w) \triangleq \frac{1}{n}\sum_{i=1}^{n} f_i(w)$$

$f_i(w) = l(x_i, y_i, w)$ is the loss of the prediction on example $(x_i, y_i)$.

○ Deep learning optimization relies on SGD and its variants, through mini-batches:

$$w_{t+1} \leftarrow w_t - \eta \nabla f(w_t;\, x_b, y_b)$$

Page 23

Federated learning – detail

● In federated learning

○ Suppose the $n$ training samples are distributed across $K$ clients, where $P_k$ is the set of indices of data points on client $k$, and $n_k = |P_k|$.

○ The training objective becomes:

$$\min_{w \in \mathbb{R}^d} f(w), \quad f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w) \quad \text{where} \quad F_k(w) \triangleq \frac{1}{n_k}\sum_{i \in P_k} f_i(w)$$
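For concreteness, a small sketch (assuming a hypothetical `client_data` layout and a `per_sample_loss` helper, neither of which comes from the slides) of how the federated objective weights each client's average loss $F_k(w)$ by its share of the data:

```python
import numpy as np

def federated_objective(w, client_data, per_sample_loss):
    """f(w) = sum_k (n_k / n) * F_k(w), where F_k is the average loss on client k's data.

    client_data: list of (X_k, y_k) arrays, one pair per client (a hypothetical layout).
    per_sample_loss: function computing l(x_i, y_i, w) for a single example.
    """
    n = sum(len(X_k) for X_k, _ in client_data)          # total number of samples
    f = 0.0
    for X_k, y_k in client_data:
        n_k = len(X_k)
        F_k = np.mean([per_sample_loss(w, x, y) for x, y in zip(X_k, y_k)])  # F_k(w)
        f += (n_k / n) * F_k                              # weight by the client's share of data
    return f
```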

Page 24

A baseline – FederatedSGD (FedSGD)

● A randomly selected client that has $n_k$ training data samples in federated learning ≈ a randomly selected sample in traditional deep learning.

● Federated SGD (FedSGD): a single step of gradient descent is done per round

● Recall in federated learning, a C-fraction of clients are selected at each round.

○ C=1: full-batch (non-stochastic) gradient descent

○ C<1: stochastic gradient descent (SGD)

Page 25

A baseline – FederatedSGD (FedSGD)

Learning rate: $\eta$; total #samples: $n$; total #clients: $K$; #samples on client $k$: $n_k$; client fraction $C = 1$.

● In round $t$:

○ The central server broadcasts the current model $w_t$ to each client; each client $k$ computes the gradient $g_k = \nabla F_k(w_t)$ on its local data.

■ Approach 1: Each client $k$ submits $g_k$; the central server aggregates the gradients to generate a new model:

● $w_{t+1} \leftarrow w_t - \eta \nabla f(w_t) = w_t - \eta \sum_{k=1}^{K} \frac{n_k}{n} g_k$

■ Approach 2: Each client $k$ computes $w_{t+1}^k \leftarrow w_t - \eta g_k$; the central server performs aggregation:

● $w_{t+1} \leftarrow \sum_{k=1}^{K} \frac{n_k}{n} w_{t+1}^k$. Repeating the local update multiple times before aggregation ⟹ FederatedAveraging (FedAvg).

Recall $f(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w)$.
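To see why the two approaches coincide for a single local step, here is a minimal NumPy sketch of one FedSGD round with C = 1; `client_grad`, which returns $g_k = \nabla F_k(w_t)$ on a client's local data, is an assumed helper rather than code from the paper:

```python
import numpy as np

def fedsgd_round(w, clients, client_grad, lr=0.1):
    """One FedSGD round with C = 1 (all clients participate).

    clients: list of (X_k, y_k) local datasets (hypothetical layout).
    client_grad: function returning g_k = grad F_k(w) on one client's data.
    """
    n = sum(len(X_k) for X_k, _ in clients)

    # Approach 1: clients send gradients, the server applies one aggregated step.
    agg_grad = sum((len(X_k) / n) * client_grad(w, X_k, y_k) for X_k, y_k in clients)
    w_next_1 = w - lr * agg_grad

    # Approach 2: clients take one local step, the server averages the resulting models.
    local_models = [w - lr * client_grad(w, X_k, y_k) for X_k, y_k in clients]
    w_next_2 = sum((len(X_k) / n) * w_k for (X_k, _), w_k in zip(clients, local_models))

    # For a single local step, the two approaches produce the same model.
    assert np.allclose(w_next_1, w_next_2)
    return w_next_1
```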

Page 26

Federated learning – deal with limited communication

● Increase computation

○ Select more clients for training between each communication round

○ Increase computation on each client

Page 27

Federated learning – FederatedAveraging (FedAvg)

Learning rate: $\eta$; total #samples: $n$; total #clients: $K$; #samples on client $k$: $n_k$; client fraction $C$.

● In round $t$:

○ The central server broadcasts the current model $w_t$ to each client; each client $k$ computes the gradient $g_k = \nabla F_k(w_t)$ on its local data.

■ Approach 2:

● Each client $k$ runs local updates for $E$ epochs: $w_{t+1}^k \leftarrow w_t - \eta g_k$

● The central server performs aggregation: $w_{t+1} \leftarrow \sum_{k=1}^{K} \frac{n_k}{n} w_{t+1}^k$

● With local mini-batch size $B$, the number of local updates on client $k$ in each round is $u_k = E \cdot \frac{n_k}{B}$.

Page 28

Federated learning – FederatedAveraging (FedAvg)

Model initialization

● Two choices:

○ On the central server

○ On each client

The loss on the full MNIST training set for models generated by $\theta w + (1-\theta) w'$.

Shared initialization works better in practice.

Page 29

Federated learning – FederatedAveraging (FedAvg)

Model averaging

● As shown in the right figure:

The loss on the full MNIST training set for models generated by $\theta w + (1-\theta) w'$.

In practice, naïve parameter averaging works surprisingly well.

Page 30

Federated learning – FederatedAveraging (FedAvg)

1. At first, a model is randomly initialized on the central server.

2. For each round t:
   i. A random set of clients is chosen;
   ii. Each client performs local gradient descent steps;
   iii. The server aggregates the model parameters submitted by the clients.
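Putting these steps together, a compact sketch of the FedAvg loop under the notation above (client fraction C, E local epochs, mini-batch size B); the helper names and data layout are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def client_update(w, X_k, y_k, grad_loss, lr, epochs, batch_size):
    """Run E local epochs of mini-batch SGD starting from the global model w."""
    w = w.copy()
    n_k = len(X_k)
    for _ in range(epochs):
        perm = np.random.permutation(n_k)
        for start in range(0, n_k, batch_size):
            idx = perm[start:start + batch_size]
            w -= lr * grad_loss(w, X_k[idx], y_k[idx])
    return w

def fedavg(w, clients, grad_loss, rounds=100, frac_c=0.1, lr=0.1, epochs=5, batch_size=10):
    """FedAvg: sample a fraction C of clients per round, then average their models by data size."""
    K = len(clients)
    for _ in range(rounds):
        m = max(int(frac_c * K), 1)                       # number of clients this round
        chosen = np.random.choice(K, size=m, replace=False)
        n_round = sum(len(clients[k][0]) for k in chosen)
        # weighted average of the locally trained models
        w = sum((len(clients[k][0]) / n_round) *
                client_update(w, clients[k][0], clients[k][1], grad_loss, lr, epochs, batch_size)
                for k in chosen)
    return w
```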

Page 31

Federated learning – Evaluation

● #clients: 100
● Dataset: MNIST
  ○ IID: random partition
  ○ Non-IID: each client only contains two digits
  ○ Balanced

[Table: #rounds required by FedSGD and FedAvg to achieve a target accuracy on the test dataset, for varying client fraction C.]

Image classification

Impact of varying C

In general, the higher C is, the fewer rounds are needed to reach the target accuracy.

Page 32

Federated learning – Evaluation

● Dataset from: The Complete Works of Shakespeare
  ○ #clients: 1146, each corresponding to a speaking role
  ○ Unbalanced: different #lines for each role
  ○ Train-test split ratio: 80% - 20%
  ○ A balanced and IID dataset with 1146 clients is also constructed

● Task: next character prediction

● Model: character-level LSTM language model

Language modeling

Page 33

Federated learning – Evaluation

● The effect of increasing computation in each round (decrease B / increase E)

● Fix C=0.1

Image classification

In general, the more computation in each round, the faster the model trains. FedAvg also converges to a higher test accuracy (B=10, E=20).

Page 34

Federated learning – Evaluation

Language modeling

● The effect of increasing computation in each round (decrease B / increase E)

● Fix C=0.1

In general, the more computation in each round, the faster the model trains. FedAvg also converges to a higher test accuracy (B=10, E=5).

Page 35

Federated learning – Evaluation

● What if we maximize the computation on each client? 𝐸 → ∞

Best performance may be achieved in earlier rounds; increasing #rounds does not improve it further.

Best practice: decay the amount of local computation as the model gets close to convergence.

Page 36

Federated learning – Evaluation

● #clients: 100
● Dataset: CIFAR-10
  ○ IID: random partition
  ○ Non-IID: each client only contains two classes
  ○ Balanced

Image classification

Page 37

Federated learning – Evaluation

● Dataset from: 10 million public posts from a large social network
  ○ #clients: 500,000, each corresponding to an author

● Task: next word prediction

● Model: word-level LSTM language model

Language modeling

A real-world problem

200 clients per round; B=8, E=1

Page 38

Outline

❑ Preliminary: deep learning and SGD

❑ Federated learning: FedSGD and FedAvg

❑ Related research in federated learning

❑ Open problems

Page 39

Federated learning – related research

Data poisoning attacks. How to backdoor federated learning, arXiv:1807.00459.

Secure aggregation. Practical Secure Aggregation for Privacy-Preserving Machine Learning, CCS’17

Client-level differential privacy. Differentially Private Federated Learning: A Client-level Perspective, ICLR’19

Decentralize the central server via blockchain.

Google FL Workshop: https://sites.google.com/view/federated-learning-2019/home

Page 40

HiveMind: Decentralized Federated Learning

[Diagram: clients with local data connect to the HiveMind smart contract on the Oasis Blockchain Platform; aggregated noise yields a differentially private global model. Building blocks: differential privacy (DP noise), secure aggregation, model encryption.]

Page 41

HiveMind: Decentralized Federated Learning

[Diagram, round i: the HiveMind smart contract on the Oasis Blockchain Platform holds the global model M(i); each client receives a copy of model M(i) and holds its local data.]

Page 42

HiveMind: Decentralized Federated Learning

[Diagram, round i: each client computes gradient updates for M(i) on its local data and adds DP noise before submission. Building blocks: differential privacy (DP noise), secure aggregation, model encryption.]

Page 43

HiveMind: Decentralized Federated Learning

[Diagram, round i (continued): the clients submit their noised gradient updates for M(i) to the HiveMind smart contract on the Oasis Blockchain Platform.]

Page 44

HiveMind: Decentralized Federated Learning

[Diagram, round i: the HiveMind smart contract performs Secure Aggregation over the clients' noised gradient updates for M(i).]

Page 45

HiveMind: Decentralized Federated Learning

[Diagram, round i: Secure Aggregation combines the clients' noised gradient updates for M(i); the DP noise is aggregated and the result forms the global model M(i+1).]

Page 46

HiveMind: Decentralized Federated Learning

[Diagram, round i+1: the HiveMind smart contract publishes the differentially private global model M(i+1) to the clients, and the process continues. Building blocks: differential privacy (DP noise), secure aggregation, model encryption.]

Page 47

HiveMind: Decentralized Federated Learning

[Diagram: overall HiveMind architecture, as on page 40: clients with local data, the HiveMind smart contract on the Oasis Blockchain Platform, aggregated noise, and a differentially private global model; differential privacy, secure aggregation, model encryption.]

Page 48

Outline

❑ Preliminary: deep learning and SGD

❑ Federated learning: FedSGD and FedAvg

❑ Related research in federated learning

❑ Open problems

Page 49

Federated learning – open problems

● Detecting data poisoning attacks while secure aggregation is in use.

● Asynchronous model updates in federated learning and their coexistence with secure aggregation.

● Further reducing communication overhead, e.g., through quantization.

● The use of differential privacy in each of the above settings.

● ……

Page 50

Min [email protected]

Thank you!