Homomorphic encryption for deep learning: a revolution in the making
Pascal Paillier
The problem
A booming industry.
[Figure: a user sends data to Cloud Inc., which runs a trained neural network on GPUs and returns a prediction]
The problem
[Figure: the same pipeline, but now the data sent to Cloud Inc. is sensitive, and so is the returned prediction]
The problem
[Figure: sensitive health data sent to Cloud Inc. for prediction: STDs, HIV, genomics, blood work, lifestyle tracking]
The problem
[Figure: sensitive financial data: fraud prevention, bank activity, transaction records, investments]
The problem
[Figure: sensitive government data: crime prevention, police/justice case solving, tax evasion]
Homomorphic encryption for neural networks
The big motivation for applying FHE
[Figure: the user sends Enc(data) to Cloud Inc.; the model runs on GPUs in the encrypted domain and Enc(prediction) is returned]
Input and output data are encrypted.
The model is evaluated in the encrypted domain = homomorphic inference.
Only the user has the key to encrypt and decrypt.
First generation FHE [Gentry09]
x, y ∈ {0,1}
Enc(x), Enc(y) → Enc(x ⊕ y)   (pretty fast)
Enc(x), Enc(y) → Enc(x ∧ y)   (super slow)
⊕ + ∧ = all functions
First generation FHE [Gentry09]
But there is a notion of noise in ciphertexts:
Enc(x), Enc(y) → Enc(x ⊕ y)   noise stays roughly the same
Enc(x), Enc(y) → Enc(x ∧ y)   noise doubles
If the noise exceeds a threshold, the ciphertext loses decryptability.
One must then resort to bootstrapping, a very slow noise-cleaning operation.
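To make this concrete, here is a minimal sketch in plain Python (no real cryptography; the class, the gate costs and the noise threshold are invented for illustration). It mimics the behaviour described above: XOR keeps the noise roughly constant, AND doubles it, decryption fails past a threshold, and bootstrapping resets the noise. It also shows why ⊕ and ∧ suffice for all functions, by deriving NOT and OR from them.

```python
NOISE_LIMIT = 16  # hypothetical decryptability threshold, for illustration only

class ToyCiphertext:
    def __init__(self, bit, noise=1):
        self.bit = bit        # the plaintext is kept around only to simulate decryption
        self.noise = noise

    def decrypt(self):
        if self.noise > NOISE_LIMIT:
            raise ValueError("noise exceeded the threshold: decryption fails")
        return self.bit

def xor(c1, c2):
    # homomorphic XOR: noise stays roughly the same
    return ToyCiphertext(c1.bit ^ c2.bit, max(c1.noise, c2.noise) + 1)

def and_(c1, c2):
    # homomorphic AND: noise doubles
    return ToyCiphertext(c1.bit & c2.bit, 2 * max(c1.noise, c2.noise))

def bootstrap(c):
    # very slow in a real scheme; resets the noise to a fresh, small level
    return ToyCiphertext(c.bit, noise=1)

# XOR and AND are functionally complete: NOT x = x XOR 1, OR via De Morgan.
ONE = ToyCiphertext(1, noise=0)

def not_(c):
    return xor(c, ONE)

def or_(c1, c2):
    return not_(and_(not_(c1), not_(c2)))

x, y = ToyCiphertext(1), ToyCiphertext(0)
c = or_(x, y)
for _ in range(3):            # chaining ANDs makes the noise blow up quickly
    c = and_(c, c)
print(c.noise > NOISE_LIMIT)  # True: no longer decryptable
c = bootstrap(c)
print(c.decrypt())            # 1
```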
The importance of multiplicative depth
[Figure: a computation expressed as a boolean circuit, from inputs to outputs]
Noise growth is exponential in the multiplicative depth of the circuit.
Somewhat HE and scale-invariant HE
FHE: + bootstrap, ✔ can go on forever
SHE: ++ noise growth, ✘ works only up to the circuit outputs
FHE = SHE + bootstrapping
Quick refresher on neural networks
Neural network = cognitive model
Artificial neurons:
y = ∑_{i=1}^{n} w_i x_i + b,   z = f(y)
f is the activation function
Dense layers (fully connected)
y_{L+1} = W_{L+1} x_L + b_{L+1},   x_{L+1} = f(y_{L+1})
W_{L+1} is a learnable weight matrix, b_{L+1} a learnable bias vector
✔ learnable
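As a plain (unencrypted) reference point, a short sketch of what the neuron and dense-layer formulas above compute, using NumPy and ReLU as an example choice of activation:

```python
import numpy as np

def relu(y):
    return np.maximum(0.0, y)   # example activation function

# Single neuron: y = sum_i w_i x_i + b, z = f(y)
def neuron(x, w, b, f=relu):
    return f(np.dot(w, x) + b)

# Dense layer: y_{L+1} = W_{L+1} x_L + b_{L+1}, x_{L+1} = f(y_{L+1})
def dense(x_L, W, b, f=relu):
    return f(W @ x_L + b)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.2, 0.3])
W = np.array([[0.1, 0.2, 0.3],
              [0.0, -0.5, 0.4]])
b = np.array([0.05, -0.1])

print(neuron(x, w, 0.05))   # one activation
print(dense(x, W, b))       # a vector of activations, one per output neuron
```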
Convolutional layers
[Figure: a learnable kernel slides over the input 2D image; with multiple kernels the output is a tensor]
✔ learnable
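A small NumPy sketch of the operation illustrated above: a single kernel sliding over a single-channel image (valid padding, stride 1), here with a 3x4 kernel like the one on the slide. Real convolutional layers stack many such kernels into an output tensor.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (really cross-correlation, as in most DL frameworks)
    of a single-channel image with a single kernel."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.randint(0, 10, size=(12, 12)).astype(float)
kernel = np.array([[-3., 0., 2., 8.],
                   [ 2., -1., -1., 4.],
                   [ 5., 5., 3., -6.]])   # a learnable 3x4 kernel, as on the slide
feature_map = conv2d(image, kernel)
print(feature_map.shape)   # (10, 9): the kernel visits every 3x4 window of the image
```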
Average and max pooling
[Figure: a 4x3 max pool applied to the 12x12 feature map]
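For reference, non-overlapping max and average pooling in NumPy (window dimensions are assumed to divide the input exactly); pooling a 12x12 map with 3x4 windows yields a 4x3 output:

```python
import numpy as np

def max_pool(x, pool_h, pool_w):
    """Non-overlapping max pooling of a 2D feature map."""
    h, w = x.shape
    x = x.reshape(h // pool_h, pool_h, w // pool_w, pool_w)
    return x.max(axis=(1, 3))

def avg_pool(x, pool_h, pool_w):
    """Non-overlapping average pooling of a 2D feature map."""
    h, w = x.shape
    x = x.reshape(h // pool_h, pool_h, w // pool_w, pool_w)
    return x.mean(axis=(1, 3))

fmap = np.random.randint(0, 10, size=(12, 12))
print(max_pool(fmap, 3, 4).shape)   # (4, 3)
print(avg_pool(fmap, 3, 4).shape)   # (4, 3)
```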
Flattening
Lossless dimension reduction
✘ not learnable
Layer-wide operations
✘ not learnable
softmax:   x_{L+1,i} = e^{x_{L,i}} / ∑_j e^{x_{L,j}}
L2 regularization:   x_{L+1} = x_L / ∥x_L∥₂
L1 regularization:   x_{L+1} = x_L / ∥x_L∥₁
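For reference, the layer-wide operations above in NumPy (softmax, and division by the L2 or L1 norm):

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; the result is unchanged mathematically
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def l2_normalize(x):
    return x / np.linalg.norm(x, ord=2)

def l1_normalize(x):
    return x / np.linalg.norm(x, ord=1)

x = np.array([2.0, -1.0, 0.5])
print(softmax(x))        # sums to 1
print(l2_normalize(x))   # unit L2 norm
print(l1_normalize(x))   # unit L1 norm
```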
So, how to evaluate a neural network?
[Figure: a network in which each neuron computes a weighted sum Σ followed by an activation f]
The weighted sums Σ are linear; the activations f are not.
And the real-valued computation must be mapped to the plaintext space: ℝ → ℤ_p ?
CryptoNets [DGBL+16]
y = ∑_{i=1}^{n} w_i x_i + b,   z = f(y), with activation function f = square
The multi-sum gives only linear noise growth; each square activation costs one multiplicative level.
Just take L = #layers + 1.
CryptoNets: just use some leveled SHE.
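A rough, plaintext-only sketch of the CryptoNets idea: every activation is replaced by the square function, and what matters for choosing the leveled-SHE parameters is how many multiplicative levels the evaluation consumes. The level bookkeeping below is simplified (multiplications by known weights are counted as free); it is an illustration, not the actual scheme.

```python
import numpy as np

class Leveled:
    """Toy value that counts multiplicative depth the way a leveled SHE scheme would."""
    def __init__(self, value, level=0):
        self.value, self.level = value, level

    def add(self, other):                     # additions are (essentially) free
        return Leveled(self.value + other.value, max(self.level, other.level))

    def mul_plain(self, scalar):              # multiplication by a known weight
        return Leveled(self.value * scalar, self.level)

    def square(self):                          # ciphertext x ciphertext: one more level
        return Leveled(self.value ** 2, self.level + 1)

def square_activation_layer(xs, W, b):
    out = []
    for row, bias in zip(W, b):
        acc = Leveled(bias)
        for x, w in zip(xs, row):
            acc = acc.add(x.mul_plain(w))      # the multi-sum
        out.append(acc.square())               # f(y) = y^2
    return out

xs = [Leveled(v) for v in [0.5, -1.0, 2.0]]
W1, b1 = np.random.randn(4, 3), np.random.randn(4)
W2, b2 = np.random.randn(2, 4), np.random.randn(2)
h = square_activation_layer(xs, W1, b1)
y = square_activation_layer(h, W2, b2)
print([v.level for v in y])   # 2 layers -> 2 multiplicative levels consumed in this toy model
```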
The CryptoNets approach: pros and cons
Pros
Simple, can reuse libraries for leveled HE
Small accuracy loss compared to state-of-the-art nets
If the leveled HE scheme allows batched plaintexts, homomorphic inference can be parallelized
Cons
Super slow, dude! (MNIST demo takes ~5 min)
Parameters grow too fast to cope with truly deep networks
Does not support state-of-the-art neural nets: how do you perform softmax or regularizations?
4th generation FHE: Torus FHE (TFHE)
Torus FHE is a simplification and generalization of 1st, 2nd and 3rd generation FHE based on [R]LWE.
In its basic form:
Symmetric FHE
Plaintexts are bits
Supports all binary gates + negation + Mux
Every binary operation is bootstrapped
Bootstrapping is fast - a few tens of milliseconds
Security arises from the hardness of lattice reduction
Supports any security strength, e.g. 80, 128, 192, 256 bits
[Figure: the message μ lives on the torus, where 0 = 1; ciphertexts carry small Gaussian noise]
E_s⃗(μ) = (a⃗, a⃗ ⋅ s⃗ + μ + ϵ) ≈_c (a⃗, u)
[CGGI16], [CGGI17]
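A toy, insecure illustration of the LWE-style encryption on the slide, E_s(μ) = (a, a·s + μ + ϵ), with the torus discretized as integers modulo q. The parameters are chosen for readability, not security.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 16, 2**16          # toy dimension and modulus (far too small to be secure)

def keygen():
    return rng.integers(0, 2, size=n)               # binary secret key s

def encrypt(mu_bit, s):
    # encode the bit on the discretized torus: 0 -> 0, 1 -> q/2, then add small Gaussian noise
    mu = (q // 2) * mu_bit
    a = rng.integers(0, q, size=n)
    e = int(rng.normal(0, 4))                        # small Gaussian noise epsilon
    b = (int(a @ s) + mu + e) % q
    return a, b                                      # E_s(mu) = (a, a.s + mu + e)

def decrypt(ct, s):
    a, b = ct
    phase = (b - int(a @ s)) % q                     # recover mu + e
    return int((phase + q // 4) % q >= q // 2)       # round to the nearest of {0, q/2}

s = keygen()
ct = encrypt(1, s)
print(decrypt(ct, s))     # 1
# Adding two ciphertexts component-wise adds the underlying plaintexts (here, it XORs the bits).
```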
Our approach
• Early results published in:
Florian Bourse, Michele Minelli, Matthias Minihold and Pascal Paillier, "Fast homomorphic evaluation of deep discretized neural networks", CRYPTO 2018
• A number of subsequent improvements have followed that paper:
1. Extension from binary activations to real-valued activations
2. A new bootstrapping procedure, which is also programmable
[BMMP18]
Our approach towards bootstrapping
TFHE bootstrap: E(∑_i w_i μ_i mod 1) → E(μ′)
The multi-sum grows the noise variance from σ² to ∥w⃗∥²σ²; the bootstrap brings it back to the nominal variance.

Our approach towards bootstrapping
Our bootstrap: E(∑_i w_i μ_i mod 1) → E(f(μ′))
f(⋅): any real-valued univariate function can be programmed
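A quick numerical check of the variance bookkeeping above: if every input phase carries independent Gaussian noise of variance σ², the multi-sum ∑_i w_i μ_i carries noise of variance ∥w⃗∥²σ², which is exactly what the (programmable) bootstrap then resets.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.01
w = np.array([3.0, -2.0, 5.0, 1.0])     # integer weights of the multi-sum

# Sample many noisy phases mu_i + e_i and form the multi-sum; only the noise part matters here.
e = rng.normal(0.0, sigma, size=(200_000, len(w)))
multisum_noise = e @ w

print(multisum_noise.var())                     # empirical variance of the output noise
print((np.linalg.norm(w) ** 2) * sigma ** 2)    # predicted ||w||^2 * sigma^2
```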
A neuron-specific FHE
• Our scheme is specifically crafted to encrypt activations in neurons
• Designed to support only 2 operations:
• Multi-sum (scalar product with integer weight vector)
• Fast bootstrapping (a few ms)
• Encrypted activations are real values in intervals, not just bits
• Any activation function can be applied for free when bootstrapping
How to support all network operations
Generic neuron with parameters w_1, …, w_n, b, f(⋅) ∈ ℝ:
Pick a ∈ ℝ and set w̄_i ≈ ⌈a w_i⌋, so that w̄_1, …, w̄_n ∈ ℤ
Perform the multi-addition with w̄_1, …, w̄_n
Program the bootstrapping with the function f(⋅/a + b)
Max-pooling becomes super-easy: just reapply Max at will
Max: given E(μ_1), E(μ_2)
Bootstrap E(μ_1 − μ_2) with the function max(0, ⋅)
Add E(μ_2)
Bootstrap again with the identity
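A functional model of the recipe above in plain Python, with no encryption at all: quantize the real weights with a scaling factor a, perform the integer multi-sum, and let a "programmable bootstrap" stand-in apply f(⋅/a + b). The same mechanism gives max pooling via max(μ₁, μ₂) = max(0, μ₁ − μ₂) + μ₂. All function names here are illustrative stand-ins for the actual TFHE operations.

```python
import numpy as np

def quantize_weights(w, a):
    return np.rint(a * np.asarray(w)).astype(int)        # w_bar_i ~ round(a * w_i)

def programmable_bootstrap(y, g):
    # Stand-in for the real operation: applies an arbitrary univariate function g
    # to the (noisy) phase and would return a fresh, low-noise ciphertext.
    return g(y)

def neuron(x, w, b, f, a=64.0):
    w_bar = quantize_weights(w, a)
    y = int(np.dot(w_bar, x))                             # integer multi-sum on the inputs
    return programmable_bootstrap(y, lambda t: f(t / a + b))

def homomorphic_max(mu1, mu2):
    d = programmable_bootstrap(mu1 - mu2, lambda t: max(0.0, t))   # bootstrap with max(0, .)
    return programmable_bootstrap(d + mu2, lambda t: t)            # add, bootstrap with identity

relu = lambda t: max(0.0, t)
print(neuron([1, 0, 2], [0.3, -0.7, 0.5], b=0.1, f=relu))   # ~ relu(0.3 + 1.0 + 0.1)
print(homomorphic_max(0.8, 1.3))                            # 1.3
```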
Summarized results
• Readily applies to
• Exact activation function: ReLU, Sigmoid, Tanh, etc ✔
• Deep networks, any number of layers ✔
• Convolutional NN, flattening, dropout ✔
• Recurrent NN ✔
• + extensions that support
• Max pooling (average pooling is trivial) ✔
• Regularizations (L1, L2, elastic) ✔
• Softmax ✔
All the ingredient operations performed in neural networks are covered.
Technological advantages
CryptoNets and the like  →  Our work
Uses leveled SHE (BGV, FV), not FHE  →  True FHE, fully scalable
Each neuron gets much slower for deeper networks  →  Speed at the neuron level is independent of the number of layers
Polynomial approximation of activation functions  →  Exact activation functions
Models must be trained specifically  →  Can convert pre-trained models
Modular arithmetic  →  Uses floats
Huge latency  →  High throughput
game changers
Benchmark experiments [BMMP18]
[Figure: benchmark results, see [BMMP18]]
Introducing a new, game-changing business offer
[Figure: Cloud Inc. switches from receiving sensitive data in the clear to receiving Enc(data) and returning Enc(prediction)]
Enable Cloud Inc. to switch from unprotected inference to protected inference: from unsafe MLaaS to safe MLaaS.
Automated conversion
Input: pre-trained model in TensorFlow format (under NDA)
Output: a GPU-based inference engine (PTX code) for the hard-coded model, which takes user-encrypted inputs and returns user-encrypted results
User utilities: key generation, encryption, decryption
Conclusion
Our work constitutes a breakthrough in several ways over CryptoNets and the like
Supports all neural network operations
Supports pre-trained neural networks
Blazingly fast
Highly parallelizable
Some challenges remain
Boost performance: switch to GPUs
Avoid going back and forth to and from the FFT domain: can we stay in the FFT domain (almost) all the time?
Perform training over encrypted data
Questions?