Homomorphic encryption for deep learning: a revolution in the making
Pascal Paillier
The problem
A booming industry.
[Figure: a user sends data to Cloud Inc., which runs a trained neural network on GPUs and returns a prediction]
The problem
[Figure: the same pipeline, but now the data sent to Cloud Inc. is sensitive, and so is the returned prediction]
The problem
[Figure: sensitive health data sent to Cloud Inc. for prediction: STDs, HIV, genomics, blood work, lifestyle tracking]
The problem
[Figure: sensitive financial data: fraud prevention, bank activity, transaction records, investments]
The problem
[Figure: sensitive government data: crime prevention, police/justice case solving, tax evasion]
Homomorphic encryption for neural networks
The big motivation for applying FHE
[Figure: the user sends Enc(data) to Cloud Inc.; the model runs on GPUs in the encrypted domain and Enc(prediction) is returned]
Input and output data are encrypted.
The model is evaluated in the encrypted domain = homomorphic inference.
Only the user has the key to encrypt and decrypt.
First generation FHE [Gentry09]
x, y ∈ {0,1}
Enc(x), Enc(y) → Enc(x ⊕ y)   (pretty fast)
Enc(x), Enc(y) → Enc(x ∧ y)   (super slow)
⊕ + ∧ = all functions
First generation FHE [Gentry09]
But there is a notion of noise in ciphertexts:
Enc(x), Enc(y) → Enc(x ⊕ y)   noise stays roughly the same
Enc(x), Enc(y) → Enc(x ∧ y)   noise doubles
If the noise exceeds a threshold, the ciphertext loses decryptability.
One must then resort to bootstrapping, a very slow noise-cleaning operation.
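To make this concrete, here is a minimal sketch in plain Python (no real cryptography; the class, the gate costs and the noise threshold are invented for illustration). It mimics the behaviour described above: XOR keeps the noise roughly constant, AND doubles it, decryption fails past a threshold, and bootstrapping resets the noise. It also shows why ⊕ and ∧ suffice for all functions, by deriving NOT and OR from them.

```python
NOISE_LIMIT = 16  # hypothetical decryptability threshold, for illustration only

class ToyCiphertext:
    def __init__(self, bit, noise=1):
        self.bit = bit        # the plaintext is kept around only to simulate decryption
        self.noise = noise

    def decrypt(self):
        if self.noise > NOISE_LIMIT:
            raise ValueError("noise exceeded the threshold: decryption fails")
        return self.bit

def xor(c1, c2):
    # homomorphic XOR: noise stays roughly the same
    return ToyCiphertext(c1.bit ^ c2.bit, max(c1.noise, c2.noise) + 1)

def and_(c1, c2):
    # homomorphic AND: noise doubles
    return ToyCiphertext(c1.bit & c2.bit, 2 * max(c1.noise, c2.noise))

def bootstrap(c):
    # very slow in a real scheme; resets the noise to a fresh, small level
    return ToyCiphertext(c.bit, noise=1)

# XOR and AND are functionally complete: NOT x = x XOR 1, OR via De Morgan.
ONE = ToyCiphertext(1, noise=0)

def not_(c):
    return xor(c, ONE)

def or_(c1, c2):
    return not_(and_(not_(c1), not_(c2)))

x, y = ToyCiphertext(1), ToyCiphertext(0)
c = or_(x, y)
for _ in range(3):            # chaining ANDs makes the noise blow up quickly
    c = and_(c, c)
print(c.noise > NOISE_LIMIT)  # True: no longer decryptable
c = bootstrap(c)
print(c.decrypt())            # 1
```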
The importance of multiplicative depth
[Figure: a computation expressed as a boolean circuit, from inputs to outputs]
Noise growth is exponential in the multiplicative depth of the circuit.
Somewhat HE and scale-invariant HE
FHE: + bootstrap, ✔ can go on forever
SHE: ++ noise growth, ✘ works only up to the circuit outputs
FHE = SHE + bootstrapping
Quick refresher on neural networks
Neural network = cognitive model
Artificial neurons:
y = ∑_{i=1}^{n} w_i x_i + b,   z = f(y)
f is the activation function
Dense layers (fully connected)
y_{L+1} = W_{L+1} x_L + b_{L+1},   x_{L+1} = f(y_{L+1})
W_{L+1} is a learnable weight matrix, b_{L+1} a learnable bias vector
✔ learnable
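As a plain (unencrypted) reference point, a short sketch of what the neuron and dense-layer formulas above compute, using NumPy and ReLU as an example choice of activation:

```python
import numpy as np

def relu(y):
    return np.maximum(0.0, y)   # example activation function

# Single neuron: y = sum_i w_i x_i + b, z = f(y)
def neuron(x, w, b, f=relu):
    return f(np.dot(w, x) + b)

# Dense layer: y_{L+1} = W_{L+1} x_L + b_{L+1}, x_{L+1} = f(y_{L+1})
def dense(x_L, W, b, f=relu):
    return f(W @ x_L + b)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.2, 0.3])
W = np.array([[0.1, 0.2, 0.3],
              [0.0, -0.5, 0.4]])
b = np.array([0.05, -0.1])

print(neuron(x, w, 0.05))   # one activation
print(dense(x, W, b))       # a vector of activations, one per output neuron
```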
Convolutional layers
[Figure: a learnable kernel slides over the input 2D image; with multiple kernels the output is a tensor]
✔ learnable
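A small NumPy sketch of the operation illustrated above: a single kernel sliding over a single-channel image (valid padding, stride 1), here with a 3x4 kernel like the one on the slide. Real convolutional layers stack many such kernels into an output tensor.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (really cross-correlation, as in most DL frameworks)
    of a single-channel image with a single kernel."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.randint(0, 10, size=(12, 12)).astype(float)
kernel = np.array([[-3., 0., 2., 8.],
                   [ 2., -1., -1., 4.],
                   [ 5., 5., 3., -6.]])   # a learnable 3x4 kernel, as on the slide
feature_map = conv2d(image, kernel)
print(feature_map.shape)   # (10, 9): the kernel visits every 3x4 window of the image
```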
Average and max pooling
[Figure: a 4x3 max pool applied to the 12x12 feature map]
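For reference, non-overlapping max and average pooling in NumPy (window dimensions are assumed to divide the input exactly); pooling a 12x12 map with 3x4 windows yields a 4x3 output:

```python
import numpy as np

def max_pool(x, pool_h, pool_w):
    """Non-overlapping max pooling of a 2D feature map."""
    h, w = x.shape
    x = x.reshape(h // pool_h, pool_h, w // pool_w, pool_w)
    return x.max(axis=(1, 3))

def avg_pool(x, pool_h, pool_w):
    """Non-overlapping average pooling of a 2D feature map."""
    h, w = x.shape
    x = x.reshape(h // pool_h, pool_h, w // pool_w, pool_w)
    return x.mean(axis=(1, 3))

fmap = np.random.randint(0, 10, size=(12, 12))
print(max_pool(fmap, 3, 4).shape)   # (4, 3)
print(avg_pool(fmap, 3, 4).shape)   # (4, 3)
```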
Flattening
Lossless dimension reduction
✘ not learnable
Layer-wide operations
✘ not learnable
softmax:   x_{L+1,i} = e^{x_{L,i}} / ∑_j e^{x_{L,j}}
L2 regularization:   x_{L+1} = x_L / ∥x_L∥₂
L1 regularization:   x_{L+1} = x_L / ∥x_L∥₁
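For reference, the layer-wide operations above in NumPy (softmax, and division by the L2 or L1 norm):

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; the result is unchanged mathematically
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def l2_normalize(x):
    return x / np.linalg.norm(x, ord=2)

def l1_normalize(x):
    return x / np.linalg.norm(x, ord=1)

x = np.array([2.0, -1.0, 0.5])
print(softmax(x))        # sums to 1
print(l2_normalize(x))   # unit L2 norm
print(l1_normalize(x))   # unit L1 norm
```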
So, how to evaluate a neural network?
[Figure: a network in which each neuron computes a weighted sum Σ followed by an activation f]
The weighted sums Σ are linear; the activations f are not.
And the real-valued computation must be mapped to the plaintext space: ℝ → ℤ_p ?
CryptoNets [DGBL+16]
y = ∑_{i=1}^{n} w_i x_i + b,   z = f(y), with activation function f = square
The multi-sum gives only linear noise growth; each square activation costs one multiplicative level.
Just take L = #layers + 1.
CryptoNets: just use some leveled SHE.
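A rough, plaintext-only sketch of the CryptoNets idea: every activation is replaced by the square function, and what matters for choosing the leveled-SHE parameters is how many multiplicative levels the evaluation consumes. The level bookkeeping below is simplified (multiplications by known weights are counted as free); it is an illustration, not the actual scheme.

```python
import numpy as np

class Leveled:
    """Toy value that counts multiplicative depth the way a leveled SHE scheme would."""
    def __init__(self, value, level=0):
        self.value, self.level = value, level

    def add(self, other):                     # additions are (essentially) free
        return Leveled(self.value + other.value, max(self.level, other.level))

    def mul_plain(self, scalar):              # multiplication by a known weight
        return Leveled(self.value * scalar, self.level)

    def square(self):                          # ciphertext x ciphertext: one more level
        return Leveled(self.value ** 2, self.level + 1)

def square_activation_layer(xs, W, b):
    out = []
    for row, bias in zip(W, b):
        acc = Leveled(bias)
        for x, w in zip(xs, row):
            acc = acc.add(x.mul_plain(w))      # the multi-sum
        out.append(acc.square())               # f(y) = y^2
    return out

xs = [Leveled(v) for v in [0.5, -1.0, 2.0]]
W1, b1 = np.random.randn(4, 3), np.random.randn(4)
W2, b2 = np.random.randn(2, 4), np.random.randn(2)
h = square_activation_layer(xs, W1, b1)
y = square_activation_layer(h, W2, b2)
print([v.level for v in y])   # 2 layers -> 2 multiplicative levels consumed in this toy model
```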
The CryptoNets approach: pros and cons
Pros
Simple, can reuse libraries for leveled HE
Small accuracy loss compared to state-of-the-art nets
If the leveled HE scheme allows batched plaintexts, homomorphic inference can be parallelized
Cons
Super slow, dude! (MNIST demo takes ~5 min)
Parameters grow too fast to cope with truly deep networks
Does not support state-of-the-art neural nets: how do you perform softmax or regularizations?
4th generation FHE: Torus FHE (TFHE)
Torus FHE is a simplification and generalization of 1st, 2nd and 3rd generation FHE based on [R]LWE.
In its basic form:
Symmetric FHE
Plaintexts are bits
Supports all binary gates + negation + Mux
Every binary operation is bootstrapped
Bootstrapping is fast - a few tens of milliseconds
Security arises from the hardness of lattice reduction
Supports any security strength, e.g. 80, 128, 192, 256 bits
[Figure: the message μ lives on the torus, where 0 = 1; ciphertexts carry small Gaussian noise]
E_s⃗(μ) = (a⃗, a⃗ ⋅ s⃗ + μ + ϵ) ≈_c (a⃗, u)
[CGGI16], [CGGI17]
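A toy, insecure illustration of the LWE-style encryption on the slide, E_s(μ) = (a, a·s + μ + ϵ), with the torus discretized as integers modulo q. The parameters are chosen for readability, not security.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 16, 2**16          # toy dimension and modulus (far too small to be secure)

def keygen():
    return rng.integers(0, 2, size=n)               # binary secret key s

def encrypt(mu_bit, s):
    # encode the bit on the discretized torus: 0 -> 0, 1 -> q/2, then add small Gaussian noise
    mu = (q // 2) * mu_bit
    a = rng.integers(0, q, size=n)
    e = int(rng.normal(0, 4))                        # small Gaussian noise epsilon
    b = (int(a @ s) + mu + e) % q
    return a, b                                      # E_s(mu) = (a, a.s + mu + e)

def decrypt(ct, s):
    a, b = ct
    phase = (b - int(a @ s)) % q                     # recover mu + e
    return int((phase + q // 4) % q >= q // 2)       # round to the nearest of {0, q/2}

s = keygen()
ct = encrypt(1, s)
print(decrypt(ct, s))     # 1
# Adding two ciphertexts component-wise adds the underlying plaintexts (here, it XORs the bits).
```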
Our approach
• Early results published in:
Florian Bourse, Michele Minelli, Matthias Minihold and Pascal Paillier, "Fast homomorphic evaluation of deep discretized neural networks", CRYPTO 2018
• A number of subsequent improvements have followed that paper:
1. Extension from binary activations to real-valued activations
2. A new bootstrapping procedure, which is also programmable
[BMMP18]
Our approach towards bootstrapping
TFHE bootstrap: E(∑_i w_i μ_i mod 1) → E(μ′)
The multi-sum grows the noise variance from σ² to ∥w⃗∥²σ²; the bootstrap brings it back to the nominal variance.

Our approach towards bootstrapping
Our bootstrap: E(∑_i w_i μ_i mod 1) → E(f(μ′))
f(⋅): any real-valued univariate function can be programmed
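A quick numerical check of the variance bookkeeping above: if every input phase carries independent Gaussian noise of variance σ², the multi-sum ∑_i w_i μ_i carries noise of variance ∥w⃗∥²σ², which is exactly what the (programmable) bootstrap then resets.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.01
w = np.array([3.0, -2.0, 5.0, 1.0])     # integer weights of the multi-sum

# Sample many noisy phases mu_i + e_i and form the multi-sum; only the noise part matters here.
e = rng.normal(0.0, sigma, size=(200_000, len(w)))
multisum_noise = e @ w

print(multisum_noise.var())                     # empirical variance of the output noise
print((np.linalg.norm(w) ** 2) * sigma ** 2)    # predicted ||w||^2 * sigma^2
```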
A neuron-specific FHE
• Our scheme is specifically crafted to encrypt activations in neurons
• Designed to support only 2 operations:
• Multi-sum (scalar product with integer weight vector)
• Fast bootstrapping (a few ms)
• Encrypted activations are real values in intervals, not just bits
• Any activation function can be applied for free when bootstrapping
How to support all network operations
Generic neuron with parameters w_1, …, w_n, b, f(⋅) ∈ ℝ:
Pick a ∈ ℝ and set w̄_i ≈ ⌈a w_i⌋, so that w̄_1, …, w̄_n ∈ ℤ
Perform the multi-addition with w̄_1, …, w̄_n
Program the bootstrapping with the function f(⋅/a + b)
Max-pooling becomes super-easy: just reapply Max at will
Max: given E(μ_1), E(μ_2)
Bootstrap E(μ_1 − μ_2) with the function max(0, ⋅)
Add E(μ_2)
Bootstrap again with the identity
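A functional model of the recipe above in plain Python, with no encryption at all: quantize the real weights with a scaling factor a, perform the integer multi-sum, and let a "programmable bootstrap" stand-in apply f(⋅/a + b). The same mechanism gives max pooling via max(μ₁, μ₂) = max(0, μ₁ − μ₂) + μ₂. All function names here are illustrative stand-ins for the actual TFHE operations.

```python
import numpy as np

def quantize_weights(w, a):
    return np.rint(a * np.asarray(w)).astype(int)        # w_bar_i ~ round(a * w_i)

def programmable_bootstrap(y, g):
    # Stand-in for the real operation: applies an arbitrary univariate function g
    # to the (noisy) phase and would return a fresh, low-noise ciphertext.
    return g(y)

def neuron(x, w, b, f, a=64.0):
    w_bar = quantize_weights(w, a)
    y = int(np.dot(w_bar, x))                             # integer multi-sum on the inputs
    return programmable_bootstrap(y, lambda t: f(t / a + b))

def homomorphic_max(mu1, mu2):
    d = programmable_bootstrap(mu1 - mu2, lambda t: max(0.0, t))   # bootstrap with max(0, .)
    return programmable_bootstrap(d + mu2, lambda t: t)            # add, bootstrap with identity

relu = lambda t: max(0.0, t)
print(neuron([1, 0, 2], [0.3, -0.7, 0.5], b=0.1, f=relu))   # ~ relu(0.3 + 1.0 + 0.1)
print(homomorphic_max(0.8, 1.3))                            # 1.3
```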
Summarized results
• Readily applies to
• Exact activation function: ReLU, Sigmoid, Tanh, etc ✔
• Deep networks, any number of layers ✔
• Convolutional NN, flattening, dropout ✔
• Recurrent NN ✔
• + extensions that support
• Max pooling (average pooling is trivial) ✔
• Regularizations (L1, L2, elastic) ✔
• Softmax ✔
All the ingredient operations performed in neural networks are covered.
Technological advantages
CryptoNets and the like  →  Our work
Uses leveled SHE (BGV, FV), not FHE  →  True FHE, fully scalable
Each neuron gets much slower for deeper networks  →  Speed at the neuron level is independent of the number of layers
Polynomial approximation of activation functions  →  Exact activation functions
Models must be trained specifically  →  Can convert pre-trained models
Modular arithmetic  →  Uses floats
Huge latency  →  High throughput
game changers
Benchmark experiments [BMMP18]
[Figure: benchmark results, see [BMMP18]]
Introducing a new, game-changing business offer
[Figure: Cloud Inc. switches from receiving sensitive data in the clear to receiving Enc(data) and returning Enc(prediction)]
Enable Cloud Inc. to switch from unprotected inference to protected inference: from unsafe MLaaS to safe MLaaS.
Automated conversion
Input: pre-trained model in TensorFlow format (under NDA)
Output: a GPU-based inference engine (PTX code) for the hard-coded model, which takes user-encrypted inputs and returns user-encrypted results
User utilities: key generation, encryption, decryption
Conclusion
Our work constitutes a breakthrough in several ways over CryptoNets and the like
Supports all neural network operations
Supports pre-trained neural networks
Blazingly fast
Highly parallelizable
Some challenges remain
Boost performance: switch to GPUs
Avoid going back and forth to and from the FFT domain: can we stay in the FFT domain (almost) all the time?
Perform training over encrypted data
Questions?