Noname manuscript No. (will be inserted by the editor)

Arithmetic Coding and Blinding Countermeasures for Lattice Signatures
Engineering a Side-Channel Resistant Post-Quantum Signature Scheme with Compact Signatures

Markku-Juhani O. Saarinen

Manuscript version of Wednesday 21st December, 2016

Abstract We describe new arithmetic coding techniques and side-channel blinding countermeasures for lattice-based cryptography. Using these techniques, we develop a practical, compact, and more quantum-resistant variant of the BLISS Ideal Lattice Signature Scheme. We first show how the BLISS parameters and hash-based random oracle can be modified to be more secure against quantum preimage attacks while optimizing signature size. Arithmetic Coding offers an information theoretically optimal compression for stationary and memoryless sources, such as the discrete Gaussian distributions often present in lattice-based cryptography. We show that this technique gives better signature sizes than the previously proposed advanced Huffman-based signature compressors. We further demonstrate that arithmetic decoding from a uniform source to a target distribution is also an optimal non-uniform sampling method in the sense that a minimal amount of true random bits is required. Performance of this new Binary Arithmetic Coding (BAC) sampler is comparable to other practical samplers. The same code, tables, or circuitry can be utilized for both tasks, eliminating the need for separate sampling and compression components. We then describe simple randomized blinding techniques that can be applied to anti-cyclic polynomial multiplication to mask timing and power-consumption side-channels in ring arithmetic. We further show that the Gaussian sampling process can also be blinded by a split-and-permute technique as an effective countermeasure against side-channel attacks.

Markku-Juhani O. Saarinen, DarkMatter, Dubai, UAE.
E-mail: [email protected]
This work was partly funded by the European Union H2020 SAFEcrypto project (grant no. 644729) while the author was with CSIT, Queen's University Belfast, UK.

Keywords Lattice Signatures · Arithmetic Coding · Side-Channel Countermeasures · Quantum-Resistant Cryptography · BLISS

1 Introduction

Recent years have seen an increased focus on lattice-based and other "quantum-resistant" public key cryptography for classical computers, which has also been fueled by official governmental warnings and endorsements [5, 6, 9, 31].

However, standardization efforts for these new algorithms are only starting [8, 30], and post-quantum algorithms clearly have not yet reached the level of maturity that is expected from traditional cryptosystems (RSA or Elliptic Curve). Lattice-based public key algorithms have been criticized for their key and signature / ciphertext sizes, and for their lack of resistance against side-channel attacks. This paper offers a number


of novel implementation techniques that address these issues.

BLISS (the Bimodal Lattice Signature Scheme) was proposed at CRYPTO 2013 by Ducas, Durmus, Lepoint, and Lyubashevsky [11] and offers some of the most compact and efficient lattice-based signatures currently available. A recent survey of practical lattice-based signature schemes [19] found BLISS to offer state-of-the-art performance and recommended it for practical use. The scheme has been implemented in FPGA hardware [35] and on an 8-bit AVR target [27], and has been distributed as a part of the strongSwan IPsec VPN suite. Relevant hardware implementation techniques are also considered in [40]. We use BLISS as the basis for our new lattice-based signature implementation, dubbed BLZZRD.

After describing the notation (Section 2.1) and the original BLISS scheme (Section 2.2), we move on to our new discoveries:

1. Grover's Attack on the Random Oracle can be mounted with a quantum computer against many signature algorithms. This attack can be applied against the random oracle component of BLISS (Section 3). We show how to modify the algorithm and its security parameters to counter this attack with a small performance overhead (Section 3.1).

2. Arithmetic Coding is ideally suited for compression of the stationary distributions (Section 4) arising in lattice cryptography, achieving near-optimal efficiency. We show how a Binary Arithmetic Coder (BAC) can be implemented with limited precision (Section 5) and used to compress lattice signatures (Section 5.2).

3. Gaussian sampling can be implemented with a BAC. The same arithmetic coding tables, code, and circuitry can also be used to efficiently implement Gaussian sampling, another major operation required in lattice cryptosystems (Section 5.4). The resulting sampler is optimal in the sense that it requires a minimum number of true random bits.

4. Polynomial Blinding is an efficient ring-arithmetic side-channel countermeasure. It hides the details of the time or energy consumption of individual arithmetic operations via randomization. Polynomial blinding can be implemented with a minimal impact on overall performance or footprint (Section 6.1).

5. Split Shuffled Sampling is a general side-channel countermeasure for Gaussian samplers. Here two or more sample vectors are randomized and combined with each other, masking the properties of individual samples during the random sampling process (Section 6.2). This is an effective countermeasure against attacks such as the cache attack presented against BLISS at CHES 2016 [3].

Some efficient implementation tricks for tasks such as computation of the discrete Gaussian distribution are also described. We further take advantage of recent advances in security proof techniques, indicating a smaller precision requirement, which leads to smaller, faster implementations.

2 BLISS and BLZZRD

BLISS is a lattice-based public key signature algorithm proposed by Ducas, Durmus, Lepoint, and Lyubashevsky [11]. We use the more recent BLISS-B variant [10] as the basis of our work. We refer the reader to these works for the security analysis and other design rationale of the original proposals. Since BLZZRD is an improvement with quantum resistance, compressed signatures, and side-channel countermeasures, we concentrate on these specific implementation techniques in the present work.

2.1 Conventions and Notation

Arithmetic is performed in a cyclotomic ring R = Z_q[x]/(x^n + 1), where q is a small prime. Such a ring is anti-circulant, as x^n ≡ −1 (mod x^n + 1). Arithmetic in this ring can be performed efficiently


Table 1 Original parameter sets and security targets for BLISS-B from [10,11,35].

Scheme                      BLISS-B0    BLISS-BI    BLISS-BII   BLISS-BIII  BLISS-BIV
Optimized for               Fun         Speed       Size        Security    Security
Security target             ≤ 60 bits   128 bits    128 bits    160 bits    192 bits
Degree n                    256         512         512         512         512
Modulus q                   7681        12289       12289       12289       12289
Secret key density δ1, δ2   0.55, 0.15  0.3, 0.0    0.3, 0.0    0.42, 0.03  0.45, 0.06
Deviation σ                 100         215         107         250         271
Ratio α                     0.748       1.610       0.801       1.216       1.027
Repetition rate M           2.44        1.21        2.18        1.40        1.61
Challenge weight κ          12          23          23          30          39
Truncation params d, p      5, 480      10, 24      10, 24      9, 48       8, 96
Verification bound B2       2492        12878       11074       10206       9901
Verification bound B∞       530         2100        1563        1760        1613

via Number Theoretic Transforms (analogous to the FFT, but in finite fields) when 2n divides q − 1 and n is a power of 2.

Non-boldface f_i denotes the individual coefficients of a polynomial f = Σ_{i=0}^{n−1} f_i x^i. Hence f ∈ R is a ring element, while f_i ∈ Z_q. We use a ∗ b = Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} a_i b_j x^{i+j} to denote the product of polynomials a and b. For anti-circulant rings of degree n we always have f_{i+n} = −f_i, and we may therefore reduce f_i = (−1)^{⌊i/n⌋} f_{i mod n} for all i ∈ Z.

We interchangeably use polynomials as zero-indexed vectors; the dot product is defined as a · b = Σ_{i=0}^{n−1} a_i b_i, the Euclidean norm satisfies ‖a‖² = a · a and ‖a‖ = √(a · a), and the sup norm ‖a‖_∞ is the largest absolute value max{|a_0|, |a_1|, . . . , |a_{n−1}|}. The rounding symbol ⌊x⌉ = ⌊x + 1/2⌋ denotes the closest integer to x.

A discrete Gaussian distribution with mean µ, deviation σ, and variance σ² is denoted N_Z(µ, σ²). We use exclusively zero-centered (µ = 0) distributions. See Section 4 for distribution definitions and further discussion.

In pseudocode we use the symbols ∧, ∨, ¬ to denote bitwise Boolean manipulation of two's complement integers. The binary modular reduction operator mod n returns an integer in the range [0, n − 1].
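For illustration, the anti-circulant reduction rule above can be applied directly in C, as in the following schoolbook-multiplication sketch (ours, not part of the BLZZRD reference code; N and Q are example values, and a real implementation would use the NTT instead):

#include <stdint.h>

#define N 512
#define Q 12289

/* Schoolbook multiplication in R = Z_q[x]/(x^n + 1), reducing
 * x^(i+j) with the rule x^n = -1 described above. */
static void poly_mul_anticyclic(int32_t r[N],
                                const int32_t a[N],
                                const int32_t b[N])
{
    int i, j;
    for (i = 0; i < N; i++)
        r[i] = 0;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            int k = i + j;
            int32_t t = (int32_t)(((int64_t) a[i] * b[j]) % Q);
            if (k >= N) {              /* x^(i+j) = -x^(i+j-n) */
                k -= N;
                t = (Q - t) % Q;
            }
            r[k] = (r[k] + t) % Q;
        }
    }
}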

2.2 Description of BLISS

Our description differs somewhat from the original description (which is even more dense). We used the original reference implementation¹ and the strongSwan "production grade" implementation² to verify the correctness of our interpretation.

Various parameters, symbols, and security claims used in the descriptions are given in Table 1. These are the parameters used by both publications and can also be found in the implementations. They are unmodified for BLZZRD apart from κ; see Section 3.1 for further information.

2.3 Key Generation

The BLISS-B key generation is perhaps the simplest part of the scheme. Algorithm 1 describes the process. The BLISS-B key generation procedure differs from the original BLISS in that keys are not rejected based on an additional norm.

2.4 Creating Signatures

Algorithm 2 describes the signature generation method. The purpose of the random oracle H is to take the input (w, M) and hash it into a vector of length n that has exactly κ ones, with the other entries being zero (the vector can also be interpreted as a polynomial in the ring). The reference implementation used an ad hoc construction based

1 Original BLISS reference implementations are available from: http://bliss.di.ens.fr/

2 strongSwan: https://wiki.strongswan.org/projects/strongswan/wiki/BLISS


Algorithm 1 BLISS-B [10,11] Key Generation. f and g are polynomials in the ring Z_q[x]/(x^n + 1).
1: (f, g) ← Uniformly random polynomials with exactly ⌈δ1 n⌉ entries in {±1} and ⌈δ2 n⌉ entries in {±2}.
2: g ← 2g + 1    (All but g_0 are even (±2 or ±4).)
3: a ← g/f. Restart if f is not invertible.    (Quick division via Number Theoretic Transform.)
Output: Private key (f, g) and public key a.

on SHA-512 [16]; the strongSwan implementation now uses the MGF1 construction [20] after we reported cryptographic flaws in their own ad hoc construction.

We use a novel two-stage random oracle described in Section 3.1. The BLISS-B signature method differs from BLISS in steps 8–15 of Algorithm 2; this is the "Greedy Sign Choices" algorithm for minimizing the norm ‖c·t‖ + ‖c·u‖. Otherwise the algorithms are equivalent, and BLISS can even be used to verify BLISS-B signatures without modification.

2.5 Verifying Signatures

The BLISS signature verification process is described in Algorithm 3. Verification is very fast; only two quick norm bound checks and a single ring multiplication are required, in addition to a call to the random oracle.

3 Security of the Random Oracle

Is there a trivial way to forge a signature (t, z, c) that will pass signature verification (Algorithm 3) for some public key a and message M?

We can easily choose arbitrary t and z that pass steps 1 and 2 (norm bounds). What is left is finding matching oracle outputs c ?= c⋆. There are (n choose κ) possible ways for H(w, M) to construct a vector with κ entries equal to 1 and the remaining n − κ entries being zero.

Examining the BLISS reference implementations, we find that c may not be transmitted or used as a vector (or a sorted list) but as an unsorted output of the oracle; in this case there are n!/(n − κ)! possibilities. To get from list (ordered set) matching to vector (unordered set) matching complexity, one can simply rearrange the input c values in the signature to match the output from the oracle for the forged message M (note that this is not as easy with our revised oracle described in Section 3.1). These entropies (given in Table 2) appear to be consistent with the security claims.

This is reasonable in classical computing, where an O(2^H) preimage search is required, H being the entropy of the target range; however, the signer herself can create a signature that matches two different messages with relative ease. BLISS signatures are therefore not collision resistant.

We note that preimage search is one of the things that quantum computers do well. This leads us to the following simple theorem:

Theorem 1 (Exhaustive Quantum Forgery) A quantum adversary can forge an arbitrary BLISS signature with O(√(n choose κ)) complexity.

Proof With fixed t and c in Algorithm 3 one can mount an exhaustive preimage search to find some message M or small-norm vector ∆z = z − z′ that satisfies c = H(w + ∆z, M). This search with Grover's Algorithm [17,18] will require (π/4)·√(n choose κ) lookups. This is optimal complexity for a quantum search [46]. □

We observe that with the suggested parameters and security levels in Tables 1 and 2, the security level against a quantum adversary falls short of the target; for example, BLISS-I and BLISS-II with claimed 128-bit security would fall with a complexity of roughly 2^66.

3.1 New Random Oracle

With n = 512, a trivial encoding of c requires 9κ bits for transmitting the indexes. In BLZZRD the c vector itself (or the nonzero index set) is not transmitted. An intermediate function H_i(w, M) is first


Algorithm 2 BLISS-B [10,11] Signature algorithm. H(w, M) is a deterministic random oracle.
Input: Private key (f, g) and public key a = g/f.
Input: Message to be signed M.
1: (t, u) ← (N^n_Z(0, σ²), N^n_Z(0, σ²))    All coefficients are discrete Gaussians.
2: v ← t ∗ a    Multiplication in the ring using NTT.
3: for i = 0, 1, . . . , n − 1 do
4:   v_i ← ((q + 1)v_i + u_i) mod 2q    Equivalent to adding q to v_i if odd, then u_i.
5:   w_i ← ⌊v_i / 2^d⌉ mod p    Truncation to nearest integer, limit by p = ⌊q/2^{d−1}⌋.
6: end for
7: c ← H(w, M)    Using w and M, create a vector with κ ones.
8: (a, b) ← (0, 0)    Init the "GreedySC" sign choices algorithm.
9: for all c_i = 1 do
10:   if ( Σ_{j=0}^{n−1} f_j a_{i+j} + g_j b_{i+j} ) ≥ 0 then
11:     (a, b) ← (a, b) − x^i (f, g)    Shift both f and g by i positions for subtraction.
12:   else
13:     (a, b) ← (a, b) + x^i (f, g)    Add. Anti-circulant shift for norm minimization.
14:   end if
15: end for
16: s ← Random bit.    Randomize sign.
17: (t, u) ← (t, u) + (−1)^s (a, b)
18: Continue with probability 1 / ( M exp( −(‖a‖² + ‖b‖²)/(2σ²) ) cosh( (t·a + u·b)/σ² ) ), otherwise restart.
19: for i = 0, 1, . . . , n − 1 do
20:   z_i ← w_i − ⌊(v_i − u_i)/2^d⌉ mod p    Create "rounding correction" for the signature.
21: end for
Output: Message signature is (t, z, c).

Algorithm 3 BLISS-B [10,11] Signature verification.
Input: Public key a.
Input: Signature (t, z, c) and message M.
1: Reject if √(‖t‖² + 2^{2d}‖z‖²) > B_2    Euclidean norm bound check.
2: Reject if max(‖t‖_∞, 2^d‖z‖_∞) > B_∞    Supremum norm bound check.
3: v ← t ∗ a    Ring arithmetic using NTT mod q.
4: for i = 0, 1, . . . , n − 1 do
5:   v_i ← (q + 1)v_i + c_i q mod 2q    Adding q to v_i if v_i is odd, and/or c_i = 1.
6:   w_i ← ⌊v_i / 2^d⌉ + z_i mod p    Truncated value "corrected" with signature z.
7: end for
8: c⋆ ← H(w, M)    Run the oracle on w and M, compare.
Output: Accept signature if c = c⋆.

used to compute a θ-bit random hash c_θ. The intermediate hash can then be converted to the c vector, with ones at exactly κ positions, via the oracle function H_o:

    c = H(w, M) = H_o(H_i(w, M)).        (1)

A related random oracle construction was also proposed in [44], where the hash function output was extended using a stream cipher (confusingly called a "PRNG" in that work). Fortunately we now have a hash standard that directly supports XOFs (eXtendable-Output Functions).

BLZZRD uses the SHA3 hash function and SHAKE XOF [15,16]. H_i is implemented with SHA3-256 or SHA3-384, depending on the corresponding θ value:

    c_θ = H_i(w, M) = SHA3-θ( w | M ).        (2)

The contents of w = ( w_0 | w_1 | · · · | w_{n−1} ) are encoded as network byte order 16-bit integers. To produce the c indexes, SHA3's SHAKE256 extendable-output function (XOF) is used. The c_θ intermediate hash is used to seed the XOF, from which an arbitrary number of 16-bit big-endian


Table 2 Complexity of exhaustive forgery attack on the hash-based random oracle H, based on entropy. The parameters used by the new variant are given below. The new variant does not communicate the c vector itself, but an intermediate hash that can be used to deterministically generate it. BLZZRD does not contain an equivalent of the "toy" BLISS-0.

Variant / Attack                 I        II       III      IV
Original BLISS n, κ              512, 23  512, 23  512, 30  512, 39
Orig. c entropy, bits            131.82   131.82   161.04   195.02
Orig. bits on wire (κ log2 n)    207      207      270      351
Orig. Grover's search security   66       66       80       96
BLZZRD n, κ                      512, 58  512, 58  512, 82  512, 113
New c entropy, bits              256.81   256.81   320.58   385.30
New intermediate hash θ, bits    256      256      384      384
New Grover's search security     128      128      160      192

values can be extracted. These are masked to the index range [0, n − 1] and rejected if already contained in c. This process is repeated until κ ones are found for the c vector.
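To make the rejection-sampling expansion step concrete, a minimal C sketch follows (ours, under a hypothetical interface, not the BLZZRD reference code): 16-bit big-endian values are drawn from an extendable-output stream, masked to [0, n − 1], and duplicates are rejected until κ positions are set. In BLZZRD the stream would be SHAKE256 seeded with c_θ; it is abstracted here as a caller-supplied squeeze callback.

#include <stdint.h>
#include <string.h>

#define N     512                  /* ring degree (power of two)          */
#define KAPPA 58                   /* e.g. the revised BLZZRD-I weight    */

/* Hypothetical XOF interface: squeeze len bytes from a seeded stream. */
typedef void (*xof_squeeze_t)(void *xof_state, uint8_t *out, size_t len);

/* Expand an intermediate hash (already absorbed into xof_state) into a
 * 0/1 challenge vector c with exactly KAPPA ones. */
static void oracle_expand(uint8_t c[N], xof_squeeze_t squeeze, void *xof_state)
{
    uint8_t buf[2];
    int ones = 0;

    memset(c, 0, N);
    while (ones < KAPPA) {
        squeeze(xof_state, buf, 2);                 /* 16-bit big-endian value */
        unsigned idx = (((unsigned) buf[0] << 8) | buf[1]) & (N - 1);
        if (c[idx] == 0) {                          /* reject duplicates */
            c[idx] = 1;
            ones++;
        }
    }
}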

Note that SHAKE256 accepts an arbitrary-sized input "key" and has an internal chaining capacity of 512 bits. Its collision resistance as a hash function is at the 256-bit security level, and this is also its security level against a quantum preimage attack; more than adequate for the 128-, 160-, and 192-bit security goals.

When transmitting or comparing signatures (Algorithms 2 and 3) we may use c_θ instead of the full c vector or index set. This leads to more compact signatures. See Table 2 for numerical values of θ; it has been chosen to be double the security parameter to counter the Quantum Forgery Attack of Theorem 1. The new κ parameter is chosen so that (n choose κ) > 2^θ.

The increased κ has an impact on both signature creation and verification speed, and also on the rejection ratio (Step 18 in Algorithm 2). Experimentally the slowdown is less than 30% in all cases.

Our modification of κ helps to secure the scheme against this particular quantum attack, but that does not necessarily secure the scheme against all of them. Evaluation with the methodology of [2] indicates that quantum attacks against the lattice are harder than the quantum preimage attack described here, however.

4 Discrete Gaussians

Obtaining random numbers from a Gaussian (normal) or other non-uniform distribution is called random sampling. Cryptographically secure sampling is required by many lattice-based cryptographic algorithms; see [12] for an overview.

4.1 Gaussian Sampling

A random sampler from a zero-centered discrete Gaussian distribution N_Z(0, σ²) returns an integer x ∈ Z with probability given by the density function ρ_σ(x). The probability mass of the discrete Gaussian distribution at x is exactly proportional to ρ_σ(x) ∝ exp(−x²/(2σ²)), where σ is a deviation parameter (see Figure 1). For σ ≳ 2 we can approximate it to high precision with

    ρ_σ(x) ≈ (1/(σ√(2π))) · e^(−x²/(2σ²)).        (3)

4.2 Tailcut

As |x| grows, ρ_σ(x) eventually diminishes into statistical insignificance. We may choose a cryptographic threshold parameter such as ε = 2^(−128) and simply ignore all |x| > τ when 2 Σ_{x>τ} ρ_σ(x) ≤ ε. Therefore the probability that a random x from the distribution satisfies −τ < x < τ is greater than 1 − ε. We call τ the "tailcut parameter". Its typical numerical value for the ε = 2^(−128) bound is related to the deviation by roughly τ = 13.2σ.
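As a rough numerical check of the τ ≈ 13.2σ figure, the following small C program (ours, not from the paper) estimates the tailcut for a given σ and ε, using the continuous tail erfc(τ/(σ√2)) as an approximation of 2 Σ_{x>τ} ρ_σ(x):

#include <math.h>
#include <stdio.h>

/* Smallest integer tau such that the (continuous) two-sided tail
 * beyond tau is below 2^log2_eps. */
static double tailcut(double sigma, double log2_eps)
{
    double eps = pow(2.0, log2_eps);
    double tau = 0.0;
    while (erfc(tau / (sigma * sqrt(2.0))) > eps)
        tau += 1.0;                 /* integer steps are enough here */
    return tau;
}

int main(void)
{
    /* For sigma = 215 (BLISS-I / BLZZRD-I) and eps = 2^-128 this
       prints a value close to 13.2 * sigma, as stated in the text. */
    double t = tailcut(215.0, -128.0);
    printf("tau = %.0f (%.2f sigma)\n", t, t / 215.0);
    return 0;
}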


Fig. 1 The green bars illustrate the probability mass for integers x ∈ Z with a σ = 2.5 discrete Gaussian distribution (Equation 3). The blue line is the corresponding continuous probability density function.

4.3 Efficient computation of density tables

Essentially all sampling algorithms require distribution density tables. Evaluation of transcendental functions (such as exp) is slow if generic algorithms are used.

We derive two parameters b = exp(−1/(2σ²)) and c = 1/(σ√(2π)) from the deviation σ. Equation 3 can now be written as ρ_σ(x) = c·b^(x²). Since x² ∈ Z, square-and-multiply exponentiation algorithms (analogous to those used for modular exponentiation) may be used to efficiently compute the power b^(x²) at arbitrary x ∈ Z. Real values can be kept in the fixed-point range [0, 1] if floating-point arithmetic is not available.

When tabulating the density, due to the symmetry ρ_σ(x) = ρ_σ(−x) we consider the sequence t_0, t_1, t_2, ... with t_i = ρ_σ(i). By observing that the ratio of consecutive values satisfies t_{i+1}/t_i = u_i = b^(2i+1), we arrive at the following recurrence:

    t_0 = c                  u_0 = b             (initialize)
    t_i = t_{i−1} u_{i−1}    u_i = b² u_{i−1}    (for i ≥ 1)        (4)

The algorithm of Equation 4 computes consecutive discrete Gaussian density function values with only two multiplications per point (b² stays constant). This new technique makes the table initialization process much faster, a great advantage for limited-resource implementations.
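A minimal C sketch of the recurrence in Equation 4 (ours; double precision is used for brevity, whereas a constrained implementation would keep fixed-point values in [0, 1]):

#include <math.h>
#include <stddef.h>

/* Fill rho[0..len-1] with rho[i] = c * b^(i*i), i.e. the density of
 * Equation 3, using two multiplications per table entry (Equation 4). */
static void gauss_table(double *rho, size_t len, double sigma)
{
    const double pi = 3.14159265358979323846;
    double b  = exp(-1.0 / (2.0 * sigma * sigma));   /* b = exp(-1/(2 sigma^2))  */
    double c  = 1.0 / (sigma * sqrt(2.0 * pi));      /* c = 1/(sigma sqrt(2 pi)) */
    double b2 = b * b;
    double t = c, u = b;                             /* t_0 = c, u_0 = b         */

    for (size_t i = 0; i < len; i++) {
        rho[i] = t;
        t *= u;                                      /* t_{i+1} = t_i * u_i      */
        u *= b2;                                     /* u_{i+1} = b^2 * u_i      */
    }
}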

4.4 Gaussian sampling algorithms

We first describe the "inversion sampler", which is one way of using uniform random bits to select an element from the target distribution by inverting its cumulative distribution. Since N_Z(0, σ²) is symmetric we can define a cumulative sequence s_i = Σ_{x=−i}^{i} ρ_σ(x). It can be computed as an extension of the sequences of Equation 4 via s_0 = t_0 and s_i = s_{i−1} + 2t_i. Clearly the sum of all probabilities converges to s_∞ = 1.

For cryptographic applications we can assume a source of unbiased random bits z_i ∈ {0, 1}. A sequence of n random bits can be viewed as a real value z ∈ [0, 1] via the binary fraction z = 0.z_1 z_2 z_3 . . . z_n. When n is finite, z_1, z_2, . . . , z_n only defines a range of size 2^(−n). For a uniformly (n = ∞) random z ∈ [0, 1] the corresponding sampled integer x can be derived, with an additional random sign bit, via

    x = 0     if z < s_0,
    x = ±i    if s_{i−1} ≤ z < s_i for i ≥ 1.        (5)

This corresponds to "inversion sampling", and can be implemented via a binary search into the monotonically increasing table s_i, after first randomizing a large number of bits to create a high-precision real number z.
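A compact C sketch of Equation 5 follows (ours, not the paper's reference implementation; random64() is an assumed placeholder for a cryptographically secure generator). The cumulative table s_i is stored scaled to 64-bit fixed point, and a binary search finds the sampled magnitude:

#include <stdint.h>

extern uint64_t random64(void);          /* assumed CSPRNG output            */

/* cdf[i] holds s_i scaled to 64-bit fixed point; cdf_len entries. */
static int inv_sample(const uint64_t *cdf, int cdf_len)
{
    uint64_t z = random64();             /* uniform z in [0, 2^64)           */
    int lo = 0, hi = cdf_len - 1;

    while (lo < hi) {                    /* smallest i such that z < cdf[i]  */
        int mid = (lo + hi) / 2;
        if (z < cdf[mid])
            hi = mid;
        else
            lo = mid + 1;
    }
    if (lo == 0)
        return 0;                        /* x = 0                            */
    return (random64() & 1) ? lo : -lo;  /* random sign bit gives x = +/- i  */
}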

A wide variety of sampling methods such as Inversion Sampling [32], Knuth-Yao Sampling [22], the Ziggurat Method [4,13,28,29], Kahn-Karney Sampling [21], and "Bernoulli" sampling [11] have been proposed for lattice cryptography.

5 Arithmetic Coding

Arithmetic Coding is a standard, classical data compression technique [37], notable for its optimality under certain conditions. It is superior to, but less common than, Huffman coding. This is mainly because its use has historically been hindered by patents. For example, the JPEG image file format has always supported both, but implementations of the arithmetic coder were rare in the 1990s.

Information theory tells us that the average entropy (in bits/symbol) of a stationary and memoryless discrete source such as Ω = N_Z(0, σ²) is

    H(Ω) = − Σ_{x=−∞}^{∞} ρ_σ(x) log_2 ρ_σ(x).        (6)

Equation 6 gives us a lower bound for the number of bits required to represent any discrete Gaussian sequence; this is also the expected number of random entropy bits required in sampling.
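For reference, Equation 6 can be evaluated numerically from a tabulated density, for example the output of the gauss_table() sketch in Section 4.3; this small C helper (ours) does so, ignoring the negligible tail beyond the table:

#include <math.h>
#include <stddef.h>

/* H = -rho(0)*log2(rho(0)) - 2 * sum_{x>=1} rho(x)*log2(rho(x)),
 * using the symmetry of the zero-centered distribution. */
static double gauss_entropy(const double *rho, size_t len)
{
    double h = -rho[0] * log2(rho[0]);
    for (size_t i = 1; i < len; i++)
        h -= 2.0 * rho[i] * log2(rho[i]);
    return h;
}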

Theorem 2 Arithmetic coding is optimal for a static and memoryless discrete source Ω in the sense that it is able to encode n symbols from such a source into nH(Ω) + O(1) data bits.

Proof See Section 1.5 of [42]. □

5.1 Implementing a Binary Arithmetic Coder

Arithmetic coding and decoding can be implemented in many ways. The Binary Arithmetic Code (BAC) [25] is one such approach. For the BLZZRD reference code, we implemented a BAC encoder and decoder for a static (non-dynamic) distribution. The range scaling details of the new codec were inspired by [42], even though that implementation isn't a BAC. See [45] for a classic implementation of arithmetic coding.

Our implementation is written in C, uses 64-bit fixed-point precision, and is highly portable. Importantly, we pose no specific limits on buffer carry-bit back-propagation, such as the "bit stuffing" used in the Q-Coder hardware architecture [33]. The full source is distributed together with the reference implementation (see Section 7). The implementation is only about 250 lines.

Table 3 gives the variable conventions used in the implementation. Constants used to define the BLZZRD BAC are as follows:

P    Bit precision used by the implementation. We used P = 64.
D    Alphabet size in bits. For discrete Gaussians, we need 2^D > 2τ.

With a BAC, an alphabet of size 2^D can be viewed as a binary tree of depth D. For each one of the D binary decisions in encoding or decoding, a single comparison is required.

Table 3 The internal variables used by the main BAC routines. Note that the interpretation of "input" and "output" differs for encoding and decoding, as does the size of a word.

n         Size of the decoded vector of words.
b, l      The interval is [b, b + l − 1]. Can be implemented with P-bit arithmetic.
c         Scaled binary division point, P bits.
x         Current input byte or word.
ibit      Input bit counter.
ibuf[ ]   Input vector. Indexed by iptr as ibuf[iptr].
owrd      Stores an output word and carry bits; must have a type large enough for additional carry bits.
obit      Output bit counter.
obuf[ ]   Output vector. Indexed by olen as obuf[olen].

The input frequencies can be arbitrarily scaled "counts". The probability of occurrence of x is

    Pr(x) = freq[x] / Σ_{i=0}^{2^D−1} freq[i].        (7)

Given a table of frequencies freq[2^D], the function BuildDist (Algorithm 4) creates a corresponding scaled BAC distribution table "tree" dist[2^D] that is used by both the encoding and decoding functions.

The encoding routine AriEncode (Algorithm 6) requires a helper function StoreWithCarry (Algorithm 5) to propagate carries. Arithmetic coding routines generally require an output buffer as these carries cannot be reliably predicted. The decoding routine AriDecode (Algorithm 8) utilizes a helper function ShiftGetBit (Algorithm 7) to get new bits into the working variable x.

The coding and decoding procedure is illustrated by Figure 2. Exhaustive testing with various distributions was performed to verify the correct operation of the codec.

5.2 Compressing Signatures

We have implemented arithmetic coding compression and decompression for the t component of signatures generated with the new BLZZRD parameters. The z vector was also modelled as a discrete Gaussian with a small σ. See Table 4 for our parameters and experimental results. Rather than transmitting the c vector indexes, an intermediate


Algorithm 4 BuildDist. Create a scaled BAC distribution tree from alphabet frequency counts.
Input: freq[2^D], a frequency count for the alphabet.
1: for i = 2^(D−1), . . . , 8, 4, 2, 1 do
2:   j ← 0
3:   while j < 2^D do
4:     c0 ← Σ_{k=j}^{j+i−1} freq[k]        "Left" tree branch count for 0.
5:     c1 ← Σ_{k=i+j}^{j+2i−1} freq[k]     "Right" branch count for 1.
6:     dist[i + j] ← ⌊2^P c0 / (c0 + c1)⌋    Division point scaled to [0, 2^P − 1].
7:     j ← j + 2i        Big step.
8:   end while
9: end for
Output: dist[2^D], a BAC distribution tree.

Output: dist[2D], a BAC distribution tree.

Algorithm 5 StoreWithCarry(b). Stores the highest bit of b to variable owrd. If the byte is full, it is stored to obuf[olen], while also adjusting the carry if necessary.
1: owrd ← 2 owrd + ⌊b / 2^(P−1)⌋        Add highest bit of b to output, shift left.
2: obit ← obit + 1        Output bit counter.
3: if obit ≥ 8 then
4:   obuf[olen] ← owrd mod 2^8
5:   i ← olen        Index for carry propagation.
6:   while (owrd ≥ 2^8) and (i > 0) do
7:     i ← i − 1        Proceed left.
8:     owrd ← ⌊owrd / 2^8⌋ + obuf[i]
9:     obuf[i] ← owrd mod 2^8        Add carry to bytes until done.
10:   end while
11:   obit ← 0, owrd ← 0, olen ← olen + 1        Full byte output.
12: end if

Fig. 2 In this toy example we are encoding a discrete Gaussian with σ = 2.5, with tail cutting applied at the ≈ 0.001 level; the range is −7 . . . 7 and the integers fit in 4 bits. For convenience the distribution is centered at 8 (high bit flip). The left side shows the operation of binary sample decoding; this corresponds to a binary search within the cumulative distribution. On each step either the min or the max bound is set to the divisor value. Only one scaling multiplication and comparison is required. Four bits yield the output 0 + 4 + 0 + 1 = 5. The right side illustrates decoding of multiple samples. The output (5, 10, 10) corresponds to (−3, 2, 2) when adjusted to zero centre. After decoding 3 samples the bounds min = 0.14237.. (decimal) = 0.0010010001110.. (binary) and max = 0.14285.. (decimal) = 0.0010010010010.. (binary) already match in 8 binary fraction bits. Both bounds and the input fraction can be shifted left, discarding the integer bits.


Algorithm 6 AriEncode. Perform Arithmetic Coding on a sequence of input words.
Input: ibuf[n]: A sequence of n input words from the alphabet.
Input: dist[2^D]: A BAC distribution tree table constructed with BuildDist.
1: b ← 0, l ← 2^P − 1        Initial base b and range l.
2: owrd ← 0, obit ← 0, olen ← 0        Initialize StoreWithCarry output variables.
3: for iptr = 0, 1, 2, . . . , n − 1 do
4:   x ← ibuf[iptr]        Get a new input word x.
5:   for ibit = D − 1, . . . , 2, 1, 0 do
6:     c ← dist[ (x ∧ (2^P − 2^ibit)) ∨ 2^ibit ]        Get centre point from masked BAC freq. table.
7:     c ← ⌊l c / 2^P⌋        Scale c to [0, l[.
8:     if (x ∧ 2^ibit) = 0 then
9:       l ← c        Input bit is 0: select lower part.
10:     else
11:       b ← b + c, l ← l − c        Input bit is 1: select higher part.
12:       if b > 2^P then
13:         b ← b − 2^P, owrd ← owrd + 1        Carry. Note: owrd's type should not overflow.
14:       end if
15:     end if
16:     while l < 2^(P−1) do
17:       StoreWithCarry(b)        Store the highest bit of b.
18:       b ← 2b mod 2^P, l ← 2l mod 2^P        Shift bounds left.
19:     end while
20:   end for
21: end for
22: while b > 0 or owrd ≠ 0 do
23:   StoreWithCarry(b)        Purge remaining nonzero bits from b.
24:   b ← 2b mod 2^P        Shift b bound left.
25: end while
Output: obuf[olen], a byte vector with the encoded data.

Algorithm 7 ShiftGetBit(x). Shifts x left and fetches a new bit from the ibuf byte array.
1: ibit ← ibit − 1        Decrease number of available bits.
2: if ibit < 0 then
3:   if iptr < ilen then
4:     iwrd ← ibuf[iptr]        Get a new byte.
5:     iptr ← iptr + 1
6:   else
7:     iwrd ← 0        Assume zeros.
8:   end if
9:   ibit ← 7        Manage input pointers.
10: end if
11: x ← (2x mod 2^P) + (⌊iwrd / 2^ibit⌋ mod 2)        Shift x left, add a bit from iwrd at the bottom.

hash of θ bits is transmitted. The last line of Table 4 gives the experimental average signature sizes, including the overhead required to encode parameter and array sizes.

5.3 Comparison with a Huffman Code

The extended version of [35] describes an advanced "block" Huffman encoding technique for the BLISS-I parameter set. A similar technique is also implemented in the strongSwan project.

The codec achieves good performance (for a Huffman code) by assigning codes to quadruplets (⌊|t_{2i}|/2^8⌋, ⌊|t_{2i+1}|/2^8⌋, z_{2i}, z_{2i+1}) rather than to individual t_i and z_i values. The lower 8 bits of the t_i values are stored as they are, and up to four sign bits are stored for nonzero entries.


Algorithm 8 AriDecode. Perform Arithmetic Decoding on a sequence of input bytes.
Input: ibuf[ilen]: A sequence of input bytes.
Input: dist[2^D]: A BAC distribution tree table constructed with BuildDist.
1: b ← 0, l ← 2^P − 1, x ← 0        Initial base b, range l, and input variable x.
2: iwrd ← 0, iptr ← 0, ibit ← 0        Initialize input pointers.
3: for i = 1, 2, . . . , P do
4:   ShiftGetBit(x)        Fill up x with P input bits.
5: end for
6: for optr = 0, 1, 2, . . . , n − 1 do
7:   owrd ← 0        Zero this output word.
8:   for obit = D − 1, . . . , 2, 1, 0 do
9:     c ← dist[ (owrd ∧ (2^P − 2^obit)) ∨ 2^obit ]        Get centre point from masked BAC freq. table.
10:     c ← ⌊l c / 2^P⌋        Scale c to [0, l[.
11:     if x − b < c then
12:       l ← c        Output bit 0: Select lower part, reduce range.
13:     else
14:       b ← b + c        Output bit 1: Select upper part.
15:       l ← l − c        Reduce range.
16:       owrd ← owrd ∨ 2^obit        Store bit.
17:     end if
18:     while l < 2^(P−1) do
19:       ShiftGetBit(x)        Add an input bit to x.
20:       b ← 2b mod 2^P, l ← 2l        Shift bounds left.
21:     end while
22:   end for
23:   obuf[optr] ← owrd        Store the output word.
24: end for
Output: obuf[n], a word vector with the decoded data.

Table 4 BLZZRD signature compression using Binary Arithmetic Coding.

Parameter / Variant          BLZZRD-I  BLZZRD-II  BLZZRD-III  BLZZRD-IV
Security target, bits        128       128        160         192
Revised κ                    58        58         82          113
Arith. code σ for t          215       107        250         271
Arith. code σ for z          0.4792    0.4352     0.646       1.136
Intermediate hash θ, bits    256       256        384         384
Avg. compressed t, bits      5079.2    4563.8     5190.6      5250.2
Avg. compressed z, bits      486.0     326.9      779.3       1206.1
Encoding overhead, bits      32        32         32          32
Signature average, bits      5843.2    5178.7     6385.9      6872.3

We note that the 64-entry Huffman tree given in Appendix B of [35] for BLISS-I can only encode values in the range −1023 ≤ t_i ≤ 1023. Since t_i comes from a Gaussian distribution with σ = 215, there is a probability of P = 1 − Σ_{x=−1023}^{1023} ρ_σ(x) ≈ 2^(−18.98) that an individual t_i value will overflow, and a probability of 1 − (1 − P)^n ≈ 2^(−9.98) (or roughly one in a thousand) that an individual t signature vector cannot be compressed using the proposed code.

Ignoring these frequent overflow conditions, our exhaustive testing has shown that the block Huffman codec compresses (t, z) to an average of 5650.1 bits/signature. Our arithmetic coder achieves a slightly better result, 5565.2 bits/signature. A non-customized Huffman coding technique would yield significantly inferior results. We emphasize that our BAC technique is entirely generic and the same code can be used for


any static distribution without modification or customization.

Furthermore, [35] states that "c does not contain any easily removable redundancy", and therefore κ log_2(n) = 207 bits are used. This is not strictly accurate, as our new oracle (Section 3.1) requires a smaller number of bits. In the case of the original BLISS-I security parameters only θ = 128 bits would be required, a 38% drop. For BLZZRD-I, we have κ = 58 and θ = 256, indicating a reduction to less than half of the corresponding plain BLISS encoding size.

Arithmetic coding is somewhat slower than Huffman coding, but as can be observed in Table 7, compression is not a bottleneck in the signing operation. Signature verification consists of a single multiplication and is very fast anyway. As indicated by Theorem 2, arithmetic coding is the most efficient way to compress signatures with elements from a non-uniform but static distribution.

5.4 Sampling with an Arithmetic Decoder

An interesting consequence of the information theoretic optimality of an arithmetic coder is that when the decoder is fed uniform random bits, it will output random samples from the desired target distribution. Therefore the same code, circuitry, and tables that are used to compress signatures can double as a random sampler as well. We have utilized Algorithm 8 as a Gaussian sampler in this fashion. With precision P = 64 we may reasonably expect to reach a security level close to the 128-bit level, based on the "Valiant-Valiant" sampling theorem and conjectures [41].
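Conceptually, the sampler is just the decoder fed with random bytes; the sketch below (ours, with an assumed ari_decode()/randombytes() interface rather than the actual reference API) illustrates the idea. A practical implementation would pull random bytes lazily inside the decoder's bit-fetch routine, so that only about H bits per sample are actually consumed.

#include <stdint.h>
#include <stddef.h>

/* Assumed helpers: a packaged version of Algorithm 8 and a CSPRNG. */
extern void ari_decode(uint16_t *out, size_t n,
                       const uint64_t *dist, int d_bits,
                       const uint8_t *ibuf, size_t ilen);
extern void randombytes(uint8_t *buf, size_t len);

/* Sample n values from the distribution described by dist[] by
 * decoding a uniformly random input stream. The buffer is generously
 * over-provisioned here; see the note above on lazy consumption. */
static void bac_sample(uint16_t *out, size_t n,
                       const uint64_t *dist, int d_bits)
{
    uint8_t rnd[4096];             /* plenty for n = 512, H ~ 10 bits */

    randombytes(rnd, sizeof(rnd));
    ari_decode(out, n, dist, d_bits, rnd, sizeof(rnd));
}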

Table 5 gives the performance characteristics of our initial implementation. The binary search CDF sampler used tables of size 2^12 and an additional sign bit, whereas the BAC sampler used tables of size 2^13 without special coding for sign bits. The random usage was calculated from the average number of random bytes used to sample vectors of length n = 512. The table also includes numbers for a blinded sampler (Section 6.2). A precision of P = 64 was used in all cases and the speeds are comparable. We observe that the main advantage of the new BAC sampler is that it uses a very small number of true random bits; in experiments with our implementation, the actual random usage is typically within 1.3% of the theoretical minimum for the vector sizes used in lattice cryptography.

By comparison, the Knuth-Yao sampler can also be expected to use a small number of bits, shown to average H + 2 bits per sample [22]. However, Knuth-Yao always uses a whole number of bits per sample, whereas a BAC sampler can be utilized to create vectors of samples where bits of entropy are "shared" between individual samples. This elementary property directly follows from Theorem 2. In many ways the Knuth-Yao sampler compares to a BAC sampler as Huffman codes compare to arithmetic compression.

For BLZZRD parameters the actual randomness usage of Knuth-Yao is about 16% higher than that of the BAC sampler for vectors of length n = 512. The Knuth-Yao sampler of [39] is suitable primarily for the small deviations used in Ring-LWE encryption (σ ≈ 3), not for the larger σ required by BLISS / BLZZRD.

6 Randomized Side-Channel Countermeasures for Ideal Lattices

Blinding is a standard countermeasure against both the timing attack [23] and emissions-based attacks such as Differential Power Analysis [24] for traditional public key cryptosystems. Blinding countermeasures add randomness to private key operations, making determination of secrets from observations more difficult for the attacker. In the case of RSA, there are two kinds of blinding, base blinding and exponent blinding. In the case of ECC, scalar blinding can be used.

Examining the private key operations in lattice cryptosystems, we note that there are two main components that can introduce side-channel vulnerabilities to the implementation: ring polynomial arithmetic and Gaussian sampling. We propose analogous blinding countermeasures against both. For BLISS/BLZZRD (Algorithm 2), special note should be made to implement the GreedySC component using side-channel resistant techniques as well. An advanced cache attack against a BLISS implementation was described at CHES 2016 [3],


Table 5 Sampling with an Arithmetic Coder vs. a simple inverse CDF sampler on a Core i7-5600U @ 2.6 GHz. Note that P = 64 precision was also used for BLZZRD-III and BLZZRD-IV, even though it is really not sufficient above the 128-bit security level. The same precision was used for all samplers.

Parameter / Variant        BLZZRD-I   BLZZRD-II  BLZZRD-III  BLZZRD-IV
Deviation σ                215        107        250         271
Entropy H(N_Z(0, σ²))      9.7953     8.7886     10.013      10.129
BAC Sampler
  Random bits / Sample     9.9261     8.9194     10.144      10.260
  Samples / Second         8,900,000  9,600,000  8,800,000   8,800,000
CDF Sampler
  Random bits / Sample     64         64         64          64
  Samples / Second         8,672,000  8,408,000  8,704,000   8,440,000
Blinded CDF (m = 2)
  Random bits / Sample     128        128        128         128
  Samples / Second         3,400,000  3,400,000  3,400,000   3,400,000

which attacked the samplers of a reference implementation.

Masking [7,38] is a standard countermeasure against Differential Power Analysis [24], especially on hardware platforms. Our arithmetic countermeasures are not based on splitting sensitive data into multiple shares, and therefore have a much smaller overhead when compared to work such as [36] on a masked Ring-LWE implementation. Our sampling countermeasure can be seen as one type of generalized masking, however.

6.1 Blinded Polynomial Multiplication

Basic arithmetic in Z_q, and especially the modular reduction by the small prime q, may easily introduce emissions into an ideal lattice cryptography implementation. For example, the arithmetic operations described for the low-resource Ring-LWE implementation of [27] contain a number of branches on sensitive data.

The simplest form of polynomial blinding is to multiply the polynomials with random constants a, b ∈ Z_q. This can be done in the regular or the NTT domain; the results are equivalent. One can set a = b^(−1) or multiply the result by c = (ab)^(−1):

    h = (a f) ∗ (b g)
    f ∗ g = (ab)^(−1) h.

Note that anti-cyclic NTT multiplication requires each polynomial to be "pre-processed" by multiplying its entries with tabulated roots of unity ω^i anyway. Therefore, by choosing a = ω^i and b = ω^j, this type of blinding can be done with virtually no extra cost. The normalization constant becomes c = ω^(−i−j).

Algorithm 9 PolyBlind(v, s, c) returns the vector v of length n shifted by s (0 ≤ s < n) positions and multiplied with a constant c.
1: for i = 0, 1, . . . , n − s − 1 do
2:   v′_i ← (c v_{i+s}) mod q
3: end for
4: for i = n − s, . . . , n − 1 do
5:   v′_i ← (q − c v_{i+s−n}) mod q
6: end for
Output: Blinded vector v′.
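A direct C rendering of Algorithm 9, as an illustrative sketch (ours; N and Q are example parameters, and the reference code may differ):

#include <stdint.h>

#define N 512
#define Q 12289

/* PolyBlind: vp = blinded copy of v, shifted by s and scaled by c. */
static void poly_blind(int32_t vp[N], const int32_t v[N],
                       int s, int32_t c)
{
    int i;
    for (i = 0; i < N - s; i++)                 /* steps 1-3            */
        vp[i] = (int32_t)(((int64_t) c * v[i + s]) % Q);
    for (i = N - s; i < N; i++)                 /* steps 4-6: wrap-around part negated */
        vp[i] = (int32_t)((Q - ((int64_t) c * v[i + s - N]) % Q) % Q);
}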

Another type of blinding of anti-cyclic polynomial multiplication can be achieved by circularly "shifting" the polynomials. As noted in Section 2.1, we may write a polynomial as f = Σ_{i=0}^{n−1} f_i x^i. Shifting by j positions is equivalent to computing

    x^j f = Σ_{i=0}^{n−1} f_i x^{i+j} = Σ_{i=0}^{n−1} f_{i−j} x^i.        (8)

Here the coefficients are therefore simply rotated in an anti-cyclic fashion. Both constant multiplication and shifting are captured by the function PolyBlind (Algorithm 9). This type of countermeasure was apparently first proposed in [26] and


independently in [47] for use with the NTRU cryptosystem.

PolyBlind (Algorithm 9) is very fast. Its inverse operation PolyBlind′(v, −s, c^(−1)) – distinguished by a negative shift value s – is equally easy to construct. With that function, we have

    PolyBlind′(PolyBlind(v, s, c), −s, c^(−1)) = v.        (9)

Due to the isometries of the anti-circulant ring, we can use a total of four blinding parameters, a, b (constants) and r, s (shift values), in the blinded scheme to compute the polynomial product f ∗ g:

    f′ = PolyBlind(f, r, a)        0 ≤ r < n, 0 < a < q
    g′ = PolyBlind(g, s, b)        0 ≤ s < n, 0 < b < q
    h′ = f′ ∗ g′
    f ∗ g = PolyBlind′(h′, −(r + s), (ab)^(−1)).

One may choose a and b from the tabulated roots of unity, a = ω^i and b = ω^j, and avoid computing the inverse since (ab)^(−1) = ω^(−(i+j)). This type of blinding has a relatively small performance penalty. If roots of unity are used as constants, the total amount of "noise" entropy introduced is 4 log_2(n) = 36 bits.
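Putting the pieces together, a blinded ring product might be wrapped as in the following sketch (ours, continuing the poly_blind() sketch above in the same translation unit): poly_mul(), random_u32(), and mod_inverse() are assumed helpers, general constants are drawn instead of tabulated roots of unity for simplicity, and modulo bias in the random choices is ignored. The unblinding step applies the shift r + s and the constant (ab)^(−1) directly, using the reduction rule x^e = (−1)^{⌊e/n⌋} x^{e mod n} from Section 2.1.

/* Assumed helpers (not defined here). */
extern void     poly_mul(int32_t r[N], const int32_t a[N], const int32_t b[N]);
extern uint32_t random_u32(void);                /* CSPRNG output       */
extern int32_t  mod_inverse(int32_t a);          /* a^-1 mod Q          */

static void poly_mul_blinded(int32_t r[N],
                             const int32_t f[N], const int32_t g[N])
{
    int32_t fp[N], gp[N], hp[N];
    int     rs = random_u32() % N, ss = random_u32() % N;
    int32_t a  = 1 + random_u32() % (Q - 1);     /* 0 < a < q           */
    int32_t b  = 1 + random_u32() % (Q - 1);     /* 0 < b < q           */
    int32_t c  = mod_inverse((int32_t)(((int64_t) a * b) % Q));
    int     u  = rs + ss;                        /* unblind shift, 0 <= u < 2N */
    int     j;

    poly_blind(fp, f, rs, a);                    /* f' = a * x^-rs * f  */
    poly_blind(gp, g, ss, b);                    /* g' = b * x^-ss * g  */
    poly_mul(hp, fp, gp);                        /* h' = f' * g'        */

    for (j = 0; j < N; j++) {                    /* f*g = (ab)^-1 x^u h' */
        int e = j + u;
        int64_t t = ((int64_t) c * hp[j]) % Q;
        r[e % N] = ((e / N) & 1) ? (int32_t)((Q - t) % Q) : (int32_t) t;
    }
}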

The basic blinding technique is generally applicable to schemes based on ideal lattices. For example, the optimized implementations of ring arithmetic for the "New Hope" Ring-LWE key exchange [1,2] can be blinded in a straightforward fashion and with only a negligible performance penalty.

6.2 Blinded Gaussian Sampling

We note that the BLISS/BLZZRD algorithm always samples vectors of n variables at once. We define a function VectorSample(n, σ) = N^n_Z(0, σ²) that produces a vector of n samples from a discrete Gaussian with deviation parameter σ. Step 1 of the signing process (Algorithm 2) can be written as

    (t, u) ← (VectorSample(n, σ), VectorSample(n, σ)).

If VectorSample is implemented naively as a loop, emissions from the implementation may reveal information about the random numbers being sampled, and the attacker may determine which elements of the random vector have specific features, as is done in the cache attack presented at CHES 2016 [3]. This may be alleviated to some degree by using VectorShuffle(VectorSample(n, σ)), which is just a random shuffle of the Gaussian sample vector. This approach was first proposed in [39].
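The paper leaves VectorShuffle unspecified; a minimal sketch (ours) using a Fisher-Yates permutation driven by an assumed CSPRNG random_u32() could look as follows (modulo bias and the side-channel behaviour of the RNG itself are ignored here):

#include <stdint.h>

#define N 512
extern uint32_t random_u32(void);       /* assumed CSPRNG helper */

/* In-place random permutation of the sample vector x. */
static void vector_shuffle(int32_t x[N])
{
    int i;
    for (i = N - 1; i > 0; i--) {
        int j = (int)(random_u32() % (uint32_t)(i + 1));
        int32_t t = x[i];
        x[i] = x[j];
        x[j] = t;
    }
}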

From probability theory we know that the sum of any two discrete Gaussian distributions is a discrete Gaussian distribution. More precisely, the variances (the squares of the deviations) are additive. Let X and Y be independent random variables with distributions N_Z(µ_X, σ²_X) and N_Z(µ_Y, σ²_Y), respectively. Then their sum X + Y has distribution N_Z(µ_X + µ_Y, σ²_X + σ²_Y). With zero-centered distributions the average does not change, but the resulting deviation will be σ_{X+Y} = √(σ²_X + σ²_Y). By induction, this can be generalized to more variables. We use this property to create a more secure Gaussian vector sampler. Algorithm 10 constructs the target distribution as a sum of random samples from the distribution with σ′ = σ/√m.

Algorithm 10 VectorBlindSample(n, m, σ) returns N^n_Z(0, σ²) using m iterations of random blinding.
1: x ← 0
2: for i = 1, 2, . . . , m do
3:   x ← x + N^n_Z(0, (σ/√m)²)
4:   x ← VectorShuffle(x)
5: end for
Output: Sampled vector x.

As pointed out in [35], one can also choose σ′ = σ/√(1 + k²) and construct the final distribution as Z = X + kY. This approach is more limited, though, as the convolution parameter k ∈ Z must be carefully chosen so that the statistical distance to the actual target distribution is not too great. Table 6 gives some appropriate parameters to use. In the resulting construction all elements of the second vector are simply scaled with the integer


Table 6 Parameters for Z ≈ X + kY split samplers, where the target distribution is N_Z(0, σ²), k is a small integer constant, and X and Y have distribution N_Z(0, σ′²) with σ′ = σ/√(1 + k²). This is a convolution approximation and ε gives the statistical distance (total variation distance) to the target distribution. Here we want ε² < 1/n, where n is a security parameter [41,43].

Parameter / Variant    BLZZRD-I   BLZZRD-II  BLZZRD-III  BLZZRD-IV
Deviation σ            215        107        250         271
Constant k             11         8          12          12
Split deviation σ′     17.8548    13.2717    20.7614     22.5053
Distance ε             2^−89.1    2^−76.8    2^−85.3     2^−100.1

constant k:

    v = VectorShuffle(VectorSample(n, σ′)) + k ∗ VectorShuffle(VectorSample(n, σ′)).
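The split construction above could be realized roughly as follows (ours, assuming the inv_sample() and vector_shuffle() helpers sketched earlier, with the CDF table built for the split deviation σ′ of Table 6):

#include <stdint.h>

#define N 512

/* Assumed helpers from the earlier sketches. */
int  inv_sample(const uint64_t *cdf, int cdf_len);
void vector_shuffle(int32_t x[N]);

/* z = X + k*Y: two independently sampled and shuffled vectors with
 * deviation sigma' are combined into one with deviation sigma. */
static void vector_split_sample(int32_t z[N], const uint64_t *cdf,
                                int cdf_len, int k)
{
    int32_t y[N];
    int i;

    for (i = 0; i < N; i++) {
        z[i] = inv_sample(cdf, cdf_len);
        y[i] = inv_sample(cdf, cdf_len);
    }
    vector_shuffle(z);
    vector_shuffle(y);
    for (i = 0; i < N; i++)
        z[i] += k * y[i];
}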

This countermeasure has a performance impediment factor of m or less. However, fast "leakier" sampling algorithms may be used. Significantly, the required tables are much smaller. For example, a CDF table of only 256 entries gives a secure tailcut at τ = 14.34σ′ for BLZZRD-I with the parameters of Table 6.

The amount of noise entropy introduced by the permutations is technically in the thousands of bits. Even though this does not directly translate to attack complexity, this countermeasure makes emissions- and cache-based attacks on the sampler significantly harder, defeating the BLISS attack of CHES 2016 [3]. It may be possible to adapt the attack to counter some blinding parameters, but probably not all. For example, in subsequent work [34] a sampler with the m = 2 countermeasure is attacked with significantly increased complexity, but the attack does not apply to m = 3. This is just a simple parameter change in the implementation.

7 Implementation

An experimental reference implementation that contains most of the features described in this work is available as a self-contained public domain distribution at https://github.com/mjosaarinen/blzzrd. Table 7 summarizes its performance on a 2015 laptop computer (Intel Core i7-4870HQ @ 2.50 GHz).

On the same system, OpenSSL's (version 1.0.2g) optimized implementation of ECDSA with the NIST P-256 curve [14] required 0.039 ms for creating a signature and 0.097 ms for verification. All variants of reference BLZZRD have at least 40% faster signature verification when compared to this implementation of NIST P-256 ECDSA.

NIST P-256 is the fastest curve implemented by OpenSSL; BLZZRD outperforms all ECDSA curves in signature verification by a high margin in this industry-standard implementation. For larger curves (409 bits or more), BLZZRD signature generation was also faster than ECDSA.

We note that (unlike the OpenSSL ECDSA benchmark implementation) our BLZZRD reference implementation has no assembly language optimizations and was generally written for readability and maintainability rather than for raw speed. It is also the first implementation with side-channel countermeasures, so direct performance comparisons to lattice-based implementations without countermeasures would not be fair.

8 Conclusions

We have offered a number of techniques that can be used to improve the security and performance of lattice-based public key algorithms. Using these techniques, we constructed BLZZRD, a practical, evolutionary variant of the BLISS-B signature scheme.

We have described a direct Grover's attack on the random oracle component of the BLISS signature scheme. In order to counter this attack, a new hash-based random oracle is proposed, with an adjusted κ parameter and an "intermediate" hash


Table 7 Breakdown of time consumption by different BLZZRD elementary operations in the plain C reference implementation. We observe that blinding countermeasures for sampling and arithmetic approximately double the time required for creating signatures. The CDF sampler was used for these tests.

Primitive                  BLZZRD-I   BLZZRD-II  BLZZRD-III  BLZZRD-IV
Generate a Key Pair        0.251 ms   0.250 ms   0.274 ms    0.285 ms
Sign a Message             0.254 ms   0.533 ms   0.326 ms    0.409 ms
.. with Countermeasures    0.498 ms   1.042 ms   0.613 ms    0.757 ms
Verify a Signature         0.064 ms   0.063 ms   0.068 ms    0.069 ms
Compress a Signature       0.044 ms   0.041 ms   0.046 ms    0.049 ms
Uncompress a Signature     0.048 ms   0.042 ms   0.050 ms    0.053 ms

which reduces the signature size. It is currently not easy to estimate the security of all components of BLISS/BLZZRD against quantum attacks, but the new random oracle parameters are consistent with the suggested quantum-resistant key sizes for symmetric ciphers [9].

Many algorithms in lattice cryptography utilize discrete Gaussian variables in signatures and ciphertexts. We show that arithmetic coding can be used to compress these quantities in an essentially optimal fashion. We describe an efficient Binary Arithmetic Coder (BAC) that produces smaller signatures than previous compressors. A somewhat surprising finding is that arithmetic coders can be adopted as non-uniform (Gaussian) samplers that have performance characteristics comparable to other sampling algorithms (millions of samples per second), but require only an (optimally) small amount of true random bits.

Standard "blinding" side-channel countermeasures for public key algorithms introduce randomness (noise) into private key operations, thus making determination of secrets more difficult. In ideal lattice cryptography, the main components used by private key operations are typically ring arithmetic and non-uniform (Gaussian) random sampling.

For ring arithmetic, we introduce "blinded polynomial multiplication", a simple randomization technique based on constant multiplication and rotation of polynomials. This technique is cheap to implement and utilizes the specific isometries of the types of anti-circulant rings often used in lattice cryptography.

For Gaussian sampling, we note that lattice cryptography algorithms typically require vectors rather than individual samples. We show that sampling processes can be blinded by shuffling these vectors and combining multiple Gaussian distributions with each other. This can also result in a significantly reduced implementation footprint, since the size of the required CDF tables can be made smaller. The general countermeasure is effective against side-channel attacks such as the cache attack recently described against BLISS, and is parameterizable to counter future attacks.

References

1. Erdem Alkim, Léo Ducas, Thomas Pöppel-mann, and Peter Schwabe. Newhope with-out reconciliation. Preprint., 2016. URL:https://cryptojedi.org/papers/newhopesimple-20161217.pdf. 14

2. Erdem Alkim, Léo Ducas, Thomas Pöppelmann,and Peter Schwabe. Post-quantum key exchange –a new hope. In USENIX Security 16, pages 237–343. USENIX Association, August 2016. URL:https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_alkim.pdf. 6, 14

3. Leon Groot Bruinderink, Andreas Hülsing, TanjaLange, and Yuval Yarom. Flush, Gauss, and reload– a cache attack on the BLISS lattice-based sig-nature scheme. In Benedikt Gierlichs and Axel Y.Poschmann, editors, CHES 2016, volume 9813of LNCS, pages 323–345. Springer, 2016. URL:https://eprint.iacr.org/2016/300,doi:10.1007/978-3-662-53140-2_16. 2,12, 14, 15

4. Johannes Buchmann, Daniel Cabarcas, Florian Göpfert, Andreas Hülsing, and Patrick Weiden. Discrete ziggurat: A time-memory trade-off for sampling from a Gaussian distribution over the integers. In Tanja Lange, Kristin Lauter, and Petr Lisonek, editors, SAC 2013, volume 8282 of LNCS, pages 402–417. Springer, 2014. Extended version available as IACR ePrint 2013/510. URL: https://eprint.iacr.org/2013/510, doi:10.1007/978-3-662-43414-7_20.

5. Peter Campbell, Michael Groves, and Dan Shepherd. Soliloquy: A cautionary tale. ETSI 2nd Quantum-Safe Crypto Workshop in partnership with the IQC, October 2014. URL: https://docbox.etsi.org/Workshop/2014/201410_CRYPTO/S07_Systems_and_Attacks/S07_Groves_Annex.pdf.

6. CESG. Quantum key distribution: A CESG white paper, February 2016. URL: https://www.cesg.gov.uk/white-papers/quantum-key-distribution.

7. Suresh Chari, Charanjit S. Jutla, Josyula R. Rao, and Pankaj Rohatgi. Towards sound approaches to counteract power-analysis attacks. In Michael Wiener, editor, CRYPTO 1999, volume 1666 of LNCS, pages 398–412. Springer, 1999. doi:10.1007/3-540-48405-1_26.

8. Lily Chen, Stephen Jordan, Yi-Kai Liu, Dustin Moody, Rene Peralta, Ray Perlner, and Daniel Smith-Tone. Report on post-quantum cryptography. NISTIR 8105, April 2016. URL: http://nvlpubs.nist.gov/nistpubs/ir/2016/NIST.IR.8105.pdf, doi:10.6028/NIST.IR.8105.

9. CNSS. Use of public standards for the secure sharing of information among national security systems. Committee on National Security Systems: CNSS Advisory Memorandum, Information Assurance 02-15, July 2015.

10. Léo Ducas. Accelerating Bliss: the geometry of ternary polynomials. IACR ePrint 2014/874, 2014. URL: https://eprint.iacr.org/2014/874.

11. Léo Ducas, Alain Durmus, Tancrède Lepoint, and Vadim Lyubashevsky. Lattice signatures and bimodal Gaussians. In Ran Canetti and Juan A. Garay, editors, CRYPTO 2013, pages 40–56. Springer, 2013. Extended version available as IACR ePrint 2013/383. URL: https://eprint.iacr.org/2013/383, doi:10.1007/978-3-642-40041-4_3.

12. Nagarjun C. Dwarakanath and Steven D. Galbraith. Sampling from discrete Gaussians for lattice-based cryptography on a constrained device. Applicable Algebra in Engineering, Communication and Computing, 25(3):159–180, June 2014. doi:10.1007/s00200-014-0218-3.

13. Hassan Edrees, Brian Cheung, McCullen Sandora, David B. Nummey, and Deian Stefan. Hardware-optimized ziggurat algorithm for high-speed Gaussian random number generators. In Toomas P. Plaks, editor, ERSA 2009, pages 254–260. CSREA Press, 2009. URL: http://sprocom.cooper.edu/sprocom2/pubs/conference/ecsns2009ersa.pdf.

14. FIPS. (FIPS) 186-4, digital signature standard (DSS). Federal Information Processing Standards Publication, July 2013. doi:10.6028/NIST.FIPS.186-4.

15. FIPS. Secure Hash Standard (SHS). Federal Information Processing Standards Publication 180-4, August 2015. doi:10.6028/NIST.FIPS.180-4.

16. FIPS. SHA-3 standard: Permutation-based hash and extendable-output functions. Federal Information Processing Standards Publication 202, August 2015. doi:10.6028/NIST.FIPS.202.

17. Lov K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, STOC ’96, pages 212–219. ACM, 1996. URL: http://arxiv.org/abs/quant-ph/9605043, doi:10.1145/237814.237866.

18. Lov K. Grover. From Schrödinger’s equation to the quantum search algorithm. American Journal of Physics, 69(7):769–777, 2001. URL: http://arxiv.org/abs/quant-ph/0109116, doi:10.1119/1.1359518.

19. James Howe, Thomas Pöppelmann, Máire O’Neill, Elizabeth O’Sullivan, and Tim Güneysu. Practical lattice-based digital signature schemes. ACM Trans. Embed. Comput. Syst., 14(3):41:1–24, April 2015. doi:10.1145/2724713.

20. Jakob Jonsson and Burt Kaliski. Public-key cryptography standards (PKCS) #1: RSA cryptography specifications version 2.1. IETF RFC 3447, February 2003. URL: https://tools.ietf.org/html/rfc3447, doi:10.17487/RFC3447.

21. Charles F. F. Karney. Sampling exactly from the normal distribution, 2014. Preprint arXiv:1303.6257, Version 2. URL: http://arxiv.org/abs/1303.6257.

22. Donald E. Knuth and Andrew C. Yao. The complexity of nonuniform random number generation. In Joseph F. Traub, editor, Algorithms and Complexity: New Directions and Recent Results, pages 357–428, New York, 1976. Academic Press.

23. Paul Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Neal Koblitz, editor, CRYPTO ’96, volume 1109 of LNCS, pages 104–113. Springer, 1996. doi:10.1007/3-540-68697-5_9.

24. Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential power analysis. In Michael Wiener, editor, CRYPTO ’99, volume 1666 of LNCS, pages 388–397. Springer, 1999. doi:10.1007/3-540-48405-1_25.

25. Glen G. Langdon, Jr. An introduction to arithmetic coding. IBM Journal of Research and Development, 28(2):135–149, 1984.

26. Mun-Kyu Lee, Jeong Eun Song, Dooho Choi, and Dong-Guk Han. Countermeasures against power analysis attacks for the NTRU public key cryptosystem. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E93.A(1):153–163, 2010. doi:10.1587/transfun.E93.A.153.

27. Zhe Liu, Hwajeong Seo, Sujoy Sinha Roy, Johann Großschädl, Howon Kim, and Ingrid Verbauwhede. Efficient Ring-LWE encryption on 8-bit AVR processors. In Tim Güneysu and Helena Handschuh, editors, CHES 2015, volume 9293 of LNCS, pages 663–682. Springer, 2015. URL: https://eprint.iacr.org/2015/410, doi:10.1007/978-3-662-48324-4_33.

28. George Marsaglia and Wai Wan Tsang. A fast, easily implemented method for sampling from decreasing or symmetric unimodal density functions. SIAM Journal on Scientific and Statistical Computing, 5(2):349–359, 1984. doi:10.1137/0905026.

29. George Marsaglia and Wai Wan Tsang. The ziggurat method for generating random variables. Journal of Statistical Software, 5(8):1–7, October 2000. URL: http://www.jstatsoft.org/v05/i08.

30. NIST. Submission requirements and evaluation criteria for the post-quantum cryptography standardization process. Official Call for Proposals, National Institute of Standards and Technology, December 2016. URL: http://csrc.nist.gov/groups/ST/post-quantum-crypto/documents/call-for-proposals-final-dec-2016.pdf.

31. NSA/CSS. Information assurance directorate: Commercial national security algorithm suite and quantum computing FAQ, January 2016. URL: https://www.iad.gov/iad/library/ia-guidance/ia-solutions-for-classified/algorithm-guidance/cnsa-suite-and-quantum-computing-faq.cfm.

32. Chris Peikert. An efficient and parallel Gaussian sampler for lattices. In Tal Rabin, editor, CRYPTO 2010, volume 6223 of LNCS, pages 80–97. Springer, 2010. doi:10.1007/978-3-642-14623-7_5.

33. William B. Pennebaker, Joan L. Mitchell, Glen G. Langdon, Jr, and Ronald B. Arps. An overview of the basic principles of the Q-coder adaptive binary arithmetic coder. IBM Journal of Research and Development, 32(6):717–726, 1988.

34. Peter Pessl. Analyzing the shuffling side-channel countermeasure for lattice-based signatures. In Orr Dunkelman and Somitra Kumar Sanadhya, editors, INDOCRYPT 2016, volume 10095 of LNCS, pages 153–170. Springer, 2016. doi:10.1007/978-3-319-49890-4_9.

35. Thomas Pöppelmann, Léo Ducas, and Tim Güneysu. Enhanced lattice-based signatures on reconfigurable hardware. In Lejla Batina and Matthew Robshaw, editors, CHES 2014, volume 8731 of LNCS, pages 353–370. Springer, 2014. Extended version available as IACR ePrint 2014/254. URL: https://eprint.iacr.org/2014/254, doi:10.1007/978-3-662-44709-3_20.

36. Oscar Reparaz, Sujoy Sinha Roy, Frederik Vercauteren, and Ingrid Verbauwhede. A masked ring-LWE implementation. In Tim Güneysu and Helena Handschuh, editors, CHES 2015, volume 9293 of LNCS, pages 683–702. Springer, 2015. URL: https://eprint.iacr.org/2015/724, doi:10.1007/978-3-662-48324-4_34.

37. Jorma J. Rissanen. Generalized Kraft inequality and arithmetic coding. IBM Journal of Research and Development, 20:198–203, May 1976. doi:10.1147/rd.203.0198.

38. Matthieu Rivain, Emmanuel Prouff, and Julien Doget. Higher-order masking and shuffling for software implementations of block ciphers. In Christophe Clavier and Kris Gaj, editors, CHES 2009, volume 5747 of LNCS, pages 171–188. Springer, 2009. Extended version available as IACR ePrint 2009/420. URL: https://eprint.iacr.org/2009/420, doi:10.1007/978-3-642-04138-9_13.

39. Sujoy Sinha Roy, Oscar Reparaz, Frederik Vercauteren, and Ingrid Verbauwhede. Compact and side channel secure discrete Gaussian sampling. IACR ePrint 2014/591, 2014. URL: https://eprint.iacr.org/2014/591.

40. Sujoy Sinha Roy, Frederik Vercauteren, Nele Mentens, Donald Donglong Chen, and Ingrid Verbauwhede. Compact Ring-LWE cryptoprocessor. In Lejla Batina and Matthew Robshaw, editors, CHES 2014, volume 8731 of LNCS, pages 371–391. Springer, 2014. URL: https://eprint.iacr.org/2013/866, doi:10.1007/978-3-662-44709-3_21.

41. Markku-Juhani O. Saarinen. Gaussian sampling precision in lattice cryptography. IACR ePrint 2015/953, October 2015. URL: https://eprint.iacr.org/2015/953.

42. Amir Said. Introduction to arithmetic coding - theory and practice. In Khalid Sayood, editor, Lossless Compression Handbook. Academic Press, 2002. Chapter also published as HP Technical report HPL-2004-76. URL: http://www.hpl.hp.com/techreports/2004/HPL-2004-76.pdf.

43. Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. In FOCS 2014, pages 51–60. IEEE Computer Society, 2014. Full version available at http://theory.stanford.edu/~valiant/papers/instanceOptFull.pdf. doi:10.1109/FOCS.2014.14.

44. Patrick Weiden, Andreas Hülsing, Daniel Cabarcas, and Johannes Buchmann. Instantiating treeless signature schemes. IACR ePrint 2013/065, February 2013. URL: https://eprint.iacr.org/2013/065.

45. Ian H. Witten, Radford M. Neal, and John G. Cleary. Arithmetic coding for data compression. Communications of the ACM, 30(6):520–540, June 1987. doi:10.1145/214762.214771.


46. Christof Zalka. Grover’s quantum searching algorithm is optimal. Physical Review A, 60:2746–2751, October 1999. URL: http://arxiv.org/abs/quant-ph/9711070, doi:10.1103/PhysRevA.60.2746.

47. Xuexin Zheng, An Wang, and Wei Wei. First-order collision attack on protected NTRU cryptosystem. Microprocessors & Microsystems, 37(6-7):601–609, August 2013. doi:10.1016/j.micpro.2013.04.008.
