BasicProbabilityTheory€¦ · 2 BasicProbabilityTheory Thisdescribesthefollowingsituation: First,...

Foundations of Cybersecurity (Winter 16/17)Prof. Dr. Michael BackesCISPA / Saarland University computer science

saarlanduniversity

Lecture 6 (9 December 2016)


saarlanduniversity



saarlanduniversity


Chapter 2

Basic Probability Theory

In this chapter, we first give an introduction to basic probability theory required for the cryp-tography part of the lecture.

2.1 On the Notation for Probabilities

We have to introduce some probability notation dealing with events in the execution of crypto-graphic protocols. You may read this section rather quickly for the moment, getting back to itwhen the notation is used in the sequel.

2.1.1 Syntax and Informal Semantics

Let x1, . . . , xn be variable names, let X1, . . . , Xn be probabilistic functions (e.g., probabilisticalgorithms), and let e1, . . . , en be deterministic expressions where ei may depend on x1, . . . , xi−1.Let π be a predicate depending on x1, . . . , xn. Then we write

P := Pr [π(x1, . . . , xn) : x1 ← X1(e1), . . . , xn ← Xn(en)]

for the following probability: First, x1 is chosen according to X1(e1), then x2 using X2(e2)(which may depend on the value of x1), and so forth. Finally, π is evaluated (which depends onthe values x1, . . . , xn chosen before), and P is the probability that π evaluates to true.

In some cases, we use slightly relaxed notation. E.g., we write y ← f(x) even if f is nota probabilistic function, but a deterministic one. Or we use expressions like A(B(x), y), whereboth A and B are probabilistic functions. This then means that first z ← B(x) is evaluated,and then A(z, y). However, expressions with relaxed notation can always be transformed intoexpressions using the notation as defined above.

To make this notation clearer, we give some examples: Let U be the uniform distri-bution on {1, . . . , 10} (that is, U is a probabilistic function without arguments). ThenPr [x = y : x← U, y ← U ] denotes the probability that x = y if x and y are both indepen-dently chosen according to the distribution U . So Pr [x = y : x← U, y ← U ] = 1

10 . Althoughthe same expression occurs on the right-hand side of both arrows, this does not imply that x = ywith probability 1. Using the relaxed notation introduced above, this probability could as wellbe written as Pr [x = y : x←R {1, . . . , 10}, y ←R {1, . . . , 10}].

For the next example, let fn be some function on bitstrings of length n, and let A be aprobabilistic algorithm. Consider the following expression:

Pr [fn(x) = fn(y) ∧ x 6= y : x←R {0, 1}n, y ← A(n, x)] (2.1)

9

2 Basic Probability Theory

This describes the following situation: First, x is chosen to be a uniformly chosen bitstring oflength n. Then, A is invoked with the length parameter n and the bitstring x as arguments.A now outputs a string y. The expression above then describes the probability that both xand y have the same image under fn, but are not equal (they form a so-called collision for fn).Note that in this example, the whole expression (2.1) depends on an external argument n.Such parameterized probabilities are especially useful to specify asymptotic behavior. In thisexample, we could require that for any polynomial-time algorithm A, the probability abovedecreases sufficiently fast for growing n.

2.1.2 Towards a Formal Semantics (not required for this lecture)

Let M be a finite or countable set. A probability distribution on M is a function D assigning anonnegative real number to each E ⊆M (intuitively, the probability that some x ∈ E is chosen,or alternatively, that the event E occurs), such that D(E) =

∑x∈E D({x}) and D(M) = 1. To

highlight the fact that the images of D are probabilities, we often write PD(E) instead of D(E).For sets containing one element we abbreviate PD({m}) by PD(m).

A probabilistic function A : M → N is modeled as a deterministic function M → Dist(N),i.e., it assigns each input a distribution on the output. Dist(N) is the set of all distributionson N . The probabilistic assignment n ← A(m) then means choosing a value according to thedistribution A(m) and assigning the value to n. The support of A with respect to m ∈ M ,denoted [A(m)], is defined as the set of all n ∈ N that can be output with nonzero probabilityby A(m). Formally, [A(m)] := {n ∈ N | PA(m)(n) > 0}.

Let π be a predicate on a certain domainM . Then one defines the event that π is fulfilled asEπ := {m ∈M | π(m) holds}. The probability that π is fulfilled given a probabilistic assignmentx← D is defined as

Pr [π(x) : x← D] := PD(Eπ) =∑m s.t.

π(m) holds

PD(m) (2.2)

Given probabilistic functions X1 : M → N and X2 : M × N → O, the sequential execution(X1, X2) : M → N × O is a probabilistic function that intuitively upon argument m runsx← X1(m), y ← X2(m,x), and returns (x, y). Formally,

P(X1,X2)(m)(n, o) := PX1(m)(n) · PX2(m,n)(o) (2.3)

Using Equations 2.2 and 2.3 one obtains the following general formula. Let π be a predicatedepending on m, n, and o. For all m ∈M let

Pr [π(m,n, o) : n← X1(m), o← X2(m,n)] =∑n,o s.t.

π(m,n,o) holds

PX1(m)(n) · PX2(m,n)(o)

In a similar way, one can define a more general version involving i algorithms Xi as well as aversion with several external messages m1, . . . ,ml. However, this quickly gets lengthy, so thatwe will not pursue it here for reasons of clarity.

To illustrate the computation with the above general formula, we again consider the probabil-ity Pr [x = y : x←R {1, . . . , 10}, y ←R {1, . . . , 10}] from the previous section. This probability

10

2.1 On the Notation for Probabilities

can now be formally computed as

Pr [x = y : x← U, y ← U ] =∑

x, y s.t.x = y

PU (x) · PU (y)

=∑x

PU (x) · PU (x)

=10∑x=1

1

10· 110

=1

10.

11

2 Basic Probability Theory

12


saarlanduniversity



saarlanduniversity



saarlanduniversity


Chapter 3

The One-Time Pad and Stream Ciphers

In this chapter, we introduce the one-time pad and strean ciphers, we discuss achieving ran-domness for cryptography, and, finally, introduce basic notions of security for cryptography.

3.1 The One-Time Pad

The one-time pad, also called Vernam cipher, was invented by Gilbert Vernam in 1917. Itwas the first scheme for which a security proof was given, namely by Claude Shannon in 1949.(Before that, not even definitions of security existed.) It can be defined over different messagespaces, for example {a, . . . , z} or {0, 1}. We define it over bitstrings.

Definition 3.1 (One-Time Pad) Let M = C = {0, 1}n for some n ∈ N. Let the key gener-ation algorithm K draw keys from {0, 1}n according to a uniform distribution. Both encryptionand decryption are defined by the exclusive OR of the key K with the plaintext m or the cipher-text c, respectively, i.e.,

E(K,m) := K ⊕m and D(K, c) := K ⊕ c.

Only one message can be securely encrypted with each key, hence the name of the scheme. Thecorrectness of the one-time pad can easily be shown: Let K ∈ [K] = {0, 1}n, m ∈ M, andc ∈ [E(K,m)], i.e., c = K ⊕m since E is deterministic. Then

D(K, c) = D(K,K ⊕m)

= K ⊕K ⊕m= m.

3.1.1 Perfect Secrecy of the One-Time Pad

The first definition of security for encryption schemes was the definition of perfect secrecy givenby Claude Shannon in 1949. Intuitively it says that any message is encrypted to a specific cipher-text with the same probability. This essentially means that, given a ciphertext, no adversarygains any information on which plaintext was encrypted.

Definition 3.2 (Perfect Secrecy) Let (K,E,D) be an encryption scheme with messagespaceM and ciphertext space C. The encryption scheme provides perfect secrecy if and only ifthe following holds for all m0,m1 ∈M and for all c ∈ C:

Pr[c = c′ : K ← K, c′ ← E(K,m0)

]= Pr

[c = c′ : K ← K, c′ ← E(K,m1)

]13

3 The One-Time Pad and Stream Ciphers

This definition essentially says that for a randomly chosen key K the probability that a messageis encrypted to a particular ciphertext c is the same for all messages mi. Equivalent variants ofthis definition exist based on statistical independence and entropy.

Shannon proved that the one-time pad satisfies this notion of perfect secrecy. Even if theproof is rather simple, it constituted a major step in the history of cryptography since it wasthe first proof of security of a cryptographic scheme.

Proposition 3.3 The one-time pad provides perfect secrecy.

Proof. Let m0,m1, c ∈ {0, 1}n be given. For i ∈ {0, 1} it holds that

Pr[c = c′ : K ← K, c′ ← E(K,mi)

]=

∑K, c′ s.t.c′ = c

Pr [K∗ = K : K∗ ← {0, 1}n] · Pr[c∗ = c′ : c∗ = mi ⊕K

]=

∑K

Pr [K∗ = K : K∗ ← {0, 1}n] · Pr [c∗ = c : c∗ = mi ⊕K]

=∑K

2−n ·{

1 if c = K ⊕mi

0 else

= 2−n.

This holds both for i = 0 and for i = 1, hence the claim follows.

The one-time pad has the disadvantage that the key is as long as the message, and a keycannot be reused without sacrificing perfect secrecy as we will show below. Thus deploying thisscheme necessarily requires to securely transmit a large amount of keys. While this is done invery sensitive environments (the “Red Telephone” linking the White House with the Kremlinduring the Cold War was encrypted with the one-time pad) it is impractical for essentially anyapplication. So the question naturally arises if one can securely encrypt with shorter keys.Unfortunately, Shannon also proved that this is not possible given the definition of perfectsecrecy.

Proposition 3.4 (Optimality of the One-Time Pad) Let (K,E,D) be a cipher with mes-sage spaceM, ciphertext space C, and key space K := [K]. If the cipher provides perfect secrecy,then |K| ≥ |M|.

Proof. Let m1 ∈ M, K1 ∈ K, and c ∈ [E(K1,m1)] arbitrary. Assume for contradiction that|K| < |M|. Let Mc := {m ∈ M | ∃K ∈ K : D(K, c) = m} denote the set of plaintexts thatthe ciphertext c could be decrypted to using some key K. Since decryption is deterministic andsince |K| < |M|, we have that |Mc| < |M|. Thus there exist some m2 ∈ M \Mc, and thism2 satisfies that for all K ∈ K we have D(K, c) 6= m2 by definition of Mc. For the plaintextsm1,m2 and the ciphertext c one obtains

Pr[c = c′ : K ← K, c′ ← E(K,m1)

]> 0 = Pr

[c = c′ : K ← K, c′ ← E(K,m2)

].

This contradicts the assumption that the cipher provides perfect secrecy. Hence the assumption|K| < |M| was not correct.

14

3.2 Stream Ciphers

3.1.2 Active Attacks Against the One-Time Pad

If an adversary not only reads messages but also changes messages (called an active adversary),then active attacks such as the following one become possible. Consider a simple voting sys-tem, where each participant votes 0 or 1. This vote is encrypted using the one-time pad, i.e.,each participant i shares a random bit Ki ←R {0, 1} with a central voting authority. Everyparticipant sends the ciphertext ci := Ki ⊕mi to the central authority. The voting authoritythen computes ci ⊕Ki = mi and counts all votes. Note that this is a very weak form of votingscheme since the voting authority can see the votes of every voter. This is provided for the sakeof demonstration only.

Now assume that an attacker knows (e.g., by means of an opinion poll) that a majority ofall participants is going to vote 0. Then by intercepting all votes ci and replacing them withc∗i := ci⊕1 the attacker is able to invert the outcome of the vote, since these ciphertexts decryptto

c∗i ⊕Ki = (ci ⊕ 1)⊕Ki = ((mi ⊕Ki)⊕ 1)⊕Ki = mi ⊕ 1

One says that the one-time pad is highly malleable, i.e., it offers no form of integrity of the plain-texts. Thus, the one-time pad should only be used with suitable integrity protection mechanisms.These are called MACs (message authentication codes) and will be discussed in Section 6.1.

3.2 Stream Ciphers

Stream ciphers constitute practical encryption schemes that can easily be implemented in hard-ware. They are widely used in protocols such as SSL/TLS and 802.11b WEP.

We have seen that the one-time pad provides very strong security guarantees, but it is notusable for most practical applications because the keys need to be as long as the messages inorder to retain these guarantees. One idea to overcome this problem is to not use keys that arefully random but keys that only look random. A short truly random string is used as input toa deterministic function, a pseudorandom generator G, which computes a larger string. Thislarger string of course cannot be truly random, but it will be such that no “efficient” machine cantell it apart (can distinguish it) from true randomness. Consequently, it is called a pseudorandomstring.

Typical examples of stream ciphers have encryption and decryption functions that simplytake the exclusive OR of the key Ki, that is pseudorandomly generated by G, and the messagemi or the ciphertext ci, respectively.

Recall that we proved in the last chapter that such a cipher cannot provide perfect secrecysince its key space is much smaller than the message space. However, under the assumptionthat the adversary has not unbounded power, but is “efficient”, it still provides strong securityguarantees.

3.2.1 Unpredictability of Pseudorandom Generators (PRGs)

Why do we need unpredictability? Assume an email is transmitted in an encrypted manner.Then one knows that large parts of the plaintext, e.g., the header, have a fixed format. Thus anattacker can obtain some bytes of the pseudorandom string at the beginning by computing theexclusive OR of the initial bits of the encryption and the known parts of the header. Perhapshe needs to guess where exactly the header is placed, but this is possible with reasonably largeprobability. If he was able to predict the behavior of the pseudorandom generator, then hewould be able to decrypt the remaining parts of the message.

15


3.2.2 Getting True Randomness in Practice

The seed for a pseudorandom generator, and keys for symmetric encryption schemes in general,should be as random as possible, for example, one uses physical random number generators toget good randomness.

There are some physical sources that are supposed to produce good randomness, but theresulting bits may have a certain bias or some correlation. In practice, one often circumventsthis by taking the exclusive OR of bits obtained from different such sources; in theory it is moreappropriate to apply a randomness extractor to improve the quality of the output. An examplefor such a randomness extractor is the Von Neumann extractor or hash functions.Typical physical sources of randomness include:• thermal noise in various electric circuits,• shot noise,• avalanche noise,• atmospheric noise, captured by antennas,• quantum effects, e.g., radioactive decay obtained from Geiger-Müller tubes,• Brownian motion,• more playfully, lava lamps. There is even a patent for creating random bits with lava

lamps (US patent #5,732,138).In July 2009, Reidler, Aviad, Rosenblut and Kanter published a ultrahigh-speed random numbergenerator based on the fluctuating intensity of a chaotic semiconductor laser. The generationrate is up to 12.5 GBits/s.

There is also a number of random sources that are more easily available, at the cost ofproducing weaker randomness. What is used also in practice is• measurement of times between user key-strokes, and• time needed to access different sectors on the hard-disk drive (the air turbulences caused

by the spinning disk is supposed to be sufficiently random).

3.2.3 Examples of Pseudorandom Generators

In the following, we provide examples for pseudorandom generators and discuss their strengthsand weaknesses.

RC4

RC4 is a stream cipher invented by Ron Rivest in 1987 for RSA Security Inc., which also holdsthe trademark for it. The source code was originally not released to the public because it wasa trade secret, but was posted to a mailing list in September 1994; thus people referred to thisversion as alleged RC4. Today it is known that alleged RC4 indeed equals RC4. While RC4does not satisfy next-bit unpredictability, it is considered secure from a practical point of viewif one takes certain precautions. It is used in many protocols such as SSL/TLS and 802.11bWEP.

Its main data structure is an array S of 256 bytes. The array is initialized to the identitybefore any output is generated, i.e., the first cell is initialized with 0, the second with 1 and soon. Then the cells are permuted using certain operations that depend on the current state andthe chosen key K. In pseudocode, the RC4 initialization phase works as follows:

for i from 0 to 255S[i] := i

16

3.2 Stream Ciphers

j := 0for i from 0 to 255

j := (j + S[i] + K[i mod keylength]) mod 256swap(S[i],S[j])

After initialization has been completed, the following procedure computes the pseudorandomsequence. For each output byte, new values of the variables i and j are calculated, the corre-sponding cells are swapped, and the content of a third cell is output:

i := 0j := 0while GeneratingOutput

i := (i + 1) mod 256j := (j + S[i]) mod 256swap(S[i],S[j])output S[(S[i] + S[j]) mod 256]

One easily sees that the first byte that is output depends on the content of three cells only. Thisproperty can be used to launch attacks against the cipher, so one usually discards the first 256bytes of output generated by this algorithm to prevent these attacks.

Linear Feedback Shift Register (LFSR)

Linear feedback shift registers (LFSRs) constitute a building block for generating pseudorandomstrings. An LFSR consists of a shift register that shifts its content for one bit at each clock,and a function that determines the feedback, i.e., the next value that enters the shift register,as a function of its current state. For LFSRs this is a linear function, e.g., the exclusive OR ofcertain specified bits in the state. But also feedback shift registers with a more complex feedbackfunction exist. The initial state of the shift register comprises the seed. A sketch of an LFSR isgiven in Figure 3.1. The security of an LFSR is determined by its feedback function and how it

K0 K1 K2 K3 K4

Figure 3.1: Linear feedback shift register

is interlinked with the surrounding operations. Note that LFSRs on their own cannot naively beused as stream ciphers because directly outputting the bits of the state leaks the current stateof the register, so after a full turn the complete state is known. A solution often taken is thatone throws away the first bits and subsequently combines LFSRs in some clever way either withother LFSRs or with other cryptographic primitives. For instance, one encrypts the output ofan LFSR using a secure block cipher, which yields a pseudorandom sequence provided that thebits output by an LFSR do not repeat themselves.

17


The Content Scrambling System (CSS)

We conclude with a brief description of the content scrambling system (CSS), which is a propri-etary standard for encrypting the contents of DVDs. The key management is rather complexand we do not go into details here. The actual content is encrypted with a stream cipher whosecorresponding pseudorandom generator will be described in the sequel. Figure 3.2 illustratesthe pseudorandom generator.

17-bit LFSR

25-bit LFSR

8

8

add mod

256

+

seed

1||K0K1

1||K2K3K4

Figure 3.2: Content scrambling system

Each sector key consists of 5 bytes K0, . . . ,K4. The pseudorandom generator comprises twoLFSRs that operate as follows. The first LFSR has a length of 17 bits and is initialized with1 ‖ K0 ‖ K1, where the feedback is computed as the exclusive OR of bits 1 and 15. The secondLFSR has a length of 25 bits initialized as 1 ‖ K2 ‖ K3 ‖ K4, where the feedback is calculated asthe exclusive OR of bits 11, 19, 20, and 23. Both LFSRs are clocked eight times, each of themproducing eight bits of output. This output is treated as bytes, added modulo 256 observing thecarry bit from the previous addition. The resulting output is used to encrypt the actual data.

The most apparent weakness of CSS is its short key size of 40 bits. This is a consequence ofUS export regulations at the time when CSS was standardized. Brute-force attacks are reportedto take 17 hours on a Celeron 366, i.e., on a very old hardware, and thus already much fasteron more recent hardware. More sophisticated attacks reduce the complexity to even only 216

possible keys, thus taking only several seconds until CSS is successfully attacked.

3.3 Adversary Models

We distinguish the following adversarial capabilities for modeling our adversaries:

• Single-message chosen-plaintext attack (1-CPA): the adversary sees one ciphertext only.In the first step, the adversary chooses two messages. In the next step he is given theencryption of one of these messages. This ciphertext is also called challenge ciphertext orsimply challenge. If he cannot distinguish which of these (self-chosen) plaintexts is insidethe challenge ciphertext, he did not learn any partial information about the messagecontained in the challenge. We have already seen ciphers fulfilling this notion, e.g., theone-time pad.

18

3.3 Adversary Models

• Chosen-plaintext attack (CPA): here the adversary sees(polynomially) many ciphertexts of self-chosen plain-texts, i.e., in addition to 1-CPA attacks, he may se-lect arbitrary plaintexts whose encryption he sees beforeproceeding with a challenge. Examples of ciphers fulfill-ing this stronger notion are randomized CBC based onPRPs (cf. Section 4.4.2) as well as randomized countermode based on PRFs (cf. Section 4.4.5).

• Chosen-ciphertext attack (CCA): in addition to the ca-pabilities for CPA, the adversary may choose ciphertextsthat get decrypted for him. In order to avoid trivial dis-tinguishability, these ciphertexts must of course be dif-ferent from the challenge ciphertext. We will work withchosen-ciphertext attacks later.

19


20


saarlanduniversity



saarlanduniversity



saarlanduniversity


Chapter 4

Block Ciphers and Modes of Operation

In the following, we will see an archetypical example of a block cipher, namely, DES. We willsubsequently discuss how block ciphers can be used to securely encrypt a larger amount of data,and we will briefly present some attacks on block ciphers. First, however, we will introduce theconcept of Feistel networks, which constitutes an important design principle underlying manyblock ciphers.

4.1 Feistel Networks

Feistel networks are a specific construction for designing symmetric encryption schemes. Theywere first described by Horst Feistel during his work on the cipher Lucifer at IBM. Lucifer wasthe predecessor of the Data Encryption Standard (DES), and both are built upon the samedesign. Other ciphers using Feistel networks include IDEA, RC5, and Skipjack.

A Feistel network defines a function F : {0, 1}2n → {0, 1}2n. It is parameterized by thenumber of rounds d ∈ N and the round functions f1, . . . , fd : {0, 1}n → {0, 1}n. The function Foperates in d rounds, typically between 12 and 16. In the i-th round, the following operationsare executed:

1. the round input is split into two halves Li−1 ‖ Ri−1,2. the round function fi is applied to the right half yielding fi(Ri−1),3. the exclusive OR of this value and the left half is computed yielding Li−1 ⊕ fi(Ri−1),4. the left and right side are swapped, thus yielding the round output

Li ‖ Ri := Ri−1 ‖ Li−1 ⊕ fi(Ri−1)

This construction is depicted in Figure 4.1. The main innovation is that the function F definedby a Feistel network is a permutation for arbitrary functions fi. In particular, these do not evenneed to be invertible.

Proposition 4.1 (Feistel Networks Are Permutations) Let F : {0, 1}2n → {0, 1}2n be aFeistel network using an arbitrary round function fi. Then, F constitutes a permutation.

Proof. First, we show how to invert a single round i. Let Fi denote the function computed inthe i-th round. We define the candidate inverse function Gi for input Li ‖ Ri as Gi(Li ‖ Ri) :=Ri⊕fi(Li) ‖ Li. The following simple calculation shows that this is indeed the inverse function:

Gi(Fi(Li−1 ‖ Ri−1)) = Gi(Ri−1 ‖ Li−1 ⊕ fi(Ri−1))= (Li−1 ⊕ fi(Ri−1))⊕ fi(Ri−1) ‖ Ri−1= Li−1 ‖ Ri−1

21

4 Block Ciphers and Modes of Operation

Li-1 Ri-1

Li Ri

+ fi

Figure 4.1: One round in a Feistel network

The candidate inverse G of the function F having d rounds is now defined as G := G1 ◦ · · · ◦Gd;the following calculation shows that G is indeed the inverse:

G1(G2(. . . Gd−1(Gd(Fd(Fd−1(. . . F1(m) . . . )))) . . . ))

= G1(G2(. . . Gd−1(Fd−1(. . . F1(m) . . . )) . . . ))

= . . .

= G1(F1(m))

= m

Feistel networks are appealing due to their simple design and their additional nice propertythat the same hardware circuit can be used for both encryption and decryption. This wasparticularly important in the 60s and 70s when the first block ciphers were designed, and whenhardware was still much more expensive. There are two minor differences between encryptinga plaintext and decrypting a ciphertext in a Feistel network. However, these do not affect thenetwork itself but only the handling of inputs of the circuit, i.e., messages and inputs of roundfunctions. Namely, (i) the round functions need to be applied in reversed order, and (ii) writingc = Ld ‖ Rd as the ciphertext, one inputs Rd ‖ Ld to the network for decryption, i.e., oneexchanges the left and the right half of the ciphertext, and additionally one exchanges the leftand the right half of the output of the Feistel network. One can easily verify that the Feistelnetwork (with these modifications) computes the decryption operation as defined in the aboveproof.

4.2 The Data Encryption Standard (DES)

For a long time, DES was one of the most widely applied block ciphers. It was designed byIBM in collaboration with the NSA and published as a standard in FIPS PUB 46-3. There wererumors about weaknesses the NSA had built into it, but until now, no evidence was found forthese concerns. The main point of criticism against DES was its limited key length, which isnowadays the reason why pure DES is no longer used. In fact, the first DES break, in the senseof a total break given a certain number of plaintext/ciphertext pairs, was reported in 1997 by theDESCHALL project and took about three months. In 1998, the Electronic Frontier Foundation(EFF) built a specialized hardware device resulting in a break in roughly three days. Later theywhere cooperating with distributed.net breaking a DES challenge in 22 hours. The world recordfor breaking a DES encryption is currently around 10 hours and this is likely to decrease in the

22

4.2 The Data Encryption Standard (DES)

IP:58 50 42 34 26 18 10 2 60 52 44 36 28 20 12 462 54 46 38 30 22 14 6 64 56 48 40 32 24 16 857 49 41 33 25 17 9 1 59 51 43 35 27 19 11 361 53 45 37 29 21 13 5 63 55 47 39 31 23 15 7

Figure 4.2: Initial permutation (IP) in DES: the first bit of the output is the 58-th bit of theinput, the second bit of the output is the 50-th bits of the input, and so on.

Ri

round key Ki+

expansion permutation E

S1 S2 S3 S4 S5 S6 S7 S8

32

48

48

48

32

32

P-Box

Figure 4.3: DES round function fi

future. A variant called Triple-DES (3DES), which we will discuss later in this chapter, is stilldeployed and seems to provide good security.

4.2.1 Overall Construction of DES

DES is essentially a Feistel network of 16 rounds operating on blocks of 64 bits and using akey of 56 bits, which is randomly chosen from {0, 1}56. What makes DES different from a pureFeistel design is the initial permutation IP which is applied before the first round starts, andwhose inverse IP−1 is applied after the last round, see Figure 4.2.The DES round function fi, as depicted in Figure 4.3, executes the following steps:

1. Expansion permutation. The message block of 32-bit length is expanded to a blockof 48 bits by doubling 16 bits and permuting the block as defined by the expansion per-mutation E shown in Figure 4.4. The permutation is chosen in a way that one bit in theinput affects two substitutions in the output (see the S-Boxes below). This creates anavalanche effect, meaning that a small difference between two plaintexts will produce alarge difference in the resulting ciphertexts.

2. Round key addition. The exclusive OR of the 48-bit value with the 48-bit round key Ki

is computed. The derivation of the round keys, i.e., the key schedule, is defined below.3. Splitting. This block of 48 bits is split into eight blocks with 6 bits each.

23


E:32 1 2 3 4 54 5 6 7 8 98 9 10 11 12 13

12 13 14 15 16 1716 17 18 19 20 2120 21 22 23 24 2524 25 26 27 28 2928 29 30 31 32 1

Figure 4.4: Expansion permutation E in DES: the first bit of output is the 32-nd bit of theinput, the second bit of output is the first of the input and so on.

4. S-Box. Each of these blocks is the input to one of the eight S-boxes S1, . . . , S8, as shownin Figure 4.5. Each S-box yields a 4-bit output. Concatenating these outputs yields ablock of 32 bits.

5. P-Box. The permutation P defined in Figure 4.6 is applied to the 32-bit block, yieldingthe output of fi.

4.2.2 Key Schedule

The choice of the key used in each round is called key schedule. Even if a DES key is composedof 64 bits, 8 bits are used for error detection, thus resulting in an actual key length of 56 bits. Inthe first step, the error correction bits are stripped off a 64-bit key K and the bits are permutedby a fixed permutation. Both is performed by the function PC-1 shown in Figure 4.7. Theoutput is divided into a 28-bit left half C0 and a 28-bit right half D0. Now for each round wecompute

Ci = Ci−1 « piDi = Di−1 « pi

where x « pi means a cyclic shift on x to the left by pi positions where pi = 1 if i ∈ {1, 2, 9, 16},and pi = 2 otherwise. Finally, Ci and Di are joined together and permuted according to PC-2,obtaining the 48-bit round key. This permutation is given in Figure 4.7.

4.2.3 The Security of DES

DES withstood attacks quite successfully apart from some attacks based on linear and differentialcryptanalysis which we shall discuss later. However, the major weakness of DES is its limitedkey length of 56 bits which is nowadays not enough to provide a reasonable level of security.For this reason, several improvements on the plain DES have been proposed. We will discusssome of them in the following. For more information about the security of DES we refer toSection 4.3.

4.3 Some Attacks on Block Ciphers

In the sequel we sketch some possible attacks on block ciphers and briefly discuss their effectson DES.

24


S1:14 4 13 1 2 15 11 8 3 10 6 12 5 9 0 70 15 7 4 14 2 13 1 10 6 12 11 9 5 3 84 1 14 8 13 6 2 11 15 12 9 7 3 10 5 0

15 12 8 2 4 9 1 7 5 11 3 14 10 O 6 13S2:15 1 8 14 6 11 3 4 9 7 2 13 12 0 5 103 13 4 7 15 2 8 14 12 0 1 10 6 9 11 50 14 7 11 10 4 13 1 5 8 12 6 9 3 2 15

13 8 10 1 3 15 4 2 11 6 7 12 0 5 14 9S3:10 0 9 14 6 3 15 5 1 13 12 7 11 4 2 813 7 0 9 3 4 6 10 2 8 5 14 12 11 15 113 6 4 9 8 15 3 0 11 1 2 12 5 10 14 71 10 13 0 6 9 8 7 4 15 14 3 11 5 2 12

S4:7 13 14 3 0 6 9 10 1 2 8 5 11 12 4 15

13 8 11 5 6 15 0 3 4 7 2 12 1 10 14 910 6 9 0 12 11 7 13 15 1 3 14 5 2 8 43 15 0 6 10 1 13 8 9 4 5 11 12 7 2 14

S5:2 12 4 1 7 10 11 6 8 5 3 15 13 0 14 9

14 11 2 12 4 7 13 1 5 0 15 10 3 9 8 64 2 1 11 10 13 7 8 15 9 12 5 6 3 O 14

11 8 12 7 1 14 2 13 6 15 0 9 10 4 5 3S6:12 1 10 15 9 2 6 8 0 13 3 4 14 7 5 1110 15 4 2 7 12 9 5 6 1 13 14 0 11 3 89 14 15 5 2 8 12 3 7 0 4 10 1 13 11 64 3 2 12 9 5 15 10 11 14 1 7 6 0 8 13

S7:4 11 2 14 15 0 8 13 3 12 9 7 5 10 6 1

13 0 11 7 4 9 1 10 14 3 5 12 2 15 8 61 4 11 13 12 3 7 14 10 15 6 8 0 5 9 26 11 13 8 1 4 10 7 9 5 0 15 14 2 3 12

S8:13 2 8 4 6 15 11 1 10 9 3 14 5 0 12 71 15 13 8 10 3 7 4 12 5 6 11 0 14 9 27 11 4 1 9 12 14 2 0 6 10 13 15 3 5 82 1 14 7 4 10 8 13 15 12 9 0 3 5 6 11

Figure 4.5: S-Boxes in DES: for a binary string x0x1x2x3x4x5, the output is computed by lookingup the entry in row x0x5 and column x1x2x3x4. This representation has historical reasons.

P

16 7 20 21 29 12 28 171 15 23 26 5 18 31 102 8 24 14 32 27 3 9

19 13 30 6 22 11 4 25

Figure 4.6: P-Box in DES: the first bit of the output is the 16-th bit of the input.

25


PC-1:57 49 41 33 25 17 91 58 50 42 34 26 18

10 2 59 51 43 35 2719 11 3 60 52 44 3663 55 47 39 31 23 157 62 54 46 38 30 22

14 6 61 53 45 37 2921 13 5 28 20 12 4

PC-2:14 17 11 24 1 53 28 15 6 21 10

23 19 12 4 26 816 7 27 20 13 241 52 31 37 47 5530 40 51 45 33 4844 49 39 56 34 5346 42 50 36 29 32

Figure 4.7: DES key schedule permutations PC-1 and PC-2 (note that the indices refer to theoriginal key with error-correcting bits)

C0 D0

K

PC-1

LS1 LS1

C1 D1

LS2 LS2

LS16 LS16

C16 D16

PC-2 K1

PC-2 K16

64bit

28bit28bit

28bit 28bit

28bit 28bit

28bit28bit

28bit 28bit

28bit 28bit

28bit28bit

Figure 4.8: Key scheduling in DES

26


4.3.1 Exhaustive Key Search

The conceptually simplest attack that can be mounted against any block cipher is exhaus-tive key search, also called brute-force attack. Knowing a few plaintext/ciphertext pairs(m1, c1), (m2, c2), . . . with ci = E(K,mi), one tries to find the key K by testing every possi-ble key.

To estimate the effectiveness of this attack we need to know how many keys would encrypt afixed plaintext to a specific ciphertext. We will give an estimation for this in the sequel, wherewe replace the block cipher with a random function for every key. While this technique is notaccurate, it makes use of the intuition that the encryption function should essentially constitutea random function, and is usually considered to be a reasonable abstraction. In this case, onesays that the block cipher is considered to be an ideal cipher. This heuristic is used quite often,because an accurate analysis of a real block cipher such as DES is extremely complex.

For the case of DES we prove the following statement: for a randomly chosen messageblock m ∈ {0, 1}64 and a randomly chosen key K ∈ {0, 1}56, the probability that there existsanother key K ′ ∈ {0, 1}56 encrypting the messagem to the same ciphertext is very low, providedthat DES behaves like an ideal cipher. Hence, the probability that a key is already uniquelydetermined after seeing one plaintext/ciphertext pair, is high.

Lemma 4.2 Let (KDES,EDES,DDES) denote the idealized DES cipher, i.e., for randomly cho-sen key K ∈ [KDES], the function EDES(K, ·) is “as good as” a randomly chosen functionR : {0, 1}64 → {0, 1}64. Then

Pr[∃K ′ ∈ [KDES] s.t. EDES(K ′,m) = EDES(K,m) ∧K ′ 6= K : m←RM, K ← KDES

]≤ 1

256

Proof. We have

Pr[∃K ′ ∈ [KDES] s.t. K ′ 6= K ∧ EDES(K,m) = EDES(K ′,m) : m←RM,K ← KDES

](1)

≤∑

K′∈{0,1}56Pr[EDES(K ′,m) = EDES(K,m) ∧K ′ 6= K : m←RM, K ← KDES

]=

∑K,K′∈{0,1}56

1

256· Pr

[EDES(K ′,m) = EDES(K,m) ∧K ′ 6= K : m←RM

](2)=

∑K,K′∈{0,1}56

1

256·{

0 if K = K ′1264

if K 6= K ′

≤ 256 · 256

256 · 264= 2−8

For inequality (1) we exploited that the probability of a union is always less or equal thanthe sum of the probabilities of all elements of the union. Equality (2) holds because EDES

was considered to be an ideal cipher. Consequently, both EDES(K,m) and EDES(K ′,m) are(independent) uniformly random in {0, 1}64 over the choice of the random functions, providedthat the keys are not equal. Thus the probability that both encryptions are equal is 1

264.

27


Exhaustive Key Search in Practice. The DES Challenge was put forward by RSA Securityto encourage research on the security of DES. A reward of $10,000 was offered for solving thechallenge, i.e., for computing the key that was used to encrypt a specific plaintext/ciphertextpair of the form “The unknown message is: ––––”. In 1997, the DESCHALL project neededabout three months to break the DES challenge with a distributed search. In 1998, the ElectronicFrontier Foundation (EFF) built the specialized hardware device Deep Crack that was able tobreak DES keys in roughly three days, at rather moderate costs of about $250,000. While thisis certainly too expensive for individuals, this amount is reasonable for large organizations orgovernments. Later the EFF and distributed.net together broke the challenge in 22 hours.

4.3.2 Linear Cryptanalysis

Suppose messages m and keys K are drawn uniformly at random from their respective sets.Further suppose one knows i1, . . . , ir, j1, . . . , js, k1, . . . , kt such that

Pr[m(i1) ⊕ . . .⊕m(ir) ⊕ c(j1) ⊕ . . .⊕ c(js) ⊕K(k1) ⊕ . . .⊕K(kt) = 1

]≥ 1

2+ ε,

where m(i), c(i), K(i) denote the i-th bits of the bitstring m, the ciphertext c, and the key K,respectively. Such relations can be found for DES with ε ≈ 2−21. If one gets enough plain-text/ciphertext pairs, one can obtain partial information about the key. Namely for 1/ε2 plain-text/ciphertext pairs (ml, cl) one has

Pr[K(k1) ⊕ . . .⊕K(kt) = MAJ

(m

(i1)l ⊕ . . .⊕m(ir)

l ⊕ c(j1)l ⊕ . . .⊕ c(js)l | l = 1, . . . , 1/ε2)]≥ 97.7%.

where MAJ denotes the majority function, taken over all exclusive ORs computed from anygiven plaintext/ciphertext pair, i.e., MAJ(bi | i = 1, . . . , n) returns 0 if n2 or more arguments biare 0, and 1 otherwise.

One can calculate that obtaining 242 plaintext/ciphertext pairs for DES allows for retrieving14 bits of information of the key using these techniques. This enables an attacker to reducethe complexity of an exhaustive key search to 242 DES operations, as one only has to test keysfulfilling the relation.

An even more involved method to break block ciphers is differential cryptanalysis, which wewill not discuss here.

4.3.3 Side-channel Attacks

All the previous attacks are concerned with the input/output behavior of the algorithm only.This is the most common view, sometimes called the standard view, on cryptographic algorithms.However, in certain scenarios an attacker can gain access to more data. This can, for example,be the time needed to compute an encryption, the electric power consumption of a processor, orthe electromagnetic emanation of a device, in particular of smartcards. Some of these attackswere able to completely break a system in seconds, and several smartcards have been withdrawnafter it was discovered that they were highly vulnerable to such attacks.

One of the most famous side-channel attacks is due to Paul Kocher, who measured electricpower consumption of microprocessors. He showed that by inspection of power consumptioncurves one can identify which conditional branch the executed code took. Consequently, if theseconditions depend on the secret key, then information about the key is leaked. In fact, fromseveral popular algorithms the key could quite easily be retrieved using such an attack.

28

4.4 Modes of Operation


Given a block cipher (K,E,D) with message spaceM = {0, 1}n and key space K := [K], i.e., asymmetric encryption scheme where the message space M corresponds to the cipher space C.There are several ways it can be used to encrypt messages that are longer than a single block.The standard FIPS PUB 81 defines four modes of operation (ECB, CBC, OFB, CFB). Anothermode is the counter mode, which has several variants. All operation modes have in commonthat their key generation algorithm is the same as the key generation for the cipher, we omit itsdescription in the sequel.

4.4.1 Electronic Codebook Mode (ECB)

The electronic codebook (ECB) mode corresponds to the “naive” use of a block cipher. Themessage is split into a sequence m = m1, . . . ,mt of blocks mi ∈M. Each block mi is encryptedwith the same key K. Formally, the encryption c = c1, . . . , ct is computed as

ci := E(K,mi).

The decryption operation is obvious. The ECB mode is depicted in Figure 4.9.The main weakness of the ECB mode is that identical blocks mi are encrypted to the same

ciphertext ci. This is a particularly strong weakness if messages come from a source with lowentropy, e.g., securely encrypting (uncompressed) images is not possible using ECB. On thepositive side, if one block ci gets corrupted due to unreliable channels, then only one block isaffected while all other blocks can still be decrypted.

4.4.2 Cipher Block Chaining Mode (CBC)

The cipher block chaining (CBC) mode overcomes the main weakness of ECB by using aninitialization vector IV which is chosen uniformly at randomly from the setM in the encryptionprocess as follows. A message m = m1, . . . ,mt is encrypted by computing

c1 := E(K,m1 ⊕ IV ) andci := E(K,mi ⊕ ci−1), for i ≥ 2,

i.e., each ciphertext block ci is XOR’ed with the next block of the plaintext. The ciphertext isc = IV , c1, . . . , ct. The decryption operation is straightforwardly derivable from the encryption;thus we omit its description. The CBC mode is depicted in Figure 4.10.

For the CBC mode, a one-bit error in the ciphertext gives not only a one-block error in thecorresponding message block but also a one-bit error in the next decrypted plaintext block. Theinitial value IV ensures that two encryptions of the same plaintext yield different ciphertexts,as opposed to the ECB mode. Note that the initialization vector needs to be included in theciphertext, as otherwise the ciphertext cannot be decrypted.

4.4.3 Cipher Feedback Mode (CFB)

To encrypt a message m = m1, . . . ,mt in the cipher feedback (CFB) mode, one first randomlychooses an initialization vector IV . The ciphertext c = IV , c1, . . . , ct is computed by appendingthe initialization vector IV to the ci that are computed as follows:

c1 := m1 ⊕ E(K, IV ), andci := mi ⊕ E(K, ci−1) for i ≥ 2.

29

4 Block Ciphers and Modes of Operationm1 m2c1 c2E(K, )E(K, ) c1 c2m1 m2D(K, )D(K, )Figure 4.9: Encryption and Decryption in ECB Modem1E(K, )+IV c1

m2E(K, )+c2c1D(K, )+IV m1

c2D(K, )+m2Figure 4.10: Encryption and Decryption in CBC Mode

m1 m2

c1 c2

E(K,.) E(K,.)+

IV

+

c1 c2

m1 m2

E(K,.) E(K,.)+

IV

+

Figure 4.11: Encryption and Decryption in CFB Mode

m1

E(K,.)

+

IV

c1

E(K,.)

m2

+

c2

E(K,.)

c1

E(K,.)

+

IV

m1

E(K,.)

c2

+

m2

E(K,.)

Figure 4.12: Encryption and Decryption in OFB Modem1+IV c1 m2+c2IV+1 IV+2E(K, ) E(K, ) c1+IV m1 c2+m2IV+1 IV+2E(K, ) E(K, )Figure 4.13: Encryption and Decryption in CTR Mode

30


The decryption operation is defined in the straightforward manner. Note that the decryptionalgorithm D from the block cipher is not involved.

A one-bit error in the ciphertext causes a one-bit error in the corresponding plaintext blockand a block-error in the next decrypted plaintext block, i.e., the opposite of what happens inthe CBC mode. The CFB is depicted in Figure 4.11.

4.4.4 Output Feedback Mode (OFB)

To encrypt a messagem = m1, . . . ,mt in the output feedback (OFB) mode one randomly choosesan initialization vector IV , the ciphertext c = IV , c1, . . . , ct is computed by appending theinitialization vector to the ci that are computed as follows

ci := mi ⊕ Ei(K, IV ).

where Ei(K, ·) means iterating the encryption algorithm i times. Of course, when implementingthis mode one reuses previously computed iterations in the obvious way, see also Figure 4.12.Decryption is computed analogously. Again, the initialization vector needs to be included in theciphertext.

The OFB mode has the property that a one-bit error in the ciphertext gives only a one-biterror in the decrypted ciphertext. Furthermore, the key stream can be pre-computed, as it doesnot depend on the ciphertext. So this mode is good if latency between receiving a message anddecrypting it is important.

A Note on the Initialization Vector. We have defined the modes ECB, CBC, CFB, andOFB in such a way that the initialization vector is chosen uniformly at random and included inthe ciphertext. However, the key requirement is that an initialization vector is not used twice,so one could define these modes such that subsequent initialization vectors are determined by acounter. Then the initialization vector does not have to be sent with the ciphertext; however,both the sender and the recipient have to synchronize their counters.

4.4.5 Counter Mode (CTR)

In contrast to the former modes of operation, the key stream in the counter mode is computedby applying the function E to values generated by a so-called counter, which can be essentiallyany function that will not repeat itself for a long time. In the following we will take a simple,deterministic counter. This mode comes in two variations, depending on how the initializationvector is chosen.

The overall operation of the counter mode (both deterministic and randomized) is sketchedin Figure 4.13. Let an initialization vector IV be given. (We will later explain how it maybe chosen.) The encryption of a message m = m1, . . . ,ml, where each mi ∈ M, is given byc = c1, . . . , cl where

ci := mi ⊕ E(K, IV + i).

Here, the addition IV + i is computed modulo 2n.

Deterministic Counter Mode (detCTR). In this variant, the initialization vector is chosento be IV = 0. We will see that this has the consequence that only one message can be encryptedsecurely, as we would re-use a value output by the function E when encrypting the secondmessage.

31


Deterministic counter mode has very similar characteristics to OFB. A bit-error in ciphertextleads to a bit-error in the message; subsequent blocks are not affected. The key stream can beprecomputed, and random access to the ciphertext is possible.

Randomized Counter Mode (randCTR). This variant does not use a fixed initializationvector, but randomly chooses a new one for every message that is encrypted. This value has tobe sent along with the ciphertext c as constructed above, i.e., the actual ciphertext is (IV , c).

Randomized counter mode has very similar characteristics to detCTR and OFB. However,in contrast to detCTR we will see that it can be used to securely encrypt multiple messages.As for detCTR, the key stream can be precomputed, and random access to the ciphertext ispossible.

32

BasicProbabilityTheory€¦ · 2 BasicProbabilityTheory Thisdescribesthefollowingsituation: First,...

Documents

Transcript of BasicProbabilityTheory€¦ · 2 BasicProbabilityTheory Thisdescribesthefollowingsituation: First,...