Transcript of COMS30124 : Crypto and Information Theory
COMS30124 : Crypto and Information Theory
Elisabeth Oswald and Nigel Smart
Department of Computer Science, University of Bristol,
Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, United Kingdom.
11th October 2006
Elisabeth Oswald and Nigel Smart
COMS30124 : Crypto and Information Theory Slide 1
Outline
Computational Security
Recap on Probability Theory
Probability and Ciphers
Shannon’s Theorem
Entropy and Uncertainty
Entropy and Cryptography
Information Theory and Cryptography
Information Theory is one of the foundations of computer science.
Here we will examine its relationship to cryptography.
We will be following
- Chapter 4 of Smart - Cryptography, an Introduction
Other books are
- Chapter 2 of Stinson - Cryptography: Theory and Practice
- Chapter 1 of Welsh - Codes and Cryptography
Computational Security
A system is computationally secure if the best algorithm for breaking it requires N operations,
- where N is a very big number.
- No practical system can be proved secure under this definition.
In practice we say a system is computationally secure if the best known algorithm for breaking it requires an unreasonably large amount of computer time.
Computational Security
Another, practical, approach is to reduce a well-studied hard problem to the problem of breaking the system.
- E.g.: the system is secure if a given integer n cannot be factored.
Systems of this form are often called provably secure.
- However, we only have a proof relative to some hard problem.
- Not an absolute proof.
Essentially we are bounding the computational power of the adversary.
- Even if the adversary has limited (but large) resources she still will not break the system.
Computational Security
When considering schemes which are computationally secure
- we need to be careful about the key sizes etc.;
- we need to keep abreast of current algorithmic developments;
- at some point in the future we should expect our system to be broken (maybe many millennia hence though).
Most schemes in use today are computationally secure.
Unconditional Security
For unconditional security we place no bound on the computational power of the adversary.
In other words, a system is unconditionally secure if it cannot be broken even with infinite computing power.
- Some systems are unconditionally secure.
Other names for unconditionally secure are
- perfectly secure
- information theoretically secure
Examples
Of the ciphers we have seen, or of those we are to see later on, the following are not computationally secure:
- Caesar cipher
- Substitution cipher
- Vigenère cipher
The following are computationally secure but not unconditionally secure:
- DES (?) - AES
- RSA
The one time pad is unconditionally secure if used correctly.
Probability Diversion
To study perfect security we need to look a little at probability.
A random variable X is a variable which takes certain values with certain probabilities.
Examples:
- Let X be the random variable representing tosses of a fair coin:
  - p(X = heads) = 1/2
  - p(X = tails) = 1/2
- Let X be the random variable representing letters in English text:
  - p(X = a) = 0.082, p(X = e) = 0.127, p(X = z) = 0.001
Probability Diversion
Let X and Y be random variables.
- p(X = x) is the probability that X takes the value x.
- p(Y = y) is the probability that Y takes the value y.
The joint probability is defined as follows:
- p(X = x, Y = y) is the probability that X takes the value x and Y takes the value y.
X and Y are independent iff
- p(X = x, Y = y) = p(X = x) · p(Y = y) for all values of x and y.
Conditional Probability
The conditional probability is defined as follows:
- p(X = x | Y = y) is the probability that X takes the value x given that Y takes the value y.
We have
p(X = x , Y = y) = p(X = x | Y = y) · p(Y = y)
p(X = x , Y = y) = p(Y = y | X = x) · p(X = x)
Bayes’ Theorem
The following is one of the most crucial statements in probability.
Bayes' Theorem: If p(Y = y) > 0 then

p(X = x | Y = y) = p(X = x, Y = y) / p(Y = y)
                 = p(Y = y | X = x) · p(X = x) / p(Y = y)
X and Y are independent iff p(X = x | Y = y) = p(X = x)
- i.e. the value of X does not depend on the value of Y.
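Bayes' Theorem can be checked numerically on a small joint distribution. This sketch uses a made-up distribution (the values below are arbitrary illustrations, not from the slides) and exact fractions so no rounding hides an error:

```python
from fractions import Fraction as F

# An arbitrary joint distribution p(X = x, Y = y) over two binary variables.
joint = {('x1', 'y1'): F(1, 8), ('x1', 'y2'): F(3, 8),
         ('x2', 'y1'): F(1, 4), ('x2', 'y2'): F(1, 4)}

# Marginals p(X = x) and p(Y = y) by summing out the other variable.
pX = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in ('x1', 'x2')}
pY = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in ('y1', 'y2')}

for (x, y), pxy in joint.items():
    cond_x_given_y = pxy / pY[y]       # p(X=x | Y=y), from the definition
    cond_y_given_x = pxy / pX[x]       # p(Y=y | X=x), from the definition
    # Bayes' Theorem: p(X=x | Y=y) = p(Y=y | X=x) · p(X=x) / p(Y=y)
    assert cond_x_given_y == cond_y_given_x * pX[x] / pY[y]
print("Bayes' Theorem holds for all four (x, y) pairs")
```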
Probability and Ciphers
Let P denote the set of possible plaintexts.
Let K denote the set of possible keys.
Let C denote the set of possible ciphertexts.
Let P, K, C be associated random variables with probabilities p(P = m), p(K = k), p(C = c).
We make the reasonable assumption that P and K are independent.
The set of ciphertexts under the key k is defined by
C(k) = {e_k(m) : m ∈ P}.
Probability and Ciphers
We have the relationship

p(C = c) = Σ_{k : c ∈ C(k)} p(K = k) · p(P = d_k(c)).

For c ∈ C and m ∈ P we can compute the probability p(C = c | P = m). This is the probability that c is the ciphertext given that m is the plaintext:

p(C = c | P = m) = Σ_{k : m = d_k(c)} p(K = k).

To break a cipher we want to know the probabilities of the plaintext given a certain ciphertext.
Probability and Ciphers
We can compute the probability of m being the plaintext given that c is the ciphertext:

p(P = m | C = c) = p(C = c | P = m) · p(P = m) / p(C = c)
                 = p(P = m) · Σ_{k : m = d_k(c)} p(K = k) / ( Σ_{k : c ∈ C(k)} p(K = k) · p(P = d_k(c)) ).

This can be computed by anyone who knows the probability distributions of K and P and the encryption function.
Using these probabilities one may be able to deduce some information about the plaintext once one has seen the ciphertext.
Example
Suppose we have P = {a, b}, i.e. only two possible messages:
- p(P = a) = 1/4 and p(P = b) = 3/4.
Suppose we have K = {k1, k2, k3}, i.e. three possible keys:
- p(K = k1) = 1/2 and p(K = k2) = p(K = k3) = 1/4.
Suppose we have C = {1, 2, 3, 4} with encryption e_k(m) given by

    e_k(m) | a  b
    -------+-----
    k1     | 1  2
    k2     | 2  3
    k3     | 3  4

We can then compute

p(C = 1) = 1/8,
p(C = 2) = 7/16,
p(C = 3) = 1/4,
p(C = 4) = 3/16.
Example
We can now compute the conditional probabilities:

p(P = a | C = 1) = 1     p(P = b | C = 1) = 0
p(P = a | C = 2) = 1/7   p(P = b | C = 2) = 6/7
p(P = a | C = 3) = 1/4   p(P = b | C = 3) = 3/4
p(P = a | C = 4) = 0     p(P = b | C = 4) = 1

Hence
- if we see the ciphertext 1 we know the message is a;
- if we see the ciphertext 4 we know the message is b;
- if we see the ciphertext 3 we guess the message is b;
- if we see the ciphertext 2 we guess the message is b.
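These tables can be reproduced in a few lines of Python. This is a sketch of the toy cryptosystem from the example; exact fractions from the standard library keep the arithmetic honest:

```python
from fractions import Fraction as F

# The toy cryptosystem from the slides: P = {a, b}, K = {k1, k2, k3}, C = {1, 2, 3, 4}.
enc = {('k1', 'a'): 1, ('k1', 'b'): 2,
       ('k2', 'a'): 2, ('k2', 'b'): 3,
       ('k3', 'a'): 3, ('k3', 'b'): 4}
pP = {'a': F(1, 4), 'b': F(3, 4)}
pK = {'k1': F(1, 2), 'k2': F(1, 4), 'k3': F(1, 4)}

# p(C = c) = sum of p(K = k) · p(P = m) over all (k, m) with e_k(m) = c.
pC = {}
for (k, m), c in enc.items():
    pC[c] = pC.get(c, F(0)) + pK[k] * pP[m]
for c in sorted(pC):
    print(f"p(C={c}) = {pC[c]}")        # 1/8, 7/16, 1/4, 3/16

# p(P = m | C = c) via Bayes' Theorem.
def p_plain_given_cipher(m, c):
    joint = sum(pK[k] * pP[mm] for (k, mm), cc in enc.items()
                if mm == m and cc == c)
    return joint / pC[c]

print(p_plain_given_cipher('a', 2))     # 1/7
print(p_plain_given_cipher('b', 3))     # 3/4
```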
Example - Conclusion
So in the previous example the ciphertext does reveal information about the plaintext.
This is exactly what we wish to avoid.
We want the ciphertext to give no information about the plaintext.
A system with this property is said to be perfectly secure.
Perfect Secrecy
A cryptosystem has perfect secrecy iff

p(P = m | C = c) = p(P = m)

for all m ∈ P and c ∈ C.
That is, the probability that the plaintext is m given that the ciphertext c is observed is the same as the probability that the plaintext is m without seeing c.
In other words, knowing c reveals no information about m.
Perfect Secrecy
Recall: perfect secrecy means p(P = m | C = c) = p(P = m). This is equivalent to

p(C = c | P = m) = p(C = c).

Assume p(C = c) > 0 for all c ∈ C (if not, remove c from C).

For any fixed m we have p(C = c | P = m) = p(C = c) > 0. This means that for all c there must be at least one key k such that e_k(m) = c.

Conclusion: #K ≥ #C.

Note: we always have #C ≥ #P since encryption with a given key k is injective.
Shannon’s Theorem
Shannon's Theorem is the most important theorem in the information theoretic study of cryptography.

Shannon's Theorem: Suppose (P, C, K, e_k(·), d_k(·)) is a cryptosystem with #P = #C = #K. This cryptosystem provides perfect secrecy if and only if every key is used with equal probability 1/#K and, for each m ∈ P and c ∈ C, there is a unique key k such that e_k(m) = c.

Note the statement is "if and only if", hence we need to prove it in both directions.
Proof
Suppose the system provides perfect secrecy.
Then, for any fixed m ∈ P, we know that for all c ∈ C there is at least one key k such that e_k(m) = c.
Since #C = #K we have

#C = #{e_k(m) : k ∈ K} = #K,

i.e. there do not exist two keys k1 and k2 such that

e_{k1}(m) = e_{k2}(m) = c.

So for all m ∈ P and c ∈ C there is a unique k ∈ K such that e_k(m) = c.
We need to show that every key is used with equal probability, i.e. p(K = k) = 1/#K for all k ∈ K.
Proof
Let n = #K, let P = {m_i : 1 ≤ i ≤ n}, and fix c ∈ C.
Label the keys k_1, . . . , k_n such that

e_{k_i}(m_i) = c for 1 ≤ i ≤ n.

Due to perfect secrecy we have p(P = m_i | C = c) = p(P = m_i) and thus

p(P = m_i) = p(P = m_i | C = c)
           = p(C = c | P = m_i) · p(P = m_i) / p(C = c)
           = p(K = k_i) · p(P = m_i) / p(C = c).
Proof
Hence we obtain that for all 1 ≤ i ≤ n,

p(C = c) = p(K = k_i).

Since Σ_{i=1}^{n} p(K = k_i) = 1 we have

n · p(C = c) = 1  ⇒  p(C = c) = 1/n,

thus all keys are used with equal probability.

Conclusion: p(K = k) = 1/#K for all k ∈ K.
Proof
Now we need to prove the result in the other direction.
Suppose that
- #K = #C = #P;
- every key is used with equal probability 1/#K; and
- for each m ∈ P and c ∈ C there is a unique key k with e_k(m) = c.
Then we need to show that the system is perfectly secure, i.e.

p(P = m | C = c) = p(P = m).
Proof
Since each key is used with equal probability, we have, for fixed c,

p(C = c) = Σ_{k : c ∈ C(k)} p(K = k) · p(P = d_k(c))
         = (1/#K) · Σ_{k : c ∈ C(k)} p(P = d_k(c)).

Since for each m and c there is a unique key k with e_k(m) = c, we have

Σ_{k : c ∈ C(k)} p(P = d_k(c)) = Σ_{m ∈ P} p(P = m) = 1.

Conclusion: p(C = c) = 1/#K.
Proof
In addition, if c = e_k(m) then

p(C = c | P = m) = p(K = k) = 1/#K.

Then using Bayes' Theorem we have

p(P = m | C = c) = p(C = c | P = m) · p(P = m) / p(C = c)
                 = (1/#K) · p(P = m) / (1/#K)
                 = p(P = m).

Q.E.D.
Example - Shift Cipher
For the Shift Cipher we had P = K = C = Z_26 and e_k(m) = m + k mod 26.

Shannon's Theorem implies perfect secrecy if we encrypt 1 letter.

We extend to plaintexts of length n by using the Shift Cipher with a new key for each letter. For this system we clearly have

P = K = C = (Z_26)^n,   p(K = k) = 1/26^n.

Furthermore, for each m and c there is a unique k such that e_k(m) = c. Shannon's Theorem then implies perfect secrecy.
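A small sketch of the one-letter case: with a uniformly random shift key, the ciphertext distribution comes out uniform whatever the plaintext distribution is, which is exactly the independence that perfect secrecy demands. The skewed plaintext distribution below is an arbitrary choice for illustration:

```python
from fractions import Fraction as F

# One-letter shift cipher over Z_26 with a uniformly random key:
# e_k(m) = (m + k) mod 26.
pP = {0: F(9, 10), 5: F(1, 10)}   # an arbitrary, heavily skewed plaintext distribution

pC = {c: F(0) for c in range(26)}
for m, pm in pP.items():
    for k in range(26):           # p(K = k) = 1/26 for every key
        c = (m + k) % 26
        pC[c] += F(1, 26) * pm

# Every ciphertext value occurs with probability exactly 1/26,
# independently of the plaintext distribution.
print(set(pC.values()) == {F(1, 26)})   # True
```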
Example - Vernam One-Time Pad
The Vernam Cipher uses the Shift Cipher but modulo 2 instead of modulo 26.

Binary addition, or XOR, is defined as

    ⊕ | 0  1
    --+-----
    0 | 0  1
    1 | 1  0

Vernam One-Time Pad
- Gilbert Vernam patented this cipher in 1917 for encryption and decryption of telegraph messages.
- To send a binary string you need a key as long as the message.
- Each key can be used only once, hence "One-Time Pad".
Example - Vernam One-Time Pad
Clearly we cannot use the same key twice, owing to the following chosen plaintext attack.
- Eve generates m and asks Alice to encrypt it.
- Eve receives c = m ⊕ k from Alice.
- Eve can now compute the key k = c ⊕ m.
- Eve can decrypt all messages encrypted with k.
One time pad is used in some military and diplomatic contexts.
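The chosen-plaintext attack above can be sketched in a few lines; the key length and message bytes here are arbitrary choices for illustration:

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    # Vernam encryption and decryption are the same operation: bitwise XOR.
    return bytes(x ^ y for x, y in zip(a, b))

key = secrets.token_bytes(16)          # pad as long as the message

# Chosen-plaintext attack when the pad is reused:
m_eve = b"AAAAAAAAAAAAAAAA"            # Eve's chosen 16-byte message
c_eve = xor(m_eve, key)                # Alice encrypts it for Eve
recovered = xor(c_eve, m_eve)          # k = c XOR m
assert recovered == key

# Eve can now decrypt any other ciphertext produced under the same key.
c2 = xor(b"attack at dawn!!", key)
print(xor(c2, recovered))              # b'attack at dawn!!'
```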
Key Distribution
Perfect secrecy ⇒ the length of the key is at least the length of the plaintext.
- Key distribution becomes the major problem.
- Solution in the fourth year course COMS40213: Information Security.

The aim of modern cryptography is to design systems where
- one key can be used many times; and
- a short key can encrypt a long message.

Such systems will not be unconditionally secure, but should be at least computationally secure.

We need to use some Information Theory to analyse the situation where the same key is used for multiple encryptions. Again, the main results are due to Shannon in the late 1940s.
Uncertainty
Consider the following examples.
- The outcome of a throw of two dice is more uncertain than the outcome of the throw of one die.
- The outcome of the throw of a fair die is more uncertain than the outcome of the throw of a biased die.
- The uncertainty of the random variable X with p(X = 0) = p and p(X = 1) = 1 − p is the same as the uncertainty of the random variable Y with p(Y = a) = p and p(Y = d) = 1 − p.

If we want to define the uncertainty H(X) of some random variable X, then H(X) should be a function of the probability distribution of X only.
Uncertainty - Shannon’s Axioms
In 1948 Shannon proposed 8 requirements which a sensible definition of uncertainty H(X) should satisfy.
Let X be a random variable with values x_1, . . . , x_n and probabilities p_i = p(X = x_i) for i = 1, . . . , n. The most important requirements are:
- H(p_1, . . . , p_n) is maximal when p_i = 1/n for all i;
- H(p_1, . . . , p_n) ≥ 0, and equals zero only when p_i = 1 for some i;
- H(1/n, . . . , 1/n) ≤ H(1/(n+1), . . . , 1/(n+1)) for n ∈ N (a two-horse race is less uncertain than a three-horse race); and
- H(1/(mn), . . . , 1/(mn)) = H(1/m, . . . , 1/m) + H(1/n, . . . , 1/n) for m, n ∈ N (linearity condition: the uncertainty when throwing an m-sided die followed by an n-sided die is the sum of the individual uncertainties).
Entropy = Uncertainty
From the requirements on the previous slide one can prove that the only possible definition for H(X) is the following.

Given a random variable X that takes on a finite set of values with probabilities p_1, . . . , p_n, the uncertainty or entropy is

H(X) = − Σ_{i=1}^{n} p_i log2 p_i.

Note that if p_i = 0 we remove that term from the above sum.
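The definition translates directly into code; a minimal sketch using only the standard library:

```python
import math

def entropy(probs):
    # H(X) = -sum_i p_i log2 p_i, in bits; terms with p_i = 0 are skipped,
    # as on the slide.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # 1.0  (fair coin)
print(entropy([1/6] * 6))         # log2 6 ≈ 2.585  (fair die)
print(entropy([1.0]) == 0)        # True: a certain outcome has zero uncertainty
```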
Entropy - Examples
Let X be the throw of a fair die, i.e. p(X = i) = 1/6 for i = 1, . . . , 6. Then

H(X) = − Σ_{i=1}^{6} (1/6) log2(1/6) = − log2(1/6) = log2 6.

More generally, if X takes on n values with equal probability then

H(X) = − Σ_{i=1}^{n} (1/n) log2(1/n) = log2 n.

Suppose X is the answer to a question with values either Yes or No.
- If I always answer Yes, then there is no uncertainty, i.e. H(X) = 0.
- If Yes and No are equally probable then H(X) = 1.
Information
Consider the following examples.
- Suppose I toss a fair coin and tell you the outcome of the experiment; then I have given you 1 bit of information.
- Suppose I toss a fair coin n times; then the information of the outcome of the experiment is clearly n bits.
- Suppose I answer Yes to a question with probability 0.99, and No with probability 0.01; then the answer Yes provides considerably less information than the answer No, since one already expected the answer Yes.

These examples suggest that the information I(E) of an event E which occurs with probability p should be defined as

I(E) = − log2 p.
Entropy and Information
Let X be a random variable that takes on the values x_1, . . . , x_n with p_i = p(X = x_i). Then the information content of the event X = x_i is

I(X = x_i) = − log2 p_i.

Recall that the entropy of a random variable X was defined as

H(X) = − Σ_{i=1}^{n} p_i log2 p_i.

This is the mean value of the information content of the events X = x_i.

Therefore, entropy measures the average information content of an observation of X.
Conclusion: loss of entropy is gain of information!
Example
Let us return to our example cryptosystem from earlier.
The possible plaintexts, keys and ciphertexts were
- P = {a, b},
- K = {k1, k2, k3},
- C = {1, 2, 3, 4}.
We had the following probabilities:
- p(P = a) = 1/4 and p(P = b) = 3/4;
- p(K = k1) = 1/2 and p(K = k2) = p(K = k3) = 1/4;
- p(C = 1) = 1/8, p(C = 2) = 7/16, p(C = 3) = 1/4 and p(C = 4) = 3/16.
Example
Then we have

H(P) = −(1/4) log2(1/4) − (3/4) log2(3/4) ≈ 0.81,
H(K) = −(1/2) log2(1/2) − 2 · (1/4) log2(1/4) ≈ 1.5,
H(C) = −(1/8) log2(1/8) − (7/16) log2(7/16) − (1/4) log2(1/4) − (3/16) log2(3/16) ≈ 1.85.
Note that the uncertainty or entropy H(C) of the ciphertext is smaller than the sum of the entropies of the plaintext H(P) and the key H(K).
Later we will see that the difference is the remaining uncertainty about the key given the ciphertext.
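These three values can be checked numerically; a sketch that recomputes the entropy formula on the example's distributions:

```python
import math

def entropy(probs):
    # H(X) = -sum p log2 p, in bits; zero-probability terms are omitted.
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_P = entropy([1/4, 3/4])
H_K = entropy([1/2, 1/4, 1/4])
H_C = entropy([1/8, 7/16, 1/4, 3/16])

print(round(H_P, 2), round(H_K, 2), round(H_C, 2))   # 0.81 1.5 1.85
print(H_C < H_P + H_K)                               # True: H(C) < H(P) + H(K)
```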
A Fact About Logarithms
The following is a special case of Jensen's inequality, which we will need to discuss entropy in more depth.

Suppose a_i > 0 for i = 1, . . . , n and Σ_{i=1}^{n} a_i = 1. Then, if x_i > 0 for i = 1, . . . , n, we have

Σ_{i=1}^{n} a_i log2(x_i) ≤ log2 ( Σ_{i=1}^{n} a_i x_i ).
Furthermore, equality occurs if and only if x1 = x2 = . . . = xn.
Upper Bound on Entropy
Suppose X is a random variable that takes on values x_1, . . . , x_n with probability distribution p_i = p(X = x_i) for i = 1, . . . , n. Then

H(X) = − Σ_{i=1}^{n} p_i log2 p_i = Σ_{i=1}^{n} p_i log2(1/p_i)
     ≤ log2 ( Σ_{i=1}^{n} p_i · (1/p_i) )   (by Jensen's Inequality)
     = log2 n.

Conclusion: for a random variable X with n possible values we have H(X) ≤ log2 n, and we obtain equality if and only if p_i = 1/n for all i.
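A quick numerical check of the bound. The random distribution below is an arbitrary illustration; the uniform case shows where equality is attained:

```python
import math
import random

def entropy(probs):
    # H(X) = -sum p log2 p, in bits; zero-probability terms are omitted.
    return -sum(p * math.log2(p) for p in probs if p > 0)

random.seed(0)
n = 8
weights = [random.random() for _ in range(n)]
probs = [w / sum(weights) for w in weights]      # an arbitrary distribution on n values

# H(X) <= log2 n, with equality exactly at the uniform distribution.
assert entropy(probs) <= math.log2(n) + 1e-12
assert abs(entropy([1/n] * n) - math.log2(n)) < 1e-12
print("H(X) <= log2 n holds; equality at the uniform distribution")
```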
Joint Entropy
Let X and Y be random variables with values x_1, . . . , x_n and y_1, . . . , y_m and joint probabilities

r_ij = p(X = x_i, Y = y_j)

for i = 1, . . . , n and j = 1, . . . , m.

The joint entropy is defined as

H(X, Y) = − Σ_{i=1}^{n} Σ_{j=1}^{m} r_ij log2 r_ij.
The joint entropy H(X , Y ) is the uncertainty of the random variablesX and Y together.
The joint entropy H(X , Y ) measures the average informationcontent of an observation of X and Y together.
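As a sketch, the joint entropy of a small made-up joint distribution can be computed directly from the definition:

```python
import math

# Hypothetical joint distribution r_ij = p(X = x_i, Y = y_j);
# rows are indexed by X, columns by Y.
r = [[0.25, 0.25],
     [0.40, 0.10]]

# Sanity check: the joint probabilities sum to 1.
assert abs(sum(rij for row in r for rij in row) - 1.0) < 1e-9

H_XY = -sum(rij * math.log2(rij) for row in r for rij in row if rij > 0)
```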
Joint Entropy
Let X and Y be random variables; then we have the inequality
H(X , Y ) ≤ H(X ) + H(Y ),
with equality if and only if X and Y are independent.
Reminder: X and Y are independent means that for all i and j
p(X = xi , Y = yj) = p(X = xi) · p(Y = yj).
Proof can be found in
I Stinson - Cryptography: Theory and Practice, Theorem 2.7, p. 57 and
I Welsh - Codes and Cryptography, Theorem 2, p. 6.
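The inequality is easy to check on a small example; the joint distribution below is made up, and the independent case is built from its marginals:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A made-up joint distribution r_ij (rows: X, columns: Y).
r = [[0.25, 0.25],
     [0.40, 0.10]]

pX = [sum(row) for row in r]        # marginal distribution of X
pY = [sum(col) for col in zip(*r)]  # marginal distribution of Y
H_XY = H([rij for row in r for rij in row])
assert H_XY <= H(pX) + H(pY)

# Independence, r_ij = p(X = x_i) * p(Y = y_j), gives equality.
r_ind = [[pi * qj for qj in pY] for pi in pX]
H_ind = H([rij for row in r_ind for rij in row])
assert abs(H_ind - (H(pX) + H(pY))) < 1e-9
```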
Conditional Entropy
Conditional entropy measures the average uncertainty of a random variable X given an observation of a random variable Y.
Reminder: If X and Y are random variables with values x1, . . . , xn and y1, . . . , ym, then the conditional probability p(X = xi | Y = yj) is the probability that the value of X will be xi given that the value of Y is yj.
The conditional entropy of X given Y = yj is defined as
H(X | Y = yj) = −∑_{i=1}^n p(X = xi | Y = yj) · log2 p(X = xi | Y = yj).
Conditional Entropy
The conditional entropy of X given Y is defined as the weighted average of the entropies H(X | Y = yj) for j = 1, . . . , m, i.e.

H(X | Y) = ∑_{j=1}^m p(Y = yj) · H(X | Y = yj)
         = −∑_{j=1}^m ∑_{i=1}^n p(Y = yj) · p(X = xi | Y = yj) · log2 p(X = xi | Y = yj).

Conditional entropy measures the average uncertainty of a random variable X given observations of a random variable Y, averaged over all values that Y can take.
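A direct computation from the weighted-average definition, using a made-up joint distribution:

```python
import math

# Hypothetical joint distribution r_ij = p(X = x_i, Y = y_j);
# rows are indexed by X, columns by Y.
r = [[0.25, 0.25],
     [0.40, 0.10]]

pY = [sum(col) for col in zip(*r)]  # marginal p(Y = y_j)

H_X_given_Y = 0.0
for j, pyj in enumerate(pY):
    for row in r:
        p_cond = row[j] / pyj       # p(X = x_i | Y = y_j)
        if p_cond > 0:
            H_X_given_Y -= pyj * p_cond * math.log2(p_cond)
```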
Conditional and Joint Entropy
Conditional and joint entropy are linked by the following formula
H(X , Y ) = H(Y ) + H(X |Y ).
Proof: Welsh - Codes and Cryptography, Theorem 1, p. 8.
As an immediate consequence, we have the following upper bound
H(X |Y ) ≤ H(X )
with equality if and only if X and Y are independent.
Proof: Welsh - Codes and Cryptography, Corollary, p. 9.
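Both facts can be verified numerically on a small made-up joint distribution:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A made-up joint distribution r_ij (rows: X, columns: Y).
r = [[0.25, 0.25],
     [0.40, 0.10]]

pX = [sum(row) for row in r]
pY = [sum(col) for col in zip(*r)]
H_XY = H([rij for row in r for rij in row])
H_X_given_Y = sum(pyj * H([row[j] / pyj for row in r])
                  for j, pyj in enumerate(pY))

assert abs(H_XY - (H(pY) + H_X_given_Y)) < 1e-9  # H(X,Y) = H(Y) + H(X|Y)
assert H_X_given_Y <= H(pX) + 1e-9               # conditioning cannot increase entropy
```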
Information and Entropy
Reminder: Loss of uncertainty is gain of information.
Let X and Y be two random variables; then the information about X conveyed by Y is defined as

I(X | Y) = H(X) − H(X | Y).
Clearly I(X |Y ) = 0 if and only if X and Y are independent.
Remark:
I Perhaps surprisingly, we have I(X | Y) = I(Y | X).
I Proof: Welsh - Codes and Cryptography, p. 11.
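The symmetry can be confirmed numerically by computing both directions separately (same style of made-up joint distribution as before):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A made-up joint distribution r_ij (rows: X, columns: Y).
r = [[0.25, 0.25],
     [0.40, 0.10]]

pX = [sum(row) for row in r]
pY = [sum(col) for col in zip(*r)]

H_X_given_Y = sum(pY[j] * H([row[j] / pY[j] for row in r])
                  for j in range(len(pY)))
H_Y_given_X = sum(pX[i] * H([rij / pX[i] for rij in r[i]])
                  for i in range(len(pX)))

I_xy = H(pX) - H_X_given_Y      # information about X conveyed by Y
I_yx = H(pY) - H_Y_given_X      # information about Y conveyed by X
assert abs(I_xy - I_yx) < 1e-9  # I(X|Y) = I(Y|X)
assert I_xy > 0                 # X and Y are dependent in this example
```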
Conditional Entropy and Cryptography
Let P, K, C be the sets of possible messages, keys and ciphertexts with associated random variables P, K, C.

H(P | K, C) = 0
I Given the ciphertext and the key, you know the plaintext, since it is the decryption of the given ciphertext under the given key.

H(C | P, K) = 0
I Given the plaintext and the key, you know the ciphertext, since it is the encryption of the given plaintext under the given key.
I Note: Modern public key encryption schemes do not have this last property when used correctly.
Key Equivocation
The conditional entropy H(K | C) is called the key equivocation and measures the average uncertainty remaining about the key when a ciphertext has been observed.

Suppose that an adversary wants to determine the key of a non-perfect cipher. The smaller H(K | C) is, the easier it will be to recover the key.

The information revealed about the key by the ciphertext is the loss of uncertainty about the key when a ciphertext has been observed, i.e.

I(K | C) = H(K) − H(K | C).
Key Equivocation
For a cryptosystem (P, C, K, ek(·), dk(·)) we have

H(K | C) = H(K) + H(P) − H(C).

In words: the remaining uncertainty about the key when a ciphertext has been observed is equal to the sum of the uncertainties about the key and the plaintext minus the uncertainty about the ciphertext.

Proof can be found in
I Stinson - Cryptography: Theory and Practice, Theorem 2.10, p. 59.

As a consequence of the last two equations, the information revealed about the key by the ciphertext is equal to

I(K | C) = H(C) − H(P).
Example - Key Equivocation
Returning to our example cryptosystem from earlier
H(P) ≈ 0.81, H(K) ≈ 1.5 and H(C) ≈ 1.85.
Using the formula for H(K |C) we get
H(K | C) = H(K) + H(P) − H(C) ≈ 1.5 + 0.81 − 1.85 = 0.46.

So the remaining uncertainty about the key is less than half a bit.

And the information revealed about the key by the ciphertext is

I(K | C) = H(C) − H(P) ≈ 1.85 − 0.81 = 1.04.

Thus the ciphertext leaks more than 1 bit of information about the key.
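The two numbers are consistent with I(K|C) = H(K) − H(K|C); a quick check using the rounded slide values:

```python
# Rounded entropies from the example cryptosystem on the slides.
H_P, H_K, H_C = 0.81, 1.5, 1.85

H_K_given_C = H_K + H_P - H_C   # key equivocation
I_K_given_C = H_C - H_P         # information leaked about the key

assert abs(H_K_given_C - 0.46) < 1e-9
assert abs(I_K_given_C - 1.04) < 1e-9
assert abs((H_K - H_K_given_C) - I_K_given_C) < 1e-9
```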
Spurious Keys
If you know that the plaintext is taken from a ‘natural’ language, then knowing the ciphertext rules out a certain subset of the keys.

Of the remaining possible keys, only one is correct. The remaining possible, but incorrect, keys are called the spurious keys.
Consider the Shift Cipher with the same key for each letter.
I Suppose the ciphertext is WNAJW.
I The plaintext is known to be an English word.
I The only ‘meaningful’ plaintexts are RIVER and ARENA.
I We have two possible keys E and W.
I One is correct and one is spurious.
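This search is easy to automate. The sketch below represents keys as shift amounts 0 to 25 and uses a two-word stand-in for a real English dictionary:

```python
WORDS = {"RIVER", "ARENA"}       # tiny stand-in for an English word list

def shift_decrypt(ciphertext, k):
    """Decrypt a shift-cipher ciphertext (A-Z only) under shift k."""
    return "".join(chr((ord(c) - ord("A") - k) % 26 + ord("A"))
                   for c in ciphertext)

candidates = {}
for k in range(26):              # brute-force all possible shifts
    pt = shift_decrypt("WNAJW", k)
    if pt in WORDS:
        candidates[k] = pt
```

Exactly two shifts survive, one correct and one spurious, matching the slide.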
Natural Language
To prove a bound on the number of spurious keys, we need to define what we mean by the entropy per letter HL of a natural language L.

Ideally we would like HL to be defined such that the number of meaningful strings of length n, which we denote T(n), with n ≫ 0 is about

2^{n·HL} ≈ T(n).

In a natural language there are very few meaningful strings, so the entropy per letter HL will be lower than the entropy of a random string,

HL ≤ log2 26 ≈ 4.7.
Natural Language
We get a better approximation if we use the probabilities with which letters occur in English: if P is the random variable representing the letters in the English language, then
p(P = a) = 0.082, p(P = b) = 0.015, . . . , p(P = z) = 0.001.
This gives us the upper bound
HL ≤ H(P) ≈ 4.19.
However, successive letters are clearly not independent, which will further reduce the entropy per letter.

An even better approximation is to use P^2, i.e. the random variable of bigrams in English, which leads to the bound

HL ≤ H(P^2)/2 ≈ 3.90.
Natural Language
Continuing this process, we are led to the following definition.

The entropy per letter HL of a natural language L is defined as

HL = lim_{n→∞} H(P^n)/n,

where P^n is the random variable for n-grams.

This is hard to compute exactly, but we can approximate it, and various experiments yield the empirical result

1.0 ≤ HL ≤ 1.5.

So each letter in English
I requires 5 = ⌈log2 26⌉ bits of data to represent it, but
I Huffman encoding would only use about 1.5 bits per letter.
Redundancy
For a language L with entropy HL and alphabet P, we need about n · log2 #P bits to represent a string of length n. However, a compact encoding only needs about n · HL bits.

The redundancy RL of a language is defined as the relative difference between the two encodings, i.e.

RL = (n · log2 #P − n · HL) / (n · log2 #P) = 1 − HL / log2 #P.

If we take HL ≈ 1.25, then the redundancy of English is

RL = 1 − 1.25 / log2 26 ≈ 0.75.
So we can compress an English text file of 10 MB down to 2.5 MB.
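The arithmetic behind this figure (the exact value comes out near 0.73, which the slides round up to 0.75):

```python
import math

H_L = 1.25                     # empirical entropy per letter of English
R_L = 1 - H_L / math.log2(26)  # redundancy of English

assert 0.73 < R_L < 0.74
```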
Spurious Keys
Let P^n and C^n be the sets of n-grams of plaintext and ciphertext, with associated random variables P^n and C^n.

Suppose we use the same key k ∈ K, with associated random variable K, to encrypt each letter; then

K(c) = { k ∈ K : ∃ m ∈ P^n with p(P^n = m) > 0 and ek(m) = c }

is the set of possible keys for which c is the encryption of a meaningful message of length n.

Therefore, given the ciphertext c, the number of spurious keys is

#K(c) − 1,

since there is only 1 correct key.
Spurious Keys
The average number of spurious keys over all possible ciphertexts of length n is denoted by sn and equals

sn = ∑_{c ∈ C^n} p(C^n = c) · (#K(c) − 1)
   = ∑_{c ∈ C^n} p(C^n = c) · #K(c) − ∑_{c ∈ C^n} p(C^n = c)
   = ∑_{c ∈ C^n} p(C^n = c) · #K(c) − 1.

We will now relate sn to the key equivocation H(K | C^n).
Key Equivocation and Spurious Keys
Recall that H(K | C^n) is the average of H(K | C^n = c) over all possible ciphertexts, and thus

H(K | C^n) = ∑_{c ∈ C^n} p(C^n = c) · H(K | C^n = c)
           ≤ ∑_{c ∈ C^n} p(C^n = c) · log2 #K(c)   (uncertainty is largest when all keys in K(c) are equally likely)
           ≤ log2( ∑_{c ∈ C^n} p(C^n = c) · #K(c) )   (by Jensen's inequality)
           = log2(sn + 1).   (from the last slide)

Conclusion: H(K | C^n) ≤ log2(sn + 1).
Key Equivocation and Spurious Keys
Recall that the key equivocation H(K | C^n) can be expressed as

H(K | C^n) = H(K) + H(P^n) − H(C^n).

For a language L with entropy HL we can use the estimate

H(P^n) ≈ n · HL = n · (1 − RL) · log2 #P,

provided that n is reasonably large.

Since the entropy is always bounded by the log2 of the number of values,

H(C^n) ≤ n · log2 #C.

Conclusion: If #P = #C then, putting all this together, we have the inequality

H(K | C^n) ≥ H(K) − n · RL · log2 #P.
Bound on Number of Spurious Keys
Combining the results of the two previous slides, we get the bound

log2(sn + 1) ≥ H(K) − n · RL · log2 #P.

Theorem: Suppose that (P, C, K, ek(·), dk(·)) is a cryptosystem with #P = #C such that keys are chosen equiprobably. If RL is the redundancy of the underlying language, then given a ciphertext of length n, the expected number of spurious keys sn satisfies

sn ≥ #K / (#P)^{n·RL} − 1.

Example: For a substitution cipher we have #P = 26, #K = 26! ≈ 2^{88.4}; taking RL = 0.75, we get

sn ≥ 2^{88.4 − 3.5n} − 1.
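The bound can be evaluated directly; a sketch for the substitution cipher:

```python
import math

KEY_SPACE = math.factorial(26)  # #K = 26!, about 2^88.4
R_L = 0.75                      # redundancy of English
P_SIZE = 26                     # #P

def spurious_lower_bound(n):
    """Lower bound on the expected number of spurious keys for n letters."""
    return KEY_SPACE / P_SIZE ** (n * R_L) - 1

assert 88 < math.log2(KEY_SPACE) < 89   # sanity check on 2^88.4
assert spurious_lower_bound(10) > 1     # many spurious keys at n = 10
assert spurious_lower_bound(30) < 0     # bound becomes vacuous past n0
```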
Unicity Distance
The unicity distance n0 of a cryptosystem is the value of n at which the expected number of spurious keys becomes zero.

Alternatively, it is the average amount of ciphertext required for an adversary to be able to uniquely determine the key, given enough computing time.

For a perfectly secure cipher we have n0 = ∞.

We set sn = 0 in

sn ≥ #K / (#P)^{n·RL} − 1

to obtain an estimate of the unicity distance n0:

n0 ≈ log2 #K / (RL · log2 #P).
Substitution Cipher
We now show why it was easy to break the substitution cipher.
I #P = 26
I #K = 26! ≈ 2^{88.4}
I RL = 0.75 for English

We get an estimate for the unicity distance of

n0 ≈ 88.4 / (0.75 × 4.7) ≈ 25.

So we require on average only 25 ciphertext characters before we can break the substitution cipher, given enough computing time.
After 25 characters we expect a unique valid decryption.
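The estimate in one line:

```python
import math

# n0 = log2(#K) / (R_L * log2(#P)) for the substitution cipher.
n0 = math.log2(math.factorial(26)) / (0.75 * math.log2(26))
assert 25 < n0 < 26
```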
Modern Ciphers
Given a cipher which encrypts bit strings using keys of bit length m:
I #P = 2
I #K = 2^m
I RL = 0.75 for English (an underestimate, since we're using ASCII)

Then we get an estimate for the unicity distance of

n0 ≈ log2 #K / (RL · log2 #P) = log2(2^m) / (0.75 · log2 2) = m / 0.75 = 4m/3.
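For a concrete key length, a sketch (m = 128 here, e.g. an AES-128-sized key, purely as an illustration):

```python
def unicity_distance_bits(m, R_L=0.75):
    """Unicity distance n0 = m / R_L in ciphertext *bits* for an m-bit key."""
    return m / R_L

n0 = unicity_distance_bits(128)
assert abs(n0 - 4 * 128 / 3) < 1e-9   # about 171 ciphertext bits
```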