
1

RANDOM GRAPHS IN CRYPTOGRAPHY

Adi Shamir, The Weizmann Institute, Israel

May 15, 2007, 7th Haifa Workshop on Interdisciplinary Applications of Graph Theory, Combinatorics and Algorithms

2

Random Graphs in Cryptography:

In this talk I will concentrate on some particular algorithmic issues related to random graphs which are motivated by cryptanalytic applications

Many of the results I will describe are either unpublished or little known in our community

Note that in cryptanalysis, constants are important!

3

Cryptography and Randomness:

Cryptography deals with many types of randomness:

- random strings
- random variables
- random functions
- random permutations
- random walks
- …

4

Cryptography and Randomness:

The notion of random functions (oracles):

- truly random when applied to fresh inputs
- consistent when applied to previously used inputs

f(0)=37, f(1)=92, f(2)=78, f(3)=51
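A minimal sketch of this behavior, using lazy sampling (my illustration, not from the talk; the table above is exactly such an oracle's memory):

```python
import random

class RandomOracle:
    """Truly random on fresh inputs, consistent on repeated ones."""
    def __init__(self, range_size):
        self.range_size = range_size
        self.memory = {}              # previously answered queries

    def __call__(self, x):
        if x not in self.memory:      # fresh input: sample uniformly
            self.memory[x] = random.randrange(self.range_size)
        return self.memory[x]         # repeated input: same answer

f = RandomOracle(100)
assert f(3) == f(3)                   # consistency on reused inputs
```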

5

Cryptography and Randomness:

This tabular description gives us a local view, which is not very informative

To see the big picture, we define the random graph G associated with the random function f:

x → f(x)

6

Cryptography and Randomness:

When the function f is a permutation, its associated graph G is very simple: a union of disjoint cycles.

7

Cryptography and Randomness:

However, when the function f is a random function rather than a random permutation, we get a very rich and interesting structure:

8

Random Graph #1:

9

Random Graph #2:

10

Cryptography and Randomness:

There is a huge literature on the structure and combinatorial properties of such random graphs:

The distribution of component sizes, tree sizes, cycle sizes, vertex in-degrees, number of predecessors, etc.

11

Cryptography and Randomness:

In many applications we are interested in the behavior of the random function f under iteration.

Examples:
- pseudo-random generators
- stream ciphers
- iterated block ciphers and hash functions
- time/memory tradeoff attacks
- randomized iterates
- …

In this case, we are interested in a single path starting at a random vertex within the random graph.

12

A random path in a random graph:

[Animation over three slides: a walk steps from vertex to vertex until it re-enters its own path]

15

Cryptography and Randomness:

Such a path always starts with a tail, and ends with a cycle.

The expected length of both the tail and the cycle is about the square root of the number of vertices.
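This is easy to check empirically; a small Monte Carlo sketch (my illustration, not from the talk) measures the average tail and cycle lengths of random functions on N points:

```python
import random

def rho_lengths(N, rng):
    """Iterate a random function on {0..N-1} from a random start;
    return (tail length, cycle length) of the resulting path."""
    f = [rng.randrange(N) for _ in range(N)]
    x = rng.randrange(N)
    first_seen = {}                 # value -> step of first visit
    step = 0
    while x not in first_seen:
        first_seen[x] = step
        x = f[x]
        step += 1
    return first_seen[x], step - first_seen[x]

rng = random.Random(1)
N = 1 << 14
samples = [rho_lengths(N, rng) for _ in range(200)]
print(sum(t for t, _ in samples) / 200,   # both averages come out
      sum(c for _, c in samples) / 200)   # near sqrt(pi*N/8) ~ 80
```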

16

Interesting algorithmic problems on paths:

Assuming that we can only move forwards along edges:

- Find some point on the cycle
- Find the same point a second time
- Find the length ℓ of the cycle
- Find the cycle entry point

21

Interesting algorithmic problems on paths:

Why are we interested in these algorithms?

- Pollard's rho algorithm: the cycle length ℓ can be used to find small factors of large numbers, and requires only negligible memory.
- Finding collisions in hash functions: the cycle entry point can represent a hash function collision.
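For concreteness, a textbook-style sketch of Pollard's rho factoring method (not code from the talk): the walk x → x² + c mod n has a much shorter cycle modulo a hidden prime factor p, and the gcd exposes it:

```python
from math import gcd

def pollard_rho(n, c=1):
    """Find a nontrivial factor of a composite n with negligible
    memory, using Floyd's two-finger walk on f(x) = x*x + c mod n."""
    if n % 2 == 0:
        return 2
    x = y = 2
    d = 1
    while d == 1:
        x = (x * x + c) % n           # slow finger: one step
        y = (y * y + c) % n           # fast finger: two steps
        y = (y * y + c) % n
        d = gcd(abs(x - y), n)
    return d if d < n else pollard_rho(n, c + 1)   # unlucky c: retry

print(pollard_rho(10403))             # 10403 = 101 * 103
```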

22

How to find a collision in a given hash function H?

- Exhaustive search: requires 2^n time, no space
- Birthday paradox: construct a large table of 2^(n/2) random hash values, sort it, and look for consecutive equal values. Requires both time and space of 2^(n/2)
- Random path algorithm: iterate the hash function until you find the entry point into a cycle. Requires 2^(n/2) time and very little space

23

Cycle detection is a very well studied problem:

- Floyd
- Pollard
- Brent
- Yao
- Quisquater
- …

And yet there are new surprising ideas!

24

The best known technique: Floyd's two-finger algorithm

- Keep two pointers
- Run one of them at normal speed, and the other at double speed, until they collide

[Animation over slides 25–33: the two fingers advance along the path until they meet on the cycle]

34

Can we use Floyd's algorithm to find the entry point into the cycle?

- First find the meeting point
- Move one of the fingers back to the beginning
- Move the two fingers at equal speed

40

Why does it work?

- Denote by d the distance from the beginning to the meeting point.
- By the time the slow finger got there, the fast finger had run another d steps and reached the same point, so d is some (unknown) multiple of the cycle length.
- Running the two marked fingers (one from the beginning, one from the meeting point) another d steps therefore reaches the same point again.
- So the two fingers meet for the first time at the entrance to the cycle, and then travel together.
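A minimal sketch of both phases (my rendering in Python, not the talk's code); the toy target is SHA-256 truncated to 32 bits, an assumption chosen so the demo finishes quickly:

```python
import hashlib

def floyd_collision(f, x0):
    """Return (a, b) with a != b and f(a) == f(b); f(a) is the cycle
    entry point. Uses O(1) memory. (Assumes the start is not already
    on the cycle, which holds here with overwhelming probability.)"""
    # Phase 1: fingers at speeds 1 and 2 until they collide.
    slow, fast = f(x0), f(f(x0))
    while slow != fast:
        slow = f(slow)
        fast = f(f(fast))
    # Phase 2: one finger restarts at the beginning; walking at equal
    # speed, the fingers first coincide at the cycle's entry point.
    a, b = x0, slow
    while f(a) != f(b):
        a, b = f(a), f(b)
    return a, b    # two distinct predecessors of the entry point

# Toy demo: a 32-bit "hash", so a collision costs about 2^16 steps.
H = lambda x: int.from_bytes(
    hashlib.sha256(x.to_bytes(4, "big")).digest()[:4], "big")
a, b = floyd_collision(H, 0)
assert a != b and H(a) == H(b)
```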

46

Is this the most efficient cycle detection algorithm?

- When the path has n vertices and the tail is short, Floyd's algorithm requires about 3n steps, and its extension requires up to 5n steps
- When the cycle is short, the fast finger can traverse it many times without noticing

49

A better idea:

- Place checkpoints at fixed intervals
- Update the checkpoints periodically

50

Problem:

- Too few checkpoints can miss small cycles
- Too many checkpoints are wasteful
- You usually do not know which case you are in!

53

Examples of unusually short cycles:

- cellular automata (e.g., when simulating the Game of Life)
- stream ciphers (e.g., when one of the LFSRs is stuck at 0)

54

A very elegant solution:

Published by Nivasch in 2004

55

Properties of the Nivasch algorithm:
- Uses a single finger
- Uses a negligible amount of memory
- Stops almost immediately after the walk starts recycling
- Efficient for all possible lengths of the cycle and tail
- Ideal for fast hardware implementations

56

The basic idea of the algorithm:
- Maintain a stack of values, which is initially empty
- Insert each new value at the top of the stack
- Force the values in the stack to be monotonically increasing

57–82

The Stack Algorithm (animation): the walk visits 7, 0, 3, 8, 6, 1, 9, 2, 5, 4, then cycles through 3, 8, 6, 1, …; the stack after each new value is processed reads

7 | 0 | 0 3 | 0 3 8 | 0 3 6 | 0 1 | 0 1 9 | 0 1 2 | 0 1 2 5 | 0 1 2 4 | 0 1 2 3 | 0 1 2 3 8 | 0 1 2 3 6 | 0 1 1

83

Stop when two identical values appear at the top of the stack (here the stack reads 0 1 1).
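A minimal sketch of the algorithm just described (my rendering, not Nivasch's published code): pop every stack entry that is ≥ the new value before pushing it, and stop when a popped entry equals the new value:

```python
def nivasch(f, x0):
    """Single-stack cycle detection. The stack holds (value, position)
    pairs with strictly increasing values; the cycle's minimum value
    is never popped, which guarantees a stop in the second cycle."""
    stack = []
    x, i = x0, 0
    while True:
        while stack and stack[-1][0] >= x:
            v, j = stack.pop()
            if v == x:                 # identical value met again
                return i - j, x        # (cycle length, repeated value)
        stack.append((x, i))
        x, i = f(x), i + 1

# The walk from the slides: 7, 0, 3, 8, 6, 1, 9, 2, 5, 4, then the
# cycle 3, 8, 6, 1, 9, 2, 5, 4 repeats.
step = {7: 0, 0: 3, 3: 8, 8: 6, 6: 1, 1: 9, 9: 2, 2: 5, 5: 4, 4: 3}
print(nivasch(step.__getitem__, 7))    # -> (8, 1), as in the slides
```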

84

Claim: The maximal size of the stack is expected to be only logarithmic in the path length, requiring negligible memory.

85

Claim: The stack algorithm always stops during the second cycle, regardless of the length of the cycle or its tail.

86

Proof: The smallest value on the cycle cannot be eliminated by any later value. Its second occurrence will eliminate all the higher values separating them on the stack.

87

The smallest value in the cycle is located at a random position, so we expect to go through the cycle at least once and at most twice (1.5 times on average).

88

Improvement: Partition the values into k types, and use a different stack for each type. Stop the algorithm when a repetition is found in some stack; a sketch follows below.

89

The new expected running time is (1 + 1/k)·n. Note that n is the minimum possible running time of any cycle detection algorithm, and for k = 100 we exceed it by only 1%.
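A sketch of this k-stack variant (again my own rendering; the value's residue mod k serves as its type):

```python
def nivasch_k(f, x0, k):
    """Partitioned variant: one stack per type x % k; stops at the
    first repetition inside any stack, after about (1 + 1/k) * n
    steps on a walk with n = tail + cycle vertices."""
    stacks = [[] for _ in range(k)]
    x, i = x0, 0
    while True:
        st = stacks[x % k]             # the stack for this value's type
        while st and st[-1][0] >= x:
            v, j = st.pop()
            if v == x:
                return i - j, x        # (cycle length, repeated value)
        st.append((x, i))
        x, i = f(x), i + 1

f = lambda x: (x * x + 1) % 10007
print(nivasch_k(f, 1, 8))
```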

90

Unlike Floyd's algorithm, the Nivasch algorithm provides excellent approximations for the lengths of the tail and the cycle as soon as we find a repeated value, with no extra work.

91

Note that when we stop, the bottom value in each stack contains the smallest value of that type, and these k values are uniformly distributed along the tail and cycle.

92

Adding two special points to the k stack bottoms, at least one point must be in the tail and at least one must be in the cycle, regardless of their sizes.

93

We can now find the two closest points (e.g., 0 and 2) which are just behind the collision point. We can thus find the collision after a short synchronized walk.

94

The Fundamental Problem of Cryptanalysis:

Given a ciphertext, find the corresponding key.
Given a hash value, find a first or second preimage.

Invert the easily computed random function f, where f(x) = E_x(0) or f(x) = H(x).

95

The Random Graph Defined by f:

Goal: go backwards
Means: going forwards

96

Possible solutions:

Method #1: Exhaustive search. Time complexity T ≈ N, memory complexity M ≈ 1.

Method #2: Exhaustive table. Time complexity T ≈ 1, memory complexity M ≈ N.

Time/memory tradeoffs: find a compromise between the two extremes, i.e., M << N and T << N, by using a free preprocessing stage.

97

Hellman's T/M Tradeoff (1979)

Preprocessing phase: choose m random startpoints and evaluate chains of length t. Store only the pairs (startpoint, endpoint), sorted by endpoints.

Online phase: from the given y = f(x), complete the chain. Find x by re-calculating the chain from its startpoint.
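A toy sketch of a single table (my illustration; the parameters and the 16-bit truncated-hash function are assumptions for the demo, and the full scheme repeats this with t related functions, as described two slides below):

```python
import hashlib

def build_table(f, m, t, first_sp=0):
    """Preprocessing: m chains of length t; keep endpoint -> startpoint."""
    table = {}
    for sp in range(first_sp, first_sp + m):
        x = sp
        for _ in range(t):
            x = f(x)
        table[x] = sp
    return table

def invert(f, table, y, t):
    """Online phase: walk y forward to a stored endpoint, then replay
    the chain from its startpoint to recover a preimage of y."""
    z = y
    for i in range(t):
        if z in table:
            x = table[z]               # candidate chain found
            for _ in range(t - 1 - i):
                x = f(x)
            if f(x) == y:              # filter out false alarms
                return x
        z = f(z)
    return None

# Toy 16-bit search space: SHA-256 truncated to 2 bytes.
f = lambda x: int.from_bytes(
    hashlib.sha256(x.to_bytes(4, "big")).digest()[:2], "big")
table = build_table(f, m=32, t=32)
y = f(0)                               # 0 is a startpoint, so y is covered
assert invert(f, table, y, t=32) is not None
```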

98

How can we cover this graph by chains?

The main problem: long chains converge.

99

How can we cover this graph by chains?

Problem: it is hard to cover more than N/t images.

Why? A new path of length t is likely to collide with the N/t images already covered by the table, due to the birthday paradox.

Hellman's solution: use t "independent" tables built from t "related" functions fi(x) = f(x + i mod N) – note that an inversion of fi immediately yields an inversion of f.

100

Are these graphs really independent?

Local properties are preserved, while global properties are modified:

- A point which had k predecessors in f will also have k predecessors in fi, but their identities will change.
- In particular, all the graphs will have exactly the same set of leaves, and values which are not in the range of f will not be covered by any path in any table.
- On the other hand, the number of components, the sizes of the cycles, and the structure of the trees hanging off the cycles can be very different.

101

Are these graphs really independent?

Hellman's trick is theoretically unfounded, but works very well in practice.

To invert a given image, try each of the t functions separately, so both time and space grow by a factor of t:

T = t², M = m·t

By the birthday paradox, the maximum possible values of t and m in a single table satisfy t²m = N.

T/M tradeoff: T·M² = N².

102

A typical choice of parameters:

Let c = N^(1/3).

Use c tables, each with m = c paths, each path of length about t = c (stopping each path at the first distinguished point rather than after a fixed length).

Together they cover most of the c³ = N vertices.

Total time T = c² = N^(2/3), total space M = c² = N^(2/3).
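For example (just the arithmetic of the slide above): for 60-bit values, N = 2^60 and c = 2^20, so we build 2^20 tables, each containing 2^20 paths of average length 2^20; together they touch about c³ = 2^60 = N points, while T = M = 2^40.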

103

Are such tradeoffs practically interesting?

- Can be the best approach for cryptosystems with about 80-bit keys
- Can be the best approach for cryptosystems whose keys are derived from passwords of up to 16 characters

104

A new optimization of Hellman's scheme:

If each value has b bits, straightforward implementations require 2b bits per path.

I will now show that only b/3 bits are needed.

This is a 6-fold saving in memory, which is equivalent to a 36-fold saving in time due to the T/M tradeoff curve T·M² = N².

105

The new optimization:

According to an old Chinese philosopher, paths in random graphs are like people: they are born at uniformly distributed startpoints, but die at very unequally distributed endpoints!

106

The unequal distribution of endpoints:

distinguished points are much more likely to be near the leaves than deep in this graph, so very few of them will be chosen as endpoints

107

The new optimization: forget the endpoints!

Note that the startpoints are arbitrary, so for each of the c tables we can choose a different interval of c consecutive values as startpoints.

Since c = N^(1/3), only b/3 bits are needed per startpoint.

Since we do not store endpoints, this is all we need!
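A small sketch of the resulting storage layout (my illustration of the idea; the interval choice is the one just described):

```python
# b-bit values; c = N^(1/3) tables. Table j draws its startpoints
# from the interval [j*c, (j+1)*c), so a startpoint is recovered
# from the table number plus a b/3-bit offset, and no endpoint is
# stored at all.
b = 60
c = 1 << (b // 3)                  # 2^20 startpoints per table

def startpoint(table_no, offset):
    # offset fits in b/3 = 20 bits; the table number is implicit
    # in which table the entry sits in, so only offset is stored.
    return table_no * c + offset

# Naive storage: b + b = 120 bits per path (startpoint + endpoint).
# Here: 20 bits per path, a 6-fold saving; by T*M^2 = N^2 that is
# worth a 36-fold saving in time at the same memory.
print(startpoint(3, 17))
```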

108

Divide all the c² possible distinguished points into about c large regions:

[Figure: the c² distinguished points arranged in a c × c square, divided into regions labeled 0–9]

During preprocessing, make sure that each region has at most one path ending at one of its distinguished points.

For each region, memorize the startpoint of this path (if it exists), but not the value of the corresponding endpoint.

109

Problem: This can lead to too many false alarms.

A false alarm happens when a stored endpoint is found, but its corresponding startpoint does not lead back to the initial value. In Hellman's scheme, false alarms happen in about half the tables we try. This wastes time, but is hard to avoid.

110

There are two types of false alarms here:

An old false alarm happens when the path from the initial value joins one of the precomputed paths.

A new false alarm happens when the path from the initial value enters a new endpoint.

112

A surprisingly small number of new false alarms are created by forgetting the endpoints:

Endpoints which are likely to be chosen by the online phase were also likely to be chosen by the preprocessing phase.

Since the Hellman parameters were chosen maximally, each new path is likely to end in one of the marked endpoints (otherwise we could have added more paths to increase our coverage!).

113

The bottom line:

Simulations show that the total running time is increased by only a few percent due to new false alarms, and thus it is a complete waste to memorize the endpoints!

114

Oechslin’s Rainbow Tables (2003)

115

There are many other possible tradeoff schemes:

Use a different sequence of functions along each path, such as:

111222333 or 123123123 or pseudorandom, e.g., 1221211

Make the choice of the next function depend on previous values.
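As a sketch of how such chains iterate (my rendering; the related-function form fi(x) = f(x + i mod N) is borrowed from Hellman's trick above):

```python
def chain_endpoint(f, N, start, t, schedule):
    """Walk one chain of length t, applying in column i the related
    function f_s(x) = f((x + s) % N) for s = schedule[i].
    schedule = [1]*k + [2]*k + ...   -> the 111222333 style
    schedule = list(range(t))        -> the 123...t (rainbow) style"""
    x = start
    for s in schedule:
        x = f((x + s) % N)    # a different related function per column
    return x                  # the chain's endpoint
```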

116

What kind of random graph are we working with in such schemes?

There was already a slight problem with the multiple graphs of Hellman's scheme.

Oechslin's scheme is even weirder.

It's time to define a new notion of a random graph!

117

Barkan, Biham, and Shamir (Crypto 2006):

We introduced a new type of graph called a stateful random graph.

We proved rigorous bounds on the achievable time/memory tradeoffs of any scheme which is based on such graphs, including Hellman, Oechslin, and all their many variants and possible extensions.

118

The Random Stateful Graph Model

• The nodes in the graph are pairs (yi, si), with N possible images yi and S possible states si.

• The scheme designer can choose any U; then a random f is given.

• The increased number of nodes (N·S) can reduce the probability of collisions, and a good U can create more structured graphs.

• Examples of states: table # in Hellman, column # in Oechslin. We call it a hidden state, since its value is unknown to the attacker when he tries to invert an image y.

(y0, s0) –U→ x1 –f→ (y1, s1) –U→ x2 –f→ (y2, s2) –U→ …

119

The Stateful-Random-Graph Model – cont.

U in Hellman:

xi = yi-1 + si-1 mod N
si = si-1

(y0, s0) –U→ x1 –f→ (y1, s1) –U→ x2 –f→ (y2, s2) –U→ …

120

The Stateful-Random-Graph Model – cont.

U in Rainbow:

xi = yi-1 + si-1 mod N
si = si-1 + 1 mod S

(y0, s0) –U→ x1 –f→ (y1, s1) –U→ x2 –f→ (y2, s2) –U→ …

121

The Stateful-Random-Graph Model – cont.

U in exhaustive search:

xi = si-1
si = si-1 + 1 mod N, which goes over all the preimages of f in a single cycle

(y0, s0) –U→ x1 –f→ (y1, s1) –U→ x2 –f→ (y2, s2) –U→ …
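The three update rules translate directly into code (a transcription of the slides; the function names are mine):

```python
# U maps the previous node (y, s) to the next preimage x and the next
# hidden state; the next image is then y' = f(x).

def U_hellman(y, s, N, S):        # hidden state = table number, fixed
    return (y + s) % N, s

def U_rainbow(y, s, N, S):        # hidden state = column number
    return (y + s) % N, (s + 1) % S

def U_exhaustive(y, s, N, S):     # ignores y; S = N here, so the walk
    return s, (s + 1) % N         # visits all preimages in one cycle

def walk(U, f, y0, s0, steps, N, S):
    """Iterate one path in the stateful random graph defined by U, f."""
    y, s = y0, s0
    for _ in range(steps):
        x, s = U(y, s, N, S)
        y = f(x)
    return y, s
```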

122

Coverage and Collision of Paths

Definition: Two paths collide if both yi = yj and si = sj.

Net coverage – the set of images yi covered by the M paths.
Gross coverage – the set of nodes (yi, si) covered by the M paths.

(y0, s0) –U→ x1 –f→ (y1, s1) –U→ x2 –f→ (y2, s2) –U→ …

123

The rigorously proven Coverage Theorem:

Let A = √(M·S·N·ln(SN)), where M = N^α, for any 0 < α < 1.

For any U with S hidden states,

with overwhelming probability over random f's,

the net coverage of any collection of M paths of any length in the stateful random graph

is bounded from above by 2A.

124

Reduction of Best Case to Average Case

For a given U, consider a huge table W with a row for every possible function fi and a column for every possible subset Mj of M startpoints.

Wi,j = 1 if the net coverage of fi and Mj is larger than 2A (0 otherwise).

We want to prove that almost all the rows contain only zeroes, by proving that there are fewer 1's than rows in W.

125

Upper Bounding Prob(Wi,j=1)

Method:

• Construct an algorithm that counts the net coverage for fi and Mj.

• Analyze the probability that the counted coverage exceeds 2A, i.e., Prob(Wi,j=1), over a random and uniform choice of the startpoints Mj and the function fi.

The combinatorial heart of the proof:

• Define the notion of a coin toss with fixed success probability q.
• Define the notion of a miracle: many coin tosses with few successes.
• Prove that the probability of a miracle is negligible.

126

Bounding Prob(Wi,j=1) – Basic Idea

Algorithm: traverse the M chains, stopping a chain on a collision, and count the net coverage.

We want to consider each output of f as a truly random number. However, this view is justified only the first time f is applied to an input (a fresh value); otherwise, the output of f is already known.

Recall: a collision occurs only if (yi, si) = (yj, sj).

127

Bounding Prob(Wi,j=1) – Basic Idea

[Animation over slides 127–134: the chains are created step by step; each time f is applied to a fresh input while in hidden state s, the resulting image is added to FreshBucket_s, one bucket per state.]

- An input is not fresh if f was already applied to it, possibly while filling another bucket: its output is already known, and nothing is added to the current fresh bucket.
- When a freshly created value collides with a value already in its fresh bucket, the chain must end.

Clearly: NetCoverage ≤ Σ |FreshBucket|.

135

Bounding Prob(Wi,j=1) – Analysis

What is the probability of a collision between a fresh image yi = f(xi) and the values in the fresh bucket? Exactly |FreshBucket|/N, as yi = f(xi) is truly random and independent of previous events.

Problem: |FreshBucket| depends on previous probabilistic events, which makes it difficult to analyze.

136

Bounding Prob(Wi,j=1) – Coin Toss

Set a threshold A/S in each bucket, dividing it into a lower and an upper bucket.

Coin toss: xi is fresh and the lower bucket is full. Hence |UpperBuckets| ≤ #coin tosses, and

NetCoverage ≤ Σ |FreshBuckets| ≤ A + |UpperBuckets| ≤ A + #coin tosses.

Successful coin toss: a coin toss in which yi collides with the lower bucket.

The probability that a coin toss is successful is exactly q = A/(S·N), independent of previous events! A successful coin toss forces a collision, so there are at most M successful coin tosses.

137

Miracles happen with very low probability:

Miracle: NetCoverage > 2A.

A miracle implies that after A coin tosses there were fewer than M successes, i.e., Prob(Miracle) ≤ Prob(B(A,q) < M), where B(A,q) is a binomial random variable with q = A/(S·N).

Concluding the proof: Prob(Wi,j=1) is so small that the number of 1's in the table is << the number of rows, so for any tradeoff scheme U with S hidden states, almost all functions f cannot be covered well even by the best subset of M startpoints. QED.
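A rough sanity check of the last step (my arithmetic, not the paper's exact constants): the expected number of successes among A coin tosses is A·q = A²/(S·N) = M·ln(SN), a factor ln(SN) above the threshold M, so a standard binomial tail bound pushes Prob(B(A,q) < M) below roughly (SN)^(−cM) for some constant c, which is small enough to beat the number of columns of W.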

138

Rigorous Lower Bound on the Number of Hidden States S

To invert most images we need a net coverage of at least N/2:

N/2 ≤ 2A = 2·√(M·S·N·ln(SN)),

and therefore the number of hidden states must satisfy:

S ≥ N / (16·M·ln(SN)) ≥ N / (32·M·ln N).

139

Corollaries:

To cover most of the vertices of any stateful random graph, you have to use a sufficiently large number of hidden states, which determines the minimal possible running time of the online phase of the attack.

This rigorously proven lower bound is applicable to Hellman’s scheme, the Rainbow scheme, and to any other scheme which can be described by stateful random graphs.

140

Conclusion:

Random graphs are wonderful objects to study

Understanding their structure can lead to many cryptographic and cryptanalytic optimizations

In this talk I gave only a small sample of the published and folklore results at the interface between cryptography and random graph theory
