  • 7/28/2019 00396969


    Modern Optimisation Algorithms for Cryptanalysis

    Andrew Clark*
    Information Security Research Centre & Distributed Systems Technology Centre
    Queensland University of Technology
    Email: [email protected]

    Abstract

    In recent years a number of optimisation algorithms have emerged which have proven to be effective in solving a variety of NP-complete problems. Examples of such methods include simulated annealing, genetic algorithms and the tabu search. This paper will describe each of these three algorithms and overview their use in the field of cryptology. In particular, the application to cryptanalysis of simple substitution and transposition ciphers is considered.

    1 Introduction

    Many researchers in the field of cryptanalysis are interested in developing automated attacks on encryption algorithms (ciphers). When analysing ciphers it is advantageous that a proposed attack will run without human intervention, finishing either when the message has been successfully decrypted or the key has been determined (this is what is meant by an automated attack). Previous work in this area ([1], [4], [6]) has shown that simulated annealing and genetic algorithms can provide successful automated attacks on both substitution and transposition ciphers. The purpose of this paper is to summarise the previous work and to present new work by applying the tabu search in cryptanalysing these ciphers. The tabu search ([2]) is a new and innovative technique which warrants further investigation and for this reason is included in the following discussion.

    Section 2 briefly describes the two cipher types being considered and gives a short summary of their properties. Section 3 discusses suitability assessment, which is an issue pertinent to each of the algorithms described in the following three sections (simulated annealing, genetic algorithms and the tabu search, respectively). Suitability assessment provides a means for determining the suitability

    * The work reported in this paper has been funded in part by the Cooperative Research Centres Program through the Department of the Prime Minister and Cabinet of the Commonwealth Government of Australia.

    of an arbitrary solution. Finally a comparison of these techniques is presented. The comparison is not intended to highlight one particular method but rather to illustrate positive and negative aspects of each of the algorithms. In each case experimental results will be given to substantiate any claims.

    2 Simple Ciphers

    This section gives a brief description of substitution and transposition ciphers. Although the ciphers on their own are relatively simple, they form the building blocks for many popular and more complex ciphers - for example DES, the Data Encryption Standard. For the purposes of this paper it is assumed that the alphabet consists of 27 characters - A, B, ..., Z, and the space character (denoted -).

    2.1 Substitution Ciphers

    The substitution cipher simply involves substituting each letter in the message with another. Typically a key is represented as a permutation of the characters in the alphabet. For example the key

        QWERTYUIOPASDFGHJKL-ZXCVBNM

    would encrypt the message

        I-THINK-THEREFORE-I-AM

    to

        OM-IOFAM-ITKTYGKTMOMQD

    The key above works in the following way: plaintext A becomes ciphertext Q, B becomes W, C becomes E, etc.

    An important property of the substitution cipher is that the n-gram (n consecutive letters in the message or the cipher) statistics are maintained. For example, if an I occurs three times in the message, then the corresponding ciphertext letter (O in the example above) will occur the same number of times in the encrypted message. This property is important from the point of view of a cryptanalyst and is the basis of each of the attacks on the substitution cipher described below.
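    The worked example can be checked with a few lines of Python. This is only an illustrative sketch: the names ALPHABET, substitute and unsubstitute are ours, and "-" stands in for the space as in the text. It reproduces the ciphertext above and shows that the count of I in the message equals the count of O in the ciphertext.

```python
import string
from collections import Counter

# 27-character alphabet from the text: A..Z plus "-" standing in for the space.
ALPHABET = string.ascii_uppercase + "-"
KEY = "QWERTYUIOPASDFGHJKL-ZXCVBNM"  # example key from the text


def substitute(message: str, key: str = KEY) -> str:
    """Encrypt by replacing ALPHABET[i] with key[i]."""
    return message.translate(str.maketrans(ALPHABET, key))


def unsubstitute(ciphertext: str, key: str = KEY) -> str:
    """Decrypt by applying the inverse mapping key[i] -> ALPHABET[i]."""
    return ciphertext.translate(str.maketrans(key, ALPHABET))


message = "I-THINK-THEREFORE-I-AM"
ciphertext = substitute(message)
print(ciphertext)                # OM-IOFAM-ITKTYGKTMOMQD
print(unsubstitute(ciphertext))  # I-THINK-THEREFORE-I-AM

# Single-character statistics survive encryption: I maps to O, both occur 3 times.
print(Counter(message)["I"], Counter(ciphertext)["O"])  # 3 3
```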


    0-7803-2404-8/94/$4.00 © 1994 IEEE


    2.2 Transposition Ciphers

    The transposition (or permutation) cipher operates on the message one block at a time. The size of the block is constant and is usually chosen by the user. Let M denote the length of a block. Each of the M letters in the block is rearranged according to some fixed permutation (the key). An example key (where M = 6) is

        4 1 6 3 2 5

    which transforms the message used in the example above to

        HINT-IHKRT-ERE-OFEMITA-R

    where the random characters RT have been appended to the original message to increase the length to be a multiple of M.

    The attack methodology used on the transposition cipher is as follows:

    1. Propose a key, K_p.
    2. Decrypt the ciphertext using K_p.
    3. Compare the n-gram (n > 1) statistics of the decrypted message with the known language statistics to evaluate a given key.
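    The example ciphertext is consistent with reading the key as "output position j of each block takes the letter at input position key[j]". The sketch below assumes that reading; the function names transpose and invert are illustrative. Decryption with a proposed key (step 2 of the attack) is then encryption with the inverse permutation.

```python
from typing import Sequence

KEY = (4, 1, 6, 3, 2, 5)  # example key from the text, M = 6


def transpose(message: str, key: Sequence[int] = KEY) -> str:
    """Encrypt block by block: output position j takes input position key[j]."""
    m = len(key)
    if len(message) % m:
        raise ValueError("pad the message to a multiple of the block length")
    blocks = (message[i:i + m] for i in range(0, len(message), m))
    return "".join("".join(block[k - 1] for k in key) for block in blocks)


def invert(key: Sequence[int]) -> tuple:
    """Return the permutation that undoes `key`."""
    inverse = [0] * len(key)
    for j, k in enumerate(key, start=1):
        inverse[k - 1] = j
    return tuple(inverse)


padded = "I-THINK-THEREFORE-I-AM" + "RT"   # padding characters from the text
ciphertext = transpose(padded)
print(ciphertext)                          # HINT-IHKRT-ERE-OFEMITA-R
print(transpose(ciphertext, invert(KEY)))  # I-THINK-THEREFORE-I-AMRT
```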

    3 Suitability Assessment

    In each of the algorithms to be described a method of assessing each proposed solution is required. The standard assessment methods used by cryptanalysts when investigating substitution and permutation ciphers are discussed in this section. The assessment methods (sometimes known as fitness functions) have proven to be effective in attacking these cipher types.

    When determining the fitness of a key to a substitution cipher, the single character and digram frequencies are usually compared. A general formula for the fitness of key k is as follows (A denotes the set of characters in the alphabet - i.e. A, B, ..., Z, -):

        [Equation (1)]

    where SF and SDF denote the relative frequencies of single characters and digrams in the English language (respectively), and DF and DDF denote the relative frequencies of single characters and digrams in the message decrypted using k. Varying α and β allows the weighting in favour of either the single character frequencies or the digram frequencies. Variations of (1) are possible. For example, Forsyth and Safavi-Naini [1] choose α = 0. In his genetic algorithm attack Spillman [6] uses a far more complicated function which normalises the result to a value between zero and one.

    The fitness of the key of a transposition cipher is more difficult to determine. This is because the relative frequencies of the single characters do not change upon encryption. Thus any fitness assessment cannot involve comparison of single character frequencies. Furthermore, comparison of n-gram statistics (where n > 1) is less powerful since the characters are already there in their natural frequencies. For example, a digram common in the plaintext language (say TH) may also occur frequently in the encrypted message (or in a message decrypted with an incorrect key).

    Despite these properties of the ciphertext, it is still possible to mount a successful attack by comparing digram and trigram frequencies using an equation similar to (1).

    An alternative approach was used by Matthews [4], who calculated a fitness by assigning scores to a small list of the possible digrams and trigrams. The score assigned to a particular digram or trigram reflected its desirability in the decrypted message. As an example, consider the score table proposed by Matthews.

    The negative score assigned to three consecutive spaces has a very powerful effect. Since the space is the most commonly occurring character in plain English text it is highly likely that this configuration will occur in a message encrypted using the transposition cipher or in an unsuccessful attempt to decrypt it. The fitness function obtained using the score table might look as in (2).

        [Equation (2)]

    where S denotes the set of di/trigrams in the score table, f_i is the relative frequency of the ith di/trigram in the decrypted message and s_i is the corresponding score.
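    The printed forms of (1) and (2) do not survive in this text, so the following Python sketch rests on assumptions rather than on the author's formulas. It takes (1) to be a weighted sum of absolute differences between language and decrypted-message frequencies (a quantity to be minimised, so its negative could serve as a fitness), and (2) to be the sum of frequency times score over the table entries. All function names are illustrative, and the score values in the usage lines are hypothetical, not those of Matthews.

```python
from collections import Counter
from typing import Dict, Mapping


def ngram_freqs(text: str, n: int) -> Dict[str, float]:
    """Relative frequencies of the overlapping n-grams in `text`."""
    total = len(text) - n + 1
    if total <= 0:
        return {}
    counts = Counter(text[i:i + n] for i in range(total))
    return {gram: c / total for gram, c in counts.items()}


def abs_difference(reference: Mapping[str, float], observed: Mapping[str, float]) -> float:
    """Sum of |reference - observed| over every n-gram seen in either table."""
    grams = set(reference) | set(observed)
    return sum(abs(reference.get(g, 0.0) - observed.get(g, 0.0)) for g in grams)


def frequency_cost(decrypted: str, sf: Mapping[str, float], sdf: Mapping[str, float],
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """ASSUMED form of (1): alpha*sum|SF-DF| + beta*sum|SDF-DDF| (lower is better)."""
    df = ngram_freqs(decrypted, 1)
    ddf = ngram_freqs(decrypted, 2)
    return alpha * abs_difference(sf, df) + beta * abs_difference(sdf, ddf)


def score_table_fitness(decrypted: str, scores: Mapping[str, float]) -> float:
    """ASSUMED form of (2): sum over table entries of f_i * s_i (higher is better)."""
    freqs = {n: ngram_freqs(decrypted, n) for n in {len(g) for g in scores}}
    return sum(freqs[len(g)].get(g, 0.0) * s for g, s in scores.items())


# Usage with stand-in language statistics taken from a short sample text.
sample = "THE-CAT-SAT-ON-THE-MAT"
sf, sdf = ngram_freqs(sample, 1), ngram_freqs(sample, 2)
print(frequency_cost(sample, sf, sdf))        # 0.0 for a perfect match
print(frequency_cost(sample[::-1], sf, sdf))  # positive: the digrams no longer match

hypothetical_scores = {"TH": 2.0, "---": -5.0}  # NOT the values of Matthews
print(score_table_fitness("THE---THE", hypothetical_scores))
```

    Setting alpha = 0 in frequency_cost gives the digram-only variant that the text attributes to Forsyth and Safavi-Naini.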


    4 Simulated Annealing

    As its name suggests, the simulated annealing algorithm is modelled on the process of annealing in a metal. Metals are annealed (heated to a high temperature and then slowly cooled) to produce a molecular structure which is crystalline.

    The movement of atoms in a metal (between different energy levels) is governed by the Metropolis criterion [5], which says that a particle moves from energy level E_1 to E_2 with probability P given by (3),

        P = e^{-(E_2 - E_1)/(kT)}    (3)

    where k is the Boltzmann constant (ignored in the following discussion) and T is the temperature.

    The optimisation algorithm ([1], [3]) mimics this process by starting with a (usually) random solution (which is analogous to the structure of a metal at high temperature) and then iterating according to the Metropolis criterion and reducing the temperature according to a cooling schedule to arrive at a near-optimal solution.

    An implementation of the simulated annealing algorithm follows these steps:

    1. Generate an initial solution to the problem (usually random).
    2. Calculate the fitness of the initial solution.
    3. Set the initial temperature T = T_0.
    4. For temperature T, do many times:
        - Generate a new solution - this involves perturbing the current solution in some manner.
        - Calculate the cost of the modified solution.
        - Determine the difference in cost between the current solution and the proposed solution.
        - Consult the Metropolis criterion to decide if the proposed solution should be accepted.
        - If the proposed solution is accepted, the required changes are made to the current solution.
    5. If the stopping criterion is satisfied the algorithm ceases with the current solution, otherwise decrement the temperature, T, and return to Step 4.

    In this case perturbing the current solution simply involved swapping two randomly chosen elements of the key. In the case of the transposition cipher a rotation of the key by a random amount is also used as a perturbation mechanism.
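    The five steps can be sketched in Python as below. Several details are assumptions that the text does not fix: a geometric cooling schedule (T multiplied by `cooling`), a minimum temperature as the stopping criterion, and a caller-supplied `cost` function where lower is better (a fitness can be negated to fit). The Boltzmann constant is dropped, as the text indicates. The names anneal and swap_two and the toy cost in the usage lines are illustrative only; the rotation perturbation is not included.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def swap_two(key: Sequence[int], rng: random.Random) -> List[int]:
    """Perturb a key by swapping two randomly chosen elements."""
    i, j = rng.sample(range(len(key)), 2)
    new_key = list(key)
    new_key[i], new_key[j] = new_key[j], new_key[i]
    return new_key


def anneal(initial_key: Sequence[int], cost: Callable[[Sequence[int]], float],
           t0: float = 1.0, cooling: float = 0.9, steps_per_temp: int = 200,
           t_min: float = 1e-3, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    current = list(initial_key)                # Step 1: initial solution
    current_cost = cost(current)               # Step 2: assess it
    t = t0                                     # Step 3: initial temperature
    while t > t_min:                           # Step 5: assumed stopping criterion
        for _ in range(steps_per_temp):        # Step 4: many trials at this T
            candidate = swap_two(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Metropolis criterion: always accept improvements, accept a
            # worse solution with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
        t *= cooling                           # assumed geometric cooling
    return current, current_cost


# Toy usage: recover a hidden permutation, cost = number of wrong positions.
target = list(range(8))
wrong_positions = lambda key: sum(a != b for a, b in zip(key, target))
key, final_cost = anneal(list(reversed(target)), wrong_positions)
print(key, final_cost)
```

    In a real attack `cost` would decrypt the ciphertext with the candidate key and compare n-gram statistics as described in Section 3.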

    5 Genetic Algorithms

    The second of the three algorithms to emerge was the genetic algorithm ([4], [6]). It uses ideas from the evolutionary process to iteratively breed superior solutions to the problem at hand. Unlike simulated annealing, which is only concerned with one solution (or key) at a time, the genetic algorithm maintains a pool of keys. This gene pool is manipulated using functions such as selection, mating and mutation to evolve a near-optimal solution.

    The selection process typically involves choosing random pairs from the current solution pool which will mate to produce offspring for the next generation. Although the selection is random it is biased towards the fittest of the current pool.

    Mating is designed so that two parents will produce two children. Each child should have characteristics of both its parents. As an example consider the transposition cipher with a key of length M = 7. Firstly generate a random bit string of length M.

        Parent 1:   4 5 1 7 3 2 6
        Parent 2:   4 3 1 2 7 6 5
        Bit String: 1 0 0 1 1 0 1

    Child 1 is created in two steps:

    1. Take the elements in Parent 1 corresponding to a 1 in the bit string.

        Child 1: 4 * * 7 3 * 6

    2. Place the missing elements (denoted by *) in the order they appear in Parent 2.

        Child 1: 4 1 2 7 3 5 6

    Child 2 is found in a similar manner.

        Child 2: 4 3 1 5 7 6 2

    Mutation is performed simply by swapping certain elements within a key, or rotating the elements in the key. Mutation usually occurs with a low probability.

    The overall genetic algorithm is summarised in the numbered steps that close this section.
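    The mating example can be reproduced with the short Python sketch below. The rule for Child 2 is not spelled out in the text; the printed value is obtained if Child 2 keeps the elements of Parent 2 at the 0 positions of the bit string and fills the gaps in the order they appear in Parent 1, and that is the assumption made here. The function names make_child, mate and mutate are illustrative, and mutate implements only the swap variant.

```python
import random
from typing import List, Sequence, Tuple


def make_child(keep_from: Sequence[int], fill_from: Sequence[int],
               mask: Sequence[int], keep_bit: int) -> List[int]:
    """Keep keep_from[i] where mask[i] == keep_bit; fill gaps in fill_from order."""
    kept = {k for k, bit in zip(keep_from, mask) if bit == keep_bit}
    fillers = iter([x for x in fill_from if x not in kept])
    return [k if bit == keep_bit else next(fillers)
            for k, bit in zip(keep_from, mask)]


def mate(parent1: Sequence[int], parent2: Sequence[int],
         mask: Sequence[int]) -> Tuple[List[int], List[int]]:
    child1 = make_child(parent1, parent2, mask, keep_bit=1)
    child2 = make_child(parent2, parent1, mask, keep_bit=0)  # assumed rule
    return child1, child2


def mutate(key: Sequence[int], rate: float, rng: random.Random) -> List[int]:
    """With probability `rate`, swap two randomly chosen elements of the key."""
    key = list(key)
    if rng.random() < rate:
        i, j = rng.sample(range(len(key)), 2)
        key[i], key[j] = key[j], key[i]
    return key


parent1 = [4, 5, 1, 7, 3, 2, 6]
parent2 = [4, 3, 1, 2, 7, 6, 5]
mask = [1, 0, 0, 1, 1, 0, 1]
print(mate(parent1, parent2, mask))
# ([4, 1, 2, 7, 3, 5, 6], [4, 3, 1, 5, 7, 6, 2])
```

    Because each child keeps a subset of one parent and fills the rest from the other without repeats, both children are always valid permutations.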


    1. Generate a random, initial gene pool. Here a gene represents a possible solution to the problem at hand - for example a possible permutation which decrypts a transposition cipher.
    2. A fitness is allotted to each gene in the pool.
    3. A mating pool is then generated by selecting parents from the current gene population. This selection, although random, is biased towards the fittest of the current solutions.
    4. The genes in the mating pool are combined to produce a set of "children".
    5. Each child then undergoes a mutation process with small probability (predetermined). The fitness is then calculated for each child.
    6. The best of the new generation and the old generation are then combined in some manner and the algorithm returns to Step 3.
    7. Stop after a certain number of iterations, or when the message decrypts.

    6 Tabu Search

    The most recent of the three techniques is the tabu search [2], which has proven to be very effective in solving many optimisation problems. The main aim of the tabu search is to provide a heuristic for finding a good solution to the problem at hand without becoming trapped in a local minimum. The algorithm has the concept of a short term memory in the form of a tabu list. At each iteration the current key is added to the tabu list. This key will remain 'tabu' for a fixed number of iterations.

    The algorithm is:

    1. Generate a random initial solution and calculate its fitness. Record this as the best solution found so far.
    2. Create a list of possible moves. Here a 'move' consists of swapping two randomly chosen elements of the current key. The size of this list is a parameter of the algorithm and is not necessarily fixed. The fitness of the solutions obtained by making each of the moves is calculated.
    3. Choose the best admissible candidate. Of the candidate moves, which one is not tabu and yields the best improvement in the fitness of the current solution? A move which is tabu may be accepted if it satisfies the aspiration criteria. In this case the aspiration criterion was that the fitness be at least as high as the best solution found so far.
    4. Update the tabu list and the 'best so far' (if necessary).
    5. Repeat Steps 2, 3 and 4 until a fixed number of iterations have been performed, or there has been no improvement in the best solution for a number of iterations.

    One difference of this algorithm from the others discussed in this paper is that in each iteration only variations on one key are considered. This makes the algorithm powerful since although the keys can change dramatically over a number of iterations, the difference between successive iterations is small but ever improving. The advantage of this becomes obvious when one considers what happens when a solution of the current configuration is close to the required answer. Both simulated annealing and the genetic algorithm can be led away from the optimum (perhaps with small probability), however the tabu search is more likely to find the optimum since most steps are upwards (towards greater fitness). The tabu search achieves a good balance between the ability to jump out of regions of local minima and iterative improvement.
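    The tabu search steps can be sketched in Python as follows. This is one possible reading, not the author's implementation: the tabu list is modelled as a fixed-length queue of recently visited keys, and the parameter names (n_moves, tabu_size, max_iters, patience) and the toy fitness in the usage lines are assumptions. The `fitness` argument is caller-supplied, with higher values being better.

```python
import random
from collections import deque
from typing import Callable, Sequence, Tuple

Key = Tuple[int, ...]


def tabu_search(initial_key: Sequence[int], fitness: Callable[[Key], float],
                n_moves: int = 20, tabu_size: int = 10, max_iters: int = 200,
                patience: int = 50, seed: int = 0) -> Tuple[Key, float]:
    rng = random.Random(seed)
    current: Key = tuple(initial_key)
    best, best_fit = current, fitness(current)       # Step 1
    tabu = deque(maxlen=tabu_size)                   # keys stay tabu for tabu_size iterations
    stale = 0                                        # iterations without a new best
    for _ in range(max_iters):                       # Step 5: iteration limit
        # Step 2: build candidates by swapping two randomly chosen elements.
        candidates = []
        for _move in range(n_moves):
            i, j = rng.sample(range(len(current)), 2)
            cand = list(current)
            cand[i], cand[j] = cand[j], cand[i]
            candidates.append((fitness(tuple(cand)), tuple(cand)))
        # Step 3: a candidate is admissible if it is not tabu, or if it meets
        # the aspiration criterion (fitness at least as high as the best so far).
        admissible = [(f, c) for f, c in candidates
                      if c not in tabu or f >= best_fit]
        improved = False
        if admissible:
            cand_fit, cand = max(admissible, key=lambda fc: fc[0])
            tabu.append(current)                     # Step 4: update tabu list
            current = cand
            if cand_fit > best_fit:                  # Step 4: update best so far
                best, best_fit = cand, cand_fit
                improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:                        # Step 5: no recent improvement
            break
    return best, best_fit


# Toy usage: fitness = number of positions that agree with a hidden key.
target = tuple(range(8))
matches = lambda key: sum(a == b for a, b in zip(key, target))
best_key, best_fitness = tabu_search(tuple(reversed(target)), matches, n_moves=40)
print(best_key, best_fitness)
```

    Note that the search always moves to the best admissible candidate, even when it is worse than the current key; the tabu list is what stops it from stepping straight back.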

    7 Results Compared

    Lack of space prevents further details of the algorithms being given. In this section a review of the three algorithms is presented. The results in Figure 1 and Table 1 were obtained by running attacks on the substitution cipher using each of the three methods. Attacks on the transposition cipher were also implemented, with similar results. For each algorithm there are a number of different parameters which need to be varied to "fine tune" the optimisation process. Determining the optimal values for these parameters is a non-trivial task and is usually performed experimentally. In some cases guidelines are given. For example, the initial temperature in the simulated annealing algorithm should be chosen high enough such that every proposed move is accepted by the Metropolis criterion. Readers should refer to the references section for more detailed reports of implementing these algorithms.

    It is unreasonable to expect these methods to arrive at the correct solution on their first run. It is usually necessary to perform a number of runs, each starting from a different region (randomly chosen) of the solution space.

    Each algorithm was run with initial keys being chosen randomly. Figure 1 gives an indication of the convergence rates of each of the algorithms as a function of the number of iterations. Of course, this result cannot be used as the only indicator of the superiority of any algorithm since the complexity of one iteration for each of the algorithms varies. It does, however, give a comparison of the complexities of the various algorithms. Simulated annealing is expected to require fewer iterations since each iteration is very computationally intensive - many keys are considered and the fitness of each key must be calculated. As can be seen from Table 1, the computation time required by the simulated annealing algorithm is the greatest, reinforcing the fact that each iteration considers many keys. The times presented in Table 1 were averaged over a number of runs of each of the algorithms.

        Method                 Average Time
        Simulated Annealing    242s
        Tabu Search            94s
        Genetic Algorithm      220s

    Table 1: Time comparison

    [Figure 1: Comparison of Algorithms. Curves: Tabu Search, Genetic Algorithm, Simulated Annealing. Horizontal axis: Number of Iterations, 0 to 350.]

    The genetic algorithm appears to "learn" more slowly although the convergence rate improves as the gene pool collects more fit solutions. The results in Figure 1 show that the tabu search required [...]

    [...] the application of techniques such as these to more complex ciphers is envisaged. The different properties of each algorithm mean that the use of one algorithm over another may be preferred for particular applications. In any case, these methods provide a reliable problem solving technique with many useful properties.

    References

    [1] W. S. Forsyth and R. Safavi-Naini. Automated cryptanalysis of substitution ciphers. Cryptologia, 17(4):407-418, October 1993.
    [2] Fred Glover. Tabu search: A tutorial. Interfaces, 20(4):74-94, July 1990.
    [3] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671-680, 1983.
    [4] Robert A. J. Matthews. The use of genetic algorithms in cryptanalysis. Cryptologia, 17(2):187-201, April 1993.
    [5] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equa-