Page 1: documentatie licenta

Table of Contents

1 Introduction
  1.1 Context
  1.2 Project Topics
    1.2.1 Statistical Tests for Random and Pseudo-random Number Generators
    1.2.2 Statistical Functions
  1.3 Project Goals
    1.3.1 Motivation
    1.3.2 Objectives
    1.3.3 Project Description
2 Bibliographical Review
3 Theoretical Foundations
  3.1 Statistical Tests for Random and Pseudo-random Number Generators
    3.1.1 NIST Statistical Test Suite
      3.1.1.1 NIST Test 1 – Frequency (Monobit) Test
      3.1.1.2 NIST Test 2 – Frequency Test within a Block
      3.1.1.3 NIST Test 3 – Runs Test
      3.1.1.4 NIST Test 4 – Test for the Longest Run of Ones in a Block
      3.1.1.5 NIST Test 7 – Non-overlapping Template Matching Test
      3.1.1.6 NIST Test 8 – Overlapping Template Matching Test
      3.1.1.7 NIST Test 9 – Maurer’s “Universal Statistical” Test
      3.1.1.8 NIST Test 9 – Maurer’s “Universal Statistical” Test (Coron’s variant)
    3.1.2 CryptoRand Statistical Test Suite
      3.1.2.1 The Histogram Test – Hl Test
  3.2 Statistical Functions
    3.2.1 IBETA Statistical Function
    3.2.2 IGAM and IGAMC Statistical Functions
    3.2.3 The Chi-square Distribution
    3.2.4 The Poisson Distribution
  3.3 OpenMP
4 Requirements Specifications and System Architecture
  4.1 Block Diagram of the System
    4.1.1 SeqTestRand
    4.1.2 ParTestRand
    4.1.3 SFC
  4.2 Functional Requirements
  4.3 Non-functional Requirements
  4.4 Constraints
  4.5 General System Architecture
5 Design Detail
  5.1 Design Policies
  5.2 Use Cases
    5.2.1 SeqTestRand
    5.2.2 ParTestRand
    5.2.3 SFC
  5.3 Program Structure
    5.3.1 SeqTestRand
      5.3.1.1 SeqNIST
      5.3.1.2 SeqHistograms
    5.3.2 ParTestRand
      5.3.2.1 ParNIST
      5.3.2.2 ParNIST_blocks
    5.3.3 SFC
  5.4 User interface
    5.4.1 SeqTestRand
    5.4.2 ParTestRand
    5.4.3 SFC
6 System Usage
  6.1 SeqTestRand
  6.2 ParTestRand
  6.3 SFC
7 Deployment and experimental results
  7.1 Used Technology
  7.2 Running the Applications
    7.2.1 Hardware requirements
    7.2.2 Software requirements
  7.3 Encountered Issues and Solutions Found
  7.4 Experimental Results
    7.4.1 NIST Battery
      7.4.1.1 NIST Test 1 – Frequency (Monobit) Test
      7.4.1.2 NIST Test 2 – Frequency Test within a Block
      7.4.1.3 NIST Test 3 – Runs Test
      7.4.1.4 NIST Test 4 – Test for the Longest Run of Ones in a Block
      7.4.1.5 NIST Test 7 – Non-overlapping Template Matching Test
      7.4.1.6 NIST Test 8 – Overlapping Template Matching Test
      7.4.1.7 NIST Test 9 – Maurer’s “Universal Statistical” Test
8 Conclusions
  8.1 Results
  8.2 Comparison with Similar Systems
  8.3 Future Development
9 References
10 Annexes
  10.1 Experimental Results Tables
    10.1.1 NIST Battery
    10.1.2 CryptoRand Battery
  10.2 Papers

1 Introduction

1.1 Context

The concept of randomness appeared a long time ago and was defined by Aristotle as “the situation when a choice is to be made which has no logical component by which to determine or make the choice”. [1] Over the years, many scientific fields have become concerned with randomness, including cryptography, game theory, information theory, algorithmic probability, pattern recognition, statistics, probability theory and quantum mechanics.

The many applications of randomness have led to many different methods for generating random data. These methods may vary as to how unpredictable or statistically random they are, and how quickly they can generate random numbers. [2]

There are two basic types of generators used to produce random sequences: true random number generators (TRNGs) and pseudo-random number generators (PRNGs). The difference between them is that the first one uses a nondeterministic source (the entropy source) and a processing function (the entropy distillation process) to produce randomness, while the second one uses one or more seeds to generate pseudo-random numbers. For cryptographic applications, both of these generator types produce a sequence of zeroes and ones.

The simplest random number generator could be a fair coin labeled with a 0 on one side and a 1 on the other. The generated bit sequence consists of the results of flipping the coin repeatedly, with the condition that the flips are independent of each other. This means that the result of one coin flip does not affect the result of any later flip.

The condition from the previous example is in fact one of the properties of random and pseudo-random number generators: unpredictability. This property states that the value of the next element generated in the sequence cannot be predicted, regardless of how many elements have already been produced. For pseudo-random number generators, unpredictability also applies to the seed, i.e. there should be no correlation between the generated output sequence and the seed.

Like any other application, random number generators should not be used before they have been tested. Various statistical tests were developed over the years in order to test the reliability of random number generators and to offer the possibility of comparing them. These tests are divided into two categories: those that test sequences of bits and those that test random numbers located in the interval [0, 1].

Given that randomness is a probabilistic property, the properties of a random sequence can be described in terms of probability. That is why the outcome of statistical tests applied to a truly random sequence can be described in probabilistic terms and is known a priori.

There are an infinite number of statistical tests, each of them testing a certain property of truly random sequences. Because of the high number of tests, they are organized in batteries, but none of these batteries can be called a complete set of tests for random number generators. Caution must be taken not only when choosing the battery or batteries of tests to use, but also when interpreting the results of the statistical tests.

The statistical tests must not only certify that a generator produces random sequences or look for certain properties in a generator, but they must also be fast and efficient and must be able to test sequences of various lengths. This way the same statistical tests can be applied to a wide range of generators.

1.2 Project Topics

1.2.1 Statistical Tests for Random and Pseudo-random Number Generators

A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities. [3] It can be globally random or locally random. The global randomness is based on the idea that a sequence can be random as a whole and not random on some subsequence. Local randomness refers to the idea that there are minimum sequence lengths for which random distributions are approximated.

As defined in [4], statistical tests provide a mechanism for making quantitative decisions about a process, by determining whether there is enough evidence to reject a conjecture or hypothesis about that process. The conjecture is also called the null hypothesis, which, when testing random and pseudo-random number generators, represents the belief that the generator produces random sequences.

Statistical tests cannot state that a sequence is 100% random, because even if the null hypothesis is accepted, it doesn't mean that it is true. It only means that we do not have enough evidence to believe that the hypothesis is false. However, statistical tests can reject those sequences that are not random, based on some statistical properties of a truly random sequence.

Statistical tests have in fact two hypotheses: the null hypothesis (H0) and the alternative hypothesis (Ha). The alternative hypothesis used in the testing of random and pseudo-random number generators is the belief that the sequence produced is not random. Besides the two hypotheses, a statistical test also contains a test statistic that is based on the specific property tested, and the significance level (α). The significance level defines the sensitivity of the test, that is, the risk of rejecting the null hypothesis when it is in fact true. It can be chosen arbitrarily, even though in practice values of 0.01, 0.05 and 0.1 are usually used.

The P-value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. [5] The P-value measures the strength of the evidence against the null hypothesis: the lower the P-value, the stronger the evidence. However, the P-value should not be confused with the probability of the null hypothesis being true or with the probability of falsely rejecting the null hypothesis. The P-value alone does not give a decision: a significance level must be set, and the null hypothesis is rejected only if the P-value is lower than the significance level.
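As an illustration of how a P-value is compared with a significance level, the following minimal sketch computes the P-value of the frequency (monobit) test using the complementary error function, as the test is described in the NIST documentation [8]. The function names `monobit_p_value` and `is_rejected` and the default α of 0.01 are assumptions made for this example; they are not taken from the project's code.

```python
import math


def monobit_p_value(bits):
    """Return the P-value of the frequency (monobit) test for a list of 0/1 values."""
    n = len(bits)
    if n == 0:
        raise ValueError("the sequence must not be empty")
    # Map 0 -> -1 and 1 -> +1 and add everything up.
    s_n = sum(2 * b - 1 for b in bits)
    # Test statistic: absolute sum normalized by the square root of the length.
    s_obs = abs(s_n) / math.sqrt(n)
    # Under H0 the normalized sum is approximately standard normal,
    # so the two-sided tail probability is erfc(s_obs / sqrt(2)).
    return math.erfc(s_obs / math.sqrt(2))


def is_rejected(p_value, alpha=0.01):
    """H0 is rejected only when the P-value is below the significance level."""
    return p_value < alpha


# Short example sequence used in the NIST documentation of this test.
bits = [int(c) for c in "1011010101"]
p = monobit_p_value(bits)
print(round(p, 6), is_rejected(p))
```

For this ten-bit sequence the P-value is about 0.527, well above α = 0.01, so the null hypothesis is not rejected. The test says nothing about whether the hypothesis is actually true.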

The power of the statistical tests is represented by the probability of rejecting a null hypothesis when it is actually false. The values of the test statistic that lead to the rejection of the null hypothesis form the critical region, which is defined based on a cut-off value that is computed based on the test distribution and the significance level.

The steps of any statistical test, as they are described in [6], are:
1. Formulate the hypothesis.
2. Calculate a statistic as a function of the data used as input.
3. Choose a critical region.
4. Choose a significance level, representing the size of the critical region.
5. Analyze the position of the test statistic relative to the critical region. If the test statistic value falls within the critical region, then the null hypothesis for randomness is rejected.
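The five steps can be sketched for a simple two-sided test on the proportion of ones. This is only an illustration under stated assumptions: the normal approximation is used for the statistic, and `critical_value` and `run_test` are hypothetical names chosen for this example.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Cut-off z such that P(|Z| > z) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def run_test(bits, alpha=0.01):
    # Step 1: H0 = "the bits are independent and 0 and 1 are equally likely".
    n = len(bits)
    ones = sum(bits)
    # Step 2: test statistic, approximately standard normal under H0.
    z = (2 * ones - n) / n ** 0.5
    # Steps 3 and 4: the critical region is |z| > cutoff, and its size
    # (probability under H0) equals the significance level alpha.
    cutoff = critical_value(alpha)
    # Step 5: reject H0 when the statistic falls inside the critical region.
    return {"statistic": z, "cutoff": cutoff, "reject_h0": abs(z) > cutoff}


print(run_test([0, 1] * 50))   # balanced sequence: statistic is 0
print(run_test([1] * 100))     # all ones: statistic is 10
```

For α = 0.01 the cut-off is approximately 2.576, so the balanced sequence is not rejected while the all-ones sequence is.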

There are two types of errors that can appear in the context of statistical tests. The type I error appears when the data is random, but the null hypothesis is rejected. The other type of error (type II error) appears when the data is not random, but the null hypothesis is accepted. The probability of a type I error is represented by the significance level (α), and the probability of a type II error is denoted by β. Unlike α, β is not fixed because it can take many different values, depending on the type of non-randomness of the input sequence. However, α and β are related to each other and to the size of the input sequence, meaning that if two of them are known, the third one can be calculated.
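To make the relation between α, β and the sequence length concrete, the sketch below estimates β for one specific kind of non-randomness: independent bits where 1 appears with probability `p_one` instead of 0.5, tested with a two-sided frequency test. Both the choice of alternative and the normal approximation are assumptions of this example, and `type_ii_error` is a hypothetical name.

```python
from statistics import NormalDist


def type_ii_error(n, p_one, alpha=0.01):
    """Approximate beta for a two-sided frequency test on n biased, independent bits.

    Assumes 0 < p_one < 1 and n large enough for the normal approximation.
    """
    # Acceptance region under H0: |z| <= cutoff, where z = (sum of +/-1) / sqrt(n).
    cutoff = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Under the alternative each +/-1 term has mean 2p - 1 and variance 4p(1 - p),
    # so z is approximately normal with the mean and deviation below.
    mean = (2.0 * p_one - 1.0) * n ** 0.5
    sigma = 2.0 * (p_one * (1.0 - p_one)) ** 0.5
    alternative = NormalDist(mean, sigma)
    # beta = probability that z still lands in the acceptance region.
    return alternative.cdf(cutoff) - alternative.cdf(-cutoff)


for n in (1_000, 10_000, 100_000):
    print(n, type_ii_error(n, p_one=0.52))
```

With α fixed, β shrinks as the sequence grows, which is why a small bias that passes on short sequences can be detected on long ones. When `p_one` is exactly 0.5 the expression reduces to 1 − α, as expected.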

The first statistical tests for random numbers were published by M. G. Kendall and Bernard Babington Smith in [7] in 1938. Over the years, a great number of statistical tests were developed for testing random and pseudo-random number generators. Some of the best-known batteries today are:
- NIST (National Institute of Standards and Technology) [8]
- DIEHARD [9]
- TestU01 [10]
- ENT [11]

The NIST Statistical Test Suite tests bit sequences generated by both random and pseudo-random number generators. The ENT battery also uses bit sequences as input, but tests pseudo-random number generators. The Diehard battery tests real numbers in the interval [0, 1] and the TestU01 is based on NIST and Diehard, but also contains a great number of original tests. Therefore TestU01 uses as input both bit sequences and real number sequences.

The battery of tests chosen and the tests performed from that battery depend on the type of generator tested and on the properties we want to find in it. The results of the statistical tests do not only answer the question “Is this sequence random or not?”, but also provide means to analyze the results, in order to understand which properties must be improved for a certain generator and how far that generator is from producing random sequences.

1.2.2 Statistical Functions

Statistical functions are mathematical functions used to statistically analyze data. In the context of statistical tests, statistical functions are used to compute the P-value, being therefore an important part of random number testing.

One of the best-known libraries that contain such statistical functions is the Cephes library. This library provides special functions of mathematical physics and related items in the C language for scientists and engineers. It not only contains some of the most widely used statistical functions, but also elementary mathematical functions and other functions that come in handy when developing any statistical application.

Some of the most widely used statistical functions are:
- The GAMMA function
- LGAMMA – the logarithmic gamma function
- IGAM – the incomplete gamma function (regularized)
- IGAMC – the complemented incomplete gamma function (regularized)
- ERF – the error function
- ERFC – the complemented error function
- IBETA – the incomplete beta function
- The POISSON distribution
- The CHI-SQUARE distribution

These functions take as parameters statistical variables computed from the initial input data and evaluate the reference distribution of the test for those values. They have a great importance in the development of statistical tests for random and pseudo-random number generators, because they are used when a decision is made regarding the acceptance or rejection of the null hypothesis. The type of statistical function used by each test depends on the test's distribution and on the property verified.
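Several of the functions listed above are closely related. As a sketch, the complemented regularized incomplete gamma function has a closed form when its first argument is a positive integer, and that closed form gives both the Poisson cumulative distribution and the upper tail of the chi-square distribution with an even number of degrees of freedom. The code below covers only that special case; it is not the Cephes algorithm, which handles arbitrary real arguments, and the function names are chosen for this example.

```python
import math


def igamc_int(a, x):
    """Complemented regularized incomplete gamma Q(a, x) for integer a >= 1, x >= 0.

    Uses the identity Q(a, x) = exp(-x) * sum_{i=0}^{a-1} x**i / i!.
    """
    if int(a) != a or a < 1 or x < 0:
        raise ValueError("a must be a positive integer and x must be non-negative")
    term = 1.0   # x**0 / 0!
    total = 1.0
    for i in range(1, int(a)):
        term *= x / i      # turns x**(i-1)/(i-1)! into x**i / i!
        total += term
    return math.exp(-x) * total


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam, equal to Q(k + 1, lam)."""
    return igamc_int(k + 1, lam)


def chi_square_upper_tail(dof, chi2):
    """P(X > chi2) for a chi-square variable with an even number of degrees of freedom."""
    if dof % 2 != 0:
        raise ValueError("this sketch only handles even degrees of freedom")
    return igamc_int(dof // 2, chi2 / 2.0)


print(igamc_int(2, 1.0))              # 2 / e
print(poisson_cdf(0, 3.0))            # exp(-3)
print(chi_square_upper_tail(2, 4.0))  # exp(-2)
print(math.erfc(0.0), math.lgamma(5)) # ERFC and LGAMMA from the standard library
```

In a test whose statistic follows a chi-square distribution, the upper-tail value computed this way is the P-value that is compared with the significance level.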

1.3 Project Goals

The current project is a part of the CryptoRand research project and aims at developing a more efficient implementation of the NIST Statistical Test Suite and also a battery of original tests developed as part of this project.

The main objective of the CryptoRand project, as stated in [12], is the study and development of high performance systems for generation and testing of random number sequences for cryptographic applications.

1.3.1 Motivation

The belief that a random number generator truly produces random sequences can be supported only by statistical tests designed to test such sequences. Given the importance of random numbers in cryptography and in other fields, such statistical tests can become crucial. Although the NIST Statistical Test Suite is one of the best-known batteries of tests, it is limited in the maximum size of the sequence to be tested and may take a considerable amount of time to complete the chosen tests.

Given these facts, the motivation of the current project is the need for a more efficient implementation of the NIST Statistical Test Suite and the constant need to develop new statistical tests that reveal new properties of random sequences. The more tests a sequence passes, the better justified is the belief that the sequence is random.

The NIST Statistical Test Suite can be optimized not only for the classical sequential architecture, but also for parallel multi-core architectures. The new statistical tests developed in this project also need to be implemented in an efficient manner for both the sequential and parallel architectures.

In order to design new statistical tests, one has to have a deep understanding of the statistical functions used. With this in mind, the current project also proposes a statistical functions calculator, containing the functions most used when developing statistical tests for random and pseudo-random sequences. This application is designed to ease the comprehension of the behavior, domain and co-domain of each of these statistical functions.

1.3.2 Objectives

The main objectives of this project are:
- Obtaining a more efficient sequential implementation of the NIST Statistical Test Suite.
- Developing a parallel implementation of the NIST Statistical Test Suite.
- Analyzing and implementing in an efficient manner a sequential version of the original statistical tests developed in the CryptoRand project.
- Implementing a parallel version of the CryptoRand battery of tests.
- Defining the testing modes of the statistical tests.
- Analyzing statistical functions and choosing a suite of the most used ones in statistical tests.
- Designing a calculator to ease the comprehension of statistical functions.
- Offering the possibility to tabulate statistical functions for a certain given interval.
- Offering the possibility to run the tests on a batch of files or on a single file.
- Displaying the results in a textual manner and in a graphical manner.

1.3.3 Project Description

The current project is a team project and is composed of a number of applications that can be used as stand-alone projects or combined in a certain manner, being closely related to each other. The applications can be grouped in several categories, depending on the criterion of classification. A first classification divides the applications into the ones that are concerned with statistical tests and the one that is concerned with statistical functions. A second classification separates the first group into sequential implementations and parallel implementations. Both the sequential and the parallel implementations are composed of two batteries of tests: NIST and CryptoRand. These batteries come in two forms: grouped in a single project that has a graphical user interface, and as stand-alone applications that can be run independently from the command line.

The project consists of the following applications that were implemented by a team of two people:
- SFC – Statistical Functions Calculator.
- SeqTestRand – incorporates the sequential versions of the tests from the two batteries and provides a graphical user interface.
- SeqNIST – command line application containing the sequential version of the NIST battery.
- SeqHistograms – command line application containing the sequential version of the Histogram tests.
- SeqGenBlockFreqTest – this application was entirely developed by my colleague and will therefore be presented by her.
- ParTestRand – incorporates the parallel versions of the tests from the two batteries and provides a graphical user interface.
- ParNIST – command line application containing the parallel version of the NIST battery (each test was parallelized).
- ParHistograms – this application was entirely developed by my colleague and will therefore be presented by her.
- ParNIST_blocks – command line application containing the sequential version of the tests from the NIST battery and the parallelization of the blocks on which the tests are run.
- ParHistograms_blocks – command line application containing the sequential version of the Histogram tests and the parallelization of the blocks on which the tests are run.
- ParGenBlockFreqTest – this application was entirely developed by my colleague and will therefore be presented by her.

The first step done in the development of this project was to analyze the documentation offered by NIST and find possible optimizations to each of the 14 tests offered. (Actually the documentation describes 16 statistical tests, but 2 of them were disregarded by NIST, because of some problems found.) The tests were divided equally between the members of the team. I also studied another version of one of the NIST tests, proposed by J. S. Coron in two of his articles.

After the optimizations for each of the tests were analyzed, some decisions were taken regarding the maximum size of the sequence to be tested and the testing modes. The tests were implemented first in a version that allowed files with a size greater than 1 GB, by reading 200000 bytes from the input sequence at a time and processing them. Because of the time required to process sequences greater than 1 GB, it was decided that this version has to be integrated in another project that uses the great computational power of a grid.
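Reading the input in fixed-size chunks keeps memory use constant regardless of the file size. The sketch below shows the idea for the simplest possible computation, counting the ones in a file, using the 200000-byte chunk size mentioned above. The function name `count_ones` and the lookup table are assumptions of this example; the project's own implementation is in C and is not reproduced here.

```python
import os
import tempfile

CHUNK_SIZE = 200_000  # bytes read at a time, the figure quoted in the text

# Number of 1 bits in every possible byte value, computed once.
ONES_IN_BYTE = [bin(value).count("1") for value in range(256)]


def count_ones(path, chunk_size=CHUNK_SIZE):
    """Return (number of 1 bits, total number of bits) without loading the whole file."""
    ones = 0
    total_bits = 0
    with open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:          # an empty read means end of file
                break
            ones += sum(ONES_IN_BYTE[byte] for byte in chunk)
            total_bits += 8 * len(chunk)
    return ones, total_bits


# Demonstration on a temporary 300000-byte file, which spans two chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff\x00\x0f" * 100_000)
    demo_path = tmp.name
try:
    ones, total_bits = count_ones(demo_path)
finally:
    os.remove(demo_path)
print(ones, total_bits)  # 1200000 2400000
```

Tests whose statistic depends on patterns that cross a chunk boundary need to carry some state from one chunk to the next, which this counting example does not have to do.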

For the second version of the tests the maximum size of the sequence to be tested was chosen to be 1 GB and the four testing modes that were selected are:

- Mode 1 – running the tests on the whole sequence.
- Mode 2 – running the tests on a given number of blocks of a given equal size.
- Mode 3 – running the tests on a given number of blocks which increase in size with a given step.
- Mode 4 – running the tests on a given number of blocks which decrease in size with a given step.

After the sequential version of the tests was implemented, the next step was to find the statistical tests that could be parallelized. Out of the 14 tests implemented in the sequential version, only 9 were candidates for parallelization, out of which 6 were implemented by me and 3 by my colleague.
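The four testing modes described above can be pictured as different ways of cutting the input sequence into blocks. The following sketch computes the block boundaries for each mode. It assumes that blocks are consecutive and non-overlapping and that sizes are counted in bits; the text does not state these details, so they are assumptions of this example, as are the function and parameter names.

```python
def block_layout(mode, total_bits, count=1, size=0, step=0):
    """Return a list of (start, length) pairs describing the blocks to test."""
    if mode == 1:
        lengths = [total_bits]                              # the whole sequence
    elif mode == 2:
        lengths = [size] * count                            # equal-sized blocks
    elif mode == 3:
        lengths = [size + i * step for i in range(count)]   # growing blocks
    elif mode == 4:
        lengths = [size - i * step for i in range(count)]   # shrinking blocks
    else:
        raise ValueError("mode must be 1, 2, 3 or 4")
    if any(length <= 0 for length in lengths) or sum(lengths) > total_bits:
        raise ValueError("the requested blocks do not fit in the sequence")
    layout = []
    start = 0
    for length in lengths:
        layout.append((start, length))
        start += length   # the next block begins where this one ends
    return layout


print(block_layout(1, 100))
print(block_layout(2, 100, count=3, size=10))
print(block_layout(3, 100, count=3, size=10, step=5))
print(block_layout(4, 100, count=3, size=20, step=5))
```

Each (start, length) pair can then be handed to any sequential test, which is what makes modes 2, 3 and 4 natural candidates for running blocks independently.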

By this phase of the project, two tests were developed for the CryptoRand battery: the Histogram Test and the General Block Frequency Test. The Histogram Test is actually composed of 8 sub-tests, because of the bit-length used (from 1 bit to 8 bits). The sequential version of these tests was implemented and they were analyzed further in order to see if they could be parallelized. Only 2 of the Histogram Tests and the General Block Frequency Test were parallelized (by my colleague).

For the NIST tests and the Histogram tests, two more parallel applications were developed in order to parallelize the testing modes 2, 3 and 4, by using the sequential versions of the tests. The one for the NIST battery was developed by me and the one for the Histogram tests was developed by my colleague.
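The structure of this block-level parallelization can be sketched as follows: each block is processed by an unchanged sequential test, and the blocks are distributed over a pool of workers. The project itself uses OpenMP directives in C; the Python thread pool below is a stand-in chosen only to show the structure, and in CPython it will not give the speed-up that OpenMP threads give for CPU-bound work. The names `frequency_p_value` and `run_on_blocks` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def frequency_p_value(bits):
    """Sequential test applied to one block (frequency test, used as a stand-in)."""
    n = len(bits)
    s_obs = abs(2 * sum(bits) - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def run_on_blocks(bits, layout, test=frequency_p_value, max_workers=4):
    """Apply the same sequential test to every (start, length) block in parallel."""
    blocks = [bits[start:start + length] for start, length in layout]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the blocks.
        return list(pool.map(test, blocks))


bits = [1, 0] * 50 + [1] * 100        # a balanced block followed by an all-ones block
layout = [(0, 100), (100, 100)]
p_values = run_on_blocks(bits, layout)
print(p_values)
```

Because the blocks do not share any state, no synchronization is needed beyond collecting the results, which is the property that makes testing modes 2, 3 and 4 straightforward to parallelize.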

The sequential and the two parallel versions of these tests were incorporated in two separate applications that have a graphical user interface. For these two applications there are two ways to run the tests: on a single file that contains the input sequence and on multiple files in a batch mode. There are also two possible ways to represent the result: in a textual manner or in a graphical manner. My part in the implementation of these two applications was to design the graphical user interface and to implement the graphical representation mode, whereas my colleague handled the incorporation of the applications developed so far.

The applications were tested in order to check their results and to make a comparison between the NIST implementation and our implementation and between the sequential and parallel versions in terms of execution time and maximum file size accepted. The experiments were performed on a computer with two quad core processors and on files of various sizes.

In the process of analyzing the statistical tests, the importance of statistical functions was underlined. Therefore, a new application was developed that calculates the values of some statistical functions, which are most used in statistical tests. The application also offers the possibility to tabulate the values of the functions for some given intervals. The functions were divided in 8 categories, out of which four were analyzed and implemented by me.

In the end, the project can be discussed by talking about three applications: the sequential version of the statistical tests for random number sequences (SeqTestRand), the parallel version of the statistical tests for random number sequences (ParTestRand) and the statistical functions calculator (SFC).

2 Bibliographical Review

The bibliographic study was done at the Technical University of Cluj-Napoca, the Computer Science section. The study was mainly concerned with randomness, statistical tests for random and pseudo-random number generators, statistical functions and parallelization techniques.

In order to understand the concepts of randomness, random and pseudo-random number generators and statistical tests, some studies were done, starting with the basic information provided on Wikipedia ([1], [2], [3] and [5]) and continuing with a series of books. The NIST/SEMATECH e-Handbook of Statistical Methods [4] describes in detail what a statistical test is and how it should be used, while the steps of a statistical test and many other details are also described in the book 100 Statistical Tests, by Gopal K. Kanji [6]. When talking about random numbers we cannot skip the study of Knuth’s book The Art of Computer Programming - Seminumerical Algorithms.

The implementation of the NIST statistical test suite for random and pseudo-random number generators is based on the documentation offered by NIST on the institute’s web page [8]. This paper does not only contain the description of all the tests, including the algorithm used and the technical descriptions of each test, but it also gives a brief introduction to randomness, random number generators and random sequences. The algorithms described in the documentation were modified in order to become more efficient, but the purpose of the tests remained the same. For the 9th test of the NIST statistical test suite, namely Maurer’s “Universal Statistical” Test, two articles written by J. S. Coron were studied ([15] and [17]) in order to implement another version of the test using Coron’s improved formulas.

The second battery of tests implemented, CryptoRand, is based on the article written by professor Ioan Rasa [18]. Some other statistical test suites were also studied briefly in order to better understand the concept of a statistical test and to get an overview of the existing batteries. These suites were TestU01, Diehard and ENT; for each of them code and documentation are provided on their corresponding web pages [9], [10], [11].

The statistical functions implemented were studied from different web sites that provide descriptions of various mathematical and statistical functions. Some of these web sites are Wikipedia, the Math World – Wolfram Web Resource web site and Alglib.NET, where the last one also provided some implementations in .NET. Mathematical formulas of the statistical functions were also provided in the Handbook of Mathematical Functions, by Abramowitz and Stegun [20].

The tests were parallelized with the help of OpenMP directives for C. For this purpose the OpenMP tutorial from [37] was studied, together with some other OpenMP articles like the one written by Kang Su Gatlin and Pete Isensee, about OpenMP and C/C++.


3 Theoretical Foundations

3.1 Statistical Tests for Random and Pseudo-random Number Generators

As Knuth said in [13], “if we were to give some man a pencil and paper and ask him to write down 100 random decimal digits, chances are very slim that he will give a satisfactory result”, because “people tend to avoid things that seem non-random”. That’s why the most analyzed problem in the domain of random numbers is: when can we say that a number is random and how can this fact be determined?

Over the years, the theory of statistics began to provide some measures of randomness for sequences produced by various generators, with the development of various statistical tests. Each of these tests searches the input sequence for one or more properties of a truly random sequence. Because the number of statistical tests is very large, they were grouped into batteries of tests, out of which two will be presented next. The first one is NIST, which is one of the best-known statistical test suites, and the second one is a newly developed battery, being part of the CryptoRand project.

The statistical tests presented here measure quantitative properties of the tested sequences. This is the most used method, but there are other methods to detect non-randomness in a sequence, like the one mentioned in [14]: looking at random bits plotted in a plane. The pictures obtained have revealed “spatial correlations not clearly detected in the quantitative tests” [14]. Such tests cannot be used as stand-alone batteries, but rather as a complement to the quantitative ones.

3.1.1 NIST Statistical Test Suite

The NIST Statistical Test Suite is a battery of tests for random and pseudo-random number generators that may be used for many purposes including cryptographic, modeling and simulation applications. It contains a total of 16 tests, out of which NIST disregarded 2 because of some problems that were found. The two tests that are taken out of the battery are the Discrete Fourier Transform (Spectral) Test (test 6) and the Lempel-Ziv Compression Test (test 10). The 14 tests are designed for generators that produce a sequence of bits, being therefore focused on cryptographic applications. Their algorithms are described in the official NIST documentation [8].

The significance level used by these tests is 0.01 and the null hypothesis is represented by the belief that the sequence is random. Therefore if the P-value computed for each test is less than 0.01 the null hypothesis is rejected, otherwise it is accepted.

Even if the order in which the tests are applied is arbitrary, it is recommended that the first test, the Frequency (Monobit) Test, be applied first because it looks for the most basic evidence that the sequence is not random, namely non-uniformity. If this test fails, the likelihood of the other tests failing as well is high.

The reference distributions mainly used by these tests are the Standard Normal distribution and the Chi-square (χ2) distribution [8]. The first one (also called the bell-shaped curve) is used to compare the test statistic obtained for the random number generator with the expected value of the statistic under the assumption of randomness. The test statistic for the standard normal distribution has the form $z = (x - \mu) / \sigma$, where x is the sample test statistic value, μ is the expected value and σ2 is the variance of the test statistic. The Chi-square distribution (χ2) is used to compare the goodness-of-fit of the observed frequencies of a sample measure to the corresponding expected frequencies of the hypothesized distribution, and has the form $\chi^2 = \sum_i (o_i - e_i)^2 / e_i$, where oi are the observed frequencies of occurrence of the measure and ei are the expected frequencies [8].

In the next sub-sections I will present the algorithms and the theoretical aspects for each of the tests implemented by me.

3.1.1.1 NIST Test 1 – Frequency (Monobit) Test

1. Purpose

The purpose of this test, as stated in [8], is to determine whether the number of ones and zeroes in a sequence are approximately the same as would be expected for a truly random sequence. The appearance of a zero or a one in the global input sequence should be equally likely, so the defect detected by this test is that the sequence contains too many zeroes or ones.

2. Used notations

n = the length of the input sequence in bits.

ε = the sequence of bits being tested, ε = ε1ε2…εn.

sobs = the absolute value of the sum of Xi in the sequence, divided by the square root of the length of the sequence.

Sn = the difference between the number of ones and the number of zeroes.

3. Test statistic and reference distribution

The test statistic is sobs and the reference distribution is a half-normal distribution. For a random sequence the test statistic should tend to zero, because only this way the number of zeroes and the number of ones can be approximately the same.

4. Input size recommendations

n ≥ 100 bits

5. Test description

For this test, the null hypothesis is that in a sequence of independent identically distributed Bernoulli random variables, the probability of ones is 1/2. The test makes use of the following approximation in order to assess the closeness of the number of ones to half of the sequence’s length: the distribution of the binomial sum, normalized by $\sqrt{n}$, is closely approximated by a standard normal distribution. The algorithm is derived from the Central Limit Theorem for the random walk and is described in the following steps.

Step 1

Convert the bits in the sequence to ±1 using the formula $X_i = 2\varepsilon_i - 1$. Compute the following sum, representing the difference between the number of ones and the number of zeroes in the sequence: $S_n = X_1 + X_2 + \dots + X_n$.

Step 2

Compute the test statistic: $s_{obs} = |S_n| / \sqrt{n}$.

Step 3

Compute $P\text{-value} = \mathrm{erfc}(s_{obs} / \sqrt{2})$, where erfc is the complementary error function and has the form $\mathrm{erfc}(z) = \frac{2}{\sqrt{\pi}} \int_z^{\infty} e^{-u^2} \, du$.

Step 4

If (P-value < 0.01) then the sequence is non-random, else the sequence is random.

6. Example trace

In order to better understand the algorithm described above, an example will be presented next.

Input: ε = 1011010101, n = 10.

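Step 1

$S_n = 1 + (-1) + 1 + 1 + (-1) + 1 + (-1) + 1 + (-1) + 1 = 2$

Step 2

$s_{obs} = |2| / \sqrt{10} = 0.632456$

Step 3

$P\text{-value} = \mathrm{erfc}(0.632456 / \sqrt{2}) = 0.527089$

Step 4

$0.527089 \geq 0.01$, therefore the sequence is random.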
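The steps of this test can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the project; the function name `monobit_p_value` and the choice of a string of '0'/'1' characters as input are assumptions made for the example.

```python
import math


def monobit_p_value(bits: str) -> float:
    """Frequency (Monobit) test P-value for a string of '0'/'1' characters."""
    n = len(bits)
    # Step 1: map each bit to +1 / -1 and sum (number of ones minus zeroes).
    s_n = sum(1 if b == "1" else -1 for b in bits)
    # Step 2: normalise by the square root of the sequence length.
    s_obs = abs(s_n) / math.sqrt(n)
    # Step 3: P-value from the complementary error function.
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    p = monobit_p_value("1011010101")
    # Step 4: compare against the 0.01 significance level.
    print(f"P-value = {p:.6f}, random = {p >= 0.01}")
```

For the 10-bit input of the example trace the sketch gives a P-value of about 0.527, well above the 0.01 significance level.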

7. Conclusions

The sequence would have been rejected by the test if $|S_n|$ or $s_{obs}$ were large, causing the P-value to decrease below 0.01. Large positive values of Sn indicate that the sequence contains too many ones, and large negative values of Sn indicate too many zeroes in the sequence.

3.1.1.2 NIST Test 2 – Frequency Test within a Block

1. Purpose

The purpose of this test, as stated in [8], is to determine whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption of randomness. If M = 1 the test becomes the Frequency (Monobit) Test. The appearance of a zero or a one in one of the blocks should be equally likely, so the defect detected by this test is that at least one of the blocks contains too many zeroes or ones.

2. Used notations

n = the length of the input sequence in bits.

M = the length of each block in bits.

ε = the sequence of bits being tested, ε = ε1ε2…εn.

N = the number of non-overlapping blocks.

πi = the proportion of ones in block i.

χ2(obs) = a measure of how well the observed proportion of ones within a given M-bit block matches the expected proportion ½.

3. Test statistic and reference distribution

The test statistic is χ2(obs) and the reference distribution is a Chi-square (χ2) distribution.

4. Input size recommendations

n ≥ 100 bits

n ≥ M × N

M ≥ 20 bits

M ≥ 0.01 × n

N < 100

5. Test description

The test decomposes the input sequence into a number of non-overlapping subsequences, trying to detect localized deviations from the ideal 1/2 proportion of ones. It actually applies the chi-square test for a homogeneous match of empirical frequencies to the ideal 1/2. For each disjoint block sequence the proportion of ones is computed and a Chi-square statistic, with N (number of blocks) degrees of freedom, compares these proportions to the ideal one. The following steps describe how the algorithm works.

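Step 1

Partition the input sequence into $N = \lfloor n / M \rfloor$ non-overlapping blocks. Discard any unused bits.

Step 2

Determine the proportion πi of ones in each M-bit block: $\pi_i = \frac{1}{M} \sum_{j=1}^{M} \varepsilon_{(i-1)M + j}$, for each i in the interval [1, N].

Step 3

Compute the χ2(obs) statistic: $\chi^2(obs) = 4M \sum_{i=1}^{N} (\pi_i - 1/2)^2$.

Step 4

Compute $P\text{-value} = \mathrm{igamc}(N/2, \chi^2(obs)/2)$, where igamc is the complementary (regularized) incomplete gamma function and has the following form:

$\mathrm{igamc}(a, x) = \frac{1}{\Gamma(a)} \int_x^{\infty} e^{-t} t^{a-1} \, dt$.

Step 5

If (P-value < 0.01) then the sequence is non-random, else the sequence is random.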

6. Example trace


For a better understanding of the test’s algorithm an example will be traced next.

Input: ε = 0110011010, n = 10, M = 3.

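Step 1

N = 3

The blocks are: 011, 001 and 101. The final 0 bit is discarded.

Step 2

$\pi_1 = 2/3$, $\pi_2 = 1/3$, $\pi_3 = 2/3$

Step 3

$\chi^2(obs) = 4 \cdot 3 \cdot [(2/3 - 1/2)^2 + (1/3 - 1/2)^2 + (2/3 - 1/2)^2] = 1$

Step 4

$P\text{-value} = \mathrm{igamc}(3/2, 1/2) = 0.801252$

Step 5

$0.801252 \geq 0.01$, therefore the sequence is random.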
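A minimal Python sketch of this test is given below, assuming the input is a string of '0'/'1' characters. The names `igamc_half` and `block_frequency_p_value` are hypothetical. Because the first argument of igamc is always N/2 here, the helper evaluates the function through the standard recurrence Q(a + 1, x) = Q(a, x) + x^a e^(-x) / Γ(a + 1), starting from Q(1/2, x) = erfc(√x) or Q(1, x) = e^(-x). This is a shortcut chosen for the example (suitable for moderate N only), not necessarily how the project computes it.

```python
import math


def igamc_half(k: int, x: float) -> float:
    """Regularized upper incomplete gamma Q(k/2, x) for a positive integer k."""
    if k % 2 == 0:
        a, q = 1.0, math.exp(-x)               # Q(1, x) = exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))    # Q(1/2, x) = erfc(sqrt(x))
    while a < k / 2:
        # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
        q += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1.0
    return q


def block_frequency_p_value(bits: str, block_len: int) -> float:
    """Frequency Test within a Block for a string of '0'/'1' characters."""
    n_blocks = len(bits) // block_len          # unused trailing bits are discarded
    chi2 = 0.0
    for i in range(n_blocks):
        block = bits[i * block_len:(i + 1) * block_len]
        pi_i = block.count("1") / block_len    # proportion of ones in block i
        chi2 += (pi_i - 0.5) ** 2
    chi2 *= 4 * block_len
    return igamc_half(n_blocks, chi2 / 2)


if __name__ == "__main__":
    print(f"P-value = {block_frequency_p_value('0110011010', 3):.6f}")
```

For the example input with M = 3 the sketch reproduces a χ2(obs) of 1 and a P-value of about 0.8013.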

7. Conclusions

A small P-value would indicate a large deviation from the equal proportion of ones and zeroes in at least one of the blocks.

3.1.1.3 NIST Test 3 – Runs Test

1. Purpose

The purpose of this test, as stated in [8], is to determine whether the number of runs of ones and zeroes of various lengths is as expected for a random sequence. A run is an uninterrupted sequence of identical bits. A run of length k consists of exactly k identical bits bounded before and after with a bit of the opposite value. Practically, this test determines if the oscillation between the runs of ones and zeroes in the sequence is too slow or too fast.

2. Used notations

n = the length of the input sequence in bits.

ε = the sequence of bits being tested, ε = ε1ε2…εn.

π = the proportion of ones in the input sequence.

Vn(obs) = the total number of runs (i.e., the total number of zero runs + the total number of one runs) across all the n bits.

3. Test statistic and reference distribution

The test statistic is Vn(obs) and the reference distribution is a Chi-square (χ2) distribution.

4. Input size recommendations

n ≥ 100 bits

5. Test description


The test is based on the distribution of the total number of runs and has as a prerequisite the Frequency (Monobit) Test. The following steps describe the algorithm used for this test.

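Step 1

Compute the pre-test proportion π of ones in the input sequence: $\pi = \frac{1}{n} \sum_{j=1}^{n} \varepsilon_j$.

Step 2

Determine if the prerequisite Frequency test is passed by computing $\tau = 2 / \sqrt{n}$ and making the following comparison: $|\pi - 1/2| < \tau$.

If $|\pi - 1/2| \geq \tau$, then the Runs test does not need to be performed and the P-value is set to 0.

The following steps are performed only if the prerequisite test performed in steps 1 and 2 is passed.

Step 3

Compute the test statistic: $V_n(obs) = \sum_{k=1}^{n-1} r(k) + 1$, where r(k) = 0 if εk = εk+1, otherwise r(k) = 1.

Step 4

Compute $P\text{-value} = \mathrm{erfc}\left( \frac{|V_n(obs) - 2n\pi(1-\pi)|}{2\sqrt{2n}\,\pi(1-\pi)} \right)$, where erfc is the complementary error function and has the form $\mathrm{erfc}(z) = \frac{2}{\sqrt{\pi}} \int_z^{\infty} e^{-u^2} \, du$.

Step 5

If (P-value < 0.01) then the sequence is non-random, else the sequence is random.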

6. Example trace

The following example is shown in order to better illustrate the algorithm.

Input: ε = 1001101011, n = 10.

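Step 1

$\pi = 6/10 = 0.6$

Step 2

$\tau = 2 / \sqrt{10} = 0.632456$

$|\pi - 1/2| = 0.1 < \tau$, therefore the test is run.

Step 3

$\sum_{k=1}^{9} r(k) = 1 + 0 + 1 + 0 + 1 + 1 + 1 + 1 + 0 = 6$, so $V_{10}(obs) = 6 + 1 = 7$.

Step 4

$P\text{-value} = \mathrm{erfc}\left( \frac{|7 - 2 \cdot 10 \cdot 0.6 \cdot 0.4|}{2\sqrt{20} \cdot 0.6 \cdot 0.4} \right) = 0.147232$

Step 5

$0.147232 \geq 0.01$, therefore the sequence is random.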
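The Runs test, including its frequency prerequisite, can be sketched as follows. The function name `runs_p_value` is hypothetical and the input is assumed to be a string of '0'/'1' characters; the sketch also assumes a sequence long enough for the prerequisite check to catch degenerate all-zero or all-one inputs (the recommended n ≥ 100 is sufficient).

```python
import math


def runs_p_value(bits: str) -> float:
    """Runs test P-value for a string of '0'/'1' characters."""
    n = len(bits)
    # Step 1: pre-test proportion of ones.
    pi = bits.count("1") / n
    # Step 2: frequency prerequisite; on failure the P-value is set to 0.
    tau = 2 / math.sqrt(n)
    if abs(pi - 0.5) >= tau:
        return 0.0
    # Step 3: one run to start with, plus one for every change of bit value.
    v_obs = 1 + sum(1 for k in range(n - 1) if bits[k] != bits[k + 1])
    # Step 4: P-value from the complementary error function.
    numerator = abs(v_obs - 2 * n * pi * (1 - pi))
    denominator = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(numerator / denominator)


if __name__ == "__main__":
    print(f"P-value = {runs_p_value('1001101011'):.6f}")
```

For the example input the sketch counts 7 runs and returns a P-value of about 0.1472.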

7. Conclusions

A large value for Vn(obs) would indicate an oscillation in the sequence that is too fast, while a small value would indicate that the oscillation is too slow. The conclusion is that a random sequence must have neither too many nor too few changes between zeroes and ones.

3.1.1.4 NIST Test 4 – Test for the Longest Run of Ones in a Block

1. Purpose

The purpose of this test, as stated in [8], is to determine whether the length of the longest run of ones within the tested sequence is consistent with the length of the longest run of ones that would be expected in a random sequence. An irregularity in the expected length of the longest run of ones implies that there is also an irregularity in the expected length of the longest run of zeroes. That’s why only the test for the longest run of ones needs to be computed. The test focuses on the longest run of ones within M-bit blocks and determines if the oscillation between the runs of ones and zeroes in the sequence is too slow or too fast.

2. Used notations

n = the length of the input sequence in bits.

ε = the sequence of bits being tested, ε = ε1ε2…εn.

M = the length of each block in bits.

N = the number of non-overlapping blocks.

χ2(obs) = a measure of how well the observed longest run length within M-bit blocks matches the expected longest length of a run of ones within M-bit blocks.

vi = the frequencies of the longest runs of ones in each block.

πi = theoretical probabilities.

3. Test statistic and reference distribution

The test statistic is χ2(obs) and the reference distribution is a Chi-square (χ2) distribution.

4. Input size recommendations

n ≥ 128 bits, if M = 8

n ≥ 6272 bits, if M = 128

n ≥ 750 000 bits, if M = 10^4

5. Test description


This test computes the longest run of ones ʋj within the jth subsequence and chooses K+1 classes depending on the value of M (the length of the subsequence). For these substrings the frequencies of the longest runs of ones are computed, where the computed values of the longest run of ones in each block belongs to any of the K+1 classes chosen. The following steps illustrate the algorithm of this test.

Step 1

Divide the sequence into M-bit blocks.

Step 2

Tabulate the frequencies vi of the longest runs of ones in each block, where each cell contains the number of runs of 1s of a given length, as shown in Table 3-1.

Table 3-1. The manner in which frequencies of the longest run of ones in each block are tabulated depending on the size of the block chosen

vi    M = 8    M = 128    M = 10^4
v0    ≤ 1      ≤ 4        ≤ 10
v1    2        5          11
v2    3        6          12
v3    ≥ 4      7          13
v4             8          14
v5             ≥ 9        15
v6                        ≥ 16

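Step 3

Compute $\chi^2(obs) = \sum_{i=0}^{K} \frac{(v_i - N\pi_i)^2}{N\pi_i}$, where K is determined in accordance with Table 3-2.

Table 3-2. The manner in which K is chosen depending on the size of the block chosen

M       K
8       3
128     5
10^4    6

The πi values used in the above formula are computed in the following way:

$P(\nu \leq m) = \sum_{r=0}^{M} \binom{M}{r} P(\nu \leq m \mid r) \frac{1}{2^M}$,

where r represents the number of ones in the M-bit block, M − r represents the number of zeroes in the block, ν represents the longest run of ones in the block, m is the run length bounding the class considered and

$P(\nu \leq m \mid r) = \frac{1}{\binom{M}{r}} \sum_{j=0}^{U} (-1)^j \binom{M-r+1}{j} \binom{M-j(m+1)}{M-r}$, with $U = \min(M-r+1, \lfloor r/(m+1) \rfloor)$.

The values obtained for πi are the following ones:

K = 3, M = 8: π0 = 0.2148, π1 = 0.3672, π2 = 0.2305, π3 = 0.1875.

K = 5, M = 128: π0 = 0.1174, π1 = 0.2430, π2 = 0.2493, π3 = 0.1752, π4 = 0.1027, π5 = 0.1124.

K = 6, M = 10^4: π0 = 0.0882, π1 = 0.2092, π2 = 0.2483, π3 = 0.1933, π4 = 0.1208, π5 = 0.0675, π6 = 0.0727.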


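Step 4

Compute $P\text{-value} = \mathrm{igamc}(K/2, \chi^2(obs)/2)$, where igamc is the complementary (regularized) incomplete gamma function and has the following form:

$\mathrm{igamc}(a, x) = \frac{1}{\Gamma(a)} \int_x^{\infty} e^{-t} t^{a-1} \, dt$.

Step 5

If (P-value < 0.01) then the sequence is non-random, else the sequence is random.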

6. Example trace

The following example has the purpose of underlining the algorithm of the current test.

Input: ε = 11001100 00010101 01101100 01001100 11100000 00000010 01001101 01010001 00010011 11010110 10000000 11010111 11001100 11100110 11011000 10110010, n = 128, M = 8.

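Step 1

Table 3-3. The sequence divided in blocks of 8 bits and the longest run of ones for each block

Block no.    Block bits    Max-run
1            11001100      2
2            00010101      1
3            01101100      2
4            01001100      2
5            11100000      3
6            00000010      1
7            01001101      2
8            01010001      1
9            00010011      2
10           11010110      2
11           10000000      1
12           11010111      3
13           11001100      2
14           11100110      3
15           11011000      2
16           10110010      2

Step 2

v0 = 4, v1 = 9, v2 = 3, v3 = 0.

Step 3

K = 3, N = 16

π0 = 0.2148, π1 = 0.3672, π2 = 0.2305, π3 = 0.1875

$\chi^2(obs) = \frac{(4 - 16 \cdot 0.2148)^2}{16 \cdot 0.2148} + \frac{(9 - 16 \cdot 0.3672)^2}{16 \cdot 0.3672} + \frac{(3 - 16 \cdot 0.2305)^2}{16 \cdot 0.2305} + \frac{(0 - 16 \cdot 0.1875)^2}{16 \cdot 0.1875} \approx 4.8826$

Step 4

$P\text{-value} = \mathrm{igamc}(3/2, 4.8826/2) \approx 0.1806$

Step 5

$0.1806 \geq 0.01$, therefore the sequence is random.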
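The example above can be reproduced with the following Python sketch, restricted to the M = 8 case (K = 3). The names `PI_M8`, `EXAMPLE_BITS`, `longest_run_of_ones` and `longest_run_test_m8` are hypothetical. For K = 3 the function igamc(3/2, x) has the closed form erfc(√x) + 2√(x/π)·e^(-x), which the sketch uses instead of a general incomplete gamma routine; other block sizes would need a general implementation.

```python
import math

# Theoretical class probabilities for M = 8 (classes: <=1, 2, 3, >=4).
PI_M8 = [0.2148, 0.3672, 0.2305, 0.1875]

EXAMPLE_BITS = "".join(
    "11001100 00010101 01101100 01001100 11100000 00000010 01001101 01010001 "
    "00010011 11010110 10000000 11010111 11001100 11100110 11011000 10110010".split()
)


def longest_run_of_ones(block: str) -> int:
    """Length of the longest uninterrupted run of '1' characters in the block."""
    return max(len(run) for run in block.split("0"))


def longest_run_test_m8(bits: str):
    """Return (v, chi2, p_value) for the longest-run test with 8-bit blocks."""
    n_blocks = len(bits) // 8
    v = [0, 0, 0, 0]
    for j in range(n_blocks):
        run = longest_run_of_ones(bits[8 * j:8 * j + 8])
        v[min(max(run, 1), 4) - 1] += 1        # clamp the run length into 1..4
    chi2 = sum((v[i] - n_blocks * PI_M8[i]) ** 2 / (n_blocks * PI_M8[i])
               for i in range(4))
    x = chi2 / 2
    # Closed form of igamc(3/2, x), valid only for K = 3 degrees of freedom.
    p_value = math.erfc(math.sqrt(x)) + 2 * math.sqrt(x / math.pi) * math.exp(-x)
    return v, chi2, p_value


if __name__ == "__main__":
    v, chi2, p = longest_run_test_m8(EXAMPLE_BITS)
    print(v, round(chi2, 4), round(p, 4))
```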

7. Conclusions

Large values of χ2(obs) indicate that the sequence has clusters of ones.

3.1.1.5 NIST Test 7 – Non-overlapping Template Matching Test

1. Purpose


The purpose of this test, as stated in [8], is to detect generators that produce too many occurrences of a given non-periodic pattern. An m-bit window is used to search for a specific m-bit pattern. If the pattern is found, the window slides to the bit after the found pattern, else the window slides one bit position. The property that this test searches for in the input sequence is periodic dependence, and the defect found is irregular occurrences of a pre-specified template.

2. Used notations

n = the length of the input sequence in bits.

ε = the sequence of bits being tested, ε = ε1ε2…εn.

m = the length of the pattern in bits.

B = the m-bit template to be matched.

M = the length of the substring of ε to be tested in bits.

N = number of independent blocks.

χ2(obs) = a measure of how well the observed number of template “hits” matches the expected number of template “hits”.

Wj (j = 1, … , N) = the number of times that B occurs within the block j.

μ = theoretical mean.

σ2 = the variance.

3. Test statistic and reference distribution

The test statistic is χ2(obs) and the reference distribution is a Chi-square (χ2) distribution.

4. Input size recommendations

m = 2, 3, … , or 10; it is recommended that m = 9 or m = 10.

N ≤ 100

M > 0.01 × n

5. Test description

The test rejects input sequences that have too many or too few occurrences of a given aperiodic pattern. An aperiodic pattern B cannot have the form CC…CCʹ, where Cʹ is a prefix of C. The following steps describe the algorithm used by this test.

Step 1 Divide the sequence into N non-overlapping blocks of length M.

Step 2

Compute the number of times that the template occurs within the block j (Wj), by creating an m-bit window on the sequence and comparing the bits within the window against the template.

If there is no match, the window slides over one bit. If there is a match, the window slides over m bits.

Step 3

Compute $\mu = (M - m + 1) / 2^m$ and $\sigma^2 = M \left( \frac{1}{2^m} - \frac{2m - 1}{2^{2m}} \right)$.

Step 4

Compute $\chi^2(obs) = \sum_{j=1}^{N} \frac{(W_j - \mu)^2}{\sigma^2}$.

Step 5

Compute $P\text{-value} = \mathrm{igamc}(N/2, \chi^2(obs)/2)$, where igamc is the complementary (regularized) incomplete gamma function and has the following form:

$\mathrm{igamc}(a, x) = \frac{1}{\Gamma(a)} \int_x^{\infty} e^{-t} t^{a-1} \, dt$.

Step 6

If (P-value < 0.01) then the sequence is non-random, else the sequence is random.

6. Example trace

In order to better understand the algorithm used by this test an example trace is provided.

Input: ε = 10100100101110010110, n = 20, M = 10, m = 3, B = 001.

Step 1

N = 2

The blocks are: 1010010010 and 1110010110.

Step 2

Table 3-4. The number of appearances of the template in each block (Wj)

Bit positions    Block 1 bits    W1                Block 2 bits    W2
1-3              101             0                 111             0
2-4              010             0                 110             0
3-5              100             0                 100             0
4-6              001 (hit)       Increment to 1    001 (hit)       Increment to 1
5-7              Not examined    1                 Not examined    1
6-8              Not examined    1                 Not examined    1
7-9              001 (hit)       Increment to 2    011             1
8-10             Not examined    2                 110             1

W1 = 2, W2 = 1.

Step 3

$\mu = (10 - 3 + 1) / 2^3 = 1$, $\sigma^2 = 10 \cdot \left( \frac{1}{8} - \frac{5}{64} \right) = 0.46875$

Step 4

$\chi^2(obs) = \frac{(2 - 1)^2 + (1 - 1)^2}{0.46875} = 2.133333$

Step 5

$P\text{-value} = \mathrm{igamc}(2/2, 2.133333/2) = 0.344154$

Step 6

$0.344154 \geq 0.01$, therefore the sequence is random.
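The window-sliding rule and the statistic of this test can be sketched in Python as below. All function names are hypothetical, and the mean and variance follow the formulas given in the NIST documentation [8]. To stay short, the P-value helper assumes an even number of blocks N, for which igamc(N/2, x) reduces to the finite sum e^(-x)·Σ x^k/k! for k from 0 to N/2 − 1; an odd N would require a general incomplete gamma routine.

```python
import math


def count_non_overlapping(block: str, template: str) -> int:
    """Count template hits; after a hit the window jumps past the match."""
    m, hits, i = len(template), 0, 0
    while i <= len(block) - m:
        if block[i:i + m] == template:
            hits += 1
            i += m          # match: slide the window over m bits
        else:
            i += 1          # no match: slide the window over one bit
    return hits


def non_overlapping_statistic(bits: str, template: str, block_len: int):
    """Return (W, chi2) for the non-overlapping template matching test."""
    m = len(template)
    n_blocks = len(bits) // block_len
    w = [count_non_overlapping(bits[j * block_len:(j + 1) * block_len], template)
         for j in range(n_blocks)]
    mu = (block_len - m + 1) / 2 ** m
    var = block_len * (1 / 2 ** m - (2 * m - 1) / 2 ** (2 * m))
    chi2 = sum((w_j - mu) ** 2 for w_j in w) / var
    return w, chi2


def p_value_even_blocks(chi2: float, n_blocks: int) -> float:
    """igamc(N/2, chi2/2) for even N, using the closed form for integer a."""
    assert n_blocks % 2 == 0, "this sketch only handles an even number of blocks"
    x, term, total = chi2 / 2, 1.0, 1.0
    for k in range(1, n_blocks // 2):
        term *= x / k
        total += term
    return math.exp(-x) * total


if __name__ == "__main__":
    w, chi2 = non_overlapping_statistic("10100100101110010110", "001", 10)
    print(w, round(chi2, 6), round(p_value_even_blocks(chi2, len(w)), 6))
```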

7. Conclusions

If the P-value is very small, then the sequence has irregular occurrences of the templates provided as input and therefore is not random.

3.1.1.6 NIST Test 8 – Overlapping Template Matching Test

1. Purpose

The focus of this test, as stated in [8], is the number of occurrences of pre-specified target strings. Like the 7th test, it uses an m-bit window to search for a specific m-bit pattern. Whether the pattern is found or not, the window slides only one bit position. The property that this test searches for in the input sequence is periodic dependence, and the defect found is irregular occurrences of a pre-specified template.

2. Used notations

n = the length of the input sequence in bits.

ε = the sequence of bits being tested, ε = ε1ε2…εn.

m = the length of the pattern in bits (the pattern is a run of ones).

B = the m-bit template to be matched (it is a run of ones).

K = the number of degrees of freedom.

M = the length of the substring of ε to be tested in bits.

N = the number of independent blocks.

χ2(obs) = a measure of how well the observed number of template “hits” matches the expected number of template “hits”.

πi = theoretical probabilities.

vi = the number of times that the template appeared i times in a block.

3. Test statistic and reference distribution

The test statistic is χ2(obs) and the reference distribution is a Chi-square (χ2) distribution.

4. Input size recommendations

n ≥ 10^6 bits

Various values of m may be used, but NIST recommends m = 9 or m = 10.

n ≥ M × N

N × (min πi) > 5

m ≈ log2 M

K ≈ 2λ

πi values need to be recalculated for values of K other than 5.

5. Test description

The test rejects input sequences which contain too many or too few occurrences of m-bit runs of ones, but can be modified to detect irregular occurrences of any periodic pattern. The following steps describe the algorithm used by this test.

Step 1

Divide the sequence into N non-overlapping blocks of length M.

Step 2

Compute the number of occurrences of B in each block:

a. Create an m-bit window on the sequence.

b. Compare the bits within that window against B.

c. Increment a counter when there is a match.

d. Slide the window over one bit and go to step b.

Record the number of occurrences of B in each block by incrementing an array vi, where i = 0, … , 5, such that: v0 is incremented when there are no occurrences of B in the block, v1 is incremented when there is one occurrence of B in the block, and so on, and lastly v5 is incremented when there are 5 or more occurrences of B in the block.

Step 3

Compute $\lambda = (M - m + 1) / 2^m$ and $\eta = \lambda / 2$, used to compute the theoretical probabilities πi corresponding to the classes of vi.

Step 4

Compute $\chi^2(obs) = \sum_{i=0}^{5} \frac{(v_i - N\pi_i)^2}{N\pi_i}$, where the probabilities π0, π1, π2, π3, π4 and π5 are calculated with the following formula:

$\pi_0 = e^{-\eta}$, $\pi_u = \frac{e^{-\eta}}{2^u} \sum_{l=1}^{u} \binom{u-1}{l-1} \frac{\eta^l}{l!}$ for u = 1, … , 4, and $\pi_5 = 1 - \sum_{i=0}^{4} \pi_i$.

Step 5

Compute $P\text{-value} = \mathrm{igamc}(5/2, \chi^2(obs)/2)$, where igamc is the complementary (regularized) incomplete gamma function and has the following form:

$\mathrm{igamc}(a, x) = \frac{1}{\Gamma(a)} \int_x^{\infty} e^{-t} t^{a-1} \, dt$.

Step 6

If (P-value < 0.01) then the sequence is non-random, else the sequence is random.

6. Example trace

The following example will underline the description of this test’s algorithm.

Input: ε = 10111011110010110100011100101110111110000101101001, n = 50, m = 2, B = 11, K = 2, M = 10.

Step 1

N = 5

The blocks are: 1011101111, 0010110100, 0111001011, 1011111000 and 0101101001.

Step 2

The examination of the first block is illustrated in Table 3-5.

Table 3-5. Illustration of the examination of the first block

Bit positions    Bits        No. of occurrences of B = 11
1-2              10          0
2-3              01          0
3-4              11 (hit)    Increment to 1
4-5              11 (hit)    Increment to 2
5-6              10          2
6-7              01          2
7-8              11 (hit)    Increment to 3
8-9              11 (hit)    Increment to 4
9-10             11 (hit)    Increment to 5

After examining the first block the frequencies are: v0 = 0, v1 = 0, v2 = 0, v3 = 0, v4 = 0, v5 = 1.

The same process is applied to blocks 2, 3, 4 and 5. In the end the frequencies obtained are: v0 = 0, v1 = 2, v2 = 0, v3 = 1, v4 = 1, v5 = 1.

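Step 3

$\lambda = (10 - 2 + 1) / 2^2 = 2.25$, $\eta = \lambda / 2 = 1.125$

Step 4

For this example the values of the theoretical probabilities had to be recomputed, because the example doesn’t fit the test’s input requirements. The recomputed probabilities are: π0 = 0.324652, π1 = 0.182617, π2 = 0.142670, π3 = 0.106645, π4 = 0.077147, π5 = 0.166269.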

Step 5

.

Step 6, therefore the sequence is random.
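The counting part of this test (steps 1 to 3) can be sketched as follows. The names `overlapping_class_counts` and `lambda_eta` are hypothetical, and the expression for λ is the one given in the NIST documentation [8]. The sketch deliberately stops before the probabilities πi and the P-value.

```python
def overlapping_class_counts(bits: str, template: str, block_len: int,
                             top_class: int = 5):
    """Return v, where v[i] counts the blocks containing the template i times
    (the last class collects `top_class` or more occurrences)."""
    m = len(template)
    n_blocks = len(bits) // block_len
    v = [0] * (top_class + 1)
    for j in range(n_blocks):
        block = bits[j * block_len:(j + 1) * block_len]
        # The window always slides by one bit, so matches may overlap.
        hits = sum(1 for i in range(block_len - m + 1)
                   if block[i:i + m] == template)
        v[min(hits, top_class)] += 1
    return v


def lambda_eta(block_len: int, m: int):
    """lambda = (M - m + 1) / 2^m and eta = lambda / 2."""
    lam = (block_len - m + 1) / 2 ** m
    return lam, lam / 2


if __name__ == "__main__":
    bits = "".join(
        "1011101111 0010110100 0111001011 1011111000 0101101001".split()
    )
    print(overlapping_class_counts(bits, "11", 10))
    print(lambda_eta(10, 2))
```

Run on the five blocks of the example, the sketch yields the same class frequencies as the trace, namely v0 = 0, v1 = 2, v2 = 0, v3 = 1, v4 = 1, v5 = 1.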

7. Conclusions

For the example provided, if the template had been searched for in the entire sequence, there would have been too many occurrences. Therefore v5 would have been large and the test statistic as well, which would have made the P-value very small and the sequence non-random.

3.1.1.7 NIST Test 9 – Maurer’s “Universal Statistical” Test

1. Purpose

The purpose of this test, as stated in [8], is to detect whether or not the sequence can be significantly compressed without loss of information. A sequence is considered non-random if it is significantly compressible. The test focuses on the number of bits between matching patterns, the measure being related to the length of the compressed sequence.


2. Used notations

n = the length of the input sequence in bits.

ε = the sequence of bits being tested, ε = ε1ε2…εn.

L = the length of each block in bits.

Q = the number of blocks in the initialization sequence.

K = the number of the test blocks, without the initialization sequence blocks.

fn = the logarithmic sum of the number of digits in the distance between L-bit templates divided by the length of the test segment.

Tj = a table holding the block number of the last occurrence of each L-bit block, where j is the decimal representation of the contents of the current block.

sum = cumulative log2 sum of all the differences detected in the K blocks.

σ = the theoretical standard deviation.

c = the heuristic approximation.

3. Test statistic and reference distribution

The test statistic is fn, the sum of the log2 distances between matching L-bit templates, and the reference distribution is a half-normal distribution.

4. Input size recommendations

n ≥ (Q + K) × L

6 ≤ L ≤ 16

$Q = 10 \cdot 2^L$

The values of L, Q and n should be chosen in accordance with Table 3-6.

Table 3-6. The correspondence between the length of the sequence, the length of the block and the number of blocks in the initialization segment

n                  L     Q = 10 · 2^L
≥ 387,840          6     640
≥ 904,960          7     1,280
≥ 2,068,480        8     2,560
≥ 4,654,080        9     5,120
≥ 10,342,400       10    10,240
≥ 22,753,280       11    20,480
≥ 49,643,520       12    40,960
≥ 107,560,960      13    81,920
≥ 231,669,760      14    163,840
≥ 496,435,200      15    327,680
≥ 1,059,061,760    16    655,360

5. Test description

The test measures the actual cryptographic significance of a defect and uses a test statistic that is closely related to the per-bit entropy of the sequence. It is designed for large sequences, but for small values of L, because the initialization segment takes time exponential in L. The test compares the standardized statistic to an acceptable range based on a standard normal Gaussian density, making use of the test statistic’s mean. The following steps describe in detail the algorithm adopted by this test.

Step 1

Partition the input sequence into two segments:

- An initialization segment consisting of Q L-bit non-overlapping blocks.

- A test segment consisting of K L-bit non-overlapping blocks, where $K = \lfloor n / L \rfloor - Q$.

Discard the bits at the end of the sequence that don’t form a complete L-bit block.

Step 2

Create a table for the initialization segment, which will have as indexes the L-bit values of each block, transformed to base 10, and will contain the block number of the last occurrence of each block.

The table will have the following form: Tj = i, where i takes values between 1 and Q, and j is the decimal representation of the contents of the ith L-bit block.

Step 3

For each of the K blocks in the test segment determine the number of blocks since the last occurrence of the same L-bit block (i − Tj).

Replace the corresponding value in table Tj with the current location of the block (Tj = i).

Add the distance between re-occurrences of the same L-bit block to the cumulative log2 sum of all the distances of this kind found in the K blocks:

$sum = \sum_{i=Q+1}^{Q+K} \log_2 (i - T_j)$.

Step 4

Compute the test statistic: $f_n = \frac{1}{K} \sum_{i=Q+1}^{Q+K} \log_2 (i - T_j)$.

Step 5

Compute $P\text{-value} = \mathrm{erfc}\left( \left| \frac{f_n - expectedValue(L)}{\sqrt{2}\,\sigma} \right| \right)$, where erfc is the complementary error function and has the form $\mathrm{erfc}(z) = \frac{2}{\sqrt{\pi}} \int_z^{\infty} e^{-u^2} \, du$.

The values of expectedValue(L) are taken from Table 3-7.


$\sigma = c \cdot \sqrt{\frac{variance(L)}{K}}$, where the variance is taken from Table 3-7 and

$c = 0.7 - \frac{0.8}{L} + \left( 4 + \frac{32}{L} \right) \frac{K^{-3/L}}{15}$.

Table 3-7. The relation between L, the expected value and the variance

L     Expected value    Variance
6     5.2177052         2.954
7     6.1962507         3.125
8     7.1836656         3.238
9     8.1764248         3.311
10    9.1723243         3.356
11    10.170032         3.384
12    11.168765         3.401
13    12.168070         3.410
14    13.167693         3.416
15    14.167488         3.419
16    15.167379         3.421

Step 6

If (P-value < 0.01) then the sequence is non-random, else the sequence is random.

6. Example trace

An example will be presented next for a better understanding of the steps taken in the execution of this test.

Input: ε = 01011010011101010111, n = 20, L = 2.

Step 1

$Q = 2^2 = 4$, K = ⌊20/2⌋ − 4 = 6

The initialization segment is: 01011010

The test segment is: 011101010111.

The L-bit blocks are shown in Table 3-8.

Table 3-8. The L-bit blocks of the initialization and test segments

Block No.    Type                      Contents
1            Initialization Segment    01
2            Initialization Segment    01
3            Initialization Segment    10
4            Initialization Segment    10
5            Test Segment              01
6            Test Segment              11
7            Test Segment              01
8            Test Segment              01
9            Test Segment              01
10           Test Segment              11

Step 2

The Tj table is shown in Table 3-9.

Table 3-9. Table containing the last occurrence of each block in the initialization segment

Possible L-bit value    00 (saved in T0)    01 (saved in T1)    10 (saved in T2)    11 (saved in T3)
Initialization          0                   2                   4                   0

Step 3

The distances and the sum computed are the following:

For block 5: T1 = 5, sum = log2(5-2) = 1.584962501.

For block 6: T3 = 6, sum = 1.584962501 + log2(6-0) = 4.169925002.

For block 7: T1 = 7, sum = 4.169925002 + log2(7-5) = 5.169925002.

For block 8: T1 = 8, sum = 5.169925002 + log2(8-7) = 5.169925002.

For block 9: T1 = 9, sum = 5.169925002 + log2(9-8) = 5.169925002.

For block 10: T3 = 10, sum = 5.169925002 + log2(10-6) = 7.169925002.

The final state of the Tj table is shown in Table 3-10.

Table 3-10. The final state of the table Tj, after all the blocks were examined

Possible L-bit value    00 (saved in T0)    01 (saved in T1)    10 (saved in T2)    11 (saved in T3)
Final state             0                   9                   4                   10

Step 4

$f_n = 7.169925002 / 6 = 1.1949875$

Step 5

Variance(2) = 52.586224 and expectedValue(2) = 1.5374383.

Step 6, therefore the sequence is random.
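Steps 1 to 4 of this test (building the table Tj, accumulating the log2 distances and computing fn) can be sketched in Python as below. The function name `maurer_statistic` is hypothetical, and the sketch stops before the P-value, since that step depends on the tabulated expected value and variance for the chosen L.

```python
import math


def maurer_statistic(bits: str, L: int, Q: int):
    """Return (table, total, f_n) for Maurer's test on a '0'/'1' string.

    table[j] holds the number of the last block whose content has decimal
    value j (0 if that content has not been seen yet).
    """
    n_blocks = len(bits) // L            # incomplete trailing block is discarded
    K = n_blocks - Q
    table = [0] * (2 ** L)

    def block_value(i: int) -> int:
        """Decimal value of the i-th L-bit block (blocks are numbered from 1)."""
        return int(bits[(i - 1) * L:i * L], 2)

    # Step 2: initialization segment, remember the last position of each block.
    for i in range(1, Q + 1):
        table[block_value(i)] = i

    # Step 3: test segment, accumulate log2 of the distance to the last occurrence.
    total = 0.0
    for i in range(Q + 1, Q + K + 1):
        j = block_value(i)
        total += math.log2(i - table[j])
        table[j] = i

    # Step 4: the test statistic is the average log2 distance.
    return table, total, total / K


if __name__ == "__main__":
    table, total, f_n = maurer_statistic("01011010011101010111", L=2, Q=4)
    print(table, round(total, 9), round(f_n, 7))
```

For the example input the sketch ends with the table state 0, 9, 4, 10 and a cumulative sum of about 7.1699, which gives fn of about 1.195.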

7. Conclusions

The sequence is considered non-random when the logarithmic sum of all the distances between re-occurrences of an L-bit block detected in the sequence is significantly different from the theoretical expected value for the chosen L (that is, when the sequence is significantly compressible).


3.1.1.8 NIST Test 9 – Maurer’s “Universal Statistical” Test (Coron’s variant)

Maurer’s “Universal Statistical” Test is based on the computation of a function which is asymptotically related to the source’s entropy. In cryptographic applications this function measures the effective key-size of block ciphers keyed by the source’s output.

In paper [15], J.S. Coron describes a variant of the Maurer’s “Universal Statistical” Test for which the test function is exactly equal to the source’s entropy. This variant of the test would therefore detect the defects in the sequence in a better way.

As stated in [16], this test is based on the stationary ergodic source with finite memory statistical model. This model allows the computation of the source’s entropy, which measures the bits of unpredictability. If this fails in a cryptographic system for example, there could appear serious problems concerning the security of that system, because an attacker could use the reduction in entropy to speed-up exhaustive search on the encryption algorithm.

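In paper [15], the blocks of L-bit size are denoted $b_n(s^N) = [s_{L(n-1)+1}, \ldots, s_{Ln}]$, where $s^N$ represents the sequence, and $s_{L \cdot i}$ is the bit from the position L•i in the sequence. The test function is denoted $f_{T_U}(s^N) = \frac{1}{K}\sum_{n=Q+1}^{Q+K} \log_2 A_n(s^N)$, where $A_n(s^N)$ represents the minimum distance between the nth block and any identical preceding block.

The first step that Coron takes in order to improve Maurer’s test function is to generalize the test parameter to any function of the minimum distance between two identical blocks. He does this by replacing the base-two logarithm of the minimum distance between re-occurrences of the same block with a function g, yielding

$f_{T_U}^{g}(s^N) = \frac{1}{K}\sum_{n=Q+1}^{Q+K} g(A_n(s^N))$.

Coron calculates the mean of the test statistic for a sequence of random binary variables output by a stationary ergodic source S in the following manner:

$E[f_{T_U}^{g}(S^N)] = \sum_{i=1}^{\infty} \Pr[A_n(S^N) = i] \, g(i)$,

where $\Pr[A_n(S^N) = i] = \sum_{b \in \{0,1\}^L} \Pr[b_n = b, b_{n-1} \neq b, \ldots, b_{n-i+1} \neq b, b_{n-i} = b]$ and b is a block of L bits.

If the L-bit blocks are considered to be statistically independent, then the above probability reduces to:

$\Pr[A_n(S^N) = i] = \sum_{b \in \{0,1\}^L} \Pr[b]^2 (1 - \Pr[b])^{i-1}$,

and the mean calculated above becomes:

$E[f_{T_U}^{g}(S^N)] = \sum_{b \in \{0,1\}^L} \Pr[b] \, \gamma_g(\Pr[b])$, where $\gamma_g(x) = x \sum_{i=1}^{\infty} (1 - x)^{i-1} g(i)$.

This shows that the mean value of the generalized test may be interpreted as the expectation of a random variable W=W(X) that has a probability of Pr[b] to hit the value $\gamma_g(\Pr[b])$. If the entropy of the L-bit blocks is expressed in the following way: $K_L = -\sum_{b \in \{0,1\}^L} \Pr[b] \log_2 \Pr[b]$, then the entropy can be viewed as the expectation of a random variable W*=W*(X) that has a probability of Pr[b] to hit the value –log2(Pr[b]).

From the above, only one equation remains to be solved:

$x \sum_{i=1}^{\infty} (1 - x)^{i-1} g(i) = -\log_2(x)$.

By making the substitution $t = 1 - x$ we obtain:

$(1 - t) \sum_{i=1}^{\infty} t^{i-1} g(i) = -\log_2(1 - t) = \frac{1}{\ln 2} \sum_{i=1}^{\infty} \frac{t^i}{i}$,

and g(1) = 0, $g(i+1) - g(i) = \frac{1}{i \ln 2}$ for i ≥ 1.

Therefore the modified version of the function used in Maurer’s “Universal Statistical” Test is

$f_{T_U}^{H}(s^N) = \frac{1}{K}\sum_{n=Q+1}^{Q+K} g(A_n(s^N))$, with $g(i) = \frac{1}{\ln 2}\sum_{k=1}^{i-1} \frac{1}{k}$.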

With this new formula, the mean value of the new test function, for an input sequence generated by an ergodic stationary source, is equal to the entropy of the L-bit blocks of the source. The theoretical standard deviation and the variance must also be recomputed to match the new function.

Maurer introduces a heuristic approximation c(L,K) that is used as a corrective factor, which reduces the standard deviation compared to what it would have been if the An-terms were independent. The new heuristic approximation proposed by Coron in [15] and [17] has the following form:

$c(L, K) = \sqrt{d(L) + \frac{e(L) \times 2^L}{K}}$.

Table 3-11 shows the values for d(L) and e(L) for each possible L and the recomputed values of the variance used for the new function formula ([15]).

Table 3-11. The recomputed values of d(L), e(L) and the variance used for Coron’s formula

L     Variance     d(L)         e(L)
3     2.5769918    0.3313257    0.4381809
4     2.9191004    0.3516506    0.4050170
5     3.1291382    0.3660832    0.3856668
6     3.2547450    0.3758725    0.3743782
7     3.3282150    0.3822459    0.3678269
8     3.3704039    0.3862500    0.3640569
9     3.3942629    0.3886906    0.3619091
10    3.4075860    0.3901408    0.3606982
11    3.4149476    0.3909846    0.3600222
12    3.4189794    0.3914671    0.3596484
13    3.4211711    0.3917390    0.3594433
14    3.4223549    0.3918905    0.3593316
15    3.4229908    0.3919740    0.3592712
16    3.4233308    0.3920198    0.3592384
∞     3.4237147    0.3920729    0.3592016

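To improve efficiency, the coefficients computed by g(i) are approximated for large values of i by using the following formula: $\sum_{k=1}^{n} \frac{1}{k} \approx \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2}$, where $\gamma \approx 0.5772157$ is Euler’s constant.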
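The coefficient g(i) and its large-i approximation can be compared with a short Python sketch. It assumes the usual form of Coron's coefficient, g(i) = (1/ln 2) · Σ_{k=1}^{i-1} 1/k, and the standard asymptotic expansion of the harmonic sum; the function names are hypothetical and do not come from the project's code.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def g_exact(i):
    """g(i) = (1/ln 2) * sum_{k=1}^{i-1} 1/k, so that g(1) = 0."""
    return sum(1.0 / k for k in range(1, i)) / math.log(2)

def g_approx(i):
    """g(i) with the harmonic sum replaced by its asymptotic expansion (i >= 2)."""
    n = i - 1
    harmonic = math.log(n) + EULER_GAMMA + 1.0 / (2 * n) - 1.0 / (12 * n * n)
    return harmonic / math.log(2)

def coron_statistic(distances):
    """Average of g over the minimum distances A_n of the K test blocks."""
    return sum(g_exact(a) for a in distances) / len(distances)

if __name__ == "__main__":
    for i in (5, 24, 1000):
        print(i, g_exact(i), g_approx(i))
```

An implementation would typically use the exact sum for small i (for example from a precomputed table) and switch to the approximation above some threshold; the threshold itself is a design choice not specified here.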

When computing the new function formula, Coron assumed that the L-bit blocks are statistically independent, that is, that the probability of appearance of a block does not depend on the preceding ones. This assumption is valid only if the source is a binary source that emits ones with probability p, and zeroes with probability 1 – p. Therefore, for a source with finite memory the test statistic is not equal to the entropy of L-bit blocks of the source. However, if the statistics of the tested source differ from the statistics of a truly random source, the tested source will be rejected with high probability.

3.1.2 CryptoRand Statistical Test Suite

The CryptoRand Statistical Test Suite contains two original tests that were created as part of the CryptoRand project. The tests contained in this suite are the Histogram Test and the General Block Frequency Test. The Histogram Test is divided into 8 sub-tests, numbered from 1 to 8, where the number represents the number of bits used for the templates in each test. I implemented only half of the Histogram sub-tests (from 5 to 8), so the next sub-section presents a theoretical description only for the Histogram Test. My team colleague will describe the General Block Frequency Test that she implemented.

The significance level recommended for these tests is 0.01, but it can be changed if another value is desired.

3.1.2.1 The Histogram Test – Hl Test

1. Purpose

The purpose of this test, as described in [18], is to determine if the numbers of occurrences of all the patterns of length l, where l belongs to the interval [1, 8], are approximately the same as would be expected under an assumption of randomness. If l = 1, the patterns searched are 0 and 1, so the test reduces to the NIST Frequency (Monobit) Test.

2. Used notations

n = the length of the tested sequence in bits.
l = the length of the templates (l determines the number of the Histogram Test from a total of 8 tests).
A = 2^l – each number i in the interval {0, 1, … , A-1} can be represented on l bits.
N = the number of non-overlapping blocks divided by A.
Ni = the frequency of block i (N0 + N1 + … + NA-1 = N × A).
X = a random variable with A-1 degrees of freedom and a Chi-square (χ2) distribution (the expected value).
ni = the observed value of Ni.
x = the observed value of the test statistic.
Hi = Histogram Test i (bits).

3. Test statistic and reference distribution

The test statistic is x and the reference distribution is a Chi-square (χ2) distribution.

4. Input size recommendations

n ≥ l × A
l > 0

5. Test description

The following steps describe the general algorithm used by the Histogram Test, while the algorithms used for each sub-test are described in the implementation details. The test partitions the input sequence into blocks of length l and computes the frequencies of occurrence of all the blocks. The variable l gives the name of the sub-test, taking values between 1 and 8.


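Step 1

Compute $N = \left\lfloor \frac{n}{l \times A} \right\rfloor$.

Partition the input sequence into A × N blocks of size l. Discard any bits that are left.

Step 2

Compute the frequency ni of each l-bit block in the entire sequence. Under an assumption of randomness the expected value of these frequencies is N.

Step 3

Compute the observed value: $x = \sum_{i=0}^{A-1} \frac{(n_i - N)^2}{N}$.

The expected value under an assumption of randomness is $E(X) = A - 1$.

Step 4

For l ≤ 6, compute $P(X > x) = \mathrm{igamc}\left(\frac{A-1}{2}, \frac{x}{2}\right)$, where igamc is the complementary (regularized) incomplete gamma function and has the following form:

$\mathrm{igamc}(a, x) = \frac{1}{\Gamma(a)} \int_x^{\infty} e^{-t} t^{a-1} \, dt$.

For l > 6, P(X > x) is computed by means of erfc, the complementary error function, which has the form $\mathrm{erfc}(z) = \frac{2}{\sqrt{\pi}} \int_z^{\infty} e^{-u^2} \, du$.

P(X > x) represents the P-value.

Step 5

If (P-value < α), where α is the chosen significance level (0.01 by default), then the sequence is non-random, else the sequence is random.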
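The five steps can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical, the input is assumed to be a string of '0' and '1' characters, and the P-value is computed for every l as igamc((A-1)/2, x/2) through a simple recurrence, so the separate erfc-based formula for l > 6 is not reproduced.

```python
import math

def chi2_sf(x, k):
    """P(X > x) for a Chi-square variable with k degrees of freedom,
    i.e. igamc(k/2, x/2), using Q(a+1, y) = Q(a, y) + y^a e^-y / Gamma(a+1)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y) = e^-y
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    while a < k / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def histogram_test(bits, l, alpha=0.01):
    """Return (x, p_value, is_random) for the Histogram Test on l bits."""
    A = 2 ** l
    N = len(bits) // (l * A)                   # Step 1
    if N == 0:
        raise ValueError("sequence too short: n >= l * A is required")
    counts = [0] * A                           # Step 2: block frequencies n_i
    for b in range(A * N):                     # leftover bits are discarded
        counts[int(bits[b * l:(b + 1) * l], 2)] += 1
    x = sum((n_i - N) ** 2 for n_i in counts) / N   # Step 3
    p_value = chi2_sf(x, A - 1)                # Step 4
    return x, p_value, p_value >= alpha        # Step 5

if __name__ == "__main__":
    sequence = "00 01 00 00 10 00 01 10 00 00 10 00 01 00 10 01".replace(" ", "")
    print(histogram_test(sequence, 2))
    print(histogram_test(sequence, 1))
```

The usage lines run the sketch on the 32-bit sequence used in Example 2 of this sub-section, once with l = 2 and once with l = 1.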

6. Examples

In the following a series of examples taken from [18] will be presented, in order to illustrate the power of the Histogram Test and the properties of its sub-tests.

Example 1

For this example l = 2, therefore A = 4 and n = 8 × N, where N is a fixed value. The possible values of l-bit length are 00, 01, 10 and 11, and the frequencies with which these values appear in the input sequence are denoted n0, n1, n2, and respectively n3.

The Histogram Test states that N0 + N1 + … + NA-1 = A × N, therefore n0 + n1 + n2 + n3 = 4 × N.

The test statistic for the Histogram Test on 2 bits is denoted here x2 and is computed in the following way:

$x_2 = \frac{(n_0 - N)^2 + (n_1 - N)^2 + (n_2 - N)^2 + (n_3 - N)^2}{N}$.

In order to compare this test with the Histogram Test on 1 bit, which is equivalent to the NIST Frequency (Monobit) Test, the test statistic is also computed for this sub-test and is denoted x1. When l = 1 the possible values of l-bit length are only 0 and 1, and the frequencies with which these values appear in the input sequence are equal to 2n0 + n1 + n2 and n1 + n2 + 2n3, respectively. Taking into consideration that for l = 1 the theoretical frequency is 4N, x1 has the following formula:

$x_1 = \frac{(2n_0 + n_1 + n_2 - 4N)^2 + (n_1 + n_2 + 2n_3 - 4N)^2}{4N}$.

Because (n0 – N) + (n1 – N) + (n2 – N) + (n3 – N) = 0, x1 can be reduced to

$x_1 = \frac{(n_0 - n_3)^2}{2N}$.

Taking into consideration the formula obtained for x2 and the equation (n0 – N) + (n1 – N) + (n2 – N) + (n3 – N) = 0, we deduce that $x_1 / x_2$ is maximum when $n_0 - N = N - n_3$, n1 = N and n2 = N. Therefore $x_1 \le x_2$.

Example 2

Input: the input sequence is 00 01 00 00 10 00 01 10 00 00 10 00 01 00 10 01, n = 32, l = 2, A = 4

Step 1

$N = \left\lfloor \frac{32}{2 \times 4} \right\rfloor = 4$.

Step 2

The l-bit templates searched in a non-overlapping manner are 00, 01, 10 and 11. The corresponding frequencies of these templates in the sequence are:
n0 = 8 (template 00)
n1 = 4 (template 01)
n2 = 4 (template 10)
n3 = 0 (template 11)

Step 3

$x_2 = \frac{(8-4)^2 + (4-4)^2 + (4-4)^2 + (0-4)^2}{4} = 8$.

In order to make a comparison with the Histogram Test with l = 1, x1 is also computed:

$x_1 = \frac{(8 - 0)^2}{2 \times 4} = 8$.

For this case x1 = x2.

Step 4

For l = 2, P-value = igamc(3/2, 4) ≈ 0.0460.

For l = 1, P-value = igamc(1/2, 4) ≈ 0.0047.

Step 5

For l = 2, P-value ≥ 0.01, therefore the sequence is random.
For l = 1, P-value < 0.01, therefore the sequence is non-random.

Example 3

The following example is in fact a comparison between the Histogram sub-tests 1, 2, 4 and 8.

The sequence has the length n = 8 × 256 × N.

Case 1: l = 8:
The frequencies of the 256 blocks of 8 bits length are:
00000000 – 2N
00000001 – N
… …
11111110 – N
11111111 – 0
The theoretical frequency is in this case N.

$x_8 = \frac{(2N - N)^2 + (0 - N)^2}{N} = 2N$, with 255 degrees of freedom.

Case 2: l = 4:
The frequencies of the 16 blocks of 4 bits length are:
0000 – 32N + 2N
0001 – 32N
… …
1110 – 32N
1111 – 32N - 2N
The theoretical frequency is in this case 32N.

$x_4 = \frac{(2N)^2 + (2N)^2}{32N} = \frac{N}{4}$, with 15 degrees of freedom.

Case 3: l = 2:
The frequencies of the 4 blocks of 2 bits length are:
00 – 256N + 4N
01 – 256N
10 – 256N
11 – 256N - 4N
The theoretical frequency is in this case 256N.

$x_2 = \frac{(4N)^2 + (4N)^2}{256N} = \frac{N}{8}$, with 3 degrees of freedom.

Case 4: l = 1:
The frequencies of the 2 blocks of 1 bit length are:
0 – 4×256N + 8N
1 – 4×256N – 8N
The theoretical frequency is in this case 4×256N.

$x_1 = \frac{(8N)^2 + (8N)^2}{4 \times 256N} = \frac{N}{8}$, with 1 degree of freedom.

In conclusion, for the given sequence the following comparison can be made:
x8 = 8 × x4 = 16 × x2 = 16 × x1.
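The relation between the four statistics can be checked with exact rational arithmetic. The sketch below assumes the Chi-square form of the test statistic (the sum of squared deviations from the theoretical frequency, divided by that frequency) and sets N = 1, which does not affect the ratios; the helper name is hypothetical.

```python
from fractions import Fraction

def chi_square_statistic(counts, expected):
    """Sum of (observed - expected)^2 / expected over all block values."""
    return sum(Fraction((c - expected) ** 2, expected) for c in counts)

N = 1  # any positive N gives the same ratios between the statistics

# Frequencies from the four cases of the example (l = 8, 4, 2, 1).
x8 = chi_square_statistic([2 * N] + [N] * 254 + [0], N)
x4 = chi_square_statistic([34 * N] + [32 * N] * 14 + [30 * N], 32 * N)
x2 = chi_square_statistic([260 * N] + [256 * N] * 2 + [252 * N], 256 * N)
x1 = chi_square_statistic([1032 * N, 1016 * N], 1024 * N)

print(x8, x4, x2, x1)  # prints: 2 1/4 1/8 1/8
```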

7. Conclusions


As shown in the second example, even though the test statistic was equal for the Histogram 1(bit) Test and for the Histogram 2(bits) Test, the P-values were not equal. In fact, the test rejected the input sequence in the first case and accepted it in the second. This is the reason why the Histogram Test was divided into 8 sub-tests.

3.2 Statistical Functions

Statistical functions have an important role in the computation of statistical tests, because their values are used in the final process of deciding whether the sequence is random or not. In the following subsections a series of statistical functions will be presented. These functions are among the most used in statistical testing and represent only half of the functions that were taken into consideration in developing a statistical functions calculator, while the other half will be presented by my team colleague.

3.2.1 IBETA Statistical Function

The Incomplete Beta Statistical function, or IBETA for short, has the following mathematical formula:

$B(x; a, b) = \int_0^x t^{a-1} (1 - t)^{b-1} \, dt$ [19].

This function has three input parameters: x, a and b, and returns a value in the interval [0, 1]. The last two parameters must be real numbers greater than 0 and the first one, which specifies the upper limit of the integration, must be a real number in the interval [0, 1]. The function is incomplete because the integration interval doesn’t necessarily span the whole interval used by the Beta function for integration.

The Beta function is also called the Euler integral of the first kind and has the following formula:

$B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1} \, dt$, for x, y > 0 [19].

Because the difference between the Incomplete Beta function and the Beta function is only the chosen integration limit, the same properties that apply to the latter also apply to the former. Actually the Incomplete Beta function is a generalization of the Beta function, because for x = 1 the two functions are identical.

The Beta function is symmetrical, meaning that $B(x, y) = B(y, x)$, and has many other forms besides the base formula presented above.

One of the interesting formulas used to represent the Beta function is the one that makes use of another statistical function, namely Gamma:

$B(x, y) = \frac{\Gamma(x) \, \Gamma(y)}{\Gamma(x + y)}$ [19].
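The Gamma-based formula gives a convenient way to evaluate the Beta function, and a crude numerical integration is enough to illustrate how the Incomplete Beta function approaches it as x goes to 1. The sketch below is only illustrative: the function names are hypothetical, and the midpoint rule is assumed to be accurate only for a, b ≥ 1 (it is not the algorithm used by Cephes).

```python
import math

def beta(x, y):
    """Beta function computed from log-Gamma values: B(x, y) = G(x) G(y) / G(x + y)."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))

def incomplete_beta(x, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of t^(a-1) (1-t)^(b-1) over [0, x]."""
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1.0 - t) ** (b - 1)
    return total * h

if __name__ == "__main__":
    print(beta(2, 3), beta(3, 2))           # symmetry of the Beta function
    print(incomplete_beta(1.0, 2, 3))       # equals B(2, 3) for x = 1
    print(incomplete_beta(0.5, 2, 3) / beta(2, 3))  # regularized value at x = 0.5
```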

Other formulas defined in [20] are:

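The derivative of the Incomplete Beta function is:

$\frac{\partial B(x; a, b)}{\partial x} = x^{a-1} (1 - x)^{b-1}$,

and its indefinite integral is:

$\int B(x; a, b) \, dx = x \, B(x; a, b) - B(x; a + 1, b)$ [21].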

The Beta function was the first known scattering amplitude in string theory and was also used in the theory of the preferential attachment process, which is a type of stochastic urn process. The ratio between IBETA and BETA represents the Regularized Incomplete Beta Function, which can be used to evaluate the cumulative distribution function of a random variable X from a binomial distribution. As stated in [22], the Incomplete Beta function is used in statistics in order to compute the cumulative distribution functions of the binomial distribution, F-distribution and Student's distribution.

The Beta distribution is a family of continuous probability distributions defined on the interval [0, 1] and parameterized by two positive shape parameters, a and b. The probability density function and the cumulative distribution function that characterize the Beta distribution are computed with the help of the Beta and Incomplete Beta functions. As described in [23], the probability density function has the formula:

$f(x; a, b) = \frac{x^{a-1} (1 - x)^{b-1}}{B(a, b)}$,

and the cumulative distribution function is described by:

$F(x; a, b) = \frac{B(x; a, b)}{B(a, b)}$.

The Beta distribution is a special case of the Dirichlet distribution with only two parameters, and when both of its shape parameters are equal to 1 it is identical to the uniform distribution. This distribution is used as a prior distribution for binomial proportions in Bayesian analysis.

3.2.2 IGAM and IGAMC Statistical Functions

The Incomplete Gamma statistical function is defined as an integral function with the same integrand as the Gamma function, but with different integration limits. There are two formulas provided in [24], one for the upper part of the integration and one for the lower part:

$\Gamma(a, x) = \int_x^{\infty} t^{a-1} e^{-t} \, dt$ and $\gamma(a, x) = \int_0^x t^{a-1} e^{-t} \, dt$.

The relation between the two parts of the Incomplete Gamma function can be expressed in the following way: $\gamma(a, x) + \Gamma(a, x) = \Gamma(a)$, where $\Gamma(a)$ is the Gamma function. $\Gamma(a, x)$ is also called the complement of the Incomplete Gamma function.

Even if in mathematics the Incomplete Gamma function has the formula defined above, the Cephes mathematical library in fact uses another one, namely the formula of the lower Regularized Gamma function:

$P(a, x) = \frac{\gamma(a, x)}{\Gamma(a)} = \frac{1}{\Gamma(a)} \int_0^x e^{-t} t^{a-1} \, dt$.

Practically, the lower Regularized Gamma function represents the regularization of the lower Incomplete Gamma function, $\gamma(a, x)$, by the complete Gamma function, $\Gamma(a)$. This is the reason why the Incomplete Gamma function is also known under this definition, which is also the formula used for IGAM in the statistical functions calculator implemented in this project.

The Complemented Incomplete Gamma statistical function, or IGAMC for short, is one of the functions most used in statistical tests for random and pseudo-random number generators and is closely related to the Incomplete Gamma function. In mathematics it is known as the upper Regularized Gamma function; the Cephes library follows the same convention as for the Incomplete Gamma function described above, and uses the following formula to define it:

$Q(a, x) = \frac{\Gamma(a, x)}{\Gamma(a)} = \frac{1}{\Gamma(a)} \int_x^{\infty} e^{-t} t^{a-1} \, dt$,

where $\Gamma(a, x)$ represents the upper Incomplete Gamma function as defined in mathematics, and $\Gamma(a)$ is the complete Gamma function. Both parameters of the Incomplete Gamma function and its complement (as defined in Cephes) must be positive real numbers.

The relation between the two Regularized Gamma functions mirrors the one between the lower and upper part of the Incomplete Gamma function: $P(a, x) + Q(a, x) = 1$.
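The two regularized functions can be sketched in Python from the power series of the lower Incomplete Gamma function. This is an illustrative assumption, not the Cephes algorithm: the names `igam` and `igamc` simply mirror the Cephes naming, and the plain series is only practical for moderate values of x.

```python
import math

def igam(a, x, max_terms=500):
    """Lower regularized Gamma P(a, x) from the series
    gamma(a, x) = x^a e^-x * sum_{n>=0} x^n / (a (a+1) ... (a+n))."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:     # further terms no longer change the sum
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def igamc(a, x):
    """Upper regularized Gamma Q(a, x), from the relation P(a, x) + Q(a, x) = 1."""
    return 1.0 - igam(a, x)

if __name__ == "__main__":
    print(igam(1.0, 2.0), 1.0 - math.exp(-2.0))   # P(1, x) = 1 - e^-x
    print(igamc(0.5, 4.0), math.erfc(2.0))        # Q(1/2, x) = erfc(sqrt(x))
```

Computing Q as 1 - P loses relative accuracy when Q is very small, which is one reason production libraries evaluate the complement separately for large x.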

The derivatives of the two functions as well as their indefinite integrals are defined in [25] as:

, ,

, .

The following formulas represent some of the basic properties of the functions presented above:

, , , , , ,

.

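As stated in [26], the Incomplete Gamma function can be used in the computation of the cumulative distribution functions of the Gamma distribution, the Inverse Gamma distribution, the Poisson distribution and the Chi-square distribution, by using the following formulas:

$F(x; k, \theta) = \frac{\gamma(k, x / \theta)}{\Gamma(k)}$ [27],

$F(x; \alpha, \beta) = \frac{\Gamma(\alpha, \beta / x)}{\Gamma(\alpha)}$ [28],

$F(k; \lambda) = \frac{\Gamma(\lfloor k + 1 \rfloor, \lambda)}{\lfloor k \rfloor !}$ [29],

$F(x; k) = \frac{\gamma(k/2, x/2)}{\Gamma(k/2)}$ [30].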

3.2.3 The Chi-square Distribution

The Chi-square distribution is one of the most widely used theoretical distributions in inferential statistics, because, under reasonable assumptions, easily calculated quantities can be proven to have distributions that approximate the Chi-square distribution if the null hypothesis is true. [31] The Chi-square distribution is used in statistical tests in order to compute the goodness of fit of an observed distribution to a theoretical one. Usually the test statistic has the following form:

$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ are the observed values and $E_i$ the expected values [32].

If the computed test statistic is large, then the observed and expected values are not close and the model is a poor fit to the data. 

The definition given in [31] says that the random variable $Q = \sum_{i=1}^{k} X_i^2$ is distributed according to the Chi-square distribution with k degrees of freedom, if Xi are k independent, normally distributed random variables with mean 0 and variance 1. This is denoted $Q \sim \chi^2_k$. In practice, the number of degrees of freedom is equal to the number of standard normal variables being summed up. The Chi-square distribution is a special case of the Gamma distribution ($X \sim \Gamma(k/2, \theta = 2)$) and is characterized by two functions: the probability density function and the cumulative distribution function. The probability density function of the Chi-square distribution has the following formula:

$f(x; k) = \frac{1}{2^{k/2} \, \Gamma(k/2)} x^{k/2 - 1} e^{-x/2}$, for x > 0

$f(x; k) = 0$, for x ≤ 0

The cumulative distribution function of the Chi-square distribution is:

$F(x; k) = \frac{\gamma(k/2, x/2)}{\Gamma(k/2)} = P(k/2, x/2)$ [31]

For both of the functions presented above, the x parameter must be a positive real number and the k parameter must be an integer number greater than 0, because it represents the degrees of freedom for the Chi-square distribution.

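The characteristic function of the Chi-square distribution is $(1 - 2it)^{-k/2}$. For the random variable X that is distributed according to the Chi-square distribution with k degrees of freedom ($X \sim \chi^2_k$), the mean is given by μ = k, and the variance by σ2 = 2k. The information entropy is given by $\frac{k}{2} + \ln\left(2 \, \Gamma(k/2)\right) + \left(1 - \frac{k}{2}\right) \psi(k/2)$, where ψ is the digamma function.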
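The stated mean and variance can be checked numerically from the probability density function. The sketch below assumes the standard Chi-square density and uses a simple midpoint rule on a truncated interval; the function names and the integration bounds are illustrative choices.

```python
import math

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    log_f = (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)
    return math.exp(log_f)

def chi2_moments(k, upper=100.0, steps=100000):
    """Midpoint-rule estimates of total mass, mean and variance on [0, upper]."""
    h = upper / steps
    xs = [(i + 0.5) * h for i in range(steps)]
    ws = [chi2_pdf(x, k) * h for x in xs]
    mass = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws))
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, ws))
    return mass, mean, var

if __name__ == "__main__":
    for k in (3, 4, 10):
        print(k, chi2_moments(k))   # mass close to 1, mean close to k, variance close to 2k
```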

Figures 3-1 and 3-2, taken from [31], show the plots of the probability density function and the cumulative distribution function for some fixed degrees of freedom.

Figure 3-1.Probability density function for the Chi-square distribution


Figure 3-2.Cumulative density function for the Chi-square distribution

3.2.4 The Poisson Distribution

The Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time, if these events occur with a known average rate and independently of the time passed since the last event. [33] The distribution can also be used to count the number of events that appear in other kinds of intervals, such as distance, volume or area. It is named after Siméon-Denis Poisson, who discovered it, and has the following form, defined in [34]:

$P(n) = \frac{(\lambda T)^n e^{-\lambda T}}{n!}$,

where n is the number of discrete events, T is the time interval length, and λ is the number of events per unit time.

The probability mass function for the Poisson distribution is given by the following formula:

$f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$ [33]

where k is a non-negative integer representing the number of occurrences of an event, and λ is a positive real number equal to the expected number of occurrences in a given interval. The above function represents the probability that there are exactly k occurrences of an event whose expected number of occurrences in a time interval of a given length is λ.

The Poisson distribution can be derived as a limiting case of the binomial distribution as the number of trials goes to infinity and the expected number of successes remains fixed. It can also be approximated by the normal distribution with mean λ and variance λ, if λ is sufficiently large. The expected value and the variance of the Poisson distribution are both equal to λ.

The cumulative distribution function for the Poisson distribution has the following formulas:

$F(k; \lambda) = \frac{\Gamma(\lfloor k + 1 \rfloor, \lambda)}{\lfloor k \rfloor !}$ or $F(k; \lambda) = e^{-\lambda} \sum_{i=0}^{\lfloor k \rfloor} \frac{\lambda^i}{i!}$. [33]


The probability mass function of the Poisson distribution is shown in Figure 3-3. The horizontal axis is represented by k and the connecting lines are only guides for the eye, because the function only defines integer values of k. [33]

Figure 3-3.The probability mass function for the Poisson distribution

Figure 3-4 shows the cumulative distribution function for the Poisson distribution. As can be seen in the plot, the function is discontinuous where k is an integer and flat everywhere else.

Figure 3-4.The cumulative density function of the Poisson distribution

The Poisson distribution is normalized so that the sum of probabilities equals 1:

$\sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} = 1$. [35]
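The probability mass function, the cumulative distribution function and the normalization property can be illustrated with a short Python sketch. It assumes the standard Poisson formulas; the function names are hypothetical, and the infinite sums are truncated at a point where the remaining terms are negligible for the chosen λ.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k occurrences: lam^k e^-lam / k!."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_cdf(k, lam):
    """Probability of at most floor(k) occurrences."""
    return sum(poisson_pmf(i, lam) for i in range(int(math.floor(k)) + 1))

if __name__ == "__main__":
    lam = 4.0
    probs = [poisson_pmf(k, lam) for k in range(100)]   # truncated support
    total = sum(probs)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    print(total, mean, var)        # close to 1, lam and lam
    print(poisson_cdf(2.7, 1.0))   # same value as poisson_cdf(2, 1.0)
```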

This distribution can be applied to systems with a large number of possible events, each of which is rare. Some classical examples are the nuclear decay of atoms, the number of cars which pass through a road section in a given period of time, the number of stars in a given area of the sky, the number of errors in a text of a given length, the number of phone calls in a call-center, the number of web-requests in a given period of time and so on.


3.3 OpenMP

OpenMP stands for Open Multi-Processing and is an Application Program Interface (API) that supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, including UNIX and Windows NT platforms. [36] OpenMP is a portable, scalable model that offers a simple and flexible interface for developing parallel applications. It comprises three primary API components: compiler directives, runtime library routines, and environment variables.

OpenMP is an implementation of multithreading, having a master thread that forks a number of slave threads. A certain task can be divided among the generated threads, which will then run concurrently, being allocated to different processors by the runtime environment. The threads are created before the parallel code section is executed, and each of them has an id, which can be obtained in C by calling the function omp_get_thread_num(). When the task is finished, all of the threads join back into the master thread, which continues to run the rest of the program. The process presented above is shown in Figure 3-5, taken from [37].

Figure 3-5.OpenMP fork-join model of parallelization

The parallelized code is marked by a preprocessor directive, whose form depends on the directive used. OpenMP offers the possibility to both set and get the current number of threads, by means of user level runtime routines or environment variables (for example, the omp_set_num_threads() and omp_get_num_threads() functions can be called in C). For better performance, the number of threads used should generally be equal to the number of processors available.

The core elements of OpenMP described in [38] are:
- The constructs for thread creation
- Work load distribution
- Data environment management
- Thread synchronization
- User level runtime routines
- Environment variables

OpenMP provides both task parallelization and data parallelization: normally, each thread executes the parallelized code section independently, but work-sharing constructs are provided in order to divide the execution of the enclosed code region among the members of the team that encounter it. The threads used in the parallelization are not required to maintain exact consistency with the real memory all of the time, because they can cache their data. In the situations where all the threads must see the same identical value of a shared variable, the programmer has to ensure that all threads flush the variable as needed.

As stated in [37], OpenMP contains a simple and limited set of directives, which can be used to obtain significant parallelization. The format of an OpenMP directive is the following one: #pragma omp <directive name> [clause, ...] newline. Each directive applies only to the succeeding statement, which must be a structured block. The directive scoping can be static, orphaned or dynamic. In the first case the directive applies to the code between the beginning and the end of the structured block following it, in the second case the orphaned directive appears independently from another enclosing directive, while in the third case the extent of the directive includes both its static extent and the extents of its orphaned directives.

The fundamental OpenMP parallel construct is the parallel region directive. When a thread reaches this directive, it forks a team of threads and becomes the master of the team. Each of the threads in the team created will execute the code enclosed in the parallel region and will join with the other threads at the end of this region. If one of the threads terminates within the parallel region, all other threads in the team will terminate and the work done until that point will be undefined.

The work-sharing constructs are used to specify how independent work should be assigned to one or all of the threads. These constructs are: omp for or omp do, which are used to split up loop iterations between threads, sections, which assigns consecutive, but independent blocks of code to different threads, single, used to specify a block of code that is executed by only one thread, and master, which is similar to single but in this case the thread executing the block is the master thread. A work-sharing construct must be enclosed dynamically within a parallel region in order for the directive to execute in parallel. [37]

Some of the synchronization directives offered by OpenMP are:
- critical, used to specify a region of code that must be executed by only one thread at a time
- atomic, specifies that a specific memory location must be updated atomically
- ordered, specifies that iterations of the enclosed loop will be executed in the same order as if they were executed on a serial processor
- barrier, synchronizes all threads in the team
- master, used to specify that a block should be executed only by the master thread
- taskwait, specifies a wait on the completion of child tasks generated since the beginning of the current task
- flush, used to write thread-visible variables back to memory, in order for the implementation to provide a consistent view of memory

Another directive provided by OpenMP is threadprivate, which is used to make global file scope variables local and persistent to a thread through the execution of multiple parallel regions.

The clauses that appear in the omp directive statement can be:
- Data sharing attribute clauses: shared, private, default – used to specify if the data in the parallel region is shared or private; a default data scoping can also be defined.
- Synchronization clauses: nowait – used to specify that threads do not synchronize at the end of the parallel loop.
- Schedule clauses: schedule(type, chunk), where type can be static, dynamic, guided, runtime, or auto – used in for-loops to define the schedule method by which the iterations are divided among the threads.
- The if clause – used to condition the parallelized execution of a block.
- Initialization: firstprivate, lastprivate – used to define how private data is initialized and how it is written back to the original variable object.
- Data copying: copyin, copyprivate – similar to the initialization clauses, but this time they define the way in which data is copied between global variables and private variables.
- Reduction: reduction(operator | intrinsic:list) – used to combine the local copies of some variables held by each thread into global shared variables.
- Other: num_threads, collapse – used to specify the number of threads to be used (num_threads), and to specify how many loops in a nested loop should be collapsed into one large iteration space and divided according to the schedule clause (collapse).

A new construct that came with OpenMP 3.0 is the task directive which defines an explicit task that may be executed by the thread that encountered it or deferred for execution by any other thread in the team. The data environment of the task is determined by the data sharing attribute clauses and the task execution is subject to task scheduling. [37]

OpenMP is useful because it is simple, it provides incremental parallelism and unified code for both parallel and serial applications, and the directives offered automatically handle data layout and decomposition. However, it only runs efficiently on shared-memory multiprocessor platforms and it requires a compiler that supports OpenMP.

Even though the ideal speed-up when using OpenMP is N, where N is the number of processors used, this is rarely achieved in practice, because of problems like load imbalance and synchronization overhead. Other reasons might be that not the whole program can be parallelized, or that the memory bandwidth does not usually scale up N times.
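The effect of the part of the program that cannot be parallelized is commonly quantified with Amdahl's law. The law is not named in the sources cited above; it is added here only as an illustration, with p denoting the (assumed) fraction of the running time that can be parallelized and N the number of processors:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

For instance, with p = 0.9 and N = 4 the bound is S(4) = 1 / (0.1 + 0.225) ≈ 3.08 rather than 4, even before load imbalance and synchronization overhead are taken into account.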

OpenMP was written primarily to relieve the programmer from the details of threading, enabling a focus on the more important issues. It was developed for use by the high-performance computing community and it works best in programming styles that have loop-heavy code working on shared arrays of data. [39]


4 Requirements Specifications and System Architecture

The current paper describes three main applications incorporated in a larger system designed for testing random and pseudo-random number generators. These sub-systems are called SeqTestRand, ParTestRand and SFC. SeqTestRand and ParTestRand are two systems that incorporate a series of statistical tests implemented in a sequential manner and in a parallel manner, respectively. SFC is a statistical functions calculator that can be helpful in the development of new statistical tests. In the following sub-sections the system architecture and the requirements specifications will be presented for each of these three sub-systems. Because this is a team project, the implementation of the system was divided between the two members of the team.

4.1 Block Diagram of the System

4.1.1 SeqTestRand

Figure 4-1.Block diagram of the SeqTestRand application

The SeqTestRand application can be used to test a random bit sequence stored in a file. As can be seen in the block diagram of the system (Figure 4-1), it incorporates the sequential implementation of two batteries of tests, NIST and CryptoRand. While the NIST battery consists of only one application (SeqNIST), the CryptoRand battery consists of two applications: SeqHistograms and SeqGenBlockFreq. All the functionality offered by these applications can be accessed through an intuitive, user-friendly graphical user interface.

The SeqNIST application is a byte-oriented efficient implementation of the NIST statistical test suite and can be run from the command line. It was developed by me and incorporates a total of 15 statistical tests, out of which 8 were implemented by me and the other 7 by my colleague.

The SeqHistograms application is an efficient implementation of the Histogram Tests and can be run from the command line. It was developed by my team colleague and incorporates a total of 8 histogram tests, out of which 4 were implemented by me and the other 4 by my colleague.

The SeqGenBlockFreq application was developed entirely by the other team member.

[Block labels of the SeqTestRand block diagram: GUI; FCT; NIST battery (SeqNIST); CryptoRand battery (SeqHistograms, SeqGenBlockFreq)]


4.1.2 ParTestRand

Figure 4-2. Block diagram of the ParTestRand application

The ParTestRand application is similar to the SeqTestRand application, with the difference that in this case the statistical test batteries are parallelized for a multi-core architecture, using OpenMP. Another difference is that the NIST battery now contains two applications, ParNIST and ParNIST_blocks, and the CryptoRand battery now contains three applications: ParHistograms, ParHistograms_blocks and ParGenBlockFreq.

The ParNIST application is similar to the SeqNIST application, but now the algorithms used for each test are parallelized. It was developed by me and incorporates a total of 9 tests, out of which 6 were implemented by me and 3 by the other team member.

The ParNIST_blocks application is similar to the SeqNIST application, because it contains the same sequential version of the tests (except for NIST test 5). The only difference is that in testing modes 2, 3 and 4 the block sequences are run in parallel. The application was developed by me, while the implementation of the tests was divided among the team members in the same manner as for SeqNIST.

The ParHistograms_blocks application is similar to the SeqHistograms application, because it contains the same sequential implementation of the tests. The only difference is that in testing modes 2, 3 and 4 the block sequences are run in parallel. The application was developed by my team colleague, while the implementation of the tests was divided among the team members in the same manner as for SeqHistograms.

The ParHistograms and the ParGenBlockFreq applications were developed entirely by the other team member.

[Block labels of the ParTestRand block diagram: GUI; FCT; NIST battery (ParNIST, ParNIST_blocks); CryptoRand battery (ParHistograms, ParHistograms_blocks, ParGenBlockFreq)]


4.1.3 SFC

Figure 4-3. Block diagram of the SFC application

The SFC (Statistical Functions Calculator) application can be used to compute certain statistical functions that are necessary in the testing process of random sequences. The application can be used either to compute the value of a function at a given point, or to generate a table with the values of a function over a given interval. All the functionality offered by this application can be accessed through an intuitive, user-friendly graphical user interface. Both the user interface design and the implementation of the functions were divided between the members of the team.

4.2 Functional Requirements

The functional requirements for the SeqTestRand application are the following:

• The possibility to select one of the two statistical test batteries proposed for implementation (NIST and CryptoRand).
• Testing, with the chosen battery, the random or pseudo-random bit sequence stored in a file, which is given as input.
• The possibility to select any number of tests from one of the batteries.
• Running the selected tests for one chosen file.
• Running the selected tests for each file in a chosen directory that may contain other directories.
• The possibility to select one of the four testing modes:
  - Testing the whole sequence.
  - Testing a specified number of equal non-overlapping blocks of a given size from a specified offset.
  - Testing a specified number of overlapping blocks of increasing size with a given step, from a specified offset (the left limit of the blocks is fixed and they grow in size to the right).
  - Testing a specified number of overlapping blocks of decreasing size with a given step, from a specified offset (the right limit of the blocks is fixed and their left limit moves to the right as they shrink).
• The possibility to choose the significance level for the statistical tests selected.
• Displaying the textual results of each test.
• The possibility to visualize the results graphically, by plotting the P-values obtained.
• Computing and displaying the execution time for each test and for the whole execution.
• Providing help for the values of each parameter.


• Providing general help, for all the tests and for all the inputs required by the application.
• The possibility to cancel the running tests at any moment.
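The four testing modes can be sketched as a small function that returns the position and size of the k-th tested block. This is only an illustration under stated assumptions: the names `block_t` and `block_for_mode` are hypothetical, blocks are indexed from 0, offsets and sizes are expressed in the same unit, block k in mode 3 is assumed to have size (k + 1) * step, and the fixed right limit in mode 4 is assumed to be start + count * step. The source does not specify these details.

```c
#include <stddef.h>

typedef struct {
    size_t offset; /* left limit of the block inside the sequence */
    size_t length; /* size of the block                           */
} block_t;

/* Fills *out with block k (0-based) for the given mode.
   Returns 0 on success, -1 for an invalid mode, index or range. */
int block_for_mode(int mode, size_t total, size_t start, size_t step,
                   size_t count, size_t k, block_t *out)
{
    if (mode == 1) {                 /* whole sequence, single block */
        if (k != 0)
            return -1;
        out->offset = 0;
        out->length = total;
        return 0;
    }
    if (mode < 2 || mode > 4 || k >= count || step == 0)
        return -1;
    if (mode == 2) {                 /* equal non-overlapping blocks */
        out->offset = start + k * step;
        out->length = step;
    } else if (mode == 3) {          /* left limit fixed, growing    */
        out->offset = start;
        out->length = (k + 1) * step;
    } else {                         /* right limit fixed, shrinking */
        out->offset = start + k * step;
        out->length = (count - k) * step;
    }
    /* Reject blocks that would run past the end of the sequence. */
    if (out->offset > total || out->length > total - out->offset)
        return -1;
    return 0;
}
```

For example, with start = 10, step = 5 and count = 4, mode 2 tests the blocks starting at 10, 15, 20 and 25, mode 3 tests blocks of sizes 5, 10, 15 and 20 that all start at 10, and mode 4 tests blocks of sizes 20, 15, 10 and 5 that all end at 30.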

The functional requirements for the ParTestRand application are the ones specified for the SeqTestRand application, plus the following:

• In modes 2, 3 or 4, the possibility to choose between:
  - Running the parallelized version of each test, while the block sequences are run sequentially (internal parallelization of each test).
  - Running the sequential version of each test, while the block sequences are run in parallel (block sequences parallelization).
• The possibility to choose the number of threads used for the parallel execution.
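The two parallelization options can be contrasted with a minimal OpenMP sketch. A simple count of one bits stands in for a real statistical test, and all function names (`ones_seq`, `ones_par`, `ones_blocks_par`) are hypothetical. The actual tests in ParNIST and ParNIST_blocks are more involved, so this only shows where the parallel directive is placed in each case.

```c
#include <stddef.h>

/* Counts the one bits of a single byte. */
static int bits_in_byte(unsigned char v)
{
    int c = 0;
    while (v != 0) {
        c += v & 1;
        v >>= 1;
    }
    return c;
}

/* Sequential stand-in for a statistical test on one block. */
static long ones_seq(const unsigned char *seq, long n_bytes)
{
    long ones = 0;
    long i;
    for (i = 0; i < n_bytes; i++)
        ones += bits_in_byte(seq[i]);
    return ones;
}

/* Option 1, internal parallelization: the loop inside one test
   is divided among the threads and the partial sums are reduced. */
long ones_par(const unsigned char *seq, long n_bytes, int n_threads)
{
    long ones = 0;
    long i;
    #pragma omp parallel for num_threads(n_threads) reduction(+:ones)
    for (i = 0; i < n_bytes; i++)
        ones += bits_in_byte(seq[i]);
    return ones;
}

/* Option 2, block parallelization: each thread runs the sequential
   test on different equal-sized blocks (as in testing mode 2). */
void ones_blocks_par(const unsigned char *seq, long block_len, int count,
                     int n_threads, long *results)
{
    int k;
    #pragma omp parallel for num_threads(n_threads)
    for (k = 0; k < count; k++)
        results[k] = ones_seq(seq + (long)k * block_len, block_len);
}
```

If the compiler is not run with OpenMP enabled, the directives are ignored and both functions still produce the same results sequentially.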

The functional requirements for the SFC application are the following:

• The possibility to select a statistical function and to calculate its corresponding value for the parameters given by the user.
• The possibility to generate a table that contains the values of the chosen statistical function, for parameters located in intervals given by the user (here the user will also have to give the step with which the parameters increase).
• The possibility to save the obtained table in a file chosen by the user. The default extension for this file should be .csv, so that the results can be viewed in Microsoft Office Excel or in any other spreadsheet program.
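The table generation requirement can be sketched for a function of one parameter as follows. The names `write_table_csv` and `square` are hypothetical, `square` is only a placeholder for a real statistical function, and the column header and number format are assumptions. SFC itself is not implemented this way as far as the source states.

```c
#include <stdio.h>

/* Writes "x,f(x)" rows for x = first, first + step, ... up to last.
   Returns the number of data rows written, or -1 for invalid input. */
long write_table_csv(FILE *out, double (*f)(double),
                     double first, double last, double step)
{
    long rows;
    long i;
    if (out == NULL || f == NULL || step <= 0.0 || last < first)
        return -1;
    /* The small tolerance keeps the last point when the division
       is affected by rounding. */
    rows = (long)((last - first) / step + 1e-9) + 1;
    fprintf(out, "x,f(x)\n");
    for (i = 0; i < rows; i++) {
        /* Computing x from the index avoids accumulating rounding errors. */
        double x = first + (double)i * step;
        fprintf(out, "%.10g,%.10g\n", x, f(x));
    }
    return rows;
}

/* Placeholder for a statistical function of one parameter. */
double square(double x)
{
    return x * x;
}
```

Calling `write_table_csv(stdout, square, 0.0, 1.0, 0.25)` writes a header line followed by five rows, for x equal to 0, 0.25, 0.5, 0.75 and 1.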

4.3 Non-functional Requirements

The non-functional requirements of the SeqTestRand application are the following:

• Usability
  - The application will have an intuitive, user-friendly interface.
  - Each battery of tests and each statistical test in these batteries will be described briefly in the help provided by the application, in order to make the application easier to use. This way the user will understand what the purpose of each test is, when to use it and how to choose its parameters.
  - The size of the file introduced by the user will be displayed, so that the user knows the size of the tested sequence. The size of the sequence can be helpful when dividing the tested sequence into sub-sequences for modes 2, 3 and 4.
  - The user will be alerted every time he performs an incorrect action.
• Extensibility
  - Because the statistical test batteries are independent, the application can be easily extended by adding other statistical test batteries.
  - For the same reason, any modification made to one of the batteries does not affect the other test suites.
• Performance
  - The execution times obtained depend on the size of the tested sequence, being directly proportional to it, but will be much smaller than those obtained for the currently available implementations.


- In order to reduce the time needed to read the tested sequence from the input file, the file will be read a single time and then used by all the selected tests from one of the batteries.
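Reading the file a single time can be sketched as below: the whole file is loaded into one dynamically allocated buffer, and that buffer is then handed to every selected test. The function name `read_sequence` is hypothetical and the error handling is deliberately minimal. This is an illustration of the requirement, not the code used in SeqNIST.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file into a newly allocated buffer.
   On success returns the buffer and stores its size in *n_bytes;
   on failure (missing, empty or unreadable file) returns NULL.
   The caller releases the buffer with free(). */
unsigned char *read_sequence(const char *path, size_t *n_bytes)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size;

    if (f == NULL)
        return NULL;
    if (fseek(f, 0L, SEEK_END) == 0 &&
        (size = ftell(f)) > 0 &&
        fseek(f, 0L, SEEK_SET) == 0) {
        buf = (unsigned char *)malloc((size_t)size);
        if (buf != NULL) {
            if (fread(buf, 1, (size_t)size, f) == (size_t)size) {
                *n_bytes = (size_t)size;
            } else {
                free(buf);
                buf = NULL;
            }
        }
    }
    fclose(f);
    return buf;
}
```

With this approach the I/O cost is paid once per file, regardless of how many tests are selected.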

The non-functional requirements of the ParTestRand application are the same as the ones for the SeqTestRand application. The only difference is that this time the execution times obtained must not only be much smaller than those obtained for the currently available implementations, but also smaller than the execution times obtained for the SeqTestRand application. This is necessary in order to justify the parallel implementation of the statistical test suites.

For the SFC application the non-functional requirements are the following:

• Usability
  - The application will have an intuitive, user-friendly interface.
  - For every input parameter of every function, the valid domain will be displayed, so that the user knows the accepted values of each parameter.
  - For each of the functions integrated in this calculator, the corresponding mathematical formula and a short description of what the function computes will be displayed, for a better use of this application.
• Extensibility
  - The application can be easily extended by adding other statistical functions, because the functions do not depend on each other.
  - Because the functions are independent, the modifications made to one function do not affect the other functions integrated in the calculator.

4.4 Constraints

In order for the three applications described to be implemented and to run properly, the minimum system requirements are Windows XP with Service Pack 2 or a later version of the Windows operating system, and the .NET Framework 3.0 or later.

It is recommended to run the ParTestRand application on a multi-core architecture, because it is designed to obtain a better execution time by using the power of parallel processing.

For the SeqTestRand and ParTestRand applications the input file should have a length of at most 1 GB. This size was chosen as a result of a series of experiments. For the first version of the NIST statistical test suite implemented, the maximum input file size was 2^31 GB. In this first version the file was not read entirely into memory from the start. Instead, chunks of 200000 bytes were read and processed sequentially. Because the application needed more time to run for files greater than 1 GB, it was decided that this first version of the tests should be integrated in another project, which should use the computational power of the grid.

The sequence tested by SeqTestRand and ParTestRand is in fact the bit representation of the input file, so the bits of the sequence should not be written in the file one by one.
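A short sketch may make this constraint concrete. It assumes that the bits of each byte are taken most significant bit first; the source does not state the bit order, and the function name `get_bit` is hypothetical.

```c
#include <stddef.h>

/* Returns bit i (0-based) of the sequence stored in seq,
   assuming the most significant bit of each byte comes first. */
int get_bit(const unsigned char *seq, size_t i)
{
    return (seq[i / 8] >> (7 - (i % 8))) & 1;
}
```

Under this assumption a file containing the single byte 0xA5 represents the bit sequence 10100101. A text file in which the bits are written as the characters '0' and '1' would be interpreted differently: the character '0' is the byte 0x30, which contributes the eight bits 00110000 to the tested sequence rather than a single zero bit.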

4.5 General System Architecture

The project described here is part of a larger research project concerned with the generation and testing of random and pseudo-random number generators. This project is named CryptoRand and has the system architecture shown in Figure 4-4.


The TestRand system integrated in the CryptoRand project is concerned with testing random and pseudo-random number sequences, while the rest of the integrated systems handle the generation of random sequences. The TestRand system in turn integrates four applications, as presented in Figure 4-5. Three of the sub-systems integrated in the TestRand project represent the project described in this paper.

The SeqTestRand and ParTestRand applications have a layered, loosely coupled architecture. The components of the architecture are the functional module and a graphical user interface, which interact with each other. The functional module of the SeqTestRand application contains one executable for the NIST battery and two for the CryptoRand battery. The one for the ParTestRand application contains two executables for the NIST battery and three for the CryptoRand battery. Both functional modules have a highly cohesive architecture. The two architectures described are presented in Figure 4-6.

Figure 4-4. The general architecture of the CryptoRand project

Figure 4-5. The general architecture of the TestRand project

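[Block labels of the CryptoRand architecture diagram: CryptoRand; PseudoRand, UnpredRand, TrueRand and TestRand, each with a GUI; GUI Integration Module (GUIRand)]

[Block labels of the TestRand architecture diagram: TestRand; SeqTestRand, ParTestRand, ParTestRandGPU and SFC, each with a GUI]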


Figure 4-6. The SeqTestRand and ParTestRand architectures

The architecture of the SFC application consists of a graphical user interface and a class that contains the implementation of some statistical functions. Because each statistical function is represented by a method in the Function class, adding new statistical functions means adding new methods to this class. The architecture of the SFC application is shown in Figure 4-7.

Figure 4-7. The SFC application architecture

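[Block labels of the SeqTestRand and ParTestRand architecture diagram: SeqTestRand (GUI, Statistical Tests Batteries); ParTestRand (GUI, Statistical Tests Batteries)]

[Block labels of the SFC architecture diagram: GUI; Function Class]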


5 Design Detail

5.1 Design Policies

The design policies used for the SeqTestRand application and the reasoning behind them are shown below:

1) The NIST battery will be implemented as a single executable file, SeqNIST:
  • In order to obtain a unitary, independent application.
  • It can be run from the command line prompt.
  • It can also be integrated in the SeqTestRand application with a graphical user interface.
2) The CryptoRand battery will be divided in two parts: the Histogram Tests and the General Block Frequency Test, obtaining two executables. These applications were developed by the other team member so the design policies concerning them will be detailed by her. My contribution was implementing half of the Histogram tests.
3) Each test from a battery corresponds to a function that can be called from the main section of the application:
  • In order to generate a single executable for related tests (any number of tests can be run using this executable).
  • Each function will have as parameters a pointer to the beginning of the tested sequence and the length of the sequence. By adopting this convention, the same function can be used to test the whole sequence or only a part of it, just by modifying the parameter values.
  • Each test can be implemented independently.
4) The maximum size of the file is 1 GB:
  • In order to be able to read the entire sequence at once.
  • A Win32 application has 2 GB of memory allocated, which are divided between the heap, the stack and so on.
  • The experiments made have shown that the maximum size that can be allocated dynamically is around 1600 MB.
  • 600 MB of memory were left for the allocation of the used variables.
  • For files greater than 1 GB the testing takes too much time. Another version of the tests exists, as part of another research project designed to be adapted for the grid.
5) The entire sequence will be read from the input file at once:
  • The execution time will be reduced, because the sequence will be used by each test (the file does not have to be read by each test).
  • All the related tests can be incorporated in a single executable.
  • In order to facilitate the parallelization of the tests.
6) The buffer keeping the tested sequence will be allocated dynamically:
  • If static allocation were used, 1 GB would be allocated all the time, even if the tested sequence has a much smaller length.
7) For numbers that exceed the 32 bit representation the double type is used whenever possible:
  • The processing time obtained when using the double type is less than the one obtained when using the long long type.
  • Numbers that exceed the 32 bit representation appear when the tested sequence is big and the total number of bits is required.


  • The int type is processor dependent and still has a length of 32 bits on most of the existing 64-bit processors.
8) Different testing modes offered:
  • In order to analyze the tested sequence in more detail, the application offers four testing modes. Using the offered testing modes, a detailed report can be obtained about the degree of randomness of the whole sequence or of different parts of it (global randomness and local randomness).
9) The command line arguments used for the SeqNIST application are presented next. The ones chosen for the two executables of the CryptoRand battery will be detailed by the other team member.

    SeqNIST.exe <file_name> <mode> <start> <step> <count> <alpha> <tests_list_with_parameters>

  • <file_name> represents the path and the name of the file which contains the tested sequence.
  • <mode> can be:
    - 1 – the whole sequence is tested.
    - 2 – a number of <count> non-overlapping blocks of size <step>, starting from offset <start> in the input sequence, are tested.
    - 3 – a number of <count> overlapping blocks of increasing size with step <step>, starting from offset <start> in the input sequence, are tested.
    - 4 – a number of <count> overlapping blocks of decreasing size with step <step>, starting from offset <start> in the input sequence, are tested.
    The modes have been chosen in such a way that many different sub-sequences of the input sequence can be tested.
  • <start> specifies the position in the file at which the tested sequence starts (for mode 1 the value given for this argument is not taken into consideration, because <start> is always 0).
  • <step> specifies the length of a block in mode 2 and the step with which the tested sequence increases or decreases in modes 3 and 4, respectively (for mode 1 the value given for this argument is not taken into consideration, because it is not needed).
  • <count> specifies the number of blocks tested from the input sequence (for mode 1 the value given for this argument is not taken into consideration, because it is not needed).
  • Using <start>, <step> and <count> the tested sub-sequence can be chosen to be as long as needed, and can be located in any part of the sequence.
  • <alpha> specifies the significance level to be used by the selected statistical tests.
  • <tests_list_with_parameters> represents a list of tests, where each test has the following format: <tx> <p1> <p2>…<pn>. x is the number of the test and <pn> is the nth parameter of the test.
10) The following execution times will be displayed:
  • The processing time for each test.
  • The processing time for each block, if the testing mode is 2, 3 or 4.
  • The total processing time of all the tests run.
  • The I/O time needed to read the sequence from the input file.


  • The total execution time.
11) There will exist two running modes:
  • Single file: the user will choose a file to be tested.
  • Multiple files: the user will choose a directory to be tested, in case he wants to run several files in batch mode.
12) For each test the results consist of the P-value(s) obtained and a statement confirming the rejection or acceptance of the sequence. The user will also have the option to view the results graphically:
  • The user is interested in knowing whether the sequence passed the tests or not, but the P-value(s) obtained for each test are also important because they can indicate the degree of randomness or non-randomness.
  • The graphical representation of the results can be useful if many tests are selected, and especially when many files are run at once, because the user can see more clearly how many of the P-values obtained are lower than the significance level.
13) Every tab in the interface corresponds to a statistical test battery:
  • This way the user can make a clear distinction between the existing batteries.
  • New batteries can be added at any time in the same manner, without affecting the existing batteries or confusing the user.
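The fixed part of the SeqNIST command line described in policy 9 could be read as in the following sketch. The structure `seqnist_args` and the function `parse_args` are hypothetical names, the validation rules (mode between 1 and 4, alpha strictly between 0 and 1, at least one test token) are assumptions, and the test list itself is left unparsed because the source does not give the exact token format.

```c
#include <stdlib.h>

typedef struct {
    const char *file_name;
    int mode;                  /* 1..4                                 */
    unsigned long start;       /* ignored (forced to 0) in mode 1      */
    unsigned long step;
    unsigned long count;
    double alpha;              /* significance level                   */
    int first_test_arg;        /* index in argv of the first test token */
} seqnist_args;

/* Returns 0 on success and -1 if the arguments are missing or invalid. */
int parse_args(int argc, char **argv, seqnist_args *a)
{
    /* Program name + 6 fixed arguments + at least one test token. */
    if (argc < 8)
        return -1;
    a->file_name = argv[1];
    a->mode = (int)strtol(argv[2], NULL, 10);
    a->start = strtoul(argv[3], NULL, 10);
    a->step = strtoul(argv[4], NULL, 10);
    a->count = strtoul(argv[5], NULL, 10);
    a->alpha = strtod(argv[6], NULL);
    a->first_test_arg = 7;

    if (a->mode < 1 || a->mode > 4)
        return -1;
    if (a->alpha <= 0.0 || a->alpha >= 1.0)
        return -1;
    if (a->mode == 1)
        a->start = 0;          /* start, step and count are not used */
    else if (a->step == 0 || a->count == 0)
        return -1;
    return 0;
}
```

A call such as `SeqNIST.exe data.bin 2 0 1000 10 0.01` followed by a test list would then select mode 2 with ten blocks of size 1000 starting at offset 0 and a significance level of 0.01; the file name used here is only an example.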

The design policies used for the ParTestRand application are similar to the ones presented for the SeqTestRand application, with some differences presented below:

1) The NIST battery will have two executables, ParNIST and ParNIST_blocks:
  • In the ParNIST application the algorithm used for each test is parallelized.
  • The ParNIST_blocks application only runs for modes 2, 3 and 4 and contains the sequential version of each test (except for NIST test 5), while the parallelization is applied to the block sub-sequences into which the tested sequence is divided.
  • The two executables will be integrated in the same battery, but the user will have the possibility to choose what he wants to be parallelized.
2) The CryptoRand battery will have three executables: ParHistograms, ParHistograms_blocks and ParGenBlockFreq. These applications were developed by the other team member so the design policies concerning them will be detailed by her. My only contribution was in the ParHistograms_blocks application, for which I provided half of the sequential test implementations.
3) The number of threads used for the parallelization can be chosen by the user:
  • Normally, the user will want to use a number of threads equal to the number of processors available in the multi-core architecture.
  • The chunk for each thread will be calculated as a function of the number of threads given as input.
  • By using various numbers of threads, some comparisons can be made.
4) The command line arguments used for the ParNIST and ParNIST_blocks applications are presented next. The ones chosen for the three executables of the CryptoRand battery will be detailed by the other team member.

    ParNIST[_blocks].exe <file_name> <mode> <start> <step> <count> <alpha> <number_threads> <tests_list_with_parameters>


  • The only difference from the SeqNIST application is the argument <number_threads>, which is used to specify the number of threads to be used for the parallelization.
  • For the ParNIST_blocks application <mode> can be only 2, 3 or 4, so <start>, <step> and <count> are mandatory.
5) OpenMP was chosen for parallelization:
  • OpenMP provides both task parallelization and data parallelization.
  • OpenMP supports multi-platform shared-memory parallel programming in C/C++.
  • The OpenMP directives automatically handle data layout and decomposition.
  • It can be used on both Unix and various Windows platforms.
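Policy 3 states that the chunk for each thread is computed from the number of threads. One common way to do this, shown here only as a sketch with the hypothetical name `chunk_bounds`, is to divide the work evenly and give the first few threads one extra element when the division is not exact. The source does not say which formula ParNIST actually uses.

```c
#include <stddef.h>

/* Computes the half-open range [*begin, *end) of elements handled by
   thread tid (0-based) when n elements are divided among n_threads
   threads. Assumes n_threads > 0 and 0 <= tid < n_threads. */
void chunk_bounds(size_t n, int n_threads, int tid,
                  size_t *begin, size_t *end)
{
    size_t threads = (size_t)n_threads;
    size_t t = (size_t)tid;
    size_t base = n / threads;   /* minimum chunk size               */
    size_t extra = n % threads;  /* first `extra` threads get one more */

    *begin = t * base + (t < extra ? t : extra);
    *end = *begin + base + (t < extra ? 1 : 0);
}
```

For 10 elements and 4 threads this gives the ranges [0, 3), [3, 6), [6, 8) and [8, 10), so the chunk sizes differ by at most one element.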

The SFC application has the following design policies:

1) Every tab in the interface corresponds to a statistical function:
  • This way the functions are organized and can be intuitively and easily accessed by the user.
2) Specifying the valid domain for each parameter of each function:
  • In order to reduce the risk of introducing parameter values that are outside the domain of the function.
  • This way the user will not have to guess the accepted values, and will be spared from warning messages.
3) A short description of the returned value and the mathematical formula of each function will be specified:
  • Through the mathematical formula of the function the user will understand the concrete mathematical implementation of each function and will have no doubts about the results of the function.
  • In order to understand the correspondence between the introduced parameter values and the actual parameters used in the mathematical function.
4) For the implementation of each function the Cephes library was used:
  • The Cephes library is a well-known scientific mathematical library, containing over 400 mathematical and statistical functions.
  • The Cephes library is also used by NIST in their implementation.
5) The value of a function can be computed at one point or for a series of points, displayed in tabular form:
  • By displaying the results in tabular form, for given intervals, the results can be easily checked for correctness by comparing them to existing tabular representations of the mathematical function.
  • The obtained tables can lead to a detailed analysis of the statistical functions implemented.
  • The tables can be used to generate charts.
  • The value of the function at one point is also useful when only a certain value of the function is needed.
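The domain check behind policy 2 can be sketched as follows. The type `domain_t` and the function `in_domain` are hypothetical, and the example interval (0, 1] is chosen only for illustration; the actual domains depend on each statistical function.

```c
/* Valid domain of one parameter: an interval whose limits
   may each be included or excluded. */
typedef struct {
    double min;
    double max;
    int min_inclusive; /* non-zero if min itself is accepted */
    int max_inclusive; /* non-zero if max itself is accepted */
} domain_t;

/* Returns 1 if v lies in the domain and 0 otherwise. */
int in_domain(const domain_t *d, double v)
{
    if (v < d->min || (v == d->min && !d->min_inclusive))
        return 0;
    if (v > d->max || (v == d->max && !d->max_inclusive))
        return 0;
    return 1;
}
```

A parameter restricted to (0, 1] would be described by `{0.0, 1.0, 0, 1}`: the value 0 is rejected, while 0.5 and 1 are accepted. Displaying the same limits next to the input field lets the user see the accepted values before a warning is needed.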

5.2 Use Cases

5.2.1 SeqTestRand

The use cases for the SeqTestRand application are shown in the use case diagram in Figure 5-1.


Figure 5-1. Use case diagram for the SeqTestRand application

The main use cases presented in the diagram shown above are: running the selected tests for a single file, running a batch job and viewing graphical results. The rest of the activities that the user can perform, except for canceling tests, are used by the first or the second of these main use cases.

The next Use Case represents the actions performed by the user when he wants to run the selected tests for a single input sequence.


1. Basic Flow

Use Case Start
The use case starts when the actor wants to test a single file, whose binary representation is a bit sequence, with the help of statistical tests.

1.1 The actor chooses the statistical test battery that he wants to use for testing.
1.2 The actor introduces the input file.
1.3 The system displays the size of the input file in bytes.
1.4 The actor chooses testing mode 1 and introduces a value for the significance level.
1.5 The actor selects the wanted tests and introduces values for the parameters where needed.
1.6 The actor asks for the tests to be run for the chosen file.
1.7 The system validates the information introduced by the user and displays the results of the tests.

Use Case End
The actor visualizes the results displayed.

2. Alternative Flows

2.1 The input file is not specified

Alternative Flow Starts
This flow can occur in step 1.3 as a result of not specifying a file in step 1.2.

2.1.1 The system does not display the size of the input file.

Alternative Flow Ends
The flow continues with step 1.4 from the basic flow.

2.2 The actor chooses one of the testing modes 2, 3 or 4

Alternative Flow Starts
This flow can occur in step 1.4 as a result of choosing one of the testing modes 2, 3 or 4.

2.2.1 The actor chooses one of the testing modes 2, 3 or 4 and introduces values for start, step, count, and alpha.

Alternative Flow Ends
The flow continues with step 1.5 from the basic flow.

2.3 Incomplete information

Alternative Flow Starts
This flow can occur in step 1.7 as a result of not specifying the input data in one of the steps 1.2, 1.4 and 1.5 from the basic flow, or in step 2.2.1 from the alternative flow 2.2.


2.3.1 The system shows a warning message through which it alerts the actor that he didn’t provide all the information required.
2.3.2 The actor provides the necessary information.

Alternative Flow Ends
The flow continues with step 1.6 from the basic flow.

2.4 Incorrect information

Alternative Flow Starts
This flow can occur in step 1.7 as a result of specifying incorrect information in one of the steps 1.2, 1.4 and 1.5 from the basic flow, or in step 2.2.1 from the alternative flow 2.2.

2.4.1 The system shows a warning message through which it alerts the actor that the information he provided is not correct.
2.4.2 The actor corrects the information where necessary.

Alternative Flow Ends
The flow continues with step 1.6 from the basic flow.

2.5 The actor cancels the tests

Alternative Flow Starts
This flow can occur in step 1.7 as a result of canceling the tests.

2.5.1 The system shows a warning message through which it alerts the actor that the tests have been canceled and cancels the execution of the tests.

Alternative Flow Ends
This flow ends.

2.6 The system fails

Alternative Flow Starts
This flow can occur in any step. When the system fails the data is not saved.

2.6.1 The actor restarts the system.

Alternative Flow Ends
This flow continues from step 1.1 from the basic flow.

The Use Case for the operation of running the tests for multiple files placed in a directory (running a batch job) is similar to the one presented above. The only difference is that steps 1.2 and 1.3 from the basic flow are missing and another step is introduced after step 1.5 from the basic flow, in which the actor chooses the directory that contains the files that need to be tested. Because of these changes the alternative flow 2.1 no longer exists, but two other alternative flows are introduced for the last step in the following situations: the actor does not choose any directory, or the actor chooses an empty directory. In these alternative flows the system shows a warning message, through which it alerts the actor that the tests were not performed because of one of the problems stated.

The Use Cases for the cancel tests and view graphical results operations are simple enough that they are not described in detail. They consist of only one step performed by the actor, asking the system to cancel the tests or to view the graphical results, and the response from the system, which cancels the tests or displays the graphical results. The rest of the Use Cases are part of the first Use Case presented above.

5.2.2 ParTestRand

The use cases for the ParTestRand application are shown in the use case diagram in Figure 5-2. The main use cases presented in Figure 5-2 are: running the selected tests for a single file, running a batch job and viewing graphical results. The rest of the activities that the user can perform, except for canceling tests, are used by the first or the second of these main use cases.

The Use Case for the ParTestRand application will not be detailed here, because it is similar to the SeqTestRand Use Case. The only difference is that in step 1.4 from the basic flow and in step 2.2.1 from the alternative flow 2.2, the actor also introduces the number of threads wanted. In step 2.2.1 from the alternative flow 2.2, the actor also chooses what he wants to be parallelized: the tests or the block sub-sequences.

5.2.3 SFC

The use cases for the SFC application are shown in the use case diagram presented in Figure 5-3. The main use cases presented in Figure 5-3 are: computing a function and generating a table. The rest of the use cases, except for clear data and view about, are part of these two main use cases.

Two Use Cases will be presented next: one for computing the value of a function and one for generating a table for the function.

Use Case 1: Computing the value of a function

Precondition: The actor has selected one of the functions from the calculator.

1. Basic Flow

Use Case Start
The use case starts when the actor wants to compute the value of a statistical function.

1.1 The actor inserts the parameters needed to compute the selected function.
1.2 The actor asks for the function to be computed.
1.3 The system validates the parameters and displays the result of the function.

Use Case End
The actor visualizes the results.

2. Alternative Flows

2.1 Incomplete information

Alternative Flow Starts
This flow can occur in step 1.3 as a result of not specifying the input data in step 1.1.

2.1.1 The system shows a warning message through which it alerts the actor that he didn’t provide the parameters for the function.
2.1.2 The actor provides the parameters for the function.

Alternative Flow Ends
The flow continues with step 1.2 from the basic flow.


2.2 Incorrect information

Alternative Flow Starts
This flow can occur in step 1.3 as a result of specifying incorrect information in step 1.1.

2.2.1 The system shows a warning message through which it alerts the actor that the parameters he provided are incorrect.
2.2.2 The actor corrects the values of the parameters.

Alternative Flow Ends
The flow continues with step 1.2 from the basic flow.

2.3 The system fails

Alternative Flow Starts
This flow can occur in any step. When the system fails the data is not saved.

2.3.1 The actor restarts the system.

Alternative Flow Ends
This flow continues from step 1.1 from the basic flow.

Use Case 2: Generating a table for the function

Precondition: The actor has selected one of the functions from the calculator and has entered the table generation mode.

1. Basic Flow

Use Case Start
The use case starts when the actor wants to generate a table with the values of the selected function for specified intervals of parameters.

1.1 The actor inserts the first and the last values of the interval, and the step by which the parameters are increased within this interval.
1.2 The actor introduces the path and the name of the file where the table will be saved.
1.3 The actor asks for the generation of the table.
1.4 The system validates the parameters, generates the table and informs the actor that the operation was successful.

Use Case End
The system goes back to the standard form of the calculator.

2. Alternative Flows

2.1 Incomplete information

Alternative flow begins
This flow can occur in step 1.4 as a result of not specifying the input data in one of the steps 1.1 and 1.2.
2.1.1 The system shows a warning message alerting the actor that not all the required information was provided.
2.1.2 The actor provides the required information.

Alternative Flow Ends
The flow continues with step 1.3 from the basic flow.

2.2 Incorrect parameters

Alternative flow begins
This flow can occur in step 1.4 as a result of introducing wrong values for the function parameters in step 1.1.
2.2.1 The system shows a warning message alerting the actor that the provided values are incorrect.
2.2.2 The actor corrects the values.

Alternative Flow Ends
The flow continues with step 1.3 from the basic flow.

2.3 Incorrect file

Alternative flow begins
This flow can occur in step 1.4 as a result of introducing a nonexistent path or a wrong file name in step 1.2.
2.3.1 The system does not generate the table and shows a warning message alerting the actor that the operation could not be performed.

Alternative Flow Ends
The system goes back to the standard form of the calculator.

2.4 The system fails

Alternative Flow Starts
This flow can occur in any step. When the system fails, the data is not saved.
2.4.1 The actor restarts the system.

Alternative Flow Ends
The flow continues from step 1.1 of the basic flow.

Figure 5-7. Use Case diagram for the ParTestRand application

Figure 5-8. Use Case diagram for the SFC application

5.3 Program Structure

5.3.1 SeqTestRand

SeqTestRand is a WPF application that integrates the sequential implementation of two statistical test batteries: NIST and CryptoRand. The NIST battery is implemented as a single application, SeqNIST, while the CryptoRand battery is implemented as two applications, because its tests fall into two categories: the Histogram tests (SeqHistograms) and the General Block Frequency test (SeqGenBlockFreq). The SeqNIST application is a byte-oriented, more efficient implementation of the NIST Statistical Test Suite. The two applications that implement the CryptoRand battery are an efficient implementation of some tests developed as part of the CryptoRand project. To sum up, the SeqTestRand application integrates three applications (SeqNIST, SeqHistograms and SeqGenBlockFreq) and provides a graphical user interface for them. The structure of each of these applications and the optimizations brought to each test are presented next. The only application that is not presented is SeqGenBlockFreq, because it was developed entirely by my team colleague and will be described by her. SeqTestRand was developed in C# and XAML, and the applications are incorporated as separate processes.

5.3.1.1 SeqNIST

The SeqNIST application was developed by me and integrates 14 statistical tests taken from the NIST documentation [8], plus another version of Maurer’s “Universal Statistical” Test (NIST test 9). Seven of these 15 tests were implemented by the other team member and the other 8 by me. The NIST documentation describes 16 statistical tests, but 2 of them were disregarded by NIST because of some problems found. The SeqNIST application is a Win32 console application developed in C.

The input file is read into a buffer for which space is allocated dynamically with the size of the file. The whole sequence is loaded into memory from the start, in an array of type unsigned char, in order to hold one byte from the input sequence in one position of the array. This will allow us to work on bytes, instead of working on bits.
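The loading step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name load_sequence and its signature are hypothetical and are not taken from SeqNIST; only standard C library calls are used.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole stream into a newly allocated
   buffer of unsigned char, one input byte per array position.
   Returns NULL on failure; on success *size holds the number of bytes. */
unsigned char *load_sequence(FILE *f, long *size)
{
    unsigned char *buf;
    long n;

    if (fseek(f, 0L, SEEK_END) != 0)
        return NULL;
    n = ftell(f);                 /* file size in bytes */
    if (n < 0)
        return NULL;
    rewind(f);

    /* allocate exactly the size of the file (at least 1 byte) */
    buf = (unsigned char *)malloc(n > 0 ? (size_t)n : 1);
    if (buf == NULL)
        return NULL;

    if (fread(buf, 1, (size_t)n, f) != (size_t)n) {
        free(buf);
        return NULL;
    }
    *size = n;
    return buf;
}
```

A caller would open the input file in binary mode ("rb"), pass the stream to load_sequence, and release the buffer with free once all tests have run.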

The application implements the four running modes presented in the design policies section for SeqTestRand and contains one function for each of the NIST tests implemented. The functions take as parameters the size of the sequence to be tested and a pointer to the beginning of the sequence, so that the same function can test the whole sequence or only a part of it. For the tests that have parameters, the values received for each test are also passed through the function parameters. Each of these functions is then called from the main section of the application.

The application also computes the processing time for each test and for each block sub-sequence if the testing mode is 2, 3 or 4, the total processing time, the time needed to read the input file, and the total execution time. The time is computed using the clock() function from the time.h library, in order to compute the time in the same manner as in the NIST implementation and to be able to compare the results. The statistical functions used to compute the P-value, igamc and erfc, are implemented according to the Cephes library.

The main increase in performance is due to shifting the paradigm from a bit sequence to a byte sequence, as shown in Figure 5-4. Each algorithm was adapted to work on bytes, by using different look-up tables or other efficient computations. In the next sections I will present the optimizations brought to each of the tests implemented by me. The other 7 tests will be described by my team colleague.

Figure 5-4. Shifting the paradigm from a bit sequence to a byte sequence (the bits buffer used by NIST holds the sequence in n entries; the bytes buffer used by SeqNIST holds it in n/8 entries)

1. NIST Test 1 – The Frequency (Monobit) Test

The test needs to compute the difference between the number of ones and the number of zeroes in the sequence. Instead of transforming each 0 bit into a -1 and then summing up all the bits in the sequence, as in the NIST implementation, I compute only the number of ones, since the number of zeroes is its complement with respect to n. In order to compute the number of ones, I use a look-up table of 256 entries that holds the number of ones in each possible byte value. Each byte from the input sequence is used as an index into the look-up table and the corresponding value is added to the total number of ones, as shown in Figure 5-5.

Figure 5-5. Optimization for the Frequency (Monobit) Test

This test has no parameters and can be called from the command line with the following argument: t1. An example of this new version of the Frequency Test will be presented next, in order to show the new form of the algorithm. The steps presented correspond to the steps used in the test description from the Theoretical Foundations sub-section 3.1.1.1.

Input: ε = 10110101 00110110 10101110 11001100, n = 32 bits (4 bytes).
The buffer holds the unsigned chars with the following byte values: 181, 54, 174, 204.

Step 1
In the following, noOfOnes is the look-up table holding the number of ones for each byte value and noOnes represents the total number of ones in the sequence.
noOnes = 0.
noOnes = noOnes + noOfOnes[181] = 0 + 5 = 5.
noOnes = noOnes + noOfOnes[54] = 5 + 4 = 9.
noOnes = noOnes + noOfOnes[174] = 9 + 5 = 14.
noOnes = noOnes + noOfOnes[204] = 14 + 4 = 18.
The number of zeroes is equal to n – 18 = 32 – 18 = 14.
The difference between the number of ones and the number of zeroes is therefore 18 – 14 = 4, which corresponds to the variable denoted Sn in the description of the test from section 3.1.1.1.

The rest of the steps that are performed are also described in section 3.1.1.1. As can be seen in the example, only 4 additions and a subtraction are needed in this new version of the algorithm to obtain the number of ones and the number of zeroes in the sequence. In the NIST implementation this step was performed by using 14 transformations from 0 to -1 and 32 additions.
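The look-up table approach of the Frequency (Monobit) Test can be sketched in C as below. The table name noOfOnes follows the text; the function names init_noOfOnes, count_ones and partial_sum are assumptions made for this illustration and are not taken from the SeqNIST sources.

```c
#include <stddef.h>

unsigned char noOfOnes[256];   /* number of 1 bits in each byte value */

/* Fills the table: the bit count of v is the bit count of v / 2
   plus the lowest bit of v. */
void init_noOfOnes(void)
{
    int v;
    noOfOnes[0] = 0;
    for (v = 1; v < 256; v++)
        noOfOnes[v] = (unsigned char)(noOfOnes[v >> 1] + (v & 1));
}

/* One table look-up and one addition per byte of the sequence. */
long count_ones(const unsigned char *buf, long n_bytes)
{
    long i, noOnes = 0;
    for (i = 0; i < n_bytes; i++)
        noOnes += noOfOnes[buf[i]];
    return noOnes;
}

/* S_n = (#ones) - (#zeroes) = 2 * (#ones) - n, with n = 8 * n_bytes bits. */
long partial_sum(const unsigned char *buf, long n_bytes)
{
    return 2 * count_ones(buf, n_bytes) - 8 * n_bytes;
}
```

For the example sequence 181, 54, 174, 204 this sketch gives 18 ones and a partial sum of 4, matching the values computed by hand above.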

2. NIST Test 2 – Frequency Test within a Block

This test is similar to the first test, but this time the sequence is divided into blocks and the difference between the number of ones and the number of zeroes is computed for each block. The same optimization is applied as for the first test, except that the number of ones is now computed per block. Therefore, the look-up table from the first test can also be used here, but only the values for the bytes in a block are summed up, as shown in Figure 5-6.

Figure 5-6. Optimization for the Frequency Test within a Block

In order to work on bytes, the size of the block must be given by the user in bytes. Because the length of the block in bytes is stored using the long type, this implementation permits greater block lengths than the NIST implementation, which uses the int type to store the length of the block in bits. Even though the size of the int type is platform dependent, on most existing 64-bit processors it still has 32 bits. Therefore the maximum block size allowed in the NIST implementation is 2^31 bits, while for my implementation it is 8 times greater.

This test can be called from the command line with the following arguments: t2 <block_length>. An example of this new version of the Frequency Test within a Block will be presented next, in order to show the new form of the algorithm. The steps presented correspond to the steps used in the test description from the Theoretical Foundations sub-section 3.1.1.2.

Input: ε = 10110101 00110110 10101110 11001100, n = 32 bits (4 bytes), M = 2 bytes.
The buffer holds the unsigned chars with the following byte values: 181, 54, 174, 204.

Step 1
N = 2, and the blocks are: 10110101 00110110 and 10101110 11001100.


Step 2
In the following, noOfOnes is the look-up table holding the number of ones for each byte value. The number of ones in the first block is computed as noOfOnes[181] + noOfOnes[54] = 5 + 4 = 9, while the number of ones in the second block is computed as noOfOnes[174] + noOfOnes[204] = 5 + 4 = 9. With M = 16 bits per block, the proportions of ones are:

$\pi_1 = \frac{9}{16}, \quad \pi_2 = \frac{9}{16}$

The rest of the steps that are performed are the ones described in section 3.1.1.2.
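A minimal sketch of the per-block counting, assuming the block length M is given in bytes as described above. The function name ones_per_block and the convention of ignoring trailing bytes that do not fill a block are assumptions of this illustration.

```c
#include <stddef.h>

static unsigned char noOfOnes[256];   /* number of 1 bits in each byte value */

void init_noOfOnes(void)
{
    int v;
    noOfOnes[0] = 0;
    for (v = 1; v < 256; v++)
        noOfOnes[v] = (unsigned char)(noOfOnes[v >> 1] + (v & 1));
}

/* Stores in ones[i] the number of ones of block i (M bytes each) and
   returns N, the number of complete blocks. Bytes after the last
   complete block are ignored. */
long ones_per_block(const unsigned char *buf, long n_bytes, long M, long *ones)
{
    long N = n_bytes / M;
    long i, j;

    for (i = 0; i < N; i++) {
        long sum = 0;
        for (j = 0; j < M; j++)
            sum += noOfOnes[buf[i * M + j]];
        ones[i] = sum;
    }
    return N;
}
```

The proportion of ones in block i is then ones[i] divided by the block length in bits, 8 * M.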

3. NIST Test 3 – Runs Test

This test has as prerequisite the Frequency test, meaning that if the condition imposed by the first test is passed, then the Runs test is performed, otherwise the P-value returned by this test is 0. For this step, the number of ones is computed in the same manner as for the Frequency test.

The test also computes the number of runs in the input sequence. The algorithm proposed by NIST in the documentation traverses the input sequence bit by bit, compares every two adjacent bits, and increments the total number of runs whenever the two bits are different. In my implementation, two look-up tables are used to compute the number of runs. The first one has 256 entries and contains the number of runs in each possible byte value. The second one is a matrix with 2 rows and 256 columns and contains the first and the last bit of each possible byte value. Each byte in the input sequence is used as an index into the first look-up table and the corresponding value is added to the total number of runs. For every two adjacent bytes processed, the second look-up table is used to compare the last bit of the first byte with the first bit of the second byte. This comparison determines whether the two adjacent bytes share the same run. If this is the case, 1 is subtracted from the total number of runs, in order to obtain the correct number of runs. The way these two look-up tables are used is shown in Figure 5-7.

Figure 5-7. Optimization for the Runs Test

This test can be called from the command line with the following argument: t3. An example of this new version of the Runs Test will be presented next, in order to show the new form of the algorithm. The steps presented correspond to the steps used in the test description from the Theoretical Foundations sub-section 3.1.1.3.


Input: ε = 10110101 00110110 10101110 00111100, n = 32 bits (4 bytes).
The buffer holds the unsigned chars with the following byte values: 181, 54, 174, 60.

Step 1
The proportion of ones in the sequence is $\pi = \frac{5 + 4 + 5 + 4}{32} = \frac{18}{32} = 0.5625$.

Step 2
$|\pi - 1/2| = 0.0625 < \tau = 2/\sqrt{n} \approx 0.3536$, therefore the test is run.

Step 3
The number of runs (Vobs) is computed using the nrOfRuns look-up table, holding the number of runs in each possible byte value, and the FaLB (First and Last Bit) matrix, which has on the first row the first bit of each possible byte and on the second row the last bit of the same byte.

Vobs = nrOfRuns[181] + nrOfRuns[54] = 7 + 5 = 12.
FaLB[1][181] ≠ FaLB[0][54].
Vobs = Vobs + nrOfRuns[174] = 12 + 6 = 18.
FaLB[1][54] ≠ FaLB[0][174].
Vobs = Vobs + nrOfRuns[60] = 18 + 3 = 21.
FaLB[1][174] = FaLB[0][60] => Vobs = Vobs – 1 = 20.
The rest of the steps that are performed are the ones described in section 3.1.1.3.
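The two look-up tables and the way they are combined can be sketched as follows. The names nrOfRuns and FaLB follow the text; init_runs_tables and count_runs are hypothetical names, and the code is an illustration rather than the SeqNIST source.

```c
#include <stddef.h>

unsigned char nrOfRuns[256];   /* number of runs inside each byte value */
unsigned char FaLB[2][256];    /* row 0: first (most significant) bit,
                                  row 1: last (least significant) bit */

void init_runs_tables(void)
{
    int v, b;
    for (v = 0; v < 256; v++) {
        int runs = 1;
        /* a new run starts wherever two neighbouring bits differ */
        for (b = 7; b > 0; b--)
            if (((v >> b) & 1) != ((v >> (b - 1)) & 1))
                runs++;
        nrOfRuns[v] = (unsigned char)runs;
        FaLB[0][v] = (unsigned char)((v >> 7) & 1);
        FaLB[1][v] = (unsigned char)(v & 1);
    }
}

/* Total number of runs (V_obs) in the first n_bytes bytes of buf. */
long count_runs(const unsigned char *buf, long n_bytes)
{
    long i, vobs;

    if (n_bytes <= 0)
        return 0;
    vobs = nrOfRuns[buf[0]];
    for (i = 1; i < n_bytes; i++) {
        vobs += nrOfRuns[buf[i]];
        /* adjacent bytes share a run when the last bit of the previous
           byte equals the first bit of the current one */
        if (FaLB[1][buf[i - 1]] == FaLB[0][buf[i]])
            vobs--;
    }
    return vobs;
}
```

For the example bytes 181, 54, 174, 60 the sketch returns 20, the value of Vobs obtained in Step 3.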

4. NIST Test 4 – Test for the Longest Run of Ones in a Block

This test divides the input sequence into blocks of length M and searches, bit by bit, for the maximum run of ones in each block. The frequencies of the longest runs of ones are then tabulated into categories. In order to work on bytes, my implementation uses a block length of 8 bits (1 byte) and a look-up table of 256 entries containing the maximum run of ones in each possible byte value. The longest run of ones in each block is now easy to find, because it is the value from the look-up table whose index is equal to the value of the current byte (block). In order to compute the frequencies of the longest runs of ones, we just take the value from the look-up table, find the category it belongs to and increment the frequency of that category, as shown in Figure 5-8.

Figure 5-8. Optimization for the Longest Run of Ones in a Block

[Figure 5-8 shows: the bytes buffer used in SeqNIST, the look-up table holding the maximum run of ones in each byte, and the frequencies of the longest runs of ones in each block, tabulated into the categories F≤1, F2, F3 and F≥4.]

This test can be called from the command line with the following argument: t4. An example of this new version of the Test for the Longest Run of Ones in a Block will be presented next, in order to present the new form of the algorithm. The steps presented correspond to the steps used in the test description from the Theoretical Foundations sub-section 3.1.1.4.

Input: ε = 11001100 00010101 01111100 01011100 11100000, n = 40 bits (5 bytes), M = 8 bits (1 byte).

The buffer holds the unsigned chars with the following ASCII codes: 204, 21, 124, 92, 224.

Step 1
N = 5, and the blocks with the maximum run of ones computed with the help of the maxRunOfOnes look-up table are the following ones:
11001100 – maxRunOfOnes[204] = 2
00010101 – maxRunOfOnes[21] = 1
01111100 – maxRunOfOnes[124] = 5
01011100 – maxRunOfOnes[92] = 3
11100000 – maxRunOfOnes[224] = 3
The rest of the steps that are performed are the ones described in section 3.1.1.4.
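The table look-up and the tabulation into categories can be sketched as below, assuming the four categories (longest run ≤ 1, = 2, = 3, ≥ 4) that NIST uses for M = 8. The name maxRunOfOnes follows the text; the function names are hypothetical.

```c
#include <stddef.h>

unsigned char maxRunOfOnes[256];   /* longest run of ones in each byte value */

void init_maxRunOfOnes(void)
{
    int v, b;
    for (v = 0; v < 256; v++) {
        int cur = 0, best = 0;
        for (b = 7; b >= 0; b--) {
            if ((v >> b) & 1) {
                cur++;
                if (cur > best)
                    best = cur;
            } else {
                cur = 0;
            }
        }
        maxRunOfOnes[v] = (unsigned char)best;
    }
}

/* freq[0]: longest run <= 1, freq[1]: = 2, freq[2]: = 3, freq[3]: >= 4.
   Every byte of buf is treated as one block of M = 8 bits. */
void longest_run_frequencies(const unsigned char *buf, long n_bytes, long freq[4])
{
    long i;

    freq[0] = freq[1] = freq[2] = freq[3] = 0;
    for (i = 0; i < n_bytes; i++) {
        int r = maxRunOfOnes[buf[i]];
        if (r <= 1)
            freq[0]++;
        else if (r >= 4)
            freq[3]++;
        else
            freq[r - 1]++;
    }
}
```

For the example bytes 204, 21, 124, 92, 224 the longest runs are 2, 1, 5, 3, 3, so the four category counts come out as 1, 1, 2 and 1.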

5. NIST Test 7 – Non-overlapping Template Matching Test

This test uses an m-bit window to search for a specific pattern in the input sequence. If the pattern is found, the window slides right m bit positions, otherwise it slides only one bit position. In order to work on bytes, the length of the pattern is taken to be 8 bits (1 byte). For searching the template in the input sequence, the first two bytes are put into an unsigned short int variable. The template is compared with the first byte of this variable, by performing logical operations. If they are equal, the variable is shifted left 8 bits and another byte from the input sequence is added to it by using logical operators. If the template and the first byte are different, the unsigned short int variable is shifted left only one bit. Whenever there are 8 bits free in the variable, another byte from the input sequence is added to it. This process is repeated until there are no more bytes left in the input sequence. The algorithm presented is also shown in Figure 5-9.

Figure 5-9. Optimization for the Non-overlapping Template Matching Test

The test divides the sequence into blocks and then computes the number of appearances of the pattern in each block. Unlike the NIST implementation, my version of the test takes as parameters the length of the block and a file containing the wanted templates. In the NIST implementation the length of the block is hardcoded, but the length of the template can be chosen. However, the documentation states that any block length can be used if some conditions hold (my implementation checks for these conditions).

This test can be called from the command line with the following arguments: t7 <block_size> <templates_file>. An example of this new version of the Non-overlapping Template Matching Test will be presented next, in order to present the new form of the algorithm. The steps presented correspond to the steps used in the test description from the Theoretical Foundations sub-section 3.1.1.5.

Input: ε = 11001100 00010101 01111100 01011100 11100000, n = 40 bits (5 bytes), M = 3 bytes, B = 11000001.

Step 1
N = 1
The single block is: 11001100 00010101 01111100.

Step 2
The unsigned short int variable will be denoted temp because it is a temporary variable. The number of occurrences of the template in each block (in this case in the only block) is computed in the following way:

temp = 11001100 00010101
11001100 ≠ 11000001 => temp << 1 = 10011000 00101010
10011000 ≠ 11000001 => temp << 1 = 00110000 01010100
00110000 ≠ 11000001 => temp << 1 = 01100000 10101000
01100000 ≠ 11000001 => temp << 1 = 11000001 01010000
11000001 = 11000001 => the template is found. In order to shift left 8 bits, temp is cleared, the byte previously taken from the input sequence and the current byte are placed in it (temp = 00010101 01111100), and the variable is shifted left by the total number of bits it has been shifted until now (4) => temp = 01010111 11000000

The search is performed again and the short int variable is shifted left 1 bit at each step. The template is not found anymore and the search stops when there are fewer than 8 original bits left in the variable. So for this example W1 = 1.

The rest of the steps that are performed are the ones described in section 3.1.1.5.
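The search for an 8-bit template inside one block can be sketched as below. Note that this illustration tracks an explicit bit position and rebuilds the two-byte window at every step, instead of shifting and refilling a single unsigned short int variable as the implementation described above does; the sliding rule (8 bit positions after a match, 1 otherwise) is the same. The function name count_nonoverlapping is hypothetical.

```c
#include <stddef.h>

/* Counts the non-overlapping occurrences of the 8-bit template tmpl
   in a block of n_bytes bytes. */
long count_nonoverlapping(const unsigned char *block, long n_bytes,
                          unsigned char tmpl)
{
    long count = 0;
    long nbits = 8 * n_bytes;
    long p = 0;                       /* bit position of the window start */

    while (p + 8 <= nbits) {
        long byte = p / 8;
        int off = (int)(p % 8);
        /* two adjacent bytes, as held by the 16-bit variable */
        unsigned temp = (unsigned)block[byte] << 8;
        if (byte + 1 < n_bytes)
            temp |= block[byte + 1];
        /* the 8 bits starting at offset off inside the 16-bit window */
        if (((temp >> (8 - off)) & 0xFFu) == tmpl) {
            count++;
            p += 8;                   /* template found: slide 8 bits */
        } else {
            p += 1;                   /* not found: slide 1 bit */
        }
    }
    return count;
}
```

For the block 11001100 00010101 01111100 and the template 11000001 the sketch finds the single occurrence starting at bit position 4, so it returns 1, the value of W1 from the example.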

6. NIST Test 8 – Overlapping Template Matching Test

This test is similar to the 7th test because it uses the same m-bit window to search for a pattern in the input sequence. This time, however, the pattern is a run of ones and the window slides right one bit position whether or not the pattern is found. The length of the template used in my implementation is 9 bits, as recommended by NIST. Instead of working on bits, the same optimization is applied as for test 7, using the unsigned short int variable to hold two bytes from the input sequence at a time. In order to compare the run of ones with the first 9 bits of the unsigned short int variable, the latter is compared with the value 65408 (FF80 H), which has ones on the first nine positions. If the variable is greater than or equal to this value then the pattern is found, otherwise it is not. At each step the variable is shifted left 1 bit and whenever there is enough space another byte from the input sequence is added to it. The process described is shown in Figure 5-10.

As the Non-overlapping Template Matching Test does, this test also divides the sequence into sub-sequences and counts the number of appearances of the pattern in each block. The length of the block is hardcoded this time and is of 129 bytes.

This test can be called from the command line with the following argument: t8. An example of this new version of the Overlapping Template Matching Test will be shown next, in order to present the new form of the algorithm. The steps presented correspond to the steps used in the test description from the Theoretical Foundations sub-section 3.1.1.6.

Figure 5-10. Optimization for the Overlapping Template Matching Test

Input: ε = 11001100 00010101 01111100 01011100 11100000, n = 40 bits (5 bytes), B = 111111111, K = 5, M = 3 bytes (for the purpose of this example M is chosen to be 3; in the implementation M is hardcoded to 129).

Step 1
N = 1
The single block is: 11001100 00010101 01111100.

Step 2
The unsigned short int variable will be denoted temp because it is a temporary variable. The number of occurrences of the template in each block (in this case in the only block) is computed in the following way.

temp = 11001100 00010101
At each step temp is compared with the value 11111111 10000000. If temp is greater than or equal to this value, a counter is incremented; in either case temp is then shifted left one bit. After the eighth shift temp looks like this: temp = 00010101 00000000. Now a new byte from the input sequence can be placed at the end of the variable, which becomes temp = 00010101 01111100. The search goes on until there are fewer than 8 original bits left in temp. For this example the count is 0. The rest of the steps that are performed are the ones described in section 3.1.1.6.
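The comparison with FF80 H can be sketched as below. As a simplification, this illustration rebuilds the 16-bit window from two adjacent bytes at every bit position instead of shifting and refilling one variable, and it stops when fewer than 9 bits are left in the block; both choices, as well as the name count_overlapping_ones, are assumptions of the sketch.

```c
#include <stddef.h>

/* Counts the (possibly overlapping) occurrences of a run of 9 ones
   in a block of n_bytes bytes. */
long count_overlapping_ones(const unsigned char *block, long n_bytes)
{
    long count = 0;
    long nbits = 8 * n_bytes;
    long p;

    for (p = 0; p + 9 <= nbits; p++) {
        long byte = p / 8;
        int off = (int)(p % 8);
        /* p + 9 <= nbits guarantees that block[byte + 1] exists */
        unsigned temp = ((unsigned)block[byte] << 8) | block[byte + 1];
        /* bring the window start to the top of the 16-bit value */
        temp = (temp << off) & 0xFFFFu;
        /* the first nine bits are all ones exactly when temp >= FF80 H */
        if (temp >= 0xFF80u)
            count++;
    }
    return count;
}
```

For the block 11001100 00010101 01111100 no run of nine ones exists and the sketch returns 0, in agreement with the example.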

7. NIST Test 9 – Maurer’s “Universal Statistical” Test

This test computes a table holding the last appearance of each possible L-bit value in the input sequence. Because the indexes of this table are base 10 numbers, the L-bit value taken from the input sequence needs to be transformed from base 2 to base 10, by using multiplications and additions. One of the optimizations brought to this test is that in my implementation the transformation from base 2 to base 10 is not needed, because L was chosen to be 8 and the transformation is already made by storing each byte from the sequence as an unsigned char value. The last occurrence of each possible L-bit value is now kept in an array of 256 entries, where each entry corresponds to a byte value. For each byte taken from the input sequence, the corresponding entry in the table is updated with the position of that byte. The process described is shown in Figure 5-11.


Figure 5-11. Optimization for Maurer’s “Universal Statistical” Test

For this version of the test the formulas presented in the NIST documentation are used to compute the expected value and the variance. This test can be called from the command line with the following argument: t9m. No example will be presented for this test because the initialization sequence alone must have Q = 2^L = 2^8 = 256 bits (32 bytes), so exemplifying the algorithm for each block would take too much time.
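The table of last occurrences for L = 8 can be sketched as follows. The sketch follows the general scheme of Maurer's test (Q initialization blocks followed by K test blocks, with the table initialized to 0) and only records the distances between successive occurrences; taking the base-2 logarithm of each distance and averaging over K, as the test statistic requires, is left to the caller. The function name maurer_distances is hypothetical.

```c
#include <stddef.h>

/* buf holds Q + K bytes (L = 8, so every byte is one block).
   dist[j] receives, for the j-th test block, the distance to the
   previous occurrence of the same byte value. Positions are 1-based;
   a value that was never seen before has last position 0. */
void maurer_distances(const unsigned char *buf, long Q, long K, long *dist)
{
    long last[256] = {0};   /* last position of each byte value */
    long i;

    /* initialization segment: only fill the table */
    for (i = 1; i <= Q; i++)
        last[buf[i - 1]] = i;

    /* test segment: record the distance, then update the table */
    for (i = Q + 1; i <= Q + K; i++) {
        dist[i - Q - 1] = i - last[buf[i - 1]];
        last[buf[i - 1]] = i;
    }
}
```

No base 2 to base 10 conversion appears in the sketch because the byte value itself is used directly as the table index.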

8. NIST Test 9 – Maurer’s “Universal Statistical” Test – Coron’s Version

This test uses the same algorithm as the version presented before, and therefore the same optimizations. However, Coron does not use the minimum distance between two successive reoccurrences of the same block, but generalizes the test’s parameter to any function of the minimum distance between similar blocks. Therefore in this implementation another function is used. If i < 23 the function used is computed using the following formula:

.

If i takes values greater than 23, it is considered to be too large and the sum is approximated using the following formula:

.

The values used for the heuristic approximation and the variance are also changed according to Coron’s formulas.

5.3.1.2 SeqHistograms

This application was developed by the other team member, while I provided only the sequential implementation for 4 Histogram tests (H5, H6, H7, and H8). For this reason, I will present only the efficient implementation of these four Histogram tests, while the rest will be presented by my colleague. Each test corresponds to a function from the program, which takes as parameters the size of the sequence to be tested and a pointer to the beginning of the sequence, so that the same function can test the whole sequence or only a part of it.

1. Histogram Test 5(bits)

This Histogram test computes the frequencies of appearance of all the possible 5-bit values in the input sequence. In order to work on bytes, an unsigned long variable is used, initialized with the first 4 bytes from the input sequence. The variable is then divided into non-overlapping blocks of 5 bits. The values of these blocks are obtained by applying masks to the variable (each mask contains 5 ones) and performing some logical operations. Six of the masks are applied to the first 30 bits of the unsigned long variable, after which the variable is shifted left 8 bits and another byte from the input sequence is added. For the 10 bits that remain unchecked, 2 more masks are applied. The whole process is repeated from the start until all the bytes from the input sequence are processed. The frequencies are computed by incrementing the entry of the frequency array for the current 5-bit value after applying each mask. The algorithm is shown on an example in Figure 5-12.

[Figure 5-12 shows a worked example on the bytes buffer 4, 3, 9, 8, ...: the variable holds 00000100 00000011 00001001 00001000; the 1st mask, 11111000 00000000 00000000 00000000, is followed by a shift right of 27 bits, and the 2nd to 6th masks move the five ones 5 bits to the right each time, with shifts of 22, 17, 12, 7 and 2 bits. These six steps increment frequency[0], frequency[16], frequency[1], frequency[16], frequency[18] and frequency[2]. After the variable is shifted left 8 bits and the next byte is added, the 7th mask (00000000 00000000 00000011 11100000, shift right of 5 bits) and the 8th mask (00000000 00000000 00000000 00011111, no shift) are applied to the remaining 10 bits.]

Figure 5-12. Algorithm of the Histogram Test 5(bits)

2. Histogram Test 6(bits)

This Histogram test computes the frequencies of appearance of all the possible 6-bit values in the input sequence. In order to work on bytes, an unsigned long variable is used, whose last 3 byte positions hold 3 bytes from the input sequence. The variable is divided into non-overlapping blocks of 6 bits. The values of these blocks are obtained by applying masks to the variable (each mask contains 6 ones) and performing some logical operations. Four masks are used, applied to the last 24 bits of the unsigned long variable. The variable is filled with three bytes from the input sequence at a time, until no more are left. The frequencies are computed by incrementing the entry of the frequency array for the current 6-bit value after applying each mask. The algorithm is shown on an example in Figure 5-13.

[Figure 5-13 shows a worked example on the bytes buffer 4, 3, 9, ...: the variable holds 00000000 00000100 00000011 00001001; the four masks 00000000 11111100 00000000 00000000 (shift right of 18 bits), 00000000 00000011 11110000 00000000 (shift right of 12 bits), 00000000 00000000 00001111 11000000 (shift right of 6 bits) and 00000000 00000000 00000000 00111111 (no shift) increment frequency[1], frequency[0], frequency[12] and frequency[9].]

Figure 5-13. Algorithm of the Histogram Test 6(bits)

3. Histogram Test 7(bits)

This Histogram test computes the frequencies of appearance of all the possible 7-bit values in the input sequence. In order to work on bytes, an unsigned long variable is used, initialized with the first 4 bytes from the input sequence. The variable is then divided into non-overlapping blocks of 7 bits. The values of these blocks are obtained by applying masks to the variable (each mask contains 7 ones) and performing some logical operations. Four of the masks are applied to the first 28 bits of the unsigned long variable, after which the variable is shifted left 24 bits and another 3 bytes from the input sequence are added. For the 28 bits that remain unchecked, 4 more masks are applied. The whole process is repeated from the start until all the bytes from the input sequence are processed. The frequencies are computed by incrementing the entry of the frequency array for the current 7-bit value after applying each mask. The algorithm is shown on an example in Figure 5-14.
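The 5-, 6- and 7-bit Histogram tests all extract consecutive non-overlapping k-bit values from a byte buffer. The sketch below shows one generic way to do this with a small bit accumulator. It is not the mask-based implementation described above (which uses a 32-bit variable and a fixed set of masks for each width), but it produces the same frequencies; the function name histogram_bits and its parameters are assumptions of this illustration.

```c
#include <stddef.h>

/* Increments freq[v] for every consecutive non-overlapping value v of
   `width` bits (1 <= width <= 8) found in buf. freq must have 2^width
   entries, zeroed by the caller. Trailing bits that do not fill a
   complete value are ignored. */
void histogram_bits(const unsigned char *buf, long n_bytes, int width,
                    long *freq)
{
    unsigned long acc = 0;                 /* bits not yet consumed */
    int bits = 0;                          /* number of valid bits in acc */
    unsigned long mask = (1UL << width) - 1;
    long i;

    for (i = 0; i < n_bytes; i++) {
        acc = (acc << 8) | buf[i];         /* append the next byte */
        bits += 8;
        while (bits >= width) {
            bits -= width;
            freq[(acc >> bits) & mask]++;  /* topmost unconsumed value */
        }
        acc &= (1UL << bits) - 1;          /* keep only the leftover bits */
    }
}
```

With width 5 and the bytes 4, 3, 9, 8 the sketch increments the entries 0, 16, 1, 16, 18 and 2, and with width 7 and the bytes 4, 3, 9, 8, 1, 3, 5 it increments the entries 2, 0, 97, 16, 64, 4, 6 and 5, which are the values appearing in the worked examples of Figures 5-12 and 5-14.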

[Figure 5-14 shows a worked example on the bytes buffer 4, 3, 9, 8, 1, 3, 5, ...: the variable first holds 00000100 00000011 00001001 00001000, and the 1st to 4th masks (seven ones each, with shifts right of 25, 18, 11 and 4 bits) increment frequency[2], frequency[0], frequency[97] and frequency[16]. After the variable is shifted left 24 bits and 3 more bytes are added, it holds 00001000 00000001 00000011 00000101, and the 5th to 8th masks (shifts right of 21, 14, 7 and 0 bits) increment frequency[64], frequency[4], frequency[6] and frequency[5].]

Figure 5-14. Algorithm of the Histogram Test 7(bits)

4. Histogram Test 8(bits)

This Histogram test computes the frequencies of appearance of all the possible 8-bit values in the input sequence. Because the size of the template is equal to one byte, the frequency of each 8-bit value can be computed directly, by incrementing the entry of the frequency table whose index is the current byte taken from the input sequence. The algorithm used is shown on an example in Figure 5-15.

Figure 5-15. Algorithm of the Histogram Test 8(bits)

5.3.2 ParTestRand

ParTestRand is a WPF application, similar to the SeqTestRand application, which incorporates the parallel version of the two statistical batteries, NIST and CryptoRand. There are two parallel versions of the NIST battery, implemented as two applications: ParNIST and ParNIST_blocks. In one of these applications the tests are parallelized internally, meaning that each algorithm is parallelized, while in the other application the sequential version of the algorithm is used and the parallelization is applied to the block sub-sequences created in the testing modes 2, 3 and 4. The CryptoRand battery is implemented as three applications: two parallel versions of the Histogram tests (ParHistograms and ParHistograms_blocks), for the same reasons stated for NIST, and a parallel version of the General Block Frequency test (ParGenBlockFreq). ParTestRand was developed in C# and XAML, and the applications are incorporated as separate processes.

The ParNIST application represents the parallelization of the SeqNIST application. It is therefore a more efficient, byte-oriented implementation of the NIST Statistical Test Suite that also takes advantage of the processing power offered by multi-core architectures. The ParNIST_blocks application offers another parallel version of SeqNIST, which can be used when the sequence is divided into many sub-blocks in one of the testing modes 2, 3 or 4. The three applications that implement the parallel version of the CryptoRand battery represent an efficient parallel implementation of some tests developed as part of the CryptoRand project. To sum up, the ParTestRand application integrates 5 applications and provides a graphical user interface for them.

Each of these applications was parallelized using OpenMP directives. The structure of the ParNIST and ParNIST_blocks applications will be presented next, as well as the parallelization techniques used for 6 of the NIST tests. The applications representing the CryptoRand battery were developed by the other team member, as was the parallelization of 3 NIST tests. The sequential versions of the tests used in the ParNIST_blocks and ParHistograms_blocks applications were divided in the same manner as for SeqNIST and SeqHistograms (as described in sections 5.3.1.1 and 5.3.1.2 respectively). Therefore my only contribution to the parallel version of the CryptoRand battery was the sequential implementation of the Histogram tests 5 to 8 used in the ParHistograms_blocks application.

5.3.2.1 ParNIST

The ParNIST application was developed by me and integrates the parallel version of 9 statistical tests taken from the NIST documentation [8]. Of the 14 tests implemented in the sequential version of this application only 9 could be parallelized; 6 of these were implemented by me and 3 by the other team member. ParNIST is a Win32 console application and was developed in C.

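(Content of Figure 5-15: the bytes buffer 4 3 255 … 9, where each byte increments its own entry of the frequency table: frequency[4]++, frequency[3]++, frequency[255]++, ..., frequency[9]++.)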


The structure of this application is similar to that of the SeqNIST application, with the difference that the number of threads is now taken into consideration and the algorithm in each function (each test) is parallelized. Each function takes the same parameters as before plus one more, the number of threads given by the user. The same bytes buffer is used and the same optimizations are applied by shifting the paradigm from a bit sequence to a byte sequence. The processing times computed in the SeqNIST application are computed here in the same manner, so that a comparison with the NIST implementation can be made. The optimizations described in section 3.1.1.1 for each test also apply here. In the next sections I will present the parallelization techniques applied to each of the tests implemented by me. The other 3 tests will be described by my team colleague.

1. NIST Test 1 – The Frequency (Monobit) Test

The test computes the difference between the number of ones and the number of zeroes in the sequence. The same lookup table is used in order to compute the number of ones as in the SeqNIST implementation. In order to parallelize the algorithm used here, the tested sequence is divided into equal chunks, each of which is assigned to a thread from the team. Each thread computes the number of ones for the corresponding chunk and reduction is applied to the variable holding the total number of ones. The OpenMP directive used is parallel for.
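A minimal sketch of this scheme in C with OpenMP is given below. It is not the ParNIST source: the function name and signature are assumptions, and a small bit-counting helper stands in for the lookup table, whose contents are not reproduced here.

```c
#include <stdint.h>

/* Number of one bits in a byte; stand-in for the lookup table
   mentioned in the text. */
static int ones_in_byte(uint8_t b)
{
    int c = 0;
    while (b) { c += b & 1; b >>= 1; }
    return c;
}

/* Hypothetical signature: returns (#ones - #zeroes) over n bytes. */
long monobit_sum(const uint8_t *buf, long n, int nthreads)
{
    long ones = 0;
    long i;
    (void)nthreads;   /* unused if the file is compiled without OpenMP */

    /* Static scheduling gives each thread one contiguous chunk; the
       reduction clause adds the per-thread counts into `ones`. */
    #pragma omp parallel for num_threads(nthreads) reduction(+:ones) schedule(static)
    for (i = 0; i < n; i++)
        ones += ones_in_byte(buf[i]);

    return 2 * ones - 8 * n;   /* ones - zeroes, with 8n bits in total */
}
```

Compiled without OpenMP support the pragma is ignored and the loop runs sequentially with the same result.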

2. NIST Test 2 – Frequency Test within a Block

This test divides the tested sequence into blocks and computes the proportion of ones in each block. In order to compute the number of ones in each block the same lookup table is used as in the SeqNIST implementation. In the parallel version of the algorithm an equal number of blocks is assigned to each thread, meaning that the chunk size is obtained by dividing the number of blocks by the given number of threads. Each thread then computes the proportion of ones for the blocks assigned to it and accumulates a sum over these blocks. The sum depends on the proportion of ones computed for each block, and the total is obtained by applying reduction to it. The OpenMP directive used is parallel for.
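The sketch below illustrates the block-level reduction, assuming the chi-square statistic of the NIST documentation, 4M times the sum of (pi_i - 1/2)^2 over all blocks, with M the block length in bits. The function name, the signature and the bit-counting helper are hypothetical; the P-value computation is left out.

```c
#include <stdint.h>

/* Number of one bits in a byte; stand-in for the lookup table. */
static int ones_in_byte(uint8_t b)
{
    int c = 0;
    while (b) { c += b & 1; b >>= 1; }
    return c;
}

/* Hypothetical signature: block_bytes is the block length in bytes.
   Returns 4M * sum_i (pi_i - 1/2)^2 with M = 8 * block_bytes. */
double block_frequency_chi2(const uint8_t *buf, long n, long block_bytes,
                            int nthreads)
{
    long nblocks = n / block_bytes;   /* a trailing partial block is discarded */
    double sum = 0.0;
    long b;
    (void)nthreads;

    #pragma omp parallel for num_threads(nthreads) reduction(+:sum) schedule(static)
    for (b = 0; b < nblocks; b++) {
        long ones = 0;
        long i;
        double pi;

        for (i = 0; i < block_bytes; i++)
            ones += ones_in_byte(buf[b * block_bytes + i]);
        pi = (double)ones / (8.0 * block_bytes);   /* proportion of ones */
        sum += (pi - 0.5) * (pi - 0.5);
    }
    return 4.0 * 8.0 * block_bytes * sum;
}
```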

3. NIST Test 3 – Runs Test

The Runs Test computes the number of runs in the sequence. The same lookup tables as in the SeqNIST implementation are used in order to compute the total number of runs. For the prerequisite test that computes the total number of ones in the sequence, the sequence is divided into equal chunks according to the number of threads given by the user. Each thread will then compute the number of ones in the corresponding chunk and reduction will be applied for the variable holding the total number of ones in the sequence.

In order to parallelize the process of computing the number of runs, the sequence will be divided in the same manner as before, into equal chunks. Each chunk will be assigned to a thread that will compute the number of runs in it. For the variable holding the total number of runs reduction is applied. The OpenMP directive used is parallel for.
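One point that needs care when the sequence is cut into chunks is that a run may continue across a byte or chunk boundary. The sketch below, which is an illustration and not the ParNIST code, counts bit transitions instead of runs: transitions inside a byte plus the transition between the last bit of the previous byte and the first bit of the current one. The number of runs is then the number of transitions plus one. All names are hypothetical and a helper replaces the lookup tables.

```c
#include <stdint.h>

/* Number of positions inside a byte where two adjacent bits differ;
   stand-in for the lookup tables mentioned in the text. */
static int transitions_in_byte(uint8_t b)
{
    uint8_t x = (uint8_t)((b ^ (b >> 1)) & 0x7F);
    int c = 0;
    while (x) { c += x & 1; x >>= 1; }
    return c;
}

/* Hypothetical signature: total number of runs in n bytes (bits are
   read most significant bit first). */
long count_runs(const uint8_t *buf, long n, int nthreads)
{
    long transitions = 0;
    long i;
    (void)nthreads;

    if (n <= 0)
        return 0;

    #pragma omp parallel for num_threads(nthreads) reduction(+:transitions) schedule(static)
    for (i = 0; i < n; i++) {
        transitions += transitions_in_byte(buf[i]);
        /* boundary: last bit of the previous byte vs first bit of this one */
        if (i > 0 && (buf[i - 1] & 1) != (buf[i] >> 7))
            transitions++;
    }
    return transitions + 1;
}
```

Because each iteration only reads buf[i - 1] and buf[i], the result does not depend on where the chunk boundaries fall.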

4. NIST Test 4 – Test for the Longest Run of Ones in a Block

This test divides the tested sequence into blocks, looks for the longest run of ones in each block and then tabulates it in the corresponding frequency category. Because the block size used in our implementation is 8, each byte is a block and the longest run of ones can be computed by using the same lookup table as in the SeqNIST implementation. In the parallel implementation the sequence is divided into equal chunks, whose length is the ratio between the sequence length and the number of threads given as input. Each thread computes the frequencies of the longest run of ones for the corresponding chunk. In order to obtain a correct result, reduction is applied to all the frequency categories. The OpenMP directive used is parallel for.
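The reduction over several frequency categories can be sketched as follows. The four categories (longest run of at most 1, exactly 2, exactly 3, at least 4) are the ones the NIST documentation gives for 8-bit blocks; the function names and the helper that replaces the lookup table are assumptions.

```c
#include <stdint.h>

/* Longest run of ones inside one byte; stand-in for the lookup table. */
static int longest_run_in_byte(uint8_t b)
{
    int best = 0, cur = 0, k;
    for (k = 7; k >= 0; k--) {
        if ((b >> k) & 1) {
            cur++;
            if (cur > best)
                best = cur;
        } else {
            cur = 0;
        }
    }
    return best;
}

/* Hypothetical signature. freq[0]: longest run <= 1, freq[1]: 2,
   freq[2]: 3, freq[3]: >= 4. */
void longest_run_frequencies(const uint8_t *buf, long n, int nthreads,
                             long freq[4])
{
    long v0 = 0, v1 = 0, v2 = 0, v3 = 0;
    long i;
    (void)nthreads;

    /* one reduction variable per frequency category */
    #pragma omp parallel for num_threads(nthreads) reduction(+:v0,v1,v2,v3) schedule(static)
    for (i = 0; i < n; i++) {
        int r = longest_run_in_byte(buf[i]);
        if (r <= 1)      v0++;
        else if (r == 2) v1++;
        else if (r == 3) v2++;
        else             v3++;
    }
    freq[0] = v0; freq[1] = v1; freq[2] = v2; freq[3] = v3;
}
```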

5. NIST Test 7 – Non-overlapping Template Matching Test

This test divides the sequence into blocks and then uses an m-bit window to search for a specific m-bit aperiodic pattern in these blocks. The test is actually run for all the aperiodic patterns of length m, where in our case m = 8. The total number of templates currently used is 75. The templates are split among the threads, and each thread then runs the test for the patterns assigned to it. Because each run of the test returns its own P-value, no reduction needs to be applied. The OpenMP directive used is parallel for.
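The sketch below shows why no reduction is needed when the loop runs over templates: every iteration writes only to its own output slot. To keep it short it is simplified with respect to the test described above: it counts the non-overlapping matches of each 8-bit template over the whole sequence instead of per block, and it stops before the P-value computation. All names are hypothetical.

```c
#include <stdint.h>

/* Bit k of the sequence, counting from the most significant bit of buf[0]. */
static int bit_at(const uint8_t *buf, long k)
{
    return (buf[k / 8] >> (7 - k % 8)) & 1;
}

/* Non-overlapping occurrences of the 8-bit template tpl in n bytes. */
static long count_matches(const uint8_t *buf, long n, uint8_t tpl)
{
    long nbits = 8 * n, pos = 0, count = 0;

    while (pos + 8 <= nbits) {
        int j, match = 1;
        for (j = 0; j < 8 && match; j++)
            match = (bit_at(buf, pos + j) == ((tpl >> (7 - j)) & 1));
        if (match) {
            count++;
            pos += 8;   /* non-overlapping: restart after the match */
        } else {
            pos++;
        }
    }
    return count;
}

/* Hypothetical signature: out[t] receives the count for templates[t]. */
void count_all_templates(const uint8_t *buf, long n,
                         const uint8_t *templates, int ntemplates,
                         int nthreads, long *out)
{
    int t;
    (void)nthreads;

    /* each iteration writes a different out[t], so no reduction is used */
    #pragma omp parallel for num_threads(nthreads) schedule(static)
    for (t = 0; t < ntemplates; t++)
        out[t] = count_matches(buf, n, templates[t]);
}
```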

6. NIST Test 8 – Overlapping Template Matching Test

This test partitions the tested sequence into blocks of 129 bytes and then uses an m-bit window to search for a run of m ones in these blocks. In this implementation m was taken to be 9. Each thread is assigned an equal number of blocks in which it looks for the pattern of ones. The number of occurrences of the template in each block is computed by the designated thread. The test also tabulates the number of occurrences found in each block into frequency categories. These frequencies are obtained correctly because reduction is used for each of them. The OpenMP directive used is parallel for.
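A possible shape of this loop is sketched below, under two assumptions that are not taken from the application: the six categories 0, 1, 2, 3, 4 and 5 or more occurrences (as in the NIST documentation), and a manual reduction through per-thread tables merged in a critical section, which is equivalent to a reduction clause on each category. The names are hypothetical.

```c
#include <stdint.h>

#define BLOCK_BYTES 129   /* 1032 bits per block, as in the text */
#define M_BITS 9          /* template: a run of nine ones */

/* Hypothetical signature: freq[c] = number of blocks with c overlapping
   occurrences of the template (c = 5 stands for "5 or more"). */
void overlapping_frequencies(const uint8_t *buf, long n, int nthreads,
                             long freq[6])
{
    long nblocks = n / BLOCK_BYTES;   /* a trailing partial block is ignored */
    int c;
    (void)nthreads;

    for (c = 0; c < 6; c++)
        freq[c] = 0;

    #pragma omp parallel num_threads(nthreads)
    {
        long local[6] = {0, 0, 0, 0, 0, 0};   /* private to each thread */
        long b;
        int k;

        #pragma omp for schedule(static)
        for (b = 0; b < nblocks; b++) {
            const uint8_t *blk = buf + b * BLOCK_BYTES;
            long bit, run = 0, hits = 0;

            for (bit = 0; bit < 8L * BLOCK_BYTES; bit++) {
                if ((blk[bit / 8] >> (7 - bit % 8)) & 1) {
                    if (++run >= M_BITS)
                        hits++;   /* a window of nine ones ends at this bit */
                } else {
                    run = 0;
                }
            }
            local[hits > 5 ? 5 : hits]++;
        }

        /* manual reduction: threads add their tables one at a time */
        #pragma omp critical
        for (k = 0; k < 6; k++)
            freq[k] += local[k];
    }
}
```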

5.3.2.2 ParNIST_blocks

The ParNIST_blocks application was developed by me and integrates the sequential version of the 14 statistical tests that are also incorporated in the SeqNIST application. The parallelization applied in this application consists of running the tests in parallel for the sub-sequence blocks specified in the testing modes 2, 3 and 4. Therefore this application does not run in mode 1, because it can’t be parallelized this way for this testing mode. ParNIST_blocks is a Win32 console application and was developed in C.

The structure of the main program is similar to that of the ParNIST application, but now each loop that runs a test for each sub-sequence block is parallelized. The chunk size is the ratio between the total number of blocks (given by the user through the argument <count>) and the number of threads (also given by the user). Each thread runs the test for a number of blocks equal to the chunk assigned to it. The OpenMP directive used is parallel for. The sequential versions of the tests were taken from SeqNIST, of which 8 were implemented by me and 6 by the other team member (NIST test 5 could not be included here).
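Block-level parallelization can be sketched as a parallel loop over the blocks that calls an unchanged sequential test. The sketch assumes testing mode 2 (consecutive blocks of equal length), a hypothetical common signature for the sequential tests, and a stand-in test that only returns the proportion of ones; none of these names come from the application.

```c
#include <stdint.h>

/* Signature assumed here for a sequential test: data, length in bytes. */
typedef double (*seq_test_fn)(const uint8_t *data, long len);

/* Stand-in "test" used only to make the sketch runnable:
   the fraction of one bits in the block. */
double ones_fraction(const uint8_t *data, long len)
{
    long ones = 0, i;
    for (i = 0; i < len; i++) {
        uint8_t b = data[i];
        while (b) { ones += b & 1; b >>= 1; }
    }
    return (double)ones / (8.0 * len);
}

/* Hypothetical driver for testing mode 2: block b starts at offset
   start + b * step and is step bytes long. */
void run_on_blocks(const uint8_t *buf, long start, long step, long count,
                   int nthreads, seq_test_fn test, double *results)
{
    long b;
    (void)nthreads;

    /* the sequential test is called unchanged; only the loop is parallel */
    #pragma omp parallel for num_threads(nthreads) schedule(static)
    for (b = 0; b < count; b++)
        results[b] = test(buf + start + b * step, step);
}
```

With a single block there is nothing to distribute, which matches the statement that this kind of parallelization does not apply to testing mode 1.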

5.3.3 SFC

The Statistical Functions Calculator (SFC) is an application designed to be of use when working with statistical tests and, implicitly, with statistical functions. The application contains a class whose functionality is made available through the user interface. The class is named Function because it contains the statistical functions that can be used in this calculator. The functions are implemented as static methods of this class and are taken from the Cephes mathematical library. This calculator offers a total of 13 statistical functions, out of which 7 were implemented by me and 6 by the other team member. Because SFC is a WPF application developed in C# and XAML, the functions from the Cephes library needed to be translated from C to C# and adapted accordingly. The formulas used for the functions implemented by me and a short description will be presented next.


1. IGAM

IGAM is the Incomplete Gamma function as described in the Cephes library. In mathematical terms the formula used by this function is the Lower Regularized Gamma function, which is the ratio between the Lower Incomplete Gamma function and the Complete Gamma function. The formula used in this implementation is the one described by Cephes:

$$\mathrm{igam}(a, x) = \frac{1}{\Gamma(a)} \int_0^x e^{-t}\, t^{a-1}\, dt$$

The function has two parameters that must be positive real numbers.

2. IGAMC

IGAMC is the Complemented Incomplete Gamma function as described in the Cephes library. In mathematical terms the formula used by this function is the Upper Regularized Gamma function, which is the ratio between the Upper Incomplete Gamma function and the Complete Gamma function. The formula used in this implementation is the one described by Cephes:

$$\mathrm{igamc}(a, x) = 1 - \mathrm{igam}(a, x) = \frac{1}{\Gamma(a)} \int_x^{\infty} e^{-t}\, t^{a-1}\, dt$$

The function has two parameters that must be positive real numbers.

3. IBETA

IBETA is the Incomplete Beta function, and has the same formula defined both in mathematics and in the Cephes library:

$$I_x(a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \int_0^x t^{a-1} (1-t)^{b-1}\, dt$$

The function has three parameters, out of which the first one must be a real number in the interval [0, 1] and the last two must be real numbers greater than 0.

4. CHI-SQUARE (Left)

The CHI-SQUARE (Left) statistical function computes the area under the left hand tail (from 0 to x) of the Chi-square distribution’s probability density function with k degrees of freedom. The formula used for the probability density function is the following one:

$$f(x; k) = \begin{cases} \dfrac{x^{k/2-1}\, e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, & \text{for } x > 0 \\ 0, & \text{for } x \le 0 \end{cases}$$

The function has two parameters, out of which the first one must be a positive real number and the second one, representing the degrees of freedom, must be an integer greater than 0.

5. CHI-SQUARE (Right)

The CHI-SQUARE (Right) statistical function computes the area under the right hand tail (from x to ∞) of the Chi-square distribution’s probability density function with k degrees of freedom. The formula used for the probability density function and the parameters used are the same as for CHI-SQUARE (Left).

6. POISSON (Left)

POISSON (Left) represents the sum of the terms 0 to k of the Poisson distribution’s probability mass function. The formula used to compute this sum is the following one:

$$\sum_{j=0}^{k} e^{-m}\, \frac{m^j}{j!}$$

The function has two parameters, out of which the first one, k, must be a natural number and the second one, m, representing the mean of the distribution, must be an integer greater than 0.

7. POISSON (Right)

POISSON (Right) represents the sum of the terms k+1 to infinity of the Poisson distribution’s probability mass function. The formula used to compute this sum is the following one:

$$\sum_{j=k+1}^{\infty} e^{-m}\, \frac{m^j}{j!}$$

The parameters are the same ones used for POISSON (Left).
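The four tail functions above are closely tied to IGAM and IGAMC. In the Cephes C sources the Chi-square and Poisson tails are evaluated through the incomplete gamma functions, using the identities below (x is the point and k the degrees of freedom for Chi-square; k is the number of events and m the mean for Poisson). Whether the C# translation in SFC keeps exactly this structure is an assumption.

$$\text{CHI-SQUARE (Left)}(x, k) = \mathrm{igam}\!\left(\tfrac{k}{2}, \tfrac{x}{2}\right), \qquad \text{CHI-SQUARE (Right)}(x, k) = \mathrm{igamc}\!\left(\tfrac{k}{2}, \tfrac{x}{2}\right)$$

$$\text{POISSON (Left)}(k, m) = \sum_{j=0}^{k} e^{-m}\, \frac{m^j}{j!} = \mathrm{igamc}(k+1, m), \qquad \text{POISSON (Right)}(k, m) = \sum_{j=k+1}^{\infty} e^{-m}\, \frac{m^j}{j!} = \mathrm{igam}(k+1, m)$$

In both pairs the left and right values add up to 1, which gives a simple consistency check for the calculator.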

5.4 User interface

5.4.1 SeqTestRand

The interface provided by the SeqTestRand application is an intuitive, user-friendly interface that can be divided into two parts: the design part, implemented in XAML, and the code behind the interface, written in C#, which handles various events and the integration of the processes used. The interface was designed by me, i.e. I implemented the XAML files, while the code behind in C# was implemented afterwards by my team colleague. In the following I will present only the design details of the interface, leaving the functionality to be explained by my colleague. There is one exception, however: the visualization of graphical results was implemented entirely by me, both design and functionality.

The interface is composed of four windows: the main window, the graphical results window, the help window and the processing window. The main window contains two tabs: one for the NIST battery, shown in Figure 5-16, and one for the CryptoRand battery, shown in Figure 5-17. The separation into tabs is very useful if the application needs to be extended, because further batteries can be added without affecting the ones already implemented. This separation also makes the batteries independent.

The main window was designed to guide the user through the steps he has to take in order to run the tests. First a file containing the input sequence must be chosen, either by using the Browse button or by typing its path in the text box situated after the File label. The system displays the length of the sequence in bytes in a label. Next the user has to choose a testing mode from the combo box preceded by the label Mode and, if the testing mode is 2, 3 or 4, introduce some parameters in the text boxes provided. These parameters are: the offset from where the sequence will be tested, denoted by the label Start; the step, which specifies either the length of a block or the value by which the block is increased/decreased every time, denoted by the label Step; and the number of blocks to be used, denoted by the label Count. The significance level, denoted by the label Alpha, can also be changed by typing another value in the corresponding text box. Each of these text boxes and each value of the combo box has a tool tip attached that informs the user about its meaning.

Each test can be chosen by checking the corresponding check box and introducing the necessary parameters, if it has any. The parameters are placed below the name of the test and each of them is followed by a small button containing a question mark. This button was designed to help the user by showing him the conditions that the current parameter needs to meet. For the NIST test 7 another Browse button is used for specifying the second parameter, because it must be a file. The RUN button was designed to be used when the tests are run on a single file, while the RUN BATCH JOB button was designed to be used when the tests need to be run on multiple input files. The results are displayed in a text box preceded by the Text results label, while the SHOW GRAPHICAL RESULTS button was designed to open the graphical results window. The help window can be opened by using the Help button.

Figure 5-16. The main window of the SeqTestRand application – NIST battery tab

The graphical results window is shown in Figure 5-18, where the P-values obtained for a file of 1 MB and for all the tests of the NIST battery, in testing mode 1, are displayed. This window displays a plot where the OY axis takes values from 0 to 1 with a logarithmic scale and the OX axis is divided into equal portions depending on the number of P-values displayed. Each P-value is represented by a point on the plot, which is actually implemented as a small ellipse. The P-value points are connected by a polyline.

The OY axis is divided into four intervals: [0, 0.001], [0.001, 0.01], [0.01, 0.1] and [0.1, 1], because the significance level can’t have a value lower than 0.0001. The red line represents the significance level chosen by the user, which in Figure 5-18 is 0.01. If all the points in the plot are situated above the red line then the sequence is considered random, and if there are points situated below the red line, then their number represents the number of P-values that failed the sequence. In Figure 5-18 only one P-value is below the significance level and two are on the limit, meaning that only one test failed the sequence. The visualization of graphical results can be useful when many tests are run and many P-values are obtained. In this situation the user can tell more quickly whether the sequence is random just by viewing the plot. When the tests are run in the testing modes 2, 3 or 4, the results can become hard to read because of their number. In that situation the graphical result can be very helpful, as shown in Figure 5-19, where a 100 MB file is run for the first test, in testing mode 3, starting from the offset 0, increasing with a step of 20000 bytes and for a number of 10000 blocks.

Figure 5-17. The main window of the SeqTestRand application – CryptoRand battery tab

Figure 5-18. The graphical results window of the SeqTestRand application


Figure 5-19. The graphical results window of the SeqTestRand application – an example when this window is indispensable

The help window is shown in Figure 5-20 and it contains just a read only text box, where all the information is loaded, and an OK button.

Figure 5-20. The help window of the SeqTestRand application


The processing window appears when the tests are run and is shown in Figure 5-21. It is formed of 10 ellipses that are animated in order to form a rotating circle. It was designed to inform the user that the application is still running normally (it hasn’t crashed) and is just taking some time. As can be seen in Figure 5-21, the Cancel button also becomes available when the processing window goes into action. In fact, the Cancel button is the only control that the user should be allowed to use until the tests finish running. This button was designed to be used when the user wants to cancel the running tests.

For each of the buttons used in this interface I used some customized templates, which are located in the App.xaml file.

Figure 5-21. The processing window of the SeqTestRand application

5.4.2 ParTestRand

The interface provided by the ParTestRand application is similar to the one provided by SeqTestRand. The difference is that it provides a means for the user to choose the number of threads used for the parallelization. The number of threads can be entered in the text box following the Number Threads label. Another difference is that NIST test 5 is no longer used here, because it couldn’t be parallelized by either of the two methods used. Because there are two methods used for parallelization, the user also has the possibility to choose between running the parallelized tests and running the blocks in parallel, but only in the testing modes 2, 3 and 4. The parallel version of each test is used by default, and if he wants to run the blocks in parallel he has to check the check box next to the testing modes combo box.


The main window for the NIST tab is presented in Figure 5-22, while the processing window, the help window and the graphical results window are the same as the ones presented for the SeqTestRand application. The modifications made to the CryptoRand tab are the same as the ones made to the NIST tab, so they are not presented again. The parts of the application were divided between the members of the team in the same manner as described for the SeqTestRand application.

As can be seen in Figure 5-22, the option to run the blocks in parallel is not available at first. It becomes available when one of the modes 2, 3 or 4 is selected. Because not all the tests could be parallelized, the tests that are disabled in Figure 5-22 are the ones that can be used only when the blocks are run in parallel. Therefore they become enabled when the run blocks in parallel check box is checked.

Figure 5-22. The main window of the ParTestRand application – NIST battery

5.4.3 SFC

The interface provided by the Statistical Functions Calculator is an intuitive user friendly interface that can be divided in two parts: the design part, implemented in XAML, and the code behind the interface that handles various events, using C#. The design part can be divided again in: the design of the main window, the design of the tabulate window and the design of the about window. I designed the tabulate window and the about window, while the other team member designed the main window. The code behind the interface handles a series of events generated for each statistical function. Therefore each event handler was implemented by the member of the team who also implemented the corresponding function.

The main window, designed by my colleague, contains a tab for each statistical function, which makes the functions independent. I only used the design implemented by my colleague in order to add the tabs corresponding to the statistical functions implemented by me. The main window of one of the tabs added by me is shown in Figure 5-23.

The tabulate window was designed by me and was used by my team colleague in order to add the necessary tabulate windows for the functions implemented by her. The design of this window is shown in Figure 5-24, for the IBETA function. It contains three text boxes for each parameter of the function, labeled with the corresponding name, representing the value from which the parameter should start, the value where the parameter should stop, and the value by which to increase the parameter at every step. The file where the results will be saved can be chosen by clicking the Save button or can be typed in the text box preceded by the label Save results to file. The Calculate button was designed for generating the table for the intervals provided by the user. There is one tabulate window for each function implemented; therefore the name of the current function must also be specified in this window.

Figure 5-23. The main window of the SFC application – IGAMC tab

Figure 5-24. The tabulate window for the IBETA statistical function in the SFC application


The about window is shown in Figure 5-25 and contains information about the authors, the project and the function library used. For each of the buttons used in this interface I used some customized templates, which are located in the App.xaml file.

As I stated before I also implemented the code behind the main window and behind the tabulate windows for the functions implemented by me. This code handles a series of events generated by the interface. For the main window the events handled for each function are:

1. Calculate button click
This event handler computes the selected function for the given parameters. Any mistakes, like not entering all the parameters or entering invalid parameters, are handled, as well as any exception that the selected statistical function can throw. There is one event handler for each function.

2. Clear button click
This button clears the contents of all the text boxes.

3. Tabulate button click
This button opens the tabulate window corresponding to the selected function.

4. About button click
This button opens the about window corresponding to the selected function.

For each of the tabulate windows the events handled are:

1. Calculate button click
This event handler generates the table with the results of the function for the given intervals and saves it in the specified file. Any mistakes, like not entering all the parameters, entering invalid parameters, not entering a file or entering an invalid path, are handled, as well as any exception that the selected statistical function can throw. If the data cannot be saved in the specified file, the user is alerted that the operation was unsuccessful.

2. Save button click
When this button is clicked a save dialog box is opened, where the user can choose the path and the name of the file in which he wants to save the table. The default extension of the file is csv, so that the results can be viewed in Microsoft Excel, but the extension can also be changed by the user if wanted.

Figure 5-25. The about window of the SFC application


6 System Usage

6.1 SeqTestRand

The SeqTestRand application can be used to test random bit sequences with two batteries of statistical tests: NIST and CryptoRand. There are two running modes available and two modes to visualize the results. The tests can be run on a single file or on multiple files, and the results can be visualized textually or graphically. Each of the functionalities offered by this system will be presented next.

Running the tests on a single file is one of the main functionalities that this system offers. It incorporates other functionalities, like choosing the testing mode, selecting the wanted tests, introducing an input file and so on. The steps required to run the tests implemented by me on a single file are the following ones:

1. Choose a battery of tests by selecting one of the tabs: NIST or CryptoRand.

2. Introduce an input file either by typing the full path in the text box preceded by the File label, or by clicking the Browse button and searching for the file in the file system. In the second case, the full path to the file is filled in automatically in the text box. In both cases, the length of the file in bytes is displayed in the label following the label Length. The minimum sequence length in bytes for each test can be found in the Help window.

3. Choose a running mode by selecting one of the values 1, 2, 3 or 4 from the combo box following the Mode label.

4. Introduce the necessary parameters for the selected mode. If the testing mode selected is 1, then no parameters need to be filled in (the corresponding text boxes are disabled). If the testing mode selected is 2, 3 or 4, then the following parameters need to be filled in:
   - The text box following the Start label should be filled with the position from where the sequence will be tested.
   - The text box following the Step label should be filled with the length of a block, if the testing mode is 2, with the value by which the block is increased at every step, if the testing mode is 3, or with the value by which the block is decreased at every step, if the testing mode is 4.
   - The text box following the Count label should be filled with the number of blocks to be used.
   - Start + Step × Count should not exceed the length of the sequence.

5. Choose the significance level. The default significance level is 0.01, but it can be changed by the user to any value between 0.0001 and 1.

6. Select the wanted tests and introduce their parameters. Each test can be selected by checking the corresponding check box. The parameters of each test are placed below the name of the test. In the NIST battery there are only 3 tests that have parameters: NIST test 2, NIST test 7 and NIST test 11. NIST test 2 takes as a parameter the length of the block, which must be greater than 3 bytes, greater than or equal to the size of the sequence divided by 100 and less than or equal to the size of the sequence. NIST test 7 has 2 parameters: the length of the block, which must be greater than or equal to the size of the sequence divided by 100 and less than or equal to the size of the sequence, and the templates file. The default templates file is the one used by NIST, but it can be changed to any other templates file containing integer numbers between 0 and 255. Another templates file can be chosen either by typing the full path in the text box preceded by the Templates file label, or by clicking the Browse button and searching for the file in the file system. NIST test 11 was implemented by my colleague. The four Histogram tests implemented by me, from the CryptoRand battery, do not have parameters. In order to view the conditions imposed on each parameter, the question mark (?) button following each test parameter can be clicked.

7. Click the RUN button in order to run the selected tests on the chosen sequence.

8. Wait for the tests to finish running. During this time the processing window appears. The user can cancel the tests during this time by clicking the Cancel button. If the user cancels the tests, a yes/no/cancel message box appears, asking him if he really wants to cancel the running tests. If he chooses yes he is informed that this might take a while, but the tests will be canceled.

9. View the results in the text box below the label Text results.

Figure 6-1 shows an example of performing the steps 1 – 7 presented above in the NIST battery (it is the same for the CryptoRand battery). In this example the chosen file is m100.txt, with a length of 100 MB (104857600 bytes), and the testing mode chosen is 2. The sequence is tested starting from byte 100000 and is divided into blocks of 20000000 bytes. The first 4 blocks are tested, because Count is 4. The significance level used is the default one and the selected tests are the ones I implemented. The figure also shows the message box that appears if the user clicks the question mark button following the Block size parameter of test 7.
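The condition on Start, Step and Count from step 4 can be checked with a few lines of C, shown here as a sketch. The function name is hypothetical, and the rule that Step and Count must be positive is an assumption that is not stated in the text.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if Start + Step * Count does not
   exceed the sequence length, 0 otherwise. All values are in bytes. */
int mode_params_valid(uint64_t length, uint64_t start,
                      uint64_t step, uint64_t count)
{
    if (start > length)
        return 0;
    if (step == 0 || count == 0)   /* assumption: both must be positive */
        return 0;
    /* written as a division so that step * count cannot overflow */
    return count <= (length - start) / step;
}
```

For the example above, Start = 100000, Step = 20000000 and Count = 4 give 80100000, which is below the file length of 104857600 bytes, so the parameters are accepted; with Count = 6 they would be rejected.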

Figure 6-2 shows the application in step 8 and Figure 6-3 shows the application in step 9, when the user can view the results.

Running the tests on multiple files is another of the main functionalities that this system offers. It also incorporates other functionalities, similar to the ones used when running the tests on a single file. The steps required to run the tests implemented by me on multiple files are the following ones:

1. Choose a battery of tests as described before.
2. Choose a running mode as described before.
3. Introduce the necessary parameters for the selected mode as described before.
4. Choose the significance level as described before.
5. Select the wanted tests and introduce their parameters as described before.
6. Click the RUN BATCH JOB button. A folder browse dialog will be opened.
7. Choose a directory from the folder browse dialog.
8. Wait for the tests to finish running as described before.
9. View the results as described before.

Figure 6-4 shows how steps 1 – 7 presented above for running the tests on multiple files are performed in the CryptoRand battery (it is the same for the NIST battery). In this example the testing mode chosen is 1, so Start, Step and Count need not be specified. The significance level used is the default one and the selected tests are the ones I implemented. Steps 8 – 9 are not shown again because they are similar to the ones presented in Figure 6-2 and Figure 6-3 respectively.

The results obtained after the tests are run on a single file or on multiple files can also be visualized in a graphical manner by clicking the VIEW GRAPHICAL RESULTS button. For the example shown when running the tests on a single file, the graphical results window is shown in Figure 6-5. The help window can be opened by clicking the Help button.


Figure 6-1. Example for using the SeqTestRand application by running the tests on a single file in the NIST battery (steps 1-7)


Figure 6-2. Example for using the SeqTestRand application – processing the tests

Figure 6-3. Example for using the SeqTestRand application – final results


Figure 6-4. Example for using the SeqTestRand application by running the tests on multiple files in the CryptoRand battery (steps 1-7)

Figure 6-5. The plot obtained for the example shown for running the tests for a single file (graphical window)


6.2 ParTestRand

The ParTestRand application serves the same purpose as the SeqTestRand application, but this time the batteries are parallelized. The running modes and the visualization modes described for SeqTestRand also apply here, with some minor changes in the steps taken.

In order to run the tests on a single file the following steps need to be taken:

1. Choose a battery of tests as described for SeqTestRand.
2. Introduce an input file as described for SeqTestRand.
3. Choose a running mode as described for SeqTestRand.
4. Choose the parallelization mode: run parallelized test algorithm or run blocks in parallel. The default option is running the parallelized test algorithm. In order to switch to the option of running the blocks in parallel, the run blocks in parallel check box must be checked. If this check box is unchecked, the default option is enabled again. At first only the tests whose algorithm could be parallelized are enabled, while the other tests will be enabled if the option run blocks in parallel is chosen.
5. Introduce the necessary parameters for the selected mode as described for SeqTestRand.
6. Choose the significance level as described for SeqTestRand.
7. Introduce the number of threads wanted by typing a number greater than 0 in the text box following the Number Threads label.
8. Select the wanted tests and introduce their parameters as described for SeqTestRand.
9. Click the RUN button as described for SeqTestRand.
10. Wait for the tests to finish running as described for SeqTestRand.
11. View the results as described for SeqTestRand.

Figure 6-6 shows an example of performing steps 1 – 9 presented above in the NIST battery (it is similar for CryptoRand). In this example the chosen file is m1.txt, which is 1 MB long (1048576 bytes), and the chosen testing mode is 3. The run blocks in parallel check box is not checked, so the parallelized algorithm of each test will be run. The sequence will be tested starting from the first byte, the blocks will initially have a size of 200000 bytes, and then they will increase by 200000 bytes at every step. The first 4 blocks will be tested, because Count is 4. The significance level is the default one and the number of threads is 2. The selected tests are the ones I implemented, except the two variants of test 9, which can only be run when run blocks in parallel is checked. The browse file dialog box, the selection from the combo box and the message box shown when a question mark button is clicked are not shown again, because they are identical to the ones presented for SeqTestRand. The same applies to the processing window and the final result, which are similar to the ones presented for SeqTestRand.
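The block sizes of this example can be sketched as follows. The function name mode3_blocks is hypothetical, and the sketch assumes that every block begins at the Start offset. That layout is an inference rather than something the text states: four consecutive blocks of 200000, 400000, 600000 and 800000 bytes would need 2000000 bytes, which is more than the 1048576 bytes of m1.txt.

```python
def mode3_blocks(file_size, start, first_size, step, count):
    """Return (offset, length) pairs for blocks whose length grows by `step` each time."""
    blocks = []
    for i in range(count):
        length = first_size + i * step
        if start + length > file_size:
            raise ValueError("block %d exceeds the end of the sequence" % i)
        # Assumption: all blocks share the same starting offset.
        blocks.append((start, length))
    return blocks

# Values from the example: m1.txt, first byte, initial size 200000, step 200000, Count = 4.
blocks = mode3_blocks(1048576, 0, 200000, 200000, 4)
print([length for _, length in blocks])
```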


Figure 6-6. Example for using the ParTestRand application by running the tests on a single file (steps 1-9)

In order to run the tests on multiple files the following steps need to be taken:

1. Choose a battery of tests as described before.
2. Choose a running mode as described before.
3. Choose the parallelization mode as described before.
4. Introduce the necessary parameters for the selected mode as described before.
5. Choose the significance level as described before.
6. Introduce the number of threads as described before.
7. Select the wanted tests and introduce their parameters as described before.
8. Click the RUN BATCH JOB button as described before.
9. Choose a directory as described before.
10. Wait for the tests to finish running as described before.
11. View the results as described before.

These steps are similar to the ones described for running the tests on multiple files in SeqTestRand, with the difference that the number of threads and the parallelization mode can now be chosen. The graphical results can be visualized in the same manner as for SeqTestRand. The help window can be opened by clicking the Help button.
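The run blocks in parallel option together with the number of threads can be pictured with the sketch below. It only shows the structure of the idea: the names run_blocks_in_parallel and ones_fraction are hypothetical, the per-block computation is a stand-in for a real statistical test, and Python threads do not give the speed-up of the native threads used by the C applications.

```python
from concurrent.futures import ThreadPoolExecutor

def ones_fraction(data, offset, length):
    """Stand-in for a statistical test: fraction of one bits in a block."""
    block = data[offset:offset + length]
    ones = sum(bin(byte).count("1") for byte in block)
    return ones / (8 * length)

def run_blocks_in_parallel(data, blocks, num_threads):
    """Evaluate every (offset, length) block on a pool of `num_threads` workers."""
    if num_threads < 1:
        raise ValueError("the number of threads must be greater than 0")
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(ones_fraction, data, off, n) for off, n in blocks]
        # Results are collected in block order, not in completion order.
        return [future.result() for future in futures]

data = bytes([0x0F]) * 1000 + bytes([0xFF]) * 1000
results = run_blocks_in_parallel(data, [(0, 1000), (1000, 1000)], num_threads=2)
print(results)
```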

6.3 SFC

The Statistical Function Calculator (SFC) can be used to compute the values of a series of statistical functions. The values of a function can be computed in two ways: for a particular point or for multiple points, in which case a table of values is generated.

Computing the value of the function in one point is one of the main functionalities that this system offers. It incorporates other functionalities, like choosing the function and introducing the function’s parameters. The steps required to compute the value of a function in a single point are the following:

1. Select a function by selecting one of the tabs.
2. Introduce the parameters of the function by typing the values in the text boxes located in the Input parameters group.
3. Click the Calculate button in order to compute the value of the function.
4. View the results in the read-only text box preceded by the name of the function.

An example is shown in Figure 6-7, where the value of the function igamc(0.1, 0.5) is computed.
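For readers who want to reproduce such a value outside the application, the sketch below evaluates the upper regularized incomplete gamma function Q(a, x) = 1 - P(a, x) with the standard power series for P. It is not the Function class of SFC: the argument order igamc(a, x) is an assumption, and the series is only practical for small and moderate x (production libraries switch to a continued fraction for large x).

```python
import math

def igamc(a, x, eps=1e-15, max_terms=10000):
    """Upper regularized incomplete gamma Q(a, x), via the series for P(a, x)."""
    if a <= 0 or x < 0:
        raise ValueError("igamc requires a > 0 and x >= 0")
    if x == 0:
        return 1.0
    # P(a, x) = exp(-x) * x**a / Gamma(a) * sum_{n>=0} x**n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < eps * total:
            break
    p = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - p

print(igamc(0.1, 0.5))
```

Two identities give a quick sanity check: igamc(1, x) equals exp(-x) and igamc(0.5, x) equals erfc(sqrt(x)).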

Figure 6-7. Example of using the SFC application by calculating the value of the function in a point

Generating a table with the values of a function is another of the main functionalities that this system offers. It incorporates other functionalities, like choosing the function, introducing an interval and a step for each parameter, and choosing the file into which to save the table. The steps required to compute the value of a function in multiple points are the following ones:

1. Select a function as described before.
2. Click the Tabulate button.
3. Introduce an interval and a step for each parameter in the text boxes located in the Parameters group. The first value of the interval must be typed in the text box following the label from, the last value of the interval must be typed in the text box following the label to, and the step with which the parameters are increased every time must be typed in the text box following the label step.
4. Choose the file into which to save the table either by typing the full path and the name of the file in the text box preceded by the Save results to file label, or by clicking the Save button and searching the location of the file through the existent file system.
5. Click the Calculate button in order to generate the table.
6. A message box will appear informing the user that the table was saved in the specified file. The user can now visualize the results in the specified file.


An example is shown in Figure 6-8 for the IBETA function. The x parameter will take the values {0.1, 0.2, 0.3}, the a parameter will take the values {1, 2, 3} and the b parameter will take the values {5, 6, 7, 8}. The table will be saved in the file table.csv.
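The tabulation in this example amounts to evaluating the function on the Cartesian product of the three parameter ranges, which gives 3 x 3 x 4 = 36 rows. The sketch below illustrates this under stated assumptions: IBETA is taken to be the regularized incomplete beta function, the helper names are hypothetical, and the closed form used is valid only for integer a and b, as in this example. The table is written to an in-memory buffer instead of table.csv.

```python
import csv
import io
import itertools
import math

def ibeta_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a and b."""
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def grid(start, stop, step):
    """Values from `start` to `stop` inclusive, computed by index to avoid drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(count)]

def tabulate(xs, a_values, b_values):
    """One row (x, a, b, value) per combination of the parameter values."""
    return [(x, a, b, ibeta_int(x, a, b))
            for x, a, b in itertools.product(xs, a_values, b_values)]

rows = tabulate(grid(0.1, 0.3, 0.1), [1, 2, 3], [5, 6, 7, 8])
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["x", "a", "b", "IBETA"])
writer.writerows(rows)
print(len(rows), "rows")
```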

Figure 6-8. Example of using the SFC application by generating a table with the values of the function in multiple points

The table obtained is shown in Figure 6-9. The text boxes from the main window can be cleared by clicking the Clear button, while the About window can be opened by clicking the About button.


Figure 6-9. The table obtained for the example presented above


7 Deployment and experimental results

7.1 Used Technology

The command line applications SeqNIST, ParNIST and ParNIST_blocks, as well as the tests integrated in SeqHistograms and ParHistograms_blocks, were written in C. The C language was chosen because of the efficiency of programs written in this language and because the algorithms used for the implemented statistical tests are not suited to object-oriented programming. Using the C language also facilitates a comparison with other systems, because they also use it to implement their statistical test batteries.

The command line applications were compiled in the release configuration, using Maximize Speed (/O2) as a project optimization property in order to optimize the compiled code. The Multi-threaded DLL (/MD) was used as a runtime library, in order to be able to run the application on computers that do not have the .NET runtime library for C installed.

For the SeqTestRand, ParTestRand and SFC applications the .NET technology was used, more specifically the .NET framework 3.5. WPF (Windows Presentation Foundation) was used to implement the interfaces of these applications. SeqTestRand and ParTestRand use C# in order to handle various events and to integrate the command line applications. C# is also used in SFC in order to implement the Function class and to make the connection between the interface and the Function class.

All the applications were compiled and run using Visual Studio 2008 and its integrated compilers for C and C#.

7.2 Running the Applications

The three applications presented before need not be installed in order to be run. Only a copy of the necessary executables needs to be present on the computer on which they will be run. For the SeqTestRand application the necessary executables are: SeqTestRand, SeqNIST, SeqHistograms and SeqGenBlockFreq. For the ParTestRand application the necessary executables are: ParTestRand, ParNIST, ParNIST_blocks, ParHistograms, ParHistograms_blocks and ParGenBlockFreq. The SFC application only needs one executable with the same name. Except for SeqTestRand, ParTestRand and SFC, all the other executables can also be executed independently, but only from the command line.

In order to run the three main applications, some software and hardware requirements need to be met. These requirements are presented next.

7.2.1 Hardware requirements

For the SeqTestRand application the minimum hardware requirements are 1 GB of RAM and a single-core processor of 1 GHz. It is recommended to run the ParTestRand application on a multi-core processor, because it was designed to take advantage of the processing power of multi-core architectures. Therefore the minimum hardware requirements for ParTestRand are 1 GB of RAM and a dual-core processor with cores of at least 1 GHz each. These applications could be run on lower configurations, but the processing time could become too high. The minimum hardware requirements presented above also apply when the command line applications integrated in SeqTestRand and ParTestRand are run independently.

The minimum hardware requirements for the SFC application are the ones of the .NET framework 3.5 presented in [40]. The recommended requirements for the .NET framework are 256 MB of RAM and a 1 GHz single-core processor.


7.2.2 Software requirements

The software requirements needed in order to be able to run the three applications presented above are:

Windows XP with Service Pack 2 or a higher version of the Windows operating system.

.NET framework 3.5.

7.3 Encountered Issues and Solutions Found

The first optimized version of the NIST statistical tests read and processed only 200000 bytes from the input sequence at a time. Because the tested sequence was not stored entirely in memory, the input file had to be read again for each test that was run. For large files, handling the file took too much time. The solution to this problem was to limit the size of the input file to 1 GB and to allocate the whole tested sequence dynamically in memory. This way the file is read only once and each test can use it when it is run. Limiting the size of the sequence also addresses a problem that occurs for very large files: the processing takes too long to be run on a local computer. For larger files the tests could be adapted and run on a grid.

The first version of the tests tested the entire sequence as a whole, therefore searching for randomness globally. The problem appeared when only a part of the sequence needed to be tested or when the sequence needed to be tested locally, by dividing it into blocks and testing each of them separately. The solution to this problem was to introduce more testing modes, in order to be able to test the sequence as a whole or different parts of it. The 4 running modes chosen offer the user the possibility to partition the sequence in a variety of ways.

Another problem that appeared was that while the tests were run, the interface of the SeqTestRand and ParTestRand applications was blocked. Whenever the user clicked somewhere in the interface, the application appeared not to be responding. The solution to this problem was to introduce a processing window that ran on another thread until the tests finished running. My part in this solution was to design the processing window.

In the first version of the NIST statistical tests, each test was implemented as a separate application, so the interface had to run a number of processes equal to the number of selected tests. This problem was solved by using a single executable that incorporates all the statistical tests of a battery. The tests were implemented as separate functions that take as parameters a pointer to the beginning of the tested sequence and the length of the sequence. This way, the same function can be used to test the whole sequence or only a part of it.
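The idea of a test function that receives the start of the sequence and a length can be illustrated with the Frequency (Monobit) test. The sketch below is written in Python rather than C and is not the SeqNIST code: an offset into a bytes object stands in for the pointer, and the function name is hypothetical. The statistic follows the NIST definition, P-value = erfc(|S_n| / sqrt(2n)), where S_n is the number of ones minus the number of zeros among the n tested bits.

```python
import math

def frequency_test(sequence, offset, length):
    """P-value of the Frequency (Monobit) test for `length` bytes starting at `offset`."""
    if length <= 0 or offset < 0 or offset + length > len(sequence):
        raise ValueError("the block lies outside the sequence")
    block = sequence[offset:offset + length]
    n_bits = 8 * length
    ones = bin(int.from_bytes(block, "big")).count("1")
    s_n = 2 * ones - n_bits               # ones minus zeros
    s_obs = abs(s_n) / math.sqrt(n_bits)
    return math.erfc(s_obs / math.sqrt(2))

data = b"\xff" * 100 + b"\x0f" * 100
print(frequency_test(data, 0, len(data)))   # the whole sequence
print(frequency_test(data, 100, 100))       # only the second half
```

The same function serves both calls, which is the property the paragraph above describes.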

Another problem that arose was that users who wanted to run the tests on several files had to enter them one by one in the interface and run the tests for each of them, or had to write a batch file using the command line executables of each battery. The solution offered for this problem was to implement another running mode in the interface, through which the user can choose a directory for which the tests are run. This way the path to the directory is entered only once and the tests are run for all the files at once.
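A batch job of this kind reduces to a loop over the files of the chosen directory. The sketch below is an illustration only: run_batch is a hypothetical name, and os.path.getsize stands in for running a battery on one file.

```python
import os
import tempfile

def run_batch(directory, run_tests):
    """Apply `run_tests` to every regular file in `directory`, in name order."""
    results = {}
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_file():
            results[entry.name] = run_tests(entry.path)
    return results

# Demonstration on a throw-away directory with two small files.
with tempfile.TemporaryDirectory() as folder:
    for name, content in (("a.bin", b"\x00\x01\x02"), ("b.bin", b"\xff" * 5)):
        with open(os.path.join(folder, name), "wb") as handle:
            handle.write(content)
    results = run_batch(folder, os.path.getsize)

print(results)
```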

When many tests are run and many sequences are tested, the textual results can take a lot of time to read. The solution to this problem was to implement a graphical mode of displaying the results, which takes less time to analyze. In the graphical mode the plot consists of P-value points joined by lines. The significance level is also displayed, in order to visualize the number of P-values that are greater than the significance level and the number of those that are smaller than it.
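The comparison that the plot makes visible can also be expressed in a few lines. In the sketch below the function name is hypothetical, the P-values are invented for illustration and are not experimental results, and 0.01 is used as the level because it is the value hardcoded in the NIST implementation.

```python
def split_by_significance(p_values, alpha=0.01):
    """Separate P-values at or above the significance level from those below it."""
    above = [p for p in p_values if p >= alpha]
    below = [p for p in p_values if p < alpha]
    return above, below

# Illustrative P-values only.
p_values = [0.53, 0.004, 0.12, 0.0099, 0.87]
above, below = split_by_significance(p_values)
print(len(above), "at or above the level,", len(below), "below it")
```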


7.4 Experimental Results

7.4.1 NIST Battery

In order to show that the byte-oriented implementation of the NIST statistical test suite is indeed more efficient than the original bit-oriented version, a comparison needed to be made between the two systems. A comparison also needed to be made with the results obtained by the parallel version of the tests. A series of experiments was performed in order to compare the execution times of the three applications. They were performed on a computer with two Intel Xeon E5405 CPUs at 2 GHz and 4 GB of RAM, running the 64-bit version of Windows Server 2008.

The execution times were computed in the same manner as in the NIST implementation in order to obtain an accurate comparison. The three applications compared were SeqNIST (run through SeqTestRand), ParNIST (run through ParTestRand) and NIST version 1.8. For each of them only the time needed to process the sequence was taken into consideration, without the time needed for reading the input sequence. Each test from the suite was run on 10 input files of different sizes, and for each file the test was run 5 times in order to obtain an average execution time and therefore more accurate results. The chosen files have lengths of up to 200 MB, because each of the tests from the NIST implementation has a specific file length limitation which is less than 200 MB.
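The measurement procedure described above (the sequence already in memory, five repetitions, an averaged time, and a speed-up computed as a ratio of times) can be sketched as follows. The function names are hypothetical and the timer is Python's perf_counter, not the timing code of the compared applications.

```python
import time

def average_time(func, data, repetitions=5):
    """Average wall-clock time of func(data); `data` is already in memory,
    so the time needed to read the input file is not included."""
    total = 0.0
    for _ in range(repetitions):
        started = time.perf_counter()
        func(data)
        total += time.perf_counter() - started
    return total / repetitions

def speed_up(reference_time, optimized_time):
    """How many times faster the optimized run is than the reference run."""
    return reference_time / optimized_time

data = bytes(range(256)) * 1000
elapsed = average_time(sum, data)
print("average time:", elapsed, "seconds")
```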

In the following sub-sections the charts and the average speed-ups obtained for each of the tests implemented by me are presented. Two charts are presented for each of the tests, except for the two variants of NIST test 9. One of the charts contains the comparison of execution times for NIST version 1.8 and SeqNIST, and the other one contains the comparison of execution times for SeqNIST and ParNIST. For NIST test 9, Maurer’s variant, only the chart containing SeqNIST and NIST is displayed, because the test could not be parallelized. Coron’s variant of NIST test 9 does not have a chart, because there is no other implementation with which it can be compared. The first chart shows the improvement obtained by applying sequential optimizations, while the second one shows the improvement obtained by the parallelization.


7.4.1.1 NIST Test 1 – Frequency (Monobit) Test

The average speed-up obtained is 9.82 for SeqNIST and 49.64 for ParNIST. The maximum file size for which the NIST implementation gave a result was 198 MB. The comparison between SeqNIST and NIST is shown in Figure 7-1 and the comparison between SeqNIST and ParNIST is shown in Figure 7-2.

Figure 7-1. Comparison of execution times between NIST and SeqNIST for the Frequency (Monobit) Test

Figure 7-2. Comparison of execution times between ParNIST and SeqNIST for the Frequency (Monobit) Test


7.4.1.2 NIST Test 2 – Frequency Test within a Block

The average speed-up obtained is 9.63 for SeqNIST and 53.74 for ParNIST. The maximum file size for which the NIST implementation gave a result was 198 MB. The comparison between SeqNIST and NIST is shown in Figure 7-3 and the comparison between SeqNIST and ParNIST is shown in Figure 7-4. The block lengths used are shown in the corresponding table from the first annex.

Figure 7-3. Comparison of execution times between NIST and SeqNIST for the Frequency Test within a Block

Figure 7-4. Comparison of execution times between ParNIST and SeqNIST for the Frequency Test within a Block


7.4.1.3 NIST Test 3 – Runs Test

The average speed-up obtained is 5.84 for SeqNIST and 35.22 for ParNIST. The maximum file size for which the NIST implementation gave a result was 43 MB. The comparison between SeqNIST and NIST is shown in Figure 7-5 and the comparison between SeqNIST and ParNIST is shown in Figure 7-6.

Figure 7-5. Comparison of execution times between NIST and SeqNIST for the Runs Test

Figure 7-6. Comparison of execution times between ParNIST and SeqNIST for the Runs Test


7.4.1.4 NIST Test 4 – Test for the Longest Run of Ones in a Block

The average speed-up obtained is 6.51 for SeqNIST and 51.01 for ParNIST. The maximum file size for which the NIST implementation gave a result was 199 MB. The comparison between SeqNIST and NIST is shown in Figure 7-7 and the comparison between SeqNIST and ParNIST is shown in Figure 7-8.

Figure 7-7. Comparison of execution times between NIST and SeqNIST for the Test for the Longest Run of Ones in a Block

Figure 7-8. Comparison of execution times between ParNIST and SeqNIST for the Test for the Longest Run of Ones in a Block


7.4.1.5 NIST Test 7 – Non-overlapping Template Matching Test

The average speed-up obtained is 3.13 for SeqNIST and 18.56 for ParNIST. The maximum file size for which the NIST implementation gave a result was 198 MB. The comparison between SeqNIST and NIST is shown in Figure 7-9 and the comparison between SeqNIST and ParNIST is shown in Figure 7-10. The templates file used is the one used by NIST, containing all the aperiodic patterns having a length of 8 bits. The block lengths used are shown in the corresponding table from the first annex.

Figure 7-9. Comparison of execution times between NIST and SeqNIST for the Non-overlapping Template Matching Test

Figure 7-10. Comparison of execution times between ParNIST and SeqNIST for the Non-overlapping Template Matching Test


7.4.1.6 NIST Test 8 – Overlapping Template Matching Test

The average speed-up obtained is 15.15 for SeqNIST and 103.59 for ParNIST. The maximum file size for which the NIST implementation gave a result was 198 MB. The comparison between SeqNIST and NIST is shown in Figure 7-11 and the comparison between SeqNIST and ParNIST is shown in Figure 7-12.

Figure 7-11. Comparison of execution times between NIST and SeqNIST for the Overlapping Template Matching Test

Figure 7-12. Comparison of execution times between ParNIST and SeqNIST for the Overlapping Template Matching Test


7.4.1.7 NIST Test 9 – Maurer’s “Universal Statistical” Test

The average speed-up obtained is 12.8 for SeqNIST. The experiments were performed on small files because those are the ones for which the NIST implementation allows L to be 8. The experimental results are shown in Figure 7-13.

Figure 7-13. Comparison of execution times between NIST and SeqNIST for Maurer’s “Universal Statistical” Test


8 Conclusions

8.1 Results

The applications presented in this paper led to a series of achievements that respond to the objectives established in the initial phase of the project. The first objective was to obtain a more efficient implementation of the NIST statistical test suite. The experimental results presented before show that shifting the paradigm from a bit sequence to a byte sequence does indeed increase the performance of the system. All of the 14 tests considered by NIST, and a new version of Maurer’s “Universal Statistical” Test, were implemented in a new optimized version of the battery.

The algorithm could be parallelized for only 9 of these tests, but for these 9 tests and for 5 more there is another parallelization mode available in testing modes 2, 3 and 4: running the blocks in parallel. The performance obtained by the parallel version of the tests is even higher than the one offered by the sequential version.

The SeqTestRand and ParTestRand applications bring several improvements to the domain of statistical tests for random and pseudo-random number generators. Not only is their execution time improved, but they also offer other facilities such as multiple running modes, the possibility to run the tests on a directory of files, graphical visualization of the results and the possibility to choose the significance level. The running modes implemented offer the possibility to test the whole sequence or various portions of it. The graphical display of results has proved to be very useful in situations when the textual results become too hard to follow because of the high number of P-values obtained.

These applications also bring to light a newly developed battery of tests, named CryptoRand, like the project of which they are a part. These tests were also implemented in an efficient manner, using the byte sequence paradigm used for the NIST battery. Unfortunately, the algorithm could be parallelized for only 3 of the tests comprised in this battery, but for all the Histogram tests it is possible to run the blocks used in modes 2, 3 and 4 in parallel.

SeqTestRand and ParTestRand not only offer two statistical batteries incorporated in a single application, but are also intuitive and easy to use. They also support their users by providing general information about each of the tests, such as the purpose, the maximum allowed size of the input sequence, the allowed values for each parameter, if it has any, and the property and defect detected.

The Statistical Function Calculator developed as part of this project is a useful application in the domain of statistical tests, because every statistical test uses a statistical function in order to make a decision regarding the randomness of the tested sequence. The most used statistical functions and distributions were studied and incorporated in this calculator. The calculator can be useful in situations when a statistical function needs to be studied, because it provides general information about it and a means by which a table of values for the function can be generated. This table of values can then be used to generate the plot of the function on a certain interval.

The system described in this paper, comprising three main applications, can be used to test random and pseudo-random number generators that produce sequences of at most 1 GB, and to develop new statistical tests for random numbers. The implemented tests are not only some of the best-known statistical tests, but are also efficient and integrated in a user-friendly environment.


8.2 Comparison with Similar Systems

The implementation of the NIST statistical battery incorporated in the project described in this paper can be compared with the original implementation provided by NIST. The comparison is actually done with version 1.8 of the NIST implementation, which is the penultimate version of the original implementation. Version 2.0 of the NIST implementation could not be used, because the execution times are not computed and it does not offer a graphical user interface. However, the modifications made in version 2.0 are minor, because no noticeable improvement seems to have been made to the algorithms of the implemented tests.

The original NIST implementation uses a bits buffer to store the input sequence, meaning that each bit is actually stored in a byte, while in the SeqNIST implementation the sequence is stored in a bytes buffer, meaning that no space is wasted. Because the sequence is stored in a bytes buffer, the SeqNIST implementation does all the processing at the byte level, unlike the original NIST implementation, which does all the processing at the bit level. The experimental results shown in section 7.4.1 show that shifting the paradigm from a bit sequence to a byte sequence does lead to a noticeable decrease in the execution time. The average speed-up obtained by SeqNIST ranges between 3.13 and 15.15. The ParNIST implementation goes one step further and parallelizes the efficient implementation of the NIST statistical test suite, obtaining an even higher increase in performance (the average speed-up obtained ranges between 18.56 and 103.59).
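The difference between the two storage schemes can be made concrete with a small sketch. It is not taken from either implementation: the function names are hypothetical, and the most-significant-bit-first order used when unpacking is an assumption. The bit-oriented buffer needs one element per bit, while the byte-oriented version keeps the packed bytes and uses a 256-entry lookup table to process 8 bits at a time.

```python
# Number of one bits in each possible byte value, computed once.
ONES_IN_BYTE = bytes(bin(value).count("1") for value in range(256))

def unpack_bits(packed):
    """Bit-oriented storage: one element per bit, most significant bit first."""
    return [(byte >> shift) & 1 for byte in packed for shift in range(7, -1, -1)]

def count_ones_bitwise(bits):
    """Visit every bit individually."""
    return sum(bits)

def count_ones_bytewise(packed):
    """Visit every byte once and look up its number of one bits."""
    return sum(ONES_IN_BYTE[byte] for byte in packed)

packed = bytes([0b10110101, 0b00001111, 0xFF])
bits = unpack_bits(packed)
print(len(packed), "bytes packed,", len(bits), "elements unpacked")
print(count_ones_bitwise(bits), count_ones_bytewise(packed))
```

Both counts agree, but the byte-wise version touches one eighth as many elements.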

The maximum allowed size for the input sequence in the NIST implementation is 256 MB, because it uses the int type to store the size of the sequence in bits. The int type is processor dependent and still has a length of 32 bits on most of the existing 64-bit architectures. SeqNIST and ParNIST use the double type to store the size of the sequence in bits, therefore allowing sequences greater than 256 MB. The maximum length of the input file in these implementations is actually limited by the amount of space that can be allocated dynamically in a process. Because of this limitation the maximum allowed input sequence size in SeqNIST and ParNIST is 1 GB.
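The 256 MB figure follows directly from the range of a signed 32-bit integer, as the short calculation below shows. The variable names are illustrative; the only assumptions are a 32-bit signed int and an IEEE 754 double, which represents every integer up to 2**53 exactly.

```python
INT32_MAX = 2**31 - 1                  # largest value of a signed 32-bit int

max_bytes_with_int = INT32_MAX // 8    # largest sequence whose bit count fits
one_gb_in_bits = 8 * 2**30             # bit count of a 1 GB sequence

print(max_bytes_with_int)              # 268435455 bytes, just under 256 MB
print(one_gb_in_bits > INT32_MAX)      # True: a 1 GB sequence overflows the int
print(float(one_gb_in_bits) == one_gb_in_bits)  # True: a double holds it exactly
```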

The original NIST implementation asks the user to introduce the length of the sequence to be tested and the number of streams. In our implementation the length of the input file is computed automatically and 4 testing modes are available. The first two are similar to the ones offered by NIST: testing the whole sequence or testing several blocks taken from the current sequence. In the second case the novelty brought by our implementation is that the consecutive blocks can start from any offset within the input sequence, unlike in the NIST implementation, where the blocks start by default from the beginning of the sequence. The last two modes are also a novelty brought by our implementation, because the blocks are not equal this time, but are incremented or decremented with a certain step. In modes 2, 3 and 4, ParNIST offers the possibility to run the blocks in parallel.

Our implementation also offers the possibility to choose the significance level wanted, unlike in the NIST implementation where it is hardcoded to 0.01. ParTestRand also offers the possibility to choose the number of threads used.

For some of the tests implemented by me the required parameters or their allowed ranges differ between my implementation and the original implementation. For the second test the length of the sequence can take greater values in our implementation, because the long type is used to store the length of the block in bytes, while in the NIST implementation the int type is used to store the length of the sequence in bits. Another difference here is that the length of the block in our implementation is always given in bytes. In the NIST implementation the 4th test can be run for a block length of 8, 128 or 10^4 bits, depending on the value chosen at compile time. In our implementation the size of the block is hardcoded to 8 in order to be able to work on bytes. In the original implementation the 7th NIST test takes the length of the searched pattern as one of its parameters, while the length of the block is hardcoded. In our implementation the length of the pattern is hardcoded to 8, in order to be able to work on bytes, and the length of the block in bytes is taken as a parameter. The NIST documentation states that the size of the block can be changed if some conditions are met. For NIST test 8, the original implementation takes as parameter the length of the run of ones searched, while in our implementation it is hardcoded to 9, which is one of the two values recommended by NIST in their documentation. For NIST test 9, the original implementation takes as parameters the length of a block and the length of the initialization sequence, while in our implementation the length of the block is hardcoded to 8, in order to be able to work on bytes, and the size of the initialization sequence is computed automatically according to a formula provided in the NIST documentation.
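The text does not say which formula is used for the initialization sequence of test 9. The sketch below assumes the recommendation given in NIST SP 800-22, Q = 10 * 2^L, with the remaining K = floor(n / L) - Q blocks forming the test segment; the function name is hypothetical.

```python
def maurer_segments(n_bytes, L=8):
    """Return (Q, K): the number of L-bit blocks in the initialization
    segment and in the test segment of Maurer's test.
    Assumes Q = 10 * 2**L, as recommended in NIST SP 800-22."""
    q = 10 * 2**L
    total_blocks = (8 * n_bytes) // L
    k = total_blocks - q
    if k <= 0:
        raise ValueError("the sequence is too short for this block length")
    return q, k

# For L = 8 every block is one byte, so a 1 MB file has 1048576 blocks.
print(maurer_segments(1048576))
```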

Another novelty that our implementation brings is running the tests on multiple files, as a batch job, which makes it much easier to test a directory of files. In the NIST implementation, each of the files needs to be introduced separately, while in our implementation only the name of the directory is required.

Our implementation displays the textual results directly in the interface and offers the possibility to visualize them graphically, while in the NIST implementation the interface closes every time a test is run and the results are saved in a separate file for each test, which makes them harder to inspect. Also, our interface is not blocked when the tests are run, because a processing window is displayed to alert the user that the application is still running.

In conclusion, our implementation of the NIST statistical test suite is not only efficient, but also offers useful functionality that is not present in the original implementation.

8.3 Future Development

The SeqTestRand and ParTestRand applications can be extended by adding an unlimited number of statistical batteries. This is possible because the implementation and integration of each battery makes them independent. If other batteries are added, the existent ones are not affected, and if one of the batteries is modified, it does not affect the other batteries. The SFC application can also be extended by adding a number of statistical tests, because they are also implemented in an independent way.

The first version of the tests, which reads only 200000 bytes of the input sequence at a time, can be adapted to run on a grid for files larger than 1 GB. With the processing power of a grid, the execution time for large files might decrease.
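Reading the input in fixed-size pieces can be sketched as below. The chunk size of 200000 bytes is taken from the text; everything else (the function name `count_ones_streaming` and the choice of a simple bit count as the per-chunk work) is an assumption made for illustration.

```python
CHUNK_SIZE = 200_000  # bytes read from the input at a time, as stated in the text

# Number of 1 bits for every possible byte value, computed once.
POPCOUNT = [bin(value).count("1") for value in range(256)]


def count_ones_streaming(stream, chunk_size: int = CHUNK_SIZE):
    """Count the 1 bits of a binary stream without loading it all into memory.

    `stream` is any object with a read(size) method that returns bytes,
    for example a file opened in "rb" mode.
    Returns (number of 1 bits, total number of bits).
    """
    ones = 0
    total_bits = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # end of input
            break
        ones += sum(POPCOUNT[b] for b in chunk)
        total_bits += 8 * len(chunk)
    return ones, total_bits
```

Because each chunk is processed independently and only two running totals are kept, the same loop body could in principle be distributed over several workers, each handling its own range of chunks.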

The applications described in this project will be integrated into the TestRand system, which will in turn be integrated into the CryptoRand research project.




10 Annexes

10.1 Experimental Results Tables

10.1.1 NIST Battery

Table 10-12. Execution times obtained for the Frequency (Monobit) Test (Test 1)

| File Size (MB) | NIST | SeqNIST | Speed-up SeqNIST/NIST | ParNIST | Speed-up ParNIST/NIST | Speed-up ParNIST/SeqNIST |
|---|---|---|---|---|---|---|
| 20 | 312.6 | 31 | 10.08 | 14.2 | 22.01 | 2.18 |
| 40 | 625 | 63 | 9.92 | 15.8 | 39.56 | 3.99 |
| 60 | 937 | 96.4 | 9.72 | 15.4 | 60.84 | 6.26 |
| 80 | 1253 | 125 | 10.02 | 28 | 44.75 | 4.46 |
| 100 | 1565.2 | 159.2 | 9.83 | 27.8 | 56.30 | 5.73 |
| 120 | 1868.6 | 187.8 | 9.95 | 34.2 | 54.64 | 5.49 |
| 140 | 2187.2 | 231.2 | 9.46 | 40.6 | 53.87 | 5.69 |
| 160 | 2503.2 | 256.2 | 9.77 | 47 | 53.26 | 5.45 |
| 180 | 2815.4 | 294 | 9.58 | 53 | 53.12 | 5.55 |
| 198 | 3100.2 | 315.2 | 9.84 | 53.4 | 58.06 | 5.90 |
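The speed-up columns in these tables are ratios of execution times. Judging from the tabulated values, each one divides the time of the slower implementation by the time of the faster one: for example, the column labelled SeqNIST/NIST equals the NIST time divided by the SeqNIST time. A short sketch that reproduces the first row of Table 10-12 (the function name `speed_ups` is hypothetical):

```python
def speed_ups(t_nist: float, t_seq: float, t_par: float):
    """Return the three speed-ups reported in the tables, rounded to 2 decimals.

    Order: SeqNIST over NIST, ParNIST over NIST, ParNIST over SeqNIST.
    """
    return (
        round(t_nist / t_seq, 2),
        round(t_nist / t_par, 2),
        round(t_seq / t_par, 2),
    )


# First row of Table 10-12 (20 MB file): NIST 312.6, SeqNIST 31, ParNIST 14.2.
print(speed_ups(312.6, 31, 14.2))   # (10.08, 22.01, 2.18)
```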

Table 10-13. Execution times obtained for the Frequency Test within a Block (Test 2)

| File Size (MB) | Block Length in Bytes | NIST | SeqNIST | Speed-up SeqNIST/NIST | ParNIST | Speed-up ParNIST/NIST | Speed-up ParNIST/SeqNIST |
|---|---|---|---|---|---|---|---|
| 20 | 300000 | 312.20 | 34.2 | 9.13 | 14.4 | 21.68 | 2.38 |
| 40 | 500000 | 618.80 | 69.4 | 8.92 | 15 | 41.25 | 4.63 |
| 60 | 700000 | 928.00 | 93.4 | 9.94 | 15.4 | 60.26 | 6.06 |
| 80 | 900000 | 1253.20 | 125 | 10.03 | 32.2 | 38.92 | 3.88 |
| 100 | 1100000 | 1565.60 | 157.4 | 9.95 | 34.2 | 45.78 | 4.60 |
| 120 | 1300000 | 1865.40 | 193.8 | 9.63 | 27 | 69.09 | 7.18 |
| 140 | 1500000 | 2177.80 | 228.6 | 9.53 | 37.4 | 58.23 | 6.11 |
| 160 | 1700000 | 2496.80 | 259.6 | 9.62 | 31 | 80.54 | 8.37 |
| 180 | 1900000 | 2812.40 | 289 | 9.73 | 49.8 | 56.47 | 5.80 |
| 198 | 2100000 | 3075.00 | 312.2 | 9.85 | 47.2 | 65.15 | 6.61 |


Table 10-14. Execution times obtained for the Runs Test (Test 3)

| File Size (MB) | NIST | SeqNIST | Speed-up SeqNIST/NIST | ParNIST | Speed-up ParNIST/NIST | Speed-up ParNIST/SeqNIST |
|---|---|---|---|---|---|---|
| 5 | 296.6 | 49.8 | 5.96 | 15.2 | 19.51 | 3.28 |
| 9 | 521.8 | 87.4 | 5.97 | 16 | 32.61 | 5.46 |
| 13 | 756.4 | 125 | 6.05 | 25.2 | 30.02 | 4.96 |
| 17 | 987.6 | 172 | 5.74 | 28 | 35.27 | 6.14 |
| 21 | 1218.8 | 209.6 | 5.81 | 31.4 | 38.82 | 6.68 |
| 25 | 1443.8 | 250 | 5.78 | 41 | 35.21 | 6.10 |
| 29 | 1684.6 | 290.8 | 5.79 | 47 | 35.84 | 6.19 |
| 33 | 1903 | 328.2 | 5.80 | 50 | 38.06 | 6.56 |
| 37 | 2140.4 | 369 | 5.80 | 50 | 42.81 | 7.38 |
| 43 | 2494 | 434.8 | 5.74 | 56.6 | 44.06 | 7.68 |

Table 10-15. Execution times obtained for the Test for the Longest Run of Ones in a Block (Test 4)

| File Size (MB) | NIST | SeqNIST | Speed-up SeqNIST/NIST | ParNIST | Speed-up ParNIST/NIST | Speed-up ParNIST/SeqNIST |
|---|---|---|---|---|---|---|
| 20 | 1612.6 | 250 | 6.45 | 31 | 52.02 | 8.06 |
| 40 | 3234.2 | 494 | 6.55 | 66.2 | 48.85 | 7.46 |
| 60 | 4840 | 747 | 6.48 | 93.6 | 51.71 | 7.98 |
| 80 | 6456.2 | 984.8 | 6.56 | 131.2 | 49.21 | 7.51 |
| 100 | 8074.6 | 1237.2 | 6.53 | 156 | 51.76 | 7.93 |
| 120 | 9684.2 | 1484.8 | 6.52 | 187.6 | 51.62 | 7.91 |
| 140 | 11297 | 1734.4 | 6.51 | 222 | 50.89 | 7.81 |
| 160 | 12903.2 | 1984.4 | 6.50 | 250 | 51.61 | 7.94 |
| 180 | 14515.8 | 2234.2 | 6.50 | 284.2 | 51.08 | 7.86 |
| 199 | 16053 | 2462.6 | 6.52 | 312.4 | 51.39 | 7.88 |

Table 10-16. Execution times obtained for the Non-overlapping Template Matching Test (Test 7)

| File Size (MB) | Block Length in Bytes | NIST | SeqNIST | Speed-up SeqNIST/NIST | ParNIST | Speed-up ParNIST/NIST | Speed-up ParNIST/SeqNIST |
|---|---|---|---|---|---|---|---|
| 20 | 2621440 | 133868.80 | 42678.4 | 3.14 | 7212.6 | 18.56 | 5.92 |
| 40 | 5242880 | 267681.20 | 85429 | 3.13 | 14428.6 | 18.55 | 5.92 |
| 60 | 7864320 | 401700.20 | 128041 | 3.14 | 21637.4 | 18.57 | 5.92 |
| 80 | 10485760 | 535350.00 | 170800 | 3.13 | 28856 | 18.55 | 5.92 |
| 100 | 13107200 | 669190.80 | 213384.4 | 3.14 | 36068.4 | 18.55 | 5.92 |
| 120 | 15728640 | 803062.60 | 257709.2 | 3.12 | 43275.2 | 18.56 | 5.96 |
| 140 | 18350080 | 936743.60 | 298836.8 | 3.13 | 50494 | 18.55 | 5.92 |
| 160 | 20971520 | 1070737.20 | 341469.6 | 3.14 | 57714.8 | 18.55 | 5.92 |
| 180 | 23592960 | 1205012.80 | 384162.6 | 3.14 | 64926.6 | 18.56 | 5.92 |
| 198 | 25952256 | 1325268.40 | 422634.2 | 3.14 | 71403.4 | 18.56 | 5.92 |

Table 10-17. Execution times obtained for the Overlapping Template Matching Test (Test 8)

| File Size (MB) | NIST | SeqNIST | Speed-up SeqNIST/NIST | ParNIST | Speed-up ParNIST/NIST | Speed-up ParNIST/SeqNIST |
|---|---|---|---|---|---|---|
| 20 | 10918 | 722.2 | 15.12 | 106 | 103.00 | 6.81 |
| 40 | 21819 | 1438 | 15.17 | 212.8 | 102.53 | 6.76 |
| 60 | 32750 | 2162.4 | 15.15 | 312.4 | 104.83 | 6.92 |
| 80 | 43665 | 2884.4 | 15.14 | 418.8 | 104.26 | 6.89 |
| 100 | 54599 | 3599.6 | 15.17 | 518.4 | 105.32 | 6.94 |
| 120 | 65475 | 4321.8 | 15.15 | 637.2 | 102.75 | 6.78 |
| 140 | 76419 | 5037.6 | 15.17 | 743.6 | 102.77 | 6.77 |
| 160 | 87322 | 5759.6 | 15.16 | 844 | 103.46 | 6.82 |
| 180 | 98200 | 6481.4 | 15.15 | 946.6 | 103.74 | 6.85 |
| 198 | 108022 | 7134.6 | 15.14 | 1046.8 | 103.19 | 6.82 |

Table 10-18. Execution times obtained for Maurer’s “Universal Statistical” Test (Test 9)

| File Size (B) | NIST | SeqNIST | Speed-up SeqNIST/NIST |
|---|---|---|---|
| 258560 | 227.8 | 18.8 | 12.12 |
| 295000 | 256.2 | 20 | 12.81 |
| 330000 | 287.6 | 22.6 | 12.73 |
| 365000 | 318.6 | 25 | 12.74 |
| 400000 | 359.4 | 27.2 | 13.21 |
| 435000 | 378 | 29.2 | 12.95 |
| 470000 | 412.4 | 32 | 12.89 |
| 505000 | 437.4 | 34.4 | 12.72 |
| 540000 | 469 | 36.2 | 12.96 |
| 580000 | 503.2 | 39.2 | 12.84 |

10.1.2 CryptoRand Battery

Table 10-19. Execution time obtained for Histogram Test 5 (bits)

| File Size (MB) | SeqHistograms |
|---|---|
| 100 | 393.6 |
| 200 | 788.6 |
| 300 | 1179.8 |
| 400 | 1577.2 |
| 500 | 1971 |
| 600 | 2365.6 |
| 700 | 2760.4 |
| 800 | 3154.6 |
| 900 | 3549 |
| 1024 | 4047.4 |

Table 10-9. Execution time obtained for Histogram Test 6 (bits)

| File Size (MB) | SeqHistograms |
|---|---|
| 100 | 352.2 |
| 200 | 703 |
| 300 | 1058.2 |
| 400 | 1411.2 |
| 500 | 1768.8 |
| 600 | 2114.8 |
| 700 | 2475.8 |
| 800 | 2811.2 |
| 900 | 3183.4 |
| 1024 | 3609.4 |

Table 10-10. Execution time obtained for Histogram Test 7 (bits)

| File Size (MB) | SeqHistograms |
|---|---|
| 100 | 337.8 |
| 200 | 643 |
| 300 | 960.8 |
| 400 | 1213.4 |
| 500 | 1512 |
| 600 | 1814 |
| 700 | 2130.8 |
| 800 | 2411.6 |
| 900 | 3052.2 |
| 1024 | 3462.6 |

Table 10-11. Execution time obtained for Histogram Test 8 (bits)

| File Size (MB) | SeqHistograms |
|---|---|
| 100 | 166 |
| 200 | 333 |
| 300 | 498.6 |
| 400 | 665.6 |
| 500 | 831.4 |
| 600 | 994.4 |
| 700 | 1169 |
| 800 | 1330.8 |
| 900 | 1497.4 |
| 1024 | 1703.2 |

10.2 Papers

This section includes three articles that I co-authored and that are connected to the project described in this paper. The first was presented at The Twelfth International Conference on Applied Mathematics and Computer Science. The second was presented at the Automation and Computer Science Students Conference. Both papers were published. The third was submitted to the Eighth International Conference on Parallel Processing and Applied Mathematics.
