On the adequacy of pseudo-random number generators (or: How big a period do we need?)


Volume 5, Number 1 OPERATIONS RESEARCH LETTERS June 1986

ON THE ADEQUACY OF PSEUDO-RANDOM NUMBER GENERATORS (OR: HOW BIG A PERIOD DO WE NEED?)

David HEATH

School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY 14853, USA

Paul SANCHEZ

Systems and Industrial Engineering Department, University of Arizona, Tucson, AZ 85721, USA

Received July 1985
Revised January 1986

Generating pseudo-random number sequences is central to discrete-event computer simulations. Traditionally, generators are used only after they have passed several statistical tests. We propose a new, easy-to-implement test of randomness based upon the 'birthday problem', which has certain optimal properties. When applied to a broad class of generators (which includes linear congruential generators) the test illuminates a previously unknown relationship between the period of the generator and the number of pseudo-random observations which can be safely used.

pseudo-random numbers * discrete event simulation * linear congruential generators * statistical tests of randomness

1. Introduction

In this note we consider the adequacy of a certain class of pseudo-random number generators. Our point of view can be thought of as somewhere between two cited by Knuth [4, p. 127]:

D.H. Lehmer (1951): 'A random sequence is a vague notion embodying the idea of a sequence in which each term is unpredictable to the uninitiated and whose digits pass a certain number of tests, traditional with statisticians and depending somewhat on the uses to which the sequence is to be put.'

J.N. Franklin (1962): 'The sequence ... is random if it has every property that is shared by all infinite sequences of independent samples of random variables from the uniform distribution.'

One of the characteristics which distinguishes computer simulations from analytical or numerical methods of evaluating the performance of a system is the need for random variate generation. Bratley, Fox and Schrage [1] state: 'Generating random numbers uniformly distributed in a specified interval is fundamental to simulation.' However, for most purposes it is both impractical and undesirable to generate truly random values. What is often done in practice is to generate sequences of numbers which are deterministic but which pass certain statistical tests. The elements of these sequences are called pseudo-random numbers (PRN's), and a great deal of effort is put into developing and testing algorithms for generating them. All of the algorithms implemented on digital computers share certain properties: (1) they use finite length bit sequences, (2) they generate the next value by some transformation of a finite number of bits of the previous value(s).

Pseudo-random numbers would ideally be independent and uniformly distributed on the open interval (0, 1). Since finitely many (say $b$) bits are being utilized this is equivalent to generating integer values in the range $[0, 2^b)$. (Note that this is the maximum range which is obtainable. Not all algorithms use the full set of integers. For example, Knuth [4, pp. 18-19] shows that multiplicative congruential generators which use modulo $N = 2^b$ arithmetic have periods $\le N/4$ for $b \ge 3$. Thus at most a quarter of the values in the range $[0, N-1]$ can be obtained.) Statistical tests which are currently in use for evaluating generators address issues such as whether the observed values appear to be uniformly distributed, or whether $k$-tuples of observations are correlated (see [5]). Correlation is a measure of linear dependence, and is used as a surrogate measure of independence.

0167-6377/86/$3.50 © 1986, Elsevier Science Publishers B.V. (North-Holland)
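The period bound cited from Knuth can be checked by brute force for small moduli. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and only odd multipliers are tried because an even multiplier modulo $2^b$ is not invertible and so does not give a purely periodic sequence.

```python
def multiplicative_period(a: int, seed: int, modulus: int) -> int:
    """Length of the cycle of x -> a*x mod modulus that contains seed.

    Assumes a is odd and modulus is a power of two, so the map is a
    bijection and the orbit is guaranteed to return to seed.
    """
    x = (a * seed) % modulus
    steps = 1
    while x != seed:
        x = (a * x) % modulus
        steps += 1
    return steps


def max_period(b: int) -> int:
    """Largest period over all odd multipliers and nonzero seeds modulo 2**b."""
    n = 2 ** b
    return max(
        multiplicative_period(a, seed, n)
        for a in range(1, n, 2)   # odd multipliers only (assumption, see above)
        for seed in range(1, n)   # seed 0 is a fixed point and is skipped
    )


if __name__ == "__main__":
    for b in range(3, 8):
        n = 2 ** b
        print(f"b={b}  N={n}  max period={max_period(b)}  N/4={n // 4}")
```

For these small values of $b$ the largest period found equals $N/4$, which agrees with the bound stated above.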

We propose here a new statistical test for randomness. Our main conclusion is that for a generator to pass this test its period must be at least some multiple of the square of the number of observations required, and that in a certain precise sense such a period is big enough.

2. Basic model and results

We shall consider only PRN generators which can be described in the following way: there is a finite set, $S$, of possible seeds, which we shall number $0, 1, \ldots, N-1$. Beginning with some initial seed $s_0$, the generator produces the next seed according to a fixed function $f$. In general, $s_{n+1} = f(s_n)$. We assume that the reported number is the seed. Finally, we assume that the generator is of full period, i.e., that beginning with any seed in $S$ and repeatedly applying $f$ the generator will eventually hit every other seed.

Many commonly-used generators are of the above form. In particular, both linear congruential and shift register generators usually work in this way.

We consider the test in a game-like form. Two players, the 'generator picker' and the 'statistician', agree to the following rules:

(a) The generator picker picks a generator of the above type (with an initial seed) and announces its period, $N$.

(b) The statistician chooses a statistical test. The null hypothesis is that the sequence of numbers he is about to see is random (independent and uniform on $\{0, 1, \ldots, N-1\}$).

(c) $k(N)$ successive random numbers are obtained from the generator.

(d) The statistician applies his test.

We shall see that whether the generator picker succeeds (i.e., the statistician cannot reject the null hypothesis in a reasonable way) or the statistician succeeds (he can) depends entirely upon the function $k(N)$. To make more precise the notion of a 'reasonable way' for the statistician to reject, we use the usual statistical testing model, and let $\alpha$ and $\beta$ denote the probabilities of errors of type I (of rejecting when the generator is really random) and of type II (of failing to reject a non-random generator). To judge the statistician's performance in the above game one could, for example, ask how small $\beta$ can be made when we restrict $\alpha$ to be less than $\alpha_0$; we shall instead ask how small $\alpha + \beta$ can be made.

Our results can be summarized as follows:

Theorem. For large $N$, the statistician can construct a test with $\beta = 0$ and $\alpha$ approximately equal to $\exp\{-k^2(N)/2N\}$. Conversely, for any $\varepsilon > 0$ we can assure that $\alpha + \beta > 1 - \varepsilon$ for every test by requiring $k^2(N)/N < \delta(\varepsilon)$.

Remark. This theorem indicates that prudence requires picking $k^2(N)/N$ small. The converse indicates that for finite period generators, if $k^2(N)/N$ is small then there is a good way to choose a generator (although this procedure is more of theoretical than practical interest). In fact, the converse shows that in a very precise sense it is not possible to construct any statistical test (based on the birthday problem or any other statistical property) to distinguish a generator so chosen from a truly random one. Of course it is possible to choose badly; thus, keeping $k^2(N)/N$ small does not, by itself, suffice.

Proof. To show that the statistician can do well we exhibit a good test. Our test is based on the 'birthday problem' (see Feller [2, p. 33] or Von Mises [6]). Under our assumptions concerning generators the sequences of numbers produced will not have any 'repeats' in the first $N$ elements. But truly random numbers would have repeats in relatively short sequences. Thus our test is: reject $H_0$ if and only if the sequence observed has no repeats.

This test will clearly always reject $H_0$ for a generator in our class (as long as $k(N) < N$), so $\beta = 0$. The value of $\alpha$ is the probability that a sequence of independent uniform random variables on $0, 1, \ldots, N-1$ fails to repeat in $k(N)$ trials.
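The one-sided test just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, and the example generator is a 16-bit linear congruential generator with multiplier 25173 and increment 13849, chosen here only because it satisfies the standard full-period conditions for a power-of-two modulus (odd increment, multiplier congruent to 1 modulo 4).

```python
def has_repeat(values) -> bool:
    """Return True if any value occurs more than once."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False


def reject_h0(values) -> bool:
    """One-sided birthday test: reject 'truly random' iff there are no repeats."""
    return not has_repeat(values)


def lcg_stream(a: int, c: int, n: int, seed: int, k: int) -> list:
    """First k seeds s_1..s_k of s_{j+1} = (a*s_j + c) mod n; the seed is the output."""
    out = []
    s = seed
    for _ in range(k):
        s = (a * s + c) % n
        out.append(s)
    return out


if __name__ == "__main__":
    N = 2 ** 16
    k = 3 * 256  # 3 * sqrt(N) observations
    sample = lcg_stream(25173, 13849, N, seed=1, k=k)  # hypothetical example generator
    print("reject H0:", reject_h0(sample))
```

Because the example generator has full period, its first $N$ outputs are distinct, so the test rejects for any $k(N) < N$; this is the sense in which $\beta = 0$.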

The calculation of $\alpha$ is standard: $\alpha$ is given by $N!/((N-k)!\,N^k)$. Setting $k(N) = \theta N^{0.5}$ and using Stirling's formula (or the results of [3] or [6]) we obtain that for large $N$, $\alpha$ is approximately given by $\exp\{-\theta^2/2\}$. For example, if we allow the statistician to sample, say, $3 N^{0.5}$ outcomes from the generator, he can construct a test with $\beta = 0$ and $\alpha \approx 0.0111$.
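To see how close the approximation is for a moderate period, the exact no-repeat probability $N!/((N-k)!\,N^k) = \prod_{i=0}^{k-1}(1 - i/N)$ can be compared numerically with $\exp\{-k^2/2N\}$. The sketch below uses hypothetical function names and sums logarithms to avoid overflow in the factorials.

```python
import math


def alpha_exact(n: int, k: int) -> float:
    """P(no repeat among k independent uniform draws from n values)."""
    log_p = sum(math.log1p(-i / n) for i in range(k))
    return math.exp(log_p)


def alpha_approx(n: int, k: int) -> float:
    """Large-n approximation exp(-k^2 / (2n))."""
    return math.exp(-k * k / (2 * n))


if __name__ == "__main__":
    n, k = 65536, 768  # k = 3 * sqrt(n), so theta = 3
    print(f"exact  = {alpha_exact(n, k):.5f}")
    print(f"approx = {alpha_approx(n, k):.5f}")
```

With $\theta = 3$ the approximation is $\exp\{-4.5\} \approx 0.0111$, and for $N = 65536$ the exact value differs from it only in the fourth decimal place.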

Conversely, we shall now show that the generator picker can do well if $k^2(N)/N$ is small. It is easy to see that for us a generator together with its initial seed is exactly a permutation of the integers $0, 1, \ldots, N-1$. A good (mixed) strategy for the generator picker is as follows: try to pick a permutation by choosing the elements independently and uniformly from the set. This works until a duplicate occurs. When a duplicate occurs, throw away that outcome and pick again. We claim that no statistical test can do well against this strategy if $k(N)$ is small.

To see this, consider any statistical test. That test will reject $H_0$ with a truly random sequence with probability $\alpha$. How often can it reject $H_0$ with the data from the generator? Clearly it can reject with probability no bigger than $\alpha + P\{\text{we got a duplicate in the first } k(N) \text{ trials}\}$. Hence $\beta \ge 1 - \alpha - P\{\text{we got a duplicate}\}$, so $\alpha + \beta \ge 1 - P\{\text{we got a duplicate}\}$. This probability was estimated above. Hence we conclude: if $k^2(N)/N$ is big, our test will reject any generator with high probability, while if it is small, a generator can be chosen so that no statistical test can detect that the generator is really non-random.
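The chain of inequalities in this argument can be written out as follows. Here $D$ denotes the event that the picker's first $k(N)$ independent draws contain a duplicate; on the complement of $D$ the picker's sequence coincides with a truly random one. The last line is a union bound over pairs of draws and is an added step, not taken from the paper.

```latex
\begin{align*}
P\{\text{reject} \mid \text{picker's sequence}\}
  &\le P\{\text{reject} \mid \text{truly random sequence}\} + P(D)
   = \alpha + P(D), \\
\beta = 1 - P\{\text{reject} \mid \text{picker's sequence}\}
  &\ge 1 - \alpha - P(D), \\
\alpha + \beta &\ge 1 - P(D), \\
P(D) &\le \binom{k(N)}{2}\frac{1}{N} < \frac{k^2(N)}{2N}.
\end{align*}
```

Under this bound, requiring $k^2(N)/N < 2\varepsilon$ gives $\alpha + \beta > 1 - \varepsilon$, so $\delta(\varepsilon) = 2\varepsilon$ is one admissible choice in the theorem (not necessarily the one the authors had in mind).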

The generator picker might try to fool the statistician by periodically repeating a value. The statistician could guard against that possibility by making his test two-sided. The new test would be: reject $H_0$ if the sequence observed has either too many or too few repeats. For this purpose one needs to know the distribution of the number of matches for a truly random sequence. Holst [3] shows that if $N \to \infty$ and $k \to \infty$ in such a way that $k^2/2N \to \lambda$, the distribution of the number of matches tends to a Poisson distribution with mean $\lambda$.
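Given the Poisson limit, a two-sided rejection region can be read off from Poisson tail probabilities. The sketch below is one possible construction, assuming an equal split of the level between the two tails; the function names and that splitting rule are illustrative choices, not specified in the paper.

```python
import math


def poisson_pmf(j: int, lam: float) -> float:
    """P(X = j) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** j / math.factorial(j)


def two_sided_region(lam: float, level: float) -> tuple:
    """Return (lo, hi): reject H0 if matches < lo or matches > hi.

    Each tail receives at most level/2 probability under Poisson(lam).
    """
    half = level / 2
    # Lower tail: grow lo while P(X < lo + 1) stays within half the level.
    lo = 0
    below = 0.0
    while below + poisson_pmf(lo, lam) <= half:
        below += poisson_pmf(lo, lam)
        lo += 1
    # Upper tail: smallest hi with P(X > hi) within half the level.
    hi = 0
    cdf = poisson_pmf(0, lam)
    while 1.0 - cdf > half:
        hi += 1
        cdf += poisson_pmf(hi, lam)
    return lo, hi


if __name__ == "__main__":
    lam = 768 ** 2 / (2 * 65536)  # k^2 / 2N = 4.5 for k = 768, N = 65536
    print(two_sided_region(lam, level=0.05))
```

For $\lambda = 4.5$ and a 5% level this yields the region "fewer than 1 or more than 9 matches", so observing no repeats at all still leads to rejection, as in the one-sided test.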

3. Remarks

Such a test can easily be implemented using a binary heap data structure. We generate sufficiently many observations to attain power $\alpha$ as specified above, and each observation is placed on the heap. Then all observations are popped off the heap and duplicates are counted. If we observe either too many or too few duplicates we reject the hypothesis that the numbers are truly random. Since $n$ heap operations take order $n \log n$ time, we have an efficient and easily implemented two-sided statistical test.
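The heap-based procedure might look like the following sketch, which uses Python's standard `heapq` module. The function names are hypothetical, the thresholds `lo` and `hi` are assumed to be supplied by the caller, and "duplicates" is taken here to mean the number of popped values equal to the value popped just before them.

```python
import heapq


def count_duplicates(observations) -> int:
    """Push every observation on a binary heap, pop in sorted order, count repeats."""
    heap = []
    for value in observations:
        heapq.heappush(heap, value)

    duplicates = 0
    previous = object()  # sentinel that compares unequal to every observation
    while heap:
        value = heapq.heappop(heap)
        if value == previous:
            duplicates += 1
        previous = value
    return duplicates


def two_sided_reject(observations, lo: int, hi: int) -> bool:
    """Reject H0 if the duplicate count falls outside [lo, hi]."""
    d = count_duplicates(observations)
    return d < lo or d > hi


if __name__ == "__main__":
    print(count_duplicates([3, 1, 3, 2, 1, 3]))       # three repeats
    print(two_sided_reject(range(768), lo=1, hi=9))   # no repeats at all
```

A hash set would count duplicates in expected linear time; the heap is kept here because it is the structure the text describes.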

The number of observations required to detect the pseudo-random nature of the generator is really quite small. For example, with a 16-bit seed, $N$ cannot exceed 65536. Thus with (as calculated above) $3 \times 256 = 768$ observations, our statistical test will reject such a generator at about the 1% level.

The converse result, which shows that a generator can be chosen in such a way that its lack of true randomness cannot be detected for quite a while, may seem more reassuring than it really is. It is easy to see that linear congruential generators (for which $s_{n+1} = a s_n + b \pmod{N}$) do not suffice: there are no more than $N^3$ sequences which can arise for any particular $N$ (since there are at most $N$ choices for $a$, for $b$, and for $s_0$). Under our structure a statistician desiring to reject linear congruential generators could simply make a list of the first four (or five or six) elements of each of these sequences. He could then check to see whether the first four-tuple of observations is on this list. Under $H_0$ this should occur with probability $\alpha < 1/N$, since the list covers at most $N^3$ of the $N^4$ possible four-tuples; this is very small for typical values of $N$.
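For a toy modulus the list of four-tuples can be built exhaustively, which makes the counting argument concrete. The sketch below uses hypothetical function names and takes the four-tuple to be $(s_1, s_2, s_3, s_4)$, the first four generated values after the initial seed; that indexing is an assumption.

```python
from itertools import product


def lcg_prefix(a: int, b: int, s0: int, n: int, length: int = 4) -> tuple:
    """First `length` values of s_{j+1} = (a*s_j + b) mod n, starting after s0."""
    out = []
    s = s0
    for _ in range(length):
        s = (a * s + b) % n
        out.append(s)
    return tuple(out)


def lcg_prefixes(n: int, length: int = 4) -> set:
    """All distinct prefixes over every choice of a, b and s0 in 0..n-1."""
    return {
        lcg_prefix(a, b, s0, n, length)
        for a, b, s0 in product(range(n), repeat=3)
    }


def on_list_probability(n: int, length: int = 4) -> float:
    """Chance that a truly random tuple from {0..n-1}^length is on the list."""
    return len(lcg_prefixes(n, length)) / n ** length


if __name__ == "__main__":
    n = 16
    table = lcg_prefixes(n)
    observed = lcg_prefix(5, 3, 7, n)
    print(observed in table, on_list_probability(n), 1 / n)
```

For realistic $N$ the list is far too large to enumerate, which is consistent with the remark that the scheme is of theoretical rather than practical interest.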

It is amusing to note that our approach suggests that the random number generator should be chosen at random (in a very special way). Knuth [4, p. 5] suggests the contrary: "The moral to this story is that random numbers should not be generated with a method chosen at random."

Our methods do not apply directly to generators which do not report the full seed, but rather some function of it. To attempt to extend our approach to handle such generators, we implemented a 'super-duper' type of generator which produces each value as a non-invertible function of two generators with relatively prime periods. Specifically, we combined (XORed) the outputs of a multiplicative congruential and a shift-register generator. To overcome the non-invertibility of the XOR step, we concatenated several generated values to obtain enough bits so that invertibility was no longer obviously impossible. We were unable to reject the hypothesis that the output sequence was random with the proposed test.
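A combined generator of this general kind can be sketched as below. The paper does not give its parameters, so everything specific here is an assumption made for illustration: the multiplier 69069 modulo $2^{32}$ (period $2^{30}$ for odd seeds, since the multiplier is congruent to 5 modulo 8), the 32-bit shift-register step with shifts 13, 17 and 5 (period $2^{32} - 1$ for nonzero seeds), and all function names. The two periods are relatively prime because one is a power of two and the other is odd. The concatenation step described above is not reproduced.

```python
MASK32 = 0xFFFFFFFF


def mcg_step(x: int) -> int:
    """Multiplicative congruential step x -> 69069 * x mod 2^32 (assumed parameters)."""
    return (69069 * x) & MASK32


def shift_register_step(y: int) -> int:
    """32-bit xorshift step with shifts 13, 17, 5 (assumed parameters)."""
    y ^= (y << 13) & MASK32
    y ^= y >> 17
    y ^= (y << 5) & MASK32
    return y


def combined_stream(x0: int, y0: int, k: int) -> list:
    """XOR of the two component generators; the reported value hides both seeds."""
    x = (x0 | 1) & MASK32      # force an odd seed for the multiplicative part
    y = (y0 & MASK32) or 1     # the shift register must not start at zero
    out = []
    for _ in range(k):
        x = mcg_step(x)
        y = shift_register_step(y)
        out.append(x ^ y)
    return out


if __name__ == "__main__":
    print(combined_stream(12345, 67890, 5))
```

Since the reported value is no longer the seed, repeats in the output are possible long before the combined period is exhausted, which is why the no-repeat argument of Section 2 does not carry over directly.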


References

[1] Bratley, Fox and Schrage, A Guide to Simulation, Springer, New York, 1983.

[2] William Feller, An Introduction to Probability Theory and its Applications, Vol. 1, Wiley, New York, 1968.

[3] Lars Holst, "Limit theorems for some occupancy and sequential occupancy problems", Ann. Math. Stat. 42, 1671-1680 (1971).

[4] Donald E. Knuth, The Art of Computer Programming, Vol. 2, Addison-Wesley, Reading, MA, 1969.

[5] Harald Niederreiter, "The serial test for pseudo-random numbers generated by the linear congruential method", Numer. Math. 46, 51-68 (1985).

[6] R. Von Mises, "Ueber Aufteilungs- und Besetzungs-Wahrscheinlichkeiten", Revue de la Faculté des Sciences de l'Université d'Istanbul, N.S. 4, 145-163 (1939).