Parallel White Noise Generation on a GPU via Cryptographic Hash

Stanley Tzeng Li-Yi Wei

Microsoft Research Asia

What is White Noise?

Spatial domain: uniform random numbers

Frequency domain: flat power spectrum (white noise)

[Figure panels: spatial domain, frequency domain]

Importance

Mother of all random numbers

Commonly used, e.g. rand() in C/C++

Major algorithms are sequential

e.g. $x_n = (a\,x_{n-1} + b) \bmod c$

Processors are becoming parallel

GPU, multi-core CPU, Cell

sequential algorithms cannot leverage that
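To make the sequential dependency concrete, here is a minimal sketch of the recurrence quoted above. The constants `a`, `b`, and `c` are illustrative assumptions (commonly cited textbook values), not parameters taken from the slides.

```python
def lcg_sequence(seed, count, a=1103515245, b=12345, c=2**31):
    """Return `count` successive values of x_n = (a * x_{n-1} + b) mod c."""
    values = []
    x = seed
    for _ in range(count):
        # Each step consumes the previous value, so sample n cannot be
        # produced before samples 1..n-1 have been computed.
        x = (a * x + b) % c
        values.append(x)
    return values


samples = lcg_sequence(seed=1, count=4)
```

Restarting from `samples[0]` reproduces the rest of the sequence, which shows that the only way to reach a given sample is to walk through its predecessors.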

Contribution

Parallel algorithm for white noise

independent evaluation for every sample

easy implementation as a GPU pixel shader

speed faster than sequential algorithms

quality same or better

usage similar to texture mapping

PRNG (Pseudo Random Number Generator)

The main source of randomness in programs

Desirable properties

white noise statistics

repeatable

fast computation

low memory usage

Core Idea

1. input trivially prepared in parallel, e.g. linear ramp

2. feed input value into hash, independently and in parallel

3. output white noise

Key idea: borrow a cryptographic hash!

[Diagram: input → hash → output]
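The three steps can be sketched on the CPU with Python's standard `hashlib`. This is only an illustration of the idea, not the pixel shader from the talk; the function name `noise_sample` and the choice to pack each input as a 4-byte little-endian integer are assumptions.

```python
import hashlib
import struct


def noise_sample(index):
    """Hash one ramp value into four floats in [0, 1)."""
    # Assumption: the input value is packed as a 4-byte little-endian integer.
    digest = hashlib.md5(struct.pack("<I", index)).digest()
    # Split the 128-bit digest into four 32-bit words and normalize each.
    words = struct.unpack("<4I", digest)
    return tuple(w / 2**32 for w in words)


ramp = range(16)                       # step 1: trivially prepared input
noise = list(map(noise_sample, ramp))  # step 2: each sample hashed on its own
# step 3: `noise` now holds 16 samples of four values each
```

Because `noise_sample` reads nothing but its own argument, the `map` call could be distributed over any number of workers without changing the result.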

Hash

(however nice) input → (unrecognizable) mess

Cryptographic Hash

A subclass of hash

Commonly used for security applications

e.g. password, digital signature

Properties

irreversible – cannot find input from hash output

decorrelating – similar inputs, dissimilar outputs

uniform probability – all outputs equally likely to occur

Cryptographic Hash - Example

irreversible, decorrelating, uniform probability

CHash("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6

CHash("The quick brown fox jumps over the lazy eog") = ffd93f16876049265fbaef4da268dd0e
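The decorrelating property can be checked by counting how many output bits change when one input character changes. The sketch below assumes that CHash in the example is MD5, which matches the first digest shown; the helper names are hypothetical.

```python
import hashlib


def md5_bits(text):
    """MD5 digest of an ASCII string as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(text.encode("ascii")).digest(), "big")


def differing_bits(a, b):
    """Number of digest bits that differ between two inputs."""
    return bin(md5_bits(a) ^ md5_bits(b)).count("1")


dog = "The quick brown fox jumps over the lazy dog"
eog = "The quick brown fox jumps over the lazy eog"
changed = differing_bits(dog, eog)  # out of 128 bits
```

For a well-mixed hash, roughly half of the 128 bits would be expected to flip for any small change to the input.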

Cryptographic Hash as a PRNG

White noise statistics

CHash is cryptographically secure

Repeatable

CHash is invariant with same input

Fast computation

CHash is parallel + constant cost

Low memory usage

CHash maintains no state

Order-independent, i.e. randomly accessible

important for parallel GPU applications

Which Cryptographic Hash?

Many options

MD5, SHA, RIPEMD, Tiger, block ciphers, etc.

Desirable properties

white noise quality

fast computation

power-of-2 aligned (output & operations)

pure pixel shader, no state maintenance

Our Hash of Choice: MD5 [Rivest 1992]

128-bit output and 32-bit operations

Small number of constants; fits entirely in a shader

Fastest among those satisfying quality criteria

Not 100% secure [Wang and Yu 2005]

but good enough for our goal

MD5 Algorithm Overview

[Diagram: input → scrambling (bit operations, table lookups, arithmetic), repeated for 64 rounds → output; the scrambling stage reads a shift table and a sin table]

Performance Bottlenecks for Pixel Shader

[Diagram: input → scrambling (bit operations, table lookups, arithmetic), repeated for 64 rounds → output; the scrambling stage reads a shift table and a sin table]

Our Optimization

[Diagram: input → scrambling (bit operations, table lookups, arithmetic), repeated for 64 rounds → output]

sin table → sin function

shift table → reduced shift table

64 rounds → loop unrolling
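The following CPU-side sketch shows one plausible reading of the first two optimizations, based on the standard MD5 definition (RFC 1321): the 64 sine-derived constants are computed from `sin` instead of being looked up, and the shift amounts are stored as 16 values (4 per group of 16 rounds) instead of 64. It is not the authors' shader code; `md5_single_block` and the other names are hypothetical, and the function only handles messages shorter than 56 bytes.

```python
import hashlib
import math
import struct

# Reduced shift table: 4 rotation amounts per group of 16 rounds.
SHIFTS = [[7, 12, 17, 22], [5, 9, 14, 20], [4, 11, 16, 23], [6, 10, 15, 21]]


def sin_constant(i):
    """Round constant floor(|sin(i + 1)| * 2^32), computed on the fly."""
    return int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF


def rotl(x, s):
    """Rotate a 32-bit value left by s bits."""
    x &= 0xFFFFFFFF
    return ((x << s) | (x >> (32 - s))) & 0xFFFFFFFF


def md5_single_block(message):
    """MD5 digest of a message that fits in one 512-bit block."""
    assert len(message) < 56
    # Padding: a 0x80 byte, zeros up to 56 bytes, then the bit length.
    padded = message + b"\x80" + b"\x00" * (55 - len(message))
    padded += struct.pack("<Q", 8 * len(message))
    m = struct.unpack("<16I", padded)
    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    a, b, c, d = a0, b0, c0, d0
    for i in range(64):
        group = i // 16
        if group == 0:
            f, g = (b & c) | (~b & d), i
        elif group == 1:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif group == 2:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + sin_constant(i) + m[g]) & 0xFFFFFFFF
        a, d, c = d, c, b
        b = (b + rotl(f, SHIFTS[group][i % 4])) & 0xFFFFFFFF
    words = [(x + y) & 0xFFFFFFFF for x, y in zip((a0, b0, c0, d0), (a, b, c, d))]
    return struct.pack("<4I", *words)
```

The loop is left rolled here for readability; the third optimization on the slide, loop unrolling, would expand the 64 iterations so that each round's shift amount, message index, and mixing function become compile-time constants.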

Previous PRNGs

GPU

BBS [Blum et al. 1986, Olano 2005]

O extremely fast

X not good quality

CEICG [Entacher et al. 1998, Sussman et al. 2006]

O decent quality

X processing time varies

AES [NIST 2001, Yamanouchi 2007]

O invertible (not hash)

X not good quality

CPU

rand

O commonly used

X not good quality

drand48

O better quality

X slower

Mersenne Twister [Matsumoto and Nishimura 1998]

O high quality and fast

X not random accessible

Assessing Quality: DIEHARD [Marsaglia 1995]

De facto standard for measuring PRNG quality

Runs 15 different tests on the bits generated

Each test outputs a p-value; a test fails if p is 0 or 1.

BIRTHDAY SPACINGS TEST, M= 512 N=2**24 LAMBDA= 2.0000

Results for aes.bin

For a sample of size 500, aes.bin using bits 1 to 24: mean 2.036

    duplicate spacings   number observed   number expected
    0                     66.               67.668
    1                    130.              135.335
    2                    148.              135.335
    3                     80.               90.224
    4                     44.               45.112
    5                     20.               18.045
    6 to INF              12.                8.282

Chisquare with 6 d.o.f. = 4.50 p-value= .391147
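As a worked check of the sample output above, the chi-square statistic and p-value can be recomputed from the observed and expected counts. This sketch uses only the standard library; the closed-form CDF is valid for even degrees of freedom only, and the function names are hypothetical. The printed p-value corresponds to the lower tail (the CDF) of the chi-square distribution.

```python
import math

observed = [66, 130, 148, 80, 44, 20, 12]
expected = [67.668, 135.335, 135.335, 90.224, 45.112, 18.045, 8.282]


def chi_square(obs, exp):
    """Pearson chi-square statistic: sum of (o - e)^2 / e."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def chi_square_cdf_even_dof(x, dof):
    """CDF of the chi-square distribution for an even number of d.o.f."""
    assert dof % 2 == 0
    half = x / 2
    # Upper tail: exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!
    tail = math.exp(-half) * sum(half**j / math.factorial(j) for j in range(dof // 2))
    return 1 - tail


stat = chi_square(observed, expected)
p_value = chi_square_cdf_even_dof(stat, dof=6)
```

With the rounded expected counts from the listing, `stat` agrees with the printed 4.50 to two decimals and `p_value` lands close to the printed .391147.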

Cumulative Distribution Function

Shows how data is distributed within a set

Given x, what percentage of the data values are ≤ x?

[Figure: cumulative distribution functions of a normal distribution and a uniform distribution, each rising from 0% to 100% for x from 0 to 1]
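The definition above translates directly into a few lines of code. This is a minimal sketch of an empirical CDF; the sample data are made up for illustration.

```python
def ecdf(data, x):
    """Fraction of values in `data` that are less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)


data = [0.1, 0.4, 0.4, 0.9]  # illustrative values only
fraction = ecdf(data, 0.4)   # three of the four values are <= 0.4
```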

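Kolmogorov-Smirnov Test

Measures how alike two data sets are

Looks at the maximum difference D between their distribution functions

[Figure: two pairs of cumulative distribution functions, each plotted from 0% to 100% for x from 0 to 1 with the gap D marked; one pair is labeled "not alike" (large D), the other "alike" (small D)]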

Assessing Quality: DIEHARD

Run the results of the DIEHARD test (p-value) through a KS-test. Look at D-value.

[Figure: cumulative distribution function (0 to 100%) of the p-value curve plotted against the uniform distribution curve; the D-value is the maximum gap between the two]

Smaller D is better quality!
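A minimal sketch of the D-value computation: the empirical CDF of a list of p-values is compared with the uniform CDF on [0, 1], and the largest gap is returned. The function name and the two example lists are hypothetical; the p-values are not DIEHARD output.

```python
def ks_d_uniform(p_values):
    """Kolmogorov-Smirnov D between the empirical CDF and uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so check the gap to the uniform CDF on both sides of the jump.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


spread = ks_d_uniform([0.125, 0.375, 0.625, 0.875])  # evenly spread p-values
clustered = ks_d_uniform([0.90, 0.92, 0.95, 0.99])   # p-values bunched near 1
```

Evenly spread p-values give a small D, while p-values bunched near one end give a large D, which is why a smaller D indicates better quality.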

Assessing Quality: Power Spectrum

Radial mean: should be uniform

Radial variance: should be low & uniform

[Figure panels: power spectrum density, radial mean, radial variance (anisotropy)]

Assessing Speed: Batch Rendering

Clock time to generate random bits

Image of $n^2 \times 128$ bits, n = 512, 1024, 2048, and 4096
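For a sense of scale, the amount of random data per image follows directly from the sizes listed above. A small arithmetic sketch, assuming an n × n image with 128 bits per pixel:

```python
BITS_PER_PIXEL = 128


def image_bytes(n):
    """Bytes of random data in an n x n image with 128 bits per pixel."""
    return n * n * BITS_PER_PIXEL // 8


sizes = {n: image_bytes(n) for n in (512, 1024, 2048, 4096)}
mib = {n: size // 2**20 for n, size in sizes.items()}  # sizes in MiB
```

Each doubling of n quadruples the output, from 4 MiB at n = 512 to 256 MiB at n = 4096.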

Assessing Speed: Texture Subset (for random accessibility)

A huge virtual texture ($2^{20} \times 2^{20}$)

Measure the clock time to access subsets A and B

Measure the difference (smaller is better)
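Random accessibility means that any texel of the virtual texture can be evaluated directly from its coordinates, so a subset far from the origin costs the same as one at the origin. The sketch below illustrates this on the CPU; the coordinate packing, the region positions, and the names `texel` and `subset` are assumptions made for illustration.

```python
import hashlib
import struct

SIDE = 2**20  # side length of the virtual texture


def texel(x, y):
    """Noise value at (x, y), computed without touching any other texel."""
    assert 0 <= x < SIDE and 0 <= y < SIDE
    # Assumption: coordinates are packed as two 4-byte little-endian integers.
    digest = hashlib.md5(struct.pack("<II", x, y)).digest()
    return struct.unpack("<I", digest[:4])[0] / 2**32


def subset(x0, y0, size):
    """Evaluate a size x size window whose top-left corner is (x0, y0)."""
    return [[texel(x0 + i, y0 + j) for i in range(size)] for j in range(size)]


region_a = subset(0, 0, 4)                # window at the origin
region_b = subset(SIDE - 4, SIDE - 4, 4)  # window at the far corner
```

Neither window requires generating the texels that precede it, which is the property a sequential generator lacks.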

Test Results: DIEHARD Results

[Figure: two bar charts over MD5GPU, GPU CEICG, GPU BBS, GPU AES, rand, drand48, and M. Twister. First chart: DIEHARD tests passed, axis 0 to 15 (the higher the better). Second chart: DIEHARD test D-value, axis 0 to 1 (the lower the better).]

Test Results: Power Spectrum Tests

[Figure panels: MD5, M. Twister, GPU BBS]

Test Results: Batch Render Speed

[Figure: bar chart of frames per second (axis 0 to 60) for MD5CPU, MD5GPUref, MD5GPUopt, GPU CEICG, GPU BBS, GPU AES, rand, drand48, and M. Twister at n = 512, 1024, 2048, and 4096]

Test Results: Texture Subset Speed

[Figure: bar chart "Texture Subset Difference" in ms, logarithmic axis from 1 to 1000000, for MD5CPU, MD5GPUref, MD5GPUopt, GPU CEICG, GPU BBS, GPU AES, rand, drand48, and M. Twister. Data labels as extracted: 3.1, 0, 0, 4.8, 0, 0, 362776257001.9]

Trading Quality for Speed

Reducing # of rounds

O faster speed

X lower quality

    Rounds   Time (ms)   DIEHARD tests passed   KS D-value
    64       6.3         15/15                  0.2029
    48       4.7         14/15                  0.2042
    32       3.1         13/15                  0.2295
    16       1.6         13/15                  0.253
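Dividing each measured time by its round count shows how closely the cost tracks the number of rounds. A short arithmetic sketch over the figures in the table:

```python
# Rounds -> time in ms, copied from the table.
measurements = {64: 6.3, 48: 4.7, 32: 3.1, 16: 1.6}

# Time spent per round for each configuration.
per_round = {rounds: time / rounds for rounds, time in measurements.items()}
```

Every configuration comes out at roughly 0.1 ms per round, so the time saved is nearly proportional to the number of rounds removed.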

Applications

Fractal terrain

(vertex shader)

Texture tiling

(fragment shader)

Future Work

Implement our method in hardware

very similar to texture unit but much smaller

(no need for cache)

Alternative hashes

ride with advances in cryptographic hash

Thank You!