CUDA- MonteCarloPi Code


SIGCSE 2011 - The 42nd ACM Technical Symposium on Computer Science Education

March 9-12, 2011, Dallas, Texas, USA

Workshop 9: General purpose computing using GPUs: Developing a hands-on undergraduate course on CUDA programming

Monte Carlo Computation

Random Number Generation

B. Wilkinson Feb 11, 2011

Preliminaries

Monte Carlo computations use random selections within calculations. There are many application areas, including numerical integration, physical simulations, business models, and finance. Monte Carlo computations are embarrassingly parallel because each random selection and its subsequent calculation are independent of the other selections and calculations. They are very amenable to GPUs because the calculations using different random sequences can be done independently by different threads.

The Monte Carlo computation considered here is to compute π. The Monte Carlo calculation is described in Appendix A. One major issue is how to generate random numbers. A CUDA kernel cannot call rand() or any other C library function, except the math routines listed in the NVIDIA CUDA C programming guide.¹ NVIDIA provides the CUDA CURAND library for generating random numbers with various distributions.² Here we provide Monte Carlo code using this library, and also a version using a hand-coded (pseudo) random number generator based on the well-known generator $x_{i+1} = (a \cdot x_i + c) \bmod m$, where $a = 16807$, $c = 0$, and $m = 2^{31} - 1$ (a prime number). Selecting a starting value (seed) that creates a unique sequence for each thread is an issue for hand-coding. CURAND handles this aspect nicely in the API.³
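The recurrence above can be tried out in plain C on the host to see the sequence it produces. This is only an illustrative sketch, not the workshop's code: the names `lcg_next` and `lcg_to_unit` are hypothetical, and the product is formed in 64 bits on the assumption that `a * x` would otherwise overflow a 32-bit integer.

```c
#include <stdint.h>

/* Constants from the text: a = 16807, c = 0, m = 2^31 - 1. */
#define LCG_A 16807ULL
#define LCG_M 2147483647ULL

/* Hypothetical helper: advance the generator one step,
   x_{i+1} = (a * x_i) mod m. The multiplication is done in
   64 bits so the intermediate product cannot overflow. */
uint32_t lcg_next(uint32_t x)
{
    return (uint32_t)((LCG_A * (uint64_t)x) % LCG_M);
}

/* Hypothetical helper: scale a generator state to the unit
   interval by dividing by m. */
float lcg_to_unit(uint32_t x)
{
    return (float)x / (float)LCG_M;
}
```

Starting from a seed of 1, successive calls to `lcg_next` give 16807, 282475249, and 1622650073. A seed of 0 must be avoided, because with c = 0 the generator would then stay at 0. Seeds that differ by one, such as 1 and 2, only select different starting points in the same sequence, and their first outputs (16807 and 33614) are visibly related, which is one reason per-thread seeding needs care.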

Provided files: Each provided guest account has the following for Session 1b:

Directory      Contents        Description
WorkshopFiles  Pi.cu           Monte Carlo CUDA program
               Makefile        Makefile for compiling and running CUDA program
               PiMyRandom.cu   Monte Carlo CUDA program using hand-coded random number generator

¹ NVIDIA CUDA C programming guide, version 3.2, 11/09/2010, http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf

² NVIDIA CUDA CURAND Library, PG-05328-032_V01, August 2010, http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf

³ Although not strictly Monte Carlo, the calculation has the same effect if one uses numbers in numeric sequence, so a simple counter may be possible with a large sample.



Task 1 Compiling and Executing Monte Carlo program

In this task, you will compile and execute a simple prewritten CUDA program that estimates π by Monte Carlo simulation. The code is provided as Pi.cu and is listed below:

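// Derived from code developed by Patrick Rogers, UNC-C, which used a hand-coded random
// number generator.

#include <stdlib.h>
#include <stdio.h>
#include <cuda.h>
#include <math.h>
#include <time.h>
#include <curand_kernel.h>

#define TRIALS_PER_THREAD 4096
#define BLOCKS 256
#define THREADS 256
#define PI 3.1415926535 // known value of pi

__global__ void gpu_monte_carlo(float *estimate, curandState *states) {
    unsigned int tid = threadIdx.x + blockDim.x * blockIdx.x;
    int points_in_circle = 0;
    float x, y;

    curand_init(1234, tid, 0, &states[tid]); // Initialize CURAND

    for (int i = 0; i < TRIALS_PER_THREAD; i++) {
        x = curand_uniform(&states[tid]);
        y = curand_uniform(&states[tid]);
        points_in_circle += (x*x + y*y <= 1.0f); // 1 if the point lies inside the quarter circle
    }

    // ASSUMPTION: this copy of the listing is cut off after "x*x + y*y" and resumes at the
    // summation loop in main. Everything from here down to the "pi_gpu += host[i]" line is
    // reconstructed from the surviving calls and from PiMyRandom.cu; the name devStates and
    // the launch configuration are assumed.
    estimate[tid] = 4.0f * points_in_circle / (float) TRIALS_PER_THREAD; // per-thread estimate of pi
}

float host_monte_carlo(long trials) {
    float x, y;
    long points_in_circle = 0;
    for (long i = 0; i < trials; i++) {
        x = rand() / (float) RAND_MAX;
        y = rand() / (float) RAND_MAX;
        points_in_circle += (x*x + y*y <= 1.0f);
    }
    return 4.0f * points_in_circle / trials;
}

int main(int argc, char *argv[]) {
    clock_t start, stop;
    float host[BLOCKS * THREADS];
    float *dev;
    curandState *devStates;

    start = clock();

    cudaMalloc((void **) &dev, BLOCKS * THREADS * sizeof(float));
    cudaMalloc((void **) &devStates, BLOCKS * THREADS * sizeof(curandState));

    gpu_monte_carlo<<<BLOCKS, THREADS>>>(dev, devStates);

    cudaMemcpy(host, dev, BLOCKS * THREADS * sizeof(float), cudaMemcpyDeviceToHost);

    float pi_gpu = 0.0f;
    for (int i = 0; i < BLOCKS * THREADS; i++) {
        pi_gpu += host[i]; // the surviving listing resumes here
    }

    pi_gpu /= (BLOCKS * THREADS); // average of the per-thread estimates

    stop = clock();

    printf("GPU pi calculated in %f s.\n", (stop-start)/(float)CLOCKS_PER_SEC);

    start = clock();
    float pi_cpu = host_monte_carlo(BLOCKS * THREADS * TRIALS_PER_THREAD);
    stop = clock();
    printf("CPU pi calculated in %f s.\n", (stop-start)/(float)CLOCKS_PER_SEC);

    printf("CUDA estimate of PI = %f [error of %f]\n", pi_gpu, pi_gpu - PI);
    printf("CPU estimate of PI = %f [error of %f]\n", pi_cpu, pi_cpu - PI);

    return 0;
}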

Timing Execution: In Session 1a, we used CUDA events to time the execution, although because of the synchronous nature of cudaMemcpy, we could have used Linux clock() or time(). In the code above, we use clock(), and we also time the same computation on the CPU alone for comparison.
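The clock() pattern described above can be wrapped in a small host-side helper. This is a sketch only: `seconds_for` and `busy_work` are hypothetical names, and `busy_work` merely stands in for whatever GPU or CPU computation is being timed.

```c
#include <time.h>

/* Placeholder workload (hypothetical): stands in for the
   computation being timed. */
void busy_work(void)
{
    volatile double sum = 0.0;
    for (long i = 0; i < 1000000L; i++)
        sum += (double)i;
}

/* Return the processor time, in seconds, consumed by one
   call to work(), measured with clock(). */
double seconds_for(void (*work)(void))
{
    clock_t start = clock();
    work();
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

In standard C, clock() reports the processor time used by the program, which can differ from elapsed wall-clock time. That is worth keeping in mind when comparing these figures with timings taken from CUDA events.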

Compiling: A makefile is given below:

NVCC = /usr/local/cuda/bin/nvcc
CUDAPATH = /usr/local/cuda
NVCCFLAGS = -I$(CUDAPATH)/include
LFLAGS = -L$(CUDAPATH)/lib64 -lcuda -lcudart -lm

Pi: Pi.cu
	$(NVCC) $(NVCCFLAGS) $(LFLAGS) -o Pi Pi.cu

Type make Pi to compile the program.

Executing Program: Type ./Pi to execute the compiled program. The program will first compute π on the GPU and then on the CPU (which may take several seconds).
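The error figures that the program prints can be put in context with a short calculation. It assumes the usual quarter-circle estimator (a point drawn uniformly from the unit square lands inside the circle with probability π/4) and truly independent samples, which a pseudo-random generator only approximates. N denotes the total number of trials.

```latex
\hat{\pi} = \frac{4}{N} \sum_{k=1}^{N} I_k,
\qquad
I_k =
\begin{cases}
1 & \text{if } x_k^2 + y_k^2 \le 1 \\
0 & \text{otherwise}
\end{cases}

\operatorname{Var}(\hat{\pi})
  = \frac{16 \cdot \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)}{N}
  = \frac{\pi (4 - \pi)}{N},
\qquad
\sigma_{\hat{\pi}} \approx \frac{1.64}{\sqrt{N}}

N = 256 \times 256 \times 4096 = 2^{28}
\;\Rightarrow\;
\sigma_{\hat{\pi}} \approx \frac{1.64}{16384} \approx 1.0 \times 10^{-4}
```

Because every thread runs the same number of trials, averaging the per-thread estimates gives the same value as pooling all N trials. Under these assumptions, an error of roughly 0.0001 is the expected scale for the default settings.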

Task 2 Experiment with different CUDA grid/block structures

Experiment with different numbers of blocks and threads/block.
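One way to organise the experiment is to hold the total number of threads fixed and vary how they are split into blocks. The helper below is a host-side sketch with hypothetical names (`blocks_for`, `print_configurations`). It assumes the total, 65536 in the provided code (256 × 256), is divisible by the chosen number of threads per block.

```c
#include <stdio.h>

#define TOTAL_THREADS (256 * 256) /* BLOCKS * THREADS in the provided code */

/* Hypothetical helper: number of blocks such that
   blocks * threads_per_block == total. Returns 0 if
   threads_per_block is not a positive divisor of total. */
int blocks_for(int total, int threads_per_block)
{
    if (threads_per_block <= 0 || total % threads_per_block != 0)
        return 0;
    return total / threads_per_block;
}

/* Print candidate (blocks, threads/block) pairs, doubling
   the block size each time, starting from 32. */
void print_configurations(int max_threads_per_block)
{
    for (int t = 32; t <= max_threads_per_block; t *= 2)
        printf("%d blocks x %d threads/block\n",
               blocks_for(TOTAL_THREADS, t), t);
}
```

Each pair can be substituted for BLOCKS and THREADS before recompiling. The upper limit on threads per block depends on the device (512 on compute capability 1.x, 1024 on 2.x and later), so `max_threads_per_block` is left as a parameter.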



Task 3 Hand-coded Random Number Generator

Compile and test the MonteCarlo pi program, PiMyRandom.cu:

// Derived somewhat from code developed by Patrick Rogers, UNC-C

#include <stdlib.h>
#include <stdio.h>
#include <cuda.h>
#include <math.h>
#include <time.h>

#define TRIALS_PER_THREAD 4096
#define BLOCKS 256
#define THREADS 256
#define PI 3.1415926535 // known value of pi

__device__ float my_rand(unsigned int *seed) {
    // 64-bit arithmetic so that a * x cannot overflow on any platform
    unsigned long long a = 16807; // constants for random number generator
    unsigned long long m = 2147483647; // 2^31 - 1
    unsigned long long x = (unsigned long long) *seed;

    x = (a * x) % m;

    *seed = (unsigned int) x;

    return ((float) x) / m;
}

__global__ void gpu_monte_carlo(float *estimate) {
    unsigned int tid = threadIdx.x + blockDim.x * blockIdx.x;
    int points_in_circle = 0;
    float x, y;

    unsigned int seed = tid + 1; // starting number in random sequence

    for (int i = 0; i < TRIALS_PER_THREAD; i++) {
        x = my_rand(&seed);
        y = my_rand(&seed);
        points_in_circle += (x*x + y*y <= 1.0f); // 1 if the point lies inside the quarter circle
    }

    // ASSUMPTION: this copy of the listing is cut off after "x*x + y*y" and resumes at the
    // cudaMalloc call in main. The lines from here down to that call, and the launch
    // configuration, are reconstructed rather than taken from the source.
    estimate[tid] = 4.0f * points_in_circle / (float) TRIALS_PER_THREAD; // per-thread estimate of pi
}

float host_monte_carlo(long trials) {
    float x, y;
    long points_in_circle = 0;
    for (long i = 0; i < trials; i++) {
        x = rand() / (float) RAND_MAX;
        y = rand() / (float) RAND_MAX;
        points_in_circle += (x*x + y*y <= 1.0f);
    }
    return 4.0f * points_in_circle / trials;
}

int main(int argc, char *argv[]) {
    clock_t start, stop;
    float host[BLOCKS * THREADS];
    float *dev;

    start = clock();

    cudaMalloc((void **) &dev, BLOCKS * THREADS * sizeof(float)); // the surviving listing resumes here
    gpu_monte_carlo<<<BLOCKS, THREADS>>>(dev);

    cudaMemcpy(host, dev, BLOCKS * THREADS * sizeof(float), cudaMemcpyDeviceToHost);

    float pi_gpu = 0.0f; // must start at zero before accumulating
    for (int i = 0; i < BLOCKS * THREADS; i++) {
        pi_gpu += host[i];
    }

    pi_gpu /= (BLOCKS * THREADS); // average of the per-thread estimates

    stop = clock();

    printf("GPU pi calculated in %f s.\n", (stop-start)/(float)CLOCKS_PER_SEC);

    start = clock();
    float pi_cpu = host_monte_carlo(BLOCKS * THREADS * TRIALS_PER_THREAD);
    stop = clock();
    printf("CPU pi calculated in %f s.\n", (stop-start)/(float)CLOCKS_PER_SEC);

    printf("CUDA estimate of PI = %f [error of %f]\n", pi_gpu, pi_gpu - PI);
    printf("CPU estimate of PI = %f [error of %f]\n", pi_cpu, pi_cpu - PI);

    return 0;
}

This program uses a hand-coded random number generator. You will need to modify the makefile to compile it. Note the execution time and compare it with that of Pi.cu.

