CUDA- MonteCarloPi Code


    SIGCSE 2011 - The 42nd ACM Technical Symposium on Computer Science Education

    March 9-12, 2011, Dallas, Texas, USA

    Workshop 9: General purpose computing using GPUs: Developing a

    hands-on undergraduate course on CUDA programming

    Monte Carlo Computation

    Random Number Generation

    B. Wilkinson Feb 11, 2011

    Preliminaries

    Monte Carlo computations use random selections within calculations. There are many application areas: numerical integration, physical simulations, business models, finance, etc. Monte Carlo computations are embarrassingly parallel because each random selection and subsequent calculation is independent of the other selections and calculations. They are well suited to GPUs because calculations using different random sequences can be done independently by different threads.

    The Monte Carlo computation considered here is to compute π. The Monte Carlo calculation is described in Appendix A. One major issue is how to generate random numbers. A CUDA kernel cannot call rand() or any other C library function (except the math routines listed in the NVIDIA CUDA C programming guide [1]). NVIDIA provides the CUDA CURAND library for generating random numbers with various distributions [2]. Here we provide Monte Carlo code using this library, and also a version using a hand-coded (pseudo) random number generator based on the well-known generator $x_{i+1} = (a x_i + c) \bmod m$, where $a = 16807$, $c = 0$, and $m = 2^{31} - 1$ (a prime number). Selecting a starting value (seed) that creates a unique sequence for each thread is an issue for hand-coding. CURAND handles this aspect nicely in the API [3].

    Provided files: Each provided guest account has the following for Session 1b:

    Directory      Contents       Description
    WorkshopFiles  Pi.cu          Monte Carlo CUDA program
                   Makefile       Makefile for compiling and running the CUDA program
                   PiMyRandom.cu  Monte Carlo CUDA program using a hand-coded random number generator

    [1] NVIDIA CUDA C programming guide, version 3.2, 11/09/2010,
        http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf
    [2] NVIDIA CUDA CURAND Library, PG-05328-032_V01, August 2010,
        http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_C_Programming_Guide.pdf
    [3] Although not strictly Monte Carlo, the effect is the same in the calculation if one uses numbers in numeric sequence, and so a simple counter may be possible with a large sample.


    Task 1 Compiling and Executing Monte Carlo program

    In this task, you will compile and execute a simple prewritten CUDA program that estimates π using the Monte Carlo method. The code is provided as Pi.cu and is listed below:

    // Derived from code developed by Patrick Rogers, UNC-C, which used a hand-coded random number generator.

    #include <stdlib.h>
    #include <stdio.h>
    #include <cuda.h>
    #include <math.h>
    #include <time.h>
    #include <curand_kernel.h>

    #define TRIALS_PER_THREAD 4096
    #define BLOCKS 256
    #define THREADS 256
    #define PI 3.1415926535 // known value of pi

    __global__ void gpu_monte_carlo(float *estimate, curandState *states) {
        unsigned int tid = threadIdx.x + blockDim.x * blockIdx.x;
        int points_in_circle = 0;
        float x, y;

        curand_init(1234, tid, 0, &states[tid]); // Initialize CURAND

        for (int i = 0; i < TRIALS_PER_THREAD; i++) {
            x = curand_uniform(&states[tid]);
            y = curand_uniform(&states[tid]);
            points_in_circle += (x*x + y*y <= 1.0f); // count points inside the quarter circle
        }
        estimate[tid] = 4.0f * points_in_circle / (float) TRIALS_PER_THREAD; // this thread's estimate of pi
    }

    // ...


            pi_gpu += host[i];
        }

        pi_gpu /= (BLOCKS * THREADS);

        stop = clock();

        printf("GPU pi calculated in %f s.\n", (stop - start) / (float) CLOCKS_PER_SEC);

        start = clock();
        float pi_cpu = host_monte_carlo(BLOCKS * THREADS * TRIALS_PER_THREAD);
        stop = clock();
        printf("CPU pi calculated in %f s.\n", (stop - start) / (float) CLOCKS_PER_SEC);

        printf("CUDA estimate of PI = %f [error of %f]\n", pi_gpu, pi_gpu - PI);
        printf("CPU estimate of PI = %f [error of %f]\n", pi_cpu, pi_cpu - PI);

        return 0;
    }

    Timing Execution: In Session 1a, we used CUDA events to time the execution, although because of the synchronous nature of cudaMemcpy we could have used the Linux clock() or time() functions. In the code above, we use clock(), and we also time the computation of π on the CPU alone for comparison.

    Compiling: A makefile is given below:

    NVCC = /usr/local/cuda/bin/nvcc
    CUDAPATH = /usr/local/cuda
    NVCCFLAGS = -I$(CUDAPATH)/include
    LFLAGS = -L$(CUDAPATH)/lib64 -lcuda -lcudart -lm

    Pi: Pi.cu
    	$(NVCC) $(NVCCFLAGS) $(LFLAGS) -o Pi Pi.cu

    Type make Pi to compile the program.

    Executing Program: Type ./Pi to execute the compiled program. The program will first compute π on the GPU and then on the CPU (which may take several seconds).

    Task 2 Experiment with different CUDA grid/block structures

    Experiment with different numbers of blocks and threads/block.


    Task 3 Hand-coded Random Number Generator

    Compile and test the MonteCarlo pi program, PiMyRandom.cu:

    // Derived somewhat from code developed by Patrick Rogers, UNC-C
    #include <stdlib.h>
    #include <stdio.h>
    #include <cuda.h>
    #include <math.h>
    #include <time.h>

    #define TRIALS_PER_THREAD 4096
    #define BLOCKS 256
    #define THREADS 256
    #define PI 3.1415926535 // known value of pi

    __device__ float my_rand(unsigned int *seed) {
        unsigned long a = 16807;       // constants for random number generator
        unsigned long m = 2147483647;  // 2^31 - 1
        unsigned long x = (unsigned long) *seed;

        x = (a * x) % m;

        *seed = (unsigned int) x;

        return ((float) x) / m;
    }

    __global__ void gpu_monte_carlo(float *estimate) {
        unsigned int tid = threadIdx.x + blockDim.x * blockIdx.x;
        int points_in_circle = 0;
        float x, y;

        unsigned int seed = tid + 1; // starting number in random sequence

        for (int i = 0; i < TRIALS_PER_THREAD; i++) {
            x = my_rand(&seed);
            y = my_rand(&seed);
            points_in_circle += (x*x + y*y <= 1.0f); // count points inside the quarter circle
        }
        estimate[tid] = 4.0f * points_in_circle / (float) TRIALS_PER_THREAD; // this thread's estimate of pi
    }

    // ...


        cudaMalloc((void **) &dev, BLOCKS * THREADS * sizeof(float));
        gpu_monte_carlo<<<BLOCKS, THREADS>>>(dev);

        cudaMemcpy(host, dev, BLOCKS * THREADS * sizeof(float), cudaMemcpyDeviceToHost);
        float pi_gpu = 0.0f; // must start at zero before accumulating
        for (int i = 0; i < BLOCKS * THREADS; i++) {
            pi_gpu += host[i];
        }

        pi_gpu /= (BLOCKS * THREADS);

        stop = clock();

        printf("GPU pi calculated in %f s.\n", (stop - start) / (float) CLOCKS_PER_SEC);

        start = clock();
        float pi_cpu = host_monte_carlo(BLOCKS * THREADS * TRIALS_PER_THREAD);
        stop = clock();
        printf("CPU pi calculated in %f s.\n", (stop - start) / (float) CLOCKS_PER_SEC);

        printf("CUDA estimate of PI = %f [error of %f]\n", pi_gpu, pi_gpu - PI);
        printf("CPU estimate of PI = %f [error of %f]\n", pi_cpu, pi_cpu - PI);

        return 0;
    }

    This uses a hand-coded random number generator. You will need to modify the makefile to compile the program. Note the time of execution.
