
Post on 13-Feb-2020

Correlation between theory, simulation and experiment

~ 1-1000 nm

GPU implementation of Cell Dynamics simulation for block copolymer systems

Ludwig Schreier, Marco Pinna, Andrei V. Zvelindovsky

lschreier@uclan.ac.uk

University of Central Lancashire

School of Computing, Engineering and Physical Sciences

Computational Physics Group

PR1 2HE Preston

United Kingdom

Using the NVIDIA CUDA programming language, we implemented Cell Dynamics simulation (CDS) for the modelling of block copolymers on the GPU. The code was developed, tested and benchmarked on an NVIDIA Quadro FX4600 and a Tesla C1060 card. We compare the results with a two-dimensional to one-dimensional domain decomposed version in C and a conventional Fortran 90 version using nested array loops. The performance of the code as a whole, as well as of its various internal parts, has been analysed in detail.

Two CUDA-based versions of CDS, one with a two-kernel and one with a four-kernel approach, were developed for testing purposes. For lamellar systems in a two-dimensional simulation box of 1024 × 1024 grid points, large speedups can be achieved using CUDA compared to the Fortran 90 code. The C version also shows a substantial speedup, which can be attributed to the SIMD features of the CPU. The boundary condition implementation was identified as the bottleneck for further speedup optimisation.
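As an illustration of the data layout discussed above, the following is a minimal C sketch of a two-dimensional lattice stored in a one-dimensional array, together with a weighted neighbourhood average of the kind commonly used in two-dimensional CDS. It is not the poster's code: the function names `flat_index` and `neighbour_average`, the row-major layout, the periodic wrap-around and the weights (1/6 for nearest and 1/12 for next-nearest neighbours, the usual choice in the CDS literature) are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map lattice coordinates (x, y) to a flat
   row-major index. Coordinates are wrapped periodically, which is
   an assumption of this sketch, not something stated in the poster. */
size_t flat_index(int x, int y, int nx, int ny)
{
    assert(nx > 0 && ny > 0);
    int xw = ((x % nx) + nx) % nx;   /* handles negative x as well */
    int yw = ((y % ny) + ny) % ny;
    return (size_t)yw * (size_t)nx + (size_t)xw;
}

/* Hypothetical helper: weighted average over the eight neighbours of
   (x, y). Assumed weights: 1/6 for the four nearest neighbours and
   1/12 for the four next-nearest (diagonal) neighbours, so that the
   weights sum to one. */
double neighbour_average(const double *psi, int x, int y, int nx, int ny)
{
    double nearest =
        psi[flat_index(x + 1, y, nx, ny)] +
        psi[flat_index(x - 1, y, nx, ny)] +
        psi[flat_index(x, y + 1, nx, ny)] +
        psi[flat_index(x, y - 1, nx, ny)];

    double diagonal =
        psi[flat_index(x + 1, y + 1, nx, ny)] +
        psi[flat_index(x - 1, y + 1, nx, ny)] +
        psi[flat_index(x + 1, y - 1, nx, ny)] +
        psi[flat_index(x - 1, y - 1, nx, ny)];

    return nearest / 6.0 + diagonal / 12.0;
}
```

In a GPU version, each thread would typically evaluate such an average for one grid point; the per-access modulo wrap used here is the simplest option but not necessarily the fastest one.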

Zoomed image of the simulation regime (mesoscale) for a diblock copolymer

Boundary condition exchange via halos (ghost points)
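The halo idea can be sketched in a few lines of C. The sketch assumes periodic boundary conditions and a single ghost layer around the interior grid; the function name `fill_periodic_halos` and the storage layout are hypothetical and chosen only for illustration, and the poster's actual implementation may differ.

```c
#include <assert.h>

/* Hypothetical sketch: the field f holds (nx + 2) * (ny + 2) values in
   row-major order. Interior points have x in 1..nx and y in 1..ny; the
   outermost rows and columns are the halo (ghost points).
   Periodic boundaries are an assumption of this sketch. */
void fill_periodic_halos(double *f, int nx, int ny)
{
    assert(f != 0 && nx > 0 && ny > 0);
    const int stride = nx + 2;

    /* Left and right ghost columns copy the opposite interior columns. */
    for (int y = 1; y <= ny; ++y) {
        f[y * stride + 0]      = f[y * stride + nx];
        f[y * stride + nx + 1] = f[y * stride + 1];
    }

    /* Top and bottom ghost rows copy the opposite interior rows.
       Copying full rows after the columns are filled also sets the
       four corner ghost points correctly. */
    for (int x = 0; x < stride; ++x) {
        f[0 * stride + x]        = f[ny * stride + x];
        f[(ny + 1) * stride + x] = f[1 * stride + x];
    }
}
```

Once the halos are filled, a stencil can read all neighbours of an interior point without any boundary checks. The price is one extra copy pass per time step, which is consistent with the boundary handling showing up as a cost in its own right.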

Cell Dynamics CUDA implementation using the 4-kernel approach

Cell Dynamics CUDA implementation using the 2-kernel approach

CUDA Visual Profiler benchmarking results

Speedup of CUDA vs C

Speedup of CUDA vs Fortran

Acknowledgments

This work is supported by Accelrys Ltd. via an EPSRC CASE research studentship.

(c) Hitachi Global Storage Technologies

Development platforms: Quadro FX4600 and Tesla C1060