GeantV Geometry: SIMD abstraction and interfacing with CUDA · C-like abstract kernels Backend...
Transcript of GeantV Geometry: SIMD abstraction and interfacing with CUDA · C-like abstract kernels Backend...
PH - SFT
GeantV Geometry: SIMD abstraction and interfacing with CUDA
Johannes de Fine Licht ([email protected]) Sandro Wenzel ([email protected]) !Thanks to Michal Husejko and Romain Wartel at TechLab for providing hardware.
Johannes de Fine Licht ([email protected])
PH - SFT
Abstracted algorithms
Write common code for multiple types of SIMD “backends” (Vc/Cilk/CUDA).
Eases code maintenance; requires only one kernel to be written for a given functionality
Must retain performance of a low level implementation
���2PH - SFT
Johannes de Fine Licht ([email protected])
PH - SFT ���3
Illustrating scalar/SIMD abstraction and kernelsSingle particle
interface
C-like abstract kernels
Vector interface
External CUDA kernels
Johannes de Fine Licht ([email protected])
PH - SFT ���3
Illustrating scalar/SIMD abstraction and kernelsSingle particle
interface
C-like abstract kernels
Vector interface
External CUDA kernels
Vc Looper Thread access
Scalar Looper Cilk Plus Looper
Johannes de Fine Licht ([email protected])
PH - SFT ���3
Illustrating scalar/SIMD abstraction and kernelsSingle particle
interface
C-like abstract kernels
Backend instantiation
Cilk Plus Backend
Vector interface
External CUDA kernels
Vc Looper Thread access
Scalar Looper Cilk Plus Looper
Scalar Backend Vc Backend CUDA Backend
Johannes de Fine Licht ([email protected])
PH - SFT ���3
Illustrating scalar/SIMD abstraction and kernelsSingle particle
interface
C-like abstract kernels
Backend instantiation
Cilk Plus Backend
Vector interface
External CUDA kernels
Vc Looper Thread access
Scalar Looper Cilk Plus Looper
C-like specialized kernels
Scalar Backend Vc Backend CUDA Backend
Kernel instantiation
Johannes de Fine Licht ([email protected])
PH - SFT
Generic kernels on abstract types
���4
Generic programming; write algorithms in a generic way on abstract types
!Implementation depends on the backend !Backend determines the types on which the high level operations are performed
Johannes de Fine Licht ([email protected])
PH - SFT
Generic kernels on abstract types
���4
Generic programming; write algorithms in a generic way on abstract types
!Implementation depends on the backend !Backend determines the types on which the high level operations are performed
Johannes de Fine Licht ([email protected])
PH - SFT
Implementations wrapped in types
Operations are abstracted
Backends implement all operations
Vc already supports arithmetics
CUDA uses primitive types
Wrapper structs are implemented if needed
���5
Johannes de Fine Licht ([email protected])
PH - SFT
Interfacing with CUDA
���6
Involving the GPU should be easy !Principle of common code !Issue: GPU code needs nvcc, but we want a different compiler for CPU
vs.GNU/Clang:
C++11 Vc
Cilk
nvcc: CUDA
Johannes de Fine Licht ([email protected])
PH - SFT
Host memory !vecgeom::CudaManager
vecgeom::UnplacedBox vecgeom::LogicalVolume …
Dual namespaces
���7
GPU memory !vecgeom_cuda::UnplacedBox vecgeom_cuda::LogicalVolume …
CUDA kernel
namespace vecgeom
namespace vecgeom_cuda
namespace vecgeom
Utilize distinct namespaces but same code for CPU and GPU !Classes live in one or both environments !Provide abstraction to interface between environments
Johannes de Fine Licht ([email protected])
PH - SFT
Host memory !vecgeom::CudaManager
vecgeom::UnplacedBox vecgeom::LogicalVolume …
Dual namespaces
���7
GPU memory !vecgeom_cuda::UnplacedBox vecgeom_cuda::LogicalVolume …
CUDA kernel
namespace vecgeom
namespace vecgeom_cuda
Allocation
namespace vecgeom
Utilize distinct namespaces but same code for CPU and GPU !Classes live in one or both environments !Provide abstraction to interface between environments
Johannes de Fine Licht ([email protected])
PH - SFT
Host memory !vecgeom::CudaManager
vecgeom::UnplacedBox vecgeom::LogicalVolume …
Dual namespaces
���7
GPU memory !vecgeom_cuda::UnplacedBox vecgeom_cuda::LogicalVolume …
Launch kernel CUDA kernel
namespace vecgeom
namespace vecgeom_cuda
Instantiate
Allocation
Synchronize content
namespace vecgeom
Utilize distinct namespaces but same code for CPU and GPU !Classes live in one or both environments !Provide abstraction to interface between environments
Johannes de Fine Licht ([email protected])
PH - SFT
Compilation scheme
���8
namespace vecgeom .o
Compile for C++11 with vector backend
Source files .cu
Compile with NVCC with CUDA backend
namespace vecgeom_cuda
.o
Optional modules (ROOT, benchmarking…)
Source files .cpp
+ CUDA interface .cu
+
Executable
Linking
Interface header
Johannes de Fine Licht ([email protected])
PH - SFT
Status
���9
Nu m b e r of p a rt ic le s loca te d
0 .0
0 .5
1 .0
1 .5
2 .0
2 .5
3 .0
3 .5
4 .0
4 .5
Sp
ee
du
p
Pa rt ic le loca t ion in b ox g e om e t ry
GPU
GPU with ove rh e a d
CPU
Preliminary benchmarks run in hybrid CPU/GPU environment for four-level box geometry.
Location of particles is a tree algorithm, so speedups are only seen at very high particle multiplicity.
Common code for box methods runs for scalar, Vc, Cilk and CUDA backends !Implemented GPU memory synchronization !Dual namespaces allows dispatching work to CPU or GPU !Location can run in either environment !Results are in agreement
!
Johannes de Fine Licht ([email protected])
PH - SFT
Appendix: CPU/GPU same interface
���10
Main executable compiled with C++ compiler
CUDA kernel compiled with NVCC
Same code used
Johannes de Fine Licht ([email protected])
PH - SFT ���13
External CUDA kernels
C-like abstract kernels
Vector interface
Johannes de Fine Licht ([email protected])
PH - SFT ���13
External CUDA kernels
C-like abstract kernels
Scalar Backend
Vc Backend
Cilk Plus Backend
CUDA Backend
Vector interface