EnSPy: Python library for computations of ensembles of particles on GPU
Glib Ivashkevych
Institute of Theoretical Physics, NSC KIPT, Kharkov, Ukraine
October 13, 2010
Why GPU?
GPU – Graphics Processing Unit
programmable
manycore
multithreaded
with very high memory bandwidth
GPU programming gives us:
high performance
transparent scalability
... and is useful for problems with high data parallelism:
large datasets
portions of data can be processed independently
Outline
1. NVIDIA GPU Architecture and CUDA
2. Python and CUDA
3. EnSPy functionality
4. EnSPy architecture
5. Example: D5 potential
6. Example: Hill problem
7. Example: Hill problem, N–body version
8. Performance results
9. Future development
NVIDIA GPU Architecture and CUDA
Simplified GT200 architecture
consists of multiprocessors (MPs)
each MP has:
  8 stream processors
  1 unit for double–precision operations
  shared memory
global memory
Multiprocessors and threads
an MP can launch numerous threads
threads are "lightweight" – little creation and switching overhead
threads run the same code
thread synchronization within an MP
cooperation via shared memory
each thread has a unique identifier – thread ID
Efficiency is achieved by hiding memory latency with computation, not by cache usage as on a CPU
C for CUDA
a set of extensions to C
runtime library
function and variable type qualifiers
built–in vector types: float4, double2 etc.
built–in variables
Kernels
map the parallel part of the program to the GPU
execution: N times in parallel by N CUDA threads
CUDA Driver API
low–level control over the execution
no need for the nvcc compiler if kernels are precompiled – only the driver is needed
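The kernel model described above (one function body executed N times, each instance distinguished only by its thread ID) can be emulated on the CPU in plain Python. This is a conceptual sketch, not CUDA code: `saxpy_kernel` and `launch` are hypothetical names chosen for illustration, and the loop runs serially where a GPU would run the instances in parallel.

```python
import numpy as np

def saxpy_kernel(thread_id, a, x, y, out):
    """Body executed by one emulated thread: it handles exactly one element."""
    out[thread_id] = a * x[thread_id] + y[thread_id]

def launch(kernel, n_threads, *args):
    """Emulate a kernel launch: call the same code once per thread ID.

    On a GPU these calls would run in parallel; here they run one by one.
    """
    for thread_id in range(n_threads):
        kernel(thread_id, *args)

x = np.arange(8, dtype=np.float32)
y = np.ones(8, dtype=np.float32)
out = np.empty_like(x)
launch(saxpy_kernel, x.size, np.float32(2.0), x, y, out)
```

Because no emulated thread reads what another one writes, the order of the calls does not matter, which is the property that lets the hardware schedule them freely.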
[Figure: execution model]
Python and CUDA
Why Python?
Python: flexible multipurpose interpreted language
easy to learn
dynamically typed
rich built–in functionality
very well documented
has a large and active community
Python scientific packages:
SciPy – modeling and simulation
  Fourier transforms
  ODE
  Optimization
  scipy.weave.inline – C inlining with little or no overhead
  ...
NumPy – arrays, linear algebra etc.
  flexible array creation routines
  sorting, random sampling and statistics
  ...
Python is a convenient way of interfacing C/C++ libraries
Python and CUDA
We could interface with:
Python C API – low–level approach: overkill
SWIG, Boost::Python – high–level approach: overkill
PyCUDA – the simplest and most straightforward way, for CUDA only
scipy.weave.inline – a simple and straightforward way for both CUDA and plain C/C++
EnSPy functionality
Motivation
Combine the flexibility of Python with the efficiency of C++ → CUDA for N–body simulations
interface of EnSPy is written in Python
core of EnSPy is written in C++
joined together by scipy.weave.inline
C++ core can be used without Python – just include the header and link with the precompiled shared library
easily extensible: both through the high–level Python interface and the low–level C++ core – new algorithms, initial distributions etc.
multi–GPU parallelization
it’s easy to experiment with EnSPy!
EnSPy functionality
Types of ensembles:
"Simple" ensemble – without interaction, only an external potential
N–body ensemble – both an external potential and gravitational interaction between particles
Current algorithms:
4th–order Runge–Kutta for the "simple" ensemble
Hermite scheme with shared time steps for N-body ensemble
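As an illustration of the first algorithm, a classical 4th-order Runge–Kutta step for a non-interacting ensemble can be written with NumPy so that every particle is advanced at once. This is a CPU sketch under stated assumptions, not EnSPy's implementation: the function name `rk4_step`, unit particle mass, and the harmonic test potential are all chosen here for illustration.

```python
import numpy as np

def rk4_step(q, p, force, dt):
    """Advance all particles by one classical 4th-order Runge-Kutta step.

    q, p  : arrays of shape (n_particles, n_dim), coordinates and momenta
    force : callable q -> -grad U(q), evaluated for every particle at once
    Unit mass is assumed, so dq/dt = p and dp/dt = force(q).
    """
    k1q, k1p = p, force(q)
    k2q, k2p = p + 0.5 * dt * k1p, force(q + 0.5 * dt * k1q)
    k3q, k3p = p + 0.5 * dt * k2p, force(q + 0.5 * dt * k2q)
    k4q, k4p = p + dt * k3p, force(q + dt * k3q)
    q_new = q + dt / 6.0 * (k1q + 2.0 * k2q + 2.0 * k3q + k4q)
    p_new = p + dt / 6.0 * (k1p + 2.0 * k2p + 2.0 * k3p + k4p)
    return q_new, p_new

# Illustration: non-interacting particles in a harmonic well U = |q|^2 / 2.
rng = np.random.default_rng(0)
q = rng.uniform(-1.0, 1.0, size=(1000, 2))
p = rng.uniform(-1.0, 1.0, size=(1000, 2))
energy_before = 0.5 * np.sum(p**2 + q**2, axis=1)
for _ in range(1000):
    q, p = rk4_step(q, p, lambda x: -x, dt=0.01)
energy_after = 0.5 * np.sum(p**2 + q**2, axis=1)
```

Since the particles do not interact, each row of `q` and `p` is updated independently of the others, which is what makes this kind of ensemble a natural fit for one GPU thread per particle.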
Predefined initial distributions:
Uniform, point and spherical for ”simple” ensembles
Uniform sphere with 2T/|U| = 1 for N-body ensemble
users can supply functions (in Python) for initial ensemble generation
User specified values and expressions:
parameters of initial distribution
potential, forces, parameters of integration scheme
arbitrary number of triggers – the number $N_i(t)$ of particles that do not cross a given hypersurface $F_i(q, p) = 0$ before time $t$
arbitrary number of averages – $\bar{F}_i(q, p, t)$ – quantities to be averaged over the ensemble
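The trigger and average definitions above can be made concrete with a small NumPy sketch. It is only an illustration of the bookkeeping, not EnSPy code: `update_trigger` and `ensemble_average` are hypothetical helper names, crossings are detected by a sign change of F between two consecutive steps, and the test case is free motion in one dimension with the trigger F(q, p) = q.

```python
import numpy as np

def update_trigger(alive, f_prev, f_now):
    """Drop particles whose trigger function F crossed zero during the step.

    alive  : boolean array, True while a particle has not yet crossed F = 0
    f_prev : F(q, p) before the step
    f_now  : F(q, p) after the step
    """
    crossed = f_prev * f_now <= 0.0
    return alive & ~crossed

def ensemble_average(values):
    """Average a per-particle quantity over the whole ensemble."""
    return float(np.mean(values))

# Free particles starting at q = 1 and moving towards q = 0 at different speeds.
q = np.ones(5)
p = -np.array([0.5, 1.0, 2.0, 4.0, 0.0])  # the last particle never moves
dt, n_steps = 0.1, 15
alive = np.ones(q.size, dtype=bool)
n_of_t, mean_q = [], []
for _ in range(n_steps):
    f_prev = q.copy()                      # trigger function F(q, p) = q
    q = q + dt * p
    alive = update_trigger(alive, f_prev, q)
    n_of_t.append(int(alive.sum()))        # N(t): particles that have not crossed yet
    mean_q.append(ensemble_average(q))     # average of q over the ensemble
```

Once a particle is flagged as crossed it stays flagged, so N(t) can only decrease with time.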
Runtime generation and compilation of C and CUDA code:
User–specified expressions (as Python strings) are wrapped by the EnSPy template subpackage into C functions and a CUDA module
Compiled at runtime
Ease of use and computational efficiency:
flexible Python interface
all actual calculations are performed by the runtime–generated C extension and the precompiled shared library
Drawback:
extra time for generation and compilation of new code
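The idea of wrapping a user expression into generated source can be sketched with the standard-library `string.Template`. The template text, the function name `generate_potential_source`, and the `q0`/`q1` variable naming inside the generated function are assumptions made for this example; the slides do not show EnSPy's actual templates, and the compilation step is left out.

```python
from string import Template

# Hypothetical template; EnSPy's real templates are not shown in the slides.
POTENTIAL_TEMPLATE = Template("""\
__device__ double potential(const double *q)
{
    const double q0 = q[0];
    const double q1 = q[1];
    return $expression;
}
""")

def generate_potential_source(expression):
    """Wrap a user-supplied expression string into C/CUDA source text."""
    return POTENTIAL_TEMPLATE.substitute(expression=expression)

# Example expression written in terms of q0 and q1 (illustrative only).
source = generate_potential_source(
    "2.*q1*q1 - q0*q0 + q0*q1*q1 + 0.25*q0*q0*q0*q0"
)
print(source)
```

The generated text would then be handed to a compiler at runtime, which is where the extra start-up cost listed under "Drawback" comes from.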
EnSPy architecture
Execution flow and architecture
Input parameters
↓
Ensemble population (predefined or user–specified distribution)
↓
Code generation and compilation
↓
Launching $N_{GPUs}$ threads
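The last step of this flow, one host thread per GPU, can be sketched with the standard-library thread pool. This is an assumption-laden illustration rather than EnSPy's code: `integrate_chunk` is a hypothetical placeholder that does trivial CPU work instead of driving a GPU, and splitting the ensemble into equal contiguous chunks is just one possible way to distribute particles.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def integrate_chunk(gpu_id, chunk):
    """Placeholder for the work one host thread would hand to one GPU.

    Here it only shifts the particles on the CPU so the sketch is runnable.
    """
    return gpu_id, chunk + 1.0

def run_on_gpus(ensemble, n_gpus):
    """Split the ensemble into n_gpus chunks and process each in its own thread."""
    chunks = np.array_split(ensemble, n_gpus)
    with ThreadPoolExecutor(max_workers=n_gpus) as pool:
        results = list(pool.map(integrate_chunk, range(n_gpus), chunks))
    # pool.map preserves input order, so the chunks can be concatenated directly.
    return np.concatenate([chunk for _, chunk in results])

ensemble = np.zeros((10240, 4))
result = run_on_gpus(ensemble, n_gpus=2)
```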
[Figure: GPU parallelization scheme for N–body simulations]
[Figure: order of force calculation]
Example: D5 potential
Overview
Problem: escape from a potential well.
Watched values (trigger):
N(t) – number of particles remaining in the well at time t
Potential:
$U_{D5} = 2ay^2 - x^2 + xy^2 + \frac{x^4}{4}$
"Critical" energy: $E_{cr} = E_S = 0$
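The potential and the corresponding forces can be written out directly, which also makes it easy to check the quoted saddle energy. The sketch reads the potential as U = 2ay² − x² + xy² + x⁴/4; the function names and the default value a = 1 are assumptions, since the slides do not give the value of a.

```python
import math

def u_d5(x, y, a=1.0):
    """D5 potential U = 2*a*y^2 - x^2 + x*y^2 + x^4/4 (a = 1 is an assumed default)."""
    return 2.0 * a * y**2 - x**2 + x * y**2 + 0.25 * x**4

def force_d5(x, y, a=1.0):
    """Force components (-dU/dx, -dU/dy) of the D5 potential."""
    fx = 2.0 * x - y**2 - x**3
    fy = -4.0 * a * y - 2.0 * x * y
    return fx, fy

# The saddle sits at the origin, the two minima at (+-sqrt(2), 0).
saddle_energy = u_d5(0.0, 0.0)
minimum_energy = u_d5(math.sqrt(2.0), 0.0)
```

The force vanishes at the origin and U(0, 0) = 0, in agreement with the saddle energy quoted above, while the wells at x = ±√2 have depth U = −1.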
Potential and structure of phase space:
[Figure: level lines of the D5 potential in the (x, y) plane and a phase–space plot in the (x, px) plane; all axes run from −2 to 2]
Calculation setup:
"simple" ensemble
uniform initial distribution of N = 10240 particles in the region x > 0 ∩ U(x, y) < E
trigger: x = 0 → q0 = 0.
12 lines of simple Python code (examples/d5.py): specification of integration parameters
Results:
Regular particles are trapped in the well → the initial "mixed state" splits
[Figure: N(t)/N(0) versus t, for t from 0 to 30, at E = 0.1 and E = 0.9]
Example: Hill problem
Overview
Problem: toy model of escape from a star cluster: escape of a star from the potential of a point–like rotating star cluster of mass $M_c$ and a point–like galaxy core of mass $M_g \gg M_c$
Watched values (trigger):
N(t) – number of particles remaining in the cluster at time t
"Potential" in the cluster frame of reference (tidal approximation):
$U_{Hill} = -\frac{3}{2}\omega^2 x^2 - \frac{GM_c}{r}$
"Critical" energy: $E_{cr} = E_S = -4.5\omega^2$
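The critical energy can be checked numerically. The sketch below assumes the standard Hill form U = −(3/2)ω²x² − GM_c/r, units with GM_c = 1, and the value ω = 1/√3; the function names are chosen here for illustration. The tidal radius follows from setting the x-derivative of U to zero on the x axis, which gives r_t³ = GM_c/(3ω²).

```python
import math

def u_hill(x, y, omega, gm_c=1.0):
    """Hill potential in the tidal approximation (standard form assumed)."""
    r = math.hypot(x, y)
    return -1.5 * omega**2 * x**2 - gm_c / r

def tidal_radius(omega, gm_c=1.0):
    """Distance of the saddle points from the cluster: r_t^3 = G*M_c / (3*omega^2)."""
    return (gm_c / (3.0 * omega**2)) ** (1.0 / 3.0)

omega = 1.0 / math.sqrt(3.0)
r_t = tidal_radius(omega)          # tidal radius for this omega
e_cr = u_hill(r_t, 0.0, omega)     # energy at the saddle point
```

With these assumptions the saddle lies at r_t = 1 and its energy equals −4.5ω², so the form of the potential, the tidal radius, and the quoted critical energy agree with each other.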
Potential:
[Figure: Hill curves in the (x, y) plane]
Calculation setup:
"simple" ensemble
uniform initial distribution of N = 10240 particles in the region $|x| < r_t$ ∩ U(x, y) < E
$\omega = 1/\sqrt{3} \to r_t = 1$
trigger: $|x| - r_t = 0$ → abs(q0) - 1. = 0.
12 lines of simple Python code (examples/hill plain.py): specification of integration parameters
Results:
Trapping of regular particles (some tricky physics here):
[Figure: N(t), from 0 to 1 · 10^4, versus nt, from 0 to 1 · 10^5, at E = −1.3, E = −0.8 and E = −0.3]
Example: Hill problem, N–body version
Overview
Problem: simplified model of escape from a star cluster: escape of a star from the potential of a rotating star cluster with total mass $M_c$ and the point potential of a galaxy core with mass $M_g \gg M_c$ (2D)
Watched values: configuration of the cluster
Potential of the galaxy core in the cluster frame of reference (tidal approximation):
$U_{HillNB} = -\frac{3}{2}\omega^2 x^2$
”Toy” Hill model vs N–body Hill model:
Calculation setup:
N–body ensemble
2D (z = 0) initial distribution of N = 10240 particles inside a circle of radius R with zero initial velocities
14 lines of simple Python code (examples/hill nbody.py): specification of integration parameters
$M_c = 1$, $R = 200$, $\omega = 1/\sqrt{3}$
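An initial state of this kind can be generated in a few lines of NumPy. The sketch makes several assumptions that the slides do not state: the surface density inside the circle is taken to be uniform, all particles have equal mass so that the total is M_c = 1, and the function name `uniform_disc` is hypothetical.

```python
import numpy as np

def uniform_disc(n, radius, rng):
    """Sample n points uniformly inside a circle of the given radius (z = 0)."""
    # Taking sqrt of a uniform variate gives a uniform density per unit area.
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n))
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    positions = np.column_stack((r * np.cos(phi), r * np.sin(phi), np.zeros(n)))
    velocities = np.zeros_like(positions)   # zero initial velocities
    masses = np.full(n, 1.0 / n)            # assumed equal masses, total M_c = 1
    return positions, velocities, masses

rng = np.random.default_rng(42)
pos, vel, m = uniform_disc(10240, 200.0, rng)
```

A user-supplied generator of this shape (arrays of positions, velocities and masses) is the kind of Python function the "initial ensemble generation" hook mentioned earlier would accept, although its exact signature in EnSPy is not shown in the slides.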
Results: cluster configuration
[Figure: six snapshots of the cluster in the (x, y) plane at steps 201, 401, 601, 801, 1001 and 1201; all axes run from −300 to 300]
Performance results
Not as good as it could be; subject to improvement. Estimate: ∼1 TFlop/s on two recent Fermi GPUs
[Figure: performance in GFlop/s, from 0 to 40, versus N, from 1 · 10^4 to 2 · 10^5, on a GTX260 in double precision, for the N–body and "simple" ensembles]
Future development
Must have features:
MPI: shifting from a "one host–multiple GPUs" to a "multiple hosts–multiple GPUs" environment
individual time steps for the Hermite scheme
tree–codes
Performance improvements:
utilization of texture memory
better load balancing between GPUs