
Description

A simple and efficient computational framework is presented for the simulation of variably saturated flow in porous media. In this modeling approach the Cellular Automata (CA) concept is implemented. The computational domain is thus discretized with a regular grid, and simple rules govern the evolution of the physical phenomena. The inherent simplicity of the CA concept and its natural parallelism make the parallel implementation of algorithms very efficient, especially for the simulation of large scale phenomena. This is a very important feature, as it makes it possible to incorporate the CA computational framework into a more general catchment-scale distributed hydrological model for the detailed simulation of the soil water balance, or into other types of models that simulate the dynamics of water pressure heads and of soil saturation, such as models of rainfall-triggered landslides and of solute and contaminant transport in agricultural soils. The CUDA architecture is utilized in order to take advantage of the computational capabilities of modern GPUs. Particular attention was given to the utilization of the different available memory types. Constant and texture memory are extensively used to accelerate memory accesses, while shared memory is used to exploit the locality of thread computations and optimize each block’s memory accesses. The presented model was applied to various test cases and showed good agreement with published results, as well as scalability with increasing thread and block size.

Transcript of: A parallel computational framework for the simulation of variably saturated flow based on the Cellular Automata concept using CUDA architecture


A parallel computational framework for the simulation of variably saturated flow based on the Cellular Automata concept using CUDA architecture.

Paolo Burlando (1), Grigorios G. Anagnostopoulos (1) and Adamos Kyriakou (2)

Institute of Environmental Engineering (1), Computer Vision Lab (2), ETH Zurich, Switzerland. Correspondence: [email protected]

1. Introduction

A simple and efficient computational framework is presented for the simulation of variably saturated flow in porous media. In this modeling approach the Cellular Automata (CA) concept is implemented.

• The CA approach is efficient for the simulation of large scale phenomena.

• The inherent simplicity of the CA concept and its natural parallelism make its implementation within the CUDA framework straightforward.

2. Computational algorithm

According to the macroscopic CA notion, the computational domain consists of a two- or three-dimensional lattice, composed of rectangular or prismatic cells respectively. Every cell of the lattice communicates with its neighbors only through its faces.

[Figure: a central cell (0,0,0) exchanges the face fluxes Q0–Q5 with its six neighbors (1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1) and (0,0,-1).]

Coupling the discrete formulation of the mass balance of an arbitrary cell with the Darcy–Buckingham law, one can compute the head at time t + Δt:

$$
h_c^{t+\Delta t} \;=\; \frac{\displaystyle \sum_{\alpha \in I} \frac{K_{\alpha c}\, A_{\alpha c}}{l_{\alpha c}}\, h_\alpha^{t} \;+\; \frac{V_c\, \sigma(\psi_c)}{\Delta t}\, h_c^{t} \;+\; \sum_{\alpha \in I_0} Q_{\alpha}^{bound} \;+\; S_c}{\displaystyle \sum_{\alpha \in I} \frac{K_{\alpha c}\, A_{\alpha c}}{l_{\alpha c}} \;+\; \frac{V_c\, \sigma(\psi_c)}{\Delta t}}
$$

The above equation is applied in all the cells of the lattice, except those with a Dirichlet boundary condition, whose hydraulic head is fixed throughout the simulation.
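To make the update rule concrete, the following CUDA kernel is a minimal sketch of how this explicit head update could be evaluated, one thread per cell. It assumes the 1D cell array and per-cell neighbor index list described in Section 5, with the face conductance terms K_{αc}A_{αc}/l_{αc} precomputed per face; all names (updateHead, nbr, cond, etc.) are hypothetical and do not come from the authors' code.

```cuda
// Sketch of the explicit head update for one iteration (hypothetical names,
// not the authors' implementation). Each thread updates one cell.
__global__ void updateHead(const float* __restrict__ hOld,    // h^t per cell
                           float*       __restrict__ hNew,    // h^(t+dt) per cell
                           const int*   __restrict__ nbr,     // 6 neighbor indices per cell (-1 = none)
                           const float* __restrict__ cond,    // K_ac * A_ac / l_ac per face
                           const float* __restrict__ storage, // V_c * sigma(psi_c) / dt per cell
                           const float* __restrict__ qBound,  // summed boundary fluxes per cell
                           const float* __restrict__ src,     // source/sink term S_c per cell
                           const char*  __restrict__ isDirichlet,
                           int nCells)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c >= nCells) return;

    if (isDirichlet[c]) {            // Dirichlet cells keep their prescribed head
        hNew[c] = hOld[c];
        return;
    }

    float num = storage[c] * hOld[c] + qBound[c] + src[c];
    float den = storage[c];

    // Accumulate the contributions of the (up to) six face neighbors
    for (int f = 0; f < 6; ++f) {
        int n = nbr[6 * c + f];
        if (n < 0) continue;          // face on the domain boundary
        float g = cond[6 * c + f];
        num += g * hOld[n];
        den += g;
    }
    hNew[c] = num / den;
}
```

Keeping two head arrays and swapping them between iterations (double buffering) avoids read/write races between neighboring threads.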

3. Verification of the algorithm

The presented algorithm was tested against known benchmark cases available from the literature, in order to evaluate its performance. These include experimental data, analytical solutions and numerical experiments (Anagnostopoulos and Burlando, 2011).

An example is the infiltration experiment of Vauclin et al (1979), which is used to evaluate the ability of the model to simulate the transient position of the water table in a laboratory-scale soil box.

[Figure: simulated water table position (water depth in m versus distance in m) at t = 2, 3, 4 and 8 hrs, compared with the experimental data of Vauclin et al (1979).]

4. CUDA Architecture

CUDA is a general purpose parallel computing architecture that leverages the parallel compute engine in NVIDIA GPUs to solve many complex computational problems in a more efficient way than on a CPU.

[Figure: CUDA hardware model (Figure 4-2 of the CUDA Programming Guide): a set of SIMT multiprocessors with on-chip shared memory, each with an instruction unit, per-processor registers, constant and texture caches, and access to device memory.]

CUDA comes with a software environment that allows developers to use C as a high-level programming language.

5. Implementation and performance

The most challenging issue is that the domain can have an irregular geometry, which makes it harder to exploit locality in the thread computations and to use the shared memory.

For the runs we used an NVIDIA Quadro 2000 graphics card with 192 CUDA cores, installed in a PC with an Intel Xeon processor at 2.93 GHz. The benchmark case of Vauclin et al (1979) was used to assess the performance of the code for grid dimensions of increasing size (scale effect).

Parallelization strategy:

• The cell values are stored in a 1D array, and for each cell the indices of its neighboring cells are also stored. Both of these arrays reside in global memory.

• Simulation constants are stored in the constant memory.

• Soil properties for each soil class are stored in the texture memory.

• Atomic operations are used in order to check for convergence at every iteration.

• The shared memory is used to accelerate the atomic operations and the block’s memory accesses (a sketch of this convergence check is given after this list).
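As a possible illustration of the last two points, the sketch below reduces each block's head changes in shared memory and issues only one atomic operation per block to raise a global "not converged" flag. The kernel name, the flag-based scheme and the assumption of a power-of-two block size are illustrative choices, not the authors' implementation.

```cuda
// Sketch of a per-iteration convergence check (hypothetical names): each block
// reduces its threads' results in shared memory, then one thread per block
// raises a global flag atomically if any head change exceeds the tolerance.
__global__ void checkConvergence(const float* __restrict__ hOld,
                                 const float* __restrict__ hNew,
                                 int nCells, float tol,
                                 int* notConverged)           // global flag, zeroed before launch
{
    extern __shared__ int sFlag[];                            // one int per thread
    int tid = threadIdx.x;
    int c   = blockIdx.x * blockDim.x + tid;

    // 1 if this thread's cell has not yet converged, 0 otherwise
    sFlag[tid] = (c < nCells && fabsf(hNew[c] - hOld[c]) > tol) ? 1 : 0;
    __syncthreads();

    // Tree reduction in shared memory (assumes blockDim.x is a power of two)
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sFlag[tid] |= sFlag[tid + s];
        __syncthreads();
    }

    // One atomic per block instead of one per cell
    if (tid == 0 && sFlag[0]) atomicOr(notConverged, 1);
}
```

The host zeroes notConverged before each iteration, launches the kernel with blockDim.x * sizeof(int) bytes of dynamic shared memory, copies the flag back, and stops the iterative loop once it remains zero.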

Results and conclusions:

1"

10"

100"

1000"

10000"

100000"

1000" 10000" 100000" 1000000" 10000000"Speed%(%cells/sec%)%

Number%of%Cells%

CPU" GPU"

0"

10"

20"

30"

40"

50"

60"

70"

80"

90"

1000" 10000" 100000" 1000000" 10000000"

Speed%Up%Factor%(.)%

Number%of%Cells%

• The speed-up factor increases with the grid dimension. As the domain size increases, more of the GPU's computational resources are exploited.

• Our framework is very attractive for basin-scale simulations (e.g. in natural hazard assessment), where the grid sizes can become extremely large.

References

[1] G.G. Anagnostopoulos, P. Burlando (2011). Object-oriented computational framework for the simulation of variably saturated flow, using a reduced complexity model. Submitted to Environmental Modelling & Software.

[2] M. Vauclin, D. Khanji, G. Vachaud (1979). Experimental and numerical study of a transient, two-dimensional unsaturated-saturated water recharge problem. Water Resources Research, Vol. 15.

[3] NVIDIA (2010). CUDA Programming Guide, Version 3.0. Available: http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_CUDA_ProgrammingGuide.pdf

Abstract No: IN13A-1317