The Parallel Models of Coronal Polarization Brightness Calculation
DCABES 2009, China University of Geosciences
Jiang Wenqian
Outline
Introduction
pB Calculation Formula
Serial pB Calculation Process
Parallel pB Calculation Models
Conclusion
Part Ⅰ. Introduction
Space weather forecasting needs an accurate solar wind model of the solar atmosphere and interplanetary space. A global model of the corona and heliosphere is the basis of numerical space weather forecasting and the basis for interpreting the relations among the relevant observations.
Three-dimensional numerical magnetohydrodynamics (MHD) simulation is one of the most common numerical methods for studying the corona and the solar wind.
Part Ⅰ. Introduction
Converting the simulated coronal electron density into coronal polarization brightness (pB) is the key way to compare a model with observations, so it is important for validating MHD models.
Because of the large data volume and the complexity of the pB model, the computation takes too long on a single CPU (or core) to visualize the pB data in near real time.
Part Ⅰ. Introduction
Based on the characteristics of the CPU/GPU computing environment, we analyze the pB conversion algorithm, implement two parallel models of pB calculation with MPI and CUDA, and compare the efficiency of the two models.
Part Ⅱ. pB Calculation Formula
pB is derived from photospheric radiation scattered by electrons. It can be used to invert the coronal electron density and to validate numerical models. Taking limb darkening into account, the pB of a small coronal volume element is calculated as follows:
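$$I_t - I_r = I_0 \frac{N_e \pi \sigma}{2} \sin^2\chi \, \left[(1-u)A + uB\right] \tag{1}$$

$$A = \cos\Omega \, \sin^2\Omega \tag{2}$$

$$B = -\frac{1}{8}\left[1 - 3\sin^2\Omega - \frac{\cos^2\Omega}{\sin\Omega}\left(1 + 3\sin^2\Omega\right)\ln\frac{1+\sin\Omega}{\cos\Omega}\right] \tag{3}$$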
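As a rough illustration of Eqs. (1) to (3), the sketch below evaluates the two geometry factors and the pB contribution of a single volume element. It is not the authors' code. Several things are assumptions made here for the example: distances are in solar radii, sin Ω = 1/r for a point at heliocentric distance r, sin χ = ρ/r where ρ is the plane-of-sky distance of the line of sight, u is the limb-darkening coefficient, and the constant factor I_0·π·σ/2 is folded into a single hypothetical `scale` parameter. The function names are illustrative only.

```python
import math


def geometry_factors(r):
    """Return (A, B) of Eqs. (2) and (3) at heliocentric distance r.

    Assumption: r is in solar radii (r > 1) and sin(Omega) = 1 / r.
    """
    if r <= 1.0:
        raise ValueError("r must be larger than one solar radius")
    s = 1.0 / r                   # sin(Omega)
    c = math.sqrt(1.0 - s * s)    # cos(Omega)
    a = c * s * s
    b = -0.125 * (
        1.0
        - 3.0 * s * s
        - (c * c / s) * (1.0 + 3.0 * s * s) * math.log((1.0 + s) / c)
    )
    return a, b


def pb_element(ne, r, rho, u, scale=1.0):
    """pB contribution of one volume element, following Eq. (1).

    ne    : electron density of the element
    r     : heliocentric distance (solar radii)
    rho   : plane-of-sky distance of the line of sight (solar radii)
    u     : limb-darkening coefficient (value not given in the slides)
    scale : stands in for I_0 * pi * sigma / 2 (hypothetical parameter)
    """
    a, b = geometry_factors(r)
    sin2_chi = (rho / r) ** 2     # assumption: sin(chi) = rho / r
    return scale * ne * sin2_chi * ((1.0 - u) * a + u * b)
```

Far from the Sun both factors fall off like sin²Ω, with B approaching two thirds of A, which is a convenient sanity check on the reconstruction.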
Part Ⅱ. pB Calculation Formula
The polarization brightness image to be compared with coronagraph observations is generated by integrating the electron-density contribution along each line of sight.
[Figure: the density integration process of pB calculation]
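The line-of-sight integration can be sketched as a simple numerical sum. The example below is only an illustration under stated assumptions: the line of sight is taken along z, distances are in solar radii, points inside the solar sphere are skipped, and `emissivity(r, rho)` is a hypothetical callable that returns the per-element contribution (for instance the electron density times the scattering factors). The midpoint rule is used here for simplicity; the slides do not say which quadrature the authors used.

```python
import math


def integrate_along_los(emissivity, x, y, z_min, z_max, n_steps):
    """Midpoint-rule integral of emissivity(r, rho) along z for pixel (x, y)."""
    rho = math.hypot(x, y)            # plane-of-sky distance of the pixel
    dz = (z_max - z_min) / n_steps
    total = 0.0
    for k in range(n_steps):
        z = z_min + (k + 0.5) * dz    # midpoint of the k-th segment
        r = math.hypot(rho, z)        # heliocentric distance of the sample
        if r > 1.0:                   # assumption: skip samples inside the Sun
            total += emissivity(r, rho) * dz
    return total
```

Because each pixel only reads the density data and writes its own sum, the calls for different (x, y) do not depend on each other.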
Part Ⅲ. Serial pB Calculation Process
The steps of the serial pB calculation on the CPU, using the experimental data, are shown below.
[Figure: the serial process of pB calculation]
Part Ⅲ. Serial pB Calculation Process
We implement this serial process with G95 on Linux and with Visual Studio 2005 on Windows XP.
Measuring the time cost of each step shows that the most time-consuming part of the whole program is the calculation of the pB values, which accounts for 98.05% and 99.05% of the total time cost, respectively.
Part Ⅲ. Serial pB Calculation Process
Therefore, to meet the requirement of obtaining the coronal polarization brightness in near real time, we should optimize the calculation of the pB values.
Since the density integration along the line of sight for each point over the solar limb is independent of the others, pB calculation is well suited to parallel computation.
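To get a feel for how much can be gained, one can apply Amdahl's law to the measured share of the pB step. The slides do not use Amdahl's law; it is brought in here only as a standard upper bound, and it ignores communication and load imbalance. With 98.05% of the serial run time parallelizable and 8 workers, the bound is about 7.04.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Upper bound on speedup when only part of the run time is parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# 98.05% of the G95 serial run time is spent computing pB values.
print(round(amdahl_speedup(0.9805, 8), 2))  # prints 7.04
```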
Part Ⅳ. Parallel pB Calculation Models
Currently, parallelized MHD numerical calculation is mainly based on MPI.
With the development of high performance computing, GPU architectures show clear advantages for compute-intensive problems.
Implementing the parallel MHD numerical calculation on a GPU is therefore a promising, efficient parallel solution.
We implement two parallel models, based on MPI and CUDA respectively.
Part Ⅳ. Parallel pB Calculation Models
Experiment Environment
Experimental Data
  42×42×82 (r, θ, φ) density data (den)
  321×321×481 (x, y, z) Cartesian coordinate grid
  321×321 pB values will be generated.
Hardware
  Intel(R) Xeon(R) CPU E5405 @ 2.00GHz (8 CPUs)
  1GB memory
  NVIDIA Quadro FX 4600 GPU, 760MB global memory, GDDR3 SDRAM graphics card
  (G80 kernel architecture, 12 MPs and 128 SPs)
Part Ⅳ. Parallel pB Calculation Models
Experiment Environment
Compiling Environment
  CUDA-based parallel model: Visual Studio 2005 on Windows XP, CUDA 1.1 SDK
  MPI-based parallel model: G95 on Linux, MPICH2
Part Ⅳ. Parallel pB Calculation Models
MPI-based Parallelized Implementation
In the MPI environment, the computing domain is decomposed into sub-domains as shown below.
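The decomposition figure is not recoverable from this transcript, so the sketch below shows only one common choice, not necessarily the authors' scheme: a contiguous block decomposition of the 321 image rows over 8 processes. The function name `block_range` is hypothetical. In a real MPI program, `rank` and `n_ranks` would come from `MPI_Comm_rank` and `MPI_Comm_size`; here they are plain arguments so the example runs without an MPI installation.

```python
def block_range(n_items, n_ranks, rank):
    """Half-open range [start, stop) of items owned by `rank`.

    The first (n_items % n_ranks) ranks get one extra item, so the
    block sizes differ by at most one.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# 321 rows of the pB image spread over 8 processes.
for rank in range(8):
    print(rank, block_range(321, 8, rank))
```

With 321 rows and 8 processes, rank 0 owns 41 rows and every other rank owns 40, so the load is nearly even.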
Part Ⅳ. Parallel pB Calculation Models
MPI-based Parallelized Implementation
The final result shows that the MPI-based parallel model reaches a speedup of 5.8 in total running time (33.05 s serial versus 5.70 s parallel, Table 2). Since the experiment runs on a platform with 8 CPU cores, this speedup is reasonably close to the theoretical maximum of 8.
It also indicates that the MPI-based parallel solution balances processor utilization against inter-processor communication.
Part Ⅳ. Parallel pB Calculation Models
CUDA-based Parallelized Implementation
Following the serial pB calculation process and the CUDA architecture, we put the calculation part into the kernel function to implement the parallel program.
Since the density interpolation and the cumulative sum involved in each pB value are independent of those of the other pB values, we can use multiple CUDA threads for the pB calculation, with each thread calculating one pB value.
Part Ⅳ. Parallel pB Calculation Models
CUDA-based Parallelized Implementation
However, the number of pB values to be calculated is much larger than the number of threads available on the GPU, so each thread has to calculate multiple pB values. Under the experimental conditions, the thread number is set to 256 per block so as to maximize the use of computing resources.
The block number depends on the ratio of the number of pB values to the thread number. In addition, since global memory access is slow, we put some independent data into shared memory to reduce data access time.
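The mapping of pB values to threads can be emulated on the CPU to check that every value is assigned exactly once. This is a minimal sketch, not the authors' kernel: it assumes a strided assignment (each thread starts at its global index and advances by the total thread count), and `values_per_thread` is a hypothetical tuning parameter that the slides do not specify. Only the 256 threads per block and the 321×321 image size come from the slides.

```python
import math

THREADS_PER_BLOCK = 256  # value stated in the slides


def launch_config(n_values, values_per_thread):
    """Return (n_blocks, threads_per_block) for the given workload."""
    n_threads = math.ceil(n_values / values_per_thread)
    n_blocks = math.ceil(n_threads / THREADS_PER_BLOCK)
    return n_blocks, THREADS_PER_BLOCK


def indices_for_thread(block_idx, thread_idx, n_blocks, n_values):
    """pB indices handled by one thread under a strided assignment."""
    stride = n_blocks * THREADS_PER_BLOCK          # total number of threads
    first = block_idx * THREADS_PER_BLOCK + thread_idx
    return range(first, n_values, stride)


n_values = 321 * 321                               # 103041 pB values
n_blocks, threads = launch_config(n_values, values_per_thread=4)
print(n_blocks, threads)
print(list(indices_for_thread(0, 0, n_blocks, n_values)))
```

In a CUDA kernel the same index arithmetic would use `blockIdx.x`, `blockDim.x`, `gridDim.x` and `threadIdx.x` in place of the explicit arguments.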
Part Ⅳ. Parallel pB Calculation Models
CUDA-based Parallelized Implementation
The data put into shared memory takes about 7KB, less than the 16KB provided by the GPU, so the parallel solution is feasible.
Moreover, the data-length array is read-only and accessed very frequently, so it is moved from shared memory into constant memory to further improve access efficiency.
The CUDA-based parallel calculation process is shown below.
Part Ⅳ. Parallel pB Calculation Models
Experiment results
The pB calculation time of the two models is shown in Table 1.

Table 1. The pB calculation time of the serial and parallel models and their speed-up ratios

                                              MPI (G95)    CUDA (Visual Studio 2005)
pB calculation time of serial models (s)      32.403       48.938
pB calculation time of parallel models (s)     5.053        1.536
Speed-up ratio                                 6.41        31.86
Part Ⅳ. Parallel pB Calculation Models
Experiment results
The total performance of the two models is shown in Table 2.

Table 2. The total running time of the two parallel models and the speed-up ratios compared with their serial models

                    MPI (G95) (s)    CUDA (Visual Studio 2005) (s)    Speed-up ratio of running time (MPI time / CUDA time)
Serial models       33.05            49.406                           0.67
Parallel models      5.70             2.004                           2.84
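The ratios in Table 1 and Table 2 can be re-derived from the reported times. The short script below does only that arithmetic; the dictionary keys are illustrative names, and the total-run-time speedups it prints (about 5.8 for MPI and 24.65 for CUDA) are derived here from the table values rather than quoted from the slides.

```python
# Times in seconds, copied from Table 1 (pB step) and Table 2 (total run).
times = {
    "MPI":  {"serial_pb": 32.403, "parallel_pb": 5.053,
             "serial_total": 33.05, "parallel_total": 5.70},
    "CUDA": {"serial_pb": 48.938, "parallel_pb": 1.536,
             "serial_total": 49.406, "parallel_total": 2.004},
}


def speedup(serial_seconds, parallel_seconds):
    """Classic speedup: serial time divided by parallel time."""
    return serial_seconds / parallel_seconds


for name, t in times.items():
    pb = speedup(t["serial_pb"], t["parallel_pb"])
    total = speedup(t["serial_total"], t["parallel_total"])
    print(f"{name}: pB speedup {pb:.2f}, total speedup {total:.2f}")

# Last column of Table 2: MPI running time divided by CUDA running time.
serial_ratio = times["MPI"]["serial_total"] / times["CUDA"]["serial_total"]
parallel_ratio = times["MPI"]["parallel_total"] / times["CUDA"]["parallel_total"]
print(f"MPI/CUDA running-time ratio: serial {serial_ratio:.2f}, parallel {parallel_ratio:.2f}")
```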
Part Ⅳ. Parallel pB Calculation Models
Experiment results
Finally, we draw the coronal polarization brightness image from the calculated data, as shown below.
Conclusion
Under the same environment, the pB calculation of the MPI-based parallel model takes 5.053 seconds, while the serial model takes 32.403 seconds. The model's speedup is 6.41.
The pB calculation of the CUDA-based parallel model takes 1.536 seconds, while the serial model takes 48.938 seconds. The model's speedup is 31.86.
In total running time, the CUDA-based model (2.004 seconds) is 2.84 times faster than the MPI-based model (5.70 seconds).
Conclusion
We find that the CUDA-based parallel model is better suited to pB calculation, and that it provides a better solution for post-processing and visualizing the results of MHD numerical calculation.
Thank you!!!