International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 6, June 2015
www.ijsr.net Licensed Under Creative Commons Attribution CC BY
Study of 1D Bar Problem by Finite Element Method
on Parallel Architectures
Pikle Nileshchandra K¹, Umesh B. Chavan²
Department of Information Technology, Walchand College of Engineering, Sangli, India
Abstract: The finite element method (FEM) is one of the most commonly used numerical techniques for finding approximate solutions to
various problems in mechanical engineering, civil engineering and related fields. In this paper we present an overview of solving the
mechanical 1D bar problem by FEM. The method involves two steps: first the stiffness matrix is generated,
and then the resulting system of linear equations is solved by a suitable method, either direct or
iterative. Because iterative methods are generally more computationally efficient for large sparse systems, we emphasize the Conjugate Gradient (CG) method and
the Preconditioned Conjugate Gradient (PCG) method. We then focus on the parallelization of these methods on parallel
architectures such as CPU-GPU systems or systems programmed with the Message Passing Interface (MPI).
Keywords: Finite Element Method (FEM), Element By Element Finite Element Method (EBEFEM), Conjugate Gradient (CG),
Preconditioned Conjugate Gradient (PCG), Graphics Processing Unit (GPU), Compute Unified Device Architecture (CUDA).
1. Introduction
The Finite Element Method is one of the most widely used methods for
solving partial differential equations (PDEs) arising in various
domains of engineering such as mechanical engineering and
civil engineering. Structural analysis problems [1],
such as determining the effects of loads on vehicles,
bridges and buildings, are routinely carried out using the
Finite Element Method. A typical finite element simulation
of a practical problem usually involves the assembly and
solution of hundreds of thousands of simultaneous linear
algebraic equations, which can be written in the form
Ax=b (1)
Solving an FEM problem is divided into the following steps:
1. Divide the problem domain into a number of finite elements.
2. Generate the local stiffness matrix of each element.
3. Assemble them to form the linear system of equations of the
form Ku = f.
4. Solve this linear system of equations by the preconditioned
conjugate gradient method.
Figure 1: Steps in Finite Element Method
where K is the stiffness matrix, u is the displacement vector, f is
the applied force (load) vector and L is the total length of the
bar (see Figure 2). One end of the bar is fixed and the force f is
applied at the other end. The bar is divided into n finite
elements, and the area of cross section (A) of each element is
given. According to Hooke's law, the stiffness (k) of an element
is given by
$k = \frac{A \times E}{l}$ (2)
where l is the segment length, computed as total
length / number of elements (L/n), and E is the modulus of elasticity.
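As a small illustration of equation (2), the following sketch computes the stiffness of every element of a bar divided into n equal segments. The function name `element_stiffnesses` and all numerical values (a steel-like modulus and four cross-sectional areas) are assumptions chosen for illustration; they are not taken from the paper.

```python
def element_stiffnesses(areas, E, L):
    """Return k_e = A_e * E / l for each element, with l = L / n (equation 2)."""
    n = len(areas)
    if n == 0:
        raise ValueError("at least one element is required")
    l = L / n  # segment length of one element
    return [A * E / l for A in areas]


# Hypothetical data: a 1 m bar split into 4 elements, E = 200 GPa (assumed).
areas = [4e-4, 3e-4, 2e-4, 1e-4]  # cross-sectional areas in m^2
k = element_stiffnesses(areas, E=200e9, L=1.0)
```

Each entry of `k` is the stiffness of one element in N/m; halving the segment length doubles every stiffness.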
The memory requirements and the computational time
required to solve such equations increase as the number of
equations increases. To deal with such large numerical
problems in Finite Element Analysis, parallel computing
on high performance computers is gradually becoming a
mainstream tool. Many parallel algorithms and programs for
finite element computation have been developed on parallel
computers, utilizing large numbers of CPU or GPU cores to
achieve high speedup and scalability.
2. Literature Survey
The theory and implementation of the Finite Element
Method is discussed in various books, see for example Seshu
[1]. The element by element algorithm is discussed in
Hughes et al. [2] and Hughes [3]. A serial implementation of
the element by element method is discussed in King and
Sonnad [4]. A parallel implementation of the
element-by-element method using CUDA is presented in Kiss et al. [5],
where they solved the problem of heat conduction in an
inhomogeneous medium. Sheth [6] has demonstrated a proof of
concept implementation of the element by element method
using CUDA to solve plane linear elastic problems. Mafi and
Sirouspour [7] have also implemented the element by
element Finite Element Method on GPUs to solve problems
in nonlinear finite deformation analysis. The High Performance
Conjugate Gradient (HPCG) algorithm, including its use on
GPUs, is explained in [8].
Paper ID: SUB155753
Problem solving by FEM
Figure 2: 1-Dimensional Bar Problem
a) Generating Local Stiffness Matrix
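According to Hooke's law, stress is directly proportional to strain, so that
$f = k u$ (3)
where the stiffness k is given by equation (2), A is the area of
cross section, E is Young's modulus, l is the length of the
element (l = L/n), f is the force acting on the bar and u is the
displacement.
Let $f_i$ and $f_j$ be the forces acting at the two end nodes i and j
of an element, and $u_i$ and $u_j$ the corresponding displacements.
$f_i = \frac{A \times E}{l} \times u_i$ (5)
$f_j = \frac{A \times E}{l} \times u_j$ (6)
Then $f_j$ is the reaction to force $f_i$, i.e. opposite to $f_i$,
therefore $f_i = -f_j$, and the forces depend on the relative
displacement of the two nodes:
$f_i = \frac{A \times E}{l} \times (u_i - u_j)$ (7)
$f_j = \frac{A \times E}{l} \times (u_j - u_i)$ (8)
$\begin{bmatrix} f_i \\ f_j \end{bmatrix} = k \cdot \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \cdot \begin{bmatrix} u_i \\ u_j \end{bmatrix}$ (9)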
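The relation in equation (9) can be checked numerically with a short sketch. The helper names `local_stiffness` and `element_forces` and the sample values are assumptions made for this illustration only.

```python
def local_stiffness(A, E, l):
    """2x2 element stiffness matrix k * [[1, -1], [-1, 1]] from equation (9)."""
    k = A * E / l
    return [[k, -k], [-k, k]]


def element_forces(Ke, ui, uj):
    """Nodal forces (f_i, f_j) produced by nodal displacements (u_i, u_j)."""
    fi = Ke[0][0] * ui + Ke[0][1] * uj
    fj = Ke[1][0] * ui + Ke[1][1] * uj
    return fi, fj


# Hypothetical element: A = 1e-4 m^2, E = 200 GPa, l = 0.5 m, so k = 4e7 N/m.
Ke = local_stiffness(A=1e-4, E=200e9, l=0.5)
fi, fj = element_forces(Ke, ui=0.0, uj=1e-3)  # right node moved by 1 mm
```

The two forces are equal in magnitude and opposite in sign, as required by $f_i = -f_j$.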
b) Generating global stiffness matrix (Assembling)
In the second step all local stiffness matrices generated in the
previous step are assembled to produce the global stiffness matrix
(K). The assembled matrix looks as follows:
$K = \begin{bmatrix} k_1 & -k_1 & 0 & \cdots & 0 \\ -k_1 & k_1 + k_2 & -k_2 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -k_n & k_n \end{bmatrix}$ (10)
This matrix is sparse, symmetric and tridiagonal. When the
arrangement (numbering) of the elements and nodes is changed, the
stiffness matrix changes as well; it remains symmetric, but
there is no guarantee that it will still be tridiagonal.
Substituting the assembled stiffness matrix into Ku = f gives a
system of linear equations of the form shown in equation
(1). The next step is to solve this system.
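The assembly step can be sketched as follows, assuming that element e connects node e and node e+1 (numbered from the fixed end) and that no boundary condition has been applied yet. The function name `assemble_global` and the dense list-of-lists storage are illustrative assumptions; a practical code would store only the three diagonals.

```python
def assemble_global(k):
    """Assemble the (n+1) x (n+1) global stiffness matrix from element stiffnesses.

    Element e connects nodes e and e+1; its 2x2 matrix k_e * [[1, -1], [-1, 1]]
    is added into the rows and columns of those two nodes.
    """
    n = len(k)
    K = [[0.0] * (n + 1) for _ in range(n + 1)]
    for e, ke in enumerate(k):
        i, j = e, e + 1
        K[i][i] += ke
        K[i][j] -= ke
        K[j][i] -= ke
        K[j][j] += ke
    return K


# Hypothetical stiffnesses for three elements.
K = assemble_global([1.0, 2.0, 3.0])
```

For these values the interior diagonal entries are sums of neighbouring stiffnesses (1+2 and 2+3), which is the pattern shown in equation (10).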
c) Solving the system of linear equations
Methods for solving systems of linear
equations are divided into direct and iterative methods. Direct
methods such as Gauss elimination, LU decomposition and
Cholesky decomposition [9] are relatively costly in
terms of memory and number of operations. Since in this case
the matrix is tridiagonal and symmetric, the Thomas
algorithm [10] can be used to reduce the space complexity:
there is no need to store the whole
stiffness matrix, only its three diagonals, so the storage is
reduced from $n^2$ to $3n$. Like other direct methods, it finds the
solution by forward elimination and backward substitution, which
takes n steps in the Thomas algorithm. Iterative methods, on the
other hand, are often more suitable for large sparse systems in
terms of space and time complexity. One of
the most widely used iterative methods is the conjugate gradient
method, which typically converges faster than simpler
iterative methods. The main drawback of a direct method is that it
must carry out n steps for n equations, whereas the CG method can
stop after fewer than n iterations, depending on the error
that can be tolerated. The solution of this system
represents the displacement of the bar at the various
nodes. The stiffness matrix generated by the assembling
process is a very large sparse matrix, and storing it requires a
very large amount of memory. Therefore another method, called the
Element by Element Finite Element Method (EBEFEM) [11], is used
to solve the 1D bar problem. In EBEFEM the system is solved by the
preconditioned conjugate gradient method.
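The two solver families discussed above can be compared on a small example. The sketch below is a minimal illustration, not the authors' implementation: it assumes a bar fixed at node 0 and loaded at the free end, stores only the diagonals (3n storage instead of n^2), and solves the same system once with the Thomas algorithm and once with unpreconditioned CG. All function names and the sample stiffnesses are assumptions.

```python
def reduced_system(k, force):
    """Tridiagonal system for a bar fixed at node 0 and loaded at the free end."""
    n = len(k)
    diag = [k[i] + (k[i + 1] if i + 1 < n else 0.0) for i in range(n)]
    off = [-k[i + 1] for i in range(n - 1)]  # sub- and super-diagonal (symmetric)
    rhs = [0.0] * (n - 1) + [force]
    return diag, off, rhs


def solve_thomas(diag, off, rhs):
    """Direct solve: forward elimination, then backward substitution."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n  # modified super-diagonal and right-hand side
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def tridiag_matvec(diag, off, x):
    """y = K x using only the stored diagonals."""
    y = [diag[i] * x[i] for i in range(len(diag))]
    for i in range(len(off)):
        y[i] += off[i] * x[i + 1]
        y[i + 1] += off[i] * x[i]
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(diag, off, rhs, tol=1e-10, max_iter=None):
    """Iterative solve; stops when the relative residual drops below tol."""
    n = len(diag)
    max_iter = max_iter or 10 * n
    x = [0.0] * n
    r = rhs[:]  # residual b - K x for the initial guess x = 0
    p = r[:]
    rs = dot(r, r)
    b_norm = dot(rhs, rhs) ** 0.5 or 1.0
    for it in range(max_iter):
        if rs ** 0.5 <= tol * b_norm:
            return x, it
        Kp = tridiag_matvec(diag, off, p)
        alpha = rs / dot(p, Kp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, max_iter


# Hypothetical bar: four elements with stiffness 4, 3, 2, 1 and unit end load.
diag, off, rhs = reduced_system([4.0, 3.0, 2.0, 1.0], force=1.0)
u_direct = solve_thomas(diag, off, rhs)
u_cg, iterations = conjugate_gradient(diag, off, rhs)
```

Both solvers should agree to within the CG tolerance; loosening `tol` lets CG stop earlier at the cost of accuracy, which is the trade-off described in the text.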
3. Element by Element Finite Element Method
The element-by-element (EBE) algorithm implements the
conjugate gradient method at the element level. It reduces
solving the linear set of equations Ax=b for the whole
system to
$A^{(e)} \cdot x^{(e)} = b^{(e)}$ (11)
That is, instead of generating the global stiffness matrix, the
solution is found at the element level. This approach is
explained by Law [12]. In equation (11), $A^{(e)}$ is the element
level stiffness matrix, $x^{(e)}$ is the vector of unknowns at
element level and $b^{(e)}$ is the known right-hand side vector. The
reason for such element level computation is that it is easy to
parallelize. On a parallel architecture each processor
performs these computations locally, hence a speedup is
achieved. Further, Liu, Zhou and Yang [13] combined a similar
model proposed by Hughes, Levit et al. [2] with this model
and implemented an element by element (EBE) Jacobi
preconditioned conjugate gradient (PCG) method.
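The idea can be illustrated for the 1D bar with a sequential sketch in which the matrix-vector product inside PCG is accumulated element by element, so the global matrix is never formed. This is a simplified illustration under stated assumptions (bar fixed at node 0, Jacobi preconditioner built from element contributions, hypothetical function names), not the algorithm of [12] or [13] as published.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def ebe_matvec(k, x):
    """y = K x accumulated element by element; K is never assembled.

    x holds the displacements of nodes 1..n; node 0 is fixed (u = 0).
    """
    y = [0.0] * len(k)
    for e, ke in enumerate(k):  # every element contribution is independent
        ui = x[e - 1] if e > 0 else 0.0  # left node (fixed for e = 0)
        uj = x[e]                        # right node
        fi, fj = ke * (ui - uj), ke * (uj - ui)  # equations (7) and (8)
        if e > 0:
            y[e - 1] += fi
        y[e] += fj
    return y


def ebe_diagonal(k):
    """Diagonal of K from element contributions (used as Jacobi preconditioner)."""
    d = [0.0] * len(k)
    for e, ke in enumerate(k):
        if e > 0:
            d[e - 1] += ke
        d[e] += ke
    return d


def ebe_jacobi_pcg(k, rhs, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned CG using the element-by-element product."""
    inv_diag = [1.0 / d for d in ebe_diagonal(k)]
    x = [0.0] * len(k)
    r = rhs[:]
    z = [m * ri for m, ri in zip(inv_diag, r)]
    p = z[:]
    rz = dot(r, z)
    b_norm = dot(rhs, rhs) ** 0.5 or 1.0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            break
        Kp = ebe_matvec(k, p)
        alpha = rz / dot(p, Kp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Hypothetical bar: stiffness 4, 3, 2, 1 and a unit load at the free end.
u = ebe_jacobi_pcg([4.0, 3.0, 2.0, 1.0], rhs=[0.0, 0.0, 0.0, 1.0])
```

In a parallel version, the loop over elements in `ebe_matvec` is the part that would be distributed across processors or GPU threads, with the additions into shared nodes handled so that concurrent updates do not conflict.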
4. Graphics Processing Unit
A graphics processing unit (GPU) is sometimes also
called a visual processing unit (VPU). A GPU is not a
standalone device; it works together with a CPU. Most
computers nowadays come with a GPU. GPU-accelerated
computing uses the GPU together with the CPU to accelerate
scientific, analytical and other applications. A GPU can be
described in terms of two architectures: the memory
architecture and the host-device architecture. A GPU contains
thousands of cores, whereas present-day CPUs typically contain 4,
8 or 16 cores. Despite having this many cores, a GPU cannot be
used as a standalone device: a GPU may be, say, 10 times faster
than a CPU for parallel processing, but the CPU may be 10
times faster than the GPU for sequential processing. Hence in the
host-device architecture both the GPU and the CPU are required.
The host-device architecture describes how a program is executed
on the GPU, whereas the memory architecture describes how memory
and cores are organized on the GPU.
1) Host-Device Architecture
In the host-device architecture the CPU is the host and the GPU
is the device. The host-device architecture describes the process
of data transfer and execution of a program. During the execution
of a program the following steps occur:
1. Allocate memory on host.
2. Allocate memory on device.
3. Copy data from host-to-device.
4. Instruct the device to start processing.
5. Compute in parallel on the device.
6. Copy the results back to host memory.
A typical host-device processing flow is shown in Fig. 3.
Figure 3: Processing flow on host and device
2) GPU Memory Architecture
A GPU has six types of memory: global memory, shared
memory, constant memory, texture memory, local memory and
registers. Fig. 4 shows the memory layout in GPUs.
Figure 4: Memory hierarchy in GPU
The basic processing unit of a GPU is the thread. A group of
threads forms a block, and a group of blocks forms a grid. Each
thread has its own local memory and registers. Threads within
a block share that block's shared memory. Threads in different
blocks cannot communicate through shared memory. All the memories
can be read and written from the device except constant and
texture memory, which are read-only.
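The thread, block and grid hierarchy can be pictured with a plain-Python emulation of a one-dimensional launch, in which each thread derives a global index from its block and thread numbers. This is only a sequential stand-in written for illustration (the names `blocks_needed` and `emulate_grid` are assumptions); it does not run on a GPU and does not use any GPU library.

```python
def blocks_needed(n_items, block_size=256):
    """Number of blocks so that every item gets its own thread (ceiling division)."""
    return (n_items + block_size - 1) // block_size


def emulate_grid(n_items, block_size=256):
    """Yield (block, thread, global index) for a 1D grid covering n_items."""
    for block in range(blocks_needed(n_items, block_size)):
        for thread in range(block_size):
            idx = block * block_size + thread  # global index of this thread
            if idx < n_items:  # guard: the last block may be only partly used
                yield block, thread, idx


# Hypothetical launch: 10 elements processed with blocks of 4 threads.
work = list(emulate_grid(n_items=10, block_size=4))
```

With 10 items and 4 threads per block, three blocks are needed and the last block uses only two of its threads, which is why the bounds check is required.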
3) Compute Unified Device Architecture (CUDA)
Compute Unified Device Architecture (CUDA) is the
parallel computing platform provided by NVIDIA to give
developers programming access to its GPUs. It provides various
libraries and runtime application programming interfaces
(APIs) which allow the developer to write parallel
CUDA code using simple C/C++ language extensions. In CUDA,
kernel functions are executed on the GPU, and the block
size and grid size are passed at launch. Some of the most
commonly used CUDA runtime functions are given below.
cudaMalloc() - Allocates memory on the GPU.
cudaMemcpy() with cudaMemcpyHostToDevice - Copies memory from host
to device.
cudaMemcpy() with cudaMemcpyDeviceToHost - Copies memory from
device to host.
cudaFree() - Frees memory allocated on the GPU.
EBE Finite Element Method on GPU
Element by element FEM is used when parallelizing the
FEM problem. Instead of generating the global stiffness
matrix (K), the computations are done locally to find the
result, which is the displacement vector in our case. As the
element-level computations are independent, they can be carried
out in parallel. The detailed algorithms are given in [2], [5]
and [6].
5. Conclusions and Future Work
Since generating the stiffness matrix produces a very large
matrix, storing and solving it is very time consuming.
Various structural engineering and scientific
problems solved by FEM need to be solved in real
time. Sequential processing is very time consuming, so
parallelization can reduce the time as well as the space
requirements of the problem. PCG is one of the most widely used
iterative solvers for systems of linear equations such as
equation (1), so a parallel version of this method can be used
for problems in different domains. If EBEFEM is parallelized
instead of only PCG, more independent SIMD computations are
available, hence parallel execution of EBEFEM is expected to be
more efficient.
References
[1] Seshu, P. 2004, Textbook of the Finite Element
Analysis, Prentice Hall of India, Pvt. Ltd.
[2] Hughes, T., Levit, I. and Winget, J. 1983, An element by
element solution algorithm for problems of structural and
solid mechanics, Computer Methods in Applied
Mechanics and Engineering, 32 (2), 241-254.
[3] Hughes, T. 2000, The Finite Element Method: Linear
Static and Dynamic Finite Element Analysis, Dover
Publications.
[4] King, R. and Sonnad, V. 1987, Implementation of an
element by element algorithm for the finite element
method on a coarse grained parallel computer,
Computer Methods in Applied Mechanics and
Engineering, 65 (1), 47-59.
[5] Kiss, I., Gyimothy, S., Badics, Z. and Pavo, J. 2012,
Parallel Realization of the Element by Element FEM
Technique using CUDA, IEEE Transactions on
Magnetics, 48 (2), 507-510.
[6] Sheth, R. 2013, Parallel computing in Finite Element
Method using CUDA, Dual Degree Thesis, IIT-Bombay.
[7] Mafi, R. and Sirouspour, S. 2013, GPU-based
acceleration of computations in nonlinear finite element
deformation analysis, International Journal for
Numerical Methods.
[8] https://software.sandia.gov/hpcg/
[9] http://www2.units.it/ipl/students_area/imm2/files/Numerical_Recipes.pdf
[10] http://en.wikipedia.org/wiki/Tridiagonal_matrix_algorithm
[11] T. Hughes, I. Levit, J. Winget, “An element-by-element
solution algorithm for problems of structural and solid
mechanics”, Comp. Meths. Appl. Mech. Engg., 36 (2),
1983, 241-254.
[12] K. Law, “A parallel finite element solution method”,
Comp. & Struct., 23 (6), 1986, 845-858.
[13] Y. Liu, W. Zhou, Q. Yang, “A distributed memory
parallel element-by-element scheme based on Jacobi-conditioned
conjugate gradient for 3D finite element
analysis”, Fin. Elemnt. in Analys. and Desg., 43 (6–7),
2007, 494-503.