International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064

Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438

Volume 4 Issue 6, June 2015

www.ijsr.net Licensed Under Creative Commons Attribution CC BY

Study of 1D Bar Problem by Finite Element Method on Parallel Architectures

Pikle Nileshchandra K.¹, Umesh B. Chavan²

Department of Information Technology Walchand College of Engineering, Sangli, India

Abstract: The finite element method (FEM) is one of the most commonly used numerical techniques for finding approximate solutions to problems in mechanical engineering, civil engineering and related fields. In this paper we present an overview of solving the mechanical 1D bar problem by FEM. The method involves two steps: first the stiffness matrix is generated, and then the resulting system of linear equations is solved by a suitable direct or iterative method. As iterative methods are more computationally efficient, we emphasize the Conjugate Gradient (CG) method and the Preconditioned Conjugate Gradient (PCG) method. We then focus on the parallelization of these methods on parallel architectures such as CPU-GPU systems or the Message Passing Interface (MPI).

Keywords: Finite Element Method (FEM), Element By Element Finite Element Method (EBEFEM), Conjugate Gradient (CG), Preconditioned Conjugate Gradient (PCG), Graphics Processing Unit (GPU), Compute Unified Device Architecture (CUDA).

1. Introduction

The Finite Element Method is one of the best methods for solving partial differential equations (PDEs) from various domains of engineering such as mechanical engineering and civil engineering. Various structural analysis problems [1], such as determining the effects of loads on structures like vehicles, bridges and buildings, are routinely solved using the Finite Element Method. A typical finite element simulation of a practical problem usually involves the assembly and solution of hundreds of thousands of simultaneous linear algebraic equations, which can be written in the form

Ax=b (1)

Solving an FEM problem is divided into the following steps:

1. Divide the problem domain into a number of finite elements.
2. Generate the local stiffness matrix of each element.
3. Assemble the local matrices to form a linear system of equations of the form Ku = f.
4. Solve this linear system of equations by the preconditioned conjugate gradient method.

Figure 1: Steps in Finite Element Method

where K is the stiffness matrix, u is the displacement vector, f is the vector of applied forces and L is the total length of the bar (see Figure 2). One end of the bar is fixed and the force f is applied at the other end. The bar is divided into n finite elements, and the cross-sectional area A of each element is given. According to Hooke's law, the stiffness k of an element is given by

$$k = \frac{A \times E}{l} \qquad (2)$$

where l is the length of one element, computed as the total length divided by the number of elements (l = L/n), and E is the modulus of elasticity.
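As a small illustration of equation (2), the sketch below computes the stiffness of every element of a uniformly divided bar. The function name `element_stiffnesses` and all numerical values are assumptions made only for this example (arbitrary consistent units); they do not come from the paper.

```python
def element_stiffnesses(total_length, areas, youngs_modulus):
    """Return k = A * E / l for every element, with l = L / n (equation 2)."""
    n = len(areas)                  # number of finite elements
    l = total_length / n            # length of one element
    return [a * youngs_modulus / l for a in areas]


# Hypothetical data: a bar of length 2 split into 4 elements,
# each element with its own cross-sectional area.
L_total = 2.0
E = 100.0
areas = [4.0, 3.0, 2.0, 1.0]

k = element_stiffnesses(L_total, areas, E)
print(k)  # [800.0, 600.0, 400.0, 200.0]
```

Only one stiffness value per element has to be stored at this stage, which is what later makes an element-level formulation attractive.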

The memory requirements and the computational time required to solve such equations increase as the number of equations increases. To deal with such large numerical problems in Finite Element Analysis, parallel computing on high-performance computers is gradually becoming a mainstream tool. Many parallel algorithms and programs for finite element computation have been developed on parallel computers, utilizing large numbers of CPU or GPU cores to achieve high speedup and scalability.

2. Literature Survey

The theory and implementation of the Finite Element Method are discussed in various books; see for example Seshu [1]. The element by element algorithm is discussed in Hughes et al. [2] and Hughes [3]. An implementation of the element by element method on a coarse-grained parallel computer is discussed in King and Sonnad [4]. A parallel implementation of the element-by-element method using CUDA is presented in Kiss et al. [5], where the problem of heat conduction in an inhomogeneous medium is solved. Sheth [6] has demonstrated a proof-of-concept implementation of the element by element method using CUDA to solve plane linear elastic problems. Mafi and Sirouspour [7] have also implemented the element by element Finite Element Method on a GPU to solve problems in nonlinear finite deformation analysis. The High Performance Conjugate Gradient (HPCG) algorithm on GPU is explained in [8].

Paper ID: SUB155753 2065

Problem solving by FEM

Figure 2: One-dimensional bar problem

a) Generating Local Stiffness Matrix

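According to Hooke's law, stress is directly proportional to strain. For a bar this gives

$$f = k u \qquad (3)$$

where the stiffness k is given by equation (2), A is the area of cross-section, E is Young's modulus, l is the length of the element, f is the force acting on the bar and u is the displacement.

Let $f_i$ and $f_j$ be the forces acting at the two end nodes i and j of an element, and let $u_i$ and $u_j$ be the corresponding nodal displacements. Applying equation (3) at each node gives

$$f_i = \frac{A \times E}{l} \times u_i \qquad (5)$$

$$f_j = \frac{A \times E}{l} \times u_j \qquad (6)$$

The force $f_j$ is the reaction to the force $f_i$, i.e. it is opposite to $f_i$, therefore $f_i = -f_j$. Each nodal force then depends on the relative displacement of the two nodes:

$$f_i = \frac{A \times E}{l} \times (u_i - u_j) \qquad (7)$$

$$f_j = \frac{A \times E}{l} \times (u_j - u_i) \qquad (8)$$

In matrix form,

$$\begin{bmatrix} f_i \\ f_j \end{bmatrix} = k \cdot \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \cdot \begin{bmatrix} u_i \\ u_j \end{bmatrix} \qquad (9)$$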
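The local relation in equation (9) can be checked numerically with a minimal sketch. The helper names `local_stiffness_matrix` and `nodal_forces`, and the sample values, are hypothetical and chosen only for illustration.

```python
def local_stiffness_matrix(k):
    """2x2 element matrix k * [[1, -1], [-1, 1]] from equation (9)."""
    return [[k, -k],
            [-k, k]]


def nodal_forces(k, u_i, u_j):
    """Multiply the element matrix by the nodal displacements (u_i, u_j)."""
    m = local_stiffness_matrix(k)
    f_i = m[0][0] * u_i + m[0][1] * u_j
    f_j = m[1][0] * u_i + m[1][1] * u_j
    return f_i, f_j


# Hypothetical element: stiffness 800, node i held at 0, node j displaced by 0.5.
f_i, f_j = nodal_forces(800.0, 0.0, 0.5)
print(f_i, f_j)  # -400.0 400.0
```

The two forces are equal and opposite, as the relation between $f_i$ and $f_j$ in the derivation requires, and a rigid translation (equal displacements at both nodes) produces no force.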

b) Generating global stiffness matrix (Assembling)

In the second step, all local stiffness matrices generated in the previous step are assembled to produce the global stiffness matrix K. The assembled matrix has the following form:

$$K = \begin{bmatrix} k_1 & -k_1 & 0 & \cdots & 0 \\ -k_1 & k_1 + k_2 & -k_2 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -k_n & k_n \end{bmatrix} \qquad (10)$$

This matrix is sparse, symmetric and tridiagonal. If the numbering of the nodes and elements is changed, the stiffness matrix changes as well; it remains symmetric, but there is no guarantee that it will still be tridiagonal. Substituting the stiffness matrix into equation (3) gives a system of linear equations of the form shown in equation (1). The next step is to solve this system.
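The assembly step can be sketched as follows, assuming the elements are numbered consecutively along the bar so that element e connects nodes e and e+1. The function names are hypothetical, and removing the row and column of the fixed node is only one common way of imposing the fixed end; the paper does not state how the boundary condition is applied.

```python
def assemble_global_stiffness(k):
    """Assemble the (n+1) x (n+1) matrix K from n element stiffnesses."""
    n = len(k)
    K = [[0.0] * (n + 1) for _ in range(n + 1)]
    for e, ke in enumerate(k):
        i, j = e, e + 1             # the two nodes of element e
        K[i][i] += ke               # add the 2x2 local matrix of equation (9)
        K[i][j] -= ke
        K[j][i] -= ke
        K[j][j] += ke
    return K


def apply_fixed_end(K, f):
    """Impose u_0 = 0 by dropping the row and column of node 0 (assumption)."""
    return [row[1:] for row in K[1:]], f[1:]


# Hypothetical element stiffnesses and a force of 10 at the free end.
k = [800.0, 600.0, 400.0, 200.0]
K = assemble_global_stiffness(k)
f = [0.0, 0.0, 0.0, 0.0, 10.0]
K_red, f_red = apply_fixed_end(K, f)

for row in K:
    print(row)
```

Every interior diagonal entry is the sum of the stiffnesses of the two elements that share the node, which is where the term k1 + k2 in the assembled matrix comes from.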

c) Solving the system of linear equations

Methods for solving a system of linear equations are divided into direct and iterative methods. Direct methods such as Gaussian elimination, LU decomposition and Cholesky decomposition [9] are somewhat costlier in terms of memory and number of operations. As the matrix generated in this case is tridiagonal and symmetric, the Thomas algorithm [10] can be used to reduce the space complexity: there is no need to store the whole stiffness matrix, only its three diagonals are stored, so the space is reduced from $n^2$ to $3n$. As in other direct methods, the solution is found by forward elimination followed by backward substitution, each of which takes n steps in the Thomas algorithm.

On the other hand, iterative methods are more suitable in terms of space and time complexity for large sparse systems. One of the best iterative methods is the conjugate gradient method, which applies to symmetric positive definite systems such as this one and typically converges faster than stationary iterative methods such as Jacobi or Gauss-Seidel. The main drawback of a direct method is that n elimination steps must be carried out for n equations. The CG method can stop after fewer than n iterations, depending on how much error in the approximation can be tolerated. The solution of the system represents the displacement of the bar at the various node positions.

The stiffness matrix generated by the assembly process is a very large sparse matrix, and solving with it requires a very large memory space. So another method, the Element by Element Finite Element Method (EBEFEM) [11], is used to solve the 1D bar problem. In EBEFEM the system is solved by the preconditioned conjugate gradient method.
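To make the comparison between a direct and an iterative solver concrete, the following sketch solves one small tridiagonal system with both the Thomas algorithm and plain (unpreconditioned) CG. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the data correspond to an assumed bar with element stiffnesses 800, 600, 400 and 200, a fixed first node (its row and column removed) and a force of 10 at the free end.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Direct tridiagonal solve that stores only the three diagonals (3n values).

    lower[i] is the entry left of diag[i] (lower[0] is unused);
    upper[i] is the entry right of diag[i] (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n                       # modified upper diagonal
    d = [0.0] * n                       # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):               # forward elimination, n steps
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):      # backward substitution, n steps
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tridiag_matvec(lower, diag, upper, x):
    """Product of the tridiagonal matrix with the vector x."""
    n = len(diag)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(1, n):
        y[i] += lower[i] * x[i - 1]
        y[i - 1] += upper[i - 1] * x[i]
    return y


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Plain CG for a symmetric positive definite system; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)                         # residual b - A x for the start x = 0
    p = list(r)                         # first search direction
    rs_old = dot(r, r)
    b_norm = rs_old ** 0.5
    for iteration in range(max_iter):
        if rs_old ** 0.5 <= tol * b_norm:   # relative residual small enough
            return x, iteration
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Assumed reduced system (fixed node removed).
lower = [0.0, -600.0, -400.0, -200.0]
diag = [1400.0, 1000.0, 600.0, 200.0]
upper = [-600.0, -400.0, -200.0, 0.0]
f = [0.0, 0.0, 0.0, 10.0]

u_direct = thomas_solve(lower, diag, upper, f)
u_cg, iterations = conjugate_gradient(
    lambda v: tridiag_matvec(lower, diag, upper, v), f)
print(u_direct)
print(u_cg, iterations)
```

Both solvers should agree up to the chosen tolerance. The CG routine touches the matrix only through `matvec`, so the matrix never has to be stored explicitly; that property is what an element-level formulation relies on.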

3. Element by Element Finite Element Method

The element-by-element (EBE) algorithm implements the conjugate gradient method at the element level. It reduces solving the linear set of equations Ax = b for the whole system to

$$A^{(e)} \cdot x^{(e)} = b^{(e)} \qquad (11)$$

That is, instead of generating the global stiffness matrix, the solution is found at the element level. This system is explained by Law [12]. In equation (11), $A^{(e)}$ is the element-level stiffness matrix, $x^{(e)}$ is the vector of unknowns at element level and $b^{(e)}$ is the known right-hand-side vector. The reason for such element-level computations is that they are easy to parallelize. On a parallel architecture each processor performs these computations locally, hence a speedup is achieved. Further, Liu and Yang [13] combined a similar model proposed by Hughes, Levit et al. [2] with this model and implemented an element by element (EBE) Jacobi preconditioned conjugate gradient (PCG) method.
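A minimal serial sketch of the idea, assuming the 1D bar with consecutively numbered nodes and node 0 fixed, is given below. It is not the implementation of any of the cited works: the function names are hypothetical, the matrix-vector product is accumulated element by element without ever assembling the global matrix, and the Jacobi preconditioner is simply the diagonal built from the element stiffnesses.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ebe_matvec(k, x):
    """Compute y = K x element by element, without assembling K."""
    y = [0.0] * len(x)
    for e, ke in enumerate(k):
        i, j = e, e + 1
        f_i = ke * (x[i] - x[j])    # element-level force on node i
        # Scatter-add into the shared vector; a parallel version needs care
        # here because neighbouring elements write to the same node.
        y[i] += f_i
        y[j] -= f_i                 # the force on node j is opposite
    y[0] = 0.0                      # node 0 is fixed (assumption)
    return y


def ebe_diagonal(k):
    """Diagonal of K accumulated from the elements (Jacobi preconditioner)."""
    d = [0.0] * (len(k) + 1)
    for e, ke in enumerate(k):
        d[e] += ke
        d[e + 1] += ke
    return d


def ebe_jacobi_pcg(k, f, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned CG using only element-level data."""
    n = len(f)
    inv_diag = [1.0 / d for d in ebe_diagonal(k)]
    x = [0.0] * n
    r = list(f)
    r[0] = 0.0                      # no equation is solved for the fixed node
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz_old = dot(r, z)
    b_norm = dot(r, r) ** 0.5
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        Kp = ebe_matvec(k, p)
        alpha = rz_old / dot(p, Kp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * kpi for ri, kpi in zip(r, Kp)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz_old) * pi for zi, pi in zip(z, p)]
        rz_old = rz_new
    return x, max_iter


# Hypothetical data: four elements, force of 10 at the free end.
k = [800.0, 600.0, 400.0, 200.0]
f = [0.0, 0.0, 0.0, 0.0, 10.0]
u, iterations = ebe_jacobi_pcg(k, f)
print(u, iterations)
```

The only storage required is one stiffness value per element plus a few nodal vectors, instead of the full matrix. The loop over elements in `ebe_matvec` is the part that a parallel implementation would distribute across processors or GPU threads.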

4. Graphics Processing Unit

A graphics processing unit (GPU) is sometimes also called a visual processing unit (VPU). A GPU is not a standalone device; it works together with a CPU. Most computers nowadays come with a GPU. GPU-accelerated computing combines the GPU with the CPU to accelerate scientific, analytical and other applications. GPU computing is generally described in terms of two architectures: the memory architecture and the host-device architecture. A GPU contains thousands of cores, whereas present-day CPUs have 4, 8 or 16 cores. Despite having so many cores, a GPU cannot be used as a standalone device: if GPUs are 10 times faster than CPUs for parallel processing, CPUs may be 10 times faster than GPUs for sequential processing. Hence both the GPU and the CPU are required in the host-device architecture. The host-device architecture describes how a program executes on the GPU, whereas the memory architecture describes how memory and cores are organized on the GPU.

1) Host-Device Architecture

In the host-device architecture the CPU is the host and the GPU is the device. The host-device architecture describes the process of data transfer and execution of a program. During the execution of a program the following steps occur:

1. Allocate memory on the host.
2. Allocate memory on the device.
3. Copy data from host to device.
4. The host instructs the device to start processing.
5. The device computes in parallel.
6. Results are copied back to host memory.

A typical host-device processing flow is shown in Figure 3.

Figure 3: Processing flow on host and device

2) GPU Memory Architecture

A GPU has six types of memory: global memory, shared memory, constant memory, texture memory, local memory and registers. Figure 4 shows the memory layout in GPUs.

Figure 4: Memory hierarchy in GPU

The basic processing unit of a GPU is the thread. A group of threads forms a block, and a group of blocks forms a grid. Each thread has its own local memory and registers. Threads within a block share that block's shared memory. Threads in different blocks cannot communicate through shared memory. All the memories are read-write memories except constant and texture memory, which are read-only for the device.
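The thread, block and grid hierarchy can be illustrated with a purely sequential emulation. The sketch below is ordinary Python running on the CPU, not CUDA code: the launcher `launch_1d` and the kernel `element_force_kernel` are hypothetical names, and the index formula mirrors the usual CUDA convention `blockIdx.x * blockDim.x + threadIdx.x` for a one-dimensional grid.

```python
def launch_1d(kernel, grid_size, block_size, *args):
    """Emulate a 1D kernel launch by visiting every (block, thread) pair in turn."""
    for block_idx in range(grid_size):
        for thread_idx in range(block_size):
            kernel(block_idx, block_size, thread_idx, *args)


def element_force_kernel(block_idx, block_dim, thread_idx, k, u, out):
    """One emulated thread handles one finite element."""
    e = block_idx * block_dim + thread_idx   # global thread index
    if e < len(k):                           # surplus threads do nothing
        out[e] = k[e] * (u[e] - u[e + 1])


# Hypothetical data: 5 elements, 6 nodal displacements.
k = [800.0, 600.0, 400.0, 200.0, 100.0]
u = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
out = [0.0] * len(k)

block_size = 2
grid_size = (len(k) + block_size - 1) // block_size   # round up: 3 blocks
launch_1d(element_force_kernel, grid_size, block_size, k, u, out)
print(out)  # [-800.0, -1200.0, -1200.0, -800.0, -500.0]
```

Three blocks of two threads give six threads for five elements, so the bounds check in the kernel is what keeps the last thread from reading past the end of the arrays. On a real GPU the two loops in `launch_1d` would be replaced by hardware threads running concurrently.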

3) Compute Unified Device Architecture (CUDA)

Compute Unified Device Architecture (CUDA) is the parallel computing platform provided by NVIDIA to give developers programming access to its GPUs. It provides various libraries and runtime application programming interfaces (APIs) which allow the developer to write parallel CUDA code using simple extensions to C/C++. In CUDA, kernel functions are executed on the GPU, and the block size and grid size are passed when a kernel is launched. Some of the most commonly used CUDA runtime functions are given below.

cudaMalloc() - Allocates memory on the GPU.

cudaMemcpy() with cudaMemcpyHostToDevice - Copies memory from host to device.

cudaMemcpy() with cudaMemcpyDeviceToHost - Copies memory from device to host.

cudaFree() - Frees allocated memory on the GPU.

EBE Finite Element Method on GPU

Element by element FEM is used to parallelize the FEM problem. Instead of generating the global stiffness matrix (K), the computations are done locally at element level to find the result, which is the displacement vector in our case. As the computations for each element are independent, they can be carried out in parallel. The detailed algorithms are given in [2], [5] and [6].

5. Conclusions and Future Work

As the generation of the stiffness matrix produces a very large matrix, storing the matrix and solving the system is very time consuming. Various structural engineering and scientific problems solved by FEM need to be solved in real time. Sequential processing is very time consuming, so parallelization can reduce the time as well as the space complexity of the problem. PCG is one of the most widely used iterative solvers for systems of linear equations of the form of equation (1), so a parallel PCG can be used for problems from different domains. If EBEFEM is parallelized instead of PCG alone, more independent SIMD computations are obtained, hence parallel execution of EBEFEM is more efficient.

References

[1] Seshu, P. 2004, Textbook of the Finite Element Analysis, Prentice Hall of India Pvt. Ltd.

[2] Hughes, T., Levit, I. and Winget, J. 1983, An element by element solution algorithm for problems of structural and solid mechanics, Computer Methods in Applied Mechanics and Engineering, 36 (2), 241-254.

[3] Hughes, T. 2000, The Finite Element Method: Linear Static and Dynamic Finite Element Analysis, Dover Publications.

[4] King, R. and Sonnad, V. 1987, Implementation of an element by element algorithm for the finite element method on a coarse grained parallel computer, Computer Methods in Applied Mechanics and Engineering, 65 (1), 47-59.

[5] Kiss, I., Gyimothy, S., Badics, Z. and Pavo, J. 2012, Parallel Realization of the Element by Element FEM Technique using CUDA, IEEE Transactions on Magnetics, 48 (2), 507-510.

[6] Sheth, R. 2013, Parallel computing in Finite Element Method using CUDA, Dual Degree Thesis, IIT-Bombay.

[7] Mafi, R. and Sirouspour, S. 2013, GPU-based acceleration of computations in nonlinear finite element deformation analysis, International Journal for Numerical Methods.

[8] https://software.sandia.gov/hpcg/

[9] http://www2.units.it/ipl/students_area/imm2/files/Numerical_Recipes.pdf

[10] http://en.wikipedia.org/wiki/Tridiagonal_matrix_algorithm

[11] T. Hughes, I. Levit, J. Winget, “An element-by-element solution algorithm for problems of structural and solid mechanics”, Comp. Meths. Appl. Mech. Engg., 36 (2), 1983, 241-254.

[12] K. Law, “A parallel finite element solution method”, Comp. & Struct., 23 (6), 1986, 845-858.

[13] Y. Liu, W. Zhou, Q. Yang, “A distributed memory parallel element-by-element scheme based on Jacobi-conditioned conjugate gradient for 3D finite element analysis”, Fin. Elem. in Analys. and Desg., 43 (6–7), 2007, 494-503.
