Nine-Month Report
Hardware Acceleration for Real-Time Solution of the Discrete-Finite Element Method
Supervised by
Dr. Steven F Quigley
Prof. Andrew H C Chan
Lin Zhang
Department of Electronics, Electrical and Computer Engineering
The University of Birmingham
June 2008
Index

Abstract
1. Introduction
   1.1 Overview
   1.2 Outline
2. Combined Discrete-Finite Element Method
   2.1 Finite Element Method
       2.1.1 Beam element
       2.1.2 Space frame element
   2.2 Discrete Element Method
3. Parallel Solution for Linear Matrix Equations
   3.1 Classic Methods for Linear Matrix Equation Solution
       3.1.1 Direct Solution Process
       3.1.2 Iterative Computing Method
   3.2 Overview of Parallel Computation
       3.2.1 Matrix Partition
       3.2.2 Matrix Multiplication
   3.3 Parallel Methods for Equation Solutions
       3.3.1 Jacobi
       3.3.2 Gauss-Seidel
       3.3.3 Successive over-relaxation
       3.3.4 CG and PCG
   3.4 Summary and Outlook
4. Case Studies
   4.1 Beam model
   4.2 Space Frame Model
       4.2.1 Optimization and balance
       4.2.2 Design for parallel solution of matrix equations
5. Conclusion
   5.1 Summary of the report
   5.2 Work plan for next nine months
   5.3 Outline plan to PhD submission
   5.4 Publication plan
References
Abstract
This nine-month progress report presents a general overview of the combined finite-discrete element method and then goes through the fundamental principles of both the finite element method and parallel computing processes. After this background review, the report arrives at a fundamental structure for dealing with FEM problems. Two case studies on finite element models using the resulting FEM structure are then introduced; this part covers the designs and implementations that have been completed so far. The final part of this report is the work plan for the next stage.
1. Introduction
1.1 Overview
The combined Discrete-Finite Element Method (DFEM) is a promising approach for creating virtual reality and gaming environments that exhibit highly realistic physical behaviour [1]. It combines the advantages of finite element tools and techniques with discrete element algorithms. However, owing to the complexity of this method, the DFEM equations are computationally expensive and cannot be solved in real time on common desktop PCs and workstations for complex virtual environments. Many numerical problems can be greatly speeded up by using hardware accelerators: specialized integrated circuits that exploit high levels of parallelism to give rapid solutions.
A practical attempt to solve DFEM equations in real time could be made by using hardware accelerators on low-cost plug-in boards for desktop PCs. This will involve investigating a variety of formulations of the DFEM in order to identify which method can best be accelerated by low-cost hardware with a clear partition between hardware and software. It will also involve the design of the hardware accelerators and the evaluation of their effectiveness and scalability to large-scale problems.
The work of the past nine months represents the initial stage of the whole
project. The main objectives are:
1) Studying the background of the DFEM especially in the Civil engineering
field and the fundamentals of the parallel techniques for the solution of
linear matrix equations.
2) Developing proper techniques for hardware parallel computing which fit
the characteristics of the FEM.
3) Design of a hardware accelerator and its software interface for two simple
FEM models.
1.2 Outline
This report is organised as follows:
Section 1 will generally introduce the research background of the whole
project and state the target for the work in the first nine month
period, followed by the outline of the whole report in 1.2.
Section 2 will be a summarized literature review of the combined Discrete-
          Finite Element Method. Since the research so far has focused more
          on the finite element method, this part will begin with a detailed
          description of the FEM, followed by a brief introduction to the
          discrete element method (DEM).
Section 3 will talk about the general parallel techniques for the solution of
linear matrix equations. Several approaches for matrix calculation
and equation solution will also be introduced.
Section 4 will describe the case studies on a beam model in 32-bit
          fixed-point arithmetic, as well as a space frame model in
          floating-point.
Section 5 will conclude the work of the past nine months and give the plan
          for future work and publication, along with the expected date to
          finish this study.
2. Combined Discrete-Finite Element Method
As its name suggests, the combined finite-discrete element method is a combination of finite element-based analysis of continua with discrete element-based transient dynamics, contact detection and contact interaction solution. It is the solution for the transient dynamic analysis of systems that contain a large number of deformable interactants in a process of breakage, fracture and fragmentation [2].
2.1 Finite Element Method
The finite element method is a technique for approximating the governing
differential equations for a continuous system with a set of algebraic
equations using a finite number of variables [3].
Classical structural mechanics (force, displacement, energy) can be used to solve simple engineering problems. With the development of structural engineering, and especially the use of computer calculation, matrix analysis methods became more popular; one of the most commonly used matrix methods is the structural mechanics matrix displacement method
which is the precursor of the finite element method. In the finite element
method, after dividing a structure into units called elements, an approximate
displacement function is used to represent the behaviour of the structure.
Therefore, the finite element method is an approximate numerical analysis.
The principle of virtual work can be used to derive the finite element
equations. For further information about the FEM one can refer to various
textbooks such as [4, 5].
Generally speaking, in solid and structural mechanics the FEM can be used to
solve one-, two- and three-dimensional as well as axisymmetric problems,
including the elastic, elasto-plastic and viscoelastic analysis of trusses,
frames, plates, shells and solid bodies [6].
The one-dimensional structural elements can be used for the analysis of
skeletal type systems like planar trusses, space trusses, beams, continuous
beams, planar frames, grid systems and space frames [7]. This results in the same matrix format as the matrix displacement method in structural mechanics. It is used to calculate the internal forces and distortions of plane or three-dimensional structures.
The common procedure to solve a matrix structure problem is:
1) Discretise the whole structure as close to its geometry as possible. That is to say, the skeletal shape, including elements and nodes, should fit the real object as closely as possible.
2) Find the properties of these elements and build the element matrices
such as the stiffness matrix.
3) Analysis of the whole structure by assembling the equilibrium equations for
each node, the structural stiffness matrix and the load vector.
4) Apply the boundary conditions and solve the matrix equations. Normally,
the result from the solutions is the displacement of each node.
5) Calculate the internal forces and distortions from the nodal displacements and output the quantities of interest.
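The five steps can be sketched for the simplest possible case, a two-element axial bar with one end fixed and a unit load at the free end. This is an illustrative Python example; the stiffness value and the tiny solver are stand-ins, not the project's MATLAB/VHDL implementation:

```python
def solve(A, b):
    # Tiny dense solver: Gaussian elimination with back-substitution (no pivoting).
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Step 1: discretise the bar into 2 axial elements joining 3 nodes.
k_e = 2.0                               # element stiffness EA/l (assumed value)
# Step 2: element stiffness matrix of an axial element.
ke = [[k_e, -k_e], [-k_e, k_e]]
# Step 3: assemble the global stiffness matrix from the element matrices.
K = [[0.0] * 3 for _ in range(3)]
for nodes in [(0, 1), (1, 2)]:
    for a, i in enumerate(nodes):
        for c, j in enumerate(nodes):
            K[i][j] += ke[a][c]
# Step 4: apply the boundary condition (node 0 fixed) and a unit load at node 2,
# then solve the reduced matrix equation for the nodal displacements.
F = [0.0, 0.0, 1.0]
Kr = [row[1:] for row in K[1:]]
u = [0.0] + solve(Kr, F[1:])
# Step 5: recover the element internal forces from the nodal displacements.
forces = [k_e * (u[j] - u[i]) for i, j in [(0, 1), (1, 2)]]
```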
So far, the present study has focused on beam and space frame models.
2.1.1 Beam element
The deformation of a beam element is assumed to be described only by the transverse displacement w(x, t) and rotation θ(x, t) of the beam, where θ = dw/dx, as in [Figure 1].
Figure 1 A typical beam element
Because there are four nodal displacements (w1 and θ1 at node 1, w2 and θ2 at node 2), a cubic displacement model is required:

    w(x) = a1 + a2 x + a3 x^2 + a4 x^3

Using the conditions

    w = w1 and dw/dx = θ1 at x = 0
    w = w2 and dw/dx = θ2 at x = l

the equation can also be written as

    w(x) = N(x) q,    q = [w1  θ1  w2  θ2]^T

where the shape function N(x) = [N1  N2  N3  N4] is given by

    N1 = 1 - 3(x/l)^2 + 2(x/l)^3
    N2 = x (1 - x/l)^2
    N3 = 3(x/l)^2 - 2(x/l)^3
    N4 = (x^2/l) (x/l - 1)
Figure 2 Deformation of an element of frame in plane[8]
From [Figure 2] the axial displacement u due to the transverse displacement can be expressed as

    u = -y dw/dx

where y is the distance from the neutral axis. The axial strain is given by

    e_xx = du/dx = -y d2w/dx2 = B q

where the strain-displacement matrix B is given by:

    B = -y [d2N1/dx2  d2N2/dx2  d2N3/dx2  d2N4/dx2]

The element stiffness matrix can be calculated from

    k = integral over the element volume of B^T E B dV

with

    k = (E Iz / l^3) | 12    6l    -12   6l   |
                     | 6l    4l^2  -6l   2l^2 |
                     | -12   -6l   12    -6l  |
                     | 6l    2l^2  -6l   4l^2 |

where Iz = integral over A of y^2 dA is the area moment of inertia of the cross section about the z-axis.
2.1.2 Space frame element
A typical space frame element is shown in [Figure 3].
Figure 3 Element with 12 degrees of freedom[9]
The 12 degrees of freedom can be divided into three individual groups. The axial displacements q1 and q7 can be described using a linear displacement model, which leads to the stiffness matrix

    k_axial = (E A / l) |  1  -1 |
                        | -1   1 |

where A, E and I are the area of cross section, Young's modulus and second moment of inertia respectively.

From Hooke's law, τ = Gγ, where G is the shear modulus of the material. The stiffness matrix of the element corresponding to the torsional displacement degrees of freedom q4 and q10 can be derived as

    k_torsion = (G J / l) |  1  -1 |
                          | -1   1 |
Since J = Ix, where Ix is the polar moment of inertia of the cross section, the expression can be rewritten as

    k_torsion = (G Ix / l) |  1  -1 |
                           | -1   1 |
The second group of degrees of freedom are q2, q6, q8 and q12. These four can be seen as a beam element; they are also called the bending displacements in the xy-plane. Thus, the corresponding stiffness matrix can be derived as

    k_xy = (E Iz / l^3) | 12    6l    -12   6l   |
                        | 6l    4l^2  -6l   2l^2 |
                        | -12   -6l   12    -6l  |
                        | 6l    2l^2  -6l   4l^2 |

where Iz is the area moment of inertia of the cross section about the z-axis.
Similarly, q3, q5, q9 and q11 contribute the bending displacements in the xz-plane. The stiffness matrix can be derived as

    k_xz = (E Iy / l^3) | 12    -6l   -12   -6l  |
                        | -6l   4l^2  6l    2l^2 |
                        | -12   6l    12    6l   |
                        | -6l   2l^2  6l    4l^2 |

where Iy is the area moment of inertia of the cross section about the y-axis.
Assembling k_axial, k_torsion, k_xy and k_xz into the positions of their respective degrees of freedom, we get the overall 12 x 12 local stiffness matrix of the element.
To transform the local stiffness matrix k into the global one K, the relation

    K = T^T k T

is normally used, where the transformation matrix T is block diagonal, consisting of four copies of the 3 x 3 transformation sub-matrix λ of direction cosines of the local axes.

Three nodes are needed to define the space frame element: two of them are the ends of the element, and the third one is used to define the position of the local xy-plane. x, y and z denote the local coordinate directions. The local x direction can be defined as

    x = (P2 - P1) / |P2 - P1|

where P1 and P2 are the position vectors of the two end nodes.
Therefore, the local z direction can be defined by the cross product of x and the vector v from P1 to the third node,

    z = (x × v) / |x × v|

Similarly, the local y direction can be produced by the cross product of z and x:

    y = z × x
2.2 Discrete Element Method
The discrete element method was first proposed by Cundall in the early 1970s
[10], and was originally used to analyze the mechanical behaviour of
discontinuous media, such as rocks. The main idea about the DEM is to
divide those discontinuous objects into an aggregate of rigid elements, and
then the motion equations are used to calculate their motions. From those
equations, the whole state of movement could be derived. The most important
benefit of the DEM is that the relative motions of particles are permitted and
there is no requirement for continuity of displacement and compatible
deformation conditions.
Despite originating from traditional discontinuous medium problems, the
application area of the DEM was extended to continuous objects and
mechanical transformation problems between continuous and discontinuous
medium. One typical example is the damage and destruction of brittle
material, such as concrete, under dynamic loading like shock and penetration.
This kind of topic is normally hard to solve and simulated directly by those
algorithms only based on continuity mechanics, such as finite element method
which was mentioned before [11, 12].
Work on the DEM has only just started and will be deepened in the coming months.
3. Parallel Solution for Linear Matrix Equations
The most time-consuming part of the FEM is the solution of the matrix equations. On the other hand, although no solution of matrix equations is required for the DEM, a small time step, less than the critical time step, has to be used because of the conditionally stable nature of the explicit scheme employed. The FEM contains a large number of matrix additions and multiplications, and the DEM contains a large number of floating-point calculations, which normally makes the real-time solution of large FE and DE systems impracticable on sequential processors. One acceptable approach to the solution of the matrix equations is the use of parallel techniques.
3.1 Classic Methods for Linear Matrix Equation Solution
There are several classic methods normally used for matrix equation solution, which can be roughly divided into two categories: direct solution processes and iterative computing methods.
3.1.1 Direct Solution Process
The direct solution process includes the Gaussian elimination method, the Gauss-Jordan elimination method and the LU decomposition method. These schemes are normally suitable for dense linear equation systems.
3.1.1.1 Gaussian elimination method
There are two basic steps in Gaussian elimination. Step one is Forward
Elimination, which reduces the given system of equations into triangular form,
or results in a degenerate equation with no solution, indicating that the system
is singular. This is accomplished through the use of elementary row
operations. The second step uses back substitution to find the solution of the
system above [13].
The technique of partial pivoting is also widely used in the Gaussian elimination method; it moves the row whose entry has the largest absolute value into the "pivot position". This improves the numerical stability of the algorithm and reduces the round-off error.
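Both steps, together with the partial-pivoting row swap, can be sketched as follows (illustrative Python; the example system is made up):

```python
def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting, then back-substitution."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for k in range(n):
        # Partial pivoting: bring the largest entry in column k to the pivot row.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        if M[k][k] == 0.0:
            raise ValueError("matrix is singular")
        # Forward elimination: zero out column k below the pivot.
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution on the resulting triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```

For example, `gauss_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])` solves the system 2x + y = 3, x + 3y = 5.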
3.1.1.2 Gauss-Jordan elimination method
The Gauss-Jordan elimination is a variation of the traditional Gaussian
elimination method; it puts zeros both above and below the diagonal elements, whereas the traditional method keeps an upper triangular matrix. In other words, Gauss-Jordan elimination brings a matrix to reduced row echelon form.
Gauss–Jordan elimination is considerably less efficient than Gaussian
elimination with back-substitution when solving a system of linear equations.
However, it is well suited for calculating the matrix inverse [14].
3.1.1.3 LU decomposition method
The LU decomposition is a matrix decomposition which decomposes a matrix
as the product of a lower and upper triangular matrix [15]. It can also be
viewed as a variant of the Gauss elimination method.
Let A be a square matrix and A = LU, where L is a lower triangular matrix with unit diagonal and U is an upper triangular matrix. Thus the linear matrix equation Ax = b can be written as

    L U x = b

Let Ux = y; the original problem can be equivalently transformed into

    L y = b,    U x = y
This reduces the whole solution to two simpler triangular systems that can be easily solved by forward and back substitution.

Writing out the matrix multiplication A = LU entry by entry, the Doolittle decomposition can be derived as

    u(1,j) = a(1,j)                                              for j = 1..n
    l(i,1) = a(i,1) / u(1,1)                                     for i = 2..n
    u(i,j) = a(i,j) - sum(k=1..i-1) l(i,k) u(k,j)                for j >= i
    l(i,j) = (a(i,j) - sum(k=1..j-1) l(i,k) u(k,j)) / u(j,j)     for i > j
One advantage of using a computer to run the LU method is the efficiency of the memory requirement for storing the matrix. On one hand, there is no need to store the zeros in either L or U, or the ones on the diagonal of L, so L and U can be stored together in one square matrix. On the other hand, each a(i,j) is used only once, to calculate the corresponding l(i,j) or u(i,j), so the results l(i,j) and u(i,j) can simply overwrite a(i,j).
In most practical situations, such as in the FEM, the coefficient matrix is symmetric when the problem is linear elastic. In this case, the LDL^T method reduces the calculation and the memory requirement and simplifies the program design.

Let A = L D L^T, where L is a unit lower triangular matrix and D is a diagonal matrix. Because A = LU = L D L^T, U = D L^T. Based on the uniqueness of the decomposition of the matrix, we can safely conclude that the factorisation exists; moreover, it is unique if A is not singular. The entries are given by

    d(j)   = a(j,j) - sum(k=1..j-1) l(j,k)^2 d(k)
    l(i,j) = (a(i,j) - sum(k=1..j-1) l(i,k) d(k) l(j,k)) / d(j)    for i > j
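The LDL^T factorisation can be sketched as follows (illustrative Python; the project's own implementation uses MATLAB and VHDL):

```python
def ldlt(A):
    """Factorise a symmetric matrix A as A = L D L^T, with L unit lower
    triangular and D diagonal (returned as a list of diagonal entries)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # d(j) = a(j,j) - sum_k l(j,k)^2 d(k)
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            # l(i,j) = (a(i,j) - sum_k l(i,k) d(k) l(j,k)) / d(j)
            L[i][j] = (A[i][j]
                       - sum(L[i][k] * d[k] * L[j][k] for k in range(j))) / d[j]
    return L, d
```

For example, `ldlt([[4.0, 2.0], [2.0, 3.0]])` yields d = [4, 2] and l(2,1) = 0.5, so that L D L^T reproduces the original matrix.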
3.1.2 Iterative Computing Method
On the other hand, iterative computing methods are more suitable for sparse
matrix problems although they can also be used to deal with dense ones as
well. The typical schemes in this category are the Jacobi method, the Gauss-Seidel method, successive over-relaxation (SOR) and the conjugate gradient method for symmetric matrices.
3.1.2.1 Jacobi method
The Jacobi method is an iterative algorithm used to solve linear equations. The solution vector x is sought, where Ax = b.

Let A = D + L + U, where D, L and U represent the diagonal, strictly lower triangular and strictly upper triangular parts of the coefficient matrix respectively. Then the equation to be solved can be rewritten as

    (D + L + U) x = b

and

    x = D^-1 (b - (L + U) x)

If a(i,i) != 0 for each i, the definition of the Jacobi method can be expressed as

    x^(k+1) = D^-1 (b - (L + U) x^(k))

where k is the iteration count. The element-based approach can be described
as [16]

    x_i^(k+1) = ( b_i - sum over j != i of a(i,j) x_j^(k) ) / a(i,i)
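The element-based iteration can be sketched directly in code (illustrative Python; the report's own implementations are in MATLAB and VHDL):

```python
def jacobi(A, b, iters=100):
    """Element-based Jacobi iteration:
    x_i <- (b_i - sum over j != i of a(i,j) * x_j) / a(i,i).
    Every update in one sweep uses only values from the previous sweep."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x
```

On a diagonally dominant system such as 4x + y = 5, x + 3y = 4 the iterates converge to the solution (1, 1).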
3.1.2.2 Gauss-Seidel method
The Gauss-Seidel method is very similar to the Jacobi method described above. The Gauss-Seidel iteration is

    x^(k+1) = (D + L)^-1 (b - U x^(k))

where D, L and U represent the diagonal, strictly lower triangular and strictly upper triangular parts of the coefficient matrix respectively. One explicit implementation of the Gauss-Seidel method is [17]

    x_i^(k+1) = ( b_i - sum over j < i of a(i,j) x_j^(k+1)
                      - sum over j > i of a(i,j) x_j^(k) ) / a(i,i)

Compared with the Jacobi method, the computation of x_i^(k+1) in the Gauss-Seidel method uses only the most recently updated values. There is reason to believe that this improvement speeds up the convergence in the first place. Moreover, it is clear from the expression above that after the calculation of x_i^(k+1), the old value x_i^(k) is no longer needed. This characteristic of the algorithm may save memory at the program design stage, since the value of x_i can be overwritten as soon as a new value is obtained.
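A sketch of this in-place update, showing how each new value overwrites the old one so that only a single solution vector is stored (illustrative Python):

```python
def gauss_seidel(A, b, iters=100):
    """Gauss-Seidel sweep: each x_i is overwritten as soon as its new value
    is computed, so later elements in the same sweep already use it."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]   # overwrite the old value in place
    return x
```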
3.1.2.3 Successive over-relaxation
Successive over-relaxation (SOR) is a numerical method originally used to speed up the convergence of the Gauss-Seidel method [18, 19]. The key point of SOR is the introduction of the relaxation factor ω and the correction

    Δx = x~^(k+1) - x^(k)

where x~^(k+1) is the value produced by the underlying iteration. Thus the general iteration can be expressed as

    x^(k+1) = x^(k) + ω (x~^(k+1) - x^(k)) = (1 - ω) x^(k) + ω x~^(k+1)

where the right-hand side x~^(k+1) is calculated according to the underlying iteration instead of the normal iteration expressions.

For the Gauss-Seidel method, the SOR expression is

    x_i^(k+1) = (1 - ω) x_i^(k)
              + (ω / a(i,i)) ( b_i - sum over j < i of a(i,j) x_j^(k+1)
                                   - sum over j > i of a(i,j) x_j^(k) )

Similarly, the iteration expression based on the Jacobi method is

    x_i^(k+1) = (1 - ω) x_i^(k)
              + (ω / a(i,i)) ( b_i - sum over j != i of a(i,j) x_j^(k) )

If the coefficient matrix is positive definite, the SOR iteration is convergent when 0 < ω < 2 [20].
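A sketch of SOR built on the Gauss-Seidel sweep (illustrative Python; the default value of ω is an arbitrary example):

```python
def sor(A, b, omega=1.2, iters=100):
    """SOR on top of Gauss-Seidel: blend the old value of x_i with the newly
    computed Gauss-Seidel value, weighted by the relaxation factor omega."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs = (b[i] - s) / A[i][i]              # Gauss-Seidel candidate value
            x[i] = (1.0 - omega) * x[i] + omega * gs
    return x
```

With omega = 1 this reduces exactly to the Gauss-Seidel method.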
3.1.2.4 Conjugate Gradient (CG) and Preconditioned Conjugate Gradient (PCG) Method
The Conjugate Gradient method is an effective method for symmetric positive
definite systems as shown in [21].
The method proceeds by generating vector sequences of iterates (i.e.,
successive approximations to the solution), residuals corresponding to the
iterates and search directions used in updating the iterates and residuals[22].
Under most circumstances, CG is used in combination with some kind of preconditioning [23]. The system is implicitly multiplied by an approximation M^-1 of A^-1, where M is normally constructed to be an approximation of A that is easier to solve with; this reduces the condition number of the coefficient matrix. Jacobi preconditioning is usually used, in which M is a diagonal matrix containing the diagonal elements of the matrix A.
The algorithm of the Preconditioned Conjugate Gradient method is given in [24].
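A standard form of the Jacobi-preconditioned CG algorithm can be sketched as follows (illustrative Python; this follows the textbook PCG recurrence, not any particular listing from [24]):

```python
def pcg(A, b, tol=1e-10, max_iter=200):
    """Preconditioned Conjugate Gradient with a Jacobi (diagonal)
    preconditioner M = diag(A). A must be symmetric positive definite."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # residual r = b - Ax, with x = 0
    z = [r[i] / A[i][i] for i in range(n)]     # z = M^-1 r (diagonal solve)
    p = z[:]                                   # initial search direction
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]    # update the iterate
        r = [r[i] - alpha * Ap[i] for i in range(n)]   # update the residual
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [r[i] / A[i][i] for i in range(n)]         # apply preconditioner
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]     # new search direction
        rz = rz_new
    return x
```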
3.2 Overview of Parallel Computation
Parallel computing is a kind of processing which uses diverse computational resources to deal with a numerical problem [25]. There are two different aspects of parallel techniques: in the time domain, for example pipelining, and in the spatial domain, for example domain decomposition. The spatial-domain parallelisation seems more widely accepted as the definition of parallel computation. Speeding up matrix-related calculations is the main point of focus for both the FEM and the DEM in this research.
3.2.1 Matrix Partition
To deal with matrix-related issues, the first step is to divide the whole problem, normally the coefficient matrix A, into several sub-matrices to be used in the following procedures. The two common ways to do so are striped partitioning, as shown in [Figure 4] and [Figure 5], and checkerboard partitioning, as shown in [Figure 6].
Figure 4 Striped partitioning examples of 16x16 matrixes
Figure 5 Striped row-major mapping of a 27x27 matrix on 3 processes
Figure 6 Checkerboard partitioning examples of a 16x16 matrix
3.2.2 Matrix Multiplication
Let C = A x B, where A, B and C are n x n matrices distributed over p processes. One of the simplest parallel matrix multiplication methods is described in [Figure 7].
Figure 7 Example of the simple matrix multiplication
In the pseudo-code for this scheme, each process holds one row stripe of A and one row stripe of B, and the stripe of B is shifted one process forward per loop iteration [26].
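A software simulation of this scheme can be sketched as follows (illustrative Python; in the real design the per-process loops would run concurrently in hardware):

```python
def striped_matmul(A, B, p):
    """Simulate the simple parallel algorithm: A and B are partitioned into
    p row stripes; each 'process' owns one stripe of A and one stripe of B,
    and the B stripes are shifted one process forward per loop iteration."""
    n = len(A)
    w = n // p                          # stripe width (assumes p divides n)
    C = [[0.0] * n for _ in range(n)]
    stripes = list(range(p))            # which B stripe each process holds
    for _ in range(p):
        for proc in range(p):           # in hardware these run concurrently
            sb = stripes[proc]          # index of the B stripe currently held
            for i in range(proc * w, (proc + 1) * w):
                for k in range(sb * w, (sb + 1) * w):
                    for j in range(n):
                        C[i][j] += A[i][k] * B[k][j]
        stripes = stripes[1:] + stripes[:1]   # shift stripes one process forward
    return C
```

After p shifts every process has seen every stripe of B, so the full product is accumulated.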
There are also more sophisticated schemes for parallel matrix multiplication, such as Cannon's method [27], Fox's method [28], the DNS method [29] and the systolic method [30].
3.3 Parallel Methods for Equation Solutions
Classic iterative matrix equation solution methods, such as Jacobi and Gauss-Seidel, can also be modified to run in parallel.
3.3.1 Jacobi
From the Jacobi iteration, let H = -D^-1 (L + U) and g = D^-1 b; then the basic process for each iteration is one matrix-vector multiplication followed by a vector addition:

    x^(k+1) = H x^(k) + g

The parallel algorithm of the Jacobi method follows directly by using the parallel matrix-vector multiplication described above.
3.3.2 Gauss-Seidel
The Gauss-Seidel method is a scheme which calculates each element one by one. It is a strictly sequential process, in that the calculation of each new element needs the newest results of all the former elements.

Nevertheless, with the splitting A = D + L + U, the iteration can be rearranged so that a parallel algorithm for the Gauss-Seidel method can be formulated [31].
The matrix-vector multiplications involved are performed using the parallel technique described in section 3.2.2.
3.3.3 Successive over-relaxation
The parallel version of SOR depends on the method used to generate the SOR expression, as discussed in section 3.1.2.3. Generally speaking, the parallel methods mentioned in sections 3.3.1 and 3.3.2 can be used to compute the underlying iterate on the right-hand side while the calculation of (1 - ω) x^(k) runs simultaneously.
3.3.4 CG and PCG
If the matrix A or the preconditioner M is not very sparse, most of the work is done in the matrix-vector products and in solving Mz = r; this is where parallelism is most beneficial. Rewriting the preconditioner step accordingly, a parallel version of PCG is given in [24]; all compute-intensive operations (inner products, vector updates, matrix-vector products and the preconditioner solve) can be done in parallel.
3.4 Summary and Outlook
In this section, several classical methods for the solution of linear matrix equations have been discussed, as well as their parallel forms. There is another group of solution schemes which concentrate on the local stiffness matrix generated in the FEM for each element, instead of dealing with the whole global stiffness matrix. This could increase the efficiency of the calculation of the classical methods, especially for the FEM. More research will be done on these element-by-element methods.
4. Case Studies
There are some FPGA-based FEM applications in different fields [32, 33]. These efforts provide common steps to follow when dealing with FEM applications.
4.1 Beam model
The beam is one of the simplest models among all FEM schemes. It has been chosen as the first problem to work on, in order to build the common structure for future work.
Based on the principles introduced in 2.1.1, the work was divided into two parts, software and hardware. The software version was written using MATLAB m-files, which provide convenient matrix calculation functions. The whole work includes four stages: Data Initialization, Equation Matrix Solution, Position Calculation and Result Plotting. The most important of these is Equation Matrix Solution. In the first instance, the LDL^T scheme was used, which can be seen as an extension of the LU method but is only suitable for symmetric matrices; it was therefore considered the best fit for the FEM. The expressions can be found in section 3.1.1.3. The only problem with the LDL^T scheme is its sequential nature, which could be complex and resource-costly when moved to hardware. Another attempt to solve the equations used the Jacobi iterative method. This scheme is widely used on microprocessors because
it does not require many resources and has good efficiency. Some attempts have also already been made on the FPGA platform [34]. In any case, the software part is only the first stage, to make sure that the whole design runs correctly at the arithmetic level and to obtain a result for the given problem to compare with the hardware-generated result [Figure 8].
Figure 8 User Interface of the software for beam model with the left end fixed and right end loaded
The hardware part was written in VHDL generated by System Generator. Following the same structure as the software implementation, the program was organised as shown in [Figure 9].
Figure 9 Four blocks of the Hardware
Parallel techniques were used in the blocks implemented on the FPGA.
As discussed in section three, the Jacobi method was the first considered for the hardware design because of its iterative nature. From the Jacobi iteration equation, the idealized block diagram is shown in [Figure 10]. The first pipeline stage is multiplication: each row of the matrix A feeds into the multiplier leaf nodes, with the vector x at the other input, simultaneously. After the multiplication, the output of each multiplier is routed to an accumulator to obtain the value of the row sum. The third stage in the pipeline is the subtraction from b_i, followed by the last stage, division by a(i,i).
Figure 10 Idealized Jacobi block diagram[35]
In the actual design, division in hardware is normally a time-consuming process, so it should run at the beginning to obtain the reciprocal of each diagonal entry. This value should be stored for later use. That is to say, the last stage of the pipeline can be changed to a simple multiplication, and the division process is hidden behind the three stages described earlier. Another solution to avoid performing division in hardware is to add one more group of input values which contains the quotients from the software. The second solution is the one used in the current design.
Table 1 Design parameters of Jacobi method
Parameter Type Description
integer Input vector length
integer Multiplier latency
integer Adder latency
integer Data path width
integer Reduction vector length
Simply increasing the size of the binary reduction tree to deal with larger matrices is unwise, as the resources on an FPGA are limited, even though a larger board can normally be found. This way of using hardware resources is inefficient, and when the problem size increases beyond a certain amount the method will definitely fail. A better way to deal with the problem is to separate the whole problem into several sub-matrices, with an upper bound on the largest data path width, whenever the input vector exceeds it [35].
The time to get one value of x_i should be the time cost in each sub binary reduction tree, multiplied by the number of sub-blocks, plus the time for the subtraction from b_i and the addition of the partial results, plus the time cost of the reduction tree that combines all the results generated from the sub-blocks. To this must be added the time cost of testing the condition of convergence. It is clear that when the data path width and the reduction vector length are powers of two, and the input vector length is a multiple of the data path width, the whole system has the highest efficiency.
Using the 32-bit fixed-point core in System Generator and the RC2000-V2 board (Virtex-II xc2v6000-4ff1152), the time and resource costs are listed below.
Table 2 Available resource on Virtex-II xc2v6000 board[36]
Table 3 Core statistics

Core     | Latency (clock cycles) | Slices | FFs  | LUTs
Multiply |                        | 587    | 1104 | 1119
Add      |                        | 17     | 0    | 33
For a problem of this size, the total resource cost is BRAMs = 73 and Slices = 10642.
Table 4 FPGA resource utilization

Slices | %    | BRAMs | %
10642  | 31.5 | 73    | 50.7
The time cost of this problem per iteration should be:

    336    2592    20544
    559    4548    38805
The time cost of the Positioning part of the program was 28 clock cycles per element, and the hardware cost was 15604 Slices, 28420 FFs and 29354 LUTs, as discussed in the three-month report [37].
With a problem size of n for an n x n coefficient matrix of single-precision elements, the Jacobi iteration solution in software costs n divisions, n(n - 1) multiplications and n(n - 1) additions per iteration according to the iteration formula, and the test of the condition of convergence costs a further n additions and n multiplications. Thus, the time cost of the software calculation grows roughly quadratically with the problem size.
For hardware, once the structure is defined, the time cost per iteration of a given design is fixed. When the whole system is fully loaded, it reaches its peak efficiency point.
Figure 11 Time Costing Comparison between Hardware and Software
4.2 Space Frame Model
As discussed in section two, the space frame model is much more complicated than the beam model. The problem cannot be solved as simply as "pump the data in and wait for the result", as it was for the beam model. The work on the space frame model was divided into two stages: structure design and hardware generation. The work done so far has been limited to the design stage for the hardware.
4.2.1 Optimization and balance
In the structure design stage, many points should be considered in order to find a solution which can generate the result of the problem in a reasonable time with an acceptable computational resource cost.
The input data normally contain these components:
(1) Size of problem. The size of the space frame problem can be defined by six parameters: the number of elements (nelem), the number of points (npoin), the number of outputs (noutp), the number of nodes (nnode), the number of dimensions (ndime) and the degrees of freedom per node (ndofn).
(2) Element connectivity Matrix lnods[nnode][nelem]. This matrix describes
how those elements link together.
(3) Matrix nboun[ndofn][npoin]. This matrix contains the information about the
boundary conditions which shows the property of each nodal point on the
end of elements.
(4) Matrix presc[ndofn][npoin]. This matrix contains the value of the force or
non-zero boundary condition applied on each nodal point.
(5) Matrix coord[ndime][npoin]. As its name shows, the number in each row of
the matrix gives the coordinates of each nodal point.
(6) Vector d[7]. This vector contains the material parameters of the element.
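For illustration, these inputs can be gathered into one container. The following sketch in Python uses the array names given above (the actual implementation is in ANSI C):

```python
from dataclasses import dataclass, field

@dataclass
class SpaceFrameInput:
    """Input data for the space frame problem, mirroring the report's arrays."""
    nelem: int        # number of elements
    npoin: int        # number of nodal points
    noutp: int        # number of outputs
    nnode: int        # nodes per element
    ndime: int        # number of dimensions
    ndofn: int        # degrees of freedom per node
    lnods: list       # [nnode][nelem] element connectivity
    nboun: list       # [ndofn][npoin] boundary condition flags
    presc: list       # [ndofn][npoin] prescribed forces / non-zero BCs
    coord: list       # [ndime][npoin] nodal coordinates
    d: list = field(default_factory=lambda: [0.0] * 7)  # material parameters
```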
4.2.1.1 Software or Hardware
The whole design follows, in general, the basic structure suggested in
[Figure 9], but there is a need to clarify the boundary between the software
and hardware parts of the work. After reading in all the given values
mentioned above, the procedures used are shown in [Figure 12].
Figure 12 Process to calculate space frame element
(1) Process the boundary conditions, where "0" denotes free ends, "1" fixed
ends and "-1" tied nodes;
(2) Create the coordinate array for each element;
(3) Generate the local stiffness matrix for each element;
(4) Assemble the global stiffness matrix;
(5) Generate the force vector;
(6) Calculate the final result.
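Procedure (1) can be sketched as an equation-numbering pass over the nboun table. This is a hypothetical illustration: the function name and the convention of marking tied nodes for a later resolution pass are assumptions of this sketch; only the codes 0, 1 and -1 come from the list above.

```c
/* Boundary-condition codes from step (1):
 * 0 = free end, 1 = fixed end, -1 = tied node. */
enum { BC_FREE = 0, BC_FIXED = 1, BC_TIED = -1 };

/* Assign global equation numbers and return the number of equations.
 * eqnum has the same ndofn x npoin row-major layout as nboun; free
 * DOFs get numbers 1..neq, fixed DOFs get 0, and tied DOFs are marked
 * negative so a later pass can link them to their partner node. */
int number_equations(const int *nboun, int *eqnum, int ndofn, int npoin)
{
    int neq = 0;
    for (int p = 0; p < npoin; ++p) {
        for (int d = 0; d < ndofn; ++d) {
            int k = d * npoin + p;               /* row-major [ndofn][npoin] */
            switch (nboun[k]) {
            case BC_FREE:  eqnum[k] = ++neq; break;  /* new equation      */
            case BC_FIXED: eqnum[k] = 0;     break;  /* no equation       */
            case BC_TIED:  eqnum[k] = -1;    break;  /* resolve later     */
            }
        }
    }
    return neq;
}
```

Because each table entry is visited exactly once in a fixed order, this step is inherently sequential, which is why it sits naturally on the PC side.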
Generally, the software part, which was written in ANSI C, controls the data
input and output together with some simple processing, reflecting its
sequential nature; a PC can execute such instructions at a high clock
frequency. Looking back at [Figure 12], procedure (1), which generates the
new boundary-condition matrix, is a sequential process that is more suitable
for the PC side. Similarly, procedures (2) and (5) should also run on the PC
side. Procedure (3) is a repeatable process that can be sped up by sharing
the work among multiple instances, so it is more suitable for the FPGA side.
Procedure (6), the solution of the matrix equations, should definitely be
placed on the FPGA side as well. Communication between the PC and the
FPGA is so time consuming that moving procedure (4) back to the PC would
be unreasonable. From the above, we can safely conclude that the red zone
in [Figure 12] marks the extent of the hardware process for the space frame
model.
4.2.1.2 Data storage
The key issue of data storage is to find the balance between time
consumption and resource cost. Normally, a highly compressed storage
strategy results in complex access procedures, which can cost more time. It
can therefore be seen as a compromise between time and storage.
In common structural computing problems, the total stiffness matrix is
normally quite big, so storing it costs a large amount of memory. There are
several ways to save these data. The easiest is the full-matrix method, which
saves every element of the stiffness matrix in the processor's memory. This
method is wasteful under most circumstances and is rarely used in practical
cases.
Two storage schemes are frequently used in program design, especially for
sparse matrices: two-dimensional constant-bandwidth storage and
one-dimensional variable-bandwidth storage.
Based on the symmetry of the total stiffness matrix, only half of the data in the
matrix need to be saved. Furthermore, the non-zero elements in the matrix
are usually concentrated in the area near the diagonal. Thus, those elements
can be stored in a matrix B of size n x b, where n equals the dimension of the
global stiffness matrix K and b is the half-bandwidth. Back to the space frame
element: there are six degrees of freedom per node, and when the largest
difference between the equation numbers within any single frame element is
d, the half-bandwidth is b = d + 1 (equivalently b = 6(m + 1), where m is the
largest node-number difference within an element).
The matchup between the original stiffness matrix K and the modified new
matrix B is as follows: an element K[i][j] of matrix K, with i <= j <= i + b - 1, is
stored as element B[i][j - i + 1] of matrix B. The ratio of the memory cost of
the band scheme to that of the full matrix in the space frame method is
therefore b/n.
It is clear that, to gain a highly efficient storage result, the difference between
the equation numbers within each frame element should be small.
The one-dimensional variable-bandwidth scheme is much more complex than
the two-dimensional constant-bandwidth method and will not be discussed
further here.
The choice between the full-matrix method and the two-dimensional
constant-bandwidth one depends on the size of the problem. If the problem is
small enough for the memory space to hold the whole process, the full-matrix
storage method provides better performance in terms of time. Otherwise, if
memory space becomes the bottleneck, the band storage method is the
better choice.
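The index mapping of the two-dimensional constant-bandwidth scheme can be sketched as follows. This is a minimal illustration using 0-based C indices, so K[i][j] maps to B[i][j - i] rather than the 1-based B[i][j - i + 1]; the function names are hypothetical.

```c
/* Constant-bandwidth storage of a symmetric n x n stiffness matrix K
 * with half-bandwidth b: only the entries K[i][j] with i <= j <= i + b - 1
 * are kept, in a flat n x b array B, so memory drops from n*n to n*b
 * (a ratio of b/n).  0-based: K[i][j] lives at B[i*b + (j - i)]. */

double band_get(const double *B, int b, int i, int j)
{
    if (j < i) { int t = i; i = j; j = t; }   /* symmetry: K[i][j] = K[j][i] */
    if (j - i >= b) return 0.0;               /* outside the band: zero     */
    return B[i * b + (j - i)];
}

void band_set(double *B, int b, int i, int j, double v)
{
    if (j < i) { int t = i; i = j; j = t; }   /* store upper triangle only  */
    if (j - i < b) B[i * b + (j - i)] = v;
}
```

The access functions show the trade-off described above: the band scheme saves memory but every element access pays for an index transformation and a symmetry test.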
4.2.2 Design for parallel solution of matrix equations
The Jacobi method often converges rather slowly compared to more
sophisticated methods. As discussed in section three, the SOR method
should give a better speed-up. In this design, successive over-relaxation of
Jacobi (JSR) is used; the JSR algorithm requires only a slight change from
[Figure 10].
The main problem for an efficient JSR is finding a proper relaxation factor ω.
This will be done in future work.
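The JSR update can be sketched as follows. This is a minimal dense-matrix illustration, not the hardware design itself: the function names, the dense row-major storage and the convergence test are assumptions of this sketch, and the relaxation factor w is the parameter whose tuning is left as future work.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* One JSR (over-relaxed Jacobi) sweep for A x = f:
 *   x_new[i] = (1 - w) * x[i] + (w / A[i][i]) * (f[i] - sum_{j != i} A[i][j] * x[j])
 * Every x_new[i] depends only on the previous x, so the n row updates
 * are independent, which is what maps onto parallel hardware instances. */
void jsr_sweep(const double *A, const double *f, const double *x,
               double *x_new, int n, double w)
{
    for (int i = 0; i < n; ++i) {
        double s = f[i];
        for (int j = 0; j < n; ++j)
            if (j != i) s -= A[i * n + j] * x[j];
        x_new[i] = (1.0 - w) * x[i] + w * s / A[i * n + i];
    }
}

/* Iterate until the largest update falls below tol or maxit is reached;
 * returns the number of sweeps performed. */
int jsr_solve(const double *A, const double *f, double *x,
              int n, double w, double tol, int maxit)
{
    double *x_new = malloc((size_t)n * sizeof *x_new);
    int it;
    for (it = 0; it < maxit; ++it) {
        jsr_sweep(A, f, x, x_new, n, w);
        double diff = 0.0;
        for (int i = 0; i < n; ++i) {
            double d = fabs(x_new[i] - x[i]);
            if (d > diff) diff = d;
        }
        memcpy(x, x_new, (size_t)n * sizeof *x);
        if (diff < tol) { ++it; break; }
    }
    free(x_new);
    return it;
}
```

With w = 1 this reduces to plain Jacobi; the choice of w > 1 (over-relaxation) is exactly the open tuning problem mentioned above.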
5. Conclusion
5.1 Summary of the report
This nine-month progress report has presented a general overview of the
combined finite-discrete element method, then reviewed the fundamental
principles of both the finite element method and parallel computing. These
background studies have resulted in a fundamental structure for dealing with
FEM problems. Moreover, two case studies on finite element models have
been introduced.
5.2 Work plan for next nine months
The following points summarise the suggested work for the next period
(months 10 to 18):
Continue the current case study on the space frame element for FEM. The
hardware programming is still under way, and the more detailed analysis of
the time and computing resource costs mentioned in this report should be
performed at a numerical level.
Explore and adapt methods for efficient data transfer management between
the PC and the FPGA. The data transfer is time consuming in comparison
with the actual calculation. As the size of the problem grows, it becomes
impossible to store all the required data on the hardware at the beginning, so
control of the data transfer management must be taken into account in the
whole design.
Background study of the discrete element method. The work done so far has
mostly focused on the finite element method, so more attention should be
given to the DEM in the following work.
Case study on the discrete element method.
A Gantt chart in [Figure 13] demonstrates the time plan for each aspect.
5.3 Outline plan to PhD submission
If the proposed work is completed smoothly, the expected date for submission
of the final thesis will be in the period between September 2010 and January
2011.
5.4 Publication plan
There are two related conferences under consideration for my second-year
study:
1) International Conference on Field Programmable Logic and Applications
2009;
2) International Conference on Field-Programmable Technology 2009.
The article will focus on the feasibility of the real-time solution of FDEM on
FPGAs, based on several small case studies. It will also cover the
comparison of time efficiency between FPGAs and conventional
supercomputers on FDEM problems.
Figure 13 Gantt chart for 10-18 month plan
Appendix
Reference