Implementation and Optimization of FDTD Kernels by Using Cache-Aware Time-Skewing Algorithms
IMPLEMENTATION AND OPTIMIZATION OF FDTD KERNELS BY USING CACHE-AWARE TIME-SKEWING ALGORITHMS
THESIS PRESENTATION
SERHAN OZBEY, WARSAW UNIVERSITY OF TECHNOLOGY, INSTITUTE OF TELECOMMUNICATIONS, 16/03/2017
ABSTRACT
The main goal of this thesis was to implement and optimize cache-aware time-skewing algorithms for FDTD kernels in order to reduce cache misses and processor idle time.
Electromagnetic simulations require large-scale discretization of space and correspondingly heavy computation.
An efficient memory access pattern is therefore essential.
A naive implementation of the FDTD method is a kernel of cascaded loops that reads and writes data in memory to calculate the EM fields.
Processor idle time can be reduced by exploiting the data dependencies and locality of the FDTD kernel and making better use of the memory hierarchy.
FDTD execution can take a long time if the cascaded loops are not ordered to use the data dependencies efficiently.
This idle time can be reduced by skewing and blocking the time and space domains, so that loop iterations follow the data dependencies and make better use of the fast CPU cache memories.
TOPICS
1. INTRODUCTION
2. LITERATURE REVIEW
3. METHODOLOGY
4. RESULTS AND DISCUSSION
5. CONCLUSIONS
INTRODUCTION
Sustainable and reliable telecommunication networks demand efficient and durable network components. This is achieved by modelling and producing devices that cope well with the electromagnetic disturbances that affect the performance of such components.
Factors such as electromagnetic radiation and scattering should be accounted for through electromagnetic modelling, which simulates how devices interact with natural conditions and with the materials present in their environment.
INTRODUCTION
Computational electromagnetics (electromagnetic modelling) is the process of modelling the interaction of electromagnetic fields with physical objects and the environment. Maxwell's equations are solved to evaluate the electric and magnetic fields under the given boundary conditions and constitutive relations.
Using computationally efficient approximations to Maxwell's equations, it is used to calculate:
antenna performance
electromagnetic compatibility
radar cross section
electromagnetic wave propagation when not in free space
INTRODUCTION
Computational electromagnetics has been the answer for electromagnetic simulation using the latest available technology. Many methods now exist in the field, such as solvers for the integral form of Maxwell's equations like MoM, and solvers for the differential form like FEM and FDTD.
To achieve high detail and accuracy, these solvers need a very fine discretization of space and time.
This means memory must be used efficiently, exchanging spatial and temporal data quickly while the field values are calculated from Maxwell's equations until the end of the simulated time.
INTRODUCTION
• FDTD, a numerical analysis technique widely used in computational electromagnetics, belongs to the general class of grid-based differential numerical modelling methods. The time-dependent Maxwell's equations (in partial differential form) are discretized using central-difference approximations to the space and time partial derivatives.
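As an illustration of the central-difference discretization, the textbook lossless 1D case can be written out. This is a sketch under stated assumptions, not necessarily the formulation used in the thesis: the field orientation ($E_z$, $H_y$ propagating along $x$), the staggered half-step indexing, and the symbols $\varepsilon$, $\mu$, $\Delta x$, $\Delta t$ are all assumed here. Signs and constants change with the chosen orientation and with how coefficients are folded into the update.

```latex
% Continuous 1D curl equations (assumed orientation: E_z, H_y, propagation along x)
\frac{\partial E_z}{\partial t} = \frac{1}{\varepsilon}\,\frac{\partial H_y}{\partial x},
\qquad
\frac{\partial H_y}{\partial t} = \frac{1}{\mu}\,\frac{\partial E_z}{\partial x}

% Central differences on a staggered grid: H is stored at half-integer
% space points and half-integer time steps, E at integer ones.
H_y^{\,n+1/2}\!\left(i+\tfrac12\right)
  = H_y^{\,n-1/2}\!\left(i+\tfrac12\right)
  + \frac{\Delta t}{\mu\,\Delta x}\,\bigl[E_z^{\,n}(i+1) - E_z^{\,n}(i)\bigr]

E_z^{\,n+1}(i)
  = E_z^{\,n}(i)
  + \frac{\Delta t}{\varepsilon\,\Delta x}\,
    \bigl[H_y^{\,n+1/2}\!\left(i+\tfrac12\right) - H_y^{\,n+1/2}\!\left(i-\tfrac12\right)\bigr]
```

Each update uses only the field's own previous value and the two nearest neighbours of the other field, which is the locality that blocking and skewing rely on.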
FDTD METHOD
Solves Maxwell's equations in the time domain.
Each frame (one time iteration of the code) can be saved to build a movie.
A changing electric field at a particular point induces a curling (circulating) magnetic field.
Likewise, a changing magnetic field induces a curling electric field.
This leads to a leapfrog calculation scheme, as shown in the figure on the right-hand side.
FDTD METHOD
for t in 0 to NT-1
    for i in 1 to N-1
        E[i] = k1*E[i] + k2 * ( H[i] - H[i-1] )
    end for
    for i in 1 to N-1
        H[i] += E[i] - E[i+1]
    end for
end for
A naive 1D FDTD algorithm.
It calculates all N field values at each of the NT time steps.
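The pseudocode above can be turned into a small runnable sketch. The function name `naive_fdtd_1d`, the use of plain Python lists, and the array layout are assumptions made for illustration: both arrays have length N+1, and `E[0]`, `E[N]` and `H[0]` are treated as fixed boundary values that are never updated.

```python
def naive_fdtd_1d(E, H, k1, k2, nt):
    """Run nt time steps of the naive 1D kernel in place.

    Assumption: len(E) == len(H) == N + 1; E[0], E[N] and H[0]
    are boundary values and are never written.
    """
    n = len(E) - 1
    for _ in range(nt):
        # Sweep 1: update every interior E from the neighbouring H values.
        for i in range(1, n):
            E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1])
        # Sweep 2: update every interior H from the freshly updated E values.
        for i in range(1, n):
            H[i] += E[i] - E[i + 1]
    return E, H


E, H = naive_fdtd_1d([0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], 1.0, 0.5, 1)
print(E, H)
```

Each time step sweeps both arrays from start to end, so for large N the data loaded early in a sweep has typically left the cache by the time the next time step needs it.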
INTRODUCTION
• FDTD remains a challenging task for the computers and devices running it because of its high demands on computational power and memory bandwidth.
• Programs cannot fully benefit from the processor improvements that follow Moore's Law, as processors spend more than 80% of their time waiting for data to process or to arrive from main memory.
INTRODUCTION
• Stencil codes such as FDTD kernels include cascaded loops that force processors to perform many memory reads and writes. This is because problem sizes are generally too big to fit inside the largest cache of the processor.
• A distinguishing feature of stencil codes is that each data element is related to its neighbours.
• In FDTD kernels, this relationship is between the E-fields and H-fields. As a result of Maxwell's equations, each element in space and time depends on nearby elements.
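A rough back-of-the-envelope check shows why such problems do not fit in cache. The numbers are assumptions chosen for illustration only: two field arrays (E and H), 8-byte double-precision values, a grid of ten million cells, and a hypothetical 8 MiB last-level cache.

```python
def working_set_bytes(cells, fields=2, bytes_per_value=8):
    """Bytes touched by one full sweep over all field arrays."""
    return cells * fields * bytes_per_value


def max_cells_in_cache(cache_bytes, fields=2, bytes_per_value=8):
    """Largest grid whose field arrays fit in the cache at once."""
    return cache_bytes // (fields * bytes_per_value)


ASSUMED_CACHE_BYTES = 8 * 1024 * 1024   # hypothetical 8 MiB last-level cache
cells = 10_000_000                      # hypothetical 1D grid size

print(working_set_bytes(cells))                        # 160000000
print(working_set_bytes(cells) > ASSUMED_CACHE_BYTES)  # True
print(max_cells_in_cache(ASSUMED_CACHE_BYTES))         # 524288
```

Under these assumptions one sweep touches about 160 MB, roughly twenty times the cache, so every time step reloads the fields from main memory unless the iteration order is changed.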
A data dependency graph, showing how elements at different points in space and time depend on one another's computations, as given by the FDTD formula.
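The dependencies can also be listed programmatically. The sketch below reads them off the naive 1D kernel shown earlier in the slides; the function names and the `(field, time step, index)` tuple encoding are hypothetical, and grid boundaries are ignored for simplicity.

```python
def deps_E(t, i):
    """Values read when E[i] is updated at time step t."""
    return [("E", t - 1, i), ("H", t - 1, i), ("H", t - 1, i - 1)]


def deps_H(t, i):
    """Values read when H[i] is updated at time step t (after the E sweep)."""
    return [("H", t - 1, i), ("E", t, i), ("E", t, i + 1)]


def e_cells_at(t, i, back):
    """E indices at time t - back that E[i] at time t transitively depends on."""
    seen, stack = set(), [("E", t, i)]
    while stack:
        node = stack.pop()
        if node in seen or node[1] < t - back:
            continue  # already visited, or older than the window of interest
        seen.add(node)
        field, tt, ii = node
        stack.extend(deps_E(tt, ii) if field == "E" else deps_H(tt, ii))
    return sorted(ii for field, tt, ii in seen if field == "E" and tt == t - back)


print(e_cells_at(5, 10, 1))  # [9, 10, 11]
print(e_cells_at(5, 10, 2))  # [8, 9, 10, 11, 12]
```

The set of influencing cells widens by one cell on each side per time step. That slope of one cell per step is what a skewed iteration space has to respect.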
As programs cannot fully benefit from the processor improvements that follow Moore's Law, one factor that is becoming more and more important is the algorithm's memory performance: how well it takes advantage of the memory hierarchy.
Memory access speed is very important in modern microprocessors. For this reason, our work focuses on cache memory hierarchies, making the most of effective cache replacement methods to
reduce cache miss rates
improve locality of data
make fast data access possible between processor and memory via effective cache usage.
INTRODUCTION
Cache-aware time-skewing algorithms take advantage of explicitly specified details of the processor they run on. Because the algorithm stores related data together in the same block, as mentioned earlier, the processor's memory page size and cache line size should be built into the algorithm.
This is vital, because the algorithm exploits the processor's cache behaviour: its main objective is to minimize the movement of memory pages in the processor's cache.
The objectives focus on loop tiling, time skewing, and reducing CPU stalls through data locality optimizations. A significant rise in performance is expected as a result of these optimization steps.
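To make loop tiling and time skewing concrete, here is a minimal sketch for the 1D kernel. It is one common way to skew and tile such a loop nest, not necessarily the scheme implemented in the thesis. All names, the block sizes `bt` (time steps per tile) and `bs` (skewed cells per tile), and the array layout (length N+1 with fixed `E[0]`, `E[N]`, `H[0]`) are assumptions. In a cache-aware version `bs` and `bt` would be chosen from the cache size, which is not done here.

```python
def naive_fdtd_1d(E, H, k1, k2, nt):
    """Reference kernel: two full sweeps per time step."""
    n = len(E) - 1
    for _ in range(nt):
        for i in range(1, n):
            E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1])
        for i in range(1, n):
            H[i] += E[i] - E[i + 1]
    return E, H


def fused_step(E, H, k1, k2, i, n):
    """Update E[i], then H[i-1], which needs E[i-1] and E[i] of this step."""
    if i < n:
        E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1])
    if i > 1:
        H[i - 1] += E[i - 1] - E[i]


def skewed_fdtd_1d(E, H, k1, k2, nt, bt, bs):
    """Time-skewed, tiled variant: each tile advances bs cells through up to
    bt time steps before moving on, so the data is reused while still cached."""
    n = len(E) - 1
    for tt in range(0, nt, bt):               # blocks of time steps
        steps = min(bt, nt - tt)
        # Skewed coordinate ip = i + dt, so tiles lean left as time advances.
        for ii in range(1, n + steps, bs):    # blocks in skewed space
            for dt in range(steps):
                lo = max(ii, 1 + dt)
                hi = min(ii + bs, n + dt + 1)
                for ip in range(lo, hi):
                    fused_step(E, H, k1, k2, ip - dt, n)
    return E, H


start = [((7 * i) % 5) / 5.0 for i in range(33)]
ref = naive_fdtd_1d(list(start), [0.0] * 33, 0.9, 0.5, 10)
out = skewed_fdtd_1d(list(start), [0.0] * 33, 0.9, 0.5, 10, bt=4, bs=8)
print(ref == out)
```

The skew by one cell per time step matches the slope of the data dependencies, which is why the tiles can be processed left to right without reading a value that has not been computed yet or overwriting one that is still needed. The result agrees with the naive kernel because every element sees the same sequence of arithmetic operations, only in a different global order.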
INTRODUCTION
FDTD solvers demand expensive hardware with parallelism features to run smoothly and accurately.
Our objective was to extend previous research that proposed alternatives to such hardware-based solutions.
The main objective of this thesis is to achieve better reliability, cache usage and execution times for FDTD codes, so that given problems run smoothly and accurately, while also taking into account the physics and engineering aspects of the problem, which previous research has lacked.
As a contribution, previously known work on code optimizations such as loop blocking, cache-aware algorithms and time-skewing techniques is extended and presented in detail, rather than only implicitly.
LITERATURE REVIEW
FDTD method
References for understanding the problem and implementation of theory to code
Changes and proposals for new FDTD techniques
Solving FDTD problems for extreme conditions and specific problems
Photonics, biomedicine
Solving Schrödinger equations with a generalized FDTD approach
Different software implementations, such as V2D.
LITERATURE REVIEW
Memory hierarchy and the "memory wall"
Referring to important concepts of memory management and optimizations such as
Memory hierarchy
‘Memory wall’ term
Von Neumann bottleneck
Roofline model
Memory mountain
LITERATURE REVIEW
Stencil codes and data dependencies
Definition and types of stencils
Approximating problem into stencil code
Methodology of determination of data dependencies
Other terms such as: Parallelism, GPU
Locality optimizations
Understanding the ‘Principle of locality’
Important terms related to locality features of codes (machine balance, computer balance, scalable locality)
Studies of different code optimization algorithms
METHODOLOGY
Research design
Code generation and validation
Dependence and loop iteration analysis
Finding optimal tiling and skewing
Methodological assumptions
RESULTS AND DISCUSSIONS
Summarizing, for both 1D FDTD and 2D FDTD:
Cache profiling
Execution time
Data types and Programming Languages
Compiler optimizations
Future works
CONCLUSIONS
Computational electromagnetics has gained much more importance with the improvements in, and demands of, related technologies such as antenna design, biomedicine and wireless communications.
A good software implementation is a must for a highly memory- and computation-intensive code kernel such as FDTD.
In this thesis, previous work in the literature was extended, and the improvements from software optimizations such as loop blocking, cache-aware algorithms and time-skewing were demonstrated for 1D and 2D FDTD kernels.
CONCLUSIONS
The differences between the naive FDTD codes and the applied algorithms were shown in the results for the 1D and 2D cases.
The results indicate that applying time-skewing algorithms in the way done in this thesis increases the total number of data references but gives a much better cache hit rate than the other codes.
The benefit of time-skewing in terms of cache misses is much more visible in the 2D code.
Run-time graphs and improved L1 and L3 cache miss rates for the 1D and 2D cases were obtained and demonstrated in the results.
Line-by-line cache misses are explained throughout the thesis.