HPC Technology Track: Foundations of Computational Science
Lecture 2
Dr. Greg Wettstein, Ph.D.
Research Support Group Leader, Division of Information Technology
Adjunct Professor, Department of Computer Science
North Dakota State University
What is High Performance Computing?
Definition:
The solution of problems involving high degrees of computational complexity or data analysis which require specialized hardware and software systems.
What is Parallel Computing?
Definition:
A strategy of decreasing the time to solution of a computational problem by carrying out multiple elements of the computation at the same time.
Does HPC imply Parallel Computing?
Typically, but not always. HPC solutions may require specialized systems because of memory and/or I/O performance requirements rather than a need for parallel execution.
Conversely, parallel computing does not necessarily imply high performance computing.
Flynn's Taxonomy: Classification Strategy for Concurrent Execution
SISD: Single Instruction, Single Data
MISD: Multiple Instruction, Single Data
SIMD*: Single Instruction, Multiple Data
MIMD*: Multiple Instruction, Multiple Data
* = Relevant to HPC
SIMD: The Origin of HPC
SIMD is the architectural model at the heart of 'vector processors'.
It was the performance enhancement in the machines at the origin of HPC: the CDC STAR-100 and the Cray-1.
Its utility is predicated on the fact that mathematical operations on vectors and vector spaces are at the heart of linear algebra.
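To make 'operations on vectors' concrete, the element-wise update y = a*x + y (the shape of the BLAS SAXPY routine) can be written as an ordinary C loop. This is a minimal sketch; the function name saxpy and the single-precision types are choices made for illustration. Because no iteration depends on another, a vector processor, or a vectorizing compiler targeting SIMD instructions, can apply the same multiply-add to many elements at once.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
 * Every iteration is independent of every other one, which is exactly the
 * property a vector (SIMD) unit exploits: the same multiply-add is applied
 * to many elements in parallel. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```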
Vector Processing Diagram
[Figure: two vectors of eight elements each (vector length = 8 'words'), combined element by element through parallel mathematical operations (+, -, *, /).]
Current SIMD Examples
Embedded in modern x86 and x86_64 architectures, primarily focused on graphics/signal processing: MMX, PNI, SSE2-4, AVX.
Foundation for the current trend in 'GPGPU computing': the NVIDIA Tesla architecture.
Component of the Larrabee architecture.
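As a small illustration of how these extensions surface to a C programmer, GCC and Clang predefine macros such as __MMX__, __SSE2__, __SSE3__ (PNI), __SSE4_2__, __AVX__ and __AVX2__ according to the target flags (for example -msse4.2 or -march=native). The hypothetical helper below reports the widest one enabled at compile time; it says nothing about the CPU the program later runs on.

```c
/* Report the widest x86 SIMD extension the compiler was allowed to use.
 * These macros are set by GCC/Clang from -m / -march flags; on non-x86
 * targets none of them is defined and the scalar fallback is returned. */
const char *widest_simd_extension(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE3__)
    return "SSE3 (PNI)";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__MMX__)
    return "MMX";
#else
    return "none (scalar only)";
#endif
}
```

Runtime detection is a separate question; GCC, for instance, offers __builtin_cpu_supports("avx2") for that purpose.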
SSE Implementation
[Figure: vector elements packed into two 128-bit XMM registers and combined by parallel operations (100+ as of SSE4); the diagram also marks the stride length.]
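To make the register picture concrete, here is a minimal sketch using SSE2 compiler intrinsics from <emmintrin.h> (_mm_loadu_si128, _mm_add_epi32, _mm_storeu_si128). The function name add_i32 is hypothetical, and 32-bit integer addition is chosen only for illustration. A 128-bit XMM register holds four 32-bit elements, so the stride length is 4; a scalar loop finishes any leftover elements, and is also the whole implementation on targets without SSE2.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: __m128i, _mm_add_epi32, ... */
#endif

/* c[i] = a[i] + b[i] for n 32-bit integers. */
void add_i32(size_t n, const int32_t *a, const int32_t *b, int32_t *c)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Vector loop: 4 elements (one 128-bit XMM register) per iteration.
     * Unaligned loads/stores, so the arrays need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(c + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar tail: the n % 4 leftover elements (or everything without SSE2). */
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

For n = 10 the vector loop executes twice (8 elements) and the scalar tail handles the remaining 2.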
MIMD: Multiple Instruction, Multiple Data
Characterized by multiple execution threads operating on separate data elements.
Threads may operate in shared or disjoint (distributed) memory configurations.
Implementation example: SMP (Symmetric Multi-Processing).
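A minimal sketch of MIMD on a shared-memory (SMP) machine using POSIX threads: two threads run different code (a sum and a maximum search) on different arrays at the same time. The struct and function names are hypothetical; compile with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

/* Two independent tasks: different instruction streams, different data. */
struct sum_task { const int *data; size_t n; long result; };
struct max_task { const double *data; size_t n; double result; };

static void *sum_ints(void *arg)
{
    struct sum_task *t = arg;
    t->result = 0;
    for (size_t i = 0; i < t->n; ++i)
        t->result += t->data[i];
    return NULL;
}

static void *max_doubles(void *arg)
{
    struct max_task *t = arg;
    t->result = t->data[0];            /* assumes n >= 1 */
    for (size_t i = 1; i < t->n; ++i)
        if (t->data[i] > t->result)
            t->result = t->data[i];
    return NULL;
}

/* Run both tasks concurrently; returns 0 on success, 1 if a thread
 * could not be created. Results are read from the structs after join. */
int run_mimd(struct sum_task *s, struct max_task *m)
{
    pthread_t ts, tm;
    if (pthread_create(&ts, NULL, sum_ints, s) != 0)
        return 1;
    if (pthread_create(&tm, NULL, max_doubles, m) != 0) {
        pthread_join(ts, NULL);
        return 1;
    }
    pthread_join(ts, NULL);
    pthread_join(tm, NULL);
    return 0;
}
```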
SPMD: The Basis for Modern HPC
SPMD (Single Program, Multiple Data) is defined as multiple processes or threads executing a common program, each at its own point in that program.
It differs from SIMD in that execution does not proceed in lockstep.
Common implementations:
shared memory: OpenMP, Pthreads
distributed memory: MPI
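The SPMD pattern can be sketched with Pthreads: every thread runs the same function, and its rank alone decides which block of the data it works on. NTHREADS, spmd_sum and spmd_body are hypothetical names for this sketch. In an MPI program the rank would instead come from MPI_Comm_rank and the partial results would be combined with a collective such as MPI_Reduce.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4   /* hypothetical thread count for the sketch */

/* Each rank runs the same program text on its own block of the data. */
struct spmd_arg {
    int rank;
    const double *x;
    size_t n;
    double partial;   /* this rank's contribution, read after join */
};

static void *spmd_body(void *p)
{
    struct spmd_arg *a = p;
    /* Block decomposition: rank r owns [r*n/P, (r+1)*n/P). */
    size_t lo = (size_t)a->rank * a->n / NTHREADS;
    size_t hi = (size_t)(a->rank + 1) * a->n / NTHREADS;
    a->partial = 0.0;
    for (size_t i = lo; i < hi; ++i)
        a->partial += a->x[i];
    return NULL;
}

/* Sum x[0..n) with NTHREADS threads executing the same function. */
double spmd_sum(const double *x, size_t n)
{
    pthread_t tid[NTHREADS];
    int started[NTHREADS];
    struct spmd_arg args[NTHREADS];
    for (int r = 0; r < NTHREADS; ++r) {
        args[r] = (struct spmd_arg){ .rank = r, .x = x, .n = n, .partial = 0.0 };
        started[r] = pthread_create(&tid[r], NULL, spmd_body, &args[r]) == 0;
        if (!started[r])
            spmd_body(&args[r]);   /* fallback: run this rank inline */
    }
    double total = 0.0;
    for (int r = 0; r < NTHREADS; ++r) {
        if (started[r])
            pthread_join(tid[r], NULL);
        total += args[r].partial;  /* combine partial results in rank order */
    }
    return total;
}
```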
Characteristics of MD (Multiple Data) Models
MIMD/SPMD requires active participation by the programmer to implement 'orthogonalization'.
SIMD requires active participation by the compiler, with consideration by the programmer, to support orthogonalization.
Orthogonalization defn: the isolation of a problem into discrete elements capable of being independently resolved.
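Once a problem has been orthogonalized into independent elements, those elements still have to be divided among workers. One common choice, assumed here (cyclic or dynamic distributions are alternatives), is contiguous block decomposition; the helper block_range below is hypothetical.

```c
#include <stddef.h>

/* Split n independent work items among p workers as evenly as possible.
 * Worker k (0 <= k < p) receives the half-open range [*begin, *end);
 * the first n % p workers each receive one extra item. */
void block_range(size_t n, size_t p, size_t k, size_t *begin, size_t *end)
{
    size_t base  = n / p;
    size_t extra = n % p;
    *begin = k * base + (k < extra ? k : extra);
    *end   = *begin + base + (k < extra ? 1 : 0);
}
```

Because each range can be resolved without reference to the others, the same helper serves threads, MPI ranks, or loop chunks alike.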
The Real World - A Continuum
Practical programs do not exhibit strict model partitioning.
More pragmatic model is to consider 'dimensions' of parallelism available to a program.
Currently a total of four dimensions of parallelism are exploitable.
Dimensions of Parallelism
First dimension: standard sequential programming with processor-supplied ILP (Instruction Level Parallelism), referred to as 'free' or 'invisible' parallelism.
Second dimension: SIMD or OpenMP loop parallelism, characterized by isolation of the problem into a single system image and primarily supported by the programming language or compiler.
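Second-dimension loop parallelism with OpenMP, sketched as a dot product: the pragma asks the compiler to divide the loop iterations among threads and to combine the per-thread partial sums through a reduction clause. Build with -fopenmp (GCC/Clang); without it the pragma is ignored and the same loop runs serially. The function name dot is hypothetical.

```c
#include <stddef.h>

/* Dot product of x and y with OpenMP loop parallelism.
 * Each thread accumulates a private copy of sum; reduction(+:sum) adds
 * the copies together when the loop ends. The result can differ from the
 * serial one only in floating-point rounding order. */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```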
Dimensions of Parallelism - cont.
Third dimension: two subtypes.
Use of MPI to partition the problem into orthogonal elements; the partitioning is frequently implemented on multiple system images.
MIMD threading on a single system image: separate threads are dispatched to handle separate tasks which can execute asynchronously. A common HPC example is to 'thread' computation and Input/Output (I/O).
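The 'thread computation and I/O' idea can be sketched with a single-slot mailbox guarded by a mutex and a condition variable: the calling thread computes, and a separate I/O thread consumes each result as soon as it is ready, so the next computation overlaps the previous write. The names are hypothetical, the squares stand in for real computation, and the checksum stands in for writing to disk; production code would more likely use double buffering or a deeper queue.

```c
#include <pthread.h>
#include <stdbool.h>

struct mailbox {
    pthread_mutex_t lock;
    pthread_cond_t  changed;   /* signalled whenever full/done changes */
    bool full;                 /* a result is waiting in value */
    bool done;                 /* producer has finished */
    long value;
    long checksum;             /* written only by the I/O thread */
};

static void *io_thread(void *p)
{
    struct mailbox *mb = p;
    pthread_mutex_lock(&mb->lock);
    for (;;) {
        while (!mb->full && !mb->done)
            pthread_cond_wait(&mb->changed, &mb->lock);
        if (!mb->full)              /* done and nothing left to write */
            break;
        mb->checksum += mb->value;  /* stand-in for writing the result */
        mb->full = false;
        pthread_cond_broadcast(&mb->changed);   /* slot is free again */
    }
    pthread_mutex_unlock(&mb->lock);
    return NULL;
}

/* Compute steps results and hand each to the I/O thread.
 * Returns the I/O thread's checksum, or -1 if it could not be started. */
long compute_and_write(int steps)
{
    struct mailbox mb = { .full = false, .done = false, .value = 0, .checksum = 0 };
    pthread_mutex_init(&mb.lock, NULL);
    pthread_cond_init(&mb.changed, NULL);

    pthread_t io;
    if (pthread_create(&io, NULL, io_thread, &mb) != 0) {
        pthread_cond_destroy(&mb.changed);
        pthread_mutex_destroy(&mb.lock);
        return -1;
    }
    for (int i = 1; i <= steps; ++i) {
        long result = (long)i * i;          /* stand-in for real computation */
        pthread_mutex_lock(&mb.lock);
        while (mb.full)                     /* previous result not yet written */
            pthread_cond_wait(&mb.changed, &mb.lock);
        mb.value = result;
        mb.full = true;
        pthread_cond_broadcast(&mb.changed);
        pthread_mutex_unlock(&mb.lock);
    }
    pthread_mutex_lock(&mb.lock);
    mb.done = true;
    pthread_cond_broadcast(&mb.changed);
    pthread_mutex_unlock(&mb.lock);

    pthread_join(io, NULL);                 /* join makes checksum visible */
    pthread_cond_destroy(&mb.changed);
    pthread_mutex_destroy(&mb.lock);
    return mb.checksum;
}
```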
Dimensions of Parallelism - cont.
Fourth dimension: partitioning of the problem into orthogonal elements which can be dispatched to a heterogeneous instruction architecture.
Examples: GPGPU/CUDA, PowerXCell SPU, FPGA.
Depth of Parallelism
A measure of the complexity of the parallelism implemented.
The simplest metric is a count of the programmer-implemented dimensions of parallelism on a single system image.
Example: an MPI implementation with SIMD loop vectorization on each node has a parallelism depth of two.
Parallelism Analysis Example
Process-based MIMD application: Depth = 1
MPI simulation with OpenMP loop vectorization: Depth = 2
MPI partitioning with CUDA PTree offload and SIMD loop vectorization: Depth = 3
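The depth metric is simple enough to compute mechanically. The encoding below is hypothetical: each programmer-implemented technique is tagged with the dimension it exploits, a process-based MIMD application is treated as third-dimension work (an assumption, since the slide does not say), and depth is the number of distinct dimensions used. Dimension 1 (ILP) is supplied by the processor and is therefore never counted.

```c
#include <stddef.h>

/* Hypothetical catalogue of programmer-implemented techniques. */
enum technique {
    T_SIMD_LOOP,       /* SIMD loop vectorization */
    T_OPENMP_LOOP,     /* OpenMP loop parallelism */
    T_MPI,             /* MPI partitioning */
    T_MIMD_THREADS,    /* MIMD threading on one system image */
    T_MIMD_PROCESSES,  /* process-based MIMD (assumed dimension 3) */
    T_OFFLOAD          /* GPGPU/CUDA, PowerXCell SPU, FPGA */
};

/* Dimension of parallelism exploited by each technique. */
static const int dimension_of[] = {
    [T_SIMD_LOOP]      = 2,
    [T_OPENMP_LOOP]    = 2,
    [T_MPI]            = 3,
    [T_MIMD_THREADS]   = 3,
    [T_MIMD_PROCESSES] = 3,
    [T_OFFLOAD]        = 4,
};

/* Depth = number of distinct programmer-implemented dimensions in use. */
int parallelism_depth(const enum technique *used, size_t count)
{
    unsigned seen = 0;                    /* bit d set => dimension d used */
    for (size_t i = 0; i < count; ++i)
        seen |= 1u << dimension_of[used[i]];
    int depth = 0;
    for (; seen != 0; seen &= seen - 1)   /* clear the lowest set bit */
        ++depth;
    return depth;
}
```

With this encoding the three analysis examples above evaluate to depths 1, 2 and 3 respectively.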
Escalation of Complexity
[Figure: complexity escalates from least to most along two axes, Dimension (1 to 4) and Depth (1 to N).]
Architectural decisions must be based on cost-benefit analysis of performance returns.
Exercise
Verify that you have the changeset which adds experimental code for SSE/SIMD-based boolean PTree operators.
Study the class methods implementing the AND and OR operators.
Review and understand how vector and stride length affect the number of times a loop needs to be executed.
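As a starting point for the exercise, here is a hedged sketch of boolean AND/OR over bit vectors with SSE2 intrinsics (_mm_and_si128, _mm_or_si128). It does not reproduce the changeset's class methods; it assumes a hypothetical layout in which a PTree's bits are packed into an array of 64-bit words, so one 128-bit XMM register covers two words per loop trip. The helper trip_count (also hypothetical) shows how the stride length sets the number of loop executions.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Loop trips needed to cover bits bits when each trip handles
 * stride_bits bits (64 for scalar words, 128 for an XMM register).
 * stride_bits must be nonzero. */
size_t trip_count(size_t bits, size_t stride_bits)
{
    return (bits + stride_bits - 1) / stride_bits;   /* ceiling division */
}

/* out = a AND b (use_or == 0) or a OR b (use_or != 0) over nwords words. */
void ptree_bitop(size_t nwords, const uint64_t *a, const uint64_t *b,
                 uint64_t *out, int use_or)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= nwords; i += 2) {       /* 128 bits = 2 words per trip */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i r  = use_or ? _mm_or_si128(va, vb) : _mm_and_si128(va, vb);
        _mm_storeu_si128((__m128i *)(out + i), r);
    }
#endif
    for (; i < nwords; ++i)                 /* scalar tail */
        out[i] = use_or ? (a[i] | b[i]) : (a[i] & b[i]);
}
```

For a 1000-bit vector, trip_count(1000, 64) gives 16 scalar trips, while trip_count(1000, 128) gives 8 with SSE; 256-bit AVX registers would reduce that to 4.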
goto skills_lecture1;