COMP528: Multi-core and Multi-Processor Computing
Dr Michael K Bane, Computer Science, University of Liverpool
[email protected]
Who, What, Where, When?
• Me…
• You…
• Course:
• Why contents
• Timetable
• Labs, Assignments & Exams
• To Follow ASAP
• Practical & Assignment (& Exam) details & deadlines
• Web page: course outline, course logistics, course materials
• Recommended Reading: some updates
• Contact: my office/timetable. For now: [email protected]
Who am I?
• My first computer
• VIC 20
https://aydinstone.wordpress.com/tag/commodore-vic-20/
• My first parallel computer
• INMOS Transputer
http://www.brunel.ac.uk/~eesttti/papers/main.html
• My first super computer
• Cray T3E
• Jobs
• Supporting HPC
• Modelling chemical weather
• Manager: Research Apps
• Energy Efficient Compute Research
• … and now my 1st course at U/Liverpool
• Materials originally by ALEXEI LISITSA
• With minor revisions and a few enhancements
• Once I'm up & running with the systems
• Updated/full contact details
YOU...
• Compsci u/g?
• Language?
• Previous experience…
• Expectations…
Q: have people been assigned labs? (half to each session)
This week's labs will be more about gaining familiarity with the set-up
ASSESSMENT
60% by examination/s
40% by course work (lab assignments with element of theory)
Why do we want "more" or "better"…
• Improving products
• More efficient (ie longer range) electric cars
• More accurate weather forecasts (from when it will rain in my street, to what the weather will be like if I book a vacation next month, to hurricanes' strength and landfall)
• Nuclear arms stockpile: how to manage / monitor others
• Precision (personalised) medicine
• Deep(er) Learning & (better) Artificial Intelligence
• Deeper understanding of the university and all the processes within
• Modelling ever finer details and joining yet more models together dynamically
Generally, we can call the means to achieve these: HIGH PERFORMANCE COMPUTING (HPC)
Course elements
• [whiteboard – why HPC]
• [whiteboard – let's design HPC]
• topics we will cover, in which order
** what elements to consider
hw: cpu, mem (NUMA),
sw: language extensions, new languages
misc: compilers, DSL,
** options to get to HPC
cpu: x86/arm/power, gpu, xeon phi, FPGA, custom ASIC
mem: shared, dist, (v-shared)
• Improving a single chip
• Faster clock
• More ops/cycle
• … and improving memory
• Faster, bigger, better
• ONLY GOES SO FAR
• Using >>1 gets us a LOT further
• multi-core and multi-processor
• parallel computing
Aims
• To provide students with a deep, critical and systematic understanding of key issues and effective solutions for parallel programming for systems with multi-core processors and parallel architectures
• To develop students' appreciation of a variety of approaches to parallel programming, including using MPI and OpenMP
• To develop the students' skills in parallel programming in particular using MPI and OpenMP
• To develop the students' skills in parallelization of ideas, algorithms and of existing serial code.
The reasons for multi-core processors, I
• We want applications to execute faster
• Clock speeds no longer increasing exponentially
[figure: CPU clock speed on a log scale, 1 MHz to 10 GHz, versus year, 1979 to 2011]
The reasons for multi-core processors, II
• Clock speeds of CPUs are stabilizing:
• Excessive power consumption (power proportional to freq^3)
• Heat dissipation
• Overly complicated design
• Memory wall: the increasing gap between processor and memory speeds (cannot feed the cpu quick enough)
• Strategy:
• Limit CPU speed and sophistication
• Put multiple CPUs (“cores”) on a single chip in a socket
• Potential performance: CPU freq * no ops per cycle * #cores per CPU * #CPUs
... and then
• Strategy:
• Limit CPU speed and sophistication
• Put multiple CPUs (“cores”) on a single chip in a socket
• Several (2 or 4) sockets on a node (cf motherboard)
• Connect 10s or 100s or 1000s of nodes…
• Potential performance: CPU freq * no ops per cycle * #cores per CPU * #CPUs per node * #nodes
Parallel computing as it was
Parallel computers are expensive
There are not many parallel computers
Most people do not learn parallel programming
Parallel computing not mainstream
Parallel programming is difficult
Parallel programming environments are inadequate
picture is from Intel Software College materials
Parallel computing recent history
• 2000s-2010s: mass production of
• dual-core CPU,
• quad-core CPU,
• hex-core CPU, 8-core CPU, …
• massively parallel GPU
• PCs are parallel computers
• Development and deployment of large scale parallel architectures
PCs are parallel computers
Everyone has a parallel computer
More people learning parallel programming
Parallel programming considered mainstream
Parallel programming gets easier
Parallel programming environments improve
picture is taken from Intel Software College materials
Parallel computing as it is now
Sequential approach?
• Problem has inherent parallelism
• Programming language cannot express parallelism
• Compiler and/or hardware must find hidden parallelism
• Does not work well in general
Possible parallel approach
• Problem has inherent parallelism
• Programmer has a way to express parallelism
• Compiler translates program for multiple cores or multiple processors (etc)
• We will find out how well this works!
Methodology
• Study problem, sequential program, or code segment
• Look for opportunities for parallelism
• Usually best to start by thinking abstractly about the problem to be solved, not by any current program implementation of a given solution
• Try to keep all processors busy doing useful work
• Processors (cores) could be either placed locally (multicore processors), or connected by local/global networks => huge variety of approaches/methods
(after Intel Software College)
Low level vs High level parallelism
• Low-level constructions (such as raw threads) – very flexible, but require much more care over details
• High-level constructions (such as tasks) – the compiler takes care of many details, yet remains reasonably flexible
• You will practically explore both options
• Dept'al cluster / University HPC / Hartree HPC / maybe others
• CPU & GPU
• FORTRAN or C (plus language extensions); possibly CUDA
RECAP
• We have looked at
• Why any need to consider multi-core and multi-processor programming
• Some elements that may be important in an HPC setting
• Next time in COMP528…
• Terminology: cores, cpus, processors; threads, processes
• How "high performing" is a supercomputer?
Background Information
• MOOC from PRACE
• https://www.futurelearn.com/courses/supercomputing
• Recommendations re HPC / multi-core / multi-processor
• P Pacheco, "An Introduction to Parallel Programming", Morgan Kaufmann [IPP]
• Darryl Gove, "Multicore Application Programming for Windows, Linux, and Oracle Solaris", Addison-Wesley, 2010 [MAP]
• Recommendations re C
• I. Horton, "Beginning C", Berkeley, CA: Apress
• Recommendations re CUDA
• Sanders et al, "CUDA by example: an introduction to general purpose GPU programming"
• My further recommendations [to follow…]