Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL ...
![Page 1: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/1.jpg)
Multicore/Manycore Processors
Joram Benham
April 2, 2012
![Page 2: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/2.jpg)
Outline
- Introduction
- Motivation
- Multicore Processors
  - Overview, CELL
- Advantages of CMPs
  - Throughput, Latency
- Challenges
- Future of Multicore
![Page 3: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/3.jpg)
Introduction
- Multicore processors
  - Several/many cores on the same chip
  - Dual/quad core – two/four cores
- AKA chip multiprocessors (CMPs)
![Page 4: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/4.jpg)
Motivation
![Page 5: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/5.jpg)
Motivation - ILP
- Instruction-Level Parallelism
  - Pipelining – split execution into stages
  - Superscalar – issue multiple instructions each cycle
  - Out-of-order execution
  - Branch prediction
- Take advantage of implicit program parallelism – instruction independence
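To make "instruction independence" concrete, here is a minimal sketch that computes the earliest cycle in which each instruction of a toy program could issue. It assumes unit latency, unlimited issue width, and only true (read-after-write) dependencies; the register names and the `earliest_cycles` helper are hypothetical.

```python
def earliest_cycles(program):
    """Return the earliest issue cycle of each (dest, sources) instruction.

    Assumptions: every instruction takes one cycle, the machine can issue
    any number of instructions per cycle, and only read-after-write
    dependencies matter.
    """
    ready = {}   # register -> first cycle in which its value is available
    cycles = []
    for dest, sources in program:
        cycle = max((ready.get(src, 0) for src in sources), default=0)
        cycles.append(cycle)
        ready[dest] = cycle + 1
    return cycles


# Hypothetical four-instruction program.
program = [
    ("r1", ("a", "b")),    # r1 = a + b
    ("r2", ("c", "d")),    # r2 = c + d   (independent of r1)
    ("r3", ("r1", "r2")),  # r3 = r1 * r2 (must wait for r1 and r2)
    ("r4", ("e", "f")),    # r4 = e - f   (independent of everything above)
]

schedule = earliest_cycles(program)
print(schedule)            # [0, 0, 1, 0]
print(max(schedule) + 1)   # 2 cycles instead of 4 when issued one at a time
```

Three of the four instructions are independent, so an idealized superscalar core finishes in two cycles. The dependent instruction `r3` sets the limit, which is the "limited implicit parallelism" problem in miniature.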
![Page 6: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/6.jpg)
Motivation – ILP Problems
1. Limited amount of implicit parallelism in sequentially designed/coded programs
2. Circuitry for pipelining becomes complex after 10-20 stages
3. Power – circuitry for ILP exploitation results in exponentially more power being used
![Page 7: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/7.jpg)
Intel processor power over time. Power in Watts on y-axis, years on x-axis.
![Page 8: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/8.jpg)
Chip-Multiprocessors
AKA Multicore/Manycore Processors
![Page 9: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/9.jpg)
CMPs
- Getting harder to build better uniprocessors
- CMPs are less difficult
  - Can reuse/modify old designs
  - Add modified copies to same chip
- Requires a paradigm shift
  - From Von Neumann model to parallel programming model
  - Thread-level parallelism + instruction-level parallelism
![Page 10: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/10.jpg)
Basic Uniprocessor Design
![Page 11: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/11.jpg)
Basic CMP Design
![Page 12: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/12.jpg)
Real CMP Example - CELL
- CELL CMP – heterogeneous
  - Developed by Sony, Toshiba, IBM
  - Built for Sony’s PlayStation 3
  - Contains 9 cores
    - 1 Power Processing Element (PPE)
    - 8 Synergistic Processing Elements (SPEs)
![Page 13: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/13.jpg)
![Page 14: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/14.jpg)
Advantages of CMPs
Throughput, Latency
![Page 15: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/15.jpg)
Improving Throughput
- Web-server throughput
  - Handle many independent service requests
  - Collections of uniprocessor servers used
  - Then, multiprocessor systems
- CMP approach
  - Use less power for communication
  - Reduce clock speeds
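The "many independent service requests" pattern can be sketched with a worker pool, where each worker stands in for a hardware thread. This is only an illustration: `handle_request` and `serve` are hypothetical names, and in CPython a process pool would be the closer analogue for CPU-bound handlers.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Hypothetical request handler: requests share no state."""
    return f"response-{request_id}"


def serve(requests, workers=4):
    """Dispatch independent requests to a fixed pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order the requests were submitted.
        return list(pool.map(handle_request, requests))


responses = serve(range(3))
print(responses)  # ['response-0', 'response-1', 'response-2']
```

Because no request depends on another, adding workers (or cores) raises the number of requests completed per unit time without making any single request faster.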
![Page 16: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/16.jpg)
Throughput – Servers
- General rule: “The simpler the pipeline, the lower the power.”
  - Simple cores – less power used
  - Less speed, but more cores available to handle requests
![Page 17: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/17.jpg)
Comparison of power usage by equivalent narrow-issue/in-order processors and wide-issue/out-of-order processors on throughput-oriented software.
![Page 18: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/18.jpg)
Throughput - Multithreading
- Server applications:
  - High thread-level parallelism
  - Lower instruction-level parallelism, high cache miss rates
  - Results in idle processor time on uniprocessors
- Hardware multithreading
  - Coarse-grained: stalls trigger switches
  - Fine-grained: switch threads every cycle
  - Simultaneous: run multiple threads using superscalar issuing
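The effect of hardware multithreading on idle time can be illustrated with a toy cycle-level model of fine-grained switching. The model is an assumption-laden sketch: one instruction issues per cycle, each thread computes for a fixed number of cycles and then stalls on a cache miss, and all parameter values are made up.

```python
def utilization(num_threads, compute_cycles, miss_latency, total_cycles=10_000):
    """Fraction of cycles in which a single-issue core does useful work.

    Each thread runs `compute_cycles` cycles, then stalls for `miss_latency`
    cycles. Every cycle the core picks the next ready thread in round-robin
    order (fine-grained multithreading).
    """
    ready_at = [0] * num_threads                 # cycle when thread may issue
    remaining = [compute_cycles] * num_threads   # work left before next miss
    next_thread = 0
    busy = 0
    for cycle in range(total_cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:            # thread now misses in cache
                    ready_at[t] = cycle + 1 + miss_latency
                    remaining[t] = compute_cycles
                next_thread = (t + 1) % num_threads
                break
    return busy / total_cycles


single = utilization(num_threads=1, compute_cycles=2, miss_latency=8)
multi = utilization(num_threads=4, compute_cycles=2, miss_latency=8)
print(f"1 thread: {single:.2f}, 4 threads: {multi:.2f}")
```

With one thread the core is busy only 2 cycles out of every 10. With four hardware threads, other threads fill most of the stall cycles, so utilization rises even though no individual thread runs faster.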
![Page 19: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/19.jpg)
Throughput – Increase the Cores
- More cores = higher total hardware thread count
- What kind of cores should be added?
  - Fewer larger, more complex cores
    - Individual threads complete faster
  - Many smaller, simpler cores
    - Slightly slower – but more cores means more threads, and higher throughput
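The trade-off between a few large cores and many small ones can be put into rough numbers. The sketch below assumes a fixed area budget and that single-thread speed grows with the square root of core area (often called Pollack's rule). The budget, the core sizes, and the `chip_summary` helper are illustrative, not measurements.

```python
import math


def chip_summary(area_budget, core_area):
    """Return (core_count, per_thread_speed, total_throughput) for a design.

    Assumption: a core of area 1 has speed 1, and speed scales with
    sqrt(core_area). Throughput assumes every core is kept busy.
    """
    cores = area_budget // core_area
    speed = math.sqrt(core_area)
    return cores, speed, cores * speed


big = chip_summary(area_budget=16, core_area=4)    # few large cores
small = chip_summary(area_budget=16, core_area=1)  # many small cores
print(big)    # (4, 2.0, 8.0)
print(small)  # (16, 1.0, 16.0)
```

Under these assumptions the large-core chip finishes each thread twice as fast, while the small-core chip delivers twice the aggregate throughput, provided there are enough threads to occupy all 16 cores.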
![Page 20: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/20.jpg)
Improving Latency
- Latency is more important in some programs
  - E.g. desktop applications, compilation
- Cores in a CMP are closer together on chip – less communication time
- Two ways CMPs help with latency
  - Parallelize the code for responsive applications
  - Run sequential applications on their own hardware threads – no competition between threads
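How much parallelizing the code can reduce latency is commonly estimated with Amdahl's law, which the slides do not mention but which fits here. In the sketch below, $p$ is the fraction of run time that can be parallelized and $n$ is the number of cores; the values $p = 0.8$ and $n = 4$ are purely illustrative.

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
S(4)\Big|_{p = 0.8} = \frac{1}{0.2 + 0.2} = 2.5
```

Even with four cores, the sequential 20% caps the speedup at 2.5 in this example, which is why latency-sensitive programs still benefit from fast individual cores.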
![Page 21: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/21.jpg)
Multicore Challenges
Power and Temperature, Cache Coherence, Memory Access, Paradigm Shift, Starvation
![Page 22: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/22.jpg)
Power and Temperature
- In theory: two cores on the same chip = twice as much power + lots of heat
- Solutions:
  - Reduce core clock speeds
  - Implement a power control unit
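A back-of-the-envelope check shows why reducing clock speed helps. It uses the standard approximation for dynamic power, where $\alpha$ is the activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. The 20% reduction is illustrative, the supply voltage is assumed to scale down in proportion to frequency, and leakage power is ignored.

```latex
P_{\text{dyn}} \approx \alpha C V^{2} f
\qquad\Longrightarrow\qquad
\frac{P_{\text{two cores at } 0.8f,\ 0.8V}}{P_{\text{one core at } f,\ V}}
= 2 \times (0.8)^{2} \times 0.8 = 1.024
```

Under these assumptions, two cores running 20% slower draw about the same power as one full-speed core, while offering up to 1.6 times the aggregate throughput on perfectly parallel work.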
![Page 23: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/23.jpg)
CELL chip-multiprocessor thermal diagram.
![Page 24: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/24.jpg)
Cache Coherence
- Multiple cores, independent local caches
  - Load same block of main memory into cache – may result in data inconsistency
- Cache coherence schemes
  - Snooping: watch the communication bus
  - Directory-based: keep track of which memory locations are being shared in multiple caches
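The snooping idea can be sketched as a toy write-invalidate, write-through protocol. This is a simplification for illustration, not the protocol of any particular processor; the `Bus` and `Cache` classes and the address `0x10` are hypothetical.

```python
class Bus:
    """Shared bus that every cache watches (snoops)."""

    def __init__(self):
        self.caches = []
        self.memory = {}

    def broadcast_invalidate(self, sender, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class Cache:
    """Private per-core cache holding only valid copies."""

    def __init__(self, bus):
        self.lines = {}          # addr -> value
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                      # miss: go to memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)       # kill other copies
        self.lines[addr] = value
        self.bus.memory[addr] = value                   # write-through

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)                      # drop stale copy


bus = Bus()
core0, core1 = Cache(bus), Cache(bus)
core0.read(0x10)
core1.read(0x10)            # both caches now hold the same block
core0.write(0x10, 42)       # core1's copy is invalidated via the bus
print(core1.read(0x10))     # 42: core1 misses and refetches the new value
```

Without the invalidation broadcast, `core1` would keep returning its stale copy, which is exactly the data inconsistency described above. A directory-based scheme would replace the broadcast with a lookup of which caches actually share the block.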
![Page 25: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/25.jpg)
Memory Issues
- We need more memory to share among multicore processors
  - 64-bit processors – helps address the issue: more addressable memory
  - Useless if we cannot access it quickly
  - Disk speed slows everyone down
![Page 26: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/26.jpg)
Change to Parallel Paradigm
- “To use multicore, you really have to use multiple threads. If you know how to do it, it's not bad. But the first time you do it there are lots of ways to shoot yourself in the foot. The bugs you introduce with multithreading are so much harder to find.”
- Have to educate programmers
  - Convince them to make their programs concurrent
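One classic way to "shoot yourself in the foot" is an unsynchronized update to shared data. The sketch below shows the safe version of a shared counter, with a comment marking where the bug would appear. The `locked_count` function and its parameters are hypothetical.

```python
import threading


def locked_count(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads, safely."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Without the lock, `counter += 1` is a read-modify-write that
            # two threads can interleave, silently losing updates.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(locked_count())  # 40000
```

The unlocked variant may still print the right answer on many runs, which is what makes this class of bug hard to find: it depends on thread timing rather than on the input.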
![Page 27: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/27.jpg)
Starvation
- Sequential programs will not use all cores
  - Some cores “starve”
- Shared cache usage
  - One core evicts another core’s data
  - Other core has to keep accessing main memory
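The shared-cache effect can be reproduced with a toy LRU cache shared by two cores. Core A reuses two addresses; core B streams through new addresses and never reuses them. The cache size, access pattern, and `miss_count` helper are assumptions chosen to make the effect obvious.

```python
from collections import OrderedDict


def miss_count(accesses, capacity):
    """Count misses per core on a single shared LRU cache.

    `accesses` is a sequence of (core, address) pairs.
    """
    cache = OrderedDict()        # least recently used entry comes first
    misses = {}
    for core, addr in accesses:
        key = (core, addr)
        misses.setdefault(core, 0)
        if key in cache:
            cache.move_to_end(key)          # hit: mark as most recent
        else:
            misses[core] += 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
    return misses


# Core A alone: a two-address working set fits easily in a 4-line cache.
alone = [("A", a) for _ in range(10) for a in (0, 1)]

# Core A sharing the cache with a streaming core B.
shared = []
for r in range(10):
    shared += [("A", 0), ("A", 1)]
    shared += [("B", 4 * r + i) for i in range(4)]

print(miss_count(alone, capacity=4))   # {'A': 2}
print(miss_count(shared, capacity=4))  # {'A': 20, 'B': 40}
```

Core A's accesses are identical in both runs, yet sharing the cache turns 2 misses into 20 because core B keeps evicting its data, so every access goes back to main memory.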
![Page 28: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/28.jpg)
Future of Multicore
Multicore, Manycore, Hybrids
![Page 29: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/29.jpg)
Future of CMPs
- Instruction-level parallelism reaching its limits
- CMPs help with throughput and latency
- Two types of CMP will emerge
  - “Manycore”: large number of small, simple cores, targeted at servers/throughput
  - “Multicore”: fewer, faster superscalar cores for very latency-sensitive programs
- “Hybrids”: heterogeneous combinations
![Page 30: Joram Benham April 2, 2012. Introduction Motivation Multicore Processors Overview, CELL Advantages of CMPs Throughput, Latency Challenges.](https://reader030.fdocuments.in/reader030/viewer/2022032600/56649dd05503460f94ac4e42/html5/thumbnails/30.jpg)
References
Hammond, L., Laudon, J., Olukotun, K. Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency. Morgan and Claypool, 2007.
Hennessy, J. L., Patterson, D. A. Computer Architecture: A Quantitative Approach. San Francisco: Morgan Kaufmann Publishers, 2007.
Mashiyat, A. S. “Multi/Many Core Systems.” St. Francis Xavier University course presentation, 2011.
Schauer, Bryan. “Multicore Processors – A Necessity.” Proquest Discovery Guides. September 2008. Web. Accessed April 2 2012. <http://www.csa.com/discoveryguides/multicore/review.pdf>