CS 61C: Great Ideas in Computer Architecture Lecture 19 ...cs61c/fa16/lec/19/L19.pdf · CS 61c...

CS 61C: Great Ideas in Computer Architecture
Lecture 19: Thread-Level Parallel Processing
Bernhard Boser & Randy Katz
http://inst.eecs.berkeley.edu/~cs61c

Agenda
• MIMD - multiple programs simultaneously
• Threads
• Parallel programming: OpenMP
• Synchronization primitives
• Synchronization in OpenMP
• And, in Conclusion…
CS 61C Lecture 19: Thread-Level Parallel Processing

Improving Performance
1. Increase clock rate fs
   − Reached practical maximum for today’s technology
   − < 5 GHz for general-purpose computers
2. Lower CPI (cycles per instruction)
   − SIMD, “instruction-level parallelism”
3. Perform multiple tasks simultaneously
   − Multiple CPUs, each executing a different program
   − Tasks may be related
     § E.g. each CPU performs part of a big matrix multiplication
   − or unrelated
     § E.g. distribute different web HTTP requests over different computers
     § E.g. run PPT (view lecture slides) and browser (YouTube) simultaneously
4. Do all of the above:
   − High fs, SIMD, multiple parallel tasks
Today’s lecture

New-School Machine Structures (It’s a bit more complicated!)
• Parallel Requests: assigned to computer, e.g., search “Katz”
• Parallel Threads: assigned to core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
• Hardware descriptions: all gates @ one time
• Programming Languages
[Figure: software and hardware layers that harness parallelism and achieve high performance, from warehouse-scale computer and smartphone, through computer (cores, memory (cache), input/output) and core (instruction unit(s), functional unit(s) computing A0+B0 … A3+B3, cache memory), down to logic gates. Project 4 is marked on the figure.]

Parallel Computer Architectures
• Several separate computers, some means for communication (e.g. Ethernet)
• Massive array of computers, fast communication between processors
• Multi-core CPU: >1 datapath in a single chip; cores share L3 cache, memory, peripherals. Example: Hive machines
• GPU (“graphics processing unit”)

Example: CPU with 2 Cores
[Figure: two processor “cores” (Core 1 and Core 2), each with its own control and datapath (PC, registers, ALU). Both issue memory accesses (Processor 0 and Processor 1) to a shared memory (bytes), which connects to input and output through I/O-memory interfaces.]

Multiprocessor Execution Model
• Each processor (core) executes its own instructions
• Separate resources (not shared)
  − Datapath (PC, registers, ALU)
  − Highest-level caches (e.g. 1st and 2nd)
• Shared resources
  − Memory (DRAM)
  − Often 3rd-level cache
    § Often on same silicon chip
    § But not a requirement
• Nomenclature
  − “Multiprocessor Microprocessor”
  − Multicore processor
    § E.g. 4-core CPU (central processing unit)
    § Executes 4 different instruction streams simultaneously

Transition to Multicore
[Figure: sequential app performance]

Multiprocessor Execution Model
• Shared memory
  − Each “core” has access to the entire memory in the processor
  − Special hardware keeps caches consistent
  − Advantages:
    § Simplifies communication in program via shared variables
  − Drawbacks:
    § Does not scale well:
      o “Slow” memory shared by many “customers” (cores)
      o May become bottleneck (Amdahl’s Law)
• Two ways to use a multiprocessor:
  − Job-level parallelism
    § Processors work on unrelated problems
    § No communication between programs
  − Partition work of single task between several cores
    § E.g. each performs part of large matrix multiplication

Parallel Processing
• It’s difficult!
• It’s inevitable
  − Only path to increase performance
  − Only path to lower energy consumption (improve battery life)
• In mobile systems (e.g. smartphones, tablets)
  − Multiple cores
  − Dedicated processors, e.g.
    § motion processor in iPhone
    § GPU (graphics processing unit)
• Warehouse-scale computers
  − multiple “nodes”
    § “boxes” with several CPUs, disks per box
  − MIMD (multi-core) and SIMD (e.g. AVX) in each node

Potential Parallel Performance (assuming software can use it)

Year   Cores   SIMD bits/Core   Cores * SIMD bits   Total, e.g. FLOPs/Cycle
2003     2          128                 256                    4
2005     4          128                 512                    8
2007     6          128                 768                   12
2009     8          128                1024                   16
2011    10          256                2560                   40
2013    12          256                3072                   48
2015    14          512                7168                  112
2017    16          512                8192                  128
2019    18         1024               18432                  288
2021    20         1024               20480                  320

Growth over 12 years:
• MIMD (cores): +2 every 2 years, 2.5X
• SIMD (bits per core): 2X every 4 years, 8X
• MIMD & SIMD combined: 20X
20x in 12 years: 20^(1/12) = 1.28x → 28% per year, or 2x every 3 years!
IF (!) we can use it
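
The growth-rate arithmetic on this slide can be worked out explicitly. Here $g$ (annual growth factor) and $T_2$ (doubling time) are symbols introduced only for this sketch; they do not appear on the slide.

```latex
\begin{align*}
g   &= 20^{1/12} = e^{\ln(20)/12} \approx e^{0.2496} \approx 1.28
    && \text{(about 28\% per year)} \\
T_2 &= \frac{\ln 2}{\ln g} = \frac{12 \ln 2}{\ln 20}
     \approx \frac{8.32}{3.00} \approx 2.8 \text{ years}
    && \text{(roughly 2x every 3 years)}
\end{align*}
```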

Programs Running on my Computer
PID  TTY  TIME     CMD
220  ??   0:04.34  /usr/libexec/UserEventAgent (Aqua)
222  ??   0:10.60  /usr/sbin/distnoted agent
224  ??   0:09.11  /usr/sbin/cfprefsd agent
229  ??   0:04.71  /usr/sbin/usernoted
230  ??   0:02.35  /usr/libexec/nsurlsessiond
232  ??   0:28.68  /System/Library/PrivateFrameworks/CalendarAgent.framework/Executables/CalendarAgent
234  ??   0:04.36  /System/Library/PrivateFrameworks/GameCenterFoundation.framework/Versions/A/gamed
235  ??   0:01.90  /System/Library/CoreServices/cloudphotosd.app/Contents/MacOS/cloudphotosd
236  ??   0:49.72  /usr/libexec/secinitd
239  ??   0:01.66  /System/Library/PrivateFrameworks/TCC.framework/Resources/tccd
240  ??   0:12.68  /System/Library/Frameworks/Accounts.framework/Versions/A/Support/accountsd
241  ??   0:09.56  /usr/libexec/SafariCloudHistoryPushAgent
242  ??   0:00.27  /System/Library/PrivateFrameworks/CallHistory.framework/Support/CallHistorySyncHelper
243  ??   0:00.74  /System/Library/CoreServices/mapspushd
244  ??   0:00.79  /usr/libexec/fmfd
246  ??   0:00.09  /System/Library/PrivateFrameworks/AskPermission.framework/Versions/A/Resources/askpermissiond
248  ??   0:01.03  /System/Library/PrivateFrameworks/CloudDocsDaemon.framework/Versions/A/Support/bird
249  ??   0:02.50  /System/Library/PrivateFrameworks/IDS.framework/identityservicesd.app/Contents/MacOS/identityservicesd
250  ??   0:04.81  /usr/libexec/secd
254  ??   0:24.01  /System/Library/PrivateFrameworks/CloudKitDaemon.framework/Support/cloudd
258  ??   0:04.73  /System/Library/PrivateFrameworks/TelephonyUtilities.framework/callservicesd
267  ??   0:02.15  /System/Library/CoreServices/AirPlayUIAgent.app/Contents/MacOS/AirPlayUIAgent --launchd
271  ??   0:03.91  /usr/libexec/nsurlstoraged
274  ??   0:00.90  /System/Library/PrivateFrameworks/CommerceKit.framework/Versions/A/Resources/storeaccountd
282  ??   0:00.09  /usr/sbin/pboard
283  ??   0:00.90  /System/Library/PrivateFrameworks/InternetAccounts.framework/Versions/A/XPCServices/com.apple.internetaccounts.xpc/Contents/MacOS/com.apple.internetaccounts
285  ??   0:04.72  /System/Library/Frameworks/ApplicationServices.framework/Frameworks/ATS.framework/Support/fontd
291  ??   0:00.25  /System/Library/Frameworks/Security.framework/Versions/A/Resources/CloudKeychainProxy.bundle/Contents/MacOS/CloudKeychainProxy
292  ??   0:09.54  /System/Library/CoreServices/CoreServicesUIAgent.app/Contents/MacOS/CoreServicesUIAgent
293  ??   0:00.29  /System/Library/PrivateFrameworks/CloudPhotoServices.framework/Versions/A/Frameworks/CloudPhotoServicesConfiguration.framework/Versions/A/XPCServices/com.apple.CloudPhotosConfiguration.xpc/Contents/MacOS/com.apple.CloudPhotosConfiguration
297  ??   0:00.84  /System/Library/PrivateFrameworks/CloudServices.framework/Resources/com.apple.sbd
302  ??   0:26.11  /System/Library/CoreServices/Dock.app/Contents/MacOS/Dock
303  ??   0:09.55  /System/Library/CoreServices/SystemUIServer.app/Contents/MacOS/SystemUIServer
… 156 total at this moment
How does my laptop do this?
Imagine doing 156 assignments all at the same time!

Threads
• Sequential flow of instructions that performs some task
  − Up to now we just called this a “program”
• Each thread has
  − A dedicated PC (program counter)
  − Separate registers
  − Access to the shared memory
• Each processor provides one (or more)
  − hardware threads (or harts) that actively execute instructions
  − Each core executes one “hardware thread”
• Operating system multiplexes multiple
  − software threads onto the available hardware threads
  − all threads except those mapped to hardware threads are waiting

Operating System Threads
Give illusion of many “simultaneously” active threads
1. Multiplex software threads onto hardware threads:
   a) Switch out blocked threads (e.g. cache miss, user input, network access)
   b) Timer (e.g. switch active thread every 1 ms)
2. Remove a software thread from a hardware thread by
   i. interrupting its execution
   ii. saving its registers and PC to memory
3. Start executing a different software thread by
   i. loading its previously saved registers into a hardware thread’s registers
   ii. jumping to its saved PC
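
Steps 2 and 3 above can be sketched as a toy model in C. This is only an illustration under stated assumptions: a real context switch runs in privileged OS code and assembly, and the types `HardwareThread` and `SoftwareThread`, the register count, and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 32  /* assumption: a MIPS-like register file */

/* State held inside the processor for the thread that is running. */
typedef struct {
    uint32_t pc;
    uint32_t regs[NUM_REGS];
} HardwareThread;

/* State of a waiting software thread, saved in memory. */
typedef struct {
    uint32_t saved_pc;
    uint32_t saved_regs[NUM_REGS];
} SoftwareThread;

/* Step 2: save the running thread's registers and PC to memory. */
void switch_out(const HardwareThread *hart, SoftwareThread *t) {
    assert(hart && t);
    memcpy(t->saved_regs, hart->regs, sizeof t->saved_regs);
    t->saved_pc = hart->pc;
}

/* Step 3: load a saved thread's registers, then "jump" to its saved PC. */
void switch_in(HardwareThread *hart, const SoftwareThread *t) {
    assert(hart && t);
    memcpy(hart->regs, t->saved_regs, sizeof hart->regs);
    hart->pc = t->saved_pc;
}

/* Replace the thread running on hart: save `out`, resume `in`. */
void context_switch(HardwareThread *hart, SoftwareThread *out,
                    const SoftwareThread *in) {
    switch_out(hart, out);
    switch_in(hart, in);
}
```

The cost of the two `memcpy` calls is the reason the switch has to be cheap relative to the wait it is hiding.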

Example: 4 Cores
Thread pool: list of threads competing for processor
OS maps threads to cores and schedules logical (software) threads
[Figure: Core 1, Core 2, Core 3, Core 4]
Each “core” actively runs 1 program at a time

Multithreading
• Typical scenario:
  − Active thread encounters cache miss
  − Active thread waits ~1000 cycles for data from DRAM
  − → switch out and run different thread until data available
• Problem
  − Must save current thread state and load new thread state
    § PC, all registers (could be many, e.g. AVX)
  − → must perform switch in ≪ 1000 cycles
• Can hardware help?
  − Moore’s law: transistors are plenty

Hardware assisted Software Multithreading
[Figure: processor (1 core, 2 threads) with control and a datapath holding PC 0, Registers 0, PC 1, Registers 1 and a shared ALU; memory (bytes) connects to input and output through I/O-memory interfaces.]
• Two copies of PC and registers inside processor hardware
• Looks like two processors to software (hardware thread 0, hardware thread 1)
• Hyperthreading:
  − Both threads may be active simultaneously
Note: presented incorrectly in the lecture

Multithreading
• Logical threads
  − ≈ 1% more hardware, ≈ 10% (?) better performance
    § Separate registers
    § Share datapath, ALU(s), caches
• Multicore
  − => Duplicate processors
  − ≈ 50% more hardware, ≈ 2X better performance?
• Modern machines do both
  − Multiple cores with multiple threads per core

Bernhard’s Laptop
$ sysctl -a | grep hw
hw.physicalcpu: 2
hw.logicalcpu: 4
hw.l1icachesize: 32,768
hw.l1dcachesize: 32,768
hw.l2cachesize: 262,144
hw.l3cachesize: 3,145,728
• 2 cores
• 4 threads total

Example: 6 Cores, 24 Logical Threads
Thread pool: list of threads competing for processor
OS maps threads to cores and schedules logical (software) threads
[Figure: Core 1 through Core 6, each with Thread 1 through Thread 4]
4 logical threads per core (hardware) thread

Languages supporting Parallel Programming

ActorScript    Concurrent Pascal    JoCaml      Orc
Ada            Concurrent ML        Join        Oz
Afnix          Concurrent Haskell   Java        Pict
Alef           Curry                Joule       Reia
Alice          CUDA                 Joyce       SALSA
APL            E                    LabVIEW     Scala
Axum           Eiffel               Limbo       SISAL
Chapel         Erlang               Linda       SR
Cilk           Fortran 90           MultiLisp   Stackless Python
Clean          Go                   Modula-3    SuperPascal
Clojure        Io                   Occam       VHDL
Concurrent C   Janus                occam-π     XC

Which one to pick?

Why so many parallel programming languages?
• Piazza question:
  − Why “intrinsics”?
  − TO Intel: fix your #()&$! compiler!
• It’s happening ... but
  − SIMD features are continually added to compilers (Intel, gcc)
  − Intense area of research
  − Research progress:
    § 20+ years to translate C into good (fast!) assembly
    § How long to translate C into good (fast!) parallel code?
      o General problem is very hard to solve
      o Present state: specialized solutions for specific cases
      o Your opportunity to become famous!

Parallel Programming Languages
• Number of choices is indication of
  − No universal solution
    § Needs are very problem specific
  − E.g.
    § Scientific computing (matrix multiply)
    § Web server: handle many unrelated requests simultaneously
    § Input/output: it’s all happening simultaneously!
• Specialized languages for different tasks
  − Some are easier to use (for some problems)
  − None is particularly “easy” to use
• 61C
  − Parallel language examples for high-performance computing
  − OpenMP

Parallel Loops
• Serial execution:
    for (int i=0; i<100; i++) {
        …
    }
• Parallel execution (four loops, each covering one quarter of the range):
    for (int i=0; i<25; i++) {
        …
    }
    for (int i=25; i<50; i++) {
        …
    }
    for (int i=50; i<75; i++) {
        …
    }
    for (int i=75; i<100; i++) {
        …
    }
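
The split of one loop into per-thread ranges can be computed rather than written out by hand. The sketch below is one common block partitioning, not necessarily what any particular OpenMP runtime does; `Range` and `chunk_for_thread` are hypothetical names.

```c
#include <assert.h>

/* Half-open iteration range [begin, end) owned by one thread. */
typedef struct {
    int begin;
    int end;
} Range;

/* Split iterations 0..n-1 into num_threads contiguous chunks.
   When n is not a multiple of num_threads, the first (n % num_threads)
   threads each take one extra iteration. */
Range chunk_for_thread(int n, int num_threads, int id) {
    assert(n >= 0 && num_threads > 0 && id >= 0 && id < num_threads);
    int base = n / num_threads;
    int extra = n % num_threads;
    Range r;
    r.begin = id * base + (id < extra ? id : extra);
    r.end = r.begin + base + (id < extra ? 1 : 0);
    return r;
}
```

With `n = 100` and 4 threads this yields the four ranges shown on the slide: [0,25), [25,50), [50,75), [75,100).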

Parallel for in OpenMP
    #include <omp.h>

    #pragma omp parallel for
    for (int i=0; i<100; i++) {
        …
    }

OpenMP Example
$ gcc-5 -fopenmp for.c; ./a.out
thread 0, i = 0
thread 1, i = 3
thread 2, i = 6
thread 3, i = 8
thread 0, i = 1
thread 1, i = 4
thread 2, i = 7
thread 3, i = 9
thread 0, i = 2
thread 1, i = 5
01 02 03 14 15 16 27 28 39 40
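
The source of `for.c` is not part of this transcript. A minimal sketch of a loop that could print output of this shape is given below; the helper name `run_parallel_for` and the `owner` array are assumptions for illustration. The `#ifdef _OPENMP` guards let the same file build as a serial program when the compiler is not given `-fopenmp`.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs n iterations as an OpenMP parallel for,
   prints which thread handled each one, and records it in owner[i]. */
void run_parallel_for(int n, int owner[]) {
    assert(n >= 0 && owner != NULL);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int id = 0;  /* serial build: everything runs on "thread 0" */
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        owner[i] = id;
        printf("thread %d, i = %d\n", id, i);
    }
}
```

Calling `run_parallel_for(10, owner)` from `main` and compiling with `gcc -fopenmp` should interleave lines from several threads; the order of the lines can differ from run to run.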

OpenMP
• C extension: no new language to learn
• Multi-threaded, shared-memory parallelism
  − Compiler directives, #pragma
  − Runtime library routines, #include <omp.h>
• #pragma
  − Ignored by compilers unaware of OpenMP
  − Same source for multiple architectures
    § E.g. same program for 1 & 16 cores
• Only works with shared memory

OpenMP Programming Model
• Fork-join model:
• OpenMP programs begin as single process (master thread)
  − Sequential execution
• When parallel region is encountered
  − Master thread “forks” into team of parallel threads
  − Executed simultaneously
  − At end of parallel region, parallel threads “join”, leaving only master thread
• Process repeats for each parallel region
  − Amdahl’s law?

What Kind of Threads?
• OpenMP threads are operating system (software) threads.
• OS will multiplex requested OpenMP threads onto available hardware threads.
• Hopefully each gets a real hardware thread to run on, so no OS-level time-multiplexing.
• But other tasks on machine can also use hardware threads!
• Be “careful” (?) when timing results for project 4!
  − 5 AM?
  − Job queue?

Example 2: computing π
http://openmp.org/mp-documents/omp-hands-on-SC08.pdf

Sequential π
pi = 3.142425985001
• Resembles π, but not very accurate
• Let’s increase num_steps and parallelize
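
The slide's code is not in this transcript. The printed value is consistent with a midpoint-rule estimate of the integral of 4/(1+x²) over [0,1] (which equals π) using 10 steps, so the sketch below assumes that method; `sequential_pi` is a hypothetical name, and `num_steps` is the only identifier taken from the slide.

```c
#include <assert.h>

/* Midpoint-rule estimate of the integral of 4/(1+x^2) from 0 to 1. */
double sequential_pi(long num_steps) {
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;  /* width of each rectangle */
    double sum = 0.0;
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;        /* midpoint of rectangle i */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Under these assumptions, `sequential_pi(10)` should land close to the 3.1424… shown above, and larger `num_steps` values move the estimate toward π.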

Parallelize (1) …
• Problem: each thread needs access to the shared variable sum
• Code runs sequentially …

Parallelize (2) …
[Figure: sum[0], sum[1]]
1. Compute sum[0] and sum[1] in parallel
2. Compute sum = sum[0] + sum[1] sequentially

Parallel π
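
A minimal sketch of the two-step approach (per-thread partial sums, then a sequential total) follows. It is not the slide's code: `parallel_pi` and `MAX_THREADS` are hypothetical names, the thread count of 4 and the round-robin assignment of iterations are assumptions, and the midpoint-rule integrand is assumed as in the sequential case. The `#ifdef _OPENMP` guards keep the function correct when built without `-fopenmp`.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 4  /* assumption: at most 4 threads */

double parallel_pi(long num_steps) {
    assert(num_steps > 0);
    double step = 1.0 / (double)num_steps;
    double sum[MAX_THREADS] = {0.0};  /* one partial sum per thread */
    int nthreads = 1;                 /* team size actually granted */

    #pragma omp parallel num_threads(MAX_THREADS)
    {
        int id = 0, n = 1;            /* serial build: a team of one */
#ifdef _OPENMP
        id = omp_get_thread_num();
        n = omp_get_num_threads();
#endif
        if (id == 0) nthreads = n;    /* written by exactly one thread */

        double local = 0.0;           /* private to this thread */
        for (long i = id; i < num_steps; i += n) {  /* round-robin */
            double x = (i + 0.5) * step;
            local += 4.0 / (1.0 + x * x);
        }
        sum[id] = local;              /* each thread writes its own slot */
    }

    /* Step 2: combine the partial sums sequentially. */
    double pi = 0.0;
    for (int id = 0; id < nthreads; id++) {
        pi += step * sum[id];
    }
    return pi;
}
```

Because every thread writes only its own `sum[id]` slot, no two threads update the same variable, and the final addition happens after the parallel region has ended.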

Trial Run
i = 1, id = 1
i = 0, id = 0
i = 2, id = 2
i = 3, id = 3
i = 5, id = 1
i = 4, id = 0
i = 6, id = 2
i = 7, id = 3
i = 9, id = 1
i = 8, id = 0
pi = 3.142425985001

Scale up: num_steps = 10^6
pi = 3.141592653590
You verify how many digits are correct …

Can we Parallelize Computing sum
Summation inside parallel section
• Insignificant speedup in this example, but …
• pi = 3.138450662641
• Wrong! And value changes between runs?!
• What’s going on?
Always looking for ways to beat Amdahl’s Law …

Your Turn
What are the possible values of *($s0) after executing this code by 2 concurrent threads?
    # *($s0) = 100
    lw   $t0,0($s0)
    addi $t0,$t0,1
    sw   $t0,0($s0)
Answer   *($s0)
A        100 or 101
B        101
C        101 or 102
D        100 or 101 or 102
E        100 or 101 or 102 or 103

Your Turn
What are the possible values of *($s0) after executing this code by 2 concurrent threads?
    # *($s0) = 100
    lw   $t0,0($s0)
    addi $t0,$t0,1
    sw   $t0,0($s0)
Answer   *($s0)
C        101 or 102
• 102 if the threads enter code section sequentially
• 101 if both execute lw before either runs sw
• one thread sees “stale” data
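
The two outcomes can be reproduced deterministically by simulating the three instructions for each thread under a fixed interleaving. This is a single-threaded model for illustration only; `SimThread`, `run_one_instruction`, and `run_schedule` are hypothetical names.

```c
#include <assert.h>

/* One simulated thread: its private register $t0 and how many of the
   three instructions it has executed so far. */
typedef struct {
    int t0;
    int step;
} SimThread;

/* Execute the next instruction of thread t against the shared word. */
void run_one_instruction(SimThread *t, int *mem) {
    assert(t->step < 3);
    switch (t->step) {
    case 0: t->t0 = *mem;      break;  /* lw   $t0,0($s0) */
    case 1: t->t0 = t->t0 + 1; break;  /* addi $t0,$t0,1  */
    case 2: *mem = t->t0;      break;  /* sw   $t0,0($s0) */
    }
    t->step++;
}

/* schedule[k] names the thread (0 or 1) that runs the k-th instruction
   overall. Returns the final value of the shared word. */
int run_schedule(const int schedule[6], int initial) {
    SimThread threads[2] = { {0, 0}, {0, 0} };
    int mem = initial;
    for (int k = 0; k < 6; k++) {
        run_one_instruction(&threads[schedule[k]], &mem);
    }
    return mem;
}
```

For example, the schedule `{0,0,0,1,1,1}` (one thread finishes before the other starts) gives 102, while `{0,1,0,1,0,1}` (both load before either stores) gives 101.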

What’s going on?
• Operation is really pi = pi + sum[id]
• What if >1 threads read the current (same) value of pi, compute the sum, and store the result back to pi?
• Each processor reads same intermediate value of pi!
• Result depends on who gets there when
• A “race” → result is not deterministic

Synchronization
• Problem:
  − Limit access to shared resource to 1 actor at a time
  − E.g. only 1 person permitted to edit a file at a time
    § otherwise changes by several people get all mixed up
• Solution:
  − Take turns:
    § Only one person gets the microphone & talks at a time
    § Also good practice for classrooms, btw …

Locks
• Computers use locks to control access to shared resources
  − Serves purpose of microphone in example
  − Also referred to as “semaphore”
• Usually implemented with a variable
  − int lock;
    § 0 for unlocked
    § 1 for locked

Synchronization with locks
    // wait for lock released
    while (lock != 0) ;
    // lock == 0 now (unlocked)

    // set lock
    lock = 1;

    // access shared resource ...
    // e.g. pi
    // sequential execution! (Amdahl ...)

    // release lock
    lock = 0;

Lock Synchronization
Thread 1:
    while (lock != 0) ;
    lock = 1;
    // critical section
    lock = 0;
Thread 2:
    while (lock != 0) ;
    lock = 1;
    // critical section
    lock = 0;
• Thread 2 finds lock not set, before thread 1 sets it
• Both threads believe they got and set the lock!
Try as you want, this problem has no solution, not even at the assembly level.
Unless we introduce new instructions, that is!
![Page 48: CS 61C: Great Ideas in Computer Architecture Lecture 19 ...cs61c/fa16/lec/19/L19.pdf · CS 61c Lecture 19: Thread Level Parallel Processing 21 Thread pool: List of threads competing](https://reader033.fdocuments.in/reader033/viewer/2022060300/5f0813b37e708231d4203801/html5/thumbnails/48.jpg)
Hardware Synchronization
• Solution:
  − Atomic read/write
  − Read & write in a single instruction
    § No other access permitted between read and write
  − Note:
    § Must use shared memory (multiprocessing)
• Common implementations:
  − Atomic swap of register ↔ memory
  − Pair of instructions for “linked” read and write
    § write fails if memory location has been “tampered” with after linked read
    § MIPS uses this solution
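In C, the atomic-swap flavor is available through C11 `<stdatomic.h>`. The following is a minimal sketch, assuming a C11 compiler; the type `swap_lock` and the function names are made up for this example and are not from the lecture.

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 = unlocked, 1 = locked, as on the slides. */
typedef atomic_int swap_lock;

/* Try once to take the lock. atomic_exchange writes 1 and returns the
 * previous value in one indivisible step, so no other thread can slip
 * in between the read and the write.
 * Returns 1 if we got the lock, 0 if it was already held. */
int try_acquire(swap_lock *l)
{
    return atomic_exchange(l, 1) == 0;
}

/* Spin until the lock is ours. */
void acquire(swap_lock *l)
{
    while (!try_acquire(l))
        ;   /* busy-wait */
}

/* Release the lock; only the current holder should call this. */
void release(swap_lock *l)
{
    assert(atomic_load(l) == 1);
    atomic_store(l, 0);
}
```

A thread would bracket its critical section with `acquire(&l)` and `release(&l)`. The swap returns the old value, so exactly one of several competing threads sees 0 and wins.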
MIPS Synchronization Instructions
• Load linked: ll $rt, off($rs)
  − Reads memory location (like lw)
  − Also sets (hidden) “link bit”
  − Link bit is reset if memory location (off($rs)) is accessed
• Store conditional: sc $rt, off($rs)
  − Stores off($rs) = $rt (like sw)
  − Sets $rt=1 (success) if link bit is set
    § i.e. no (other) process accessed off($rs) since ll
  − Sets $rt=0 (failure) otherwise
  − Note: sc clobbers $rt, i.e. changes its value
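The link-bit rules can be mimicked with a toy software model. This is only an illustration of the semantics listed above, not how hardware implements them; `struct cell`, `load_linked`, `other_store`, and `store_conditional` are hypothetical names, and a single link bit stands in for the per-processor state.

```c
#include <assert.h>

/* A toy model of ONE memory word plus the hidden link bit of ONE CPU. */
struct cell {
    int value;
    int link;   /* 1 while the linked read is still valid */
};

/* ll: read the word and set the link bit. */
int load_linked(struct cell *c)
{
    c->link = 1;
    return c->value;
}

/* A store by some other thread: changes the word and resets the link. */
void other_store(struct cell *c, int v)
{
    c->value = v;
    c->link = 0;
}

/* sc: store only if the link bit is still set.
 * Returns 1 on success and 0 on failure, the value sc leaves in $rt. */
int store_conditional(struct cell *c, int v)
{
    assert(c->link == 0 || c->link == 1);
    if (!c->link)
        return 0;       /* location was touched since ll: do not store */
    c->value = v;
    c->link = 0;
    return 1;
}
```

If `other_store` runs between `load_linked` and `store_conditional`, the conditional store returns 0 and leaves the word unchanged, so the caller knows to retry.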
Lock Synchronization
Broken Synchronization
    while (lock != 0) ;
    lock = 1;
    // critical section
    lock = 0;
Fix (lock is at location $s1)
    Try:    addiu $t0,$zero,1     # $t0 = 1 before calling ll: minimize time between ll and sc
            ll    $t1,0($s1)
            bne   $t1,$zero,Try
            sc    $t0,0($s1)
            beq   $t0,$zero,Try   # try again if sc failed (another thread executed sc since above ll)
    Locked:
            # critical section
    Unlock:
            sw    $zero,0($s1)
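The same retry pattern can be written in portable C with a compare-and-exchange loop. This is a sketch assuming C11 atomics; the function names `lock_acquire` and `lock_release` are invented here. On machines that provide ll/sc, compilers commonly translate such a loop into an ll/sc pair, but that is a property of the toolchain, not something this code guarantees.

```c
#include <assert.h>
#include <stdatomic.h>

/* Acquire: the C counterpart of the Try loop.
 * Spin until the lock reads 0 AND we change it to 1 in one atomic step.
 * The weak compare-exchange may fail spuriously, much as sc can fail,
 * so it sits inside a retry loop. */
void lock_acquire(atomic_int *lock)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(lock, &expected, 1))
        expected = 0;   /* a failed attempt overwrites expected; reset it */
}

/* Release: the counterpart of sw $zero,0($s1). */
void lock_release(atomic_int *lock)
{
    assert(atomic_load(lock) == 1);
    atomic_store(lock, 0);
}
```

As in the assembly version, the loop retries both when the lock is already held and when the atomic update itself fails.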
OpenMP Locks
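The code on this slide did not survive extraction. As a stand-in, here is a minimal sketch of the OpenMP lock routines (`omp_init_lock`, `omp_set_lock`, `omp_unset_lock`, `omp_destroy_lock`) guarding a shared counter. The function `locked_count` and the serial fallback under `#else` are assumptions of this example, added so that it also builds without OpenMP support.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (an assumption of this sketch) so the file still
 * builds and runs, on one thread, without -fopenmp. */
typedef int omp_lock_t;
void omp_init_lock(omp_lock_t *l)    { *l = 0; }
void omp_set_lock(omp_lock_t *l)     { *l = 1; }
void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Add 1 to a shared counter n times, guarding each update with a lock. */
long locked_count(int n)
{
    long counter = 0;       /* shared by all threads */
    omp_lock_t lock;

    assert(n >= 0);
    omp_init_lock(&lock);

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        omp_set_lock(&lock);      /* wait for the lock, then take it */
        counter++;                /* critical section */
        omp_unset_lock(&lock);    /* release the lock */
    }

    omp_destroy_lock(&lock);
    return counter;
}
```

With GCC, compiling with `-fopenmp` enables the pragmas and the real lock routines. The result is the same either way, since the lock only serializes the updates.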
Synchronization in OpenMP
• Locks are typically used in libraries of higher-level parallel programming constructs
• E.g. OpenMP offers #pragmas for common cases:
  − critical
  − atomic
  − barrier
  − ordered
• OpenMP offers many more features
  − see online documentation
  − or tutorial at
    § http://openmp.org/mp-documents/omp-hands-on-SC08.pdf
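As a small illustration of one of these pragmas, the sketch below uses `atomic` to protect a single update of a shared variable. The function name `atomic_sum` is hypothetical. Without OpenMP support the pragmas are ignored and the loop simply runs on one thread.

```c
#include <assert.h>

/* Sum 1..n with every thread updating one shared variable.
 * "atomic" protects just the single update statement that follows it. */
long atomic_sum(int n)
{
    long sum = 0;       /* shared by all threads */

    assert(n >= 0);

    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        #pragma omp atomic
        sum += i;
    }
    return sum;
}
```

For a plain sum like this, a `reduction` clause would usually be faster. `atomic` is shown here only because it is one of the synchronization pragmas listed above.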
OpenMP critical
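The code on this slide was lost in extraction. The following sketch shows one plausible use of `critical`, assuming the pi example mentioned earlier is a numerical integration of 4/(1+x²) over [0,1]. The function name `pi_critical` and the midpoint rule are choices made for this illustration, not taken from the slide.

```c
#include <assert.h>

/* Estimate pi as the integral of 4/(1+x^2) over [0,1] (midpoint rule). */
double pi_critical(int num_steps)
{
    assert(num_steps > 0);

    double pi = 0.0;                     /* shared result */
    const double step = 1.0 / num_steps;

    #pragma omp parallel
    {
        double partial = 0.0;            /* private to each thread */

        #pragma omp for
        for (int i = 0; i < num_steps; i++) {
            double x = (i + 0.5) * step;
            partial += 4.0 / (1.0 + x * x);
        }

        /* Only one thread at a time may update the shared result. */
        #pragma omp critical
        pi += partial * step;
    }
    return pi;
}
```

Each thread enters the critical section only once, after finishing its share of the loop, so the sequential part that Amdahl's law penalizes stays small.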
The Trouble with Locks…
• … is deadlocks
• Consider 2 cooks sharing a kitchen
  − Each cooks a meal that requires salt and pepper (locks)
  − Cook 1 grabs salt
  − Cook 2 grabs pepper
  − Cook 1 notices s/he needs pepper
    § it’s not there, so s/he waits
  − Cook 2 realizes s/he needs salt
    § it’s not there, so s/he waits
• A not so common cause of cook starvation
  − But deadlocks are possible in parallel programs
  − Very difficult to debug
    § malloc/free is easy…
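The kitchen scenario can be replayed step by step. The sketch below uses non-blocking "try" operations on C11 atomics so that the replay terminates instead of hanging; all names (`salt`, `pepper`, `try_grab`, `replay_deadlock`, `cook_in_order`) are invented for this illustration. It also shows one common remedy, which the slides do not cover: every thread takes the locks in the same order.

```c
#include <assert.h>
#include <stdatomic.h>

/* Two locks, 0 = free, 1 = held. */
static atomic_int salt, pepper;

/* Try once to take a lock; returns 1 on success, 0 if it is held. */
int try_grab(atomic_int *l)
{
    return atomic_exchange(l, 1) == 0;
}

void put_back(atomic_int *l)
{
    atomic_store(l, 0);
}

/* Replay the slide: each cook holds one lock and wants the other.
 * Returns 1 if neither cook can make progress (deadlock). */
int replay_deadlock(void)
{
    atomic_store(&salt, 0);
    atomic_store(&pepper, 0);

    int cook1_has_salt    = try_grab(&salt);     /* cook 1 grabs salt */
    int cook2_has_pepper  = try_grab(&pepper);   /* cook 2 grabs pepper */
    assert(cook1_has_salt && cook2_has_pepper);

    int cook1_gets_pepper = try_grab(&pepper);   /* held by cook 2: fails */
    int cook2_gets_salt   = try_grab(&salt);     /* held by cook 1: fails */
    return !cook1_gets_pepper && !cook2_gets_salt;
}

/* Remedy: every cook takes salt first, then pepper.
 * Returns 1 if the cook got both, 0 if s/he must wait (holding nothing). */
int cook_in_order(void)
{
    if (!try_grab(&salt))
        return 0;
    if (!try_grab(&pepper)) {
        put_back(&salt);
        return 0;
    }
    return 1;
}

/* Done cooking: return both locks. */
void finish_cooking(void)
{
    put_back(&pepper);
    put_back(&salt);
}
```

With a fixed order, a cook who cannot get the salt waits while holding nothing, so the other cook can always finish and return both locks.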
And in Conclusion, …
• Sequential software execution speed is limited
• Parallel processing is the only path to higher performance
  − SIMD: instruction level parallelism
    § Implemented in all high performance CPUs today (x86, ARM, …)
    § Partially supported by compilers
  − MIMD: thread level parallelism
    § Multicore processors
    § Supported by Operating Systems (OS)
    § Requires programmer intervention to exploit at single program level
      o E.g. OpenMP
  − SIMD & MIMD for maximum performance
• Synchronization
  − Requires hardware support: specialized assembly instructions
  − Typically use higher-level support
  − Beware of deadlocks