Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately,...

77
Parallelism and Concurrency COS 326 David Walker Princeton University slides copyright 2013-2015 David Walker and Andrew W. Appel permission granted to reuse these slides for non-commercial educaGonal purposes

Transcript of Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately,...

Page 1: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

ParallelismandConcurrency

COS326DavidWalker

PrincetonUniversityslidescopyright2013-2015DavidWalkerandAndrewW.Appel

permissiongrantedtoreusetheseslidesfornon-commercialeducaGonalpurposes

Page 2: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Parallelism

2

•  Whatisit?

•  Today'stechnologytrends.

•  Then:–  Whyisitsomuchhardertoprogram?

•  (Isitactuallysomuchhardertoprogram?)–  SomepreliminarylinguisGcconstructs

•  threadcreaGon•  ourfirstparallelfuncGonalabstracGon:futures

Page 3: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

PARALLELISM:WHATISIT?

Page 4: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Parallelism

4

•  Whatisit?–  doingmanythingsatthesameGmeinsteadofsequenGally(one-aSer-the-other).

Page 5: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

FlavorsofParallelism

5

DataParallelism–  samecomputaGonbeingperformedonacollec%onofindependentitems

–  e.g.,addingtwovectorsofnumbersTaskParallelism

–  differentcomputaGons/programsrunningatthesameGme–  e.g.,runningwebserveranddatabase

PipelineParallelism–  assemblyline:

sequenGalf

sequenGalg

mapfoverallitems mapgoverallitems

Page 6: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Parallelismvs.Concurrency

6

Parallelism:performsmanytaskssimultaneously

•  purpose:improvesthroughput•  mechanism:

–  manyindependentcompuGngdevices–  decreaserunGmeofprogrambyuGlizingmulGplecoresorcomputers

•  eg:runningyourwebcrawleronaclusterversusonemachine.

Concurrency:mediatesmulG-partyaccesstosharedresources•  purpose:decreaseresponseGme•  mechanism:

–  switchbetweendifferentthreadsofcontrol–  workononethreadwhenitcanmakeusefulprogress;whenitcan't,suspenditandworkonanotherthread

•  eg:runningyourclock,editor,chatatthesameGmeonasingleCPU.–  OSgiveseachoftheseprogramsasmallGme-slice(~10msec)–  oSenslowsthroughputduetocostofswitchingcontexts

•  eg:don'tblockwhilewaiGngforI/Odevicetorespond,butletanotherthreaddousefulCPUcomputaGon

Page 7: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Parallelismvs.Concurrency

7

cpu cpu cpu

job

Parallelism:performseveralindependenttaskssimultaneously

resource(cpu,disk,server,datastructure)

job …Concurrency:mediate/mulGplexaccesstosharedresource

job job

manyefficientprogramsusesomeparallelismandsomeconcurrency

Page 8: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

UNDERSTANDINGTECHNOLOGYTRENDS

Page 9: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Moore'sLaw•  Moore'sLaw:Thenumberoftransistorsyoucanputona

computerchipdoubles(approximately)everycoupleofyears.

•  ConsequenceformostofthehistoryofcompuGng:Allprogramsdoubleinspeedeverycoupleofyears.–  Why?Hardwaredesignersarewickedsmart.–  Theyhavebeenabletousethoseextratransistorsto(forexample)doublethenumberofinstrucGonsexecutedperGmeunit,therebyprocessingspeedofprograms

•  ConsequenceforapplicaGonwriters:–  watchTVforawhileandyourprogramsop%mizethemselves!–  perhapsmoreimportantly:newapplicaGonsthoughtimpossiblebecamepossiblebecauseofincreasedcomputaGonalpower

Page 10: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

CPUClockSpeedsfrom1993-2005

10

Page 11: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

CPUClockSpeedsfrom1993-2005

11

Nextyear’smachineistwiceasfast!

Page 12: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

CPUClockSpeedsfrom1993-2005

12

Oops!

Page 13: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

CPUPower1993-2005

13

Page 14: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

CPUPower1993-2005

14

ButpowerconsumpGonisonlypartoftheproblem…coolingistheother!

Page 15: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

TheHeatProblem

15

Page 16: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

TheProblem

16

1993PenGumHeatSink

2005Cooler

Page 17: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Cray-4:1994

17

Upto64processorsRunningat1GHz8MegabytesofRAMCost:roughly$10MTheCRAY2,3,and4CPUandmemoryboardswereimmersedinabathofelectricallyinertcoolingfluid.

Page 18: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

watercooled!

18

Page 19: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

PowerDissipaGon

19

Page 20: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

20

Darn!IntelengineersnolongeropGmizemyprogramswhileIwatchTV!

Powertochippeaking

Page 21: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

21

Butlook:Moore’sLawsGllholds,sofar,fortransistors-per-chip.

Whatdowedowithallthosetransistors?1.  MulGcore!

2.  System-on-chipwithspecializedcoprocessors(suchasGPU)BothofthosearePARALLELISM

Page 22: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Parallelism

22

WhyisitparGcularlyimportant(today)?–  Roughlyeveryotheryear,achipfromIntelwould:

•  halvethefeaturesize(sizeoftransistors,wires,etc.)•  doublethenumberoftransistors•  doubletheclockspeed•  thisdrovetheeconomicengineoftheITindustry(andtheUS!)

–  Nolongerabletodoubleclockorcutvoltage:aprocessorwon’tgetanyfaster!•  (sowhyshouldyoubuyanewlaptop,desktop,etc.?)•  powerandheatarelimitaGonsontheclock•  errors,variability(noise)arelimitaGonsonthevoltage•  butwecansGllpackalotoftransistorsonachip…(atleastforanother10to15years.)

Page 23: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Core

MulG-coreh/w–commonL2

L2cache

Core

Mainmemory

L1cache L1cache

ALUALU

ALUALU

23

Page 24: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Today…(actually9yearsago!)

24

Page 25: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

GPUs

•  There'snothinglikevideogamingtodriveprogressincomputa%on!

•  GPUscanhavehundredsoreventhousandsofcores

•  Threeofthe5mostpowerfulsupercomputersintheworldtakeadvantageofGPUacceleraGon.

•  ScienGstsuseGPUsforsimulaGonandmodelling–  eg:proteinfoldingandfluiddynamics

Page 26: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

GPUs

•  There'snothinglikevideogamingtodriveprogressincomputa%on!

•  GPUscanhavehundredsoreventhousandsofcores

•  Threeofthe5mostpowerfulsupercomputersintheworldtakeadvantageofGPUacceleraGon.

•  ScienGstsuseGPUsforsimulaGonandmodelling–  eg:proteinfoldingandfluiddynamics

JohnDanskin,PhDPrinceton1994,VicePresidentforGPUarchitecture,Nvidia(whathedoeswithhisspareGme…builtthiscarhimself)

Page 27: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

So…

27

InsteadoftryingtomakeyourCPUgofaster,Intel’sjustgoingtopackmoreCPUsontoachip.

–  afewyearsago:dualcore(2CPUs).–  aliqlemorerecently:4,6,8cores.–  IntelistesGng48-corechipswithresearchersnow.–  Within10years,you’llhave~1024IntelCPUsonachip.

Infact,that’salreadyhappeningwithgraphicschips(eg,Nvidia).

–  reallygoodatsimpledataparallelism(manydeeppipes)–  buttheyaremuchdumberthananIntelcore.–  andrightnow,chewupalotofpower.–  watchforGPUstoget“smarter”andmorepowerefficient,whileCPUsbecomemorelikeGPUs.

Page 28: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

STILLMOREPROCESSORS:THEDATACENTER

Page 29: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

DataCenters:GeneraGonZSuperComputers

Page 30: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

DataCenters:LotsofConnectedComputers!

Page 31: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

DataCenters•  10sor100softhousandsofcomputers•  Allconnectedtogether•  MoGvatedbynewapplicaGonsandscalablewebservices:

–  let'scatalogueallNbillionwebpagesintheworld–  let'sallallowanyoneintheworldtosearchforthepageheorsheneeds

–  let'sprocessthatsearchinlessthanasecond•  It'sAmazing!•  It'sMagic!

Page 32: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

DataCenters:LotsofConnectedComputersComputercontainersforplug-and-playparallelism:

Page 33: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

SoundsGreat!

33

•  Somyoldprogramswillrun2x,4x,48x,256x,1024xfaster?

Page 34: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

SoundsGreat!

34

•  Somyoldprogramswillrun2x,4x,48x,256x,1024xfaster?–  noway!

Page 35: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

SoundsGreat!

35

•  Somyoldprogramswillrun2x,4x,48x,256x,1024xfaster?–  noway!–  toupgradefromIntel386to486,theappwriterandcompilerwriterdidnothavetodoanything(much)•  IA486interpretedthesamesequenGalstreamofinstrucGons;itjustdiditfaster

•  thisiswhywecouldwatchTVwhileIntelengineersopGmizedourprogramsforus

–  toupgradefromIntel486todualcore,weneedtofigureouthowtosplitasinglestreamofinstrucGonsintotwostreamsofinstrucGonsthatcollaboratetocompletethesametask.•  withoutwork&thought,ourprogramsdon'tgetanyfasteratall•  ittakesingenuitytogenerateefficientparallelalgorithmsfromsequen%alones

Page 36: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

What’stheanswer?

Page 37: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

InPart:FuncGonalProgramming!

Dryad

Pig

Naiad

Page 38: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

PARALLELANDCONCURRENTPROGRAMMING

Page 39: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Core

MulGcoreHardware&DataCenters

L2cache

Core

Mainmemory

L1cache L1cache

ALUALU

ALUALU

39

Page 40: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Speedup

40

•  Speedup:theraGoofsequenGalprogramexecuGonGmetoparallelexecuGonGme.

•  IfT(p)istheGmeittakestorunacomputaGononpprocessors

•  Aparallelprogramhasperfectspeedup(akalinearspeedup)if

•  Badnews:Noteveryprogramcanbeeffec%velyparallelized.–  infact,veryfewprogramswillscalewithperfectspeedups.–  wecertainlycan'tachieveperfectspeedupsautomaGcally–  limitedbysequenGalporGons,datatransfercosts,...

speedup(p)=T(1)/T(p)

T(1)/T(p)=speedup=p

Page 41: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

MostTroubling…

41

Most,butnotall,parallelandconcurrentprogrammingmodelsarefarhardertoworkwiththansequenGalones:

•  Theyintroducenondeterminism–  therootof(almostall)evil–  programpartssuddenlyhavemanydifferentoutcomes

•  theyhavedifferentoutcomesondifferentruns•  debuggingrequiresconsideringallofthepossibleoutcomes•  horribleheisenbugshardtotrackdown

•  Theyarenonmodular–  moduleAimplicitlyinfluencestheoutcomesofmoduleB

•  Theyintroducenewclassesoferrors–  racecondiGons,deadlocks

•  Theyintroducenewperformance/scalabilityproblems–  busy-waiGng,sequenGalizaGon,contenGon,

Page 42: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

InformalErrorRateChart

regularitywithwhichyoushootyourselfinthefoot

Page 43: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

InformalErrorRateChart

regularitywithwhichyoushootyourselfinthefoot

nullpointers,paucityoftypes,inheritence

manualmemorymanagement

kitchensink+manualmemory

heavenonearth

unstructuredparallelorconcurrentprogramming

Page 44: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

SolidParallelProgrammingRequires

44

1.GoodsequenGalprogrammingskills.–  allthethingswe’vebeentalkingabout:usemodules,types,...

2.DeepknowledgeoftheapplicaGon.3.Pickacorrect-by-construc%onparallelprogrammingmodel

–  wheneverpossible,aparallelmodelwithsemanGcsthatcoincideswithsequenGalsemanGcs•  wheneverpossible,reusewell-testedlibrariesthathideparallelism

–  wheneverpossible,amodelthatcutsdownnon-determinism–  wheneverpossible,amodelwithfewerpossibleconcurrencybugs–  ifbugscanarise,knowandusesafeprogrammingpaqerns

4.Carefulengineeringtoensurescaling.

–  unfortunately,thereissomeGmesatradeoff:•  reducednondeterminismcanleadtoreducedresourceuGlizaGon

–  synchronizaGon,communicaGoncostsmayneedopGmizaGon

Page 45: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

OURFIRSTPARALLELPROGRAMMINGMODEL:THREADS

Page 46: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Threads:AWarning•  ConcurrentThreadswithLocks:theclassicshoot-yourself-in-

the-footconcurrentprogrammingmodel–  alltheclassicerrormodes

•  WhyThreads?–  almostallprogramminglanguageswillhaveathreadslibrary

•  OCamlinparGcular!–  youneedtoknowwherethepiyallsare–  theassemblylanguageofconcurrentprogrammingparadigms

•  we’llusethreadstobuildseveralhigher-levelprogrammingmodels

Page 47: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Threads

47

•  Threads:anabstracGonofaprocessor.–  programmer(orcompiler)decidesthatsomeworkcanbedoneinparallelwithsomeotherwork,e.g.:

–  weforkathreadtorunthecomputaGoninparallel,e.g.:

let _ = compute_big_thing() in let y = compute_other_big_thing() in ...

let t = Thread.create compute_big_thing () in let y = compute_other_big_thing () in ...

Page 48: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

IntuiGoninPictures

48

let t = Thread.create f () in let y = g () in ...

Thread.create execute g () ...

processor1

(* doing nothing *) execute f () ...

processor2

Gme1Gme2Gme3

Page 49: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

OfCourse…

49

Supposeyouhave2availablecoresandyoufork4threads.InatypicalmulG-threadedsystem,

–  theoperaGngsystemprovidestheillusionthatthereareaninfinitenumberofprocessors.•  notreally:eachthreadconsumesspace,soifyouforktoomanythreadstheprocesswilldie.

–  it%me-mul%plexesthethreadsacrosstheavailableprocessors.•  aboutevery10msec,itstopsthecurrentthreadonaprocessor,andswitchestoanotherthread.

•  soathreadisreallyavirtualprocessor.

Page 50: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

OCaml,ConcurrencyandParallelismUnfortunately,evenifyourcomputerhas2,4,6,8cores,OCamlcannotexploitthem.ItmulGplexesallthreadsoverasinglecore

Hence,OCamlprovidesconcurrency,butnotparallelism.Why?BecauseOCaml(likePython)hasnoparallel“runGmesystem”orgarbagecollector.OtherfuncGonallanguages(Haskell,F#,...)do.Fortunately,whenthinkingaboutprogramcorrectness,itdoesn’tmaqerthatOCamlisnotparallel--IwilloSenpretendthatitis.YoucanhideI/Olatency,domulGprocessprogrammingordistributetasksamongstmulGplecomputersinOCaml.

core

thread …thread thread

Page 51: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

CoordinaGon

51

HowdowegetbacktheresultthattiscompuGng?

Thread.create : (‘a -> ‘b) -> ‘a -> Thread.t let t = Thread.create f () in let y = g () in ...

Page 52: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

FirstAqempt

52

let r = ref None let t = Thread.create (fun _ -> r := Some(f ())) in let y = g() in

match !r with | Some v -> (* compute with v and y *)

| None -> ???

What’swrongwiththis?

Page 53: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

SecondAqempt

53

let r = ref None let t = Thread.create (fun _ -> r := Some(f ())) in let y = g() in

let rec wait() =

match !r with | Some v -> v

| None -> wait()

in let v = wait() in (* compute with v and y *)

Page 54: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

TwoProblems

54

First,wearebusy-wai%ng.•  consumingcpuwithoutdoingsomethinguseful.•  theprocessorcouldbeeitherrunningausefulthread/programorpower

down.

let r = ref None let t = Thread.create (fun _ -> r := Some(f ())) in let y = g() in

let rec wait() =

match !r with | Some v -> v

| None -> wait()

in let v = wait() in (* compute with v and y *)

Page 55: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

TwoProblems

55

Second,anoperaGonliker:=Somevmaynotbeatomic.•  r:=SomevrequiresustocopythebytesofSomevintotherefr•  wemightseepartofthebytes(correspondingtoSome)beforewe’ve

wriqenintheotherparts(e.g.,v).•  Sothewaitermightseethewrongvalue.

let r = ref None let t = Thread.create (fun _ -> r := Some(f ())) in let y = g() in

let rec wait() =

match !r with | Some v -> v

| None -> wait()

in let v = wait() in (* compute with v and y *)

Page 56: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Atomicity

56

Considerthefollowing:

andsupposetwothreadsareincremenGngthesamerefr:Thread1 Thread2inc(r); inc(r); !r !r

IfriniGallyholds0,thenwhatwillThread1seewhenitreadsr?

let inc(r:int ref) = r := (!r) + 1

Page 57: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Atomicity

57

Theproblemisthatwecan’tseeexactlywhatinstrucGonsthecompilermightproducetoexecutethecode.Itmightlooklikethis:Thread1 Thread2EAX := load(r); EAX := load(r);

EAX := EAX + 1; EAX := EAX + 1; store EAX into r store EAX into r

EAX := load(r) EAX := load(r)

Page 58: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Atomicity

58

ButaclevercompilermightopGmizethisto:Thread1 Thread2EAX := load(r); EAX := load(r);

EAX := EAX + 1; EAX := EAX + 1;

store EAX into r store EAX into r EAX := load(r) EAX := load(r)

Page 59: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Atomicity

59

Furthermore,wedon’tknowwhentheOSmightinterruptonethreadandruntheother.Thread1 Thread2EAX := load(r); EAX := load(r);

EAX := EAX + 1; EAX := EAX + 1; store EAX into r store EAX into r

EAX := load(r) EAX := load(r) (ThesituaGonissimilar,butnotquitethesameonmulG-processorsystems.)

Page 60: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

TheHappensBeforeRelaGonWedon’tknowexactlywheneachinstrucGonwillexecute,buttherearesomeconstraints:theHappensBeforerelaGonRule1:Giventwoexpressions(orinstrucGons)insequence,e1;e2weknowthate1happensbeforee2.Rule2:Givenaprogram:lett=Thread.createfxin....Thread.joint;eweknowthat(fx)happensbeforee.

Page 61: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Atomicity

61

OnepossibleinterleavingoftheinstrucGons:Thread1 Thread2EAX := load(r); EAX := load(r); EAX := EAX + 1; EAX := EAX + 1;

store EAX into r store EAX into r EAX := load(r) EAX := load(r)

Whatanswerdoweget?

Page 62: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Atomicity

62

Anotherpossibleinterleaving:Thread1 Thread2EAX := load(r); EAX := load(r); EAX := EAX + 1; EAX := EAX + 1;

store EAX into r store EAX into r EAX := load(r) EAX := load(r)

WhatanswerdowegetthisGme?

Page 63: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Atomicity

63

Anotherpossibleinterleaving:Thread1 Thread2EAX := load(r); EAX := load(r); EAX := EAX + 1; EAX := EAX + 1; store EAX into r store EAX into r EAX := load(r) EAX := load(r)

WhatanswerdowegetthisGme?Moral:ThesystemisresponsibleforschedulingexecuGonofinstrucGons.Moral:Thiscanleadtoanenormousdegreeofnondeterminism.

Page 64: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Atomicity

64

Infact,today’smulGcoreprocessorsdon’ttreatmemoryinasequen%allyconsistentfashion.Thread1 Thread2EAX := load(r); EAX := load(r); EAX := EAX + 1; EAX := EAX + 1; store EAX into r store EAX into r EAX := load(r) EAX := load(r)

Thatmeansthatwecan’tevenassumethatwhatwewillseecorrespondstosomeinterleavingofthethreads’instruc%ons!Beyondthescopeofthisclass!Butthetake-awayisthis:It’snotagoodideatouseordinaryloads/storestosynchronizethreads;youshoulduseexplicitsynchronizaGonprimiGvessothehardwareandopGmizingcompilerdon’topGmizethemaway.

Page 65: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Atomicity

65

Infact,today’smulGcoreprocessorsdon’ttreatmemoryinasequen%allyconsistentfashion.Thatmeansthatwecan’tevenassumethatwhatwewillseecorrespondstosomeinterleavingofthethreads’instruc%ons!Beyondthescopeofthisclass!Butthetake-awayisthis:It’snotagoodideatouseordinaryloads/storestosynchronizethreads;youshoulduseexplicitsynchronizaGonprimiGvessothehardwareandopGmizingcompilerdon’topGmizethemaway.

Core1

L2cache

Core2

L1cache L1cache

ALU ALU

Core3

Core4

L1cache L1cache

ALU ALUWhenCore1storesto“memory”,itlazily

propagatestoCore2’sL1cache.TheloadatCore2mightnotseeit,unlessthereisanexplicitsynchronizaGon.

Page 66: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Summary:Interleaving&RaceCondiGons

66

CalculatepossibleoutcomesforaprogrambyconsideringallofthepossibleinterleavingsoftheatomicacGonsperformedbyeachthread.

–  Subjecttothehappens-beforerelaGon.•  can’thaveachildthread’sacGonshappeningbeforeaparentforksit.•  can’thavelaterinstrucGonsexecuteearlierinthesamethread.

–  Here,atomicmeansindivisibleacGons.•  Forexample,onmostmachinesreadingorwriGnga32-bitwordisatomic.•  But,wriGngamulG-wordobjectisusuallynotatomic.•  MostoperaGonslike“b:=b-w”areimplementedintermsofaseriesofsimpleroperaGonssuchas–  r1=read(b);r2=read(w);r3=r1–r2;write(b,r3)

Reasoningaboutallinterleavingsishard.justaboutimpossibleforpeople

–  NumberofinterleavingsgrowsexponenGallywithnumberofstatements.–  It’shardforustotellwhatisandisn’tatomicinahigh-levellanguage.–  YOUAREDOOMEDTOFAILIFYOUHAVETOWORRYABOUTTHISSTUFF!

Page 67: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

Summary:Interleaving&RaceCondiGons

67

CalculatepossibleoutcomesforaprogrambyconsideringallofthepossibleinterleavingsoftheatomicacGonsperformedbyeachthread.

–  Subjecttothehappens-beforerelaGon.•  can’thaveachildthread’sacGonshappeningbeforeaparentforksit.•  can’thavelaterinstrucGonsexecuteearlierinthesamethread.

–  Here,atomicmeansindivisibleacGons.•  Forexample,onmostmachinesreadingorwriGnga32-bitwordisatomic.•  But,wriGngamulG-wordobjectisusuallynotatomic.•  MostoperaGonslike“b:=b-w”areimplementedintermsofaseriesofsimpleroperaGonssuchas–  r1=read(b);r2=read(w);r3=r1–r2;write(b,r3)

Reasoningaboutallinterleavingsishard.justaboutimpossibleforpeople

–  NumberofinterleavingsgrowsexponenGallywithnumberofstatements.–  It’shardforustotellwhatisandisn’tatomicinahigh-levellanguage.–  YOUAREDOOMEDTOFAILIFYOUHAVETOWORRYABOUTTHISSTUFF!

WARNINGIfyouseepeopletalkaboutinterleavings,BEWARE!

Itprobablymeansthey’reassuming“sequenGalconsistency,”

whichisanoversimplified,naïvemodelofwhattheparallelcomputerreallydoes.

It’sactuallymorecomplicatedthanthat.

Page 68: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

AconvenGonalsoluGonforshared-memoryparallelism

Thread1 Thread2lock(mutex); lock(mutex);

inc(r); inc(r);

!r !r unlock(mutex); unlock(mutex);

GuaranteesmutualexclusionofthesecriGcalsecGons.ThissoluGonworks(evenforrealmachinesthatarenotsequenGallyconsistent),but…Complextoprogram,subjecttodeadlock,pronetobugs,notfault-tolerant,hardtoreasonabout.

let inc(r:int ref) = r := (!r) + 1

Page 69: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

AconvenGonalsoluGonforshared-memoryparallelism

Thread1 Thread2lock(mutex); lock(mutex);

inc(r); inc(r);

!r !r unlock(mutex); unlock(mutex);

GuaranteesmutualexclusionofthesecriGcalsecGons.ThissoluGonworks(evenforrealmachinesthatarenotsequenGallyconsistent),but…Complextoprogram,subjecttodeadlock,pronetobugs,notfault-tolerant,hardtoreasonabout.

let inc(r:int ref) = r := (!r) + 1

SynchronizaHon

Page 70: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

AnotherapproachtothecoordinaGonProblem

70

Howdowegetbacktheresultthattiscompu%ng?

Thread.create : (‘a -> ‘b) -> ‘a -> Thread.t let t = Thread.create f () in let y = g () in ...

Page 71: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

OneSoluGon(usingjoin)

71

let r = ref None

let t = Thread.create (fun _ -> r := Some(f ())) in let y = g() in

Thread.join t ;

match !r with | Some v -> (* compute with v and y *)

| None -> failwith “impossible”

Page 72: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

OneSoluGon(usingjoin)

72

let r = ref None

let t = Thread.create (fun _ -> r := Some(f ())) in let y = g() in

Thread.join t ;

match !r with | Some v -> (* compute with v and y *)

| None -> failwith “impossible”

Thread.join tcausesthecurrentthreadtowait

unGlthethreadtterminates.

Page 73: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

OneSoluGon(usingjoin)

73

let r = ref None

let t = Thread.create (fun _ -> r := Some(f ())) in let y = g() in

Thread.join t ;

match !r with | Some v -> (* compute with v and y *)

| None -> failwith “impossible”

SoaSerthejoin,weknowthatanyoftheoperaGons

ofthavecompleted.

SynchronizaHon

Page 74: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

InPictures

74

Thread1t=createfxinst1,1;inst1,2;inst1,3;inst1,4;…inst1,n-1;inst1,n;joint

Thread2inst2,1;inst2,2;inst2,3;…inst2,m;

WeknowthatforeachthreadthepreviousinstrucGonsmusthappenbeforethelaterinstrucGons.Soforinstance,inst1,1musthappenbeforeinst1,2.

Page 75: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

InPictures

75

Thread1t=createfxinst1,1;inst1,2;inst1,3;inst1,4;…inst1,n-1;inst1,n;joint

Thread2inst2,1;inst2,2;inst2,3;…inst2,m;

WealsoknowthattheforkmusthappenbeforethefirstinstrucGonofthesecondthread.

Page 76: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

InPictures

76

Thread1t=createfxinst1,1;inst1,2;inst1,3;inst1,4;…inst1,n-1;inst1,n;joint

Thread2inst2,1;inst2,2;inst2,3;…inst2,m;

WealsoknowthattheforkmusthappenbeforethefirstinstrucGonofthesecondthread.

Andthankstothejoin,weknowthatalloftheinstrucGonsofthesecondthreadmustbecompletedbeforethejoinfinishes.

Page 77: Parallelism and Concurrency · 2016. 11. 21. · OCaml, Concurrency and Parallelism Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It mulGplexes

InPictures

77

Thread1t=createfxinst1,1;inst1,2;inst1,3;inst1,4;…inst1,n-1;inst1,n;joint

Thread2inst2,1;inst2,2;inst2,3;…inst2,m;

However,ingeneral,wedonotknowwhetherinst1,iexecutesbeforeoraSerinst2,j.Ingeneral,synchroniza%oninstruc%onslikeforkandjoinreducethenumberofpossibleinterleavings.Synchroniza%oncutsdownnondeterminism.IntheabsenceofsynchronizaGonwedon’tknowanything…