Parallelism and Concurrency
COS 326, David Walker, Princeton University
slides copyright 2013-2015 David Walker and Andrew W. Appel
permission granted to reuse these slides for non-commercial educational purposes
Parallelism
• What is it?
• Today's technology trends.
• Then:
  – Why is it so much harder to program?
  – (Is it actually so much harder to program?)
  – Some preliminary linguistic constructs
    • thread creation
    • our first parallel functional abstraction: futures
PARALLELISM: WHAT IS IT?
Parallelism
• What is it?
  – doing many things at the same time instead of sequentially (one-after-the-other).
Flavors of Parallelism
Data Parallelism
  – same computation being performed on a collection of independent items
  – e.g., adding two vectors of numbers (see the sketch after this slide)
Task Parallelism
  – different computations/programs running at the same time
  – e.g., running a web server and a database
Pipeline Parallelism
  – assembly line: a sequential stage f feeds a sequential stage g
  – map f over all items, then map g over all items, with the two stages running at once
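To make the data-parallel example concrete, here is a small OCaml sketch of adding two vectors element-wise (the function name add_vectors is ours, not from the slides). Each element-wise addition is independent of the others, which is exactly what makes it a candidate for running in parallel; the version below is the plain sequential rendering.

    (* Adding two vectors of numbers: each iteration is independent,
       so the loop could in principle be split across cores. *)
    let add_vectors (v1 : float array) (v2 : float array) : float array =
      Array.init (Array.length v1) (fun i -> v1.(i) +. v2.(i))

    let _ = add_vectors [| 1.0; 2.0; 3.0 |] [| 10.0; 20.0; 30.0 |]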
Parallelism vs. Concurrency
Parallelism: perform many tasks simultaneously
• purpose: improve throughput
• mechanism:
  – many independent computing devices
  – decrease run time of a program by utilizing multiple cores or computers
• e.g.: running your web crawler on a cluster versus one machine.
Concurrency: mediate multi-party access to shared resources
• purpose: decrease response time
• mechanism:
  – switch between different threads of control
  – work on one thread when it can make useful progress; when it can't, suspend it and work on another thread
• e.g.: running your clock, editor, and chat at the same time on a single CPU.
  – the OS gives each of these programs a small time-slice (~10 msec)
  – often slows throughput due to the cost of switching contexts
• e.g.: don't block while waiting for an I/O device to respond, but let another thread do useful CPU computation
Parallelism vs. Concurrency
Parallelism: perform several independent tasks simultaneously
  [diagram: one job spread across many CPUs]
Concurrency: mediate/multiplex access to a shared resource (cpu, disk, server, data structure)
  [diagram: many jobs sharing one resource]
many efficient programs use some parallelism and some concurrency
UNDERSTANDING TECHNOLOGY TRENDS
Moore's Law
• Moore's Law: The number of transistors you can put on a computer chip doubles (approximately) every couple of years.
• Consequence for most of the history of computing: All programs double in speed every couple of years.
  – Why? Hardware designers are wicked smart.
  – They have been able to use those extra transistors to (for example) double the number of instructions executed per time unit, thereby doubling the processing speed of programs.
• Consequence for application writers:
  – watch TV for a while and your programs optimize themselves!
  – perhaps more importantly: new applications thought impossible became possible because of increased computational power
CPU Clock Speeds from 1993-2005
[chart: CPU clock speeds, 1993-2005]
Next year's machine is twice as fast!
Oops!
CPU Power 1993-2005
[chart: CPU power consumption, 1993-2005]
But power consumption is only part of the problem... cooling is the other!
The Heat Problem
[photos: a 1993 Pentium heat sink next to a 2005 cooler]
Cray-4: 1994
Up to 64 processors, running at 1 GHz, 8 Megabytes of RAM. Cost: roughly $10M.
The CRAY 2, 3, and 4 CPU and memory boards were immersed in a bath of electrically inert cooling fluid.
water cooled!
Power Dissipation
[chart: power dissipation trends]
Darn! Intel engineers no longer optimize my programs while I watch TV!
Power to chip peaking
[chart: power delivered to the chip, leveling off]
But look: Moore's Law still holds, so far, for transistors-per-chip.
What do we do with all those transistors?
1. Multicore!
2. System-on-chip with specialized coprocessors (such as GPUs)
Both of those are PARALLELISM
Parallelism
Why is it particularly important (today)?
  – Roughly every other year, a chip from Intel would:
    • halve the feature size (size of transistors, wires, etc.)
    • double the number of transistors
    • double the clock speed
    • this drove the economic engine of the IT industry (and the US!)
  – No longer able to double the clock or cut the voltage: a processor won't get any faster!
    • (so why should you buy a new laptop, desktop, etc.?)
    • power and heat are limitations on the clock
    • errors and variability (noise) are limitations on the voltage
    • but we can still pack a lot of transistors on a chip... (at least for another 10 to 15 years.)
Multi-core h/w – common L2
[diagram: two cores, each with two ALUs and an L1 cache, sharing an L2 cache attached to main memory]
Today… (actually 9 years ago!)
GPUs
• There's nothing like video gaming to drive progress in computation!
• GPUs can have hundreds or even thousands of cores
• Three of the 5 most powerful supercomputers in the world take advantage of GPU acceleration.
• Scientists use GPUs for simulation and modelling
  – e.g.: protein folding and fluid dynamics
John Danskin, PhD Princeton 1994, Vice President for GPU Architecture, Nvidia (what he does with his spare time… built this car himself)
So…
Instead of trying to make your CPU go faster, Intel's just going to pack more CPUs onto a chip.
  – a few years ago: dual core (2 CPUs).
  – a little more recently: 4, 6, 8 cores.
  – Intel is testing 48-core chips with researchers now.
  – Within 10 years, you'll have ~1024 Intel CPUs on a chip.
In fact, that's already happening with graphics chips (e.g., Nvidia).
  – really good at simple data parallelism (many deep pipes)
  – but they are much dumber than an Intel core.
  – and right now, they chew up a lot of power.
  – watch for GPUs to get "smarter" and more power efficient, while CPUs become more like GPUs.
STILL MORE PROCESSORS: THE DATA CENTER
Data Centers: Generation Z Supercomputers
Data Centers: Lots of Connected Computers!
Data Centers
• 10s or 100s of thousands of computers
• All connected together
• Motivated by new applications and scalable web services:
  – let's catalogue all N billion web pages in the world
  – let's allow anyone in the world to search for the page he or she needs
  – let's process that search in less than a second
• It's Amazing!
• It's Magic!
Data Centers: Lots of Connected Computers
Computer containers for plug-and-play parallelism:
Sounds Great!
• So my old programs will run 2x, 4x, 48x, 256x, 1024x faster?
  – no way!
  – to upgrade from an Intel 386 to a 486, the app writer and compiler writer did not have to do anything (much)
    • the 486 interpreted the same sequential stream of instructions; it just did it faster
    • this is why we could watch TV while Intel engineers optimized our programs for us
  – to upgrade from an Intel 486 to a dual core, we need to figure out how to split a single stream of instructions into two streams of instructions that collaborate to complete the same task.
    • without work & thought, our programs don't get any faster at all
    • it takes ingenuity to generate efficient parallel algorithms from sequential ones
What's the answer?
In Part: Functional Programming!
Dryad
Pig
Naiad
PARALLEL AND CONCURRENT PROGRAMMING
Multicore Hardware & Data Centers
[diagram: two cores, each with two ALUs and an L1 cache, sharing an L2 cache attached to main memory]
Speedup
• Speedup: the ratio of sequential program execution time to parallel execution time.
• If T(p) is the time it takes to run a computation on p processors:

    speedup(p) = T(1) / T(p)

• A parallel program has perfect speedup (aka linear speedup) if

    T(1) / T(p) = speedup(p) = p

• Bad news: Not every program can be effectively parallelized.
  – in fact, very few programs will scale with perfect speedups.
  – we certainly can't achieve perfect speedups automatically.
  – limited by sequential portions, data transfer costs, ...
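A quick worked example (the numbers are ours, not from the slides): if a program takes T(1) = 60 seconds on one processor and T(8) = 10 seconds on eight, then speedup(8) = 60/10 = 6, which falls short of the perfect (linear) speedup of 8.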
Most Troubling…
Most, but not all, parallel and concurrent programming models are far harder to work with than sequential ones:
• They introduce nondeterminism
  – the root of (almost all) evil
  – program parts suddenly have many different outcomes
    • they have different outcomes on different runs
    • debugging requires considering all of the possible outcomes
    • horrible heisenbugs, hard to track down
• They are nonmodular
  – module A implicitly influences the outcomes of module B
• They introduce new classes of errors
  – race conditions, deadlocks
• They introduce new performance/scalability problems
  – busy-waiting, sequentialization, contention, ...
Informal Error Rate Chart
[chart: regularity with which you shoot yourself in the foot, for various language features]
  – null pointers, paucity of types, inheritance
  – manual memory management
  – kitchen sink + manual memory
  – heaven on earth
  – unstructured parallel or concurrent programming
Solid Parallel Programming Requires
1. Good sequential programming skills.
  – all the things we've been talking about: use modules, types, ...
2. Deep knowledge of the application.
3. Pick a correct-by-construction parallel programming model
  – whenever possible, a parallel model with semantics that coincides with sequential semantics
    • whenever possible, reuse well-tested libraries that hide parallelism
  – whenever possible, a model that cuts down non-determinism
  – whenever possible, a model with fewer possible concurrency bugs
  – if bugs can arise, know and use safe programming patterns
4. Careful engineering to ensure scaling.
  – unfortunately, there is sometimes a tradeoff:
    • reduced nondeterminism can lead to reduced resource utilization
  – synchronization and communication costs may need optimization
OUR FIRST PARALLEL PROGRAMMING MODEL: THREADS
Threads: A Warning
• Concurrent Threads with Locks: the classic shoot-yourself-in-the-foot concurrent programming model
  – all the classic error modes
• Why Threads?
  – almost all programming languages will have a threads library
    • OCaml in particular!
  – you need to know where the pitfalls are
  – the assembly language of concurrent programming paradigms
    • we'll use threads to build several higher-level programming models
Threads
• Threads: an abstraction of a processor.
  – programmer (or compiler) decides that some work can be done in parallel with some other work, e.g.:

    let _ = compute_big_thing () in
    let y = compute_other_big_thing () in
    ...

  – we fork a thread to run the computation in parallel, e.g.:

    let t = Thread.create compute_big_thing () in
    let y = compute_other_big_thing () in
    ...
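As a self-contained sketch (ours, not from the slides) of the fragment above: the placeholder workloads below are made up, the program assumes OCaml's threads library is linked (e.g., "(libraries threads.posix)" in a dune file), and it uses Thread.join, introduced later in these slides, only so the program does not exit before the forked thread finishes.

    (* Fork one computation on a new thread while the current thread
       runs another one. Requires linking the threads library. *)
    let compute_big_thing () =
      let sum = ref 0 in
      for i = 1 to 10_000_000 do sum := !sum + i done;
      Printf.printf "big thing done: %d\n%!" !sum

    let compute_other_big_thing () =
      Printf.printf "other big thing done\n%!"

    let () =
      let t = Thread.create compute_big_thing () in  (* runs on a new thread *)
      compute_other_big_thing ();                    (* runs on this thread  *)
      Thread.join t   (* wait, so the process doesn't exit before t finishes *)

The two "done" lines can appear in either order from run to run, which is the nondeterminism discussed later in the lecture.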
Intuition in Pictures

    let t = Thread.create f () in
    let y = g () in
    ...

                  time 1                time 2          time 3
    processor 1:  Thread.create         execute g ()    ...
    processor 2:  (* doing nothing *)   execute f ()    ...
Of Course…
Suppose you have 2 available cores and you fork 4 threads. In a typical multi-threaded system,
  – the operating system provides the illusion that there are an infinite number of processors.
    • not really: each thread consumes space, so if you fork too many threads the process will die.
  – it time-multiplexes the threads across the available processors.
    • about every 10 msec, it stops the current thread on a processor and switches to another thread.
    • so a thread is really a virtual processor.
OCaml, Concurrency and Parallelism
Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It multiplexes all threads over a single core.
[diagram: many threads multiplexed onto one core]
Hence, OCaml provides concurrency, but not parallelism.
Why? Because OCaml (like Python) has no parallel "runtime system" or garbage collector. Other functional languages (Haskell, F#, ...) do.
Fortunately, when thinking about program correctness, it doesn't matter that OCaml is not parallel -- I will often pretend that it is.
You can hide I/O latency, do multiprocess programming, or distribute tasks amongst multiple computers in OCaml.
Coordination
How do we get back the result that t is computing?

    Thread.create : ('a -> 'b) -> 'a -> Thread.t

    let t = Thread.create f () in
    let y = g () in
    ...
First Attempt

    let r = ref None
    let t = Thread.create (fun _ -> r := Some (f ())) () in
    let y = g () in
    match !r with
    | Some v -> (* compute with v and y *)
    | None -> ???

What's wrong with this?
Second Attempt

    let r = ref None
    let t = Thread.create (fun _ -> r := Some (f ())) () in
    let y = g () in
    let rec wait () =
      match !r with
      | Some v -> v
      | None -> wait ()
    in
    let v = wait () in
    (* compute with v and y *)
Two Problems
First, we are busy-waiting.
• consuming cpu without doing anything useful.
• the processor could either be running a useful thread/program or powering down.
Second, an operation like r := Some v may not be atomic.
• r := Some v requires us to copy the bytes of Some v into the ref r
• we might see part of the bytes (corresponding to Some) before we've written the other parts (e.g., v).
• So the waiter might see the wrong value.
Atomicity
Consider the following:

    let inc (r : int ref) = r := (!r) + 1

and suppose two threads are incrementing the same ref r:

    Thread 1        Thread 2
    inc(r);         inc(r);
    !r              !r

If r initially holds 0, then what will Thread 1 see when it reads r?
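A hedged sketch (ours, not from the slides) that makes this race concrete: two threads each call inc on the same ref many times. Whether lost updates actually show up depends on the runtime and scheduler (on OCaml's single-core threads they often won't), but the reasoning problem is exactly the one the next slides analyze.

    (* Two threads hammer the same ref with an unsynchronized increment.
       Requires linking the threads library. *)
    let inc (r : int ref) = r := (!r) + 1

    let () =
      let r = ref 0 in
      let n = 1_000_000 in
      let worker () = for _ = 1 to n do inc r done in
      let t1 = Thread.create worker () in
      let t2 = Thread.create worker () in
      Thread.join t1; Thread.join t2;
      (* May print less than 2 * n if increments interleave badly. *)
      Printf.printf "final value: %d (expected %d)\n" !r (2 * n)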
Atomicity
The problem is that we can't see exactly what instructions the compiler might produce to execute the code. It might look like this:

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r
    EAX := load(r)        EAX := load(r)
Atomicity
But a clever compiler might optimize this to:

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r

(the final reload of r is dropped: EAX already holds the value just stored)
Atomicity
Furthermore, we don't know when the OS might interrupt one thread and run the other.

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r
    EAX := load(r)        EAX := load(r)

(The situation is similar, but not quite the same, on multi-processor systems.)
The Happens-Before Relation
We don't know exactly when each instruction will execute, but there are some constraints: the happens-before relation.
Rule 1: Given two expressions (or instructions) in sequence, e1; e2, we know that e1 happens before e2.
Rule 2: Given a program

    let t = Thread.create f x in
    ...
    Thread.join t;
    e

we know that (f x) happens before e.
Atomicity
One possible interleaving of the instructions:

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r
    EAX := load(r)        EAX := load(r)

What answer do we get?
Atomicity
Another possible interleaving:

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r
    EAX := load(r)        EAX := load(r)

What answer do we get this time?
Atomicity
Another possible interleaving:

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r
    EAX := load(r)        EAX := load(r)

What answer do we get this time?
Moral: The system is responsible for scheduling the execution of instructions.
Moral: This can lead to an enormous degree of nondeterminism.
Atomicity
In fact, today's multicore processors don't treat memory in a sequentially consistent fashion.

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r
    EAX := load(r)        EAX := load(r)

That means that we can't even assume that what we will see corresponds to some interleaving of the threads' instructions! Beyond the scope of this class! But the take-away is this: It's not a good idea to use ordinary loads/stores to synchronize threads; you should use explicit synchronization primitives so the hardware and optimizing compiler don't optimize them away.
Atomicity
[diagram: Cores 1-4, each with its own ALUs and L1 cache, sharing an L2 cache]
When Core 1 stores to "memory", it lazily propagates to Core 2's L1 cache. The load at Core 2 might not see it, unless there is an explicit synchronization.
Summary: Interleaving & Race Conditions
Calculate possible outcomes for a program by considering all of the possible interleavings of the atomic actions performed by each thread.
  – Subject to the happens-before relation.
    • can't have a child thread's actions happening before a parent forks it.
    • can't have later instructions execute earlier in the same thread.
  – Here, atomic means indivisible actions.
    • For example, on most machines reading or writing a 32-bit word is atomic.
    • But writing a multi-word object is usually not atomic.
    • Most operations like "b := b - w" are implemented in terms of a series of simpler operations such as
      – r1 = read(b); r2 = read(w); r3 = r1 - r2; write(b, r3)
Reasoning about all interleavings is hard -- just about impossible for people.
  – The number of interleavings grows exponentially with the number of statements.
  – It's hard for us to tell what is and isn't atomic in a high-level language.
  – YOU ARE DOOMED TO FAIL IF YOU HAVE TO WORRY ABOUT THIS STUFF!
WARNING
If you see people talk about interleavings, BEWARE!
It probably means they're assuming "sequential consistency," which is an oversimplified, naive model of what the parallel computer really does.
It's actually more complicated than that.
A conventional solution for shared-memory parallelism

    Thread 1           Thread 2
    lock(mutex);       lock(mutex);
    inc(r);            inc(r);
    !r                 !r
    unlock(mutex);     unlock(mutex);

    let inc (r : int ref) = r := (!r) + 1

Guarantees mutual exclusion of these critical sections.
This solution works (even for real machines that are not sequentially consistent), but...
Complex to program, subject to deadlock, prone to bugs, not fault-tolerant, hard to reason about.
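A hedged OCaml rendering of this lock/unlock pattern (our sketch, not from the slides), using the standard Mutex module that comes with the threads library; the critical section covers both the increment and the read, so each thread observes a consistent value.

    (* Mutual exclusion around inc and the following read. *)
    let inc (r : int ref) = r := (!r) + 1

    let () =
      let r = ref 0 in
      let m = Mutex.create () in
      let worker () =
        Mutex.lock m;
        inc r;
        let observed = !r in        (* read while still holding the lock *)
        Mutex.unlock m;
        Printf.printf "saw %d\n%!" observed
      in
      let t1 = Thread.create worker () in
      let t2 = Thread.create worker () in
      Thread.join t1; Thread.join t2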
Synchronization
Another approach to the coordination problem
How do we get back the result that t is computing?

    Thread.create : ('a -> 'b) -> 'a -> Thread.t

    let t = Thread.create f () in
    let y = g () in
    ...
One Solution (using join)

    let r = ref None
    let t = Thread.create (fun _ -> r := Some (f ())) () in
    let y = g () in
    Thread.join t;
    match !r with
    | Some v -> (* compute with v and y *)
    | None -> failwith "impossible"
Thread.join t causes the current thread to wait until the thread t terminates.
So after the join, we know that all of the operations of t have completed.
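For concreteness, here is a self-contained version of this join-based pattern (the workloads f and g are placeholders of ours, not the slides'); it assumes the threads library is linked.

    (* Retrieve a forked thread's result via a ref, synchronized by join. *)
    let f () = List.fold_left (+) 0 (List.init 1000 (fun i -> i))  (* some work  *)
    let g () = 42                                                  (* other work *)

    let () =
      let r = ref None in
      let t = Thread.create (fun _ -> r := Some (f ())) () in
      let y = g () in
      Thread.join t;          (* (f ()) happens before everything below *)
      match !r with
      | Some v -> Printf.printf "f gave %d, g gave %d\n" v y
      | None -> failwith "impossible"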
Synchronization
In Pictures

    Thread 1            Thread 2
    t = create f x      inst2,1;
    inst1,1;            inst2,2;
    inst1,2;            inst2,3;
    inst1,3;            ...
    inst1,4;            inst2,m;
    ...
    inst1,n-1;
    inst1,n;
    join t

We know that for each thread the previous instructions must happen before the later instructions. So, for instance, inst1,1 must happen before inst1,2.
We also know that the fork must happen before the first instruction of the second thread.
And thanks to the join, we know that all of the instructions of the second thread must be completed before the join finishes.
However, in general, we do not know whether inst1,i executes before or after inst2,j.
In general, synchronization instructions like fork and join reduce the number of possible interleavings. Synchronization cuts down nondeterminism.
In the absence of synchronization, we don't know anything…