Parallelism and Concurrency · 2016. 11. 21.

Parallelism and Concurrency

COS 326, David Walker
Princeton University

slides copyright 2013-2015 David Walker and Andrew W. Appel
permission granted to reuse these slides for non-commercial educational purposes

Parallelism

•  What is it?
•  Today's technology trends.
•  Then:
   –  Why is it so much harder to program?
      •  (Is it actually so much harder to program?)
   –  Some preliminary linguistic constructs
      •  thread creation
      •  our first parallel functional abstraction: futures

PARALLELISM: WHAT IS IT?

Parallelism

•  What is it?
   –  doing many things at the same time instead of sequentially (one-after-the-other).

Flavors of Parallelism

Data Parallelism
   –  same computation being performed on a collection of independent items
   –  e.g., adding two vectors of numbers

Task Parallelism
   –  different computations/programs running at the same time
   –  e.g., running web server and database

Pipeline Parallelism
   –  assembly line: sequential f, then sequential g
   –  map f over all items; map g over all items
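The data-parallel flavor can be sketched in OCaml: adding two vectors applies the same operation to each pair of independent elements. It is written sequentially here with Array.map2, but because each output position depends on nothing else, a parallel runtime could compute all positions at once. (Illustrative sketch, not from the slides.)

```ocaml
(* Data parallelism: the same computation (+) applied to a
   collection of independent items. Each output element depends
   only on one pair of inputs, so all positions could be
   computed in parallel with no coordination. *)
let vector_add (v1 : int array) (v2 : int array) : int array =
  Array.map2 (+) v1 v2

let () =
  assert (vector_add [|1; 2; 3|] [|10; 20; 30|] = [|11; 22; 33|])
```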

Parallelism vs. Concurrency

Parallelism: perform many tasks simultaneously
•  purpose: improve throughput
•  mechanism:
   –  many independent computing devices
   –  decrease run time of program by utilizing multiple cores or computers
•  e.g.: running your web crawler on a cluster versus one machine.

Concurrency: mediates multi-party access to shared resources
•  purpose: decrease response time
•  mechanism:
   –  switch between different threads of control
   –  work on one thread when it can make useful progress; when it can't, suspend it and work on another thread
•  e.g.: running your clock, editor, chat at the same time on a single CPU.
   –  OS gives each of these programs a small time-slice (~10 msec)
   –  often slows throughput due to cost of switching contexts
•  e.g.: don't block while waiting for I/O device to respond, but let another thread do useful CPU computation

Parallelism vs. Concurrency

Parallelism: perform several independent tasks simultaneously
   [diagram: one job spread across several cpus]

Concurrency: mediate/multiplex access to a shared resource (cpu, disk, server, data structure)
   [diagram: many jobs contending for one shared resource]

many efficient programs use some parallelism and some concurrency

UNDERSTANDING TECHNOLOGY TRENDS

Moore's Law

•  Moore's Law: The number of transistors you can put on a computer chip doubles (approximately) every couple of years.
•  Consequence for most of the history of computing: All programs double in speed every couple of years.
   –  Why? Hardware designers are wicked smart.
   –  They have been able to use those extra transistors to (for example) double the number of instructions executed per time unit, thereby doubling the processing speed of programs.
•  Consequence for application writers:
   –  watch TV for a while and your programs optimize themselves!
   –  perhaps more importantly: new applications thought impossible became possible because of increased computational power

CPU Clock Speeds from 1993-2005

[figure: CPU clock speed by year]

Next year's machine is twice as fast!

Oops!

CPU Power 1993-2005

[figure: CPU power consumption by year]

But power consumption is only part of the problem... cooling is the other!

The Heat Problem

[photos: 1993 Pentium heat sink vs. 2005 cooler]

Cray-4: 1994

•  Up to 64 processors
•  Running at 1 GHz
•  8 Megabytes of RAM
•  Cost: roughly $10M

The CRAY 2, 3, and 4 CPU and memory boards were immersed in a bath of electrically inert cooling fluid.

water cooled!

Power Dissipation

[figure: power dissipation of processors over time]

Darn! Intel engineers no longer optimize my programs while I watch TV!

[figure: power to chip peaking]

But look: Moore's Law still holds, so far, for transistors-per-chip.

What do we do with all those transistors?
1.  Multicore!
2.  System-on-chip with specialized coprocessors (such as GPUs)

Both of those are PARALLELISM

Parallelism

Why is it particularly important (today)?
–  Roughly every other year, a chip from Intel would:
   •  halve the feature size (size of transistors, wires, etc.)
   •  double the number of transistors
   •  double the clock speed
   •  this drove the economic engine of the IT industry (and the US!)
–  No longer able to double clock or cut voltage: a processor won't get any faster!
   •  (so why should you buy a new laptop, desktop, etc.?)
   •  power and heat are limitations on the clock
   •  errors, variability (noise) are limitations on the voltage
   •  but we can still pack a lot of transistors on a chip... (at least for another 10 to 15 years.)

Multi-core h/w – common L2

[diagram: two cores, each with two ALUs and an L1 cache, sharing an L2 cache backed by main memory]

Today... (actually 9 years ago!)

GPUs

•  There's nothing like video gaming to drive progress in computation!
•  GPUs can have hundreds or even thousands of cores
•  Three of the 5 most powerful supercomputers in the world take advantage of GPU acceleration.
•  Scientists use GPUs for simulation and modelling
   –  e.g.: protein folding and fluid dynamics

John Danskin, PhD Princeton 1994, Vice President for GPU architecture, Nvidia
(what he does with his spare time... built this car himself)

So...

Instead of trying to make your CPU go faster, Intel's just going to pack more CPUs onto a chip.
   –  a few years ago: dual core (2 CPUs).
   –  a little more recently: 4, 6, 8 cores.
   –  Intel is testing 48-core chips with researchers now.
   –  Within 10 years, you'll have ~1024 Intel CPUs on a chip.

In fact, that's already happening with graphics chips (e.g., Nvidia).
   –  really good at simple data parallelism (many deep pipes)
   –  but they are much dumber than an Intel core.
   –  and right now, chew up a lot of power.
   –  watch for GPUs to get "smarter" and more power efficient, while CPUs become more like GPUs.

STILL MORE PROCESSORS: THE DATA CENTER

Data Centers: Generation Z Supercomputers

Data Centers: Lots of Connected Computers!

Data Centers
•  10s or 100s of thousands of computers
•  All connected together
•  Motivated by new applications and scalable web services:
   –  let's catalogue all N billion web pages in the world
   –  let's allow anyone in the world to search for the page he or she needs
   –  let's process that search in less than a second
•  It's Amazing!
•  It's Magic!

Data Centers: Lots of Connected Computers
Computer containers for plug-and-play parallelism:
[photos: shipping-container data center modules]

Sounds Great!

•  So my old programs will run 2x, 4x, 48x, 256x, 1024x faster?
   –  no way!
   –  to upgrade from Intel 386 to 486, the app writer and compiler writer did not have to do anything (much)
      •  the IA 486 interpreted the same sequential stream of instructions; it just did it faster
      •  this is why we could watch TV while Intel engineers optimized our programs for us
   –  to upgrade from Intel 486 to dual core, we need to figure out how to split a single stream of instructions into two streams of instructions that collaborate to complete the same task.
      •  without work & thought, our programs don't get any faster at all
      •  it takes ingenuity to generate efficient parallel algorithms from sequential ones

What's the answer?

In Part: Functional Programming!

[logos of data-parallel systems: Dryad, Pig, Naiad]

PARALLEL AND CONCURRENT PROGRAMMING

Multicore Hardware & Data Centers

[diagram: two cores, each with two ALUs and an L1 cache, sharing an L2 cache backed by main memory]

Speedup

•  Speedup: the ratio of sequential program execution time to parallel execution time.
•  If T(p) is the time it takes to run a computation on p processors:

      speedup(p) = T(1) / T(p)

•  A parallel program has perfect speedup (aka linear speedup) if

      T(1) / T(p) = speedup(p) = p

•  Bad news: Not every program can be effectively parallelized.
   –  in fact, very few programs will scale with perfect speedups.
   –  we certainly can't achieve perfect speedups automatically
   –  limited by sequential portions, data transfer costs, ...
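The limit imposed by sequential portions is usually quantified by Amdahl's law (not named on the slides, but the standard formalization): if a fraction s of the running time is inherently sequential, then T(p) = T(1) * (s + (1 - s)/p), so speedup(p) = 1 / (s + (1 - s)/p), which can never exceed 1/s however many processors we add. A small OCaml sketch:

```ocaml
(* speedup(p) under Amdahl's law, where s is the fraction of the
   program that must run sequentially (0.0 <= s <= 1.0). *)
let speedup (s : float) (p : int) : float =
  let p = float_of_int p in
  1.0 /. (s +. (1.0 -. s) /. p)

let () =
  (* no sequential part: perfect (linear) speedup *)
  assert (speedup 0.0 8 = 8.0);
  (* 10% sequential work caps speedup below 1/s = 10,
     even on 64 processors *)
  assert (speedup 0.1 64 < 10.0)
```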

Most Troubling...

Most, but not all, parallel and concurrent programming models are far harder to work with than sequential ones:

•  They introduce nondeterminism
   –  the root of (almost all) evil
   –  program parts suddenly have many different outcomes
      •  they have different outcomes on different runs
      •  debugging requires considering all of the possible outcomes
      •  horrible heisenbugs hard to track down
•  They are nonmodular
   –  module A implicitly influences the outcomes of module B
•  They introduce new classes of errors
   –  race conditions, deadlocks
•  They introduce new performance/scalability problems
   –  busy-waiting, sequentialization, contention, ...

Informal Error Rate Chart

[chart: "regularity with which you shoot yourself in the foot" for different language features:
   null pointers, paucity of types, inheritance;
   manual memory management;
   kitchen sink + manual memory;
   heaven on earth;
   unstructured parallel or concurrent programming]

Solid Parallel Programming Requires

1.  Good sequential programming skills.
   –  all the things we've been talking about: use modules, types, ...
2.  Deep knowledge of the application.
3.  Pick a correct-by-construction parallel programming model
   –  whenever possible, a parallel model with semantics that coincides with sequential semantics
      •  whenever possible, reuse well-tested libraries that hide parallelism
   –  whenever possible, a model that cuts down non-determinism
   –  whenever possible, a model with fewer possible concurrency bugs
   –  if bugs can arise, know and use safe programming patterns
4.  Careful engineering to ensure scaling.
   –  unfortunately, there is sometimes a tradeoff:
      •  reduced nondeterminism can lead to reduced resource utilization
   –  synchronization, communication costs may need optimization

OUR FIRST PARALLEL PROGRAMMING MODEL: THREADS

Threads: A Warning

•  Concurrent Threads with Locks: the classic shoot-yourself-in-the-foot concurrent programming model
   –  all the classic error modes
•  Why Threads?
   –  almost all programming languages will have a threads library
      •  OCaml in particular!
   –  you need to know where the pitfalls are
   –  the assembly language of concurrent programming paradigms
      •  we'll use threads to build several higher-level programming models

Threads

•  Threads: an abstraction of a processor.
   –  programmer (or compiler) decides that some work can be done in parallel with some other work, e.g.:

        let _ = compute_big_thing () in
        let y = compute_other_big_thing () in
        ...

   –  we fork a thread to run the computation in parallel, e.g.:

        let t = Thread.create compute_big_thing () in
        let y = compute_other_big_thing () in
        ...

Intuition in Pictures

    let t = Thread.create f () in
    let y = g () in
    ...

                   time 1               time 2        time 3
    processor 1:   Thread.create        execute g ()  ...
    processor 2:   (* doing nothing *)  execute f ()  ...

Of Course...

Suppose you have 2 available cores and you fork 4 threads. In a typical multi-threaded system,
   –  the operating system provides the illusion that there are an infinite number of processors.
      •  not really: each thread consumes space, so if you fork too many threads the process will die.
   –  it time-multiplexes the threads across the available processors.
      •  about every 10 msec, it stops the current thread on a processor, and switches to another thread.
      •  so a thread is really a virtual processor.

OCaml, Concurrency and Parallelism

Unfortunately, even if your computer has 2, 4, 6, 8 cores, OCaml cannot exploit them. It multiplexes all threads over a single core:

    [diagram: thread, thread, ..., thread all mapped onto one core]

Hence, OCaml provides concurrency, but not parallelism. Why? Because OCaml (like Python) has no parallel "runtime system" or garbage collector. Other functional languages (Haskell, F#, ...) do.

Fortunately, when thinking about program correctness, it doesn't matter that OCaml is not parallel -- I will often pretend that it is. You can hide I/O latency, do multiprocess programming or distribute tasks amongst multiple computers in OCaml.

Coordination

    Thread.create : ('a -> 'b) -> 'a -> Thread.t

    let t = Thread.create f () in
    let y = g () in
    ...

How do we get back the result that t is computing?

First Attempt

    let r = ref None in
    let t = Thread.create (fun _ -> r := Some (f ())) () in
    let y = g () in
    match !r with
    | Some v -> (* compute with v and y *)
    | None -> ???

What's wrong with this?

Second Attempt

    let r = ref None in
    let t = Thread.create (fun _ -> r := Some (f ())) () in
    let y = g () in
    let rec wait () =
      match !r with
      | Some v -> v
      | None -> wait ()
    in
    let v = wait () in
    (* compute with v and y *)

Two Problems

First, we are busy-waiting.
•  consuming cpu without doing something useful.
•  the processor could be either running a useful thread/program or powered down.

    let r = ref None in
    let t = Thread.create (fun _ -> r := Some (f ())) () in
    let y = g () in
    let rec wait () =
      match !r with
      | Some v -> v
      | None -> wait ()
    in
    let v = wait () in
    (* compute with v and y *)

Two Problems

Second, an operation like r := Some v may not be atomic.
•  r := Some v requires us to copy the bytes of Some v into the ref r
•  we might see part of the bytes (corresponding to Some) before we've written in the other parts (e.g., v).
•  So the waiter might see the wrong value.

    let r = ref None in
    let t = Thread.create (fun _ -> r := Some (f ())) () in
    let y = g () in
    let rec wait () =
      match !r with
      | Some v -> v
      | None -> wait ()
    in
    let v = wait () in
    (* compute with v and y *)

Atomicity

Consider the following:

    let inc (r : int ref) = r := !r + 1

and suppose two threads are incrementing the same ref r:

    Thread 1          Thread 2
    inc(r);           inc(r);
    !r                !r

If r initially holds 0, then what will Thread 1 see when it reads r?
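To see why inc is not atomic, it helps to spell out the read/add/write steps that r := !r + 1 abbreviates. The OCaml sketch below (illustrative, not from the slides) makes each step an explicit binding, mirroring the load/add/store sequence discussed next; nothing prevents the scheduler from running the other thread between any two of these steps.

```ocaml
(* What r := !r + 1 decomposes into: a read, an add, a write.
   Each binding corresponds to one machine-level step. *)
let inc_steps (r : int ref) : unit =
  let v = !r in        (* EAX := load(r)   *)
  let v' = v + 1 in    (* EAX := EAX + 1   *)
  r := v'              (* store EAX into r *)

let () =
  let r = ref 0 in
  inc_steps r;
  inc_steps r;
  assert (!r = 2)   (* correct only because this run is single-threaded *)
```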

Atomicity

The problem is that we can't see exactly what instructions the compiler might produce to execute the code. It might look like this:

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r
    EAX := load(r)        EAX := load(r)

But a clever compiler might optimize the final load away (EAX already holds the value that was just stored):

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r

Atomicity

Furthermore, we don't know when the OS might interrupt one thread and run the other.

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r
    EAX := load(r)        EAX := load(r)

(The situation is similar, but not quite the same on multi-processor systems.)

The Happens Before Relation

We don't know exactly when each instruction will execute, but there are some constraints: the Happens Before relation.

Rule 1: Given two expressions (or instructions) in sequence, e1; e2, we know that e1 happens before e2.

Rule 2: Given a program:

    let t = Thread.create f x in
    ...
    Thread.join t;
    e

we know that (f x) happens before e.

Atomicity

One possible interleaving of the instructions:

    Thread 1              Thread 2
    EAX := load(r);       EAX := load(r);
    EAX := EAX + 1;       EAX := EAX + 1;
    store EAX into r      store EAX into r
    EAX := load(r)        EAX := load(r)

What answer do we get?

Another possible interleaving: what answer do we get this time?

Moral: The system is responsible for scheduling execution of instructions.
Moral: This can lead to an enormous degree of nondeterminism.

Atomicity

In fact, today's multicore processors don't treat memory in a sequentially consistent fashion. That means that we can't even assume that what we will see corresponds to some interleaving of the threads' instructions! Beyond the scope of this class!

But the take-away is this: It's not a good idea to use ordinary loads/stores to synchronize threads; you should use explicit synchronization primitives so the hardware and optimizing compiler don't optimize them away.

[diagram: Cores 1-4, each with an ALU and an L1 cache, sharing an L2 cache]

When Core 1 stores to "memory", it lazily propagates to Core 2's L1 cache. The load at Core 2 might not see it, unless there is an explicit synchronization.

Summary: Interleaving & Race Conditions

Calculate possible outcomes for a program by considering all of the possible interleavings of the atomic actions performed by each thread.
   –  Subject to the happens-before relation.
      •  can't have a child thread's actions happening before a parent forks it.
      •  can't have later instructions execute earlier in the same thread.
   –  Here, atomic means indivisible actions.
      •  For example, on most machines reading or writing a 32-bit word is atomic.
      •  But, writing a multi-word object is usually not atomic.
      •  Most operations like "b := b - w" are implemented in terms of a series of simpler operations such as
         –  r1 = read(b); r2 = read(w); r3 = r1 - r2; write(b, r3)

Reasoning about all interleavings is hard. Just about impossible for people:
   –  Number of interleavings grows exponentially with number of statements.
   –  It's hard for us to tell what is and isn't atomic in a high-level language.
   –  YOU ARE DOOMED TO FAIL IF YOU HAVE TO WORRY ABOUT THIS STUFF!

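The claim that the number of interleavings grows exponentially can be made concrete: two threads that execute n and m atomic instructions admit binomial(n+m, n) distinct interleavings, since an interleaving is just a choice of which n of the n+m execution slots belong to the first thread (more threads multiply this further). A small OCaml sketch, not from the slides:

```ocaml
(* Number of distinct interleavings of two threads executing
   n and m atomic instructions respectively: binomial(n+m, n). *)
let interleavings (n : int) (m : int) : int =
  let rec binom n k =
    if k = 0 || k = n then 1
    else binom (n - 1) (k - 1) + binom (n - 1) k
  in
  binom (n + m) n

let () =
  assert (interleavings 2 2 = 6);         (* 2+2 instructions: 6 schedules *)
  assert (interleavings 4 4 = 70);
  assert (interleavings 10 10 = 184756)   (* growth is explosive *)
```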

WARNING

If you see people talk about interleavings, BEWARE!

It probably means they're assuming "sequential consistency," which is an oversimplified, naive model of what the parallel computer really does.

It's actually more complicated than that.

A conventional solution for shared-memory parallelism

    let inc (r : int ref) = r := !r + 1

    Thread 1            Thread 2
    lock(mutex);        lock(mutex);
    inc(r);             inc(r);
    !r                  !r
    unlock(mutex);      unlock(mutex);

Guarantees mutual exclusion of these critical sections.

This solution works (even for real machines that are not sequentially consistent), but... it is complex to program, subject to deadlock, prone to bugs, not fault-tolerant, and hard to reason about.

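In OCaml's threads library, the lock/unlock pseudocode corresponds to the Mutex module (Mutex.create, Mutex.lock, Mutex.unlock). The sketch below (illustrative, not from the slides; it must be compiled against the threads library) runs the two-increment example with each increment in a critical section, joining both threads before reading the result:

```ocaml
(* The slide's lock/inc/unlock pattern in real OCaml. The mutex
   guarantees mutual exclusion of the critical sections, so the
   two increments cannot interleave. *)
let inc (r : int ref) = r := !r + 1

let locked_double_inc () : int =
  let r = ref 0 in
  let m = Mutex.create () in
  let body () =
    Mutex.lock m;
    inc r;
    Mutex.unlock m
  in
  let t1 = Thread.create body () in
  let t2 = Thread.create body () in
  Thread.join t1;
  Thread.join t2;
  !r

let () = assert (locked_double_inc () = 2)
```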

SYNCHRONIZATION

Another approach to the coordination problem

    Thread.create : ('a -> 'b) -> 'a -> Thread.t

    let t = Thread.create f () in
    let y = g () in
    ...

How do we get back the result that t is computing?

One Solution (using join)

    let r = ref None in
    let t = Thread.create (fun _ -> r := Some (f ())) () in
    let y = g () in
    Thread.join t;
    match !r with
    | Some v -> (* compute with v and y *)
    | None -> failwith "impossible"

Thread.join t causes the current thread to wait until the thread t terminates. So after the join, we know that all of the operations of t have completed.

In Pictures

    Thread 1              Thread 2
    t = create f x
    inst1,1;              inst2,1;
    inst1,2;              inst2,2;
    inst1,3;              inst2,3;
    ...                   ...
    inst1,n;              inst2,m;
    join t

We know that for each thread the previous instructions must happen before the later instructions. So for instance, inst1,1 must happen before inst1,2.

We also know that the fork must happen before the first instruction of the second thread.

And thanks to the join, we know that all of the instructions of the second thread must be completed before the join finishes.

However, in general, we do not know whether inst1,i executes before or after inst2,j. In general, synchronization instructions like fork and join reduce the number of possible interleavings. Synchronization cuts down nondeterminism. In the absence of synchronization we don't know anything...