
9/27/17

Concurrent systems
Lecture 1: Introduction to concurrency, threads, and mutual exclusion

Michaelmas 2017
Dr Robert N. M. Watson
(With thanks to Dr Steven Hand)

Concurrent and distributed systems

• One course, two parts
  – 8 lectures on concurrent systems
  – 8 further lectures on distributed systems
• Similar interests and concerns:
  – Scalability given parallelism and distributed systems
  – Mask local or distributed communications latency
  – Importance of observing (or enforcing) execution orders
  – Correctness in the presence of concurrency (+ debugging)
• Important differences
  – Underlying primitives: shared memory vs. message passing
  – Distributed systems experience communications failure
  – Distributed systems (may) experience unbounded latency
  – (Further) difficulty of distributed time


Concurrent systems outline

1. Introduction to concurrency, threads, and mutual exclusion
2. More mutual exclusion, semaphores, producer-consumer, and MRSW
3. CCR, monitors, concurrency in practice
4. Safety and liveness
5. Concurrency without shared data; transactions
6. Further transactions
7. Crash recovery; lock-free programming; TM
8. Concurrent systems case study: FreeBSD kernel

Recommended reading

• "Operating Systems, Concurrent and Distributed Software Design", Jean Bacon and Tim Harris, Addison-Wesley 2003
• "Modern Operating Systems" (3rd Ed), Andrew Tanenbaum, Prentice-Hall 2007
• "Java Concurrency in Practice", Brian Goetz and others, Addison-Wesley 2006

Throughout the term, I will suggest you look in Bacon and Harris for more detailed explanations of algorithms, as I can only present sketches in lecture.


What is concurrency?

• Computers appear to do many things at once
  – E.g. running multiple programs on your laptop
  – E.g. writing back data buffered in memory to the hard disk while the program(s) continue to execute
• In the first case, this may actually be an illusion
  – E.g. processes time sharing a single CPU
• In the second, there is true parallelism
  – E.g. Direct Memory Access (DMA) transfers data between memory and I/O devices (e.g., NIC, SATA) at the same time as the CPU executes code
  – E.g., two CPUs execute code at the same time
• In both cases, we have concurrency
  – Many things are occurring "at the same time"

In this course we will

• Investigate concurrency in computer systems
  – Processes, threads, interrupts, hardware
• Consider how to control concurrency
  – Mutual exclusion (locks, semaphores), condition synchronization, lock-free programming
• Learn about deadlock, livelock, priority inversion
  – And prevention, avoidance, detection, recovery
• See how abstraction can provide support for correct & fault-tolerant concurrent execution
  – Transactions, serialisability, concurrency control
• Explore a detailed concurrent software case study
• Later, we will extend these ideas to distributed systems


Recall: Processes and threads

• Processes are instances of programs in execution
  – OS unit of protection & resource allocation
  – Has a virtual address space; and one or more threads
• Threads are entities managed by the scheduler
  – Represents an individual execution context
  – A thread control block (TCB) holds the saved context (registers, including stack pointer), scheduler info, etc.
• Threads run in the address spaces of their process
  – (and also in the kernel address space on behalf of user code)
• Context switches occur when the OS saves the state of one thread and restores the state of another
  – If a switch is between threads in different processes, then process state is also switched, e.g., the address space
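
To make the TCB and context-switch description above more concrete, here is a minimal C sketch that models both in ordinary user-space code. Every name in it (struct tcb, struct cpu, context_switch, NUM_REGS, the field names) is hypothetical and chosen for illustration; real kernels keep far more state and perform the register save/restore in architecture-specific assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 32   /* assumption: a MIPS-like register file */

enum sched_state { THREAD_RUNNABLE, THREAD_RUNNING, THREAD_BLOCKED };

/* Hypothetical thread control block: saved context plus scheduler info. */
struct tcb {
    uint64_t regs[NUM_REGS];  /* saved general-purpose registers */
    uint64_t pc;              /* saved program counter */
    uint64_t sp;              /* saved stack pointer */
    enum sched_state state;   /* scheduler bookkeeping */
    int process_id;           /* owning process, i.e. which address space */
};

/* Hypothetical model of the CPU state that a context switch swaps. */
struct cpu {
    uint64_t regs[NUM_REGS];
    uint64_t pc, sp;
    int current_process;
};

/* Save the outgoing thread's state, restore the incoming thread's state.
 * Returns 1 if process state (the address space) also had to be switched. */
int context_switch(struct cpu *cpu, struct tcb *from, struct tcb *to) {
    assert(cpu && from && to);
    memcpy(from->regs, cpu->regs, sizeof cpu->regs);
    from->pc = cpu->pc;
    from->sp = cpu->sp;
    from->state = THREAD_RUNNABLE;

    memcpy(cpu->regs, to->regs, sizeof cpu->regs);
    cpu->pc = to->pc;
    cpu->sp = to->sp;
    to->state = THREAD_RUNNING;

    int switched_process = (from->process_id != to->process_id);
    cpu->current_process = to->process_id;
    return switched_process;
}
```

Calling context_switch() with two TCBs that share a process_id returns 0, which corresponds to the cheaper intra-process switch described in the last bullet.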

Concurrency with a single CPU (1)

• Process / OS concurrency
  – Process X runs for a while (until blocks or interrupted)
  – OS runs for a while (e.g. does some TCP processing)
  – Process X resumes where it left off…
• Inter-process concurrency
  – Process X runs for a while; then OS; then Process Y; then OS; then Process Z; etc
• Intra-process concurrency
  – Process X has multiple threads X1, X2, X3, …
  – X1 runs for a while; then X3; then X1; then X2; then …


Concurrency with a single CPU (2)

• With just one CPU, can think of concurrency as interleaving of different executions, e.g.

[Figure: timeline of one CPU interleaving Proc(A), Proc(B), Proc(C), and the OS; the switches are labelled timer interrupt, disk interrupt, system call, and page fault]

• Exactly where execution is interrupted and resumed is not usually known in advance…
  – Non-deterministic, or so complex as to be unpredictable
  – This makes concurrency challenging!
• Generally should assume worst-case behavior

Concurrency with multiple processors

• Many modern systems have multiple CPUs
  – And even if they don't, they have other processing elements
• Hence things can occur in parallel, e.g.

[Figure: two parallel timelines, CPU0 and CPU1, each interleaving processes Proc(A) to Proc(D) with the OS]

• Notice that the OS runs on both CPUs: tricky!
• More generally can have different threads of the same process executing on different CPUs too


What might this code do?

    #define NUMTHREADS 4
    char *threadstr = "Thread";          // Global variables are shared by all threads

    void main(void) {                    // main() is called once at startup
        threadid_t threads[NUMTHREADS];  // Thread IDs
        int i;                           // Counter

        for (i = 0; i < NUMTHREADS; i++) // Additional threads are started explicitly
            threads[i] = thread_create(threadfn, i);

        for (i = 0; i < NUMTHREADS; i++)
            thread_join(threads[i]);
    }

    void threadfn(int threadnum) {       // Each thread has its own local variables
        sleep(rand(2));                  // Sleep 0 or 1 seconds
        printf("%s %d\n", threadstr, threadnum);
    }

What orders could the printfs run in?

Possible orderings of this program

• What order could the printf()s occur in?
• Two sources of non-determinism in the example:
  – Program non-determinism: threads randomly sleep 0 or 1 seconds before printing
  – Thread scheduling non-determinism: arbitrary order for unprioritised, concurrent wakeups, preemptions
• There are 4! = 24 valid permutations
  – Assuming printf() is indivisible
  – Is printf() indivisible? Maybe.
• Even more potential timings of printf()s
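
The slide's program uses an invented thread API, so as a rough runnable counterpart here is a sketch using POSIX threads. The assumptions are mine, not the lecture's: pthread_create/pthread_join stand in for thread_create/thread_join, the sleep is 0 or 1 milliseconds rather than seconds, and a mutex makes "print and record" indivisible so that the completion order can be inspected afterwards in the hypothetical order[] array.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUMTHREADS 4

static const char *threadstr = "Thread";  /* global: shared by all threads */
static int order[NUMTHREADS];             /* completion order, shared */
static int finished = 0;                  /* protected by order_lock */
static pthread_mutex_t order_lock = PTHREAD_MUTEX_INITIALIZER;

static void *threadfn(void *arg) {
    int threadnum = (int)(intptr_t)arg;   /* local: private to this thread */
    unsigned seed = (unsigned)threadnum + 1u;

    /* Assumption: sleep 0 or 1 ms instead of the slide's 0 or 1 s. */
    struct timespec ts = { 0, (long)(rand_r(&seed) % 2) * 1000000L };
    nanosleep(&ts, NULL);

    pthread_mutex_lock(&order_lock);      /* make print + record indivisible */
    printf("%s %d\n", threadstr, threadnum);
    order[finished++] = threadnum;
    pthread_mutex_unlock(&order_lock);
    return NULL;
}

/* Starts NUMTHREADS threads, waits for all of them, returns how many ran. */
int run_threads(void) {
    pthread_t threads[NUMTHREADS];
    finished = 0;
    for (intptr_t i = 0; i < NUMTHREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, threadfn, (void *)i);
        assert(rc == 0);
    }
    for (int i = 0; i < NUMTHREADS; i++)
        pthread_join(threads[i], NULL);
    return finished;
}
```

After run_threads() returns, order[] holds one of the 24 permutations of 0..3; which one can differ from run to run.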


Multiple threads within a process

• A single-threaded process has code, a heap, a stack, registers
• Additional threads have their own registers and stacks
  – Per-thread program counters ($pc) allow execution flows to differ
  – Per-thread stack pointers ($sp) allow call stacks, local variables to differ
• Heap and code (+ global variables) are shared between all threads
• Access to another thread's stack is possible in some languages
  – but deeply discouraged!

[Figure: a process address space containing shared Code and Heap, plus, for each of Thread 1 and Thread 2, its own registers ($pc, $sp, $a0, $a1, $t0) and its own Stack]
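
The sharing rules above can be sketched with POSIX threads: each thread's local variable lives on its own stack, while a heap block reached through a global pointer is visible to all of them. This is only an illustration under my own assumptions; the names worker, run_workers, and shared_heap are hypothetical, and each thread writes a distinct slot so that no locking is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NTHREADS 2

static int *shared_heap;   /* global pointer to a heap block: shared by all threads */

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    int local = id * 10;           /* lives on this thread's own stack */
    shared_heap[id] = local + 1;   /* each thread writes only its own slot */
    return NULL;
}

/* Runs the workers and returns the sum of what they left in the shared heap. */
int run_workers(void) {
    pthread_t t[NTHREADS];
    shared_heap = calloc(NTHREADS, sizeof *shared_heap);
    assert(shared_heap != NULL);

    for (intptr_t i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, (void *)i);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    int sum = 0;
    for (int i = 0; i < NTHREADS; i++)
        sum += shared_heap[i];     /* thread 0 wrote 1, thread 1 wrote 11 */
    free(shared_heap);
    return sum;
}
```

The two threads never see each other's local, yet both results arrive in the same heap block because the heap and the global pointer belong to the process, not to a thread.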

1:N - user-level threading

• Kernel only knows about (and schedules) processes
• A userspace library implements threads, context switching, scheduling, synchronisation, …
  – E.g., the JVM or a threading library
• Advantages
  – Lightweight creation/termination + context switch; application-specific scheduling; OS independence
• Disadvantages
  – Awkward to handle blocking system calls or page faults, preemption; cannot use multiple CPUs
• Very early 1990s!

[Figure: the kernel schedules processes P1 and P2 onto CPU1 and CPU2; threads T1, T2, T3 exist only inside P1, in userspace]
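
As a rough sketch of what a userspace threading library does, the following C code implements two cooperative user-level threads and a round-robin scheduler on top of the ucontext calls (getcontext, makecontext, swapcontext). This is my choice of mechanism for illustration, not something the lecture prescribes; these calls are obsolescent in POSIX but still provided by glibc. The names body, run_user_threads, and trace are hypothetical. The kernel sees a single thread throughout, which is exactly why this model cannot use multiple CPUs.

```c
#include <assert.h>
#include <ucontext.h>

#define NUSER 2
#define STACKSZ (64 * 1024)

static ucontext_t sched_ctx, thr_ctx[NUSER];
static char stacks[NUSER][STACKSZ];   /* one private stack per user thread */
static int done[NUSER];
static char trace[32];                /* records which thread ran, in order */
static int tpos;

/* Body of each user-level thread: run a step, then yield to the scheduler. */
static void body(int id) {
    for (int step = 0; step < 2; step++) {
        assert(tpos < (int)sizeof trace - 1);
        trace[tpos++] = (char)('A' + id);
        swapcontext(&thr_ctx[id], &sched_ctx);   /* cooperative yield */
    }
    done[id] = 1;
}   /* returning resumes uc_link, i.e. the scheduler */

/* Round-robin scheduler; returns the trace of thread steps as a string. */
const char *run_user_threads(void) {
    tpos = 0;
    for (int i = 0; i < NUSER; i++) {
        done[i] = 0;
        getcontext(&thr_ctx[i]);
        thr_ctx[i].uc_stack.ss_sp = stacks[i];
        thr_ctx[i].uc_stack.ss_size = STACKSZ;
        thr_ctx[i].uc_link = &sched_ctx;
        makecontext(&thr_ctx[i], (void (*)(void))body, 1, i);
    }
    int remaining = NUSER;
    while (remaining > 0) {
        for (int i = 0; i < NUSER; i++) {
            if (done[i])
                continue;
            swapcontext(&sched_ctx, &thr_ctx[i]);  /* user-level context switch */
            if (done[i])
                remaining--;
        }
    }
    trace[tpos] = '\0';
    return trace;
}
```

Because switches happen only at the explicit yield, the interleaving here is fully deterministic; a blocking system call inside body() would stall every user thread, which is the disadvantage listed above.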


1:1 - kernel-level threading

• Kernel provides threads directly
  – By default, a process has one thread…
  – …but can create more via system calls
• Kernel implements threads, thread context switching, scheduling, etc.
• Userspace thread library 1:1 maps user threads into kernel threads
• Advantages:
  – Handles preemption, blocking syscalls
  – Straightforward to use multiple CPUs
• Disadvantages:
  – Higher overhead (trap to kernel); less flexible; less portable
• Model of choice across major OSes
  – Windows, Linux, MacOS, FreeBSD, Solaris, …

[Figure: the kernel schedules threads T1, T2, T3 of process P1 directly onto CPU1 and CPU2]

M:N - hybrid threading

• Best of both worlds?
  – M:N threads, scheduler activations, …
• Kernel exposes a smaller number (M) of activations, typically 1:1 with CPUs
• Userspace schedules a larger number (N) of threads onto available activations
  – Kernel upcalls when a thread blocks, returning the activation to userspace
  – Kernel upcalls when a thread wakes up, userspace schedules it on an activation
  – Kernel controls maximum parallelism by limiting number of activations
• Removed from most OSes: why?
• Now: Virtual Machine Monitors (VMMs)
  – Each Virtual CPU (VCPU) is an activation
• Reappears in concurrency frameworks
  – E.g., Apple's Grand Central Dispatch (GCD)


Advantages of concurrency

• Allows us to overlap computation and I/O on a single machine
• Can simplify code structuring and/or improve responsiveness
  – e.g. one thread redraws the GUI, another handles user input, and another computes game logic
  – e.g. one thread per HTTP request
  – e.g. background GC thread in JVM/CLR
• Enables the seamless (?!) use of multiple CPUs: greater performance through parallel processing

Concurrent systems

• In general, have some number of processes…
  – …each with some number of threads…
  – …running on some number of computers…
  – …each with some number of CPUs.
• For this half of the course we'll focus on a single computer running a multi-threaded process
  – most problems & solutions generalize to multiple processes, CPUs, and machines, but are more complex
  – (we'll look at distributed systems later in the term)
• Challenge: threads will access shared resources concurrently via their common address space


Example: Housemates Buying Beer

• Thread 1 (person 1)
  1. Look in fridge
  2. If no beer, go buy beer
  3. Put beer in fridge
• Thread 2 (person 2)
  1. Look in fridge
  2. If no beer, go buy beer
  3. Put beer in fridge
• In most cases, this works just fine…
• But if both people look (step 1) before either refills the fridge (step 3)… we'll end up with too much beer!
• Obviously more worrying if "look in fridge" is "check reactor", and "buy beer" is "toggle safety system" ;-)

Solution #1: Leave a Note

• Thread 1 (person 1)
  1. Look in fridge
  2. If no beer & no note
     1. Leave note on fridge
     2. Go buy beer
     3. Put beer in fridge
     4. Remove note
• Thread 2 (person 2)
  1. Look in fridge
  2. If no beer & no note
     1. Leave note on fridge
     2. Go buy beer
     3. Put beer in fridge
     4. Remove note
• Probably works for human beings…
• But computers are stooopid!
• Can you see the problem?


Non-Solution #1: Leave a Note

    // thread 1                             // thread 2
    beer = checkFridge();                   beer = checkFridge();
    if (!beer) {                            if (!beer) {
        if (!note) {                            if (!note) {
            note = 1;                               note = 1;
            buyBeer();                              buyBeer();
            note = 0;                               note = 0;
        }                                       }
    }                                       }

• Easier to see with pseudo-code…

Non-Solution #1: Leave a Note (with context switches)

    // thread 1                             // thread 2
    beer = checkFridge();
    if (!beer) {
        if (!note) {
            // context switch
                                            beer = checkFridge();
                                            if (!beer) {
                                                if (!note) {
                                                    note = 1;
                                                    buyBeer();
                                                    note = 0;
                                                }
                                            }
            // context switch
            note = 1;
            buyBeer();
            note = 0;
        }
    }
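
Because the bad interleaving only happens occasionally on a real machine, it can help to replay it by hand. The sketch below is a single-threaded C simulation in which the two "context switches" are simply the order of the statements; struct fridge and the helper names are hypothetical, invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct fridge { int beer; int note; int purchases; };

/* The check half of the slide's code: if (!beer) { if (!note) { ... */
static bool decides_to_buy(const struct fridge *f) {
    return f->beer == 0 && f->note == 0;
}

/* The act half: note = 1; buyBeer(); note = 0; */
static void leave_note_and_buy(struct fridge *f) {
    f->note = 1;
    f->beer++;        /* buyBeer() */
    f->purchases++;
    f->note = 0;
}

/* Replays: thread 1 checks, switch, thread 2 runs, switch, thread 1 acts.
 * Returns how many times beer was bought. */
int replay_bad_interleaving(void) {
    struct fridge f = { 0, 0, 0 };

    bool t1_go = decides_to_buy(&f);     /* thread 1 sees no beer, no note */
    /* context switch to thread 2 */
    bool t2_go = decides_to_buy(&f);     /* thread 2 sees the same thing */
    if (t2_go)
        leave_note_and_buy(&f);
    /* context switch back: thread 1 acts on its now-stale decision */
    if (t1_go)
        leave_note_and_buy(&f);

    assert(f.note == 0);
    return f.purchases;
}
```

The function returns 2: both housemates bought beer, even though each one checked for a note first.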


Non-Solution #1: Leave a Note

• Of course this won't happen all the time
  – Need threads to interleave in just the right way (or just the wrong way ;-)
• Unfortunately code that is 'mostly correct' is much worse than code that is 'mostly wrong'!
  – Difficult to catch in testing, as it occurs rarely
  – May even go away when running under a debugger
    • e.g. one that only context switches threads when they block
    • (such bugs are sometimes called Heisenbugs)

Critical Sections & Mutual Exclusion

• The high-level problem here is that we have two threads trying to solve the same problem
  – Both execute buyBeer() concurrently
  – Ideally want only one thread doing that at a time
• We call this code a critical section
  – A piece of code which should never be concurrently executed by more than one thread
• Ensuring this involves mutual exclusion
  – If one thread is executing within a critical section, all other threads are prohibited from entering it


Achieving Mutual Exclusion

• One way is to let only one thread ever execute a particular critical section
  – e.g. a nominated beer buyer
  – but this restricts concurrency
• Alternatively our (broken) solution #1 was trying to provide mutual exclusion via the note
  – Leaving a note means "I'm in the critical section";
  – Removing the note means "I'm done"
  – But, as we saw, it didn't work ;-)
• This was because we could experience a context switch between reading 'note', and setting it

Non-Solution #1: Leave a Note

    // thread 1                             // thread 2
    beer = checkFridge();
    if (!beer) {
        if (!note) {        // We decide to enter the critical section here…
            // context switch
                                            beer = checkFridge();
                                            if (!beer) {
                                                if (!note) {
                                                    note = 1;
                                                    buyBeer();
                                                    note = 0;
                                                }
                                            }
            // context switch
            note = 1;       // But only mark the fact here…
            buyBeer();
            note = 0;
        }
    }

These problems are referred to as race conditions, in which multiple threads "race" with one another during conflicting access to shared resources.


Atomicity

• What we want is for the checking of note and the (conditional) setting of note to happen without any other thread being involved
  – We don't care if another thread reads it after we're done; or sets it before we start our check
  – But once we start our check, we want to continue without any interruption
• If a sequence of operations (e.g. read-and-set) occur as if one operation, we call them atomic
  – Since indivisible from the point of view of the program
• An atomic read-and-set operation is sufficient for us to implement a correct beer program
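
One way such an atomic read-and-set could be written in portable C is with the C11 <stdatomic.h> compare-and-exchange operation. This is a sketch under my own assumptions (the name read_and_set and the use of atomic_int are not from the lecture, which covers hardware support for mutual exclusion in the next lecture); it follows the convention of returning 1 only when the value was changed from 0 to 1.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically: if *flag is 0, set it to 1 and return 1; otherwise return 0.
 * The check and the conditional set happen as one indivisible operation. */
int read_and_set(atomic_int *flag) {
    assert(flag);
    int expected = 0;
    return atomic_compare_exchange_strong(flag, &expected, 1) ? 1 : 0;
}
```

With atomic_int note = 0, the first call read_and_set(&note) returns 1 and every later call returns 0 until some thread stores 0 back into note.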

Solution #2: Atomic Note

    // thread 1                             // thread 2
    beer = checkFridge();                   beer = checkFridge();
    if (!beer) {                            if (!beer) {
        if (read-and-set(&note)) {              if (read-and-set(&note)) {
            buyBeer();                              buyBeer();
            note = 0;                               note = 0;
        }                                       }
    }                                       }

• read-and-set(&address) atomically checks the value in memory and, iff it is zero, sets it to one
  – returns 1 iff the value was changed from 0 -> 1
• This prevents the behavior we saw before, and is sufficient to implement a correct program…
  – although this is not that program :-)


Non-Solution #2: Atomic Note

    // thread 1                             // thread 2
    beer = checkFridge();
    if (!beer) {
        // context switch
                                            beer = checkFridge();
                                            if (!beer) {
                                                if (read-and-set(&note)) {
                                                    buyBeer();
                                                    note = 0;
                                                }
                                            }
        // context switch
        if (read-and-set(&note)) {
            buyBeer();
            note = 0;
        }
    }

• Our critical section doesn't cover enough!
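
The interleaving above can also be replayed deterministically. In this single-threaded C sketch the atomic note works exactly as intended, yet beer is still bought twice because checkFridge() happens outside the protected region. All names (replay_non_solution_2, buy_if_note_free, the C11-based read_and_set) are hypothetical stand-ins for the slide's pseudo-code.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int note;
static int beer, purchases;

/* Returns 1 iff *flag was changed from 0 to 1 (C11-based stand-in). */
static int read_and_set(atomic_int *flag) {
    int expected = 0;
    return atomic_compare_exchange_strong(flag, &expected, 1) ? 1 : 0;
}

/* if (read-and-set(&note)) { buyBeer(); note = 0; } */
static void buy_if_note_free(void) {
    if (read_and_set(&note)) {
        beer++;
        purchases++;
        atomic_store(&note, 0);
    }
}

/* Replays the slide's interleaving; returns how many times beer was bought. */
int replay_non_solution_2(void) {
    atomic_store(&note, 0);
    beer = 0;
    purchases = 0;

    int t1_no_beer = (beer == 0);   /* thread 1: checkFridge() */
    /* context switch to thread 2 */
    int t2_no_beer = (beer == 0);   /* thread 2: checkFridge() */
    if (t2_no_beer)
        buy_if_note_free();
    /* context switch back: thread 1's view of the fridge is stale */
    if (t1_no_beer)
        buy_if_note_free();

    assert(atomic_load(&note) == 0);
    return purchases;
}
```

The function returns 2. The note was never held by two threads at once, so mutual exclusion on buyBeer() held, but the decision to buy was made on information gathered before entering.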

General mutual exclusion

• We would like the ability to define a region of code as a critical section, e.g.

    // thread 1                             // thread 2
    ENTER_CS();                             ENTER_CS();
    beer = checkFridge();                   beer = checkFridge();
    if (!beer)                              if (!beer)
        buyBeer();                              buyBeer();
    LEAVE_CS();                             LEAVE_CS();

• This should work…
• …providing that our implementation of ENTER_CS() / LEAVE_CS() is correct


Implementing mutual exclusion

• One option is to prevent context switches
  – e.g. disable interrupts (for kernel threads), or set an in-memory flag (for user threads)
• ENTER_CS() = "disable context switches"; LEAVE_CS() = "re-enable context switches"
• Can work but:
  – Rather brute force (stops all other threads, not just those who want to enter the critical section)
  – Potentially unsafe (if disable interrupts and then sleep waiting for a timer interrupt ;-)
  – And doesn't work across multiple CPUs

Implementing mutual exclusion

• Associate a mutual exclusion lock with each critical section, e.g. a variable L
  – (must ensure use correct lock variable!)
  – ENTER_CS() = "LOCK(L)"
  – LEAVE_CS() = "UNLOCK(L)"
• Can implement LOCK() using read-and-set():

    LOCK(L) {
        while (!read-and-set(&L))
            ;  // do nothing
    }

    UNLOCK(L) {
        L = 0;
    }
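
The LOCK()/UNLOCK() pseudo-code above could be rendered in C11 roughly as follows. This is a sketch under assumptions of my own: it uses atomic_flag from <stdatomic.h>, whose test-and-set returns the previous value (so, unlike read-and-set, the loop spins while it returns true), and POSIX threads to exercise the lock. The names struct spinlock, worker, and run_counter_test are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct spinlock { atomic_flag held; };

static void LOCK(struct spinlock *l) {
    /* Previous value true means someone else holds the lock: keep spinning. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* do nothing */
}

static void UNLOCK(struct spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

#define NTHREADS 4
#define ITERS 50000

static struct spinlock L = { ATOMIC_FLAG_INIT };
static long counter;   /* protected by L */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        LOCK(&L);
        counter++;     /* critical section */
        UNLOCK(&L);
    }
    return NULL;
}

/* Runs NTHREADS workers to completion and returns the final counter. */
long run_counter_test(void) {
    pthread_t t[NTHREADS];
    counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

With the lock in place the final count is always NTHREADS * ITERS; removing the LOCK/UNLOCK pair makes counter++ a race and the total may come up short.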


Solution #3: mutual exclusion locks

    // thread 1                             // thread 2
    LOCK(fridgeLock);                       LOCK(fridgeLock);
    beer = checkFridge();                   beer = checkFridge();
    if (!beer)                              if (!beer)
        buyBeer();                              buyBeer();
    UNLOCK(fridgeLock);                     UNLOCK(fridgeLock);

• This is (finally!) a correct program
• Still not perfect
  – Lock might be held for quite a long time (e.g. imagine another person wanting to get the milk!)
  – Waiting threads waste CPU time (or worse)
  – Contention occurs when consumers have to wait for locks
• Mutual exclusion locks are often known as mutexes
  – But we will prefer this term for sleepable locks (see Lecture 2)
  – So think of the above as a spin lock
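
For comparison with the spin lock, here is a sketch of the same beer program using a sleepable lock, namely a POSIX pthread_mutex_t. The choice of pthreads and the names housemate, run_housemates, and fridge_lock are my assumptions; the point is only that the check and the purchase now sit inside one critical section.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_HOUSEMATES 16

static pthread_mutex_t fridge_lock = PTHREAD_MUTEX_INITIALIZER;
static int beer;        /* protected by fridge_lock */
static int purchases;   /* protected by fridge_lock */

static void *housemate(void *arg) {
    (void)arg;
    pthread_mutex_lock(&fridge_lock);     /* LOCK(fridgeLock) */
    if (!beer) {                          /* beer = checkFridge() */
        beer = 1;                         /* buyBeer() */
        purchases++;
    }
    pthread_mutex_unlock(&fridge_lock);   /* UNLOCK(fridgeLock) */
    return NULL;
}

/* Runs n housemates concurrently against an empty fridge and returns
 * how many times beer was bought. */
int run_housemates(int n) {
    pthread_t t[MAX_HOUSEMATES];
    assert(n > 0 && n <= MAX_HOUSEMATES);
    beer = 0;
    purchases = 0;
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&t[i], NULL, housemate, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return purchases;
}
```

However the threads are scheduled, run_housemates() returns 1: whoever takes the lock first buys, and everyone else finds beer already in the fridge. Threads that have to wait on a pthread mutex can be put to sleep by the implementation rather than spinning.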

Summary + next time

• Definition of a concurrent system
• Origins of concurrency within a computer
• Processes and threads
• Challenge: concurrent access to shared resources
• Critical sections, mutual exclusion, race conditions, atomicity
• Mutual exclusion locks (mutexes)
• Next time:
  – More on mutual exclusion
  – Hardware support for mutual exclusion
  – Semaphores for mutual exclusion, process synchronisation, and resource allocation
  – Producer-consumer relationships.