Operating Systems and Networks
Network Lecture 10: Congestion Control
Adrian Perrig, Network Security Group, ETH Zürich
Where we are in the Course
• More fun in the Transport Layer!
  – The mystery of congestion control
  – Depends on the Network layer too
(Figure: protocol stack: Application, Transport, Network, Link, Physical)
Topic
• Understanding congestion, a "traffic jam" in the network
  – Later we will learn how to control it

Nature of Congestion
• Routers/switches have internal buffering for contention
(Figure: router internals: input buffers, switching fabric, output buffers)
Nature of Congestion (2)
• Simplified view of per-port output queues
  – Typically FIFO (First In, First Out); discard when full
(Figure: router modeled as a FIFO queue holding queued packets per output port)
Nature of Congestion (3)
• Queues help by absorbing bursts when input > output rate
• But if input > output rate persistently, the queue will overflow
  – This is congestion
• Congestion is a function of the traffic patterns
  – It can occur even if every link has the same capacity
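The burst-versus-persistent-overload distinction can be checked with a tiny sketch (all parameters hypothetical: a FIFO queue of capacity 10 served at 1 packet per tick):

```python
# Minimal sketch: a FIFO queue served at 1 packet/tick, capacity 10 packets.
# A short burst is absorbed; persistent input > output rate overflows (congestion).

def run_queue(arrivals, capacity=10, service_rate=1):
    """Return (peak queue length, packets dropped)."""
    q = 0
    dropped = 0
    peak = 0
    for a in arrivals:
        q += a                        # packets arriving this tick
        if q > capacity:              # FIFO discard-when-full
            dropped += q - capacity
            q = capacity
        peak = max(peak, q)
        q = max(0, q - service_rate)  # serve up to service_rate packets
    return peak, dropped

# A burst of 8 packets, then silence: absorbed, nothing dropped.
print(run_queue([8] + [0] * 20))   # → (8, 0)

# Persistent input of 2/tick against output of 1/tick: queue fills, then drops.
print(run_queue([2] * 30))         # → (10, 21)
```

The queue smooths the one-off burst, but under persistent overload it reaches capacity and then drops one packet per tick indefinitely, which is exactly the congestion condition described above.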
Effects of Congestion
• What happens to performance as we increase the load?
Effects of Congestion (3)
• As offered load rises, congestion occurs as queues begin to fill:
  – Delay and loss rise sharply with more load
  – Throughput falls below load (due to loss)
  – Goodput may fall below throughput (due to spurious retransmissions)
• None of the above is good!
  – Want to operate the network just before the onset of congestion
Bandwidth Allocation
• Important task for the network is to allocate its capacity to senders
  – A good allocation is efficient and fair
• Efficient means most capacity is used but there is no congestion
• Fair means every sender gets a reasonable share of the network
Bandwidth Allocation (2)
• Key observation:
  – In an effective solution, Transport and Network layers must work together
• Network layer witnesses congestion
  – Only it can provide direct feedback
• Transport layer causes congestion
  – Only it can reduce offered load
Bandwidth Allocation (3)
• Why is it hard? (Just split equally!)
  – Number of senders and their offered load is constantly changing
  – Senders may lack capacity in different parts of the network
  – Network is distributed; no single party has an overall picture of its state
Bandwidth Allocation (4)
• Solution context:
  – Senders adapt concurrently based on their own view of the network
  – Design this adaptation so the network usage as a whole is efficient and fair
  – Adaptation is continuous since offered loads continue to change over time
Topics
• Nature of congestion
• Fair allocations
• AIMD control law
• TCP congestion control history
• ACK clocking
• TCP slow-start
• TCP fast retransmit/recovery
• Congestion avoidance (ECN)
Fairness of Bandwidth Allocation (§6.3.1)
• What's a "fair" bandwidth allocation?
  – The max-min fair allocation
Recall
• We want a good bandwidth allocation to be fair and efficient
  – Now we learn what fair means
• Caveat: in practice, efficiency is more important than fairness
Efficiency vs. Fairness
• Cannot always have both!
  – Example network with traffic A→B, B→C and A→C
  – How much traffic can we carry?
(Figure: A, B, C in a line; each of the two links has capacity 1)
Efficiency vs. Fairness (2)
• If we care about fairness:
  – Give equal bandwidth to each flow
  – A→B: ½ unit, B→C: ½, and A→C: ½
  – Total traffic carried is 1½ units
Efficiency vs. Fairness (3)
• If we care about efficiency:
  – Maximize total traffic in the network
  – A→B: 1 unit, B→C: 1, and A→C: 0
  – Total traffic rises to 2 units!
The Slippery Notion of Fairness
• Why is "equal per flow" fair anyway?
  – A→C uses more network resources (two links) than A→B or B→C
  – Host A sends two flows, B sends one
• Not productive to seek exact fairness
  – More important to avoid starvation
  – "Equal per flow" is good enough
Generalizing "Equal per Flow"
• Bottleneck for a flow of traffic is the link that limits its bandwidth
  – Where congestion occurs for the flow
  – For A→C, link A–B is the bottleneck
(Figure: A, B, C in a line; link A–B has capacity 1, link B–C has capacity 10; A–B is the bottleneck)
Generalizing "Equal per Flow" (2)
• Flows may have different bottlenecks
  – For A→C, link A–B is the bottleneck
  – For B→C, link B–C is the bottleneck
  – Can no longer divide links equally…
Max-Min Fairness
• Intuitively, flows bottlenecked on a link get an equal share of that link
• Max-min fair allocation is one in which:
  – Increasing the rate of one flow would require decreasing the rate of a smaller flow
  – This "maximizes the minimum" flow
Max-Min Fairness (2)
• To find it given a network, imagine "pouring water into the network":
  1. Start with all flows at rate 0
  2. Increase the flows until there is a new bottleneck in the network
  3. Hold fixed the rate of the flows that are bottlenecked
  4. Go to step 2 for any remaining flows
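The four steps above can be implemented directly as "progressive filling". A minimal sketch, using a topology patterned after the example on the following slides (link and flow names are ours):

```python
# Progressive filling ("water pouring") for max-min fairness.
# link_capacity: capacity per link; flows: the links each flow crosses.

def max_min_fair(link_capacity, flows):
    rates = {f: 0.0 for f in flows}
    active = set(flows)                 # flows not yet bottlenecked (step 1)
    remaining = dict(link_capacity)     # spare capacity per link
    while active:
        # Step 2: raise all active flows equally until a link fills.
        # The next bottleneck is the link whose spare capacity, split among
        # the active flows crossing it, allows the smallest increment.
        increments = {}
        for link, cap in remaining.items():
            crossing = [f for f in active if link in flows[f]]
            if crossing:
                increments[link] = cap / len(crossing)
        bottleneck = min(increments, key=increments.get)
        inc = increments[bottleneck]
        for f in active:
            rates[f] += inc
            for link in flows[f]:
                remaining[link] -= inc
        # Step 3: hold fixed the flows crossing the new bottleneck.
        active -= {f for f in active if bottleneck in flows[f]}
    return rates                        # step 4 is the loop itself

links = {"R2-R3": 1.0, "R4-R5": 1.0}
flows = {"A": ["R2-R3"],
         "B": ["R2-R3", "R4-R5"],
         "C": ["R4-R5"],
         "D": ["R4-R5"]}
print(max_min_fair(links, flows))   # A gets 2/3; B, C, D get 1/3 each
```

With these links, B, C, and D bottleneck on R4–R5 at rate 1/3, after which A alone keeps growing until R2–R3 fills at 2/3, matching the worked example that follows.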
Max-Min Example
• Example: network with 4 flows, links of equal bandwidth
  – What is the max-min fair allocation?

Max-Min Example (2)
• When rate = 1/3, flows B, C, and D bottleneck link R4–R5
  – Fix B, C, and D; continue to increase A

Max-Min Example (3)
• When rate = 2/3, flow A bottlenecks link R2–R3. Done.

Max-Min Example (4)
• End with A = 2/3; B, C, D = 1/3; and R2–R3, R4–R5 full
  – Other links have extra capacity that can't be used
Adapting over Time
• Allocation changes as flows start and stop
(Figure: bandwidth allocation over time)

Adapting over Time (2)
(Figure: Flow 1 slows when Flow 2 starts; Flow 1 speeds up when Flow 2 stops; Flow 3's limit is elsewhere)
Recall
• Want to allocate capacity to senders
  – Network layer provides feedback
  – Transport layer adjusts offered load
  – A good allocation is efficient and fair
• How should we perform the allocation?
  – Several different possibilities…
Bandwidth Allocation Models
• Open loop versus closed loop
  – Open: reserve bandwidth before use
  – Closed: use feedback to adjust rates
• Host versus network support
  – Who sets/enforces allocations?
• Window versus rate based
  – How is the allocation expressed?

TCP is closed loop, host-driven, and window-based
Bandwidth Allocation Models (2)
• We'll look at closed-loop, host-driven, and window-based allocation
• Network layer returns feedback on the current allocation to senders
  – At least tells them if there is congestion
• Transport layer adjusts the sender's behavior via its window in response
  – How senders adapt is a control law
Additive Increase Multiplicative Decrease (AIMD) (§6.3.2)
• Bandwidth allocation models
  – Additive Increase Multiplicative Decrease (AIMD) control law
(Figure: AIMD sawtooth)
Additive Increase Multiplicative Decrease
• AIMD is a control law hosts can use to reach a good allocation
  – Hosts additively increase rate while the network is not congested
  – Hosts multiplicatively decrease rate when congestion occurs
  – Used by TCP
• Let's explore the AIMD game…
AIMD Game
• Hosts 1 and 2 share a bottleneck
  – But do not talk to each other directly
• Router provides binary feedback
  – Tells hosts if the network is congested
(Figure: Host 1 and Host 2 each connect over a link of capacity 1 to a router, which reaches the rest of the network over a bottleneck link of capacity 1)
AIMD Game (2)
• Each point is a possible allocation
(Figure: Host 1 rate vs. Host 2 rate, each from 0 to 1; a "fair" line, an "efficient" line, the congested region beyond it, and the optimal allocation where fair meets efficient)
AIMD Game (3)
• AI and MD move the allocation
(Figure: same plot; the fair line is y = x, the efficient line is x + y = 1; additive increase moves the allocation diagonally at 45°, multiplicative decrease moves it toward the origin)
AIMD Game (4)
• Play the game!
(Figure: same plot with a starting point below the efficient line)
AIMD Game (5)
• Always converges to a good allocation!
(Figure: trajectory from the starting point converging toward the optimal allocation)
AIMD Sawtooth
• Produces a "sawtooth" pattern over time for the rate of each host
  – This is the TCP sawtooth (later)
(Figure: host rate over time; additive increase ramps up, multiplicative decrease drops down)
AIMD Properties
• Converges to an allocation that is efficient and fair when hosts run it
  – Holds for more general topologies
• Other increase/decrease control laws do not! (Try MIAD, MIMD, AIAD)
• Requires only binary feedback from the network
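The convergence claim, and the failure of alternative control laws, can be checked numerically. A minimal sketch (step sizes and starting rates are illustrative): two hosts share a bottleneck of capacity 1 and react to binary congestion feedback.

```python
# Two hosts share a bottleneck of capacity 1.0 and react to binary feedback.
# AIMD converges toward the fair share; AIAD preserves the initial unfairness.

def simulate(x, y, increase, decrease, steps=1000):
    for _ in range(steps):
        if x + y > 1.0:              # binary feedback: congested
            x, y = decrease(x), decrease(y)
        else:                        # not congested
            x, y = increase(x), increase(y)
    return x, y

ai = lambda r: r + 0.01              # additive increase
md = lambda r: r / 2                 # multiplicative decrease
ad = lambda r: max(r - 0.01, 0.0)    # additive decrease

# AIMD from an unfair start (0.9 vs 0.1): the gap halves on every
# congestion event, so the rates end up nearly equal.
print(simulate(0.9, 0.1, ai, md))

# AIAD from the same start: both rates move by the same amount in both
# directions, so the 0.8 gap never shrinks.
print(simulate(0.9, 0.1, ai, ad))
```

This is the one-dimensional view of the AIMD game plot: multiplicative decrease pulls the allocation toward the fair line, while additive moves alone only slide it parallel to that line.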
Feedback Signals
• Several possible signals, with different pros/cons
  – We'll look at classic TCP, which uses packet loss as a signal

Signal            | Example Protocol                            | Pros/Cons
Packet loss       | TCP NewReno, Cubic TCP (Linux)              | + Hard to get wrong; - Hear about congestion late
Packet delay      | Compound TCP (Windows)                      | + Hear about congestion early; - Need to infer congestion
Router indication | TCPs with Explicit Congestion Notification  | + Hear about congestion early; - Require router support
History of TCP Congestion Control (§6.5.10)
• The story of TCP congestion control
  – Collapse, control, and diversification
Congestion Collapse in the 1980s
• Early TCP used a fixed-size sliding window (e.g., 8 packets)
  – Initially fine for reliability
• But something strange happened as the ARPANET grew
  – Links stayed busy but transfer rates fell by orders of magnitude!
Congestion Collapse (2)
• Queues became full, retransmissions clogged the network, and goodput fell
(Figure: goodput vs. offered load, collapsing past the congestion point)
Van Jacobson (1950–)
• Widely credited with saving the Internet from congestion collapse in the late '80s
  – Introduced congestion control principles
  – Practical solutions (TCP Tahoe/Reno)
• Much other pioneering work:
  – Tools like traceroute, tcpdump, pathchar
  – IP header compression, multicast tools
(Source: Wikipedia, public domain)
TCP Tahoe/Reno
• Avoid congestion collapse without changing routers (or even receivers)
• Idea is to fix timeouts and introduce a congestion window (cwnd) over the sliding window to limit queues/loss
• TCP Tahoe/Reno implements AIMD by adapting cwnd using packet loss as the network feedback signal
TCP Tahoe/Reno (2)
• TCP behaviors we will study:
  – ACK clocking
  – Adaptive timeout (mean and variance)
  – Slow-start
  – Fast retransmission
  – Fast recovery
• Together, they implement AIMD
TCP Timeline
• Pre-history:
  – Origins of "TCP" (Cerf & Kahn, '74)
  – 3-way handshake (Tomlinson, '75)
  – TCP and IP (RFC 791/793, '81)
  – TCP/IP "flag day" (BSD Unix 4.2, '83)
• Congestion control:
  – Congestion collapse observed, '86
  – TCP Tahoe (Jacobson, '88)
  – TCP Reno (Jacobson, '90)
TCP Timeline (2)
• Classic congestion control:
  – TCP Reno (Jacobson, '90)
  – TCP Vegas (Brakmo, '93)
  – ECN (Floyd, '94)
  – TCP NewReno (Hoe, '95)
  – TCP with SACK (Floyd, '96)
• Diversification:
  – TCP BIC (Linux, '04)
  – FAST TCP (Low et al., '04)
  – TCP CUBIC (Linux, '06)
  – Compound TCP (Windows, '07)
  – TCP LEDBAT (IETF, '08)
  – Approaches: background, router support, delay based, …
TCP ACK Clocking (§6.5.10)
• The self-clocking behavior of sliding windows, and how it is used by TCP
  – The "ACK clock"
Sliding Window ACK Clock
• Each in-order ACK advances the sliding window and lets a new segment enter the network
  – ACKs "clock" data segments
(Figure: ACKs 1–10 returning while data segments 11–20 are sent)
Benefit of ACK Clocking
• Consider what happens when a sender injects a burst of segments into the network
(Figure: fast link, then slow (bottleneck) link, then fast link; a queue builds at the bottleneck)
Benefit of ACK Clocking (2)
• Segments are buffered and spread out on the slow link
(Figure: segments "spread out" crossing the bottleneck)
Benefit of ACK Clocking (3)
• ACKs maintain the spread back to the original sender
(Figure: ACKs keep the spacing set by the slow link on the return path)
Benefit of ACK Clocking (4)
• Sender clocks new segments with the spread
  – Now sending at the bottleneck link rate without queuing!
(Figure: segments stay spread out; the queue no longer builds)
Benefit of ACK Clocking (5)
• Helps the network run with low levels of loss and delay!
• The network has smoothed out the burst of data segments
• The ACK clock transfers this smooth timing back to the sender
• Subsequent data segments are not sent in bursts, so they do not queue up in the network
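The smoothing effect can be sketched with a toy event model (all timings hypothetical): a bottleneck serves one segment per time unit, and departure times, which set the ACK spacing, come out evenly spaced even for a burst.

```python
# Toy model of a bottleneck link: segments queue FIFO and are serviced
# one per time unit. Departure spacing determines the ACK spacing.

def departures(arrival_times, service_time=1.0):
    times = []
    free_at = 0.0                    # when the link is next idle
    for t in sorted(arrival_times):
        start = max(t, free_at)      # wait in queue if the link is busy
        free_at = start + service_time
        times.append(free_at)
    return times

# A burst of 5 segments all arriving at t=0: the bottleneck spreads
# them out, so the ACKs come back one time unit apart.
print(departures([0, 0, 0, 0, 0]))   # → [1.0, 2.0, 3.0, 4.0, 5.0]

# Once the sender clocks transmissions off those ACKs, arrivals are
# already spaced one per time unit and no queue builds.
print(departures([1, 2, 3, 4, 5]))   # → [2.0, 3.0, 4.0, 5.0, 6.0]
```

The first call shows the network smoothing the burst; the second shows that an ACK-clocked sender keeps the traffic smooth on its own, which is the point of this slide.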
TCP Uses ACK Clocking
• TCP uses a sliding window because of the value of ACK clocking
• Sliding window controls how many segments are inside the network
  – Called the congestion window, or cwnd
  – Rate is roughly cwnd/RTT
• TCP only sends small bursts of segments to let the network keep the traffic smooth
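The rate relation above can be made concrete with a quick calculation (the numbers are illustrative, not from the lecture):

```python
# Rate ≈ cwnd / RTT: with 10 outstanding segments of 1500 bytes over a
# 100 ms round trip, the sender moves about 1.2 Mbit/s.

cwnd_segments = 10
segment_bytes = 1500
rtt_seconds = 0.100

rate_bps = cwnd_segments * segment_bytes * 8 / rtt_seconds
print(f"{rate_bps / 1e6:.1f} Mbit/s")   # → 1.2 Mbit/s
```

Doubling cwnd or halving the RTT doubles the rate, which is why the same cwnd value can mean very different sending rates on different paths.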
TCP Slow Start (§6.5.10)
• How TCP implements AIMD, part 1
  – "Slow start" is a component of the AI portion of AIMD
Considerations
• We want TCP to follow an AIMD control law for a good allocation
• Sender uses a congestion window or cwnd to set its rate (≈ cwnd/RTT)
• Sender uses packet loss as the network congestion signal
• Need TCP to work across a very large range of rates and RTTs
TCP Startup Problem
• We want to quickly get near the right rate, cwndIDEAL, but it varies greatly
  – A fixed sliding window doesn't adapt and is rough on the network (loss!)
  – AI with small bursts adapts cwnd gently to the network, but might take a long time to become efficient
Slow-Start Solution
• Start by doubling cwnd every RTT
  – Exponential growth (1, 2, 4, 8, 16, …)
  – Start slow, quickly reach large values
(Figure: window (cwnd) vs. time; slow-start rises much faster than AI or a fixed window)
Slow-Start Solution (2)
• Eventually packet loss will occur when the network is congested
  – Loss timeout tells us cwnd is too large
  – Next time, switch to AI beforehand
  – Slowly adapt cwnd near the right value
• In terms of cwnd:
  – Expect loss for cwndC ≈ 2BD + queue (twice the bandwidth-delay product, plus queue capacity)
  – Use ssthresh = cwndC/2 to switch to AI after observing loss
Slow-Start Solution (3)
• Combined behavior, after the first time
  – Most time spent near the right value
(Figure: window vs. time; slow-start up to ssthresh, then an AI phase approaching cwndIDEAL; cwndC marks where loss occurred)
Slow-Start (Doubling) Timeline
• Increment cwnd by 1 segment size for each ACK
(Figure: each ACK releases two segments, doubling the window every RTT)

Additive Increase Timeline
• Increment cwnd by 1 segment size every cwnd ACKs (or 1 RTT)
(Figure: the window grows by one segment per RTT)
TCP Tahoe (Implementation)
• Initial slow-start (doubling) phase
  – Start with cwnd = 1 (or a small value)
  – cwnd += 1 segment size per ACK
• Later additive increase phase
  – cwnd += 1/cwnd segments per ACK
  – Roughly adds 1 segment size per RTT
• Switching threshold (initially infinity)
  – Switch to AI when cwnd > ssthresh
  – Set ssthresh = cwnd/2 after loss
  – Begin with slow-start after timeout
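The three rules on this slide combine into a small state sketch (cwnd in units of segments; the class and method names are ours, not part of any real TCP stack):

```python
# Sketch of TCP Tahoe's cwnd rules, with cwnd measured in segments.
# Slow-start: cwnd += 1 per ACK (doubles per RTT).
# Additive increase: cwnd += 1/cwnd per ACK (≈ +1 segment per RTT).
# Timeout: ssthresh = cwnd/2, then slow-start again from cwnd = 1.

class Tahoe:
    def __init__(self):
        self.cwnd = 1.0
        self.ssthresh = float("inf")        # switching threshold, initially infinity

    def on_ack(self):
        if self.cwnd < self.ssthresh:       # slow-start phase
            self.cwnd += 1.0
        else:                               # additive-increase phase
            self.cwnd += 1.0 / self.cwnd

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2       # remember half the loss point
        self.cwnd = 1.0                     # restart with slow-start

tcp = Tahoe()
for _ in range(10):
    tcp.on_ack()
print(tcp.cwnd)                  # → 11.0 (slow-start adds 1 per ACK)
tcp.on_timeout()
print(tcp.cwnd, tcp.ssthresh)    # → 1.0 5.5
```

Because ssthresh starts at infinity, the very first phase is pure slow-start; only after the first loss does the sender learn a threshold at which to switch to gentle additive increase.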
Timeout Misfortunes
• Why do a slow-start after a timeout?
  – Instead of MD of cwnd (for AIMD)
• Timeouts are sufficiently long that the ACK clock will have run down
  – Slow-start ramps up the ACK clock
• We need to detect loss before a timeout to get to full AIMD
  – Done in TCP Reno
TCP Fast Retransmit / Fast Recovery (§6.5.10)
• How TCP implements AIMD, part 2
  – "Fast retransmit" and "fast recovery" are the MD portion of AIMD
(Figure: AIMD sawtooth)
Recall• WewantTCPtofollowanAIMDcontrollawforagood
allocation
• Senderusesacongestionwindow orcwnd tosetitsrate(≈cwnd/RTT)
• Senderusesslow-starttorampuptheACK clock,followedbyAdditiveIncrease
• Butafteratimeout,senderslow-startsagainwithcwnd=1(asithasnoACK clock)
71
Inferring Loss from ACKs
• TCP uses a cumulative ACK
  – Carries the highest in-order sequence number
  – Normally a steady advance
• Duplicate ACKs give us hints about what data hasn't arrived
  – Tell us some new data did arrive, but it was not the next segment
  – Thus the next segment may be lost
Fast Retransmit
• Treat three duplicate ACKs as a loss
  – Retransmit the next expected segment
  – Some repetition allows for reordering, but still detects loss quickly
(Figure: ACKs 1, 2, 3, 4, 5, 5, 5, 5; the repeated ACK 5 signals a loss)
Fast Retransmit (2)
(Timeline: sender receives ACK 10, 11, 12, 13; data 14 was lost earlier, but 15 to 20 arrive, each triggering a duplicate ACK 13; on the third duplicate ACK the sender retransmits segment 14; the retransmission fills in the hole at 14, and the ACK jumps to 20 after the loss is repaired)
Fast Retransmit (3)
• It can repair a single segment loss quickly, typically before a timeout
• However, we have quiet time at the sender/receiver while waiting for the ACK to jump
• And we still need to MD cwnd …
Inferring Non-Loss from ACKs
• Duplicate ACKs also give us hints about what data has arrived
  – Each new duplicate ACK means that some new segment has arrived
  – It will be the segments after the loss
  – Thus advancing the sliding window will not increase the number of segments stored in the network
Fast Recovery
• First fast retransmit, and MD cwnd
• Then pretend further duplicate ACKs are the expected ACKs
  – Lets new segments be sent for ACKs
  – Reconcile views when the ACK jumps
(Figure: ACKs 1, 2, 3, 4, 5, 5, 5, 5, 5, 5)
Fast Recovery (2)
(Timeline: data 14 was lost earlier, but 15 to 20 arrive, each triggering a duplicate ACK 13; on the third duplicate ACK the sender retransmits 14 and sets ssthresh, cwnd = cwnd/2; further duplicate ACKs advance the window, so the sender may send new segments 21, 22 before the jump; the retransmission fills in the hole at 14, ACK 20 jumps past the repaired loss, and the sender exits fast recovery)
Fast Recovery (3)
• With fast retransmit, it repairs a single segment loss quickly and keeps the ACK clock running
• This allows us to realize AIMD
  – No timeouts or slow-start after loss, just continue with a smaller cwnd
• TCP Reno combines slow-start, fast retransmit and fast recovery
  – Multiplicative decrease is ½
(Figure: TCP Reno sawtooth; MD of ½ with no slow-start, ACK clock kept running)
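The duplicate-ACK logic described above can be added to the earlier cwnd rules. A simplified sketch (it omits details such as window inflation during recovery; names are ours):

```python
# Sketch of Reno-style loss reaction, with cwnd in units of segments.
# Three duplicate ACKs trigger fast retransmit plus MD of 1/2 (no slow-start);
# a timeout still falls back to slow-start from cwnd = 1 (ACK clock lost).

class Reno:
    def __init__(self):
        self.cwnd = 10.0
        self.ssthresh = float("inf")
        self.dup_acks = 0
        self.last_ack = 0

    def on_ack(self, acked):
        if acked == self.last_ack:              # duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == 3:              # fast retransmit + fast recovery
                self.ssthresh = self.cwnd / 2
                self.cwnd = self.ssthresh       # multiplicative decrease of 1/2
                # (retransmit the next expected segment here)
        else:                                   # new data acknowledged
            self.last_ack = acked
            self.dup_acks = 0
            if self.cwnd < self.ssthresh:
                self.cwnd += 1.0                # slow-start
            else:
                self.cwnd += 1.0 / self.cwnd    # additive increase

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0                         # slow-start again

tcp = Reno()
for ack in [13, 13, 13, 13]:    # one new ACK, then three duplicates
    tcp.on_ack(ack)
print(tcp.cwnd, tcp.ssthresh)   # → 5.5 5.5
```

Contrast with the Tahoe rules: on three duplicate ACKs the window is halved and transmission continues, so the sawtooth shown on the slide emerges without returning to cwnd = 1.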
TCP Reno, NewReno, and SACK
• Reno can repair one loss per RTT
  – Multiple losses cause a timeout
• NewReno further refines ACK heuristics
  – Repairs multiple losses without timeout
• SACK is a better idea
  – Receiver sends ACK ranges so the sender can retransmit without guesswork
Explicit Congestion Notification (§5.3.4, §6.5.10)
• How routers can help hosts to avoid congestion
  – Explicit Congestion Notification
Congestion Avoidance vs. Control
• Classic TCP drives the network into congestion and then recovers
  – Needs to see loss to slow down
• Would be better to use the network but avoid congestion altogether!
  – Reduces loss and delay
• But how can we do this?
Feedback Signals
• Delay and router signals can let us avoid congestion

Signal            | Example Protocol                            | Pros/Cons
Packet loss       | Classic TCP, Cubic TCP (Linux)              | + Hard to get wrong; - Hear about congestion late
Packet delay      | Compound TCP (Windows)                      | + Hear about congestion early; - Need to infer congestion
Router indication | TCPs with Explicit Congestion Notification  | + Hear about congestion early; - Require router support
ECN (Explicit Congestion Notification)
• Router detects the onset of congestion via its queue
  – When congested, it marks affected packets (IP header)

ECN (2)
• Marked packets arrive at the receiver; treated as loss
  – TCP receiver reliably informs the TCP sender of the congestion
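The router side can be sketched as a simple marking rule on the output queue (the threshold and field names are illustrative; real routers typically mark probabilistically, and IP carries the signal in two dedicated ECN bits):

```python
# Sketch of ECN at a router: when the output queue passes a threshold,
# set a congestion mark on the packet instead of waiting for a drop.

MARK_THRESHOLD = 5   # illustrative queue length for "onset of congestion"

def forward(packet, queue_length):
    """Return the packet, marked if the queue signals congestion."""
    if queue_length > MARK_THRESHOLD:
        packet["ce"] = True          # "Congestion Experienced" mark
    return packet

# The receiver echoes the mark back to the sender, which then reduces
# cwnd as if a loss had occurred, even though no packet was dropped.
pkt = forward({"seq": 42, "ce": False}, queue_length=8)
print(pkt["ce"])     # → True
```

The key property is that the feedback arrives while the queue is merely building, before it overflows, which is what lets ECN avoid the loss that classic TCP needs to see.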
ECN (3)
• Advantages
  – Routers deliver a clear signal to hosts
  – Congestion is detected early, with no loss
  – No extra packets need to be sent
• Disadvantage
  – Routers and both sender and receiver must be upgraded
Example 1
Assume a TCP sender without fast retransmit, but with slow start and additive increase. Also assume:
• Segments n, n+1, n+2, …, n+10 transmitted at times 0, 1, 2, …, 10 ms
• Transmission time/segment = 1 ms
• RTT (2× propagation + transmission + ACK processing + ACK transmission) = 10 ms
• Segment n is lost (only)
• In-order segments and ACKs
• Retransmission timer for segment n is 60 ms, starting at the end of transmission
• cwnd = ssthresh = 64 at time 0
• Offered window = 70
Example 2: Infer Events that Occurred