Post on 20-Jan-2016
Univ. of Tehran Computer Network 1
Computer Networks
(Graduate level)
University of Tehran, Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani
Lecture 8: Congestion Control

Congestion Control
Congestion control basics
TCP congestion control
Assigned reading
[JK88] Congestion Avoidance and Control
[CJ89] Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

Overview
Congestion sources and collapse
Congestion control basics
TCP congestion control
TCP interactions

Why End-to-End Protocols?
The underlying best-effort network may:
  drop or reorder messages
  deliver duplicate copies of a given message
  limit messages to some finite size
  deliver messages after an arbitrarily long delay
End hosts must also deal with:
  multiple application processes on each host
  different speeds of sender and receiver (flow control)
  congestion in the network (congestion control)
Initially, there was no end-to-end protocol. Now:
  UDP: a simple end-to-end protocol
  TCP: a reliable transport protocol

Reliable Transport (TCP)
Communication abstraction:
  Connection oriented, point to point
  Reliable
    Error detection and correction
  Ordered byte stream
    Application writes bytes
    TCP sends segments
    Application reads bytes
  Full duplex, two-way connection
  Flow and congestion controlled
Protocol implemented entirely at the ends
  Fate sharing

Difference From Link Layers
Logical link vs. physical link
  Must establish a connection
Variable RTT
  May vary within a connection
Reordering of packets
  How long can packets live? Maximum segment lifetime
Can’t expect endpoints to exactly match the link
  Buffer space availability
  Packets in transmission: delay × bandwidth
Transmission rate
  Don’t directly know the media/network transmission rate (congestion)
Try to adapt to the situation.

Congestion
Different sources compete for resources inside the network, unaware of the current state of those resources and of each other.
In general, it is a resource allocation problem.
Manifestations:
  lost packets (buffer overflow at routers)
  long delays (queuing in router buffers)
[Figure: links of 10 Mbps, 100 Mbps, and 1.5 Mbps]

Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
no retransmission
large delays when congested
maximum achievable throughput

Causes/costs of congestion: scenario 2
one router, finite buffers
sender retransmission of lost packet

Causes/costs of congestion: scenario 2
($\lambda_{in}$: original data rate; $\lambda'_{in}$: original data plus retransmissions; $\lambda_{out}$: delivered rate)
always: $\lambda_{in} = \lambda_{out}$ (goodput)
“perfect” retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
“costs” of congestion:
  more work (retransmissions) for given “goodput”
  unneeded retransmissions: link carries multiple copies of a packet

Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

Causes/costs of congestion: scenario 3
Another “cost” of congestion: when a packet is dropped, any “upstream” transmission capacity used for that packet was wasted!

Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP
Network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate sender should send at

Congestion Collapse
An increase in network load results in a decrease of useful work done and can destroy the network’s ability to deliver data
Possible causes
  Spurious retransmissions of packets still in flight
    Classical congestion collapse
    How can this happen with packet conservation?
    Solution: better timers and TCP congestion control
  Undelivered packets
    Packets consume resources and are dropped elsewhere in the network
    Solution: congestion control for ALL traffic

Other Congestion Collapse Causes
Fragments
  Mismatch of transmission and retransmission units
  Solutions
    Make network drop all fragments of a packet (early packet discard in ATM)
    Do path MTU discovery
Control traffic
  Large percentage of traffic is for control
  Headers, routing messages, DNS, etc.
Stale or unwanted packets
  Packets that are delayed on long queues
  “Push” data that is never used

Where to Prevent Collapse?
Can end hosts prevent the problem?
  Yes, but we must trust end hosts to do the right thing
  E.g., the sending host must adjust the amount of data it puts in the network based on detected congestion
Can routers prevent collapse?
  No, not all forms of collapse
  That doesn’t mean they can’t help
    Sending accurate congestion signals
    Isolating well-behaved from ill-behaved sources

Congestion Control and Avoidance
A mechanism which:
  Uses network resources efficiently
  Preserves fair network resource allocation
  Prevents or avoids collapse
Congestion collapse is not just a theory
  Has been frequently observed in many networks
  It is a top 10 problem.

Congestion Collapse and Efficiency
knee – point after which
  throughput increases slowly
  delay increases quickly
cliff – point after which
  throughput decreases quickly to zero (congestion collapse)
  delay goes to infinity
Congestion avoidance
  stay at knee
Congestion control
  stay left of (but usually close to) cliff
Note (in an M/M/1 queue): delay = 1/(1 – utilization), in units of the mean service time
[Figure: throughput vs. load and delay vs. load; labels: knee, cliff, under utilization, over utilization, saturation, congestion collapse]
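
The M/M/1 note above can be made concrete with a small calculation. This is a minimal sketch, assuming delay is the mean time in the system; the function name `mm1_delay` and the `service_time` parameter are illustrative, not from the slides.

```python
def mm1_delay(utilization, service_time=1.0):
    """Mean time in an M/M/1 system, in the same units as service_time.

    Implements delay = service_time / (1 - utilization), which is only
    defined for utilization in [0, 1).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


# Delay grows slowly at first and then explodes as utilization nears 1,
# which is the behavior the knee and cliff describe.
for rho in (0.5, 0.8, 0.9, 0.99):
    print(f"utilization {rho:.2f}: delay = {mm1_delay(rho):.1f} service times")
```

Going from 50% to 99% utilization multiplies the delay by roughly fifty, which is why operating near the knee rather than the cliff is attractive.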

Goals
Operate near the knee point
Remain in equilibrium
How to maintain equilibrium?
  Don’t put a packet into the network until another packet leaves. How do you do it?
  Use ACKs: send a new packet only after you receive an ACK. Why?
  Maintain the number of packets in the network “constant”

How Do You Do It?
Detect when the network approaches/reaches the knee point
Stay there
Questions
  How do you get there?
  What if you overshoot (i.e., go over the knee point)?
Possible solution:
  Increase window size until you notice congestion
  Decrease window size if the network is congested

Overview
Congestion sources and collapse
Congestion control basics
TCP congestion control
TCP interactions

Control System Model [CJ89]
Simple, yet powerful model
Explicit binary signal of congestion
  Why explicit (TCP uses implicit)?
Implicit allocation of bandwidth
[Figure: users 1, 2, ..., n offer loads x1, x2, ..., xn to the network; the feedback signal y indicates whether Σxi > Xgoal]

Objectives
Simple router behavior
Distributedness
Efficiency: $X_{knee} = \sum_i x_i(t)$
Fairness: $(\sum_i x_i)^2 / (n \sum_i x_i^2)$
Power: (throughput/delay)
Convergence: the control system must be stable and responsive.

Power
Power (ratio of throughput to delay)
[Figure: throughput/delay vs. load, peaking at the optimal load]

Fair Allocation
Max-min fairness
  Flows which share the same bottleneck get the same amount of bandwidth
Assumes no knowledge of priorities
Fairness = 1 - distance from fairness line
$$F(x) = \frac{\left(\sum_i x_i\right)^2}{n \sum_i x_i^2}$$
[Figure: 2 user example; axes User 1: x1 and User 2: x2; the fairness line separates the region where user 2 is getting too much from the region where user 1 is getting too much]
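
The fairness index on this slide, (Σxi)² / (n Σxi²), is easy to evaluate directly. The sketch below is illustrative; the function name `jain_fairness` is an assumption, not something defined in the lecture.

```python
def jain_fairness(allocations):
    """Fairness index (sum x_i)^2 / (n * sum x_i^2) for a list of allocations.

    The index is 1.0 when every user gets the same share and drops to 1/n
    when a single user gets everything.
    """
    n = len(allocations)
    sum_sq = sum(x * x for x in allocations)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return total * total / (n * sum_sq)


print(jain_fairness([25, 25, 25, 25]))  # equal shares
print(jain_fairness([100, 0, 0, 0]))    # one user takes everything
print(jain_fairness([10, 80]))          # unequal two-user split
```

Because the index depends only on the ratios between allocations, scaling every xi by the same factor leaves it unchanged.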

Basic Control Model
Let’s assume window-based control
Reduce window when congestion is perceived
  How is congestion signaled?
    Either mark or drop packets
  When is a router congested?
    Drop tail queues – when queue is full
    Average queue length – at some threshold
Increase window otherwise
  Probe for available bandwidth – how?

Linear Control
Many different possibilities for reaction to congestion and probing
  Examine simple linear controls
  Window(t + 1) = a + b Window(t)
  Different ai/bi for increase and ad/bd for decrease
Supports various reactions to signals
  Increase/decrease additively
  Increase/decrease multiplicatively
  Which of the four combinations is optimal?

Possible Choices
$$x_i(t+1) = \begin{cases} a_I + b_I x_i(t) & \text{increase} \\ a_D + b_D x_i(t) & \text{decrease} \end{cases}$$
Multiplicative increase, additive decrease
  aI=0, bI>1, aD<0, bD=1
Additive increase, additive decrease
  aI>0, bI=1, aD<0, bD=1
Multiplicative increase, multiplicative decrease
  aI=0, bI>1, aD=0, 0<bD<1
Additive increase, multiplicative decrease
  aI>0, bI=1, aD=0, 0<bD<1
Which one?

Phase plots
What are desirable properties?
What if flows are not equal?
[Figure: phase plot of User 2’s Allocation x2 vs. User 1’s Allocation x1, with the Efficiency Line, the Fairness Line, the optimal point, and the overload and underutilization regions]

Phase plots
Simple way to visualize behavior of competing connections over time
[Figure: phase plot of User 2’s Allocation x2 vs. User 1’s Allocation x1, with the Efficiency Line and the Fairness Line]

Additive Increase/Decrease
Both X1 and X2 increase/decrease by the same amount over time
  Additive increase improves efficiency and additive decrease reduces efficiency
[Figure: phase plot of User 2’s Allocation x2 vs. User 1’s Allocation x1, with the Efficiency Line and the Fairness Line; the allocation moves between points T0 and T1]

Multiplicative Increase/Decrease
Both X1 and X2 increase by the same factor over time
  Extension from origin – constant fairness
[Figure: phase plot of User 2’s Allocation x2 vs. User 1’s Allocation x1, with the Efficiency Line and the Fairness Line; the allocation moves between points T0 and T1]

Convergence to Efficiency
[Figure: phase plot of User 2’s Allocation x2 vs. User 1’s Allocation x1, with the Efficiency Line, the Fairness Line, and starting point xH]

Convergence to Fairness
[Figure: phase plot of User 2’s Allocation x2 vs. User 1’s Allocation x1, with the Efficiency Line, the Fairness Line, and points xH and xH’]

Convergence to Efficiency & Fairness
[Figure: phase plot of User 2’s Allocation x2 vs. User 1’s Allocation x1, with the Efficiency Line, the Fairness Line, and points xH and xH’]

Increase
[Figure: phase plot of User 2’s Allocation x2 vs. User 1’s Allocation x1, with the Efficiency Line, the Fairness Line, and starting point xL]

Constraints
Distributed efficiency
  I.e., Window(t+1) > Window(t) during increase
  ai > 0 & bi ≥ 1
  Similarly, ad ≤ 0 & bd < 1
Must never decrease fairness
  a & b’s must be ≥ 0
  ai/bi > 0 and ad/bd ≥ 0
Full constraints
  ad = 0, 0 ≤ bd < 1, ai > 0 and bi ≥ 1

What is the Right Choice?
Constraints limit us to AIMD
  Can have multiplicative term in increase
AIMD moves towards optimal point
[Figure: phase plot of User 2’s Allocation x2 vs. User 1’s Allocation x1, with the Efficiency Line and the Fairness Line; successive allocations x0, x1, x2 approach the optimal point]
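
The claim that AIMD moves towards the optimal point can be checked numerically. The following is a minimal sketch of the two-user linear control model, assuming synchronous updates and a binary signal that fires when the total load exceeds capacity; the function name `simulate`, the starting allocations, and the capacity of 100 are illustrative choices.

```python
def simulate(x1, x2, capacity, a_i, b_i, a_d, b_d, steps):
    """Apply the linear control x(t+1) = a + b * x(t) to two users.

    Both users see the same binary feedback: they decrease when the
    total load exceeds capacity and increase otherwise.
    """
    for _ in range(steps):
        if x1 + x2 > capacity:
            x1, x2 = a_d + b_d * x1, a_d + b_d * x2
        else:
            x1, x2 = a_i + b_i * x1, a_i + b_i * x2
    return x1, x2


# AIMD: additive increase (a_i=1, b_i=1), multiplicative decrease (a_d=0, b_d=0.5)
aimd = simulate(10.0, 80.0, 100.0, 1.0, 1.0, 0.0, 0.5, 500)
# AIAD: additive increase (a_i=1, b_i=1), additive decrease (a_d=-1, b_d=1)
aiad = simulate(10.0, 80.0, 100.0, 1.0, 1.0, -1.0, 1.0, 500)

print("AIMD gap between users:", abs(aimd[0] - aimd[1]))
print("AIAD gap between users:", abs(aiad[0] - aiad[1]))
```

Additive steps leave the difference between the two users unchanged, while each multiplicative decrease halves it. This is why the AIMD gap shrinks towards zero and the AIAD gap stays at its initial value.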

Overview
Congestion sources and collapse
Congestion control basics
TCP congestion control
TCP interactions

TCP Congestion Control
Motivated by ARPANET congestion collapse
Underlying design principle: packet conservation
  At equilibrium, inject a packet into the network only when one is removed
  Basis for stability of physical systems
Why was this not working?
  Connection doesn’t reach equilibrium
  Spurious retransmissions
  Resource limitations prevent equilibrium

TCP Congestion Control - Solutions
Reaching equilibrium
  Slow start
Eliminating spurious retransmissions
  Accurate RTO estimation
  Fast retransmit
Adapting to resource availability
  Congestion avoidance

TCP Congestion Control Basics
Keep a congestion window, cwnd
  Denotes how much the network is able to absorb
Sender’s maximum window:
  Min (advertised window, cwnd)
Sender’s actual window:
  Max window - unacknowledged segments
If we have a large actual window, should we send data in one shot?
  No, use acks to clock sending new data
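
Self-clocking
[Figure: self-clocking between Sender and Receiver; Pb and Pr are the packet spacings at the bottleneck and at the receiver; Ar, Ab, and As are the ack spacings at the receiver, at the bottleneck, and at the sender]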

Slow Start
How do we get this clocking behavior to start?
  Initialize cwnd = 1
  Upon receipt of every ack, cwnd = cwnd + 1
Implications
  Window actually increases to W in RTT * log2(W)
  Can overshoot window and cause packet loss
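
To see why the per-ack rule yields RTT * log2(W), the sketch below counts round trips. It assumes no losses, one ack per segment (no delayed acks), and a receiver window that is never the limit; the function name `slow_start_rtts` is illustrative.

```python
def slow_start_rtts(target_window, initial_cwnd=1):
    """Count the RTTs slow start needs for cwnd to reach target_window."""
    cwnd = initial_cwnd
    rtts = 0
    while cwnd < target_window:
        # One RTT: every segment in the current window is acked, and
        # each ack grows cwnd by one segment, so the window doubles.
        for _ in range(cwnd):
            cwnd += 1
        rtts += 1
    return rtts


for w in (8, 64, 100):
    print(f"W = {w}: {slow_start_rtts(w)} RTTs")
```

The window doubles every round trip even though each individual ack only adds one segment, so "slow" start is in fact exponential growth.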

Slow Start Example
[Figure: slow start timeline in units of one RTT and one packet time; in rounds 0R, 1R, 2R, and 3R the sender transmits packet 1, packets 2-3, packets 4-7, and packets 8-15]

Slow Start Sequence Plot
[Figure: sequence number vs. time]

Congestion Avoidance
Loss implies congestion – why?
  Not necessarily true on all link types
If loss occurs when cwnd = W
  Network can handle 0.5W ~ W segments
  Set cwnd to 0.5W (multiplicative decrease)
Upon receiving ACK
  Increase cwnd by 1/cwnd
  Results in additive increase

Return to Slow Start
If a packet is lost we lose our self clocking as well
  Need to implement slow-start and congestion avoidance together
When timeout occurs set ssthresh to 0.5w
  If cwnd < ssthresh, use slow start
  Else use congestion avoidance
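
The combined rule above fits in two small functions. This is a minimal sketch in units of segments, assuming cwnd is reset to 1 on a timeout (the return to slow start); the names `on_ack` and `on_timeout` are illustrative, and real stacks add details such as a lower bound on ssthresh.

```python
def on_ack(cwnd, ssthresh):
    """Return the new cwnd after one new ack arrives."""
    if cwnd < ssthresh:
        return cwnd + 1.0          # slow start: +1 per ack
    return cwnd + 1.0 / cwnd       # congestion avoidance: about +1 per RTT


def on_timeout(cwnd):
    """Return (new_cwnd, new_ssthresh) after a retransmission timeout."""
    ssthresh = cwnd / 2.0          # remember half of the window that failed
    return 1.0, ssthresh           # assumption: restart from one segment


cwnd, ssthresh = on_timeout(16.0)
trace = []
for _ in range(12):
    cwnd = on_ack(cwnd, ssthresh)
    trace.append(round(cwnd, 2))
print(trace)
```

In the trace the window climbs by one segment per ack until it reaches ssthresh, then the per-ack increment drops to 1/cwnd.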

Overall TCP Behavior
[Figure: window vs. time]

Congestion Window
[Figure: congestion window vs. time; slow start follows each timeout]
• A timeout just wastes resources and time
• How to prevent it: do not wait! Fast retransmission

Fast Retransmit
Resend a segment after 3 duplicate ACKs
  A duplicate ACK means that an out-of-sequence segment was received
Notes:
  duplicate ACKs can be due to packet reordering
    why reordering?
  window may be too small to get duplicate ACKs
Then what? Slow start
[Figure: sender timeline; cwnd = 1: segment 1, answered by ACK 2; cwnd = 2: segments 2 and 3, answered by ACK 3 and ACK 4; cwnd = 4: segments 4 to 7, answered by 3 duplicate ACKs (ACK 4, ACK 4, ACK 4)]
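
The duplicate-ACK rule can be sketched as a small counter. The class below is illustrative (the name `FastRetransmitDetector` and its interface are assumptions); it treats ACK numbers as cumulative, so "ACK 4" means segment 4 is the next one expected.

```python
class FastRetransmitDetector:
    """Signal a retransmission on the third duplicate of a cumulative ACK."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no):
        """Return the segment number to retransmit, or None."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.threshold:
                return ack_no      # the receiver is still waiting for this one
        else:
            self.last_ack = ack_no
            self.dup_count = 0
        return None


detector = FastRetransmitDetector()
# ACK sequence from the slide: ACK 2, ACK 3, ACK 4, then three duplicate ACK 4s.
decisions = [detector.on_ack(a) for a in [2, 3, 4, 4, 4, 4]]
print(decisions)
```

The first ACK 4 is not a duplicate; only the three that follow it count, and the retransmission fires exactly once, on the third.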

Fast Recovery
A duplicate ack notifies the sender that a packet has departed the network
  When < cwnd packets are outstanding
    Allow new packets out with each new duplicate acknowledgement
Behavior
  Sender is idle for some time – waiting for ½ cwnd worth of dupacks
  Transmits at original rate after wait
    Ack clocking rate is same as before loss
At the end: no slow start; set W = W/2 and go to AIMD

Fast Retransmit
[Figure: sequence number vs. time; a loss (X) is followed by duplicate acks and then the retransmission]

Fast Recovery
[Figure: sequence number vs. time; a new packet is sent for each dupack after W/2 dupacks arrive]

Multiple Losses
[Figure: sequence number vs. time; four losses (X), duplicate acks, and one retransmission. Now what?]

Tahoe
[Figure: sequence number vs. time; four losses (X); Tahoe goes into slow start again]

TCP Reno (1990)
All mechanisms in Tahoe
Addition of fast recovery
  Opening up congestion window after fast retransmit
Delayed acks
Header prediction
  Implementation designed to improve performance
  Has common case code inlined
With multiple losses, Reno typically times out because it does not see duplicate acknowledgements

TCP Reno
Fast retransmit: retransmit a segment after 3 DUP Acks
Fast recovery: reduce cwnd to half instead of to one
[Figure: cwnd vs. time, showing slow start, congestion avoidance, a timeout, and fast recovery]

Reno
[Figure: sequence number vs. time; four losses (X). Now what? Timeout]

NewReno
The ack that arrives after retransmission (partial ack) should indicate that a second loss occurred
When does NewReno time out?
  When there are fewer than three dupacks for first loss
  When partial ack is lost
How fast does it recover losses?
  One per RTT

NewReno
[Figure: sequence number vs. time; four losses (X). Now what? Partial ack recovery]

SACK
Basic problem is that cumulative acks provide little information
  Ack for just the packet received
    What if acks are lost? Carry cumulative ack also
    Not used
  Bitmask of packets received
    Selective acknowledgement (SACK)
How to deal with reordering

SACK
[Figure: sequence number vs. time; four losses (X). Now what? Send retransmissions as soon as detected]

Performance Issues
Timeout >> fast rexmit
Need 3 dupacks/sacks
Not great for small transfers
  Don’t have 3 packets outstanding
What are real loss patterns like?
Right edge recovery
  Allow packets to be sent on arrival of first and second duplicate ack
  Helps recovery for small windows
How to deal with reordering?

NewReno Changes
Send a new packet out for each pair of dupacks
  Adapt more gradually to new window
Will not halve congestion window again until recovery is completed
  Identifies congestion events vs. congestion signals
Initial estimation for ssthresh

Rate Halving Recovery
[Figure: sequence number vs. time; a packet is sent after every other dupack]

Delayed Ack Impact
TCP congestion control triggered by acks
  If we receive half as many acks, the window grows half as fast
Slow start with window = 1
  Will trigger delayed ack timer
  First exchange will take at least 200ms
  Start with > 1 initial window
    Bug in BSD, now a “feature”/standard

TCP Congestion Control
end-end control (no network assistance)
transmission rate limited by congestion window size, Congwin, over segments
[Figure: Congwin spanning the segments in flight]
w segments, each with MSS bytes, sent in one RTT:
$$\text{throughput} = \frac{w \cdot MSS}{RTT} \ \text{bytes/sec}$$
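
A quick numeric check of the throughput formula, throughput = w * MSS / RTT. The window, MSS, and RTT values below are assumed purely for illustration, and the function name `tcp_throughput` is not from the slides.

```python
def tcp_throughput(window_segments, mss_bytes, rtt_seconds):
    """Bytes per second when window_segments of mss_bytes are sent per RTT."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return window_segments * mss_bytes / rtt_seconds


# Illustrative numbers: 10 segments of 1460 bytes over a 100 ms round trip.
rate = tcp_throughput(10, 1460, 0.1)
print(f"{rate:.0f} bytes/sec, about {rate * 8 / 1e6:.2f} Mbit/s")

# Halving the window (multiplicative decrease) halves the rate.
print(f"{tcp_throughput(5, 1460, 0.1):.0f} bytes/sec after halving the window")
```

For a fixed RTT the sending rate is proportional to the window, which is why window-based control doubles as rate control.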

Fast Retransmit and Recovery in Reno
Upon reception of 3 dupes, thresh = congwin/2. Set congwin = thresh + 3
  This accounts for the 3 packets that have left the network.
Increment congwin for each dupe subsequently received. Transmit a new segment if we are allowed.
When a new ack* finally arrives, set congwin = thresh and we are in congestion avoidance again with a “deflated window”.
*“new ack” means an ack for any data not yet acked, but inside congwin.
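
The window inflation and deflation steps of Reno fast recovery can be sketched as a tiny state machine. This sketch works in units of segments and assumes thresh is set to half of the congestion window at the moment the third dupe arrives (as in RFC 5681); the class and method names are illustrative.

```python
class RenoFastRecovery:
    """Track congwin and thresh through one fast-recovery episode."""

    def __init__(self, congwin, thresh):
        self.congwin = congwin
        self.thresh = thresh
        self.in_recovery = False

    def on_third_dupe(self):
        # Assumption: thresh becomes half of the current congwin.
        self.thresh = self.congwin / 2
        # Inflate by 3 for the three segments that have left the network.
        self.congwin = self.thresh + 3
        self.in_recovery = True

    def on_extra_dupe(self):
        # Each further dupe means one more segment has left the network.
        if self.in_recovery:
            self.congwin += 1

    def on_new_ack(self):
        # Deflate the window and resume congestion avoidance.
        if self.in_recovery:
            self.congwin = self.thresh
            self.in_recovery = False


reno = RenoFastRecovery(congwin=16, thresh=64)
reno.on_third_dupe()
print(reno.thresh, reno.congwin)   # halved threshold, inflated window
reno.on_extra_dupe()
reno.on_new_ack()
print(reno.congwin)                # deflated window
```

The temporary inflation lets the sender keep transmitting new segments while the lost one is repaired, without raising the window it settles at afterwards.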

Fast Retransmit and Recovery in NewReno
Upon reception of 3 dupes, thresh = congwin/2. Set congwin = thresh + 3
  This accounts for the 3 packets that have left the network.
Increment congwin for each dupe subsequently received. Transmit a new segment if we are allowed.
When a new ack* finally arrives, set congwin = thresh and we are in congestion avoidance again.
* But only do this if the ack received is for the highest seq# sent, avoiding a stall while recovering.

TCP & Routers
How routers can help congestion control
Indeed, congestion control and queue management are the same problem: “resource allocation”.
  RED
  XCP
Read Chapter 6 of the book; also look at
  [FJ93] Random Early Detection Gateways for Congestion Avoidance

Queuing Disciplines
Each router must implement some queuing discipline
Queuing allocates both bandwidth and buffer space:
  Bandwidth: which packet to serve (transmit) next
  Buffer space: which packet to drop next (when required)
Queuing also affects latency

Packet Drop Dimensions
Aggregation: per-connection state, class-based queuing, single class
Drop position: head, tail, random location
Early drop vs. overflow drop

Typical Internet Queuing
FIFO + drop-tail
  Simplest choice
  Used widely in the Internet
FIFO (first-in-first-out)
  Implies single class of traffic
Drop-tail
  Arriving packets get dropped when queue is full regardless of flow or importance
Important distinction:
  FIFO: scheduling discipline
  Drop-tail: drop policy

Active Queue Management
Design active router queue management to aid congestion control
Why?
  Routers can distinguish between propagation and persistent queuing delays
  Routers can decide on transient congestion, based on workload

Active Queue Designs
Modify both router and hosts
  DECbit: congestion bit in packet header
Modify router, hosts use TCP
  Fair queuing
    Per-connection buffer allocation
  RED (Random Early Detection)
    Drop packet or set bit in packet header as soon as congestion is starting

Random Early Detection (RED)
Detect incipient congestion, allow bursts
Keep power (throughput/delay) high
  Keep average queue size low
  Assume hosts respond to lost packets
Avoid window synchronization
  Randomly mark packets
Avoid bias against bursty traffic
Some protection against ill-behaved users

RED Algorithm
Maintain running average of queue length
If avgq < minth do nothing
  Low queuing, send packets through
If avgq > maxth, drop packet
  Protection from misbehaving sources
Else mark packet in a manner proportional to queue length
  Notify sources of incipient congestion

RED Operation
[Figure: a queue with min thresh and max thresh marked against the average queue length; P(drop) vs. avg queue length is 0 below minth, rises to maxP at maxth, and is 1.0 beyond maxth]

RED Algorithm
Maintain running average of queue length
  Byte mode vs. packet mode – why?
For each packet arrival
  Calculate average queue size (avgq)
  If minth ≤ avgq < maxth
    Calculate probability Pa
    With probability Pa
      Mark the arriving packet
  Else if maxth ≤ avgq
    Mark the arriving packet
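
The per-arrival decision above can be sketched as follows. This is a simplified illustration: it uses a marking probability that grows linearly from 0 at minth to maxp at maxth, and it omits the RED paper's refinement that spaces marks out using a count of packets since the last mark. Function and parameter names are assumptions.

```python
import random


def red_mark_probability(avgq, min_th, max_th, max_p):
    """Marking probability for an arriving packet given the average queue."""
    if avgq < min_th:
        return 0.0                 # low queuing: let the packet through
    if avgq >= max_th:
        return 1.0                 # persistent congestion: always mark
    # In between: probability grows linearly from 0 up to max_p.
    return max_p * (avgq - min_th) / (max_th - min_th)


def red_should_mark(avgq, min_th, max_th, max_p, rng=random.random):
    """Randomly decide whether to mark; rng returns a float in [0, 1)."""
    return rng() < red_mark_probability(avgq, min_th, max_th, max_p)


for avgq in (3, 10, 14, 20):
    p = red_mark_probability(avgq, min_th=5, max_th=15, max_p=0.1)
    print(f"avgq = {avgq}: mark with probability {p:.3f}")
```

Because marking is random, flows are hit roughly in proportion to how many packets they send, and they are unlikely to all back off at the same instant.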

Queue Estimation
Standard EWMA: avgq = (1 - wq) avgq + wq qlen
  Special fix for idle periods – why?
Upper bound on wq depends on minth
  Want to ignore transient congestion
  Can calculate the queue average if a burst arrives
    Set wq such that certain burst size does not exceed minth
Lower bound on wq to detect congestion relatively quickly
Typical wq = 0.002
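
The EWMA and the effect of a small wq can be illustrated with a short burst. The sketch below assumes the average is updated once per packet arrival and ignores the idle-period fix mentioned on the slide; the burst size and queue length are made-up numbers.

```python
def update_avg_queue(avgq, qlen, w_q=0.002):
    """One EWMA step: avgq = (1 - w_q) * avgq + w_q * qlen."""
    return (1 - w_q) * avgq + w_q * qlen


# A burst: 100 arrivals that each see an instantaneous queue of 50 packets,
# starting from an empty average.
avgq = 0.0
for _ in range(100):
    avgq = update_avg_queue(avgq, qlen=50)
print(f"average after the burst: {avgq:.2f} packets")
```

Even though the instantaneous queue sits at 50 packets for the whole burst, the average moves only part of the way there. This is how a small wq lets RED ignore transient congestion.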

Thresholds
minth determined by the utilization requirement
  Tradeoff between queuing delay and utilization
Relationship between maxth and minth
  Want to ensure that feedback has enough time to make a difference in load
  Depends on average queue increase in one RTT
  Paper suggests ratio of two
    Current rule of thumb is factor of three

Packet Marking
maxp is reflective of typical loss rates
Paper uses 0.02
  0.1 is a more realistic value
If network needs marking of 20-30% then need to buy a better link!
Gentle variant of RED (recommended)
  Vary drop rate from maxp to 1 as the avgq varies from maxth to 2 * maxth
  More robust to setting of maxth and maxp

Extending RED for Flow Isolation
Problem: what to do with non-cooperative flows?
Fair queuing achieves isolation using per-flow state – expensive at backbone routers
  How can we isolate unresponsive flows without per-flow state?
RED penalty box
  Monitor history for packet drops, identify flows that use disproportionate bandwidth
  Isolate and punish those flows

FRED
Fair Random Early Drop (Sigcomm, 1997)
Maintain per flow state only for active flows (ones having packets in the buffer)
minq and maxq: min and max number of buffers a flow is allowed to occupy
avgcq = average buffers per flow
Strike count of number of times flow has exceeded maxq

How does XCP Work?
[Figure: each packet carries a Congestion Header with three fields: Round Trip Time, Congestion Window, and Feedback; here Feedback = + 0.1 packet]

How does XCP Work?
[Figure: along the path, the Feedback field in the Congestion Header changes from + 0.1 packet to - 0.3 packet; Round Trip Time and Congestion Window are unchanged]

How does XCP Work?
Congestion Window = Congestion Window + Feedback
Routers compute feedback without any per-flow state
XCP extends ECN and CSFQ

How Does an XCP Router Compute the Feedback?
Congestion Controller (MIMD)
  Goal: Matches input traffic to link capacity & drains the queue
  Looks at aggregate traffic & queue
  Algorithm: Aggregate traffic changes by $\phi$
    $\phi \sim$ Spare Bandwidth
    $\phi \sim -$ Queue Size
    So, $\phi = \alpha \, d_{avg} \, \text{Spare} - \beta \, \text{Queue}$
Fairness Controller (AIMD)
  Goal: Divides $\phi$ between flows to converge to fairness
  Looks at a flow’s state in Congestion Header
  Algorithm:
    If $\phi > 0$: divide $\phi$ equally between flows
    If $\phi < 0$: divide $\phi$ between flows proportionally to their current rates

Getting the devil out of the details …
Congestion Controller
  $\phi = \alpha \, d_{avg} \, \text{Spare} - \beta \, \text{Queue}$
  Theorem: System converges to optimal utilization (i.e., stable) for any link bandwidth, delay, number of sources if:
  $$0 < \alpha < \frac{\pi}{4\sqrt{2}} \quad \text{and} \quad \beta = \alpha^2 \sqrt{2}$$
  (Proof based on Nyquist Criterion)
  No Parameter Tuning
Fairness Controller
  Algorithm:
    If $\phi > 0$: divide $\phi$ equally between flows
    If $\phi < 0$: divide $\phi$ between flows proportionally to their current rates
  Need to estimate number of flows N
  $$N = \sum_{\text{pkts in } T} \frac{1}{T \times (Cwnd_{pkt} / RTT_{pkt})}$$
  $RTT_{pkt}$: Round Trip Time in header
  $Cwnd_{pkt}$: Congestion Window in header
  T: Counting Interval
  No Per-Flow State
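
Both router-side computations on this slide use only aggregate quantities and per-packet header fields. The sketch below is a hedged illustration: the function names are invented, the feedback is the aggregate value (average RTT times spare bandwidth, scaled, minus a scaled queue term), and the constants 0.4 and 0.226 are assumed here as values reported for XCP that satisfy the stability condition.

```python
import math


def aggregate_feedback(alpha, beta, avg_rtt, spare_bandwidth, queue):
    """Aggregate feedback: alpha * d_avg * Spare - beta * Queue."""
    return alpha * avg_rtt * spare_bandwidth - beta * queue


def estimate_flow_count(headers, interval):
    """Estimate N from (cwnd_pkt, rtt_pkt) pairs seen during `interval`.

    Each packet contributes 1 / (T * cwnd_pkt / rtt_pkt), so a flow sending
    cwnd/rtt packets per second adds up to about 1 over the interval.
    """
    return sum(rtt / (interval * cwnd) for cwnd, rtt in headers)


ALPHA, BETA = 0.4, 0.226           # assumed constants, not from the slides
print(ALPHA < math.pi / (4 * math.sqrt(2)))            # stability bound on alpha
print(abs(BETA - ALPHA ** 2 * math.sqrt(2)) < 1e-3)    # beta = alpha^2 * sqrt(2)

# Spare capacity and an empty queue give positive feedback; a standing
# queue with no spare capacity gives negative feedback.
print(aggregate_feedback(ALPHA, BETA, 0.1, 1000.0, 0.0))
print(aggregate_feedback(ALPHA, BETA, 0.1, 0.0, 50.0))

# Two flows observed for T = 1 s: one sends 100 packets (cwnd 10, RTT 0.1 s),
# the other 10 packets (cwnd 5, RTT 0.5 s).
headers = [(10, 0.1)] * 100 + [(5, 0.5)] * 10
print(estimate_flow_count(headers, interval=1.0))
```

The flow count comes out close to 2 without the router remembering anything about individual flows, which is the point of carrying Cwnd and RTT in the header.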

Lessons
TCP alternatives
  TCP being used in new/unexpected ways
  Key changes needed
Routers
  FIFO, drop-tail interacts poorly with TCP
  Various schemes to desynchronize flows and control loss rate
Fair-queuing
  Clean resource allocation to flows
  Complex packet classification and scheduling
Core-stateless FQ & XCP
  Coarse-grain fairness
  Carrying packet state can reduce complexity

Next Lecture: TCP behavior and New Versions
High speed TCPs
Assigned reading
[BP95] TCP Vegas: End to End Congestion Avoidance on a Global Internet
[FHPW00] Equation-Based Congestion Control for Unicast Applications