Lecture 8:
TCP and Congestion Control
Slides adapted from: Congestion slides for Computer Networks: A Systems Approach (Peterson and Davie)
Chapter 3 slides for Computer Networking: A Top Down Approach Featuring the Internet (Kurose and Ross)
ITCS 6166/8166-091, Spring 2007
Jamie Payton, Department of Computer Science
University of North Carolina at Charlotte
February 5, 2007
Announcements
• Textbook is on reserve in library
• Homework 2 will be assigned on Wednesday
– Due: Feb. 14
Transmission Control Protocol
• Implementation of sliding window protocol
TCP Segment (Packet) Structure
• TCP uses sliding windows at sender and receiver (buffers)
TCP Details
• Views data as ordered streams of bytes (not packets)
– sequence numbers are over bytes
• Seq # for segment = # of first byte
• Acknowledgements are cumulative
– ACKs are for the next expected byte #
• What happens on out-of-order segments?
– Option 1: discard
– Option 2: wait and see if other segments show up
• this approach taken in practice
TCP Sequence Numbers
Seq. #’s:
– byte stream “number” of first byte in segment’s data
ACKs:
– seq # of next byte expected from other side
– cumulative ACK
Q: how does the receiver handle out-of-order segments?
– A: TCP spec doesn’t say; up to implementor
[Figure: simple telnet scenario between Host A and Host B]
– User types ‘C’: Host A sends Seq=42, ACK=79, data=‘C’
– Host B ACKs receipt of ‘C’ and echoes back ‘C’: Seq=79, ACK=43, data=‘C’
– Host A ACKs receipt of echoed ‘C’: Seq=43, ACK=80
TCP: Sender
• Assign sequence number to each segment (SeqNum)
• Maintain three state variables:
– send window size (SWS): upper bound on the number of outstanding (unACKed) frames the sender can send
– last acknowledgment received (LAR)
– last frame sent (LFS)
• Maintain invariant: LFS - LAR <= SWS
• Advance LAR when ACK arrives
• Buffer up to SWS frames (in case retransmit required)
• Timeout associated with each frame
[Figure: sender sliding window — a window of size SWS spanning from LAR to LFS over the byte stream]
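As a minimal sketch (not from the slides), the sender-side state and invariant above might be modeled like this; the class and method names are illustrative:

```python
class TcpSenderWindow:
    """Toy model of the sender-side sliding window state (SWS, LAR, LFS)."""

    def __init__(self, sws):
        self.sws = sws      # send window size
        self.lar = 0        # last acknowledgment received
        self.lfs = 0        # last frame sent
        self.buffer = {}    # unACKed frames buffered for possible retransmit

    def can_send(self):
        # Invariant: LFS - LAR <= SWS
        return self.lfs - self.lar < self.sws

    def send(self, data):
        if not self.can_send():
            raise RuntimeError("window full: LFS - LAR == SWS")
        self.lfs += 1
        self.buffer[self.lfs] = data   # keep until ACKed

    def on_ack(self, ack):
        # Cumulative ACK: advance LAR and release everything up to it
        while self.lar < ack:
            self.lar += 1
            self.buffer.pop(self.lar, None)

w = TcpSenderWindow(sws=3)
for frame in ("a", "b", "c"):
    w.send(frame)
assert not w.can_send()                  # window full: 3 frames outstanding
w.on_ack(2)                              # cumulative ACK for frames 1 and 2
assert w.can_send() and len(w.buffer) == 1
```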
TCP: Receiver
• Maintain three state variables
– receive window size (RWS): upper bound on the number of out-of-order frames the receiver can accept
– largest acceptable packet (LAP)
– last packet received (LPR)
• Maintain invariant: LAP - LPR <= RWS
• Packet SeqNum arrives:
– if LPR < SeqNum <= LAP, accept
– if SeqNum <= LPR or SeqNum > LAP, discard
• Send cumulative ACKs
[Figure: receiver sliding window — a window of size RWS spanning from LPR to LAP over the byte stream]
Buffer out-of-order packets!
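The acceptance test above can be sketched directly; a hypothetical helper, assuming LAP = LPR + RWS:

```python
def receiver_accepts(seq_num, lpr, rws):
    """Acceptance rule from the slides: accept iff LPR < SeqNum <= LAP,
    where the invariant LAP - LPR <= RWS holds (here LAP = LPR + RWS)."""
    lap = lpr + rws                   # largest acceptable packet
    return lpr < seq_num <= lap

# LPR = 5, RWS = 4 -> acceptable range is 6..9
assert receiver_accepts(6, lpr=5, rws=4)
assert receiver_accepts(9, lpr=5, rws=4)
assert not receiver_accepts(5, lpr=5, rws=4)   # already received: discard
assert not receiver_accepts(10, lpr=5, rws=4)  # beyond window: discard
```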
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver → TCP receiver action:
• Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed → delayed ACK: wait up to 500 ms for the next segment; if no next segment, send ACK
• Arrival of in-order segment with expected seq #; one other segment has ACK pending → immediately send single cumulative ACK, ACKing both in-order segments
• Arrival of out-of-order segment with higher-than-expected seq #; gap detected → immediately send duplicate ACK, indicating seq # of next expected byte
• Arrival of segment that partially or completely fills gap → immediately send ACK, provided that segment starts at lower end of gap
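The four rules above can be collapsed into a small dispatcher; this is a rough sketch, and the flag names and action labels are purely illustrative:

```python
def ack_action(in_order, ack_pending, fills_gap_start):
    """Toy dispatcher for the RFC 1122/2581 ACK-generation rules above.
    Returns a label describing the receiver's action (names illustrative)."""
    if in_order and not ack_pending:
        return "delayed ACK (wait up to 500 ms)"
    if in_order and ack_pending:
        return "immediate cumulative ACK"
    if fills_gap_start:
        return "immediate ACK"
    return "immediate duplicate ACK"   # out-of-order, gap detected

assert ack_action(True, False, False) == "delayed ACK (wait up to 500 ms)"
assert ack_action(True, True, False) == "immediate cumulative ACK"
assert ack_action(False, False, False) == "immediate duplicate ACK"
assert ack_action(False, False, True) == "immediate ACK"
```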
TCP Retransmission Timeout
• RTT estimate important for efficient operation
– too long: long delay before retransmission
– too short: unnecessary retransmissions
• TCP RTT estimation
– SampleRTT: measured time from segment transmission until ACK receipt
– SampleRTT will vary; want estimated RTT “smoother”
• average several recent measurements, not just current SampleRTT
TCP Round Trip Time
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
[Figure: example RTT estimation — SampleRTT and EstimatedRTT (milliseconds) vs. time (seconds), for gaia.cs.umass.edu to fantasia.eurecom.fr]
Setting the TCP Timeout
• EstimatedRTT plus “safety margin”
– large variation in EstimatedRTT -> larger safety margin
• First estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
(typically, β = 0.25)
• Then set timeout interval:
TimeoutInterval = EstimatedRTT + 4*DevRTT
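One EWMA update step per SampleRTT, using the two formulas above with their typical weights, might look like this (a sketch; values in milliseconds are made up for the example):

```python
ALPHA = 0.125   # typical EWMA weight from the slides
BETA = 0.25     # typical deviation weight from the slides

def update_rtt(estimated_rtt, dev_rtt, sample_rtt):
    """One update of EstimatedRTT, DevRTT, and TimeoutInterval."""
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout

est, dev = 100.0, 10.0                                # ms, assumed history
est, dev, timeout = update_rtt(est, dev, sample_rtt=140.0)
assert est == 105.0       # 0.875*100 + 0.125*140
assert dev == 16.25       # 0.75*10 + 0.25*|140 - 105|
assert timeout == 170.0   # 105 + 4*16.25
```

Note how a single late sample moves EstimatedRTT only a little (α = 0.125) but inflates the safety margin through DevRTT, which is the intended behavior.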
Flow Control
• What happens if the receiving process is slow and the sending process is fast?
• TCP provides flow control
– sender won’t overflow receiver’s buffer by transmitting too much, too fast
How Flow Control Works
• Receiver advertises spare room to sender:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
– includes RcvWindow in segments
• Sender keeps track of how much spare room receiver has in its variable RcvWindow
• Sender limits unACKed data to RcvWindow– guarantees receive buffer doesn’t overflow
Flow Control
• Send buffer size: MaxSendBuffer
• Receive buffer size: MaxRcvBuffer
• Receiving side
– LastByteRcvd - LastByteRead <= MaxRcvBuffer
– AdvertisedWindow = MaxRcvBuffer - (NextByteExpected - NextByteRead)
• Sending side
– LastByteSent - LastByteAcked <= AdvertisedWindow
– EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
– LastByteWritten - LastByteAcked <= MaxSendBuffer
– block sender if (LastByteWritten - LastByteAcked) + y > MaxSendBuffer
• Always send ACK in response to arriving data segment
• Persist when AdvertisedWindow = 0
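The sending-side arithmetic above is simple enough to sketch directly; a hypothetical helper with byte counts invented for the example:

```python
def effective_window(advertised_window, last_byte_sent, last_byte_acked):
    """EffectiveWindow from the slides: how much new data the sender may
    still put in flight without overrunning the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    assert in_flight <= advertised_window   # sending-side invariant
    return advertised_window - in_flight

# Receiver advertised 4000 bytes of spare room; 2500 bytes are already
# in flight (sent but unACKed), so only 1500 more may be sent.
assert effective_window(4000, last_byte_sent=7500, last_byte_acked=5000) == 1500

# When the advertised window is fully consumed the sender must stop
# (and persist, probing until AdvertisedWindow reopens).
assert effective_window(4000, last_byte_sent=9000, last_byte_acked=5000) == 0
```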
Congestion Control
Congestion:
• Informally: “too many sources sending too much data too fast for the network to handle”
• Different from flow control!
• Manifestations:
– lost packets (buffer overflow at routers)
– long delays (queueing in router buffers)
• A top-10 problem in computer networking research!
TCP Congestion Control
• Idea
– assumes best-effort network (FIFO or FQ routers)
– each source determines network capacity for itself
– uses implicit feedback
– ACKs pace transmission (self-clocking)
• Challenges
– determining the available capacity in the first place
– adjusting to changes in the available capacity
TCP Congestion Control Fundamentals
• Additive Increase/Multiplicative Decrease
• Slow start
• Fast Retransmit and Fast Recovery
Additive Increase/Multiplicative Decrease
• Objective: adjust to changes in the available capacity
• New state variable per connection: CongestionWindow
– counterpart to flow control’s advertised window
– limits how much data the source has in transit
MaxWin = MIN(CongestionWindow, AdvertisedWindow)
EffWin = MaxWin - (LastByteSent - LastByteAcked)
• Idea:
– increase CongestionWindow when congestion goes down
– decrease CongestionWindow when congestion goes up
• Now EffectiveWindow includes both flow control and congestion control
AIMD (cont)
• Question: how does the source determine whether or not the network is congested?
• Answer: a timeout occurs
– timeout signals that a packet was lost
– packets are seldom lost due to transmission error
– lost packet implies congestion
AIMD (cont)
• Algorithm
– increment CongestionWindow by one packet every time an entire window’s worth of data is successfully delivered (linear increase)
– divide CongestionWindow by two whenever a timeout occurs (multiplicative decrease)
– think about the concepts in terms of packets, even though it’s implemented in bytes
• In practice: increment a little for each ACK
Increment = (MSS * MSS)/CongestionWindow
CongestionWindow += Increment
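The per-ACK form of AIMD above can be sketched in a few lines; the MSS value and window sizes are assumptions for the example:

```python
MSS = 1460  # bytes; a typical maximum segment size (assumed for the example)

def on_ack(cwnd):
    """Additive increase: the per-ACK byte increment from the slides.
    Over one full window of ACKs this grows cwnd by roughly one MSS."""
    return cwnd + (MSS * MSS) / cwnd

def on_timeout(cwnd):
    """Multiplicative decrease: halve the window (floor of one segment)."""
    return max(cwnd / 2, MSS)

cwnd = 10 * MSS
for _ in range(10):          # one window's worth of ACKs (10 segments)
    cwnd = on_ack(cwnd)
assert 10.9 * MSS < cwnd < 11.1 * MSS   # grew by about one MSS in one RTT
assert on_timeout(cwnd) == cwnd / 2     # timeout halves the window
```

This is why the byte-based increment is equivalent to "one packet per window": each of the cwnd/MSS ACKs in a window adds MSS²/cwnd bytes, summing to about one MSS.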
AIMD (cont)
• Trace: sawtooth behavior
– proven to provide more stability than additive increase/additive decrease
– the consequences of too large a window are much worse than those of too small a window
[Figure: AIMD trace — CongestionWindow (KB, 10–70) vs. time (seconds, 1.0–10.0), showing the sawtooth]
Why AIMD?
• Fairness
– if you run any of the other combinations (AIAD, MIAD, MIMD), common sequences of events can result in unfair distribution among competing flows
• R. Jain and K.K. Ramakrishnan, “Congestion Avoidance in Computer Networks with a Connectionless Network Layer: Concepts, Goals, and Methodology,” Proceedings of the Computer Networking Symposium, pp. 134–143, April 1988.
• Stability
– if you’re slow starting, you KNOW that half of the window is OK, so go back there
– if you’re steady-state sending, then the reason your packet was dropped is likely that a new flow started up
• V. Jacobson, “Congestion Avoidance and Control,” Proceedings of the SIGCOMM Symposium, pp. 314–329, August 1988.
Slow Start
• Objective: determine the available capacity at first
– increase congestion window rapidly from a cold start
• Idea:
– begin with CongestionWindow = 1 packet
– double CongestionWindow each RTT (increment by 1 packet for each ACK)
Slow Start (cont.)
• Why “slow”?
– starts slow in comparison to immediately filling the advertised window
– prevents routers from having to handle bursts of initial traffic
Slow Start (cont)
• When first starting a connection
– the source has no idea what resources are available
– slow start continues to double CongWin until a loss occurs, then enters additive increase/multiplicative decrease
• When a connection goes dead waiting for a timeout
– receiver reopens the entire window
– use slow start to ramp up to the previous CongWin/2
• Trace
• Problem: may lose up to half a CongestionWindow’s worth of data if you hit the border of the network’s capacity (e.g., send n bytes’ worth of data successfully, so double the window to 2n, but the network can only support n)
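The doubling-per-RTT behavior falls out of the per-ACK rule above; a toy model (the MSS and threshold values are assumptions, and real TCP tracks much more state):

```python
MSS = 1460  # bytes (assumed for the example)

def grow_window(cwnd, ssthresh):
    """One per-ACK step: below ssthresh, slow start adds one MSS per ACK
    (doubling each RTT); at or above it, switch to AIMD's linear increase."""
    if cwnd < ssthresh:
        return cwnd + MSS                  # exponential growth phase
    return cwnd + (MSS * MSS) / cwnd       # additive-increase phase

cwnd, ssthresh = MSS, 8 * MSS
rtts = 0
while cwnd < ssthresh:
    acks = cwnd // MSS          # one ACK per outstanding segment this RTT
    for _ in range(acks):
        cwnd = grow_window(cwnd, ssthresh)
    rtts += 1
assert rtts == 3                # window doubled: 1 -> 2 -> 4 -> 8 segments
```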
[Figure: slow start trace — CongestionWindow (KB, 10–70) vs. time (seconds, 1.0–9.0)]
Fast Retransmit and Fast Recovery
• Problem: coarse-grained TCP timeouts lead to idle periods
• Fast retransmit: use duplicate ACKs to trigger retransmission faster than a normal timeout
– does not replace the regular timeout, but enhances it
– a duplicate ACK suggests a problem at the receiver
– three duplicate ACKs trigger a retransmit of the missing packet
[Figure: fast retransmit — sender transmits packets 1–6; packet 3 is lost, so the receiver ACKs 1, then 2, and sends duplicate ACK 2 for each of packets 4, 5, and 6; after the third duplicate ACK the sender retransmits packet 3, and the receiver responds with cumulative ACK 6]
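The duplicate-ACK counting in the scenario above can be sketched as follows (a toy model over cumulative ACK numbers, not a full TCP sender):

```python
DUP_ACK_THRESHOLD = 3   # three duplicate ACKs trigger fast retransmit

def fast_retransmit(acks):
    """Scan a stream of cumulative ACK numbers and return the packet
    numbers that the duplicate-ACK rule would retransmit."""
    retransmitted = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                retransmitted.append(ack + 1)   # next expected packet is missing
        else:
            last_ack, dup_count = ack, 0
    return retransmitted

# ACK stream from the figure: packet 3 is lost, packets 4-6 each elicit a
# duplicate ACK 2, then the retransmission is covered by cumulative ACK 6.
assert fast_retransmit([1, 2, 2, 2, 2, 6]) == [3]
# Fewer than three duplicates (e.g., mild reordering) triggers nothing.
assert fast_retransmit([1, 2, 2, 3]) == []
```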
Fast Retransmit Results
• No more long, flat periods where no packets are sent while waiting for a timeout
– eliminates about half of the timeouts in practice
• Fast recovery
– skip the slow start phase
– go directly to half the last successful CongestionWindow (ssthresh)
[Figure: trace with fast retransmit and fast recovery — CongestionWindow (KB, 10–70) vs. time (seconds, 1.0–7.0)]
Congestion Avoidance
• TCP’s strategy
– control congestion once it happens
– repeatedly increase load in an effort to find the point at which congestion occurs, and then back off
• i.e., cause congestion in order to control it
• Alternative strategy
– predict when congestion is about to happen
– reduce rate before packets start being discarded
– call this congestion avoidance, instead of congestion control
TCP Vegas
• End-host management of congestion avoidance
• Watch for (implicit) signs that congestion is building
– e.g., steady increase in RTT, flattening of the perceived throughput
TCP Vegas
• While we’re increasing the congestion window, the perceived throughput stays about the same
– i.e., the extra data is being queued in the network
– TCP Vegas tries to reduce the amount of this extra queued data
[Figure: TCP Vegas motivation trace — three panels vs. time (seconds, 0.5–8.5): CongestionWindow (KB, 10–70), sending rate (KBps, 100–1100), and queue size in the bottleneck router (5–10 packets)]
Algorithm
• Let BaseRTT be the minimum of all measured RTTs (commonly the RTT of the first packet)
• If not overflowing the connection, then ExpectedRate = CongestionWindow/BaseRTT
• Source calculates sending rate (ActualRate) once per RTT
– record the send time for a particular packet, record how many bytes are transmitted before its ACK comes back, compute the sample RTT for that packet, and divide the number of bytes transmitted in between by the sample RTT
• Source compares ActualRate with ExpectedRate
Diff = ExpectedRate - ActualRate
if Diff < α:
increase CongestionWindow linearly (not using bandwidth effectively)
else if Diff > β:
decrease CongestionWindow linearly (moving towards congestion)
else:
leave CongestionWindow unchanged
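One Vegas-style adjustment step following the Diff rule above might look like this; the rate computation is simplified (cwnd in packets, rates in packets/second, Diff converted to packets in flight), and all names and numbers are illustrative:

```python
def vegas_update(cwnd, base_rtt, sample_rtt, alpha=1, beta=3):
    """One adjustment per the Diff rule above. Diff is expressed in
    packets queued in the network, so alpha/beta are in packets (the
    slide values: alpha = 1, beta = 3). A sketch, not real TCP Vegas."""
    expected_rate = cwnd / base_rtt            # best case: no queueing
    actual_rate = cwnd / sample_rtt            # simplified measured rate
    diff = (expected_rate - actual_rate) * base_rtt   # extra packets queued
    if diff < alpha:
        return cwnd + 1      # under-using the path: additive increase
    if diff > beta:
        return cwnd - 1      # queue building up: additive decrease
    return cwnd              # in the sweet spot: hold steady

# BaseRTT 100 ms. If the sample RTT barely rises, few packets are queued
# (Diff < alpha), so the window grows.
assert vegas_update(10, base_rtt=0.100, sample_rtt=0.105) == 11
# If the RTT has doubled, about 5 packets sit in queues (Diff > beta),
# so the window shrinks.
assert vegas_update(10, base_rtt=0.100, sample_rtt=0.200) == 9
```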
Algorithm (cont)
• Parameters (in practice): α = 1 packet, β = 3 packets
[Figure: TCP Vegas trace — CongestionWindow (KB, 10–70) and throughput (KBps, 40–240, with the α/β band shaded) vs. time (seconds, 0.5–8.0)]