The Internet’s architecture for managing congestion
-
Upload
kasimir-steele -
Category
Documents
-
view
23 -
download
0
description
Transcript of The Internet’s architecture for managing congestion
Some Internet History
• 1974: First draft of TCP/IP[“A protocol for packet network interconnection”, Vint Cerf and Robert Kahn]
• 1983: ARPANET switches on TCP/IP
• 1986: Congestion collapse• 1988: Congestion control for TCP
[“Congestion avoidance and control”, Van Jacobson]
“A Brief History of the Internet”, the Internet Society
End-to-end control
Internet congestion is controlled by the end-systems.The network operates as a dumb pipe.[“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981]
UserServer(TCP)
request
End-to-end control
Internet congestion is controlled by the end-systems.The network operates as a dumb pipe.[“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981]
UserServer(TCP)
data
End-to-end control
Internet congestion is controlled by the end-systems.The network operates as a dumb pipe.[“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981]
The TCP algorithm, running on the server, decides how fast to send data.
UserServer(TCP)
data
acknowledgements
TCPif (seqno > _last_acked) {
if (!_in_fast_recovery) {
_last_acked = seqno;
_dupacks = 0;
inflate_window();
send_packets(now);
_last_sent_time = now;
return;
}
if (seqno < _recover) {
uint32_t new_data = seqno - _last_acked;
_last_acked = seqno;
if (new_data < _cwnd) _cwnd -= new_data; else _cwnd=0;
_cwnd += _mss;
retransmit_packet(now);
send_packets(now);
return;
}
uint32_t flightsize = _highest_sent - seqno;
_cwnd = min(_ssthresh, flightsize + _mss);
_last_acked = seqno;
_dupacks = 0;
_in_fast_recovery = false;
send_packets(now);
return;
}
if (_in_fast_recovery) {
_cwnd += _mss;
send_packets(now);
return;
}
_dupacks++;
if (_dupacks!=3) {
send_packets(now);
return;
}
_ssthresh = max(_cwnd/2, (uint32_t)(2 * _mss));
retransmit_packet(now);
_cwnd = _ssthresh + 3 * _mss;
_in_fast_recovery = true;
_recover = _highest_sent;
}
time [0-8 sec]
traf
fic r
ate
[0-1
00 k
B/s
ec]
Motivation: buffer size• Internet routers have buffers,
to accomodate bursts in traffic.
• How big do the buffers need to be?– 3 GByte? Rule of thumb—what Cisco does today
– 300 MByte? [Appenzeller, Keslassy, McKeown, 2004 ]
– 30 kByte?
• Large buffers are unsustainable:– Data volumes double every 10 months– CPU speeds double every 18 months– Memory access speeds double every 10 years
Motivation: TCP’s teleology[Kelly, Maulloo, Tan, 1998]
• Consider several TCP flows sharing a single link
• Let xr be the mean bandwidth of flow r [pkts/sec]
Let y be the total bandwidth of all flows [pkts/sec]
Let C be the total available capacity [pkts/sec]
• TCP and the network act so as to solvemaximise r U(xr) - P(y,C) over xr0 where y=r
xr
x
U(x
)
y
P(y
,C)
C
Bad teleology
little extra valued attached to high-bandwidth flowssevere penalty for
allocating too little bandwidth
x
U(x
)
Bad teleology
x
U(x
)
flows with largeRTT are satisfied with little bandwidth
flows with small RTT want more bandwidth
TCP’s teleology
• The network acts as if it’s trying tosolve an optimization problem
• Is this what we want the Internet to optimize?
• Does it even succeed in performing the optimization?
x
U(x
)
yC
P(y
,C)
Synchronization
Desynchronized TCP flows:– aggregate traffic
is smooth
– network solves the optimization
SynchronizedTCP flows:– aggregate traffic
is bursty
– network oscillates about the optimum
++=
++=
aggr
egat
etr
affi
c ra
te
time
indi
vidu
alfl
ow r
ates
Synchronization
Desynchronized TCP flows:– aggregate traffic
is smooth
– network solves the optimization
SynchronizedTCP flows:– aggregate traffic
is bursty
– network oscillates about the optimum
++=
++=
aggr
egat
etr
affi
c ra
te
time
indi
vidu
alfl
ow r
ates
TCP traffic model
• When there are many TCP flows, the aggregate traffic rate xt varies smoothly, according to a differential equation[Misra, Gong, Towsley, 2000]
• The equation involves
– pt, the packet loss probability at time t,
– RTT, the average round trip time
time
aggr
egat
etr
affi
c ra
te
desynchronized synchronized
TCPif (seqno > _last_acked) {
if (!_in_fast_recovery) {
_last_acked = seqno;
_dupacks = 0;
inflate_window();
send_packets(now);
_last_sent_time = now;
return;
}
if (seqno < _recover) {
uint32_t new_data = seqno - _last_acked;
_last_acked = seqno;
if (new_data < _cwnd) _cwnd -= new_data; else _cwnd=0;
_cwnd += _mss;
retransmit_packet(now);
send_packets(now);
return;
}
uint32_t flightsize = _highest_sent - seqno;
_cwnd = min(_ssthresh, flightsize + _mss);
_last_acked = seqno;
_dupacks = 0;
_in_fast_recovery = false;
send_packets(now);
return;
}
if (_in_fast_recovery) {
_cwnd += _mss;
send_packets(now);
return;
}
_dupacks++;
if (_dupacks!=3) {
send_packets(now);
return;
}
_ssthresh = max(_cwnd/2, (uint32_t)(2 * _mss));
retransmit_packet(now);
_cwnd = _ssthresh + 3 * _mss;
_in_fast_recovery = true;
_recover = _highest_sent;
}
time [0-8 sec]
traf
fic r
ate
[0-1
00 k
B/s
ec]
Queue model
• How does packet loss probability pt depend on buffer size?
• There are two families of answers, depending on queueing delay:– Small buffers (queueing delay « RTT)– Large buffers (queueing delay RTT)
time
val1
40 41 42 43 44 45
05
1015
20
20
05
1015
20
200
05
1015
20
2000
Small buffers
As the optical fibre’s line rate increases
– queue size fluctuates more and more rapidly
– queue size distribution does not change(it depends only on link utilization, not on line rate)
time
val1
40 41 42 43 44 45
05
1015
20
20
05
1015
20
200
05
1015
20
2000
queu
e si
ze[0
-15
pkt]
time [0-5 sec]
queueing delay19 ms
queueing delay1.9 ms
queueing delay0.19 ms
time
val1
40 41 42 43 44 45
05
1015
20
20
05
1015
20
200
05
1015
20
2000
Large buffers (queueing delay 200 ms)
• When xt<C the queue size is small (C=line rate)
• No packet drops, so TCP increases xt
time
val1
40 42 44 46 48 50
050
100
150
queueapprox
queu
e si
ze[0
-160
pkt
]
time [0-10 sec]
Large buffers (queueing delay 200 ms)
• When xt<C the queue size is small (C=line rate)
• No packet drops, so TCP increases xt
• When xt>C the queue fills up and packets begin to get dropped
time
val1
40 42 44 46 48 50
050
100
150
queueapprox
queu
e si
ze[0
-160
pkt
]
time [0-10 sec]
Large buffers (queueing delay 200 ms)
• When xt<C the queue size is small (C=line rate)
• No packet drops, so TCPs increases xt
• When xt>C the queue fills up and packets begin to get dropped
• TCPs may ‘overshoot’, leading to synchronization
time
val1
40 42 44 46 48 50
050
100
150
queueapprox
queu
e si
ze[0
-160
pkt
]
time [0-10 sec]
Large buffers (queueing delay 200 ms)
• Drop probability depends onboth traffic rate xt and queue size qt
time
val1
40 42 44 46 48 50
050
100
150
queueapprox
queu
e si
ze[0
-160
pkt
]
time [0-10 sec]
Analysis
• Write down differential equations– for aggregate TCP traffic rate xt
– for queue dynamics and loss prob pt
taking account of buffer size
• Calculate– average link utilization– average queue occupancy/delay– extent of synchronization
and consequent loss of utilization, and jitter[Gaurav Raina, PhD thesis, 2005]
Stability/instability analysis
• For some values of C*RTT, the dynamical system is stable
• For others it is unstable and there are oscillations(i.e. the flows are partially synchronized)
• When it is unstable, we can calculate the amplitude of the oscillations
20 40 60 80 1000.60.8
1.21.4
20 40 60 80 1000.60.8
1.21.4
time
trafficrate xt/C
0.5 1 1.5 2
-4
-3
-2
-1
Instability plot
traffic intensity x/C
log10 ofpkt loss
probability p
TCP throughput equation
queue equation
extent ofoscillationsin x/C
0.5 1 1.5 2
-4
-3
-2
-1
Instability plot
C*RTT=4 pkts
C*RTT=20 pkts
C*RTT=100 pkts
traffic intensity x/C
log10 ofpkt loss
probability p
Alternative buffer-sizing rules
b25 b100 b400
0.5 1 1.5
-4
-3
-2
-1
b10 b20 b50
0.5 1 1.5
-4
-3
-2
-1
0.5 1 1.5 2
-4
-3
-2
-1
b50 b1000
0.5 1 1.5-6
-5
-4
-3
-2
-1
r
p
Intermediate buffers buffer = bandwidth*delay / sqrt(#flows)
or Large buffers buffer = bandwidth*delay
Large buffers with AQMbuffer=bandwidth*delay*{¼,1,4}
Small buffersbuffer={10,20,50} pkts
Small buffers, ScalableTCPbuffer={50,1000} pkts[Vinnicombe 2002] [T.Kelly 2002]
Conclusion
• The network acts to solve an optimization problem.– We can choose which optimization problem,
by choosing the right buffer size & by changing TCP’s code.
• It may or may not attain the solution– In order to make sure the network is stable,
we need to choose the buffer size & TCP code carefully.
Prescription
• ScalableTCP in end-systemsneed to persuade Microsoft, Linus
• Much smaller buffers in routersneed to persuade BT/AT&T
x
U(x
)
y
P(y
,C)
C
ScalableTCP gives more weight to high-bandwidth flows.And it’s been shown to be stable.
With small buffers,the network likes to run with slightly lower utilization,hence lower delay