Computer Networks (Graduate level)


University of Tehran, Dept. of EE and Computer Engineering

By: Dr. Nasser Yazdani

Lecture 8: Congestion Control

Congestion Control
  Congestion control basics
  TCP congestion control

Assigned reading
  [JK88] Congestion Avoidance and Control
  [CJ89] Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

Overview
  Congestion sources and collapse
  Congestion control basics
  TCP congestion control
  TCP interactions

Why End-to-End Protocols?

The underlying best-effort network may:
  drop or reorder messages
  deliver duplicate copies of a given message
  limit messages to some finite size
  deliver messages after an arbitrarily long delay
End hosts must also handle:
  multiple application processes on each host
  different speeds of sender and receiver (flow control)
  congestion in the network (congestion control)

Initially, there was no end-to-end protocol. Now:
  UDP: a simple end-to-end protocol
  TCP: a reliable transport protocol

Reliable Transport (TCP)

Communication abstraction:
  Connection oriented, point to point
  Reliable
    Error detection and correction
  Ordered byte stream
    Application writes bytes, TCP sends segments, application reads bytes
  Full duplex, two-way connection
  Flow and congestion controlled
Protocol implemented entirely at the ends
  Fate sharing

Difference From Link Layers

  Logical link vs. physical link
    Must establish connection
  Variable RTT
    May vary within a connection
  Reordering of packets
    How long can packets live? Max segment lifetime
  Can’t expect endpoints to exactly match the link
    Buffer space availability
    Packets in transmission: delay x bandwidth
  Transmission rate
    Don’t directly know the media/network transmission rate (congestion)
  Try to adapt to the situation.

Congestion

Different sources compete for resources inside the network, unaware of the current state of those resources and of each other.

In general, it is a resource allocation problem. Manifestations:
  lost packets (buffer overflow at routers)
  long delays (queuing in router buffers)

[Figure: links labeled 10 Mbps, 100 Mbps, and 1.5 Mbps]

Causes/costs of congestion: scenario 1

  two senders, two receivers
  one router, infinite buffers
  no retransmission
  large delays when congested
  maximum achievable throughput

Causes/costs of congestion: scenario 2

Notation (symbols lost in extraction, restored here): λ_in is the original sending rate, λ'_in is the sending rate including retransmissions, λ_out is the delivered rate.

  one router, finite buffers
  sender retransmission of lost packet
  always: λ_in = λ_out (goodput)
  “perfect” retransmission only when loss: λ'_in > λ_out
  retransmission of delayed (not lost) packet makes λ'_in larger (than perfect case) for same λ_out
  “costs” of congestion:
    more work (retransmissions) for a given “goodput”
    unneeded retransmissions: link carries multiple copies of a packet

Causes/costs of congestion: scenario 3

  four senders
  multihop paths
  timeout/retransmit
  Q: what happens as λ_in and λ'_in increase?
  Another “cost” of congestion: when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
  no explicit feedback from network
  congestion inferred from end-system observed loss, delay
  approach taken by TCP

Network-assisted congestion control:
  routers provide feedback to end systems
    single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    explicit rate sender should send at

Congestion Collapse

An increase in network load results in a decrease of useful work done and cripples the network’s ability to deliver data.

Possible causes
  Spurious retransmissions of packets still in flight
    Classical congestion collapse
    How can this happen with packet conservation?
    Solution: better timers and TCP congestion control
  Undelivered packets
    Packets consume resources and are dropped elsewhere in the network
    Solution: congestion control for ALL traffic

Other Congestion Collapse Causes

  Fragments
    Mismatch of transmission and retransmission units
    Solutions
      Make network drop all fragments of a packet (early packet discard in ATM)
      Do path MTU discovery
  Control traffic
    Large percentage of traffic is for control
    Headers, routing messages, DNS, etc.
  Stale or unwanted packets
    Packets that are delayed on long queues
    “Push” data that is never used

Where to Prevent Collapse?

  Can end hosts prevent the problem?
    Yes, but must trust end hosts to do the right thing
    E.g., sending host must adjust the amount of data it puts in the network based on detected congestion
  Can routers prevent collapse?
    No, not all forms of collapse
    Doesn’t mean they can’t help
      Sending accurate congestion signals
      Isolating well-behaved from ill-behaved sources

Congestion Control and Avoidance

  A mechanism which:
    Uses network resources efficiently
    Preserves fair network resource allocation
    Prevents or avoids collapse
  Congestion collapse is not just a theory
    Has been frequently observed in many networks
    It is a top 10 problem.

Congestion Collapse and Efficiency

  knee: point after which
    throughput increases slowly
    delay increases quickly
  cliff: point after which
    throughput decreases quickly to zero (congestion collapse)
    delay goes to infinity
  Congestion avoidance: stay at knee
  Congestion control: stay left of (but usually close to) cliff
  Note (in an M/M/1 queue): delay = 1/(1 – utilization)

[Figure: throughput and delay vs. load, marking the knee and the cliff; regions labeled under utilization, over utilization, saturation, and congestion collapse]

Goals
  Operate near the knee point
  Remain in equilibrium
How to maintain equilibrium?
  Don’t put a packet into the network until another packet leaves. How do you do it?
  Use ACKs: send a new packet only after you receive an ACK. Why?
  Maintain the number of packets in the network “constant”

How Do You Do It?
  Detect when the network approaches/reaches the knee point
  Stay there
Questions
  How do you get there?
  What if you overshoot (i.e., go over the knee point)?
Possible solution:
  Increase window size until you notice congestion
  Decrease window size if the network is congested

TCP interactions

Control System Model [CJ89]

  Simple, yet powerful model
  Explicit binary signal of congestion
    Why explicit (TCP uses implicit)?
  Implicit allocation of bandwidth

[Figure: User 1, User 2, ..., User n share the network, which signals whether $\sum_i x_i > X_{goal}$]

Objectives
  Simple router behavior
  Distributedness
  Efficiency: $X_{knee} = \sum_i x_i(t)$
  Fairness: $(\sum_i x_i)^2 / (n \sum_i x_i^2)$
  Power: throughput/delay
  Convergence: control system must be stable and responsive
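The fairness objective is the index (Σ x_i)² / (n Σ x_i²). A minimal Python sketch of that formula follows; the function name `jain_fairness` and the sample allocations are illustrative assumptions, not part of the lecture.

```python
def jain_fairness(allocations):
    """Return (sum x_i)^2 / (n * sum x_i^2) for a list of allocations."""
    n = len(allocations)
    sum_sq = sum(x * x for x in allocations)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return (total * total) / (n * sum_sq)


if __name__ == "__main__":
    print(jain_fairness([5, 5, 5, 5]))   # all users get the same share
    print(jain_fairness([20, 0, 0, 0]))  # one user takes everything
```

The index equals 1 when all users receive the same allocation and falls to 1/n when a single user receives everything.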

Power

[Figure: power (ratio of throughput to delay) vs. load, peaking at the optimal load]

Fair Allocation
  Max-min fairness
    Flows which share the same bottleneck get the same amount of bandwidth
  Assumes no knowledge of priorities
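Max-min fairness is commonly computed by "water-filling". The sketch below assumes a single bottleneck of known capacity and a per-flow demand; both are assumptions added for illustration, since the slide only states that flows sharing a bottleneck get the same amount. The name `max_min_share` is hypothetical.

```python
def max_min_share(capacity, demands):
    """Water-filling on one bottleneck: satisfy the smallest demands first,
    then split what is left equally among the remaining flows."""
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Indices of flows not yet allocated, smallest demand first.
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        share = remaining / len(pending)
        i = pending[0]
        if demands[i] <= share:
            # This flow wants less than an equal share: give it its demand.
            alloc[i] = float(demands[i])
            remaining -= demands[i]
            pending.pop(0)
        else:
            # Every remaining flow wants at least the equal share.
            for j in pending:
                alloc[j] = share
            break
    return alloc


if __name__ == "__main__":
    print(max_min_share(10, [2, 2.6, 4, 5]))
```

When every flow wants more than an equal share, the result is simply capacity / n for each flow, which matches the statement that flows sharing a bottleneck get the same bandwidth.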

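[Figure: 2-user example. Axes are User 1’s allocation x1 and User 2’s allocation x2; the fairness line separates the region where user 1 is getting too much from the region where user 2 is getting too much. Fairness = 1 - distance from fairness line]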

Basic Control Model

  Let’s assume window-based control
  Reduce window when congestion is perceived
    How is congestion signaled?
      Either mark or drop packets
    When is a router congested?
      Drop tail queues: when queue is full
      Average queue length: at some threshold
  Increase window otherwise
    Probe for available bandwidth. How?

Linear Control

  Many different possibilities for reaction to congestion and probing
    Examine simple linear controls
    Window(t + 1) = a + b Window(t)
    Different ai/bi for increase and ad/bd for decrease
  Supports various reactions to signals
    Increase/decrease additively
    Increase/decrease multiplicatively
    Which of the four combinations is optimal?

Possible Choices

$$x_i(t+1) = \begin{cases} a_I + b_I \, x_i(t) & \text{increase} \\ a_D + b_D \, x_i(t) & \text{decrease} \end{cases}$$

  Multiplicative increase, additive decrease: aI = 0, bI > 1, aD < 0, bD = 1
  Additive increase, additive decrease: aI > 0, bI = 1, aD < 0, bD = 1
  Multiplicative increase, multiplicative decrease: aI = 0, bI > 1, aD = 0, 0 < bD < 1
  Additive increase, multiplicative decrease: aI > 0, bI = 1, aD = 0, 0 < bD < 1
  Which one?

Phase plots
  What are desirable properties?
  What if flows are not equal?

[Figure: phase plot of User 1’s allocation x1 vs. User 2’s allocation x2, showing the efficiency line, the fairness line, the optimal point, and the overload and underutilization regions]

Phase plots
  Simple way to visualize behavior of competing connections over time

[Figure: phase plot of x1 vs. User 2’s allocation x2 with the efficiency line and the fairness line]

Additive Increase/Decrease

  Both X1 and X2 increase/decrease by the same amount over time
    Additive increase improves fairness and additive decrease reduces fairness

[Figure: phase plot with the efficiency line and the fairness line; additive steps move along a 45-degree line]

Multiplicative Increase/Decrease

  Both X1 and X2 increase by the same factor over time
    Extension from origin: constant fairness

[Figure: phase plot with the efficiency line and the fairness line; multiplicative steps move along a line through the origin]

[Figure: Convergence to Efficiency. Phase plot with the efficiency line and the fairness line]

[Figure: Convergence to Fairness. Phase plot with the efficiency line and the fairness line]

[Figure: Convergence to Efficiency & Fairness. Phase plot with the efficiency line and the fairness line]

[Figure: Increase. Phase plot with the efficiency line and the fairness line]

Constraints

  Distributed efficiency
    I.e., Window(t+1) > Window(t) during increase
    ai > 0 & bi > 1
    Similarly, ad < 0 & bd < 1
  Must never decrease fairness
    a & b’s must be > 0
    ai/bi > 0 and ad/bd ≥ 0
  Full constraints
    ad = 0, 0 ≤ bd < 1, ai > 0 and bi = 1

What is the Right Choice?

  Constraints limit us to AIMD
    Can have multiplicative term in increase
    AIMD moves towards optimal point

[Figure: phase plot with the efficiency line and the fairness line; the AIMD trajectory moves towards their intersection]
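To see why AIMD moves towards the optimal point, the linear control Window(t+1) = a + b Window(t) can be simulated for two users. This is a rough sketch under stated assumptions: both users see the same binary congestion signal (total load above capacity) and update in lockstep; the function name, starting loads, and capacity are illustrative.

```python
def simulate_linear_control(x1, x2, capacity, a_i, b_i, a_d, b_d, steps=200):
    """Apply x(t+1) = a + b * x(t) to two users sharing one link.

    The increase rule (a_i, b_i) is used while x1 + x2 <= capacity,
    the decrease rule (a_d, b_d) when the link is overloaded.
    """
    for _ in range(steps):
        if x1 + x2 > capacity:          # binary congestion signal
            x1, x2 = a_d + b_d * x1, a_d + b_d * x2
        else:
            x1, x2 = a_i + b_i * x1, a_i + b_i * x2
    return x1, x2


if __name__ == "__main__":
    # AIMD: additive increase by 1, multiplicative decrease by 1/2.
    print(simulate_linear_control(1.0, 80.0, 100.0, 1.0, 1.0, 0.0, 0.5))
    # AIAD: additive increase by 1, additive decrease by 1.
    print(simulate_linear_control(1.0, 80.0, 100.0, 1.0, 1.0, -1.0, 1.0))
```

Under AIMD the gap between the two users is halved at every decrease and left unchanged by every increase, so the allocations approach the fairness line. Under AIAD the gap never changes.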

TCP interactions

TCP Congestion Control

  Motivated by ARPANET congestion collapse
  Underlying design principle: packet conservation
    At equilibrium, inject a packet into the network only when one is removed
    Basis for stability of physical systems
  Why was this not working?
    Connection doesn’t reach equilibrium
    Spurious retransmissions
    Resource limitations prevent equilibrium

TCP Congestion Control - Solutions

  Reaching equilibrium
    Slow start
  Eliminating spurious retransmissions
    Accurate RTO estimation
    Fast retransmit
  Adapting to resource availability
    Congestion avoidance

TCP Congestion Control Basics

  Keep a congestion window, cwnd
    Denotes how much the network is able to absorb
  Sender’s maximum window: min(advertised window, cwnd)
  Sender’s actual window: max window - unacknowledged segments
  If we have a large actual window, should we send data in one shot?
    No, use acks to clock sending new data

Self-clocking

[Figure: ack clocking between sender and receiver]

Slow Start

  How do we get this clocking behavior to start?
    Initialize cwnd = 1
    Upon receipt of every ack, cwnd = cwnd + 1
  Implications
    Window actually increases to W in RTT * log2(W)
    Can overshoot window and cause packet loss
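Adding 1 to cwnd per ack doubles the window every RTT, which is where the RTT * log2(W) figure comes from. A minimal sketch, assuming every segment in the window is acked within one RTT and no loss occurs; `slow_start_rtts` is a hypothetical helper name.

```python
def slow_start_rtts(target_window, initial_cwnd=1):
    """Count the RTTs slow start needs to grow cwnd to target_window."""
    cwnd = initial_cwnd
    rtts = 0
    while cwnd < target_window:
        # cwnd acks arrive in one RTT and each adds 1, so cwnd doubles.
        cwnd += cwnd
        rtts += 1
    return rtts


if __name__ == "__main__":
    for w in (1, 8, 64, 100):
        print(w, slow_start_rtts(w))
```

Because the window doubles in the final RTT, it can jump well past the capacity of the path, which is the overshoot mentioned above.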

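Slow Start Example

[Figure: sender/receiver timeline showing one RTT and one packet time; segments 4 to 7 are sent in the third round]

Slow Start Sequence Plot

[Figure: sequence number vs. time during slow start]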

Congestion Avoidance

  Loss implies congestion – why?
    Not necessarily true on all link types
  If loss occurs when cwnd = W
    Network can handle 0.5W ~ W segments
    Set cwnd to 0.5W (multiplicative decrease)
  Upon receiving ACK
    Increase cwnd by 1/cwnd
    Results in additive increase

Return to Slow Start

  If a packet is lost, we lose our self-clocking as well
    Need to implement slow start and congestion avoidance together
  When a timeout occurs, set ssthresh to 0.5W
    If cwnd < ssthresh, use slow start
    Else use congestion avoidance
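The combined rules can be sketched as two small update functions. This is an illustration under assumptions, not a real TCP stack: windows are counted in segments, cwnd is reset to 1 on timeout (as in the slow start slide), and the names `on_ack` and `on_timeout` are hypothetical.

```python
def on_ack(cwnd, ssthresh):
    """Return the new cwnd after one new ACK."""
    if cwnd < ssthresh:
        return cwnd + 1.0          # slow start: +1 per ACK
    return cwnd + 1.0 / cwnd       # congestion avoidance: about +1 per RTT


def on_timeout(cwnd):
    """Return (new_cwnd, new_ssthresh) after a retransmission timeout."""
    ssthresh = 0.5 * cwnd          # remember half the window that caused loss
    return 1.0, ssthresh           # restart from slow start


if __name__ == "__main__":
    cwnd, ssthresh = on_timeout(16.0)
    for _ in range(12):
        cwnd = on_ack(cwnd, ssthresh)
        print(round(cwnd, 3))
```

After a timeout at W = 16, the window grows by one per ACK until it reaches 8 and then switches to the slower 1/cwnd increments.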

Overall TCP Behavior

[Figure: congestion window vs. time; slow start restarts after each timeout]

  A timeout wastes both resources and time.
  How to prevent it: do not wait! Use fast retransmission.

Fast Retransmit

  Resend a segment after 3 duplicate ACKs
    A duplicate ACK means that an out-of-sequence segment was received
  Notes:
    duplicate ACKs can also be due to packet reordering (why reordering?)
    window may be too small to get duplicate ACKs
  Then what? Slow start

[Figure: sender/receiver timeline with cwnd = 1 (segment 1), cwnd = 2 (segments 2 and 3), cwnd = 4 (segments 4 to 7), ACK 4, and 3 duplicate ACKs]

Fast Recovery

  A duplicate ack notifies the sender that a packet has departed the network
  When < cwnd packets are outstanding
    Allow new packets out with each new duplicate acknowledgement
  Behavior
    Sender is idle for some time, waiting for ½ cwnd worth of dupacks
    Transmits at original rate after wait
      Ack clocking rate is same as before loss
  At the end: no slow start; W = W/2 and go to congestion avoidance

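[Figure: Fast Retransmit. Sequence number vs. time: a loss (X), duplicate acks, then the retransmission]

[Figure: Fast Recovery. Sequence number vs. time: a packet is sent for each dupack after W/2 dupacks arrive]

[Figure: Multiple Losses. Sequence number vs. time: losses (X), duplicate acks, a retransmission. Now what?]

[Figure: Multiple Losses, continued. Sequence number vs. time: slow start again]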

TCP Reno (1990)

  All mechanisms in Tahoe
  Addition of fast recovery
    Opening up congestion window after fast retransmit
  Delayed acks
  Header prediction
    Implementation designed to improve performance
    Has common case code inlined
  With multiple losses, Reno typically times out because it does not see duplicate acknowledgements

TCP Reno

  Fast retransmit: retransmit a segment after 3 DUP Acks
  Fast recovery: reduce cwnd to half instead of to one

[Figure: Reno phases: Slow Start, Congestion Avoidance, Timeout, Fast Recovery]

[Figure: sequence number vs. time with multiple losses (X). Now what? Timeout]

NewReno

  The ack that arrives after retransmission (partial ack) should indicate that a second loss occurred
  When does NewReno time out?
    When there are fewer than three dupacks for the first loss
    When a partial ack is lost
  How fast does it recover losses?
    One per RTT

[Figure: NewReno. Sequence number vs. time with losses (X). Now what? Partial ack recovery]

SACK

  Basic problem is that cumulative acks provide little information
    Ack for just the packet received
      What if acks are lost? Carry cumulative ack also
      Not used
    Bitmask of packets received
      Selective acknowledgement (SACK)
  How to deal with reordering

[Figure: SACK. Sequence number vs. time with losses (X). Now what? Send retransmissions as soon as detected]

Performance Issues

  Timeout >> fast rexmit
  Need 3 dupacks/sacks
  Not great for small transfers
    Don’t have 3 packets outstanding
  What are real loss patterns like?
  Right edge recovery
    Allow packets to be sent on arrival of first and second duplicate ack
    Helps recovery for small windows
  How to deal with reordering?

NewReno Changes

  Send a new packet out for each pair of dupacks
    Adapt more gradually to new window
  Will not halve congestion window again until recovery is completed
    Identifies congestion events vs. congestion signals
  Initial estimation for ssthresh

Rate Halving Recovery

[Figure: sequence number vs. time; a packet is sent after every other dupack]

Delayed Ack Impact

  TCP congestion control is triggered by acks
    If we receive half as many acks, the window grows half as fast
  Slow start with window = 1
    Will trigger delayed ack timer
    First exchange will take at least 200ms
    Start with > 1 initial window
      Bug in BSD, now a “feature”/standard

TCP Congestion Control

  end-end control (no network assistance)
  transmission rate limited by congestion window size, Congwin, over segments
  w segments, each with MSS bytes, sent in one RTT:

    throughput = (w * MSS) / RTT  bytes/sec

Fast Retransmit and Recovery in Reno

  Upon reception of 3 dupes, thresh = congwin/2. Set congwin = thresh + 3
    This accounts for the 3 packets that have left the network.
  Increment congwin for each dupe subsequently received. Transmit a new segment if we are allowed.
  When a new ack* finally arrives, set congwin = thresh and we are in congestion avoidance again with a “deflated window”.

  * “new ack” means an ack for any data not yet acked, but inside congwin.
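The Reno steps can be sketched as a small state holder. This is a simplified illustration, not a complete TCP implementation: windows are counted in segments, the threshold is set to half the current window on the third dupe (consistent with the earlier "W = W/2" fast recovery slide), and the class and method names are hypothetical.

```python
class RenoWindow:
    """Tracks congwin and thresh through fast retransmit / fast recovery."""

    def __init__(self, congwin, thresh):
        self.congwin = float(congwin)
        self.thresh = float(thresh)
        self.dupes = 0
        self.in_recovery = False

    def on_dup_ack(self):
        """Handle one duplicate ack and return the action taken."""
        self.dupes += 1
        if self.in_recovery:
            self.congwin += 1              # another packet left the network
            return "inflate"
        if self.dupes == 3:
            self.thresh = self.congwin / 2
            self.congwin = self.thresh + 3  # 3 packets have left the network
            self.in_recovery = True
            return "retransmit"
        return "wait"

    def on_new_ack(self):
        """Handle an ack for data not yet acked."""
        self.dupes = 0
        if self.in_recovery:
            self.congwin = self.thresh     # deflate the window
            self.in_recovery = False


if __name__ == "__main__":
    w = RenoWindow(congwin=16, thresh=64)
    print([w.on_dup_ack() for _ in range(4)], w.congwin, w.thresh)
    w.on_new_ack()
    print(w.congwin, w.in_recovery)
```

The temporary inflation during recovery lets the sender keep the ack clock running; the window is only set to the halved value once new data is acknowledged.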

Fast Retransmit and Recovery in NewReno

  Upon reception of 3 dupes, thresh = congwin/2. Set congwin = thresh + 3
    This accounts for the 3 packets that have left the network.
  Increment congwin for each dupe subsequently received. Transmit a new segment if we are allowed.
  When a new ack* finally arrives, set congwin = thresh and we are in congestion avoidance again.

  * But only do this if the ack received is for the highest seq# sent, avoiding a stall while recovering.

TCP & Routers

  How routers can help congestion control
    Indeed, congestion control and queue management are the same problem: “resource allocation”.
  RED
  XCP
  Read Chapter 6 of the book; also look at [FJ93] Random Early Detection Gateways for Congestion Avoidance

Queuing Disciplines

  Each router must implement some queuing discipline
  Queuing allocates both bandwidth and buffer space:
    Bandwidth: which packet to serve (transmit) next
    Buffer space: which packet to drop next (when required)
  Queuing also affects latency

Packet Drop Dimensions

  Aggregation: per-connection state, class-based queuing, single class
  Drop position: head, random location, tail
  Early drop vs. overflow drop

Typical Internet Queuing

  FIFO + drop-tail
    Simplest choice
    Used widely in the Internet
  FIFO (first-in-first-out)
    Implies single class of traffic
  Drop-tail
    Arriving packets get dropped when the queue is full, regardless of flow or importance
  Important distinction:
    FIFO: scheduling discipline
    Drop-tail: drop policy

Active Queue Management

  Design active router queue management to aid congestion control
  Why?
    Routers can distinguish between propagation and persistent queuing delays
    Routers can decide on transient congestion, based on workload

Active Queue Designs

  Modify both router and hosts
    DECbit: congestion bit in packet header
  Modify router, hosts use TCP
    Fair queuing
      Per-connection buffer allocation
    RED (Random Early Detection)
      Drop packet or set bit in packet header as soon as congestion is starting

Random Early Detection (RED)

  Detect incipient congestion, allow bursts
  Keep power (throughput/delay) high
    Keep average queue size low
    Assume hosts respond to lost packets
  Avoid window synchronization
    Randomly mark packets
  Avoid bias against bursty traffic
  Some protection against ill-behaved sources

RED Algorithm

  Maintain running average of queue length
  If avgq < minth, do nothing
    Low queuing, send packets through
  If avgq > maxth, drop packet
    Protection from misbehaving sources
  Else mark packet in a manner proportional to queue length
    Notify sources of incipient congestion

RED Operation

[Figure: queue with min thresh and max thresh marked against the average queue length; plot of P(drop) vs. avg queue length, rising between minth and maxth]

RED Algorithm

  Maintain running average of queue length
    Byte mode vs. packet mode – why?
  For each packet arrival
    Calculate average queue size (avg)
    If minth ≤ avg < maxth
      Calculate probability Pa
      With probability Pa, mark the arriving packet
    Else if maxth ≤ avg
      Mark the arriving packet
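The per-arrival decision can be sketched as follows. One simplification is assumed and should be read as such: the marking probability rises linearly from 0 at minth to maxp at maxth, without the refinement that spaces marks by the number of packets since the last mark. The function name and the injectable `rng` parameter are hypothetical.

```python
import random


def red_should_mark(avg, min_th, max_th, max_p, rng=random.random):
    """Decide whether to mark (or drop) the arriving packet."""
    if avg < min_th:
        return False                      # low queuing: let the packet through
    if avg >= max_th:
        return True                       # persistent congestion: always mark
    # Between the thresholds: probability proportional to the average queue.
    p = max_p * (avg - min_th) / (max_th - min_th)
    return rng() < p


if __name__ == "__main__":
    marks = sum(red_should_mark(10, 5, 15, 0.1) for _ in range(10000))
    print("marked", marks, "of 10000 arrivals at avg = 10")
```

Passing `rng` explicitly makes the random decision testable; a router would of course use its own random source.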

Queue Estimation

  Standard EWMA: avgq = (1 - wq) * avgq + wq * qlen
    Special fix for idle periods – why?
  Upper bound on wq depends on minth
    Want to ignore transient congestion
    Can calculate the queue average if a burst arrives
      Set wq such that a certain burst size does not exceed minth
  Lower bound on wq to detect congestion relatively quickly
  Typical wq = 0.002
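A short sketch of the EWMA update, showing how little a burst moves the average when wq = 0.002. The burst model (queue length growing by one per arrival from an empty queue, average starting at 0) and the function names are assumptions made for illustration.

```python
def update_avg_queue(avg_q, q_len, w_q=0.002):
    """Standard EWMA: avgq = (1 - wq) * avgq + wq * qlen."""
    return (1 - w_q) * avg_q + w_q * q_len


def average_after_burst(burst_size, w_q=0.002):
    """Average queue after burst_size back-to-back arrivals to an empty queue."""
    avg_q = 0.0
    for q_len in range(1, burst_size + 1):
        avg_q = update_avg_queue(avg_q, q_len, w_q)
    return avg_q


if __name__ == "__main__":
    print(average_after_burst(50))
```

With these assumptions a 50-packet burst leaves the average below 3 packets, so a transient burst stays under a minth of, say, 5 packets, while a larger wq would react faster but also treat bursts as congestion.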

Thresholds

  minth determined by the utilization requirement
    Tradeoff between queuing delay and utilization
  Relationship between maxth and minth
    Want to ensure that feedback has enough time to make a difference in load
    Depends on average queue increase in one RTT
    Paper suggests a ratio of two
      Current rule of thumb is a factor of three

Packet Marking

  maxp is reflective of typical loss rates
  Paper uses 0.02
    0.1 is a more realistic value
  If network needs marking of 20-30%, then need to buy a better link!
  Gentle variant of RED (recommended)
    Vary drop rate from maxp to 1 as avgq varies from maxth to 2 * maxth
    More robust to setting of maxth and maxp
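The gentle variant can be written as a piecewise-linear drop probability. This sketch assumes the usual linear ramp from 0 to maxp between minth and maxth, then follows the rule above from maxp to 1 between maxth and 2 * maxth; the function name is hypothetical.

```python
def gentle_red_drop_prob(avg, min_th, max_th, max_p):
    """Drop probability as a function of the average queue length."""
    if avg < min_th:
        return 0.0
    if avg < max_th:
        # Ramp from 0 to max_p between the two thresholds.
        return max_p * (avg - min_th) / (max_th - min_th)
    if avg < 2 * max_th:
        # Gentle region: ramp from max_p to 1 instead of jumping to 1.
        return max_p + (1 - max_p) * (avg - max_th) / max_th
    return 1.0


if __name__ == "__main__":
    for avg in (0, 10, 15, 22.5, 30):
        print(avg, gentle_red_drop_prob(avg, 5, 15, 0.1))
```

Because the probability no longer jumps from maxp to 1 at maxth, a slightly wrong choice of maxth or maxp degrades behavior gradually rather than abruptly.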

Extending RED for Flow Isolation

  Problem: what to do with non-cooperative flows?
  Fair queuing achieves isolation using per-flow state, which is expensive at backbone routers
    How can we isolate unresponsive flows without per-flow state?
  RED penalty box
    Monitor history for packet drops, identify flows that use disproportionate bandwidth
    Isolate and punish those flows

FRED

  Fair Random Early Drop (Sigcomm, 1997)
  Maintain per-flow state only for active flows (ones having packets in the buffer)
  minq and maxq: min and max number of buffers a flow is allowed to occupy
  avgcq = average buffers per flow
  Strike count of number of times flow has exceeded maxq

How does XCP Work?

  Each packet carries a congestion header with three fields: Round Trip Time, Congestion Window, Feedback
  Routers along the path write the Feedback field (example values shown: Feedback = + 0.1 packet, Feedback = - 0.3 packet)
  Congestion Window = Congestion Window + Feedback
  Routers compute feedback without any per-flow state
  XCP extends ECN and CSFQ

How Does an XCP Router Compute the Feedback?

Congestion Controller (MIMD)
  Goal: matches input traffic to link capacity and drains the queue
  Looks at aggregate traffic and queue
  Algorithm: aggregate traffic changes by $\phi$
    $\phi \sim$ Spare Bandwidth
    $\phi \sim -$ Queue Size
    So, $\phi = \alpha \, d_{avg} \, Spare - \beta \, Queue$

Fairness Controller (AIMD)
  Goal: divides $\phi$ between flows to converge to fairness
  Looks at a flow’s state in the Congestion Header
  Algorithm:
    If $\phi > 0$: divide $\phi$ equally between flows
    If $\phi < 0$: divide $\phi$ between flows proportionally to their current rates
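The congestion controller's aggregate feedback has the form φ = α · d_avg · Spare − β · Queue. The sketch below evaluates it; the default constants α = 0.4 and β = 0.226 and the stability condition 0 < α < π/(4√2), β = α²√2 are taken from the published XCP design rather than from this transcript, and the function names and units (packets per second for spare bandwidth, packets for queue, seconds for d_avg) are assumptions.

```python
import math


def xcp_aggregate_feedback(d_avg, spare, queue, alpha=0.4, beta=0.226):
    """phi = alpha * d_avg * spare - beta * queue (in packets)."""
    return alpha * d_avg * spare - beta * queue


def is_stable_choice(alpha, beta, tol=1e-3):
    """Check 0 < alpha < pi / (4 * sqrt(2)) and beta = alpha^2 * sqrt(2)."""
    alpha_ok = 0 < alpha < math.pi / (4 * math.sqrt(2))
    beta_ok = abs(beta - alpha * alpha * math.sqrt(2)) < tol
    return alpha_ok and beta_ok


if __name__ == "__main__":
    print(xcp_aggregate_feedback(d_avg=0.1, spare=1000.0, queue=20.0))
    print(is_stable_choice(0.4, 0.226))
```

A positive φ tells senders, in aggregate, to grow; a standing queue pushes φ down even when some bandwidth is spare, which is how the queue gets drained.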

Getting the devil out of the details …

Congestion Controller
  $\phi = \alpha \, d_{avg} \, Spare - \beta \, Queue$
  Theorem: the system converges to optimal utilization (i.e., is stable) for any link bandwidth, delay, and number of sources if:
    $0 < \alpha < \frac{\pi}{4\sqrt{2}}$ and $\beta = \alpha^2 \sqrt{2}$
  (Proof based on Nyquist Criterion)
  No parameter tuning

Fairness Controller
  Algorithm:
    If $\phi > 0$: divide $\phi$ equally between flows
    If $\phi < 0$: divide $\phi$ between flows proportionally to their current rates
  Need to estimate number of flows N:
    $N = \sum_{pkts \, in \, T} \frac{1}{T \times (Cwnd_{pkt} / RTT_{pkt})}$
    $RTT_{pkt}$: Round Trip Time in header
    $Cwnd_{pkt}$: Congestion Window in header
    T: Counting Interval
  No per-flow state
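The flow-count estimate works because a flow with window Cwnd and round trip time RTT contributes about T · Cwnd / RTT packets during the interval T, so weighting each packet by RTT / (T · Cwnd) makes every flow sum to roughly 1. A minimal sketch, assuming each observed packet is given as an `(rtt, cwnd)` pair read from its congestion header; the function name and the sample flows are illustrative.

```python
def estimate_flow_count(packets, interval):
    """Estimate N from the (rtt, cwnd) headers of packets seen in `interval`.

    No per-flow state is kept: each packet adds 1 / (T * cwnd / rtt).
    """
    return sum(rtt / (interval * cwnd) for rtt, cwnd in packets)


if __name__ == "__main__":
    T = 1.0
    # Flow A: cwnd 10, RTT 0.1 s -> about 100 packets in T.
    # Flow B: cwnd 5,  RTT 0.5 s -> about 10 packets in T.
    seen = [(0.1, 10)] * 100 + [(0.5, 5)] * 10
    print(estimate_flow_count(seen, T))
```

The two sample flows send very different numbers of packets, yet each contributes about 1 to the estimate.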

Lessons

  TCP alternatives
    TCP being used in new/unexpected ways
    Key changes needed
  Routers
    FIFO, drop-tail interacts poorly with TCP
    Various schemes to desynchronize flows and control loss rate
  Fair queuing
    Clean resource allocation to flows
    Complex packet classification and scheduling
  Core-stateless FQ & XCP
    Coarse-grain fairness
    Carrying packet state can reduce complexity

Next Lecture: TCP behavior and New Versions

  High speed TCPs
  Assigned reading
    [BP95] TCP Vegas: End to End Congestion Avoidance on a Global Internet
    [FHPW00] Equation-Based Congestion Control for Unicast Applications
