Signal Processing Challenges in Active Queue Management

11
IEEE SIGNAL PROCESSING MAGAZINE SEPTEMBER 2004 69 1053-5888/04/$20.00©2004IEEE Signal Processing Challenges in Active Queue Management A review of central issues Stephan Bohacek, Khushboo Shah, Gonzalo R. Arce, and Mike Davis n 1993, the publication of Floyd and Jacobson’s seminal paper “Random Early Detection Gateways for Congestion Avoidance” [1] marked a new direction in networking research and began what is perhaps the most investigated example of cross-layer optimization. While this paper inspired an immense amount of work, after ten years of work, countless papers, and thousands of hours of simulation time, there is no clear consensus about whether the ideas or approach discussed by Floyd and Jacobson or any other related approaches bring any benefit to a typical computer network. Lack of consen- sus is perhaps an understatement. The Internet Engineering Task Force (IETF, the group that sets the standards for the Internet) recommended that these approaches should be implemented by network opera- tors [2], and most commercial routers have the capabili- ty to implement them as well. However, there is an abundance of convincing papers that illustrate that the approaches set forward by Floyd and Jacobson as well as other related papers will have an effect that falls somewhere between significant improvement and to a major degradation in performance [3]–[7]. The problem addressed by Floyd and Jacobson stems from a conceptual problem with how two basics subsystems, transport control protocol (TCP) and router queues, interact. TCP is the prominent mecha- nism for controlling the data sending rates in the Internet; it controls at least 85% of all traffic on the Internet. The goal of TCP’s congestion control is to utilize all available bandwidth in a fair and efficient way. Congestion control is a difficult problem because TCP runs only at the end-hosts, and there is little communication from the network to end hosts. Hence, the end-hosts are left to infer the available bandwidth with limited I © DIGITAL VISION

Transcript of Signal Processing Challenges in Active Queue Management

Page 1: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINESEPTEMBER 2004 691053-5888/04/$20.00©2004IEEE

Signal ProcessingChallenges in ActiveQueue Management

A review of central issues

Stephan Bohacek,Khushboo Shah,

Gonzalo R. Arce,and Mike Davis

n 1993, the publication of Floyd and Jacobson’s seminalpaper “Random Early Detection Gateways forCongestion Avoidance” [1] marked a new direction in

networking research and began what is perhaps the mostinvestigated example of cross-layer optimization. Whilethis paper inspired an immense amount of work, after tenyears of work, countless papers, and thousands of hoursof simulation time, there is no clear consensus aboutwhether the ideas or approach discussed by Floyd andJacobson or any other related approaches bring anybenefit to a typical computer network. Lack of consen-sus is perhaps an understatement. The InternetEngineering Task Force (IETF, the group that sets thestandards for the Internet) recommended that theseapproaches should be implemented by network opera-tors [2], and most commercial routers have the capabili-

ty to implement them as well. However, there is anabundance of convincing papers that illustrate that the

approaches set forward by Floyd and Jacobson as well asother related papers will have an effect that falls somewhere

between significant improvement and to a major degradationin performance [3]–[7].The problem addressed by Floyd and Jacobson stems from a

conceptual problem with how two basics subsystems, transport controlprotocol (TCP) and router queues, interact. TCP is the prominent mecha-

nism for controlling the data sending rates in the Internet; it controls at least 85% of alltraffic on the Internet. The goal of TCP’s congestion control is to utilize all availablebandwidth in a fair and efficient way. Congestion control is a difficult problem becauseTCP runs only at the end-hosts, and there is little communication from the network toend hosts. Hence, the end-hosts are left to infer the available bandwidth with limited

I

©D

IGIT

AL

VIS

ION

Page 2: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINE70 SEPTEMBER 2004

information. While TCP does not meet its goals in allcircumstances, it achieves a balance. And perhaps mostimportantly, TCP does not lead to congestion collapsewhen the number of users increases. TCP employs anadditive increase multiplicative decrease (AIMD)scheme where the sending rate slowly increases until apacket loss is detected at which point the sending rateis divided in half. (More details on TCP can be foundin “Transport Control Protocol” and in [8].)

While packet loss may be due to random transmis-sion errors, in today’s wired networks, most lost pack-ets are due to queue overflow in routers as follows. Atthe routers (sometimes called switches), packets arriveon many input interfaces and are forwarded to outputinterfaces. If a packet destined for a particular outgo-ing interface arrives when the outgoing link is busy,this packet is placed in a first-in, first-out (FIFO)queue. In the event that packets arrive faster than theoutgoing link speed, the FIFO queue will becomeincreasingly occupied. Eventually, packets will nolonger be able to be accommodated and packets willbe dropped.

While TCP has served its purpose exceptionally welland is partly responsible for the communication explo-sion of the last decade, there is a fundamental concep-tual flaw. Routers have queues to enable statisticalmultiplexing. Suppose that two flows share a link atsome point between their respective sources and desti-nations. Since statistical multiplexing is employed, theseflows do not have to synchronize the times at whichthey transmit packets. Instead, if two packets arrive at arouter at the same time, one is transmitted while theother is placed in queue. Similarly, if many flows share asingle link, a burst of packets may arrive at an outputinterface in a short amount of time and queue may

become quite full. Thus, to take advantage of statisticalmultiplexing, a queue is needed. Indeed, from elemen-tary queuing theory, it is clear that a far larger through-put can be achieved by including a queue. However,TCP congestion control uses the queue in a differentway than the queue was intended to be used for. TCPtries to fill the queue. When the queue is filled, TCPwill learn, by way of a packet loss, that the sending rateshould be decreased. Plainly said, the queue is for ran-dom bursts, but TCP fills it to determine bandwidth.

The goal of Floyd and Jacobson’s paper and thepapers that followed was to correct this apparent conflictthrough active queue management (AQM). The idea isthis. TCP’s goal is to determine when the availablebandwidth is fully utilized. In AQM, when the routerdetermines that the bandwidth is fully utilized, packetsare dropped even when the queue is not full so as toalert TCP and thereby keeping TCP flows in check.

Since a comprehensive solution to AQM hasremained an open problem for over ten years, it seemsappropriate to embark on novel approaches. As will beshown, some of the challenges facing AQM design fallinto the area of signal processing and significant contri-bution by the signal processing community is possible.This article seeks to frame these problems in termsaccessible to the signal processing researchers. In thenext section, the basic idea of AQM is provided. Thenthe objectives of AQM are discussed, and a shortoverview of a sample of different approaches to AQMis provided. The signal processing aspects of AQM arediscussed in the following sections, specifically, theproblem of predicting congestion, approaches todetecting changes in network traffic, an estimationproblem, and dithering and quantization. Finally, someconcluding comments are made.

Abrief and simplified discussion transport control protocol(TCP) is suitable for our purposes (see [8] for more infor-

mation on TCP). In most implementations of TCP (e.g., TCP-SACK, TCP-NewReno, etc.), the source maintains a variable, thecongestion window (cwnd), that is the maximum allowablenumber of unacknowledged packets. For every packet received,the destination replies with an acknowledgement. Since ittakes one round-trip time (RTT) after a packet is sent for anacknowledgement to be received, TCP will permit a sendingrate of cwnd/RTT packets/s. A TCP source takes one of twostates, congestion avoidance or slow-start. When a TCP flowbegins, it enters the slow-start state. In slow-start, the conges-tion window is incremented by one every time an acknowl-edgement is received. This amounts to the congestion windowdoubling every RTT. TCP remains in the slow start state until thefirst packet drop is detected. Upon detecting a drop, the con-gestion window is divided by two and the congestion avoid-

ance state is entered. In the congestion avoidance state, cwndis incremented by one every time a congestion window’s worthof acknowledgements are received, which occurs exactly onceper RTT. When a packet drop is detected (by lack of acknowl-edgement), the congestion window is divided by two. Notethat if the size of the file transferred is small enough, the entirefile will be transfered during the slow-start phase.

cwnd

Time

During slow-start,cwnd growsexponentially cwnd divides when a

packet loss is detected

When no loss occurs, cwndincreases at a rate of 1/RTT

Transport Control Protocol

Page 3: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINESEPTEMBER 2004 71

An Overview of AQM and the BasicsA key feature of TCP is that it decreases the sendingrate when it experiences packet losses. AQM tries totake advantage of this feature by dropping packets tocontrol the packet arrival rates. Figure 1 situates AQMin the network. The question mark denotes AQM prin-ciple task, deciding to drop a packet or not.

There are two ways that AQM decides to drop apacket. AQM may directly select which packet is to bedropped as in the case of adaptive virtual queue (AVQ)[9] and stabilized RED (SRED) [10]. Or, as in the caseof Floyd and Jacobson’s RED [1], adaptive RED(ARED) [11], random exponential marking (REM)[12], proportional integrator (PI) controller [13], andBLUE [14], the AQM determines a drop probabilityand packets are dropped at random according to thisprobability. (This BLUE should not be confused withthe best linear unbiased estimator. In fact, BLUE is notan acronym; rather it is an alternative to RED.)

There has been extensive work toward modelingTCP’s data rate. The typical assumption is that packetsare dropped at random. While the validity of thisassumption depends on the AQM, it is often a goodapproximation even when a deterministic AQM is used.(The approximation is not true when the flows becomesynchronized.) Letting δ denote the average dropprobability. It has been found that for small δ, the aver-age throughput (i.e., the number of packets sent persecond) of TCP while in congestion avoidance is

T = C

RTT√

δ, (1)

where C is some constant, typically C = √3/2, and

nearly always between 1 and 1.4 [15]. If the dropprobability is time varying, then TCP’s mean transmis-sion rate is also time varying. A reasonable approxima-tion of the dynamics of TCP’s mean sending rateduring congestion avoidance is

ddt

T (t ) = 1RTT2 − βT (t )2 δ (t ), (2)

where β is a constant, typically, β = 1/C 2 where C isthe constant in (1). The model (2) has developed in[16] while analysis of the case where drops are modeledas a Cox process and round-trip time (RTT) is modeledas a diffusion process can be found in [17].

Note that not all traffic is TCP traffic. For example,a nonnegligible amount of traffic follows the UDP pro-tocol. UDP does not have any congestion control. Itssending rate is controlled by the application.

As an alternative to dropping packets, it has beenproposed that AQM would simply mark packets ratherthan drop them. When a TCP destination gets amarked packet, it returns an ACK with a flag set notify-ing the source that a packet has been marked. Uponreceiving such a packet, the source acts as though a

packet has been dropped. At this time, however, not allhosts support this feature. Regardless, we use the termsmark and drop interchangeably.

Objectives of AQMBefore discussing AQM schemes, the objectives ofAQM must be understood. The utility of an AQM

▲ 1. A pictorial representation of AQM.

Sources

AQM?

Network

Queue

Network

Destinations

Network Traffic

The fluctuation of traffic greatly impact the performanceof AQM. In [22], network measurements indicated that

Ethernet traffic displayed self-similarity. In [23], it wasshown that such behavior could be generated by theaggregate of on-off processes where the on and off timeshave a heavy-tailed distribution. In [24], it is argued thatfiles sizes on the Internet have a heavy-tailed distribution,specifically, they are Pareto distributed. This model ofPareto distributed on and off times has become a standardmodel for modeling Internet traffic. Since the exponent inthese distributions is less than two and greater than one,the mean exists, but the variance does not. In terms of net-work flows, this implies that while most files transferred arequite small, some transfers are very long. Due to this distri-bution, it is assumed that there are two types of file trans-fers, short-lived flows (often called mice) and long-livedflows (often called elephants). While the accepted para-digm is that there are two types of flows (mice and ele-phants), one must remember that there are medium sizedfiles as well. The degree that intermediate sized file trans-fers behave as mice or elephants depends on the RTT, thepacket drop probability, and TCP parameters such as theinitial value of SSThres and the receiver window size (see[8] for details on these parameters). While manyresearchers have assumed that the file size follows thePareto distribution, there has been some investigations thatindicate that non-heavy tailed distributions such lognormalare more accurate. Furthermore, while web traffic mostlyconsists of small file transfers, file sharing consist of largefiles transfers, and online gaming uses very short file mes-sages. As the popularity of these applications waxes andwanes and as new applications are introduced, the distri-bution of the size of files transferred varies.

Page 4: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINE72 SEPTEMBER 2004

scheme that performs well in a narrow sense (e.g.,increases the transfer time of very large file) is limited.Rather, an AQM scheme for “typical” networks mustmeet many objectives, some of which are difficult toquantify. Furthermore, these requirements change asnew technology and new applications are introduced.For these reasons, listing the objectives of AQM is adifficult and perhaps controversial task.

In the original paper by Floyd and Jacobson and inmany of the AQM papers since Floyd and Jacobson’s,the primary objective was to keep the average queuingdelay low while maximizing the throughput. If thequeue never empties, then the outgoing link is alwaysbusy, hence the highest possible throughput isachieved. While a large average queue occupancyreduces the possibility of the queue emptying and thelink going idle, it also increases the time it takes forpackets to pass through the queue. Hence, Floyd andJacobson desired a balance where the average queueoccupancy is as small as possible without causing a sig-nificant decrease in throughput.

Reducing the time to transfer a file is often sited as agoal of AQM [18]. The time to transfer a small file isdominated RTT and not the bandwidth. Thus, control-ling the average queue occupancy also controls theaverage time to transfer a small file. On the other hand,some applications have strict delay requirements. Forexample, voice-over-IP requires an end-to-end delay ofless than 250 ms; an amount that is easily surpassedwhen a large queue is full. A simple version of a voice-over-IP delay criterion might impose a maximum delay.However, considering that a few late packets inter-spaced with enough on-time packets does not signifi-cantly detract from quality, a reasonable, yet complex,delay criterion might consider how often the delay isgreater than 250 ms as well as the sojourn times.

Let q (t ) be the number of packets in the queueincluding the packet that is currently being transmitted.Since maximizing the throughput is the same as mini-mizing the time that the queue is empty, the primarygoals of AQM are

min1T

∫ T

01{q(t )=0}dt

subject to: GT (q) ≤ Q , (3)

where 1 is the indicator function and where G dependson constraint on the queue occupancy, or equivalently, G

depends on the constraint on the delay. For example, ifthe average queue occupancy is constrained, thenGT (q) := (1/T )

∫ T0 q(t )dt , if the maximum queue occu-

pancy is constrained GT (q) := maxt∈[0,T ] q (t ), and if theprobability of large delay is constrained, thenGT (q) := (1/T )

∫ T0 1{q(t )>q}dt . In all cases, the time

horizon T could be infinite.While solving the above problems is quite challeng-

ing, AQM must meet other objectives as well. Some ofthese objectives are quite subtle, but still critical. Onesuch objective is “no bias to bursty traffic.” To under-stand the origins of this objective, we must understandsome details of TCP. TCP does not send packets at aconstant rate, rather it sends packets in bursts with thetime between bursts equal to one RTT. The number ofpackets sent in a burst is equal to the congestion win-dow, while the average bit-rate rate of the flow is thewindow size divided by RTT. Thus, for a fixed bit-rate,the larger the RTT, the larger the burst. If an AQMscheme tends to drop packets that arrive in bursts, thenit will tend to drop packets belonging to flows withlarger RTT. This would allow flows with shorter RTTto dominate the link bandwidth. Hence, bias to burstyflows is actually an issue of fairness. To evaluate RED,Ziegler [19] proposed the following measure of fairness

(1

N

∑di

)2

1N

∑d2

i

,

where di is the empirical drop probability experiencedby the ith flow.

Another reason cited to use AQM, especially RED,is that it reduces the possibility of global synchroniza-tion. In simulations, it is not uncommon for flows tosynchronize. For example, suppose two flows share alink and compete for bandwidth over this link. If theseflows also cross some links that are not shared and,consequently, have different round-trip time, it is possi-ble that the flows will become synchronized in such away that one flow achieves a far higher sending ratethan the other flow. Small changes in the link propaga-tion delays often breaks this synchronization. An effec-tive way to stop synchronization is to drop packetsprobabilistically. Because synchronization is so frequentin simulation, some sort of randomization is often anecessity. However, since the Internet is very large andusers follow random behavior, it is not clear how preva-lent synchronization is in real networks ([10] indicatesthat synchronization is rare, while [20] shows that itdoes occur). As long as an AQM scheme drops packetsrandomly, no special design consideration is required toavoid synchronization.

In the original Floyd and Jacobson paper, it wasmentioned that RED could control the queue sizeeven when flows do not use TCP congestion control.

A key feature of TCP is that itdecreases the sending ratewhen it experiences packetlosses.

Page 5: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINESEPTEMBER 2004 73

More recently, in [21], one of the principle objectivesof RED was stated to drop packets of flows that are“nonconformant.” Here nonconformant flows arethose that send data faster than TCP would dictate.There is considerable concern that if users utilize con-gestion control schemes that send data faster thanTCP, the TCP flows would suf fer reduction inthroughput. It is feared that if such schemes becamewide spread, the network may become unstable andexperience congestion collapse where congestion is sosevere that little data is successfully transmitted.Similarly, a malicious attacker could send packets at ahigh rate effectively starving the TCP flows of band-width. In this way, attackers could attack links andimpact on the utility of the network.

There has been extensive work investigating the sta-bility of different AQM algorithms. Since all quantitiesare bounded, stability concepts such as bounded inputimplies bounded output are not applicable. Instead,investigators examine the local stability around specificoperating points. However, if the queue occupancyoscillates (with either fixed period or chaotically [5])but never empties and never exceeds the specified limit,then these oscillations might be harmless [11]. On theother hand, an AQM that oscillates may lead to unfore-seen problems for other protocols that rely on AQM.

Another critical objective of an AQM scheme is sim-plicity. Today, administering an ISP is seen as overly dif-ficult. Thus, any added further complexity must comewith overwhelming improvements in performance.

Existing AQM AlgorithmsThere are many approaches to AQM. Here we brieflyreview a few approaches and then illustrate their per-formance through several simulations.

AQM AlgorithmsThis section is not meant to be a comprehensive reviewof AQM algorithms. The reader is urged to see the ref-erenced papers for more details.

The first AQM strategy is the simplest, drop-tail queue.Here a finite sized FIFO queue is employed. If an arrivingpacket finds the queue full, this packet is discarded.

RED is the most investigated AQM. RED determinesthe packet drop probability based on a filtered version ofthe queue occupancy. Let tn denote the time when thenth packet exits the queue and let τk be the time whenthe kth packet arrives at the queue. Hence, at momentsτk, the occupancy of the queue increments by one unlessthe queue is already full, in which case it remains at qmax,the size of the queue. Between packet arrivals, severalpackets may have departed. Let |{n : tn ∈ (τk, τk+1]}|denote the number of packets that departed the queuebetween the arrival of the kth and k + 1th packet. Thus,the queue occupancy just after the k + 1th packet arrivedis given by q (τk+1) = min (qmax, q(τk) + 1 − |

{n : tn ∈ (τk, τk+1]}|). A filtered version of the queue is

q (τk+1) = (1 − wq)q (τk) + wqq (τk+1) if q (τk+1) > 1(4a)

q (τk+1) = (1 − wq

)τk+1−max{tn :tn<τk+1} otherwise,(4b)

where 0 < wq ≤ 1. RED drops incoming packets withprobability f (q) shown in Figure 2. A popular variantof RED is gentle RED uses a slightly modified versionof f and is also shown in Figure 2.

ARED’s mapping from q to packet drop probabilityis slightly more complex. The idea is to use the schemeabove but adjust maxp (shown in Figure 2) dynamical-ly. Roughly speaking, when q exceeds maxt h ,then maxpgrows by a factor of 1.25, and when q falls belowmint h , then maxp decreases by a factor of 0.9. Thisadjustment is typically performed every 0.5 s.

Beyond RED and ARED, there are several otherAQM schemes. In [13], a standard proportional inte-grator controller was studied and found to work well.Specifically, the mapping from the queue occupancy is

▲ 2. (a) The relationship between a filtered version of the queueoccupancy and the marking probability. The solid line corre-sponds to RED while the dotted line corresponds to gentle RED.(b) The topology which was used for simulations.

1.0

0.8

0.6

0.4

0.2

0.00 10 20 30 40 50 60

minth maxth qmax

maxp

Sources10 Mb/s

10 ms PropagationDelay

100 Mb/s1 ms Prop. Delay

Destinations

(a)

(b)

q

Page 6: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINE74 SEPTEMBER 2004

P (s ) = Ks/z + 1

sQ (s ),

where P and Q are the Laplace transform of the packetdrop probability and queue occupancy respectively.

In [12], an approach known as random exponentialmarking (REM) was developed and analyzed. A statevariable r is maintained via

r(k + 1) = [r(k) + γ

(q(k) − q∗)

+λ(k) − C × T)]+

. (5)

Here q∗ is the desired queue occupancy, λ(k)

is thearrival rate during the kth sample, T is the sample period, and C is the link capacity. The mapping from rto the packet drop probability is δ(k) = 1 − φ−r(k) . Therationale behind REM is discussed in detail in [12] andthe references cite therein.

AVQ [9] takes a rather different approach. AVQseeks to keep the queue empty. To this end, a virtuallink speed, C , is introduced. At all times, the virtuallink speed is no greater than the actual link speed.Specifically, the virtual link speed is determined by(d/dt )C (t ) = α (γC − λ (t )) , where γ is the desiredutilization with γ < 1, C is the actual link speed, andλ (t ) is the packet arrival rate at time t . Along with thevirtual link speed, a virtual queue is maintained. Aspackets arrive, they are placed in the real queue and a

token is placed in the virtual queue. The real packetsleave the real queue according to the link speed, whilethe tokens leave the virtual queue at the virtual linkspeed. That is, a token is served only when all tokensthat arrived before it have be served. The service timeof a token is the size of the packet that the token rep-resents divided by the virtual link speed. If a tokenfinds the virtual queue full, then the real packet andtoken are dropped.

AQM ExperimentsDifferent AQM scheme were simulated using the ns-2simulator. The single bottleneck topology shown in Figure2 was used for the simulations. Three types of experi-ments were performed. The first experiment investigatedperformance with different numbers of long-lived TCPconnections. The second simulations considered Web-like traffic. Specifically, in these experiments the startingtimes of the transfers were modeled with a Poissonprocess and the size of the file transfers had a Pareto dis-tribution (i.e., P (file size > z ) = β/z α ), where themean was 20 KB and α = 1.4. Note for α = 1.4, largefiles make up only a small portion of the traffic. Thus,these two simulations are on opposite sides of the spec-trum. Figure 3 shows average drop probability and aver-age queue occupancy. There are a large number ofparameters in AQM schemes. Here the default valuesgiven by the ns-2 simulator were used, except that someparameters were adjusted so that the average queue

occupancy was around five in thelong-lived case with many flows.(The upper left plot.) Specifically,for the drop-tail, the queue size was7; for RED, mint h = 2 andmaxt h = 4; for ARED, mint h = 2and maxt h = 7; for REM,q∗ = 2.5; and for AVQ, γ = 0.86.

There is much to be gleanedfrom these simulations. Perhapsmost striking feature is the differ-ence in performance in the three dif-ferent scenarios. For example, whilethe goal of a queue occupancy offive is met in the long-lived case, inthe Web-like traffic simulation, thedrop probability could have beensignificantly smaller and still thequeue occupancy constraint wouldhave been meet. In the Web-liketraffic simulation, several schemeshave a queue occupancy of nearlyfive, but only at the expense of dropprobability of nearly 20%, a dropprobability that leads to unbearablyslow file transfers. In general, itseems difficult to set the parametersto achieve desire queue occupancyover a wide range of scenarios.▲ 3. (a) Long-lived flows. (b) Small size file transfers.

Traffic Type: FTP

Number of Connections

Connections/SecondConnections/Second

Number of Connections

Traffic Type: HTTP with α = 1.4

2001601208040020016012080400

1

2

3

4

5

6

Dro

p P

roba

bilit

y

Que

ue O

ccup

ancy

0.12

0.10

0.08

0.06

0.04

0.02

0.00

1

0

2

3

4

5

6

7

Dro

p P

roba

bilit

y

Que

ue O

ccup

ancy

0.35

0.30

0.25

0.20

0.15

0.10

0.05

0.0080757065605550454035308075706560555045403530

Drop TailREDAdaptiveREDREMAVQ

(a)

(b)

Page 7: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINESEPTEMBER 2004 75

Another point illustrated by these simulations is that thebehavior of AQM is not entirely intuitive. For example,one might think that queue occupancy and drop proba-bility are inversely related. It is true that in the case ofhigh connection rate in second simulations, AVQ has thelargest drop probability and the lowest queue occupancy.On the other hand, REM seems to not obey this relation-ship with a fairly high queue occupancy and a fairly highdrop probability. A final conclusion is that it is clear thatAQM is not solved, there is no method that clearly andsignificantly out-performs the others.

Predicting CongestionIn [1] it is stated that RED’s primary goal is to detectincipient congestion. The chief motivation of the detec-tion is to maximize throughput and minimize delay asstated above. Thus, the idea is that by making observa-tions, AQM can predict congestion.

As mentioned, RED and ARED observe packetarrivals to determine packet drop probability. However,these methods do not directly use the queue occupancybut rather use (4a) and (4b) to produce a filtered ver-sion of the queue occupancy. At first glance, (4a)appears to be a low-pass filter. However, note that theτks are not uniformly spaced but are the times whenthe packets arrive. As a result, when packets arrivequickly, q is updated more quickly and tracks the actualqueue occupancy more closely. Hence, changes in net-work traffic that lead to an increase in congestion arerapidly reacted to. When packets arrive less frequentlyand the queue is emptying, then q is updated less fre-quently and the actual queue occupancy is trackedmore slowly. Hence RED is conservative in reacting tothe decrease in arrival rate. The second expression (4b)implies that while the queue is empty, q decreases expo-nentially at a rate of (1 − wq)

t , where t is the durationthat the queue has been empty. Thus, an empty queueis taken as a sign of decreased utilization, and hence qis updated more quickly to react to this information.The way in which (4a) behaves differently for increas-ing queue occupancy than it does for decreasing occu-pancy is similar to weighted median filters. Indeed,weighted median filters have been shown to providesome improvement in the performance of AQM [25].

The idea that AQM should detect congestion is asubtle one. Suppose that there are a fixed number ofTCP flows (the setting that most AQM researchersfocus on during design and analysis). In the case of adrop tail queue, TCP flows will slowly increase theirsending rate until the queue fills and packets aredropped. Thus, one interpretation of AQM’s objectiveis to detect this increase in sending rates and droppackets before the queue fills. However, such predic-tion is likely not possible. In [26] it was found that theaggregate arrival rate of TCP flows is well modeled byan affine AR system driven by Gaussian white noise.The normalized variance (i.e., the variance of theresidual error divided by the variance of the TCP

arrival rate) is never less than 0.5 and typically greaterthan 0.75. This result implies that it is not possible tomake fine scale predictions of TCP’s sending rate. Thedetails and further implications of the predictabilitydifficulties can be found in [26] and [27].

Detecting Changes in TrafficIn AQM research, few methods discuss specific strate-gies to adapt to changes in traffic. Many methodsincorporate an integrator into the controller (specifi-cally the queue). Thus, if the arrival rate differs fromthe link-speed for an extended period of time, then thequeue occupancy will move from its desire value. Thecontroller will then act to correct the deviation in thequeue occupancy. However, such integrators are often“slow” unless a large gain is applied. On the otherhand, a large gain can lead to instabilities. Anotherapproach is to design a separate feedback loop todetect changes in network traffic. Two such approach-es are examined next. The first is an approach that isclosely related to sequential detection, while the sec-ond attempts to directly estimate the number of signif-icant active flows.

A Sequential Detection ApproachRED works well if the parameters are set correctly.Unfortunately, the parameters depend on the amountof traffic. ARED seeks to adapt the key parameter maxp(see Figure 2) when a change in traffic is detected andleave the RED mechanism to control the dynamics ofTCP. Specifically, maxp is adjusted when the averagequeue occupancy passes above the threshold maxt h orbelow the threshold mint h . BLUE [14], an alternativeAQM scheme, takes a similar approach; the packetdrop probability is increased if the queue occupancyincreases beyond qupper and is decreased if the queueoccupancy falls below the threshold qlower . WhileARED uses the average queue occupancy and BLUEused actual queue occupancy, both of these approachesare essentially sequential change point detectors. Webriefly examine the basis for such an approach.

Here we consider long-lived flows along with ran-domly occurring short and medium sized file transfers.As mentioned, when the packet drop probability isnonzero, the packet arrivals from the long-lived flowscan be modeled by an affine system driven by Gaussianwhite noise [26]. The packet arrival process due to theshorter flows can be approximately modeled as a

Plainly said, the queue is forrandom bursts, but TCP fills itto determine bandwidth.

Page 8: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINE76 SEPTEMBER 2004

sequence of independent random variables; the upperleft plot in Figure 4 shows the histogram of the num-ber of packet arrivals from shorter flows.

When in balance, the packet drop probability is setso that the average aggregate arrival rate of the longand shorter flows is equal to the link capacity. But,when a new long flow begins, the aggregate arrival rateincreases and the system is out of balance. The longerthis imbalance remains, the larger the queue occupancybecomes. Since the packet arrival rates are stochastic,this new flow is not immediately detectable. However,to minimize queuing delay, the AQM must quicklydetect this imbalance. A reasonable way to detect thisimbalance is to employ sequential detection. Figure 4shows the probability density (histogram) of the num-ber of packet arrivals when there are random short andmedium flows and a new single, long-lived TCP flowbeginning at t = 0. In the example shown, the packetdropping probability is set to zero, hence the newlong-lived TCP flow will continuously increase its send-ing rate. Note that Figure 4 indicates that the packetarrival processes are approximately Gaussian with thesame variance but different mean. In this case, based onthe methods discussed in [28], the optimal sequentialchange point detection is as follows.

Define the statistic Q k according to

Q k+1 = max (Q k + Ak − µ, 0),

where Ak is the number of packet arrivals during thekth sample interval and

µ = A0TCP + A1TCP

2

with ArTCP is the average arrival rate with r long-livedflows. We declare that a new long-lived flow has begunwhen Q k ≥ H, where the threshold H imposes a par-ticular average delay to detection or, equivalently, afalse alarm rate. Now, if µ = BW × T , where T is thesample period and BW is the bandwidth, then the sta-tistic Q k coincides with the occupancy of the queue.And if H = qupper, then we see that BLUE is the opti-mal change point detector (under the Gaussianassumption). Similarly, the ending of a long-lived filetransfer can be detected when Q k < qlower.

While Figure 4 seems to indicate that change pointdetection can easily be applied, further work is requiredbefore making strong conclusions. For example, in thissimulation the sample period was 100 ms which exactlycoincides with the RTT of the long-lived flow.Nonetheless, the sequential detection view of AQMmakes a strong case that changes in packet arrival rateshould be detected using the actual queue size, asopposed to a filtered version of the queue such as theone given by (4a) and (4b).

While sequential change point detection is useful todetect long-lived TCP flows, it does not provide anyhint as to what should be done once a long-lived TCPflow is detected. One strategy is to do as in BLUEand increase the packet drop probability when thethreshold is crossed. Another strategy is to simplydrop all packets until the statistic drops below thethreshold. This second approach is simply a drop-tailqueue. Thus, there is some theoretical justification todrop-tail queue. Perhaps this is the reason that experi-ments have found that in many situations, drop tailperforms quite well.

Estimating the Number of Active FlowsIn [10], an interesting way to estimate the numberof active flows was developed. The goal of thismethod is to estimate the number of substantialflows. By substantial we mean flows that are respon-sible for a large portion of the arriving packets.Flows can be identified by their flow ID whichinclude the source and destination IP addresses andports. SRED stores M flow IDs. When a packetarrives, a flow ID is selected at random from thestored IDs. If the randomly selected flow ID match-es the newly arrived packet’s flow ID, then a hit isdeclared and H (k) = 1, where k denotes the kthpacket arrival. In the case that the flow IDs do notmatch, we set H (k) = 0. Furthermore, with proba-bility p, the flow ID at the randomly selected bufferlocation is changed to the flow ID of the newlyarrived packet.

An estimate of P (H (k) = 1) is U , a smoothed ver-sion of H,

▲ 4. The density (histogram) of the number of packet arrivals. Themagenta curve shows the histogram of the number of packets thatarrive in 100 ms when there are no long-lived TCP flows and justshort and medium sized flows. The red curve shows the histogramof the number of arrivals during the interval from 200 to 300 msafter a single long-lived TCP flow has started. The blue figureshows the histogram of the number of arrivals during the intervalfrom 400 to 500 ms after a single long-lived flow has started.

Pro

babi

lity

0.10

0.08

0.06

0.04

0.02

0

Number of Packet Arrivals

0 10 20 30 40

t = 0t = 0t = 300 mst = 500 ms

Page 9: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINESEPTEMBER 2004 77

U (k + 1) = (1 − α)U (k) + αH (k),

with 0 < α < 1. Then U (k)−1 is an estimate of thenumber of flows. To see this, suppose that when apacket arrives, it belongs to flow i with probability πi .The probability of a hit is

P (H (k) = 1) =∑

i

π2i .

Now, if there are N flows, all sending at the same rate,then πi = 1/N and P (H (k) = 1) = N π2

i = 1/N . Onthe other hand, if πi �= 1/N for all i , then the estimateis not as accurate. In this case, this estimator weighsfaster flows more than slower flows. While more analy-sis of this approach can be found in [10], the impact ofsuch weighting has not been investigated.

Estimating the MeanAggregate Packet Arrival RateThe design of many AQM algorithms is based on themodel of the dynamics of TCP’s sending rate given by(2) (e.g., [13], [9], [12]). However, (2) is an approxi-mation of the dynamics of the mean sending rate.Thus, the observations made by the AQM will notcoincide with (2) unless there are a large number offlows and the mean value theorem applies. If there arefew flows, then random fluctuations in the sending ratemight be misinterpreted as indications of congestion.

Since both the stability and performance analysis hasbeen performed using (2), there is a risk that theseAQM strategies will not perform as designed. Oneremedy is to consider a more exact model such as a sto-chastic differential equation model found in [17].However, designing a controller for such a system isquite difficult. Another alternative is to estimate themean sending rate from the observations. In [29] it isshown that distribution of the sending rate is wellapproximated by a negative binomial with mean T (t )given by (2) and variance approximately 0.3/δ (t ),where δ (t ) is the packet drop probability at time t .With this distribution, one can start to develop meth-ods to estimate the mean from observations. However,there are substantial difficulties that need to be address.For example, along with the stationary distribution,one needs to know the transition probabilities. In [17],the transition probability was found to obey

∂tp (w, t ) = − 1

RT T∂

∂wp (w, t ) + w

RT Tδ (t )

× (4p (2w, t ) − p (w, t )),p (w, 0) = p0 (w), (6)

where p (w, t ) is the probability density function of thewindow size w at time t , given that at time t = 0, theprobability density function of the window size is p0.There does not appear to be a close form solution to

(6). Given the window size, w, the arrival rate averagedover one RTT is with RTT. Thus, to estimate meanaggregate arrival rate, it might be necessary to also estimate the RTT of the individual flows.

It is interesting to note that the AQM schemes thatassume that the observed arrival rate is mean arrivalrate, seem to work reasonably well. This reminds oneof LMS where similar instantaneous moment esti-mates are used.

Dithering and QuantizationAs shown in Figure 1, AQM decides whether to drop apacket or not. In most schemes, however, this decisionis not directly made. Instead, a middle ground is takenand only the packet drop probability is found.Dithering is a technique to express intermediate valuesbetween two quanta and can be used as an alternativeof probabilistic packet dropping. Dithering, specificallyerror diffusion, results in the dropped packets beingmaximally spaced [30]. Floyd and Jacobson recognizedthe advantage of spreading the dropped packets apart.The rationale for homogeneous packet drops is that ifdrops occur in bursts, then the arrival rate of the flowswill deviate from its intended rate.

An approach investigated by one of the authors iscalled DEM [31] and is as follows. Let Pk be the packetdrop probability and let Dk be the dithered version of Pk.Specifically, the kth packet is dropped if Dk = 1, where

▲ 5. Queue size with DEM (magenta curve) and RED (blue curve).

Que

ue (

Pac

kets

)

100

80

60

40

20

00 20 40 60 80 100

Time (s)

REDDEM

TCP does not send packets ata constant rate, rather it sendspackets in bursts with thetime between bursts equal to one RTT.

Page 10: Signal Processing Challenges in Active Queue Management

IEEE SIGNAL PROCESSING MAGAZINE78 SEPTEMBER 2004

Dk ={

1 if Pk + Ek−1 ≥ 12

0 otherwise,

and Ek = Dk − (Pk + Ek−1).While this scheme could be utilized with any of the

packet drop probabilities discussed earlier, we havefound that a particularly useful packet drop probability is

Pk ={

N (q (τk) /qmax)α if q (τk) ≤ qmax

( 1N

)1/α

0 otherwise,

where q (τk) is the queue size, N is the number ofactive flows,

α = log

(32

(MSS

RTT × BW + qdesiredMSS

)2)

/log

(qdesired

qmax

),

MSS is the packet size, RTT is the average round-triptime of the active flows, BW is the link bandwidth, qmaxis the maximum size of the queue, and qdesired is thedesired value of the queue. Note that this approachreduces the number of parameters to just one, qdesired, asopposed to the other AQM approaches that generallyrequire four or more. However, this method relies onthe difficult task of estimating quantities such as thenumber of substantial active flows and the average RTT.

This scheme was tested on a single bottleneck topol-ogy with four competing flows. Figure 5 shows a com-parison of the queue occupancy under DEM and RED.It is clear that DEM results in a more stable queue.Furthermore, DEM rarely allows the queue to empty,hence the utilization is high.

ConclusionWhile AQM has been the focus of intense researchefforts, many open problems remain. Here it wasshown that a number of these problems are perhapsbest solved by the signal processing community.Specifically, AQM challenges include prediction, detec-tion, estimation, and quantization.

While this article reviewed many of the central issuesof AQM design, there are some critical areas have beenneglected due to lack of space. One important area isthe stability of the closed-loop system. There has beenextensive work in this area, but still more workremains. For example, much of the work has focusedon stability around an operating point and has notexamined the nonlinear stability in the face of randomtraffic. Another area only briefly mentioned is the abili-ty of AQM to provide defense against network attacks.For example, if some or many end-hosts send packets ata high rate regardless of the packet drop probability,then the AQM should attempt to stop these flows or atleast not allow them to interfere with other flows.

An area that has yet to be fully explored is AQM inwireless networks. While cross-layer design and analysisof the MAC and physical layers are being addressed(indeed, two articles in this issue focus on joint MACand physical layer design), considering the dynamics ofTCP and the performance constraints discussed inSection III along with the physical and MAC layersremains virtually uninvestigated.

AcknowledgmentsThis work was supported by the National ScienceFoundation under grants 0322744 and 0312851.

Stephan Bohacek received his BS in electrical engineer-ing and computer science from the University ofCalifornia, Berkeley, in 1989. From 1989 to 1996, hewas a digital signal processing engineer for ThetaDigital and Wadia Digital. In 1999, he received isPh.D. in electrical engineering–systems from theUniversity of Southern California. From 1999 to2002, he was an assistant professor in theDepartment of Mathematics at the University ofSouthern California. Since 2002 he has been an assis-tant professor at the University of Delaware in theDepartment of Electrical and Computer Engineeringwith a joint appointment in the Department ofComputer and Information Sciences. His research isfocused on networking. Recently, his efforts havebeen directed toward traffic and network modeling,congestion control, reliable routing, networkingsecurity, and ad hoc networks.

Khushboo Shah received the B.S. degree in electricalengineering from L.D. College of Engineering,India, in 1998 and the M.S. degree in electrical engi-neering from University of Southern California in2001. In 2003 she spent the summer of 2003 atCAIDA. She is currently pursuing her Ph.D. in elec-tr ical engineering at University of SouthernCalifornia. Her research interests include AQM, traf-fic characterization and modeling, TCP modeling aswell as network security.

An area that has yet to be fullyexplored is AQM in wirelessnetworks

Page 11: Signal Processing Challenges in Active Queue Management

Gonzalo R. Arce received the Ph.D. degree fromPurdue University in 1982. Since 1982 he has beenwith the faculty of the Department of Electrical andComputer Engineering at the University of Delawarewhere he is the Charles Black Evans Professor anddepartment chairman. His research interests includestatistical and nonlinear signal processing, multimediasecurity, electronic imaging, and signal processing forcommunications and networks. He received the NSFResearch Initiation Award. He is a Fellow of the IEEE.He was cochair of NSIP’01, cochair of the 1991 SPIE’sSymposium on Nonlinear Electronic Imaging, andcochair of SPIE ITCOM 2002 and 2003. He was asso-ciate editor for IEEE Transactions for Signal Processing,senior editor of Applied Signal Processing Journal, guesteditor for IEEE Transactions on Image Processing, andguest editor for Optics Express. He is coauthor of thetextbooks Digital Halftoning (Marcel Dekker, 2001),Nonlinear Signal Processing and Applications (CRCPress, 2003), and Nonlinear Signal Processing (Wiley,2004). He holds seven U.S. patents.

Mike Davis received his B.E.E. in 1989 and his M.E.E.in 1991, both from the University of Delaware. Since1991 he has been the system and network manager ofElectrical and Computer Engineering and ComputerScience Research Lab at the University of Delaware,where he is a Ph.D. student. Since 1997 he has been aninstructor for the Department of Electrical andComputer Engineering. His research interests includecomputer networking protocols and management.

References[1] S. Floyd and V. Jacobson, “Random early detection gateways for conges-

tion avoidance,” IEEE/ACM Trans. Networking, vol. 1, no. 4, pp.397–413, 1993.

[2] B. Braden, D. Clark, J. Crowcroft, B. Davie, S. Deering, D. Estrin, V.J.S.Floyd, G. Minshall, C. Partridge, L. Peterson, K.K. Ramakrishnan, S. Shenker, and J. Wroclawski, “Recommendations on queue managementand congestion avoidance in the internet,” RFC 2309, 1998.

[3] M. May, J. Bolot, C. Diot, and B. Lyles, “Reasons not to deploy RED,” inProc. 7th. Int. Workshop Quality of Service (IWQoS’99), 1999, pp. 260–262.

[4] C. Brandauer, G. Iannaccone, C. Diot, T. Ziegler, S. Fdida, and M. May,“Comparison of tail drop and active queue management performance forbulk-data and web-like internet traffic,” in Proc. 6th IEEE Symp. Computersand Communications (ISCC’01), 2001, pp. 122–129.

[5] E.H.A. Priya Ranjan and R.J. La, “Nonlinear instabilities in TCP-RED,” inProc. Infocom, 2002, pp. 249–258.

[6] T. Ziegler, S. Fdida, and C. Brandauer, “Stability criteria of RED with TCPtraffic,” in 9th IFIP Working Conf. on Performance, 2001.

[7] V. Firoiu and M. Borden, “A study of active queue management for con-gestion control,” in Proc. Infocom, 2000, pp. 1435–1444.

[8] W.R. Stevens, TCP/IP Illustrated, Volume. Reading, MA: Addison-Wesley, 1995.

[9] S. Kunniyur and R. Srikant, “Analysis and design of an adaptive virtual

queue (AVQ) algorithm for active queue management,” in Proc. ACMSIGCOMM, 2001, pp. 123–134.

[10] T.J. Ott, T.V. Lakshman, and L.H. Wong, “SRED: Stabilized RED,” inProc. Infocom, 1999, pp. 1346–1355.

[11] S. Floyd, R. Gummadi, and S. Shenker, “Adaptive RED: An algorithm forincreasing the robustness of RED,” Tech. Rep., ACIRI, 2001.

[12] S. Athuraliya, V.H. Li, S.H. Low, and Q. Yin, “REM: Active queue man-agement,” IEEE Network, vol. 15, pp. 48–53, 2001, pp. 48–53.

[13] C.V. Hollot, V. Misra, D.F. Towsley, and W. Gong, “On designingimproved controllers for AQM routers supporting TCP flows,” in Proc.Infocom, 2001, pp. 1726–1734.

[14] W. Feng, D. Kandlur, D. Saha, and K.G. Shin, “Blue: An alternativeapproach to active queue management,” in Proc. NOSSDAV 2001, no.CSE-TR-387-99, 2001, pp. 41–50.

[15] T. Ott, J. Kemperman, and M. Mathis, “The stationary behavior of idealTCP congestion avoidance,” [Online]. Available: ftp://ftp.bellcore.com/pub/tjo/ TCPwindow.ps

[16] V. Misra, W. Gong, and D. Towsley, “Stochastic differential equationmodeling and analysis of tcp-windowsize behavior,” in Proc. PERFOR-MANCE99, 1999.

[17] S. Bohacek, “A stochastic model of tcp and fair video transmission,” inProc. Infocom, 2003, pp. 1134–1144.

[18] L. Le, J. Aikat, K. Jeffay, and F.D. Smith, “The effects of active queue man-agement on web performance,” in Proc. SIGCOMM, 2003, pp. 265–276.

[19] T. Ziegler, “On averaging for active queue management congestion avoid-ance,” in Proc. ISCC 2002, 2002.

[20] S. Floyd and V. Jacobson, “The synchronization of periodic routing mes-sages,” IEEE/ACM Trans. Networking, vol. 2, no. 2, pp. 122–136, 1994.

[21] B.B. et al, “Recommendations on queue management and congestionavoidance in the Internet,” RFC 2309, 1998.

[22] W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson, “On the self-similar nature of Ethernet traffic,” in ACM SIGCOMM, D.P. Sidhu, ed.,San Francisco, CA, 1993, pp. 183–193.

[23] M.S. Taqqu, W. Willinger, and R. Sherman, “Proof of a fundamentalresult in self-similar traf fic modeling,” ACMCCR: ComputerCommunication Rev., vol. 27, no. 2, pp. 5–23, 1997.

[24] M. Crovella and A. Bestavros, “Self-similarity in world wide web traffic:Evidence and possible causes,” in Proc. SIGMETRICS’96, 1996, pp. 160–169.

[25] G.R. Arce, K.E. Barner, and L. Ma, “RED gateway congestion controlusing median queue size estimates,” IEEE Trans. Signal Processing, vol. 51,2003, pp. 2149–2164.

[26] K. Shah, S. Bohacek, and E. Jonckheere, “On the predictability of datanetwork traffic,” in Proc. American Control Conf., 2003, pp. 1619–1624.

[27] K. Shah, S. Bohacek, and E. Jonckheere, “On the performance limitationof active queue management (AQM),” in Proc. 43rd IEEE Conf. onDecision and Control, to be published.

[28] M. Basseville and I.V. Nikiforov, Detection of Abrupt Changes—Theoryand Application. Englewood Cliffs, NJ: Prentice-Hall, 1993.

[29] S. Bohacek and K. Shah, “TCP throughput and timeout—Steady stateand time-varying dynamics,” GLOBECOM, to be published.

[30] D. Lau and G.R. Arce, Modern Digital Halftoning. New York: MarcelDekker, 2001.

[31] G.R. Arce and M. Wilson, “Diffusion early marking (DEM),” Universityof Delaware, Newark, DE, Tech. Rep., 2003.

IEEE SIGNAL PROCESSING MAGAZINESEPTEMBER 2004 79