Internet Users - DSpace@MIT Home


Transcript of Internet Users - DSpace@MIT Home

Page 1

Usage Profiles: Allocation of Network Capacity to

Internet Users

by

Pierre Arthur Elysee

Bachelor in Computer Systems Engineering
University of Massachusetts at Amherst, 1997


Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the

requirements for the degree of

Master of Science in Electrical Engineering and Computer Science

at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY

January 2001

© 2001 Massachusetts Institute of Technology
All rights reserved.

Author .................................
Department of Electrical Engineering and Computer Science

January, 2001

Certified by .................................
David D. Clark

Senior Research Scientist, Thesis Supervisor

Accepted by .................................

Arthur C. Smith
Chairman, Department Committee on Graduate Students

Page 2

Usage Profiles: Allocation of Network Capacity to Internet Users

by

Pierre Arthur Elysee

Submitted to the
Department of Electrical Engineering and Computer Science

in partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering.

Abstract

In the Internet of today, there are only very crude controls placed on the amount of network capacity that any one user can consume. All users are expected to slow down when they encounter congestion, but there is little verification that they actually do, and there are no controls that permit a relative allocation of capacity to one user over another. The research in this thesis describes a method to impose a usage limit, or "usage profile", on the behavior of individual users. In particular, this thesis explores the design of usage profiles that allow bursty traffic patterns, as opposed to continuous rate limits. This work describes an effective usage profile algorithm for web traffic, which has a very bursty character. The following approach studies the characteristics of web traffic and introduces the fundamental concepts to establish the necessary framework. Through simulations, it analyzes an existing usage profile, the leaky-bucket scheme, for different token rates and different data sets, and points out its limitations in the context of web traffic. Then, it proposes a new usage profile, the Average Rate Control Usage Profile (ARCUP) algorithm, that best regulates web traffic. Several variants of this algorithm are presented throughout, and the characteristics of a good profile are discussed in order to facilitate the choice of a specific variant. The selected variant of the ARCUP algorithm is simulated for different target rates and different data sets. The results show that this algorithm will work for any data sets that are heavy-tailed distributed, and for different target rates, which represent different usage profiles. This thesis concludes with a summary of findings and suggests possible applications.

Thesis Supervisor: David D. Clark
Title: Senior Research Scientist

Page 3

This page left intentionally blank

3

Page 4

Acknowledgements

I'd like to express my sincere gratitude to Professor Dave Clark for his insights, motivation, and encouragement throughout this thesis. I would like to extend my gratitude as well to Professor Al Drake, who guided me through my dark days here. I would also like to thank my friends who have been instrumental to my success at MIT, in particular Amit Sinha and Eric Brittain, and all my professors and advisors who have contributed to my education and success. Finally, my thanks go to those who are dearest to me: my mother, Odilia Nazaire, and my father, Wilner Elysee, for their everlasting love and support.

4

Page 5

This page left intentionally blank

5

Page 6

Table of Contents

1 Introduction 8

1.1 Introduction .......... 8
1.2 Web overview .......... 9
1.3 Characteristics of web traffic .......... 10
1.4 Genesis of data used in simulations .......... 11
1.5 Methodology .......... 12

2 Theoretical background 14

2.1 Definition of self-similarity .......... 14
2.2 Definition of heavy-tailed .......... 15
2.3 Definition of ON/OFF sources .......... 16
2.4 Examining ON-times or file transfer sizes .......... 16
2.5 Examining OFF-times .......... 17
2.6 Fractional Brownian Motion .......... 17
2.7 Feedback Control System .......... 19

3 Leaky bucket algorithm 22

3.1 Definition of a leaky bucket .......... 22
3.2 Analyzing the leaky bucket algorithm for different token rates .......... 23
3.3 Token rate equal to 10,000 bits per second .......... 24
3.4 Token rate equal to 20,000 bits per second .......... 26
3.5 Token rate equal to 25,000 bits per second .......... 27
3.6 Token rate equal to 8,000 bits per second .......... 28
3.7 Performance of the leaky bucket algorithm on different data sets .......... 28
3.8 Summary .......... 30

4 Average Rate Control Usage Profile Algorithm 32

4.1 Introduction .......... 32
4.2 Uncontrolled average rate .......... 33
4.3 Maximum data control .......... 35
4.4 Decrease and increase the peak rate by a fixed factor .......... 36
4.5 Varying the peak rate and maximum data control .......... 38
4.6 Performance of the ARCUP algorithm for different target rates .......... 41
4.7 Effect of maximum data size on obtained average rate .......... 45
4.8 Effect of maximum data on transmission duration .......... 46

6

Page 7

4.9 Algorithm performance for different data sets .......... 47
4.10 Running the algorithm at the peak, target, and average rates .......... 55

5 Conclusion, Applications, and Future Work 58

5.1 Conclusion .......... 58
5.2 Application .......... 59
5.3 Future work .......... 60
5.4 References .......... 61

A Source codes and Data sample 62

A.1 Generating OFF times .......... 62
A.2 This module represents the leaky bucket scheme's source code .......... 68
A.3 This module represents the uncontrolled ARCUP algorithm .......... 73
A.4 This module represents the uncontrolled ARCUP algorithm .......... 78
A.5 Sample of data used in simulations .......... 84

7

Page 8

Chapter 1

Introduction

The goal of this chapter is to introduce our work and to establish a basis for its usability. It contains the following information: an introduction, a brief overview of the Web, a description of the characteristics of web traffic, and an analysis of the data used in simulations.

1.1 Introduction

The Internet today uses a service model called "best effort". In this service model, the network allocates bandwidth amongst all the instantaneous users as best it can, and attempts to serve all of them without making any explicit commitment as to the quality of service (QoS) offered [9]. For years, there have been heated debates in the Internet community regarding the types of services that should be provided in the future. For some researchers, the existing service has been working fairly well so far. Others argue that users who are willing to pay more money in order to have a better service should be given the option. Previous work in [9] has explored the issue of extending the Internet by adding features that permit allocating different service levels to different users. A number of schemes have been proposed to accomplish this goal: fair allocation service, priority scheduling, expected capacity allocation, and guaranteed minimum capacity. Closely related to the work presented in this paper is the guaranteed minimum capacity scheme. This service provides a guaranteed worst-case rate along any path from source to destination (for more details see [9]). A drawback of this scheme is that it assumes that the traffic offered by the user is a steady flow. This is not the case for Internet traffic.

The majority of Web users are not interested in a capacity profile that allows them to go at a continuous steady rate. While cruising the web, for instance, the normal usage pattern is short on-periods separated by long off-periods. The ideal profile from the user's perspective would allow bursts to occur at high speed. It should reward a user who is not sending continuously at the high rate, and constrain a user who is doing the contrary. Appropriate profiles are needed to match bursty usage patterns. The problem with defining

8

Page 9

such a profile is made harder by the fact that the sizes of web transfers are "heavy-tailed", which means that while almost all Web transfers are very small, most of the bytes transferred are in a few very large transfers. In a previous study conducted in [8], a leaky-bucket scheme was proposed to regulate web traffic. This paper confirms that the leaky-bucket scheme penalizes web traffic by introducing excessive delays. It then proposes a scheme that better regulates this type of traffic.

The rest of this paper has the following organization: the remainder of Chapter I emphasizes the need for usage profiles, defines the characteristics of web traffic, and analyzes the genesis of the data used in simulations; Chapter II contains theoretical background information; Chapter III presents and analyzes the "leaky bucket" algorithm; Chapter IV presents and analyzes the "average rate control usage profile" (ARCUP) algorithm; Chapter V proposes possible applications of the usage profile algorithms and a summary of findings; the appendix contains references, source code, and a sample of the data sets used in simulations.

1.2 Web overview

The eminent role of the World Wide Web as a medium for information dissemination has made it important to understand its properties. In recent years, the Web has been used, among other applications, to trade stocks on-line, to conduct electronic commerce, and to publish and deliver information in a variety of ways such as raw data, formatted text, graphics, audio, video, and software [5]. An easy way to think of the Web, however, is as a set of cooperating clients and servers. We interact with the Web through web browsers; Netscape and Internet Explorer currently dominate the market. A browser is a graphical client program that allows users to access remotely located files or objects [11]. In order to access a file, we need to know its URL (uniform resource locator), which resembles the following:

http://www.haitiglobalvillage.com/

http://www.soccer.com/

http://www.amazon.com/

9

Page 10

Since the Web is organized as a client-server model, each file is stored on a specific host machine. Upon a request from the user, the requested file is transferred to the user's local machine. These exchanges generate traffic over the Internet.

Today, the Web generates more data traffic on the Internet than any other application. In an attempt to control the amount of traffic generated by each user on the Internet, one proposal is to impose a usage profile on each user to regulate their traffic pattern. A usage profile is a mechanism that shapes traffic and limits the length of bursts; further, it defines a limit on the maximum traffic generated by each user and discourages users from abusing the network by introducing additional delays on their transfers should they violate their profile.

1.3 Characteristics of web traffic

The design and implementation of an attractive "usage profile" algorithm for Web browsing requires a thorough understanding of Web traffic (in this paper, we use the terms web traffic and Internet traffic interchangeably since the Web is currently the main contributor of network traffic). For years, researchers assumed that traffic on the Internet followed the model of a Poisson process. Poisson arrival processes have a bursty length characteristic which tends to be smoothed by averaging over long periods of time [1]. To the contrary, recent studies of LAN and wide-area networks have provided ample evidence that Internet traffic is self-similar [1]. As a result, such traffic can be modeled using the notion of self-similarity. Self-similarity is the trait we attribute to any object whose appearance is scale-invariant, that is, an object whose appearance does not alter regardless of the scale at which it is viewed [13]. [2] shows that Internet traffic exhibits long-range dependence as well. Long-range dependence involves the tail behavior of the autocorrelation function of a stationary time series, while self-similarity refers to the scaling behavior of the finite-dimensional distributions of a discrete or continuous time process [2]. Further, a process is considered to be long-range dependent or heavy-tailed if its autocorrelation function decays hyperbolically rather than exponentially [2]. Since self-similar processes display similar features, they are often referred to as heavy-tailed [8]. The difference between self-similarity and heavy-tailed behavior will be addressed in the next chapter.

10

Page 11

1.4 Genesis of data used in simulations

To understand the nature of web traffic, we made use of real users' traces. These traces were used to evaluate our proposed control algorithms. They were collected at Boston University's Computer Science Department. Researchers at BU added a measurement apparatus to the Web browser NCSA Mosaic, the preferred browser at the time (November 1994 through February 1995). These researchers were able to monitor the transactions between each individual user on their LAN and the Internet. The traces contain records of the HTTP requests and user behavior occurring between November 1994 and May 1995. During that period a total of 9,633 Mosaic sessions were traced, corresponding to a population of 762 users, and resulting in 1,143,839 requests for data transfer. Here is a sample trace:

gonzo "http://www.careermosaic.com/cm/cml.html" 4776 0.806263

gonzo "http://www.careermosaic.com/cm/images/cmHome.gif" 90539 7.832752

gonzo "http://www.careermosaic.com/cm/cml.html" 0 0.0

Each line corresponds to a single URL request by the user and can be read as follows:

User name: gonzo

URL: http://www.careermosaic.com/cm/

File name: cml.html

Size of the document in bytes: 4776 bytes

Object retrieving time in seconds: 0.806263
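The field layout above can be read mechanically. The following sketch parses one trace line in C, the language the thesis simulations are written in; the struct and function names are ours, not code from the thesis appendix:

```c
/* Sketch: parse one BU trace line of the form
 *   user "URL" size_in_bytes retrieval_time_seconds
 * Names below (trace_record, parse_trace_line) are our own. */
#include <stdio.h>
#include <string.h>

struct trace_record {
    char   user[64];
    char   url[256];
    long   size_bytes;   /* document size in bytes   */
    double seconds;      /* object retrieval time    */
};

/* Returns 1 on success, 0 on a malformed line. */
int parse_trace_line(const char *line, struct trace_record *r)
{
    return sscanf(line, "%63s \"%255[^\"]\" %ld %lf",
                  r->user, r->url, &r->size_bytes, &r->seconds) == 4;
}
```

Note the `%[^\"]` scanset, which consumes the URL up to the closing quote so URLs containing slashes and dots parse cleanly.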

These data were collected at the application level. They were thoroughly examined in [1], which showed that the heavy-tailed nature of transmission and idle times is not primarily due to network protocols or user preference; the heavy-tailed property can rather

11

Page 12

[Figure 1-1: Distribution of data set gonzo. Histogram of file sizes in bytes; x-axis scale ×10^4.]

be attributed to information storage and processing. The results of these studies show that files transmitted through the Internet and files stored on servers obey heavy-tailed distributions. Figure 1-1 shows that data set gonzo is indeed heavy-tailed. Most of the files have sizes less than 20,000 bytes while some are as big as 160,000 bytes. With confidence, we use these data sets to simulate our algorithms.

1.5 Methodology

In this paper, ON-times are represented by data collected by researchers at BU from the monitoring of transactions between each individual user on their LAN and the Internet. When not specified, data set gonzo will be the one considered in our analyses to represent ON-times. OFF-times are generated by a model based on a fractional Brownian motion algorithm. The leaky-bucket and ARCUP algorithms are the two schemes evaluated and contrasted in this paper; they are both written in C. The leaky-bucket scheme is simulated

12


Page 13

for different token rates in order to determine the most appropriate rate for a given user. Moreover, it is simulated for different data sets, which illustrates its performance and its limitations in each case. More precisely, the simulation shows that the leaky bucket is not the preferred scheme for regulating heavy-tailed distributed traffic. To cope with the shortcomings of the leaky-bucket scheme, the ARCUP algorithm is proposed.

I construct a hypothetical user model by combining the ON times derived from the BU user data and the OFF times derived from our fBm model. I picked a target value of average usage for each user of 10,000 bits/second, which is not an unreasonable overall usage rate for a user exploring the Web today. To adjust each data set to achieve this long-term average rate, I scale the OFF times produced by the model appropriately. The result of this scaling operation is that for each user trace, if no control is applied (that is, if there is no usage profile), the average rate over the whole trace will be 10,000 bits/sec.
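The scaling step described above can be written down directly: with the total transferred bits and the total ON seconds fixed by the trace, the OFF times are multiplied by whatever factor makes the uncontrolled average hit the target. A small sketch (function and parameter names are ours):

```c
/* Sketch of the OFF-time scaling described above.  We solve
 *     total_bits / (on_seconds + f * off_seconds) = target_rate
 * for the scale factor f applied to the synthetic OFF times:
 *     f = (total_bits / target_rate - on_seconds) / off_seconds
 * All names here are our own, not from the thesis code. */
double off_scale_factor(double total_bits, double on_seconds,
                        double off_seconds, double target_rate)
{
    return (total_bits / target_rate - on_seconds) / off_seconds;
}
```

For example, a trace carrying 1,000,000 bits over 10 s of ON time and 45 s of unscaled OFF time needs f = 2 to average exactly 10,000 bits/sec.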

In general terms, the method used to evaluate each proposed usage profile is as fair as possible. The profile is initially set so that it also has a long-term average permitted rate of 10,000 b/s. We observe the extent to which the bursty behavior of the user is passed through the profile unchanged, and then adjust the average rate. We then developed several variants of this algorithm. The best variant is simulated for different target rates and data sets. It embodies the most effective usage profile algorithm for Web traffic that emerges from this study. On the one hand, this algorithm permits bursty traffic if the user's average rate is less than the contracted target rate. On the other hand, it prevents the user from abusing his/her profile.

13

Page 14

Chapter 2

Theoretical background

The background information presented in this chapter is aimed at putting our work into context. It is designed to ease the reader's understanding and will help the reader to appreciate the merit of our results. This chapter is organized as follows: sections I and II define self-similarity and heavy-tailed behavior; section III defines and examines the nature of ON/OFF sources; section IV presents a definition of fractional Brownian motion; section V defines autocorrelation; and finally, section VI introduces the notion of feedback control, which is essential for the understanding of the ARCUP algorithm.

2.1 Definition of self-similarity

Self-similarity (this definition closely follows the one given in [1]): let X(t) be a stationary time series with zero mean (i.e., μ = 0). The m-aggregated series X^(m) is defined by summing the original series X over non-overlapping blocks of size m; X is said to be H-self-similar if, for all m >= 0, X^(m) has the same distribution as X rescaled. Mathematically,

X^(m)_t = (1/m) Σ_{i=(t-1)m+1}^{tm} X_i    (2-1)

for all natural m. When X is H-self-similar, its autocorrelation function is given by

r(k) = E[(X_t - μ)(X_{t+k} - μ)] / σ²    (2-2)

which is the same for the series X^(m) for all m. Therefore, the distribution of the aggregated series is the same as that of the original except for a change in scale. In general, a process with long-range dependence has an autocorrelation function

14

Page 15

r(k) ~ k^(-b)    (2-3)

as k goes to infinity, where 0 < b < 1. The series X(t) is said to be asymptotically second-order self-similar with Hurst parameter H = 1 - b/2 [3]. The Hurst parameter gives the degree of long-range dependence present in a self-similar time series.
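The m-aggregation in equation (2-1) is straightforward to compute; the sketch below (our own helper, not thesis code) averages a series over non-overlapping blocks of size m. Examining how the variance of these aggregates decays with m is one common way to estimate the Hurst parameter:

```c
/* Sketch of equation (2-1): average the series x over
 * non-overlapping blocks of size m, writing n/m values to out.
 * Function name is ours. */
#include <stddef.h>

void m_aggregate(const double *x, size_t n, size_t m, double *out)
{
    for (size_t t = 0; t < n / m; t++) {
        double sum = 0.0;
        for (size_t i = 0; i < m; i++)
            sum += x[t * m + i];       /* block starting at t*m */
        out[t] = sum / (double)m;      /* block average */
    }
}
```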

2.2 Definition of heavy-tailed

The understanding of heavy-tailed distributions is very important in network engineering due to their relationship to traffic self-similarity. Unlike the exponential, Poisson, or normal distributions, heavy-tailed distributions exhibit uncommon properties. To date, the simplest heavy-tailed distribution known is the Pareto distribution. Its probability density function and cumulative distribution function are given by:

p(x) = α k^α x^(-(α+1))    (2-4)

F(x) = P[X <= x] = 1 - (k/x)^α    (2-5)

for α, k > 0, and x >= k. In this distribution, if α <= 1 (see equation 2-4), the distribution is unstable with infinite mean and infinite variance. If 1 < α <= 2, on the other hand, the distribution is stable with finite mean but infinite variance [1]. Consequently, the degree of self-similarity largely depends on the parameter α. Regardless of the behavior of the distribution, the process is heavy-tailed if the asymptotic shape of the distribution is hyperbolic.

The Pareto distribution has an attractive trait: its distribution is hyperbolic over its entire range. It is widely used to generate self-similar traffic. In fact, it is shown in [2] that superposing a number of Pareto-distributed ON/OFF sources produces a time series that is asymptotically self-similar.

2.3 Definition of ON/OFF sources

Using Web traffic, user preference, and file size data, [5] explains why the transmission times and quiet times for a given Web session are heavy-tailed. Web traffic can be

15

Page 16

modeled using heavy-tailed distributions; in the literature, these are referred to as the ON- and OFF-time distributions. From a user's point of view, ON times correspond to the transmission duration of Web files, and OFF times represent times when the browser is not actively transferring data [5]. It is imperative to mention that transmission times depend on network conditions at the time as well. Technically, ON times represent periods during which packets (or cells, depending on the architecture in use) arrive at regular intervals, while OFF times exemplify periods with no packet arrivals. Some systems have an ON/OFF characteristic where an ON-period can be followed by other ON-periods and OFF-periods by other OFF-periods [2]. In contrast, the models used to mimic ON/OFF sources in Internet traffic are said to be strictly alternating [5]; that is, an ON-period is always followed by an OFF-period, and vice-versa. At the level of individual source-destination pairs, the lengths of the ON-periods are independent and identically distributed, as are the lengths of the OFF-periods; the ON and OFF periods are independent of one another, and their distributions need not be the same. In our simulations, we use actual data from Web traces to represent ON-times, and generate synthetic OFF-times using fractional Brownian motion.

2.4 Examining ON-times or file transfer sizes

Researchers in [1] showed that the distribution of transfer sizes for files greater than 10,000 bytes can be modeled using a heavy-tailed distribution. It follows that the self-similarity, or heavy-tailed property, exhibited by Internet traffic is mainly determined by the set of available files in the Web. Today, most of the data transferred over the Internet represents multimedia files (i.e., images, text, video, and audio). Researchers in [5] report that although multimedia may not be the primary factor in determining the heavy-tailed nature of files transferred, it does increase the distribution's tail weight. The tail weight for the distribution of file sizes between 1,000 and 30,000 bytes is primarily due to images; for file sizes between 30,000 and 300,000 bytes, the tail weight is caused mainly by audio files. Finally, from 300,000 bytes onwards, the weight is attributed to video files.

To prove that multimedia is not solely responsible for the heavy-tailed nature of Internet traffic, the authors in [5] compare the distribution of available files in the Web with the distribution of Unix files. They conclude that the distribution of Unix files is much heavier tailed than the distribution of Web files. This conclusive remark suggests that even

16

Page 17

with added multimedia content, Web files do not dominate the heaviness in the tail distribution of transferred files: file sizes in general follow a heavy-tailed distribution [5].

2.5 Examining OFF-times

As mentioned earlier, ON-times are the result of the transmission durations of individual Web files; OFF-times, on the other hand, represent periods when a browser is not actively transferring data. While ON-times are mainly due to the transmission duration of Web data, OFF-times can be the result of a number of different causes: the client's machine (or workstation) may be idle because it has just finished receiving the content of a Web page, and before requesting the next content it will interpret, format, and display the first component. In other cases, the client machine may be idle because the user is processing the information last received, or the user may not be using his/her machine at all. These two phenomena are called "active OFF" and "inactive OFF" times by the authors in [1]. The understanding of the OFF-times distribution lies in their differences.

Active OFF-times represent the time required by the client machine to format, interpret, and display the content of a Web page. As a result, machine processing time tends to be in the range of 1 ms to 1 second [1]. It is unlikely, however, for an embedded Web document to require more than 30 seconds of processing time. The researchers in [1] assumed that inactive OFF-times are in general due to user inactivity. Thereby, they concluded that inactive OFF-times resulting from user inactivity are mainly responsible for the heavy-tailed nature of OFF-times.

2.6 Fractional Brownian Motion

Fractional Brownian motion is a natural extension of ordinary Brownian motion. It is a Gaussian zero-mean stochastic process indexed by a single scalar parameter H ranging between zero and one [4]. Fractional Brownian motion (fBm) has very useful properties: fractal dimension, scale-invariance, and self-similarity [3]. As such, fBm represents a good model for describing non-stationary stochastic processes with long-range dependence [4]. As a non-stationary process, fBm does not admit a spectrum in the usual sense; however, it is possible to attach to it an average spectrum [4].

17

Page 18

Although non-stationary, fBm does have stationary increments, which means that the probability properties of the process

B_H(t + s) - B_H(t)    (2-6)

depend only on the variable s. Moreover, this increment process is self-similar because, for any a > 0, the following is true:

B_H(at) = a^H B_H(t)    (2-7)

where '=' represents equality in distribution. A standard fractional Brownian motion has the integral representation:

B_H(t2) - B_H(t1) = (1/Γ(H + 0.5)) [ ∫_{-∞}^{t2} (t2 - s)^{H - 0.5} dB(s) - ∫_{-∞}^{t1} (t1 - s)^{H - 0.5} dB(s) ]    (2-8)

Ordinary Brownian motion is obtained from the standard fractional Brownian motion when the Hurst parameter H = 0.5. The non-stationary property of fBm is manifested in its covariance structure [4], given by:

E[B_H(t) B_H(s)] = (σ²/2)(|t|^{2H} + |s|^{2H} - |t - s|^{2H})    (2-9)

From the previous equation, the variance is

Var(B_H(t)) = σ² |t|^{2H}    (2-10)

The self-similarity characteristic of fBm can be deduced from the previous equation with a scale transformation [3]:

σ²_H(rt) = r^{2H} σ²_H(t)    (2-11)

18

Page 19

This result shows that fractional Brownian motion is scale-invariant and statistically indistinguishable under a scale transformation [3].

2.7 Feedback Control System

The reader is encouraged to reread this section for a better understanding of the ARCUP algorithm; it gives a brief definition of a control system. A control system can be viewed as a system in which the manipulation of the input element(s) results in a desired output(s). Two main features define a control system: first, a mathematical model that expresses the characteristics of the original system; second, the design stage, in which an appropriate control mechanism is selected and implemented in order to achieve a desired system performance [6]. The control of a system can be performed either in open loop or in closed loop. An open-loop system is one in which the control input to the system is independent of its output; it is imperative that the control input is not influenced by the system output. Open-loop systems are often simple and inexpensive to design. In some applications, however, it is important to feed back the output in order to better control the system. In this case, the loop is said to be closed. Therefore, in a closed-loop system the control input is influenced by the system output: the system output value is compared with a reference input value, and the result is used to modify the control system input [6]. The mathematical modeling of feedback systems was introduced by Nyquist in 1932: he observed the behavior of open-loop systems with sinusoidal inputs to deduce the behavior of the associated closed-loop system. His work was later improved and surpassed by Bode in 1938 and Evans in 1948. Their work constitutes the basis of classical control theory [6].

A basic closed-loop system contains: an input device, an output measuring device, an

error measuring device, and an amplifier and control system. The latter manipulates the

measured error in order to positively modify the output [6]. It is worth noting that there

are two types of closed-loop or feedback systems:

(a) A regulator is a closed-loop system that maintains an output equal to a pre-deter-

mined value regardless of changes in system parameters; and


(b) A servomechanism is a closed-loop system that produces an output equal to some

reference input position without change in parameter values. The former is better suited

to model the system presented in this paper.
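To make the closed-loop idea concrete, here is a minimal sketch (entirely illustrative; the plant model, gain, and reference value are arbitrary assumptions, not taken from the thesis):

```python
def regulate(reference, steps=200, gain=0.5):
    """Minimal discrete-time closed-loop regulator (illustrative sketch).

    The "plant" is modeled as a leaky integrator whose output chases the
    control input. The feedback loop measures the error between the
    reference and the output and adjusts the control input accordingly.
    """
    output, control = 0.0, 0.0
    for _ in range(steps):
        error = reference - output          # error measuring device
        control += gain * error             # amplifier / control element
        output += 0.2 * (control - output)  # plant response
    return output

# The loop drives the output to the reference despite the plant dynamics,
# which is exactly the behavior of a regulator as defined above.
print(round(regulate(10.0), 6))
```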


Chapter 3

Leaky bucket algorithm

Having established the framework that would allow the reader to understand our work,

we are now in a position to introduce the first usage profile algorithm. The study of the leaky

bucket assumes that the traffic source behaves as a two-state on/off arrival process with an

arbitrary distribution for the time spent in each state [9]. A source is considered to be on

(or busy) when it is transmitting/receiving and off (or idle) when it is not. Technically,

ON-times represent periods during which packets (or cells, depending on the architecture in

use) arrive at regular intervals, while OFF-times represent periods with no packet arrivals.

In this chapter we introduce the concept of a leaky bucket, and analyze the results of the

algorithm for different token rates and on different data sets. It is organized as follows:

section I gives a definition of the leaky bucket concept; section II analyzes the algorithm

for different token rates ranging from 10,000 to 25,000 bits per second; section III evalu-

ates the performance of the leaky bucket algorithm for different data sets; and finally, sec-

tion IV summarizes our findings.

3.1 Definition of a leaky bucket

This mechanism has been utilized as a congestion control strategy in high-speed networks. It is often used to control traffic flows; ATM is a prime example. The ATM (Asynchronous Transfer Mode) technology is based on the transmission of a fixed-size data unit.

When an ATM connection is established, the traffic characteristics of the source and its

quality of service are guaranteed by the network. The network enforces the admission con-

trol policies by using a usage parameter control [13]. The leaky bucket algorithm serves

this purpose. The leaky bucket algorithm is used to control bursty traffic as well. An Inter-

net Service Provider (ISP), for instance, can use a leaky bucket profile to shape its incom-

ing traffic [8]. A simple leaky bucket is characterized by two components: the size of the

bucket and the token replenishing rate. Tokens are generated at a constant rate and are

stored in the bucket which has a finite capacity. A token which arrives when the bucket is

full is automatically discarded. If the bucket contains enough tokens when packets arrive,


the mechanism allows them to pass through. That is, the user can burst traffic into the net-

work at his/her allowed peak rate which is most of the time equivalent to the physical link

capacity. After each transfer, the bucket is decremented accordingly. When the number of

tokens is insufficient only a portion of the arriving traffic is immediately sent while the rest

is queued. If the bucket is empty, all arriving packets (traffic) are queued or discarded

according to the policy in place. The queued packets are serviced upon token arrival in the

bucket. If the token replenishing rate is constant, the queued packets service rate is con-

stant as well. Note, with large buckets, users can send bursty traffic in a short time period.

The token replenishing rate, on the other hand, allows a user to send data at a constant bit rate

for any period of time.
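A minimal sketch of the mechanism just described (my own illustration; token units are bits, and the infinite default bucket size matches the assumption used later in this thesis):

```python
def replenish(tokens, off_time, token_rate, bucket_size=float("inf")):
    """Tokens accumulate at the replenishing rate during an OFF period,
    up to the bucket capacity; excess tokens are discarded."""
    return min(bucket_size, tokens + off_time * token_rate)

def transfer_time(file_bits, tokens, token_rate, peak_rate):
    """Serve one file: accumulated tokens pay for bits sent at the peak
    rate, and any remainder drains at the token rate.

    Returns (duration_seconds, tokens_left)."""
    peak_bits = min(tokens, file_bits)
    duration = peak_bits / peak_rate + (file_bits - peak_bits) / token_rate
    return duration, tokens - peak_bits

# A 10-second OFF period at a 10,000 bits/sec token rate buys 100,000
# tokens; a 150,000-bit file then has a peak part and a token part.
tokens = replenish(0.0, 10.0, 10_000)
duration, tokens = transfer_time(150_000, tokens, 10_000, 1_000_000)
print(duration)   # 0.1 s at the peak rate plus 5.0 s at the token rate
```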

3.2 Analyzing the leaky bucket algorithm for different token rates

The leaky-bucket scheme has been utilized to monitor and enforce usage parameter

control (UPC). Researchers in [8] show that a leaky-bucket scheme can be imposed on

each on-off source as a usage profile. A profile is, therefore, defined by an initial number

of tokens in the bucket and a token replenishing rate. The profile determines when the on-off sources (or users) can burst traffic into the network and when they must send at the token rate. In this thesis, the bucket has an infinite capacity (i.e., no token is discarded). The amount of tokens accumulated by the bucket, however, is finite; it depends on the OFF-periods. Indeed, the lengths of OFF-periods are bounded, so the amount of tokens in a bucket is bounded as well. In general, the transfer time for each file has a peak time and

a token time component. The larger the file (i.e., the longer the on-period) the bigger the

token component is likely to be and the slower the transfer. The longer the off-period the

more tokens the bucket accumulates. The arriving file following such a period can be sent

at the peak rate. The bigger the peak time component, the faster the transfer. After a

long ON-time, representing the transfer of a large file, the bucket contains few tokens.

The total transfer time required to send an arriving file depends more generally on the his-

tory of previous ON-OFF times. If the arriving file is small, there is a high probability

that it will be sent at the peak rate. Conversely, if the arriving file is large, there is a low

probability that the bucket will have enough tokens: it will be processed at a speed near the

token rate [8]. A drawback of this scheme is that the more heavy-tailed the distribution of the ON-times, the more traffic has to be sent at the token rate. [8] shows that

increasing the peak rate is not an effective way to solve this problem. In the following

subsections, we analyze the leaky bucket algorithm for different token rate values. We

will consider the normalized average rate (i.e., 10,000 bits/sec) and two of its multiples:

token rates of 20,000 and 25,000 bits per second. These rates will constitute the focus of

our attention. This procedure will allow us to determine the required token rate that will

provide the least total transfer time for the user under consideration.

3.3 Token rate equal to 10,000 bits per second

The initial value of 10,000 bits per second is chosen since this is the actual long-term

average rate of the traffic to be controlled. When applying this token rate, most transfer

times have two components: a peak time and a token time. The first fraction of the data is

received at the peak rate, and the rest, if any, at the token rate. The smaller the transmission

duration time, the more dominant the peak time. As mentioned earlier, the goal of the

usage profile algorithm is to regulate users to their long-term contracted rate (token rate or

target rate), allow burst traffic to be sent rapidly, and prevent abuse. In the scatter plot of

Figure 3-1, the points that are lying near the X-axis represent small files that are sent at the

peak rate. On the other hand, the transfers that required thirty-seven, forty-two, and forty-nine seconds are three examples of very large transfers.

Figure 3-1: Leaky bucket with token rate 10,000 bits/sec

As a result, the token time dominates in each case. For example, for the thirty-seven-second transfer, the file size is

58,852 bytes with a peak time of 0.0928 second and a token time of 37.7 seconds. For the

forty-two second transfer, the file transferred has a size of 90539 bytes with a resulting

peak time of 0.3 second and a resulting token time of 42.12 seconds. Finally, for the forty-nine

second event, the file transferred has a size of 163711 bytes resulting in a peak time of

0.81 second and a token time of 48.32 seconds. In all three cases the results are much bet-

ter than if the user were receiving strictly at the token rate. The reader can verify that for

the larger transfer (i.e., file size = 163711 bytes), the total transmission time would have

been 131 seconds had the whole file been sent at the token rate which implies that over

one half of the data was received at the peak rate, and the rest at the token rate. In this

case, the transmission duration time is dominated by the token time.
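These figures can be checked directly; the arithmetic below uses the 1 Mbit/sec peak rate assumed for the physical link, and the sizes quoted in the text:

```python
PEAK_RATE = 1_000_000   # bits/sec, the assumed physical link (peak) rate
TOKEN_RATE = 10_000     # bits/sec

file_bits = 163_711 * 8          # the 163,711-byte transfer
worst_case = file_bits / TOKEN_RATE
print(round(worst_case))         # about 131 seconds at the token rate alone

peak_bits = 0.81 * PEAK_RATE     # bits covered by the 0.81 s peak time
print(peak_bits / file_bits)     # over one half of the data at the peak rate
```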


3.4 Token rate equal to 20,000 bits per second

As mentioned earlier, the worst case transfer time is file size in bits divided by the

token rate. In this case, we decrease the transfer time for two reasons: a faster token rate,

and a larger number of tokens after a given OFF time which increases the number of bytes

sent at the peak rate. By increasing the token rate, we decrease the transfer time: they are

inversely proportional. However, increasing the token rate has the disadvantage that it permits a user to "abuse" the profile by sending at a long-term steady state rate greater than

the intended target rate. The duration of the transfer still has a peak time component and a

token time component. In the scatter plot of Figure 3-2, the peak time dominates in most

of the transactions. As an illustration, the two outliers occurring at 0.5 and 4.5 seconds

have greater token times (see Figure 3-2).

Figure 3-2: Leaky bucket with token rate 20,000 bits/sec

The first represents the file transferred for the duration of 0.5 second, having a size of 16,031 bytes with a peak time of 0.117 second and a

token time of 0.4 second. The second represents the file transferred for the duration of 4.5


seconds, and having a size of 15,429 bytes with a peak time of 0.04 second and a token

time of 4.17 seconds.

3.5 Token rate equal to 25,000 bits per second

A token rate of 25,000 bits per second is chosen here based on simulation results. This

rate is sufficient to allow this user to receive all files at the peak rate (i.e., achieves the low-

est possible transmission duration time). That is, the user always accumulates enough

tokens to receive data at the peak rate; the token time is zero. The transmission duration

depends only on the peak rate. As an illustration, if we consider the 80,000 bytes file, we

find the duration time to be 0.64 seconds. This result is consistent with the results in Figure 3-3. Further increasing the token rate for this data set will not improve the delay duration.

Figure 3-3: Leaky bucket with token rate 25,000 bits/sec


3.6 Token rate equal to 8,000 bits per second

In the case of overload, that is, when the normalized average rate of a user is greater than the token rate, the replenishing rate of the bucket is slower than the arriving data rate. The

bucket is unable to accumulate enough tokens. Therefore, most of the files are received at

the token rate. The transmission duration of the largest file goes from fifty seconds at

10,000 bits per second to 160 seconds at 8,000 bits per second. The leaky bucket algo-

rithm severely penalizes overloaded traffic with heavy-tailed distributions. In the next

chapter, we present a different algorithm that yields better results in a similar situation.

Figure 3-4: Leaky bucket with target rate 8,000 bits/sec

3.7 Performance of the leaky bucket algorithm on different data sets

Studying the algorithm for one data set is insufficient to draw a conclusion about performance: data set distributions vary from user to user. This section evaluates the performance of the leaky bucket algorithm for different users (or data sets). The objective in the


following analysis is to determine the minimum token rate required by each data set in

order to achieve the lowest possible transfer time. For simplicity, we conducted this study

of the algorithm on only four sets of data; we analyzed data sets goofy, daffy, gonzo, and

pooh.

Figure 3-5: Maximum token rate required for each data set (goofy at 25,000 bits/sec, gonzo at 25,000 bits/sec, daffy at 35,000 bits/sec, pooh at 55,000 bits/sec)

From the scatter plot of Figure 3-5, we can infer that data set goofy attains its lowest transferred duration time of ten seconds for a token rate of 25,000 bits per second; data set daffy reaches its lowest transferred duration time of five seconds for a corresponding token rate of 35,000 bits per second; data set gonzo reaches its lowest transferred duration time of 1.4 seconds for a corresponding token rate of 25,000 bits per second. Finally, data set pooh reaches its lowest transferred duration time of 0.7 second for a corresponding token rate of 55,000 bits per second. If we consider the peak rate (1 Mbit per second), our calculations yield transferred duration times corresponding to the results exhibited by the scatter plot of Figure 3-5. The discrepancy in the token rates, in turn, can be attributed to the

difference in these data sets' distributions.

Figure 3-6: Data set distributions

Data set goofy reaches its maximum token rate sooner because the tail of its distribution is lighter (see Figure 3-6). As a result, data set goofy accumulates enough tokens to receive data at the peak rate given a smaller token rate. The opposite holds for data set pooh, which has a very heavy-tailed distribution. Consequently, the

required token rate is much larger.
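The effect summarized here can be reproduced with a toy simulation; the trace below is made up and is not one of the thesis data sets, but its single very large file plays the role of a heavy tail:

```python
def total_transfer_time(trace, token_rate, peak_rate=1_000_000):
    """Total transfer time of an (off_time_sec, file_bits) trace under a
    leaky-bucket profile with an infinite bucket, as assumed in this chapter."""
    tokens, total = 0.0, 0.0
    for off_time, file_bits in trace:
        tokens += off_time * token_rate            # replenish during OFF period
        peak_bits = min(tokens, file_bits)
        total += peak_bits / peak_rate + (file_bits - peak_bits) / token_rate
        tokens -= peak_bits
    return total

# The large file dominates the total time until the token rate is raised
# well above the long-term average rate of the trace.
trace = [(5.0, 40_000), (5.0, 40_000), (5.0, 1_200_000)]
for rate in (10_000, 25_000, 250_000):
    print(rate, round(total_transfer_time(trace, rate), 2))
```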

3.8 Summary

To summarize, the transmission duration time decreases as we increase the token rate.

Though increasing the token rate (i.e., the usage profile) results in lowering the transfer delays, such a scheme is neither efficient nor economically appealing. To achieve a desirable delay, a user will eventually need a higher profile, which will be more expensive.

After all, this higher usage profile may be needed just because of a few large transfers.

Running this scheme for different data sets shows that the heavier the tail distribution of

the data set the greater is the required token rate necessary to achieve the desired minimum

delay. As pointed out in [1], the leaky-bucket scheme performs well with Poisson-like distributions but penalizes heavy-tailed traffic (i.e., web traffic). The need for a suitable algorithm to regulate web traffic is evident: the next chapter proposes and evaluates such an

algorithm.


Chapter 4

Average Rate Control Usage Profile Algorithm

4.1 Introduction

This algorithm is built on the Time Sliding Window Tagging algorithm presented in

[12]. There are two parts to this profile scheme. The first is a rate estimator, based on the

Time Sliding Window estimator, which measures the average usage over some period. The

second is a control or limit that is applied to the traffic so long as the actual sending rate, as

measured by the estimator, exceeds the target rate. In contrast to the leaky bucket scheme,

the controls look at each ON period in isolation; in other words, what happens in each ON

period does not depend on the length of the immediate OFF period, or any other short-

term history. The control is only related to the output of the estimator.

The estimator allows the algorithm to determine when a user has exceeded its target

rate. In this case, the algorithm will enforce some pre-defined policy to discourage such a

behavior. I propose a number of different controls, which involve reducing the allowed

peak rate, and limiting the number of bytes in any one ON period that can be sent at the

peak rate. Further, instead of using packets to calculate the total number of bytes sent, I

use file sizes. In this simulation, all control adjustments are made at the beginning of each

file (each ON period) since the size of the file is known in advance. A practical scheme

must make the decision incrementally since the duration of the ON period is not known.

This algorithm maintains three local variables: Win_length, which represents how much the past "history" of a user affects its current rate; average_rate, which estimates the current rate of the user upon each ON period; and T_front, which represents the time of the current file arrival. While T_front and average_rate are calculated throughout the simulation,

Win_length must be configured. [12] shows that this algorithm performs well for a Win_length between 0.6 and 1 second. The core of the ARCUP algorithm is presented below:

Initially:


Win_length = constant;
Average_rate = 0;
T_front = 0;

Upon each file arrival, TSW updates its variables:

Files_in_TSW = average_rate * Win_length;
New_files = Files_in_TSW + workload;
average_rate = New_files / (total_time - T_front + Win_length);
T_front = total_time;   // time of the last packet arrival
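A runnable rendering of the pseudocode above might look like this (Python used for illustration; the arrival times and file sizes in the sample run are made up):

```python
class TSWEstimator:
    """Time Sliding Window rate estimator following the pseudocode above."""

    def __init__(self, win_length):
        self.win_length = win_length   # seconds of "history" retained
        self.average_rate = 0.0        # bits/sec
        self.t_front = 0.0             # time of the last arrival

    def update(self, total_time, workload):
        """Fold a file of `workload` bits arriving at `total_time` seconds
        into the running average-rate estimate."""
        bits_in_tsw = self.average_rate * self.win_length
        new_bits = bits_in_tsw + workload
        self.average_rate = new_bits / (total_time - self.t_front + self.win_length)
        self.t_front = total_time
        return self.average_rate

est = TSWEstimator(win_length=1.0)
for t, bits in [(0.5, 20_000), (1.5, 20_000), (2.5, 20_000)]:
    est.update(t, bits)
# 20,000 bits arriving each second: the estimate approaches 20,000 bits/sec.
print(round(est.average_rate))
```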

The goal of this algorithm is to hold the user to his/her long term average rate and to

impose limitations on bursts. Our challenge is to let bursts through with little distortion while keeping the achieved average rate close to the nominal average rate. We accomplish these goals by keeping the current average rate as close as possible to the target

rate; both average and target rate are expressed in bits per second. For each arriving file,

the algorithm compares the current average rate with the target rate. If the average rate is

smaller than the target rate, no action is taken; if the average rate is greater than the target

rate, then some control is imposed to reduce the average rate. A typical average rate func-

tion is depicted in Figure 4-2. Since we want to achieve this goal regardless of changes in

system parameters, we use a feedback closed-loop to control the system. In other words,

the average rate is continuously measured and compared to the target rate. The difference

(or error value) between them is used to vary the peak rate, which for this simulation has a

maximum value of 1 Mbit per second.

4.2 Uncontrolled average rate

The uncontrolled average rate represents the average rate of each user if no control mechanism is imposed. For each user, the corresponding average rate is determined by allowing the user to receive data strictly at the peak rate. A simple division between the total

bits transferred and the total transmission time yields the uncontrolled average rate. The

OFF-times for each data set are scaled to adjust the uncontrolled average rate to the desired

value. Under these conditions, the user obtains its highest average peak rate (the highest


value that the average rate can reach) and its lowest transmission duration delay. In the

following subsections, we will present several schemes that aim at controlling the average

rate and, by doing so, increase the transmission delay.
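The computation just described amounts to the following (the trace is hypothetical; a real data set would supply the OFF times and file sizes):

```python
def uncontrolled_average_rate(trace, peak_rate=1_000_000):
    """Average rate when every file is received at the peak rate.

    `trace` is a list of (off_time_sec, file_bits) pairs; the total time is
    the sum of the OFF periods plus the peak-rate transfer times."""
    total_bits = sum(bits for _, bits in trace)
    total_time = sum(off + bits / peak_rate for off, bits in trace)
    return total_bits / total_time

# Scaling the OFF times rescales the uncontrolled average rate, which is
# how each data set is adjusted to a desired long-term rate.
trace = [(2.0, 100_000), (6.0, 20_000)]
print(round(uncontrolled_average_rate(trace)))
```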

This chapter defines and analyzes the same estimator scheme with different controls.

Figure 4-1: Uncontrolled average rate for data set gonzo

Data set gonzo is utilized as the benchmark to evaluate the merit of each applied control

technique. We will present the following methods: in section I, we analyze an uncon-

trolled average rate; in section II, we show a maximum data control method; in section III,

we study a decrease and increase of the peak rate by a fixed factor approach; in section IV,

we combine the two techniques used in sections II and III; in section V, we study the per-

formance of the algorithm for different target rates; in section VI, we evaluate the perfor-

mance of the algorithm for different data sets; and in section VII, we compare the results

for running the algorithm at the peak, average, and target rates.


As an example, we consider a user with a target rate of 10,000 bits/sec which is not

unreasonable for a user surfing the Web today. In this case, no control algorithm is applied

on the user who completed all transfers at the peak rate. The resulting average rate func-

tion (Figure 4-1) stays above the target rate between six and seven minutes into simula-

tion, and reaches a maximum value which is about twenty-five times the target rate. To

reiterate, our objective is to keep the average rate as close as possible to the target rate.

This control will become apparent to the user in terms of elapsed time when downloading

information. The smaller the target rate and the tighter the control, the more time it will

take to receive Web data. The straight lines in Figure 4-1 represent very long off-periods

while the spikes represent large file transfers.

4.3 Maximum data control

The motivation behind this scheme is to prevent the user from receiving large files

regardless of its current average rate. This variant does not use the rate estimator. It

receives data at the peak rate except if the size of the Web data is greater than a pre-defined

value. Files as big as 900,000 bits are suitable to evaluate the performance of different data

sets in the context of this algorithm. The bigger the maximum data the smaller is the over-

all transmission time for a given data set. Here, we evaluate the algorithm for two rather

small maximum data values: 300,000 bits and 150,000 bits. With a maximum data of 300,000

bits, the user is less restricted. Therefore, the total transmission time to complete his/her

transfers is less than in the other case. However, the long-term average rate is better con-

trolled with a maximum data of 150,000 bits while the increase in the total transmission

time is tolerable. In both cases, any file greater than the pre-defined maximum data size is

processed as follows: the pre-defined maximum data is serviced at the peak rate, and the rest

at the target rate. If the current file size is less than the pre-defined maximum data, the user

can always receive at the peak rate. In Figure 4-2, the long-term average rate in both cases

is well below the value obtained in the uncontrolled case (Figure 4-1). The improvement

in controlling the long-term average rate is at the expense of additional delay. The time

required by the same user to complete his/her entire transaction is increased by about four

minutes.
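A sketch of this control (the 1 Mbit/sec peak rate and the 10,000 bits/sec target rate are the values used elsewhere in this chapter; the 900,000-bit file is just an example):

```python
def controlled_transfer_time(file_bits, max_data_bits, peak_rate=1_000_000,
                             target_rate=10_000):
    """Maximum data control: at most max_data_bits of each file are served
    at the peak rate; the excess, if any, is served at the target rate."""
    peak_bits = min(file_bits, max_data_bits)
    return peak_bits / peak_rate + (file_bits - peak_bits) / target_rate

# A 900,000-bit file under the two maximum data values used in the text:
print(controlled_transfer_time(900_000, 300_000))   # 0.3 s peak + 60.0 s target
print(controlled_transfer_time(900_000, 150_000))   # 0.15 s peak + 75.0 s target
```

The smaller maximum data restricts the long-term rate more tightly, at the cost of the extra target-rate time visible in the second case.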


Figure 4-2: Maximum data effect

4.4 Decrease and increase the peak rate by a fixed factor

For this version of the ARCUP algorithm, the user is allowed to receive data at the

peak rate until its average rate surpasses its target rate. At the occurrence of such an event,

the peak rate is decreased by a pre-defined factor (in this case, we set pk_rate = 0.8*pk_rate). This decrease continues until the average rate drops below the target rate. At

each packet arrival, the ARCUP algorithm determines how to adjust its peak rate. We have

chosen this approach instead of a per time interval criterion since the ON periods represent

when the user is active. Figure 4-3 shows all the cases where the peak rate has to be

dropped. Figure 4-4 depicts the changes occurring in the peak rate; it also expresses the interplay among the average, the target, and the peak rates. A spike in Figure 4-3 corresponds to a drop in the peak rate in Figure 4-4. In this instance, the longest period of


drop occurs between twelve and seventeen minutes into simulation; between twelve and

twenty-two minutes approximately, the peak rate is kept below its maximum value.

Figure 4-3: Data set gonzo with target rate 10,000 bits/sec

The peak rate is increased by a pre-defined factor as well (in this example, we set peak_rate = 1.2*peak_rate). After a drop around five minutes into simulation (Figure 4-4), the

peak rate increases and remains constant until about twelve minutes into simulation. The

behavior of the peak rate within that interval is due to the fact that the average rate remains

below the target rate over that same interval (Figure 4-3).
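The decrease and increase rules of this section can be sketched as one function (the 1 Mbit/sec cap is the link rate used in the simulation; the 0.8 and 1.2 factors are the ones quoted in the text):

```python
def adjust_peak_rate(peak_rate, average_rate, target_rate,
                     max_peak=1_000_000, down=0.8, up=1.2):
    """ARCUP peak-rate control: shrink the peak rate while the estimated
    average rate exceeds the target; grow it (up to the link rate) otherwise."""
    if average_rate > target_rate:
        return down * peak_rate
    return min(max_peak, up * peak_rate)

rate = 1_000_000
rate = adjust_peak_rate(rate, 12_000, 10_000)   # over target: drops to 800,000
rate = adjust_peak_rate(rate, 12_000, 10_000)   # still over: drops to 640,000
rate = adjust_peak_rate(rate, 8_000, 10_000)    # below target: climbs to 768,000
print(rate)
```

Applied at each file arrival, this rule produces exactly the see-saw between the average rate and the peak rate visible in Figures 4-3 and 4-4.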


Figure 4-4: Change in the peak rate

4.5 Varying the peak rate and maximum data control

This section is somewhat a hybrid of sections II and III. This variant of the ARCUP algorithm provides the best results; it is therefore utilized for the remainder of

this thesis to evaluate different case studies. In addition to the maximum data control, the

peak rate is dropped whenever the average rate is greater than the target rate. This control

occurs regardless of files' sizes. Analyzing this variant of the ARCUP algorithm for the

same maximum data used above, we observed a better control in the average rate with a

slight increase in the total delay. The maximum value reached by the average rate in this case is about 80,000 bits/sec compared to 200,000 bits/sec in the previous case (Figure 4-3).

In return, the total delay is about three minutes extra.


We present the results for two maximum data values: namely, 300,000 and 150,000 bits.

Figure 4-5: Maximum data and peak rate control

In both cases, the algorithm restricts the user from receiving data at the peak rate

once the file size exceeds those values. This restriction takes effect even when the current

average rate is less than the target rate. With the maximum data of 300,000 bits, the long-

term average rate is slightly bigger with a total delay of 40 minutes. The long-term average rate is more restricted with the smaller maximum data while the incurred delay is

about the same. Figure 4-5 shows the result for these two cases, while Figure 4-6 displays

their relationship with the peak rate. The reader can observe that the peak rate drops when-

ever the average rate is greater than the target rate; it increases when the target rate is

smaller, or remains constant. In Figure 4-5, for instance, when the maximum data size

allowed equals 150,000 bits, the peak rate momentarily drops from 1 Mbit per second

to 10,000 bits per second after seven minutes into simulation. This drop occurs because


the average rate is greater than the target rate within that interval. Consequently, it forces the average rate to go below the target rate.

Figure 4-6: Variation in the peak rates

The peak rate remains below its maximum

value during much of the simulation. It momentarily climbs back to its nominal value after

thirty minutes into simulation due to the re-adjustment in the average rate. The peak rate

remains at its maximum value (i.e., 1 Mbit/sec) for about two minutes thereafter. This

observation can be explained by the fact that the average rate remains below the target rate

within that interval. On the other hand, when the maximum data size allowed equals

300,000 bits, the result is a little bit different. The reader should notice that the drop in the

peak rate is more acute when the maximum data equals 300,000 bits. By relaxing the constraint on the maximum data size, the user obtains a greater average rate (see Figure 4-5).

Since the peak rate is dynamically controlled, a greater increase in the average rate results

in a greater decrease in the peak rate (see Figure 4-6).


4.6 Performance of the ARCUP algorithm for different target rates

Is the ARCUP algorithm sensitive to the target rate? Obviously, different users are

bound to have different profile requests. In a quest to determine the best profile for a user

given a sample of his/her Web transactions over a period of time, I evaluate the ARCUP

algorithm for different target rate values. For simplicity, we restrict ourselves to just three

values, namely: 12,000 bits/sec, 10,000 bits/sec, and 8,000 bits/sec.

For this analysis, I concentrate on the transfer time required for two files when differ-

ent target rates are applied. As an illustration, I focus on the scatter plots of Figures 4-7 through 4-9 to

analyze the transmission times resulting for a file of 20,000 bytes and a file of 60,000

bytes.

First, I analyze the results for the case when the target rate equals 12,000 bits/sec. For

the first file, the resulting transmission duration time is approximately two seconds using

the ARCUP algorithm average rate; if it were processed at the nominal average rate (10,000 bits/sec), the resulting transmission time would have been sixteen seconds. Further, this file size is less than the pre-defined maximum data of 300,000 bits. As a result,

most of the transfers are done at a rate that is close to the peak rate. For the second file, the

nominal average rate (i.e., 10,000 bits/sec) would yield a transmission duration time of

forty-eight seconds while the ARCUP algorithm yields approximately eight seconds.

Since the second file size is greater than the pre-defined maximum data, its transmission

transfer time contains a peak rate component and a target rate component. In the scatter

plot of Figure 4-7, the dotted points near the x-axis represent the files with sizes less than 300,000 bits. All those files are processed at a rate that is near the peak rate. The remaining dotted points represent the larger files that are processed at a slower rate.

Figure 4-7: Data set gonzo at 12,000 bits/sec

Second, I analyze the results for the case where the target rate equals 10,000 bits per second. Lowering the target rate increases the maximum delay per transaction. In this analysis, I consider the same two files mentioned in the previous case. The same amount of time is required for the smaller file (see Figure 4-7) since it is less than the pre-defined maximum data allowed. However, there is an increase of approximately three minutes for the second file.

Figure 4-8: Data set gonzo at 10,000 bits/sec

Third, by lowering the target rate to 8,000 bits/sec, the ARCUP algorithm becomes more restrictive (Figure 4-8). The individual transfer rate for each file increases as well as the overall transmission time. We notice that the 20,000-byte file still requires approximately the same amount of time while the 60,000-byte file requires twenty seconds. In addition, there is a slight increase in the transmission time of the files that are less than the maximum data; these are the files with an average and a peak rate component. In this case, the average rate reaches the target rate much sooner, which reduces the obtained average rate but increases the transmission times.

[Figure: varying peak rate with max data = 300,000 bits and target rate = 10,000 bits/sec]

Figure 4-9: Data set gonzo at 8,000 bits/sec

To summarize, variations in the target rate affect the obtained average rate (total data transferred over total transmission duration) as well as the transmission duration time. If the target rate is set greater than the nominal average rate, the user obtains a lower response time per transfer. On the other hand, if the target rate is set lower than the nominal average rate (which is 10,000 bits/sec for data set gonzo), the user obtains a greater response time. Table 4-1 lists the obtained average rate for different target rates and different maximum data values.


4.7 Effect of maximum data size on obtained average rate

Increasing the maximum data size can be viewed as limiting the type of files that a user can download with no additional delay. A typical case is to allow the user to download, for instance, hypertext, images, and videos, but not MP3 files. Table 4-1 and Figure 4-10 illustrate the case of a fixed target rate but different maximum data values. The total transmission duration improves with an increase in the maximum data. Figure 4-10 shows that by increasing the maximum data allowed, more files are received near or at the peak rate; this illustrates the ARCUP algorithm's characteristic of restricting large files.

Table 4-1: Effect of maximum data size on obtained average rate

  Maximum data   Obtained average    Nominal average    Target rate
                 rate (bits/sec)     rate (bits/sec)    (bits/sec)
  100,000 bits   8,780               10,000             8,000
  300,000 bits   9,084               10,000             8,000
  600,000 bits   9,248               10,000             8,000
  900,000 bits   9,460               10,000             8,000
  100,000 bits   8,933               10,000             10,000
  300,000 bits   9,220               10,000             10,000
  600,000 bits   9,470.8             10,000             10,000
  900,000 bits   9,652               10,000             10,000
  100,000 bits   9,045               10,000             12,000
  300,000 bits   9,304               10,000             12,000
  600,000 bits   9,525.5             10,000             12,000
  900,000 bits   9,675.6             10,000             12,000


Figure 4-10: Transfer for data set gonzo with different maximum data

4.8 Effect of maximum data on transmission duration

Table 4-2 shows the results of running the ARCUP algorithm at different maximum data values. These results do not take OFF periods into consideration; in other words, they solely represent the activity (i.e., ON) periods of data set gonzo at a target

Table 4-2: Effect of maximum data on transmission duration

  Maximum data (bits)   Total transmission duration (secs)   Uncontrolled duration (secs)
  100,000               290                                  20
  300,000               221                                  20
  600,000               150                                  20

[Figure 4-10 panels: max data = 100,000; 300,000; 600,000; 900,000 bits]

Table 4-2 (continued)

  Maximum data (bits)   Total transmission duration (secs)   Uncontrolled duration (secs)
  900,000               115                                  20

rate of 10,000 bits/second. As observed, the total transmission duration decreases as we increase the maximum data value. This feature gives ISPs (Internet Service Providers) and/or network administrators the flexibility to adjust a user's profile as desired.

4.9 Algorithm performance for different data sets

Studying the algorithm for one data set is insufficient to draw a performance conclusion: data set distributions vary from user to user. This section evaluates the performance of the ARCUP algorithm for different users (or data sets). The study involves four data sets: goofy, taz, daffy, and pooh (the names of the actual users were removed for privacy). For each data set, we first consider the uncontrolled average

rate followed by the controlled average rate. The target rate and the nominal average rates

in all cases are 10,000 bits/sec, and the maximum data is 300,000 bits. The ARCUP algo-

rithm performs relatively well in each case. Table 4-3 shows the achieved average rate for

each data set.

Table 4-3: Achieved average rate

  User name   Long-term average rate (bits/sec)   Achieved average rate (bits/sec)
  Daffy       9,993                               9,155
  Gonzo       10,122                              9,048
  Goofy       10,406                              9,467
  Pooh        10,430                              9,260
  Taz         9,883                               8,864

In each case, we give the user's uncontrolled average rate followed by his/her controlled average rate. In the uncontrolled case, the average rate has the highest spikes and


the smallest total transmission delays. In the controlled case, the long-term average rate

is much lower, but there is an increase in total transmission delay. For instance, data set

pooh's average rate swings in the vicinity of the target rate while the transmission time

delay increases considerably for the controlled case (Figure 4-16). In the other cases, the

algorithm controls the average rates with far less additional transmission delay (Figure 4-

8). As an illustration, the achieved average rate for data set daffy is reduced considerably

in the controlled case with a small increase in the total transmission delay (Figure 4-12).

Figure 4-11: Uncontrolled daffy

Figure 4-12: Controlled daffy

Figure 4-13: Uncontrolled goofy

Figure 4-14: Controlled goofy

Figure 4-15: Uncontrolled pooh

Figure 4-16: Controlled pooh

Figure 4-17: Uncontrolled taz

Figure 4-18: Controlled taz

4.10 Running the algorithm at the peak, target, and average rates

The transmission time is given by the ratio of the size of the file being received to the actual transfer rate, and the transmission time for each file received has two boundaries. Considering the scatter plot of Figure 4-19, the transmission time is limited above by the target rate and below by the peak rate. This section investigates the duration of transferred files when running the algorithm (1) at the peak rate, (2) at the target rate, and (3) at the average rate. An illustration should facilitate understanding: when allowed to continuously receive files at the peak rate, the maximum delay observed by the user is 1.4 seconds per transaction. This result is accurate because the user's largest transaction is 180,000 bytes at a speed of 1 Megabit per second. When constrained to continuously receive at the target rate, the maximum delay is 120 seconds per transaction. Again, we can corroborate this result because the largest transaction is still 180,000 bytes at a target rate of 10,000 bits per second. For a user receiving data at the average rate, the observed transmission delay falls between the two previous cases; each transmission time then has a peak time component as well as a target time component. As a result, files are not processed any faster than the peak rate nor any slower than the target rate.

Figure 4-19: Average, peak, and target rates (legend: peak rate, target rate, average rate)


Chapter 5

Conclusion, Applications, and Future Work

This chapter contains a summary of our findings, suggests a possible application of the

ARCUP algorithm, and proposes an idea for future work. Further, it contains references,

source codes, and a sample of the data set used in simulations.

Since the two schemes presented in the preceding chapters are different, we presented the graphical results for each of them in the most comprehensive way. However, one is mostly interested in knowing how much the flows are delayed by each scheme. We answer this question by evaluating both algorithms against a common metric: we first look at the amount of time required for a user to complete his/her transactions when no control is applied, and then compute the amount of time required for the same set of data when either scheme is applied. The results appear in the following section.

5.1 Conclusion

This thesis analyzed the leaky-bucket algorithm and showed that it is inefficient in controlling heavy-tailed traffic; this analysis is supported by [8] as well. Table 5-1 shows the

results from running the leaky-bucket versus the proposed algorithm for different data sets

at the standard usage profile (i.e., target rate = 10,000 bits/sec). The target rate in the case

of the ARCUP algorithm is equivalent to the token rate in the leaky-bucket algorithm.

From this table, one case deserves elaboration: data set goofy. Data set goofy (Figure 1-1) is relatively Poisson distributed; therefore, the leaky-bucket scheme provides a transmission delay similar to that of the ARCUP algorithm. The leaky-bucket profile works well for short-range dependent traffic, i.e., predictable traffic such as a Poisson model. In all the other cases, the ARCUP algorithm outperforms its counterpart.

Table 5-1: Leaky bucket vs. ARCUP algorithm (time in minutes, data in megabits)

  User name or data set   Total data transferred   Uncontrolled   Leaky-bucket   TSW algorithm
  Taz                     67                       113            140.5          127
  Gonzo                   19.7                     32.4           40             34.6
  Goofy                   22.7                     36             40.3           40
  Pooh                    22.6                     20.8           24             21.7
  Daffy                   13                       37.7           43             42

These results are obtained by running the ARCUP algorithm with a maximum data value of 600,000 bits and a target rate of 10,000 bits/sec; they can be further improved by increasing the maximum data value. In closing, the simple leaky-bucket profile works well for short-range dependent traffic, i.e., predictable traffic such as a Poisson model. The ARCUP algorithm is more suitable for long-range dependent or self-similar traffic; it outperforms the leaky-bucket algorithm especially when the total amount of data transferred is considerable. Further studies could confirm whether or not the ARCUP algorithm penalizes Markovian traffic.

5.2 Application

In the current Internet model, ISPs (Internet Service Providers) lease their bandwidth from the backbone suppliers. The latter own high-capacity links like OC-1, OC-2, and OC-3 and are considered the source of bandwidth capacity. Corporations, institutions,

and individual users, in turn, lease their bandwidth from the ISPs. To deter end users from

using more than their contracted bandwidth, Internet Service Providers can build a usage

profile for each source. They will have to provision enough bandwidth to carry traffic for

all users. For Web (bursty) traffic, they assume that not all users send at peak rate at the

same time. In a setting like a university, ISPs can provide a profile to the LAN administra-

tor. The administrator will then repartition profiles locally according to users' needs. In a

computer laboratory, like the Athena clusters here at MIT, all workstations will be


assigned the same profile. Businesses, for example, will be able to discourage their

employees from making excessive usage of the Internet by assigning them lower usage

profiles. Let's say an employee profile is such that he/she can download five files every ten

minutes. If this employee tries to download seven files in ten minutes, the algorithm can

be calibrated to complete this transfer in twenty minutes. Such a policy will encourage

the user to stay within his/her assigned profile. In addition, the usage profile algorithm can be built with added features that warn the user when he/she is exceeding the profile. As an incentive, the user will not be penalized if he/she voluntarily slows down. The usage profile mechanism will thus contribute to avoiding congestion over the Internet.

5.3 Future work

The two algorithms presented in this thesis can work in tandem with mechanisms that differentiate the type of traffic beforehand. Future work could explore the possibility of designing a hybrid algorithm that properly regulates all types of traffic. Further, for both algorithms, we assumed that the lengths of the ON periods are known a priori, which is contrary to reality. Future work could also explore the merit of a scheme that determines the proper adjustments to the peak and average rates based on the instantaneous length of the ON period. An inquisitive mind can even look into a different algorithm that produces better results than the ones presented in this thesis.


5.4 References

[1] M. E. Crovella and A. Bestavros, "Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes", IEEE/ACM Transactions on Networking, Vol. 5, No. 6, December 1997.

[2] W. Willinger, M. S. Taqqu, R. Sherman, and D. V. Wilson, "Self-Similarity Through High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level", IEEE/ACM Transactions on Networking, Vol. 5, No. 1, February 1997.

[3] "Modeling Mammographic Images Using Fractional Brownian Motion", Signal Processing Research Centre, Queensland University of Technology.

[4] P. Flandrin, "Wavelet Analysis and Synthesis of Fractional Brownian Motion", IEEE Transactions on Information Theory, Vol. 38, March 1992.

[5] W. Willinger, V. Paxson, and M. S. Taqqu, "Self-Similarity and Heavy Tails: Structural Modeling of Network Traffic", in A Practical Guide to Heavy Tails: Statistical Techniques and Applications, R. Adler, R. Feldman, and M. S. Taqqu, editors, Birkhauser, 1998.

[6] S. A. Marshall, Introduction to Control Theory, 1978.

[7] G. Samorodnitsky and M. S. Taqqu, Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance, pp. 318-320.

[8] I. J. H. Liu, Bandwidth Provisioning for an IP Network Using User Profiles, S.M. thesis, Technology and Policy and Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 1999.

[9] D. D. Clark, "An Internet Model for Cost Allocation and Pricing", in Internet Economics, L. McKnight and J. Bailey, editors, MIT Press, 1997.

[10] K. Warwick, An Introduction to Control Systems, Advanced Series in Electrical and Computer Engineering, Vol. 8, 2nd edition.

[11] L. L. Peterson and B. S. Davie, Computer Networks: A Systems Approach, 2nd edition.

[12] W. Fang, Differentiated Services: Architecture, Mechanisms and an Evaluation, Ph.D. dissertation, Princeton University, November 2000.

[13] V. F. Nicola, G. A. Hagesteijn, and B. G. Kim, "Fast Simulation of the Leaky Bucket Algorithm."


Appendix A

Source Codes and Data Sample

A.1 Generating OFF times

/* Author: Pierre Arthur Elysee */
/* Date: 6/4/99 */
/* Description: This module generates the OFF times used in our simulations */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>

/* all macros */
#define seps " "              /* separator symbols for parsing */
#define MAXNUMBERCHARS 200    /* number of characters expected per line */

/* This function generates a uniformly distributed random number from [0,1] */
double rand_gen(unsigned long *num)
{
    *num = (*num * 16807UL) % 4294967295UL;
    return ((double) *num) / 4.294967295e9;
}

int main(void)
{
    /* variables */
    FILE *fp_in, *fp_out;
    char input_name[20], output_name[20];
    char buff[200];            /* holds one line of data at a time */
    float word[30], duration;
    int nw, on_sources, off_sources, fbm = 1;
    double y = 0.0, time_on = 0.0, time_off = 0.0, save_time_off = 0.0;
    double start_time = 0.0, end_time = 0.0;
    char *w;
    char buff1[80], buff3[15], buff4[15], buff5[15], buff6[15];
    unsigned long seed = 2345677;

    printf("Enter input file name:\n");
    scanf("%s", input_name);
    strcpy(output_name, input_name);
    strcat(output_name, "_out");
    printf("input file name: %s\n", input_name);
    printf("output file name: %s\n\n", output_name);

    /* duration value is entered here */
    printf("Enter duration value:\n");
    scanf("%f", &duration);

    start_time = clock();      /* beginning of simulation */

    /* testing input and output files for existence */
    if ((fp_in = fopen(input_name, "r")) == (FILE *) NULL) {
        printf("Couldn't open %s for reading.\n", input_name);
        exit(1);
    }
    if ((fp_out = fopen(output_name, "w")) == (FILE *) NULL) {
        printf("Couldn't open %s for writing.\n", output_name);
        exit(1);
    }
    else {
        strcpy(buff3, "Time_on");
        strcpy(buff4, "Time_off");
        strcpy(buff5, "Source_index");
        strcpy(buff6, "Fbm_index");

        /* output formatting */
        sprintf(buff1, "%s %s %s %s", buff3, buff4, buff5, buff6);
        fputs(buff1, fp_out);  /* write line to file */
        fputs("\n", fp_out);   /* add return character to line */

        /* Read input component list file stream */
        while (fgets(buff, MAXNUMBERCHARS, fp_in) != 0) {
            nw = 0;
            w = strtok(buff, seps);        /* find first word */
            while (w) {
                /* parsing input line */
                word[nw++] = atof(w);
                w = strtok(NULL, seps);    /* find next word */
            }

            /* calculating number of sources on and off for each FBM */
            on_sources = (int)(word[7] / word[1]);
            off_sources = (int)(word[3] - on_sources);

            /* testing Beta and alpha values */
            if (word[21] <= 0.0 || word[23] <= 0.0 || word[13] <= 0.0) {
                sprintf(buff1, "%s %d",
                        "Invalid BetaHigh1 or BetaHigh2 or Alpha1 for FBM:", fbm++);
                fputs(buff1, fp_out);
                fputs("\n", fp_out);
                continue;
            }

            /* calculating time_on and time_off for each source within an FBM */
            while (on_sources) {
                while (duration > time_on) {
                    /* y is a uniformly distributed random variable in (0,1) */
                    do {
                        y = rand_gen(&seed);  /* y will never be zero or one */
                    } while (y == 0.0 || y == 1.0);

                    /* calculating on/off times */
                    time_on = word[21] * pow((1/y - 1), 1.0/word[13]);   /* seconds */
                    time_off = word[21] * pow((1/y - 1), 1.0/word[13]);  /* seconds */
                    time_on = time_on + save_time_off;
                    time_off = time_off + time_on;
                    save_time_off = time_off;
                    if (time_on > duration)
                        continue;  /* time_on has exceeded simulation time */

                    /* writing results to output file */
                    strcpy(buff1, "\0");   /* initializing buffer to null */
                    sprintf(buff1, "%f %f %d %s%d%s",
                            time_on, time_off, on_sources, "A", fbm, "Z");
                    fputs(buff1, fp_out);  /* write line to file */
                    fputs("\n", fp_out);   /* add return character to line */
                }  /* end while duration */
                time_off = 0.0;
                time_on = 0.0;
                save_time_off = 0.0;
                on_sources--;  /* get next source */
            }  /* end while on_sources */

            while (off_sources) {
                while (duration > time_on) {
                    /* y is a uniformly distributed random variable in (0,1) */
                    do {
                        y = rand_gen(&seed);  /* y will never be zero or one */
                    } while (y == 0.0 || y == 1.0);

                    /* calculating on/off times */
                    time_on = word[23] * pow((1/y - 1), 1.0/word[13]);   /* seconds */
                    time_off = word[23] * pow((1/y - 1), 1.0/word[13]);  /* seconds */
                    time_on = time_on + save_time_off;
                    time_off = time_on + time_off;
                    save_time_off = time_off;
                    if (time_on > duration)
                        continue;  /* time_off has exceeded simulation time */

                    /* writing results to output file */
                    strcpy(buff1, "\0");   /* initializing buffer to null */
                    sprintf(buff1, "%f %f %d %s%d%s",
                            time_on, time_off, off_sources, "A", fbm, "Z");
                    fputs(buff1, fp_out);  /* write line to file */
                    fputs("\n", fp_out);   /* add return character to line */
                }  /* end while duration */
                off_sources--;  /* get next source */
                time_off = 0.0;
                time_on = 0.0;
                save_time_off = 0.0;
            }  /* end while off_sources */

            fbm++;  /* get a new FBM from file */
        }  /* end outer while */

        fclose(fp_in);
        fclose(fp_out);
        end_time = clock();
        printf("-------------------------------------\n");
        printf("Simulation has been completed successfully.\n");
        printf("The total running time is: %e seconds\n",
               ((double)(end_time - start_time)) / CLOCKS_PER_SEC);
        return 0;
    }  /* end else */
}  /* end main */

A.2 The leaky-bucket scheme

/* Author: Pierre Arthur Elysee */
/* Date: 2/4/2000 */
/* Description: This module represents the leaky-bucket scheme. It calculates the
   total time required by each source to complete its data transfer. */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

/* all macros */
#define seps " "              /* separator symbols for parsing */
#define MAXNUMBERCHARS 200    /* number of characters expected per line */

int main(void)
{
    /* variables */
    FILE *fp_in, *fp_out;
    char input_name[20], output_name[20];
    char buff[200];            /* holds one line of data from input file */
    char buff1[200], buff2[20], buff3[10], buff5[10], buff6[20], buff7[20];
    float pk_time = 0.0;
    float tkn_time = 0.0;      /* peak time and token time */
    float total_pk_time = 0.0;
    float total_tkn_time = 0.0;
    float tkn_rate = 10000.0;  /* replenishing rate of tokens in bits/sec */
    float pk_rate = 1000000.0; /* bits/sec */
    float total_time = 0.0;    /* time required by each source to complete its transfer */
    float bucket = 0.0;        /* bucket content */
    float total_data = 0.0;
    float data_in_bits = 0.0;
    int work_load = 0;
    int typ_off_time = 0;      /* in sec */

    printf("Enter input file name:\n");
    scanf("%s", input_name);
    strcpy(output_name, input_name);
    strcat(output_name, "_out");
    printf("input file name: %s\n", input_name);
    printf("output file name: %s\n\n", output_name);
    printf("please enter token rate:");
    scanf("%f", &tkn_rate);

    /* testing input and output files for existence */
    if ((fp_in = fopen(input_name, "r")) == (FILE *) NULL) {
        printf("Couldn't open %s for reading.\n", input_name);
        exit(1);
    }
    if ((fp_out = fopen(output_name, "w")) == (FILE *) NULL) {
        printf("Couldn't open %s for writing.\n", output_name);
        exit(1);
    }
    else {
        strcpy(buff2, "typ_off_time");
        strcpy(buff3, "total_data");
        strcpy(buff5, "total_pk_time");
        strcpy(buff6, "total_tkn_time");
        strcpy(buff7, "total_transfer_time");

        /* output formatting */
        sprintf(buff1, "%s %s %s %s %s", buff2, buff3, buff5, buff6, buff7);
        fputs(buff1, fp_out);  /* write line to file */
        fputs("\n", fp_out);   /* add return character to line */

        while (fgets(buff, MAXNUMBERCHARS, fp_in) != 0) {
            typ_off_time = atoi(strtok(buff, seps));
            work_load = atoi(strtok(NULL, seps));
            data_in_bits = 1.0 * work_load;

            if (typ_off_time == 0)
                typ_off_time = 4;  /* the off time cannot be zero */
            bucket = bucket + tkn_rate * typ_off_time;
            total_data = total_data + data_in_bits;

            /* calculating peak time and token time */
            if (data_in_bits > bucket) {
                pk_time = bucket / pk_rate;
                data_in_bits = data_in_bits - bucket;
                bucket = pk_time * tkn_rate;  /* replenishing bucket during transfer */
                tkn_time = data_in_bits / tkn_rate - bucket / tkn_rate;
                bucket = 0.0;
            }
            else {
                pk_time = data_in_bits / pk_rate;
                tkn_time = 0.0;
                bucket = bucket - data_in_bits;
            }

            /* calculating total transfer time */
            total_pk_time = total_pk_time + pk_time;
            total_tkn_time = total_tkn_time + tkn_time;
            total_time = total_pk_time + total_tkn_time + typ_off_time + total_time;

            sprintf(buff1, " %d %d %f %f %f",
                    typ_off_time, work_load, pk_time, tkn_time, total_time);
            fputs(buff1, fp_out);  /* write line to file */
            fputs("\n", fp_out);   /* add return character to line */
        }  /* end while */

        fclose(fp_in);
        fclose(fp_out);
        return 0;
    }  /* end else */
}  /* end main */

A.3 The uncontrolled ARCUP algorithm

/* Author: Pierre Arthur Elysee */
/* Date: 9/24/00 */
/* Description: This module calculates the total time required by each source to
   complete its data transfer. It uses an average-rate estimator scheme along with
   real Web traces. It sends an av/target portion of the data at the target rate and
   a 1 - av/target portion at the peak rate when average > target, and increases the
   rates otherwise. It reports the transfer time for each transaction. */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

/* all macros */
#define seps " "              /* separator symbols for parsing */
#define MAXNUMBERCHARS 200    /* number of characters expected per line */

int main(void)
{
    /* variables */
    FILE *fp_in2, *fp_out;
    char input_name2[20], output_name[20];
    char buff[200];            /* holds one line of data from input file */
    char buff1[200];
    char *w;
    float pk_time = 0.0;
    float total_pk_time = 0.0;
    float pk_rate = 1000000.0;    /* initial peak rate */
    float target_rate = 10000.0;  /* initial target rate */
    float av_rate = 0.0;          /* estimated average rate */
    float total_time = 0.0;       /* time required to complete the transfers */
    float total_data = 0.0;
    float data_in_bits = 0.0;
    float Win_length = 0.6;
    float T_front = 0.0;
    float Files_in_TSW = 0.0;
    float New_files = 0.0;
    float word[30];               /* contains data from the input file */
    int work_load = 0;
    int typ_off_time = 0;         /* in sec */
    int nw = 0;

    printf("Enter input file name of data:\n");
    scanf("%s", input_name2);
    strcpy(output_name, input_name2);
    strcat(output_name, "_out");
    printf("input file name: %s\n", input_name2);
    printf("output file name: %s\n\n", output_name);

    if ((fp_in2 = fopen(input_name2, "r")) == (FILE *) NULL) {
        printf("Couldn't open %s for reading.\n", input_name2);
        exit(1);
    }
    if ((fp_out = fopen(output_name, "w")) == (FILE *) NULL) {
        printf("Couldn't open %s for writing.\n", output_name);
        exit(1);
    }
    else {
        /* reading data from input file */
        while (fgets(buff, MAXNUMBERCHARS, fp_in2) != 0) {
            nw = 0;
            w = strtok(buff, seps);        /* find first word */
            while (w) {
                /* parsing input line */
                word[nw++] = atof(w);
                w = strtok(NULL, seps);    /* find next word */
            }
            typ_off_time = word[0];
            work_load = (int)(word[1]);
            data_in_bits = 8 * work_load;  /* data to transfer in bits */
            pk_time = data_in_bits / pk_rate;

            total_time = T_front + typ_off_time + pk_time;

            /* calculating the average transfer rate (time sliding window) */
            Files_in_TSW = av_rate * Win_length;
            New_files = Files_in_TSW + work_load * 8;
            av_rate = New_files / (total_time - T_front + Win_length);
            T_front = total_time;          /* time of the last packet arrival */
            total_pk_time = total_pk_time + pk_time;

            sprintf(buff1, " %d %d %d %d %d %f %f",
                    typ_off_time, work_load, (int)pk_rate, (int)av_rate,
                    (int)target_rate, total_time / 60.0, total_data);
            fputs(buff1, fp_out);  /* write line to file */
            fputs("\n", fp_out);   /* add return character to line */
        }  /* end while */

        fclose(fp_in2);
        fclose(fp_out);
        return 0;
    }  /* end else */
}  /* end main */

A.4 This module represents the uncontrolled ARCUP algorithm

/* Author: Pierre Arthur Elysee */
/* Date: 9/24/00 */
/* Description: This module calculates the total time required by each */
/* source to complete its data transfer. It uses an average estimator */
/* scheme along with real web traces. Send the av/target portion of the */
/* data at the target rate and the 1 - av/target portion at the peak rate */
/* when average > target; increase otherwise. Report the transfer time */
/* for each transaction. */

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <math.h>

/* all macros */

#define seps " "              /* separator symbols for parsing */
#define MAXNUMBERCHARS 200    /* number of characters expected per line */
#define ini_bucket 0.0        /* initial content of bucket */

typedef struct NODE {
    int w;
    struct NODE *next;
} NODE;


void main()
{
    /* variables */
    FILE *fp_in1, *fp_in2, *fp_out;
    char input_name1[20], input_name2[20], output_name[20];
    char buff[200];  /* holds one line of data from the input file */
    char buff1[200], buff2[20], buff3[10], buff4[20], buff5[10], buff6[20], buff7[20];
    char *w;
    NODE *ptr_head, *ptr_tail, *ptr_walk, *prev_pos, *temp;

    float off_time;
    float pk_time = 0.0;
    float total_pk_time = 0.0;
    float pk_rate = 1000000.0;    /* initial pk_rate */
    float target_rate = 10000.0;  /* initial target_rate */
    float av_rate = 0.0;          /* peak time and token time */
    float prev_on_time = 0.0;
    float period_on, period_off, prev_off_time;
    float total_time = 0.0;  /* time required by each source to complete its transfer */
    float total_data = 0.0;
    float data_in_bits = 0.0;
    float Win_length = 0.6;
    float T_front = 0.0;
    float Files_in_TSW = 0.0;
    float New_files = 0.0;
    float alpha = 0.05;
    float word[30];  /* contains data from the input file */

    int on_time = 0;
    int period = 1;
    int work_load = 0;
    int max_data = 300000;
    int typ_off_time = 0;  /* in sec */
    int old_id = 1;
    int flag = 1;
    int i = 0;
    int nw = 0;

    char *tmp;

    printf("Enter input file name of data:\n");
    scanf("%s", input_name2);
    strcpy(output_name, input_name2);
    strcat(output_name, "_out");
    printf("input file name: %s \n", input_name2);
    printf("output file name: %s \n\n", output_name);

    if ((fp_in2 = fopen(input_name2, "r")) == (FILE *) NULL)
    {
        printf("Couldn't open %s for reading.\n", input_name2);
        exit(1);
    }
    if ((fp_out = fopen(output_name, "w")) == (FILE *) NULL)
    {
        printf("Couldn't open %s for writing.\n", output_name);
        exit(1);
    }
    else
    {
        strcpy(buff2, "typ_off_time");
        strcpy(buff3, "total data");
        strcpy(buff4, "pk_rate");
        strcpy(buff5, "av_rate");
        strcpy(buff6, "target_rate");
        strcpy(buff7, "total transfer time");

        /* output formatting */
        /* sprintf(buff1, "%s %s %s %s %s %s", buff2, buff3, buff4, buff5,
           buff6, buff7); */
        fputs(buff1, fp_out);  /* write header line to file */
        fputs("\n", fp_out);   /* add return character to line */

        /* open file with data transfer info */
        fp_in2 = fopen(input_name2, "r");

        /* reading data from input file */
        while (fgets(buff, MAXNUMBERCHARS, fp_in2) != 0)
        {
            nw = 0;
            w = strtok(buff, seps);  /* find first word */
            while (w)
            {
                /* parsing input line */
                word[nw++] = atof(w);
                w = strtok(NULL, seps);  /* find next word */
            }
            typ_off_time = (int)word[0];
            work_load = (int)(word[1]);
            data_in_bits = 8*work_load;  /* data to transfer in bits */

            total_data = total_data + data_in_bits;
            if (typ_off_time == 0) typ_off_time = 3;

            if (data_in_bits > max_data)
            {
                pk_time = 0.2*data_in_bits/target_rate;
                pk_time = pk_time + 0.8*data_in_bits/pk_rate;
                data_in_bits = 0;
            }
            if (av_rate > target_rate && data_in_bits != 0)
            {
                pk_rate = 0.8*pk_rate;
                pk_time = data_in_bits/pk_rate;
            }
            else if (data_in_bits != 0)
            {
                /* readjusting peak rate */
                pk_rate = 1.2*pk_rate;
                if (pk_rate > 1000000.0) pk_rate = 1000000.0;
                pk_time = data_in_bits/pk_rate;
            }
            total_time = T_front + typ_off_time + pk_time;

            /* calculating the average transfer rate */
            Files_in_TSW = av_rate*Win_length;
            New_files = Files_in_TSW + work_load*8;
            av_rate = New_files/(total_time - T_front + Win_length);
            T_front = total_time;  // time of the last packet arrival
            total_pk_time = total_pk_time + pk_time;

            sprintf(buff1, " %d %d %d %d %d %f %f",
                    typ_off_time, work_load, (int)pk_rate, (int)av_rate,
                    (int)target_rate, total_time/60.0, total_data);
            fputs(buff1, fp_out);  /* write line to file */
            fputs("\n", fp_out);   /* add return character to line */
        }  // end while (fgets)

        fclose(fp_in2);
        fclose(fp_out);
    }// end else
}// end main


A.5 Sample of data used in simulations

taz 797447407 13352 "http://cs-www.bu.edu/" 2299 0.969024
taz 797447408 940773 "http://cs-www.bu.edu/lib/pics/bu-logo.gif" 1803 0.629453
taz 797447409 941884 "http://cs-www.bu.edu/lib/pics/bu-label.gif" 715 0.326586
taz 797447476 527498 "http://cs-www.bu.edu/students/grads/Home.html" 4734 0.494357
taz 797447611 924579 "http://cs-www.bu.edu/lib/icons/rball.gif" 0 0.0
taz 797447639 206997 "http://www.cts.com/cts/market/" 9103 1.751035
taz 797447641 152490 "http://www.cts.com/cts/market/marketplace.gif" 20886 1.143875
taz 797447643 428176 "http://www.cts.com/cts/market/dirsite-icon.gif" 2511 0.590501
taz 797447644 349043 "http://www.cts.com/art/cts.gif" 1826 0.613857
taz 797447650 507322 "http://www.cts.com/~flowers" 318 0.599902
taz 797447651 164916 "http://www.cts.com:80/~flowers/" 3044 1.556657
taz 797447652 850779 "http://www.cts.com/~flowers/thumb.gif" 9256 2.654269
taz 797447679 477564 "http://www.cts.com/~flowers/order.html" 2865 0.743227
taz 797447680 342368 "http://www.cts.com/~flowers/thumb.gif" 0 0.0
taz 797447723 595449 "http://www.cts.com:80/~flowers/" 0 0.0
taz 797447727 341012 "http://www.cts.com/cts/market/dirsite-icon.gif" 0 0.0
taz 797447727 348313 "http://www.cts.com/art/cts.gif" 0 0.0
taz 797447735 567056 "http://www.cts.com/~vacation" 320 1.312128
taz 797447736 930528 "http://www.cts.com:80/~vacation/" 1168 1.020013
taz 797447738 92121 "http://www.cts.com/~vacation/logo2.gif" 5485 2.669898
taz 797447741 441969 "http://www.cts.com/~vacation/boxindx1.gif" 3828 0.682333
taz 797447742 561155 "http://www.cts.com/~vacation/boxindx2.gif" 3786 0.701705
taz 797447743 760975 "http://www.cts.com/~vacation/boxindx4.gif" 3936 0.757089
taz 797448806 36925 "http://worldweb.net/~stoneji/tattoo/mytats.html" 2898 0.181871
taz 797448806 314167 "http://worldweb.net/~stonej/tattoo/my-tats.gif" 7806 0.254429
taz 797448807 108445 "http://worldweb.net/~stoneji/tattoo/dragon.gif" 10878 0.313795
taz 797448808 675031 "http://worldweb.net/~stoneji/tattoo/cavedraw.gif" 3710 0.231060
taz 797448809 849560 "http://worldweb.net/~stoneji/tattoo/sm-arm.gif" 3198 0.297034
taz 797448823 624434 "http://worldweb.net/~stoneji/tattoo.html" 0 0.0
taz 797454701 532335 "http://www.hollywood.com/rocknroll/" 1837 0.880347
taz 797454705 926469 "http://www.hollywood.com/rocknroll/buzz.gif" 3841 0.400525
taz 797454706 750284 "http://www.hollywood.com/rocknroll/quote.gif" 3888 0.410293
taz 797454707 571245 "http://www.hollywood.com/rocknroll/sound.gif" 4181 0.425392
taz 797454708 420418 "http://www.hollywood.com/rocknroll/video.gif" 4000 0.437737
taz 797454709 257421 "http://www.hollywood.com/rocknroll/sight.gif" 3512 0.419571