Interesting Links


On the Self-Similar Nature of Ethernet Traffic. Will E. Leland, Walter Willinger and Daniel V. Wilson (BELLCORE); Murad S. Taqqu (BU). Analysis and Prediction of the Dynamic Behavior of Applications, Hosts, and Networks. Overview. What is Self Similarity?


Page 1: Interesting Links

1

Interesting Links

Page 2: Interesting Links

On the Self-Similar Nature of Ethernet Traffic. Will E. Leland, Walter Willinger and Daniel V. Wilson (BELLCORE); Murad S. Taqqu (BU)

Analysis and Prediction of the Dynamic Behavior of Applications, Hosts, and Networks

Page 3: Interesting Links

3

Overview

What is Self Similarity?

Ethernet Traffic is Self-Similar

Source of Self Similarity

Implications of Self Similarity

Page 4: Interesting Links

Section 1:

What is Self-Similarity?

Page 5: Interesting Links

5

Intuition of Self-Similarity

Something “feels the same” regardless of scale (also called fractals)

Page 6: Interesting Links

6

Page 7: Interesting Links

7

Page 8: Interesting Links

8

Page 9: Interesting Links

9

What is Self-Similarity?

In the case of stochastic objects such as time series, self-similarity is meant in the distributional sense: the object looks statistically the same at different scales.

Page 10: Interesting Links

10

Pictorial View of Self-Similarity

Page 11: Interesting Links

11

The Famous Data

Leland and Wilson collected hundreds of millions of Ethernet packets without loss and with recorded time-stamps accurate to within 100µs.

The data were collected from several Ethernet LANs at the Bellcore Morristown Research and Engineering Center at different times over the course of approximately 4 years.

Page 12: Interesting Links

12

Page 13: Interesting Links

13

Why is Self-Similarity Important?

Recently, network packet traffic has been identified as being self-similar.

Current network traffic modeling using Poisson distributions (etc.) does not take into account the self-similar nature of traffic.

This leads to inaccurate modeling which, when applied to a huge network like the Internet, can lead to huge financial losses.

Page 14: Interesting Links

14

Problems with Current Models

A Poisson process: when observed on a fine time scale it appears bursty; when aggregated on a coarse time scale it flattens (smooths) to white noise.

A self-similar (fractal) process: when aggregated over a wide range of time scales it maintains its bursty characteristic.

Page 15: Interesting Links

15

Consequences of Self-Similarity

Traffic has similar statistical properties at a range of timescales: ms, secs, mins, hrs, days.

Merging of traffic (as in a statistical multiplexer) does not result in smoothing of traffic.

[Diagram: bursty data streams -> aggregation -> bursty aggregate streams]

Page 16: Interesting Links

16

Pictorial View of Current Modeling

Page 17: Interesting Links

17

Side-by-side View

Page 18: Interesting Links

18

Definitions and Properties

Long-range dependence: the autocorrelation decays slowly.

Hurst parameter: developed by Harold Hurst (1965). H is a measure of “burstiness”, also considered a measure of self-similarity. 0 < H < 1. H increases as traffic increases.

Page 19: Interesting Links

19

Definitions and Properties Cont.’d

Low, medium, and high traffic hours: as traffic increases, the Hurst parameter increases, i.e., traffic becomes more self-similar.

Page 20: Interesting Links

21

Properties of Self Similarity

$X = (X_t : t = 0, 1, 2, \ldots)$ is a covariance-stationary random process (i.e., $\mathrm{Cov}(X_t, X_{t+k})$ does not depend on $t$ for all $k$), with mean $\mu$ and variance $\sigma^2$.

Let $X^{(m)} = \{X_k^{(m)}\}$ denote the new process obtained by averaging the original series $X$ in non-overlapping sub-blocks of size $m$.

Suppose that the autocorrelation function satisfies $r(k) \sim k^{-\beta}$ as $k \to \infty$, $0 < \beta < 1$.

E.g., $X^{(1)} = 4, 12, 34, 2, -6, 18, 21, 35$

Then $X^{(2)} = 8, 18, 6, 28$

$X^{(4)} = 13, 17$
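The block averaging above can be sketched in a few lines of Python. This is only an illustration; the function name `aggregate` is an assumption of this sketch, and a trailing partial block (if any) is simply dropped.

```python
def aggregate(series, m):
    """Return X^(m): means of consecutive non-overlapping blocks of size m."""
    n_blocks = len(series) // m  # a trailing partial block is dropped
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


x = [4, 12, 34, 2, -6, 18, 21, 35]   # X^(1) from the slide
print(aggregate(x, 2))  # [8.0, 18.0, 6.0, 28.0]
print(aggregate(x, 4))  # [13.0, 17.0]
```

The two printed lists reproduce $X^{(2)}$ and $X^{(4)}$ from the example.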

Page 21: Interesting Links

22

Auto-correlation Definition

$X$ is exactly second-order self-similar if the aggregated processes have the same autocorrelation structure as $X$, i.e., $r^{(m)}(k) = r(k)$, $k \ge 0$, for all $m = 1, 2, \ldots$

$X$ is asymptotically second-order self-similar if the above holds in the limit: $r^{(m)}(k) \to r(k)$ as $m \to \infty$.

Most striking feature of self-similarity: the correlation structures of the aggregated processes do not degenerate as $m \to \infty$.

Page 22: Interesting Links

23

Traditional Models

This is in contrast to traditional models: the correlation structures of their aggregated processes degenerate as $m \to \infty$, i.e., $r^{(m)}(k) \to 0$ as $m \to \infty$ for $k = 1, 2, 3, \ldots$

Example: Poisson distribution vs. self-similar distribution

Page 23: Interesting Links

24

Page 24: Interesting Links

25

Long Range Dependence

Processes with long-range dependence are characterized by an autocorrelation function that decays hyperbolically as $k$ increases.

Important property: $\sum_k r(k) = \infty$. This is also called non-summability of correlation.

Page 25: Interesting Links

26

Intuition

Short-range processes: exponential decay of autocorrelations, i.e., $r(k) \sim p^k$ as $k \to \infty$, $0 < p < 1$. The summation $\sum_k r(k)$ is finite.

The intuition behind long-range dependence: while high-lag correlations are all individually small, their cumulative effect is important. This gives rise to features drastically different from conventional short-range dependent processes.
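To make the contrast concrete, here is a small sketch that compares partial sums of an exponentially decaying correlation with those of a hyperbolically decaying one. The values p = 0.5 and β = 0.5 are illustrative assumptions, not taken from the slides.

```python
def partial_sum(r, n_terms):
    """Sum r(k) for k = 1..n_terms."""
    return sum(r(k) for k in range(1, n_terms + 1))


p, beta = 0.5, 0.5  # illustrative values only (assumptions of this sketch)


def short_range(k):
    return p ** k       # exponential decay: summable


def long_range(k):
    return k ** -beta   # hyperbolic decay: not summable


for n in (10, 1000, 100000):
    print(n, partial_sum(short_range, n), partial_sum(long_range, n))
```

The first column of sums stays below p/(1-p), while the second keeps growing as more lags are included, even though each individual term is small.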

Page 26: Interesting Links

27

The Measure of Self-Similarity

Hurst parameter H, 0.5 < H < 1.

Three approaches to estimate H (based on properties of self-similar processes): variance analysis of aggregated processes; analysis of the rescaled range (R/S) statistic for different block sizes; a Whittle estimator.

Page 27: Interesting Links

28

Variance Analysis

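The variance of the aggregated processes decays as $\mathrm{Var}(X^{(m)}) \sim a\,m^{-\beta}$ as $m \to \infty$, with $0 < \beta < 1$.

For short-range dependent processes (e.g., a Poisson process), $\mathrm{Var}(X^{(m)}) \sim a\,m^{-1}$ as $m \to \infty$.

Plot $\mathrm{Var}(X^{(m)})$ against $m$ on a log-log plot. A slope greater than $-1$ is indicative of self-similarity.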
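A minimal sketch of the variance-time analysis, assuming a plain least-squares fit on the log-log points. The helper names and the synthetic iid Gaussian input are assumptions of this sketch, not the authors' procedure. The conversion H = 1 - β/2 (with slope = -β) is the standard relation for self-similar processes; it is not stated on this slide.

```python
import math
import random
import statistics


def aggregate(series, m):
    """Means of consecutive non-overlapping blocks of size m."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance_time_slope(series, block_sizes):
    """Least-squares slope of log Var(X^(m)) against log m."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(statistics.pvariance(aggregate(series, m)))
          for m in block_sizes]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Synthetic short-range (iid) input: the slope should be close to -1.
rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
slope = variance_time_slope(iid, [1, 2, 4, 8, 16, 32, 64, 128])
hurst = 1 + slope / 2  # H = 1 - beta/2, with slope = -beta
print(slope, hurst)
```

For measured self-similar traffic the fitted slope would be expected to lie between -1 and 0, giving H between 0.5 and 1.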

Page 28: Interesting Links

29

Page 29: Interesting Links

30

The R/S statistic

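For a given set of observations $(X_k : k = 1, 2, \ldots, n)$ with sample mean $\bar{X}(n)$ and sample variance $S^2(n)$, the rescaled adjusted range or R/S statistic is given by

$\frac{R(n)}{S(n)} = \frac{1}{S(n)} \left[ \max(0, W_1, W_2, \ldots, W_n) - \min(0, W_1, W_2, \ldots, W_n) \right]$

where

$W_k = (X_1 + X_2 + \cdots + X_k) - k\,\bar{X}(n), \quad k = 1, 2, \ldots, n$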

Page 30: Interesting Links

31

Example

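$X_k = 14, 1, 3, 5, 10, 3$

Mean $= 36/6 = 6$

$W_1 = 14 - (1 \cdot 6) = 8$

$W_2 = 15 - (2 \cdot 6) = 3$

$W_3 = 18 - (3 \cdot 6) = 0$

$W_4 = 23 - (4 \cdot 6) = -1$

$W_5 = 33 - (5 \cdot 6) = 3$

$W_6 = 36 - (6 \cdot 6) = 0$

$R/S = \frac{1}{S}\,[8 - (-1)] = 9/S$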
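The worked example can be checked with a short sketch. The function name `rescaled_range` is an assumption of this sketch, and so is the choice of the population (1/n) form for S, since the slide does not say which normalization is used.

```python
import statistics


def rescaled_range(xs):
    """Return (W, R, S) for the R/S statistic of the observations xs."""
    n = len(xs)
    mean = sum(xs) / n
    w, running = [], 0.0
    for k, x in enumerate(xs, start=1):
        running += x
        w.append(running - k * mean)  # W_k = (X_1 + ... + X_k) - k * mean
    r = max(0.0, *w) - min(0.0, *w)   # adjusted range R(n)
    s = statistics.pstdev(xs)         # assumption: 1/n form of S(n)
    return w, r, s


w, r, s = rescaled_range([14, 1, 3, 5, 10, 3])
print(w)      # [8.0, 3.0, 0.0, -1.0, 3.0, 0.0]
print(r)      # 9.0
print(r / s)  # R/S for this sample
```

The values of W and R match the example; the final ratio depends on which definition of S is used.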

Page 31: Interesting Links

32

The Hurst Effect

For self-similar data, the rescaled range or R/S statistic grows according to $c\,n^{H}$, where $H$ is the Hurst parameter, $H > 0.5$.

For short-range processes, the R/S statistic grows as $d\,n^{0.5}$.
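One way to turn this growth law into an estimate of H is to fit the slope of log(R/S) against log n over several block sizes. The following is a rough sketch under assumptions: the helper names, the block sizes, the 1/n form of S, and the synthetic iid input are all choices of this sketch, and R/S estimates are known to be biased for small blocks.

```python
import math
import random
import statistics


def rs(block):
    """R/S statistic of one block of observations."""
    mean = statistics.fmean(block)
    w, running = [0.0], 0.0           # the leading 0 covers max(0, ...) / min(0, ...)
    for k, x in enumerate(block, start=1):
        running += x
        w.append(running - k * mean)
    return (max(w) - min(w)) / statistics.pstdev(block)


def hurst_rs(series, block_sizes):
    """Slope of log(mean R/S) against log n over non-overlapping blocks."""
    xs, ys = [], []
    for n in block_sizes:
        blocks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        xs.append(math.log(n))
        ys.append(math.log(statistics.fmean([rs(b) for b in blocks])))
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return num / sum((x - x_bar) ** 2 for x in xs)


# Synthetic short-range (iid) input: the estimate should be near 0.5.
rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h = hurst_rs(iid, [16, 32, 64, 128, 256])
print(h)
```

An estimate well above 0.5 on real data would be consistent with the Hurst effect described above.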

History: The Nile river. In the 1940s-50s, Harold Edwin Hurst studied the 800-year record of flooding along the Nile river (yearly minimum water level) and found long-range dependence.

Page 32: Interesting Links

33

Page 33: Interesting Links

34

Whittle Estimator

Provides a confidence interval.

Property: any long-range dependent process approaches fractional Gaussian noise (FGN) when aggregated to a certain level.

Test the aggregated observations to ensure that they have converged to the normal distribution.

Page 34: Interesting Links

35

Recap

Self-similarity manifests itself in several equivalent fashions: non-degenerate autocorrelations, slowly decaying variance, long-range dependence, and the Hurst effect.

Page 35: Interesting Links

Section 2:

Ethernet Traffic is Self-Similar

Page 36: Interesting Links

37

Plots Showing Self-Similarity (Ⅰ)

[Plot annotations: reference lines for H = 0.5 and H = 1; estimated H ≈ 0.8]

Page 37: Interesting Links

38

Plots Showing Self-Similarity (Ⅱ)

Higher traffic, higher H

[Plot panels: high, mid, and low traffic; utilization ranges shown on the plots: 1.3%-10.4%, 3.4%-18.4%, 5.0%-30.7%]

Page 38: Interesting Links

39

H: A Function of Network Utilization

Observation shows behavior “contrary to Poisson”: as network utilization increases, H increases.

As we shall see shortly, H measures traffic burstiness.

As the number of Ethernet users increases, the resulting aggregate traffic becomes burstier instead of smoother.

Page 39: Interesting Links

40

Difference in low-traffic H values

Pre-1990: host-to-host workgroup traffic.

Post-1990: router-to-router traffic.

Low-period router-to-router traffic consists mostly of machine-generated packets, which tend to form a smoother arrival stream than low-period host-to-host traffic.

Page 40: Interesting Links

41

H : Measuring “Burstiness”

Intuitive explanation using the M/G/∞ model: as $\alpha \to 1$, the service time is more variable and it is easier to generate bursts, so H increases.

Page 41: Interesting Links

42

Summary

Ethernet LAN traffic is statistically self-similar

H: the degree of self-similarity

H: a function of utilization

H: a measure of “burstiness”

Models like Poisson are not able to capture self-similarity

Page 42: Interesting Links

43

Discussions

How to explain self-similarity? Heavy-tailed file sizes.

How would this impact existing performance? Limited effectiveness of buffering; effectiveness of FEC.

How to adapt to self-similarity? Prediction; adaptive FEC.

Page 43: Interesting Links

Section 3:

Explaining Self-Similarity

Page 44: Interesting Links

45

Introduction

Page 45: Interesting Links

46

Introduction

The superposition of many ON/OFF sources whose ON-periods and OFF-periods exhibit the Noah Effect produces aggregate network traffic that features the Joseph Effect.

Noah Effect: high variability or infinite variance.

Joseph Effect: self-similar or long-range dependent.

Also known as packet train models.
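The superposition idea can be sketched with a toy simulation. Everything here is an assumption of this sketch rather than the authors' generator: unit time slots, Pareto-distributed period lengths rounded to whole slots, the default tail parameters, and the function names.

```python
import random


def pareto(rng, alpha, u_min=1.0):
    """Inverse-transform sample with P(U > u) = (u_min / u) ** alpha."""
    return u_min * (1.0 - rng.random()) ** (-1.0 / alpha)


def on_off_source(rng, n_slots, alpha_on, alpha_off):
    """0/1 activity of one source over n_slots unit time slots."""
    activity, on = [], rng.random() < 0.5
    while len(activity) < n_slots:
        alpha = alpha_on if on else alpha_off
        # Cap the period at n_slots so a very long period stays cheap.
        length = min(n_slots, max(1, round(pareto(rng, alpha))))
        activity.extend([1 if on else 0] * length)
        on = not on
    return activity[:n_slots]


def aggregate_traffic(n_sources, n_slots, alpha_on=1.2, alpha_off=1.5, seed=0):
    """Number of active sources in each time slot."""
    rng = random.Random(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        source = on_off_source(rng, n_slots, alpha_on, alpha_off)
        for t, a in enumerate(source):
            total[t] += a
    return total


traffic = aggregate_traffic(n_sources=20, n_slots=500)
print(traffic[:20])
```

The resulting count series could then be passed to a variance-time or R/S analysis to look for the Joseph Effect.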

Page 46: Interesting Links

47

The Noah Effect

Noah Effect is the essential point of departure from traditional to self-similar traffic modeling

Results in highly variable ON-OFF periods: train lengths and inter-train distances can be very large with non-negligible probabilities.

Infinite Variance Syndrome: many naturally occurring phenomena can be well described with infinite-variance distributions.

Heavy-tailed distributions, parameter $\alpha$

Page 47: Interesting Links

48

Existing Models

Traditional traffic models: finite variance ON/OFF source models

Superposition of such sources behaves like white noise, with only short-range correlations

Page 48: Interesting Links

49

Idealized ON/OFF Model

Lengths of ON- and OFF-periods are iid positive random variables, $U_k$.

Suppose that $U$ has a hyperbolic tail distribution,

$P(U > u) \sim c\,u^{-\alpha}, \quad \text{as } u \to \infty, \quad 1 < \alpha < 2 \qquad (1)$

Property (1) is the infinite variance syndrome or the Noah Effect: $\alpha < 2$ implies $E(U^2) = \infty$; $\alpha > 1$ ensures that $E(U) < \infty$, and that $S_0$ is not infinite.
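A hyperbolic tail of the form (1) can be sampled by inverse transform. The sketch below uses an exact Pareto law, which is one simple distribution with this tail. The function name, the seed, and the choices α = 1.5 and c = 1 are assumptions for illustration.

```python
import random


def pareto_sample(rng, alpha, c=1.0):
    """Draw U with P(U > u) = c * u ** -alpha for u >= c ** (1 / alpha)."""
    v = 1.0 - rng.random()            # uniform on (0, 1], avoids division by zero
    return (c / v) ** (1.0 / alpha)


rng = random.Random(1)
alpha = 1.5                            # illustrative value with 1 < alpha < 2
sample = [pareto_sample(rng, alpha) for _ in range(100_000)]

# Empirical versus theoretical tail probability P(U > 10).
tail = sum(u > 10.0 for u in sample) / len(sample)
print(tail, 10.0 ** -alpha)
```

With 1 < α < 2 the sample mean settles down as the sample grows, while the sample variance keeps being dominated by a few very large values.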

Page 49: Interesting Links

50

http://statistik.wu-wien.ac.at/cgi-bin/anuran.pl

Page 50: Interesting Links

51

Page 51: Interesting Links

52

Explaining Self-Similarity

Consider a set of processes which are either ON or OFF.

The distributions of ON and OFF times are heavy tailed, with parameters $\alpha_1$ and $\alpha_2$.

The aggregation of these processes leads to a self-similar process with $H = (3 - \min(\alpha_1, \alpha_2))/2$.
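The relation between the tail parameters and H is simple enough to evaluate directly. In this sketch the function name and the range check (1 < α < 2, the range used for the idealized ON/OFF model) are assumptions, and the input values are purely illustrative.

```python
def hurst_from_tails(alpha_on, alpha_off):
    """H = (3 - min(alpha_on, alpha_off)) / 2, assuming 1 < alpha < 2."""
    for a in (alpha_on, alpha_off):
        if not 1.0 < a < 2.0:
            raise ValueError("formula is assumed to hold for 1 < alpha < 2")
    return (3.0 - min(alpha_on, alpha_off)) / 2.0


# Illustrative tail parameters: the heavier tail (smaller alpha) determines H.
print(hurst_from_tails(1.2, 1.5))
```

Because only the minimum enters the formula, the heavier of the two tails alone sets the degree of self-similarity.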

So, how do we get heavy tailed ON or OFF times?

Page 52: Interesting Links

53

Heavy Tailed ON Times and File Sizes

Analysis of client logs showed that ON times were, in fact, heavy tailed: $\alpha \approx 1.2$, over about 3 orders of magnitude.

This led to the analysis of underlying file sizes: $\alpha \approx 1.1$, over about 4 orders of magnitude; similar to FTP traffic.

Files available from UNIX file systems are typically heavy tailed.

Page 53: Interesting Links

54

Heavy Tailed OFF times

Analysis of OFF times showed that they are also heavy tailed: $\alpha \approx 1.5$

Page 54: Interesting Links

55

Ethernet LAN Traffic Measurements at the Source Level

Location Bellcore Morristown Research and Engineering Center

The first set: the busy hour of the August 1989 Ethernet LAN measurements. About 105 sources, 748 active source-destination pairs; 95% of the traffic was internal.

The second set: a 9-day-long measurement period in December 1994. About 3,500 sources, 10,000 active pairs; the measurements are made up entirely of remote traffic.

Page 55: Interesting Links

56

Textured Plots of Packet Arrival Times

Page 56: Interesting Links

57

Textured Plots of Packet Arrival Times

Page 57: Interesting Links

58

Checking for the Noah Effect

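Complementary distribution plots:

$\log P(U > u) \sim \log(c) - \alpha \log(u), \quad \text{as } u \to \infty$

Hill's estimate: let $U_1, U_2, \ldots, U_n$ denote the observed ON- (or OFF-) periods and write $U_{(1)} \le U_{(2)} \le \cdots \le U_{(n)}$ for the corresponding order statistics. Then

$\hat{\alpha}_n = \left( \frac{1}{k} \sum_{i=0}^{k-1} \left( \log U_{(n-i)} - \log U_{(n-k)} \right) \right)^{-1} \qquad (3)$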
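Hill's estimate can be sketched as follows: take the k largest order statistics, average their log-excess over the (n-k)-th order statistic, and invert. The function name, the synthetic Pareto test data, and the choice k = 2000 are assumptions of this sketch; in practice the estimate is usually examined over a range of k.

```python
import math
import random


def hill_estimate(observations, k):
    """Hill estimate of the tail index alpha from the k largest values."""
    u = sorted(observations)            # u[0] <= ... <= u[n-1]
    n = len(u)
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    threshold = math.log(u[n - k - 1])  # log U_(n-k) in 1-based notation
    mean_excess = sum(math.log(u[n - 1 - i]) - threshold
                      for i in range(k)) / k
    return 1.0 / mean_excess


# Synthetic check: exact Pareto data with alpha = 1.5 (illustrative value).
rng = random.Random(2)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(20_000)]
alpha_hat = hill_estimate(data, k=2000)
print(alpha_hat)
```

An estimate between 1 and 2 that stays stable as k varies would be consistent with the Noah Effect.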

Page 58: Interesting Links

59

Page 59: Interesting Links

60

Page 60: Interesting Links

61

Traffic Modeling and Generation

Although network traffic is intrinsically complex, parsimonious modeling is still possible. Estimating a single parameter (the intensity of the Noah Effect) is enough.

Page 61: Interesting Links

62

Performance and Protocol Analysis

The queue length distribution: for traditional (Markovian) traffic it decreases exponentially fast; for self-similar traffic it decreases much more slowly.

Protocol design should take into account knowledge about network traffic, such as the presence or absence of the Noah Effect.

Page 62: Interesting Links

63

Conclusion

The presence of the Noah Effect in measured Ethernet LAN traffic is confirmed.

The superposition of many ON/OFF models with the Noah Effect results in aggregate packet streams that are consistent with measured network traffic and exhibit self-similar or fractal properties.

Page 63: Interesting Links

64

Major Results of CB97

Established that WWW traffic was self-similar.

Modeled a number of different WWW characteristics (focus on the tail).

Provided an explanation for the self-similarity of WWW traffic based on the underlying file size distribution.

Page 64: Interesting Links

65

An example File size Distribution on a Win2000 machine

Page 65: Interesting Links

Section 4:

Impact of Self Similarity

Page 66: Interesting Links

67

Comparison

Page 67: Interesting Links

68

Impact on Network Engineering

Queuing delays are much higher in the presence of long-range dependence than for Poisson traffic.

To avoid dropping packets, buffers have to be huge.

You have to be very careful when predicting future traffic based on past measurements.

You cannot look at a little bit of video and decide how much buffer it’s going to require.

Page 68: Interesting Links

Thanks !