On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena
description
Transcript of On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena
![Page 1: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/1.jpg)
On Internet Traffic Dynamics and Internet Topology I
High Variability Phenomena
Walter WillingerAT&T Labs-Research
[email protected][This is joint work with J. Doyle and D. Alderson (Caltech)]
![Page 2: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/2.jpg)
Topics Covered
Motivation A working definition Some illustrative examples Some simple constructions Some key mathematical results Resilience to ambiguity Heavy tails and statistics A word of caution
![Page 3: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/3.jpg)
Motivation
Internet is full of “high variability”– Link bandwidth: Kbps – Gbps– File sizes: a few bytes – Mega/Gigabytes– Flows: a few packets – 100,000+ packets– In/out-degree (Web graph): 1 – 100,000+– Delay: Milliseconds – seconds and beyond
How to deal with “high variability”– High variability = large, but finite variance– High variability = infinite variance
![Page 4: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/4.jpg)
A Working Definition A distribution function F(x) or random variable X is
called heavy-tailed if for some where c>0 and finite F is also called a power law or scaling distribution The parameter is called the tail index 1< < 2, F has infinite variance, but finite mean 0 < < 1, the variance and mean of F are infinite “Mild” vs “wild” (Mandelbrot): 2 vs < 2
xcxxFxXP ,)(1][
![Page 5: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/5.jpg)
Some Illustrative Examples
Some commonly-used plotting techniques– Probability density functions (pdf)– Cumulative distribution functions (CDF)– Complementary CDF (CCDF)
Different plots emphasize different features– Main body of the distribution vs. tail– Variability vs. concentration– Uni- vs. multi-modal
![Page 6: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/6.jpg)
Probability density functions
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 50
0.5
1
1.5
x
f(x)
Lognormal(0,1)Gamma(.53,3)Exponential(1.6)Weibull(.7,.9)Pareto(1,1.5)
![Page 7: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/7.jpg)
Cumulative Distribution Function
0 2 4 6 8 10 12 14 16 18 200
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x
F(x)
Lognormal(0,1)Gamma(.53,3)Exponential(1.6)Weibull(.7,.9)Pareto(1,1.5)
![Page 8: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/8.jpg)
Complementary CDFs
10-1 100 101 102
10-4
10-3
10-2
10-1
100
log(x)
log(
1-F(
x))
Lognormal(0,1)Gamma(.53,3)Exponential(1.6)Weibull(.7,.9)ParetoII(1,1.5)ParetoI(0.1,1.5)
![Page 9: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/9.jpg)
20th Century’s 100 largest disasters worldwide
10-2
10-1
100
100
101
102
US Power outages (10M of customers)
Natural ($100B)
Technological ($10B)
Log(size)
Log(rank)
Most events are
small
But the large events are huge
![Page 10: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/10.jpg)
Why “Heavy Tails” Matter …
Risk modeling (insurance) Load balancing (CPU, network) Job scheduling (Web server design) Combinatorial search (Restart methods) Complex systems studies (SOC vs. HOT) Towards a theory for the Internet …
![Page 11: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/11.jpg)
Some First Properties
Heavy-tailed or “scaling” distribution– – Compare to exponential distribution:
Linearly increasing mean residual lifetime– – Compare to exponential distribution
xcwXPxXPwXxXP 1/|
cxxXxXE |
))(exp(| wxwXxXP
| constxXxXE
![Page 12: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/12.jpg)
Some Simple Constructions For U uniform in [0,1], set X=1/U
– X is heavy-tailed with For E (standard) exponential, set X=exp(E)
– X is heavy-tailed with The mixture of exponential distributions with
parameter 1/having a (centered) Gamma(a,b) distribution is a Pareto distribution with a
The distribution of the time between consecutive visits of a symmetric random walk to zero is heavy-tailed with
![Page 13: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/13.jpg)
Key Mathematical Properties of Scaling Distributions Invariant under aggregation
– Non-classical CLT and stable laws (Essentially) invariant under maximization
– Domain of attraction of Frechet distribution (Essentially) invariant under mixture
– Example: The largest disasters worldwide
![Page 14: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/14.jpg)
Linear Aggregation: Classical Central Limit Theorem A well-known result
– X(1), X(2), … independent and identically distributed random variables with distribution function F (mean and variance 1)
– S(n) = X(1)+X(2)+…+X(n) n-th partial sum–
More general formulations are possible Often-used argument for the ubiquity of the normal
distribution
nNnnnS as ),1,0(/))((
![Page 15: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/15.jpg)
Linear Aggregation: Non-classical Central Limit Theorem A not so well-known result
– X(1), X(2), … independent and identically distributed with common distribution function F that is heavy-tailed with 1<<2
– S(n) = X(1)+X(2)+…+X(n) n-th partial sum–
The limit distribution is heavy-tailed with index More general formulations are possible Rarely taught in most Stats/Probability courses!
nnnnS as law, stable /))((
![Page 16: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/16.jpg)
Maximization:Maximum Domain of Attraction A not so well-known result (extreme-value theory)
– X(1), X(2), … independent and identically distributed with common distribution function F that is heavy-tailed with 1<<2
– M(n)=max(X(1), …, X(n)), n-th successive maxima–
G is the Fréchet distribution G is heavy-tailed with index
nGnnM as ,/)(
)exp( x
![Page 17: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/17.jpg)
Intuition for “Mild” vs. “Wild”
The case of “mild” distributions– “Evenness” – large values of S(n) occur as a
consequence of many of the X(i)’s being large– The contribution of each X(i), even of the
largest, is negligible compared to the sum The case of “wild” distributions
– “Concentration” – large values of S(n) or M(n) occur as a consequence of a single large X(i)
– The largest X(i) is dominant compared to S(n)
![Page 18: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/18.jpg)
Weighted Mixture
A little known result– X(1), X(2), … independent and identically distributed
with common distribution function F that is heavy-tailed with 1<<2
– p(1), p(2), …, p(n) iid in [0,1] with p(1)+…p(n)=1– W(n)=p(1)X(1)+…+p(n)X(n)–
Invariant “distributions” are Condition on X(i) >a>0
xlargefor ,)( xcxnWP W cuuFW 1)(
![Page 19: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/19.jpg)
100
101
102
20th Century’s 100 largest disasters worldwide
US Power outages (10M of customers,1985-1997)
Natural ($100B)
Technological ($10B)
Slope = -1(=1)
10-2
10-1
100
![Page 20: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/20.jpg)
8/14/03100
101
102 US Power outages (10M of customers,
1985-1997)
10-2
10-1
100
Slope = -1(=1)
A large event is not inconsistent with statistics.
![Page 21: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/21.jpg)
Resilience to Ambiguity
Scaling distributions are robust under– … aggregation, maximization, and mixture– … differences in observing/reporting/accounting– … varying environments, time periods
The “value” of robustness– Discoveries are easier/faster– Properties can be established more accurately– Findings are not sensitive to the details of the data
gathering process
![Page 22: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/22.jpg)
On the Ubiquity of Heavy Tails Heavy-tailed distributions are attractors for
averaging (e.g., non-classical CLT), but are the only distributions that are also (essentially) invariant under maximizing and mixing.
Gaussians (“normal”) distributions are also attractors for averaging (e.g., classical CLT), but are not invariant under maximizing and mixing
This makes heavy tails more ubiquitous than Gaussians, so no “special” explanations should be required …
![Page 23: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/23.jpg)
Heavy Tails and Statistics
The traditional “curve-fitting” approach “Curve-fitting” by example Beyond “curve-fitting” – “Borrowing strength” “Borrowing strength” by example What “science” in “scientific modeling”? Additional considerations
![Page 24: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/24.jpg)
“Curve-fitting” approach Model selection
– Choose parametric family of distributions Parameter estimation
– Take a strictly static view of data– Assume moment estimates exist/converge
Model validation– Select “best-fitting” model– Rely on some “goodness-of-fit” criteria/metrics
“Black box-type” modeling, “data-fitting” exercise
![Page 25: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/25.jpg)
“Curve-fitting” by example
Randomly picked data set– LBL’s WAN traffic (in- and outbound)– 1:30, June 24 – 1:30, June 25 (PDT), 1996– 243,092 HTTP connection sizes (bytes)– Courtesy of Vern Paxson (thanks!)
Illustration of widely-accepted approach
![Page 26: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/26.jpg)
CCDF plot on log-log scale
0 1 2 3 4 5 6 7 8-6
-5
-4
-3
-2
-1
0HTTP Connections (Sizes)
log10(x)
log1
0(1-
F(x)
)
![Page 27: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/27.jpg)
Fitted 2-parameter Lognormal distribution (=6.75,=2.05)
0 1 2 3 4 5 6 7 8-6
-5
-4
-3
-2
-1
0HTTP Connections (Sizes)
log10(x)
log1
0(1-
F(x)
)
![Page 28: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/28.jpg)
Fitted 2-parameter Pareto distribution(=1.27, m=2000)
0 1 2 3 4 5 6 7 8-6
-5
-4
-3
-2
-1
0HTTP Connections (Sizes)
log10(x)
log1
0(1-
F(x)
)
![Page 29: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/29.jpg)
The “truth” about “curve-fitting”
Highly predictable outcome– Always doable, no surprises
• Cause for endless discussions (“Which model is better?”)
When “more” means “better” …– 2-parameter distributions (Pareto, Lognormal, …)
• 3-parameter distributions (Weibull, Gamma, …)– 5-parameter distribution (Double-Pareto, -Lognormal, …), etc.
Inadequate “goodness-of-fit” criteria due to– Voluminous data sets
• Data with strong dependencies (LRD)– Data with high variability (heavy tails)
» Data with non-stationary features
![Page 30: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/30.jpg)
“Borrowing strength” approach
Mandelbrot & Tukey to the rescue– Sequential moment plots (Mandelbrot)– Borrowing strength from large data (Tukey)
“Borrowing strength” – dynamic view of data– Rely on traditional approach for initial (small) subset of
available data– Consider successively larger subsets– Look out for inherently consistent models– Identify “patchwork “ of “fixes”
![Page 31: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/31.jpg)
Lognormal?
0 1 2 3 4 5 6 7 8-6
-5
-4
-3
-2
-1
0HTTP Connections (Sizes)
log10(x)
log1
0(1-
F(x)
)
![Page 32: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/32.jpg)
Pareto?
0 1 2 3 4 5 6 7 8-6
-5
-4
-3
-2
-1
0HTTP Connections (Sizes)
log10(x)
log1
0(1-
F(x)
)
![Page 33: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/33.jpg)
“Borrowing strength” (example 1)
Use same data set as before Illustration of Mandelbrot-Tukey approach (1)
– Sequential standard deviation plots– Lack of robustness– A case against Lognormal distributions
More on sequential standard deviation plots Scaling distributions to the rescue
![Page 34: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/34.jpg)
Fitting lognormal: n=20,000
0 20000 2400005.5
6
6.5
7
7.5
8
8.5
9
9.5
10
10.5
11 x 104 Sequential Standard Deviation (std) Plot
n
sdt(n
)
std = 93626 (2LN 4.8)
![Page 35: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/35.jpg)
Fitting lognormal: n=40,000
0 40000 2400005.5
6
6.5
7
7.5
8
8.5
9
9.5
10
10.5
11 x 104 Sequential Standard Deviation (std) Plot
n
sdt(n
)
std(20000) = 93626 (2LN 4.8)
std = 72744 (2LN 4.5)
![Page 36: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/36.jpg)
Fitting lognormal: n=80,000
0 80000 2400005.5
6
6.5
7
7.5
8
8.5
9
9.5
10
10.5
11 x 104 Sequential Standard Deviation (std) Plot
n
sdt(n
)
std = 66322 (2LN 4.4)
std(20000) = 93626 (2LN 4.8)
std(40000) = 72744 (2LN 4.5)
![Page 37: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/37.jpg)
Fitting lognormal: n=160,000
0 160000 2400005.5
6
6.5
7
7.5
8
8.5
9
9.5
10
10.5
11 x 104 Sequential Standard Deviation (std) Plot
n
sdt(n
)
std = 61883(2
LN 4.3)
std(20000) = 93626 (2LN 4.8)
std(40000) = 72744 (2LN 4.5)
std(80000) = 66322 (2LN 4.4)
![Page 38: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/38.jpg)
Fitting lognormal: All data
0 2400005.5
6
6.5
7
7.5
8
8.5
9
9.5
10
10.5
11 x 104 Sequential Standard Deviation (std) Plot
n
sdt(n
)
std(20000) = 93626 (2LN 4.8)
std(40000) = 72744 (2LN 4.5)
std(80000) = 66322 (2LN 4.4)
std(160000) = 61883 (2LN 4.3)
std(all data) = 57569 (2LN 4.2)
![Page 39: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/39.jpg)
The case against lognormal The lognormal model assumes
– existence/convergence of 2nd moment– parameter estimates are inherently consistent
However, sequential std plot indicates– non-existence/divergence of 2nd moment – inherently inconsistent parameter estimates
What “science” in “scientific modeling”?– Curve/data fitting is not “science”– “Patchwork” of “fixes” (Mandelbrot)
![Page 40: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/40.jpg)
Randomizing observations
0 0.5 1 1.5 2x 105
4
4.5
5
5.5
6
6.5
7
7.5
8 x 104 Sequential std Plots
n
std(
n)
original datarandom permutation of original data
![Page 41: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/41.jpg)
Matching “mild” distributions
0 0.5 1 1.5 2x 105
4.5
5
5.5
6
6.5
7
7.5
8
8.5 x 104 Sequential std Plots
n
std(
n)
original datasample from matching"mild" (normal) distribution
![Page 42: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/42.jpg)
Lognormal or scaling distribution
0 0.5 1 1.5 2x 105
0
2
4
6
8
10
12
x 104 Sequential std Plots
n
std(
n)
original dataLognormal samplePareto sample
![Page 43: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/43.jpg)
Pareto?
0 1 2 3 4 5 6 7 8-6
-5
-4
-3
-2
-1
0HTTP Connections (Sizes)
log10(x)
log1
0(1-
F(x)
)
![Page 44: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/44.jpg)
“Borrowing strength” (example 2)
Use same data set as before Illustration of Mandelbrot-Tukey approach (2)
– Sequential tail index plots– Strong robustness properties– A case for scaling distributions
A requirement for future empirical studies
![Page 45: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/45.jpg)
Fitting Pareto: n=20,000
0 20000 2400000.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8Sequential Tail Index ((n)) Plot
n
(n)
1.22
![Page 46: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/46.jpg)
Fitting Pareto: n=40,000
0 40000 2400000.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8Sequential Tail Index ((n)) Plot
n
(n)
1.26
(20000) 1.22
![Page 47: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/47.jpg)
Fitting Pareto: n=80,000
0 80000 2400000.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8Sequential Tail Index ((n)) Plot
n
(n)
1.26
(20000) 1.22
(40000) 1.26
![Page 48: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/48.jpg)
Fitting Pareto: n=160,000
0 160000 2400000.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8Sequential Tail Index ((n)) Plot
n
(n) 1.32
(20000) 1.22
(40000) 1.26
(80000) 1.26
![Page 49: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/49.jpg)
Fitting Pareto: All data
0 2400000.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
1.7
1.8Sequential Tail Index ((n)) Plot
n
(n)
(20000) 1.22
(40000) 1.26
(80000) 1.26
(160000) 1.32
(all data) 1.30
![Page 50: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/50.jpg)
The case for scaling distributions
The “creativity” of scaling distributions– Data: Divergent sequential moment plots – Mathematics: Allow for infinite moments
Re-discover the “science” in “scientific modeling”– Scientifically “economical” modeling (when more data
doesn’t mean more parameters)– Statistically “efficient” modeling (when more data
mean more model accuracy/confidence)– Trading “goodness-of-fit” for “robustness”
![Page 51: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/51.jpg)
Looking ahead …
Main objective of current empirical studies “The observations are consistent (in the sense of “curve
fitting”) with model/distribution X, but are not consistent with model/distribution Y.”
Requirement for future empirical studies “The observations are consistent (in the sense of
“borrowing strength”) with model/distribution X, and X is not sensitive to the methods of measuring and collecting the observations.”
![Page 52: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/52.jpg)
Some Words of Caution …
Not every “straight-looking” log-log plot means “heavy tails”!
Never use frequency plots to infer heavy tails – even though physicists do it all the time!
![Page 53: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/53.jpg)
100 101 102 10310-2
10-1
100
x
1-F(
x)
Straight-looking log-log plot?
![Page 54: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/54.jpg)
0 20 40 60 80 100 120 140 160 18010-2
10-1
100
x
1-F(
x)
No! 25 samples from Exp(50) – linear semi-log plot!
![Page 55: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/55.jpg)
=1
=0
-1 0 1 2 3 4
-1
-2
-3
-4
log10(x)
log10(p)
dxdPxp )(
Slope = -(+1)
NEVER infer from frequency plots on log-log scale!
![Page 56: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/56.jpg)
-1 0 1 2 3 4
-1
-2
-3
-4
log10(x)
log10(p)
dxdPxp )(
2)10(10
x
![Page 57: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/57.jpg)
=1
=0
-1 0 1 2 3 4 5
0
-1
-2
-3
-4
log10(x)
log10(P)
)( xXP
ALWAYS infer from CCDF plots on log-log scale!
![Page 58: On Internet Traffic Dynamics and Internet Topology I High Variability Phenomena](https://reader036.fdocuments.in/reader036/viewer/2022062810/56815ccb550346895dcad855/html5/thumbnails/58.jpg)
A Word of Wisdom …
In my view, even if an accumulation of quick “fixes”were to yield an adequately fitting “patchwork”, it would bring no understanding. – B.B. Mandelbrot, 1997