Measurement, Modeling, and Analysis of the Internet: Part II
Overview
Traffic Modeling
TCP Modeling and Congestion Control
Topology Modeling
Part II.a: Traffic modeling
Traffic Modeling
Early modeling efforts: legacy of telephony
Packet arrivals modeled like call arrivals (Poisson)
Exponential holding times
Big Bang in 1993 “On the Self-Similar Nature of Ethernet Traffic”
Will E. Leland, Walter Willinger, Daniel V. Wilson, Murad S. Taqqu
Self-Similarity in Traffic Measurement (II): Network Traffic
That Changed Everything…..
Extract from abstract
“ We demonstrate that Ethernet local area network (LAN) traffic is statistically self-similar, that none of the commonly used traffic models is able to capture this fractal behavior, that such behavior has serious implications for the design, control, and analysis of high-speed…”
Properties of Self-Similarity
o Var(X^(m)) ~ σ²·m^(-β) decreases more slowly (than m^(-1))
o r(k) decreases hyperbolically (not exponentially), so that Σ_k r(k) = ∞ (long range dependence)
o The spectral density [discrete-time Fourier transform of r(k)] f(λ) ~ c·λ^(-(1-β)) as λ → 0 (not bounded at the origin)
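These scaling properties can be checked on measured traffic. Below is a minimal sketch, assuming only numpy and a per-interval packet count series (the synthetic Poisson series is just a placeholder), of the aggregate-variance method: compute Var(X^(m)) for several aggregation levels m and estimate β from the slope of the log-log plot.

```python
import numpy as np

def aggregate_variance(x, agg_levels):
    """Variance of the m-aggregated series X^(m) for each aggregation level m."""
    x = np.asarray(x, dtype=float)
    variances = []
    for m in agg_levels:
        n_blocks = len(x) // m
        blocks = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)  # block averages
        variances.append(blocks.var())
    return np.array(variances)

# Placeholder series; substitute per-interval packet counts from a real trace.
counts = np.random.poisson(10, 2**18)
ms = np.array([2**k for k in range(1, 11)])
v = aggregate_variance(counts, ms)
beta = -np.polyfit(np.log(ms), np.log(v), 1)[0]   # slope of the variance-time plot
print("estimated beta:", beta, "=> Hurst H ~", 1 - beta / 2)
```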
What went wrong? What next?
Modelers realized the calls -> packets mapping was inherently wrong
Self-similarity, or more accurately LRD, is evidenced by the burstiness of traffic
Explanations for LRD were sought and modeled
[LWWT] postulated heavy tails somewhere as the likely cause of LRD
Explanations of LRD
Open loop models
Closed loop models
Mixed or structural models
Open loop models
Cox’s construction
Aggregate traffic is made up of many connections
Connections arrive at random
Each connection has a “size” (number of packets)
Each connection transmits packets at some “rate”
A heavy-tailed distribution of sizes can cause LRD traffic
The M/G/∞ traffic model
Poisson customer arrivals
Heavy-tailed service times (Pareto a typical distribution)
Traffic = number of busy servers
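A minimal sketch of Cox's construction via the M/G/∞ model (all parameter values below are illustrative): sessions arrive as a Poisson process, each holds a server for a Pareto-distributed duration, and the traffic in a slot is the number of simultaneously active sessions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mg_infinity_traffic(n_slots, arrival_rate=5.0, alpha=1.5, x_min=1.0):
    """Traffic = number of busy servers in an M/G/inf queue, sampled per slot.

    Sessions arrive Poisson(arrival_rate) per slot; each holds a server for a
    Pareto(alpha) duration (heavy-tailed when 1 < alpha < 2).
    """
    traffic = np.zeros(n_slots)
    for t in range(n_slots):
        n_new = rng.poisson(arrival_rate)
        durations = x_min * (1.0 + rng.pareto(alpha, n_new))  # Pareto holding times
        for d in durations:
            end = min(n_slots, t + int(np.ceil(d)))
            traffic[t:end] += 1          # session occupies a server while active
    return traffic

series = mg_infinity_traffic(20000)
print(series[:10])
```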
Where are the heavy tails though?
The construction provided a generative model for traffic
It still didn’t explain where the heavy tails were coming from…
…until 1997
“Self-similarity in World Wide Web traffic. Evidence and possible causes.” Mark E. Crovella and Azer Bestavros.
Postulated that web file sizes follow Pareto distribution
Crovella dataset
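As a rough illustration of how such a claim is checked (using a synthetic stand-in for the file sizes, since the dataset itself is not reproduced here), one can estimate the tail index from the slope of the empirical CCDF on log-log axes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for observed web file sizes (illustrative): Pareto with alpha ~ 1.2
sizes = 1000.0 * (1.0 + rng.pareto(1.2, 50000))

def tail_index_from_ccdf(samples, tail_fraction=0.1):
    """Estimate alpha from the slope of the empirical CCDF, log P(X > x) vs log x."""
    x = np.sort(samples)
    ccdf = 1.0 - np.arange(1, len(x) + 1) / len(x)
    tail = slice(int((1 - tail_fraction) * len(x)), -1)   # fit on the upper tail only
    slope, _ = np.polyfit(np.log(x[tail]), np.log(ccdf[tail]), 1)
    return -slope

print("estimated tail index:", tail_index_from_ccdf(sizes))
```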
Picture seemed complete..
A generative model existed
Heavy tails were found
Performance analysts got to work:
Simulations based on the generative model
Analysis of multiplexers fed with the traffic model
Grave predictions of buffer overflow sprung up
Conservative buffer dimensioning was advocated
…but real world systems performed much better
Problems with open loop models
Upwards of 90% of network traffic is closed loop
Transmission of future packets depends on what happened to prior packets
Buffer overflows cause senders to back off/reduce rate, thereby affecting the generation of packets
Open loop models ignored these network effects
Simulation/analysis results are misleading with open loop models
Closed loop models
Why is closed loop important? Recall:
“Transmission of future packets depends on what happened to prior packets”
This suggests closed loop behavior induces correlations independently of the file size distribution
Chaos?
“ The chaotic nature of TCP congestion control” A. Veres and M. Boda, Infocom 2000 (winner best paper award)
Paper simulated TCP sources sharing a link and observed chaotic dynamics
Chaotic dynamics
Onset of “chaos” depended on the B/N ratio (B = buffer size, N = number of flows)
Chaos continued..
Paper generated traffic, and preliminary analysis demonstrated presence of LRD
LRD completely determined by TCP, no role of variability of filesizes
Do the claims hold up?
Verification of TCP induced LRD
[Figure: wavelet energy vs. timescale (log2) for a short (4 hour) and a long (100 hour) trace]
Another TCP based model
“ On the Propagation of Long-Range Dependence in the Internet” A. Veres, Zs. Kenesi, S. Molnár, G. Vattay Sigcomm 2000
Proposed the theory that TCP can get “infected” by long range dependence and then “spread” the infection
Model
Let F* be an LRD flow sharing a link of capacity C1 with a TCP flow T1
Since TCP adapts to the available capacity, T1 = C1 - F*
This implies T1 becomes LRD (by linearity, since C1 is a constant)
Now T1 shares a link of capacity C2 with TCP flow T2: T2 = C2 - T1
Since T1 has been established to be LRD, T2 now becomes LRD
And so on…
The model has too many technical flaws to point out here
Combined (structural) models
Recent (and not so recent) thoughts on traffic modeling
Observation: the Internet protocol hierarchy is layered
Different layers act at different timescales
Layering can lead to multiple-timescale (and hence LRD) behavior
Short time scale (multi-fractal) behavior can be quite different from long time scale (mono-fractal) behavior
From traces to traffic models
Implicit assumptions behind application modeling techniques:
Identify the application corresponding to a given flow recorded during a measurement period
Identify traffic generated by (instances of) the same application
Understand the operation of the application-level protocol
Example of web traffic modeling
Primary random variables:
Request sizes / reply sizes
User think time
Persistent connection usage
Number of objects per persistent connection
Number of embedded images per page
Number of parallel connections
Consecutive documents per server
Number of servers per page
Consider independent Markov on-off processes
[Figure: wavelet (PSD) plots of an LRD process vs. a Markovian on-off process, a product of 2 Markovian on-off processes, and a product of 3 Markovian on-off processes; the spectra are indistinguishable!]
Relating layers to traffic generation
Session layer behavior
Transport layer behavior
Application layer behavior
A packet is generated when all layers are “on”, i.e. the resultant process is the product of the component layers
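A minimal sketch of this layered construction (the transition probabilities per layer are illustrative): simulate independent two-state Markov on-off processes, one per layer and each on its own timescale, and take their product; a packet can only be generated in slots where every layer is on.

```python
import numpy as np

rng = np.random.default_rng(2)

def markov_on_off(n_slots, p_on_to_off, p_off_to_on):
    """Sample a two-state (0/1) Markov on-off process of length n_slots."""
    state = np.zeros(n_slots, dtype=int)
    state[0] = 1
    for t in range(1, n_slots):
        if state[t - 1] == 1:
            state[t] = 0 if rng.random() < p_on_to_off else 1
        else:
            state[t] = 1 if rng.random() < p_off_to_on else 0
    return state

n = 100000
# One on-off process per layer, each switching at a different timescale
application = markov_on_off(n, 0.001, 0.001)   # slow: user sessions
session     = markov_on_off(n, 0.01, 0.01)     # medium: connections
transport   = markov_on_off(n, 0.1, 0.1)       # fast: packet bursts
packet_process = application * session * transport   # "on" only if all layers are on
print("fraction of slots with packets:", packet_process.mean())
```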
The thousand word picture
Part II.b: Fluid modeling of TCP
Outline
Background
Stochastic fluid model
Deterministic fluid models
Control theoretic analysis: delay, stability
Some limiting fluid models
TCP Congestion Control: window algorithm
Window: can send W packets at a time
• increase window by one per RTT if no loss: W <- W+1 each RTT
• decrease window by half on detection of loss: W <- W/2
[Figure: sender and receiver, with a window of W packets in flight]
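A minimal sketch of the additive-increase/multiplicative-decrease window dynamics described above, with loss events drawn at random at an assumed per-RTT loss probability purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def aimd_window(n_rtts, loss_prob=0.01, w_init=1.0):
    """Evolve a TCP-like congestion window over n_rtts round-trip times."""
    w = np.empty(n_rtts)
    w[0] = w_init
    for t in range(1, n_rtts):
        if rng.random() < loss_prob:
            w[t] = max(1.0, w[t - 1] / 2.0)   # multiplicative decrease on loss
        else:
            w[t] = w[t - 1] + 1.0             # additive increase: W <- W + 1 per RTT
    return w

trace = aimd_window(1000)
print("mean window:", trace.mean())
```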
Background:
TCP throughput modeling: hot research topic in the late 90s
Earliest work by Teunis Ott (Bellcore): steady state analysis of TCP throughput using time rescaling
Padhye et al. (UMass, Sigcomm ’98) obtained an accurate throughput formula for TCP
The formula was validated with real Internet traces; the traces contained loss events
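To give the flavor of such formulas, the sketch below implements only the widely cited simplified square-root law, throughput ≈ (MSS/RTT)·sqrt(3/(2p)); the full Padhye et al. formula also accounts for delayed ACKs, timeouts, and receiver window limits, so treat this as an approximation.

```python
import math

def tcp_throughput_sqrt(rtt_s, loss_prob, mss_bytes=1460):
    """Approximate steady-state TCP throughput (bytes/s) via the square-root law.

    throughput ~ (MSS / RTT) * sqrt(3 / (2 p)); ignores timeouts and window limits.
    """
    return (mss_bytes / rtt_s) * math.sqrt(3.0 / (2.0 * loss_prob))

# Example: 100 ms RTT, 1% loss
print(tcp_throughput_sqrt(0.1, 0.01))   # roughly 1.8e5 bytes/s
```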
Loss modeling
What do losses in a wide area experiment look like?
First guess: is the loss process Poisson?
Analyze traces: several independent experiments, duration 100 seconds each.
Trace analysis
Loss inter-arrival events tested for:
Independence (Lewis and Robinson test for the renewal hypothesis)
Exponentiality (Anderson-Darling test)
[Figure: scatter plot of the test statistic for Experiments 1 through 4]
SDE based model
[Figure: traditional, source-centric loss model: the sender sees a per-packet loss probability p_i]
[Figure: new, network-centric loss model: the sender sees loss indications arriving at rate λ]
The new loss model was proposed in “Stochastic Differential Equation Modeling and Analysis of TCP Window size behavior”, Misra et al., Performance ’99.
The loss model enabled casting TCP window behavior as a stochastic differential equation, roughly
dW = dt/R - (W/2)·dN
where N(t) counts loss indications
Refinement of SDE model
W(t) = f(λ, R)
The window size is a function of the loss rate (λ) and the round trip time (R)
The network is a (black box) source of R and λ
Solution: express R and λ as functions of W (and N, the number of flows)
Active Queue Management: RED
RED: Random Early Detection, proposed in 1993
Proactively mark/drop packets in a router queue probabilistically to:
Prevent the onset of congestion by reacting early
Remove synchronization between flows
The RED mechanism
RED: marking/dropping based on the average queue length x(t) (an EWMA algorithm is used for averaging)
[Figure: RED marking profile, marking probability p vs. average queue length x: zero below t_min, rising linearly to p_max at t_max, then up to 1 at 2·t_max]
[Figure: instantaneous queue length q(t) and its smoothed, time-averaged version x(t) vs. time t]
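A minimal sketch of the two pieces just described (all parameter values are illustrative): the EWMA average of the queue and the piecewise-linear RED marking profile.

```python
def ewma_update(x_prev, q_sample, alpha=0.002):
    """Smoothed average queue length: x <- (1 - alpha) * x + alpha * q."""
    return (1.0 - alpha) * x_prev + alpha * q_sample

def red_mark_prob(x, t_min=50.0, t_max=150.0, p_max=0.1):
    """Piecewise-linear RED marking profile as a function of the average queue x."""
    if x < t_min:
        return 0.0
    if x < t_max:
        return p_max * (x - t_min) / (t_max - t_min)
    if x < 2.0 * t_max:
        # segment rising from p_max at t_max to 1 at 2*t_max
        return p_max + (1.0 - p_max) * (x - t_max) / t_max
    return 1.0

x = 0.0
for q in [10, 80, 200, 400, 120]:          # sample instantaneous queue lengths
    x = ewma_update(x, q)
    print(f"avg queue {x:6.2f} -> mark prob {red_mark_prob(x):.4f}")
```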
Loss Model
[Figure: sender, AQM router (packet drop/mark), receiver]
Loss rate as seen by the sender: λ(t) = B(t-τ)·p(t-τ)
where τ is the round trip delay and B(t)·p(t) is the rate at which the router drops/marks packets
λ(t)·dt = E[dN(t)] -> deterministic fluid model
Deterministic System of Differential Equations
Window size (all quantities are average values):
dW_i/dt = 1/R(q(t)) - (W_i(t)/2) · (W_i(t-τ)/R(q(t-τ))) · p(t-τ)
(additive increase) (multiplicative decrease × loss arrival rate)
Queue length:
dq/dt = -1[q(t) > 0]·C + Σ_i W_i(t)/R(q(t))
(outgoing traffic) (incoming traffic)
System of Differential Equations (cont.)
Average queue length:
dx/dt = (ln(1-α)/δ)·x(t) - (ln(1-α)/δ)·q(t)
where α = averaging parameter of RED (w_q), δ = sampling interval ~ 1/C
Loss probability:
dp/dt = (dp/dx)·(dx/dt)
where dp/dx is obtained from the marking profile
Closed loop
W=Window size, R = RTT, q = queue length, p = marking probability
dW_i/dt = f_1(p, R), i = 1 … N
dq/dt = f_2(W_i)
dp/dt = f_3(q)
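A minimal sketch of numerically solving this closed loop for a single bottleneck with N identical flows, using a fixed-step Euler scheme and, to keep the code short, replacing the time-lagged arguments in the loss-arrival term with current values; all parameter values are illustrative.

```python
def simulate_fluid(T=100.0, dt=0.001, N=50, C=1500.0, a=0.2,
                   t_min=50.0, t_max=150.0, p_max=0.1, w_q=0.0002):
    """Euler integration of a simplified single-bottleneck TCP/RED fluid model.

    W: per-flow window (packets), q: instantaneous queue (packets),
    x: EWMA-averaged queue, C: capacity (packets/s), a: propagation delay (s).
    """
    W, q, x = 1.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        R = a + q / C                                    # round trip time R(q)
        p = min(1.0, max(0.0, p_max * (x - t_min) / (t_max - t_min)))
        dW = 1.0 / R - (W / 2.0) * (W / R) * p           # additive incr., mult. decr.
        dq = N * W / R - C                               # incoming minus outgoing
        if q <= 0.0:
            dq = max(0.0, dq)                            # queue cannot go negative
        W = max(1.0, W + dW * dt)
        q = max(0.0, q + dq * dt)
        x = (1.0 - w_q) * x + w_q * q                    # EWMA of the queue length
    return W, q, x

print(simulate_fluid())
```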
Verification of deterministic fluid model
Network simulated using ns; differential equations set up for the equivalent network
Number of flows changes at t=75 and t=100
The DE solver captures transient performance
Observation: the sample path (simulation) matches the deterministic fluid model. Fluid limit?
[Figure: instantaneous queue length at a router vs. time, DE method vs. ns simulation]
Control theoretic analysis
Deterministic fluid model yields convenient control theoretic formulation
Non-linear system linearized about operating point
Frequency domain analysis reveals many interesting insights for the first time
Block diagram view
[Figure: block diagram of the closed loop: a TCP window control block (W), the TCP load factor N/R, the congested queue (q, served at capacity C), the control law (e.g., RED) producing the marking probability p, and a time delay of one round trip time (Rtt) in the feedback path]
Small Signal model
[Figure: small-signal feedback loop: the AQM control law produces p, P_tcp(s) maps p to the window W, P_queue(s) maps W to the queue q, and the feedback path contains a delay e^(-s·R0)]
P_tcp(s) = (R0·C² / 2N²) / (s + 2N / (R0²·C))
P_queue(s) = (N / R0) / (s + 1 / R0)
Immediate insights
Control theoretic analysis predicts the stability of the system:
Stability goes down as link capacity (C) increases
Stability goes down as the number of flows (N) decreases
Stability goes down as the feedback delay increases
The analysis also reveals characteristics of the controller:
Stability decreases with increasing slope (or gain) of the RED drop profile (dp/dx)
(Control) Theory based parameter tuning
Non-linear simulation with 60 ftp + 180 http flows
Design rules developed for RED parameter tuning given network conditions
[Figure: queue length vs. time with default ns RED parameters and with tuned RED parameters]
PI Controller performance
RED and PI compared; the number of flows is gradually increased between t=50 and t=100
PI is faster to converge and react
PI controls the queue length independent of the number of flows
[Figure: queue length vs. time, RED vs. PI controller]
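A minimal sketch of a discrete-time PI AQM update of the form studied here; the gains and the queue samples below are illustrative placeholders, not the designed values.

```python
def pi_aqm_step(p_prev, q, q_prev, q_ref=200.0, a=1.8e-5, b=1.7e-5):
    """One PI update: p(k) = p(k-1) + a*(q(k) - q_ref) - b*(q(k-1) - q_ref)."""
    p = p_prev + a * (q - q_ref) - b * (q_prev - q_ref)
    return min(1.0, max(0.0, p))      # clamp to a valid probability

# Toy usage: queue samples taken once per sampling interval (values illustrative)
p, q_prev = 0.0, 0.0
for q in [120, 250, 400, 350, 260, 210, 205]:
    p = pi_aqm_step(p, q, q_prev)
    q_prev = q
    print(f"queue {q:4d} -> mark prob {p:.5f}")
```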
UNC Testbed
Plot of CDF of response time of requests (80% load)
[Figure: cumulative probability vs. response time (ms); curves for PI (qref=20), PI (qref=200), FIFO, RED]
Plot of CDF of response time of requests (100% load)
[Figure: cumulative probability vs. response time (ms); curves for PI (qref=20), PI (qref=200), FIFO, RED]
Recent fluid limits
Continuous setting: “A Mean-Field Model for Multiple TCP Connections through a Buffer Implementing RED” [Baccelli, McDonald, Reynier]
Discrete setting: “Limit Behavior of ECN/RED Gateways Under a Large Number of TCP Flows” [Tinnakornsrisuphap, Makowski]
Continuous setting
Start with a similar stochastic model; scaling: C → N·C, Q^N(t) = Q(t)/N
Fluid limit obtained: Q^N(t) → q(t), K^N(t) → k(t), where K^N(t) is the loss rate
The final fluid equations are very similar to our mean value model
Discrete setting
Start with a discrete model for window size behavior; with a similar scaling (C → N·C), obtain Q^(N)(t)/N → q(t)
Also obtain the refinement Q^(N)(t) ≈ N·q(t) + √N·L(t)
Similar conclusion as ours regarding the role of the gain of the RED drop profile
Demonstrate that RED removes synchronization in the limit
Srikant et al.
Studied different scalings for limiting fluid models
Obtained limits similar to Makowski et al., in a continuous setting
Interesting observations regarding the choice of models (rate based vs. queue based) for REM:
If queue lengths have to be negligible compared to RTTs, use rate-based models.
If virtual queues are to be used, then the choice of scaling doesn’t matter (shown using variance calculations).
Parameter choices for stability would be different, depending upon the model.
Scalings (link capacity N·c):
p(q) = 1 - exp(-q/N) versus p(q) = 1 - exp(-q/√N)
Intuition
√N scaling leads to rate-based models
N scaling leads to queue-based models
Why?
• The queue length becomes either √N or N, depending on the scaling. Thus, the queue length hits zero often in the former case, leading to an averaging effect.
Other applications of fluid models
Design and analysis of DiffServ networks
Modeling and analysis of short-lived flows
Analysis of other mechanisms, e.g. stochastic fair dropping
Groups at Caltech and UIUC are using similar models for design/analysis
Part II.c: Topology modeling
Why study topology?
Correctness of network protocols typically independent of topology
Performance of networks is critically dependent on topology, e.g., convergence of route information
The Internet is impossible to replicate
Modeling of topology is needed to generate test topologies
Internet topologies
[Figure: example topologies at the router level and at the Autonomous System (AS) level, with nodes belonging to providers such as AT&T, Sprint, and MCI]
More on topologies..
Router level topologies reflect physical connectivity between nodes, inferred from tools like traceroute or well known public measurement projects like Mercator and Skitter
The AS graph reflects peering relationships between providers/clients, inferred from inter-domain routers that run BGP and public projects like Oregon Route Views
Inferring both is difficult, and often inaccurate
Early work
Early models of topology used variants of Erdos-Renyi random graphs:
Nodes randomly distributed on a 2-dimensional plane
Nodes connected to each other with probability inversely proportional to distance
Soon researchers observed that random graphs did not represent real world networks
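A minimal sketch of a generator in this spirit (the exact connection probability used by early generators varied; the capped inverse-distance form below, with a tunable constant, is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)

def random_geometric_topology(n_nodes=100, c=0.05):
    """Place nodes uniformly in the unit square; connect each pair with
    probability inversely proportional to their Euclidean distance (capped at 1)."""
    pos = rng.random((n_nodes, 2))
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            d = np.linalg.norm(pos[i] - pos[j])
            if rng.random() < min(1.0, c / d):
                edges.append((i, j))
    return pos, edges

pos, edges = random_geometric_topology()
print("nodes:", len(pos), "edges:", len(edges))
```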
Real world topologies
Real networks exhibit:
Hierarchical structure
Specialized nodes (transit, stub, …)
Connectivity requirements
Redundancy
Characteristics incorporated into the Georgia Tech Internetwork Topology Models (GT-ITM) simulator (E. Zegura, K.Calvert and M.J. Donahoo, 1995)
So…are we done?
No! In 1999, Faloutsos, Faloutsos and Faloutsos published a paper demonstrating power law relationships in Internet graphs
Specifically, the node degree distribution exhibited power laws
That Changed Everything…..
Power laws in AS level topology
Faloutsos3 (Sigcomm ’99)
[Figure: log-log plot of frequency vs. degree for the AS-level topology derived from the BGP tables of 18 routers; the data follow a power law]
[Figure: log-log plot of the empirical CCDF, P(d > x) ~ x^(-α), vs. degree d; a power law with α ≈ 1.15]
GT-ITM abandoned..
GT-ITM did not give power law degree graphs
New topology generators and explanation for power law degrees were sought
The focus of generators became matching the degree distribution of the observed graph
Generating power law graphs
Goal: construct a network of size N with a degree power law, P(d > x) ~ x^(-α)
Power law random graph (PLRG) (Aiello et al.)
Inet (Chen et al.)
Incremental growth (BA) (Barabasi et al.)
General linear preference (GLP) (Bu et al.)
Power law random graph (PLRG) (Aiello et al.): operations
Assign degrees to nodes drawn from a power law distribution
Create k_v copies of node v, where k_v is the degree of v
Randomly match the node copies in the pool
Aggregate edges
The result may be disconnected and may contain multiple edges and self-loops
It contains a unique giant component for the right choice of parameters
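A minimal sketch of the PLRG construction (graph size and degree exponent are illustrative; duplicate edges and self-loops are kept, as noted above):

```python
import numpy as np

rng = np.random.default_rng(5)

def plrg(n_nodes=1000, alpha=2.1):
    """Power law random graph: draw power-law degrees, make k_v copies of node v,
    randomly match the copies, and aggregate the matched pairs into edges."""
    # Discrete power-law-ish degrees via a rounded Pareto draw, minimum degree 1
    degrees = np.maximum(1, np.floor(rng.pareto(alpha - 1, n_nodes) + 1)).astype(int)
    pool = np.repeat(np.arange(n_nodes), degrees)    # k_v copies of node v
    rng.shuffle(pool)
    if len(pool) % 2 == 1:
        pool = pool[:-1]                              # drop one copy if the pool is odd
    edges = list(zip(pool[0::2], pool[1::2]))         # random matching
    return degrees, edges

degrees, edges = plrg()
print("max degree:", degrees.max(), "edges (incl. multi/self):", len(edges))
```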
Inet (Chen et al)
Assumption: max degree and size grow exponentially over time
Algorithm:
Pick a date, calculate the maximum degree/size
Compute the degrees of the other nodes
Form a spanning tree over nodes with degree 2+
Attach the other nodes according to linear preference
Match the remaining nodes
Remove self loops and multi-edges
Barabasi model: fixed exponent, incremental growth
Initially, m0 nodes
Step: add a new node with m edges
Linear preferential attachment: connect to existing node i with probability ∏(k_i) = k_i / ∑_j k_j
May contain multi-edges, self-loops
[Figure: a new node attaching to existing nodes, with attachment probabilities such as 0.5 and 0.25]

General linear preference (GLP)
Motivation: greater flexibility in assigning preference; removes the need for rewiring
New preferential function: ∏(k_i) = (k_i - β) / ∑_j (k_j - β), with β in (-∞, 1)
Operations:
With probability p: add m new links
With probability 1-p: add a new node with m new links
Can achieve any exponent in (1, ∞)
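A minimal sketch of Barabasi-style incremental growth with linear preferential attachment (graph size and m are illustrative; multi-edges are possible, as noted above):

```python
import numpy as np

rng = np.random.default_rng(6)

def barabasi_albert(n_nodes=1000, m=2):
    """Grow a graph by adding nodes one at a time, each attaching m edges to
    existing nodes chosen with probability proportional to their current degree."""
    edges = [(0, 1)]                       # seed graph: two connected nodes
    targets = [0, 1]                       # node appears once per unit of degree
    for new_node in range(2, n_nodes):
        # preferential attachment: sampling from `targets` is degree-proportional
        chosen = rng.choice(targets, size=m)
        for t in chosen:
            edges.append((new_node, int(t)))
        targets.extend(chosen.tolist())
        targets.extend([new_node] * m)
    return edges

edges = barabasi_albert()
deg = np.bincount(np.array(edges).ravel())
print("max degree:", deg.max(), "mean degree:", deg.mean())
```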
“ Scale-free” graphs
Preferential attachment leads to “scale free” structure in connectivity
Implications of “scale free” structure:
Few centrally located and highly connected hubs
Network robust to random attack/node removal (the probability of hitting a hub is very low)
Network susceptible to catastrophic failure from targeted attacks (“Achilles heel of the Internet”, Albert, Jeong, Barabasi, Nature 2000)
Is the router-level Internet graph scale-free? No… (there is no Memphis!)
The emphasis on degree distribution ignores structure
The real Internet is very structured
The evolution of the graph is highly constrained
Topology constraints
Technology: router out-degree is constrained by processing speed; routers can either have a large number of low bandwidth connections, or a small number of high bandwidth connections
Geography: router connectivity is highly driven by geographical proximity
Economy: the capacity of links is constrained by the technology that nodes can afford, the redundancy/performance they desire, etc.
Optimization based models for topology
HOT-1: Highly Optimized Tolerance (Doyle et al., Caltech, USC, ISI, AT&T, …)
HOT-2: Heuristically Optimized Tradeoffs (Fabrikant, Koutsoupias, Papadimitriou, Berkeley)
HOT-3: a variant of HOT-2 (Chang, Jamin, Willinger, Michigan, AT&T)
Fabrikant HOT
Each new node solves the local optimization problem to find a target node to connect to.
Each new node i connects to an existing node j that minimizes the weighted sum of two objectives: min_j (α·d_ij + h_j), where α weights the two costs
d_ij (last mile cost) = Euclidean distance from i to j
h_j (transmission delay cost) = average hop distance from j to all other nodes
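A minimal sketch of this growth process (the weight α and the number of nodes are illustrative; h_j is recomputed exactly with a breadth-first search over the current tree, which is slow but keeps the sketch simple):

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(7)

def avg_hop_distance(adj, src, n):
    """Average hop count from src to all other existing nodes (BFS over the tree)."""
    dist = [-1] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist) / (n - 1) if n > 1 else 0.0

def fabrikant_hot(n_nodes=200, alpha=4.0):
    """Grow a tree: node i attaches to the existing j minimizing alpha*d_ij + h_j."""
    pos = rng.random((n_nodes, 2))
    adj = [[] for _ in range(n_nodes)]
    for i in range(1, n_nodes):
        costs = []
        for j in range(i):
            d_ij = np.linalg.norm(pos[i] - pos[j])       # last mile cost
            h_j = avg_hop_distance(adj, j, i)            # delay cost in current graph
            costs.append(alpha * d_ij + h_j)
        j_best = int(np.argmin(costs))
        adj[i].append(j_best)
        adj[j_best].append(i)
    return pos, adj

pos, adj = fabrikant_hot()
print("degree of node 0:", len(adj[0]))
```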
Modified Fabrikant HOT
Univariate HOT model. Criteria: (i) AS geography.
Bivariate HOT model. Criteria: (i) AS geography, (ii) AS business model.
Various extensions