Measurement, Modeling, and Analysis of the Internet: Part II
Overview
Traffic Modeling
TCP Modeling and Congestion Control
Topology Modeling
Part II.a: Traffic modeling
Traffic Modeling
Early modeling efforts: legacy of telephony
Packet arrivals modeled like call arrivals (Poisson)
Exponential holding times
Big Bang in 1993 “On the Self-Similar Nature of Ethernet Traffic”
Will E. Leland, Walter Willinger, Daniel V. Wilson, Murad S. Taqqu
Self-Similarity in Traffic Measurement (II): Network Traffic
That Changed Everything…..
Extract from abstract
“ We demonstrate that Ethernet local area network (LAN) traffic is statistically self-similar, that none of the commonly used traffic models is able to capture this fractal behavior, that such behavior has serious implications for the design, control, and analysis of high-speed…”
Properties of Self-Similarity
o Var(X^(m)) ~ σ²·m^(-β) decreases more slowly (than m^(-1))
o r(k) decreases hyperbolically (not exponentially), so that Σ_k r(k) = ∞ (long range dependence)
o The spectral density [discrete-time Fourier transform of r(k)] f(λ) ~ c·λ^(-(1-β)) as λ → 0 (not bounded at the origin)
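These scaling properties can be checked on measured traffic. Below is a minimal sketch, assuming only numpy and a per-interval packet count series (the synthetic Poisson series is just a placeholder), of the aggregate-variance method: compute Var(X^(m)) for several aggregation levels m and estimate β from the slope of the log-log plot.

```python
import numpy as np

def aggregate_variance(x, agg_levels):
    """Variance of the m-aggregated series X^(m) for each aggregation level m."""
    x = np.asarray(x, dtype=float)
    variances = []
    for m in agg_levels:
        n_blocks = len(x) // m
        blocks = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)  # block averages
        variances.append(blocks.var())
    return np.array(variances)

# Placeholder series; substitute per-interval packet counts from a real trace.
counts = np.random.poisson(10, 2**18)
ms = np.array([2**k for k in range(1, 11)])
v = aggregate_variance(counts, ms)
beta = -np.polyfit(np.log(ms), np.log(v), 1)[0]   # slope of the variance-time plot
print("estimated beta:", beta, "=> Hurst H ~", 1 - beta / 2)
```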
What went wrong? What next?
Modelers realized the calls -> packets mapping was inherently wrong
Self-similarity, or more accurately LRD, is evidenced by the burstiness of traffic
Explanations for LRD were sought and modeled
[LWWT] postulated heavy tails somewhere as the likely cause of LRD
Explanations of LRD
Open loop models
Closed loop models
Mixed or structural models
Open loop models
Cox’s construction
Aggregate traffic is made up of many connections
Connections arrive at random
Each connection has a “size” (number of packets)
Each connection transmits packets at some “rate”
A heavy-tailed distribution of sizes can cause LRD traffic
The M/G/∞ traffic model
Poisson customer arrivals
Heavy-tailed service times (Pareto a typical distribution)
Traffic = number of busy servers
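A minimal sketch of Cox's construction via the M/G/∞ model (all parameter values below are illustrative): sessions arrive as a Poisson process, each holds a server for a Pareto-distributed duration, and the traffic in a slot is the number of simultaneously active sessions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mg_infinity_traffic(n_slots, arrival_rate=5.0, alpha=1.5, x_min=1.0):
    """Traffic = number of busy servers in an M/G/inf queue, sampled per slot.

    Sessions arrive Poisson(arrival_rate) per slot; each holds a server for a
    Pareto(alpha) duration (heavy-tailed when 1 < alpha < 2).
    """
    traffic = np.zeros(n_slots)
    for t in range(n_slots):
        n_new = rng.poisson(arrival_rate)
        durations = x_min * (1.0 + rng.pareto(alpha, n_new))  # Pareto holding times
        for d in durations:
            end = min(n_slots, t + int(np.ceil(d)))
            traffic[t:end] += 1          # session occupies a server while active
    return traffic

series = mg_infinity_traffic(20000)
print(series[:10])
```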
Where are the heavy tails though?
The construction provided a generative model for traffic
It still didn’t explain where the heavy tails were coming from…
…until 1997
“Self-similarity in World Wide Web traffic. Evidence and possible causes.” Mark E. Crovella and Azer Bestavros.
Postulated that web file sizes follow Pareto distribution
Crovella dataset
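As a rough illustration of how such a claim is checked (using a synthetic stand-in for the file sizes, since the dataset itself is not reproduced here), one can estimate the tail index from the slope of the empirical CCDF on log-log axes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for observed web file sizes (illustrative): Pareto with alpha ~ 1.2
sizes = 1000.0 * (1.0 + rng.pareto(1.2, 50000))

def tail_index_from_ccdf(samples, tail_fraction=0.1):
    """Estimate alpha from the slope of the empirical CCDF, log P(X > x) vs log x."""
    x = np.sort(samples)
    ccdf = 1.0 - np.arange(1, len(x) + 1) / len(x)
    tail = slice(int((1 - tail_fraction) * len(x)), -1)   # fit on the upper tail only
    slope, _ = np.polyfit(np.log(x[tail]), np.log(ccdf[tail]), 1)
    return -slope

print("estimated tail index:", tail_index_from_ccdf(sizes))
```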
Picture seemed complete..
A generative model existed
Heavy tails were found
Performance analysts got to work:
Simulations based on the generative model
Analysis of multiplexers fed with the traffic model
Grave predictions of buffer overflow sprung up
Conservative buffer dimensioning was advocated
…but real world systems performed much better
Problems with open loop models
Upwards of 90% of network traffic is closed loop
Transmission of future packets depends on what happened to prior packets
Buffer overflows cause senders to back off/reduce rate, thereby affecting the generation of packets
Open loop models ignored these network effects
Simulation/analysis results are misleading with open loop models
Closed loop models
Why is closed loop important? Recall:
“Transmission of future packets depends on what happened to prior packets”
This suggests closed loop behavior induces correlations independently of the file size distribution
Chaos?
“ The chaotic nature of TCP congestion control” A. Veres and M. Boda, Infocom 2000 (winner best paper award)
Paper simulated TCP sources sharing a link and observed chaotic dynamics
Chaotic dynamics
Onset of “chaos” depended on the B/N ratio (B = buffer size, N = number of flows)
Chaos continued..
Paper generated traffic, and preliminary analysis demonstrated presence of LRD
LRD completely determined by TCP, no role of variability of filesizes
Do the claims hold up?
Verification of TCP induced LRD
[Figure: wavelet energy vs. timescale (log2) for a short (4 hour) and a long (100 hour) trace]
Another TCP based model
“ On the Propagation of Long-Range Dependence in the Internet” A. Veres, Zs. Kenesi, S. Molnár, G. Vattay Sigcomm 2000
Proposed the theory that TCP can get “infected” by long range dependence and then “spread” the infection
Model
Let F* be an LRD flow sharing a link of capacity C1 with a TCP flow T1
Since TCP adapts to the available capacity, T1 = C1 - F*
This implies T1 becomes LRD (by linearity, since C1 is a constant)
Now T1 shares a link of capacity C2 with TCP flow T2: T2 = C2 - T1
Since T1 has been established to be LRD, T2 now becomes LRD
And so on…
The model has too many technical flaws to point out here
Combined (structural) models
Recent (and not so recent) thoughts on traffic modeling
Observation: the Internet protocol hierarchy is layered
Different layers act at different timescales
Layering can lead to multiple-timescale (and hence LRD) behavior
Short time scale (multi-fractal) behavior can be quite different from long time scale (mono-fractal) behavior
From traces to traffic models
Implicit assumptions behind application modeling techniques:
Identify the application corresponding to a given flow recorded during a measurement period
Identify traffic generated by (instances of) the same application
Understand the operation of the application-level protocol
Example of web traffic modeling
Primary random variables:
Request sizes / reply sizes
User think time
Persistent connection usage
Number of objects per persistent connection
Number of embedded images per page
Number of parallel connections
Consecutive documents per server
Number of servers per page
Consider independent Markov on-off processes
[Figure: wavelet (PSD) plots of an LRD process vs. a Markovian on-off process, a product of 2 Markovian on-off processes, and a product of 3 Markovian on-off processes; the spectra are indistinguishable!]
Relating layers to traffic generation
Session layer behavior
Transport layer behavior
Application layer behavior
A packet is generated when all layers are “on”, i.e. the resultant process is the product of the component layers
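A minimal sketch of this layered construction (the transition probabilities per layer are illustrative): simulate independent two-state Markov on-off processes, one per layer and each on its own timescale, and take their product; a packet can only be generated in slots where every layer is on.

```python
import numpy as np

rng = np.random.default_rng(2)

def markov_on_off(n_slots, p_on_to_off, p_off_to_on):
    """Sample a two-state (0/1) Markov on-off process of length n_slots."""
    state = np.zeros(n_slots, dtype=int)
    state[0] = 1
    for t in range(1, n_slots):
        if state[t - 1] == 1:
            state[t] = 0 if rng.random() < p_on_to_off else 1
        else:
            state[t] = 1 if rng.random() < p_off_to_on else 0
    return state

n = 100000
# One on-off process per layer, each switching at a different timescale
application = markov_on_off(n, 0.001, 0.001)   # slow: user sessions
session     = markov_on_off(n, 0.01, 0.01)     # medium: connections
transport   = markov_on_off(n, 0.1, 0.1)       # fast: packet bursts
packet_process = application * session * transport   # "on" only if all layers are on
print("fraction of slots with packets:", packet_process.mean())
```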
The thousand word picture
Part II.b: Fluid modeling of TCP
Outline
Background
Stochastic fluid model
Deterministic fluid models
Control theoretic analysis: delay, stability
Some limiting fluid models
TCP Congestion Control: window algorithm
Window: can send W packets at a time
• increase window by one per RTT if no loss: W <- W+1 each RTT
• decrease window by half on detection of loss: W <- W/2
[Figure: sender and receiver, with a window of W packets in flight]
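A minimal sketch of the additive-increase/multiplicative-decrease window dynamics described above, with loss events drawn at random at an assumed per-RTT loss probability purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def aimd_window(n_rtts, loss_prob=0.01, w_init=1.0):
    """Evolve a TCP-like congestion window over n_rtts round-trip times."""
    w = np.empty(n_rtts)
    w[0] = w_init
    for t in range(1, n_rtts):
        if rng.random() < loss_prob:
            w[t] = max(1.0, w[t - 1] / 2.0)   # multiplicative decrease on loss
        else:
            w[t] = w[t - 1] + 1.0             # additive increase: W <- W + 1 per RTT
    return w

trace = aimd_window(1000)
print("mean window:", trace.mean())
```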
Background:
TCP throughput modeling: hot research topic in the late 90s
Earliest work by Teunis Ott (Bellcore): steady state analysis of TCP throughput using time rescaling
Padhye et al. (UMass, Sigcomm ’98) obtained an accurate throughput formula for TCP
The formula was validated with real Internet traces; the traces contained loss events
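To give the flavor of such formulas, the sketch below implements only the widely cited simplified square-root law, throughput ≈ (MSS/RTT)·sqrt(3/(2p)); the full Padhye et al. formula also accounts for delayed ACKs, timeouts, and receiver window limits, so treat this as an approximation.

```python
import math

def tcp_throughput_sqrt(rtt_s, loss_prob, mss_bytes=1460):
    """Approximate steady-state TCP throughput (bytes/s) via the square-root law.

    throughput ~ (MSS / RTT) * sqrt(3 / (2 p)); ignores timeouts and window limits.
    """
    return (mss_bytes / rtt_s) * math.sqrt(3.0 / (2.0 * loss_prob))

# Example: 100 ms RTT, 1% loss
print(tcp_throughput_sqrt(0.1, 0.01))   # roughly 1.8e5 bytes/s
```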
Loss modeling
What do losses in a wide area experiment look like?
First guess: is the loss process Poisson?
Analyze traces: several independent experiments, duration 100 seconds each.
Trace analysis
Loss inter-arrival events tested for:
Independence (Lewis and Robinson test for the renewal hypothesis)
Exponentiality (Anderson-Darling test)
[Figure: scatter plot of the test statistic for Experiments 1 through 4]
SDE based model
[Figure: traditional, source-centric loss model: the sender sees a per-packet loss probability p_i]
[Figure: new, network-centric loss model: the sender sees loss indications arriving at rate λ]
The new loss model was proposed in “Stochastic Differential Equation Modeling and Analysis of TCP Window size behavior”, Misra et al., Performance ’99.
The loss model enabled casting TCP window behavior as a stochastic differential equation, roughly
dW = dt/R - (W/2)·dN
where N(t) counts loss indications
Refinement of SDE model
W(t) = f(λ, R)
The window size is a function of the loss rate (λ) and the round trip time (R)
The network is a (black box) source of R and λ
Solution: express R and λ as functions of W (and N, the number of flows)
Active Queue Management: RED
RED: Random Early Detection, proposed in 1993
Proactively mark/drop packets in a router queue probabilistically to:
Prevent the onset of congestion by reacting early
Remove synchronization between flows
The RED mechanism
RED: marking/dropping based on the average queue length x(t) (an EWMA algorithm is used for averaging)
[Figure: RED marking profile, marking probability p vs. average queue length x: zero below t_min, rising linearly to p_max at t_max, then up to 1 at 2·t_max]
[Figure: instantaneous queue length q(t) and its smoothed, time-averaged version x(t) vs. time t]
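A minimal sketch of the two pieces just described (all parameter values are illustrative): the EWMA average of the queue and the piecewise-linear RED marking profile.

```python
def ewma_update(x_prev, q_sample, alpha=0.002):
    """Smoothed average queue length: x <- (1 - alpha) * x + alpha * q."""
    return (1.0 - alpha) * x_prev + alpha * q_sample

def red_mark_prob(x, t_min=50.0, t_max=150.0, p_max=0.1):
    """Piecewise-linear RED marking profile as a function of the average queue x."""
    if x < t_min:
        return 0.0
    if x < t_max:
        return p_max * (x - t_min) / (t_max - t_min)
    if x < 2.0 * t_max:
        # segment rising from p_max at t_max to 1 at 2*t_max
        return p_max + (1.0 - p_max) * (x - t_max) / t_max
    return 1.0

x = 0.0
for q in [10, 80, 200, 400, 120]:          # sample instantaneous queue lengths
    x = ewma_update(x, q)
    print(f"avg queue {x:6.2f} -> mark prob {red_mark_prob(x):.4f}")
```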
Loss Model
[Figure: sender, AQM router (packet drop/mark), receiver]
Loss rate as seen by the sender: λ(t) = B(t-τ)·p(t-τ)
where τ is the round trip delay and B(t)·p(t) is the rate at which the router drops/marks packets
λ(t)·dt = E[dN(t)] -> deterministic fluid model
Deterministic System of Differential Equations
Window size (all quantities are average values):
dW_i/dt = 1/R(q(t)) - (W_i(t)/2) · (W_i(t-τ)/R(q(t-τ))) · p(t-τ)
(additive increase) (multiplicative decrease × loss arrival rate)
Queue length:
dq/dt = -1[q(t) > 0]·C + Σ_i W_i(t)/R(q(t))
(outgoing traffic) (incoming traffic)
System of Differential Equations (cont.)
Average queue length:
dx/dt = (ln(1-α)/δ)·x(t) - (ln(1-α)/δ)·q(t)
where α = averaging parameter of RED (w_q), δ = sampling interval ~ 1/C
Loss probability:
dp/dt = (dp/dx)·(dx/dt)
where dp/dx is obtained from the marking profile
Closed loop
W=Window size, R = RTT, q = queue length, p = marking probability
dW_i/dt = f_1(p, R), i = 1 … N
dq/dt = f_2(W_i)
dp/dt = f_3(q)
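A minimal sketch of numerically solving this closed loop for a single bottleneck with N identical flows, using a fixed-step Euler scheme and, to keep the code short, replacing the time-lagged arguments in the loss-arrival term with current values; all parameter values are illustrative.

```python
def simulate_fluid(T=100.0, dt=0.001, N=50, C=1500.0, a=0.2,
                   t_min=50.0, t_max=150.0, p_max=0.1, w_q=0.0002):
    """Euler integration of a simplified single-bottleneck TCP/RED fluid model.

    W: per-flow window (packets), q: instantaneous queue (packets),
    x: EWMA-averaged queue, C: capacity (packets/s), a: propagation delay (s).
    """
    W, q, x = 1.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        R = a + q / C                                    # round trip time R(q)
        p = min(1.0, max(0.0, p_max * (x - t_min) / (t_max - t_min)))
        dW = 1.0 / R - (W / 2.0) * (W / R) * p           # additive incr., mult. decr.
        dq = N * W / R - C                               # incoming minus outgoing
        if q <= 0.0:
            dq = max(0.0, dq)                            # queue cannot go negative
        W = max(1.0, W + dW * dt)
        q = max(0.0, q + dq * dt)
        x = (1.0 - w_q) * x + w_q * q                    # EWMA of the queue length
    return W, q, x

print(simulate_fluid())
```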
Verification of deterministic fluid model
Network simulated using ns; differential equations set up for the equivalent network
Number of flows changes at t=75 and t=100
The DE solver captures transient performance
Observation: the sample path (simulation) matches the deterministic fluid model. Fluid limit?
[Figure: instantaneous queue length at a router vs. time, DE method vs. ns simulation]
Control theoretic analysis
Deterministic fluid model yields convenient control theoretic formulation
Non-linear system linearized about operating point
Frequency domain analysis reveals many interesting insights for the first time
Block diagram view
[Figure: block diagram of the closed loop: a TCP window control block (W), the TCP load factor N/R, the congested queue (q, served at capacity C), the control law (e.g., RED) producing the marking probability p, and a time delay of one round trip time (Rtt) in the feedback path]
Small Signal model
[Figure: small-signal feedback loop: the AQM control law produces p, P_tcp(s) maps p to the window W, P_queue(s) maps W to the queue q, and the feedback path contains a delay e^(-s·R0)]
P_tcp(s) = (R0·C² / 2N²) / (s + 2N / (R0²·C))
P_queue(s) = (N / R0) / (s + 1 / R0)
Immediate insights
Control theoretic analysis predicts the stability of the system:
Stability goes down as link capacity (C) increases
Stability goes down as the number of flows (N) decreases
Stability goes down as the feedback delay increases
The analysis also reveals characteristics of the controller:
Stability decreases with increasing slope (or gain) of the RED drop profile (dp/dx)
(Control) Theory based parameter tuning
Non-linear simulation with 60 ftp + 180 http flows
Design rules developed for RED parameter tuning given network conditions
[Figure: queue length vs. time with default ns RED parameters and with tuned RED parameters]
PI Controller performance
RED and PI compared; the number of flows is gradually increased between t=50 and t=100
PI is faster to converge and react
PI controls the queue length independent of the number of flows
[Figure: queue length vs. time, RED vs. PI controller]
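A minimal sketch of a discrete-time PI AQM update of the form studied here; the gains and the queue samples below are illustrative placeholders, not the designed values.

```python
def pi_aqm_step(p_prev, q, q_prev, q_ref=200.0, a=1.8e-5, b=1.7e-5):
    """One PI update: p(k) = p(k-1) + a*(q(k) - q_ref) - b*(q(k-1) - q_ref)."""
    p = p_prev + a * (q - q_ref) - b * (q_prev - q_ref)
    return min(1.0, max(0.0, p))      # clamp to a valid probability

# Toy usage: queue samples taken once per sampling interval (values illustrative)
p, q_prev = 0.0, 0.0
for q in [120, 250, 400, 350, 260, 210, 205]:
    p = pi_aqm_step(p, q, q_prev)
    q_prev = q
    print(f"queue {q:4d} -> mark prob {p:.5f}")
```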
UNC Testbed
Plot of CDF of response time of requests (80% load)
[Figure: cumulative probability vs. response time (ms); curves for PI (qref=20), PI (qref=200), FIFO, RED]
Plot of CDF of response time of requests (100% load)
[Figure: cumulative probability vs. response time (ms); curves for PI (qref=20), PI (qref=200), FIFO, RED]
Recent fluid limits
Continuous setting: “A Mean-Field Model for Multiple TCP Connections through a Buffer Implementing RED” [Baccelli, McDonald, Reynier]
Discrete setting: “Limit Behavior of ECN/RED Gateways Under a Large Number of TCP Flows” [Tinnakornsrisuphap, Makowski]
Continuous setting
Start with a similar stochastic model; scaling: C → N·C, Q^N(t) = Q(t)/N
Fluid limit obtained: Q^N(t) → q(t), K^N(t) → k(t), where K^N(t) is the loss rate
The final fluid equations are very similar to our mean value model
Discrete setting
Start with a discrete model for window size behavior; with a similar scaling (C → N·C), obtain Q^(N)(t)/N → q(t)
Also obtain the refinement Q^(N)(t) ≈ N·q(t) + √N·L(t)
Similar conclusion as ours regarding the role of the gain of the RED drop profile
Demonstrate that RED removes synchronization in the limit
Srikant et al.
Studied different scalings for limiting fluid models
Obtained limits similar to Makowski et al., in a continuous setting
Interesting observations regarding the choice of models (rate based vs. queue based) for REM:
If queue lengths have to be negligible compared to RTTs, use rate-based models.
If virtual queues are to be used, then the choice of scaling doesn’t matter (shown using variance calculations).
Parameter choices for stability would be different, depending upon the model.
Scalings (link capacity N·c):
p(q) = 1 - exp(-q/N) versus p(q) = 1 - exp(-q/√N)
Intuition
√N scaling leads to rate-based models
N scaling leads to queue-based models
Why?
• The queue length becomes either √N or N, depending on the scaling. Thus, the queue length hits zero often in the former case, leading to an averaging effect.
Other applications of fluid models
Design and analysis of DiffServ networks
Modeling and analysis of short-lived flows
Analysis of other mechanisms, e.g. stochastic fair dropping
Groups at Caltech and UIUC are using similar models for design/analysis
Part II.c: Topology modeling
Why study topology?
Correctness of network protocols typically independent of topology
Performance of networks is critically dependent on topology, e.g., convergence of route information
The Internet is impossible to replicate
Modeling of topology is needed to generate test topologies
Internet topologies
[Figure: example topologies at the router level and at the Autonomous System (AS) level, with nodes belonging to providers such as AT&T, Sprint, and MCI]
More on topologies..
Router level topologies reflect physical connectivity between nodes, inferred from tools like traceroute or well known public measurement projects like Mercator and Skitter
The AS graph reflects peering relationships between providers/clients, inferred from inter-domain routers that run BGP and public projects like Oregon Route Views
Inferring both is difficult, and often inaccurate
Early work
Early models of topology used variants of Erdos-Renyi random graphs:
Nodes randomly distributed on a 2-dimensional plane
Nodes connected to each other with probability inversely proportional to distance
Soon researchers observed that random graphs did not represent real world networks
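A minimal sketch of a generator in this spirit (the exact connection probability used by early generators varied; the capped inverse-distance form below, with a tunable constant, is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)

def random_geometric_topology(n_nodes=100, c=0.05):
    """Place nodes uniformly in the unit square; connect each pair with
    probability inversely proportional to their Euclidean distance (capped at 1)."""
    pos = rng.random((n_nodes, 2))
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            d = np.linalg.norm(pos[i] - pos[j])
            if rng.random() < min(1.0, c / d):
                edges.append((i, j))
    return pos, edges

pos, edges = random_geometric_topology()
print("nodes:", len(pos), "edges:", len(edges))
```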
Real world topologies
Real networks exhibit:
Hierarchical structure
Specialized nodes (transit, stub, …)
Connectivity requirements
Redundancy
Characteristics incorporated into the Georgia Tech Internetwork Topology Models (GT-ITM) simulator (E. Zegura, K.Calvert and M.J. Donahoo, 1995)
So…are we done?
No! In 1999, Faloutsos, Faloutsos and Faloutsos published a paper demonstrating power law relationships in Internet graphs
Specifically, the node degree distribution exhibited power laws
That Changed Everything…..
Power laws in AS level topology
Faloutsos3 (Sigcomm ’99)
[Figure: log-log plot of frequency vs. degree for the AS-level topology derived from the BGP tables of 18 routers; the data follow a power law]
[Figure: log-log plot of the empirical CCDF, P(d > x) ~ x^(-α), vs. degree d; a power law with α ≈ 1.15]
GT-ITM abandoned..
GT-ITM did not give power law degree graphs
New topology generators and explanation for power law degrees were sought
The focus of generators became matching the degree distribution of the observed graph
Generating power law graphs
Goal: construct a network of size N with a degree power law, P(d > x) ~ x^(-α)
Power law random graph (PLRG) (Aiello et al.)
Inet (Chen et al.)
Incremental growth (BA) (Barabasi et al.)
General linear preference (GLP) (Bu et al.)
Power law random graph (PLRG) (Aiello et al.): operations
Assign degrees to nodes drawn from a power law distribution
Create k_v copies of node v, where k_v is the degree of v
Randomly match the node copies in the pool
Aggregate edges
The result may be disconnected and may contain multiple edges and self-loops
It contains a unique giant component for the right choice of parameters
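A minimal sketch of the PLRG construction (graph size and degree exponent are illustrative; duplicate edges and self-loops are kept, as noted above):

```python
import numpy as np

rng = np.random.default_rng(5)

def plrg(n_nodes=1000, alpha=2.1):
    """Power law random graph: draw power-law degrees, make k_v copies of node v,
    randomly match the copies, and aggregate the matched pairs into edges."""
    # Discrete power-law-ish degrees via a rounded Pareto draw, minimum degree 1
    degrees = np.maximum(1, np.floor(rng.pareto(alpha - 1, n_nodes) + 1)).astype(int)
    pool = np.repeat(np.arange(n_nodes), degrees)    # k_v copies of node v
    rng.shuffle(pool)
    if len(pool) % 2 == 1:
        pool = pool[:-1]                              # drop one copy if the pool is odd
    edges = list(zip(pool[0::2], pool[1::2]))         # random matching
    return degrees, edges

degrees, edges = plrg()
print("max degree:", degrees.max(), "edges (incl. multi/self):", len(edges))
```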
Inet (Chen et al)
Assumption: max degree and size grow exponentially over time
Algorithm:
Pick a date, calculate the maximum degree/size
Compute the degrees of the other nodes
Form a spanning tree over nodes with degree 2+
Attach the other nodes according to linear preference
Match the remaining nodes
Remove self loops and multi-edges
Barabasi model: fixed exponent, incremental growth
Initially, m0 nodes
Step: add a new node with m edges
Linear preferential attachment: connect to existing node i with probability ∏(k_i) = k_i / ∑_j k_j
May contain multi-edges, self-loops
[Figure: a new node attaching to existing nodes, with attachment probabilities such as 0.5 and 0.25]

General linear preference (GLP)
Motivation: greater flexibility in assigning preference; removes the need for rewiring
New preferential function: ∏(k_i) = (k_i - β) / ∑_j (k_j - β), with β in (-∞, 1)
Operations:
With probability p: add m new links
With probability 1-p: add a new node with m new links
Can achieve any exponent in (1, ∞)
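A minimal sketch of Barabasi-style incremental growth with linear preferential attachment (graph size and m are illustrative; multi-edges are possible, as noted above):

```python
import numpy as np

rng = np.random.default_rng(6)

def barabasi_albert(n_nodes=1000, m=2):
    """Grow a graph by adding nodes one at a time, each attaching m edges to
    existing nodes chosen with probability proportional to their current degree."""
    edges = [(0, 1)]                       # seed graph: two connected nodes
    targets = [0, 1]                       # node appears once per unit of degree
    for new_node in range(2, n_nodes):
        # preferential attachment: sampling from `targets` is degree-proportional
        chosen = rng.choice(targets, size=m)
        for t in chosen:
            edges.append((new_node, int(t)))
        targets.extend(chosen.tolist())
        targets.extend([new_node] * m)
    return edges

edges = barabasi_albert()
deg = np.bincount(np.array(edges).ravel())
print("max degree:", deg.max(), "mean degree:", deg.mean())
```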
“ Scale-free” graphs
Preferential attachment leads to “scale free” structure in connectivity
Implications of “scale free” structure:
Few centrally located and highly connected hubs
Network robust to random attack/node removal (the probability of hitting a hub is very low)
Network susceptible to catastrophic failure from targeted attacks (“Achilles heel of the Internet”, Albert, Jeong, Barabasi, Nature 2000)
Is the router-level Internet graph scale-free? No… (there is no Memphis!)
The emphasis on degree distribution ignores structure
The real Internet is very structured
The evolution of the graph is highly constrained
Topology constraints
Technology: router out-degree is constrained by processing speed; routers can either have a large number of low bandwidth connections, or a small number of high bandwidth connections
Geography: router connectivity is highly driven by geographical proximity
Economy: the capacity of links is constrained by the technology that nodes can afford, the redundancy/performance they desire, etc.
Optimization based models for topology
HOT-1: Highly Optimized Tolerance (Doyle et al., Caltech, USC, ISI, AT&T, …)
HOT-2: Heuristically Optimized Tradeoffs (Fabrikant, Koutsoupias, Papadimitriou, Berkeley)
HOT-3: a variant of HOT-2 (Chang, Jamin, Willinger, Michigan, AT&T)
Fabrikant HOT
Each new node solves the local optimization problem to find a target node to connect to.
Each new node i connects to an existing node j that minimizes the weighted sum of two objectives: min_j (α·d_ij + h_j), where α weights the two costs
d_ij (last mile cost) = Euclidean distance from i to j
h_j (transmission delay cost) = average hop distance from j to all other nodes
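A minimal sketch of this growth process (the weight α and the number of nodes are illustrative; h_j is recomputed exactly with a breadth-first search over the current tree, which is slow but keeps the sketch simple):

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(7)

def avg_hop_distance(adj, src, n):
    """Average hop count from src to all other existing nodes (BFS over the tree)."""
    dist = [-1] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist) / (n - 1) if n > 1 else 0.0

def fabrikant_hot(n_nodes=200, alpha=4.0):
    """Grow a tree: node i attaches to the existing j minimizing alpha*d_ij + h_j."""
    pos = rng.random((n_nodes, 2))
    adj = [[] for _ in range(n_nodes)]
    for i in range(1, n_nodes):
        costs = []
        for j in range(i):
            d_ij = np.linalg.norm(pos[i] - pos[j])       # last mile cost
            h_j = avg_hop_distance(adj, j, i)            # delay cost in current graph
            costs.append(alpha * d_ij + h_j)
        j_best = int(np.argmin(costs))
        adj[i].append(j_best)
        adj[j_best].append(i)
    return pos, adj

pos, adj = fabrikant_hot()
print("degree of node 0:", len(adj[0]))
```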
Modified Fabrikant HOT
Univariate HOT model. Criteria: (i) AS geography.
Bivariate HOT model. Criteria: (i) AS geography, (ii) AS business model.
Various extensions