Network Tomography Based on Flow Level Measurements
Dogu Arifler, Gustavo de Veciana, and Brian L. Evans
The University of Texas at Austin
IEEE International Conference on Acoustics, Speech, and Signal Processing
Montréal, Canada, May 18, 2004
http://www.wncg.org http://www.ece.utexas.edu
2
Outline
Introduction
  Motivation for inferring network resource sharing
  Flow level measurements
Methodology for inferring network resource sharing
  Sampling of flow class throughput processes
  Dimensionality reduction
Validation with measured data
  TCP measurements from UT Austin’s border router
  Statistical accuracy of estimates
Conclusion
3
Inference of congested resource sharing
Motivation: Network managers need information about resource sharing in other networks to better plan for services and diagnose performance problems
  Internet service providers need to diagnose configuration errors and link failures in peer networks
  Content providers need to balance workload and plan cache placement
Problem: In general, properties of networks outside one’s administrative domain are unknown; there is little or no information on routing, topology, or link utilizations
Solution: Network tomography, i.e., inferring characteristics of networks from available network traffic measurements
4
Network tomography
Previous work based on packet level measurements
  Correlation of end-to-end packet losses and delays [Rubenstein, Kurose & Towsley, 2002]
  Inspection of arrival order of packets using probe packets [Rabbat, Nowak & Coates, 2002]
  Data intensive: each packet must be collected and stored
  Complex to analyze: high variability over different time scales [Feldmann, Gilbert, Huang & Willinger, 2002]
Propose to use flow level measurements instead
  A flow is a sequence of packets associated with a given instance of an application [IETF RFC 2722, 1999]
  Packets composing a flow correspond to the transfer of a Web page, a file, an e-mail message, etc.
  Passive flow level measurements are available at the local site
5
Flow level measurements
Flow records
  Provide summary information
  Easier to collect and store than packet traces
  Collected by networking equipment (e.g., Cisco NetFlow, sFlow, Argus)
Flow records contain
  Source/destination IP addresses, port numbers, number of packets and bytes in the flow, and start and end times of the flow
~80% of Internet flows are TCP flows [http://www.caida.org]
[Figure: packets of a flow observed on a monitored link, showing start time, end time, response time, and the timeout that ends a record; records, keyed by identifiers, are stored in a data warehouse]
6
TCP flows
TCP adapts its data transmission rate to the available network capacity
  Congested link bandwidth sharing is roughly fair for flows that have similar packet loss rates and roundtrip times
  Correlated link bandwidth allocation among flows results in correlated flow performance measures
TCP flow performance measure: perceived throughput
  Amount of data in bytes (flow size) divided by response time
Premise: Throughputs of TCP flows that temporally overlap at a congested resource are correlated
[Figure: two temporally overlapping flows, flow 1 and flow 2, sharing the available capacity over time]
7
Throughput of a flow class
Flow class: a collection of flow records that have a common identifier, e.g., source/destination address
How can one infer which flow classes share resources? Correlate flow class throughput processes
[Figure: flow records collected at a measurement site, grouped into throughput processes for class 1 and class 2 over time]
8
Which flow class throughput samples can be used to capture flow class throughput correlations?
Conditional sampling of random processes
  Construct a correlation matrix R of pairwise correlations
  Estimate the throughput correlation between a pair of classes using class throughput samples at times when both classes of the pair are active
  N(T): the number of discrete intervals over which classes ci and cj are both active
[Figure: example activity of flow classes over discrete time index n; consider the red and blue classes]
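The conditional sampling step can be sketched in a few lines. This is a minimal illustration (not the paper's code) that assumes each class's throughput process is stored as a NumPy array with NaN marking intervals where the class is inactive:

```python
import numpy as np

def conditional_correlation(x, y):
    """Correlate two flow class throughput series using only the
    discrete intervals where both classes are active (non-NaN)."""
    mask = ~np.isnan(x) & ~np.isnan(y)   # the N(T) overlapping intervals
    if mask.sum() < 2:                   # too few joint samples to correlate
        return np.nan
    return np.corrcoef(x[mask], y[mask])[0, 1]

# Hypothetical throughput samples; NaN marks inactive intervals
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, np.nan])
y = np.array([1.1, np.nan, 3.0, 4.2, 5.1, 6.0])
r = conditional_correlation(x, y)        # uses intervals 0, 3, and 4 only
```

The pairwise estimates obtained this way fill the correlation matrix R that the factor analysis operates on.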
9
Exploratory factor analysis
Correlation structure is captured by a few latent factors
Orthogonal factor model: p flow classes and m factors, where m ≤ p
  Λij is the loading (or weight) of factor Fj on the i-th variable
Estimate Λ and the specific variances Ψ using principal components analysis
  m is determined by H. F. Kaiser’s rule [1960]: principal components whose variances are greater than 1 are significant factors
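Kaiser's rule amounts to counting eigenvalues of the correlation matrix that exceed 1. The matrix below is hypothetical, chosen so that classes (1, 2) and (3, 4) form two correlated pairs:

```python
import numpy as np

def num_significant_factors(R):
    """Kaiser's rule: retain principal components of the correlation
    matrix whose variances (eigenvalues) exceed 1."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))

# Hypothetical 4-class correlation matrix with two correlated pairs
R = np.array([[1.0, 0.8, 0.1, 0.1],
              [0.8, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.7],
              [0.1, 0.1, 0.7, 1.0]])
m = num_significant_factors(R)   # two eigenvalues exceed 1, so m = 2
```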
10
Inference of resource sharing
Consider five flow classes with two significant factors identified
The 5×2 loading matrix Λ has a row (λi1, λi2) for each class i; the factor loading with the largest magnitude in each row is boxed
  Classes 1, 2, and 5 load most heavily on Factor 1: they share one resource
  Classes 3 and 4 load most heavily on Factor 2: they share another resource
[Figure: example topology with sources 1–5 and destinations 1–5]
The paper validates the approach using known distributions of flow sizes and flow arrivals for two topologies
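The largest-magnitude-per-row decision rule can be sketched as follows, using a hypothetical loading matrix shaped like the five-class, two-factor example above:

```python
import numpy as np

# Hypothetical rotated loadings: rows are classes 1-5, columns are factors
Lambda = np.array([[0.9, 0.1],
                   [0.8, 0.2],
                   [0.1, 0.9],
                   [0.2, 0.8],
                   [0.7, 0.3]])

# Assign each class to the factor with the largest-magnitude loading in
# its row; classes assigned to the same factor are inferred to share a
# congested resource.
assignment = np.argmax(np.abs(Lambda), axis=1)
groups = {}
for cls, fac in enumerate(assignment, start=1):
    groups.setdefault(int(fac), []).append(cls)
# groups -> {0: [1, 2, 5], 1: [3, 4]}
```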
11
Measured data: preprocessing
Two NetFlow datasets from UT Austin’s border router
Assume that traffic is stationary over one-hour periods
Choose two incoming flow classes that are very likely to experience congestion at the server
  Select IP addresses associated with AOL and HotMail
  Divide each class into two: AOL1, AOL2 and HotMail1, HotMail2
Filter flow records based on
  Packets: discard flows consisting of only 1 packet
  Duration: discard flows with duration shorter than 1 second
  Size: discard flows with sizes < 8 kB or > 64 kB

  Dataset      Collection date  Period          TCP records
  Dataset2002  11/06/2002       12:58-2:07 PM   5,173,385
  Dataset2004  01/21/2004       12:58-1:26 PM   4,440,697
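The filtering step can be sketched as below; the flow records are made up, and kB is taken as 1000 bytes (the slides do not specify the unit base):

```python
# Hypothetical flow records; thresholds follow the slide: drop 1-packet
# flows, flows shorter than 1 second, and sizes outside 8-64 kB.
flows = [
    {"packets": 1,  "duration_s": 0.5, "bytes": 500},      # dropped: 1 packet
    {"packets": 12, "duration_s": 3.0, "bytes": 16_000},   # kept
    {"packets": 40, "duration_s": 9.0, "bytes": 120_000},  # dropped: too large
    {"packets": 9,  "duration_s": 2.0, "bytes": 9_000},    # kept
]

def keep(f):
    return (f["packets"] > 1
            and f["duration_s"] >= 1.0
            and 8_000 <= f["bytes"] <= 64_000)

filtered = [f for f in flows if keep(f)]   # 2 of the 4 records survive
```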
12
Measured data: component variances
Parent class (AOL and HotMail) throughput correlation is -0.07 for Dataset2002 and 0.05 for Dataset2004
95% bootstrap confidence intervals for the variances of the principal components of the 4 classes AOL1, AOL2, HotMail1, and HotMail2:

  Principal component  Dataset2002 95% CI  Dataset2004 95% CI
  1                    (1.5457, 1.7900)    (1.3646, 1.4786)
  2                    (1.0861, 1.3206)    (1.0237, 1.1603)
  3                    (0.7058, 0.9150)    (0.8230, 0.9690)
  4                    (0.2194, 0.4458)    (0.5413, 0.6379)

The 2 significant factors have explanatory power of 72% for Dataset2002 and 63% for Dataset2004
13
Measured data: factor loadings
Based on the 2 significant factors, determine factor loadings
Rotated factor loading estimates
  Rows correspond to the classes AOL1, AOL2, HotMail1, and HotMail2
  Columns correspond to shared infrastructure
Estimate 95% bootstrap confidence intervals for the loadings to establish accuracy†
With 95% confidence, we can identify which flow classes share infrastructure
[Table: rotated factor loading estimates for Dataset2002 and Dataset2004 (numeric values not preserved in the transcript)]
† Dogu Arifler, Network Tomography Based on Flow Level Measurements, Ph.D. Dissertation, 2004.
14
Conclusion
Contributions
  Application of a structural analysis technique, factor analysis, to explore network properties
  Methodology for inferring resource sharing
  Use of bootstrap methods to make inferential statements about resource sharing
Possible applications
  Network monitoring and root cause analysis of poor performance
  Problem diagnosis and off-line evaluation of the congestion status of networks
  Route configuration by Internet service providers
15
Backup slides
16
Flow level performance of elastic traffic
Elastic traffic can tolerate rate variations
  This implies that a closed-loop control, such as TCP, can be applied end-to-end on flows
Additive increase, multiplicative decrease (AIMD) congestion avoidance algorithm of TCP
  The transmission rate increases linearly in the absence of packet loss, and is halved when there is packet loss
For a given roundtrip time RTT and packet loss rate p, flow throughput is

  y(p) = constant / (RTT · √p)

so the loss rate p is related to the throughput. However, y(p) depends on the number of flows in progress
  Packet level dynamics are determined by flow level dynamics
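As a numerical illustration of the square-root law (the constant √(3/2) and the MSS value are conventional choices for the model, not values from the slides):

```python
from math import sqrt

def tcp_throughput(rtt_s, p, mss_bytes=1460, c=sqrt(3.0 / 2.0)):
    """Square-root TCP throughput model: y(p) = c * MSS / (RTT * sqrt(p))."""
    return c * mss_bytes / (rtt_s * sqrt(p))

# Halving the loss rate raises throughput by a factor of sqrt(2)
y_high_loss = tcp_throughput(rtt_s=0.1, p=0.02)
y_low_loss = tcp_throughput(rtt_s=0.1, p=0.01)
ratio = y_low_loss / y_high_loss   # sqrt(2), about 1.414
```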
17
Notes on processor sharing
When there are n customers in the system, each receives service at a rate of 1/n sec/sec
  All customers share the capacity equally
Two abstractions:
  Customers are given the full capacity on a part-time basis
  Customers are given a fractional capacity on a full-time basis
Why does TCP realize processor sharing?
  When there are n flows in a single-bottleneck system, the protocol tends to share bandwidth roughly equally among flows (for flows with similar RTTs and packet loss rates). This is processor sharing!
  More generally, TCP’s additive-increase/multiplicative-decrease (AIMD) achieves fair sharing [Massoulie and Roberts, 2002]
18
Notes on factor analysis
Factor analysis vs. principal component analysis (PCA)
  In factor analysis, the primary goal is to explain correlations between variables (off-diagonal elements of the covariance/correlation matrix)
  In PCA, the primary goal is to explain variance (diagonal elements of the covariance/correlation matrix)
  PCA is usually used to find initial estimates of loadings
  Another related method, independent component analysis, looks at higher order moments
How do temporal correlations within a class’ throughput affect factor analysis?
  Serial correlations can be ignored when the interest is descriptive or exploratory in nature
  Such analysis has been successfully applied to econometric time series, biometric time series, etc.; see, e.g., Basilevsky 1993 or Jolliffe 2002
19
Confidence intervals for loadings
20
Interaction of coupled traffic
Consider a “linear” network to evaluate the effect of interactions of coupled network traffic
Can throughputs of two flow classes that do not share a link be correlated due to interactions through another flow class?
Results of fluid simulations show that the degree of correlation between throughputs of classes not sharing a link is negligible
[Figure: linear network with links 1, 2, and 3 connecting file servers 1–3 to 10 Mbps LANs with 10 workstations each]
21
Interaction of coupled traffic: an example
Consider the “linear” network below
Discard flows with sizes < 4 kB or > 32 kB
Based on the 2 significant factors, determine factor loadings
Rotated factor loading estimates
  Rows correspond to classes
  Columns correspond to shared links
[Figure: linear network with file servers 1–3 and 10 Mbps LANs with 10 workstations each; the two bottleneck links run at 80% utilization, background traffic utilizes 20% of the bottleneck links, and class utilizations are marked (40%), (40%), and (20%)]
22
The bootstrap
Validation with real data is extremely difficult!
  Unlike controlled simulations, we do not know routing information
We would like to be able to make inferential statements
  Estimate 95% confidence intervals for eigenvalues and loadings
  Modify Kaiser’s rule for selecting significant eigenvalues
The bootstrap, a computer-based method, can be used to compute confidence intervals [Efron and Tibshirani, 1993]
  From the data at hand, construct the empirical distribution F̂n and generate many realizations
  No distributional assumptions on the data are required
  Applicable to any statistic s(X), simple or complicated
[Diagram: B independent bootstrap samples X*1, …, X*B, each of size n, are drawn from F̂n; a replication of the statistic, s(X*b), is computed from each sample]
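A percentile-bootstrap confidence interval can be sketched as follows (a generic illustration with synthetic data, not the dissertation's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(x, stat, B=2000, alpha=0.05):
    """Percentile bootstrap: resample x with replacement B times, compute
    the statistic on each replicate, and return the (alpha/2, 1 - alpha/2)
    quantiles of the replications as a confidence interval."""
    reps = np.array([stat(rng.choice(x, size=len(x), replace=True))
                     for _ in range(B)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic "throughput" samples; 95% CI for the mean
x = rng.normal(loc=5.0, scale=1.0, size=200)
lo, hi = bootstrap_ci(x, np.mean)
```

The same routine applies unchanged when `stat` computes an eigenvalue or a factor loading of the resampled data's correlation matrix.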
23
Principal component method
Use spectral decomposition on R to estimate Λ and Ψ
  Eigenvalue–eigenvector pairs (λi, ξi), 1 ≤ i ≤ p, with λ1 ≥ λ2 ≥ … ≥ λp:

  R = λ1 ξ1 ξ1ᵀ + … + λm ξm ξmᵀ + … + λp ξp ξpᵀ

Determine the m “significant” eigenvalues of R using Kaiser’s rule [Kaiser, 1960]
  Since each normalized variable has variance 1, retain the eigenvalues greater than 1
  Variances of the factors are given by the eigenvalues
Keep the m significant terms and fold the remainder into the specific variances:

  R ≈ Λ̂Λ̂ᵀ + Ψ̂, where Λ̂ = [√λ̂1 ξ̂1, …, √λ̂m ξ̂m] and ψ̂i = 1 − Σj λ̂ij², with λ̂ij the (i, j) entry of Λ̂
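The principal component estimate of the loadings can be sketched as follows (the small correlation matrix R is hypothetical, for illustration):

```python
import numpy as np

def pc_loadings(R, m):
    """Principal component method: Lambda's columns are sqrt(l_j) * xi_j
    for the m largest eigenpairs (l_j, xi_j) of R; specific variances are
    psi_i = 1 - sum_j Lambda[i, j]**2, so that the approximation
    Lambda @ Lambda.T + diag(psi) matches R exactly on the diagonal."""
    eigvals, eigvecs = np.linalg.eigh(R)      # returned in ascending order
    order = np.argsort(eigvals)[::-1][:m]     # indices of the m largest
    Lam = eigvecs[:, order] * np.sqrt(eigvals[order])
    psi = 1.0 - np.sum(Lam ** 2, axis=1)
    return Lam, psi

# Hypothetical 3-class correlation matrix
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
Lam, psi = pc_loadings(R, 2)
# Lam @ Lam.T + np.diag(psi) approximates R
```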
24
Methodology for inferring resource sharing
1. Define the flow classes of interest, C
2. Set flow filtering thresholds for packets, duration, and size
3. Determine flows F that satisfy the filtering criteria
4. Compute flow class throughputs at discretized times
5. Through conditional sampling, estimate pairwise correlations
6. Find number of factors m using eigenvalues of the correlation matrix and modified Kaiser's rule
7. Perform exploratory factor analysis based on m factors
8. Rotate factor loadings using varimax rotation
9. Determine which flow classes have the largest loading on a given factor: Inference of shared congested resources
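Step 8's varimax rotation can be sketched with the standard SVD-based iteration (a generic implementation, not taken from the dissertation):

```python
import numpy as np

def varimax(Lam, max_iter=100, tol=1e-8):
    """Rotate a p x m loading matrix by an orthogonal matrix T chosen to
    maximize the variance of the squared loadings within each column,
    which pushes each row toward one dominant loading."""
    p, m = Lam.shape
    T = np.eye(m)
    crit = 0.0
    for _ in range(max_iter):
        L = Lam @ T
        u, s, vt = np.linalg.svd(
            Lam.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p))
        T = u @ vt
        if np.sum(s) < crit * (1 + tol):   # criterion stopped improving
            break
        crit = np.sum(s)
    return Lam @ T

# Hypothetical unrotated loadings for four classes and two factors
L0 = np.array([[0.7, 0.5],
               [0.8, 0.4],
               [0.4, 0.7],
               [0.3, 0.8]])
L_rot = varimax(L0)
```

Because the rotation is orthogonal, each row's sum of squared loadings (the communality) is unchanged; only the split across factors sharpens.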
25
Summary of methodology
[Diagram: flow filtering → conditional sampling → exploratory factor analysis → bootstrap, combining to perform network tomography]