Bandwidth estimation in computer networks: measurement techniques & applications Constantine...

Bandwidth estimation in computer networks:

measurement techniques &applications

Constantine Dovrolis

College of ComputingGeorgia Institute of Technology

The Internet as a “black box”

Several network properties are important for applications and transport protocols: Delay, loss rate, jitter, capacity, congestion, load, etc

But, routers and switches do not provide direct feedback to end-hosts (except ICMP, also of limited use) Mostly due to scalability, policy, and simplicity factors

Can we guess what is in the black box?

End-systems can infer network state through end-to-end (e2e) measurements Without any feedback from routers Objectives: accuracy, speed, non-intrusiveness

probe packets

Bandwidth estimation in packet networks

Bandwidth estimation (bwest): Inference of various throughput-related metrics

with end-to-end network measurements Early works:

Keshav’s packet pair method for congestion control (‘89) Bolot’s capacity measurements (’93) Carter-Crovella’s bprobe and cprobe tools (’96) Jacobson’s pathchar – per hop capacity estimation (‘97) Melander’s TOPP avail-bw estimation (’00)

In last ± 5 years: Several new estimation techniques Many research papers at prestigious conferences More than a dozen of new measurement tools Several applications of bwest methods Significant commercial interest in bwest technology

Overview This talk is an overview of the most important

developments in bwest area over the last 10 years A personal bias could not be avoided..

Overview Bandwidth-related metrics Capacity estimation

Packet pairs and CapProbe technique Available bandwidth estimation

Iterative probing: Pathload and PathChirp Direct probing: Spruce Comparison of tools Percentile estimation: Pathvar

Applications Overlay-based Video Streaming SOcket Buffer Auto-Sizing (SOBAS)

Network bandwidth metrics

Ravi S.Prasad, Marg Murray, K.C. Claffy, Constantine Dovrolis,

“Bandwidth Estimation: Metrics, Measurement Techniques, and Tools,”

IEEE Network, November/December 2003.

Capacity definition Maximum possible end-to-end throughput at IP layer

In the absence of any cross traffic Achievable with maximum-sized packets

If Ci is capacity of link i, end-to-end capacity C defined as:

Capacity determined by narrow link

iHi

CC,...,1

min

Available bandwidth definition Per-hop average avail-bw:

Ai = Ci (1-ui)

ui: average utilization A.k.a. residual capacity

End-to-end avg avail-bw A:

Determined by tight link ISPs measure per-hop

avail-bw passively (router counters, MRTG graphs)

iHi

AA,...,1

min

The avail-bw as a random process

Instantaneous utilization ui(t): either 0 or 1 Link utilization in (t, t+)

Averaging timescale: Available bandwidth in (t, t+)

End-to-end available bandwidth in (t, t+)

t

t

ii dttuttu )(1

),(

)],(1[),( ttuCttA iii

),(min),(,...,1

ttAttA iHi

Available bandwidth distribution Avail-bw has significant variability

Need to estimate second-order moments, or even better, the avail-bw distribution

Variability depends on averaging timescale Larger timescale, lower variance

Distribution is Gaussian-like, if >100-200 msec and with sufficient flow multiplexing

E2E Capacity estimation (a’):

Packet pair technique

Constantine Dovrolis, Parmesh Ramanathan, David Moore,

“Packet Dispersion Techniques and Capacity Estimation,”

In IEEE/ACM Transactions on Networking, Dec 2004.

Packet pair dispersion Packet Pair (P-P) technique

Originally, due to Jacobson & Keshav

Send two equal-sized packets back-to-back Packet size: L Packet trx time at link i:

L/Ci

P-P dispersion: time interval between last bit of two packets

Without any cross traffic, the dispersion at receiver is determined by narrow link:

iinout C

L,max

C

L

C

L

iHi

R

,...,1max

Cross traffic interference Cross traffic packets can affect P-P dispersion

P-P expansion: capacity underestimation P-P compression: capacity overestimation

Noise in P-P distribution depends on cross traffic load Example: Internet path with 1Mbps capacity

Multimodal packet pair distribution Typically, P-P distribution includes several local modes

One of these modes (not always the strongest) is located at L/C Sub-Capacity Dispersion Range (SCDR) modes:

P-P expansion due to common cross traffic packet sizes (e.g., 40B, 1500B)

Post-Narrow Capacity Modes (PNCMs): P-P compression at links that follow narrow link

E2E Capacity estimation (b’):

CapProbe

Rohit Kapoor, Ling-Jyh Chen, Li Lao, Mario Gerla, M. Y. Sanadidi,

"CapProbe: A Simple and Accurate Capacity Estimation Technique,"

ACM SIGCOMM 2004

Compression of p-p dispersion First packet queueing => compressed dispersion

Capacity overestimation

Expansion of p-p dispersion Second packet queueing => expansion of dispersion

Capacity underestimation

CapProbe Both expansion and compression result from queuing

delays Key insight: packet pair with zero queueing delay would

yield correct estimate measure RTT for each probing packet of packet pair estimate capacity from pair with minimum RTT-sum

CapacityCapacity

Measurements - Internet, Internet2

To

UCLA-2 UCLA-3 UA NTNU

Time Capacity Time Capacity Time Capacity Time Capacity

CapProbe

0’03 5.5 0’01 96 0’02 98 0’07 97

0’03 5.6 0’01 97 0’04 79 0’07 97

0’03 5.5 0’02 97 0’17 83 0’22 97

Pathrate

6’10 5.6 0’16 98 5’19 86 0’29 97

6’14 5.4 0’16 98 5’20 88 0’25 97

6’5 5.7 0’16 98 5’18 133 0’25 97

Pathchar

21’12 4.0 22’49 18 3 hr 34 3 hr 34

20’43 3.9 27’41 18 3 hr 34 3 hr 35

21.18 4.0 29’47 18 3 hr 30 3 hr 35

• CapProbe implemented using PING packets, sent in pairs

Avail-bw estimation (a’):

Pathload

Manish Jain, Constantine Dovrolis,“End-to-End Available Bandwidth:

Measurement Methodology, Dynamics, and Relation with TCP Throughput,”

IEEE/ACM Transactions on Networking, Aug 2003.

Probing methodology

Sender transmits periodic packet stream of rate R K packets, packet size L, interarrival T = L/R

Receiver measures One-Way Delay (OWD) for each packet D(k) = tarv(k) - tsnd(k) OWD variations: Δ(k) = D(k+1) – D(k)

Independent of clock offset between sender/receiver

With stationary & fluid-modeled cross traffic: If R > A, then Δ(k) > 0 for all k Else, Δ(k) = 0 for all k

Self-loading periodic streams

Increasing OWDs means R>A Almost constant OWDs means R<A

Example of OWD variations 12-hop path from U-Delaware to U-Oregon

K=100 packets, A=74Mbps, T=100μsec Rleft = 97Mbps, Rright=34Mbps

Ri

Iterative probing and Pathload

Iterative probing in Pathload: Send multiple probing streams (duration τ) at various rates Each stream samples path at different time interval Outcome of each stream is either Ri > A or Ri < A Estimate upper and lower bound for avail-bw variation range

Avail-bw time series from NLANR trace

Avail-bw estimation (b’):

PathChirp

Vinay Ribeiro, Rolf Riedi, Jiri Navratil, Rich Baraniuk, Les Cottrell,

“PathChirp: Efficient Available Bandwidth Estimation,”PAM 2003

Chirp Packet Trains

Exponentially decrease packet spacing within packet train

Wide range of probing rates

Efficient: few packets 100Mbps-1 packets, 134.1

Chirps vs. CBR Trains

Multiple rates in each chirping train Allows one estimate per-chirp Potentially more efficient estimation

Comparison with Pathload

• 100Mbps links• pathChirp uses 10

times fewer bytes for comparable accuracy

Available bandwidth

Efficiency Accuracy

pathchirp pathload pathChirp10-90%

pathloadAvg.min-max

30Mbps 0.35MB 3.9MB 19-29Mbps 16-31Mbps



Avail-bw estimation (c’):

Direct probing & Spruce

Jacob Strauss, Dina Katabi, and Frans Kaashoek

“A Measurement Study of Available Bandwidth Estimation Tools,”

IMC 2003

Direct probing

Iterative methods are relatively slow and they need to probe at multiple rates

Direct probing methods Probe only once at a rate Ri and determine

avail-bw from output rate Ro

Spruce (IMC’03) followed this approach With a single-link fluid model (capacity Ct):

Sender sends periodic stream with input rate Ri

Receiver measures output rate Ro

)1( o

tit R

CRCA

What’s wrong with Direct Probing?

Two major issues: Tight link capacity Ct is assumed to be known

What if tight link is different than narrow link? Input rate Ri equal to source rate (set equal to Ct)

But what would happen in a multi-hop path? Probing stream can arrive at tight link with lower rate

than source rate (due to a link with lower avail-bw than Ct before the tight link)

Results in avail-bw underestimation For details, see CCR paper, Oct 2006, by Lao, Dovrolis

and Sanadidi, “The Probe Gap Model can Underestimate the

Available Bandwidth of Multihop Paths”

)1( o

tit R

CRCA

Avail-bw estimation (d’): percentile estimation

Manish Jain, Constantine Dovrolis,“End-to-End Estimation of the Available

Bandwidth Variation Range,”ACM SIGMETRICS 2005

Avail-bw distribution and percentiles

Avail-bw random process, in timescale A(t) Assume stationarity

Marginal distribution of A:

F(R) = Prob [A ≤ R]

Ap :pth percentile of A, such that p = F(Ap)

Objective: Estimate variation range [AL, AH] for given averaging timescale ALand AH are pL and pH percentiles of A

Typically, pL =0.10 and pH =0.90

Percentile sampling

Question : Which percentile of A corresponds to rate R?

Given R and estimate F(R)

Assume that F(R) is inversible Sender transmits periodic packet stream of rate R

Length of stream = measurement timescale Receiver classifies stream as

Type-G if A ≤ R: I(R)= 1 with probability F(R)

Type-L otherwise: I(R)= 0 with probability 1-F(R) Collect N samples of I(R) by sending N streams of rate R

Number of type-G streams: I(R,N)= Σi Ii(R)

E[ I(R,N) ] = F(R) * N Percentile rank of probing rate R: F(R) = E[I(R,N)] / N

Use I(R,N)/N as estimator of F(R)

Non-parametric estimation

Does not assume specific avail-bw distribution Iterative algorithm

Stationarity requirement across iterations N-th iteration: probing rate Rn

Use percentile sampling to estimate percentile rank of Rn

To estimate the upper percentile AH with pH = F(AH): fn = I(Rn,N)/N If fn is between pH±report AH = Rn

Otherwise, If fn > pH + , set Rn+1 < Rn

If fn < pH - , set Rn+1 > Rn

Similarly, estimate the lower percentile AL

Validation example Verification using real Internet traffic traces

b=0.05 b=0.15

Non-parametric estimator tracks variation range within 10-20% Selection of b depends on traffic

Traffic spikes/dips may not be detected if b is too small But larger b causes larger MSRE

Parametric estimation Assume Gaussian avail-bw distribution

Justified assumption for large degree of traffic multiplexing And/or for long averaging timescale (>200msec)

Gaussian distribution completely specified by Mean and standard deviation

pth percentile of Gaussian distribution Ap = + -1(p)

Sender transmits N probing streams of rates R1 and R2 Receiver determines percentiles ranks corresponding to R1 and R2 and can be then estimated by solving

R1 = + -1(p1) R2 = + -1(p2)

Variation range is then calculated from: AH = + -1(pH) AL = + -1(pL)

Validation example

Gaussian traffic non-Gaussian traffic

Verification using real Internet traffic traces Evaluated with both Gaussian and non-Gaussian traffic

Parametric algorithm is more accurate than non-parametric algorithm, when

traffic is good match to Gaussian model in non-stationary conditions

Experimental comparison of avail-bw estimation tools

Alok Shriram, Marg Murray, Young Hyun, Nevil Brownlee, Andre Broido, k claffy,”Comparison of Public End-to-End

Bandwidth Estimation Tools on High-Speed Links,”

PAM 2005

Testbed topology

Accuracy comparison - SmartBitsDirection 1, Measured AB

Direction 2, Measured AB

Actual AB

Accuracy comparison - TCPreplayActual Available Bandwidth

Measured Available Bandwidth

Measurement latency comparison

• Abing: 1.3 to 1.4 s

• Spruce: 10.9 to 11.2 s

• Pathload: 7.2 to 22.3 s

• Patchchirp: 5.4 s

• Iperf: 10.0 to 10.2 s

Probing traffic comparison

Applications of bandwidth estimation

Applications of bandwidth estimation

Large TCP transfers and congestion control Bandwidth-delay product estimation Socket buffer sizing

Streaming multimedia Adjust encoding rate based on avail-bw

Intelligent routing systems Overlay networks and multihoming Select best path based on capacity or avail-bw

Content Distribution Networks (CDNs) Choose server based on least-loaded path

SLA and QoS verification Monitor path load and allocated capacity

End-to-end admission control Network “spectroscopy” Several more..

Overlay-based Video Streaming

M. Jain and C. Dovrolis,“Path Selection using Available Bandwidth

Estimation in Overlay-based Video Streaming,”

published at IFIP Networking 2007.

Motivation Video/IPTV next “killer-application” in the Internet IP networks present several challenges

Network impairments such as losses and jitter Lack of QoS guarantees

Previous approaches Adapt encoding rate

Video quality not consistent Use proactive error correction techniques such as

forward error correction (FEC) or retransmission Bandwidth overhead for FEC Potentially larger playback delay

Error concealment techniques Limited applicability & effectiveness

Exploiting Path Diversity Multihoming and overlay networks make multiple paths

available between sender and receiver Applications can dynamically switch from one path to

another Based on observed (or predicted) performance Serves as an additional adaptation mechanism

Schemes using path diversity based on network measurement have been explored Path selection techniques: Tao et al. [ACM Multimedia ‘04]

and Amir et al. [NOSSDAV ‘05] Multi-description coding techniques: Begen et al. [Signal

Processing ‘05] and Apostolopoulos et al. [INFOCOM ‘02] They use only loss and jitter measurements Employ dummy packets for measurement

Our Approach We consider use of overlay network for video

streaming To maximize perceived video quality

Novel features in our approach: Use avail-bw to drive dynamic path selection Use VQM technique instead of PSNR to measure

video quality Described in ITU-T recommendation J.144

Use data packets to substitute “dummy” packets for network measurements

Objective: Evaluate the performance of different path selection schemes in terms of perceived video quality

Overlay based Video Streaming Overlay nodes run measurement

module (m-module) To measure network performance

Video stream delivered via at most two-hop path Based on results by Zhu, Dovrolis

and Ammar, 2006 M-module uses active measurements

to estimate Loss-rate Jitter Avail-bw: average and variation

range Uses video packets for in-band

measurements In paths with video streams

Self Loading Periodic Streams (SLoPS)

Objective: Determine whether avail-bw A is greater or smaller than a given rate R

A > R or A <= R Sender transmits periodic packet stream of rate R

K packets, packet size L, departure spacing T = L/R Receiver measures One-Way Delay (OWD) for each

packet D(k) = tarv(k) - tsnd(k)

OWD variations: Δ(k) = D(k+1) – D(k) Independent of clock offset between sender/receiver

With stationary & fluid-modeled cross traffic: If R > A, then Δ(k) > 0 for all k Else, Δ(k) = 0 for all k

In-band Avail-bw Estimation

In-band avail-bw estimation uses video packets for probing network paths

Ingress M-module shapes the incoming probing packets to rate Rp

If incoming rate Rv < Rp Add decreasing amount of delay i

Else, add increasing amount of delay i

Insert i in packet header Egress M-module re-shapes the video stream to rate Rv using i

Path Selection schemes We consider 4 path selection schemes

Based on choice of measured network performance metric Loss based (LPS)

Loss-rate monitored during 3 second period Path with lowest loss-rate selected

Jitter based (JPS) Jitter of successive packet pairs measured in 3 second period Path with minimum 90th percentile of jitter measurements

selected If jitter is same, then path with lowest loss-rate is selected

Avail-bw based Average avail (A-APS) Lower bound of avail-bw variation range (L-APS) In current path:

If avail-bw < twice of video rate; then switch to path with maximum avail-bw

Experimental Setup

Testbed details: Video stream transmitted from node B to E

A 2-minute MPEG-2 encoded clip from SMPTE test video sequences

M-module on nodes B, D, and E Cross traffic generated by replaying NLANR packet traces

Video Quality Estimation

Performance of path selection schemes evaluated based on: Video quality

Measured using VQM tool http://www.its.bldrdoc.gov/n3/video/vqmsoftware.ht

m VQM compares the original video stream with

the received video stream Lower VQM score implies higher quality

User-abort probability Based on VQM scores of 10-second video

segments Path switching frequency

http://www.its.bldrdoc.gov/n3/video/vqmsoftware.htm

http://www.its.bldrdoc.gov/n3/video/vqmsoftware.htm

Results: Experiment 1

L-APS performs better than other schemes A-APS performance similar to LPS

Average avail-bw does not capture the variability of avail-bw distribution

Ranking of 4 schemes in terms of abort fraction is similar to long term VQM scores


L-APS has lowest path switching frequency JPS has higher path switching frequency than L-APS

Similar path switching frequency to A-APS


Avail-bw based path selection results in better perceived video quality Compared to algorithms relying on jitter or loss-rate

Path Switching versus FEC

We compared performance of simplest form of Reed-Solomon FEC code (n, k) n-k out of n packets carry FEC packets In our experiments n = k+1

L-APS performs consistently better than FEC

Summary

Explored the use of measurement driven path selection schemes for video streaming

Avail-bw estimates based path selection shows better performance Compared to other network metrics

An interesting next step would be to use real-time video quality measurements to drive path selection schemes

Also, hybrid schemes involving use of both path selection and FEC is unexplored

Improved TCP Throughput with BwEst

Ravi S. Prasad, Manish Jain, Constantine Dovrolis,

“Socket Buffer Auto-Sizing for High-Performance Data Transfers,”

Journal of Grid Computing, 2003

TCP performs poorly in fat-long paths Basic idea in TCP congestion control:

Increase congestion window until packet loss occurs Then reduce window by a factor of two (multiplicative decrease)

Consequences Increased loss-rate Avg. throughput less than

available bandwidth Large delay-variation

Example: 1Gbps path, 100msec RTT, 1500B pkts Single loss ≈ 4000 pkts

window reduction Recovery from single loss

≈ 400 sec Need very small loss

probability (ρ<2*10-8) to saturate path

Can we avoid congestive losses w/o changing TCP?

Ws = min {Wc, Wr} But Wr<= S

S: rcv socket buffer size We can limit S so that connectionis limited by receive window: Ws = S Non-congested path:

Throughput R(S) = S/T(S) R(S) increases with S until it reaches MFT MFT: Maximum Feasible Throughput (onset of congestion) Objective: set S close to MFT, to avoid any losses and

stabilize TCP send window to a safe value Congested path:

Path with losses before start of target transfer Limiting S can only reduce throughput

Socket buffer auto-sizing (SOBAS) Apply SOBAS only in non-congested paths Receiving application measures rcv-throughput Rrcv

When receive throughput becomes constant: Connection has reached its maximum throughput Rmax

If the send window becomes any larger, losses will occur Receiving application limits rcv socket buffer size S:

S = RTT * Rmax

With congestion unresponsive cross traffic: Rmax = A With congestion responsive cross traffic: Rmax = MFT

SOBAS does not require any changes in TCP But estimation of RTT and congestion-status requires

“ping-like” probing

Measurements at WAN path

Path: GaTech to LBL,CA SOBAS flow get higher average throughput than non-

SOBAS flow because former avoids all congestion losses SOBAS does not significantly increase path’s RTT

Non-SOBAS SOBAS

To conclude..

Things I have not talked about Other estimation tools & techniques

ABget, pipechar, STAB, pathneck, IGI/PTR, … Per-hop capacity estimation (pathchar, clink, pchar)

Other applications Bwest in new TCPs (e.g., TCP Westwood, TCP FAST, PCP) Bwest in wireless nets Bwest in multihoming Bwest in bottleneck detection

Theoretical analysis of packet pair/train dispersion Unavoidable estimation bias in avail-bw measurements due to

traffic burstiness Diminishes as packet train length increases

Scalable bandwidth measurements in networks with many monitoring nodes

Practical issues Interrupt coalescence Traffic shapers Non-FIFO queues

Conclusions

We know how to do the following: Estimate e2e capacity & avail-bw Apply bwest in several applications

We know that we cannot do the following: Estimate per-hop capacity with VPS techniques Avoid avail-bw estimation bias when traffic is

bursty and/or path has multiple bottlenecks We do not yet know how to do the following:

Scalable but accurate bandwidth measurements Integrate with more applications

Many research questions are still open Help wanted!

Bandwidth estimation in computer networks: measurement techniques & applications Constantine...

Documents

Transcript of Bandwidth estimation in computer networks: measurement techniques & applications Constantine...