An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

NOSSDAV 2009. Kuan‐Ta Chen, Academia Sinica; Chen‐Chi Wu and Chin‐Laung Lei, National Taiwan University; Chun‐Yin Huang, National Taiwan Ocean University

description

VoIP playout buffer dimensioning has long been a challenging optimization problem, as the buffer size must maintain a balance between conversational interactivity and speech quality. The conversational quality may be affected by a number of factors, some of which may change over time. Although a great deal of research effort has been expended in trying to solve the problem, how the research results are applied in practice is unclear. In this paper, we investigate the playout buffer dimensioning algorithms applied in three popular VoIP applications, namely, Skype, Google Talk, and MSN Messenger. We conduct experiments to assess how the applications adjust their playout buffer sizes. Using an objective QoE (Quality of Experience) metric, we show that Google Talk and MSN Messenger do not adjust their respective buffer sizes appropriately, while Skype does not adjust its buffer at all. In other words, they could provide better QoE to users by improving their buffer dimensioning algorithms. Moreover, none of the applications adapts its buffer size to the network loss rate, which should also be considered to ensure optimal QoE provisioning.

Transcript of An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Page 1: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

NOSSDAV 2009

An Empirical Evaluation of VoIP Playout Buffer Dimensioning for Skype, Google Talk, and MSN Messenger

Kuan‐Ta Chen, Academia Sinica

Chen‐Chi Wu and Chin‐Laung Lei, National Taiwan University

Chun‐Yin Huang, National Taiwan Ocean University

Page 2: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Playout Buffering

Voice needs to be played out smoothly, but the network may introduce delay variance (jitter)

(dejitter buffer)

Page 3: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

The Tradeoff

Buffering trades added delay for a higher chance that voice packets arrive in time for playout

Larger buffer size: better speech quality, lower conversational interactivity

Smaller buffer size: worse speech quality, higher conversational interactivity

Page 4: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Playout Buffer Dimensioning

Deciding the optimal buffer size: maintain a balance between conversational interactivity and speech quality

It’s challenging because so many factors are involved: network delay, network delay jitter, network loss, codec implementation, redundancy control, error recovery, transport protocol, etc.

Some of these factors change over time

Page 5: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Proposals on Buffer Dimensioning

Many algorithms have been proposed:

The k largest delay among the previous m delays [Naylor’85]

Inflate buffer size when packets arrive late, and shrink buffer size over time [Stone’95]

Weighted sum of EWMA of delay and delay jitter [Ramjee’94]

Automatic adjustment of EWMA weights [Narbutt’03]

Weight adjustment within talk bursts [Liang’03, Sreenan’03]
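
As an illustration of the EWMA-style proposals listed above, the following is a minimal sketch that tracks a smoothed delay and a smoothed delay variation and sets the playout delay to a weighted sum of the two. The function name, the default `alpha`, and the `jitter_weight` of 4 are assumptions chosen for illustration, not values taken from the slides.

```python
def ewma_playout_delays(delays_ms, alpha=0.998, jitter_weight=4.0):
    """Return the playout delay suggested after each observed packet delay.

    delays_ms: network delays of successive packets, in milliseconds.
    alpha: EWMA smoothing weight (assumed value; closer to 1 adapts more slowly).
    jitter_weight: multiplier applied to the jitter estimate (assumed value).
    """
    if not delays_ms:
        return []
    d_hat = float(delays_ms[0])  # smoothed delay estimate
    v_hat = 0.0                  # smoothed delay-variation (jitter) estimate
    suggested = [d_hat + jitter_weight * v_hat]
    for n in delays_ms[1:]:
        d_hat = alpha * d_hat + (1.0 - alpha) * n
        v_hat = alpha * v_hat + (1.0 - alpha) * abs(d_hat - n)
        # Weighted sum of the delay EWMA and the jitter EWMA
        suggested.append(d_hat + jitter_weight * v_hat)
    return suggested
```

A smaller `alpha` makes the estimate react faster to delay spikes at the cost of a less stable buffer size.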

Page 6: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

We are curious about …

Skype: 405 million registered users (15 million online)

Does a gap exist between the research community and software practitioners?

Page 7: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

In other words …

Do commercial products really adopt any of these proposals?

What’s the performance of commercial products (in terms of buffer dimensioning)?

Page 8: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Our Contribution

A systematic measurement methodology for measuring VoIP playout buffer size

Show, based on QoE measures, that real‐life VoIP applications do not adjust their buffer sizes appropriately

A regression‐based algorithm to compute the optimal buffer size given a network condition

Lightweight computation, so it can be applied at run time

Page 9: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Talk Progress

Overview

Measuring Applications’ Buffer Size: Experiment Methodology, Measurement Results

Deriving Optimal Buffer Size: Methodology, Derived Buffer Sizes

Evaluation of Applications’ Dimensioning Algorithms

Conclusion & On‐going Work

Page 10: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Experiment Methodology

Dummynet for controlling network conditions (delay, jitter, and packet loss)

Use a recording card (ESI Maya44) to ensure time‐synchronized audio recordings from two hosts

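[Figure: experiment setup. The speaker and listener hosts exchange packets through a FreeBSD machine running dummynet. The audio output of each host feeds the audio input of the recorder, which writes a wave file with the speaker on the left channel and the listener on the right channel.]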

Page 11: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Buffer Size Estimation

End‐to‐end delay components: network delay (dummynet); coder delay + packetization delay (assumed to be 50 ms); playout buffering delay (unknown)

Buffer size = e2e delay – network delay – 50 ms

Find the best alignment between the speaker and listener channels (based on cross‐correlation coefficients); the offset at the best alignment gives the end‐to‐end delay
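
The alignment step and the buffer-size formula above can be sketched as follows. This is a minimal, unoptimized illustration, assuming the two channels are available as lists of samples; the function names, the brute-force lag search, and the `max_lag` parameter are assumptions and not the authors' tooling.

```python
import math


def best_lag(speaker, listener, max_lag):
    """Return (lag, coefficient) where the listener channel best matches the speaker.

    For every candidate lag (in samples) the normalized cross-correlation
    coefficient of the overlapping samples is computed; the largest one wins.
    """
    best, best_coef = 0, -2.0
    for lag in range(max_lag + 1):
        n = min(len(speaker), len(listener) - lag)
        if n < 2:
            break
        x = speaker[:n]
        y = listener[lag:lag + n]
        mx, my = sum(x) / n, sum(y) / n
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                        sum((b - my) ** 2 for b in y))
        coef = num / den if den > 0 else 0.0
        if coef > best_coef:
            best, best_coef = lag, coef
    return best, best_coef


def buffer_size_ms(lag_samples, sample_rate_hz, network_delay_ms,
                   codec_delay_ms=50.0):
    """Buffer size = e2e delay - network delay - coder/packetization delay."""
    e2e_ms = 1000.0 * lag_samples / sample_rate_hz
    return e2e_ms - network_delay_ms - codec_delay_ms
```

For example, a best lag of 2000 samples at 8000 Hz is a 250 ms end-to-end delay; with 100 ms of configured network delay and the assumed 50 ms coder and packetization delay, the estimated buffer size is 100 ms.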

Page 12: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Experiment Settings

Application: Skype, Google Talk, MSN Messenger

Network delay & jitter: 0 ms, 25 ms, 50 ms, …, 200 ms

Network loss rate: 0%, 1%, …, 10%

10 VoIP calls for each app/network setting

Each call lasts 240 seconds

Page 13: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Effect of Delay and Jitter

Skype maintains the same buffer size

Google Talk slightly adjusts the buffer size according to delay and jitter

MSN Messenger’s buffer size grows linearly as the jitter increases

Page 14: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Effect of Packet Loss

None of the three applications adapts its buffer size to packet loss

Page 15: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Having seen the different behavior of the applications, …

Which application’s playout buffer dimensioning is best?

Is the best one optimal?

Page 16: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Talk Progress

Overview

Measuring Applications’ Buffer Size: Experiment Methodology, Measurement Results

Deriving Optimal Buffer Size: Methodology, Derived Buffer Sizes

Evaluation of Applications’ Dimensioning Algorithms

Conclusion

On‐going Work

Page 17: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Deriving Optimal Buffer Size ‐ QoE

How to define the “optimal” buffer size?

Buffer size that yields the best quality of experience (QoE)

How to measure the QoE of a VoIP call?

PESQ (ITU‐T P.862, Perceptual Evaluation of Speech Quality): measures listening quality; signal level, accurate

E‐Model (ITU‐T G.107): measures overall quality (listening + interactivity); network level, lightweight but not accurate in listening quality

Page 18: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

QoE Assessment Model [Ding’03]

[Figure: flow diagram. Original audio clip and degraded audio clip → PESQ algorithm → MOS score → R score → subtract Id in E‐model → final R score → final MOS score. Diagram labels: conversion, extension.]
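
A rough sketch of this pipeline is shown below. It assumes the standard ITU-T G.107 mapping between R and MOS, a numerical inversion of that mapping, and a commonly used simplified approximation of the delay impairment Id; these choices and all function names are assumptions and may differ from the exact procedure in [Ding’03].

```python
def r_to_mos(r):
    """ITU-T G.107 mapping from the rating factor R to a MOS value."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def mos_to_r(mos, lo=6.5, hi=100.0):
    """Invert r_to_mos by bisection on [6.5, 100], where the mapping increases."""
    mos = min(max(mos, r_to_mos(lo)), r_to_mos(hi))
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if r_to_mos(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def delay_impairment(delay_ms):
    """Simplified approximation of Id as a function of one-way delay (assumed)."""
    extra = delay_ms - 177.3
    return 0.024 * delay_ms + (0.11 * extra if extra > 0 else 0.0)


def final_mos(pesq_mos, e2e_delay_ms):
    """PESQ MOS -> R score -> subtract Id -> final R score -> final MOS score."""
    r = mos_to_r(pesq_mos) - delay_impairment(e2e_delay_ms)
    return r_to_mos(r)
```

With this sketch, the same PESQ score yields a lower final MOS as the end-to-end delay grows, which is how the interactivity cost of a larger buffer enters the QoE score.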

Page 19: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Deriving Optimal Buffer Size ‐ Simulation

For each (buffer size, network setting):

1. Encode an audio clip into a sequence of VoIP frames

2. Impairment at the network: network delay & jitter; packet loss (Gilbert model)

3. Packet discarding at the receiver: drop a packet if its arrival time is later than its scheduled time (sent time + playout buffer size)

4. Decode the resulting frames (a subset of the original frames) into a degraded audio clip

5. Compute average QoE scores
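
Steps 2 and 3 above can be sketched as a small packet-level simulation. The two-state Gilbert transition probabilities, the uniform jitter distribution, the 20 ms frame interval, and the function name are illustrative assumptions; the encoding, decoding, and QoE scoring steps are not modeled.

```python
import random


def simulate_losses(num_packets, buffer_ms, delay_ms, jitter_ms,
                    p_good_to_bad, p_bad_to_good, frame_ms=20.0, seed=0):
    """Return (network_lost, late_dropped) packet counts for one simulated call."""
    rng = random.Random(seed)
    bad = False        # Gilbert model state: packets are lost while in the bad state
    network_lost = 0
    late_dropped = 0
    for i in range(num_packets):
        # Two-state Gilbert model: update the channel state, then decide on loss
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        else:
            if rng.random() < p_good_to_bad:
                bad = True
        if bad:
            network_lost += 1
            continue
        sent = i * frame_ms
        # Assumed jitter model: uniform extra delay in [0, jitter_ms]
        arrival = sent + delay_ms + rng.uniform(0.0, jitter_ms)
        # Step 3: discard the packet if it misses its scheduled playout time
        if arrival > sent + buffer_ms:
            late_dropped += 1
    return network_lost, late_dropped
```

Sweeping `buffer_ms` for a fixed network setting shows the late-drop count falling as the buffer grows, while the added delay would be charged separately through the QoE score.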

Page 20: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Derived Optimal Buffer Size (1)

Page 21: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Derived Optimal Buffer Size (2)

Page 22: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

A typical relationship between QoS and QoE

[Figure: QoE plotted against QoS, e.g., speech quality or (e2e delay)^-1. Annotations: hard to tell “very bad” from “extremely bad”; marginal benefit is small.]

Page 23: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Comparing Real‐Life Applications with Optimal Settings

MSN Messenger’s buffer dimensioning algorithm is better than those of Skype and Google Talk

Page 24: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Modeling Optimal Buffer Size

Using a linear regression to model the optimal buffer size given a network setting

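\[
\text{const} + \text{coef}_{\text{delay}} \cdot \text{delay} + \cdots + \text{coef}_{\text{delay.jitter}} \cdot \text{delay} \cdot \text{jitter} + \cdots + \text{coef}_{\text{delay.jitter.plr}} \cdot \text{delay} \cdot \text{jitter} \cdot \text{plr}
\]

(plr: packet loss rate)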
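
Evaluating such a regression model at run time is cheap, as the following sketch suggests. It assumes the model may include a constant, main effects, and interaction terms over delay, jitter, and packet loss rate (plr); the full set of terms, the function names, and any coefficient values passed in are hypothetical, since the slides do not list them.

```python
from itertools import combinations

FACTORS = ("delay", "jitter", "plr")


def interaction_terms(setting):
    """Map a network setting to every main-effect and interaction term.

    Keys are tuples of factor names; the empty tuple is the constant term.
    """
    terms = {(): 1.0}
    for k in range(1, len(FACTORS) + 1):
        for combo in combinations(FACTORS, k):
            value = 1.0
            for name in combo:
                value *= setting[name]
            terms[combo] = value
    return terms


def predict_buffer_size(coefs, setting):
    """Evaluate the linear model: sum of coef * term over the terms in `coefs`."""
    terms = interaction_terms(setting)
    return sum(c * terms[key] for key, c in coefs.items())
```

A call such as `predict_buffer_size({(): 10.0, ("delay",): 0.5}, {"delay": 100, "jitter": 50, "plr": 2})` costs only a handful of multiplications and additions; the coefficients in that call are placeholders, not fitted values.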

Page 25: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Talk Progress

Overview

Measuring Applications’ Buffer Size: Experiment Methodology, Measurement Results

Deriving Optimal Buffer Size: Methodology, Derived Buffer Sizes

Evaluation of Applications’ Dimensioning Algorithms

Conclusion & On‐going Work

Page 26: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

Conclusion

Our results show that MSN Messenger performs the best in terms of buffer dimensioning

Proprietary codecs are used in Skype

Other factors may dominate the final quality perceived by users

Results from the research community do not seem to be applied in real‐life VoIP applications

Methods not generalizable enough? Methods not intuitive enough? Methods not practical enough (e.g., hard to implement)? Other explanations?

A regression modeling approach to compute the optimal buffer size at run time

Page 27: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

On‐going Work

More factors in measuring applications’ buffer dimensioning behavior

frame size, redundancy control, loss burstiness, speech codec, …

More factors in deriving optimal buffer size

transport protocol (TCP in addition to UDP), speech codec, …

Real‐life network experiments to evaluate the regression‐based buffer dimensioning algorithm

Page 28: An Empirical Evaluation of VoIP Playout Buffer Dimensioning in Skype, Google Talk, and MSN Messenger

NOSSDAV 2009

Thank You!

Kuan‐Ta Chen

http://www.iis.sinica.edu.tw/~ktchen