
Maelstream: Self-Organizing Media Streaming for Many-to-Many Interaction

Lucas Provensi, Abhishek Singh, Frank Eliassen, and Roman Vitenberg

Abstract—A number of emerging multimedia applications, such as webinars, require users to interact by exchanging media streams. In such applications there are multiple interacting participants which both produce and consume media content and a set of participants which are only consumers. Keeping the end-to-end latency as low as possible while not violating bandwidth constraints is one of the most important requirements for this type of application. While there exist solutions to this problem for applications such as multi-party video conferencing, they rely on dedicated infrastructures which may be expensive and not available to all users. On the other hand, decentralized P2P solutions have been focusing on single source media streaming, which does not consider multiple interactive participants. In this paper, we propose Maelstream, a self-organizing media streaming solution that supports multiple interacting participants as well as a large number of consumers. Maelstream uses gossip protocols to generate multiple latency-aware streaming trees on top of a P2P overlay. We have evaluated our solution with simulations implemented using Peersim and ns-3 simulators, and compared Maelstream with Chunkyspread, an unstructured protocol capable of fine-tuning latency. We show that Maelstream can achieve low end-to-end latency and scales well with the number of streams.

Index Terms—Self-organizing systems, P2P streaming, many-to-many low-latency interaction


1 INTRODUCTION

In the last few years, we have been witnessing an increasing number of applications exploiting multimedia interactions over the Internet, such as multi-party videoconferencing, live streaming from social media, webinars, distributed discussion panels, on-line gaming and so forth. These applications consist of two or more interactive users that produce distinct media content. The content producers interact with each other by exchanging media streams and thus, they require low latency communication among themselves. The application might also include a number of content consumers that are only observing the interaction by receiving all the streams from the producers. The consumers require streams with low latency so that they can synchronize the presentation of the streams at their end. To enable this type of application we need a content distribution solution that can keep the interaction quality when the application is running in latency and bandwidth constrained environments, such as the Internet, which can only provide best effort.

Solutions for interactive multimedia applications can exploit the client-server model to provide direct connection between interactive users with low latency [1]. However, these solutions suffer from scalability problems when the number of producers and consumers grows. Few works exist that deal with the case of multiple interacting producers and multiple consumers [2]. These works, however, rely heavily on dedicated infrastructures, centralized solutions or CDNs, which might increase the deployment cost and prevent many users from participating in the application.

P2P solutions can address the scalability and deployment cost problems by providing overlays where all participants contribute resources to accommodate more users. However, the majority of works on P2P streaming overlays are applied to single-source live streaming or to video-on-demand [3]. These works are not designed to deal with multiple interactive users, and the few P2P works that support multiple sources are either limited to small multicast groups, or not aimed at high-rate streaming [4].

Interactive users have more stringent requirements regarding the latency with which they receive each other’s media streams, but P2P solutions for live streaming do not make any distinction between interactive and non-interactive users. Using one of these solutions and having the content producers connect directly to one another might not be possible, since the bandwidth capacity of a given producer may not be sufficient to provide its stream to all the other interactive users, as well as to at least one consumer. Furthermore, having multiple producers implies that the bandwidth resources contributed by the peers will be used to provide multiple streams, but most P2P solutions assume all resources can be utilized to provide a single stream. Independently applying a single-stream solution for each producer might result in conflicting allocations of the bandwidth capacity or unfair allocations, where some streams will utilize more resources than others.

In this paper, we propose Maelstream, a decentralized solution for constructing and maintaining latency-aware P2P overlays for interactive media streaming applications. Maelstream has been designed to:

The authors are with the Department of Informatics, University of Oslo, Norway. E-mail: {provensi, abhi, frank, romanvi}@ifi.uio.no.

Manuscript received 16 Mar. 2017; revised 9 Oct. 2017; accepted 18 Nov. 2017. Date of publication 10 Jan. 2018; date of current version 11 May 2018. (Corresponding author: Lucas Provensi.) Recommended for acceptance by M. Steinder. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TPDS.2018.2791599

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 29, NO. 6, JUNE 2018

1045-9219 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

https://orcid.org/0000-0002-1796-3888


https://orcid.org/0000-0002-9396-2764


• Handle multiple stream producers of distinct content and multiple consumers.

• Provide as low latency as possible for all participants, while giving interactive participants priority over the overlay resources.

• Ensure that all participants continue receiving the streams in the presence of churn.

• Enforce a fair distribution of the bandwidth capacity of each node across all the streams.

The number of consumers can potentially be large (thousands), although it is more typically in the order of hundreds (webinars, distributed lectures). We assume the number of interactive nodes to be in the order of tens or fewer; for example, Google Hangouts typically has 2 to 10 participants.

Maelstream applies gossip protocols to disseminate meta-information about the streaming session among all participants. No centralized infrastructure is needed, and each participant makes individual decisions regarding from whom it should request each stream and to whom it should provide a particular stream. The result is the construction of multiple trees, one for each distinct stream, that aim to provide as low an end-to-end latency as possible to all participants. The underlying overlay is also resilient, with peers being able to quickly repair the parent-child relationship of the trees in the presence of churn.

We evaluate the proposed solution with extensive simulations. We use the Peersim simulator to evaluate Maelstream’s self-organizing behavior. Peersim allows the simulation of large scale networks for a long period of time, but abstracts most of the complexities of the transport layer. Therefore, we also use the ns-3 simulator to conduct more complete simulations with actual data streaming and a simulated transport layer, although at a smaller scale and with shorter simulation time. In the ns-3 simulations, we compare Maelstream with Chunkyspread, an unstructured multi-tree construction protocol capable of fine-tuning latency. We simulate scenarios with multiple interactive users and evaluate the achieved end-to-end latencies and receiving rates at the media consumers. We also evaluate the system performance under churn and in bandwidth-constrained environments. The simulation results show that Maelstream can achieve low end-to-end latencies and scales better with the number of interactive users (40 to 75 percent lower latencies when compared to Chunkyspread). The simulations also show that the churn handling capabilities of the system can keep a high receiving rate at the users (5 to 8 percent higher rates when compared to Chunkyspread).

The rest of the paper is organized as follows. Section 2 describes Maelstream’s system model and the class of applications it is aimed at. Section 3 discusses requirements and challenges. Section 4 compares Maelstream with related works. Section 5 presents our solution, starting with an intuition about a centralized heuristic (approximate solution to minimum latency tree), followed by a detailed description of Maelstream’s decentralized approach. Finally, Section 6 presents the results of our extensive simulations and Section 7 concludes the paper.

2 APPLICATION DESCRIPTION AND SYSTEM MODEL

We consider a class of emerging applications where a number of distributed users independently produce and possibly exchange media content, and an additional set of remote users only consume the content produced by the first set of users. Some examples of such applications are distributed discussion panels, distributed lectures and webinars. In distributed discussion panels, a set of panelists are discussing a topic and this discussion is being watched by an audience. In distributed lectures, a set of lecturers collaborate to present a subject to a number of students.

A further example is a distributed live podcast. To illustrate this application, consider the scenario where a number of football enthusiasts (journalists, bloggers, etc.) want to produce a video podcast for commenting on a live football match. The podcasters interact with each other by exchanging media streams (audio, video or both). Additionally, a number of users (team supporters, sports fans, etc.) who are following the football match can start watching the live video podcast. For the duration of the match, the podcasters require timely delivery of streams from each other so that they can discuss the match in real-time. As the audience for the podcast is also following the live match, the audience requires streams from podcasters with as low delay as possible so that they can relate the comments of the podcasters to the current match situation.

Video-conferencing solutions cannot be applied to this type of application because they are limited to a small and closed group of participants. Live media streaming solutions are not a good match either, as they essentially consider one media source and its dissemination to a number of users. However, there exist solutions for this type of application that rely on a dedicated infrastructure (such as Google Hangouts on Air, see footnote 1, which later moved to YouTube Live). Hangout users stream video and audio to the Google infrastructure, where the streams are aggregated and sent back to all participating users. The Hangout live session can then be broadcast to an audience through YouTube.

There are three main problems with this solution: First, the users are subject to the terms of use of the infrastructure, which in this case allows the exploitation of the session content and permits making this content available to other companies. Second, the performance of the system is limited by the bandwidth and latency offered by the infrastructure, which may reduce the media quality in order to alleviate its load, while the resources of end-users are not taken advantage of. Third, if all the streams are aggregated and multiplexed into a single stream, users cannot choose to receive only a subset of the available streams (see footnote 2). Thus, we explore the feasibility of supporting such applications by utilizing the resources of participating users.

We focus on decentralized settings where there is no central manager for building the streaming overlays. The overlay uses the resources provided by the peers participating in the session, and each peer knows only a small subset of other participants. During a streaming session, each participating peer adopts one of the following roles:

1) Interactive node: we classify a node which produces a stream as an interactive node. An interactive node also needs all streams produced by other interactive nodes. We represent the set of interactive nodes as $I$ and denote the set of streams produced by these nodes as $S$ (with $|I| = |S|$).

2) Receiver node: we classify a node which is only a consumer of the streams produced by all interactive nodes as a receiver. We denote the set of receiver nodes as $R$.

Footnote 1. https://plus.google.com/hangouts/onair

Footnote 2. Use cases include e-sports events where viewers are interested in receiving only the streams of the players/teams they support.

The total set of nodes $U = I \cup R$ can be potentially large (thousands or tens of thousands). However, for applications such as webinars and distributed lectures, the typical number of participants will be in the order of hundreds. For applications such as Google Hangouts, the typical size of $I$ is between 2 and 10. The nodes participate in a streaming session which starts when the first interactive node joins the overlay and lasts until there is no interactive node left in the session. During a streaming session, interactive and receiver nodes may join or leave the session at any time.

We consider the typical scenario where nodes have limited upload bandwidth and the limit may vary across the nodes. We assume asymmetric bandwidth, where the upload capacity is always scarcer than the download capacity. We further assume that streams in $S$ have the same volume and that a node can estimate its own capacities to decide how many streams it can relay to other nodes. In our settings, each node in $U$ can directly communicate with any other node in $U$. Each message sent between any two nodes in $U$ is subject to a delay specific to the path between them. Furthermore, we assume that nodes have an incentive to cooperate with each other and that a node can relay the streams it receives to other nodes provided it has sufficient upload capacity. These assumptions aim at limiting the scope of our work, and we consider bandwidth estimation, incentives and NAT traversal as orthogonal problems.

Given the upload bandwidth capacities for each node and the latency between each pair of nodes, the problem is to find a self-organizing solution for creating and maintaining an overlay such that all nodes in $U$ receive all streams in $S$ with as low end-to-end latency as possible. Furthermore, as nodes in $I$ may be interacting with each other, they have stricter latency requirements, and should be given priority on the overlay. Requirements for prioritization of interactive nodes are explained in Section 3.

3 REQUIREMENTS AND CHALLENGES

For the class of emerging applications and the model described in Section 2, we have identified the following main requirements (Req. in short):

Req. 1 Performance: Given the limited bandwidth resources, it is required that all nodes in $U$ (media producers and consumers) receive all the streams in $S$ with as low latency as possible and with as high a streaming rate as possible. It is also required to give priority to nodes in $I$, since they are participating in a real-time interaction. This means that nodes in $R$ should use their resources first to provide the streams to nodes in $I$, and then to the other nodes in $R$. However, priority should not be given to nodes in $I$ if doing so prevents the other nodes in the network from receiving all the streams.

Req. 2 Robustness: Users can join or leave the streaming session at any time. Considering that users contribute resources to the application, user churn will have a negative impact on performance. Affected users who remain in the session need to be able to recover quickly, preventing further performance degradation.

Req. 3 Scalability: The application should be able to accommodate multiple media producers and disseminate the streams to the stream consumers. Any solution for the application must scale with the number of producers and the number of consumers. As $|S|$ increases, nodes will use more bandwidth to disseminate all the streams in $S$. It is therefore required that the total bandwidth capacity of the overlay be shared fairly across the different streams. No single stream or subset of streams should be given priority over the resources in a way that leaves the remaining streams unable to find resources.

In Section 2 we discussed the limitations of different approaches and have limited the scope of our solution to decentralized P2P settings. In these settings, peers collaborate to propagate the streams by either using explicit tree structures, where the streams are pushed down the tree, or by swarming, where peers advertise and pull parts of the streams from their neighbors [4]. Swarming solutions are more tolerant to data loss, since missing chunks can be pulled from neighbors. Pulling, however, increases latency since the playback buffer at the peers needs to be big enough to allow retransmission (at least larger than the round-trip time [5]). Streaming trees, on the other hand, are less tolerant to data loss, but better suited for interactive applications that cannot tolerate high latency. We therefore explore the problem of building and maintaining P2P propagation trees for multiple streams, given the requirements described above. We have identified a number of challenges (Chl. in short) inherent to this problem. To the best of our knowledge, the combination of these challenges is not addressed by any existing single system.

Chl. 1 Concurrent Streaming Trees. To fulfill Req. 1, multiple low latency streaming trees need to be built and maintained simultaneously. Most P2P streaming systems deal with constructing a single streaming tree that provides low latency delivery to a number of nodes. However, it is not possible to directly use these systems to independently build optimal streaming trees for each stream in $S$, since the overlay resources are shared across all the streams. It is also not possible to keep the same set of trees unchanged throughout the session, as the node dynamics change over time.

Chl. 2 Performance in Presence of Churn. We adopt decentralized settings where each node knows only a subset of other nodes in the network (its neighbors). To fulfill Req. 2, this subset should contain neighbors that can be used as replacements to repair broken streaming tree branches caused by user churn. A good replacement is a neighbor that can act as a relay node for the affected stream and offer as low latency as possible, in order to minimize the impact of churn on performance. As $|S|$ increases, it becomes difficult to maintain a set of neighbors that is both small and contains good replacements to mend all streaming trees.

Chl. 3 Fair bandwidth sharing versus latency reduction. To fulfill Req. 3, as new interactive nodes join the system, new streaming trees need to be built, and the system’s current bandwidth allocation should not prevent the creation of new trees. This can be achieved by dividing the upload capacity at each node equally according to $|S|$, so that it is guaranteed that no single tree will utilize more bandwidth capacity than the others. This approach, however, can potentially degrade performance, as individual nodes can relay some streams with lower latency than others, depending on proximity to the source of each individual stream. Therefore, there is a trade-off between minimizing the latency of individual trees and having a fair distribution of bandwidth resources.
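The equal division mentioned in Chl. 3 can be illustrated with a minimal sketch. The function name `fair_slots`, the integer stream identifiers and the rule for handing out leftover slots are assumptions made for illustration only; they are not taken from Maelstream.

```python
def fair_slots(upload_slots: int, num_streams: int) -> dict[int, int]:
    """Split a node's upload slots evenly across streams 0..num_streams-1."""
    if num_streams <= 0:
        return {}
    base, extra = divmod(upload_slots, num_streams)
    # Assumption: the first `extra` streams each receive one leftover slot.
    return {s: base + (1 if s < extra else 0) for s in range(num_streams)}
```

When a node has fewer slots than there are streams, some streams receive no slot at that node regardless of how close the node is to their sources, which is one face of the trade-off described above.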

4 RELATED WORK

There are many works on application-level multicast that propose solutions for one-to-many media streaming. These works can be classified into tree-based approaches, mesh-based approaches and hybrid approaches, depending on how the peer-to-peer overlay is organized [16]. Table 1 summarizes some of Maelstream’s properties related to fulfilling the requirements described in Section 3 and how they compare to related works.

Tree-based approaches, such as SplitStream [6], Chunkyspread [7] and Sepidar [8], build one or more multicast trees rooted at the source. SplitStream is a DHT-based solution that relies on multiple disjoint trees to handle churn in a dynamic environment and to distribute the load across all nodes. SplitStream assumes that the Pastry [17] DHT routing can provide acceptable communication latency. However, it only considers per hop latency for each tree. Multiple trees can also be implemented on top of unstructured P2P overlays, which is the case in Chunkyspread. Chunkyspread uses SwapLinks [18] and can provide a simpler solution than SplitStream with better control over the system load. Another example is Sepidar, which uses gradient overlays to build approximate minimum height trees. These systems, however, are designed for non-interactive applications, optimizing data delivery of a single stream, and do not include mechanisms for dealing with multiple stream sources or prioritizing interactive nodes.

Mesh-based solutions, such as Prime [5], Chainsaw [9], CoolStreaming [10] and AnySee [11], make use of unstructured randomly connected overlays that are simpler to construct and maintain and are also more robust than trees. The authors of Chainsaw, for instance, claim that it can be configured for use in many-to-many dissemination. However, mesh-based approaches are based on swarming with no long-term relationship between peers. Swarming is not the best option for interactive applications, since it can introduce higher overhead in terms of meta-information exchange, it can increase the overall latency because of the push-pull mechanism, and it relies on randomly connected graphs that are not latency optimized. There have also been a number of studies on the scheduling algorithms of mesh-based systems [19], [20]. These works, however, are aimed at improving the playback continuity and load balance at the nodes, and can only alleviate the inherent increased latency of push-pull mechanisms.

Hybrid solutions, such as mTreebone [14], Bullet [12] and ToMo [13], combine tree-based and mesh-based solutions to construct overlays that are both resilient and efficient. mTreebone, for instance, exploits the fact that most of the data streamed through a mesh overlay follows a specific tree structure, and tries to assign stable nodes to this tree backbone. mTreebone can achieve better latencies than pure mesh-based approaches, but it is still optimized for one-to-many dissemination and also relies on an underlying randomly connected graph. There are also hybrid works that enable transitions between different overlay topologies [21], [22]. Differently from these works, our work aims at providing a single and general solution, and could potentially be integrated with them as one of the transition states.

TABLE 1
Comparison of Maelstream with the Related P2P Streaming Solutions

| Requirement | Maelstream | Tree-based | Mesh-based | Hybrid |
|---|---|---|---|---|
| Req. 1 | Low end-to-end latency | Low per hop latency [6]; low latency of stream slices relative to other slices [7] or to source [8] | Not aimed at reducing latency [5], [9], [10]; low path latency [11] | Not aimed at reducing latency [12], [13]; low latency in the backbone tree [14] |
| | Priority to interactive users | No prioritization | No prioritization | No prioritization |
| Req. 2 | Churn will cause temporary interruptions | Churn will cause temporary interruptions (alleviated by multiple stream stripes [6], [7]) | Virtually no interruption, if a higher latency is acceptable | Churn will cause temporary interruptions (alleviated by using mesh) |
| | Trees affected by churn are mended by replacing failed nodes using neighbor set | Affected trees mended by replacing the failed nodes using underlying DHT and a spare capacity group [6]; using neighbor set [7] | Does not use trees. Redirects packet requests from failed nodes to the remaining nodes in the neighbor set | Affected trees mended by replacing failed nodes using neighbor set [12], [14]; re-joining from the tree root [13] |
| | Neighbor set of constant size (used by all trees), selected partially at random and partially based on proximity | Neighbor set of constant size (for a single tree), selected at random [7]; based on proximity [6]; based on utility [8] | Neighbor set of constant size selected at random [5], [9], [10]; based on proximity [11]; based on bandwidth capacity [15] | Neighbor set of constant size (for a single tree), selected at random |
| Req. 3 | Supports multiple distinct streams | Assumes a single stream | Supports multiple distinct streams | Assumes a single stream |
| | Fair bandwidth sharing across streams | Assumes bandwidth resources are used by a single stream | Bandwidth can be shared, but fairness is not enforced | Assumes bandwidth resources are used by a single stream |

Maelstream can be classified as a tree-based approach, since multiple trees are the main mechanism used for delivering streams. Unlike SplitStream and Chunkyspread, Maelstream aims at building low end-to-end latency trees. Maelstream also builds its trees on top of unstructured P2P overlays, like in Chunkyspread, but instead of relying only on random neighbor selection, it combines random selection with selection based on proximity, in order to further reduce latency. Differently from other tree-based works, Maelstream also aims at giving all streaming trees a fair share of the overlay resources, so that all trees can be built even in bandwidth constrained environments (Chl. 3).

The works contained in Table 1 are all decentralized systems, where the nodes self-organize into streaming overlays. Another approach is applying a centralized server which orchestrates the overlay construction, such as CoopNet [23]. Using a centralized infrastructure, nodes would report their perceived latency for all received streams to a server that can compute the optimal overlay to connect peers, such that the overall latency is minimized or the maximum latency across all nodes is minimized. The problem of finding delay-constrained least-cost multicast trees is NP-complete, and the heuristics to solve it do not scale with the number of nodes, as the centralized infrastructure will need to obtain measurements from all nodes [24]. Another drawback of this approach is its churn handling capability. Churn can introduce changes in latency between nodes, and this will require the centralized infrastructure to repeatedly compute the modified overlay.

Besides the above-mentioned P2P solutions, there are infrastructure-based approaches, such as 4D TeleCast [2], and cloud-assisted solutions, such as CLive [25], which may provide desired quality-of-service guarantees for the application. Contrary to these approaches, we advocate a completely decentralized and self-organizing solution for interactive media streaming overlays that is simpler to deploy and can eliminate the infrastructure costs. There have also been works on improving P2P streaming solutions by using community network clouds (CNs) [26], which, unlike commercial clouds, are usually self-organizing and free. These works rely on the capacity of nodes in a CN; however, these nodes are also dynamic and heterogeneous and therefore not as reliable as commercial cloud nodes.

5 THE MAELSTREAM APPROACH

In this section we present Maelstream, our self-organizing solution for the problem described in Section 3. To support scalable interactive applications operating in dynamic environments, we adopt a gossip-based unstructured P2P approach, where each node in the overlay knows only a small set of neighbors that changes over time [27].

The proposed solution aims at providing a simple overlay construction and maintenance protocol that can support multiple latency-aware streaming trees with distinct media content. We start by describing a centralized solution where a planner has complete knowledge of the network topology and can build approximate minimum latency trees for all streams. This centralized solution will give us insight on how to build minimum latency trees, and will also be used in Section 6 as a baseline. We then devise a decentralized protocol by removing the decision-making process from a central planner, and moving it to the participating nodes, which now can act independently using only partial knowledge.

5.1 Centralized Solution

The centralized solution assumes a planner with knowledge about all the nodes in the network. We adapt the Steiner-tree-based heuristic (STH) algorithm described in [28] to get an approximate solution to the problem of finding multiple concurrent streaming trees constrained by bandwidth. The planner works as follows:

1) Ignore that the total bandwidth is shared across all streams and find the set $T$ of minimum latency streaming trees, where $|T| = |I|$.

2) Verify if all trees in T can be built without violatingthe bandwidth constraints of the nodes.

3) If any node is using more capacity than available,remove one or more of its links and compute theresidual network containing the available bandwidthcapacity.

4) Mend the removed links by selecting the shortestpath replacements from the residual network.

5) Distribute T to all nodes, monitor node churn andrebuild the trees, if necessary.

We consider the cost to be the delay between two given nodes, and we assume that the centralized planner has instant access to the delay information from all nodes to all of their known neighbors at any given time. The shortest path algorithm is constrained by the upload capacity at each node, but it can use the total capacity of the nodes at each streaming tree independently.
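Steps 2 and 3 of the planner amount to checking, for every node, whether the links it serves across all trees exceed its upload capacity. The following is a minimal sketch of that check under stated assumptions: each tree is represented as a hypothetical `child -> parent` dictionary, capacity is counted in stream slots, and the name `overloaded_nodes` is illustrative rather than taken from the paper.

```python
from collections import Counter

def overloaded_nodes(trees, capacity):
    """Return {node: excess_links} for nodes whose total number of children
    over all trees exceeds their upload capacity (in stream slots)."""
    load = Counter()
    for parent_of in trees:               # one child -> parent map per stream
        for child, parent in parent_of.items():
            if parent is not None:        # the root of a tree has no parent
                load[parent] += 1
    return {n: load[n] - capacity.get(n, 0)
            for n in load if load[n] > capacity.get(n, 0)}
```

A planner could then remove that many links from each reported node and mend them from the residual network, as described in steps 3 and 4.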

Algorithm 1 describes the heuristic used to build minimum latency trees in the first step of the planner. For each interactive node $i \in I$, the algorithm builds a tree $T_i$ by first calculating the shortest path from $i$ to all the other nodes in $U$ (minimum spanning tree starting at $i$). It then adds all the remaining nodes in $U$ to $T_i$ one by one, by following the path with lowest end-to-end latency. Once a node joins $T_i$, the algorithm updates the node’s parent capacity (bounding the parent degree) and recalculates the shortest path for the remaining nodes.

Algorithm 1. Centralized Tree Construction Algorithm

Data: $U$, $I$
Result: Set of low latency trees $T$ : $|T| = |I|$
foreach node $i \in I$ do
    $T_i \leftarrow \{i\}$;
    $Q \leftarrow U \setminus T_i$;
    Let $p(i, u)$ be the lowest latency path from $i$ to a node $u$;
    Calculate $p(i, u)$ for all $u \in Q$;
    while $Q \neq \emptyset$ do
        $q \leftarrow \arg\min_{q' \in Q} p(i, q')$;
        Add all the nodes in the path $p(i, q)$ to $T_i$;
        Update the capacity of nodes added to $T_i$;
        $Q \leftarrow Q \setminus T_i$;
        Re-calculate $p(i, u)$ for all $u \in Q$;
    Add $T_i$ to $T$;


The goal of the planner is to find the lowest latency path from each node in $I$ to each node in $U$, given the bandwidth constraints of the network.
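As an illustration of Algorithm 1, the sketch below builds one degree-bounded tree per interactive node. It is a simplification under stated assumptions, not the authors' implementation: latencies are strictly positive and given as a nested dictionary `latency[a][b]`, capacity is a per-tree number of children, and the names `build_trees`, `free` and `dist` are hypothetical. With strictly positive latencies, the lowest latency path to the closest remaining node only traverses nodes already in the tree, so the sketch attaches one node per iteration instead of recomputing full shortest paths.

```python
def build_trees(nodes, interactive, latency, capacity):
    """Return {root: {child: parent}}, one degree-bounded tree per root.

    latency[a][b] is the one-way delay from a to b; capacity[v] is how many
    children v may serve inside a single tree (sharing across trees is
    ignored here, as in step 1 of the planner).
    """
    trees = {}
    for root in interactive:
        parent = {root: None}
        dist = {root: 0.0}            # end-to-end latency from the root
        free = dict(capacity)         # per-tree copy of the upload slots
        remaining = set(nodes) - {root}
        while remaining:
            best = None               # (latency, new node, chosen parent)
            for v in parent:          # nodes already in the tree
                if free.get(v, 0) <= 0:
                    continue          # v has no upload slot left
                for u in remaining:
                    cand = (dist[v] + latency[v][u], u, v)
                    if best is None or cand[0] < best[0]:
                        best = cand
            if best is None:
                raise ValueError("not enough upload capacity to span all nodes")
            d, u, v = best
            parent[u], dist[u] = v, d
            free[v] -= 1              # bound the degree of the parent
            remaining.remove(u)
        trees[root] = parent
    return trees
```

The cubic-time search is chosen for readability; a priority queue would be the natural optimization for larger node sets.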

The remainder of this section will present a solution that approximates Algorithm 1 in decentralized settings, and that also takes into consideration the requirements specified in Section 3. The decentralized protocol is divided into three main components: meta-information exchange (Section 5.2), neighbor selection (Section 5.3), and construction and maintenance of streaming trees (Section 5.4).

5.2 Meta-Information Exchange

To effectively participate in the streaming session, all nodes need to be kept informed about how many streams (distinct media sources) exist in the session, and which neighbor can provide the best connection for each stream. For that reason, each node $n$ will periodically execute a meta-information exchange protocol by sending probe messages to a subset of its neighbors. The neighbor $q$ that receives a probe message will respond with meta-information about how it perceives the current streaming session:

• The set of streams in $S$ that are being received through active connections;

• The estimated end-to-end latency for each stream $s$ in $S$: the latency from the stream’s producer $p_s$ to $q$ is denoted $LT_{p_s}(q)$;

• An estimate of $q$’s available upload capacity. For simplicity, we assume that all streams have the same volume, so the number of streams that $q$ can accommodate is the number of available upload slots, $UP(q)$.

When $n$ receives the response from $q$, it can compute the communication round-trip time $RTT(n, q)$ between $n$ and $q$. Since each node only needs to probe a small subset of neighbors, this simple form of RTT measurement is sufficient and practical. $n$ then estimates the end-to-end latency of each stream $s$ that can be relayed to $n$ through $q$ as $LT_{p_s}(n, q) = LT_{p_s}(q) + RTT(n, q)/2$. The meta-information exchange protocol also helps identify neighbor nodes that are down, namely those that fail to respond to a probe message within a time-out period.
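As an illustration of this estimate, the following sketch processes one probe response. The `ProbeResponse` record and the function names are hypothetical; only the formula $LT_{p_s}(n, q) = LT_{p_s}(q) + RTT(n, q)/2$ comes from the text, and all times are assumed to be in milliseconds.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ProbeResponse:
    """Meta-information reported by a neighbor q (hypothetical layout)."""
    latencies: Dict[str, float]   # stream id -> LT_ps(q)
    upload_slots: int             # UP(q)


def round_trip_time(sent_at: float, received_at: float) -> float:
    """RTT(n, q), measured locally at n from probe send and response times."""
    return received_at - sent_at


def estimate_stream_latency(response: ProbeResponse, rtt: float) -> Dict[str, float]:
    """LT_ps(n, q) = LT_ps(q) + RTT(n, q) / 2 for every stream q reports."""
    return {s: lt + rtt / 2.0 for s, lt in response.latencies.items()}
```

For example, if $q$ reports 40 ms for a stream and the measured RTT is 30 ms, $n$ estimates 55 ms for that stream when relayed through $q$.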

5.3 Neighbor Selection

For scalability reasons, we want to keep only a small number of neighbors at each node, and it is essential that this set of neighbors is up-to-date and useful. To keep the set up-to-date, nodes periodically exchange subsets of their neighbors, so they can replace dead entries and also find more useful neighbors. We define a useful neighbor as one that can offer low end-to-end latency for a particular stream and has enough upload bandwidth capacity to provide a connection to that stream.

We have identified two main problems when trying to keep a node’s neighbor list completely biased towards latency reduction. First, with multiple distinct streams, the peers that are relatively close to all stream sources will group together in low-latency clusters. References to nodes outside the clusters will become rare, and this violates Req. 2, as the clustering will reduce the overall robustness of the overlay in the presence of failures [29]. Second, in an environment with constrained upload bandwidth, it becomes harder or even impossible to build complete latency-aware streaming trees rooted at each interactive node, which violates Req. 3, as the upload capacity of nodes outside the low-latency clusters will rarely be used.

Fig. 1 illustrates these problems through an example. The figure shows a set of nodes distributed in two-dimensional space according to the distance (latency) between them. For the sake of discussion, we divide the space into three regions: R1, R2, and R3. The receiver nodes inside region R1 are relatively closer to all interactive nodes and therefore are the best candidates to relay the streams. As we move to the R2 and R3 regions, the nodes cannot offer latencies as good as those of nodes in R1 for most of the streams. As a result, nodes will try to exclude nodes in R2 and R3 from their neighbor lists, and the upload capacity offered by those nodes will be wasted. Isolating R2 and R3 nodes as leaves in all streaming trees is not a problem when nodes in R1 have enough upload capacity to transmit all the streams in the session among themselves and also to the nodes in R2 and R3. However, when the overall upload capacity of the nodes in R1 is limited, the nodes in R2 and R3 should also contribute to the overlay as relay nodes.

To prevent these problems, in Maelstream each node keeps references not only to nodes selected based on utility but also to nodes selected at random. We apply the T-man protocol [30] to rank nodes based on their utility for multiple streaming trees, and we combine it with a peer sampling service that provides random subsets of nodes [29]. The main challenge here (Chl. 2) is to provide T-man with a ranking strategy that considers the utility of a neighbor as a relay node in each of the streaming trees. We implement this multi-objective ranking strategy by first classifying each neighbor $q$ of a node $n$ in one or more of the states shown in Table 2. The neighbor list is then ranked as follows:

• Neighbors in the Candidate state are ranked based on the average difference between $LT_{p_s}(n, q)$ and $LT_{p_s}(n, r)$ for each $s$ in $S$, where $r$ is the current node relaying $s$ to $n$. By using the average we are selecting neighbors by their proximity to all stream sources (nodes inside the R1 region in Fig. 1), which is used here as a measure of how useful the neighbors can be as replacements in case of churn events.

Fig. 1. Formation of low-latency clusters.


• Neighbors in the Active or Requested states are also in the Candidate state, as they can provide one or more streams to $n$. These entries are ranked accordingly and can be used by the peer sampling service, but will not be removed from $n$ until they leave these states.

• Neighbors in the Rejected state usually have exhausted their upload capacity and are not willing to replace their upload connections. They are ranked below all neighbors that are not in this state. Among themselves, they are ranked in the same way as Candidate nodes.
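The ranking rules above can be sketched as a sort key. This is an illustrative reading, not the authors' code: the data layout is hypothetical, and we assume that the average is taken only over streams for which $n$ currently has a relay, with neighbors that share no such stream ranked last within their group.

```python
from typing import Dict, List, Tuple


def utility(offered: Dict[str, float], current: Dict[str, float]) -> float:
    """Average of LT_ps(n, q) - LT_ps(n, r) over the streams s that q offers
    and for which n has a current relay r. Lower values mean a more useful
    neighbor."""
    diffs = [offered[s] - current[s] for s in offered if s in current]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def rank_neighbors(
    neighbors: Dict[str, Tuple[Dict[str, float], bool]],
    current: Dict[str, float],
) -> List[str]:
    """Rank neighbor ids from most to least useful.

    `neighbors` maps a neighbor id to (offered latencies, is_rejected).
    Rejected neighbors sort after all others and are ordered among
    themselves with the same utility function.
    """
    return sorted(
        neighbors,
        key=lambda q: (neighbors[q][1], utility(neighbors[q][0], current)),
    )
```

Sorting on the tuple `(is_rejected, utility)` places every non-rejected neighbor ahead of every rejected one, since `False` orders before `True` in Python.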

Each node $n$ keeps a small neighbor set of size $c$ (here we consider both T-man’s ranked entries and the sampling service’s random entries) and participates in periodic view exchanges. The exchanges aim at keeping the random entries up-to-date and at improving the utility of the ranked entries. T-man and the sampling service exchange the worst-ranked and random neighbor entries, respectively; however, $n$ always keeps the Active and Requested neighbors, as they are part of ongoing connections or might become part of one. The worst-ranked neighbors of $n$ might be more useful to other nodes, since they could be hosted at different locations in the underlying network topology. The exchange of random entries also allows $n$ to find newly joined nodes. After a view exchange, new entries received by $n$ are fresh, and no RTT measurements have been taken to derive the latency information. Therefore, $n$ can only rank them after the meta-information exchange takes place.

Join Procedure: When a node $n$ wants to join the overlay, it needs to contact a node $i$ that is already part of the network (or a rendezvous node at a well-known location), and that node will introduce $n$ to the rest of the overlay. The introducer $i$ will follow a joining protocol based on random walks to provide $n$ with an initial set of neighbors [29]. The random walk messages will also contain information about the role of the joining node. If $n$ is interactive, then the nodes receiving the random walk message will know that a new stream is available and can bootstrap the construction of a new streaming tree by requesting the stream from $n$. After having obtained an initial neighbor set, $n$ can execute the periodic meta-information and neighbor exchange operations.

5.4 Construction and Maintenance of Streaming Trees

The neighbor selection mechanism ensures robustness in the presence of user churn (Req. 2) and scalability in terms of the number of nodes (Req. 3). The objective of the third component of our solution is to build concurrent streaming trees that fulfill the performance requirement (Req. 1) and provide scalability in terms of the number of streams (Req. 3). We define a decentralized version of the basic solution described in Section 5.1, and expand it into a protocol that each peer will follow, consisting of two main tasks: finding relay nodes for all existing streams (Algorithm 2) and processing requests to become a relay node (Algorithm 3).

Algorithm 2. Algorithm for Requesting Streams

Input: Current node $n \in U$, its view $V_n$ and the stream set $S$
Result: Requests are made for each stream $s \in S$, if necessary.
1   foreach stream $s$ in $S$ do
2       $V' \leftarrow \{v \mid v \in V_n \wedge v$ is not in Rejected$\}$;
3       Sort $V'$ by increasing order of $LT_{p_s}(n, v) : v \in V'$;
4       $r \leftarrow$ Node that is currently streaming $s$ to $n$;
5       $q \leftarrow nil$;
6       if $n \in R$ and $r = nil$ and $\exists i \in V' : UP(i) > 0$ then
7           $q \leftarrow i$;
8       else if $n \in R$ and $\exists j \in V' : LT_{p_s}(n, j) + th_{dl} < LT_{p_s}(n, r)$ then
9           $q \leftarrow j$;
10      else if $r = nil$ or $LT_{p_s}(n, r) > LT_{p_s}(n, V'[0]) + th_{dl}$ then
11          $q \leftarrow V'[0]$;
12      if $q \neq nil$ then
13          Request $s$ from $q$;

In the first task of the protocol, each node $n$ periodically executes Algorithm 2 in order to decide from which neighbor to request each of the known streams, or whether to replace the current relay nodes with better ones. The aim is to minimize the latency perceived by $n$ when receiving each of the streams. To avoid giving preference to any particular stream (fairness component of Req. 3), the nodes evaluate the streams in random order (the foreach loop selects streams at random). For each stream $s$, $n$ first produces a list $V'$ of its neighbors that are not in the Rejected state. The list is sorted by the end-to-end latency of $s$ when forwarded by the respective neighbors, so that nodes offering the lowest end-to-end latency are evaluated first.

If $n$ is a receiver, it will first request $s$ from a neighbor $i$ that reported free upload slots (Line 6), so that the chances of rejection are small and a tree branch is created as soon as possible. In case none is found, $n$ will request the stream from the neighbor $j$ that can offer the best end-to-end latency but has no free upload slots (Line 8). In this case, $n$’s proximity to $j$ might make it a more suitable child of $j$ in the tree branch starting at $j$, so $j$ may decide to replace one of its children with $n$. If $n$ is already receiving $s$ from a node $r$, $n$ will only request the stream from $j$ if the end-to-end latency reported by $j$ is significantly lower than the one offered by $r$. A threshold ($th_{dl}$ in Lines 8 and 10) is applied to decide if the difference is significant, and it also has an impact on the protocol stability, as will be discussed later in this section.

TABLE 2
Classification of a Neighbor $q$ by a Node $n$

| Neighbor state | Description |
| --- | --- |
| Active | $q$ is providing one or more streams to $n$. |
| Requested | $n$ has requested one or more streams from $q$ but has not received a response yet. |
| Candidate | $q$ can provide one or more streams (for each stream $s$, we keep track of $LT_{p_s}(q)$ and $UP(q)$). |
| Rejected | $q$ rejected a stream request from $n$ (we keep track of the number of rejects). |

If $n$ is interactive ($n \notin R$), it will request $s$ from the neighbor with the best end-to-end latency regardless of whether it has free upload slots (Line 10). In that case, because interactive nodes have priority, the overlay will self-organize to bring them close together: at every iteration, $n$ will try to connect to nodes at higher positions in the other streaming trees. Provided that the network has enough capacity to carry all the streams to all nodes, by marking neighbors that rejected stream requests and by constantly exchanging views, $n$ has a high chance of finding sources for all streams in the session.
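The decision logic of Algorithm 2 can be sketched as follows. This is a simplified illustration under stated assumptions rather than the actual implementation: the `Neighbor` record, the function name and the `current` map (stream id to the latency offered by the current relay, absent if there is none) are hypothetical, and among neighbors with free upload slots we assume the one with the lowest latency is chosen.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Neighbor:
    node_id: str
    latency: Dict[str, float]   # stream id -> LT_ps(n, q), in ms
    upload_slots: int           # UP(q), free upload slots reported by q
    rejected: bool = False      # True if q is in the Rejected state


def choose_requests(
    is_receiver: bool,
    view: List[Neighbor],
    streams: Iterable[str],
    current: Dict[str, float],
    th_dl: float = 15.0,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Return {stream id: neighbor id} for the requests n should send."""
    rng = rng or random.Random()
    order = list(streams)
    rng.shuffle(order)                      # evaluate streams in random order
    requests: Dict[str, str] = {}
    for s in order:
        # V': non-rejected neighbors, lowest end-to-end latency first.
        cands = sorted(
            (v for v in view if not v.rejected and s in v.latency),
            key=lambda v: v.latency[s],
        )
        if not cands:
            continue
        best = cands[0]
        cur = current.get(s)                # None means no relay for s yet
        free = [v for v in cands if v.upload_slots > 0]
        q = None
        if is_receiver and cur is None and free:
            q = free[0]                     # Line 6: prefer a free slot
        elif is_receiver and cur is not None and best.latency[s] + th_dl < cur:
            q = best                        # Line 8: significantly better relay
        elif cur is None or cur > best.latency[s] + th_dl:
            q = best                        # Line 10: best latency, slots ignored
        if q is not None:
            requests[s] = q.node_id
    return requests
```

With `th_dl = 15`, a receiver whose current relay offers 110 ms does not switch to a neighbor offering 100 ms, while a receiver whose relay offers 130 ms does.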

For the second task, upon receiving a request from a node $q$ for a stream $s$, a node $n$ will process the request according to Algorithm 3. If $n$ has available upload slots, it will establish a connection with $q$ to transmit the requested stream (Line 2). If $n$ has no free upload slots, it might decide to stop one of its active connections to accommodate $q$ if at least one of the following conditions is met:

• $n$ is a receiver node and $q$ is an interactive node (Line 8). According to Req. 1, interactive nodes have priority in the streaming trees.

• $q$ can be a better relay node than one of the currently connected nodes. $q$ is a better relay node only if $q$ can provide lower latency as an intermediate node in the streaming tree branch starting at $n$ (Line 8, second condition). For stability reasons, a threshold $th_{up}$ is applied when calculating the latency gain, as we will discuss later in this section.

• $n$ is using all of its upload capacity for streams other than $s$ (Line 10). We want to make sure that streaming trees can be built for all streams in the session (Req. 3). At the same time, we do not want to pre-allocate bandwidth based on the number of streams, as this could result in trees with higher latency (trade-off discussed in Chl. 3). For that reason, $q$ is given a fair chance of getting an upload slot from $n$ in case $n$ is not contributing resources to $s$.

Algorithm 3. Algorithm for Processing a Stream Request

Input: Current node $n \in U$, requesting node $q \in U$, requested stream $s \in S$
Result: Reject or accept the stream request
1   if $UP(n) > 0$ then
2       Establish connection between $n$ and $q$, start streaming $s$ and return;
3   $A \leftarrow \{a \mid a \in U \wedge a$ is receiving any stream from $n\}$;
4   if $\{n, q\} \subseteq I$ and $|A \cap I| > k$ then
5       Reject the request and return;
6   $a' \leftarrow \arg\max_{a \in (A \cap R)} (LT_p(n, a) : p \in I)$;
7   $x \leftarrow nil$;
8   if ($n \in R$ and $q \in I$) or $LT_{p_s}(n, q) + th_{up} < LT_{p_s}(n, a')$ then
9       $x \leftarrow a'$;
10  else if $n$ is not a relay node for $s$ then
11      $s' \leftarrow$ the stream using the biggest fraction of $n$’s upload capacity;
12      $x \leftarrow \arg\max_{a \in (A \cap R)} (LT_{p_{s'}}(n, a))$;
13  if $x \neq nil$ then
14      Stop streaming $s$ to $x$ and notify $x$;
15      Establish connection between $n$ and $q$ and start streaming $s$;
16  else
17      Reject the request and return;

Node $n$ will reject the stream request if none of these conditions is met, or if $n$ and $q$ are both interactive nodes and $n$ is using a large number of its upload connections to provide streams to other interactive nodes (Line 4). In the algorithm we use a constant $k$ as this number; the value of $k$ is discussed in Section 6. In an environment with scarce upload bandwidth, if interactive nodes have only enough capacity to exchange streams among themselves, then receiver nodes might not get any of the streams. Therefore, we make sure that part of the capacity of interactive nodes is available to receiver nodes.

If $n$ has no free upload slots and accepts the request, it will disconnect the active node that is not interactive and is receiving a stream from $n$ with the highest end-to-end latency (argmax function in Line 6). If $n$ is not yet a relay node for the requested stream, then the node to be disconnected (Line 12) is selected from the ones receiving $s'$, which is the stream that holds the biggest fraction of $n$’s upload capacity. The disconnected node is informed by $n$, marks $n$ as Rejected, and tries to find a new parent by executing Algorithm 2 again.
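A compact sketch of the request-processing logic in Algorithm 3 is given below. It is illustrative only and makes several simplifying assumptions: each child connection carries a single stream together with its end-to-end latency, the eviction candidate of Line 6 is the receiver child with the highest latency regardless of stream, and a request is rejected when $n$ has no receiver children to evict. The `Child` record and the function signature are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Child:
    node_id: str
    stream: str
    interactive: bool
    latency: float   # end-to-end latency of `stream` at this child via n


def process_request(
    n_interactive: bool,
    free_slots: int,
    children: List[Child],
    q_interactive: bool,
    s: str,
    q_latency: float,
    k: float,
    th_up: float = 40.0,
) -> Tuple[bool, Optional[Child]]:
    """Return (accepted, evicted child or None) for a request from q for s."""
    if free_slots > 0:
        return True, None                           # Lines 1-2
    interactive_children = {c.node_id for c in children if c.interactive}
    if n_interactive and q_interactive and len(interactive_children) > k:
        return False, None                          # Lines 4-5
    receivers = [c for c in children if not c.interactive]
    if not receivers:
        return False, None                          # nothing can be evicted
    worst = max(receivers, key=lambda c: c.latency)  # Line 6
    victim = None
    if (not n_interactive and q_interactive) or q_latency + th_up < worst.latency:
        victim = worst                              # Lines 8-9
    elif all(c.stream != s for c in children):      # n is not a relay for s
        s_big = Counter(c.stream for c in children).most_common(1)[0][0]
        pool = [c for c in receivers if c.stream == s_big]
        if pool:
            victim = max(pool, key=lambda c: c.latency)  # Lines 11-12
    return victim is not None, victim
```

The caller is expected to close the connection to the returned child, notify it, and then start streaming `s` to the requesting node.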

The result of executing Algorithms 2 and 3 is the construction of multiple streaming trees, each rooted at one interactive node. Fig. 2 shows two trees constructed after all nodes have established connections to all streams in the session. In the figure, the interactive node that produces the content for a particular streaming tree is colored green, the interactive node that is not a producer is colored with a lighter green, while the blue nodes with thinner borders are receiver nodes, only consuming and relaying the content. The arrows indicate streaming connections among nodes with the associated end-to-end latency. The figure shows that interactive nodes occupy places higher up in the other streaming trees (interactive node 0 is receiving directly from 1, and 1 directly from 0). It also shows that the same node may be an important relay node in one tree but not in the other (e.g., node 5 is a relay node in the tree rooted at node 1 but is a leaf in the tree rooted at node 0).

Practical Considerations: Maelstream does not prescribe any particular media format or data stream serialization, and ultimately, an accepted stream request translates into a transport layer connection being created between two peers. Different implementations for the applications described in Section 2 may decide to use the connections in different ways. For instance, applications using scalable video coding could use the low latency connections to quickly push the basic video quality layer to all users, while using the neighbor set in a swarming fashion to retrieve the remaining video enhancement layers.

Synchronization is an important aspect of multimedia systems. End-to-end latency differences across peers will result in out-of-sync streams coming from different sources, and also in discrepancies in the time at which the combined media is played at each peer. Exact asynchrony tolerance levels have not yet been determined for different types of interactive applications [31], and so Maelstream was not designed to keep these levels within bounds. Nevertheless, by aiming at reducing the overall end-to-end latencies, Maelstream effectively reduces the time discrepancies among peers, which helps reduce asynchrony levels.

Fig. 2. Construction of multiple streaming trees.

Another important aspect of P2P systems is NAT traversal. When deploying a P2P system on the Internet, some peers can be behind NAT gateways and firewalls, which will limit their capacity to directly communicate with each other. Although a large share of peers on the Internet are behind NAT, it has been shown in [32] that, for UDP streaming, about 80 percent of all NAT type combinations are traversable by using well-known techniques (such as hole-punching), without the need for third-party relays. In Maelstream we assume that nodes can communicate directly (they have either public addresses or traversable NAT types), and we leave the evaluation of particular traversal methods as future work.

Convergence to Stable Trees: One issue related to dynamically constructing streaming trees is the time it takes for the trees to stabilize. Stabilization is mainly associated with the decision to replace active upload or download connections to achieve better latencies. Continually replacing connections would cause temporary interruptions during the streaming session, and would require constant mending of broken tree branches. Therefore, there is a trade-off between the achieved end-to-end latency and the time that it takes to produce a stable streaming tree. To achieve an acceptable convergence speed, Maelstream defines configurable thresholds for the latency gain ($th_{dl}$ in Algorithm 2 and $th_{up}$ in Algorithm 3) so that the nodes can decide if the latency gain is sufficient to justify the reconnections.

6 EVALUATION

We have conducted extensive simulations to evaluate Maelstream, using two distinct simulation environments: Peersim [33] and ns-3 [34]. Peersim abstracts most of the complexities found in large scale P2P environments, providing us with a simple framework to implement and evaluate Maelstream’s basic overlay properties. The ns-3 network simulator allows us to conduct packet-level simulations (see footnote 3), in which multiple streams interact at shared links, causing congestion and delay variations. Thus, ns-3 presents an alternative to using methods to simulate bandwidth dynamics, such as the one presented in [35]. ns-3 simulations are more realistic than the Peersim ones but can be computationally expensive and do not scale well with the number of simulated nodes and streams. For that reason, we have implemented smaller scale scenarios with shorter session times in ns-3 and larger scale scenarios with longer simulation times in Peersim.

6.1 Settings

Table 3 summarizes the important settings used to configure the Peersim and ns-3 simulations. In both simulators, each node is connected to a random internet host in the underlying internet topology, so that the hosts are used as routers to send messages subject to host-to-host latencies. In ns-3, the messages are also subject to queuing at intermediate routers and dropping caused by congestion on shared links. Each node running Maelstream keeps a neighbor cache of size $c = 30$ and periodically exchanges a view of size $l = 8$. The values of $l$ and $c$ are similar to the ones suggested in [29]. The constants used in Algorithms 2 and 3 are: $k = 0.8 \times up$, $th_{up} = 40$ and $th_{dl} = 15$. We selected these values through sensitivity analysis, whereas in real settings they can be configured according to the application goals. For this analysis, we ran Peersim simulations with $th_{up}$ and $th_{dl}$ ranging from 0 to 100 and evaluated the achieved end-to-end latencies as well as the number of reconnections. We observed that using low $th_{up}$ values results in a higher number of reconnections than using low $th_{dl}$ values, thus we selected a more conservative value for $th_{up}$.

In ns-3, we introduce churn by having nodes join the network according to a Poisson model with a mean of 3 arrivals per second. Node removals follow a Pareto distribution with a minimum stay time of 180 seconds and $\alpha = 1.42$. These values constitute a moderate churn model and are consistent with models used in other works [7], [16]. The stream producers follow the same joining model but do not leave the network until the end of the session. During the simulation, all packets are timestamped using the ns-3 global simulation time, so that the actual end-to-end latencies can be calculated upon packet arrival. These timestamps are only used to report simulation results, and the simulated systems do not use synchronized clocks for their operations. The latency values are calculated at the end of the simulation as the average of the latencies of the packets received in a time frame of two seconds.
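For illustration, the churn model described above can be sampled with Python’s standard `random` module. This is a sketch under assumptions and not the ns-3 scripts used in the evaluation: Poisson arrivals are generated from exponential inter-arrival times, stay times are drawn from a Pareto distribution scaled by the minimum stay time, and the function names are hypothetical.

```python
import random
from typing import List


def poisson_arrivals(rate_per_s: float, horizon_s: float,
                     rng: random.Random) -> List[float]:
    """Join times in [0, horizon_s] for a Poisson process with the given rate."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)   # exponential inter-arrival time
        if t > horizon_s:
            return times
        times.append(t)


def pareto_stay_time(min_stay_s: float, alpha: float,
                     rng: random.Random) -> float:
    """Stay time with a Pareto distribution: shape alpha, scale min_stay_s."""
    # paretovariate returns samples >= 1, so scaling enforces the minimum.
    return min_stay_s * rng.paretovariate(alpha)


rng = random.Random(42)
joins = poisson_arrivals(rate_per_s=3.0, horizon_s=100.0, rng=rng)
leaves = [t + pareto_stay_time(180.0, 1.42, rng) for t in joins]
```

Every sampled node stays for at least 180 seconds, and the heavy tail of the Pareto distribution means that a few nodes stay much longer than the rest.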

TABLE 3
Summary of Simulation Settings

| Setting | Peersim | ns-3 |
| --- | --- | --- |
| Underlying internet topology | Meridian Internet latency data set (footnote 4) with 2500 internet hosts | Inet (footnote 5) topology with 3000 internet hosts |
| Total number of nodes | 2600 nodes | 500 nodes (e.g., webinars; footnote 6) |
| Number of distinct streams | 10 (maximum number of participants in Google Hangouts) | 4 (maximum number of users in applications such as Apple’s iChat) and 10 |
| Session time | 24 hours, where we match PeerSim time unit to 1 ms | 10 minutes in ns-3 simulation time |
| Node capacity ($up$ = number of upload slots) | $10 \le up \le 15$, randomly selected | $8 \le up \le 12$, randomly selected |
| Streaming | No actual streams of data packets. Nodes only maintain logical tree links. | Each interactive node streams at 500 kbps, constant bitrate. |

3. Maelstream ns-3 implementation and all simulation scripts can be found at http://folk.uio.no/provensi/maelstream/

4. http://www.cs.cornell.edu/People/egs/meridian/data.php

5. http://topology.eecs.umich.edu/inet/

6. 522 attendees on average, according to https://goo.gl/9qKW8p



In Peersim, we use the Skype P2P network FTA traces [36] to introduce churn. Throughout the session, there are bursts of peers joining and leaving the network, so that the streaming trees have to be constantly mended to handle the bursts. We treat each joining event as a new node joining the network (nodes that leave and later re-join the session have to follow the joining procedure described in Section 5.3 again). The minimum stay time is one minute and the mean stay time is six hours.

6.2 Baselines

To evaluate Maelstream, we compare it against two baselines. The first one is Chunkyspread adapted to multi-source streaming. We selected Chunkyspread because it achieves lower or similar latencies and better churn resilience than other tree-based systems such as Splitstream. This baseline is only used in the ns-3 simulations, as Chunkyspread requires data packet manipulation and Peersim lacks an abstraction for packets. The second baseline is a hypothetical centralized system that uses Algorithm 1 to build low latency streaming trees. In this baseline we relax the upload constraints to produce a lower bound for latency. We did not select pull-based systems, such as Chainsaw, as the latencies achieved by these systems are always higher than those of systems that push the data down pre-constructed streaming trees.

We have implemented Chunkyspread in ns-3 (see footnote 7) following the algorithms described in [7]. Since Chunkyspread does not support multi-source streaming, we have explored two variants. The first is to deploy multiple instances of Chunkyspread in each node, one for each stream producer. The second is to use only one instance and make the root of each individual tree the source of a distinct stream (instead of splitting the same stream into multiple slices).

The first variant has the advantage of being more resilient to churn, since removing a tree branch will only interrupt one slice of the stream (that could still be presented to the user if the media can be decoded using incomplete data). However, it suffers from higher latencies, as data packets for the same stream will travel to the receivers following different paths and will arrive with varying delays. Therefore, the end-to-end latency achieved by this variant of Chunkyspread for one stream is always equal to the latency of the slice following the highest end-to-end latency path.

Throughout this section, we compare Maelstream with the second variant of Chunkyspread, as it is better suited to interactive media streaming. We use the Cyclon protocol [29] to dynamically generate a random view of neighbors for Chunkyspread.

The second baseline, from now on referred to as the lower-bound baseline, assumes a centralized planner with knowledge about all the nodes in the network, as described in Section 5.1. In order to consider this baseline as the lower-bound solution, we implement Algorithm 1 and ignore the fact that the bandwidth capacity of the nodes is shared across multiple trees. Therefore, the overall latency across all the trees will always be lower than the latency achieved by concurrent trees sharing the same resources.

We have implemented Algorithm 1 in ns-3 and Peersim and compared it against the Maelstream implementation under the same settings. We use the Cyclon protocol to bootstrap the simulation, and each node is initialized with a limited neighbor view. The nodes collect RTT information from all their neighbors, and this information is immediately accessible by the planner, which is implemented as an external observer in the simulation. The minimum latency trees are computed every cycle, and we ignore the overhead and delay of disseminating the new tree structures from the central planner, so that the trees are immediately available to all the nodes in the overlay.

For the Peersim experiments, in addition to the lower-bound algorithm, we compare Maelstream with a naive tree construction protocol. In this protocol every node has access to a random neighbor set, from which it chooses entries to request streams based on end-to-end latency. Nodes receiving requests will only accept them if they have available bandwidth capacity, and there are no mechanisms in place for replacing active stream connections or enforcing fair bandwidth sharing. Because of this, we expect Maelstream to always perform better than the naive solution.

6.3 Performance Evaluation

First, we evaluate the performance of Maelstream with respect to Req. 1. The main performance challenge (Chl. 2) is to build and maintain concurrent low latency streaming trees. We conducted a series of experiments to evaluate the achieved end-to-end latencies of Maelstream compared to the baselines described in Section 6.2.

Fig. 3 shows the average end-to-end latency across all nodes over time, collected from the Peersim experiments. In this scenario, a new interactive node is added to the system at the start of each joining burst, until the number of interactive nodes reaches 10 (after about 230 minutes). Before all the interactive nodes have joined, because there is enough bandwidth capacity in the system, Maelstream’s performance is close to the lower-bound baseline. As more interactive nodes join, the bandwidth required to carry all streams to all nodes increases, and the difference between Maelstream and the lower-bound baseline grows. The difference, however, is always significantly smaller than the difference between Maelstream and the naive baseline.

Fig. 4 shows the empirical cumulative distribution function (CDF) of the latencies with which each node received each stream, measured at all nodes for the duration of the Peersim experiments. Using the lower-bound baseline, almost 100 percent of the latencies perceived by the nodes are lower than 200 ms. With Maelstream, this percentage is reduced to about 95 percent. Using the naive solution, only 40 percent of the latencies perceived by the nodes are lower than 200 ms.

Fig. 3. Average end-to-end latency over time (Peersim).

7. We could not get access to a reference implementation of Chunkyspread and had to implement it ourselves. This implementation is also available at http://folk.uio.no/provensi/maelstream/

Fig. 5 shows the average end-to-end latencies of Maelstream compared to the ones achieved by Chunkyspread and the lower-bound baseline, in the first five minutes of the ns-3 simulations. This simulation was performed in more stable settings, so that Maelstream and Chunkyspread can converge to a stable solution. In these settings, there are 10 streams, with nodes joining the network within the first minute of simulation and no node leaving the network until the end of the session. The results are consistent with the ones in Fig. 3, with Maelstream being much closer to the lower-bound values than to the Chunkyspread values.

Fig. 6 shows the CDF of latencies with which each node receives each stream in the session in the ns-3 experiments when we apply the Poisson churn model. According to the settings in Table 3, for the experiments with 4 stream producers, all nodes have enough bandwidth capacity to carry all the streams in the session. However, with 10 stream producers, there is just enough capacity in the overlay to carry all the streams, but the lower capacity bound does not allow all nodes to carry all the streams (the impact of this lower capacity on the streaming rate is evaluated in Section 6.4). The figure shows that Maelstream can provide better latencies and also scales better with the number of streams in a bandwidth constrained environment.

For the scenario with 4 stream producers, two main factors contribute to the latency gain in Maelstream. The first is the biased neighbor selection. The second is that Chunkyspread adopts a latency optimization technique that considers relative delays as opposed to end-to-end latency. For the scenario with 10 streams and scarce bandwidth, Maelstream tries to prevent a node from spending all of its upload capacity on a single stream, so that all nodes have a chance to get all streams. Chunkyspread does nothing to prevent this, and some nodes will struggle to find parents that can provide good latency for the other streams in the session. The higher latencies in the scenario with 10 streams indicate that in environments with high bandwidth heterogeneity and scarce upload capacities, the system should trade off media quality for lower latency. Techniques such as video scaling can be applied to reduce the bandwidth requirements of the system, giving low latency relay nodes more slots to carry the media to more nodes.

Req. 1 also states that interactive nodes should have priority in the other streaming trees. To evaluate that, we need to verify that if a node $i \in I$ and a node $r \in R$ are deployed on the same network host, $i$ will receive the streams in $S$ with latencies lower than or equal to the latencies for the same streams at $r$. This will happen because relay nodes should use their capacity first to provide streams to nodes in $I$, and then to nodes in $R$. We ran the Peersim experiments using Maelstream with and without giving priority to interactive nodes, and deployed receiver nodes on the same routers as the interactive nodes. Fig. 7 shows the results of these experiments. The latency values for receivers and interactive nodes are plotted in 3D space using {node, router, stream} tuples. To fulfill Req. 1, the surfaces in the plot should not intersect, and the surface representing the interactive nodes should be below the one for receiver nodes, as the latency decreases with depth in the 3D space. Fig. 7a shows that without the prioritization the surfaces intersect, and the latency experienced by interactive nodes is at times longer than for receiver nodes. Fig. 7b, on the other hand, shows that interactive nodes experience shorter latencies than their network co-located receiver nodes if the prioritization is in place.

6.4 Robustness Evaluation

According to Req. 2, the neighbor set of a node should contain relay nodes for all streaming trees in the session, so that in case of user churn, broken tree branches can be mended as quickly as possible. In the Peersim experiments, we define stream availability $SA(n, s) = (t_m - \sum_{i=1}^{j} t_i) / t_m$ as a measure of how much time a node $n$ had uninterrupted access to a stream $s$, where $t_m$ is the duration of the measured streaming interval, $j$ is the number of times the stream $s$ was interrupted, and $t_i$ is the measured duration of the $i$-th interruption (the interval between a connection break and a reconnection to a different neighbor).

Fig. 4. CDF of end-to-end latencies of each stream received by each node (Peersim).

Fig. 5. Average end-to-end latency over time (ns-3).

Fig. 6. CDF of end-to-end latencies of each stream received by each node (ns-3).
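As a worked illustration of the stream availability measure defined above, the sketch below computes SA(n, s) from a measured interval and a list of interruption durations. The function name and the choice of seconds as the unit are assumptions made for this example only.

```python
from typing import Sequence


def stream_availability(t_m: float, interruptions: Sequence[float]) -> float:
    """Compute SA(n, s) = (t_m - sum of t_i) / t_m.

    t_m           -- duration of the measured streaming interval
    interruptions -- durations t_1..t_j of the j interruptions of stream s
                     at node n, in the same unit as t_m
    """
    if t_m <= 0:
        raise ValueError("the measured interval must be positive")
    total_down = sum(interruptions)
    if total_down > t_m:
        raise ValueError("interruptions cannot exceed the measured interval")
    return (t_m - total_down) / t_m


# A node measured for 600 s that lost the stream twice, for 12 s and 18 s,
# has an availability of (600 - 30) / 600 = 0.95.
availability = stream_availability(600.0, [12.0, 18.0])
```

A node that never loses the stream has availability 1, and the value decreases linearly with the accumulated reconnection time.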

To maintain low end-to-end latencies in the presence of churn (Chl. 2), Maelstream strives to populate the neighbor sets with alive nodes that can provide all of the streams in the session with as low latency as possible. We evaluate Maelstream with different neighbor selection strategies:

1) Random: Selecting all neighbors at random increases the overall robustness of the overlay, as discussed in Section 5.3.

2) Biased towards latency reduction: Ranks the candidate nodes and keeps only the best ones. This could have a negative impact on robustness, as low-latency clusters can get disconnected.

3) Mixed: Combines both biased and random selection.
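The three strategies listed above can be contrasted with a short sketch. It is illustrative only: the ranking criterion (a single latency estimate per candidate, lower being better), the `biased_fraction` split used by the mixed strategy, and all names are assumptions, not the ranking function or parameters used by Maelstream.

```python
import random
from typing import Dict, List, Optional


def select_neighbors(candidates: Dict[str, float],
                     k: int,
                     strategy: str,
                     biased_fraction: float = 0.5,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Pick up to k neighbors from `candidates` (node id -> latency estimate).

    strategy is one of:
      "random" -- uniform sample of the candidates
      "biased" -- the k candidates with the lowest latency estimate
      "mixed"  -- the best candidates for a fraction of the slots
                  (assumed split), random picks among the rest for the others
    """
    rng = rng or random.Random()
    nodes = list(candidates)
    k = min(k, len(nodes))
    if strategy == "random":
        return rng.sample(nodes, k)
    ranked = sorted(nodes, key=candidates.__getitem__)  # lowest latency first
    if strategy == "biased":
        return ranked[:k]
    if strategy == "mixed":
        n_best = max(0, min(k, round(k * biased_fraction)))
        best = ranked[:n_best]
        # Fill the remaining slots uniformly from the non-selected candidates.
        return best + rng.sample(ranked[n_best:], k - n_best)
    raise ValueError(f"unknown strategy: {strategy}")
```

The random slots in the mixed strategy are what keep low-latency clusters connected to the rest of the overlay, while the biased slots retain the latency benefit.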

In the Peersim experiments presented in Section 6.3, the mean time between bursts of events is 25 minutes, which constitutes a mild churn model with a small impact on stream availability. To better evaluate robustness, we reduce the mean time between bursts to 40 seconds (compressing the 24-hour dataset to 34 minutes) so that nodes have less time to mend broken connections before the next churn event. Fig. 8a shows the average stream availability across all nodes over time. The biased strategy performs worst in the presence of churn, while Maelstream's mixed strategy is very close to the random strategy. Fig. 8b shows the average end-to-end latencies for the same experiment. Maelstream using the mixed strategy performs slightly worse than the biased strategy, and better than the random strategy. In this scenario, by applying the mixed strategy, Maelstream achieves a better balance between churn resilience and latency reduction than the random and biased strategies.

In the ns-3 experiments, since we simulate data packets and the transport layer, we can calculate the streaming rate at a node n when receiving a stream s as the total amount of data received by n that is part of s, divided by the total session time since n joined the network. The streaming rate will drop when a node takes a long time to find a parent for s (or fails to find one), when intermittent packet drops occur at intermediate routers (rare when there is enough capacity), and when the systems are executing operations that temporarily remove tree branches. Also, longer delays caused by queuing at intermediate routers imply a difference between the total amount of data generated by the source of s and the total amount received at the nodes by the end of the simulation, further reducing the average rate.
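The streaming rate described above is a ratio of received data to elapsed session time. A minimal sketch of the arithmetic follows; the function name, the byte and second units, and the conversion to kbps (1 kbit = 1000 bits) are assumptions for illustration.

```python
def streaming_rate_kbps(bytes_received: int,
                        join_time_s: float,
                        now_s: float) -> float:
    """Average rate at which node n has received stream s.

    bytes_received -- total payload of s received by n since it joined
    join_time_s    -- simulation time at which n joined the network
    now_s          -- current simulation time (e.g., end of the session)
    """
    elapsed = now_s - join_time_s
    if elapsed <= 0:
        raise ValueError("the node must have been in the session for some time")
    return bytes_received * 8 / 1000 / elapsed


# 30 MB of stream data received over 600 s corresponds to 400 kbps.
rate = streaming_rate_kbps(30_000_000, join_time_s=0.0, now_s=600.0)
```

Because the denominator counts all time since the node joined, periods spent without a parent for s lower the average even if the instantaneous rate is nominal once the branch is mended.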

The churn model used for the ns-3 simulations is a moderate churn model with a mean stay time of five minutes (half the duration of the session), requiring the participants to constantly mend broken tree branches. Fig. 9 shows the average streaming rate with which each node receives each stream for the whole duration of the session following this churn model. In this scenario, Maelstream always performs better than Chunkyspread. With four stream producers, Chunkyspread and Maelstream are able to keep more than 80 percent of the {node, stream} pairs with rates higher than 400 kbps. With 10 stream producers and scarce bandwidth, both systems will display reduced receiving rates. This is due to the fact that, as the bandwidth gets scarcer, nodes will struggle to find neighbors willing to relay the required streams. Also, with an increased number of streams, churn events will impact more trees, requiring an increasing number of mending operations.

Fig. 7. Latency at interactive and receiver nodes deployed in the same routers.

Fig. 8. Performance of Maelstream using different neighbor selection strategies (Peersim).

Fig. 10 shows the number of tree operations (branch creation, removal and replacement) executed per second in the ns-3 experiments. Maelstream is more aggressive, and executes more operations than Chunkyspread. Part of these operations are for latency optimization over already mended trees. Chunkyspread is more conservative and executes fewer operations at the cost of higher latencies. This implies that in Maelstream a higher number of stream interruptions would be perceived by receivers. However, since the receiving rate in Maelstream is higher than in Chunkyspread (as seen in Fig. 9), the duration of the interruptions is shorter.

6.5 Scalability Evaluation

To fulfill Req. 3, Maelstream should be able to accommodate new nodes as they join the network, and reorganize the concurrent streaming trees in order to achieve the performance requirements. The main challenge (Chl. 3) is to accommodate new interactive nodes in a bandwidth-constrained environment. Fig. 11 shows the achieved end-to-end latencies of Maelstream compared to Chunkyspread, as the number of streams increases from 4 to 10 but the upload capacity at the nodes remains the same. The minimum and maximum values in the histogram are the average latencies of the streaming trees with the lowest and highest latencies respectively, with the mean as the average across all trees. Maelstream's performance is consistently much closer to the lower bound than Chunkyspread's. However, as the number of streams reaches 10, it becomes harder for both systems to find nodes close to stream producers with available capacity, hence the increased difference between minimum and maximum values. As discussed in Section 6.3, one solution could be to apply media scaling to reduce the volume of the highest latency stream.

Fig. 12 shows how effectively Maelstream and Chunkyspread use the bandwidth capacity at the nodes in the ns-3 experiment with 10 streams. The nodes are ranked based on their average end-to-end latencies and how much of their bandwidth capacity was used throughout the session. In an environment where it is harder to find nodes with spare bandwidth, Maelstream manages to use more of the network's capacity than Chunkyspread (the filled area in Fig. 12a is bigger than in Fig. 12b). The figures also show that there are more good relay nodes in Chunkyspread with unused capacity than in Maelstream: In Fig. 12b, nodes ranked 350 to 450 have better latencies than some of the nodes using 100 percent capacity. In Fig. 12a, there are only a few nodes between the ones ranked 450 and 500 that can offer better latencies than the nodes using 100 percent capacity.

Req. 3 also states that the bandwidth resources should be shared fairly across the streams. To evaluate the impact of Maelstream's fair bandwidth sharing mechanism (described in Section 5.4), we conducted experiments using Maelstream with and without this mechanism enabled. For these experiments we used Peersim and reduced the node capacity to 8 ≤ up ≤ 13, which resulted in the network having approximately 90 percent of the required capacity to carry all streams to all nodes. We collected the stream availability for each of the 10 streams and calculated the difference between the highest and lowest availabilities. Fig. 13 shows this difference over time. Before all the interactive nodes have joined the system (first 200 minutes), while there is sufficient capacity to carry all the streams to all nodes, the difference is very small. It grows after all the interactive nodes have joined, when some streaming trees suffer more from the shortage of bandwidth resources than others. The graph shows that by enabling the fairness mechanism the difference can be reduced considerably.

Fig. 9. CDF of receiving rate of each stream at each node (ns-3).

Fig. 10. Number of tree operations over time (ns-3).

Fig. 11. Average end-to-end latency of experiments conducted with increasing number of interactive nodes (ns-3).

Fig. 12. Ranking of nodes by end-to-end latency and how much of the node's capacity is being used (ns-3).
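The fairness indicator used in the experiment above is the spread between the best-served and worst-served stream. The sketch below shows the computation on hypothetical values; the function name and the sample availabilities are illustrative assumptions and are not measurements from the experiments.

```python
from typing import Dict


def availability_gap(availability_per_stream: Dict[str, float]) -> float:
    """Difference between the highest and lowest per-stream availability.

    A value close to 0 means the bandwidth shortage is spread evenly over
    the streaming trees; a large value means some trees are starved.
    """
    values = list(availability_per_stream.values())
    if not values:
        raise ValueError("at least one stream is required")
    return max(values) - min(values)


# Hypothetical per-stream availabilities averaged over all nodes.
gap = availability_gap({"s1": 0.99, "s2": 0.91, "s3": 0.95})
```

Evaluating this gap at each sampling instant yields a curve of the kind described for Fig. 13.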

7 CONCLUSIONS

In this paper we presented Maelstream, a decentralized and self-organizing approach for constructing and maintaining P2P overlays for a class of emerging interactive streaming applications. We investigated the requirements of such applications and how they translate to overlay and streaming tree properties. Maelstream takes advantage of the scalability and robustness of gossip-based overlays to construct latency-aware streaming trees that serve multiple distinct streams to multiple users. We have evaluated the proposed solution through a set of simulations using Peersim and ns-3, and have shown that it can provide low latency streaming, even in the presence of churn. As future work we intend to implement Maelstream as a full-fledged media streaming prototype, so that we can test and evaluate its performance in real deployments. Technologies such as WebRTC (footnote 8) can be used to implement this prototype. We also aim at exploring our solutions in applications where the nodes can change their role over time. This way, for instance, receiver nodes could dynamically become interactive.

ACKNOWLEDGMENTS

This research was partially conducted in the framework of the Verdione project funded by the Research Council of Norway under grant 187828.

Fig. 13. Difference between highest and lowest stream availabilities with fair bandwidth sharing mechanism (FBS) enabled and disabled (Peersim).

8. http://www.webrtc.org/

Lucas Provensi received the MS degree in computer science from the Federal University of Goias, Brazil. He is working toward the PhD degree in the Department of Informatics, University of Oslo. His PhD research is related to self-adaptation and self-organization supporting multimedia systems.

Abhishek Singh received the MS degree from the Vrije Universiteit Amsterdam. He is working toward the PhD degree at the University of Oslo. His research interests lie broadly in the areas of security, privacy, and distributed systems. In his PhD thesis, he is addressing research challenges for large-scale privacy-preserving data dissemination applications.

Frank Eliassen is a professor in UiO's Department of Informatics. He has been a researcher and project manager for several decades in the areas of distributed systems middleware and IoT/Cyber-Physical Systems (CPS), with experience from national and EU-level projects. His present research interests include service-oriented IoT/edge/fog computing and CPS middleware and programming models in application areas including smart cities and smart grids, adaptive software systems, autonomic systems (self-*), peer-to-peer systems, and cooperative micro-grids.

Roman Vitenberg is a professor in the Department of Informatics, University of Oslo. His research interests include distributed applications, middleware, and algorithms, including specification, design, analysis, implementation, performance evaluation, and software engineering. In particular, he has been working on large-scale communication, privacy and security, data storage, distributed event-based systems, fault-tolerant distributed computing, and more recently, blockchain. He is an associate editor for the EAI Transactions on Cloud Computing and a Steering Committee member for ACM DEBS. He has more than 70 publications in peer-reviewed venues and five filed patents. His papers have received best paper awards at the ACM/IFIP/USENIX Middleware, ACM SAC, and ACM DEBS conferences.