Application Layer Multicasting


Application Layer Multicasting
Yash Sheth (42961722), Swapnil Tiwari (63242307), Vishal Sahu (19489633)

Application Layer Multicast
Multicasting is increasingly used as an efficient and scalable mechanism for the dissemination of data across heterogeneous systems on the Internet.

Today's IP multicast is restricted to separate islands of singular control, and ISPs are reluctant to enable it in order to reduce router load.

To overcome this hurdle of router dependence, there is a need for efficient data delivery without modifying the network. This is where the application layer comes in! All multicast activity is implemented by the peers instead of the routers.

The goal of this multicast protocol is to construct and maintain an efficient overlay network of nodes for data transmission.

Overlay networks are formed by peers that need to transmit or receive the data. They self-organize into logical topologies that facilitate multi-point message forwarding.

References:
- C.K. Yeo, B.S. Lee, M.H. Er, "A Survey of Application Level Multicast Techniques"
- S. Banerjee, B. Bhattacharjee, C. Kommareddy, "Scalable Application Layer Multicast"
- B. Zhang, S. Jamin, L. Zhang, "Host Multicast: A Framework for Delivering Multicast to End Users"
- Kyungbaek Kim, Sharad Mehrotra, Nalini Venkatasubramanian, "FaReCast: Fast, Reliable ALM for Flash Dissemination"

Paper Study: in this paper we come across various kinds of overlay networks and their characteristics, and compare their performance on a few standard metrics.

Classification of Application Layer Multicast: by overlay topology design, by service model, and by architecture.

Paper 1: A Survey of Application Level Multicast Techniques

Classification by Overlay Topology
The control topology and the data topology together determine the overall structure of the overlay network.

Tree-Based Topology: Application Layer Multicast Architecture (ALMA)
- Self-configuring source-specific trees, where the data topology is contained within the control topology.
- A centralized Directory Server (DS) maintains an updated record of the tree. New members query the DS for a list of potential parents and inform it whenever they change their parent.
- Each member incorporates the dual performance metrics of loss rate and RTT reported by the peers on its ready-list.
- This is similar to a gossip algorithm, in which each member shares its end-to-end metric data with its gossip candidates.
- Members switch their parent to the one with the best distance metric; this is called spiraling with their ancestors and siblings.
- Loop detection in this dynamic tree topology is implemented by maintaining a level number for each member, updated with respect to the new parent's level whenever the member switches parents (sketched below).
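A minimal sketch of this level-number discipline, under the assumption (ours, not the survey's) that a member only adopts a parent at a strictly shallower level; all names and structures here are illustrative:

```python
# Hypothetical sketch of ALMA-style parent switching guarded by level
# numbers; names and structures are illustrative, not taken from the paper.

class Member:
    def __init__(self, name, level):
        self.name = name
        self.level = level          # the root sits at level 0
        self.parent = None

def combined_metric(loss_rate, rtt_ms, alpha=1000.0):
    """Dual performance metric: a weighted mix of loss rate and RTT."""
    return alpha * loss_rate + rtt_ms

def try_switch(node, candidate, cand_metric, cur_metric):
    """Adopt `candidate` as parent only if it improves the metric and
    sits at a shallower level, so we can never adopt a descendant."""
    if candidate.level >= node.level:
        return False                # could close a loop: refuse
    if cand_metric >= cur_metric:
        return False                # spiral only toward better parents
    node.parent = candidate
    node.level = candidate.level + 1    # re-level under the new parent
    return True

root = Member("root", 0)
a = Member("a", 1); a.parent = root
b = Member("b", 2); b.parent = a
print(try_switch(b, root, combined_metric(0.01, 30), combined_metric(0.02, 80)))  # True
```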

Application Level Multicast Infrastructure (ALMI)

- Provides multicast middleware implemented above the socket layer, tailored for small groups in a many-to-many service model.
- A central controller creates a minimum spanning tree (MST) of unicast connections between end hosts.
- Latency between members is used as the link cost of the MST, optimizing the ALMI overlay for low-latency data delivery (a latency-MST sketch follows the BTP slide below).
- Dynamic changes can create different versions of the MST, which the controller tracks by version number to prevent loops and partitions in the new tree.
- The new MST is periodically sent to members holding the previous version to update their knowledge of connections.
- A packet carrying a newer version number is not propagated by a member with an older version until that member is updated.

Banana Tree Protocol (BTP)
- Uses a receiver-based, self-organizing approach to build a shared group data tree, designed mainly for distributed file-sharing applications.
- The first host to join the group becomes the root; subsequent members learn of the root and join the tree.
- The tree is incrementally optimized for low delay by allowing a node to switch to a sibling if it is closer than the parent. Sibling information is provided to each node by its new parent.
- Loop prevention is handled by two conditions: a node rejects all switch attempts while it is itself switching parents, and it must include its current parent information when switching, to validate that the potential parent really is its sibling.
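How the ALMI controller might compute that latency-weighted MST can be sketched with Prim's algorithm; this is an illustration of the idea, not ALMI's actual code:

```python
# Minimal sketch (not ALMI's controller code): Prim's algorithm building a
# minimum spanning tree over pairwise member latencies.
import heapq

def latency_mst(latency):
    """latency: dict {(a, b): ms}, with both (a, b) and (b, a) present.
    Returns the set of tree edges minimizing total latency."""
    nodes = {n for edge in latency for n in edge}
    start = next(iter(nodes))
    in_tree, edges = {start}, set()
    heap = [(ms, a, b) for (a, b), ms in latency.items() if a == start]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(nodes):
        ms, a, b = heapq.heappop(heap)
        if b in in_tree:
            continue                      # already spanned
        in_tree.add(b)
        edges.add((a, b))
        for (x, y), w in latency.items():
            if x == b and y not in in_tree:
                heapq.heappush(heap, (w, x, y))
    return edges

lat = {("A", "B"): 10, ("B", "A"): 10, ("B", "C"): 5, ("C", "B"): 5,
       ("A", "C"): 20, ("C", "A"): 20}
print(latency_mst(lat))   # picks the 10 ms and 5 ms links, skips the 20 ms one
```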

Host Multicast (HM)
- Provides best-effort delivery to applications while remaining compatible with IP multicast to the furthest extent possible.
- Automates the interconnection of IP multicast islands and provides multicast to incapable end hosts via unicast tunnels.
- Multicast islands are connected via UDP tunnels between Designated Members (DMs) of the islands.
- It has a distributed tree-building protocol scaling to O(N).
- A new member discovers the root of the shared tree through a Rendezvous Point (RP). It sets the root as a potential parent, takes the list of its children, and performs a DFS-style search for the closest parent (sketched below).
- Loop detection is easy, as each member knows its entire path from the root.
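A minimal sketch of that DFS-style search, with hypothetical structures (the real protocol probes candidates with RTT measurements):

```python
# Illustrative sketch of the DFS-style parent search: starting at the root,
# the newcomer keeps descending into the closest child until no child is
# closer than the current candidate parent.

def find_parent(new, root, children_of, dist):
    """children_of: {member: [children]}; dist(a, b): measured RTT."""
    current = root
    while True:
        kids = children_of.get(current, [])
        closer = [c for c in kids if dist(new, c) < dist(new, current)]
        if not closer:
            return current            # nothing below is closer: attach here
        current = min(closer, key=lambda c: dist(new, c))

tree = {"root": ["A", "B"], "A": ["C"], "B": [], "C": []}
rtts = {"root": 50, "A": 20, "B": 40, "C": 10}
print(find_parent("X", "root", tree, lambda n, m: rtts[m]))   # 'C'
```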

Overlay Multicast Network Infrastructure (OMNI)

- Targeted at media-streaming applications.
- A single-source overlay tree aimed at minimizing average end-to-end latency.
- Each member maintains information about its neighbors and ancestors; storing the root path aids loop detection.

Mesh Tree
A two-step approach:
- Group members self-organize into a control topology called a mesh.
- A routing protocol runs across this control topology to define a unique path to each member.

Advantage over the tree approach: lower construction and maintenance overhead, as the routing protocol itself takes care of loop avoidance and detection.

Narada
First, a mesh control topology is built across the member nodes; then the nodes self-organize into source-rooted multicast trees, forming a data topology, using a DVMRP-style routing protocol.
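Narada derives its trees with a DVMRP-style distance-vector protocol; as a stand-in for that routing step, the sketch below builds a source-rooted shortest-path tree over the mesh with Dijkstra's algorithm:

```python
# Illustrative sketch of step two: derive a source-rooted tree from the
# mesh by keeping each member's predecessor on its shortest path.
import heapq

def source_tree(mesh, source):
    """mesh: {node: {neighbor: latency}}. Returns child -> parent edges of
    the source-rooted shortest-path tree over the mesh."""
    dist, parent = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d0, u = heapq.heappop(heap)
        if d0 > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in mesh[u].items():
            nd = d0 + w
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return parent

mesh = {"S": {"A": 5, "B": 20}, "A": {"S": 5, "B": 4}, "B": {"S": 20, "A": 4}}
print(source_tree(mesh, "S"))   # {'A': 'S', 'B': 'A'}
```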

Each member maintains state information about all other members, which is periodically exchanged, leading to a large number of messages.

The overhead of member discovery and membership maintenance is considerable and limits Narada to small and medium groups.

The utility of each link is periodically evaluated, and links may be deleted and added as needed.

Kudos
An extension of Narada with a hierarchical topology.

It partitions nodes into clusters, each with a unique representative (the cluster head). Within each cluster an independent instance of Narada runs; at the top level, Narada runs on an overlay comprising all the cluster heads.

The data topology comprises independent trees at the bottom level and a single tree at the top level.

The architecture is complex due to cluster migration, diffusion, and splitting.

However, it provides superior scalability and low management overhead.

Scattercast
Similar to Narada, but uses Scattercast proxies (infrastructure service agents).

Uses Gossamer to self-organize into a mesh control topology; source-rooted distribution trees are then constructed using a routing algorithm.

Gossip-style member discovery: each member periodically picks random nodes to exchange membership information with.

Latency is used as the cost metric.

The mesh is optimized using a cost function: the cost of routing to the different sources via each of a member's neighbors.

Embedded Structure
- Members of the overlay network are assigned logical addresses from some abstract coordinate space.
- The overlay is built using these logical addresses.
- Topology-agnostic, with some exceptions.
- Highly scalable.

ALM-CAN
- Designed to scale to large groups with a multiple-source service model.
- Uses the underlying CAN architecture of a virtual d-dimensional space divided into zones.
- Each node owns its zone and maintains a routing table consisting only of its neighbors.
- Control topology: the d-dimensional space. Data topology: directed flooding over the control topology.
- Dissemination follows a data-forwarding rule (sketched below): forward only to neighbors other than the one the packet arrived from, and don't forward once the packet has travelled half the space forward from the source.
- When nodes join, the zone splits; when nodes leave, zones are re-merged.
- Long data distribution paths.
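An illustrative sketch of this forwarding rule on a d-dimensional torus, simplifying zones down to point coordinates:

```python
# Illustrative sketch of ALM-CAN's directed-flooding rule (simplified).

def forward_progress(src, here, size):
    """Forward distance travelled from src to here, per dimension."""
    return [(h - s) % size for s, h in zip(src, here)]

def should_forward(src, dest, size):
    # Halt the flood once it has travelled half the space forward from the
    # source, so the two directions of the flood do not cross and loop.
    return all(d <= size / 2 for d in forward_progress(src, dest, size))

def targets(src, here, neighbors, came_from, size):
    return [n for n in neighbors
            if n != came_from and should_forward(src, n, size)]

# 1-D ring of size 8: node (3,) got a packet born at (0,) from neighbor (2,).
print(targets((0,), (3,), [(2,), (4,)], (2,), 8))   # [(4,)]
```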

Because CAN is topology agnostic, data paths can be very long; this can be optimized by assigning zones based on node proximity.

ALM-DT
- Designed to support large multicast group sizes; established and maintained in a distributed fashion.
- Each member is assigned (x, y) coordinates; the distance metric is the neighbour test.
- The control topology consists of combinations of Delaunay triangulations (DTs); nodes only maintain information about their neighbors.
- A source-rooted data topology is embedded in the DT, using compass routing.
- New nodes join using the DT server.
- A set of alternative routes is available in case of failure.
- Suboptimal mapping of the overlay to the physical network.

A DT for a set of vertices A is a triangulation graph with the defining property that for each circumscribing circle of a triangle formed by three vertices in A, no vertex of A is in the interior of the circle. The underlying mechanism used in forming the DT control topology is the neighbour test, which is based on the locally equiangular property. Two nodes are neighbours in the overlay if their corresponding vertices are connected by an edge in the DT comprising the vertices of all the nodes in the overlay. With compass routing, no separate loop detection is needed.

Compass Routing
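Compass routing itself is simple to state: forward to the neighbour whose edge makes the smallest angle with the straight line toward the destination. A minimal sketch of one routing step (standard algorithm, illustrative code):

```python
# Minimal compass-routing step: forward to the DT neighbour whose edge
# makes the smallest angle with the ray toward the destination.
import math

def angle_between(p, q, dest):
    """Angle at p between ray p->q and ray p->dest."""
    a = math.atan2(q[1] - p[1], q[0] - p[0])
    b = math.atan2(dest[1] - p[1], dest[0] - p[0])
    d = abs(a - b)
    return min(d, 2 * math.pi - d)

def compass_step(here, neighbours, dest):
    return min(neighbours, key=lambda n: angle_between(here, n, dest))

print(compass_step((0, 0), [(1, 0), (0, 1)], (5, 1)))   # (1, 0)
```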

Bayeux
- Designed primarily for fault tolerance.
- Control topology: the Tapestry overlay.
- Data topology: built explicitly on top of the control overlay; consists of nodes acting both as software routers and as multicast receivers.
- Routing is done using local routing maps stored at each node; any destination can be reached in log_b N steps (see the sketch below).
- Topology-aware join and leave operations.
- Generates more traffic, since it maintains more group-membership information.
- The root is a single point of failure; this can be avoided by a multicast-tree partitioning mechanism.

Tapestry is a wide-area routing and location infrastructure that embeds nodes in a well-defined virtual address space. Nodes have names independent of their locations, in the form of random fixed-length bit sequences represented in a common base (e.g., 40 hex digits representing 160 bits). These can be generated by secure one-way hashing algorithms such as SHA-1.
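A back-of-the-envelope check of the log_b N claim, using the slide's ID parameters (plain arithmetic, not Tapestry code):

```python
# With IDs of L digits in base b, prefix routing resolves one digit per
# hop, so any of up to N = b**L destinations is reached in at most L hops.
import math

b, L = 16, 40            # 40 hex digits = 160-bit IDs, as on the slide
N = b ** L
print(round(math.log(N, b)))   # 40 hops in the worst case
print(b * L)                   # small routing state: ~b entries per digit
```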

Nodes store only a small, constant number of entries (on the order of b per digit position).

The data tree has to be explicitly built on top of the control overlay.

In particular, the root keeps a list of all group members, and all group-management traffic must go through the root, which is not only a bottleneck but also a single point of failure. Bayeux proposes a multicast-tree partitioning mechanism to ameliorate these problems by splitting the root into several replicas and partitioning members across them.

Routing in Bayeux

Join Request in Bayeux

NICE
- Designed for low-bandwidth, data-streaming applications with large receiver sets.
- The distance metric is the latency between members.
- Control topology: a multilevel hierarchy. A distributed clustering protocol at each layer chooses a cluster leader, the graph-theoretic center of the cluster, in layer Li to join layer Li+1.
- Cluster size at each layer is between k and 3k-1, where k is a constant, and each cluster comprises members close to each other.
- Joining operation: a new member is bootstrapped by the RP to the hosts in the highest layer (Li) of the hierarchy. The joining host contacts the members of this layer to find the closest one by latency; the selected node informs the new node about its peers in layer Li-1; the process continues iteratively until the node locates its cluster in layer L0.

The multilevel hierarchy is how NICE achieves scalability.

Multilayered Hierarchy in NICE

NICE (contd.)
- Topology-aware, since proximity is based on latency between nodes.
- In the control topology, all members of each cluster peer with one another and exchange refresh messages carrying the latencies between them.
- Members also probe peers in the supercluster to identify a new cluster leader, adapting to changing network conditions.
- Cluster leaders are responsible for splitting and merging clusters to keep sizes within bounds.
- Worst-case control overhead is O(k log N); the number of application-level hops between any pair of members is O(log N).

NICE (contd.)
- Data topology: an embedded source-specific tree with a simple forwarding rule (sketched below). The source sends a packet to all its peers in the control topology; a receiver forwards the packet only if it is a cluster leader.
- Loops and partitions may occur due to stale cluster membership or node failure. There are no special mechanisms for handling these; the protocol depends on cluster members and leaders to restore the hierarchical relationships and reconcile the cluster view for all members.
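An illustrative sketch of that forwarding rule, with hypothetical data structures standing in for the cluster state:

```python
# A member forwards a packet into a cluster only if it leads that cluster,
# so the source-specific tree falls out of the hierarchy implicitly.

def forward_targets(member, clusters, leaders, received_in=None):
    """clusters: {cluster_id: set of members}; leaders: {cluster_id: member}.
    received_in is None when `member` is the original source."""
    targets = set()
    for cid, peers in clusters.items():
        if member not in peers or cid == received_in:
            continue
        if received_in is None or leaders[cid] == member:
            targets |= peers - {member}
    return targets

clusters = {"L0a": {"s", "x", "y"}, "L0b": {"y", "z"}, "L1": {"y", "q"}}
leaders = {"L0a": "y", "L0b": "y", "L1": "q"}
print(forward_targets("s", clusters, leaders))          # {'x', 'y'}: source
print(forward_targets("y", clusters, leaders, "L0a"))   # {'z'}: leads L0b only
```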

Flat Topology vs. Hierarchical Topology
In a flat topology, a single-layer overlay sits atop the physical network. In a hierarchical topology, a hierarchy is introduced into the overlay, forming a multi-layer overlay.
- Hierarchical topologies are highly scalable and incur low management overhead.
- However, child nodes cannot form overlay links across clusters, and additional overhead is required to maintain the clusters.

Service Models
The mechanism by which data is delivered to multiple destinations.

Best Effort vs Reliability

Source-specific delivery vs. any-source delivery

Determines the type of application the various multicast techniques can support (e.g., video conferencing vs. data replication).

Best Effort vs. Reliable Transfer
Reliability here refers to end-to-end reliability across the overlay.

Unlike TCP, which provides reliability only hop-to-hop, on each overlay link, not across the whole data path.

Reliable transfer must be augmented with data duplication and retransmission.

This puts pressure on the source to handle retransmission and rate control.

Receivers can implement buffering and rate-limiting capabilities to help the source node.

Best Effort vs. Reliable Transfer (contd.)
ALMI uses a direct connection to the source for retransmission in case of error recovery.

Scattercast uses Scalable Reliable Multicast (SRM) to recover lost data. A recovery request is forwarded toward the source through intermediate proxy nodes (SCXs) until one of the SCXs can recover the data.

Some protocols use heuristics to choose the best outgoing path, predicting the shortest and most reliable link to the next hop.

Source-specific vs. Any-source
Source-specific: builds a data topology tree rooted at a source. Trees are simple and grow linearly with node additions, but there is overhead in maintaining and optimizing multiple trees.

Any-source: any member can be the root of the tree. Less overhead in tree building and management, but the tree is not optimized for any single source.

Source-specific vs. Any-source (contd.)
The type of model a protocol adopts depends on its targeted applications.

Scattercast uses the single-source model, as it is heavily used in media streaming and broadcasting.

ALMI, which primarily targets collaborative applications, uses the any-source model.

Architecture
Peer-to-Peer (P2P): all functionality is vested in the end users. Simple to set up and deploy; provides redundancy and resource sharing.

Proxy support: dedicated servers strategically deployed within the network to provide efficient data distribution and value-added services. Better equipped than individual hosts to provide efficient services, but usually static in nature, less responsive to changing conditions, and harder to deploy.

Peer-to-Peer vs. Proxy (contd.)
OMNI deploys proxies called Multicast Service Nodes (MSNs). Prior to data transfer, the MSNs organize themselves into a data delivery tree.

Overcast organizes proxies called Overcast nodes into a distribution tree rooted at a single source. The proxies also provide storage capability in the network.

Scattercast uses its proxies (SCXs) to build an application-aware framework (e.g., rate adaptation). SCXs are also used to repair partitions in the overlay network.

Host Multicast (HM) uses designated members (DM) to build a shared data tree. DMs are not proxies as they are not dedicated server nodes.

Architecture
Centralized: all responsibility for group management, overlay computation, and optimization lies with a central controller. This simplifies the routing algorithm and makes group management efficient and simple, but it limits scalability and creates a central point of failure.

Distributed: group management and overlay computation are vested in the individual nodes of the network. Requires some mechanism for a host to learn about some nodes in the multicast group. More robust to failure and more scalable, but incurs excessive overhead and is not optimal or efficient in building the overlay network.

Architecture (contd.)
Hybrid: a lightweight controller along with a distributed algorithm. The controller facilitates group management, error recovery, and some of the overlay construction; the actual overlay construction is done by the individual nodes.

Centralized, Distributed, and Hybrid (contd.)
- ALMI is the only protocol that uses a completely centralized approach, where the session controller has complete knowledge of membership.
- The Overcast protocol maintains a global registry through which proxies learn about the overlay network.
- SCRIBE requires a contact node in the overlay network, which is responsible for bootstrapping the new node.
- NICE and TBCP need a rendezvous point to learn about higher-level/root nodes.
- ALMA and HM are examples of the hybrid approach: both use a centralized server as a rendezvous point for new members, and the RP maintains membership information.

Performance Comparison
Scalability
- Limited by protocol efficiency.
- The number of receivers supported can be a good metric; another is the protocol overhead.
- A system that scales further is not automatically better than a system that doesn't scale to the same extent!

Performance Comparison
Efficiency of Protocols
Depends on factors such as:
- Quality of the data paths generated.
- Protocol overhead: the time and resources used to construct the application-layer multicast overlay, and the amount of state information required at each node to maintain the overlay.

Performance Comparison
Quality of data path:
- Stretch (also called the relative delay penalty): a measure of the increase in latency that applications incur while using overlay routing.

    stretch = (protocol's unicast routing distance) / (IP unicast routing distance)

- Stress: a measure of the effectiveness of the protocol in distributing network load across different physical links. Defined on a per-node, per-link basis, it counts the number of identical packets sent by the protocol over that link.

All comparisons are made against IP multicast. As these factors are difficult to measure, proxies such as path length (for stretch) and out-degree (for stress) can be used for analysis.
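The two metrics can be made concrete with a small sketch (illustrative encoding of overlay hops and physical links):

```python
# Stretch compares overlay path latency with direct IP unicast latency;
# stress counts identical copies of a packet crossing each physical link.

def stretch(overlay_path_ms, unicast_ms):
    """Relative delay penalty of routing along the overlay."""
    return overlay_path_ms / unicast_ms

def stress(overlay_hops, physical_route):
    """overlay_hops: list of overlay edges (src, dst); physical_route maps
    each overlay edge to the physical links it traverses."""
    load = {}
    for hop in overlay_hops:
        for link in physical_route[hop]:
            load[link] = load.get(link, 0) + 1
    return load

route = {("A", "B"): ["l1", "l2"], ("A", "C"): ["l1", "l3"]}
print(stretch(48.0, 30.0))                       # 1.6
print(stress([("A", "B"), ("A", "C")], route))   # {'l1': 2, 'l2': 1, 'l3': 1}
```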

Performance Comparison
Protocol Overhead
- Measured as the total bytes of non-data traffic that enter the network, including control and maintenance traffic, probe traffic, etc.
- State information maintained by a node also contributes to the protocol overhead.

Failure Tolerance
- Depends on the mechanisms put in place for error and failure recovery.
- Whether the protocol is prone to a single point of failure, or failures can be contained and localized.

A single point of failure is not as catastrophic as it sounds: it would only block new joins, while the rest keeps working. A localized failure is contained to the node and its neighbors and does not impact any join processes.

Paper 2: Host Multicast: A Framework for Delivering Multicast to End Users
In this paper, we study the technique of Host Multicast (HM), an application-layer technique. Along with overcoming the deployment and control pitfalls of IP multicast, HM is compatible with the existing IP multicast infrastructure. The paper goes on to discuss the Host Multicast Tree Protocol (HMTP), suggest some performance tuning, and provide a performance evaluation of the technique.

Host Multicast
- A hybrid approach combining application-layer multicast and IP multicast.
- The Host Multicast framework provides best-effort delivery to applications.
- Compatibility with IP multicast allows it to accept and forward packets from existing IP multicast networks, so it is deployable on the existing Internet.
- No support is required from the OS, routers, or servers; hence it supports transmission of data across heterogeneous systems irrespective of the network.

HM Architecture
Each cloud in the architecture diagram is an existing IP multicast island, linked to the others by a channel of communication. HM is used to distribute data over such a network.

[Figure: HM architecture. IP multicast islands of hosts, each with a Designated Member (DM); DMs are linked by unicast tunnels and register with the Rendezvous Point (HMRP).]

HM Architecture
- Every island has a Designated Member (DM) assigned to communicate with the other islands.
- Any two DMs are connected via a unicast tunnel over UDP.
- All DMs run HMTP to self-organize, each on behalf of its own island, into a bi-directional shared tree, with one special node assigned as the root.
- From any member to the root there is only one loop-free path. Incoming data is replicated and sent on to the remaining neighbors, and is thus capable of reaching any node on the tree.
- Each multicast group has a Group ID and a session directory entry stored at a Host Multicast Rendezvous Point (HMRP).

Host Multicast Tree Protocol (HMTP)
- Builds a bi-directional shared tree connecting all islands.
- The tree should be congruent to the physical network topology to be efficient: the current design uses member-to-member round-trip time as the distance metric.
- The tree should be robust: able to handle node failures, dynamic join/leave, etc.
- In HMTP, members maintain the tree structure by periodically exchanging messages with their neighbors. The mechanisms to detect node failure and loop formation are described below.

Join
- The root of the tree, along with its directory, is always known to the HMRP.
- A new member does a DFS to find the closest parent.
- Clustering nearby members makes the tree congruent to the actual physical network topology.
[Figure: join walk-through on a sample tree. The newcomer asks the RP "Where is my group?", is told the root of its group, and then probes down the tree to find the closest parent.]

Maintenance and Member Leaving
- When a node leaves, its parent and children are automatically notified.
- Each member keeps its children list and root path up to date by exchanging REFRESH and PATH messages with its neighbors.
- The root sends REFRESH messages to the HMRP.

Loop Detection and Resolution
- Loops are possible when multiple conflicting joins happen at the same time.
- Loop detection: a member's root path contains itself (sketched below).
- Resolution: leave the current parent and re-join the tree from the root.
- Hence loops are rare.

Performance Metrics
- Tree cost: the sum of all tree link delays; reported as a cost ratio to IP multicast.
- Tree delay: the delay from one member to another along the tree; reported as a delay ratio to unicast delay.
- Link load: the number of duplicate packets carried by a physical link; compared against a unicast star and IP multicast.

Performance Comparisons
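Referring back to the Loop Detection slide: a minimal sketch of the rule, with a shared parent map standing in for the PATH messages that actually carry root paths in HMTP:

```python
# A member walks the root path advertised above it; finding itself there
# means a loop, and the resolution is to re-join from the root.

def root_path(start, parent_of):
    path, n, seen = [], start, set()
    while n is not None and n not in seen:
        seen.add(n)
        path.append(n)
        n = parent_of.get(n)
    return path

def detect_and_resolve(member, parent_of, root="root"):
    path = root_path(parent_of.get(member), parent_of)
    if member in path:              # our own name above us: a loop
        parent_of[member] = root    # leave parent, re-join from the root
        return True
    return False

parents = {"A": "B", "B": "C", "C": "A"}   # conflicting joins closed a loop
print(detect_and_resolve("A", parents))    # True; A now hangs off the root
print(parents)                             # {'A': 'root', 'B': 'C', 'C': 'A'}
```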

Conclusion
The authors discuss the Host Multicast technique and compare its performance in various test scenarios, mainly against unicast and IP multicast. The results demonstrate the efficiency of Host Multicast and the HMTP protocol. There has been considerable related work on HM in the research on ALMI, Narada, Gossamer, Yoid, BTP, etc. Suggested future work includes reducing HMTP delay, supporting multiple DMs on a single island, etc.

Paper 3: Scalable Application Layer Multicast
Proposal: the paper proposes NICE, a highly scalable application-layer multicast protocol specially designed for low-bandwidth, data-streaming applications with large receiver sets, e.g., news and sports ticker services, real-time stock updates, etc.

NICE is a recursive acronym for NICE Internet Cooperative Environment.

Application Layer Multicast
- End nodes of the overlay network replicate data instead of routers.
- Intuitively less efficient than native multicast.
- We need a protocol that is efficient in delivering data along the data paths, robust to failure, and introduces minimal control overhead.

Hierarchical Arrangement
- Members are assigned to different layers, numbered sequentially with the lowest layer being layer zero (L0). All hosts belong to L0.
- Hosts in each layer are partitioned into clusters; a cluster consists of hosts separated by low latency.
- Cluster size is between k and 3k - 1, where k is a constant.
- Each cluster has a leader, the member at minimum-maximum distance from all other hosts in the cluster.
- The cluster leaders of all clusters in layer Li-1 join layer Li.
- There are at most log_k N layers, and the highest layer has a single member (see the arithmetic check below).
- A host maintains information about all the clusters it belongs to and about its supercluster.

Control Topology
- Neighbors on the control topology exchange soft-state refreshes; they do not generate high volumes of traffic with the remaining members of the cluster.
- Each member of a cluster exchanges soft-state refreshes with all its cluster peers, allowing every member to identify changes in cluster membership; within a cluster, the control topology is a clique.
- The worst-case control overhead, at the highest layer, is O(k log N) per node.
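A quick arithmetic check of the log_k N layer bound quoted above (plain arithmetic, not protocol code):

```python
# With cluster sizes of at least k, each layer shrinks membership by at
# least a factor of k, giving at most about log_k(N) layers.
import math

def max_layers(n, k):
    layers = 0
    while n > 1:
        n = math.ceil(n / k)    # one leader per cluster survives upward
        layers += 1
    return layers

for n in (100, 10_000, 1_000_000):
    print(n, max_layers(n, k=3), round(math.log(n, 3), 1))
```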

Clusters in NICE

Data Topology
- The data path is a source-specific tree implicitly defined by the control topology.
- The source forwards data to all the members of its clusters; receiving nodes forward the data only if they are cluster leaders.
- Within a cluster, the data topology is a star.
- The number of application-level hops between any pair of members is O(log N).

Protocol: Join
- The new node contacts the Rendezvous Point (RP); the RP responds with a list of peers in the highest layer.
- The node chooses the member closest to itself, and the process continues iteratively down to layer 0 (sketched below).
- The cluster leader may change during the join operation.
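An illustrative sketch of the join descent, with a hypothetical encoding of the cluster hierarchy:

```python
# hierarchy[0] lists the highest-layer members handed out by the RP; each
# later entry maps a leader to the members of its cluster one layer down.

def nice_join(new, hierarchy, dist):
    peers = hierarchy[0]
    for layer_below in hierarchy[1:]:
        closest = min(peers, key=lambda p: dist(new, p))
        peers = layer_below[closest]    # probe the closest leader's cluster
    return min(peers, key=lambda p: dist(new, p))   # join this L0 cluster

rtt = {"A": 30, "B": 50, "c": 10, "d": 40}
hierarchy = [["A", "B"], {"A": ["A", "c", "d"], "B": ["B"]}]
print(nice_join("X", hierarchy, lambda n, p: rtt[p]))   # 'c'
```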

Cluster Maintenance
- Each member of a cluster periodically sends HeartBeat messages containing its distance estimates to the other peers in the cluster.
- If a member does not send a message within the timeout interval, it is assumed to no longer be part of the group.

Join

Protocol (contd.)
Cluster Split
- A cluster leader periodically checks the size of its cluster; a split occurs if the size exceeds 3k - 1.
- The cluster leader initiates the split: it divides the cluster into two equal-sized clusters, elects leaders for them, and transfers leadership using LeaderTransfer messages.

Cluster Merge
- The cluster leader initiates the merge operation: it chooses its closest cluster peer in the supercluster and performs the merge with a ClusterMergeRequest.
- The cluster leaders of both clusters inform their respective peers about the change.

Protocol (contd.)
Host Departure
- A host can depart gracefully, by sending a Remove message to all its peers.
- A host can also depart abruptly due to failure; in this case the non-receipt of HeartBeat messages indicates that the peer has left.

Leader Selection
- If the departing host was a leader, the departure triggers leader selection (sketched below).
- Each host chooses a leader independently and communicates it to the other peers using HeartBeat messages.
- Conflicts between two leaders are resolved in this phase.
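The minimum-maximum criterion from the Hierarchical Arrangement slide can be sketched directly (illustrative code):

```python
# The leader is the cluster's graph-theoretic center: the member whose
# worst-case latency to any other member is smallest (min-max distance).

def choose_leader(members, dist):
    def eccentricity(m):
        return max(dist(m, o) for o in members if o != m)
    return min(members, key=eccentricity)

lat = {("a", "b"): 10, ("a", "c"): 40, ("b", "c"): 20}
dist = lambda x, y: lat.get((x, y), lat.get((y, x)))
print(choose_leader(["a", "b", "c"], dist))   # 'b' (worst case only 20 ms)
```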

Simulations

Conclusion
- The NICE protocol provides a low-overhead control structure.
- It also provides low-latency data distribution paths.
- It is well suited for low-bandwidth applications with large receiver sets.