
SVDC: A Highly Scalable Isolation Architecture for Virtualized Layer-2 Data Center Networks

Congjie Chen, Student Member, IEEE, Dan Li, Senior Member, IEEE, Jun Li, Senior Member, IEEE, Konglin Zhu, Member, IEEE

Abstract—While large layer-2 networks are widely accepted as the network fabric for modern data centers and network virtualization is required to support multi-tenant cloud computing, existing network virtualization solutions are not specifically designed for layer-2 networks. In this paper, we design SVDC, a highly scalable and low-overhead virtualization architecture for large layer-2 data center networks. By leveraging the emerging software defined networking (SDN) framework, SVDC decouples the global identifier of a virtual network from the identifier carried in the packet header. Hence, SVDC can scale to a great number of virtual networks with a very short tag in the packet header, which has not been achieved by previous network virtualization solutions. SVDC enhances MAC-in-MAC encapsulation in a way that packets with overlapped MAC addresses are correctly forwarded even without in-packet global identifiers to differentiate the virtual networks they belong to. Besides, scalable and efficient layer-2 multicast and broadcast within virtual networks are also supported in SVDC. With extensive simulations and experiments, we show that SVDC outperforms existing solutions in many aspects, in particular isolating virtual networks with high scalability and achieving higher network goodput due to minimal packet header overhead.

Index Terms—data center network, network virtualization, multi-tenant cloud computing.


1 INTRODUCTION

Virtualization is one of the paramount enabling technologies for the success of multi-tenant cloud computing. Through virtualization, cloud providers make profits by multiplexing physical resources among a large number of tenants, while tenants benefit from the "pay-as-you-go" charging model and rapid resource provisioning. Although computation virtualization technologies (via virtual machines, or VMs) have been developed for decades, cloud network virtualization has only received attention recently, partially because of the cloud security and privacy concerns in the virtualized environment.

Due to its simplicity and ease of management, the large layer-2 network is more and more widely accepted as the fabric for building a data center network. Scalable layer-2 architectures, such as TRILL [1], SPB [2] and PortLand [3], have been proposed as either industry standards or research artifacts. The layer-2 network can even cross the Internet via services such as VPLS [4].

• Congjie Chen, Dan Li and Konglin Zhu are with the Department of Computer Science, Tsinghua University.

• Jun Li is with the Department of Computer and Information Science, University of Oregon.

• Konglin Zhu is also with the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications.

The work was supported by the National Key Basic Research Program of China (973 program) under Grant 2014CB347800, the National Natural Science Foundation of China under Grants No. 61502045, No. 61522205, No. 61432002 and No. 61133006, the National High-tech R&D Program of China (863 program) under Grants 2013AA013303, 2015AA01A705 and 2015AA016102, EU FP7 Marie Curie Actions project Grant Agreement No. 607584 (the Cleansky project), ZTE Corporation and the Tsinghua University Initiative Scientific Research Program.

However, this kind of layer-2 network fabric design mainly focuses on routing/forwarding rules in the network, and it is still an open issue how to run a multi-tenant network virtualization scheme on top of these network fabrics. Existing network virtualization solutions either face severe scalability problems [5], or are not specifically designed for layer-2 networks [6]–[8]. In particular, we need to address the following challenges in designing such a layer-2 virtualization solution.

First, for a large-scale, geographically distributed layer-2 network operated by a cloud provider, the potential number of tenants and virtual networks can be huge. Network virtualization based on VLan [5] can support at most 4094 virtual networks, which is obviously not enough. Although VxLan [6], NVGRE [7] and NetLord [8] can support 16,777,216 virtual networks, they do so at the cost of using many more bits in the packet header. The fundamental issue is that in existing network virtualization proposals [5]–[7], the number of virtual networks that can be differentiated depends on the number of bits, i.e., the tag, used in the packet header. An interesting question is: can we design an isolation architecture for virtualized layer-2 networks whose upper limit of supported virtual networks exceeds the length constraint of the extra tag it inserts in the packet header?

Second, given the possibly overlapped MAC addresses of VMs in different virtual networks and the limited forwarding table size in data center switches, it is inevitable to encapsulate the original MAC address of a packet when transmitting it in the core network. The MAC-in-UDP encapsulation used in VxLan [6] and the MAC-in-IP encapsulation used in NetLord [8] incur unnecessary packet header overhead for a layer-2 network. Compared with them, MAC-in-MAC encapsulation is more suitable. Although it was proposed in previous works such as Provider Backbone Bridges [9], we need to enhance it to work well in the multi-tenant network virtualization scenario where VM addresses largely overlap.


Third, multicast service is common in data center networks [10], [11], but how to support multicast in a layer-2 network, particularly in a multi-tenant virtualized layer-2 network, remains an open problem. A desired capability for a layer-2 network virtualization approach is to support efficient and scalable layer-2 multicast as well as broadcast.

In this paper we design SVDC, which leverages the framework of software defined networking (SDN) to address the challenges above, and achieves the goals of high scalability and low overhead in the following ways.

First, SVDC decouples the global identifier of a virtual network from the in-packet tag. The global identifier is maintained in the SVDC controller, while the in-packet tag is only used to differentiate virtual networks residing on the same server. We denote the global virtual network identifier maintained in the SVDC controller as the global tenant network identifier (GTID), and the server-local identifier of a virtual network, which is inserted in the packet header, as the local tenant network identifier (LTID). In this way, SVDC is able to accommodate a number of virtual networks far exceeding what the short tag carried in the packet header can encode, which has not been achieved by previous network virtualization solutions.

Second, SVDC uses MAC-in-MAC encapsulation in ingress edge switches to mask the overlapped VM addresses in the core network, and employs two techniques to guarantee correct packet forwarding in the first hop and last hop even without an in-packet global virtual network identifier. In the ingress edge switch, SVDC rewrites the LTID of a packet based on its original LTID together with its incoming port, which enables correct delivery after the packet arrives at the destination server. For the egress edge switch to correctly forward the packet after decapsulation, SVDC reuses the VLan field of the outer MAC header to indicate the forwarding port of the egress edge switch. In this way, SVDC enhances MAC-in-MAC encapsulation to enable correct packet forwarding in the whole process with minimal effort.

Third, within the same framework, SVDC efficiently supports up to tens of billions of multicast and broadcast groups with possibly overlapping multicast or broadcast addresses in different virtual networks sharing the same physical layer-2 network. Each multicast or broadcast group address within a virtual network is translated to a globally unique multicast group address assigned to that virtual network, which is maintained by the SVDC controller and carried in the packet by multiplexing the destination MAC address and the VLan field in the outer MAC header.

We carry out extensive simulations and experiments to evaluate SVDC. The results show that SVDC outperforms existing solutions on many fronts, including the highest virtual network scalability and the highest aggregate network goodput due to minimized packet header overhead. The overhead on the SVDC controller is also shown to be affordable.

The rest of this paper is organized as follows. Section 2 describes the background of the SVDC design. Section 3 and Section 4 present the SVDC design overview and design details, respectively. Sections 5 and 6 describe the evaluation results. Section 7 and Section 8 discuss several important considerations in SVDC deployment and related works of SVDC, respectively. Finally, Section 9 concludes the paper.

2 BACKGROUND

Cloud providers prefer to preserve a layer-2 model for inter-VM communication in virtualized data center networks due to its simplicity and easy management. However, there are many challenges in deploying pure layer-2 network virtualization.

To maintain the "plug-and-play" feature of layer-2 networks, most basic layer-2 network operations rely on broadcasting, such as the Address Resolution Protocol (ARP), the Dynamic Host Configuration Protocol (DHCP) and locating unknown end hosts for Ethernet switches. Since tenants share the physical network resources in virtualized data center networks, frequent broadcasting may cause severe network performance interference among tenants. Meanwhile, the shared nature of virtualized data center networks gives malicious tenants more opportunities to attack their target victims [12], threatening the privacy and safety of other tenants. To protect the service quality and security of one tenant from the actions of other tenants, it is necessary to segregate the traffic of different tenants' virtual networks, which is often done by VLan in today's layer-2 network management. VLan inserts a 12-bit VLan ID into the Ethernet header to differentiate tenants. However, the 4094 VLan ID limit is not adequate to serve a large number of tenants simultaneously in a cloud network. In addition, a tenant may require more than one virtual network for its application, which exacerbates this issue.

For the sake of encouraging tenants to migrate their applications to the cloud, cloud providers must provide tenants with enough flexibility, i.e., enforce few restrictions on the configuration of application parameters. In such a scenario, tenants may want to flexibly assign MAC addresses or VLan IDs in their own networks, which results in the address overlapping problem.

Moreover, virtualization places additional demands on commodity switches in cloud networks. Instead of learning just one MAC address per server link, switches now have to learn tens of MAC addresses of VMs placed on the same server. Commodity switches that can hold 50K∼70K MAC entries in their forwarding information base (FIB) tables [8] now have to store hundreds of thousands of MAC entries if they are deployed in cloud networks, which may lead to frequent flooding of unknown-destination packets when their FIB tables overflow.

3 DESIGN OVERVIEW

In this section we present an overview of the SVDC design.

System Overview: The basic architecture of SVDC is shown in Fig. 1, including VMs, virtual switches residing in servers, edge switches, the SVDC controller and the core network. In the minimum configuration, we only need to deploy an SVDC controller and update the edge switches to support SVDC. The controller interacts with the edge switches using an SDN protocol like OpenFlow [13]. A very lightweight modification to the virtual switch is required to fill the LTID of a virtual network into the packet. Core switches and VMs just run legacy protocols, and can be unaware of SVDC.



Fig. 1: SVDC architecture overview. (The figure depicts VMs and virtual switches on Server-s and Server-d, the ingress edge switch ES-s and the egress edge switch ES-d connected through the core network, and the SVDC controller; the controller holds the global mapping tables, virtual switches hold a local FIB table, ingress edge switches hold the unicast and multicast encapsulation tables, and egress edge switches hold the multicast decapsulation table.)

In the core network, any kind of layer-2 forwarding scheme can be used, e.g., the Spanning Tree protocol [14], the TRILL protocol [1] and the Shortest Path Bridging protocol [2] for unicast, and a global multicast tree formation protocol [15] for multicast. Depending on the operator's configuration, the SVDC controller can also use OpenFlow to configure the unicast/multicast forwarding entries in the core network. In this paper, however, we only focus on the design of the SVDC controller and edge switches based on an SDN framework; SVDC can seamlessly coexist with any forwarding fabric in the core network, either SDN or non-SDN.

Data Structures: As shown in Fig. 1, virtual switches, edge switches and the SVDC controller need to implement some data structures to support SVDC. Every virtual switch needs to maintain a local FIB table with entries destined to VMs on the local server, while packets sent to all other VMs are simply forwarded to the edge switch it connects to. An ingress edge switch needs to maintain both a unicast encapsulation table and a multicast encapsulation table, used in MAC-in-MAC encapsulation for every packet. When the first packet of a flow arrives at an ingress edge switch, the encapsulation table lookup fails and the packet is directed to the SVDC controller. The SVDC controller then looks up its global mapping tables, which maintain the global information of the network, and responds to the ingress switch with the information needed to update its encapsulation table. Subsequent packets of the flow are directly encapsulated by looking up the encapsulation table, without interrupting the SVDC controller again. Multicast group join requests are also directed to the SVDC controller, and the controller then updates the multicast decapsulation table in the corresponding egress edge switches with the group membership.

Global Identifier vs. Server-local Identifier: SVDC supports a great number of virtual networks by maintaining a global identifier, the GTID, for every virtual network in the SVDC controller, but never carrying it in the packet. Instead, a server-local identifier, the LTID, is carried in the packet header to identify a virtual network on a certain physical server. Different virtual networks on the same server have different LTIDs. For the same global virtual network, its LTIDs in different servers can either be different or the same. The SVDC controller maintains the mapping between the GTID and the corresponding LTIDs in different servers, and is responsible for the translation when the first packet of a flow is directed to it. The translation includes both mapping an LTID to the GTID and vice versa. To avoid introducing any new packet header field, SVDC reuses the 12-bit VLan field as the LTID, which is adequate since the number of virtual networks on a physical server cannot exceed 4096. By supporting a great number of virtual networks with only a 12-bit tag in the packet header, SVDC is more scalable than existing network virtualization architectures.
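
To make the decoupling concrete, the following sketch shows how a controller-side allocator could hand out per-server LTIDs from the 12-bit VLan space while keeping the GTID only in its own tables; the class and method names are hypothetical, not part of SVDC's implementation.

```python
# Hypothetical sketch of the GTID/LTID decoupling described above; names are
# our own assumptions, not SVDC's implementation.

class TenantIdAllocator:
    LTID_SPACE = 1 << 12                  # the 12-bit VLan field reused as LTID

    def __init__(self):
        self.gtid_of = {}                 # (server_id, ltid) -> gtid
        self.ltid_of = {}                 # (gtid, server_id) -> ltid
        self.free_ltids = {}              # server_id -> set of unused LTIDs

    def assign_ltid(self, gtid, server_id):
        """Give a virtual network (GTID) a server-local LTID on one server."""
        key = (gtid, server_id)
        if key in self.ltid_of:           # already assigned on this server
            return self.ltid_of[key]
        pool = self.free_ltids.setdefault(
            server_id, set(range(1, self.LTID_SPACE)))
        ltid = pool.pop()                 # any unused 12-bit value will do
        self.ltid_of[key] = ltid
        self.gtid_of[(server_id, ltid)] = gtid
        return ltid

    def resolve_gtid(self, server_id, ltid):
        """Map (sending server, in-packet LTID) back to the global GTID."""
        return self.gtid_of[(server_id, ltid)]
```

Because an LTID is only unique within one server, the total number of virtual networks is bounded by the number of servers times 4096 rather than by the 12-bit field itself.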

Correct Forwarding with Server-local Identifiers in Packets: To minimize the packet header overhead introduced by encapsulating the original Ethernet header from VMs in a layer-2 network, SVDC uses MAC-in-MAC encapsulation in ingress edge switches. It not only masks the overlapped MAC addresses of VMs in different virtual networks, but also minimizes the number of forwarding entries in core switches. The key point is how to guarantee correct packet forwarding in the first hop and last hop, since no information is carried in the packet to globally differentiate virtual networks in a direct way. SVDC has two methods to deal with these problems.

First, for the ingress edge switch to identify the global virtual network an incoming packet belongs to, the LTID carried in the VLan tag alone is not enough. However, the VLan tag together with the incoming port of the switch is sufficient for the identification, since the incoming port uniquely identifies the physical server from which the packet was sent. After identifying the global virtual network the packet belongs to, the ingress edge switch rewrites the VLan tag of the packet as the LTID in the destination server, which enables correct packet delivery when the packet arrives at the destination server.

Second, when the egress edge switch decapsulates the outer MAC header, it needs a way to correctly forward the packet to an outgoing port. A local table lookup cannot help because the in-packet virtual network identifier is not the global one and thus can overlap among servers. Our approach is to reuse the VLan field of the outer MAC header to indicate the forwarding port in the egress edge switch. The field is filled in by the ingress edge switch for a unicast packet by looking up the unicast encapsulation table, and by the egress edge switch for a multicast packet by looking up the multicast decapsulation table. The 12-bit VLan tag is also more than enough to identify outgoing ports, unless the egress edge switch has more than 4096 ports, which does not happen in practice.

Supporting Multicast and Broadcast: SVDC encompasses multicast and broadcast within each virtual network with possibly overlapping group addresses. In order to avoid traffic leakage among virtual networks, the SVDC controller maps each multicast group or broadcast in a virtual network to a global multicast group, which can be identified by the global multicast group address, composed of a 23-bit multicast MAC address [16] and a 12-bit VLan ID. This 35-bit global multicast group address is enough to support a potentially huge number of multicast/broadcast groups within virtual networks and can be carried in the outer Ethernet header.
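
As an illustration of how such a 35-bit address could be carried, the sketch below packs a group index into the 23 free bits of an 01:00:5E multicast MAC address plus the 12-bit VLan ID; the exact bit layout is an assumption for illustration, since the paper only states that the two fields are multiplexed.

```python
# Hypothetical packing of a 35-bit global multicast group address (Group-G)
# into the outer Ethernet header: 23 bits in the multicast MAC, 12 bits in
# the VLan ID. The bit layout is an assumption, not SVDC's exact encoding.

def encode_group_g(group_index):
    assert 0 <= group_index < (1 << 35)
    mac_low23 = group_index & ((1 << 23) - 1)         # lower 23 bits -> MAC
    vlan12 = (group_index >> 23) & 0xFFF              # upper 12 bits -> VLan
    dmac = (0x01005E << 24) | mac_low23               # 01:00:5E:xx:xx:xx
    return dmac, vlan12

def decode_group_g(dmac, vlan12):
    return ((vlan12 & 0xFFF) << 23) | (dmac & ((1 << 23) - 1))

# Example: a group index survives a round trip through the outer header.
dmac, vid = encode_group_g(70_000_000)
assert decode_group_g(dmac, vid) == 70_000_000
```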



TABLE 1: Comparison among the network virtualization solutions.

Solution | Packet header overhead (bytes) | Supported virtual network # | Scale of forwarding entries in a switch | Supported multicast group #
SVDC     | 18                             | ∝ (number of servers)       | O(number of edge switches)              | 34,359,738,370
VLan     | 0                              | 4094                        | O(number of VMs)                        | 34,342,961,150
VxLan    | 54                             | 16,777,216                  | O(number of servers)                    | 8,388,608
NVGRE    | 46                             | 16,777,216                  | O(number of servers)                    | 8,388,608
NetLord  | 38                             | 16,777,216                  | O(number of edge switches)              | N/A


Comparison: Table 1 compares SVDC with existing network virtualization solutions in detail.

An additional packet header is required to encapsulate the original Ethernet packet, which is 18 bytes for SVDC (Ethernet header with VLan tag), 54 bytes for VxLan (Ethernet+IP+UDP+VxLan header), 46 bytes for NVGRE (Ethernet+IP+GRE header) and 38 bytes for NetLord (Ethernet+IP).

Since SVDC uses a server-local identifier to identify different virtual networks on a certain physical server, the upper limit on the number of virtual networks in SVDC is proportional to the number of servers in the network, which is quite large when SVDC is applied to a large-scale network. Section 5.1 analyzes in detail how many virtual networks SVDC can support. The upper limit of virtual networks is 4094 in VLan, which uses a 12-bit VLan ID, and 16,777,216 in the other three architectures, since they all use a 24-bit virtual network identifier.

For forwarding, VLan in the worst case must maintain entries for every VM in the data center network because it forwards packets based on flat MAC addresses. VxLan and NVGRE both use the MAC address of the decapsulation end-point residing on the destination server as the destination address in the outer MAC header; thus, core switches maintain entries for every server. Both SVDC and NetLord use the MAC address of the edge switch connecting to the destination server as the destination MAC address in the outer MAC header; therefore, core switches only need to maintain entries for every edge switch.

When calculating the upper limit of multicast groups an architecture can support, we only consider non-overlapping multicast groups in the layer-2 network, i.e., multicast groups that do not leak traffic to hosts that are not their members. SVDC uses 35 bits to differentiate multicast groups in the core network, and thus the number of multicast groups it can support is 34,359,738,370. VLan uses 23 bits to differentiate multicast groups within each virtual network and can thus support 34,342,961,150 different multicast groups in the whole network. Both VxLan and NVGRE map all IP multicast addresses within a virtual network to a unique global IP multicast address in the core network. Considering that only the lower 23 bits of an IP multicast address can be mapped to a unique Ethernet multicast address, they can support up to 8,388,608 different multicast groups. NetLord does not discuss multicast support.

By comparison, SVDC is the only solution that decouples the global identifier of a virtual network from the in-packet identifier, and the only solution that can accommodate a number of virtual networks exceeding the length limit of the in-packet virtual network identifier. As a result, SVDC enjoys the highest scalability and the smallest packet header overhead. Given that the other solutions (except VLan) are not specifically designed for a layer-2 network, we argue that SVDC is, to the best of our knowledge, the best solution for large layer-2 cloud data center network virtualization.

4 DESIGN DETAILS

In this section we present the design details of SVDC.

4.1 SVDC Components

Virtual Switches: Every virtual switch configures its FIB table entries towards VMs in the local server, and sets the forwarding port of the default entry towards the edge switch connecting to the server it resides in. As shown in Fig. 2, the key of a FIB table entry is (LTID, VMAC), which uniquely identifies a VM in a physical server. Note that in SVDC, VMs are not aware of the virtualized network infrastructure, so the Ethernet header sent by a VM does not contain any LTID and its VLan field is left empty. When a virtual switch receives an Ethernet packet, it first determines whether it comes from a local VM or from the inbound port. If from a local VM, the virtual switch adds the LTID to the VLan field of the Ethernet header based on the incoming port and then forwards it out. If from the inbound port, the operations depend on whether it is a unicast packet or a multicast/broadcast packet. For a unicast packet, the virtual switch directly looks up the FIB table and forwards it to a certain VM in the local server; for a broadcast packet, the virtual switch forwards it to all VMs within the same virtual network on the local server; and for a tenant-defined multicast packet, the virtual switch forwards it towards the VMs that are interested in it, which can be learned by snooping the multicast group join messages sent by VMs.
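
A minimal sketch of this per-server forwarding decision is given below, assuming a packet object with dst_mac and vlan fields and a FIB keyed by (LTID, VMAC); the names are illustrative, whereas SVDC's actual virtual switch is an Open vSwitch datapath modification.

```python
# Sketch of the virtual switch forwarding decision (names are assumptions).

BROADCAST_MAC = 0xFFFFFFFFFFFF

def is_multicast(mac):
    return bool((mac >> 40) & 0x01)       # I/G bit of the first MAC octet

def vswitch_output_ports(pkt, in_port, inbound_port, ltid_of_port,
                         local_fib, group_members, edge_switch_port):
    if in_port != inbound_port:                           # from a local VM
        pkt.vlan = ltid_of_port[in_port]                  # stamp its LTID
        out = local_fib.get((pkt.vlan, pkt.dst_mac))      # (LTID, VMAC) lookup
        return [out] if out is not None else [edge_switch_port]

    # From the inbound port, i.e. delivered by the edge switch.
    if pkt.dst_mac == BROADCAST_MAC:                      # broadcast: all local
        return [p for (ltid, _), p in local_fib.items()   # VMs with same LTID
                if ltid == pkt.vlan]
    if is_multicast(pkt.dst_mac):                         # tenant multicast
        return group_members.get((pkt.vlan, pkt.dst_mac), [])
    out = local_fib.get((pkt.vlan, pkt.dst_mac))          # unicast to local VM
    return [out] if out is not None else []
```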

Edge Switches: Edge switches bear most of the intelligence of the data plane in SVDC. They are responsible for rewriting the VLan field in the inner Ethernet packet header and encapsulating/decapsulating the original Ethernet packets.



Fig. 2: The FIB table maintained by every virtual switch: key (LTID, VMAC) → output port.

As shown in Fig. 3(a), every ingress edge switch maintains a unicast encapsulation table which maps (in-port, LTID-s, VM-d) to (LTID-d, ES-d, p-ID), where in-port is the incoming port of the packet, LTID-s is the LTID of the virtual network in the source server, VM-d is the destination VM in the original Ethernet header, LTID-d is the LTID of the virtual network in the destination server, ES-d is the MAC address of the egress edge switch, and p-ID is the outgoing port on which the egress edge switch should forward the packet. If the lookup hits, the ingress edge switch performs the following operations. First, it rewrites LTID-s in the VLan field of the original Ethernet header as LTID-d. Second, it encapsulates the packet by adding an outer Ethernet header, with ES-d as the destination MAC address, its own MAC address (ES-s) as the source MAC address, and p-ID as the VLan field. Third, it forwards the encapsulated packet by looking up the forwarding table. However, if the lookup fails, the ingress edge switch directs the packet to the SVDC controller together with the incoming port of the packet, which helps the controller obtain the information required to install an entry in the unicast encapsulation table.
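
A simplified rendering of this ingress pipeline is sketched below; the table layout follows Fig. 3(a), while the packet and header helpers are assumptions rather than SVDC's Open vSwitch code.

```python
# Sketch of ingress edge switch unicast handling (helper names are ours).
from dataclasses import dataclass

@dataclass
class EthHeader:
    dst_mac: int
    src_mac: int
    vlan: int

def ingress_unicast(pkt, in_port, unicast_enc_table, my_mac,
                    forward, punt_to_controller):
    key = (in_port, pkt.vlan, pkt.dst_mac)        # (in-port, LTID-s, VM-d)
    entry = unicast_enc_table.get(key)
    if entry is None:
        punt_to_controller(pkt, in_port)          # first packet of the flow
        return
    ltid_d, es_d, p_id = entry                    # (LTID-d, ES-d, p-ID)
    pkt.vlan = ltid_d                             # rewrite inner VLan to LTID-d
    outer = EthHeader(dst_mac=es_d, src_mac=my_mac, vlan=p_id)
    forward((outer, pkt))                         # MAC-in-MAC: outer header first,
                                                  # then normal L2 forwarding on ES-d
```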

Fig. 3: The three mapping tables maintained by every edge switch.
(a) Unicast encapsulation table (every ingress edge switch): key (in-port, LTID-s, VM-d) → output (LTID-d, ES-d, p-ID).
(b) Multicast encapsulation table (every ingress edge switch): key (in-port, LTID-s, Group-L) → output Group-G.
(c) Multicast decapsulation table (every egress edge switch): key Group-G → output multiple (Out-PORT, LTID-d) tuples.

A multicast encapsulation table is also maintained, which maps the tuple (in-port, LTID-s, Group-L) to the global multicast group address Group-G to fill in the outer Ethernet header, as shown in Fig. 3(b). Group-L is the multicast group address or the broadcast address within a virtual network. If the lookup hits, the ingress edge switch encapsulates the multicast/broadcast packet with Group-G as the destination MAC address and VLan ID, and ES-s as the source MAC address. If the lookup misses, it sends the packet to the SVDC controller to update the multicast encapsulation table.

Since VMs of a certain group can have different LTIDs in different servers, egress edge switches should rewrite the LTID in the inner Ethernet header for each packet duplication destined to a different server. Thus, every egress edge switch maintains a multicast decapsulation table, which maps Group-G to multiple (Out-PORT, LTID-d) tuples, where Out-PORT is an output port of a multicast/broadcast packet duplication and LTID-d is the LTID of the virtual network in the destination server connected to the Out-PORT, as shown in Fig. 3(c). Entries in this table are inserted by the SVDC controller when a multicast group join message sent by a VM is directed to it. When an egress edge switch receives a multicast packet, it first duplicates the packet according to the number of (Out-PORT, LTID-d) tuples. Then, it decapsulates each duplication, rewrites the LTID in its inner Ethernet header as indicated by LTID-d, and sends it towards the destination server as indicated by the Out-PORT.
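
The egress-side rewrite can be sketched as follows, under the same illustrative assumptions as the ingress sketch above.

```python
# Sketch of egress edge switch multicast handling: one duplication per
# (Out-PORT, LTID-d) tuple, with the inner LTID rewritten before delivery.
import copy

def egress_multicast(outer, inner_pkt, multicast_decap_table, send_on_port):
    group_g = (outer.dst_mac, outer.vlan)         # Group-G from the outer header
    for out_port, ltid_d in multicast_decap_table.get(group_g, []):
        dup = copy.deepcopy(inner_pkt)            # one copy per destination server
        dup.vlan = ltid_d                         # the group's LTID in that server
        send_on_port(out_port, dup)               # outer header dropped (decapsulation)
```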

To avoid overloading the mapping tables in edge switches, each entry in the mapping tables has an expiration time which can be configured by the network administrator. An entry is removed if it does not match any packet for a period defined by the expiration time. This helps manage the life cycle of entries in the mapping tables without bothering the SVDC controller.

SVDC Controller: For a virtual network, we denote its local identifier in server s as LTID-s. For a multicast group, we denote its global group address as Group-G and its group address within a virtual network as Group-L. Besides, we use EID to denote the MAC address of an edge switch, SID to denote the identifier of a physical server, and VMAC to denote the MAC address of a VM. Note that for the same virtual network, its LTIDs in different servers can either be different or the same. When a new virtual network is created, the SVDC controller assigns an unused GTID to it and an available LTID in each server that hosts its VMs. Group-G is dynamically constructed and released. When a new multicast/broadcast group wants to send traffic across the core network, the controller assigns an available Group-G to it. When all the receivers of a group leave a multicast group, or a broadcast group lacks activity for a long duration, the corresponding Group-G is removed.

As shown in Fig. 5, the SVDC controller keeps the following mapping tables based on its global knowledge of the network, including the VM locations in each virtual network.

• LT-GT MAP: (SID, LTID) → GTID. It is used to identify the global identifier of a virtual network based on a physical server and its local virtual network identifier. This mapping table can be built once the GTID and LTIDs have been assigned to a virtual network.

• VM-LT MAP: (GTID, VMAC) → (SID, LTID). Based on the global identifier of a virtual network and the MAC address of a VM, we can uniquely identify the physical server the VM resides in as well as the local identifier of the virtual network on that server. This mapping table can also be built after the GTID and LTID assignment decisions for a virtual network have been made.



Fig. 4: Journey of a unicast packet. (The figure traces a packet from VM-s on Server-s through Virtual Switch-s, the ingress edge switch ES-s, the core network and the egress edge switch ES-d to VM-d on Server-d; the inner header carries S-MAC: VM-s, D-MAC: VM-d and VLan: LTID-s, later rewritten to LTID-d, while the outer Ethernet header carries S-MAC: ES-s, D-MAC: ES-d and VLan: p-ID.)

Fig. 5: The four mapping tables maintained by the SVDC controller.
(a) LT-GT MAP: (SID, LTID) → GTID.
(b) VM-LT MAP: (GTID, VMAC) → (SID, LTID).
(c) SID-ES MAP: SID ↔ (EID, port).
(d) GL-GG MAP: (GTID, Group-L) → Group-G.

• SID-ES MAP: (EID, port) ↔ SID. This mapping table can be directly obtained from the network topology and is used to identify the server connected to a certain port of an edge switch, or vice versa.

• GL-GG MAP: (GTID, Group-L) → Group-G. It is used to map a multicast group or broadcast address within a virtual network to its global multicast group address.

The main function of the SVDC controller is to respond to requests from edge switches with the information they need, which helps install the encapsulation/decapsulation tables in the ingress/egress edge switches. When an ingress edge switch receives the first packet of a flow, it directs the packet to the controller together with the incoming port of the packet and queries the controller for the required information.

If it is a unicast data packet, the controller first uses the SID-ES MAP to get the SID of the source server. With the source server's SID and the LTID in the original packet, the controller then identifies the GTID of the virtual network via the LT-GT MAP. Based on the GTID and the destination MAC address of the original packet, the controller uses the VM-LT MAP to further identify the destination SID and the LTID of the virtual network in the destination server. Finally, the controller relies on the SID-ES MAP again to get the MAC address of the egress edge switch as well as the port of the egress edge switch connecting to the destination server. Now the SVDC controller can return all the information needed by the ingress edge switch to construct the unicast encapsulation table entry.
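
Written as code, this lookup chain amounts to four dictionary accesses. The sketch below is a hypothetical rendering (the bidirectional SID-ES MAP is shown as two Python dicts, one per direction), with map names taken from Fig. 5.

```python
# Sketch of how the controller resolves a unicast table miss by chaining the
# maps of Fig. 5. Error handling and table population are omitted.

def resolve_unicast(eid, port, ltid_s, vm_d,
                    es_sid_map,     # (EID, port) -> SID           (SID-ES MAP)
                    sid_es_map,     # SID -> (EID, port)           (SID-ES MAP)
                    lt_gt_map,      # (SID, LTID) -> GTID          (LT-GT MAP)
                    vm_lt_map):     # (GTID, VMAC) -> (SID, LTID)  (VM-LT MAP)
    sid_s = es_sid_map[(eid, port)]            # which server sent the packet
    gtid = lt_gt_map[(sid_s, ltid_s)]          # its global virtual network
    sid_d, ltid_d = vm_lt_map[(gtid, vm_d)]    # where the destination VM lives
    es_d, p_id = sid_es_map[sid_d]             # egress switch MAC and its port
    return ltid_d, es_d, p_id                  # installed as (LTID-d, ES-d, p-ID)
```

The multicast data packet and group join paths described next follow the same first two steps and then consult the GL-GG MAP instead of the VM-LT MAP.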

If it is a multicast data packet, the controller uses the SID-ES MAP and the LT-GT MAP sequentially to get the GTID of the virtual network, as described above. Then, if the controller can find a corresponding entry in the GL-GG MAP to get Group-G, it returns Group-G to the ingress edge switch to build the multicast encapsulation table. If not, it finds an available global multicast group address Group-G, inserts a new entry into the GL-GG MAP, and returns the new Group-G to the ingress edge switch.

If it is a multicast group join request, the SVDC controller first gets the GTID of the virtual network by using the SID-ES MAP and the LT-GT MAP sequentially. Then, it looks up the GL-GG MAP to find the corresponding Group-G. If the SVDC controller finds one, it simply responds to the edge switch with this information. If not, the SVDC controller finds an available Group-G and inserts a new entry into the GL-GG MAP before responding to the edge switch. After the edge switch gets Group-G from the SVDC controller, it inserts a new entry into the multicast decapsulation table, with Out-PORT set to the incoming port of the multicast group join request and LTID-d set to its LTID.

If the cloud provider's layer-2 data center networks are geographically distributed across the Internet, the SVDC controller needs to maintain the information of all cloud data center networks of this cloud provider. In practice, each data center network has a controller, and the global information is synchronized among these controllers either periodically or using other feasible methods [17].

Core Networks: Ordinary layer-2 switches are used as core switches in SVDC. They need not have large FIB tables. Any layer-2 routing scheme can be incorporated with SVDC.



Fig. 6: Journey of a multicast packet. (The figure traces a multicast packet from VM-s on Server-s through Virtual Switch-s, the ingress edge switch ES-s, the core network and the egress edge switch ES-d to VM-d on Server-d; the inner header carries S-MAC: VM-s, D-MAC: Group-L and VLan: LTID-s, later rewritten to LTID-d, while the outer Ethernet header carries S-MAC: ES-s and the global group address Group-G in the destination MAC and VLan fields.)

This flexibility allows network operators to choose any fabric that provides an Ethernet abstraction.

To handle unicast packets, switches in core networks need to be configured to allow packets with all relevant VLans to pass through them, which can be done by using the VLan trunking protocol in commodity Ethernet switches [18], wildcard matching in OpenFlow switches [13], etc.

To handle multicast packets, network operators not only need to configure core switches to forward packets with all relevant VLans, but also need to configure core switches to forward multicast packets through multicast trees within VLans. This can be done by using IGMP snooping in commodity Ethernet switches [15], distribution trees in switches supporting the RBridge protocol [19], etc.

If the core networks are geographically distributed, packets from one data center may need to be delivered to another data center, which can be done by deploying a VPLS service [4]. To set it up, customer edge switches (CEs) need to be deployed at the data center network border. When a packet needs to be sent across the Internet, it is first sent to a CE, which is responsible for encapsulating [20] and tunneling the packet to the other data center. After the packet arrives at the other end of the tunnel, core switches in the other data center deliver the packet towards the egress edge switch.

4.2 Journey of a Unicast Packet

We take Fig. 4 as an example to illustrate the complete journey of a unicast data packet in SVDC. When a packet is generated by a VM and sent out to the local virtual switch, it carries the destination MAC address (VM-d) and the source MAC address (VM-s), and leaves the VLan field empty.

The virtual switch then adds the local LTID (LTID-s) into the VLan field of the packet and looks up the local FIB table for forwarding (Fig. 4-1). If the destination VM VM-d is within the local server, the packet is directly forwarded to VM-d. Otherwise, the packet is delivered to the ingress edge switch ES-s.

Next, the ingress edge switch ES-s looks up its encapsulation table using (in-port, LTID-s, VM-d) as the key. If the lookup misses, the ingress edge switch directs the packet to the controller (Fig. 4-2) and the controller installs the encapsulation entry for the flow (Fig. 4-3). If it hits, the ingress edge switch obtains the tuple (LTID-d, ES-d, p-ID). Then the VLan field of the original Ethernet header is changed from LTID-s to LTID-d, and an outer Ethernet header is added (Fig. 4-4). The ingress edge switch then looks up the FIB table to forward the packet.

After that, the packet is delivered by the core switches towards the egress edge switch ES-d. The egress edge switch reads p-ID from the VLan field of the outer Ethernet header, decapsulates the outer Ethernet header, and forwards the packet to port p-ID (Fig. 4-5).

Finally, the packet arrives at the destination virtual switch. The virtual switch looks up the FIB table based on LTID-d and VM-d, and delivers the packet to the destination VM (Fig. 4-6).

4.3 Journey of a Multicast Packet

Fig. 6 illustrates the complete journey of a multicast packet in SVDC. When a VM generates a multicast packet, the destination address field of the Ethernet header is filled with the layer-2 multicast group address, denoted as Group-L. The packet then goes to the virtual switch, which inserts LTID-s into the VLan field and forwards it towards the ingress edge switch (Fig. 6-1).

The ingress edge switch ES-s looks up its multicast encapsulation table using (in-port, LTID-s, Group-L) as the key. If the lookup misses, the ingress edge switch directs the packet to the controller (Fig. 6-2). Then the controller installs the multicast encapsulation entry into the ingress edge switch and the multicast decapsulation entries into the egress edge switches (Fig. 6-3). If it hits, the ingress edge switch gets the global multicast group address Group-G to fill in the outer Ethernet header (Fig. 6-4).

The packet is then forwarded towards the egress edge switches along the constructed multicast tree. When an egress edge switch receives the packet, it takes the Group-G carried in the outer Ethernet header as the key and gets multiple (Out-PORT, LTID-d) tuples. It then duplicates the packet according to the number of tuples, decapsulates each duplication, rewrites its LTID and forwards it towards the Out-PORT (Fig. 6-5).



Finally, the packet arrives at the destination virtual switch and is forwarded towards the VMs that have joined the multicast group in the virtual network (Fig. 6-6).

5 SCALABILITY ANALYSIS AND SIMULATIONS

In this section, we first analyze the scalability of SVDC. Then, we conduct multiple simulations to evaluate its performance. We compare the scalability and performance of SVDC against NetLord [8] and VxLan [6], but not VLan, since VLan cannot scale to a large number of virtual networks (i.e., more than 4094).

5.1 Scalability Analysis of the Virtualization Architectures

As mentioned in Section 2, an important design goal of SVDC is to accommodate a huge number of virtual networks. Thus, we first analyze the upper bound on the number of virtual networks that can be supported in different network virtualization architectures, including SVDC, NetLord and VxLan.

Since SVDC reuses the 12-bit VLan field in the inner Ethernet header to differentiate tenants on the same server, the upper bound on the number of virtual networks on a single server in SVDC is 4096, the same as in VLan. By maintaining the mapping between GTID and (SID, LTID) in the SVDC controller, different tenants can have the same LTID on different servers. With the conservative assumption that each virtual network contains only one VM and is located on a single server, we can calculate the upper limit on the number of virtual networks that SVDC can support as N = S × 2^L, where N is the number of virtual networks, S is the number of servers in the network and L is the size of the LTID (12 bits).
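
The crossover with the 24-bit schemes follows directly from this formula; the short calculation below merely restates N = S × 2^L with L = 12.

```python
# Upper limit of virtual networks under the one-VM-per-network assumption.
def svdc_limit(servers, ltid_bits=12):
    return servers * (1 << ltid_bits)          # N = S * 2^L

FLAT_24BIT_LIMIT = 1 << 24                     # VxLan / NVGRE / NetLord: 16,777,216

print(svdc_limit(4096))      # 16,777,216 -> SVDC catches up at about 4,100 servers
print(svdc_limit(10_000))    # 40,960,000 -> roughly 144% above the 24-bit limit
```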

Fig. 7: Upper bound of supported virtual networks (virtual network limit, in millions, versus the number of servers, in thousands, for SVDC, NetLord and VxLan).

As shown in Fig. 7, the upper bound of virtual networks SVDC can support grows linearly as the number of servers increases, while that of VxLan and NetLord remains the same regardless of the number of servers. This is because VxLan and NetLord store the virtual network identifier in the packet header, whose size cannot be very large, while SVDC leverages the SVDC controller to maintain a global virtual network identifier and limits the scope of the local virtual network identifier to a single server. When the number of servers is beyond 5,000, the upper limit of virtual networks SVDC can support exceeds that of VxLan and NetLord. For example, when the number of servers is 10,000, the number of virtual networks SVDC can support is 144% higher than that of VxLan and NetLord.

Under this conservative assumption, when the number of servers in the data center network is less than 5,000, and especially when it is less than about 4,100, the upper limit on the number of virtual networks SVDC can support can be lower than that of VxLan and NetLord. However, considering that public cloud providers often run very large data center networks with far more than 4,100 servers [21], SVDC easily achieves high scalability.

5.2 Simulation Setup

In this section, we describe our simulation setup. We develop an event-driven simulator to coarsely model a large-scale multi-tenant data center network and conduct simulations to evaluate the performance of SVDC, comparing it against NetLord and VxLan.

Network Topology: We use a three-layer Fat-Tree topology with 32-port switches as a representative multi-path data center network topology. Every server hosts 30 VMs. In this network, there are 512 edge switches, 8,192 servers and 245,760 VMs. The speed of every link in the network is 1 Gb/s. Since SVDC is not designed for any specific network topology or routing scheme, we run simulations to verify the performance of SVDC in different network environments, e.g., the multi-path Fat-Tree and a Spanning-Tree derived from the Fat-Tree.

Workload: We generate virtual networks with random sizes, and the VMs of a virtual network are randomly placed on the servers. In each simulation run, each VM sends 50 packets (with random destinations) to other VMs within the same virtual network. All VMs start transferring data simultaneously, and the simulation terminates when all data transmissions are over. We vary the number of active virtual networks that can send data, which affects the total number of active VMs participating in data transmission.

Routing Scheme: The Fat-Tree network has multiple equal-cost paths between any pair of servers. To make an unbiased comparison among the three network virtualization solutions, we use the same routing strategy for the Fat-Tree network, i.e., exploiting Equal-Cost Multi-Path to spread traffic over as many paths as possible. In the Spanning-Tree network, there is a single routing path between any two servers.

Evaluation Metrics: Two metrics are evaluated in the simulations, namely the aggregate network goodput and the traffic overhead ratio. The former is calculated as the sum of bytes carried in the payloads of all packets transferred within a unit of time, while the latter is calculated as the ratio of the total header bytes to the total packet bytes transferred during the simulation time. Goodput is one of the most important metrics for modern data centers, since most cloud applications are bandwidth hungry [22], [23].
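
Stated as code, the two metrics are simply the following ratios; this is a restatement of the definitions above, not the simulator's implementation.

```python
# The two evaluation metrics, restated as plain functions.
def aggregate_goodput(total_payload_bytes, duration_seconds):
    return total_payload_bytes / duration_seconds      # payload bytes per second

def traffic_overhead_ratio(total_header_bytes, total_packet_bytes):
    return total_header_bytes / total_packet_bytes     # fraction of bytes spent on headers
```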

5.3 Simulation Results

Normal distribution of packet sizes in Fat-Tree network: We first carry out simulations assuming that the packet size follows a normal distribution with a mean of 150 bytes and a standard deviation of 200 bytes.



Fig. 8: Performance of the three solutions under a normal distribution of packet sizes in the Fat-Tree network: (a) aggregate goodput (Tbps) and (b) traffic overhead ratio (%), both versus the number of VMs (k).

Fig. 9: Performance of the three solutions under a normal distribution of packet sizes in the Spanning-Tree network: (a) aggregate goodput (Gbps) and (b) traffic overhead ratio (%), both versus the number of VMs (k).

The range of the packet size is from 100 bytes to 1500 bytes. We do not set the lower bound of the packet size to 64 bytes, because in VxLan 50 extra bytes of UDP/IP/Ethernet header are used to encapsulate the original packet, and the total packet header size reaches 86 bytes (the 50 extra bytes plus the original 16-byte Ethernet header and 20-byte IP header).

Fig. 8 shows the results. The aggregate network goodput increases with more active VMs, because more VMs help multiplex the link bandwidth resources in the network. SVDC clearly outperforms the other two solutions in achieving higher aggregate network goodput, because less traffic overhead is added to the packet headers. When the number of active VMs is 245k, the aggregate network goodput of SVDC is 13% and 28% higher than that of NetLord and VxLan, respectively.

Normal distribution of packet sizes in Spanning-Tree network: We use the same workload as above and run the simulation in the Spanning-Tree network. The result is shown in Fig. 9, which exhibits a trend similar to Fig. 8. In the Spanning-Tree network, when the number of active VMs is 245k, the aggregate network goodput of SVDC outperforms that of NetLord and VxLan by 15% and 27%, respectively. The aggregate goodput in the Spanning-Tree network is orders of magnitude less than that in the Fat-Tree network, due to much less link resource in the Spanning-Tree network.

Exponential distribution of packet sizes in Fat-Tree network: We then change the packet size distribution to a negative exponential distribution with a mean of 150 bytes and run the simulation in the Fat-Tree network.

Fig. 10: Performance of the three solutions under an exponential distribution of packet sizes in the Fat-Tree network: (a) aggregate goodput (Tbps) and (b) traffic overhead ratio (%), both versus the number of VMs (k).

Fig. 11: Performance of the three solutions under a constant packet size in the Fat-Tree network: (a) aggregate goodput (Tbps) and (b) traffic overhead ratio (%), both versus the number of VMs (k).

The lower and upper bounds of the packet size are the same as in the simulations above. Fig. 10 shows the results. When there are 245k active VMs, SVDC achieves 16% and 32% higher aggregate goodput than NetLord and VxLan, respectively. Comparing Fig. 10(a) with Fig. 8(a), the latter achieves higher aggregate goodput, resulting from many more larger-sized packets and thus less amortized traffic overhead.

Constant packet sizes in Fat-Tree network: In the last group of simulations, we set the packet size to a constant 100 bytes; the results are shown in Fig. 11. In this case, most of each packet is occupied by the packet header, and the traffic overhead ratios of SVDC, NetLord and VxLan are 56%, 76% and 93%, respectively. The aggregate goodput of SVDC is 75% and 550% higher than that of NetLord and VxLan.

Overall, the simulation results demonstrate that SVDC can significantly improve the aggregate network goodput given the same network capacity, especially for smaller packets.

6 EXPERIMENTS

In this section, we conduct testbed experiments both to study the scalability of the SVDC controller and to evaluate SVDC's network performance in a real network.



6.1 Scalability of the SVDC Controller

Since we use the SVDC controller to centrally manage the mapping tables and process the first packet of every flow, an intuitive question is: can the SVDC controller scale to a large topology with a high rate of flow arrivals? To answer this question, we implement our SVDC controller and evaluate its scalability.

Testbed: We implement the SVDC controller by modifying the forwarding module of Floodlight [24], which follows the OpenFlow framework. It runs on a server equipped with an AMD Opteron 4176 2.4GHz 12-core CPU, 32GB RAM and 1Gbps NICs. Six servers are connected to the SVDC controller via an H3C S5500 switch. Each of them is equipped with a 4-core Intel(R) i3 3.3GHz processor, 4GB memory and 1Gbps NICs.

Benchmark: Since we do not have a large data center network topology, we use cbench [25], which can simulate a large number of virtual switches and virtual machines on a single server, as the test tool to evaluate the scalability of the SVDC controller. Cbench treats each virtual switch as a session that can repeatedly send flow requests to the controller and consume feedback messages from it, while each virtual machine is treated as a network card that connects to the virtual switch. We install cbench on each server. Thus, the six servers together can simulate a large-scale topology for the experiments. We compare the results of the SVDC controller against an unmodified Floodlight controller.

Experimental Results: We first measure the processing throughput of the SVDC controller, in terms of the total number of requests that can be processed by the controller per second, for different numbers of VMs in the whole network. The number of VMs coexisting in our simulated network varies from 1,000,000 to 10,000,000. We run 20 trials for each group of experiments and report the average result.

Fig. 12: Throughput of the SVDC controller and the original Floodlight controller (kilo responses/s versus the number of VMs, in millions).

From the results shown in Fig. 12, we find that with an increasing number of VMs, the throughput decreases for both the original Floodlight controller and the SVDC controller. When the number of VMs coexisting in the network reaches 10,000,000, the throughput of the SVDC controller decreases to 147k responses per second. Compared with the original Floodlight controller, the SVDC controller has at most a 7% throughput reduction, indicating that only a few microseconds of latency are added by the SVDC controller for each flow request, which is an affordable cost. We have conducted statistical tests on the experimental results shown in Fig. 12. All results fall within the 95% confidence intervals of the average values, and the width of all the 95% confidence intervals is no more than 4,000 responses per second.

Fig. 13: The CPU utilization and memory consumption of the SVDC controller: (a) CPU utilization (%) and (b) memory occupied (MB), both versus the number of VMs (millions).


We then measure the CPU utilization and the memory consumption of the SVDC controller. The results are shown in Fig. 13(a) and Fig. 13(b), respectively. When the number of VMs reaches 10,000,000, the CPU utilization of the SVDC controller is 83% and the memory consumed by the SVDC controller is nearly 4GB. This indicates that a single server with a 12-core CPU and 32GB memory can handle the flow requests of a large network. Considering that our implementation is unoptimized, we envision that if the SVDC controller were incrementally implemented on a more powerful distributed SDN controller such as Onix [26] or Kandoo [27], it could handle a higher flow arrival rate in a larger data center network.

6.2 Network Performance Evaluation of SVDC

Fig. 14: Aggregate network goodput of the three network virtualization solutions in the experiment (Gbps versus packet size in bytes).

We now evaluate SVDC's network performance in a real network.

Testbed: We implement the edge switches and virtual switches of SVDC by modifying the datapath module of Open vSwitch [28]. The MAC-in-MAC encapsulation and decapsulation functions of SVDC's edge switches are implemented by adding about 300 lines of code to the vport module of the datapath module. The mapping tables of SVDC's virtual switches and edge switches are implemented by reusing the flow tables of the datapath module in Open vSwitch.



Based on our edge switch and virtual switch implementation, we deploy a simple tree topology to evaluate the performance of SVDC. An H3C S5500 switch is used as the core switch, which connects two edge switches. Six servers are connected to the two edge switches. Each server is equipped with a 4-core Intel(R) i3 3.3GHz processor, 4GB memory and 1Gbps NICs. Four VMs and a virtual switch are installed on each server. The SVDC controller is connected to the core switch. The expiration time of the flow entries installed in the edge switches is configured as 5 seconds.

Bulk Data Transfer: Next, we emulate bulk data transfer among VMs on our testbed. Every VM transfers 100MB of useful data to another VM in a different server. The 100MB of data is split into many small slices and each slice is packed into the payload of a packet. The packet size varies from 200 bytes to 1400 bytes. We run 10 trials for each packet size and report the average result.

Aggregate Network Goodput: We measure the aggregate network goodput of the three solutions in the experiment, as shown in Fig. 14. All results fall within the 95% confidence intervals of the average values, and the width of all the 95% confidence intervals is no more than 30 Mbps. Similar to the simulation results, the aggregate goodput of the network increases significantly with larger packet sizes. As aforementioned, there are two reasons for this trend. First, a larger packet size amortizes the traffic overhead paid to carry encapsulation headers. Second, a larger packet size means fewer packets to transfer in the network and accordingly less packet loss caused by traffic congestion. For any packet size, the aggregate goodput of SVDC outperforms the other two solutions, thanks to its minimal packet overhead.

7 DISCUSSIONS

In this section, we discuss some important considerations in SVDC deployment. Our major concerns are how SVDC handles VM migration, how it achieves fault tolerance and whether it can deal with cloud dynamicity.

VM Migration: To handle VM migration, a central VM manager which can communicate with all hosts needs to be deployed in the network. We envision this VM manager being co-located with our SVDC controller. In this scenario, when a VM is about to migrate, the VM manager notifies our SVDC controller of the SID of the destination server, and the IP address and the GTID of this VM.

Before VM migration starts, the SVDC controller first needs to check whether an LTID has been assigned to the virtual network hosting this VM in the destination server. If not, the SVDC controller obtains an unused LTID and configures the virtual switch on the destination server.

After VM migration completes, a gratuitous ARP message is sent from the destination server to announce the new location of the VM. When this ARP message arrives at the ingress edge switch, it is directed to the SVDC controller as a broadcast entry query. In this way, the SVDC controller can confirm the completion of VM migration and update the location information of this VM in its mapping tables.

To maintain the communication states destined for the migrated VM in edge switches, the SVDC controller broadcasts an entry update message to all edge switches immediately after it receives the gratuitous ARP message. The entry update message contains the (LTID, ES, p-ID) tuple that the migrated VM uses after migration. All edge switches that maintain encapsulation table entries toward the migrated VM update their encapsulation tables and thus keep the communication states toward the migrated VM. The gratuitous ARP message is then sent to the VMs within the same virtual network to update their ARP tables and to refresh the FIB tables of core switches.
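A minimal sketch of this controller-side update logic is given below, assuming hypothetical class and method names (EncapEntry, on_gratuitous_arp); the real SVDC controller pushes these updates through its SDN control channel rather than the print placeholder used here.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

@dataclass
class EncapEntry:
    ltid: int      # local tenant ID on the destination server
    es_mac: bytes  # MAC address of the egress edge switch
    p_id: int      # port identifier used after migration

class SvdcControllerSketch:
    """Hypothetical controller state for handling VM migration."""

    def __init__(self) -> None:
        # (GTID, VM IP) -> encapsulation entry currently advertised to edge switches
        self.vm_location: Dict[Tuple[int, str], EncapEntry] = {}
        self.edge_switches: Set[str] = set()

    def on_gratuitous_arp(self, gtid: int, vm_ip: str, new_entry: EncapEntry) -> None:
        """Confirm migration completion and push the new (LTID, ES, p-ID) tuple."""
        self.vm_location[(gtid, vm_ip)] = new_entry
        for switch in self.edge_switches:
            self.send_entry_update(switch, gtid, vm_ip, new_entry)

    def send_entry_update(self, switch: str, gtid: int, vm_ip: str,
                          entry: EncapEntry) -> None:
        # Placeholder for the actual control channel (e.g., an OpenFlow flow-mod).
        print(f"update {switch}: ({gtid}, {vm_ip}) -> "
              f"(LTID={entry.ltid}, ES={entry.es_mac.hex()}, p-ID={entry.p_id})")
```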

Since there is a time interval between the completion of VM migration and the completion of the switches' forwarding state update, traffic loss for the migrated VM is inevitable. Recent proposals that update data center networks with zero loss [29], [30] may be applicable to this case. We leave this as future work.

Fault-tolerance: An important aspect of a large virtualized data center network is the increased likelihood of failures. SVDC tolerates server failures as well as edge switch failures, because no "hard state" is associated with a specific virtual switch or edge switch. We assume that in large virtualized data centers, there are virtual network and physical network management systems responsible for handling virtual switch or edge switch failures.

However, it is necessary for SVDC to handle failures of controller instances or of the control links connecting controller instances and edge switches. To handle failures of controller instances, the SVDC controller needs to be implemented in a distributed form, with more than one instance managing each network slice. This can be done by incrementally implementing the SVDC controller on a distributed SDN controller such as Onix [26]. To handle failures of controller-switch links, traditional fault-tolerant routing protocols, e.g., the Spanning Tree Protocol [14], can be applied to the management network if it is deployed in an out-of-band manner. If the management network is deployed in an in-band manner, we assume the layer-2 routing scheme in the core network takes the responsibility for handling link failures.

Cloud Dynamicity: Cloud networks are inherently highly dynamic. Every second, a large number of VMs and virtual networks come and go simultaneously in the cloud. A previous study has shown that the median arrival interval between jobs in the cloud is about 900 ms [31]. We assume that the arrival interval between virtual networks follows the same trend. Thus, we need to analyze whether SVDC can deal with this high dynamicity, which depends on how long SVDC takes to finish the virtual network setup and teardown process.

As mentioned in Section 4, when a new virtual network setup request arrives at the SVDC controller, the controller assigns an unused GTID to the virtual network and an available LTID in each server that hosts its VMs. Then, the SVDC controller populates the LT-GT MAP and VM-LT MAP for this virtual network. All of this work can be done in the controller within a few microseconds per VM, since the SVDC controller has global knowledge of the network. Besides, to correctly add the LTID to packets sent out by VMs, FIB table entries in virtual switches need to be configured for the VMs of this virtual network. Since the current implementation of the SVDC controller and virtual switches follows the OpenFlow framework, the FIB table entry configuration process in SVDC is similar to the flow entry insertion process in OpenFlow, which takes a few hundred microseconds per VM [32]. The virtual network teardown process is the reverse of the setup process. Thus, we estimate that it takes SVDC about one second to set up or tear down a virtual network containing O(10^4) VMs, even when each VM is located on a different server, which indicates that SVDC can cope with the highly dynamic nature of the cloud.
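As a sanity check on this estimate, the following back-of-envelope calculation assumes the per-VM costs quoted above and that per-VM operations are handled one after another; the specific numbers are illustrative assumptions, not measurements.

```python
# Back-of-envelope check of the setup-time estimate (illustrative values only).
num_vms = 10_000                 # O(10^4) VMs, each on a different server
controller_work_per_vm = 5e-6    # "a few microseconds" of mapping-table updates
fib_install_per_vm = 100e-6      # "a few hundred microseconds" per flow entry [32]

total_seconds = num_vms * (controller_work_per_vm + fib_install_per_vm)
print(f"estimated setup time: {total_seconds:.2f} s")  # ~1.05 s
```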

8 RELATED WORKS

8.1 Virtualized Network Isolation Schemes

VLan is the conventional virtualized network isolation scheme used in layer-2 networks, defined by the IEEE 802.1Q standard [5]. To distinguish different virtualized networks, it inserts a 12-bit VLan ID field into the Ethernet header to identify a virtual network. Layer-2 switches only forward packets to VMs that have the same VLan ID as the source. VLan supports multicast/broadcast within each virtual network. As aforementioned, the VLan-based isolation scheme limits the number of virtual networks that can coexist in the data center to at most 4094, while today's mega data centers need to scale to far more virtual networks than VLan can accommodate.

In order to scale the number of virtualized networks supported, the Internet Engineering Task Force (IETF) proposed two architectures: VxLan [6] and NVGRE [7]. VxLan defines a new packet format, which encapsulates Ethernet packets with UDP headers. The VxLan header contains a 24-bit ID to identify a virtual network, and can therefore accommodate more than 16 million virtual networks. Similarly, NVGRE utilizes the GRE standard to encapsulate layer-2 packets sent by VMs and carries a 24-bit tenant ID in the GRE header. Other solutions, for example DOVE [33] and STT [34], design similar architectures: DOVE uses MAC-in-UDP encapsulation as in VxLan, and STT adds a TCP-like header to layer-2 packets. To handle multicast and broadcast traffic, these architectures map tenant-defined multicast addresses and the broadcast address within a virtual network to a specified multicast IP address used in the core network. Multicast/broadcast traffic within a virtual network is sent to all servers hosting VMs in the same virtual network. Tunnel end points residing in destination servers forward packets towards destination VMs, or drop packets if no VM residing on the same server is a destination of these packets. Though these architectures support multicast and broadcast, they incur a high bandwidth overhead by aggregating all multicast and broadcast traffic within a virtual network into global multicast traffic destined to all servers hosting VMs of that virtual network.
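For reference, the 24-bit ID mentioned above sits in VxLan's 8-byte shim header. The sketch below builds that header following the layout in the VxLan specification [6]; it is included only to illustrate the per-packet ID width that SVDC avoids carrying, and is not part of SVDC.

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VxLan header: flags byte with the I bit set,
    24 reserved bits, the 24-bit VNI, and a final reserved byte."""
    assert 0 <= vni < (1 << 24)        # 24-bit ID -> over 16 million virtual networks
    flags_and_reserved = 0x08 << 24    # I flag set; remaining bits reserved
    vni_and_reserved = vni << 8        # VNI in the upper 24 bits of the second word
    return struct.pack("!II", flags_and_reserved, vni_and_reserved)

print(vxlan_header(0x123456).hex())    # -> '0800000012345600'
```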

NetLord [8] is a scalable multi-tenant network architecture using MAC-in-IP encapsulation. It uses a NetLord agent, integrated with the VM hypervisor on each server, to encapsulate packets sent out by VMs. The tenant ID is carried in the destination IP address field of the outer IP header, while the destination MAC address in the outer MAC header is the MAC address of the egress edge switch to which the destination NetLord agent connects. NetLord does not discuss support for tenant-defined multicast and broadcast.

8.2 Large Layer-2 Networks

Recently, there has been growing interest in the community in designing a pure large layer-2 (Ethernet) network for data centers, due to the ease of management, arbitrary VM migration, etc. TRILL [1] and Shortest Path Bridging [2] are the two main industry standards that target large layer-2 networks. Both of them introduce a MAC-in-MAC encapsulation mechanism to reduce the FIB table size of switches, and a link-state routing protocol to realize efficient routing. One major difference between them is that TRILL relies on a new shim layer between its outer MAC header and inner MAC header to define the ingress and egress gateways and to help avoid loops, and it revises the outer MAC header hop by hop. In contrast, Shortest Path Bridging uses the outer MAC header to define the ingress and egress gateways of packets and forwards the packets by tunnelling.

Many research projects share the same goal as these two industry standards. PortLand [3] designs a scalable Ethernet by exploiting the hierarchical structure of the Fat-Tree topology. It introduces a hierarchical MAC address scheme that is used to forward packets in the core network and requires every switch in the network to forward packets based on MAC-address prefixes. MOOSE [35] also addresses Ethernet scaling issues by using hierarchical MAC addressing; by requiring switches to forward packets based on MAC-address prefixes, MOOSE can support shortest-path routing and multicast/broadcast in layer-2 networks. SEATTLE [36] enables shortest-path forwarding by running a link-state routing protocol on switches in the core network. It forwards packets based on end-host MAC addresses, whose locations are stored in switches by means of a distributed hash table (DHT). This network-wide DHT is also used to build a flexible directory service that performs address resolution.

8.3 Network Virtualization using SDN

Researchers have also spent much effort on designing network virtualization mechanisms over SDN. FlowVisor [37] is a network virtualization mechanism that slices most physical resources in an OpenFlow network and allows the network to be controlled by multiple users without interference. It inserts a new layer between the control plane and the data plane to implement virtualization and provides different tenants with the abstraction of different slices of the physical network topology. FlowVisor does not focus on the scalability problem of network isolation, while SVDC does. Splendid Isolation [38] is another mechanism, which provides tenants with a network slice abstraction at the language level. It aims at minimizing the difficulties for network administrators in implementing network isolation; thus, it also does not focus on the scalability problem of network isolation.

9 CONCLUSION

In this paper we designed SVDC, a highly scalable and low-overhead cloud data center network virtualization architecture. SVDC is specifically designed for layer-2 networks. By leveraging the emerging SDN framework, SVDC decouples the global identifier of a virtual network from the identifier carried in the packet header, and thus it can support a great number of virtual networks with only a short in-packet tag. SVDC enhances MAC-in-MAC encapsulation to not only minimize the packet header overhead of encapsulation, but also guarantee correct forwarding throughout the network with only a local identifier of the virtual network in the packet header. A huge number of multicast groups are also efficiently supported in SVDC. Both simulations and experiments demonstrate that SVDC achieves higher scalability and network goodput than other solutions, with affordable controller overhead.

REFERENCES

[1] J. Touch and R. Perlman, "Transparent Interconnection of Lots of Links (TRILL): Problem and applicability statement," IETF RFC 5556, May 2009.

[2] “IEEE 802.1aq - Shortest Path Bridging.” http://www.ieee802.org/1/pages/802.1aq.html.

[3] R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, "PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric," in Proc. SIGCOMM, June 2009.

[4] M. Lasserre and V. Kompella, "Virtual private LAN service (VPLS) using label distribution protocol (LDP) signaling," RFC 4762, January 2007.

[5] "IEEE Standard for Local and metropolitan area networks - Virtual Bridged Local Area Networks," IEEE Computer Society, May 2006.

[6] M. Mahalingam, D. Dutt, K. Duda, and P. Agarwal, "VxLan: A framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks," IETF draft-mahalingam-dutt-dcops-vxlan-01.txt, August 2012.

[7] M. Sridharan, K. Duda, and I. Ganga, "NVGRE: Network Virtualization using Generic Routing Encapsulation," IETF draft-sridharan-virtualization-nvgre-00.txt, March 2012.

[8] J. Mudigonda et al., "NetLord: a scalable multi-tenant network architecture for virtualized datacenters," in Proc. SIGCOMM, June 2011.

[9] “IEEE 802.1ah - Provider Backbone Bridges.” http://www.ieee802.org/1/pages/802.1ah.html.

[10] D. Li, H. Cui, Y. Hu, Y. Xia, and X. Wang, "Scalable data center multicast using multi-class bloom filter," in Network Protocols (ICNP), 2011 19th IEEE International Conference on, pp. 266–275, IEEE, 2011.

[11] D. Li, Y. Li, J. Wu, S. Su, and J. Yu, "ESM: Efficient and scalable data center multicast routing," IEEE/ACM Transactions on Networking, vol. 20, pp. 944–955, June 2012.

[12] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, "Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds," in Proceedings of the 16th ACM Conference on Computer and Communications Security, pp. 199–212, ACM, 2009.

[13] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, "OpenFlow: enabling innovation in campus networks," ACM SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69–74, 2008.

[14] "802.1D IEEE Standard for Local and Metropolitan Area Networks: Media Access Control (MAC) Bridges," IEEE, 2004.

[15] M. J. Christensen, K. Kimball, and F. Solensky, "Considerations for Internet Group Management Protocol (IGMP) and Multicast Listener Discovery (MLD) snooping switches," 2006.

[16] M. A. Patton and K. Auerbach, "Multicast (including Broadcast) Addresses." http://www.cavebear.com/archive/cavebear/Ethernet/multicast.html.

[17] Z. Guo, M. Su, Y. Xu, Z. Duan, L. Wang, S. Hui, and H. J. Chao, "Improving the performance of load balancing in software-defined networks through load variance-based synchronization," Computer Networks, vol. 68, pp. 95–109, 2014.

[18] "VLAN Trunking Protocol." http://en.wikipedia.org/wiki/VLAN_Trunking_Protocol.

[19] R. Perlman, D. Eastlake, D. Dutt, S. Gai, and A. Ghanwani, "Routing Bridges (RBridges): Base Protocol Specification," Internet Engineering Task Force (IETF), July 2011.

[20] L. Martini, E. C. Rosen, and G. Heron, "Encapsulation methods for transport of Ethernet over MPLS networks," RFC 4448, June 2005.

[21] "How Big is AWS? Netcraft Finds 158,000 Servers." http://www.datacenterknowledge.com/archives/2013/06/04/how-big-is-aws-new-netcraft-numbers-show-insight/.

[22] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.

[23] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, "VL2: a scalable and flexible data center network," in ACM SIGCOMM Computer Communication Review, vol. 39, pp. 51–62, ACM, 2009.

[24] "Floodlight." http://www.projectfloodlight.org/floodlight/.

[25] "Cbench." http://github.com/andi-bigswitch/oflops/tree/master/cbench.

[26] T. Koponen, M. Casado, N. Gude, J. Stribling, L. Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue, T. Hama, et al., "Onix: A distributed control platform for large-scale production networks," in OSDI, vol. 10, pp. 1–6, 2010.

[27] S. Hassas Yeganeh and Y. Ganjali, "Kandoo: a framework for efficient and scalable offloading of control applications," in Proceedings of the First Workshop on Hot Topics in Software Defined Networks, pp. 19–24, ACM, 2012.

[28] "Open vSwitch." http://openvswitch.org/.

[29] H. H. Liu, X. Wu, M. Zhang, L. Yuan, R. Wattenhofer, and D. Maltz, "zUpdate: Updating data center networks with zero loss," ACM SIGCOMM Computer Communication Review, vol. 43, no. 4, pp. 411–422, 2013.

[30] X. Jin, H. H. Liu, R. Gandhi, S. Kandula, R. Mahajan, M. Zhang, J. Rexford, and R. Wattenhofer, "Dynamic scheduling of network updates," in ACM SIGCOMM Computer Communication Review, vol. 44, pp. 539–550, ACM, 2014.

[31] C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch, "Heterogeneity and dynamicity of clouds at scale: Google trace analysis," in Proceedings of the Third ACM Symposium on Cloud Computing, p. 7, ACM, 2012.

[32] C. Rotsos, N. Sarrar, S. Uhlig, R. Sherwood, and A. W. Moore, "OFLOPS: An open framework for OpenFlow switch evaluation," in Passive and Active Measurement, pp. 85–95, Springer, 2012.

[33] K. Barabash, R. Cohen, D. Hadas, V. Jain, R. Recio, and B. Rochwerger, "Case for overlays in DCN virtualization," in Proceedings of the 3rd Workshop on Data Center - Converged and Virtual Ethernet Switching, September 2011.

[34] B. Davie and E. Gross, "A Stateless Transport Tunneling Protocol for Network Virtualization (STT)," IETF draft-davie-stt-02.txt, August 2012.

[35] M. Scott, A. Moore, and J. Crowcroft, "Addressing the scalability of Ethernet with MOOSE," in Proc. DC CAVES Workshop, 2009.

[36] C. Kim, M. Caesar, and J. Rexford, "Floodless in SEATTLE: a scalable Ethernet architecture for large enterprises," in ACM SIGCOMM Computer Communication Review, vol. 38, pp. 3–14, ACM, 2008.

[37] R. Sherwood, G. Gibb, K.-K. Yap, G. Appenzeller, M. Casado, N. McKeown, and G. M. Parulkar, "Can the production network be the testbed?," in OSDI, vol. 10, pp. 1–6, 2010.

[38] S. Gutz, A. Story, C. Schlesinger, and N. Foster, "Splendid isolation: A slice abstraction for software-defined networks," in Proceedings of the First Workshop on Hot Topics in Software Defined Networks, pp. 79–84, ACM, 2012.

Mr. Congjie Chen received the BS degree in 2013 from the Department of Telecommunication Engineering, Beijing University of Posts and Telecommunications. Currently, he is working towards the Master's degree in the Computer Science Department of Tsinghua University, China. His main research interests include data center networks and network virtualization.


Dr. Dan Li received the PhD degree in computer science from Tsinghua University in 2007. He is an associate professor in the Computer Science Department of Tsinghua University, Beijing, China. His research interests include future Internet architecture and data center networking.

Dr. Jun Li received his PhD degree in computer science from the University of California, Los Angeles, CA, in 2002, with honors (Outstanding Doctor of Philosophy). He is now an associate professor in the Department of Computer and Information Science, University of Oregon, and directs the Network and Security Research Laboratory. His research interests include networking, distributed systems, and the security of networking and distributed systems.

Dr. Konglin Zhu received his PhD degree in computer science from the University of Goettingen in early 2014 and is now a faculty member in the School of Information and Communication Engineering of Beijing University of Posts and Telecommunications. His research interests include data routing in mobile social networks and online social network analysis.
