Introduction to the Cloud Computing Network Control Plane
Architecture and Protocols for Data Center Networks
Outline
❒ Data Center Networking Basics
❒ Problem Solving with Traditional Design Techniques
❒ Virtual/Overlay Network Functional Architecture
❒ Virtual/Overlay Network Design and Implementation
Data Center Networking Basics
Lots of Servers❒ Data centers consist of
massive numbers of servers❍ Up to 100,000’s
❒ Each server has multiple processors❍ 8 or more
❒ Each processor has multiple cores❍ 32 max for commodity
processors, more coming
❒ Each server has multiple NICs❍ Usually at least 2 for redundancy❍ 1G common, 10G on the upswing
Source: http://img.clubic.com/05468563-photo-google-datacenter.jpg
Mostly Virtualized❒ Hypervisor provides a compute
abstraction layer ❍ Looks like hardware to operating
system❍ OSes run as multiple Virtual
Machines (VMs) on single server
❒ Hypervisor maps VM to processors ❍ Virtual cores (vCores)
❒ Virtual switch provides networking between VMs and to DC network❍ Virtual NICs (vNICS)
❒ Without oversubscription, usually as many VMs as cores❍ Up to 256 for 8p x 32c❍ Typical is 32 for 4p x 8c
❒ VMs can be moved from one machine to another
[Figure: server hardware with NIC1 and NIC2; a hypervisor hosting VM1-VM4, each attached through a vNIC to a virtual switch]
Data Center Network Problem❒ For a single virtualized data center built
with cheap commodity servers:❍ 32 VMs per server❍ 100,000 servers❍ 32 x 100,000 = 3.2 million VMs!
❒ Each VM needs a MAC address and an IP address
❒ Infrastructure needs IP and MAC addresses too❍ Routers, switches❍ Physical servers for management
❒ Clearly a scaling problem!
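A quick back-of-the-envelope sketch of that scale, using the figures from this slide (the switch/router count is an assumption added only for illustration):

```python
# Back-of-the-envelope address scaling, using the slide's illustrative figures.
SERVERS = 100_000
VMS_PER_SERVER = 32                # e.g. 4 processors x 8 cores, one VM per core

vms = SERVERS * VMS_PER_SERVER
print(f"VMs: {vms:,}")             # 3,200,000

# Every VM needs a MAC and an IP; the physical servers and the
# switching/routing infrastructure need addresses as well.
INFRA_NODES = 5_000                # assumed switch/router count, illustrative only
print(f"MAC/IP pairs to track: {vms + SERVERS + INFRA_NODES:,}")
```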
Common Data Center Network Architectures: Three Tier❒ Server NICs connected directly to edge
switch ports❒ Aggregation layer switches connect
multiple edge switches❒ Top layer switches connect aggregation
❍ Top layer can also connect to the Internet
❒ Usually some redundancy
❒ Pluses❍ Common❍ Simple
❒ Minuses❍ Top layer massively over-subscribed❍ Reduced cross sectional bandwidth
• 4:1 oversubscription means only 25% of bandwidth available
❍ Scalability at top layer requires expensive enterprise switches
Source: K. Bilal, S. U. Khan, L. Zhang, H. Li, K. Hayat, S. A. Madani, N. Min-Allah, L. Wang, D. Chen, M. Iqbal, C.-Z. Xu, and A. Y. Zomaya, "Quantitative Comparisons of the State of the Art Data Center Architectures," Concurrency and Computation: Practice and Experience,
vol. 25, no. 12, pp. 1771-1783, 2013.
Top of Rack (ToR) Switch
End of Row Switch (sometimes)
These can be IP Routers (for more €s)
Common Data Center Network Architectures: Fat Tree❒ Clos network, with origins in the 1950s
telephone network❒ Data center divided into k pods❒ Each pod has k switches
❍ k/2 access, k/2 aggregation
❒ Core has (k/2)² switches❒ 1:1 oversubscription ratio and full
bisection bandwidth
❒ Pluses❍ No oversubscription❍ Full bisection bandwidth
❒ Minuses❍ Need specialized routing and
addressing scheme❍ Number of pods limited to number of
ports on a switch❍ Maximum # of pods = # switch ports
Source: Bilal, et al.
k=4 Example
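The pod and core arithmetic can be checked with a small calculator. This sketch follows the slide's formulas (k pods, k/2 access and k/2 aggregation switches per pod, (k/2)² core switches); the host count k³/4, which follows from each access switch facing k/2 servers, is a standard property of this topology added here for completeness:

```python
def fat_tree_size(k: int) -> dict:
    """Size a k-ary fat tree built from k-port switches.

    Per the slide: k pods, each with k/2 access and k/2 aggregation
    switches, plus (k/2)^2 core switches. Each access switch faces
    k/2 servers, giving k^3/4 hosts in total.
    """
    assert k % 2 == 0, "k must be even"
    switches_per_pod = k // 2 + k // 2        # access + aggregation
    core = (k // 2) ** 2
    hosts = (k ** 3) // 4
    return {"pods": k, "switches_per_pod": switches_per_pod,
            "core_switches": core,
            "total_switches": k * switches_per_pod + core,
            "hosts": hosts}

print(fat_tree_size(4))    # the k=4 example: 4 pods, 4 core switches, 16 hosts
print(fat_tree_size(48))   # commodity 48-port switches: 27,648 hosts
```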
Problem Solving with Traditional Design Techniques
Problem #1:ARP/ND Handling
❒ IP nodes use ARP (IPv4) and Neighbor Discovery (ND, IPv6) to resolve an IP address to a MAC address❍ Broadcast (ARP)
and multicast (ND)
❒ Problem:❍ Broadcast
forwarding load on large, flat L2 networks can be overwhelming
Source: http://www.louiewong.com/wp-content/uploads/2010/09/ARP.jpg
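A rough, illustrative estimate of why that broadcast load is overwhelming if all VMs share one flat L2 domain; the per-VM ARP request rate below is an assumed figure chosen only for illustration, not a measurement:

```python
# Rough illustration of ARP broadcast load in one flat L2 domain.
VMS_IN_DOMAIN = 3_200_000           # all VMs in a single broadcast domain
ARP_REQUESTS_PER_VM_PER_SEC = 0.1   # assumed: one ARP request every 10 s per VM

broadcasts_per_sec = VMS_IN_DOMAIN * ARP_REQUESTS_PER_VM_PER_SEC
# Every broadcast frame is flooded to every other VM in the domain.
copies_delivered_per_sec = broadcasts_per_sec * (VMS_IN_DOMAIN - 1)
print(f"{broadcasts_per_sec:,.0f} ARP broadcasts/s")
print(f"{copies_delivered_per_sec:.2e} frame copies delivered/s")
```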
Problem #2: VM Movement❒ Data center operators need
to move VMs around❍ Reasons: server maintenance,
server optimization for energy use, performance improvement, etc.
❍ MAC address can stay fixed (provided it is unique in the data center)
❍ If subnet changes, IP address must change because it is bound to the VM’s location in the topology• For “hot” migration, the IP
address cannot change
❒ Problem:❍ How broadcast domains are
provisioned affects where VMs can be moved
Source: http://www.freesoftwaremagazine.com/files/nodes/1159/slide4.jpg
Solutions Using Traditional Network Design Principles: IP Subnets
❒ ToR == last hop router❍ Subnet (broadcast domain) limited to rack❍ Good broadcast/multicast limitation❍ Poor VM mobility
❒ Aggregation Switch == last hop router❍ Subnet limited to racks controlled by aggregation switch❍ Complex configuration
• Subnet VLAN to all access switches and servers on served racks
❍ Moderate broadcast/multicast limitation❍ Moderate VM mobility
• To any rack covered
❒ Core Switch/Router == last hop router❍ Poor broadcast/multicast limitation❍ Good VM mobility
Note: These solutions only work if the data center is single tenant!
Source: Bilal, et al.
Where to put the last hop router?
Problem #3: Dynamic Provisioning of Tenant Networks❒ Virtualized data centers enable
renting infrastructure to outside parties (aka tenants)❍ Infrastructure as a Service (IaaS)
model❍ Amazon Web Services, Microsoft
Azure, Google Compute Engine, etc.
❒ Customers get dynamic server provisioning through VMs❍ Expect same dynamic “as a service”
provisioning for networks too
❒ Characteristics of tenant network❍ Traffic isolation❍ Address isolation
• From other tenants• From infrastructure
Solution Using Traditional Network Design Principles❒ Use a different VLAN for each tenant network❒ Problem #1
❍ There are only 4096 VLAN tags for 802.1q VLANs*❍ Forces tenant network provisioning along physical
network lines
❒ Problem #2❍ For fully dynamic VM placement, each ToR-server
link must be dynamically configured as a trunk
❒ Problem #3❍ Can only move VMs to servers where VLAN tag is
available• Ties VM movement to physical infrastructure
*except for carrier Ethernet, about which more shortly
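The 4096 figure comes from the 12-bit VLAN ID field in the 802.1Q tag; a minimal sketch of packing the 4-byte tag makes the limit concrete (VID 0 and 4095 are reserved, so 4094 values are actually usable):

```python
import struct

TPID = 0x8100                      # 802.1Q Tag Protocol Identifier

def dot1q_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Pack the 4-byte 802.1Q tag; the VLAN ID field is only 12 bits wide."""
    assert 0 <= vid <= 0xFFF, "only 4096 VID values exist (0 and 4095 reserved)"
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID, tci)

print(dot1q_tag(100).hex())        # '81000064'
```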
Summary
❒ Configuring subnets based on hierarchical switch architecture always results in a tradeoff between broadcast limitation and VM movement freedom❍ On top of which, can’t achieve traffic isolation for
multitenant networks
❒ Configuring multitenant networks with VLAN tags for traffic isolation ties tenant configuration to physical data center layout❍ Severely limits where VMs can be provisioned and
moved❍ Requires complicated dynamic trunking
❒ For multitenant, virtualized data centers, no good solution using traditional techniques!
Virtual/Overlay Network Functional Architecture
Virtual Networks through Overlays
Source: Bilal, et al. [Figure: Blue Tenant Network and Yellow Tenant Network overlays on the shared physical network]
❒ Basic idea of an overlay:❍ Tunnel tenant packets through underlying physical Ethernet or IP network ❍ Overlay forms a conceptually separate network providing a separate service
from underlay
❒ L2 service like VPLS or EVPN❍ Overlay spans a separate broadcast domain
❒ L3 service like BGP IP VPNs❍ Different tenant networks have separate IP address spaces
❒ Dynamically provision and remove overlay as tenants need network service
❒ Multiple tenants with separate networks on the same server
Advantages of Overlays
❒ Tunneling is used to aggregate traffic❒ Addresses in underlay are hidden from the
tenant❍ Inhibits unauthorized tenants from accessing
data center infrastructure
❒ Tenant addresses in overlay are hidden from underlay and other tenants❍ Multiple tenants with the same IP address space
❒ Overlays can potentially support large numbers of tenant networks
❒ Virtual network state and end node reachability are handled in the end nodes
Challenges of Overlays
❒ Management tools to co-ordinate overlay and underlay❍ Overlay networks probe for bandwidth and
packet loss, which can lead to inaccurate information
❍ Lack of communication between overlay and underlay can lead to inefficient usage of network resources
❍ Lack of communication between overlays can lead to contention and other performance issues
❒ Overlay packets may fail to traverse firewalls
❒ Path MTU limit may cause fragmentation❒ Efficient multicast is challenging
Functional Architecture: Definitions❒ Virtual Network
❍ Overlay network defined over the Layer 2 or Layer 3 underlay (physical) network
❍ Provides either a Layer 2 or a Layer 3 service to tenant
❒ Virtual Network Instance (VNI) or Tenant Network❍ A specific instance of a virtual network
❒ Virtual Network Context (VNC)❍ A tag or field in the encapsulation header that
identifies the specific tenant network
Functional Architecture: More Definitions❒ Network Virtualization Edge (NVE)
❍ Data plane entity that sits at the edge of an underlay network and implements L2 and/or L3 network virtualization functions • Example: virtual switch aka Virtual Edge Bridge (VEB)
❍ Terminates the virtual network towards the tenant VMs and towards outside networks
❒ Network Virtualization Authority (NVA)❍ Control plane entity that provides information
about reachability and connectivity for all tenants in the data center
Overlay Network Architecture
[Figure: a Data Center L2/L3 Network (underlay) connects three NVEs; Tenant Systems attach to NVEs via LAN links, point-to-point links, or end-system integration; the NVA talks to each NVE over the control plane, while the NVEs carry the data plane]
Virtual/Overlay Network Design and Implementation
Implementing Overlays: Tagging or Encapsulation?❒ At or above Layer 2 but below Layer 3:
❍ Insert tag at a standards specified place in the pre-Layer 3 header
❒ At Layer 3:❍ Encapsulate the tenant packet with an
encapsulation protocol header and an IP header
❒ Tenant network identified by Virtual Network Context❍ Tag for tagging❍ Context identifier in protocol header for
encapsulation
L2 Virtual Networks:Tagging Options❒ Simple 802.1q VLANs
❍ 4096 limit problem❍ Trunking complexity
❒ MPLS❍ Nobody uses MPLS directly on the switching hardware
• One experimental system (Zepplin)
❍ Switches are perceived to be too expensive
❒ TRILL❍ IETF standard for L2 encapsulation❍ Not widely adopted
• Brocade and Cisco implement it
❒ Collection of enhancements to 802.1 since 2000❍ 802.1qbg Virtual Edge Bridging (VEB) and Virtual Ethernet Port Aggregation
(VEPA) (data plane)❍ 802.1qbc Provider Bridging (data plane)❍ 802.1qbf Provider Backbone Bridging (data plane)
• Also does MAC’nMAC encapsulation
❍ 802.1aq Shortest-Path Bridging (control plane)❍ Note: These are also used by carriers for wide area network (Carrier
Ethernet)
802.1qbg: Standard Virtual Switch/VEB❒ Virtual switch software sits in hypervisor and switches
packets between VMs❒ Every time a packet arrives for a VM, the hypervisor
takes an interrupt❍ Potential performance issue
Source: D. Kamath, et al., “Edge Virtual Bridge Proposal Version 0, Rev 0.1”, March 2010.
802.1qbg: Hardware Supported VEB
❒ SR-IOV is a PCI Express bus standard for allowing VMs to communicate directly with the NIC❍ No hypervisor interrupt
❒ Improves performance of virtual switching❒ Downside
❍ More expensive NIC hardware❍ More complex virtual switch❍ Constrains VM movement
802.1qbg: VEB Forwarding
❒ At 1, VEB forwards between VM and outside network via an external physical bridge (e.g. ToR)
❒ At 2, VEB forwards between two VMs belonging to the blue tenant on the same hypervisor
❒ At 3, forwarding between two logical uplink ports is not allowed
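A minimal sketch of those three forwarding rules, with made-up port and tenant names; the same-tenant check for local VM-to-VM forwarding is my reading of the slide's "blue tenant" example, so treat this as an illustration rather than the standard's actual state machine:

```python
# Illustrative sketch of the 802.1qbg VEB forwarding rules from the slide.
def veb_forward_allowed(src_port: str, dst_port: str, tenant_of: dict) -> bool:
    """Return True if a VEB may forward a frame from src_port to dst_port.

    Rules per the slide:
      1) VM port <-> uplink (external bridge, e.g. the ToR) is allowed
      2) VM <-> VM on the same hypervisor is allowed within a tenant
      3) uplink <-> uplink forwarding is not allowed
    """
    src_is_uplink = src_port.startswith("uplink")
    dst_is_uplink = dst_port.startswith("uplink")
    if src_is_uplink and dst_is_uplink:
        return False                                    # rule 3
    if src_is_uplink or dst_is_uplink:
        return True                                     # rule 1
    return tenant_of[src_port] == tenant_of[dst_port]   # rule 2

tenants = {"vm1": "blue", "vm2": "blue", "vm3": "yellow"}
print(veb_forward_allowed("vm1", "uplink0", tenants))      # True
print(veb_forward_allowed("vm1", "vm2", tenants))          # True (same tenant)
print(veb_forward_allowed("uplink0", "uplink1", tenants))  # False
```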
802.1qbg: VEB Characteristics❒ Works in the absence of any ToR switch support❒ Only supports a single physical uplink❒ VEB does not participate in spanning tree
calculations❒ Maximize bandwidth
❍ As opposed to VEPA which uses trombone forwarding (as we will shortly see)
❒ Minimize latency for co-located VMs because no external network to cross
❒ Migration of VMs between servers is straightforward❍ If both servers support SR-IOV for a hardware-supported VEB
802.1qbg: VEB Drawbacks (as of 2010)❒ Limited additional packet processing (ACLs,
etc.)❒ Limited security features❒ Limited monitoring (Netflow, etc.)❒ Limited support for 802.1 protocols (802.1x
authentication, etc.)❒ Limited support for promiscuous mode❒ All these are supported in the ToR❒ Assumption: the only way to get support
for these is to forward frames to the ToR before sending them to the VM
802.1qbg: Virtual Edge Port Aggregation (VEPA)❒ Firmware upgrade to switch to allow forwarding out of the same
physical port the packet arrived at under certain conditions❒ VMs send all packets to the switch
❍ Packets to VMs on VLANs on same machine turned around and sent back
❒ Trombone routing halves the capacity on the ToR-server link
❒ Open vSwitch (OVS) supports ACLs
❒ OVS supports Netflow❒ VMware virtual switch supports
promiscuous mode and OVS supports it if the NIC is in promiscuous mode
❒ OVS doesn’t support 802.1x❒ Conclusion: programming support
into software is a much better solution than making a hardware standard that reduces performance
5 Years Later: VEBs support most of these
Ethernet Data Plane Evolution: Not Your Father’s Ethernet Anymore
Timeline: 1990 802.1D; 1999 802.1Q VLAN; 2005 802.1Qbc Provider Bridging; 2008 802.1Qbf Provider Backbone Bridging
Source: evolutionanimation.wordpress.com
Source: P. Thaler, N. Finn, D. Fedyk, G. Parsons, and E. Gray, “IEEE 802.1Q: Media Access Control Bridges and Virtual Bridged Local Area Networks”, IETF-86 Tutorial, March 19, 2013
Ethernet Control Plane Evolution❒ Rapid Spanning Tree
Protocol (RSTP): single spanning tree for all traffic
❒ Multiple Spanning Tree Protocol (MSTP): different VLANs can follow different spanning trees (paths)
❒ Shortest Path Bridging: Use routing protocol (ISIS) to give each node its own spanning tree
Source: P. Thaler, et al., 2013
SPB Data Center Virtualization
[Figure: a Data Center L2 Network with an NVA (e.g. a Software Defined Network controller) and NVEs at Edge Switches 1-3 plus a central switch; step 1 creates the Red tenant network (I-SID1), step 2 distributes shortest path routes with ISIS; labels: B-VID1, I-SID1, VN-1. Hybrid centralized/distributed control plane]
L2 Virtualization: Challenges Handled❒ “Hot” VM movement
❍ IP address space configured on I-SID❍ But only within the data center
❒ ARP containment ❍ Limit broadcast domain to I-SID
❒ Firewall traversal ❍ No firewall at L2
❒ Path MTU ❍ Handled by the IP layer
❒ Multicast ❍ ISIS handles
❒ Management❍ Whole suite of management tools for 802.1 networks
L2 Virtualization Summary
❒ Possible to virtualize a data center with standardized L2 overlays❍ Advances in the 802.1Q data plane provide a layer of
MAC’nMAC encapsulation and an extra layer of VLAN tags
❍ Centralized, decentralized or hybrid control plane
❒ But most existing deployments use proprietary extensions❍ Cisco UCS uses TRILL
❒ But using IP overlays is cheaper❍ Switches supporting carrier Ethernet extensions
and TRILL are more expensive than simple 802.1Q
L3 Virtual Networks: Advantages
❒ Easy IP provisioning through hypervisor/virtual switch❍ End host provisioning❍ No need for distributed control plane
❒ Cheap NICs and switching hardware ❒ Support in hypervisor/virtual switch❒ No limitation on number and placement of
virtual networks❍ Virtual network can even extend into WAN
L3 Virtual Networks: Challenges❒ Path MTU limitation may cause
fragmentation❒ Lack of tools for management❒ Some performance hit
❍ Encapsulation/decapsulation❍ Lack of NIC hardware support
But low cost of NICs and switching hardware trumps all!!
L3 Virtual Networks: Encapsulation Options❒ IP in IP
❍ Use IP address as VNC❍ Problem for IPv4: Lack of address space
❒ IPSec in Infrastructure mode❍ Provides additional confidentiality❍ Problem: Key distribution complexity❍ Problem: larger performance hit even with hardware encryption
assist
❒ In practice:❍ STT
• Proprietary VMware/NSX protocol• Designed to leverage TCP Segmentation Offload (TSO) hardware support on NICs
❍ GRE and NVGRE❍ VxLAN
❒ Coming❍ GENEVE
• Proposed unified protocol framework for encapsulation headers
NVGRE: Network Virtualization Generic Routing Encapsulation
❒ Microsoft-proposed GRE Extension built on:❍ RFC 2784 GRE❍ RFC 2890 GRE Key Extension
❒ Provides a Layer 2 service tunneled over IP❍ No VLAN id!
❒ VNC is a Virtual Subnet Identifier (VSID)❍ 24 bit Key
• Each VSID constitutes a separate broadcast domain
– Like a VLAN
❍ 8 bit Flow label • Adds entropy for Equal Cost
Multipath (ECMP) routing
[Packet diagram: outer Ethernet header (P-DMAC, P-SMAC, P-VID, Ethertype=0x0800); outer IP header (P-SIP, P-DIP, Protocol=0x2F GRE); GRE header with Checksum and Sequence bits = 0, Key bit = 1, Ver = 0, Protocol=0x6558 (indicates Transparent Ethernet Bridging), 24-bit VSID and 8-bit FlowID; inner Ethernet header (C-DMAC, C-SMAC, Ethertype=0x0800, no VID!); inner IP header (C-SIP, C-DIP, Protocol=<Payload>)]
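A minimal sketch of packing the NVGRE header fields described above (Key bit set, protocol type 0x6558 for Transparent Ethernet Bridging, 24-bit VSID plus 8-bit FlowID in the GRE key); plain struct packing is used here rather than any particular packet library:

```python
import struct

def nvgre_header(vsid: int, flow_id: int) -> bytes:
    """Pack the 8-byte NVGRE header sketched in the diagram above.

    Flags/version word: Checksum and Sequence bits 0, Key bit 1, Ver 0.
    Protocol type 0x6558 indicates Transparent Ethernet Bridging.
    The 32-bit GRE key carries the 24-bit VSID and the 8-bit FlowID.
    """
    assert 0 <= vsid < 2**24 and 0 <= flow_id < 2**8
    flags_ver = 0x2000                      # only the Key (K) bit set
    key = (vsid << 8) | flow_id
    return struct.pack("!HHI", flags_ver, 0x6558, key)

print(nvgre_header(vsid=0x123456, flow_id=0x9A).hex())   # '200065581234569a'
```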
NVGRE Characteristics
❒ Path MTU discovery must be performed by originating NVE
❒ Encapsulated MAC header VLAN tag handling❍ Originating NVE must strip out any 802.1Q VLAN tag❍ Receiving NVE must add required 802.1Q VLAN tag
back❍ Requires NVA to maintain and provision VLAN tag to VN
Key mapping
❒ Multicast handling❍ Multicast routing deployed in infrastructure
• Provider provisions a multicast address per VSID• Addr takes all multicast and broadcast traffic originating in VSID
❍ No multicast routing deployed in infrastructure• N-way unicast by NVEs or a dedicated VM multicast router
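When the underlay offers no multicast routing, the N-way unicast fallback above amounts to per-destination replication at the originating NVE; a sketch of that idea, where the NVA lookup and the underlay send function are placeholders standing in for the real control-plane query and encapsulated transmission, not an actual API:

```python
# Illustrative N-way unicast handling of a tenant broadcast/multicast (BUM) frame.
def replicate_bum_frame(frame: bytes, vsid: int, local_nve: str,
                        nva_lookup, send_encapsulated) -> int:
    """Send one encapsulated copy of the frame to every remote NVE
    that the NVA says hosts members of this VSID."""
    remote_nves = [nve for nve in nva_lookup(vsid) if nve != local_nve]
    for nve_ip in remote_nves:
        send_encapsulated(dst_ip=nve_ip, vsid=vsid, payload=frame)
    return len(remote_nves)   # number of unicast copies sent
```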
VxLAN: Virtual eXtensible Local Area Network ❒ RFC 7348
❍ Consortium led by Intel, VMware and Cisco
❒ Full Layer 2 service provided over IP❍ VLAN id OK❍ VxLAN segments constitute
a broadcast domain
❒ VNC is VxLAN Network Identifier (VNI)❍ 24 bit VxLAN Segment
Identifier
❒ Recommended UDP source port randomized to provide entropy for ECMP routing
[Packet diagram: outer Ethernet header (P-DMAC, P-SMAC, P-VID, Ethertype=0x0800); outer IP header (P-SIP, P-DIP, Protocol=17 UDP); UDP header with Source Port = <Random>, Dest. Port = 4789, UDP Length, UDP Checksum = 0; VxLAN header with a flags byte (I bit set to 1 for a valid VNI, other bits ignored), reserved fields, and a 24-bit VNI; inner Ethernet header (C-DMAC, C-SMAC, C-VID, Ethertype=0x0800); inner IP header (C-SIP, C-DIP, Protocol=<Payload>)]
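A minimal sketch of the 8-byte VxLAN header and UDP port handling summarized above (I flag set for a valid VNI, 24-bit VNI, destination port 4789, and a source port derived from the inner flow for ECMP entropy); again plain struct packing, with no particular packet library assumed:

```python
import struct

VXLAN_UDP_PORT = 4789                   # IANA-assigned destination port

def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VxLAN header: flags byte with the I bit set,
    24 reserved bits, the 24-bit VNI, and 8 reserved bits."""
    assert 0 <= vni < 2**24
    flags = 0x08                        # I bit = 1: the VNI field is valid
    return struct.pack("!B3xI", flags, vni << 8)

def udp_source_port(inner_flow_hash: int) -> int:
    """Derive the outer UDP source port from a hash of the inner headers,
    kept in the ephemeral range, to give ECMP routers per-flow entropy."""
    return 49152 + (inner_flow_hash % 16384)

print(vxlan_header(5001).hex())         # '0800000000138900'
print(udp_source_port(hash(("10.0.0.1", "10.0.0.2", 80, 12345))))
```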
VxLAN Characteristics❒ Problem: IP multicast control plane required by RFC
7348❍ IP multicast address allocated per VNI for determining IP
unicast address to MAC address mapping❍ Multicast routing not widely deployed in data centers
• Most VxLAN deployments use NVA/SDN Controller
❒ Solution: VxLAN just used as an encapsulation format❒ UDP endpoint constitutes a VxLAN Tunnel End Point
(VTEP)❍ Handled at application layer
❒ Path MTU discovery performed by VTEP❒ Multicast handling like NVGRE
❍ Can be handled by using underlay multicast❍ Mostly handled using N-way unicast
VxLAN Data Center Virtualization
[Figure: a Data Center L3 Network with an NVA (e.g. a Software Defined Network controller) providing a centralized control plane; NVEs/VTEPs sit at ToR1, ToR2, and ToR3; the NVA creates the Red tenant network (VNI-1) across the three VTEPs]
L3 Virtual Networks Summary
❒ Despite the challenges with IP overlays, they are widely deployed❍ Workarounds usually exist for the challenges
❒ Software availability❍ Lots of open source software❍ Also proprietary solutions available
❒ Can extend overlay into WAN❍ Between data centers❍ Between enterprise network and data center
❒ Deployments almost exclusively use centralized control❍ NVA implemented using an SDN controller