Challenges of L2 NID Based Architecture for vCPE and NFV Deployment
Challenges of Layer 2 NID Based Architecture For vCPE/NFV Deployments
Santanu Dasgupta Sr. Consulting Engineer – Global Service Provider Network Architecture
BDNOG-3
May, 2015
2 © 2013-2014 Cisco and/or its affiliates. All rights reserved.
Background and Context
“Network Functions” in SP Network Architecture Landscape
[Diagram: the service provider network architecture landscape, spanning access (xDSL, HFC, WiFi, Ethernet CE, NodeB/eNodeB, small cells), metro and core network infrastructure, gateways / service edge (PGW, SGW, GGSN, SGSN, MME, ePDG, BNG, CMTS, PE), subsystems and control (IMS with CSCFs, policy, HLR/HSS, DNS, DHCP, Radius), OSS/BSS (EMS, provisioning, analytics, billing, capacity planning), video, services, and data center infrastructure — each box a candidate "network function".]
Virtualization of “Network Functions”
• Existing hardware/appliance-based network functions (NFs) become virtualized NFs running as VMs on x86 server platforms
  - Step 1: Decouple the software from the underlying hardware
  - Step 2: Port it as a VM/container onto an x86 server platform, running as a network function
[Diagram: hypervisors on data center switching infrastructure hosting vFW, vCPE, vDPI, and vLB VMs.]
NFV, SDN & Orchestration Together (partial list; just a few main ones are mentioned here)
[Diagram: an orchestration and SDN control function managing NAT, firewall, and DPI VNFs on hypervisors across servers 1–3, over an Ethernet switching network underlay with shared storage.]
The orchestration and SDN control function covers:
• End-to-end VM/VNF lifecycle management
• Network plumbing to orchestrate dynamic topologies
• Configuration management of the VNFs
• Integration with other DCs/PODs and the WAN
• OAM, assurance, analytics
• Standard APIs
Traditional Managed CPE with IP/MPLS L3VPN
[Diagram: branches with CE routers connecting over Carrier Ethernet backhaul to PEs across an MPLS core; MP-BGP in the core, static/IGP/eBGP for PE-CE routing.]
• There are multiple genuine and perceived issues in the traditional service delivery model:
  - CPE provisioning and servicing often require a truck roll (sending engineers) → high OPEX
  - The number of features enabled on the on-premises CPE makes the solution complex to operate
  - Service delivery is not agile and lacks automation; service turn-up and changes take a lot of time
  - On-site CPEs are often expensive and not open platforms
• The industry expects something more open, agile, fully automated, and flexible enough to address different market segments, which can help operators reduce their TCO
• This is where the discussion of an on-premises L2 NID + vCPE architecture for business VPN started in the industry, almost two years ago
What is an "L2 NID"?
• Layer 2 NID = Layer 2 Network Interface Device (for example, the Cisco ME 1200)
• Some call it a Layer 2 Network Termination Device (NTD) → we will call it an L2 NID in this presentation
• The L2 NID is the device a Carrier Ethernet operator drops at the customer premises to terminate the Ethernet last mile
  - It is managed by the operator
• It has user-facing interfaces (UNI) and network-facing interfaces (NNI), typically all Ethernet
  - It marks the demarcation point between the operator's and the customer's networks
• The L2 NID is typically a 4- to 6-port FE/GE/10GE L2 switch with additional capabilities, such as:
  - Ethernet OAM (CFM, Y.1731) for fault and performance management, service activation testing (Y.1564), timing support (mobile backhaul), etc.
[Diagram: an L2 NID at the customer premises connecting through the Carrier Ethernet aggregation (AGG) and NPE layers to a PE on the MPLS core; the NID marks the demarcation between the customer premises and the typical Carrier Ethernet operation domain.]
L2 NID + vCPE Architecture Proposal for Managed CPE/VPN
• No need to have a Layer 3 CPE at the customer premises anymore
• Virtualize the L3 CPE and place it at the SP's POP or cloud/NFV data center
• Simplify the branch to a single device, with complex features running in the SP's cloud, making it easier to operate → may also help reduce cost
[Diagram (animated): a single L2 NID at the customer premises; Layer 2 backhaul of branch traffic over the Carrier Ethernet network (AGG/NPE) to redundant vCPEs, which run PE-CE routing with the PE on the MPLS core. The L3 CPE is virtualized, with complex features running at the SP's premises, leaving a very simple on-premises CPE.]
Why Did the L2 NID Based Alternative Look Promising?
[Diagram: the traditional model with two devices at the customer premises (L2 NID + CPE) running PE-CE routing over the Carrier Ethernet transport to the PE, versus the L2 NID + vCPE model with one device at the customer premises and Layer 2 backhaul of branch traffic to the vCPE behind the PE.]
Reducing customer-premises devices from two to one promised to reduce cost and complexity.
This Would Enable Agile Service Creation too
[Diagram: a single L2 NID backhauled over Carrier Ethernet (AGG/NPE) to an NFV DC hosting the vCPE, vFW, and vDPI under NFV and cloud orchestration, connected via PEs over the MPLS core to a cloud DC (SP or 3rd party) delivering IaaS/PaaS/SaaS.]
• NFV and orchestration also enable agile service creation and turn-up
• The vCPE can be chained with a rich set of NFV, cloud IaaS, PaaS, and SaaS services (SP-hosted or 3rd-party)
• This is true irrespective of the on-premises CPE type, be it an L2 NID, an L3 CPE, or anything else
Challenges with the L2 NID Based Architecture, Where There Is No L3 CPE at the Branch
MAC Address Scale Issues for the NFV DC
[Diagram: many L2 NIDs, each fronting roughly 250 MAC addresses, backhauled over Carrier Ethernet (AGG/NPE) as an L2 trunk/QinQ into the NFV DC's TOR switches hosting the vCPEs, with L3 links toward the L3VPN PE. 4000 customers at 250 MACs each means 1 million MAC addresses.]
• NFV DCs are built with a network switching underlay → servers aren't directly connected to the NPE
• With Layer 2 backhaul of traffic from customer branches, the NFV DC switching layer will learn all customer MAC addresses
• An example site with 4000 customer sites and 250 MAC addresses per site means 1 million MACs
  - The switching underlay / TOR switches now need to support and learn 1 million MAC addresses
  - This impacts network cost, service scale, and convergence time upon failure, due to the large table size
• This can be technically solved with an end-to-end overlay (such as GRE or an MPLS pseudowire) from branch to vCPE, or from NPE to vCPE → defeating the original simplicity of the proposed architecture to a major extent
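The scaling arithmetic above is simple to sketch. A minimal model (the per-site MAC count and the TOR table capacity below are illustrative assumptions from the slide, not vendor figures):

```python
def required_mac_entries(sites: int, macs_per_site: int) -> int:
    """Total MAC addresses the NFV DC underlay must learn when
    customer L2 domains are backhauled without an overlay."""
    return sites * macs_per_site

# The slide's example: 4000 customer sites at 250 MACs each.
total = required_mac_entries(4000, 250)
print(total)  # 1000000

# Compare against an assumed TOR MAC table size (illustrative only).
ASSUMED_TOR_TABLE_SIZE = 128_000
print(total > ASSUMED_TOR_TABLE_SIZE)  # True: the table overflows badly
```

The point of the sketch is that the requirement grows linearly with both site count and per-site device count, so it quickly exceeds any fixed hardware table.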
Security Exposure Due to Extension of the Broadcast Domain
[Diagram: multiple L2 NIDs whose customer L2 domains extend through the Carrier Ethernet network (AGG/NPE) into the NFV DC's TOR switches and vCPEs — the customer's L2 domain is extended to the NFV DC delivering the vCPE.]
• With no L3 CPE at the branch and the vCPE at the NFV DC, the customer's Layer 2 domain is extended all the way to the NFV DC
• For a POP with 4000 customers, that means 4000 Layer 2 domains reaching the NFV DC → the SP typically has no control over what assets sit at these branches or how secure they are
• This poses a significant security risk to the SP's infrastructure from DDoS and other attacks
Potential Risks Due to Layer 2 Loops
[Diagram: dual-homed L2 NIDs connecting customer LANs to the Carrier Ethernet network; the customer LAN's STP domain should end at the L2 NID, where the operator's domain starts.]
• In an L2 NID-only architecture, it is critical to demarcate the customer's STP domain at the L2 NID
• In dual-homing situations, the L2 NID may have to participate in the customer's spanning-tree domain, and may also require some form of loop-prevention mechanism on the NNI side
• Such dual-homed connectivity requirements pose risks: operational errors may let Layer 2 loops originating from a customer branch impact the SP infrastructure
High Availability Design Challenges
[Diagram: an L2 NID backhauled over Carrier Ethernet (AGG/NPE) and the MPLS core to two vCPEs (vCPE1 and vCPE2), with PE-CE routing toward the L3VPN PE and Layer 2 backhaul of branch traffic to the vCPEs.]
• When two vCPEs are deployed for HA, the two vCPEs need to run HSRP/VRRP (let's consider VRRP)
• There are multiple ways to carry the VRRP traffic between the two vCPEs, each with a different level of complexity and a different degree of reliability
• Tracking the L2 NID ↔ vCPE connectivity becomes key for reliable bidirectional packet forwarding
High Availability Design Challenges: VRRP on Directly Connected Links
[Diagram: the same topology, with a VRRP segment running on a directly connected link between vCPE1 and vCPE2.]
• Simplest way to run VRRP → but it requires an L2 segment between the two NFV DCs (typically two sites)
• Less reliable, since VRRP operation is blind to connectivity failures between the vCPE and the L2 NID
• If vCPE1 is active on the VRRP segment and vCPE1's connectivity to the L2 NID fails, vCPE1 may remain active → causing a service outage
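The failure mode can be sketched as a toy model (the names, priorities, and decrement value are illustrative; real VRRP elects the master by advertised priority, and object tracking lowers that priority when a monitored path fails):

```python
from dataclasses import dataclass

@dataclass
class Vcpe:
    name: str
    priority: int          # VRRP priority; highest wins the election
    nid_link_up: bool      # connectivity toward the L2 NID
    tracks_nid_link: bool  # whether tracking decrements the priority

    def effective_priority(self) -> int:
        # Object tracking: decrement the priority when the tracked
        # downstream path is lost, so the peer can take over.
        if self.tracks_nid_link and not self.nid_link_up:
            return self.priority - 50
        return self.priority

def vrrp_master(a: Vcpe, b: Vcpe) -> Vcpe:
    """Simplified election: higher effective priority wins."""
    return a if a.effective_priority() >= b.effective_priority() else b

# vCPE1 has lost its path to the L2 NID, but VRRP is blind to it.
v1 = Vcpe("vCPE1", priority=120, nid_link_up=False, tracks_nid_link=False)
v2 = Vcpe("vCPE2", priority=100, nid_link_up=True, tracks_nid_link=False)
print(vrrp_master(v1, v2).name)  # vCPE1 stays master -> service outage

# With tracking of the downstream path, the failure demotes vCPE1.
v1.tracks_nid_link = True
print(vrrp_master(v1, v2).name)  # vCPE2
```

This is why the slide calls connectivity tracking "key": without it, the VRRP election can keep a blackholing node as master.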
High Availability Design Challenges: VRRP via the Carrier Ethernet Network
[Diagram: VRRP sessions carried between vCPE1 and vCPE2 across the Carrier Ethernet network via NPE1 and NPE2, with potential alternate paths through the AGG layer.]
• To address the reliability issue, VRRP may be carried across the Carrier Ethernet network:
  - vCPE1 – NPE1 – NPE2 – vCPE2 → needs an L2 path; an FRR-enabled explicit path between the NPEs to force the path
  - vCPE1 – NPE1 – AGG … – AGG – NPE2 – vCPE2
• More reliable now, since VRRP traffic will be dropped if the L2 NID-to-vCPE connectivity fails, but:
  - The Carrier Ethernet network must ensure VRRP packets aren't dropped during congestion, which would trigger false failover
  - VRRP timers may not be very aggressive; the Carrier Ethernet network must ensure minimal delay
• This is more complex to provision and operate
High Availability Design Challenges: VRRP via the Carrier Ethernet Network + IEEE 802.1ag (CFM)
[Diagram: 802.1ag CFM sessions running end to end from the L2 NID to vCPE1 and vCPE2, alongside the VRRP sessions across the Carrier Ethernet network.]
• The previous solution is still not end to end: it does not cover the AGG-to-L2-NID connectivity
  - For end-to-end reliable operation, that is a key requirement
• One way to address this challenge is to use VRRP and CFM (802.1ag) together
  - CFM runs end to end from the L2 NID to the vCPE. When the CFM session expires due to any failure on the path, the vCPE interface's line protocol goes "down" → VRRP traffic can no longer leave the interface → VRRP switchover to the standby takes place
• This solves the HA issue, but brings a lot of complexity back into the network
  - We were trying to remove complexity by removing the L3 CPE from the branch → and have now introduced complexity of a different kind
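The CFM-to-VRRP chain above can be sketched as a toy event model (the CCM interval, the three-miss loss threshold, and the state names are illustrative simplifications of 802.1ag continuity checking and VRRP master-down detection, not a device implementation):

```python
def ccm_session_alive(last_ccm_rx: float, now: float,
                      interval: float, loss_threshold: int = 3) -> bool:
    """802.1ag-style rule of thumb: declare the remote MEP down after
    missing roughly `loss_threshold` consecutive CCMs."""
    return (now - last_ccm_rx) < loss_threshold * interval

def line_protocol(cfm_alive: bool) -> str:
    # CFM fault action: take the interface's line protocol down.
    return "up" if cfm_alive else "down"

def vrrp_state(is_master: bool, line: str) -> str:
    # With the interface down, the master stops sending VRRP
    # advertisements, so the standby will take over.
    if line == "down":
        return "init"
    return "master" if is_master else "backup"

# CCMs every 100 ms; the last one was seen a full second ago.
alive = ccm_session_alive(last_ccm_rx=0.0, now=1.0, interval=0.1)
print(alive)                 # False: CFM session expired
print(line_protocol(alive))  # down
print(vrrp_state(True, line_protocol(alive)))  # init -> standby becomes master
```

The sketch shows the causal chain the slide describes: CFM expiry forces the interface down, which silences the master and triggers the VRRP switchover.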
High Availability Design Challenges: Upstream Routing from vCPE to the L3VPN PE
[Diagram: as before, with PE-CE routing and conditional advertisement depending on vCPE-to-L2-NID connectivity and VRRP status (typically requires eBGP).]
• The vCPEs need to run PE-CE routing with eBGP, an IGP, or static routing
• The vCPEs now need to perform conditional route advertisement toward the L3VPN PEs, depending on vCPE-to-L2-NID reachability and VRRP status
• If vCPE1 is the preferred path for downstream traffic but has lost connectivity to the L2 NID, vCPE1 needs to make the L3VPN PE aware by advertising its routes with less-preferred attributes than vCPE2's
• This typically restricts the PE-CE routing protocol to eBGP and may add more complexity to the design
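The conditional-advertisement logic can be sketched as a toy eBGP model, using AS-path prepending as the depreference knob (the ASN, prepend count, and shortest-path-wins decision are illustrative simplifications of the full BGP best-path algorithm):

```python
def advertised_as_path(base_path: list, nid_reachable: bool,
                       prepends: int = 3) -> list:
    """eBGP-style depreference: when the vCPE loses its path to the
    L2 NID, it prepends its own ASN so the PE prefers the peer vCPE."""
    if nid_reachable:
        return base_path
    return [base_path[0]] * prepends + base_path

def pe_best_path(candidates: dict) -> str:
    # Simplified BGP decision: shortest AS path wins.
    return min(candidates, key=lambda k: len(candidates[k]))

# vCPE1 (ASN 65001) has lost the NID; vCPE2 still reaches it.
paths = {
    "vCPE1": advertised_as_path([65001], nid_reachable=False),
    "vCPE2": advertised_as_path([65001], nid_reachable=True),
}
print(pe_best_path(paths))  # vCPE2
```

In a real design the same effect could be achieved with MED or local preference on the PE side; the point is that the vCPE's advertisement must be conditioned on downstream reachability, which is what pushes the design toward eBGP.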
Lack of L3 Capability at the Branch Will Limit Available Services
[Diagram: a branch with only an L2 NID, backhauled over Carrier Ethernet (AGG/NPE) and the MPLS core to redundant vCPEs running VRRP behind the PE.]
• Many customers may require capabilities at the branch that need a Layer 3 device, such as IPsec VPN or WAN acceleration
• Many service providers are looking to use 3G or 4G LTE as backup connectivity
  - Typical L2 NIDs do not have those interfaces → forcing another CPE for the backup
• If hierarchical and granular QoS is a requirement at the branch, this could be challenging with an L2 NID too
Conclusion of L2 NID + vCPE Architecture
• We were attempting to simplify the managed CPE/VPN architecture by removing the L3 CPE from the branch
• But in the process, complexities of other kinds were injected back into the network:
  - MAC address scaling issues impacting service scale, convergence, and cost
  - Security exposure of the vCPE/NFV DC and the SP infrastructure due to the extension of customer L2 domains
  - Possible Layer 2 loops due to operational errors
  - Complex design requirements to satisfy high availability → more difficult to operate
• It may create further limitations in service availability at the branch:
  - Layer 3 services such as IPsec and WAN acceleration are no longer possible from the branch
  - 3G/4G LTE backup on the same device is not available
  - Hierarchical and granular QoS from the branch is limited
So What May We Do to Approach the Problem? This Is My Recommendation ☺
[Diagram: an L3 NID at the branch, backhauled over Carrier Ethernet (AGG/NPE) and the MPLS core to an NFV DC hosting vFW, vDPI, and an optional vCPE under NFV and cloud orchestration, with a cloud DC (SP or 3rd party) delivering IaaS/PaaS/SaaS.]
• Keep an L3 CPE at the branch, physical or virtual → call it an L3 NID, an L3 CPE, or whatever you like
  - We need to demarcate the customer's L2 network at the branch itself; that is how networks have scaled
  - This helps avoid the MAC address scale, security, and L2 loop issues, and also avoids the additional VRRP design issues
  - Make the L3 CPE at the branch Zero Touch Provisioning (ZTP) capable, to achieve automation and agility
• If required, simplify the L3 CPE at the branch by reducing the footprint of enabled features
  - Provision complex CPE features in the NFV DC (which may include advanced routing on a vCPE)
  - Have the ability to service-chain the vCPE with a rich set of other functions using the NFV orchestration system → make it agile!
[Diagram notes: the L3 NID / L3 CPE runs simple routing (static/IGP) with the vCPE, or PE-CE routing from the branch, and demarcates the customer's L2 network at the branch. The vCPE may not be required most of the time.]
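A ZTP flow of the kind recommended above can be sketched abstractly. This toy model follows the common DHCP-based bootstrap pattern (option 67 carrying a bootstrap server, per-device config derived from the serial number); the exact mechanism is vendor-specific, and the option usage, URL layout, and serial shown here are illustrative assumptions:

```python
def ztp_bootstrap(dhcp_options: dict, serial: str) -> dict:
    """Toy model of DHCP-based zero-touch provisioning: the device
    learns a config server from DHCP and derives its own config
    location from its serial number, so no truck roll is needed."""
    server = dhcp_options.get(67) or dhcp_options.get("tftp-server")
    if server is None:
        # No provisioning info yet: keep retrying DHCP discovery.
        return {"state": "retry"}
    return {
        "state": "provisioning",
        "config_url": f"{server}/configs/{serial}.cfg",
    }

# Example: option 67 carries the bootstrap server URL.
result = ztp_bootstrap({67: "http://ztp.example.net"}, serial="FDO1234X0AB")
print(result["config_url"])  # http://ztp.example.net/configs/FDO1234X0AB.cfg
```

The design point is that automation lives in the provisioning system rather than in on-site manual configuration, which is what lets a branch L3 CPE stay operationally as simple as an L2 NID.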
Thank you.
NFV – How to Build / Augment Operations Skill Sets
• Most existing technologies, protocols, and associated skills are still equally required
• On top of that, new skills need to be acquired:
  - x86 server virtualization
  - Virtualization in Linux (KVM/QEMU) environments
  - Cloud orchestration systems, such as OpenStack
  - Virtual switches: OVS, Netmap/VALE, Snabbswitch, vendor-specific, etc.
  - SDN controllers: OpenDaylight, vendor-specific
  - Device programmability and APIs: NETCONF, YANG, RESTCONF, REST APIs, OpenFlow, …
  - Service function chaining, especially NSH (Network Service Header)
  - Network-based virtual overlay transport: VXLAN, MPLSoGRE/UDP, LISP, L2TPv3, …
  - Automation tools: Puppet, Chef, etc.
  - Management, orchestration, and OSS fundamentals
  - …