Multi-site OSP - OSS Boston · MULTI-SITE OPENSTACK DEPLOYMENT OPTIONS & CHALLENGES FOR TELCOS...
MULTI-SITE OPENSTACK DEPLOYMENT OPTIONS & CHALLENGES
FOR TELCOS
Azhar Sayeed Chief Architect
DISCLAIMER: Important Information
The information described in this slide set does not provide any commitments to roadmaps or availability of products or features. Its intention is purely to provide clarity in describing the problem and drive a discussion that can then be used to drive open source communities. Red Hat Product Management owns the roadmap and supportability conversation for any Red Hat product.
AGENDA
• Background: OpenStack Architecture
• Telco Deployment Use Case
• Distributed deployment – requirements
• Multi-Site Architecture
• Challenges
• Solution and Further Study
• Conclusions
OPENSTACK ARCHITECTURE
WHY MULTI-SITE FOR TELCO?
• Compute requirements – not just at the Data Center
• Multiple Data Centers
• Managed Service Offering
• Managed Branch Office
• Thick vCPE
• Mobile Edge Compute
• vRAN – vBBU locations
• Virtualized Central Offices
• Hundreds to thousands of locations
• Primary and Backup Data Center – Disaster recovery
• IoT Gateways – Fog computing
Centrally managed compute closer to the user
Multiple DC or Central Offices
Security & Firewall Quality of Service (QoS) Traffic Shaping Device Management
Main Data Center
Overlay Tunnel over Internet
E2E Orchestrator
Remote Sites
• Hierarchical connectivity model of CO
• Remote sites with compute requirements
• Extend OpenStack to these sites
Independent OpenStack Deployments
Backup Data Center
Remote Data Centers
A typical service almost always spans across multiple DCs
Multiple DCs – NFV Deployment
L2 or L3 Extensions between DCs
Real Customer Requirements
Fully Redundant System
Controllers Storage Nodes Compute Nodes
Region 1
Region 2 … 25
• 25 sites
• 2-5 VNFs required at each site
• Maximum of 2 compute nodes per site needed for these VNFs
• Storage requirements = image storage only
• Total number of control nodes = 25 * 3 = 75
• Total number of storage nodes = 25 * 3 = 75
• Total number of compute nodes = 25 * 2 = 50
Redundant Configuration Overhead: 75%
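The slide does not show how the 75% figure is derived. One reading that matches its numbers is that 75% is the share of all nodes that exist only for the redundant control and storage planes: (75 + 75) / (75 + 75 + 50) = 150 / 200. A minimal Python sketch of that arithmetic, assuming the per-site profile from the slide (3 controllers, 3 storage nodes, 2 compute nodes); `SiteProfile` and `redundancy_overhead` are illustrative names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteProfile:
    """Nodes deployed at every site (values in the example follow the slide)."""

    controllers: int  # control nodes per site (25 * 3 = 75 on the slide)
    storage: int      # storage nodes per site (image storage only on the slide)
    compute: int      # compute nodes per site that actually host the VNFs


def redundancy_overhead(sites: int, profile: SiteProfile) -> tuple[int, int, int, float]:
    """Return total controller, storage and compute nodes, plus the share of all
    nodes that are not compute nodes (the reading of 'overhead' assumed here)."""
    controllers = sites * profile.controllers
    storage = sites * profile.storage
    compute = sites * profile.compute
    total = controllers + storage + compute
    return controllers, storage, compute, (controllers + storage) / total


if __name__ == "__main__":
    c, s, n, overhead = redundancy_overhead(25, SiteProfile(controllers=3, storage=3, compute=2))
    print(f"controllers={c} storage={s} compute={n} overhead={overhead:.0%}")
    # prints: controllers=75 storage=75 compute=50 overhead=75%
```

Under this reading the ratio depends only on the per-site profile, so it stays at 75% whether there are 25 sites or 1000; only the absolute node counts grow.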
Virtual Central Office
L2 or L3 Extensions between DCs
Real Customer Challenge
Fully Redundant System
Controllers Storage Nodes Compute Nodes
Region 1, Region 2 … 1000+
• 1000+ sites – Central Offices
• From a few 10s to 100s of VMs
• Fully redundant configurations
• Termination of residential, business and mobile services
• Managing 1000 OpenStack islands
• Tier 1 telcos already have >100 sites today
Management Challenge
DEPLOYMENT OPTIONS
OPTIONS
• Multiple independent island model – seen this already
• Common authentication and management
– External user policy management with LDAP integration
– Common Keystone
• Stretched deployment model
– Extend compute and storage nodes into other data centers
– Keep central control of all remote resources
• Allow data centers to share workloads – Tricircle approach
• Proxy the APIs – master/slave model or cascading model
• Agent-based model
• Something else??
Multiple DC or Central Offices
L2 or L3 Extensions between DCs
Feed the load balancer
• Site capacity independent of the other
• User information separate or replicated offline
• Load balancer directs traffic where to go – good for load sharing
• DR – external problem
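In the independent-island model each site is a separate OpenStack, so the load balancer only has to know which sites are reachable and how much spare capacity each has. A minimal Python sketch of that decision, assuming a hypothetical `Site` record with a capacity and current usage and a "most spare capacity wins" policy (one of many possible policies):

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical view of one independent OpenStack site, as seen by the load balancer."""

    name: str
    capacity: int   # workloads the site can accept (sized independently of other sites)
    in_use: int     # workloads currently placed there
    healthy: bool = True


def pick_site(sites: list[Site]) -> Site:
    """Send the next request to the healthy site with the most spare capacity."""
    candidates = [s for s in sites if s.healthy and s.in_use < s.capacity]
    if not candidates:
        # Disaster recovery is outside this model ("DR - external problem").
        raise RuntimeError("no healthy site with spare capacity")
    return max(candidates, key=lambda s: s.capacity - s.in_use)


if __name__ == "__main__":
    sites = [Site("Region1", capacity=100, in_use=90), Site("Region2", capacity=50, in_use=10)]
    print(pick_site(sites).name)  # Region2: 40 free slots versus 10
    sites[1].healthy = False
    print(pick_site(sites).name)  # Region1: the only healthy site left
```

Nothing here is shared between sites, which is what keeps the model simple, and also why it stops being attractive once there are hundreds of islands to manage.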
Independent OpenStack Deployments
LB
Fully Redundant System
Controllers Storage Nodes Compute Nodes
Cloud Management Platform
Region 1
Region 2 … N
Good for a few 10s of sites – what about 100s or thousands of sites?
Directory
Extended OpenStack Model
L2 or L3 Extensions between DCs
Common or Shared Keystone
• Single Keystone for authentication
• User information in one location
• Independent resources
• Modify the Keystone endpoint table
– Endpoint, Service, Region, IP
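With a shared Keystone, bringing a site online largely means adding rows to the endpoint table: each row ties a service and a region to a URL carrying that site's IP. A minimal Python model of such a table, with hypothetical region names and addresses; in a real deployment the rows are created through the Keystone identity API (for example with `openstack endpoint create`), not by code like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One row of a Keystone-style service catalog (illustrative model, not the Keystone API)."""

    service: str    # service type, e.g. "compute" or "identity"
    region: str     # region the endpoint belongs to
    interface: str  # "public", "internal" or "admin"
    url: str        # URL carrying the site's IP address or hostname


class Catalog:
    """A single shared catalog: one identity service, per-region service endpoints."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str, str], Endpoint] = {}

    def register(self, ep: Endpoint) -> None:
        key = (ep.service, ep.region, ep.interface)
        if key in self._rows:
            raise ValueError(f"duplicate endpoint for {key}")
        self._rows[key] = ep

    def lookup(self, service: str, region: str, interface: str = "public") -> str:
        return self._rows[(service, region, interface)].url

    def regions(self) -> list[str]:
        return sorted({region for _, region, _ in self._rows})


if __name__ == "__main__":
    catalog = Catalog()
    # Hypothetical addresses: Keystone lives in Region1, each region has its own Nova.
    catalog.register(Endpoint("identity", "Region1", "public", "http://10.0.1.5:5000/v3"))
    catalog.register(Endpoint("compute", "Region1", "public", "http://10.0.1.10:8774/v2.1"))
    catalog.register(Endpoint("compute", "Region2", "public", "http://10.0.2.10:8774/v2.1"))
    print(catalog.lookup("compute", "Region2"))  # http://10.0.2.10:8774/v2.1
    print(catalog.regions())                     # ['Region1', 'Region2']
```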
Shared Keystone Deployment
Fully Redundant System
Controllers Storage Nodes Compute Nodes
Cloud Management Platform
Region 1
Region 2 … N
Keystone
Identity: Keystone – single point of control
Directory
Extended OpenStack Model
L2 or L3 Extensions between DCs
Central Controller
• Single authentication
• Distributed compute resources
• Single availability zone per region
Central Controller and Remote Compute & Storage (HCI) Nodes
Fully Redundant System
Controllers Storage Nodes Compute Nodes
Cloud Management Platform
Region 1, Region 2 … N
Replicated Storage – Galera Cluster
Cinder, Glance and Image
Manual Restore
Directory
Revisiting the Branch Office - Thick CPE
Enterprise vCPE x86 Server with VNFs
Data Center
Internet
Enterprise vCPE
NFVI
Security & Firewall Quality of Service (QoS) Traffic Shaping Device Management
OpenStack, OpenShift/Kubernetes
Can we deploy compute nodes at all the branch sites and centrally control them?
IPSec, MPLS or Other Tunnel mechanism
E2E Network Orchestrator
Deploy Nova Compute
How do I scale it to thousands of sites?
OSP 10 – Scale components independently
Most OpenStack HA services and VIPs must be launched and managed by Pacemaker or HAProxy. However, some can be managed via systemctl, thanks to the simplification of Pacemaker constraints introduced in versions 9 and 10.
COMPOSABLE SERVICES AND CUSTOM ROLES
• Leverage composable services model – to define a Central Keystone
– Place functionality where it is needed – i.e. dis-aggregate
• Deployable standalone on separate nodes or combined with other services into Custom Role(s).
– Distribute the functionality depending on the DC locations
[Diagram: the hardcoded Controller Role, which bundles Keystone, Ceilometer, Neutron, RabbitMQ, Glance and other services, is decomposed into custom roles (Custom Controller Role, Custom Ceilometer Role, Custom Networker Role, ...) that carry the same services]
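Splitting the controller into custom roles is essentially a placement exercise: each service from the monolithic role must land in at least one custom role, and each role is pinned to a data center. A minimal Python sketch of that bookkeeping, assuming a hypothetical role split and hypothetical location names; service names follow the slide, whereas real TripleO role definitions in `roles_data.yaml` list services such as `OS::TripleO::Services::Keystone`:

```python
# Services carried by the hardcoded Controller role (names as on the slide).
CONTROLLER_SERVICES = ["Keystone", "Ceilometer", "Neutron", "RabbitMQ", "Glance"]

# Hypothetical split into custom roles, including a central Keystone role.
CUSTOM_ROLES: dict[str, list[str]] = {
    "CentralKeystone": ["Keystone"],
    "CustomController": ["Glance", "RabbitMQ"],
    "CustomCeilometer": ["Ceilometer"],
    "CustomNetworker": ["Neutron"],
}

# Hypothetical data center where each custom role is deployed.
ROLE_LOCATION: dict[str, str] = {
    "CentralKeystone": "main-dc",
    "CustomController": "main-dc",
    "CustomCeilometer": "main-dc",
    "CustomNetworker": "remote-dc",
}


def place_services(
    required: list[str], roles: dict[str, list[str]], locations: dict[str, str]
) -> dict[str, set[str]]:
    """Map every required service to the data centers that run it.

    Raises ValueError if the split dropped a service. A service may appear in
    several roles, which is why each value is a set of locations.
    """
    placement: dict[str, set[str]] = {}
    for role, services in roles.items():
        for svc in services:
            placement.setdefault(svc, set()).add(locations[role])
    missing = [svc for svc in required if svc not in placement]
    if missing:
        raise ValueError(f"services not assigned to any role: {missing}")
    return placement


if __name__ == "__main__":
    for svc, dcs in sorted(place_services(CONTROLLER_SERVICES, CUSTOM_ROLES, ROLE_LOCATION).items()):
        print(f"{svc:<10} -> {', '.join(sorted(dcs))}")
```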
Re-visiting the Virtual Central Office use case
L2 or L3 Extensions between DCs
Real Customer Challenge
Fully Redundant System
Controllers Storage Nodes Compute Nodes
Region 1
Require flexibility and some hierarchy
Region 2
Region 3
Region 4
Region 3b, Region 3a
Scaling across a thousand sites?
CONSIDERATIONS
• Some areas that we need to look at
• Latency and outage times
– Delays due to distance between DCs and link speeds (RTT)
– The remote site is lost – headless operations and subsequent recovery
• Startup storms
• Scaling Oslo messaging
– RabbitMQ
– Scaling of nodes => scale RabbitMQ/messaging
– Ceilometer (Gnocchi & Aodh) – heavy user of MQ
LATENCY AND OUTAGE TIMES
• Latency between sites – Nova API calls
• 10, 50, 100 ms? Round trip time = queue tuning
• Bottleneck link/node speed
• Outage time – recovery time
• 30 s or more?
• Nova compute services flapping
• Confirmation – from provisioning to operation
• Neutron timeouts – binding issues
• Headless operation
• Restart – causes storms
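Whether a WAN outage of 30 s or more makes Nova compute services flap depends on how often each nova-compute reports in and how long the controller waits before declaring it down; Nova exposes these as the `report_interval` (default 10 s) and `service_down_time` (default 60 s) options. A minimal Python model, assuming heartbeats are sent at multiples of `report_interval` and arrive instantly whenever the link is up; in practice they travel over the message queue, so inter-site RTT and queue delays eat into the margin:

```python
def heartbeat_gap(outage_start: int, outage_len: int, report_interval: int = 10) -> int:
    """Longest gap (seconds) between heartbeats reaching the controller when the
    WAN link is down for the half-open interval [outage_start, outage_start + outage_len).

    Assumes heartbeats are sent every `report_interval` seconds from t=0, are lost
    while the link is down and arrive instantly while it is up.
    """
    if outage_start <= 0:
        raise ValueError("model expects at least one heartbeat before the outage")
    outage_end = outage_start + outage_len
    # Last heartbeat sent strictly before the outage begins.
    last_before = ((outage_start - 1) // report_interval) * report_interval
    # First heartbeat sent at or after the link comes back (ceiling division).
    first_after = -(-outage_end // report_interval) * report_interval
    return first_after - last_before


def reported_down(
    outage_start: int, outage_len: int, report_interval: int = 10, service_down_time: int = 60
) -> bool:
    """True if the controller would see the compute service as down at some point."""
    return heartbeat_gap(outage_start, outage_len, report_interval) > service_down_time


if __name__ == "__main__":
    for length in (30, 55, 60):
        print(length, heartbeat_gap(95, length), reported_down(95, length))
```

With the defaults, a 30 s outage leaves a 40 s heartbeat gap and stays below the 60 s threshold, while a 60 s outage produces a 70 s gap and the service is reported down; raising `service_down_time` trades faster failure detection for fewer false alarms.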
RABBITMQ TUNING
• Tune the buffers – increase buffer size
• Take into account messages in flight – rates and round trip times
– BDP = Bottleneck speed * RTT
• Number of messages
– Servers * backends * requests/sec = Number of messages/sec
• Split into multiple instances of message queues for distributed deployment
– Ceilometer into an MQ – heaviest user of MQ
– Nova into a single MQ
– Neutron into an MQ
• Refer to an interesting presentation on this topic – "Tuning RabbitMQ at Large Scale Cloud" – OpenStack Summit – Austin 2016
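The two rules of thumb above turn into a quick sizing calculation. A minimal Python sketch with illustrative inputs only (a 1 Gbit/s bottleneck, 50 ms RTT, and hypothetical values for servers, backends and per-backend request rate); the in-flight estimate is simply the message rate multiplied by the RTT:

```python
def bdp_bytes(bottleneck_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that can be in flight on the bottleneck link."""
    return bottleneck_bits_per_sec * rtt_ms // 1000 // 8


def messages_per_sec(servers: int, backends: int, requests_per_sec: int) -> int:
    """Servers * backends * requests/sec = number of messages/sec (rule from the slide)."""
    return servers * backends * requests_per_sec


def messages_in_flight(msgs_per_sec: int, rtt_ms: int) -> int:
    """Messages outstanding during one round trip (rate * RTT), rounded up."""
    return -((-msgs_per_sec * rtt_ms) // 1000)


if __name__ == "__main__":
    print(bdp_bytes(1_000_000_000, 50))   # 6250000 bytes (about 6 MB) per connection
    rate = messages_per_sec(servers=1000, backends=3, requests_per_sec=2)
    print(rate)                           # 6000 messages/sec
    print(messages_in_flight(rate, 50))   # 300 messages in flight at 50 ms RTT
```

Buffers sized below the BDP cap throughput on long links regardless of broker capacity, which is why RTT shows up in both formulas.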
[Diagram: three separate MQ instances serving Nova Conductor and Compute, the Ceilometer collector and Ceilometer agents, and Neutron]
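Splitting the message queue is a partitioning exercise: each OpenStack service points its oslo.messaging `transport_url` at its own broker, so no single broker carries everyone's traffic. A minimal Python sketch of the bookkeeping, using illustrative per-service rates (not measurements) and a hypothetical `mq-<service>` naming scheme:

```python
from collections import defaultdict


def partition_brokers(rates: dict[str, int], dedicated: set[str]) -> dict[str, int]:
    """Give each service in `dedicated` its own broker and put the rest on a shared one.

    Returns the message rate (msgs/sec) each broker has to carry.
    """
    load: dict[str, int] = defaultdict(int)
    for service, rate in rates.items():
        broker = f"mq-{service}" if service in dedicated else "mq-shared"
        load[broker] += rate
    return dict(load)


if __name__ == "__main__":
    # Illustrative rates only; Ceilometer is given the largest share, as the slide suggests.
    rates = {"ceilometer": 5000, "nova": 1200, "neutron": 800, "cinder": 100}
    print("single broker:", sum(rates.values()), "msgs/sec")
    for broker, load in partition_brokers(rates, {"ceilometer", "nova", "neutron"}).items():
        print(f"{broker}: {load} msgs/sec")
```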
RECENT AMQP ENHANCEMENTS
• Eliminates the broker-based model
• Enhances AMQP 1.0
– Separate messaging endpoint from message routers
• Newton has an AMQP driver for oslo.messaging
• Ocata provides performance tuning and upstream support for TripleO
• If you must use RabbitMQ
– Use clustering and exchange configurations
– Use the shovel plugin with exchange configurations and multiple instances
[Diagram: multiple brokers connected in Hierarchical-Tree and Mesh-Routed topologies]
OPENSTACK CASCADING PROJECT
[Diagram: a parent OpenStack exposing availability zones AZ1 … AZn, above several child OpenStack instances]
Proxy for Nova, Cinder, Ceilometer & Neutron subsystems per site
At parent – loads of proxies, one set per child
User communicates to the master
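The cost of the cascading model shows up in the proxy count: the parent keeps one set of Nova, Cinder, Ceilometer and Neutron proxies per child, so the number of proxies grows linearly with the number of sites. A minimal Python sketch, assuming hypothetical child names and URLs and treating each child as one availability zone of the parent:

```python
from dataclasses import dataclass, field

# Subsystems the parent proxies for every child site (from the slide).
SUBSYSTEMS = ("nova", "cinder", "ceilometer", "neutron")


@dataclass
class Parent:
    """Hypothetical parent (master) OpenStack in the cascading model."""

    # child (availability zone) name -> {subsystem -> proxy target URL}
    proxies: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_child(self, child: str, base_url: str) -> None:
        """Create one proxy per subsystem for a new child site."""
        self.proxies[child] = {sub: f"{base_url}/{sub}" for sub in SUBSYSTEMS}

    def route(self, child: str, subsystem: str) -> str:
        """Users only talk to the parent; it forwards through that child's proxy."""
        return self.proxies[child][subsystem]

    def proxy_count(self) -> int:
        return sum(len(p) for p in self.proxies.values())


if __name__ == "__main__":
    parent = Parent()
    for i in range(1, 1001):
        parent.add_child(f"az{i}", f"https://child{i}.example.com")
    print(parent.proxy_count())           # 4000 proxies for 1000 sites
    print(parent.route("az2", "nova"))    # https://child2.example.com/nova
```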
Cascading solution split into two projects
TRICIRCLE AND TRIO2O
• Tricircle – Networking across OpenStack clouds
• Trio2o – Single API Gateway for Nova, Cinder
Expand workloads into other OS instances
Create networking extensions
Isolation of east-west traffic
Application HA
API Gateway
User1 UserN
AZ1 AZx AZn
TRI-CIRCLE: Make Neutron(s) work as a single cluster
Trio2o
OPNFV Multi-Site Project – Euphrates release
Single region with multiple sub-regions
Shared or federated Keystone
Shared or distributed Glance
UID = Tenant ID + POD ID
pod
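The OPNFV multi-site design above builds a unique ID from the tenant ID plus the ID of the pod (site) that hosts the resource, which would let a single API gateway forward a request to the right pod. A minimal Python sketch, assuming a hypothetical `@` separator and a hypothetical pod endpoint table; how Tricircle or Trio2o actually encode and store this mapping is not shown on the slide:

```python
from typing import NamedTuple

SEP = "@"  # hypothetical separator: the slide only says UID = TenantID + PODID

# Hypothetical per-pod API endpoints known to the gateway.
POD_ENDPOINTS = {
    "pod1": "https://pod1.example.com",
    "pod2": "https://pod2.example.com",
}


class Owner(NamedTuple):
    tenant_id: str
    pod_id: str


def make_uid(tenant_id: str, pod_id: str) -> str:
    """Combine tenant and pod IDs into one unique ID."""
    if SEP in tenant_id or SEP in pod_id:
        raise ValueError(f"IDs must not contain {SEP!r}")
    return f"{tenant_id}{SEP}{pod_id}"


def split_uid(uid: str) -> Owner:
    """Recover the tenant and pod from a UID built by make_uid()."""
    tenant_id, pod_id = uid.split(SEP)
    return Owner(tenant_id, pod_id)


def endpoint_for(uid: str) -> str:
    """Pick the pod endpoint a gateway would forward this tenant's request to."""
    return POD_ENDPOINTS[split_uid(uid).pod_id]


if __name__ == "__main__":
    uid = make_uid("tenant-a", "pod2")
    print(uid, "->", endpoint_for(uid))  # tenant-a@pod2 -> https://pod2.example.com
```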
Remote Compute Nodes
WHAT’S THE ALTERNATIVE?
• Should we abandon the idea of remote Nova nodes?
• Use Packstack/All-in-One – OSP in a box – à la Vz uCPE
• High overhead if you want to run 1-2 VNFs
• Perhaps some optimization possible using Kolla/Container model
• Initialize the remote nodes – need L3/L2 connectivity for PXE
• Make that a Kubernetes node – use containers on that node
• Implement a new interface for remote nodes
• Nova Agent on remote nodes ?
• Abandon the idea of OpenStack – No!!!! No OpenStack really!!! ?
• Use a CMP – to manage remote bare metal nodes
• KVM – Hypervisor
• Run Containers on remote nodes – Do we run into same issues?
Virtual controllers – to get around node restrictions
VIRTUAL CONTROLLER MODEL
Kolla – containerizing the control plane
• Kolla-Kubernetes and Kolla-Ansible
• Containerizing OSP control makes the previous options easier
• Can remote nodes be considered as PODs in Kubernetes environments?
• Interface between master and host node
• The containers can be deployed on those nodes to manage apps or even OSP services
Keystone Glance Nova
Neutron VM1 VM2
SUMMARY
• Deploying OpenStack at multiple sites is a must for telcos
• Tricircle and Trio2o offer good promise
• Tune RabbitMQ or move to MQ enhancements (AMQP)
– Partition MQ
– Scale MQ instances
• Carefully craft the Availability Zone model
• Nova Agent Proxy
• Deploying bare metal at remote sites is still an issue – it does not solve the problem of access
– Another way of automation using call home
• Use Kubernetes as master orchestrator => Kubernetes managing OSP managing container workloads – K8S Sandwich
THANK YOU
ABSTRACT: Important Information
OpenStack provides a great Infrastructure-as-a-Service (IaaS) platform for deployment of applications in virtual machines and containers. For telcos specifically, OpenStack unifies the point of presence (PoP), central office, and data center infrastructure. However, many telcos need OpenStack deployed in many data centers around the region or country. The question is how they should deploy OpenStack for multi-site needs. Should they consider a stretched deployment, where different components sit in different locations? Or should they consider replicating the entire OpenStack environment in each location? What impact does this have on Keystone, messaging, disaster recovery, and, more importantly, unified management of all these sites? This presentation will discuss architectural and deployment options for multi-site deployments of OpenStack.