A retrospective on 8 years of shipping Host SDN in … · A retrospective on 8 years of shipping...

44
Virtual Filtering Platform A retrospective on 8 years of shipping Host SDN in the Public Cloud Daniel Firestone Tech Lead and Manager, Azure Networking Host SDN Team

Transcript of A retrospective on 8 years of shipping Host SDN in … · A retrospective on 8 years of shipping...

VirtualFilteringPlatform

Aretrospectiveon8yearsofshippingHostSDNinthePublicCloud

DanielFirestoneTechLeadandManager,AzureNetworkingHostSDNTeam

Overview

• AzureandScale• WhyVirtualSwitchesforSDN?• EarlyimplementationsofAzureHostSDN• AzureHostSDNPlatformGoals• VFP– OurplatformforhostSDN• VFPv2– AddressingChallengesofScale• HardwareOffloads• ConclusionandFuture

mediacaching identity servicebus

mobileservices

cloudservices

virtualmachines

DataServices tableDataLake

blobstorage

SQLdatabase

AppServices

media

hpcintegration analytics

caching identity servicebus

web appsmobileservices

cloudservices

InfrastructureServices cdn

virtualmachines

virtualnetwork vpn

trafficmanager

MicrosoftAzure

2013201420152016

10’sofPB ExabytesAzureStorage

Pbps10’sofTbpsDatacenterNetwork

2010 2017

100K MillionsComputeInstances

Fortune500usingMicrosoftCloud

>85%

>60 TRILLIONAzurestorageobjects

>9 MILLIONAzureActiveDirectoryOrgs

1outof3AzureVMsareLinuxVMs>900 TRILLION

requests/day

>3 TRILLIONAzureEventHubsevents/week

>110BILLIONAzureDBrequests/day

AzureScale&Momentum

>18 BILLIONAzureActiveDirectoryauthentications/week

NewAzurecustomersamonth

>120,000

WhydoweneedVirtualSwitchSDNPolicy?

Policyapplicationatthehostismorescalable!

IaaS VM

Example#1:LB(FromAnanta,SIGCOMM‘13)

• AllinfrastructurerunsbehindanLBtoenablehighavailabilityandapplicationscale• Howdowemakeapplicationloadbalancingscaletothecloud?• Challenges:• Howdoyouloadbalancetheloadbalancers?• HardwareLBsareexpensive,andcannotsupporttherapidcreation/deletionofLBendpointsrequiredinthecloud• Support10sofGbps percluster• Needasimpleprovisioningmodel

LB

WebServerVM

WebServerVM

SQLService

IaaS VM

SQLService

NAT

“SDN”Approach:SoftwareLBwithNATinVMSwitch

MUX

VMDIP10.1.1.2

VMDIP10.1.1.3

AzureVMSwitch

StatelessTunnel

EdgeRouters

Client

VIP

VIP

DIPDIP

DirectReturn:VIP

VIP

MUX

VMDIP10.1.1.4

VMDIP10.1.1.5

AzureVMSwitch

NATSLBM

TenantDefinition:VIPs,#DIPs

Mappings

• GoalofanLB:MapaVirtualIP(VIP)toaDynamicIP(DIP)setofacloudservice• Twosteps:LoadBalance(selectaDIP)andNAT(translateVIP->DIPandports)• PushingtheNATtothevswitchmakestheMUXes stateless(ECMP)andenablesdirectreturn• SinglecontrollerabstractsoutLB/vswitch interactions

NAT

Example#2:Vnet

• IdeasfromVL2(SIGCOMM‘09)

• GoalistomapCustomerAddresses(e.g.BYOIPspace)toProviderAddresses(real10/8addressesonthephysicalnetwork)

• Thisrequiresatranslationof*every*packetonthenetwork– nohardwaredeviceonournetworkisscalableenoughtohandlethisloadalongwithalloftherelevantpolicy

• Enablescompaniestocreatetheirownvirtualnetworkinthecloud,definingtheirowntopologies,securitygroups,middleboxesandmore

VNETForwardingPolicy:Traffictoon-prem

Node1:10.1.1.5

BlueVM110.1.1.2

GreenVM110.1.1.2

VMSwitchSrc:10.1.1.2 Dst:10.2.0.9Src:10.1.1.2 Dst:10.2.0.9

Policylookup:10.2/16routestoGWonhostwithPA10.1.1.7

NM

Src:10.1.1.5 Dst:10.1.1.7 GRE:Green Src:10.1.1.2 Dst:10.2.0.9

L3ForwardingPolicy

Node3:10.1.1.7

GreenVPNGWVM10.1.2.1

VMSwitch

GreenEnterpiseNetwork10.2/16

VPNGW

Src:10.1.1.2 Dst:10.2.0.9L3VPNPPP

EvenMoreVSwitch…

• 5-tupleACLs• InfrastructureProtection• User-definedProtection

• Billing• Meteringtraffictointernet

• Ratelimiting• SecurityGuards• Spoof,ARP,DHCP,andotherattacks

• Moreindevelopmentallthetime…

NM/TM

Node2:10.2.1.6

VM110.1.1.2

VM210.1.1.3

VMSwitch

TenantDescription

ACL:VM2cantalktoothergreenVMsACL:VM2cantalktoVM3butnotVM4MeteralltrafficfromVM2outsideof10/8RatelimitVM2to800mbps

VM310.1.1.4

VM410.1.1.5 Billing

VM2sent23MBofpublicinternettraffic

EarlyApproachtoAzureVswitch (2009-2011):StackedSDNdriversperapp• EachSDNapplicationisadrivermodulehardcompiledintothevswitch,handlingpacketsonitsown

• ChangestoSDNpolicyrequirekernelspacechanges,andanOSupdate

• WasrevolutionaryforusinshippingLBandVNETandHostSDN– butnoteasytoaddnewSDNApps

• AfteracoupleofyearswedecidedweneededamoreflexibleHostSDNplatform

AzureVirtualSwitch

ACLs

LB

VNET

Overview

• AzureandScale• WhyVirtualSwitchesforSDN?• EarlyimplementationsofAzureHostSDN• AzureHostSDNPlatformGoals• VFP– OurplatformforhostSDN• VFPv2– AddressingChallengesofScale• HardwareOffloads• ConclusionandFuture

OriginalGoalsforAzureHostSDNPlatform

• Goal1:Provideaprogrammingmodelallowingformultiplesimultaneous,independentnetworkcontrollerstoprogramnetworkapplications,minimizingcross-controllerdependencies

• Goal2:ProvideaMATprogrammingmodelcapableofusingconnectionsasabaseprimitive,ratherthanjustpackets– statefulrulesasfirstclassobjects

• Goal3:Provideaprogrammingmodelthatallowscontrollerstodefinetheirownpolicyandactions,ratherthanimplementingfixedsetsofnetworkpoliciesforpredefinedscenarios

WhatisVFP?

VMSwitch

vNIC

VM

NICvNIC

VM

SLB(NAT)

VNET

ACLs,Metering,Security

VFP

VirtualFilteringPlatform(VFP)Azure’sSDNDataplane

• PluginmoduleforWS2012+VMSwitch• ProvidescoreSDNfunctionalityforAzurenetworkingservices,including:• AddressVirtualizationforVNET• VIP->DIPTranslationforSLB• ACLs,Metering,andSecurityGuards

• Usesprogrammablerule/flowtablestoperformper-packetactions• ProgrammedbymultipleAzureSDNcontrollers,supportsalldataplane policyatlineratewithoffloads

VFPTranslatesL2extensibility(ingress/egresstoswitch)toL3extensibility(inbound/outboundtoVM)

VM

Metering

VNET

SLB

ACLs

Inbound(Egress) Outbound(Ingress)

Egress->Inbound

Ingress->Outbound

VMSwitch

vNIC

VM

NICvNIC

VM

SLB(NAT)

VNET

ACLs,Metering,Security

VFPEgress

Egress

Ingress

Ingress

Goal:AllPolicyisintheController-VFPisaFast,FlexibleImplementationofPolicy• Toenableagility,allowcontrollerstospecifyexactlywhattheywanttodoattheflow/packetlevel,sotheycanimplementnewSDNscenarioswithoutdataplane driverchanges• VFPfocusesonintegratingmulti-controllerpoliciesandscalingthehostdataplane – perf andoffloadswithoutsacrificingflexibility• 3KeyPrimitivesweexposetocontrollers:• Layers– independentflowtablespercontrollertoorderthepipeline• RuleMatches– definewhichpacketsmatchwhichrule• RuleActions– whattodowithapacketforagivenrule

Node:10.4.1.5

VFP

KeyPrimitive:MatchActionTables

BlueVM110.1.1.2NIC

Controllers

TenantDescriptionVNet Description

Flow Action

VNet RoutingPolicy ACLsNAT

Endpoints

Flow ActionFlow Action

TO:10.2/16 Encap toGW

TO:10.1.1.5 Encap to10.5.1.7

TO:!10/8 NAT outofVNET

Flow ActionFlow Action

TO:79.3.1.2 DNATto10.1.1.2

TO:!10/8 SNATto79.3.1.2

Flow Action

TO:10.1.1/24 Allow

10.4/16 Block

TO:!10/8 Allow

• VFPexposesatypedMatch-Action-TableAPItotheagents/controllers• Onetable(“Layer”)perpolicy• InspiredbyOpenFlow andotherMATdesigns,butdesignedformulti-controller,stateful,scalablehostSDNapplications

VNET SLBNAT ACLS

Layers

• AVFPlayerisnotabuilt-infunction– itisagenericsetofrule/flowtables• Anylayercanbecreatedatanytime– itisonlyan“LBlayer”ora“VNETlayer”basedonwhatrulesareplumbedintoit• ResourceslikeNATpoolsorPA->CAmappingpoolsareavailabletoanylayertoimplementspecialfunctionality(e.g.SLBorVNET)

EverythingisStateful• Thecoreprimitiveofmostpolicyisa(TCP,UDP,…)connection– translatestoatwo-wayflow• 5-tupleACLs,VIP-DIPSLBNAT,dynamicoutboundSNAT,andmore• Stateful rulesmakeiteasytoreasonaboutasymmetricpolicy– rulesapplytowhicheversidestartedtheflow,andthereversehappensautomaticallyfortheotherdirection• FlowstatemanagedbyTCPconnectiontracker

VM

OUTRules

INRules

Flow Flow

INFlowTable OUTFlowTable

VFP

Layer

Example:SoftwareLBSupportVM

DynamicNATRules

StaticNATRules

OUTRules

DecapRules

INFlowTable OUTFlowTable

INFlowTable OUTFlowTable

FlowFlow

SLBNATLayer

SLBDecap Layer

FlowFlow NATRanges

NATPool

RulescanreferenceResources,likedynamicNATpoolsorPA-CA

mappingtables

Similarly,VNETcanbeexpressedasaseriesof(encap,decap,rewrite,etc)rules,ratherthanfixedpolicy

CoolUsesofStateful Flows– LBFastpath

VFP

SLBDecap/Fastpath

SLBNAT

Storage

Decap

VFP

SLBDecap/Fastpath

SLBNAT

VM

DecapEncap

MUX RedirectPacket

FASTPATH

ExampleVFPLayers:SupportforLB,VNET,SecurityGroups,andBilling

VM

ACLs

VNET

SLBNAT

ILB

SLBDecap /Fastpath

Metering

SuccessfullydeployedacrossAzurein2012

AgilityExample:InternalLoadBalancing

• LBteamwantedtoofferCA-spaceLBinadditiontoPA-spaceLB

• Alltheyhadtodowascreateanewlayer– addednewpolicybyspecifyingCA-spacerulematchesforNATrules

• NonewworkinVFP,becausewepickedtherightprimitives

VM

ACLs

VNET

SLBNAT

ILB

SLBDecap /Fastpath

Metering

SLBController

Overview

• AzureandScale• WhyVirtualSwitchesforSDN?• EarlyimplementationsofAzureHostSDN• AzureHostSDNPlatformGoals• VFP– OurplatformforhostSDN• VFPv2– AddressingChallengesofScale• HardwareOffloads• ConclusionandFuture

ScalingUpSDN:NICSpeedsinAzure

• 2009:1Gbps• 2012:10Gbps• 2015:40Gbps• 2017:50Gbps• Soon:100Gbps?

Wegota50ximprovementinnetworkthroughput,butnota50ximprovementinCPUpower!

0

10

20

30

40

50

60

2009 2012 2015 2017

NICSpeed,Gbps

NewGoalsforVFPv2(2013-2014)

• Goal4:ProvideaserviceabilitymodelallowingforfrequentdeploymentsandupdateswithoutrequiringrebootsorinterruptingVMconnectivityforstateful flows,andstrongservicemonitoring

• Goal5:Provideveryhighpacketrates,evenwithalargenumberoftablesandrules,viaextensivecaching

• Goal6:ImplementanefficientmechanismtooffloadflowpolicytoprogrammableNICs,withoutassumingcomplexruleprocessing

VFPv1Layers- ChallengesVM

Metering

VNET

SLBNAT

ACLs

SLBDecap /Fastpath

ILB

• Holdoverfromoriginalvswitch design– everylayerindependentlyhandles,parses,andmodifiespackets

• Mostofourlayerswanttobestateful – butthismeansindependentconnectiontrackingandflowstateateachlayer

• AshostSDNbecameeasytoprogramandwidelyused,peoplewantedtoaddnewlayersallthetime

• Couldn’tkeepaddinglayersandscalingup

PARSE

PARSEPARSE

PARSE

PARSEPARSE

PARSE

Modify

Modify

Modify

Modify

Weneedabetterprimitiveforactions!

VM

Metering

VNET

SLBNAT

ACLs

SLBDecap /Fastpath

ILB

PARSE

MODIFY

UnifiedFlowID

HEADER

HEADER

HEADER

MATCH

TranspositionEngine

CompositeTransposition

ASICPipelineModel:ParseOnce,ModifyOnce

TRANSPOSE

Shippedin2014

HeaderTransposition- ActionsHeader Parameters

OuterEthernet SourceMAC,DestMAC

OuterIP SourceIP,DestIP

Encap Encap Type, GREKey/VXLANVNI

InnerEthernet SourceMAC,DestMAC

InnerIP SourceIP,DestIP

TCP/UDP SourcePort,Dest Port(note:doesnotsupportPush/Pop)

Action Notes

Pop Removethisheader.Noparamssupported.

Push Pushthisheaderontothepacket.Allparams mustbespecified.

Modify Modifythisheader.Allparamsareoptional,butatleastonemustbespecified.

Ignore Leavethisheaderasis.Noparamssupported.

NotPresent Thisheaderisnotexpectedtobepresent(basedonthematchconditions).Noparams supported.

Headers

HeaderActions

HeaderTransposition– ExampleActions

Header NAT Encap Decap Encap+NAT Decap+NAT

OuterEthernet Ignore Push(SMAC,DMAC) Pop Push(SMAC,DMAC) Pop

OuterIP Modify(SIP,DIP) Push(SIP,DIP) Pop Push(SIP,DIP) Pop

GRE/VxLAN NotPresent Push(Key) Pop Push(Key) Pop

InnerEthernet NotPresent Modify(DMAC) Ignore Modify(DMAC) Ignore

InnerIP NotPresent Ignore Ignore Modify(SIP,DIP) Modify(SIP,DIP)

TCP/UDP Modify(SPt,DPt) Ignore Ignore Modify(SPt,DPt) Modify(SPort,DPt)

Allowsrulestoexpressmorecomplexactionsacrossheaders

UnifiedParsingandMatchingCondition Notes

SourceVPort N/A

(Outer)SourceMACAddress N/A

(Outer)DestinationMACAddress N/A

(Outer)SourceIPAddress IPv4orIPv6

(Outer)DestinationIPAddress IPv4orIPv6

(Outer)IPProtocol N/A

SourcePort AppliesifProtocol==TCPorUDP

DestinationPort AppliesifProtocol==TCPorUDP

ICMPType AppliesifProtocol==ICMP(v4orv6)

DestinationVport N/A

GREKey/VxLAN VNI(TenantID) AppliesifOuterProtocol==GRE/VxLAN

(Inner)SourceMACAddress N/A

(Inner)DestinationMACAddress N/A

(Inner)SourceIPAddress IPv4orIPv6

(Inner)DestinationIPAddress IPv4orIPv6

(Inner)IPProtocol N/A

HeaderTranspositionsCompleteourGenericSouthboundAPICapabilityStory• Inordertoenableagility,wewantcontrollerstobeabletodefinenewtypesofpolicydynamicallywithoutneedingtochangeVFP.• Wealreadyprovideflexibilityin:• Layers:Controllerscandefinenewlayersdynamicallyfortheirownpolicywithoutinterferingwithothercontrollers’layers• Rules:Controllerscandefinewhichrulesmatchwhichpacketsviaaconsistent5-tuplematchAPI,nothingspecifictospecialpolicies

• Headertranspositionsprovidethekeythirdprimitive:Abilitytospecifywhatexactlyaruledoesonceitismatched• AllbuiltinrulesdefineHTs,butcontrollerscandefinetheirownrulesbycreatingnewonesoutofHTsonthefly

UnifiedFlowTables– AFastpath ThroughVFP

TranspositionEngine

Rewrite

Transposition

SLBDecap SLBNAT VNET ACL MeteringRule Action Rule ActionRule Action Rule Action Rule Action Rule Action

Decap* DNAT* Rewrite* Allow* Meter*FirstPacket

Second+Packet

Flow ActionDecap,DNAT,Rewrite,Meter1.2.3.1->1.3.4.1,62362->80

RuleLookups(Expensive)

HashLookups(Cheap)

VFP

UnifiedFlowTables

• Singlehashlookupforeachpacketafterflowiscreated• Leavesroomfornewlayersw/operf impact(e.g.ILB,etc)• SingleflowtableperVMcanbesizedwithVMsize• AllVFPactionscanbeexpressedasheadertranspositions– e.g.encap/decap/l3rewrite/l4NAT• Anysetofheadertranspositionscanbecomposedandexpressedasonetransposition• UnifiedFlowTable:Onematch(perentireflowid,innerandouter)andoneaction(headertransposition)perflow

Overview

• AzureandScale• WhyVirtualSwitchesforSDN?• EarlyimplementationsofAzureHostSDN• AzureHostSDNPlatformGoals• VFP– OurplatformforhostSDN• VFPv2– AddressingChallengesofScale• HardwareOffloads• ConclusionandFuture

SingleRootIOVirtualization(SR-IOV):NativePerformanceforVirtualizedWorkloads

ParentPartition VM1 VM2

TCP/IP TCP/IP

VFDriver VFDriverNetworkVirtualServiceProvider

NICNICEmbeddedSwitch

ExternalSwitch

VF VFPF

ButwhereistheSDNPolicy?

2016:AcceleratingVFPwithFPGASmartNICs

• Goal:Offloadacacheofourinternal(unified)flowtabletotheNIC

• PackageHeaderTranspositionsandUnifiedFlowIDsintohardwareAPI

• AllowsustoenableSR-IOV,applyingvirtualizationpolicyinhardwareandbypassingthehostcompletely

FutureofHostSDN:NewHardware/Softwareco-designmodels,programmableaccelerationfortransports,QoS,crypto,andmore!

Results-AzureAcceleratedNetworking:FastestCloudNetwork!• HighestbandwidthVMsofanycloud• DS15v2&D15v2VMsgetupto25Gbps

• Consistentlowlatencynetworkperformance• ProvidesSR-IOVtotheVM• 10xlatencyimprovement• Increasedpacketspersecond(PPS)• Reducedjittermeansmoreconsistencyinworkloads

• EnablesworkloadsrequiringnativeperformancetorunincloudVMs• >2ximprovementformanyDBandOLTPapplications

HostNetworkingmakesPhysicalNetworkFastandScalable

• Massive,distributed40/100GbEnetworkbuiltoncommodityhardware• NoHardwarepertenantACLs• NoHardwareNAT• NoHardwareVPN/overlay• NoVendor-specificcontrol,managementordataplane

• Allpolicyisinsoftwareonhosts–andeverything’saVM!• Networkservicesdeployedlikeallotherservices

• VFP,battletestedinthecloud,isnowavailableinMicrosoftAzureStackforprivatecloudaswell!

T2-1-1

T2-1-2

T2-1-8

T3-1

T3-2

T3-3

T3-4

RowSpine

T2-4-1

T2-4-2

T2-4-4

DataCenterSpine

T1-1 T1-8T1-7…T1-2

… …

RegionalSpine

T1-1 T1-8T1-7…T1-2 T1-1 T1-8T1-7…T1-2

Rack …T0-1 T0-2 T0-

20

40/50GServers

…T0-1 T0-2 T0-

20

40/50GServers

…T0-1 T0-2 T0-

20

40/50GServers

Thanks!

• VFPDevelopers• YueZuo,HarishKumarChandrappa,PraveenBalasubramanian,VikasBhardwaj,SomeshChaturmohta,MilanDasgupta,MahmoudElhaddad,LuisHernandez,NathanHu,AlanJowett,HadiKatebi,FengfenLiu,KeithMange,RandyMiller,ClaireMitchell,SambhramaMundkur,ChidambaramMuthu,GauravPoothia,MadhanSivakumar,EthanSong,KhoaTo,KelvinZou,andQasimZuhair

• DesignInfluence• AlirezaDabagh,DeepakBansal,PankajGarg,ChanghoonKim,HemantKumar,ParveenPatel,ParagSharma,NisheethSrivastava,VenkatThiruvengadam,NarasimhanVenkataramaiah,HaiyongWang

• DaveMaltz,MarkRussinovich,andAlbertGreenbergforyearsofsupport

WanttocomebuildthenextgenerationofscalableHostSDN?We’rehiring!

[email protected]