JK Lee James Zeng, LBSwitch P4workshop v2

19
LBSwitch: Layer 4 Load Balancing in P4 switch JK Lee, James Zeng Co/work with: Rui Miao, Changhoon Kim, Minlan Yu

Transcript of JK Lee James Zeng, LBSwitch P4workshop v2

Page 1: JK Lee James Zeng, LBSwitch P4workshop v2

LBSwitch:*Layer*4*Load*Balancing*in*P4*switch

JK#Lee,#James#ZengCo/work#with:#RuiMiao,#Changhoon Kim,#Minlan Yu

Page 2: JK Lee James Zeng, LBSwitch P4workshop v2

Agenda• Load#Balancing#at#Facebook

• Insights#from#Facebook#data

• LBSwitch:#cloud/scale#load#balancing#on#switches

• Evaluations#on#Facebook#data

2

Page 3: JK Lee James Zeng, LBSwitch P4workshop v2

Load*balancing*at*Facebook

3

L4LB

Data#Center

ServerServer

Server

L4LBL4LB L7LB Web#

Server

Data#Center

L7LB

PoPL4LB

East%West(traffic

North%South(traffic

Page 4: JK Lee James Zeng, LBSwitch P4workshop v2

L4*LB*Key*Functions

• VIP:#Client/facing#Virtual#IP,#DIP:#Server#IP• Maintain#VIP#! DIP/pool#mappings• Given#a#VIP,#choose#an#appropriate#DIP#from#the#pool,#and#forward#packets#to#the#DIP

• A#DIP#pool#can#change#over#time

4

Load Balancer

Sever Server Server Server

Datacenter

DIP1=10.0.0.1 DIP2=10.0.0.2 DIP3=11.0.0.1 DIP4=11.0.0.2

VIP1=20.0.0.1 VIP2=20.0.0.2

L4#Requests

Datacenter

VIP: Virtual IPDIP: Direct IP

VIP1 for service 1VIP2 for service 2

Page 5: JK Lee James Zeng, LBSwitch P4workshop v2

Agenda• Load#balancing#at#Facebook

• Insights(from(Facebook(data◦Many(active(connections◦ Frequent(DIP(pool(updates◦ Frequent(new(connection(arrivals

• LBSwitch:#cloud/scale#load#balancing#on#switches

• Evaluations#on#Facebook#data

5

Page 6: JK Lee James Zeng, LBSwitch P4workshop v2

Measurement*Study:*Many*Active*Connections

6

Implication:#Each#rack#needs#to#handle#10+M#connections

Peak#clusters#have#11/17M#connections#per#rack

Page 7: JK Lee James Zeng, LBSwitch P4workshop v2

Frequent*DIP*pool*updates

7

Bursty updates#in#NS#traffic66.7%#clusters#have#no#updatesPeak#at#107#updates/min

Frequent#updates#in#EW#traffic5#updates/min# in#the#median16#updates/min# in#the#peak

Implication:#Need#to#ensure#consistent#VIP/DIP#mappings#during#frequent#DIP#pool#updates

• Bursty updates#in#NS#traffic,#frequent#updates#in#EW#traffic◦ NS:#high#stability#and#shared#DIPs◦ EW:#high#service#dynamics

Page 8: JK Lee James Zeng, LBSwitch P4workshop v2

Frequent*new*connection*arrival

8

Peak#at#4.4/51M#per#min

Implication:#Need#to#handle#consistent#arrival#of#new#connections

Median#at#50#connections#per#min

*VIP:#Virtual#IP

Page 9: JK Lee James Zeng, LBSwitch P4workshop v2

Today’s*L4*LoadGBalancing•Special#HW#middlebox◦Hard#to#scale,#rigid#placement

•SW#data#plane#in#x86#servers◦Many#servers#are#needed#◦ Variable#latency◦Hard#to#isolate#performance#between#services,#tenants

9

Page 10: JK Lee James Zeng, LBSwitch P4workshop v2

Key*Idea:*P4*Switch*as*CloudGScale*LoadGBalancer

• Benefits◦ High# throughput# (Tbps,#Gpps),# zero/latency,#ubiquitous#◦ Predictable#performance#even#under#availability#attacks

• Challenges◦ Don’t#break#existing#connections#during#DIP#pool#update◦Maintain#millions#of#connection#states#in#switch#SRAM

10

Parser

Deparser

In

Queues

Actio

n

Stage#1

Match#Ta

ble

Actio

n

Stage#XMatch#Ta

ble

Out

Actio

n

Stage#N

Match#Ta

ble

L2 lookup L4#LB NAT,Tunneling

Page 11: JK Lee James Zeng, LBSwitch P4workshop v2

DIP*Pool*Update*Scenario

• Existing#connections#to#DIP1#shouldn’t#be#affected#by#DIP2#addition

11

DIP(Pool(Version(1

VIP DIPs

20.0.0.1:80 10.0.0.1:80# (DIP1)

DIP(Pool(Version(2

VIP DIPs

20.0.0.1:80 10.0.0.1:80# (DIP1)10.0.0.2:80( (DIP2)

DIP2#addition

on#miss

VIP DIP

20.0.0.1:8010.0.0.1:80#

""" """

""" """

Connection DIP

""" """

VIPTableConnTable

1.2.3.4:1234!20.0.0.1:80

TCP

1.2.3.4:1234#10.0.0.1:80#TCP

on#hit

Install

Connection DIP

1.2.3.4:1234#20.0.0.1:80#TCP

10.0.0.1:80

""" """

VIP DIP

20.0.0.1:80TCP

10.0.0.1:80

10.0.0.2:80

""" """

""" """

Connection DIP

1.2.3.4:1234#20.0.0.1:80TCP

10.0.0.1:80

""" """

Page 12: JK Lee James Zeng, LBSwitch P4workshop v2

Scale*to*Millions*of*Connections

• Reduce#connection#entry#size◦ Store#hash#digest#instead#of#5 tuple◦ Store#DIP/pool#version#instead#of#DIP

12

Connection# ID#(5 tuple) DIP#(2 tuple)

1.2.3.4:1234#20.0.0.1:80, TCP 10.0.0.1:80

""" """

Hash digest Version

0xAABBCC 1

""" """

4~13X#reduction

• Achieve#high#table#density◦ Multi/stage#cuckoo#hash◦ Table#insertion#by#SW

Page 13: JK Lee James Zeng, LBSwitch P4workshop v2

LBSwitchDesign

13

ConnTable(Digest! Version)

VIPTable(VIP#! Version)

DIPPoolTable(VIP,#Version#! DIP)On#miss

On#hit

Table#management#

Data#plane#(HW)

ConnectionLearning

Connection#insertionControl#plane#(SW)

•Data(plane:#connection# tracking,#version#mapping,# DIP#selection#at#line/rate• Control(plane:#connection#learning#&#insertion,#DIP#pool#updates•No#connection#stalled#by#SW#(no#slow#path)

Page 14: JK Lee James Zeng, LBSwitch P4workshop v2

Protect*Connections*during*DIPGPool*Update

14

ConnTable(Digest! Version)

VIPTable(VIP#! Version)

Bloom#Filter(connections# using# vold)

DIPPoolTable(VIP,#Version#! DIP)On#miss

Data#plane#(HW)

VIP#in#version#update

Control#plane#(SW)

• Connection# learning#&#insertion#by#CPU#is#slow• 2nd#packet#of#the#new#connection#may#arrive#before#ConnTable insertion• 100s#of#connection#arrivals#in#1ms#(per#VIP),#vulnerable#to#version#update:#vold to#vnew• Solution:#use#bloom#filter#to#cache#connections#recently#mapped#to#vold

On#hit

Table#management# ConnectionLearning#(1ms#delay)

Connection#insertion

Normal#VIPVIPTable

(VIP#! vold)VIPTable

(VIP#! vold vnew)

Page 15: JK Lee James Zeng, LBSwitch P4workshop v2

P4*Prototype• Data#plane◦ ~300#lines#of#P4#code#into#switch.p4◦ register_read/write primitives# to#implement#bloom# filter

• Control#plane◦ ~800#lines#of#C#code#in#switch_api

15

Page 16: JK Lee James Zeng, LBSwitch P4workshop v2

Evaluations• Re/play#Facebook#traces:#connection#&#DIP#pool#update#events

• Scale◦ Up#to#13X#reduction#of#ConnTable

(Bigger#reduction#w/#IPv6)◦ 80MB#SRAM#for#17M#connections#

• Protect#existing#connections#during#DIP#pool#updates◦ w/o#caching:#up#to#1400#connections#break#per#VIP,#per#update◦ w/#caching:#zero#connection#break◦ Few#KB#SRAM#for#bloom#filter#caching

16

Page 17: JK Lee James Zeng, LBSwitch P4workshop v2

Conclusion*&*Next*Steps• Cloud/scale#L4#LB#is#feasible#in#P4#switches#

• Plan#to#open/source#P4#and#switch_api codes

• Deployment#scenarios◦ Hybrid#deployment#of#LBSwitch (L4)#+#SLBs#(L7)■ P4#switch#as#a#building#block#of#decomposed#VNF

◦ Server#health#monitoring#in#LBSwitch data#plane

• Support#OpenStack LBaaS (Load/Balancing/as/a/Service)◦ Change#deployment#model:#VM/centric#to#switch/centric◦ Tenant#isolation:#management#and#performance

17

Page 18: JK Lee James Zeng, LBSwitch P4workshop v2

Thank#you

Page 19: JK Lee James Zeng, LBSwitch P4workshop v2

Questions• Provide#consistency#across#pipelines,#under#routing/ECMP#changes?◦ Yes,#by#keeping#hash#function#consistent#across#pipelines

• Can#quickly#react#to#server# failures#while#guaranteeing#consistency?◦ Yes,#by#resilient#hashing# in#HW

• How#to#manage#and#update#VIPs,#DIP/pools?◦ Plan#to#support#OpenStack LBaaS◦ L4#only,#not#L7

19