Programmable Forwarding Planes at Terabit/s Speeds

Patrick Bosshart, Chief Technology Officer, Barefoot Networks
And the entire Barefoot Networks team
Hot Chips 30, August 21, 2018


Page 2

Barefoot Tofino™: A Domain Specific Processor for Networking

[Figure: domain-specific processors and their programming languages, each targeted through a compiler]
• Graphics: GPU, OpenCL
• Signal Processing: DSP, Matlab
• Machine Learning: TPU, TensorFlow
• Networking: Tofino, P4

Page 3

Performance Penalty for Programmable Packet Processing

• CPU: 100x
• Network Processor: 10x
• FPGA: 10x
• These ratios have been constant for many years

Page 4

Presentation Outline

• Introduction to Tofino
• Basics of switching
• Programmable switching
• Tofino details
• Applications

• What is the processing paradigm for programmable switches?
• How does Tofino process packets?
• How does it avoid a performance penalty?
• What can you do with its programmability?

Page 5

Barefoot Tofino™ - Overview

• 6.5Tb/s switch
• 260 lanes of 25G SerDes
• 260 x 25G Ethernet ports, 130 x 50G, 65 x 100G, or combinations
• 11B transistors
• 16nm technology
• 1250 MHz
• Equivalent in area and power to fixed function chips

[Figure: four Match+Action pipelines (Pipeline 0 to Pipeline 3) around a Shared Packet Buffer & TM, with MAC + Serial I/O]
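
The port counts on this slide follow from the lane count. A small sketch of that arithmetic, assuming (this is not stated on the slide) that a 50G port groups two 25G lanes and a 100G port groups four; `port_modes` is a hypothetical helper name:

```python
# Figures from the overview slide: 260 SerDes lanes at 25 Gb/s each.
SERDES_LANES = 260
LANE_GBPS = 25

def port_modes(lanes: int = SERDES_LANES, lane_gbps: int = LANE_GBPS):
    """Return (port_count, port_gbps, total_tbps) for 1-, 2- and 4-lane ports.

    Assumption: 50G and 100G ports are built from 2 and 4 lanes respectively.
    """
    modes = []
    for lanes_per_port in (1, 2, 4):
        port_count = lanes // lanes_per_port
        port_gbps = lanes_per_port * lane_gbps
        modes.append((port_count, port_gbps, port_count * port_gbps / 1000))
    return modes

for count, gbps, total in port_modes():
    print(f"{count} x {gbps}G = {total} Tb/s")
```

Under that assumption every configuration adds up to the same 6.5 Tb/s aggregate.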

Page 6

Barefoot Tofino – Block Diagram

[Figure: block diagram. Each of the four pipes (pipe 0 to pipe 3) contains Rx SerDes, Rx MACs (10/25/40/50/100), an Ingress Pipeline, an Egress Pipeline, Tx MACs (10/25/40/50/100), and Tx SerDes. Shared blocks: Traffic Manager; Control & configuration; Reset / Clocks; PCIe, CPU MAC, DMA engines.]

Page 7

What’s in a packet? – A simple example

• MAC: Dest. Addr., Src. Addr., Eth. Type (= IPv4)
• IPv4: Src. IP, Dst. IP, IP Proto. (= TCP), Time To Live (TTL)
• TCP: Src. Port, Dst. Port, Seq. Num.
• Payload

Page 8

Match and Action

[Figure: Packet Header -> MATCH (hash "#" or compare "=" on MAC Dest. Addr.) -> ACTION (Set Output Port) -> Output]

Page 9

Sequence of Match Tables

• MAC TABLE: Exact Match on MAC Dest. Addr.
• IP TABLE: Longest Prefix Match on IPv4 Dest. Addr.
• ACL TABLE: Ternary Match on IPv4 Dest. Addr., TCP Src. Port
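
The three match kinds named on this slide can be sketched in software. This is only an illustration of the lookup semantics, not how the hardware implements them; the function names, the `acl_key` packing, and the table contents are all hypothetical:

```python
import ipaddress

def exact_match(table: dict, key):
    """Exact match: the key must equal a stored entry."""
    return table.get(key)

def longest_prefix_match(routes, addr: str):
    """LPM: among prefixes containing addr, the longest one wins."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, action in routes:
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, action)
    return best[1] if best else None

def ternary_match(entries, key: int):
    """Ternary: compare only the bits set in mask; first (highest priority) entry wins."""
    for value, mask, action in entries:
        if (key & mask) == (value & mask):
            return action
    return None

def acl_key(ipv4_dst: str, tcp_src_port: int) -> int:
    """Hypothetical key layout: 32-bit IPv4 destination followed by 16-bit TCP source port."""
    return (int(ipaddress.ip_address(ipv4_dst)) << 16) | tcp_src_port

MAC_TABLE = {"aa:bb:cc:dd:ee:ff": "set_output_port(3)"}
IP_TABLE = [("10.0.0.0/8", "next_hop_A"), ("10.1.0.0/16", "next_hop_B")]
ACL_TABLE = [
    (acl_key("10.1.0.0", 0), acl_key("255.255.0.0", 0), "drop"),  # any port, 10.1.0.0/16
    (0, 0, "permit"),                                             # mask 0 matches everything
]

print(exact_match(MAC_TABLE, "aa:bb:cc:dd:ee:ff"))
print(longest_prefix_match(IP_TABLE, "10.1.2.3"))
print(ternary_match(ACL_TABLE, acl_key("10.1.2.3", 22)))
```

A packet would pass through the three lookups in sequence, each on its own key.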

Page 10

Generalizing OpenFlow

Components | OpenFlow | Generalization
Packet Fields | Defined standard packet fields | User defined
Matching | Defined exact and ternary matching on those fields | Same
Action | Defined actions on those fields, specific and complex | Generalized, simple actions; VLIW with one action per word
Multiple Tables | Defined sequence of match-action tables | Same
Programming the Forwarding Plane | - | P4: developed and maintained by P4.org, over 100 members
Controlling the Forwarding Plane | Defined API, e.g. for table, flow entry setup/modification/deletion | P4Runtime: developed by Google, Barefoot, ONF and contributed to P4.org

Page 11

P4 Code Example

// Define MAC header
header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

// Define table matching on MAC header word
table mac {
    key = {
        ingress_metadata.bd : exact;
        l2_metadata.lkp_mac_da : exact;
    }
    actions = {
        dmac_hit;
        dmac_miss;
        dmac_redirect_to_cpu;
    }
    default_action = dmac_miss;
    size = MAC_TABLE_SIZE;
}

// Define table actions
action dmac_hit(bit<16> ifindex, bit<16> port_lag_index) {
    ingress_metadata.egress_ifindex = ifindex;
    ingress_metadata.egress_port_lag_index = port_lag_index;
    l2_metadata.same_if_check = l2_metadata.same_if_check ^ ifindex;
}

• Open Source
• Reconfigurable
• Protocol Independent
• Target Independent: s/w, FPGAs, NICs, fixed and programmable switches
• Vendor Independent
• P4 Tutorial at Hot Chips 2017

• Open Source
• P4 compiler -> P4 Runtime -> Switch
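
To make the table semantics concrete, here is a rough Python model of the `mac` table and its `dmac_hit` action. It is a sketch only: the body of `dmac_miss` is not on the slide and is invented here, and the model assumes `same_if_check` was set to the ingress ifindex earlier in the pipeline, so that the XOR yields zero exactly when the packet would leave on the interface it arrived on.

```python
from dataclasses import dataclass

@dataclass
class Metadata:
    bd: int
    lkp_mac_da: str
    same_if_check: int            # assumed to hold the ingress ifindex on entry
    egress_ifindex: int = 0
    egress_port_lag_index: int = 0
    missed: bool = False          # hypothetical flag, not part of the slide's code

def dmac_hit(md: Metadata, ifindex: int, port_lag_index: int) -> None:
    md.egress_ifindex = ifindex
    md.egress_port_lag_index = port_lag_index
    md.same_if_check ^= ifindex   # zero iff egress ifindex equals ingress ifindex

def dmac_miss(md: Metadata) -> None:
    md.missed = True              # hypothetical body; the slide does not show it

def apply_mac_table(entries: dict, md: Metadata) -> None:
    """Exact match on (bd, lkp_mac_da); run default_action dmac_miss when no entry matches."""
    entry = entries.get((md.bd, md.lkp_mac_da))
    if entry is None:
        dmac_miss(md)
    else:
        action, params = entry
        action(md, *params)

entries = {(1, "aa:bb:cc:dd:ee:ff"): (dmac_hit, (7, 7))}
md = Metadata(bd=1, lkp_mac_da="aa:bb:cc:dd:ee:ff", same_if_check=7)
apply_mac_table(entries, md)
print(md.egress_ifindex, md.same_if_check)
```

The entries dictionary plays the role of the state that a control plane would install at runtime.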

Page 12

Tofino Hardware

Page 13

Packet Parsing

• 16 Parsers Per Pipe, Each handles 100G Traffic

[Figure: parse graph ETH -> IPv4 or IPv6 -> TCP or UDP -> END. The Parser FSM matches packet data in a TCAM to select the NEXT STATE; Header word extract fills the PACKET HEADER VECTOR.]
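
The parse graph on this slide can be modeled as a table-driven state machine. The sketch below is an assumption-laden software analogy: a Python dict stands in for the TCAM, header lengths are fixed (no IPv4 options, IPv6 extension headers, or TCP options), and the names `STATES`, `TRANSITIONS`, and `parse` are hypothetical.

```python
import struct

# Per state: (header length in bytes, selector offset within the header, selector format).
# Fixed lengths are a simplifying assumption of this sketch.
STATES = {
    "ETH":  (14, 12, "!H"),    # selector: EtherType
    "IPV4": (20, 9, "!B"),     # selector: protocol
    "IPV6": (40, 6, "!B"),     # selector: next header
    "TCP":  (20, None, None),  # leaf state
    "UDP":  (8, None, None),   # leaf state
}

# (current state, selector value) -> next state; a dict stands in for the TCAM lookup.
TRANSITIONS = {
    ("ETH", 0x0800): "IPV4",
    ("ETH", 0x86DD): "IPV6",
    ("IPV4", 6): "TCP",
    ("IPV4", 17): "UDP",
    ("IPV6", 6): "TCP",
    ("IPV6", 17): "UDP",
}

def parse(packet: bytes) -> dict:
    """Walk the parse graph, copying each recognized header into a header vector."""
    phv = {}
    state, offset = "ETH", 0
    while state != "END":
        length, sel_offset, sel_format = STATES[state]
        if offset + length > len(packet):
            break  # truncated packet: stop extracting
        phv[state] = packet[offset:offset + length]
        selector = None
        if sel_format is not None:
            (selector,) = struct.unpack_from(sel_format, packet, offset + sel_offset)
        state = TRANSITIONS.get((state, selector), "END")  # unknown selector ends parsing
        offset += length
    return phv
```

Adding a protocol to this model means adding rows to the two tables, which is the sense in which such a parser is reconfigurable rather than fixed.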

Page 14

Match-Action Unit (MAU)

[Figure: MAU datapath from Packet Header Vector in to Packet Header Vector out. Labeled blocks: Xbar, Hash (#), Match RAM / TCAM, Match Data, Action RAM, Instr. RAM, Instruction Address, Operand Data, Delay, Mux, ALU, and SRAM / TCAM ARRAYS with ALUs for Meters, Statistics, Stateful.]

Page 15

Table graph mapping to Match-Action Stages

• Multiple small tables per stage
• Large tables spread over multiple stages

[Figure: a table graph with IPv4, IPv6, ACL, and uRPF tables mapped onto match-action stages 0 to 3]

Page 16

Table Predication

• Multiple tables per stage
• Each table produces next-table pointer
• Predication invalidates skipped tables

[Figure: Table Graph of tables A, B, C, D with Short Forward Branches; tables that a branch jumps over are marked SKIP]
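
A minimal model of the next-table pointer idea, assuming tables are laid out in a fixed order and branches only go forward. The loop only decides which tables count as valid; it does not model the hardware's timing. `run_tables` and the example branch condition are hypothetical:

```python
ORDER = ["A", "B", "C", "D"]  # fixed table order within the stage(s)

def run_tables(tables: dict, phv: dict):
    """tables maps a name to a function(phv) returning the next table's name, or None to stop.

    Any table positioned before the current next-table pointer is predicated off.
    """
    executed, skipped = [], []
    next_table = ORDER[0]
    for name in ORDER:
        if name != next_table:
            skipped.append(name)       # jumped over: result invalidated
            continue
        executed.append(name)
        next_table = tables[name](phv)  # forward branch target
    return executed, skipped

tables = {
    "A": lambda phv: "C" if phv.get("skip_b") else "B",  # short forward branch over B
    "B": lambda phv: "C",
    "C": lambda phv: "D",
    "D": lambda phv: None,
}

print(run_tables(tables, {"skip_b": True}))
print(run_tables(tables, {}))
```

With `skip_b` set, table B is the only one skipped; without it, all four tables run in order.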

Page 17

Dependencies and Pipelining

[Figure: the table graph (IPv4, IPv6, ACL, uRPF) mapped to STAGE 0 1 2 3. Labels: Different Stages; MAC Dest. Modification Creates Match Data Dependency; Concurrent Execution; Sequential Pipelining.]
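
One way to picture the effect of match dependencies on placement: a table that matches on a field written by another table has to sit in a later stage, while independent tables can share a stage. The sketch below assigns each table the earliest stage its dependencies allow. It ignores memory capacity and every other constraint a real compiler handles, assumes the dependency graph is acyclic, and uses invented table names:

```python
def assign_stages(tables, match_deps):
    """Return {table: stage}.

    match_deps maps a table to the tables whose actions modify a field it matches on.
    A table lands one stage after its latest dependency, or in stage 0 if it has none.
    """
    stage = {}

    def place(table):
        if table not in stage:
            deps = match_deps.get(table, ())
            stage[table] = 1 + max((place(d) for d in deps), default=-1)
        return stage[table]

    for table in tables:
        place(table)
    return stage

# Hypothetical example: mac_rewrite modifies the MAC destination that mac_lookup matches on.
tables = ["mac_rewrite", "mac_lookup", "ipv4", "ipv6"]
match_deps = {"mac_lookup": {"mac_rewrite"}}
print(assign_stages(tables, match_deps))
```

Here `ipv4`, `ipv6`, and `mac_rewrite` can execute concurrently in stage 0, while `mac_lookup` is pushed to stage 1.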

Page 18

How did we get the programmability without the penalty?

• Didn't use von Neumann processing model
• Compiler and compiler friendly design
• Regular architecture: more replication allows better optimization
• MAU area divided into
  • RAM arrays + support logic + standard switch features
  • Programmable portion
• Programmable portion area about 10% of chip area
• Power is less than the comparable fixed function switches

Page 19

Aspects of switch performance

• CPU, Network Processor performance
  • Use run-to-completion model
  • Throughput dependent on program
• Hardware switch performance
  • Deterministic
  • Requires assured throughput at 100% traffic, all inputs
• Most important switch parameters
  • TM packet data RAM size
  • Match table RAM size
• Many specific behaviors required
  • Multicast, queue prioritization, WRED, PFC, …
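
The contrast between the two models can be put in back-of-the-envelope form. The sketch assumes, beyond what the slide states, that a pipeline accepts one packet per clock regardless of the program, and that a run-to-completion core spends a program-dependent number of cycles on each packet. All numbers in the example are illustrative, not measurements:

```python
def run_to_completion_pps(cores: int, clock_hz: float, cycles_per_packet: int) -> float:
    """Packets per second when each core runs the whole program for one packet at a time."""
    return cores * clock_hz / cycles_per_packet

def pipeline_pps(pipes: int, clock_hz: float) -> float:
    """Packets per second under the assumption of one packet per clock per pipeline."""
    return pipes * clock_hz

# Illustrative inputs only.
print(run_to_completion_pps(cores=16, clock_hz=2e9, cycles_per_packet=200))
print(run_to_completion_pps(cores=16, clock_hz=2e9, cycles_per_packet=400))
print(pipeline_pps(pipes=4, clock_hz=1.25e9))
```

Doubling the per-packet work halves the first figure, while the pipeline figure has no program-length term at all, which is one reading of "deterministic" on this slide.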

Page 20

Applications

• Switching Applications:
  • Table and Feature scaling
  • In-band Network Telemetry (INT, P4.org)
    • Data added to packet on each hop - which switch, time spent there, …
    • Data collected on exit
    • Answers question: if I'm delayed, who delayed me?
  • Bloom filters, heavy hitter detection
• Computational Networking Applications:
  • Layer 4 load balancer (SilkRoad: SIGCOMM 2017)
    • Replaces ~500 CPUs
    • 500x lower latency
  • DDoS protection
  • DNS cache
  • Key-Value store cache (NetCache: SOSP 2017)
  • ML parameter server
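
The telemetry idea from this slide, in toy form: each hop appends which switch it is and how long the packet spent there, and the records are stripped and inspected on exit. This is not the INT header format from P4.org; the `Packet` class, field names, and timings are made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    payload: bytes
    int_stack: list = field(default_factory=list)  # per-hop telemetry records

def traverse_hop(pkt: Packet, switch_id: str, ingress_ns: int, egress_ns: int) -> None:
    """Each hop adds its identity and the time the packet spent in it."""
    pkt.int_stack.append((switch_id, egress_ns - ingress_ns))

def collect_on_exit(pkt: Packet) -> list:
    """The last hop removes the telemetry before the packet leaves the network."""
    records, pkt.int_stack = pkt.int_stack, []
    return records

def who_delayed_me(records: list) -> str:
    """Return the switch where the packet spent the most time."""
    return max(records, key=lambda record: record[1])[0]

pkt = Packet(b"data")
traverse_hop(pkt, "leaf1", ingress_ns=0, egress_ns=500)
traverse_hop(pkt, "spine2", ingress_ns=1_000, egress_ns=43_000)
traverse_hop(pkt, "leaf3", ingress_ns=44_000, egress_ns=44_700)
print(who_delayed_me(collect_on_exit(pkt)))
```

In this made-up trace the spine hop accounts for most of the delay, which is the kind of answer the slide's question is after.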

Page 21

Summary

• Switching is going programmable
• Tofino achieved no performance penalty vs fixed function switches
• P4 enables users to build their network their way
• Opens up wire-speed processing to the networking software industry

Page 22

Thank You!