Dev Conf 2017 - Meeting NFV Networking Requirements



Meeting Networking Requirements for NFV

Flavio Bruno Leitner, Principal Software Engineer - Networking Service Team, January 2017

● NFV concepts and goals
● NFV requirements
● 10G Ethernet
● Physical-Virtual-Physical (PVP) scenario
● Some network solutions
● Dive into DPDK enabled Open vSwitch
● Possible improvements

Agenda

2

Virtualize network hardware appliances

NFV - Network Functions Virtualization

3

Diagram: Firewall, LB, and Router appliances running as VMs on a virtualization layer.

A new product/project needs new networking infrastructure

NFV - Goals

4

Before:
● Slow Process
● High Cost
● Less Flexibility

After:
● Fast Process
● Lower Cost
● Greater Flexibility

Deploy a new service with a click!

NFV - Networking Requirements

5

VM + Virtualization = Low Latency + High Throughput … with zero packet loss

NFV Requirements - Challenge

6

Worst case: wire speed with the smallest frame

Ethernet frame: 64 bytes [MAC header (14) + payload (46) + FCS (4)]

Ethernet overhead: 20 bytes [inter-frame gap (12) + MAC preamble (8)]

Minimum on-wire size per frame: 64 + 20 = 84 bytes

Packet rate: 10 Gbit/s ÷ (84 × 8 bits) = 14.88 Mpps (million packets per second)

Challenge 10GBit/s

7

How much time per packet?

1 / 14.88Mpps = 67.2 nanoseconds

3GHz CPU => ~200 cycles

Cache Miss => ~32 nanoseconds

L2 Cache Hit => ~10 cycles

L3 Cache Hit => ~36 cycles

Small Budget! (see the worked example below)

Challenge 10GBit/s - 14.88Mpps

Sources:
http://www.intel.co.uk/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
https://people.netfilter.org/hawk/presentations/nfws2014/dp-accel-10G-challenge.pdf

8
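The numbers on the last two slides can be checked with a small back-of-the-envelope calculation, assuming the worst-case 84-byte on-wire frame and a 3 GHz core (a sketch, not part of any benchmark tool):

#include <stdio.h>

int main(void)
{
    const double link_bps   = 10e9;     /* 10 Gbit/s line rate                  */
    const double frame_bits = 84 * 8;   /* 64-byte frame + 20 bytes of overhead */
    const double cpu_hz     = 3e9;      /* assumed 3 GHz core                   */

    double pps        = link_bps / frame_bits;     /* worst-case packets/second */
    double ns_per_pkt = 1e9 / pps;                 /* time budget per packet    */
    double cycles     = ns_per_pkt * cpu_hz / 1e9; /* cycle budget per packet   */

    printf("%.2f Mpps, %.1f ns/packet, ~%.0f cycles/packet\n",
           pps / 1e6, ns_per_pkt, cycles);
    return 0;
}

Running it prints roughly 14.88 Mpps, 67.2 ns per packet and ~202 cycles per packet, matching the slides above.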

Networking to Virtual Machines - PVP

9

Diagram (PVP): Traffic Generator → physical port → vSwitch → logical port → VM → logical port → vSwitch → physical port → Traffic Generator.

● Linux Bridge

● Open vSwitch (OVS)

● SR-IOV

● DPDK Enabled Open vSwitch (OVS-DPDK)

Networking to Virtual Machines

10

● Use the kernel datapath

● NAPI

● Unpredictable latency

● Not SDN ready

● Low throughput: ~1Mpps/core (Phy-to-Phy)

● qemu runs in userspace

Linux Bridge

11

● Use the kernel datapath

● NAPI

● Unpredictable latency

● SDN ready

● Low throughput: ~1Mpps/core

● qemu runs in userspace

Open vSwitch

12

● Low latency

● High throughput

● Bypass the host

● Not SDN friendly - Can’t use a virtual switch in the host

● Physical HW exposed - no abstraction, certification issues/costs

● Migration issues

● Limited number of devices

SR-IOV

13

What is DPDK?

● A set of libraries and drivers for fast packet processing.

● Open Source, BSD License

Usage:

● Receive and send packets within the minimum number of CPU cycles.

What it is not:

● A networking stack

Data Plane Development Kit (DPDK)

14

A Poll-Mode Driver (PMD) consists of APIs, provided through the BSD driver running in userspace, to configure the devices and their respective queues. In addition, a PMD accesses the RX and TX descriptors directly without any interrupts to quickly receive, process and deliver packets in the user's application.

DPDK - Poll-Mode Drivers

Source: http://dpdk.org/doc/guides/prog_guide/poll_mode_drv.html

15
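To make the description above concrete, a minimal poll-mode forwarding loop might look like the sketch below, using the DPDK rte_eth_rx_burst()/rte_eth_tx_burst() APIs. It is a simplified illustration, not the OVS-DPDK datapath; it assumes the EAL is initialized and that rx_port and tx_port are already configured and started.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Busy-loop forwarding between two pre-configured ports. */
static void forward_loop(uint16_t rx_port, uint16_t tx_port)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Poll the RX descriptors directly; no interrupts involved. */
        uint16_t nb_rx = rte_eth_rx_burst(rx_port, 0, bufs, BURST_SIZE);

        if (nb_rx == 0)
            continue;               /* nothing arrived, poll again */

        /* Run-to-completion: hand the whole batch to the TX queue. */
        uint16_t nb_tx = rte_eth_tx_burst(tx_port, 0, bufs, nb_rx);

        /* Free any packets the TX queue could not accept. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}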

● The Open vSwitch kernel module is just a flow cache managed by userspace.

● DPDK provides the libraries and drivers to RX/TX from userspace.

● Yeah, DPDK enabled Open vSwitch!

● Remember the 14.88Mpps? ~16Mpps/core Phys-to-Phys.

● Costs at least one core kept 100% busy running the PMD thread (power consumption, cooling, wasted cycles)

Open vSwitch + DPDK

16

● Provide network connectivity to Virtual Machines

● Qemu runs in userspace

● Vhost-user interface (TX/RX shared virtqueues)

● Guests can choose between kernel or userspace

● Throughput: ~3.5Mpps/core (default features, PVP, tuned)

● Scales up linearly with multiple parallel streams

● System needs to be carefully tuned

OVS-DPDK for NFV

17

● Poll-Mode Driver thread owns a CPU (see the sketch after this slide)

● Devices (queues) are distributed between PMD threads

● Each PMD thread will busy loop polling and processing

● Run-To-Completion

● Batching (reduce per packet processing cost)

How does it work?

18
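As a rough illustration of "a PMD thread owns a CPU", the sketch below pins a polling function to a dedicated lcore with the DPDK EAL, leaving that core 100% busy. It assumes lcore 1 is in the EAL core mask and is only a simplified example, not how OVS-DPDK manages its threads.

#include <rte_eal.h>
#include <rte_launch.h>
#include <rte_lcore.h>

/* Body of the PMD thread: owns its lcore and never sleeps. */
static int pmd_main(void *arg)
{
    (void)arg;
    for (;;) {
        /* Busy loop: poll and process the queues assigned to this lcore,
         * e.g. with a forwarding loop like the rte_eth_rx_burst() sketch
         * shown earlier. */
    }
    return 0;   /* never reached */
}

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return -1;

    /* Launch the PMD loop on lcore 1; that core stays 100% busy from now on. */
    rte_eal_remote_launch(pmd_main, NULL, 1);

    rte_eal_mp_wait_lcore();    /* wait for worker lcores */
    return 0;
}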

X-Ray Patient: OVS-DPDK PMD Thread

19

Diagram: a single PMD thread polling Port 1 … Port n and running the forwarding plane, with a drop path.

PMD in PVP

20

Diagram (PMD in PVP): the PMD thread polls the physical ports (P1, P2) and the logical ports (L1, L2) on the Traffic Generator → physical port → vSwitch → logical port → VM path and back.

Packet Flow

21

Diagram: PMD forwarding plane connecting PhysicalNIC ports (10, 11) and vhost-user ports (20, 21).

Flows:
in_port=10,action=21
in_port=20,action=11

Measuring Throughput: Zero Packet Loss

22

Expected:
● Constant traffic rate
● System is constantly dropping packets
● Decrease traffic rate, repeat

Packet Drops: Aim For Weak Spots

23

Diagram (same as the packet-flow slide): PMD forwarding plane between PhysicalNIC ports (10, 11) and vhost-user ports (20, 21), with flows in_port=10,action=21 and in_port=20,action=11.

Packet Drops: NIC RX QUEUE

24

Diagram: PhysicalNIC RX queue feeding the PMD forwarding plane.

● Fixed size, limited by the hardware
● Drops are reported in the port stats
● Queue overflow (producer-consumer problem)

Packet Drops: Vhost-user TX Queue

25

Diagram: PMD forwarding plane and the guest vhost-user queue, with a drop path.

● Fixed size, limited in software
● Drops are reported in the guest
● Queue overflow (producer-consumer problem)

Packet Drops: Vhost-user RX Queue

26

Diagram: PMD forwarding plane and the guest vhost-user queue.

● Fixed size, limited in software
● Drops are reported in the port stats
● Queue overflow (producer-consumer problem)

Measuring Throughput: Zero Packet Loss

27

Expected:
● Constant traffic rate
● System is constantly dropping packets
● Decrease traffic rate, repeat

Reality:
● System is stable for a period of time
● A few packets are dropped sporadically
● Decrease traffic rate, repeat
● Very low throughput
● Understand what is causing the drops

Estimating PMD Processing Budget

28

Throughput (Mpps)    Proc. Budget (µs)    PMD Budget (µs)
3.0                  0.33                 0.16
4.0                  0.25                 0.12
5.0                  0.20                 0.10
6.0                  0.16                 0.08
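The table follows from simple arithmetic; the sketch below reproduces it, assuming (my reading of the slide, not stated explicitly) that the PMD budget is half of the per-packet budget because in PVP the same PMD thread handles each packet twice, once on the physical leg and once on the vhost-user leg. The printed values differ from the table only by rounding.

#include <stdio.h>

int main(void)
{
    const double mpps[] = { 3.0, 4.0, 5.0, 6.0 };

    for (int i = 0; i < 4; i++) {
        double proc_us = 1.0 / mpps[i];   /* time available per packet (us)     */
        double pmd_us  = proc_us / 2.0;   /* per PMD pass: each packet is seen
                                             twice in PVP (phys and vhost legs) */
        printf("%.1f Mpps -> %.2f us/packet, %.2f us per PMD pass\n",
               mpps[i], proc_us, pmd_us);
    }
    return 0;
}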

Measuring Polling/Processing Cost

29

Device        Direction    Mode          Time (µs)
Phys          ingress      Polling       0.2
Phys          ingress      Processing    3.1
Phys          egress       Polling       0.016
Phys          egress       Processing    0
vhost-user    ingress      Polling       0.013
vhost-user    ingress      Processing    0
vhost-user    egress       Polling       0.73
vhost-user    egress       Processing    2.14
Total                      Polling + Processing    6.2

● The total of 6.2 µs is roughly 24x the per-packet budget (0.25 µs at 4 Mpps)

● Assuming 32 packets per batch, the per-packet cost drops to 0.19 µs, ~5 Mpps

● The measured 3.5 Mpps with zero packet loss (0.29 µs per packet) implies an average batch size of about 21.4 (see the arithmetic sketch after this slide)

Batching

30
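The batching arithmetic can be reproduced with the same kind of back-of-the-envelope calculation, assuming the 6.2 µs polling + processing cost is amortized over a whole batch:

#include <stdio.h>

int main(void)
{
    const double batch_cost_us = 6.2;   /* measured polling + processing cost */

    /* Full 32-packet batch: amortized per-packet cost and resulting rate. */
    double per_pkt = batch_cost_us / 32.0;
    printf("batch of 32: %.2f us/packet -> %.1f Mpps\n", per_pkt, 1.0 / per_pkt);

    /* Average batch size implied by the measured 3.5 Mpps zero-loss rate
     * (the slide rounds the 0.286 us budget up to 0.29 us, hence its 21.4). */
    double budget_us = 1.0 / 3.5;
    printf("3.5 Mpps: %.2f us/packet -> average batch of %.1f packets\n",
           budget_us, batch_cost_us / budget_us);
    return 0;
}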

● Internal sources

● External sources

What is wasting time?

31

● What are they?

● How significant are they?

External Sources

32

● PMD Processing Budget (3Mpps): 0.16µs

● Ftrace tool => Kernel RCU callback: 50µs + preemption cost

● Roughly 8 batches worth of time (50 µs ÷ 6.2 µs per batch)

● Mitigation: boot with rcu_nocbs=<cpu-list> and rcu_nocb_poll

External Interferences: RCU Callback

33

● nohz_full

● No way to get rid of it

External Interferences: Timer Interrupt

34

● Scheduling issues:

○ irqbalance off

○ isolcpus

● Watchdog: nowatchdog

● Power Management: processor.max_cstate=1

● Hyper Threading

● Real-Time Kernel

External Interferences: Other Sources

35

● Use DPDK L-Thread subsystem to isolate devices

● Disable mergeable buffers to increase batch sizes inside the guest

● Disable mergeable buffers to decrease per packet cost

● Increase OVS-DPDK batch size

● Increase NIC queue size

● Increase virtio ring size

● BIOS settings

● Hardware Offloading

● Faster platform/CPUs

● Improve CPU isolation in the kernel

Possible Improvements

36

Thank You

Questions & Answers

Source: http://dpdk.org/doc/guides/prog_guide/poll_mode_drv.html

37