PF_DIRECT@TMA12
Transcript of PF_DIRECT@TMA12
Flexible High Performance Traffic Generation on Commodity Multi-core
Platforms
Nicola Bonelli, Andrea Di Pietro, Stefano Giordano, Gregorio Procissi
CNIT and Dip. di Ingegneria dell’Informazione - Università di Pisa
Introduction and Motivations
• New network devices are emerging… (probes, NIDSs, shapers)
• Traffic generators available on the market:
  • Expensive black-box solutions (e.g. the Spirent AX analyzer)
  • Not extensible enough: limited traffic patterns, poor semantics for randomization, etc.
  • Solutions based on PCs and professional NICs are cheaper (Endace, Napatech, Invea-tech)
  • They enable fast packet transmission but usually do not provide a framework for traffic generation
• A traffic generator should combine the flexibility of software with the power of modern hardware
  • Multi-core architectures equipped with multi-queue NICs are today commodity hardware
• Is it possible to create a software traffic generator that, running on top of such a parallel architecture, provides hardware-class performance?
Software for traffic generation
• A number of software solutions for traffic generation exist (trafgen, iperf, rude/crude, mgen)
• Ostinato and Brute make use of PF_PACKET sockets and are therefore able to customize traffic at the data-link layer:
  • Packet rates hardly exceed a few million packets per second (no scalability)
  • No explicit support for multi-queue NICs
  • No support for time-stamping to adjust the timing with which packets are transmitted
Fast packet transmission…
• Recently, accelerated drivers have also emerged: netmap (Luigi Rizzo)
  • Memory-maps the DMA descriptors of NICs to user-space and can transmit the same packet, or a small set of packets, at wire speed (14.8 Mpps)
• A single thread generating random-address IP packets does not fill the pipe (~6-8 Mpps each)
  • Even when using the very fast Mersenne Twister random generator (~50 CPU cycles per draw)!
• Additional investigation is required…
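The randomization step mentioned above can be sketched as follows; this is an illustrative fragment (not PF_DIRECT code, and `random_addresses` is a hypothetical helper name) showing how per-packet random IPv4 source addresses would be drawn with `std::mt19937`, the Mersenne Twister engine cited above:

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Illustrative sketch: draw n pseudo-random IPv4 addresses with the
// Mersenne Twister (~50 CPU cycles per draw on the platform above).
// A fixed seed makes the sequence reproducible across runs.
std::vector<uint32_t> random_addresses(std::size_t n, uint32_t seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<uint32_t> dist;   // full 32-bit range

    std::vector<uint32_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(dist(gen));                   // one address per packet
    return out;
}
```

Even at ~50 cycles per draw, a 2.57 GHz core can produce only on the order of 50 million draws per second before doing any other per-packet work, which is consistent with a single generator thread failing to saturate 14.8 Mpps.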
PF_DIRECT features
We implemented a brand new socket, named PF_DIRECT:
• A socket designed for traffic generation (and transmission)
• Compliant with vanilla drivers (no custom driver required)
• Designed to run on top of commodity parallel hardware
• Support for timestamps in transmission
• Decouples traffic generation from packet transmission
  • Packets are generated by a user-space thread and transmitted by multiple kernel threads
  • Simple patterns are generated and transmitted nearly at wire speed
  • More complex patterns, most likely, do not have this requirement
PF_DIRECT architecture
PF_DIRECT consists of:
• A user-space library, written in C++11, that handles memory mapping, packet dispatching among kernel threads, etc.
• A special memory-mapped, byte-oriented SPSC queue
  • Amortizes cache-coherence traffic between cores (queue-index invalidations)
• Kernel threads that transmit the packets buffered in the SPSC queues, each at its given timestamp
  • Active wait, or reschedule in case of a long wait…
  • The TSCs of different cores are synchronized on modern CPUs (INVARIANT_TSC)
• A ring of pre-allocated socket buffers (skb) that are re-used by the kernel module and never deallocated by network drivers
  • skb users-counter trick
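The index-invalidation amortization can be illustrated with a minimal user-space SPSC ring (a sketch of the general technique, not the actual byte-oriented kernel queue): each side keeps a stale cached copy of the other side's index and refreshes it from the shared atomic only when the ring looks full or empty, so most operations touch no cache line owned by the other core.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Sketch of a single-producer/single-consumer ring with cached indices.
// head_ is written only by the producer, tail_ only by the consumer;
// the cached_* copies amortize cross-core cache-line invalidations.
template <typename T>
class spsc_queue
{
    std::vector<T> buf_;
    std::atomic<std::size_t> head_{0};   // next slot to write (producer)
    std::atomic<std::size_t> tail_{0};   // next slot to read (consumer)
    std::size_t cached_tail_{0};         // producer's stale view of tail_
    std::size_t cached_head_{0};         // consumer's stale view of head_

public:
    explicit spsc_queue(std::size_t capacity) : buf_(capacity + 1) {}

    bool push(const T & value)
    {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % buf_.size();
        if (next == cached_tail_) {                        // looks full:
            cached_tail_ = tail_.load(std::memory_order_acquire);
            if (next == cached_tail_)                      // ...really full
                return false;
        }
        buf_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }

    bool pop(T & value)
    {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == cached_head_) {                        // looks empty:
            cached_head_ = head_.load(std::memory_order_acquire);
            if (tail == cached_head_)                      // ...really empty
                return false;
        }
        value = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);
        return true;
    }
};
```

With a mostly non-empty, non-full queue, the shared indices are re-read only once per batch rather than once per packet, which is the effect the bullet above describes.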
PF_DIRECT architecture (diagram)
Traffic generation with PF_DIRECT
Our experimental traffic generator, built on top of PF_DIRECT, consists of:
• A user-space application, where each thread of execution represents a source of traffic
• A traffic-source "Engine" (that can concurrently make use of different traffic models)
  • A user-space thread, one per core, running a deadline scheduler (~20 ns context switch)
• A user-defined traffic model (micro-thread) in charge of:
  • Creating the packet to be transmitted
  • Scheduling the timestamp for the packet transmission
  • Sending the packet through the PF_DIRECT socket (buffering it in the SPSC queue)
• XML composition blocks that allow a given source to be instantiated and bound to a core and to a given hardware queue
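The three micro-thread steps above can be sketched as follows. This is a hypothetical illustration, not the generator's API: `cbr_source` and `timed_packet` are invented names, and an in-memory vector stands in for the PF_DIRECT socket and its SPSC queue.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// A packet paired with its transmission deadline, expressed in TSC units
// as in the PF_DIRECT design above.
struct timed_packet
{
    uint64_t tsc;                    // transmission deadline (TSC ticks)
    std::vector<uint8_t> data;       // raw frame bytes
};

// Sketch of a CBR traffic model: emit n frames of frame_len bytes,
// period TSC ticks apart, starting at `start`.
std::vector<timed_packet> cbr_source(uint64_t start, uint64_t period,
                                     std::size_t n, std::size_t frame_len)
{
    std::vector<timed_packet> sink;  // stand-in for the PF_DIRECT socket
    uint64_t deadline = start;
    for (std::size_t i = 0; i < n; ++i) {
        timed_packet p;
        p.data.assign(frame_len, 0);  // 1) create the packet
        p.tsc = deadline;             // 2) schedule its timestamp
        sink.push_back(std::move(p)); // 3) "send" (buffer) it
        deadline += period;           // next CBR slot
    }
    return sink;
}
```

In the real system the kernel threads would then drain the queue and hold each frame until the local TSC reaches `p.tsc`.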
Traffic generator architecture (diagram)
Experimental results: 1G
Testbed:
• 1 Gb link
• Xeon 6-core X5650 @ 2.57 GHz, 12 GBytes RAM
• Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver
• PF_DIRECT for traffic generation
• Spirent AX-4000 Traffic Analyzer
Model: CBR, 64-byte frames with random IP addresses
• single source: 1 user-space thread
• single hardware queue: 1 kernel thread
• 1G link: CBR at 100 kpps, inter-arrival time
• 1G link: variable rate up to 1.4 Mpps
• 1G link: inter-arrival times of a Poisson process at 100 kpps
• 1G link: inter-arrival times of a Poisson process at 1 Mpps
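A Poisson source like the ones plotted above has exponentially distributed inter-arrival times with mean 1/rate. A minimal sketch of how such gaps could be drawn (illustrative only; `poisson_gaps` is not part of the generator):

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Sketch: inter-arrival gaps (in seconds) for a Poisson packet source
// at rate_pps packets per second. The exponential distribution's rate
// parameter is the packet rate, so the mean gap is 1/rate_pps.
std::vector<double> poisson_gaps(double rate_pps, std::size_t n,
                                 uint32_t seed)
{
    std::mt19937 gen(seed);
    std::exponential_distribution<double> gap(rate_pps);

    std::vector<double> out(n);
    for (auto & g : out)
        g = gap(gen);                // seconds until the next packet
    return out;
}
```

At 100 kpps the mean gap is 10 µs; each gap would be converted to TSC ticks and added to the previous deadline before handing the packet to the socket.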
Experimental results: 10G
Testbed:
• 10 Gb link
• Sender: Xeon 6-core X5650 @ 2.57 GHz, 12 GBytes RAM; Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver; PF_DIRECT for traffic generation
• Receiver: Xeon 6-core X5650 @ 2.57 GHz, 12 GBytes RAM; Intel 82599 multi-queue 10G Ethernet adapter, ixgbe 3.4.24 device driver; PFQ for traffic capture
Model: CBR, 64-byte frames with random IP addresses
• 1 user-space thread
• multiple hardware queues: 4 kernel threads
• 10G link: variable rate up to 12.8 Mpps
• 10G link: inter-arrival times of a Poisson process at 4 Mpps
• 10G link: throughput (bps)
Conclusions
• PF_DIRECT is a Linux socket that leverages the potential of multi-core architectures and multi-queue NICs
• PF_DIRECT decouples the task of packet generation from that of transmission
  • A single thread is able to generate non-trivial traffic close to the wire rate (~13 Mpps)
  • Multiple kernel threads transmit packets through multiple queues
  • Supports transmission timestamps (in TSC units)
• An experimental traffic generator runs on top of PF_DIRECT
Future work
• Release the PF_DIRECT source code
• Additional performance improvements in PF_DIRECT
  • Performance: identify a small set of changes, common to different drivers, that could define a "PF_DIRECT-aware driver"
• Implement a stable version of the traffic generator with complex traffic models