1
PC-based Software Routers: High Performance and Application Service Support
Author:
Raffaele Bolla, Roberto Bruschi
Publisher:
PRESTO’08
Presenter:
Hsin-Mao Chen
Date: 2010/02/24
3
Introduction
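[Figure: packet path in a Linux software router]
Network boards signal packet reception or transmission with a HW interrupt (IRQ).
The Linux kernel then schedules software IRQs (SoftIRQs) for packet processing.
Packets are exchanged with the boards through the TxRing and RxRing in RAM.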
4
Introduction
A SoftIRQ executes two main tasks:
1. The de-allocation of already-transmitted packets placed in the TxRing.
2. The actual packet forwarding operations, which handle the received packets in the RxRing.
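The two tasks above can be pictured with a toy model. This is only a sketch, not kernel code: the function name `softirq_poll`, the `budget` parameter, and the dictionary layout of a ring slot are assumptions made for illustration.

```python
from collections import deque

def softirq_poll(tx_ring, rx_ring, forward, budget=64):
    """Toy model of one SoftIRQ run (hypothetical names, not kernel APIs)."""
    # Task 1: de-allocate TxRing slots whose packets were already transmitted.
    freed = 0
    while tx_ring and tx_ring[0]["sent"]:
        tx_ring.popleft()
        freed += 1
    # Task 2: forward up to `budget` packets waiting in the RxRing.
    forwarded = 0
    while rx_ring and forwarded < budget:
        pkt = rx_ring.popleft()
        # The forwarded packet is queued on the TxRing, not yet transmitted.
        tx_ring.append({"pkt": forward(pkt), "sent": False})
        forwarded += 1
    return freed, forwarded
```

Calling it with a TxRing that holds one transmitted and one pending packet, plus two received packets, frees one slot and forwards two packets.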
5
Architectural Bottlenecks
Software router (SR) architecture based on a single CPU/core:
1. The SR computational capacity.
2. The bandwidth/latency of the I/O buses.
SR architecture based on multiprocessors:
Typical performance issues may sap the parallelization gain.
1. Data access serialization.
2. CPU/core cache coherence.
6
Architectural Bottlenecks
Data access serialization
SoftIRQ accesses to each TxRing are serialized by a code-locking procedure (the LLTX lock). This lock guarantees that each TxRing can be read or modified by only one SoftIRQ at a time.
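The serialization can be illustrated with ordinary threads. In this sketch a `threading.Lock` stands in for the LLTX lock and each thread plays the role of a SoftIRQ on a different core; the class `TxRing` and the function `run_softirqs` are hypothetical names, not kernel structures.

```python
import threading

class TxRing:
    """Toy TxRing guarded by one lock (a stand-in for the LLTX lock)."""
    def __init__(self):
        self.lock = threading.Lock()
        self.slots = []

    def enqueue(self, pkt):
        # Only one "SoftIRQ" may read or modify the ring at a time.
        with self.lock:
            self.slots.append(pkt)

def run_softirqs(ring, n_workers=4, pkts_each=1000):
    """Let n_workers threads contend for the same ring; return its length."""
    def worker(cpu):
        for i in range(pkts_each):
            ring.enqueue((cpu, i))
    threads = [threading.Thread(target=worker, args=(c,)) for c in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(ring.slots)
```

Every worker has to wait whenever another one holds the lock, which is the contention the slide refers to.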
7
Architectural Bottlenecks
CPU/core cache management
Whenever a CPU/core loads a TxRing into its local cache and updates it, all of the other processors also caching it must invalidate their cache copies.
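A rough feel for this cost comes from counting invalidations in a toy write-invalidate model. The sketch assumes that every access updates the ring, so the accessing core ends up as the only holder; the function `count_invalidations` and the trace format are illustrative assumptions, not a model of any specific coherence protocol.

```python
def count_invalidations(access_trace):
    """Count invalidated cache copies of one TxRing.

    access_trace: sequence of core ids, in the order they access the ring.
    Assumption: each access modifies the ring (toy write-invalidate model).
    """
    holders = set()       # cores that currently cache the ring
    invalidations = 0
    for core in access_trace:
        # Every other core holding a copy must drop it.
        invalidations += len(holders - {core})
        holders = {core}
    return invalidations
```

A ring touched alternately by two cores is invalidated on almost every access, while a ring that is always used by the same core never is.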
8
Multi-CPU/core Enhancements
HW evolution
Intel® Advanced Smart Cache: a mechanism that lets all the cores in the same processor share the level 2 cache.
Intel PRO 1000 adapters: they support multiple TxRings and RxRings and multiple HW IRQs per network interface.
9
Multi-CPU/core Enhancements
SW architecture
1. Bind all the operations carried out in forwarding a packet to a single CPU.
2. Reduce LLTX lock contention as much as possible.
3. Distribute the computational load equally among all the processors/cores in the system.
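One way to picture these goals is a static binding table in which each core owns its own TxRing on every output device and each RxRing is served by one core. The sketch below is an assumption-laden illustration: the function `bind_rings`, the device names, and the round-robin rule used when there are more RxRings than cores are not taken from the slides.

```python
def bind_rings(n_cpus, devices, rx_rings_per_device):
    """Build hypothetical CPU/ring binding tables.

    tx_binding[(device, cpu)]  -> index of the TxRing used only by that cpu
    rx_binding[(device, ring)] -> cpu that serves that RxRing
    """
    # One private TxRing per core on every device: no lock contention.
    tx_binding = {(dev, cpu): cpu for dev in devices for cpu in range(n_cpus)}
    # RxRings spread round-robin over the cores (assumed balancing rule).
    rx_binding = {
        (dev, ring): ring % n_cpus
        for dev in devices
        for ring in range(rx_rings_per_device)
    }
    return tx_binding, rx_binding
```

With such a table a packet received on a given RxRing is handled from reception to transmission by the same core, and no two cores ever share a TxRing.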
10
Multi-CPU/core Enhancements
CPU/core binding to TxRing: Bind each CPU/core to a different TxRing on each output device.
CPU/core binding to RxRing: Bind each RxRing to a different CPU/core.
Xeon core: 1 Mpkt/s
Gigabit Ethernet interface: 1.488 Mpkt/s with 64B sized frames
Fast Ethernet interface: 148.8 kpkt/s with 64B sized frames
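The line-rate figures follow from Ethernet framing arithmetic. The sketch below assumes the usual 20 bytes of per-frame overhead on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap); the function name `max_packet_rate` is hypothetical.

```python
def max_packet_rate(link_bps, frame_bytes=64, overhead_bytes=20):
    """Maximum packets per second on an Ethernet link.

    overhead_bytes: preamble (7) + start-of-frame delimiter (1)
    + inter-frame gap (12), assumed standard values.
    """
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_per_frame

gigabit = max_packet_rate(1e9)     # about 1.488 million packets per second
fast = max_packet_rate(100e6)      # about 148.8 thousand packets per second
```

At roughly 1 Mpkt/s per Xeon core, a single core falls short of the Gigabit Ethernet line rate with 64B frames, which is the motivation for spreading the load over several cores.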