14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the...
Transcript of 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the...
![Page 1: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/1.jpg)
14 Years Of PF_RINGHow packet capture acceleration evolved
from pf_ring.ko to PF_RING FT
Alfredo [email protected]
![Page 2: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/2.jpg)
SKYPE
MySQL
SSH Facebook
SSL
HTTP
DropBox
IMAP
Introduction
• Network Monitoring tools need high-speed, promiscuous, raw packet capture.
• Specialized adapters are often not affordable, or not flexible enough, or they do not provide an “open” API.
• Commodity network adapters and device drivers are designed for providing host connectivity and are not optimized for high-speed raw packet capture.
![Page 3: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/3.jpg)
PF_RING• PF_RING has been introduced in 2004 for
improving the performance of network monitoring applications, accelerating packet capture which was the main bottleneck on commodity hardware at that time.
• Packet capture does not mean just providing a buffer with the packet data, it also means providing a rich set of features for manipulating, filtering, and processing packets at high rates.
• PF_RING offers on commodity hardware (a standard PC) the ability to receive and transmit at wire-rate up to 100 Gbit.
PF_RING
Application
![Page 4: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/4.jpg)
The Bro Use Case• Bro IDS is a flexible network analysis framework providing:
• analyzers for many protocols
• a scripting language to define monitoring policies
• application-layer state
• many other nice features
• CPU bound application due to processor-intensive features
• Let’s see how PF_RING has been used to accelerate Bro throughout its evolution..
![Page 5: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/5.jpg)
Packet Capture Evolution
![Page 6: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/6.jpg)
The Kernel Module
pf_ring.ko Cluster TNAPI/RSS DNA ZC
• The pf_ring module copies packets from the card to a circular buffer.
• The application reads packets directly from the circular buffer.
Capture performance: up to 1-2 Mpps
Bro performance: 280 Kpps
PF_RING.ko
Application
Circular Buffer
Kernel
![Page 7: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/7.jpg)
Load Balancing
pf_ring.ko Cluster TNAPI/RSS DNA ZC
• A PF_RING Cluster distributes traffic to multiple threads or application instances.
• It keeps flow coherency (all packets for the same flow to the same thread).
• Each application instance needs to handle a portion of the whole stream.
Capture performance: up to 1-2 Mpps
Bro performance: 1 Mpps on 4 cores
PF_RING.ko
Thread #1
Kernel
Thread #2
Thread #3
Thread #4
![Page 8: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/8.jpg)
Hardware Load-Balancing
pf_ring.ko Cluster TNAPI/RSS DNA ZC
• RSS is a hardware technology that distributes the load across multiple queues keeping flow coherency.
• TNAPI (deprecated) was a multi-threaded driver, able to fully exploit RSS and multi-core architectures to deliver data through independent data streams.
Capture Performance: up to 8-10 Mpps on 4 cores
Bro performance: 1 Mpps on 4 cores
PF_RING.ko
Kernel
Thread #1
Thread #2
Thread #3
Thread #4
![Page 9: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/9.jpg)
Zero-Copy Drivers
pf_ring.ko Cluster TNAPI/RSS DNA ZC
• DNA (today known as ZC Driver) is a kernel-bypass technology for Intel cards.
• Packets get copied by the card directly into the application memory.
Capture Performance: up to 15–20 Mpps/core
Bro performance: 1 Mpps on 4 cores
DMA copy
Buffer
Application
Kernel
PF_RING DNA
![Page 10: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/10.jpg)
Zero-Copy API
pf_ring.ko Cluster TNAPI/RSS DNA ZC
• PF_RING ZC provides a flexible API to create full zero-copy processing patterns (load-balancing, pipelining, etc).
• Inter-VM support with KVM.
• Multi-vendor FPGA support.
App 2
KVM
App 3
App 1PF_RING
ZCPF_RING
ZC
PF_RING ZC
![Page 11: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/11.jpg)
Capture Performance vs Application Performance
• Capture speed with Intel cards using 1 core @ 10 Gbit:M
pps
02,5
57,510
12,515
PCAP AF_PACKET PF_RING PF_RING ZC
10 Gbit
• Scale up to 100 Gbps with FPGA adapters using multiple cores.
• Ok, capture speed is impressive, but how to further improve application performance (without touching the code)?
![Page 12: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/12.jpg)
Packet Filtering Evolution
![Page 13: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/13.jpg)
Software Filtering - BPF
Kernel Filtering
Hardware Filtering nBPF nBroker FT/nDPI
• BPF filters (those supported by applications like tcpdump and Wireshark), are compiled into BPF bytecode and executed by the BPF Virtual Machine, for each received packet.
PF_RING.ko
Application
Kernel
(000) ldh [12] (001) jeq #0x86dd jt 17 jf 2 (002) jeq #0x800 jt 3 jf 17 (003) ldb [23] (004) jeq #0x6 jt 5. jf 17 (005) ld [26] (006) jeq #0x1020304 jt 7 jf 17 (007) ld [30] (008) jeq #0x5060708 jt 9 jf 17 (009) ldh [20] (010) jset #0x1fff jt 17 jf 11 (011) ldxb 4*([14]&0xf) (012) ldh [x + 14] (013) jeq #0x50 jt 16 jf 14 (014) ldh [x + 16] (015) jeq #0x50 jt 16 jf 17 (016) ret #262144 (017) ret #0
tcp and src host 1.2.3.4 and dst host 5.6.7.8 and port 80
BPF Bytecode
![Page 14: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/14.jpg)
Software Filtering - Rules
Kernel Filtering
Hardware Filtering nBPF nBroker FT/nDPI
• Software filtering rules consist of:
• Packet elements to match (ip, port, protocol, etc).
• Action to be performed when a packet matches the filter (pass, discard, forward, etc).
PF_RING.ko
Application
Kernel
<1.2.3.4, 5.6.7.8, 80, tcp>
<src host 1.2.3.4, dst host 5.6.7.8, port 80, protocol tcp, discard>
Rule
![Page 15: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/15.jpg)
Hardware Filtering Rules
Kernel Filtering
Hardware Filtering nBPF nBroker FT/nDPI
• Hardware filtering rules are evaluated by the network card (no CPU overhead).
• Available on Intel (82599/X520), Silicom Redirector, others..
PF_RING.ko
Application
Kernel
<1.2.3.4, 5.6.7.8, 80, tcp>
<src host 1.2.3.4, dst host 5.6.7.8, port 80, protocol tcp>
Rules
![Page 16: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/16.jpg)
BPF to Filtering Rules
Kernel Filtering
Hardware Filtering nBPF nBroker FT/nDPI
• nBPF is a filtering engine supporting the well-known BPF syntax.
• It can generate hardware rules, and filter in software what is not filtered by the card.
• Able to generate hardware rules for FPGA adapters to filter traffic at 100 Gbit.
PF_RING.ko
Application
Kernel
tcp and src host 1.2.3.4 and dst host 5.6.7.8 and port 80
Rules
<1.2.3.4, 5.6.7.8, 80, tcp>
nBPF
![Page 17: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/17.jpg)
Hardware Traffic Steering
Kernel Filtering
Hardware Filtering nBPF nBroker FT/nDPI
• nBroker is a library for controlling traffic filtering and steering on Intel 100 Gbit adapters (FM10000).
• CLI tool to easy the internal switch configuration:> port eth1 match shost 10.0.0.1 dport 80 steer-to eth2
4x 10G
40G/100G
ApplicationX
![Page 18: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/18.jpg)
Application
L7 Filtering
Kernel Filtering
Hardware Filtering nBPF nBroker FT/nDPI
• PF_RING FT (Flow Table) is a highly optimized library able to classify L7 traffic.
• It leverages on nDPI to detect application protocols (250+ protocols including Facebook, Skype, Youtube, BitTorrent, …)
Performance: 10+ Mpps/core
Bro performance (filtering out multimedia traffic*): 1.6 Mpps / 10 Gbit Internet traffic on 4 cores (+60%)
* Multimedia traffic (NetFlix, Spotify, etc) is not really interesting for an IDS..
PF_RING FT
Flow Table
nDPI
Fragment Cache
![Page 19: 14 Years Of PF RING - ntop · PF_RING • PF_RING has been introduced in 2004 for improving the performance of network monitoring applications, accelerating packet capture which was](https://reader035.fdocuments.in/reader035/viewer/2022062414/5ebc95534d738955160d6e20/html5/thumbnails/19.jpg)
Conclusions• Capture speed and (filtering) features has
been improved in PF_RING over the years to assist applications processing high traffic rate.
• Moving to 40 or 100 Gbit accelerating packet capture is not enough, network speed grows faster than CPU speed!
• The best way for improving the performance of CPU bound applications is to scale with cores and nodes, and filter as much as possible.
• Do not confuse capture performance with application performance.