TurboFlow: Information Rich Flow Record Generation on … · 2020-02-06 · TurboFlow Architecture:...
Transcript of TurboFlow: Information Rich Flow Record Generation on … · 2020-02-06 · TurboFlow Architecture:...
TurboFlow: Information Rich Flow Record Generation on
Commodity SwitchesJohn Sonchack1, Adam J. Aviv2, Eric Keller3, Jonathan M. Smith1
1University of Pennsylvania, 2USNA, 3University of Colorado
Introduction: Network Monitoring with Flow Records
Flow record <srcIp, dstIp,
srcPort, dstPort, arrivalTs,
avgInterArrival, pktsDroppedCt, queueLen, …>
2
Flow record <srcIp, dstIp,
srcPort, dstPort, arrivalTs,
avgInterArrival, pktsDroppedCt, queueLen, …>
Traffic Engineering SecurityDebugging
3
Introduction: Network Monitoring with Flow Records
Flow record <srcIp, dstIp,
srcPort, dstPort, arrivalTs,
avgInterArrival, pktsDroppedCt, queueLen, …>
Traffic Engineering SecurityDebuggingLow throughput because packets dropped at switch
2!
4
Introduction: Network Monitoring with Flow Records
Flow record <srcIp, dstIp,
srcPort, dstPort, arrivalTs,
avgInterArrival, pktsDroppedCt, queueLen, …>
Traffic Engineering SecurityDebuggingCongestion at
switch 2!
5
Introduction: Network Monitoring with Flow Records
Flow record <srcIp, dstIp,
srcPort, dstPort, arrivalTs,
avgInterArrival, pktsDroppedCt, queueLen, …>
Traffic Engineering SecurityDebuggingCongestion at
switch 2!
6
Introduction: Network Monitoring with Flow Records
Flow record <srcIp, dstIp,
srcPort, dstPort, arrivalTs,
avgInterArrival, pktsDroppedCt, queueLen, …>
Traffic Engineering SecurityDebuggingHosts 2 and 4 are in a botnet!
From botnet 7
Introduction: Network Monitoring with Flow Records
Flow record <srcIp, dstIp,
srcPort, dstPort, arrivalTs,
avgInterArrival, pktsDroppedCt, queueLen, …>
Traffic Engineering SecurityDebugging
3.2$Tb/s >$100$M$packets/s
>$10$M$flows/s
8
Introduction: Network Monitoring with Flow Records
9
Flow Monitoring Switches: Prior Work
Sampled Packets InaccurateSampling Flow
Records
Custom Hardware Offloading
Server Offloading Expensive
Restrictive
Packets (or other records)
Flow Monitoring Switches: Prior Work
10
Sampled Packets InaccurateSampling
Flow Records
Flow Records
Packets (or other records)
Flow Records
Introduction: TurboFlowMain idea: Optimize instead of offload. Q : What can we get out of the programmable hardware in next-generation commodity switches?
Onboard MicroserversProgrammable Forwarding Engines
11
Introduction: TurboFlowMain idea: Optimize instead of offload. Q : What can we get out of the programmable hardware in next-generation commodity switches?
A : Flow record generation for multi-terabit rate traffic without sampling or offloading.
Onboard MicroserversProgrammable Forwarding Engines
12
13
Programmable Forwarding
Engine
MicroserverTurboFlow
Flow Record Generation
Flow Records
Packets
Pre-aggregation
Partial Flow
Records
Introduction: TurboFlow
• Introduction
• Architecture
• Evaluation
• Conclusion
14
Outline
Flow Records
Packets
Programmable Forwarding
Engine
MicroserverTurboFlow
Pre-aggregation
Partial Flow
Records
Flow Record Generation
15
TurboFlow Architecture
Flow Records
Packets
Programmable Forwarding
Engine
MicroserverTurboFlow
Partial Flow
Records
Pre-aggregation
Flow Record Generation
Forwarding Engine
Switch CPU
Background: Programmable Forwarding Engines
16
Stateful VariablesActionMatch
Forwarding Engine
Background: Programmable Forwarding Engines
17
Packet Count
Average Interarrival
…
Update 3 1 ms …Update 49 8 ms …Update 3 42 ms …
Match Stateful VariablesAction
Flow (IP 5-tuple)
A -> BE -> GF -> G
Switch CPU
Forwarding Engine
Background: Programmable Forwarding Engines
18
Switch CPU
Packet Count
Average Interarrival
…
Update 3 1 ms …Update 49 8 ms …Update 3 42 ms …
Match Stateful VariablesAction
Flow (IP 5-tuple)
A -> BE -> GF -> G
Table Manager
Rule installation rate: < 10 K / s
Flow arrival rate @ 1 Tb/s:
> 10,000 K / s
Forwarding Engine
Switch CPU
TurboFlow Architecture: Using the FE Efficiently
19
Table Manager
Forwarding Engine
Switch CPU
TurboFlow Architecture: Using the FE Efficiently
20
Current Flow
(IP 5-tuple)Packet Count
Average Interarrival
…
Match Stateful Variables
Forwarding Engine
Switch CPU
TurboFlow Architecture: Using the FE Efficiently
21
Match Stateful Variables
Flow Key Hash
1234
Current Flow
(IP 5-tuple)Packet Count
Average Interarrival
…
Table Manager
Forwarding Engine
Switch CPU
TurboFlow Architecture: Using the FE Efficiently
22
Flow Key (IP 5-tuple)
Packet Count
Average Interarrival
…
A -> B 4 3 ms …C -> D 49 8 ms …E -> F 3 42 ms …Z -> Q 9 10 ms …
Match Stateful Variables
Flow Key Hash
1234
HASH
Current Flow
(IP 5-tuple)Packet Count
Average Interarrival
…
A -> B 4 3 ms …C -> D 49 8 ms …E -> F 3 42 ms …Z -> Q 9 10 ms …
Tracked Flow: Update Counters
A->B
Forwarding Engine
Switch CPU
TurboFlow Architecture: Using the FE Efficiently
23
Match Stateful Variables
Flow Key Hash
1234
HASH
Tracked Flow: Update Counters
A->B
G->H
Untracked Flow: Replace colliding record, send it to CPU
Z -> Q: 9 10 ms …
Record Aggregator
Tracked Flow
(IP 5-tuple)Packet Count
Average Interarrival
…
A -> B 4 3 ms …C -> D 49 8 ms …E -> F 3 42 ms …G -> H 1 — …
24
TurboFlow Design
Flow Records
Packets
Programmable Forwarding
Engine
MicroserverTurboFlow
Partial Flow
Records
Pre-aggregation
Flow Record Generation
Count: 12Count: 2Count: 10
TurboFlow Architecture: Using the CPU Efficiently
25
Partial Flow
Records
Flow Records
Key Count
A->B 12
Flow Stats Dictionary
TurboFlow Architecture: Using the CPU Efficiently
Partial Flow
Records
Flow Records
Count: 12Key Count
A->B 12
Flow Stats Dictionary
Count: 2Count: 10
26
Optimization Performance Vs. Baseline
baseline (std::unordered_map)
-
Reduce Pointer Operations
1.64X
Vectorize Key Comparison
3.79X
Batch and Prefetch
4.9X
average of 146 cycles spent per partial flow record.
27
Outline
• Introduction
• Architecture
• Evaluation
• Conclusion
Flow Records
Packets
Programmable Forwarding
Engine
MicroserverTurboFlowOptimized
Flow Record Generation
Pre-aggregation
Partial Flow
Records
28
Implementation and Evaluation
Benchmark Workloads • 10 Gb/s Internet
Router Traces (CAIDA 2015)
• 144 Node Simulated Datacenter Cluster (YAPS simulator)
ImplementationsP4 Switch
(3.2 Tb/s Barefoot Tofino)
P4 SmartNIC (40 Gb/s Netronome NFP)
29
Implementation and Evaluation
Benchmark Workloads • 10 Gb/s Internet
Router Traces (CAIDA 2015)
• 144 Node Simulated Datacenter Cluster (YAPS simulator)
ImplementationsP4 Switch
(3.2 Tb/s Barefoot Tofino)
P4 SmartNIC (40 Gb/s Netronome NFP)
30
Required Average Throughput to Monitor 100 X 10 Gb/s Internet Links
0.0 0.2 0.4 0.6 0.8 1.00
10
20
30
40
Parti
alFl
owR
ecor
dpe
rSec
ond
(Mill
ions
)
No FE AggregationFE Aggregation with 5 MB FE Memory ( 26%)
31
0.0 0.2 0.4 0.6 0.8 1.00
10
20
30
40
Parti
alFl
owR
ecor
dpe
rSec
ond
(Mill
ions
)
No FE AggregationFE Aggregation with 5 MB FE Memory ( 26%)
Partial aggregation using 5 MB of FE memory reduces workload by ~4X.
Required Average Throughput to Monitor 100 X 10 Gb/s Internet Links
32
0.0 0.2 0.4 0.6 0.8 1.00
10
20
30
40
Parti
alFl
owR
ecor
dpe
rSec
ond
(Mill
ions
)
No FE AggregationFE Aggregation with 5 MB FE Memory ( 26%)
1 2 3 4Switch CPU Cores
0
10
20
30
40
Parti
alFl
owR
ecor
dpe
rSec
ond
(Mill
ions
)
std::unordered map Fully Optimized
No FE AggregationFE Aggregation with 5 MB FE Memory ( 26%)
Optimizations improve performance by ~5X.
Required Average Throughput to Monitor 100 X 10 Gb/s Internet Links
33
0.0 0.2 0.4 0.6 0.8 1.00
10
20
30
40
Parti
alFl
owR
ecor
dpe
rSec
ond
(Mill
ions
)
No FE AggregationFE Aggregation with 5 MB FE Memory ( 26%)
1 2 3 4Switch CPU Cores
0
10
20
30
40
Parti
alFl
owR
ecor
dpe
rSec
ond
(Mill
ions
)
std::unordered map Fully Optimized
No FE AggregationFE Aggregation with 5 MB FE Memory ( 26%)
FE pre-aggregation + optimizations = terabit rate workloads using 1 core and ~26% of FE memory.
Required Average Throughput to Monitor 100 X 10 Gb/s Internet Links
34
Outline
• Introduction
• TurboFlow Design
• Implementation and Evaluation
• Conclusion
In the Paper
Cost analysis
More interesting flow features Pipeline layouts
Psuedocode
Expected worst case analysis
More Evaluation
35
Conclusion (and Thank You for Listening!)
• Flow records are important for monitoring, but difficult to generate at the switch due to high traffic rates.
• TurboFlow is a flow record generator carefully optimized for next generation commodity switch hardware that scales to multi-terabit rate traffic without sampling.
36