SOCSA Slides: Network Processors
© Institute for Integrated Systems Technische Universität München www.lis.ei.tum.de
Case Study: Network Processors
System-on-Chip
Solutions & Architectures A. Herkersdorf
Network Processors
Motivation
Where will network processors be used?
What is the "business case" for network processors?
Status of today's NP products
IC Requirements for Network Processors
Processor/CPU Requirements
Memory Requirements
On-chip Interconnect Requirements
Real-world case studies
[Figure: network overview with labels: firewall, edge router, server adapters (NIC), cable, core router, base station controllers, cable headend, LAN/SAN switch, Internet router, SONET/SDH transmission]
Networking Trends
• Voice/data integration, both in WAN backbone and wireless access
• New broadband access technologies: xDSL, cable
• Need for service differentiation: interactive real-time, streaming, best effort
• Security (crypto, authentication): business use of the Internet, prevention of attacks (firewalls, virus scanning)
• Emerging standards still in flux: IETF, ETSI, ANSI
• Ever-growing data link rates
[Figure: packet processing scope over time (labels: ASIC, Network Processor). History: L2 / VLAN. Today: L2 / VLAN, Layer 3: IP routing. Current trend: L2 / VLAN, Layer 3: IP routing, Layer 4: DiffServ, security, filtering. Future: L2 / VLAN, Layer 3: IP routing, Layer 4: DiffServ, security, filtering, deep packet processing.]
Demand for high performance and flexibility in future networking components
Network Processor Status
• No industry-standard NP architecture established yet
• High conceptual diversity among different vendors
• High-end tends towards chip-set solutions, low-end towards fully integrated NPs
• Market focus is on multi-Gb EN and OC-48
Trends
• Towards high-level programming language support
• Standard third-party tool chain
• Push towards standardized interfaces
• Coprocessor-NP interworking among different vendors
• NP Forum (API, Streaming / LookAside i/f), IETF ForCES
                LSI                  AMCC               Netronome         Xelerated
product         APP3300              nP3700             IXP2855           X11
Gbps            0.6-3.5              5                  5                 10-24
Op. freq.       290 MHz              700 MHz            1.5 GHz           240 MHz
# proc.         2C + ? D             3                  16                160
# threads       ?                    72                 128               160
proc. struct.   SMP CP, D-pipeline   SMP, 1 thr./pkt    Pipeline          Pipeline
# coproc.       10                   5                  3                 10
mem type        DDR-RAM, Flash       QDR-SRAM, RLDRAM   QDR-SRAM, RDRAM   RL-/DDR2, SRAM
# chips         1                    1                  1                 1
power [W]       [email protected]           ?                  27-32             10 (X10q)
Sources: www.lsi.com; www.amcc.com; www.intel.com; www.xelerated.com (2008)
LSI APP3300
NP targeted for
• wireline access (MSAN, DSLAM)
• wireless infrastructure (Node B/RNC)
• multi-service switch/router
and integrates
• classifier
• traffic manager
• control&service processor (2 ARM11)
• security protocol processor (1.5 Gbit/s)
• high-level language programmable data path processor pipeline (proprietary)
• MACs for different interface types
Source: www.lsi.com
AMCC – nP3700
• Multi-threaded SMP: 3 CPU cores with 24 threads each
• Dedicated coprocessors: traffic management (schedule, shape, queuing), policy / rule mgmt., hashing (search), statistics
• Fast QDR SRAM and RLDRAM interfaces
• Applications: VoIP/media GW, edge routers, WLAN enterprise access
Source: www.amcc.com
Netronome IXP2855 (formerly Intel)
16-core processor pipeline:
• 8 threads/core
• special core2core interconnect
• max. 1.5 GHz
Two Crypto-Engines:
• max. 10 Gbps IPSec
Hash Engine
Multiple Memory connections:
• 4x QDR SRAM/TCAM
• 3x RDRAM
XScale control plane processor (750 MHz)
Sources: www.intel.com, www.netronome.com
Xelerated X11
Multi-stage processor pipeline:
• 5x 32 PISC processors, 1 instr./stage, deterministic performance
• 5 engine access points for coprocessor calls
10 co-processors
Buffer Manager (Q, TS)
Memory interfaces:
• RLDRAM/DDR2-SDRAM
• SRAM/TCAM i/f
• internal TCAM
Source: www.xelerated.com
Anatomy of Switch/Router Systems
Determines box function: Switch, Router, Gateway, etc.
[Figure: block diagram of a switch/router system with line interface, network processor, switch fabric, system processor, and backplane]
NP CPU Performance Requirements
Network processing
• Wide performance spectrum (x15) between applications for the same link rate
• High absolute MIPS
[Figure: required MIPS per application, ranging from 800 MIPS to 12 K MIPS]
Q: Determine the back-to-back packets/s arrival rate on OC-3 to OC-192 links, and the required per-packet instruction budget for L2 switching, QoS forwarding, and virus scanning at OC-48. Assume a packet size of 64 bytes.
Source: [3] Jenkins, "NPU Co-Processors", 2000
NP CPU Performance Requirements
$$\text{pkt/s} = \frac{\text{link rate [b/s]}}{\text{pkt size [b/pkt]}}$$
$$\text{Instr/pkt} = \frac{\text{MIPS}}{\text{Mpkt/s}}$$
$$\text{Instr/pkt}\,\big|_{\text{L2}} = 800 / 4.86 = 164$$
$$\text{Instr/pkt}\,\big|_{\text{QoS}} = 3000 / 4.86 = 617$$
$$\text{Instr/pkt}\,\big|_{\text{virus}} = 12000 / 4.86 = 2470$$

         OC-3      OC-12     OC-48     OC-192
Mbps     155.5     622.1     2488      9953
pkt/s    304 K     1.21 M    4.86 M    19.44 M
s/pkt    3.29 µ    823 n     206 n     51 n
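Q: For the same applications, what are the Instr/pkt requirements for OC-3 and OC-192?
A: Same as for OC-48. The work per packet depends on the application, not on the link rate; only the required MIPS scale with the packet rate.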
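The packet-rate table and the per-packet instruction budgets above can be reproduced with a few lines of Python. This is a minimal sketch: the function and constant names are illustrative, the MIPS figures are the ones quoted on the slide for OC-48, and the unrounded results differ slightly from the slide's rounded values (for example about 164.6 instead of 164).

```python
# Sketch: back-to-back packet rates and per-packet instruction budgets.
# All names are illustrative; the numbers are taken from the slide.
PACKET_BITS = 64 * 8  # 64-byte packets, as assumed in the question

LINK_RATE_MBPS = {"OC-3": 155.5, "OC-12": 622.1, "OC-48": 2488.0, "OC-192": 9953.0}
REQUIRED_MIPS_AT_OC48 = {"L2 switching": 800, "QoS forwarding": 3000, "virus scanning": 12000}


def packets_per_second(link_rate_mbps: float) -> float:
    """Back-to-back packet arrival rate in packets/s."""
    return link_rate_mbps * 1e6 / PACKET_BITS


def instructions_per_packet(mips: float, link_rate_mbps: float) -> float:
    """Instruction budget per packet: MIPS divided by Mpkt/s."""
    return mips * 1e6 / packets_per_second(link_rate_mbps)


if __name__ == "__main__":
    for link, rate in LINK_RATE_MBPS.items():
        pps = packets_per_second(rate)
        print(f"{link:7s} {pps / 1e6:6.2f} Mpkt/s  {1e9 / pps:7.1f} ns/pkt")
    for app, mips in REQUIRED_MIPS_AT_OC48.items():
        budget = instructions_per_packet(mips, LINK_RATE_MBPS["OC-48"])
        print(f"{app:15s} {budget:7.1f} instr/pkt")
```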
NP CPU Performance Requirements
How many embedded processor cores would we need to operate in parallel to support virus scanning at OC-48 rate?
Embedded processor core:
• 32 bit, dual-issue, single-threaded RISC CPU, 600 MHz, CPI = 0.7
• 32 KByte I/D cache
• 4.5 W, 10 mm², 0.13 µm CMOS
System memory:
• 133 MHz DDR SDRAM
• 32 bit data bus, 1 M words in each of 4 banks
• 8 – 0.5 access cycles
Assumptions:
• D-cache miss rate: 2 %
• I-cache miss rate: 0
• Data access freq.: 20 %
[Figure: N processing elements (PE) connected to shared memory (Mem) and I/O]
NP CPU Performance Requirements
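Microprocessor lecture:
$$\text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
• Instructions / Program: application specific
• Clock cycles / Instruction (CPI): CPU and memory hierarchy dependent
• Seconds / Clock cycle: $1 / f_{\text{CPU}}$
For virus scanning at OC-48 with N cores, where CPI is still to be determined:
$$N \times 206\,\text{ns} = 2470 \times \text{CPI} \times \frac{1}{600\,\text{MHz}}$$
$$\text{CPI} = \text{CPI}_{\text{CPU}} + \text{CPI}_{\text{MEM}} = \text{CPI}_{\text{CPU}} + f(D_{\text{acc}}) \times \text{Cache miss rate} \times \text{Cache miss penalty}$$
$$= 0.7 + 0.2 \times 0.02 \times 8 \times \frac{600}{133} = 0.7 + 0.14 = 0.84$$
$$N = \left\lceil \frac{2470 \times 0.84}{600\,\text{MHz} \times 206\,\text{ns}} \right\rceil = \lceil 16.78 \rceil = 17$$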
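The same calculation as a short Python sketch. The constant and function names are illustrative assumptions; the values are the ones given on the preceding slides. Because the code does not round CPI to 0.84, the intermediate result is about 16.9 rather than 16.78, but the required core count after rounding up is the same.

```python
import math

# Values from the slides; names are illustrative.
F_CPU_MHZ = 600.0        # CPU clock
F_MEM_MHZ = 133.0        # system memory clock
CPI_CPU = 0.7            # CPI without memory stalls
DATA_ACCESS_FREQ = 0.2   # fraction of instructions that access data
MEM_ACCESS_CYCLES = 8    # memory clock cycles per cache miss
INSTR_PER_PKT = 2470     # virus scanning budget at OC-48
PKT_TIME_NS = 206.0      # packet inter-arrival time at OC-48


def cpi_total(miss_rate: float) -> float:
    """CPI including D-cache miss stalls, in CPU clock cycles."""
    miss_penalty = MEM_ACCESS_CYCLES * F_CPU_MHZ / F_MEM_MHZ
    return CPI_CPU + DATA_ACCESS_FREQ * miss_rate * miss_penalty


def cores_needed(miss_rate: float) -> float:
    """Unrounded number of cores needed to keep up with the packet rate."""
    cpu_time_ns = INSTR_PER_PKT * cpi_total(miss_rate) * 1e3 / F_CPU_MHZ
    return cpu_time_ns / PKT_TIME_NS


if __name__ == "__main__":
    n = cores_needed(0.02)
    print(f"CPI = {cpi_total(0.02):.3f}, N = ceil({n:.2f}) = {math.ceil(n)}")
```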
NP CPU Performance Requirements
Q: By how much does N increase when the D-cache miss rate increases to 5 %?
CPI_MEM = 0.36
N' = ceil(21.18) = 22
Five more processor cores (+ 22.5 W) when the cache miss rate increases from 2 % to 5 %
N = 17 means: 170 mm² (= 13 x 13 mm) and 76.5 W
Reasons why our calculation is even optimistic:
• No interconnect latencies nor contention on shared memory access considered
• No runtime kernel (or OS) on CPU considered
• D-cache miss rate might well be higher depending on application context size
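The sensitivity to the D-cache miss rate, and the resulting power and area totals, can be explored with a small sweep. This is a sketch under the slide's assumptions (4.5 W and 10 mm² per core, no interconnect or OS overhead); the function names and the extra 10 % data point are illustrative and not from the slides.

```python
import math

CORE_POWER_W = 4.5     # per-core power from the CPU assumptions
CORE_AREA_MM2 = 10.0   # per-core area from the CPU assumptions


def cores_needed(miss_rate, instr_per_pkt=2470, pkt_time_ns=206.0,
                 cpi_cpu=0.7, data_access_freq=0.2,
                 mem_access_cycles=8, f_cpu_mhz=600.0, f_mem_mhz=133.0):
    """Whole number of cores needed for a given D-cache miss rate."""
    penalty = mem_access_cycles * f_cpu_mhz / f_mem_mhz  # CPU cycles per miss
    cpi = cpi_cpu + data_access_freq * miss_rate * penalty
    return math.ceil(instr_per_pkt * cpi * 1e3 / f_cpu_mhz / pkt_time_ns)


def budget(miss_rate):
    """Core count with total power, total area, and edge of a square die."""
    n = cores_needed(miss_rate)
    area = n * CORE_AREA_MM2
    return {"cores": n, "power_w": n * CORE_POWER_W,
            "area_mm2": area, "die_edge_mm": math.sqrt(area)}


if __name__ == "__main__":
    for miss_rate in (0.02, 0.05, 0.10):  # 10 % is an extra illustrative point
        b = budget(miss_rate)
        print(f"miss {miss_rate:4.0%}: {b['cores']:2d} cores, "
              f"{b['power_w']:5.1f} W, {b['area_mm2']:5.0f} mm2, "
              f"{b['die_edge_mm']:4.1f} mm edge")
```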
Conclusions (1)
• No one-size-fits-all solution for NPs, not even within one speed rate
• Different NP solutions for different application ranges and network locations
• CPU-only resources aren't the way to approach network processing
Great insight! But what is it then that makes up network processors?
NP Applications by Sub-Function
Imbalance in processing requirement between NP sub-functions
Packet classification, security, and traffic management are the most performance-demanding functions
Q: What are the remaining CPU performance requirements for firewalls and security network interface cards (SecNIC) if classification, traffic shaping, encryption, and compression were implemented by specific coprocessors?
A: Firewall: ≈ 10 %, SecNIC: ≈ 15 %
Network Processor Resource Mix
[Figure: design space of SoC resources (CPU, DSP, ASIP, FPGA, ASIC, custom IC) and the network processor. Axes: log computational density (= performance / area), log power consumption, log flexibility / functional diversity.]
Network Processors address demand for high flexibility and performance with a mix of different SoC resources
The M-$ question: What is the right balance between these resources and … how to assemble them into a homogeneous, high-performance system?
Network Processor Resource Mix
[Figure: SoC resource classes (ASIP, DSP, CPU, FPGA, ASIC, custom IC) combined in a network processor]
ASIC resources
• Link interfaces: EN Phy/MAC, SONET framer
• Memory controller
• Std. peripherals
FPGA/ASIP
• Generic coprocessors: classification, traffic mgmt., etc.
CPUs
• Reserved for functionality with highest flexibility demand
• Secure the "future proof" of the device
Hardware multi-threading (see Microprocessor lecture) for CPUs is strongly recommended to hide memory and coprocessor access latencies!
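The latency-hiding argument can be illustrated with a deliberately simple utilization model. The model is an assumption made for illustration and does not come from the slides: each thread runs for a fixed number of cycles, then stalls for a fixed latency, and switching threads costs nothing. Only the parameter values reuse the earlier CPU example (CPI = 0.7, 20 % data accesses, 2 % miss rate, 8 memory cycles at 133 MHz seen from a 600 MHz core). The function names are hypothetical.

```python
import math


def utilization(threads: int, run_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles the core does useful work (idealized model)."""
    return min(1.0, threads * run_cycles / (run_cycles + stall_cycles))


def threads_to_hide(run_cycles: float, stall_cycles: float) -> int:
    """Smallest thread count that keeps the core fully busy in this model."""
    return math.ceil((run_cycles + stall_cycles) / run_cycles)


if __name__ == "__main__":
    # One D-cache miss every 1 / (0.2 * 0.02) = 250 instructions at CPI 0.7.
    run = 0.7 / (0.2 * 0.02)
    # Miss penalty in CPU cycles: 8 memory cycles scaled by 600 MHz / 133 MHz.
    stall = 8 * 600 / 133
    for t in (1, 2, 3):
        print(f"{t} thread(s): utilization {utilization(t, run, stall):.2f}")
    print("threads needed to hide the stall:", threads_to_hide(run, stall))
```

Longer stalls, such as coprocessor calls, increase the required thread count in this model. Real cores also pay switching and contention costs that the sketch ignores.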