October 2013

2
• Data Path Acceleration Architecture (DPAA)
• Buffer Manager
• Queue Manager
• Frame Manager
• Accelerator Overview
− Security Engine
− Pattern Match Engine
− RapidIO Message Manager
− Decompression/Compression Engine
− Data Center Bridging
3

Figure: DPAA block diagram: the cores, Ethernet interfaces, and RapidIO messaging are tied to the accelerator blocks: QMan (Queue Manager), BMan (Buffer Manager), SEC (Security Engine), PME (Pattern Match Engine), FMan (Frame Manager), and RMan (RapidIO Manager).

The DPAA is a collection of cores, HW accelerators and bridges, tied together by HW Buffer and Queue Managers.
4

Hardware Accelerators

• FMan (Frame Manager): 50 Gbps aggregate parse, classify, distribute
• BMan (Buffer Manager): 64 buffer pools
• QMan (Queue Manager): up to 2^24 queues
• RMan (RapidIO Manager): seamless mapping of sRIO to the DPAA
• SEC (Security Engine): 40 Gbps IPSec/SSL; public key, 25K/s 1024b RSA
• PME (Pattern Matching Engine): 10 Gbps aggregate
• DCE (Data Compression Engine): 20 Gbps aggregate

Saving CPU cycles for higher-value work:
• Compresses and decompresses traffic crossing the Internet
• Protects against internal and external Internet attacks
• Frees the CPU from draining repetitive RSA, VPN and HTTPS traffic
• Identifies traffic and targets a CPU or accelerator
• New enhanced line rate: 50 Gbps networking
• Quality of Service for FCoE in converged data center networking
5
• Buffer: Unit of contiguous memory, allocated by software
• Frame: Single buffer or list of buffers that hold data, for example, packet payload, header, and other control information
• Frame Descriptor (FD): Proxy structure used to represent frames
• Frame Queue:
− FIFO of (related) Frame Descriptors (e.g. TCP session)
− The basic queuing structure supported by QMan
• Frame Queue Descriptor (FQD): Structure used to manage Frame Queues
Figure: An Ethernet frame (preamble, dest addr, src addr, type, data, CRC) is held in one or more buffers; each frame is represented by an FD, and an FQD heads a FIFO of FDs (FQD → FD → FD → ...).
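The definitions above can be sketched as a toy model: an FD carries a buffer's address, length, and BPID, and a Frame Queue is simply a FIFO of FDs. Names and fields follow the slide, not any real DPAA API.

```python
from collections import deque

# Hypothetical, simplified model of the DPAA queuing objects defined
# above: a Frame Descriptor (FD) stands in for a frame, and a Frame
# Queue is a FIFO of related FDs managed by QMan.

class FrameDescriptor:
    def __init__(self, address, length, bpid, offset=0):
        self.address = address  # physical address of the (first) buffer
        self.length = length    # frame length in bytes
        self.bpid = bpid        # Buffer Pool ID the buffer came from
        self.offset = offset    # start of frame data within the buffer

class FrameQueue:
    """FIFO of related FDs (e.g. one TCP session)."""
    def __init__(self, fqid):
        self.fqid = fqid
        self._fds = deque()

    def enqueue(self, fd):
        self._fds.append(fd)

    def dequeue(self):
        return self._fds.popleft() if self._fds else None

fq = FrameQueue(fqid=0x42)
fq.enqueue(FrameDescriptor(address=0x80000000, length=1514, bpid=3))
fq.enqueue(FrameDescriptor(address=0x80000800, length=64, bpid=3))
first = fq.dequeue()
print(first.length)  # FDs come back in arrival order
```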
6

Frame Descriptor Format

The FD occupies four 32-bit words:

- Word 0: D bit, LIODN offset, BPID, ELIODN offset, and the upper address bits
- Word 1: address (cont.)
- Word 2: Fmt (format), Offset, Length
- Word 3: Status/Cmd

Simple frame (Fmt = 000): the FD's address, offset, and length locate the frame data directly in a single buffer.

Multi-buffer frame (Fmt = 100, scatter/gather): the FD's address points to an S/G list. Each S/G entry holds an address, length, BPID, and offset (zero for all but the first entry), with an extension code marking continuation entries (00) versus the final entry (01). The packet is the concatenation of the data the entries describe.
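A hedged sketch of how software might walk the two formats; the dict layout, FMT constants, and frame_bytes helper are hypothetical stand-ins for the packed 32-bit words.

```python
# Illustrative sketch of the two FD formats: a simple frame (format
# 0b000) points straight at one buffer, while a scatter/gather frame
# (format 0b100) points at an S/G list whose last entry is marked
# final. The dict-based layout is hypothetical, not the real structure.

FMT_SIMPLE = 0b000
FMT_SG = 0b100

def frame_bytes(fd, sg_lists, memory):
    """Reassemble a frame's payload from an FD."""
    if fd["fmt"] == FMT_SIMPLE:
        start = fd["addr"] + fd["offset"]
        return memory[start:start + fd["length"]]
    assert fd["fmt"] == FMT_SG
    out = b""
    for entry in sg_lists[fd["addr"]]:  # FD address points at the S/G list
        start = entry["addr"] + entry["offset"]
        out += memory[start:start + entry["length"]]
        if entry["final"]:              # final-entry flag ends the list
            break
    return out

memory = bytes(range(256)) * 16         # stand-in for system memory
sg = {0x100: [
    {"addr": 0, "offset": 0, "length": 4, "final": False},
    {"addr": 64, "offset": 0, "length": 4, "final": True},
]}
simple_fd = {"fmt": FMT_SIMPLE, "addr": 0, "offset": 2, "length": 4}
sg_fd = {"fmt": FMT_SG, "addr": 0x100}
print(frame_bytes(simple_fd, sg, memory))  # four bytes starting at offset 2
```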
7 TM
FQD Selected Field Description:
• FQD_LINK: Link to the next FQD in a queue of FQDs, used for Work Queues
• ORPRWS: ORP Restoration Window Size
• OA: ORP Auto Advance NESN Window Size
• ODP_SEQ: ODP Sequence Number
• ORP_NESN: ORP Next Expected Sequence Number.
• ORP_EA_HPTR, ORP_EA_TPTR: ORP Early Arrival Head and Tail Pointer
• PFDR_HPTR, PFDR_TPTR : PFDR Head and Tail Pointer
• CONTEXT_A, CONTEXT_B: Frame Queue Context A and B
• STATE: FQ State
• DEST_WQ: Destination Work Queue
• ICS_SURP: Intra-Class Scheduling Surplus or Deficit.
• IS: Intra-Class Scheduling Surplus or Deficit identifier
• ICS_CRED: Intra-Class Scheduling Credit
• CONG_ID: Congestion Group ID
• RA[1-2]_SFDR_PTR: SFDR Pointer for Recently Arrived frame # 1 and 2
• TD_MANT, TD_EXP : Tail Drop threshold Exponent and Mantissa
• C: FQD in external memory or in cache (QMan 1.1)
• X: XON or XOFF for flow control command (QMan 1.1)
Figure: FQD memory layout (32-bit words, bits 0-31):
- FQD_LINK
- ORPRWS | OA | ODP_SEQ | ORP_NESN
- OLWS | ORP_EA_HSEQ | ORP_EA_TSEQ
- ORP_EA_HPTR | ORP_EA_TPTR
- ORP_EA_TPTR (cont.) | PFDR_HPTR
- PFDR_HPTR (cont.) | PFDR_TPTR
- CONTEXT_A (two words)
- CONTEXT_B
- FQ_CTRL | FER | STATE | DEST_WQ
- ICS_SURP | IS | ICS_CRED
- BYTE_CNT
- CONG_ID | FRM_CNT
- NRA | OAC | C | RA1_SFDR_PTR | X | IT | RA2_SFDR_PTR
- NOD | OAL | OD1_SFDR_PTR | OAL | OD2_SFDR_PTR
- NPC | OD3_SFDR_PTR | TD_MANT | TD_EXP
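The TD_MANT/TD_EXP pair encodes the tail-drop threshold as a mantissa and exponent; a minimal sketch of the idea follows (the exact field widths and rounding rules are defined by the QMan reference manual).

```python
def tail_drop_threshold(td_mant, td_exp):
    """Decode a mantissa/exponent threshold: threshold = MANT * 2**EXP.

    The exact field widths and semantics are defined by the QMan
    reference manual; this only illustrates why a mantissa/exponent
    pair can cover a huge range of byte counts in a few bits.
    """
    return td_mant << td_exp

print(tail_drop_threshold(100, 0))   # 100-byte threshold
print(tail_drop_threshold(100, 15))  # 3276800-byte threshold
```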
8
• FMan receives packets
− allocates internal buffers
− retrieves data from MAC
• BMI
− acquires a buffer from BMan
− uses DMA to store data in it
• Parse+classify+keygen select a queue and policer profile
• Policer “colors” and optionally discards frame
• QMan applies active queue management and enqueues frame
• Frame is enqueued to one of a pool of cores
• Available core dequeues the FD for processing

Figure: Ingress flow through the Frame Manager: packets arrive on the 10GE/GE MACs and pass through the BMI, Parser, Classifier, Keygen, and Policer before the QMI hands them to QMan (WRED, enqueue, dequeue). FMan's DMA stores packet data to memory (DDR) in buffers requested from the Buffer Manager, and the Queue Manager delivers FDs through work queues (WQ0-WQ7) to the Power Architecture™ cores (each with D-Cache, I-Cache, and L2 cache), which return buffer pointers after processing.
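The receive steps above can be sketched end to end; every name here (ingress, classify, police) is illustrative, standing in for the FMan/QMan hardware stages.

```python
# Hedged sketch of the ingress steps above: acquire a buffer from BMan,
# parse/classify to pick a frame queue, police the frame, then enqueue
# its FD for a pool of cores. All names are illustrative; the real
# work happens in FMan and QMan hardware.

def ingress(packet, bman_pool, classify, police, frame_queues):
    buf = bman_pool.pop()            # BMI acquires a buffer from BMan
    fqid = classify(packet)          # parse + keygen select a queue
    color = police(packet)           # policer "colors" the frame
    if color == "red":
        bman_pool.append(buf)        # discarded: release the buffer back
        return None
    fd = {"buf": buf, "len": len(packet), "color": color}
    frame_queues[fqid].append(fd)    # QMan enqueues the FD
    return fqid

pool = [0x1000, 0x2000]
fqs = {7: []}
fqid = ingress(b"\x45" * 64, pool,
               classify=lambda p: 7,
               police=lambda p: "green",
               frame_queues=fqs)
print(fqid, len(fqs[7]), len(pool))  # queue 7 holds one FD; one buffer left
```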
9
• Maintains pools of buffers that have been provided by software.
− Manages up to 64 Buffer Pools
− Pools contain “tokens” that usually are (but need not be) physical addresses
of buffers.
− SW fills the pools with any token values it wants any time it wants.
− Typical: pool of little buffers, pool of medium buffers, pool of large buffers
− Can get interrupt if pool depletes.
• Cores, Frame Manager, Security Engine,… acquire these buffers by
requesting them from BMan.
− The requesting block must indicate which buffer pool it is requesting a
buffer from using Buffer Pool ID (BPID)
− BMan then removes buffers from one of its buffer pools and passes them to
the module making the request.
• Modules release buffers back to BMan, providing the token (physical address) and the buffer pool (BPID) in which to store it.
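A minimal software model of those acquire/release semantics, with a callback standing in for the depletion interrupt; BufferManager and its methods are hypothetical, not the real BMan portal API.

```python
# Minimal model of the behavior described above: software seeds pools
# with tokens (usually physical addresses), modules acquire and release
# by BPID, and a callback stands in for the depletion interrupt.

class BufferManager:
    def __init__(self):
        self.pools = {}          # BPID -> LIFO list of tokens
        self.thresholds = {}     # BPID -> depletion threshold
        self.on_depleted = None  # depletion "interrupt" handler

    def seed(self, bpid, tokens, threshold=0):
        self.pools.setdefault(bpid, []).extend(tokens)
        self.thresholds[bpid] = threshold

    def acquire(self, bpid):
        pool = self.pools[bpid]
        if not pool:
            return None
        token = pool.pop()
        if len(pool) < self.thresholds[bpid] and self.on_depleted:
            self.on_depleted(bpid)  # pool depleted past its threshold
        return token

    def release(self, bpid, token):
        self.pools[bpid].append(token)

bman = BufferManager()
events = []
bman.on_depleted = events.append
bman.seed(bpid=1, tokens=[0x1000, 0x2000, 0x3000], threshold=2)
a = bman.acquire(1)
b = bman.acquire(1)
print(hex(a), hex(b), events)  # LIFO order; depletion fired once
```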
10
• Standardized command interface to SW and HW
− Up to 50 Software Portals (T4240)
− Up to 6 Direct Connect Portals (T4240)
− Up to 64 separate pools of free buffers
• BMan keeps a small per-pool stockpile of buffer pointers in internal memory
− Stockpile of 64 buffer pointers per pool; maximum of 2G buffer pointers
− Absorbs bursts of acquire/release commands without external memory access
− Minimizes accesses to memory for buffer pool management
• Pools (buffer pointers) overflow into DRAM
• LIFO buffer allocation policy
− A released buffer is immediately used for receiving new data, using cache lines previously allocated
Figure: BMan architecture: software portals and direct connect portals (for cores and FMan, SEC, PME, etc., via CoreNet) feed a central service sequencer; per-pool stockpiles are kept in on-chip memory, and a fetch/flush sequencer spills to or refills from per-pool buffer stacks in system memory; a depletion threshold and a free-pool low watermark trigger notifications.
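The stockpile behavior can be modeled as a small on-chip LIFO that only touches external memory in batches; capacity, batch size, and names here are illustrative assumptions, not the hardware's 64-pointer stockpile parameters.

```python
# Sketch of the stockpile idea above: a small on-chip LIFO absorbs
# acquire/release bursts, and only overflow or underflow moves whole
# batches to or from external memory.

class Stockpile:
    def __init__(self, capacity, batch):
        self.onchip = []      # per-pool stockpile in on-chip memory
        self.external = []    # overflow area in DRAM
        self.capacity = capacity
        self.batch = batch
        self.memory_ops = 0   # external-memory accesses (to be minimized)

    def release(self, token):
        self.onchip.append(token)
        if len(self.onchip) > self.capacity:   # flush a batch to DRAM
            self.external.extend(self.onchip[:self.batch])
            del self.onchip[:self.batch]
            self.memory_ops += 1

    def acquire(self):
        if not self.onchip and self.external:  # refill a batch from DRAM
            self.onchip.extend(self.external[-self.batch:])
            del self.external[-self.batch:]
            self.memory_ops += 1
        return self.onchip.pop() if self.onchip else None

sp = Stockpile(capacity=8, batch=4)
for t in range(8):       # a burst of releases...
    sp.release(t)
print(sp.memory_ops)     # 0: fully absorbed on chip
```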
11
• Queue Manager acts as a central resource in the multicore datapath infrastructure, managing the queueing of data between:
− Cores (including IPC)
− Hardware offload accelerators
− Network interfaces – Frame Manager
• Queue management
− High performance interfaces (“portals”) for enqueue/dequeue
− Internal buffering of queue/frame data to enhance performance
• Congestion avoidance and management
− RED/WRED
− Tail drop for single queues and aggregates of queues
− Congestion notification for “loss-less” flow control
• Load spreading across processing engines (cores, HW accelerators)
• Order restoration, Order preservation/atomicity
• Delivery to cache/HW accelerators of per queue context information with the data (Frames)
Figure: QMan architecture: queuing engines with an FQD cache and FD memory; software portals connect to the cores over CoreNet, and direct connect portals attach FMan, SEC, PME, DCE, and RMan.
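A hedged sketch of RED-style active queue management as listed above: no drops below a minimum threshold, certain drop above a maximum, and a linearly rising drop probability in between. Real QMan WRED adds per-color profiles and congestion-group state on top of this idea.

```python
import random

# RED drop decision: below min_th nothing is dropped, above max_th
# everything is, and in between the drop probability rises linearly
# to max_p. Thresholds and max_p below are example values.

def red_drop(queue_bytes, min_th, max_th, max_p, rng=random.random):
    if queue_bytes <= min_th:
        return False
    if queue_bytes >= max_th:
        return True
    drop_p = max_p * (queue_bytes - min_th) / (max_th - min_th)
    return rng() < drop_p

print(red_drop(1000, 4096, 16384, 0.1))   # short queue: never dropped
print(red_drop(20000, 4096, 16384, 0.1))  # past max threshold: dropped
```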
12

• T4xxx has a total of 50 Software Portals (SPs), an increase from the 10 SPs found in the P-series processors.
• Supports Customer Edge Egress Traffic Management (CEETM), which provides hierarchical class-based scheduling and traffic shaping:
− Available as an alternative to FQ/WQ scheduling mode on the egress side of specific direct connect portals
− Enhanced class-based scheduling supporting 16 class queues per channel
− Token-bucket-based dual-rate shaping representing Committed Rate (CR) and Excess Rate (ER)
− Congestion avoidance mechanism equivalent to that provided by FQ congestion groups
• A total of 48 algorithmic sequencers are provided, allowing multiple enqueue/dequeue operations to execute simultaneously.
• Supports up to 295M enqueue/dequeue operations per second.
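The CR/ER dual-rate shaping can be illustrated with two token buckets, committed rate tried first; the class and policy details below are assumptions, not the CEETM definition.

```python
# Illustrative dual-rate shaper in the spirit of CEETM's CR/ER token
# buckets: a frame consumes committed-rate tokens if available,
# otherwise excess-rate tokens, otherwise it is held. Rates, burst
# size, and the send policy are assumptions.

class DualRateShaper:
    def __init__(self, cr_bps, er_bps, burst):
        self.rates = {"CR": cr_bps, "ER": er_bps}
        self.tokens = {"CR": burst, "ER": burst}
        self.burst = burst

    def tick(self, seconds):
        """Refill both buckets, capped at the burst size."""
        for k, rate in self.rates.items():
            self.tokens[k] = min(self.burst, self.tokens[k] + rate * seconds)

    def send(self, nbytes):
        for k in ("CR", "ER"):  # committed rate first, then excess rate
            if self.tokens[k] >= nbytes:
                self.tokens[k] -= nbytes
                return k
        return None             # shaped: hold until tick() refills tokens

shaper = DualRateShaper(cr_bps=1000, er_bps=500, burst=1500)
first, second, third = (shaper.send(1500) for _ in range(3))
print(first, second, third)  # CR ER None
```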
13

Each Frame Manager (FMan) supports:
• Two 10GE MACs and six 1GE MACs
− Ethernet & HiGig support
− Rmon/ifMIB stats, EEE, 802.3bf, DCB (PFC, ETS)
• Header Parsing
− L2/L3/L4 parse and validate (checksum)
− User defined protocols supported
• Classification
− Exact match classification for selected traffic
• Distribution
− Based on exact match or hash based queue selection for load spreading
− Can store packet to guest specific private memory based on PCD
• Policing
− Color aware dual rate, 3 color
Figure: FMan internal architecture: the MACs (one 10GE and multiple 1GE ports) feed a scheduler; the BMI, with shared memory buffers and DMA, moves frame data; the Parser, KeyGen, Classifier, and Policer process each frame; and the QMI exchanges frames with QMan while buffers come from BMan.
14
• Performs parsing of common L2/L3/L4 headers, including tunneled protocols
• Can be augmented by the user to parse other standard protocols
• Can also parse proprietary, user-defined headers at any layer:
− Self-describing, using standard fields such as proprietary Ethertype, Protocol ID, Next Header, etc.
− Non-Self-Describing through configuration.
• Parse results, including proprietary fields, can be used by the classifier, and/or software.
• Soft parse can modify any field in parse results
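A minimal sketch of the parse walk described above: follow self-describing fields (EtherType, then the IPv4 protocol byte) and record offsets, much as the parser fills its parse array; the function and result keys are illustrative only.

```python
import struct

# Illustrative parse walk over Ethernet, VLAN tag(s), and IPv4,
# recording offsets the way the parse array does. A soft parser hook
# for a user-defined protocol would extend the same walk.

def parse(frame):
    results = {"eth_offset": 0}
    ethertype = struct.unpack_from(">H", frame, 12)[0]
    offset = 14
    while ethertype == 0x8100:  # walk VLAN tag(s)
        results.setdefault("vlan_tci_offsets", []).append(offset)
        ethertype = struct.unpack_from(">H", frame, offset + 2)[0]
        offset += 4
    if ethertype == 0x0800:     # IPv4
        results["ip_offset"] = offset
        ihl = (frame[offset] & 0x0F) * 4
        results["l4_protocol"] = frame[offset + 9]
        results["l4_offset"] = offset + ihl
    return results

# Ethernet + one VLAN tag + a minimal IPv4 header carrying UDP (0x11)
frame = (bytes(12) + b"\x81\x00\x00\x01" + b"\x08\x00"
         + b"\x45" + bytes(8) + b"\x11" + bytes(30))
r = parse(frame)
print(r["ip_offset"], hex(r["l4_protocol"]))  # 18 0x11
```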
15 TM
B 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1x
Logic
al P
ort ID
Sh
imR
L2R L3R L4
R
Cla
ssific
atio
n
Pla
n ID
Nxt
Hdr
Running
Sum
Flags,
Flag
offset
Rout T
ype
RH
P
2x
Sh
im
Offs
et_
1
Shim
Offs
et_
2
IP P
id o
ffset
Eth
Offs
et
LLC
+S
nap
Offs
et
VL
AN
TC
I
offs
et_
1
VLA
NT
CI
offs
et_
n
LastE
TypeO
ffset
PP
PoE
Offs
et
MP
LS
Offs
et_
1
MP
LS
Offs
et_
n
IPO
ffset_
1
IPO
ffset_
n
GR
E
Offs
et
L4 O
ffset
NxtH
dr
offs
et
The Frame Manager Parser stores all parsing results in the Parse Array.
Ethernet
VLAN
Incoming frame
Other L3
DA EthType
8100 VLAN ... TOS ... PID ... SIP DIP SPORT DPORT .... SA
EthType
0800
Parse Result Array (simplified)
EthType
<UDF>
UDF
Length
UDF
Content
compare ethtype to UDF
ethtype
read UDF length
write UDF result
write UDF offset
write parse continue offset
IPv4
TCP/UDP
16
• After parsing is complete, the user can identify the proper action to be taken.
• Hash based: Spreading and distribution of frames according to hash performed on selected fields of the frame.
− Performed by the KeyGen module
− Allows a hash key to be built from many different fields in the frame to provide differentiation between flows
− A key can be built exclusively from fields present in the frame based on what the parse function has detected. Alternatively, default values can be used if fields are not present in the frame.
− Hashes selected fields in the frame as part of a spreading mechanism
− The result is a specific frame queue identifier (FQID). To support added control, this FQID can be indexed by values found in the frame, such as TOS or p-bits, or any other desired fields.
• Table Lookup: Action to take on frame is determined by table lookup on selected fields of the frame.
17

UDP and TCP Flow Definition

For packets with TCP and IPv4 headers, a flow can be defined as the 5-tuple:
• IPv4 protocol, IPv4 source, IPv4 destination, TCP source port, and TCP destination port

For packets with UDP and IPv4 headers, a flow can be defined as the 5-tuple:
• IPv4 protocol, IPv4 source, IPv4 destination, UDP source port, and UDP destination port

Figure: The two packet layouts, field by field: Preamble (7B), Start of Frame (1B), Dest MAC (6B), Src MAC (6B), EtherType (2B), Version/IHL (0x45), DiffServ (1B), Length (2B), ID (2B), Frag Offset (2B), TTL (1B), Protocol (1B), Header Checksum (2B), IP Source (4B), IP Dest (4B), and the TCP or UDP source and destination ports (2B each); the 5-tuple fields are highlighted.
18

• 32 Key Generation schemes
− Direct Method: used for port-based or post-coarse-classification key generation
− Indirect Method: based on the parsed protocol stack of the frame and the source port; the presence (or absence) of valid headers can direct the scheme used
• 256 Classification Plans
− Indicate which fields of a parsed packet are of interest for key generation
Figure: KeyGen flow: build a 56-byte key from the parser result/frame (filtered by the Classification Plan/LCV), hash it to an 8-byte hash key, mask it down to a 24-bit field, logically OR it with special fields, and add the FQ base to produce a 24-bit FQID; alternatively, the unhashed key is used as an exact key for table lookup.
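The KeyGen flow above can be sketched in software; SHA-256 stands in for the hardware hash, and the field selection and packing are assumptions, not the hardware's 56-byte key format.

```python
import hashlib

# Hedged sketch of the KeyGen result path: build a key from selected
# frame fields, hash it, mask the hash to a small field, and add the
# FQ base to form a 24-bit FQID.

def select_fqid(five_tuple, fq_base, mask):
    key = b"".join(part.to_bytes(4, "big") for part in five_tuple)
    digest = hashlib.sha256(key).digest()       # stand-in for the HW hash
    hashed = int.from_bytes(digest[:3], "big")  # 24-bit hash field
    return (fq_base + (hashed & mask)) & 0xFFFFFF

# proto, SIP, DIP, sport, dport: the 5-tuple from the previous slide
flow = (0x11, 0x0A000001, 0x0A000002, 1024, 80)
fqid = select_fqid(flow, fq_base=0x400, mask=0x3F)  # spread over 64 FQs
print(0x400 <= fqid <= 0x43F, fqid == select_fqid(flow, 0x400, 0x3F))
```

The same flow always hashes to the same queue, which is what keeps a flow's packets in order while still spreading distinct flows across cores.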
19
• After parsing is complete, the user can identify the proper action to be taken.
• Hash based: Spreading and distribution of frames according to hash performed on selected fields of the frame.
• Table Lookup: Action to take on frame is determined by table lookup on selected fields of the frame
− Performed by the FMan Controller module
− Looks up certain fields in the frame to determine subsequent action to take, including policing. The FMan contains internal memory that holds small tables for this purpose.
− Classification look-ups are performed based on the combination of user configuration and what fields the Parser actually encountered.
− Look-ups can be chained together such that a successful look-up can provide key information for a subsequent look-up. After all the look-ups are complete, the final classification result provides either a hash key to use for spreading, or a FQID directly.
20

• Key new FMan features in QorIQ T4 processors:
− Six 1G/2.5G multirate Ethernet MACs (mEMACs) per Frame Manager
− Two 10G multirate Ethernet MACs (mEMACs) per Frame Manager
− QMan interface: supports priority-based flow control messages passed from the Ethernet MAC to QMan
− Complies with IEEE 802.3az (Energy Efficient Ethernet) and IEEE 802.1Qbb, in addition to IEEE Std 802.3®, IEEE 802.3u, IEEE 802.3x, IEEE 802.3z, IEEE 802.3ac, IEEE 802.3ab, and IEEE 1588v2 (clock synchronization over Ethernet)
− Port virtualization: virtual storage profile (SPID) selection after classification or distribution function evaluation
− Rx port multicast support
− Egress shaping
− Offline port: able to copy the frame into new buffers and enqueue back to the QMan
21

• Public Key Hardware Accelerators (PKHA) ~25K RSA ops/sec (1024b): RSA and Diffie-Hellman (to 4096b), elliptic curve cryptography (1023b)
• Data Encryption Standard Accelerators (DESA) ~15Gbps: DES, 3DES (2K, 3K); ECB, CBC, OFB modes
• Advanced Encryption Standard Accelerators (AESA) ~40Gbps: key lengths of 128, 192, and 256 bits; ECB, CBC, CTR, CCM, GCM, CMAC, OFB, CFB, and XTS
• ARC Four Hardware Accelerators (AFHA) ~7.5Gbps: compatible with the RC4 algorithm
• Message Digest Hardware Accelerators (MDHA) ~40Gbps: SHA-1, SHA-2 256/384/512-bit digests; MD5 128-bit digest; HMAC with all algorithms
• Kasumi/F8 Hardware Accelerators (KFHA) ~9Gbps: F8, F9 as required for 3GPP; A5/3 for GSM and EDGE; GEA-3 for GPRS
• Snow 3G Hardware Accelerators (STHA) ~12Gbps: implements Snow 3.0
• ZUC Hardware Accelerators (ZHA) ~14Gbps: implements 128-EEA3 & 128-EIA3
• CRC Unit ~40Gbps: standard and user-defined polynomials
• Random Number Generator, random IV generation
• Supports protocol processing for: IPSec, 802.1ae (MACsec), SSL/TLS/DTLS, 3GPP RLC, LTE PDCP, SRTP, 802.11i (WiFi), 802.16e (WiMax)
Figure: SEC block diagram: a Job Queue Controller and Descriptor Controllers drive the CHAs (crypto hardware accelerators), supported by DMA, RTIC, a Queue Interface, and a Job Ring interface.
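The MDHA digests listed above (SHA-1/SHA-2, MD5, HMAC) are standard algorithms, so computing one with Python's hashlib/hmac shows exactly what the accelerator offloads from the cores; the key and message here are arbitrary examples.

```python
import hashlib
import hmac

# Software versions of the MDHA constructions listed above: SHA-2
# digests, an MD5 digest, and an HMAC tag over the same payload.

msg = b"payload to authenticate"
key = b"sixteen byte key"

sha2 = hashlib.sha256(msg).hexdigest()                 # SHA-2, 256-bit digest
md5 = hashlib.md5(msg).hexdigest()                     # MD5, 128-bit digest
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()   # HMAC-SHA-256

print(len(sha2) * 4, len(md5) * 4, len(tag) * 4)  # digest sizes in bits
```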
22
• Regex support plus significant extensions:
− Patterns can be split into 256 sets each of which can contain 16 subsets
− 32K patterns of up to 128B length
− 9.6 Gbps raw performance
• Combined hash/NFA technology
− No “explosion” in number of patterns due to wildcards
− Low system memory utilization
− Fast pattern database compiles and incremental updates
• Matching across “work units”
− Finds patterns in streamed data
• Pipeline of processing
− The PME offers a pipeline of filtering, matching, and a behavior-based engine for a complete pattern-matching solution
Figure: Pattern Matching Engine components: the Pattern Matcher Frame Agent (PMFA) fronts the on-chip system bus interface; the Key Element Scanning Engine (KES) with its hash tables, the Data Examination Engine (DXE), and the Stateful Rule Engine (SRE) each have caches and access to pattern descriptors and state, and produce user-definable reports; the block connects to CoreNet, BMan, and QMan.
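"Matching across work units" means patterns are found even when they straddle chunk boundaries in streamed data. A software equivalent must carry trailing context between chunks; this hypothetical sketch keeps the last pattern-length-minus-one bytes.

```python
import re

# Sketch of matching across chunk boundaries: keep the tail of each
# work unit as context for the next, and skip matches that fall
# entirely inside that carried-over tail (already reported last time).
# Illustrative only; the PME does this statefully in hardware.

def stream_search(chunks, pattern, max_pattern_len):
    tail = b""
    offset = 0  # global offset of the current chunk's first byte
    for chunk in chunks:
        data = tail + chunk
        for m in re.finditer(pattern, data):
            if m.end() > len(tail):  # not already reported last time
                yield offset - len(tail) + m.start()
        keep = min(len(data), max_pattern_len - 1)
        tail = data[-keep:] if keep else b""
        offset += len(chunk)

chunks = [b"...AT", b"TACK...", b"ATTACK"]
hits = list(stream_search(chunks, rb"ATTACK", max_pattern_len=6))
print(hits)  # both occurrences found despite the split across chunks
```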
23
• Many queues allow multiple inbound/outbound queues per core
− Hardware queue management via QorIQ Data Path Architecture (DPAA)
• Supports all messaging-style transaction types
− Type 11 Messaging
− Type 10 Doorbells
− Type 9 Data Streaming
• Enables low overhead direct core-to-core communication
Figure: Two devices (QorIQ or DSP, four cores each) linked over 10G SRIO: Type 9 user PDUs provide a channelized CPU-to-CPU transport, and MSG user PDUs provide device-to-device transport.
24

• Deflate
− As specified in RFC 1951
• GZIP
− As specified in RFC 1952
• Zlib
− As specified in RFC 1950
− Interoperable with the zlib 1.2.5 compression library
• Encoding
− Supports Base64 encoding and decoding (RFC 4648)
• Operates at up to 600 MHz
− 10 Gbps compress
− 10 Gbps decompress
− 20 Gbps aggregate
Figure: DCE block diagram: a Frame Agent with QMan and BMan interfaces and portals, a compressor (4KB history) and a decompressor (32KB history), and a bus interface to CoreNet.
25

• QMan 1.2 (e.g. QorIQ T42xx) supports Data Center Bridging (DCB).
• DCB refers to a series of inter-related IEEE specifications collectively designed to enhance Ethernet LAN traffic prioritization and congestion management.
• DCB can be used between data center network nodes for:
− LAN/network traffic
− Storage Area Network (SAN) traffic (e.g. Fibre Channel, which is loss-sensitive), and
− IPC traffic (e.g. InfiniBand, which is latency-sensitive)
• The DPAA is compliant with the following DCB specifications (traffic management related):
− IEEE Std. 802.1Qbb: Priority-based flow control (PFC). To avoid frame loss, PFC Pause frames can be sent autonomously by hardware.
− IEEE Std. 802.1Qaz: Enhanced transmission selection (ETS). Supports weighted bandwidth fairness.
26

ETS CoS-Based Bandwidth Management (IEEE 802.1Qaz)
• Enables intelligent sharing of bandwidth between traffic classes, with control of each class's bandwidth

Figure: Offered vs. realized traffic on a 10GE link over intervals t1-t3 for HPC, storage, and LAN traffic classes; bandwidth a class leaves unused is redistributed to classes that can use it (e.g. LAN traffic realizes 4G/s when HPC and storage each use 3G/s).

Priority Flow Control (IEEE 802.1Qbb)
• Enables lossless behavior for each class of service
• PAUSE is sent per virtual lane when buffer limits are exceeded

Figure: Eight transmit queues map to eight virtual lanes on the Ethernet link; when the receive buffers for one lane fill, a per-lane PAUSE stops only that lane's transmit queue while the others continue.
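ETS-style weighted sharing can be sketched as waterfilling: each active class gets its weighted share, and bandwidth a class does not use is redistributed. The weights and offered loads below are made-up example values, not slide data.

```python
# Waterfilling sketch of ETS-style sharing (802.1Qaz): grant each
# active class its weighted share of the link, then redistribute
# whatever the under-loaded classes could not use.

def ets_allocate(link_bps, weights, offered):
    alloc = {c: 0.0 for c in weights}
    active = {c for c in weights if offered[c] > 0}
    remaining = link_bps
    while active and remaining > 1.0:            # stop within 1 bit/s
        total_w = sum(weights[c] for c in active)
        for c in active:
            grant = min(offered[c] - alloc[c],
                        remaining * weights[c] / total_w)
            alloc[c] += grant
        remaining = link_bps - sum(alloc.values())
        active = {c for c in active if offered[c] - alloc[c] > 1.0}
    return alloc

# 10G link, equal weights: HPC and storage offer 3G/s each, LAN 6G/s;
# LAN absorbs the bandwidth the other classes leave idle.
alloc = ets_allocate(10e9, {"hpc": 1, "storage": 1, "lan": 1},
                     {"hpc": 3e9, "storage": 3e9, "lan": 6e9})
print({c: round(v / 1e9) for c, v in alloc.items()})
```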