Post on 17-Dec-2015
Receiver ‘packet-splitting’
A look at how a driver can cause the 82573L NIC to separate a packet’s headers from its data
NIC can do packet-parsing
• Intel’s newest gigabit Ethernet controllers offer an enhancement to the ‘extended’ Receive Descriptor, called ‘packet-split’ format, which empowers the hardware to recognize the packet ‘headers’ used with the most common network protocols and to automatically separate those headers from their accompanying packet ‘data’
‘Extended’ RX-Descriptors
CPU writes this, NIC reads it:
Base-address (64-bits), reserved (=0)
The device-driver initializes the ‘base-address’ field with the physical address of a packet-buffer, and it initializes the ‘reserved’ field with a zero-value; the network hardware will later modify both fields
NIC writes this, CPU reads it:
MRQ (multiple receive queues), IP identification, Packet-checksum, Extended status, Extended errors, Packet-length, VLAN tag
The network controller will ‘write-back’ the values for these fields when it has transferred a received packet’s data into the packet-buffer
‘Packet-Split’ RX-Descriptors
CPU writes this, NIC reads it:
Base-address 0 (64-bits)
Base-address 1 (64-bits)
Base-address 2 (64-bits)
Base-address 3 (64-bits)
The device-driver initializes four ‘base-address’ fields (‘even-numbered’ addresses)
NIC writes this, CPU reads it:
MRQ (multiple receive queues), IP identification, Packet-checksum
Extended status, Extended errors, Packet-Length 0, VLAN tag
Header-Length, reserved 1, SP
Packet-Length 1, Packet-Length 2, Packet-Length 3
The network controller will ‘write-back’ values to these fields when it has transferred a received packet’s data into those packet-buffers
Same ‘Extended’ Status/Errors
Extended status (bits 19 to 0):
19 18 17 16 15  14 13 12 11 10   9    8 7   6    5     4     3  2    1   0
0  0  0  0  ACK 0  0  0  0  UDPV IPIV 0 PIF IPCS TCPCS UDPCS VP IXSM EOP DD
Extended errors (bits 11 to 0):
11  10  9    8 7 6   5  4  3 2 1 0
RXE IPE TCPE 0 0 SEQ SE CE 0 0 0 0
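The bit positions in the two diagrams above can be turned into masks. The following is a minimal sketch, not code from ‘pktsplit.c’: the macro and function names are hypothetical, and it assumes the driver has read the status/errors pair as one 32-bit little-endian word, with the 20 status bits in the low part and the 12 error bits above them (the order used by the ‘store’ structure).

```c
#include <stdint.h>

/* Extended status bits (positions taken from the status diagram) */
#define RXD_STAT_DD     (1u << 0)   /* descriptor done */
#define RXD_STAT_EOP    (1u << 1)   /* end of packet */
#define RXD_STAT_IXSM   (1u << 2)
#define RXD_STAT_VP     (1u << 3)
#define RXD_STAT_UDPCS  (1u << 4)
#define RXD_STAT_TCPCS  (1u << 5)
#define RXD_STAT_IPCS   (1u << 6)
#define RXD_STAT_PIF    (1u << 7)
#define RXD_STAT_IPIV   (1u << 9)
#define RXD_STAT_UDPV   (1u << 10)
#define RXD_STAT_ACK    (1u << 15)

/* Extended error bits, numbered within the 12-bit errors field */
#define RXD_ERR_CE      (1u << 4)
#define RXD_ERR_SE      (1u << 5)
#define RXD_ERR_SEQ     (1u << 6)
#define RXD_ERR_TCPE    (1u << 9)
#define RXD_ERR_IPE     (1u << 10)
#define RXD_ERR_RXE     (1u << 11)

/* Assumption: status occupies bits 19..0 and errors bits 31..20 of the word. */
uint32_t rxd_status(uint32_t word) { return word & 0xFFFFFu; }
uint32_t rxd_errors(uint32_t word) { return word >> 20; }

/* A descriptor holds a complete, error-free packet when DD and EOP are
   both set and no error bit is set. */
int rxd_packet_ok(uint32_t word)
{
    uint32_t need = RXD_STAT_DD | RXD_STAT_EOP;
    return (rxd_status(word) & need) == need && rxd_errors(word) == 0;
}
```

A driver would typically poll `rxd_status(word) & RXD_STAT_DD` first, since the other fields are only meaningful once the NIC has written the descriptor back.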
Syntax modifications for ‘fetch’
typedef struct {
    unsigned long long base_addr0;
    unsigned long long base_addr1;
    unsigned long long base_addr2;
    unsigned long long base_addr3;
} RX_DESC_FETCH;
Syntax modifications for ‘store’
typedef struct {
    unsigned int mrq;
    unsigned short ip_identification;
    unsigned short packet_chksum;
    unsigned int desc_status:20;
    unsigned int desc_errors:12;
    unsigned short packet_length0;
    unsigned short vlan_tag;
    unsigned short header_length;
    unsigned short packet_length1;
    unsigned short packet_length2;
    unsigned short packet_length3;
    unsigned long long reserved;
} RX_DESC_STORE;
Same syntax for the ‘union’
typedef union {
    RX_DESC_FETCH rxf;
    RX_DESC_STORE rxs;
} RX_DESCRIPTOR;
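To show how the two views of the union are used, here is a small sketch that repeats the typedefs so it compiles on its own. The helper functions are hypothetical (they are not taken from ‘pktsplit.c’). The header-length decoding assumes that the low 10 bits of `header_length` hold the byte count and that bit 15 is the SP flag; check that against the controller documentation before relying on it. Bit-field layout is compiler-dependent, so the size check only confirms the 32-byte footprint.

```c
typedef struct {
    unsigned long long base_addr0;
    unsigned long long base_addr1;
    unsigned long long base_addr2;
    unsigned long long base_addr3;
} RX_DESC_FETCH;

typedef struct {
    unsigned int mrq;
    unsigned short ip_identification;
    unsigned short packet_chksum;
    unsigned int desc_status:20;
    unsigned int desc_errors:12;
    unsigned short packet_length0;
    unsigned short vlan_tag;
    unsigned short header_length;
    unsigned short packet_length1;
    unsigned short packet_length2;
    unsigned short packet_length3;
    unsigned long long reserved;
} RX_DESC_STORE;

typedef union {
    RX_DESC_FETCH rxf;
    RX_DESC_STORE rxs;
} RX_DESCRIPTOR;

/* Both views must cover the same 32 bytes. */
_Static_assert(sizeof(RX_DESCRIPTOR) == 32, "descriptor must be 32 bytes");

/* Driver side ('fetch' view): hand four buffer addresses to the NIC. */
void rx_desc_prepare(RX_DESCRIPTOR *d, const unsigned long long phys[4])
{
    d->rxf.base_addr0 = phys[0];
    d->rxf.base_addr1 = phys[1];
    d->rxf.base_addr2 = phys[2];
    d->rxf.base_addr3 = phys[3];
}

/* After write-back ('store' view): sum of the four length fields. */
unsigned int rx_desc_total_length(const RX_DESCRIPTOR *d)
{
    return (unsigned int)d->rxs.packet_length0 + d->rxs.packet_length1
         + d->rxs.packet_length2 + d->rxs.packet_length3;
}

/* Assumption: bits 9..0 = header byte count, bit 15 = SP (header was split). */
unsigned int rx_desc_header_bytes(const RX_DESCRIPTOR *d)
{
    return d->rxs.header_length & 0x3FFu;
}

int rx_desc_header_split(const RX_DESCRIPTOR *d)
{
    return (d->rxs.header_length & 0x8000u) != 0;
}
```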
NIC Registers involved
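RCTL: Receive Control register (bits 31 to 0), with the DTYP field marked at bit 10
RFCTL: Receive Filter Control register (bits 31 to 0), with EXTEN at bit 15; other bits shown as Reserved (=0)
PSRCTL: Packet-Split Receive Control register
bits 31-24: LEN3 (1KB units)
bits 23-16: LEN2 (1KB units)
bits 15-8: LEN1 (1KB units)
bits 7-0: LEN0 (128B units)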
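As a rough illustration of the register fields listed above, the sketch below computes the values a driver might write. It does no memory-mapped I/O, so register offsets are deliberately left out. All names are hypothetical. It assumes that DTYP = 01b selects the packet-split descriptor type, and that LEN0 is expressed in 128-byte units while LEN1 to LEN3 are in 1KB units.

```c
#include <assert.h>
#include <stdint.h>

#define RCTL_DTYP_SHIFT  10                        /* DTYP position in RCTL */
#define RCTL_DTYP_PS     (1u << RCTL_DTYP_SHIFT)   /* assumed: 01b = packet-split */
#define RFCTL_EXTEN      (1u << 15)                /* extended descriptor enable */

/* Build a PSRCTL value from four buffer sizes given in bytes.
   Sizes must be whole multiples of the field's unit. */
uint32_t psrctl_value(unsigned len0, unsigned len1, unsigned len2, unsigned len3)
{
    assert(len0 % 128 == 0);
    assert(len1 % 1024 == 0 && len2 % 1024 == 0 && len3 % 1024 == 0);

    return  ((uint32_t)(len0 / 128)  <<  0)   /* LEN0, bits 7..0   */
          | ((uint32_t)(len1 / 1024) <<  8)   /* LEN1, bits 15..8  */
          | ((uint32_t)(len2 / 1024) << 16)   /* LEN2, bits 23..16 */
          | ((uint32_t)(len3 / 1024) << 24);  /* LEN3, bits 31..24 */
}
```

With four 1024-byte buffers per descriptor, `psrctl_value(1024, 1024, 1024, 1024)` gives 0x01010108, because LEN0 counts 128-byte units (1024 / 128 = 8).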
Each descriptor has four buffers
Packet-Split Rx-descriptor:
base_addr0 -> buffer0
base_addr1 -> buffer1
base_addr2 -> buffer2
base_addr3 -> buffer3
Four buffers are allocated for receiving one packet
Refresh for ‘reuse’
• As with the ‘extended’ receive-descriptors, a device-driver must set up each ‘packet-split’ receive-descriptor again any time it is going to be ‘reused’, since the prior buffer-addresses get overwritten by the network controller during packet-reception
• So the driver needs either a formula for recalculating the buffer-addresses or a ‘shadow’ array that keeps a copy of them
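The ‘shadow’ array option can be sketched as follows. This is an illustration under assumptions rather than the course’s code: the type and function names are hypothetical, the ring is assumed to hold sixteen descriptors, and each descriptor is modelled only by its four 64-bit ‘fetch’ fields.

```c
#include <assert.h>
#include <stdint.h>

#define N_RX_DESC 16

/* 'Fetch' view of one 32-byte packet-split descriptor */
typedef struct { uint64_t base_addr[4]; } rx_desc_t;

/* Private copy of every buffer address; the NIC never touches this. */
static rx_desc_t shadow[N_RX_DESC];

/* Initial setup: record the addresses in the shadow and in the ring. */
void rx_desc_setup(rx_desc_t *ring, unsigned i, const uint64_t phys[4])
{
    assert(i < N_RX_DESC);
    for (int j = 0; j < 4; j++) {
        shadow[i].base_addr[j] = phys[j];
        ring[i].base_addr[j]   = phys[j];
    }
}

/* Reuse: the write-back destroyed the addresses, so restore them. */
void rx_desc_refresh(rx_desc_t *ring, unsigned i)
{
    assert(i < N_RX_DESC);
    ring[i] = shadow[i];
}
```

The shadow costs 512 extra bytes here but works for any buffer placement, whereas a formula only works when the buffers follow a regular layout.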
Kernel-memory layout
kmem: KMEM_SIZE (= 66048 bytes)
Sixteen Rx-descriptors (32-bytes each): 512 bytes
Sixty-four receive-buffers (1024-bytes each): 65536 bytes
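With a regular layout like this one, buffer addresses can be recalculated by formula. The sketch below assumes that the descriptors sit at the start of the region, that the buffers follow immediately, and that descriptor i owns buffers 4i through 4i+3; none of that ordering is stated outright in the layout, and the names other than KMEM_SIZE are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N_RX_DESC      16
#define RX_DESC_SIZE   32
#define N_RX_BUFS      64
#define RX_BUF_SIZE    1024
#define BUFS_PER_DESC  (N_RX_BUFS / N_RX_DESC)   /* = 4 */
#define KMEM_SIZE      (N_RX_DESC * RX_DESC_SIZE + N_RX_BUFS * RX_BUF_SIZE)

/* Physical address of buffer 'buf' (0..3) of descriptor 'desc' (0..15),
   given the physical address of the start of the kmem region.
   Assumed layout: [16 descriptors][64 buffers]. */
uint64_t rx_buf_phys(uint64_t kmem_phys, unsigned desc, unsigned buf)
{
    assert(desc < N_RX_DESC && buf < BUFS_PER_DESC);
    return kmem_phys
         + (uint64_t)N_RX_DESC * RX_DESC_SIZE
         + (uint64_t)(desc * BUFS_PER_DESC + buf) * RX_BUF_SIZE;
}
```

The arithmetic matches the sizes given: 16 x 32 = 512 bytes of descriptors plus 64 x 1024 = 65536 bytes of buffers, for 66048 bytes in total.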
Caveats
• Short packets are not always ‘split’
• Unrecognized packet-headers may not be separated from accompanying packet-data
• Demonstrating the packet-split capability will require us to devise a way to transmit packets which have the TCP/UDP and IP packet-headers that the NIC recognizes
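One way to get a frame with recognizable headers is to build it by hand. The following is only a sketch: the function names, the IP addresses (192.168.1.1 to 192.168.1.2), and the port number are made up for illustration, and whether the NIC splits this exact frame has not been verified here. It fills in an Ethernet header, a 20-byte IPv4 header with a correct header checksum, and a UDP header whose checksum is zero (meaning "not computed", which IPv4 permits).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14
#define IP_HDR_LEN  20
#define UDP_HDR_LEN 8

/* One's-complement sum over an even-length header, as used by IPv4 */
uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Store a 16-bit value in network (big-endian) byte order */
void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/* Returns the frame length, or 0 if it does not fit. */
size_t build_udp_frame(uint8_t *frame, size_t cap,
                       const uint8_t dst_mac[6], const uint8_t src_mac[6],
                       const uint8_t *payload, size_t payload_len)
{
    size_t total = ETH_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN + payload_len;
    if (total > cap || total > 1514)
        return 0;

    uint8_t *ip  = frame + ETH_HDR_LEN;
    uint8_t *udp = ip + IP_HDR_LEN;

    memcpy(frame, dst_mac, 6);
    memcpy(frame + 6, src_mac, 6);
    put16(frame + 12, 0x0800);                 /* EtherType: IPv4 */

    memset(ip, 0, IP_HDR_LEN);
    ip[0] = 0x45;                              /* version 4, 5-word header */
    put16(ip + 2, (uint16_t)(IP_HDR_LEN + UDP_HDR_LEN + payload_len));
    ip[8] = 64;                                /* time-to-live */
    ip[9] = 17;                                /* protocol: UDP */
    ip[12] = 192; ip[13] = 168; ip[14] = 1; ip[15] = 1;   /* hypothetical source */
    ip[16] = 192; ip[17] = 168; ip[18] = 1; ip[19] = 2;   /* hypothetical destination */
    put16(ip + 10, ip_checksum(ip, IP_HDR_LEN));

    put16(udp, 54321);                         /* hypothetical source port */
    put16(udp + 2, 54321);                     /* hypothetical destination port */
    put16(udp + 4, (uint16_t)(UDP_HDR_LEN + payload_len));
    put16(udp + 6, 0);                         /* 0 = UDP checksum not computed */
    memcpy(udp + UDP_HDR_LEN, payload, payload_len);
    return total;
}
```

Since short packets are not always split, a demonstration would probably use a payload of a few hundred bytes rather than a handful.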
Our ‘pktsplit.c’ demo
• We created a ‘minimal’ kernel-module for demonstrating the NIC’s ‘packet-splitting’ capabilities
TIMEOUT for an in-class demonstration
In-class exercise
• Can you enhance our ‘pktsplit.c’ module so that its Receive-Descriptor Queue will function automatically as a ring-buffer (as happens in our ‘extended.c’ example)?
• Your best option for this is to install an ISR which will reinitialize some Rx-Descriptors (and advance the RDT index) each time an RXDMT0 interrupt gets triggered
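As a starting point for the exercise, the ring-walking part of such an ISR might look like the sketch below. It is not a complete solution: reading the interrupt cause, testing for RXDMT0, and writing the new index to RDT are hardware accesses that are omitted, and all names are hypothetical. It assumes a little-endian host and relies on the ‘store’ layout, where the status field (and so the DD bit) overlays the low bits of the second 64-bit word, which is why buffer addresses must be ‘even-numbered’.

```c
#include <stdint.h>

#define N_RX_DESC    16
#define RXD_STAT_DD  0x1u    /* descriptor-done bit, lowest status bit */

/* One 32-byte descriptor viewed as four 64-bit words */
typedef struct { uint64_t q[4]; } rx_desc_t;

/* Starting at 'next', restore every consecutive descriptor that the NIC
   has marked done, wrapping at the end of the ring.  Returns the index of
   the first descriptor that is still owned by the hardware. */
unsigned rx_ring_recycle(rx_desc_t *ring, const rx_desc_t *shadow, unsigned next)
{
    unsigned done = 0;

    while (done < N_RX_DESC && (ring[next].q[1] & RXD_STAT_DD)) {
        /* (a real handler would process the received packet here) */
        ring[next] = shadow[next];          /* restore the four addresses */
        next = (next + 1) % N_RX_DESC;      /* ring-buffer wrap-around */
        done++;
    }
    return next;
}
```

The returned index tells the handler how far it got; deciding exactly which value to write into RDT from it is left as part of the exercise.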