What’s needed to receive?
A look at the minimum steps required for programming our anchor NICs to receive packets
A disappointment
• Our former ‘nicwatch.cpp’ application does not seem to work reliably to show packets being received by the 82573L controller
• It was based on the ‘raw sockets’ protocol implemented within the Linux kernel’s vast networking subsystem, thus offering us the prospect of a ‘hardware-independent’ tool -- if only it would show us all the packets!
Two purposes…
• So let’s discard ‘nicwatch.cpp’ in favor of writing our own hardware-specific module that WILL be able to show us all the nic’s received packets, independently of Linux’s various layers of networking protocol code
• And let’s keep it as simple as possible, so we can see which programming steps are the truly essential ones for the 82573L nic
Accessing 82573L registers
• Device registers are hardware mapped to a range of addresses in physical memory
• We can get the location and extent of this memory-range from a BAR register in the 82573L device’s PCI Configuration Space
• We then ask the Linux kernel to set up an I/O ‘remapping’ of this memory-range to ‘virtual’ addresses within kernel-space
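As a rough illustration of the BAR step, the sketch below decodes a raw 32-bit memory BAR value in plain user-space C. The helper names and any sample values are hypothetical and are not taken from the slides. In a real module the kernel's PCI helpers (for example pci_resource_start()) would normally supply the address, and ioremap() would perform the remapping.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit 0 of a BAR distinguishes I/O-port space (1) from memory space (0). */
static bool bar_is_memory(uint32_t bar)
{
    return (bar & 0x1u) == 0;
}

/* In a memory BAR, bits 2:1 equal to binary 10 mark a 64-bit BAR pair. */
static bool bar_is_64bit(uint32_t bar)
{
    return bar_is_memory(bar) && ((bar >> 1) & 0x3u) == 0x2u;
}

/* In a memory BAR, the low four bits are flags, not address bits. */
static uint32_t bar_mem_base(uint32_t bar)
{
    return bar & ~0xFu;
}
```

The extent of the memory-range is not encoded in these bits; it has to be discovered separately, which is one reason to let the kernel's PCI layer report it.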
i/o-memory remapping
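Figure: the ‘physical’ address-space (dynamic ram, nic registers, vram, IO-APIC, Local-APIC) shown beside the ‘virtual’ address-space, in which userspace occupies 3-GB and the kernel occupies 1-GB; the kernel’s portion holds the kernel code/data together with the remapped APIC registers, nic registers, and vram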
Kernel memory allocation
• The NIC requires some host memory for packet-buffers and receive descriptors
• The kernel provides a ‘helper function’ (i.e., kzalloc()) for reserving a suitable region of memory in kernel-space which is both ‘non-pageable’ and ‘physically contiguous’
• It’s our job to decide how much memory our network controller hardware will need
Ethernet packet layout
• Total size normally varies from 64 bytes up to 1522 bytes (unless ‘jumbo’ packets and/or ‘undersized’ packets are enabled)
• The NIC expects a 14-byte packet ‘header’ and it appends a 4-byte CRC check-sum
Offset 0: destination MAC address (6 bytes)
Offset 6: source MAC address (6 bytes)
Offset 12: Type/length (2 bytes)
Offset 14: the packet’s data ‘payload’ goes here (usually varies from 46 to 1500 bytes)
After the payload: Cyclic Redundancy Checksum (4 bytes)
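The header layout above can be sketched as a small parser. This is only an illustration in user-space C; the type name eth_header_t and the function parse_eth_header() are hypothetical names, not part of ‘nicspy.c’. The Type/length field travels in network (big-endian) byte order, so it is assembled byte by byte.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN_BYTES 6     /* length of one MAC address         */
#define ETH_HEADER_LEN 14    /* destination + source + type/length */

typedef struct {
    uint8_t  dst[ETH_ALEN_BYTES];   /* bytes 0..5   */
    uint8_t  src[ETH_ALEN_BYTES];   /* bytes 6..11  */
    uint16_t type_len;              /* bytes 12..13, converted to host order */
} eth_header_t;

/* Returns 0 on success, or -1 if fewer than 14 bytes are available. */
static int parse_eth_header(const uint8_t *frame, size_t len, eth_header_t *out)
{
    if (len < ETH_HEADER_LEN)
        return -1;
    memcpy(out->dst, frame, ETH_ALEN_BYTES);
    memcpy(out->src, frame + 6, ETH_ALEN_BYTES);
    out->type_len = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```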
Rx-Descriptor Ring-Buffer
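Circular buffer (128 bytes minimum, and must be a multiple of 128 bytes)
• RDBA = base-address of the ring
• RDLEN = length of the ring (in bytes)
• RDH (head) and RDT (tail) = the two ring positions that separate descriptors owned by hardware (nic) from descriptors owned by software (cpu)
Figure: descriptors drawn at offsets 0x00, 0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80, shaded to show which are owned by hardware and which by software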
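The ring arithmetic can be sketched in a few lines of C. The function names are hypothetical, and the ownership rule used here (hardware owns the descriptors from head up to, but not including, tail) is an assumption based on the usual reading of Intel's head/tail convention; the slide's shading did not survive in this transcript.

```c
#include <stdint.h>

/* Advance a ring index by one slot, wrapping at the end of the ring. */
static uint32_t ring_next(uint32_t index, uint32_t count)
{
    return (index + 1) % count;
}

/* Number of descriptors from head up to (not including) tail. */
static uint32_t ring_hw_owned(uint32_t head, uint32_t tail, uint32_t count)
{
    return (tail + count - head) % count;
}

/* RDLEN is a byte count and must be a nonzero multiple of 128. */
static int rdlen_is_valid(uint32_t rdlen_bytes)
{
    return rdlen_bytes != 0 && (rdlen_bytes % 128) == 0;
}
```

With 16-byte descriptors, a sixteen-entry ring needs 256 bytes, which satisfies the multiple-of-128 rule.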
Our ‘nicspy.c’ module
• It will be a ‘character-mode’ device-driver
• It will only implement ‘read()’ and ‘ioctl()’
• The ‘read()’ function will cause a task to sleep until a network packet has arrived
• An interrupt-handler will wake up the task
• A ‘get_info’ function will be provided as a debugging aid, so the NIC’s Rx descriptor-queue can be conveniently inspected
Sixteen packet-buffers
• Our ‘nicspy.c’ driver allocates 16 buffers of size 1536 bytes (i.e., for normal ethernet)
Figure: the 32-KB allocation, holding the sixteen packet-buffers plus the Rx Descriptor Queue (256 bytes); the remainder is unused
#define KMEM_SIZE 0x8000 // 32KB = size of kernel memory allocation
void *kmem = kzalloc( KMEM_SIZE, GFP_KERNEL );
if ( !kmem ) return -ENOMEM;
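The arithmetic behind the 32-KB figure can be checked with a short sketch. The ordering used here (packet-buffers first, descriptor queue immediately after) is an assumption made for illustration; the transcript does not preserve where each region sits inside the allocation, and all macro and function names other than KMEM_SIZE are hypothetical.

```c
#include <stddef.h>

#define KMEM_SIZE     0x8000   /* 32KB, as in the slide            */
#define N_RX_BUFS     16       /* sixteen packet-buffers           */
#define RX_BUF_SIZE   1536     /* bytes per packet-buffer          */
#define RX_DESC_SIZE  16       /* bytes per legacy Rx descriptor   */

/* Assumed layout: buffers at offset 0, descriptor queue right after them. */
#define RX_BUFS_OFFSET   0
#define RX_QUEUE_OFFSET  (N_RX_BUFS * RX_BUF_SIZE)    /* 0x6000 */
#define RX_QUEUE_BYTES   (N_RX_BUFS * RX_DESC_SIZE)   /* 256    */
#define KMEM_USED        (RX_QUEUE_OFFSET + RX_QUEUE_BYTES)

/* Byte offset of packet-buffer i within the allocation. */
static size_t rx_buf_offset(unsigned i)
{
    return RX_BUFS_OFFSET + (size_t)i * RX_BUF_SIZE;
}

/* Bytes of the allocation left over after buffers and queue. */
static size_t kmem_unused(void)
{
    return KMEM_SIZE - KMEM_USED;
}
```

Under this layout the queue starts on a 16-byte boundary and about 7.75 KB of the allocation stays unused.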
Format for an Rx Descriptor
Bytes 0-7: Base-address (64 bits)
Bytes 8-9: Packet-length
Bytes 10-11: Packet-checksum
Byte 12: status
Byte 13: errors
Bytes 14-15: VLAN tag
Total size: 16 bytes
The device-driver initializes this ‘base-address’ field with the physical address of a packet-buffer
The network controller will ‘write-back’ the values for these fields when it has transferred a received packet’s data into this packet-buffer
Suggested C syntax
typedef struct {
    unsigned long long base_address;
    unsigned short packet_length;
    unsigned short packet_cksum;
    unsigned char desc_status;
    unsigned char desc_errors;
    unsigned short VLAN_tag;
} RX_DESCRIPTOR;
‘Legacy Format’ for the Intel Pro1000 network controller’s Receive Descriptors
RxDesc Status-field
Bit:   7    6     5      4      3   2     1    0
Field: PIF  IPCS  TCPCS  UDPCS  VP  IXSM  EOP  DD
DD = Descriptor Done (1=yes, 0=no) shows if nic is finished with descriptor
EOP = End Of Packet (1=yes, 0=no) shows if this packet is logically last
IXSM = Ignore Checksum Indications (1=yes, 0=no)
VP = VLAN Packet match (1=yes, 0=no)
UDPCS = UDP Checksum calculated in packet (1=yes, 0=no)
TCPCS = TCP Checksum calculated in packet (1=yes, 0=no)
IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no)
PIF = Passed In-exact Filter (1=yes, 0=no) shows if software must check
RxDesc Error-field
Bit:   7    6    5     4           3           2    1   0
Field: RXE  IPE  TCPE  reserved=0  reserved=0  SEQ  SE  CE
RXE = Received-data Error (1=yes, 0=no)
IPE = IPv4-checksum error (1=yes, 0=no)
TCPE = TCP/UDP checksum error (1=yes, 0=no)
SEQ = Sequence error (1=yes, 0=no)
SE = Symbol Error (1=yes, 0=no)
CE = CRC Error or alignment error (1=yes, 0=no)
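To show how the status and error bits might be used together, here is a minimal sketch in user-space C with fixed-width types. The constant names and the helper rx_desc_is_good_packet() are hypothetical; the bit positions follow the two field layouts above. Treating RXE, SEQ, SE and CE as fatal while ignoring the checksum-error bits is a choice made for this illustration only.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t base_address;    /* physical address of the packet-buffer */
    uint16_t packet_length;   /* written back by the controller        */
    uint16_t packet_cksum;
    uint8_t  desc_status;
    uint8_t  desc_errors;
    uint16_t vlan_tag;
} rx_descriptor_t;

/* The legacy descriptor format must occupy exactly 16 bytes. */
_Static_assert(sizeof(rx_descriptor_t) == 16, "legacy Rx descriptor is 16 bytes");

enum {   /* status-field bits */
    RXD_STAT_DD    = 1 << 0, RXD_STAT_EOP   = 1 << 1,
    RXD_STAT_IXSM  = 1 << 2, RXD_STAT_VP    = 1 << 3,
    RXD_STAT_UDPCS = 1 << 4, RXD_STAT_TCPCS = 1 << 5,
    RXD_STAT_IPCS  = 1 << 6, RXD_STAT_PIF   = 1 << 7
};

enum {   /* error-field bits (bits 3 and 4 are reserved) */
    RXD_ERR_CE   = 1 << 0, RXD_ERR_SE  = 1 << 1, RXD_ERR_SEQ = 1 << 2,
    RXD_ERR_TCPE = 1 << 5, RXD_ERR_IPE = 1 << 6, RXD_ERR_RXE = 1 << 7
};

/* True if the nic has finished a complete packet with no receive errors. */
static bool rx_desc_is_good_packet(const rx_descriptor_t *d)
{
    const uint8_t need = RXD_STAT_DD | RXD_STAT_EOP;
    const uint8_t bad  = RXD_ERR_RXE | RXD_ERR_SEQ | RXD_ERR_SE | RXD_ERR_CE;
    return (d->desc_status & need) == need && (d->desc_errors & bad) == 0;
}
```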
Essential ‘receive’ registers
enum {
    E1000_CTRL   = 0x0000, // Device Control
    E1000_STATUS = 0x0008, // Device Status
    E1000_ICR    = 0x00C0, // Interrupt Cause Read
    E1000_IMS    = 0x00D0, // Interrupt Mask Set
    E1000_IMC    = 0x00D8, // Interrupt Mask Clear
    E1000_RCTL   = 0x0100, // Receive Control
    E1000_RDBAL  = 0x2800, // Rx Descriptor Base Address Low
    E1000_RDBAH  = 0x2804, // Rx Descriptor Base Address High
    E1000_RDLEN  = 0x2808, // Rx Descriptor Length
    E1000_RDH    = 0x2810, // Rx Descriptor Head
    E1000_RDT    = 0x2818, // Rx Descriptor Tail
    E1000_RXDCTL = 0x2828, // Rx Descriptor Control
    E1000_RA     = 0x5400, // Receive address-filter Array
};
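As an illustration of how the descriptor-ring registers might be programmed, the sketch below writes them into an ordinary array that stands in for the remapped register window. Everything here is hypothetical scaffolding: reg_write(), reg_read() and setup_rx_ring() are invented names, and the initial tail value is left to the caller because the slides do not state which value the driver uses. In a kernel module the stores would go through the kernel's I/O accessors rather than plain array indexing.

```c
#include <stdint.h>

enum {
    REG_RDBAL = 0x2800,   /* Rx Descriptor Base Address Low  */
    REG_RDBAH = 0x2804,   /* Rx Descriptor Base Address High */
    REG_RDLEN = 0x2808,   /* Rx Descriptor Length (bytes)    */
    REG_RDH   = 0x2810,   /* Rx Descriptor Head              */
    REG_RDT   = 0x2818    /* Rx Descriptor Tail              */
};

/* Stand-in for a 32-bit store into the register window (byte offset). */
static void reg_write(volatile uint32_t *regs, uint32_t offset, uint32_t value)
{
    regs[offset / 4] = value;
}

/* Stand-in for a 32-bit load from the register window (byte offset). */
static uint32_t reg_read(volatile uint32_t *regs, uint32_t offset)
{
    return regs[offset / 4];
}

/* Tell the controller where the ring lives and how large it is. */
static void setup_rx_ring(volatile uint32_t *regs, uint64_t ring_phys,
                          uint32_t ring_bytes, uint32_t tail)
{
    reg_write(regs, REG_RDBAL, (uint32_t)(ring_phys & 0xFFFFFFFFu));
    reg_write(regs, REG_RDBAH, (uint32_t)(ring_phys >> 32));
    reg_write(regs, REG_RDLEN, ring_bytes);
    reg_write(regs, REG_RDH, 0);
    reg_write(regs, REG_RDT, tail);
}
```

Splitting the 64-bit physical address across the low and high registers mirrors the 64-bit base-address field in each descriptor.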
Receive Control (0x0100)
Bits 31-16:
31 R=0 | 30-27 FLXBUF | 26 SECRC | 25 BSEX | 24 R=0 | 23 PMCF | 22 DPF | 21 R=0 | 20 CFI | 19 CFIEN | 18 VFE | 17-16 BSIZE
Bits 15-0:
15 BAM | 14 R=0 | 13-12 MO | 11-10 DTYP | 9-8 RDMTS | 7-6 LBM | 5 LPE | 4 MPE | 3 UPE | 2 SBP | 1 EN | 0 R=0
EN = Receive Enable
SBP = Store Bad Packets
UPE = Unicast Promiscuous Enable
MPE = Multicast Promiscuous Enable
LPE = Long Packet reception Enable
LBM = Loopback Mode
RDMTS = Rx-Descriptor Minimum Threshold Size
DTYP = Descriptor Type
MO = Multicast Offset
BAM = Broadcast Accept Mode
BSIZE = Receive Buffer Size
VFE = VLAN Filter Enable
CFIEN = Canonical Form Indicator Enable
CFI = Canonical Form Indicator bit-value
DPF = Discard Pause Frames
PMCF = Pass MAC Control Frames
BSEX = Buffer Size Extension
SECRC = Strip Ethernet CRC
FLXBUF = Flexible Buffer size
We used 0x0000801C in RCTL to prepare the ‘receive engine’ prior to enabling it
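The value 0x0000801C can be unpacked with a few bit masks. This is a sketch only; the constant and function names are hypothetical, and the bit positions are those of the Receive Control layout for this register. It shows that the value sets BAM, MPE, UPE and SBP while leaving EN clear, which matches ‘prepare first, enable later’.

```c
#include <stdint.h>

enum {
    RCTL_EN  = 1 << 1,    /* Receive Enable               */
    RCTL_SBP = 1 << 2,    /* Store Bad Packets            */
    RCTL_UPE = 1 << 3,    /* Unicast Promiscuous Enable   */
    RCTL_MPE = 1 << 4,    /* Multicast Promiscuous Enable */
    RCTL_LPE = 1 << 5,    /* Long Packet reception Enable */
    RCTL_BAM = 1 << 15    /* Broadcast Accept Mode        */
};

#define RCTL_PREPARE 0x0000801Cu   /* value quoted in the slide */

/* Extract the two-bit BSIZE field (bits 17:16). */
static uint32_t rctl_bsize_field(uint32_t rctl)
{
    return (rctl >> 16) & 0x3u;
}

/* Return the same settings with the receive engine switched on. */
static uint32_t rctl_enable(uint32_t rctl)
{
    return rctl | (uint32_t)RCTL_EN;
}
```

Enabling the engine afterwards only needs the EN bit added, giving 0x0000801E.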
Device Control (0x0000)
Figure: 32-bit layout of the 82573L Device Control register (bits 31 down to 0), with reserved bits marked R=0 or R=1
FD = Full-Duplex
GIOMD = GIO Master Disable
SLU = Set Link Up
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
FRCSPD = Force Speed
FRCDPLX = Force Duplex
D/UD = Dock/Undock status
ADVD3WUC = Advertise Cold Wake Up Capability
RST = Device Reset
RFCE = Rx Flow-Control Enable
TFCE = Tx Flow-Control Enable
VME = VLAN Mode Enable
PHYRST = Phy Reset
We used 0x040C0241 to initiate a ‘device reset’ operation
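A few fields of the reset value 0x040C0241 can be decoded as a check. The bit positions used below (FD at bit 0, SLU at bit 6, SPEED at bits 9:8, RST at bit 26) are assumptions taken from the commonly published e1000 register definitions, since the diagram's positions are not legible in this transcript; the names are hypothetical.

```c
#include <stdint.h>

#define CTRL_FD           (1u << 0)    /* Full-Duplex            */
#define CTRL_SLU          (1u << 6)    /* Set Link Up            */
#define CTRL_SPEED_SHIFT  8            /* SPEED field, bits 9:8  */
#define CTRL_SPEED_MASK   (3u << CTRL_SPEED_SHIFT)
#define CTRL_RST          (1u << 26)   /* Device Reset           */

#define CTRL_RESET_VALUE  0x040C0241u  /* value quoted in the slide */

/* Translate the SPEED field into megabits per second (0 = reserved). */
static unsigned ctrl_speed_mbps(uint32_t ctrl)
{
    switch ((ctrl & CTRL_SPEED_MASK) >> CTRL_SPEED_SHIFT) {
    case 0:  return 10;
    case 1:  return 100;
    case 2:  return 1000;
    default: return 0;
    }
}

/* True if the value asks the controller to reset itself. */
static int ctrl_requests_reset(uint32_t ctrl)
{
    return (ctrl & CTRL_RST) != 0;
}
```

Under these assumptions the value requests a reset with full-duplex, link-up, and a 1000Mbps speed setting.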
Device Status (0x0008)
Figure: 32-bit layout of the 82573L Device Status register (bits 31 down to 0); bit 31 is marked ‘?’ (some undocumented functionality?)
FD = Full-Duplex
LU = Link Up
Function ID
TXOFF = Transmission Paused
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
ASDV = Auto-negotiation Speed Detection Value
PHYRA = PHY Reset Asserted
GIO Master EN
PCI Bus Master DMA
Figure: the 82573L i/o-memory holds the RX and TX FIFOs (32-KB total) and the on-chip RX and TX descriptors; DMA transfers connect these to the Rx Descriptor Queue and the packet-buffers in the Host’s Dynamic Random Access Memory
Our ‘read()’ algorithm
unsigned int rx_curr;
ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos )
{
    // our global variable ‘rx_curr’ is the descriptor-array index
    // for the next receive-buffer descriptor to be processed
    if ( this descriptor’s status is zero ) put calling task to sleep;
    // wakeup the task when a fresh packet has been received
    copy received data from the packet-buffer to user’s buffer
    clear this descriptor’s status
    advance our global variable ‘rx_curr’ to the next descriptor
    return the number of data-bytes transferred
}
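The control flow of this algorithm can be tried out in user space. The sketch below is a simulation, not the driver: sim_read() is a hypothetical name, memcpy() stands in for the kernel's copy to the user's buffer, and returning -1 stands in for putting the task to sleep. Truncating the copy to the caller's buffer length is an assumption about how ‘len’ would be honored.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define N_RX_DESC    16
#define RX_BUF_SIZE  1536

typedef struct {
    uint64_t base_address;
    uint16_t packet_length;
    uint16_t packet_cksum;
    uint8_t  desc_status;
    uint8_t  desc_errors;
    uint16_t vlan_tag;
} rx_desc_t;

static rx_desc_t    rx_ring[N_RX_DESC];              /* simulated descriptor queue */
static uint8_t      rx_bufs[N_RX_DESC][RX_BUF_SIZE]; /* simulated packet-buffers   */
static unsigned int rx_curr;                         /* next descriptor to process */

/* Returns the bytes copied, or -1 where the real driver would sleep. */
static long sim_read(uint8_t *buf, size_t len)
{
    rx_desc_t *d = &rx_ring[rx_curr];
    size_t count;

    if (d->desc_status == 0)
        return -1;                           /* no fresh packet yet */

    count = d->packet_length;
    if (count > len)
        count = len;                         /* never overrun the caller's buffer */
    memcpy(buf, rx_bufs[rx_curr], count);

    d->desc_status = 0;                      /* mark the descriptor as consumed */
    rx_curr = (rx_curr + 1) % N_RX_DESC;     /* wrap around the ring */
    return (long)count;
}
```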
‘nicspy.cpp’
• This application calls our device-driver’s ‘read()’ function repeatedly, and displays the ‘raw’ ethernet packet-data each time
• It requires our ‘nicspy.c’ device-driver to be installed in the kernel, obviously
• There’s no ‘clash’ of filenames here – and their similarity helps keep them together:
nicspy.c and nicspy.ko (the kernel-side)
nicspy.cpp and nicspy (the user-side)
in-class demo
• We can install ‘nicspy.ko’ on one of our anchor machines – making sure ‘eth1’ is ‘down’ before we do our module-install – and then we run ‘nicspy’ on that machine
• Next we install our ‘nicping.ko’ module on some other anchor machine – be sure its ‘eth1’ interface is ‘down’ beforehand – and then use ‘cat /proc/nicping’ for a transmit