PFLDNet Workshop February 2003, R. Hughes-Jones, Manchester

Some Performance Measurements: Gigabit Ethernet NICs & Server Quality Motherboards
Richard Hughes-Jones, The University of Manchester
Workshop on Protocols for Fast Long-Distance Networks
Session: Close to Hardware
The Latency Measurements Made

UDP/IP packets sent between back-to-back systems; processed in a similar manner to TCP/IP, but not subject to flow-control & congestion-avoidance algorithms. Used the UDPmon test program.

Latency: round-trip times measured using request-response UDP frames; latency studied as a function of frame size.
The slope s is the sum of the per-byte transfer times of each element in the data path, s = Σ_paths dt/db: mem-mem copy(s) + PCI + Gig Ethernet + PCI + mem-mem copy(s). The intercept indicates processing times + HW latencies.

Histograms of 'singleton' measurements tell us about: the behaviour of the IP stack, the way the HW operates, and interrupt coalescence.
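The slope-and-intercept extraction described above is an ordinary straight-line fit of latency against message size. A minimal sketch (the fitting code and the sample numbers are mine, not from the talk; the synthetic data mimics the 64 bit 66 MHz path with a 0.0118 µs/byte slope and 56 µs intercept):

```python
# Hypothetical sketch of the latency analysis: fit latency vs message
# size to a straight line.  The slope is the summed per-byte transfer
# time of every element in the path; the intercept is fixed
# processing + HW latency.  Sample numbers are illustrative.

def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic round-trip latencies (us): 56 us intercept, 0.0118 us/byte.
sizes = [64, 256, 512, 1024, 1400]
latencies = [56.0 + 0.0118 * s for s in sizes]

slope, intercept = fit_line(sizes, latencies)
print(f"slope = {slope:.4f} us/byte, intercept = {intercept:.1f} us")
```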
The Throughput Measurements Made (1)

UDP throughput: send a controlled stream of UDP frames spaced at regular intervals.

UDPmon control sequence (sender and receiver):
- Zero stats ("OK done")
- Send data frames of n bytes at regular intervals (number of packets and wait time set by the test); the sender records the time to send, the receiver the time to receive and the inter-packet time (histogram)
- Signal end of test ("OK done")
- Get remote statistics; the receiver sends: no. received, no. lost + loss pattern, no. out-of-order, CPU load & no. of interrupts, 1-way delay
The Throughput Measurements Made (2)

UDP throughput: send a controlled stream of UDP frames spaced at regular intervals; vary the frame size and the frame transmit spacing. At the receiver, record:
- The time of the first and last frames received
- The number of packets received, the number lost, the number out of order
- The received inter-packet spacing (histogrammed)
- The time each packet is received (gives the packet-loss pattern)
- CPU load and number of interrupts

Use the Pentium CPU cycle counter for times and delay; only a few lines of user code are needed. Tells us about: the behaviour of the IP stack, the way the HW operates, and the capacity and available throughput of the LAN / MAN / WAN.
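The core of the sender side above is a paced transmit loop. A rough sketch of the idea (my reconstruction, not UDPmon code: the original busy-waited on the Pentium cycle counter, `time.perf_counter_ns` is the portable stand-in, and the `send` stub stands in for a real `sock.sendto()`):

```python
# Sketch of a paced UDP-style sender: transmit n frames separated by a
# fixed wait time, busy-waiting on a high-resolution counter.
import time

def send_paced(n_packets, spacing_us, send=lambda: None):
    """Call send() n_packets times, spacing_us apart; return timestamps (ns)."""
    spacing_ns = int(spacing_us * 1000)
    stamps = []
    next_t = time.perf_counter_ns()
    for _ in range(n_packets):
        while time.perf_counter_ns() < next_t:   # busy-wait for the slot
            pass
        send()
        stamps.append(time.perf_counter_ns())
        next_t += spacing_ns                     # absolute schedule: no drift

    return stamps

stamps = send_paced(50, spacing_us=1000)         # 1 ms spacing for the demo
gaps = [(b - a) / 1000 for a, b in zip(stamps, stamps[1:])]
print(f"mean inter-packet time {sum(gaps) / len(gaps):.1f} us")
```

Scheduling against an absolute deadline (`next_t += spacing_ns`) rather than sleeping a fixed delta keeps the mean spacing on target even when an individual send is delayed.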
The PCI Bus & Gigabit Ethernet Measurements

PCI activity examined with a logic analyser: PCI probe cards in the sending PC, a Gigabit Ethernet fibre probe card, and PCI probe cards in the receiving PC.

[Diagram: in each PC the CPU, memory and NIC connect through the chipset; probes sit on each PCI bus and on the Gigabit Ethernet fibre, all feeding the logic-analyser display.]
In what follows: examine the behaviour of different NICs, a nice example of running at 33 MHz, and a quick look at some new server boards.
SuperMicro 370DLE: Latency: SysKonnect

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1, Kernel 2.4.14

PCI 32 bit 33 MHz: latency small (62 µs) and well behaved. Latency slope 0.0286 µs/byte; expect 0.0232 µs/byte (PCI 0.00758 + GigE 0.008 + PCI 0.00758).
PCI 64 bit 66 MHz: latency small (56 µs) and well behaved. Latency slope 0.0231 µs/byte; expect 0.0118 µs/byte (PCI 0.00188 + GigE 0.008 + PCI 0.00188). Possible extra data moves?

[Plots: latency (µs) vs message length (bytes, 0-3000). SysKonnect 64 bit 66 MHz: fits y = 0.0231x + 56.088 and y = 0.0142x + 81.975. SysKonnect 32 bit 33 MHz: fits y = 0.0286x + 61.79 and y = 0.0188x + 89.756.]
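The "expect" slopes quoted on these slides follow from the bus parameters: each element of the path adds its per-byte transfer time, and the data crosses a PCI bus at each end. A small reconstruction of that arithmetic (mine, not code from the talk):

```python
# Expected latency slope = send PCI + Gigabit Ethernet + receive PCI,
# each expressed as a per-byte transfer time in us/byte.

def pci_us_per_byte(width_bits, clock_mhz):
    """Per-byte burst transfer time of a PCI bus, in us/byte."""
    bytes_per_cycle = width_bits / 8
    return 1.0 / (bytes_per_cycle * clock_mhz)   # MHz = cycles per us

GIGE_US_PER_BYTE = 8 / 1000.0    # 8 bits/byte at 1000 Mbit/s = 0.008 us/byte

def expected_slope(width_bits, clock_mhz):
    pci = pci_us_per_byte(width_bits, clock_mhz)
    return pci + GIGE_US_PER_BYTE + pci          # data crosses PCI twice

print(f"32 bit 33 MHz: {expected_slope(32, 33):.4f} us/byte")   # ~0.0232
print(f"64 bit 66 MHz: {expected_slope(64, 66):.4f} us/byte")   # ~0.0118
```

These match the 0.0232 and 0.0118 µs/byte expectations quoted throughout the deck; measured slopes above them point at extra data moves or per-byte overheads somewhere in the path.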
SuperMicro 370DLE: Throughput: SysKonnect

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1, Kernel 2.4.14

PCI 32 bit 33 MHz: max throughput 584 Mbit/s; no packet loss at >18 µs spacing.
PCI 64 bit 66 MHz: max throughput 720 Mbit/s; no packet loss at >17 µs spacing.
Packet loss during the bandwidth drop.

[Plots: received wire rate (Mbit/s) vs transmit time per frame (µs, 0-40) for frame sizes 50-1472 bytes; SysKonnect at 32 bit 33 MHz and at 64 bit 66 MHz.]
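The wire rates in these plots count full Ethernet overhead, so the rate offered at a given transmit spacing is easy to predict. A sketch of that bookkeeping (my reconstruction; the overhead constants are standard Ethernet/IP figures, not stated in the deck):

```python
# A UDP payload of p bytes occupies p + 28 (UDP + IP headers) + 38
# (Ethernet preamble, header, CRC and inter-frame gap) bytes of wire
# time, so 1472-byte payloads use the full 1500-byte MTU.

UDP_IP_OVERHEAD = 28        # 8 UDP + 20 IP
ETH_OVERHEAD = 38           # 8 preamble + 14 header + 4 CRC + 12 IFG

def wire_bytes(payload):
    return payload + UDP_IP_OVERHEAD + ETH_OVERHEAD

def wire_rate_mbps(payload, spacing_us):
    """Offered wire rate (Mbit/s) for one frame every spacing_us."""
    return wire_bytes(payload) * 8 / spacing_us

# Smallest spacing a 1472-byte frame can have on Gigabit Ethernet:
line_spacing = wire_bytes(1472) * 8 / 1000.0    # us at 1000 Mbit/s
print(f"line-rate spacing {line_spacing:.2f} us")      # ~12.30 us
print(f"offered rate at 18 us: {wire_rate_mbps(1472, 18):.0f} Mbit/s")
```

This is why frame spacings below ~12.3 µs cannot raise the 1472-byte wire rate any further, whatever the NIC does.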
SuperMicro 370DLE: PCI: SysKonnect

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

1400 bytes sent, wait 100 µs: ~8 µs for send or receive; stack & application overhead ~10 µs per node; ~36 µs in total.

[Logic-analyser traces: send setup, send PCI transfer, packet on the Ethernet fibre, receive PCI transfer, receive transfer.]
SuperMicro 370DLE: PCI: SysKonnect

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

1400 bytes sent, wait 20 µs: frames on the Ethernet fibre at 20 µs spacing.
1400 bytes sent, wait 10 µs: frames are back-to-back; can drive at line speed; cannot go any faster!
SuperMicro 370DLE: Latency: Intel Pro/1000

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

Latency high but well behaved; indicates interrupt coalescence. Slope 0.0187 µs/byte; expect 0.0118 µs/byte (PCI 0.00188 + GigE 0.008 + PCI 0.00188).

[Plots: latency (µs) vs message length (bytes); Intel Pro/1000 64 bit 66 MHz, fit y = 0.0187x + 167.86 up to 1500 bytes, with a second plot out to 15000 bytes.]
SuperMicro 370DLE: Throughput: Intel Pro/1000

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

Max throughput 910 Mbit/s; no packet loss at >12 µs spacing. Packet loss during the bandwidth drop; CPU load 65-90 % for spacing < 13 µs.

[Plots: received wire rate (Mbit/s) and % packet loss vs transmit time per frame (µs, 0-40), frame sizes 50-1472 bytes.]
SuperMicro 370DLE: PCI: Intel Pro/1000

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

Request-response: a 64-byte request is sent; the request is received; a 1400-byte response follows. The traces show the send and receive PCI activity, the interrupt delay, and the send and receive interrupt processing. This demonstrates interrupt coalescence: there is no processing directly after each transfer.
SuperMicro 370DLE: PCI: Intel Pro/1000

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; PCI 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14

1400 bytes sent, wait 11 µs: ~4.7 µs on the send PCI bus (~43 % occupancy); ~3.25 µs on PCI for data receive (~30 % occupancy). The action of pause packets is also visible.
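The occupancy figures above are simply the DMA time per frame divided by the transmit interval. A one-line reconstruction of that arithmetic (mine, not from the talk):

```python
# PCI bus occupancy: fraction of each transmit interval the bus is
# busy with the frame's DMA transfer.

def occupancy(bus_time_us, interval_us):
    """Percent of the interval the PCI bus is occupied."""
    return 100.0 * bus_time_us / interval_us

# 1400-byte frames every 11 us on the 64 bit 66 MHz bus:
print(f"send: {occupancy(4.7, 11):.0f} %")    # ~43 %
print(f"recv: {occupancy(3.25, 11):.0f} %")   # ~30 %
```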
SuperMicro 370DLE: Throughput: Alteon

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1, Kernel 2.4.14

PCI 32 bit 33 MHz: max throughput 674 Mbit/s; packet loss at < 10 µs spacing.
PCI 64 bit 66 MHz: max throughput 930 Mbit/s; packet loss at < 10 µs spacing.
Packet loss during the bandwidth drop.

[Plots: received wire rate (Mbit/s) vs transmit time per frame (µs, 0-40), Alteon at 32 bit 33 MHz and at 64 bit 66 MHz, frame sizes 50-1472 bytes.]
SuperMicro 370DLE: PCI: Alteon

Motherboard: SuperMicro 370DLE; Chipset: ServerWorks III LE; CPU: PIII 800 MHz; RedHat 7.1, Kernel 2.4.14

PCI 64 bit 33 MHz, 1400-byte packets: signals nice and clean.
PCI 64 bit 66 MHz, 1400-byte packets at 16 µs spacing: pauses in the NIC memory transfer slow down the receive transfer.

[Traces: send PCI, receive PCI, receive transfer.]
IBM das: Throughput: Intel Pro/1000

Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1 GHz; PCI 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14

Max throughput 930 Mbit/s; no packet loss at > 12 µs spacing; clean behaviour. Packet loss during the drop.

[Plots: % packet loss and received wire rate (Mbit/s) vs transmit time per frame (µs, 0-40), Intel Pro/1000 on IBM das 64 bit 33 MHz, frame sizes 50-1472 bytes.]
IBM das: PCI: Intel Pro/1000

Motherboard: IBM das; Chipset: ServerWorks CNB20LE; CPU: Dual PIII 1 GHz; PCI 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14

1400 bytes sent at 11 µs spacing: signals clean; ~9.3 µs on the send PCI bus (~82 % occupancy); ~5.9 µs on PCI for data receive.
SuperMicro P4DP6: Latency: Intel Pro/1000

Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI 64 bit 66 MHz; RedHat 7.2, Kernel 2.4.19

Some steps in the latency curve; slope 0.009 µs/byte, slope of the flat sections 0.0146 µs/byte; expect 0.0118 µs/byte. The latency histograms show no variation with packet size and a FWHM of 1.5 µs, which confirms the timing is reliable.

[Plots: latency (µs) vs message length (bytes, 0-3000), fits y = 0.0093x + 194.67 and y = 0.0149x + 201.75; latency histograms N(t) for 64, 512, 1024 and 1400 byte packets over 170-230 µs.]
SuperMicro P4DP6: Throughput: Intel Pro/1000

Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI 64 bit 66 MHz; RedHat 7.2, Kernel 2.4.19

Max throughput 950 Mbit/s; no packet loss. CPU utilisation on the receiving PC was ~25 % for packets > 1000 bytes and 30-40 % for smaller packets.

[Plots (gig6-7, Intel, PCI 66 MHz, 27 Nov 02): received wire rate (Mbit/s) and % CPU idle on the receiver vs transmit time per frame (µs, 0-40), frame sizes 50-1472 bytes.]
SuperMicro P4DP6: PCI: Intel Pro/1000

Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI 64 bit 66 MHz; RedHat 7.2, Kernel 2.4.19

1400 bytes sent, wait 12 µs: ~5.14 µs on the send PCI bus (~68 % occupancy); ~3 µs on PCI for data receive. CSR access inserts PCI STOPs; the NIC takes ~1 µs per CSR access: the CPU is faster than the NIC! A similar effect is seen with the SysKonnect NIC.
SuperMicro P4DP8-G2: Throughput: SysKonnect

Motherboard: SuperMicro P4DP8-G2; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.4 GHz; PCI 64 bit 66 MHz; RedHat 7.3, Kernel 2.4.19

Max throughput 990 Mbit/s (a new card compared with the other tests). CPU utilisation: 20-30 % on the sender, ~30 % on the receiver.

[Plots (SysKonnect back-to-back): received wire rate (Mbit/s), % CPU idle on the sender, and % CPU idle on the receiver vs transmit time per frame (µs, 0-40), frame sizes 50-1472 bytes.]
SuperMicro P4DP8-G2: Throughput: Intel onboard

Motherboard: SuperMicro P4DP8-G2; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.4 GHz; PCI-X 64 bit; RedHat 7.3, Kernel 2.4.19

Max throughput 995 Mbit/s; no packet loss. CPU utilisation on the receiver: 20 % for packets > 1000 bytes, 30 % for smaller packets.

[Plots (Intel onboard): received wire rate (Mbit/s) and % CPU idle on the receiver vs transmit time per frame (µs, 0-40), frame sizes 50-1472 bytes.]
Futures & Work in Progress

Dual Gigabit Ethernet controllers. A more detailed study of PCI-X. Interaction of multiple PCI Gigabit flows. What happens when you have disks? 10 Gigabit Ethernet NICs.

[Plot: 2 streams on a dual Gigabit controller; received wire rate (Mbit/s) vs transmit time per frame (µs, 0-40), frame sizes 50-1472 bytes.]
Summary & Conclusions (1)

All NICs & motherboards were stable: 1000s of GBytes of transfers.
Alteon could handle 930 Mbit/s on 64 bit/66 MHz. SysKonnect gave 720-876 Mbit/s, improving to 876-990 Mbit/s on later motherboards. Intel gave 910-950 Mbit/s, and 950-995 Mbit/s on later motherboards.
PCI and GigEthernet signals show an 800 MHz CPU can drive large packets at line speed.
More CPU power is required for receiving; the loss is due to IP discards. Rule of thumb: at least 1 GHz of CPU power free per 1 Gbit/s.
Times for DMA transfers scale with PCI bus speed, but CSR access time is constant. The new PCI-X and on-board controllers are better.
Buses: 64 bit 66 MHz PCI or faster PCI-X are required for performance. The 32 bit 33 MHz PCI bus is REALLY busy!! 64 bit 33 MHz buses are > 80 % used.
Summary, Conclusions & Thanks

The NICs should be well designed: use advanced PCI commands so the chipset makes efficient use of memory; CSRs well designed, with a minimum number of accesses.
The drivers need to be well written: CSR access, clean management of buffers, good interrupt handling.
Worry about the CPU-memory bandwidth as well as the PCI bandwidth: data crosses the CPU bus several times.
Separate the data transfers: use motherboards with multiple PCI buses. The OS must be up to it too!!
Throughput Measured for 1472 byte Packets

Motherboard (NIC →) | Alteon AceNIC | SysKonnect SK-9843 | Intel Pro/1000
SuperMicro 370DLE; ServerWorks III LE; PCI 32 bit 33 MHz; RedHat 7.1, Kernel 2.4.14 | 674 Mbit/s | 584 Mbit/s, 0-0 µs | -
SuperMicro 370DLE; ServerWorks III LE; PCI 64 bit 66 MHz; RedHat 7.1, Kernel 2.4.14 | 930 Mbit/s | 720 Mbit/s, 0-0 µs | 910 Mbit/s, 400-120 µs
IBM das; CNB20LE; PCI 64 bit 33 MHz; RedHat 7.1, Kernel 2.4.14 | - | 790 Mbit/s, 0-0 µs | 930 Mbit/s, 400-120 µs
SuperMicro P4DP6; Intel E7500; PCI 64 bit 66 MHz; RedHat 7.2, Kernel 2.4.19-SMP | - | 876 Mbit/s, 0-0 µs | 950 Mbit/s, 70-70 µs
SuperMicro P4DP8-G2; Intel E7500; PCI 64 bit 66 MHz; RedHat 7.2, Kernel 2.4.19-SMP | - | 990 Mbit/s, 0-0 µs | 995 Mbit/s, 70-70 µs
The SuperMicro P4DP6 Motherboard

Dual Xeon Prestonia (2 CPU/die), 400 MHz front-side bus. Intel E7500 chipset; 6 PCI-X slots on 4 independent PCI buses; can select 64 bit 66 MHz PCI, 100 MHz PCI-X, or 133 MHz PCI-X. 2 × 100 Mbit Ethernet; Adaptec AIC-7899W dual-channel SCSI; UDMA/100 bus-master EIDE channels with data-transfer rates of 100 MB/sec burst. The P4DP8-2G adds dual Gigabit Ethernet.
More Information: Some URLs

UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
ATLAS Investigation of the Performance of 100 Mbit and Gigabit Ethernet Components Using Raw Ethernet Frames: http://www.hep.man.ac.uk/~rich/atlas/atlas_net_note_draft5.pdf
DataGrid WP7 Networking: http://www.gridpp.ac.uk/wp7/index.html
Motherboard and NIC Tests: www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt
IEPM-BW site: http://www-iepm.slac.stanford.edu/bw
SuperMicro P4DP6: Latency: SysKonnect

Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI 64 bit 66 MHz; RedHat 7.3, Kernel 2.4.19

Latency low; interrupts every packet; latency well behaved. Slope 0.0199 µs/byte; expect 0.0118 µs/byte (PCI 0.00188 + GigE 0.008 + PCI 0.00188).

[Plot: latency (µs) vs message length (bytes, 0-3000), SysKonnect 64 bit 66 MHz; fits y = 0.0199x + 62.406 and y = 0.012x + 78.554.]
SuperMicro P4DP6: Throughput: SysKonnect

Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI 64 bit 66 MHz; RedHat 7.3, Kernel 2.4.19

Max throughput 876 Mbit/s: a big improvement. The loss is not due to user-kernel moves; it was traced to "indiscards" in the receiving IP layer. CPU utilisation on the receiving PC was ~25 % for packets > 1000 bytes and 30-40 % for smaller packets.

[Plots (SysKonnect 64 bit 66 MHz on P4DP6): received wire rate (Mbit/s), % packet loss, and % CPU idle on the receiver vs transmit time per frame (µs, 0-40), frame sizes 50-1472 bytes.]
SuperMicro P4DP6: PCI: SysKonnect

Motherboard: SuperMicro P4DP6; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.2 GHz; PCI 64 bit 66 MHz; RedHat 7.2, Kernel 2.4.19

1400 bytes sent: DMA transfers are clean, with PCI STOP signals when accessing the NIC CSRs. The NIC takes ~0.7 µs per CSR access: the CPU is faster than the NIC!

[Traces: send PCI and receive PCI, with the PCI STOPs marked.]
SuperMicro P4DP8-G2: Latency: SysKonnect

Motherboard: SuperMicro P4DP8-G2; Chipset: Intel E7500 (Plumas); CPU: Dual Xeon Prestonia 2.4 GHz; PCI 64 bit 66 MHz; RedHat 7.3, Kernel 2.4.19

Latency low; interrupt every packet; several steps in the curve. Slope 0.022 µs/byte; expect 0.0118 µs/byte (PCI 0.00188 + GigE 0.008 + PCI 0.00188). The plot is smooth for PC-switch-PC! Slope 0.028 µs/byte.

[Plots: latency (µs) vs message length (bytes, 0-3000). SysKonnect back-to-back: fits y = 0.0221x + 43.322 and y = 0.007x + 65.95. Via switch (w06gva-05, SysKonnect SW, 10 Oct 02): fits y = 0.028x + 57.467 and y = 0.008x + 86.15.]
Interrupt Coalescence: Throughput

Intel Pro/1000 on the 370DLE.

[Plots: received wire rate (Mbit/s) vs delay between transmit packets (µs, 0-40) for coalescence settings coa5, coa10, coa20, coa40, coa64 and coa100; one plot for 1472-byte packets (up to ~900 Mbit/s) and one for 1000-byte packets (up to ~800 Mbit/s).]
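For reference, the coa5 to coa100 settings above correspond to the NIC's interrupt-coalescence delay. On a modern Linux system the equivalent knobs are exposed through ethtool rather than driver module parameters (a hedged sketch: the interface name `eth0` and driver support for these options are assumptions):

```shell
# Show the current interrupt-coalescence settings of the NIC.
ethtool -c eth0

# Delay RX interrupts by up to 64 us, batching packets per interrupt
# (the trade-off these plots measure: fewer interrupts, higher latency).
ethtool -C eth0 rx-usecs 64

# Likewise for TX completion interrupts.
ethtool -C eth0 tx-usecs 64
```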