Protocols Recent and Current Work.
ESLEA Technical Collaboration Meeting, 20-21 Jun 2006, R. Hughes-Jones, Manchester
Protocols: Recent and Current Work.
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then “Talks”
Outline
SC|05: TCP and UDP memory-to-memory & disk-to-disk flows; 10 Gbit Ethernet.
VLBI: Jodrell Mark5 problem (see Matt's talk); data delay on a TCP link: how suitable is TCP?
4th-year MPhys project (Stephen Kershaw & James Keenan): throughput on the 630 Mbit JB-JIVE UKLight link.
10 Gbit in FABRIC.
ATLAS: network tests on the Manchester T2 farm; the Manc-Lanc UKLight link; ATLAS remote farms.
RAID tests: HEP server with an 8-lane PCIe RAID card.
Collaboration at SC|05
SCINet; the Caltech booth; the BWC at the SLAC booth; ESLEA; Boston Ltd. & Peta-Cache/Sun; StorCloud.
Bandwidth Challenge wins Hat Trick
The maximum aggregate bandwidth was >151 Gbit/s: 130 DVD movies in a minute, or serving 10,000 MPEG2 HDTV movies in real time. 22 10-Gigabit Ethernet waves at the Caltech & SLAC/FERMI booths. In 2 hours 95.37 TByte was transferred; in 24 hours ~475 TByte was moved. Showed real-time particle event analysis.
SLAC/Fermi/UK booth: one 10 Gbit Ethernet wave to the UK over NLR & UKLight (transatlantic HEP disk-to-disk, VLBI streaming); two 10 Gbit links to SLAC (rootd low-latency file access application for clusters, Fibre Channel StorCloud); four 10 Gbit links to Fermi (dCache data transfers).
Waves: SLAC-ESnet, FermiLab-HOPI, SLAC-ESnet-USN, FNAL-UltraLight, UKLight. (SC2004: 101 Gbit/s.)
[Chart: traffic in to and out of the booth.]
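As a quick consistency check on the figures above (a sketch, assuming decimal TBytes and straight division):

```python
# Back-of-envelope check of the Bandwidth Challenge numbers quoted above.
TBYTE = 1e12  # decimal terabyte, as typically used in these reports

def avg_rate_gbps(tbytes: float, seconds: float) -> float:
    """Average rate in Gbit/s for a transfer of `tbytes` TByte in `seconds` s."""
    return tbytes * TBYTE * 8 / seconds / 1e9

print(round(avg_rate_gbps(95.37, 2 * 3600)))   # ~106 Gbit/s averaged over the 2-hour burst
print(round(avg_rate_gbps(475, 24 * 3600)))    # ~44 Gbit/s sustained over 24 h
```

Both averages sit comfortably below the >151 Gbit/s peak, as expected.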
ESLEA and UKLight
6 x 1 Gbit transatlantic Ethernet layer-2 paths, UKLight + NLR. Disk-to-disk transfers with bbcp, Seattle to UK: TCP buffer and application set to give ~850 Mbit/s; one stream of data ran at 840-620 Mbit/s. Streamed UDP VLBI data UK to Seattle at 620 Mbit/s.
[Charts: achieved rate (Mbit/s, 0-1000) vs time (16:00-23:00) for the four streams sc0501-sc0504, and the aggregate UKLight rate (Mbit/s, 0-4500) over the same period, including the reverse TCP flow.]
SLAC 10 Gigabit Ethernet
Two lightpaths: one routed over ESnet, one layer 2 over UltraScience Net; 6 Sun V20Z systems per λ. dCache remote disk data access: 100 processes per node; each node sends or receives; one data stream is 20-30 Mbit/s. Used Neterion NICs & Chelsio TOEs; data was also sent to StorCloud over Fibre Channel links. Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on the trunk.
VLBI Work
TCP Delay and VLBI Transfers
Manchester 4th Year MPhys Project by Stephen Kershaw & James Keenan
VLBI Network Topology
[Diagram: VLBI network topology.]
VLBI Application Protocol
VLBI data is constant bit rate (CBR). tcpdelay, an instrumented TCP program, emulates sending CBR data and records the relative 1-way delay.
[Diagram: the sender timestamps each message (Timestamp1/Data1, Timestamp3/Data2, ...) before it crosses the TCP network, and the receiver records its own timestamps (Timestamp2, ...) on arrival. A second diagram shows sender/receiver timelines with packet loss, the RTT, ACKs, and segment time on the wire = bits in segment / BW.]
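The idea behind tcpdelay can be sketched in a few lines. This is a hypothetical toy, not the instrumented program itself: it uses a local socketpair so both endpoints share one clock, which makes the relative 1-way delay directly measurable.

```python
import socket, struct, threading, time

MSG_SIZE = 1448            # message size used in the tests above
HDR = struct.Struct("!d")  # 8-byte send timestamp at the front of each message

def sender(sock, n_msgs, interval):
    """Send n_msgs fixed-size messages at a constant bit rate."""
    payload = b"\x00" * (MSG_SIZE - HDR.size)
    next_send = time.perf_counter()
    for _ in range(n_msgs):
        sock.sendall(HDR.pack(time.perf_counter()) + payload)
        next_send += interval
        while time.perf_counter() < next_send:  # crude CBR pacing
            pass
    sock.shutdown(socket.SHUT_WR)

def receiver(sock):
    """Record relative 1-way delay: receive time minus send timestamp."""
    delays, buf = [], b""
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            break
        buf += chunk
        while len(buf) >= MSG_SIZE:
            (t_send,) = HDR.unpack_from(buf)
            delays.append(time.perf_counter() - t_send)
            buf = buf[MSG_SIZE:]
    return delays

a, b = socket.socketpair()
out = []
rx = threading.Thread(target=lambda: out.extend(receiver(b)))
rx.start()
sender(a, n_msgs=100, interval=0.001)  # 1448 B every 1 ms ~= 11.6 Mbit/s CBR
a.close()
rx.join()
print(len(out), "messages received")
```

On a real path the two clocks are not synchronised, which is why the measured quantity is a relative, not absolute, 1-way delay.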
Remember the Bandwidth*Delay Product: BDP = RTT * BW.
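A quick worked example for the routes discussed here (the 1 Gbit/s path rate is an assumption):

```python
def bdp_bytes(rtt_s: float, bw_bps: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return rtt_s * bw_bps / 8

print(bdp_bytes(0.026, 1e9) / 1e6)  # 26 ms RTT at 1 Gbit/s -> 3.25 MByte
print(bdp_bytes(0.027, 1e9) / 1e6)  # 27 ms -> ~3.4 MByte, the figure quoted for the Amsterdam route
```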
Check the Send Time: send time for 10,000 packets
10,000 messages; message size 1448 bytes; wait time 0; TCP buffer 64k. Route: Man-ukl-JIVE-prod-Man, RTT ~26 ms.
Slope: 0.44 ms/message. From the TCP buffer size & RTT, expect ~42 messages/RTT, i.e. ~0.6 ms/message.
[Chart: send time (s) vs message number, with a 1 s marker.]
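A rough check of the expected slope, assuming the full 64k buffer's worth of messages is in flight each RTT:

```python
# Expected send-time slope when the TCP buffer, not the link, is the limit.
buffer_bytes = 64 * 1024   # 64k TCP buffer from the test setup
msg_bytes = 1448
rtt_ms = 26.0

msgs_per_rtt = buffer_bytes / msg_bytes   # ~45 messages in flight per RTT
slope_ms = rtt_ms / msgs_per_rtt          # ~0.57 ms/message
print(f"{msgs_per_rtt:.0f} msgs/RTT, {slope_ms:.2f} ms/message")
```

This lands in the same ballpark as the ~42 messages/RTT and ~0.6 ms/message quoted; the measured 0.44 ms slope suggests slightly more data in flight than the nominal buffer size.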
Send Time Detail
[Chart: send time (s) vs message number, showing bursts of 26 messages, one burst per RTT; messages 76-102 span about one RTT; individual sends take about 25 µs; 100 ms marker.]
TCP send-buffer limited: after SlowStart the buffer is full, packets are sent out in bursts each RTT, and the program blocks on sendto().
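The sendto() back-pressure can be demonstrated on any machine. This sketch uses a local socketpair and a non-blocking socket to show the send buffer filling; a blocking socket would simply stall at that point, as tcpdelay does, until ACKs free buffer space.

```python
import socket

# Shrink the send buffer, never read at the far end, and watch send() hit
# the buffer limit -- the same back-pressure that blocks tcpdelay in sendto().
a, b = socket.socketpair()
a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 64 * 1024)
a.setblocking(False)

sent = 0
msg = b"\x00" * 1448
try:
    while True:
        sent += a.send(msg)
except BlockingIOError:
    pass  # a blocking socket would sleep here instead of raising
print(f"buffer absorbed {sent} bytes before send() would block")
```

The exact byte count depends on the kernel (Linux, for instance, doubles the requested SO_SNDBUF), but the qualitative behaviour is the one seen in the send-time chart.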
1-Way Delay: 10,000 packets
10,000 messages; message size 1448 bytes; wait time 0; TCP buffer 64k. Route: Man-ukl-JIVE-prod-Man, RTT ~26 ms.
[Chart: 1-way delay vs message number, with a 100 ms marker.]
1-Way Delay Detail
[Chart: 1-way delay (10 ms scale) vs message number, with bands at 1 x RTT (26 ms) and 1.5 x RTT but not at 0.5 x RTT.]
Why not just 1 RTT? After SlowStart the TCP buffer is full: messages at the front of the TCP send buffer have to wait for the next burst of ACKs, 1 RTT later, while messages further back in the TCP send buffer wait for 2 RTT.
1-Way Delay with Packet Drop
Route: LAN gig8-gig1, ping 188 µs. 10,000 messages; message size 1448 bytes; wait time 0 µs; drop 1 in 1000.
[Chart: 1-way delay (10 ms scale) vs message number, rising in ~800 µs steps to ~28 ms; 5 ms marker.]
Manc-JIVE tests show times increasing with a "saw-tooth" of around 10 s.
10 Gbit in FABRIC
FABRIC 4 Gbit Demo: a 4 Gbit lightpath between GÉANT PoPs, in collaboration with Dante. Continuous (days-long) data flows: VLBI_UDP and multi-gigabit TCP tests.
10 Gigabit Ethernet: UDP Data Transfer on PCI-X
Sun V20z (1.8 GHz) to 2.6 GHz dual Opterons, connected via a 6509; Xframe II NIC; PCI-X mmrbc 2048 bytes at 66 MHz.
One 8000-byte packet: 2.8 µs for CSRs, 24.2 µs data transfer; effective rate 2.6 Gbit/s. 2000-byte packets, wait 0 µs: ~200 ms pauses. 8000-byte packets, wait 0 µs: ~15 ms between data blocks.
[Trace: CSR access (2.8 µs) followed by the data transfer.]
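The 2.6 Gbit/s figure follows from the data-transfer time alone; including the CSR accesses lowers it slightly:

```python
# Effective PCI-X rate for one 8000-byte UDP packet, from the 2.8 us CSR
# access and 24.2 us data-transfer times measured above.
pkt_bits = 8000 * 8
xfer_only_gbps = pkt_bits / 24.2e-6 / 1e9               # data transfer alone
with_csr_gbps = pkt_bits / ((2.8 + 24.2) * 1e-6) / 1e9  # including CSR access
print(f"{xfer_only_gbps:.1f} Gbit/s transfer-only, {with_csr_gbps:.1f} Gbit/s with CSRs")
```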
ATLAS
ESLEA: ATLAS on UKLight. 1 Gbit lightpath Lancaster-Manchester; disk-to-disk transfers; Storage Element with SRM using distributed disk pools (dCache & xrootd).
udpmon: Lanc-Manc Throughput
Lanc to Manc: plateau ~640 Mbit/s wire rate, no packet loss. Manc to Lanc: ~800 Mbit/s, but with packet loss.
The send times pause 695 µs every 1.7 ms, so expect ~600 Mbit/s. The receive times (Manc end) show no corresponding gaps.
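The ~600 Mbit/s expectation follows from the duty cycle, assuming the 1.7 ms period includes the 695 µs pause:

```python
# Duty-cycle estimate: a 695 us pause in every 1.7 ms period caps the send rate.
line_rate_mbps = 1000.0               # assumed Gigabit Ethernet line rate
cycle_ms, pause_ms = 1.7, 0.695
expected_mbps = line_rate_mbps * (cycle_ms - pause_ms) / cycle_ms
print(f"{expected_mbps:.0f} Mbit/s")  # ~590 Mbit/s, i.e. the ~600 Mbit/s expected
```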
[Charts (pyg13-gig1, 19 Jun 06): 1-way delay (µs) vs send time and vs receive time (0.1 µs units), and received wire rate (Mbit/s) vs spacing between frames (µs) for frame sizes 50-1472 bytes.]
udpmon: Manc-Lanc Throughput
Manc to Lanc: plateau ~890 Mbit/s wire rate. Packet loss: ~10% for large frames at line rate, ~60% for small frames at line rate.
[Charts (gig1-pyg13, 20 Jun 06): received wire rate (Mbit/s) and % packet loss vs spacing between frames (µs) for frame sizes 50-1472 bytes, and 1-way delay (µs) vs packet number.]
ATLAS Remote Computing: Application Protocol
Event request: the EFD requests an event from the SFI; the SFI replies with the event (~2 Mbytes). The event is processed and the computation returned: the EF asks the SFO for buffer space, the SFO sends OK, and the EF transfers the results of the computation.
tcpmon, an instrumented TCP request-response program, emulates the Event Filter EFD-to-SFI communication.
[Diagram: message sequence between the Event Filter Daemon (EFD) and the SFI/SFO (request event, send event data, process event, request buffer, send OK, send processed event), plus a histogram of the request-response time.]
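A minimal request-response emulator in the spirit of tcpmon. This is a toy over loopback with an assumed 1-Mbyte "event"; the real program instruments the TCP stack via Web100, which this sketch does not.

```python
import socket, threading, time

REQ_SIZE, EVENT_SIZE = 64, 1 * 1024 * 1024  # 64 B request, ~1 MB "event"

def sfi_server(listener):
    """Toy SFI: answer each fixed-size request with one event's worth of data."""
    conn, _ = listener.accept()
    with conn:
        event = b"\x00" * EVENT_SIZE
        while True:
            req = conn.recv(REQ_SIZE)
            if not req:
                break
            conn.sendall(event)

listener = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=sfi_server, args=(listener,), daemon=True).start()

times = []
with socket.create_connection(listener.getsockname()) as efd:
    for _ in range(5):                 # EFD side: request events, time responses
        t0 = time.perf_counter()
        efd.sendall(b"R" * REQ_SIZE)
        got = 0
        while got < EVENT_SIZE:
            got += len(efd.recv(65536))
        times.append(time.perf_counter() - t0)
print([f"{t * 1e3:.1f} ms" for t in times])
```

Over a real 20 ms RTT path the per-response time is dominated by how many round trips the congestion window allows, which is exactly what the following slides measure.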
tcpmon: TCP Activity Manc-CERN Req-Resp
Web100 hooks give the TCP status. Round-trip time 20 ms; 64-byte request (green), 1-Mbyte response (blue). TCP is in slow start: the 1st event takes 19 RTT, or ~380 ms.
The TCP congestion window gets re-set on each request: the TCP stack follows RFC 2581 & RFC 2861, reducing Cwnd after inactivity. Even after 10 s, each response takes 13 RTT, or ~260 ms.
The transfer achievable throughput is 120 Mbit/s and the event rate is very low. Application not happy!
[Charts: DataBytesOut and DataBytesIn (delta) vs time; the same with CurCwnd (value); TCP achieved throughput (Mbit/s) and Cwnd vs time (ms).]
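The cost of the Cwnd reset can be seen with a toy slow-start model. This is an illustration only: it assumes the window doubles each RTT with no delayed ACKs, loss, or connection-setup cost, so its absolute RTT counts come out lower than the 13-19 RTT measured, but the cold-vs-warm contrast is the point.

```python
# Toy slow-start model: RTTs to deliver one ~1 MB response, with and without
# the RFC 2861 congestion-window reset between requests.
MSS = 1448

def rtts_to_send(total_bytes: int, cwnd_segments: int) -> int:
    """RTTs needed in idealised slow start (cwnd doubles per RTT, no loss)."""
    segments = -(-total_bytes // MSS)   # ceiling division
    rtts = 0
    while segments > 0:
        segments -= cwnd_segments       # one window's worth sent per RTT
        cwnd_segments *= 2
        rtts += 1
    return rtts

cold = rtts_to_send(1_000_000, 2)       # window reset before every response
warm = rtts_to_send(1_000_000, 1024)    # window retained from earlier responses
print(cold, "RTTs cold vs", warm, "RTT warm")
```

With the window reset, every 1-Mbyte response pays the slow-start ramp again; with a retained window the whole response fits in one RTT, which is what the next slide demonstrates.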
tcpmon: TCP Activity Manc-CERN Req-Resp, no Cwnd Reduction
Round-trip time 20 ms; 64-byte request (green), 1-Mbyte response (blue). TCP starts in slow start: the 1st event takes 19 RTT, or ~380 ms.
With the reduction disabled, the TCP congestion window grows nicely. A response takes 3 round trips, then only 2 round trips after ~1.5 s; the rate is ~10/s (with a 50 ms wait), and the transfer achievable throughput grows to 800 Mbit/s. Data is transferred WHEN the application requires the data.
[Charts: DataBytesOut and DataBytesIn (delta) vs time; TCP achieved throughput (Mbit/s) and Cwnd vs time (ms); PktsOut and PktsIn (delta) with CurCwnd (value) vs time (ms).]
Recent RAID Tests
Manchester HEP Server
“Server Quality” Motherboards
Boston/Supermicro H8DCi: two dual-core Opterons at 1.8 GHz; 550 MHz DDR memory; HyperTransport. Chipset: nVidia nForce Pro 2200/2050 with an AMD 8132 PCI-X bridge. Buses: two 16-lane PCIe, one 4-lane PCIe, and 133 MHz PCI-X. Two Gigabit Ethernet ports; SATA.
Disk_test: Areca PCI-Express 8-port controller, Maxtor 300 GB SATA disks.
RAID0, 5 disks: read 2.5 Gbit/s, write 1.8 Gbit/s.
RAID5, 5 data disks: read 1.7 Gbit/s, write 1.48 Gbit/s.
RAID6, 5 data disks: read 2.1 Gbit/s, write 1.0 Gbit/s.
[Charts (afs6, Areca 8-lane PCIe, 10 Jun 06): throughput (Mbit/s) vs file size (Mbytes) for 8k reads and writes on the R0 5-disk, R5 5-disk and R6 7-disk arrays.]
Any Questions?
More Information: Some URLs 1
UKLight web site: http://www.uklight.ac.uk
MB-NG project web site: http://www.mb-ng.net/
DataTAG project web site: http://www.datatag.org/
UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
"Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards", FGCS special issue 2004: http://www.hep.man.ac.uk/~rich/
TCP tuning information may be found at: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
TCP stack comparisons: "Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks", Journal of Grid Computing 2004
PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
More Information: Some URLs 2
Lectures, tutorials etc. on TCP/IP: www.nv.cc.va.us/home/joney/tcp_ip.htm, www.cs.pdx.edu/~jrb/tcpip.lectures.html, www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS, www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm, www.cis.ohio-state.edu/htbin/rfc/rfc1180.html, www.jbmelectronics.com/tcp.htm
Encyclopaedia: http://www.freesoft.org/CIE/index.htm
TCP/IP resources: www.private.org.il/tcpip_rl.html
Understanding IP addresses: http://www.3com.com/solutions/en_US/ncs/501302.html
Configuring TCP (RFC 1122): ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
Assigned protocols, ports etc. (RFC 1010): http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols
Backup Slides
SuperComputing
SC2004: Disk-Disk bbftp
The bbftp file transfer program uses TCP/IP. UKLight path: London-Chicago-London. PCs: Supermicro + 3Ware RAID0. MTU 1500 bytes; socket size 22 Mbytes; RTT 177 ms; SACK off. A 2-Gbyte file was moved.
Web100 plots: standard TCP averaged 825 Mbit/s (bbcp: 670 Mbit/s); Scalable TCP averaged 875 Mbit/s (bbcp: 701 Mbit/s, with ~4.5 s of overhead).
Disk-TCP-disk at 1 Gbit/s is here!
[Charts: TCP achieved rate (Mbit/s) with instantaneous BW, average BW, and CurCwnd (value) vs time (ms), for each stack.]
SC|05 HEP: Moving Data with bbcp
What is the end host doing with your network protocol? Look at the PCI-X: 3Ware 9000 controller RAID0, 1 Gbit Ethernet link, 2.4 GHz dual Xeon, ~660 Mbit/s.
[Traces: the PCI-X bus with the RAID controller reads from disk for 44 ms every 100 ms; the PCI-X bus with the Ethernet NIC writes to the network for 72 ms.]
Power is needed in the end hosts, along with careful application design.
10 Gigabit Ethernet: UDP Throughput
A 1500-byte MTU gives ~2 Gbit/s; a 16144-byte MTU was used (max user length 16080).
DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes; wire-rate throughput of 2.9 Gbit/s.
CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 4096 bytes; wire rate of 5.7 Gbit/s.
SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes; wire rate of 5.4 Gbit/s.
[Chart (an-al 10GE Xsum 512kbuf MTU16114, 27 Oct 03): received wire rate (Mbit/s) vs spacing between frames (µs) for packet sizes 1472-16080 bytes.]
10 Gigabit Ethernet: Tuning PCI-X
16080-byte packets sent every 200 µs; Intel PRO/10GbE LR adapter; PCI-X bus occupancy vs mmrbc. The measured times agree with times based on PCI-X timing from the logic analyser. Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s.
[Traces: PCI-X sequences (CSR access, data transfer, interrupt & CSR update) for mmrbc of 512, 1024, 2048 and 4096 bytes, the last giving 5.7 Gbit/s; kernel 2.6.1 #17, HP Itanium, Intel 10GE, Feb 04. Charts: PCI-X transfer time (µs) and measured rate (Gbit/s) vs max memory read byte count, against the rate from the expected time and the max PCI-X throughput, for the HP Itanium and the DataTAG Xeon 2.2 GHz.]
10 Gigabit Ethernet: TCP Data Transfer on PCI-X
Sun V20z (1.8 GHz) to 2.6 GHz dual Opterons, connected via a 6509; Xframe II NIC; PCI-X mmrbc 4096 bytes at 66 MHz. Two 9000-byte packets sent back-to-back; average rate 2.87 Gbit/s. Bursts of packets of length 646.8 µs, with 343 µs gaps between bursts and 2 interrupts per burst.
[Trace: CSR access and data transfer.]
TCP on the 630 Mbit Link
Jodrell – UKLight – JIVE
TCP Throughput on the 630 Mbit UKLight Link
Manchester gig7 to JBO mk5 606; 4-Mbyte TCP buffer. Test 0 showed duplicate ACKs and other Cwnd reductions; tests 1 and 2 are also plotted.
[Charts: TCP achieved rate (Mbit/s, 0-1000) and Cwnd vs time (0-120 s) for tests 0, 1 and 2, showing instantaneous BW and CurCwnd (value).]
Comparison of Send Time & 1-Way Delay
[Chart: send time (s) vs message number, showing bursts of 26 messages; messages 76-102 span about one RTT; 100 ms marker.]
1-Way Delay: 1448-byte messages, one way
Route: Man-ukl-ams-prod-man, RTT 27 ms. 10,000 messages; message size 1448 bytes; wait time 0 µs; BDP = 3.4 Mbyte; TCP buffer 10 Mbyte.
[Charts: 1-way delay (µs) vs packet number, with a 50 ms marker; PktsOut and PktsIn (delta) with CurCwnd (value) vs time (ms).]
The Web100 plot starts after 5.6 s due to clock sync. ~400 pkts/10 ms, a rate similar to iperf.
Related Work: RAID, ATLAS Grid
RAID0 and RAID5 tests (4th-year MPhys project last semester): throughput and CPU load for different RAID parameters (number of disks, stripe size, user read/write size) and different file systems (ext2, ext3, XFS). Sequential file write and read, with and without a continuous background read or write.
Status: some results still need to be checked & documented. Independent RAID controller tests are planned.
HEP: Service Challenge 4
Objective: demo 1 Gbit/s aggregate bandwidth between RAL and 4 Tier 2 sites. RAL has SuperJANET4 and UKLight links; the RAL firewall capped traffic at 800 Mbit/s. SuperJANET sites: Glasgow, Manchester, Oxford, QMUL. UKLight site: Lancaster.
Many concurrent transfers from RAL to each of the Tier 2 sites: ~700 Mbit/s over UKLight, peak 680 Mbit/s over SJ4.
[Diagram: RAL Tier 1 / RAL Tier 2 site network: 5510/5530 switch stacks, Router A, UKLight router, and firewall; 10 Gb/s and N x 1 Gb/s internal links; 4 x 1 Gb/s to CERN, 1 Gb/s to Lancaster, 1 Gb/s to SJ4; ADS caches, CPUs + disks, Oracle RACs.]
Applications are able to sustain high rates; SuperJANET5, UKLight & the new access links are very timely.
Network switch limits behaviour
End-to-end UDP packets from udpmon: only 700 Mbit/s throughput, and lots of packet loss; the packet-loss distribution shows that the throughput is limited.
[Charts (w05gva-gig6, 29 May 04, UDP): received wire rate (Mbit/s) and % packet loss vs spacing between frames (µs) for frame sizes 50-1472 bytes; 1-way delay (µs) vs packet number at wait 12 µs, full trace and detail of packets 500-550.]