Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in...
![Page 1: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/1.jpg)
Experiences Tuning Cluster Hosts
1GigE and 10GbE
Paul Hyder
Cooperative Institute for Research in Environmental Sciences, CU Boulder
(CIRES at NOAA/ESRL/GSD High Performance Computing)
Paul.Hyder at noaa.gov
![Page 2: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/2.jpg)
Tuning Focus
Cluster Front Ends and Cron Server Hosts
File transfer servers (scponly)
BWCTL host
Remote client hosts
10GbE Testbed
(7.2 Gb/sec uses ~49% of one 3G CPU)
![Page 3: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/3.jpg)
How We Apply the Well-Known Rules
Jumbo Frames
– 8K on hosts
– 9K on network
Tune TCP to match BDP
Encourage application writers to use large read and write buffers
Install tuned Applications
– PSC.edu patch to ssh
OpenSSH:channels.h
#define CHAN_TCP_PACKET_DEFAULT (32*1024)
#define CHAN_TCP_WINDOW_DEFAULT (4*CHAN_TCP_PACKET_DEFAULT)
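A worked example may help with "match BDP". The sketch below computes the bandwidth-delay product for a path and the throughput ceiling a fixed window imposes. The 50 ms round-trip time is a hypothetical value, not a measurement from this talk, and the function names are illustrative only. The 128 KiB window is simply 4 * 32 * 1024 from the channels.h lines above.

```python
def bdp_bytes(bandwidth_bits_per_sec: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return round(bandwidth_bits_per_sec * rtt_seconds / 8)


def window_limited_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


RTT = 0.050  # hypothetical 50 ms wide-area round trip

print(f"1GigE BDP at 50 ms: {bdp_bytes(1e9, RTT) / 2**20:.1f} MiB")
print(f"10GbE BDP at 50 ms: {bdp_bytes(10e9, RTT) / 2**20:.1f} MiB")

# Window implied by the channels.h values shown above: 4 * (32 * 1024) bytes.
ssh_window = 4 * (32 * 1024)
print(f"128 KiB window at 50 ms: {window_limited_bps(ssh_window, RTT) / 1e6:.1f} Mb/s")
```

Under these assumptions a 128 KiB window caps a transfer at roughly 21 Mb/s regardless of link speed, which is the kind of gap that buffer tuning and tuned applications are meant to close.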
![Page 4: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/4.jpg)
Throughput Testing
Iperf (2.0.2) from shell scripts
– Vary buffer (-l) and window (-w)
– Modify ifconfig and PCI configuration
– Loop takes 3 days
Bwctl with remote hosts
– Anyone on NLR?
Use scp/sftp/rsync as final test
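The buffer and window sweep can be sketched as a generator of iperf client command lines. This is not the script used for these tests: the host name, the sweep values, and the 10-second duration are all hypothetical. Only the iperf 2 client flags (-c, -l, -w, -t, -f) are taken as given.

```python
from itertools import product


def iperf_commands(server, buffer_lengths, window_sizes, seconds=10):
    """Return one iperf 2 client command line per (buffer, window) pair."""
    return [
        f"iperf -c {server} -l {length} -w {window} -t {seconds} -f m"
        for length, window in product(buffer_lengths, window_sizes)
    ]


# Hypothetical server name and sweep values, for illustration only.
commands = iperf_commands("hostB", ["8K", "64K", "128K"], ["256K", "2M", "16M"])
for command in commands:
    print(command)
```

The sketch only prints the commands. A real loop would also change interface and PCI settings between passes, which is where most of the run time goes.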
![Page 5: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/5.jpg)
I’m Curious
How much TCP tuning information do you provide users and admins?
Are hosts being tuned?
Does your internal LAN support jumbo frames?
![Page 6: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/6.jpg)
GSD Cluster GigE Defaults
[wr]mem_default 2MB
[wr]mem_max 16MB
ipv4/tcp_[wr]mem 64KB 2MB 16MB
optmem_max 512K
txqueuelen 10000
netdev_max_backlog 3000
ipv4/tcp_sack and ipv4/tcp_timestamps on
Don’t touch ipv4/tcp_mem
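For readers who want these defaults in sysctl form, here is a minimal sketch that expands the [wr] shorthand into key/value pairs. It assumes the unprefixed names live under net.core, that KB and MB mean powers of two, and that "on" means 1. The function name is hypothetical. txqueuelen is left out because it is a per-interface setting rather than a sysctl, and net.ipv4.tcp_mem is left out on purpose.

```python
KB = 1024
MB = 1024 * KB


def gige_sysctl_settings():
    """Expand the slide's [wr] shorthand into explicit sysctl keys and values."""
    settings = {}
    for direction in ("r", "w"):
        settings[f"net.core.{direction}mem_default"] = str(2 * MB)
        settings[f"net.core.{direction}mem_max"] = str(16 * MB)
        # min, default, max
        settings[f"net.ipv4.tcp_{direction}mem"] = f"{64 * KB} {2 * MB} {16 * MB}"
    settings["net.core.optmem_max"] = str(512 * KB)
    settings["net.core.netdev_max_backlog"] = "3000"
    settings["net.ipv4.tcp_sack"] = "1"
    settings["net.ipv4.tcp_timestamps"] = "1"
    return settings


for key, value in gige_sysctl_settings().items():
    print(f"{key} = {value}")
```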
![Page 7: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/7.jpg)
Jumbo Frame Plot
![Page 8: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/8.jpg)
What doesn’t work
Jumbo Frames
– Switch Fabrics
  – High density cards
  – Complex VLAN configurations
  – Standalone GigE switches
– Firewalls
– ICMP for path MTU discovery
  – Disabled completely
  – Network devices don’t respond
![Page 9: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/9.jpg)
Linux 2.6 and Jumbos
IP hostA.52434 > hostB.22: S 544:544(0) win 16304 <mss 8152,...>
IP hostB.22 > hostA.52434: S 207:207(0) ack 545 win 5792 <mss 1460,...>
...
IP hostA.52434 > hostB.22: . 2255:6599(4344) ack 2293 win 16304 <...>
IP hostA.52434 > hostB.22: P 6599:10943(4344) ack 2293 win 16304 <...>
IP router > hostA: icmp 36: hostB unreachable - need to frag (mtu 1500)
IP hostA.52434 > hostB.22: . 2255:3703(1448) ack 2293 win 16304 <...>
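The sizes in this trace can be checked with simple header arithmetic. The sketch below assumes IPv4 and TCP headers without options of 20 bytes each, a 12-byte TCP timestamp option on data segments (timestamps are assumed on; the trace elides the option lists), and that "8K on hosts" means an 8192-byte MTU. The function names are illustrative.

```python
IPV4_HEADER = 20           # bytes, no IP options
TCP_HEADER = 20            # bytes, no TCP options
TCP_TIMESTAMP_OPTION = 12  # 10-byte option padded to a 4-byte boundary


def mss_for_mtu(mtu: int) -> int:
    """MSS a host advertises in its SYN for a given interface MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def payload_per_segment(mtu: int, timestamps: bool = True) -> int:
    """Data bytes that fit in one segment once per-segment options are counted."""
    options = TCP_TIMESTAMP_OPTION if timestamps else 0
    return mss_for_mtu(mtu) - options


print(mss_for_mtu(8192))          # hostA advertises mss 8152
print(mss_for_mtu(1500))          # hostB advertises mss 1460
print(payload_per_segment(1500))  # the 1448-byte resend after the ICMP message
```

Read this way, the 1448-byte segment that follows the "need to frag (mtu 1500)" message is what a 1500-byte path MTU allows. That recovery depends on the ICMP message actually reaching hostA.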
![Page 10: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/10.jpg)
Host Side Checks
Interrupt Aggregation (Linux NAPI)
Memory to match buffer tuning
More than one CPU
Static ARP entries
![Page 11: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/11.jpg)
Network Device Settings
Static ARP entries or increase timeout
Increase FDB timeouts
Verify jumbo frame configuration
![Page 12: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/12.jpg)
10GbE Quick Notes
Know your PCI hardware (MMRBC, Latency timer, and Splits)
TCP stack is ~0.200ms
Increase netdev_max_backlog to 30000
(throughput = backlog * 100Hz * ave_bytes_pkt)
Set *_cong to CERN values
Write buffers in code ~128KB
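The backlog rule of thumb can be turned into numbers. This sketch takes the rate term as a kernel timer tick of 100 per second and assumes an average packet of 1500 bytes. Both are assumptions for illustration, as is the function name.

```python
def backlog_throughput_bps(backlog_packets: int, ticks_per_sec: int,
                           avg_bytes_per_packet: int) -> int:
    """Throughput ceiling in bits/sec: throughput = backlog * tick rate * avg bytes per packet."""
    return backlog_packets * ticks_per_sec * avg_bytes_per_packet * 8


TICKS_PER_SEC = 100  # assumed kernel timer rate
AVG_PACKET = 1500    # assumed average packet size in bytes

for backlog in (3000, 30000):
    limit = backlog_throughput_bps(backlog, TICKS_PER_SEC, AVG_PACKET)
    print(f"netdev_max_backlog={backlog}: ceiling of about {limit / 1e9:.1f} Gb/s")
```

Under these assumptions the GigE value of 3000 gives a ceiling of about 3.6 Gb/s, below 10GbE line rate, while 30000 leaves ample headroom.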
![Page 13: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/13.jpg)
10G buffer plot
![Page 14: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/14.jpg)
Questions?
![Page 15: Experiences Tuning Cluster Hosts 1GigE and 10GbE Paul Hyder Cooperative Institute for Research in Environmental Sciences, CU Boulder Cooperative Institute.](https://reader036.fdocuments.in/reader036/viewer/2022082612/56649f4d5503460f94c6db5c/html5/thumbnails/15.jpg)
Reference URLs
http://www.psc.edu/networking/projects/hpn-ssh/
http://dast.nlanr.net/Projects/Iperf/
http://www.sublimation.org/scponly/
http://e2epi.internet2.edu/bwctl/
– http://abilene.internet2.edu/ami/bwctl_status.cgi/TCP/now
http://www.tcptrace.org/
http://ultralight.caltech.edu/
http://staff.science.uva.nl/~delaat/articles/2003-7-10gige.pdf
http://www.csm.ornl.gov/~dunigan/netperf/netlinks.html
http://www.psc.edu/networking/projects/tcptune/
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26310.pdf