Farm Completion Beat Jost and Niko Neufeld LHCb Week St. Petersburg June 2010.
Farm Completion
Beat Jost and Niko Neufeld
LHCb Week St. Petersburg
June 2010
Filling the farm
• Thanks for interesting and useful discussions to:
  – Loic Barda, Rolf Lindner, Laurent Roy and Eric Thomas
• Thanks for measurements and plots to:
  – Juan Caicedo and Patrick Robbe
Farm Completion St. Petersburg 06/2010 - Niko Neufeld 2
The three limits: Power, Cooling, Money
• Power: 550 kW available (105 kW used)
• Cooling: nominally available 525 kW
• Rack space: 1700 U (plenty)
• Money: xx MCHF
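A back-of-envelope sketch of how many new chassis these limits allow, using the 1.2 kW rating of the candidate 4-server chassis shown later in the talk. It assumes (not stated on the slide) that the 105 kW already in use also loads the cooling and that every new chassis draws its full rating; the constant and function names are illustrative only.

```python
# Back-of-envelope sketch: how many extra chassis fit within the power and
# cooling limits on this slide. All figures are integers in watts to avoid
# floating-point rounding in the division.
# ASSUMPTIONS (not stated on the slide): the 105 kW already in use also
# loads the cooling, and every new chassis draws its full 1.2 kW rating
# (the figure quoted for the candidate 4-server chassis).

POWER_AVAILABLE_W = 550_000
POWER_USED_W = 105_000
COOLING_NOMINAL_W = 525_000
CHASSIS_W = 1_200
SERVERS_PER_CHASSIS = 4


def chassis_headroom(power_w: int, cooling_w: int, used_w: int, chassis_w: int) -> int:
    """Whole chassis that fit in the tighter of the power and cooling budgets."""
    headroom_w = min(power_w, cooling_w) - used_w
    return max(headroom_w, 0) // chassis_w


if __name__ == "__main__":
    n = chassis_headroom(POWER_AVAILABLE_W, COOLING_NOMINAL_W, POWER_USED_W, CHASSIS_W)
    print(f"{n} chassis = {n * SERVERS_PER_CHASSIS} servers")
```

Under these assumptions cooling, not power, is the binding limit: (525 - 105) kW / 1.2 kW gives 350 chassis, i.e. 1400 servers.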
Event Filter Farm
• Level 1:
  – 100 SuperMicro Twin servers (2 servers in a single 1U chassis with a shared power supply), Intel Harpertown 5420 CPU (2.5 GHz), 4 cores/socket, 1 GB RAM/core
• Level 2:
  – 350 Dell blade servers (up to 16 blades in a 10U chassis), Intel Harpertown 5420 CPU (2.5 GHz), 4 cores/socket, 2 GB RAM/core
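A rough inventory of the current farm can be derived from these figures. This sketch assumes every server is dual-socket (2 × 4-core Harpertown 5420, which matches the later statement that 16 jobs/server is twice today's load) and that "100 SuperMicro Twin servers" counts servers rather than 1U chassis; neither is stated explicitly on the slide.

```python
# Rough inventory of the current Event Filter Farm from the figures above.
# ASSUMPTIONS: every server is dual-socket (2 x 4-core Harpertown 5420) and
# "100 SuperMicro Twin servers" counts servers rather than 1U chassis.
from dataclasses import dataclass


@dataclass(frozen=True)
class FarmLevel:
    name: str
    servers: int
    sockets_per_server: int
    cores_per_socket: int
    ram_gb_per_core: int

    @property
    def cores(self) -> int:
        return self.servers * self.sockets_per_server * self.cores_per_socket

    @property
    def ram_gb(self) -> int:
        return self.cores * self.ram_gb_per_core


LEVELS = [
    FarmLevel("Level 1 (SuperMicro Twin)", 100, 2, 4, 1),
    FarmLevel("Level 2 (Dell blades)", 350, 2, 4, 2),
]

if __name__ == "__main__":
    for level in LEVELS:
        print(f"{level.name}: {level.cores} cores, {level.ram_gb} GB RAM")
    print("Total cores:", sum(level.cores for level in LEVELS))
```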
The new farm-node
• Both Intel and AMD have brought out new processors with up to 12 cores per chip and (Intel only) hyper-threads (a.k.a. virtual CPUs)
• Memory has (again) become faster and cheaper (DDR3), and each processor has 3 memory channels (“good” memory configuration = 3 × n, where n = 2, 4, 8, 16)
• Both processor families are now NUMA (non-uniform memory access)
  – Study programme ongoing to take advantage of this
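One common way to exploit NUMA is to keep each job on the CPUs of a single node so that its memory stays local. The sketch below is a hypothetical illustration of that idea, not the LHCb study programme: it assumes 2 NUMA nodes with 12 logical CPUs each, numbered contiguously, whereas a real layout must be read from the operating system.

```python
# Sketch of NUMA-aware job placement: spread Moore jobs evenly over the NUMA
# nodes and restrict each job to the logical CPUs of one node, so that under
# Linux's default (local) allocation policy its memory is normally allocated
# on that same node.
# ASSUMPTIONS (hypothetical layout): 2 NUMA nodes with 12 logical CPUs each,
# numbered contiguously (node 0 -> CPUs 0-11, node 1 -> CPUs 12-23). On a
# real server the layout must be read from /sys/devices/system/node/.
from __future__ import annotations


def numa_cpu_sets(nodes: int, cpus_per_node: int) -> list[set[int]]:
    """Logical CPU ids belonging to each NUMA node (contiguous numbering assumed)."""
    return [set(range(n * cpus_per_node, (n + 1) * cpus_per_node)) for n in range(nodes)]


def place_jobs(n_jobs: int, nodes: int, cpus_per_node: int) -> dict[int, set[int]]:
    """Round-robin jobs over NUMA nodes; map job index -> allowed CPU set."""
    node_cpus = numa_cpu_sets(nodes, cpus_per_node)
    return {job: node_cpus[job % nodes] for job in range(n_jobs)}


if __name__ == "__main__":
    placement = place_jobs(16, nodes=2, cpus_per_node=12)
    for job, cpus in placement.items():
        print(f"job {job:2d} -> CPUs {min(cpus)}-{max(cpus)}")
    # On Linux, a job's process could then be pinned with
    # os.sched_setaffinity(pid, cpus).
```

Round-robin placement keeps the two nodes equally loaded (8 jobs each for 16 jobs).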
How many jobs / server
How fast?
Server specifications
• 1 GB RAM per hardware thread (= virtual core)
• A single power-supply failure should not affect more than 2 units
• 2 Gigabit Ethernet ports
• No constraints on power consumption
• CPU (AMD 61xx / Intel 56xx) chosen so as to optimise Moore performance per CHF
A likely candidate
• 1.2 kW
  – redundant PS
• 4 servers, each with
  – 12 cores
  – 24 GB RAM (up to 96)
  – 1 HDD
  – 2 × Gigabit Ethernet
• 21 kCHF list-price
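The server specification can be checked mechanically against this candidate. In this sketch "12 cores" is assumed to mean 12 physical cores with Intel hyper-threading (2 hardware threads per core), and "units" in the power-supply rule is taken to mean servers; the class and function names are hypothetical.

```python
# Sketch: check a candidate chassis against the server specification.
# ASSUMPTIONS: "12 cores" means 12 physical cores with hyper-threading
# (2 hardware threads per core); "units" in the power-supply rule means
# servers. Class and function names are illustrative only.
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Chassis:
    servers: int
    redundant_psu: bool
    cores_per_server: int
    threads_per_core: int  # 2 with hyper-threading, 1 without
    ram_gb_per_server: int
    gbe_ports_per_server: int

    @property
    def hw_threads_per_server(self) -> int:
        return self.cores_per_server * self.threads_per_core

    @property
    def servers_lost_on_psu_failure(self) -> int:
        # With a redundant supply a single failure takes nothing down;
        # otherwise every server sharing the supply is lost.
        return 0 if self.redundant_psu else self.servers


def spec_violations(c: Chassis) -> list[str]:
    """List the specification rules the chassis breaks (empty list = compliant)."""
    problems = []
    if c.ram_gb_per_server < c.hw_threads_per_server:  # 1 GB per hardware thread
        problems.append("less than 1 GB RAM per hardware thread")
    if c.servers_lost_on_psu_failure > 2:
        problems.append("a single PSU failure affects more than 2 servers")
    if c.gbe_ports_per_server < 2:
        problems.append("fewer than 2 Gigabit Ethernet ports")
    return problems


candidate = Chassis(servers=4, redundant_psu=True, cores_per_server=12,
                    threads_per_core=2, ram_gb_per_server=24, gbe_ports_per_server=2)

if __name__ == "__main__":
    print(spec_violations(candidate) or "candidate meets the specification")
```

The same rule also admits a non-redundant Twin-style chassis with 2 servers on one shared supply, since a failure then affects exactly 2 units.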
Conclusions
• We will run with 16 Moore jobs / server (twice as many as today)
• Each server will be 2 to 2.5 x faster than the current HLT node
• Each Moore instance can use up to 1.5 GB RAM
  – If more RAM is really needed:
    1. Reduce the number of jobs
    2. Increase (double) the memory
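The memory budget behind these conclusions is simple arithmetic: 16 jobs × 1.5 GB = 24 GB, exactly the RAM of the candidate server. The sketch below assumes all of the RAM is available to Moore (operating-system overhead ignored) and shows the two fallback options; the function names are hypothetical.

```python
# Sketch of the memory budget behind the conclusions, using the 24 GB of
# the candidate server. ASSUMPTION: all of the RAM is available to Moore
# (operating-system and other overheads are ignored).
from __future__ import annotations


def max_jobs(ram_gb: float, gb_per_job: float) -> int:
    """Number of jobs whose memory fits in ram_gb."""
    return int(ram_gb // gb_per_job)


def options_if_more_ram(ram_gb: float, jobs: int, gb_per_job: float) -> tuple[int, bool]:
    """Return (jobs that fit with RAM unchanged, whether doubling RAM keeps all jobs)."""
    reduced_jobs = max_jobs(ram_gb, gb_per_job)   # option 1: reduce number of jobs
    doubled_ok = jobs * gb_per_job <= 2 * ram_gb  # option 2: double the memory
    return reduced_jobs, doubled_ok


if __name__ == "__main__":
    print(max_jobs(24, 1.5))                 # jobs per server at 1.5 GB each
    print(options_if_more_ram(24, 16, 2.0))  # if each job needed 2 GB instead
```

For example, if each job needed 2 GB, option 1 would leave 12 jobs per server, while option 2 (48 GB) would still hold all 16.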
Procedure / planning
Step | Duration
Decision to buy (day X) | 0
Technical specifications to firms | 1 week
Firms reply (with offer) / validation of sample server | 4 weeks
Adjudication (negotiation) | 1 week
Delivery (in batches if possible; installation starts as soon as servers are delivered) | 6 weeks
Finishing installation | 1 week
Farm Level 3 in production | 13 weeks after initial decision
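The 13-week total follows from summing the step durations. This sketch treats the steps as strictly sequential, which is consistent with the total quoted in the table; the list and variable names are illustrative.

```python
# The planning table above, summed step by step.
# ASSUMPTION: steps run one after another (their durations add up to the
# 13 weeks quoted for "Farm Level 3 in production").
from itertools import accumulate

STEPS = [
    ("Decision to buy (day X)", 0),
    ("Technical specifications to firms", 1),
    ("Firms reply (with offer) / validation of sample server", 4),
    ("Adjudication (negotiation)", 1),
    ("Delivery (in batches if possible)", 6),
    ("Finishing installation", 1),
]

# Week (counted from day X) at which each step is finished.
MILESTONES = list(zip((name for name, _ in STEPS),
                      accumulate(weeks for _, weeks in STEPS)))

if __name__ == "__main__":
    for name, week in MILESTONES:
        print(f"week {week:2d}: {name}")
```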
To-do list
Hardware
• Unpacking (surface, SX8: needs a lot of space and friendly volunteers)
• Installation in D1
  – Power, network
• Burn-in (3 days)
• Exchange faulty servers / parts
Software
• Install OS, verify OS tuning (NIC, memory arrangement, etc.)
• Integrate into software management (Quattor)
• Add to farm control
DETAILS
How fast? (Moore v9r2 HLT1 only)
DAQ & electronics upgrade - Niko Neufeld 15