
High Throughput Linux Clustering at Fermilab

Steven C. Timm--Fermilab


Outline

• Computing problems in High Energy Physics

• Clusters at Fermilab
• Hardware Configuration
• Software Management Tools
• Future Plans


Fermilab

• In Batavia, IL. Since 1972, home of the highest-energy accelerator in the world.


Accelerator

• Collides protons and antiprotons at 2 TeV


Coarse Parallelism

• Basic idea: each “event” is independent (sketched in code below)
• Code doesn’t vectorize well or need SMP
• Thousands of instructions per byte of I/O
• Need lots of small, cheap computers
• Have used VAX, MC68020, IBM, and SGI workstations; now Linux PCs.
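The slide's point can be illustrated with a minimal Python sketch of event-level parallelism; the reconstruct function and the fake event list are hypothetical stand-ins, and the only point is that independent events need no communication between workers.

```python
# Minimal sketch of coarse (event-level) parallelism: each "event" is
# independent, so workers never communicate with each other.
from multiprocessing import Pool

def reconstruct(event):
    # Hypothetical stand-in for real reconstruction code, which runs
    # thousands of instructions per byte of input.
    return sum(event) % 997

def main():
    events = [[i, i + 1, i + 2] for i in range(100_000)]  # fake events
    with Pool() as pool:                     # one worker process per CPU
        results = pool.map(reconstruct, events, chunksize=1_000)
    print(len(results), "events processed")

if __name__ == "__main__":
    main()
```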


Types of Computing at Fermilab

• Simulation of detector response
• Data acquisition
• Event reconstruction
• Data analysis
• Theory calculations (Beowulf-like)
• Linux clusters used in all of the above!


Physics Motivation

• Three examples:
• Fixed-target experiment (~1999)
• Collider experiment (running now)
• CMS experiment (will run ~5 years in the future)


Fermilab E871

• Called HyperCP, ran in 1997 and 1999

• 3 particles per event
• 10 billion events written to tape
• 22,000 tapes, 5 GB apiece
• More than 100 TB of data! (quick arithmetic check below)
• Analysis recently completed, taking about one year.
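The quoted total is just the tape count times the tape capacity; a quick check:

```python
# Quick check of the E871 data volume quoted above.
tapes = 22_000
gb_per_tape = 5
print(tapes * gb_per_tape / 1000, "TB")   # 110.0 TB -- "more than 100 TB"
```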


Run II Collider Experiments

• CDF and D0—just starting to run now

• Expected data rate: 1 TB/day
• 50-100 tracks per event
• Goal: reconstruct events as fast as they come in (rate estimate below)
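Keeping pace with the detector translates into a modest sustained rate; a rough estimate, assuming the farm must match the 1 TB/day input rate on average:

```python
# Average throughput needed to reconstruct events as fast as they arrive.
tb_per_day = 1
mb_per_sec = tb_per_day * 1_000_000 / (24 * 3600)
print(round(mb_per_sec, 1), "MB/s sustained")   # about 11.6 MB/s
```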


CDF Detector


Mass Storage System

• 1 PB-capacity tape robot (ADIC)
• Mammoth tape drives, 11 MB/sec
• Two tape drives per Linux PC
• Unix-like filespace to keep track of files
• Network-attached storage; can deliver up to 100 MB/sec throughput.


Mass Storage System


Reconstruction Farm

• Five farms currently installed
• 340 dual-CPU nodes in all, 500-1000 MHz
• 50 GB disk and 512 MB RAM each
• One I/O node: SGI Origin 2000, 1 TB disk, 4 CPUs, 2 x Gigabit Ethernet
• Farms Batch System software to coordinate batch jobs (a generic worker sketch follows below)
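As a rough picture of how work flows between the I/O node and a worker, here is a generic Python sketch; it is not the actual Farms Batch System code, and the host name, paths, and reconstruct command are hypothetical.

```python
# Generic sketch of one farm worker job: stage a raw file from the I/O
# node, reconstruct it on local disk, and ship the output back.
# Host name, paths, and the reconstruction command are hypothetical.
import subprocess

IO_NODE = "io-node.example.fnal.gov"   # stand-in for the SGI I/O node

def run(cmd):
    subprocess.run(cmd, check=True)

def process_one_file(raw_file):
    local_in = f"/scratch/{raw_file}"
    local_out = local_in + ".reco"
    run(["rcp", f"{IO_NODE}:/raw/{raw_file}", local_in])        # stage in
    run(["./reconstruct", local_in, "-o", local_out])           # CPU-bound step
    run(["rcp", local_out, f"{IO_NODE}:/reco/{raw_file}.reco"]) # stage out

if __name__ == "__main__":
    process_one_file("run12345_part001.raw")
```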


Farms I/O Node

• SGI Origin 2200
• 4 x 400 MHz CPUs
• 2 x Gigabit Ethernet
• 1 TB disk


Farm Workers

• 50 dual 500 MHz PIII nodes
• 50 GB disk
• 512 MB RAM


Farm Workers

• 2U dual PIII 750 MHz, 50 GB disk
• 1 GB RAM


Data mining and analysis facility

• SGI Origin 2000, 176 processors
• 5 terabytes of disk and growing
• Used for repetitive analysis of small subsets of data
• Wouldn’t need the SMP, but it is the easiest way to get a lot of processors near a lot of disk.


CMS Project


CMS Project

• Scheduled to run in 2005 at CERN’s LHC (Geneva, Switzerland)

• Fermilab is managing the US contribution.
• Every 40 ns, expect 25 collisions
• Each collision makes 50-100 particles
• 1-10 petabytes of data have to be distributed around the world
• Will need at least 10,000 of today’s fastest PCs


Qualified Vendors

• We evaluate vendors on hardware reliability, competency in Linux, service quality, and price/performance.

• Vendors chosen for desktops and farm workers

• 13 companies submitted evaluation units, five chosen in each category


Fermi Linux

• Based on Red Hat Linux 6.1 (7.1 coming soon)

• Add a number of security fixes
• Follow all kernel and installer updates
• Updates sent out to ~1000 nodes by Autorpm (a generic sketch of the idea follows below)
• Qualified vendors ship machines with it preloaded.
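To illustrate the idea of automatic update distribution (this is not how Autorpm itself works or is configured; it is a plain rsync-plus-rpm stand-in, and the mirror URL and paths are made up):

```python
# Illustrative stand-in for automated RPM updates (not Autorpm itself):
# mirror the site's updates directory, then freshen installed packages.
import glob
import subprocess

MIRROR = "rsync://linux-mirror.example.fnal.gov/fermi-linux/updates/"  # hypothetical
LOCAL = "/var/spool/updates/"

def update_node():
    subprocess.run(["rsync", "-a", "--delete", MIRROR, LOCAL], check=True)
    rpms = glob.glob(LOCAL + "*.rpm")
    if rpms:
        # 'rpm -F' (freshen) only upgrades packages that are already installed.
        subprocess.run(["rpm", "-Fvh"] + rpms, check=True)

if __name__ == "__main__":
    update_node()
```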


ICABOD

• Vendor ships system with Linux OS loaded.

• Expect scripts (illustrative sketch below):
  – Reinstall the system if necessary
  – Change root password, partition disks
  – Configure static IP address
  – Install Kerberos and ssh keys
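As a flavor of what such scripted provisioning looks like, here is a sketch of the change-root-password step using Python's pexpect module; the real ICABOD scripts are written in Expect, and the host name, prompts, and passwords below are purely illustrative.

```python
# Illustrative sketch (not the real ICABOD Expect scripts): log in to a
# freshly delivered node and set a new root password.
import pexpect

def set_root_password(host, old_pw, new_pw):
    child = pexpect.spawn(
        f"ssh -o StrictHostKeyChecking=no root@{host} passwd")
    child.expect("assword:")            # ssh login prompt
    child.sendline(old_pw)
    child.expect("New.*password:")      # passwd prompt wording varies
    child.sendline(new_pw)
    child.expect("Retype.*password:")
    child.sendline(new_pw)
    child.expect(pexpect.EOF)

if __name__ == "__main__":
    set_root_password("node001.example.fnal.gov", "factory-default", "new-secret")
```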


Burn-in

• All nodes go through a one-month burn-in test.

• Load both CPUs (2 x seti@home)
• Load the disk (Bonnie)
• Network test
• Monitor temperatures and current draw
• Reject if more than 2% downtime (a rough driver sketch follows below)
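A rough sketch of what a burn-in driver could look like in Python, assuming the actual load generators (the seti@home client, Bonnie, a network test) are launched as external commands; the command names, arguments, and the one-month duration are placeholders, and temperature/current monitoring is hardware specific and omitted here.

```python
# Rough burn-in driver sketch: keep CPU, disk, and network load running
# for the whole burn-in period and count abnormal exits.
import subprocess
import time

LOADS = [
    ["setiathome"], ["setiathome"],              # one CPU burner per CPU
    ["bonnie", "-d", "/scratch", "-s", "2047"],  # disk exerciser
    ["./net_test.sh"],                           # site-specific network test
]

def burn_in(hours=24 * 30):                      # roughly one month
    procs = [subprocess.Popen(cmd) for cmd in LOADS]
    deadline = time.time() + hours * 3600
    failures = 0
    while time.time() < deadline:
        for i, proc in enumerate(procs):
            rc = proc.poll()
            if rc is not None:                   # process finished or died
                if rc != 0:
                    failures += 1                # abnormal exit
                procs[i] = subprocess.Popen(LOADS[i])  # keep the load going
        time.sleep(60)
    print("abnormal load-process exits during burn-in:", failures)

if __name__ == "__main__":
    burn_in()
```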


Management tools


NGOP Monitor (Display)


NGOP Monitor (Display)


FBSNG

• Farms Batch System, Next Generation

• Allows parallel batch jobs which may be dependent on each other

• Abstract and flexible resource definition and management

• Dynamic configuration through API
• Web-based interface


Future plans

• Next level of integration—1 “pod” of six racks plus switch, console server, display.

• Linux on disk servers, for NFS/NIS
• “Chaotic” analysis servers and compute farms to replace the big SMP boxes
• Find an NFS replacement (SAN?)
• Abandon tape altogether?