Building Large Scale Fabrics – A Summary

Page 1: Building Large Scale Fabrics – A Summary

Building Large Scale Fabrics – A Summary

Marcel Kunze, FZK

Page 2: Building Large Scale Fabrics – A Summary

ACAT 2002 Moscow Marcel Kunze - FZK

Observation

Everybody seems to need an unprecedented amount of CPU, disk and network bandwidth
Trend to PC-based computing fabrics and commodity hardware:
  LCG (CERN), L. Robertson
  CDF (Fermilab), M. Neubauer
  D0 (Fermilab), I. Terekhov
  Belle (KEK), P. Krokovny
  Hera-B (DESY), J. Hernandez
  Ligo, P. Shawhan
  Virgo, D. Busculic
  AMS, A. Klimentov
Considerable cost savings compared with RISC-based farms, which offer not enough ‘bang for the buck’ (M. Neubauer)

Page 3: Building Large Scale Fabrics – A Summary

AMS02 Benchmarks

Execution time of the AMS “standard” job compared to CPU clock 1)
(times are relative to the Intel PII dual-CPU 450 MHz system, which is set to 1)

Brand, CPU, Memory                            | OS / Compiler            | “Sim” | “Rec”
Intel PII dual-CPU 450 MHz, 512 MB RAM        | RH Linux 6.2 / gcc 2.95  | 1     | 1
Intel PIII dual-CPU 933 MHz, 512 MB RAM       | RH Linux 6.2 / gcc 2.95  | 0.54  | 0.54
Compaq, Quad α-ev67 600 MHz, 2 GB RAM         | RH Linux 6.2 / gcc 2.95  | 0.58  | 0.59
AMD Athlon, 1.2 GHz, 256 MB RAM               | RH Linux 6.2 / gcc 2.95  | 0.39  | 0.34
Intel Pentium IV 1.5 GHz, 256 MB RAM          | RH Linux 6.2 / gcc 2.95  | 0.44  | 0.58
Compaq dual-CPU PIV Xeon 1.7 GHz, 2 GB RAM    | RH Linux 6.2 / gcc 2.95  | 0.32  | 0.39
Compaq dual α-ev68 866 MHz, 2 GB RAM          | Tru64 Unix / cxx 6.2     | 0.23  | 0.25
Elonex Intel dual-CPU PIV Xeon 2 GHz, 1 GB RAM | RH Linux 7.2 / gcc 2.95 | 0.29  | 0.35
AMD Athlon 1800MP, dual-CPU 1.53 GHz, 1 GB RAM | RH Linux 7.2 / gcc 2.95 | 0.24  | 0.23
8 CPU SUN-Fire-880, 750 MHz, 8 GB RAM         | Solaris 5.8 / C++ 5.2    | 0.52  | 0.45
24 CPU Sun Ultrasparc-III+, 900 MHz, 96 GB RAM | RH Linux 6.2 / gcc 2.95 | 0.43  | 0.39
Compaq α-ev68 dual 866 MHz, 2 GB RAM          | RH Linux 7.1 / gcc 2.95  | 0.22  | 0.23

1) V. Choutko, A. Klimentov, AMS note 2001-11-01
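
The benchmark numbers can be turned into a speed-up and a per-clock figure with a few lines of arithmetic. The sketch below is only an illustration: it uses a subset of the rows above, takes the 450 MHz PII as the reference, and the function names and the "per-clock efficiency" measure are assumptions introduced here, not part of the AMS note.

```python
# Relative execution times taken from the AMS02 benchmark rows
# (reference system: Intel PII 450 MHz, relative time = 1.0).
# Each entry is (clock in MHz, "Sim" time, "Rec" time); only a subset of rows.
REFERENCE_MHZ = 450

BENCHMARKS = {
    "Intel PII 450 MHz": (450, 1.00, 1.00),
    "Intel PIII 933 MHz": (933, 0.54, 0.54),
    "AMD Athlon 1.2 GHz": (1200, 0.39, 0.34),
    "Intel Pentium IV 1.5 GHz": (1500, 0.44, 0.58),
    "Compaq alpha-ev68 866 MHz (Tru64)": (866, 0.23, 0.25),
}


def speedup(relative_time):
    """Speed-up over the reference system: inverse of the relative time."""
    return 1.0 / relative_time


def per_clock_efficiency(clock_mhz, relative_time):
    """Speed-up divided by the clock ratio (hypothetical helper measure).

    A value of 1.0 means the job scales exactly with clock frequency;
    values below 1.0 mean the extra MHz are not fully converted into speed.
    """
    return speedup(relative_time) / (clock_mhz / REFERENCE_MHZ)


for name, (mhz, sim, rec) in BENCHMARKS.items():
    print(
        f"{name:36s} "
        f"Sim x{speedup(sim):.2f} ({per_clock_efficiency(mhz, sim):.2f} per clock)  "
        f"Rec x{speedup(rec):.2f} ({per_clock_efficiency(mhz, rec):.2f} per clock)"
    )
```

Read this way, the table suggests that clock frequency alone is a poor predictor of job run time across architectures.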

Page 4: Building Large Scale Fabrics – A Summary

[Network diagram; labels: SWITCH; C-IXP; WHO; TEN-155; KPNQwest; RENATER National Research Networks; Mission Oriented Link & USLIC; Public; IN2P3; JEG (Japan) Genesis Project; CERN]

Fabrics and Networks: Commodity Equipment

Needed for LHC at CERN in 2006:

Storage
  Raw recording rate 0.1 – 1 GB/sec
  Accumulating at 5-8 PetaBytes/year
  10 PetaBytes of disk
Processing
  200’000 of today’s (2001) fastest PCs
Networks
  5-10 Gbps between main Grid nodes
Distributed computing effort to avoid congestion: 1/3 at CERN, 2/3 elsewhere
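
As a rough plausibility check of the storage figures, the recording rate can be converted into a yearly volume. This is a back-of-the-envelope sketch: the live time of 10^7 seconds per year is an assumed rule of thumb that does not appear on the slide, and applying the 1/3 to 2/3 split directly to the PC count is likewise an interpretation made here for illustration.

```python
GB_PER_PB = 1.0e6  # decimal units: 1 PetaByte = 10^6 GB

# ASSUMPTION (not from the slide): about 10^7 seconds of data taking per year.
ASSUMED_LIVE_SECONDS_PER_YEAR = 1.0e7


def petabytes_per_year(rate_gb_per_s, live_seconds=ASSUMED_LIVE_SECONDS_PER_YEAR):
    """Yearly data volume in PB for a given raw recording rate in GB/s."""
    return rate_gb_per_s * live_seconds / GB_PER_PB


def split_cern_elsewhere(total_pcs):
    """Split a PC count 1/3 at CERN and 2/3 elsewhere (rounded to whole PCs)."""
    at_cern = round(total_pcs / 3)
    return at_cern, total_pcs - at_cern


low = petabytes_per_year(0.1)   # lower end of the raw recording rate
high = petabytes_per_year(1.0)  # upper end of the raw recording rate
print(f"Yearly volume for 0.1 to 1 GB/s: {low:.1f} to {high:.1f} PB")

cern, elsewhere = split_cern_elsewhere(200_000)
print(f"PCs at CERN: {cern}, elsewhere: {elsewhere}")
```

Under that assumed live time, the quoted 5-8 PetaBytes/year falls inside the range spanned by the 0.1 – 1 GB/sec recording rate.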

Page 5: Building Large Scale Fabrics – A Summary

PC Cluster 5 (Belle)

1U server
Pentium III 1.2GHz
256 CPU (128 nodes)

Page 6: Building Large Scale Fabrics – A Summary

PC Cluster 6

Blade server: LP Pentium III 700MHz
40 CPU (40 nodes)
3U

Page 7: Building Large Scale Fabrics – A Summary

Disk Storage

Page 8: Building Large Scale Fabrics – A Summary

IDE Performance

Page 9: Building Large Scale Fabrics – A Summary

Basic Questions

Compute farms contain several 1000s of computing elements
Storage farms contain 1000s of disk drives

How to build scalable systems?
How to build reliable systems?
How to operate and maintain large fabrics?
How to recover from errors?

EDG deals with the issue (P. Kunszt)
IBM deals with the issue (N. Zheleznykh)
  Project Eliza: Self-healing clusters

Several ideas and tools are already on the market

Page 10: Building Large Scale Fabrics – A Summary

Storage Scalability

Difficult to scale up to systems of 1000s of components and keep a single system image: NFS automounter, symbolic links etc.
(M. Neubauer, CAF: ROOTD does not need this and allows direct worldwide access to distributed files without mounts)

Scalability in size and throughput by means of storage virtualisation
Allows setting up non-TCP/IP based systems to handle multi-GB/s

Page 11: Building Large Scale Fabrics – A Summary

Virtualisation of Storage

[Diagram; labels: Internet; Intranet; Input; Load balancing switch; Data Servers mount virtual storage as SCSI-Device; Storage Area Network (FCAL, InfiniBand, …); Shared Data Access (Oracle, PROOF); Scalability; 200 MB/s sustained]

Page 12: Building Large Scale Fabrics – A Summary

Storage Elements (M. Gasthuber)

PNFS = Perfectly Normal FileSystem
  Store metadata with the data
  8 hierarchies of file tags

Migration of data (hierarchical storage systems): dCache
  Development of DESY and Fermilab
  ACLs, Kerberos, ROOT-aware
  Web monitoring
  Cached as well as direct tape access
  Fail-safe

Page 13: Building Large Scale Fabrics – A Summary

Necessary admin. tools (A. Manabe)

System (SW) installation / update
  Dolly++ (image cloning)

Configuration
  Arusha (http://ark.sourceforge.net)
  LCFGng (http://www.lcfg.org)

Status monitoring / system health check
  CPU/memory/disk/network utilization: Ganglia *1, plantir *2
  (Sub-)system service sanity check: Pikt *3 / Pica *4 / cfengine

  *1 http://ganglia.sourceforge.net
  *2 http://www.netsonde.com
  *3 http://pikt.org
  *4 http://pica.sourceforge.net/wtf.html

Command execution
  WANI: Web-based remote command executor
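
The idea behind a remote command executor can be sketched in a few lines: run the same command for every selected node in parallel and keep stdout and stderr per host. The sketch below is not WANI's implementation. It assumes an `ssh` client as the transport, and all function names (`ssh_argv`, `local_argv`, `run_on_node`, `run_everywhere`) are hypothetical. The `local_argv` launcher is a dry-run stand-in that needs no remote hosts.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(node, command):
    """Assumed transport: run `command` on `node` through the ssh client."""
    return ["ssh", node, command]


def local_argv(node, command):
    """Dry-run stand-in: a local Python process that just echoes node and command."""
    return [sys.executable, "-c", f"print({node!r}, {command!r})"]


def run_on_node(node, command, build_argv=ssh_argv, timeout=30):
    """Run one command for one node; stdout and stderr are captured separately."""
    try:
        done = subprocess.run(
            build_argv(node, command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {"node": node, "rc": done.returncode,
                "stdout": done.stdout, "stderr": done.stderr}
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Launch failures and timeouts are reported like any other error.
        return {"node": node, "rc": None, "stdout": "", "stderr": str(exc)}


def run_everywhere(nodes, command, build_argv=ssh_argv, max_workers=32):
    """Run `command` for all nodes in parallel; results keep the order of `nodes`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda n: run_on_node(n, command, build_argv), nodes))


if __name__ == "__main__":
    # Dry run with two made-up node names; swap in ssh_argv for real hosts.
    for result in run_everywhere(["node001", "node002"], "uptime", build_argv=local_argv):
        print(result["node"], result["rc"], result["stdout"].strip())
```

Threads are sufficient here because each worker only waits on a child process. Passing the launcher as a parameter keeps the transport replaceable.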

Page 14: Building Large Scale Fabrics – A Summary

WANI is implemented on the ‘Webmin’ GUI

[Screenshot; labels: Command input; Node selection; Start]

Page 15: Building Large Scale Fabrics – A Summary

Command execution result

[Screenshot; labels: Host name; Results from 200 nodes in 1 page]

Page 16: Building Large Scale Fabrics – A Summary

[Screenshot; labels: Click here (Stderr output); Click here (Stdout output)]

Page 17: Building Large Scale Fabrics – A Summary

CPU Scalability

The current tools scale up to ~1000 CPUs
(In the previous example, 10000 CPUs would require checking 50 pages)
Autonomous operation required
Intelligent self-healing clusters
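
The page count follows from 200 results per page, and a first step towards less manual checking is to show only the nodes that report a problem. The sketch below is an illustration under assumptions: the result records (dictionaries with `node`, `rc` and `stderr` keys) and both function names are hypothetical and not taken from any of the tools mentioned.

```python
import math

RESULTS_PER_PAGE = 200  # from the example: results from 200 nodes in 1 page


def pages_to_check(n_results, per_page=RESULTS_PER_PAGE):
    """Number of result pages an operator has to look through."""
    return math.ceil(n_results / per_page)


def nodes_needing_attention(results):
    """Keep only results with a failing or unknown return code, or stderr output."""
    return [r for r in results if r["rc"] != 0 or r["stderr"].strip()]


print(pages_to_check(10_000))  # pages for 10000 results at 200 per page

# Made-up sample records, for illustration only.
sample = [
    {"node": "node001", "rc": 0, "stderr": ""},
    {"node": "node002", "rc": 1, "stderr": "disk full"},
    {"node": "node003", "rc": None, "stderr": "timeout"},
]
for bad in nodes_needing_attention(sample):
    print(bad["node"], bad["stderr"])
```

Filtering of this kind only reduces what a human has to read; the autonomous, self-healing operation called for above would additionally have to act on the failures.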

Page 18: Building Large Scale Fabrics – A Summary

Resource Scheduling

Problem: How to access local resources from the Grid?
  Local batch queues vs. global batch queues
  Extension of Dynamite (Amsterdam University) to work with Globus: Dynamite-G (I. Shoshmina)

Open question: How do we deal with interactive applications on the Grid?

Page 19: Building Large Scale Fabrics – A Summary

Conclusions

A lot of tools exist
A lot of work still needs to be done in the fabric area in order to get reliable, scalable systems