Site Lightning Report: MWT2
Transcript of Site Lightning Report: MWT2
Site Lightning Report: MWT2
Mark Neubauer, University of Illinois at Urbana-Champaign
US ATLAS Facilities Meeting @ UC Santa Cruz, Nov 14, 2012
Midwest Tier-2
Three-site Tier-2 consortium
The Team: Rob Gardner, Dave Lesny, Mark Neubauer, Sarah Williams, Ilija Vukotic, Lincoln Bryant, Fred Luehring
Midwest Tier-2
Focus of this talk: Illinois Tier-2
Tier-2 @ Illinois
History of the project:
– Fall 2007 onward: Development/operation of T3gs
– 08/26/10: Torre’s US ATLAS IB talk
– 10/26/10: Tier2@Illinois proposal submitted to US ATLAS Computing Mgmt
– 11/23/10: Proposal formally accepted
– 10/5/11: First successful test of ATLAS production jobs run on Campus Cluster (CC)
  • Jobs read data from our Tier3gs cluster
Tier-2 @ Illinois
History of the project (cont.):
– 03/1/12: Successful T2@Illinois pilot
  • Squid proxy cache, Condor head node job flocking from UC
– 4/4/12: First hardware into Taub cluster
  • 16 compute nodes (dual X5650, 48 GB memory, 160 GB disk, IB), 196 cores
  • 60 2TB drives in DDN array, 120 TB raw
– 4/17/12: PerfSONAR nodes online
Illinois Tier-2
T2 on Taub
History of the project (cont.):
– 4/18/12: T2@Illinois in production
Illinois Tier-2
Stable operation: Last two weeks
Illinois Tier-2
Last day on MWT2:
Why at Illinois?
• National Center for Supercomputing Applications (NCSA)
• National Petascale Computing Facility (NPCF): Blue Waters
• Advanced Computation Building (ACB)
– 7,000 sq. ft. with 70” raised floor
– 2.3 MW of power capacity
– 250 kW UPS
– 750 tons of cooling capacity
• Experience in HEP computing
(Photos: NCSA Building, ACB, NPCF)
Tier-2 @ Illinois
• Deployed in a shared campus cluster (CC) in ACB
– “Taub” is the first instance of the CC
– Tier2@Illinois on Taub in production within MWT2
• Pros (ATLAS perspective)
– Free building, power, cooling, and core infrastructure support, with plenty of room for future expansion
– Pool of expertise, heterogeneous HW
– Bulk pricing important given DDD (Dell Deal Demise)
– Opportunistic resources
• Challenges
– Constraints on hardware, pricing, architecture, timing
Tier-2 @ Illinois
Current CPU and disk resources:
• 16 compute nodes (taubXXX)
– dual X5650, 48 GB memory, 160 GB disk, IB; 196 cores, ~400 job slots
• 60 2TB drives in Data Direct Networks (DDN) array: 120 TB raw, ~70 TB usable
Tier-2 @ Illinois
• Utility nodes / services (.campuscluster.illinois.edu):
– Gatekeeper (mwt2-gt)
  • Primary schedd for Taub Condor pool
  • Flocks other jobs to UC and IU Condor pools
– Condor head node (mwt2-condor)
  • Collector and Negotiator for Taub Condor pool
  • Accepts flocked jobs from other MWT2 gatekeepers
– Squid (mwt2-squid)
  • Proxy cache for CVMFS and Frontier for Taub (backup for IU/UC)
– CVMFS replica server (mwt2-cvmfs)
  • Replica of the master CVMFS server
– dCache s-node (mwt2-s1)
  • Pool node for GPFS data storage (installed, dCache in progress)
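The flocking arrangement between the Taub, UC, and IU pools can be sketched with HTCondor's standard flocking knobs. This is an illustrative fragment only: the hostnames are placeholders, not the actual MWT2 machine names.

```
# Illustrative HTCondor flocking sketch (hostnames are hypothetical).

# On the Taub gatekeeper's schedd: when the local pool cannot run
# idle jobs, let them flock to the UC and IU Condor pools.
FLOCK_TO = condor.uc.example.org, condor.iu.example.org

# On the Taub central manager (collector/negotiator): accept flocked
# jobs arriving from the other MWT2 gatekeepers.
FLOCK_FROM = gk.uc.example.org, gk.iu.example.org
```

`FLOCK_TO` is evaluated on the submit side and `FLOCK_FROM` on the accepting pool's central manager, matching the split between the gatekeeper (mwt2-gt) and head node (mwt2-condor) roles described above.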
Next CC Instance (to be named) Overview
• Mix of Ethernet-only and Ethernet + InfiniBand connected nodes
– assume 50-100% will be IB-enabled
• Mix of CPU-only and CPU+GPU nodes
– assume up to 25% of nodes will have GPUs
• New storage device and support nodes
– added to shared storage environment
– allow for other protocols (SAMBA, NFS, GridFTP, GPFS)
• VM hosting and related services
– persistent services and other needs directly related to use of compute/storage resources
Next CC Instance (basic configuration)
• Dell PowerEdge C8220, 2-socket Intel Xeon E5-2670
– 8-core Sandy Bridge processors @ 2.60 GHz
– 1 “sled”: 2 SB processors
– 8 sleds in 4U: 128 cores
• Memory configuration options:
– 2 GB/core, 4 GB/core, 8 GB/core
• Options:
– InfiniBand FDR (GigE otherwise)
– NVIDIA M2090 (Fermi GPU) accelerators
– Storage via DDN SFA12000; can add in 30 TB (raw) increments
(Image: Dell C8220 compute sled)
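The chassis arithmetic above can be checked with a short sketch; every input number is taken from the slide, and the per-sled memory totals follow directly from the quoted GB/core options.

```python
# Sanity-check the C8220 configuration numbers quoted above.
sleds_per_4u = 8
sockets_per_sled = 2
cores_per_socket = 8  # E5-2670 "Sandy Bridge" @ 2.60 GHz

cores_per_sled = sockets_per_sled * cores_per_socket  # 16 cores per sled
cores_per_4u = sleds_per_4u * cores_per_sled          # 128 cores in 4U, as on the slide

# Memory options are quoted per core; per-sled totals follow directly.
for gb_per_core in (2, 4, 8):
    print(f"{gb_per_core} GB/core -> {gb_per_core * cores_per_sled} GB per sled")
```

Running it confirms the "8 sleds in 4U : 128 cores" figure and shows the three memory options correspond to 32, 64, and 128 GB per sled.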
Summary and Plans
• New Tier-2 @ Illinois
– Modest (currently) resource integrated into MWT2 and in production use
– Cautious optimism: deploying a Tier-2 within a shared campus cluster is a success
• Near-term plans
– Buy into 2nd campus cluster instance
  • $160k of FY12 funds with a 60/40 CPU/disk split
– Continue dCache deployment
– LHCONE @ Illinois due to turn on 11/20/12
– Virtualization of Tier-2 utility services
– Better integration into MWT2 monitoring