CSG Research Computing: Jim Pepin, USC CTO / Director, HPCC


Page 1

CSG Research Computing

Jim Pepin, USC

CTO/Director HPCC

Page 2

HPCC

Provide common facilities and services for a large cross section of the university that requires leading edge computational and networking resources.

Leverage USC central resources with externally funded projects.

Page 3

Overview

Sponsored by ISD (Information Services Division of USC) and ISI (Information Sciences Institute).

User community:
- ISI
- LAS
- Engineering
- School of Medicine
- IMSC
- ICT
- Others

Page 4

Current Resources

High Performance Computing Resources:
- Linux Cluster (~1000 nodes / 2000 CPUs, 2 Gb/sec Myrinet)
  - 20 TB shared disk, 18 GB - 40 GB local disk per node
  - Ranks in the top 10 for academic clusters
  - Myrinet switch is 768 nodes; adding nodes funded by USC research groups
- Sun Core Servers (E15k shared memory)
  - 72 processors, 288 GB memory, 30 TB shared disk
- Mass Storage Facilities (Unitree)
  - 18,000 tape capacity

Page 5

Funding Sources

- ISD (University) resources: $1.5M M/S and equipment budget
  - Software/maintenance: $0.4M
  - Generic capital: $1.0M
  - Other: $0.1M
  - 3 FTEs direct support; 2 FTEs system staff offset
- Los Nettos/LAAP: $2.0M
- Condo arrangements: $50K-$250K one-off capital purchases

Pages 6-9: (no transcript text)
Page 10

Cluster Power Usage Math

- 42 nodes/cabinet at 200 watts/node = 8.4 kW/cabinet
- 1000 nodes = 24 compute cabinets
- 1 control cabinet per 8 cabinets of compute servers
- 8 control cabinets; 32 cabinets per 1000 nodes
- 268 kW per 1000 nodes
- 100 tons of A/C per 1000 nodes
- Roughly 400 kW total power use for 1000 nodes
- 1500-2000 sq feet of space
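A quick back-of-the-envelope check of the figures above, taking the slide's cabinet counts as given. The assumption that a control cabinet draws about the same 8.4 kW as a compute cabinet is mine (it is what the slide's 268 kW total implies), and the cooling line uses the standard conversion of roughly 3.517 kW per ton of refrigeration; the slide's "100 tons" and "roughly 400 kW" include overhead beyond the raw IT load computed here.

```c
#include <stdio.h>

int main(void) {
    /* Figures taken from the slide above */
    const int    nodes          = 1000;
    const int    nodes_per_cab  = 42;
    const double watts_per_node = 200.0;
    const int    compute_cabs   = 24;   /* ~1000 nodes at 42 per cabinet */
    const int    control_cabs   = 8;    /* as stated on the slide */

    /* Assumption: a control cabinet draws about the same as a compute cabinet */
    double kw_per_cab = nodes_per_cab * watts_per_node / 1000.0;  /* 8.4 kW */
    int    total_cabs = compute_cabs + control_cabs;              /* 32 cabinets */
    double it_kw      = total_cabs * kw_per_cab;                  /* ~269 kW */

    /* 1 ton of refrigeration removes roughly 3.517 kW of heat */
    double ac_tons = it_kw / 3.517;                               /* ~76 tons */

    printf("%.1f kW per cabinet, %d cabinets, IT load ~%.0f kW for %d nodes\n",
           kw_per_cab, total_cabs, it_kw, nodes);
    printf("Cooling for the IT load alone: ~%.0f tons of A/C\n", ac_tons);
    return 0;
}
```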

Page 11

Current Software

- Cluster software from IBM (xCAT) is the core of the facility; stable production environment.
- MPI is the basic message-passing layer (a minimal example follows below).
- Globus/NMI work is proceeding with Carl's help in funding plus ISD resources; it leverages the campus need for a global directory. More later.
- Solaris and Unitree are core for mass storage support. We need to look at other mass storage opportunities.
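Since the slide names MPI as the basic message-passing layer, here is a minimal sketch of the kind of MPI program users build and run on the cluster; the mpicc and mpirun command names in the comment are generic assumptions about an MPI toolchain, not a statement of HPCC's actual environment.

```c
/* hello_mpi.c - each rank reports itself.
   Typical build/run (assumed, site-specific): mpicc hello_mpi.c -o hello_mpi
   then launch across nodes, e.g. mpirun -np 4 ./hello_mpi */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                         /* shut down cleanly */
    return 0;
}
```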

Issues:
- We need to be able to support faculty/researchers with tools and consulting to help them effectively use large-scale resources.
- Many packages exist on HPCC resources, with no local support to help use them.

Page 12

“Middleware”

- Globus as base with NMI architecture for campus
  - GT2 moving to GT3
  - SCEC/ISI
- Condor as lightweight job manager in user rooms
- PBS/Maui on the cluster and the computation side of the E15k

Issues:
- Kx509 bridge from Kerberos; USC PKI lite CA is the base
  - Only hosts and services; NMI-based
- Pubcookie (Kerberos back-end) uses host certs from the PKI lite CA
- Shib for some prototype library apps (scholar's portal)
- Campus GDS/PR using NMI schemes (eduPerson etc.)

Page 13

HPCC Governance

- HPCC faculty advisory group
  - Meets 4-5 times a year
  - Provides guidance to DCIO and CTO
  - "Final" decisions are in ISD (CIO/DCIO); usual mode is agreement
- Time allocation
  - No recharge
  - Large projects reviewed by faculty allocation group
  - Some projects over 500K node-hours
  - Condo users get dedicated nodes and cost sharing
- Research leverage
  - Condo cost sharing
  - External funding
  - Grid construction
  - Next-generation network

Page 14

CTO/HPCC Projects

Advanced Networking Projects:
- Calren-2
  - 2x Gb service today; 10 Gb service in the next 2 years
- Fiber/wavelength services (CENIC/National Lambda Rail)
  - Online for the west coast
  - Look at L2 possibilities to build shared 'spaces'
  - Look to leverage for projects like the Optiputer ITR
- 1 Wilshire colo facilities
  - See if we can use that space to facilitate the ETF proposal
- Optiputer ITR as a way to help network expansion

Page 15

CTO/HPCC Projects

Leverage HPCC efforts at ISI with ISD resources:
- Clusters
  - Expand cluster to ~2000 nodes centrally owned
  - Expand cluster for other groups (condo model)
- Mass Storage
  - Look into large-scale storage for groups like the VHF project and other high-end storage needs (fractional petabytes)
- Globus/NMI
  - Provide campus leadership for global directory services and identity management (authentication and authorization)
- Networking Research

Page 16

CTO/HPCC Projects

Fiber is a major part of HPCC's ability to serve large-scale computational needs. The following slides show what we have today and how it can be used.

Page 17

Fiber Facilities

- Lease dark fiber; started with dark fiber 3 years ago. Pioneer in this area.
- DWP (Department of Water and Power) USC franchise-area fiber for campus access.
- Leverage new players (NLR/CENIC).
- Use for USC, LAAP and Los Nettos projects.
- Built out today using low-cost CWDM and 15540s.
- 10 Gbps Ethernet backbone in place Fall '02.
- Built out fiber to Caltech/JPL/VHF (Shoah) and other Los Nettos sites.

Page 18

Fiber Facilities

- Lease more dark fiber: Harvey Mudd; build a second path to USC for disaster recovery.
- Install DWDM gear from the CENIC deal with Cisco.
  - 1 Gb wavelengths in the first phase (fall '04); 10 Gb wavelengths in summer '04.
- Use to enable projects like Optiputer and ETF.
- Experiment with optical switching hardware as a 'fiber patch panel' for development of shared 'computer centers'.

Page 19

Original USC Fiber Backbone

[Diagram: sites shown include UPC, HSC, ISI, ICT, the Downtown Clinic, and 1 Wilshire, with links marked as original 4-strand SM DWP fiber or external fiber plant.]

Page 20

Today's Fiber and Gigaman circuits

[Diagram: sites shown include UPC, HSC, ISI, ICT, 1 Wilshire, Caltech, JPL, 818 VHF, HMC, and Tustin, with links marked as either fiber or Gigaman circuits.]

Page 21

Colo Facilities

- Acquired space in 1 Wilshire (original site) 3 years ago.
- DWP fiber is the core; use it to connect to exchanges and other ISPs.
- Extend potentially to other '1 Wilshire' buildings, using the new campus Level 3 fiber as the means.
- House routers and L2 equipment.
- Provide space on the USC campus for partners.
- Enables the Pacific Wave Exchange Point.

Page 22

Exchange Point/Research

[Diagram: exchange point built on Foundry BigIron switches with 802.1q VLANs; sites shown include 1 Wilshire, UPC, ISI, HSC, and 818 7th, with 10 Gb and Gb interconnects and Gb/100 Mb ports at each location.]

Page 23

Experimental Networking

Networking research community:
- California Institutes for Science and Innovation (CITRIS, CalIT2, Nano Systems, BioMedical)
- San Diego Supercomputer Center
- CACR
- ISI
- Teragrid/Distributed Terascale Facility
- UCSB/Dan Blumenthal optical labs

Page 24

Future Resource Goals

High Performance Computing Resources:
- Linux Cluster (2048 nodes / 4096 CPUs, 2 Gb/sec Myrinet)
  - 60 TB shared disk, 36 GB - 72 GB local disk per node
  - Rank in the top 5 for academic clusters
  - Start 64-bit nodes in summer '04
  - Switch fabric will expand past 1024 nodes, with the ability to condo other users
  - Plan to add more nodes funded by USC research groups (condo); goal would be 3000+ nodes total
- Sun Core Servers (E15k shared memory)
  - 72 processors, 288 GB memory, 300 TB disk
  - Use this system for high-end data users (large-scale databases) and video users
- Mass Storage Facilities (Unitree today)
  - 18,000 tape capacity
  - PB online as a goal in 3 years

Page 25

3 Year Strategy

Next step after 32-bit Pentium: we need to determine what will replace the Xeons. One answer is Opteron or IA64, but we need to start developing clusters in this space and benchmarking. Much of the code will need reworking at the user level (an illustrative example follows below).
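As an illustration of the user-level rework the slide anticipates (a hypothetical example, not drawn from any actual HPCC code), the C fragment below shows two common 32-bit x86 habits that break on 64-bit Opteron/IA64 Linux, where long and pointers widen to 8 bytes while int stays at 4.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int value = 42;
    int *p = &value;

    /* 32-bit habit: "a pointer fits in an int".  On LP64 platforms
       pointers are 8 bytes and int is 4, so the cast below would
       truncate the address:
           int addr = (int)p;   <- breaks on 64-bit */
    intptr_t addr = (intptr_t)p;   /* portable: integer wide enough for a pointer */

    /* 32-bit habit: "long == int == 4 bytes".  On LP64, long is 8 bytes,
       so structs, binary file formats, and message buffers sized with
       long change layout and must be audited. */
    printf("sizeof(int)=%zu sizeof(long)=%zu sizeof(void*)=%zu\n",
           sizeof(int), sizeof(long), sizeof(void *));
    printf("value via portable pointer round-trip: %d\n", *(int *)addr);

    return 0;
}
```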

Find ways to cost share with local cluster purchasers. “Condo” housing of medium to large clusters will be important.

Build “Grid-U”

Page 26

3 Year Strategy

As clusters expand into the 2-4K node space, power and A/C become significant issues (along with floor space).

We need to develop several major partners to allow HPCC to be the central piece of joint proposals from USC for such initiatives as ETF and future cyber infrastructure proposals.

An example is a shared submission for a Major Research Instrumentation grant.

Page 27

3 Year Strategy

Networking Futures:
- Expand the exchange point (R/E, Pacific Wave)
  - 10 Gb at all sites
  - Layer 1 facilities (Optiputer-type connections)
- Re-design/RFP for the campus network this month
  - Design the network with 'enclaves' for research or academic support
  - Much higher internal bandwidth (10 Gb core-to-core, at least 1 Gb to all buildings, 10 Gb to major research centers)
  - How to provide comprehensive security without unacceptable friction