Fast computers, big/fast storage, fast networks


Transcript of Fast computers, big/fast storage, fast networks

Page 1: Fast computers, big/fast storage, fast networks

Fast computers, big/fast storage, fast networks

Marla Meehl
Manager, Network Engineering and Telecommunications, NCAR/UCAR; Manager of the Front Range GigaPoP

Computational & Information Systems Laboratory

National Center for Atmospheric Research

Page 2: Fast computers, big/fast storage, fast networks

History of Supercomputing at NCAR (27 Oct '09)

35 systems over 35 years:
CDC 3600, CDC 6600, CDC 7600, Cray 1-A S/N 3 (C1), Cray 1-A S/N 14 (CA), Cray X-MP/4 (CX), TMC CM2/8192 (capitol),
Cray Y-MP/8 (shavano), Cray Y-MP/2 (castle), IBM RS/6000 Cluster (CL), TMC CM5/32 (littlebear), IBM SP1/8 (eaglesnest),
CCC Cray 3/4 (graywolf), Cray Y-MP/8I (antero), Cray T3D/64 (T3), Cray T3D/128 (T3), Cray J90/16 (paiute), Cray J90/20 (aztec),
Cray J90se/24 (ouray), Cray C90/16 (antero), HP SPP-2000/64 (sioux), Cray J90se/24 (chipeta), SGI Origin2000/128 (ute),
Linux Networx Pentium-II/16 (tevye), IBM p3 WH1/296 (blackforest), IBM p3 WH2/604 (blackforest), IBM p3 WH2/1308 (blackforest),
Compaq ES40/36 (prospect), SGI Origin 3800/128 (tempest), IBM p4 p690-C/1216 (bluesky), IBM p4 p690-C/1600 (bluesky),
IBM p4 p690-F/64 (thunder), IBM e1350/264 (lightning), IBM e1350/140 (pegasus), IBM BlueGene-L/2048 (frost),
IBM BlueGene-L/8192 (frost), Aspen Nocona-IB/40 (coral), IBM p5 p575/624 (bluevista), IBM p5+ p575/1744 (blueice),
IBM p6 p575/192 (firefly), IBM p6 p575/4096 (bluefire), new system @ NWSC

[Timeline, 1960-2015, spanning the Marine St., Mesa Lab I, Mesa Lab II, and NWSC facilities. The legend distinguishes Production Systems, Experimental/Test Systems, and Experimental/Test Systems that became Production Systems; red text in the original marks systems currently in operation within the NCAR Computational and Information Systems Laboratory's computing facility.]

Page 3: Fast computers, big/fast storage, fast networks

Current HPC Systems at NCAR

System Name     Vendor   System          # Frames   # Processors   Processor Type   GHz   Peak TFLOPs

Production Systems
bluefire (IB)   IBM      p6 Power 575    11         4096           POWER6           4.7   77.000

Research Systems, Divisional Systems & Test Systems
firefly (IB)    IBM      p6 Power 575    1          192            POWER6           4.7   3.610
frost (BG/L)    IBM      BlueGene/L      4          8192           PowerPC-440      0.7   22.935
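
The peak-TFLOPs column follows directly from processor count, clock rate, and floating-point operations per cycle. A quick sanity check of the table above (the script is illustrative, not from the slides; it assumes 4 flops per cycle per core, the usual figure for both POWER6 and the BlueGene/L PowerPC-440):

```python
# Sanity check of the peak-TFLOPs column: processors x GHz x flops/cycle.
# Assumes 4 floating-point operations per cycle per core (two fused
# multiply-add pipes), the usual figure for POWER6 and the BG/L PowerPC-440.
FLOPS_PER_CYCLE = 4

systems = {
    "bluefire": (4096, 4.7),  # (# processors, clock in GHz)
    "firefly":  (192,  4.7),
    "frost":    (8192, 0.7),
}

for name, (procs, ghz) in systems.items():
    peak_tflops = procs * ghz * FLOPS_PER_CYCLE / 1000.0
    print(f"{name:8s} ~{peak_tflops:6.2f} TFLOPs")
# bluefire ~77.00, firefly ~3.61, frost ~22.94 -- consistent with the table.
```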

Page 4: Fast computers, big/fast storage, fast networks

High Utilization, Low Queue Wait Times


Average Queue Wait Times for User Jobs at the NCAR/CISL Computing Facility

                bluefire        lightning       blueice          bluevista
                since Jun'08    since Dec'04    Jan'07-Jun'08    Jan'06-Sep'08
Peak TFLOPs     77.0            1.1             12.2             4.4
                (all jobs)      (all jobs)      (all jobs)       (all jobs)
Premium         00:05           00:11           00:11            00:12
Regular         00:36           00:16           00:37            01:36
Economy         01:52           01:10           03:37            03:51
Stand-by        01:18           00:55           02:41            02:14

[Chart: Bluefire System Utilization (daily average), Nov 2008 - Nov 2009, 0-100%. Annotations mark the Spring ML Powerdown and the May 3 Firmware & Software Upgrade.]

Bluefire system utilization is routinely >90%

Average queue-wait time < 1hr

[Chart: Number of jobs by queue-wait time (<2m, <5m, <10m, <20m, <40m, <1h, <2h, <4h, <8h, <1d, <2d, >2d) for the Premium, Regular, Economy, and Standby queues.]

Page 5: Fast computers, big/fast storage, fast networks

Bluefire Usage

NCAR FY2009 Computing Resource Usage by Discipline

Climate: 49.3%
Accelerated Scientific Discovery: 18.0%
Weather Prediction: 10.0%
IPCC-AR5: 6.4%
Astrophysics: 4.4%
Oceanography: 4.0%
Atmospheric Chemistry: 3.5%
Basic Fluid Dynamics: 2.6%
Cloud Physics: 1.3%
Miscellaneous: 0.3%
Upper Atmosphere: 0.3%

Page 6: Fast computers, big/fast storage, fast networks

[Chart: Peak TFLOPs at NCAR (All Systems), Jan 2000 - Jan 2010, 0-100 TFLOPs. Systems shown: IBM POWER6/Power575/IB (bluefire), IBM POWER6/Power575/IB (firefly), IBM POWER5+/p575/HPS (blueice), IBM POWER5/p575/HPS (bluevista), IBM BlueGene/L (frost), IBM Opteron/Linux (pegasus), IBM Opteron/Linux (lightning), IBM POWER4/Federation (thunder), IBM POWER4/Colony (bluesky), IBM POWER4 (bluedawn), SGI Origin3800/128, IBM POWER3 (blackforest), IBM POWER3 (babyblue). Annotations mark the ARCS Phase 1-4, Linux, and ICESS Phase 1-2 procurements.]

CURRENT NCAR COMPUTING >100 TFLOPs

Page 7: Fast computers, big/fast storage, fast networks

CISL Computing Resource Users: 1347 Users in FY09

Graduate and Undergrad Students: 343 (25%)
Univ Research Associates: 280 (21%)
NCAR Assoc Sci & Support Staff: 252 (19%)
Univ Faculty: 191 (14%)
NCAR Scientists & Proj Scientists: 160 (12%)
Government: 108 (8%)
Other: 13 (1%)

Page 8: Fast computers, big/fast storage, fast networks

[Bar chart: number of users per university for the universities with the largest number of users in FY09 (0-120 users).]

Users are from 114 U.S. Universities

Page 9: Fast computers, big/fast storage, fast networks

Disk Storage Systems

Current Disk Storage for HPC and Data Collections:
– 215 TB for Bluefire (GPFS)
– 335 TB for Data Analysis and Vis (DAV) (GPFS)
– 110 TB for Frost (local)
– 152 TB for Data Collections (RDA, ESG) (SAN)

Undergoing extensive redesign and expansion: adding ~2 PB to the existing GPFS system to create a large central (cross-mounted) filesystem.

Page 10: Fast computers, big/fast storage, fast networks

Data Services Redesign

Near-term Goals:
– Creation of a unified and consistent data environment for NCAR HPC
– High-performance availability of the central filesystem from many projects/systems (RDA, ESG, CAVS, Bluefire, TeraGrid)

Longer-term Goals:
– Filesystem cross-mounting
– Global WAN filesystems

Schedule:
– Phase 1: Add 300 TB (now)
– Phase 2: Add 1 PB (March 2010)
– Phase 3: Add 0.75 PB (TBD)
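
For reference, the three schedule phases add up to the ~2 PB GPFS expansion mentioned on the previous slide; a trivial check (illustrative script, not from the slides):

```python
# Check that the three expansion phases add up to the ~2 PB mentioned
# on the previous slide (values in PB).
phases = {"Phase 1": 0.300, "Phase 2": 1.000, "Phase 3": 0.750}
print(f"Total added: {sum(phases.values()):.2f} PB")  # ~2.05 PB
```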

Page 11: Fast computers, big/fast storage, fast networks

Mass Storage Systems

NCAR MSS (Mass Storage System)
– In production since the 70s

Transitioning from the NCAR Mass Storage System to HPSS
– Prior to NCAR-Wyoming Supercomputer Center (NWSC) commissioning in 2012
– Cutover Jan 2011
– Transition without data migration

Tape Drives/Silos
– Sun/StorageTek T10000B (1 TB media) with SL8500 libraries
– Phasing out tape silos Q2 2010
– Oozing 4 PBs (unique), 6 PBs with duplicates
– “Online” process
– Optimized data transfer
– Average 20 TB/day with 10 streams
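
The quoted transfer rate puts a rough bound on how long the "oozing" takes. A back-of-the-envelope sketch (assuming the 20 TB/day average is sustained and split evenly across the 10 streams; the figures and script are illustrative):

```python
# Rough duration and per-stream rate for moving the archive to HPSS,
# assuming the quoted 20 TB/day average holds and is split evenly
# across the 10 streams (decimal units: 1 PB = 1000 TB, 1 TB = 1e6 MB).
total_pb = 6.0           # PB to move, counting duplicates
rate_tb_per_day = 20.0   # observed average
streams = 10

days = total_pb * 1000 / rate_tb_per_day                    # ~300 days
per_stream_mb_s = rate_tb_per_day * 1e6 / streams / 86400   # ~23 MB/s

print(f"~{days:.0f} days to move {total_pb:.0f} PB")
print(f"~{per_stream_mb_s:.0f} MB/s sustained per stream")
```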

Page 12: Fast computers, big/fast storage, fast networks
Page 13: Fast computers, big/fast storage, fast networks

Estimated Growth: Total PBs Stored (with DSS Stewardship & IPCC AR5)

[Chart: projected total petabytes stored, 0-35 PB, Oct '07 through Oct '12.]

Page 14: Fast computers, big/fast storage, fast networks

Facility Challenge Beyond 2011

NCAR Data Center
– NCAR data center limits of power/cooling/space have been reached
– Any future computing augmentation will exceed existing center capacity
– Planning in progress

Partnership
Focus on Sustainability and Efficiency
Project Timeline

Page 15: Fast computers, big/fast storage, fast networks

NCAR-Wyoming Supercomputer Center (NWSC) Partners
– NCAR & UCAR
– University of Wyoming
– State of Wyoming
– Cheyenne LEADS
– Wyoming Business Council
– Cheyenne Light, Fuel & Power Company
– National Science Foundation (NSF)

http://cisl.ucar.edu/nwsc/

[Map labels: Cheyenne, University of Wyoming, NCAR – Mesa Lab]

Page 16: Fast computers, big/fast storage, fast networks

Focus on Sustainability
– Maximum energy efficiency
– LEED certification
– Achievement of the smallest possible carbon footprint
– Adaptable to the ever-changing landscape of High-Performance Computing
– Modular and expandable space that can be adapted as program needs or technology demands dictate

Page 17: Fast computers, big/fast storage, fast networks

Focus On Efficiency
– Don’t fight mother nature
– The cleanest electron is an electron never used
– The computer systems drive all of the overhead loads
  – NCAR will evaluate in procurement
  – Sustained performance / kW
– Focus on the biggest losses
  – Compressor based cooling
  – UPS losses
  – Transformer losses

Typical Modern Data Center: Lights 0.1%, Cooling 15.0%, Fans 6.0%, Electrical Losses 13.0%, IT Load 65.9%
NWSC Design: IT Load 91.9%, with the remainder split among Lights, Cooling, Fans, and Electrical Losses

Page 18: Fast computers, big/fast storage, fast networks

Design Progress

Will be one of the most efficient data centers
– Cheyenne climate
– Eliminate refrigerant-based cooling for ~98% of the year
– Efficient water use
– Minimal electrical transformation steps

Guaranteed 10% renewable; option for 100% (wind power)

Power Usage Effectiveness (PUE)
– PUE = Total Load / Computer Load = 1.08 ~ 1.10 for WY
– Good data centers: 1.5
– Typical commercial: 2.0
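
Those PUE values translate directly into facility power. A minimal illustration, taking 4.5 MW as the IT load (the initial Module A install figure from the next slide; using it here is an assumption made only for the arithmetic):

```python
# What the quoted PUE figures mean in absolute power, using a 4.5 MW IT
# (computer) load as an illustrative figure (the initial Module A install).
IT_LOAD_MW = 4.5

for label, pue in [("NWSC design", 1.10),
                   ("good data center", 1.5),
                   ("typical commercial", 2.0)]:
    total_mw = pue * IT_LOAD_MW          # PUE = total load / computer load
    overhead_mw = total_mw - IT_LOAD_MW  # cooling, electrical losses, lights
    print(f"{label:18s} PUE {pue:4.2f}: {total_mw:4.2f} MW total, "
          f"{overhead_mw:4.2f} MW overhead")
# At PUE 1.10 the overhead is ~0.45 MW; at PUE 2.0 it equals the IT load itself.
```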

Page 19: Fast computers, big/fast storage, fast networks

Expandable and Modular

Long term
– 24-acre site
– Dual substation electrical feeds
– Utility commitment up to 36 MW
– Substantial fiber optic capability

Medium term – Phase I / Phase II
– Can be doubled
– Enough structure for three or four generations
– Future container solutions or other advances

Short term – Module A / Module B
– Each module 12,000 sq. ft.
– Install 4.5 MW initially; can be doubled in Module A
– Upgrade cooling and electrical systems in 4.5 MW increments

Page 20: Fast computers, big/fast storage, fast networks

Project Timeline

– 65% Design Review Complete (October 2009)
– 95% Design Complete
– 100% Design Complete (Feb 2010)
– Anticipated Groundbreaking (June 2010)
– Certification of Occupancy (October 2011)
– Petascale System in Production (January 2012)

Page 21: Fast computers, big/fast storage, fast networks

NCAR-Wyoming Supercomputer Center

Page 22: Fast computers, big/fast storage, fast networks

HPC services at NWSC

PetaFlop Computing – 1 PFLOP peak minimum

100's PetaByte Archive (+20 PB yearly growth)
– HPSS
– Tape archive hardware acquisition TBD

PetaByte Shared File System(s) (5-15 PB)
– GPFS/HPSS HSM integration (or Lustre)
– Data Analysis and Vis
– Cross mounting to NCAR Research Labs, others

Research Data Archive data servers and data portals

High-Speed Networking and some enterprise services

Page 23: Fast computers, big/fast storage, fast networks

NWSC HPC Projection

[Chart: Peak PFLOPs at NCAR, Jan 2004 - Jan 2014, 0-2.0 PFLOPs. Systems shown: NWSC HPC (with uncertainty range), IBM POWER6/Power575/IB (bluefire), IBM POWER5+/p575/HPS (blueice), IBM POWER5/p575/HPS (bluevista), IBM BlueGene/L (frost), IBM Opteron/Linux (pegasus), IBM Opteron/Linux (lightning), IBM POWER4/Colony (bluesky). Annotations mark ARCS Phase 4, ICESS Phase 1, and ICESS Phase 2.]

Page 24: Fast computers, big/fast storage, fast networks

NCAR Data Archive Projection

[Chart: Total Data in the NCAR Archive (Actual and Projected), Jan 1999 - Jan 2015, 0-120 petabytes; separate curves for Total and Unique data.]

Page 25: Fast computers, big/fast storage, fast networks

Tentative Procurement Timeline

[Gantt chart, Jul 2009 - Jan 2012, with phases: NDAs, RFI, RFI Evaluation, RFP Draft, RFP Refinement, Benchmark Refinement, RFP Release, Proposal Evaluation, Negotiations, NSF Approval, Award, Installation & ATP, NWSC Production.]

Page 26: Fast computers, big/fast storage, fast networks
Page 27: Fast computers, big/fast storage, fast networks

Front Range GigaPoP (FRGP)
– 10 years of operation by UCAR
– 15 members
– NLR, I2, ESnet peering @ 10 Gbps
– Commodity – Qwest and Level3
– CPS and TransitRail Peering
– Intra-FRGP peering
– Akamai
– 10 Gbps ESnet peering
– www.frgp.net

Page 28: Fast computers, big/fast storage, fast networks
Page 29: Fast computers, big/fast storage, fast networks

Fiber, fiber, fiber
– NCAR/UCAR intra-campus fiber – 10 Gbps backbone
– Boulder Research and Administration Network (BRAN)
– Bi-State Optical Network (BiSON) – upgrading to 10/40/100 Gbps capable
  – Enables 10 Gbps NCAR/UCAR TeraGrid connection
  – Enables 10 Gbps NOAA/DC link via NLR lifetime lambda
– DREAM – 10 Gbps
– SCONE – 1 Gbps

Page 30: Fast computers, big/fast storage, fast networks
Page 31: Fast computers, big/fast storage, fast networks
Page 32: Fast computers, big/fast storage, fast networks

Western Regional Network (WRN)
– A multi-state partnership to ensure robust, advanced, high-speed networking available for research, education, and related uses
– Increased aggregate bandwidth
– Decreased costs

Page 33: Fast computers, big/fast storage, fast networks
Page 34: Fast computers, big/fast storage, fast networks

Questions