
Angèle Simard

Canadian Meteorological Center

Meteorological Service of Canada

MSC Computing Update


Canadian Meteorological Centre

National Meteorological Centre for Canada

Provides services to National Defence and emergency organizations

International WMO commitments (emergency response, telecommunications, data, etc.)

Supercomputer used for NWP, climate, air quality, etc.; operations and R&D


Outline

Current Status

Supercomputer Procurement

Conversion Experience

Scientific Directions

Current Status


MSC's Supercomputing History

[Chart: peak performance in MFLOPS (log scale, 1 to 1,000,000) by year, 1974–2002, showing the CDC 7600, Cray 1S, CDC 176, Cray XMP 28, Cray XMP 416, NEC SX-3/44, NEC SX-3/44R, NEC SX-4/80M3, NEC SX-5/32M2, and NEC SX-6/80M10.]


CMC Supercomputer Infrastructure

[Diagram:]

Supercomputer: NEC SX-6/80M10, with 3.6 TB RAID, FC switched (128 ports)

High-speed network: 1000 Mb/s switched, 128 ports; each host has links to ops & dev nets

Central file server: SGI O300 (8 PEs), 1 TB over 4x FC, LSI RAID

ADIC AML-E (tape robot): 145 TB, 4 DST drives, 20 MB/s each

Front ends: Linux cluster (12 DL380), SGI Origin 3000, Netapp F840 (.05 TB)


Supercomputers: NEC

NEC

– 10 nodes

– 80 PEs

– 640 GB memory

– 2 TB disks

– 640 Gflops (peak)


Archive Statistics

[Chart: TB (0–200) by year, 1996–2002, for archive size, new data written, and data read.]


Automated Archiving System

22,000 tape mounts/month

11 Terabytes growth/month

maximum of 214 TB
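At 11 TB of growth a month, the archive expands by roughly 11 × 12 = 132 TB a year, consistent with the steep rise of the archive-size curve above.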

Supercomputer Procurement


RFP/Contract Process

Competitive process

Formal RFP released in mid-2002

Bidders needed to meet all mandatory requirements to advance to the next phase

Bids above the funding envelope were rejected

Bidders were rated on performance (90%) and price (10%)

IBM came first

Live preliminary benchmark at IBM facility

Contract awarded to IBM in November 2002


IBM Committed Sustained Performance in MSC TFlops

[Chart: sustained TFlops for the initial (1.042), upgrade (2.47996), and optional upgrade (5.89251) configurations.]


On-going Support

The systems must achieve a minimum of 99% availability.

Remedial maintenance 24/7: 30-minute response time between 8 a.m. and 5 p.m. on weekdays; one-hour response outside those periods.

Preventive maintenance or engineering changes: maximum of eight (8) hours a month, in blocks of time not exceeding two (2) or four (4) hours per day (subject to certain conditions).

Software support: provision of emergency and non-emergency assistance.
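For scale, 99% availability corresponds to at most about 0.01 × 720 ≈ 7 hours of downtime in a 30-day month, on the same order as the preventive-maintenance allowance above.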


CMC Supercomputer Infrastructure

[Diagram:]

Supercomputers: IBM SP, 832 PEs & 1.6 TB

High-speed network: 1000 Mb/s switched, 128 ports; each host has links to ops & dev nets

Central file server: SGI O300 (8 PEs), 1 TB over 4x FC, LSI RAID

ADIC AML-E (tape robot): 145 TB, 4 DST drives, 20 MB/s each

Front ends: SGI O3000s (24 PEs, MIPS R12K), Linux cluster (12 DL380), Netapp F840 (.05 TB)


Supercomputers: Now and Then…

NEC

– 10 nodes

– 80 PEs

– 640 GB memory

– 2 TB disks

– 640 Gflops (peak)

IBM

– 112 nodes

– 928 PEs

– 2.19 TB memory

– 15 TB disks

– 4.825 Tflops (peak)

Conversion Experience


Transition to IBM

Prior to conversion:

Developed and tested code on multiple platforms (MPP & vector)

Where possible, avoided proprietary tools

When required, hid proprietary routines under a local API to centralize changes (see the sketch after this list)

Following contract award, obtained early access to an IBM system (thanks to NCAR)
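To illustrate the local-API point, here is a minimal sketch, not MSC's actual code: a hypothetical wrapper (msc_wallclock) hides a vendor-specific timer behind one local definition, so a platform switch touches only this file.

```c
/* Local API sketch: hide a proprietary routine behind one wrapper.
 * All names here (msc_wallclock, vendor_hires_clock, USE_VENDOR_TIMER)
 * are hypothetical illustrations, not MSC code. */
#include <stdio.h>
#include <time.h>

#if defined(USE_VENDOR_TIMER)
/* Hypothetical proprietary high-resolution timer on the vendor platform. */
extern double vendor_hires_clock(void);
static double msc_wallclock(void) { return vendor_hires_clock(); }
#else
/* Portable fallback: standard C clock. */
static double msc_wallclock(void) { return (double)clock() / CLOCKS_PER_SEC; }
#endif

int main(void) {
    double t0 = msc_wallclock();
    /* ... model code would run here ... */
    printf("elapsed: %f s\n", msc_wallclock() - t0);
    return 0;
}
```

Callers use only msc_wallclock(), so converting to a new machine means changing this one wrapper rather than every call site.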


Transition to IBM

Application Conversion Plan:

The plan was built to meet a tight schedule

Its key element was rapid implementation of an operational “core” (to test the mechanics)

Phased approach for optimized versions of the models

End-to-end testing scheduled to begin mid-September


Scalability

Rule of thumb: to obtain the required wall-clock timings, about four times more IBM processors are needed to match the performance of the NEC SX-6.
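For example, a job that used 8 NEC processors maps to roughly 8 × 4 = 32 IBM processors (e.g. 8 MPI tasks with 4 OpenMP threads each), as the preliminary timings below show.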


Preliminary Results

Timing (min) and configuration, NEC vs. IBM:

GEMDM regional (48 hour):      NEC 38, 8 MPI          |  IBM 34, 8 MPI x 4 OpenMP (32 total)
GEMDM global (240 hour):       NEC 35, 4 MPI          |  IBM 36, 4 MPI x 4 OpenMP (16 total)
3D-Var (R1) (without AMSU-B):  NEC 11, 4 CPU OpenMP   |  IBM 9, 6 CPU OpenMP
3D-Var (G2) (with AMSU-B):     NEC 16, 4 CPU OpenMP   |  IBM 12, 6 CPU OpenMP
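The "n MPI x m OpenMP" entries above denote hybrid jobs: MPI tasks distributed across nodes, each running a team of OpenMP threads. A generic minimal sketch of this decomposition (an illustration, not the GEMDM code):

```c
/* Hybrid MPI + OpenMP "hello": run as, e.g., 8 MPI tasks with
 * OMP_NUM_THREADS=4 to occupy 8 x 4 = 32 processors. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, ntasks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's id   */
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks); /* e.g. 8 MPI tasks */

    /* Each MPI task works on its subdomain with a team of threads. */
    #pragma omp parallel
    {
        printf("task %d of %d, thread %d of %d\n",
               rank, ntasks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```

Built with an MPI compiler wrapper and the OpenMP flag (e.g. mpicc -fopenmp) and launched with, e.g., mpirun -np 8.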

Scientific Directions


Global System

Now: Uniform resolution of 100 km (400 X 200 X 28); 3D-Var at T108 on model levels, 6-hr cycle; use of raw radiances from AMSU-A, AMSU-B and GOES

2004: Resolution to 35 km (800 X 600 X 80); top at 0.1 hPa (instead of 10 hPa) with additional AMSU-A and AMSU-B channels; 4D-Var assimilation, 6-hr time window with 3 outer loops at full model resolution and inner loops at T108 (CPU equivalent of a 5-day forecast of the full-resolution model); new datasets: profilers, MODIS winds, QuikSCAT

2005+: Additional datasets (AIRS, MSG, MTSAT, IASI, GIFTS, COSMIC); improved data assimilation
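For scale, the 2004 grid (800 X 600 X 80 = 38.4 million points) has about 17 times as many points as the current one (400 X 200 X 28 ≈ 2.2 million). For reference, 3D-Var and 4D-Var select the analysis by minimizing the standard variational cost function (in 4D-Var the observation operator also includes the model integration over the 6-hr window):

$$J(x) = \tfrac{1}{2}(x - x_b)^T B^{-1}(x - x_b) + \tfrac{1}{2}(y - H(x))^T R^{-1}(y - H(x))$$

where $x_b$ is the background state, $B$ and $R$ are the background- and observation-error covariances, $y$ the observations, and $H$ the observation operator.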


Regional System

Now: Variable resolution, uniform region at 24 km (354 X 415 X 28); 3D-Var assimilation on model levels at T108, 12-hr spin-up

2004: Resolution to 15 km in uniform region (576 X 641 X 58); inclusion of AMSU-B and GOES data in assimilation cycle; four model runs a day (instead of two); new datasets: profilers, MODIS winds, QuikSCAT

2006: Limited area model at 10 km resolution (800 X 800 X 60); LAM 4D-Var data assimilation; assimilation of radar data


Ensemble Prediction System

Now: 16-member global system (300 X 150 X 28); forecasts up to 10 days once a day at 00Z; Optimal Interpolation assimilation system, 6-hr cycle, use of derived radiance data (SATEMs)

2004: Ensemble Kalman Filter assimilation system, 6-hr cycle, use of raw radiances from AMSU-A, AMSU-B and GOES; two forecast runs per day (12Z run added); forecasts extended to 15 days (instead of 10)
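For reference, the Ensemble Kalman Filter applies the standard Kalman analysis update but estimates the forecast-error covariance $P^f$ from the spread of the ensemble members:

$$x^a = x^f + K\,(y - Hx^f), \qquad K = P^f H^T\,(H P^f H^T + R)^{-1}$$

with $y$ the observations, $H$ the observation operator, and $R$ the observation-error covariance.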


...Ensemble Prediction System

2005: Increased resolution to 100 km (400 X 200 X 58); increased members to 32; additional datasets such as in the global deterministic system

2007: Prototype regional ensemble, 10-member LAM (500 X 500 X 58); no distinct data assimilation; initial and boundary conditions from the global EPS


Mesoscale System

Now: Variable resolution, uniform region at 10 km (290 X 371 X 35); two windows; no data assimilation

2004: Prototype limited area model at 2.5 km (500 X 500 X 45) over one area

2005: Five limited area model windows at 2.5 km (500 X 500 X 45)

2008: 4D data assimilation


Coupled Models

Today

In R&D: coupled atmosphere, ocean, ice, wave, hydrology, biology & chemistry

In Production: storm surge, wave

Future

Global coupled model for both prediction & data assimilation


Estuary & Ice Circulation


Merchant Ship Routing

[Chart: power ("Puissance") versus time ("Temps", about 2 days) comparing the southern route ("Route sud") and the northern route ("Route nord").]


Challenges

IT Security

Accelerated Storage Needs

Replacement of Front-end & Storage Infrastructure