www.cptec.inpe.br
CENTER FOR WEATHER FORECAST AND CLIMATE STUDIES
HPC ACTIVITIES AT CPTEC
JAIRO PANETTA (CPTEC), SAULO BARROS (USP)
and many colleagues (…)
ECMWF, OCT 2006
Installed Top Speed (GFlops)
[Chart] Installed top speed, 1993–2005, for ECMWF, NCEP, UKMO, DWD, JMA and Canada. Source: Top500.
Installed Top Speed (GFlops)
[Chart] Installed top speed, 1993–2005, with a least-squares fit: growth of 1.8×/year, reaching 1 PFlops around 2012. Source: Top500.
[Table] Central machines installed in 1994, 1998 and 2004: number of nodes, processors, top speed, memory and disk (up to 1 PByte of disk).
Supercomputing at CPTEC
Global spectral model T213L42 (64 km), up to 10 days, twice a day (NCEP analysis and GPSAS assimilation system)
Regional Eta model (20 km, L38), up to 5 days, twice a day (RPSAS assimilation system with CPTEC AGCM fields)
Coupled ocean/atmosphere global model (T126L28 + MOM3), up to 30 days, twice a day
CATT-BRAMS environmental model, up to 3 days
Global (T126L28, 15 members, 15 days) and regional (40 km, L38, 5 members, 5 days) ensembles, twice a day
Wave model, monthly climate runs, etc.
Operational Suite
Software aspects of production models: parallelism, efficiency, ease of use, ease of modification
Provide user support on all software aspects
Transform successful research into production
Probe future technologies: hardware and software
HPC Group Activities
Spectral Eulerian or Semi-Lagrangian, Full, Reduced, Quadratic or Linear Grid
Dynamically configurable, Fortran 90, OpenMP, MPI
Binary reproducible, portable, efficient on production machine
Easy to insert new physical parametrizations
Souza’s Shallow Cumulus, Grell Ensemble Convection, CLIRAD radiation
About 15 man-years of modernization effort
Global Model
Efficiency under OpenMP
[Chart] Eulerian full grid: percentage of machine top speed (25%–50%) versus number of processors (1–8), for T062L28 (210 km), T126L28 (105 km), T170L42 (79 km), T213L42 (63 km), T254L64 (53 km) and T319L64 (42 km).
MPI + OpenMP Speed-up
[Chart] T213L42, Eulerian full grid: speed-up versus number of processors, up to 32.
A T341L64, semi-Lagrangian, reduced-grid, OpenMP + MPI run executes on 4 full nodes (32 processors) at 80.67 GFlops (31.5% of top speed)
SL Reduced Grid
Limited area forecast model for regional weather centers
BRAMS = RAMS + tropical parametrizations + software quality + binary reproducibility + higher efficiency
Contributions from multiple sources: IAG/USP, IME/USP, UFCG, …
Fortran 90, MPI
Tailored for PC Clusters
About 20 man-years of effort
BRAMS
CATT-BRAMS: air pollution due to biomass burning and urban areas
Experimenting with OLAM
150 km global
75 km regional
Ocean-Land-Atmosphere Model; a global version of RAMS
Developed at Duke University by Robert Walko and Roni Avissar
Global triangulation based on the icosahedron, shaved-eta vertical coordinate
Non-hydrostatic, finite-volume formulation
Prototype version: sound results (daily test runs at CPTEC and IAG/USP); requires software enhancement (a long effort)
OLAM
CREATION, DISTRIBUTION, MAINTENANCE AND SUPPORT OF MODERN, EFFICIENT, UP-TO-DATE OPEN-SOURCE SOFTWARE FOR METEOROLOGICAL AND ENVIRONMENTAL SCIENCES
HPC Group Role at CPTEC
Web Pages
[Diagram] Status of model web pages: BRAMS, Global Model and OLAM, ranging from published to unpublished prototypes.
BRAMS DISSEMINATION
Week of site accesses
Daily production within Brazil
Given that: CATT-BRAMS runs on the NEC SX-6 at CPTEC; BRAMS runs on PC clusters all over Brazil; a single-source physics for BRAMS and OLAM is desirable
Is it possible to generate a single-source physics that is efficient on a wide range of architectures?
An elusive goal over the last 30 years
Effective Portability
Combine vector instructions with cache reuse
Vector Instructions on SX and PC
“Unstructured” blocked physics: from the (k, i, j) formulation into (ij, k); block on ij; tailor block size to the architecture
First Attempt
First Attempt: Radiation, (ij, k) formulation
[Chart] Execution time (s) of the k formulation versus the ij formulation on the SX-6 (vector size 256) and on IA32 (vector size 4).
Second Attempt: Advection, original (k, i, j) formulation
[Chart] MFlops versus problem size on Xeon and SX-6.
Second Attempt: Advection, current formulation
[Chart] MFlops versus problem size on Xeon and SX-6.
Development history:
1. (k, i, j)
2. (ij, k)
3. Blocked (ij, k)
4. (ijBlk, k, nBlk)
5. Vector (nBlk) of all fields on type (ijBlk, k)
Single source, efficient on both vector and microprocessor-based architectures
There is still a long way to go: full code, not a single module; type conversion cost
Second Attempt: Advection
Second Attempt: Advection, single performance parameter
[Chart] Speed gain versus vector length on Xeon and SX-6.
Three distinct PC clusters spread over Brazil, driven by a portal, scheduled using three distinct grid middlewares, one middleware at a time
BRAMS climatology generation: partition Brazil into three regions; three starting dates (members); partition the 10-year climatology into one-year runs
Stressing the Grid concept with a larger-than-usual computing load grain
Grid Computing
Domain Partition
40 km resolution, but uneven grid sizes and computational efforts
Results
91/93 (7 days)
94/96 (5 days)
97/99 (4.5 days)
2.6× speed-up
Formal procurement for a 1000-processor machine, for research purposes
IA32 processors, fast interconnect; proposals due Nov 6th
Goal: “massively parallel” versions of: the Global Model; CATT-BRAMS; mesoscale models (including at least BRAMS); Local Ensemble Kalman Filter based data assimilation; OLAM; …
Future Plans - I
Central computing facility replacement: scheduled for 2007/2008, depending upon funding; 20–40 TFlops range
Production goals: higher-resolution AGCM (20 km); higher-resolution environmental model; higher-resolution mesoscale models; Kalman-filtering-based data assimilation; climate change
Future Plans - II
THANK YOU