Training session 2: Advanced training course on modipsl and libIGCM, November 14th 2013, MdS
Outline
• IPSL climate modelling centre (ICMC) presentation
• IPSLCM history and perspective
• Mini how-to: using modipsl/libIGCM
• Post-processing with libIGCM
• Monitoring a simulation
• Hands-on
IPSL climate modelling centre (ICMC)
• Modeling platform (IPSL-ESM): Arnaud Caubel (LSCE) - Marie-Alice Foujols (IPSL)
• Data archive and access requirements: Sébastien Denvil (IPSL) - Karim Ramage (IPSL)
• Atmospheric and surface physics and dynamics (LMDZ): Frédéric Hourdin (LMD) - Laurent Fairhead (LMD)
• Ocean and sea ice physics and dynamics (NEMO, LIM): C Ethé (IPSL) - Claire Lévy - Gurvan Madec (LOCEAN)
• Atmosphere and ocean interactions (IPSL-CM, different resolutions): Sébastien Masson (LOCEAN) - Olivier Marti (LSCE)
• Biogeochemical cycles (PISCES): Laurent Bopp (LSCE) - Patricia Cadule (IPSL)
• Current and future climate changes: Jean-Louis Dufresne (LMD) - Olivier Boucher (LMD)
• Paleoclimate and last millennium: Pascale Braconnot - Masa Kageyama (LSCE)
• "Near-term" prediction (seasonal to decadal): Eric Guilyardi (LOCEAN) - Juliette Mignot (LOCEAN)
• Evaluation of the models, present-day and future climate change analysis: Sandrine Bony (LMD) - Patricia Cadule (IPSL) - Marion Marchand (LATMOS) - Juliette Mignot (LOCEAN) - Jérôme Servonnat (LSCE)
• Regional climates: Robert Vautard (LSCE), Laurent Li (LMD)
• Atmospheric chemistry and aerosols (INCA, INCA_aer, Reprobus): Anne Cozic (LSCE) - M. Marchand (LATMOS)
• Continental processes (ORCHIDEE): Philippe Peylin (LSCE) - Josefine Ghattas (IPSL)
ICMC organisation: PI: J-L Dufresne; Office: L. Bopp, MA Foujols, J. Mignot
Steering committee
IPSLCM history
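[Figure: IPSLCM history and scientific articles. Timeline 1990, 1995, 2001, 2007, 2013 aligning the IPCC reports (FAR, SAR, TAR, AR4, AR5), the CMIP projects (CMIP 1 & 2, CMIP3, CMIP5) and the model versions IPSL-CM1, IPSL-CM2, IPSL-CM4, IPSL-CM5 and the upcoming IPSL-CM6, annotated with the number of articles (10+ articles, some articles, few articles, 30+ articles).]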
LMDZ: atmospheric component, http://lmdz.lmd.jussieu.fr/?set_language=en
Next LMDZ training session: 9-11 December 2013; registration before 15th November: http://studs.unistra.fr/studs.php?sondage=1wgk8t9v44nsml27
Introduction to LMDZ
NEMO: oceanic component, http://www.nemo-ocean.eu
Short history of the IPSL model: http://icmc.ipsl.fr/index.php/icmc-models
Supercomputers timeline (top500.org): the first Linpack performance list dates from 1979 (80 Mflops); performance has grown by a factor of about 10 every 4 years.
[Figures: Complexity and resolution of models (IPCC AR4, WG1, Chap. 1, figs 1.2 and 1.4); number of CPUs/cores of the top500.org systems, from 10 to 100 000, between 1993 and 2013.]
Technical challenges: HPC
• More parallelism in each component:
– MPI: message-passing programming
– hybrid, i.e. MPI/OpenMP: directives and shared memory
• More parallelism in the coupled model:
– at least 3 executables
– each with MPI or MPI/OpenMP
– more executables with XIOS: IO servers
• Huge amount of data produced, to be analysed
On the road to IPSL-CM6
• New physical package: LMDZ, NEMO, ORCHIDEE
• Increased horizontal and vertical resolutions
• Ensembles of simulations
• Longer simulations: paleo
• More complexity: INCA chemistry added
• More processors used in parallel
• New dynamical core: DYNAMICO
• Optimisation of IO
• Improvement and reliability of libIGCM
Retrieve, compile and launch a _v5-type configuration
1. Access to MODIPSL:
   svn co http://forge.ipsl.jussieu.fr/igcmg/svn/modipsl/trunk modipsl
2. Access to IPSLCM5_v5:
   cd modipsl/util ; ./model IPSLCM5_v5
3. Installation of the Makefiles:
   cd modipsl/util ; ./ins_make
4. Compilation:
   cd modipsl/config/IPSLCM5_v5 ; gmake + chosen resolution
5. Installation of the experiment template (and of the post-processing):
   cp EXPERIMENT/IPSLCM5/piControl/config.card .
   vi config.card        ### set JobName=MYEXP
   ../../util/ins_job    ### copies the piControl directory into MYEXP, with COMP, DRIVER, PARAM
• Submission of the launch job:
   cd modipsl/config/IPSLCM5_v5/MYEXP ; ccc_msub Job_MYEXP (TGCC) or llsubmit Job_MYEXP (IDRIS)
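Step 5 edits config.card by hand with vi to set JobName. As a minimal sketch, assuming the card holds one `Key=Value` assignment per line (as in the config.card excerpts of this course), the same change can be scripted in Python. `set_card_key` is a hypothetical helper, not part of modipsl or libIGCM, and the card excerpt is illustrative only.

```python
import re


def set_card_key(text: str, key: str, value: str) -> str:
    """Return a copy of a .card text with the first `key=...` line set to `key=value`.

    Hypothetical helper: assumes one `Key=Value` assignment per line. Comment
    lines such as `#D- ...` never match, because the key must be the first
    non-blank text on its line.
    """
    pattern = re.compile(r"^([ \t]*" + re.escape(key) + r"[ \t]*=).*$", re.MULTILINE)
    # A function replacement keeps `value` literal (no backslash or group escapes).
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text, count=1)
    if count == 0:
        raise KeyError(f"{key} not found in card")
    return new_text


# Illustrative excerpt only, not a complete config.card.
example = "[UserChoices]\nJobName=piControl\n"
print(set_card_key(example, "JobName", "MYEXP"))
```

To apply it to the real file, read config.card, pass its text through `set_card_key(text, "JobName", "MYEXP")` and write the result back before running ins_job.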
[Diagram: from the front end to the computing nodes. Modipsl: connection to the front end, downloading of the specific configuration (sources of the components from the IPSL cvs/svn servers), compilation. LibIGCM: simulation set up (physical package choice and set up), job set up and submission from the generic job AA_Job; the computing job advances PeriodLength by PeriodLength.]
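libIGCM library: schematic description
[Diagram: the experiment directory EXP00 holds config.card, EXP00/DRIVER (*.driver files) and EXP00/COMP (*.card files). The computing job advances PeriodLength by PeriodLength; post-processing jobs are launched at RebuildFrequency, PackFrequency, TimeSeriesFrequency and SeasonalFrequency.]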
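The schematic ties each kind of post-processing to a frequency set in config.card. A minimal sketch for reasoning about that schedule, assuming year-based frequencies such as `10Y` (or `NONE`) and assuming a frequency is reached whenever the number of simulated years is a multiple of it. `parse_years` and `due_frequencies` are hypothetical helpers, not libIGCM's scheduler; in libIGCM the course notes that rebuild jobs are the ones that submit the pack, TS and SE jobs.

```python
from typing import Dict, List, Optional


def parse_years(freq: str) -> Optional[int]:
    """'10Y' -> 10 and 'NONE' -> None; other units are outside this sketch."""
    freq = freq.strip().upper()
    if freq == "NONE":
        return None
    if len(freq) < 2 or not freq.endswith("Y") or not freq[:-1].isdigit():
        raise ValueError(f"unsupported frequency in this sketch: {freq!r}")
    years = int(freq[:-1])
    if years == 0:
        raise ValueError("frequency must be at least 1Y")
    return years


def due_frequencies(years_done: int, freqs: Dict[str, str]) -> List[str]:
    """Names of the frequency parameters reached after `years_done` simulated years."""
    due = []
    for name, freq in freqs.items():
        years = parse_years(freq)
        if years is not None and years_done > 0 and years_done % years == 0:
            due.append(name)
    return due


# Illustrative values only; use the ones from your own config.card.
settings = {
    "RebuildFrequency": "1Y",
    "PackFrequency": "5Y",
    "TimeSeriesFrequency": "10Y",
    "SeasonalFrequency": "10Y",
}
for year in (3, 5, 10):
    print(year, due_frequencies(year, settings))
```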
TGCC computers and file system in a nutshell
[Figure (October 2013): TGCC computers and file systems. Legend: temporary space, saved space, non-saved space, space on tapes, visible from www, quotas; compute and login nodes.]
• Computers: curie front-end (login); curie thin nodes (-q standard), curie large nodes (-q xlarge) and curie hybrid nodes (-q hybrid) (compute); airain front-end and airain nodes.
• $HOME: sources, small results, small precious files (saved space).
• $CCCWORKDIR: IGCM_OUT: MONITORING/ATLAS.
• $SCRATCHDIR: temporary files: IGCM_OUT: REBUILD, files to be packed, outputs of post-processing jobs.
• $CCCSTOREDIR (HPSS robotic tapes): IGCM_OUT: packed results: Output, Analyse (SE and TS).
• dods/store and dods/work: visible from www.
• Copy commands: cp, dods_cp, ccc_hsm get.
[Figure: libIGCM workflow at TGCC.]
• Computing jobs (Job_EXP00), advancing PeriodLength by PeriodLength, run on curie and write to $SCRATCHDIR/IGCM_OUT/XXX/Output, $SCRATCHDIR/IGCM_OUT/.../REBUILD and $SCRATCHDIR/IGCM_OUT/XXX/Restart and Debug.
• rebuild (every RebuildFrequency, on curie): REBUILD => $SCRATCHDIR/IGCM_OUT/XXX/Output.
• pack_output (ncrcat, every PackFrequency): $SCRATCHDIR/IGCM_OUT/XXX/Output => $CCCSTOREDIR/IGCM_OUT/XXX/Output.
• pack_restart and pack_debug (tar, every PackFrequency, on curie): => $CCCSTOREDIR/IGCM_OUT/.../RESTART and DEBUG.
• create_ts (every TimeSeriesFrequency), then monitoring; create_se (every SeasonalFrequency), then atlas.
• TS and SE: $CCCSTOREDIR/IGCM_OUT/… and dods/store; MONITORING and ATLAS: $CCCWORKDIR and dods/work.
• DodsCopy=TRUE/FALSE switches the copy to dods on or off.
IDRIS computers and file system in a nutshell
[Figure (October 2013): IDRIS computers and file systems. Legend: temporary space, saved space, non-saved space, space on tapes, visible from www; compute and login nodes.]
• Computers: ada (compute), adapp (front-end and compute, used for post-processing), turing (front-end and compute), gaya (robotic tapes).
• $HOME: sources, small results, small precious files (saved space).
• $WORKDIR: temporary files: IGCM_OUT: REBUILD, files to be packed, outputs of post-processing jobs.
• $TMPDIR: temporary directory.
• gaya: IGCM_OUT: Output, Analyse.
• IGCM_OUT: MONITORING/ATLAS.
• dods: visible from www.
• Copy commands: mfput/mfget, dmput/dmget, dods_cp.
[Figure: libIGCM workflow at IDRIS.]
• Computing jobs (Job_EXP00), advancing PeriodLength by PeriodLength, run on ada and write to $WORKDIR/IGCM_OUT/XXX/Output, $WORKDIR/IGCM_OUT/.../REBUILD and $WORKDIR/IGCM_OUT/XXX/Restart and Debug.
• rebuild (every RebuildFrequency, on adapp): REBUILD => $WORKDIR/IGCM_OUT/XXX/Output.
• pack_output (ncrcat, every PackFrequency): => gaya:IGCM_OUT/XXX/Output.
• pack_restart and pack_debug (tar, every PackFrequency, on adapp): => gaya:IGCM_OUT/.../RESTART and DEBUG.
• create_ts (every TimeSeriesFrequency), then monitoring on adapp; create_se (every SeasonalFrequency), then atlas.
• TS and SE: gaya:IGCM_OUT/… and dods.idris.fr.
• DodsCopy=TRUE/FALSE switches the copy to dods on or off.
Time Series: create_ts.job
• A Time Series is a file which contains a single variable over the whole simulation period (ChunckJob2D = NONE) or over a shorter period for 2D (ChunckJob2D = 100Y) or 3D (ChunckJob3D = 50Y) variables.
• The write frequency is defined in config.card: TimeSeriesFrequency=10Y means that the time series are written every 10 years, for 10-year periods.
• The Time Series are set in the COMP/*.card files by the TimeSeriesVars2D and TimeSeriesVars3D options.
• Time Series built from monthly (or daily) output files are stored on the file server in the IGCM_OUT/TagName/[SpaceName]/[ExperimentName]/JobName/[Component]/Analyse/TS_MO and TS_DA directories.
• Bonus: TS_MO_YE (annual mean time series) are produced for all TS_MO variables.
• You can add or remove variables in the TimeSeries lists according to your needs.
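Taking the TimeSeriesFrequency bullet literally (time series written every 10 years, for 10-year periods), the periods already covered after a given number of simulated years can be listed as below. This is a sketch under that reading only: `ts_periods` is a hypothetical helper, and it ignores ChunckJob2D/ChunckJob3D, which set the total span of each time series file, as well as any special handling at the end of a run.

```python
from typing import List, Tuple


def ts_periods(start_year: int, years_simulated: int, freq_years: int) -> List[Tuple[int, int]]:
    """Complete (first_year, last_year) periods for TimeSeriesFrequency=<freq_years>Y."""
    if freq_years <= 0:
        raise ValueError("freq_years must be positive")
    n_complete = years_simulated // freq_years  # an unfinished period is not written yet
    return [
        (start_year + k * freq_years, start_year + (k + 1) * freq_years - 1)
        for k in range(n_complete)
    ]


print(ts_periods(1850, 35, 10))  # [(1850, 1859), (1860, 1869), (1870, 1879)]
```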
COMP/lmdz.card:
[OutputFiles]
List= (histmth.nc, ${R_OUT_ATM_O_M}/${PREFIX}_1M_histmth.nc, Post_1M_histmth),\
...
[Post_1M_histmth]
Patches= ()
GatherWithInternal = (lon, lat, presnivs, time_counter, time_counter_bnds, aire)
TimeSeriesVars2D = (bils, cldh, ......
ChunckJob2D = NONE
TimeSeriesVars3D = (upwd, lwcon, ......
ChunckJob3D = OFF
config.card:
[Post]
...
#D- If you want to produce time series, this flag determines
#D- frequency of post-processing submission (NONE if you don't want)
TimeSeriesFrequency=10Y
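The card excerpts above are INI-like: [Section] headers, Key= value lines, # comments, parenthesised lists, and a trailing backslash to continue a line. A minimal Python reader under exactly those assumptions can pull out, for instance, the TimeSeriesVars2D list. `parse_card` and `card_tuples` are hypothetical helpers, not libIGCM's own card reader, and the card text in the example is a shortened, illustrative excerpt.

```python
import re
from typing import Dict, List


def parse_card(text: str) -> Dict[str, Dict[str, str]]:
    """Read a .card text into {section: {key: raw value}}."""
    # 1. Join physical lines ending with a backslash into one logical line.
    logical, buffer = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            buffer += line[:-1]
            continue
        logical.append(buffer + line)
        buffer = ""
    if buffer:
        logical.append(buffer)

    # 2. Collect Key= value pairs per [Section], skipping blanks and comments.
    sections: Dict[str, Dict[str, str]] = {}
    current = None
    for line in logical:
        if not line or line.startswith("#"):
            continue
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            current = sections.setdefault(header.group(1), {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections


def card_tuples(value: str) -> List[List[str]]:
    """Split '(a, b),(c, d)' into [['a', 'b'], ['c', 'd']]."""
    return [
        [item.strip() for item in group.split(",") if item.strip()]
        for group in re.findall(r"\(([^()]*)\)", value)
    ]


lmdz_card = """
[Post_1M_histmth]
Patches= ()
TimeSeriesVars2D = (bils, cldh)
ChunckJob2D = NONE
"""
post = parse_card(lmdz_card)["Post_1M_histmth"]
print(card_tuples(post["TimeSeriesVars2D"])[0])  # ['bils', 'cldh']
```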
MONITORING : dods
Intermonitoring : http://webservices.ipsl.jussieu.fr/monitoring/
How to add a new variable in MONITORING
• You can add or change the variables to be monitored by editing the monitoring configuration files. These files are defined by default for each component.
• The default monitoring files are in the atlas directory of the common account (~common_account/atlas). For example, for LMDZ on curie: ~p86ipsl/monitoring01_lmdz_LMD9695.cfg; for LMDZ on adapp: ~rpsl035/monitoring01_lmdz_LMD9695.cfg
• You can change the monitoring by creating a POST directory in your configuration: copy a .cfg file into it and modify it as needed.
• The operations are written in the Ferret language.
• You can monitor variables produced as time series and stored in TS_MO.
POST/monitoring01_lmdz_LMD9695.cfg:
#--------------------------------------------------------------------------------------------------------
# field | files patterns | files additionnal | operations | title | units | calcul of area
#--------------------------------------------------------------------------------------------------------
 nettop_global | "tops topl" | LMDZ4.0_9695_grid.nc | "(tops[d=1]-topl[d=2])" | "TOA. total heat flux (GLOBAL)" | "W/m^2" | "aire[d=3]"
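Each monitoring entry is one '|'-separated row. The sketch below splits the nettop_global row into its named columns and mimics the operation on toy numbers. In Ferret, `[d=n]` selects dataset n; the numbering used here (file patterns first, then the additional file) is inferred from this row rather than taken from the monitoring documentation, and we assume the "calcul of area" field (aire) is the weight of the global average. All helper names are hypothetical.

```python
from typing import Dict, Sequence

COLUMNS = ("field", "files patterns", "files additionnal", "operations",
           "title", "units", "calcul of area")


def parse_monitoring_row(row: str) -> Dict[str, str]:
    """Split one '|'-separated row of a monitoring .cfg file into named cells."""
    cells = [cell.strip().strip('"') for cell in row.split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


def ferret_datasets(entry: Dict[str, str]) -> Dict[int, str]:
    """Dataset numbers as referenced by [d=n] (inferred ordering)."""
    files = entry["files patterns"].split() + entry["files additionnal"].split()
    return {i + 1: name for i, name in enumerate(files)}


def area_weighted_mean(values: Sequence[float], area: Sequence[float]) -> float:
    """Mean of `values` weighted by the cell areas `area`."""
    return sum(v * a for v, a in zip(values, area)) / sum(area)


row = (' nettop_global | "tops topl" | LMDZ4.0_9695_grid.nc | "(tops[d=1]-topl[d=2])"'
       ' | "TOA. total heat flux (GLOBAL)" | "W/m^2" | "aire[d=3]"')
entry = parse_monitoring_row(row)
print(ferret_datasets(entry))  # {1: 'tops', 2: 'topl', 3: 'LMDZ4.0_9695_grid.nc'}

# Toy numbers: tops - topl on two grid cells, weighted by aire.
tops, topl, aire = [300.0, 200.0], [250.0, 220.0], [1.0, 3.0]
print(area_weighted_mean([s - l for s, l in zip(tops, topl)], aire))  # -2.5
```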
Seasonal mean: create_se.job
• Seasonal mean files (SE) contain averages for each month of the year (Jan, Feb, ...) over a period defined in config.card:
– SeasonalFrequency=10Y: the seasonal means are computed every 10 years.
– SeasonalFrequencyOffset=0: the number of years to skip before computing seasonal means.
• All files with a requested Post (Seasonal=ON in COMP/*.card) are averaged with ncra before being stored in the directory IGCM_OUT/IPSLCM5A/DEVT/pdControl/MyExp/ATM/Analyse/SE. There is one file per SeasonalFrequency period (here 10 years).
• ATLAS are launched by create_se. ATLAS sources are in ~rpsl035 (IDRIS) and ~p86ipsl/atlas (TGCC).
COMP/lmdz.card:
[OutputFiles]
List=(histmth.nc, ${R_OUT_ATM_O_M}/${PREFIX}_1M_histmth.nc, Post_1M_histmth),\
...
[Post_1M_histmth]
...
Seasonal=ON
config.card:
#========================================================================
#D-- Post -
[Post]
...
#D- If you want to produce seasonal average, this flag determines
#D- the period of this average (NONE if you don't want)
SeasonalFrequency=10Y
#D- Offset for seasonal average first start dates ; same unit as SeasonalFrequency
#D- Usefull if you do not want to consider the first X simulation's years
SeasonalFrequencyOffset=0
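create_se averages, for each calendar month, the monthly outputs of each SeasonalFrequency period (with ncra). The arithmetic can be sketched in plain Python, assuming a list of (year, month, value) records for one variable at one grid point; a real SE file holds whole fields, and `seasonal_means` is a hypothetical helper, not the create_se implementation. The offset skips the first years, and only complete periods are kept.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def seasonal_means(records: Iterable[Tuple[int, int, float]], start_year: int,
                   freq_years: int, offset_years: int = 0
                   ) -> Dict[Tuple[int, int], List[float]]:
    """Mean of each calendar month over consecutive windows of `freq_years` years."""
    first = start_year + offset_years  # SeasonalFrequencyOffset skips the first years
    sums: Dict[Tuple[int, int], List[float]] = defaultdict(lambda: [0.0] * 12)
    counts: Dict[Tuple[int, int], List[int]] = defaultdict(lambda: [0] * 12)
    for year, month, value in records:
        if year < first:
            continue
        k = (year - first) // freq_years
        window = (first + k * freq_years, first + (k + 1) * freq_years - 1)
        sums[window][month - 1] += value
        counts[window][month - 1] += 1
    # Keep only complete windows: every month seen once per year of the window.
    return {
        window: [s / freq_years for s in sums[window]]
        for window in sorted(sums)
        if all(c == freq_years for c in counts[window])
    }


# Toy monthly series: value = year * 100 + month.
data = [(y, m, float(y * 100 + m)) for y in range(2000, 2004) for m in range(1, 13)]
print(seasonal_means(data, 2000, 2)[(2000, 2001)][0])  # 200051.0
```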
Monitoring the simulation
Verification and Correction
Monitoring a simulation
• We strongly encourage you to check your simulation frequently while it runs. First of all, check the job status: ccc_mstat (TGCC) or llq (IDRIS).
• Real time limit exceeded: on ada, jobs are killed without any message.
• RunChecker.job: this tool, provided with libIGCM, shows the status of your simulations.
• One historical simulation of 156 years (1850-2005) is made up of 50 computing jobs and 1000 post-processing jobs.
Documentation: http://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation
Monitoring a simulation: mail
• You receive a message at the end of the simulation.
• The simulation may have completed or failed.
Begin forwarded message:
From: [email protected]
Subject: MyJobTest failed
Date: 22 October 2013 17:17:41 UTC+02:00
To: [email protected]
Dear rpsl003,
From: [email protected]
Subject: COURSNIV2 completed
Date: 22 October 2013 18:29:24 UTC+02:00
To: [email protected]
Dear rpsl003,
Simulation COURSNIV2 completed on supercomputer ada027
Simulation started : 20000101
Simulation ended : 20000102
Output files are available in /u/rech/psl/rpsl003/IGCM_OUT/IPSLCM5A/DEVT/pdControl/COURSNIV2
Files to be rebuild are temporarily available in /workgpfs/rech/psl/rpsl003/IGCM_OUT/IPSLCM5A/DEVT/pdControl/COURSNIV2/REBUILD
Pre-packed files are temporarily available in /workgpfs/rech/psl/rpsl003/IGCM_OUT/IPSLCM5A/DEVT/pdControl/COURSNIV2
Script files, Script Outputs and Debug files (if necessary) are available in /gpfs5r/workgpfs/rech/psl/rpsl003/ADA/COURS/NIV2/IPSLCM5_v5/modipsl/config/IPSLCM5_v5/COURSNIV2
Greetings!
Check this out for more information : https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/documentation
Monitoring a simulation: run.card
• When the simulation starts, libIGCM creates the file run.card from the template run.card.init.
• run.card contains information on the current run period and on the previous periods already finished.
• libIGCM updates this file at each run period.
• It also records the time consumption of each period.
• The status of the job is set to OnQueue, Running, Completed or Fatal.
run.card:
[Configuration]
#last PREFIX
OldPrefix= COURSNIV2_20000103
#Compute date of loop
PeriodDateBegin= 2000-01-04
PeriodDateEnd= 2000-01-04
CumulPeriod= 4
# State of Job "Start", "Running", "OnQueue", "Completed"
PeriodState= Completed
SubmitPath= /gpfs5r/workgpfs/rech/psl/rpsl003/ADA/COURS/NIV2/IPSLCM5_v5/modipsl/config/IPSLCM5_v5/COURSNIV2
#========================================================================
[PostProcessing]
TimeSeriesRunning=n
TimeSeriesCompleted=
#========================================================================
[Log]
# Executables Size
LastExeSize= ( 88011086, 0, 0, 19956686, 0, 0, 1523952 )
#-----------------------------------------------------------------------------------------------------------------------------------
# CumulPeriod | PeriodDateBegin | PeriodDateEnd | RunDateBegin | RunDateEnd | RealCpuTime | UserCpuTime |
#-----------------------------------------------------------------------------------------------------------------------------------
# 1 | 20000101 | 20000101 | 2013-10-22T17:53:48 | 2013-10-22T17:55:10 | 82.01000 | 4.21000 |
# 2 | 20000102 | 20000102 | 2013-10-22T18:28:03 | 2013-10-22T18:29:17 | 74.19000 | 4.09000 |
# 3 | 20000103 | 20000103 | 2013-10-23T17:28:50 | 2013-10-23T17:30:26 | 95.21000 | 4.30000 |
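Because run.card is plain text, the PeriodState and the [Log] table of the run.card excerpt can be read with a few lines of Python, for instance to sum the time spent per period. A minimal sketch, assuming the `# n | ... |` row layout of that excerpt; its header stops at UserCpuTime, so only those columns are named and any extra cells are ignored. The helper names are hypothetical. In this excerpt RealCpuTime is close to the wall-clock gap between RunDateBegin and RunDateEnd.

```python
import re
from datetime import datetime
from typing import Dict, List

LOG_COLUMNS = ("CumulPeriod", "PeriodDateBegin", "PeriodDateEnd",
               "RunDateBegin", "RunDateEnd", "RealCpuTime", "UserCpuTime")


def period_state(run_card: str) -> str:
    """Value of PeriodState= in run.card, e.g. 'Completed' or 'Fatal'."""
    match = re.search(r"^PeriodState=\s*(\S+)", run_card, re.MULTILINE)
    return match.group(1) if match else "unknown"


def log_rows(run_card: str) -> List[Dict[str, str]]:
    """Rows of the '# n | ... |' table of the [Log] section."""
    rows = []
    for line in run_card.splitlines():
        cells = [c.strip() for c in line.lstrip("#").split("|")]
        if len(cells) > 1 and cells[0].isdigit():  # skips the header row
            rows.append(dict(zip(LOG_COLUMNS, cells)))
    return rows


def elapsed_seconds(row: Dict[str, str]) -> float:
    """Wall-clock time between RunDateBegin and RunDateEnd."""
    begin = datetime.fromisoformat(row["RunDateBegin"])
    end = datetime.fromisoformat(row["RunDateEnd"])
    return (end - begin).total_seconds()


excerpt = """[Configuration]
PeriodState= Completed
[Log]
# CumulPeriod | PeriodDateBegin | PeriodDateEnd | RunDateBegin | RunDateEnd | RealCpuTime | UserCpuTime |
# 1 | 20000101 | 20000101 | 2013-10-22T17:53:48 | 2013-10-22T17:55:10 | 82.01000 | 4.21000 |
# 2 | 20000102 | 20000102 | 2013-10-22T18:28:03 | 2013-10-22T18:29:17 | 74.19000 | 4.09000 |
# 3 | 20000103 | 20000103 | 2013-10-23T17:28:50 | 2013-10-23T17:30:26 | 95.21000 | 4.30000 |
"""
rows = log_rows(excerpt)
print(period_state(excerpt))                          # Completed
print(sum(float(r["RealCpuTime"]) for r in rows))     # total RealCpuTime
print([elapsed_seconds(r) for r in rows])             # [82.0, 74.0, 96.0]
```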
Verification and correction 1/6
• Where did the problem occur?
• 1 "failed" email: main computation job
=> Was gaya stopped at IDRIS, or was there a hardware problem? Check Script_Output_xxxx.
=> Once gaya is back, or if there is no clear error message, try relaunching the job (after a clean_month):
path/to/libIGCM/clean_month.job
ccc_msub (llsubmit) Job_...
Verification and correction 2/6
• Where did the problem occur?
• 1 "failed" email: main computation job: analyse Script_Output_xxxx
#######################################
ANOTHER GREAT SIMULATION
#######################################
Part 1 (copying the input files)
#######################################
DIR BEFORE RUN EXECUTION
#######################################
Part 2 (running the model)
#######################################
DIR AFTER RUN EXECUTION
#######################################
Part 3 (post-processing)
#######################################
http://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation/G_suivi#AnalyzingtheJoboutput:Script_Output
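Since the banners "DIR BEFORE RUN EXECUTION" and "DIR AFTER RUN EXECUTION" separate the three parts of Script_Output_xxxx, the part in which a job stopped can be found automatically. A minimal sketch, assuming both banner texts appear once each and in this order; the helper names and the toy log are hypothetical.

```python
from typing import Dict, List

PARTS = ("copying the input files", "running the model", "post-processing")
MARKERS = ("DIR BEFORE RUN EXECUTION", "DIR AFTER RUN EXECUTION")


def split_script_output(text: str) -> Dict[str, List[str]]:
    """Group the lines of a Script_Output file into its three parts."""
    parts: Dict[str, List[str]] = {name: [] for name in PARTS}
    index = 0
    for line in text.splitlines():
        if index < len(MARKERS) and MARKERS[index] in line:
            index += 1  # this banner opens the next part
            continue
        parts[PARTS[index]].append(line)
    return parts


def last_part_reached(text: str) -> str:
    """Name of the last part that started before the file ends."""
    index = 0
    for line in text.splitlines():
        if index < len(MARKERS) and MARKERS[index] in line:
            index += 1
    return PARTS[index]


log = "ANOTHER GREAT SIMULATION\ncp input files\nDIR BEFORE RUN EXECUTION\nmpirun ...\n"
print(last_part_reached(log))  # running the model
```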
Verification and correction 3/6
--> Analyse Script_Output_xxxx: in general, if your simulation stops, look for the keyword "IGCM_debug_CallStack" in this file. This keyword comes just after a line explaining the error you are experiencing.
=====================================================================
EXECUTION of : mpirun -f ./run_file > out_run_file 2>&1
Return code of executable : 1
IGCM_debug_Exit : EXECUTABLE
!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! IGCM_debug_CallStack !!!
!------------------------!
!------------------------!
IGCM_sys_Cp : out_run_file xxxxxxxxxxxx_out_run_file_error
=====================================================================
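Looking for IGCM_debug_CallStack and reading the lines just above it can be scripted; on the command line, `grep -B 5 IGCM_debug_CallStack Script_Output_xxxx` gives a similar view. The Python sketch below also skips banner lines made only of `!`, `=`, `-` or `#`; `error_context` is a hypothetical helper and the input is the excerpt above.

```python
from typing import List


def is_decoration(line: str) -> bool:
    """True for blank lines and banner lines made only of ! = - # characters."""
    stripped = line.strip()
    return not stripped or set(stripped) <= set("!=-#")


def error_context(text: str, keyword: str = "IGCM_debug_CallStack",
                  context: int = 2) -> List[List[str]]:
    """For each line containing `keyword`, return up to `context` meaningful lines before it."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if keyword not in line:
            continue
        found = []
        j = i - 1
        while j >= 0 and len(found) < context:
            if not is_decoration(lines[j]):
                found.append(lines[j].strip())
            j -= 1
        hits.append(found[::-1])  # restore file order
    return hits


script_output = """EXECUTION of : mpirun -f ./run_file > out_run_file 2>&1
Return code of executable : 1
IGCM_debug_Exit : EXECUTABLE
!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! IGCM_debug_CallStack !!!
!------------------------!
"""
for block in error_context(script_output):
    print("\n".join(block))
```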
Verification and correction 4/6
--> Check the Debug subdirectory closely (if it exists)
• Check the file xxxxx_error in Debug/:
– it contains the LMDZ standard output; LMDZ often fails in hgardfou ("Stopping in hgardfou");
– it contains the abends (abnormal terminations / exceptions) of each and every component.
• Check the standard outputs of NEMO, ORCHIDEE, INCA and OASIS:
– Debug/xxxx_ocean.output
– Debug/xxxx_output_orchidee
– Debug/xxxx_inca.out
– Debug/xxxx_cplout
RunChecker.job
• RunChecker.job helps you monitor all the jobs produced by libIGCM for a simulation.
RunChecker.job: usage and options
This script can be launched from anywhere.
Usage:
path/to/libIGCM/RunChecker.job [-u user] [-q] [-j n] [-s] job_name
path/to/libIGCM/RunChecker.job [-u user] [-q] [-j n] -p config.card_path
path/to/libIGCM/RunChecker.job [-u user] [-q] [-j n] -r
Options:
-h : print this help and exit
-u user : owner of the job
-q : quiet
-j n : print n post-processing jobs (default is 20)
-s : search for a new job in $WORKDIR and fill in the catalog before printing information
-p path : give the absolute path to the directory containing the config.card instead of the job name (needed only once)
-r : check all running simulations
Examples:
1) path/to/libIGCM/RunChecker.job -p $CCCWORKDIR/CURIE/CMIP5/R1414/IPSLCM5A_20120731/modipsl/config/IPSLCM5A/v5.rcp45CMR2
2) path/to/libIGCM/RunChecker.job v5.rcp45CMR2
[Screenshot: RunChecker output flagging a problem: STOP (Fatal in run.card)]
Verification and correction 5/6
• You have received 2 "failed" emails, or the RunChecker status is abnormal, i.e. red.
• Analyse the situation:
– Simple case:
• Re-submit rebuild, pack_debug or pack_restart jobs
• Re-submit pack_output
– Less simple case:
• Use clean_year to go back to a healthy situation
• Holes in the data
path/to/libIGCM/clean_year.job [SSAA] (SSAA: a year, YYYY)
• All data from the current year back to SSAA (included) will be deleted.
• Restart the simulation
https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation/G_suivi#Startorrestartpostprocessingjobs1
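clean_year.job deletes everything from the current year back to SSAA, so when there are holes in the data a natural candidate for SSAA is the first year with no output. A sketch, assuming output file names embed the period as `_YYYYMMDD_` or `_YYYYMMDD_YYYYMMDD_` (the pattern and the `v5_...` names are hypothetical; adapt the regular expression to your own files, and beware of job names that themselves contain 8-digit numbers).

```python
import re
from typing import Iterable, Optional, Set

# Hypothetical naming pattern: _YYYYMMDD_ or _YYYYMMDD_YYYYMMDD_ inside the file name.
DATES = re.compile(r"_(\d{4})\d{4}(?:_(\d{4})\d{4})?_")


def years_covered(filenames: Iterable[str]) -> Set[int]:
    """Years spanned by the dates found in the file names."""
    years: Set[int] = set()
    for name in filenames:
        match = DATES.search(name)
        if match:
            first = int(match.group(1))
            last = int(match.group(2)) if match.group(2) else first
            years.update(range(first, last + 1))
    return years


def first_hole(filenames: Iterable[str], start_year: int, end_year: int) -> Optional[int]:
    """First year in [start_year, end_year] with no matching output file, or None."""
    covered = years_covered(filenames)
    for year in range(start_year, end_year + 1):
        if year not in covered:
            return year
    return None


files = ["v5_18500101_18591231_1M_histmth.nc", "v5_18700101_18791231_1M_histmth.nc"]
print(first_hole(files, 1850, 1879))  # 1860
```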
TimeSeries_Checker.job
• Install a dedicated directory
• Copy the required files and directories: config.card, run.card, COMP, POST
• Copy the script TimeSeries_Checker.job from libIGCM
• Modify the job: libIGCM, name of the simulation, ...
• Look at the documentation: https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation/G_suivi#TimeSeries_checker.job-Recommendedmethod
> mkdir POST_REDO
> cd POST_REDO
> cp -pr ../COMP ../POST ../config.card ../run.card .
> cp ../../../../libIGCM/TimeSeries_Checker.job .
> vi TimeSeries_Checker.job
# Check/Modify: libIGCM= SpaceName= ExperimentName= JobName= CARD_DIR= BRIDGE_MSUB_PROJECT=gen2211
> ./TimeSeries_Checker.job
Answer y to submit create_ts.job
To keep a log of the run (ksh):
> ./TimeSeries_Checker.job 2>&1 | tee TSC_OUT_TO_KEEP
Verification and correction 6/6
• Everything went OK:
– end of simulation email
– no anomaly detected by RunChecker
• TimeSeries_Checker (and SE_checker): checks the existing time series and submits create_ts jobs to build the missing ones.
• Keep in mind:
– rebuild jobs automatically submit the pack jobs, as well as the corresponding TS and SE jobs;
– pack, TS and SE jobs may be re-submitted independently of a rebuild job.
The END! (so soon?)
[email protected]: mailing list to ask for help and to share information with other users
[email protected]
How to redo a rebuild?
• Install a dedicated directory
• Copy the required files and directories: config.card, run.card, COMP, POST
• Copy the job to be relaunched from libIGCM: rebuild...
• Modify the job: dates, name of the simulation, ...
• Look at the documentation: https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation/G_suivi#Startorrestartpostprocessingjobs1
> mkdir POST_REDO
> cd POST_REDO
> cp -pr ../COMP ../POST ../config.card ../run.card .
> cp ../../../../libIGCM/rebuild_fromWorkdir.job .
> vi rebuild_fromWorkdir.job
# Check/Modify: libIGCM= PeriodDateBegin= NbRebuildDir= REBUILD_DIR=
> llsubmit rebuild_fromWorkdir.job (IDRIS)
> ccc_msub rebuild_fromWorkdir.job (TGCC)
New in libIGCM v2: accounting mail
• You receive a message after the first job with information about the time the simulation will use.
• Take it into account before launching a simulation, especially a large one.
From: [email protected]
Subject: MyJobTest Accounting
Date: 22 October 2013 17:23:52 UTC+02:00
To: [email protected]
Dear rpsl003,
this mail will be sent once for the simulation MyJobTest you recently submitted
The whole simulation will consume around 3.939110 hours. To be compared with your project allocation.
The recommended PeriodNb for a 24 hours job seems to be around 974.839488. To be compare with the current setting (Job_MyJobTest parameter) : PeriodNb=3
Greetings!
Check this out for more information : https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/documentation
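The two figures in the accounting mail are consistent with a simple proportionality: if t is the measured time of one period, the recommended PeriodNb for a 24-hour job is 24 h / t, and a simulation of N periods takes N × t. Here 974.839488 implies t ≈ 88.6 s, and 3.939110 hours then corresponds to N ≈ 160 periods. This is our reading of the numbers, not libIGCM's documented formula, and the helpers below are hypothetical.

```python
def recommended_period_nb(seconds_per_period: float, job_hours: float = 24.0) -> float:
    """Number of periods that fit in a job of `job_hours` hours."""
    return job_hours * 3600.0 / seconds_per_period


def simulation_hours(seconds_per_period: float, n_periods: int) -> float:
    """Total time for `n_periods` periods, in hours."""
    return seconds_per_period * n_periods / 3600.0


# Per-period time implied by the mail (an inference, about 88.6 s).
t_period = 24 * 3600.0 / 974.839488
print(round(t_period, 2))                          # 88.63
print(round(simulation_hours(t_period, 160), 6))   # 3.93911
```

PeriodNb must be an integer in the job, so in practice you would round the recommended value down.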
More information
• Audit report on IPSL workflow used at TGCC and IDRIS
• IPSL wiki and documentation : https://forge.ipsl.jussieu.fr/igcmg/wiki/platform/en/documentation