
Introduction The Pipeline PyWPS Conclusion

Orchestrating a Climate Modeling Data Pipeline using Python and RAM Disk

A Brief Overview of the WRF Tools Package

Andre R. Erler

PyCon Canada

November 7th, 2015

Andre R. Erler ([email protected]) Orchestrating a Climate Modeling Pipeline


Outline

- Introduction: Climate Modeling
  - Regional Models
- The Pre-processing Pipeline
  - The WRF Tools Package
- Using Python to Drive the Pipeline
  - The Tool Chain
  - The Class Structure
- Concluding Remarks


Global Climate Models

Figure: IPCC AR4 (2007) projections for global surface temperature under different scenarios.

Global Climate Models are the main tool to predict climate change.

Climate models compute energy, mass, and momentum fluxes on a relatively coarse computational grid.

Figure: Schematic of a Global Climate Model (GCM). Source: Center for Multiscale Modeling of Atmospheric Processes, CSU


Regional Climate Models

GCM resolution is coarse and many regional details are not resolved (e.g. the Rocky Mountains and the Great Lakes).

Giorgi (2006)

Regional Impacts

Regional impacts of Climate Change are modeled with high-resolution regional climate models (RCM).

Regional models simulate a small area at much higher resolution (×10).


Global Model: CESM

The Community Earth System Model is used as the driving model.

Figure: Average Surface Temperature and Outline of the WRF Domain

Figure: Topography of Western Canada. Left: CESM at ∼80 km. Right: WRF at 10 km.

Regional Model: WRF

The Weather Research and Forecasting model is our regional model.


WRF is actually pronounced “Worf”, like Lt. Worf in Star Trek: The Next Generation (left)

The WRF model is a limited-area numerical weather prediction model developed by the National Center for Atmospheric Research


Running the Regional Climate Model (WRF)

- The coupling process between GCM and RCM is “off-line” (asynchronous)
- A pre-processing system converts GCM output into RCM (wrf-)input files
- An RCM simulation is split into ∼200 separate jobs
- The RCM runs continuously, each job submitting the next

The WRF Tools Package

Python
- Run pre-processing tool chain (WPS)
- Initialize WRF jobs
- Run post-processing

Shell Script
- Submit pre-processing
- Run the WRF job, submit next job
- Archiving to tape

WRF Tools enables continuous and autonomous operation
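
The job chain described above, where a long simulation is split into segments and each job submits the next, can be sketched in a few lines of Python. This is only an illustration and not the WRF Tools implementation: `add_months`, `split_into_jobs`, `run_chain`, and the `submit` callback are hypothetical names, and the one-month segment length is an assumption. In a real setup `submit` would call the cluster's batch scheduler.

```python
from datetime import date


def add_months(d, months):
    """Return the first day of the month that lies `months` after d."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def split_into_jobs(start, end, months_per_job=1):
    """Split [start, end) into consecutive (job_start, job_end) segments."""
    jobs = []
    current = start
    while current < end:
        nxt = min(add_months(current, months_per_job), end)
        jobs.append((current, nxt))
        current = nxt
    return jobs


def run_chain(jobs, run_job, submit):
    """Run the first job, then hand the remaining jobs to `submit`.

    `submit` stands in for a batch-scheduler call (an assumption); it is
    expected to eventually call run_chain again with the remaining jobs.
    """
    if not jobs:
        return
    run_job(jobs[0])
    if len(jobs) > 1:
        submit(jobs[1:])
```

With these assumed one-month segments, a 15-year period yields 180 jobs, which is the same order of magnitude as the ∼200 jobs mentioned on the slide.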


The Data Pipeline

[Figure: flow diagram of the data pipeline. Labels: Global Model (CESM), asynchronous/archived output; geogrid, static data, offline; a repeated sequence of WPS -> wrfinput -> WRF -> wrfout, with each stage launching the next; Long-term Archive, Post-processing, Analysis.]


WPS: A Collection of Fortran Legacy Tools

WPS Components

1. geogrid.exe: static / geographic data
2. ungrib.exe / unccsm.exe: convert driving data to WRF intermediate (IM) format
3. metgrid.exe: interpolate to WRF grid
4. real.exe: generate boundary condition files

Fortran legacy tools read from and write to temporary files:
- Strongly I/O limited in an HPC cluster environment

The Solution (on Linux)

Run on RAM-disk!
- speedup ∼ ×10
- requires 64 GB RAM

Using Python driver script
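
The RAM-disk idea can be illustrated with a short Python sketch. It assumes that `/dev/shm` is a memory-backed file system, as it is on many Linux systems, and falls back to the default temporary directory otherwise. The names `ramdisk_workdir` and `run_on_ramdisk` are hypothetical and do not come from WRF Tools; the command passed in stands in for one of the Fortran executables.

```python
import contextlib
import os
import shutil
import subprocess
import sys
import tempfile


@contextlib.contextmanager
def ramdisk_workdir(ram_root="/dev/shm"):
    """Yield a scratch directory on the RAM disk and remove it on exit.

    Falls back to the default temp location if ram_root is not usable.
    """
    usable = os.path.isdir(ram_root) and os.access(ram_root, os.W_OK)
    workdir = tempfile.mkdtemp(prefix="wps_", dir=ram_root if usable else None)
    try:
        yield workdir
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


def run_on_ramdisk(command, outputs, target_dir):
    """Run `command` inside a RAM-disk directory, then copy `outputs` out."""
    with ramdisk_workdir() as workdir:
        # The tool reads and writes its temporary files in memory ...
        subprocess.check_call(command, cwd=workdir)
        # ... and only the final results are written to permanent storage.
        for name in outputs:
            shutil.copy(os.path.join(workdir, name), target_dir)
```

The design point is that all intermediate files stay in memory and only the final products touch the shared file system.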


PyWPS: A Driver Module for WPS

- Collect required input data from GCM archive
- Run applicable pre-processing tools on RAM disk
- Assemble WRF input files

Why Python?
- Easier with complex logic
- Classes for different datasets/GCMs

PyWPS Imports
- multiprocessing for parallelization
- re to find input files
- fileinput, sys to edit configuration files
- subprocess to launch Fortran tools
- shutil, os to handle temporary files
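
Two of the standard-library uses listed above can be shown in a small sketch: `re` to pick out dated input files and `fileinput` with `sys` to edit a configuration file in place. The file-name pattern and the function names are hypothetical; the slides do not show the patterns PyWPS actually uses.

```python
import fileinput
import re
import sys

# Hypothetical file-name pattern, e.g. "output.1979-01-01.nc".
DATE_PATTERN = re.compile(r"^output\.(\d{4})-(\d{2})-(\d{2})\.nc$")


def find_input_files(filenames):
    """Return (filename, (year, month, day)) for names matching the pattern."""
    matches = []
    for name in sorted(filenames):
        m = DATE_PATTERN.match(name)
        if m:
            matches.append((name, tuple(int(g) for g in m.groups())))
    return matches


def set_namelist_value(path, key, value):
    """Rewrite 'key = ...' lines of a namelist-style file in place."""
    key_re = re.compile(r"^(\s*)" + re.escape(key) + r"\s*=")
    # With inplace=True, fileinput redirects sys.stdout into the file.
    for line in fileinput.input(path, inplace=True):
        m = key_re.match(line)
        if m:
            line = "{}{} = {},\n".format(m.group(1), key, value)
        sys.stdout.write(line)
```

Because `fileinput` redirects standard output into the file being edited, every line has to be written back, changed or not.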


PyWPS: The Program Flow & Parallelization

[Figure: program flow. "Select Files" reduces all CESM output files to the selected CESM output files for the current job only; the "WPS Tool Chain" turns these into WRF input files. Both stages are drawn across four cores (core 0 to core 3).]

The WPS Tool Chain:

CESM output -> ungrib.exe -> WRF intermediate file -> metgrid.exe -> metgrid file -> real.exe -> wrfinput
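
The program flow above can be sketched with `multiprocessing`, under the assumption that each selected time step can be processed independently of the others. The three stage functions are pure-Python stand-ins named after the tools on the slide; they do not call the Fortran executables, and `select_steps`, `process_step`, and `run_parallel` are hypothetical names.

```python
import multiprocessing


# Stand-ins for the three stages of the tool chain (no Fortran involved).
def ungrib(step):
    return "intermediate:" + step


def metgrid(intermediate):
    return intermediate.replace("intermediate:", "metgrid:")


def real(metgrid_file):
    return metgrid_file.replace("metgrid:", "wrfinput:")


def process_step(step):
    """Run the whole chain for one time step (one unit of parallel work)."""
    return real(metgrid(ungrib(step)))


def select_steps(all_steps, start, end):
    """Keep only the time steps that belong to the current job."""
    return [s for s in all_steps if start <= s < end]


def run_parallel(steps, nproc=4):
    """Distribute the time steps over nproc worker processes."""
    with multiprocessing.Pool(processes=nproc) as pool:
        return pool.map(process_step, steps)
```

Time steps are ISO date strings here, so plain string comparison orders them correctly. On platforms that start worker processes by spawning, `run_parallel` should be called from under an `if __name__ == "__main__":` guard.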


The Class Structure

Dataset/GCM specific parameters:
- Input file types/names
- Interpolation tables/grid
- Variables / frequency

Multiple Datasets
- Inheritance for common procedures
- Polymorphism for different procedures

class Dataset(object):
    prefix = ''  # file prefix
    vtable = 'Vtable'
    gribname = 'GRIBFILE'  # input
    ungrib_exe = 'ungrib.exe'
    ungrib_log = 'ungrib.exe.log'
    ...
    def __init__(self, ...):
        # type checking
        ...
    def setup(self, src, ...):
        ...
    def cleanup(self, tgt):
        ...
    def extractDate(self, fname):
        # match valid filenames
        ...
    def ungrib(self, date, mytag):
        # generate file for metgrid
        ...
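
A runnable sketch of the inheritance and polymorphism idea follows. It is not the PyWPS class hierarchy: the subclass names, the file prefixes, the date pattern, and the `converter` method are hypothetical, and only the executable names `ungrib.exe` and `unccsm.exe` are taken from the slides.

```python
import re


class Dataset(object):
    """Base class holding procedures shared by all driving datasets."""
    prefix = ''                               # file prefix, set by subclasses
    date_pattern = r'(\d{4})-(\d{2})-(\d{2})'  # assumed date format

    def extract_date(self, fname):
        """Return (year, month, day) for a valid file name, else None."""
        pattern = '^' + re.escape(self.prefix) + self.date_pattern
        match = re.match(pattern, fname)
        if match is None:
            return None
        return tuple(int(g) for g in match.groups())

    def converter(self):
        """Name of the tool that writes the intermediate files."""
        raise NotImplementedError


class GribDataset(Dataset):
    prefix = 'grib_'        # hypothetical prefix

    def converter(self):
        return 'ungrib.exe'


class CESM(Dataset):
    prefix = 'cesm_'        # hypothetical prefix

    def converter(self):
        return 'unccsm.exe'
```

The driver can then treat every dataset the same way: `extract_date` is inherited unchanged, while `converter` is overridden per dataset.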


Summary & Conclusion

Python
- Use Python for flow control (manage legacy tools)
- Parallelization relatively easy (within one node)
- Class structure is versatile and makes maintenance easier

RAM-disk
- Scientific Programming: dealing with legacy tools, often in Fortran, often relying on disk I/O
- Use RAM-disk to avoid unnecessary disk I/O


Thank You! ∼ Questions?


List of Publications using WRF Tools

- Erler, Andre R., W. Richard Peltier, Marc d’Orgeville (under review), Projected Changes in Hydro-Climatic Extremes for Western Canada, Journal of Climate.
- Marc d’Orgeville, W. Richard Peltier, Andre R. Erler (accepted), Uncertainty in Future Summer Precipitation on the Great Lakes Basin due to Drought in the South-Western US, Journal of Geophysical Research.
- Erler, Andre R., W. Richard Peltier, Marc d’Orgeville, 2015, Dynamically Downscaled High Resolution Hydro-Climate Projections for Western Canada, Journal of Climate.
- Marc d’Orgeville, W. Richard Peltier, Andre R. Erler, Jonathan Gula, 2014, Climate change impacts on Great Lakes Basin precipitation extremes, Journal of Geophysical Research.


Regional Climate Projections

Experimental Setup
- GCM & RCM run for 15 years (model time)
  - Historical (1979-1994)
  - Mid-21st-Century (2045-2060)
  - End-21st-Century (2085-2100)
- GCM & RCM use RCP 8.5 GHG concentration scenarios
- RCM runs with different physical parameterizations
- Both models run in an initial condition ensemble with 4 members each

Figure: IPCC AR4 climate projections based on different scenarios; the RCP 8.5 is very similar to the older A2 scenario


Summary of Results
- Significant increase in winter precipitation (extremes, ∼30%)
- Small increase in summer, but more increase in evaporation

Hydrological Impacts
- Climate change impacts in ARB/Alberta likely benign
- 50% reduction in peak snowmelt and spring runoff in FRB/BC...
- ... but increased flood risk due to precipitation extremes in fall

Figure: End-century soil moisture changes in late summer
- Late summer drying west of Continental Divide, but not east
