Designing and Developing Interactive Training for Reusability.
Advancing the Accessibility, Reusability, and ...
Transcript of Advancing the Accessibility, Reusability, and ...
Brigham Young University Brigham Young University
BYU ScholarsArchive BYU ScholarsArchive
Theses and Dissertations
2020-03-27
Advancing the Accessibility, Reusability, and Interoperability of Advancing the Accessibility, Reusability, and Interoperability of
Environmental Modeling Workflows Through Web Services Environmental Modeling Workflows Through Web Services
Xiaohui Qiao Brigham Young University
Follow this and additional works at: https://scholarsarchive.byu.edu/etd
Part of the Engineering Commons
BYU ScholarsArchive Citation BYU ScholarsArchive Citation Qiao, Xiaohui, "Advancing the Accessibility, Reusability, and Interoperability of Environmental Modeling Workflows Through Web Services" (2020). Theses and Dissertations. 8889. https://scholarsarchive.byu.edu/etd/8889
This Dissertation is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected].
Advancing the Accessibility, Reusability, and Interoperability
of Environmental Modeling Workflows
Through Web Services
Xiaohui Qiao
A dissertation submitted to the faculty of Brigham Young University
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Daniel P. Ames, Chair E. James Nelson Norman L. Jones
Gustavious P. Williams Rollin H. Hotchkiss
Department of Civil and Environmental Engineering
Brigham Young University
Copyright © 2020 Xiaohui Qiao
All Rights Reserved
ABSTRACT
Advancing the Accessibility, Reusability, and Interoperability of Environmental Modeling Workflows
Through Web Services
Xiaohui Qiao Department of Civil and Environmental Engineering, BYU
Doctor of Philosophy
Global flood forecasting can benefit developing countries and ungauged regions that lack observational data, computational infrastructure, and human capacity for streamflow modeling. Many technical challenges exist to provide flood predictions on a global scale. First, existing land surface forecasts use coarse resolution grid cells, which provide limited information when used for flood forecasting at local scales. There is, so far, no modeling system that can provide rapid and accurate global flood predictions with low cost. Second, accurate flood predictions often require integrating interdisciplinary models, data sources, and analysis routines into a workflow. Limited accessibility, reusability, and interoperability of models restrict integrated modeling from producing more reliable results. Web services have been demonstrated as an effective way for data and model sharing because of the capability of enabling communication among heterogeneous applications over the internet. However, publishing models or analysis routines as web services is still challenging and, hence, is not commonly done.
To address the above challenges, I present a computational system for global streamflow
prediction, using existing, well-established open source software tools, that quickly downscales the runoff generated from such coarse grid-based land surface models (LSMs) onto high-resolution vector-based stream networks then routes the results using a vector-based river routing model. A set of experiments are conducted to demonstrate the feasibility and credibility of this approach. I also present a tool to publish complex environmental models as web services by adopting the OpenGMS Wrapper System (OGMS-WS) and Docker. The streamflow prediction system is deployed as a web service using this tool, and the service is used to analyze the historical streamflow tendency in Bangladesh. Next, I present a ready-to-use tool called Tethys WPS Server, which provides a simplified and formalized way to expose web app functionality as standardized Open Geospatial Consortium (OGC) Web Processing Services (WPS) alongside a web app’s graphical user interface. Three Tethys web apps are developed to demonstrate how web app functionality(s) can be exposed as WPS using Tethys WPS Server, and to show how these WPS can be coupled to build a complex modeling web app. In sum, this dissertation explores new computational approaches and software tools to advance global streamflow prediction and integrated environmental modeling.
Keywords: global streamflow prediction, environmental modeling, model interoperability, web service, vector-based river routing, container
ACKNOWLEDGEMENTS
First and most, I am thankful to Daniel P. Ames who worked hard as my advisor for his
patience, guidance, and generous financial support throughout my PhD study. Dr. Ames is the
most supportive advisor I know. He has given me the most freedom to pursue my research
interests and always provides insightful discussions and helpful suggestions. I especially
appreciate his guidance and being a good role model for me in doing research and writing
papers, which gives me the confidence to continue pursuing my academic career. I am very
grateful to E. James Nelson for inviting me to participate in the global streamflow prediction
research in the last year of my PhD, as well as his guidance and funding support. I also would
like to thank Norman L. Jones, Gustavious P. Williams, and Rollin H. Hotchkiss for their help
and suggestions as members of my advisory committee.
Second, I am thankful to have had the opportunity to collaborate with great colleagues from
BYU Hydroinformatics lab. Especially who worked with me: Chris Edwards, Corey Krewson,
Giovanni Bustamante, Jake Nelson, Jeffrey Sadler, Jiri Kadlec, Jorge Sánchez, Matthew Bayles,
Michael Souffront, Nathan Swain, Sarva Pulla, Scott Christensen, Shawn Crawley, Spencer
McDonald, Stephen Duncan, Rohit Khattar, Wade Roberts, and Yue Shen.
Special thanks to my husband, Zhiyu Li, who has provided generous help and support on
both my academic and personal life. He is always the first person to discuss my research ideas
and the primary resource to get my technical questions answered. Thanks for his faith in me and
his unconditional love and companionship in this long journey. Many thanks to my parents for
helping me take care of my son and for their prompt encouragement during my bad days.
Finally, I would like to respectively acknowledge the people and funding support
contributed to each chapter of this dissertation.
The research presented in Chapter 2 and Chapter 3 was supported by NASA ROSES
SERVIR Applied Sciences Team Research Grant No. NNX16AN45G. In Chapter 2, E. James
Nelson coordinated the research. Daniel P. Ames assisted with the research and contributed to
the manuscript. Zhiyu Li provided computational and technical support. Gustavious P. Williams
provided statistical support and greatly improved the manuscript. Wade Roberts, Jorge Sánchez
and Chris Edwards contributed to the experimental studies. Dr. Cédric H. David and Dr. Mir A.
Matin provided valuable suggestions on the manuscript. In Chapter 3, Zhiyu Li and Daniel P.
Ames provided key ideas and contributed to the manuscript. E. James Nelson provided financial
support. Fengyuan Zhang and Min Chen from the OpenGMS group at Nanjing Normal
University provided technical support on deploying the OpenGMS Wrapper System.
The research presented in Chapter 4 was funded by NSF HydroShare Grant No. ACI-
1148453. Zhiyu Li provided significant code contributions. Daniel P. Ames contributed
extensively to the writing. E. James Nelson and Nathan Swain provided significant intellectual
support in terms of the case study and Tethys Platform integration. I also acknowledge the
comments and feedback from Norman L. Jones, Tethys Steering Committee, and BYU
Hydroinformatics Lab on the software.
Xiaohui Qiao
v
TABLE OF CONTENTS
LIST OF TABLES ........................................................................................................................ vii LIST OF FIGURES ..................................................................................................................... viii 1 Introduction ............................................................................................................................. 1
Problem Statement .......................................................................................................... 1
Research Objectives ........................................................................................................ 5
Outline of Dissertation .................................................................................................... 7
2 A Systems Approach to Routing Global Gridded Runoff through Local High-resolution Stream Networks for Flood Early Warning Systems ...................................................................... 9
Introduction ..................................................................................................................... 9
Methods......................................................................................................................... 14
2.2.1 Grid-to-vector Mapping ............................................................................................ 14
2.2.2 RAPID River Routing Model ................................................................................... 17
2.2.3 Experimental Design ................................................................................................. 20
2.2.3.1 Comparison with GloFAS......................................................................... 20
2.2.3.2 Sensitivity of Watershed Resolution ......................................................... 22
Results ........................................................................................................................... 24
2.3.1 Comparison with GloFAS......................................................................................... 24
2.3.2 Sensitivity of Watershed Resolution ......................................................................... 29
Discussion and Conclusions ......................................................................................... 32
3 A Container-based Approach for Sharing Complex Environmental Models as Web Services ......................................................................................................................................... 36
Introduction ................................................................................................................... 36
Methods......................................................................................................................... 40
3.2.1 Approach Design ...................................................................................................... 40
3.2.1.1 OpenGMS Wrapper System (OGMS-WS) ............................................... 40
3.2.1.2 Docker ....................................................................................................... 41
3.2.1.3 OGMS-WS-Docker Approach .................................................................. 43
3.2.2 Experimental Design ................................................................................................. 44
3.2.2.1 Streamflow Prediction System .................................................................. 44
3.2.2.2 Streamflow Prediction Service Design ..................................................... 45
3.2.2.3 Service Application in Bangladesh Historical Streamflow Analysis ....... 50
Results ........................................................................................................................... 52
3.3.1 Streamflow Prediction Service ................................................................................. 52
vi
3.3.2 Bangladesh Historical Streamflow Analysis ............................................................ 55
Discussion & Conclusion .............................................................................................. 59
4 Simplifying the Deployment of OGC Web Processing Services (WPS) for Environmental Modeling – Introducing Tethys WPS Server ....................................................... 62
Introduction ................................................................................................................... 62
Methods......................................................................................................................... 67
4.2.1 System Design .......................................................................................................... 67
4.2.2 System Implementation ............................................................................................ 71
4.2.3 Experimental Case Study Design.............................................................................. 76
4.2.3.1 HydroProspector Use Case ....................................................................... 76
4.2.3.2 WPS Client Use Case................................................................................ 81
Results ........................................................................................................................... 81
4.3.1 Tethys WPS Server ................................................................................................... 81
4.3.1.1 Interaction with Tethys WPS Server......................................................... 83
4.3.2 Case Study Results .................................................................................................... 85
4.3.2.1 HydroProspector Use Case ....................................................................... 85
4.3.2.2 Third-party WPS Client Use Case ............................................................ 87
4.3.3 Quantitative Analysis ................................................................................................ 88
4.3.4 Qualitative Analysis .................................................................................................. 90
Discussion ..................................................................................................................... 92
Conclusion .................................................................................................................... 95
5 Conclusions ........................................................................................................................... 97
References ................................................................................................................................... 102
Appendix A. Software Availability ........................................................................................... 117
1. New Software Developed in the Dissertation .................................................................... 117
2. Third-party Software Used in the Dissertation .................................................................. 118
vii
LIST OF TABLES
Table 2-1: Catchment Basin Properties ........................................................................................ 31
Table 2-2: The Statistical Results of Comparing Streamflow Simulations at Different
Resolutions ............................................................................................................................ 32
Table 3-1 Content of the Model Description Language (MDL) File............................................ 41
Table 3-2: Variable Design of the Streamflow Prediction Service............................................... 48
Table 3-3: Selected Reach Segments ............................................................................................ 56
Table 4-1: Server-Side Implementation of OGC WPS Specification Standard ........................... 72
Table 4-2: Tethys WPS Server HTTP GET + KVP Request APIs .............................................. 84
Table 4-3: Watershed Delineation Process WPS & Reservoir Calculation Process WPS on
Tethys WPS Server................................................................................................................ 87
Table 4-4: Language Composition Comparison ........................................................................... 88
viii
LIST OF FIGURES
Figure 2-1: Comparison of grid-based river routing (left) and vector-based river routing (right).
Note: the grid-based routing consists of two steps: first, the runoff in each grid cell
is routed to the nearest downstream river grid cell (black arrows are flow directions
between grids, blue grids represent the river network); then, the water is routed in each
river grid cell (red arrows are the flow directions in the river). In the vector-based
routing, water is directly routed in the river network (blue lines) following the flow
direction (red arrows) ............................................................................................................ 11
Figure 2-2: Area-weighted grid-to-vector mapping method ......................................................... 16
Figure 2-3: The workflow of calculating a weight table for a gridded LSM runoff ..................... 16
Figure 2-4: Schematic diagram of mapping gridded runoff to river networks ............................. 18
Figure 2-5: Schematic diagram of GloFAS reanalysis and ERA- RAPID ................................... 22
Figure 2-6: A map of the stations where RAPID routing results were compared with GloFAS.
Initially 100 stations were selected (all points) however, 20 stations (grey points) were
not used based on contributing area differences between the grid-based and vector-based
stream node. Eighty (80) stations were evaluated (red points) .............................................. 25
Figure 2-7: Comparison of GloFAS reanalysis data with ERA-RAPID results on 35-year
cumulative volume per unit area (CVPUA). ......................................................................... 26
Figure 2-8: The KGE distribution of comparing ERA-RAPID with GloFAS.............................. 26
Figure 2-9: The R distribution of comparing ERA-RAPID with GloFAS ................................... 27
Figure 2-10: Comparison between ERA-RAPID and GloFAS at Dipayal station, Nepal (up)
and Rocky Reach Dam station, United States (down). Red lines represent the results of
routing ERA-Interim/Land data with RAPID, while black lines represent GloFAS
reanalysis results. The left figures show the comparison of 35 years’ streamflow, the
right figures show the comparison of daily average streamflow. .......................................... 29
Figure 2-11: Experimental watersheds in low, medium and high resolutions .............................. 30
Figure 3-1: OGMS-WS workflow diagram .................................................................................. 41
Figure 3-2: The OGMS-WS-Docker approach ............................................................................. 43
Figure 3-3: Workflow of the Streamflow Prediction System (grey boxes are inputs) ................. 45
ix
Figure 3-4: Interactions between user, service interface, OGMS-WS, and the model
container ................................................................................................................................ 47
Figure 3-5: The streamflow prediction service deployment package ........................................... 50
Figure 3-6: Continued OGMS-WS GUIs ..................................................................................... 53
Figure 3-7: Streamflow prediction service GUIs .......................................................................... 54
Figure 3-8: High resolution river network in Bangladesh ............................................................ 56
Figure 3-9: Streamflow monthly distribution from 1995 to 2014 ................................................ 57
Figure 3-10: Simulated streamflow of selected reach segments ................................................... 58
Figure 4-1: Tethys WPS Server system design............................................................................. 69
Figure 4-2: General system architecture of Tethys WPS Server .................................................. 70
Figure 4-3: PyWPS workflow ....................................................................................................... 73
Figure 4-4: Integration of PyWPS in Tethys Platform ................................................................. 74
Figure 4-5: WPS access to the algorithm of a Tethys app ............................................................ 75
Figure 4-6: HydroProspector app workflow ................................................................................. 77
Figure 4-7: Watershed delineation process diagram ..................................................................... 78
Figure 4-8: Reservoir calculation process diagram ...................................................................... 79
Figure 4-9: Interactions between User, Template, Controller, and Tethys WPS Server in the
HydroProspector App ............................................................................................................ 80
Figure 4-10: WPS definition template file. The green box contains metadata of the service;
the blue box contains the definitions of inputs and outputs; the red box contains the
actual operation, where can directly call functions in the Tethys app. .................................. 82
Figure 4-11: HydroProspector for Dominican Republic App User Interface ............................... 86
Figure 4-12: The comparison of compute time between HydroProspector app and Storage
Capacity Service app ............................................................................................................. 89
Figure 4-13: Academic status (left) and environmental web app developing experience (right)
of the workshop and survey attendees. .................................................................................. 90
Figure 4-14: Comparison of difficulty of OGC WPS deployment with/without Tethys WPS
Server for survey attendees with different years of web application development
experience (Scale of 1 to 10 where 1 is very easy and 10 is very difficult) .......................... 91
Figure 4-15: Time to deploy WPS processes using Tethys WPS Server for survey attendees
with different years of web application development experience. ........................................ 91
1
1 INTRODUCTION
Problem Statement
Flooding is the most frequent natural disasters and a leading cause of natural disaster
fatalities and economic loss worldwide, accounting for over one-third of the total damage and
two-thirds of the impact to people affected by natural disasters (Jonkman 2005; Kousky 2014;
Miao 2019; Tu et al. 2020). Flooding has a long list of impacts, including threatening human life,
causing economic loss, damaging crops (Li et al. 2019; Zhang et al. 2016), triggering fatal
landslides (Martelloni et al. 2012), impacting terrestrial ecosystems (Knapp et al. 2008), stressing
water treatment plants and sewage networks, and affecting public health by inducing outbreaks
of waterborne diseases (Rose et al. 2000). It has been estimated that more than 94 million people
are affected by floods each year globally through property damage, unsafe drinking water,
infrastructure destruction, injury, and loss of life (Emerton et al. 2016). Recent studies have
shown that global climate change is causing more frequent extreme precipitation events in many
regions of the world (Donat et al. 2016; Min et al. 2011; Taylor et al. 2017; Westra et al. 2014),
leading to increased flooding risk. Some studies also have demonstrated a steady increase in
frequency and economic losses of flooding events in recent decades (Hirabayashi et al. 2013;
Schiermeier 2011; Yin et al. 2017).
Effective flood forecasting is becoming increasingly essential for flood management
since it can provide better prediction and timing for people to mitigate the impacts of
2
forthcoming hazards. Also, accurate flood forecasting allows for better quantification of
taxpayers’ liabilities, better pricing of crop insurance to enable informed decisions, and assisting
long-term flood control planning and land use management. Effective flood forecasting requires
a precise hydrologic model and sufficient resources, including observation data, computational
infrastructure, and human capacity. Due to limited resources, it is costly or even impossible to
produce accurate and rapid flood predictions in data-scarce areas (Cools et al. 2016; Emerton et
al. 2016). In recent decades, technological advances in earth observations and computational
resources have led to significant improvements in modeling and predicting the hydrologic cycle
at the global scale (Guy J-P. Schumann 2018; Sood and Smakhtin 2015; Ward et al. 2015).
Global runoff modeling results have been used by many researchers to compute river flow and
estimate floods (Alfieri et al. 2013; Hirpa et al. 2018; Wu et al. 2012; Wu et al. 2014). These
studies provide the potential to provide global-scale flood predictions by leveraging global runoff
predictions, filling the need for flood prediction in data-scarce regions.
A practical flood forecasting system requires a high-resolution stream network where
managers can access predictions at local locations. However, global runoff results are typically
generated at relatively low resolutions, which provide very limited information for flood
prediction at local scales. It requires an approach to accurately distribute gridded runoff to flows
at each stream segment and route the resulting flows for streamflow and flood prediction. If the
approach could be developed as an automated system, it could provide data support for
developing countries and ungauged regions that lack sufficient resources to develop the
cyberinfrastructure and human capacity to implement advanced flood forecasting system. There
exist a number of challenges in developing such an automated system at the global scale.
Traditional river-routing models are generally physically-based and grid-based, which require
3
significant computing power to perform global scale routing at resolutions fine enough to
provide local-scale solutions (Balazs M. Fekete 2011; Yamazaki et al. 2013). In addition, the
result of traditional routing models is river outflow at each grid cell, which is difficult to convert
to discharge at each river segment for intuitive understanding. Therefore, it is necessary to
advance current methods for large-scale flood forecasting and provide reliable streamflow
predictions at river level for stakeholders and managers to make informed decisions.
A precise model plays an essential role in flood forecasting. With advanced satellite and
computing technologies, there has been growing attention on integrating multidisciplinary data
and models to represent a research question more comprehensively (Bandaragoda et al. 2019;
Chen et al. 2019; Gao et al. 2019). However, it remains challenging to integrate models without
extra efforts because of their heterogeneous formats, execution modes, programming languages,
and supported operating systems. First, environmental models often have high entry barriers. It is
challenging for users to master a new model in a short amount of time, especially models in
unfamiliar domains. Second, low reusability and portability greatly hinder the spreading of
models. Collberg et al. (2014) found that less than 50% of software could even be successfully
installed and model reproducibility significantly reduces year by year because of changing
dependencies and missing third-party resources. Above all, there is an increasing demand to
improve the accessibility, reusability, and interoperability of environmental models (Belete et al.
2017; Voinov and Cerco 2010). Accessibility refers to the ability of a model to be easily
accessed and used. Reusability refers to the ability of a model to remain functional to be used to
develop new models or modeling systems. Interoperability refers to the ability of a model to
work with other models or systems without special effort by the user.
4
Model interoperability can be achieved in many ways, such as sharing input and output
files, directly rewriting models into a single software system, or establishing software
architecture principles that facilitate the coupling of independent models (Belete et al. 2017;
Granell et al. 2010). Among these methods, Service-Oriented Architecture (SOA) has gained the
most attention and has been widely used to integrate models and build scientific workflows
(Bosin et al. 2011; Lin et al. 2009; Yue et al. 2016; Zhao et al. 2012b). SOA has significantly
improved dealing with complex problems by decomposing a system into functional components
that communicate via web service (Dubois et al. 2013; Nino-Ruiz et al. 2013; Skøien et al. 2013;
Yang et al. 2009). Each web service in the system can perform an isolated task while the whole
system can address much broader problems (Argent et al. 2006; Castronova et al. 2013;
Schaeffer 2008). The advantages of web services include (1) enabling communication and
interoperation among applications written in different programming languages or running on
different platforms over the Internet (Michaelis and Ames 2009; Nativi et al. 2013); (2)
significantly lowering the entry bar of model execution (Jiang et al. 2017; Xiao et al. 2019;
Zhang et al. 2019).
While numerous environmental web services exist for data distribution and web mapping,
environmental models as web services is still an area of research that has not been widely
investigated and implemented (Belete et al. 2017; Hu et al. 2015; Vitolo et al. 2012; Vitolo et al.
2015). Based on my research, this problem stems from the following three aspects. First, many
studies have demonstrated orchestrating web services for environmental models; however, the
existing literature is more heavily weighted towards specific services for particular models rather
than creating general tools to support publishing models or data analysis workflows as web
services. For example, we wanted to expose existing functions in web apps as web services to
5
avoid repetitive work; however, no ready-to-use tools were found for this purpose. Second, there
are some tools can assist publishing analysis workflows as web services, such as GeoServer
(2013), Deegree (Müller 2007), PyWPS (Becchi 2007), 52°North (2018), and Esri ArcGIS.
However, these tools are designed for geospatial analysis in specific programming languages. It
is still challenging to expose complex models as web services, especially models in different
forms, execution modes, programming languages, and supported operating systems. Third, even
with well-designed tools, deploying models can still be a daunting process. Some models are
difficult to set up or even impossible to install in a new environment because of the complex
infrastructures required (network, storage and computing). Some models have conflicts with the
service publishing tool or with each other when installed on the same platform, such as
dependency conflicts, supporting different operating systems/programming languages, etc.
Model isolation management, which refers to isolating each model and preventing the conflicts
between them on the same platform, is still an uninvestigated issue for these service publishing
tools. While these technological problems are particularly evident when considering flood
forecasting, they are also relevant to many other problems in environmental and water resources
engineering including water management, drought modeling and prediction, water resources
planning, environmental restoration, and other related areas. The research objectives presented
below are positioned in terms of flood forecasting but can also be applicable to many, if not all
of these other related areas.
Research Objectives
The following three objectives are defined to address the challenges associated with flood
forecasting, including providing high-resolution global-scale flood predictions and developing
tools to assist web service deployment to facilitate integrated modeling.
6
Objective 1: Design and develop a fast and automated method to use runoff from
global/large-scale grid-based land surface models, and to route the runoff into high-resolution
vector-based river networks, ultimately provide river-level streamflow prediction at the global
scale. Based on the limitations of implementing traditional physically-based grid models at the
global scale, I elected to use vector-based river routing models as their demonstrated advantages
in large-scale hydrologic modeling, including computational efficiency, higher resolution
representations of the hydrologic features, and more precise locations for predictions.
Meanwhile, to provide intuitive and useful information for decision-makers, a fast and automated
method is needed to provide reliable streamflow predictions at river level.
Objective 2: Design and develop a tool that can (1) publish heterogeneous complex
environmental models as web services; (2) isolate each model and prevent the conflicts between
them; (3) keep the model(s) functional with changing environment and dependencies. The
developed tool will significantly lower the barrier to publishing models that are different in
execution modes, programming languages, and operating systems as web services, especially for
users with limited programming skills.
Objective 3: Design and develop a tool to simultaneously publish web app functionality
as Open Geospatial Consortium (OGC) Web Processing Services (WPS) when developing a web
app. OGC WPS is the most commonly recognized and has been demonstrated as an efficient
technology for publishing geospatial processes as web services. The developed tool will make
the process of WPS development almost transparent to developers. In this way, users can focus
on algorithm design and web app development and less focus less on the details of the OGC
WPS specification and implementation. In addition, the published WPS services can be directly
7
used in developing new web apps or by third-party applications to facilitate integrated
environmental modeling.
Outline of Dissertation
The remainder of this dissertation is structured following a manuscript-based approach as
approved by the BYU Office of Graduate Studies where the following three core chapters are
essentially standalone research articles that have been prepared and submitted for publication in
high quality archival scientific journals. Specifically, Chapters 2 and 4 were published as
standalone papers in Environmental Modelling and Software respectively (Qiao et al. 2019a;
Qiao et al. 2019b) and Chapter 3 is being submitted to International Journal of Digital Earth. A
brief summary of the following chapters is as follows:
Chapter 2 presents the design and development of a new automated computational
system, using existing, well-established open source software tools, that quickly downscales (or
maps) the runoff generated from coarse grid-based land surface models (LSMs) onto high-
resolution vector-based stream networks then routes the results using a vector-based river routing
model. Simulations are conducted with the new system using the ERA-Interim/Land reanalysis
data – a 35-year retrospective gridded runoff data product from the European Center for
Medium-range Weather Forecasts (ECMWF). The simulation results are compared with Global
Flood Awareness System (GloFAS) – a well-established gridded routing model using the same
forcing to assess the credibility of this new system.
Chapter 3 presents a tool to improve publishing complex environmental models as web
services. The OpenGMS Wrapper System (OGMS-WS) is elected as the service publishing tool.
Docker container is used as the model isolation tool to keep the model functional and portable. A
8
streamflow prediction service is deployed using this tool and used to analyze the historical
streamflow tendency in Bangladesh.
Chapter 4 presents the development and testing of a ready-to-use WPS implementation
called Tethys WPS Server, which provides a formalized way to expose web app functionality as
standardized WPS alongside a web app's graphical user interface. The WPS server is created
based on Tethys Platform by leveraging PyWPS. Three Tethys web apps are developed to
demonstrate how web app functionality(s) can be exposed as a WPS using Tethys WPS Server,
and to show how these WPSs can be coupled to build a complex modeling web app.
Chapter 5 provides a summary and conclusion on all the work presented in this
dissertation and outlines opportunities for future research.
Software availability is presented in the appendix to describe all the software and web
apps which have been developed and presented in each chapter and all the third-party software
which has been used in this dissertation.
9
2 A SYSTEMS APPROACH TO ROUTING GLOBAL GRIDDED RUNOFF
THROUGH LOCAL HIGH-RESOLUTION STREAM NETWORKS FOR
FLOOD EARLY WARNING SYSTEMS
Introduction
Flood early warning systems (FEWS) are important tools for flood management since
they can provide greater prediction and timing of forthcoming hazards so that populations are
more prepared before and more resilient after flood events. Effective FEWS that provide flood
warnings with a reasonable lead time are mostly established at catchment-scale, usually in areas
where sufficient data, technology and human resources are available (Basha and Rus 2007; Guy
J-P. Schumann 2018; Latt and Wittenberg 2014). Due to limited resources, it is costly or even
impossible to build effective and sustainable FEWS in data-scarce areas (Cools et al. 2016;
Emerton et al. 2016). In recent decades, the considerable advances in satellite technology and
high-performance computing has led to significant improvements in modeling and predicting the
hydrologic cycle at the global scale (Guy J-P. Schumann 2018; Sood and Smakhtin 2015; Ward
et al. 2015). The hydrologic processes are usually land surface models (LSMs) that simulate the
water and energy interactions between the atmosphere and earth surface. The LSMs use the
meteorological forcing from weather prediction models as input and provide output as water
balance parameters from the earth’s surface including evapotranspiration, soil moisture, snow,
and runoff (Pitman 2003).
10
Among these land surface results, runoff has received most attention because of the
importance of streamflow to society (Bai et al. 2016). Many researchers have performed river
routing using runoff forecasts from LSMs to compute river flow and estimate floods (Alfieri et
al. 2013; Hirpa et al. 2018; Wu et al. 2012; Wu et al. 2014). These studies demonstrate the
potential to develop global or large-scale FEWS by leveraging global runoff predictions. If these
approaches could be automated, they could provide data support for developing countries and
ungauged regions that lack sufficient resources to develop the cyberinfrastructure and human
capacity to implement advanced flood prediction systems. However, the relatively low
resolutions of these LSMs are not practical for flood prediction at local scales. Developing
practical flood prediction systems requires higher resolution stream networks where managers
can select predictions at relevant locations. The development of such systems presents a number
of challenges. First, river routing models have traditionally used a gird-based scheme in which
river segments (calculation units) are discretized as grids (Balazs M. Fekete 2011; Yamazaki et
al. 2013). Significant computational resources are required to perform global or large-scale grid-
based routing at resolutions fine enough to provide local-scale solutions. Second, to better
simulate the hydrodynamic process in river channels, river routing models have been improved
increasingly to include more and more complex physical processes (Shaad 2018; Yamazaki et al.
2013). It is still challenging to implement complex physically-based river routing models at the
global scale because they have more parameters and higher requirement for observation data and
computing power to run and calibrate (Younis and De Roo 2010), which is especially difficult in
data scarce areas. Last, the result of grid-based routing models is river discharge at each grid cell.
It is a challenge to convert from gridded outflow volumes to estimated river discharge at a local
scale so that the benefits of the global models can be maximized. For these systems to be useful
11
for flood prediction, it is essential to provide streamflow at a fine resolution or at a river-level
scale.
Vector-based river routing modeling is receiving more attention as it facilitates large-
scale hydrological predictions at finer resolutions and some large-scale models have started to
shift from a grid-based environment towards to a vector-based environment for predictions
(David et al. 2011b; David et al. 2013; Lin et al. 2018). The difference between vector-based
river routing and grid-based river routing is shown in Figure 2-1.
Figure 2-1: Comparison of grid-based river routing (left) and vector-based river routing (right). Note: the grid-based routing consists of two steps: first, the runoff in each grid cell is routed to the nearest downstream river grid cell (black arrows are flow directions between grids, blue grids represent the river network); then, the water is routed in each river grid cell (red arrows are the flow directions in the river). In the vector-based routing, water is directly routed in the river network (blue lines) following the flow direction (red arrows)
The grid-based river routing defines the river network as a set of connected grid cells and
performs the streamflow routing simulation in each river grid cell. In the vector-based river
12
routing, the river network is represented as lines and the calculations are implemented in each
river segment. In both cases (grid-based and vector-based) river routing calculations require
parameterization of the computational elements including estimates of cross-sectional geometry,
surface roughness, and slope. For grid-based routing, these parameters are estimated for each
grid cell, whereas for vector-based routing, each stream segment is parameterized separately.
Several recent studies have implemented vector-based river routing with large-scale
LSMs and demonstrated its feasibility and flexibility. David et al. (2011a) replaced the grid river
routing scheme in the SIM-France model with Routing Application for Parallel computatIon of
Discharge (RAPID) and obtained comparably accurate simulations but higher model efficiency.
Lehner and Grill (2013) developed a vector river routing model, HydroROUT, and coupled it
with a global river network database, HydroSHEDS, to support global-scale eco-hydrological
modeling. Mizukami et al. (2016) developed a river network routing tool, mizuRoute that post-
processes the runoff outputs from LSMs and performs continental-scale streamflow simulations.
This tool can use both grid-based and vector-based river networks. The authors demonstrated its
capability to produce streamflow on a vector-based river network over the contiguous United
States (CONUS). Snow et al. (2016) implemented RAPID on the runoff generated by the
ECMWF model (Molteni et al. 1996) to develop a US national-scale streamflow prediction web
application. Tavakoly et al. (2017) performed river flow modeling in Mississippi river basin
using RAPID and high-resolution NHDPlus river data. Their validation results showed that
RAPID has a satisfactory performance in continental-scale river routing. Lin et al. (2018)
integrated RAPID with the community WRF-Hydro framework for continental-scale flood
discharge modeling and demonstrated its computational efficiency and reasonable accuracy in
predicting flood discharge during Hurricane Ike in 2008. In general, the advantages of vector-
13
based routing models compared with grid-based models, especially at a large-scale, include
computational efficiency, higher resolution representations of the hydrological features, and
more precise locations for predictions. Research has shown that routing through a vector-based
representation of a river network has more advantages and flexibilities for large spatial domains
than routing through a grid-based representation of river networks at coarse resolutions (Lehner
and Grill 2013; Lin et al. 2018; Mizukami et al. 2016; Singh et al. 2015).
To date, global LSMs have typically been generated at such coarse resolutions that, at
least for streamflow, they provide little value at local scales when used for flood prediction and
warning. At the same time, developing regions of the world lack observational data,
computational infrastructure, and the human capacity to create actionable streamflow, and by
extension, flood forecasts. Global gridded runoff models could fill the need for flood prediction
in developing regions if the large-scale gridded results could be made useful at local scales. This
requires the development and integration of a system that can take output from the LSMs,
accurately distribute flows to vector-based stream segments, and route the resulting flows for
streamflow and flood prediction.
I have designed and developed a computational system, using existing, well established
open source tools, that routes runoff generated from large scale grid-based modeling into high-
resolution vector-based river networks ultimately providing FEWS support for flood
management in data-scarce regions. The contribution of this work to this field is in developing
the tools and data structures to create the automated system, and to develop new algorithms to
distribute grid flow to vector stream segments. This chapter presents both the new system to
provide automated flood forecasts and the method for distributing grid-based flows to vector
stream segments. Experimental studies are conducted to demonstrate the feasibility of the system
14
and its flexibility in providing useful flood prediction over large regions. The results have
demonstrated that this system can produce more useful predictions as they can be computed at
more precise locations than grid-based systems.
Section 2.2 presents the method for grid-to-vector flow mapping and describe the
experimental design I used to validate this method. Section 2.3 presents the results of this
integrated system and validation experiments. Section 2.4 provides a detailed discussion on the
results and describes the benefits of this integrated system for flood prediction in data scarce
regions. Section 2.4 also provides conclusions and potential areas for future work. This chapter
describes an integrated system for using LSM data to create useful flood predictions at a local
scale. The contributions of this work are developing the integrated system and creating a new
method for distributing course resolution grid cell flows to high-resolution vector stream
segments. As part of developing this system, I extended the RAPIDpy vector routing model
(Snow et al. 2017) to accept input from additional LSM models.
Methods
2.2.1 Grid-to-vector Mapping
To develop a system that couples course-resolution LSMs with high-resolution vector-
based river routing models is to determine how to distribute the grid-generated runoff data to the
correct river segment. This is relatively straightforward when coupling grid-based run-off with
grid-based river routing models, where runoff from the LSM grid is distributed as river inflow to
the nearest downstream river grid cell. For vector-based routing, unlike grid-based river routing
methods, no such one-to-one relationship exists to distribute the gridded runoff and with the
15
correct vectorized river segments. For example, one grid cell might contribute to multiple river
segments or one river segment might accept runoff from multiple grid cells. These complex
relationships make runoff distribution complicated. One solution, based on physical processes, is
to determine the contributing catchment for each river segment and use the runoff from those
grid cells to calculate the water inflow of each river segment.
Studies have adopted different methods for coupling gridded LSMs with vector-based
routing models. David et al. (2013) mapped NLDAS2 gridded runoff to the United States
NHDPlus river network using a catchment centroid-based method. The inflow of a catchment
was calculated using the runoff value of the grid cell where the catchment centroid was located.
Snow et al. (2016) adopted an area-weighted method to convert global runoff depths generated
by the ECMWF model to the runoff volume of each NHDPlus catchment. Lin et al. (2018)
compared the area-weighted method with the catchment centroid-based method and concluded
that the model is more sensitive to the grid-to-vector coupling interface than the grid resolution.
The area-weighted coupling exhibits better results for high-resolution gridded forcing data,
especially when the grid cells of the forcing are much smaller than the catchments.
I used an area-weighted grid-to-vector method, adapted from Snow et al. (2016) shown
as Figure 2-2. This method employs a weight table to convert the calculated gridded runoff
depths to the inflow of each catchment. The weight table contains the area ratio of each LSM
grid cell to the intersected catchments. This area ratio is referred as “weight” in this context.
Figure 2-3 shows the workflow for calculating a weight table for gridded LSM runoff data. First,
a Thiessen polygon feature is computed based on the coordinates (latitude and longitude
attributes) stored with the runoff data in the NetCDF file. Figure 2-2 shows example Thiessen
polygons as grids squares. Each polygon is a square with the given coordinates as centroids.
16
Next the algorithm intersects the polygon with data from a drainage file that contains an
identifier for each river segment (the river identifier is referred to as COMID in this research). In
this way, we can obtain a polygon feature that has the attributes of geodesic area and the
intersecting catchment COMID. Finally, the weight table is generated from the intersected
polygon features.
Figure 2-2: Area-weighted grid-to-vector mapping method
Figure 2-3: The workflow of calculating a weight table for a gridded LSM runoff
17
Then, the Inflow of a catchment is computed as the sum of all the runoff products from
each intersected LSM grid cell [runoff (i,j)], the area of the LSM grid cell (L2, and L refers to the
grid size), and the weightk of each intersected grid cell [shown as Eq. (2-1), k refers to the
number of grid cells intersected with the catchment].
𝐼𝑛𝑓𝑙𝑜𝑤 = ∑ [𝑟𝑢𝑛𝑜𝑓𝑓 (𝑖, 𝑗) × 𝐿2 × 𝑤𝑒𝑖𝑔ℎ𝑡𝑘]𝑛𝑘=1 (2-1)
2.2.2 RAPID River Routing Model
RAPID is a vector-based river routing model developed by David et al. (2011a) that uses
a matrix-based version of the Muskingum method to simulate the water flow through a vector-
based river network that can range from watershed-scale to global-scale (David et al. 2011b).
The Muskingum routing method has been widely used in vector-based river routing because of
its simplicity and low computational cost compared with other methods (Gill 1978; Tung 1985).
Since its first formal release, RAPID has been used and verified in a number of studies including
continental-scale, high-resolution flow modeling and operational flood forecasting, computation
of river height at the regional scale and other hydrological topics (David et al. 2016). A set of
assisting tools exists to lower the barrier to operating RAPID by users from different disciplines,
including an ArcGIS toolset (Ding 2016) developed by Esri for RAPID input data preprocessing,
and an open source software tool RAPIDpy developed by Snow et al. (2017) to assist in
preparing inputs in the required formats and running RAPID. RAPIDpy supports several large-
scale LSMs, including ECMWF, ERA-Interim (Dee et al. 2011), NLDAS (Xia et al. 2012) and
GLDAS (Lorenz et al. 2015). I extended RAPIDpy to support more models, including ERA5
(Karl Hennermann 2018), HIWAT (Gatlin et al. 2018), and COSMO (Rockel et al. 2008).
18
Figure 2-4: Schematic diagram of mapping gridded runoff to river networks
RAPID is developed in FORTRAN and compiled as a dynamic link library (DLL) whose
methods can be called or executed from external software. The RAPID required inputs (shown as
the gray boxes in Figure 2-4) include (1) a set of forcing files of LSM surface and subsurface
runoff; (2) a model initialization file describing the streamflow of each river at the start time of
the simulation; (3) a file describing the water inflow from surface and subsurface runoff into the
upstream point of each river reach; (4) a file documenting the river network's topological
connectivity; (5) two files defining the Muskingum routing parameters (k and x), and other basic
information such as the internal time step and duration of the simulation. All the information of
19
required inputs and simulation settings are documented in a “namelist” text file for RAPID to
read at runtime.
River connectivity and Muskingum parameters can be calculated by third-party GIS
software, such as ArcGIS (proprietary), TauDEM (open source) and others. The ArcGIS
extension, Arc Hydro has a tool called “Dendritic Terrain with Unknown Stream Location” that
can delineate watersheds and generate a stream network shapefile from a flow direction file and
a flow accumulation file. These files are directly generated from a DEM file through the “Flow
direction” tool and “Flow accumulation” tool in ArcGIS. Arc Hydro has a tool called “Calculate
the Muskingum Parameters”, which can estimate the Muskingum k and x values for each stream
segment. The value of k is associated with the flow travel time through a stream and calculated
using Eq. (2-2) by multiplying the user-given factor λk by the length of the stream segment over
the flow wave velocity. The value of x for each stream segment is assigned based on the user-
given factor λx.
𝑘𝑗 = 𝜆𝑘 ×𝐿𝑗
𝑣; 𝑥𝑗 = 𝜆𝑥 × 0.1 (David et al. 2013) (2-2)
Where kj and xj are the Muskingum parameters for reach j, Lj is the length of a river
reach and v is flow wave celerity (default value in this tool is 1 km/h or 0.28 m/s). k and x are
two multiplying factors later determined by the RAPID optimization procedure (default value of
k is 0.35, default value of λx is 3). The files of river connectivity and Muskingum parameters
files only need to be generated once for a river network. Then, the weight table, described in
section 2.2.1, is generated through the ArcGIS RAPID preprocessing toolset. All the above steps
only need to be performed once for a river network and an incorporated LSM.
20
Once these data have been generated, RAPIDpy is used to write the Muskingum and river
connectivity files in the required formats from the ArcGIS-processed results, create catchment
inflow files with the weight table, populate the “namelist” file, and run RAPID. When RAPIDpy
is used in a forecasting mode, after the simulation is complete, RAPIDpy generates an
initialization file from the streamflow results for the next forecast model run. This offers users
the flexibility to extract different data from the initialization file, including the result of a specific
day or seasonal averages.
2.2.3 Experimental Design
2.2.3.1 Comparison with GloFAS
The Global Flood Awareness System (GloFAS), developed by ECMWF and the Joint
Research Centre of the European Commission, is a coupled hydro-meteorological model that
generates global streamflow predictions for large-scale river basins (Alfieri et al. 2013). GloFAS
provides daily streamflow forecasts of up to 30 days by using the LISFLOOD hydrological
model forced by the surface and subsurface runoff from the ECMWF Integrated Forecast System
(IFS) meteorological forecasts. LISFLOOD is a grid-based routing model that can separately
simulate different hydrological processes that occur in large river basins (Younis and De Roo
2010). The LISFLOOD processes activated in GloFAS include the simulation of groundwater
storage, groundwater flow, and flow routing in river channels (Hirpa et al. 2018). GloFAS also
provides a long-term reanalysis dataset (1980/01 to 2017/12) with daily streamflow at the global
scale with a gridded spatial resolution of 0.1. This reanalysis dataset uses the same hydrological
model but is forced by the runoff from ERA-Interim/Land. ERA-Interim/Land is a global land
surface reanalysis dataset with parameterization-improved HTESSEL land surface model
21
(Balsamo et al. 2009; E. L. Wipfler 2011) driven by meteorological forcing from the ERA-
Interim atmospheric reanalysis and observed precipitation adjustments (G. Balsamo 2015). As
shown in Figure 2-5, the HTESSEL model is implemented to simulate the water and energy
fluxes between land surface and atmosphere and estimate the surface and subsurface runoff
required for river routing (Caixin Wang 2018; Clement Albergel 2018; Dee et al. 2011). The
resolution of ERA-Interim/Land surface and subsurface runoff are both 80 km. In GloFAS, the
LISFLOOD model is set up on global coverage with horizontal grid resolution of 0.1 to better
represent the hydrological process at large river basin scale (Shaw et al. 2005). So the gridded
surface and subsurface runoff from ERA-Interim/Land is resampled from 80 km to 0.1 to be
used as input of the LISFLOOD model.
It has been demonstrated that GloFAS can skillfully detect hazardous events in large river
basins and also provide a reasonable streamflow forecast in most parts of the world (Alfieri et al.
2013; Hirpa et al. 2018). To evaluate the performance of this system, using RAPIDpy, for
global-scale streamflow prediction, I routed the surface and subsurface runoff of 35-year
(1980/01-2014/12) ERA-Interim/Land data and compared the resulting routed streamflow with
GloFAS reanalysis data (shown as Figure 2-5). The primary benefit of using this RAPIDpy-
based system with vector-based stream routing instead of relying only on GLoFAS, is that the
gridded surface and subsurface runoff is mapped to a high-resolution river network that includes
streams in much smaller basins than GloFAS. This means that flood predictions can be computed
at more precise locations. Section 2.3.1 provides the results of this experiment.
22
Figure 2-5: Schematic diagram of GloFAS reanalysis and ERA- RAPID
2.2.3.2 Sensitivity of Watershed Resolution
Watershed boundaries and river networks are the basis for vector-based river routing. For
many countries and regions in the world, it is difficult and expensive to implement in-situ
measurements to collect hydrographic data, especially at a large spatial scale. For large-scale or
global hydrological modeling, the watershed and river network maps are normally delineated
from digital elevation model (DEM) files. For this method, the resolution and accuracy of the
river network entirely depends on the resolution of the DEM file. In recent years, tremendous
improvements have been made in the availability, quality, and resolution of large-scale
hydrographic datasets due to the availability of large-scale or global earth data obtained from
satellite remote sensing technologies, this includes high-resolution DEM data. The most well-
23
known versions of large-scale hydrographic datasets include the US National Hydrography
Dataset Plus Version 2 (McKay 2012), the Australian Hydrological Geospatial Fabric (Atkinson
2008), the European catchments and Rivers network system (Ecrins) (Agency 2012), and the
global-scale hydrographic dataset - HydroSHEDS (Gong et al. 2011). In summary, several
hydrographic datasets exist at different resolutions allowing users to choose the appropriate
dataset based on data availability and study purposes.
To satisfy river continuum (e.g., mass balance), streamflow at each location of the river
network is influenced by upstream processes (Vannote et al. 1980). In RAPID, each reach
segment or catchment is a modeling unit. Each reach segment gets lateral inflow determined
from the catchment runoff and routes these flows to downstream segment. How these catchment
runoff flows are distributed to the stream segments depends on the algorithm used, the size of the
catchment, and the resolution of the river segments. The same outlet of a watershed can have
different numbers of upstream segments under different hydrographic data resolutions (shown as
Figure 2-11). I performed an experiment to evaluate whether the discharge at the same outlet is
affected by the number of upstream segments it has. In other words, the sensitivity of a vector-
based river routing model to the resolution of the hydrographic data. I selected several
watersheds in CONUS with the following criteria: (1) cover an area of several hundred square
miles, (2) have an outlet that corresponds with a stream gauge, and (3) a be a relatively
unregulated area (i.e., with no major reservoirs or diversions). I delineated each watershed at
low, medium, and high resolutions. Then I routed the 35-year ERA-Interim/Land reanalysis data
using RAPID for each of the watersheds defined at different resolutions. The simulated results
are compared and presented in section 2.3.2.
24
Results
2.3.1 Comparison with GloFAS
Our group has implemented this system for South Asia, Africa, South America, and
CONUS. To validate the model performance, I randomly selected 100 GloFAS reporting stations
across these areas from GloFAS website (http://www.globalfloods.eu/glofas-forecasting/). Figure
2-6 shows the selected stations. The reason for using GloFAS reporting stations is that those
stations provide the upstream area used in the GloFAS routing model. One challenge I faced
comparing GloFAS reanalysis data with the ERA-RAPID simulated results is that GloFAS uses
grid-based river routing, while the ERA-RAPID system uses vector-based river routing. The
challenge is to correctly match GloFAS grid cells with the vector representation of the streams
used in the RAPID routing scheme. The method for matching the outlet points was to select,
from RAPID model, the largest stream among the streams that interact with the GloFAS grid
cell. Then I calculated the upstream area of the selected stream using watershed delineation
procedures using the same DEM data that generated the streams. I compared the upstream areas
calculated using this method with the upstream areas of these stations provided by GloFAS. I
assumed that stations with an upstream area difference less than 10% between these two models
are matched. I based this threshold by considering that the resolution of GloFAS is 0.1 and the
streams used in this experiment are delineated from 3-arc-sec HydroSHEDs DEM data. I initially
identified 100 GloFAS stations for comparison (all the points in Figure 2-6), of these, 20 points
did not meet the selection criteria (red points in Figure 2-6). Based on above criteria, I selected
and evaluated 80 stations (shown as grey points in Figure 2-6). For each station, I calculated 35-
year cumulative volume per unit area (CVPUA) from GloFAS reanalysis data and ERA-RAPID
simulation results. I compared the 35-year CVPUA of 80 stations from the two models, shown in
25
Figure 2-7. The CVPUA values of two models are highly correlated with a high R2 of 0.9818.
This statistic provides additional evidence that the GloFAS and RAPID locations are good
matches at these 80 stations.
Figure 2-6: A map of the stations where RAPID routing results were compared with GloFAS. Initially 100 stations were selected (all points) however, 20 stations (grey points) were not used based on contributing area differences between the grid-based and vector-based stream node. Eighty (80) stations were evaluated (red points)
To compare the two streamflow simulations, I calculated two statistical skill scores,
including the Kling-Gupta efficiency (KGE2012) and the Pearson correlation coefficient (R), for
each station both for the GloFAS reanalysis and the ERA-RAPID. These two metrics are
computed using the Hydrostats library of the HydroErr package (Jackson et al. 2019; Roberts et
al. 2018), a python package that provides statistical analysis for hydrological prediction models.
26
Figure 2-7: Comparison of GloFAS reanalysis data with ERA-RAPID results on 35-year cumulative volume per unit area (CVPUA).
Figure 2-8: The KGE distribution of comparing ERA-RAPID with GloFAS
27
Figure 2-9: The R distribution of comparing ERA-RAPID with GloFAS
For these results, the median of KGE is 0.73 and the median of R is 0.92. The two
simulations are highly correlated for 89% of the stations (with R > 0.7) and well-fitted for 70%
of the stations (with KGE > 0.5). Figure 2-8 and Figure 2-9 respectively show the distribution of
these two skill scores in the research regions. The simulations show consistency between the two
methods in South Asia, most of South America and Africa, and the west coast and east coast of
CONUS. There are a number of stations in the middle part of CONUS with negative KGE and
low R values. I found these stations all have relatively low CVPUA values, therefore the poor fit
in these stations might because small volume of water that was routed with different models. It
also might be due to the fact that the comparison node for the grid-based system is at the center
of the cell, and the node for the vector-based system is on the stream line, even though I used
28
contributing area comparisons to limit these discrepancies, at low flows differences in gauge
locations could create higher variance.
The distributions indicate that ERA-RAPID and GloFAS results are generally consistent;
however, the degree of consistency varies across regions. Figure 2-10 shows the hydrographs
that compare ERA-RAPID with GloFAS at two stations with different level-of-skill scores. The
first station, Dipayal station in Nepal, shows very good match between two simulations. The
second station, Rocky Reach Dam station in the United States, shows a significant difference
between the two datasets. The vector-based ERA-RAPID outputs a lower peak flow than grid-
based GloFAS. The disparity might due to the difference between these two routing models,
GloFAS considers all processes in the grid-based routing, including overland surface routing,
subsurface storage and routing, while RAPID simplifies the explicit representation of these
processes and only focuses on the reach-scale streamflow response. The disparity could also be
because in variations in the location of the comparison points change the reported flow. Overall,
ERA-RAPID results are comparable to GloFAS results, in other words, the higher-resolution
vector-based routing model generated streamflow estimates that are generally consistent with the
results from the GloFAS provides using a gridded-data routing model. I can assume with some
confidence that this high-resolution stream segment forecasts are at least as reliable as the well-
established low-resolution GloFAS results.
29
Figure 2-10: Comparison between ERA-RAPID and GloFAS at Dipayal station, Nepal (up) and Rocky Reach Dam station, United States (down). Red lines represent the results of routing ERA-Interim/Land data with RAPID, while black lines represent GloFAS reanalysis results. The left figures show the comparison of 35 years’ streamflow, the right figures show the comparison of daily average streamflow.
2.3.2 Sensitivity of Watershed Resolution
For sensitivity studies, I selected five watersheds across the CONUS, including Meramec
River near Sullivan, MO; East Branch Delaware River at Margaretville, NY; Alsea River near
Tidewater, OR; White River near Fort Apache, AZ; and North Fork Clearwater River near
Canyon Ranger Station, ID (see Figure 2-11). These watersheds are summarized in Table 1.
These watersheds were selected because of the availability of ground-truth flow data; the US
Geological Survey (USGS) provides sufficient historical streamflow observations at these
30
locations (more than 35 years) that we can use to evaluate whether the different stream densities,
resulting from different DEM resolutions, affects the simulation performance. All the selected
watersheds have a USGS stream gauge located at an outlet. For comparison, each watershed was
delineated into 3, 7, and around 20 catchment sub-basins for the low, medium, and high
watershed resolutions, respectively (shown as Figure 2-11, detailed information of the sub-basins
sees Table 2-1). For each watershed, I ran the 35-year ERA-Interim/Land reanalysis data (1980-
2014, 80 km resolution, see Section 2.2.3.1 for more dataset information) as forcing at each
watershed resolution and the generated flowrates at the basin mouth.
Figure 2-11: Experimental watersheds in low, medium and high resolutions
31
Table 2-1: Catchment Basin Properties
Basin Area [sq. km]
USGS Stream Gauge ID (outlet)
Average Sub-Basin Area [sq. km]
Low
(3 sub-basins)
Medium
(7 sub-basins)
High
(20 sub-basins)
MO 3848 07014500 1283 550 183
NY 422 01413500 141 60 25
OR 857 14306500 286 122 43
AZ 1632 09494000 544 233 71
ID 3356 13340600 1119 479 177
Average 2023 --- 674 289 100
I compared the results for the entire watershed, based on sub-basins delineated at
different resolutions, with each other. Table 2-2 shows the statistical comparison of the 35-year,
model-simulated streamflow at the same watershed outlet generated with different stream
densities. I adopted the following statistical metrics to comprehensively and quantitatively assess
the sensitivity of watershed resolution: KGE, R, and mean absolute average (MAE). Table 2-2
shows that all the comparisons resulted in a high R (average of 0.9849) and KGE (average of
0.9616), and a low MAE (average of 1.605 m3). This small variation among the results from the
different resolution sub-basins demonstrates that the model generates similar streamflow
forecasts using different watershed resolutions. I found that the comparisons between medium
resolution and high-resolution watersheds resulted in a higher R and KGE and lower MAE than
the comparisons of low resolution and medium resolution. This suggests that the simulated
streamflow at the basin mouth changes less as the watershed resolution increases. Essentially, the
watershed resolution has a negligible effect on the simulated streamflow. Since slight differences
exist in the results simulated from different watershed resolutions, we can conclude that the
32
prediction performance of the model, while negligible, is slightly affected by the watershed
resolution.
Table 2-2: The Statistical Results of Comparing Streamflow Simulations at Different
Resolutions
Comparison R MAE[m3] KGE
MO: Low vs. Med Res 0.9375 7.3748 0.8985
MO: Med vs. High Res 0.9979 1.2713 0.9948
MO: Low vs. High Res 0.9232 8.0400 0.8928
NY: Low vs. Med Res 0.9831 0.7209 0.8954
NY: Med vs. High Res 0.9999 0.0355 0.9963
NY: Low vs. High Res 0.9813 0.7556 0.8911
OR: Low vs. Med Res 0.9939 0.9949 0.9841
OR: Med vs. High Res 0.9987 0.4330 0.9985
OR: Low vs. High Res 0.9872 1.4190 0.9799
AZ: Low vs. Med Res 0.9943 0.1244 0.9789
AZ: Med vs. High Res 0.9976 0.0844 0.9744
AZ: Low vs. High Res 0.9849 0.2014 0.9522
ID: Low vs. Med Res 0.9982 0.8860 0.9965
ID: Med vs. High Res 0.9996 0.4632 0.9949
ID: Low vs. High Res 0.9964 1.2706 0.9959
Average: Low vs Med 0.9814 2.0202 0.9507
Average: Med vs High 0.9988 0.4575 0.9918
Average: Low vs High 0.9746 2.3373 0.9424
Total Average 0.9849 1.6050 0.9616
Discussion and Conclusions
The motivation of this study was to create and test an enhanced method and develop a
system to route coarse gridded runoff generated from global or national LSMs onto a high-
33
resolution vector-based river network and provide flood prediction at local scales. There are
many regions in the world that are hydrologic-data poor and have little or no observed data or
working models to provide the backbone of a regional or national streamflow forecasting system
or FEWS. Leveraging global models can be an efficient way to provide baseline hydrologic
information and supplement whatever other resources are available, but there remains the
question of whether or not the information is useful enough at local scales to be able to make
informed decisions. Managers need flood predictions at specific locations on local stream
networks. I demonstrated the ability to distribute grid-based runoff data for routing on high-
resolution vector-based stream networks that meet this need. This represents a significant step
forward because international development organizations like the World Bank can reinvest
dollars focused on leveraging such hydrologic information into applications and have more
informed decisions by using these tools.
I presented and tested a system, including an innovative downscaling method for
mapping large scale LSM grid cells to high-resolution stream networks, by leveraging the
RAPID vector-based river routing model. I compared routing 35-year ERA-Interim/Land
reanalysis data in RAPID and the results from GloFAS reanalysis data and demonstrated that this
RAPID-based system has comparable performance with GloFAS. However, using results from
this vector-based stream network, we can provide streamflow predictions at much higher
resolutions. This is done with much less computational cost than creating high-resolution gridded
data, as it inherits all the benefits of vector-based routing models. By comparing routing results
using the same forcing data with different river network resolutions of the same watershed, I
found that the river network resolution has a negligible effect on the simulated streamflow using
this system. This means that using this system, including the downscaling method and using
34
RAPID model for routing, can produce consistent results regardless of the resolution of
hydrographic datasets the user chooses. This is in major contrast to the performance of grid-
based routing models which is significantly affected by the grid resolution. The system is
developed with open source tools and is available for others to use and extend. Our group have
implemented this system in a “Streamflow Prediction Tool” web app (detailed information sees
Software Availability) that aims to provide global streamflow prediction by mapping gridded
ECMWF runoff forecasts to the global vector-based high-resolution river network. We have
made available the implementation for the South Asia, Africa, South America, and CONUS
regions. We are working to expand this application it to cover the globe.
The primary benefit of this work is to streamline the process of mapping and routing
gridded runoff into high-resolution river network and provide the possibility to simultaneously
produce streamflow forecasts from current existing LSM prediction models. It can provide flood
management support to data-scarce regions by enabling the streamflow estimates at specific
locations using various global or national forecasting systems to generate runoff or forcing data.
This work presents a new method for distributing grid-based runoff to vector stream networks
and serves as a method and guidance for coupling LSMs with other vector-based routing models.
I demonstrated that this mapping process can efficiently convert gridded runoff data to intuitive
and useful river discharge at local scales, provide an easy way to access various global or
national runoff forecasts simultaneously and make informed decisions. I expect that this work
and outcomes can motivate contribution on streamlining different disciplinary models to
maximize current outcomes and facilitate environmental modeling research.
The next steps of this work include introducing approaches to optimize RAPID
parameters, such as flow travel time estimation approach to calculate Muskingum parameter k;
35
incorporating data assimilation methods to improve initial states by leveraging observations or
some large-scale LSM modeling systems like GLDAS; incorporating web processing services to
facilitate process interoperability and data processing (Qiao et al. 2019a).
36
3 A CONTAINER-BASED APPROACH FOR SHARING COMPLEX
ENVIRONMENTAL MODELS AS WEB SERVICES
Introduction
The complexity of environmental modeling has expanded in parallel with the
technological advances in earth observations and computational resources. Presently, there is a
focus on improving models to represent more details and integrating models from different
disciplines, including atmospheric, hydrologic, ecologic, geomorphic and human processes
(Bandaragoda et al. 2019; Chen et al. 2019; Gao et al. 2019). The increasing model complexity
raises entry barriers. It is challenging for users to master and execute a new model in a short
amount of time, especially when working with models in unfamiliar domains. The model users
may need to expend considerable time acquiring enough knowledge to execute the model in a
correct way. Some models are even difficult to set up as they require many dependencies in
specific versions. Improving the reusability and interoperability of environmental models is
becoming more important and gaining momentum due to increased desire to better understand
and investigate environmental systems (Belete et al. 2017; Voinov and Cerco 2010). Reusability
refers to the ability of a model to remain functional to be used to develop new models or
modeling systems. Interoperability is the ability to work with other models or systems without
special effort by the user.
37
To improve the reusability and interoperability of complex models, many studies have
focused on lowering the entry bar of model execution. The primary methods include developing
web-based modeling approaches with simple graphical user interface (Luo et al. 2004; Rajib et
al. 2016), linking heterogenetic models through Application Programmer Interfaces (API)
(Gregersen et al. 2007; Peckham et al. 2013), and publishing models as accessible web services
(Jiang et al. 2017; Xiao et al. 2019; Zhang et al. 2019). All those methods can save considerable
time in organizing hardware and installing software. Compared with the first two methods, the
loosely-coupled, service-oriented approach has a prominent advantage that it enables models
written in different programming languages or running on different platforms communicate over
the Internet through web services (Castronova et al. 2013; Michaelis and Ames 2009). Many
software tools have been developed to assist publishing environmental models or analysis
workflows as web services, such as GeoServer (2013), Deegree (Müller 2007), PyWPS (Becchi
2007), 52°North (2018), Esri ArcGIS, etc. Qiao et al. (2019a) developed a Tethys WPS Server to
publish web app functionality as OGC Web Processing Service to improve the reusability and
interoperability of Python-based functions or models. However, these tools are more designed
for geospatial analysis in specific programming languages. It remains challenging to publish
environmental models in various formats, execution modes, programming languages, and
supported operating systems as web services without extra efforts. Zhang et al. (2019) have
developed a service-oriented wrapper system called OpenGMS Wrapper System (OGMS-WS)
that shares complex models over the Internet by deploying them as web services via a standard
encapsulated method. OGMS-WS also provides a user interface that allows users to search and
run the services, monitor and control the executions (detailed in Section 4.2.1.1). Xiao et al.
38
(2019) have demonstrated its flexibility by deploying the Storm Water Management Model
(SWMM) as a web service-oriented system through OGMS-WS.
When publishing a model as a web service through OGMS-WS, the first step is to install
the model on the server where OGMS-WS is deployed. However, deploying models in a new
environment can be a daunting process because some models lack reproducibility and portability.
First, some models are difficult to set up due to required complex infrastructures (network,
storage and computing), and some models are even impossible to install due to the changing
computational environment and dependencies. Third-party resources and dependencies are not
static, and they are updated regularly to fix bugs, add new features, and remove old features. Any
of these changes can make the model unable to execute. Collberg et al. (2014) found that less
than 50% of software could even be successfully installed. Zhao et al. (2012a) found that the
ability and success of re-executing scientific workflows significantly reduces year by year
because of changing dependencies and missing third-party resources. Second, as the processing
power and capacity of servers increases, the need of consolidating various models onto a single
server is becoming more important. It requires the models work well across different server
environments. However, some models have conflicts when installed to the same platform, such
as they both require the same dependency but in different versions (third-party libraries,
programming languages, etc.). Model users have to either fix the conflicts (if the model source
code is available and under control) or implement one model on another platform. Model
isolation management, which refers to isolating each model and preventing the conflicts between
them on the same platform, is not only essential for service publishing tools like OGMS-WS that
require multiple models installed in the same server, but also significant for improving model
reproducibility in general. Many technologic methods have been developed for model isolation
39
or process isolation. For running applications that require different operating systems on the
same server, virtual machines (VMs) can capture the operating system and everything running on
it, provide functionality of a physical computer, and be transferred and shared as files. For
running isolated applications with conflicts in the same operating system, containers (e.g.
Docker) can encapsulate an application and all the related dependencies in a virtual container
that can run as a lightweight, standalone, and executable software. Compared to VMs, containers
are more lightweight, portable, and high-performing because they don’t contain an operating
system but share the operating system kernel with the host machine (Boettiger 2014).
In this chapter, I propose an approach for sharing and reusing complex models over the
Internet as web services by leveraging OGMS-WS and Docker. Docker is adopted as the model
isolation tool to encapsulate the model and keep the model reusable and portable. Then the
containerized model is published as accessible web services through OGMS-WS. I choose Linux
operating system for implementation because it is the leading and most commonly used
operating system on servers, but the concept of using containers to publish complex models as
web services can be applied to other operating systems as well. The remainder of this chapter is
organized as follows. First, the system design of this approach is shown in Section 3.2 with the
background introduction on OGMS-WS and Docker. This section also shows the design of an
experimental study on publishing a streamflow prediction system as a web service. In Section
3.3, the system implementation of this approach is presented together with the experimental
study results. Finally, a Discussion and Conclusion section summarizes the benefits of this
approach and outlines opportunities for future research.
40
Methods
3.2.1 Approach Design
3.2.1.1 OpenGMS Wrapper System (OGMS-WS)
OpenGMS is a platform for sharing geographic and environmental data and models
among multi-disciplinary users to solve complex geographic problems and conduct integrated
simulation. OGMS-WS is a tool in OpenGMS platform for publishing and sharing models over
the Internet as web services. It provides a standard encapsulated method to deploy models as
RESTful web services and a friendly user interface that allows users to interact with model
services, including search, execute, monitor and control.
To deploy a model as a web service through OGMS-WS (shown as Figure 3-1), a Model
Description Language (MDL) file is required to standardize the model communication with the
wrapper system. The MDL file (shown in Table 3-1) describes the model following an
encapsulation standard, which summarizes heterogeneous models in three interfaces: model
description, model execution, and model deployment. The MDL file provides all the essential
information to encapsulate and deploy the model. In addition, it is used to map the model to a
graphical user interface (GUI) that allows users to manually interact with the model service and a
RESTful API that can be used by third-party applications or clients. A detailed instruction on
documenting the MDL file can be found in Xiao et al. (2019). Then, a model encapsulation file is
required to encapsulate the model. The model encapsulation file defines the interaction between
users and the model service, including accepting requests from users and extracting model
inputs, invoking and executing the model, and returning model outputs as responses to users
when completed.
41
Table 3-1 Content of the Model Description Language (MDL) File
Interface Node Information
Model description AttributeSet name, version, abstract, keywords, category
Model execution Behavior inputs, outputs, data formats
Model deployment Runtime entry function, hardware configuration, software dependencies
Figure 3-1: OGMS-WS workflow diagram
3.2.1.2 Docker
Docker is an open source project designed to develop, deploy, and run applications by
using isolated containers (Merkel 2014). Containers allow users to bundle an application with all
of the parts it needs as one package. Containers are more lightweight than virtual machines
because they share the operating system kernel with the host machine. A typical desktop
computer could run no more than a few virtual machines at once but would have no trouble
running 100 Docker containers (Boettiger 2014). Docker was initially designed to package
42
applications in Linux systems. Docker images (a container is an instance of an image) share the
Linux kernel of the host machine, which means that Docker images must be based on Linux
system with Linux-compatible software. But Docker Linux containers can be installed and run
locally not only on Linux, but also on platforms that are not based on the Linux Kernel (macOS
and Windows), this is accomplished through the use of a small VirtualBox-based VM running on
the host OS. In recent years, Docker released Windows containers for packaging Windows
applications. Docker Windows containers share the Windows kernel with the host machine and
currently only support deploying in Windows 10 system.
There are many advantages of using Docker in environmental modeling. First, Docker
makes models reusable by providing isolated images in which all the software and dependencies
are already installed, configured and tested. Second, Docker supports most major platforms
(Linux, Windows and macOS). With Docker, models can be deployed easily across platforms. In
this way, model developers can focus on the model design and coding without worrying the
platform on which the model will ultimately run, while users can retain their familiar platform
without considering the compatibility between the model and system. Third, a Docker image is
created through reading a Dockerfile, which is a simple script that defines all the commands,
necessary dependencies with detailed versions, and the OS to assemble the image. The
Dockerfile is a small plain text file that can be easily shared. Docker also provides a public
repository (Docker Hub, https://hub.docker.com/) for publishing and sharing docker images. All
of these Docker capabilities significantly improve model sharing and versioning. Last but not
least, Docker allows users to link any directory on the host OS to the running Docker container.
This allows users to directly use data saved on the host OS and rely on familiar tools and
environments for data collecting, preparing and editing, while still executing code inside the
43
controlled development environment of the container. This avoids data transferring across
different platforms.
3.2.1.3 OGMS-WS-Docker Approach
Figure 3-2 shows the design of the OGMS-WS-Docker approach. On the OGMS-WS
server, each model is a deployed as a Docker container in which the model is installed,
configured and tested.
Figure 3-2: The OGMS-WS-Docker approach
The metadata of the model is defined in the MDL file with setting the encapsulation file
as the entry point. The encapsulation file first extract inputs from users’ requests and save them
on the server. The bind mounts method is used to transfer input files and output files between the
server and the container. Then the model can be executed through simple Docker command line
44
“docker run”. This approach ensures each model be isolated from one another. The models can
be deployed on all the operating systems that OGMS-WS supports (Windows, Linux, and
macOS) and remain executable no matter how much the sever environment changes.
3.2.2 Experimental Design
This section presents the experimental design of a case study that publishes a streamflow
prediction system as a web service through the OGMS-WS-Docker approach. The aim of this
case study is to demonstrate that this approach can lower the barrier to publishing complex
modeling systems as web services.
3.2.2.1 Streamflow Prediction System
We developed an automated streamflow prediction system to map the runoff generated
from large-scale grid-based LSMs onto high-resolution vector-based stream networks, then route
the results using a vector-based river routing model to provide streamflow forecasts at the river
level (Qiao et al. 2019b). The main benefit of this streamflow prediction system is providing
flood management support to data-scarce regions by enabling the streamflow estimates at
specific locations using global climate forecasts. It has been demonstrated that this system has
comparable accuracy with the well-established GloFAS but provides streamflow predictions on
significantly higher resolution stream networks with much less computational cost and as a data
service. Figure 3-3 shows the workflow of the system. It adopts the RAPID model for river
routing to simulate the water flow through the river network (David et al. 2011b). RAPID is
developed in FORTRAN and compiled as a dynamic link library (DLL). RAPIDpy, an open
source Python-based software tool, is included for preparing inputs in the required formats and
running RAPID (Snow et al. 2017). This system currently supports multiple large-scale LSMs,
45
including ERA-Interim (Dee et al. 2011), NLDAS (Xia et al. 2012), GLDAS (Lorenz et al.
2015), ERA5 (Karl Hennermann 2018), HIWAT (Gatlin et al. 2018), and COSMO (Rockel et al.
2008).
Figure 3-3: Workflow of the Streamflow Prediction System (grey boxes are inputs)
3.2.2.2 Streamflow Prediction Service Design
I met some difficulties when implementing the streamflow prediction system directly
through OGMS-WS. First, the RAPID model only supports the Ubuntu 14.04 system and
46
requires a number of dependencies in specific versions to be compiled. Any difference would
result in failure to compile the model. It is a daunting process to deploy it in a new platform.
Second, RAPIDpy is recommended to be installed through Conda instead of manually through
the source code because it requires a lot of dependencies in specific versions. Conda is a package
manager that allows users to create an isolated environment for a package with its necessary
dependencies in specific versions. However, when I install RAPIDpy through Conda and run it
through the wrapper system. It can’t work because the wrapper system uses the Python
environment on the host machine by default, while RAPIDpy requires to use the Python
environment in the Conda environment. In this chapter, the streamflow prediction system is
published as a web service through the OGMS-WS-Docker approach. In this way, users can
directly use the service without having to set up the streamflow prediction system locally.
For each service, OGMS-WS creates a GUI that consists of a brief description of the
service and the interactions between users and the service, including interfaces to upload inputs,
buttons to run and cancel service, buttons to download and visualize outputs, and execution
monitoring. Figure 3-4 shows the interactions between user, service GUI, OGMS-WS, and
Docker container throughout the service execution process. First, the user is asked to provide all
the required input data. The service becomes executable when all the required input files are
provided. After clicking on the “Run service” button, a service execute request with all the input
data is sent to OGMS-WS. All the services in OGMS-MS support asynchronous execution by
default, so the service GUI receives a response immediately after submitting an execute request.
The first response confirms that the request is received and accepted by the server, a processing
job has been created and will be run. After verifying all the required input data, the processing
job begins. A Docker container is created based on the Docker image of the streamflow
47
predication system. The system in the container is launched and executed with the input data
saved on the server. The service GUI receives the execution response by repeatedly checking the
execution status until the processing job has completed. After that, the execution status on the
service GUI changes to “succeed” and the user can download the result by clicking on the
activated output download button.
Figure 3-4: Interactions between user, service interface, OGMS-WS, and the model container
Based on above process, the first step to implement the streamflow prediction service in
OGMS-WS is to dockerize the streamflow prediction system. Next, I need to design the service,
48
including inputs, outputs, and the encapsulation function, and encode the MDL file and the
encapsulation file accordingly. In the end, I need to package all the required files (see Figure 3-5)
and deploy it through OGMS-WS. The design of service variables and the encapsulation function
are described as follows.
(1) Service variables design
The entry point function of the streamflow prediction system has three required
parameters: (1) full path of the LSM surface and subsurface runoff forcing files (in NetCDF
format); (2) full path of the RAPID model input files, including a file documenting the river
network's topological connectivity, two files defining the Muskingum routing parameters (k and
x), a weight table describes the area ratio of each LSM grid cell to the intersected catchments that
are delineated upon the upstream point of each river reach and a model initialization file
describing the streamflow of each river at the start time of the simulation; (3) full path where to
save the simulation result. Consequently, I set three input variables and one output for the service
as Table 3-2.
Table 3-2: Variable Design of the Streamflow Prediction Service
Service variable Data Format
Input 1: lsm_data LSM Runoff NetCDF files Zip file
Input 2: rapid_io_files RAPID input files
River connectivity file
Weight table
Muskingum k file
Muskingum x file
Initialization file (optional)
Zip file
Input 3: python_file A Python script to launch the system Python file
Output: streamflow Simulated streamflow result NetCDF
49
(2) Encapsulation function design
The functionalities of the encapsulation file include extracting model inputs from a
request, executing the model, and returning outputs as a response when the simulation is
completed. OGMS-WS provides an encapsulation template file that contains scripts for the
interactions between users and the service. Service developers are responsible for encoding the
model execution steps. For the streamflow prediction service, it includes the following steps to
execute the model: 1) build a Docker container from the Docker image of the streamflow
prediction system, 2) mount the folder with user input files to the Docker container, and 3) run
the model. According to Docker Command Line Interfaces, a command line like the following
can mount a folder to a Docker container and execute the model file in it.
docker run --name [container name] -w [path to the model file in container] -
v [source folder]:[target folder in container] [Docker image] [command to run
the model file]
On the OGMS-WS server, each service instance creates a separate folder. In each service
instance folder, each input variable creates a separate folder that holds the corresponding
uploaded input files. In this way, the service instance folder can be mounted into the Docker
container and the model can be executed by running the mounted file. The command line is:
docker run --name rapidpy -w /home/python_file -v ~/mainProcess:/home
xhqiao89/rapidpy:1.0 python run_rapid.py
In the end, I packaged all the required files following a standard structure shown in
Figure 3-5 and deployed it as a web service through the OGMS-WS user interface.
50
Figure 3-5: The streamflow prediction service deployment package
3.2.2.3 Service Application in Bangladesh Historical Streamflow Analysis
This section aims to demonstrate the benefits of publishing complex models as web
services by using the streamflow prediction service to solve a hydrological research problem.
Bangladesh is regularly devastated by flooding due to its low sea level and large rivers. Several
catastrophic floods happened during the past 30 years. Over 75% of the total area of the country
was flooded in the 1998’s flood. However, it is difficult to build an effective flood early warning
system to manage floods for Bangladesh because lacking hydrologic data, technology and human
resources. Long term coherent hydrologic data is essential for building an effective flood early
warning system, such as calculating flood stages and return periods for different flooding levels,
investigating flooding process, model calibration, and assisting planning and decision making for
future extreme floods (Lee et al. 2017). For data-scarce countries like Bangladesh, that lack
sufficient equipment, technology, and human resources to maintain a stable hydrologic data
51
collection and management system. In this situation, leveraging satellite data and global models
can be an efficient way to obtain baseline hydrologic information and supplement whatever other
resources are available.
ERA-Interim/Land is a global land surface reanalysis dataset with parameterization-
improved European Center for Medium-range Weather Forecasts (ECMWF) model driven by
meteorological forcing from the ERA-Interim atmospheric reanalysis and observed precipitation
adjustments (Balsamo et al. 2009; E. L. Wipfler 2011; G. Balsamo 2015). ERA-Interim/Land
provides integrated and coherent daily land surface estimates from 1980/01/01 to 2014/12/31. It
has been widely used in hydrological research to provide hydrologic baseline information for
data-scarce regions and used as the initialization of weather prediction models (Clement Albergel
2018; Dee et al. 2011; G. Balsamo 2015). However, the low resolution (80 km) of the ERA-
Interim/Land dataset makes it difficult to provide usable streamflow information that people
would expect to obtain at specific locations on local stream networks instead of the total runoff
volume per grid cell (6400 km2).
In this chapter, I used the streamflow prediction service to convert the low-resolution
ERA Interim/Land runoff reanalysis data into streamflow at local rivers to provide historical
streamflow estimates for Bangladesh. I ran the service with 20 years (1995-2014) ERA-Interim
runoff reanalysis data to investigate the streamflow levels and changes in the main streams of
Bangladesh during this period.
52
Results
3.3.1 Streamflow Prediction Service
I have successfully deployed the streamflow prediction model in OGMS-WS and
published it as a web service
(http://cosmo.byu.edu:8060/modelser/preparation/5deef20a230f6b6e9bbae443). The service
requires three inputs: a zip file of the LSM runoff NetCDF files, a zip file of the RAPID input
files, and a “run_rapid.py” Python file to run the model. It returns a NetCDF file with the
simulated streamflow in each river segment. The service can be consumed through either the
service GUI or the service API.
OGMS-WS (see Figure 3-6a) provides multiple modules for different functions. Through
the home page or the left navigation bar, users can check the available local services hosted in
the current OGMS-WS. Users can deploy their own models through the “Deployment” module.
After invoking a service, users can check its execution status in the “Instances” module. When
the service execution is completed, users can check the service configuration, log info, and
results of historical services executions in the “Records” module. Users can check the historical
uploaded data in the “Data cache” module. OGMS-WS also provides other user services like
notifications, system information, linking remote services, etc. After the streamflow prediction
system was deployed in OGMS-WS, the service is listed in the local services list showing its
name, version, type, status, permission, accessibility and allowed operation options (as Figure 3-
6b). Users can click on the “invoking” button to enter the service GUI (as Figure 3-7). The
service GUI provides a brief introduction on the service and all the interactions required to
execute the service, including buttons and popup windows to upload input data files, run and
53
cancel the service (as Figure 3-7b). The GUI also shows the service execution status. After the
execution is successfully completed, users can directly download the output through the
download button (as Figure 3-7c).
(a)
(b)
Figure 3-6: Continued OGMS-WS GUIs
54
(a)
(b)
(c)
Figure 3-7: Streamflow prediction service GUIs
55
Another method to consume the service is through API. The services hosted in OGMS-
WS are RESTful web services. They support all the CRUD (create, retrieve, update, delete)
operations and can be invoked by the HTTP methods (HTTP GET, POST, DELETE, and PUT)
and URL-based requests that RESTful APIs support. The HTTP GET method with a URL
composed of a series of request parameters and input data is commonly used to retrieve
information, send simple requests or requests with simple inputs. To exeute time-consuming
services or services with complex inputs, it is recommanded to use the service client Software
Development Kit (SDK) developed by the OpenGMS group
(https://gitee.com/geomodeling/NJModelServiceSDK).
3.3.2 Bangladesh Historical Streamflow Analysis
I ran the streamflow predictive service for Bangladesh with the ERA-Interim/Land runoff
reanalysis data from 1995/01/01 to 2014/12/31. The OpenGMS service client SDK was used to
interact with the service, including sending requests with input files to the service and receiving
responses with outputs from the service. Through the service, I obtained 20 years daily simulated
streamflow in the high-resolution river network of Bangladesh (shown as the blue lines in Figure
3-8). I selected 4 reach segments (shown as the red circles in Figure 3-8) located in four major
rivers to investigate the streamflow level changes from 1995 to 2014. Reach A locates at the
Ganges river and reach B belongs to the Jamuna river. The Ganges river ang Jamuna river join
together to form the Padma river, where reach C locates. Reach D is the last segment of the
Meghna river before joining with the Padma river. The Meghna river is the longest river in
Bangladesh.
56
Figure 3-8: High resolution river network in Bangladesh
Table 3-3: Selected Reach Segments
Reach COMID Located river
A 281577 Ganges
B 273300 Jamuna
C 288491 Padma
D 292364 Meghna
Figure 3-9 shows the boxplots of their monthly average streamflow in these 20 years.
From these results, I found that the rivers in Bangladesh have clear flood and dry seasons every
year. The streamflow varies a lot through the year and the differences between flood season and
57
day season can be enoumrous. The river flood season is mainly from June to October. The
streamflow increases significantly and remians high from July to September.
Figure 3-9: Streamflow monthly distribution from 1995 to 2014
Figure 3-10 shows the simulated streamflow of the selected four reach segments from
1995 to 2014. From them, I found that there are a lot of flash floods happened during the flood
season with the streamflow increased rapidly from a couple of hundreds to tens of tousands cubic
meters per second in a few days and then quickly fell down. Flash flood is the most destructive
disaster in Bangladesh as it happens with little or no advance warning for preparation and
evacuation, and it also carries more and coarser sediment than normal floods. In Figure 3-10, I
found significant high streamflow around 1995, 1998, 2004 and 2005, which is consistant with
58
the catastrophic flood records of Bangladesh (Wikipedia 2019). Above all, global datasets are
certainly not as accurate as observations. But for hydrologic data poor regions like Bangladesh,
they can quickly convert global datasets into streamflow at rivers by using the service and obtain
some baseline hydrologic information, hence assisting flood management and research.
Figure 3-10: Simulated streamflow of selected reach segments
59
Discussion & Conclusion
The motivation of this section is to create an approach to improve publishing complex
environmental models as web services, thereby facilitating model sharing and reuse. Complex
environmental models always have high entry barriers and require complex infrastructure and
dependencies. A lot of research focuses on developing tools for publishing web services,
however; challenges still exist when publishing complex models as web services, such as the
high complexity and low reproducibility of the models, platform compatibility, dependencies
conflicts, and so on. Some models have conflicts with each other when deployed on the same
platform, while some models can’t even be installed or executed due to the changing
environment and dependencies. I designed and presented an approach to publish complex
environmental models as web services by leveraging OGMS-WS and Docker. OGMS-WS was
chosen as the service publishing tool because it is lightweight, low entry bar, and the success in
working with complex geospatial models. But OGMS-WS lacks isolation management of
deployed models, which makes it difficult or impossible to deploy incompatible models on the
same platform. Docker was included as the model isolation tool to resolve conflicts and keep
each model reusable and portable. I published a complex streamflow prediction system as a web
service using this approach and used the service to generate 20 years (1995-2014) high-
resolution streamflow data in Bangladesh from the ERA-interim/Land global land surface
reanalysis dataset to analyze the historical flood trends in Bangladesh.
Based on the experimental case study and our experience, the key outcomes of this
approach for environmental modeling include:
1) Further lowering the barrier to publishing complex environmental models or modeling
systems as long-term functional web services. It has been demonstrated that OGMS-WS can
60
significantly lower the barrier to publishing complex environmental models as web services
by providing task management, asynchronous execution, data storage, etc. However, there
still exist two main challenges to implementing web services through OGMS-WS. First,
some models may be difficult or even impossible to be installed and executed in a new and
different environment. Second, it is still a daunting process for users to prepare the model
deployment package for OGMS-WS, including writing an encapsulation function to invoke
the model and packaging all required resources in the correct way. This approach allows
users to set up models in Docker containers with their familiar environments. The
encapsulation function is also unified to Docker command lines instead of scripts to execute
the model, which is much simpler and more uniform.
2) Avoid deployment conflicts. This approach keeps each model isolated to avoid the potential
issues such as dependency or environment conflicts with OGMS-WS or other existing
models. For example, different versions of the same model can be deployed on the same
computer platform as separate services.
3) Cross-platform. I deployed this approach in the Linux system as an example to demonstrate
its utility, it can also be used in Windows and macOS because both OGMS-WS and Docker
are cross-platform software.
4) Facilitate model sharing and reuse. Once a model is published as web service, it can be
directly requested and executed through URLs, which facilitates a model to be accessed
through the Internet without being downloaded and saves users time and effort in model
installation and configuration. The services can be directly integrated with online data
services to perform continuous analysis. In addition, web services provide a standard method
for models written in different programming languages or running on different computer
61
platforms to communicate with each other over the Internet, so models can be easily and
flexibly integrated to complex environmental modeling systems through web services.
In addition, this work has demonstrated the benefits of using containers for model
isolation management to avoid the potential conflicts in the service publishing process. This
work can also serve as guidance for applying container on other service publishing systems, and
as a demonstration for using containers for model isolation on servers. I expect that this work
and outcomes can motivate contribution on complex environmental model web services to
maximize current outcomes and facilitate environmental modeling research. I have provided only
a streamflow prediction service case study as a proof of the benefits of publishing complex
environmental models as web services using this approach. The next step of this work is to
implement more web services for complex models to fully demonstrate the utility and understand
the limitations of this approach.
62
4 SIMPLIFYING THE DEPLOYMENT OF OGC WEB PROCESSING SERVICES
(WPS) FOR ENVIRONMENTAL MODELING – INTRODUCING TETHYS WPS
SERVER
Introduction
Environmental modeling can be extremely complex because it includes dynamic physical,
chemical, biological, and human processes that can vary at different spatial and temporal scales.
An accurate environmental modeling system often requires integrating multiple models, data
sources, and analysis routines into a workflow to address a research question, and requires some
level of cooperation among experts with different backgrounds. The concept of chaining
interoperable model components is gaining momentum for modeling because such a chain can
potentially answer more questions than the individual models alone (Castronova et al. 2013;
Dubois et al. 2013). Generally, we can achieve model interoperability by sharing input and
output files, directly rewriting models into a single software system, or establishing software
architecture principles that facilitate the coupling of independent models (Belete et al. 2017;
Granell et al. 2010). In the model coupling approach, models are written in a modular way in
which each model performs an isolated task while the whole workflow addresses a much broader
problem. This approach enables each model to remain as flexible, extensible, and reusable as
possible (Argent et al. 2006; Castronova et al. 2013; Schaeffer 2008). However, integrating
multidisciplinary heterogeneous models still involves barriers such as different programming
63
languages, operating platforms, and user interfaces. It is also difficult for researchers to
understand models from other disciplines (Jiang et al. 2017; Yue et al. 2016). Hence, setting up a
system to make heterogeneous models easily connected remains a challenge.
Service-Oriented Architecture (SOA) has been widely used to integrate models and build
scientific workflows (Bosin et al. 2011; Lin et al. 2009; Yue et al. 2016; Zhao et al. 2012b). This
standards-based, loosely-coupled, and service-oriented approach emphasizes the decomposition
of a system into functional components that communicate via web services (Castronova and
Goodall 2010). Web services enable communication and interoperation among various
applications over the network by using open standards and protocols, which means applications
written in different programming languages or running on different platforms can seamlessly
exchange data over the Internet through web services (Michaelis and Ames 2009; Nativi et al.
2013). The model developer can therefore focus more on how to structure the internal algorithms
(Goodall et al. 2011). It has been shown that SOA-based applications can address the issues of
data accessibility and service interoperability for environmental models (Granell et al. 2010;
Huang et al. 2011; Mason et al. 2014; Zhao et al. 2012b). Also, it has been demonstrated that
SOA has significantly improved dealing with complex problems by making data and models
available on the Internet as interoperable services (Dubois et al. 2013; Nino-Ruiz et al. 2013;
Skøien et al. 2013; Yang et al. 2009). While numerous environmental web services exist for
water data distribution [e.g. CUAHSI HIS (Maidment et al. 2009), HydroShare (Horsburgh et al.
2015)] and for web mapping (Blower et al. 2013; Palomo et al. 2017), environmental models as
web services is still an area of research that has not been widely investigated and implemented
(Belete et al. 2017; Hu et al. 2015; Vitolo et al. 2012; Vitolo et al. 2015).
64
The Open Geospatial Consortium (OGC) has promulgated the Web Processing Service
(WPS) specification, which has been demonstrated as an efficient technology for publishing
geospatial processes and constructing integrated model chains (Castronova et al. 2013; Schaeffer
2008; Tan et al. 2015). OGC is the foremost organization guiding the development of standards
and specifications for geospatial services, and it manages a global consensus process that results
in approved interface and encoding standards that enable interoperability among and between
diverse applications. OGC WPS is a commonly recognized specification for publishing processes
as web services. It specifies a standard method for processing data as a request from client to
server, using eXtensible Markup Language (XML) for communication through the Internet
(Schut and Whiteside 2007). OGC WPS also standardizes the interoperation and communication
between WPS, which provides flexibility to create workflows for different purposes. It has also
enabled openly sharing models over the Internet to address the needs for complex simulations
that involve diverse data and algorithms (Kumar and Saran 2014).
Many recent studies have demonstrated orchestrating OGC WPS into environmental
workflows. Feng et al. (2011) have developed five ecosystem disciplinary models that comply
with the OGC WPS specification, and a Geospatial Model Sharing Interface (GeoMSI) that
provides a service-interactive interface for sharing/accessing models through WPS. A modeling
system that implements a time-driven process simulation of wetland ecosystems has been
established through integrating these five remote services. Vitolo et al. (2012) implemented a
web service-based modeling system for hydrological modeling on the basis of the FUSE
modeling framework (Clark et al. 2008) exposing hydrological processes using an open source
library called PyWPS. Castronova et al. (2013) have built a WPS process of the TOPMODEL
hydrologic model and used it within a client application. They advanced PyWPS to maintain
65
state and store data values on the server, in order for the WPS to perform time-dependent
calculations. Kumar and Saran (2014) designed and implemented a WPS process that fetches
data from an OGC Web Coverage Service (WCS) server and performs the Normalized
Difference Vegetative Index (NDVI) calculation. Astsatryan et al. (2015) developed several
vegetation index computation WPS processes using PyWPS, and a user-friendly portal using the
Joomla content management system to enable users to compute either a single vegetation index
or a combination of vegetation indices as a workflow. Some studies also include cloud
computing and parallel computing to improve the performance of complex WPS processes. Tan
et al. (2015) constructed an elastic parallel OGC WPS on a cloud-based cluster to provide high-
performance computation. Astsatryan et al. (2015) designed and applied HPC clusters with an
effective parallelization algorithm to speed up the execution time of the vegetation index
computation WPS.
Compared with web services for water time series data (e.g. CUAHSI WaterOneFlow)
and spatial data (e.g. OGC web mapping and web feature services), the implementation of WPS
for environmental modeling and data analysis is still not particularly common. Indeed, the
existing literature is more heavily weighted towards specific services for particular
environmental workflows rather than general tools to support creating WPS services. One
obvious exception to this observation is the readily available capabilities of the commercial
Environmental Systems Research Institute (Esri) ArcGIS Server software, which can easily
publish data analysis and modeling services. ArcGIS Server uses a proprietary format for
services consumed by the ArcGIS desktop applications, and it can also serve WPS services to be
consumed in web applications and other desktop software (Esri 2018). Notable recent examples
66
of data processing services developed and deployed using ArcGIS Server include (Leonard et al.
2014; Yue et al. 2015).
I posit that there is much value in exposing existing models and data analysis workflows
via WPS in addition to GUI for optimal flexibility. This work presents the design, development,
and testing of an OGC WPS system on the Tethys Platform – a collection of existing open source
GIS tools wrapped in a Django/Python framework. I elected to use Tethys Platform as the
development environment because it is open source and provides a complete and relatively
uncomplicated environment for web-based environmental modeling applications development.
Tethys platform is intended for the development and deployment of web “apps” that are
characterized as generally single-purpose, lightweight web applications that are independently
developed, and are installed and uninstalled from a web app portal – mirroring the app paradigm
of mobile devices. (Swain et al. 2016) demonstrate that Tethys Platform successfully lowers the
barrier to developing web apps in the environmental domain by providing (1) a suite of software
that meet the spatial and computational capabilities commonly required for environmental
modeling, including tools for geoprocessing of spatial data, distributed computing, database
management, map rendering and visualization; (2) a programmatic means to use each of the
recommended software systems in a single programming language, Python; (3) a reduction to the
web development skills required to develop web apps; and (4) a web-safe mechanism for
deploying the finished web apps that is flexible enough to work on the most common means for
obtaining hardware (Swain et al. 2016).
The research goal of this chapter was to generate an approach of simultaneously exposing
web app functionality as WPS when developing a web app. This approach makes the process of
WPS development almost transparent to developers. In this way, environmental modelers and
67
developers can focus on creating web apps, and Tethys Platform takes care of creating the
corresponding WPS services. In addition, exposing apps’ functions as WPS processes can
improve the interoperability and reusability of apps. It also facilitates the implementation of
complex modeling, including chaining apps for integrated modeling.
The remainder of this chapter is organized as follows. The system design and
implementation of Tethys WPS Server is shown in Section 4.2. I also present the design of a
“HydroProspector” hydrological modeling system to demonstrate how Tethys WPS Server
enables and simplifies environmental workflow modeling. The results of this research are
described in Section 4.3. A detailed discussion on the benefits of this research for environmental
modeling and how it successfully facilitates complex environmental modeling web app
development are presented in Section 4.4. Section 4.5 provides conclusions and opportunities for
future work in this area.
Methods
4.2.1 System Design
The fundamental purpose of Tethys WPS Server is to provide a simple method to expose
function(s) of environmental web apps as WPS. This allows environmental modelers and
developers to focus more on web app development and less on the details of the OGC WPS
specification. A Tethys app project is organized using the Model View Controller (MVC)
software architectural pattern (Leff and Rayfield 2001; Swain 2015). The Model is the app’s data
structure. It directly manages the data, logic and rules of the app. The View is the visualization of
the data. The Controller handles user input and performs interactions on the data model or view.
68
A Tethys app receives input datasets (the data Model) from users through a GUI, performs
computation in the backend Controller, and returns outputs to the web browser (the View). The
core part of a Tethys app is its controller functions, which are implemented in Python. Similar to
Tethys apps, the mechanism of WPS also includes accepting input parameters from a client,
executing processes in the backend, and returning outputs to the client. Hence, it is feasible to
disseminate Tethys app functions as WPS processes. The primary difference between Tethys
apps and WPS processes is each Tethys app defines its own specific formats of inputs and
outputs, while the inputs and outputs of WPS processes are uniformly defined in XML format.
Each Tethys app typically contains its own data analysis or modeling workflow to
provide a solution for one specified task or several tasks as shown in Figure 4-1. Here we use the
term, “workflow” to indicate the overall functionality of a web app. A workflow is comprised of
one or more computational processes. Such processes can be hosted within the app itself or can
be included as web processing services provided by other apps. For example, the workflow of a
runoff prediction app includes: a “snap” function that moves a user-defined point to the nearest
river; a “watershed delineation” function that generates watershed boundaries based on the
snapped point; and a "runoff calculation” function that calculates runoff within this watershed
area using the latest precipitation forecasts. Each function of this workflow can be part of other
water resources modeling tasks. With Tethys WPS Server, these functions can be exposed as
OGC WPS processes and reused in the workflow of other Tethys apps or other third-party clients
that support the OGC WPS specification.
69
Figure 4-1: Tethys WPS Server system design
The general system architecture of Tethys WPS Server and its relationship with Tethys
Platform is shown in Figure 4-2. Tethys Platform consists of three major components: Tethys
Software Suite, Tethys Software Development Kit (SDK), and Tethys Portal. Tethys Software
Suite synthesizes several Free and Open-Source Software (FOSS) projects that are needed for
developing environmental modeling applications, including GeoServer (2013) for spatial datasets
publishing, 52°North (2018) for geoprocessing tools, PostgreSQL (2018) database with PostGIS
(2018) extension for spatial datasets storage, HTCondor (2018) for distributed computing, and a
number of JavaScript libraries like OpenLayers (2018) and HighCharts (2018) to help
visualization. Tethys SDK provides multiple Python APIs for each supporting software element.
This makes it possible to orchestrate the functionality of each software using Python in web app
70
scripts. Tethys Portal is a web portal used to host web apps that developed with Tethys Platform
(Christensen 2016; Swain 2015).
Figure 4-2: General system architecture of Tethys WPS Server
In this study, Tethys WPS Server is implemented as a plugin component that has no
effect on other components and functions of Tethys Platform. The purpose of this design is to (1)
leave apps developed based on former versions of Tethys Platform unaffected and (2) make
exposing WPS a flexible option, which means the app can execute normally with or without
71
exposing internal functions as WPS processes. After exposing various functions/processes as
WPS, clients are able to send requests to the server to fetch related information about specific
processes or execute processes. Once the WPS server is invoked by a request, it will
automatically accept input data or fetch data from specified databases (generally with URLs),
perform computation with the data, encode the result into XML and send it back to the client.
4.2.2 System Implementation
The first challenge of implementing Tethys WPS Server is to integrate an existing
software product into Tethys Platform to provide server-side implementation of OGC WPS
interface specification. By “server-side implementation”, I refer specifically to implementing a
WPS server on a web server to handle requests from clients and responses back to clients. By
using existing OGC WPS implementation software, we can avoid having to build this
functionality from scratch and can take advantage of the debugging and testing of the software
by others. Of course, this is the self-evident advantage of using any third party module. As
Tethys Platform is Python-based (Jones et al. 2014) and I aim to directly disseminate app
functions as WPS without too much modification, the implementation project is required to
support Python scripting. Doing so can also facilitate code debugging when developing WPS
processes. Table 4-1 lists several popular software packages that support server-side OGC WPS
implementation. GeoServer WPS and Deegree WPS (Müller 2007) are both Java-based
frameworks and only support Java scripting. 52°North WPS is Java-based implementation.
According to its documentation, 52°North supports Python scripting. However, the Python
support in 52°North was originally designed for use within the Esri ArcGIS Python library in the
Windows operating system – limiting cross-platform usage (Boerboom 2013). There is also a
52°North module for exposing ILWIS GIS (2001) functions as WPS services. Unfortunately,
72
there does not appear to be support in 52°North for exposing native Python WPS functions on a
Linux server. ArcGIS Server WPS refers to its ModelBuilder module that supports exposing
ArcGIS geoprocessing services as OGC WPS. PyWPS (2009) is a Python-based server side
implementation of WPS, allowing integration, publishing and execution of Python processes via
the WPS standard. It was originally developed for implementing GRASS GIS (2018a) operations
as OGC WPS processes and has shown to be well-suited for integration with Tethys Platform.
Table 4-1: Server-Side Implementation of OGC WPS Specification Standard
Software Code Base WPS Process Development
Copyright
GeoServer WPS Java Java Open source
52°North WPS Java Java, R, Python(Windows) Open source
ArcGIS Server WPS C++ (ArcObjects) Python (ArcTools) Proprietary
Deegree WPS Java Java Open source
PyWPS Python Python Open source
PyWPS version 4 utilizes Werkzeug (2011), an open source Web Server Gateway
Interface (WSGI) utility library, to make the WPS functionalities written in Python be accessible
over the Internet. There are four core classes in PyWPS: Service, Process, WPSRequest, and
WPSResponse. The Service class streamlines the request pre-processing, process execution, and
the response post-processing into a complete Hypertext Transfer Protocol (HTTP) request-
response cycle. A complete HTTP request-response cycle begins with a client sending
a request message to a server; the server handles the request and sends a response message back
to the client. The Service class is the top-level object representing the WPS service; it can run as
a WSGI application. The Process class specifies the attributes attached to a generalized WPS
73
process (including metadata terms and a handler function to hold process logic implementation)
and a set of functions to operate on a Process object. Each WPS process accepts a WPSRequest
argument and returns a WPSResponse object. The workflow of PyWPS is shown in Figure 4-3.
When the WPS server is invoked by a HTTP request, PyWPS utilizes Werkzeug to create a
Werkzeug Request object containing metadata about the HTTP request, and then packages it to a
PyWPS WPSRequest object. OGC WPS specifies three operations that can be requested by
clients and performed by a WPS server, including GetCapabilities, DescribeProcess, and Execute
(see Table 2). For “GetCapabilities” and “DescribeProcess” requests, PyWPS directly returns
Werkzeug response objects. For “execute” requests, PyWPS loads the appropriate WPS process
and passes the WPSRequest to the function. The function is responsible for returning a
WPSResponse object, which can then be converted to a Werkzeug response and sent back to the
web server. Tethys Platform is powered by Django framework, of which the primary deployment
platform is WSGI. When a Tethys page is requested, Django creates a Django HttpRequest
object including the request metadata. Then Django passes the HttpRequest to the specified
function and returns a Django HttpResponse object.
Figure 4-3: PyWPS workflow
74
Because Tethys Platform and PyWPS are both WSGI-compliant applications, they can be
deployed on the same web server and coexist as two separate web applications running in
parallel with different Python interpreter instances. However, for a seamless integration it is
preferable for the two applications to use the same Python interpreter instance and thereby share
Python-level objects (variables and functions) as one web application. To allow for this seamless
integration, I choose to ‘downgrade’ PyWPS from a WSGI application to a pure Python package.
This was accomplished by removing the Werkzeug decorator Request application from the
callable functions in the Service class and the WPSResponse class. The slightly modified
PyWPS still has the full implementation of WPS functionality that can be consumed as normal
Python classes or functions from within Tethys. In addition, since the interfaces of PyWPS
classes and functions are unchanged, they still only accept Werkzeug Request objects as input
and return Werkzeug Response objects as output. A conversion between Werkzeug-versioned
Request/Response objects and Django-versioned Request/Response objects is required. The
general workflow is depicted in Figure 4-4.
Figure 4-4: Integration of PyWPS in Tethys Platform
75
Figure 4-5: WPS access to the algorithm of a Tethys app
After integrating PyWPS with Tethys Platform, the next step was to establish a method to
connect Tethys apps with the WPS server. In PyWPS, each WPS process is defined as a Process
class in a Python script containing metadata configuration and a handler function holding
detailed execution algorithm. The handler method can directly call outside functions as a whole
or part of the WPS process. To support this behavior, I created a WPS definition file (as shown
in Figure 4-5) that is included with the Tethys app source code, containing a PyWPS Process
class definition template. The app’s functions can be called in the handler function to be
disseminated as WPS processes. In PyWPS, all the WPS processes should be located in the same
directory to be imported in the server. To implement this capability, Tethys Platform
automatically collects all the WPS definition files in each app hosted on the server and creates
76
symbolic links of them in a location identified in the WPS server configuration. Under this
approach, the WPS definition files located in deployed Tethys apps are imported to the WPS
server. As shown in Figure 4-5, the app’s algorithm can be written in a separate file that can be
used by both the app GUI and the WPS process. The WPS feature is thereby a new route to
access the app’s algorithm and any update to the algorithm will be automatically available in
both the app GUI and the WPS process.
4.2.3 Experimental Case Study Design
This section presents design of use cases to illustrate (1) bi-directional communication
between Tethys apps and Tethys WPS Server, including how the functions of a Tethys app can
be exposed as WPS processes on Tethys WPS Server and how services hosted on it can be
included in another Tethys app as components of the workflow; (2) consuming services hosted
on Tethys WPS Server with third party WPS clients. The aim of these use cases is to test and
demonstrate how Tethys WPS Server can simplify environmental OGC WPS development and
significantly lower the barrier to developing complex web apps.
4.2.3.1 HydroProspector Use Case
Consider “HydroProspector” as a reservior management model that calculates the storage
capacity curve of a reservior to help decision-makers choose the best dam location. The storage
capacity curve, which refers to the relationship between dam height and reservior storage
capacity, is important for dam planners, designers and operators (Issa et al. 2017). To calculate
the storage capacity curve, the first step is delineating a watershed based on a dam location. The
watershed basin defines the maximum reservior size and works as a boundary for latter
calculation. Then the reservoir’s storage capcity are calculated for different dam heights. Finally,
77
the storage capacity curve is plotted using different dam heights with corresponding calculated
storage capcity. The HydroProspector model contains a complex geoprocessing workflow that
requires a relatively long processing time (about 2 minutes, see compute time discussion in
Section 4.3.3). Challenges with developing this app include but are not limited to, complex
geoprocessing coding, handling asynchronous execution, and job management. Furthermore, the
watershed delineation process and reservoir storage calculation process are both essential and
commonly-used in environmnetal modeling. Exposing these processes as web services can make
them more easy to be reused in other complex models. To demonstrate this, I developed a
service-oriented Tethys app that performs HydroProspector model in the Dominican Republic
Country (workflow shown as Figure 4-6).
Figure 4-6: HydroProspector app workflow
Instead of coding from scratch, the app consumes two WPS processes hosted on Tethys
WPS Server: watershed delineation WPS and reservoir calculation WPS. These two WPS
processes are respectively exposed from two Tethys apps:Watershed Delineation for Dominican
78
Republic App and Reservoir Calculation for Dominican Republic App. Both of these two Tethys
apps are functionally independent with their own graphical user interfaces.
Both of these two WPS processes use GRASS GIS functions for geoprocessing
calculation. GRASS GIS provides GIS functions that can be accessed via Python code (Neteler
et al. 2008). The watershed delineation service (Figure 4-7) accepts coordinates of a point and
returns the snapped point at nearest stream and a watershed in geographic markup language
(GML) format. The service first performs a GRASS GIS function, r.watershed, to generate a set
of maps indicating flow accumulation and flow direction. Then a point indexing function is
performed to link the given outlet to the nearest stream network via a shortest snap function.
Finally, a watershed basin is calculated with the snapped outlet point and the flow direction map.
Figure 4-7: Watershed delineation process diagram
The reservoir calculation service (Figure 4-8) calculates the boundary and volume of a
reservoir, which is generated based on coordinates of a pour point and a designed water level at
the pour point. A maximum boundary polygon is required as input to subset the digital elevation
model (DEM) file to reduce the execution time of raster calculation process. The service uses the
GRASS GIS function, r.lake, to generate a reservoir raster from the dam point. The resulting
79
reservoir raster, based on which the reservoir boundary and volume are generated, contains cells
with values representing reservoir depth and NULL for all other cells beyond the reservoir. Both
the output of the watershed delineation service and one input of the reservoir calculation service
are polygons in the GeoJSON format. Having variables in the same format enables the
connection of these two services. Each of the services hosted in these Tethys WPS Server
instances can be executed using the OGC WPS specification.
Figure 4-8: Reservoir calculation process diagram
The HydroProspector app user interface consists of a user interactive map and a chart
(app code flow shown as Figure 4-9). The user first can add a point anywhere on the map with a
mouse click. After selecting a point, the coordinates of the user-clicked point are passed to the
app controller using Asynchronous JavaScript (AJAX). The app controller reads in the
coordinates and sends it as a request to the watershed delineation service on Tethys WPS Server.
As the watershed delineation service execution is finished, the controller extracts the watershed
and snapped point results and passes them to the map. Using AJAX, the map periodically checks
the controller for progress status and adds the watershed and snapped point to the map once the
80
processing is finished. Next, the user defines dam height and interval, then clicks the calculate
button. Each gage height with the watershed and snapped point from the first step are passed to
the controller using AJAX . Then the controller submits a request to the reservoir calculation
service on Tethys WPS Server. Once the service execution is finished and the response is
returned to the controller function, the controller function extracts the reservoir boundary and
volume and passes them to the GUI. To sequentially get the proposed reservoir boundary and
volume results, the data with updated gage height is passed until the former processing is
finished. After each AJAX function is completed, the resevoir boundary is added to the map and
the reservoir volume is drawn in the chart. The process is completed when the storage capacity
values for all gage heights are calculated.
Figure 4-9: Interactions between User, Template, Controller, and Tethys WPS Server in the HydroProspector App
81
4.2.3.2 WPS Client Use Case
One advantage of Tethys WPS Server is that the WPS services hosted on it can be
consumed not only by Tethys apps, but also by various types of clients set up in accordance with
the OGC WPS implementation specification. The simplest client of a WPS service is a web
browser. The WPS services can be consumed through a web browser using HTTP GET method
with Key Value Pair (KVP) encoded requests. Responses and exceptions are then returned to the
web browser. However, there are also some third-party clients and libraries capable of
consuming OGC WPS services. In the HydroProspector app, I use OWSLib (2017) to interact
with Tethys WPS Server, including sending requests to the server and receiving responses from
services. OWSLib is a WPS client library that provides a common API for accessing and
executing OGC WPS services. I also used an open source desktop GIS software, QGIS (2018b)
as a WPS client to demonstrate that the WPS processes hosted on Tethys WPS Server conform to
the OGC WPS standard and are able to be consumed by other third-party software that supports
OGC WPS.
Results
4.3.1 Tethys WPS Server
I developed an API for exposing Tethys app functions as OGC WPS processes. When a
new Tethys app project is created, a WPS definition file (Figure 4-10) is automatically included
in the app source code. The WPS definition file provides a template that allows app developers
to expeditiously define a WPS process. Each WPS process is defined as a PyWPS Process class,
which contains an initialization function and a handler function. The initialization function
82
includes the definition of inputs, outputs, service metadata, and server-based parameters. The
handler function specifies the actual operation of the WPS process. Other outside functions
defined in the Tethys app can be called in the handler method to become a WPS process or part
of a WPS process. This design allows the WPS process to be updated simultaneously with any
modification of the app.
Figure 4-10: WPS definition template file. The green box contains metadata of the service; the blue box contains the definitions of inputs and outputs; the red box contains the actual operation, where can directly call functions in the Tethys app.
The services described in the WPS definition file are not activated automatically. They
must be activated (or deactivated) using command line statements created to help manage WPS
processes on Tethys WPS Server. These include:
• tethys wps list – Lists all published and unpublished WPS processes. This lists
the available published and unpublished processes on the Linux command line with no
further interaction required from the user.
83
• tethys wps publish – Publishes selected WPS processes. This interactive
command results in a prompt to the user requesting the name of a specific app and
associated service to be published.
• tethys wps remove – Removes selected WPS processes. This interactive command
results in a prompt to the user requesting the name of the service to be removed.
The benefits of using command lines include simplifying common management tasks
related to WPS and allowing app developers the flexibility to modify WPS processes without
affecting the apps’ existing operation through its graphical user interface.
4.3.1.1 Interaction with Tethys WPS Server
OGC WPS specifies three operations that can be requested by clients and performed by a
WPS server, including GetCapabilities, DescribeProcess, and Execute (Table 4-2). Supported
data formats for inputs and outputs include literal data (character, string, value, date, etc.),
complex data (vector, raster), and bounding box data (bounding box area with coordinate
reference system). WPS servers can be invoked using the HTTP GET method with a KVP
encoded request or the HTTP POST method using XML format for the request. Generally, the
KVP encoding method is accepted for GetCapabilities and DescribeProcess requests, while the
Execute requests often adopt XML encoding. The KVP encoded requests are written as URLs
composed of a series of request parameters and input data. They are commonly used for simple
requests or requests with simple inputs (like literal data). While sending an Execute request that
includes complex data (vector, raster) or bounding box data, the XML coding method is
recommended over KVP. The inputs can be included directly in the request, or they can
84
reference web accessible resources. The outputs can be returned in the form of an XML response
document, either embedded within the response document or stored as web accessible resources.
Tethys WPS Server, as an OGC WPS implementation, supports executing processes
either synchronously or asynchronously. For the synchronous execution, a WPS client sends an
execute request to Tethys WPS Server and keeps listening for a response until the process has
completed and returned the result. In this situation, the client and WPS server should keep
connected. For the asynchronous execution, a WPS client receives a response immediately after
sending an execute request to Tethys WPS Server. The response confirms that the request is
received and accepted by the server, a processing job has been created and will be run. The
response contains execution status and a job identifier, with which the client can check the
execution status. It also contains the location, usually as URL, where the execution result can be
found after the processing job has completed. Typically, asynchronous execution is
recommended for WPS processes, as geoprocessing processes usually contain complex and time-
consuming calculations.
Table 4-2: Tethys WPS Server HTTP GET + KVP Request APIs
GetCapabilities /tethys_wps?service=WPS&request=GetCapabilities
Generic WPS service metadata; the list of processes available on the server.
DescribeProcess /tethys_wps?service=WPS&version=1.0.0&request=DescribeProcess&identifier=<identifier>
Full description of a process including input and output parameters.
Execute /tethys_wps?service=WPS&version=1.0.0&request=Execute&identifier=<identifier>&datainputs=<input1=;input2=…>
Execution of a process using supplied inputs, returning outputs as requested
85
4.3.2 Case Study Results
4.3.2.1 HydroProspector Use Case
I deployed three Tethys web apps at the website (http://cosmo.byu.edu/apps): Watershed
Delineation for Dominican Republic App, Reservoir Calculation for Dominican Republic App,
and HydroProspector for Dominican Republic App. From the first two apps, I respectively
exposed a watershed delineation WPS and a reservoir calculation WPS on Tethys WPS Server
(details in Table 4-3). These processes were exposed through the Linux command line statements
described previously. The HydroProspector App was developed by coupling these two WPS
processes into the workflow shown in Figure 4-6. OWSLib is used to interact with the WPS
processes on Tethys WPS Server. The HydroProspector app user interface (Figure 4-11) includes
an interactive map to show watershed and reservoir boundaries, along with a chart to show the
storage capacity curve of designed dams. The user is required to (1) click on the map to select an
outlet point for watershed delineation, and then (2) define a dam height and interval. The app
will dynamically draw the storage capacity curve in the chart and show corresponding reservoir
boundaries with different dam heights. When selecting a data point in the chart, the
corresponding reservoir boundary is shown on the map.
The service-oriented approach presented here is particularly suited to complex or time-
consuming workflows where information must persist beyond server-client transactions. For
example, when directly implementing the HydroProspector model in a single web app, the model
execution time often exceeded the typical HTTP request-response timeout setting and failed to
reply to the request made from the client. The browser timeout can be extended long enough to
account for this, or can be disabled completely; however, this can affect the normal execution of
other web browser requests. In addition, interruption or error can be caused by conflicting
86
requests by different clients. To overcome these barriers, a job queue system should be included
to implement asynchronous execution and provide real-time monitoring of different requests. A
database or other persistent data store mechanism is also required for the storage of job
information and results to allow users to check the job status and retrieve result. This approach is
complicated and time-consuming, and it requires advanced web development skills. Tethys WPS
Server, as an OGC WPS implementation, supports storing the execution response and outputs as
web-accessible resources that can be retrieved by the client. It also supports asynchronous
execution, which can keep the execution status updated and accessible while the request is being
processed. By using Tethys WPS Server, app developers can focus more on model design rather
than on web development.
Figure 4-11: HydroProspector for Dominican Republic App User Interface
87
Table 4-3: Watershed Delineation Process WPS & Reservoir Calculation Process WPS on
Tethys WPS Server
WPS Watershed Delineation Process
Reservoir Calculation Process
Identifier watersheddelineationprocess reservoircalculationprocess
Abstract This process snaps a given outlet to the nearest stream
and performs watershed delineation within the Dominican Republic
country.
This process returns the boundary and storage capacity of a reservoir based on a given pour
point location, an aimed water level, and maximum reservoir
boundary polygon.
Inputs Outlet longitude: outlet_x (float)
Pour point longitude: point_x (float)
Pour point latitude: point_y (float)
Outlet latitude: outlet_y (float)
Water level at pour point: water_level (float)
Maximum reservoir boundary: max_boundary (text/xml)
Outputs Delineated watershed: watershed (text/xml)
Reservoir volume: lake_volume (float)
Snapped point: snappoint (text/xml)
Reservoir boundary: lake_boundary (text/xml)
status_supported True True
store_supported True True
4.3.2.2 Third-party WPS Client Use Case
I choose QGIS as a WPS client to demonstrate that the WPS processes hosted on Tethys
WPS Server can be used by other third-party clients that support OGC WPS specification. QGIS
was chosen because it is open source and is a comprehensive GIS environment. QGIS has a WPS
Python client plug-in, through which QGIS can directly consume WPS services on WPS servers.
The interactions include listing WPS capabilities, sending request to execute a WPS process and
88
visualizing outputs in QGIS interface. I successfully connected to Tethys WPS Server, executed
the Watershed Delineation WPS and the Reservoir Calculation WPS through the QGIS WPS
Client, and displayed the results in the QGIS map. This effort proved to be uncomplicated and
self-explanatory with no unusual results or unpredicted outcomes, hence no further discussion of
this test is given here.
4.3.3 Quantitative Analysis
An analysis of programming language composition was performed to measure how
Tethys WPS Server overcomes the hurdle of complex web app development in environmental
modeling. I compared the HydroProspector app with another Tethys app called Storage Capacity
Service (https://github.com/xhqiao89/storage_capacity_service.git), which implements the same
functionality of the HydroProspector model fully natively without using WPS services. I used the
Count Lines of Code (CLOC) Linux package to count the number of code lines, excluding blank
lines, comment lines, and third party libraries included. One aim of Tethys WPS Server is
lowering the barrier to complex environmental web app development.
Table 4-4: Language Composition Comparison
Note: Front end contains HTML, JavaScript and CSS.
Language
HydroProspector App Storage Capacity Service App
Lines Percent Lines Percent
Python 204 29% 603 55%
Front end 494 71% 500 45%
Total 698 100% 1103 100%
89
This is evident by the reduced amount of Python used in the HydroProspector app shown
in Table 4-4, 204 lines compared to 603 lines for the Storage Capacity Service app. In addition,
the Storage Capacity Service app adopts Celery (Solem 2007) for asynchronous execution and
task job management, and PostGIS database for job results storage – none of which is required in
the WPS based HydroProspector.
In addition, I selected 10 hydropower dam sites around Dominican Republic country and
calculated the storage capacity curve with the HydroProspector app and the Storage Capacity
Service app respectively. The dams were all 30 meters high and the interval for calculating the
storage capacity curve was set to 10 meters. The total compute time for each dm site was
recorded as Figure 4-12.
Figure 4-12: The comparison of compute time between HydroProspector app and Storage Capacity Service app
90
The average compute time of using the HydroProspector app is 147.2 seconds, while
103.8 seconds with the Storage Capacity Service app. The HydroProspector app takes a longer
time because it calls two independent WPS services, which contains more steps than the pure
storage capacity curve calculation process, such as setting up GRASS GIS environment, creating
GRASS GIS projects, input data preprocessing and outputs post-processing.
4.3.4 Qualitative Analysis
As a qualitative assessment of how Tethys WPS Server lowers the barrier of OGC WPS
deployment, I hosted a workshop and collected results from a questionnaire survey focusing on
the ease of operation and the required time to setup WPS processes with Tethys WPS Server. I
had 10 attendees with various knowledge and experience on environmental web app
development participate the workshop and survey (see Figure 4-13).
Figure 4-13: Academic status (left) and environmental web app developing experience (right) of the workshop and survey attendees.
91
Figure 4-14: Comparison of difficulty of OGC WPS deployment with/without Tethys WPS Server for survey attendees with different years of web application development experience (Scale of 1 to 10 where 1 is very easy and 10 is very difficult)
Figure 4-15: Time to deploy WPS processes using Tethys WPS Server for survey attendees with different years of web application development experience.
92
The survey generally focused on (1) level of difficulty to deploy a WPS service with
Tethys WPS Sever compared with from creating a WPS using other means; (2) duration of time
to convert an existing app function into a WPS process through defining the WPS definition file
and publishing it through command line functions. Based on the survey results, all the attendees
agree that it is easier to publish app functions as WPS using Tethys WPS Server. Figure 4-14
shows the difficulty level (scale 1 to 10 where 1 is very easy and 10 is very difficult) of
deploying WPS with/without using Tethys WPS Server for the attendees with different years of
web application development experience. Based on the different programming level of the
survey attendees, most of them spend less than 15 minutes to convert an existing app function
into a WPS process and less than 3 minutes to publish it through command lines (shown as
Figure 4-15). Approximate 60% of the attendees estimate that this WPS deployment process
takes less than 5% of the total app development time, while the other 40% estimate it may take
5% to 20%. Overall, with Tethys WPS Server, the time of exposing an app function as a WPS
service is negligible.
Discussion
The motivation of this work was to create a ready-to-use tool to expose web app
functions to standard OGC WPS, thereby facilitating the development of web apps containing
complex environmental models. PyWPS was chosen for server-side OGC WPS implementation
because it is Python-based and supports Python scripts. Tethys Platform was chosen as the
development environment because of its demonstrated advantages in developing web-based
environmental apps. It enables those with limited web development skills to develop web-based
environmental apps using a pre-packaged web app development environment, including
geoprocessing tools for spatial data, map rendering and visualization, distributed computing, and
93
database management (Jones et al. 2014). An API was developed for exposing Tethys app
functions as WPS processes. A set of command line functions was developed for WPS
management on the server.
Based on the case studies presented here and my own experience with Tethys WPS
Server, the key outcomes of this work for app developers using Tethys platform include:
1) Ease of implementation - It is possible to add WPS processes without needing to
change well-written codes in existing web apps.
2) Easier maintenance - A WPS process is based on and connected with a web app, so
these services can be debugged, tested, and updated together with the web app, enabling simpler
maintenance of the services.
3) Avoidance of code duplication – As OGC WPS can be used across applications with
different programming languages over different computational platforms, WPS processes hosted
on Tethys WPS server can be used within other Tethys apps, or by third-party applications or
clients which support OGC WPS specification.
4) Lowering the barrier to environmental OGC WPS development and deployment –
Tethys Platform has been demonstrated to successfully lower the barrier to developing and
deploying environmental web apps. Tethys WPS Server allows developers to convert existing
app functions into WPS processes, making the WPS generation process almost transparent to
developers. A WPS process is automatically deployed along with the web app from which it is
defined and can be flexibly activated or deleted by command lines. Therefore, the barrier to
developing OGC WPS processes should be lowered.
94
5) Facilitating complex environmental web app development - Implementing complex
environmental models to web apps can be a challenging task because complex models are
normally time-consuming processes, which require task management, asynchronous execution,
data storage, and other related issues. As demonstrated in the case study, decomposing the model
to loosely coupled WPS processes can avoid many web development issues, especially
benefiting environmental scientists and engineers with limited programming skills. Moreover, a
WPS process can contain a series of other WPS processes, which is necessary for environmental
modeling as many complex environmental models contain multiple steps. For example, some
models may first decompose the water system into atmosphere, land surface, and subsurface.
Each component may contain a list of more detailed process-level services, such as land surface
including infiltration, evaporation, runoff, and others, and each process can be decomposed to
multiple smaller services.
In addition to creating a Tethys-specific implementation of WPS and demonstrating its
utility, this work also serves as a general methodological experience and guidance for alternative
implementations of WPS, which may not use the Tethys/Django framework. I have shown that
disaggregation of a complex workflow into multiple modular WPS instances can help avoid
overloading a single server and does yield many of the benefits reported by Castronova et al.
(2013) and Goodall et al. (2011), especially enabling models to be flexibly extended and reused
by clients from different systems. Others can learn from and emulate my success in deploying
GIS-enabled web applications that include both a front-end GUI for end users and a back-end
WPS service, which can be activated and deactivated through simple server side commands.
Maintaining the integrity of PyWPS (with only minor changes) as a separate package that is
linked into the Tethys infrastructure, we gain the benefit of being able to readily upgrade to
95
newer versions of this library that may become available in the future. In addition, my
experience with the PyWPS library specifically is highly positive and serves as a
recommendation for this library in similar use cases where Python is the primary server-side
technology.
I expect that these observations and outcomes will have broader application for future
environmental web application developers beyond the Tethys user and developer community. I
also note a few challenges that remain with respect to this work. First, it still requires minor
extra-coding to integrate functions of an app into WPS processes. App developers must manually
define WPS processes following the API, which also causes some level of code duplication in
the app, including request and response data processing. Second, the services hosted on Tethys
WPS Server currently are configured to be accessible to anyone without any authorization or
authentication requirements. The third challenge is not a shortcoming of Tethys WPS Server, but
is a shortcoming of WPS in general and is related to moving large datasets between services,
which can cause network latency. One way to minimize this effect is to avoid passing large sizes
of messages when designing services. This means even though Tethys WPS Server provides an
easy environment to develop WPS, there remains a high requirement for developers' ability to
decompose a complex modeling system into a set of representative services, and design services
with a minimum size of input and output data.
Conclusion
In this chapter, I developed an open source platform called Tethys WPS Server using
PyWPS, which can help web app developers expose app functions to standard OGC WPS
processes. A set of command line functions was created for flexible WPS processes
96
management, including publishing, listing, and removal. A Tethys web application coupling two
WPS processes was developed to demonstrate the advantages and benefits of Tethys WPS Server
for complex environmental modeling systems. Through Tethys WPS Server, one or more
functionalities in each Tethys app can be exposed as OGC WPS processes through several
simple steps and deployed along with the Tethys app. With the work presented in this chapter,
Tethys WPS Server can lower the barrier to OGC WPS development and deployment and further
improve complex modeling web applications development in the environmental domain.
The next steps for Tethys WPS Server include upgrading it from a Tethys internal
application to a Django application that can be easily coupled with any Django-based system,
handling the problem of security and user authentication, and providing more examples and
guidance on well-designed modularization and decomposition of complex problems to avoid
large-data-passing problems. Furthermore, I have provided here only a HydroProspector web app
to serve as a proof of benefits for using Tethys WPS Server in complex environmental web app
development. More complicated web apps are needed to fully test the pros and cons of Tethys
WPS Server.
97
5 CONCLUSIONS
The primary goals of this dissertation include creating an automated and efficient method
to generate global high-resolution flood forecasting and developing software tools to share
geospatial analysis workflows and environmental models as web services, facilitating complex
environmental modeling. There are three research objectives:
(1) create and test an enhanced method to route coarse gridded runoff generated from
global or national LSMs onto a high-resolution vector-based river network and provide flood
prediction at local scales.
(2) develop a tool to facilitate publishing complex environmental models as web services,
thereby facilitating model sharing and reuse.
(3) develop a ready-to-use tool to expose web app functionality as standard OGC WPS,
thereby facilitating the development of web apps containing complex environmental models.
Chapter 2 addressed the first objective and presented the design and development of a
new automated computational system that quickly downscales the runoff generated from coarse
grid-based LSMs onto high-resolution vector-based stream networks by leveraging the RAPID
vector-based river routing model. I compared routing 35-year ERA-Interim/Land reanalysis data
in this system and the results from GloFAS reanalysis data and demonstrated that this system has
comparable performance with GloFAS. The new system has comparable accuracy with GloFAS
98
but provides streamflow predictions on higher resolution stream networks with much less
computational cost. I found that the river network resolution has a negligible effect on the
simulated streamflow from the new system. In other words, we can forecast streamflow for very
small stream segments and potentially improve local flood awareness and response much more
successfully than previously possible using readily available climate forcing from LSMs. The
system has been extended to work with multiple LSM models, which allows developing
countries and data-scarce regions to generate streamflow estimations simultaneously from
various LSM products to guide flood management.
Chapter 3 addressed the second objective and presents a tool to improve publishing
complex environmental models as web services. OGMS-WS was chosen as the service
publishing tool because of its lightweight, low entry bar, and the success in working with
complex geospatial models. Docker was included as the model isolation tool to resolve conflicts
and keep each model functional with changing environments. I published the streamflow
prediction system in Chapter 2 as a web service using this tool. Then, the service was used to
generate 20 years (1995-2014) high-resolution streamflow data in Bangladesh to analyze its
historical flood trends. We demonstrated that this tool can significantly lower the barrier to
publishing complex models as long-term functional web services, avoid deployment conflicts,
and facilitate model sharing and reuse. This work also has demonstrated the benefits of using
containers for model isolation management to sidestep the potential conflicts in the service
publishing process. The tool supports different operating systems, including Linux, Windows
and macOS.
Chapter 4 addressed the third objective and presents the development and testing of a
ready-to-use WPS implementation called Tethys WPS Server, which provides a formalized way
99
to expose web app functionality as standardized WPS alongside a web app's graphical user
interface. The WPS server is created based on Tethys Platform by leveraging PyWPS. Tethys
Platform was chosen as the development environment because of its demonstrated advantages in
developing web-based environmental apps, especially for users with limited web development
skills. PyWPS was chosen for server-side OGC WPS implementation because it is Python-based
and supports Python scripts. An API was developed for exposing Tethys app functions as WPS
processes. A set of command-line functions was developed for WPS management on the server.
Three Tethys web apps are developed to demonstrate how web app functionality(s) can be
exposed as WPS using Tethys WPS Server, and to show how these WPS can be coupled to build
a complex modeling web app. I demonstrated that the services hosted on Tethys WPS Server
follow OGC standards correctly and can be used successfully by third-party applications and
clients that support the OGC WPS specification. The advantages of this tool include ease of
implementation and maintenance, avoidance of code duplication, lowering the barrier to
environmental OGC WPS development and deployment, and facilitating complex environmental
web app development.
The work in this dissertation contributes to environmental research in several aspects.
First, the work in Chapter 2 has demonstrated the feasibility to map global gridded runoff data on
high-resolution vector-based stream networks, and the credibility to use global LSMs to provide
flood predictions at local stream locations to meet managers’ needs. By leveraging existing long-
term land surface reanalysis data from global models, this approach can efficiently provide
baseline hydrologic information and supplement for regions that are hydrologic-data poor and
have little or no observed data or working models. The system can produce consistent results
regardless of watershed resolution, this also demonstrated the advantages of using vector-based
100
river routing model in large-scale streamflow modeling compared with traditional grid-based
models, which is significantly affected by the grid resolution. Moreover, this work represents a
significant step forward in global or large-scale flood forecasting by providing rapid and accurate
streamflow forecasts with low computational costs. International development organizations like
the World Bank can reinvest dollars focused on leveraging such hydrologic information into
applications and making more informed decisions. In sum, this work provides a new pathway to
exploit the ever-increasing runoff products from land surface models. In the era of big data, I
expect that this work can motivate contribution to developing systems to better utilize earth
observations and global modeling products, improving our understanding of environmental
systems.
The works in Chapter 3 and Chapter 4 have provided solutions to web service
deployment respectively for environmental models and analysis workflows in web apps. The
biggest advantage of web services is enabling being used by clients/applications with different
programming languages over different platforms. The OGMS-WS-Docker approach for complex
models overcomes the shortcoming of current exiting service-publishing tools --- model
reproducibility, isolation, and system compatibility. Tethys WPS Server can publish existing web
app functions as WPS processes through simple command lines, then the service can be used
within other Tethys apps, or by third-party applications or clients which support OGC WPS
specification. These two outcomes significantly lower the barrier to publishing environmental
models and analysis workflows as web services by providing simple API/UI, task management,
asynchronous execution, etc. Others can learn from and emulate my success in model isolation
management and WPS implementations. I expect these works and outcomes can motivate
101
contribution to environmental web service development to facilitate integrated environmental
modeling in the future.
The future work includes (1) optimizing the streamflow prediction performance by model
calibration, including reservoir simulation, and improving initial states through absorbing
satellite observations; (2) upgrading Tethys WPS Server to a Django application that can be
easily coupled with any Django-based system, handling the problem of security and user
authentication; (3) conducting more cases to fully test the pros and cons of the OGMS-WS-
Docker approach and Tethys WPS Server.
102
REFERENCES
52°North, 2018. Home - 52°North Initiative for Geospatial Open Source Software. In. http://52north.org/.
Alfieri, L., P. Burek, E. Dutra, B. Krzeminski, D. Muraro, J. Thielen & F. Pappenberger, 2013. GloFAS - global ensemble streamflow forecasting and flood early warning. Hydrology and Earth System Sciences 17(3):1161 doi:http://dx.doi.org/10.5194/hess-17-1161-2013.
Argent, R. M., A. Voinov, T. Maxwell, S. M. Cuddy, J. M. Rahman, S. Seaton, R. Vertessy & R. D. Braddock, 2006. Comparing modelling frameworks–a workshop approach. Environmental Modelling & Software 21(7):895-910.
Astsatryan, H., A. Hayrapetyan, W. Narsisian, A. Saribekyan, S. Asmaryan, A. Saghatelyan, V. Muradyan, Y. Guigoz, G. Giuliani & N. Ray, 2015. An interoperable web portal for parallel geoprocessing of satellite image vegetation indices. Earth Science Informatics 8(2):453-460.
Atkinson, R., Power, R., Lemon, D. and O’Hagan, R. , 2008. The Australian Hydrological Geospatial Fabric – Development Methodology and Conceptual Architecture Water for a Healthy Country Flagship Report. Australian Government: Bureau of Meteorology.
Bai, P., X. Liu, T. Yang, K. Liang & C. Liu, 2016. Evaluation of streamflow simulation results of land surface models in GLDAS on the Tibetan plateau. Journal of Geophysical Research: Atmospheres 121(20):12,180-12,197 doi:10.1002/2016jd025501.
Balazs M. Fekete, C. J. V., and Richard B. Lammers 2011. Scaling gridded river networks for macroscale hydrology: Development, analysis, and control of error. Water Resources Research 37(7):13.
Balsamo, G., A. Beljaars, K. Scipal, P. Viterbo, B. v. d. Hurk, M. Hirschi & A. K. Betts, 2009. A Revised Hydrology for the ECMWF Model: Verification from Field Site to Terrestrial
103
Water Storage and Impact in the Integrated Forecast System. Journal of Hydrometeorology 10(3):623-643 doi:10.1175/2008jhm1068.1.
Bandaragoda, C., A. Castronova, E. Istanbulluoglu, R. Strauch, S. S. Nudurupati, J. Phuong, J. M. Adams, N. M. Gasparini, K. Barnhart, E. W. H. Hutton, D. E. J. Hobley, N. J. Lyons, G. E. Tucker, D. G. Tarboton, R. Idaszak & S. Wang, 2019. Enabling Collaborative Numerical Modeling in Earth Sciences using Knowledge Infrastructure. Environmental Modelling & Software:104424 doi:https://doi.org/10.1016/j.envsoft.2019.03.020.
Basha, E. & D. Rus, Design of early warning flood detection systems for developing countries. In: 2007 International Conference on Information and Communication Technologies and Development, 15-16 Dec. 2007 2007. p 1-10.
Becchi, J. C. a. L., 2007. Geospatial Processing via Internet on Remote Servers - PyWPS. OSGeo Journal 1:5.
Belete, G. F., A. Voinov & G. F. Laniak, 2017. An overview of the model integration process: From pre-integration assessment to testing. Environmental Modelling & Software 87:49-63 doi:https://doi.org/10.1016/j.envsoft.2016.10.013.
Blower, J. D., A. Gemmell, G. H. Griffiths, K. Haines, A. Santokhee & X. Yang, 2013. A Web Map Service implementation for the visualization of multidimensional gridded environmental data. Environmental Modelling & Software 47:218-224.
Boerboom, J., 2013. Implementing the WPS Standard - Case Study for Dissemination of Coastal and Marine Tools. Delft University of Technology.
Boettiger, C., 2014. An introduction to Docker for reproducible research, with examples from the R environment. ACM SIGOPS Operating Systems Review - Special Issue on Repeatability and Sharing of Experimental Artifacts 49(1):9.
Bosin, A., N. Dessì & B. Pes, 2011. Extending the SOA paradigm to e-Science environments. Future Generation Computer Systems 27(1):20-31 doi:https://doi.org/10.1016/j.future.2010.07.003.
Wang, C., R. M. G., Wang, K., Sebastian Gerland, Mats A. Granskog, 2018. Comparison of ERA5 and ERA-Interim near surface air temperature and precipitation over Arctic sea ice: Effects on sea ice thermodynamics and evolution. The Cryosphere Discussion.
104
Castronova, A. M. & J. L. Goodall, 2010. A generic approach for developing process-level hydrologic modeling components. Environmental Modelling & Software 25(7):819-825.
Castronova, A. M., J. L. Goodall & M. M. Elag, 2013. Models as web services using the open geospatial consortium (ogc) web processing service (wps) standard. Environmental Modelling & Software 41:72-83.
Chen, M., S. Yue, G. Lü, H. Lin, C. Yang, Y. Wen, T. Hou, D. Xiao & H. Jiang, 2019. Teamwork-oriented integrated modeling method for geo-problem solving. Environmental Modelling & Software 119:111-123 doi:https://doi.org/10.1016/j.envsoft.2019.05.015.
Christensen, S. D., 2016. A Comprehensive Python Toolkit for Harnessing Cloud-Based High-Throughput Computing to Support Hydrologic Modeling Workflows. Brigham Young University.
Clark, M. P., A. G. Slater, D. E. Rupp, R. A. Woods, J. A. Vrugt, H. V. Gupta, T. Wagener & L. E. Hay, 2008. Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research 44(12).
Clement Albergel, E. D., Simon Munier, Jean-Christophe Calvet, Joaquin Munoz-Sabater, Patricia de Rosnay, Gianpaolo Balsamo, 2018. ERA-5 and ERA-Interim driven ISBA land surface model simulations: which one performs better? Hydrology & Earth System Sciences 22(6):18.
Collberg, C., T. Proebsting, G. Moraila, A. Shankaran, Z. Shi & A. M. Warren, 2014. Measuring Reproducibility in Computer Systems Research. University of Arizona, http://reproducibility.cs.arizona.edu/tr.pdf, 37.
Cools, J., D. Innocenti & S. O’Brien, 2016. Lessons from flood early warning systems. Environmental Science & Policy 58:117-122 doi:https://doi.org/10.1016/j.envsci.2016.01.006.
David, C. H., J. S. Famiglietti, Z.-L. Yang, F. Habets & D. R. Maidment, 2016. A decade of RAPID—Reflections on the development of an open source geoscience code. Earth and Space Science 3(5):226-244 doi:10.1002/2015EA000142.
David, C. H., F. Habets, D. R. Maidment & Z.-L. Yang, 2011a. RAPID applied to the SIM-France model. Hydrological Processes 25(22):3412-3425 doi:10.1002/hyp.8070.
105
David, C. H., D. R. Maidment, G.-Y. Niu, Z.-L. Yang, F. Habets & V. Eijkhout, 2011b. River Network Routing on the NHDPlus Dataset. Journal of Hydrometeorology 12(5):913-934 doi:10.1175/2011jhm1345.1.
David, C. H., Z.-L. Yang & S. Hong, 2013. Regional-scale river flow modeling using off-the-shelf runoff products, thousands of mapped rivers and hundreds of stream flow gauges. Environmental Modelling & Software 42:116-132 doi:https://doi.org/10.1016/j.envsoft.2012.12.011.
Dee, D. P., S. M. Uppala, A. J. Simmons, P. Berrisford, P. Poli, S. Kobayashi, U. Andrae, M. A. Balmaseda, G. Balsamo, P. Bauer, P. Bechtold, A. C. M. Beljaars, L. van de Berg, J. Bidlot, N. Bormann, C. Delsol, R. Dragani, M. Fuentes, A. J. Geer, L. Haimberger, S. B. Healy, H. Hersbach, E. V. Hólm, L. Isaksen, P. Kållberg, M. Köhler, M. Matricardi, A. P. McNally, B. M. Monge-Sanz, J. J. Morcrette, B. K. Park, C. Peubey, P. de Rosnay, C. Tavolato, J. N. Thépaut & F. Vitart, 2011. The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society 137(656):553-597 doi:10.1002/qj.828.
Ding, D., 2016. Esri ArcGIS Python toolbox for runing RAPID. doi:https://github.com/Esri/python-toolbox-for-rapid.git.
Donat, M. G., A. L. Lowry, L. V. Alexander, P. A. O’Gorman & N. Maher, 2016. More extreme precipitation in the world’s dry and wet regions. Nature Climate Change 6:508 doi:10.1038/nclimate2941. https://www.nature.com/articles/nclimate2941#supplementary-information.
Dubois, G., M. Schulz, J. Skøien, L. Bastin & S. Peedell, 2013. eHabitat, a multi-purpose Web Processing Service for ecological modeling. Environmental Modelling & Software 41:123-133.
E. L. Wipfler, K. M., J. C. van Dam, R. A. Feddes, E. van Meijgaard, L. H. van Ulft, B. van den Hurk,S. J. Zwart4,5, and W. G. M. Bastiaanssen, 2011. Seasonal evaluation of the land surface scheme HTESSEL against remote sensing derived energy fluxes of the Transdanubian region in Hungary. Hydrology and Earth System Sciences 15:15 doi:10.5194/hess-15-1257-2011.
Emerton, R. E., E. M. Stephens, F. Pappenberger, T. C. Pagano, A. H. Weerts, A. W. Wood, P. Salamon, J. D. Brown, N. Hjerdt, C. Donnelly, C. A. Baugh & H. L. Cloke, 2016. Continental and global scale flood forecasting systems. Wiley Interdisciplinary Reviews: Water 3(3):391-418 doi:10.1002/wat2.1137.
106
Esri, 2018. WPS services - Documentation | ArcGIS Enterprise. In. http://enterprise.arcgis.com/en/server/latest/publish-services/windows/wps-services.htm Accessed 12-03 2018.
European Environment Agency, 2012. European catchments and Rivers network system (Ecrins). https://www.eea.europa.eu/data-and-maps/data/european-catchments-and-rivers-network.
Feng, M., S. Liu, N. H. Euliss, C. Young & D. M. Mushet, 2011. Prototyping an online wetland ecosystem services model using open model sharing standards. Environmental Modelling & Software 26(4):458-468.
G. Balsamo, C. A., A. Beljaars, S. Boussetta, E. Brun, H. Cloke, D. Dee, E. Dutra, J. Muñoz-Sabater,F. Pappenberger, P. de Rosnay, T. Stockdale, F. Vitart, 2015. ERA-Interim/Land: a global land surface reanalysis data set. Hydrology Earth System Sciences 19:18 doi:10.5194/hess-19-389-2015.
Gao, F., P. Yue, C. Zhang & M. Wang, 2019. Coupling components and services for integrated environmental modelling. Environmental Modelling & Software 118:14-22 doi:https://doi.org/10.1016/j.envsoft.2019.04.003.
Gatlin, P. N., J. L. Case, J. Bell, W. A. Petersen & D. Cecil, 2018. Monitoring Intense Thunderstorms in the Hindu-Kush Himalayan Region. NASA Marshall Space Flight Center; Huntsville, AL, United States.
GeoServer, 2013. GeoServer Web Processing Service Home Page. In. http://docs.geoserver.org/2.3.5/user/extensions/wps/index.html.
Gill, M. A., 1978. Flood routing by the Muskingum method. Journal of Hydrology 36(3):353-363 doi:https://doi.org/10.1016/0022-1694(78)90153-1.
Gong, L., S. Halldin & C.-Y. Xu, 2011. Global-scale river routing—an efficient time-delay algorithm based on HydroSHEDS high-resolution hydrography. Hydrological Processes 25(7):1114-1128 doi:10.1002/hyp.7795.
Goodall, J. L., B. F. Robinson & A. M. Castronova, 2011. Modeling water resource systems using a service-oriented computing paradigm. Environmental Modelling & Software 26(5):573-582 doi:https://doi.org/10.1016/j.envsoft.2010.11.013.
107
Granell, C., L. Díaz & M. Gould, 2010. Service-oriented applications for environmental models: Reusable geospatial services. Environmental Modelling & Software 25(2):182-198.
GRASS Development Team, 2018a. Geographic Resources Analysis Support System (GRASS) Software. In. Open Source Geospatial Foundation. https://grass.osgeo.org.
Gregersen, J. B., P. J. A. Gijsbers & S. J. P. Westen, 2007. OpenMI: Open modelling interface. Journal of Hydroinformatics 9(3):175-191 doi:10.2166/hydro.2007.023 %J Journal of Hydroinformatics.
Guy J-P. Schumann, P. D. B., Heiko Apel, Giuseppe T. Aronica, 2018. Global Flood Hazard Mapping, Modeling, and Forecasting: Challenges and Perspectives.
Highsoft Data Visualization Company, 2018. Interactive JavaScript charts for your webpage | HighCharts. In. https://www.highcharts.com/ Accessed 12-03 2018.
Hirabayashi, Y., R. Mahendran, S. Koirala, L. Konoshima, D. Yamazaki, S. Watanabe, H. Kim & S. Kanae, 2013. Global flood risk under climate change. Nature Climate Change 3(9):816-821 doi:10.1038/nclimate1911.
Hirpa, F. A., P. Salamon, H. E. Beck, V. Lorini, L. Alfieri, E. Zsoter & S. J. Dadson, 2018. Calibration of the Global Flood Awareness System (GloFAS) using daily streamflow data. Journal of Hydrology 566:595-606 doi:10.1016/j.jhydrol.2018.09.052.
Horsburgh, J. S., M. M. Morsy, A. M. Castronova, J. L. Goodall, T. Gan, H. Yi, M. J. Stealey & D. G. Tarboton, 2015. Hydroshare: Sharing diverse environmental data types and models as social objects with application to the hydrology domain. JAWRA Journal of the American Water Resources Association.
Hu, Y., X. Cai & B. DuPont, 2015. Design of a web-based application of the coupled multi-agent system model and environmental model for watershed management analysis using Hadoop. Environmental Modelling & Software 70:149-162 doi:https://doi.org/10.1016/j.envsoft.2015.04.011.
Huang, M., D. R. Maidment & Y. Tian, 2011. Using SOA and RIAs for water data discovery and retrieval. Environmental Modelling & Software 26(11):1309-1324 doi:https://doi.org/10.1016/j.envsoft.2011.05.008.
108
ILWIS, 2001. Integrated land and water information system user’s guide. International Institute for Aerospace Survey and Earth Sciences (ITC).
Issa, I. E., N. Al-Ansari, G. Sherwany & S. Knutsson, 2017. Evaluation and modification of some empirical and semi-empirical approaches for prediction of area-storage capacity curves in reservoirs of dams. International Journal of Sediment Research 32(1):127-135 doi:https://doi.org/10.1016/j.ijsrc.2015.12.001.
Jackson, E. K., W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson & D. P. Ames, 2019. Introductory overview: Error metrics for hydrologic modelling – A review of common practices and an open source library to facilitate use and adoption. Environmental Modelling & Software 119:32-48 doi:https://doi.org/10.1016/j.envsoft.2019.05.001.
Jiang, P., M. Elag, P. Kumar, S. D. Peckham, L. Marini & L. Rui, 2017. A service-oriented architecture for coupling web service models using the Basic Model Interface (BMI). Environmental Modelling & Software 92(Supplement C):107-118 doi:https://doi.org/10.1016/j.envsoft.2017.01.021.
Jones, N., J. Nelson, N. Swain, S. Christensen, D. Tarboton & P. Dash, Tethys: A Software Framework for Web-Based Modeling and Decision Support Applications. In: 7th International Congress on Environmental Modelling and Software, DP Ames, NWT Quinn, and AE Rizzoli (Editors) iEMSs, San Diego, California, 2014. p 170-177.
Jonkman, S. N., 2005. Global Perspectives on Loss of Human Life Caused by Floods. Natural Hazards 34(2):151-175 doi:10.1007/s11069-004-8891-3.
Karl Hennermann, P. B., 2018. What are the changes from ERA-Interim to ERA5? In: European Centre for Medium-Range Weather Forecasts. https://confluence.ecmwf.int/pages/viewpage.action?pageId=74764925 2019.
Knapp, A. K., C. Beier, D. D. Briske, A. T. Classen, Y. Luo, M. Reichstein, M. D. Smith, S. D. Smith, J. E. Bell, P. A. Fay, J. L. Heisler, S. W. Leavitt, R. Sherry, B. Smith & E. Weng, 2008. Consequences of More Extreme Precipitation Regimes for Terrestrial Ecosystems. BioScience 58(9):811-821 doi:10.1641/B580908 %J BioScience.
Kousky, C., 2014. Informing climate adaptation: A review of the economic costs of natural disasters. Energy Economics 46:576-592 doi:https://doi.org/10.1016/j.eneco.2013.09.029.
Kralidis, T., 2017. OWSLib 0.16.0 documentation. In. http://geopython.github.io/OWSLib/.
109
Kumar, K. & S. Saran, 2014. Web based geoprocessing tool for coverage data handling. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 40(8):1139.
Latt, Z. Z. & H. Wittenberg, 2014. Improving Flood Forecasting in a Developing Country: A Comparative Study of Stepwise Multiple Linear Regression and Artificial Neural Network. Water Resources Management 28(8):2109-2128 doi:10.1007/s11269-014-0600-8.
Lee, K., H. Gao, M. Huang, J. Sheffield & X. Shi, 2017. Development and Application of Improved Long-Term Datasets of Surface Hydrology for Texas %J Advances in Meteorology. 2017:13 doi:10.1155/2017/8485130.
Leff, A. & J. T. Rayfield, Web-application development using the model/view/controller design pattern. In: Enterprise Distributed Object Computing Conference, 2001 EDOC'01 Proceedings Fifth IEEE International, 2001. IEEE, p 118-127.
Lehner, B. & G. Grill, 2013. Global river hydrography and network routing: baseline data and new approaches to study the world's large river systems. Hydrological Processes 27(15):2171-2186 doi:10.1002/hyp.9740.
Leonard, L., C. J. J. E. m. Duffy & software, 2014. Automating data-model workflows at a level 12 HUC scale: Watershed modeling in a distributed computing environment. 61:174-190.
Li, Y., K. Guan, G. D. Schnitkey, E. DeLucia & B. Peng, 2019. Excessive rainfall leads to maize yield loss of a comparable magnitude to extreme drought in the United States. Global Change Biology 25(7):2325-2337 doi:10.1111/gcb.14628.
Lin, C., S. Lu, X. Fei, A. Chebotko, D. Pai, Z. Lai, F. Fotouhi & J. Hua, 2009. A Reference Architecture for Scientific Workflow Management Systems and the VIEW SOA Solution. IEEE Transactions on Services Computing 2(1):79-92 doi:10.1109/TSC.2009.4.
Lin, P., Z.-L. Yang, D. J. Gochis, W. Yu, D. R. Maidment, M. A. Somos-Valenzuela & C. H. David, 2018. Implementation of a vector-based river network routing scheme in the community WRF-Hydro modeling framework for flood discharge simulation. Environmental Modelling & Software 107:1-11 doi:https://doi.org/10.1016/j.envsoft.2018.05.018.
110
Lorenz, C., M. J. Tourian, B. Devaraju, N. Sneeuw & H. Kunstmann, 2015. Basin-scale runoff prediction: An Ensemble Kalman Filter framework based on global hydrometeorological data sets. Water Resources Research 51(10):8450-8475 doi:10.1002/2014WR016794.
Luo, W., K. Duffin, E. Peronja, J. Stravers & G. Henry, 2004. A web-based interactive landform simulation model (WILSIM). Computers & Geosciences - COMPUT GEOSCI 30:215-220 doi:10.1016/j.cageo.2004.01.001.
Maidment, D. R., R. P. Hooper, D. G. Tarboton & I. Zaslavsky, 2009. Accessing and Sharing Data Using CUAHSI Water Data Services. In Cluckie, I., et al. (eds) HydrologyHydrogeology and Water Resources. IAHS Publ. 331, Proceedings of Symposium JS4 held in Hyderabad, India, 213-223.
Martelloni, G., S. Segoni, R. Fanti & F. J. L. Catani, 2012. Rainfall thresholds for the forecasting of landslide occurrence at regional scale. 9(4):485-495 doi:10.1007/s10346-011-0308-2.
Mason, S. J. K., S. B. Cleveland, P. Llovet, C. Izurieta & G. C. Poole, 2014. A centralized tool for managing, archiving, and serving point-in-time data in ecological research laboratories. Environmental Modelling & Software 51:59-69 doi:https://doi.org/10.1016/j.envsoft.2013.09.008.
McKay, L., Bondelid, T., Dewald, T., Johnston, J., Moore, R., and Rea, A., 2012. NHDPlus Version 2: User Guide.
Merkel, D., 2014. Docker: lightweight Linux containers for consistent development and deployment. 2014(239 %J Linux J.):Article 2.
MetaCarta, 2018. OpenLayers - Home. In. https://openlayers.org/.
Miao, Q., 2019. Are We Adapting to Floods? Evidence from Global Flooding Fatalities. 39(6):1298-1313 doi:10.1111/risa.13245.
Michaelis, C. D. & D. P. Ames, 2009. Evaluation and implementation of the OGC web processing service for use in client-side GIS. Geoinformatica 13(1):109-120.
Min, S.-K., X. Zhang, F. W. Zwiers & G. C. Hegerl, 2011. Human contribution to more-intense precipitation extremes. Nature 470(7334):378-381 doi:10.1038/nature09763.
111
Mizukami, N., M. P. Clark, K. Sampson, B. Nijssen, Y. Mao, H. McMillan, R. J. Viger, S. L. Markstrom, L. E. Hay, R. Woods, J. R. Arnold & L. D. Brekke, 2016. mizuRoute version 1: a river network routing tool for a continental domain water resources applications. Geosci Model Dev 9(6):2223-2238 doi:10.5194/gmd-9-2223-2016.
Molteni, F., R. Buizza, T. N. Palmer & T. Petroliagis, 1996. The ECMWF Ensemble Prediction System: Methodology and validation. Quarterly Journal of the Royal Meteorological Society 122(529):73-119 doi:10.1002/qj.49712252905.
Müller, M., 2007. deegree – Building Blocks for Spatial Data. OSGeo Journal 1:4.
Nativi, S., P. Mazzetti & G. N. Geller, 2013. Environmental model access and interoperability: The GEO Model Web initiative. Environmental Modelling & Software 39(Supplement C):214-228 doi:https://doi.org/10.1016/j.envsoft.2012.03.007.
Neteler, M., D. E. Beaudette, P. Cavallini, L. Lami & J. Cepicky, 2008. GRASS GIS. In Hall, G. B. & M. G. Leahy (eds) Open Source Approaches in Spatial Data Handling. Springer Berlin Heidelberg, Berlin, Heidelberg, 171-199.
Nino-Ruiz, M., I. Bishop & C. Pettit, 2013. Spatial model steering, an exploratory approach to uncertainty awareness in land use allocation. Environmental Modelling & Software 39:70-80 doi:https://doi.org/10.1016/j.envsoft.2012.06.009.
Palomo, I., M. Adamescu, K. J. Bagstad, C. Cazacu, H. Klug & S. Nedkov, 2017. Tools for mapping ecosystem services.
Peckham, S. D., E. W. H. Hutton & B. Norris, 2013. A component-based approach to integrated modeling in the geosciences: The design of CSDMS. Computers & Geosciences 53:3-12 doi:https://doi.org/10.1016/j.cageo.2012.04.002.
Pitman, A. J., 2003. The evolution of, and revolution in, land surface schemes designed for climate models. International Journal of Climatology 23(5):479-510 doi:10.1002/joc.893.
PostGIS Project Steering Committee (PSC), 2018. PostGIS - Spatial and Geographic objects for PostgreSQL. In. https://postgis.net/.
PostGIS Project Steering Committee (PSC), 2018. PostgreSQL: The World's Most Advanced Open Source Database. In. https://www.postgresql.org/ Accessed 12-03 2018.
112
PyWPS, 2009. Python Web Processing Service (PyWPS). In. http://pywps.org.
QGIS Development Team, 2018b. QGIS Geographic Information System. In. http://qgis.osgeo.org.
Qiao, X., Z. Li, D. P. Ames, E. J. Nelson & N. R. Swain, 2019a. Simplifying the deployment of OGC web processing services (WPS) for environmental modelling – Introducing Tethys WPS Server. Environmental Modelling & Software 115:38-50 doi:https://doi.org/10.1016/j.envsoft.2019.01.021.
Qiao, X., E. J. Nelson, D. P. Ames, Z. Li, C. H. David, G. P. Williams, W. Roberts, J. L. Sánchez Lozano, C. Edwards, M. Souffront & M. A. Matin, 2019b. A systems approach to routing global gridded runoff through local high-resolution stream networks for flood early warning systems. Environmental Modelling & Software 120:104501 doi:10.1016/j.envsoft.2019.104501.
Rajib, M. A., V. Merwade, I. L. Kim, L. Zhao, C. Song & S. Zhe, 2016. SWATShare – A web platform for collaborative research and education through online sharing, simulation and visualization of SWAT models. Environmental Modelling & Software 75:498-512 doi:https://doi.org/10.1016/j.envsoft.2015.10.032.
Roberts, W., G. P. Williams, E. Jackson, E. J. Nelson & D. P. Ames, 2018. Hydrostats: A Python Package for Characterizing Errors between Observed and Predicted Time Series. Hydrology 5(66).
Rockel, B., A. Will & A. Hense, 2008. The regional climate model COSMO-CLM (CCLM), vol 17.
Rose, J. B., S. Daeschner, D. R. Easterling, F. C. Curriero, S. Lele & J. A. Patz, 2000. Climate and waterborne disease outbreaks. 92(9):77-87 doi:10.1002/j.1551-8833.2000.tb09006.x.
Schaeffer, B., 2008. Towards a transactional web processing service. Proceedings of the GI-Days, Münster.
Schiermeier, Q., 2011. Increased flood risk linked to global warming. Nature 470(7334):316-316 doi:10.1038/470316a.
Schut, P. & A. Whiteside, 2007. OpenGIS web processing service. OGC project document.
113
Shaad, K., 2018. Evolution of river-routing schemes in macro-scale models and their potential for watershed management. Hydrological Sciences Journal 63(7):1062-1077 doi:10.1080/02626667.2018.1473871.
Shaw, D., L. W. Martz & A. Pietroniro, 2005. Flow routing in large-scale models using vector addition. Journal of Hydrology 307(1):38-47 doi:https://doi.org/10.1016/j.jhydrol.2004.09.019.
Singh, R. S., J. T. Reager, N. L. Miller & J. S. Famiglietti, 2015. Toward hyper-resolution land-surface modeling: The effects of fine-scale topography and soil texture on CLM4.0 simulations over the Southwestern U.S. Water Resources Research 51(4):2648-2667 doi:10.1002/2014WR015686.
Skøien, J. O., M. Schulz, G. Dubois, I. Fisher, M. Balman, I. May & É. Ó. Tuama, 2013. A Model Web approach to modelling climate change in biomes of Important Bird Areas. Ecological informatics 14:38-43.
Snow, A. D., S. D. Christensen, J. W. Lewis, S. McDonald & T. Whitaker, 2017. erdc/RAPIDpy: 2.6.0 (Version 2.6.0). doi:Zenodo. http://doi.org/10.5281/zenodo.941990.
Snow, A. D., S. D. Christensen, N. R. Swain, E. J. Nelson, D. P. Ames, N. L. Jones, D. Ding, N. S. Noman, C. H. David, F. Pappenberger & E. Zsoter, 2016. A High-Resolution National-Scale Hydrologic Forecast System from a Global Ensemble Land Surface Model. JAWRA Journal of the American Water Resources Association 52(4):950-964 doi:10.1111/1752-1688.12434.
Solem, A., 2007. Celery: Distributed Task Queue. In. http://www.celeryproject.org/ 2018.
Sood, A. & V. Smakhtin, 2015. Global hydrological models: a review. Hydrological Sciences Journal 60(4):549-565 doi:10.1080/02626667.2014.950580.
Swain, N. R., 2015. Tethys Platform: A Development and Hosting Platform for Water Resources Web Apps. Brigham Young University.
Swain, N. R., S. D. Christensen, A. D. Snow, H. Dolder, G. Espinoza-Dávalos, E. Goharian, N. L. Jones, E. J. Nelson, D. P. Ames & S. J. Burian, 2016. A new open source platform for lowering the barrier for environmental web app development. Environmental Modelling & Software 85:11-26.
114
Tan, X., L. Di, M. Deng, J. Fu, G. Shao, M. Gao, Z. Sun, X. Ye, Z. Sha & B. Jin, 2015. Building an Elastic Parallel OGC Web Processing Service on a Cloud-Based Cluster: A Case Study of Remote Sensing Data Processing Service. Sustainability 7(10):14245-14258 doi:10.3390/su71014245.
Tavakoly, A. A., A. D. Snow, C. H. David, M. L. Follum, D. R. Maidment & Z.-L. Yang, 2017. Continental-Scale River Flow Modeling of the Mississippi River Basin Using High-Resolution NHDPlus Dataset. JAWRA Journal of the American Water Resources Association 53(2):258-279 doi:10.1111/1752-1688.12456.
Taylor, C. M., D. Belušić, F. Guichard, D. J. Parker, T. Vischel, O. Bock, P. P. Harris, S. Janicot, C. Klein & G. Panthou, 2017. Frequency of extreme Sahelian storms tripled since 1982 in satellite observations. Nature 544:475 doi:10.1038/nature22069.
Tu, H., X. Wang, W. Zhang, H. Peng, Q. Ke & X. Chen, 2020. Flash Flood Early Warning Coupled with Hydrological Simulation and the Rising Rate of the Flood Stage in a Mountainous Small Watershed in Sichuan Province, China. 12(1):255.
Tung, Y. K., 1985. River Flood Routing by Nonlinear Muskingum Method. Journal of Hydraulic Engineering 111(12):1447-1460 doi:10.1061/(ASCE)0733-9429(1985)111:12(1447).
UW-Madison, C. T. a., 2018. HTCondor - Home. In. https://research.cs.wisc.edu/htcondor/.
Vannote, R. L., G. W. Minshall, K. W. Cummins, J. R. Sedell & C. E. Cushing, 1980. The River Continuum Concept. Canadian Journal of Fisheries and Aquatic Sciences 37(1):130-137 doi:10.1139/f80-017.
Vitolo, C., W. Buytaert & D. Reusser, Hydrological Models as Web Services: An Implementation Using OGC Standards. In: Proceedings of the 10th International Conference on Hydroinformatics, HIC, Hamburg, Germany, 2012.
Vitolo, C., Y. Elkhatib, D. Reusser, C. J. A. Macleod & W. Buytaert, 2015. Web technologies for environmental Big Data. Environmental Modelling & Software 63:185-198 doi:https://doi.org/10.1016/j.envsoft.2014.10.007.
Voinov, A. & C. Cerco, 2010. Model integration and the role of data. Environmental Modelling & Software 25(8):965-969 doi:https://doi.org/10.1016/j.envsoft.2010.02.005.
115
Ward, P. J., B. Jongman, P. Salamon, A. Simpson, P. Bates, T. De Groeve, S. Muis, E. C. de Perez, R. Rudari, M. A. Trigg & H. C. Winsemius, 2015. Usefulness and limitations of global flood risk models. Nature Climate Change 5:712 doi:10.1038/nclimate2742.
Werkzeug, 2011. Werkzeug - The Python WSGI Utility Library. In. http://werkzeug.pocoo.org/.
Westra, S., H. J. Fowler, J. P. Evans, L. V. Alexander, P. Berg, F. Johnson, E. J. Kendon, G. Lenderink & N. M. Roberts, 2014. Future changes to the intensity and frequency of short-duration extreme rainfall. 52(3):522-555 doi:10.1002/2014rg000464.
Wikipedia, 2019. Floods in Bangladesh. In: Wikipedia. The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Floods_in_Bangladesh&oldid=920590097 Accessed October 11 2019.
Wu, H., R. F. Adler, Y. Hong, Y. Tian & F. Policelli, 2012. Evaluation of Global Flood Detection Using Satellite-Based Rainfall and a Hydrologic Model. Journal of Hydrometeorology 13(4):1268-1284 doi:10.1175/jhm-d-11-087.1.
Wu, H., R. F. Adler, Y. Tian, G. J. Huffman, H. Li & J. Wang, 2014. Real-time global flood estimation using satellite-based precipitation and a coupled land surface and routing model. Water Resources Research 50(3):2693-2717 doi:10.1002/2013WR014710.
Xia, Y., K. Mitchell, M. Ek, J. Sheffield, B. Cosgrove, E. Wood, L. Luo, C. Alonge, H. Wei, J. Meng, B. Livneh, D. Lettenmaier, V. Koren, Q. Duan, K. Mo, Y. Fan & D. Mocko, 2012. Continental-scale water and energy flux analysis and validation for the North American Land Data Assimilation System project phase 2 (NLDAS-2): 1. Intercomparison and application of model products. Journal of Geophysical Research: Atmospheres 117(D3) doi:10.1029/2011jd016048.
Xiao, D., M. Chen, Y. Lu, S. Yue & T. Hou, 2019. Research on the Construction Method of the Service-Oriented Web-SWMM System. ISPRS International Journal of Geo-Information 8(6):268.
Yamazaki, D., G. A. M. de Almeida & P. D. Bates, 2013. Improving computational efficiency in global river models by implementing the local inertial flow equation and a vector-based river network map. Water Resources Research 49(11):7221-7235 doi:10.1002/wrcr.20552.
116
Yang, C.-T., G.-L. Huang, F.-H. Huang, T.-W. Ho, P.-H. Huang & J.-M. Yang, Soa-based platform for water resource information exchanging. In: 2009 17th International Conference on Geoinformatics, 2009. IEEE, p 1-4.
Yin, D., Y. Liu, A. Padmanabhan, J. Terstriep, J. Rush & S. Wang, 2017. A CyberGIS-Jupyter Framework for Geospatial Analytics at Scale. Paper presented at the Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, New Orleans, LA, USA.
Younis, J. & A. P. J. De Roo, 2010. LISFLOOD: a GIS‐based distributed model for river basin scale water balance and flood simulation AU - Van Der Knijff, J. M. International Journal of Geographical Information Science 24(2):189-212 doi:10.1080/13658810802549154.
Yue, P., M. Zhang, Z. J. E. M. Tan & Software, 2015. A geoprocessing workflow system for environmental monitoring and integrated modelling. 69:128-140.
Yue, S., M. Chen, Y. Wen & G. Lu, 2016. Service-oriented model-encapsulation strategy for sharing and integrating heterogeneous geo-analysis models in an open web environment. ISPRS Journal of Photogrammetry and Remote Sensing 114:258-273.
Zhang, F., M. Chen, D. P. Ames, C. Shen, S. Yue, Y. Wen & G. Lü, 2019. Design and development of a service-oriented wrapper system for sharing and reusing distributed geoanalysis models on the web. Environmental Modelling & Software 111:498-509 doi:https://doi.org/10.1016/j.envsoft.2018.11.002.
Zhang, Q., X. Gu, V. P. Singh, L. Liu & D. Kong, 2016. Flood-induced agricultural loss across China and impacts from climate indices. Global and Planetary Change 139:31-43 doi:https://doi.org/10.1016/j.gloplacha.2015.10.006.
Zhao, J., J. M. Gomez-Perez, K. Belhajjame, G. Klyne, E. Garcia-Cuesta, A. Garrido, K. Hettne, M. Roos, D. D. Roure & C. Goble, Why workflows break — Understanding and combating decay in Taverna workflows. In: 2012 IEEE 8th International Conference on E-Science, 8-12 Oct. 2012 2012a. p 1-9.
Zhao, P., T. Foerster & P. Yue, 2012b. The geoprocessing web. Computers & Geosciences 47:3-12.
117
APPENDIX A. SOFTWARE AVAILABILITY
1. New Software Developed in the Dissertation
Chapter 2:
Extended RAPIDpy
o Source code: https://github.com/BYU-Hydroinformatics/RAPIDpy.git.
Chapter 3:
Streamflow prediction service
o http://cosmo.byu.edu:8060/modelser/preparation/5deef20a230f6b6e9bbae443
Streamflow prediction service encapsulation package for OGMS-MS:
o https://github.com/xhqiao89/rapidpy_docker_opengms
Dockerized streamflow prediction system:
o https://hub.docker.com/r/xhqiao89/rapidpy
OpenGMS service client SDK Python version example:
o https://github.com/xhqiao89/OpenGMS_SDK_Python
Chapter 4:
Tethys WPS Server
o Source code: https://github.com/tethysplatform/tethys/tree/tethys_wps
o Web Page: http://cosmo.byu.edu/tethys_wps/?service=WPS&request=GetCapabilities
118
o License: The BSD 2-Clause open source license
Tethys WPS Server case study apps
o http://cosmo.byu.edu/apps/. (username : demo; password : demo)
o The source code for these three web apps are available on GitHub in the following
repositories:
o Watershed Delineation for Dominican Republic App:
https://github.com/xhqiao89/watershed_delineation_app
o Reservoir Calculation for Dominican Republic App:
https://github.com/xhqiao89/reservoir_calculation_app
o HydroProspector for Dominican Republic App:
https://github.com/xhqiao89/hydroprospector_app
2. Third-party Software Used in the Dissertation
Tethys Platform
o Source code: https://github.com/tethysplatform/tethys
o Home page: http://www.tethysplatform.org/
o License: The BSD 2-Clause open source license
PyWPS
o Source code: https://github.com/geopython/pywps
o Home page: http://pywps.org/
o License: The MIT license
ArcGIS RAPID preprocessing toolbox
o Developed by Esri:
o Source code: https://github.com/Esri/python-toolbox-for-rapid.git.
119
RAPIDpy
o Developed by U.S. Army Engineer Research and Development Center
o Source code: https://github.com/erdc/RAPIDpy.git.
Global “Streamflow Prediction Tool” web app
o https://tethys.byu.edu/apps/streamflow-prediction-tool/ (username : demo; password :
demo).
OGMS-MS source code:
o https://gitee.com/geomodeling/GeoModelServiceContainer
OpenGMS service client SDK:
o Source code: https://gitee.com/geomodeling/NJModelServiceSDK