arXiv:1908.10403v2 [math.NA] 29 Aug 2019


Centroidal Voronoi Tessellation Based Methods for Optimal Rain Gauge Location Prediction

Zichao (Wendy) Di^{a,*}, Viviana Maggioni^{b}, Yiwen Mei^{b}, Marilyn Vazquez^{c}, Paul Houser^{d}, Maria Emelianenko^{e}

^{a} Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA

^{b} Department of Civil, Environmental and Infrastructure Engineering, George Mason University, Fairfax, VA, USA

^{c} Mathematical Biosciences Institute, Ohio State University, Columbus, OH, USA

^{d} Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA, USA

^{e} Department of Mathematical Sciences, George Mason University, Fairfax, VA, USA

Abstract

With more satellite and model precipitation data becoming available, new analytical methods are needed that can take advantage of emerging data patterns to make well informed predictions in many hydrological applications. We propose a new strategy in which we extract precipitation variability patterns and use a correlation map to build a density map. This density map serves as the input to a centroidal Voronoi tessellation construction that optimizes the placement of precipitation gauges. We provide results of numerical experiments based on data from the Alto-Adige region in Northern Italy and from Oklahoma, and compare them against actual gauge locations. This method provides an automated way of choosing new gauge locations and can be generalized to include physical constraints and to tackle other types of resource allocation problems.

Keywords: rain gauges, CVT, decorrelation, optimal placement

∗Corresponding author

August 30, 2019


1. Introduction

Precipitation is a critical variable in the water cycle, as it provides moisture for processes such as runoff, evapotranspiration, and groundwater recharge. Knowledge of precipitation characteristics and patterns is therefore crucial for understanding land-climate interactions, for extreme event monitoring, and for water resource management. However, accurate precipitation information at fine space and time scales is difficult to obtain, as precipitation estimates from rain gauges, ground-based radars, satellite sensors, and numerical models are all affected by significant uncertainties, which can even be amplified when exposed to non-linear land surface model physics [1, 2].

Rain gauge (or pluviometer) networks are the only direct method to measure precipitation and provide observations with high temporal resolution. As such, they are widely accepted as the benchmark for validating remotely sensed precipitation products (e.g., [3], [4]). However, obtaining a spatially representative precipitation field from rain gauges may require collecting a large number of observations at several locations to cover different terrain, micro-climate, and vegetation variability. This translates into placing numerous gauges, which is costly in terms of maintenance and often not feasible because of location inaccessibility. Since rain gauges cannot offer spatially continuous information on precipitation [5], it is desirable to place them strategically, accounting for changes in spatio-temporal patterns and using a methodology that is able to adapt to climate instabilities, as suggested in [6]. This work proposes an automated method to identify the optimal locations of rain gauges in order to capture the spatial variability of precipitation systems in the region and therefore provide a spatially representative rain field.

Voronoi tessellations appear in many contexts and serve a variety of purposes. They are referred to as quantizations in the electrical engineering community [7], as polygons of influence or Thiessen regions in geostatistics [8], and as geodesic, icosahedral or hexagonal grids in climate system modeling [9]. The corresponding energy functional can be thought of as variation in statistical terminology, or as a cost functional in economic terms. The basic idea of this construction is to represent a large set of data by means of a few representative points (generators).

While this general concept has been widely used in a variety of contexts, the special case of centroidal Voronoi tessellations (CVTs), i.e., Voronoi diagrams whose generators are chosen to minimize the energy functional, is less known due to the difficulties associated with its construction. This concept has been gaining popularity in recent decades in many application areas, ranging from biology and physics to finance, economics, social science, and hydrology (see [10], [11] and references therein). It is particularly well studied in the context of mesh generation, clustering, quantization, imaging, reduced order modeling, and partial differential equation (PDE) applications, where a number of theoretical results have been obtained attesting to its superior qualities compared to competing methodologies.

The goal of this work is two-fold. First, we introduce the concept of CVTs for precipitation pattern analysis and point out several recently developed numerical algorithms that help construct CVTs in continuous and discrete settings. Second, we apply the idea of CVTs to the optimal placement of rain gauges, similarly to how it was previously applied to finding the optimal placement of schools, post offices, and other resources [12]. The practical aim of this study is to develop an automated strategy that works for arbitrary data in any geographical location. We draw attention to several modeling assumptions that are part of the algorithm presented herein and to the implications of these assumptions for the algorithm performance on several selected datasets.

We utilize the truncated-Newton (TN) method [13], a large-scale nonlinear optimization algorithm, to construct all CVT solutions. The method, described briefly in the Appendix (Section 9), considerably speeds up the calculation compared to techniques such as the Lloyd method widely used in the engineering community [14], and has advantages over previously introduced modified Lloyd formulations [15].

As in any CVT problem aimed at finding the optimal placement of resources, one needs a local density estimator. When optimizing rain gauge placement, this density should naturally be related to a measure of local precipitation variability. In this work, we use a variability estimator based on the local covariance matrix computed at the decorrelation distance. Our aim is to develop an automated strategy that could be employed in any geographical location in the future, as needed. This method could be applied to any precipitation dataset, whether measured in situ, remotely sensed, or based on model simulations.

The article is organized as follows. Section 2 provides information on the CVT methodology and formulates the problem of optimal gauge placement, comparing it to previously used approaches. Section 3 gives information on the data used in this work. The proposed algorithm is presented in Section 4. We also compare predictions given by this model with existing gauge locations in Section 5.

2. Methods

2.1. Voronoi and centroidal Voronoi tessellations

The idea of tessellating a region, i.e., decomposing it into subregions, based on the locations of rain gauges appeared in the early work of Thiessen [8].

The construction is simple. Consider a certain geographic region $W \subset \mathbb{R}^2$. Voronoi (Thiessen) regions $\{V_i\}_{i=1}^{k}$ are generated by a set of points $\{x_i\}_{i=1}^{k} \subset \mathbb{R}^2$ and are defined as follows:

$$V_i(x_i) = \{x \in W : \|x - x_i\| \le \|x - x_j\|,\ j = 1, \dots, k;\ j \ne i\},$$

where $\|\cdot\|$ is any distance metric. In this work, we choose it to be the standard Euclidean norm. These regions cover the entire domain $W$ and can be formed by drawing perpendicular bisectors to the segments joining consecutive generating points.

Given a certain desired "density" function $\rho(x)$, one can compute the tessellation error and minimize it over the generators, i.e.

$$E\left(\{x_i\}_{i=1}^{k}\right) = \min_{\{x_i\}_{i=1}^{k}} \sum_{i=1}^{k} \int_{V_i} \rho(x)\,\|x - x_i\|^2\,dx. \tag{2.1}$$

It can be shown [16] that the minimum is attained precisely when $x_i = x_i^*$, where $x_i^*$ is the mass centroid of the corresponding Voronoi region $V_i$. A tessellation satisfying this property is called a "centroidal Voronoi tessellation", or CVT for short. Notice that the above formulation may be extended to more general cases, e.g., by considering other distance metrics $f(\|\cdot\|)$, other geometric constraints, and periodic extensions [12, 17].

The density function may be used to represent a variety of physical characteristics, such as local characteristic length-scale [18], signal intensity [19], or desired grid resolution [16]. In this work we propose to use it to represent spatial rainfall variability, as described in Section 4.
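To make the energy in (2.1) concrete, the following minimal sketch evaluates a discrete analogue of it on a small grid. The grid, the uniform density, the two generators, and the names `nearest_generator` and `cvt_energy` are illustrative assumptions and are not taken from the solver used in this work; the integral is replaced by a sum over grid points weighted by a hypothetical cell area.

```python
import math


def nearest_generator(point, generators):
    """Index of the generator closest to `point` in the Euclidean norm."""
    return min(range(len(generators)),
               key=lambda i: math.dist(point, generators[i]))


def cvt_energy(points, density, generators, cell_area=1.0):
    """Discrete analogue of (2.1): sum of rho(x) * ||x - x_i||^2 over grid points,
    where x_i is the generator of the Voronoi cell containing x."""
    energy = 0.0
    for p, rho in zip(points, density):
        i = nearest_generator(p, generators)
        energy += rho * math.dist(p, generators[i]) ** 2 * cell_area
    return energy


# Hypothetical 4 x 4 grid with uniform density and two generators.
points = [(float(ix), float(iy)) for ix in range(4) for iy in range(4)]
density = [1.0] * len(points)
generators = [(0.5, 1.5), (2.5, 1.5)]
energy = cvt_energy(points, density, generators)
```

In this toy configuration each generator sits at the centroid of the grid points assigned to it, so the configuration is a discrete CVT for the uniform density.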

The classical method for constructing CVTs is the algorithm developed by Lloyd in the 1980s [14], which is a fixed-point type iterative scheme. More efficient methods for calculating CVTs have been developed in the past decades [20, 21, 15]. For an overview of CVT-related numerical techniques we refer the interested reader to [12, 22]. For the purposes of this work, we use one of the recently developed and well tested CVT solvers based on the truncated Newton (TN) methodology [15] (see the pseudocode given in Appendix B).
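For orientation, the fixed-point character of the classical Lloyd iteration can be sketched as follows. This is a plain Lloyd scheme on a discrete point set, not the TN solver used in this work; the function names, the grid, the initial generators, and the stopping tolerance are assumptions made for illustration only.

```python
import math


def lloyd_step(points, density, generators):
    """One Lloyd update: assign each point to its nearest generator,
    then move every generator to the mass centroid of its cell."""
    k = len(generators)
    mass = [0.0] * k
    cx = [0.0] * k
    cy = [0.0] * k
    for (x, y), rho in zip(points, density):
        i = min(range(k), key=lambda j: math.dist((x, y), generators[j]))
        mass[i] += rho
        cx[i] += rho * x
        cy[i] += rho * y
    # A generator with an empty cell is left where it is.
    return [(cx[i] / mass[i], cy[i] / mass[i]) if mass[i] > 0 else generators[i]
            for i in range(k)]


def lloyd(points, density, generators, max_iter=100, tol=1e-10):
    """Repeat Lloyd updates until the largest generator shift drops below tol."""
    for _ in range(max_iter):
        updated = lloyd_step(points, density, generators)
        shift = max(math.dist(a, b) for a, b in zip(generators, updated))
        generators = updated
        if shift < tol:
            break
    return generators


# Hypothetical 4 x 4 grid with uniform density and two initial generators.
points = [(float(ix), float(iy)) for ix in range(4) for iy in range(4)]
density = [1.0] * len(points)
result = lloyd(points, density, [(0.3, 1.1), (2.9, 2.2)])
```

Each update moves the generators to the centroids of their current cells, so a fixed point of `lloyd_step` satisfies the centroid condition that characterizes a CVT.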

2.2. Problem formulation

There are several possible ways to formulate the problem of optimal placement of rain gauges in a certain region.

One method was proposed in [23] and later used in [12]. If the locations of $k$ rain gauges in a region $W \subset \mathbb{R}^2$ are given by $x_1, \dots, x_k$ and $V_i \subset W$ is the Voronoi (Thiessen) region associated with the $i$-th gauge, one can minimize the expected squared approximation error for the amount of rainfall $z(x)$, treated as a random variable $z(x) = m(x) + \varepsilon(x)$ with mean $m(x)$ and deviation $\varepsilon(x)$. If the change in the average trend $m(x)$ is small compared with the variance $\mathrm{Var}(x) = E(\varepsilon(x)^2) = \varepsilon$, which is assumed to be constant, the expected squared approximation error can be approximated as

$$E\left(\{x_i\}_{i=1}^{k}\right) \approx \min_{\{x_i\}_{i=1}^{k}} \sum_{i=1}^{k} \int_{V_i} 2\varepsilon^2 \left[1 - \mathrm{Corr}(x, x_i)\right] dx, \tag{2.2}$$

where

$$\mathrm{Corr}(x, x_i) = E[\varepsilon(x)\varepsilon(x_i)] / [\mathrm{Var}(x)\,\mathrm{Var}(x_i)] \tag{2.3}$$

is the (Pearson) linear correlation coefficient of the time series at locations $x$ and $x_i$. Details on this derivation are given in Appendix A. Evaluation of (2.2) is computationally intensive, requiring calculation of pairwise correlations for all points inside the domain.

Notice that under the additional assumption that $\mathrm{Corr}(x, x_i)$ depends only on the distance $\|x - x_i\|$, the functional in (2.2) can be thought of as a generalization of the CVT energy (2.1) given by

$$E\left(\{x_i\}_{i=1}^{k}\right) = \sum_{i=1}^{k} \int_{V_i} \rho(x)\, f(\|x - x_i\|)\, dx \tag{2.4}$$

with distance metric $f(\|x - x_i\|) = 2\varepsilon^2 (1 - \mathrm{Corr}(x, x_i))$ and $\rho(x) = 1$. The combination of constant mean and variance together with $\mathrm{Corr}(x, x_i) = \mathrm{Corr}(x - x_i)$ is normally referred to as the weak stationarity assumption, which might or might not hold in practice, as discussed for instance in [6]. Effectively, in this formulation the distance plays the most important role, placing a small weight on highly correlated points and magnifying weakly correlated regions. The maximum value of the variance can be rather small depending on the data, which may lead to slow convergence for commonly used numerical algorithms [24].

In contrast with the above approach, the standard CVT formulation (2.1) applied to the same problem achieves a similar effect by fixing the Euclidean distance and instead selecting an appropriate density function $\rho$. This method is grounded on the observation that for the solutions to (2.1), the sizes of 2-dimensional Voronoi regions, defined as $h_{V_i} = 2 \max_{y \in V_i} \|x_i - y\|$, satisfy [12]:

$$\frac{h_{V_i}}{h_{V_j}} = \left(\frac{\rho(x_i)}{\rho(x_j)}\right)^{1/3}. \tag{2.5}$$

This is the approach advocated in this work. One may rescale the density $\rho$ to achieve any desired ratio of cell sizes based on a certain spatial distribution of interest; for instance, in [18] a CVT mesh was generated using a velocity field. In this work we choose our density so that it satisfies the following conditions:

$$\sup_{x \in W} \rho(x) = \lim_{\mathrm{Corr}(x) \to C_{min}} \rho(x) = R \quad \text{and} \quad \inf_{x \in W} \rho(x) = \lim_{\mathrm{Corr}(x) \to C_{max}} \rho(x) = r.$$

Here $R$ and $r$ are scale parameters and $\mathrm{Corr}(x)$ denotes the effective (average) correlation at spatial location $x$, with $-1 \le C_{min} \le \mathrm{Corr}(x) \le C_{max} \le 1$, $\forall x \in W$. The method for computing this quantity, based on averaging correlations at a "decorrelation" distance, is discussed in Section 4.1. There are many functional forms such a relation can take. One choice is to consider a power law relation of the type

$$\rho(x) = r + R\left(\frac{C_{max} - \mathrm{Corr}(x)}{C_{max} - C_{min}}\right)^{\alpha}, \tag{2.6}$$

where $\alpha > 0$ denotes the power exponent.

This approach results in the following alternative formulation of the optimal rain gauge placement problem:

$$\min_{\{x_i\}_{i=1}^{k}} \sum_{i=1}^{k} \int_{V_i} \left[r + R\left(\frac{C_{max} - \mathrm{Corr}(x)}{C_{max} - C_{min}}\right)^{\alpha}\right] \|x - x_i\|_2^2\, dx. \tag{2.7}$$
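The power law density (2.6) is straightforward to evaluate once the effective correlation is known. The sketch below is a direct transcription under the assumption that the correlation values are available as a plain list; the function name `cvt_density` and the sample correlation values are hypothetical, while the defaults $r = 10^{-6}$ and $R = 1$ follow the values adopted in this work.

```python
def cvt_density(corr, c_min, c_max, r=1e-6, R=1.0, alpha=1.0):
    """Power law density (2.6): low correlation maps to high density."""
    if c_max <= c_min:
        raise ValueError("c_max must exceed c_min")
    rel = (c_max - corr) / (c_max - c_min)  # 1 at c_min, 0 at c_max
    return r + R * rel ** alpha


# Hypothetical effective correlation values at three grid points.
corr_map = [0.2, 0.5, 0.8]
c_min, c_max = min(corr_map), max(corr_map)
rho = [cvt_density(c, c_min, c_max, alpha=2.0) for c in corr_map]
```

With these inputs the density is largest at the least correlated point and equals $r$ at the most correlated one, which is the behavior the formulation (2.7) relies on.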

Figure 1: Illustration of the effect of the ratio R/r on the sizes of Voronoi regions (panels: R = 1 and R = 100). r = 1 is fixed for both panels.

The choice of parameters $r$, $R$, $\alpha$ is ultimately dependent on the data, the choice of the numerical optimization algorithm, and the particular application. Due to possible convergence issues, one needs to bound the density away from zero, so that $r > 0$. The scale parameter $R$ magnifies the range of density values. Based on (2.5), the ratio $R/r$ can be interpreted as an approximation for the ratio of the largest and smallest Voronoi regions: $R/r \approx h_{max}/h_{min}$. We illustrate the role of this ratio in Figure 1 by fixing $r = 1$ and varying $R$. We adopt the values $r = 10^{-6}$ and $R = 1$ in this work to provide a proof of concept. This choice gives a ratio of $R/r = 10^{6}$ and was tuned for use with the TN CVT construction algorithm.

Given the desired region size ratio, the parameter $\alpha$ enhances the contrast between peaks and valleys of the density function. It is expected that higher values of $\alpha$ will essentially penalize low correlation areas compared to high correlation (low density) areas, exaggerating density differences inside the given domain. The choice of the enhancement parameter $\alpha$ is described in detail in Section 4.

3. Study Regions and Dataset

This work focused on two very different regions: Oklahoma in the United States and Alto-Adige in Northern Italy. The reason we chose these two domains is twofold. First, they are both covered by dense rain gauge networks that can be used as a reference to evaluate the proposed algorithm. Second, they are characterized by different topography and therefore different precipitation processes.

Oklahoma is characterized by relatively uniform terrain, with gentle topography that rises from the southeastern corner to the tip of the panhandle. The continental climate of the region presents cold winters and hot summers and a rainfall spatial pattern that exhibits a west-to-east (dry-to-wet) gradient. The study domain is covered by a dense network of meteorological stations, the Oklahoma Mesoscale Network (hereinafter Mesonet; [25]), as seen in Figure 2 (top). Although data collected at these stations were not used in this study, their locations were compared to the output of our proposed algorithm, which optimizes gauge placement to fully capture the precipitation variability in the region.

A similar analysis was performed in Alto-Adige, located in the eastern Italian Alps. Unlike Oklahoma, this area is characterized by complex topography, with elevation ranging from 65 to almost 4000 m a.s.l. Precipitation climatology in the area exhibits strong spatial gradients, with mean annual precipitation varying from ∼500 mm in the northwestern region to ∼1700 mm in the southeastern part ([26], [27]). A network of 192 rain gauges is available in the region. The rain gauges are distributed quite uniformly over the area (Figure 2 (bottom)), providing a very high gauge density (∼1/70 rain gauge/km²) for a mountainous area.

Figure 2: Topographic maps with existing gauge locations: (top) Oklahoma region; (bottom) Adige region.

For the regional studies across Oklahoma and Alto-Adige, we adopted a high resolution (1 hour/0.04°) satellite precipitation product, the Precipitation Estimation from Remotely Sensed Imagery using Artificial Neural Networks (PERSIANN; [28], [29], [30]), produced by the University of California, Irvine. PERSIANN uses an adaptive algorithm to extract information from infrared (IR) brightness temperature images, classify the extracted features, and apply multivariate mapping of the classified texture to the rain rate. Specifically, in this study we used the PERSIANN-Cloud Classification System (PERSIANN-CCS), which only applies information from IR observations from geostationary satellites with high sampling frequency ([31]). In the numerical examples below, we used a crude conversion of 0.04° = 5 [km] when presenting the results for both regions of interest. Improving this approximation or changing it for other geographic locations will have no significant consequences for the results.

4. Proposed algorithm

The main steps of the algorithm we are proposing are the following:

Step 1 Build the CVT density function based on the information on spatio-temporal correlations in a certain region using (2.6).

Step 2 Solve the CVT minimization problem (2.7) to obtain optimal rain gauge locations for this region.

We now discuss the details of Step 1, while an example of a TN-based CVT solver is provided in Appendix B. Note that other methods may be used in place of the TN method and might be equally effective depending on the properties of the data.

Figure 3: Calculation of the decorrelation distance using the 1/e rule (correlation versus distance [km]): (left) for the mountainous region in Northern Italy, with d = 45 [km]; (right) for the Oklahoma region, with d = 80 [km].

4.1. Effective correlation computation

The effective local correlation $\mathrm{Corr}(x)$ is a key ingredient of the CVT density estimator used in our method. The approach we take is based on the calculation of the local correlation of the time series at each grid point. The number of neighbors one should take into account in this computation is related to the decorrelation distance (radius), which is estimated using the exponential model with the so-called nugget effect [32, 33]:

$$\rho_g(d) = c_0 \exp\left[-\left(\frac{d}{d_0}\right)^{s_0}\right], \tag{4.8}$$

where $d$ is the separation distance between two points, $c_0$ is the nugget parameter (which corresponds to the correlation value for near-zero distances; [34]), $d_0$ is the scale parameter (which corresponds to the spatial decorrelation distance), and $s_0$ is the correlogram shape parameter, which controls the behavior of the model near the origin for small separation distances. Note that $(1 - c_0)$ is the instant decorrelation due to random errors in the rainfall observations [35].
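As a quick illustration of the roles of the three parameters in (4.8), the model can be evaluated directly. The parameter values below are hypothetical and chosen only to show the behavior at zero distance and at the scale distance; they are not fitted to the data used in this work.

```python
import math


def correlogram(d, c0, d0, s0):
    """Exponential correlogram with nugget effect, as in (4.8)."""
    return c0 * math.exp(-((d / d0) ** s0))


# Hypothetical parameters: nugget 0.9, scale distance 45 km, shape 1.2.
values = [correlogram(d, c0=0.9, d0=45.0, s0=1.2) for d in (0.0, 45.0, 90.0)]
```

At zero separation the model returns the nugget $c_0$, and at $d = d_0$ it returns $c_0/e$, which is why $d_0$ is read as the decorrelation distance.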

We estimate $\mathrm{Corr}(x)$ as the average of the correlation coefficients between a given point $x$ and locations on the circle $S(x, d) = \{y \in W : \|y - x\|_2 = d\}$ of radius $d$:

$$\mathrm{Corr}(x) = \mathrm{Corr}_d(x) = \frac{1}{|S(x, d)|} \sum_{y \in S(x, d)} \mathrm{Corr}(x, y), \tag{4.9}$$

where $\mathrm{Corr}(x, y)$ is the Pearson correlation coefficient given in (2.3).

Monte Carlo integration over the circle of radius $d$ may be used to speed up the calculation of this quantity. Using uniform sampling $y_1, \dots, y_N$ over the region $\tilde{S}(x, d) = \{y \in W : d - 1 \le \|y - x\|_2 \le d + 1\}$, we define

$$\mathrm{Corr}_N(x) \approx \frac{1}{N} \sum_{i=1}^{N} \mathrm{Corr}(x, y_i). \tag{4.10}$$

This method gives an error of order $O(1/\sqrt{N})$ [36]. The value $N = 100$ was used in the numerical experiments described in Section 5. Importance sampling may be used to improve the accuracy of the Monte Carlo approximation of the local correlation map, but it was not explored in the current work.
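The Monte Carlo estimate (4.10) can be sketched as follows, assuming the data are stored as a list of (location, time series) pairs on a planar grid. The helper names, the data layout, the toy series, and the fixed random seed are assumptions made for illustration; the Pearson coefficient is computed with the usual normalization by the product of the standard deviations, and constant series are not handled.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation of two equally long, non-constant series."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def effective_correlation(x, series_x, grid, d, n_samples=100, seed=0):
    """Monte Carlo estimate in the spirit of (4.10): average correlation between
    the series at x and series sampled uniformly from the annulus
    d - 1 <= ||y - x|| <= d + 1."""
    rng = random.Random(seed)
    ring = [series for loc, series in grid
            if d - 1 <= math.dist(x, loc) <= d + 1]
    if not ring:
        raise ValueError("no grid points in the annulus")
    total = 0.0
    for _ in range(n_samples):
        total += pearson(series_x, rng.choice(ring))
    return total / n_samples


# Toy data: two locations inside the annulus, one too close to count.
series_x = [1.0, 2.0, 3.0, 4.0]
grid = [
    ((9.0, 0.0), [2.0, 4.0, 6.0, 8.0]),
    ((0.0, 9.5), [0.0, 1.0, 2.0, 3.0]),
    ((2.0, 0.0), [4.0, 3.0, 2.0, 1.0]),
]
corr_eff = effective_correlation((0.0, 0.0), series_x, grid, d=9.0)
```

Sampling with replacement from the grid points in the annulus keeps the cost per location at `n_samples` correlation evaluations regardless of how many grid points the annulus contains.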

We define the separation distance $d_0$ at which the correlation drops to $1/e$ as the correlation length for the (assumed) exponential variogram model:

$$d_0 = \min\left\{d : \mathrm{Corr}_d(x) < \frac{1}{e}\right\}. \tag{4.11}$$
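A minimal sketch of the 1/e rule (4.11), assuming the correlation has already been averaged at a list of candidate distances. The function name and the synthetic correlation curve are hypothetical; the 5 km spacing mirrors the crude grid conversion used in this work.

```python
import math


def decorrelation_distance(distances, correlations):
    """Smallest distance at which the correlation falls below 1/e,
    or None if it never does."""
    threshold = 1.0 / math.e
    for d, c in sorted(zip(distances, correlations)):
        if c < threshold:
            return d
    return None


# Synthetic correlation curve sampled every 5 km (hypothetical data).
distances = [5.0 * i for i in range(1, 13)]
correlations = [math.exp(-d / 43.0) for d in distances]
d0 = decorrelation_distance(distances, correlations)
```

Because the search runs over the sampled distances only, the result is resolved to the grid spacing, which matches how the values are reported in grid points in the text.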

Figure 4: Density computation for the Italy region: (top left) effective correlation map based on the original data; (top right) interpolated correlation map; (bottom left) density map constructed using (2.6) with α = 1; (bottom right) density map for α = 2.

Figure 3 (left) demonstrates the calculation of the decorrelation distance for the data collected over the Adige mountainous region in Northern Italy. We expect a relatively small decorrelation distance over mountains, which is indeed the case for this data: a value of 9 grid points may be estimated, which corresponds to 45 [km]. A similar calculation performed over the flat region of Oklahoma shows decorrelation at 16 grid points, corresponding to 80 [km], as seen in Figure 3 (right).

Now that the correlation component of the density is defined, we are ready to discuss a possible strategy for picking an optimal value of $\alpha$ given the expected number of rain gauges and a correlation threshold.

The choice of the density can be motivated by several factors, including availability of resources and desired accuracy.

Figure 4 shows CVT densities for the Adige region, computed using formula (2.6) with $\mathrm{Corr}(x)$ computed at the decorrelation distance as discussed above. Two different values of the parameter $\alpha$ have been tested, resulting in clear differences in densities.

Figure 5: Comparing density computation for different α values (panels: α = 1, k = 375; α = 2, k = 82; α = 3, k = 39): Adige region data. Points satisfying condition (4.12) are marked and their number is denoted as k in the title of each subfigure.

4.2. Optimal α and main algorithm

One may want to optimize the choice of the parameter $\alpha$ based, for instance, on the desired number of rain gauges $k_g$ to be placed in the regions with relative correlation below a certain threshold $C_{tol}$. Namely, we can pick the density in such a way that we get approximately $k_g$ locations with relative correlation below $C_{tol}$. We may choose the optimal value $\alpha^*$ as the smallest $\alpha$ satisfying

$$k = \left|\left\{x : C_{rel}^{\alpha} = \left(\frac{\mathrm{Corr}(x) - C_{min}}{C_{max} - C_{min}}\right)^{\alpha} < C_{tol}\right\}\right| \le k_g, \tag{4.12}$$

where $|\cdot|$ denotes the number of grid points in the set. If $\alpha = 1$ immediately satisfies condition (4.12), we stop. Otherwise we keep increasing $\alpha$ until the desired resolution is obtained.

For instance, if the desired relative tolerance is $C_{tol} = 0.1$, meaning that locations with relative correlation below 10% are targeted, and if we are planning to place $k_g = 100$ gauges in a certain area, we need to pick $\alpha$ to satisfy

$$k = \left|\left\{x : C_{rel}^{\alpha} < 0.1\right\}\right| \le 100.$$

Figure 5 gives a visualization of the optimal $\alpha$ selection procedure for the Adige data based on the above strategy, with the choice of $k_g = 60$, $C_{tol} = 0.1$ and default values $r = 1$, $R = 1$.

While the locations satisfying condition (4.12) are good candidates for initial gauge placement, this choice is not optimal in terms of the overall approximation of the precipitation in the region of interest. As discussed above, optimal placement is attained by computing the CVT that minimizes the approximation error (2.1).

Finally, in Algorithm 1 we describe the main iterative algorithm for determining the optimal placement of rain gauges for a given region.

Algorithm 1 Automatic selection of optimal gauge locations.

 1: procedure GaugeOptim
 2:   Define correlation threshold Ctol (default value Ctol = 0.1), desired number of gauges kg.
 3:   Form the N × n observation matrix Y, where N and n denote the discrete spatial and temporal resolution, respectively.
 4:   Compute Corr(yj) at each location yj, j = 1, ..., N using (4.10).
 5:   Compute decorrelation distance d using (4.11).
 6:   Interpolate the correlation map. Set α = 0.
 7:   loop: Let α = α + 1.
 8:     Build the density map for the interpolated grid using (2.6).
 9:     if (4.12) is not satisfied then
10:       goto loop.
11:     else
12:       Return optimal α.
13:   Starting with kg random points, find CVT-optimal generators xi, i = 1, ..., kg using the TN method given in Appendix B or any alternative method with the density (2.6).
14:   Calculate CVT energy (2.1).

Apart from the precipitation time series data, the only input parameters needed to run the code are the correlation threshold $C_{tol}$ and the number of gauges to be placed $k_g$. The default value for $C_{tol}$ is 0.1, while the number of gauges can be arbitrary. If desired, the user may choose to construct the CVT for any given number of generators $k_g$ starting immediately at line 13 of Algorithm 1.

5. Numerical results and discussion

Figures 6 and 7 provide the results of applying Algorithm 1 to both regions and compare the optimal locations of rain gauges with existing rain gauge locations. The default values of $r = 10^{-6}$, $R = 1$ were used in all calculations.

Figure 6: Optimized rain gauge locations compared with real gauge data for the Adige dataset (panel title: Adige, α = 3, number of gauges: 138; legend: density contour, CVT, real gauge).

As shown in Figure 6, more gauges would be required where topography is the most complex, i.e., the northwestern region of Alto Adige, characterized by altitudes close to 4000 m a.s.l. On the other hand, in the valleys in the southeastern region, fewer gauges would be needed to fully capture the variability of precipitation. Similarly, in Oklahoma, where topography is more uniform than in Alto Adige, the gauge placement is also more homogeneous, with limited areas characterized by high density. This confirms that the proposed algorithm is able to optimize gauge placement based on observed precipitation patterns, which are linked to the geography of the region (e.g., orographic rainfall systems). In fact, elevation was shown in past literature to explain most of the variance in rainfall in mountainous regions (e.g., [37, 38]).

In order to investigate how far the actual gauges are from their optimal locations, we look at the Euclidean distance between a given gauge and the closest optimal location. We take a circle of a certain radius as the measure of closeness and count the number of gauges that fall within that distance of any optimal location. Tables 1 and 2 give direct counts of the gauges that are close to and far from optimal locations, and Figures 8 and 9 provide a visual representation of these results.
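The closeness counts described above can be sketched as follows, assuming gauge and optimal locations are given as planar coordinates in kilometers. The function name `count_within` and the sample coordinates are hypothetical and do not correspond to the actual gauge networks.

```python
import math


def count_within(gauges, optimal, radius):
    """Count gauges whose nearest optimal location lies within `radius`,
    and those for which it does not."""
    near = sum(1 for g in gauges
               if min(math.dist(g, o) for o in optimal) <= radius)
    return near, len(gauges) - near


# Hypothetical planar coordinates in km.
gauges = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
optimal = [(0.0, 1.0), (9.0, 10.0)]
counts = {r: count_within(gauges, optimal, r) for r in (1.0, 2.0, 5.0)}
```

Each entry of `counts` holds the pair (gauges within radius, gauges not within radius) for one radius, which is the layout used in Tables 1 and 2.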

Obviously, some locations may be inaccessible (especially in complex terrain) although optimal. The methodology introduced in this work can be modified to account for physical constraints and other resource allocation problems, but this is beyond the scope of this article and is recommended for future work.

Figure 7: Optimized rain gauge locations compared with real gauge data for the Oklahoma dataset.

Radius (km)   Gauges within radius r   Gauges not within radius r
r = 2         8                        130
r = 5         60                       78
r = 10        132                      6

Table 1: Adige region: number of gauges close to optimal locations for different closeness measures.

As a measure of how well the selected locations approximate the precipitation data in a given region, we can compute the error functional given by (2.1). As shown in Figure 10, the approximation error is significantly decreased by running the optimization routine. More specifically, the error decreases from 259.56 to 15.02 for the Adige data and from $8.61 \cdot 10^{4}$ to $4.67 \cdot 10^{4}$ for Oklahoma. Iteration was terminated once a local minimum was achieved according to the chosen stopping criterion.

Radius (km)   Gauges within radius r   Gauges not within radius r
r = 5         7                        111
r = 10        18                       100
r = 15        49                       69
r = 15        86                       32

Table 2: Oklahoma region: number of gauges close to optimal locations for different closeness measures.

Figure 8: Comparing existing locations with optimal placement for the Oklahoma region.

6. Summary

We developed an automated strategy for finding optimal precipitation gauge locations in any given region based on the variability pattern of the precipitation over the region. The proposed algorithm could potentially be applied to any precipitation dataset (including re-analysis products) that is long enough to capture the temporal variability of precipitation, regardless of any seasonality. We chose satellite-based observations because of their global (or quasi-global) coverage, which makes this method applicable anywhere else in the world. This is particularly useful when planning a field campaign to select sampling sites or when installing a new gauge network to pick the optimal number of gauges and their locations.

7. Acknowledgements

The authors are grateful to Ufficio Idrografico di Bolzano for providing the Alto-Adige data. ME acknowledges support provided by the US National Science Foundation CAREER grant DMS-1056821, which funded the Mason Modeling Days workshop where the initial conceptualization of this work occurred.

Figure 9: Comparing existing locations with optimal placement for the Adige region.

Figure 10: Decay of the approximation error E (2.1) during TN iteration (error versus number of iterations) for the Oklahoma region (left) and Adige (right). The CVT-based locations, denoted by a red asterisk, give a smaller error value compared to the error computed for the real gauge locations (shown as a blue asterisk).

References

[1] J. Gottschalck, J. Meng, M. Rodell, P. Houser, Analysis of multiple precipitation products and preliminary assessment of their impact on global land data assimilation system land surface states, J. Hydrometeorol. 6 (2005) 573–598.

[2] A. Hazra, V. Maggioni, P. Houser, H. Antil, M. Noonan, A Monte Carlo-based multi-objective optimization approach to merge different precipitation estimates for land surface modeling, J. Hydrology 570 (2019) 454–462.

[3] V. Maggioni, P. Meyers, M. Robinson, A review of merged high-resolution satellite precipitation product accuracy during the Tropical Rainfall Measuring Mission (TRMM) era, Journal of Hydrometeorology 17(4) (2016) 1101–1117.

[4] E. Anagnostou, V. Maggioni, E. Nikolopoulos, T. Meskele, F. Hossain, A. Papadopoulos, Benchmarking high-resolution global satellite rainfall products to radar and rain-gauge rainfall estimates, IEEE Transactions on Geoscience and Remote Sensing 48(4) (2009) 1667–1683.

[5] C. Kidd, P. Bauer, J. Turk, G. Huffman, R. Joyce, K. Hsu, D. Braithwaite, Intercomparison of high-resolution precipitation products over northwest Europe, J. Hydrometeor. 13 (2012) 67–83.

[6] M. Emelianenko, V. Maggioni, Mathematical challenges in measuring variability patterns for precipitation analysis, in: H. Kaper, F. Roberts (Eds.), Mathematics of Planet Earth, Springer, 2019, pp. 49–61.

[7] E. Agrell, T. Ericsson, Optimization of lattices for quantization, IEEE Trans. Inf. Theory 44(5) (1998) 1814–1828.

[8] A. Thiessen, Precipitation averages for large areas, Monthly Weather Review 39 (1911) 1081–1084.

[9] T. Ringler, R. Heikes, D. Randall, Modeling the atmospheric general circulation using a spherical geodesic grid: a new class of dynamical cores, Mon. Weather Rev. 128 (2000) 2471–2490.

[10] Q. Du, M. Gunzburger, L. Ju, Advances in studies and applications of centroidal Voronoi tessellations, Numer. Math. Theor. Meth. Appl. 3 (2) (2010) 119–142.

[11] L. Chen, J. Xu, Optimal Delaunay triangulations, J. Comp. Math. 22 (2004) 299–308.

[12] Q. Du, V. Faber, M. Gunzburger, Centroidal Voronoi tessellations: applications and algorithms, SIAM Review 41 (1999) 637–676.


[13] S. G. Nash, A survey of truncated-Newton methods, Journal of Computational and Applied Mathematics 124 (1) (2000) 45–59.

[14] S. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory 28 (1982) 129–137.

[15] Z. Di, M. Emelianenko, S. Nash, Truncated Newton-based multigrid algorithm for centroidal Voronoi calculation, Numer. Math. Theor. Meth. Appl. 5 (1) (2012) 242–259.

[16] J. Burkardt, M. Gunzburger, H.-C. Lee, Centroidal Voronoi tessellation-based reduced order modeling of complex systems, SIAM J. Sci. Comput. 28 (2) (2006) 459–484.

[17] J. Zhang, M. Emelianenko, Q. Du, Periodic centroidal Voronoi tessellations, Intern. J. Num. Anal. Modeling 9 (2012) 950–969.

[18] T. Ringler, L. Ju, M. Gunzburger, A multiresolution method for climate system modeling: application of spherical centroidal Voronoi tessellations, Ocean Dynamics 58 (2008) 475–498.

[19] M. Emelianenko, Fast multilevel CVT-based adaptive data visualization algorithm, Numer. Math. Theor. Meth. Appl. 3 (2) (2010) 195–211.

[20] Q. Du, M. Emelianenko, Acceleration schemes for computing the centroidal Voronoi tessellations, Numer. Linear Algebra Appl. 13 (2006) 173–192.

[21] Q. Du, M. Emelianenko, Uniform convergence of a nonlinear energy-based multilevel quantization scheme via centroidal Voronoi tessellations, SIAM J. Numer. Anal. 46 (2008) 1483–1502.

[22] L. Chen, M. Holst, Efficient mesh optimization schemes based on optimal Delaunay triangulations, Comp. Methods Appl. Mech. Engrg. 200 (2011) 967–984.

[23] A. Okabe, B. Boots, K. Sugihara, Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, Wiley, Chichester, 1992.

[24] Q. Du, M. Emelianenko, L. Ju, Convergence properties of the Lloyd algorithm for computing the centroidal Voronoi tessellations, SIAM J. Numer. Anal. 44 (2006) 102–119.


[25] F. Brock, K. Crawford, R. Elliott, G. Cuperus, S. Stadler, H. Johnson, M. Eilts, The Oklahoma Mesonet: A technical overview, J. Atmos. Oceanic Technol. 12(1) (1995) 5–19.

[26] V. Maggioni, E. Nikolopoulos, E. Anagnostou, M. Borga, Modeling satellite precipitation errors over mountainous terrain: The influence of gauge density, seasonality, and temporal resolution, IEEE Transactions on Geoscience and Remote Sensing 55(7) (2017) 4130–4140.

[27] E. I. Nikolopoulos, M. Borga, F. Marra, S. Crema, L. Marchi, Debris flows in the eastern Italian Alps: Seasonality and atmospheric circulation patterns, Natural Hazards Earth Syst. Sci. 15 (3) (2015) 647–656.

[28] K.-L. Hsu, X. Gao, S. Sorooshian, H. V. Gupta, Precipitation estimation from remotely sensed information using artificial neural networks, J. Appl. Meteorol. 36 (9) (1997) 1176–1190.

[29] S. Sorooshian, K.-L. Hsu, X. Gao, H. V. Gupta, B. Imam, D. Braithwaite, Evaluation of PERSIANN system satellite-based estimates of tropical rainfall, Bull. Amer. Meteorol. Soc. 81 (2000) 2035–2046.

[30] K.-L. Hsu, S. Sorooshian, Satellite-based precipitation measurement using PERSIANN system, in: Hydrol. Model. Water Cycle, Berlin, Germany: Springer-Verlag, 2008, pp. 27–48.

[31] K.-L. Hsu, A. Behrangi, B. Iman, S. Sorooshian, Extreme precipitation estimation using satellite-based PERSIANN-CCS algorithm, in: M. Gebremichael, F. Hossain (Eds.), Satellite Rainfall Applications for Surface Hydrology, Springer, 2010, pp. 49–67.

[32] G. Ciach, W. Krajewski, On the estimation of radar rainfall error variance, Adv. Water Resources 22(6) (1999) 585–595.

[33] G. Ciach, W. Krajewski, Analysis and modeling of spatial correlation structure in small-scale rainfall in Central Oklahoma, Adv. Water Resources 29(10) (2006) 1450–1463.

[34] N. Cressie, Statistics for spatial data, John Wiley and Sons, 1993.

[35] G. Ciach, Local random errors in tipping-bucket rain gauge measurements, Journal of Atmospheric and Oceanic Technology 20(5) (2003) 752–759.


[36] R. E. Caflisch, Monte Carlo and quasi-Monte Carlo methods, Acta Numerica 7 (1998) 1–49. doi:10.1017/S0962492900002804.

[37] J. Sanchez-Moreno, C. Mannaerts, V. Jetten, Influence of topography on rainfall variability in Santiago Island, Cape Verde, Intl. J. Climatology 34(4) (2014) 1081–1097.

[38] G. Johnson, C. Hanson, Topographic and atmospheric influences on precipitation variability over a mountainous watershed, J. Applied Meteorology 34(1) (1995) 68–87.

[39] S. G. Nash, J. Nocedal, A numerical study of the limited memory BFGS method and the truncated-Newton method for large scale optimization, SIAM J. Optim. 1 (1991) 358–372.

8. Appendix A: Optimization problem formulation.

Following [23], let $z(\mathbf{x})$ be the random variable representing rainfall at location $\mathbf{x} \in W$. We can think of it as

$$z(\mathbf{x}) = m(\mathbf{x}) + \varepsilon(\mathbf{x})$$

with $E[\varepsilon(\mathbf{x})] = 0$, so that $m(\mathbf{x}) = E[z(\mathbf{x})]$ represents the average trend of the process over the given region.

Consider $k$ rain gauges (pluviometers) $\mathbf{x}_1, \ldots, \mathbf{x}_k$ placed within the region $W$, and let $V_i$ be the Voronoi (Thiessen) region associated with the $i$-th gauge. As done by Thiessen in 1911 [8], we can estimate the total precipitation in the region $W$ as

$$Z = \int_{W} z(\mathbf{x})\,d\mathbf{x} = \sum_{i=1}^{k} \int_{V_i} z(\mathbf{x})\,d\mathbf{x} \approx \sum_{i=1}^{k} |V_i|\, z(\mathbf{x}_i).$$
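The Thiessen estimate above is straightforward to evaluate on a discretized region. The following is a minimal sketch, assuming the region $W$ is covered by equal-area grid cells and that each cell is assigned to its nearest gauge in Euclidean distance; the function name and arguments are hypothetical and are not taken from the paper.

```python
def thiessen_total(cells, cell_area, gauges, gauge_values):
    """Approximate Z = integral of z over W by sum_i |V_i| * z(x_i).

    cells        : list of (x, y) centres of equal-area grid cells covering W
    cell_area    : area of a single grid cell
    gauges       : list of (x, y) gauge locations x_1, ..., x_k
    gauge_values : rainfall z(x_i) recorded at each gauge
    Returns the estimate of Z and the list of region areas |V_i|.
    """
    areas = [0.0] * len(gauges)
    for cx, cy in cells:
        # The nearest gauge determines which Voronoi (Thiessen) region owns the cell.
        i = min(range(len(gauges)),
                key=lambda j: (gauges[j][0] - cx) ** 2 + (gauges[j][1] - cy) ** 2)
        areas[i] += cell_area
    total = sum(a * z for a, z in zip(areas, gauge_values))
    return total, areas
```

For instance, on a unit square split evenly between two gauges that record 2 and 4 units of rainfall, each region has area 0.5 and the estimate is 0.5 · 2 + 0.5 · 4 = 3.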

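This leads to a natural formulation of the optimal gauge placement problem, namely to minimize the expected squared approximation error:

$$
\begin{aligned}
F(\mathbf{x}_1, \ldots, \mathbf{x}_k) &= E\left[\sum_{i=1}^{k} \int_{V_i} \big(z(\mathbf{x}) - z(\mathbf{x}_i)\big)^2\,d\mathbf{x}\right] \\
&= \sum_{i=1}^{k} \int_{V_i} \Big( [m(\mathbf{x}) - m(\mathbf{x}_i)]^2 + [\mathrm{Var}(\mathbf{x}) - \mathrm{Var}(\mathbf{x}_i)]^2 \\
&\qquad\qquad + 2\,\mathrm{Var}(\mathbf{x})\,\mathrm{Var}(\mathbf{x}_i)\,[1 - \mathrm{Corr}(\mathbf{x}, \mathbf{x}_i)] \Big)\,d\mathbf{x} \to \min \qquad (8.13)
\end{aligned}
$$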

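Here $\mathrm{Var}(\mathbf{x}) = (E[\varepsilon(\mathbf{x})^2])^{1/2}$ and $\mathrm{Var}(\mathbf{x}_i) = (E[\varepsilon(\mathbf{x}_i)^2])^{1/2}$ denote the standard deviations at the points $\mathbf{x}$ and $\mathbf{x}_i$, respectively (the square roots are needed for the second equality in (8.13) to hold), and $\mathrm{Corr}(\mathbf{x}, \mathbf{x}_i) = E[\varepsilon(\mathbf{x})\varepsilon(\mathbf{x}_i)] / [\mathrm{Var}(\mathbf{x})\,\mathrm{Var}(\mathbf{x}_i)]$ is the corresponding correlation of the time series at locations $\mathbf{x}$ and $\mathbf{x}_i$.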
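The second equality in (8.13) can be checked pointwise. The short derivation below is a sketch that writes $\sigma(\mathbf{x}) = (E[\varepsilon(\mathbf{x})^2])^{1/2}$ for the standard deviation of the process at $\mathbf{x}$ (the symbol $\sigma$ is introduced here only for this check) and uses $E[\varepsilon(\mathbf{x})] = 0$:

```latex
\begin{align*}
E\big[(z(\mathbf{x}) - z(\mathbf{x}_i))^2\big]
  &= [m(\mathbf{x}) - m(\mathbf{x}_i)]^2
     + E\big[(\varepsilon(\mathbf{x}) - \varepsilon(\mathbf{x}_i))^2\big] \\
  &= [m(\mathbf{x}) - m(\mathbf{x}_i)]^2
     + \sigma(\mathbf{x})^2 + \sigma(\mathbf{x}_i)^2
     - 2\,E[\varepsilon(\mathbf{x})\,\varepsilon(\mathbf{x}_i)] \\
  &= [m(\mathbf{x}) - m(\mathbf{x}_i)]^2
     + [\sigma(\mathbf{x}) - \sigma(\mathbf{x}_i)]^2
     + 2\,\sigma(\mathbf{x})\,\sigma(\mathbf{x}_i)\,
       \big[1 - \mathrm{Corr}(\mathbf{x}, \mathbf{x}_i)\big],
\end{align*}
% with Corr(x, x_i) = E[eps(x) eps(x_i)] / (sigma(x) sigma(x_i)).
```

The cross term $2[m(\mathbf{x}) - m(\mathbf{x}_i)]\,E[\varepsilon(\mathbf{x}) - \varepsilon(\mathbf{x}_i)]$ vanishes in the first line because the noise has zero mean. The last line matches the integrand of (8.13) term by term when $\sigma$ plays the role of the quantity written $\mathrm{Var}$ there.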

If $\mathrm{Var}(\mathbf{x})$ is small compared to the change in the average trend $m(\mathbf{x})$, then

$$F(\mathbf{x}_1, \ldots, \mathbf{x}_k) \approx \sum_{i=1}^{k} \int_{V_i} [m(\mathbf{x}) - m(\mathbf{x}_i)]^2\,d\mathbf{x} \to \min$$

One can solve this approximate problem to obtain the optimal gauge placement in this case.

If, on the other hand, the change in the average trend $m(\mathbf{x})$ is small compared with $\mathrm{Var}(\mathbf{x})$, which is considered to be a constant, $\mathrm{Var}(\mathbf{x}) = \varepsilon$, the following approximation is valid:

$$F(\mathbf{x}_1, \ldots, \mathbf{x}_k) \approx \sum_{i=1}^{k} \int_{V_i} 2\varepsilon^2\,[1 - \mathrm{Corr}(\mathbf{x}, \mathbf{x}_i)]\,d\mathbf{x} \to \min$$

Hence, if there is a relatively small change in the means, we can solve this approximate problem instead.

Notice that in this case $F(\mathbf{x})$ is a variant of the CVT energy (2.4) with $f(\|\mathbf{x} - \mathbf{x}_i\|) = \varepsilon^2 (1 - \mathrm{Corr}(\mathbf{x}, \mathbf{x}_i))$ and $\rho(\mathbf{x}) = 1$.
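To make the correlation-dominated objective concrete, the sketch below evaluates a discrete analogue of it from gridded time series. It is an illustration under several assumptions that are not stated in the text: the data live on equal-area grid cells, gauges sit at grid points, each cell is assigned to its nearest gauge, and the sample Pearson correlation stands in for Corr. All names are hypothetical.

```python
import math

def pearson(a, b):
    """Sample Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def decorrelation_energy(coords, series, gauge_ids, eps=1.0, cell_area=1.0):
    """Discrete analogue of sum_i int_{V_i} 2 eps^2 [1 - Corr(x, x_i)] dx.

    coords    : list of (x, y) grid-point coordinates
    series    : series[p] is the precipitation time series at grid point p
    gauge_ids : indices of the grid points where gauges are placed
    """
    total = 0.0
    for p, (px, py) in enumerate(coords):
        # Nearest gauge in space defines the Voronoi region of grid point p.
        g = min(gauge_ids,
                key=lambda q: (coords[q][0] - px) ** 2 + (coords[q][1] - py) ** 2)
        total += 2.0 * eps ** 2 * (1.0 - pearson(series[p], series[g])) * cell_area
    return total
```

The energy is zero when every grid point is perfectly correlated with its assigned gauge and grows as gauges are asked to represent cells whose time series they track poorly.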

9. Appendix B: Truncated-Newton Algorithm for CVT calculation

Here, we give a brief review of the truncated-Newton (TN) algorithm. For more details we refer readers to [13]. To solve an optimization problem of the form

$$\min_{x} f(x),$$

at the $j$-th TN iteration a search direction $p$ is computed as an approximate solution to the Newton equations

$$\nabla^2 f(x^{(j)})\,p = -\nabla f(x^{(j)}),$$

where $x^{(j)}$ is the current approximation to the solution of the optimization problem. The search direction $p$ is computed using the linear conjugate-gradient (CG) algorithm. The necessary Hessian-vector products are estimated using finite differencing, so the TN algorithm only requires the values of $f(x)$ and $\nabla f(x)$. TN has low storage requirements and a low computational cost per iteration, and hence is suitable for solving large optimization problems [39].
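The finite-difference Hessian-vector product is what lets TN avoid forming the Hessian. A minimal sketch follows, assuming a forward difference with a fixed step `h`; the function name and the default step are illustrative choices, not values taken from the paper.

```python
def hessian_vector_fd(grad, x, v, h=1e-6):
    """Approximate (Hessian of f at x) times v from two gradient evaluations.

    grad : callable returning the gradient of f as a list of floats
    x, v : lists of floats of equal length
    h    : finite-difference step (assumed value; problem dependent)
    """
    g0 = grad(x)
    g1 = grad([xi + h * vi for xi, vi in zip(x, v)])
    return [(a - b) / h for a, b in zip(g1, g0)]


# Example: f(x) = x0^4 + x0*x1 + x1^2, whose Hessian is [[12*x0^2, 1], [1, 2]].
def example_grad(x):
    return [4.0 * x[0] ** 3 + x[1], x[0] + 2.0 * x[1]]

hv = hessian_vector_fd(example_grad, [1.0, 2.0], [1.0, -1.0])
```

At the point (1, 2) the exact product with v = (1, -1) is (11, -1), and `hv` agrees with it up to an error proportional to `h`. Each CG iteration then costs one extra gradient evaluation rather than a full Hessian.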

The following steps compute a discrete CVT using the TN method with a pre-defined density function $\rho$:

i. Given the discrete set of points $Y = \{y_j\}_{j=1}^{m} \subset W$,

ii. Define the discrete energy function

$$G(x) = \sum_{V_i} \sum_{j} \rho(y_j)\,\|x_i - y_j\|^2,$$

where $j$ is the index for those $y_j$ included in the Voronoi set $V_i$,

iii. Compute its corresponding gradient value

$$\nabla_i G(x) = \sum_{j} 2\,\rho(y_j)\,(x_i - y_j),$$

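iv. By the Taylor series expansion

$$\nabla G(x + \alpha v) \approx \nabla G(x) + \alpha \nabla^2 G(x) v,$$

we can approximate the Hessian-vector product $\nabla^2 G(x) v$ required by CG, for a small $\alpha > 0$, as

$$\nabla^2 G(x) v \approx \frac{\nabla G(x + \alpha v) - \nabla G(x)}{\alpha},$$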

v. Substitute the above into the CG iteration described below to compute the search direction $p$:

• $p_0 := 0$
• $r_0 := -\nabla G(x)$
• $v_0 := r_0$
• $k := 0$
• repeat
  – $\alpha_k := \dfrac{r_k^T r_k}{v_k^T \nabla^2 G(x) v_k}$
  – $p_{k+1} := p_k + \alpha_k v_k$
  – $r_{k+1} := r_k - \alpha_k \nabla^2 G(x) v_k$
  – if $r_{k+1}$ is sufficiently small then exit loop, end if
  – $\beta_k := \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$
  – $v_{k+1} := r_{k+1} + \beta_k v_k$
  – $k := k + 1$
  end repeat

vi. Test whether $\nabla G(x)^T p < 0$. If so, accept $p$ as a descent direction; otherwise, take $p = -\nabla G(x)$,

vii. Use an Armijo line search to determine the step size $\alpha$, then update $x$ by $x = x + \alpha p$,

viii. Go back to step iii) until a stopping criterion is reached.
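Steps i to viii can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: the generators are stored as a flat list `[x1, y1, x2, y2, ...]` in two dimensions, and the function names, the finite-difference step `h`, the CG iteration cap, the Armijo constant `c`, and the non-positive-curvature safeguard are all assumptions made for the example.

```python
def nearest(x, y):
    """Index i of the generator x_i closest to sample point y."""
    return min(range(len(x) // 2),
               key=lambda i: (x[2*i] - y[0]) ** 2 + (x[2*i+1] - y[1]) ** 2)

def energy(x, Y, rho):
    """Discrete CVT energy G(x) of step ii."""
    total = 0.0
    for y, r in zip(Y, rho):
        i = nearest(x, y)
        total += r * ((x[2*i] - y[0]) ** 2 + (x[2*i+1] - y[1]) ** 2)
    return total

def gradient(x, Y, rho):
    """Gradient of step iii, using the Voronoi sets of the current x."""
    g = [0.0] * len(x)
    for y, r in zip(Y, rho):
        i = nearest(x, y)
        g[2*i] += 2.0 * r * (x[2*i] - y[0])
        g[2*i+1] += 2.0 * r * (x[2*i+1] - y[1])
    return g

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def axpy(a, u, w):
    """Return a*u + w elementwise."""
    return [a * ui + wi for ui, wi in zip(u, w)]

def tn_direction(x, g, Y, rho, max_cg=10, rtol=1e-12, h=1e-6):
    """Steps iv to vi: truncated CG applied to the Newton equations."""
    p, r = [0.0] * len(x), [-gi for gi in g]
    v, rr0 = r[:], dot(g, g)
    rr = rr0
    for _ in range(max_cg):
        # Step iv: finite-difference Hessian-vector product.
        Hv = [(a - b) / h for a, b in zip(gradient(axpy(h, v, x), Y, rho), g)]
        curv = dot(v, Hv)
        if curv <= 0.0:
            break  # safeguard not listed in the steps: stop on non-positive curvature
        alpha = rr / curv
        p, r = axpy(alpha, v, p), axpy(-alpha, Hv, r)
        rr_new = dot(r, r)
        if rr_new <= rtol * rr0:
            break  # residual sufficiently small
        v, rr = axpy(rr_new / rr, v, r), rr_new
    # Step vi: fall back to steepest descent if p is not a descent direction.
    return p if dot(g, p) < 0.0 else [-gi for gi in g]

def tn_cvt(x, Y, rho, iters=50, gtol=1e-10, c=1e-4):
    """Steps iii to viii: TN iteration with a backtracking Armijo line search."""
    for _ in range(iters):
        g = gradient(x, Y, rho)
        if dot(g, g) < gtol:
            break
        p = tn_direction(x, g, Y, rho)
        f0, slope, t = energy(x, Y, rho), dot(g, p), 1.0
        while t > 1e-12 and energy(axpy(t, p, x), Y, rho) > f0 + c * t * slope:
            t *= 0.5
        if t <= 1e-12:
            break  # no acceptable step found
        x = axpy(t, p, x)
    return x
```

With `Y` a list of sample points, `rho` the density values at those points, and `x0` an initial guess, `tn_cvt(x0, Y, rho)` returns generators whose energy is no larger than that of `x0`, since every accepted step satisfies the Armijo condition. For a fixed Voronoi assignment the Hessian is block diagonal, so a full Newton step moves each generator to the weighted centroid of its set, which is the same update as a Lloyd step.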
