monitoring design w/ support vector machines


  • 8/14/2019 monitoring design w/ support vector machines


    Support vectors-based groundwater head observation networks design

    Tirusew Asefa, Mariush W. Kemblowski, Gilberto Urroz, Mac McKee, and Abedalrazq Khalil

    Department of Civil and Environmental Engineering and Utah Water Research Laboratory, Utah State University, Logan, Utah, USA

    Received 26 April 2004; revised 30 August 2004; accepted 20 September 2004; published 25 November 2004.

    [1] This study presents a methodology for designing long-term groundwater head monitoring networks in order to reduce spatial redundancy. A spatially redundant well does not change the potentiometric surface estimation error appreciably, if not sampled. This methodology, based on Support Vector Machines, makes use of a uniquely solvable quadratic optimization problem that minimizes the bound on generalized risk, rather than just the mean square error of differences between measured and predicted groundwater head values. The nature of the optimization problem results in sparse approximation of the function defining the potentiometric surface that was utilized to select the number and locations of long-term monitoring wells and guide future data collection efforts, which is a prerequisite in building and calibrating regional flow and transport models. The methodology is applied to the design of regional groundwater monitoring networks in the Water Resources Inventory Area (WRIA) 1, Whatcom County, northern Washington State, USA.

    INDEX TERMS: 1829 Hydrology: Groundwater hydrology; 1848 Hydrology: Networks; 9820 General or Miscellaneous: Techniques applicable in three or more fields;

    KEYWORDS: Support Vector Machines, groundwater monitoring networks, statistical learning theory

    Citation: Asefa, T., M. W. Kemblowski, G. Urroz, M. McKee, and A. Khalil (2004), Support vectors-based groundwater head observation networks design, Water Resour. Res., 40, W11509, doi:10.1029/2004WR003304.

    1. Background

    [2] This article is concerned with the design of long-term groundwater head observation networks. Groundwater head observations are important calibration constraining data. Under ideal conditions, physical models that are based on governing physical processes of groundwater flow do not need calibration. In reality, since model input parameters are subject to uncertainties and since they are observed locally and in sparse locations only, it is necessary to adjust these parameters so that the observed value of a dependent variable (e.g., groundwater head) matches the one simulated. Groundwater monitoring network design is defined as the selection of sampling points (spatial) and sampling frequency (temporal) to determine the physical, chemical, and biological characteristics of groundwater [Loaiciga et al., 1992].

    [3] Broadly speaking, groundwater monitoring networks may be classified into two categories: (1) groundwater contaminant monitoring networks, and (2) groundwater head observation networks. On the basis of design objectives, the former, in turn, may be classified into three categories: initial groundwater contamination detection, characterization, and long-term monitoring networks. Initial groundwater contamination detection networks enable one to detect unexpected leaks before they reach a compliance boundary, which is usually located at some relatively short distance, say 100 m, from a landfill. Examples of such studies are Massmann and Freeze [1987a, 1987b], Meyer and Brill [1988], Morisawa and Inoue [1991], Meyer et al. [1994], Jardine et al. [1996], Storck et al. [1997], and Angulo and Tang [1999]. Contaminant characterization networks are concerned with characterizing the nature and extent of the pollutant once initial detection is made. Specifically, the design procedure provides a methodology for augmenting existing monitoring wells, if there are any, or siting new wells. Examples of such studies are Hudak and Loaiciga [1992], Mahar and Datta [1997], Datta and Dhiman [1996], and Montas et al. [2000]. In long-term monitoring network design, the aim is, given an adequately characterized plume, the development of a cost-effective monitoring plan. Issues one looks at are selecting the subset of monitoring wells to be sampled for a given period and the frequency of monitoring those wells. Examples of such studies are Molina et al. [1996], Cameron and Hunter [2000], Nunes et al. [2004a], and Reed et al. [2000, 2001, 2003]. We refer interested readers to a recent publication of the American Society of Civil Engineers (ASCE) task committee on state-of-the-art in long-term monitoring network design [Minsker and Task Committee, 2003].

    [4] On the basis of design objectives, groundwater head observation wells may be classified into two types: (1) characterization wells, where one tries to locate new observation wells; and (2) long-term monitoring wells, where one selects subsets from (many) existing wells to make frequent (monthly, quarterly) observations at those locations. Examples of such studies are Rouhani [1985], Gangopadhyay et al. [2001], and Nunes et al. [2004b].

    [5] On the basis of the design approach, Loaiciga et al. [1992] classified all types of monitoring networks (both groundwater head and contaminant monitoring) as hydrologic, which do not include advanced statistical methods, and statistical otherwise. The statistical approaches were further divided into simulation, variance-based, and probability-based. Differences between these methods came from differences in the objective function formulation.

    Copyright 2004 by the American Geophysical Union. 0043-1397/04/2004WR003304$09.00

    WATER RESOURCES RESEARCH, VOL. 40, W11509, doi:10.1029/2004WR003304, 2004

    [6] Variance-based methods, also known as variance reduction and redundancy reduction methods, assess the suitability of a given network by relying on the variance of estimation error obtained by kriging [Rouhani, 1985; Ben-Jemaa et al., 1994; Nunes et al., 2004a, 2004b]. A given monitoring network (number and locations) has associated uncertainty explained by the variance of estimation error; and if wells in the network are to be added, removed, or displaced, the associated network accuracy will change. These methods then systematically search through different combinations of monitoring well locations for the one that would result in minimum variance of estimation error.

    [7] Ben-Jemaa et al. [1994] applied a branch-and-bound algorithm in designing monitoring networks for observing aquifer properties. The algorithm consists of searching for optimal monitoring nodes along preconstructed tree branches. If one has to select a monitoring network of n nodes from a total of N locations, there will be $\binom{N}{n}$ possible network layouts, each layout corresponding to one branch of the search tree. The limitation of this approach is that if $n \ll N$, the number of combinations becomes very large and the problem becomes a difficult combinatorial optimization problem. Improvements on such an approach were made using heuristic algorithms that iteratively look for a better solution by trial and error, rather than searching the entire state space [Wagner, 1995; Nunes et al., 2004a, 2004b]. As in all heuristic searches, these approaches may not guarantee that the final network corresponds to a global optimum. In addition, more than one network may be classified as optimal, making the final solution nonunique and requiring additional criteria to choose between these equally optimal networks. This technique also does not depend on actual values of measured variables, but on the relative distribution of the measuring locations.
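The growth in the number of candidate layouts can be made concrete with a short calculation. The sketch below is illustrative only: the function name `count_layouts` and the pool and network sizes are assumptions made for this example and do not come from the study.

```python
from math import comb


def count_layouts(total_wells: int, network_size: int) -> int:
    """Number of distinct ways to pick network_size wells out of total_wells."""
    return comb(total_wells, network_size)


# Hypothetical pool of 100 candidate wells, used only to show the growth.
for n in (5, 10, 20):
    print(f"n = {n:2d}: {count_layouts(100, n):.3e} candidate layouts")
```

Even for this modest pool, the count rises by many orders of magnitude as n grows from 5 to 20, which is why exhaustive enumeration quickly becomes impractical.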

    [8] In this paper, we present a methodology that is based on Support Vector Machines (SVMs) for long-term groundwater head monitoring network design that globally optimizes an objective function to identify monitoring well locations based on their importance in explaining the potentiometric surface, without going through exhaustive trial and error searches on alternative monitoring well configurations.

    2. Case Study

    [9] The study area, which is in northwestern Washington State, USA, and includes a small portion inside Canada, is part of what is known as Water Resources Inventory Area #1 (hereafter WRIA 1) (Figure 1). It covers an area of 629 km². As part of a concerted effort in tackling water resources management problems in WRIA 1, Utah State University (USU), the United States Geological Survey (USGS), and the Public Utility District No. 1 (PUD) of Whatcom County undertook different tasks within three phases: Phase I, Organization; Phase II, Technical assessment; and Phase III, Plan development and implementation. Different water resources management issues currently being looked at include: (1) Groundwater quantity/quality assessments; (2) Surface water quantity/quality assessments; (3) Instream flow and fish habitat requirements; and (4) Database management and decision support systems that integrate these different activities and present an easy-to-use computer model for the decision makers.

    All these processes are interrelated. One of the central components of the system is groundwater flow and transport modeling, knowledge of which is a prerequisite for other processes. Effective groundwater flow and transport modeling, in turn, would require, among other things, groundwater head observations that should be collected in a timely fashion and used for model building and calibration. Therefore acquisition of this calibration constraining data is the first step in flow modeling. In the past, our experience in the project area showed that budget and practical constraints (for example, arrangements with private well owners, and arrangements for transboundary measurements due to the fact that some well owners are within Canada) resulted in asynchronous groundwater head measurements. The USGS conducted one of the most complete surveys in 1990. Within six months (March to August 1990), observations were made covering the entire present study area. These inventoried wells are shown in Figure 1.

    Figure 1. Study area and 1990 phased groundwater head measurement locations. Cross sections are shown in Figure 4.

    W11509 ASEFA ET AL.: SVM IN GROUNDWATER HEAD MONITORING NETWORKS W11509

    [10] Since it is not feasible to measure all these wells at all times, the management problem to be addressed by the present study is to identify subsets of these wells to be monitored simultaneously on a regular basis. Cost-effective acquisition of these data is then crucial in flow and transport modeling. Consequently, regional groundwater monitoring network design that identifies wells to be monitored on a regular basis while characterizing the potentiometric surface adequately is the subject of this study. In doing so, we present a novel approach to regional groundwater monitoring network design that uses a new learning methodology called Support Vector Machines (SVM) based on Statistical Learning Theory (SLT).

    [11] Despite enjoying success in other fields [Scholkopf et al., 1999], there are few applications of SVM in hydrology. Dibike et al. [2001] applied SVM successfully in both remotely sensed image classification and regression (rainfall/runoff modeling) problems and reported a superior performance over the traditional artificial neural networks. Kaneviski et al. [2000] used SVM for mapping soil pollution by Chernobyl radionuclide Sr90 and concluded that the SVM was able to extract spatially structured information from the raw data. Liong and Sivapragasam [2000] also reported a superior SVM performance compared to artificial neural nets in forecasting flood stage. Asefa and Kemblowski [2002] used SVM to reproduce the behavior of a Monte Carlo based groundwater flow and transport model that was, in turn, utilized in the design of initial groundwater contamination detection monitoring systems. Training and testing examples were derived using plumes generated from a random contaminant leak resulting from failure of a landfill cell and a random hydraulic conductivity field. Monitoring networks designed by the trained machine were nearly identical with those obtained by the physical models. Contaminant plume detection reliabilities provided by both methods were also close.

    3. Methodology

    3.1. Estimation Variance-Based Method

    [12] Suppose one wants to predict the value of a random variable Z, from a space of functions F (also named feature space), at a location $x_0$ where no observation is made, using observations in the vicinity, $x_1, x_2, \ldots, x_N$. The most common theory considers second-order stationarity. The kriging estimator of Z at point $x_0$ is given as the best linear unbiased estimation (BLUE), expressed as a linear combination of the observations:

    $$\hat{Z}(x_0) = \sum_{i=1}^{N} w_k(i)\, Z(x_i), \qquad (1)$$

    where the $w_k(i)$ are kriging weights; $\hat{Z}(x_0)$ is the kriging estimator; and $Z(x_i)$ is the observation made at location $x_i$. The kriging weights are determined by requiring unbiasedness ($\sum_{i=1}^{N} w_k(i) = 1$) and minimum estimation variance [Journel and Huijbregts, 1978].

    [13] The groundwater monitoring network optimization problem is then posed as follows: for a given network size of n, find the best monitoring locations out of the total of N that result in minimum mean estimation variance. This is done through an exhaustive search, as in the branch-and-bound algorithm that guarantees a global optimum, or through a heuristic near-optimal solution using, say, simulated annealing or a genetic algorithm. For applications of this approach see studies by Rouhani [1985], Ben-Jemaa et al. [1994], and Nunes et al. [2004a, 2004b].

    3.2. Support Vector-Based Method

    [14] The support vector methodology [Vapnik, 1995, 1998], based on Statistical Learning Theory (SLT) (see Appendix A for detailed presentation of SVM algorithm), estimates the value of Z at unsampled location $x_0$ (vector of measurement locations x and y) by

    $$\hat{Z}(x_0) = \langle w_{sv}, x_0 \rangle + b, \qquad (2)$$

    where $\langle \cdot , \cdot \rangle$ indicates an inner or dot product between $x_0$ and $w_{sv}$; $w_{sv}$ is the support vector weight (basis function); and b is bias. For simplicity, we will just use x rather than $x_0$. The weights and bias are found by minimizing a regularized $\varepsilon$-insensitive loss function. This loss function is depicted in Figure 2 and given below:

    $$|Z(x) - \hat{Z}(x)|_{\varepsilon} = \begin{cases} 0 & \text{if } |Z(x) - \hat{Z}(x)| \le \varepsilon \\ |Z(x) - \hat{Z}(x)| - \varepsilon & \text{otherwise,} \end{cases} \qquad (3)$$

    where $Z(x)$ is the measured quantity (groundwater head in this case) and $\varepsilon$ represents the precision by which $\hat{Z}(x)$ is estimated.

    [15] In Figure 2, each data point schematically represents measurements made at a monitoring well, and the $\xi$'s are slack variables that measure distances of these data points from the $\varepsilon$ tube. Data points that lie inside the $\varepsilon$ tube have a zero value of the loss function and do not have associated slack variables.
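As an illustration of equation (3), the loss can be written as a few lines of Python. This is a minimal sketch; the function name `eps_insensitive_loss` and the head values used below are hypothetical and serve only to show the two branches of the definition.

```python
def eps_insensitive_loss(observed: float, predicted: float, eps: float) -> float:
    """Epsilon-insensitive loss: zero inside the tube, linear outside it."""
    residual = abs(observed - predicted)
    if residual <= eps:
        return 0.0          # point lies inside the epsilon tube
    return residual - eps   # distance from the tube, i.e., the slack


# Hypothetical heads (in arbitrary length units) and a tube half-width of 0.1.
print(eps_insensitive_loss(10.0, 10.05, 0.1))  # inside the tube
print(eps_insensitive_loss(10.0, 10.50, 0.1))  # outside the tube
```

The nonzero value returned for a point outside the tube corresponds to the slack variable attached to that point in Figure 2.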

    [16] In order to find $w_{sv}$ and b, if one only minimizes equation (3), it is an ill-posed problem in Tikhonov's sense [Tikhonov and Arsenin, 1977]. Therefore in practice one imposes a convex penalty term on some quantity related to the complexity of Z. Vapnik's [Vapnik, 1995, 1998] choice of the regularization term is given by $\frac{1}{2}\|w\|^2$.

    [17] The optimization problem is then cast as follows: What is the best subset of long-term monitoring wells (number and locations) out of the existing N wells that would result in the best estimation of potentiometric


    Intuitively, one can imagine the support vectors as monitoring well locations that support the estimated potentiometric surface. Observe the difference between equation (5c) and equation (2). This is because, in differentiating the dual form, the SVM weights are shown to be equal to $w_{sv} = \sum_{i=1}^{n} (\alpha_i^* - \alpha_i)\, x_i$ (equation (A6)). Substituting this expression in equation (2) would result in $\hat{Z}(x) = \sum_{i=1}^{n} (\alpha_i^* - \alpha_i) \langle x, x_i \rangle + b$.

    [21] The dot product is then substituted by a kernel: $\langle x, x' \rangle \rightarrow \langle \Phi(x), \Phi(x') \rangle = K(x, x')$. This is the so-called kernel trick, depicted in Figure 3, by which a nonlinear transformation is achieved. It is possible because the SVM algorithm depends only on the dot product between monitoring well locations (see equations (A9a)-(A9c)).
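The kernel trick can be checked numerically for a case where the transformation is known in closed form. The sketch below assumes a degree-2 polynomial kernel on two-dimensional inputs, for which an explicit six-dimensional feature map exists; the function names and the two sample points are hypothetical and chosen only for this illustration.

```python
from math import isclose, sqrt


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, K(x, y) = (x . y + 1)^2, for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1] + 1.0) ** 2


def feature_map(x):
    """Explicit feature map whose ordinary dot product reproduces poly_kernel."""
    x1, x2 = x
    root2 = sqrt(2.0)
    return (1.0, root2 * x1, root2 * x2, x1 * x1, x2 * x2, root2 * x1 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


a, b = (1.0, 2.0), (3.0, 0.5)  # two hypothetical well coordinates
# The kernel in the original 2-D space equals a dot product in the 6-D space.
print(poly_kernel(a, b), dot(feature_map(a), feature_map(b)))
print(isclose(poly_kernel(a, b), dot(feature_map(a), feature_map(b))))
```

The kernel is evaluated without ever forming the transformed vectors, which is what makes the substitution practical for kernels whose feature space is very high dimensional.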

    [22] Kernels may be viewed as dot products of nonlinear transformation functions. The connection between Reproducing Kernel Hilbert Space and random processes is well documented [see, e.g., Wahba, 1990]. According to the Bayesian interpretation, the first term in equation (4a) is a stabilizer that is a prior on the regression function Z in the Reproducing Kernel Hilbert Space (RKHS) induced by kernel K, and the data term is the noise model. If we assume that the data, $z_i$, are affected by an additive independent Gaussian noise process ($z_i = z(x_i) + \varepsilon_i$), then the squared norm, $\|Z\|^2$, can be thought of as the generalization of the expression $Z^{T} \Sigma^{-1} Z$ (also called the Mahalanobis distance from the mean Z) with covariance $\Sigma$ [Wahba, 1990; Poggio and Girosi, 1998a, 1998b]. The density, P(Z), is then a multivariate Gaussian zero-mean function in the Hilbert space defined by the covariance function. The existence of such a well-defined family of random variables is guaranteed by the Kolmogorov consistency theorem [Wahba, 1990]. Therefore choosing kernel K may be viewed as assuming a Gaussian prior on Z with covariance equal to K [Poggio and Girosi, 1998b]. This is also the link between SVMs and kriging theory, where the kernel is given by the covariance function: $K(x, x') = \mathrm{cov}(Z(x), Z(x')) = \Sigma$.

    [23] The optimization problem given by equations (5a)-(5c) estimates the best function that defines the potentiometric surface as a function of support vector locations only. Measurements at other locations (those that lie inside the $\varepsilon$ tube) do not contribute to the function defining the potentiometric surface. Since support vectors define the potentiometric surface, future groundwater head observations at those locations will explain the nature of this surface better than measurements taken at other locations. Therefore support vector locations are assumed to be the best long-term monitoring well locations. In addition, the SVM algorithm directly gives the number of wells to be monitored.
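The sparsity argument can be illustrated with a small sketch. It assumes the kernel form of the estimator, $\hat{Z}(x) = \sum_i (\alpha_i^* - \alpha_i) K(x, x_i) + b$, with the radial basis kernel of Table 1. The well coordinates, coefficients, bias, and kernel parameter below are hypothetical placeholders, not values from the study, and the quadratic optimization that would produce the coefficients is not shown.

```python
from math import exp


def rbf(x, y, gamma):
    """Radial basis kernel, exp(-gamma^2 * ||x - y||^2), for 2-D points."""
    h2 = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return exp(-(gamma ** 2) * h2)


def predict(x, wells, coeffs, bias, gamma):
    """Kernel expansion over the given wells; coeffs stand for (alpha* - alpha)."""
    return bias + sum(c * rbf(x, w, gamma) for w, c in zip(wells, coeffs))


def support_indices(coeffs):
    """Wells with a nonzero coefficient are the support vectors."""
    return [i for i, c in enumerate(coeffs) if c != 0.0]


# Hypothetical wells and coefficients; two wells lie inside the tube (c = 0).
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
coeffs = [0.8, 0.0, -0.3, 0.0]
bias, gamma = 12.0, 2.0

sv = support_indices(coeffs)
full = predict((0.4, 0.6), wells, coeffs, bias, gamma)
sparse = predict((0.4, 0.6), [wells[i] for i in sv], [coeffs[i] for i in sv], bias, gamma)
print(sv, full, sparse)
```

Dropping the wells with zero coefficients leaves the estimated surface unchanged, which is the sense in which only the support vector locations need to be monitored.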

    4. Application to a Case Study

    [24] SVM-based regional groundwater monitoring network design may be summarized in two steps: (1) inventory of groundwater head observations and hydrogeological characterization of different layers within which existing piezometers are located; and (2) SVM implementation.

    4.1. Hydrogeological Characterization

    [25] In the present study area, groundwater observation wells are distributed within different aquifer layers and one has to delineate these aquifers in order to select wells in each layer. At a regional scale, the study area is classified as what is known as the Puget Sound Lowland, which has been influenced in large part by the tectonic and glacial events during the Tertiary and Quaternary periods [Jones, 1999]. This part of the Puget Sound Lowland is named the Fraser-Whatcom Basin. Cox and Kahle [1999] identified two classes of aquifers (from top down): (1) Sumas aquifer (Qsa) and (2) Everson-Vashon aquifer (Qev). The latter may be further divided into the Everson-Vashon fine-grained confining unit (Qevf) and the Everson-Vashon coarse-grained layer (Qevc), a confined aquifer. The Qevc consists of discontinuous patches (lenses, pools). Therefore the hydrogeology of the present study area is a two-aquifer, three-layer system.

    [26] Characterization data were obtained from Cox and Kahle [1999], the Whatcom County Health and Human Services Department (WCHHSD) well log database (2826 geographically referenced points), and the Department of Ecology's scanned well logs (6967 data points). These data were analyzed to select well logs that were subsequently used to delineate these identified hydrogeologic layers. Well log selection criteria, among other factors, include depth of completion and uniform areal coverage. Figure 4 shows two cross sections (east-west and south-north) of the present study area. Locations of the cross sections are shown in Figure 1.

    Figure 3. Conceptual representation of kernel transformation.

    [27] Because most of the water supply need in the project area is satisfied by the Sumas Aquifer, and this aquifer is practically disconnected from the underlying Qevc layer through a thick, low-permeability Qevf layer, the present study is concerned only with the Sumas Aquifer. In addition, most of the inventoried observation wells are also sited in this aquifer. We note that several localized previous hydrological investigations also considered the bottom of the Sumas Aquifer as an impermeable unit [Associated Earth Sciences, Inc., 1994, 1995; GeoEngineers Hydrogeologic Services, 1994; Water Resources Consulting, LLC, 1997].

    Figure 4. Cross sections of (a) east-west and (b) south-north. The locations of the cross sections are shown in Figure 1.

    4.2. SVM Implementation

    [28] Equations (5a)-(5c) constitute a quadratic optimization problem that guarantees a global optimum and can be solved using any off-the-shelf quadratic optimization algorithm, such as LOQO [Vanderbei, 1994]. We used the SVM optimization code developed by the Royal Holloway University of London and AT&T Speech and Image Processing Service Research Lab [Saunders et al., 1998]. The data required to solve equations (5a)-(5c) are the observed groundwater head, $Z(x)$, at monitoring locations x (X and Y coordinates), and a kernel $K(x, x_i)$ that describes the (nonlinear) dependency between observation points. Table 1 shows the most commonly used SVM kernels. Here we used a radial basis kernel that is translation invariant and estimated its parameter using cross validation (see below). From Table 1, notice that use of the two-layer neural network kernel in SVM is not the same as that of the traditional Artificial Neural Network (ANN) [Govindaraju and Rao, 2000]. This important difference between ANNs and SVMs is explained below.

    [29] Although the transformation function (kernel) used by ANNs and by SVMs with the two-layer neural network kernel is similar, the loss function used by ANNs (based on least squares) does not result in a sparse solution [Girosi, 1998], unlike that of the SVM. Therefore, because of the nature of the loss function employed, if ANNs were to be used to estimate the potentiometric surface, they would use all the measured data at monitoring well locations. Consequently, ANNs will not be able to directly select a subset of monitoring wells to be used as long-term monitoring (LTM) networks as a function of different levels of potentiometric surface approximations. Lastly, most training methods in ANN, such as the back propagation algorithm, may not guarantee a global optimum [Hastie et al., 2001, p. 359; Vapnik, 1998, p. 399].

    [30] The SVM algorithm is used in two stages: (1) training/validation, and (2) design. The training/validation stage aims at finding the optimal kernel parameter and SVM parameter C for a range of potentiometric surface approximations ($\varepsilon$) that will be used in the design stage. The design stage then uses the trained SVM to provide a long-term monitoring network as a function of groundwater head surface approximations. Each of these steps is explained below.

    [31] Three hundred and fifty well locations and groundwater head observations extracted from the Sumas Aquifer were used to estimate the SVM parameter, C, and the radial basis kernel parameter, $\gamma$. One way of conducting the training/validation is with a split sample approach. This approach divides the available data into two and uses one part for training and the remaining part for validation. Optimal SVM parameters will then be selected based on performance (e.g., minimum root mean square error) on the validation set. We used a K-fold cross-validation technique. The K-fold cross-validation approach splits the available data into K more or less equal parts. K-1 parts of the data will be used to find the SVM estimator, $\hat{Z}(x)$, and calculate the validation error of the fitted model while predicting the kth part of the data. The procedure then continues for k = 1, 2, ..., K, and the selection of parameters is based on minimum prediction error estimates over all K parts.
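The splitting step of K-fold cross validation can be sketched as follows. This is a minimal illustration, not the procedure actually coded for the study: the function name `k_fold_indices` is hypothetical, the data are assumed to have been shuffled beforehand, and fitting the SVM on each training part is left out.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train, validation) index lists for K roughly equal folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# With the 350 observations mentioned in the text and K = 5, each model is
# trained on 280 wells and validated on the remaining 70.
for train, validation in k_fold_indices(350, 5):
    print(len(train), len(validation))
```

For each candidate parameter set, the validation errors of the K fitted models would then be combined, and the set with the smallest combined error retained.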

    [32] Now the question is what value to use for K. Hastie et al. [2001] recommend the use of K = 5 or 10 based on the shape of a learning curve. A learning curve is a plot of training error versus training size. For given SVM parameters ($\gamma$, $\varepsilon$, and C), different training errors are calculated by progressively estimating $\hat{Z}(x)$ for increasing training sizes, constituting a plot of the learning curve. For smaller training sizes, the learning curve has a steep slope; it gradually flattens as the training size increases and changes in training error become small. At this point, the training error is said to be independent of the training size. Consequently, the value 4K or 9K will correspond to the training size where the learning curve starts to be flat. We note that even though the actual value of the training error may differ for different combinations of SVM parameters, the shape of the learning curve remains more or less the same (i.e., the training size that corresponds to the flattened portion of the training curve stays nearly the same).

    [33] Figure 5 shows a representative learning curve in our case. The curve was made using $\varepsilon = 0.1$, $C = 10$, and $\gamma = 6$. The value of the kernel parameter was derived from data. This was done by noting that the radial basis kernel, in fact, is a Gaussian covariance with unit variance (see Table 1), the relation being $r^2 = 1/\gamma^2$, where r is the distance after which no spatial autocorrelation is evident. Figure 6 shows the experimental and Gaussian covariance that was used to estimate the value of $\gamma$. We would like to point out that this kernel parameter value obtained from the covariance fit is used to obtain the learning curve, and we do not imply an assumption of an underlying Gaussian random field for the head distribution. One could also use an arbitrarily selected kernel parameter value and adjust its value during training.

    Table 1. Commonly Used Kernels

    Kernel Type                 Expression
    Simple dot product^a        $K(x, x') = x \cdot x'$
    Polynomial                  $K(x, x') = (x \cdot x' + 1)^d$; d is user specified
    Two-layer neural network    $K(x, x') = \tanh(b(x \cdot x') - c)$; b and c are user specified
    Radial basis^b              $K(x, x') = \exp(-\gamma^2 \|x - x'\|^2)$; $\gamma^2$ is user specified

    ^a This kernel corresponds to a linear machine.
    ^b This kernel is translation invariant. It can be written as a Gaussian covariance kernel with unit variance: $K(x, x') = \sigma^2 \exp(-\|x - x'\|^2 / r^2) = \exp(-h^2 / r^2)$, where $\sigma^2 = 1$, $r^2 = 1/\gamma^2$, and $h^2 = \|x - x'\|^2$.

    Figure 5. A learning curve. The broken line corresponds to fivefold cross validation (280 data points).
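The equivalence between the radial basis kernel and a unit-variance Gaussian covariance, stated in footnote b of Table 1, can be verified numerically. In the sketch below the function names are hypothetical; $\gamma = 6$ is the value quoted for the learning curve, and the separation distances are arbitrary, since the text does not state the units or scaling of the coordinates.

```python
from math import exp, isclose


def radial_basis_kernel(h: float, gamma: float) -> float:
    """Radial basis kernel as a function of separation distance h."""
    return exp(-(gamma ** 2) * h ** 2)


def gaussian_covariance(h: float, r: float, sigma2: float = 1.0) -> float:
    """Gaussian covariance with variance sigma2 and length scale r."""
    return sigma2 * exp(-(h ** 2) / (r ** 2))


gamma = 6.0
r = 1.0 / gamma  # from r^2 = 1 / gamma^2

# Arbitrary separation distances, in the same (unstated) units as r.
for h in (0.0, 0.05, 0.1, 0.2, 0.5):
    k = radial_basis_kernel(h, gamma)
    c = gaussian_covariance(h, r)
    print(f"h = {h:4.2f}  kernel = {k:.6f}  covariance = {c:.6f}")
```

A larger $\gamma$ therefore corresponds to a shorter length scale r and a more localized kernel.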

    [34] As shown in Figure 5, the learning curve is relatively flat after it reaches a training size of 250. The five- and tenfold training sizes correspond to sample sizes of 280 and 315, respectively, at which performance is virtually the same as that of the complete set. Thus cross validation would not suffer from much bias. The case K = 5 will have almost the same performance as the case K = 10, but it will result in a smaller computational time and, therefore, was used to conduct the cross validation. If the five- or tenfold training size (training size corresponding to 4K or 9K) indicates a location where the learning curve has a considerable slope, from Figure 5 we observe that the true prediction error (where the curve flattens) will be underestimated [Hastie et al., 2001].

    [35] Consequently, we conducted a fivefold cross validation for a range of potentiometric surface approximations ($\varepsilon$ = 0.01-0.5) and obtained optimal values of SVM parameters to be C = 7 and $\gamma$ = 2. These values were then used in the design stage as explained in the next section.

    4.3. Selecting Optimal Long-Term Monitoring Networks

    [36] The design of a groundwater monitoring network is a multiobjective optimization problem [Knopman et al., 1991; Cieniawski et al., 1995; Wagner, 1995; Reed et al., 2001, 2003]. If one monitors all the available wells, the error associated with defining the potentiometric surface will be minimal, but this also means a higher cost of monitoring. Using a small number of monitoring wells would be less costly but will also have higher error in explaining the potentiometric surface. Therefore our interest in this study lies in (1) finding how many wells would be required to define the groundwater flow field, (2) identifying the locations of those wells, and (3) providing a decision curve that shows trade-offs between the number of wells and the corresponding relative error in groundwater table elevation estimates.

    [37] Using the optimal SVM parameters estimated in the previous section, we fit a potentiometric surface to all the observed data (groundwater head observations at monitoring wells and their corresponding locations, X and Y coordinates) for various levels of potentiometric surface approximations. At the end of the quadratic optimization procedure, the support vectors were extracted and geographically referenced, thus producing a set of long-term monitoring well locations. Different magnitudes of error in defining the potentiometric surface would then result in different numbers and locations of monitoring wells. Therefore the relation between $\varepsilon$ and the number and locations of monitoring wells can be used to decide the size of the network, as shown in Figure 7. For example, Figure 8 shows the locations of monitoring wells for four different error levels. Sixty-five monitoring wells would be required to maintain an error level of 5%; 23 wells for $\varepsilon$ = 10%; and so on. Wells selected in networks of higher error level (for example, $\varepsilon$ = 15%) were found to be progressively included in the set when $\varepsilon$ is smaller, rendering consistency in the solution.

    [38] It is interesting to observe that the selected monitoring well locations (Figure 8) are in the areas where the observed heads are most uncertain. Inspection of the equipotential lines shows that the support vector points approximately follow the groundwater watershed boundaries. If two or more monitoring locations are very close to each other, it is because the local differences between groundwater heads at those locations are large, therefore requiring more monitoring wells to explain the groundwater head variation in those areas. Figure 9 depicts the SVM prediction error surface for different sizes of monitoring networks. Recall that, from the definition of support vectors, at selected monitoring well locations we have (absolute) prediction errors equal to or greater than the prespecified error level. In other words, at those locations training points are on or outside the ε tube. Nonmonitoring observation wells at other locations lie inside the ε tube and hence do not contribute toward the definition of the potentiometric surface. This confirms common intuition, as the SVM procedure puts observation wells at the most uncertain locations. The groundwater surface is then supported at those locations.
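    The tube criterion described above can be sketched in a few lines. The function name `split_by_tube` and the tolerance are assumptions, and ε is taken here to be in the same units as the heads; how the percentage error levels map to head units is not restated in this section.

```python
def split_by_tube(observed, predicted, eps, tol=1e-9):
    """Split well indices into support vectors and redundant wells.

    A well counts as a support vector when its absolute prediction error is
    on or outside the tube of half-width `eps` (up to a small tolerance).
    """
    support, redundant = [], []
    for i, (z, z_hat) in enumerate(zip(observed, predicted)):
        if abs(z - z_hat) >= eps - tol:
            support.append(i)      # on or outside the tube: keep monitoring
        else:
            redundant.append(i)    # strictly inside the tube: spatially redundant
    return support, redundant


if __name__ == "__main__":
    heads = [10.0, 12.5, 9.0, 15.0]        # illustrative observed heads (m)
    surface = [10.2, 11.5, 9.1, 16.0]      # illustrative predicted heads (m)
    print(split_by_tube(heads, surface, eps=1.0))
```

    In an actual SVM fit the support vectors come out of the quadratic optimization itself; this check only illustrates the geometric property they satisfy.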

    [39] We also investigated the performance of the kernel parameter value (correlation length scale) estimated from the covariance fit, compared to the one obtained through fivefold cross validation, on the complete set of data. Figure 10 depicts this comparison.

    Figure 6. Experimental covariance along with Gaussian covariance fit.

    Figure 7. Network size versus potentiometric surface approximation.

    [40] For small values of ε, the covariance fit value (smaller correlation scale or higher gamma) gave better Root Mean Square Error (RMSE) results; as ε increases, the kernel parameter derived from the fivefold cross validation (selected based on overall best performance) gave better RMSE. This observation can be explained as follows: At lower values of ε, the estimated potentiometric surface will be close to the observed groundwater surface, requiring one to use a highly localized kernel and, hence, such a kernel is expected to produce a smaller RMSE. As ε increases, the estimated potentiometric surface is flatter, with support vectors far apart and, hence, a kernel with a larger length scale would result in smaller RMSE values.
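    The link between correlation scale and kernel locality can be illustrated with a Gaussian kernel. This is a minimal sketch that assumes the common parameterization K = exp(-γ d²) with γ = 1 / (2 ℓ²); the exact relation between gamma and the correlation length scale used in the study is not given in this section, and the distances below are illustrative.

```python
import math


def gaussian_kernel(p, q, length_scale):
    """Gaussian kernel between two (x, y) well locations.

    Assumes K = exp(-gamma * d^2) with gamma = 1 / (2 * length_scale^2),
    so a smaller length scale means a larger gamma and a more local kernel.
    """
    gamma = 1.0 / (2.0 * length_scale ** 2)
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-gamma * d2)


if __name__ == "__main__":
    a, b = (0.0, 0.0), (1000.0, 0.0)   # two wells 1000 m apart
    for scale in (500.0, 2000.0):
        print(scale, gaussian_kernel(a, b, scale))
```

    With the shorter length scale the two wells barely influence each other, which suits a surface that must track the observations closely (small ε); the longer scale spreads each support vector's influence, which suits the flatter surfaces obtained at large ε.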

    [41] Two types of measurement errors may be identified in the process: (1) piezometer dislocation (X, Y coordinates); and (2) groundwater head measurement errors. The former type is a onetime error (although piezometer locations could be updated through resurvey). Usually, groundwater observations are made from ground surface to water table and are converted to groundwater heads by subtracting these values from estimated ground surface elevations. When the variation in topography within the neighborhood of a piezometer head is large, the impact of dislocation error could be significant and may affect subsequent estimates in both groundwater network design and flow and transport modeling. In the present study, we extracted piezometer head information from a high-resolution (10 m) Digital Elevation Model (DEM) using GIS operations and assumed that the dislocation error is negligible.
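    The head conversion and the effect of a location error can be written out as simple arithmetic. The sketch below assumes a first-order approximation in which the elevation error equals the local ground slope times the horizontal offset; the function names and numbers are illustrative and are not taken from the study.

```python
def groundwater_head(ground_elevation_m, depth_to_water_m):
    """Head = ground surface elevation minus measured depth to water table."""
    return ground_elevation_m - depth_to_water_m


def head_error_from_dislocation(slope, dislocation_m):
    """First-order head error caused by a horizontal location error.

    `slope` is the local ground slope (m/m) read from the DEM; the elevation
    assigned to the well, and hence the head, is off by about slope * offset.
    """
    return abs(slope) * abs(dislocation_m)


if __name__ == "__main__":
    print(groundwater_head(52.0, 7.5))              # head in metres
    print(head_error_from_dislocation(0.05, 10.0))  # 5% slope, 10 m offset
```

    On flat terrain the same offset produces almost no head error, which is consistent with treating dislocation as negligible when a high-resolution DEM is available.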

    Figure 8. SVM predicted groundwater head (m) surface and selected monitoring wells for different levels of a prespecified error level (ε, number of monitoring wells): (a) (5%, 65); (b) (10%, 23); (c) (15%, 11); and (d) (20%, 8).

    [42] In order to investigate groundwater head measurement errors, we conducted experiments for different Noise to Signal Ratios (NSR) using Gaussian noise. NSR is defined as the ratio between the variance of the noise and the variance of the observed data. Table 2 shows comparisons between designed networks with and without Gaussian noise. At ε = 5%, the network size increased marginally, and this change grows with increasing NSR values, whereas at higher ε values (for example, ε = 10%) the change in network size remains steady with increasing NSR. As seen in the table, the network size changes are larger at lower values of ε, which agrees with our intuition and indicates that networks designed at higher ε levels are more tolerant of measurement corruption. For example, at ε = 15%, the NSR value has to be increased to 50% in order to cause changes in the designed network. Overall, we have found the support vector-based designed network to be robust.
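    The noise experiment relies only on the NSR definition given above. A minimal sketch of how heads could be corrupted for a chosen NSR follows; the function name, the use of the population variance, and the synthetic heads are assumptions made for illustration.

```python
import random
import statistics


def add_gaussian_noise(heads, nsr, seed=None):
    """Return heads corrupted by zero-mean Gaussian noise.

    The noise variance is nsr * var(heads), following the NSR definition
    (variance of the noise divided by variance of the observed data).
    """
    rng = random.Random(seed)
    sigma = (nsr * statistics.pvariance(heads)) ** 0.5
    return [h + rng.gauss(0.0, sigma) for h in heads]


if __name__ == "__main__":
    heads = [float(i % 50) for i in range(20000)]   # synthetic heads (m)
    noisy = add_gaussian_noise(heads, nsr=0.1, seed=1)
    noise = [n - h for n, h in zip(noisy, heads)]
    # The empirical ratio should be close to the requested NSR of 0.1.
    print(statistics.pvariance(noise) / statistics.pvariance(heads))
```

    Refitting the network on the corrupted heads and comparing the selected wells with the noise-free design would reproduce the kind of comparison summarized in Table 2.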

    5. Conclusions

    [43] We have presented a regional groundwater network design procedure that used a new machine learning methodology called Support Vector Machines (SVM) based on Statistical Learning Theory (SLT). The SLT procedure allows for an unbiased selection of monitoring points based on their importance in constructing the groundwater potentiometric surface without going through an exhaustive search on different monitoring network configurations. The approach utilized consists of two parts: one related to the regularization of the solution (i.e., the estimated function will always tend to be flat, avoiding overfitting), and the second related to the goodness-of-fit resulting in remarkable generalization capabilities. The current procedure evaluates minimal information (number of monitoring wells) to design a regional groundwater monitoring network by selecting from (many) existing wells. The locations of existing wells are mapped to the potentiometric surface using a nonlinear kernel transformation chosen a priori. The ε-insensitive unique feature of SVMs was used to select (the number and locations of) monitoring wells. The ability of SVMs to construct potentiometric surface approximations using a very rich set of functions and to control the

    Figure 9. Error surface and selected monitoring wells for different levels of a prespecified error level (ε, number of monitoring wells): (a) (5%, 65); (b) (10%, 23); (c) (15%, 11); and (d) (20%, 8).

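    [47] The dual form is obtained by using Lagrange multipliers. Equations (A1a)–(A1c) written in dual form are as follows:

    $$
    \begin{aligned}
    G(w, \xi, \xi^*, \alpha, \alpha^*, \eta, \eta^*, b) = {} & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \\
    & - \sum_{i=1}^{N} \alpha_i \Big[ \varepsilon + \xi_i + z_i - \sum_{j=1}^{K} w_j x_{ji} - b \Big] \\
    & - \sum_{i=1}^{N} \alpha_i^* \Big[ \varepsilon + \xi_i^* - z_i + \sum_{j=1}^{K} w_j x_{ji} + b \Big] \\
    & - \sum_{i=1}^{N} (\eta_i \xi_i + \eta_i^* \xi_i^*),
    \end{aligned}
    \tag{A2}
    $$

    where $\alpha^*$, $\alpha$, $\eta^*$, $\eta$ are Lagrange multipliers, and $j = 1, \ldots, K$ indexes the input dimension. The saddle point condition states that the partial derivatives of $G$ with respect to the primal variables ($w$, $b$, $\xi_i$, $\xi_i^*$) have to vanish for optimality, i.e.,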

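    $$
    \frac{\partial G}{\partial b} = -\sum_{i=1}^{N} (\alpha_i^* - \alpha_i) = 0, \tag{A3}
    $$

    $$
    \frac{\partial G}{\partial w} = \sum_{j=1}^{K} \frac{\partial G}{\partial w_j} \hat{z}_j = \sum_{j=1}^{K} w_j \hat{z}_j - \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) \Big[ \sum_{j=1}^{K} x_{ji} \hat{z}_j \Big] = \{0\}, \tag{A4}
    $$

    $$
    \frac{\partial G}{\partial w} = w - \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) x_i = \{0\}, \tag{A5}
    $$

    and thus

    $$
    w = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) x_i, \tag{A6}
    $$

    where $\hat{z}_j$ is a unit vector. Also,

    $$
    \frac{\partial G}{\partial \xi_i} = C - \alpha_i - \eta_i = 0, \tag{A7}
    $$

    $$
    \frac{\partial G}{\partial \xi_i^*} = C - \alpha_i^* - \eta_i^* = 0. \tag{A8}
    $$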
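    Equations (A7) and (A8) are also where the upper bound $C$ on the dual variables comes from. A short worked step, assuming the usual nonnegativity of all Lagrange multipliers ($\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0$), which is not restated in this appendix:

```latex
\eta_i = C - \alpha_i \ge 0 \;\Rightarrow\; \alpha_i \le C,
\qquad
\eta_i^{*} = C - \alpha_i^{*} \ge 0 \;\Rightarrow\; \alpha_i^{*} \le C,
\qquad\text{so that}\qquad
0 \le \alpha_i,\ \alpha_i^{*} \le C .
```

    The multipliers $\eta_i$ and $\eta_i^*$ can therefore be eliminated, leaving a problem in $\alpha$ and $\alpha^*$ only.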

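    Substituting equations (A3) to (A8) in equation (A2) results in the following quadratic optimization problem. Maximize the following functional with respect to the forcings ($\alpha$s):

    $$
    W(\alpha^*, \alpha) = -\varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} z_i (\alpha_i^* - \alpha_i) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) \langle x_i, x_j \rangle, \tag{A9a}
    $$

    subject to constraints

    $$
    \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \tag{A9b}
    $$

    to obtain

    $$
    \hat{Z}(x) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) \langle x, x_i \rangle + b. \tag{A9c}
    $$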

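    Since the above expression depends only on inner products between input examples, kernel substitution (also called the kernel trick (see Figure 3)) of $\langle x, x' \rangle \rightarrow \langle \Phi(x), \Phi(x') \rangle = K(x, x')$ would result in the SVM algorithm: maximize

    $$
    W(\alpha^*, \alpha) = -\varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} z_i (\alpha_i^* - \alpha_i) - \frac{1}{2} \sum_{i,j=1}^{N} (\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j) K(x_i, x_j), \tag{A10a}
    $$

    subject to constraints

    $$
    \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \tag{A10b}
    $$

    to obtain

    $$
    \hat{Z}(x) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) K(x, x_i) + b. \tag{A10c}
    $$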
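    Once the dual variables are known, the estimate in (A10c) is a weighted sum of kernel evaluations. The sketch below assumes a Gaussian kernel of the form exp(-γ d²) and takes the coefficients (α_i* - α_i) as given inputs; the function names and numbers are hypothetical, and no quadratic programming solver is included.

```python
import math


def rbf(p, q, gamma):
    """Gaussian kernel exp(-gamma * squared distance) between two points."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-gamma * d2)


def predict_head(x, support_points, coeffs, b, gamma):
    """Evaluate Z_hat(x) = sum_i coeff_i * K(x, x_i) + b.

    `coeffs[i]` stands for (alpha_i^* - alpha_i). Wells whose coefficient is
    zero drop out of the sum, which is the sparsity used to select the network.
    """
    return sum(c * rbf(x, xi, gamma) for c, xi in zip(coeffs, support_points)) + b


if __name__ == "__main__":
    wells = [(0.0, 0.0), (10.0, 0.0)]     # illustrative support vector locations
    coeffs = [2.0, -1.0]                  # illustrative (alpha* - alpha) values
    print(predict_head((0.0, 0.0), wells, coeffs, b=1.0, gamma=0.5))
```

    Far from every support vector the kernel terms vanish and the estimate falls back to the bias b.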

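    The bias $b$ of the function that we are seeking is found from the Kuhn-Tucker (KT) condition, which requires that for the optimal solution the product between dual variables and constraints vanish. Mathematically, this is expressed as

    $$
    \alpha_i \Big( \varepsilon + \xi_i + z_i - \sum_{j=1}^{K} w_j x_{ji} - b \Big) = 0, \tag{A11}
    $$

    $$
    \alpha_i^* \Big( \varepsilon + \xi_i^* - z_i + \sum_{j=1}^{K} w_j x_{ji} + b \Big) = 0, \tag{A12}
    $$

    $$
    (\alpha_i - C)\, \xi_i = 0, \qquad (\alpha_i^* - C)\, \xi_i^* = 0. \tag{A13}
    $$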

    From the relations shown above, it follows that: (1) only samples $(x_i, z_i)$ with corresponding $\alpha_i^*$ or $\alpha_i = C$ lie outside the ε tube; (2) the dual variables are mutually exclusive ($\alpha_i^* \alpha_i = 0$); if both dual variables had nonzero values, it would require nonzero slack variables in both directions; and (3) for $\alpha_i^*$ and $\alpha_i \in (0, C)$ it follows that $\xi = \xi^* = 0$, i.e., the $z_i$ lie on the ε tube. Since the second term has to vanish also to satisfy the KT condition, this result would allow the estimation of $b$. Even though a single $x_i$ would be enough to solve the problem, in practice one uses the average of all the support vectors that lie on the ε tube for the purpose of assuring stability [Muller et al., 1999]. Thus the proper formulation for estimating $b$ is

    $$
    b =
    \begin{cases}
    \dfrac{1}{M} \displaystyle\sum_{m=1}^{M} \big( z_m - \langle w, x_m \rangle + \varepsilon \big), & \text{for } \alpha_m \in (0, C), \\[2ex]
    \dfrac{1}{M} \displaystyle\sum_{m=1}^{M} \big( z_m - \langle w, x_m \rangle - \varepsilon \big), & \text{for } \alpha_m^* \in (0, C),
    \end{cases}
    \tag{A14}
    $$

    where $M$ is the number of sample points on the ε tube.
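    The three cases listed above and the averaging used for the bias can be sketched as follows. The code assumes the convention of (A10c), in which a sample with nonzero α* pulls the surface upward and therefore sits on the upper edge of the tube; the function names, the tolerance, and the sample values are hypothetical.

```python
def classify_sample(alpha, alpha_star, C, tol=1e-8):
    """Locate a sample relative to the tube from its dual variables."""
    a = max(alpha, alpha_star)
    if a <= tol:
        return "inside"    # both multipliers zero: not a support vector
    if a >= C - tol:
        return "outside"   # multiplier at the bound C: slack may be nonzero
    return "on"            # 0 < multiplier < C: sample lies on the tube


def estimate_bias(z, f_without_bias, alpha, alpha_star, eps, C, tol=1e-8):
    """Average the bias over all samples lying on the tube.

    `f_without_bias[i]` is <w, x_i> (or the kernel expansion) evaluated
    without the bias term.
    """
    estimates = []
    for zi, fi, a, a_s in zip(z, f_without_bias, alpha, alpha_star):
        if tol < a_s < C - tol:       # upper edge: z - f = +eps
            estimates.append(zi - fi - eps)
        elif tol < a < C - tol:       # lower edge: z - f = -eps
            estimates.append(zi - fi + eps)
    if not estimates:
        raise ValueError("no sample lies on the tube")
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    z = [5.0, 3.0, 4.0]
    f0 = [3.5, 2.5, 3.0]
    print(estimate_bias(z, f0, [0.0, 0.4, 0.0], [0.3, 0.0, 0.0], eps=0.5, C=1.0))
```

    Averaging over every on-tube sample, rather than using a single one, follows the stability argument attributed to Muller et al. [1999] above.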

    [48] Acknowledgments. We are grateful for the thoughtful review and suggested improvements by two anonymous reviewers that helped improve this manuscript. We especially thank the first reviewer for her/his substantive guidance in improving the manuscript.

    References

    Angulo, M., and W. H. Tang (1999), Optimal groundwater detection monitoring system design under uncertainty, J. Geotech. Geoenviron. Eng., 125, 510–517.

    Asefa, T., and M. W. Kemblowski (2002), Support vector machines approximation of flow and transport models in initial groundwater contamination network design, Eos Trans. AGU, 83(47), Fall Meet. Suppl., Abstract H72D-0882.

    Associated Earth Sciences, Inc. (1994), Wellhead protection plan for the city of Everson, Whatcom County, Washington, report, Kirkland, Wash.

    Associated Earth Sciences, Inc. (1995), Wellhead protection program, Sumas, Washington, for city of Sumas, report, Kirkland, Wash.

    Ben-Jemaa, F., M. A. Marino, and H. A. Loaiciga (1994), Multivariate geostatistical design of groundwater monitoring networks, J. Water Resour. Plann. Manage., 120, 505–522.

    Cameron, K., and P. Hunter (2000), Optimization of LTM networks using GTS: Statistical approaches to spatial and temporal redundancy, report, Air Force Cent. for Environ. Excell., Brooks AFB, Tex.

    Cieniawski, S. E., J. W. Eheart, and S. R. Ranjithan (1995), Using genetic algorithms to solve a multiobjective groundwater monitoring problem, Water Resour. Res., 31, 399–409.

    Cox, S. E., and S. C. Kahle (1999), Hydrogeology, ground water quality, and sources of nitrate in lowland glacial aquifer of Whatcom County, Washington, and British Columbia, Canada, U.S. Geol. Surv. Water Resour. Invest. Rep., 98-4195.

    Datta, B., and S. D. Dhiman (1996), Chance-constrained optimal monitoring network design for pollutants in groundwater, J. Water Resour. Plann. Manage., 122, 180–188.

    Dibike, B. Y., S. Velickov, D. Solomatine, and B. M. Abbot (2001), Model induction with support vector machines: Introduction and applications, J. Comput. Civ. Eng., 15, 208–216.

    Gangopadhyay, S., A. D. Gupta, and M. H. Nachabe (2001), Evaluation of groundwater monitoring network by principal component analysis, Ground Water, 39, 181–191.

    GeoEngineers Hydrogeologic Services (1994), Wellhead protection study: Dodsons IGA well, Whatcom County, Washington, U.S.A., report, Bellingham, Wash.

    Girosi, F. (1998), An equivalence between sparse approximation and support vector machines, Neural Comput., 10, 1455–1480.

    Govindaraju, R. S., and A. R. Rao (2000), Artificial Neural Network in Hydrology, 348 pp., Kluwer Acad., Norwell, Mass.

    Hastie, T., R. Tibshirani, and J. Friedman (2001), The Elements of Statistical Learning: Data Mining, Inference and Prediction, Springer-Verlag, New York.

    Hudak, P. F., and H. A. Loaiciga (1992), A location modeling approach for groundwater monitoring network augmentation, Water Resour. Res., 28, 643–649.

    Jardine, K., L. Smith, and T. Clemo (1996), Monitoring networks in fractured rocks: A decision analysis approach, Ground Water, 34, 504–518.

    Jones, M. A. (1999), Geologic framework for the Puget Sound aquifer system, Washington and British Columbia, U.S. Geol. Surv. Prof. Pap., 1424-C.

    Journel, A., and C. Huijbregts (1978), Mining Geostatistics, Academic, San Diego, Calif.

    Kaneviski, M., A. Pozdnukhov, S. Canu, and M. Maignan (2000), Advanced spatial data analysis and modeling with support vector machines, Int. J. Fuzzy Syst., 4, 606–615.

    Knopman, D. S., C. I. Voss, and S. P. Garabedian (1991), Sampling design for groundwater solute transport: Tests of methods and analysis of Cape Cod tracer test data, Water Resour. Res., 27, 925–949.

    Liong, S. Y., and C. Sivapragasam (2000), Flood stage forecasting with SVM, J. Am. Water Resour. Assoc., 38, 173–186.

    Loaiciga, H. A., R. J. Charbeneau, L. G. Everett, G. E. Fogg, B. F. Hobbs, and S. Rouhani (1992), Review of groundwater quality monitoring network design, J. Hydrol. Eng., 118, 11–37.

    Mahar, P. S., and B. Datta (1997), Optimal monitoring network and groundwater pollution source identification, J. Water Resour. Plann. Manage., 123, 199–207.

    Massmann, J., and R. A. Freeze (1987a), Groundwater contamination from waste management sites: The interaction between risk-based engineering design and regulatory policy: 1. Methodology, Water Resour. Res., 23, 351–367.

    Massmann, J., and R. A. Freeze (1987b), Groundwater contamination from waste management sites: The interaction between risk-based engineering design and regulatory policy: 2. Results, Water Resour. Res., 23, 368–380.

    Meyer, P. D., and E. D. Brill, Jr. (1988), A method for locating wells in a groundwater monitoring network under conditions of uncertainty, Water Resour. Res., 24, 1277–1282.

    Meyer, P. D., A. J. Valocchi, and J. W. Eheart (1994), Monitoring network design to provide initial detection of groundwater contamination, Water Resour. Res., 30, 2647–2659.

    Minsker, B., and Task Committee (2003), Long-term groundwater monitoring design: State of the art applications, report, Am. Soc. of Civ. Eng., Reston, Va.

    Molina, G. R., J. J. Beauchamp, and T. Wright (1996), Determining an optimal sampling frequency for measuring bulk temporal changes in groundwater quality, Ground Water, 34, 579–587.

    Montas, H. J., R. H. Mohtar, A. E. Hassan, and F. AlKhad (2000), Heuristic space-time design of the monitoring wells for contaminant plume characterization in stochastic flow fields, J. Contamin. Hydrol., 43, 271–301.

    Morisawa, S., and Y. Inoue (1991), Optimum allocation of monitoring wells around a solid-waste landfill site using precursor indicators and fuzzy utility functions, J. Contamin. Hydrol., 7, 337–370.

    Muller, K. R., A. Smola, G. Ratsch, B. Scholkopf, J. Kohlmorgen, and V. Vapnik (1999), Predicting time series with support vector machines, in Advances in Kernel Methods: Support Vector Learning, edited by B. Scholkopf, C. J. C. Burges, and A. J. Smola, pp. 243–254, MIT Press, Cambridge, Mass.

    Nunes, L. M., E. Paralta, M. C. Cunha, and L. Ribeiro (2004a), Groundwater nitrate monitoring network optimization with missing data, Water Resour. Res., 40, W02406, doi:10.1029/2003WR002469.

    Nunes, L. M., M. C. Cunha, and L. Ribeiro (2004b), Groundwater monitoring network optimization with redundancy reduction, J. Water Resour. Plann. Manage., 130, 33–43.

    Poggio, T., and F. Girosi (1998a), A sparse representation for function approximation, Neural Comput., 10, 1445–1454.

    Poggio, T., and F. Girosi (1998b), Notes on PCA, regularization, sparsity and support vector machines, AI Memo. 1632, CBCI Pap. 161, Mass. Inst. of Technol., Cambridge.

    Reed, P., and B. S. Minsker (2004), Striking the balance: Long-term groundwater monitoring design for conflicting objectives, J. Water Resour. Plann. Manage., 130, 140–149.

    Reed, P., B. Minsker, and A. J. Valocchi (2000), Cost-effective long-term groundwater monitoring design using a genetic algorithm and global mass interpolation, Water Resour. Res., 36, 3731–3741.

    Reed, P., B. S. Minsker, and D. E. Goldberg (2001), A multiobjective approach to cost effective long-term groundwater monitoring using an Elitist Nondominated Sorted Genetic Algorithm with historical data, J. Hydroinformatics, 3, 71–90.

    Reed, P., B. S. Minsker, and D. E. Goldberg (2003), Simplifying multiobjective optimization: An automated design methodology for the nondominated sorted genetic algorithm-II, Water Resour. Res., 39(7), 1196, doi:10.1029/2002WR001483.

    Rouhani, S. (1985), Variance reduction analysis, Water Resour. Res., 21, 837–846.

    Saunders, C., M. O. Stitson, J. Weston, L. Bottou, B. Scholkopf, and A. Smola (1998), Support vector machine reference manual, Tech. Rep. CSD-TR-98-03, Royal Holloway Univ. of London, London.

    Scholkopf, B., J. C. Burges, and A. Smola (1999), Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, Mass.

    Storck, P., J. W. Eheart, and A. J. Valocchi (1997), A method for the optimal location of monitoring wells for detection of groundwater contamination in three-dimensional heterogeneous aquifers, Water Resour. Res., 33, 2081–2088.

    Tikhonov, A., and V. Arsenin (1977), Solution of Ill-Posed Problems, W. H. Winston, Washington, D. C.

    Vanderbei, R. J. (1994), LOQO: An interior point code for quadratic programming, Rep. TRSOR-94-15, Stat. and Oper. Res. Princeton Univ., Princeton, N. J.

    Vapnik, V. (1995), The Nature of Statistical Learning Theory, Springer-Verlag, New York.

    Vapnik, V. (1998), Statistical Learning Theory, John Wiley, Hoboken, N. J.

    Wagner, B. J. (1995), Sampling design methods for groundwater modeling under uncertainty, Water Resour. Res., 31, 2581–2591.

    Wahba, G. (1990), Spline Models for Observation Data, Ser. Appl. Math., vol. 59, Soc. for Indust. and Appl. Math., Philadelphia, Pa.

    Water Resources Consulting, LLC (1997), Wellhead protection program, report, Pole Road Water Assoc., Whatcom County, Wash.

    T. Asefa, M. W. Kemblowski, A. Khalil, M. McKee, and G. Urroz, Department of Civil and Environmental Engineering, Utah State University, Logan, UT 84322, USA.
