Basin Scale Hydrology support vector machines


  • 8/14/2019 Basin Scale Hydrology support vector machines


    ABSTRACT: Water scarcity in the Sevier River Basin in south-central Utah has led water managers to seek advanced techniques for identifying optimal forecasting and management measures. To use the limited quantity of water in the basin more efficiently, better methods for control and forecasting are imperative. Basin scale management requires advanced forecasts of the availability of water. Information about long term water availability is important for decision making in terms of how much land to plant and what crops to grow; advanced daily predictions of streamflows and hydraulic characteristics of irrigation canals are important for managing water delivery and reservoir releases; and hourly forecasts of flows in tributary streams, which account for diurnal fluctuations, are vital to meeting the day-to-day expectations of downstream farmers more precisely. A priori streamflow information and exogenous climate data have been used to predict future streamflows and required reservoir releases at different timescales. Data on snow water equivalent, sea surface temperatures, temperature, total solar radiation, and precipitation are fused by applying artificial neural networks to enhance long term and real time basin scale water management information. This approach has not previously been used in water resources management at the basin scale and could help water users in semi-arid areas utilize and manage scarce water resources more efficiently.

    (KEY TERMS: artificial neural networks; multi-sensor data; irrigation; water management; multi-time scale forecasting; streamflow.)

    Khalil, Abedalrazq F., Mac McKee, Mariush Kemblowski, and Tirusew Asefa, 2005. Basin Scale Water Management and Forecasting Using Artificial Neural Networks. Journal of the American Water Resources Association (JAWRA) 41(1):195-208.

    INTRODUCTION

    Forecasting of streamflow at different temporal scales is of practical importance to several disciplines. Techniques for predicting seasonal, daily, and hourly streamflows are utilized in this paper to address the need for accurate information about water deliveries on a short term scale and to formulate long term or seasonal plans for allocation of water and related resources. Streamflow prediction is used in applications as diverse as agricultural planning and reservoir and watershed management. Ames (1998) has discussed the financial returns to agriculture and industry that could be derived from successful extended range streamflow forecasts.

    Short term and real time forecasts of flows in rivers and tributaries, together with near real time recommendations for operational decisions on canal diversions and reservoir releases, can provide additional opportunities for improving system level water use efficiencies. Meeting these information needs for long term and real time streamflow forecasts and near real time reservoir releases requires a substantial investment in the acquisition and analysis of a wide range of temporally and spatially disparate data. These needs are especially pressing in the highly regulated Sevier River Basin of south-central Utah, which has been heavily instrumented in recent years and which provides both the motivation and the case study area for this paper.

    Physically based hydrologic and hydraulic mathematical modeling approaches have been proposed for streamflow predictions, but complexities in these modeling processes and difficulties associated with obtaining the data that such models would require have limited the scope and applicability of these traditional methods. As a result, there is a need for the development of modeling approaches that capture the behavior of the system utilizing available data, are computationally robust, and could be used in real applications. One such approach is presented in this paper.

    1Paper No. 03202 of the Journal of the American Water Resources Association (JAWRA) (Copyright 2005). Discussions are open until August 1, 2005.

    2Respectively, Graduate Research Assistant, Professors of Civil and Environmental Engineering, and Graduate Research Assistant, Department of Civil and Environmental Engineering, Utah Water Research Laboratory, Utah State University, Logan, Utah 84322-8200 (E-Mail/Khalil: [email protected]).

    JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION 195 JAWRA

    FEBRUARY AMERICAN WATER RESOURCES ASSOCIATION 2005

    BASIN SCALE WATER MANAGEMENT AND FORECASTING USING ARTIFICIAL NEURAL NETWORKS1

    Abedalrazq F. Khalil, Mac McKee, Mariush Kemblowski, and Tirusew Asefa2

    The goals of the work reported in this paper are to:

    1. Provide analyses that can be used to improve decisions in river basin management by exploiting the wealth of available, diverse data regarding canals and streamflows, irrigation water orders, climate information, and earth and sea surface satellite imagery.

    2. Provide decision relevant information that facilitates the on-farm management of water in both the short and long term.

    STUDY AREA SEVIER RIVER BASIN

    The Sevier River Basin in rural south-central Utah is one of the state's major drainages (Figure 1). A closed river basin, it encompasses 12.5 percent of the state's total area. From its headwaters 250 miles (402 km) south of Salt Lake City, the river flows north and then west 255 miles (410 km) before reaching Sevier Lake (Berger et al., 2002). The Sevier River Basin has five subwatersheds and is divided into two major divisions, the upper and lower basins, for the administration of water rights. The dividing point between the upper and lower basins is the Vermillion Diversion Dam. Average annual precipitation is around 13.0 inches (33 cm), and the growing season ranges from 60 to 178 days (Berger et al., 2002; Utah Board of Water Resources, 2001). Most of the surface water runoff comes from snowmelt during the spring and early summer months. The primary use of water in the basin is for irrigation. The average annual amount of water diverted for cropland irrigation is 903,460 acre-feet (1,114 million cubic meters, mcm). Of this amount, approximately 135,000 acre-feet (166.5 mcm) are pumped from ground water. About 40 percent of the diversions are return flows from upstream use (Berger et al., 2002). For a detailed description of the basin and much of the real time database utilized in this research, refer to Sevier Water Users Association (2004).

    Figure 1. The Sevier River Basin in South-Central Utah.
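    The acre-foot to million-cubic-meter figures quoted above can be checked with a short calculation. This sketch assumes the international foot (0.3048 m), so one acre-foot is 43,560 cubic feet, or about 1,233.5 cubic meters; the function name is illustrative only.

```python
# One acre-foot = 43,560 ft^3; with the international foot (0.3048 m) this is ~1233.48 m^3.
ACRE_FOOT_M3 = 43_560 * 0.3048 ** 3


def acre_feet_to_mcm(acre_feet: float) -> float:
    """Convert a volume in acre-feet to million cubic meters (mcm)."""
    return acre_feet * ACRE_FOOT_M3 / 1e6


total_diversions = acre_feet_to_mcm(903_460)  # annual irrigation diversions
groundwater = acre_feet_to_mcm(135_000)       # portion pumped from ground water
print(f"{total_diversions:,.1f} mcm diverted, {groundwater:,.1f} mcm pumped")
```

    Both results round to the 1,114 mcm and 166.5 mcm reported for the basin.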

    BACKGROUND

    Real time integrated management of river basins can be important in achieving optimum allocation of scarce water resources. Researchers have proposed physical and stochastic approaches for prediction of streamflow at different time scales for management purposes. Complexities in the underlying physical processes and difficulties in acquiring needed data limit the utility of these approaches. The main functions of an integrated real time water resources management system are real time monitoring and data collection, information and knowledge mining, and prediction and real time decision support. Real time water resources management requires a heavily instrumented basin to monitor precipitation, runoff, climatic indices, and streamflow. The Sevier River Basin has been heavily instrumented with gages that measure all the aforementioned factors. Measurements of flows at several locations on the mainstem of the Sevier River, tributary flows and canal diversions, various meteorological data, reservoir volumes and releases, and other data are reported hourly, stored in a database, and made available via the internet. While the managers of the Sevier River water systems have utilized these data in raw form to improve overall system operations, much more could be done with these data to develop and implement advanced tools for forecasting and real time management. The Sevier River is therefore a suitable study area for testing tools that are not physically based but instead let the data speak. The emphasis of this manuscript is on integrating the available data by means of artificial neural networks to obtain decision-relevant predictions of flows and reservoir operation recommendations at different time scales. These models will ultimately be integrated into a water resources information management system to be delivered to the operators of the reservoir and canal systems in the Sevier River Basin.

    ARTIFICIAL NEURAL NETWORKS

    In this paper, artificial neural network (ANN) learning methods are used to develop basin scale management models. Artificial neural networks are practical information processing systems that provide methods for learning functions from observations. An ANN roughly replicates the behavior of the organic brain by emulating the operations and connectivity of biological neurons. This emulation, of course, is done in a mathematical form that is greatly simplified from the biological prototype. The advantage of ANNs in engineering and practical applications lies in their ability to learn and capture information from data that describe the behavior of a real system (Govindaraju and Rao, 2000; Haykin, 1994).

    An interesting property of ANNs is that they often work well even when the training data sets contain noise and measurement errors (Hammerstrom, 1993). Moreover, they can represent the complex behaviors of nonlinear systems (Maier and Dandy, 2000).

    Artificial neural networks are characterized by their architecture, their activation function, and the learning rule and learning parameter set used in their construction. A common architecture is the one embodied in feedforward backpropagation ANNs, which consist of several layers of neurons, each of which may contain a different number of neurons (Skapura, 1995). Such a network is composed of a sequence of layers classified as input, hidden, and output layers. Each layer consists of a set of one or more nodes, or neurons. The nodes in the input layer receive information from the outside world, process this information, and send output to the next layer of neurons in the network. Each neuron is connected to the neurons in the preceding layer, from which it receives inputs, and to the neurons in the subsequent layer, to which it passes its output.

    The learning rule specifies the way in which weights are determined during the training process, and it depends on the input, output, and activation values of the model. Each neuron has an activation function, which can be a linear or a nonlinear continuous function [e.g., a monotonic nonlinear function that saturates toward finite values as its argument grows, such as sgm() and tanh()]. The output signal that passes from one neuron to another in a subsequent layer is transformed by a weight, or connection strength, that modifies the signal before it reaches the receiving neuron. Thus, the output of a node in any layer is determined by applying a nonlinear transformation (the activation function) to the sum of the weighted inputs it receives from the neurons of the previous layer. Figure 2 shows an ANN model that takes input values $x_1, x_2, \ldots, x_l$ and generates output signals $y_1, y_2, \ldots, y_K$. A multi-layer ANN is described as feedforward when the connections are directed from the input layer, forward through the network, to the output layer.
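    The feedforward computation described here (each neuron forms a weighted sum of its inputs and then applies an activation function), together with the backpropagation training outlined in the next paragraph, can be sketched in NumPy. This is an illustrative sketch, not the authors' implementation (they report using NeuralWorks Professional II/Plus and NetLab). The single tanh hidden layer, linear output layer, bias terms, learning rate, initialization, and plain batch gradient descent on mean squared error are all assumptions chosen for brevity.

```python
import numpy as np


def forward(x, W1, b1, W2, b2):
    """Feedforward pass for one input vector x.

    Each hidden neuron forms u = sum_i w_i * x_i (plus an assumed bias term)
    and applies f(u) = tanh(u); the output layer here is linear.
    """
    h = np.tanh(W1 @ x + b1)   # hidden-layer outputs
    y = W2 @ h + b2            # output-layer signals y_1 ... y_K
    return h, y


def train(X, T, n_hidden=4, lr=0.05, epochs=2000, seed=0):
    """Batch backpropagation minimizing 0.5 * mean squared error (a sketch).

    X has shape (n_samples, n_inputs); T has shape (n_samples, n_outputs).
    """
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    n_out = T.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))
    b2 = np.zeros(n_out)
    for _ in range(epochs):
        H = np.tanh(X @ W1.T + b1)       # hidden activations, (n, n_hidden)
        Y = H @ W2.T + b2                # model responses y(t), (n, n_out)
        E = Y - T                        # difference from the targets T(t)
        gW2 = E.T @ H / n                # gradient w.r.t. output weights
        gb2 = E.mean(axis=0)
        dH = (E @ W2) * (1.0 - H ** 2)   # error propagated back through tanh
        gW1 = dH.T @ X / n               # gradient w.r.t. hidden weights
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2


# Usage: fit a smooth nonlinear curve, then evaluate one new input.
X = np.linspace(-2.0, 2.0, 50).reshape(-1, 1)
T = np.sin(X)
params = train(X, T)
_, y_new = forward(np.array([0.5]), *params)
```

    Swapping `np.tanh` for a logistic sgm() (and its derivative) would give the other saturating activation mentioned in the text.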

    The activation functions are evaluated in two steps. First, the activation, $u$, is calculated as the inner product of the input vector, $\mathbf{x} = [x_1, x_2, \ldots, x_l]^T$, and the weight vector, $\mathbf{w} = [w_1, w_2, \ldots, w_l]^T$:

    $$u = \sum_{i=1}^{l} w_i x_i$$

    Second, the output, $y$, is evaluated as a function $f(u)$ of the activation. Optimal values for the weight vector are determined by minimizing an objective function that measures the error between the model's output and the measured behavior of the real system (empirical risk minimization). Typically, the error for query $t$ may be defined as the difference between the observed or measured target response, $T(t)$, and the model's response, $y(t)$. Generally, a method called backpropagation (Rumelhart et al., 1986) is used for training ANNs, by which $\mathbf{w}$ is modified iteratively to find the set of weights that minimizes the error.

    For details about ANNs, interested readers are referred to Govindaraju and Rao (2000) and Schalkoff (1997).

    RELEVANT DATA SETS

    The development of the predictive learning model requires the precise identification of the relevant data. The following sections briefly describe the relevant data sets. Relevance was judged subjectively; in other words, this paper utilizes the available data that could be related to the given model from a hydrologic perspective.

    Streamflow

    Streamflow is the result of interactions between many hydrologic events, such as precipitation, snowmelt, evapotranspiration, infiltration, and ground water recharge, and anthropogenic influences, such as irrigation activities.

    Continuous historical streamflow data were obtained for different sites. Data appropriate for use in seasonal streamflow predictions are available in the form of average daily flows from 1976 to 2002. Short term predictions can be supported by streamflow data that are available in both daily and hourly form from 2000 to 2003.

    Irrigation Demands

    Irrigation demands represent the quantities of water that farmers request be delivered to their headgates. Such requests are made one day in advance of the expected time of delivery. Data on irrigation demands for various canals in the Sevier River Basin are available for the years 1952 through 2002.

    Temperature

    Temperature can directly affect the rate of snowmelt, which in turn contributes to streamflow, so including temperature data as a predictor can enhance the model. Historic daily and hourly temperature data are available at many SnoTel and weather stations.

    Figure 2. Typical ANN Structure.


    Sea Surface Temperature Anomaly

    Satellite derived measurements of sea surface temperature anomaly (SSTA) can be useful in making seasonal predictions of streamflow. Sea surface temperature influences continental precipitation patterns and hence provides information about the quantity of water that will become available for storage in reservoirs. Incorporating SSTA measurements over a broad temporal scale can therefore be relevant to the study of basin scale water management issues. A long, statistically homogeneous record of sea surface temperature anomalies is available (Kaplan et al., 1998) on a 5-degree-by-5-degree grid covering the majority of the world's oceans for the period 1856 to present. Details of the statistical development of these data are beyond the scope of this paper; readers are referred to Kaplan et al. (1997) for a description of the methodology.

    Snow Water Equivalent

    Information about snow can be critical for forecasting spring runoff and water levels in streams. Snow serves as storage of water supplies at the beginning of the season. Daily data on snow water equivalent (SWE), which is the equivalent depth of water obtained when the snow is completely melted, are available from several SnoTel sites in the Sevier River Basin, including the three shown in Figure 1.

    Precipitation

    Daily precipitation measurements are available at different locations across the Sevier River Basin. The precipitation data used in this manuscript were obtained from the Kimberly Mine SnoTel station and from the Richfield airport weather station, the nearest station to the locations at which streamflow predictions are desired.

    MODEL FORMULATION AND APPLICATION

    Developing an ANN model for a particular application requires designing a network architecture that captures the dynamical characteristics of the system being simulated from the data available to describe the problem domain. The structure of an ANN requires identification of the input and output vectors. It also requires selection of the number of hidden layers and specification of the number of neurons in each hidden layer, which is usually accomplished through a trial-and-error process. Finally, the resulting ANN model must be evaluated, or tested, in terms of the quality of its predictions.

    Seasonal Streamflow Prediction Model

    Seasonal predictions of future streamflow and reservoir volumes can play a vital role in planning and decision making in river basins. In the case of the Sevier River Basin, ranchers must make decisions about purchasing livestock early in the year, well before information is available about how much water will be supplied in the summer and fall for irrigation and production of feed for those livestock. Financial commitments made early in the water year can result in substantial economic losses if the winter snowpack and resulting spring runoff do not subsequently supply enough irrigation water. Seasonal predictions were made in this study for flows on the Sevier River at the Hatch gage, which is high in the upper basin. The quantity of water that flows through this gage represents a large portion of the total water available to the basin. The streamflow at this gage changes from season to season due to the interactions of a multitude of factors. Regional and local meteorological conditions and snowpack in the mountains will obviously influence streamflows. Previous work has shown that ANNs are appropriate for capturing the complex nonlinear relationships among these phenomena. For a more complete review of the uses of ANNs in water resources applications, refer to Maier and Dandy (2000) and Govindaraju and Rao (2000).

    The approach adopted here in building a model for forecasting seasonal streamflow quantities is based upon a multi-sensor, data driven approach that uses an ANN as a learning machine. Inputs to the model consist of previous seasonal streamflows, SSTA data, and SWE data from the SnoTel stations at Harris Flat and Midway Valley. The cumulative quantity of water that flows past the Hatch gage in a season provides information on the overall status of the basin with respect to water availability and the response of basin hydrology to climatic forcings. The SWE input to the model is the average of the monthly SWE over the previous 12 months. Sea surface temperature anomaly data are input to the model in the form of the 12 previous monthly average SSTA values. The relationship between inputs and outputs of the seasonal ANN model, then, can be expressed as

    $$Q_{t+6} = \Phi(\mathbf{I}) \tag{1}$$

    where $Q_{t+6}$ is the expected quantity of water (cfs) coming to the basin through the Hatch gage for six months from time $t$, $\mathbf{I}$ is the vector of inputs to the ANN, and $\Phi$ is the ANN nonlinear transformation of inputs to outputs. The input vector can be expressed as

    $$\mathbf{I} = [Q_{t-6}\ \ \mathbf{S}_{t-12}\ \ \mathbf{T}_{t-12}]^T \tag{2}$$

    where $Q_{t-6}$ is the total quantity of water (cfs) flowing past the Hatch gage in the last six months, $\mathbf{S}_{t-12}$ is the average SWE (in) calculated over the 12 months prior to time $t$ for each SnoTel station, and $\mathbf{T}_{t-12}$ is a vector of average monthly SSTAs (°C) for the previous 12 months. The SSTA data were obtained for six different stations (see Figure 3). Therefore, six ANN models were built, each using one individual SSTA station. Detailed descriptions of the model performance for the SSTA station that proved to be the most significant are presented in the Results section.
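    A sketch of how the seasonal input vector $\mathbf{I} = [Q_{t-6}, \mathbf{S}_{t-12}, \mathbf{T}_{t-12}]$ could be assembled from monthly records. The function name and array layout are hypothetical, and two details are assumptions: $Q_{t-6}$ is taken as the sum of the six preceding monthly flow values, and "the previous 12 months" means indices $t-12$ through $t-1$. One model uses one SSTA location, as in the text.

```python
import numpy as np


def seasonal_input_vector(monthly_flow, monthly_swe, monthly_ssta, t):
    """Assemble [Q_{t-6}, S_{t-12}, T_{t-12}] for month index t (hypothetical helper).

    monthly_flow : (n_months,) monthly flow values at the Hatch gage
    monthly_swe  : (n_months, n_snotel) monthly average SWE per SnoTel station
    monthly_ssta : (n_months,) monthly average SSTA at ONE SSTA location
    """
    if t < 12:
        raise ValueError("need at least 12 months of history before t")
    q_prev6 = monthly_flow[t - 6:t].sum()           # Q_{t-6}: last six months (assumed sum)
    swe_avg12 = monthly_swe[t - 12:t].mean(axis=0)   # S_{t-12}: one average per station
    ssta_prev12 = monthly_ssta[t - 12:t]             # T_{t-12}: 12 monthly anomalies
    return np.concatenate(([q_prev6], swe_avg12, ssta_prev12))


# Usage with synthetic records and two SnoTel stations (Harris Flat, Midway Valley).
flow = np.arange(24, dtype=float)
swe = np.column_stack([np.ones(24), 2.0 * np.ones(24)])
ssta = np.linspace(-1.0, 1.0, 24)
I = seasonal_input_vector(flow, swe, ssta, t=18)   # 1 + 2 + 12 = 15 inputs
```

    Building one such vector per SSTA location and training a separate network on each mirrors the six-model comparison described above.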

    Daily Reservoir Release Prediction Model

    Daily prediction is of great importance for managing irrigation canals and reservoir releases in river basins. Piute Reservoir was selected to test the applicability of ANNs for supplying information for daily reservoir management (see the Middle Sevier portion of Figure 1). Each day, the operator of Piute Reservoir must set releases at a level sufficient to meet the needs of nine irrigation canals that divert water from the river downstream of the reservoir; these canals all lie between the Clear Creek confluence and the Vermillion Diversion Dam (see Figure 1).

    If too little water is released, it is likely that the lower canals will not receive enough water. If too much water is released, some of it might be spilled to the lower basin; water that is spilled is considered lost by the users in the upper basin, who, in accordance with the complicated system of water rights on the Sevier, are entitled to it. Vermillion Diversion Dam, shown in Figure 1, is the administrative dividing point between the upper and lower Sevier River. Efficient daily management decisions about the operation of the reservoir, then, can reduce water losses and improve deliveries to users, which will translate into increased overall farm production for the upper basin. Modeling all the climatic, hydrologic, and hydraulic physical processes involved in order to provide near real time forecasts of river and canal flows and, ultimately, required reservoir releases would involve the solution of a complex system of nonlinear partial differential equations. Implementing such a model would require a substantial amount of data, a skilled modeler, and powerful computing devices.

    Figure 3. Significant Sea Surface Temperature Anomaly Measurement Locations.

    There is uncertainty involved in the reservoir releases owing to variations in the influencing processes throughout the season and to the travel times from the reservoir to the last demand, which range from two to three days depending on the quantity of flow in the river and on antecedent flow conditions. In the face of this uncertainty, the Piute Reservoir operator needs a tool to help decide, on a near real time basis, how much water to release to meet the water orders of canal operators located downstream of the reservoir. In other words, a common requirement for managing a reservoir that is operated on an on-demand basis is anticipating the quantity of water that must be released while accounting for losses and travel time. The Piute Reservoir operators would like to set the diversion gates once per day and maintain a constant flow into the canal over the following 24-hour period. Therefore, the desired output of the ANN model is simply the daily quantity of water that should be released from Piute Reservoir. The information made available to the ANN model through the neurons in the input layer should include the data that describe current, and perhaps recent historical, flow conditions in the river and canals. This information is readily available from the on-line database maintained by the Sevier River Water Users Association (2004). Input to the ANN should also include the orders that canal managers have received for water deliveries along the length of the river. The relationship between inputs and outputs of the daily ANN model, then, can be expressed as

    $$OD_t = \Phi(\mathbf{I}) \tag{3}$$

    where $OD_t$ is the flow rate of water (cfs) to be released on day $t$, $\mathbf{I}$ is the vector of inputs to the ANN, and $\Phi$ is the ANN nonlinear transformation of inputs to outputs. The input vector can be expressed as

    $$\mathbf{I} = [D_{t-1}\ \ \mathbf{Q}_{t-1}\ \ \mathbf{O}_t]^T \tag{4}$$

    where $D_{t-1}$ is the average release flow (cfs) from the previous day, $\mathbf{Q}_{t-1}$ is a vector composed of the average flows (cfs) from the previous day at the flow gages along the river, and $\mathbf{O}_t$ is a vector of water orders (cfs) to be delivered during day $t$. Using the previous day's canal flow information and the orders for the next day's water deliveries produces an input layer with 14 neurons.
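    A minimal sketch of flattening the daily inputs $[D_{t-1}, \mathbf{Q}_{t-1}, \mathbf{O}_t]$ into one input-layer vector. The text gives only the total of 14 inputs, not how they split between gage flows and canal orders; the four gage flows and nine canal orders below, and all of the numbers, are hypothetical values chosen so the total comes to 14.

```python
import numpy as np


def daily_input_vector(prev_release_cfs, prev_gage_flows_cfs, orders_cfs):
    """Flatten [D_{t-1}, Q_{t-1}, O_t] into a single input vector (hypothetical helper)."""
    q = np.asarray(prev_gage_flows_cfs, dtype=float).ravel()
    o = np.asarray(orders_cfs, dtype=float).ravel()
    if prev_release_cfs < 0 or np.any(q < 0) or np.any(o < 0):
        raise ValueError("flows and orders must be non-negative")
    return np.concatenate(([float(prev_release_cfs)], q, o))


# Hypothetical day: yesterday's release, four gage flows, and nine canal orders (cfs).
I = daily_input_vector(
    250.0,
    [240.0, 180.0, 95.0, 60.0],
    [30, 25, 20, 18, 15, 12, 10, 8, 5],
)
print(I.shape)  # (14,)
```

    Keeping the order of the concatenation fixed matters, because each position feeds a specific input neuron.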

    Hourly Streamflow Predictions

    In some situations, unregulated tributary streams can cause flows in the main river to fluctuate in a diurnal pattern that is difficult to predict and that causes management problems in planning diversions at locations downstream of the tributary. Clear Creek, a tributary of the Sevier River, is an example of an uncontrolled tributary stream that discharges into the river in such a way that its diurnal fluctuations make downstream water management more difficult.

    In the spring and early summer, snowmelt in the Clear Creek watershed can produce runoff with substantial diurnal fluctuations. The irrigators in the upper basin are entitled to capture and use flows from Clear Creek, but they have limited capability to do so. Instead, they must let flows from Clear Creek enter the mainstem of the Sevier and then divert these waters downstream. If they fail to do so, the excess flows received at Vermillion that cannot be diverted and used locally will be spilled from the Vermillion Diversion Dam and lost from the upper basin. Clearly, capturing Clear Creek waters will require coordinating releases from Piute Reservoir, upstream, with diversion of irrigation water into canals, downstream. This coordination will be best facilitated by advance forecasts of likely diurnal fluctuations in Clear Creek flows.

    The design of an appropriate hourly prediction model requires the use of data that reflect the physical forces that cause streamflow in these tributary streams to fluctuate throughout the day. These include hourly total solar radiation, previous day streamflow, precipitation, and air temperature. An hourly model is required to provide information on the diurnal fluctuations in the river flows due to tributary inflows. The nonlinear mapping used to capture the relationships between inputs and outputs of an hourly ANN model can be expressed as

    $$Q_t = \Phi(\mathbf{I}) \tag{5}$$

    where $Q_t$ is the rate of flow (cfs) past the Clear Creek gage for hour $t$ of the coming 24 hours, $t = 1, 2, \ldots, 24$; $\mathbf{I}$ is the vector of inputs to the ANN; and $\Phi$ is the ANN nonlinear transformation of inputs to outputs. The input vector can be expressed as

    $$\mathbf{I} = [Q_{t-24}\ \ T_{t-24}\ \ R_{t-24}\ \ S_{t-24}\ \ P_{t-24}]^T \tag{6}$$

    where $Q_{t-24}$, $T_{t-24}$, and $R_{t-24}$ are averages of the vectors of hourly streamflow (cfs) at the Clear Creek gage, air temperature (°C), and solar radiation (kW/m²), respectively, over the 24 hours preceding the prediction time; and $S_{t-24}$ and $P_{t-24}$ are averages of the vectors of wind speed (mph) and precipitation (in), respectively, over the 24 hours before time $t$. Precipitation data were provided by the Kimberly Mine SnoTel station (see Figure 1).
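    Each hourly input is a 24-hour average, so assembling the input vector reduces to windowed means over the hourly records. This sketch assumes all five series are aligned on the same hourly index, with index `t` as the prediction time; the function name is hypothetical.

```python
import numpy as np


def hourly_input_vector(flow, air_temp, solar, wind, precip, t):
    """Return [Q, T, R, S, P] averaged over hours t-24 ... t-1 (hypothetical helper).

    Each argument is a 1-D array of hourly values sharing the same time index.
    """
    if t < 24:
        raise ValueError("need 24 hours of history before t")
    window = slice(t - 24, t)
    series = (flow, air_temp, solar, wind, precip)
    return np.array([np.asarray(s, dtype=float)[window].mean() for s in series])


# Usage with synthetic records: 48 hours of data, predicting from hour 30.
hours = np.arange(48, dtype=float)
ones = np.ones(48)
I = hourly_input_vector(hours, 10 * ones, 0.5 * ones, 3 * ones, np.zeros(48), t=30)
```

    Because the output is the flow for each of the coming 24 hours, one network with 24 output neurons (or 24 separate networks) could share this same five-element input; the text does not say which arrangement was used.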


    RESULTS AND DISCUSSION

    Model Specifications

    Obtaining an optimal level of performance for any

    learning machine entails a considerable number of

    design choices, especially for ANN learning. The char-acteristics of an optimal architecture are a model that

    produces acceptable predictions, has good generaliza-tion abilities, and requires a minimal number of cali-

    brated parameters (i.e., degrees of freedom). Theapproach for selecting an optimal architecture bene-

    fits from a rigorous statistical analysis and expertknowledge. Splitting the data into two sets, where the

    machine is trained on one and tested on the other toavoid underestimating the true error, has a twofolddisadvantage: the problem of having sufficient data

    for training, and the possibility of statistical depen-dence between the two subsets (Blum et al., 1999).

    Moreover, since the available data are scarce, k-foldcross-validation can be used to overcome these defi-

    ciencies. In k-fold cross-validation, the data set $X$ is partitioned into k mutually disjoint folds (subsets) $\{S_j\}_{j=1}^{k}$ (Shakhnarovich et al., 2001). For each $S_j$, the model is trained on all folds except $S_j$. The final error is estimated as

    $$\mathrm{Err}_{CV}^{k} = \frac{1}{k}\sum_{j=1}^{k} Q\left(S_j,\, X \setminus S_j\right) \qquad (7)$$

    where $Q(S_j, X \setminus S_j)$ is the statistic of interest for evaluating an ANN model trained using $X \setminus S_j$ and tested on $S_j$. In this paper, a set-aside sample of data (i.e., the validation data set) is used to test the plausibility of the model. To avoid further splitting of the data, the training data sets were used in a cross-validation context to build the ANN model.

    The problem of choosing a suitable architecture for ANNs lies in specifying the activation function and the number of neurons in the hidden layer. Trial-and-error analysis resulted in the selection of a suitable activation function for each model. Selecting the number of hidden nodes is one of the most difficult, but also most important, steps in building an ANN. The root mean square error (RMSE) from five-fold cross-validation was used to select the optimal number of hidden nodes (Rivals and Personnaz, 2000). Starting from a single hidden node, the number of hidden nodes was increased one at a time, and the five-fold cross-validation error (mean and variance) was evaluated at each step. The optimal number of hidden nodes was selected at the point where the decrease in the five-fold error becomes insignificant (see Figure 4 for the case of the hourly model). Table 1 summarizes the characteristics of the seasonal, daily, and hourly models discussed above: the number of neurons in each layer, the optimal transfer function, and the learning rule used for each model.

    Figure 4. RMSE (Five-Fold Cross-Validation) and 95 Percent Confidence Bounds as a Function of Number of Hidden Nodes.

    The ANN model is constructed once the model structure is selected. Construction involves training the network with known input/output data from the real system and then testing the resulting model against data not used in training (the withheld sample). In this manuscript, ANNs are developed using NeuralWorks Professional II/PLUS (NeuralWare, Inc., 2000) and the Matlab toolbox Netlab (Bishop, 1995; Nabney, 2001).

    Performance Criteria

    The objective of the training phase in building an ANN is to produce a set of connection weights that causes the outputs of the ANN, y(t), to match as closely as possible the observed system outputs, T(t), for every set of training patterns. Achievement of this objective is typically measured by the correlation coefficient, $R^2$, defined as

    $$R^2 = \frac{\left[\sum_t \left(y_t - \bar{y}\right)\left(T_t - \bar{T}\right)\right]^2}{\sum_t \left(y_t - \bar{y}\right)^2 \sum_t \left(T_t - \bar{T}\right)^2} \qquad (8)$$

    where $\bar{y}$ and $\bar{T}$ are the means of y and T, respectively. The correlation coefficient is not a measure of the predictive capability of the model, since it is sensitive to outliers and spurious data. Therefore, the coefficient of efficiency, E, defined in Equation (9), has been widely used.

    A model with E = 0.9 has a mean square error equal to 10 percent of the variance of the observed data. E is, however, sensitive to significant outliers. To overcome this susceptibility to extreme values, the index of agreement, d, defined in Equation (10), can be used; it is less sensitive to large values. To quantify the error in the units of the variable, one can use the RMSE, defined in Equation (11).

    Bias and mean absolute error are also expressed in physical units. Bias is the average of the differences between observed and predicted values, while the mean absolute error is the average of the absolute values of the residuals. For more details about goodness-of-fit measures, see David and Gregory (1999).
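    The statistics above, Equations (8) to (11) plus bias and mean absolute error, can be computed together. The following is a minimal plain-Python sketch and not the authors' code. The function name `goodness_of_fit` is hypothetical, and bias is taken as observed minus predicted, following the wording above. The sketch also shows why E = 0.9 means an MSE of 10 percent of the variance: since E = 1 - MSE / var(T), with var(T) the population variance of the observations, E = 0.9 leaves MSE = 0.1 var(T).

    ```python
    import math

    def goodness_of_fit(T, y):
        """Return R^2, E, d, RMSE, bias, and MAE for observed T and predicted y."""
        n = len(T)
        if n == 0 or n != len(y):
            raise ValueError("T and y must be non-empty and of equal length")
        T_bar = sum(T) / n
        y_bar = sum(y) / n
        sse = sum((t - p) ** 2 for t, p in zip(T, y))               # squared residuals
        sst = sum((t - T_bar) ** 2 for t in T)                       # spread of observations
        cov = sum((p - y_bar) * (t - T_bar) for t, p in zip(T, y))
        var_y = sum((p - y_bar) ** 2 for p in y)
        r2 = cov ** 2 / (var_y * sst)                                # Equation (8)
        e = 1.0 - sse / sst                                          # Equation (9)
        pot = sum((abs(p - T_bar) + abs(t - T_bar)) ** 2 for t, p in zip(T, y))
        d = 1.0 - sse / pot                                          # Equation (10)
        rmse = math.sqrt(sse / n)                                    # Equation (11)
        bias = sum(t - p for t, p in zip(T, y)) / n                  # observed minus predicted (assumed sign)
        mae = sum(abs(t - p) for t, p in zip(T, y)) / n
        return {"R2": r2, "E": e, "d": d, "RMSE": rmse, "bias": bias, "MAE": mae}
    ```

    For example, with observed values T = [10, 20, 30, 40] and predictions y = [12, 18, 33, 37], the residual sum of squares is 26 and the sum of squares of T about its mean is 500, so E = 1 - 26/500 = 0.948.
    
    
    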

    A complete assessment of the model should also include scatterplots with error bounds. During the testing phase, the performance of the ANN model is evaluated using scatterplots of y(t) versus T(t). The scatter of the points [T(t), y(t)] about the 45-degree line can then be examined against error bounds to assess how far the predicted outputs deviate from the measured system behavior.
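    The k-fold error of Equation (7) and the hidden-node search described under model construction can be sketched as follows, using RMSE (Equation 11) as the statistic Q. This is an illustration under stated assumptions, not the authors' NeuralWorks or Netlab setup. `fit_line` is a stand-in model (an ordinary least-squares line) included only so the example runs. The fixed tolerance `tol` is a hypothetical substitute for the paper's judgment of when the decrease in five-fold error "becomes insignificant"; Figure 4 relies on 95 percent confidence bounds for that judgment.

    ```python
    import math
    import random

    def kfold_indices(n, k, seed=0):
        """Shuffle indices 0..n-1 once and deal them into k disjoint folds S_1..S_k."""
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        return [idx[j::k] for j in range(k)]

    def rmse(observed, predicted):
        """Equation (11)."""
        return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

    def kfold_cv_error(x, t, k, fit, q=rmse, seed=0):
        """Equation (7): mean over the k folds of Q(S_j, X minus S_j).

        fit(x_train, t_train) must return a prediction function (hypothetical interface).
        """
        scores = []
        for test_idx in kfold_indices(len(x), k, seed):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(x)) if i not in held_out]  # X minus S_j
            predict = fit([x[i] for i in train_idx], [t[i] for i in train_idx])
            scores.append(q([t[i] for i in test_idx], [predict(x[i]) for i in test_idx]))
        return sum(scores) / k

    def fit_line(xs, ts):
        """Stand-in model (NOT an ANN): least-squares line t = a*x + b."""
        n = len(xs)
        mx, mt = sum(xs) / n, sum(ts) / n
        a = sum((u - mx) * (v - mt) for u, v in zip(xs, ts)) / sum((u - mx) ** 2 for u in xs)
        return lambda u: a * u + (mt - a * mx)

    def select_hidden_nodes(cv_error, max_nodes, tol):
        """Add hidden nodes one at a time, starting from one. Return the last size whose
        addition still lowered the CV error by at least tol (hypothetical threshold)."""
        best_n, best_err = 1, cv_error(1)
        for n in range(2, max_nodes + 1):
            err = cv_error(n)
            if best_err - err < tol:  # decrease no longer significant
                break
            best_n, best_err = n, err
        return best_n
    ```

    In an ANN setting, `cv_error(n)` would call `kfold_cv_error` with k = 5 and a `fit` function that trains a network with n hidden nodes.
    
    
    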

    Seasonal Streamflow Prediction Model Performance

    Figure 5 illustrates the relationship between the model predictions and the actual data. Using SSTA data from the East Atlantic station produced the best performance. The correlation coefficient of the model is 0.88, and the coefficient of efficiency is 0.76. A seasonal ANN of this accuracy could be a very useful tool for water users in making decisions about basin operations. Figure 6 provides a scatterplot, together with 20 percent error bounds, of model predictions versus actual system behavior. Because the total training data set covers only the period 1981 to 2002, the small number of patterns could directly explain why the ANN gives relatively poor predictions at some points (i.e., the peak flows). The errors could also be attributed to the inputs lacking sufficient information to fully represent the hydrology of the watershed for these events. Notably, the inaccuracy of ANNs in predicting peaks and valleys in hydrologic time series is one of the major concerns facing users of ANN technology in the hydrologic community. For techniques to improve peak flow estimation in ANNs, readers are referred to Sudheer et al. (2003).

    Successful seasonal forecasts of water quantity should help answer difficult questions such as "Will there be sufficient water to meet competing demands in the Sevier River Basin?" and "How far will one be able to stretch the water that will become available?"
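    The 20 percent error bounds drawn on the scatterplots of model predictions versus observations (Figures 6 and 8) can be summarized as the share of points that fall between the lines y = 0.8T and y = 1.2T. The sketch below assumes the bounds are relative to the observed value, because the paper does not spell out how they are constructed. The function name is hypothetical.

    ```python
    def fraction_within_bounds(observed, predicted, rel_bound=0.20):
        """Share of [T(t), y(t)] points lying between y = (1 - b)T and y = (1 + b)T,
        i.e. inside a +/- b relative band around the 45-degree line."""
        if not observed or len(observed) != len(predicted):
            raise ValueError("need two non-empty series of equal length")
        inside = 0
        for t, y in zip(observed, predicted):
            # sorted() keeps the band ordered even if an observation is negative
            lo, hi = sorted(((1 - rel_bound) * t, (1 + rel_bound) * t))
            if lo <= y <= hi:
                inside += 1
        return inside / len(observed)
    ```

    For instance, `fraction_within_bounds([100, 100, 100, 100], [85, 119, 121, 79])` returns 0.5, because only 85 and 119 lie inside the 80 to 120 band.
    
    
    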

    Daily Prediction of Required Reservoir Releases

    Figure 7 presents a time series plot comparing the ANN model release forecasts with the actual diversions for the irrigation seasons of 2000, 2001, and 2002. This figure shows good model performance in predicting the required releases from the reservoir.


    TABLE 1. Model Structures Summary.

    Model | Input Layer | Hidden Layer 1 | Hidden Layer 2 | Output Layer | Transfer Function | Learning Rule*
    Seasonal Model | 15 | 10 | - | 1 | sig(.) | δ-Rule
    Daily Model | 14 | 14 | 3 | 1 | tanh(.) | NCD
    Hourly Model | 7 | 6 | - | 24 | sig(.) | δ-Rule

    *The learning rules are the delta rule (δ-Rule) and the normalized cumulative delta (NCD) rule. A discussion of these rules can be found in NeuralWare (2000).
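    The layer sizes in Table 1 can be made concrete with a small sketch that builds fully connected feed-forward networks of those sizes and runs one forward pass. This is not the NeuralWorks implementation, and several assumptions apply. The weights are random. The inputs are assumed to be scaled to [0, 1]. The listed transfer function is assumed to apply in every layer, so sigmoid outputs fall in (0, 1) and the targets would also need scaling. Training with the delta rule or NCD is not shown. All names are hypothetical.

    ```python
    import math
    import random

    def sigmoid(z):
        """Numerically stable logistic function, sig(.) in Table 1."""
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    ACTIVATIONS = {"sig": sigmoid, "tanh": math.tanh}

    def init_network(layer_sizes, seed=0):
        """Random weights and biases for each pair of consecutive layers."""
        rng = random.Random(seed)
        layers = []
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
            biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
            layers.append((weights, biases))
        return layers

    def forward(layers, x, activation):
        """Propagate input vector x through every layer with one transfer function."""
        f = ACTIVATIONS[activation]
        for weights, biases in layers:
            x = [f(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]
        return x

    # (input, hidden..., output) sizes and transfer function, read from Table 1
    ARCHITECTURES = {
        "seasonal": ([15, 10, 1], "sig"),
        "daily": ([14, 14, 3, 1], "tanh"),
        "hourly": ([7, 6, 24], "sig"),
    }

    for name, (sizes, act) in ARCHITECTURES.items():
        out = forward(init_network(sizes), [0.5] * sizes[0], act)
        print(name, len(out))  # prints: seasonal 1, daily 1, hourly 24 (one per line)
    ```

    The hourly network's 24 outputs correspond to one prediction per hour of the day.
    
    
    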

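    $$E = 1 - \frac{\sum_t \left(T_t - y_t\right)^2}{\sum_t \left(T_t - \bar{T}\right)^2} \qquad (9)$$

    $$d = 1 - \frac{\sum_t \left(T_t - y_t\right)^2}{\sum_t \left(\left|y_t - \bar{T}\right| + \left|T_t - \bar{T}\right|\right)^2} \qquad (10)$$

    $$\mathrm{RMSE} = \left[\frac{1}{N}\sum_{t=1}^{N} \left(T_t - y_t\right)^2\right]^{1/2} \qquad (11)$$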


    Figure 8 provides a scatterplot, together with 20 percent error bounds, of model predictions versus measured releases for the validation data from the 2000, 2001, and 2002 irrigation seasons. The correlation coefficient for this scatterplot is R² = 0.98, and the coefficient of efficiency is 0.95. To use the model in near real time, the predicted reservoir releases can be provided to the reservoir operator, who, together with other experts, can then analyze, judge, and evaluate the results of the ANN model according to their own knowledge and experience.


    Figure 5. Time Series Performance of the ANN Model in Predicting Seasonal Quantity of Water.

    Figure 6. Scatterplot of Model Predictions Versus Actual Flows.


    The results indicate that the model forecasts can be used to address the conflicting goals of satisfying downstream demands with high certainty while conserving water in the reservoir for use later in the season.

    Figure 7. Time Series Performance of the ANN Model in Predicting the 2000, 2001, and 2002 Irrigation Season Releases.

    Figure 8. Scatter Plot of Model Predictions Versus Measured Releases for the 2000, 2001, and 2002 Irrigation Seasons.

    Hourly Streamflow Prediction Model Results

    An hourly streamflow prediction model has been built to forecast the substantial diurnal fluctuations of flow in the Clear Creek watershed. The data available for building this model come from the spring runoff periods of 2000 through 2003. As shown in Figure 9, the model can predict 2003 hourly flows at Clear Creek during the first months of the irrigation season, when diurnal fluctuations play a strong role in determining flows in the creek.

    On average, the linear correlation between the actual and predicted flows is 0.97, and the RMSE is 9.43 cfs (0.27 m³/sec). Trials with different data sets showed that the hourly predictions were not as good unless all of the relevant data (previous streamflow, total solar radiation, air temperature, and precipitation) were used.

    As shown in Figure 10, the predicted flows agree closely with the actual flows, indicating very good model performance. Hourly streamflow predictions provide useful management information for Sevier River Basin managers and farmers in dealing with diurnal fluctuations of tributary streams. In particular, the model accurately simulated the rapid rise in streamflow that occurs at sunrise, as well as other diurnal fluctuations in flow.

    Figure 9. Model Performance Evaluated Using Coefficient of Efficiency and RMSE (2003 Irrigation Season).

    Figure 10. Daily Predicted Versus Actual Flow at 10 p.m. for Clear Creek.

    SUMMARY AND CONCLUSIONS

    To improve water management in the Sevier River Basin, an extensive, basin-wide automated system has been installed that records and stores data on an hourly basis to enable real-time information processing. Moreover, Internet-based communication and control systems are in place that allow managers to remotely manipulate all reservoir releases and canal diversion gates at will. Operators of the basin-wide system and water users alike have begun to view the resulting information and control system as an integrated tool for basin-wide management (Berger et al., 2002; Bret et al., 2002).

    In most river basins, and particularly in the Sevier, water supply is managed at different temporal and spatial scales, and decisions made by different managers are not always well coordinated. Coordination is particularly difficult given long travel times and uncertainties in system behavior. The models described in this manuscript represent a first attempt to exploit the real-time database available on the Sevier River to address the range of information needs of stakeholders and managers. A seasonal model predicts future water availability in the upper basin to reduce the vulnerability of water users to unforeseen water shortages. This information will help them avoid financial commitments that must be made early in the water year but that could result in substantial economic losses if future water supplies become limited. A daily reservoir release model was designed to improve on-demand flexibility in reservoir operation. Efficient daily decisions about reservoir releases reduce water losses and improve deliveries to downstream irrigators. An hourly model of uncontrolled tributary flows allows water managers to accurately anticipate diurnal flow conditions and consequently to coordinate upstream reservoir releases with numerous downstream canal diversions. These models exploit the real-time database, together with coordinated input of water demand information from diverse canal and reservoir operators, to provide both short-term and long-term decision-relevant information. In these functions, they constitute the foundation of an integrated framework for basin-scale management of the available scarce water

    resources.

    The ANN approach successfully transformed measured input vectors into reasonably accurate forecasts of outputs for all three models. Large amounts of data, including multi-sensor data in the form of meteorological and streamflow data, were integrated into an ANN framework to develop useful models for water management problems. The adequacy of the ANN models is demonstrated by the quality of their forecasts. This shows that real-time monitoring and management systems can be constructed to make more efficient use of the basin's water resources. This paper demonstrates the ability of ANNs to learn relationships among easy-to-measure streamflow, meteorological, and satellite data to enhance basin-scale management. The performance of the ANN techniques in extracting useful information is satisfactory (see Table 2). Overall, the resulting models are easy to use and have been found to provide useful and efficient forecasts without resorting to the development and application of complex, computationally demanding, physically based models that require expensive data collection efforts to support them. In the future, such models could also contribute substantially to computer-controlled basin automation by being linked to the basin database. This is being considered in the Sevier River Basin and, if implemented, might reduce the cost of management and more fully exploit the available database for the basin. This leads us to share optimistically the view voiced by one of the water users in the Sevier River Basin: "When something goes down and I have to go back to the old way of doing things, it is like being blind after being able to see" (Berger et al., 2002, p. 25-11).

    ACKNOWLEDGMENTS

    The authors wish to thank Dr. Roger Hansen of the U.S. Bureau of Reclamation, Provo, Utah, for the extremely valuable contributions he has made to the work reported in this paper. Thanks are also due to Dr. Luis Bastidas and Connely K. Baldwin for their valuable insights and help. The authors are grateful to the Sevier River Water Users Association, the U.S. Bureau of Reclamation, and the Utah Water Research Laboratory at Utah State University for providing funding in partial support of the work reported here. Thanks are also due to anonymous reviewers for their insightful comments.


    TABLE 2. Key Statistics of Model Performance in the Training and Testing Phases.

    Statistic | Seasonal Training | Seasonal Testing | Daily Training | Daily Testing | Hourly Training | Hourly Testing
    Correlation Coefficient | 0.93 | 0.88 | 0.99 | 0.98 | 0.99 | 0.97
    Coefficient of Efficiency | 0.86 | 0.76 | 0.98 | 0.95 | 0.98 | 0.91
    Index of Agreement | 0.96 | 0.94 | 0.99 | 0.99 | 0.99 | 0.97
    RMSE | 19.58 mcf (0.55 mcm) | 20.59 mcf (0.58 mcm) | 20.16 cfs (0.57 cms) | 38.13 cfs (1.08 cms) | 4.25 cfs (0.12 cms) | 9.43 cfs (0.27 cms)
    Bias | -3.44 mcf (-0.097 mcm) | -4.24 mcf (-0.12 mcm) | 0.00 cfs (0.00 cms) | 1.84 cfs (0.05 cms) | 0.00 cfs (0.00 cms) | -3.84 cfs (-0.1 cms)
    Mean Absolute Error | 14.00 mcf (0.4 mcm) | 15.67 mcf (0.44 mcm) | 13.25 cfs (0.38 cms) | 27.38 cfs (0.78 cms) | 2.86 cfs (0.081 cms) | 5.91 cfs (0.17 cms)
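    The SI values in Table 2 follow from the exact factor 1 ft³ = 0.3048³ m³ ≈ 0.0283168 m³. The short check below reproduces several of the printed conversions at the precision shown. It assumes that the "m" prefix in mcf and mcm denotes the same multiplier, so only the ft³-to-m³ factor matters. Names are hypothetical.

    ```python
    FT3_TO_M3 = 0.3048 ** 3  # 1 ft = 0.3048 m exactly, so 1 ft^3 = 0.028316846592 m^3

    def ft3_to_m3(value):
        """Convert cfs to cms (or mcf to mcm, assuming a shared prefix)."""
        return value * FT3_TO_M3

    # Imperial entries from Table 2 paired with the SI values printed beside them
    PAIRS = [
        (19.58, 0.55),    # seasonal RMSE, training
        (-3.44, -0.097),  # seasonal bias, training
        (38.13, 1.08),    # daily RMSE, testing
        (9.43, 0.27),     # hourly RMSE, testing
        (2.86, 0.081),    # hourly MAE, training
    ]

    for imperial, printed in PAIRS:
        decimals = len(str(printed).split(".")[1])  # precision used in the table
        print(f"{imperial:8.2f} -> {ft3_to_m3(imperial):.{decimals}f} (printed {printed})")
    ```
    
    
    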


    LITERATURE CITED

    Ames, D., 1998. Seasonal to Interannual Streamflow Forecasts Using Nonlinear Timeseries Methods and Climate Information. Master of Science Thesis, Utah State University, Logan, Utah.

    Berger, B., R. Hansen, and A. Hilton, 2002. Using the World-Wide-Web as a Support System to Enhance Water Management. The 18th ICID Congress and 53rd IEC Meeting, Montréal, Canada, pp. 25-1 to 25-12.

    Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University Press.

    Blum, A., A. Kalai, and J. Langford, 1999. Beating the Holdout: Bounds for k-Fold and Progressive Cross-Validation. Proceedings of the 12th Annual Conference on Computational Learning Theory, pp. 203-208.

    Bret, B., H. Rogers, and R. Jensen, 2002. Sevier River Basin System Description. Available at http://www.sevierriver.org/sys_desc/t1.html. Accessed on April 20, 2004.

    David, R.L. and M.J. Gregory, 1999. Evaluating the Use of Goodness-of-Fit Measures in Hydrologic and Hydroclimatic Model Validation. Water Resources Research 35(1):233-241.

    Govindaraju, R.S. and A.R. Rao, 2000. Artificial Neural Networks in Hydrology. Kluwer Academic Publishers, Amsterdam, The Netherlands.

    Hammerstrom, D., 1993. Working With Neural Networks. IEEE Spectrum, July, pp. 46-53.

    Hayken, S., 1994. Neural Networks: A Comprehensive Foundation. IEEE Press, McMillan College Publishing, New York, New York.

    Kaplan, A., M. Cane, Y. Kushnir, A. Clement, M. Blumenthal, and B. Rajagopalan, 1998. Analyses of Global Sea Surface Temperature 1856-1991. Journal of Geophysical Research 103:18,567-18,589.

    Kaplan, A., Y. Kushnir, M. Cane, and M. Blumenthal, 1997. Reduced Space Optimal Analysis for Historical Datasets: 136 Years of Atlantic Sea Surface Temperatures. Journal of Geophysical Research 102:27,835-27,860.

    Maier, H.R. and G.C. Dandy, 2000. Neural Networks for the Prediction and Forecasting of Water Resources Variables: A Review of Modeling Issues and Applications. Environmental Modeling and Software 15:101-124.

    Nabney, I., 2001. Netlab: Algorithms for Pattern Recognition. Springer, New York, New York.

    NeuralWare, Inc., 2000. Neural Computing, NeuralWorks Professional II/PLUS. Carnegie, Pennsylvania.

    Rivals, I. and L. Personnaz, 2000. A Statistical Procedure for Determining the Optimal Number of Hidden Neurons of a Neural Model. Second International Symposium on Neural Computation, Berlin, Germany.

    Rumelhart, D.E., G.E. Hinton, and R.J. Williams, 1986. Learning Internal Representations by Error Propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D.E. Rumelhart and J.L. McClelland (Editors). MIT Press, Cambridge, Massachusetts, Vol. 1, Chapter 8, pp. 318-362.

    Schalkoff, R.J., 1997. Artificial Neural Networks. McGraw-Hill, New York, New York.

    Sevier River Water Users Association, 2004. Sevier River Water Users Association: Real-time Water/Weather Data. Available at http://www.sevierriver.org/. Accessed on December 8, 2004.

    Shakhnarovich, G., R. El-Yaniv, and Y. Baram, 2001. Smoothed Bootstrap and Statistical Data Cloning for Classifier Evaluation. Proceedings of International Conference on Machine Learning, pp. 521-528.

    Skapura, D.M., 1995. Building Neural Networks. Addison-Wesley Publishing Company, Boston, Massachusetts.

    Sudheer, K.P., P.C. Nayak, and K.S. Ramasastri, 2003. Improving Peak Flow Estimates in Artificial Neural Network River Flow Models. Hydrological Processes 17:677-686.

    Utah Board of Water Resources, 2001. Utah's Water Resources: Planning for the Future. Division of Water Resources Publications, Salt Lake City, Utah. Available at http://www.water.utah.gov/waterplan/uwrpff/TOC.htm. Accessed on May 21, 2001.
