2014 KSCE Peak Discharge

KSCE Journal of Civil Engineering (2014) 18(6):1868-1876
Copyright 2014 Korean Society of Civil Engineers
DOI 10.1007/s12205-014-0047-8
pISSN 1226-7988, eISSN 1976-3808
www.springer.com/12205

Water Engineering

Peak Discharge Prediction due to Embankment Dam Break by using Sensitivity Analysis based ANN

Ali Osman Pektas* and Tarkan Erdik**

Received January 21, 2013/Revised April 1, 2013/Accepted October 14, 2013/Published Online June 20, 2014

Abstract

Accurate prediction of peak discharges due to embankment dam failure is essential to identifying and reducing the potential for loss of life and damage in the downstream floodplain, because when a dam fails the damage is certain, but its extent cannot be evaluated in advance. The loss of life and property damage can vary depending on the flooded area and population. In order to cope with embankment dam breaching and to take the necessary steps beforehand, many researchers have worked since the 1970s on parametric breach models based on Regression Analysis (RA) to estimate the peak outflow from a breached embankment dam. RA is a widely used approach that can provide acceptable results. However, since this approach bears restrictive assumptions, direct application of RA that ignores these assumptions might cause pitfalls and biased calculations. In this study, it is shown that previous works based on RA give biased calculations, and a new alternative approach based on Artificial Neural Networks (ANN) is suggested as a replacement for classical RA; it gives more accurate results according to both numerical error criteria and scientific background.

Keywords: artificial neural networks, multilayer perceptrons, sensitivity analysis, peak discharge, embankment dam failure, variable importance

1. Introduction

Embankment dams, which are more widely constructed worldwide than other types, have been of great importance in water resource development and hydropower generation.
Although the probability of dam failure can be extremely low, its occurrence can have catastrophic consequences downstream, such as a large loss of human life as well as damage to facilities and the environment. The analysis of the dam failure phenomenon must be taken into account in the design and construction phases of dams. Therefore, the prediction of peak discharges due to embankment dam failure can be considered the first and most important task. Moreover, the peak discharge calculation provides a scenario-generating tool for identifying the resulting hazards, so that decision-making authorities may then take the necessary actions to protect against the loss of life and property damage due to the resulting contingencies.

Embankment dams are subject to dam breaching caused by erosion due to the overtopping process. The breach formation happens gradually with respect to time and width (ASCE/EWRI Task Committee on Dam/Levee Breaching, 2011). A breach usually forms only along a portion of the dam's crest length. In many instances, the bottom of the breach progressively erodes downward until it reaches the bottom of the dam.

Embankment dam failure can cause devastating disasters, such as the overtopping of the South Fork Dam, Pennsylvania, USA, in 1889, as a result of which over 2200 people died and large property losses occurred (Singh and Scarlatos, 1988).
Dike breaches in about 900 places due to a heavy storm surge in 1953 in the Netherlands induced one of the biggest natural disasters in Dutch history, causing 1835 deaths and a direct economic loss of about 14% of the Dutch GDP (Huisman, 1998). Finally, in August 1975, catastrophic heavy rainstorms (maximum 6-hour rainfall 830 mm) in central China led to the disastrous failures of the Banqiao and the Shimantan Reservoir Dams, with 26000 deaths (Pan, 2000).

The ASCE/EWRI Task Committee on Dam/Levee Breaching has recently classified embankment breach models as parametric, simplified physically-based, or detailed physically-based, taking into account the model formulation and the approximation of physical processes. Parametric models are usually empirical, physically-based models derived by using Regression Models (RMs). Many researchers have so far employed RMs relating peak breach outflow to dam height, reservoir storage volume, or combinations of the two, using case study data.

TECHNICAL NOTE
*Assistant Professor, Dept. of Environmental Engineering, Bahcesehir University, 34353 Istanbul, Turkey (Corresponding Author, E-mail: [email protected])
**Research Assistant, Dept. of Civil Engineering, Istanbul Technical University, Istanbul, Turkey (E-mail: [email protected])
Vol. 18, No. 6 / September 2014

Kirkpatrick (1977), who is among the pioneering researchers, presented data from 13 embankment dam failures, together with 6 additional hypothetical failures, and suggested a best-fit equation for peak discharge prediction as a function of the depth of water behind the dam at failure. The Soil Conservation Service (SCS, 1981) developed a power-law envelope equation relating the peak dam failure outflow to the depth of water at the dam at the time of failure, using the 13 case studies of Kirkpatrick, although three data points are slightly above the curve. U.S.
Bureau of Reclamation (1982), extending the work of SCS (1981), proposed a similar envelope equation for peak breach outflow using case study data from 21 dams. Singh and Snorrason (1982 and 1984) presented a peak discharge prediction equation from the relations of dam height and reservoir storage. These equations were developed using the results of eight simulated dam failures previously analyzed with software such as DAMBRK and HEC-1. Costa (1985) presented a detailed summary of flood discharges resulting from the failures of all types of constructed and natural dams. He used a curve-fitting procedure to forecast peak flow from breached dams as functions of dam height, storage volume at the time of failure, and the product of both. He also presented envelope curves. Although he included data for the failure of St. Francis Dam (a concrete gravity structure) in the analysis, it seems that this does not affect the results. Froehlich (1995) developed a best-fit regression equation for prediction of peak discharge based on reservoir volume and head, using data from 22 case studies for which peak discharge data were available (Wahl, 1998). MacDonald and Langridge-Monopolis (1984) proposed a breach formation factor, which is defined as the product of the volume of breach and the depth of water above the breach invert at the time of failure. Wahl (1998, 2004) compiled one of the most comprehensive databases of dam breach case studies and examined the expressions developed between 1977 and 1995. Recently, Pierce et al. (2010) expanded the database of case studies, reviewed the regression expressions published in the literature, and suggested new approaches using the previously unused data. Thornton et al.
(2011) performed a multivariate regression analysis by adding, as an extra variable, either the embankment length (L) or the average embankment width (Wave), neither of which had been used previously, to the volume of water behind the dam (V) and the dam height (H) to improve the peak discharge (Qp) equations. The newly developed expressions seem to reduce the prediction error and improve the prediction accuracy. More recently, Gupta and Singh (2012) suggested using the composite variable (Wave + L) together with the dam height (H) and the volume of water behind the dam (V).

The objective of the present work is to investigate proper ANN models to predict the Qp of dam breaching and to compare the ANN results with those of the most recent regression equations. In addition, the importance of the independent variables used in these regression equations has been computed by neural network sensitivity analysis techniques in order to gain insight into the relationships among the predictors of Qp.

2. The Studies used as a Benchmark and Pitfalls of Equations

In this study, the most recent parametric models by Thornton et al. (2011) and Gupta and Singh (2012) are employed as benchmarks. Thornton et al. (2011) used 38 of the 87 records in the database of Pierce et al. (2010). A total of 25, 14, and 4 of the 38 records in Thornton et al.
(2011) report information about Wave, L, and both, respectively. They added Wave or L as a new parameter alongside the volume of water behind the dam (V) and the dam height (H) to improve the peak discharge equations. The resulting equations are as follows:

$$Q_p = V^{0.335} H^{1.833} W_{ave}^{0.663} \tag{1}$$

$$Q_p = 0.012\, V^{0.493} H^{1.205} L^{0.226} \tag{2}$$

Gupta and Singh (2012) discussed the paper of Thornton et al. (2011) using the 35 of the 38 records of Pierce et al. (2010) that have information about both Wave and L. They opposed the use of two separate equations by Thornton et al. (2011) and suggested using a single equation with three variables at a time:

$$Q_p = 0.0217\, V^{0.47378} H^{1.1775} (W + L)^{0.17094} \tag{3}$$

Gupta and Singh (2012) also generated data sets by using the equations compiled by Thornton et al. (2011) and used these data sets to derive three new equations. One of these equations (Eq. 4), which regresses Qp on two independent variables, is compared with the suggested ANN models in the scope of this study.

$$Q_p = 0.11769\, H^{0.7567} V^{0.4814} \tag{4}$$

Gupta and Singh (2012) developed Eq. (3) and Eq. (4) by using regression analysis. However, the regression analysis approach bears restrictive assumptions, and direct application of regression without considering them might cause pitfalls and biased calculations. The most frequently violated assumption of regression analysis is normality, which states that the residuals should be normally distributed (Erdik, 2009; Erdik et al., 2009). The violation of the assumption that the residuals must be normally distributed is shown in Fig. 1 and Fig. 2 for Eq. (3) and Eq. (4), respectively. It is clear from both figures that the residuals of Eq. (3) and Eq. (4) do not fit the normal distribution function.

Fig. 1. Normal Distribution of Residuals for Eq. (3)

In addition, in order to measure the discrepancy between the observed and theoretical frequencies for both Eq. (3) and Eq. (4), the Kolmogorov-Smirnov test is applied at a 0.05 significance level. The critical Kolmogorov-Smirnov value for both Eq.
(3) and Eq. (4) is 0.224 at the 5% significance level, whereas the test statistics obtained are 0.410 and 0.367. Since the values 0.367 and 0.410 are greater than 0.224, the hypothesis that the residuals fit the normal distribution is rejected. The motivation of this study is to develop a new model which not only outperforms the most recent applications but also is free of the restrictive assumptions that regression models have.

3. Artificial Neural Network Application

Because the data set of dam breaks is limited, Artificial Neural Networks (ANNs) can be a good alternative to regression analysis in predicting the peak flow of a dam breach (Qp), owing to their flexible applicability and nonlinear structure compared with the many restrictive assumptions of regression analysis. An ANN is a computing system based on the operating mechanism of biological neural networks (Dorf, 1997). During neural network training or learning, a data set including inputs and desired outputs is provided to the network model. The neural network is constructed by fitting itself to the training data so that it learns to predict unknown outputs. Usually, a portion of the data, called testing data, is reserved to confirm the prediction accuracy of the trained model. ANNs have been shown to be highly successful approximators of many complex functions and have shown advantages over general linear models in predictive ability (Hastie et al., 2001). This fact has led many researchers to advocate ANNs as an attractive, non-linear alternative to traditional statistical methods (Mohammadi et al., 2005; Bourdes et al., 2010; Tronto et al., 2006).

Despite their high predictive capacity, the black-box nature of ANNs is a major weakness when compared to traditional statistical approaches like regression analysis. ANNs are called black boxes by many researchers due to the implicit behavior of the input-output relationship. Input variables are entered into the network, and an output value is generated without gaining any understanding of the interrelationships between the variables.
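As an aside on the normality check in Section 2, the Kolmogorov-Smirnov comparison can be illustrated with a short Python sketch. This is not the procedure the authors ran, since the text does not describe the software settings used for the test. The sketch assumes the reference normal distribution is fitted with the sample mean and standard deviation; the function name `ks_statistic_normal` and the toy residuals are hypothetical. Only the critical value 0.224 is taken from the text.

```python
from statistics import NormalDist, mean, stdev

# Critical value quoted in the text for the 0.05 significance level.
KS_CRITICAL_005 = 0.224


def ks_statistic_normal(sample):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Assumption: the normal is parameterized by the sample mean and std.
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # Check the gap just after and just before each step of the ECDF.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical residuals: 34 small values and one extreme outlier.
toy_residuals = [1.0] * 34 + [1000.0]
d_stat = ks_statistic_normal(toy_residuals)
print(f"D = {d_stat:.3f}, reject normality: {d_stat > KS_CRITICAL_005}")
```

A statistic above the critical value, as reported for the residuals of Eq. (3) and Eq. (4), leads to rejecting the normality hypothesis; a strongly skewed sample such as the toy one behaves the same way.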
Without information regarding the relative importance of parameters and the contribution of inputs to outputs, the utility of ANNs is limited. To remedy this deficiency, many sensitivity analysis methods have been developed by ANN practitioners. Sensitivity analysis is the study of how the variation or uncertainty in the output of a mathematical model can be apportioned, qualitatively or quantitatively, to different sources of variation in the input to a model (Cacuci et al., 2005). Sensitivity analysis with neural networks involves varying network activations and observing the resulting changes in other parts of the network (such as the output activations) to determine which parts of the network are more important and which are less. The approach also helps model builders study the uncertainties associated with the model parameters, which is especially important for domains that are not well understood (Hodouin et al., 1991).

Many relevant works applying ANN modeling combined with Sensitivity Analysis (SA) have been implemented in business and engineering applications (Baker et al., 1999; Cullen and Frey, 1999; Embrechts et al., 2001; Poh et al., 1998; Fraedrich and Goldberg, 2000; Kleijnen, 1995; Wu et al., 2010). Howes and Crook (1999) proposed a general influence measure for each input, based on an analysis of the network's weights, similar to a method introduced by Yoon et al. (1994). Both of the above methods are closely related to Garson's algorithm (1991) and the Weight Product (WP) method by Tchaban et al. (1998). Olden et al. (2004) compared a number of methods for determining the importance of parameters in ANNs using a simulation-based approach. Kemp et al. (2007) proposed a new method by refining the perturbation method of Olden et al. (2004); it was later called the Hold-back Input Randomization Method (HIPR method). The approaches reviewed by these authors ranged from the calculation of indices, to sensitivity analysis, to variations on stepwise parametric analysis.
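The general idea of varying one input and observing the change in the output can be sketched in a few lines of Python. This is only a minimal one-at-a-time sweep, not an implementation of Garson's algorithm, the WP method, or the HIPR method cited above; the function `sweep_importance`, the toy model, and the toy records are hypothetical stand-ins for a trained network and its data.

```python
def sweep_importance(model, records, n_levels=5):
    """Rank inputs by how much the output moves when one input is swept."""
    n_inputs = len(records[0])
    raw = []
    for j in range(n_inputs):
        low = min(r[j] for r in records)
        high = max(r[j] for r in records)
        # Equally spaced values covering the observed range of input j.
        levels = [low + (high - low) * k / (n_levels - 1) for k in range(n_levels)]
        total = 0.0
        for r in records:
            outputs = []
            for value in levels:
                x = list(r)
                x[j] = value  # change input j only, hold the others fixed
                outputs.append(model(x))
            total += max(outputs) - min(outputs)
        raw.append(total / len(records))
    scale = sum(raw)
    # Normalize so the scores sum to 1 (equal shares if nothing moves).
    return [d / scale for d in raw] if scale > 0 else [1.0 / n_inputs] * n_inputs


# Hypothetical stand-in for a trained network: output = 3*x0 + 1*x1.
def toy_model(x):
    return 3.0 * x[0] + 1.0 * x[1]


toy_records = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.2)]
print(sweep_importance(toy_model, toy_records))  # roughly [0.75, 0.25]
```

Because the toy model is linear with weights 3 and 1 over inputs of equal range, the first input receives three quarters of the importance, which makes the sketch easy to check by hand.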
In this study, ANN models have been constructed not only to obtain accurate models but also to perform a sensitivity analysis of the model parameters.

Fig. 2. Normal Distribution of Residuals for Eq. (4)

3.1 Model Setup and Application

After the total of 35 records of Gupta and Singh (2012) had been gathered, descriptive statistics were used to analyze the data. Histograms and normal curves were plotted to investigate the distribution of the data. As seen in Fig. 3 (a-d), the distributions of the variables are highly skewed and deviate from the normal probability plot. Linear regression equations were developed to investigate the interrelationships of the variables by using non-transformed and log-transformed data to predict Qp and LnQp, respectively. The independent variables of the regressions were V, H and L+W. In the first case (Reg_model 1), the adjusted R2 value was found to be 0.685. The independent variables were statistically significant (Sig. of 0.01 or less in Table 1), and no collinearity was diagnosed from the change in correlations, the Variance Inflation Factor (VIF), or the condition indexes. According to a rule of thumb, a VIF greater than 2 and a condition index greater than 10 indicate a possible moderate collinearity problem. The maximum condition index in model 1 is 3.47, and the VIF values are all smaller than 2, so there is no collinearity problem between the non-transformed variables. However, in Reg_model 1 the assumptions of linear regression are violated because of the non-linearity and non-normal distribution of the variables. Therefore, log-transformed variables were used in Reg_model 2. Although the log transformation fixed the non-linearity problem and satisfied the normal distribution assumption of the regression model, a collinearity problem arose. In addition, Ln(L+W) was found to be insignificant in Reg_model 2. As a diagnostic of collinearity, the zero-order, partial and part (semipartial) correlations of Ln(L+W) decreased sharply from 0.79 to 0.006.
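For readers less familiar with the collinearity diagnostics used here, the worked relation below may help. It uses the standard definition of the Variance Inflation Factor, which the text does not state explicitly, so treating it as the definition behind the reported values is an assumption. Here $R_j^2$ denotes the coefficient of determination obtained when predictor $j$ is regressed on the other predictors, and the VIF values are those listed for Model 2 in Table 1.

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}
\quad\Longleftrightarrow\quad
R_j^{2} = 1 - \frac{1}{\mathrm{VIF}_j}
\]
\[
\mathrm{VIF}_{\ln H} = 4.32 \;\Rightarrow\; R^{2} \approx 0.77, \qquad
\mathrm{VIF}_{\ln V} = 3.02 \;\Rightarrow\; R^{2} \approx 0.67, \qquad
\mathrm{VIF}_{\ln(L+W)} = 2.74 \;\Rightarrow\; R^{2} \approx 0.64
\]
```

Under this reading, each log-transformed predictor shares roughly two thirds or more of its variance with the other two, which is consistent with the VIF values exceeding the rule-of-thumb threshold of 2.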
The maximum condition index of the fourth dimension of model 2 is 19.37, which indicates a moderate collinearity problem. The collinearity and coefficient statistics of the regression models are presented in Table 1.

Fig. 3. Histogram Frequency Graphs of the Variables: (a) Dam Height, (b) Water Volume, (c) Total of Embankment Width and Length, (d) Peak Discharge

Table 1. Coefficient and Collinearity Statistics of Regression Models

| Model | Variable | Unstandardized Coefficient B | t | Sig. | Zero-order correlation | Partial correlation | Part correlation | VIF | Condition Index |
|---|---|---|---|---|---|---|---|---|---|
| Model 1 (a) | (Constant) | -4566.87 | -1.84 | 0.08 | | | | | 1.000 |
| Model 1 (a) | Hm | 433.47 | 3.57 | 0.00 | 0.67 | 0.54 | 0.34 | 1.32 | 1.886 |
| Model 1 (a) | V | 0.00 | 3.46 | 0.00 | 0.71 | 0.53 | 0.33 | 1.44 | 2.263 |
| Model 1 (a) | LW | 10.54 | 2.66 | 0.01 | 0.58 | 0.43 | 0.26 | 1.24 | 3.438 |
| Model 2 (b) | (Constant) | -3.82 | -6.23 | 0.00 | | | | | 1.000 |
| Model 2 (b) | Ln_H | 1.18 | 4.94 | 0.00 | 0.92 | 0.66 | 0.20 | 4.32 | 6.435 |
| Model 2 (b) | Ln_V | 0.47 | 7.56 | 0.00 | 0.93 | 0.81 | 0.31 | 3.02 | 11.162 |
| Model 2 (b) | Ln_L+W | 0.17 | 1.37 | 0.18 | 0.79 | 0.24 | 0.06 | 2.74 | 19.327 |

a. Dependent Variable: Qp
b. Dependent Variable: Ln_Qp

Before modeling with ANNs, an appropriate training data set was selected and the data were split into training and testing parts. The training part is approximately 80% of the total data, and the testing part is 20%. The testing data were selected by a randomly generated Bernoulli variate with a probability parameter ($\theta$) of 0.2. So, the testing partition was selected in an unbiased manner, and a partition variable was created (IBM SPSS Neural Networks 19, 2010). By using the created partition variable, the testing sample was fixed for every ANN model. The Bernoulli distribution is a special case of the binomial distribution; its probability density and cumulative distribution functions are as follows:

$$f(x;\theta) = \theta^{x}(1-\theta)^{1-x}, \quad x \in \{0, 1\} \tag{5}$$

$$F(x;\theta) = \begin{cases} 1-\theta, & x = 0 \\ 1, & x = 1 \end{cases} \tag{6}$$

After splitting the data, two-input (H, V) and three-input (H, V and L+W) ANN models were developed to compare with the regression equations Eq. (3) and Eq. (4). The details of the developed models are presented in Table 2 and Table 3. As stated by Thornton et al. (2011), H and V are traditional variables, which have been used by many researchers with different data sets in regression analyses for predicting Qp. The variable L+W is a novel variable suggested by Gupta and Singh (2012) to bring the effects of dam crest length and average dam width into the predictions. They also asserted that adding the new variable raises the prediction R2 to 0.95. In the ANN training phase, the batch type of training was selected because each variable has merely 35 data values. Batch training is recommended as most useful for smaller data sets because it updates the synaptic weights only after passing all training data records and directly minimizes the total error (IBM SPSS Neural Networks 19, 2010). The scaled conjugate gradient method is used to estimate the synaptic weights. Initial lambda and initial sigma are set to 0.0000005 and 0.00005, the interval center is selected as 0, and the interval offset is set to ±0.5.

The training of each model was restricted to 500 iterations to reduce the overtraining risk. Three different activation functions and their combinations were used in the output and hidden layers. The activation function links the weighted sums of units in a layer to the values of units in the succeeding layer. Sigmoid, hyperbolic tangent and identity functions were used as transfer functions. While the identity function does not transform its real-valued argument, the sigmoid function maps real-valued arguments into the range (0, 1), whereas the hyperbolic tangent function maps them into the range (-1, 1). These transfer functions can be expressed as:

$$\mathrm{Tanh}(c) = \frac{e^{c} - e^{-c}}{e^{c} + e^{-c}} \tag{7}$$

$$\mathrm{Sig}(c) = \frac{1}{1 + e^{-c}} \tag{8}$$

$$\mathrm{Id}(c) = c \tag{9}$$

Two-input and three-input ANN models were constructed to predict Qp by changing the activation functions and the number of hidden layers. A schematic representation of the neural network structure is given in Fig. 4.

Fig. 4. ANN Structure

Model performances have been compared with each other by the square of the Pearson product-moment correlation coefficient (R2) of the overall data. The training, testing and overall data prediction R2 values and the ANN architectures of the various models are given in Table 2 and Table 3. The difference between training and testing R2 was used to diagnose overtraining. The R2 results of the compared regression equations are given for each part of the data in the last row of the tables. Although the model results are close to each other, ANN model 5 for two inputs and ANN model 2 for three inputs were selected as the best models by the selection criteria.

Table 2. Two Input Models Transformed and the Original Data Set

Model inputs: H, V. Model output: Qp. r = correlation coefficient; Mean and SD (standard deviation) refer to Qp in m3/s.

| Model | Num. of hidden layers | Units in hidden layer | Hidden layer activation function | Output layer activation function | Train r | Train R2 | Train Mean | Train SD | Test r | Test R2 | Test Mean | Test SD | Overall r | Overall R2 | Overall Mean | Overall SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Observed Qp (m3/s) | | | | | 1.00 | 1.00 | 9418.6 | 19424.3 | 1.00 | 1.00 | 1925.3 | 1859.8 | 1.00 | 1.00 | 7705.8 | 17304.0 |
| Model 1 | 1 | 1 | Hyperbolic tangent | Identity | 0.85 | 0.73 | 9345.1 | 16523.7 | 0.91 | 0.82 | 1599.4 | 1972.1 | 0.86 | 0.73 | 7574.7 | 14848.5 |
| Model 2 | 1 | 2 | Hyperbolic tangent | Identity | 0.87 | 0.76 | 9430.3 | 16822.6 | 0.92 | 0.84 | 1730.7 | 2009.2 | 0.87 | 0.76 | 7670.4 | 15099.8 |
| Model 3 | 1 | 2 | Hyperbolic tangent | Hyperbolic tangent | 0.85 | 0.73 | 9442.3 | 16540.5 | 0.84 | 0.71 | 1682.5 | 1699.8 | 0.86 | 0.74 | 7668.6 | 14857.3 |
| Model 4 | 1 | 2 | Hyperbolic tangent | Hyperbolic tangent | 0.87 | 0.76 | 9615.6 | 16808.6 | 0.86 | 0.75 | 1891.0 | 1228.0 | 0.87 | 0.76 | 7850.0 | 15072.9 |
| Model 5 | 1 | 2 | Hyperbolic tangent | Hyperbolic tangent | 0.87 | 0.76 | 9714.5 | 16737.6 | 0.87 | 0.76 | 2036.7 | 1347.9 | 0.87 | 0.77 | 7959.6 | 15010.1 |
| Model 6 | 1 | 2 | Hyperbolic tangent | Sigmoid | 0.85 | 0.73 | 9689.5 | 16411.5 | 0.86 | 0.73 | 1972.4 | 1070.1 | 0.86 | 0.73 | 7925.6 | 14731.2 |
| Model 7 | 1 | 2 | Sigmoid | Sigmoid | 0.85 | 0.73 | 9653.9 | 16407.3 | 0.84 | 0.71 | 1958.8 | 1259.9 | 0.86 | 0.73 | 7895.0 | 14728.6 |
| Model 8 | 2 | 2+2 | Sigmoid | Sigmoid | 0.85 | 0.72 | 9605.2 | 16372.4 | 0.82 | 0.67 | 1909.4 | 1005.5 | 0.86 | 0.73 | 7846.2 | 14695.0 |
| Model 9 | 2 | 2+2 | Hyperbolic tangent | Hyperbolic tangent | 0.85 | 0.72 | 9682.7 | 16438.1 | 0.83 | 0.68 | 1956.4 | 1385.7 | 0.86 | 0.73 | 7916.7 | 14760.2 |
| Model 10 | 2 | 2+2 | Sigmoid | Identity | 0.85 | 0.72 | 9467.9 | 16455.3 | 0.93 | 0.87 | 1776.9 | 2294.3 | 0.86 | 0.73 | 7709.9 | 14794.7 |
| Model 11 | 1 | 2 | Sigmoid | Identity | 0.85 | 0.73 | 9465.4 | 16544.0 | 0.92 | 0.85 | 1659.9 | 2045.2 | 0.86 | 0.73 | 7681.3 | 14873.6 |
| Model 12 | 1 | 2 | Hyperbolic tangent | Identity | 0.85 | 0.73 | 9427.7 | 16562.2 | 0.90 | 0.81 | 1639.4 | 1850.3 | 0.86 | 0.74 | 7647.5 | 14882.2 |
| Regression predictions, Eq. (4) | | | | | 0.80 | 0.64 | 5533.1 | 9877.7 | 0.92 | 0.85 | 1305.2 | 1140.9 | 0.81 | 0.65 | 4566.7 | 8838.8 |

Table 3. Three Input Models Transformed and Original Data Set

Model inputs: H, V, (L+W). Model output: Qp. r = correlation coefficient; Mean and SD (standard deviation) refer to Qp in m3/s.

| Model | Num. of hidden layers | Units in hidden layer | Hidden layer activation function | Output layer activation function | Train r | Train R2 | Train Mean | Train SD | Test r | Test R2 | Test Mean | Test SD | Overall r | Overall R2 | Overall Mean | Overall SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Observed Qp (m3/s) | | | | | 1.00 | 1.00 | 9418.6 | 19424.3 | 1.00 | 1.00 | 1925.3 | 1859.8 | 1.00 | 1.00 | 7705.8 | 17304.0 |
| Model 1 | 1 | 1 | Hyperbolic tangent | Identity | 1.00 | 1.00 | 9493.9 | 19186.6 | 0.86 | 0.73 | 3224.4 | 3843.4 | 1.00 | 0.99 | 8060.8 | 17078.8 |
| Model 2 | 1 | 2 | Hyperbolic tangent | Hyperbolic tangent | 1.00 | 1.00 | 9415.5 | 19408.3 | 0.90 | 0.80 | 2194.4 | 2352.2 | 1.00 | 1.00 | 7765.0 | 17281.6 |
| Model 3 | 1 | 2 | Hyperbolic tangent | Sigmoid | 1.00 | 1.00 | 9479.4 | 19194.7 | 0.86 | 0.73 | 3146.2 | 3742.7 | 1.00 | 0.99 | 8031.8 | 17085.4 |
| Model 4 | 1 | 2 | Sigmoid | Identity | 1.00 | 1.00 | 9420.1 | 19404.3 | 0.91 | 0.82 | 2123.9 | 2277.9 | 1.00 | 1.00 | 7752.4 | 17281.9 |
| Model 5 | 1 | 2 | Sigmoid | Hyperbolic tangent | 1.00 | 1.00 | 9479.0 | 19333.9 | 0.86 | 0.74 | 3055.0 | 3847.4 | 1.00 | 0.99 | 8010.6 | 17215.8 |
| Model 6 | 1 | 2 | Sigmoid | Sigmoid | 0.85 | 0.73 | 9337.0 | 16676.9 | 0.84 | 0.70 | 2403.8 | 2916.9 | 0.86 | 0.74 | 7752.3 | 14938.4 |
| Model 7 | 1 | 4 | Sigmoid | Identity | 0.92 | 0.86 | 9698.9 | 12897.8 | 0.84 | 0.70 | 5198.6 | 3634.5 | 0.92 | 0.85 | 8670.2 | 11558.8 |
| Model 8 | 2 | 2+2 | Sigmoid | Hyperbolic tangent | 1.00 | 1.00 | 9428.7 | 19463.2 | 0.88 | 0.77 | 2955.2 | 3803.7 | 1.00 | 0.99 | 7949.0 | 17328.2 |
| Model 9 | 2 | 2+2 | Sigmoid | Sigmoid | 1.00 | 1.00 | 9548.8 | 19244.7 | 0.85 | 0.72 | 3403.4 | 4008.2 | 1.00 | 0.99 | 8144.1 | 17128.3 |
| Model 10 | 2 | 2+2 | Hyperbolic tangent | Identity | 1.00 | 1.00 | 9456.1 | 19176.9 | 0.78 | 0.61 | 4162.0 | 6028.6 | 0.99 | 0.98 | 8246.1 | 17140.4 |
| Model 11 | 2 | 2+2 | Hyperbolic tangent | Hyperbolic tangent | 1.00 | 1.00 | 9454.7 | 19286.4 | 0.74 | 0.55 | 3358.4 | 5264.3 | 0.99 | 0.99 | 8061.3 | 17230.6 |
| Model 12 | 2 | 2+2 | Hyperbolic tangent | Sigmoid | 1.00 | 1.00 | 9563.0 | 19161.8 | 0.85 | 0.73 | 3272.7 | 3620.2 | 1.00 | 0.99 | 8125.2 | 17048.8 |
| Regression predictions, Eq. (3) | | | | | 0.86 | 0.74 | 10781.2 | 22433.5 | 0.97 | 0.95 | 1730.5 | 1833.3 | 0.87 | 0.75 | 8712.5 | 20010.2 |

3.2 Evaluation of the Proposed Model

The performances of the proposed models are assessed in terms of four numerical error criteria (Eqs. 10-12), namely the Coefficient of Efficiency (COE), Root Mean Squared Error (RMSE), Mean Absolute Error (MEAE), and Maximum Absolute Error (MAAE). COE measures how much of the variation and trends in the observed data are predicted by the model, whereas RMSE, MEAE and MAAE give quantitative information on the model error, with the characteristic that larger errors receive greater attention than smaller ones (Erdik et al., 2009).

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left[(Q_{pm})_i - (Q_{pp})_i\right]^{2}}{n}} \tag{10}$$

$$\mathrm{COE} = 1 - \frac{\sum_{i=1}^{n}\left[(Q_{pm})_i - (Q_{pp})_i\right]^{2}}{\sum_{i=1}^{n}\left[(Q_{pm})_i - \overline{Q}_{pm}\right]^{2}} \tag{11}$$

$$\mathrm{MEAE} = \frac{1}{n}\sum_{i=1}^{n}\left|(Q_{pm})_i - (Q_{pp})_i\right| \tag{12}$$

where subscripts m and p indicate measured and predicted values, respectively, a variable with a bar over it represents the average of that variable, and n is the total number of data.

The prediction results for the overall data set of 35 records are given in Table 4. As is clearly demonstrated, our proposed models for the two- and three-input situations yield encouraging results when compared to the two-input (Eq. 4) and three-input (Eq. 3) regression equations. The proposed models outperform the most recent model by Gupta and Singh (2012) in terms of the four error statistics. In addition, the developed model does not have any of the restrictive assumptions that the regression model has. Scatter diagrams of the observed peak flows and the predictions of ANN model 5 and Eq. (4) are presented in Fig. 5. The scatter diagrams of the predicted values of ANN model 2 and Eq. (3) against the observed values of Qp are presented in Fig. 6.

Fig. 5. Scatter Diagrams of Two Input Models: (a) ANN Model 5, (b) Eq. (4).

Table 4. Comparison of the Proposed Model with Error Statistics

| Model | RMSE | MEAE | COE | MAAE |
|---|---|---|---|---|
| Model 5 (ANN 2 input) | 8269.64 | 2857.00 | 0.76 | 34268.00 |
| Eq. (4) | 11699.52 | 4588.76 | 0.53 | 54407.20 |
| Model 2 (ANN 3 input) | 937.01 | 560.03 | 1.00 | 3043.00 |
| Eq. (3) | 9927.14 | 3649.20 | 0.66 | 39870.82 |

As mentioned by Marques de Sa (2007), many people, when designing a classification or regression model that performs very well in a training set (the set used in the design), suffer from a kind of love-at-first-sight syndrome that leads to neglecting or relaxing the evaluation of their models in test sets (independent of the training sets). The research literature is full of examples of improperly validated models that are later dropped when more data become available and the initial optimism plunges. The book of Chamont Wang (Wang, 1993), where many illustrations and words of caution on the topic of inferential statistics can be found, makes a detailed examination of this topic. So another comparison of the regression equations and ANN models is made from this perspective. Regression models are generally produced by considering all data in hand, as in the case of Gupta and Singh (2012). Decisions about the predictive accuracy of the models usually depend on R2 values.
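The error criteria of Eqs. (10)-(12), together with the maximum absolute error, can be computed as in the following Python sketch. It is a minimal illustration rather than the authors' code: the function name `error_criteria` and the sample values are hypothetical, and MAAE is taken here to be the largest absolute residual, since no explicit equation for it is given.

```python
import math


def error_criteria(measured, predicted):
    """Return RMSE, MEAE, MAAE and COE for paired measured/predicted values."""
    n = len(measured)
    residuals = [m - p for m, p in zip(measured, predicted)]
    mean_measured = sum(measured) / n
    sse = sum(e * e for e in residuals)                    # squared errors
    sst = sum((m - mean_measured) ** 2 for m in measured)  # spread of data
    return {
        "RMSE": math.sqrt(sse / n),
        "MEAE": sum(abs(e) for e in residuals) / n,
        "MAAE": max(abs(e) for e in residuals),  # assumed: largest |error|
        "COE": 1.0 - sse / sst,
    }


# Hypothetical values, chosen only so the result is easy to check by hand.
scores = error_criteria([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(scores)
```

For the toy values, a single error of 2 gives an RMSE of 1, an MEAE of 0.5, an MAAE of 2 and a COE of 0.2, which shows how one large miss affects each criterion differently.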
As shown in Tables 2 and 3, if the data set is divided into two subsets for validation and training, the R2 values may change dramatically. This indicates that choosing the proper regression equation depends strongly on the selected data set. The suggested ANN models are superior to regression approaches from this point of view, because the testing data are split from the whole data set and not used in developing the model, which is required for selecting the data set in an unbiased manner. So the established ANN model is less sensitive to the data set used and gives good predictions when different data are used as input to the system.

3.3 Determining the Importance of Predictors

In order to determine the influence of the variables, the input units are subjected to sensitivity analysis to rank their importance. When analyzing the inputs, input fields are considered as the units of analysis, rather than individual input neurons. So, the entire group of input neurons representing a given field is analyzed together. The sensitivity of an input field is calculated by varying the value of that input field for each record in the test set. As the value is varied, the maximum and minimum outputs are stored and the maximum difference in the outputs is calculated. This maximum difference is calculated for every record and then averaged. The values are varied over 0.0, 0.25, 0.5, 0.75, and 1.0, representing five equally spaced values covering the range of the original input field. This procedure can be presented for each predictor p and each input pattern m as:

$$d_{pm} = \max_{x_{p1},\, x_{p2} \in S_p} \left| Y_{p1}^{m} - Y_{p2}^{m} \right| \tag{13}$$

where $Y_{pk}^{m}$ is the predicted output vector (standardized if standardization of the output variable is used in training) using $(x_{1}^{m}, \ldots, x_{p-1}^{m}, x_{pk}, x_{p+1}^{m}, \ldots, x_{P}^{m})$ as its input, and $S_p = \{x_{p}^{\min}, x_{p}^{(1)}, x_{p}^{(2)}, x_{p}^{(3)}, x_{p}^{\max}\}$ for scale predictors. Then $d_{pm}$ is averaged over the M patterns as:

$$d_p = \frac{1}{M}\sum_{m=1}^{M} d_{pm} \tag{14}$$

and the $d_p$ values, normalized to sum to 1, are reported as the importance coefficients of the predictors.

Before determining the importance of the variables for prediction of the dam break Qp, the random number generator was fixed to a custom number. Afterwards, model 5 and model 2 were each run 10 times. In each case the independent variable importance coefficients are calculated and then averaged to obtain the final results. The purpose of the repetition is to reduce the bias caused by the arbitrary assignment of the initial weight values at the beginning of training. The importance ratios of H and V are 0.45 and 0.55, respectively. When the third parameter (L+W) is included in the ANN models, the results change to 0.47, 0.22 and 0.31 for H, V and (L+W), respectively.

Fig. 6. Scatter Diagrams of Three Input Models: (a) ANN Model 2, (b) Eq. (3).

4. Conclusions

In recent years, ANN techniques have made a considerable contribution to data modeling that includes uncertainty, as is the case in this research. In general, the ANN model better represents the relationship between variables and yields less relative error than regression analysis techniques. In this study, peak discharges due to embankment dam failure are evaluated by ANN, which has no restrictive assumptions such as normality, linearity, etc. In contrast to regression techniques, it is easy to use. This model outperforms the two most recent previous studies employed as benchmarks in terms of graphical representations and statistical error criteria, and it is suggested for use in the preliminary design stage of embankment dams by design engineers and researchers.

Different activation functions and different network structures were applied to the models. It is observed that varying the activation functions and structure has no significant influence on the prediction R2 values. The prediction capabilities of ANNs are found to be superior when compared with Eq. (3) and Eq. (4). The prediction accuracy of the three-input ANN models is higher than that of the two-input models. Therefore, if possible, it is recommended to use the third variable (L+W) in Qp prediction studies.
The interrelationships and contribution rates of the model input parameters are investigated by sensitivity analysis in ANNs. Dam height (H) is found to be the variable whose variation has the greatest influence on Qp. Among the three variables, the volume of water behind the dam crest is found to be the least influential parameter on the Qp values.

References

ASCE/EWRI Task Committee on Dam/Levee Breaching (2011). Earthen embankment breaching. J. Hydraul. Eng., Vol. 137, No. 12, pp. 1549-1564, DOI: 10.1061/(ASCE)HY.1943-7900.0000498.
Baker, S., Ponniah, D., and Smith, S. (1999). Survey of risk management in major U.K. companies. Journal of Professional Issues in Engineering Education and Practice, Vol. 125, No. 3, pp. 94-102.
Bourdes, V., Bonnevay, S., Lisboa, P., Defrance, R., Perol, D., Chabaud, S., Bachelot, T., Gargi, T., and Negrier, S. (2010). Comparison of artificial neural network with logistic regression as classification models for variable selection for prediction of breast cancer patient outcomes. Advances in Artificial Neural Systems, Vol. 2010, Article ID 309841, p. 10, DOI: 10.1155/2010/309841.
Cacuci, D. G., Ionescu-Bujor, M., and Navon, I. M. (2005). Sensitivity and uncertainty analysis: Applications to large-scale systems, CRC Press LLC, Boca Raton.
Costa, J. E. (1985). Floods from dam failures, U.S. Geological Survey Open-File Rep. No. 85-560, U.S. Geological Survey, Denver, Colo.
Cullen, A. C. and Frey, H. C. (1999). Probabilistic techniques in exposure assessment, Springer, New York.
Dorf, R. C. (1997). The electrical engineering handbook, CRC Press LLC, Boca Raton.
Embrechts, M. J., Arciniegas, F., Ozdemir, M., Breneman, C. M., Bennett, K., and Lockwood, L. (2001). Bagging neural network sensitivity analysis for feature reduction for in-silico drug design. 2001 INNS-IEEE International Joint Conference on Neural Networks, Vol. 4, IEEE Press, Washington, DC, p. 2478.
Erdik, T., Savci, M. E., and Sen, Z. (2009). Artificial neural networks for predicting maximum wave runup on rubble mound structures. Expert Systems with Applications, Vol. 36, No. 3, pp. 6403-6408.
Froehlich, D. C. (1995). Peak outflow from breached embankment dam. J. Water Resour. Plann. Manage., Vol. 121, No. 1, pp. 90-97.
Fraedrich, D. and Goldberg, A. (2000). A methodological framework for the validation of predictive simulations. European Journal of Operational Research, Vol. 124, No. 1, pp. 55-62.
Garson, G. D. (1991). Interpreting neural network connection weights. Artif. Intell. Expert., Vol. 6, No. 6, pp. 47-51.
Gupta, S. and Singh, V. (2012). Discussion of Enhanced predictions for peak outflow from breached embankment dams by Christopher I. Thornton, Michael W. Pierce, and Steven R. Abt. J. Hydrol. Eng., ASCE, Vol. 17, No. 3, pp. 463-466, DOI: 10.1061/(ASCE)HE.1943-5584.0000470.
Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction, Springer, New York.
Hodouin, D., Thibault, J., and Flament, F. (1991). Artificial neural networks: An emerging technique to model and control mineral processing plants. 120th Annual Meeting of the Society of Mining, Metallurgy and Exploration Inc.
Howes, P. and Crook, N. (1999). Using input parameter influences to support the decisions of feedforward neural networks. Neurocomputing, Vol. 24, Nos. 1-3, pp. 191-206, DOI: 10.1016/S0925-2312(98)00102-7.
Huisman, P., Cramer, W., and Van, E. G. (1998). Water in the Netherlands, Netherlands Hydrological Society, Delft, Netherlands.
IBM SPSS Neural Networks 19 (2010). Copyright SPSS Inc, 1989, 2010.
Kemp, J. S., Zaradic, P., and Hansen, F. (2007). An approach for determining relative input parameter importance and significance in artificial neural networks. Ecol. Model., Vol. 204, No. 1, pp. 326-334.
Kirkpatrick, G. W. (1977). Evaluation guidelines for spillway adequacy. Proc., Engineering Foundation Conf., ASCE, Reston, Va., pp. 395-414.
Kleijnen, J. P. C. (1995). Verification and validation of simulation models. European Journal of Operational Research, Vol. 82, No. 1, pp. 145-162.
Marques de Sa, J. P. (2007). Applied statistics using SPSS, STATISTICA, MATLAB and R, Springer, Berlin Heidelberg New York.
Mohammadi, K., Eslami, H. R., and Dardashti, Sh. D. (2005). Comparison of regression, ARIMA and ANN models for reservoir inflow forecasting using snowmelt equivalent (a case study of Karaj). J. Agric. Sci. Technol., Vol. 7, No. 3, pp. 17-30.
Olden, J. D. and Jackson, D. A. (2002). Illuminating the black box: A randomization approach for understanding variable contributions in artificial neural networks. Ecol. Model., Vol. 154, No. 2, pp. 135-150.
Olden, J. D., Joy, M. K., and Death, R. G. (2004). An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol. Model., Vol. 178, No. 1, pp. 389-397.
Pan, J. Z. (2000). Merits of dams, Tsinghua University Press, Beijing (in Chinese).
Poh, H., Yao, J., and Jašić, T. (1998). Neural networks for the analysis and forecasting of advertising and promotion impact. International Journal of Intelligent Systems in Accounting, Finance & Management, Vol. 7, No. 4, pp. 253-268.
Singh, V. P. (1996). Dam breach modeling technology, Kluwer Academic, Dordrecht, Netherlands.
Singh, V. P. and Scarlatos, P. D. (1988). Analysis of gradual earth-dam failure. J. Hydraul. Eng., Vol. 114, No. 1, pp. 21-42.
Singh, K. P. and Snorrason, A. (1982). Sensitivity of outflow peaks and flood stages to the selection of dam breach parameters and simulation models, State Water Survey (SWS) Contract Rep. No. 288, Illinois Dept. of Energy and Natural Resources, SWS Div., Surface Water at the Univ. of Illinois.
Singh, K. P. and Snorrason, A. (1984). Sensitivity of outflow peaks and flood stages to the selection of dam breach parameters and simulation models. J. Hydrol., Vol. 68, Nos. 1-4, pp. 295-310.
Soil Conservation Service (SCS) (1981). Simplified dam-breach routing procedure, Technical Release Rep. No. 66 (Rev. 1).
Tchaban, T., Taylor, M. J., and Griffin, A. (1998). Establishing impacts of the inputs in a feedforward network. Neural Comput. Appl., Vol. 7, pp. 309-317, DOI: 10.1007/BF01428122.
Thornton, C. I., Pierce, M. W., and Abt, S. R. (2011). Enhanced predictions for peak outflow from breached embankment dams. Technical paper. Journal of Hydrologic Engineering, Vol. 16, No. 1, p. 81, DOI: 10.1061/(ASCE)HE.1943-5584.0000288.
Tronto, I. F. B., Da Silva, J. D. S., and Sant'Anna, N. (2006). Comparison of artificial neural network and regression models in software effort estimation, NPE ePrint: sid.inpe.br/ePrint@80/2006/12.08.12.47 v1 2006-12-09.
U.S. Bureau of Reclamation (1982). Guidelines for defining inundated areas downstream from Bureau of Reclamation dams, Reclamation Planning Instruction Rep. No. 82-11.
Wahl, T. L. (1998). Prediction of embankment dam breach parameters: A literature review and needs assessment, Dam Safety Rep. No. DSO-98-004, Bureau of Reclamation, U.S. Dept. of the Interior, Denver.
Wahl, T. L. (2004). Uncertainty of predictions of embankment dam breach parameters. J. Hydraul. Eng., Vol. 130, No. 5, pp. 389-397.
Wang, C. (1993). Sense and nonsense of statistical inference, controversy, misuse and subtlety, Marcel Dekker, Inc.
Yoon, Y., Guimaraes, T., and Swales, G. (1994). Integrating artificial neural networks with rule-based expert systems. Decis. Support Syst., Vol. 11, Issue 5, pp. 497-507, DOI: 10.1016/0167-9236(94)90021-3.
