Research Article Time Series Outlier Detection Based on ...

15
Research Article Time Series Outlier Detection Based on Sliding Window Prediction Yufeng Yu, Yuelong Zhu, Shijin Li, and Dingsheng Wan College of Computer & Information, Hohai University, Nanjing 210098, China Correspondence should be addressed to Yufeng Yu; [email protected] Received 18 July 2014; Accepted 15 September 2014; Published 30 October 2014 Academic Editor: Jun Jiang Copyright ยฉ 2014 Yufeng Yu et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In order to detect outliers in hydrological time series data for improving data quality and decision-making quality related to design, operation, and management of water resources, this research develops a time series outlier detection method for hydrologic data that can be used to identify data that deviate from historical patterns. e method ๏ฌrst built a forecasting model on the history data and then used it to predict future values. Anomalies are assumed to take place if the observed values fall outside a given prediction con๏ฌdence interval (PCI ), which can be calculated by the predicted value and con๏ฌdence coe๏ฌƒcient. e use of PCI as threshold is mainly on the fact that it considers the uncertainty in the data series parameters in the forecasting model to address the suitable threshold selection problem. e method performs fast, incremental evaluation of data as it becomes available, scales to large quantities of data, and requires no preclassi๏ฌcation of anomalies. Experiments with di๏ฌ€erent hydrologic real-world time series showed that the proposed methods are fast and correctly identify abnormal data and can be used for hydrologic time series analysis. 1. Introduction As the fundamental resources for water resources manage- ment and planning, long-term hydrological data are sets of discrete record values of hydrological elements that are collected with time and have been frequently analyzed in the ๏ฌeld such as ๏ฌ‚ood and drought control, water resources management, and water environment protection. With the development of data acquisition technology and data trans- mission technology, hydrological departments collected ever- increasing amounts of time series data from automatic mon- itoring systems via loggers and telemetry systems. Within these datasets, hydrologic time series analysis becomes work- able and credible for building mathematical model to gen- erate synthetic hydrologic records, to forecast hydrologic events, to detect trends and shi๏ฌ…s in hydrologic records, and to ๏ฌll in missing data and extend records [1]. However, a hydrologic time series is generally composed of a stochastic component superimposed on a deterministic component and usually shows stochastic, fuzzy, nonlinear, nonstationary, and multitemporal scale characteristics [2]. So, it is a challenging task to process and interpret the original hydrological time series due to the following: (i) the large volumes of data, (ii) the parameter pattern being speci๏ฌc and changing to di๏ฌ€erent hydrology acquisition system due to multi- temporal scale characteristic, (iii) abnormal events or disturbances that create spurious e๏ฌ€ects in the data series and result in unexpected pat- terns, (iv) inaccuracies in hydrological models due to imprecise and outdated information, logger and communica- tions failures, poor calibration, and lack of system feedback. Consequences of such situations in hydrological infor- mation systems may result in the DRQP (data rich, but quality poor) phenomenon. Consequently, the original mon- itoring data (i.e., precipitation, discharge, and water levels) should undergo a preprocessing step to eliminate the nega- tive in๏ฌ‚uence caused by incorrect or abnormal data due to Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2014, Article ID 879736, 14 pages http://dx.doi.org/10.1155/2014/879736

Transcript of Research Article Time Series Outlier Detection Based on ...

Page 1: Research Article Time Series Outlier Detection Based on ...

Research ArticleTime Series Outlier Detection Based onSliding Window Prediction

Yufeng Yu, Yuelong Zhu, Shijin Li, and Dingsheng Wan

College of Computer & Information, Hohai University, Nanjing 210098, China

Correspondence should be addressed to Yufeng Yu; [email protected]

Received 18 July 2014; Accepted 15 September 2014; Published 30 October 2014

Academic Editor: Jun Jiang

Copyright ยฉ 2014 Yufeng Yu et al. This is an open access article distributed under the Creative Commons Attribution License,which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In order to detect outliers in hydrological time series data for improving data quality and decision-making quality related to design,operation, and management of water resources, this research develops a time series outlier detection method for hydrologic datathat can be used to identify data that deviate from historical patterns. The method first built a forecasting model on the historydata and then used it to predict future values. Anomalies are assumed to take place if the observed values fall outside a givenprediction confidence interval (PCI), which can be calculated by the predicted value and confidence coefficient. The use of PCI asthreshold is mainly on the fact that it considers the uncertainty in the data series parameters in the forecasting model to addressthe suitable threshold selection problem. The method performs fast, incremental evaluation of data as it becomes available, scalesto large quantities of data, and requires no preclassification of anomalies. Experiments with different hydrologic real-world timeseries showed that the proposed methods are fast and correctly identify abnormal data and can be used for hydrologic time seriesanalysis.

1. Introduction

As the fundamental resources for water resources manage-ment and planning, long-term hydrological data are setsof discrete record values of hydrological elements that arecollected with time and have been frequently analyzed inthe field such as flood and drought control, water resourcesmanagement, and water environment protection. With thedevelopment of data acquisition technology and data trans-mission technology, hydrological departments collected ever-increasing amounts of time series data from automatic mon-itoring systems via loggers and telemetry systems. Withinthese datasets, hydrologic time series analysis becomes work-able and credible for building mathematical model to gen-erate synthetic hydrologic records, to forecast hydrologicevents, to detect trends and shifts in hydrologic records, andto fill in missing data and extend records [1]. However, ahydrologic time series is generally composed of a stochasticcomponent superimposed on a deterministic component andusually shows stochastic, fuzzy, nonlinear, nonstationary, andmultitemporal scale characteristics [2]. So, it is a challenging

task to process and interpret the original hydrological timeseries due to the following:

(i) the large volumes of data,(ii) the parameter pattern being specific and changing to

different hydrology acquisition system due to multi-temporal scale characteristic,

(iii) abnormal events or disturbances that create spuriouseffects in the data series and result in unexpected pat-terns,

(iv) inaccuracies in hydrological models due to impreciseand outdated information, logger and communica-tions failures, poor calibration, and lack of systemfeedback.

Consequences of such situations in hydrological infor-mation systems may result in the DRQP (data rich, butquality poor) phenomenon. Consequently, the original mon-itoring data (i.e., precipitation, discharge, and water levels)should undergo a preprocessing step to eliminate the nega-tive influence caused by incorrect or abnormal data due to

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2014, Article ID 879736, 14 pageshttp://dx.doi.org/10.1155/2014/879736

Page 2: Research Article Time Series Outlier Detection Based on ...

2 Mathematical Problems in Engineering

instrumentation faults, data inherent change, operation error,or other possible influencing factors [3]. Therefore, outlierdetection usually becomes a vital step for hydrologic timeseries analysis based on themonitoring data. Regarding smallmonitoring datasets, data managers can detect and deal withoutliers directly with a simple graphical or manual process.However, for massive datasets or data stream, an automaticand objective technique for effectively detecting and treatingoutliers is necessary.

This study develops a real-time outlier detection methodthat employs a window-based forecasting model for hydro-logic time series collected from automatic monitoring sys-tems.Themethod builds a forecastingmodel from a sequenceof historical point values with a given window to predictfuture values. If the observed value differs from the predictedvalue beyond a certain threshold, an outlier would be indi-cated. The method uses prediction confidence interval (๐‘ƒ๐ถ๐ผ)as threshold in consideration of uncertainty in the data seriesparameters in the forecasting model. Data are classified asanomalous/nonanomalous based on whether or not they falloutside a given ๐‘ƒ๐ถ๐ผ. Thus, the method provides a principledframework for selecting a threshold. This method does notrequire any preclassified examples of data, scales well to largevolumes of data, and allows for fast incremental evaluation ofdata as it becomes available.

In order to evaluate the proposed method, it was appliedto two different hydrological variables, water level and dailyflow, from Huayuankou (hereafter ๐ป๐‘Œ๐พ, 34.76โˆ˜N, 113.58โˆ˜E)and Lanzhou (hereafter ๐ฟ๐‘, 36.04โˆ˜N, 103.49โˆ˜E) stationsobtained from national hydrology database of MWR, China.The results show that the proposedmethod can exactly detectthe outliers in the hydrological time series with near negligi-ble false positive rate. Furthermore, the algorithmโ€™s efficiencyis analyzed based on the detection results.

The rest of the paper is organized as follows. In the nextsection (Section 2) we present the related work to this area ofresearch. In Section 3 we present details about the proposedalgorithm for outlier detection in time series based on predic-tion confidence interval. A number of experiments with theproposed method using real-world hydrological time seriesare reported in Section 4. Finally, Section 5 gives conclusionsand suggestions for further research.

2. Related Work

2.1. Time Series Analysis. A time series (๐‘‡๐‘†) ๐‘‹ = {๐‘ฅ(๐‘ก) | 1 โฉฝ

๐‘ก โฉฝ ๐‘š} is a sequence of ๐‘‘-dimensional observations vector๐‘ฅ(๐‘ก) = (๐‘ฅ

1(๐‘ก), ๐‘ฅ2(๐‘ก), . . . ๐‘ฅ

๐‘‘(๐‘ก)) ordered in time. Mostly these

observations are collected at equally spaced, discrete timeintervals. It is called a univariate (or single) time series when๐‘‘ is equal to 1 and amultivariate time series when ๐‘‘ is equal toor greater than 2 [4]. Generally, a time series can be regardedas a sample realization from an infinite population of suchtime series generated by a stochastic process, which can bestationary or nonstationary. In addition, the understandingof the structure and dependence of time series is achievedthrough time series analysis.

Time series analysis is the investigation of a temporallydistributed sequence of data or the synthesis of a modelfor prediction wherein time is an independent variable; asa consequence, the information obtained from time seriesanalysis can be applied to forecasting, process control, outlierdetection, and other applications [5]. A basic assumption intime series analysis is that some aspects of the past patternwill continue to remain in the future. Also under this setup,often the time series process is assumed to be based on pastvalues of the main variable but not on explanatory variableswhich may affect the variable/system.

In hydrology, time series analysis is one of frontier scien-tific issues because it can detect and describe quantitativelyeach of the hydrologic processes underlying a given sequenceof observations.Moreover, hydrologic time series analysis canalso be used for building mathematical models to generatesynthetic hydrologic records, to forecast hydrologic events, todetect trends and shifts in hydrologic records, and to fill inmissing data and extend records. Consequently, time seriesanalysis has become a vital tool in hydrological sciencesand its importance has been dramatically enhanced in therecent past due to ever-increasing interest in the scientificunderstanding of climate change [6].

In the time series analysis, it is assumed that the data(observations) consist of a systematic pattern and stochasticcomponent; the former is deterministic in nature, whereasthe latter accounts for the random error and usually makesthe pattern difficult to be identified. Previous research usuallyequates stochastic component to system error and thensimply discards it so as to not complicate the statistical anal-yses. However, the stochastic component potentially includesinteresting and meaningful information; it must be treatedwith caution. It is for this reason that outlier detectionbecomes a hotspot research issue in recent years.

2.2. Outlier Detection. An outlier can be defined as โ€œobser-vation that deviates so much from other observations as toarouse suspicion that it was generated by a different mech-anismโ€ [7], or โ€œpatterns in data that do not conform to awell-defined notion of normal behaviorโ€ [8]. Generally, anoutlier may be incorrect and/or the circumstances aroundthe measurement may have changed over time. Thus, theidentification and treatment of outliers constitute an impor-tant component of the time series analysis before modelingbecause outliers can have negative impacts on the selectionof the appropriate model as well as on the estimation of theassociated parameters.

Outlier detection, also known as anomaly detection insome literatures, is an important long-standing researchproblem in the domains of data mining and statistics. Themajor objective of outlier detection is to identify data objectsthat are markedly different from, or inconsistent with, theremaining set of data [9, 10]. In recent decades, this researchproblem is attracting significant research attentions in thefields of statistical analysis, machine learning, and artificialintelligence due to its important applications in a wide rangeof areas in business, security, insurance, health care, andengineering, to name a few. Chandola et al. [8] provide a com-prehensive classification of the outlier detection techniques

Page 3: Research Article Time Series Outlier Detection Based on ...

Mathematical Problems in Engineering 3

and classify the techniques into six categories: classification-based [11], nearest-neighbor-based [12], clustering-based [13],statistical [14], information theory-based [15], and spectraltheory-based [16]. In addition to the abovementioned algo-rithms, other approaches have been adopted to solve theoutlier detection problem. For instance, signal processingtechniques such as wavelet transform [17] and Fouriertransform [18] have been used to detect outlier regions inmeteorological data.

Outlier detection is a very broad field and has been stud-ied in the context of a large number of application domainswhere many detection methods have been proposed accord-ing to the different data characteristics. Recently, there hasbeen significant interest in detecting outliers in time series.Generally, methods for time series outlier detection shouldconsider the sequence nature of data and operate either ona single time series or on a time series database. The goalof outlier detection on a single time series is to find ananomalous subregion, while the goal of the latter is to identifya few sequences as outliers or to identify a subsequence in atest sequence as an outlier. In some cases, a single time seriesis converted to a time series database through the use of asliding window [19].

Given a single time series, one can find particular ele-ments (or time points) within the time series as outliers orfind subsequence outliers. Fox defines two types of outliers(type I/additive and type II/innovative) based on the dataassociated with an individual object across time, ignoringthe community aspect completely [20]. Since then, a largenumber of ๐‘ก prediction models and profile based modelshave been proposed to find point outlier within a timeseries. A straightforward method for outlier detection intime series is based on forecasting [21โ€“23]. This approachfirst builds a prediction model from the historical valuesand is then used to predict future values. If a predictedvalue differs from the observed value beyond a certainthreshold, an outlier would be indicated. The definition ofthe value of the threshold to be used for detecting outliers isthe main problem of outlier detection based on prediction.The difficulties of forecasting-based outlier detection havemotivated the proposal of anomaly subsequences detectiontechniques, based on outlier score calculated with respect tosimilar measure between subsequences [24โ€“27]. Keogh et al.[24] proposed using one nearest-neighbor approach to detectmaximal different subsequences within a longer sequence(called discords). Subsequence comparisons can be smartlyordered for effective pruning using various methods likeheuristic reordering of candidate subsequences [25], localitysensitive hashing [26], Haar wavelet [27], and SAX withaugmented tries [28].

The problem of outlier detection in time series databasemainly focuses on how to find all anomalous time series. Itis assumed that most of the time series in the database arenormal while a few are anomalous. Similar to the traditionaloutlier detection, the usual recipe of solving such problems isto first learn a model based on all the time series sequencesin the database and then compute an outlier score for eachsequence with respect to the model [29]. Statistics-basedmethods are still conducted to detect outliers in time series

database because of their efficient structure. In 2012, Zhangdeveloped an average-based methodology based on timeseries analysis and geostatistics, which achieved satisfactorydetection results in short snapshots [30]. However, it willfail if the outliers gather together closely in the same shorttime slot. Hence, recent researches have mainly focused onnonparametric outlier detection methods such as Bayesianmethod and discrete wavelet transform (DWT). Frieda pro-posed a Bayesian approach to outlier modeling, approximat-ing the posterior distribution of the model parameters byapplication of a componentwise Metropolis-Hastings algo-rithm [31]. Apart from the Bayesian method, Grane andVeiga [32] identified the outliers as those observations in theoriginal series whose detail coefficients are greater than acertain threshold based on wavelet transform. They iteratedthe process of DWT and outlier correction until all detailcoefficients are lower than the threshold. Based on real-worldfinancial time series, their method achieved a lower averagenumber of false outliers than Bilen and Huzurbazarโ€™s [33].However, in these works, thresholds are mainly set subjec-tively, which makes these methods inefficient and insensitivewhen there are several different kinds of outliers appearing inthe same time series.

2.3. Outlier Detection in Hydrological Time Series. Hydro-logic systems involving outliers invariably represent complexdynamical systems. The current state and future evolutionsof such dynamical systems depend on countless propertiesand interactions involving numerous highly variable physicalelements. The representation of such dynamical systems intheir corresponding models is complicated because certainrelationships can only be developed through analyses.

Outlier detection in hydrologic data is a common prob-lem which has received considerable attention in the uni-variate framework. In the multivariate setting, the problemis well established in statistics. However, in the hydrologicfield, the concepts are much less established. A pioneeringwork in this directionwas recently presented by Chebana andOuarda [34]. Moreover, many outlier detection techniques,such as Chauvenetโ€™s method, Dixon-Thompson outlier test,and Rosnerโ€™s test [35], are statistics based on the principleof hypothesis testing with the underlying assumption oflog-Pearson type III (LP3) distribution [36], which maynot always be readily available for hydrological time series.Hyndman and Shangโ€™s [37] methods, on the basis of real data,are graphical and consist first in visualizing functional datathrough the rainbow plot and then in identifying functionalhydrological outliers using the functional bag-plot and thefunctional highest-density region box-plot. The methods candetect outlier curves that may lie outside the range of themajority of the data ormay be within the range of the data buthave a very different shape.However, as indicated byChebanaand Ouarda [34], the points outside the fence of the bag-plotor box-plot are considered as extremes rather than outliers.On this basis, Chebana et al. proposed a nongraphical outlierdetection method based on the bivariate score points, whichwere obtained from the first two principal componentsscore vectors generated by functional principal componentanalysis, to detect outliers in flood frequency analysis [38].

Page 4: Research Article Time Series Outlier Detection Based on ...

4 Mathematical Problems in Engineering

Ng et al. [39] found that chaotic approach can determine thelevel of complexity of a system after they applied the chaoticanalytical techniques to daily hydrologic series comprisingof outlier. It adopts box-plot [40] as the outlier detectiontools based on its (1) statistical popularity, (2) ability tohandle and detect multiple outliers simultaneously withoutpreceding to an iterative detection, and (3) ability to detectoutliers by block detection where no masking effects areinvolved before conducting the chaotic analysis. The methodcan effectively detect numerous outliers in the original dailydischarge dataset but may understate the actual number ofoutliers because the data is usually clustered around the zeromean and consequently results in less skewness in the data.

Although many outlier detection methods exist in theliterature, there is a lack of discussion on the selection ofa proper detection method for hydrological outliers. It ismainly because of the fact that most of outlier detectionmethods belong to statistical approaches and demanded thatthe data must follow some distributions, and the selection ofa suitable outlier detection method is critically determinedby the intent of analyst and the intended use of the results.Analysts have to consider several technical aspects in itsdecisions-making such as the tradeoff between accurate andefficient, the evaluation of consequences subject (i.e.,maskingand swamping), the design assumptions and the limitationof different methods, and the preference on parametric ornonparametric approach.Without a thorough understandingof outlier phenomena, it is difficult to determine a suitableoutlier detection method.

Faced with such challenges, this work proposes a newmethod to detect outliers that splits given historical hydro-logical time series into subsequences by a sliding-windowand then an autoregressive (๐ด๐‘…) prediction model of timeseries and prediction confidence interval (๐‘ƒ๐ถ๐ผ) calculatedfrom nearest-neighbor historical data to identify time seriesanomalies. The method used an autoregressive predictionmodel, which belongs to data-driven time seriesmodel essen-tially, rather than a physics-based time series model, due tothe fact that it is simpler to develop and can rapidly produceaccurate short forecast horizon predictions. Data are classi-fied as anomalous/nonanomalous based on whether or notthey fall outside a given ๐‘ƒ๐ถ๐ผ. The ๐‘ƒ๐ถ๐ผ which is employed inthis study not only accounts for the fact that the correlationsbetween adjacent data points in the time series are higherthan those farther away but also avoids falling into ineffi-cient and insensitive dilemma caused by a subjective-settingthreshold. Moreover, the ๐‘ƒ๐ถ๐ผ can be calculated dynamicallyaccording to different nearest-neighbor windows size andconfidence coefficient of difference users, which make it suit-able for different variables of hydrologic time series outlierdetection for different userโ€™s demand. In conclusion, thismethod does not require any preclassified examples of data,scales well to large volumes of data, and allows for fast incre-mental evaluation of data as it becomes available.

3. Window-Based Outlier Detection

In this section, we formulate the outlier detection problemand give a formal definition of some concepts which are

used in the proposed algorithm. And then we will introducethe algorithm detecting outliers in time series based on thesliding-window prediction model. In addition, we also men-tion efficient strategies to choose the optimal parameters tomeet the usersโ€™ requirements. Next is the formal formulationof the contextual outlier detection algorithm.

3.1. Problem Definition. Hydrology is a time-varying phe-nomenon, the change of which is referred to hydrologicalprocesses. As important scientific data resources, hydrologi-cal data are the discrete records of hydrological processes andcould be divided into flow, water level, rainfall, evaporation,and other hydrologic time series according to the physicalquantities of its representation.

Definition 1 (hydrological time series). A hydrological timeseries ๐‘‡ is a set of real-valued data in successive order,occurring uniform time interval. In this work, ๐‘‡ =

โŸจ๐‘‘1= (V1, ๐‘ก1), ๐‘‘2= (V2, ๐‘ก2), . . . ๐‘‘

๐‘š= (V๐‘š, ๐‘ก๐‘š)โŸฉ, where๐‘š is the

length of the time series and point ๐‘‘๐‘–

= (V๐‘–, ๐‘ก๐‘–) stands for

the observation V๐‘–at the moment ๐‘ก

๐‘–; moreover, ๐‘ก

๐‘–is strictly

increased.

For outlier detection purposes, we are typically not inter-ested in any of the global properties of a time series; rather, weare interested in local subsections of the time series, which arecalled subsequences.

Definition 2 (subsequence). Given a time series๐‘‡ of length๐‘š,a subsequence ๐ถ of ๐‘‡ is a sampling of length ๐‘› โ‰ค ๐‘š of con-tiguous position from ๐‘; that is, ๐ถ = โŸจ๐‘‘

๐‘= (V๐‘, ๐‘ก๐‘), ๐‘‘๐‘+1

=

(V๐‘+1

, ๐‘ก๐‘+1

), . . . ๐‘‘๐‘+๐‘›โˆ’1

= (V๐‘+๐‘›โˆ’1

, ๐‘ก๐‘+๐‘›โˆ’1

)โŸฉ for 1 โ‰ค ๐‘ โ‰ค ๐‘š โˆ’

๐‘› + 1. Particularly, subsequence ๐ถmay contain only one datapoint on the condition that ๐‘› = ๐‘š.

Since all subsequences may potentially be abnormal, anyalgorithm will eventually have to extract all of them; this canbe achieved by use of a sliding window.

Definition 3 (sliding window). Given a time series๐‘‡ of length๐‘š and a user-defined subsequence length of ๐‘›, all possiblesubsequences can be extracted by sliding window of size ๐‘›

across ๐‘‡ and considering each subsequence ๐ถ๐‘.

Generally, the first problem of time series outlier detec-tion is to define what kind of data in a given dataset is abnor-mal. That is, the definition of outlier determines the outlierdetectionsโ€™ goals. In hydrologic time series, time sequenceswhich are composed of different physical quantities showgreat difference anomaly characteristics; therefore, it is diffi-cult to give a uniform definition of abnormality. In this paper,we identify a subsequence to be outlier based on its nearest-neighbor.

Definition 4 (๐‘˜-nearest-neighbor). Given a time series ๐‘‡ oflength ๐‘š and a data point ๐‘‘

๐‘–(๐‘– < ๐‘š), the ๐‘˜-nearest-neighbor

๐œ‚(๐‘˜)

๐‘–

of ๐‘‘๐‘–is a sampling of length 2๐‘˜ of contiguous position

Page 5: Research Article Time Series Outlier Detection Based on ...

Mathematical Problems in Engineering 5

from ๐‘– โˆ’ ๐‘˜ to ๐‘– + ๐‘˜ and does not contain ๐‘–; that is, ๐œ‚(๐‘˜)๐‘–

=

{๐‘‘๐‘–โˆ’๐‘˜

, . . . ๐‘‘๐‘–โˆ’1

, ๐‘‘๐‘–+1

, . . . ๐‘‘๐‘–+๐‘˜

} for ๐‘– + 1 โ‰ค ๐‘˜ โ‰ค ๐‘š โˆ’ ๐‘–.

Definition 5 (time series outlier). Given a time series ๐‘‡ oflength ๐‘š and the ๐‘˜-nearest-neighbor ๐œ‚

(๐‘˜)

๐‘–

of ๐‘‘๐‘–, ๐‘‘๐‘–will be

identified as an outlier if the observed value of ๐‘‘๐‘–falls outside

a ๐‘ƒ๐ถ๐ผ calculated by confidence coefficient ๐‘ and predictedvalue according to ๐œ‚

(๐‘˜)

๐‘–

.

From the above definition, one can see that the nearest-neighborswindow size ๐‘˜ and confidence coefficient๐‘ becomekey parameters of outlier detection.Therefore, it can dynam-ically adjust ๐‘˜ and ๐‘ for different users and different hydro-logical elements to achieve the optimal detection result.

3.2. Algorithm Description. After studying the current sit-uation and challenge of hydrological time series and itsoutliers, this study proposes a new outlier detection methodthat uses a sliding window of hydrological time series๐‘‡๐‘š

= โŸจ๐‘‘1= (V1, ๐‘ก1), ๐‘‘2= (V2, ๐‘ก2), . . . ๐‘‘

๐‘š= (V๐‘š, ๐‘ก๐‘š)โŸฉ, where

point ๐‘‘๐‘–

= (V๐‘–, ๐‘ก๐‘–) stands for the measurement V

๐‘–at the

moment ๐‘ก๐‘–, to classify a particular data point as anomalous

or not. A data point ๐‘‘๐‘–is classified as anomalous if its mea-

surement deviates significantly from its ๐‘˜-nearest-neighbor(๐พ๐‘๐‘) prediction value calculated using its neighboringpoint set ๐œ‚(๐‘˜)

๐‘–

as input. Upon initialization, themethod fills thewindowwith the most recent measurements and commencesclassification with the next measurement taken by the timeseries.

In brief, the method consists of the following steps be-ginning at time ๐‘– within a given hydrological time series ๐‘‡๐‘–.

Step 1. Define ๐‘˜-nearest-neighbor window ๐œ‚(๐‘˜)

๐‘–

for the point๐‘‘๐‘–, based on the data source, from where outliers are to be

detected (history data including complete sequence or onlypast data to forecast new incomings); the ๐œ‚

(๐‘˜)

๐‘–

can be dividedinto one-sided-windows and two-sided-windows types.

Step 2. Build a nearest-neighbor-window prediction modelthat takes ๐œ‚(๐‘˜)

๐‘–

as input to predict V๐‘–+1

, the expected value of thepoint ๐‘‘

๐‘–+1; in addition, calculate the upper and lower bounds

of the range within which the measurement should lie (i.e.,the prediction confidence interval, ๐‘ƒ๐ถ๐ผ).

Step 3. Compare the actual measurement at time ๐‘– + 1 withthe range calculated in Step 2 and classify it as anomalous if itfalls outside the range; otherwise, classify it as nonanomalous.

Step 4. Modify ๐‘‡๐‘– by removing ๐‘‘

๐‘–โˆ’๐‘˜+1from the back of the

window and adding ๐‘‘๐‘–+1

which holds the value V๐‘–+1

to thefront of the window to create ๐‘‡

๐‘–+1; slide one step and modifythe ๐œ‚(๐‘˜)

๐‘–

to create ๐œ‚(๐‘˜)

๐‘–+1

if the measurement is classified asanomalous; else, modify ๐‘‡

๐‘– by removing ๐‘‘๐‘–โˆ’๐‘˜+1

from the backof the window and adding ๐‘‘

๐‘–+1which takes its original value

to the front of the window to create ๐‘‡๐‘–+1; slide one step and

modify the ๐œ‚(๐‘˜)

๐‘–

to create ๐œ‚(๐‘˜)

๐‘–+1

.

Step 5. Repeat Steps 1โ€“4, until all sequence has been detected.

d1, d2, d3, . . . diโˆ’k+1 . . . di, . . . dm}

diโˆ’2k+1, . . . diโˆ’1}

Time series dataset

and correct it

Yes

End

No

Tm

= {

Mark di as outliers

๏ฟฝi

window ๐œ‚i

(k)= {diโˆ’2k,

Predict ๏ฟฝi

Define di โ€™s k-nearest-neighbor

Compare ๏ฟฝi with PCI

extends PCI?

Calculate PCI

Figure 1: Diagram of the proposed outlier detection method.

The detection process of data point ๐‘‘๐‘–is illustrated with a

flow chart in Figure 1.The remainder of this section describesthese steps in detail.

3.2.1. Window Definition. The first step of this outlier detec-tion process, the ๐พ๐‘๐‘ window of the test point ๐‘‘

๐‘–in time

series data, is defined to illustrate the relations between thedata point and its nearest-neighbor. And then, the predictionmodel can use only the test pointโ€™s ๐พ๐‘๐‘ window to predictthe measurement of ๐‘‘

๐‘–for the purpose of simplifying the

computational complexity.Generally, there are two types of hydrological data from

where outliers are to be detected: history time series dataor real-time data. The difference between them is primarilybased on the fact that the former uses the previous andsubsequent neighbor window as input parameters to detectthe outlier while the latter only uses the previous neighborwindow as input parameters.Then, the neighbor window canbe divided into one-sided and two-sided types.

(1) Two-Sided Neighbor Windows. The two-sided-windowsoutlier detection method uses a data pointsโ€™ previous (left)and subsequent (right) neighborโ€™s data point to determinewhether this data point is outlier or not. Given a water

Page 6: Research Article Time Series Outlier Detection Based on ...

6 Mathematical Problems in Engineering

level time series ๐‘‡๐‘š

= โŸจ๐‘‘1= (V1, ๐‘ก1), ๐‘‘2= (V2, ๐‘ก2), . . . ๐‘‘

๐‘š=

(V๐‘š, ๐‘ก๐‘š)โŸฉ, the two-sided-windows neighboring point sets ๐œ‚(๐‘˜)

๐‘–

for the point ๐‘‘๐‘–can be defined as follows:

๐œ‚(๐‘˜)

๐‘–

= {๐‘‘๐‘–โˆ’๐‘˜

, . . . ๐‘‘๐‘–โˆ’1

, ๐‘‘๐‘–+1

, . . . ๐‘‘๐‘–+๐‘˜

} . (1)

Note that 2๐‘˜ is the size of the neighborhood window,starting at ๐‘– โˆ’ ๐‘˜ and ending at ๐‘– + ๐‘˜ (not including ๐‘–).

(2) One-Sided NeighborWindows. As the right neighborsmaycontain outliers that had not been detected, therefore, in thereal-time detection application, only left neighbor can beavailable. So, one-sided-windows outlier detection methodonly chooses left neighbors which removed (or revised)the outliers had been detected will make the detectionresults more meaningful. Other than the two-sided-windowsmethod, one-sided-windows method outlier detection algo-rithms define neighborhood point set ๐œ‚(๐‘˜)

๐‘–

for the point ๐‘‘๐‘–, as

follows:

๐œ‚(๐‘˜)

๐‘–

= {๐‘‘๐‘–โˆ’2๐‘˜

, ๐‘‘๐‘–โˆ’2๐‘˜+1

, . . . ๐‘‘๐‘–โˆ’1

} . (2)

Here, 2๐‘˜ is the size of the neighborhood window for thepoint ๐‘‘

๐‘–, starting at ๐‘– โˆ’ 2๐‘˜ and ending at ๐‘– โˆ’ 1.

3.2.2. Model Selection. In this step, an autoregressive (๐ด๐‘…)prediction model is built to forecast the particular pointโ€™svalue using its neighborhood point sets as input. The ๐ด๐‘…

model forecasts future measurements in time series datasetsusing only a specified set of observations, that is, ๐œ‚(๐‘˜)

๐‘–

, fromthe same discharge site; they are used because they avoidcomplications caused by different sampling frequencies thatcan arise if a heterogeneous set of time series data was used;moreover, the use of ๐ด๐‘… model reduces the number of pre-dictions that cannot be made due to insufficient data causedby the embedded telemetry equipment that went offline.

The neighborhood point sets ๐œ‚(๐‘˜)

๐‘–

are used as input tothe ๐ด๐‘… model of the time series to predict the followingobservation:

๐‘‘๐‘–= ๐‘€(๐œ‚

(๐‘˜)

๐‘–

) , (3)

where ๐‘€( ) is the model. This method assumes that thebehavior of the processes at time ๐‘ก + 1 can be described bya finite set of ๐‘˜ previous measurements; thus it implicitlyassumes that the time series is an order ๐‘˜ Markov process.Literature [22] compares native, nearest cluster (๐‘๐ถ), andsingle-layer linear network (๐ฟ๐‘) and multilayer perceptron(๐‘€๐ฟ๐‘ƒ) on different dataset and concludes that ๐ฟ๐‘ and ๐‘€๐ฟ๐‘ƒ

would obtain better detection result than othermodels. Basedon this work, we use ๐ฟ๐‘ as predictionmodel and assume thatobservation V

๐‘–is a linear combination of two-sided neighbors

windows data point:

V๐‘–=

(โˆ‘๐‘˜

๐‘—=1

๐‘ค๐‘–โˆ’๐‘—V๐‘–โˆ’๐‘—

+ โˆ‘๐‘˜

๐‘—=1

๐‘ค๐‘–+๐‘—V๐‘–+๐‘—

)

(โˆ‘๐‘˜

๐‘—=1

๐‘ค๐‘–โˆ’๐‘—

+ โˆ‘๐‘˜

๐‘—=1

๐‘ค๐‘–+๐‘—

)

, (4)

where โŸจ๐‘ค๐‘–โˆ’๐‘˜

, . . . , ๐‘ค๐‘–โˆ’1

, ๐‘ค๐‘–+1

, ๐‘ค๐‘–+๐‘˜

โŸฉ stands for the weight ofneighborhood, which defines the relationship between

the two-sided-windows neighborhood {๐‘‘๐‘–โˆ’๐‘˜

, . . . ๐‘‘๐‘–โˆ’1

, ๐‘‘๐‘–+1

, . . .

๐‘‘๐‘–+๐‘˜

} and the expected value of ๐‘‘๐‘–. Generally, the weight

of the neighboring point ๐‘‘๐‘—is inversely proportional to the

distance between the point ๐‘‘๐‘–and ๐‘‘

๐‘—; that is, the larger the

distance is, the small the weight๐‘ค is. For simplicity, it assignsthe weight vector โŸจ๐‘ค

๐‘–โˆ’๐‘˜, . . . , ๐‘ค

๐‘–โˆ’1, ๐‘ค๐‘–+1

, ๐‘ค๐‘–+๐‘˜

โŸฉ with the valuesโŸจ1, 2, . . . ๐‘˜, ๐‘˜, . . . 2, 1โŸฉ.

Generally, two-sided neighbor windows need pointsโ€™previous and subsequent neighbors; however, the rightneighbors may contain outliers that had not been detected,which may affect detection results subsequently. So, in someapplication fields, only previous (left) neighbor-window datacan be used to predict and identify forthcoming outliers.Therefore, it often uses a simple modification of the two-sided-windows model to predict the measurement at time ๐‘–:

V๐‘–=

โˆ‘2๐‘˜

๐‘—=1

๐‘ค๐‘–โˆ’๐‘—V๐‘–โˆ’๐‘—

โˆ‘2๐‘˜

๐‘—=1

๐‘ค๐‘–โˆ’๐‘—

. (5)

Similar to two-sided neighbors windows, the weightvector โŸจ๐‘ค

๐‘–โˆ’2๐‘˜, ๐‘ค๐‘–โˆ’2๐‘˜+1

, . . . ๐‘ค๐‘–โˆ’1

โŸฉ stands for theweight of neigh-borhood and is assigned with the values โŸจ1, 2, . . . 2๐‘˜โŸฉ.

3.2.3. Outliers Identification. Given the model predictionfrom Section 3.2.2 based on test pointโ€™s neighbor, the datapoint can then be classified as anomalous using a confidenceboundary ๐‘ƒ๐ถ๐ผ calculated via predicted value and confidencecoefficient. The ๐‘ƒ๐ถ๐ผ gives the range of plausible values thatthe test measurement can take; the confidence coefficient(๐‘ = 100(1โˆ’๐›ผ)) indicates the expected frequency with whichmeasurements will actually fall in this range. If it is assumedthat themodel residuals have a zero-mean Gaussian distribu-tion, the ๐‘% ๐‘ƒ๐ถ๐ผ can be calculated as follows [22]:

๐‘ƒ๐ถ๐ผ = V๐‘ก+1

ยฑ ๐‘ก๐›ผ/2,2๐‘˜โˆ’1

ร— ๐‘ โˆš1 +1

2๐‘˜, (6)

where V๐‘ก+1

is the prediction value of the test point, ๐‘ก๐›ผ/2,๐‘›โˆ’1

isthe ๐‘th percentile of a Studentโ€™s ๐‘ก-distribution with 2๐‘˜ โˆ’ 1

degrees of freedom, ๐‘  is the standard deviation of the modelresidual, and ๐‘˜ is the window size used to calculate ๐‘ . Thistype of๐‘ƒ๐ถ๐ผ is a type of ๐‘ก-interval because it relies on Studentโ€™s๐‘ก-distribution. If the test pointโ€™s actual measurement fallswithin the bounds of the ๐‘ƒ๐ถ๐ผ, then the point is classified asnonanomalous; otherwise, it is classified as anomalous.Thus,the๐‘ƒ๐ถ๐ผ represents a threshold for acceptance or rejection of adata point.Thebenefit of using the๐‘ƒ๐ถ๐ผ instead of an arbitrarythreshold is that the prediction level guides the selection ofthe interval width.

The two-sided-windows approach for outlier detecting isillustrated with a simple example in Figure 2 with a neigh-borhood window width of ๐‘˜ = 4. It should be noted that thearea between the two green lines in Figure 2 covering theneighborhood of data points indicates the confidences bound(๐‘ƒ๐ถ๐ผ). In this example, ๐‘‘

7is an outlier in the current

neighborhood of points and proposes to be replaced by the

Page 7: Research Article Time Series Outlier Detection Based on ...

Mathematical Problems in Engineering 7

2 4 6 8 10 12 14

1

2

3

Time

Varia

ble

Raw dataPredicted value

Confidence boundOutlier

๐œ‚7

(4)= {d3, d4, d5, d6, d8, d9, d10, d11}

Figure 2: Outlier detection example, where the raw data, predictedvalue, confidence bound, and outlier are identified by different sym-bols. In this example, ๐‘‘

7

is an outlier in the current neighborhoodof points and has been detected correctly.

predicted value V7for analysis and modeling of the time

series.

3.3. Parameter Optimization. In order to detect the outliersin the time series, the proposed methods should calculatethe plausible values range ๐‘ƒ๐ถ๐ผ via predicted value andconfidence coefficient based on the test pointโ€™s ๐‘˜-nearest-neighbor. Hence, the values of the two parameters, ๐‘˜ and ๐‘,become the key issues for improving the outlier detectionmethods mentioned in the previous section. In this case, wecan use the following experiments to choose a proper valueof those two parameters.

(1) The window width (๐‘˜) of the methods controls howmany neighboring points are included in the calculation ofthe prediction value.The larger thewindowwidth is, themorepoints are included to compute the prediction value. It varies๐‘˜ from 3 to 15 in increments of one; that is, ๐‘˜ = {3, 4, . . . , 15}.

(2) The confidence coefficient ๐‘ calculates the plausiblevalues range ๐‘ƒ๐ถ๐ผ to classify a data point as an outlier. Thelarger the confidence coefficient is, the more plausible theprediction value is. Therefore, it varies ๐‘ from 85% to 99%in increments of one percent.

To tune the best combination of algorithm parametersthat maximizes the ratio of detection, the cross-validationscheme is applied. The complete sample is split into two seg-ments: a training dataset and a testing dataset. This is a wayof cross-validating whether the parameters found during thefirst period, the training phase, are consistent and still validin a different period, the testing phase. The principle ofcross-validation is a generic resource to validate statisticalprocedures and it has been applied in different contexts.Furthermore, in training phase, the training set is divided into10 nonintersecting subsets of equal size, chosen by randomsampling. The model is then trained 10 times, each timereserving one of the subsets as a validation set on which themodel error is evaluated while fitting the model parametersusing the remaining nine subsets.Themodel parameters withthe lowest mean squared error among the 10 training modelsare then selected for the final model [41].

4. Experiments and Analysis

To demonstrate the efficacy of the outlier detection meth-ods developed in this study for data QA/QC, it will beapplied to hydrological data series from national hydrologydatabase of MWR, China. In the following, the real dataare described and are functional and results are presentedand discussed. More precisely, it first uses the previouslypresented approaches to identify outliers; then, some perfor-mance evaluation will be discussed and interpreted on thebasis of hydrological data; and some results usingmultivariateapproaches for comparison purposes will be provided at last.

4.1. Study Area and Data. In this subsection we report ona number of experiments using two different hydrologicelements, water level (m) and daily flow (m3 sโˆ’1), from ๐ฟ๐‘

and ๐ป๐‘Œ๐พ stations with reference numbers 40101200 and40105150, respectively, obtained from national hydrologydatabase of MWR, China. The ๐ฟ๐‘ and ๐ป๐‘Œ๐พ stations arethe important flood prevention and control stations and aregenerally known as typical hydrological stations of upper andmiddle reaches of the Yellow River. They play an importantrole in downstream of Yellow River in the fields such ashydrological data collection and forecasting, flood controlscheduling, river control experiments, and water resourcesdevelopment. Figure 3 indicates the geographical location of๐ฟ๐‘ and ๐ป๐‘Œ๐พ stations.

The data downloaded from the national hydrologydatabase of MWR, China, were available as raw data. In addi-tion, we followed the procedures described in the previoussection. Figures 4 and 5 illustrate the whole dataset within agiven time series; it shows that the dataset is nearly periodicand has some suspicious data point obviously.

4.2. Experiment Result. Since the data used in this study weresubjected to manual quality control before being archivedto the national hydrology database, it was expected that thedetectorswould not identifymany data outliers in the archive.However, we can easily see that some data points deviate fromtheir neighbor. And then, we apply our methods to detect theoutliers in the given hydrological time series with the windowsize ๐‘˜ = 6 and probability ๐‘ = 95%. And the detection resultsfor the proposed methods are shown in Figures 6, 7, 8, and 9.

Figures 6โ€“9 depict real and predicted values with theproposed outlier detectedmethods in the daily flow andwaterlevel time series of ๐ฟ๐‘ and๐ป๐‘Œ๐พ stations, respectively. In eachof these graphics the ๐‘ƒ๐ถ๐ผ for the predictions is also depicted.These graphics show that most of the real values are verynear the respective predicted value, while a spot of them liesoutside the ๐‘ƒ๐ถ๐ผ boundaries for the predictions. An outlierdetection method based on ๐‘ƒ๐ถ๐ผ would indicate an outlier ifthe real value lies outside the confidence bounds, as describedin Section 3.2. The experiments reported here showed thatthese bounds, built from cross-validation scheme on trainingand validation sets, can correctly bind the region of normality.Furthermore, these intervals could be used to indicate thelevel of suspicion associated with a given point in the future.

Page 8: Research Article Time Series Outlier Detection Based on ...

8 Mathematical Problems in Engineering

N

Station

Figure 3: Geographical location of ๐ฟ๐‘ and ๐ป๐‘Œ๐พ stations in Yellow River.

Time (day)

Raw data

100 200 300 400 500 600 700

1511.5

1512

1512.5

1513

1513.5

Wat

er le

vel (

m)

Station (40101200) raw water level data:from 1998/1/1 to 1999/12/31

(a) Water level

100 200 300 400 500 600 700200

400

600

800

1000

1200

1400

1600

1800

2000

Time (day)

Raw data

Dai

ly fl

ow (m

3 /s)

Station (40101200) raw daily flow data:from 1998/1/1 to 1999/12/31

(b) Daily flow

Figure 4: Raw hydrological time series of ๐ฟ๐‘ station.

4.3. Experiment Evaluation

4.3.1. Parameters of Quality. The detection results for theproposedmethodswith the two different hydrologic elementsfrom ๐ฟ๐‘ and ๐ป๐‘Œ๐พ stations illustrate that the outlier detec-tion methods are promising and can successfully detect aconsiderable amount of outliers from the hydrologic dataset.However, we also note that there are some valid data pointswhich had been detected as outliers and imputed in error.

The objective of the evaluation analysis described in thissection is to assess the effectiveness ofmethod; we can classifythe results from our experiment into four categories (seeTable 1).

The categories in Table 1 correspond to the four possibleoutcomes of one experimental run, which consists of usingboth methods with a particular combination of windowwidth (๐‘˜) and probability (๐‘). Categories ๐ด and ๐ท are idealsituations in which a point can be detected correctly, whilecategories ๐ต and ๐ถ are undesired, because the methods arenot able to distinguish between outliers and exactness.

According to these definitions, the Sensitivity is theprobability that the proposed methods discovered a realoutlier. Its formula is defined as follows:

Sensitivity =๐‘‡๐‘ƒ

(๐‘‡๐‘ƒ + ๐น๐‘). (7)

Table 1: Assessment of both methods.

Truth DetectionOutlier Not an outlier

OutlierTrue positives or TP (A)

data points that areoutliers and identify as

outliers

False negatives or FN(C) data points that areoutliers but identify as

normal

Not an outlierFalse positives or FP (B)data points that are

normal but identify asoutliers

True negatives or TN(D) data points that arenormal and identify as

normal

Another relevant parameter is the Specificity, the propor-tion of negative test results among the normal; themathemat-ical expression is as follows:

Specificity =๐‘‡๐‘

(๐‘‡๐‘ + ๐น๐‘ƒ). (8)

The positive predictive value (PPV) is the probability thata detected outlier is indeed a real one. Its formula is as follows:

๐‘ƒ๐‘ƒ๐‘‰ =๐‘‡๐‘ƒ

(๐‘‡๐‘ƒ + ๐น๐‘ƒ). (9)

Page 9: Research Article Time Series Outlier Detection Based on ...

Mathematical Problems in Engineering 9

90.5

91

91.5

92

92.5

Station (40105150) raw water level data:from 2006/1/1 to 2007/12/31

Time (day)

Raw data

100 200 300 400 500 600 700

Wat

er le

vel (

m)

(a) Water level

500

1000

1500

2000

2500

3000

3500

4000

Station (40105150) raw daily flow data:from 2006/1/1 to 2007/12/31

100 200 300 400 500 600 700

Time (day)

Raw data

Dai

ly fl

ow (m

3 /s)

(b) Daily flow

Figure 5: Raw hydrological time series of ๐ป๐‘Œ๐พ station.

1510.5

1511

1511.5

1512

1512.5

1513

1513.5

1514

Time (day)100 200 300 400 500 600 700

Wat

er le

vel (

m)

Station (40101200) raw water level data:from 1998/1/1 to 1999/12/31

Raw dataPredicted value

Confidence boundOutlier

(a) Two-sided window detection results

1510.5

1511

1511.5

1512

1512.5

1513

1513.5

1514

Time (day)100 200 300 400 500 600 700

Wat

er le

vel (

m)

Station (40101200) raw water level data:from 1998/1/1 to 1999/12/31

Raw dataPredicted value

Confidence boundOutlier

(b) One-sided window detection results

Figure 6: Detection results over ๐ฟ๐‘ water level.

0

500

1000

1500

2000

Station (40101200) detection result:

Time (day)100 200 300 400 500 600 700

Raw dataPredicted value

Confidence boundOutlier

Dai

ly fl

ow (m

3 /s)

from 1998/1/1 to 1999/12/31

(a) Two-sided-window results

0

500

1000

1500

2000

Station (40101200) detection result:

Time (day)100 200 300 400 500 600 700

Raw dataPredicted value

Confidence boundOutlier

Dai

ly fl

ow (m

3 /s)

from 1998/1/1 to 1999/12/31

(b) One-sided-window detection results

Figure 7: Detection results over ๐ฟ๐‘ daily flow.

Page 10: Research Article Time Series Outlier Detection Based on ...

10 Mathematical Problems in Engineering

90

90.5

91

91.5

92

92.5

93

93.5

Station (40105150) detection result:from 2006/1/1 to 2007/12/31

Time (day)100 200 300 400 500 600 700

Raw dataPredicted value

Confidence boundOutlier

Wat

er le

vel (

m)

(a) Two-sided-window detection results

Station (40105150) detection result:from 2006/1/1 to 2007/12/31

Time (day)100 200 300 400 500 600 700

Raw dataPredicted value

Confidence boundOutlier

90

90.5

91

91.5

92

92.5

93

93.5

Wat

er le

vel (

m)

(b) One-sided-window detection results

Figure 8: Detection results over ๐ป๐‘Œ๐พ water level.

0

1000

2000

3000

4000

5000

Station (40105150) detection result:from 2006/1/1 to 2007/12/31

Time (day)100 200 300 400 500 600 700

Raw dataPredicted value

Confidence boundOutlier

Dai

ly fl

ow (m

3 /s)

(a) Two-sided-window detection results

0

1000

2000

3000

4000

5000

Station (40105150) detection result:

Time (day)100 200 300 400 500 600 700

Raw dataPredicted value

Confidence boundOutlier

Dai

ly fl

ow (m

3 /s)

from 2006/1/1 to 2007/12/31

(b) One-sided-window detection results

Figure 9: Detection results over ๐ป๐‘Œ๐พ daily flow.

Finally, the negative predictive value (NPV) is the propor-tion of nonoutliers among subjects with a negative test result.Its formula is as follows:

๐‘๐‘ƒ๐‘‰ =๐‘‡๐‘

(๐‘‡๐‘ + ๐น๐‘). (10)

4.3.2. Quantifying the Outlier Occurrences. A statistical mea-sure of the accuracy is provided in this section. The parame-ters used are the ones described in Section 4.3.1 and summa-rized in Tables 2 and 3. Note that all parameters refer to thewhole year from 2006 to 2007 of ๐ป๐‘Œ๐พ station and 1998 to1999 of ๐ฟ๐‘ station. And just for simplicity, only the detectionresult with three pairs of parameters and the explanation forthe water level where (๐‘˜, ๐‘) takes of the value (6, 0.95) isprovided, since the reasoning for the remaining hydrologicelements and parameters is analogous. Note that the optimalvalues for the parameters (๐‘˜, ๐‘) would be (6, 0.95) on water

level time series of ๐ป๐‘Œ๐พ station according to parameteroptimization method described in Section 3.3.

As it is shown in Table 2, during the one-sided methodsdetecting process on water level time series of ๐ป๐‘Œ๐พ stationwith (๐‘˜, ๐‘) taking the value of (6, 0.95), there were 14 outliersproperly detected; that is, ๐‘‡๐‘ƒ = 14. On the other hand, therewere two occasions which originally represent normal eventbut are to be considered an outlier by the methods; therefore,๐น๐‘ƒ = 2. Furthermore, there was one day which was outlierand was not detected by the methods; that is, ๐น๐‘ = 1. Andthe remaining 713 dayswere correctly judged as normal event.Thus, ๐‘‡๐‘ = 713 (14 + 2 + 1 + 713 = 730 forecasted days from2006 to 2007).

The results show great accuracy for both features. Partic-ularly remarkable are the values reached by the specificity. Inparticular, it exceeds 99% in all situations.These values meanthat when the approach classifies the day to be predicted asnormal, it does it with high reliability.

Page 11: Research Article Time Series Outlier Detection Based on ...

Mathematical Problems in Engineering 11

Table 2: Statistical analysis of one-sided methods with different parameters of HYK station.

Parameters Water level Daily flow(5, 0.95) (6, 0.95) (6, 0.96) (7, 0.95) (5, 0.95) (5, 0.96) (6, 0.96) (5, 0.97)

TP 12 14 12 11 15 16 14 13TN 713 713 711 712 707 710 708 709FP 2 2 4 3 5 2 4 3FN 3 1 3 4 3 2 4 5Sensitivity 80.00% 93.33% 80.00% 73.33% 83.33% 88.89% 77.78% 72.22%Specificity 99.72% 99.72% 99.44% 99.58% 99.30% 99.72% 99.44% 99.58%PPV 85.71% 87.50% 75.00% 78.57% 75.00% 88.89% 77.78% 81.25%NPV 99.58% 99.86% 99.58% 99.44% 99.58% 99.72% 99.44% 99.30%

Table 3: Statistical analysis of both methods with optimal parameters of given dataset.

ParametersLZ station HYK station

Water level Daily flow Water level Daily flowOne-sided(6, 0.96)

Two-sided(6, 0.95)

One-sided(7, 0.96)

Two-sided(6, 0.95)

One-sided(6, 0.95)

Two-sided(6, 0.96)

One-sided(5, 0.96)

Two-sided(6, 0.95)

TP 20 18 18 19 14 13 17 17TN 704 704 706 704 713 710 710 708FP 4 4 3 5 2 5 2 4FN 2 4 3 2 1 2 1 1Sensitivity 90.91% 81.82% 85.71% 90.48% 93.33% 86.67% 94.44% 94.44%Specificity 99.44% 99.44% 99.58% 99.29% 99.72% 99.30% 99.72% 99.44%PPV 83.33% 81.82% 85.71% 79.17% 87.50% 72.22% 89.47% 80.95%NPV 99.72% 99.44% 99.58% 99.72% 99.86% 99.72% 99.86% 99.86%

As for the sensitivity, all the results reached values greaterthan 73% (except for daily flowwith the situation (๐‘˜, ๐‘) takingthe value of (5, 0.97), in which it reaches 72.22%) and obtains81.6% on average.

The๐‘ƒ๐‘ƒ๐‘‰ reached values similar to those of the sensitivity,in particular, slightly greater (82.8% on average).Therefore, itcan be stated that when the proposedmethod determines thatthere is an upcoming outlier, it is highly reliable.

Finally,๐‘๐‘ƒ๐‘‰ provides similar values for all the situations,reaching 99.58%on average; that is, the rate of real outliers notfound by the approach cannot be considered significant.

As for Table 3, it can be easy to draw a conclusionthat different hydrologic elements of the same station andthe same hydrologic elements of different station may takedifferent optimal parameters values for (๐‘˜, ๐‘) from experi-ments described in Section 3.3. Moreover, we can see that thedetection accuracy of one-sided method is better than two-sided one as a whole. For example, there were 15 outliers inwater level dataset of ๐ป๐‘Œ๐พ station; one-sided method cancorrectly detect 14 abnormal and 713 normal data points;simultaneously, it masks one outlier as normal and swampstwo normal samples as outliers. As a comparison, two-sidedmethods can correctly detect 13 abnormal and 710 normaldata points; correspondingly, the masking and swampingeffects reach to 2 and 5, respectively. The reason for thisdifference may lie in that the latter method needs both sidesโ€™neighbors which may contain outliers, which may lead to thefact that masking and swamping event occurs.

4.4. Analysis and Discussion. We compared our methodswith other methods such as ๐‘†๐‘‰๐‘€ (support vector machine)[42], box-plot techniques [40], and median method [23] onthe same test datasets. The comparison results will displayin the receiver operating characteristic (๐‘…๐‘‚๐ถ) [43] curves,which will be shown later. By convention, the ๐‘…๐‘‚๐ถ curvedisplays sensitivity (๐‘‡๐‘ƒ๐‘…) on the vertical axis against thecomplement of specificity (1-specificity or ๐น๐‘ƒ๐‘…) on thehorizontal axis. The ๐‘…๐‘‚๐ถ curve then demonstrates thecharacteristic reciprocal relationship between sensitivity andspecificity, expressed as a tradeoff between the ๐‘‡๐‘ƒ๐‘… and ๐น๐‘ƒ๐‘….This configuration of the curve also facilitates calculationof the area beneath it as a summary index of overall testperformance. Therefore, the larger the area under the ๐‘…๐‘‚๐ถ

curve, the better the performance of the technique.Figure 10 reveals the ๐‘…๐‘‚๐ถ curves obtained by proposed

methods and other techniques. For these datasets, the per-formances of proposed methods are satisfactory and stable.For the water level dataset of ๐ฟ๐‘, our methods obtain similarresults; their๐‘…๐‘‚๐ถ curves showbetter performance than thoseof ๐‘†๐‘‰๐‘€ and box-plot methods, while median method showssecond-better ๐‘…๐‘‚๐ถ curves. For the daily flow dataset of ๐ฟ๐‘,ourmethods and ๐‘†๐‘‰๐‘€ showbetter performance thanmedianmethod and box-plot method; moreover, the performanceof one-sided-window method is slightly better than that oftwo-sided-window. For the water level of๐ป๐‘Œ๐พ, our methodsshow preferable results, while ๐‘†๐‘‰๐‘€method obtains relativelybetter results than box-plot and median methods. For the

Page 12: Research Article Time Series Outlier Detection Based on ...

12 Mathematical Problems in Engineering

Table 4: Comparisons of area under ROC curves.

AUCs Median Box-plot SVM Two-sided One-sidedLZ water level 0.922 0.894 0. 871 0.935 0.957LZ daily flow 0. 843 0.852 0.895 0.92 0.933HYK water level 0.819 0.836 0.865 0.903 0.921HYK daily flow 0.934 0.928 0.93 0.942 0.955

MedianBox-plotSVM

Two-sidedOne-sided

0.2

0.4

0.6

0.8

1

0 0.2 0.4 0.6 0.8 1

(a) ๐‘…๐‘‚๐ถ of ๐ฟ๐‘water level

MedianBox-plotSVM

Two-sidedOne-sided

0.2

0.4

0.6

0.8

1

0 0.2 0.4 0.6 0.8 1

(b) ๐‘…๐‘‚๐ถ of ๐ฟ๐‘ daily flow

MedianBox-plotSVM

Two-sidedOne-sided

0.4

0.6

0.8

1

0 0.2 0.4 0.6 0.8 1

(c) ๐‘…๐‘‚๐ถ of๐ป๐‘Œ๐พ water level

MedianBox-plotSVM

Two-sidedOne-sided

0.4

0.6

0.8

1

0 0.2 0.4 0.6 0.8 1

(d) ๐‘…๐‘‚๐ถ of๐ป๐‘Œ๐พ daily flow

Figure 10:๐‘…๐‘‚๐ถ curves for different datasets. Five methods are compared: medianmethod, box-plot method, ๐‘†๐‘‰๐‘€method, and ourmethod,which are described using different colored lines.

daily flow of๐ป๐‘Œ๐พ, the ๐‘…๐‘‚๐ถ curves interlace with each other,so it is difficult to evaluate the performance only accordingto the observation. We have also calculated the ๐ด๐‘ˆ๐ถ๐‘  forthe ๐‘…๐‘‚๐ถ curves to compare the results (see Table 4). Notethat there is a remarkable improvement in the detectionperformance.The average๐ด๐‘ˆ๐ถ๐‘  of ourmethod are 0.925 and0.942, which are the highest two among these five methods.The average ๐ด๐‘ˆ๐ถ๐‘  of median, box-plot, and SVM methodsare 0.892, 0.878, and 0.897, respectively.

For our method, it can be inferred that the anomaliescan be effectively detected by the window-based forecasting

model, which is constructed using ๐ด๐‘… prediction modeland dynamically boundary movement strategies. The resultssuggest that our methods improve the robustness of theoverall decision, while the other methods show differentperformances on different datasets. The other three tech-niques have the similar characteristic; they achieve better orworse results on different datasets. Results indicate that theproposed framework can find robust ๐‘ƒ๐ถ๐ผ as a means fordefining the thresholds for detecting novelties in time seriesand can therefore improve performance of forecasting-basedtime series novelty detection.Moreover, ourmethod is robust

Page 13: Research Article Time Series Outlier Detection Based on ...

Mathematical Problems in Engineering 13

to detect anomalies in different datasets, which means that itis more data independent.

It is important to mention that we have also validatedour method on different datasets. The results were similar tothose reported in this section. Furthermore, this paper used๐‘ƒ๐ถ๐ผ rather than an arbitrary, user-defined threshold, whichis mainly because of the fact that ๐‘ƒ๐ถ๐ผ can provide guidance(in the form of the confidence coefficient) for the calcula-tion of the normal boundary region without requiring anyknowledge of the process variables being measured. The๐‘% ๐‘ƒ๐ถ๐ผ indicates the region likely to contain at least ๐‘% ofthe possible sensor measurements, and we expect that ap-proximately (1 โˆ’ ๐‘)% of the nonanomalous data will bemisclassified as anomalous. This indicates that the normalassumption implicit in the ๐‘ƒ๐ถ๐ผ calculation is reasonablefor the hydrological time series data. One limitation of ourmethods is that we identify a data point to be an outlier or notby comparing new observed value to ๐‘ƒ๐ถ๐ผ calculated dynam-ically according to different nearest-neighbor-windows sizeand confidence coefficient of difference users, whichmay costa certain amount of time complexity to calculate optimizationparameter for best detection results. Notwithstanding thatlimitation, this study does suggest that the proposed methodcan improve the performance of outlier detection. Resultsconfirm that our methods can achieve a more robust perfor-mance than other outlier detection techniques.

5. Conclusions

Outlier detection, one of the classical topics of data mining,has generated a great deal of research in recent years owing tothe new challenges posed by large high-dimensional data. Inthe meantime, outliers in hydrological time series have manypractical applications, such as data QA/QC, adaptive sam-pling, and anomalous event detection. This research devel-oped a time series outlier detection method that employsa window-based forecasting mode in conjunction with ๐‘ƒ๐ถ๐ผ

to detect novelties in hydrological time series. The methodfirst splits given historical hydrological time series into sub-sequences by a sliding-window, and then an autoregressiveprediction model of time series was built from its nearest-neighbor-window to predict future values. Anomalies areassumed to take place if the observed values fall outside agiven ๐‘ƒ๐ถ๐ผ, which can be calculated dynamically accordingto different nearest-neighbor-windows size and confidencecoefficient of different user. Moreover, experiments with twodifferent hydrological variables, water level and daily flow,from๐ฟ๐‘ and๐ป๐‘Œ๐พ stations obtained fromnational hydrologydatabase of MWR, China, were performed to examine theeffects of the method. In the experiments, single-layer linearnetworks were used for forecasting model; confidence coeffi-cient and window size are assigned to 95% and 6 respectivelyin the initialization phase. Moreover, the best combinationof parameters confidence coefficient and window size canbe tuned according to different userโ€™s requirements and timeseriesโ€™ features.

The case study results suggest that the proposed outlierdetectionmethods developed in this study are useful tools for

identifying anomalies in hydrological time series. Since thesemethods only require a time-series model of the time series,they can be easily applied to many real-time hydrologicaltime series. However, it should be noted that, while the ๐ฟ๐‘

produced the best model for the water level and daily flowtime series considered in the case study, this model may notbe the most appropriate choice for other types of hydro-logical data. Furthermore, the optimization combination ofparameters confidence coefficient and window size may costa certain amount of time complexity for different hydrologicalelement and different userโ€™s requirements. In spite of this,these methods do not require any preclassified examples ofdata, scale well to large volumes of data, and allow for fastincremental evaluation of data, which make them an idealchoice for correctly and efficiently hydrological time seriesoutlier detection, especially for cases in which there is littleinformation about how to set the threshold value.

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper.

Acknowledgments

This work is supported by the Natural Science Foundationof China (nos. 51079040, 61170200, and 61370091) and theNational Science and Technology Infrastructure of China(no. 2005DKA32000).

References

[1] J. D. Salas, โ€œAnalysis and modeling of hydrologic time series,โ€ inHandbook of Hydrology, vol. 19, pp. 1โ€“72, 1993.

[2] W. Gujer, Systems Analysis for Water Technology, Springer,Berlin, Germany, 2008.

[3] N. Lauzon,Water resources data quality assessment and descrip-tion of natural processes using artificial intelligence techniques[Ph.D. thesis], University of British Columbia, 2003.

[4] K. Yang and C. Shahabi, โ€œA PCA-based similarity measure formultivariate time series,โ€ in Proceedings of the 2nd ACM Inter-national Workshop on Multimedia Databases, pp. 65โ€“74, ACM,November 2004.

[5] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Anal-ysis: Forecasting and Control, JohnWiley & Sons, NewYork, NY,USA, 2013.

[6] D. Machiwal and M. K. Jha, Hydrologic Time Series Analysis:Theory and Practice, Springer, New York, NY, USA, 2012.

[7] D. M. Hawkins, Identification of Outliers, vol. 11, Chapman&Hall, London, UK, 1980.

[8] V. Chandola, A. Banerjee, and V. Kumar, โ€œAnomaly detection: asurvey,โ€ ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.

[9] M. Gupta, J. Gao, C. Aggarwal, and J. Han, Outlier Detectionfor Temporal Data, Synthesis Lectures on Data Mining andKnowledge Discovery, Morgan & Claypool, 2014.

[10] V. J. Hodge and J. Austin, โ€œA survey of outlier detectionmethod-ologies,โ€ Artificial Intelligence Review, vol. 22, no. 2, pp. 85โ€“126,2004.

Page 14: Research Article Time Series Outlier Detection Based on ...

14 Mathematical Problems in Engineering

[11] K. Das and J. Schneider, โ€œDetecting anomalous records in cate-gorical datasets,โ€ in Proceedings of the 13th ACM SIGKDD Inter-national Conference on Knowledge Discovery and Data Mining,pp. 220โ€“229, ACM, August 2007.

[12] M. M. Breuniq, H.-P. Kriegel, R. T. Ng, and J. Sander, โ€œLOF:identifying density-based local outliers,โ€ ACM Sigmod Record,vol. 29, no. 2, pp. 93โ€“104, 2000.

[13] Z. He, X. Xu, and S. Deng, โ€œDiscovering cluster-based localoutliers,โ€ Pattern Recognition Letters, vol. 24, no. 9-10, pp. 1641โ€“1650, 2003.

[14] C. C. Aggarwal and P. S. Yu, โ€œOutlier detection with uncertaindata,โ€ in Proceedings of the 8th SIAM International Conferenceon Data Mining, pp. 483โ€“493, April 2008.

[15] S. Ando, โ€œClustering needles in a haystack: An information the-oretic analysis of minority and outlier detection,โ€ in Proceedingsof the 7th IEEE International Conference on DataMining (ICDMโ€™07), pp. 13โ€“22, October 2007.

[16] A. Agovic, B. Arindam, G. Auroop, and P. Vladimir, โ€œAnomalydetection using manifold embedding and its applications intransportation corridors,โ€ Intelligent Data Analysis, vol. 13, no.3, pp. 435โ€“455, 2009.

[17] S. Barua and R. Alhajj, โ€œA parallel multi-scale region outliermining algorithm for meteorological data,โ€ in Proceedings of the15th ACM International Symposium on Advances in GeographicInformation Systems (GIS โ€™07), pp. 352โ€“355, ACM, November2007.

[18] F. Rasheed, P. Peng, R. Alhajj, and J. Rokne, โ€œFourier transformbased spatial outliermining,โ€ in IntelligentData Engineering andAutomated Learningโ€”IDEAL 2009, vol. 5788 of Lecture Notes inComputer Science, pp. 317โ€“324, Springer, Berlin, Germany, 2009.

[19] U. Rebbapragada, P. Protopapas, C. E. Brodley, and C. Alcock,โ€œFinding anomalous periodic time series : an application to cat-alogs of periodic variable stars,โ€ Machine Learning, vol. 74, no.3, pp. 281โ€“313, 2009.

[20] A. J. Fox, โ€œOutliers in time series,โ€ Journal of the Royal StatisticalSociety. Series B (Methodological), vol. 34, pp. 350โ€“363, 1972.

[21] J. Ma and S. Perkins, โ€œOnline novelty detection on temporalsequences,โ€ in Proceedings of the 9th ACM SIGKDD Interna-tional Conference on Knowledge Discovery and Data Mining(KDD โ€™03), pp. 613โ€“618, ACM, August 2003.

[22] D. J. Hill and B. S. Minsker, โ€œAnomaly detection in streamingenvironmental sensor data: a data-driven modeling approach,โ€Environmental Modelling and Software, vol. 25, no. 9, pp. 1014โ€“1022, 2010.

[23] A. L. I. Oliveira and S. R. L. Meira, โ€œDetecting novelties in timeseries through neural networks forecasting with robust confi-dence intervals,โ€ Neurocomputing, vol. 70, no. 1โ€“3, pp. 79โ€“92,2006.

[24] E. Keogh, J. Lin, A. W. Fu, and H. Van Herle, โ€œFinding unu-sual medical time-series subsequences: algorithms and appli-cations,โ€ IEEE Transactions on Information Technology inBiomedicine, vol. 10, no. 3, pp. 429โ€“439, 2006.

[25] E. Keogh, J. Lin, and A. Fu, โ€œHOT SAX: efficiently finding themost unusual time series subsequence,โ€ in Proceedings of the 5thIEEE International Conference on Data Mining (ICDM โ€™05), pp.226โ€“233, Houston, Tex, USA, November 2005.

[26] L. Wei, E. Keogh, and X. Xi, โ€œSAXually explicit images: findingunusual shapes,โ€ in Proceedings of the 6th International Con-ference on Data Mining (ICDM โ€™06), pp. 711โ€“720, Hong Kong,December 2006.

[27] Y. Bu, T.-W. Leung, A. W.-C. Fu, E. J. Keogh, J. Pei, and S.Meshkin, โ€œWAT: finding top-Kdiscords in time series database,โ€in Proceedings of the 7th SIAM International Conference on DataMining (SDM โ€™07), pp. 449โ€“454, April 2007.

[28] J. Lin, E. Keogh, A. Fu, and H. van Herle, โ€œApproximations tomagic: finding unusual medical time series,โ€ in Proceedings ofthe 18th IEEE Symposium on Computer-Based Medical Systems,pp. 329โ€“334, June 2005.

[29] M. Gupta, J. Gao, C. C. Aggarwal, and J. Han, โ€œOutlier detectionfor temporal data: a survey,โ€ IEEE Transactions on Knowledgeand Data Engineering, vol. 26, no. 9, p. 1, 2014.

[30] Y. Zhang, N. A. S. Hamm, N. Meratnia, A. Stein, M. van deVoort, and P. J. M. Havinga, โ€œStatistics-based outlier detectionfor wireless sensor networks,โ€ International Journal of Geo-graphical Information Science, vol. 26, no. 8, pp. 1373โ€“1392, 2012.

[31] R. Frieda, I. Agueusopa, B. Bornkampb et al., โ€œBayesian outlierdetection in INGARCH time series,โ€ Sonderforschungsbereich(SFB) 823, 2012.

[32] A. Grane and H. Veiga, โ€œWavelet-based detection of outliers infinancial time series,โ€ Computational Statistics & Data Analysis,vol. 54, no. 11, pp. 2580โ€“2593, 2010.

[33] C. Bilen and S. Huzurbazar, โ€œWavelet-based detection of out-liers in time series,โ€ Journal of Computational and GraphicalStatistics, vol. 11, no. 2, pp. 311โ€“327, 2002.

[34] F. Chebana and T. B. M. J. Ouarda, โ€œDepth-based multivariatedescriptive statistics with hydrological applications,โ€ Journal ofGeophysical Research: Atmospheres, vol. 116, no. D10, 2011.

[35] R. H. McCuen, Modeling Hydrologic Change: Statistical Meth-ods, CRC Press, New York, NY, USA, 2002.

[36] Interagency Advisory Committee onWater Data,Guidelines forDetermining Flood Flow Frequency: Bulletin 17B, U.S. GeologicalSurvey, Office of Water Data Coordination, Reston, Va, USA,1982.

[37] R. J. Hyndman and H. L. Shang, โ€œRainbow plots, bagplots, andboxplots for functional data,โ€ Journal of Computational andGraphical Statistics, vol. 19, no. 1, pp. 29โ€“45, 2010.

[38] F. Chebana, S.Dabo-Niang, andT. B.M. J.Ouarda, โ€œExploratoryfunctional flood frequency analysis and outlier detection,โ€Water Resources Research, vol. 48, no. 4, Article ID W04514,2012.

[39] W. W. Ng, U. S. Panu, and W. C. Lennox, โ€œChaos based Ana-lytical techniques for daily extreme hydrological observations,โ€Journal of Hydrology, vol. 342, no. 1-2, pp. 17โ€“41, 2007.

[40] J. M. Chambers, W. S. Cleveland, B. Kleiner, and P. A. Tukey,Graphical Methods for Data Analysis, Wadsworth, Belmont,Calif, USA, 1983.

[41] J. Han and M. Kamber, Data Mining: Concepts and Techniques,Morgan Kaufmann, San Francisco, Calif, USA, 2001.

[42] J. Ma and S. Perkins, โ€œTime-series novelty detection using one-class support vector machines,โ€ in Proceedings of the Interna-tional Joint Conference on Neural Networks, vol. 3, pp. 1741โ€“1745,July 2003.

[43] T. Fawcett, โ€œAn introduction to ROC analysis,โ€ Pattern Recogni-tion Letters, vol. 27, no. 8, pp. 861โ€“874, 2006.

Page 15: Research Article Time Series Outlier Detection Based on ...

Submit your manuscripts athttp://www.hindawi.com

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttp://www.hindawi.com

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

CombinatoricsHindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

International Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com

Volume 2014 Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Stochastic AnalysisInternational Journal of