Report of Summer Project
Institute for Development and Research in Banking Technology
13 May - 13 July, 2013
CASH DEMAND FORECASTING
FOR ATMS
Guided By
Dr. Mahil Carr
Associate Professor
IDRBT, Hyderabad
Submitted By
Shanu Agrawal
Industrial and Management Engineering
M. Tech. 1st Year
IIT Kanpur
INSTITUTE FOR DEVELOPMENT AND RESEARCH IN
BANKING TECHNOLOGY
(IDRBT)
Road No. 1, Castle Hills, Masab Tank,
Hyderabad-500057
CERTIFICATE OF COMPLETION
This is to certify that Miss Shanu Agrawal, pursuing M. Tech degree in the Department of
Industrial and Management Engineering at Indian Institute of Technology, Kanpur, has
undertaken a project as an intern at the Institute for Development and Research in Banking
Technology (IDRBT), Hyderabad from 13th May, 2013 to 13th July, 2013.
She was assigned the project “CASH DEMAND FORECASTING FOR ATMS” which she
completed successfully under the guidance of Dr. Mahil Carr, IDRBT.
We wish her all the best in all her endeavors.
Dr. Mahil Carr
(Project Guide)
Associate Professor
IDRBT, Hyderabad
ACKNOWLEDGEMENT
I owe thanks to many people who helped and guided me in the completion of this project. But
most of all, I thank my supervisor Dr. Mahil Carr for providing me such an excellent opportunity
to work on an interesting project idea and for motivating me all the time. At every stage of the
project, he was patient in hearing out and clearing all my doubts. I always got help from him despite
his busy schedule and my irregular schedule. Without his support, this project would not have
been possible. I am thankful for all his support.
I am thankful to the Industrial and Management Engineering Department, IIT Kanpur for giving me
this golden opportunity to work in a high-end research institute like IDRBT. I am thankful to
IDRBT for providing such an amazing platform to work on real, application-oriented research.
Finally, I thank one and all who made this project successful either directly or indirectly.
Shanu Agrawal
Industrial and Management Engineering
M. Tech. 1st Year
IIT Kanpur
ABSTRACT
Cash management is a crucial activity for any bank. Banks need to predict the customer demand
for cash reasonably accurately. Demand varies with the time period, the position of the ATM and
the socio-economic features of its users. In addition, the quantity of cash drawn may differ
before holidays and may follow weekly, monthly and annual cycles. These patterns were examined
using graphs and a one-sample t-test.
After analyzing the time series data, a forecasting model was designed that addresses the
multiple seasonalities and calendar-day effects that are prevalent in the demand for cash. Time
series models are used to determine the demand for cash for the region and per ATM. The time
series models we used to analyze the data are:
• Simple Exponential model
• Double Exponential Smoothing Model
• Holt-Winters model and
• ARIMA
Forecast accuracy was then measured using the Mean Absolute Percentage Error (MAPE), the
autocorrelation plot, the Ljung-Box test and a check that the residuals are normally distributed.
Table of Contents
Acknowledgement................................................................................................................................... 3
Abstract .................................................................................................................................................. 4
1 Introduction ......................................................................................................................................... 6
1.1 Objective ....................................................................................................................................... 6
1.2 Motivation ..................................................................................................................................... 6
1.3 Data ............................................................................................................................................... 6
2. Project Description .............................................................................................................................. 7
2.1 Analysis of Time series data ........................................................................................................... 7
2.2 Forecasting Using Time Series Methods ......................................................................................... 9
2.2.1 Exponential Smoothing ........................................................................................................... 9
2.2.2 Double Exponential Smoothing ............................................................................................. 10
2.2.3 Holt-Winters Method ............................................................................................................ 10
2.2.4 ARIMA Model ........................................................................................................................ 11
2.3 Validity of Forecast ...................................................................................................................... 13
2.3.1 Mean Absolute Percentage Error (MAPE) .............................................................................. 13
2.3.2 Autocorrelation Function ...................................................................................................... 13
2.3.3 Ljung-Box Test ....................................................................................................................... 13
2.3.4 Check Normal Distribution .................................................................................................... 14
3 Modeling Procedure in Excel............................................................................................................... 14
4 Modeling Procedure in R .................................................................................................................... 14
5 Interpretation of Results ..................................................................................................................... 15
6 Conclusions ........................................................................................................................................ 18
7 References ......................................................................................................................................... 19
8 Appendix ............................................................................................................................................ 20
1 Introduction
1.1 Objective
To forecast the cash demand for ATM refilling using time series models implemented in Excel and R. The
project is divided into three parts:
1. Analysis of time series data.
2. Forecasting using time series methods.
3. Checking the validity of the forecasts.
1.2 Motivation
An ATM is refilled with a fixed amount, and when the balance falls below a threshold limit it is
refilled again. The threshold limit depends on where the ATM is placed and the turnaround time it takes
to replenish the cash. The service providers involved in ATM management say, “The frequency of replenishing
cash per week will increase the additional costs (for vendors refilling cash and in turn banks) as instead
of five trips, he (the vendor refilling cash) may have to make six trips in a week to the ATM to refill
cash.”
Though the frequency of refilling cash in ATMs depends on the area where an ATM is located, agencies
usually refill anywhere between twice a week and once a day. “In some areas such as MG Road in Bangalore we need to
refill twice a day,” says an official with a cash replenishing agency.
Banks will need to observe the withdrawal behaviour before changing the amount deposited in each
ATM. However, unnecessarily refilling ATMs with a higher amount is also not a viable option for
banks. “Banks do not want to keep more money in ATMs as it is idle cash and increases their cost of
funds. Depending on the dispensing pattern, the frequency of refilling cash will have to be increased.
Banks will have to pay more charges, which could be around Rs 300-500 per day (per additional visit
for each ATM),” says the official of the replenishing agency [2].
1.3 Data
Banks store data for each and every transaction made. This withdrawal data therefore forms a historical
time series spanning several years. The data used in this project is the daily ATM data for Union Bank
of India for the period Apr 2007 to Sep 2011.
2. Project Description
2.1 Analysis of Time Series Data
In this step we examine three questions:
Is there any level in the time series data?
Is there any trend in the time series data?
Is there any seasonality in the time series data?
Before analyzing the time series data we made the following assumptions:
1. There will be a seasonality pattern (12 seasons) across the months of a year because of differences
in the number of days in each month and in the festivals falling in each month.
2. There will be a seasonality pattern (52 seasons) across the weeks of a year (considering 52 weeks
in a year) because of differences in withdrawal in each week.
3. There will be a seasonality pattern (4 seasons) in the weeks of successive months because of
differences in withdrawal in each week (the first two weeks are generally busier).
4. There will be a seasonality pattern in the weeks of a particular month due to differences in
withdrawal in each week (the first two weeks are generally busier because pay days occur in the
initial 10 days), e.g. a seasonality pattern (5 seasons) in the weeks of September.
5. There will be a seasonality pattern (7 seasons) in the days of the week due to differences in
withdrawal on each day. ATMs are used most on Friday and Saturday and least on Sunday.
Graphs for the above assumptions are shown in Figures 1-4. From the graphs we can see that the
differences in withdrawal across the monthly, weekly and daily patterns are small. Is this difference
statistically significant? This was checked with a one-sample t-test, which found no significant
difference in withdrawal across the data points of the monthly, weekly and daily patterns. So there is
no seasonality in the monthly, weekly or daily pattern.
The next question is whether there is any trend from one period to the next in the time series data. We
can clearly see from the graph that there is an increasing trend from the last quarter of 2010 into 2011.
Figure 1. Monthly withdrawal pattern in a year: withdrawal (in Rupees) by month, for 2008-2011
Figure 2. Weekly withdrawal pattern in a year: withdrawal (in Rupees) by week (1-52), for 2008-2011
Figure 3. Weekly pattern of successive months (2011): withdrawal (in Rupees) by week (1-4), for Apr-Sep 2011
Figure 4. Daily pattern in 2011: withdrawal (in Rupees) by day of week, for the weeks from 20 Aug to 30 Sep 2011
2.2 Forecasting Using Time Series Methods
2.2.1 Exponential Smoothing
Exponential smoothing is a technique that can be applied to time series data, either to produce smoothed
data for presentation, or to make forecasts. Exponential smoothing assigns exponentially decreasing
weights over time. Exponential smoothing is commonly applied to financial market and economic data,
but it can be used with any discrete set of repeated measurements.
The raw data sequence is often represented by {x_t}, and the output of the exponential smoothing
algorithm is commonly written as {s_t}, which may be regarded as a best estimate of what the next value
of x will be. When the sequence of observations begins at time t = 0, the simplest form of exponential
smoothing is given by the formulae:
s_0 = x_0
s_t = α x_t + (1 − α) s_{t−1}, t > 0
where α is the smoothing factor, and 0 < α < 1. Values of α close to one have less of a smoothing effect
and give greater weight to recent changes in the data, while values of α closer to zero have a greater
smoothing effect and are less responsive to recent changes.
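The recursion can be sketched in a few lines of R, the language used later in this report. This is an illustrative implementation with made-up numbers, not the report's own code:

```r
# Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}
ses <- function(x, alpha) {
  s <- numeric(length(x))
  s[1] <- x[1]                      # initialise with the first observation
  for (t in 2:length(x)) {
    s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  }
  s
}

x <- c(100, 120, 110, 130, 125)    # hypothetical daily withdrawals
ses(x, alpha = 0.5)                # -> 100 110 110 120 122.5
```

The one-step-ahead forecast is simply the last smoothed value.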
2.2.2 Double Exponential Smoothing
Simple exponential smoothing does not do well when there is a trend in the data. In such situations,
"double exponential smoothing" or "second-order exponential smoothing" is useful. Again, the raw data
sequence of observations is represented by {x_t}, beginning at time t = 0. We use {s_t} to represent the
smoothed value for time t, and {b_t} is our best estimate of the trend at time t. The output of the
algorithm is now written as F_{t+m}, an estimate of the value of x at time t + m, m > 0, based on the
raw data up to time t. Double exponential smoothing is given by the formulas
s_1 = x_1
b_1 = x_1 − x_0
and for t > 1 by
s_t = α x_t + (1 − α)(s_{t−1} + b_{t−1})
b_t = β (s_t − s_{t−1}) + (1 − β) b_{t−1}
where α is the data smoothing factor, 0 < α < 1, and β is the trend smoothing factor, 0 < β < 1.
To forecast beyond x_t:
F_{t+m} = s_t + m b_t
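The update equations above translate into R as follows. This is an illustrative sketch with hypothetical data; initialisation conventions for the first level and trend vary:

```r
# Double exponential smoothing (Holt's linear method):
#   s_t = alpha*x_t + (1-alpha)*(s_{t-1} + b_{t-1})
#   b_t = beta*(s_t - s_{t-1}) + (1-beta)*b_{t-1}
des <- function(x, alpha, beta) {
  n <- length(x)
  s <- numeric(n); b <- numeric(n)
  s[1] <- x[1]
  b[1] <- x[2] - x[1]               # initial trend from the first two points
  for (t in 2:n) {
    s[t] <- alpha * x[t] + (1 - alpha) * (s[t - 1] + b[t - 1])
    b[t] <- beta * (s[t] - s[t - 1]) + (1 - beta) * b[t - 1]
  }
  list(level = s, trend = b,
       forecast = function(m) s[n] + m * b[n])  # F_{t+m} = s_t + m*b_t
}

fit <- des(c(10, 12, 14, 16), alpha = 0.5, beta = 0.5)
fit$forecast(1)                     # -> 18 (the +2-per-period trend is extrapolated)
```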
2.2.3 Holt-Winters Method
The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations: one for
the level s_t, one for the trend b_t and one for the seasonal component c_t, with smoothing parameters
α, β and γ. We use L to denote the number of periods in a cycle (12 for months of a year, 7 for days of
a week, and so on).
The point forecast made in time period t for period t + h is calculated as:
F_{t+h} = s_t + h b_t + c_{t+h−L}   (h = 1, 2, …)   … (1)
The smoothing equations:
s_t = α (x_t − c_{t−L}) + (1 − α)(s_{t−1} + b_{t−1})   … (2)
b_t = β (s_t − s_{t−1}) + (1 − β) b_{t−1}   … (3)
c_t = γ (x_t − s_t) + (1 − γ) c_{t−L}   … (4)
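A minimal additive Holt-Winters update loop mirroring equations (1)-(4). This is an illustrative sketch in which the initial level, trend and seasonal indices are assumed to be given; in practice (e.g. in R's HoltWinters function) they are estimated from the first cycles of data:

```r
# Additive Holt-Winters updates; ci[t] stores the seasonal index c_{t-L},
# so ci[t + L] holds the newly updated c_t from equation (4).
hw_additive <- function(x, L, alpha, beta, gamma, s0, b0, c0) {
  n <- length(x)
  s <- numeric(n); b <- numeric(n); ci <- numeric(n + L)
  ci[1:L] <- c0                       # seasonal indices for the first cycle
  s_prev <- s0; b_prev <- b0
  for (t in 1:n) {
    s[t] <- alpha * (x[t] - ci[t]) + (1 - alpha) * (s_prev + b_prev)   # eq (2)
    b[t] <- beta * (s[t] - s_prev) + (1 - beta) * b_prev               # eq (3)
    ci[t + L] <- gamma * (x[t] - s[t]) + (1 - gamma) * ci[t]           # eq (4)
    s_prev <- s[t]; b_prev <- b[t]
  }
  # h-step-ahead point forecast, eq (1): F_{t+h} = s_t + h*b_t + c_{t+h-L}
  list(level = s, trend = b, seasonal = ci,
       forecast = function(h) s[n] + h * b[n] + ci[n + ((h - 1) %% L) + 1])
}

obs <- c(12, 18, 13, 19, 14, 20)      # hypothetical series with period L = 2
fit <- hw_additive(obs, L = 2, alpha = 0.3, beta = 0.1, gamma = 0.2,
                   s0 = 15, b0 = 0.5, c0 = c(-3, 3))
fit$forecast(1)                       # next-period point forecast
```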
2.2.4 ARIMA Model
An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive
moving average (ARMA) model. These models are fitted to time series data either to better understand
the data or to predict future points in the series (forecasting).
The AR part of a time series Y_t means that the observed value y_t depends on a linear combination of
previously observed values up to a defined lag (denoted p), plus a random error term ε_t. The AR(p)
model can be expressed algebraically as:
y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + … + φ_p y_{t−p} + ε_t
The MA part of a time series Y_t means that the observed value y_t is a random error term plus a linear
combination of previous random error terms up to a defined lag (denoted q). The MA(q) model can be
expressed algebraically as:
y_t = ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + … + θ_q ε_{t−q}
If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal
ARIMA model. The equation can then be written as:
(1 − φ_1 B − φ_2 B^2 − … − φ_p B^p) (1 − B)^d y_t = c + (1 + θ_1 B + θ_2 B^2 + … + θ_q B^q) ε_t
where the first factor is the AR(p) part, (1 − B)^d applies d differences, and the factor on the right
is the MA(q) part. B is the backward shift operator (shifting the data back one period).
A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA models we have
seen so far. It is written as:
ARIMA(p, d, q)(P, D, Q)_L
where (p, d, q) is the non-seasonal part of the model, (P, D, Q) is the seasonal part, and L denotes
the number of periods in a cycle (12 for months of a year, 7 for days of a week, and so on).
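As an illustrative sketch, a non-seasonal ARIMA(1,1,1) can be fitted with base R's arima() on simulated data. The report itself uses auto.arima() from the forecast package, which chooses p, d and q automatically:

```r
set.seed(123)
# simulate 200 points from an ARIMA(1,1,1) process with phi_1 = 0.6, theta_1 = 0.3
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.6, ma = 0.3), n = 200)

fit <- arima(y, order = c(1, 1, 1))  # fit the same model structure
coef(fit)                            # estimates of phi_1 and theta_1
predict(fit, n.ahead = 3)$pred       # three-step-ahead point forecasts
```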
Flow chart for ARIMA model
2.3 Validity of Forecast
The forecast intervals for the models are based on the assumptions that the residuals are uncorrelated
and normally distributed. If either of these assumptions does not hold, the forecast intervals may be
incorrect. For this reason, always plot the ACF and a histogram of the residuals to check the
assumptions before producing forecast intervals. The validity of the forecast is checked using the
following methods:
2.3.1 Mean Absolute Percentage Error (MAPE)
MAPE is a measure of the accuracy of a method for constructing fitted time series values, used
specifically in trend estimation. It expresses accuracy as a percentage, and is defined by the formula:
MAPE = (100 / n) Σ_{t=1..n} |(A_t − F_t) / A_t|
where A_t is the actual value and F_t is the forecast value. The absolute value in this calculation is
summed for every fitted or forecast point in time and divided by the number of fitted points n.
Multiplying by 100 makes it a percentage error. In general a MAPE of 10% is considered very good, but
it normally ranges between 20% and 30%.
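The formula translates directly into R. Using the actuals and the simple exponential smoothing forecasts from the monthly R results table in Section 5 reproduces the MAPE reported there:

```r
# MAPE = (100/n) * sum of |(A_t - F_t) / A_t|
mape <- function(actual, forecast) {
  100 * mean(abs((actual - forecast) / actual))
}

# Jul-Sep 2011 actuals and Fore_ES values from the monthly results table
actual  <- c(18614300, 20450533.33, 18022300)
fore_es <- c(18469116.53, 18469116.53, 18469116.53)
round(mape(actual, fore_es), 2)   # -> 4.32, matching the table's 4.316008748
```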
2.3.2 Autocorrelation Function
The residuals should be random and independent; that is, they should not be autocorrelated.
2.3.3 Ljung-Box Test
The Ljung-Box test is a statistical test of whether any of a group of autocorrelations of a time series
are different from zero. It can be defined as follows:
H0: The data are independently distributed (i.e. the correlations in the population from which the
sample is taken are 0, so that any observed correlations in the data result from randomness of the
sampling process).
Ha: The data are not independently distributed.
The test statistic is:
Q = n (n + 2) Σ_{k=1..h} ρ̂_k² / (n − k)
where n is the sample size, ρ̂_k is the sample autocorrelation at lag k, and h is the number of lags
being tested. For significance level α, the critical region for rejection of the hypothesis of
randomness is
Q > χ²_{1−α, h}
where χ²_{1−α, h} is the (1 − α)-quantile of the chi-squared distribution with h degrees of freedom.
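The statistic Q can be computed by hand and checked against R's built-in Box.test(), which is the function this report applies to model residuals. The white-noise series here is simulated purely for illustration:

```r
# Ljung-Box statistic: Q = n*(n+2) * sum over k of rho_k^2 / (n - k), k = 1..h
lb_stat <- function(x, h) {
  n <- length(x)
  rho <- acf(x, lag.max = h, plot = FALSE)$acf[-1]  # drop the lag-0 value
  n * (n + 2) * sum(rho^2 / (n - seq_len(h)))
}

set.seed(42)
e <- rnorm(100)                                  # simulated white-noise "residuals"
bt <- Box.test(e, lag = 20, type = "Ljung-Box")

all.equal(lb_stat(e, 20), unname(bt$statistic))  # the two computations agree
bt$p.value   # a large p-value gives no evidence against independent residuals
```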
2.3.4 Check Normal Distribution
For white noise, the residuals should be normally distributed. This is checked by plotting a histogram
of the residuals and fitting a normal distribution curve over it.
3 Modeling Procedure in Excel
In Excel, forecasting was done using the Simple Exponential model, the Double Exponential Smoothing
model and the Holt-Winters model. The modeling procedure is approximately the same for all these
models, as follows:
1. Find initial values for the level, the trend and the seasonal factors according to the selected
model.
2. Calculate the new estimates of level, trend and seasonality using the equations above, according
to the selected model.
3. Calculate the point forecast value using these estimates.
4. Now we have both the actual and the forecast value for the historical data, from which we can
calculate the absolute error, i.e. the absolute difference between the actual and forecast values.
5. Determine the smoothing constants α, β and γ that minimize the mean absolute error (MAE) using
Excel Solver.
6. Check the validity of the forecast results using the autocorrelation function, the Ljung-Box test
and a normal distribution plot of the residuals in R.
7. After that we can make point forecasts for future periods.
4 Modeling Procedure in R
In R, forecasting was done using the Simple Exponential model, the Double Exponential Smoothing model,
the Holt-Winters model and ARIMA. The modeling procedure is approximately the same for all these
models, as follows.
Once you have installed R on a Windows computer, follow the steps below:
1. Once you have started R, install the required packages (XLConnect and forecast) by choosing
“Install package(s)” from the “Packages” menu at the top of the R console.
2. Load the libraries, i.e. XLConnect and forecast.
3. Read the time series from the workbook using the loadWorkbook( ) function from the XLConnect
package.
4. Apply a t-test to the data to check whether there is any significant difference in withdrawal
across the different data points.
5. Convert the data into a time series object, first assuming no seasonality in the data, i.e.
frequency = 1.
6. Make a forecast using the simple exponential model via the HoltWinters( ) function, setting beta
and gamma to FALSE because simple exponential smoothing has no trend and no seasonality.
7. Check the validity of the forecast using acf( ), Box.test( ) and a normal distribution plot of
the residuals.
8. Make forecasts for further periods using the forecast.HoltWinters( ) function for the required
number of periods.
9. Apply the Double Exponential Smoothing and Holt-Winters models similarly, by adjusting the beta
and gamma values.
10. For forecasting with the ARIMA model, use the auto.arima( ) function from the forecast package,
then repeat steps 7 and 8.
The R code for monthly forecasting is given in the Appendix; the code for weekly and daily forecasting
is the same with a few changes.
5 Interpretation of Results
1. On Monthly Basis withdrawal
Taking Jan 2008 to Jun 2011 as historical data, we forecast the next 3 months (Jul-Sep 2011).
We can see from the graph in Figure 1 that there is an increasing trend from 2008 to 2011.
Forecast Results using Excel
Month           Actual         Fore_HW        Fore_DES   Fore_ES
JUL             18614300       18448082.16    18877299   17963806.7
AUG             20450533.33    18447689.65    19111582   17963806.7
SEP             18022300       20820396.24    19345865   17963806.7
MAPE                           7.01           7.39       7.18
ACF                            passed         passed     passed
Ljung-Box Test                 passed         passed     passed
Normal Dist.                   Followed       Followed   Followed
Forecast Results using R
Month Actual fore_ES fore_DES fore_HW fore_ARIMA
Jul 18614300 18469116.53 19002892.62 19042659.42 18809953
Aug 20450533.33 18469116.53 19255709.7 20513614.89 19019292
Sep 18022300 18469116.53 19508526.77 21814274.84 19228630
MAPE 4.316008748 5.540136923 7.339768224 4.511494
ACF passed passed failed passed
Ljung-Box Test passed passed passed passed
Normal Dist. Followed Followed Followed Followed
2. On a weekly basis:
We can forecast on a weekly basis using the two methods below:
a) By considering 52 seasons in a year.
b) By considering 4 seasons in successive months.
a) For forecasting on a weekly basis, we take week 1 of 2008 to week 36 of 2011 as historical data
and forecast the next 3 weeks (weeks 37-39 of 2011). We can see from the graph in Figure 2 that
there is an increasing trend from 2008 to 2011, and because of the transaction differences in
each week we take 52 seasons in a year.
Forecast Results using Excel
Week(2011) Actual Fore_ES Fore_DES Fore_HW
37 4795300 4347122 4389653 4272329
38 3068500 4347122 4390689 4526822
39 4272500 4347122 4391725 4506860
MAPE 17.58735 18.11297 21.30559
ACF failed failed passed
Ljung-Box Test failed failed passed
Normal Dist. followed followed followed
Forecast Results using R
Week(2011) Actual Fore_ES Fore_DES Fore_HW Fore_ARIMA
37 4795300 4360478 4399495 4710448 4766318.483
38 3068500 4360478 4402583 4686353 4644281.332
39 4272500 4360478 4405672 4772860 4109684.12
MAPE            17.7438    18.28257   22.06841   18.58954464
ACF             failed     failed     passed     passed
Ljung-Box Test  failed     failed     passed     passed
Normal Dist.    followed   followed   followed   followed
b) For forecasting on a weekly basis, we take week 1 of January 2008 to week 1 of September 2011 as
historical data and forecast the next 3 weeks (weeks 2-4 of September 2011). The pattern is shown
in Figure 3.
Forecast Results using Excel
Week(Sep 2011) Actual Fore_ES Fore_DES Fore_HW
2 5333700 4344150 4140268 4946206
3 3177700 4344150 4075195 4481201
4 4317000 4344150 4010122 4171006
MAPE 18.62969 19.24248 17.22237
ACF failed failed passed
Ljung-Box Test passed failed passed
Normal Dist. followed followed followed
Forecast Results using R
Week(Sep 2011) Actual Fore_ES Fore_DES Fore_HW Fore_ARIMA
2 5333700 4348899 4175056 5481334 4740220
3 3177700 4348899 4124223 4984934 4748460
4 4317000 4348899 4073390 3993368 4571047
MAPE            18.68649   19.05085   22.37902   22.1475
ACF             failed     failed     failed     passed
Ljung-Box Test  passed     passed     passed     passed
Normal Dist.    followed   followed   followed   followed
3. On a daily basis:
For forecasting on a daily basis, we take 1/1/2008 to 27/9/2011 as historical data and forecast the
next 3 days (28-30 Sep 2011). The pattern for the daily basis is shown in Figure 4.
Forecast Results using Excel
Day(2011) Actual Fore_ES Fore_DES Fore_HW
28/09/2011 617100 605149.3 589531.8 569801.9
29/09/2011 571400 605149.3 589861.9 599025.2
30/09/2011 470700 605149.3 590192 668046.8
MAPE            12.13557   11.02813   18.14182
ACF             failed     failed     passed
Ljung-Box Test  failed     passed     failed
Normal Dist.    followed   followed   followed
Forecast Results using R
Day(2011) Actual Fore_ES Fore_DES Fore_HW Fore_ARIMA
28/09/2011 617100 601719.2 585397.4 562390.5 631656.9
29/09/2011 571400 601719.2 585320.1 596371.8 619255.5
30/09/2011 470700 601719.2 585242.8 603665.4 617706.5
MAPE            11.87784   10.63602   13.8281    13.9885
ACF             failed     failed     failed     failed
Ljung-Box Test  failed     passed     failed     passed
Normal Dist.    followed   followed   followed   followed
6 Conclusions
The following conclusions are drawn on the basis of data obtained from a particular ATM; they will
vary from ATM to ATM depending on the position of the ATM and the socio-economic features of its users.
• For the UBI (Muland) ATM there is no significant difference across data points, so the Simple
Exponential Smoothing model, the Double Exponential Smoothing model and the ARIMA model all
forecast well.
• Forecasting in R gives upper and lower bounds, so based on the economic scenario we can choose a
bound value of cash for refilling the ATM. For example, if we know in advance that next week there
will be inflation, deflation or normal conditions, we can accordingly choose the lower bound, upper
bound or point forecast value of cash for refilling the ATM.
7 References
1. Avril Coghlan. A Little Book of R for Time Series. Available from
http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/.
2. ATM refilling costs up as withdrawal limit is hiked. DNA India.
http://www.dnaindia.com/money/1391225/report-atm-refilling-costs-up-as-withdrawal-limit-is-hiked
3. Joe Choong. Forecasting With MS Excel. McGraw-Hill; 1st edition (January 11, 1993).
4. Mirai Solutions GmbH. XLConnect: Excel Connector for R. Retrieved from
http://cran.r-project.org/web/packages/XLConnect/index.html, http://mirai-solutions.com.
5. Rob J Hyndman. forecast: Forecasting functions for time series and linear models. Retrieved from
http://cran.r-project.org/web/packages/forecast/index.html.
6. Rob J Hyndman and George Athanasopoulos. Forecasting: Principles and Practice. Available from
http://otexts.com/fpp/.
7. Ruey S. Tsay. An Introduction to Analysis of Financial Data with R. John Wiley & Sons Inc.,
Hoboken, New Jersey.
8 Appendix
R code:
#Using library
require(XLConnect)
library(forecast)
#Reading data from excel file
wb_m <- loadWorkbook("E:/IDRBT/R/Worksheet.xlsx", create = TRUE)
data = readWorksheet(wb_m,sheet = "Month",startRow = 3, endRow = 47,startCol
= 2, endCol = 3,header=TRUE)
data
#Checking Is there difference in withdrawal in each month
t.test(data$Amount,mu=mean(data$Amount))
#Converting data into time series frame assuming no seasonality
datats<-ts(data$Amount,frequency=1)
datats
plot.ts(datats)
#Forecasting using Simple Exponential Method
monthf_ES<-HoltWinters(datats,beta=FALSE,gamma=FALSE)
monthf_ES
plot(monthf_ES)
series_ES<-forecast.HoltWinters(monthf_ES,h=2)
series_ES
#Checking Validity of Exponential Smoothing forecast method
acf(series_ES$residuals, lag.max=20)
Box.test(series_ES$residuals, lag=20, type="Ljung-Box")
#writing into excel file
writeWorksheetToFile("E:/IDRBT/R/Worksheet.xlsx", data = series_ES,sheet =
"month_outn", startRow = 3, startCol = 1)
#Forecasting using Double Exponential Method
monthf_DES<-HoltWinters(datats,gamma=FALSE)
monthf_DES
head(monthf_DES)
plot(monthf_DES)
series_DES<-forecast.HoltWinters(monthf_DES,h=2)
series_DES
#Checking Validity of Double Exponential Smoothing forecast method
acf(series_DES$residuals, lag.max=20)
Box.test(series_DES$residuals, lag=20, type="Ljung-Box")
#writing into excel file
writeWorksheetToFile("E:/IDRBT/R/Worksheet.xlsx", data = series_DES,sheet =
"month_outn", startRow = 3, startCol = 8)
#Converting data into time series frame assuming seasonality
datats_s<-ts(data$Amount,frequency=12)
datats_s
plot.ts(datats_s)
#Forecasting using Holt-Winter Method
monthf_HW<-HoltWinters(datats_s)
monthf_HW
plot(monthf_HW)
series_HW<-forecast.HoltWinters(monthf_HW,h=2)
#Checking Validity of Holt Winter method
acf(series_HW$residuals, lag.max=20)
Box.test(series_HW$residuals, lag=20, type="Ljung-Box")
#writing into excel file
writeWorksheetToFile("E:/IDRBT/R/Worksheet.xlsx", data = series_HW,sheet =
"month_outn", startRow = 3, startCol = 15)
#Forecasting using Arima Model
fit <- auto.arima(datats)
fit
head(fit)
x<-forecast(fit,h=2)
x
plot(x)
#Checking Validity of forecast by ARIMA
acf(residuals(fit), lag.max=20)
Box.test(residuals(fit), lag=20, type="Ljung-Box")
writeWorksheetToFile("E:/IDRBT/R/Worksheet.xlsx", data = x,sheet =
"month_outn", startRow = 3, startCol = 21)
plotForecastErrors <- function(forecasterrors)
{
# make a histogram of the forecast errors:
mybinsize <- IQR(forecasterrors)/4
mysd <- sd(forecasterrors)
mymin <- min(forecasterrors) - mysd*5
mymax <- max(forecasterrors) + mysd*3
# generate normally distributed data with mean 0 and standard deviation mysd
mynorm <- rnorm(10000, mean=0, sd=mysd)
mymin2 <- min(mynorm)
mymax2 <- max(mynorm)
if (mymin2 < mymin) { mymin <- mymin2 }
if (mymax2 > mymax) { mymax <- mymax2 }
# make a red histogram of the forecast errors, with the normally distributed data overlaid:
mybins <- seq(mymin, mymax, mybinsize)
hist(forecasterrors, col="red", freq=FALSE, breaks=mybins)
# freq=FALSE ensures the area under the histogram = 1
# generate normally distributed data with mean 0 and standard deviation mysd
myhist <- hist(mynorm, plot=FALSE, breaks=mybins)
# plot the normal curve as a blue line on top of the histogram of forecast errors:
points(myhist$mids, myhist$density, type="l", col="blue", lwd=2)
}
plotForecastErrors(series_ES$residuals)
plotForecastErrors(series_DES$residuals)
plotForecastErrors(series_HW$residuals)
plotForecastErrors(residuals(fit))