
L-Stern Group    Ly Pham

Time Series Analysis – Google

I. Introduction:

This paper analyzes the time series of Google stock prices from 01/01/2007 to 07/18/2012 using an ARIMA-ARCH/GARCH model. The main goal is to find the model that best fits the Google stock price process and can therefore be used to predict future values. The first part of the paper establishes an ARIMA model based on the Box-Jenkins method and AICc. Next, we examine the residuals and the series' volatility in order to fit an ARCH/GARCH model to these residuals. The final part provides the forecast for the Google series and evaluates the overall performance of the selected model. R and Minitab are used for this analysis, and we compare the results from the two software packages later in the paper to see their pros and cons.

II. ARIMA model:

a. Model identification:

First, a log transformation is applied to linearize the series, since financial time series often show exponential growth. The transformation does not always matter: over some time periods the Price and Log Price plots and their characteristics are so similar that analyzing Price instead of Log Price makes no significant difference. However, by convention, we work with the Log Price series. Examining the ACF and PACF of Log Price indicates that the series needs to be differenced: the ACF decays slowly across all lags instead of dying down quickly or cutting off, while the PACF shows no significant lags. One advantage of differencing is that it lets us look at stock returns, because log changes in financial data are approximately equal to the percentage changes that represent returns. Most people are more interested in the distribution of returns than in the original series, since the latter is affected by the price level. The following are plots of Log Price with its ACF and PACF, and of Difference Log Price, from R:

 


The Difference Log Price plot suggests that returns are mean-reverting with roughly constant variance, i.e. the differenced series appears stationary. To check stationarity and roughly identify the order of the ARIMA model, we examine the ACF and PACF of this series:

• The upper graph displays the ACF of the differences of log Google, with no significant lags
• The lower graph plots the PACF of the differences of log Google, likewise showing no significant lags. The model for the differenced log Google series is thus white noise, and the original series resembles a random walk, ARIMA(0,1,0)
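The identification step above rests on two ideas: log differences approximate percentage returns, and a lag counts as "significant" when its sample autocorrelation falls outside roughly ±1.96/√N. The sketch below illustrates both in Python (the paper itself used R and Minitab). The price list is hypothetical and the helper names `log_returns`, `sample_acf` and `white_noise_bound` are introduced here only for illustration.

```python
import math

def log_returns(prices):
    """First differences of log prices: r_t = log(P_t) - log(P_{t-1})."""
    logs = [math.log(p) for p in prices]
    return [b - a for a, b in zip(logs, logs[1:])]

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def white_noise_bound(n, z=1.96):
    """Approximate 95% band for the ACF of white noise: +/- z / sqrt(n)."""
    return z / math.sqrt(n)

# Hypothetical prices, for illustration only (not the Google data).
prices = [100.0, 101.0, 99.5, 100.2, 102.0, 101.1, 100.7, 101.9]
r = log_returns(prices)
pct = [b / a - 1.0 for a, b in zip(prices, prices[1:])]  # simple returns
acf = sample_acf(r, 3)
bound = white_noise_bound(1396)  # N after differencing, as in the paper
```

With N = 1396 the band is about ±0.052, so autocorrelations of the differenced series smaller than that in absolute value are consistent with white noise.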

We check the AICc for this model and compare it with the AICc of other models in which p and q are less than or equal to 2. We cap p and q at 2 because the higher the orders, the less stable the model and the less accurate its forecasts. The following AICc values were computed manually from the sums of squares obtained from Minitab:

Model   N     p  q  SS with constant  AICc with constant  SS without constant  AICc without constant
0 1 0   1396  0  0  0.645862518       -4651.295876        0.645895711          -4653.270463
1 1 0   1396  1  0  0.644936          -4650.157601        0.644971             -4652.133327
0 1 1   1396  0  1  0.644925          -4650.167942        0.644961             -4652.142727
1 1 1   1396  1  1  0.644901          -4648.178989        0.644936             -4650.157601
0 1 2   1396  0  2  0.644907          -4648.173349        0.644943             -4650.151021
1 1 2   1396  1  2  0.644897          -4646.168341        0.644933             -4648.148907
2 1 0   1396  2  0  0.644911          -4648.169588        0.644947             -4650.147261
2 1 1   1396  2  1  0.644896          -4646.169281        0.644932             -4648.149847
2 1 2   1396  2  2  0.644926          -4646.141078

Note that Minitab cannot fit ARIMA(2,1,2) without constant to the series, so no AICc is available in that case. Additionally, Minitab does not fit the random walk model, because it has no autoregressive or moving average coefficients, so we need to compute its sum of squares manually. For ARIMA(0,1,0), the sum of squares is computed with the following formulas:

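• $SS = \sum_t y_t^2$ for ARIMA(0,1,0) without constant
• $SS = \sum_t (y_t - \bar{y})^2$ for ARIMA(0,1,0) with constant

Here $y_t$ is the differenced Log Price series and $\bar{y}$ is its average.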
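The manual AICc computation described above can be sketched as follows. The paper does not print the formula it used; the form below, N·log(SS/N) + 2(p+q+k+1)·N/(N−p−q−k−2) with k = 1 when a constant is included, is an assumption, and the tabulated values are reproduced when the logarithm is taken in base 10, which is an inference from the numbers rather than something the paper states. The function names are introduced here for illustration.

```python
import math

def aicc_from_ss(ss, n, p, q, with_constant, log=math.log10):
    """AICc from a residual sum of squares.

    Assumed form: n*log(ss/n) + 2*(p+q+k+1)*n / (n-p-q-k-2),
    with k = 1 if a constant is fitted and k = 0 otherwise.
    Base-10 log is the default because it matches the paper's table;
    pass log=math.log for the natural-log variant.
    """
    k = 1 if with_constant else 0
    return n * log(ss / n) + 2 * (p + q + k + 1) * n / (n - p - q - k - 2)

def sum_of_squares(y, with_constant):
    """SS of the differenced series y for ARIMA(0,1,0)."""
    center = sum(y) / len(y) if with_constant else 0.0
    return sum((v - center) ** 2 for v in y)

# Values taken from the Minitab-based table (N = 1396).
aicc_010_no_const = aicc_from_ss(0.645895711, 1396, 0, 0, False)
aicc_010_const = aicc_from_ss(0.645862518, 1396, 0, 0, True)
aicc_011_no_const = aicc_from_ss(0.644961, 1396, 0, 1, False)
```

Because the first term scales with the base of the logarithm while the penalty term does not, values computed this way are not directly comparable with likelihood-based AICc values from other software.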

In Minitab, when the sum of squares is computed with Calculator, it is not corrected for the constant, so the result is the sum of squares without constant. Some notes about whether to include a constant in an ARIMA model¹:

• d = 0: stationary model; the constant should always be included, and it is the mean of the series
• d = 1: the model has a constant average trend; the constant is included if the series shows any growth or deterministic trend. According to Box-Jenkins, when d > 0 the constant should not be included except for series showing a significant trend
• d = 2: the model has a time-varying trend; the constant should not be included

According to the output shown above, ARIMA(0,1,0) without constant has the lowest AICc and ARIMA(0,1,1) without constant ranks second. The following are the AICc values obtained directly from R by fitting ARIMA models to the Log Price series:

Model   AICc
0 1 0   -6755.48
1 1 0   -6755.47
0 1 1   -6755.5
1 1 1   -6753.54
0 1 2   -6753.52
1 1 2   -6751.52
2 1 0   -6753.52
2 1 1   -6751.51
2 1 2   -6749.49

Note that:

• R omits the mean when fitting an ARIMA model with differencing, so the AICc values obtained in R are those of the ARIMA models without constant

• The AICc values computed manually from Minitab output differ from those obtained in R, yet the model selection results are similar

Based on the information above, we select ARIMA(0,1,1) without constant as our model. Although the Minitab output suggests that ARIMA(0,1,0) without constant is the most suitable, the R output prefers ARIMA(0,1,1), and this model ranks second, just behind ARIMA(0,1,0), in Minitab. Therefore, we use ARIMA(0,1,1) for our main analysis and compare it with ARIMA(0,1,0) later in the paper.


b. Parameter estimation:

We fit the ARIMA model to the Log Price series in Minitab and R. The following is the output from Minitab:

Final Estimates of Parameters

Type    Coef    SE Coef  T     P
MA 1    0.0383  0.0268   1.43  0.153

Differencing: 1 regular difference
Number of observations: Original series 1397, after differencing 1396
Residuals: SS = 0.121647 (backforecasts excluded)
           MS = 0.000087  DF = 1395

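Full ARIMA(0,1,1) model: $(Y_t - Y_{t-1}) = \varepsilon_t - 0.0383\,\varepsilon_{t-1}$

Minitab writes the moving average term with a minus sign, so its MA 1 coefficient of 0.0383 enters the model as $-0.0383\,\varepsilon_{t-1}$. A quick look at the result from R, which writes the term with a plus sign and reports ma1 = -0.0383, shows that the two estimates agree: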

summary(arima.goog011)
Series: log.goog
ARIMA(0,1,1)

Coefficients:
          ma1
      -0.0383
s.e.   0.0269

sigma^2 estimated as 0.000462:  log likelihood=3379.75
AIC=-6755.5   AICc=-6755.5   BIC=-6745.02

In-sample error measures:
          ME         RMSE          MAE          MPE         MAPE
0.0001644413 0.0214872894 0.0144117023 0.0019870399 0.2335862764
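The information criteria in the R output above can be recovered from the reported log likelihood. The sketch below uses the standard definitions of AIC, AICc and BIC; counting two estimated parameters (ma1 and the innovation variance) and using N = 1396 observations after differencing are assumptions made here because they reproduce the printed values. The function name is hypothetical.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, AICc, BIC) from a maximized log likelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    # Small-sample correction added to AIC.
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, aicc, bic

# Log likelihood as printed by R for ARIMA(0,1,1); 2 parameters assumed.
aic, aicc, bic = information_criteria(3379.75, 2, 1396)
```

With a sample this large the correction term is under 0.01, which is why AIC and AICc print as the same rounded value.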

c. Diagnostic checking:

To check how well the model fits the Log Price series, we first look at the plot of the residuals:

• The residuals are mean-reverting, and there are clusters of volatility at some points
• The ACF and PACF have no significant lags; therefore, ARIMA(0,1,1) is an appropriate model


The following is the Ljung-Box test from Minitab for the ARIMA(0,1,1) model:

Modified Box-Pierce (Ljung-Box) Chi-Square statistic

Lag         12     24     36     48
Chi-Square  19.1   34.8   62.9   80.1
DF          11     23     35     47
P-Value     0.059  0.054  0.003  0.002

The result indicates that, at lags 12 and 24, we cannot reject the null hypothesis that the residual autocorrelations are jointly zero, because the p-values are greater than 0.05. However, these p-values are only slightly greater than 0.05, and at lags 36 and 48 the p-values are less than 0.05, so the null hypothesis is rejected there. The Ljung-Box result therefore suggests that ARIMA(0,1,1) needs to be modified to be more reliable and more accurate.
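As a rough illustration of the test above, the sketch below computes the Ljung-Box statistic Q = n(n+2) Σ r_k²/(n−k) and compares the Minitab statistics against 5% chi-square critical values. The critical values are standard table values hard-coded here, the degrees of freedom follow the Minitab output (lag minus one estimated MA parameter), and the function name is hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [v - mean for v in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

# Upper 5% chi-square critical values for the DF in the Minitab output.
CHI2_CRIT_95 = {11: 19.675, 23: 35.172, 35: 49.802, 47: 64.001}

# (statistic, DF) per lag, copied from the Minitab output.
minitab = {12: (19.1, 11), 24: (34.8, 23), 36: (62.9, 35), 48: (80.1, 47)}
reject = {lag: stat > CHI2_CRIT_95[df] for lag, (stat, df) in minitab.items()}
```

The comparison gives the same picture as the p-values: no rejection at lags 12 and 24, rejection at lags 36 and 48.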

In addition, the histogram and Q-Q plot of the residuals show that the residuals are not normally distributed:


III. ARCH/GARCH model:

Although the ACF and PACF of the residuals have no significant lags, the time series plot of the residuals shows some clusters of volatility, and the Ljung-Box test shows a weak rejection. The following are plots of the squared residuals and their ACF and PACF, obtained from R:

• The squared residuals plot clearly shows clusters of volatility
• The ACF still has many significant lags
• The PACF seems to cut off after lag 6, yet there are some significant lags later


The residuals therefore show some patterns that might be modeled. An ARCH/GARCH model is needed to capture the volatility of the series. As its name indicates, this method is concerned with the conditional variance of the series. The general form of ARCH(q) is:

$\varepsilon_t \mid \psi_{t-1} \sim N(0, h_t)$
$h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2$

We fit the ARCH model to the residuals of the previously selected ARIMA model, not to the original series or to the log or differenced log series, because we only want to model the noise of the ARIMA model. Next, we compute AICc for the ARCH/GARCH models:

Model        N     q   Log likelihood  AICc no const  AICc const
ARCH(0)      1396  0   3380.212        -6758.421131   -6756.415385
ARCH(1)      1396  1   3400.348        -6796.687385   -6794.678759
ARCH(2)      1396  2   3415.905        -6825.792759   -6823.781244
ARCH(3)      1396  3   3426.305        -6844.581244   -6842.566835
ARCH(4)      1396  4   3440.927        -6871.810835   -6869.793525
ARCH(5)      1396  5   3474.429        -6936.797525   -6934.777308
ARCH(6)      1396  6   3473.901        -6933.721308   -6931.698179
ARCH(7)      1396  7   3473.474        -6930.844179   -6928.81813
ARCH(8)      1396  8   3497.62         -6977.11013    -6975.081155
ARCH(9)      1396  9   3494.325        -6968.491155   -6966.459249
ARCH(10)     1396  10  3499.742        -6977.293249   -6975.258403
GARCH(1,1)   1396  2   3515.871        -7025.724759   -7023.713244

The table above gives AICc for both the constant and no-constant cases. The residuals used here are those obtained by fitting ARIMA(0,1,1) to the Log Price series in R. Note that from ARCH(8) onwards the model fails to converge, so the forecasting ability of those models is doubtful. Although GARCH(1,1) has the lowest AICc, its fit ends in false convergence, and it is therefore excluded. ARCH(5), which has the lowest AICc among the models that converge, is therefore the selected model.

For a quick comparison, the following are the results when the residuals of ARIMA(0,1,1) obtained from Minitab are used to fit ARCH/GARCH:

Model        N     q   Log likelihood  AICc no const  AICc const
ARCH(0)      1396  0   3379.774        -6757.545131   -6755.539385
ARCH(1)      1396  1   3398.722        -6793.435385   -6791.426759
ARCH(2)      1396  2   3413.205        -6820.392759   -6818.381244
ARCH(3)      1396  3   3423.473        -6838.917244   -6836.902835
ARCH(4)      1396  4   3437.674        -6865.304835   -6863.287525
ARCH(5)      1396  5   3471.289        -6930.517525   -6928.497308
ARCH(6)      1396  6   3471.223        -6928.365308   -6926.342179
ARCH(7)      1396  7   3470.351        -6924.598179   -6922.57213
ARCH(8)      1396  8   3494.238        -6970.34613    -6968.317155
ARCH(9)      1396  9   3398.722        -6777.285155   -6775.253249
ARCH(10)     1396  10  3497.242        -6972.293249   -6970.258403
GARCH(1,1)   1396  2   3513.862        -7021.706759   -7019.695244

The fits start failing to converge at ARCH(8); however, GARCH(1,1) still converges. The log likelihoods obtained by fitting ARCH/GARCH to the two residual series are only slightly different, which might be due to the specifications of each software package. Although GARCH(1,1) has the lowest AICc and should therefore be selected according to the Minitab residuals, ARCH(5) ranks second among the models that converge and is the model selected from the R output, so we use ARCH(5) as our selected model. We also include ARCH(0) in the analysis because it serves as a check on whether there are any ARCH effects or whether the residuals are independent. Note that R does not allow the order q = 0, so we cannot obtain the log likelihood for ARCH(0) from R; instead, we compute it with the formula⁷:

-0.5*N*(1 + log(2*pi*mean(x^2)))

N: number of observations after differencing, N = n - d
x: the data set under consideration (in this case, the residuals)
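The ARCH(0) log likelihood formula above, together with the AICc columns of the two ARCH tables, can be sketched in Python as follows. The AICc form used here, −2·loglik + 2k·N/(N−k−1) with k = q+1 parameters without constant and k = q+2 with constant, is inferred from the tabulated numbers and is not stated in the paper. The function names are hypothetical.

```python
import math

def arch0_loglik(x):
    """Gaussian log likelihood with constant variance mean(x^2)."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi * mean_sq))

def arch_aicc(loglik, n, q, with_constant):
    """AICc from a log likelihood (parameter count k is an assumption)."""
    k = q + (2 if with_constant else 1)
    return -2.0 * loglik + 2.0 * k * n / (n - k - 1)

# Rows copied from the table based on the R residuals (N = 1396).
aicc_arch0 = arch_aicc(3380.212, 1396, 0, False)
aicc_arch5 = arch_aicc(3474.429, 1396, 5, False)
aicc_garch11 = arch_aicc(3515.871, 1396, 2, False)  # table lists q = 2
```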

The output for ARCH(5):

summary(arch.goog05)

Call:
garch(x = resid.goog, order = c(0, 5))

Model:
GARCH(0,5)

Residuals:
     Min       1Q   Median       3Q      Max
-5.81050 -0.45289  0.01511  0.51736  8.27483

Coefficient(s):
    Estimate  Std. Error  t value  Pr(>|t|)
a0 1.956e-04   7.634e-06   25.617   < 2e-16 ***
a1 3.368e-02   1.878e-02    1.793   0.07294 .
a2 4.971e-02   1.908e-02    2.605   0.00918 **
a3 1.128e-01   2.428e-02    4.645  3.40e-06 ***
a4 1.299e-01   2.635e-02    4.928  8.29e-07 ***
a5 3.741e-01   2.329e-02   16.060   < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Diagnostic Tests:
        Jarque Bera Test

data:  Residuals
X-squared = 3877.3, df = 2, p-value < 2.2e-16

        Box-Ljung test

data:  Squared.Residuals
X-squared = 0.1016, df = 1, p-value = 0.7499

The p-values of all parameters are less than 0.05, except for a1 (p = 0.073), indicating that they are statistically significant. In addition, the p-value of the Box-Ljung test is greater than 0.05, so we cannot reject the null hypothesis that the squared residuals are uncorrelated. The model thus adequately represents the residuals.

Full ARCH(5) model:

$h_t = 1.956\times10^{-4} + 3.368\times10^{-2}\,\varepsilon_{t-1}^2 + 4.971\times10^{-2}\,\varepsilon_{t-2}^2 + 1.128\times10^{-1}\,\varepsilon_{t-3}^2 + 1.299\times10^{-1}\,\varepsilon_{t-4}^2 + 3.741\times10^{-1}\,\varepsilon_{t-5}^2$
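To make the fitted ARCH(5) equation concrete, the sketch below evaluates the conditional variance from the five most recent ARIMA residuals using the coefficients reported by R. The residual values in the example are hypothetical, chosen only to show the arithmetic, and the function name is introduced here for illustration.

```python
# Coefficients a0..a5 as reported in the R output for ARCH(5).
OMEGA = 1.956e-04
ALPHAS = [3.368e-02, 4.971e-02, 1.128e-01, 1.299e-01, 3.741e-01]

def arch_variance(recent_resid, omega=OMEGA, alphas=ALPHAS):
    """Conditional variance h_t.

    recent_resid must be ordered [eps_{t-1}, eps_{t-2}, ..., eps_{t-q}].
    """
    if len(recent_resid) != len(alphas):
        raise ValueError("need exactly one residual per ARCH coefficient")
    return omega + sum(a * e * e for a, e in zip(alphas, recent_resid))

# Hypothetical residuals, for illustration only.
h_example = arch_variance([0.01, -0.02, 0.0, 0.03, -0.01])
h_quiet = arch_variance([0.0] * 5)  # equals a0 when recent shocks are zero
```

Because a5 is by far the largest coefficient, a shock five days back moves the conditional variance more than a shock of the same size on the previous day.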

IV. ARIMA-ARCH/GARCH performance:

In this section, we compare the results from the ARIMA model and the combined ARIMA-ARCH/GARCH model. As selected earlier, the ARIMA and ARCH models for the Google Log Price series are ARIMA(0,1,1) without constant and ARCH(5), respectively. We also look at the results from Minitab and compare them with those from R. The 1-step forecast for the series under ARIMA(0,1,1):

     Point Forecast    Lo 95   Hi 95
1398       6.362642 6.320513 6.40477

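Full model of ARIMA(0,1,1) – ARCH(5), with the MA coefficient taken from the R fit (ma1 = -0.0383):

$(Y_t - Y_{t-1}) = -0.0383\,\varepsilon_{t-1} + 1.956\times10^{-4} + 3.368\times10^{-2}\,\varepsilon_{t-1}^2 + 4.971\times10^{-2}\,\varepsilon_{t-2}^2 + 1.128\times10^{-1}\,\varepsilon_{t-3}^2 + 1.299\times10^{-1}\,\varepsilon_{t-4}^2 + 3.741\times10^{-1}\,\varepsilon_{t-5}^2$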


The following table summarizes all models with their point forecasts and forecast intervals, compiled and computed in Excel:

Model                                            Forecast    Lower       Upper       Actual
ARIMA(0,1,1) in R                                6.362642    6.320513    6.40477     6.385295574
ARIMA(0,1,1) in Minitab (constant)               6.3628      6.32063     6.40497
ARIMA(0,1,1) in Minitab (no constant)            6.36260     6.32049     6.40479
ARIMA(0,1,1) + ARCH(5) in R                      6.36285539  6.32072639  6.40498339
ARIMA(0,1,1) in Minitab (constant) + ARCH(5)     6.36301339  6.32084339  6.40518339
ARIMA(0,1,1) in Minitab (no constant) + ARCH(5)  6.36281339  6.32070339  6.40500339

Converting Log Price to Price, we obtain the forecast for original series:

                                                              95% confidence interval
Model                                            Forecast     Lower        Upper        Actual
ARIMA(0,1,1) in R                                579.7761032  555.8580745  604.7226965  593.06
ARIMA(0,1,1) in Minitab (constant)               579.867715   555.9231137  604.8436531
ARIMA(0,1,1) in Minitab (no constant)            579.7517531  555.8452899  604.734791
ARIMA(0,1,1) + ARCH(5) in R                      579.8998335  555.9767005  604.8517506
ARIMA(0,1,1) in Minitab (constant) + ARCH(5)     579.9914649  556.0417535  604.9727331
ARIMA(0,1,1) in Minitab (no constant) + ARCH(5)  579.8754782  555.9639131  604.8638478

The actual price was obtained on 07/19/2012. At 593.06 it is about 13 dollars (roughly 2.3%) above the R point forecast of 579.78, yet still within the 95% confidence interval.
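The conversion from Log Price back to Price, and the check that the actual price lies inside the interval, can be sketched as follows. Treating the 1-step interval as point ± 1.96·sqrt(sigma²), with sigma² = 0.000462 from the R fit, is an assumption made here that reproduces the reported bounds. The function name is hypothetical.

```python
import math

def interval_from_sigma2(point, sigma2, z=1.96):
    """Symmetric 95% interval around a 1-step forecast on the log scale."""
    half_width = z * math.sqrt(sigma2)
    return point - half_width, point + half_width

# Values from the R output for ARIMA(0,1,1).
log_forecast = 6.362642
lo, hi = interval_from_sigma2(log_forecast, 0.000462)

# Back-transform from log price to price with exp().
price_forecast = math.exp(log_forecast)
price_lo, price_hi = math.exp(6.320513), math.exp(6.40477)
actual_price = math.exp(6.385295574)
inside = price_lo <= actual_price <= price_hi
```

Exponentiating the interval endpoints gives an interval that is not symmetric around the price forecast, which is visible in the price table (about 23.9 below and 24.9 above).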

The Log Price and conditional variances are plotted:

• The conditional variances plot successfully reflects the volatility of the time series over the entire period

• High volatility is closely related to the periods when the stock price tumbled

However, the point forecast is not very accurate, given the high volatility of Google stock.


 

The 95% forecast interval of Log price:

 


 

The final check on the model is to look at the Q-Q plot of the residuals of the ARIMA-ARCH model, that is, the standardized residuals $e_t = \varepsilon_t / \sqrt{h_t}$ (residuals divided by the square root of the conditional variance). The following is the Q-Q plot of the mixed model's residuals:

 


The residuals are clearly not normally distributed, so the mixed model does not adequately explain the Log Price series. Additionally, in 74 of the 1397 observations the 95% prediction interval constructed on the previous day fails to cover the current day's log price, a miss rate of about 5.3%.
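The two checks in this final step, standardizing the residuals and counting how often the interval misses, can be sketched as below. The inputs in the usage lines are hypothetical; only the 74-out-of-1397 count comes from the paper. The function names are introduced here for illustration.

```python
import math

def standardized_residuals(resid, cond_var):
    """e_t = eps_t / sqrt(h_t) for paired residuals and variances."""
    return [e / math.sqrt(h) for e, h in zip(resid, cond_var)]

def miss_rate(actual, lower, upper):
    """Count and share of observations outside [lower, upper]."""
    misses = sum(1 for y, lo, hi in zip(actual, lower, upper)
                 if not lo <= y <= hi)
    return misses, misses / len(actual)

# Hypothetical inputs, for illustration only.
e = standardized_residuals([0.02, -0.01], [0.0004, 0.0001])
count, rate = miss_rate([1, 2, 3, 4], [0, 0, 0, 0], [2, 1, 4, 3])

# Miss rate implied by the counts reported in the paper.
reported_rate = 74 / 1397
```

A miss rate near the nominal 5% indicates that the interval width is about right overall, even though the Q-Q plot shows the standardized residuals are not normal.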

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

© 2013 L-Stern Group. All Rights Reserved. The information contained herein is not represented or warranted to be accurate, correct, or complete. This report is for information purposes only, and should not be considered a solicitation to buy or sell any security. Redistribution is prohibited without written permission.