L-Stern Group
Ly Pham

Time Series Analysis – Google

I. Introduction:

This paper analyzes the time series of Google stock prices from 01/01/2007 to 07/18/2012 using an ARIMA-ARCH/GARCH model. The main goal is to find the model that best fits the Google stock price process and to use it to predict future values. The first part of the paper establishes an ARIMA model based on the Box-Jenkins method and AICc. Next, we examine the residuals and the series' volatility in order to fit an ARCH/GARCH model to these residuals. The final part provides the forecast for the Google series and evaluates the overall performance of the selected model. R and Minitab are used for this analysis, and we compare the results from the two software packages later in the paper to see their pros and cons.

II. ARIMA model:

a. Model identification:

First, a log transformation is applied to linearize the series, since financial time series often show exponential growth. This step is not essential for every series and time period: for some series the Price and Log Price plots and their characteristics are similar, so analyzing the Price series does not deviate significantly from analyzing Log Price. However, by convention, we work with the Log Price series. Examining the ACF & PACF of Log Price indicates that the series needs to be differenced. The ACF decreases slowly across all lags instead of dying down or cutting off. The PACF, on the other hand, shows no significant lags. One advantage of differencing is that the differenced log series can be read as stock returns, because log changes in financial data are approximately equal to percentage changes. Most people are more interested in the distribution of returns than in the original series, which is affected by the price level. The following are plots of Log Price with its ACF & PACF, and of the Differenced Log Price, from R:


The Differenced Log Price plot suggests that returns are mean-reverting with roughly constant variance, i.e. the differenced series appears stationary. To check stationarity and roughly identify the order of the ARIMA model, we examine the ACF & PACF of this series:

• The upper graph displays the ACF of the differenced log Google series, with no significant lags
• The lower graph plots the PACF of the differenced log Google series, which also shows no significant lags. The model for the differenced log Google series is thus white noise, and the model for the original series resembles a random walk, ARIMA(0,1,0)
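To make the link between differenced log prices and returns concrete, here is a minimal Python sketch. The price values are hypothetical and chosen only for illustration (they are not the Google data), and the function names are assumptions of this sketch.

```python
import math

def log_returns(prices):
    """First difference of the log series: log(P_t) - log(P_{t-1})."""
    logs = [math.log(p) for p in prices]
    return [b - a for a, b in zip(logs, logs[1:])]

def pct_changes(prices):
    """Simple percentage change: (P_t - P_{t-1}) / P_{t-1}."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

# Hypothetical prices, for illustration only (not the Google series).
prices = [570.0, 575.7, 572.8, 580.0]

for lr, pc in zip(log_returns(prices), pct_changes(prices)):
    # For small daily moves the two measures nearly coincide.
    print(f"log change = {lr:.5f}, percentage change = {pc:.5f}")
```

For daily moves of about one percent the two columns differ only in the fourth or fifth decimal place, which is why the differenced log series can be treated as a return series.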

We check the AICc of this model and compare it with those of other models in which p and q are at most 2. We cap p and q at 2 because higher orders tend to make the model less stable and the forecasts less accurate. The following AICc values are computed manually from the Sum of Squares (SS) obtained from Minitab:

Model    N     p  q  SS with constant  AICc with constant  SS without constant  AICc without constant
(0,1,0)  1396  0  0  0.645862518       -4651.295876        0.645895711          -4653.270463
(1,1,0)  1396  1  0  0.644936          -4650.157601        0.644971             -4652.133327
(0,1,1)  1396  0  1  0.644925          -4650.167942        0.644961             -4652.142727
(1,1,1)  1396  1  1  0.644901          -4648.178989        0.644936             -4650.157601
(0,1,2)  1396  0  2  0.644907          -4648.173349        0.644943             -4650.151021
(1,1,2)  1396  1  2  0.644897          -4646.168341        0.644933             -4648.148907
(2,1,0)  1396  2  0  0.644911          -4648.169588        0.644947             -4650.147261
(2,1,1)  1396  2  1  0.644896          -4646.169281        0.644932             -4648.149847
(2,1,2)  1396  2  2  0.644926          -4646.141078
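The paper does not state the formula behind the manual AICc values. The sketch below assumes AICc = N·log10(SS/N) + 2kN/(N − k − 1), with k = p + q + 1 and one more parameter when a constant is included. This assumed form (including the base-10 logarithm) reproduces the tabulated values to within rounding, but it is an inference from the numbers, not something the source confirms. The function name is hypothetical.

```python
import math

def aicc_from_ss(ss, n, p, q, constant=False):
    """AICc from a residual sum of squares (assumed formula, see text).

    k counts the AR and MA coefficients plus the noise variance,
    and one more parameter if a constant is fitted.
    """
    k = p + q + 1 + (1 if constant else 0)
    return n * math.log10(ss / n) + 2.0 * k * n / (n - k - 1)

# Values taken from the Minitab-based table (N = 1396 after differencing).
print(aicc_from_ss(0.645895711, 1396, 0, 0))                 # ARIMA(0,1,0), no constant
print(aicc_from_ss(0.645862518, 1396, 0, 0, constant=True))  # ARIMA(0,1,0), constant
print(aicc_from_ss(0.644961, 1396, 0, 1))                    # ARIMA(0,1,1), no constant
```

Because every model is scored with the same formula, the ranking does not depend on the logarithm base, even though the absolute values do.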

Note that Minitab cannot fit ARIMA(2,1,2) without a constant to the series, so no AICc is available in that case. Additionally, Minitab does not fit the random walk model because it has no autoregressive or moving average coefficients, so we need to compute its Sum of Squares manually. To compute the sum of squares for ARIMA(0,1,0), use the following formulas:

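• $SS = \sum_t y_t^2$ for ARIMA(0,1,0) without constant
• $SS = \sum_t (y_t - \bar{y})^2$ for ARIMA(0,1,0) with constant

Here $y_t$ is the differenced log series and $\bar{y}$ is its average.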
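The two sums of squares can be sketched in Python as follows. The input values are hypothetical differenced log prices used only to show the arithmetic, and the function names are assumptions of this sketch.

```python
def ss_without_constant(diffs):
    """Sum of squared differences: residual SS of a random walk with no drift."""
    return sum(y * y for y in diffs)

def ss_with_constant(diffs):
    """Sum of squared deviations from the mean: residual SS when a constant is fitted."""
    mean = sum(diffs) / len(diffs)
    return sum((y - mean) ** 2 for y in diffs)

# Hypothetical differenced log prices, for illustration only.
diffs = [0.01, -0.02, 0.03]
print(ss_without_constant(diffs), ss_with_constant(diffs))
```

The two quantities differ by N times the squared mean, which is why the "with constant" column of the table is always slightly smaller than the "without constant" column.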

In Minitab, a sum of squares computed with the Calculator is not corrected for the mean, so it is the sum of squares without constant. Some notes on whether to include a constant in an ARIMA model [1]:

• d = 0: stationary model; the constant should always be included, and it is the mean of the series
• d = 1: the model has a constant average trend; the constant is included if the series shows any growth or deterministic trend. According to Box-Jenkins, when d > 0 the constant should not be included unless the series shows a significant trend
• d = 2: the model has a time-varying trend; the constant should not be included

According to the output shown above, ARIMA(0,1,0) without constant has the lowest AICc and ARIMA(0,1,1) ranks second. The following AICc values are obtained directly from R by fitting ARIMA models to the Log Price series:

Model    AICc
(0,1,0)  -6755.48
(1,1,0)  -6755.47
(0,1,1)  -6755.5
(1,1,1)  -6753.54
(0,1,2)  -6753.52
(1,1,2)  -6751.52
(2,1,0)  -6753.52
(2,1,1)  -6751.51
(2,1,2)  -6749.49

Note that:

• R ignores the mean when fitting an ARIMA model with differencing, so the AICc values obtained in R are those for ARIMA models without constant

• The AICc values computed manually from the Minitab output differ from those obtained in R, yet the model selection results seem similar

Based on the information above, we select ARIMA(0,1,1) without constant as our model. Although the output from Minitab suggests that ARIMA(0,1,0) without constant is the most suitable, the result from R prefers ARIMA(0,1,1), and this model ranks second, just behind ARIMA(0,1,0), in Minitab. Therefore, we use ARIMA(0,1,1) for our main analysis and compare it with ARIMA(0,1,0) later in the paper.


b. Parameter estimation:

We fit the ARIMA model to the Log Price series in Minitab and R. The following is the output from Minitab:

Final Estimates of Parameters
Type   Coef    SE Coef  T     P
MA 1   0.0383  0.0268   1.43  0.153

Differencing: 1 regular difference
Number of observations: Original series 1397, after differencing 1396
Residuals: SS = 0.121647 (backforecasts excluded)
           MS = 0.000087  DF = 1395

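Full ARIMA(0,1,1) model: $Y_t - Y_{t-1} = \varepsilon_t - 0.0383\,\varepsilon_{t-1}$, where $Y_t$ is the Log Price. Minitab follows the Box-Jenkins convention of subtracting the MA term, so the reported coefficient of 0.0383 enters the model with a negative sign.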

A quick look at the result from R shows that the coefficient estimate has the same magnitude as the one obtained in Minitab (R reports it with the opposite sign):

summary(arima.goog011)
Series: log.goog
ARIMA(0,1,1)

Coefficients:
          ma1
      -0.0383
s.e.   0.0269

sigma^2 estimated as 0.000462:  log likelihood=3379.75
AIC=-6755.5   AICc=-6755.5   BIC=-6745.02

In-sample error measures:
          ME         RMSE          MAE          MPE         MAPE
0.0001644413 0.0214872894 0.0144117023 0.0019870399 0.2335862764

c. Diagnostic checking:

To check how well the model fits the Log Price series, we first look at the plot of residuals:

• The residuals are mean-reverting, and there are clusters of volatility at some points
• The ACF & PACF have no significant lags; therefore, ARIMA(0,1,1) is an appropriate model


The following is the Ljung-Box test from Minitab for the ARIMA(0,1,1) model:

Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag         12     24     36     48
Chi-Square  19.1   34.8   62.9   80.1
DF          11     23     35     47
P-Value     0.059  0.054  0.003  0.002

The result indicates that, for lags 12 and 24, we cannot reject the null hypothesis that the autocorrelations of the residuals are zero, because the p-values are greater than 0.05. However, these p-values are only slightly greater than 0.05, and for lags 36 and 48 the p-values are less than 0.05, so the null hypothesis is rejected there. The Ljung-Box result suggests that ARIMA(0,1,1) needs to be modified to be more reliable and more accurate.
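For reference, the Ljung-Box statistic itself is Q = n(n+2) Σ r_k² / (n − k), summed over lags k = 1..m, where r_k is the lag-k sample autocorrelation. The sketch below computes Q in plain Python on a hypothetical residual series; it does not compute the p-value, which requires the chi-square distribution. In the Minitab table the degrees of freedom equal the lag minus one, since one MA coefficient was estimated. Function names are assumptions of this sketch.

```python
def autocorr(x, k):
    """Lag-k sample autocorrelation of the series x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom

def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k)
                             for k in range(1, max_lag + 1))

# Hypothetical residuals, for illustration only.
resid = [0.012, -0.008, 0.003, -0.015, 0.020, -0.004, 0.007, -0.011, 0.001, 0.009]
print(ljung_box_q(resid, 3))
```

A large Q relative to the chi-square reference distribution indicates that the residuals still carry autocorrelation that the ARIMA model has not captured.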

In addition, the histogram and Q-Q plot of the residuals show that they are not normally distributed:


III. ARCH/GARCH model:

Although the ACF & PACF of the residuals have no significant lags, the time series plot of the residuals shows some clusters of volatility, and the Ljung-Box test rejects weakly. The following are plots of the Squared Residuals and their ACF & PACF obtained from R:

• The squared residuals plot clearly shows clusters of volatility
• The ACF still has many significant lags
• The PACF seems to cut off after lag 6, yet there are some significant lags later


The residuals therefore show some patterns that might be modeled. An ARCH/GARCH model is needed to capture the volatility of the series. As its name indicates, this method is concerned with the conditional variance of the series. The general form of ARCH(q) is:

$$\varepsilon_t \mid \psi_{t-1} \sim N(0, h_t), \qquad h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2$$

We fit ARCH to the residuals of the ARIMA model selected previously, not to the original series, the log series, or the differenced log series, because we only want to model the noise of the ARIMA model. Next, we compute AICc for the ARCH/GARCH models:

Model        N     q   Log likelihood  AICc no const  AICc const
ARCH(0)      1396  0   3380.212        -6758.421131   -6756.415385
ARCH(1)      1396  1   3400.348        -6796.687385   -6794.678759
ARCH(2)      1396  2   3415.905        -6825.792759   -6823.781244
ARCH(3)      1396  3   3426.305        -6844.581244   -6842.566835
ARCH(4)      1396  4   3440.927        -6871.810835   -6869.793525
ARCH(5)      1396  5   3474.429        -6936.797525   -6934.777308
ARCH(6)      1396  6   3473.901        -6933.721308   -6931.698179
ARCH(7)      1396  7   3473.474        -6930.844179   -6928.81813
ARCH(8)      1396  8   3497.62         -6977.11013    -6975.081155
ARCH(9)      1396  9   3494.325        -6968.491155   -6966.459249
ARCH(10)     1396  10  3499.742        -6977.293249   -6975.258403
GARCH(1, 1)  1396  2   3515.871        -7025.724759   -7023.713244

The table above gives AICc for both the constant and no-constant cases. The ARCH/GARCH models here were fitted to the residuals obtained by fitting ARIMA(0,1,1) to the Log Price series in R. Note that from ARCH(8) onwards the model fails to converge, so the forecasting ability of those models is in doubt. Although GARCH(1,1) has the lowest AICc, its fit ends in false convergence, so it is excluded. Therefore, ARCH(5) is the selected model. For comparison, the following results are obtained when the residuals of ARIMA(0,1,1) from Minitab are used to fit ARCH/GARCH:

Model        N     q   Log likelihood  AICc no const  AICc const
ARCH(0)      1396  0   3379.774        -6757.545131   -6755.539385
ARCH(1)      1396  1   3398.722        -6793.435385   -6791.426759
ARCH(2)      1396  2   3413.205        -6820.392759   -6818.381244
ARCH(3)      1396  3   3423.473        -6838.917244   -6836.902835
ARCH(4)      1396  4   3437.674        -6865.304835   -6863.287525
ARCH(5)      1396  5   3471.289        -6930.517525   -6928.497308
ARCH(6)      1396  6   3471.223        -6928.365308   -6926.342179
ARCH(7)      1396  7   3470.351        -6924.598179   -6922.57213
ARCH(8)      1396  8   3494.238        -6970.34613    -6968.317155
ARCH(9)      1396  9   3398.722        -6777.285155   -6775.253249
ARCH(10)     1396  10  3497.242        -6972.293249   -6970.258403
GARCH(1, 1)  1396  2   3513.862        -7021.706759   -7019.695244
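The AICc columns in the two ARCH/GARCH tables appear consistent with AICc = −2·logL + 2kN/(N − k − 1), where k = q + 1 without a constant and k = q + 2 with one. The paper does not state this formula, so the Python sketch below treats it as an assumption; it does reproduce the tabulated entries to within rounding. The function name is hypothetical.

```python
def aicc_from_loglik(loglik, n, q, constant=False):
    """AICc from a log likelihood (assumed formula, see text).

    q is the order column of the table (GARCH(1,1) is listed with q = 2).
    """
    k = q + 1 + (1 if constant else 0)
    return -2.0 * loglik + 2.0 * k * n / (n - k - 1)

# Log likelihoods taken from the table based on the R residuals (N = 1396).
print(aicc_from_loglik(3380.212, 1396, 0))                 # ARCH(0), no constant
print(aicc_from_loglik(3474.429, 1396, 5))                 # ARCH(5), no constant
print(aicc_from_loglik(3515.871, 1396, 2, constant=True))  # GARCH(1,1), constant
```

Each extra ARCH term costs roughly two AICc units, so a higher-order model is only preferred when its log likelihood rises by more than about one unit per added coefficient.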

With these residuals, the fits start failing to converge at ARCH(8); GARCH(1,1), however, still converges. The log likelihoods obtained by fitting ARCH/GARCH to the two residual series differ only slightly, which might be due to the specifications of each software package. Although GARCH(1,1) has the lowest AICc and would therefore be selected based on the Minitab residuals, ARCH(5) ranks second among the models that converge and is the model selected based on the R residuals, so we use ARCH(5) as our selected model. We also include ARCH(0) in the analysis because it serves as a check on whether there are any ARCH effects or the residuals are independent. Note that R does not allow order q = 0, so we cannot get the log likelihood for ARCH(0) from R; instead we compute it with the formula [7]:

-0.5 * N * (1 + log(2 * pi * mean(x^2)))

N: number of observations after differencing, N = n – d
x: the data set under consideration (in this case, the residuals)
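The ARCH(0) formula is the Gaussian log likelihood of independent zero-mean observations whose variance is set to mean(x^2). A minimal Python sketch, with hypothetical function names, evaluates the formula and checks it against a direct sum of normal log densities. As a rough plausibility check, plugging in N = 1396 and the rounded sigma^2 = 0.000462 from the R ARIMA output as a stand-in for mean(x^2) gives a value close to the tabulated ARCH(0) log likelihoods.

```python
import math

def arch0_loglik_from_moment(n, mean_sq):
    """-0.5 * N * (1 + log(2 * pi * mean(x^2))), given N and mean(x^2)."""
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi * mean_sq))

def arch0_loglik(x):
    """ARCH(0) log likelihood of a residual series x."""
    mean_sq = sum(v * v for v in x) / len(x)
    return arch0_loglik_from_moment(len(x), mean_sq)

def gaussian_loglik(x, variance):
    """Direct sum of N(0, variance) log densities, used as a cross-check."""
    return sum(-0.5 * math.log(2.0 * math.pi * variance) - v * v / (2.0 * variance)
               for v in x)

# Hypothetical residuals, for illustration only.
resid = [0.010, -0.020, 0.015, -0.005]
print(arch0_loglik(resid))

# Rounded sigma^2 from the R ARIMA output, used as a stand-in for mean(x^2).
print(arch0_loglik_from_moment(1396, 0.000462))
```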

The output for ARCH(5):

summary(arch.goog05)

Call:
garch(x = resid.goog, order = c(0, 5))

Model:
GARCH(0,5)

Residuals:
     Min        1Q   Median       3Q      Max
-5.81050  -0.45289  0.01511  0.51736  8.27483

Coefficient(s):
    Estimate   Std. Error  t value  Pr(>|t|)
a0  1.956e-04  7.634e-06   25.617   < 2e-16   ***
a1  3.368e-02  1.878e-02    1.793   0.07294   .
a2  4.971e-02  1.908e-02    2.605   0.00918   **
a3  1.128e-01  2.428e-02    4.645   3.40e-06  ***
a4  1.299e-01  2.635e-02    4.928   8.29e-07  ***
a5  3.741e-01  2.329e-02   16.060   < 2e-16   ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Diagnostic Tests:
Jarque Bera Test
data: Residuals
X-squared = 3877.3, df = 2, p-value < 2.2e-16

Box-Ljung test
data: Squared.Residuals
X-squared = 0.1016, df = 1, p-value = 0.7499

The p-values for all parameters except a1 (p = 0.07294) are less than 0.05, indicating that they are statistically significant. In addition, the p-value of the Box-Ljung test is greater than 0.05, so we cannot reject the null hypothesis that the autocorrelation of the squared residuals is zero. The model thus adequately represents the residuals.

Full ARCH(5) model:

$$h_t = 1.956\times10^{-4} + 3.368\times10^{-2}\,\varepsilon_{t-1}^2 + 4.971\times10^{-2}\,\varepsilon_{t-2}^2 + 1.128\times10^{-1}\,\varepsilon_{t-3}^2 + 1.299\times10^{-1}\,\varepsilon_{t-4}^2 + 3.741\times10^{-1}\,\varepsilon_{t-5}^2$$
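To show how the fitted variance equation responds to past shocks, the Python sketch below evaluates h_t with the coefficients from the ARCH(5) output. The residual values fed into it are hypothetical, and the function name is an assumption of this sketch.

```python
# Coefficients copied from the ARCH(5) output (a0, then a1 ... a5).
A0 = 1.956e-04
ALPHAS = [3.368e-02, 4.971e-02, 1.128e-01, 1.299e-01, 3.741e-01]

def arch5_variance(last_residuals):
    """Conditional variance h_t from the five most recent residuals.

    last_residuals is ordered [e_{t-1}, e_{t-2}, e_{t-3}, e_{t-4}, e_{t-5}].
    """
    if len(last_residuals) != len(ALPHAS):
        raise ValueError("exactly five residuals are required")
    return A0 + sum(a * e * e for a, e in zip(ALPHAS, last_residuals))

# Hypothetical residuals, for illustration only.
calm = [0.0, 0.0, 0.0, 0.0, 0.0]     # no recent shocks
shock = [0.0, 0.0, 0.0, 0.0, 0.05]   # a 0.05 log-price shock five days earlier

print(arch5_variance(calm))   # equals the intercept a0
print(arch5_variance(shock))  # the lag-5 term dominates
```

Because a5 is by far the largest coefficient, a shock five periods back raises the conditional variance much more than an equally sized shock one period back.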

IV. ARIMA-ARCH/GARCH performance:

In this section, we compare the results from the ARIMA model and the combined ARIMA-ARCH/GARCH model. As selected earlier, the ARIMA and ARCH models for the Google Log Price series are ARIMA(0,1,1) without constant and ARCH(5), respectively. We also look at the results from Minitab and compare them with those from R. The 1-step forecast for the series under ARIMA(0,1,1):

      Point Forecast     Lo 95    Hi 95
1398        6.362642  6.320513  6.40477
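The 95% limits are consistent with the usual one-step interval, point forecast ± z·sqrt(sigma^2), using sigma^2 = 0.000462 from the R output. The Python sketch below reproduces them under that assumption; the function name is hypothetical, and the sketch does not claim to be how R computes the interval internally.

```python
import math
from statistics import NormalDist

def one_step_interval(point, sigma2, level=0.95):
    """Symmetric normal interval around a one-step point forecast."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    half_width = z * math.sqrt(sigma2)
    return point - half_width, point + half_width

# Point forecast and sigma^2 as reported in the R output.
lower, upper = one_step_interval(6.362642, 0.000462)
print(f"{lower:.6f} {upper:.6f}")
```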

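Full model of ARIMA(0,1,1) – ARCH(5), with the MA coefficient signed as in the R output (ma1 = -0.0383):

$$Y_t - Y_{t-1} = -0.0383\,\varepsilon_{t-1} + 1.956\times10^{-4} + 3.368\times10^{-2}\,\varepsilon_{t-1}^2 + 4.971\times10^{-2}\,\varepsilon_{t-2}^2 + 1.128\times10^{-1}\,\varepsilon_{t-3}^2 + 1.299\times10^{-1}\,\varepsilon_{t-4}^2 + 3.741\times10^{-1}\,\varepsilon_{t-5}^2$$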


The following table summarizes all models with their point forecasts and forecast intervals, edited and computed in Excel:

Model                                            Forecast    Lower       Upper       Actual
ARIMA(0,1,1) in R                                6.362642    6.320513    6.40477     6.385295574
ARIMA(0,1,1) in Minitab (constant)               6.3628      6.32063     6.40497
ARIMA(0,1,1) in Minitab (no constant)            6.36260     6.32049     6.40479
ARIMA(0,1,1) + ARCH(5) in R                      6.36285539  6.32072639  6.40498339
ARIMA(0,1,1) in Minitab (constant) + ARCH(5)     6.36301339  6.32084339  6.40518339
ARIMA(0,1,1) in Minitab (no constant) + ARCH(5)  6.36281339  6.32070339  6.40500339

Converting Log Price to Price, we obtain the forecast for the original series (Lower and Upper are the limits of the 95% confidence interval):

Model                                            Forecast     Lower        Upper        Actual
ARIMA(0,1,1) in R                                579.7761032  555.8580745  604.7226965  593.06
ARIMA(0,1,1) in Minitab (constant)               579.867715   555.9231137  604.8436531
ARIMA(0,1,1) in Minitab (no constant)            579.7517531  555.8452899  604.734791
ARIMA(0,1,1) + ARCH(5) in R                      579.8998335  555.9767005  604.8517506
ARIMA(0,1,1) in Minitab (constant) + ARCH(5)     579.9914649  556.0417535  604.9727331
ARIMA(0,1,1) in Minitab (no constant) + ARCH(5)  579.8754782  555.9639131  604.8638478

The actual price was observed on 07/19/2012. It is well off the point forecast, yet still within the 95% confidence interval.
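The conversion from Log Price to Price is a plain exponential. A short Python sketch using the R-based ARIMA(0,1,1) row of the tables (the function name is an assumption of this sketch):

```python
import math

def to_price(log_forecast, log_lower, log_upper):
    """Map a log-scale forecast and its interval back to the price scale."""
    return math.exp(log_forecast), math.exp(log_lower), math.exp(log_upper)

# ARIMA(0,1,1) in R: forecast, lower and upper limits on the log scale.
forecast, lower, upper = to_price(6.362642, 6.320513, 6.40477)
actual = math.exp(6.385295574)  # actual log price from the table

print(f"forecast={forecast:.2f}, interval=({lower:.2f}, {upper:.2f}), actual={actual:.2f}")
print("actual inside interval:", lower <= actual <= upper)
```

Because the exponential is monotone, the interval limits can be transformed directly, although the resulting price interval is no longer symmetric around the point forecast.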

The Log Price and the conditional variances are plotted:

• The conditional variances plot successfully reflects the volatility of the time series over the entire period

• High volatility is closely related to periods in which the stock price tumbled

However, the point forecast is not very good, given the high volatility of Google stock.


The 95% forecast interval of Log price:


The final check on the model is to look at the Q-Q plot of the residuals of the ARIMA-ARCH model, defined as $e_t = \varepsilon_t / \sqrt{h_t}$, i.e. Residuals / sqrt(Conditional variance). The following is the Q-Q plot of the mixed model's residuals:


The residuals are clearly not normally distributed, so the mixed model does not adequately explain the Log Price series. Additionally, out of 1397 observations, the 95% prediction interval constructed the day before fails to cover the current day's log price 74 times, which is around 5.3%.
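The coverage check can be sketched in Python as follows. The small data set is hypothetical and only shows the counting logic; the last line reproduces the 5.3% figure from the reported counts. The function name is an assumption of this sketch.

```python
def miss_rate(actuals, lowers, uppers):
    """Count how often the actual value falls outside its prediction interval."""
    misses = sum(1 for y, lo, hi in zip(actuals, lowers, uppers)
                 if not (lo <= y <= hi))
    return misses, misses / len(actuals)

# Hypothetical log prices and intervals, for illustration only.
actuals = [6.36, 6.39, 6.41, 6.30]
lowers = [6.32, 6.33, 6.35, 6.36]
uppers = [6.40, 6.41, 6.43, 6.44]
print(miss_rate(actuals, lowers, uppers))

# Counts reported in the text: 74 misses out of 1397 observations.
print(f"{100 * 74 / 1397:.1f}%")
```

A miss rate close to the nominal 5% suggests that the interval width is about right on average, even though the residuals are not normal.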

© 2013 L-Stern Group. All Rights Reserved. The information contained herein is not represented or warranted to be accurate, correct, or complete. This report is for information purposes only, and should not be considered a solicitation to buy or sell any security. Redistribution is prohibited without written permission.
