Unit root tests and Box-Jenkins
Anton Parlow
Lab session Econ710, UWM Econ Department
03/05/2010
Our plan
Introduction to time series
AR and MA-process
Box-Jenkins Method
Unit root tests
Short review of Stata
Finding the proper model
Unit root tests
Arima
Forecasting
Introduction
A time series is the outcome of a variable observed over time, e.g. annually, quarterly, monthly and so on. There are different ways to describe a series: does it have a trend, does it have a drift, or is it a random walk?
Example: Quarterly real GDP from 1947 to 2008
We want to explain GDP today with past values of GDP but have to find the proper model first.
AR and MA-process
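If GDP ($y_t$) depends only on its own (= auto) past values (= regressive), we have an autoregressive process:
$$y_t = \alpha + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3} + \dots + \phi_p y_{t-p} + \varepsilon_t$$
where $\alpha$ is a constant, the $\phi_i$ are the coefficients on the lags and $\varepsilon_t$ is the error term.
In general we call it an AR(p)-model, and if GDP depends only on one past realization (= lag), it is an AR(1)-process:
$$y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$$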
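A small worked step that the slides do not spell out: the long-run mean of an AR(1). This is only a sketch, assuming the process is stationary (the lag coefficient, written $\phi_1$ here, satisfies $|\phi_1| < 1$), that the constant is written $\alpha$, and that the error $\varepsilon_t$ has mean zero. Taking expectations on both sides and using $E[y_t] = E[y_{t-1}] = \mu$:

```latex
\begin{align*}
E[y_t] &= \alpha + \phi_1 E[y_{t-1}] + E[\varepsilon_t] \\
\mu &= \alpha + \phi_1 \mu \\
\mu &= \frac{\alpha}{1 - \phi_1}
\end{align*}
```

For example, $\alpha = 1$ and $\phi_1 = 0.5$ give $\mu = 2$. As $\phi_1$ approaches 1 the mean is no longer defined, which is the unit-root case discussed later.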
AR and MA-process continued
If a variable depends only on past realizations of its own error terms, we have a moving average process:
$$y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \theta_3 \varepsilon_{t-3} + \dots + \theta_q \varepsilon_{t-q}$$
In general we call it an MA(q)-model, and if it depends only on one past error term, it is an MA(1)-process:
$$y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$
The error term is sometimes called a white-noise process, or well-behaved: $E[\varepsilon_t] = 0$, $\mathrm{Var}(\varepsilon_t) = \sigma^2$, and the errors are iid (= independently and identically distributed).
It is a bit hard to find examples for this, so let us focus on AR-processes today!
AR and MA-process continued
In general these two models are special cases of an ARMA(p,q)-model, where p = order of the AR-process and q = order of the MA-process.
Examples:
ARMA(1,0) = AR(1)-process: $y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$
ARMA(0,1) = MA(1)-process: $y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$
ARMA(1,1) = AR(1) and MA(1) in one model:
$y_t = \alpha + \phi_1 y_{t-1} + \theta_1 \varepsilon_{t-1} + \varepsilon_t$
If you see an ARIMA(p,I,q)-model, the I stands for integrated, i.e. how often the series has to be differenced before it is stationary (see unit-root tests). If I=0, or I(0), the time series is already stationary. If I=1, or I(1), it is stationary after first differencing, and so on.
AR and MA-process continued
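Sometimes it is convenient to write these models in lag-operator notation, with $L$ for one lag, $L^2$ for two lags and so on.
Example: $y_t = \alpha + \phi_1 y_{t-1} + \varepsilon_t$ becomes $y_t = \alpha + \phi_1 L y_t + \varepsilon_t$,
so that $L y_t = y_{t-1}$, $L^2 y_t = y_{t-2}$, $L^3 y_t = y_{t-3}$ and so on.
Example ARMA(1,1) in L-notation (no constant; the MA coefficient enters with a minus sign here):
$$y_t = \frac{[1 - \theta_1 L]\,\varepsilon_t}{[1 - \phi_1 L]}$$
$y_t [1 - \phi_1 L] = [1 - \theta_1 L]\,\varepsilon_t$; open the brackets: $y_t - \phi_1 L y_t = \varepsilon_t - \theta_1 L \varepsilon_t$
$y_t = \phi_1 L y_t + \varepsilon_t - \theta_1 L \varepsilon_t$; finally: $y_t = \phi_1 y_{t-1} + \varepsilon_t - \theta_1 \varepsilon_{t-1}$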
AR and MA-process continued
How do we figure out the process describing a time series? Use the autocorrelation function ACF (= correlation between the series and its own past realizations) and the partial autocorrelation function PACF. See Hamilton, chapter 3, for a very good step-by-step derivation of these.
Take a look at these and decide. Time-series modeling is often referred to as an art (actually, empirical work in general), meaning two economists can tell you different things when they look at the same functions.
Remember, the ACF and PACF are pretty much opposite to each other when we talk about AR- and MA-processes. An AR-process has an (exponentially) declining ACF and spikes in the PACF. An MA-process has spikes in the ACF and an (exponentially) declining PACF. CONFUSED??? See some examples next.
AR and MA-process continued
Example AR(1):
Example AR(2):
AR and MA-process continued
Example MA(1):
Example MA(2):
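The ACF/PACF patterns described for AR- and MA-processes can be reproduced on simulated data. The following is only a sketch: the series names ar1 and ma1, the coefficient 0.7, the seed and the sample length of 248 quarters are all made up for illustration and are not part of the lab data.

```stata
* Sketch: ACF and PACF of a simulated AR(1) and a simulated MA(1)
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()

* AR(1) with coefficient 0.7, built up recursively
gen ar1 = e in 1
replace ar1 = 0.7*ar1[_n-1] + e in 2/L

* MA(1) with coefficient 0.7
gen ma1 = e in 1
replace ma1 = e + 0.7*e[_n-1] in 2/L

* corrgram tabulates the ACF and the PACF side by side
corrgram ar1, lags(20)
corrgram ma1, lags(20)
```

For ar1 one would expect a slowly declining AC column and a single large PAC value at lag 1; for ma1 the roles are reversed.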
AR and MA-process continued
It is much more fun if you have AR- and MA-terms in your model, e.g. ARMA(1,1):
Another way to find the underlying process is to use information criteria like AIC or BIC (also called SIC), which are part of the regression output in EViews but not in Stata (calculating them by hand is a lot of fun). E.g. start with AR(0), then AR(1), AR(2), ... and calculate the information criteria for each model. A trick: use estat ic after the estimation.
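The model-by-model comparison of information criteria can be sketched with a loop. This is an illustration under assumptions: the data are a simulated AR(2) (not the lab's GDP series), the models are fitted with arima, and the stored-estimate names ar1 to ar4 are made up.

```stata
* Sketch: compare AR(1) to AR(4) by AIC/BIC on a simulated AR(2) series
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1/2
replace y = 0.5*y[_n-1] + 0.3*y[_n-2] + e in 3/L

* Fit each order and store the results under a name
forvalues p = 1/4 {
    quietly arima y, ar(1/`p')
    estimates store ar`p'
}

* One table with log likelihood, AIC and BIC for all stored models
estimates stats ar1 ar2 ar3 ar4
```

The model with the smallest AIC or BIC is the preferred one; estat ic directly after a single estimation reports the same numbers for that model only.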
Unit root tests
If a time series is stationary, regression results are not spurious (or screwed up). This means most of the time we want the series to be stationary (not needed if you do error-correction models).
The problem is that most macroeconomic time series, like GDP, unemployment, trade and many more, are non-stationary (= contain a unit root): they do not go back to their mean, and their variance is not constant (it actually increases over time). More formally, a series is stationary when the errors are:
1. $E(\varepsilon_t) = 0$
2. $\mathrm{var}(\varepsilon_t) = \sigma^2$, i.e. constant
3. $E(\varepsilon_t \varepsilon_{t-1}) = 0$, i.e. the error terms are not (serially) correlated
In other words: the errors are well-behaved, or white noise. A non-stationary time series has the opposite properties!
Unit root tests continued
Or, if we use $y_t$ instead, a time series is stationary when:
1. $E(y_t) = \mu$: the mean is constant and does not depend on time
2. $E[(y_t - \mu)(y_{t-j} - \mu)] = \gamma_j$: the autocovariance is independent of time too!
This means we have to test for non-stationarity, which is done using unit root tests, the most common being the Dickey-Fuller test.
To make a non-stationary time series stationary, we can do the following:
1. take the first differences
2. or detrend the time series (we don't do this today)
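What first differencing does to a non-stationary series can be seen on a simulated random walk. A minimal sketch, assuming made-up data (the names rw and drw, the seed and the sample length are illustrative only):

```stata
* Sketch: a simulated random walk and its first difference
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time

* Random walk = running sum of white noise
gen e = rnormal()
gen rw = sum(e)

* First difference of the random walk (equals e from the second quarter on)
gen drw = D.rw

tsline rw, name(level, replace)
tsline drw, name(diff, replace)
```

The level series wanders without returning to a fixed mean, while the differenced series fluctuates around zero with roughly constant variance.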
Unit root tests continued
The Dickey-Fuller test (or augmented Dickey-Fuller test, if lagged differences are included) uses the following test regressions:
1. $\Delta y_t = \delta y_{t-1} + \varepsilon_t$, note: $\Delta y_t = y_t - y_{t-1}$ and $\delta = (\rho - 1)$, where $\rho$ is the coefficient on $y_{t-1}$ in the AR(1) in levels
if the time series is flat (no trend) and potentially slow-turning around zero
2. $\Delta y_t = \alpha + \delta y_{t-1} + \varepsilon_t$
if the series is flat and potentially slow-turning around a non-zero value (or has a drift, intercept = $\alpha$)
3. $\Delta y_t = \alpha + \delta y_{t-1} + \beta T + \varepsilon_t$
if the series has a trend $T$ (up or down) and a drift (intercept), or is slow-turning around a trend line you would draw through the data
The DF-test has its own test statistics (critical values), and we want to reject $H_0: \delta = 0$ (unit root) in favor of stationarity. In other words, if we cannot reject $H_0$, the series is non-stationary and has to be first-differenced.
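The second test regression (constant, no trend) can be run by hand and compared with Stata's dfuller. This is a sketch on a simulated random walk; the variable name y and the seed are assumptions for illustration.

```stata
* Sketch: Dickey-Fuller test regression by hand vs. dfuller
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen y = sum(rnormal())

* By hand: regress the first difference on the lagged level (plus constant)
regress D.y L.y

* dfuller with the regress option shows the same regression
dfuller y, regress
```

The t-statistic on L.y is the test statistic in both outputs, but only dfuller compares it with the Dickey-Fuller critical values instead of the usual t-table.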
Unit root tests continued
How do we choose the lag length p for the DF-test? Schwert (1989) suggests the following rule of thumb:
$$p_{max} = \left[ 12 \left( \frac{T}{100} \right)^{1/4} \right]$$
where $T$ = number of periods (e.g. years, quarters) and $[\cdot]$ denotes the integer part.
Why should we care? If p (1) is too small, some serial correlation can remain in the errors and bias the test; (2) if it is too large, the power of the test will suffer.
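Schwert's rule of thumb is easy to evaluate in Stata. A sketch, assuming T = 248 quarterly observations (1947q1 to 2008q4); the local macro names are made up:

```stata
* Sketch: Schwert's rule of thumb for an assumed sample of T = 248 quarters
local T = 248
local pmax = floor(12 * (`T'/100)^(1/4))
display "pmax = `pmax'"
```

For T = 248 this gives pmax = 15, which can then be passed to the lags() option of the unit root test.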
Another test for unit roots is suggested by Phillips and Perron (= PP), which corrects for serial correlation and heteroskedasticity in the errors.
Both ADF- and PP-tests are not very helpful if the series is close to being stationary. Kwiatkowski, Phillips, Schmidt and Shin (1992) suggest a test for stationarity, the so-called KPSS-test, with H0: the series is stationary.
Unit root tests continued
There are more tests out there, but in general it is not enough to use the Dickey-Fuller test only.
Usually you use some more to be confident about your time series.
Short Stata review
Remember a command in Stata has the following structure:
[command] variable, options
We used gen for generating new variables e.g. gen lgdp=log(gdp) to generate the log of GDP
Remember: if you want to have the residuals after a regression, use predict
Finding the proper model - Step 1
We will work with quarterly GDP data first
1. set mem 50m (only needed in older versions of Stata; newer versions manage memory automatically)
2. load gdp.dta (use gdp.dta)
3. Stata needs to know it is a time series.
3.1. generate a time-variable: gen time=tq(1947q1)+_n-1
3.2. give it the right format: format time %tq
3.3. tell Stata about it: tsset time
4. graph the series: tsline gdp
5. generate: gen lgdp=log(gdp) and graph it again: tsline lgdp
Finding the proper model - Step 1
Let us play around with the ACF (= ac) and the PACF (= pac); lgdp is the variable and the option sets the lag length:
1. ac lgdp, lags(10)
2. pac lgdp, lags(10)
or
3. corrgram lgdp, lags(10)
What do we see? Do it again for 20 lags.
Let us do the same for the first-difference version of lgdp. There are two ways:
1. generate a new variable: gen flgdp=D.lgdp
or
2. ac D.lgdp
Finding the proper model continued - Step 2
Assume an AR(1)-model is okay for the log of real GDP. We should run the following regression:
reg lgdp L.lgdp
Note:
Stata uses L. for one lag, L2. for two lags, L3. for three lags
Stata uses D. for taking the first difference
Stata uses F. if you have to forward your series, sometimes called a lead
This is pretty convenient, because you can use these for generating new variables too.
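The operators are easiest to understand on a tiny series. A sketch with five made-up values (the numbers and variable names are purely illustrative):

```stata
* Sketch: time-series operators on a tiny made-up series
clear
input y
10
12
15
19
24
end
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time

* Lag, second lag, first difference and lead of y
gen lag1  = L.y
gen lag2  = L2.y
gen diff1 = D.y
gen lead1 = F.y
list time y lag1 lag2 diff1 lead1
```

Lags and differences are missing at the start of the sample and the lead is missing at the end, which is why regressions with these operators lose observations.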
Finding the proper model continued - Step 3
If the AR(1) model is the proper one, the errors should be white noise. There are a couple of ways to test for it:
1. graph the errors
2. do a Breusch-Godfrey test for serial correlation
3. do a Q-test, called white-noise test (or portmanteau test)
Note: The Box-Pierce test is not very common anymore, due to its poor performance in small samples.
Finding the proper model continued - Step 3
1. Graphing the errors
To get the residuals after the regression: predict res, resid
Stata will save the errors in res
There are two ways to graph them:
1.1. tsline res
plots them against time; there should be no pattern over time
1.2. plot the residuals against past residuals
and there should be no pattern again!
reg res L.res, beta
twoway (scatter res L.res)
Finding the proper model continued - Step 3
2. Breusch-Godfrey test
Again, after the regression do the following (no need to predict the errors):
estat bgodfrey, lags(10)
H0: no serial correlation; if we reject it, then the errors are correlated and not white noise!
3. White-noise test
run the regression
predict the errors (predict res, resid) and do the following
wntestq res, lags(10)
H0: no serial correlation; if we reject it, then the errors are correlated and not white noise!
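Put together, the residual checks after an OLS AR(1) look roughly as follows. This is a sketch on simulated AR(1) data, so the variable names, the coefficient 0.7 and the seed are assumptions rather than the lab's GDP example.

```stata
* Sketch: residual checks after an AR(1) fitted by OLS (simulated data)
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/L

regress y L.y

* Breusch-Godfrey test, run directly after regress
estat bgodfrey, lags(10)

* Portmanteau (Q) test on the saved residuals
predict res, resid
wntestq res, lags(10)

* Visual check: residuals against time
tsline res
```

Because the simulated series really is an AR(1), one would expect neither test to reject H0 here; with real data a rejection suggests adding lags.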
Unit root tests
Unit root tests are pretty straightforward in Stata:
load quarterly data for defense spending (ds.dta) and generate the log of defense spending (ds), e.g. gen lds=log(ds)
1. Augmented Dickey-Fuller tests
Case 1: no constant, no trend term
dfuller lds, noconstant
Case 2: constant, no trend
dfuller lds
Case 3: constant, trend
dfuller lds, trend
Options:
4. include lags for the ADF: dfuller lds, lags(10) includes 10 lags
5. if you need the regression output: dfuller lds, regress
Unit root tests continued
2. Phillips-Perron test
If we don't specify a lag length, pperron picks the number of Newey-West lags by a rule of thumb based on the sample size.
Options are similar to dfuller
pperron lds
Remember: H0 = non-stationary
3. KPSS-test
kpss lds
Type help kpss into Stata; the options are a bit different. If Stata does not find the command, it may have to be installed first (e.g. ssc install kpss).
Remember: H0 = stationary
If we reject the null, then the series is non-stationary. Stata gives you the test values for different lag lengths.
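To see how the built-in tests behave, they can be run on a series with a known unit root and on its first difference. A sketch with simulated data (the name y, the seed and the choice of 4 lags are assumptions); kpss is left out because it may not be installed.

```stata
* Sketch: unit root tests on a simulated random walk and its first difference
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen y = sum(rnormal())

* Levels: H0 (unit root) is true by construction
dfuller y, lags(4)
pperron y

* First difference: white noise by construction
dfuller D.y, lags(4)
pperron D.y
```

One would expect H0 not to be rejected for y and to be clearly rejected for D.y, which is what I(1) means.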
ARIMA in Stata
We focused on AR-processes using OLS so far, but the following command is more powerful:
arima
Arima estimation is a maximum likelihood estimation, and remember the notation is in general ARIMA(p,I,q), where I = order of integration, e.g. I=0: the series is already stationary; I=1: you have to take the first differences first.
Examples:
arima ds, ar(1)   AR(1) for defense spending (ds)
arima ds, arima(1,0,0)   still AR(1), treating ds as already stationary (no first-differencing)
arima D.ds, ar(1) = arima ds, arima(1,1,0)   first-difference version of AR(1) on ds
arima ds, ma(1) = arima ds, arima(0,0,1)   would be an MA(1)-process for ds
arima ds, ar(1) ma(1) = arima ds, arima(1,0,1)   would have an AR(1) and an MA(1) component
To get the AIC and BIC for the models, use the following command after the estimation:
estat ic
ARIMA in Stata continued
Residual test
To test the residuals for autocorrelation, proceed similarly as before (but bgodfrey will not work),
e.g. predict the residuals and graph them, do a white-noise test (wntestq res),
or, if you like, compute a Durbin-Watson statistic, which should be around 2 (dwstat; in newer Stata versions estat dwatson, which runs after a regress rather than on a residual variable).
Forecasting
There are different types of forecasting after a regression. We can do an in-sample forecast (using the quarters given) or an out-of-sample forecast (adding quarters).
I will do it for the arima command (OLS is a bit different).
Remember: To check the quality of your forecast, you need to calculate the root mean square error (RMSE). The RMSE uses the forecast error (actual observation minus the forecast), and the formula is the following:
$$RMSE = \sqrt{\frac{\sum_t (Y_t - forecast_t)^2}{N}}$$
Example AR(1)-model:
arima fgdp, ar(1)
Do a one-step ahead forecast:
predict fgdp1, y
Compare the actual value with the forecast
tsline fgdp fgdp1
Forecasting continued
Calculate the RMSE:
1. Generate the forecast error:
gen ferr=fgdp-fgdp1
2. Generate the square of the forecast error:
gen ferr2=ferr^2
3. Get the mean of the squared errors:
sum ferr2
(the mean here is 0.0040)
4. Use it to compute the RMSE:
display "rmse: " (0.0040)^.5
Note there are more ways to measure forecast accuracy.
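The RMSE steps can also be done without copying the mean by hand, because summarize leaves it in r(mean). A sketch on simulated AR(1) data (variable names, coefficient and seed are assumptions, not the lab's fgdp series):

```stata
* Sketch: RMSE of one-step-ahead arima forecasts (simulated data)
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/L

arima y, ar(1)
predict yhat, y

* Squared forecast error, its mean, and the square root of that mean
gen ferr2 = (y - yhat)^2
quietly summarize ferr2
display "rmse: " sqrt(r(mean))
```

Since the simulated errors have variance 1, the RMSE should come out close to 1 here.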
Forecasting continued
A dynamic forecast could be done as follows:
predict fgdpd, xb dynamic(.)
Plot the actual value and the forecast
tsline fgdp fgdpd
Out of sample forecast
Do the regression but then you have to extend the time-horizon first:
tsappend, add(24)
adds 24 quarters to the quarterly data-set we have.
Then use the predict command for one-step ahead or dynamic forecasts.
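The out-of-sample steps in one piece, as a sketch on simulated AR(1) data. The names y and yfc, the seed and the forecast start are assumptions; the sample is set up to end in 2008q4 so that the 24 appended quarters start in 2009q1.

```stata
* Sketch: out-of-sample dynamic forecast after arima (simulated data)
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen e = rnormal()
gen y = e in 1
replace y = 0.7*y[_n-1] + e in 2/L

arima y, ar(1)

* Add 24 empty quarters after 2008q4
tsappend, add(24)

* One-step-ahead predictions in sample, dynamic forecasts from 2009q1 on
local start = tq(2009q1)
predict yfc, y dynamic(`start')
tsline y yfc
```

From 2009q1 on, the forecast uses its own earlier forecasts instead of actual values, so it decays smoothly toward the estimated mean of the series.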
Forecasting continued
A simple linear OLS forecast (don't ask me about the dynamic one; the same command as above is not working. There should be a way to compute it manually in Stata):
reg fgdp L.fgdp
predict fgdp1
(Stata assumes the option xb anyway in this case)
tsline fgdp fgdp1
What else could be done???
There is much more out there, e.g. rolling forecasts, comparing forecasts of different models (e.g. AR(1) with AR(2)) and so on.
How to create the first difference of a series
The simplest way in Stata is:
Let gdp be in levels and we want to create the first difference:
gen fgdp=D.gdp
(same as: $y_t - y_{t-1}$)
or D2 would be $(y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$
As you have seen above, in a regression you can use D, F and L in front of a variable without generating a new variable first!
Setting the time
In our examples we had quarterly data, what if you have annual, monthly, weekly or daily data?
annual data
gen time=1947+_n-1
tsset time
monthly data
gen time=tm(1962m2)+_n-1
format time %tm
tsset time
weekly data
gen time=tw(1962w1)+_n-1
format time %tw
tsset time
daily data
gen time=td(1apr1962)+_n-1
format time %td
tsset time
Note: _n is the observation number, so start date + _n - 1 gives the start date itself for the first observation and one period later for each following observation.
How to detrend a series? And how to choose the time horizon?
1. Detrending
Sometimes you want to detrend a series, e.g. because there is a trend present, or because, compared to taking the first difference, you save one observation. Imagine you only have 20 years of annual observations.
Steps:
create a trend variable, e.g. a variable increasing with time
gen trend = _n+1
regress your variable of interest using a constant and a trend
reg lgdp trend
use the residuals for the fun stuff you want to do!
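The detrending steps can be sketched as follows. The data are simulated (a linear trend plus noise), and the names y and y_detr as well as the trend slope 0.02 are made up for illustration.

```stata
* Sketch: linear detrending of a simulated trending series
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen trend = _n
gen y = 1 + 0.02*trend + rnormal()

* Regress on a constant and the trend
regress y trend

* The residuals are the detrended series
predict y_detr, resid
tsline y_detr
```

Unlike first differencing, the detrended series keeps all 248 observations.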
2. Choosing the time horizon
There are a couple of ways, e.g. restrict the sample with an if-condition (say, only observations from 1980 on), but one neat command is the following:
tin = time in
reg D.lgdp D2.lgdp if tin(1947q1,1965q4)
so that the observations run from January 1947 (first quarter) to December 1965 (fourth quarter)
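A short sketch of tin() in use, on a simulated random walk. The regression of the first difference on its own lag is just a placeholder model, and the variable name and seed are assumptions.

```stata
* Sketch: restricting the estimation sample with tin() (simulated data)
clear
set seed 710
set obs 248
gen time = tq(1947q1) + _n - 1
format time %tq
tsset time
gen y = sum(rnormal())

* Uses only observations dated 1947q1 through 1965q4
regress D.y LD.y if tin(1947q1,1965q4)

* Open-ended version: everything from 1980q1 onward
regress D.y LD.y if tin(1980q1,)
```

tin() only works after tsset, because it reads the date literals in the format of the time variable.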