Data Preparation – Homogeneity ‐1‐ © Spider Financial Corp, 2012
HomogeneityIn this issue, the third tutorial in our data preparation series, we will touch on the third most important
assumption in time series analysis: Homogeneity, or the assumption that a time series sample is drawn
from a stable/homogeneous process.
We’ll start by defining the homogeneous stochastic process and stating the minimum stationary
requirements for our time series analysis. Then we demonstrate how to examine the sample data, draw
a few observations, and highlight some underlying intuitions behind them.
BackgroundIn statistics, homogeneity is used to describe the statistical properties of a particular data set. In
essence, it states that statistical properties of any part of an overall data set are the same as any other
part.
What do we mean by statistical properties? A strict way of looking at homogeneity would involve
examining the changes to the whole of the marginal distribution, but time series analysis only demands
that we consider the location stability over time (versus trend) and the stability of local fluctuation over
time.
Whatdoesthismean?In time series analysis, we are concerned with the stability of the underlying stochastic process over
time. Do we have structural changes? If changes exist but go undetected, we find ourselves in one of
several difficult situations:
1. The proposed model offers little explanation for the data variation over time
2. The model’s parameter values vary significantly when we re‐calibrate using either a subset of
the sample, or by incorporating new observations
3. In extreme cases, the selection of the best model type or order(s) can be influenced by the
selection of sample data
Whydowecare?The objective of time series analysis and modeling is usually the construction of out‐of‐sample forecasts.
How can we generate these forecasts using a model with time‐varying parameters? How much
confidence can we put in those forecasts? Are the forecast robust? Let’s find out.
Whydoesithappen?There are several causes for heterogeneity (opposite of homogeneity) in a time series:
(1) The underlying model’s statistical properties are evolving over time. In this case, trying to fit a
model with fixed parameter values would not be optimal, despite our best efforts. We need to
Data Preparation – Homogeneity ‐2‐ © Spider Financial Corp, 2012
examine advanced modeling techniques to capture the dynamics of the statistical properties of
the process. This, unfortunately, is outside the scope of this paper.
(2) The underlying process is not stationary (e.g. possesses trend over time).
(3) The underlying process is heteroskedastic where volatility exhibits clustering and mean
reversion.
(4) The underlying process had undergone few but major structural changes due to exogenous
events, such as the passing and enforcement of new relevant laws or a major development in
the process itself.
ExampleI:Ozone level in downtown Los Angeles case (refer to the “How does it fit” issue)
Throughout the sample time between 1955 and 1972, there were two major developments:
(1) Rule #76 for gasoline mix and combustion engine design
(2) Opening of a freeway to divert traffic from downtown LA
Obviously, those exogenous events affect the number of cars in downtown LA, and consequently
the amount of Ozone emitted in the area. One can argue that the process after those events (1972)
is not the same as the process in 1960.
ExampleII:US Consumer Price Index and its derivative, the inflation rate:
The inflation rate in the US reflects the effectiveness of government public policies, so throughout
the sample horizon between 1913 and 2009, it is no surprise that the data characteristics before and
after World War II are fundamentally different. Also consider that in the 1970s, the sudden rise in
inflation evident in our data reflects a fundamental change (or failure) in public policy.
Data Preparation – Homogeneity ‐3‐ © Spider Financial Corp, 2012
Most importantly, the inflation rate underlying the process after the 1970s is very different than in
prior years, for a number of reasons: (1) fundamental changes in public policies and (2) a mandate
for the Federal Reserve to fight the inflation rate and unemployment in 1977.
In sum, one may argue that the post‐1977 process is very different from the pre‐1977 process.
ConclusionThe investigator must bring rich prior knowledge and strong hypotheses about the underlying process
structure and its drivers to his interpretation of a data set. The liability of powerful analytical methods is
Data Preparation – Homogeneity ‐4‐ © Spider Financial Corp, 2012
the potential for a rich diversity of alternative solutions that can have very different properties when
extrapolated from the situation from which the data was originally sampled.
CheckingForHomogeneityThe initial stages in the analysis of a time series may involve plotting values against time to examine the
homogeneity of the series in various ways: namely, stability across time (as opposed to a trend) and
stability of local fluctuations over time.
In a statistical sense, a test for homogeneity is equivalent to a test of a statistical distribution. In plain
English, we wish to detect a change in the underlying distribution. For that, we can examine the
distribution moments: mean, variance, skew, and kurtosis for changes.
For time series analysis, we will look into the 1st two moments: mean and variance, and examine any
shift over time. Here are few tests to aid us:
Standard Normal Homogeneity Test (SNHT) :
Q: Do we have a shift in the mean or variance?
: ~ (0,1)oH r N
1 :H There is a shift
Where r are the standardized ratios (an observation’s value compared to the average).
Pettitt’s Test ‐ detecting a shift in variance ‐ Non‐parametric test (i.e. no assumption about the
distribution of data).
Q: Do we have a shift in the variance? When?
The Pettitt's test is an adaptation of the rank‐based Mann‐Whitney test, which allows you to
identify the time at which the shift occurs.
Tests for detecting a shift in the mean ‐ Non‐parametric test (i.e. no assumption about the
distribution of data).
Q: Do we have a shift in the mean? When?
1
:
:o t
k
H c
H c
Where
o oH is the null hypothesis, which states that tx follows one or more distributions that
have the same mean.
Data Preparation – Homogeneity ‐5‐ © Spider Financial Corp, 2012
o 1H is the alternative hypothesis, which states that there exists a time k from which the
variables change mean.
Bartles Test (ranked version of Von Neumann ratio test) for randomness –
Q: Is the sample data random? Do we have patterns?
o Null Hypothesis ( oH ): time series is homogeneous.
o Alternative hypothesis ( 1H ): time series is not homogeneous.
Holdon,doesn’thomogeneitysoundalooklikestationarity?Stationarity and homogeneity are closely related; stationarity looks into the stability of the joint
distribution1 2
( , ,..., )NX t t tF x x x , while homogeneity examines the stability of the whole marginal
distribution over time.
A non‐stationary time series is non‐homogeneous, but the opposite may not always be true.
Mytimeseriesisnothomogeneousovertime;whatcanIdo?If a homogeneous assumption fails to hold, we need to take a closer look and understand the time
series:
(1) Is the time series stationary? If so, transform the data to bring it to stationarity.
(2) Identify and understand the drivers of the underlying process:
a. Do we have exogenous drivers/factors (e.g. laws, events, etc.) that could affect the
values of the observations?
b. Has the underlying process changed permanently over time?
c. Do we expect the exogenous factor to change again in the future?
d. When did the process mean or variance change?
In the US CPI example, the change made in 1977 by congress to mandate the Federal Reserve to adopt
public policy to control inflation is a major turning point, and we are inclined to conclude that process
underwent a permanent change as a result of that development. In this case, I would disregard all
observation before that time.
In the Ozone level in downtown LA example, the opening of a freeway diverting traffic from downtown
is a structural change in the underlying process. The same can be said about the laws for gasoline mix
and engine design. Again, I would disregard data before the changes took effect, and only concern
myself with observations that occur after these events.
Top Related