Hierarchical Mixtures of AR Models for Financial Time Series Analysis
Carmen Vidal(1) & Alberto Suárez (1,2)
(1) Computer Science Dpt., Escuela Politécnica Superior
(2) Risklab Madrid
Universidad Autónoma de Madrid (Spain)
2
Financial time series
Time series of asset prices are highly irregular. If the market efficiency hypothesis is correct, they are also unpredictable.
Time series of asset prices are non-stationary. They are usually transformed into log-returns or, for short periods of time, into relative returns.
Asset returns exhibit deviations from normality:
• Leptokurtic: heavy tails
• Heteroskedastic: volatility clustering
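The two transformations mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code; the price list and the function names are made up for the example.

```python
import math

def relative_returns(prices):
    """Relative returns (P_t - P_{t-1}) / P_{t-1}."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

def log_returns(prices):
    """Log-returns log(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Hypothetical prices, chosen only for illustration.
prices = [100.0, 101.0, 99.5, 100.2, 100.9]
rel = relative_returns(prices)
log_r = log_returns(prices)

# For small price moves the two definitions nearly coincide.
max_gap = max(abs(a - b) for a, b in zip(rel, log_r))
print(rel)
print(log_r)
print(max_gap)
```

Log-returns add up over time (their sum is the log of the total price ratio), which is one reason they are preferred over longer horizons.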
3
Financial time series modelling / analysis
Modelling financial time series is not easy.
Natural sciences
• Not reproducible
• Underlying model?
Inductive / statistical learning
• Small data sets
• Complex data: non-linear, non-stationary, non-Gaussian, heteroskedastic
[Figure: time series plot]
4
Two stylized facts (Timo Teräsvirta)
Returns exhibit two empirically observed features:
Correlations
• Short term for the returns
• Medium term for absolute values of returns
Leptokurtosis
• Heavy tails
• Extreme events
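Both stylized facts can be checked numerically with a sample autocorrelation and a sample excess kurtosis. The sketch below uses synthetic returns whose volatility alternates between two levels; that generating process is an assumption made only to produce volatility clustering, not a model taken from these slides.

```python
import random

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var

def excess_kurtosis(x):
    """Sample kurtosis minus 3 (zero for a normal distribution)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0

# Synthetic returns (assumption): volatility alternates every 250 steps.
rng = random.Random(0)
returns = []
for t in range(5000):
    vol = 0.5 if (t // 250) % 2 == 0 else 2.0
    returns.append(vol * rng.gauss(0.0, 1.0))

acf_returns = autocorrelation(returns, 1)
acf_abs = autocorrelation([abs(r) for r in returns], 1)
kurt = excess_kurtosis(returns)
print(acf_returns, acf_abs, kurt)
```

In this toy series the returns themselves are nearly uncorrelated, while their absolute values are clearly correlated and the unconditional distribution is heavy-tailed.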
5
An example: IBEX35
6
Daily returns: IBEX35 (5 years)
7
Daily-returns distribution
8
Black-Scholes theory
In theory, markets are efficient:
• Absence of arbitrage opportunities.
• No systematic trends.
• Very short-term memory.
Model: Black-Scholes
The daily log-returns of an asset are distributed according to a normal distribution.
Two parameters:
• Risk-free interest rate.
• Volatility [free parameter]
9
Is Black-Scholes a good model?
Advantages:
• Simple minimal model with only one free parameter, the volatility.
• Good pricing accuracy for at-the-money options.
• Analytic pricing formulas for simple derivatives.
Drawbacks: incorrect pricing formulas for:
• Deep in-the-money or out-of-the-money options.
• Short-term (less than a month) options.
• Options on underlyings with very low or very high volatility.
This is reflected in the fact that the implied volatility is not constant [volatility smile].
[Figure: Volatility smile (European call). Implied volatility (vertical axis, 0.24 to 0.258) versus strike (horizontal axis, 80 to 110).]
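The notion of implied volatility can be made concrete with a short sketch: the standard Black-Scholes formula for a European call, inverted numerically by bisection. The parameter values (spot 100, rate 5%, one-year maturity, volatility 0.25) are illustrative assumptions, not data from these slides.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (
        vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def implied_vol(price, spot, strike, rate, maturity, lo=1e-4, hi=5.0, tol=1e-9):
    """Volatility that reproduces `price`; bisection works because the
    call price is increasing in the volatility."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, rate, mid, maturity) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Prices generated by Black-Scholes itself give a flat implied volatility.
smile = {}
for strike in (80.0, 90.0, 100.0, 110.0):
    price = bs_call(100.0, strike, 0.05, 0.25, 1.0)
    smile[strike] = implied_vol(price, 100.0, strike, 0.05, 1.0)
print(smile)
```

When the same inversion is applied to market prices instead of model prices, the recovered volatility varies with the strike, which is the smile shown in the figure.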
10
Beyond Black-Scholes
In practice, markets are:
• Not efficient: memory effects (short/long term?).
• Very unpredictable (at least sometimes).
Extreme events are more frequent than the Black-Scholes model predicts:
• Occurrence of crashes.
• Changes in economic paradigm.
Market friction: transaction costs, lack of liquidity, dividends, etc.
Heteroskedasticity + heavy tails
More sophisticated models are needed:
• Parametric models: generalizations of Black-Scholes.
• Non-parametric models: neural networks, mixture models.
11
Memory effects (IBEX 35)
12
13
Failure of normal model: Heavy tails
14
15
Empirical evidence for leptokurtosis
Volatility smiles and smirks: Black-Scholes is insufficient to account for the time evolution of the underlying.
Increased risk: multiplicative factor in market risk estimates (Basel Accord 1988, 1996 amendment).
[Figure: plot with horizontal axis 80 to 120 and vertical axis 0.205 to 0.24]
16
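Time series analysis
Consider the time series $X_1, X_2, \ldots, X_t, \ldots, X_T$.
Time series analysis:
• Forecasting: $\hat{X}_{t+d} = F(X_t, X_{t-1}, \ldots; \boldsymbol{\theta})$
• Classification: $\mathrm{Class} = F(X_t, X_{t-1}, \ldots; \boldsymbol{\theta})$
• Modelling: $P(X_{t+d} \mid X_t, X_{t-1}, \ldots; \boldsymbol{\theta})$
These problems are closely related to each other:
$$X_{t+d} = F(X_t, X_{t-1}, \ldots; \boldsymbol{\theta}) + \varepsilon_{t+d}; \qquad \varepsilon_{t+d} \sim P(\varepsilon_{t+d} \mid X_t, X_{t-1}, \ldots; \boldsymbol{\theta})$$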
17
Time series prediction: a learning view
Network model for time-series prediction
[Diagram: a learning device takes the delayed values $X_{t-1}, X_{t-2}, \ldots, X_{t-p}$ and a constant input 1, and outputs the prediction $\hat{X}_t$.]
18
Tasks in time series analysis
Obtaining data:
• Selection of attributes: choose relevant indicators.
• Data collection:
  Discrete data: grouping / averaging in a time window.
  Continuous data: importance of the sampling frequency.
Preprocessing data:
• Clean data: missing data, outliers.
• Normalization of data: $\dfrac{X_t - \mu}{\sigma}$; $\dfrac{X_t - \mathrm{median}}{\mathrm{iq}}$; $\dfrac{2X_t - (X_{\max} + X_{\min})}{X_{\max} - X_{\min}}$
• Eliminate trends / seasonality: handle a-priori information explicitly.
• Stationary data: $X_t - X_{t-1}$; $\dfrac{X_t - X_{t-1}}{X_{t-1}}$; $\log\dfrac{X_t}{X_{t-1}}$
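The preprocessing transformations listed above can be sketched as plain functions. This is an illustrative sketch only: the function names and the sample values are assumptions, and "iq" is read here as the interquartile range.

```python
import math
import statistics

def differences(x):
    """X_t - X_{t-1}."""
    return [b - a for a, b in zip(x, x[1:])]

def relative_changes(x):
    """(X_t - X_{t-1}) / X_{t-1}."""
    return [(b - a) / a for a, b in zip(x, x[1:])]

def log_ratios(x):
    """log(X_t / X_{t-1})."""
    return [math.log(b / a) for a, b in zip(x, x[1:])]

def standardize(x):
    """(X_t - mean) / standard deviation."""
    mu, sigma = statistics.fmean(x), statistics.pstdev(x)
    return [(v - mu) / sigma for v in x]

def robust_scale(x):
    """(X_t - median) / interquartile range; less sensitive to outliers."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    med = statistics.median(x)
    return [(v - med) / (q3 - q1) for v in x]

def minmax_scale(x):
    """(2 X_t - (X_max + X_min)) / (X_max - X_min), mapped into [-1, 1]."""
    lo, hi = min(x), max(x)
    return [(2.0 * v - (hi + lo)) / (hi - lo) for v in x]

# Hypothetical values, chosen only for illustration.
x = [10.0, 11.0, 12.1, 11.5, 13.0]
print(differences(x))
print(minmax_scale(x))
print(robust_scale(x))
```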
19
Parametric / non-parametric data analysis
Parametric
• Formulate a (restrictive) hypothesis dependent on a set of parameters.
• Find parameters by data-driven optimization [training set]: sensitivity analysis, uncertainty in estimated parameters, robustness.
• Validation of models [test set].
Non-parametric
• Consider a family of universal approximants.
• Fix architecture / parameters by data-driven optimization [training set]: sensitivity analysis, robustness, uncertainty, intelligibility.
• Validation of models [test set].
20
Classical models in time-series
Consider the time series $X_0, X_1, X_2, \ldots, X_{t-1}, X_t, \ldots, X_T$.
The series exhibits randomness. The process is covariance-stationary when:
• The mean is time independent: $E[X_t] = \mu$
• The autocovariance is independent of time translations: $E[(X_{t+\tau} - \mu)(X_t - \mu)] = \gamma_\tau$
21
Autoregressive + moving average models
Autoregressive model for a time series:
$$X_t = f(\mathbf{X}_t^{[p]}, \mathbf{u}_t^{[q]}; \boldsymbol{\theta}) + u_t$$
$$\hat{X}_t = f(\mathbf{X}_t^{[p]}, \mathbf{u}_t^{[q]}; \boldsymbol{\theta})$$
Vectors of delayed values:
$$\mathbf{X}_t^{[m]} = [X_{t-1}, X_{t-2}, \ldots, X_{t-m}]^{\top}, \qquad \mathbf{u}_t^{[m]} = [u_{t-1}, u_{t-2}, \ldots, u_{t-m}]^{\top}$$
The systematic term $f$ reflects trends. The innovations $u_t$ are uncorrelated noise. Maximization of the likelihood function yields estimates of the model parameters.
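For the simplest special case, a linear AR(1) model $X_t = c + \phi X_{t-1} + u_t$ with Gaussian innovations, maximizing the likelihood conditional on the first observation reduces to least squares. The sketch below simulates such a series and recovers its parameters; the intercept $c$, the function names, and the parameter values are assumptions made for the example.

```python
import random

def simulate_ar1(phi, c, sigma, n, seed=0):
    """Simulate X_t = c + phi * X_{t-1} + u_t with u_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x = [c / (1.0 - phi)]  # start at the stationary mean
    for _ in range(n - 1):
        x.append(c + phi * x[-1] + rng.gauss(0.0, sigma))
    return x

def fit_ar1(x):
    """Least-squares (conditional maximum-likelihood) estimates of
    phi, c and the innovation variance."""
    prev, curr = x[:-1], x[1:]
    n = len(curr)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    residuals = [q - c - phi * p for p, q in zip(prev, curr)]
    sigma2 = sum(r * r for r in residuals) / n
    return phi, c, sigma2

series = simulate_ar1(phi=0.6, c=0.0, sigma=1.0, n=5000)
phi_hat, c_hat, sigma2_hat = fit_ar1(series)
print(phi_hat, c_hat, sigma2_hat)
```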
22
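Autoregressive (feedforward) MLP
$$\hat{x}_t = \sum_{j=1}^{J} w_j^{(2)}\, f\!\left(\sum_{d=1}^{D} w_{jd}^{(1)}\, x_{t-d} + w_{j0}^{(1)}\right) + \theta$$
[Diagram: feedforward network. Input layer: constant 1 and delayed values $x_{t-1}, x_{t-2}, \ldots, x_{t-D}$. Hidden layer(s): first-layer weights $w_{10}^{(1)}, w_{20}^{(1)}, \ldots, w_{JD}^{(1)}$. Output layer: weights $w_1^{(2)}, w_2^{(2)}, \ldots, w_J^{(2)}$ producing $\hat{x}(t)$.]
Activation functions:
• Sigmoidal (logistic): $f(x) = \dfrac{1}{1 + e^{-x}}$
• Hyperbolic tangent: $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$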
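A forward pass through a one-hidden-layer autoregressive network of this kind can be sketched in a few lines. The weight values below are arbitrary, and the argument names (`w1`, `b1`, `w2`, `theta`) are labels chosen for this example rather than notation fixed by the slides.

```python
import math

def logistic(x):
    """Sigmoidal (logistic) activation."""
    return 1.0 / (1.0 + math.exp(-x))

def mlp_predict(delays, w1, b1, w2, theta, f=math.tanh):
    """One-step prediction from delays = [x_{t-1}, ..., x_{t-D}].

    w1[j][d] are first-layer weights, b1[j] first-layer biases,
    w2[j] output weights and theta the output bias.
    """
    hidden = [
        f(sum(w * x for w, x in zip(row, delays)) + bias)
        for row, bias in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + theta

# Arbitrary illustrative weights: D = 2 delays, J = 2 hidden units.
w1 = [[0.5, -0.2], [0.1, 0.3]]
b1 = [0.0, 0.1]
w2 = [1.0, -1.0]
theta = 0.2
prediction = mlp_predict([1.0, 2.0], w1, b1, w2, theta)
print(prediction)
```

Passing `f=logistic` switches the hidden units to the logistic activation without changing anything else.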
23
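ARMA(p,q) MLP
$$\hat{x}_t = \sum_{j=1}^{J} w_j^{(2)}\, f\!\left(\sum_{d=1}^{p} w_{jd}^{AR}\, x_{t-d} + \sum_{d=1}^{q} w_{jd}^{MA}\,(x_{t-d} - \hat{x}_{t-d}) + \theta_j\right)$$
[Diagram: network with feedback. Input layer: constant 1, delayed values $x_{t-1}, x_{t-2}, \ldots, x_{t-p}$ with weights $w^{AR}$, and delayed residuals $u_{t-q} = x_{t-q} - \hat{x}_{t-q}$ with weights $w^{MA}$, obtained by feeding the delayed predictions $\hat{x}_{t-1}, \hat{x}_{t-2}, \ldots, \hat{x}_{t-q}$ back through delay units. Hidden layer(s) with biases $\theta_1, \theta_2$. Output layer: weights $w_1^{(2)}, w_2^{(2)}, \ldots, w_J^{(2)}$ producing $\hat{x}(t)$.]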
24
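Mixture model
[Diagram: the inputs $X_{t-1}, \ldots, X_{t-s}$ and $\sigma^2_{t-1}, \ldots, \sigma^2_{t-s}$ feed MODEL 1, MODEL 2, ..., MODEL J and a GATING NETWORK. The gating network outputs the weights $g_1, g_2, \ldots, g_J$, which combine the model outputs in a sum ($\Sigma$) to give $\hat{X}_t$ and $\hat{\sigma}^2_t$.]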
25
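Gating Network
[Diagram: inputs $\hat{X}_{t-1}, \hat{X}_{t-2}, \ldots, \hat{X}_{t-r}$ and a constant 1, connected with weights 1, $a_1, \ldots, a_{r-1}$ and $-c_1$ to the outputs $h_1, h_2, \ldots, h_{J-1}$.]
$$h_i = \exp\left\{ b_i \left[ \hat{X}_{t-1} + \sum_{k=1}^{r-1} a_k^{(i)}\, \hat{X}_{t-1-k} - c_i \right] \right\}$$
Probabilities:
$$g_i = \frac{h_i}{1 + \sum_{j=1}^{J-1} h_j}, \quad i = 1, 2, \ldots, (J-1); \qquad g_J = 1 - \sum_{j=1}^{J-1} g_j$$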
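The gating probabilities can be computed directly from these two formulas. The sketch below is an illustration under the reading that each gate $i$ has its own parameters $b_i$, $c_i$ and delay weights $a^{(i)}_k$; the numeric values are arbitrary.

```python
import math

def gating_probabilities(delays, b, a, c):
    """Gating probabilities g_1..g_J from delays = [X_{t-1}, ..., X_{t-r}].

    b and c have length J-1; a[i] holds the r-1 weights applied to
    X_{t-2}, ..., X_{t-r} for gate i.
    """
    h = []
    for b_i, a_i, c_i in zip(b, a, c):
        z = delays[0] + sum(a_ik * x for a_ik, x in zip(a_i, delays[1:])) - c_i
        h.append(math.exp(b_i * z))
    denom = 1.0 + sum(h)
    g = [h_i / denom for h_i in h]
    g.append(1.0 - sum(g))  # the last expert takes the remaining probability
    return g

# Three experts (J = 3) and two delays (r = 2); arbitrary parameters.
g = gating_probabilities(
    delays=[0.4, -0.1],
    b=[1.0, -2.0],
    a=[[0.5], [0.2]],
    c=[0.0, 0.3],
)
print(g)
```

With two experts and a single delay this reduces to a logistic function of $\hat{X}_{t-1}$, centred at $c_1$ with slope $b_1$.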
26
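Hierarchical mixtures
[Diagram: two-level tree. The top gate splits with probabilities $\mu_1$ and $\mu_2$; the $\mu_1$ branch splits again with probabilities $\mu_{1|1}$ and $\mu_{2|1}$. Leaves: MODEL 1, MODEL 2, MODEL 3.]
Mixing probabilities: Model 1: $\mu_1\,\mu_{1|1}$; Model 2: $\mu_1\,\mu_{2|1}$; Model 3: $\mu_2$
$$\mu_1 = \frac{\exp\left\{ b_1 \left[ X_{t-1} + \sum_{k=1}^{r-1} a_k^{(1)} X_{t-1-k} - c_1 \right] \right\}}{1 + \exp\left\{ b_1 \left[ X_{t-1} + \sum_{k=1}^{r-1} a_k^{(1)} X_{t-1-k} - c_1 \right] \right\}}, \qquad \mu_2 = 1 - \mu_1$$
$$\mu_{1|1} = \frac{\exp\left\{ b_2 \left[ X_{t-1} + \sum_{k=1}^{r-1} a_k^{(2)} X_{t-1-k} - c_2 \right] \right\}}{1 + \exp\left\{ b_2 \left[ X_{t-1} + \sum_{k=1}^{r-1} a_k^{(2)} X_{t-1-k} - c_2 \right] \right\}}, \qquad \mu_{2|1} = 1 - \mu_{1|1}$$
Input = vector of delayed values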
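The three mixing probabilities of such a two-level tree can be sketched as the product of two logistic gates. The tree layout assumed here (Model 3 alone on the second top-level branch) and all parameter values are assumptions for illustration.

```python
import math

def logistic_gate(delays, b, a, c):
    """exp(b*z) / (1 + exp(b*z)) with z = X_{t-1} + sum_k a_k X_{t-1-k} - c."""
    z = delays[0] + sum(a_k * x for a_k, x in zip(a, delays[1:])) - c
    return 1.0 / (1.0 + math.exp(-b * z))

def hierarchical_weights(delays, top, nested):
    """Mixing probabilities of Models 1, 2, 3.

    `top` and `nested` are (b, a, c) tuples for the top-level gate
    and for the gate inside the first branch.
    """
    mu_1 = logistic_gate(delays, *top)
    mu_1_given_1 = logistic_gate(delays, *nested)
    return [
        mu_1 * mu_1_given_1,          # Model 1
        mu_1 * (1.0 - mu_1_given_1),  # Model 2
        1.0 - mu_1,                   # Model 3
    ]

# Arbitrary parameters with a single delay (r = 1).
weights = hierarchical_weights([0.8], top=(1.5, [], 0.0), nested=(-2.0, [], 1.0))
print(weights)
```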
27
Mixture of Gaussians for t-independent pdf
Empirical sample: $X_1, X_2, \ldots, X_N$
Model pdf:
$$P(x) = \sum_{k=1}^{K} p_k\, N(x; \mu_k, \sigma_k)$$
Two steps:
• Toss a K-sided loaded die to choose a component.
• Extract a value from the selected model.
Advantages:
• Close to the normal world.
• Accounts for leptokurtosis of empirical unconditional distributions in finance.
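The two-step sampling procedure, and the heavy tails it produces, can be sketched as follows. The mixture weights, means and standard deviations are arbitrary illustrative choices.

```python
import random

def sample_mixture(weights, means, sigmas, n, seed=0):
    """Draw n values: pick a component with a loaded die, then sample it."""
    rng = random.Random(seed)
    components = range(len(weights))
    samples = []
    for _ in range(n):
        k = rng.choices(components, weights=weights)[0]
        samples.append(rng.gauss(means[k], sigmas[k]))
    return samples

def excess_kurtosis(x):
    """Sample kurtosis minus 3 (zero for a normal distribution)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0

# A frequent calm component and a rare volatile one, both centred at zero.
samples = sample_mixture([0.8, 0.2], [0.0, 0.0], [1.0, 3.0], n=20000)
kurt = excess_kurtosis(samples)
print(kurt)
```

Each component is Gaussian, yet the mixture has clearly positive excess kurtosis, which is the leptokurtosis the slide refers to.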
28
29
30
Mixture of Gaussians
Intuition: implicitly, market forecasts are made in terms of scenarios.
• Each of these scenarios is characterized by an expected return and a volatility.
• Markets assign a different probability to each scenario.
Dynamical picture?
• Direct time aggregation of the process yields a normal model (by the Central Limit Theorem).
• It is possible to construct a discontinuous jump process maintaining the mixture form. Not realistic.
31
Mixture of AR processes
Mixtures of Gaussians + autoregressive dynamics
• In: vector of delays (used in gating network + AR models)
• Out: next value in time series
[Diagrams: mixture with no hierarchy; mixture with tree hierarchy]
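A mixture of AR(1) experts defines a conditional density for the next value: a gate-weighted sum of Gaussians whose means depend on the previous value. The sketch below assumes two experts, a logistic gate on $X_{t-1}$ and no intercepts; these choices, the function names, and all numbers are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gate(x_prev, b, c):
    """Probabilities of expert 1 and expert 2 given X_{t-1}."""
    g1 = 1.0 / (1.0 + math.exp(-b * (x_prev - c)))
    return [g1, 1.0 - g1]

def conditional_density(x_t, x_prev, phis, sigmas, b, c):
    """p(X_t | X_{t-1}) = sum_i g_i(X_{t-1}) N(X_t; phi_i X_{t-1}, sigma_i)."""
    probs = gate(x_prev, b, c)
    return sum(
        g * normal_pdf(x_t, phi * x_prev, sigma)
        for g, phi, sigma in zip(probs, phis, sigmas)
    )

def log_likelihood(series, phis, sigmas, b, c):
    """Log-likelihood of a series, conditional on its first value."""
    return sum(
        math.log(conditional_density(x_t, x_prev, phis, sigmas, b, c))
        for x_prev, x_t in zip(series, series[1:])
    )

phis, sigmas = [0.2, 0.9], [0.5, 2.0]
ll = log_likelihood([0.1, -0.3, 0.8, 0.5], phis, sigmas, b=2.0, c=0.0)
print(ll)
```

Maximizing this log-likelihood over the expert and gate parameters is one way such a model could be fitted; the slides do not specify the optimizer used.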
32
Synthetic data: Example 1
Time series generated by a hierarchical mixture of 3 AR(1) experts
[Figure: Expert contributions ("Contribution of each expert") for experts E1, E2, E3; horizontal axis -10 to 8, vertical axis 0 to 1]
[Figure: Histogram (unconditional pdf); horizontal axis -10 to 8, vertical axis 0 to 250]
33
Model 1 fit
Fitting to a mixture of 2 AR(1) experts (wrong type of model!)
[Figures: Contributions ("Contribution of each expert", gates g1 and g2); Histogram; Percentile plot (X Quantiles vs. Y Quantiles)]
MSE Test: 0.4645
K-S Test: 0
LL Test: -18009
LL Train: -17967
34
Model 2 fit
Fitting to a mixture of 3 AR(1) experts (learnable model)
[Figures: Contributions ("Contribution of each expert", experts E1, E2, E3); Histogram; Percentile plot (X Quantiles vs. Y Quantiles)]
MSE Test: 0.3164
K-S Test: 0.9666
LL Test: -16755
LL Train: -16675
35
AR(1) fit for Ibex35 (1200 + 712 days)
36
AR(1) fit for Ibex35 (1200 + 712 days)
37
MIX 2 AR(1) fit for Ibex35
38
MIX 3 AR(1) fit for Ibex35
39
Hierarchical MIX 3 AR(1) fit for Ibex35
40
Conclusions and perspectives
• Mixtures of AR(1) models improve on the results of single AR(1) models for financial returns time series.
• Mixtures of 2 / 3 experts seem to be sufficient to model leptokurtosis and dynamics.
• The introduction of hierarchy in the structure of the mixture may significantly improve the statistical description of financial time series data.
To do:
• Heteroskedasticity
• Calibration of models to market
41
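Mixture of ARCH processes
MixARCH:
$$X_t = \boldsymbol{\phi}^{[i]} \cdot \mathbf{X}_t^{[m]} + u^{[i]}(t) \quad \text{with probability } g^{[i]}(\mathbf{X}_t^{[r]}, \boldsymbol{\theta})$$
The model for the residuals is
$$u^{[i]}(t) = \sigma_{[i]}(t)\, Z_t, \qquad \sigma_{[i]}^2(t) = \kappa^{[i]} + \boldsymbol{\alpha}^{[i]} \cdot \left(\mathbf{u}_t^{[q]}\right)^2$$
where $\left(\mathbf{u}_t^{[q]}\right)^2$ denotes the squares of the $q$ delayed residuals.
The quantities $Z_t$ are assumed to be N(0,1).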
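For a single expert with one delay, the ARCH recursion can be sketched as a filter that turns an observed series into conditional variances and standardized residuals. The function name and the numbers are illustrative assumptions; the AR part is taken without intercept.

```python
import math

def arch1_filter(x, phi, kappa, alpha):
    """AR(1)/ARCH(1) filter.

    u_t = x_t - phi * x_{t-1}
    sigma2_t = kappa + alpha * u_{t-1}^2
    z_t = u_t / sqrt(sigma2_t)
    Returns (sigma2, z), both aligned with x[2:].
    """
    u = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
    sigma2 = [kappa + alpha * u_prev ** 2 for u_prev in u[:-1]]
    z = [u_t / math.sqrt(s2) for u_t, s2 in zip(u[1:], sigma2)]
    return sigma2, z

# Tiny hand-checkable example.
sigma2, z = arch1_filter([0.0, 1.0, 0.5, 2.0], phi=0.5, kappa=1.0, alpha=0.5)
print(sigma2, z)
```

If the model is adequate, the standardized residuals `z` should look like independent N(0,1) draws, which is what the residual correlation and normality checks later in the deck examine.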
42
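Mixture of GARCH processes
MixGARCH:
$$\hat{X}_t = \boldsymbol{\phi}^{[i]} \cdot \hat{\mathbf{X}}_t^{[m]} + u^{[i]}(t) \quad \text{with probability } g^{[i]}(\mathbf{X}_t^{[r]}, \boldsymbol{\theta})$$
The model for the residuals is
$$u^{[i]}(t) = \sigma_{[i]}(t)\, Z_t, \qquad \sigma_{[i]}^2(t) = \kappa^{[i]} + \boldsymbol{\alpha}^{[i]} \cdot \left(\mathbf{u}_t^{[q]}\right)^2 + \boldsymbol{\beta}^{[i]} \cdot \left(\boldsymbol{\sigma}_t^{[p]}\right)^2$$
where $\left(\mathbf{u}_t^{[q]}\right)^2$ and $\left(\boldsymbol{\sigma}_t^{[p]}\right)^2$ denote the squares of the $q$ delayed residuals and the $p$ delayed conditional variances.
The quantities $Z_t$ are assumed to be N(0,1).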
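With $p = q = 1$ the GARCH variance recursion for one expert is a one-line update. The sketch below starts the recursion at the unconditional variance $\kappa / (1 - \alpha - \beta)$, which is a common convention and an assumption here, as are the function name and the numbers.

```python
def garch11_variances(u, kappa, alpha, beta, sigma2_0=None):
    """Conditional variances of a GARCH(1,1) process for residuals u.

    sigma2_t = kappa + alpha * u_{t-1}^2 + beta * sigma2_{t-1}
    The first variance defaults to kappa / (1 - alpha - beta),
    which requires alpha + beta < 1.
    """
    if sigma2_0 is None:
        sigma2_0 = kappa / (1.0 - alpha - beta)
    sigma2 = [sigma2_0]
    for u_prev in u[:-1]:
        sigma2.append(kappa + alpha * u_prev ** 2 + beta * sigma2[-1])
    return sigma2

# Tiny hand-checkable example.
variances = garch11_variances([1.0, -2.0, 0.5], kappa=0.1, alpha=0.1, beta=0.8)
print(variances)
```

Setting `beta=0` recovers the ARCH(1) recursion, so the same function covers both cases.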
43
AR(1) / ARCH(1) for IBEX35
The maximum-likelihood fit of the time-series IBEX35 yields the model
$$\hat{X}_t = 0.1129\, \hat{X}_{t-1} + \sigma_t Z_t$$
$$\sigma_t^2 = 0.9097 + 0.1118\, (\hat{X}_{t-1} - 0.1129\, \hat{X}_{t-2})^2$$
The quantities $Z_t$ are assumed to follow a N(0,1) distribution.
44
Residual correlations: ARCH(1)
45
Normality hypothesis: ARCH(1)
KS Test = 0.12
[Figures: quantile-quantile plot (X Quantiles vs. Y Quantiles, both axes -6 to 6); histogram (horizontal axis -4 to 5, vertical axis 0 to 200)]
46
MIXARCH for IBEX35
The mixture model is
Model 1:
$$\hat{X}_t = 0.0559\, \hat{X}_{t-1} + \sigma_t Z_t, \qquad \sigma_t^2 = 2.2194 + 0.1976\, (\hat{X}_{t-1} - 0.0559\, \hat{X}_{t-2})^2$$
Model 2:
$$\hat{X}_t = 0.1380\, \hat{X}_{t-1} + \sigma_t Z_t, \qquad \sigma_t^2 = 0.6820 + 0.03821\, (\hat{X}_{t-1} - 0.1380\, \hat{X}_{t-2})^2$$
The probabilities for the mixture are
$$g_{[1]}(X_{t-1}) = \frac{1}{1 + \exp\{-0.6839\, (X_{t-1} - 2.5155)\}}; \qquad g_{[2]}(X_{t-1}) = 1 - g_{[1]}(X_{t-1})$$
47
Residual correlations: MIXARCH
48
Normality hypothesis: MixARCH(1)
KS Test = 0.83
[Figures: histogram (horizontal axis -3 to 3, vertical axis 0 to 160); quantile-quantile plot (X Quantiles vs. Y Quantiles, both axes -6 to 6)]
49
MIXARCH Model fit
50
AR(1) / GARCH(1,1) for IBEX35
The maximum-likelihood fit of the time-series IBEX35 yields the model
$$\hat{X}_t = 0.1358\, \hat{X}_{t-1} + \sigma_t Z_t$$
$$\sigma_t^2 = 0.0527 + 0.0755\, (\hat{X}_{t-1} - 0.1358\, \hat{X}_{t-2})^2 + 0.8733\, \sigma_{t-1}^2$$
The quantities $Z_t$ are assumed to follow a N(0,1) distribution.
51
Residual correlations: GARCH
[Figures: autocorrelations of residuals and autocorrelations of abs(residuals); magnitude (0 to 1) versus delay (0 to 30)]
52
Normality hypothesis: GARCH(1,1)
KS Test = 0.56
[Figures: histogram (horizontal axis -4 to 6, vertical axis 0 to 250); quantile-quantile plot (X Quantiles vs. Y Quantiles, both axes -6 to 6)]
53
Test Data
KS = 0.33
[Figures: quantile-quantile plot (X Quantiles vs. Y Quantiles); autocorrelations of residuals and of abs(residuals), magnitude (0 to 1) versus delay (0 to 30); histogram (horizontal axis -6 to 6); volatility (0 to 3) versus time (100 to 600)]
54
MIXGARCH for IBEX35
The mixture model is
Model 1:
$$\hat{X}_t = 0.1255\, \hat{X}_{t-1} + \sigma_t Z_t, \qquad \sigma_t^2 = 0.0156 + 0.0778\, (\hat{X}_{t-1} - 0.1255\, \hat{X}_{t-2})^2 + 0.8937\, \sigma_{t-1}^2$$
Model 2:
$$\hat{X}_t = 0.3314\, \hat{X}_{t-1} + \sigma_t Z_t, \qquad \sigma_t^2 = 2.6230 + 0.0000\, (\hat{X}_{t-1} - 0.3314\, \hat{X}_{t-2})^2 + 0.0285\, \sigma_{t-1}^2$$
The probabilities for the mixture are
$$g_{[1]}(X_{t-1}) = \frac{1}{1 + \exp\{0.5418\, (\hat{X}_{t-1} - 4.8710)\}}; \qquad g_{[2]}(X_{t-1}) = 1 - g_{[1]}(X_{t-1})$$
55
Residual correlations: MIXGARCH
[Figures: autocorrelations of residuals and autocorrelations of abs(residuals); magnitude (0 to 1) versus delay (0 to 30)]
56
Normality hypothesis: MIXGARCH
KS test = 0.95
[Figures: quantile-quantile plot (X Quantiles vs. Y Quantiles, both axes -6 to 6); histogram (horizontal axis -3 to 3, vertical axis 0 to 160)]
57
MIXGARCH Model fit
[Figures, all versus time (200 to 1200): volatility (0 to 2); entropy (0.2 to 0.8); probabilities of Model 1 and Model 2 (0.2 to 0.8)]
58
Test Data
KS = 0.25
[Figures: quantile-quantile plot (X Quantiles vs. Y Quantiles, both axes -6 to 6); volatility (0 to 3) versus time (100 to 600); autocorrelations of residuals and of abs(residuals), magnitude (0 to 1) versus delay (0 to 30); histogram (horizontal axis -6 to 6, vertical axis 0 to 80)]