Managing Swaption Risk with a Dynamic SABR Model

Amsterdam School of Economics
Faculty of Economics and Business
MSc in Econometrics, Financial Econometrics track

Frank de Zwart
10204245

Supervised by Dr. S.A. Broda
and, at ABN AMRO, by Ms. Hiltje Bijkersma

July 28, 2017

ABN AMRO Bank N.V. | CRM | Regulatory Risk | Model Validation
Statement of Originality
This document is written by Frank de Zwart, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.
Abstract
This thesis focuses on models that can be used to estimate risk measures such as Value at Risk and Expected Shortfall. The displaced Black's model and the displaced SABR volatility model are used to price a portfolio of swaptions. The aim is to capture the dynamics of the SABR parameters in a time series model in order to obtain more accurate swaption risk estimates. This time series model is used to simulate the one-day-ahead profit and loss distribution, which is then compared to the Historical Simulation method. In an empirical study, we compute the Value at Risk and Expected Shortfall estimates based on the Historical Simulation method as well as the time series model. These models are analyzed with several backtests and diagnostic tests in order to answer the following research question: can one outperform the Historical Simulation Value at Risk and Expected Shortfall forecasts by fitting a time series model to the calibrated SABR model parameters instead?
A vector autoregressive model is used as well as a local level model. Based on these two models, we are not able to outperform the Historical Simulation estimates of the risk measures. Diagnostic tests show remaining significant autocorrelation as well as heteroskedasticity in the residuals of the vector autoregressive model. The backtests that are carried out likewise show that the vector autoregressive model performs worse than the Historical Simulation method.
Contents
1 Introduction
2 Preliminaries on financial notation
2.1 Interest rate instruments
2.2 Bootstrapping the zero curve
2.3 Swaptions
2.4 Martingales and Measures
3 Literature review
4 Models and method
4.1 Option pricing models
4.2 Time series analysis
4.3 Risk measurement
4.4 Backtests
5 Data
5.1 Calculating the implied volatilities
5.2 Leaving out some strikes
6 Empirical study and results
6.1 Calibrating the SABR model parameters
6.2 Fitting a model through the SABR parameters time series
6.3 Risk measurement
6.4 Backtests
6.5 Robustness check: Local level model
7 Conclusion
References
A Appendix
1 Introduction
The Basel Committee (2013) has introduced the Fundamental Review of the Trading Book (FRTB).
To contribute to a more resilient banking sector, they have decided to change the current framework’s
reliance on Value at Risk (VaR) to the Expected Shortfall (ES) measure to estimate market risk. On
the other hand, Pérignon and Smith (2010) state that most banks use Historical Simulation (HS) to estimate their VaR. The Historical Simulation method computes the VaR from past returns of the portfolio's present assets, so that one obtains the distribution of price changes that would have been realized had the current portfolio been held throughout the observation period. The decision described in the FRTB shows that it is becoming even more important for financial institutions to estimate their market risk accurately. At the same time, a relatively simple method is still used to obtain these risk measures. One of the main drawbacks of the Historical Simulation method is that it does not account for the decreasing predictive value of older returns.
Derivatives are traded extensively these days, and one of these products is a swap option, or swaption.
A swaption is an option on an interest rate swap. Swaptions are traded over-the-counter, so compared
to derivatives that are traded on an exchange, the information is more scarce and not publicly available.
This makes it an interesting challenge to find an accurate method to assess the risk of holding these
derivatives. Besides this, the negative interest rates also affect almost all the valuation methods for
these options. In the current interest rate environment, the Historical Simulation method that is used
to produce the VaR and ES estimates of market risk may not be reliable. Hence, finding a method to
get more reliable estimates for the VaR and ES, based on historical swaption data, is of interest.
This leads to the following research question, which defines the main purpose of this thesis: can one outperform the Historical Simulation Value at Risk and Expected Shortfall forecasts by fitting a time series model to the calibrated SABR model parameters instead? An empirical study is performed to answer this question. This study is based on an ICAP data set of swaption premiums, interest rate deposits, and interest rate swaps. The time series of the displaced SABR volatility model parameters, covering a time grid of approximately 2.5 years, are analyzed to obtain a one-day-ahead forecast of the price of a portfolio of swaptions. Finally, a backtesting procedure assesses the quality of this new method compared to the well-known Historical Simulation method.
The remainder of this research report is structured as follows. First, Section 2 discusses the necessary background theory for this research. This includes theory on interest rate instruments in general as well as a description of an interpolation method, called bootstrapping, that is used to obtain the zero and discount curves. We then describe swaptions and some of their relevant trading strategies. The section concludes with a description of martingales and measures. Section 3 briefly reviews the relevant literature for this research, and Section 4 continues with the theory that is used to price the swaptions. In that section, the well-known model of Black (1976) is described in detail. Besides this, we focus on the SABR volatility model of Hagan et al. (2002) and the correction to their work by Obłój (2008). We then discuss the implications of negative interest rates for these models. In Section 4.2, the basic time series models that are used in this research are discussed. We then continue with risk measurement concepts such as Value at Risk and Expected Shortfall. Finally, several different backtests are described; different backtests are used to assess the quality of our model estimates as thoroughly as possible. The data set is described in Section 5. Both the raw data and the different pre-processing techniques are explained. This section also contains information on some of the limitations and
argumentation on why some adjustments are made. Section 6 then describes the empirical study and results. This section follows the structure of Section 4: it starts with the calibrated SABR parameters and continues with the time series analysis, risk measurement, and the backtesting procedure. Besides the backtests themselves, several diagnostic tests are carried out to assess the quality of the fit of the time series analysis. The results are elaborated and discussed at every step. Finally, in Section 7, the main findings are summarized and a conclusion is drawn. The research question is answered and some limitations and recommendations for further research are provided.
2 Preliminaries on financial notation
Trading in derivatives has become an indispensable part of the financial industry. There are multiple
different derivatives for every type of investment asset. The magnitude of this market shows that it
is of great importance to understand how these derivatives work. Consequently, many researchers have focused on these derivatives. Numerous papers and books describe how derivatives
work and what risks the holder of an open position in them is taking. We will first explain some basic
but crucial concepts of interest rate instruments. Then in Section 4, we will describe the models and
methods that are applied in the empirical analysis of this research.
2.1 Interest rate instruments
Interest rates are crucial in the valuation of derivatives. Especially the ’risk-free’ rate is of concern
when evaluating derivatives. Hull (2012) explains that the interest rates implied by Treasury bills are
artificially low because of a favorable tax treatment and other regulations. For this reason the LIBOR
rate became commonly used instead. However, when rates rose sharply during the crisis of 2007, many derivatives dealers had to review their practices. The LIBOR rate is the short-term opportunity cost of capital of AA-rated financial institutions, and it spiked because banks were unwilling to lend to one another during the crisis. Many dealers have now switched to using overnight indexed swaps (OIS), because these are closer to being 'risk-free'. This research focuses on the Euro market and makes use of the Euro Interbank Offered Rate (Euribor), which is similar to LIBOR but is based on a panel of European banks: it is the rate at which these banks borrow funds from one another. Although Euribor is not theoretically risk-free, it is still considered a good benchmark against which to measure the risk and return trade-off.
There is one very important assumption that makes the risk-free rate even more crucial. This is
known as the assumption of a risk-neutral world. In a risk-neutral world it is assumed that all investors
are risk-neutral. In other words, they do not require a higher expected return from an investment that
is more risky.
Theorem 2.1. This leads to the following two characteristics of a risk-neutral world (Hull, 2012):
1. The expected return on an investment is the risk-free rate.
2. The discount rate used for the expected payoff on a financial instrument is the risk-free rate.
This makes the pricing of derivatives much simpler. The world is not actually risk-neutral; however, it can be shown that if we compute the price of a derivative under the risk-neutral world
assumption, we obtain the correct price for the derivative in all worlds. This makes a significant difference,
because there is still a lot unknown about the risk preferences of buyers and sellers of derivatives.
The main focus of this research is on swaptions. The underlying of this product is an interest rate
swap, and therefore this derivative will be discussed first. However, before swaps are considered, we briefly discuss forward rate agreements, which give insight into how a swap can be priced.
We will then continue with an interpolation method, known as bootstrapping, that is used to obtain
the zero curve. Then the swaption will be explained together with some of its most common trading
strategies. Finally we will continue with theory about martingales and measures. These measures are
used to compute the discounted expected value of a certain future payoff.
A forward rate agreement (FRA) is an agreement defined to ensure that a certain interest rate will
apply to either borrowing or lending a certain principal during a specified future period of time. We
define RK as the interest rate agreed to in the FRA and define RF as the forward of the reference rate
at time Tα for the period between times Tα and Tβ. We denote the value of an FRA at time t, where RK is received, as
$$V_{\mathrm{FRA}}(t) = L\,(R_K - R_F)(T_\beta - T_\alpha)\,P(t, T_\beta), \tag{2.1}$$
where L is the principal of the FRA and P(t,T) is the present value at time t of 1 Euro received at time
T (Brigo and Mercurio, 2007).
The forward interest rate that is used in FRAs is implied by current zero rates for periods of time in the future. An n-year zero rate is the rate of interest earned on an investment that starts today
and lasts for n years. All the interest and principal is realized at the end of n years. A curve of zero rates
can be created from market quotes by using a popular interpolation method known as the bootstrap
method. This method will be described in more detail in Section 2.2.
A fixed-for-floating swap, also known as a payer swap, is the most common type of swap. In this swap
an investor agrees to pay interest at a predetermined fixed rate on a notional principal for a predetermined
number of years. In return it receives interest at a floating rate on the same notional principal for the
same period of time. A swap can be characterized as a portfolio of forward rate agreements and this can
be used to determine its value. The value of the swap is simply the sum of multiple FRAs, so we find that the value of a payer swap is given by
$$V_{\mathrm{swap}}(t) = L \sum_{i=\alpha+1}^{\beta} (R_K - R_{F_i})(T_i - T_{i-1})\,P(t, T_i), \tag{2.2}$$
where the length of the swap, Tβ − Tα, is called the tenor, with n years between Tα and Tβ and m cash flows per year. Throughout this entire paper we denote by m the swap payment frequency per annum. So in total we have n × m cash flows, which can be valued like FRAs. This leads to the sum in (2.2), which runs over n × m different cash flows (Brigo and Mercurio, 2007).
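The swap-as-portfolio-of-FRAs valuation in (2.2) can be sketched directly. All rates, payment times, and discount factors below are hypothetical illustration values.

```python
def swap_value(L, R_K, fwd_rates, times, discount):
    """Value of the swap in (2.2) as a sum of FRA-like cash flows.
    times:     payment times T_alpha, ..., T_beta (length n*m + 1)
    fwd_rates: forward rates R_{F_i}, one per period (length n*m)
    discount:  discount factors P(t, T_i), one per payment time (length n*m)."""
    total = 0.0
    for i in range(len(fwd_rates)):
        tau = times[i + 1] - times[i]            # year fraction T_i - T_{i-1}
        total += (R_K - fwd_rates[i]) * tau * discount[i]
    return L * total

# Hypothetical 2-year swap with semi-annual payments (n = 2, m = 2 -> 4 cash flows).
times = [0.0, 0.5, 1.0, 1.5, 2.0]
fwd = [0.010, 0.012, 0.014, 0.016]
dfs = [0.995, 0.989, 0.982, 0.974]
print(round(swap_value(1_000_000, 0.013, fwd, times, dfs), 2))
```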
2.2 Bootstrapping the zero curve
Only spot rates are quoted in the market, so the bootstrapping method is used to obtain the forward
rates and forward swap rates. This method works by incrementally computing zero-coupon bonds in
order of increasing maturity.
As curve inputs, market quotes based on the Euribor rate are used first. More precisely, interest rate deposits with maturities varying from overnight up to 3 weeks are used. To expand
the time grid out to 30 years, swaps are used with a maturity varying from 1 month up to 30 years. The
interpolation over these market quotes will give us all zero-coupon prices, which we can use to compute
the forward rates and the forward swap rates.
Uri (2000) lists the payment frequencies, compounding frequencies, and day count conventions applicable to each currency-specific interest rate type. The Euro conventions are used in this research: the ACT/360 day count convention for Euro deposit rates and the 30/360 convention for Euro swap rates.
The deposit rates that are used for the time grid of the swap curve up to 3 weeks are inherently zero-
coupon rates. For this reason they only need to be converted to the base currency swap rate compounding
frequency and day count convention. The day count convention of the deposit is ACT/360, so we can
directly interpolate the data points to obtain the first part of the zero curve.
For the middle part of the curve one could use market quotes of forward rate agreements, as described by Uri (2000). This can be preferable because they carry a fixed time horizon to settlement and settle at maturity. However, FRAs can lack liquidity, which results in inaccurate market quotes. For this
reason only swaps and deposits are used. The annually compounded zero swap rate is used to construct
most of the zero curve. The different day count convention of the swaps is taken into account. The
discount rates are computed based on the deposit and swap rates. Brigo and Mercurio (2007) define the
zero curve at time t as a graph of the simply-compounded interest rates for maturities up to one year
and of the annually compounded rates for maturities larger than one year. The simply compounded and
the annually compounded interest rates are defined as follows:
$$L(t, T) = \frac{1 - P(t, T)}{(T - t)\,P(t, T)}, \tag{2.3}$$
$$Y(t, T) = \frac{1}{[P(t, T)]^{1/(T - t)}} - 1, \tag{2.4}$$
where L(t, T) represents the simply compounded interest rate at time t for maturity T and Y(t, T) represents the annually compounded interest rate. The simply compounded interest rates
that represent the first part of the zero curve are now combined with the annually compounded rates
that are used for the other part of the zero curve. To do so we first define R(ti) as the interest rate
corresponding to maturity ti, where i is the market observation index. Hence, R(ti) represents the simply
compounded interest rate if ti ≤ 1 and the annually compounded interest rate if ti > 1. There is no single
way to construct this complete zero curve correctly. It is, however, important that the derived yield curve is consistent, smooth, and closely tracks the observed market points. Uri (2000) also notes that over-smoothing the yield curve might eliminate valuable market pricing information.
Piecewise linear interpolation and piecewise cubic spline interpolation are two commonly used methods
that are appropriate for market pricing.
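Before turning to the interpolation methods in detail, note that the two compounding conventions (2.3) and (2.4) are straightforward to implement. A minimal sketch, with hypothetical discount factors:

```python
def simply_compounded(P, tau):
    """L(t, T) of (2.3): simple rate over year fraction tau = T - t."""
    return (1.0 - P) / (tau * P)

def annually_compounded(P, tau):
    """Y(t, T) of (2.4): annually compounded rate, used beyond one year."""
    return P ** (-1.0 / tau) - 1.0

# Hypothetical zero-coupon prices: a 6-month and a 5-year discount factor.
print(round(simply_compounded(0.998, 0.5), 6))   # roughly a 0.40% simple rate
print(round(annually_compounded(0.96, 5.0), 6))  # roughly a 0.82% annual rate
```

The round trip P = (1 + Y)^{-(T-t)} recovers the input discount factor, which is a useful sanity check when building the curve.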
The piecewise linear interpolation method is simple to implement, because the value of a new data
point is simply assigned according to its position along a straight line between observed market data
points. One drawback of this method, however, is that it produces kinks in areas where the yield curve changes slope. The piecewise linear interpolation can be constructed in closed form as follows:
$$R(t) = R(t_i) + \frac{t - t_i}{t_{i+1} - t_i}\,\bigl[R(t_{i+1}) - R(t_i)\bigr], \tag{2.5}$$
where $t_i \le t \le t_{i+1}$.
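Formula (2.5) is exactly what NumPy's `interp` implements; the maturities and zero rates below are hypothetical quotes, not the ICAP data.

```python
import numpy as np

# Hypothetical market maturities (years) and zero rates.
t_obs = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
r_obs = np.array([-0.003, -0.002, 0.000, 0.004, 0.009])

def linear_zero_rate(t):
    """Piecewise linear interpolation of (2.5); np.interp computes
    R(t_i) + (t - t_i) / (t_{i+1} - t_i) * (R(t_{i+1}) - R(t_i))."""
    return np.interp(t, t_obs, r_obs)

print(linear_zero_rate(3.5))  # halfway between the 2y and 5y quotes
```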
To avoid the kinks produced by the linear method, one can choose to fit a polynomial function through
the observed market data points instead. It is possible to either use a single high-order polynomial or a
number of lower-order polynomials. The latter method is preferred, because the extra degrees of freedom
can be used to impose additional constraints to ensure smoothness of the curve. The piecewise cubic
spline technique goes through all observed data points and creates the smoothest curve that fits the
observations and avoids kinks.
We can construct a cubic polynomial for each of the n − 1 splines between the n market observations. Now let Qi(t) denote the cubic polynomial associated with the segment [ti, ti+1]:
$$Q_i(t) = a_i(t - t_i)^3 + b_i(t - t_i)^2 + c_i(t - t_i) + R(t_i), \tag{2.6}$$
where R(ti) again represents market observation point i and ti represents the time to maturity of market
observation i. With three coefficients per spline and n− 1 splines, we have 3n− 3 unknown coefficients
and we impose the following constraints:
$$\begin{aligned}
a_i(t_{i+1} - t_i)^3 + b_i(t_{i+1} - t_i)^2 + c_i(t_{i+1} - t_i) &= R(t_{i+1}) - R(t_i),\\
3a_{i-1}(t_i - t_{i-1})^2 + 2b_{i-1}(t_i - t_{i-1}) + c_{i-1} - c_i &= 0,\\
6a_{i-1}(t_i - t_{i-1}) + 2b_{i-1} - 2b_i &= 0,\\
b_1 &= 0,\\
6a_{n-1}(t_n - t_{n-1}) + 2b_{n-1} &= 0.
\end{aligned} \tag{2.7}$$
The first set of n − 1 constraints is imposed in order to force the polynomials to perfectly fit to each
other at the knot points. To also let the first and second order derivatives of the polynomials match, we
set the second and third sets of 2n − 2 constraints. Finally, two endpoint constraints are required to set the second derivative equal to zero at both ends. We end up with a linear system of 3n − 3 equations and 3n − 3
unknowns, which is solved to obtain the optimal piecewise cubic spline.
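The zero second derivatives at both endpoints in (2.7) make this a natural cubic spline. Assuming SciPy is available, `CubicSpline` with `bc_type='natural'` solves the same linear system; the quotes below are hypothetical illustration values.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical market maturities (years) and zero rates.
t_obs = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 30.0])
r_obs = np.array([-0.003, -0.002, 0.000, 0.004, 0.009, 0.011])

# bc_type='natural' imposes a zero second derivative at both endpoints,
# matching b_1 = 0 and 6 a_{n-1}(t_n - t_{n-1}) + 2 b_{n-1} = 0 in (2.7).
spline = CubicSpline(t_obs, r_obs, bc_type='natural')

print(float(spline(3.5)))                             # smooth value between the knots
print(float(spline(0.5, 2)), float(spline(30.0, 2)))  # endpoint second derivatives, ~0
```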
Both methods are used and plotted in Figure 2.1 below. The main advantage of the linear interpo-
lation method is that it is closed form. The piecewise cubic spline interpolation method however takes
longer to compute. The figures show almost no difference between both methods. This relatively small
difference can be explained by the large number of data points and the smooth structure of the rates
with respect to their time to maturity.
[Figure 2.1 comprised four panels for 02-Jun-2016: the zero curve (market quotes) and the discount curve, each plotted against time to maturity from 0 to 30 years, once with the linear interpolated line and once with the cubic spline interpolated line.]
Figure 2.1: The interpolated zero and discount curves.
The zero curves as well as the discount curves are computed daily over the entire time grid. The
discount curves are used to price the swaptions and are obtained for a maturity up to 30 years. However,
we will only focus on swaptions with a maximum maturity of 10 years and a maximum underlying tenor
of the swap of 10 years.
As shown in Figure 2.1, the two interpolation methods differ very little. The computation time of the linear interpolation method is, however, significantly shorter. For this practical reason, and given the aim of
this research we have chosen to only use the linear interpolation method.
2.3 Swaptions
Swap options, or swaptions, are options on interest rate swaps. They give the holder the right to enter
into a certain interest rate swap at a certain time in the future. Depending on whether the swaption is a
call or a put option, we call it a payer swaption or a receiver swaption respectively. The swap rate of the
swap contract equals the strike of the swaption. In a payer swaption, the owner pays the fixed leg and
receives the floating leg and in a receiver swaption this is the other way around. For example, a ’2y10y’
European payer swaption with a strike of 1%, represents a contract in which the owner has the right to
enter a swap, with a tenor of ten years, in two years from now where he pays a fixed rate of 1%.
First we define the annuity factor A that is used for discounting:
$$A_{\alpha,\beta}(t) = \sum_{i=\alpha+1}^{\beta} (T_i - T_{i-1})\,P(t, T_i), \tag{2.8}$$
where we again have n × m cash flows, with n the number of years and m the number of cash flows per year. We also define the forward swap rate at time t for the set of times Ti, following Brigo and Mercurio (2007). The forward swap rate is the rate in the fixed leg of the interest rate swap that makes
the contract fair at the present time. We denote the forward swap rate as follows:
$$S_{\alpha,\beta}(t) = \frac{P(t, T_\alpha) - P(t, T_\beta)}{A_{\alpha,\beta}(t)}. \tag{2.9}$$
Now one can define the value of a payer swaption with strike K and resetting at Tα, . . . , Tβ−1 as follows:
$$V_{\mathrm{swaption}}(t) = A_{\alpha,\beta}(t)\,\mathbb{E}_t\bigl[(S_{\alpha,\beta}(T_\alpha) - K)^{+}\bigr]. \tag{2.10}$$
The value of the swaption clearly depends on the expected value of the difference between the forward
swap value and the strike rate. To obtain an arbitrage free price of a swaption, we need to define the
corresponding measure used to derive the expected value. This will be elaborated in more detail in
Section 2.4.
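The building blocks (2.8) and (2.9), and the payoff inside the expectation of (2.10), can be sketched as follows. Evaluating the expectation itself requires a model for the swap rate (Section 4); the discount factors and strike here are hypothetical illustration values.

```python
def annuity(times, discount):
    """A_{alpha,beta}(t) of (2.8): sum of year fractions times discount factors."""
    return sum((times[i + 1] - times[i]) * discount[i]
               for i in range(len(discount)))

def forward_swap_rate(P_t_Talpha, P_t_Tbeta, A):
    """S_{alpha,beta}(t) of (2.9)."""
    return (P_t_Talpha - P_t_Tbeta) / A

# Hypothetical 1y-into-2y swap with semi-annual payments.
times = [1.0, 1.5, 2.0, 2.5, 3.0]
dfs = [0.988, 0.982, 0.975, 0.967]     # P(t, T_i) for the payment dates
A = annuity(times, dfs)
S = forward_swap_rate(0.993, 0.967, A)

# Payoff inside the expectation of (2.10) for a payer swaption with strike K.
K = 0.01
payer_payoff = A * max(S - K, 0.0)
print(round(S, 6), round(payer_payoff, 6))
```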
Some swaptions and combinations of swaptions are briefly explained in this section because of their relevance for this research. An at-the-money (ATM) swaption is a swaption that has a strike equal to
the par swap rate of the underlying swap of the swaption. There are multiple trading strategies involving
swaptions. We define a straddle as the sum of an ATM payer swaption and an ATM receiver swaption
with the same ATM strike. If the interest rate is close to the strike rate at expiration of the options,
the straddle leads to a loss. However, if there is a sufficiently large move in either direction, a significant
profit will be the result.
We also define a strangle, which is the sum of a receiver swaption with a strike of ’ATM - offset’
and a payer swaption with a strike of ’ATM + offset’. The market normally refers to strangles as a ’2Y
into 10Y 100 out/wide skew strangle’ in which 100 is the width (in basis points) between the payer and
receiver strike and the offset from the ATM to both the payer and receiver swaption is thus width/2.
For example if we assume an ATM strike of 1%, the receiver strike is thus 0.5% and the payer strike is
1.5%. A strangle is a similar strategy to a straddle. The investor is betting that there will be a large
movement in the interest rate, but is uncertain whether it will be an increase or a decrease. If we compare the payoffs of both strategies, we see that the interest rate has to move farther in a strangle than
in a straddle for the investor to make a profit. However, the downside risk if the interest rate ends up at a central value is smaller with a strangle.
Finally, a collar is also defined. A collar is a payer swaption with a strike of ’ATM + offset’ minus
a receiver swaption with a strike of ’ATM - offset’. A collar is normally quoted as a ’2Y into 10Y 100
out/wide skew collar’ in which the width of 100 basis points is again the width between the payer and
receiver strike. So, you will pay floating if the swap rate is within the interval of ’ATM ± offset’ and pay
a fixed rate for the range of the swap rate outside this interval.
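The expiry payoffs of the three strategies above can be compared in a short sketch (per unit of annuity; the ATM strike and width are hypothetical). It shows that the rate must move farther in a strangle than in a straddle before the position pays off.

```python
def payer(S, K):       # payer swaption payoff at expiry (per unit annuity)
    return max(S - K, 0.0)

def receiver(S, K):    # receiver swaption payoff at expiry
    return max(K - S, 0.0)

atm, width = 0.01, 0.01            # hypothetical: ATM strike 1%, 100 bp wide
lo, hi = atm - width / 2, atm + width / 2

def straddle(S): return payer(S, atm) + receiver(S, atm)
def strangle(S): return payer(S, hi) + receiver(S, lo)
def collar(S):   return payer(S, hi) - receiver(S, lo)

for S in (0.000, 0.005, 0.010, 0.015, 0.020):
    print(f"S={S:.3f}  straddle={straddle(S):.4f}  "
          f"strangle={strangle(S):.4f}  collar={collar(S):+.4f}")
```

Note that the strangle payoff is zero anywhere inside [ATM − offset, ATM + offset], while the straddle pays |S − ATM| there; the collar changes sign across the interval.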
2.4 Martingales and Measures
The models that are used to price derivatives try to estimate the expected payoff of the derivative. These
models are based on a stochastic process, which is simply a variable whose value changes over time in
an uncertain way. Processes in which only the current value of a variable is relevant for predicting the future are called Markov processes. The Markov property is very useful, because it states that the future value of a variable is independent of the path it has followed in the past. This corresponds to the assumption of weak market efficiency: all the relevant information is captured in the current value of the variable (Hull, 2012).
We now focus on a particular kind of Markov process, known as a Wiener process (or a Brownian motion). Formally, we define a P-Wiener process as stated in the theorem below (Tsay, 2005).
Theorem 2.2. A real-valued stochastic process {Wt}t≥0 is a P-Wiener process if, for some real constant σ, under P,
1. for each s ≥ 0 and t ≥ 0 the random variable Wt+s − Ws has the normal distribution with mean zero and variance σ²t,
2. for each n ≥ 1 and any times 0 ≤ t0 ≤ t1 ≤ · · · ≤ tn, the random variables Wtr − Wtr−1 are independent,
3. W0 = 0,
4. Wt is continuous in t ≥ 0.
The probability measure P assigns a probability to each event A ∈ F. We can think of F as a collection of subsets of the entire sample space. Finally, Ft contains all the information about the evolution of the stochastic process up until time t.
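Properties 1 and 2 of Theorem 2.2 can be checked by simulation: a Wiener path is the cumulative sum of independent Gaussian increments with variance σ² dt. A minimal sketch, with a hypothetical σ, horizon, and grid:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, T, n_steps, n_paths = 1.0, 2.0, 500, 20_000
dt = T / n_steps

# Properties 1-2 of Theorem 2.2: independent N(0, sigma^2 * dt) increments.
increments = rng.normal(0.0, sigma * np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(increments, axis=1)   # W_t along each path; W_0 = 0 implicitly

# The sample variance of W_T should be close to sigma^2 * T and the mean close to 0.
print(round(W[:, -1].mean(), 3), round(W[:, -1].var(), 2))
```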
The price of a non-dividend paying stock is often modelled as a Geometric Brownian motion. Before
we define this Geometric Brownian motion we first define a standard Brownian motion as a Wiener
process with zero drift and a variance proportional to the length of the time interval. This corresponds
to a rate of change in the expectation that is equal to zero and a rate of change in the variance that
is equal to one. We now consider a generalized Wiener process, where the expectation has a drift rate
equal to µ and the rate of change in the variance is equal to σ2 (Tsay, 2005). This leads to the following
generalized Wiener process:
$$dx_t = \mu(x_t, t)\,dt + \sigma(x_t, t)\,dW_t, \tag{2.11}$$
where Wt is a standard Brownian motion. We then consider the modelled change in price of a non-dividend paying stock over time and this results in the following Geometric Brownian motion:
$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t \quad\Rightarrow\quad \frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t, \tag{2.12}$$
where µ and σ are constant. Now Ito’s lemma can be used to derive the process followed by the logarithm
of St (Itô, 1951). First consider the general case for the continuous-time stochastic process xt of (2.11).
We also define G(xt, t) as a differentiable function of xt and t and find
$$dG = \left(\frac{\partial G}{\partial x}\,\mu(x_t, t) + \frac{\partial G}{\partial t} + \frac{1}{2}\,\frac{\partial^2 G}{\partial x^2}\,\sigma^2(x_t, t)\right)dt + \frac{\partial G}{\partial x}\,\sigma(x_t, t)\,dW_t. \tag{2.13}$$
We then apply Ito’s lemma to obtain a continuous-time model for the logarithm of the stock price. The
differentiable function is now defined as G(St, t) = ln(St). This leads to
$$d\ln S_t = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t. \tag{2.14}$$
This stochastic process has a constant drift rate of µ− σ2/2 and a constant variance of σ2. This implies
that the price of a stock at some future time T is log-normally distributed, given the current value of
the stock at time t
$$\ln S_T \sim \phi\!\left[\ln S_t + \left(\mu - \frac{\sigma^2}{2}\right)\Delta,\; \sigma^2\Delta\right], \tag{2.15}$$
where ∆ is the fixed time interval T − t. Black’s model is based on this lognormal property together
with the property of a risk-neutral world as will be further explained in Section 4.1.
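Equations (2.12) through (2.15) suggest a direct simulation check: draw ln S_T exactly via (2.14) and compare the sample moments with (2.15). A sketch with hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
S0, mu, sigma, Delta, n_paths = 100.0, 0.05, 0.2, 1.0, 200_000

# Exact simulation over one interval Delta using (2.14):
# ln S_T = ln S_t + (mu - sigma^2 / 2) * Delta + sigma * sqrt(Delta) * Z.
Z = rng.standard_normal(n_paths)
log_ST = np.log(S0) + (mu - 0.5 * sigma ** 2) * Delta + sigma * np.sqrt(Delta) * Z

# Compare against the lognormal moments in (2.15).
print(round(log_ST.mean(), 3))  # close to ln(100) + 0.03
print(round(log_ST.var(), 3))   # close to sigma^2 * Delta = 0.04
```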
In order to be able to normalize different asset prices, one can use a numeraire Z as reference asset.
A numeraire is defined as any positive non-dividend-paying asset. A key result used in the pricing of
derivatives is the relation between the concept of absence of arbitrage and the existence of a probability
measure like the martingale measure (or risk-neutral measure). Brigo and Mercurio (2007) denote this
relation as follows based on a numeraire Z
$$\frac{S_t}{Z_t} = \mathbb{E}^{Z}\!\left[\frac{S_T}{Z_T}\,\Big|\,\mathcal{F}_t\right], \quad 0 \le t \le T, \tag{2.16}$$
where the price of any traded asset S (without intermediate payments) relative to Z is a martingale under the probability measure QZ. This probability measure Q is equivalent to the real-world probability measure P. A martingale is a zero-drift stochastic process, so under probability measure QZ we have, for a sequence of random variables S0, S1, . . . ,
$$\mathbb{E}^{Z}[S_i \mid S_{i-1}, S_{i-2}, \ldots, S_0] = S_{i-1}, \qquad \forall\, i > 0. \tag{2.17}$$
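The defining property (2.17) can be illustrated with the simplest example of a martingale, a symmetric random walk: conditional on the previous value, the expected next value equals the previous value. A small simulation sketch (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, n_steps = 100_000, 5

# Symmetric +/-1 steps form a discrete-time martingale.
steps = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
S = np.cumsum(steps, axis=1)

# Empirical version of (2.17): E[S_i | S_{i-1} = s] should equal s.
s_prev, s_next = S[:, 2], S[:, 3]
for s in (-1.0, 1.0, 3.0):
    cond = s_prev == s
    print(s, round(s_next[cond].mean(), 2))  # conditional mean close to s
```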
The preferred numeraire depends on the derivative that is priced. Two frequently used numeraires are now briefly described: first a numeraire based on the zero-coupon bond, and second a numeraire based on the annuity of a swap.
A zero-coupon bond, with a maturity T equal to that of the derivative, is commonly used as a
numeraire. We denote the value of this numeraire at time t as Zt and note that ZT = P (T, T ) = 1. We
also denote the measure associated with this numeraire as the T-forward measure QT with expectation
ET . This way we are able to price a derivative by computing the expectation of its payoff under this
measure. This leads to the following price of a derivative at time t
$$V(t) = P(t, T)\,\mathbb{E}^{T}\!\left[\frac{V(T)}{P(T, T)}\,\Big|\,\mathcal{F}_t\right] = P(t, T)\,\mathbb{E}^{T}[V(T)], \tag{2.18}$$
for 0 ≤ t ≤ T (Brigo and Mercurio, 2007). Notice that the forward rate is a martingale under this measure, which makes the forward measure convenient to work with.
The annuity of a swap is a linear combination of zero-coupon bonds. A numeraire is defined as a
positive non-dividend paying asset, so the annuity of a swap can also be used as a numeraire. The
numeraire in this case will be the following portfolio of zero coupon bonds:
Z_T = A_{α,β}(T) = Σ_{i=α+1}^{β} (T_i − T_{i−1}) P(T, T_i),    (2.19)
which leads to the swap measure Q^{α,β}. Under this measure we find that the swap rate S_{α,β}(t) is a
martingale:
S_{α,β}(t) = ( P(t, T_α) − P(t, T_β) ) / A_{α,β}(t),    (2.20)

⇒  ( P(t, T_α) − P(t, T_β) ) / Z_t = E^{α,β}[ ( P(T, T_α) − P(T, T_β) ) / Z_T | F_t ],    0 ≤ t ≤ T.    (2.21)
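As an illustration of (2.19) and (2.20), the annuity and par swap rate can be computed directly from zero-coupon bond prices. The flat 2% discount curve and the payment schedule below are illustrative assumptions, not market data:

```python
import math

def annuity(discounts, tenors):
    """A_{alpha,beta}(t) = sum_i (T_i - T_{i-1}) P(t, T_i) over the fixed-leg dates."""
    intervals = zip(tenors[:-1], tenors[1:])
    return sum((t1 - t0) * p for (t0, t1), p in zip(intervals, discounts[1:]))

def par_swap_rate(discounts, tenors):
    """S_{alpha,beta}(t) = (P(t, T_alpha) - P(t, T_beta)) / A_{alpha,beta}(t)."""
    return (discounts[0] - discounts[-1]) / annuity(discounts, tenors)

# Toy example: flat 2% continuously compounded curve, annual payments T_alpha = 1y .. T_beta = 6y.
tenors = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
discounts = [math.exp(-0.02 * T) for T in tenors]   # P(0, T_i)
S = par_swap_rate(discounts, tenors)
```

On a flat continuously compounded curve the resulting par rate is close to the annually compounded equivalent of the flat rate, which is a quick sanity check on the implementation.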
These numeraires and their related measures are used in arbitrage-free pricing, which is an essential part
of the option pricing models that are used. These models and their assumptions are further explained in
Section 4.1. However, first we will review several studies that are relevant for this research in the next
section.
3 Literature review
This study combines several methods with different underlying assumptions. First of all, the risk-neutral
world assumption is used and both the Black model and the SABR volatility model are used to price a
swaption on an interval of strike rates. When pricing these swaptions we also take negative interest rates
into account by using a displacement parameter. The risk measures Value at Risk and Expected
Shortfall are then computed. The underlying profit and loss distribution is estimated under real-world
probabilities to obtain valid risk measures. Hence, we bridge the Q-measure and
the P-measure. To obtain the forecasts of the risk measures, we use two different methods. The quality
of the estimates of these two methods is evaluated by several different backtests. The methods that are
used differ in several ways. The Historical Simulation method, for instance, gives all historical returns
in the estimation window an equal weight and uses them to construct the profit and loss distribution
of the portfolio. The time series analysis that is performed on the other hand simulates one-day-ahead
forecasts of the SABR model parameters. These SABR parameters represent the characteristics of the
volatility structure of the individual swaptions. As a result of this we estimate the risk measures based
on one-day-ahead simulations of this volatility structure instead of the portfolio value itself.
We will now discuss some studies that have also focused on the aspects that we are looking at.
Pérignon and Smith (2010), for instance, compare the disclosed quantitative information and VaR
estimates of up to fifty international commercial banks in their paper. They use panel data over the
period 1996-2005 and find that VaR estimates are in general excessively conservative and also note that
there is no improvement in the estimates of the VaR over time. Besides this, they find that the most
popular VaR method is the Historical Simulation method. Then they also conclude that this method
helps little in forecasting the volatility of future trading revenues. Pérignon and Smith (2010) use the
Unconditional Coverage test of Kupiec (1995) to test whether the proportion of VaR violations equals
the desired proportion p. The number of VaR violations that is found is extremely small and the null
hypothesis of unconditional coverage is rejected for every year except for 1998 at the 5% significance level.
This study clearly shows the relevance of finding an improved and less conservative method to estimate
risk measures like the Value at Risk.
There are multiple improvements proposed in the literature that are based on the Historical Simula-
tion method such as the Filtered Historical Simulation method (FHS) as described by Barone-Adesi et al.
(2002). While the Historical Simulation method was found to be excessively conservative by Pérignon
and Smith (2010), it is also known to underestimate risk in some particular situations. This is because
the method is based on the assumption that the risks do not change over time. Hence, when the market
conditions change and the market becomes more volatile, the risk is underestimated by the method. This
can fortunately be solved by first standardizing the historical returns and then scaling them to the current
volatility as is done with the Filtered Historical Simulation method. In this method a GARCH model
is fitted to the historical data and the residuals are divided by their corresponding volatility estimates.
These standardized residuals are then randomly drawn and used to simulate the one-day-ahead profit
and loss distribution. Even though this method overcomes a shortcoming of the Historical Simulation
method it still needs some care. According to the work of Gurrola and Murphy (2015) the filtering
process changes the return distribution in ways that may not be intuitive. Furthermore, care is needed
in choosing the applications in which the FHS method is used, and re-calibration and re-testing are
essential to ensure that the model remains relevant. Finally, Pritsker (2001) also shows
that one has to be careful when dealing with limited data sets. He shows for example that two years of
historical data is not sufficient for the FHS method to estimate the Value at Risk accurately at a 10-day
horizon.
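The filtering step described above can be sketched as follows. This is a minimal illustration with simulated returns and fixed, illustrative GARCH(1,1) parameters; a real application would estimate ω, a and b by maximum likelihood, as Barone-Adesi et al. (2002) do:

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.standard_normal(500) * 0.01          # placeholder historical P&L series

# GARCH(1,1) variance recursion with illustrative fixed parameters.
omega, a, b = 1e-6, 0.08, 0.90
sigma2 = np.empty_like(returns)
sigma2[0] = returns.var()
for t in range(1, len(returns)):
    sigma2[t] = omega + a * returns[t - 1] ** 2 + b * sigma2[t - 1]

std_resid = returns / np.sqrt(sigma2)              # filtering: standardize by fitted vol

# One-day-ahead variance forecast, then rescale bootstrapped standardized residuals.
sigma2_next = omega + a * returns[-1] ** 2 + b * sigma2[-1]
simulated_pnl = np.sqrt(sigma2_next) * rng.choice(std_resid, size=10_000)

var_99 = -np.quantile(simulated_pnl, 0.01)         # 99% one-day FHS VaR (loss positive)
```

The rescaling step is what makes FHS responsive to current market volatility, in contrast to the equal-weight Historical Simulation method.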
The Historical Simulation method is based on historical returns; to obtain these returns, however, we
first need to price the swaptions. There are numerous models that can be used to price a swaption.
However, we also need to take the smile risk into account due to the fact that the volatility of the
swaptions differs for different strike rates. To capture this smile risk in the derivatives market Hagan
et al. (2002) introduce the SABR volatility model. West (2005) calibrates the parameters of the SABR
model in a situation where input data is very scarce. The calibration is based on equity futures which are
traded at the South African Futures Exchange. The study focuses on packages of options that combine
multiple derivatives, like a collar or a butterfly. Some of these packages are traded about 800 times in
total, while there are more than double that number of strike combinations. West (2005)
compares two cases. First, he estimates all of the SABR model parameters daily and then in the second
case he keeps one of the parameters (β) fixed while he still estimates the other parameters daily. This is
because hedging efficiency can be ensured by changing the parameters only once a month while changing
the input values of F and σATM daily. West (2005) finds that the calibrated parameters of the model
only change infrequently when the value for β is fixed. Strictly speaking, the parameters always change
when viewed at very high precision, but they remain unchanged up to a fairly high precision. For this reason, he finds that keeping
the value for β fixed leads to an infrequent change of the other SABR parameters. These infrequent
changes result in the end in lower hedging costs. Hence, this research shows a robust algorithm to capture
the volatility smile based on the SABR model while the input data is very scarce and also shows the
advantages of keeping the parameter β fixed.
Bogerd (2015) also uses the SABR volatility model, but he combines it with the Historical Simulation
method. He focuses specifically on the volatility structure of swaptions. He uses daily observations of
the calibrated SABR model parameters and also uses a displacement parameter to deal with negative
interest rates. He simulates 1000 one-day-ahead estimates of the profit and loss distribution based on
historical changes in the SABR model parameters. A distinction is made here between the curvature
and the level of the volatility structure. Only varying one of the SABR parameters (i.e. α) results in
just a vertical shift of the volatility skew. Bogerd (2015) notes that this is a reasonable approximation,
because most of the variation in the swaption volatility over time is caused by vertical movements of the
volatility smile. He performs an unconditional coverage test as well as an independence test and only
rejects the independence property for the Historical Simulation method applied to all of the SABR model
parameters. The independence property is tested here with the backtest of Du and Escanciano (2015),
which is based on the Ljung-Box statistic. These results imply that there are possibilities to obtain valid
forecasts of the risk measures based on estimates of the one-day-ahead volatility structure. We note
however that the SABR parameters that represent the volatility structure are dependent on each other
as described in Section 4.1.2. When dealing with such a time series of interdependent parameters, a
multivariate time series model can be used to capture the dynamics of the parameters over time. This
makes it interesting to investigate whether it is possible to improve the one-day-ahead forecasts of the
volatility structure by using a time series analysis.
There are however some difficulties when applying the Historical Simulation method on the SABR
model parameters. Moni (2014) explains that it is questionable if it is meaningful to add past changes
in the SABR parameters to their current values. A change in the SABR parameters changes the entire
volatility structure. Such a change may not always be valid, especially if the values of the historical
SABR parameters are significantly different from the current values of the SABR parameters. For this
reason, the Historical Simulation method will not be applied to the SABR parameters in this study. We
will compare estimated risk measures of the Historical Simulation method based on the portfolio returns
with the estimated risk measures based on a time series analysis of the SABR model parameters.
In this study we make use of two different measures, each with its own underlying assumptions.
The risk-neutral world assumption makes it possible for us to compute the expected value of future
payoffs without having to deal with the different risk preferences of buyers and sellers of derivatives.
Giordano and Siciliano (2013) clarify in their paper that this risk-neutral hypothesis is acceptable for
pricing derivatives. However, they also note that the risk-neutral assumption cannot be used to forecast
the future value of a financial product. So, if we estimate the one-day-ahead value of a swaption we need
to take the risk premium into account. Hence, we compute the estimated profit and loss distribution
based on the real-world probability measure P. Therefore we use the risk-neutral world assumption only
to compute the volatility structure of the derivatives based on the quoted historical swaption premiums.
These volatility structures are then used together with the risk-neutral assumption to price the swaptions
up to and including the last day of the estimation window. The methods that are then used to estimate
the one-day-ahead profit and loss distribution do not depend on the risk-neutral assumption. The one-
day-ahead forecasts of the price of the swaptions are estimated based on the real-world probabilities.
The risk measures are then computed based on these estimates of the profit and loss distribution.
The adequacy of the forecasts based on these models will be assessed by several backtests. Piontek
(2009) reviews various backtests that assess the quality of models that produce VaR estimates. He
analyzes some commonly used backtesting methods in his research and focuses on the problems regarding
limited data sets and low power of the tests. The simulations are performed for different sample sizes
with the number of observations between 100 and 1000. He finds a low power for the backtest of Kupiec
(1995) for all of these sample sizes. For example, with 250 observations and an inaccurate model that
produces 3% or 7% violations instead of the chosen tolerance level of 5%, the backtest rejects the model
in only 35% of the draws; in other words, such an inaccurate model goes undetected in 65% of the cases
at a 5% significance level. A low power is also found for other backtests, which shows that we cannot
assume that a model is correct merely because it is not rejected by a
backtest. In the empirical study of this research we also have to deal with a limited backtesting sample
size of 363 observations. For this reason, we apply numerous different backtests that enable us to assess
the quality of our methods more extensively.
In the next section we will first discuss the models and methods that are used in the empirical part
of this research. We will then continue with a description of the data and then also discuss the results
of the empirical study.
4 Models and method
The SABR volatility model that is used will be explained in more detail in Section 4.1.2. It will be
used to convert the quoted market swaption premiums into a volatility surface that allows us to price
swaptions for arbitrary non-quoted strikes. This will be done for a selected combination of the expiry
and tenor, so not the entire surface will be taken into account.
4.1 Option pricing models
Under the right corresponding measure, we have seen that both the forward rate as well as the swap
rate are martingales. In this research we use the Euribor forward rate, which is a martingale under the
forward measure Q^T. Forward swap rates are likewise martingales under their measure Q^{α,β}. The option pricing models are based on the following stochastic process
dFt = c(t, . . . )dWt. (4.1)
Here W_t is a Brownian motion and the coefficient c(t, . . .) can be deterministic or random. Note that the dynamics do
not have a drift term, since the forward rate is a martingale under its corresponding measure.
4.1.1 Black’s model
Black (1976) introduced a model which gives a closed form solution for the price of an option under the
assumption that price movements of the forward rate Ft follow a log-normal distribution. The dynamics
in Black’s model depend on the current value of the forward rate Ft and one parameter σB called Black’s
volatility and are given by the following equation
dF_t = σ_B F_t dW_t,    F_0 = F > 0.    (4.2)
The standard continuous-time stochastic process is given in (2.11). Notice that the drift parameter
μ has dropped out of Black's differential equation, which implies that the equation is independent of risk
preferences. Black, Scholes and Merton use in their analysis the fact that a riskless portfolio can be set up from
the stock and the derivative. This portfolio is riskless only for an instantaneously short period, but it can be
rebalanced frequently. One can therefore assume that investors are risk-neutral and use the following
results: the expected return on all securities is the risk-free interest rate r, and the present
value of any cash flow can be obtained by discounting its expected value at the risk-free rate (Tsay,
2005).
The expected payoff of a European call option on a futures contract under the forward measure is

E^T[ max(V(T) − K, 0) ],    (4.3)

where E^T denotes the expected value under the forward measure and V(T) is the value of the underlying
of the option at time t = T. We denote the price of this call option at time t as

c_t = P(t, T) E^T[ max(V(T) − K, 0) ].    (4.4)
Using the dynamics of (4.1), the following well-known solution for the price of a European call option
on a futures contract can be derived
c_0(F_0, K, T; σ_B) = P(0, T) [ F_0 Φ(d_1) − K Φ(d_2) ],

d_1 = ( ln(F_0/K) + σ_B² T/2 ) / ( σ_B √T ),

d_2 = d_1 − σ_B √T.    (4.5)
Besides this general formula, one can also compute the price of a payer swaption with Black’s formula,
as described in Hull (2012)
d_1 = ( ln(S_{α,β}(T_α)/K) + σ² T/2 ) / ( σ √T ),

d_2 = d_1 − σ √T,

V_Swaption(t) = L A_{α,β}(t) [ S_{α,β}(T_α) N(d_1) − K N(d_2) ],    (4.6)
where L is the notional principal value of the contract. In this formula the swap rate is used instead of
the discounted futures price; based on this swap rate and the swap measure, we can price a swaption in
a manner similar to an option on a futures contract.
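The two pricing formulas above translate directly into code. The sketch below uses illustrative inputs (a 1-year ATM payer swaption on a 2% swap rate, 30% lognormal volatility, an assumed annuity of 4.6 and a notional of one million), not values from this study:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(F, K, T, sigma, discount=1.0):
    """Black's formula (4.5): P(0,T) [F Phi(d1) - K Phi(d2)]."""
    d1 = (math.log(F / K) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return discount * (F * norm_cdf(d1) - K * norm_cdf(d2))

def payer_swaption(L, A, S, K, T, sigma):
    """Black payer swaption price (4.6): L A(t) [S N(d1) - K N(d2)]."""
    return L * A * black_call(S, K, T, sigma)

price = payer_swaption(L=1_000_000, A=4.6, S=0.02, K=0.02, T=1.0, sigma=0.30)
```

Note that the swaption price is simply a forward Black call scaled by the annuity, mirroring the change of numeraire to Q^{α,β} discussed in Section 2.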
4.1.2 SABR volatility model
One of the assumptions of the Black model is that a fractional change in the futures price over any
interval follows a lognormal distribution (Black, 1976). If this assumption is violated, some of the
resulting prices change accordingly. If, for example, the probability of a large positive movement in the
interest rate would actually be significantly higher than implied by the lognormal property, this would
lead to a higher expected payoff of an out-of-the-money (OTM) payer swaption with a strike rate in
this region. The corresponding price of such a swaption will subsequently also need to be higher than
the price based on the lognormal assumption. This phenomenon is observed in the market and leads to
a volatility that varies for different strike rates, as opposed to the constant Black’s volatility. For this
reason, we introduce a volatility model to take this volatility skew into account.
The Stochastic Alpha Beta Rho model, like derived by Hagan et al. (2002), is given by a system of
two stochastic differential equations. The state variables Ft and αt are defined as the forward interest
rate and a volatility parameter respectively. The dynamics of the model are as follows
dF_t = α_t F_t^β dW_t^(1),    F_0 = F > 0,

dα_t = ν α_t dW_t^(2),    α_0 = α > 0,

dW_t^(1) dW_t^(2) = ρ dt,    (4.7)
where the power parameter β ∈ [0, 1] and ν > 0 is the volatility of α_t, i.e. the volatility of the volatility
of the forward rate. dW_t^(1) and dW_t^(2) are two ρ-correlated Brownian motions. The factors F and α are
stochastic, while the parameters β, ρ and ν are not.
West (2005) describes the parameters in more detail. α is a 'volatility-like' parameter: not equal to
the volatility, but there is a functional relationship between this parameter and the at-the-money
volatility. Including the constant ν acknowledges that volatility obeys well-known clustering in time.
The parameter β ∈ [0, 1] defines the relationship between the futures spot and the at-the-money volatility. A
value of β close to one indicates that the user believes that if the market were to move up or down in an
orderly fashion, the at-the-money volatility level would not be affected significantly, whereas a value of
β ≪ 1 indicates that if the market were to move, the at-the-money volatility would move in the opposite
direction; the closer β is to zero, the more distinct this effect. Moreover, the value of β also gives insight
into the distribution of the underlying: if β is close to one the stochastic model is said to be more
lognormal, and the closer β is to zero the more closely it follows the normal distribution instead.
Hagan et al. (2002) show that the price of a vanilla option under the SABR model is given by the
appropriate Black’s formula, provided the correct implied volatility is used. For given α, β, ρ, ν and τ ,
this volatility is given by
σ(K, F, τ) = α / { (FK)^((1−β)/2) [ 1 + ((1−β)²/24) ln²(F/K) + ((1−β)⁴/1920) ln⁴(F/K) ] }
             × ( 1 + [ ((1−β)²/24) α²/(FK)^(1−β) + ρβνα / (4 (FK)^((1−β)/2)) + ((2−3ρ²)/24) ν² ] τ ) × z/χ(z),    (4.8)

where z = (ν/α) (FK)^((1−β)/2) ln(F/K),    (4.9)

and χ(z) = ln( ( √(1 − 2ρz + z²) + z − ρ ) / (1 − ρ) ),    (4.10)
for an option with strike K, given that the current value of the forward price is F . Here we note that in
our case we have that the forward value is equal to the par swap rate. Hence, we have F = Sα,β(Tα) and
note that if F = K the swaption is said to be at-the-money. For the ATM strike rate, the factor z/χ(z)
drops out of the equation, because in the limit K → F we have z/χ(z) → 1. So for the at-the-money
volatility, one can rewrite the equation as follows
σ_ATM(F, τ) = [ ((1−β)²τ / (24 F^(2−2β))) α³ + (ρβντ / (4 F^(1−β))) α² + (1 + ((2−3ρ²)/24) ν²τ) α ] / F^(1−β),    (4.11)
where τ is the year fraction to maturity. This formula is closed form, which makes the model very
convenient for the pricing of an option.
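Equation (4.11) can be implemented directly; the parameter values in the example below are illustrative, not calibrated:

```python
def sabr_atm_vol(alpha, beta, rho, nu, F, tau):
    """At-the-money SABR implied volatility, eq. (4.11)."""
    Fb = F ** (1.0 - beta)
    correction = ((1.0 - beta) ** 2 / 24.0 * alpha ** 2 / Fb ** 2
                  + rho * beta * nu * alpha / (4.0 * Fb)
                  + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2)
    return alpha / Fb * (1.0 + correction * tau)

# Illustrative parameter values, not calibrated ones.
vol = sabr_atm_vol(alpha=0.02, beta=0.5, rho=-0.2, nu=0.4, F=0.03, tau=1.0)
```

A useful sanity check is that for β = 1 and ν = 0 the correction vanishes and the ATM volatility collapses to α, the lognormal Black case.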
There is however one main drawback of Hagan's formula: it is known to produce wrong prices in the
region of small strikes for large maturities. Obłój (2008) therefore proposes an improvement to the
original formulas that compute the volatility as defined by Hagan et al.
(2002). In his paper he gives several arguments to use the formula derived by Berestycki et al. (2004).
To understand why we use the formula of Berestycki et al. (2004), we consider the Taylor expansion of
the implied volatility surface
σ(K, F, τ) = σ_0(K, F) ( 1 + σ_1(K, F) τ ) + O(τ²).    (4.12)
Obłój (2008) then compares the explicit expressions of Hagan et al. (2002) and Berestycki et al. (2004)
for σ0(K,F ) and σ1(K,F ). It can be shown that both expressions for σ0(K,F ) and σ1(K,F ) are exactly
the same when either K = F , ν = 0 or β = 1. However, when β < 1 the results of σ0(K,F ) of the
two papers differ and Obłój (2008) argues that the formula of Berestycki et al. (2004) is correct and
should be used. This conclusion is based on two arguments. First, Hagan's formula is inconsistent
as β → 0. Secondly, the formula suggested by Obłój (2008) produces, in most cases, correct prices in
the region of small strikes for large maturities, unlike Hagan’s formula.
The formula for the implied volatility is now obtained by combining σ0(K,F ) from Berestycki et al.
(2004) and σ1(K,F ) from Hagan et al. (2002). We define the fine-tuned implied volatility as follows
σ(K, F, τ) = [ ν ln(F/K) / χ(z) ] × ( 1 + [ ((1−β)²/24) α²/(FK)^(1−β) + ρβνα / (4 (FK)^((1−β)/2)) + ((2−3ρ²)/24) ν² ] τ ),    (4.13)

where z = (ν/α) ( F^(1−β) − K^(1−β) ) / (1 − β),    (4.14)

and χ(z) = ln( ( √(1 − 2ρz + z²) + z − ρ ) / (1 − ρ) ),    (4.15)
which is used instead of (4.8) if there is reason to assume that β < 1.
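A minimal sketch of the fine-tuned formula (4.13)–(4.15), again with illustrative parameter values; it is only intended for K ≠ F and β < 1, since the factor involving χ(z) needs its limit treatment at the money:

```python
import math

def sabr_vol_obloj(alpha, beta, rho, nu, F, K, tau):
    """Fine-tuned SABR implied volatility, eqs. (4.13)-(4.15), for K != F, beta < 1."""
    z = nu / alpha * (F ** (1 - beta) - K ** (1 - beta)) / (1 - beta)            # (4.14)
    chi = math.log((math.sqrt(1 - 2 * rho * z + z * z) + z - rho) / (1 - rho))   # (4.15)
    fk = (F * K) ** ((1 - beta) / 2)
    correction = 1 + ((1 - beta) ** 2 / 24 * alpha ** 2 / fk ** 2
                      + rho * beta * nu * alpha / (4 * fk)
                      + (2 - 3 * rho ** 2) / 24 * nu ** 2) * tau
    return nu * math.log(F / K) * correction / chi                               # (4.13)

# Near the money the result should approach the ATM level alpha / F^(1-beta).
vol = sabr_vol_obloj(alpha=0.02, beta=0.5, rho=-0.2, nu=0.4, F=0.03, K=0.031, tau=1.0)
```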
We will now discuss the method that is used to calibrate the SABR model parameters. In the
empirical part of this research we will only find values of β < 1, so as a result we will only work with
(4.13) instead of (4.8). Nevertheless Obłój (2008) showed that the expressions from Hagan et al. (2002)
and Berestycki et al. (2004) are exactly the same for the volatility of an at-the-money swaption. For this
reason, (4.11) remains valid. We now follow the steps from West (2005) and notice the following relation
ln σ_ATM = ln α − (1 − β) ln F + . . . ,    (4.16)
so the right value of β can be estimated from a log-log plot of σATM and F . Hagan et al. (2002) suggest
that it is appropriate to fit this parameter in advance and never change it. So the appropriate value for
β is chosen first. Then (4.11) is inverted to obtain an expression of α in the other SABR parameters and
the at-the-money volatility. This is done by setting the equation equal to zero and selecting the smallest
positive real root. In the final step we minimize the difference between the market volatilities and the
volatilities computed with the SABR model
min_{ρ,ν} | σ_M − σ_SABR(α, β, ρ, ν, τ) |,    (4.17)

where β has already been estimated and α = α(σ_ATM, β, ρ, ν, τ) follows from the inverted at-the-money
relation. The time to maturity τ is also known, so we calibrate ρ and ν by minimizing this difference.
In this method, we calibrate the parameters so that the
produced at-the-money volatilities are exactly equal to the market quotes. The at-the-money volatilities
are important to match, because they are traded most frequently. Finally, when all of the parameters
are calibrated and we have estimated the SABR volatility for a swaption, we can use (4.6) to price this
swaption. The steps to calibrate the SABR model parameters are all applied and described in more
detail in Section 6.1.
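The inversion of (4.11) for α, the middle calibration step, amounts to selecting the smallest positive real root of a cubic. The round trip below uses illustrative parameters; the subsequent minimization over ρ and ν would wrap this inversion inside an optimizer:

```python
import numpy as np

def atm_vol(alpha, beta, rho, nu, F, tau):
    """Forward evaluation of eq. (4.11)."""
    Fb = F ** (1 - beta)
    return alpha / Fb * (1 + ((1 - beta) ** 2 / 24 * alpha ** 2 / Fb ** 2
                              + rho * beta * nu * alpha / (4 * Fb)
                              + (2 - 3 * rho ** 2) / 24 * nu ** 2) * tau)

def alpha_from_atm(sigma_atm, beta, rho, nu, F, tau):
    """Invert (4.11) for alpha: smallest positive real root of the cubic."""
    Fb = F ** (1 - beta)
    coeffs = [(1 - beta) ** 2 * tau / (24 * Fb ** 2),     # alpha^3 coefficient
              rho * beta * nu * tau / (4 * Fb),           # alpha^2 coefficient
              1 + (2 - 3 * rho ** 2) / 24 * nu ** 2 * tau,  # alpha coefficient
              -sigma_atm * Fb]                            # constant term
    roots = np.roots(coeffs)
    real_pos = roots.real[(np.abs(roots.imag) < 1e-9) & (roots.real > 0)]
    return real_pos.min()

# Round trip: re-applying (4.11) to the recovered alpha reproduces the quoted ATM vol.
beta, rho, nu, F, tau = 0.5, -0.2, 0.4, 0.03, 1.0
alpha = alpha_from_atm(0.1168, beta, rho, nu, F, tau)
```

Choosing the smallest positive real root mirrors the selection rule stated above; the round trip holds to machine precision because the root condition is exactly the cubic rearrangement of (4.11).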
4.1.3 Pricing in a negative interest rate environment
Before we continue with the time series analysis, we first need to consider a method that enables us to
price derivatives in a negative interest rate environment. The option pricing models that are used in this
research do not allow interest rates to become negative. However, a lot has changed since these models
were constructed, and we need to adjust our models to be able to deal with the negative interest rates
that have occurred over the past years. Frankema (2016) describes the Displaced Black’s model as well
as the displaced SABR model, which allow interest rates to be negative. The shifted models with shift
s > 0 allow rates larger than −s to be modelled. This leads to the following adjusted dynamics of Black’s
model, which is also known as a displaced diffusion process
dF_t = d(F_t + s) = σ_B (F_t + s) dW_t,    (4.18)
where s is the constant displacement (or shift) parameter. Note that F̃_t ≡ (F_t + s) follows a lognormal (or
Black) process. This fact, together with the fact that the payoff of a European call option max(F_T − K, 0)
can be written as

max(F_T − K, 0) = max( (F_T + s) − (K + s), 0 ) ≡ max(F̃_T − K̃, 0),    (4.19)

leads to the conclusion that European calls and puts can be valued under the displaced diffusion model
by plugging in F̃_0 ≡ (F_0 + s) and K̃ ≡ (K + s) into Black's model.
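The displacement trick is a one-line wrapper around Black's formula; the shift s = 2% and the negative forward below are illustrative assumptions:

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_call(F, K, T, sigma):
    d1 = (math.log(F / K) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    return F * norm_cdf(d1) - K * norm_cdf(d1 - sigma * math.sqrt(T))

def displaced_black_call(F, K, T, sigma, s):
    """Displaced Black: apply Black's formula to the shifted forward and strike."""
    return black_call(F + s, K + s, T, sigma)

# A negative forward of -0.5% becomes priceable after a 2% shift.
price = displaced_black_call(F=-0.005, K=-0.002, T=1.0, sigma=0.30, s=0.02)
```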
A similar adjustment leads to the following dynamics of the displaced SABR model
dF_t = α_t (F_t + s)^β dW_t^(1),

dα_t = ν α_t dW_t^(2),

E[ dW_t^(1) dW_t^(2) ] = ρ dt.    (4.20)
Hence, we use the formulas from Black's model (4.6) and the SABR model (4.13) with the displaced
values F̃_0 and K̃ instead of F_0 and K. A drawback of the displaced models however is that the shift
parameter needs to be selected a priori. So an assumption has to be made on the minimum of the interest
rate. To overcome this drawback, Antonov et al. (2015) describe the free boundary
model. For this research, however, the displaced SABR model is preferred.
4.2 Time series analysis
The SABR volatility model parameters are estimated on a daily basis. The aim of this research is to
estimate the risk related to a portfolio of swaptions. Therefore an analysis of these SABR parameters
over time is of interest to be able to forecast the one-day-ahead volatility structure. In this section, some
models will be discussed that are used to capture the dynamics of the parameters αt, ρt and νt over
time.
4.2.1 Vector Autoregressive model
A time series γ_t is called white noise if all of its autocorrelations are equal to zero, so for a white
noise series all sample ACFs should be close to zero. To obtain this, we
need to apply some time series models to model the dynamic structure of our time series. Tsay (2005)
denotes first the simple autoregressive model of order 1 or simply AR(1) model. This model is defined
as follows:
γ_t = φ_0 + φ_1 γ_{t−1} + a_t,    (4.21)

where a_t is assumed to be a white noise series with mean zero and variance σ_a².
The model described above could make sense for the individual parameters, but we have to obtain
a forecast of all of the SABR parameters together. These parameters clearly depend on each other as
described in (4.7). Hence, a model that takes the correlation between these time series into account
is desired. The vector autoregressive model (VAR) is a model that can be used for this kind of linear
dynamic structures of a multivariate time series. We fit a VAR model to the three time series α, ρ and ν
Γ_t = φ_0 + Φ Γ_{t−1} + a_t,    where Γ_t = (α_t, ρ_t, ν_t)′,    (4.22)
The vectors Γt and φ0 are k-dimensional, Φ is a k×k matrix, and at is a sequence of serially uncorrelated
random vectors with mean zero and co-variance matrix Σ. Note that we are modelling three different
SABR parameters over time and for this reason have k = 3.
For our VAR(p) model estimation, we have to decide how many lags p to include. A vector
autoregressive model of lag length p is one in which the current value depends on the first p lagged
values. There are several tools that can be used to decide which lag length to include.
Firstly a sample autocorrelation function (ACF) of the parameters can be used to check their level of
autocorrelation. If we have a weakly stationary return series γt, we define the lag-l autocorrelation of
γt, ACFl, as the correlation coefficient between γt and γt−l. We define ACFl as follows (Tsay, 2005)
ACF_l = Cov(γ_t, γ_{t−l}) / √( Var(γ_t) Var(γ_{t−l}) ) = Cov(γ_t, γ_{t−l}) / Var(γ_t).    (4.23)
Another method to determine the optimal selection of lags to include is to use information criteria.
These criteria like the Akaike information criterion (AIC), Bayes information criterion (BIC) and Hannan-
Quinn criterion (HQC) can be used to measure the relative quality of statistical models for a given set
of data. Liew (2004) compares these different criteria in a simulation study to obtain the best choice of
lag length criteria for an autoregressive model. He finds that for a relatively large sample, with 120
or more observations, the Hannan-Quinn criterion outdoes the rest in correctly identifying the
true lag length.
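A sketch of estimating the VAR(1) in (4.22) by equation-wise OLS, on a simulated three-dimensional series standing in for (α_t, ρ_t, ν_t); the true coefficients below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a stable 3-dimensional VAR(1); coefficients are illustrative.
Phi_true = np.diag([0.9, 0.8, 0.7])
phi0_true = np.array([0.002, -0.01, 0.03])
G = np.zeros((600, 3))
for t in range(1, 600):
    G[t] = phi0_true + Phi_true @ G[t - 1] + 0.01 * rng.standard_normal(3)

# Equation-by-equation OLS for Gamma_t = phi0 + Phi Gamma_{t-1} + a_t, eq. (4.22).
Y = G[1:]
X = np.column_stack([np.ones(len(G) - 1), G[:-1]])
B = np.linalg.lstsq(X, Y, rcond=None)[0]      # rows: intercept, then lagged coefficients
phi0_hat, Phi_hat = B[0], B[1:].T

forecast = phi0_hat + Phi_hat @ G[-1]         # one-day-ahead point forecast
```

In the simulation stage of the empirical study, draws of a_t would be added to this point forecast to build a one-day-ahead distribution of the parameters.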
4.2.2 Local level model
A local level model is a type of state space model which, like the VAR model, can be used for a
time series analysis. In a classical regression model, a trend and an intercept are estimated. However,
when focusing on a time series this intercept might in reality not be fixed over time. When this level
component changes over time it is applied locally and for this reason this model is known as the local
level model. The local level model allows this intercept to change over time and is defined as follows
μ_{t+1} = I_m μ_t + B η_t,    where μ_t = (μ_t^(1), μ_t^(2), μ_t^(3))′,    (4.24)

Γ_t = C μ_t + D ε_t,    (4.25)
where Γt is the vector of SABR parameters that is defined in (4.22). Moreover the observation or
measurement equation (4.25) contains the values of the three observed time series at time t. Besides
this, we also have a m × 1 vector of unobserved variables µt. Three unobserved variables are used in
this research, so we have here m = 3. These unobserved variables represent the unknown fixed effects
and we define (4.24) as the state equation. We define ε_t and η_t as the observation and state
disturbances, respectively. These disturbances are independent and follow the standard normal
distribution.
The state disturbance coefficient matrix B is here defined as a 3 × 3 matrix. This results in a
covariance matrix equal to BB′. The observation innovation coefficient matrix D is defined in a similar way
as a 3 × 3 matrix, which leads to an observation innovation covariance matrix equal to DD′. Both the
state disturbance coefficient matrix and the observation innovation coefficient matrix are defined as
diagonal matrices, whose diagonal elements are estimated by maximum likelihood.
Furthermore, we note that Im is the identity matrix of size m = 3. Finally, the 3 × 3 matrix C links
the unobservable factors of the state vector µt with the observation vector Γt. All the coefficients of the
matrix C are also estimated by using maximum likelihood.
The state equation is defined as a random walk and in the measurement equation an irregular
component ε_t is added, which makes this model a random walk plus noise. The state equation is essential in
time series analysis, because the time dependencies in the observed time series are dealt with by letting
the state at time t+ 1 depend on the state at time t (Commandeur and Koopman, 2007).
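A univariate special case of (4.24)–(4.25) makes the mechanics concrete. The sketch below runs a hand-written Kalman filter for a random walk plus noise, with assumed (not estimated) disturbance variances and simulated data:

```python
import numpy as np

def local_level_filter(y, sigma_eps2, sigma_eta2, a0=0.0, p0=1e7):
    """Kalman filter for the univariate local level model
    mu_{t+1} = mu_t + eta_t,  y_t = mu_t + eps_t  (cf. (4.24)-(4.25))."""
    a, p = a0, p0                      # filtered state mean and variance (diffuse prior)
    filtered = np.empty(len(y))
    for t, obs in enumerate(y):
        f = p + sigma_eps2             # prediction-error variance
        k = p / f                      # Kalman gain
        a = a + k * (obs - a)          # measurement update of the level
        p = p * (1 - k) + sigma_eta2   # variance update plus state noise
        filtered[t] = a
    return filtered

rng = np.random.default_rng(2)
level = np.cumsum(0.05 * rng.standard_normal(300)) + 1.0   # latent random walk
y = level + 0.2 * rng.standard_normal(300)                 # noisy observations
mu_hat = local_level_filter(y, sigma_eps2=0.04, sigma_eta2=0.0025)
```

The filtered level tracks the latent random walk much more closely than the raw observations do, which is exactly the role the state equation plays in the multivariate version used for the SABR parameters.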
4.3 Risk measurement
The option pricing models are based on a probability measure Q that is related to a risk-neutral world.
On the other hand the real probability P is used to estimate the risk of a portfolio. These two measures
give different weights to the same possible outcomes for the same derivatives. The risk measures are
based on estimates of the profit and loss distribution. The probability of a certain value occurring in
this profit and loss distribution needs to be equivalent to the real-world probability P to obtain a valid risk
measure. In this research, we will use option pricing models together with the risk neutral measure to
price the swaptions. These swaption prices as well as the calibrated parameters of the SABR model are
then used to derive the profit and loss distribution under the probability in the real world. In this section
the concepts of financial risk and some methods of measuring risk will be introduced. This includes a
definition of Value at Risk and Expected Shortfall as well as their limitations.
4.3.1 Risk measures
Financial risk can be seen as the chance of a loss in a financial position, caused by an unexpected change in the underlying risk factor. In this research we focus on a portfolio of swaptions, so the relevant risk is that of losses arising from movements in market prices. The risk we are trying to measure is called market risk, and in our case specifically, interest rate risk.
Now a formal definition of a risk measure is provided. We have a finite set of states of nature Ω, a set of all risks χ, and real-valued functions X ∈ χ, which represent the final net worth of an instrument for each element of Ω. We now define a risk measure ρ(X) as a mapping of χ into R (Roccioletti, 2016).
To assess whether a risk measure is acceptable, the axioms of a coherent risk measure are defined. In
other words, a risk measure is said to be coherent if it satisfies the following four properties.
Axiom 1. Translation Invariance
For all X ∈ χ and for all m ∈ R, we have
ρ(X +m) = ρ(X)−m (4.26)
Translation invariance implies in words that the addition of a sure amount of capital reduces the risk by
the same amount.
Axiom 2. Sub-additivity
For all X1 ∈ χ and X2 ∈ χ, we have
ρ(X1 +X2) ≤ ρ(X1) + ρ(X2) (4.27)
So, the risk of two portfolios together cannot be any worse than the sum of the two risks taken separately.
Axiom 3. Positive Homogeneity
For all X ∈ χ and for all τ > 0, we have
ρ(τX) = τρ(X) (4.28)
Again in words, positive homogeneity implies the risk of a position is proportional to its size.
Axiom 4. Monotonicity
For all X1 ∈ χ and X2 ∈ χ with X1 ≤ X2, we have
ρ(X1) ≥ ρ(X2) (4.29)
Finally, as described by Roccioletti (2016), the monotonicity axiom states that if, in each state of the world, the position X2 performs better than position X1, then the risk associated with X1 should be higher than that associated with X2.
The Value at Risk measure is a single estimate of the amount by which an institution’s position in a risk category could decline due to general market movements during a given holding period. Define ∆Vl as the change in value of the assets of a financial position from time t to t + l. This quantity is measured in euros and is a random variable at time index t. The cumulative distribution function of ∆Vl is denoted Fl(x). The Value at Risk measure is defined such that a loss will not exceed the VaR with probability 1 − p over a given time horizon (Tsay, 2005). The VaR is given by

p = Pr[∆Vl ≤ −VaR] = Fl(−VaR). (4.30)
Although VaR is widely used among banks, it also has several limitations. First of all, as described
by the Basel Committee (2013), the VaR measure does not capture tail risk. As it is a single estimate of
the minimal potential loss in an adverse market outcome, it will underestimate the actual potential loss.
The Value at Risk measure gives no estimate of the magnitude of the loss in such an event. Besides this, the sub-additivity property fails to hold for VaR in general, meaning that it is not a coherent risk measure, and we can have
V aR(X1 + · · ·+Xd) > V aR(X1) + · · ·+ V aR(Xd). (4.31)
Whereas portfolio diversification in general leads to risk reduction, this need not be reflected by the VaR measure. This is especially a problem when we consider the capital adequacy requirements for a financial institution made up of several businesses. With a decentralized approach, where the VaR number is calculated separately for every branch, we cannot be sure that the aggregated overall risk is an accurate estimate. We note, however, that although VaR is not sub-additive in general, whether sub-additivity holds in a particular case depends on the properties of the joint loss distribution.
To overcome the shortcomings of the Value at Risk measure, the Expected Shortfall measure can be
used instead. Expected Shortfall is the expected return of the portfolio given that a loss has exceeded
the VaR. We define the ES as
−ES(1−p) = E[∆Vl | ∆Vl ≤ −VaR(1−p)], (4.32)

−ES(1−p) = (1/p) ∫_{−∞}^{−VaR(1−p)} x fl(x) dx, (4.33)
where fl(x) is the probability distribution function of ∆Vl. In these formulas we assume a long position
in the portfolio, but the same can be derived for a short position.
Expected Shortfall fulfills all four axioms above, so it is a coherent risk measure, and the tail risk is taken into account. There are, however, still some issues with this measure. To obtain the ES forecast, we first need to ascertain the VaR estimate and subsequently compute the tail expectation, which introduces additional estimation uncertainty. There is also some difficulty with the validation of risk models’ ES forecasts. As shown by Gneiting (2011), Expected Shortfall is not elicitable; a functional is elicitable if there exists a scoring function that is strictly consistent for it. The difficulty with ES forecasts is that they measure all risk in the tail of the return distribution, and some losses far out in the tail will not be observed in regular backtesting. Despite these drawbacks, ES is still proposed as a replacement for the VaR measure.
To assess the risk related to the swaptions, we want to compute the Value at Risk and Expected
Shortfall forecasts. The Historical Simulation method will now be described. This procedure uses historical returns to predict the VaR. It is easy to implement, but has some shortcomings: all returns are given the same weight, so the procedure does not account for the decreasing relevance of observations further in the past.
Let rt, rt−1, . . . , rt−K be the returns of a portfolio in the sample period. So first the changes in swaption price over our sample are computed. Then we sort the returns in ascending order: r[1], r[2], . . . , r[K]. The one-day-ahead Value at Risk is given by

−VaR(1−p) = r[k], (4.34)

where k = Kp. The Expected Shortfall follows from the previous steps and can be computed as

−ES(1−p) = (1/k) Σ_{i=1}^{k} r[i]. (4.35)
We note that classical HS is only valid in theory when the volatility and the correlation are constant over time; when dealing with a time-varying volatility, another method is needed.
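The Historical Simulation steps (4.34)–(4.35) can be sketched as follows; the data below are synthetic standard-normal returns for illustration, not the thesis portfolio:

```python
import numpy as np

def hs_var_es(returns, p=0.025):
    """Historical Simulation VaR and ES at level 1-p, following (4.34)-(4.35):
    sort returns ascending, take the k-th worst return with k = K*p, and
    average the k worst returns for the ES."""
    r = np.sort(np.asarray(returns))          # ascending: r[0] is the worst
    k = max(int(len(r) * p), 1)               # k = K*p, at least one observation
    var = -r[k - 1]                           # -VaR(1-p) = r[k] (1-based index)
    es = -r[:k].mean()                        # -ES(1-p) = mean of the k worst returns
    return var, es

# Synthetic standard-normal P&L: VaR(97.5%) should be near 1.96, ES near 2.34.
rng = np.random.default_rng(1)
var975, es975 = hs_var_es(rng.normal(0.0, 1.0, 10_000), p=0.025)
```

By construction the ES estimate always lies at or beyond the VaR estimate, reflecting that ES averages the losses in the tail rather than reading off a single quantile.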
4.4 Backtests
Backtesting can be described as checking whether realizations are in line with the model forecasts.
Financial institutions base their decisions partly on their estimates of risk measures. Therefore it is very
important to test whether these estimates are accurate. Various tests have been developed over time to assess the quality of the models that produce these estimates. Even though it may seem like a simple task, there are some complications. The main difficulty is that the methods produce a daily estimate of the profit and loss distribution, while only one true profit or loss is observed each day to assess the quality of that estimated distribution. Especially the evaluation of the accuracy of an Expected Shortfall estimate is challenging. If we focus on ES(0.975), for example, we in theory only incur a loss that exceeds the VaR(0.975) in 2.5% of the cases. With this minimal number of actual losses that exceed the
VaR, we need to assess whether the ES forecast actually represents the true expected value of the tail
loss. Besides this, the tail loss is also estimated based on a different profit and loss distribution for every
new forecast. Fortunately there are some methods to backtest the models we are using. We will mainly
focus however on backtests based on the VaR estimates, but we will also perform a backtest to assess
the performance of the models with regard to their Expected Shortfall estimates.
Campbell (2007) reviews a variety of backtests. He defines a hit function that creates a sequence such as (0, 0, 0, 1, 0, 0, . . . , 1), where a 1 stands for a loss that exceeds the VaR measure. Determining
the accuracy of the VaR measure can be reduced to determining whether the hit sequence satisfies two
properties. First of all, the probability of receiving a loss that exceeds the (1− p)% VaR measure must
be p. Secondly, any two elements of the hit sequence must be independent from each other. Only hit
sequences that satisfy both properties can be described as evidence of an accurate VaR model. Let this
hit function be defined as follows

It = 1 if rt+1 < −VaR(1−p), and It = 0 if rt+1 ≥ −VaR(1−p). (4.36)
The hit function is used to test the unconditional coverage property with the backtest proposed by
Kupiec (1995) and also to test the independence property with the backtest proposed by Christoffersen
(1998). In addition also the magnitude of losses that exceed the VaR can be taken into account with a
magnitude-based test.
4.4.1 Unconditional coverage backtesting
The unconditional coverage backtest, proposed by Kupiec (1995), tests the null hypothesis of E[It] = p.
The hit function defined at the beginning of this section is used and we first compute the total number
of hits
n1 = Σ_{t=1}^{T} It, (4.37)

and we also define n0 = T − n1 as the total number of returns larger than −VaR(1−p). The estimated probability now becomes

π = n1 / (n0 + n1). (4.38)
So this corresponds to the following hypothesis based on the returns and the Value at Risk measure
H0 : π = p, H1 : π ≠ p. (4.39)
The likelihood under the null hypothesis is defined as
L(p; I1, I2, . . . , IT ) = (1− p)n0pn1 , (4.40)
and under the alternative hypothesis as
L(π; I1, I2, . . . , IT ) = (1− π)n0πn1 . (4.41)
This can be tested with a standard likelihood ratio test
LRuc = −2 log[L(p; I1, I2, . . . , IT) / L(π; I1, I2, . . . , IT)]  asy∼  χ²(m − 1). (4.42)
The variable m is the number of possible outcomes of the hit sequence, so in this case we have m = 2.
The LR-statistic converges under the null hypothesis to the chi-squared distribution with one degree of
freedom
LRuc = 2 log[((1 − π)/(1 − p))^{n0} (π/p)^{n1}]  →d  χ²(1). (4.43)
4.4.2 Magnitude-based test
Frequency tests do not take the magnitude of the losses into account, so it is desirable to also perform a magnitude-based test. Consider, for example, two different banks, each with a VaR(0.99) estimate, and suppose both banks encounter three losses that exceed their Value at Risk estimate within the same time period. The unconditional coverage test would then indicate that the performance of both models is similar. However, it could be the case that Bank A has incurred three losses that exceed the VaR by one million euros, while Bank B has incurred losses that exceed the VaR by one billion euros. This difference in risk is obvious, so for that reason a multivariate version of the unconditional coverage test will be applied.
Colletaz et al. (2013) describe a method to validate risk models. The test is based on the intuition
that a large loss will not only exceed the V aR(1−p), but is also likely to exceed the V aR(1−p′) with
p′ < p. A standard Value at Risk violation is defined as an exception and a super exception is defined as rt < −VaR(1−p′). Based on these two concepts the following null hypothesis is defined
H0 : E[It(p)] = p and E[It(p′)] = p′. (4.44)
To test this hypothesis, we define two hit functions to indicate the frequency of returns that fall in each interval

J1,t = It(p) − It(p′) = 1 if −VaR(1−p′) < rt < −VaR(1−p), and 0 otherwise, (4.45)

J2,t = It(p′) = 1 if rt < −VaR(1−p′), and 0 otherwise, (4.46)

and J0,t = 1 − J1,t − J2,t = 1 − It(p). The hit functions {Ji,t}, i = 0, 1, 2, are Bernoulli random variables equal to one with probability 1 − p, p − p′, and p′, respectively. The hit functions are not independent of each other. We now denote ni = Σ_{t=1}^{T} Ji,t for i = 0, 1, 2, and define the proportions of exceptions as follows

π0 = n0 / (n0 + n1 + n2), π1 = n1 / (n0 + n1 + n2), and π2 = n2 / (n0 + n1 + n2). (4.47)
The likelihood ratio test can now also be defined for the multivariate case

LRmuc(p, p′) = 2 ln[(π0/(1 − p))^{n0} (π1/(p − p′))^{n1} (π2/p′)^{n2}]  →d  χ²(2), (4.48)

where the χ² distribution has m − 1 degrees of freedom, with in this case m = 3.
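A sketch of this multivariate coverage test is given below; it assumes each of the three cells contains at least one observation (otherwise the log-likelihood degenerates), and the returns and VaR levels are synthetic:

```python
import numpy as np
from scipy.stats import chi2

def lr_muc(returns, var_p, var_pp, p, pp):
    """Multivariate unconditional coverage LR test (4.48) of Colletaz et al.
    var_p, var_pp: the VaR(1-p) and VaR(1-p') forecasts, with p' < p.
    Assumes each of the three cells has at least one observation."""
    r = np.asarray(returns)
    T = len(r)
    n2 = int(np.sum(r < -var_pp))             # super exceptions
    n1 = int(np.sum(r < -var_p)) - n2         # ordinary exceptions only
    n0 = T - n1 - n2                          # no exception
    pi0, pi1, pi2 = n0 / T, n1 / T, n2 / T
    lr = 2.0 * (n0 * np.log(pi0 / (1 - p))
                + n1 * np.log(pi1 / (p - pp))
                + n2 * np.log(pi2 / pp))
    return lr, chi2.sf(lr, df=2)

# Exceptions occur at exactly the nominal rates -> LR = 0, p-value 1.
rets = np.array([-3.0] * 2 + [-1.5] * 8 + [0.5] * 990)
lr0, pval0 = lr_muc(rets, var_p=1.0, var_pp=2.0, p=0.01, pp=0.002)
```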
4.4.3 Independence backtesting
The next step is to test whether any two outcomes of the hit sequence are independent of each other.
Christoffersen (1998) proposed a test that examines whether the likelihood of a VaR violation today is
dependent on a violation yesterday. The hypotheses are constructed as follows
πij = P(It = j | It−1 = i), i, j = 0, 1,
H0 : π01 = π11 = p, H1 : π01 ≠ π11. (4.49)
Christoffersen (1998) tests the independence property against an explicit first-order Markov alternative.
First a transition probability matrix is defined based on the binary first-order Markov chain It
Π1 = [ 1 − π01   π01
       1 − π11   π11 ]. (4.50)
We now define nij as the number of observations with value i followed by j and this leads to the following
likelihood function
L(Π1; I1, I2, . . . , IT) = (1 − π01)^{n00} π01^{n01} (1 − π11)^{n10} π11^{n11}. (4.51)
Conditioned on the first observation, the log likelihood can be maximized and the parameters are ratios
of the counts of the appropriate cells
Π̂1 = [ n00/(n00 + n01)   n01/(n00 + n01)
       n10/(n10 + n11)   n11/(n10 + n11) ]. (4.52)
We now consider a similar interval model, with the same output sequence It. This Markov chain model
has the independence property and is given by
Π2 = [ 1 − π2   π2
       1 − π2   π2 ]. (4.53)
This gives us the likelihood under the null hypothesis
L(Π2; I1, I2, . . . , IT) = (1 − π2)^{n00+n10} π2^{n01+n11}, (4.54)
where we can again maximize the likelihood function and estimate the parameters. This leads to
Π̂2 = π̂2 = (n01 + n11) / (n00 + n10 + n01 + n11), (4.55)
now the likelihood ratio test follows and is like the unconditional coverage test asymptotically χ2 dis-
tributed with (m− 1)2 degrees of freedom
LRind = −2 log[L(Π̂2; I1, I2, . . . , IT) / L(Π̂1; I1, I2, . . . , IT)]  asy∼  χ²((m − 1)²). (4.56)
This leads again to a χ² distribution with one degree of freedom, because we again have m = 2:

LRind = 2 log[(1 − π̂01)^{n00} π̂01^{n01} (1 − π̂11)^{n10} π̂11^{n11} / ((1 − π̂2)^{n00+n10} π̂2^{n01+n11})]  →d  χ²(1). (4.57)
4.4.4 Duration-based test
In addition to the tests described above, one could also assess the duration between two consecutive hits.
The baseline idea is that if the one-day-ahead Value at Risk is correctly specified for a coverage rate
p, then the durations between two consecutive hits must have a geometric distribution with a success
probability equal to p (Candelon et al., 2010). When the model satisfies the unconditional coverage
property (UC) as well as the independence property (IND), the VaR forecasts are said to have a correct
conditional coverage (CC). Under this property, the VaR violation process is a martingale difference
E[It(p)− p|Ft−1] = 0. (4.58)
The hit series It(p) is a random sample from a Bernoulli distribution with a success probability equal
to p. We denote the duration between two consecutive violations as
di = ti − ti−1, (4.59)
where ti represents the date of the ith violation. A GMM moment condition test is used to backtest
the UC, IND and CC properties, but now based on the duration. First we define the orthonormal
polynomials associated with a geometric distribution with success probability p as follows

M_{k+1}(d, p) = [((1 − p)(2k + 1) + p(k − d + 1)) / ((k + 1)√(1 − p))] M_k(d, p) − (k/(k + 1)) M_{k−1}(d, p), (4.60)
for any order k ∈ N, with M−1(d, p) = 0 and M0(d, p) = 1. If the true distribution is a geometric
distribution with a success probability p, then we have
E[Mk(d, p)] = 0, ∀ k ∈ N∗, ∀ d ∈ N∗. (4.61)
This leads to the following hypotheses for each property
H0,uc : E[M1(di, p)] = 0,
H0,ind : E[Mk(di, q)] = 0, k = 1, . . . ,K,
H0,cc : E[Mk(di, p)] = 0, k = 1, . . . ,K,
(4.62)
where K is defined as the number of moment conditions. The unconditional coverage property is tested
with the first hypothesis. This hypothesis states that the expected value of the first moment condition
is equal to zero for the sequence of durations d1, . . . , dN. The second hypothesis is used to test the
independence property. This hypothesis states in words that the expected value for every moment
condition is equal to zero. There is however one difference, the probability q in the moment conditions
does not has to be equal to the true success probability p. Finally the conditional coverage property is
tested with the final hypothesis, which is a combination of the other two hypotheses. Now the statistics
of the three different tests are defined
GMMuc = ( (1/√N) Σ_{i=1}^{N} M1(di, p) )²  →d  χ²(1), (4.63)

GMMind(K) = ( (1/√N) Σ_{i=1}^{N} M(di, q) )′ ( (1/√N) Σ_{i=1}^{N} M(di, q) )  →d  χ²(K), (4.64)

GMMcc(K) = ( (1/√N) Σ_{i=1}^{N} M(di, p) )′ ( (1/√N) Σ_{i=1}^{N} M(di, p) )  →d  χ²(K), (4.65)

where M(di, ·) denotes the vector of the first K orthonormal polynomials.
Note however that in the second equation the value of q is not known, so it has to be estimated. Candelon et al. (2010) show that the distribution of the GMM statistic GMMind based on Mk(di, q̂) is similar to the one based on Mk(di, q), and this leads to

GMMind(K) = ( (1/√N) Σ_{i=1}^{N} M(di, q̂) )′ ( (1/√N) Σ_{i=1}^{N} M(di, q̂) )  →d  χ²(K − 1), (4.66)
because the first polynomial is used to estimate the maximum likelihood estimator q̂. The first polynomial M1(di, q) is strictly proportional to the score that defines the maximum likelihood estimator, so we solve M1(di, q̂) = 0 to obtain our estimate of q.
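The recursion (4.60) for the orthonormal polynomials is easy to implement; the sketch below evaluates M_1, ..., M_K at a vector of durations and checks, on synthetic geometric durations, that each sample moment is close to zero as (4.61) predicts:

```python
import numpy as np

def geo_orthonormal_poly(d, p, K):
    """Evaluate the orthonormal polynomials M_1..M_K of (4.60) at the
    durations d, starting the recursion from M_{-1} = 0 and M_0 = 1."""
    d = np.asarray(d, dtype=float)
    m_prev, m_curr = np.zeros_like(d), np.ones_like(d)
    rows = []
    for k in range(K):
        m_next = (((1 - p) * (2 * k + 1) + p * (k - d + 1))
                  / ((k + 1) * np.sqrt(1 - p)) * m_curr
                  - (k / (k + 1)) * m_prev)
        rows.append(m_next)
        m_prev, m_curr = m_curr, m_next
    return np.vstack(rows)          # shape (K, N): row k-1 holds M_k

# Under H0 the durations are geometric and each E[M_k(d, p)] should be ~0.
rng = np.random.default_rng(3)
d = rng.geometric(0.1, size=100_000)
M = geo_orthonormal_poly(d, 0.1, K=3)
```

For k = 0 the recursion reduces to M_1(d, p) = (1 − pd)/√(1 − p), whose expectation is zero because E[d] = 1/p for a geometric duration.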
4.4.5 Kolmogorov Smirnov test
To assess the goodness of fit of a statistical model, one can use the Kolmogorov Smirnov test. The
test, like described in Massey (1951), is based on the maximum difference between an empirical and a
hypothetical cumulative distribution. The first distribution is a specified cumulative distribution function
F0(x). This is compared with an observed cumulative step-function of the sample SN (x) = k/N , where
k is the number of observations less than or equal to x. This results in the following test statistic
DN = max |F0(x)− SN (x)|. (4.68)
When (x1, x2, . . . , xN) are mutually independent and all come from the same distribution function F0(x), the distribution of DN does not depend on F0(x). This means that a table used to test the hypothesis that numbers come from a uniform distribution may also be used to test the hypothesis that numbers come from a normal distribution, or from any completely specified continuous distribution (Miller, 1956). The statistic DN is used to test the null hypothesis that the observations come from F0(x) against the alternative that they come from another distribution. Based on formulas noted in Miller (1956), one can derive the values of ε for a given sample size N and desired level of significance 1 − α. These values of ε define the distribution of the statistic DN: P = Prob(DN ≤ ε).
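In practice DN and its p-value are available directly from `scipy.stats.kstest`; the sketch below uses synthetic data to contrast a correctly specified null with a mis-specified one:

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, 500)

# Sample drawn from the hypothesized N(0, 1): D_N should be small.
stat_null, p_null = kstest(x, norm.cdf)      # D_N = max |F0(x) - S_N(x)|

# Mis-specified sample (mean shifted by one): D_N is large, H0 rejected.
stat_alt, p_alt = kstest(x + 1.0, norm.cdf)
```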
4.4.6 Expected Shortfall backtesting
We mainly focus on backtests based on the estimated Value at Risk measure. However, we also estimate
the Expected Shortfall measure and therefore also want to assess the quality of our methods based on
these ES estimates. Acerbi and Szekely (2014) describe three different backtests based on the Expected
Shortfall measure. They only make the assumption that the profit and loss distributions are continuous.
This way the Expected Shortfall can be written as
ES(1−p)(t) = −E[∆Vl(t)|∆Vl(t) + V aR(1−p)(t) < 0]. (4.69)
The tests that are used are model independent: besides continuity, no assumption is made on the true distribution of the returns. The general hypothesis of the Expected Shortfall backtests is constructed as follows

H0 : Fl(t) = Pl(t),
H1 : ES(1−p)_F(t) > ES(1−p)_P(t), (4.70)

where Fl(t) is the unknown true distribution of the returns ∆Vl(t) and Pl(t) is the forecasted distribution of the returns ∆Vl(t) based on the model. Furthermore, we define ES(1−p)_F(t) as the Expected Shortfall based on the unknown true distribution Fl(t) and ES(1−p)_P(t) as the Expected Shortfall estimate based on the model distribution Pl(t).
We perform one of the proposed backtests that is sensitive to both the magnitude as well as the
frequency of exceptions. Besides this, we only estimate one-day-ahead forecasts and for this reason set
l = 1. The test is based on the returns of a portfolio rt in a sample period with T observations in total.
Acerbi and Szekely (2014) base the test statistic of this test on the following relation
ES(1−p)_F(t) = −E[rt It / p], (4.71)

where It is the indicator function as defined in (4.36). This leads to the following test statistic

Z(r) = Σ_{t=1}^{T} [rt It / (T p ES(1−p)_F(t))] + 1. (4.72)
The hypothesis of this specific test is defined as follows

H0 : F_1^{[1−p]}(t) = P_1^{[1−p]}(t) ∀ t,
H1 : ES(1−p)_F(t) ≥ ES(1−p)_P(t) for all t, with strict inequality for at least one t,
     and VaR(1−p)_F(t) ≥ VaR(1−p)_P(t) for all t. (4.73)
So under the null hypothesis we have a model that estimates the tail risk correctly, while if the null
hypothesis is rejected we have a model that underestimates the tail risk. The expected value of this test
statistic Z is under the null hypothesis equal to zero and under the alternative hypothesis strictly smaller
than zero. We perform this test at a significance level of 5%, and Acerbi and Szekely (2014) show that we do not need to perform a Monte Carlo simulation to compute the p-value for Z: the p-values are remarkably stable across all financially realistic cases. This leads to a critical value of the test statistic equal to −0.7.
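The statistic (4.72) can be computed directly from the return series and the daily VaR and ES forecasts; the sketch below uses stylized constant forecasts and synthetic returns:

```python
import numpy as np

def z_statistic(returns, var_fc, es_fc, p=0.025):
    """Acerbi-Szekely test statistic (4.72), sensitive to both the
    frequency and the magnitude of VaR exceptions.
    var_fc, es_fc: per-day VaR(1-p) and ES(1-p) forecasts."""
    r = np.asarray(returns)
    hits = (r < -np.asarray(var_fc)).astype(float)   # indicator I_t of (4.36)
    T = len(r)
    return np.sum(r * hits / (T * p * np.asarray(es_fc))) + 1.0

# Stylized check: 25 exceptions out of T = 1000 (the expected 2.5%), each
# exactly equal to the forecasted tail mean of -2.0, gives Z = 0.
rets_ok = np.array([-2.0] * 25 + [0.5] * 975)
z_ok = z_statistic(rets_ok, np.full(1000, 1.0), np.full(1000, 2.0))

# Same exception frequency but double the loss size: Z falls below -0.7.
rets_bad = np.array([-4.0] * 25 + [0.5] * 975)
z_bad = z_statistic(rets_bad, np.full(1000, 1.0), np.full(1000, 2.0))
```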
5 Data
Two data sets from different sources are combined for this research. The swaption data is provided by ICAP and the zero curve data is collected from Thomson Reuters Eikon and Bloomberg. All of the data is available from 13/Jan/2015 up to and including 1/Jun/2017. The data contains only trading days, which leads to a total of 613 observations for each variable.
Little pre-processing is done to obtain the necessary zero curve data. The interest rates and interest
rate swaps, that are used to construct the zero and discount curves, are based on the Euribor rate. The
quotes are end-of-day and based on a floating tenor of three months. Furthermore we use so called mid
rates, which are computed based on the bid and ask quotes as observed in the market. The day count
convention is also quoted for each product and this is all used together with the bootstrapping method
described in Section 2.2 to obtain the zero and discount curves.
We then start with the pre-processing of the ICAP swaption data. The initial data consists of two raw
ICAP end-of-day data files. The first file contains the ATM data, including the ATM straddle premiums
for various swaption expiry and tenor combinations. The second file contains the skew data including
payer, receiver, collar and strangle premiums for various expiry, tenor and relative strike combinations.
First the relevant data is extracted from these raw files, then we convert them into files that are used
as input to the calibration. We store the ATM straddle premiums in a separate file. The premiums are
stored in an expiry-tenor grid. In another file we store the payer and receiver swaption premiums for
different relative strikes. For some strikes no payer and receiver premiums are available, but only collars
and strangles. In this case the payer and receiver premiums are derived using the relationship

payer = (collar + strangle) / 2,    receiver = strangle − payer. (5.1)
We end up with two files with payer and receiver swaption premiums. These files have the same structure; we have only separated the premiums for expiries up to one year from the premiums for expiries of one year and beyond, purely to follow the set-up of the raw ICAP input data. Finally, we also create a file which contains all of the ICAP displacement values for every expiry-tenor combination.
The descriptive statistics of these deposit rates, swap rates, and the ’10y10y’ swaption premiums are shown in Table 5.1. The displacement parameter is excluded from this table, because it only takes on a small number of discrete values on the entire time grid. A plot of the magnitude of the displacement parameter for the ’5y5y’ and the ’10y10y’ swaption is shown instead in Figure 6.3. The value for the standard deviation that is shown in the table is the average of the standard deviations over the different tenors and strike rates for the Euribor data and the swaption data, respectively. Aggregating the data gives a clearer view of its main characteristics; on the other hand, some information is lost in the aggregation. For this reason, we show boxplots of both the Euribor data and the swaption data in Section A.1.
                         Euribor deposit rate   Euribor swap rate    Swaption premium
Tenor                    Overnight - 3 weeks    1 month - 60 years   10 years
Maturity                 -                      -                    10 years
Min                      -0.3320 %              -0.3980 %            57.85 euro
Max                      0.0710 %               1.7965 %             875.64 euro
Mean                     -0.1787 %              0.2781 %             551.49 euro
Median                   -0.2420 %              -0.1490 %            582.98 euro
Std. Dev.                0.0014                 0.0020               25.23
Number of observations   3678                   23907                10421

Table 5.1: Descriptive statistics of the data.
5.1 Calculating the implied volatilities
Next, we have to convert the premiums to volatilities, which we can then use to calibrate the SABR
model. To obtain the volatilities, we will use the displaced Black’s model as described in (4.18). First
we will use the ATM implied volatility to compute the correct principal value of the contract. This way we link the correct volatilities to the ICAP premiums. The ATM volatility is given in our dataset, so this makes a good starting point. We compute the principal value of the contract L as follows

d1,ATM = σATM √T / 2,
d2,ATM = −σATM √T / 2,
L = P_swaption,ATM / ( A_{α,β}(0) [ S_{α,β}(Tα) N(d1,ATM) − K N(d2,ATM) ] ), (5.2)
this notional principal is then used in the next step to compute the out-of-the-money volatilities. The
premiums for both receiver and payer OTM swaptions are quoted in the data set. The interval of these
strikes relative to the par swap rate of the underlying swap of the swaption is as follows
Receiver -3% -2% -1.5% -1% -0.75% -0.5% -0.25% -0.125% -0.0625%
ATM 0%
Payer +0.0625% +0.125% +0.25% +0.5% +0.75% +1% +1.5% +2% +3%
Table 5.2: Available strikes relative to the par swap rate.
Now we also compute the absolute rates of the ATM strikes based on the par swap rates. The ICAP
data strikes are all relative to the ATM strike, so to get the absolute strikes we need to compute the par
swap rate. To do so we use (2.9) together with the bootstrapped discount curve based on the Euribor
rate. The OTM volatilities are now computed by inverting (4.6) and solving for σ. We make use of the displaced variant of Black’s model, so we use the displaced forward and strike, as described in Section 4.1.3. This way we obtain the market points of the implied volatilities of the swaption.
5.2 Leaving out some strikes
Firstly, there are some strikes missing in our data set. We only focus on the most frequently traded
expiry-tenor combinations to minimize the amount of missing values, but still some premiums are missing.
Especially the receiver swaptions with strikes of -3% and -2% relative to the par swap rate are often
missing. For this reason, we choose to exclude those two strikes on the entire interval. Furthermore there
is one day in particular (25/Mar/2015) where the premiums of only 11 out of the 19 strikes are available.
Fortunately this day is the only exception: for the ’10y10y’ swaption at least 17 out of the 19 premiums are available on all other days. The missing premiums here are the receiver swaptions with
strikes of -3% and -2%, which are excluded from our calibration. This results in a complete premium
vector for our interval of strikes for all days except for 25/Mar/2015. To obtain a more stable time series
of SABR parameters, we choose to exclude the quotes on 25/Mar/2015 from our data set.
Secondly, as will be described in Section 6.1, the shape of the volatility structure depends on the
chosen level of displacement. This volatility structure is then used to calibrate the SABR model. The
SABR model can, however, have difficulty calibrating to both the low and the high strikes simultaneously. Some of the low strike receiver swaptions will be removed in the calibration to obtain a better calibration to the higher
strikes, in which practitioners have the most exposure. The impact of these low strike receiver swaptions,
with a high volatility, on the SABR parameters is too big in relation to their importance. Leaving them
out will not only result in a better calibration for the other strikes, but also prevent calibrated SABR
smiles that result in big repricing differences. We remove a strike K[1] from the range we use for the calibration in one of the following two cases:

1. |σ_{K[1]} − σ_{K[2]}| > 0.2,

2. σ_{K[1]} < σ_{K[2]},
where the strikes are ordered in ascending order from the receiver swaption with the lowest strike up to the payer swaption with the highest strike, and K[i] represents the ith strike in this sorted range. By removing strikes with either a too high (case 1) or a too low (case 2) volatility, we improve our overall calibration. These two cases only occur in the period from 13/Jan/2015 until 25/Mar/2015, and in total no more than 23 strikes are removed on this interval. Note that, for example, in Figure 6.2 a strike is removed from the interval for the lower two displacements.
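A loose sketch of this filtering rule is given below. It applies the two cases repeatedly to the lowest remaining strike, which is an assumption on our part (the thesis states the rule for K[1] only); the strike/volatility values are synthetic:

```python
def filter_low_strikes(strikes, vols, jump=0.2):
    """Drop the lowest strike while its volatility either differs from the
    next strike's by more than `jump` (case 1) or lies below the next
    strike's volatility (case 2). Inputs are sorted by ascending strike."""
    k, v = list(strikes), list(vols)
    while len(v) > 1 and (abs(v[0] - v[1]) > jump or v[0] < v[1]):
        k.pop(0)
        v.pop(0)
    return k, v

# A spuriously high lowest-strike volatility (case 1) is removed.
ks, vs = filter_low_strikes([-0.01, -0.005, 0.0, 0.005], [0.9, 0.5, 0.45, 0.4])
```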
6 Empirical study and results
The models and theory described in the previous sections will now be applied to our data set. First, we will argue which values for β and the displacement parameter are preferred. We will then calibrate the other SABR model parameters and subsequently start with the time series analysis. The vector autoregressive model is estimated and analyzed, and the risk measure estimates based on this model are compared to those of the Historical Simulation method through multiple backtests. Finally, the section ends with the estimation of the local level model, which is used as a robustness check of the vector autoregressive model.
6.1 Calibrating the SABR model parameters
Section 5 described how the implied volatilities are obtained from the input data. These volatilities are now used as inputs for the SABR volatility model. The model is calibrated daily, and the resulting time series of parameters are stored and analyzed, as described in Section 6.2.
The first step in calibrating the SABR parameters is to determine which value for β fits the data
best. Our main focus in this research is on a swaption with 10 years to maturity and an underlying swap
tenor of 10 years as well. In Figure 6.1 the log-log plot of σATM and F is displayed. This can be used
together with the theoretical relation described in (4.16). Now one can estimate the value for β and
we use a simple OLS regression to do so. The linear approximations are plotted as well and the OLS
estimates are shown in Table 6.1.
Figure 6.1: Log-log plot for the ’10y10y’ swaption (log σATM against log F, with an OLS approximation).
The OLS estimation gives us the following results:

               Log α      −(1 − β)   α        β
OLS estimate   −3.5563    −0.5262    0.0285   0.4738

Table 6.1: OLS estimates for α and β.
So, as mentioned before, one fixed value for β can be used for the entire time grid. This method of estimating β is, however, not the only one in use: another common approach is simply to set β equal to 0.5. We note that for the ’10y10y’ swaption the estimated β lies close to this value of 0.5. On the other hand, this does not hold for every swaption: if we make the log-log plot for the ’5y5y’ swaption, we find an optimal value of β = 0.7191. The log-log plot and OLS estimates for the ’5y5y’ swaption are displayed in appendix Section A.2.
Before we start with the calibration of the other SABR parameters, we first need to select the level of
the displacement parameter. The level of displacement has no impact in itself when repricing a single
swaption: if a given displacement is used to imply the volatility, then recomputing the premium results
in an identical premium regardless of the size of the displacement. However, the displacement parameter
does affect the underlying volatility structure across strikes. We therefore need to take two things into
account when choosing the displacement parameter. First, the displacement parameter s needs to be
larger than the absolute value of the lowest strike K; this is necessary to be able to use Black’s model
for the entire range of strikes. Moreover, if K + s > 0 but very close to zero, this results in very high
implied volatilities. Second, a large displacement parameter flattens the volatility structure or even
produces a frown. This effect is clearly shown in Figure 6.2 based on our data set.
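The displacement-invariance claim for a single swaption can be checked numerically: imply the shifted-Black volatility from one premium under several displacements and reprice. All numbers below (forward, strike, expiry, premium, unit annuity) are hypothetical, chosen only to resemble the ’10y10y’ setting.

```python
import math

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def displaced_black_payer(F, K, T, sigma, s):
    """Black-76 payer price on the shifted forward F+s and strike K+s (annuity = 1)."""
    f, k = F + s, K + s
    d1 = (math.log(f / k) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return f * norm_cdf(d1) - k * norm_cdf(d2)

def implied_vol(price, F, K, T, s, lo=1e-6, hi=5.0):
    """Bisection for the displaced-Black volatility matching a given premium."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if displaced_black_payer(F, K, T, mid, s) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

F, K, T = 0.0162, 0.0262, 10.0        # hypothetical forward, strike, expiry
premium = 0.0030                      # hypothetical market premium
prices = []
for s in (0.01, 0.0125, 0.03):        # different displacement levels
    vol_s = implied_vol(premium, F, K, T, s)
    prices.append(displaced_black_payer(F, K, T, vol_s, s))
```

Each displacement yields a different implied volatility, yet repricing returns the same premium every time, which is exactly the invariance described above.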
[Figure: market and SABR-implied Black volatilities on 10-Mar-2017 for displacements of 1%, 1.6%, and 3%.]
Figure 6.2: Implied volatilities and SABR calibration for different levels of displacement.
The interest rate and the par swap rate vary over time. For this reason it also makes sense to
vary the magnitude of the displacement parameter over time. The proposed magnitude of this variable
displacement parameter is provided with the data and shown in Figure 6.3. A fixed displacement value
of 1.25% is proposed for the ’10y10y’ swaption, because a larger value results in a worse calibration
for the positive interest rate period, while a smaller value forces us to remove some of the lowest strikes
in the negative interest rate period. Figure 6.3 also shows a dynamic displacement; these values are
used by the data supplier ICAP. The dynamic displacement parameter ensures a well-behaved volatility
structure on the entire interval.
Figure 6.3: Magnitude of displacement for the ’5y5y’ and ’10y10y’ swaption respectively.
Once we have obtained the optimal value for β and the displacement parameter s, we can calibrate
the other SABR parameters: α, ρ, and ν. Figure 6.4 shows the effect of a change in one of these
parameters while the other parameters remain unchanged. Again, the relationship between α and β,
as given in (4.16), is clearly visible: an increase (decrease) in α or a decrease (increase) in β leads to an
increase (decrease) in all of the implied volatilities. So a shift in one of these two parameters results in
a vertical shift of the entire volatility structure.
The lower-left panel shows that a change in ρ leads to a tilt in the volatility skew: an increase
(decrease) in ρ results in a decrease (increase) of the implied volatility for the OTM receiver swaption
strikes and an increase (decrease) for the OTM payer swaption strikes. Finally, a shift in ν affects the
structure in yet another way: an increase (decrease) in ν leads to a more (less) curved volatility structure.
These responses to a change in one of the SABR parameters hold in general for every swaption. The
plots are based on the ’5y5y’ swaption, but for this reason also apply to swaptions with another
expiry-tenor combination, such as the ’10y10y’ swaption.
[Figure panels (10-Mar-2017, ’5y5y’): market volatilities with SABR smiles for α ∈ {0.0344, 0.0844, 0.1344}, β ∈ {0.6227, 0.7227, 0.8227}, ρ ∈ {−0.6896, −0.1896, 0.3104}, and ν ∈ {0.0669, 0.2669, 0.4669}.]
Figure 6.4: Effect of changes in SABR parameters for the ’5y5y’ swaption.
6.2 Fitting a model to the SABR parameter time series
Now that we have calibrated the SABR volatility model, we obtain the volatility structure for our
expiry-tenor combination. Our input strikes are relative to the par swap rate, so they differ over our
time period. For the next step in our research, we focus on one fixed interval of strikes for the entire
time period. We recompute the volatilities with the formula suggested by Obłój (2008) and our calibrated
SABR parameters for 100 strikes equally spaced on the interval between 0.1% and 3.0%.
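The thesis evaluates the smile with Obłój’s (2008) refinement; the closely related Hagan et al. (2002) lognormal approximation, sketched below on shifted forwards and strikes, reproduces the same qualitative behaviour. The parameter values are taken from the panels of Figure 6.4 and the displacement of 1.25%; the forward is illustrative.

```python
import math

def sabr_vol(F, K, T, alpha, beta, rho, nu):
    """Hagan et al. (2002) lognormal SABR implied-volatility approximation."""
    if abs(F - K) < 1e-12:                      # ATM limit of the formula
        term = (((1 - beta) ** 2 / 24) * alpha ** 2 / F ** (2 - 2 * beta)
                + 0.25 * rho * beta * nu * alpha / F ** (1 - beta)
                + (2 - 3 * rho ** 2) / 24 * nu ** 2)
        return alpha / F ** (1 - beta) * (1 + term * T)
    logFK = math.log(F / K)
    FK = (F * K) ** ((1 - beta) / 2)            # geometric-mean factor (F K)^((1-beta)/2)
    z = (nu / alpha) * FK * logFK
    xz = math.log((math.sqrt(1 - 2 * rho * z + z ** 2) + z - rho) / (1 - rho))
    denom = FK * (1 + (1 - beta) ** 2 / 24 * logFK ** 2
                  + (1 - beta) ** 4 / 1920 * logFK ** 4)
    term = (((1 - beta) ** 2 / 24) * alpha ** 2 / FK ** 2
            + 0.25 * rho * beta * nu * alpha / FK
            + (2 - 3 * rho ** 2) / 24 * nu ** 2)
    return (alpha / denom) * (z / xz) * (1 + term * T)

s = 0.0125                                       # displacement
F = 0.0162 + s                                   # shifted forward (illustrative)
params = dict(alpha=0.0344, beta=0.7227, rho=-0.1896, nu=0.2669)
strikes = [0.001 + i * (0.03 - 0.001) / 99 for i in range(100)]
smile = [sabr_vol(F, K + s, 10.0, **params) for K in strikes]
# a higher vol-of-vol nu should lift the wing of the smile
wing_low_nu = sabr_vol(F, strikes[0] + s, 10.0, 0.0344, 0.7227, -0.1896, 0.0669)
wing_high_nu = sabr_vol(F, strikes[0] + s, 10.0, 0.0344, 0.7227, -0.1896, 0.4669)
```

The wing comparison reproduces the ν-panel of Figure 6.4: a larger vol-of-vol curves the smile upwards away from the money.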
The calibrated SABR parameters for a fixed displacement of 1.25% are displayed in the left part
of Figure 6.5. As can be seen from this plot, the parameter ρ is very unstable up to 25/Mar/2015.
We expect that these unstable results are due to the relatively high displacement for this period. The
right part of Figure 6.5 shows the calibrated SABR parameters again, but now with the dynamic
displacement. The dynamic displacement is significantly lower in the first months of 2015, which solves
the problem of ρ being unstable. This clearly shows the importance of using the right magnitude of the
displacement parameter.
The same steps are followed for the ’5y5y’ swaption and the results are similar. The calibrated
SABR parameters are displayed in Section A.3. Different magnitudes of the fixed and dynamic displace-
ment parameter are proposed for the ’5y5y’ swaption, but again the first months of our time grid are
calibrated with a relatively high value for the fixed displacement parameter. The dynamic displacement
parameter, which is related to the level of the interest rates in the current period, also results in more
stable SABR parameters for the ’5y5y’ swaption.
Figure 6.5: SABR parameters for the ’10y10y’ swaption.
We will now try to capture the linear interdependencies among our variables α, ρ, and ν. We focus
on the ’10y10y’ swaption with a dynamic displacement parameter. Again the decision to focus on
the ’10y10y’ swaption is based on the fact that its quoted premiums are the most reliable and
complete. The dynamic displacement is preferred because it results in a more stable time series of
the parameters. This combination is therefore the most promising in terms of leading to reliable
estimates of our risk measures.
As discussed in Section 6.1, the volatility surface depends on the magnitude of the displacement
parameter. This results in some shocks in our calibrated SABR parameters: a shift in the dynamic
displacement parameter changes the volatility structure and therefore yields slightly different
calibrated SABR parameters. In Figure A.5 of the appendix, the SABR parameters and the level of the
dynamic displacement parameter are displayed in one figure. These plots give a clear view of the effect
of the level of displacement on the calibrated SABR parameters. For now we do not adjust our time
series analysis to deal with these small shocks, but we note their occurrence.
To estimate the one-day-ahead forecasts, we use a moving window of n = 250 observations. The
first estimation is based on the interval t1, . . . , tn, where t1 is the first day of our data set,
namely 13/Jan/2015, and tn represents 27/Jan/2016. This results in the first estimated profit or loss
on 28/Jan/2016. We fit a new autoregressive model for every day between 28/Jan/2016 and
01/Jun/2017. The moving window method implies that we use the interval t2, . . . , tn+1 to estimate
tn+2, and so on. The SABR parameters α, ρ, and ν are shown in Figure 6.6, together with their first
differences.
Figure 6.6: SABR parameters and first differences for the ’10y10y’ swaption with dynamic displacement.
We now determine how many lags p to include in our vector autoregression. To do so, we first
check the sample ACFs of the parameters and of the parameters in first differences. Plots of
the ACFs for different intervals can be found in Section A.4. These ACFs are by themselves not enough
to decide how many lags to include: for the parameters in first differences, some autocorrelation remains
significant up to lag 200. It does not make sense to include this many lags in our estimation, so we
turn to the lag order selection criteria. The values of several criteria are compared in Table A.2 in the
appendix. Following the argumentation of Liew (2004), we base our selection on the Hannan-Quinn
criterion because of our large sample. We check lags between zero and twenty, and the Hannan-Quinn
criterion reaches its minimum on this interval at three lags. For this reason we include p = 3
lags in our vector autoregressive model.
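The lag search can be sketched as follows, assuming the common definition HQ = log|Σ̂| + 2 log(log T)·n/T with n the number of estimated coefficients. The data below is a simulated three-variable VAR(1) of the thesis’s window length, not the calibrated parameters, so the sketch only illustrates the mechanics (note also that the effective sample shrinks slightly with p, a standard caveat when comparing criteria).

```python
import math
import numpy as np

def fit_var(y, p):
    """Equation-by-equation OLS fit of a VAR(p) with intercept."""
    T, k = y.shape
    X = np.ones((T - p, 1 + k * p))
    for lag in range(1, p + 1):
        X[:, 1 + (lag - 1) * k: 1 + lag * k] = y[p - lag: T - lag]
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    return B, resid.T @ resid / (T - p)          # ML residual covariance

def hannan_quinn(y, p):
    """HQ criterion: log|Sigma_hat| + 2 log(log T_eff) * n_coef / T_eff."""
    _, sigma = fit_var(y, p)
    t_eff, k = y.shape[0] - p, y.shape[1]
    n_coef = k * (1 + k * p)                     # intercepts plus lag matrices
    return np.linalg.slogdet(sigma)[1] + 2.0 * math.log(math.log(t_eff)) * n_coef / t_eff

# simulated 3-variable VAR(1) sample of the window length used in the thesis
rng = np.random.default_rng(1)
y = np.zeros((250, 3))
for t in range(1, 250):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal(3)
hq = {p: hannan_quinn(y, p) for p in range(1, 6)}
```

For this VAR(1) data the criterion is minimised at one lag, mirroring how the criterion trades the log-determinant of the residual covariance against the parameter count.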
We estimate the parameters of our VAR(3) model based on our moving window of 250 observations.
The results of this estimation for the first period are given in Section A.5. One can use the VAR model
to obtain forecasts of the SABR parameters; a 10-day-ahead forecast, showing the estimated trend of
the parameters based on our fitted VAR(3) model, can be found in Section A.7. However, we are
especially interested in the one-day-ahead forecasts. For this reason we simulate
20000 different one-day-ahead forecasts of our SABR parameters.
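This simulation step can be sketched as follows: fit the VAR(3) by OLS on a 250-day window and generate one-step-ahead draws by adding resampled residuals to the conditional mean (a residual bootstrap; the thesis may instead draw Gaussian innovations from the fitted covariance). The window below is synthetic stand-in data.

```python
import numpy as np

rng = np.random.default_rng(42)

def var_one_step_draws(window, p=3, n_sims=20000):
    """Fit a VAR(p) by OLS and simulate one-step-ahead draws via a residual bootstrap."""
    T, k = window.shape
    X = np.ones((T - p, 1 + k * p))
    for lag in range(1, p + 1):
        X[:, 1 + (lag - 1) * k: 1 + lag * k] = window[p - lag: T - lag]
    Y = window[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    # regressor vector for t+1: [1, y_T, y_{T-1}, y_{T-2}] in lag order
    x_next = np.concatenate([[1.0], window[-1:-p - 1:-1].ravel()])
    mean = x_next @ B
    return mean, mean + resid[rng.integers(0, len(resid), size=n_sims)]

window = rng.standard_normal((250, 3))   # synthetic stand-in for (alpha, rho, nu)
mean, draws = var_one_step_draws(window)
```

Each of the 20000 draws is a candidate one-day-ahead parameter vector; pushing every draw through the SABR pricing formulas then yields the simulated profit and loss distribution used below.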
We first want to check the fit of our vector autoregressive model to the data, and multiple
diagnostic tests are performed. These tests show that the VAR model does not fit the data as well
as required. The VAR models are stable, which can be verified by checking the inverse roots of the
characteristic polynomial. However, the preferred number of lags differs if we compare the
Hannan-Quinn information criterion across estimation windows over time: instead of the three
lags used now, some estimation windows prefer one lag and others as many as thirteen.
We also perform some other diagnostic tests based on the VAR(3) model and
show these results in Section A.6.
Recall that we want to model the dynamic structure of the time series such that the remaining
residuals are white noise. First we check whether significant autocorrelation remains in the
residuals by performing the Portmanteau test and the LM test for serial correlation. The null hypothesis
of the Portmanteau test states that there is no serial correlation up to lag h; this hypothesis is rejected
for h > 6 at a significance level of 1%. Furthermore, the null hypothesis of the LM test states that
there is no serial correlation, and at the same significance level this hypothesis is rejected for lags 6, 12,
13 and 19 if we take up to 20 lags into account. Subsequently, the White test is performed to check for
heteroskedasticity in the errors. The test is carried out both with and without cross terms and rejects
the null hypothesis for every combination of the individual components. The test without cross terms
tests for heteroskedasticity only, while the test with cross terms also tests for a specification error. Both
show that we have not captured the dynamics of our parameters as intended. In the final test,
we assess whether the residuals follow a multivariate normal distribution. This Jarque-Bera test uses
the square root of the correlation matrix as orthogonalization method and rejects the null hypothesis of
normality. Only the test on the skewness of the residuals of the equation for ν does not reject the null
hypothesis; for every other component the null is rejected. We conclude that the VAR(3)
model does not capture the dynamics of the SABR parameters over time.
The diagnostic tests show that the vector autoregressive model is not able to capture all the dynamics
of the SABR parameters. We note these findings, but nevertheless compute the risk measures
based on these simulations and compare them to the estimates of the
Historical Simulation method. We then perform the backtests and, in addition, one
robustness check. In Section 6.5, we estimate the local level model and compare the
simulations based on this model to those generated by the VAR(3) model.
6.3 Risk measurement
Now we use the simulated SABR parameters for the 28/Jan/2016 - 01/Jun/2017 period to compute 20000
volatility structures for each day. Based on these volatility structures, we then recompute the premiums
of the swaptions on the fixed range of strikes. These premiums are used to create the profit and
loss distribution. We focus on a portfolio of three different swaptions. A strangle is one of the most
popular trading strategies, so we focus on this strategy and complete the portfolio by adding an ATM
payer swaption. This results in the following portfolio:
Π = SwaptionReceiver(K1) + SwaptionPayer(K2) + SwaptionPayer(K3),
where K1 = 0.62%, K2 = 1.62%, and K3 = 2.62%.
This portfolio is based on the par swap rate on the first day of our data set, K2 = 1.62%. The strangle
is a combination of a receiver swaption with strike K1 = ATM − offset and a payer swaption with strike
K3 = ATM + offset, where we have chosen an offset of 1%. A plot of the par swap rates together with
K1, K2, K3, and the used range of strikes is displayed in Section A.8.
We compute the profit and loss distribution for each of the individual strikes based on the Historical
Simulation method, which uses the 249 most recent past returns. We also compute a profit and
loss distribution for every strike based on the 20000 simulated swaption prices. These distributions are
used to compute the 99% VaR by simply selecting the value of the sorted profit and loss distribution that
represents the lowest one percentile. For the 97.5% ES we compute the 97.5% VaR and then take
the expected value within this tail. We compare the Value at Risk and Expected Shortfall estimates
of the two methods in Figure 6.7.
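The quantile selection just described can be sketched as follows (sign convention assumed here: profits positive, VaR and ES reported as positive losses; the P&L sample is a constructed toy example):

```python
import math
import numpy as np

def var_es(pnl, p):
    """VaR_p: the loss at the lowest p-percentile of the sorted P&L sample;
    ES_p: the average loss in the tail at or beyond that quantile."""
    s = np.sort(np.asarray(pnl))                  # ascending: worst outcomes first
    idx = max(int(math.floor(p * len(s))) - 1, 0)
    return -s[idx], -s[: idx + 1].mean()

# toy sample of 100 simulated one-day P&L outcomes
pnl = np.concatenate([[-120.0, -80.0, -30.0], np.linspace(-20.0, 50.0, 97)])
var99, es99 = var_es(pnl, 0.01)        # -> 120.0, 120.0 (single worst outcome)
var975, es975 = var_es(pnl, 0.025)     # -> 80.0, 100.0 (mean of the two worst)
```

With 20000 simulated outcomes per day, the 1% tail contains 200 draws rather than one, so the same indexing carries over directly to the thesis setting.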
Figure 6.7: VAR(3) and Historical Simulation −99% VaR and −97.5% ES.
The graph clearly shows that using a vector autoregressive model together with the SABR model
to estimate the risk measures gives far less stable results. The Historical Simulation method reacts
relatively slowly to changes in the market, because all of the 249 historical returns are taken into account
with equal weights. The VAR(3) model, however, focuses on the more recent dynamics of the SABR model
parameters. We also see that the somewhat less stable first part of our calibrated SABR parameters
results in higher estimates of the VaR and ES. The figures on the next page show the losses of our
portfolio over time together with the estimates of the two risk measures. The percentage of violations is
also given in the title of these plots.
[Figure panels: portfolio losses over time with, respectively, the −99% HS VaR (1.105% of returns violated), the −99% VAR(3) VaR (0.55096% violated), the −97.5% HS ES (0.82873% violated), and the −97.5% VAR(3) ES (0.55096% violated).]
Figure 6.8: Losses over time for both methods with estimated VaR and ES.
We can now compare the proportion of violations with the theoretical value p. The Historical Sim-
ulation method gives the results we would expect: the 99% VaR results in four violations on the
estimated interval, which is close to the expected one percent of the total number of estimations. The
values for the 97.5% ES are shown in the lower two plots of Figure 6.8. The titles of these plots also
show a percentage of violations, but we note that this cannot be used to assess the accuracy of the
Expected Shortfall estimates; the models that produce the ES forecasts are backtested in Section
6.4.6. The VAR(3) model, on the other hand, results in only two losses larger than the 99% VaR. This is
in itself not that strange, but the values of the risk measures themselves are. Some estimates do not make
sense, because they are either extremely high or far too low. For the 99% Value at Risk, for example, we
find values ranging from 354.0662 down to −31.8681. A VaR of 354.0662 corresponds to a very large loss
of our portfolio and is not very likely to be correct, let alone the value of −31.8681, which would mean
that we are 99% sure that the return of the portfolio over this one day is at least 31.87 euros. The simulated
returns are displayed in gray in Figure 6.9. The mean of each set of simulated returns is also displayed
and compared to the actual returns based on the data. Figure A.9 plots the difference between
the mean of the simulation and the actual return. These errors are compared to the errors of the
Historical Simulation method, defined as the difference between the mean of the Historical Simulation
profit and loss distribution and the actual return. The mean squared error (MSE) is 563.7987 for the
VAR(3) model and 378.7922 for the Historical Simulation method.
Figure 6.9: VAR(3)-model simulations compared to the data set.
In the next section the results of several backtests are displayed. These statistical tests give us a
better way to assess the quality of the models, which is hard to evaluate based on the number of
violations alone: because we are looking only at the tails of the distributions, we do not have enough
data available to draw conclusions about the quality of these models with enough certainty.
6.4 Backtests
In this section the quality of the two models is assessed by applying several backtests. The
Value at Risk as well as the Expected Shortfall is computed for four different probabilities p =
[0.05, 0.025, 0.01, 0.001]. The Value at Risk and Expected Shortfall are estimated for 363 days. The
number and proportion of Value at Risk violations for both methods are given for the different
probabilities p in Table 6.2.
Historical Simulation VAR(3)-model
p Risk measure Violations Proportion Violations Proportion
0.05 VaR 18 4.9587% 9 2.4793%
0.025 VaR 10 2.7548% 4 1.1019%
0.01 VaR 4 1.1049% 2 0.5510%
0.001 VaR 0 0% 1 0.2755%
Table 6.2: Proportions and total number of violations for different values of p.
The Historical Simulation method results in a number of violations that is very close to the theoretical
values for all four values of p. The vector autoregressive model, on the other hand, deviates
from the theoretical values. The 99.9% Value at Risk, for example, has in theory a probability of one in
a thousand of a loss greater than VaR(0.999). We would therefore not expect to find a violation among
363 estimations in total, but despite this small probability we still find one violation of the VAR(3)
model VaR.
6.4.1 Kupiec
The unconditional coverage test can be used to evaluate this more formally. The test is performed with a
significance level of 5% and the results can be found in Table 6.3. The test confirms the deviation of the
VAR(3) model and rejects the null hypothesis for p = 0.05. Recall the null hypothesis of
this test, E[It] = p: in words, the expected proportion of losses that exceed the VaR is equal to p.
Unlike for the VAR(3) model, the null hypothesis for the Historical Simulation method is
rejected in none of the five cases. We also note that the null hypothesis for the
VAR(3) model VaR estimate with p = 0.025 is close to being rejected: with a significance
level of 6%, the VAR(3) model would also be rejected based on the VaR estimates with p = 0.025.
Historical VaR VAR(3) model VaR
p LR-statistic p-value Reject H0 LR-statistic p-value Reject H0
0.05 0.0013 0.9711 False 5.9146 0.0150 True
0.025 0.0937 0.7596 False 3.6686 0.0554 False
0.01 0.0369 0.8477 False 0.8830 0.3474 False
0.005 0.0183 0.8923 False 0.0183 0.8923 False
0.002 0 1 False 0.0926 0.7609 False
Table 6.3: Kupiec unconditional coverage test, with a significance level of 5%.
The Historical Simulation method satisfies the unconditional coverage property according to Kupiec’s
backtest. The LR-statistics based on the HS method are close to zero for every value of p, which shows
that there is little reason to suspect that H0 does not hold. Moreover, a well-known drawback of these
backtests based on the Value at Risk is that they often have low power, particularly for a small
number of observations, as in the data set used in this research. Nevertheless, this backtest
still indicates that the HS method VaR, unlike the VAR(3) model VaR, satisfies the unconditional
coverage property.
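The LR statistics in Table 6.3 can be reproduced from the violation counts of Table 6.2 (18 for HS and 9 for the VAR(3) model at p = 0.05, over T = 363 days); under H0: E[It] = p the statistic is asymptotically χ²(1).

```python
import math

def kupiec_lr(x, T, p):
    """Kupiec unconditional coverage LR statistic for x violations in T days."""
    def loglik(q):
        # binomial log-likelihood up to a constant; guard against log(0) when x = 0
        out = (T - x) * math.log(1.0 - q)
        if x > 0:
            out += x * math.log(q)
        return out
    return -2.0 * (loglik(p) - loglik(x / T))

lr_hs = kupiec_lr(18, 363, 0.05)      # Historical Simulation at p = 0.05
lr_var3 = kupiec_lr(9, 363, 0.05)     # VAR(3) model at p = 0.05
```

The two values match the 0.0013 and 5.9146 reported in Table 6.3, and the latter exceeds the χ²(1) critical value of 3.84, reproducing the rejection.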
6.4.2 Magnitude-based test
As a next step, we also take the magnitude of the losses into account. This is done by performing a
multivariate backtest on both the normal exceptions and the super exceptions, as described in
Section 4.4.2. The results of this test, shown in the table below, are in line with what we have found so
far. As stated before, the power of most of these tests is in theory relatively low for our data set.
The magnitude-based test rejects the null hypothesis in none of the six cases. However, we again find
very low LR-statistics for the HS VaR. With a larger significance level (e.g. 10%), we would
again reject the null hypothesis in two of the three cases for the VAR(3) model VaR estimates. On the
other hand, we note a different outcome for the p = 0.01 and p′ = 0.002 coverage rates: here the
LR-statistic based on the Historical Simulation method is even larger than the statistic based on the
VAR(3) model. This can possibly be explained by the very small number of VaR violations at these
coverage rates, which makes the results of this test very inaccurate and the model more difficult
to assess.
Historical VaR VAR(3) model VaR
p | p′ LR-statistic p-value Reject H0 LR-statistic p-value Reject H0
0.050 | 0.01 0.0554 0.9727 False 5.9417 0.0513 False
0.025 | 0.005 0.0937 0.9543 False 5.4537 0.0654 False
0.010 | 0.002 1.8220 0.4021 False 1.7756 0.4116 False
Table 6.4: Magnitude-based test, with a significance level of 5%.
To assess the quality of our methods further, we also test the independence property and apply
a duration-based test.
6.4.3 Christoffersen
The third test checks whether the occurrences of a loss greater than the Value at Risk on
two different dates for the same coverage rate are independently distributed. We test with a significance
level of 5% and reject the null hypothesis of independent outcomes in none of the eight cases. This
indicates that the models in general do not violate the independence property.
Historical VaR VAR(3) model VaR
p LR-statistic p-value Reject H0 LR-statistic p-value Reject H0
0.05 1.8791 0.1704 False 0.4577 0.4987 False
0.025 0.5666 0.4516 False 0.0891 0.7653 False
0.01 0.0891 0.7653 False 0.0222 0.8817 False
0.005 0.0222 0.8817 False 0.0222 0.8817 False
0.002 0 1 False 0.0055 0.9407 False
Table 6.5: Christoffersen independence property test, with a significance level of 5%.
Also note that, based on the LR-statistics here, the VAR(3) model actually performs better than the
Historical Simulation method. One of the drawbacks of the HS method Value at Risk estimation
is that it does not always satisfy the independence property. In this case we are not able to reject the null
hypothesis of independence, but we note that the Historical Simulation method performs somewhat
worse than the VAR(3) model.
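A minimal implementation of the independence test from a 0/1 hit sequence (first-order Markov alternative, asymptotically χ²(1)) can be sketched as follows; the two hit sequences below are constructed examples, not the thesis data, chosen to show how clustering drives the statistic.

```python
import math

def christoffersen_ind(hits):
    """Christoffersen independence LR statistic from a 0/1 hit sequence."""
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits, hits[1:]):
        n[prev][cur] += 1                       # first-order transition counts
    n00, n01, n10, n11 = n[0][0], n[0][1], n[1][0], n[1][1]
    pi01 = n01 / (n00 + n01)                    # P(hit | no hit yesterday)
    pi11 = n11 / (n10 + n11) if (n10 + n11) > 0 else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)  # pooled hit probability

    def ll(prob, stays, jumps):
        out = stays * math.log(1.0 - prob) if stays > 0 else 0.0
        if jumps > 0:
            out += jumps * math.log(prob)
        return out

    l_alt = ll(pi01, n00, n01) + ll(pi11, n10, n11)
    l_null = ll(pi, n00 + n10, n01 + n11)
    return -2.0 * (l_null - l_alt)

clustered = [0] * 50 + [1, 1] + [0] * 50        # back-to-back violations
spread = ([0] * 49 + [1]) * 2 + [0] * 2         # isolated violations
lr_clustered = christoffersen_ind(clustered)
lr_spread = christoffersen_ind(spread)
```

Both sequences contain two violations, yet only the clustered one pushes the statistic past the χ²(1) critical value of 3.84, which is exactly the behaviour the test is designed to detect.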
6.4.4 Duration-based test
The duration-based test is based on a GMM framework (Candelon et al., 2010). The test is performed
for several numbers of moment conditions, but the results are shown only for K = 6, the same number
that Candelon et al. (2010) imposed in their empirical research; the results for other numbers of moment
conditions were similar. The duration-based test can be used to test the unconditional coverage property,
the independence property, and the conditional coverage property. The results for the conditional
coverage property are shown in the table below, and again we are not able to reject the null hypothesis
in any case. However, we note again the difference in GMM-statistics between the two methods, and we
are able to reject the UC property for the VAR(3) model VaR(0.95) with a significance level of 1%. We
do not display the results for p = 0.002, because this results in too few violations: to test the independence
property, we need at least two durations between at least three violations. The results for the tests of
the UC and IND properties separately are shown in Section A.9.
Historical VaR VAR(3) model VaR
p GMM-statistic p-value Reject CC H0 GMM-statistic p-value Reject CC H0
0.05 3.0972 0.9281 False 15.2962 0.0536 False
0.025 0.6814 0.9996 False 12.8338 0.1177 False
0.01 0.9753 0.9984 False 2.9089 0.9399 False
0.005 1.3723 0.9946 False 4.5482 0.8046 False
Table 6.6: Duration-based CC property test, with a significance level of 5% and K = 6.
The HS method performs well in the duration-based test on the conditional coverage property. Again
the results show that the VAR(3) model does not produce better VaR estimates than the HS method,
as also found with the unconditional coverage test and the magnitude-based test.
6.4.5 Kolmogorov-Smirnov
The next step in assessing the quality of the Historical Simulation method is a goodness-of-fit test.
We have 363 different estimation samples, from which we compute 363 different Value
at Risk estimates. The profit and loss distributions, which we use to estimate the upcoming returns,
are composed of 249 equally weighted historical returns. We use the Kolmogorov-Smirnov test to
assess whether the actual returns we observe are uniform draws from the estimation samples. This gives
us the opportunity to test a crucial assumption in our analysis, namely whether the historical returns can
be used with equal weights to obtain an accurate estimate of the one-day-ahead return. The output of
the test is displayed in Table 6.7.
Significance level ks-statistic p-value Reject H0
0.05 0.0386 0.9461 False
Table 6.7: Kolmogorov-Smirnov test for Historical Simulation method.
The Historical Simulation method is used to estimate the profit and loss distribution, and the true
return is compared to this sample. First, we sort the returns of the estimated profit and loss distribution
in ascending order. Then we determine the rank of the observed return in this sorted distribution
and convert it to a relative rank by dividing by the total number of observations in the estimation
window. For the Historical Simulation method to be valid, the observed returns need to be random
draws from the 249 historical returns that represent the estimated profit and loss distribution for every
value of t. We check whether this is the case by evaluating whether the relative rank of the actual return
with respect to the values of the profit and loss distribution is a random draw from the uniform
distribution. The two-sample Kolmogorov-Smirnov test is therefore used to compare the sample of
relative ranks to the theoretical values from the uniform distribution.
The theoretical values from the uniform distribution are plotted together with the sample of relative
ranks in Figure A.10. Table 6.7 already showed that the null hypothesis is not rejected at a
significance level of 5%, and the graph also shows little difference between the two CDFs. To conclude,
based on the Kolmogorov-Smirnov test we are not able to reject the assumption that
the historical returns can be used to estimate current returns.
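The rank construction and the KS distance can be sketched as follows. For brevity this sketch uses the one-sample distance against the U(0,1) CDF rather than the two-sample variant applied in the thesis; the rank computation itself is the same.

```python
import numpy as np

def relative_ranks(samples, actual):
    """Relative rank of each observed return within its estimation sample."""
    return np.array([(np.asarray(s) < a).mean() for s, a in zip(samples, actual)])

def ks_uniform(u):
    """One-sample Kolmogorov-Smirnov distance between ranks u and the U(0,1) CDF."""
    u = np.sort(np.asarray(u))
    n = len(u)
    hi = np.arange(1, n + 1) / n                 # empirical CDF just after each point
    return float(max(np.max(hi - u), np.max(u - (hi - 1.0 / n))))

# a perfectly uniform rank sample of 363 points has KS distance 1/(2n)
d_uniform = ks_uniform((np.arange(363) + 0.5) / 363)
rank = relative_ranks([np.array([1.0, 2.0, 3.0, 4.0])], [3.5])[0]   # -> 0.75
```

If the HS assumption holds, the 363 relative ranks behave like such a uniform sample and the KS distance stays below the 5% critical value of roughly 1.36/√n.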
6.4.6 Expected Shortfall backtest
Next, the backtest based on the Expected Shortfall measure is performed. We focus again on
different values of p and expect to find a value of Z equal to zero under the null hypothesis. The
results of the backtest are shown in Table 6.8; we find values of the Z statistic close to zero for the
Historical Simulation method. The Z statistic is not defined for the Historical Simulation method with
p = 0.001, because the indicator function is in this case equal to zero for all t.
Historical ES VAR(3) model ES
p Z-statistic Reject H0 Z-statistic Reject H0
0.05 0.0292 False 0.4616 False
0.025 -0.0595 False 0.9380 False
0.01 0.0624 False -0.3730 False
0.005 0.1305 False -0.2686 False
0.001 - - -0.8821 True
Table 6.8: Expected Shortfall backtest, with a significance level of 5%.
The Z statistic is strictly negative for the VAR(3) model estimates with p ≤ 0.01, but we only
reject the null hypothesis for the VAR(3) model with p = 0.001. For the other values of p we are not
able to reject the null hypothesis, although the results are in line with what we found based on the
Value at Risk estimates. In terms of Expected Shortfall forecasts, the VAR(3) model thus performs
worse than the Historical Simulation method.
6.5 Robustness check: Local level model
The results that we find based on the VAR(3) model are less stable than we had hoped for. The
main goal is to find an accurate risk measure, but stability is also valued, because a stable risk
measure is more suitable for actual use by a financial institution. Large shifts in the level of the
risk measures make it more difficult and more expensive to adjust the required amount of capital
that needs to be held.
We now also apply the local level model to the time series of SABR parameters in first differences.
The model specification and its parameter estimates are given in Section A.10.
Based on this model, we find the following simulations and VaR estimates over time.
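For reference, the VaR and Expected Shortfall estimates behind such a plot are simple functionals of the simulated P&L sample. A minimal sketch, with a made-up normal sample standing in for the repriced portfolio returns:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative one-day simulated P&L distribution; in the thesis these
# values come from repricing the swaption portfolio under simulated
# one-day-ahead SABR parameters.
simulated_pnl = rng.normal(loc=0.0, scale=40.0, size=10_000)

p = 0.01                                    # 99% confidence level
var_99 = -np.quantile(simulated_pnl, p)     # VaR reported as a positive number
es_99 = -simulated_pnl[simulated_pnl <= -var_99].mean()  # mean tail loss
print(var_99, es_99)
```
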
[Figure: VAR(3) and LLM Value-at-Risk estimates over time; 99% VaR of both models plotted from Q4-2015 through Q3-2017. Legend: VAR(3), LLM.]
Figure 6.10: Local level model simulations.
The results based on the local level model are similar to those found with the VAR(3) model. If
we compare the simulated profit and loss distributions, shown in gray in the right part of Figure
6.10, we see outcomes that are close to the VAR(3) model simulations. Hence, applying the local
level model does not result in more stable simulations. This is unfortunate, but it does corroborate
the results that we found based on the VAR(3) model.
7 Conclusion
The main goal of this research was to find more accurate risk measures for swaptions, based on a time
series analysis of SABR model parameters. An empirical study is used to assess the quality of this
new method compared to the commonly used Historical Simulation method. Before we were able to
apply the time series analysis, the SABR volatility model had to be calibrated. The optimal value for
β was first estimated from the log-log plot of σATM against F, and we found a value close to 0.5 for the
’10y10y’ swaption. Some studies, however, do not make use of this log-log plot and simply set β = 0.5.
When we estimated the optimal value for β based on the ’5y5y’ swaption, we found a value significantly
different from 0.5. Estimating β from the log-log plot is straightforward and takes little time to carry
out. For this reason, we recommend always checking this estimate of β before deciding to set β equal to 0.5.
We then noticed the relationship between the magnitude of the displacement parameter and the
volatility structure. We chose the displaced SABR model to be able to deal with negative interest
rates, and we find that a high value for the displacement parameter can lead to unstable calibrated
SABR parameters. We therefore conclude that it is preferable to use a dynamic displacement
parameter: a displacement that reflects the interest rate environment of the current period results
in a more stable SABR calibration.
In the next step, a time series analysis was applied to the SABR model parameters. After
several diagnostic tests, we conclude that a vector autoregressive model is not able to capture the
dynamic structure of the time series. As a result, we obtain unstable estimates of our risk measures over
time. This is unfavorable, and based on the VAR model we are not able to improve on the
Historical Simulation method. We then also use a local level model to analyze the time series, but find
results similar to those of the VAR model. We recall the research question of this thesis: can one
outperform the Historical Simulation Value at Risk and Expected Shortfall forecasts by fitting a time
series model to the calibrated SABR model parameters instead? Based on our empirical study, we
conclude that we were not able to improve on the Historical Simulation estimates of the risk measures
by using a vector autoregressive model or a local level model.
If we compare the results of the numerous backtests, we conclude that the Historical Simulation
method performs relatively well. The independence property is in general sometimes violated when the
HS method is used, but in our case we do not reject the null hypothesis of independence. However, we
note that the LR statistics are somewhat elevated, and we keep in mind that the power of our backtest
may be too low to reject in our case. Nevertheless, the HS method performs relatively well here, even
though its estimates of the risk measures respond slowly to changes in the profit and loss
distribution. The vector autoregressive model, on the other hand, performs worse in the unconditional
coverage test, the magnitude-based test and the duration-based test. Also, when we test the
estimates of the Expected Shortfall measure, we find that the estimates based on the HS method are
more accurate than those based on the VAR(3) model. The backtests are in line with what we
observed from the estimated risk measures themselves and confirm that the Historical Simulation method
outperforms the vector autoregressive model in the estimation of the risk measures.
In our conclusion, we distinguish between two possibilities. First, it could be the case
that another, more advanced time series model is able to produce better estimates of the risk measures.
This would be interesting for follow-up research. We saw, for example, that the shifts
in the dynamic displacement parameter caused a shift in the calibrated SABR model parameters. These
shifts are ignored in this study, but it would be interesting to check to what extent the estimates of the
risk measures could be improved by taking these shocks into account. On the other hand, it could also
be the case that the uncertainty in the simulated one-day-ahead SABR model parameters simply has too
large an impact on the volatility structure, and as a result also on the price of the swaptions. In that
case, the time series analysis itself is not the main issue. In follow-up research, it would be
interesting to investigate whether one can find better estimates with a more advanced time series model,
and if that does not work, to investigate why this is the case.
References
Acerbi, C. and Szekely, B. (2014). Backtesting expected shortfall. Risk.
Antonov, A., Konikov, M., and Spector, M. (2015). The free boundary SABR: natural extension to
negative rates. Risk.
Barone-Adesi, G., Giannopoulos, K., and Vosper, L. (2002). Backtesting derivative portfolios with filtered
historical simulation (FHS). European Financial Management, 8(1):31–58.
Basel Committee (2013). Fundamental Review of the Trading Book: A revised market risk framework.
Bank for International Settlements.
Berestycki, H., Busca, J., and Florent, I. (2004). Computing the implied volatility in stochastic volatility
models. Communications on Pure and Applied Mathematics, 57(10):1352–1373.
Black, F. (1976). The pricing of commodity contracts. Journal of Financial Economics, 3(1-2):167–179.
Bogerd, K. (2015). Smile risk in expected shortfall estimation for interest rate options. Utrecht University.
Brigo, D. and Mercurio, F. (2007). Interest rate models - theory and practice: with smile, inflation and
credit. Springer.
Campbell, S. (2007). A review of backtesting and backtesting procedures. The Journal of Risk, 9(2):1–17.
Candelon, B., Colletaz, G., Hurlin, C., and Tokpavi, S. (2010). Backtesting value-at-risk: A GMM
duration-based test. Journal of Financial Econometrics, 9(2):314–343.
Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review, 39(4):841–862.
Colletaz, G., Hurlin, C., and Perignon, C. (2013). The risk map: A new tool for validating risk models.
Journal of Banking and Finance, 37(10):3843–3854.
Commandeur, J. J. F. and Koopman, S. J. (2007). An introduction to state space time series analysis.
Oxford University Press.
Du, Z. and Escanciano, J. C. (2015). Backtesting expected shortfall: Accounting for tail risk.
Management Science.
Frankema, L. (2016). Pricing and hedging options in a negative interest rate environment. Delft University
of Technology.
Giordano, L. and Siciliano, G. (2013). Real-world and risk-neutral probabilities in the regulation on the
transparency of structured products. SSRN Electronic Journal.
Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Associ-
ation, 106(494):746–762.
Gurrola, P. and Murphy, D. (2015). Filtered historical simulation value-at-risk models and their
competitors. Bank of England.
Hagan, P., Kumar, D., Lesniewski, S., and Woodward, D. (2002). Managing smile risk. Wilmott Magazine,
1:84–108.
Hull, J. (2012). Options, futures, and other derivatives. Prentice Hall.
Itô, K. (1951). On stochastic differential equations. Memoirs of the American Mathematical Society,
4:1–51.
Kupiec, P. H. (1995). Techniques for verifying the accuracy of risk measurement models. The Journal
of Derivatives, 3(2):73–84.
Liew, V. K.-S. (2004). Which lag length selection criteria should we employ? Economics Bulletin,
3(33):1–9.
Massey, F. J. (1951). The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical
Association, 46(253):68.
Miller, L. H. (1956). Table of percentage points of Kolmogorov statistics. Journal of the American
Statistical Association, 51(273):111.
Moni, C. (2014). Risk managing smile risk with the SABR model. WBS Interest Rate Conference.
Obłój, J. (2008). Fine-tune your smile: correction to Hagan et al. Wilmott Magazine, 1.
Pérignon, C. and Smith, D. R. (2010). The level and quality of value-at-risk disclosure by commercial
banks. Journal of Banking & Finance, 34(2):362–377.
Piontek, K. (2009). The analysis of power for some chosen VaR backtesting procedures: Simulation
approach. Advances in Data Analysis, Data Handling and Business Intelligence, Studies in Classification,
Data Analysis, and Knowledge Organization, pages 481–490.
Pritsker, M. G. (2001). The hidden dangers of historical simulation. Journal of Banking and Finance,
30(2):561–582.
Roccioletti, S. (2016). Backtesting value at risk and expected shortfall. Springer Gabler.
Tsay, R. S. (2005). Analysis of financial time series. Wiley Series in Probability and Statistics.
Uri, R. (2000). A practical guide to swap curve construction. Bank of Canada.
West, G. (2005). Calibration of the SABR model in illiquid markets. Applied Mathematical Finance,
12(4):371–385.
A Appendix
A.1 Data
Boxplots of the data are displayed below. The Euribor data consists of Deposits for all tenors up to and
including three weeks and of Swaps for all of the remaining tenors. The boxplots show the minimum, the
quantiles, and the outliers of the data for every tenor and strike rate, respectively. A value is drawn
as an outlier if it is larger than q3 + w(q3 − q1) or smaller than q1 − w(q3 − q1), with whisker
w = 1.5 and q1 and q3 equal to the 25th and the 75th percentiles of the sample data. We
note that the Euribor data is shown in a more compact way, but this boxplot would still show outliers
if they existed in the data. The boxplot of the swaption premiums, on the other hand, shows multiple
outliers. If the data were normally distributed, we would expect 0.7% of the observations per strike
rate to be outliers, which amounts to 4.3 outliers per strike rate in our sample. However, we notice more
outliers for the swaptions with a strike rate that is further out-of-the-money. This indicates that the
swaption premiums are not identically distributed across strike rates.
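The whisker rule and the 0.7% figure can be checked numerically: for normal data the fences at q1 − 1.5(q3 − q1) and q3 + 1.5(q3 − q1) sit at about ±2.70 standard deviations, so roughly 0.7% of observations fall outside them. A sketch (the function name is illustrative):

```python
import numpy as np

def tukey_outliers(x, w=1.5):
    """Flag values outside [q1 - w*(q3 - q1), q3 + w*(q3 - q1)]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - w * iqr) | (x > q3 + w * iqr)

rng = np.random.default_rng(2)
frac = tukey_outliers(rng.normal(size=100_000)).mean()
print(frac)  # close to 0.007 for normally distributed data
```
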
[Figure: boxplot of deposit or swap rates in %, per tenor from ON through 60Y.]
Figure A.1: Boxplot of Euribor data.
[Figure: boxplot of swaption premiums in euros per relative strike rate, for strikes from −1.5% to 3%.]
Figure A.2: Boxplot of premiums for the ’10y10y’ swaption.
A.2 Determining the optimal value for β
[Figure: log-log plot of log σATM against log F, showing the data points and an OLS approximation.]
Figure A.3: Log-log plot for the ’5y5y’ swaption.
The OLS estimation gives us the following results:
Log α -(1-β) α β
OLS estimate -2.7005 -0.2809 0.0672 0.7191
Table A.1: OLS estimates for α and β.
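The table follows from a single OLS regression: since σATM ≈ αF^(β−1) at the money, log σATM = log α − (1 − β) log F, so the intercept gives log α and the slope gives −(1 − β). A sketch on simulated data (the "true" values below are made up to mimic Table A.1):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data obeying sigma_ATM = alpha * F**(beta - 1) plus noise.
true_alpha, true_beta = 0.067, 0.72
log_F = rng.uniform(-5.4, -3.8, size=250)
log_sigma = np.log(true_alpha) + (true_beta - 1.0) * log_F \
    + rng.normal(scale=0.02, size=250)

# OLS of log sigma_ATM on log F: intercept = log alpha, slope = -(1 - beta).
slope, intercept = np.polyfit(log_F, log_sigma, 1)
alpha_hat = np.exp(intercept)
beta_hat = 1.0 + slope
print(alpha_hat, beta_hat)
```
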
A.3 Time series of SABR parameters
[Figure: time series of the SABR parameters from Q4-2014 through Q3-2017, left panel with fixed displacement and right panel with dynamic displacement.]
Figure A.4: SABR parameters for the ’5y5y’ swaption.
[Figure: SABR parameters (left axis) and the dynamic displacement in % (right axis) over time, for the ’5y5y’ swaption (left panel) and the ’10y10y’ swaption (right panel).]
Figure A.5: SABR parameters with dynamic displacement.
A.4 Lag length selection
[Figure: sample autocorrelation functions of the SABR parameter series, plotted for lags up to 60 and up to 600.]
Figure A.6: Sample ACFs for SABR parameters with dynamic displacement.
Below, the results for several selection criteria are shown. LR stands for the sequential modified
likelihood-ratio test statistic, FPE is the final prediction error, AIC is the Akaike information criterion,
SC is the Schwarz information criterion and HQ is the Hannan-Quinn information criterion.
The tests in the table are at a 5% level and the optimal lag order selected by each criterion is denoted
with an asterisk.
Lag LogL LR FPE AIC SC HQ
0 2261,784 NA 5,92E-13 -19,64160 -19,59675 -19,62351
1 2290,584 56,59829 4,98E-13 -19,81377 -19,63440* -19,74142
2 2303,829 25,68476 4,80E-13 -19,85069 -19,53678 -19,72407
3 2322,641 35,98778 4,41E-13 -19,93601 -19,48757 -19,75512*
4 2329,173 12,32593 4,51E-13 -19,91455 -19,33157 -19,67939
5 2337,536 15,56151 4,53E-13 -19,90901 -19,19150 -19,61958
6 2349,805 22,51040 4,41E-13 -19,93743 -19,08539 -19,59373
7 2361,361 20,90162 4,31E-13 -19,95966 -18,97308 -19,56169
8 2369,328 14,20298 4,36E-13 -19,95068 -18,82957 -19,49845
9 2391,234 38,47807 3,90E-13 -20,06291 -18,80726 -19,55640
10 2398,282 12,19566 3,97E-13 -20,04593 -18,65575 -19,48516
11 2409,427 18,99526 3,90E-13 -20,06458 -18,53987 -19,44954
12 2419,645 17,14795 3,87e-13* -20,07517* -18,41592 -19,40587
13 2425,063 8,95164 4,00E-13 -20,04402 -18,25024 -19,32045
14 2433,286 13,37230 4,04E-13 -20,03727 -18,10896 -19,25943
15 2438,372 8,13667 4,19E-13 -20,00323 -17,94039 -19,17112
16 2449,252 17,12415 4,14E-13 -20,01958 -17,82220 -19,13320
17 2464,643 23,82352 3,93E-13 -20,07516 -17,74325 -19,13451
18 2468,343 5,62935 4,13E-13 -20,02907 -17,56262 -19,03415
19 2471,675 4,98327 4,36E-13 -19,97978 -17,37880 -18,93060
20 2483,380 17,20143* 4,29E-13 -20,00330 -17,26779 -18,89985
Table A.2: VAR lag order selection criteria.
A.5 Vector Autoregression
AR-Stationary 3-Dimensional VAR(3) Model
Effective Sample Size: 246
Number of Estimated Parameters: 30
LogLikelihood: 2432.32
AIC: -4804.63    BIC: -4699.47

              Value        StandardError  TStatistic  PValue
              ___________  _____________  __________  __________
Constant(1)   -4.9616e-05  6.4909e-05     -0.7644     0.44463
Constant(2)   -0.0007553   0.0025274      -0.29885    0.76506
Constant(3)   -0.0003377   0.0017135      -0.19708    0.84377
AR1(1,1)      -0.019815    0.07939        -0.2496     0.8029
AR1(2,1)      -3.6426      3.0912         -1.1784     0.23865
AR1(3,1)      -2.6018      2.0958         -1.2415     0.21444
AR1(1,2)      0.001924     0.0021331      0.90198     0.36707
AR1(2,2)      -0.34306     0.083057       -4.1305     3.62e-05
AR1(3,2)      -0.055376    0.056311       -0.98341    0.32541
AR1(1,3)      0.0019307    0.0028546      0.67634     0.49883
AR1(2,3)      -0.048717    0.11115        -0.43829    0.66118
AR1(3,3)      0.016625     0.075359       0.22061     0.8254
AR2(1,1)      0.11722      0.078275       1.4976      0.13425
AR2(2,1)      -10.905      3.0478         -3.578      0.00034629
AR2(3,1)      5.5086       2.0664         2.6658      0.0076798
AR2(1,2)      -0.0036061   0.0021917      -1.6453     0.099902
AR2(2,2)      -0.14852     0.08534        -1.7403     0.081801
AR2(3,2)      -0.033632    0.057859       -0.58128    0.56105
AR2(1,3)      -0.0113      0.0028421      -3.9758     7.0137e-05
AR2(2,3)      0.30217      0.11067        2.7304      0.0063252
AR2(3,3)      -0.24748     0.07503        -3.2985     0.00097215
AR3(1,1)      0.050474     0.079474       0.6351      0.52537
AR3(2,1)      -7.2978      3.0945         -2.3583     0.018359
AR3(3,1)      -0.36743     2.098          -0.17513    0.86098
AR3(1,2)      -0.0016881   0.0021352      -0.79059    0.42918
AR3(2,2)      -0.0095256   0.083142       -0.11457    0.90879
AR3(3,2)      -0.1261      0.056368       -2.2371     0.025282
AR3(1,3)      -0.012139    0.0028895      -4.2012     2.6556e-05
AR3(2,3)      0.54294      0.11251        4.8257      1.3953e-06
AR3(3,3)      -0.29708     0.07628        -3.8946     9.835e-05

Innovations covariance matrix:
 0.0000  -0.0000   0.0000
-0.0000   0.0016  -0.0006
 0.0000  -0.0006   0.0007

Innovations correlation matrix:
 1.0000  -0.5863   0.4330
-0.5863   1.0000  -0.5310
 0.4330  -0.5310   1.0000
A.6 Evaluating the time series analysis
The results of the performed diagnostic tests are summarized below. The lag length selection criteria are
shown for different samples and the other tests are all based on the first estimation window (13/Jan/2015
- 06/Jan/2016).
Portmanteau test LM test (χ2(9))
Lags Q-Stat Prob. Adj Q-Stat Prob. df LM-Stat Prob
1 0,395149 NA* 0,396762 NA* NA* 10,44895 0,3154
2 1,948446 NA* 1,96279 NA* NA* 11,24182 0,2595
3 4,111961 NA* 4,153016 NA* NA* 16,09898 0,0648
4 7,868684 0,5474 7,971833 0,537 9 4,45073 0,8793
5 11,26682 0,8827 11,44047 0,8747 18 3,858909 0,9205
6 44,47365 0,0185 45,47747 0,0145 27 37,52362 0
7 58,42196 0,0105 59,83431 0,0076 36 13,75203 0,1314
8 78,40643 0,0015 80,49053 0,0009 45 20,5727 0,0147
9 93,57406 0,0007 96,23414 0,0004 54 15,46203 0,079
10 102,7398 0,0012 105,7882 0,0006 63 9,46408 0,3956
11 115,5032 0,0009 119,149 0,0004 72 13,45859 0,1429
12 139,618 0,0001 144,5006 0 81 24,38307 0,0037
13 173,1128 0 179,8642 0 90 36,33583 0
14 177,9179 0 184,9593 0 99 5,150915 0,821
15 184,9171 0 192,4129 0 108 7,294864 0,6064
16 195,7306 0 203,9787 0 117 11,4214 0,2479
17 212,7453 0 222,2564 0 126 17,5525 0,0407
18 224,2442 0 234,6632 0 135 12,33173 0,1952
19 243,6628 0 255,7071 0 144 21,86051 0,0093
20 252,3717 0 265,1867 0 153 8,914652 0,4452
Table A.3: Portmanteau test and LM test for auto correlation.
Without cross terms
Dependent R-squared F(18,227) Prob. χ2(18) Prob.
res1*res1 0,646044 23,01793 0 158,9268 0
res2*res2 0,546893 15,22139 0 134,5356 0
res3*res3 0,283992 5,001986 0 69,86214 0
res2*res1 0,657318 24,19009 0 161,7002 0
res3*res1 0,619349 20,51922 0 152,3597 0
res3*res2 0,588867 18,06292 0 144,8612 0
Joint test (χ2(108)): 312,5637, with prob. 0.
With cross terms
Dependent R-squared F(18,227) Prob. χ2(18) Prob.
res1*res1 0,929864 46,89429 0 228,7466 0
res2*res2 0,910374 35,92741 0 223,952 0
res3*res3 0,465922 3,085659 0 114,6168 0
res2*res1 0,981364 186,2624 0 241,4156 0
res3*res1 0,924792 43,49284 0 227,4988 0
res3*res2 0,927955 45,55747 0 228,2768 0
Joint test (χ2(324)): 720,2421, with prob. 0.
Table A.4: White heteroskedasticity test.
Component Skewness χ2 df Prob.
1 -2,294449 92,91814 1 0
2 -0,433951 7,546527 1 0,006
3 -1,176436 40,00863 1 0
Joint test 140,4733 3 0
Component Kurtosis χ2 df Prob.
1 20,48321 20,32191 1 0
2 18,2069 444,9114 1 0
3 22,57069 417,4646 1 0
Joint test 882,698 3 0
Component Jarque-Bera df Prob.
1 113,24 2 0
2 452,458 2 0
3 457,4732 2 0
Joint test 1023,171 6 0
Table A.5: Normality test.
A.7 Forecasts
Figure A.7: Forecasts based on VAR(3) model fitted to ’10y10y’ swaption with dynamic displacement.
A.8 Risk measurement
[Figure: range of strikes and the par swap rate in % from Q4-2014 through Q3-2017. Legend: par swap rate, fixed strike interval, K1, K2, K3.]
Figure A.8: Range of strikes and the par swap rate over time.
[Figure: overestimate in euros over time for the VAR(3) model (MSE: 563.7987) and the Historical Simulation method (MSE: 378.7922).]
Figure A.9: Comparison between actual and mean of estimated returns.
A.9 Backtests
Historical VaR VAR(3) model VaR
p GMM-statistic p-value Reject UC H0 GMM-statistic p-value Reject UC H0
0.05 0.2006 0.6542 False 7.5003 0.0062 True
0.025 0.2234 0.6365 False 1.4720 0.2250 False
0.01 5.3872e-04 0.9815 False 0.8001 0.3711 False
0.005 4.0201e-04 0.9840 False 0.8975 0.3434 False
Table A.6: Duration-based UC property test, with a significance level of 5% and K = 6 moment
conditions.
Historical VaR VAR(3) model VaR
p GMM-statistic p-value Reject IND H0 GMM-statistic p-value Reject IND H0
0.05 2.8965 0.9407 False 7.7958 0.4537 False
0.025 0.4581 0.9999 False 11.3618 0.1820 False
0.01 0.9747 0.9984 False 2.1088 0.9775 False
0.005 1.3719 0.9946 False 3.6506 0.8872 False
Table A.7: Duration-based IND property test, with a significance level of 5% and K = 6 moment
conditions.
[Figure: empirical CDF of the relative ranks plotted against the Uniform(0,1) CDF.]
Figure A.10: Comparison between the empirical and theoretical distribution for the Historical Simulation method.
A.10 Local level model
The local level model that is used is specified as follows:
µ(1)_t = µ(1)_{t−1} + c1 ε(1)_t
µ(2)_t = µ(2)_{t−1} + c2 ε(2)_t
µ(3)_t = µ(3)_{t−1} + c3 ε(3)_t
α_t = c4 µ(1)_t + c7 µ(2)_t + c10 µ(3)_t + c13 η_t
ρ_t = c5 µ(1)_t + c8 µ(2)_t + c11 µ(3)_t + c14 η_t
ν_t = c6 µ(1)_t + c9 µ(2)_t + c12 µ(3)_t + c15 η_t     (A.1)
The following initial state mean and covariance matrix estimates are also used:
Initial state means:
      x1            x2            x3
 -6.5489e-05   5.2572e-04   -4.7769e-05

Initial state covariance matrix:
       x1          x2          x3
x1   1.66e-07   -3.25e-06   -3.92e-08
x2  -3.25e-06    6.65e-04    2.24e-05
x3  -3.92e-08    2.24e-05    3.71e-05
The coefficients of the model equations are estimated by maximum likelihood and this results in the
following values:
Coeff Std Err t Stat Prob.
c(1) -0.00009 0.00624 -0.01466 0.98830
c(2) -0.00005 0.30904 -0.00015 0.99988
c(3) -0.00004 0.36125 -0.00011 0.99991
c(4) 0.08108 3.01323 0.02691 0.97853
c(5) 0.46985 20.65594 0.02275 0.98185
c(6) 0.43666 23.87897 0.01829 0.98541
c(7) 0.02826 0.58728 0.04812 0.96162
c(8) 0.40308 15.57251 0.02588 0.97935
c(9) 0.42118 11.34973 0.03711 0.97040
c(10) 0.02609 5.85120 0.00446 0.99644
c(11) 0.43940 79.70170 0.00551 0.99560
c(12) 0.42112 87.15293 0.00483 0.99614
c(13) -0.00109 0.00004 -26.26307 0
c(14) -0.04227 0.00086 -49.09493 0
c(15) -0.02838 0.00100 -28.50648 0
Final State Std Dev t Stat Prob.
x(1) 0.00011 0.00161 0.07058 0.94373
x(2) -0.00676 0.02566 -0.26360 0.79209
x(3) 0.00535 0.02502 0.21382 0.83069
Table A.8: Parameter estimates of the local level model.
This is based on the first differences of the SABR parameters in the first estimation window, with a
sample size of 249. We also find a log-likelihood of 2307.61, an Akaike information criterion of
-4585.21 and a Bayesian information criterion of -4532.45.
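For intuition, the filtering recursions behind this maximum-likelihood estimation can be sketched for a univariate local level model; the noise levels and data below are made up, and the thesis model is the three-dimensional specification above:

```python
import numpy as np

def local_level_filter(y, sigma_state, sigma_obs, a0=0.0, p0=1e4):
    """Kalman filter for a univariate local level model:
       mu_t = mu_{t-1} + state noise,   y_t = mu_t + observation noise."""
    a, p = a0, p0                   # filtered state mean and variance
    filtered = np.empty_like(y)
    loglik = 0.0
    for t, obs in enumerate(y):
        p = p + sigma_state ** 2    # predict: state variance grows
        f = p + sigma_obs ** 2      # innovation variance
        v = obs - a                 # innovation
        loglik += -0.5 * (np.log(2 * np.pi * f) + v ** 2 / f)
        k = p / f                   # Kalman gain
        a = a + k * v               # update state mean
        p = (1 - k) * p             # update state variance
        filtered[t] = a
    return filtered, loglik

# Simulated example: a random-walk level observed with noise.
rng = np.random.default_rng(5)
level = np.cumsum(rng.normal(scale=0.1, size=500))
y_obs = level + rng.normal(scale=0.5, size=500)
filtered, loglik = local_level_filter(y_obs, sigma_state=0.1, sigma_obs=0.5)
```

Maximizing the returned log-likelihood over the noise parameters yields estimates analogous to the c coefficients in Table A.8.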