Chapter V
Extreme Value Theory and Frequency Analysis
5.1 INTRODUCTION TO EXTREME VALUE THEORY
Most statistical methods are concerned primarily with what goes on in the center of a
statistical distribution, and do not pay particular attention to the tails of a distribution,
or in other words, the most extreme values at either the high or low end. Extreme
event risk is present in all areas of risk management – market, credit, operational, and
insurance. One of the greatest challenges to a risk manager is to
implement risk management tools which allow for modeling rare but damaging
events, and permit the measurement of their consequences. Extreme value theory
(EVT) plays a vital role in these activities.
The standard mathematical approach to modeling risks uses the language of
probability theory. Risks are random variables, mapping unforeseen future states of
the world into values representing profits and losses. These risks may be considered
individually, or seen as part of a stochastic process where present risks depend on
previous risks. The potential values of a risk have a probability distribution which we
will never observe exactly although past losses due to similar risks, where available,
may provide partial information about that distribution. Extreme events occur when a
risk takes values from the tail of its distribution.
We develop a model for risk by selecting a particular probability distribution. We may
have estimated this distribution through statistical analysis of empirical data. In this
case EVT is a tool which attempts to provide us with the best possible estimate of the
tail area of the distribution. However, even in the absence of useful historical data,
EVT provides guidance on the kind of distribution we should select so that extreme
risks are handled conservatively.
There are two principal kinds of model for extreme values. The oldest group of
models is the block maxima models; these are models for the largest observations
collected from large samples of identically distributed observations. For example, if
we record daily or hourly losses and profits from trading a particular instrument or
group of instruments, the block maxima/minima method provides a model which may
be appropriate for the quarterly or annual maximum of such values. The block
maxima/minima methods are fitted with the generalized extreme value (GEV)
distribution.
A more modern group of models is the peaks-over-threshold (POT) models; these are
models for all large observations which exceed a high threshold. The POT models are
generally considered to be the most useful for practical applications, due to a number
of reasons. First, by taking all exceedances over a suitably high threshold into
account, they use the data more efficiently. Second, they are easily extended to
situations where one wants to study how the extreme levels of a variable Y depend on
some other variable X – for instance, Y may be the level of tropospheric ozone on a
particular day and X a vector of meteorological variables for that day. This kind of
problem is almost impossible to handle through the annual maximum method. The
POT methods are fitted with the generalized Pareto distribution (GPD).
5.2 GENERALIZED EXTREME VALUE DISTRIBUTION
The role of the generalized extreme value (GEV) distribution in the theory of
extremes is analogous to that of the normal distribution in the central limit theory for
sums of random variables. Just as the normal distribution proves to be the important
limiting distribution for sample sums or averages, as is made explicit in the central
limit theorem, so the GEV distribution proves to be important in the study of the
limiting behavior of sample extrema. The three-parameter distribution function of the standard
GEV is given by:
H_{ξ,μ,σ}(x) = exp{ −(1 + ξ(x − μ)/σ)^{−1/ξ} },   if ξ ≠ 0,
             = exp{ −e^{−(x − μ)/σ} },            if ξ = 0,        (5.2.1)

where 1 + ξ(x − μ)/σ > 0. μ and σ > 0 are known as the location
and scale parameters, respectively. ξ is the all-important shape parameter which
determines the nature of the tail of the distribution. The extreme value distribution in
equation (5.2.1) is generalized in the sense that the parametric form subsumes three
types of distributions which are known by other names according to the value of ξ:
when ξ > 0 we have the Frechet distribution with α = 1/ξ; when ξ < 0 we have the
Weibull distribution with shape α = −1/ξ; when ξ = 0 we have the Gumbel
distribution. The Weibull is a short-tailed distribution with a so-called finite right end
point. The Gumbel and the Frechet have infinite right end points, but the tail of
the Frechet decays much more slowly than that of the Gumbel.
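As a numerical companion, the piecewise form of (5.2.1) and the three tail regimes can be sketched in Python (a minimal illustration; production work would more likely use a statistics library such as scipy.stats.genextreme, which uses the sign convention c = −ξ):

```python
import math

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function H_{xi,mu,sigma}(x) as in (5.2.1)."""
    z = (x - mu) / sigma
    if xi == 0.0:
        # Gumbel case: exp(-e^{-z})
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # outside the support: below the left endpoint (Frechet, xi > 0)
        # or above the finite right endpoint (Weibull, xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# the Weibull case (xi < 0) has a finite right endpoint mu - sigma/xi
print(gev_cdf(3.0, xi=-0.5))   # already 1: the right endpoint is at x = 2
# for small xi the Frechet/Weibull forms approach the Gumbel form
print(abs(gev_cdf(1.0, xi=1e-8) - gev_cdf(1.0, xi=0.0)))
```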
Here are a few basic properties of the GEV distribution. The mean exists if ξ < 1 and
the variance if ξ < 1/2; more generally, the k-th moment exists if ξ < 1/k. The mean
and variance are given by

m_1 = E(X) = μ + (σ/ξ){Γ(1 − ξ) − 1},   (ξ < 1)

m_2 = E{(X − m_1)²} = (σ²/ξ²){Γ(1 − 2ξ) − Γ²(1 − ξ)},   (ξ < 1/2)
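These moment formulas are easy to check numerically; the sketch below (illustrative, pure Python) also confirms that in the limit ξ → 0 the mean tends to the familiar Gumbel value μ + σγ, where γ ≈ 0.5772 is Euler's constant:

```python
import math

def gev_mean(mu, sigma, xi):
    """m1 = mu + (sigma/xi) * (Gamma(1 - xi) - 1), valid for xi < 1."""
    return mu + (sigma / xi) * (math.gamma(1.0 - xi) - 1.0)

def gev_var(sigma, xi):
    """m2 = (sigma^2/xi^2) * (Gamma(1 - 2*xi) - Gamma(1 - xi)^2), xi < 1/2."""
    g1, g2 = math.gamma(1.0 - xi), math.gamma(1.0 - 2.0 * xi)
    return (sigma ** 2 / xi ** 2) * (g2 - g1 ** 2)

print(gev_mean(0.0, 1.0, 0.25))   # finite since xi = 0.25 < 1
print(gev_var(1.0, 0.25))         # finite since xi = 0.25 < 1/2
# Gumbel limit: as xi -> 0 the mean approaches Euler's constant (mu=0, sigma=1)
print(gev_mean(0.0, 1.0, 1e-6))
```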
One objection to the extreme value distributions is that real processes rarely produce
observations that are independent and identically distributed. However, there is an
extensive theory of extremes for non-IID processes. A second objection is
that sometimes it is argued that alternative distributional families fit the data better –
for example, in the 1970s there was a lengthy debate among hydrologists over the use
of extreme value distributions as compared with those of log Pearson type III. There
is no universal solution to this kind of debate.
5.2.1 The Fisher-Tippett Theorem
The Fisher-Tippett theorem is the fundamental result in EVT and can be considered to
have the same status in EVT as the central limit theorem has in the study of sums. The
theorem describes the limiting behavior of appropriately normalized sample maxima.
Suppose X_1, X_2, … are a sequence of independent random variables with a common
distribution function F; in other words, F(x) = P(X_j ≤ x) for each j and x. The
distribution function of the maximum M_n = max(X_1, …, X_n) of the first n observations
is given by the n-th power of F:

P{M_n ≤ x} = P{X_1 ≤ x, X_2 ≤ x, …, X_n ≤ x}
           = P{X_1 ≤ x} P{X_2 ≤ x} ⋯ P{X_n ≤ x}
           = F^n(x)
Suppose further that we can find sequences of real numbers a_n > 0 and b_n such
that (M_n − b_n)/a_n, the sequence of normalized maxima, converges in distribution. That
is,

P{(M_n − b_n)/a_n ≤ x} = P{M_n ≤ a_n x + b_n} = F^n(a_n x + b_n) → H(x)  as n → ∞   (5.2.2)

for some non-degenerate distribution function H(x). If this condition holds we say
that F is in the maximum domain of attraction of H and we write F ∈ MDA(H). It
was shown by Fisher and Tippett (1928) that

F ∈ MDA(H)  ⇒  H is of the type H_ξ for some ξ.

Thus, if we know that suitably normalized maxima converge in distribution, then the
limit distribution must be an extreme value distribution for some value of the
parameters ξ, μ, and σ.
The class of distributions F for which the condition (5.2.2) holds is large. Gnedenko
(1943) showed that for ξ > 0, F ∈ MDA(H_ξ) if and only if 1 − F(x) = x^{−1/ξ} L(x), for
some slowly varying function L(x). This result essentially says that if the tail of the
distribution function F(x) decays like a power function, then the distribution is in the
domain of attraction of the Frechet. The class of distributions where the tail decays
like a power function is quite large and includes the Pareto, Burr, loggamma, Cauchy
and t-distributions as well as various mixture models. These are all so-called heavy-tailed
distributions. Distributions in the maximum domain of attraction of the Gumbel
MDA(H_0) include the normal, exponential, gamma and lognormal distributions. The
lognormal distribution has a moderately heavy tail and has historically been a popular
model for loss severity distributions; however it is not as heavy-tailed as the
distributions in MDA(H_ξ) for ξ > 0. Distributions in the domain of attraction of the
Weibull (H_ξ for ξ < 0) are short-tailed distributions such as the uniform and beta
distributions.
The Fisher-Tippett theorem suggests the fitting of the GEV to data on sample
maxima, when such data can be collected. There is much literature on this topic
particularly in hydrology where the so-called annual maxima method has a long
history.
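The convergence described by the Fisher-Tippett theorem can be seen in a small simulation (an illustrative sketch): for Exp(1) variables, block maxima centered by b_n = log n with a_n = 1 are approximately Gumbel (ξ = 0) distributed.

```python
import math
import random

random.seed(42)
n, n_blocks = 1000, 2000

# normalized block maxima (M_n - log n) for blocks of n iid Exp(1) draws
norm_maxima = []
for _ in range(n_blocks):
    block_max = max(-math.log(1.0 - random.random()) for _ in range(n))
    norm_maxima.append(block_max - math.log(n))

# compare the empirical distribution with the Gumbel limit H_0(x) = exp(-e^{-x})
x = 1.0
empirical = sum(1 for m in norm_maxima if m <= x) / n_blocks
gumbel = math.exp(-math.exp(-x))
print(empirical, gumbel)   # the two should be close
```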
5.2.2 Fitting the GEV Distribution
The GEV distribution can be fitted using various methods – probability weighted
moments, maximum likelihood, Bayesian methods. The latter two require numerical
computation, and one disadvantage is that the computations are not easily performed
with standard statistical packages. However, there are some programs available which
are specifically tailored to extreme values. In implementing maximum likelihood, it is
assumed that the block size is quite large so that, regardless of whether the underlying
data are dependent or not, the block maxima observations can be taken to be
independent.
For the GEV, the density p(x; μ, σ, ξ) is obtained by differentiating (5.2.1) with respect to x.
The likelihood based on observations Y_1, Y_2, …, Y_N is

∏_{i=1}^{N} p(Y_i; μ, σ, ξ)

and so the log likelihood is given by

l(μ, σ, ξ) = −N log σ − (1 + 1/ξ) ∑_i log(1 + ξ(Y_i − μ)/σ) − ∑_i (1 + ξ(Y_i − μ)/σ)^{−1/ξ}   (5.2.3)
which must be maximized subject to the parameter constraints that σ > 0 and
1 + ξ(Y_i − μ)/σ > 0 for all i. While this represents an irregular likelihood problem, due to
the dependence of the parameter space on the values of the data, the consistency and
asymptotic efficiency of the resulting MLEs can be established for the case when
ξ > −1/2.
In determining the number and size of blocks (n and m respectively) a trade-off
necessarily takes place: roughly speaking, a large value of m leads to a more accurate
approximation of the block maxima distribution by a GEV distribution and a low bias
in the parameter estimates; a large value of n gives more block maxima data for the
ML estimation and leads to a low variance in the parameter estimates. Note also that,
in the case of dependent data, somewhat larger block sizes than are used in the IID
case may be advisable; dependence generally has the effect that convergence to the
GEV distribution is slower, since the effective sample size is mθ (θ being the so-called
extremal index), which is smaller than m.
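The log likelihood (5.2.3) translates directly into code. The sketch below (pure Python, for illustration) implements the negative log likelihood and checks it against the Gumbel limit ξ → 0, for which −l reduces to N log σ + Σ(z_i + e^{−z_i}) with z_i = (Y_i − μ)/σ; in practice one would hand this function to a numerical optimizer subject to the stated constraints.

```python
import math

def gev_neg_loglik(params, data):
    """Negative of the GEV log likelihood (5.2.3); returns inf off the
    feasible region sigma > 0, 1 + xi*(y - mu)/sigma > 0."""
    mu, sigma, xi = params
    if sigma <= 0.0:
        return math.inf
    nll = len(data) * math.log(sigma)
    for y in data:
        z = (y - mu) / sigma
        t = xi * z
        if 1.0 + t <= 0.0:
            return math.inf
        log1pt = math.log1p(t)
        # (1 + 1/xi) * log(1 + xi*z) + (1 + xi*z)^{-1/xi}
        nll += (1.0 + 1.0 / xi) * log1pt + math.exp(-log1pt / xi)
    return nll

data = [0.3, 0.9, 1.4, 2.2, 3.1, 0.1, 1.8]
# Gumbel-limit check: for xi ~ 0 the NLL approaches N*log(sigma) + sum(z + exp(-z))
mu, sigma = 0.5, 1.2
gumbel_nll = len(data) * math.log(sigma) + sum(
    (y - mu) / sigma + math.exp(-(y - mu) / sigma) for y in data)
print(gev_neg_loglik((mu, sigma, 1e-7), data), gumbel_nll)
```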
The following exploratory analyses can be done while fitting a GEV or GPD model to
a set of given data:
- Time series plot
- Autocorrelation plot
- Histogram on the log scale
- Quantile-quantile plot against the exponential distribution
- Gumbel plot (Mondal, 2006d, p. 338)
- Sample mean excess plot (also called the mean residual life plot in survival data)
- Plot of the empirical distribution function
- Quantile-quantile plot of residuals of a fitted model
Example 5.2.1 The highest water level during the period from 1 March to 15
May of different years at the Sunamganj station of the Surma river in Bangladesh is
given below. Fit a GEV model to the data.

Year  WL (m)    Year  WL (m)    Year  WL (m)
1975  8.354     1985  7.044     1995  6.082
1976  7.615     1986  6.874     1996  6.045
1977  7.442     1987  6.750     1997  5.956
1978  7.428     1988  6.650     1998  5.188
1979  7.308     1989  6.614     1999  5.090
1980  7.284     1990  6.590     2000  4.826
1981  7.250     1991  6.530     2001  4.040
1982  7.075     1992  6.214     2002  2.882
1983  7.056     1993  6.144     2003  2.806
1984  7.056     1994  6.134     2004  8.080
A Q-Q plot against the exponential distribution is shown below. The plot shows a
convex departure from a straight line. This is a sign of thin-tailed behavior.
Figure 5.2.1 Q-Q plot of the water level data against the exponential distribution
Figure 5.2.2 shows the sample mean excess plot. The slope of the fitted points is
downward. This again is a sign of thin-tailed behavior.
Figure 5.2.2 The sample mean excess plot for the water level data at Sunamganj
A GEV model was fitted to the data. The parameter estimates were μ̂ = 6.094991,
σ̂ = 1.41705 and ξ̂ = −0.5997389. Since the shape parameter is negative, the
distribution of the data is of Weibull type (thin-tailed). The period of March-May is
the dry season in Bangladesh, so the maxima are from low flow periods. The Weibull-type
distribution is appropriate for low flows. The Q-Q plot of the residuals of the
fitted GEV model is shown in Figure 5.2.3. The plot is roughly a straight line of unit
slope passing through the origin, so the model can be considered to be a reasonably
good fit to the data. It is to be noted here that other distributions, such as the Gumbel
and lognormal, were tried for the data set; however, the GEV distribution was found
to fit the data better. The estimated water levels corresponding to different return
periods are given below:

Return period (years)   WL (m)
  2                     6.56
  5                     7.50
 10                     7.85
 20                     8.06
 50                     8.23
100                     8.31
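The return levels above follow from inverting the fitted GEV: the T-year level is the quantile of H at probability 1 − 1/T. A small sketch using the parameter estimates of Example 5.2.1 reproduces the table:

```python
import math

def gev_return_level(T, mu, sigma, xi):
    """Level exceeded on average once every T years under a fitted GEV:
    the (1 - 1/T) quantile of H_{xi,mu,sigma}."""
    y = -math.log(1.0 - 1.0 / T)   # -log of the non-exceedance probability
    if xi == 0.0:                  # Gumbel case
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)

mu, sigma, xi = 6.094991, 1.41705, -0.5997389   # Example 5.2.1 estimates
for T in (2, 5, 10, 20, 50, 100):
    print(T, round(gev_return_level(T, mu, sigma, xi), 2))
```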
Figure 5.2.3 The Q-Q plot of the fitted GEV residuals
5.3 GENERALIZED PARETO DISTRIBUTION
The GPD is a two-parameter distribution with distribution function

G_{ξ,β}(x) = 1 − (1 + ξx/β)^{−1/ξ},   ξ ≠ 0,
           = 1 − exp(−x/β),           ξ = 0,

where β > 0, and x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −β/ξ when ξ < 0. The parameters
ξ and β are referred to respectively as the shape and scale parameters.
The GPD is generalized in the sense that it subsumes certain other distributions under
a common parametric form. If ξ > 0 then G_{ξ,β} is a reparametrized version of the
ordinary Pareto distribution (with α = 1/ξ and κ = β/ξ), which has a long history in
actuarial mathematics as a model for large losses; ξ = 0 corresponds to the
exponential distribution, i.e. a distribution with a medium-sized tail; and ξ < 0 gives a
short-tailed Pareto type II distribution. The mean of the GPD is defined provided
ξ < 1 and is

E(X) = β / (1 − ξ)
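A direct transcription of the GPD distribution function and its mean (an illustrative sketch in Python):

```python
import math

def gpd_cdf(x, xi, beta):
    """G_{xi,beta}(x): 1 - (1 + xi*x/beta)^(-1/xi) for xi != 0,
    and 1 - exp(-x/beta) for xi = 0."""
    if x < 0.0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-x / beta)
    if xi < 0.0 and x >= -beta / xi:
        return 1.0   # beyond the finite right endpoint -beta/xi
    return 1.0 - (1.0 + xi * x / beta) ** (-1.0 / xi)

def gpd_mean(xi, beta):
    """E(X) = beta / (1 - xi), defined only for xi < 1."""
    return beta / (1.0 - xi)

print(gpd_cdf(2.0, 0.5, 1.0))   # 1 - (1 + 1)^(-2) = 0.75
print(gpd_mean(0.5, 1.0))       # 2.0
```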
The first case is the most relevant for risk management purposes since the GPD is
heavy-tailed when ξ > 0. Whereas the normal distribution has moments of all orders,
a heavy-tailed distribution does not possess a complete set of moments. In the case of
the GPD with ξ > 0 we find that E[X^k] is infinite for k ≥ 1/ξ. When ξ = 1/2, the
GPD is an infinite variance (second moment) distribution; when ξ = 1/4, the GPD
has an infinite fourth moment.
The role of the GPD in EVT is as a natural model for the excess distribution over a
high threshold. Certain types of large claims data in insurance typically suggest an
infinite second moment; similarly econometricians might claim that certain market
returns indicate a distribution with infinite fourth moment. The normal distribution
cannot model these phenomena but the GPD is used to capture precisely this kind of
behavior.
5.3.1 Modeling Excess Distributions
Let X be a random variable with distribution function F. The distribution of
excesses over a threshold u has distribution function

F_u(y) = P{X − u ≤ y | X > u}

for 0 ≤ y < x_0 − u, where x_0 ≤ ∞ is the right endpoint of F. The excess distribution
function F_u represents the probability that a loss exceeds the threshold u by at most
an amount y, given the information that it exceeds the threshold. In survival analysis
the excess distribution function is more commonly known as the residual life
distribution function: it expresses the probability that, say, an electrical component
which has functioned for u units of time fails in the time period [u, u + y]. It is very
useful to observe that F_u can be written in terms of the underlying F as

F_u(y) = (F(y + u) − F(u)) / (1 − F(u))
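The identity F_u(y) = (F(y + u) − F(u))/(1 − F(u)) is easy to check numerically. For the exponential distribution it reproduces the well-known lack-of-memory property, F_u = F for every threshold u (an illustrative sketch):

```python
import math

def exp_cdf(x, lam=1.0):
    """Distribution function of Exp(lam)."""
    return 1.0 - math.exp(-lam * x) if x >= 0.0 else 0.0

def excess_cdf(F, u, y):
    """F_u(y) = (F(y + u) - F(u)) / (1 - F(u))."""
    return (F(y + u) - F(u)) / (1.0 - F(u))

# memorylessness: the excess distribution of Exp(lam) over any u is Exp(lam)
for u in (0.5, 1.0, 3.0):
    for y in (0.1, 1.0, 2.5):
        assert abs(excess_cdf(exp_cdf, u, y) - exp_cdf(y)) < 1e-9
print("excess distribution of the exponential is again exponential")
```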
The mean excess function of a random variable X with finite mean is given by

e(u) = E(X − u | X > u)

The mean excess function expresses the mean of F_u as a function of u. In
survival analysis, the mean excess function is known as the mean residual life
function and gives the expected residual lifetime for components with different ages.
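The sample version of e(u), the average excess over u among observations exceeding u, is the quantity plotted in a sample mean excess plot such as Figure 5.2.2. A sketch (illustrative; for Exp(1) data the true mean excess function is constant, e(u) = 1):

```python
import math
import random

def sample_mean_excess(data, u):
    """e_n(u): average of (x - u) over observations x > u."""
    excesses = [x - u for x in data if x > u]
    return sum(excesses) / len(excesses) if excesses else float("nan")

random.seed(7)
data = [-math.log(1.0 - random.random()) for _ in range(50_000)]  # Exp(1) draws
for u in (0.5, 1.0, 2.0):
    print(u, round(sample_mean_excess(data, u), 3))   # all close to 1
```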
Mostly we would assume our underlying F is a distribution with an infinite right
endpoint, i.e. it allows the possibility of arbitrarily large losses, even if it attributes
negligible probability to unreasonably large outcomes, e.g. the normal or t
distributions. But it is also conceivable, in certain applications, that F could have a
finite right endpoint. An example is the beta distribution on the interval [0,1], which
attributes zero probability to outcomes larger than 1 and which might be used, for
example, as the distribution of credit losses expressed as a proportion of exposure.
5.3.2 The Pickands-Balkema-de Haan Theorem
The Pickands-Balkema-de Haan limit theorem (Balkema and de Haan, 1974;
Pickands, 1975) is a key result in EVT and explains the importance of the GPD. We
can find a (positive measurable function) )(uβ such that
0)()(suplim )(,0 00
=−−<≤→
yGyF uuuxyxu βξ , uu ξββ +=)(
if and only if . )( ξHMDAF ∈
The theorem shows that under MDA conditions the generalized Pareto distribution is
the limiting distribution for the distribution of excesses as the threshold tends to the
right endpoint. All the common continuous distributions of statistics and actuarial
science (normal, lognormal, , t, F, gamma, exponential, uniform, beta, etc.) are in
MDA( ) for some
2χ
ξH ξ , so the above theorem proves to be a very widely applicable
result that essentially says that the GPD is the natural model for the unknown excess
distribution above sufficiently high thresholds.
5.3.3 Fitting a GPD Model
Given loss data X_1, X_2, …, X_n from F, a random number N_u will exceed our
threshold u; it will be convenient to re-label these data X̃_1, X̃_2, …, X̃_{N_u}. For each of
these exceedances we calculate the amount Y_j = X̃_j − u of the excess loss. We wish
to estimate the parameters ξ and β of a GPD model by fitting this distribution to the
N_u excess losses. There are various ways of fitting the GPD including ML and
PWM. The former method is more commonly used and easy to implement if the
excess data can be assumed to be realizations of independent random variables, since
the joint probability density of the observations will then be a product of marginal
densities. This is the most general fitting method in statistics and it also allows us to
give estimates of statistical error (standard errors) for the parameter estimates. Writing
g_{ξ,β} for the density of the GPD, the log-likelihood may be easily calculated to be

ln L(ξ, β; Y_1, …, Y_{N_u}) = ∑_{j=1}^{N_u} ln g_{ξ,β}(Y_j) = −N_u ln β − (1 + 1/ξ) ∑_{j=1}^{N_u} ln(1 + ξY_j/β)

which must be maximized subject to the parameter constraints that β > 0 and
1 + ξY_j/β > 0 for all j. Solving the maximization problem yields estimates ξ̂ and β̂ and
a GPD model G_{ξ̂,β̂} for the excess distribution F_u.
Choice of the threshold is basically a compromise between choosing a sufficiently
high threshold so that the asymptotic theorem can be considered to be essentially
exact and choosing a sufficiently low threshold so that we have sufficient material for
estimation of the parameters.
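The GPD log-likelihood above can be maximized with any numerical optimizer; the sketch below (for illustration only) uses a crude grid search on simulated excesses, generated by inverting G_{ξ,β}:

```python
import math
import random

def gpd_neg_loglik(xi, beta, excesses):
    """-ln L(xi, beta) = N_u ln beta + (1 + 1/xi) * sum ln(1 + xi*y/beta)."""
    if beta <= 0.0:
        return math.inf
    nll = len(excesses) * math.log(beta)
    for y in excesses:
        t = xi * y / beta
        if 1.0 + t <= 0.0:
            return math.inf
        nll += (1.0 + 1.0 / xi) * math.log1p(t) if xi != 0.0 else y / beta
    return nll

# simulate excesses from GPD(xi=0.5, beta=1) via the inverse of G_{xi,beta}
random.seed(11)
xi_true, beta_true = 0.5, 1.0
excesses = [(beta_true / xi_true) * ((1.0 - random.random()) ** -xi_true - 1.0)
            for _ in range(2000)]

# crude grid-search MLE (a real implementation would use a proper optimizer)
grid = [(x / 100.0, b / 100.0)
        for x in range(5, 101, 5) for b in range(50, 201, 5)]
xi_hat, beta_hat = min(grid, key=lambda p: gpd_neg_loglik(p[0], p[1], excesses))
print(xi_hat, beta_hat)   # near the true values 0.5 and 1.0
```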
5.4 MULTIVARIATE EXTREMES
The multivariate extreme value theory (MEVT) can be used to model the tails of
multivariate distributions. The dependence structure of extreme events can be studied
with the MEVT. Consider the random vector X = (X_1, …, X_d)^t which represents
losses of d different kinds measured at the same point in time. Assume that these
losses have joint distribution F(x_1, …, x_d) = P{X_1 ≤ x_1, …, X_d ≤ x_d} and that the
individual losses have continuous marginal distributions F_i(x) = P{X_i ≤ x}. It has
been shown by Sklar (1959) that every joint distribution can be written as

F(x_1, …, x_d) = C[F_1(x_1), …, F_d(x_d)],
for a unique function C that is known as the copula of F. A copula is the joint
distribution of uniformly distributed random variables: if U_1, …, U_n are U(0,1) then
C(u_1, …, u_n) = P{U_1 ≤ u_1, …, U_n ≤ u_n} is a copula. C is a function from [0,1] × . . .
× [0,1] into [0,1]. A copula C is a function of n variables on the unit n-cube [0,1]^n
with the following properties:
1. the range of C is the unit interval [0,1];
2. C(u) is zero for all u in [0,1]^n for which at least one coordinate equals zero [for the
   bivariate case, for every u, v in [0,1], C(u,0) = 0 = C(0,v)];
3. C(u) = u_k if all coordinates of u are 1 except the k-th one [for the bivariate
   case, for every u, v in [0,1], C(u,1) = u and C(1,v) = v];
4. C is n-increasing in the sense that for every a ≤ b in [0,1]^n the volume assigned
   by C to the n-box [a,b] = [a_1,b_1] × . . . × [a_n,b_n] is non-negative [for the
   bivariate case, for every u_1, u_2, v_1, v_2 in [0,1] such that u_1 ≤ u_2 and v_1 ≤ v_2,
   C(u_2,v_2) − C(u_2,v_1) − C(u_1,v_2) + C(u_1,v_1) ≥ 0].
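The axioms are easy to verify for the simplest example, the independence (product) copula C(u, v) = uv (an illustrative sketch):

```python
def product_copula(u, v):
    """Independence copula: the joint cdf of two independent U(0,1) variables."""
    return u * v

def c_volume(C, u1, u2, v1, v2):
    """Volume assigned by C to the box [u1,u2] x [v1,v2] (2-increasing check)."""
    return C(u2, v2) - C(u2, v1) - C(u1, v2) + C(u1, v1)

# boundary condition: C(u,0) = C(0,v) = 0
assert product_copula(0.3, 0.0) == 0.0 and product_copula(0.0, 0.8) == 0.0
# uniform marginals: C(u,1) = u and C(1,v) = v
assert product_copula(0.3, 1.0) == 0.3 and product_copula(1.0, 0.8) == 0.8
# 2-increasing: every box receives non-negative mass
assert c_volume(product_copula, 0.2, 0.7, 0.1, 0.9) >= 0.0
print("product copula satisfies the copula axioms on these checks")
```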
A copula may be thought of in two equivalent ways: as a function (with some
technical restrictions) that maps values in the unit hypercube to values in the unit
interval, or as a multivariate distribution function with standard uniform marginal
distributions. The copula C does not change under (strictly) increasing
transformations of the losses X_1, …, X_d and it makes sense to interpret C as the
dependence structure of X or F, as the following simple illustration in d = 2
dimensions shows.

We take the marginal distributions to be standard univariate normal distributions
F_1 = F_2 = Φ. We can then choose any copula C (i.e. any bivariate distribution with
uniform marginals) and apply it to these marginals to obtain bivariate distributions
with normal marginals. For one particular choice of C, which we call the Gaussian
copula and denote C_ρ^Ga, we obtain the standard bivariate normal distribution with
correlation ρ. The Gaussian copula does not have a simple closed form and must be
written as a double integral:
C_ρ^Ga(u_1, u_2) = ∫_{−∞}^{Φ^{−1}(u_1)} ∫_{−∞}^{Φ^{−1}(u_2)} [1 / (2π(1 − ρ²)^{1/2})] exp{ −[s² − 2ρst + t²] / (2(1 − ρ²)) } ds dt
Another interesting copula is the Gumbel copula which does have a simple closed
form:
C_β^Gu(v_1, v_2) = exp{ −[(−log v_1)^{1/β} + (−log v_2)^{1/β}]^β },   0 < β ≤ 1
Figure 5.4.? shows the bivariate distributions which arise when we apply the two
copulas C_{0.7}^Ga and C_{0.5}^Gu to standard normal marginals. The left-hand picture is the
standard bivariate normal with correlation 70%; the right-hand picture is a bivariate
distribution with approximately equal correlation but the tendency to generate extreme
values of X1 and X2 simultaneously. It is, in this sense, a more dangerous distribution
for risk managers. On the basis of correlation, these distributions cannot be
differentiated but they obviously have entirely different dependence structures. The
bivariate normal has rather weak tail dependence; the normal-Gumbel distribution has
pronounced tail dependence.
One way of understanding MEVT is as the study of copulas which arise in the
limiting multivariate distribution of component-wise block maxima. Suppose we have
a family of random vectors X_i = (X_{i1}, …, X_{id})^t representing d-dimensional losses at
points in time i = 1, 2, …. A simple interpretation might be that they
represent daily (negative) returns for d instruments. As for the univariate discussion of
block maxima, we assume that losses at different points in time are independent. This
assumption simplifies the statement of the result, but can be relaxed to allow serial
dependence of losses at the cost of some additional technical conditions.

We define the vector of component-wise block maxima to be M_n = (M_{n1}, …, M_{nd})^t,
where M_{nj} = max(X_{1j}, …, X_{nj}) is the block maximum of the j-th component for a
block of n observations. Now consider the vector of normalized block maxima
given by ((M_{n1} − b_{1n})/a_{1n}, …, (M_{nd} − b_{dn})/a_{dn})^t, where a_{jn} > 0 and b_{jn} are normalizing
sequences. If this vector converges in distribution to a non-degenerate limiting
distribution then this limit must have the form
C( H_{ξ_1}((x_1 − μ_1)/σ_1), …, H_{ξ_d}((x_d − μ_d)/σ_d) ),

for some values of the parameters ξ_j, μ_j, and σ_j and some copula C. It must have
this form because of univariate EVT: each marginal distribution of the limiting
multivariate distribution must be a GEV, as we noted earlier in this chapter.
MEVT characterizes the copulas C which may arise in this limit, the so-called
MEV copulas. It turns out that the limiting copulas must satisfy

C(u_1^t, …, u_d^t) = C^t(u_1, …, u_d)

for t > 0. There is no single parametric family which
contains all the MEV copulas, but certain parametric copulas are consistent with the
above condition and might therefore be regarded as natural models for the
dependence structure of extreme observations.
In two dimensions, the Gumbel copula is an example of an MEV copula; it is
moreover a versatile copula. If the parameter β is 1 then C_1^Gu(v_1, v_2) = v_1 v_2 and this
copula models independence of the components of a random vector (X_1, X_2)^t. If
β ∈ (0,1) then the Gumbel copula models dependence between X_1 and X_2. As β
decreases the dependence becomes stronger, until in the limit β → 0 we have perfect
dependence of X_1 and X_2; this means X_2 = T(X_1) for some strictly increasing function
T. For β < 1 the Gumbel copula shows tail dependence, the tendency of extreme
values to occur together, as observed in Figure 5.4.?.
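Both the closed form of the Gumbel copula and the MEV condition C(u_1^t, u_2^t) = C^t(u_1, u_2) can be checked directly (an illustrative sketch); the independence case β = 1 and the max-stability property hold exactly:

```python
import math

def gumbel_copula(v1, v2, beta):
    """C^Gu_beta(v1, v2) = exp(-[(-log v1)^(1/beta) + (-log v2)^(1/beta)]^beta)."""
    if v1 == 0.0 or v2 == 0.0:
        return 0.0
    a = (-math.log(v1)) ** (1.0 / beta)
    b = (-math.log(v2)) ** (1.0 / beta)
    return math.exp(-((a + b) ** beta))

# beta = 1 gives the independence copula
assert abs(gumbel_copula(0.3, 0.7, 1.0) - 0.21) < 1e-12

# max-stability (the MEV condition) with d = 2: C(u1^t, u2^t) = C(u1, u2)^t
u1, u2, beta, t = 0.4, 0.8, 0.5, 3.7
lhs = gumbel_copula(u1 ** t, u2 ** t, beta)
rhs = gumbel_copula(u1, u2, beta) ** t
print(lhs, rhs)   # equal up to floating point
```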
5.4.1 Fitting a Tail Model Using a Copula
The Gumbel copula can be used to build tail models in two dimensions as follows.
Suppose two risk factors (X_1, X_2)^t have an unknown joint distribution F and
marginals F_1 and F_2 so that, for some copula C,

F(x_1, x_2) = C[F_1(x_1), F_2(x_2)].
Assume that we have n pairs of data points from this distribution. Using the univariate
peaks-over-threshold method, we model the tails of the two marginal distributions by
picking high thresholds u_1 and u_2 and using tail estimators to obtain

F̂_i(x) = 1 − (N_{u_i}/n)(1 + ξ̂_i(x − u_i)/β̂_i)^{−1/ξ̂_i},   x > u_i,  i = 1, 2.

We model the dependence structure of observations exceeding these thresholds using
the Gumbel copula C_β̂^Gu for some estimated value β̂ of the dependence parameter β.
We put the tail models and the dependence structure together to obtain a model for the joint
tail of F:

F̂(x_1, x_2) = C_β̂^Gu(F̂_1(x_1), F̂_2(x_2)),   x_1 > u_1, x_2 > u_2.
The estimate of the dependence parameter β can be determined by maximum
likelihood, either in a second stage after the parameters of the tail estimators have
been estimated or in a single stage estimation procedure where all parameters are
estimated together.
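Putting the pieces together: with estimated marginal tail parameters and a Gumbel copula parameter in hand, the joint tail model can be evaluated at any point above both thresholds. The sketch below uses hypothetical parameter values (all numbers are made up for illustration) and checks the bounds F̂_1 F̂_2 ≤ F̂ ≤ min(F̂_1, F̂_2), which any Gumbel-copula model with 0 < β̂ ≤ 1 must satisfy:

```python
import math

def tail_marginal(x, u, n_exceed, n, xi, beta):
    """F_hat_i(x) = 1 - (N_u/n) * (1 + xi*(x - u)/beta)^(-1/xi), for x > u."""
    return 1.0 - (n_exceed / n) * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def gumbel_copula(v1, v2, beta):
    a = (-math.log(v1)) ** (1.0 / beta)
    b = (-math.log(v2)) ** (1.0 / beta)
    return math.exp(-((a + b) ** beta))

# hypothetical fitted values (illustration only, not from any real data set)
u1, u2 = 10.0, 5.0
n = 1000
fit1 = dict(u=u1, n_exceed=80, n=n, xi=0.3, beta=2.0)
fit2 = dict(u=u2, n_exceed=120, n=n, xi=0.2, beta=1.5)
beta_copula = 0.6

x1, x2 = 15.0, 9.0                      # a point above both thresholds
f1 = tail_marginal(x1, **fit1)
f2 = tail_marginal(x2, **fit2)
joint = gumbel_copula(f1, f2, beta_copula)
print(f1, f2, joint)
```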
5.5 INTRODUCTION TO FREQUENCY ANALYSIS
The magnitude of an extreme event is inversely related to its frequency of occurrence,
very severe events occurring less frequently than more moderate events. The objective
of frequency analysis is to relate the magnitude of extreme events to their frequency
of occurrence through the use of probability distributions. Frequency analysis is
defined as the investigation of population sample data to estimate recurrence or
probabilities of magnitudes. It is one of the earliest and most frequent uses of statistics
in hydrology and natural sciences. Early applications of frequency analysis were
largely in the area of flood flow estimation. Today nearly every phase of hydrology
and natural sciences is subjected to frequency analyses.
Two methods of frequency analysis are described: one is a straightforward plotting
technique to obtain the cumulative distribution and the other uses the frequency
factors. The cumulative distribution function provides a rapid means of determining
the probability of an event equal to or less than some specified quantity. The inverse
is used to obtain the recurrence intervals. The analytical frequency analysis is a
simplified technique based on frequency factors, which depend on the distributional
assumption that is made and on the mean, variance and, for some distributions, the
coefficient of skew of the data.
Over the years and continuing today there have been volumes of material written on
the best probability distribution to use in various situations. One cannot, in most
instances, analytically determine which probability distribution should be used.
Certain limit theorems such as the Central Limit Theorem and extreme value
theorems might provide guidance. One should also evaluate the experience that has
been accumulated with the various distributions and how well they describe the
phenomena of interest. Certain properties of the distributions can be used in screening
distributions for possible application in a particular situation. For example, the range
of the distribution, the general shape of the distribution and the skewness of the
distribution many times indicate that a particular distribution may or may not be
applicable in a given situation. When two or more distributions appear to describe a
given set of data equally well, the distribution that has been traditionally used should
be selected unless there are contrary overriding reasons for selecting another
distribution. However, if a traditionally used distribution is inferior, its use should not
be continued just because "that's the way it's always been done". As a general rule,
caution is advised when working with records shorter than 10 years and
when estimating frequencies of events with return periods greater than twice the record length.
5.6 PROBABILITY PLOTTING
A probability plot is a plot of a magnitude versus a probability. Determining the
probability to assign a data point is commonly referred to as determining the plotting
position. If one is dealing with a population, determining the plotting position is
merely a matter of determining the fraction of the data values less (greater) than or
equal to the value in question. Thus the smallest (largest) population value would plot at 0
and the largest (smallest) population value would plot at 1.00. Assigning plotting
positions of 0 and 1 should be avoided for sample data unless one has additional
information on the population limits.
Plotting position may be expressed as a probability from 0 to 1 or a percent from 0 to
100. Which method is being used should be clear from the context. In some
discussions of probability plotting, especially in the hydrologic literature, the probability
scale is used to denote prob(X ≥ x) or 1 − P_x(x). One can always transform the
probability scale 1 − P_x(x) to P_x(x) or even to T_x(x) if desired.
Probability plotting of hydrologic data requires that individual observations or data
points be independent of each other and that the sample data be representative of the
population (unbiased). There are four common types of sample data: Complete
duration series, annual series, partial duration series, and extreme value series. The
complete duration series consists of all available data. An example would be all the
available daily flow data for a stream. This particular data set would most likely not
have independent observations. The annual series consists of one value per year such
as the maximum peak flow of each year. The partial duration series consists of all
values above (below) a certain base. All values above a threshold would represent a
partial duration series. This series may have more or less values in it than the annual
series. For example, there could have some years that would not have contributed any
data to a partial duration series with a certain base for the data. Frequently the annual
series and the partial duration series are combined so that the largest (smallest) annual
value plus all independent values above (below) some base are used. The extreme
value series consists of the largest (smallest) observation in a given time interval. The
annual series is a special case of the extreme value series with the time interval being
one year.
Regardless of the type of sample data used, the plotting position can be determined in
the same manner. Gumbel (1958) states the following criteria for plotting position
relationships:
1. The plotting position must be such that all observations can be plotted.
2. The plotting position should lie between the observed frequencies of
(m − 1)/n and m/n, where m is the rank of the observation beginning with
m = 1 for the largest (smallest) value and n is the number of years of record
(if applicable) or the number of observations.
3. The return period of a value equal to or larger than the largest observation and
the return period of a value equal to or smaller than the smallest observation
should converge toward n.
4. The observations should be equally spaced on the frequency scale.
5. The plotting position should have an intuitive meaning, be analytically
simple, and be easy to use.
Several plotting position relationships are presented in Table 5.6.1. Benson (1962a),
in a comparative study of several plotting position relationships, found on the basis of
theoretical sampling from extreme value and normal distributions that the Weibull
relationship provided estimates that were consistent with experience. The Weibull
plotting position formula meets all 5 of the above criteria. (1) All of the observations
can be plotted, since the plotting positions range from 1/(n + 1), which is greater than
zero, to n/(n + 1), which is less than 1. Probability paper for many distributions does
not contain the points zero and one. (2) The relationship m/(n + 1) lies between
(m − 1)/n and m/n for all values of m and n. (3) The return period of the largest
value is (n + 1), which approaches n as n gets large. (4) The difference between the
plotting positions of the (m + 1)st and the mth values is 1/(n + 1) for all values of m
and n. (5) The fact that condition 3 is met plus the simplicity of the Weibull
relationship fulfills condition 5.
Cunnane (1978) studied the various available plotting position methods using criteria
of unbiasedness and minimum variance. An unbiased plotting method is one that, if
used for plotting a large number of equally sized samples, will result in the average of
the plotting points for each value of m falling on the theoretical distribution line. A
minimum variance plotting method is one that minimizes the variance of the plotting
positions about the theoretical distribution line. Cunnane concluded that the Weibull
plotting formula is biased and plots the largest values of a sample at too small a return
period. For normally distributed data, he found that the Blom (1958) plotting position
(c = 3/8) is closest to being unbiased, while for data distributed according to the
Extreme Value Type I distribution, the Gringorten (1963) formula (c = 0.44) is the
best. For the log-Pearson Type III distribution, the optimal value of c depends on the
value of the coefficient of skewness, being larger than 3/8 when the data are positively
skewed and smaller than 3/8 when the data are negatively skewed.
Table 5.6.1 Plotting position relationships

Name          Source              Relationship
California    California (1923)   m/n
Hazen         Hazen (1930)        (2m − 1)/2n
Weibull       Weibull (1939)      m/(n + 1)
Blom          Blom (1958)         (m − 3/8)/(n + 0.25)
Gringorten    Gringorten (1963)   (m − a)/(n + 1 − 2a) #
Cunnane       Cunnane (1978)      (m − c)/(n + 1 − 2c) *
Benard        median rank         (m − 0.3)/(n + 0.4)
M.S. Excel                        (m − 1)/(n − 1)

* c is 3/8 for the normal distribution and 0.4 if the applicable distribution is
unknown. # Most plotting position formulas do not account for the sample size or
length of record; one formula that does account for sample size was given by
Gringorten. In general, a = 0.4 is recommended in the Gringorten equation; if the
distribution is approximately normal, a = 3/8 is used, and a = 0.44 is used if the data
follow a Gumbel (EV1) distribution.
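For readers who prefer to compute rather than read the formulas off the table, the relationships in Table 5.6.1 translate directly into code. The sketch below (the function names are ours, not from the text) also checks Gumbel's criterion 2 for the Weibull formula:

```python
# Plotting position formulas from Table 5.6.1.
# m = rank of the observation (m = 1 for the largest or smallest value),
# n = number of observations.

def weibull(m, n):
    return m / (n + 1)

def hazen(m, n):
    return (2 * m - 1) / (2 * n)

def blom(m, n):
    return (m - 3 / 8) / (n + 0.25)

def gringorten(m, n, a=0.44):
    return (m - a) / (n + 1 - 2 * a)

def cunnane(m, n, c=0.4):
    return (m - c) / (n + 1 - 2 * c)

# Gumbel's criterion 2: each position should lie between (m-1)/n and m/n.
n = 10
for m in range(1, n + 1):
    p = weibull(m, n)
    assert (m - 1) / n <= p <= m / n
```

All of these give similar values near the middle ranks and differ mainly for the smallest and largest ranks, which is exactly where tail estimates are made.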
It should be noted that all of the relationships give similar values near the centre of the
distribution, but may vary considerably in the tails. Predicting extreme events depends
on the tails of the distribution, so care must be exercised. The quantity 1 − P_x(x)
represents the probability of an event with a magnitude equal to or greater than the
event in question. When the data are ranked from the largest (m = 1) to the smallest
(m = n), the plotting positions determined from Table 5.6.1 correspond to 1 − P_x(x).
If the data are ranked from the smallest (m = 1) to the largest (m = n), the plotting
position formulas are still valid; however, the plotting position now corresponds to the
probability of an event equal to or smaller than the event in question, which is P_x(x).
Probability paper may contain scales of P_x(x), 1 − P_x(x), T_x(x), or a combination
of these.
The following steps are necessary for a probability plotting of a given set of data:
1. Rank the data from the largest (smallest) to the smallest (largest) value. If two
or more observations have the same value, several procedures can be used for
assigning a plotting position.
2. Calculate the plotting position of each data point following Table 5.6.1.
3. Select the type of probability paper to be used.
4. Plot the observations on the probability paper.
5. Draw a theoretical line. For the normal distribution, the line should pass
through the mean plus one standard deviation at 84.1% and the mean minus
one standard deviation at 15.9%.
If a set of data plots as a straight line on a probability paper, the data can be said to be
distributed as the distribution corresponding to that probability paper. Since it would
be rare for a set of data to plot exactly on a line, a decision must be made as to
whether or not the deviations from the line are random deviations or represent true
deviations indicating the data do not follow the given probability distribution. Visual
comparison of the observed and theoretical frequency histograms and the observed
and theoretical cumulative frequency curves in the form of probability plots can help
determine if a set of data follows a certain distribution. Some statistical tests are also
available.
When probability plots are made and a line drawn through the data, the tendency to
extrapolate the data to high return periods is great. The distance on the probability
paper from a return period of 20 years to a return period of 200 years is not very
much; however, if the data do not truly follow the assumed distribution with
population parameters equal to the sample statistics (i.e., X=µ and for the
normal), the error in this extrapolation can be quite large. This fact has already been
referred to when it was stated that the estimation of probabilities in the tails of
distributions is very sensitive to distributional assumptions. Since one of the usual
purposes of probability is to estimate events with longer return periods, Blench (1959)
and Dalrymple (1960) have criticized the blind use of analytical flood frequency
methods because of this tendency toward extrapolation.
22 s=σ
5.7 ANALYTICAL FREQUENCY ANALYSIS
Calculating the magnitudes of extreme events by the method outlined in example
12.2.1 requires that the probability distribution function be invertible; that is, given a
value for T or [F(x_T) = (T − 1)/T], the corresponding value of x_T can be
determined. Some probability distribution functions are not readily invertible,
including the Normal and Pearson Type III distributions, and an alternative method
of calculating the magnitudes of extreme events is required for these distributions.
The magnitude x_T of a hydrologic event may be represented as the mean µ plus the
departure Δx_T of the variate from the mean:

x_T = µ + Δx_T                                  (5.7.1)

The departure may be taken as equal to the product of the standard deviation σ and a
frequency factor K_T; that is, Δx_T = K_T σ. The departure Δx_T and the frequency
factor K_T are functions of the return period and the type of probability distribution
to be used in the analysis. Equation (5.7.1) may therefore be expressed as

x_T = µ + K_T σ                                 (5.7.2)

which may be approximated by

x_T = x̄ + K_T s                                 (5.7.3)

In the event that the variable analyzed is y = log x, then the same method is applied
to the statistics for the logarithms of the data, using

y_T = ȳ + K_T s_y                               (5.7.4)

and the required value of x_T is found by taking the antilog of y_T.
The frequency factor equation (5.7.2) was proposed by Chow (1951), and it is
applicable to many probability distributions used in hydrologic frequency analysis.
For a given distribution, a K–T relationship can be determined between the
frequency factor and the corresponding return period. This relationship can be
expressed in mathematical terms or by a table.

Frequency analysis begins with the calculation of the statistical parameters required
for a proposed probability distribution by the method of moments from the given data.
For a given return period, the frequency factor can be determined from the K–T
relationship for the proposed distribution, and the magnitude x_T computed by
equation (5.7.3) or (5.7.4).

The theoretical K–T relationships for several probability distributions commonly
used in frequency analysis are now described.
5.7.1 Normal and Lognormal Distributions
The frequency factor can be expressed from equation (5.7.2) as

K_T = (x_T − µ)/σ                               (5.7.5)

This is the same as the standard normal variable z. The value of z corresponding to
an exceedence probability p (p = 1/T) can be calculated by finding the value of an
intermediate variable w:

w = [ln(1/p²)]^(1/2)        (0 < p ≤ 0.5)       (5.7.6)

and then calculating z using the approximation

z = w − (2.515517 + 0.802853w + 0.010328w²) / (1 + 1.432788w + 0.189269w² + 0.001308w³)   (5.7.7)

When p > 0.5, 1 − p is substituted for p in (5.7.6) and the value of z computed by
(5.7.7) is given a negative sign. The error in this formula is less than 0.00045 in z
(Abramowitz and Stegun, 1965). The frequency factor K_T for the normal
distribution is equal to z, as mentioned above.
For the lognormal distribution, the same procedure applies except that it is applied to
the logarithms of the variables, and their mean and standard deviation are used in
equation (5.7.4).
Example 5.7.1: Calculate the frequency factor for the normal distribution for an event
with a return period of 50 years. (Chow et al., 1988, p. 390)
Solution: For T = 50 years, p = 1/50 = 0.02. From equation (5.7.6)

w = [ln(1/0.02²)]^(1/2) = 2.7971

Then substituting w into (5.7.7),

K_T = z = 2.7971 − (2.515517 + 0.802853 × 2.7971 + 0.010328 × 2.7971²) /
(1 + 1.432788 × 2.7971 + 0.189269 × 2.7971² + 0.001308 × 2.7971³) = 2.054
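Equations (5.7.6) and (5.7.7) are straightforward to program. The following sketch (the function name normal_KT is ours) reproduces Example 5.7.1:

```python
import math

def normal_KT(T):
    """Frequency factor K_T = z for the normal distribution with
    return period T, using the Abramowitz and Stegun (1965)
    approximation of eqs. (5.7.6)-(5.7.7)."""
    p = 1.0 / T
    sign = 1.0
    if p > 0.5:                     # lower tail: use symmetry
        p, sign = 1.0 - p, -1.0
    w = math.sqrt(math.log(1.0 / p ** 2))                             # eq. (5.7.6)
    z = w - ((2.515517 + 0.802853 * w + 0.010328 * w ** 2) /
             (1 + 1.432788 * w + 0.189269 * w ** 2 + 0.001308 * w ** 3))  # eq. (5.7.7)
    return sign * z

print(round(normal_KT(50), 3))     # 2.054, as in Example 5.7.1
```

Note that normal_KT(2) is essentially zero (the median equals the mean for the normal distribution) and return periods below 2 years give negative frequency factors.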
5.7.2 Extreme Value Distributions
For the Extreme Value Type I distribution, Chow (1953) derived the expression

K_T = −(√6/π){0.5772 + ln[ln(T/(T − 1))]}        (5.7.8)

To express T in terms of K_T, the above equation can be written as

T = 1 / (1 − exp{−exp[−(γ + πK_T/√6)]})          (5.7.9)

where γ = 0.5772. When x_T = µ, equation (5.7.5) gives K_T = 0 and equation
(5.7.8) gives T = 2.33 years. This is the return period of the mean of the Extreme
Value Type I distribution. For the Extreme Value Type II distribution, the logarithm
of the variate follows the EV1 distribution. For this case, (5.7.4) is used to calculate
y_T, using the value of K_T from (5.7.8).
Example 5.7.2 Determine the 5-year return period rainfall for Chicago using
the frequency factor method and the annual maximum rainfall data given below.
(Chow et al., 1988, p. 391)
Year  Rainfall (in)    Year  Rainfall (in)    Year  Rainfall (in)
1913  0.49             1926  0.68             1938  0.52
1914  0.66             1927  0.61             1939  0.64
1915  0.36             1928  0.88             1940  0.34
1916  0.58             1929  0.49             1941  0.70
1917  0.41             1930  0.33             1942  0.57
1918  0.47             1931  0.96             1943  0.92
1920  0.74             1932  0.94             1944  0.66
1921  0.53             1933  0.80             1945  0.65
1922  0.76             1934  0.62             1946  0.63
1923  0.57             1935  0.71             1947  0.60
1924  0.80             1936  1.11
1925  0.66             1937  0.64
Solution: The mean and standard deviation of annual maximum rainfalls at Chicago
are x̄ = 0.649 inch and s = 0.177 inch, respectively. For T = 5, equation (5.7.8)
gives

K_T = −(√6/π){0.5772 + ln[ln(5/(5 − 1))]} = 0.719

By (5.7.3),

x_T = x̄ + K_T s = 0.649 + 0.719 × 0.177 = 0.78 in
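Example 5.7.2 can be reproduced with a short script. The sketch below (function and variable names are ours) uses the sample standard deviation, which differs slightly from the text's 0.177 in, but the 5-year estimate still rounds to 0.78 in:

```python
import math
import statistics

def ev1_KT(T):
    """Frequency factor for the Extreme Value Type I distribution, eq. (5.7.8)."""
    return -(math.sqrt(6) / math.pi) * (0.5772 + math.log(math.log(T / (T - 1))))

# Annual maximum rainfall at Chicago, 1913-1947 (inches), from Example 5.7.2.
rain = [0.49, 0.66, 0.36, 0.58, 0.41, 0.47, 0.74, 0.53, 0.76, 0.57,
        0.80, 0.66, 0.68, 0.61, 0.88, 0.49, 0.33, 0.96, 0.94, 0.80,
        0.62, 0.71, 1.11, 0.64, 0.52, 0.64, 0.34, 0.70, 0.57, 0.92,
        0.66, 0.65, 0.63, 0.60]

xbar = statistics.mean(rain)   # close to the text's 0.649 in
s = statistics.stdev(rain)     # close to the text's 0.177 in
xT = xbar + ev1_KT(5) * s      # eq. (5.7.3)
print(round(xT, 2))            # 0.78 in
```

As a check on the discussion above, ev1_KT(2.33) is very nearly zero, consistent with 2.33 years being the return period of the EV1 mean.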
5.7.3 Log-Pearson Type III Distribution
For this distribution, the first step is to take the logarithms of the hydrologic data,
y = log x. Usually logarithms to base 10 are used. The mean ȳ, standard deviation
s_y, and coefficient of skewness C_s are calculated for the logarithms of the data.
The frequency factor depends on the return period T and the coefficient of skewness
C_s. When C_s = 0, the frequency factor is equal to the standard normal variable z.
When C_s ≠ 0, K_T is approximated by Kite (1977) as

K_T = z + (z² − 1)k + (1/3)(z³ − 6z)k² − (z² − 1)k³ + zk⁴ + (1/3)k⁵   (5.7.10)

where k = C_s/6.

The value of z for a given return period can be calculated by the procedure used in
Example 5.7.1. In standard textbooks, tables are given for the values of the
frequency factor of the log-Pearson Type III distribution for various values of the
return period and coefficient of skewness.
Example 5.7.3 Calculate the 5- and 50-year return period annual maximum
discharges of the Guadalupe River near Victoria, Texas, using the lognormal and
log-Pearson Type III distributions. The data in cfs from 1935 to 1978 are given below.
(Chow et al., 1988, p. 393)
Year   1930     1940     1950     1960     1970
0      —        55900    13300    23700    9190
1      —        58000    12300    55800    9740
2      —        56000    28400    10800    58500
3      —        7710     11600    4100     33100
4      —        12300    8560     5720     25200
5      38500    22000    4950     15000    30200
6      179000   17900    1730     9790     14100
7      17200    46000    25300    70000    54500
8      25400    6970     58300    44300    12700
9      4940     20600    10100    15200    —
Solution: The logarithms of the discharge values are taken and their statistics
calculated: ȳ = 4.2743, s_y = 0.4027, C_s = −0.0696.

Lognormal distribution: The frequency factor can be obtained from equation (5.7.7).
For T = 50 years, K_T was computed in Example 5.7.1 as K_50 = 2.054. By (5.7.4),

y_50 = ȳ + K_T s_y = 4.2743 + 2.054 × 0.4027 = 5.101

Then x_50 = 10^5.101 = 126,300 cfs.

Similarly, K_5 = 0.842, y_5 = 4.2743 + 0.842 × 0.4027 = 4.6134, and
x_5 = 10^4.6134 = 41,060 cfs.

Log-Pearson Type III distribution: For C_s = −0.0696, the value of K_50 is obtained
by interpolation from a standard table or by equation (5.7.10). By interpolation with
T = 50 yrs:

K_50 = 2.054 + [(2.00 − 2.054)/(−0.1 − 0)] × (−0.0696 − 0) = 2.016

So y_50 = ȳ + K_50 s_y = 4.2743 + 2.016 × 0.4027 = 5.0863 and
x_50 = 10^5.0863 = 121,990 cfs. By a similar calculation, K_5 = 0.845,
y_5 = 4.6146, and x_5 = 41,170 cfs.
The results for the estimated annual maximum discharges (in cfs) are:

                                        Return Period
Distribution                            5 years    50 years
Lognormal (C_s = 0)                     41,060     126,300
Log-Pearson Type III (C_s = −0.07)      41,170     121,990
It can be seen that the effect of including the small negative coefficient of skewness in
the calculations is to alter slightly the estimated flow, with that effect being more
pronounced at T = 50 years than at T = 5 years. Another feature of the results is that
the 50-year return period estimates are about three times as large as the 5-year return
period estimates; for this example, the increase in the estimated flood discharges is
less than proportional to the increase in return period.
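Kite's approximation (5.7.10) agrees closely with the table interpolation used in Example 5.7.3, as a short script shows (function names are ours):

```python
import math

def normal_z(T):
    """Standard normal variable for exceedence probability p = 1/T,
    via eqs. (5.7.6)-(5.7.7); valid for T >= 2."""
    p = 1.0 / T
    w = math.sqrt(math.log(1.0 / p ** 2))
    return w - ((2.515517 + 0.802853 * w + 0.010328 * w ** 2) /
                (1 + 1.432788 * w + 0.189269 * w ** 2 + 0.001308 * w ** 3))

def lp3_KT(T, Cs):
    """Log-Pearson Type III frequency factor, Kite (1977) approximation (5.7.10)."""
    z = normal_z(T)
    k = Cs / 6.0
    return (z + (z ** 2 - 1) * k + (z ** 3 - 6 * z) * k ** 2 / 3
            - (z ** 2 - 1) * k ** 3 + z * k ** 4 + k ** 5 / 3)

# Example 5.7.3: Guadalupe River, Cs = -0.0696.
K50 = lp3_KT(50, -0.0696)      # about 2.017, vs 2.016 by table interpolation
y50 = 4.2743 + K50 * 0.4027    # eq. (5.7.4)
x50 = 10 ** y50                # about 122,000 cfs, vs 121,990 in the text
```

When Cs = 0 the formula collapses to K_T = z, so the lognormal result is recovered as a special case.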
5.7.4 Treatment of Zeros
Most hydrologic variables are bounded on the left by zero. A zero in a set of data that
is being logarithmically transformed requires special handling. One solution is to add
a small constant to all of the observations. Another method is to analyze the nonzero
values and then adjust the relation to the full period of record. This method biases the
results as the zero values are essentially ignored. A third and theoretically more sound
method would be to use the theorem of total probability:

prob(X > x) = prob(X ≥ x | X = 0) prob(X = 0) + prob(X ≥ x | X ≠ 0) prob(X ≠ 0)

Since prob(X ≥ x | X = 0) is zero, the relationship reduces to

prob(X > x) = prob(X ≥ x | X ≠ 0) prob(X ≠ 0)

In this relationship prob(X ≠ 0) would be estimated by the fraction of non-zero
values and prob(X ≥ x | X ≠ 0) would be estimated by a standard analysis of the
non-zero values with the sample size taken to be equal to the number of non-zero
values. This relation can be written in terms of cumulative probability distributions:

1 − F(x) = k[1 − G(x)],  or
F(x) = 1 − k + kG(x)                              (5.7.11)

where F(x) is the cumulative probability distribution of all X (prob(X ≤ x | X ≥ 0)),
k is the probability that X is not zero, and G(x) is the cumulative probability
distribution of the non-zero values of X (i.e., prob(X ≤ x | X ≠ 0)).
Equation 5.7.11 can be used to estimate the magnitude of an event with return period
T_X(x) by solving first for G(x) and then using the inverse transformation of G(x) to
get the value of X. For example, the 10-year event with k = 0.95 is found to be the
value of X satisfying

G(x) = [F(x) − 1 + k]/k = (0.9 − 1 + 0.95)/0.95 = 0.89

Note that it is possible to generate negative estimates for G(x) from equation 5.7.11.
For example, if k = 0.50 and F(x) = 0.05, the estimated G(x) is
(0.05 − 1 + 0.50)/0.50 = −0.90.
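The adjustment in equation (5.7.11) reduces to a one-line calculation for G(x). A minimal sketch (the function name is ours):

```python
def g_from_f(F, k):
    """Solve eq. (5.7.11), F(x) = 1 - k + k*G(x), for G(x).
    F: non-exceedence probability over the full record (including zeros),
    k: fraction of non-zero values."""
    return (F - 1 + k) / k

# 10-year event (F = 1 - 1/10 = 0.9) with k = 0.95:
print(round(g_from_f(0.9, 0.95), 2))   # 0.89

# A negative G(x) means the requested F falls within the probability mass at zero:
print(round(g_from_f(0.05, 0.5), 2))   # -0.9
```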