
Chapter V

Extreme Value Theory and Frequency Analysis

5.1 INTRODUCTION TO EXTREME VALUE THEORY

Most statistical methods are concerned primarily with what goes on in the center of a

statistical distribution, and do not pay particular attention to the tails of a distribution,

or in other words, the most extreme values at either the high or low end. Extreme

event risk is present in all areas of risk management – market, credit, day to day

operation, and insurance. One of the greatest challenges to a risk manager is to

implement risk management tools which allow for modeling rare but damaging

events, and permit the measurement of their consequences. Extreme value theory

(EVT) plays a vital role in these activities.

The standard mathematical approach to modeling risks uses the language of

probability theory. Risks are random variables, mapping unforeseen future states of

the world into values representing profits and losses. These risks may be considered

individually, or seen as part of a stochastic process where present risks depend on

previous risks. The potential values of a risk have a probability distribution which we

will never observe exactly although past losses due to similar risks, where available,

may provide partial information about that distribution. Extreme events occur when a

risk takes values from the tail of its distribution.

We develop a model for risk by selecting a particular probability distribution. We may

have estimated this distribution through statistical analysis of empirical data. In this

case EVT is a tool which attempts to provide us with the best possible estimate of the

tail area of the distribution. However, even in the absence of useful historical data,

EVT provides guidance on the kind of distribution we should select so that extreme

risks are handled conservatively.

There are two principal kinds of model for extreme values. The oldest group of

models is the block maxima models; these are models for the largest observations

collected from large samples of identically distributed observations. For example, if

we record daily or hourly losses and profits from trading a particular instrument or


group of instruments, the block maxima/minima method provides a model which may

be appropriate for the quarterly or annual maximum of such values. The block

maxima/minima methods are fitted with the generalized extreme value (GEV)

distribution.

A more modern group of models is the peaks-over-threshold (POT) models; these are

models for all large observations which exceed a high threshold. The POT models are

generally considered to be the most useful for practical applications, due to a number

of reasons. First, by taking all exceedances over a suitably high threshold into

account, they use the data more efficiently. Second, they are easily extended to

situations where one wants to study how the extreme levels of a variable Y depend on

some other variable X – for instance, Y may be the level of tropospheric ozone on a

particular day and X a vector of meteorological variables for that day. This kind of

problem is almost impossible to handle through the annual maximum method. The

POT methods are fitted with the generalized Pareto distribution (GPD).

5.2 GENERALIZED EXTREME VALUE DISTRIBUTION

The role of the generalized extreme value (GEV) distribution in the theory of

extremes is analogous to that of the normal distribution in the central limit theory for

sums of random variables. Just as the normal distribution proves to be the important limiting distribution for sample sums or averages, as made explicit in the central limit theorem, the GEV distribution proves to be important in the study of the limiting behavior of sample extrema. The three-parameter distribution function of the standard

GEV is given by:

    H_{\xi,\mu,\sigma}(x) = \exp\left\{ -\left( 1 + \xi\,\frac{x-\mu}{\sigma} \right)^{-1/\xi} \right\},   \xi \ne 0
    H_{\xi,\mu,\sigma}(x) = \exp\left\{ -e^{-(x-\mu)/\sigma} \right\},   \xi = 0        (5.2.1)

where x is such that 1 + ξ(x − µ)/σ > 0. The parameters µ and σ > 0 are known as the location

and scale parameters, respectively. ξ is the all-important shape parameter which

determines the nature of the tail distribution. The extreme value distribution in

equation (5.2.1) is generalized in the sense that the parametric form subsumes three

types of distributions which are known by other names according to the value of ξ :


when ξ > 0 we have the Fréchet distribution with shape α = 1/ξ; when ξ < 0 we have the Weibull distribution with shape α = −1/ξ; when ξ = 0 we have the Gumbel distribution. The Weibull is a short-tailed distribution with a so-called finite right end point. The Gumbel and the Fréchet have infinite right end points, but the decay of the tail of the Fréchet is much slower than that of the Gumbel.

Here are a few basic properties of the GEV distribution. The mean exists if ξ < 1 and the variance if ξ < 1/2; more generally, the k'th moment exists if ξ < 1/k. The mean and variance are given by

    m_1 = E(X) = \mu + \frac{\sigma}{\xi}\{\Gamma(1-\xi) - 1\},   (\xi < 1)
    m_2 = E\{(X - m_1)^2\} = \frac{\sigma^2}{\xi^2}\{\Gamma(1-2\xi) - \Gamma^2(1-\xi)\},   (\xi < 1/2)

One objection to the extreme value distributions is that many processes rarely produce

observations that are independent and identically distributed. However, there is an extensive extreme value theory for non-IID processes. A second objection is

that sometimes it is argued that alternative distributional families fit the data better –

for example, in the 1970s there was a lengthy debate among hydrologists over the use

of extreme value distributions as compared with those of log Pearson type III. There

is no universal solution to this kind of debate.

5.2.1 The Fisher-Tippett Theorem

The Fisher-Tippett theorem is the fundamental result in EVT and can be considered to

have the same status in EVT as the central limit theorem has in the study of sums. The

theorem describes the limiting behavior of appropriately normalized sample maxima.

Suppose X_1, X_2, … are a sequence of independent random variables with a common distribution function F; in other words, F(x) = P(X_j ≤ x) for each j and x. The distribution function of the maximum M_n = max(X_1, …, X_n) of the first n observations is given by the n'th power of F:

    P\{M_n \le x\} = P\{X_1 \le x, X_2 \le x, \ldots, X_n \le x\}
                   = P\{X_1 \le x\}\, P\{X_2 \le x\} \cdots P\{X_n \le x\}
                   = F^n(x)

Suppose further that we can find sequences of real numbers a_n > 0 and b_n such that (M_n − b_n)/a_n, the sequence of normalized maxima, converges in distribution. That is,

    P\{(M_n - b_n)/a_n \le x\} = P\{M_n \le a_n x + b_n\} = F^n(a_n x + b_n) \to H(x)   as   n \to \infty        (5.2.2)

for some non-degenerate distribution function H(x). If this condition holds we say that F is in the maximum domain of attraction of H and we write F ∈ MDA(H). It was shown by Fisher and Tippett (1928) that

    F \in MDA(H)  \Rightarrow  H is of the type H_\xi for some \xi

Thus, if we know that suitably normalized maxima converge in distribution, then the limit distribution must be an extreme value distribution for some value of the parameters ξ, µ, and σ.

The class of distributions F for which the condition (5.2.2) holds is large. Gnedenko (1943) showed that for ξ > 0, F ∈ MDA(H_ξ) if and only if 1 − F(x) = x^{−1/ξ} L(x) for some slowly varying function L(x). This result essentially says that if the tail of the distribution function F(x) decays like a power function, then the distribution is in the domain of attraction of the Fréchet. The class of distributions where the tail decays like a power function is quite large and includes the Pareto, Burr, loggamma, Cauchy and t-distributions as well as various mixture models. These are all so-called heavy-tailed distributions. Distributions in the maximum domain of attraction of the Gumbel, MDA(H_0), include the normal, exponential, gamma and lognormal distributions. The lognormal distribution has a moderately heavy tail and has historically been a popular model for loss severity distributions; however, it is not as heavy-tailed as the distributions in MDA(H_ξ) for ξ > 0. Distributions in the domain of attraction of the Weibull (H_ξ for ξ < 0) are short-tailed distributions such as the uniform and beta distributions.


The Fisher-Tippett theorem suggests the fitting of the GEV to data on sample

maxima, when such data can be collected. There is much literature on this topic

particularly in hydrology where the so-called annual maxima method has a long

history.
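The following is a minimal simulation sketch (not from the text) illustrating the Fisher-Tippett and Gnedenko results above: block maxima of a heavy-tailed sample should be approximately GEV with a positive shape parameter. Note that scipy's genextreme uses the shape convention c = −ξ relative to equation (5.2.1).

```python
# Minimal sketch (assumptions: scipy available; t-distributed "daily losses" stand in for data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# 50 "years" of 365 daily losses from a t-distribution (power-law tail, MDA of the Frechet)
daily = stats.t.rvs(df=4, size=(50, 365), random_state=rng)
block_maxima = daily.max(axis=1)                 # one annual maximum per block

c_hat, mu_hat, sigma_hat = stats.genextreme.fit(block_maxima)
xi_hat = -c_hat                                  # convert to the sign convention of (5.2.1)
print(f"xi = {xi_hat:.3f}, mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
# xi should come out positive, consistent with the Frechet domain of attraction.
```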

5.2.2 Fitting the GEV Distribution

The GEV distribution can be fitted using various methods – probability weighted

moments, maximum likelihood, Bayesian methods. The latter two require numerical

computation, and one disadvantage is that the computations are not easily performed

with standard statistical packages. However, there are some programs available which

are specifically tailored to extreme values. In implementing maximum likelihood, it is

assumed that the block size is quite large so that, regardless of whether the underlying

data are dependent or not, the block maxima observations can be taken to be

independent.

For the GEV, the density p(x; µ, σ, ξ) is obtained by differentiating (5.2.1) with respect to x. The likelihood based on observations Y_1, Y_2, …, Y_N is

    \prod_{i=1}^{N} p(Y_i; \mu, \sigma, \xi)

and so the log-likelihood is given by

    l(\mu,\sigma,\xi) = -N\log\sigma - \left(1+\frac{1}{\xi}\right)\sum_{i}\log\left(1+\xi\,\frac{Y_i-\mu}{\sigma}\right) - \sum_{i}\left(1+\xi\,\frac{Y_i-\mu}{\sigma}\right)^{-1/\xi}        (5.2.3)

which must be maximized subject to the parameter constraints that σ > 0 and 1 + ξ(Y_i − µ)/σ > 0 for all i. While this represents an irregular likelihood problem, due to the dependence of the parameter space on the values of the data, the consistency and asymptotic efficiency of the resulting MLEs can be established for the case when ξ > −1/2.

In determining the number and size of blocks (n and m respectively) a trade-off necessarily takes place: roughly speaking, a large value of m leads to a more accurate approximation of the block maxima distribution by a GEV distribution and a low bias in the parameter estimates; a large value of n gives more block maxima data for the ML estimation and leads to a low variance in the parameter estimates. Note also that, in the case of dependent data, somewhat larger block sizes than are used in the IID case may be advisable; dependence generally has the effect that convergence to the GEV distribution is slower, since the effective sample size is mθ, which is smaller than m.

The following exploratory analyses can be done while fitting a GEV or GPD model to

a set of given data:

Time series plot

Autocorrelation plot

Histogram on the log scale

Quantile-Quantile plot against the exponential distribution

Gumbel plot (Mondal, 2006d, p. 338)

Sample mean excess plot (also called the mean residual life plot in survival

data)

Plot of empirical distribution function

Quantile-Quantile plot of residuals of a fitted model
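A minimal sketch (assumed helper names, not from the text) of two of the exploratory plots listed above: a Q-Q plot against the exponential distribution and the sample mean excess plot, i.e. the average of (x − u) over all observations x exceeding a threshold u.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def qq_exponential(data):
    """Q-Q plot of the data against the exponential distribution."""
    x = np.sort(np.asarray(data))
    p = np.arange(1, len(x) + 1) / (len(x) + 1)      # Weibull plotting positions
    plt.scatter(x, stats.expon.ppf(p))
    plt.xlabel("Ordered Data"); plt.ylabel("Exponential Quantiles")

def mean_excess(data, n_thresholds=30):
    """Sample mean excess plot e(u) = mean of (x - u) for x > u."""
    data = np.asarray(data)
    u = np.linspace(data.min(), np.sort(data)[-3], n_thresholds)   # stop short of the largest values
    e = [np.mean(data[data > t] - t) for t in u]
    plt.figure(); plt.plot(u, e, "o")
    plt.xlabel("Threshold"); plt.ylabel("Mean Excess")
```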

Example 5.2.1 The highest water level during the period from 1 March to 15

May of different years at the Sunamganj station of the Surma river in Bangladesh is

given below. Fit a GEV model to the data.

Year  WL (m)    Year  WL (m)    Year  WL (m)
1975  8.354     1985  7.044     1995  6.082
1976  7.615     1986  6.874     1996  6.045
1977  7.442     1987  6.750     1997  5.956
1978  7.428     1988  6.650     1998  5.188
1979  7.308     1989  6.614     1999  5.090
1980  7.284     1990  6.590     2000  4.826
1981  7.250     1991  6.530     2001  4.040
1982  7.075     1992  6.214     2002  2.882
1983  7.056     1993  6.144     2003  2.806
1984  7.056     1994  6.134     2004  8.080

A Q-Q plot against the exponential distribution is shown below. The plot shows a

convex departure from a straight line. This is a sign of thin-tailed behavior.


Figure 5.2.1 Q-Q plot of the water level data against the exponential distribution (Ordered Data vs. Exponential Quantiles)

Figure 5.2.2 shows the sample mean excess plot. The slope of the fitted points is

downward. This again is a sign of thin-tailed behavior.

Figure 5.2.2 The sample mean excess plot for the water level data at Sunamganj (Threshold vs. Mean Excess)

A GEV model was fitted to the data. The estimated parameters were µ̂ = 6.094991, σ̂ = 1.41705 and ξ̂ = −0.5997389. Since the shape parameter is negative, the distribution of the data is of Weibull-type (thin-tailed). The period of March-May is the dry season in Bangladesh, so the maxima are from low flow periods. The Weibull-type distribution is appropriate for low flows. The Q-Q plot of the residuals of the fitted GEV model is shown in Figure 5.2.3. The plot is roughly a straight line of unit slope passing through the origin. So the model can be considered to be a reasonably


good fit of the data. It is to be noted here that other distributions, such as the Gumbel, lognormal, etc., were tried for the data set; however, the GEV distribution was found to fit the data better. The estimated water levels corresponding to different return periods are given below:

Return period (years)   WL (m)
  2                      6.56
  5                      7.50
 10                      7.85
 20                      8.06
 50                      8.23
100                      8.31

Figure 5.2.3 The Q-Q plot of the fitted GEV residuals (Ordered Data vs. Exponential Quantiles)
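A minimal sketch (assumed workflow, not the author's code) reproducing the fit of Example 5.2.1 with scipy. The shape parameter of scipy's genextreme is c = −ξ relative to equation (5.2.1), hence the sign flip; the return level for return period T is the (1 − 1/T) quantile of the fitted GEV.

```python
import numpy as np
from scipy import stats

wl = np.array([8.354, 7.615, 7.442, 7.428, 7.308, 7.284, 7.250, 7.075, 7.056, 7.056,
               7.044, 6.874, 6.750, 6.650, 6.614, 6.590, 6.530, 6.214, 6.144, 6.134,
               6.082, 6.045, 5.956, 5.188, 5.090, 4.826, 4.040, 2.882, 2.806, 8.080])

c_hat, mu_hat, sigma_hat = stats.genextreme.fit(wl)
print(f"xi = {-c_hat:.4f}, mu = {mu_hat:.4f}, sigma = {sigma_hat:.4f}")

for T in (2, 5, 10, 20, 50, 100):
    print(T, round(stats.genextreme.ppf(1 - 1 / T, c_hat, mu_hat, sigma_hat), 2))
# The estimates should be close to the values reported in the text (e.g. ~8.3 m at T = 100),
# although the exact numbers depend on the optimizer.
```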

5.3 GENERALIZED PARETO DISTRIBUTION

The GPD is a two-parameter distribution with distribution function

    G_{\xi,\beta}(x) = 1 - (1 + \xi x/\beta)^{-1/\xi},   \xi \ne 0
    G_{\xi,\beta}(x) = 1 - \exp(-x/\beta),   \xi = 0

where β > 0, and x ≥ 0 when ξ ≥ 0 and 0 ≤ x ≤ −β/ξ when ξ < 0. The parameters

ξ and β are referred to respectively as the shape and scale parameters.

The GPD is generalized in the sense that it subsumes certain other distributions under a common parametric form. If ξ > 0 then G_{ξ,β} is a reparametrized version of the ordinary Pareto distribution (with α = 1/ξ and κ = β/ξ), which has a long history in actuarial mathematics as a model for large losses; ξ = 0 corresponds to the exponential distribution, i.e. a distribution with a medium-sized tail; and ξ < 0 is a short-tailed Pareto type II distribution. The mean of the GPD is defined provided ξ < 1 and is

    E(X) = \frac{\beta}{1 - \xi}

The first case is the most relevant for risk management purposes since the GPD is heavy-tailed when ξ > 0. Whereas the normal distribution has moments of all orders, a heavy-tailed distribution does not possess a complete set of moments. In the case of the GPD with ξ > 0 we find that E[X^k] is infinite for k ≥ 1/ξ. When ξ = 1/2, the GPD is an infinite-variance (second moment) distribution; when ξ = 1/4, the GPD has an infinite fourth moment.

The role of the GPD in EVT is as a natural model for the excess distribution over a

high threshold. Certain types of large claims data in insurance typically suggest an

infinite second moment; similarly econometricians might claim that certain market

returns indicate a distribution with infinite fourth moment. The normal distribution

cannot model these phenomena but the GPD is used to capture precisely this kind of

behavior.

5.3.1 Modeling Excess Distributions

Let X be a random variable with distribution function F. The distribution of excesses over a threshold u has distribution function

    F_u(y) = P(X - u \le y \mid X > u)

for 0 ≤ y < x_0 − u, where x_0 ≤ ∞ is the right endpoint of F. The excess distribution function F_u represents the probability that a loss exceeds the threshold u by at most an amount y, given the information that it exceeds the threshold. In survival analysis the excess distribution function is more commonly known as the residual life distribution function: it expresses the probability that, say, an electrical component which has functioned for u units of time fails in the time period [u, u + y]. It is very useful to observe that F_u can be written in terms of the underlying F as

    F_u(y) = \frac{F(y + u) - F(u)}{1 - F(u)}


The mean excess function of a random variable X with finite mean is given by

    e(u) = E(X - u \mid X > u)

The mean excess function e(u) expresses the mean of F_u as a function of u. In survival analysis, the mean excess function is known as the mean residual life function and gives the expected residual lifetime for components with different ages.

Mostly we would assume our underlying F is a distribution with an infinite right endpoint, i.e. it allows the possibility of arbitrarily large losses, even if it attributes negligible probability to unreasonably large outcomes, e.g. the normal or t distributions. But it is also conceivable, in certain applications, that F could have a finite right endpoint. An example is the beta distribution on the interval [0,1], which attributes zero probability to outcomes larger than 1 and which might be used, for example, as the distribution of credit losses expressed as a proportion of exposure.

5.3.2 The Pickands-Balkema-de Haan Theorem

The Pickands-Balkema-de Haan limit theorem (Balkema and de Haan, 1974;

Pickands, 1975) is a key result in EVT and explains the importance of the GPD. We

can find a (positive measurable) function β(u) such that

    \lim_{u \to x_0} \sup_{0 \le y < x_0 - u} \left| F_u(y) - G_{\xi,\beta(u)}(y) \right| = 0,   \beta(u) = \beta + \xi u,

if and only if F ∈ MDA(H_ξ).

The theorem shows that under MDA conditions the generalized Pareto distribution is the limiting distribution for the distribution of excesses as the threshold tends to the right endpoint. All the common continuous distributions of statistics and actuarial science (normal, lognormal, χ², t, F, gamma, exponential, uniform, beta, etc.) are in MDA(H_ξ) for some ξ, so the above theorem proves to be a very widely applicable result that essentially says that the GPD is the natural model for the unknown excess distribution above sufficiently high thresholds.

5.3.3 Fitting a GPD Model

Given loss data X_1, X_2, …, X_n from F, a random number N_u will exceed our threshold u; it will be convenient to re-label these data X̃_1, X̃_2, …, X̃_{N_u}. For each of these exceedances we calculate the amount Y_j = X̃_j − u of the excess loss. We wish to estimate the parameters ξ̂ and β̂ of a GPD model by fitting this distribution to the N_u excess losses. There are various ways of fitting the GPD including maximum likelihood (ML) and probability-weighted moments (PWM). The former method is more commonly used and easy to implement if the excess data can be assumed to be realizations of independent random variables, since the joint probability density of the observations will then be a product of marginal densities. This is the most general fitting method in statistics and it also allows us to give estimates of statistical error (standard errors) for the parameter estimates. Writing g_{ξ,β} for the density of the GPD, the log-likelihood may be easily calculated to be

    \ln L(\xi,\beta; Y_1,\ldots,Y_{N_u}) = \sum_{j=1}^{N_u} \ln g_{\xi,\beta}(Y_j) = -N_u \ln\beta - \left(1+\frac{1}{\xi}\right)\sum_{j=1}^{N_u} \ln\left(1+\xi\,\frac{Y_j}{\beta}\right)

which must be maximized subject to the parameter constraints that β > 0 and 1 + ξY_j/β > 0 for all j. Solving the maximization problem yields a GPD model G_{ξ̂,β̂} for the excess distribution F_u.

Choice of the threshold is basically a compromise between choosing a sufficiently

high threshold so that the asymptotic theorem can be considered to be essentially

exact and choosing a sufficiently low threshold so that we have sufficient material for

estimation of the parameters.
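A minimal sketch (assumed helper name, not the author's code) of the peaks-over-threshold fit described above. scipy's genpareto uses the same shape convention as the text (c = ξ), and the location is fixed at 0 because the excesses Y_j = X_j − u are fitted directly.

```python
import numpy as np
from scipy import stats

def fit_gpd_excesses(losses, u):
    """Fit a GPD to the excesses over threshold u; return (xi_hat, beta_hat, n_exceedances)."""
    losses = np.asarray(losses)
    excesses = losses[losses > u] - u
    xi_hat, _, beta_hat = stats.genpareto.fit(excesses, floc=0)
    return xi_hat, beta_hat, len(excesses)

# Example with simulated heavy-tailed losses; in practice the threshold u is chosen from the
# mean excess plot (roughly where the plot becomes linear).
rng = np.random.default_rng(0)
losses = stats.t.rvs(df=3, size=5000, random_state=rng)
print(fit_gpd_excesses(losses, u=np.quantile(losses, 0.95)))
```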

5.4 MULTIVARIATE EXTREMES

Multivariate extreme value theory (MEVT) can be used to model the tails of multivariate distributions, and the dependence structure of extreme events can be studied with it. Consider the random vector X = (X_1, …, X_d)^t which represents losses of d different kinds measured at the same point in time. Assume that these losses have joint distribution F(x_1, …, x_d) = P{X_1 ≤ x_1, …, X_d ≤ x_d} and that the individual losses have continuous marginal distributions F_i(x) = P{X_i ≤ x}. It has been shown by Sklar (1959) that every joint distribution can be written as

    F(x_1, \ldots, x_d) = C[F_1(x_1), \ldots, F_d(x_d)],


for a unique function C that is known as the copula of F. A copula is the joint distribution of uniformly distributed random variables: if U_1, …, U_n are U(0,1) then C(u_1, …, u_n) = P{U_1 ≤ u_1, …, U_n ≤ u_n} is a copula. C is a function from [0,1] × … × [0,1] into [0,1]. A copula C is a function of n variables on the unit n-cube [0,1]^n with the following properties:

the range of C is the unit interval [0,1];

C(u) is zero for all u in [0,1]^n for which at least one coordinate equals zero [for the bivariate case, for every u, v in [0,1], C(u,0) = 0 = C(0,v)];

C(u) = u_k if all coordinates of u are 1 except the k-th one [for the bivariate case, for every u, v in [0,1], C(u,1) = u and C(1,v) = v];

C is n-increasing in the sense that for every a ≤ b in [0,1]^n the volume assigned by C to the n-box [a,b] = [a_1,b_1] × … × [a_n,b_n] is non-negative [for the bivariate case, for every u_1, u_2, v_1, v_2 in [0,1] such that u_1 ≤ u_2 and v_1 ≤ v_2, C(u_2,v_2) − C(u_2,v_1) − C(u_1,v_2) + C(u_1,v_1) ≥ 0].

A copula may be thought of in two equivalent ways: as a function (with some technical restrictions) that maps values in the unit hypercube to values in the unit interval, or as a multivariate distribution function with standard uniform marginal distributions. The copula C does not change under (strictly) increasing transformations of the losses X_1, …, X_d, and it makes sense to interpret C as the dependence structure of X or F, as the following simple illustration in d = 2 dimensions shows.

We take the marginal distributions to be standard univariate normal distributions F_1 = F_2 = Φ. We can then choose any copula C (i.e. any bivariate distribution with uniform marginals) and apply it to these marginals to obtain bivariate distributions with normal marginals. For one particular choice of C, which we call the Gaussian copula and denote C^Ga_ρ, we obtain the standard bivariate normal distribution with correlation ρ. The Gaussian copula does not have a simple closed form and must be written as a double integral:

    C^{Ga}_{\rho}(u_1, u_2) = \int_{-\infty}^{\Phi^{-1}(u_1)} \int_{-\infty}^{\Phi^{-1}(u_2)} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{ -\frac{s^2 - 2\rho s t + t^2}{2(1-\rho^2)} \right\} ds\, dt

Another interesting copula is the Gumbel copula which does have a simple closed

form:

    C^{Gu}_{\beta}(v_1, v_2) = \exp\left\{ -\left[ (-\log v_1)^{1/\beta} + (-\log v_2)^{1/\beta} \right]^{\beta} \right\},   0 < \beta \le 1

Figure 5.4.? shows the bivariate distributions which arise when we apply the two copulas C^Ga_0.7 and C^Gu_0.5 to standard normal marginals. The left-hand picture is the standard bivariate normal with correlation 70%; the right-hand picture is a bivariate distribution with approximately equal correlation but the tendency to generate extreme values of X1 and X2 simultaneously. It is, in this sense, a more dangerous distribution for risk managers. On the basis of correlation, these distributions cannot be differentiated, but they obviously have entirely different dependence structures. The bivariate normal has rather weak tail dependence; the normal-Gumbel distribution has pronounced tail dependence.
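A minimal sketch (not from the text) comparing the two copulas numerically via the joint exceedance probability P(U > q, V > q) = 1 − 2q + C(q, q); the copula parameters 0.7 and 0.5 are the ones used in the figure discussion above.

```python
import numpy as np
from scipy import stats

def gaussian_copula(u, v, rho):
    mvn = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return mvn.cdf([stats.norm.ppf(u), stats.norm.ppf(v)])

def gumbel_copula(u, v, beta):
    return np.exp(-(((-np.log(u)) ** (1 / beta) + (-np.log(v)) ** (1 / beta)) ** beta))

for q in (0.95, 0.99, 0.999):
    joint_ga = 1 - 2 * q + gaussian_copula(q, q, rho=0.7)
    joint_gu = 1 - 2 * q + gumbel_copula(q, q, beta=0.5)
    print(q, round(joint_ga, 6), round(joint_gu, 6))
# As q -> 1 the Gumbel joint exceedance stays of the same order as (1 - q), i.e. extremes keep
# occurring together (tail dependence), while the Gaussian value decays faster.
```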

One way of understanding MEVT is as the study of copulas which arise in the limiting multivariate distribution of component-wise block maxima. Suppose we have a family of random vectors X_t = (X_{1t}, …, X_{dt})^t representing d-dimensional losses at different points in time t = 1, 2, …. A simple interpretation might be that they represent daily (negative) returns for d instruments. As for the univariate discussion of block maxima, we assume that losses at different points in time are independent. This assumption simplifies the statement of the result, but can be relaxed to allow serial dependence of losses at the cost of some additional technical conditions.

We define the vector of component-wise block maxima to be M_n = (M_{1n}, …, M_{dn})^t, where M_{jn} = max(X_{j1}, …, X_{jn}) is the block maximum of the j-th component for a block of size n observations. Now consider the vector of normalized block maxima given by ((M_{1n} − b_{1n})/a_{1n}, …, (M_{dn} − b_{dn})/a_{dn})^t, where a_{jn} > 0 and b_{jn} are normalizing sequences. If this vector converges in distribution to a non-degenerate limiting distribution then this limit must have the form

    C\left( H_{\xi_1}\left(\frac{x_1 - \mu_1}{\sigma_1}\right), \ldots, H_{\xi_d}\left(\frac{x_d - \mu_d}{\sigma_d}\right) \right),

for some values of the parameters ξ_j, µ_j, and σ_j and some copula C. It must have this form because of univariate EVT. Each marginal distribution of the limiting multivariate distribution must be a GEV, as we noted earlier in this chapter.

MEVT characterizes the copulas C which may arise in this limit, the so-called MEV copulas. It turns out that the limiting copulas must satisfy

    C(u_1^t, \ldots, u_d^t) = C^t(u_1, \ldots, u_d)

for t > 0. There is no single parametric family which contains all the MEV copulas, but certain parametric copulas are consistent with the above condition and might therefore be regarded as natural models for the dependence structure of extreme observations.

In two dimensions, the Gumbel copula is an example of an MEV copula; it is moreover a versatile copula. If the parameter β is 1 then C^Gu_1(v_1, v_2) = v_1 v_2 and this copula models independence of the components of a random vector (X_1, X_2)^t. If β ∈ (0,1) then the Gumbel copula models dependence between X1 and X2. As β decreases the dependence becomes stronger, until the value β = 0 corresponds to perfect dependence of X1 and X2; this means X2 = T(X1) for some strictly increasing function T. For β < 1 the Gumbel copula shows tail dependence, the tendency of extreme values to occur together, as observed in Figure 5.4.?.

5.4.1 Fitting a Tail Model Using a Copula

The Gumbel copula can be used to build tail models in two dimensions as follows. Suppose two risk factors (X_1, X_2)^t have an unknown joint distribution F and marginals F_1 and F_2 so that, for some copula C,

    F(x_1, x_2) = C[F_1(x_1), F_2(x_2)].

Assume that we have n pairs of data points from this distribution. Using the univariate peaks-over-threshold method, we model the tails of the two marginal distributions by picking high thresholds u_1 and u_2 and using tail estimators to obtain

    \hat{F}_i(x) = 1 - \frac{N_{u_i}}{n}\left(1 + \hat{\xi}_i\,\frac{x - u_i}{\hat{\beta}_i}\right)^{-1/\hat{\xi}_i},   x > u_i,   i = 1, 2


We model the dependence structure of observations exceeding these thresholds using the Gumbel copula C^Gu_{β̂} for some estimated value β̂ of the dependence parameter β. We put tail models and dependence structure together to obtain a model for the joint tail of F:

    \hat{F}(x_1, x_2) = C^{Gu}_{\hat{\beta}}\left( \hat{F}_1(x_1), \hat{F}_2(x_2) \right),   x_1 > u_1,   x_2 > u_2

The estimate of the dependence parameter β can be determined by maximum

likelihood, either in a second stage after the parameters of the tail estimators have

been estimated or in a single stage estimation procedure where all parameters are

estimated together.
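A minimal sketch (assumed helper names, not the author's code) of the two-stage construction in this subsection: GPD tail estimators for each margin combined with a Gumbel copula. The dependence parameter beta_hat is assumed to have been estimated separately (e.g. by maximum likelihood, as described above).

```python
import numpy as np
from scipy import stats

def pot_tail_estimator(data, u):
    """Return a function x -> F_hat(x), valid for x > u, per the tail estimator above."""
    data = np.asarray(data)
    exc = data[data > u] - u
    xi, _, beta = stats.genpareto.fit(exc, floc=0)
    n, n_u = len(data), len(exc)
    return lambda x: 1 - (n_u / n) * (1 + xi * (x - u) / beta) ** (-1 / xi)

def gumbel_copula(v1, v2, beta):
    return np.exp(-(((-np.log(v1)) ** (1 / beta) + (-np.log(v2)) ** (1 / beta)) ** beta))

def joint_tail_model(x1, x2, F1_hat, F2_hat, beta_hat):
    """Model for F(x1, x2) in the joint tail (x1 > u1, x2 > u2)."""
    return gumbel_copula(F1_hat(x1), F2_hat(x2), beta_hat)

# A joint exceedance probability P(X1 > x1, X2 > x2) then follows as
# 1 - F1_hat(x1) - F2_hat(x2) + joint_tail_model(x1, x2, F1_hat, F2_hat, beta_hat).
```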

5.5 INTRODUCTION TO FREQUENCY ANALYSIS

The magnitude of an extreme event is inversely related to its frequency of occurrence,

very severe events occurring less frequently than more moderate events. The objective

of frequency analysis is to relate the magnitude of extreme events to their frequency

of occurrence through the use of probability distributions. Frequency analysis is

defined as the investigation of population sample data to estimate recurrence intervals or probabilities of magnitudes. It is one of the earliest and most frequent uses of statistics

in hydrology and natural sciences. Early applications of frequency analysis were

largely in the area of flood flow estimation. Today nearly every phase of hydrology

and natural sciences is subjected to frequency analyses.

Two methods of frequency analysis are described: one is a straightforward plotting

technique to obtain the cumulative distribution and the other uses the frequency

factors. The cumulative distribution function provides a rapid means of determining

the probability of an event equal to or less than some specified quantity. The inverse

is used to obtain the recurrence intervals. The analytical frequency analysis is a

simplified technique based on frequency factors, which depend on the distributional assumption that is made and on the mean, variance and, for some distributions, the coefficient of skew of the data.

Over the years and continuing today there have been volumes of material written on

the best probability distribution to use in various situations. One cannot, in most

instances, analytically determine which probability distribution should be used.


Certain limit theorems such as the Central Limit Theorem and extreme value

theorems might provide guidance. One should also evaluate the experience that has

been accumulated with the various distributions and how well they describe the

phenomena of interest. Certain properties of the distributions can be used in screening

distributions for possible application in a particular situation. For example, the range

of the distribution, the general shape of the distribution and the skewness of the

distribution many times indicate that a particular distribution may or may not be

applicable in a given situation. When two or more distributions appear to describe a

given set of data equally well, the distribution that has been traditionally used should

be selected unless there are contrary overriding reasons for selecting another

distribution. However, if a traditionally used distribution is inferior, its use should not

be continued just because "that's the way it's always been done". As a general rule,

caution is advised in frequency analysis when working with records shorter than 10 years and when estimating frequencies of events greater than twice the record length.

5.6 PROBABILITY PLOTTING

A probability plot is a plot of a magnitude versus a probability. Determining the

probability to assign a data point is commonly referred to as determining the plotting

position. If one is dealing with a population, determining the plotting position is

merely a matter of determining the fraction of the data values less (greater) than or

equal to the value in question. Thus the smallest (largest) population value would plot at 0

and the largest (smallest) population value would plot at 1.00. Assigning plotting

positions of 0 and 1 should be avoided for sample data unless one has additional

information on the population limits.

Plotting position may be expressed as a probability from 0 to 1 or a percent from 0 to

100. Which method is being used should be clear from the context. In some

discussions of probability plotting, especially in the hydrologic literature, the probability scale is used to denote prob(X ≥ x) or 1 − P_x(x). One can always transform the probability scale 1 − P_x(x) to P_x(x) or even T_x(x) if desired.

Probability plotting of hydrologic data requires that individual observations or data

points be independent of each other and that the sample data be representative of the


population (unbiased). There are four common types of sample data: Complete

duration series, annual series, partial duration series, and extreme value series. The

complete duration series consists of all available data. An example would be all the

available daily flow data for a stream. This particular data set would most likely not

have independent observations. The annual series consists of one value per year such

as the maximum peak flow of each year. The partial duration series consists of all

values above (below) a certain base. All values above a threshold would represent a

partial duration series. This series may have more or fewer values in it than the annual series. For example, some years may not contribute any data to a partial duration series with a certain base. Frequently the annual

series and the partial duration series are combined so that the largest (smallest) annual

value plus all independent values above (below) some base are used. The extreme

value series consists of the largest (smallest) observation in a given time interval. The

annual series is a special case of the extreme value series with the time interval being

one year.

Regardless of the type of sample data used, the plotting position can be determined in

the same manner. Gumbel (1958) states the following criteria for plotting position

relationships:

1. The plotting position must be such that all observations can be plotted.

2. The plotting position should lie between the observed frequencies of (m − 1)/n and m/n, where m is the rank of the observation beginning with m = 1 for the largest (smallest) value and n is the number of years of record (if applicable) or the number of observations.

3. The return period of a value equal to or larger than the largest observation and the return period of a value equal to or smaller than the smallest observation should converge toward n.

4. The observations should be equally spaced on the frequency scale.

5. The plotting position should have an intuitive meaning, be analytically simple, and be easy to use.

Several plotting position relationships are presented in Table 5.6.1. Benson (1962a), in a comparative study of several plotting position relationships, found on the basis of theoretical sampling from extreme value and normal distributions that the Weibull relationship provided estimates that were consistent with experience. The Weibull plotting position formula meets all 5 of the above criteria. (1) All of the observations can be plotted since the plotting positions range from 1/(n + 1), which is greater than zero, to n/(n + 1), which is less than 1. Probability paper for many distributions does not contain the points zero and one. (2) The relationship m/(n + 1) lies between (m − 1)/n and m/n for all values of m and n. (3) The return period of the largest value is (n + 1), which approaches n as n gets large. (4) The difference between the plotting positions of the (m + 1)st and the m'th value is 1/(n + 1) for all values of m and n. (5) The fact that condition 3 is met, plus the simplicity of the Weibull relationship, fulfills condition 5.

Cunnane (1978) studied the various available plotting position methods using criteria of unbiasedness and minimum variance. An unbiased plotting method is one that, if used for plotting a large number of equally sized samples, will result in the average of the plotting points for each value of m falling on the theoretical distribution line. A minimum variance plotting method is one that minimizes the variance of the plotted points about the theoretical distribution line. Cunnane concluded that the Weibull plotting formula is biased and plots the largest values of a sample at too small a return period. For normally distributed data, he found that the Blom (1958) plotting position (c = 3/8) is closest to being unbiased, while for data distributed according to the Extreme Value Type I distribution, the Gringorten (1963) formula (c = 0.44) is the best. For the log-Pearson Type III distribution, the optimal value of c depends on the value of the coefficient of skewness, being larger than 3/8 when the data are positively skewed and smaller than 3/8 when the data are negatively skewed.

Table 5.6.1 Plotting position relationships

Name                   Source              Relationship
California             California (1923)   m/n
Hazen                  Hazen (1930)        (2m - 1)/(2n)
Weibull                Weibull (1939)      m/(n + 1)
Blom                   Blom (1958)         (m - 3/8)/(n + 0.25)
Gringorten             Gringorten (1963)   (m - a)/(n + 1 - 2a) #
Cunnane                Cunnane (1978)      (m - c)/(n + 1 - 2c) *
Benard's median rank                       (m - 0.3)/(n + 0.4)
M.S. Excel                                 (m - 1)/(n - 1)

* c is 3/8 for the normal distribution and 0.4 if the applicable distribution is unknown. # Most plotting position formulas do not account for the sample size or length of record. One formula that does account for sample size was given by Gringorten. In general, a = 0.4 is recommended in the Gringorten equation. If the distribution is approximately normal, a = 3/8 is used. A value of a = 0.44 is used if the data follows a Gumbel (EV1) distribution.

It should be noted that all of the relationships give similar values near the centre of the distribution, but may vary considerably in the tails. Predicting extreme events depends on the tails of the distribution, so care must be exercised. The quantity 1 − P_x(x) represents the probability of an event with a magnitude equal to or greater than the event in question. When the data are ranked from the largest (m = 1) to the smallest (m = n), the plotting positions determined from Table 5.6.1 correspond to 1 − P_x(x). If the data are ranked from the smallest (m = 1) to the largest (m = n), the plotting position formulas are still valid; however, the plotting position now corresponds to the probability of an event equal to or smaller than the event in question, which is P_x(x). Probability paper may contain scales of P_x(x), 1 − P_x(x), T_x(x), or a combination of these.

The following steps are necessary for a probability plotting of a given set of data:

Rank the data from the largest (smallest) to the smallest (largest) value. If two

or more observations have the same value, several procedures can be used for

assigning a plotting position.

Calculate the plotting position of each data point following Table 5.6.1.

Select the type of probability paper to be used.

Plot the observations on the probability paper.

A theoretical line is drawn. For the normal distribution, the line should

pass through the mean plus one standard deviation at 84.1% and the mean

minus one standard deviation at 15.9%.
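A minimal sketch (assumed helper names, not from the text) of the steps above, using the Weibull plotting position and a normal probability scale built from the standard normal quantile function.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def weibull_plotting_positions(data):
    """Rank from largest to smallest; return (sorted data, exceedance probability m/(n + 1))."""
    x = np.sort(np.asarray(data))[::-1]            # m = 1 for the largest value
    m = np.arange(1, len(x) + 1)
    return x, m / (len(x) + 1)                     # 1 - P_x(x)

def normal_probability_plot(data):
    x, p_exceed = weibull_plotting_positions(data)
    z = stats.norm.ppf(1 - p_exceed)               # normal probability "paper": linear in z
    plt.scatter(z, x)
    plt.plot(z, np.mean(data) + np.std(data, ddof=1) * z)   # theoretical line (mean +/- s at z = +/-1)
    plt.xlabel("Standard normal variate z"); plt.ylabel("Magnitude")
```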

If a set of data plots as a straight line on a probability paper, the data can be said to be

distributed as the distribution corresponding to that probability paper. Since it would


be rare for a set of data to plot exactly on a line, a decision must be made as to

whether or not the deviations from the line are random deviations or represent true

deviations indicating the data do not follow the given probability distribution. Visual

comparison of the observed and theoretical frequency histograms and the observed

and theoretical cumulative frequency curves in the form of probability plots can help determine if a set of data follows a certain distribution. Some statistical tests are also available.

When probability plots are made and a line drawn through the data, the tendency to

extrapolate the data to high return periods is great. The distance on the probability

paper from a return period of 20 years to a return period of 200 years is not very

much; however, if the data do not truly follow the assumed distribution with population parameters equal to the sample statistics (i.e., µ = x̄ and σ² = s² for the normal), the error in this extrapolation can be quite large. This fact has already been referred to when it was stated that the estimation of probabilities in the tails of distributions is very sensitive to distributional assumptions. Since one of the usual purposes of probability plotting is to estimate events with longer return periods, Blench (1959) and Dalrymple (1960) have criticized the blind use of analytical flood frequency methods because of this tendency toward extrapolation.

5.7 ANALYTICAL FREQUENCY ANALYSIS

Calculating the magnitudes of extreme events by the method outlined in Example 12.2.1 requires that the probability distribution function be invertible; that is, given a value for T or F(x_T) = (T − 1)/T, the corresponding value of x_T can be determined. Some probability distribution functions are not readily invertible, including the normal and Pearson Type III distributions, and an alternative method of calculating the magnitudes of extreme events is required for these distributions.

The magnitude x_T of a hydrologic event may be represented as the mean µ plus the departure ∆x_T of the variate from the mean:

    x_T = \mu + \Delta x_T        (5.7.1)

The departure may be taken as equal to the product of the standard deviation σ and a frequency factor K_T; that is, ∆x_T = K_T σ. The departure ∆x_T and the frequency factor K_T are functions of the return period and the type of probability distribution to be used in the analysis. Equation (5.7.1) may therefore be expressed as

    x_T = \mu + K_T \sigma        (5.7.2)

which may be approximated by

    x_T = \bar{x} + K_T s        (5.7.3)

In the event that the variable analyzed is y = log x, then the same method is applied to the statistics for the logarithms of the data, using

    y_T = \bar{y} + K_T s_y        (5.7.4)

and the required value of x_T is found by taking the antilog of y_T.

The frequency factor equation (5.7.2) was proposed by Chow (1951), and it is

applicable to many probability distributions used in hydrologic frequency analysis.

For a given distribution, a K-T relationship can be determined between the frequency factor and the corresponding return period. This relationship can be expressed in mathematical terms or by a table.

Frequency analysis begins with the calculation of the statistical parameters required for a proposed probability distribution by the method of moments from the given data. For a given return period, the frequency factor can be determined from the K-T relationship for the proposed distribution, and the magnitude x_T computed by equation (5.7.3) or (5.7.4).

The theoretical K-T relationships for several probability distributions commonly used in frequency analysis are now described.

5.7.1 Normal and Lognormal Distributions

The frequency factor can be expressed from equation (5.7.2) as

    K_T = \frac{x_T - \mu}{\sigma}        (5.7.5)

This is the same as the standard normal variable z. The value of z corresponding to an exceedence probability of p (p = 1/T) can be calculated by finding the value of an intermediate variable w:

    w = \left[ \ln\left( \frac{1}{p^2} \right) \right]^{1/2}        (0 < p \le 0.5)        (5.7.6)

Then z is calculated using the approximation

    z = w - \frac{2.515517 + 0.802853\,w + 0.010328\,w^2}{1 + 1.432788\,w + 0.189269\,w^2 + 0.001308\,w^3}        (5.7.7)

When p > 0.5, 1 − p is substituted for p in (5.7.6) and the value of z computed by (5.7.7) is given a negative sign. The error in this formula is less than 0.00045 in z (Abramowitz and Stegun, 1965). The frequency factor K_T for the normal distribution is equal to z, as mentioned above.

For the lognormal distribution, the same procedure applies except that it is applied to

the logarithms of the variables, and their mean and standard deviation are used in

equation (5.7.4).

Example 5.7.1: Calculate the frequency factor for the normal distribution for an event

with a return period of 50 years. (Chow et al., 1988, p. 390)

Solution: For T = 50 years, p = 1/50 = 0.02. From equation (5.7.6)

    w = \left[ \ln\left( \frac{1}{p^2} \right) \right]^{1/2} = \left[ \ln\left( \frac{1}{0.02^2} \right) \right]^{1/2} = 2.7971

Then substituting w into (5.7.7)

    K_T = z = 2.7971 - \frac{2.515517 + 0.802853 \times 2.7971 + 0.010328 \times 2.7971^2}{1 + 1.432788 \times 2.7971 + 0.189269 \times 2.7971^2 + 0.001308 \times 2.7971^3}
            = 2.054
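A minimal sketch (assumed helper name, not from the text) of the normal frequency factor computation of equations (5.7.6)-(5.7.7), reproducing Example 5.7.1.

```python
import math

def normal_frequency_factor(T):
    """K_T = z for exceedance probability p = 1/T (Abramowitz-Stegun approximation)."""
    p = 1.0 / T
    sign = 1.0
    if p > 0.5:
        p, sign = 1.0 - p, -1.0
    w = math.sqrt(math.log(1.0 / p ** 2))                                # equation (5.7.6)
    z = w - (2.515517 + 0.802853 * w + 0.010328 * w ** 2) / (
        1 + 1.432788 * w + 0.189269 * w ** 2 + 0.001308 * w ** 3)        # equation (5.7.7)
    return sign * z

print(round(normal_frequency_factor(50), 3))   # ~2.054, as in Example 5.7.1
```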


5.7.2 Extreme Value Distributions

For the Extreme Value Type I distribution, Chow (1953) derived the expression

    K_T = -\frac{\sqrt{6}}{\pi}\left\{ 0.5772 + \ln\left[ \ln\left( \frac{T}{T-1} \right) \right] \right\}        (5.7.8)

To express T in terms of K_T, the above equation can be written as

    T = \frac{1}{1 - \exp\left\{ -\exp\left[ -\left( \gamma + \frac{\pi K_T}{\sqrt{6}} \right) \right] \right\}}        (5.7.9)

where γ = 0.5772. When x_T = µ, equation (5.7.5) gives K_T = 0 and equation (5.7.8) gives T = 2.33 years. This is the return period of the mean of the Extreme Value Type I distribution. For the Extreme Value Type II distribution, the logarithm of the variate follows the EV I distribution. For this case, (5.7.4) is used to calculate y_T, using the value of K_T from (5.7.8).

Example 5.7.2 Determine the 5-year return period rainfall for Chicago using

the frequency factor method and the annual maximum rainfall data given below.

(Chow et al., 1988, p. 391)

Year  Rainfall (inch)   Year  Rainfall (inch)   Year  Rainfall (inch)
1913  0.49              1926  0.68              1938  0.52
1914  0.66              1927  0.61              1939  0.64
1915  0.36              1928  0.88              1940  0.34
1916  0.58              1929  0.49              1941  0.70
1917  0.41              1930  0.33              1942  0.57
1918  0.47              1931  0.96              1943  0.92
1920  0.74              1932  0.94              1944  0.66
1921  0.53              1933  0.80              1945  0.65
1922  0.76              1934  0.62              1946  0.63
1923  0.57              1935  0.71              1947  0.60
1924  0.80              1936  1.11
1925  0.66              1937  0.64

Solution: The mean and standard deviation of annual maximum rainfalls at Chicago

are x̄ = 0.649 inch and s = 0.177 inch, respectively. For T = 5, equation (5.7.8) gives

    K_T = -\frac{\sqrt{6}}{\pi}\left\{ 0.5772 + \ln\left[ \ln\left( \frac{T}{T-1} \right) \right] \right\}
        = -\frac{\sqrt{6}}{\pi}\left\{ 0.5772 + \ln\left[ \ln\left( \frac{5}{5-1} \right) \right] \right\}
        = 0.719

By (5.7.3)

    x_T = \bar{x} + K_T s = 0.649 + 0.719 \times 0.177 = 0.78 in
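A minimal sketch (assumed helper name, not from the text) of the Extreme Value Type I frequency factor of equation (5.7.8), applied to the statistics of Example 5.7.2.

```python
import math

def ev1_frequency_factor(T):
    return -(math.sqrt(6) / math.pi) * (0.5772 + math.log(math.log(T / (T - 1))))

x_bar, s = 0.649, 0.177            # Chicago annual maximum rainfall statistics (inch)
K5 = ev1_frequency_factor(5)       # ~0.719
print(round(K5, 3), round(x_bar + K5 * s, 2))   # 5-year rainfall ~0.78 inch
```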

5.7.3 Log-Pearson Type III Distribution

For this distribution, the first step is to take the logarithms of the hydrologic data, y = log x. Usually logarithms to base 10 are used. The mean ȳ, standard deviation s_y, and coefficient of skewness C_s are calculated for the logarithms of the data. The frequency factor depends on the return period T and the coefficient of skewness C_s. When C_s = 0, the frequency factor is equal to the standard normal variable z. When C_s ≠ 0, K_T is approximated by Kite (1977) as

    K_T = z + (z^2 - 1)k + \frac{1}{3}(z^3 - 6z)k^2 - (z^2 - 1)k^3 + z k^4 + \frac{1}{3}k^5        (5.7.10)

where k = C_s/6.

The value of z for a given return period can be calculated by the procedure used in Example 5.7.1. In standard text books, tables are given for the values of the frequency factor of the log-Pearson Type III distribution for various values of the return period and coefficient of skewness.

Example 5.7.3 Calculate the 5- and 50-year return period annual maximum discharges of the Guadalupe River near Victoria, Texas, using the lognormal and log-Pearson Type III distributions. The data in cfs from 1935 to 1978 are given below (columns give the decade, rows the final digit of the year). (Chow et al., 1988, p. 393)

Year   1930      1940     1950     1960     1970
0                55900    13300    23700     9190
1                58000    12300    55800     9740
2                56000    28400    10800    58500
3                 7710    11600     4100    33100
4                12300     8560     5720    25200
5      38500     22000     4950    15000    30200
6     179000     17900     1730     9790    14100
7      17200     46000    25300    70000    54500
8      25400      6970    58300    44300    12700
9       4940     20600    10100    15200


Solution: The logarithms of the discharge values are taken and their statistics

calculated: ȳ = 4.2743, s_y = 0.4027, C_s = −0.0696.

Lognormal distribution: The frequency factor can be obtained from equation (5.7.7). For T = 50 years, K_T was computed in Example 5.7.1 as K_50 = 2.054. By (5.7.4)

    y_{50} = \bar{y} + K_T s_y = 4.2743 + 2.054 \times 0.4027 = 5.101

Then x_50 = (10)^{5.101} = 126,300 cfs.

Similarly, K_5 = 0.842, y_5 = 4.2743 + 0.842 × 0.4027 = 4.6134, and x_5 = (10)^{4.6134} = 41,060 cfs.

Log-Pearson Type III distribution: For C_s = −0.0696, the value of K_50 is obtained by interpolation from a standard table or by equation (5.7.10). By interpolation with T = 50 yrs:

    K_{50} = 2.054 + \frac{2.00 - 2.054}{-0.1 - 0}(-0.0696 - 0) = 2.016

So y_50 = ȳ + K_50 s_y = 4.2743 + 2.016 × 0.4027 = 5.0863 and x_50 = (10)^{5.0863} = 121,990 cfs. By a similar calculation, K_5 = 0.845, y_5 = 4.6146, and x_5 = 41,170 cfs.

The results for the estimated annual maximum discharges (cfs) are:

                                         Return period
                                    5 years        50 years
Lognormal (Cs = 0)                   41,060         126,300
Log-Pearson Type III (Cs = -0.07)    41,170         121,990

It can be seen that the effect of including the small negative coefficient of skewness in the calculations is to alter slightly the estimated flow, with that effect being more pronounced at T = 50 years than at T = 5 years. Another feature of the results is that the 50-year return period estimates are about three times as large as the 5-year return period estimates; for this example, the increase in the estimated flood discharges is less than proportional to the increase in return period.
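A minimal sketch (assumed helper names, not from the text) reproducing Example 5.7.3 with equations (5.7.7) and (5.7.10) rather than table interpolation; the Kite approximation gives K_50 close to the interpolated value 2.016.

```python
import math

def z_from_T(T):
    p = 1.0 / T
    w = math.sqrt(math.log(1.0 / p ** 2))
    return w - (2.515517 + 0.802853 * w + 0.010328 * w ** 2) / (
        1 + 1.432788 * w + 0.189269 * w ** 2 + 0.001308 * w ** 3)

def lp3_frequency_factor(T, Cs):
    z, k = z_from_T(T), Cs / 6.0                       # equation (5.7.10), k = Cs/6
    return (z + (z**2 - 1) * k + (z**3 - 6 * z) * k**2 / 3
            - (z**2 - 1) * k**3 + z * k**4 + k**5 / 3)

y_bar, s_y, Cs = 4.2743, 0.4027, -0.0696               # statistics of log10 discharges (from the example)
for T in (5, 50):
    x_logn = 10 ** (y_bar + z_from_T(T) * s_y)                  # lognormal (Cs = 0)
    x_lp3 = 10 ** (y_bar + lp3_frequency_factor(T, Cs) * s_y)   # log-Pearson Type III
    print(T, round(x_logn), round(x_lp3))   # roughly 41,000 cfs at T = 5 and 122,000-126,000 cfs at T = 50
```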

5.7.4 Treatment of Zeros

Most hydrologic variables are bounded on the left by zero. A zero in a set of data that

is being logarithmically transformed requires special handling. One solution is to add

a small constant to all of the observations. Another method is to analyze the nonzero

values and then adjust the relation to the full period of record. This method biases the

results as the zero values are essentially ignored. A third and theoretically more sound

method would be to use the theorem of total probability:

    prob(X > x) = prob(X ≥ x | X = 0) prob(X = 0) + prob(X ≥ x | X ≠ 0) prob(X ≠ 0)

Since prob(X ≥ x | X = 0) is zero, the relationship reduces to

    prob(X > x) = prob(X ≥ x | X ≠ 0) prob(X ≠ 0)

In this relationship prob(X ≠ 0) would be estimated by the fraction of non-zero values and prob(X ≥ x | X ≠ 0) would be estimated by a standard analysis of the non-zero values with the sample size taken to be equal to the number of non-zero values. This relation can be written as a function of cumulative probability distributions:

    1 - F(x) = k[1 - G(x)]   or
    F(x) = 1 - k + kG(x)        (5.7.11)

where F(x) is the cumulative probability distribution of all X (i.e. prob(X ≤ x | X ≥ 0)), k is the probability that X is not zero, and G(x) is the cumulative probability distribution of the non-zero values of X (i.e. prob(X ≤ x | X ≠ 0)).

Equation 5.7.11 can be used to estimate the magnitude of an event with return period T_X(X) by solving first for G(x) and then using the inverse transformation of G(x) to get the value of X. For example, the 10-year event with k = 0.95 is found to be the value of X satisfying

    G(x) = [F(x) - 1 + k]/k = (0.9 - 1 + 0.95)/0.95 = 0.89

Note that it is possible to generate negative estimates for G(x) from equation 5.7.11. For example, if k = 0.50 and F(x) = 0.05, the estimated G(x) is

    G(x) = [F(x) - 1 + k]/k = (0.05 - 1 + 0.50)/0.5 = -0.9

This merely means that the value of X corresponding to F(x) = 0.05 is zero.
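A minimal sketch (assumed helper name, not from the text) of the conditional-probability adjustment of equation (5.7.11) for records containing zeros.

```python
def conditional_cdf(F_x, k):
    """Return G(x) = [F(x) - 1 + k] / k; a result <= 0 means the event magnitude is zero."""
    return (F_x - 1 + k) / k

# 10-year event (F = 1 - 1/10 = 0.9) when 95% of the record is non-zero (k = 0.95):
print(round(conditional_cdf(0.9, 0.95), 2))    # 0.89, as in the example above
# A negative result, e.g. conditional_cdf(0.05, 0.50) = -0.9, corresponds to a zero value.
```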
