VIETNAM NATIONAL UNIVERSITY
UNIVERSITY OF SCIENCE
FACULTY OF MATHEMATICS, MECHANICS AND INFORMATICS
Do Dai Chi
EXTREME VALUES AND PROBABILITY DISTRIBUTION FUNCTIONS
ON FINITE DIMENSIONAL SPACES
Undergraduate Thesis
Advanced Undergraduate Program in Mathematics
Hanoi - 2012
VIETNAM NATIONAL UNIVERSITY
UNIVERSITY OF SCIENCE
FACULTY OF MATHEMATICS, MECHANICS AND INFORMATICS
Do Dai Chi
EXTREME VALUES AND PROBABILITY DISTRIBUTION FUNCTIONS
ON FINITE DIMENSIONAL SPACES
Undergraduate Thesis
Advanced Undergraduate Program in Mathematics
Thesis advisor: Assoc.Prof.Dr. Ho Dang Phuc
Hanoi - 2012
Acknowledgments
It would not have been possible to write this undergraduate thesis without the help
and support of the kind people around me, only some of whom it is possible to
give particular mention here.
This thesis would not have been possible without the help, support and patience of
my advisor, Assoc. Prof. Dr. Ho Dang Phuc, not to mention his advice and unsurpassed
knowledge of probability and statistics. His advice, support and friendship have been
invaluable on both an academic and a personal level, for which I am extremely grateful.
I would like to show my gratitude to my teachers at the Faculty of Mathematics,
Mechanics and Informatics, University of Science, Vietnam National University, who
equipped me with essential mathematical knowledge during my first four years at the
university.
I would like to thank my parents for their personal support and great patience at all
times. My parents have given me their unequivocal support throughout, as always,
for which my mere expression of thanks likewise does not suffice.
Last, but by no means least, I thank my friends in K53-Advanced Math for their
support and encouragement throughout.
List of abbreviations and symbols
Here is a glossary of miscellaneous symbols, in case you need a reference guide.
∼        f(x) ∼ g(x) as x → x_0 means that lim_{x→x_0} f(x)/g(x) = 1.
→d       X_n →d X: convergence in distribution.
→P       X_n →P X: convergence in probability.
→a.s.    X_n →a.s. X: almost sure convergence.
→v       µ_n →v µ: vague convergence.
=d       X =d Y: X and Y have the same distribution.
o(1)     f(x) = o(g(x)) as x → x_0 means that lim_{x→x_0} f(x)/g(x) = 0.
f←       The generalized inverse of a monotone function f, defined by f←(x) = inf{y : f(y) ≥ x}.
Λ(x)     Gumbel distribution.
Φ_α(x)   Frechet distribution.
Ψ_α(x)   Weibull distribution.
x_F      x_F = sup{x ∈ R : F(x) < 1}, the right endpoint of F.
f(x−)    f(x−) = lim_{y↑x} f(y).
[F > 0]  The set {x : F(x) > 0}.
M_+(E)   The space of nonnegative Radon measures on E.
C(f)     The set of points at which the function f is continuous.
d.f. Distribution function.
r.v. Random variable.
DOA Domain of attraction.
Contents
Acknowledgments
List of abbreviations and symbols
Introduction
Chapter 1. Univariate Extreme Value Theory
1.1. Introduction
1.1.1. Limit Probabilities for Maxima
1.2. Maximum Domains of Attraction
1.2.1. Max-Stable Distributions
1.3. Extremal Value Distributions
1.3.1. Extremal Types Theorem
1.3.2. Generalized Extreme Value Distributions
1.4. Domain of Attraction Condition
1.4.1. General Theory of Domains of Attraction
1.5. Condition for belonging to Extreme Value Domain
Chapter 2. Multivariate Extreme Value Theory
2.1. Introduction
2.2. Limit Distributions of Multivariate Maxima
2.2.1. Max-infinitely Divisible Distributions
2.2.2. Characterizing Max-id Distributions
2.3. Multivariate Domain of Attraction
2.3.1. Max-stability
2.4. Basic Properties of Multivariate Extreme Value Distributions
2.5. Standardization
Conclusion
Chapter A. Appendix
A.1. Modes of Convergence
A.2. Inverses of Monotone Functions
A.3. Some Convergence Theorems
Bibliography
Introduction
Extreme value theory developed from an interest in studying the behavior of the
maximum or minimum (extremes) of independent and identically distributed ran-
dom variables. Historically, the study of extremes can be dated back to Nicholas
Bernoulli, who studied the mean largest distance from the origin to n points scattered
randomly on a straight line of some fixed length (Gumbel, 1958 [15]). Extreme
value theory provides important applications in finance, risk management, telecommunication,
environmental and pollution studies, and other fields. In this thesis, we
study the probabilistic approach to extreme value theory. The thesis is divided into
two chapters, namely,
Chapter 1: Univariate Extreme Value Theory.
Chapter 2: Multivariate Extreme Value Theory.
Chapter 1 introduces the basic concepts of univariate extreme value theory.
This chapter is concerned with the limit problem of determining the possible limits of
sample extremes and with the domain of attraction problem.
Chapter 2 provides basic results in multivariate extreme value theory. We deal
with the probabilistic aspects of multivariate extreme value theory, covering the
possible limits and their domains of attraction.
The main materials of the thesis were taken from the books by M. R. Leadbetter, G.
Lindgren, and H. Rootzen [16], Resnick [18], Embrechts [12], and de Haan and Ana
Ferreira [11]. We have also borrowed extensively from the lecture notes of Bikramjit
Dass [9].
CHAPTER 1
Univariate Extreme Value Theory
This chapter is primarily concerned with the central result of classical extreme value
theory, the Extremal Types Theorem, which specifies the possible forms for the limit-
ing distribution of maxima in sequences of independent and identically distributed
(i.i.d.) random variables (r.v.s). In the derivation, the possible limiting distributions
are identified with a class having a certain stability property, the so-called max-stable
distributions. It is further shown that this class consists precisely of the three families
known (loosely) as the three extreme value distributions.
1.1. Introduction
The asymptotic theory of sample extremes has been developed in parallel with the
central limit theory, and in fact the two theories bear some resemblance.
Let X1, X2, . . . , Xn be i.i.d. random variables. The central limit theory is concerned
with the limit behavior of the partial sums Sn = X1 + X2 + · · · + Xn as n → ∞,
whereas the theory of sample extremes is concerned with the limit behavior of the
sample extremes max(X1, X2, . . . , Xn) or min(X1, . . . , Xn) as n→ ∞.
We consider some basic theory for sums of independent random variables. This in-
cludes classical results such as the strong law of large numbers and the Central Limit
Theorem. Throughout this chapter X1, X2, . . . is a sequence of i.i.d. non-degenerate
real random variables defined on a probability space (Ω,F , P) with common distri-
bution function (d.f.) F. We consider the partial sums
S_n = X_1 + ··· + X_n, n ≥ 1,
and the sample means
X̄_n = n^{−1}S_n = S_n/n, n ≥ 1.
Let X be a random variable and denote the expectation and variance of X by E(X) = µ
and Var(X) = σ². Firstly, we assume that E(X) = µ < ∞. From the strong law of
large numbers, we get
X̄_n = n^{−1}S_n →a.s. µ.
With the additional assumption Var(X_1) = σ² < ∞, we get the Central Limit Theorem:
(S_n − nµ)/(σ√n) →d Z, Z ∼ N(0, 1).
Hence for large n, we can approximate
P(S_n ≤ x) ≈ P(Z ≤ (x − nµ)/(σ√n)).
Taking an alternative approach, we can deal with the problem of finding possi-
ble limit distributions for (say) sample maxima of independent and identically dis-
tributed random variables.
1.1.1. Limit Probabilities for Maxima
Whereas above we introduced ideas on partial sums, in this section we investigate
the fluctuations of the sample maxima:
M_n = ⋁_{i=1}^n X_i = max(X_1, ..., X_n), n ≥ 1.
Remark 1.1. Corresponding results for minima can easily be obtained from those for
maxima by using the identity
min(X1, . . . , Xn) = −max(−X1, . . . ,−Xn).
We shall therefore discuss minima explicitly only briefly in this work, except where
their joint distribution with M_n is considered.
We have the exact d.f. of the maximum M_n: for x ∈ R, n ∈ N,
P(M_n ≤ x) = P(X_1 ≤ x, ..., X_n ≤ x) = ∏_{i=1}^n P(X_i ≤ x) = F^n(x). (1.1)
Extreme events happen 'near' the upper end of the support of the distribution. We
denote the right endpoint of F by
x_F = sup{x ∈ R : F(x) < 1}. (1.2)
That is, F(x) < 1 for all x < x_F and F(x) = 1 for all x ≥ x_F. We immediately obtain
P(M_n ≤ x) = F^n(x) → 0, n → ∞, for all x < x_F,
P(M_n ≤ x) = F^n(x) → 1, n → ∞, in the case x_F < ∞, for all x ≥ x_F.
Therefore the limit distribution lim_{n→∞} F^n(x) is degenerate. Thus M_n →P x_F as
n → ∞ when x_F < ∞. Since the sequence (M_n) is non-decreasing in n, it converges
almost surely (a.s.), no matter whether the limit is finite or infinite, and hence we
conclude that
M_n →a.s. x_F, n → ∞.
This result is quite uninformative for our purpose and does not answer the basic
question in our mind. This difficulty is avoided by allowing a linear renormalization
of the variable M_n:
M*_n = (M_n − b_n)/a_n,
for sequences of constants a_n > 0 and b_n ∈ R.
Definition 1.1. A univariate distribution function F belongs to the maximum domain
of attraction of a distribution function G if
1. G is a non-degenerate distribution;
2. there exist sequences a_n > 0, b_n ∈ R such that
P((M_n − b_n)/a_n ≤ x) = F^n(a_n x + b_n) →d G(x). (1.3)
Finding the limit distribution G(x) is called the Extremal Limit Problem. Finding the
d.f.'s F(x) that admit sequences of constants as described above leading to G(x) is
called the Domain of Attraction Problem.
For large n, we can approximate P(M_n ≤ x) ≈ G((x − b_n)/a_n). We write F ∈ D(G). We
often omit the word 'maximum' and abbreviate domain of attraction as DOA.
Now we are faced with certain questions:
1. Given any F, does there exist G such that F ∈ D(G)?
2. Given any F, if G exists, is it unique?
3. Can we characterize the class of all possible limits G according to Definition 1.1?
4. Given a limit G, what properties should F have so that F ∈ D(G)?
5. How can we compute a_n, b_n?
The goal of the next section is to answer the above questions.
1.2. Maximum Domains of Attraction
Let's consider probabilities of the form
P((M_n − b_n)/a_n ≤ x),
which may be rewritten as
P(M_n ≤ u_n),
where u_n = u_n(x) = a_n x + b_n. In order to get more insight into the asymptotic
behavior of Mn we have to investigate the following aspects:
1. Conditions on F that ensure the existence of the limit of P(M_n ≤ u_n) as
n → ∞, and appropriate constants u_n.
2. Possible limit laws for the (centered and normalized) maxima Mn (comparable
to the Central Limit Theorem).
Example 1.1. Let X be a standard exponential random variable. Then the distribution
function of X is given by
F_X(x) = 1 − e^{−x}, x > 0.
If X_1, X_2, ... are i.i.d. random variables with common distribution function F, then
P(M_n ≤ x + log n) = (1 − e^{−x−log n})^n = (1 − e^{−x}/n)^n → exp{−e^{−x}} =: Λ(x), x ∈ R.
The limit distribution Λ(x) is called the Gumbel distribution. So we obtain that the
Gumbel distribution is a possible limit distribution according to Definition 1.1.
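The convergence in Example 1.1 is easy to visualize by simulation. The following sketch is not part of the original thesis; it assumes numpy is available and simply compares the empirical distribution of M_n − log n with Λ(x):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 5000
# maxima of n i.i.d. Exp(1) variables, centered by b_n = log n (a_n = 1)
samples = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)

x = 1.0
print((samples <= x).mean())      # empirical P(M_n - log n <= x)
print(np.exp(-np.exp(-x)))        # Gumbel limit Λ(x) = exp(-e^{-x}) ≈ 0.692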
The following Theorem provides a partial answer to question 1.
Theorem 1.1 (Poisson approximation). For given τ ∈ [0, ∞] and a sequence (u_n) of real
numbers, the following two conditions are equivalent:
nF̄(u_n) → τ as n → ∞, (1.4)
P(M_n ≤ u_n) → e^{−τ} as n → ∞, (1.5)
where F̄ = 1 − F.
Proof. Suppose first that 0 ≤ τ < ∞. If (1.4) holds, then
P(M_n ≤ u_n) = F^n(u_n) = (1 − F̄(u_n))^n = (1 − τ/n + o(1/n))^n,
so that (1.5) follows at once.
Conversely, if (1.5) holds (0 ≤ τ < ∞), we must have
F̄(u_n) = 1 − F(u_n) → 0
(otherwise, F̄(u_{n_k}) would be bounded away from 0 for some subsequence (n_k), and
P(M_{n_k} ≤ u_{n_k}) = (1 − F̄(u_{n_k}))^{n_k} would imply P(M_{n_k} ≤ u_{n_k}) → 0). By taking
logarithms in (1.5), we have
−n ln(1 − F̄(u_n)) → τ.
Since −ln(1 − x) ∼ x as x → 0, this implies nF̄(u_n) = τ + o(1), giving (1.4).
If τ = ∞ and (1.4) holds but (1.5) does not, there must be a subsequence (n_k) such that
P(M_{n_k} ≤ u_{n_k}) → exp{−τ′}
as k → ∞ for some τ′ < ∞. But then, by the first part of the proof applied along this
subsequence, n_k F̄(u_{n_k}) → τ′ < ∞, contradicting (1.4) with τ = ∞.
Similarly, (1.5) implies (1.4) for τ = ∞.
Example 1.2. We consider the distribution function F given by
F(x) = 1 − e^{1/x} for x < 0, F(x) = 1 for x ≥ 0.
By Theorem 1.1, if (u_n) is such that
ne^{1/u_n} → τ > 0 as n → ∞,
it follows that
P(M_n ≤ u_n) → e^{−τ} as n → ∞.
By writing τ = e^{−x} (−∞ < x < ∞) and taking u_n = (log τ − log n)^{−1}, it follows that
P(M_n ≤ −(log n + x)^{−1}) → exp(−e^{−x}),
from which it is readily checked that
P((log n)²[M_n + 1/log n] ≤ x + o(1)) → exp(−e^{−x}),
giving the Gumbel distribution with
a_n = (log n)^{−2}, b_n = −(log n)^{−1}.
We denote f(x−) = lim_{y↑x} f(y) and p(x) = F(x) − F(x−).
Theorem 1.2. Let F be a d.f. with right endpoint x_F ≤ ∞ and let τ ∈ (0, ∞). There exists
a sequence (u_n) satisfying nF̄(u_n) → τ if and only if
lim_{x↑x_F} F̄(x)/F̄(x−) = 1, (1.6)
or equivalently, if and only if
lim_{x↑x_F} p(x)/F̄(x−) = 0. (1.7)
Hence, by Theorem 1.1, if 0 < ρ < 1, there is a sequence (u_n) such that P(M_n ≤ u_n) → ρ
if and only if (1.6) (or (1.7)) holds. For ρ = 0 or 1, such a sequence may always be found.
Proof. We suppose that (1.4) holds for some 0 < τ < ∞ but that, say, (1.7) does not.
Then there exist ε > 0 and a sequence (x_n) such that x_n → x_F and
p(x_n) ≥ 2εF̄(x_n−). (1.8)
Now choose a sequence of integers (n_j) so that 1 − τ/n_j is "close" to the midpoint of
the jump of F at x_j, i.e. such that
1 − τ/n_j ≤ (F(x_j−) + F(x_j))/2 ≤ 1 − τ/(n_j + 1).
Clearly we have either
(i) u_{n_j} < x_j for infinitely many values of j, or
(ii) u_{n_j} ≥ x_j for infinitely many j-values.
If alternative (i) holds, then for such j,
n_j F̄(u_{n_j}) ≥ n_j F̄(x_j−). (1.9)
Now, clearly
n_j F̄(x_j−) = τ + n_j[(1 − τ/n_j) − (F(x_j) + F(x_j−))/2 + p(x_j)/2]
≥ τ + n_j p(x_j)/2 − n_j(τ/n_j − τ/(n_j + 1))
≥ τ + εn_j F̄(x_j−) − τ/(n_j + 1)
by (1.8), so that
(1 − ε)n_j F̄(x_j−) ≥ τ − τ/(n_j + 1).
Since clearly n_j → ∞, it follows (since τ ∈ (0, ∞) by assumption) that
lim inf_{j→∞} n_j F̄(x_j−) ≥ τ/(1 − ε) > τ,
and hence by (1.9),
lim sup_{j→∞} n_j F̄(u_{n_j}) > τ,
which contradicts (1.4). The calculations in case (ii) (u_{n_j} ≥ x_j for infinitely many j)
are very similar, with only the obvious changes.
Conversely, suppose that (1.6) holds and let (u_n) be any sequence such that
F(u_n−) ≤ 1 − τ/n ≤ F(u_n) (e.g. u_n = F←(1 − τ/n)), from which a simple rearrangement yields
(F̄(u_n)/F̄(u_n−)) τ ≤ nF̄(u_n) ≤ τ,
from which (1.4) follows, since clearly u_n → x_F as n → ∞.
The result applies in particular to discrete distributions with infinite right endpoint.
If the jump heights of the d.f. do not decay sufficiently fast, then a non-degenerate
limit distribution for maxima does not exist.
Example 1.3 (Poisson distribution). Let X be a Poisson r.v. with expectation λ > 0, i.e.
P(X = k) = e^{−λ}λ^k/k!, k ∈ N.
Then,
F̄(k)/F̄(k−1) = 1 − (F(k) − F(k−1))/F̄(k−1) = 1 − (λ^k/k!)(Σ_{r=k}^∞ λ^r/r!)^{−1}
= 1 − (1 + Σ_{r=k+1}^∞ (k!/r!)λ^{r−k})^{−1}.
The latter sum can be estimated as
Σ_{s=1}^∞ λ^s/((k+1)(k+2)···(k+s)) ≤ Σ_{s=1}^∞ (λ/k)^s = (λ/k)/(1 − λ/k), k > λ,
which tends to 0 as k → ∞, so that F̄(k)/F̄(k−1) → 0.
Hence, by virtue of Theorem 1.2, we see that no non-degenerate distribution can
be the limit of normalized maxima taken from a sequence of random variables
identically distributed as X.
Example 1.4 (Geometric distribution). We consider the random variable X with geometric
distribution:
P(X = k) = p(1 − p)^{k−1}, 0 < p < 1, k ∈ N.
For this distribution, we have
F̄(k)/F̄(k−1) = 1 − p(1 − p)^{k−1}(Σ_{r=k}^∞ p(1 − p)^{r−1})^{−1} = 1 − p ∈ (0, 1).
By the same argument as above, no limit P(M_n ≤ u_n) → ρ exists except for ρ = 0
or 1, which implies that there is no non-degenerate limit distribution for the maxima
in the geometric distribution case.
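Numerically, the obstruction is easy to see (an illustration, not from the thesis; it assumes numpy): the possible values of nF̄(k) decrease by the fixed factor 1 − p as k increases by 1, so they cannot approach an arbitrary τ ∈ (0, ∞):

import numpy as np

p, n = 0.5, 1000
k = np.arange(1, 21)
tail = (1 - p) ** k        # F̄(k) = P(X > k) = (1 - p)^k for the geometric law
print(n * tail[7:12])      # consecutive values of n F̄(k) jump by the factor 1 - p
                           # and therefore skip over most levels τ in (0, ∞)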
Example 1.5 (Negative binomial distribution). Let X be the random variable with
P(X = k) = C_{i+k−1}^{k−1} p^i (1 − p)^{k−1}, k ∈ N, 0 < p < 1, i > 0.
Using properties of the binomial coefficients we obtain
F̄(k)/F̄(k−1) = 1 − (F(k) − F(k−1))/F̄(k−1) ≤ 1 − p ∈ (0, 1),
i.e. no limit P(M_n ≤ u_n) → ρ exists except for ρ = 0 or 1.
Definition 1.2. Suppose that H : R → R is a non-decreasing function. The generalized
inverse of H is given by
H←(x) = inf{y : H(y) ≥ x}.
Properties of the generalized inverse are given in Appendix A.2.
Lemma 1.1. (i) For H as above, if a > 0, b and c are constants, and T(x) = H(ax + b) − c,
then T←(y) = a^{−1}[H←(y + c) − b].
(ii) If F is a non-degenerate d.f., there exist y_1 < y_2 such that F←(y_1) < F←(y_2) are well
defined (and finite).
Proof. (i) We have
T←(y) = inf{x : H(ax + b) − c ≥ y}
= a^{−1}[inf{ax + b : H(ax + b) ≥ y + c} − b]
= a^{−1}[H←(y + c) − b],
as required.
(ii) If F is non-degenerate, there exist x′_1 < x′_2 such that
0 < F(x′_1) = y_1 < F(x′_2) = y_2 ≤ 1.
Clearly x_1 = F←(y_1) and x_2 = F←(y_2) are both well defined. Also F←(y_2) ≥ x′_1, and
equality would require F(z) ≥ y_2 for all z > x′_1, so that
F(x′_1) = lim_{ε↓0} F(x′_1 + ε) = F(x′_1+) ≥ y_2,
contradicting F(x′_1) = y_1. Thus F←(y_2) > x′_1 ≥ x_1 = F←(y_1), as required.
For any function H denote
C(H) = {x ∈ R : H is finite and continuous at x}.
If two r.v.s X and Y have the same distribution, we write X =d Y.
Definition 1.3. Two distribution functions U(x) and V(x) are of the same type if for
some A > 0, B ∈ R
V(x) = U(Ax + B)
for all x.
In terms of random variables, if X has distribution U and Y has distribution V, then
Y =d (X − B)/A.
Example 1.6. Let N(0, 1, x) denote the normal distribution function with mean 0 and
variance 1. Then it is easy to see that N(µ, σ², x) = N(0, 1, (x − µ)/σ) for σ > 0, µ ∈ R.
Thus all normal d.f.'s are of the same type, called the normal type. If X_{0,1} has
N(0, 1, x) as its distribution and X_{µ,σ} has N(µ, σ², x) as its distribution, then
X_{µ,σ} =d σX_{0,1} + µ.
Now we state the theorem developed by Gnedenko and Khintchin.
Theorem 1.3 (Convergence to types theorem). (a) Suppose U(x) and V(x) are two non-degenerate
distribution functions. Suppose for n ≥ 1, F_n is a distribution function, a_n > 0, b_n ∈ R,
α_n > 0, β_n ∈ R, and
F_n(a_n x + b_n) →d U(x), F_n(α_n x + β_n) →d V(x). (1.10)
Then, as n → ∞,
α_n/a_n → A > 0, (β_n − b_n)/a_n → B ∈ R, (1.11)
and
V(x) = U(Ax + B). (1.12)
An equivalent formulation in terms of random variables:
(a′) Let X_n, n ≥ 1, be random variables with distribution functions F_n, and let U, V be
random variables with distribution functions U(x), V(x). If
(X_n − b_n)/a_n →d U, (X_n − β_n)/α_n →d V, (1.13)
then (1.11) holds and
V =d (U − B)/A. (1.14)
(b) Conversely, if (1.11) holds, then either of the two relations in (1.10) (or (1.13))
implies the other, and (1.12) (or (1.14)) holds.
Proof. (b) Suppose (1.11) holds and Y_n := (X_n − b_n)/a_n →d U. We must show that
(X_n − β_n)/α_n →d (U − B)/A.
By Skorohod's Theorem (see Appendix A.3), there exist Y*_n, U*, n ≥ 1, defined on
([0, 1], B[0, 1], m) (the Lebesgue probability space, m being Lebesgue measure) such that
Y*_n =d Y_n, U* =d U, Y*_n →a.s. U*.
Put X*_n := a_n Y*_n + b_n, so X*_n =d X_n. Then
(X_n − β_n)/α_n =d (X*_n − β_n)/α_n = (a_n/α_n)Y*_n + (b_n − β_n)/α_n → U*/A − B/A =d (U − B)/A,
which means (X_n − β_n)/α_n →d (U − B)/A.
(a) By Proposition A.2 (see Appendix A.2), if G_n →d G then also G←_n → G← weakly,
so the relations in (1.10) can be inverted to give
(F←_n(y) − b_n)/a_n → U←(y), y ∈ C(U←), (1.15)
(F←_n(y) − β_n)/α_n → V←(y), y ∈ C(V←), (1.16)
weakly. Since neither U(x) nor V(x) concentrates at one point, we can find points
y_1 < y_2 with y_i ∈ C(U←) ∩ C(V←), i = 1, 2, satisfying
−∞ < U←(y_1) < U←(y_2) < ∞
and
−∞ < V←(y_1) < V←(y_2) < ∞.
Therefore from (1.15) and (1.16) we have, for i = 1, 2,
(F←_n(y_i) − b_n)/a_n → U←(y_i), (F←_n(y_i) − β_n)/α_n → V←(y_i), (1.17)
and by subtraction
(F←_n(y_2) − F←_n(y_1))/a_n → U←(y_2) − U←(y_1) > 0,
(F←_n(y_2) − F←_n(y_1))/α_n → V←(y_2) − V←(y_1) > 0. (1.18)
Divide the first relation in (1.18) by the second to obtain
α_n/a_n → (U←(y_2) − U←(y_1))/(V←(y_2) − V←(y_1)) =: A > 0.
Using this and (1.17) we get
(F←_n(y_1) − b_n)/a_n → U←(y_1),
(F←_n(y_1) − β_n)/a_n = ((F←_n(y_1) − β_n)/α_n) · (α_n/a_n) → A·V←(y_1),
and so subtracting yields
(β_n − b_n)/a_n → U←(y_1) − A·V←(y_1) =: B.
This gives (1.11), and (1.12) follows from (b).
Remark 1.2. (a) The answer to question 2 is quite clear from Theorem 1.3. Namely,
if F ∈ D(G_1) and F ∈ D(G_2), then G_1 and G_2 must be of the same type.
(b) The theorem shows that when
(X_n − b_n)/a_n →d U
and U is non-constant, a suitable choice of the normalizing constants is
a_n = F←_n(y_2) − F←_n(y_1), b_n = F←_n(y_1).
1.2.1. Max-Stable Distributions
In this section we answer the question: what are the possible (non-degenerate) limit
laws for the maxima M_n when properly normalised and centred?
Definition 1.4. A non-degenerate distribution function F is max-stable if for
X_1, X_2, ..., X_n i.i.d. with d.f. F there exist a_n > 0, b_n ∈ R such that
M_n =d a_n X_1 + b_n.
Example 1.7. If X_1, X_2, ... is a sequence of independent standard exponential Exp(1)
variables, F(x) = 1 − e^{−x} for x > 0. Taking a_n = 1 and b_n = log n, we have
P((M_n − b_n)/a_n ≤ x) = F^n(x + log n) = [1 − e^{−(x+log n)}]^n = [1 − n^{−1}e^{−x}]^n → exp(−e^{−x})
as n → ∞, for each fixed x ∈ R. Hence, with the chosen a_n and b_n, the limit
distribution of the normalized M_n as n → ∞ is the Gumbel distribution.
Example 1.8. If X_1, X_2, ... is a sequence of independent standard Frechet variables,
F(x) = exp(−1/x) for x > 0. For a_n = n and b_n = 0,
P((M_n − b_n)/a_n ≤ x) = F^n(nx) = [exp(−1/(nx))]^n = exp(−n/(nx)) = F(x)
for each fixed x > 0. Hence the limit in this case, which is an exact identity for all n
because of the max-stability of F, is again the standard Frechet distribution.
Example 1.9. If X_1, X_2, ... is a sequence of independent uniform U(0, 1) variables,
F(x) = x for 0 ≤ x ≤ 1. For fixed x < 0, suppose n > −x and let a_n = 1/n and b_n = 1.
Then,
P((M_n − b_n)/a_n ≤ x) = F^n(n^{−1}x + 1) = (1 + x/n)^n → e^x
as n → ∞. Hence the limit distribution is of Weibull type, showing that the Weibull
distributions also arise as limits of normalized maxima.
Theorem 1.4 (Limit property of max-stable laws). The class of all max-stable distribution
functions coincides with the class of all limit laws G for (properly normalised) maxima
of i.i.d. r.v.s (as given in (1.3)).
Proof. 1. If X_1, X_2, ... are i.i.d. with d.f. G, G is max-stable and M_n = ⋁_{i=1}^n X_i, then
M_n =d a_n X_1 + b_n
for some a_n > 0, b_n ∈ R. Then for all x ∈ R,
lim_{n→∞} P((M_n − b_n)/a_n ≤ x) = G(x).
2. Now suppose that H is non-degenerate and there exist a_n > 0, b_n ∈ R such that
lim_{n→∞} F^n(a_n x + b_n) = H(x).
We claim that H is max-stable. Observe that for all k ∈ N we have
lim_{n→∞} F^{nk}(a_n x + b_n) = H^k(x),
lim_{n→∞} F^{nk}(a_{nk} x + b_{nk}) = H(x).
By virtue of the Convergence to Types Theorem, there exist a*_k > 0, b*_k ∈ R such that
lim_{n→∞} a_{nk}/a_n = a*_k, lim_{n→∞} (b_{nk} − b_n)/a_n = b*_k
and
H(x) = H^k(a*_k x + b*_k).
Therefore if Y_1, ..., Y_k are i.i.d. from H, then for all k ∈ N,
Y_1 =d (⋁_{i=1}^k Y_i − b*_k)/a*_k,
which implies
⋁_{i=1}^k Y_i =d a*_k Y_1 + b*_k.
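For the Gumbel law the max-stability relation can be written down explicitly: Λ^k(x) = exp(−ke^{−x}) = Λ(x − log k), so a*_k = 1 and b*_k = log k. A one-line numerical confirmation (a sketch, assuming numpy):

import numpy as np

def gumbel(x):
    return np.exp(-np.exp(-x))   # Λ(x)

k, x = 5, 0.3
print(gumbel(x) ** k)            # Λ^k(x)
print(gumbel(x - np.log(k)))     # equals Λ(x - log k), confirming max-stability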
1.3. Extremal Value Distributions
1.3.1. Extremal Types Theorem
The extremal types theorem plays a central role in the study of extreme value theory.
In the literature, Fisher and Tippett (1928) were the first to discover the extremal
types theorem, and these results were later proved in complete generality by Gnedenko
(1943). Later, Galambos (1987), Leadbetter, Lindgren and Rootzen (1983), and
Resnick (1987) gave excellent reference books on the probabilistic aspects.
Theorem 1.5 (Fisher-Tippett (1928), Gnedenko (1943)). Suppose there exist sequences
a_n > 0 and b_n ∈ R, n ≥ 1, such that
(M_n − b_n)/a_n →d G,
where G is non-degenerate. Then G is of one of the following three types:
1. Type I, Gumbel: Λ(x) = exp{−e^{−x}}, x ∈ R.
2. Type II, Frechet: Φ_α(x) = 0 if x < 0; Φ_α(x) = exp{−x^{−α}} if x ≥ 0, for some α > 0.
3. Type III, Weibull: Ψ_α(x) = exp{−(−x)^α} if x < 0; Ψ_α(x) = 1 if x ≥ 0, for some α > 0.
Proof. For t ∈ R let [t] denote the greatest integer less than or equal to t. We
proceed in a sequence of steps.
Step (i). From P[(M_n − b_n)/a_n ≤ x] = F^n(a_n x + b_n) →d G(x), we get for any t > 0
F^{[nt]}(a_{[nt]} x + b_{[nt]}) →d G(x),
and on the other hand
F^{[nt]}(a_n x + b_n) = (F^n(a_n x + b_n))^{[nt]/n} → G^t(x).
Thus G^t and G are of the same type, and the convergence to types theorem gives
the existence of two functions α(t) > 0, β(t) ∈ R, t > 0, such that for all t > 0,
lim_{n→∞} a_n/a_{[nt]} = α(t), lim_{n→∞} (b_n − b_{[nt]})/a_{[nt]} = β(t), (1.19)
and also
G^t(x) = G(α(t)x + β(t)). (1.20)
Step (ii). We observe that the functions α(t) and β(t) are Lebesgue measurable. For
instance, to prove that α(·) is measurable, it suffices (since limits of measurable functions
are measurable) to show that the function
t ↦ a_n/a_{[nt]}
is measurable for each n. Since a_n does not depend on t, the previous statement is
true if the function
t ↦ a_{[nt]}
is measurable. Since this function has the countable range {a_j, j ≥ 1}, it suffices to
show that
{t > 0 : a_{[nt]} = a_j}
is measurable. But this set equals
⋃_{k : a_k = a_j} [k/n, (k + 1)/n),
which, being a union of intervals, is certainly a measurable set.
Step (iii). Facts about the Hamel equation (see [20]). We need to use facts about the
possible solutions of the functional equations called Hamel's equation and Cauchy's
equation. If f(x), x > 0, is finite, measurable and real valued and satisfies the Cauchy
equation
f(x + y) = f(x) + f(y), x > 0, y > 0,
then f is necessarily of the form
f(x) = cx, x > 0,
for some c ∈ R. A variant of this is Hamel's equation. If φ(x), x > 0, is finite,
measurable, real valued and satisfies Hamel's equation
φ(xy) = φ(x)φ(y), x > 0, y > 0,
then φ is of the form
φ(x) = x^ρ
for some ρ ∈ R.
Step (iv). Another useful fact. If F is a non-degenerate distribution function and
F(ax + b) = F(cx + d) for all x ∈ R,
for some a > 0, c > 0 and constants b, d, then a = c and b = d.
Indeed, choose y_1 < y_2 and −∞ < x_1 < x_2 < ∞ by (ii) of Lemma 1.1 so that
x_1 = F←(y_1), x_2 = F←(y_2). Taking inverses of F(ax + b) and F(cx + d) by (i) of
Lemma 1.1, we have
a^{−1}(F←(y) − b) = c^{−1}(F←(y) − d)
for all y. Applying this to y_1 and y_2 in turn, we obtain
a^{−1}(x_1 − b) = c^{−1}(x_1 − d) and a^{−1}(x_2 − b) = c^{−1}(x_2 − d),
from which it follows simply that a = c and b = d.
Step (v). Return to (1.20); for t > 0, s > 0 we have on the one hand
G^{ts}(x) = G(α(ts)x + β(ts)),
and on the other
G^{ts}(x) = (G^s(x))^t = G(α(s)x + β(s))^t = G(α(t)[α(s)x + β(s)] + β(t))
= G(α(t)α(s)x + α(t)β(s) + β(t)).
Since G is assumed non-degenerate, we therefore conclude by Step (iv) that for
t > 0, s > 0,
α(ts) = α(t)α(s), (1.21)
β(ts) = α(t)β(s) + β(t) = α(s)β(t) + β(s), (1.22)
the last step following by symmetry. We recognize (1.21) as the famous Hamel
functional equation. The only finite, measurable, nonnegative solutions are of the form
α(t) = t^{−θ}, θ ∈ R.
Step (vi). We will show that
β(t) = c log t if θ = 0, β(t) = c(1 − t^{−θ}) if θ ≠ 0,
for some c ∈ R.
If θ = 0, then α(t) = 1 and β(t) satisfies
β(ts) = β(t) + β(s).
So exp{β(·)} satisfies the Hamel equation, which implies that
exp{β(t)} = t^c
for some c ∈ R, and thus β(t) = c log t.
If θ ≠ 0, then
β(ts) = α(t)β(s) + β(t) = α(s)β(t) + β(s).
Fix s_0 ≠ 1, so that α(s_0) ≠ 1, and we get
α(t)β(s_0) + β(t) = α(s_0)β(t) + β(s_0),
and solving for β(t) we get
β(t)(1 − α(s_0)) = β(s_0)(1 − α(t)).
Note that 1 − α(s_0) ≠ 0. Thus we conclude
β(t) = (β(s_0)/(1 − α(s_0)))(1 − α(t)) =: c(1 − t^{−θ}).
Step (vii). We conclude that
G^t(x) = G(x + c log t) if θ = 0, (a)
G^t(x) = G(t^{−θ}x + c(1 − t^{−θ})) if θ ≠ 0. (b)
Now we show that θ = 0 corresponds to a limit distribution of type Λ(x), that the
case θ > 0 corresponds to a limit distribution of type Φ_α, and that θ < 0 corresponds
to Ψ_α.
Consider the case θ = 0 and examine the equation in (a). For fixed x, the function G^t(x)
is non-increasing in t. So c < 0, since otherwise the right side of (a) would not be
decreasing. If x_0 ∈ R were such that G(x_0) = 1, then
1 = G^t(x_0) = G(x_0 + c log t) for all t > 0,
which implies
G(y) = 1, y ∈ R,
and this contradicts G non-degenerate. If x_0 ∈ R were such that G(x_0) = 0, then
0 = G^t(x_0) = G(x_0 + c log t) for all t > 0,
which implies
G(x) = 0 for all x ∈ R,
again giving a contradiction. We conclude that 0 < G(y) < 1 for all y ∈ R.
In (a), set x = 0 and write G(0) = e^{−κ}. Then
e^{−tκ} = G(c log t).
Set y = c log t, so that t = e^{y/c}, and we get
G(y) = exp{−κe^{y/c}} = exp{−e^{−(y/|c| − log κ)}},
which is of the type of Λ(x).
We consider the case θ > 0 and examine the equation in (b):
G^t(x) = G(t^{−θ}x + c(1 − t^{−θ})) = G(t^{−θ}(x − c) + c),
i.e., changing variables,
G^t(x + c) = G(t^{−θ}x + c).
Set H(x) = G(x + c). Then G and H are of the same type, so it suffices to solve for
H. The function H satisfies
H^t(x) = H(t^{−θ}x) (1.23)
and H is non-degenerate. Set x = 0, and we get from (1.23)
t log H(0) = log H(0)
for all t > 0. So either log H(0) = 0 or −∞; i.e., either H(0) = 1 or 0. However,
H(0) = 1 is impossible, since it would imply the existence of x < 0 such that the
left side of (1.23) is decreasing in t while the right side of (1.23) is increasing in t.
Therefore we conclude H(0) = 0. Again from (1.23) we obtain
H^t(1) = H(t^{−θ});
if H(1) = 0, then H ≡ 0, and if H(1) = 1, then H ≡ 1, both statements contradicting
H non-degenerate. Therefore H(1) ∈ (0, 1). Set α = θ^{−1}, H(1) = exp{−ρ^{−α}} and
u = t^{−θ}, so that u^{−α} = t. From (1.23) with x = 1 we get, for u > 0,
H(u) = H(1)^t = exp{−ρ^{−α}u^{−α}} = exp{−(ρu)^{−α}} = Φ_α(ρu).
The case θ < 0 is handled similarly and leads to Ψ_α.
In words, the extremal types theorem says that for a sequence of i.i.d. random variables
with suitable normalizing constants, the limiting distribution of the maximum,
if it exists, follows one of three types of extreme value distributions, labeled I, II
and III. Collectively, these three classes of distributions are termed the extreme value
distributions, with types I, II and III widely known as the Gumbel, Frechet and
Weibull families respectively. Each family has a location and a scale parameter,
b and a respectively; additionally, the Frechet and Weibull families have a shape
parameter α.
Remark 1.3. (a) Though for modelling purposes the types of Λ, Φ_α and Ψ_α are very
different, from a mathematical point of view they are closely linked. Indeed, one
immediately verifies the following properties. Suppose X > 0; then
X ∼ Φ_α ⇔ −1/X ∼ Ψ_α ⇔ log X^α ∼ Λ.
(b) We have shown that:
class of extreme value distributions = class of max-stable distributions = class of
distributions appearing as limits in Definition 1.1.
Thus we have a characterization of the limit distributions appearing in Definition
1.1, which answers question 3.
1.3.2. Generalized Extreme Value Distributions
Definition 1.5 (Generalized Extreme Value Distributions). For any γ ∈ R, the distribution
defined by
G_γ(x) := exp(−(1 + γx)^{−1/γ}), 1 + γx > 0,
is an extreme value distribution, abbreviated EVD. The parameter γ is called the
extreme value index.
Since (1 + γx)^{−1/γ} → e^{−x} as γ → 0, for γ = 0 we interpret G_0(x) = exp{−e^{−x}}.
The family of distributions G_γ((x − µ)/σ), for µ, γ ∈ R, σ > 0, is called the family of
generalized extreme value distributions, in the von Mises or von Mises-Jenkinson
parametrization. It shows that the limit distribution functions form a simple explicit
one-parameter family, apart from the scale and location parameters.
Let us consider the subclasses γ > 0, γ = 0, and γ < 0 separately:
(a) For γ > 0, clearly G_γ(x) < 1 for all x, i.e., the right endpoint of the distribution
is infinity. Moreover, as x → ∞, 1 − G_γ(x) ∼ γ^{−1/γ}x^{−1/γ}, i.e., the distribution
has a rather heavy right tail. We use G_γ((x − 1)/γ) and get, with α = 1/γ > 0,
Φ_α(x) = 0 for x ≤ 0, Φ_α(x) = exp(−x^{−α}) for x > 0.
This class is often called the Frechet class of distributions.
(b) For γ = 0, the distribution
G_0(x) = exp(−e^{−x}),
for all x ∈ R, is called the double-exponential or Gumbel distribution.
Observe that the right endpoint of the distribution equals infinity. The distribution,
however, is rather light-tailed:
1 − G_0(x) = 1 − exp{−e^{−x}} ∼ e^{−x}
as x → ∞, and all moments exist.
(c) For γ < 0, the right endpoint of the distribution is −1/γ, so it has a short tail,
verifying 1 − G_γ(−γ^{−1} − x) ∼ (−γx)^{−1/γ} as x ↓ 0. We use G_γ(−(1 + x)/γ) and
get, with α = −1/γ > 0,
Ψ_α(x) = exp(−(−x)^α) for x < 0, Ψ_α(x) = 1 for x ≥ 0.
This class is sometimes called the reverse-Weibull class of distributions.
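The following small numerical sketch is not part of the original text; it assumes numpy and scipy are available. It evaluates G_γ directly and compares it with scipy's GEV implementation; note that scipy.stats.genextreme uses the shape convention c = −γ:

import numpy as np
from scipy.stats import genextreme

def G(gamma, x):
    # G_gamma(x) = exp(-(1 + gamma x)^(-1/gamma)) on {1 + gamma x > 0}; Gumbel limit at gamma = 0
    if gamma == 0.0:
        return np.exp(-np.exp(-x))
    t = 1.0 + gamma * x
    if t <= 0:
        return 0.0 if gamma > 0 else 1.0   # outside the support
    return np.exp(-t ** (-1.0 / gamma))

x = 0.7
for gamma in (0.5, 0.0, -0.5):             # Frechet type, Gumbel, reverse-Weibull type
    print(gamma, G(gamma, x), genextreme.cdf(x, c=-gamma))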
1.4. Domain of Attraction Condition
Recall that we defined the generalized inverse of a non-decreasing function f. We have
the following lemma:
Lemma 1.2. Suppose (f_n) is a sequence of non-decreasing functions and g is a non-decreasing
function. Suppose that for each x in some open interval (a, b) that is a continuity point of g,
lim_{n→∞} f_n(x) = g(x).
Then, for each x in the interval (g(a), g(b)) that is a continuity point of g←, we have
lim_{n→∞} f←_n(x) = g←(x).
Proof. Let x be a continuity point of g←. Fix ε > 0. We have to prove that there is
n_0 ∈ N such that for n ≥ n_0,
f←_n(x) − ε ≤ g←(x) ≤ f←_n(x) + ε.
We are going to prove the right-hand inequality; the proof of the left-hand inequality
is similar.
Choose 0 < ε_1 < ε such that g←(x) − ε_1 is a continuity point of g. This is possible
since the continuity points of g form a dense set. Since g← is continuous at x, g←(x)
is a point of increase of g; hence g(g←(x) − ε_1) < x. Choose δ < x − g(g←(x) − ε_1).
Since g←(x) − ε_1 is a continuity point of g, there exists n_0 such that
f_n(g←(x) − ε_1) < g(g←(x) − ε_1) + δ < x for n ≥ n_0. The definition of the function
f←_n then implies
g←(x) − ε_1 ≤ f←_n(x).
Theorem 1.6. The following statements are equivalent:
1. There exist a_n > 0, b_n ∈ R and a non-degenerate distribution function G such that
F^n(a_n x + b_n) →d G(x) as n → ∞.
2. There exist a_n > 0, b_n ∈ R and a non-degenerate distribution function G such that,
for each continuity point x of G with 0 < G(x) < 1,
n(1 − F(a_n x + b_n)) → −log G(x) as n → ∞.
3. There exist a(t) > 0, b(t) ∈ R and a non-degenerate distribution function G such that,
for each continuity point x of G with 0 < G(x) < 1,
t(1 − F(a(t)x + b(t))) → −log G(x) as t → ∞.
4. Let U = (1/(1 − F))←. There exist a(t) > 0, b(t) ∈ R and a non-degenerate distribution
function G such that
(U(tx) − b(t))/a(t) → D(x) := G←(e^{−1/x}) as t → ∞,
for each x > 0 that is a continuity point of D(x).
5. There exist a(t) > 0, b(t) ∈ R and a non-degenerate distribution function G such that
F^t(a(t)x + b(t)) →d G(x) as t → ∞.
Proof. Fix a continuity point x of G with 0 < G(x) < 1.
(1 ⇔ 2) Clearly F(a_n x + b_n) → 1 as n → ∞, and we use the expansion
log(1 + ε) = ε + O(ε²) as ε → 0. By taking logarithms, as n → ∞,
F^n(a_n x + b_n) → G(x)
⇔ n log F(a_n x + b_n) → log G(x)
⇔ n log(1 − (1 − F(a_n x + b_n))) → log G(x)
⇔ n(1 − F(a_n x + b_n)) → −log G(x). (1.24)
Similarly, we can show that 3 ⇔ 5.
(2 ⇔ 3) To show that 2 ⇒ 3, let a(t) = a_{[t]}, b(t) = b_{[t]} (with [t] the integer part of
t). Then
t(1 − F(a(t)x + b(t))) ≤ ([t] + 1)(1 − F(a(t)x + b(t)))
= (([t] + 1)/[t]) · [t](1 − F(a_{[t]}x + b_{[t]}))
→ 1 · (−log G(x)) = −log G(x) as t → ∞. (1.25)
Similarly, we can show that
lim inf_{t→∞} t(1 − F(a(t)x + b(t))) ≥ −log G(x).
Hence we have 2 ⇒ 3, and 3 ⇒ 2 is obvious.
(3 ⇔ 4) Firstly, we show 3 ⇒ 4:
t(1 − F(a(t)x + b(t))) → −log G(x) implies (1/t) · (1/(1 − F(a(t)x + b(t)))) → −1/log G(x).
Inverting the above convergence and applying Lemma 1.2, we obtain
(U(ty) − b(t))/a(t) → (−1/log G)←(y) = G←(e^{−1/y}).
Similarly, we get 4 ⇒ 3.
Example 1.10 (Normal distribution). Let F be the standard normal distribution. We
are going to show that, for all x ∈ R,
lim_{n→∞} n(1 − F(a_n x + b_n)) = e^{−x} (1.26)
with
b_n := (2 log n − log log n − log(4π))^{1/2} (1.27)
and
a_n := 1/b_n. (1.28)
Note that b_n/(2 log n)^{1/2} → 1 as n → ∞; then
log b_n − (1/2) log log n − (1/2) log 2 → 0
and
b_n²/2 + log b_n − log n + (1/2) log(2π) → 0
as n → ∞. Now taking a_n = 1/b_n,
−(d/dx) n(1 − F(a_n x + b_n)) = (n/(b_n √(2π))) exp(−(x/b_n + b_n)²/2)
= exp{−(b_n²/2 + log b_n − log n + (1/2) log(2π))} e^{−x²/(2b_n²)} e^{−x}
→ e^{−x}
for x ∈ R. Hence
n(1 − F(a_n x + b_n)) = (n/(√(2π) b_n)) ∫_x^∞ exp(−(u/b_n + b_n)²/2) du
= exp{−(b_n²/2 + log b_n − log n + (1/2) log(2π))} ∫_x^∞ e^{−u²/(2b_n²)} e^{−u} du
→ e^{−x}
by Lebesgue's theorem on dominated convergence (see Appendix A.3.1). Then
lim_{n→∞} n(1 − F(a_n x + b_n)) = e^{−x},
and we obtain
F^n(a_n x + b_n) → exp(−e^{−x}) as n → ∞.
Since in the limit relation (1.26) we can replace a_n by a′_n and b_n by b′_n provided
a_n/a′_n → 1 and (b′_n − b_n)/a_n → 0, we can replace b_n, a_n from (1.27) and (1.28) by, e.g.,
b′_n = (2 log n)^{1/2} − (log log n + log(4π))/(2(2 log n)^{1/2})
and
a′_n = (2 log n)^{−1/2}.
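A quick numerical check of (1.26)-(1.28) (an illustration, not from the thesis; it assumes numpy and scipy) evaluates n(1 − F(a_n x + b_n)) for growing n; the convergence is notoriously slow for the normal distribution:

import numpy as np
from scipy.stats import norm

x = 0.5
for n in (10**3, 10**5, 10**7):
    bn = np.sqrt(2 * np.log(n) - np.log(np.log(n)) - np.log(4 * np.pi))  # (1.27)
    an = 1.0 / bn                                                        # (1.28)
    print(n, n * norm.sf(an * x + bn))   # should approach e^{-0.5} ≈ 0.6065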
1.4.1. General Theory of Domains of Attraction
It is important to know which (if any) of the three types of limit law applies when
r.v.s Xn have a given d.f. F. Various necessary and sufficient conditions are
known, involving the ” tail behaviour ” 1 − F(x) as x increases, for each type of
limit. We shall state these and prove their sufficiency, omitting the proofs of neces-
sity.
Theorem 1.7. Necessary and sufficient conditions for the d.f. F of the r.v.s of the i.i.d.
sequence (X_n) to belong to each of the three types are:
Type I: ∫_t^{x_F}(1 − F(u))du < ∞ for some t < x_F, and there exists some strictly positive
function g(t) such that
lim_{t↑x_F} (1 − F(t + xg(t)))/(1 − F(t)) = e^{−x}
for all real x, where one may take
g(t) = ∫_t^{x_F}(1 − F(u))du / (1 − F(t)) for t < x_F.
Type II: x_F = ∞ and
lim_{t→∞} (1 − F(tx))/(1 − F(t)) = x^{−α},
α > 0, for each x > 0.
Type III: x_F < ∞ and
lim_{h↓0} (1 − F(x_F − xh))/(1 − F(x_F − h)) = x^α,
α > 0, for each x > 0.
Proof. We assume first the existence of a sequence (u_n) (which may be taken
non-decreasing in n) in each case such that n(1 − F(u_n)) → 1. The constants u_n will, of
course, differ for the differing types. Clearly u_n → x_F and u_n < x_F for all sufficiently
large n.
If F satisfies the Type II criterion we have, writing u_n for t, for each x > 0,
n(1 − F(u_n x)) ∼ n(1 − F(u_n))x^{−α} → x^{−α},
so that Theorem 1.1 yields, for x > 0,
P(M_n ≤ u_n x) → exp(−x^{−α}).
Since u_n > 0 (when n is large, at least) and the right-hand side tends to zero as x ↓ 0,
it also follows that P(M_n ≤ 0) → 0, and for x < 0, that
P(M_n ≤ u_n x) ≤ P(M_n ≤ 0) → 0.
Thus P(M_n ≤ u_n x) → G(x), where G is the Type II representative d.f. listed in
Theorem 1.5. But this may be restated as
P((M_n − b_n)/a_n ≤ x) → G(x), (1.29)
where a_n = u_n and b_n = 0, so that the Type II limit follows.
The Type III limit follows in a closely similar way by writing h_n = x_F − u_n (↓ 0), so
that, for x > 0,
lim_{n→∞} n(1 − F(x_F − x(x_F − u_n))) = x^α,
and hence (replacing x by −x), for x < 0,
lim_{n→∞} n(1 − F(x_F + x(x_F − u_n))) = (−x)^α.
Using Theorem 1.1 again, this shows at once that the Type III limit applies with
constants in (1.29) given by
a_n = x_F − u_n, b_n = x_F.
The Type I limit also follows along the same lines since, when F satisfies that criterion,
we have, for all x, writing t = u_n ↑ x_F (≤ ∞),
lim_{n→∞} n(1 − F(u_n + xg(u_n))) = e^{−x},
giving (again by Theorem 1.1) the Type I limit with a_n = g(u_n), b_n = u_n.
Finally, we must show the existence of the (non-decreasing) sequence (u_n) satisfying
lim_{n→∞} n(1 − F(u_n)) = 1. For u_n, we may take any non-decreasing sequence such
that
F(u_n−) ≤ 1 − 1/n ≤ F(u_n)
(such as the sequence u_n = F←(1 − 1/n) = inf{x : F(x) ≥ 1 − 1/n}). For such a
sequence, n(1 − F(u_n)) ≤ 1, so that, trivially, lim sup n(1 − F(u_n)) ≤ 1. Thus it only
remains to show that in each case lim inf n(1 − F(u_n)) ≥ 1, which will follow, since
n(1 − F(u_n−)) ≥ 1, if we show that
lim inf_{n→∞} (1 − F(u_n))/(1 − F(u_n−)) ≥ 1. (1.30)
For a d.f. F satisfying the listed Type II criterion, the left-hand side of (1.30) is, for
any x < 1, no smaller than
lim inf_{n→∞} (1 − F(u_n))/(1 − F(u_n x)) = x^α,
from which (1.30) follows by letting x → 1.
A similar argument holds for a d.f. F satisfying the Type III criterion, the left-hand
side of (1.30) being no smaller (for x > 1, h_n = x_F − u_n) than
lim inf_{n→∞} (1 − F(x_F − h_n))/(1 − F(x_F − xh_n)) = x^{−α},
which tends to 1 as x → 1, giving (1.30).
Finally, for the Type I case, the left-hand side of (1.30) is no smaller (if x < 0) than
lim inf_{n→∞} (1 − F(u_n))/(1 − F(u_n + xg(u_n))) = e^x,
which tends to 1 as x → 0, so that again (1.30) holds.
Corollary 1.1. The constants a_n, b_n in the convergence P((M_n − b_n)/a_n ≤ x) → G(x)
may be taken in each case above to be:
Type I: a_n = g(u_n), b_n = u_n,
with u_n = F←(1 − 1/n) = inf{x : F(x) ≥ 1 − 1/n}.
Type II: a_n = u_n, b_n = 0.
Type III: a_n = x_F − u_n, b_n = x_F.
Proof. These relationships appear in the course of the proof of the theorem above.
Example 1.11 (Pareto distribution). As a simple example, we consider now the Pareto
distribution
F(x) = 1 − κx^{−α}, α > 0, κ > 0, x ≥ κ^{1/α}.
We have
(1 − F(tx))/(1 − F(t)) = (tx)^{−α}/t^{−α} = x^{−α},
so F belongs to the DOA of a Type II extreme value distribution. By setting
n(1 − F(u_n)) = τ,
we have
u_n = (κn/τ)^{1/α},
so that Theorem 1.1 gives
P(M_n ≤ (κn/τ)^{1/α}) → e^{−τ}.
By putting τ = x^{−α} for x > 0, we have
P((κn)^{−1/α}M_n ≤ x) → exp(−x^{−α}),
so that a Type II limit holds with
a_n = (κn)^{1/α}, b_n = 0.
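The Pareto case is also easy to check by simulation. The following sketch (not part of the thesis; it assumes numpy) takes κ = 1, α = 2 and compares the empirical distribution of (κn)^{−1/α}M_n with the Type II limit:

import numpy as np

rng = np.random.default_rng(1)
alpha, n, reps = 2.0, 1000, 5000
u = rng.random((reps, n))
# Pareto via inverse transform: X = U^{-1/alpha} has F(x) = 1 - x^{-alpha}, x >= 1
maxima = (u ** (-1.0 / alpha)).max(axis=1) * n ** (-1.0 / alpha)  # (kappa n)^{-1/alpha} M_n

x = 1.5
print((maxima <= x).mean())        # empirical value
print(np.exp(-x ** (-alpha)))      # Frechet limit exp(-x^{-alpha}) ≈ 0.641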
1.5. Condition for belonging to Extreme Value Domain
The following theorem states a sufficient condition for belonging to a domain of
attraction. The condition is called von Mises' condition, and it applies when the d.f.
F has a density function, an obviously common case.
Theorem 1.8. Let F be a distribution function and x_F its right endpoint. Suppose F′′(x)
exists and F′(x) is positive for all x in some left neighborhood of x_F. If
lim_{t↑x_F} ((1 − F(t))/F′(t))′ = γ (1.31)
or equivalently
lim_{t↑x_F} (1 − F(t))F′′(t)/(F′(t))² = −γ − 1, (1.32)
then F is in the domain of attraction of G_γ (F ∈ D(G_γ)).
Proof. Here, as elsewhere, the proof is much simplified by formulating everything
in terms of the inverse function U rather than the distribution function F. By
differentiating the relation
1/(1 − F(U(t))) = t,
we obtain
U′(t) = (1 − F(U(t)))²/F′(U(t)).
Differentiating once more, we find that
U′′(t)/U′(t) = −2(1 − F(U(t))) − F′′(U(t))(1 − F(U(t)))²/(F′(U(t)))².
Multiplying by t = 1/(1 − F(U(t))) gives
tU′′(t)/U′(t) = −2 − (1 − F(U(t)))F′′(U(t))/(F′(U(t)))²,
so (1.32) is equivalent to lim_{t→∞} tU′′(t)/U′(t) = γ − 1.
By Theorem 1.6, the relation to be proved is equivalent to
lim_{t→∞} (U(tx) − U(t))/a(t) = (x^γ − 1)/γ (1.33)
for all x > 0, with a(t) = tU′(t). So we need to prove that
lim_{t→∞} tU′′(t)/U′(t) = γ − 1
implies (1.33) for the same γ. Since for 1 < x_0 < x,
log U′(x) − log U′(x_0) = ∫_{x_0}^x (U′′(s)/U′(s)) ds,
we have for x > 0 and t, tx > 1,
log U′(tx) − log U′(t) = ∫_1^x A(ts) ds/s,
with A(t) := tU′′(t)/U′(t). It follows that for 0 < a < b < ∞,
lim_{t→∞} sup_{a≤x≤b} |log(U′(tx)/U′(t)) − log x^{γ−1}| = 0.
Hence, since |e^s − e^t| ≤ c|s − t| on a compact interval for some positive constant c,
lim_{t→∞} sup_{a≤x≤b} |U′(tx)/U′(t) − x^{γ−1}| = 0.
This implies that
(U(tx) − U(t))/(tU′(t)) − (x^γ − 1)/γ = ∫_1^x (U′(ts)/U′(t) − s^{γ−1}) ds
converges to zero.
Remark 1.4. If (1.31) holds, then
F^n(a_n x + b_n) → G_γ(x),
with b_n = U(n) and a_n = nU′(n) = 1/(nF′(b_n)).
Example 1.12. Let F(x) = N(x), the standard normal distribution. We have
F′(x) = n(x) = (1/√(2π)) e^{−x²/2},
F′′(x) = −(1/√(2π)) x e^{−x²/2} = −x n(x),
and using Mills' ratio [14], we have 1 − N(x) ∼ x^{−1}n(x). Therefore
lim_{x→∞} (1 − F(x))F′′(x)/(F′(x))² = lim_{x→∞} (x^{−1}n(x))(−x n(x))/(n(x))² = −1.
Then γ = 0 and F ∈ D(Λ), the Gumbel distribution.
Remark 1.5. If we define R(t) = −log(1 − F(t)), which is called the integrated hazard
function, then we get
r(t) = R′(t) = F′(t)/(1 − F(t)).
If the von Mises condition holds, then
lim_{t↑x_F} (1/r(t))′ = lim_{t↑x_F} ((1 − F(t))/F′(t))′
= lim_{t↑x_F} (−(F′(t))² − (1 − F(t))F′′(t))/(F′(t))² = −1 − (−γ − 1) = γ.
Thus, the von Mises condition can be written as lim_{t↑x_F} (1/r(t))′ = γ, which is more
convenient to check.
Example 1.13. Let 1 − F(x) = e^{−x^p}, x > 0, p > 0. Then
R(t) = t^p, r(t) = R′(t) = pt^{p−1},
1/r(t) = p^{−1}t^{1−p}, (1/r(t))′ = ((1 − p)/p) t^{−p} → 0
as t → ∞. In this case γ = 0.
Example 1.14. Let 1 − F(x) = x^{−α} = e^{−α log x}, x ≥ 1, α > 0. Then
R(t) = α log t, r(t) = R′(t) = α/t,
1/r(t) = t/α, (1/r(t))′ = 1/α.
For this case γ = 1/α.
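As a sanity check of Remark 1.5 and the two examples above, one can compute (1/r(t))′ symbolically (a sketch outside the original text; it assumes sympy):

import sympy as sp

t, p, alpha = sp.symbols('t p alpha', positive=True)

def one_over_r_prime(Fbar):
    # r(t) = F'(t)/(1 - F(t)) = -Fbar'(t)/Fbar(t); the von Mises condition reads lim (1/r)' = gamma
    r = -sp.diff(Fbar, t) / Fbar
    return sp.simplify(sp.diff(1 / r, t))

print(one_over_r_prime(sp.exp(-t**p)))   # (1 - p)/(p t^p) -> 0 as t -> infinity, so gamma = 0
print(one_over_r_prime(t**(-alpha)))     # 1/alpha, so gamma = 1/alpha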
CHAPTER 2
Multivariate Extreme Value Theory
In this chapter, we study the limiting joint distributions of normalized component-
wise defined maxima of i.i.d. d-variate random vectors. Such distributions are again
max-stable as in the univariate case.
2.1. Introduction
Historically, the first direction that had been explored concerning multivariate ex-
treme events was the modeling of the asymptotic behavior of componentwise max-
ima of i.i.d. observations. Key early contributions to this domain of research are,
among others, the papers of Tiago de Oliveira (1958) [22], Sibuya (1960) [21], de Haan
and Resnick (1977) [10], and Pickands (1981) [17]. The general structure of the mul-
tivariate extreme value distributions has been explored by de Haan and Resnick
(1977). Useful representations in terms of max-stable distributions, regular variation
functions, or point processes, have been established. The next section is devoted to
the asymptotic model for componentwise maxima.
Let us assume that X_1, X_2, ..., X_n are d-dimensional i.i.d. random vectors from some
distribution F. If we intend to study the extremes of the distribution F, we must
know what an extreme value really means in the multivariate set-up. Of course, for
d ≥ 2, there is no natural ordering of the random sample. An early review article,
Barnett (1976) [2], discusses four kinds of ordering for multivariate data sets, which
lead to different approaches for studying extremes (maxima or minima) in a multivariate
set-up.
1. Componentwise maxima, depending on marginal ordering.
2. Maxima based on reduced (aggregate) ordering, based on a single value computed
from a multivariate observation through a function f : R^d → R.
Usually the function f is some measure of generalized distance, say, f(x) =
(x − α)^T Σ (x − α).
3. Maxima based on partial ordering, say, based on convex hulls.
4. Concomitants of marginal maxima: conditional ordering based on marginal
ordering.
Our approach to extreme value analysis will be based on the first kind mentioned
above. It turns out that the theory behind componentwise maxima is quite rich
and provides answers to the questions we have. We will study multivariate extreme
value theory in the case of dimension d = 2; most of the results and definitions given
henceforth also hold for d > 2.
In univariate extreme value theory we found that the limit distributions of sample
maxima can be characterized through a parametric family of generalized extreme
value or generalized Pareto distributions. We will learn in this chapter that this is
not the case in the multi-dimensional setting.
Some notation: The most useful order relation in multivariate extreme value theory
is a special case of what is called marginal ordering: for d-dimensional vectors x =
(x(1), . . . , x(d)) and y = (y(1), . . . , y(d)), the relation x ≤ y is defined as x(j) ≤ y(j) for
all j = 1, . . . , d.
The componentwise maximum of x and y (x, y ∈ R^d), defined as
x ∨ y := (x^{(1)} ∨ y^{(1)}, ..., x^{(d)} ∨ y^{(d)}),
is in general different from both x and y.
Subsequently, arithmetic operations and order relations are meant componentwise;
that is,
a + b = (a^{(1)} + b^{(1)}, ..., a^{(d)} + b^{(d)})
for vectors a = (a^{(1)}, ..., a^{(d)}) and b = (b^{(1)}, ..., b^{(d)}). An interval (a, b] is defined
as the product ×_{j≤d}(a^{(j)}, b^{(j)}]:
(a, b] = {x ∈ R^d : a < x ≤ b} = {(x^{(1)}, ..., x^{(d)}) : a^{(i)} < x^{(i)} ≤ b^{(i)}, i = 1, ..., d},
and similarly
(−∞, a] = {x ∈ R^d : x ≤ a}.
We denote R̄^d = [−∞, ∞]^d, and we will require sets of the form
[−∞, a] = {x ∈ R̄^d : −∞ ≤ x^{(i)} ≤ a^{(i)}, 1 ≤ i ≤ d},
(a, ∞] = {x ∈ R̄^d : a^{(i)} < x^{(i)} ≤ ∞, 1 ≤ i ≤ d},
[−∞, a]^c = R̄^d \ [−∞, a] = {x ∈ R̄^d : x^{(i)} > a^{(i)} for some i = 1, ..., d}.
Recall that the d.f. F(x) = Q((−∞, x]) of a probability measure Q has the following
properties:
(a) F is right-continuous: F(x_n) ↓ F(x_0) if x_n ↓ x_0;
(b) F is normed: F(x_n) ↑ 1 if x_n^{(j)} ↑ ∞ for all j = 1, ..., d, and F(x_n) ↓ 0 if
x_n ≥ x_{n+1} and x_n^{(j)} ↓ −∞ for some j ∈ {1, ..., d};
(c) F is Δ-monotone: for a ≤ b,
Δ_a^b F = Q((a, b]) = Σ_{m∈{0,1}^d} (−1)^{d−Σ_{j≤d}m_j} F(b_1^{m_1}a_1^{1−m_1}, ..., b_d^{m_d}a_d^{1−m_d}) ≥ 0,
where b_j^{m_j}a_j^{1−m_j} stands for b_j if m_j = 1 and for a_j if m_j = 0.
Conversely, every function F satisfying conditions (a)-(c) is the d.f. of a probability
measure Q. Usually, conditions (a) and (b) can be verified in a straightforward
way; the Δ-monotonicity holds if, e.g., F is the pointwise limit of a sequence of d.f.s.
We usually follow the convention that F stands for the distribution function as well
as for the induced measure, and thus we need to emphasize that
F^t(x) stands for (F(x))^t,
while for x < y,
F^t((x, y]) ≠ (F((x, y]))^t.
2.2. Limit Distributions of Multivariate Maxima
Assume (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) is a sequence of vectors that are independent
versions of a random vector having distribution function F(x, y). Since we are taking
the componentwise maxima approach, let us denote
M_n = (⋁_{i=1}^n X_i, ⋁_{i=1}^n Y_i) = (M_n^{(1)}, M_n^{(2)}).
As in the univariate case, we need not study the minima separately, since
m_n := (⋀_{i=1}^n X_i, ⋀_{i=1}^n Y_i) = −(⋁_{i=1}^n (−X_i), ⋁_{i=1}^n (−Y_i)).
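As a concrete illustration (a sketch, not part of the thesis; it assumes numpy), componentwise maxima and the reduction of minima to maxima look as follows:

import numpy as np

rng = np.random.default_rng(2)
xy = rng.normal(size=(10, 2))    # rows are the vectors (X_i, Y_i), i = 1, ..., 10
M = xy.max(axis=0)               # componentwise maxima (M_n^{(1)}, M_n^{(2)})
m = -(-xy).max(axis=0)           # minima via m_n = -(max of the negated sample)
print(M)
print(m, xy.min(axis=0))         # the two minima computations agree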
2.2.1. Max-infinitely Divisible Distributions
Let {X_n, n ≥ 1} be i.i.d. random variables with common distribution F(x) and
define M_n = ⋁_{i=1}^n X_i. In order to study the structure of the stochastic process
{M_n}, it has been found convenient to embed {M_n} in a continuous-time process
Y = {Y(t), t > 0}, called an extremal process, in the sense that M_n =d Y(n). For this
construction, Y is defined by its finite-dimensional distributions: for 0 < t_1 < ··· < t_n
and x_1 ≤ ··· ≤ x_n,
P[Y(t_i) ≤ x_i, i = 1, ..., n] = F^{t_1}(x_1) F^{t_2−t_1}(x_2) ··· F^{t_n−t_{n−1}}(x_n).
Recall from the univariate theory that a univariate distribution function F is in the
domain of attraction of a non-degenerate d.f. G if there exist a(t) > 0, b(t) ∈ R such
that, as t → ∞,
F^t(a(t)x + b(t)) →d G(x).
In the one-dimensional case, F^t is a probability distribution function whenever F is.
This is not the case for dimension d > 1. Consider the following example.
Example 2.1. Suppose {X_n, n ≥ 1} is an i.i.d. sequence of R-valued random variables
and we wish to study the range
R_n := ⋁_{i=1}^n X_i − ⋀_{i=1}^n X_i = ⋁_{i=1}^n X_i + ⋁_{i=1}^n (−X_i).
Thus, it is only natural to look at the joint distribution of (X_i, −X_i), which has
distribution F, say. Then
(M_n^{(1)}, M_n^{(2)}) = (⋁_{i=1}^n X_i, ⋁_{i=1}^n (−X_i)) = ⋁_{i=1}^n (X_i, −X_i)
and R_n = M_n^{(1)} + M_n^{(2)}. The joint distribution F(x^{(1)}, x^{(2)}) of (X_i, −X_i) concentrates
on {(x^{(1)}, x^{(2)}) : x^{(1)} + x^{(2)} = 0}, and we show that it is not the case that F^t is a
distribution for all t > 0. We see that
F^t((x, y]) = F^t(y^{(1)}, y^{(2)}) − F^t(x^{(1)}, y^{(2)}) − F^t(y^{(1)}, x^{(2)}) + F^t(x^{(1)}, x^{(2)}). (2.1)
If F^t were a distribution, the expression in (2.1) would be non-negative for all t.
However, take x = (0, 0), y = (1, 1) and observe that
F(0, 1) = F{(u^{(1)}, u^{(2)}) : u^{(1)} + u^{(2)} = 0, 0 ≤ u^{(2)} ≤ 1} =: p_1,
F(1, 0) = F{(u^{(1)}, u^{(2)}) : u^{(1)} + u^{(2)} = 0, 0 ≤ u^{(1)} ≤ 1} =: p_2,
F(0, 0) = 0, F(1, 1) = p_1 + p_2.
For F^t to be a distribution function, we need F^t(A) ≥ 0 for any Borel rectangle A ⊂ R². But
F^t((0, 1]²) = F^t(1, 1) − F^t(0, 1) − F^t(1, 0) + F^t(0, 0) = (p_1 + p_2)^t − p_1^t − p_2^t.
Then F^t((0, 1]²) ≥ 0 ⇔ (p_1 + p_2)^t − p_1^t − p_2^t ≥ 0, which need not be the case for t < 1.
Thus not every distribution function F on R^d for d > 1 has the property that F^t is a
distribution, and the following definition is meaningful:
Definition 2.1. The distribution function F on R^d is max-infinitely divisible, or max-id,
if for every n there exists a distribution F_n on R^d such that
F = F_n^n,
i.e., for every n, F^{1/n} is a distribution. If a random vector X has X ∼ F, we also call
X max-id.
Proposition 2.1. Suppose that for n ≥ 0, F_n are probability distribution functions on R^d.
If F_n^n →d F_0 (pointwise convergence at continuity points of F_0), then F_0 is max-id.
Consequently,
(a) F is max-id if and only if F^t is a distribution function for every t > 0.
(b) The class of max-id distributions on R^d is closed with respect to weak convergence:
if G_n are max-id distributions converging weakly to a distribution G_0, then G_0 is
max-id.
Proof. We have that the F_n are distribution functions and F_n^n →d F_0. Suppose x is a
continuity point of F_0. We show that F_n^{[nt]}(x) → F_0^t(x) for all t > 0. We see that
F_n^{[nt]} is a distribution function for any n ≥ 1, t > 0 with [nt] ≥ 1, and F_0^t is proper
since F_0 is proper.
Fix t > 0. If F_0(x) = 0, then
F_n^{[nt]}(x) = (F_n^n(x))^{[nt]/n} → 0 = F_0^t(x).
If F_0(x) > 0, then F_n(x) → 1, and as n → ∞,
−log F_n^{[nt]}(x) = [nt](−log F_n(x)) ∼ nt(−log F_n(x))
= t(−log F_n^n(x)) → t(−log F_0(x)) = −log F_0^t(x).
Thus F_n^{[nt]}(x) → F_0^t(x), whence F_0^t is a distribution function. Hence F_0^{1/n} is a
distribution function for any n, and F_0 is max-id.
The proof of (a) follows easily from the above by taking F_n = F^{1/n}, so that F_n^n = F.
Consequence (b) follows by observing that if G_n are max-id and G_n →d G_0, then
G_0 = lim_{n→∞} G_n = lim_{n→∞} (G_n^{1/n})^n,
which allows us to conclude that G_0 is max-id.
Here are two examples of max-id distribution functions.
Example 2.2 (The independent case). Suppose G(x, y) = F_1(x)F_2(y), where F_1, F_2 are
(one-dimensional) distributions. For every t, G^t(x, y) is a d.f., for we may construct
independent extremal processes Y_1(·), Y_2(·) governed by F_1, F_2 respectively, and then
P[Y_1(t) ≤ x, Y_2(t) ≤ y] = F_1^t(x)F_2^t(y) = G^t(x, y).
Example 2.3. A more interesting example comes from the theory of Poisson random
measures. Let µ(dx, dy) be a measure on (R², B(R²)) such that µ(R²) = +∞
and, for all x and y sufficiently large, µ(((−∞, x] × (−∞, y])^c) < ∞. Use this measure
µ to construct a Poisson random measure N on R_+ × R² with mean measure
dt × µ(dx, dy). Let the points of the random measure be (T_k, J_k^{(1)}, J_k^{(2)}) and define a
two-dimensional (extremal) process by
Y_i(t) = sup{J_k^{(i)} : T_k ≤ t}
for i = 1, 2. Then
P[Y_1(t) ≤ x, Y_2(t) ≤ y] = P[N([0, t] × ((−∞, x] × (−∞, y])^c) = 0]
= exp{−tµ(((−∞, x] × (−∞, y])^c)}.
This shows that the d.f.
G(x, y) = exp{−µ(((−∞, x] × (−∞, y])^c)}
is max-id.
In what follows we represent partial derivatives by subscripts:
F_x = ∂F/∂x, F_y = ∂F/∂y, F_{x,y} = ∂²F/∂y∂x,
and so on. The notation [F > 0] means the set {x : F(x) > 0}.
Proposition 2.2. Let F be a distribution on R² with continuous density F_{x,y}. Then F is
max-id iff Q := −log F satisfies
Q_{x,y} ≤ 0 on [F > 0],
or equivalently iff
F_x F_y ≤ F_{x,y} F on [F > 0].
Proof. Since F^t = e^{−tQ}, we have on the (open) set [F > 0]
∂/∂y ∂/∂x F^t = ∂/∂y (−te^{−tQ}Q_x) = −∂/∂y (tF^t Q_x)
= −t(Q_{x,y}F^t − tF^t Q_y Q_x)
= tF^t(tQ_x Q_y − Q_{x,y}),
and F is max-id iff this latter expression is non-negative for all t, which occurs iff
tQ_x Q_y − Q_{x,y} ≥ 0 (2.2)
for all t. Since Q_x = −F_x/F ≤ 0 and Q_y ≤ 0, (2.2) holds for all t iff Q_{x,y} ≤ 0, as
asserted. The rest follows by differentiation:
0 ≥ Q_{x,y} = ∂/∂y ∂/∂x (−log F) = −∂/∂y (F_x/F) = −(FF_{x,y} − F_x F_y)/F²,
which means F_x F_y ≤ F_{x,y} F.
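As an illustration of the criterion (a sketch outside the original text; it assumes sympy, and the bivariate d.f. chosen here is just one convenient test case), one can verify Q_{x,y} ≤ 0 symbolically for F = exp(−Q) with Q(x, y) = (x^{−2} + y^{−2})^{1/2}:

import sympy as sp

x, y = sp.symbols('x y', positive=True)
Q = (x**-2 + y**-2) ** sp.Rational(1, 2)   # Q = -log F for a bivariate logistic-type d.f.
Qxy = sp.simplify(sp.diff(Q, x, y))
print(Qxy)                       # equals -x^-3 y^-3 (x^-2 + y^-2)^(-3/2) <= 0 on x, y > 0,
print(Qxy.subs({x: 1, y: 2}))    # so F = exp(-Q) is max-id by Proposition 2.2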
2.2.2. Characterizing Max-id Distributions
Proposition 2.3. The following are equivalent:
(i) F is max-id.
(ii) For some l ∈ [−∞, ∞)^d, there exists an exponent measure µ on E := [l, ∞] \ {l}
such that
F(y) = exp{−µ([−∞, y]^c)} for y ≥ l, and F(y) = 0 otherwise.
Here µ is an exponent measure if it is Radon (finite on compact sets) and satisfies
1. µ(E \ [−∞, ∞)^d) = µ(⋃_{i=1}^d {y ∈ E : y^{(i)} = ∞}) = 0;
2. either l > −∞, or x ≥ l with x^{(i)} = −∞ for some i ≤ d implies µ([−∞, x]^c) = ∞.
Proof. We start by showing that if F is max-id, then [F > 0] ⊂ Rd is a rectangle of
the form A1 × · · · × Ad where Ai = [l(i), ∞) or (l(i), ∞) and l = (l(1), . . . , l(d)) =
inf[F > 0]. Note that F is a probability distribution on Rd and µ is defined on
[l,∞]\l = E ⊂ [−∞, ∞]d.
It suffices to show (i) implies (ii). We start by showing that if F is max-id, then
[F > 0] is rectangle. To do this we need to verify two properties of [F > 0]
(1’) x ∈ [F > 0] and x ≤ y implies y ∈ [F > 0].
(2’) x, y ∈ [F > 0] implies x∧ y ∈ [F > 0].
The first is obvious, and for the second it suffices to show that
F(x∧ y) ≥ F(x)F(y)
or equivalent (Q = − log F)
Q(x∧ y) ≤ Q(x) + Q(y) (2.3)
However, suppose that Y(t), t > 0 is extremal-F, then
Fn−1(x) = P[Y(n−1) ≤ x]
= P[Y(n−1) ≤ x, Y(n−1) ≤ y] + P([Y(n−1) ≤ x] ∩ [Y(n−1) ≤ y]c
)≤ P[Y(n−1) ≤ x∧ y] + P
([Y(n−1) ≤ y]c
)= Fn−1
(x∧ y) + 1− Fn−1(y)
and therefore
n(1− Fn−1(x∧ y)) ≤ n(1− Fn−1
(x)) + n(1− Fn−1(y)) (2.4)
For fixed x ∈ [F > 0] we have as n→ ∞
n(1− Fn−1(x)) ∼ −n log Fn−1
(x) = Q(x)
and letting n → ∞ in (2.4) gives the desired (2.3). Based on (1') and (2'), the verification that [F > 0] is a rectangle can proceed. Define the projection maps as usual by

π_i x = x^(i),   i = 1, . . . , d,

for x ∈ R^d. We assert that

[F > 0] = π_1[F > 0] × · · · × π_d[F > 0] =: ⨉_{i=1}^d π_i[F > 0],   (2.5)

and from this the result follows directly, since π_i[F > 0] is an interval of the form (l^(i), ∞) or [l^(i), ∞) by (1').

If x ∈ [F > 0] then of course x^(i) ∈ π_i[F > 0], implying x ∈ ⨉_{i=1}^d π_i[F > 0]. Conversely, suppose that x ∈ ⨉_{i=1}^d π_i[F > 0], so that for i = 1, . . . , d we have x^(i) ∈ π_i[F > 0], and thus there exists y_i ∈ [F > 0] with π_i y_i = x^(i). From (2') we have y := ∧_{i=1}^d y_i ∈ [F > 0]. However, π_i y ≤ π_i y_i = x^(i), thus y ≤ x, and then by (1') we get x ∈ [F > 0]. Hence (2.5) is verified.
With l = inf[F > 0], consider E = [l, ∞]\{l} and define on E the measures

µ_n := n F^{1/n}.

Since F^{1/n} is only defined on R^d, one must extend the definition of F^{1/n} in the obvious way to [−∞, ∞]^d in order to get µ_n defined on E. Sets of the form [−∞, x]^c = E\[−∞, x] for x ≥ l are relatively compact subsets of E, and as n → ∞,

µ_n([−∞, x]^c) = n(1 − F^{1/n}(x)) → Q(x) < ∞,

so that for such x > l,

sup_n µ_n([−∞, x]^c) < ∞.

Since E = lim_{x↓l} [−∞, x]^c, it follows that for any relatively compact subset B of E,

sup_n µ_n(B) < ∞,

so that {µ_n} is vaguely relatively compact. Let µ_1 and µ_2 be two vague limit points of {µ_n}. Then for any x > l,

µ_1([−∞, x]^c) = µ_2([−∞, x]^c) = Q(x) = −log F(x),

and thus µ_1 = µ_2. So all limit points of {µ_n} are equal, and hence there is a limit measure µ on E with µ_n v→ µ. Thus for x > l,

µ([−∞, x]^c) = −log F(x),   i.e.,   F(x) = exp{−µ([−∞, x]^c)}.

Hence µ is an exponent measure and the proof is complete.
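The construction µ_n = n F^{1/n} in the proof is easy to observe numerically. The sketch below (an assumed example) uses the bivariate d.f. of two independent standard exponentials and shows n(1 − F^{1/n}(x)) approaching Q(x) = −log F(x).

```python
# A numerical look (assumed example) at the measures mu_n = n F^{1/n}:
# mu_n([-inf, x]^c) = n(1 - F^{1/n}(x)) should approach Q(x) = -log F(x).
import math

def F(x, y):
    # d.f. of two independent standard exponentials (an assumed choice)
    return (1 - math.exp(-x)) * (1 - math.exp(-y))

x, y = 1.0, 2.0
for n in (10, 100, 1000, 10000):
    print(n, n * (1 - F(x, y) ** (1.0 / n)))
print("limit -log F(x,y):", -math.log(F(x, y)))
```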
2.3. Multivariate Domain of Attraction
In this section we determine the class of all possible limit distributions G.
Definition 2.2. A bivariate distribution function F is said to be in the domain of
attraction of a bivariate distribution function G if
1. G has non-degenerate marginal distributions G1 and G2.
2. There exist sequences a_n, c_n > 0 and b_n, d_n ∈ R such that

P((M_n^(1) − b_n)/a_n ≤ x, (M_n^(2) − d_n)/c_n ≤ y) = F^n(a_n x + b_n, c_n y + d_n) d→ G(x, y).   (2.6)
Any limit distribution function G in (2.6) with non-degenerate marginals is called a multivariate extreme value distribution. Since (2.6) implies convergence of the two one-dimensional marginal distributions, we have

lim_{n→∞} P((M_n^(1) − b_n)/a_n ≤ x) = G(x, ∞) =: G_1(x)

and

lim_{n→∞} P((M_n^(2) − d_n)/c_n ≤ y) = G(∞, y) =: G_2(y).
Remark 2.1. We note in passing that since the two marginal distributions of G are continuous, G must be continuous as well.

Since G_1 and G_2 are univariate extreme value distributions, we may choose the constants a_n, c_n > 0 and b_n, d_n ∈ R such that for some γ_1, γ_2

G_1(x) = exp{−(1 + γ_1 x)^{−1/γ_1}},   1 + γ_1 x > 0,
G_2(y) = exp{−(1 + γ_2 y)^{−1/γ_2}},   1 + γ_2 y > 0.
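The following small sketch (an assumed example; all distributional choices are hypothetical) illustrates Definition 2.2 numerically: for independent standard exponential coordinates one may take a_n = c_n = 1 and b_n = d_n = log n, and F^n(x + log n, y + log n) then approaches a product of two Gumbel d.f.s.

```python
# A small numerical illustration (assumed example) of Definition 2.2 with
# F(x,y) = (1-e^{-x})(1-e^{-y}), a_n = c_n = 1, b_n = d_n = log n.
import math

def F(x, y):
    return (1 - math.exp(-x)) * (1 - math.exp(-y))

x, y = 0.3, -0.2
for n in (10, 1000, 100000):
    print(n, F(x + math.log(n), y + math.log(n)) ** n)
print("limit:", math.exp(-math.exp(-x)) * math.exp(-math.exp(-y)))
```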
2.3.1. Max-stability
In the characterization of multivariate extreme value distributions, max-stable (or min-stable) distributions play a central role.

Suppose that X_n = (X_n^(1), . . . , X_n^(d)), n ≥ 1, are i.i.d. random d-dimensional vectors with common distribution

F(x) = F(x_1, . . . , x_d) = P(X_n^(k) ≤ x_k, k = 1, . . . , d).
Let the marginal distributions of F(x) be F_1, . . . , F_d, so that F_1(x) = F(x, ∞, . . . , ∞), and so on. Assume that there exist normalizing constants a_n^(i) > 0, b_n^(i) ∈ R, 1 ≤ i ≤ d, n ≥ 1, such that as n → ∞

P[(M_n^(i) − b_n^(i))/a_n^(i) ≤ x^(i), 1 ≤ i ≤ d] = F^n(a_n^(1) x^(1) + b_n^(1), . . . , a_n^(d) x^(d) + b_n^(d)) → G(x)   (2.7)

for a limit distribution G such that each marginal G_i, i = 1, . . . , d, is non-degenerate. The class of such limits G is called the class of extreme value distributions.
Definition 2.3. We say that a distribution G(x) is max-stable if for i = 1, 2, . . . , d and every t > 0 there exist functions α^(i)(t) > 0, β^(i)(t) such that

G^t(x) = G(α^(1)(t) x^(1) + β^(1)(t), . . . , α^(d)(t) x^(d) + β^(d)(t)).   (2.8)

It is clear from (2.8) that for every t > 0, G^t is a distribution function, and hence every max-stable distribution is max-id. The relevance of max-stable distributions is made obvious by the next result, which describes the equivalence between multivariate extreme value distributions and max-stable distributions.

Theorem 2.1. The class of multivariate extreme value distributions is precisely the class of max-stable distribution functions with non-degenerate marginals.
Proof. It is clear that if G has non-degenerate marginals and is max-stable, then (2.7) holds; take F = G.

Conversely, suppose (2.7) holds. From the marginal convergence

F_i^n(a_n^(i) x + b_n^(i)) → G_i(x), non-degenerate, 1 ≤ i ≤ d,

and (1.19), there exist functions α^(i)(t) > 0, β^(i)(t) ∈ R such that for t > 0, 1 ≤ i ≤ d,

lim_{n→∞} a_n^(i) / a_[nt]^(i) = α^(i)(t),   lim_{n→∞} (b_n^(i) − b_[nt]^(i)) / a_[nt]^(i) = β^(i)(t).   (2.9)

Suppose Y(t) is a vector with distribution G^t(x). Then for t > 0, 1 ≤ i ≤ d, we have on the one hand

(M_[nt]^(i) − b_[nt]^(i)) / a_[nt]^(i) d→ Y^(i)(1),   (2.10)

and on the other, since P[M_[nt] ≤ x] = F^[nt](x) = (F^n(x))^{[nt]/n} → G^t(x),

(M_[nt]^(i) − b_n^(i)) / a_n^(i) d→ Y^(i)(t).   (2.11)

Using (2.9), (2.10) and (2.11) we get

(M_[nt]^(i) − b_[nt]^(i)) / a_[nt]^(i) = ((M_[nt]^(i) − b_n^(i)) / a_n^(i)) · (a_n^(i) / a_[nt]^(i)) + (b_n^(i) − b_[nt]^(i)) / a_[nt]^(i) d→ α^(i)(t) Y^(i)(t) + β^(i)(t) d= Y^(i)(1),

which is the same as (2.8).
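Max-stability is easy to test numerically for a concrete G. The sketch below (an assumed example, not from the text) uses the bivariate logistic EVD with Gumbel margins, for which G^t(x, y) = G(x − log t, y − log t), i.e. α^(i)(t) = 1 and β^(i)(t) = −log t in (2.8).

```python
# A numerical check (assumed example) of max-stability for the bivariate
# logistic EVD G(x,y) = exp{-(e^{-x/s} + e^{-y/s})^s} with Gumbel margins.
import math

def G(x, y, s=0.5):
    return math.exp(-((math.exp(-x / s) + math.exp(-y / s)) ** s))

for t in (0.5, 2.0, 7.0):
    for (x, y) in [(0.0, 1.0), (2.0, -1.0)]:
        lhs = G(x, y) ** t                            # G^t(x, y)
        rhs = G(x - math.log(t), y - math.log(t))     # G(alpha*x + beta, ...)
        assert abs(lhs - rhs) < 1e-12, (t, x, y)
print("max-stability identity verified numerically")
```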
2.4. Basic Properties of Multivariate Extreme Value Distributions

The indicator function of a subset A of a set X is the function 1_A : X → {0, 1} defined as

1_A(x) = 1 if x ∈ A,   and 1_A(x) = 0 if x ∉ A.
In this subsection, we study some basic properties of multivariate extreme value
distribution functions.
Remark 2.2. Any bivariate extreme value distribution is continuous in (x, y). This
follows from the subsequent lemma.
Lemma 2.1. If G is a bivariate distribution function such that its marginal distribution functions G_1 and G_2 are continuous, then G(x, y) is itself continuous in (x, y).

Proof. For 0 ≤ u, v ≤ 1 and 0 ≤ u′, v′ ≤ 1,

|uv − u′v′| = |(u − u′)v + u′(v − v′)| ≤ |(u − u′)v| + |u′(v − v′)| ≤ |u − u′| + |v − v′|.   (2.12)

Now for x, x′, y, y′ ∈ R, we know that as (x, y) → (x′, y′),

G_1(x) → G_1(x′),   G_2(y) → G_2(y′).

Suppose that (X, Y) ∼ G. Then, as in (2.12),

|G(x, y) − G(x′, y′)| = |E(1_{X≤x} · 1_{Y≤y}) − E(1_{X≤x′} · 1_{Y≤y′})|
                      ≤ E|1_{X≤x} · 1_{Y≤y} − 1_{X≤x′} · 1_{Y≤y′}|
                      ≤ E|1_{X≤x} − 1_{X≤x′}| + E|1_{Y≤y} − 1_{Y≤y′}|
                      = |G_1(x) − G_1(x′)| + |G_2(y) − G_2(y′)| → 0

as (x, y) → (x′, y′).

Thus, since the univariate marginals of a bivariate extreme value distribution are continuous, it itself is continuous. This leads us to another result.
Proposition 2.4. For F ∈ D(G), the convergence
Fn(anx + bn, cny + dn)→ G(x, y)
is locally uniform as n→ ∞.
Proof. The original analytical proof is due to Buchanan and Hildebrandt, 1908 [6]. We proceed in a sequence of steps.

Step (i). From the fact that F^n, n ≥ 1, are all monotone non-decreasing functions, we can show that G has to be monotone non-decreasing. Suppose it were not. Then for two pairs (x_1, y_1), (x_2, y_2) with a ≤ x_1 < x_2 ≤ b and c ≤ y_1 < y_2 ≤ d we would have

G(x_1, y_1) > G(x_2, y_2).

Let h := G(x_1, y_1) − G(x_2, y_2) > 0. Since F^n converges for all values of (x, y) on the intervals a ≤ x ≤ b, c ≤ y ≤ d, we have

lim_{n→∞} F^n(a_n x_1 + b_n, c_n y_1 + d_n) = G(x_1, y_1),
lim_{n→∞} F^n(a_n x_2 + b_n, c_n y_2 + d_n) = G(x_2, y_2).

That is, given ε > 0, it is possible to find an n_{ε,(x_1,y_1)} such that if n ≥ n_{ε,(x_1,y_1)} we have

|F^n(a_n x_1 + b_n, c_n y_1 + d_n) − G(x_1, y_1)| ≤ ε,

and an n_{ε,(x_2,y_2)} such that if n ≥ n_{ε,(x_2,y_2)} we have

|F^n(a_n x_2 + b_n, c_n y_2 + d_n) − G(x_2, y_2)| ≤ ε.
Suppose ε < h/2 and choose n greater than or equal to both n_{ε,(x_1,y_1)} and n_{ε,(x_2,y_2)}. Then

|F^n(a_n x_1 + b_n, c_n y_1 + d_n) − G(x_1, y_1)| < h/2,
|F^n(a_n x_2 + b_n, c_n y_2 + d_n) − G(x_2, y_2)| < h/2.

That is,

F^n(a_n x_1 + b_n, c_n y_1 + d_n) > G(x_1, y_1) − h/2,
F^n(a_n x_2 + b_n, c_n y_2 + d_n) < G(x_2, y_2) + h/2,

and therefore, since G(x_1, y_1) − G(x_2, y_2) = h,

F^n(a_n x_1 + b_n, c_n y_1 + d_n) − F^n(a_n x_2 + b_n, c_n y_2 + d_n) > h − h = 0.

But by hypothesis F^n is monotone non-decreasing, i.e.,

F^n(a_n x_1 + b_n, c_n y_1 + d_n) − F^n(a_n x_2 + b_n, c_n y_2 + d_n) ≤ 0.

We have thus reached a contradiction, and therefore the hypothesis that G is not a monotone non-decreasing function of (x, y) is invalid.
Step (ii). Fix S = [a, b] × [c, d] ⊂ R^2. We show that F^n(a_n x + b_n, c_n y + d_n) converges to G(x, y) uniformly on S.

Step (iii). G is continuous on R^2, hence G is uniformly continuous on S (by compactness). Thus, given ε > 0, there exists δ > 0 such that d((x, y), (x′, y′)) < 2δ implies

|G(x, y) − G(x′, y′)| < ε.

Consider an open cover of S by the sets B((x, y), δ) = (x − δ, x + δ) × (y − δ, y + δ). By the Heine-Borel Theorem, there exists a finite subcover B_i = (x_i − δ, x_i + δ) × (y_i − δ, y_i + δ), i = 1, . . . , k, of S.

Step (iv). For each (x_i, y_i), i = 1, . . . , k, find M_i such that for n > M_i,

|F^n(a_n x_i + b_n, c_n y_i + d_n) − G(x_i, y_i)| ≤ ε.

Step (v). Clearly for each i = 1, . . . , k,

V_n^i := F^n(a_n(x_i + δ) + b_n, c_n(y_i + δ) + d_n) − F^n(a_n(x_i − δ) + b_n, c_n(y_i − δ) + d_n)
      → G(x_i + δ, y_i + δ) − G(x_i − δ, y_i − δ) ≤ 2ε   as n → ∞.
Therefore there exist N_i, i = 1, . . . , k, such that V_n^i < 3ε for n > N_i.

Step (vi). Define N = max{M_1, . . . , M_k, N_1, . . . , N_k}. Pick (x, y) ∈ S; then (x, y) ∈ B_{i*} for some i*. It is easy to show that, for n > N,

|F^n(a_n x + b_n, c_n y + d_n) − G(x, y)|
= |F^n(a_n x + b_n, c_n y + d_n) − F^n(a_n x_{i*} + b_n, c_n y_{i*} + d_n) + F^n(a_n x_{i*} + b_n, c_n y_{i*} + d_n) − G(x_{i*}, y_{i*}) + G(x_{i*}, y_{i*}) − G(x, y)|
≤ |F^n(a_n x + b_n, c_n y + d_n) − F^n(a_n x_{i*} + b_n, c_n y_{i*} + d_n)| + |F^n(a_n x_{i*} + b_n, c_n y_{i*} + d_n) − G(x_{i*}, y_{i*})| + |G(x_{i*}, y_{i*}) − G(x, y)|
≤ V_n^{i*} + ε + ε ≤ 3ε + 2ε = 5ε.

Hence the statement is true.
A consequence of this fact is the following corollary.

Corollary 2.1. If (x_n) and (y_n) are real sequences such that x_n → u and y_n → v, then

lim_{n→∞} F^n(a_n x_n + b_n, c_n y_n + d_n) = G(u, v).
Proof. By the local uniform convergence from Proposition 2.4, given ε > 0 there exist δ > 0 and N_2 ≥ 1 such that for n ≥ N_2,

sup_{|x−u|≤δ, |y−v|≤δ} |F^n(a_n x + b_n, c_n y + d_n) − G(x, y)| ≤ ε/2.

Since G is continuous and (x_n, y_n) → (u, v), there exists N_0 ≥ 1 such that for n ≥ N_0 we get

|G(x_n, y_n) − G(u, v)| ≤ ε/2.

In addition, there exists N_1 ≥ 1 such that for n ≥ N_1 we have

|x_n − u| ≤ δ,   |y_n − v| ≤ δ.

Take N = N_0 ∨ N_1 ∨ N_2. Then, combining all the above, we observe that for n ≥ N,

|F^n(a_n x_n + b_n, c_n y_n + d_n) − G(u, v)|
≤ |F^n(a_n x_n + b_n, c_n y_n + d_n) − G(x_n, y_n)| + |G(x_n, y_n) − G(u, v)|
≤ sup_{|x−u|≤δ, |y−v|≤δ} |F^n(a_n x + b_n, c_n y + d_n) − G(x, y)| + |G(x_n, y_n) − G(u, v)|
≤ ε/2 + ε/2 = ε.

The proof is complete.
Theorem 2.2. If F ∈ D(G) and F ∈ D(H), then there exist A, C > 0 and B, D ∈ R such that

H(x, y) = G(Ax + B, Cy + D).

Proof. Since F ∈ D(G), there exist a_n, c_n > 0 and b_n, d_n ∈ R such that

F^n(a_n x + b_n, c_n y + d_n) d→ G(x, y).

By taking marginals F_i, i = 1, 2, of F we get the weak convergences

F_1^n(a_n x + b_n) d→ G_1(x),   F_2^n(c_n y + d_n) d→ G_2(y).

Since we also have F ∈ D(H), there exist a′_n, c′_n > 0 and b′_n, d′_n ∈ R such that

F^n(a′_n x + b′_n, c′_n y + d′_n) d→ H(x, y),

and taking marginals again,

F_1^n(a′_n x + b′_n) d→ H_1(x),   F_2^n(c′_n y + d′_n) d→ H_2(y).

From the univariate convergence to types theorem, we obtain

a′_n/a_n → A > 0,   (b′_n − b_n)/a_n → B,   c′_n/c_n → C > 0,   (d′_n − d_n)/c_n → D.

Using Corollary 2.1 we obtain

H(x, y) = G(Ax + B, Cy + D).
2.5. Standardization
Let us now proceed towards characterizing G. We have seen that the marginals of
a bivariate (multivariate) extreme-value distribution (EVD) are all univariate EVDs.
To study the dependence structure of the distribution, it would be much more con-
venient if all the marginals were the same. For this we would actually perform a
transformation of the marginals which we call standardization. There are multiple choices for this transformation, leading to Uniform, Gumbel, Weibull, or Frechet marginals. Each one has its own merit and has been explored in the literature. We will reduce to Frechet(1) margins in all coordinates; the merits of this choice will be evident soon.

Let F_i, i = 1, 2, be the marginal distribution functions of F. We define U_i(t) as

U_i(t) = F_i^←(1 − 1/t) = (1/(1 − F_i))^←(t),   t > 1.
From the univariate case, we know that

F_1^n(a_n x + b_n) → G_1(x) = exp{−(1 + γ_1 x)^{−1/γ_1}},   1 + γ_1 x > 0.

Then there are positive functions a_i(t), i = 1, 2, and functions b_i(t) ∈ R such that

lim_{t→∞} (U_i(tx) − b_i(t))/a_i(t) = (x^{γ_i} − 1)/γ_i.

This immediately leads to the fact that we can take

a_n = a_1(n),   c_n = a_2(n),   b_n = U_1(n),   d_n = U_2(n).

Hence we have for x, y > 0

x_n := (U_1(nx) − b_n)/a_n → (x^{γ_1} − 1)/γ_1 =: u,   (2.13)
y_n := (U_2(ny) − d_n)/c_n → (y^{γ_2} − 1)/γ_2 =: v.   (2.14)

Note that if x_n → u and y_n → v, then by the continuity of G and the monotonicity of F,

lim_{n→∞} F^n(a_n x_n + b_n, c_n y_n + d_n) = G(u, v).

Then for all x, y > 0 we get

lim_{n→∞} F^n(U_1(nx), U_2(ny)) = G((x^{γ_1} − 1)/γ_1, (y^{γ_2} − 1)/γ_2).

This is a direct consequence of Corollary 2.1 with x_n, y_n defined as in (2.13)-(2.14).
We have proved the following theorem:
Theorem 2.3 (Standardization). Suppose that there are real constants a_n, c_n > 0 and b_n, d_n such that

lim_{n→∞} F^n(a_n x + b_n, c_n y + d_n) = G(x, y)

for all (x, y), and the marginals of G are standardized as above. Then, with F_1(x) := F(x, ∞), F_2(y) := F(∞, y), and U_i(x), i = 1, 2, as defined above,

lim_{n→∞} F^n(U_1(nx), U_2(ny)) = G_0(x, y)   (2.15)

for all x, y > 0, where

G_0(x, y) = G((x^{γ_1} − 1)/γ_1, (y^{γ_2} − 1)/γ_2)

and γ_1, γ_2 are the marginal extreme value indices.
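A quick numerical check of (2.15) can be done for a fully explicit model. In the sketch below (an assumed example) the coordinates are independent standard exponentials, so F_i(x) = 1 − e^{−x}, U_i(t) = F_i^←(1 − 1/t) = log t, and G_0(x, y) = e^{−1/x} e^{−1/y}.

```python
# A numerical check (assumed example) of (2.15) with independent standard
# exponential coordinates: U_i(t) = log t and G_0(x,y) = e^{-1/x} e^{-1/y}.
import math

def F(x, y):
    return (1 - math.exp(-x)) * (1 - math.exp(-y))

x, y = 1.5, 0.8
for n in (10, 1000, 100000):
    print(n, F(math.log(n * x), math.log(n * y)) ** n)   # F^n(U_1(nx), U_2(ny))
print("G0(x,y):", math.exp(-1 / x) * math.exp(-1 / y))
```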
Remark 2.3. In case F has continuous marginal distribution functions F_1 and F_2, relation (2.15) can be formulated simply as

lim_{n→∞} P(max_{1≤i≤n} 1/(1 − F_1(X_i)) ≤ nx, max_{1≤i≤n} 1/(1 − F_2(Y_i)) ≤ ny) = G_0(x, y)

for x, y > 0; i.e., after a transformation of the marginal distributions to a standard distribution, namely F(x) := 1 − 1/x, x ≥ 1, a simplified limit relation applies. This means that we have reformulated the problem of identifying the limit distribution in such a way that the marginal distributions no longer play a role. From now on we can focus solely on the dependence structure.
Corollary 2.2. For any (x, y) for which 0 < G_0(x, y) < 1,

lim_{n→∞} n(1 − F(U_1(nx), U_2(ny))) = −log G_0(x, y).   (2.16)

Proof. Taking logarithms on both sides of (2.15), we get

lim_{n→∞} −n log F(U_1(nx), U_2(ny)) = −log G_0(x, y).

Since log x ∼ x − 1 as x → 1 and F(U_1(nx), U_2(ny)) → 1 as n → ∞, we have

−log F(U_1(nx), U_2(ny)) / (1 − F(U_1(nx), U_2(ny))) → 1,

and with −n log F(U_1(nx), U_2(ny)) → −log G_0(x, y) we obtain (2.16).
We shall also use the following slight extension.

Corollary 2.3. For any (x, y) for which 0 < G_0(x, y) < 1,

lim_{t→∞} t(1 − F(U_1(tx), U_2(ty))) = −log G_0(x, y),   (2.17)

where t runs through the real numbers.

Proof. Apply inequalities (1.25) to relation (2.16) with the integer n replaced by real t, which gives (2.17).
Why do we call this standardization, and what do we gain from Theorem 2.3? Observe that

F^n(U_1(nx), U_2(ny)) = P[⋁_{i=1}^n X_i ≤ U_1(nx), ⋁_{i=1}^n Y_i ≤ U_2(ny)]
                      = P[⋁_{i=1}^n U_1^←(X_i)/n ≤ x, ⋁_{i=1}^n U_2^←(Y_i)/n ≤ y]
                      =: P[⋁_{i=1}^n X*_i/n ≤ x, ⋁_{i=1}^n Y*_i/n ≤ y].

Thus we have from (2.15) that

P[⋁_{i=1}^n X*_i/n ≤ x, ⋁_{i=1}^n Y*_i/n ≤ y] → G_0(x, y)
weakly. Now let us find the distribution of X*_i and Y*_i. We have

P(X*_i ≤ t) = P(X_i ≤ U_1(t)) = P(X_i ≤ F_1^←(1 − 1/t)) = P(1/(1 − F_1(X_i)) ≤ t).

Note that if F_1 is continuous then F_1(X_i) ∼ U(0, 1). Then

P(X*_i ≤ t) = P(1/(1 − F_1(X_i)) ≤ t) = P(F_1(X_i) ≤ 1 − 1/t) = 1 − 1/t,   t ≥ 1.
Thus X*_i follows a standard Pareto distribution. The same is true for Y*_i. Hence the transformations U_1^← and U_2^← standardize X_i, Y_i to standard Pareto random variables. Even when F is not continuous, the standardized variables X*_i, Y*_i have asymptotically Pareto-like tails. Also note that the marginal G_01 of G_0 turns out to be

G_01(x) = G_0(x, ∞) = G((x^{γ_1} − 1)/γ_1, ∞) = G_1((x^{γ_1} − 1)/γ_1)
        = exp{−(1 + γ_1 · (x^{γ_1} − 1)/γ_1)^{−1/γ_1}} = exp{−(x^{γ_1})^{−1/γ_1}} = e^{−1/x} = Φ_1(x),   x > 0,

which is the Frechet(1) distribution. The same is true for the other marginal: G_02(y) = e^{−1/y}, y > 0. Thus we have reduced the problem of characterizing all bivariate extreme value distributions to the problem of characterizing all bivariate EVDs with Frechet(1) marginals.
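A short simulation (an assumed example) makes the standardization visible: starting from continuous marginals, X*_i = 1/(1 − F_1(X_i)) is standard Pareto, and P(⋁_{i≤n} X*_i / n ≤ x) is already close to the Frechet(1) value e^{−1/x} for moderate n.

```python
# A simulation sketch (assumed example) of the standardization: with X_i
# standard exponential, F_1(X_i) = 1 - e^{-X_i}, so the standardized variable
# X*_i = 1/(1 - F_1(X_i)) = e^{X_i} is standard Pareto, and the normalized
# componentwise maximum max_i X*_i / n is approximately Frechet(1).
import math
import numpy as np

rng = np.random.default_rng(1)
n, reps, x = 200, 10000, 1.2

X = rng.exponential(size=(reps, n))
X_star = np.exp(X)                              # standard Pareto variables
emp = np.mean(X_star.max(axis=1) / n <= x)      # empirical P(max X*_i / n <= x)
print("empirical:", emp, " Frechet(1) value exp(-1/x):", math.exp(-1 / x))
```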
Conclusion
Classical extreme value theory is concerned substantially with distributional prop-
erties of the maximum Mn of n i.i.d. random variables. Two results of basic impor-
tance are proved in Chapter 1. The first is the fundamental result, here called the
Extremal Types Theorem, which exhibits the possible limiting forms for the distri-
bution of Mn under linear normalizations. The second basic result given in Chapter
1 is almost trivial in the independent context, and gives a simple necessary and sufficient condition under which P{M_n ≤ u_n} converges for a given sequence of constants {u_n}. The theory is illustrated by several examples from each of the possible limiting types.

In Chapter 2, we concentrate on the maximum of n multivariate observations. The
maximum is defined by the vector of componentwise maxima. The structure of the
family of limiting distributions is actually quite rich, and can be studied in terms of
max-stable distributions. We discuss characterizations of the limiting multivariate
extreme value distributions and give domains of attraction criteria.
APPENDIX A
Appendix
In the following we will provide some basic tools which are used throughout the
thesis. All random variables are assumed to be defined on a common probability space
(Ω, F, P) (i.e., Ω is a nonempty set, F is a σ-algebra over Ω, and P is a measure on
the measurable space (Ω,F ) having total mass 1). We commence with elementary
results on the convergence of random variables and of probability distributions.
A.1. Modes of Convergence
We introduce the main modes of convergence for a sequence of r.v.s X, X_1, X_2, . . .
Convergence in Distribution
Definition A.1. We say that (X_n) converges in distribution or converges weakly to the random variable X (X_n d→ X) if for all bounded, continuous functions f the relation

E f(X_n) → E f(X),   n → ∞,

holds.

We sometimes write X_n d→ F_X, where F_X is the distribution or probability measure of X. We use the same symbol both for the distribution and for the d.f. of a r.v. Weak convergence can be described by the d.f.s F_{X_n} and F_X of X_n and X respectively: X_n d→ X holds if and only if for all continuity points y of the d.f. F_X the relation

F_{X_n}(y) → F_X(y),   n → ∞,

is satisfied. Moreover, if F_X is continuous then the relation can even be strengthened to uniform convergence:

sup_y |F_{X_n}(y) − F_X(y)| → 0,   n → ∞.
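The strengthening to uniform convergence is easy to see numerically. In the sketch below (an assumed example), X_n = M_n − log n with M_n the maximum of n standard exponentials, so F_{X_n}(y) = (1 − e^{−(y+log n)})^n converges to the Gumbel d.f. exp{−e^{−y}}, and the supremum distance over a grid shrinks with n.

```python
# A numerical illustration (assumed example) of uniform convergence of d.f.s:
# F_{X_n}(y) = (1 - e^{-(y + log n)})^n  ->  exp{-e^{-y}}  uniformly in y.
import numpy as np

ys = np.linspace(-2.0, 10.0, 2001)
FX = np.exp(-np.exp(-ys))                      # Gumbel limit d.f.
for n in (10, 100, 10000):
    arg = np.clip(ys + np.log(n), 0.0, None)   # F_{X_n} vanishes for y <= -log n
    FXn = (1.0 - np.exp(-arg)) ** n
    print(n, np.max(np.abs(FXn - FX)))         # sup-distance over the grid
```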
Convergence in Probability
Definition A.2. We say that (X_n) converges in probability to the random variable X (X_n P→ X) if for all positive ε the relation

P(|X_n − X| > ε) → 0,   n → ∞,

holds.

Remark A.1. Convergence in probability implies convergence in distribution. The converse is true if and only if X = c a.s. for some constant c.
Almost Sure Convergence
Definition A.3. We say that (X_n) converges almost surely (a.s.), or with probability 1, to the r.v. X (X_n a.s→ X) if for P-almost all ω ∈ Ω the relation

X_n(ω) → X(ω),   n → ∞,

holds. This means that

P(X_n → X) = P({ω : X_n(ω) → X(ω)}) = 1.

Convergence with probability 1 is equivalent to the relation

sup_{k≥n} |X_k − X| P→ 0,   n → ∞.

Hence convergence with probability 1 implies convergence in probability, and hence convergence in distribution.
Lp - Convergence
Definition A.4. Let p > 0. We say that (X_n) converges in L^p, or in p-th mean, to X (X_n Lp→ X) if E|X_n|^p < ∞, E|X|^p < ∞ and

E|X_n − X|^p → 0,   n → ∞.

By Markov's inequality, P(|X_n − X| > ε) ≤ ε^{−p} E|X_n − X|^p for positive p and ε. Thus X_n Lp→ X implies X_n P→ X. The converse is in general not true.
Vague Convergence
Suppose E is a locally compact topological space with countable base. We can safely think of E = R^d or [−∞, ∞]^d. Let C be the Borel σ-field of E.

A measure µ : C → [0, ∞] is a set function such that

1. µ(∅) = 0 and µ(A) ≥ 0 for A ∈ C.

2. If A_n, n ≥ 1, are mutually disjoint sets in C, then

µ(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ µ(A_i).

µ is Radon if µ(K) < ∞ for all K compact in E. Now denote

M_+(E) = {µ : µ is a non-negative Radon measure on C}.

Recall from the weak convergence of probability measures ([4]) that we considered a class of continuous and bounded test functions f, and if

P_n(f) := ∫ f(x) P_n(dx) → ∫ f(x) P(dx) =: P(f),

then we said P_n converges weakly to P. Note that the measures in M_+(E) are potentially infinite. So if we want to follow the same route, we need functions with compact support, on which the measures are finite. So define

C_K^+(E) = {f : E → [0, ∞) : f is continuous with compact support}.

Let µ_n ∈ M_+(E) for all n ≥ 0. Then µ_n converges to µ_0 vaguely if for all f ∈ C_K^+(E),

µ_n(f) := ∫_E f(x) µ_n(dx) → ∫_E f(x) µ_0(dx) =: µ_0(f).

We write µ_n v→ µ_0. The space M_+(E) can be made into a complete separable metric space with respect to the vague metric.
Uniform Convergence
Definition A.5. Consider a sequence of functions f_n : R → R, n ≥ 0. Then {f_n, n ≥ 1} converges to f_0 uniformly on A ⊂ R if

sup_{x∈A} |f_n(x) − f_0(x)| → 0,   n → ∞.

f_n converges to f_0 locally uniformly if it converges uniformly on all compact sets, i.e., for any a < b,

sup_{x∈[a,b]} |f_n(x) − f_0(x)| → 0,   n → ∞.
A.2. Inverses of Monotone Functions
Suppose H is a nondecreasing function on R. With the conventions inf ∅ = ∞ and inf R = −∞, we define the (left-continuous) inverse of H as

H^←(y) = inf{s : H(s) ≥ y}.

The function H^← is left-continuous at each x ∈ R. Indeed, suppose x_n ↑ x but

H^←(x_n) ↑ H^←(x−) < H^←(x).

Then there exist δ > 0 and y such that for all n,

H^←(x_n) < y < H^←(x) − δ.

The left inequality and the definition of H^← yield H(y) ≥ x_n for all n. Hence, letting n → ∞ we get H(y) ≥ x, whence again by the definition of H^← we get y ≥ H^←(x), which coupled with y < H^←(x) − δ leads to the desired contradiction.
Proposition A.1 (Properties of a generalised inverse function). If h is right-continuous, then the following properties hold.

(a) h(x) ≥ q ⟺ h^←(q) ≤ x.
(b) h(x) < q ⟺ h^←(q) > x.
(c) h(x_1) < q ≤ h(x_2) ⟺ x_1 < h^←(q) ≤ x_2.
(d) h(h^←(q)) ≥ q for all q ∈ [0, 1], with equality if h is continuous.
(e) h^←(h(x)) ≤ x for all x ∈ R, with equality for h increasing.
(f) h is continuous ⟺ h^← is increasing.
(g) h is increasing ⟺ h^← is continuous.
(h) If X is a r.v. with d.f. h, then P(h^←(h(X)) ≠ X) = 0.
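For computational purposes, h^← can be evaluated by bisection whenever a bracketing interval is known. The following sketch (an illustrative construction, not from the text) implements the generalised inverse and checks property (a) for the Frechet(1) d.f.

```python
# An illustrative implementation of the generalised inverse
# h^{<-}(q) = inf{s : h(s) >= q} by bisection, assuming a bracketing
# interval with h(lo) < q <= h(hi) is available.
import math

def gen_inverse(h, q, lo=-1e6, hi=1e6, tol=1e-10):
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if h(mid) >= q:
            hi = mid          # invariant: h(hi) >= q
        else:
            lo = mid          # invariant: h(lo) < q
    return hi

# Example: h = the Frechet(1) d.f., whose inverse is -1/log(q).
h = lambda s: math.exp(-1.0 / s) if s > 0 else 0.0
q = 0.3
xq = gen_inverse(h, q)
print(xq, -1.0 / math.log(q))   # bisection agrees with the closed form
print(h(xq) >= q)               # property (a): h(x) >= q  iff  h^{<-}(q) <= x
```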
Proposition A.2 (Convergence of generalised inverse functions). Let h, h_1, h_2, . . . be non-decreasing functions such that lim_{n→∞} h_n(x) = h(x) for every continuity point x of h. Then lim_{n→∞} h_n^←(y) = h^←(y) for every continuity point y of h^←.
Proofs of these results and more theory on generalised inverse functions can be
found in Resnick [18], Section 0.2.
A.3. Some Convergence Theorems
Lebesgue’s theorem on dominated convergence
Theorem A.1. Suppose that (f_n) is a sequence of measurable functions, that f_n → f pointwise almost everywhere as n → ∞, and that |f_n| ≤ g for all n, where g is integrable. Then f is integrable, and

lim_{n→∞} ∫ f_n dµ = ∫ f dµ.

Proof. See P. Billingsley, 1995 [5].
Skorohod’s theorem
Theorem A.2. For n ≥ 0 suppose X_n is a real random variable on (Ω_n, B_n, P_n) such that X_n d→ X_0. Then there exist random variables X̃_n, n ≥ 0, defined on the Lebesgue probability space ([0, 1], B[0, 1], m) such that

(i) X̃_n d= X_n for each n ≥ 0, and

(ii) X̃_n → X̃_0 almost surely with respect to m.

Proof. See Sidney I. Resnick, 1987, page 6 [18].
Bibliography
[1] A. A. Balkema and S. I. Resnick. Max-infinite divisibility. J. Appl. Probability,
14(2):309–319, 1977.
[2] V. Barnett. The ordering of multivariate data. J. Roy. Statist. Soc. Ser., 139(3):318–
355, 1976.
[3] J. Beirlant. Statistics Of Extremes: Theory And Applications. Wiley, 2004.
[4] Patrick Billingsley. Convergence of Probability Measures. John Wiley & Sons, 1968.
[5] Patrick Billingsley. Probability and Measure. Wiley-Interscience, 1995.
[6] H.E. Buchanan and T.H. Hildebrandt. Note on the convergence of a sequence
of functions of a certain type. Ann. of Math, 9(3):123–126, 1908.
[7] S.G. Coles and J.A. Tawn. Modelling extreme multivariate events. J. R. Statist.
Soc., 53(2):377–392, 1991.
[8] Stuart Coles. An introduction to statistical modeling of extreme values. Springer
series in statistics. Springer, 2001.
[9] Bikramjit Dass. A course in multivariate extremes. Spring, 2010.
[10] L. de Haan and S. I. Resnick. Limit theory for multidimensional sample ex-
tremes. Z. Wahr. verw. Geb., (40):317–337, 1977.
[11] Laurens de Haan and Ana Ferreira. Extreme Value Theory: An Introduction
(Springer Series in Operations Research). Springer, 2006.
[12] Paul Embrechts, Claudia Kluppelberg, and Thomas Mikosch. Modelling Ex-
tremal Events: for Insurance and Finance (Stochastic Modelling and Applied Proba-
bility). Springer, 2011.
[13] M. Falk, J. Husler, and R.D. Reiß. Laws of Small Numbers: Extremes and Rare
Events. Birkhauser, 2010.
[14] William Feller. An introduction to probability theory and its applications. Vol. II.
Second edition. John Wiley & Sons Inc., 1971.
[15] E.J. Gumbel. Statistics of extremes. Columbia University Press, 1958.
[16] M. R. Leadbetter, G. Lindgren, and H. Rootzen. Extremes and Related Properties
of Random Sequences and Processes. Springer-Verlag, 1983.
[17] J. Pickands. Multivariate extreme value distributions. Bull. Int. Statist. Inst.,
pages 859–878, 1981.
[18] S. Resnick. Extreme Values, Regular Variation, and Point Processes. Springer Series
in Operations Research and Financial Engineering, 1987.
[19] S.I. Resnick. A Probability Path. Birkhauser, 1999.
[20] E. Seneta. Regularly Varying Functions. Lecture Notes in Mathematics. Springer-
Verlag, 1976.
[21] M. Sibuya. Bivariate extreme statistics. Ann. Inst. Math. Statist., (11):195–210,
1960.
[22] J. Tiago de Oliveira. Extremal distributions. Rev. Fac. Sci. Lisboa, 1958.