VIETNAM NATIONAL UNIVERSITY
UNIVERSITY OF SCIENCE
FACULTY OF MATHEMATICS, MECHANICS AND INFORMATICS
Do Dai Chi
EXTREME VALUES AND PROBABILITY DISTRIBUTION FUNCTIONS
ON FINITE DIMENSIONAL SPACES
Undergraduate Thesis
Advanced Undergraduate Program in Mathematics
Hanoi - 2012
VIETNAM NATIONAL UNIVERSITY
UNIVERSITY OF SCIENCE
FACULTY OF MATHEMATICS, MECHANICS AND INFORMATICS
Do Dai Chi
EXTREME VALUES AND PROBABILITY DISTRIBUTION FUNCTIONS
ON FINITE DIMENSIONAL SPACES
Undergraduate Thesis
Advanced Undergraduate Program in Mathematics
Thesis advisor: Assoc.Prof.Dr. Ho Dang Phuc
Hanoi - 2012
Acknowledgments
It would not have been possible to write this undergraduate thesis without the help
and support of the kind people around me, only some of whom it is possible to
give particular mention here.
This thesis would not have been possible without the help, support and patience of
my advisor, Assoc. Prof. Dr. Ho Dang Phuc, not to mention his advice and unsurpassed
knowledge of probability and statistics. His advice, support and friendship have been
invaluable on both an academic and a personal level, for which I am extremely grateful.
I would like to show my gratitude to my teachers at the Faculty of Mathematics,
Mechanics and Informatics, University of Science, Vietnam National University, who
equipped me with essential mathematical knowledge during my first four years at the
university.
I would like to thank my parents for their personal support and great patience at all
times. My parents have given me their unequivocal support throughout, as always,
for which my mere expression of thanks likewise does not suffice.
Last, but by no means least, I thank my friends in K53-Advanced Math for their
support and encouragement throughout.
List of abbreviations and symbols
Here is a glossary of miscellaneous symbols, in case you need a reference guide.
∼        f(x) ∼ g(x) as x → x_0 means that lim_{x→x_0} f(x)/g(x) = 1.
→d       X_n →d X: convergence in distribution.
→P       X_n →P X: convergence in probability.
→a.s.    X_n →a.s. X: almost sure convergence.
→v       µ_n →v µ: vague convergence.
=d       X =d Y: X and Y have the same distribution.
o(1)     f(x) = o(g(x)) as x → x_0 means that lim_{x→x_0} f(x)/g(x) = 0.
f←       The generalized inverse of a monotone function f, defined by f←(x) = inf{y : f(y) ≥ x}.
Λ(x)     Gumbel distribution.
Φ_α(x)   Frechet distribution.
Ψ_α(x)   Weibull distribution.
x_F      x_F = sup{x ∈ R : F(x) < 1}, the right endpoint of F.
f(x−)    f(x−) = lim_{y↑x} f(y).
[F > 0]  The set {x : F(x) > 0}.
M_+(E)   The space of nonnegative Radon measures on E.
C(f)     The set of points at which the function f is continuous.
d.f. Distribution function.
r.v. Random variable.
DOA Domain of attraction.
Contents
Acknowledgments
List of abbreviations and symbols
Introduction
Chapter 1. Univariate Extreme Value Theory
1.1. Introduction
1.1.1. Limit Probabilities for Maxima
1.2. Maximum Domains of Attraction
1.2.1. Max-Stable Distributions
1.3. Extremal Value Distributions
1.3.1. Extremal Types Theorem
1.3.2. Generalized Extreme Value Distributions
1.4. Domain of Attraction Condition
1.4.1. General Theory of Domains of Attraction
1.5. Condition for belonging to Extreme Value Domain
Chapter 2. Multivariate Extreme Value Theory
2.1. Introduction
2.2. Limit Distributions of Multivariate Maxima
2.2.1. Max-infinitely Divisible Distributions
2.2.2. Characterizing Max-id Distributions
2.3. Multivariate Domain of Attraction
2.3.1. Max-stability
2.4. Basic Properties of Multivariate Extreme Value Distributions
2.5. Standardization
Conclusion
Chapter A. Appendix
A.1. Modes of Convergence
A.2. Inverses of Monotone Functions
A.3. Some Convergence Theorems
Bibliography
Introduction
Extreme value theory developed from an interest in studying the behavior of the
maximum or minimum (extremes) of independent and identically distributed ran-
dom variables. Historically, the study of extremes can be dated back to Nicholas
Bernoulli, who studied the mean largest distance from the origin to n points scattered
randomly on a straight line of some fixed length (Gumbel, 1958 [15]). Extreme
value theory provides important applications in finance, risk management, telecommunication,
environmental and pollution studies, and other fields. In this thesis, we
study the probabilistic approach to extreme value theory. The thesis is divided into
two chapters, namely,
Chapter 1: Univariate Extreme Value Theory.
Chapter 2: Multivariate Extreme Value Theory.
Chapter 1 introduces the basic concepts of univariate extreme value theory.
This chapter is concerned with the limit problem of determining the possible limits of
sample extremes and with the domain of attraction problem.
Chapter 2 provides basic results in multivariate extreme value theory. We deal
with the probabilistic aspects of multivariate extreme value theory, covering the
possible limits and their domains of attraction.
The main materials of the thesis were taken from the books by M. R. Leadbetter, G.
Lindgren, and H. Rootzen [16], Resnick [18], Embrechts [12], and de Haan and Ana
Ferreira [11]. We have also borrowed extensively from the lecture notes of Bikramjit
Dass [9].
CHAPTER 1
Univariate Extreme Value Theory
This chapter is primarily concerned with the central result of classical extreme value
theory, the Extremal Types Theorem, which specifies the possible forms for the limit-
ing distribution of maxima in sequences of independent and identically distributed
(i.i.d.) random variables (r.v.s). In the derivation, the possible limiting distributions
are identified with a class having a certain stability property, the so-called max-stable
distributions. It is further shown that this class consists precisely of the three families
known (loosely) as the three extreme value distributions.
1.1. Introduction
The asymptotic theory of sample extremes has been developed in parallel with the
central limit theory, and in fact the two theories bear some resemblance.
Let X1, X2, . . . , Xn be i.i.d. random variables. The central limit theory is concerned
with the limit behavior of the partial sums Sn = X1 + X2 + · · · + Xn as n → ∞,
whereas the theory of sample extremes is concerned with the limit behavior of the
sample extremes max(X1, X2, . . . , Xn) or min(X1, . . . , Xn) as n→ ∞.
We consider some basic theory for sums of independent random variables. This in-
cludes classical results such as the strong law of large numbers and the Central Limit
Theorem. Throughout this chapter X1, X2, . . . is a sequence of i.i.d. non-degenerate
real random variables defined on a probability space (Ω,F , P) with common distri-
bution function (d.f.) F. We consider the partial sums
S_n = X_1 + ··· + X_n, n ≥ 1,
and the sample means
X̄_n = n^{−1}S_n = S_n/n, n ≥ 1.
Let X be a random variable and denote the expectation and variance of X by E(X) = µ
and Var(X) = σ². Firstly, we assume that E(X) = µ < ∞. From the strong law of
large numbers, we get
X̄_n = n^{−1}S_n →a.s. µ.
With the additional assumption Var(X_1) = σ² < ∞, we get the Central Limit Theorem:
(S_n − nµ)/(σ√n) →d Z, Z ∼ N(0, 1).
Hence for large n, we can approximate
P(S_n ≤ x) ≈ P(Z ≤ (x − nµ)/(σ√n)).
Taking an alternative approach, we can deal with the problem of finding possi-
ble limit distributions for (say) sample maxima of independent and identically dis-
tributed random variables.
1.1.1. Limit Probabilities for Maxima
Whereas above we introduced ideas on partial sums, in this section we investigate
the fluctuations of the sample maxima:
M_n = ⋁_{i=1}^n X_i = max(X_1, ..., X_n), n ≥ 1.
Remark 1.1. Corresponding results for minima can easily be obtained from those for
maxima by using the identity
min(X1, . . . , Xn) = −max(−X1, . . . ,−Xn).
We shall therefore discuss minima explicitly only briefly in this work, except where
their joint distribution with M_n is considered.
We have the exact d.f. of the maximum M_n: for x ∈ R, n ∈ N,
P(M_n ≤ x) = P(X_1 ≤ x, ..., X_n ≤ x) = ∏_{i=1}^n P(X_i ≤ x) = F^n(x). (1.1)
Extreme events happen 'near' the upper end of the support of the distribution. We
denote the right endpoint of F by
x_F = sup{x ∈ R : F(x) < 1}. (1.2)
That is, F(x) < 1 for all x < x_F and F(x) = 1 for all x ≥ x_F. We immediately obtain
P(M_n ≤ x) = F^n(x) → 0, n → ∞, for all x < x_F,
P(M_n ≤ x) = F^n(x) → 1, n → ∞, in the case x_F < ∞, for all x ≥ x_F.
Therefore the limit distribution lim_{n→∞} F^n(x) is degenerate. Thus M_n →P x_F as
n → ∞ when x_F < ∞. Since the sequence (M_n) is non-decreasing in n, it converges
almost surely (a.s.), no matter whether the limit is finite or infinite, and hence we
conclude that
M_n →a.s. x_F, n → ∞.
This result is quite uninformative for our purpose and does not answer the basic
question in our mind. This difficulty is avoided by allowing a linear renormalization
of the variable M_n:
M*_n = (M_n − b_n)/a_n,
for sequences of constants a_n > 0 and b_n ∈ R.
Definition 1.1. A univariate distribution function F belongs to the maximum domain
of attraction of a distribution function G if
1. G is a non-degenerate distribution;
2. there exist sequences a_n > 0, b_n ∈ R such that
P((M_n − b_n)/a_n ≤ x) = F^n(a_n x + b_n) →d G(x). (1.3)
Finding the limit distribution G(x) is called the Extremal Limit Problem. Finding the
d.f.'s F(x) that admit sequences of constants as described above leading to G(x) is
called the Domain of Attraction Problem.
For large n, we can approximate P(M_n ≤ x) ≈ G((x − b_n)/a_n). We write F ∈ D(G). We
often omit the word 'maximum' and abbreviate domain of attraction as DOA.
Now we are faced with certain questions:
1. Given any F, does there exist G such that F ∈ D(G)?
2. Given any F, if G exists, is it unique?
3. Can we characterize the class of all possible limits G according to Definition 1.1?
4. Given a limit G, what properties should F have so that F ∈ D(G)?
5. How can we compute a_n, b_n?
The goal of the next section is to answer the above questions.
1.2. Maximum Domains of Attraction
Let's consider probabilities of the form
P((M_n − b_n)/a_n ≤ x),
which may be rewritten as
P(M_n ≤ u_n),
where u_n = u_n(x) = a_n x + b_n. In order to get more insight into the asymptotic
behavior of Mn we have to investigate the following aspects:
1. Conditions on F that ensure the existence of the limit of P(M_n ≤ u_n) as
n → ∞, and appropriate constants u_n.
2. Possible limit laws for the (centered and normalized) maxima Mn (comparable
to the Central Limit Theorem).
Example 1.1. Let X be a standard exponential random variable. Then the distribution
function of X is given by
F_X(x) = 1 − e^{−x}, x > 0.
If X_1, X_2, ... are i.i.d. random variables with common distribution function F, then
P(M_n ≤ x + log n) = (1 − e^{−x−log n})^n = (1 − e^{−x}/n)^n → exp{−e^{−x}} =: Λ(x), x ∈ R.
The limit distribution Λ(x) is called the Gumbel distribution. So we obtain that the
Gumbel distribution is a possible limit distribution according to Definition 1.1.
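The convergence in Example 1.1 is easy to visualize by simulation. The following sketch is not part of the original thesis; it assumes numpy is available and simply compares the empirical distribution of M_n − log n with Λ(x):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 5000
# maxima of n i.i.d. Exp(1) variables, centered by b_n = log n (a_n = 1)
samples = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)

x = 1.0
print((samples <= x).mean())      # empirical P(M_n - log n <= x)
print(np.exp(-np.exp(-x)))        # Gumbel limit Λ(x) = exp(-e^{-x}) ≈ 0.692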
The following Theorem provides a partial answer to question 1.
Theorem 1.1 (Poisson approximation). For given τ ∈ [0, ∞] and a sequence (u_n) of real
numbers, the following two conditions are equivalent:
nF̄(u_n) → τ as n → ∞, (1.4)
P(M_n ≤ u_n) → e^{−τ} as n → ∞, (1.5)
where F̄ = 1 − F.
Proof. Suppose first that 0 ≤ τ < ∞. If (1.4) holds, then
P(M_n ≤ u_n) = F^n(u_n) = (1 − F̄(u_n))^n = (1 − τ/n + o(1/n))^n,
so that (1.5) follows at once.
Conversely, if (1.5) holds (0 ≤ τ < ∞), we must have
F̄(u_n) = 1 − F(u_n) → 0
(otherwise, F̄(u_{n_k}) would be bounded away from 0 for some subsequence (n_k), and
P(M_{n_k} ≤ u_{n_k}) = (1 − F̄(u_{n_k}))^{n_k} would imply P(M_{n_k} ≤ u_{n_k}) → 0). By taking
logarithms in (1.5), we have
−n ln(1 − F̄(u_n)) → τ.
Since −ln(1 − x) ∼ x as x → 0, this implies nF̄(u_n) = τ + o(1), giving (1.4).
If τ = ∞ and (1.4) holds but (1.5) does not, there must be a subsequence (n_k) such that
P(M_{n_k} ≤ u_{n_k}) → exp{−τ′}
as k → ∞ for some τ′ < ∞. But then, by the first part of the proof applied along this
subsequence, n_k F̄(u_{n_k}) → τ′ < ∞, contradicting (1.4) with τ = ∞.
Similarly, (1.5) implies (1.4) for τ = ∞.
Example 1.2. We consider the distribution function F given by
F(x) = 1 − e^{1/x} for x < 0, F(x) = 1 for x ≥ 0.
By Theorem 1.1, if (u_n) is such that
ne^{1/u_n} → τ > 0 as n → ∞,
it follows that
P(M_n ≤ u_n) → e^{−τ} as n → ∞.
By writing τ = e^{−x} (−∞ < x < ∞) and taking u_n = (log τ − log n)^{−1}, it follows that
P(M_n ≤ −(log n + x)^{−1}) → exp(−e^{−x}),
from which it is readily checked that
P((log n)²[M_n + 1/log n] ≤ x + o(1)) → exp(−e^{−x}),
giving the Gumbel distribution with
a_n = (log n)^{−2}, b_n = −(log n)^{−1}.
We denote f(x−) = lim_{y↑x} f(y) and p(x) = F(x) − F(x−).
Theorem 1.2. Let F be a d.f. with right endpoint x_F ≤ ∞ and let τ ∈ (0, ∞). There exists
a sequence (u_n) satisfying nF̄(u_n) → τ if and only if
lim_{x↑x_F} F̄(x)/F̄(x−) = 1, (1.6)
or equivalently, if and only if
lim_{x↑x_F} p(x)/F̄(x−) = 0. (1.7)
Hence, by Theorem 1.1, if 0 < ρ < 1, there is a sequence (u_n) such that P(M_n ≤ u_n) → ρ
if and only if (1.6) (or (1.7)) holds. For ρ = 0 or 1, such a sequence may always be found.
Proof. We suppose that (1.4) holds for some 0 < τ < ∞ but that, say, (1.7) does not.
Then there exist ε > 0 and a sequence (x_n) such that x_n → x_F and
p(x_n) ≥ 2εF̄(x_n−). (1.8)
Now choose a sequence of integers (n_j) so that 1 − τ/n_j is "close" to the midpoint of
the jump of F at x_j, i.e. such that
1 − τ/n_j ≤ (F(x_j−) + F(x_j))/2 ≤ 1 − τ/(n_j + 1).
Clearly we have either
(i) u_{n_j} < x_j for infinitely many values of j, or
(ii) u_{n_j} ≥ x_j for infinitely many j-values.
If alternative (i) holds, then for such j,
n_j F̄(u_{n_j}) ≥ n_j F̄(x_j−). (1.9)
Now, clearly
n_j F̄(x_j−) = τ + n_j[(1 − τ/n_j) − (F(x_j) + F(x_j−))/2 + p(x_j)/2]
≥ τ + n_j p(x_j)/2 − n_j(τ/n_j − τ/(n_j + 1))
≥ τ + εn_j F̄(x_j−) − τ/(n_j + 1)
by (1.8), so that
(1 − ε)n_j F̄(x_j−) ≥ τ − τ/(n_j + 1).
Since clearly n_j → ∞, it follows (since τ ∈ (0, ∞) by assumption) that
lim inf_{j→∞} n_j F̄(x_j−) ≥ τ/(1 − ε) > τ,
and hence by (1.9),
lim sup_{j→∞} n_j F̄(u_{n_j}) > τ,
which contradicts (1.4). The calculations in case (ii) (u_{n_j} ≥ x_j for infinitely many j)
are very similar, with only the obvious changes.
Conversely, suppose that (1.6) holds and let (u_n) be any sequence such that
F(u_n−) ≤ 1 − τ/n ≤ F(u_n) (e.g. u_n = F←(1 − τ/n)), from which a simple rearrangement yields
(F̄(u_n)/F̄(u_n−)) τ ≤ nF̄(u_n) ≤ τ,
from which (1.4) follows, since clearly u_n → x_F as n → ∞.
The result applies in particular to discrete distributions with infinite right endpoint.
If the jump heights of the d.f. do not decay sufficiently fast, then a non-degenerate
limit distribution for maxima does not exist.
Example 1.3 (Poisson distribution). Let X be a Poisson r.v. with expectation λ > 0, i.e.
P(X = k) = e^{−λ}λ^k/k!, k ∈ N.
Then,
F̄(k)/F̄(k−1) = 1 − (F(k) − F(k−1))/F̄(k−1) = 1 − (λ^k/k!)(Σ_{r=k}^∞ λ^r/r!)^{−1}
= 1 − (1 + Σ_{r=k+1}^∞ (k!/r!)λ^{r−k})^{−1}.
The latter sum can be estimated as
Σ_{s=1}^∞ λ^s/((k+1)(k+2)···(k+s)) ≤ Σ_{s=1}^∞ (λ/k)^s = (λ/k)/(1 − λ/k), k > λ,
which tends to 0 as k → ∞, so that F̄(k)/F̄(k−1) → 0.
Hence, by virtue of Theorem 1.2, we see that no non-degenerate distribution can
be the limit of normalized maxima taken from a sequence of random variables
identically distributed as X.
Example 1.4 (Geometric distribution). We consider the random variable X with geometric
distribution:
P(X = k) = p(1 − p)^{k−1}, 0 < p < 1, k ∈ N.
For this distribution, we have
F̄(k)/F̄(k−1) = 1 − p(1 − p)^{k−1}(Σ_{r=k}^∞ p(1 − p)^{r−1})^{−1} = 1 − p ∈ (0, 1).
By the same argument as above, no limit P(M_n ≤ u_n) → ρ exists except for ρ = 0
or 1, which implies that there is no non-degenerate limit distribution for the maxima
in the geometric distribution case.
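Numerically, the obstruction is easy to see (an illustration, not from the thesis; it assumes numpy): the possible values of nF̄(k) decrease by the fixed factor 1 − p as k increases by 1, so they cannot approach an arbitrary τ ∈ (0, ∞):

import numpy as np

p, n = 0.5, 1000
k = np.arange(1, 21)
tail = (1 - p) ** k        # F̄(k) = P(X > k) = (1 - p)^k for the geometric law
print(n * tail[7:12])      # consecutive values of n F̄(k) jump by the factor 1 - p
                           # and therefore skip over most levels τ in (0, ∞)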
Example 1.5 (Negative binomial distribution). Let X be the random variable with
P(X = k) = C_{i+k−1}^{k−1} p^i (1 − p)^{k−1}, k ∈ N, 0 < p < 1, i > 0.
Using properties of the binomial coefficients we obtain
F̄(k)/F̄(k−1) = 1 − (F(k) − F(k−1))/F̄(k−1) ≤ 1 − p ∈ (0, 1),
i.e. no limit P(M_n ≤ u_n) → ρ exists except for ρ = 0 or 1.
Definition 1.2. Suppose that H : R → R is a non-decreasing function. The generalized
inverse of H is given by
H←(x) = inf{y : H(y) ≥ x}.
Properties of the generalized inverse are given in Appendix A.2.
Lemma 1.1. (i) For H as above, if a > 0, b and c are constants, and T(x) = H(ax + b) − c,
then T←(y) = a^{−1}[H←(y + c) − b].
(ii) If F is a non-degenerate d.f., there exist y_1 < y_2 such that F←(y_1) < F←(y_2) are well
defined (and finite).
Proof. (i) We have
T←(y) = inf{x : H(ax + b) − c ≥ y}
= a^{−1}[inf{ax + b : H(ax + b) ≥ y + c} − b]
= a^{−1}[H←(y + c) − b],
as required.
(ii) If F is non-degenerate, there exist x′_1 < x′_2 such that
0 < F(x′_1) = y_1 < F(x′_2) = y_2 ≤ 1.
Clearly x_1 = F←(y_1) and x_2 = F←(y_2) are both well defined. Also F←(y_2) ≥ x′_1, and
equality would require F(z) ≥ y_2 for all z > x′_1, so that
F(x′_1) = lim_{ε↓0} F(x′_1 + ε) = F(x′_1+) ≥ y_2,
contradicting F(x′_1) = y_1. Thus F←(y_2) > x′_1 ≥ x_1 = F←(y_1), as required.
For any function H denote
C(H) = {x ∈ R : H is finite and continuous at x}.
If two r.v.s X and Y have the same distribution, we write X =d Y.
Definition 1.3. Two distribution functions U(x) and V(x) are of the same type if for
some A > 0, B ∈ R
V(x) = U(Ax + B)
for all x.
In terms of random variables, if X has distribution U and Y has distribution V, then
Y =d (X − B)/A.
Example 1.6. Let N(0, 1, x) denote the normal distribution function with mean 0 and
variance 1. Then it is easy to see that N(µ, σ², x) = N(0, 1, (x − µ)/σ) for σ > 0, µ ∈ R.
Thus all normal d.f.'s are of the same type, called the normal type. If X_{0,1} has
N(0, 1, x) as its distribution and X_{µ,σ} has N(µ, σ², x) as its distribution, then
X_{µ,σ} =d σX_{0,1} + µ.
Now we state the theorem developed by Gnedenko and Khintchin.
Theorem 1.3 (Convergence to types theorem). (a) Suppose U(x) and V(x) are two non-degenerate
distribution functions. Suppose for n ≥ 1, F_n is a distribution function, a_n > 0, b_n ∈ R,
α_n > 0, β_n ∈ R, and
F_n(a_n x + b_n) →d U(x), F_n(α_n x + β_n) →d V(x). (1.10)
Then, as n → ∞,
α_n/a_n → A > 0, (β_n − b_n)/a_n → B ∈ R, (1.11)
and
V(x) = U(Ax + B). (1.12)
An equivalent formulation in terms of random variables:
(a′) Let X_n, n ≥ 1, be random variables with distribution functions F_n, and let U, V be
random variables with distribution functions U(x), V(x). If
(X_n − b_n)/a_n →d U, (X_n − β_n)/α_n →d V, (1.13)
then (1.11) holds and
V =d (U − B)/A. (1.14)
(b) Conversely, if (1.11) holds, then either of the two relations in (1.10) (or (1.13))
implies the other, and (1.12) (or (1.14)) holds.
Proof. (b) Suppose (1.11) holds and Y_n := (X_n − b_n)/a_n →d U. We must show that
(X_n − β_n)/α_n →d (U − B)/A.
By Skorohod's Theorem (see Appendix A.3), there exist Y*_n, U*, n ≥ 1, defined on
([0, 1], B[0, 1], m) (the Lebesgue probability space, m being Lebesgue measure) such that
Y*_n =d Y_n, U* =d U, Y*_n →a.s. U*.
Put X*_n := a_n Y*_n + b_n, so X*_n =d X_n. Then
(X_n − β_n)/α_n =d (X*_n − β_n)/α_n = (a_n/α_n)Y*_n + (b_n − β_n)/α_n → U*/A − B/A =d (U − B)/A,
which means (X_n − β_n)/α_n →d (U − B)/A.
(a) By Proposition A.2 (see Appendix A.2), if G_n →d G then also G←_n → G← weakly,
so the relations in (1.10) can be inverted to give
(F←_n(y) − b_n)/a_n → U←(y), y ∈ C(U←), (1.15)
(F←_n(y) − β_n)/α_n → V←(y), y ∈ C(V←), (1.16)
weakly. Since neither U(x) nor V(x) concentrates at one point, we can find points
y_1 < y_2 with y_i ∈ C(U←) ∩ C(V←), i = 1, 2, satisfying
−∞ < U←(y_1) < U←(y_2) < ∞
and
−∞ < V←(y_1) < V←(y_2) < ∞.
Therefore from (1.15) and (1.16) we have, for i = 1, 2,
(F←_n(y_i) − b_n)/a_n → U←(y_i), (F←_n(y_i) − β_n)/α_n → V←(y_i), (1.17)
and by subtraction
(F←_n(y_2) − F←_n(y_1))/a_n → U←(y_2) − U←(y_1) > 0,
(F←_n(y_2) − F←_n(y_1))/α_n → V←(y_2) − V←(y_1) > 0. (1.18)
Divide the first relation in (1.18) by the second to obtain
α_n/a_n → (U←(y_2) − U←(y_1))/(V←(y_2) − V←(y_1)) =: A > 0.
Using this and (1.17) we get
(F←_n(y_1) − b_n)/a_n → U←(y_1),
(F←_n(y_1) − β_n)/a_n = ((F←_n(y_1) − β_n)/α_n) · (α_n/a_n) → A·V←(y_1),
and so subtracting yields
(β_n − b_n)/a_n → U←(y_1) − A·V←(y_1) =: B.
This gives (1.11), and (1.12) follows from (b).
Remark 1.2. (a) The answer to question 2 is quite clear from Theorem 1.3. Namely,
if F ∈ D(G_1) and F ∈ D(G_2), then G_1 and G_2 must be of the same type.
(b) The theorem shows that when
(X_n − b_n)/a_n →d U
and U is non-constant, a suitable choice of the normalizing constants is
a_n = F←_n(y_2) − F←_n(y_1), b_n = F←_n(y_1).
1.2.1. Max-Stable Distributions
In this section we answer the question: what are the possible (non-degenerate) limit
laws for the maxima M_n when properly normalised and centred?
Definition 1.4. A non-degenerate distribution function F is max-stable if for
X_1, X_2, ..., X_n i.i.d. with d.f. F there exist a_n > 0, b_n ∈ R such that
M_n =d a_n X_1 + b_n.
Example 1.7. If X_1, X_2, ... is a sequence of independent standard exponential Exp(1)
variables, F(x) = 1 − e^{−x} for x > 0. Taking a_n = 1 and b_n = log n, we have
P((M_n − b_n)/a_n ≤ x) = F^n(x + log n) = [1 − e^{−(x+log n)}]^n = [1 − n^{−1}e^{−x}]^n → exp(−e^{−x})
as n → ∞, for each fixed x ∈ R. Hence, with the chosen a_n and b_n, the limit
distribution of the normalized M_n as n → ∞ is the Gumbel distribution.
Example 1.8. If X_1, X_2, ... is a sequence of independent standard Frechet variables,
F(x) = exp(−1/x) for x > 0. For a_n = n and b_n = 0,
P((M_n − b_n)/a_n ≤ x) = F^n(nx) = [exp(−1/(nx))]^n = exp(−n/(nx)) = F(x)
for each fixed x > 0. Hence the limit in this case, which is an exact identity for all n
because of the max-stability of F, is again the standard Frechet distribution.
Example 1.9. If X_1, X_2, ... is a sequence of independent uniform U(0, 1) variables,
F(x) = x for 0 ≤ x ≤ 1. For fixed x < 0, suppose n > −x and let a_n = 1/n and b_n = 1.
Then,
P((M_n − b_n)/a_n ≤ x) = F^n(n^{−1}x + 1) = (1 + x/n)^n → e^x
as n → ∞. Hence the limit distribution is of Weibull type, showing that the Weibull
distributions also arise as limits of normalized maxima.
Theorem 1.4 (Limit property of max-stable laws). The class of all max-stable distribution
functions coincides with the class of all limit laws G for (properly normalised) maxima
of i.i.d. r.v.s (as given in (1.3)).
Proof. 1. If X_1, X_2, ... are i.i.d. with d.f. G, G is max-stable and M_n = ⋁_{i=1}^n X_i, then
M_n =d a_n X_1 + b_n
for some a_n > 0, b_n ∈ R. Then for all x ∈ R,
lim_{n→∞} P((M_n − b_n)/a_n ≤ x) = G(x).
2. Now suppose that H is non-degenerate and there exist a_n > 0, b_n ∈ R such that
lim_{n→∞} F^n(a_n x + b_n) = H(x).
We claim that H is max-stable. Observe that for all k ∈ N we have
lim_{n→∞} F^{nk}(a_n x + b_n) = H^k(x),
lim_{n→∞} F^{nk}(a_{nk} x + b_{nk}) = H(x).
By virtue of the Convergence to Types Theorem, there exist a*_k > 0, b*_k ∈ R such that
lim_{n→∞} a_{nk}/a_n = a*_k, lim_{n→∞} (b_{nk} − b_n)/a_n = b*_k
and
H(x) = H^k(a*_k x + b*_k).
Therefore if Y_1, ..., Y_k are i.i.d. from H, then for all k ∈ N,
Y_1 =d (⋁_{i=1}^k Y_i − b*_k)/a*_k,
which implies
⋁_{i=1}^k Y_i =d a*_k Y_1 + b*_k.
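For the Gumbel law the max-stability relation can be written down explicitly: Λ^k(x) = exp(−ke^{−x}) = Λ(x − log k), so a*_k = 1 and b*_k = log k. A one-line numerical confirmation (a sketch, assuming numpy):

import numpy as np

def gumbel(x):
    return np.exp(-np.exp(-x))   # Λ(x)

k, x = 5, 0.3
print(gumbel(x) ** k)            # Λ^k(x)
print(gumbel(x - np.log(k)))     # equals Λ(x - log k), confirming max-stability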
1.3. Extremal Value Distributions
1.3.1. Extremal Types Theorem
The extremal types theorem plays a central role in the study of extreme value theory.
In the literature, Fisher and Tippett (1928) were the first to discover the extremal
types theorem, and these results were later proved in complete generality by Gnedenko
(1943). Later, Galambos (1987), Leadbetter, Lindgren and Rootzen (1983), and
Resnick (1987) gave excellent reference books on the probabilistic aspects.
Theorem 1.5 (Fisher-Tippett (1928), Gnedenko (1943)). Suppose there exist sequences
a_n > 0 and b_n ∈ R, n ≥ 1, such that
(M_n − b_n)/a_n →d G,
where G is non-degenerate. Then G is of one of the following three types:
1. Type I, Gumbel: Λ(x) = exp{−e^{−x}}, x ∈ R.
2. Type II, Frechet: Φ_α(x) = 0 if x < 0; Φ_α(x) = exp{−x^{−α}} if x ≥ 0, for some α > 0.
3. Type III, Weibull: Ψ_α(x) = exp{−(−x)^α} if x < 0; Ψ_α(x) = 1 if x ≥ 0, for some α > 0.
Proof. For t ∈ R let [t] denote the greatest integer less than or equal to t. We
proceed in a sequence of steps.
Step (i). From P[(M_n − b_n)/a_n ≤ x] = F^n(a_n x + b_n) →d G(x), we get for any t > 0
F^{[nt]}(a_{[nt]} x + b_{[nt]}) →d G(x),
and on the other hand
F^{[nt]}(a_n x + b_n) = (F^n(a_n x + b_n))^{[nt]/n} → G^t(x).
Thus G^t and G are of the same type, and the convergence to types theorem gives
the existence of two functions α(t) > 0, β(t) ∈ R, t > 0, such that for all t > 0,
lim_{n→∞} a_n/a_{[nt]} = α(t), lim_{n→∞} (b_n − b_{[nt]})/a_{[nt]} = β(t), (1.19)
and also
G^t(x) = G(α(t)x + β(t)). (1.20)
Step (ii). We observe that the functions α(t) and β(t) are Lebesgue measurable. For
instance, to prove that α(·) is measurable, it suffices (since limits of measurable functions
are measurable) to show that the function
t ↦ a_n/a_{[nt]}
is measurable for each n. Since a_n does not depend on t, the previous statement is
true if the function
t ↦ a_{[nt]}
is measurable. Since this function has the countable range {a_j, j ≥ 1}, it suffices to
show that
{t > 0 : a_{[nt]} = a_j}
is measurable. But this set equals
⋃_{k : a_k = a_j} [k/n, (k + 1)/n),
which, being a union of intervals, is certainly a measurable set.
Step (iii). Facts about the Hamel equation (see [20]). We need to use facts about the
possible solutions of the functional equations called Hamel's equation and Cauchy's
equation. If f(x), x > 0, is finite, measurable and real valued and satisfies the Cauchy
equation
f(x + y) = f(x) + f(y), x > 0, y > 0,
then f is necessarily of the form
f(x) = cx, x > 0,
for some c ∈ R. A variant of this is Hamel's equation. If φ(x), x > 0, is finite,
measurable, real valued and satisfies Hamel's equation
φ(xy) = φ(x)φ(y), x > 0, y > 0,
then φ is of the form
φ(x) = x^ρ
for some ρ ∈ R.
Step (iv). Another useful fact. If F is a non-degenerate distribution function and
F(ax + b) = F(cx + d) for all x ∈ R,
for some a > 0, c > 0 and constants b, d, then a = c and b = d.
Indeed, choose y_1 < y_2 and −∞ < x_1 < x_2 < ∞ by (ii) of Lemma 1.1 so that
x_1 = F←(y_1), x_2 = F←(y_2). Taking inverses of F(ax + b) and F(cx + d) by (i) of
Lemma 1.1, we have
a^{−1}(F←(y) − b) = c^{−1}(F←(y) − d)
for all y. Applying this to y_1 and y_2 in turn, we obtain
a^{−1}(x_1 − b) = c^{−1}(x_1 − d) and a^{−1}(x_2 − b) = c^{−1}(x_2 − d),
from which it follows simply that a = c and b = d.
Step (v). Return to (1.20); for t > 0, s > 0 we have on the one hand
G^{ts}(x) = G(α(ts)x + β(ts)),
and on the other
G^{ts}(x) = (G^s(x))^t = G(α(s)x + β(s))^t = G(α(t)[α(s)x + β(s)] + β(t))
= G(α(t)α(s)x + α(t)β(s) + β(t)).
Since G is assumed non-degenerate, we therefore conclude by Step (iv) that for
t > 0, s > 0,
α(ts) = α(t)α(s), (1.21)
β(ts) = α(t)β(s) + β(t) = α(s)β(t) + β(s), (1.22)
the last step following by symmetry. We recognize (1.21) as the famous Hamel
functional equation. The only finite, measurable, nonnegative solutions are of the form
α(t) = t^{−θ}, θ ∈ R.
Step (vi). We will show that
β(t) = c log t if θ = 0, β(t) = c(1 − t^{−θ}) if θ ≠ 0,
for some c ∈ R.
If θ = 0, then α(t) = 1 and β(t) satisfies
β(ts) = β(t) + β(s).
So exp{β(·)} satisfies the Hamel equation, which implies that
exp{β(t)} = t^c
for some c ∈ R, and thus β(t) = c log t.
If θ ≠ 0, then
β(ts) = α(t)β(s) + β(t) = α(s)β(t) + β(s).
Fix s_0 ≠ 1, so that α(s_0) ≠ 1, and we get
α(t)β(s_0) + β(t) = α(s_0)β(t) + β(s_0),
and solving for β(t) we get
β(t)(1 − α(s_0)) = β(s_0)(1 − α(t)).
Note that 1 − α(s_0) ≠ 0. Thus we conclude
β(t) = (β(s_0)/(1 − α(s_0)))(1 − α(t)) =: c(1 − t^{−θ}).
Step (vii). We conclude that
G^t(x) = G(x + c log t) if θ = 0, (a)
G^t(x) = G(t^{−θ}x + c(1 − t^{−θ})) if θ ≠ 0. (b)
Now we show that θ = 0 corresponds to a limit distribution of type Λ(x), that the
case θ > 0 corresponds to a limit distribution of type Φ_α, and that θ < 0 corresponds
to Ψ_α.
Consider the case θ = 0 and examine the equation in (a). For fixed x, the function G^t(x)
is non-increasing in t. So c < 0, since otherwise the right side of (a) would not be
decreasing. If x_0 ∈ R were such that G(x_0) = 1, then
1 = G^t(x_0) = G(x_0 + c log t) for all t > 0,
which implies
G(y) = 1, y ∈ R,
and this contradicts G non-degenerate. If x_0 ∈ R were such that G(x_0) = 0, then
0 = G^t(x_0) = G(x_0 + c log t) for all t > 0,
which implies
G(x) = 0 for all x ∈ R,
again giving a contradiction. We conclude that 0 < G(y) < 1 for all y ∈ R.
In (a), set x = 0 and write G(0) = e^{−κ}. Then
e^{−tκ} = G(c log t).
Set y = c log t, so that t = e^{y/c}, and we get
G(y) = exp{−κe^{y/c}} = exp{−e^{−(y/|c| − log κ)}},
which is of the type of Λ(x).
We consider the case θ > 0 and examine the equation in (b):
G^t(x) = G(t^{−θ}x + c(1 − t^{−θ})) = G(t^{−θ}(x − c) + c),
i.e., changing variables,
G^t(x + c) = G(t^{−θ}x + c).
Set H(x) = G(x + c). Then G and H are of the same type, so it suffices to solve for
H. The function H satisfies
H^t(x) = H(t^{−θ}x) (1.23)
and H is non-degenerate. Set x = 0, and we get from (1.23)
t log H(0) = log H(0)
for all t > 0. So either log H(0) = 0 or −∞; i.e., either H(0) = 1 or 0. However,
H(0) = 1 is impossible, since it would imply the existence of x < 0 such that the
left side of (1.23) is decreasing in t while the right side of (1.23) is increasing in t.
Therefore we conclude H(0) = 0. Again from (1.23) we obtain
H^t(1) = H(t^{−θ});
if H(1) = 0, then H ≡ 0, and if H(1) = 1, then H ≡ 1, both statements contradicting
H non-degenerate. Therefore H(1) ∈ (0, 1). Set α = θ^{−1}, H(1) = exp{−ρ^{−α}} and
u = t^{−θ}, so that u^{−α} = t. From (1.23) with x = 1 we get, for u > 0,
H(u) = H(1)^t = exp{−ρ^{−α}u^{−α}} = exp{−(ρu)^{−α}} = Φ_α(ρu).
The case θ < 0 is handled similarly and leads to Ψ_α.
In words, the extremal types theorem says that for a sequence of i.i.d. random variables
with suitable normalizing constants, the limiting distribution of the maximum,
if it exists, follows one of three types of extreme value distributions, labeled I, II
and III. Collectively, these three classes of distributions are termed the extreme value
distributions, with types I, II and III widely known as the Gumbel, Frechet and
Weibull families respectively. Each family has a location and a scale parameter,
b and a respectively; additionally, the Frechet and Weibull families have a shape
parameter α.
Remark 1.3. (a) Though for modelling purposes the types of Λ, Φ_α and Ψ_α are very
different, from a mathematical point of view they are closely linked. Indeed, one
immediately verifies the following properties. Suppose X > 0; then
X ∼ Φ_α ⇔ −1/X ∼ Ψ_α ⇔ log X^α ∼ Λ.
(b) We have shown that:
class of extreme value distributions = class of max-stable distributions = class of
distributions appearing as limits in Definition 1.1.
Thus we have a characterization of the limit distributions appearing in Definition
1.1, which answers question 3.
1.3.2. Generalized Extreme Value Distributions
Definition 1.5 (Generalized Extreme Value Distributions). For any γ ∈ R, the distribution
defined by
G_γ(x) := exp(−(1 + γx)^{−1/γ}), 1 + γx > 0,
is an extreme value distribution, abbreviated EVD. The parameter γ is called the
extreme value index.
Since (1 + γx)^{−1/γ} → e^{−x} as γ → 0, for γ = 0 we interpret G_0(x) = exp{−e^{−x}}.
The family of distributions G_γ((x − µ)/σ), for µ, γ ∈ R, σ > 0, is called the family of
generalized extreme value distributions, in the von Mises or von Mises-Jenkinson
parametrization. It shows that the limit distribution functions form a simple explicit
one-parameter family, apart from the scale and location parameters.
Let us consider the subclasses γ > 0, γ = 0, and γ < 0 separately:
(a) For γ > 0, clearly G_γ(x) < 1 for all x, i.e., the right endpoint of the distribution
is infinity. Moreover, as x → ∞, 1 − G_γ(x) ∼ γ^{−1/γ}x^{−1/γ}, i.e., the distribution
has a rather heavy right tail. We use G_γ((x − 1)/γ) and get, with α = 1/γ > 0,
Φ_α(x) = 0 for x ≤ 0, Φ_α(x) = exp(−x^{−α}) for x > 0.
This class is often called the Frechet class of distributions.
(b) For γ = 0, the distribution
G_0(x) = exp(−e^{−x}),
for all x ∈ R, is called the double-exponential or Gumbel distribution.
Observe that the right endpoint of the distribution equals infinity. The distribution,
however, is rather light-tailed:
1 − G_0(x) = 1 − exp{−e^{−x}} ∼ e^{−x}
as x → ∞, and all moments exist.
(c) For γ < 0, the right endpoint of the distribution is −1/γ, so it has a short tail,
verifying 1 − G_γ(−γ^{−1} − x) ∼ (−γx)^{−1/γ} as x ↓ 0. We use G_γ(−(1 + x)/γ) and
get, with α = −1/γ > 0,
Ψ_α(x) = exp(−(−x)^α) for x < 0, Ψ_α(x) = 1 for x ≥ 0.
This class is sometimes called the reverse-Weibull class of distributions.
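The following small numerical sketch is not part of the original text; it assumes numpy and scipy are available. It evaluates G_γ directly and compares it with scipy's GEV implementation; note that scipy.stats.genextreme uses the shape convention c = −γ:

import numpy as np
from scipy.stats import genextreme

def G(gamma, x):
    # G_gamma(x) = exp(-(1 + gamma x)^(-1/gamma)) on {1 + gamma x > 0}; Gumbel limit at gamma = 0
    if gamma == 0.0:
        return np.exp(-np.exp(-x))
    t = 1.0 + gamma * x
    if t <= 0:
        return 0.0 if gamma > 0 else 1.0   # outside the support
    return np.exp(-t ** (-1.0 / gamma))

x = 0.7
for gamma in (0.5, 0.0, -0.5):             # Frechet type, Gumbel, reverse-Weibull type
    print(gamma, G(gamma, x), genextreme.cdf(x, c=-gamma))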
1.4. Domain of Attraction Condition
Recall that we defined the generalized inverse of a non-decreasing function f. We have
the following lemma:
Lemma 1.2. Suppose (f_n) is a sequence of non-decreasing functions and g is a non-decreasing
function. Suppose that for each x in some open interval (a, b) that is a continuity point of g,
lim_{n→∞} f_n(x) = g(x).
Then, for each x in the interval (g(a), g(b)) that is a continuity point of g←, we have
lim_{n→∞} f←_n(x) = g←(x).
Proof. Let x be a continuity point of g←. Fix ε > 0. We have to prove that there is
n_0 ∈ N such that for n ≥ n_0,
f←_n(x) − ε ≤ g←(x) ≤ f←_n(x) + ε.
We are going to prove the right-hand inequality; the proof of the left-hand inequality
is similar.
Choose 0 < ε_1 < ε such that g←(x) − ε_1 is a continuity point of g. This is possible
since the continuity points of g form a dense set. Since g← is continuous at x, g←(x)
is a point of increase of g; hence g(g←(x) − ε_1) < x. Choose δ < x − g(g←(x) − ε_1).
Since g←(x) − ε_1 is a continuity point of g, there exists n_0 such that
f_n(g←(x) − ε_1) < g(g←(x) − ε_1) + δ < x for n ≥ n_0. The definition of the function
f←_n then implies
g←(x) − ε_1 ≤ f←_n(x).
Theorem 1.6. The following statements are equivalent:
1. There exist a_n > 0, b_n ∈ R and a non-degenerate distribution function G such that
F^n(a_n x + b_n) →d G(x) as n → ∞.
2. There exist a_n > 0, b_n ∈ R and a non-degenerate distribution function G such that,
for each continuity point x of G with 0 < G(x) < 1,
n(1 − F(a_n x + b_n)) → −log G(x) as n → ∞.
3. There exist a(t) > 0, b(t) ∈ R and a non-degenerate distribution function G such that,
for each continuity point x of G with 0 < G(x) < 1,
t(1 − F(a(t)x + b(t))) → −log G(x) as t → ∞.
4. Let U = (1/(1 − F))←. There exist a(t) > 0, b(t) ∈ R and a non-degenerate distribution
function G such that
(U(tx) − b(t))/a(t) → D(x) := G←(e^{−1/x}) as t → ∞,
for each x > 0 that is a continuity point of D(x).
5. There exist a(t) > 0, b(t) ∈ R and a non-degenerate distribution function G such that
F^t(a(t)x + b(t)) →d G(x) as t → ∞.
Proof. Fix a continuity point x of G with 0 < G(x) < 1.
(1 ⇔ 2) Clearly F(a_n x + b_n) → 1 as n → ∞, and we use the expansion
log(1 + ε) = ε + O(ε²) as ε → 0. By taking logarithms, as n → ∞,
F^n(a_n x + b_n) → G(x)
⇔ n log F(a_n x + b_n) → log G(x)
⇔ n log(1 − (1 − F(a_n x + b_n))) → log G(x)
⇔ n(1 − F(a_n x + b_n)) → −log G(x). (1.24)
Similarly, we can show that 3 ⇔ 5.
(2 ⇔ 3) To show that 2 ⇒ 3, let a(t) = a_{[t]}, b(t) = b_{[t]} (with [t] the integer part of
t). Then
t(1 − F(a(t)x + b(t))) ≤ ([t] + 1)(1 − F(a(t)x + b(t)))
= (([t] + 1)/[t]) · [t](1 − F(a_{[t]}x + b_{[t]}))
→ 1 · (−log G(x)) = −log G(x) as t → ∞. (1.25)
Similarly, we can show that
lim inf_{t→∞} t(1 − F(a(t)x + b(t))) ≥ −log G(x).
Hence we have 2 ⇒ 3, and 3 ⇒ 2 is obvious.
(3 ⇔ 4) Firstly, we show 3 ⇒ 4:
t(1 − F(a(t)x + b(t))) → −log G(x) implies (1/t) · (1/(1 − F(a(t)x + b(t)))) → −1/log G(x).
Inverting the above convergence and applying Lemma 1.2, we obtain
(U(ty) − b(t))/a(t) → (−1/log G)←(y) = G←(e^{−1/y}).
Similarly, we get 4 ⇒ 3.
Example 1.10 (Normal distribution). Let F be the standard normal distribution. We
are going to show that, for all x ∈ R,
lim_{n→∞} n(1 − F(a_n x + b_n)) = e^{−x} (1.26)
with
b_n := (2 log n − log log n − log(4π))^{1/2} (1.27)
and
a_n := 1/b_n. (1.28)
Note that b_n/(2 log n)^{1/2} → 1 as n → ∞; then
log b_n − (1/2) log log n − (1/2) log 2 → 0
and
b_n²/2 + log b_n − log n + (1/2) log(2π) → 0
as n → ∞. Now taking a_n = 1/b_n,
−(d/dx) n(1 − F(a_n x + b_n)) = (n/(b_n √(2π))) exp(−(x/b_n + b_n)²/2)
= exp{−(b_n²/2 + log b_n − log n + (1/2) log(2π))} e^{−x²/(2b_n²)} e^{−x}
→ e^{−x}
for x ∈ R. Hence
n(1 − F(a_n x + b_n)) = (n/(√(2π) b_n)) ∫_x^∞ exp(−(u/b_n + b_n)²/2) du
= exp{−(b_n²/2 + log b_n − log n + (1/2) log(2π))} ∫_x^∞ e^{−u²/(2b_n²)} e^{−u} du
→ e^{−x}
by Lebesgue's theorem on dominated convergence (see Appendix A.3.1). Then
lim_{n→∞} n(1 − F(a_n x + b_n)) = e^{−x},
and we obtain
F^n(a_n x + b_n) → exp(−e^{−x}) as n → ∞.
Since in the limit relation (1.26) we can replace a_n by a′_n and b_n by b′_n provided
a_n/a′_n → 1 and (b′_n − b_n)/a_n → 0, we can replace b_n, a_n from (1.27) and (1.28) by, e.g.,
b′_n = (2 log n)^{1/2} − (log log n + log(4π))/(2(2 log n)^{1/2})
and
a′_n = (2 log n)^{−1/2}.
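A quick numerical check of (1.26)-(1.28) (an illustration, not from the thesis; it assumes numpy and scipy) evaluates n(1 − F(a_n x + b_n)) for growing n; the convergence is notoriously slow for the normal distribution:

import numpy as np
from scipy.stats import norm

x = 0.5
for n in (10**3, 10**5, 10**7):
    bn = np.sqrt(2 * np.log(n) - np.log(np.log(n)) - np.log(4 * np.pi))  # (1.27)
    an = 1.0 / bn                                                        # (1.28)
    print(n, n * norm.sf(an * x + bn))   # should approach e^{-0.5} ≈ 0.6065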
1.4.1. General Theory of Domains of Attraction
It is important to know which (if any) of the three types of limit law applies when
r.v.s Xn have a given d.f. F. Various necessary and sufficient conditions are
known, involving the ” tail behaviour ” 1 − F(x) as x increases, for each type of
limit. We shall state these and prove their sufficiency, omitting the proofs of neces-
sity.
Theorem 1.7. Necessary and sufficient conditions for the d.f. F of the r.v.s of the i.i.d.
sequence (X_n) to belong to each of the three types are:
Type I: ∫_t^{x_F}(1 − F(u))du < ∞ for some t < x_F, and there exists some strictly positive
function g(t) such that
lim_{t↑x_F} (1 − F(t + xg(t)))/(1 − F(t)) = e^{−x}
for all real x, where one may take
g(t) = ∫_t^{x_F}(1 − F(u))du / (1 − F(t)) for t < x_F.
Type II: x_F = ∞ and
lim_{t→∞} (1 − F(tx))/(1 − F(t)) = x^{−α},
α > 0, for each x > 0.
Type III: x_F < ∞ and
lim_{h↓0} (1 − F(x_F − xh))/(1 − F(x_F − h)) = x^α,
α > 0, for each x > 0.
Proof. We assume first the existence of a sequence (u_n) (which may be taken
non-decreasing in n) in each case such that n(1 − F(u_n)) → 1. The constants u_n will, of
course, differ for the differing types. Clearly u_n → x_F and u_n < x_F for all sufficiently
large n.
If F satisfies the Type II criterion we have, writing u_n for t, for each x > 0,
n(1 − F(u_n x)) ∼ n(1 − F(u_n))x^{−α} → x^{−α},
so that Theorem 1.1 yields, for x > 0,
P(M_n ≤ u_n x) → exp(−x^{−α}).
Since u_n > 0 (when n is large, at least) and the right-hand side tends to zero as x ↓ 0,
it also follows that P(M_n ≤ 0) → 0, and for x < 0, that
P(M_n ≤ u_n x) ≤ P(M_n ≤ 0) → 0.
Thus P(M_n ≤ u_n x) → G(x), where G is the Type II representative d.f. listed in
Theorem 1.5. But this may be restated as
P((M_n − b_n)/a_n ≤ x) → G(x), (1.29)
where a_n = u_n and b_n = 0, so that the Type II limit follows.
The Type III limit follows in a closely similar way by writing h_n = x_F − u_n (↓ 0), so
that, for x > 0,
lim_{n→∞} n(1 − F(x_F − x(x_F − u_n))) = x^α,
and hence (replacing x by −x), for x < 0,
lim_{n→∞} n(1 − F(x_F + x(x_F − u_n))) = (−x)^α.
Using Theorem 1.1 again, this shows at once that the Type III limit applies with
constants in (1.29) given by
a_n = x_F − u_n, b_n = x_F.
The Type I limit also follows along the same lines since, when F satisfies that criterion,
we have, for all x, writing t = u_n ↑ x_F (≤ ∞),
lim_{n→∞} n(1 − F(u_n + xg(u_n))) = e^{−x},
giving (again by Theorem 1.1) the Type I limit with a_n = g(u_n), b_n = u_n.
Finally, we must show the existence of the (non-decreasing) sequence (u_n) satisfying
lim_{n→∞} n(1 − F(u_n)) = 1. For u_n, we may take any non-decreasing sequence such
that
F(u_n−) ≤ 1 − 1/n ≤ F(u_n)
(such as the sequence u_n = F←(1 − 1/n) = inf{x : F(x) ≥ 1 − 1/n}). For such a
sequence, n(1 − F(u_n)) ≤ 1, so that, trivially, lim sup n(1 − F(u_n)) ≤ 1. Thus it only
remains to show that in each case lim inf n(1 − F(u_n)) ≥ 1, which will follow, since
n(1 − F(u_n−)) ≥ 1, if we show that
lim inf_{n→∞} (1 − F(u_n))/(1 − F(u_n−)) ≥ 1. (1.30)
For a d.f. F satisfying the listed Type II criterion, the left-hand side of (1.30) is, for
any x < 1, no smaller than
lim inf_{n→∞} (1 − F(u_n))/(1 − F(u_n x)) = x^α,
from which (1.30) follows by letting x → 1.
A similar argument holds for a d.f. F satisfying the Type III criterion, the left-hand
side of (1.30) being no smaller (for x > 1, h_n = x_F − u_n) than
lim inf_{n→∞} (1 − F(x_F − h_n))/(1 − F(x_F − xh_n)) = x^{−α},
which tends to 1 as x → 1, giving (1.30).
Finally, for the Type I case, the left-hand side of (1.30) is no smaller (if x < 0) than
lim inf_{n→∞} (1 − F(u_n))/(1 − F(u_n + xg(u_n))) = e^x,
which tends to 1 as x → 0, so that again (1.30) holds.
Corollary 1.1. The constants a_n, b_n in the convergence P((M_n − b_n)/a_n ≤ x) → G(x)
may be taken in each case above to be:
Type I: a_n = g(u_n), b_n = u_n,
with u_n = F←(1 − 1/n) = inf{x : F(x) ≥ 1 − 1/n}.
Type II: a_n = u_n, b_n = 0.
Type III: a_n = x_F − u_n, b_n = x_F.
Proof. These relationships appear in the course of the proof of the theorem above.
Example 1.11 (Pareto distribution). As a simple example, we consider now the Pareto
distribution
F(x) = 1 − κx^{−α}, α > 0, κ > 0, x ≥ κ^{1/α}.
We have
(1 − F(tx))/(1 − F(t)) = (tx)^{−α}/t^{−α} = x^{−α},
so F belongs to the DOA of a Type II extreme value distribution. By setting
n(1 − F(u_n)) = τ,
we have
u_n = (κn/τ)^{1/α},
so that Theorem 1.1 gives
P(M_n ≤ (κn/τ)^{1/α}) → e^{−τ}.
By putting τ = x^{−α} for x > 0, we have
P((κn)^{−1/α}M_n ≤ x) → exp(−x^{−α}),
so that a Type II limit holds with
a_n = (κn)^{1/α}, b_n = 0.
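The Pareto case is also easy to check by simulation. The following sketch (not part of the thesis; it assumes numpy) takes κ = 1, α = 2 and compares the empirical distribution of (κn)^{−1/α}M_n with the Type II limit:

import numpy as np

rng = np.random.default_rng(1)
alpha, n, reps = 2.0, 1000, 5000
u = rng.random((reps, n))
# Pareto via inverse transform: X = U^{-1/alpha} has F(x) = 1 - x^{-alpha}, x >= 1
maxima = (u ** (-1.0 / alpha)).max(axis=1) * n ** (-1.0 / alpha)  # (kappa n)^{-1/alpha} M_n

x = 1.5
print((maxima <= x).mean())        # empirical value
print(np.exp(-x ** (-alpha)))      # Frechet limit exp(-x^{-alpha}) ≈ 0.641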
1.5. Condition for belonging to Extreme Value Domain
The following theorem states a sufficient condition for belonging to a domain of
attraction. The condition is called von Mises' condition, and it applies when the d.f.
F has a density function, an obviously common case.
Theorem 1.8. Let F be a distribution function and x_F its right endpoint. Suppose F′′(x)
exists and F′(x) is positive for all x in some left neighborhood of x_F. If
lim_{t↑x_F} ((1 − F(t))/F′(t))′ = γ (1.31)
or equivalently
lim_{t↑x_F} (1 − F(t))F′′(t)/(F′(t))² = −γ − 1, (1.32)
then F is in the domain of attraction of G_γ (F ∈ D(G_γ)).
Proof. Here, as elsewhere, the proof is much simplified by formulating everything
in terms of the inverse function U rather than the distribution function F. By
differentiating the relation
1/(1 − F(U(t))) = t,
we obtain
U′(t) = (1 − F(U(t)))²/F′(U(t)).
Differentiating once more, we find that
U′′(t)/U′(t) = −2(1 − F(U(t))) − F′′(U(t))(1 − F(U(t)))²/(F′(U(t)))².
Multiplying by t = 1/(1 − F(U(t))) gives
tU′′(t)/U′(t) = −2 − (1 − F(U(t)))F′′(U(t))/(F′(U(t)))²,
so (1.32) is equivalent to lim_{t→∞} tU′′(t)/U′(t) = γ − 1.
By Theorem 1.6, the relation to be proved is equivalent to
lim_{t→∞} (U(tx) − U(t))/a(t) = (x^γ − 1)/γ (1.33)
for all x > 0, with a(t) = tU′(t). So we need to prove that
lim_{t→∞} tU′′(t)/U′(t) = γ − 1
implies (1.33) for the same γ. Since for 1 < x_0 < x,
log U′(x) − log U′(x_0) = ∫_{x_0}^x (U′′(s)/U′(s)) ds,
we have for x > 0 and t, tx > 1,
log U′(tx) − log U′(t) = ∫_1^x A(ts) ds/s,
with A(t) := tU′′(t)/U′(t). It follows that for 0 < a < b < ∞,
lim_{t→∞} sup_{a≤x≤b} |log(U′(tx)/U′(t)) − log x^{γ−1}| = 0.
Hence, since |e^s − e^t| ≤ c|s − t| on a compact interval for some positive constant c,
lim_{t→∞} sup_{a≤x≤b} |U′(tx)/U′(t) − x^{γ−1}| = 0.
This implies that
(U(tx) − U(t))/(tU′(t)) − (x^γ − 1)/γ = ∫_1^x (U′(ts)/U′(t) − s^{γ−1}) ds
converges to zero.
Remark 1.4. If (1.31) holds, then
F^n(a_n x + b_n) → G_γ(x),
with b_n = U(n) and a_n = nU′(n) = 1/(nF′(b_n)).
Example 1.12. Let F(x) = N(x), the standard normal distribution. We have
F′(x) = n(x) = (1/√(2π)) e^{−x²/2},
F′′(x) = −(1/√(2π)) x e^{−x²/2} = −x n(x),
and using Mills' ratio [14], we have 1 − N(x) ∼ x^{−1}n(x). Therefore
lim_{x→∞} (1 − F(x))F′′(x)/(F′(x))² = lim_{x→∞} (x^{−1}n(x))(−x n(x))/(n(x))² = −1.
Then γ = 0 and F ∈ D(Λ), the Gumbel distribution.
Remark 1.5. If we define R(t) = −log(1 − F(t)), which is called the integrated hazard
function, then we get
r(t) = R′(t) = F′(t)/(1 − F(t)).
If the von Mises condition holds, then
lim_{t↑x_F} (1/r(t))′ = lim_{t↑x_F} ((1 − F(t))/F′(t))′
= lim_{t↑x_F} (−(F′(t))² − (1 − F(t))F′′(t))/(F′(t))² = −1 − (−γ − 1) = γ.
Thus, the von Mises condition can be written as lim_{t↑x_F} (1/r(t))′ = γ, which is more
convenient to check.
Example 1.13. Let 1 − F(x) = e^{−x^p}, x > 0, p > 0. Then
R(t) = t^p, r(t) = R′(t) = pt^{p−1},
1/r(t) = p^{−1}t^{1−p}, (1/r(t))′ = ((1 − p)/p) t^{−p} → 0
as t → ∞. In this case γ = 0.
Example 1.14. Let 1 − F(x) = x^{−α} = e^{−α log x}, x ≥ 1, α > 0. Then
R(t) = α log t, r(t) = R′(t) = α/t,
1/r(t) = t/α, (1/r(t))′ = 1/α.
For this case γ = 1/α.
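As a sanity check of Remark 1.5 and the two examples above, one can compute (1/r(t))′ symbolically (a sketch outside the original text; it assumes sympy):

import sympy as sp

t, p, alpha = sp.symbols('t p alpha', positive=True)

def one_over_r_prime(Fbar):
    # r(t) = F'(t)/(1 - F(t)) = -Fbar'(t)/Fbar(t); the von Mises condition reads lim (1/r)' = gamma
    r = -sp.diff(Fbar, t) / Fbar
    return sp.simplify(sp.diff(1 / r, t))

print(one_over_r_prime(sp.exp(-t**p)))   # (1 - p)/(p t^p) -> 0 as t -> infinity, so gamma = 0
print(one_over_r_prime(t**(-alpha)))     # 1/alpha, so gamma = 1/alpha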
CHAPTER 2
Multivariate Extreme Value Theory
In this chapter, we study the limiting joint distributions of normalized component-
wise defined maxima of i.i.d. d-variate random vectors. Such distributions are again
max-stable as in the univariate case.
2.1. Introduction
Historically, the first direction that had been explored concerning multivariate ex-
treme events was the modeling of the asymptotic behavior of componentwise max-
ima of i.i.d. observations. Key early contributions to this domain of research are,
among others, the papers of Tiago de Oliveira (1958) [22], Sibuya (1960) [21], de Haan
and Resnick (1977) [10], and Pickands (1981) [17]. The general structure of the mul-
tivariate extreme value distributions has been explored by de Haan and Resnick
(1977). Useful representations in terms of max-stable distributions, regular variation
functions, or point processes, have been established. The next section is devoted to
the asymptotic model for componentwise maxima.
Let us assume that X_1, X_2, ..., X_n are d-dimensional i.i.d. random vectors from some
distribution F. If we intend to study the extremes of the distribution F, we must
know what an extreme value really means in the multivariate set-up. Of course, for
d ≥ 2, there is no natural ordering of the random sample. An early review article,
Barnett (1976) [2], discusses four kinds of ordering for multivariate data sets, which
lead to different approaches for studying extremes (maxima or minima) in a multivariate
set-up.
1. Componentwise maxima, depending on marginal ordering.
2. Maxima based on reduced (aggregate) ordering, based on a single value computed
from a multivariate observation through a function f : R^d → R.
Usually the function f is some measure of generalized distance, say, f(x) =
(x − α)^T Σ (x − α).
3. Maxima based on partial ordering, say, based on convex hulls.
4. Concomitants of marginal maxima: conditional ordering based on marginal
ordering.
Our approach to extreme value analysis will be based on the first kind mentioned
above. It turns out that the theory behind componentwise maxima is quite rich
and provides answers to the questions we have. We will study multivariate extreme
value theory in the case of dimension d = 2; most of the results and definitions given
henceforth also hold for d > 2.
In univariate extreme value theory we found that the limit distributions of sample
maxima can be characterized through a parametric family of generalized extreme
value or generalized Pareto distributions. We will learn in this chapter that this is
not the case in the multi-dimensional setting.
Some notation: The most useful order relation in multivariate extreme value theory
is a special case of what is called marginal ordering: for d-dimensional vectors x =
(x(1), . . . , x(d)) and y = (y(1), . . . , y(d)), the relation x ≤ y is defined as x(j) ≤ y(j) for
all j = 1, . . . , d.
The componentwise maximum of x and y (x, y ∈ R^d), defined as
x ∨ y := (x^{(1)} ∨ y^{(1)}, ..., x^{(d)} ∨ y^{(d)}),
is in general different from both x and y.
Subsequently, arithmetic operations and order relations are meant componentwise;
that is,
a + b = (a^{(1)} + b^{(1)}, ..., a^{(d)} + b^{(d)})
for vectors a = (a^{(1)}, ..., a^{(d)}) and b = (b^{(1)}, ..., b^{(d)}). An interval (a, b] is defined
as the product ×_{j≤d}(a^{(j)}, b^{(j)}]:
(a, b] = {x ∈ R^d : a < x ≤ b} = {(x^{(1)}, ..., x^{(d)}) : a^{(i)} < x^{(i)} ≤ b^{(i)}, i = 1, ..., d},
and similarly
(−∞, a] = {x ∈ R^d : x ≤ a}.
We denote R̄^d = [−∞, ∞]^d, and we will require sets of the form
[−∞, a] = {x ∈ R̄^d : −∞ ≤ x^{(i)} ≤ a^{(i)}, 1 ≤ i ≤ d},
(a, ∞] = {x ∈ R̄^d : a^{(i)} < x^{(i)} ≤ ∞, 1 ≤ i ≤ d},
[−∞, a]^c = R̄^d \ [−∞, a] = {x ∈ R̄^d : x^{(i)} > a^{(i)} for some i = 1, ..., d}.
Recall that the d.f. F(x) = Q((−∞, x]) of a probability measure Q has the following
properties:
(a) F is right-continuous: F(x_n) ↓ F(x_0) if x_n ↓ x_0;
(b) F is normed: F(x_n) ↑ 1 if x_n^{(j)} ↑ ∞ for all j = 1, ..., d, and F(x_n) ↓ 0 if
x_n ≥ x_{n+1} and x_n^{(j)} ↓ −∞ for some j ∈ {1, ..., d};
(c) F is Δ-monotone: for a ≤ b,
Δ_a^b F = Q((a, b]) = Σ_{m∈{0,1}^d} (−1)^{d−Σ_{j≤d}m_j} F(b_1^{m_1}a_1^{1−m_1}, ..., b_d^{m_d}a_d^{1−m_d}) ≥ 0,
where b_j^{m_j}a_j^{1−m_j} stands for b_j if m_j = 1 and for a_j if m_j = 0.
Conversely, every function F satisfying conditions (a)-(c) is the d.f. of a probability
measure Q. Usually, conditions (a) and (b) can be verified in a straightforward
way; the Δ-monotonicity holds if, e.g., F is the pointwise limit of a sequence of d.f.s.
We usually follow the convention that F stands for the distribution function as well
as for the induced measure, and thus we need to emphasize that
F^t(x) stands for (F(x))^t,
while for x < y,
F^t((x, y]) ≠ (F((x, y]))^t.
2.2. Limit Distributions of Multivariate Maxima
Assume (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) is a sequence of vectors that are independent
versions of a random vector having distribution function F(x, y). Since we are taking
the componentwise maxima approach, let us denote
M_n = (⋁_{i=1}^n X_i, ⋁_{i=1}^n Y_i) = (M_n^{(1)}, M_n^{(2)}).
As in the univariate case, we need not study the minima separately, since
m_n := (⋀_{i=1}^n X_i, ⋀_{i=1}^n Y_i) = −(⋁_{i=1}^n (−X_i), ⋁_{i=1}^n (−Y_i)).
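As a concrete illustration (a sketch, not part of the thesis; it assumes numpy), componentwise maxima and the reduction of minima to maxima look as follows:

import numpy as np

rng = np.random.default_rng(2)
xy = rng.normal(size=(10, 2))    # rows are the vectors (X_i, Y_i), i = 1, ..., 10
M = xy.max(axis=0)               # componentwise maxima (M_n^{(1)}, M_n^{(2)})
m = -(-xy).max(axis=0)           # minima via m_n = -(max of the negated sample)
print(M)
print(m, xy.min(axis=0))         # the two minima computations agree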
2.2.1. Max-infinitely Divisible Distributions
Let {X_n, n ≥ 1} be i.i.d. random variables with common distribution F(x) and
define M_n = ⋁_{i=1}^n X_i. In order to study the structure of the stochastic process
{M_n}, it has been found convenient to embed {M_n} in a continuous-time process
Y = {Y(t), t > 0}, called an extremal process, in the sense that M_n =d Y(n). For this
construction, Y is defined by its finite-dimensional distributions: for 0 < t_1 < ··· < t_n
and x_1 ≤ ··· ≤ x_n,
P[Y(t_i) ≤ x_i, i = 1, ..., n] = F^{t_1}(x_1) F^{t_2−t_1}(x_2) ··· F^{t_n−t_{n−1}}(x_n).
Recall from the univariate theory that a univariate distribution function F is in the
domain of attraction of a non-degenerate d.f. G if there exist a(t) > 0, b(t) ∈ R such
that, as t → ∞,
F^t(a(t)x + b(t)) →d G(x).
In the one-dimensional case, F^t is a probability distribution function whenever F is.
This is not the case for dimension d > 1. Consider the following example.
Example 2.1. Suppose {X_n, n ≥ 1} is an i.i.d. sequence of R-valued random variables
and we wish to study the range
R_n := ⋁_{i=1}^n X_i − ⋀_{i=1}^n X_i = ⋁_{i=1}^n X_i + ⋁_{i=1}^n (−X_i).
Thus, it is only natural to look at the joint distribution of (X_i, −X_i), which has
distribution F, say. Then
(M_n^{(1)}, M_n^{(2)}) = (⋁_{i=1}^n X_i, ⋁_{i=1}^n (−X_i)) = ⋁_{i=1}^n (X_i, −X_i)
and R_n = M_n^{(1)} + M_n^{(2)}. The joint distribution F(x^{(1)}, x^{(2)}) of (X_i, −X_i) concentrates
on {(x^{(1)}, x^{(2)}) : x^{(1)} + x^{(2)} = 0}, and we show that it is not the case that F^t is a
distribution for all t > 0. We see that
F^t((x, y]) = F^t(y^{(1)}, y^{(2)}) − F^t(x^{(1)}, y^{(2)}) − F^t(y^{(1)}, x^{(2)}) + F^t(x^{(1)}, x^{(2)}). (2.1)
If F^t were a distribution, the expression in (2.1) would be non-negative for all t.
However, take x = (0, 0), y = (1, 1) and observe that
F(0, 1) = F{(u^{(1)}, u^{(2)}) : u^{(1)} + u^{(2)} = 0, 0 ≤ u^{(2)} ≤ 1} =: p_1,
F(1, 0) = F{(u^{(1)}, u^{(2)}) : u^{(1)} + u^{(2)} = 0, 0 ≤ u^{(1)} ≤ 1} =: p_2,
F(0, 0) = 0, F(1, 1) = p_1 + p_2.
For F^t to be a distribution function, we need F^t(A) ≥ 0 for any Borel rectangle A ⊂ R². But
F^t((0, 1]²) = F^t(1, 1) − F^t(0, 1) − F^t(1, 0) + F^t(0, 0) = (p_1 + p_2)^t − p_1^t − p_2^t.
Then F^t((0, 1]²) ≥ 0 ⇔ (p_1 + p_2)^t − p_1^t − p_2^t ≥ 0, which need not be the case for t < 1.
Thus not every distribution function F on R^d for d > 1 has the property that F^t is a
distribution, and the following definition is meaningful:
Definition 2.1. The distribution function F on R^d is max-infinitely divisible, or max-id,
if for every n there exists a distribution F_n on R^d such that
F = F_n^n,
i.e., for every n, F^{1/n} is a distribution. If a random vector X has X ∼ F, we also call
X max-id.
Proposition 2.1. Suppose that for n ≥ 0, F_n are probability distribution functions on R^d.
If F_n^n →d F_0 (pointwise convergence at continuity points of F_0), then F_0 is max-id.
Consequently,
(a) F is max-id if and only if F^t is a distribution function for every t > 0.
(b) The class of max-id distributions on R^d is closed with respect to weak convergence:
if G_n are max-id distributions converging weakly to a distribution G_0, then G_0 is
max-id.
Proof. We have that the F_n are distribution functions and F_n^n →d F_0. Suppose x is a
continuity point of F_0. We show that F_n^{[nt]}(x) → F_0^t(x) for all t > 0. We see that
F_n^{[nt]} is a distribution function for any n ≥ 1, t > 0 with [nt] ≥ 1, and F_0^t is proper
since F_0 is proper.
Fix t > 0. If F_0(x) = 0, then
F_n^{[nt]}(x) = (F_n^n(x))^{[nt]/n} → 0 = F_0^t(x).
If F_0(x) > 0, then F_n(x) → 1, and as n → ∞,
−log F_n^{[nt]}(x) = [nt](−log F_n(x)) ∼ nt(−log F_n(x))
= t(−log F_n^n(x)) → t(−log F_0(x)) = −log F_0^t(x).
Thus F_n^{[nt]}(x) → F_0^t(x), whence F_0^t is a distribution function. Hence F_0^{1/n} is a
distribution function for any n, and F_0 is max-id.
The proof of (a) follows easily from the above by taking F_n = F^{1/n}, so that F_n^n = F.
Consequence (b) follows by observing that if G_n are max-id and G_n →d G_0, then
G_0 = lim_{n→∞} G_n = lim_{n→∞} (G_n^{1/n})^n,
which allows us to conclude that G_0 is max-id.
Here are two examples of max-id distribution functions.
Example 2.2 (The independent case). Suppose G(x, y) = F_1(x)F_2(y), where F_1, F_2 are
(one-dimensional) distributions. For every t, G^t(x, y) is a d.f., for we may construct
independent extremal processes Y_1(·), Y_2(·) governed by F_1, F_2 respectively, and then
P[Y_1(t) ≤ x, Y_2(t) ≤ y] = F_1^t(x)F_2^t(y) = G^t(x, y).
Example 2.3. A more interesting example comes from the theory of Poisson random
measures. Let µ(dx, dy) be a measure on (R², B(R²)) such that µ(R²) = +∞
and, for all x and y sufficiently large, µ(((−∞, x] × (−∞, y])^c) < ∞. Use this measure
µ to construct a Poisson random measure N on R_+ × R² with mean measure
dt × µ(dx, dy). Let the points of the random measure be (T_k, J_k^{(1)}, J_k^{(2)}) and define a
two-dimensional (extremal) process by
Y_i(t) = sup{J_k^{(i)} : T_k ≤ t}
for i = 1, 2. Then
P[Y_1(t) ≤ x, Y_2(t) ≤ y] = P[N([0, t] × ((−∞, x] × (−∞, y])^c) = 0]
= exp{−tµ(((−∞, x] × (−∞, y])^c)}.
This shows that the d.f.
G(x, y) = exp{−µ(((−∞, x] × (−∞, y])^c)}
is max-id.
In what follows we represent partial derivatives by subscripts:
F_x = ∂F/∂x, F_y = ∂F/∂y, F_{x,y} = ∂²F/∂y∂x,
and so on. The notation [F > 0] means the set {x : F(x) > 0}.
Proposition 2.2. Let F be a distribution on R² with continuous density F_{x,y}. Then F is
max-id iff Q := −log F satisfies
Q_{x,y} ≤ 0 on [F > 0],
or equivalently iff
F_x F_y ≤ F_{x,y} F on [F > 0].
Proof. Since F^t = e^{−tQ}, we have on the (open) set [F > 0]
∂/∂y ∂/∂x F^t = ∂/∂y (−te^{−tQ}Q_x) = −∂/∂y (tF^t Q_x)
= −t(Q_{x,y}F^t − tF^t Q_y Q_x)
= tF^t(tQ_x Q_y − Q_{x,y}),
and F is max-id iff this latter expression is non-negative for all t, which occurs iff
tQ_x Q_y − Q_{x,y} ≥ 0 (2.2)
for all t. Since Q_x = −F_x/F ≤ 0 and Q_y ≤ 0, (2.2) holds for all t iff Q_{x,y} ≤ 0, as
asserted. The rest follows by differentiation:
0 ≥ Q_{x,y} = ∂/∂y ∂/∂x (−log F) = −∂/∂y (F_x/F) = −(FF_{x,y} − F_x F_y)/F²,
which means F_x F_y ≤ F_{x,y} F.
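As an illustration of the criterion (a sketch outside the original text; it assumes sympy, and the bivariate d.f. chosen here is just one convenient test case), one can verify Q_{x,y} ≤ 0 symbolically for F = exp(−Q) with Q(x, y) = (x^{−2} + y^{−2})^{1/2}:

import sympy as sp

x, y = sp.symbols('x y', positive=True)
Q = (x**-2 + y**-2) ** sp.Rational(1, 2)   # Q = -log F for a bivariate logistic-type d.f.
Qxy = sp.simplify(sp.diff(Q, x, y))
print(Qxy)                       # equals -x^-3 y^-3 (x^-2 + y^-2)^(-3/2) <= 0 on x, y > 0,
print(Qxy.subs({x: 1, y: 2}))    # so F = exp(-Q) is max-id by Proposition 2.2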
2.2.2. Characterizing Max-id Distributions
Proposition 2.3. The following are equivalent:
(i) F is max-id.
(ii) For some l ∈ [−∞, ∞)^d, there exists an exponent measure µ on E := [l, ∞] \ {l}
such that
F(y) = exp{−µ([−∞, y]^c)} for y ≥ l, and F(y) = 0 otherwise.
Here µ is an exponent measure if it is Radon (finite on compact sets) and satisfies
1. µ(E \ [−∞, ∞)^d) = µ(⋃_{i=1}^d {y ∈ E : y^{(i)} = ∞}) = 0;
2. either l > −∞, or x ≥ l with x^{(i)} = −∞ for some i ≤ d implies µ([−∞, x]^c) = ∞.
Proof. We start by showing that if F is max-id, then [F > 0] ⊂ Rd is a rectangle of
the form A1 × · · · × Ad where Ai = [l(i), ∞) or (l(i), ∞) and l = (l(1), . . . , l(d)) =
inf[F > 0]. Note that F is a probability distribution on Rd and µ is defined on
[l,∞]\l = E ⊂ [−∞, ∞]d.
It suffices to show (i) implies (ii). We start by showing that if F is max-id, then
[F > 0] is rectangle. To do this we need to verify two properties of [F > 0]
(1’) x ∈ [F > 0] and x ≤ y implies y ∈ [F > 0].
(2’) x, y ∈ [F > 0] implies x∧ y ∈ [F > 0].
The first is obvious, and for the second it suffices to show that
F(x∧ y) ≥ F(x)F(y)
or equivalent (Q = − log F)
Q(x∧ y) ≤ Q(x) + Q(y) (2.3)
However, suppose that Y(t), t > 0 is extremal-F, then
Fn−1(x) = P[Y(n−1) ≤ x]
= P[Y(n−1) ≤ x, Y(n−1) ≤ y] + P([Y(n−1) ≤ x] ∩ [Y(n−1) ≤ y]c
)≤ P[Y(n−1) ≤ x∧ y] + P
([Y(n−1) ≤ y]c
)= Fn−1
(x∧ y) + 1− Fn−1(y)
and therefore
n(1− Fn−1(x∧ y)) ≤ n(1− Fn−1
(x)) + n(1− Fn−1(y)) (2.4)
For fixed x ∈ [F > 0] we have as n→ ∞
n(1− Fn−1(x)) ∼ −n log Fn−1
(x) = Q(x)
and letting n → ∞ in (2.4) gives the desired (2.3). Based on (1') and (2'), the verification that [F > 0] is a rectangle can proceed. Define the projection maps as usual by

π_i x = x^(i),   i = 1, . . . , d,

for x ∈ R^d. We assert that

[F > 0] = π_1[F > 0] × · · · × π_d[F > 0] =: ⨉_{i=1}^d π_i[F > 0],   (2.5)

and from this the result follows directly, since π_i[F > 0] is an interval of the form (l^(i), ∞) or [l^(i), ∞) by (1').

If x ∈ [F > 0] then of course x^(i) ∈ π_i[F > 0], implying x ∈ ⨉_{i=1}^d π_i[F > 0]. Conversely, suppose that x ∈ ⨉_{i=1}^d π_i[F > 0], so that for i = 1, . . . , d we have x^(i) ∈ π_i[F > 0], and thus there exists y_i ∈ [F > 0] with π_i y_i = x^(i). From (2') we have y := ∧_{i=1}^d y_i ∈ [F > 0]. However, π_i y ≤ π_i y_i = x^(i), thus y ≤ x, and then by (1') we get x ∈ [F > 0]. Hence (2.5) is verified.
With l = inf[F > 0], consider E = [l, ∞]\{l} and define on E the measures

µ_n := n F^{1/n}.

Since F^{1/n} is only defined on R^d, one must extend the definition of F^{1/n} in the obvious way to [−∞, ∞]^d in order to get µ_n defined on E. Sets of the form [−∞, x]^c = E\[−∞, x] for x ≥ l are relatively compact subsets of E, and as n → ∞,

µ_n([−∞, x]^c) = n(1 − F^{1/n}(x)) → Q(x) < ∞,

so that for such x > l,

sup_n µ_n([−∞, x]^c) < ∞.

Since E = lim_{x↓l} [−∞, x]^c, it follows that for any relatively compact subset B of E,

sup_n µ_n(B) < ∞,

so that {µ_n} is vaguely relatively compact. Let µ_1 and µ_2 be two vague limit points of {µ_n}. Then for any x > l,

µ_1([−∞, x]^c) = µ_2([−∞, x]^c) = Q(x) = −log F(x),

and thus µ_1 = µ_2. So all limit points of {µ_n} are equal, and hence there is a limit measure µ on E with µ_n v→ µ. Thus for x > l,

µ([−∞, x]^c) = −log F(x),   i.e.,   F(x) = exp{−µ([−∞, x]^c)}.

Hence µ is an exponent measure and the proof is complete.
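The construction µ_n = n F^{1/n} in the proof is easy to observe numerically. The sketch below (an assumed example) uses the bivariate d.f. of two independent standard exponentials and shows n(1 − F^{1/n}(x)) approaching Q(x) = −log F(x).

```python
# A numerical look (assumed example) at the measures mu_n = n F^{1/n}:
# mu_n([-inf, x]^c) = n(1 - F^{1/n}(x)) should approach Q(x) = -log F(x).
import math

def F(x, y):
    # d.f. of two independent standard exponentials (an assumed choice)
    return (1 - math.exp(-x)) * (1 - math.exp(-y))

x, y = 1.0, 2.0
for n in (10, 100, 1000, 10000):
    print(n, n * (1 - F(x, y) ** (1.0 / n)))
print("limit -log F(x,y):", -math.log(F(x, y)))
```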
2.3. Multivariate Domain of Attraction
In this section we determine the class of all possible limit distributions G.
Definition 2.2. A bivariate distribution function F is said to be in the domain of
attraction of a bivariate distribution function G if
1. G has non-degenerate marginal distributions G1 and G2.
2. There exist sequences a_n, c_n > 0 and b_n, d_n ∈ R such that

P((M_n^(1) − b_n)/a_n ≤ x, (M_n^(2) − d_n)/c_n ≤ y) = F^n(a_n x + b_n, c_n y + d_n) d→ G(x, y).   (2.6)
Any limit distribution function G in (2.6) with non-degenerate marginals is called a multivariate extreme value distribution. Since (2.6) implies convergence of the two one-dimensional marginal distributions, we have

lim_{n→∞} P((M_n^(1) − b_n)/a_n ≤ x) = G(x, ∞) =: G_1(x)

and

lim_{n→∞} P((M_n^(2) − d_n)/c_n ≤ y) = G(∞, y) =: G_2(y).
Remark 2.1. We note in passing that since the two marginal distributions of G are continuous, G must be continuous as well.

Since G_1 and G_2 are univariate extreme value distributions, we may choose the constants a_n, c_n > 0 and b_n, d_n ∈ R such that for some γ_1, γ_2

G_1(x) = exp{−(1 + γ_1 x)^{−1/γ_1}},   1 + γ_1 x > 0,
G_2(y) = exp{−(1 + γ_2 y)^{−1/γ_2}},   1 + γ_2 y > 0.
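The following small sketch (an assumed example; all distributional choices are hypothetical) illustrates Definition 2.2 numerically: for independent standard exponential coordinates one may take a_n = c_n = 1 and b_n = d_n = log n, and F^n(x + log n, y + log n) then approaches a product of two Gumbel d.f.s.

```python
# A small numerical illustration (assumed example) of Definition 2.2 with
# F(x,y) = (1-e^{-x})(1-e^{-y}), a_n = c_n = 1, b_n = d_n = log n.
import math

def F(x, y):
    return (1 - math.exp(-x)) * (1 - math.exp(-y))

x, y = 0.3, -0.2
for n in (10, 1000, 100000):
    print(n, F(x + math.log(n), y + math.log(n)) ** n)
print("limit:", math.exp(-math.exp(-x)) * math.exp(-math.exp(-y)))
```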
2.3.1. Max-stability
In the characterization of multivariate extreme value distributions, max-stable (or min-stable) distributions play a central role.

Suppose that X_n = (X_n^(1), . . . , X_n^(d)), n ≥ 1, are i.i.d. random d-dimensional vectors with common distribution

F(x) = F(x_1, . . . , x_d) = P(X_n^(k) ≤ x_k, k = 1, . . . , d).
Let the marginal distributions of F(x) be F_1, . . . , F_d, so that F_1(x) = F(x, ∞, . . . , ∞), and so on. Assume that there exist normalizing constants a_n^(i) > 0, b_n^(i) ∈ R, 1 ≤ i ≤ d, n ≥ 1, such that as n → ∞

P[(M_n^(i) − b_n^(i))/a_n^(i) ≤ x^(i), 1 ≤ i ≤ d] = F^n(a_n^(1) x^(1) + b_n^(1), . . . , a_n^(d) x^(d) + b_n^(d)) → G(x)   (2.7)

for a limit distribution G such that each marginal G_i, i = 1, . . . , d, is non-degenerate. The class of such limits G is called the class of extreme value distributions.
Definition 2.3. We say that a distribution G(x) is max-stable if for i = 1, 2, . . . , d and every t > 0 there exist functions α^(i)(t) > 0, β^(i)(t) such that

G^t(x) = G(α^(1)(t) x^(1) + β^(1)(t), . . . , α^(d)(t) x^(d) + β^(d)(t)).   (2.8)

It is clear from (2.8) that for every t > 0, G^t is a distribution function, and hence every max-stable distribution is max-id. The relevance of max-stable distributions is made obvious by the next result, which describes the equivalence between multivariate extreme value distributions and max-stable distributions.

Theorem 2.1. The class of multivariate extreme value distributions is precisely the class of max-stable distribution functions with non-degenerate marginals.
Proof. It is clear that if G has non-degenerate marginals and is max-stable, then (2.7) holds; take F = G.

Conversely, suppose (2.7) holds. From the marginal convergence

F_i^n(a_n^(i) x + b_n^(i)) → G_i(x), non-degenerate, 1 ≤ i ≤ d,

and (1.19), there exist functions α^(i)(t) > 0, β^(i)(t) ∈ R such that for t > 0, 1 ≤ i ≤ d,

lim_{n→∞} a_n^(i) / a_[nt]^(i) = α^(i)(t),   lim_{n→∞} (b_n^(i) − b_[nt]^(i)) / a_[nt]^(i) = β^(i)(t).   (2.9)

Suppose Y(t) is a vector with distribution G^t(x). Then for t > 0, 1 ≤ i ≤ d, we have on the one hand

(M_[nt]^(i) − b_[nt]^(i)) / a_[nt]^(i) d→ Y^(i)(1),   (2.10)

and on the other, since P[M_[nt] ≤ x] = F^[nt](x) = (F^n(x))^{[nt]/n} → G^t(x),

(M_[nt]^(i) − b_n^(i)) / a_n^(i) d→ Y^(i)(t).   (2.11)

Using (2.9), (2.10) and (2.11) we get

(M_[nt]^(i) − b_[nt]^(i)) / a_[nt]^(i) = ((M_[nt]^(i) − b_n^(i)) / a_n^(i)) · (a_n^(i) / a_[nt]^(i)) + (b_n^(i) − b_[nt]^(i)) / a_[nt]^(i) d→ α^(i)(t) Y^(i)(t) + β^(i)(t) d= Y^(i)(1),

which is the same as (2.8).
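Max-stability is easy to test numerically for a concrete G. The sketch below (an assumed example, not from the text) uses the bivariate logistic EVD with Gumbel margins, for which G^t(x, y) = G(x − log t, y − log t), i.e. α^(i)(t) = 1 and β^(i)(t) = −log t in (2.8).

```python
# A numerical check (assumed example) of max-stability for the bivariate
# logistic EVD G(x,y) = exp{-(e^{-x/s} + e^{-y/s})^s} with Gumbel margins.
import math

def G(x, y, s=0.5):
    return math.exp(-((math.exp(-x / s) + math.exp(-y / s)) ** s))

for t in (0.5, 2.0, 7.0):
    for (x, y) in [(0.0, 1.0), (2.0, -1.0)]:
        lhs = G(x, y) ** t                            # G^t(x, y)
        rhs = G(x - math.log(t), y - math.log(t))     # G(alpha*x + beta, ...)
        assert abs(lhs - rhs) < 1e-12, (t, x, y)
print("max-stability identity verified numerically")
```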
2.4. Basic Properties of Multivariate Extreme Value Distributions

The indicator function of a subset A of a set X is the function 1_A : X → {0, 1} defined as

1_A(x) = 1 if x ∈ A,   and 1_A(x) = 0 if x ∉ A.
In this subsection, we study some basic properties of multivariate extreme value
distribution functions.
Remark 2.2. Any bivariate extreme value distribution is continuous in (x, y). This
follows from the subsequent lemma.
Lemma 2.1. If G is a bivariate distribution function such that its marginal distribution functions G_1 and G_2 are continuous, then G(x, y) is itself continuous in (x, y).

Proof. For 0 ≤ u, v ≤ 1 and 0 ≤ u′, v′ ≤ 1,

|uv − u′v′| = |(u − u′)v + u′(v − v′)| ≤ |(u − u′)v| + |u′(v − v′)| ≤ |u − u′| + |v − v′|.   (2.12)

Now for x, x′, y, y′ ∈ R, we know that as (x, y) → (x′, y′),

G_1(x) → G_1(x′),   G_2(y) → G_2(y′).

Suppose that (X, Y) ∼ G. Then, as in (2.12),

|G(x, y) − G(x′, y′)| = |E(1_{X≤x} · 1_{Y≤y}) − E(1_{X≤x′} · 1_{Y≤y′})|
                      ≤ E|1_{X≤x} · 1_{Y≤y} − 1_{X≤x′} · 1_{Y≤y′}|
                      ≤ E|1_{X≤x} − 1_{X≤x′}| + E|1_{Y≤y} − 1_{Y≤y′}|
                      = |G_1(x) − G_1(x′)| + |G_2(y) − G_2(y′)| → 0

as (x, y) → (x′, y′).

Thus, since the univariate marginals of a bivariate extreme value distribution are continuous, it itself is continuous. This leads us to another result.
Proposition 2.4. For F ∈ D(G), the convergence
Fn(anx + bn, cny + dn)→ G(x, y)
is locally uniform as n→ ∞.
Proof. The original analytical proof is due to Buchanan and Hildebrandt, 1908 [6]. We proceed in a sequence of steps.

Step (i). From the fact that F^n, n ≥ 1, are all monotone non-decreasing functions, we can show that G has to be monotone non-decreasing. Suppose it were not. Then for two pairs (x_1, y_1), (x_2, y_2) with a ≤ x_1 < x_2 ≤ b and c ≤ y_1 < y_2 ≤ d we would have

G(x_1, y_1) > G(x_2, y_2).

Let h := G(x_1, y_1) − G(x_2, y_2) > 0. Since F^n converges for all values of (x, y) on the intervals a ≤ x ≤ b, c ≤ y ≤ d, we have

lim_{n→∞} F^n(a_n x_1 + b_n, c_n y_1 + d_n) = G(x_1, y_1),
lim_{n→∞} F^n(a_n x_2 + b_n, c_n y_2 + d_n) = G(x_2, y_2).

That is, given ε > 0, it is possible to find an n_{ε,(x_1,y_1)} such that if n ≥ n_{ε,(x_1,y_1)} we have

|F^n(a_n x_1 + b_n, c_n y_1 + d_n) − G(x_1, y_1)| ≤ ε,

and an n_{ε,(x_2,y_2)} such that if n ≥ n_{ε,(x_2,y_2)} we have

|F^n(a_n x_2 + b_n, c_n y_2 + d_n) − G(x_2, y_2)| ≤ ε.
Suppose ε < h/2 and choose n greater than or equal to both n_{ε,(x_1,y_1)} and n_{ε,(x_2,y_2)}. Then

|F^n(a_n x_1 + b_n, c_n y_1 + d_n) − G(x_1, y_1)| < h/2,
|F^n(a_n x_2 + b_n, c_n y_2 + d_n) − G(x_2, y_2)| < h/2.

That is,

F^n(a_n x_1 + b_n, c_n y_1 + d_n) > G(x_1, y_1) − h/2,
F^n(a_n x_2 + b_n, c_n y_2 + d_n) < G(x_2, y_2) + h/2,

and therefore, since G(x_1, y_1) − G(x_2, y_2) = h,

F^n(a_n x_1 + b_n, c_n y_1 + d_n) − F^n(a_n x_2 + b_n, c_n y_2 + d_n) > h − h = 0.

But by hypothesis F^n is monotone non-decreasing, i.e.,

F^n(a_n x_1 + b_n, c_n y_1 + d_n) − F^n(a_n x_2 + b_n, c_n y_2 + d_n) ≤ 0.

We have thus reached a contradiction, and therefore the hypothesis that G is not a monotone non-decreasing function of (x, y) is invalid.
Step (ii). Fix S = [a, b] × [c, d] ⊂ R^2. We show that F^n(a_n x + b_n, c_n y + d_n) converges to G(x, y) uniformly on S.

Step (iii). G is continuous on R^2, hence G is uniformly continuous on S (by compactness). Thus, given ε > 0, there exists δ > 0 such that d((x, y), (x′, y′)) < 2δ implies

|G(x, y) − G(x′, y′)| < ε.

Consider an open cover of S by the sets B((x, y), δ) = (x − δ, x + δ) × (y − δ, y + δ). By the Heine-Borel Theorem, there exists a finite subcover B_i = (x_i − δ, x_i + δ) × (y_i − δ, y_i + δ), i = 1, . . . , k, of S.

Step (iv). For each (x_i, y_i), i = 1, . . . , k, find M_i such that for n > M_i,

|F^n(a_n x_i + b_n, c_n y_i + d_n) − G(x_i, y_i)| ≤ ε.

Step (v). Clearly for each i = 1, . . . , k,

V_n^i := F^n(a_n(x_i + δ) + b_n, c_n(y_i + δ) + d_n) − F^n(a_n(x_i − δ) + b_n, c_n(y_i − δ) + d_n)
      → G(x_i + δ, y_i + δ) − G(x_i − δ, y_i − δ) ≤ 2ε   as n → ∞.
Therefore there exist N_i, i = 1, . . . , k, such that V_n^i < 3ε for n > N_i.

Step (vi). Define N = max{M_1, . . . , M_k, N_1, . . . , N_k}. Pick (x, y) ∈ S; then (x, y) ∈ B_{i*} for some i*. It is easy to show that, for n > N,

|F^n(a_n x + b_n, c_n y + d_n) − G(x, y)|
= |F^n(a_n x + b_n, c_n y + d_n) − F^n(a_n x_{i*} + b_n, c_n y_{i*} + d_n) + F^n(a_n x_{i*} + b_n, c_n y_{i*} + d_n) − G(x_{i*}, y_{i*}) + G(x_{i*}, y_{i*}) − G(x, y)|
≤ |F^n(a_n x + b_n, c_n y + d_n) − F^n(a_n x_{i*} + b_n, c_n y_{i*} + d_n)| + |F^n(a_n x_{i*} + b_n, c_n y_{i*} + d_n) − G(x_{i*}, y_{i*})| + |G(x_{i*}, y_{i*}) − G(x, y)|
≤ V_n^{i*} + ε + ε ≤ 3ε + 2ε = 5ε.

Hence the statement is true.
A consequence of this fact is the following corollary.

Corollary 2.1. If (x_n) and (y_n) are real sequences such that x_n → u and y_n → v, then

lim_{n→∞} F^n(a_n x_n + b_n, c_n y_n + d_n) = G(u, v).
Proof. By the local uniform convergence from Proposition 2.4, given ε > 0 there exist δ > 0 and N_2 ≥ 1 such that for n ≥ N_2,

sup_{|x−u|≤δ, |y−v|≤δ} |F^n(a_n x + b_n, c_n y + d_n) − G(x, y)| ≤ ε/2.

Since G is continuous and (x_n, y_n) → (u, v), there exists N_0 ≥ 1 such that for n ≥ N_0 we get

|G(x_n, y_n) − G(u, v)| ≤ ε/2.

In addition, there exists N_1 ≥ 1 such that for n ≥ N_1 we have

|x_n − u| ≤ δ,   |y_n − v| ≤ δ.

Take N = N_0 ∨ N_1 ∨ N_2. Then, combining all the above, we observe that for n ≥ N,

|F^n(a_n x_n + b_n, c_n y_n + d_n) − G(u, v)|
≤ |F^n(a_n x_n + b_n, c_n y_n + d_n) − G(x_n, y_n)| + |G(x_n, y_n) − G(u, v)|
≤ sup_{|x−u|≤δ, |y−v|≤δ} |F^n(a_n x + b_n, c_n y + d_n) − G(x, y)| + |G(x_n, y_n) − G(u, v)|
≤ ε/2 + ε/2 = ε.

The proof is complete.
Theorem 2.2. If F ∈ D(G) and F ∈ D(H), then there exist A, C > 0 and B, D ∈ R such that

H(x, y) = G(Ax + B, Cy + D).

Proof. Since F ∈ D(G), there exist a_n, c_n > 0 and b_n, d_n ∈ R such that

F^n(a_n x + b_n, c_n y + d_n) d→ G(x, y).

By taking marginals F_i, i = 1, 2, of F we get the weak convergences

F_1^n(a_n x + b_n) d→ G_1(x),   F_2^n(c_n y + d_n) d→ G_2(y).

Since we also have F ∈ D(H), there exist a′_n, c′_n > 0 and b′_n, d′_n ∈ R such that

F^n(a′_n x + b′_n, c′_n y + d′_n) d→ H(x, y),

and taking marginals again,

F_1^n(a′_n x + b′_n) d→ H_1(x),   F_2^n(c′_n y + d′_n) d→ H_2(y).

From the univariate convergence to types theorem, we obtain

a′_n/a_n → A > 0,   (b′_n − b_n)/a_n → B,   c′_n/c_n → C > 0,   (d′_n − d_n)/c_n → D.

Using Corollary 2.1 we obtain

H(x, y) = G(Ax + B, Cy + D).
2.5. Standardization
Let us now proceed towards characterizing G. We have seen that the marginals of
a bivariate (multivariate) extreme-value distribution (EVD) are all univariate EVDs.
To study the dependence structure of the distribution, it would be much more con-
venient if all the marginals were the same. For this we would actually perform a
transformation of the marginals which we call standardization. There are multiple choices for this transformation, leading to Uniform, Gumbel, Weibull, or Frechet marginals. Each one has its own merit and has been explored in the literature. We will reduce to Frechet(1) margins in all coordinates; the merits of this choice will be evident soon.

Let F_i, i = 1, 2, be the marginal distribution functions of F. We define U_i(t) as

U_i(t) = F_i^←(1 − 1/t) = (1/(1 − F_i))^←(t),   t > 1.
From the univariate case, we know that

F_1^n(a_n x + b_n) → G_1(x) = exp{−(1 + γ_1 x)^{−1/γ_1}},   1 + γ_1 x > 0.

Then there are positive functions a_i(t), i = 1, 2, and functions b_i(t) ∈ R such that

lim_{t→∞} (U_i(tx) − b_i(t))/a_i(t) = (x^{γ_i} − 1)/γ_i.

This immediately leads to the fact that we can take

a_n = a_1(n),   c_n = a_2(n),   b_n = U_1(n),   d_n = U_2(n).

Hence we have for x, y > 0

x_n := (U_1(nx) − b_n)/a_n → (x^{γ_1} − 1)/γ_1 =: u,   (2.13)
y_n := (U_2(ny) − d_n)/c_n → (y^{γ_2} − 1)/γ_2 =: v.   (2.14)

Note that if x_n → u and y_n → v, then by the continuity of G and the monotonicity of F,

lim_{n→∞} F^n(a_n x_n + b_n, c_n y_n + d_n) = G(u, v).

Then for all x, y > 0 we get

lim_{n→∞} F^n(U_1(nx), U_2(ny)) = G((x^{γ_1} − 1)/γ_1, (y^{γ_2} − 1)/γ_2).

This is a direct consequence of Corollary 2.1 with x_n, y_n defined as in (2.13)-(2.14).
We have proved the following theorem:
Theorem 2.3 (Standardization). Suppose that there are real constants a_n, c_n > 0 and b_n, d_n such that

lim_{n→∞} F^n(a_n x + b_n, c_n y + d_n) = G(x, y)

for all (x, y), and the marginals of G are standardized as above. Then, with F_1(x) := F(x, ∞), F_2(y) := F(∞, y), and U_i(x), i = 1, 2, as defined above,

lim_{n→∞} F^n(U_1(nx), U_2(ny)) = G_0(x, y)   (2.15)

for all x, y > 0, where

G_0(x, y) = G((x^{γ_1} − 1)/γ_1, (y^{γ_2} − 1)/γ_2)

and γ_1, γ_2 are the marginal extreme value indices.
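A quick numerical check of (2.15) can be done for a fully explicit model. In the sketch below (an assumed example) the coordinates are independent standard exponentials, so F_i(x) = 1 − e^{−x}, U_i(t) = F_i^←(1 − 1/t) = log t, and G_0(x, y) = e^{−1/x} e^{−1/y}.

```python
# A numerical check (assumed example) of (2.15) with independent standard
# exponential coordinates: U_i(t) = log t and G_0(x,y) = e^{-1/x} e^{-1/y}.
import math

def F(x, y):
    return (1 - math.exp(-x)) * (1 - math.exp(-y))

x, y = 1.5, 0.8
for n in (10, 1000, 100000):
    print(n, F(math.log(n * x), math.log(n * y)) ** n)   # F^n(U_1(nx), U_2(ny))
print("G0(x,y):", math.exp(-1 / x) * math.exp(-1 / y))
```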
Remark 2.3. In case F has continuous marginal distribution functions F_1 and F_2, relation (2.15) can be formulated simply as

lim_{n→∞} P(max_{1≤i≤n} 1/(1 − F_1(X_i)) ≤ nx, max_{1≤i≤n} 1/(1 − F_2(Y_i)) ≤ ny) = G_0(x, y)

for x, y > 0; i.e., after a transformation of the marginal distributions to a standard distribution, namely F(x) := 1 − 1/x, x ≥ 1, a simplified limit relation applies. This means that we have reformulated the problem of identifying the limit distribution in such a way that the marginal distributions no longer play a role. From now on we can focus solely on the dependence structure.
Corollary 2.2. For any (x, y) for which 0 < G_0(x, y) < 1,

lim_{n→∞} n(1 − F(U_1(nx), U_2(ny))) = −log G_0(x, y).   (2.16)

Proof. Taking logarithms on both sides of (2.15), we get

lim_{n→∞} −n log F(U_1(nx), U_2(ny)) = −log G_0(x, y).

Since log x ∼ x − 1 as x → 1 and F(U_1(nx), U_2(ny)) → 1 as n → ∞, we have

−log F(U_1(nx), U_2(ny)) / (1 − F(U_1(nx), U_2(ny))) → 1,

and with −n log F(U_1(nx), U_2(ny)) → −log G_0(x, y) we obtain (2.16).
We shall also use the following slight extension.

Corollary 2.3. For any (x, y) for which 0 < G_0(x, y) < 1,

lim_{t→∞} t(1 − F(U_1(tx), U_2(ty))) = −log G_0(x, y),   (2.17)

where t runs through the real numbers.

Proof. Apply inequalities (1.25) to relation (2.16) with the integer n replaced by real t, which gives (2.17).
Why do we call this standardization, and what do we gain from Theorem 2.3? Observe that

F^n(U_1(nx), U_2(ny)) = P[⋁_{i=1}^n X_i ≤ U_1(nx), ⋁_{i=1}^n Y_i ≤ U_2(ny)]
                      = P[⋁_{i=1}^n U_1^←(X_i)/n ≤ x, ⋁_{i=1}^n U_2^←(Y_i)/n ≤ y]
                      =: P[⋁_{i=1}^n X*_i/n ≤ x, ⋁_{i=1}^n Y*_i/n ≤ y].

Thus we have from (2.15) that

P[⋁_{i=1}^n X*_i/n ≤ x, ⋁_{i=1}^n Y*_i/n ≤ y] → G_0(x, y)
weakly. Now let us find the distribution of X*_i and Y*_i. We have

P(X*_i ≤ t) = P(X_i ≤ U_1(t)) = P(X_i ≤ F_1^←(1 − 1/t)) = P(1/(1 − F_1(X_i)) ≤ t).

Note that if F_1 is continuous then F_1(X_i) ∼ U(0, 1). Then

P(X*_i ≤ t) = P(1/(1 − F_1(X_i)) ≤ t) = P(F_1(X_i) ≤ 1 − 1/t) = 1 − 1/t,   t ≥ 1.
Thus X*_i follows a standard Pareto distribution. The same is true for Y*_i. Hence the transformations U_1^← and U_2^← standardize X_i, Y_i to standard Pareto random variables. Even when F is not continuous, the standardized variables X*_i, Y*_i have asymptotically Pareto-like tails. Also note that the marginal G_01 of G_0 turns out to be

G_01(x) = G_0(x, ∞) = G((x^{γ_1} − 1)/γ_1, ∞) = G_1((x^{γ_1} − 1)/γ_1)
        = exp{−(1 + γ_1 · (x^{γ_1} − 1)/γ_1)^{−1/γ_1}} = exp{−(x^{γ_1})^{−1/γ_1}} = e^{−1/x} = Φ_1(x),   x > 0,

which is the Frechet(1) distribution. The same is true for the other marginal: G_02(y) = e^{−1/y}, y > 0. Thus we have reduced the problem of characterizing all bivariate extreme value distributions to the problem of characterizing all bivariate EVDs with Frechet(1) marginals.
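A short simulation (an assumed example) makes the standardization visible: starting from continuous marginals, X*_i = 1/(1 − F_1(X_i)) is standard Pareto, and P(⋁_{i≤n} X*_i / n ≤ x) is already close to the Frechet(1) value e^{−1/x} for moderate n.

```python
# A simulation sketch (assumed example) of the standardization: with X_i
# standard exponential, F_1(X_i) = 1 - e^{-X_i}, so the standardized variable
# X*_i = 1/(1 - F_1(X_i)) = e^{X_i} is standard Pareto, and the normalized
# componentwise maximum max_i X*_i / n is approximately Frechet(1).
import math
import numpy as np

rng = np.random.default_rng(1)
n, reps, x = 200, 10000, 1.2

X = rng.exponential(size=(reps, n))
X_star = np.exp(X)                              # standard Pareto variables
emp = np.mean(X_star.max(axis=1) / n <= x)      # empirical P(max X*_i / n <= x)
print("empirical:", emp, " Frechet(1) value exp(-1/x):", math.exp(-1 / x))
```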
Conclusion
Classical extreme value theory is concerned substantially with distributional prop-
erties of the maximum Mn of n i.i.d. random variables. Two results of basic impor-
tance are proved in Chapter 1. The first is the fundamental result, here called the
Extremal Types Theorem, which exhibits the possible limiting forms for the distri-
bution of Mn under linear normalizations. The second basic result given in Chapter
1 is almost trivial in the independent context, and gives a simple necessary and sufficient condition under which P{M_n ≤ u_n} converges for a given sequence of constants {u_n}. The theory is illustrated by several examples from each of the possible limiting types.

In Chapter 2, we concentrate on the maximum of n multivariate observations. The
maximum is defined by the vector of componentwise maxima. The structure of the
family of limiting distributions is actually quite rich, and can be studied in terms of
max-stable distributions. We discuss characterizations of the limiting multivariate
extreme value distributions and give domains of attraction criteria.
APPENDIX A
Appendix
In the following we will provide some basic tools which are used throughout the
thesis. All random variables are assumed to be defined on a common probability space
(Ω, F, P) (i.e., Ω is a nonempty set, F is a σ-algebra over Ω, and P is a measure on
the measurable space (Ω,F ) having total mass 1). We commence with elementary
results on the convergence of random variables and of probability distributions.
A.1. Modes of Convergence
We introduce the main modes of convergence for a sequence of r.v.s X, X_1, X_2, . . .
Convergence in Distribution
Definition A.1. We say that (X_n) converges in distribution or converges weakly to the random variable X (X_n d→ X) if for all bounded, continuous functions f the relation

E f(X_n) → E f(X),   n → ∞,

holds.

We sometimes write X_n d→ F_X, where F_X is the distribution or probability measure of X. We use the same symbol both for the distribution and for the d.f. of a r.v. Weak convergence can be described by the d.f.s F_{X_n} and F_X of X_n and X respectively: X_n d→ X holds if and only if for all continuity points y of the d.f. F_X the relation

F_{X_n}(y) → F_X(y),   n → ∞,

is satisfied. Moreover, if F_X is continuous then the relation can even be strengthened to uniform convergence:

sup_y |F_{X_n}(y) − F_X(y)| → 0,   n → ∞.
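The strengthening to uniform convergence is easy to see numerically. In the sketch below (an assumed example), X_n = M_n − log n with M_n the maximum of n standard exponentials, so F_{X_n}(y) = (1 − e^{−(y+log n)})^n converges to the Gumbel d.f. exp{−e^{−y}}, and the supremum distance over a grid shrinks with n.

```python
# A numerical illustration (assumed example) of uniform convergence of d.f.s:
# F_{X_n}(y) = (1 - e^{-(y + log n)})^n  ->  exp{-e^{-y}}  uniformly in y.
import numpy as np

ys = np.linspace(-2.0, 10.0, 2001)
FX = np.exp(-np.exp(-ys))                      # Gumbel limit d.f.
for n in (10, 100, 10000):
    arg = np.clip(ys + np.log(n), 0.0, None)   # F_{X_n} vanishes for y <= -log n
    FXn = (1.0 - np.exp(-arg)) ** n
    print(n, np.max(np.abs(FXn - FX)))         # sup-distance over the grid
```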
Convergence in Probability
Definition A.2. We say that (X_n) converges in probability to the random variable X (X_n P→ X) if for all positive ε the relation

P(|X_n − X| > ε) → 0,   n → ∞,

holds.

Remark A.1. Convergence in probability implies convergence in distribution. The converse is true if and only if X = c a.s. for some constant c.
Almost Sure Convergence
Definition A.3. We say that (X_n) converges almost surely (a.s.), or with probability 1, to the r.v. X (X_n a.s→ X) if for P-almost all ω ∈ Ω the relation

X_n(ω) → X(ω),   n → ∞,

holds. This means that

P(X_n → X) = P({ω : X_n(ω) → X(ω)}) = 1.

Convergence with probability 1 is equivalent to the relation

sup_{k≥n} |X_k − X| P→ 0,   n → ∞.

Hence convergence with probability 1 implies convergence in probability, and hence convergence in distribution.
Lp - Convergence
Definition A.4. Let p > 0. We say that (X_n) converges in L^p, or in p-th mean, to X (X_n Lp→ X) if E|X_n|^p < ∞, E|X|^p < ∞ and

E|X_n − X|^p → 0,   n → ∞.

By Markov's inequality, P(|X_n − X| > ε) ≤ ε^{−p} E|X_n − X|^p for positive p and ε. Thus X_n Lp→ X implies X_n P→ X. The converse is in general not true.
Vague Convergence
Suppose E is a locally compact topological space with countable base. We can safely think of E = R^d or [−∞, ∞]^d. Let C be the Borel σ-field of E.

A measure µ : C → [0, ∞] is a set function such that

1. µ(∅) = 0 and µ(A) ≥ 0 for A ∈ C.

2. If A_n, n ≥ 1, are mutually disjoint sets in C, then

µ(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ µ(A_i).

µ is Radon if µ(K) < ∞ for all K compact in E. Now denote

M_+(E) = {µ : µ is a non-negative Radon measure on C}.

Recall from the weak convergence of probability measures ([4]) that we considered a class of continuous and bounded test functions f, and if

P_n(f) := ∫ f(x) P_n(dx) → ∫ f(x) P(dx) =: P(f),

then we said P_n converges weakly to P. Note that the measures in M_+(E) are potentially infinite. So if we want to follow the same route, we need functions with compact support, on which the measures are finite. So define

C_K^+(E) = {f : E → [0, ∞) : f is continuous with compact support}.

Let µ_n ∈ M_+(E) for all n ≥ 0. Then µ_n converges to µ_0 vaguely if for all f ∈ C_K^+(E),

µ_n(f) := ∫_E f(x) µ_n(dx) → ∫_E f(x) µ_0(dx) =: µ_0(f).

We write µ_n v→ µ_0. The space M_+(E) can be made into a complete separable metric space with respect to the vague metric.
Uniform Convergence
Definition A.5. Consider a sequence of functions f_n : R → R, n ≥ 0. Then {f_n, n ≥ 1} converges to f_0 uniformly on A ⊂ R if

sup_{x∈A} |f_n(x) − f_0(x)| → 0,   n → ∞.

f_n converges to f_0 locally uniformly if it converges uniformly on all compact sets, i.e., for any a < b,

sup_{x∈[a,b]} |f_n(x) − f_0(x)| → 0,   n → ∞.
A.2. Inverses of Monotone Functions
Suppose H is a nondecreasing function on R. With the conventions inf ∅ = ∞ and inf R = −∞, we define the (left-continuous) inverse of H as

H^←(y) = inf{s : H(s) ≥ y}.

The function H^← is left-continuous at each x ∈ R. Indeed, suppose x_n ↑ x but

H^←(x_n) ↑ H^←(x−) < H^←(x).

Then there exist δ > 0 and y such that for all n,

H^←(x_n) < y < H^←(x) − δ.

The left inequality and the definition of H^← yield H(y) ≥ x_n for all n. Hence, letting n → ∞ we get H(y) ≥ x, whence again by the definition of H^← we get y ≥ H^←(x), which coupled with y < H^←(x) − δ leads to the desired contradiction.
Proposition A.1 (Properties of a generalised inverse function). If h is right-continuous, then the following properties hold.

(a) h(x) ≥ q ⟺ h^←(q) ≤ x.
(b) h(x) < q ⟺ h^←(q) > x.
(c) h(x_1) < q ≤ h(x_2) ⟺ x_1 < h^←(q) ≤ x_2.
(d) h(h^←(q)) ≥ q for all q ∈ [0, 1], with equality if h is continuous.
(e) h^←(h(x)) ≤ x for all x ∈ R, with equality for h increasing.
(f) h is continuous ⟺ h^← is increasing.
(g) h is increasing ⟺ h^← is continuous.
(h) If X is a r.v. with d.f. h, then P(h^←(h(X)) ≠ X) = 0.
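For computational purposes, h^← can be evaluated by bisection whenever a bracketing interval is known. The following sketch (an illustrative construction, not from the text) implements the generalised inverse and checks property (a) for the Frechet(1) d.f.

```python
# An illustrative implementation of the generalised inverse
# h^{<-}(q) = inf{s : h(s) >= q} by bisection, assuming a bracketing
# interval with h(lo) < q <= h(hi) is available.
import math

def gen_inverse(h, q, lo=-1e6, hi=1e6, tol=1e-10):
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if h(mid) >= q:
            hi = mid          # invariant: h(hi) >= q
        else:
            lo = mid          # invariant: h(lo) < q
    return hi

# Example: h = the Frechet(1) d.f., whose inverse is -1/log(q).
h = lambda s: math.exp(-1.0 / s) if s > 0 else 0.0
q = 0.3
xq = gen_inverse(h, q)
print(xq, -1.0 / math.log(q))   # bisection agrees with the closed form
print(h(xq) >= q)               # property (a): h(x) >= q  iff  h^{<-}(q) <= x
```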
Proposition A.2 (Convergence of generalised inverse functions). Let h, h_1, h_2, . . . be non-decreasing functions such that lim_{n→∞} h_n(x) = h(x) for every continuity point x of h. Then lim_{n→∞} h_n^←(y) = h^←(y) for every continuity point y of h^←.
Proofs of these results and more theory on generalised inverse functions can be
found in Resnick [18], Section 0.2.
A.3. Some Convergence Theorems
Lebesgue’s theorem on dominated convergence
Theorem A.1. Suppose that (f_n) is a sequence of measurable functions, that f_n → f pointwise almost everywhere as n → ∞, and that |f_n| ≤ g for all n, where g is integrable. Then f is integrable, and

lim_{n→∞} ∫ f_n dµ = ∫ f dµ.

Proof. See P. Billingsley, 1995 [5].
Skorohod’s theorem
Theorem A.2. For n ≥ 0 suppose X_n is a real random variable on (Ω_n, B_n, P_n) such that X_n d→ X_0. Then there exist random variables X̃_n, n ≥ 0, defined on the Lebesgue probability space ([0, 1], B[0, 1], m) such that

(i) X̃_n d= X_n for each n ≥ 0, and

(ii) X̃_n → X̃_0 almost surely with respect to m.

Proof. See Sidney I. Resnick, 1987, page 6 [18].
Bibliography
[1] A. A. Balkema and S. I. Resnick. Max-infinite divisibility. J. Appl. Probability,
14(2):309–319, 1977.
[2] V. Barnett. The ordering of multivariate data. J. Roy. Statist. Soc. Ser., 139(3):318–
355, 1976.
[3] J. Beirlant. Statistics Of Extremes: Theory And Applications. Wiley, 2004.
[4] Patrick Billingsley. Convergence of Probability Measures. John Wiley & Sons, 1968.
[5] Patrick Billingsley. Probability and Measure. Wiley-Interscience, 1995.
[6] H.E. Buchanan and T.H. Hildebrandt. Note on the convergence of a sequence
of functions of a certain type. Ann. of Math, 9(3):123–126, 1908.
[7] S.G. Coles and J.A. Tawn. Modelling extreme multivariate events. J. R. Statist.
Soc., 53(2):377–392, 1991.
[8] Stuart Coles. An introduction to statistical modeling of extreme values. Springer
series in statistics. Springer, 2001.
[9] Bikramjit Dass. A course in multivariate extremes. Spring, 2010.
[10] L. de Haan and S. I. Resnick. Limit theory for multidimensional sample ex-
tremes. Z. Wahr. verw. Geb., (40):317–337, 1977.
[11] Laurens de Haan and Ana Ferreira. Extreme Value Theory: An Introduction
(Springer Series in Operations Research). Springer, 2006.
[12] Paul Embrechts, Claudia Kluppelberg, and Thomas Mikosch. Modelling Ex-
tremal Events: for Insurance and Finance (Stochastic Modelling and Applied Proba-
bility). Springer, 2011.
[13] M. Falk, J. Husler, and R.D. Reiß. Laws of Small Numbers: Extremes and Rare
Events. Birkhauser, 2010.
[14] William Feller. An introduction to probability theory and its applications. Vol. II.
Second edition. John Wiley & Sons Inc., 1971.
[15] E.J. Gumbel. Statistics of extremes. Columbia University Press, 1958.
[16] M. R. Leadbetter, G. Lindgren, and H. Rootzen. Extremes and Related Properties
of Random Sequences and Processes. Springer-Verlag, 1983.
[17] J. Pickands. Multivariate extreme value distributions. Bull. Int. Statist. Inst.,
pages 859–878, 1981.
[18] S. Resnick. Extreme Values, Regular Variation, and Point Processes. Springer Series
in Operations Research and Financial Engineering, 1987.
[19] S.I. Resnick. A Probability Path. Birkhauser, 1999.
[20] E. Seneta. Regularly Varying Functions. Lecture Notes in Mathematics. Springer-
Verlag, 1976.
[21] M. Sibuya. Bivariate extreme statistics. Ann. Inst. Math. Statist., (11):195–210,
1960.
[22] J. Tiago de Oliveira. Extremal distributions. Rev. Fac. Sci. Lisboa, 1958.