Testing extreme value conditions an overview and recent approaches
International Conference on Mathematical and Statistical
Modeling in Honor of Enrique Castillo
(ICMSM 2006) University of Castilla-La Mancha
Ciudad Real (SPAIN) June 28-30, 2006
Testing extreme value conditions an overview and recent approaches
Isabel Fraga Alves, CEAUL & DEIO, University of Lisbon, Portugal
Cláudia Neves, UIMA & DM, University of Aveiro, Portugal
ICMSM 2006 Ciudad Real, June 28-30, 2006 – 2Isabel Fraga Alves & Cláudia Neves
Contents
• Introduction
• Preliminaries and notation
• Testing extremes
Parametric Approaches
Annual Maxima (AM)
Peaks Over Threshold (POT)
Largest Observations (LO)
Semi-Parametric Approaches
Testing EV Conditions
• PORT approach: Three Tests
• A case study: S&P500 data
Introduction
• In the analysis of extreme large (or small) values, the model assumptions on the right (or left) tail of the distribution function (d.f.) F underlying the sample data are of central importance.
• We focus on the problem of extreme large values. By an obvious transformation, the problem of extreme small values is analogous.
• Statistical inference about rare events can clearly be deduced only from those observations which are extreme in some sense:
classical Gumbel method of block of annual maxima (AM)
peaks-over-threshold (POT) methods
peaks-over-random-threshold (PORT) methods.
• Statistical inference clearly improves if one makes an a priori statistical choice about the most appropriate tail decay for the underlying d.f.:
exponential or polynomial
light tails with finite right endpoint
• This is supported by Extreme Value Theory (EVT).
Theory and Extreme Values Analysis
• Extreme Values Analysis: models for extreme values, not central values; modelling the tail of the underlying distribution.
• Problem: How to make inference beyond the sample data?
• One answer: use techniques based on EVT so that statistical inference about rare events is possible using only a limited amount of data!
• Notation:
Sample: $(X_1, X_2, \ldots, X_n)$ iid r.v.'s with d.f. $F(x)$.
Order statistics: $X_{1,n} \le X_{2,n} \le \cdots \le X_{n,n} =: M_n$.
Tail of $F$: $\bar F(x) := P[X > x] = 1 - F(x)$.
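Basic Theory – distribution of the Maximum
• $P[M_n \le x] = P[X_1 \le x, \ldots, X_n \le x] = \prod_{i=1}^{n} P[X_i \le x] = F^n(x)$.
Consequently, $M_n \to x^F$ almost surely as $n \to \infty$, with $x^F := \sup\{x : F(x) < 1\}$.
• Suppose there exist $a_n > 0$ and $b_n \in \mathbb{R}$ such that, for some nondegenerate d.f. $G$,
\[ P[M_n \le a_n x + b_n] \to G(x), \quad n \to \infty, \quad \text{for every } x \in \mathbb{R}. \]
• Then (Gnedenko, 1943)
\[ G(z) = G_\gamma(z) = \begin{cases} \exp\left(-(1+\gamma z)^{-1/\gamma}\right), & \text{for } 1+\gamma z > 0, \text{ if } \gamma \neq 0 \\ \exp(-\exp(-z)), & \text{for } z \in \mathbb{R}, \text{ if } \gamma = 0 \end{cases} \]
[GEV: Generalized Extreme Value; von Mises-Jenkinson representation]
We write $F \in \mathcal{D}(G_\gamma)$, $\gamma \in \mathbb{R}$.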
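As an illustration of the von Mises-Jenkinson form, the following is a minimal sketch of the standard GEV d.f. using only the Python standard library. The function name `gev_cdf` is a hypothetical helper introduced here, not something from the slides; the sketch assumes the parameterisation $G_\gamma(z) = \exp(-(1+\gamma z)^{-1/\gamma})$ with the Gumbel case at $\gamma = 0$.

```python
import math

def gev_cdf(z: float, gamma: float) -> float:
    """Standard GEV d.f. G_gamma(z) in the von Mises-Jenkinson form (hypothetical helper)."""
    if gamma == 0.0:
        # Gumbel case: exp(-exp(-z)) for every real z
        return math.exp(-math.exp(-z))
    t = 1.0 + gamma * z
    if t <= 0.0:
        # Outside the support: 0 below the left endpoint (gamma > 0),
        # 1 above the right endpoint (gamma < 0)
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))

# A very small gamma should be numerically close to the Gumbel case
print(gev_cdf(0.5, 1e-8), gev_cdf(0.5, 0.0))
```

The explicit branch for $\gamma = 0$ avoids dividing by zero; it is the continuous limit of the general expression.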
Extreme Value Distributions (maxima)
• The GEV($\gamma$) incorporates the 3 types [Fisher-Tippett]:
• Fréchet: $\Phi_\alpha(z) = \exp(-z^{-\alpha})$, $z > 0$, $\alpha > 0$; $\gamma = 1/\alpha > 0$; limit for heavy tailed distributions
• Weibull: $\Psi_\alpha(z) = \exp(-(-z)^{\alpha})$, $z < 0$, $\alpha > 0$; $\gamma = -1/\alpha < 0$; limit for short tailed distributions with $x^F < \infty$
• Gumbel: $\Lambda(z) = \exp(-\exp(-z))$, $z \in \mathbb{R}$; $\gamma = 0$; limit for exponential tailed distributions
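Parametric approaches
Fitting GEV($\gamma$) to Annual Maxima (AM) – GUMBEL METHOD
[Figure: the sample split into blocks (Block 1, ..., Block 5), one maximum per block.]
• Inclusion of location and scale parameters in the GEV($\gamma$) d.f.:
\[ G_\gamma(x; \lambda, \delta) := G_\gamma\left(\frac{x-\lambda}{\delta}\right), \quad \lambda \in \mathbb{R}, \ \delta > 0, \ \gamma \in \mathbb{R}, \]
where $\lambda$ is the location, $\delta$ the scale and $\gamma$ the tail index (shape).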
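The block step of the annual-maxima method can be sketched as follows. This is only an illustration: `block_maxima` is a hypothetical helper, the data are made up, and dropping a trailing incomplete block is an assumption of the sketch rather than something stated in the slides.

```python
def block_maxima(sample, block_size):
    """Split `sample` into consecutive blocks and return the maximum of each block.

    A trailing incomplete block is dropped (an assumption of this sketch).
    """
    if block_size < 1:
        raise ValueError("block_size must be a positive integer")
    n_blocks = len(sample) // block_size
    return [max(sample[b * block_size:(b + 1) * block_size]) for b in range(n_blocks)]

daily = [1.2, 3.4, 0.7, 2.9, 5.1, 4.8, 0.3]  # hypothetical observations
print(block_maxima(daily, 2))                # maxima of the 3 complete blocks
```

The resulting maxima are the data to which a GEV($\gamma$) model with location and scale would then be fitted.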
Testing problem in GEV($\gamma$)
The shape parameter $\gamma$ determines the weight of the tail.
Choice between Gumbel, Weibull or Fréchet, within the $G_\gamma$ family:
$H_0: \gamma = 0$ vs. $H_1: \gamma \neq 0$
or vs. $H_1: \gamma < 0$
or vs. $H_1: \gamma > 0$
• Van Montfort (1970)
• Bardsley (1977)
• Otten and Van Montfort (1978)
• Tiago de Oliveira (1981)
• Gomes (1982)
• Tiago de Oliveira (1984)
• Tiago de Oliveira and Gomes (1984)
• Hosking (1984)
• Marohn (1994)
• Wang, Cooke, and Li (1996)
• Marohn (2000)
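Generalized Pareto distribution GP($\gamma$)
\[ H_\gamma(x; \lambda, \delta) := \begin{cases} 1 - \left(1 + \gamma (x-\lambda)/\delta\right)^{-1/\gamma}, & \text{if } \gamma \neq 0 \\ 1 - \exp\left(-(x-\lambda)/\delta\right), & \text{if } \gamma = 0 \end{cases} \quad \lambda \in \mathbb{R}, \ \delta > 0, \]
for $x \ge \lambda$ and $1 + \gamma (x-\lambda)/\delta > 0$.
Relation to the GEV: $H_\gamma(x; \lambda, \delta) = 1 + \log G_\gamma(x; \lambda, \delta)$.
• The GP($\gamma$) d.f. includes the models:
• Pareto: $W_{1,\alpha}(x) = 1 - x^{-\alpha}$, $\alpha > 0$, $x \ge 1$; heavy tail
• Exponential: $W_0(x) = 1 - \exp(-x)$, $x \ge 0$; exponential tail
• Beta: $W_{2,\alpha}(x) = 1 - (-x)^{\alpha}$, $\alpha > 0$, $-1 \le x \le 0$; bounded support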
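The relation $H_\gamma = 1 + \log G_\gamma$ can be checked numerically. The sketch below takes location 0 and scale 1 for simplicity; `gev_cdf` and `gp_cdf` are hypothetical helper names, and the GEV helper is only evaluated where $1 + \gamma x > 0$.

```python
import math

def gev_cdf(x, gamma):
    """Standard GEV d.f. G_gamma, evaluated where 1 + gamma*x > 0."""
    if gamma == 0.0:
        return math.exp(-math.exp(-x))
    return math.exp(-(1.0 + gamma * x) ** (-1.0 / gamma))

def gp_cdf(x, gamma):
    """Standard GP d.f. H_gamma (location 0, scale 1)."""
    if x <= 0.0:
        return 0.0
    if gamma == 0.0:
        return 1.0 - math.exp(-x)
    t = 1.0 + gamma * x
    if t <= 0.0:
        return 1.0  # beyond the finite right endpoint -1/gamma (gamma < 0)
    return 1.0 - t ** (-1.0 / gamma)

# Compare H_gamma(x) with 1 + log G_gamma(x) for one point inside every support
for gamma in (-0.5, 0.0, 0.5):
    x = 1.2
    print(gamma, gp_cdf(x, gamma), 1.0 + math.log(gev_cdf(x, gamma)))
```

The two printed columns should agree up to floating-point rounding for each value of $\gamma$.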
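Excesses over high thresholds – POT (Peaks Over Threshold)
Excesses over $u$: $X_i - u \mid X_i > u$.
\[ P[X - u > y \mid X > u] = \frac{\bar F(u+y)}{\bar F(u)}, \quad y \ge 0. \]
For a high threshold $u$: $P[X - u \le y \mid X > u] \approx H_\gamma(y; \sigma(u))$, the GP($\gamma$) d.f. with location 0 and scale $\sigma(u) > 0$.
• Balkema-de Haan '74 + Pickands '75:
\[ F \in \mathcal{D}(G_\gamma) \iff \lim_{u \to x^F} \ \sup_{0 < x < x^F - u} \left| P[X - u \le x \mid X > u] - H_\gamma(x; \sigma(u)) \right| = 0. \]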
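A minimal sketch of the POT step with a fixed threshold, assuming a plain Python list of observations. The names `excesses_over_threshold` and `empirical_excess_df` are hypothetical and the data are made up.

```python
def excesses_over_threshold(sample, u):
    """Return the excesses X_i - u for the observations with X_i > u."""
    return [x - u for x in sample if x > u]

def empirical_excess_df(sample, u, y):
    """Empirical estimate of P[X - u <= y | X > u]."""
    exc = excesses_over_threshold(sample, u)
    if not exc:
        raise ValueError("no observation exceeds the threshold")
    return sum(1 for e in exc if e <= y) / len(exc)

sample = [1.0, 5.0, 3.0, 8.0, 10.0]  # hypothetical data
print(excesses_over_threshold(sample, 4.0))
print(empirical_excess_df(sample, 4.0, 4.0))
```

In a POT analysis the empirical conditional d.f. of the excesses is what gets compared with, or fitted by, a GP($\gamma$) d.f.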
Testing problem in GP($\gamma$)
The shape parameter $\gamma$ determines the weight of the tail.
Choice between Exponential, Beta or Pareto, within the $H_\gamma$ family:
$H_0: \gamma = 0$ vs. $H_1: \gamma \neq 0$
or vs. $H_1: \gamma < 0$
or vs. $H_1: \gamma > 0$
• Van Montfort and Witter (1985)
• Gomes and Van Montfort (1986)
• Brilhante (2004)
• Marohn (2000), AM & POT
Fitting GPdf to data
• Castillo and Hadi (1997)
Goodness-of-fit tests for GPdf model
• Choulakian and Stephens (2001)
Goodness-of-fit problem for heavy tailed Pareto-type dfs
• Beirlant, de Wet and Goegebeur (2006)
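LO (Largest Observations)
The $k$ largest observations of the sample, $X^{(1)} \ge X^{(2)} \ge \cdots \ge X^{(k)}$, are modeled by the joint pdf of the GEV($\gamma$)-extremal process. With $Z^{(i)} := (X^{(i)} - \lambda)/\delta$, $i = 1, \ldots, k$,
\[ f(z_1, \ldots, z_k) = g_\gamma(z_k) \prod_{i=1}^{k-1} \frac{g_\gamma(z_i)}{G_\gamma(z_i)}, \quad z_1 > \cdots > z_k, \quad g_\gamma(z) := \partial G_\gamma(z) / \partial z. \]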
Testing problem in the GEV($\gamma$)-extremal process
The shape parameter $\gamma$ determines the weight of the tail.
Choice between Gumbel, Weibull or Fréchet, within the $G_\gamma$ family:
$H_0: \gamma = 0$ vs. $H_1: \gamma \neq 0$
or vs. $H_1: \gamma < 0$
or vs. $H_1: \gamma > 0$
• Gomes and Alpuim (1986)
• Gomes (1989), LO & AM
Goodness-of-fit tests
• Gomes (1987)
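Semi-Parametric Approach – Upper Order Statistics
$F \in \mathcal{D}(G_\gamma)$
Inference is based on the upper order statistics $X_{n,n} \ge X_{n-1,n} \ge \cdots \ge X_{n-k,n}$, where $X_{n-k,n}$ is an upper intermediate o.s.:
$k = k(n) \to \infty$, $k/n \to 0$, as $n \to \infty$.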
Peaks Over Random Threshold – PORT
Excesses over the random threshold $X_{n-k,n}$:
\[ Z_i := X_{n-i+1,n} - X_{n-k,n}, \quad i = 1, \ldots, k. \]
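The excesses over the random threshold can be computed directly from the sorted sample. The sketch below assumes the $k$ excesses are taken over the order statistic $X_{n-k,n}$; `port_excesses` is a hypothetical helper name and the data are illustrative.

```python
def port_excesses(sample, k):
    """Excesses Z_i = X_{n-i+1,n} - X_{n-k,n}, i = 1..k, over the random threshold."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    xs = sorted(sample)          # xs[j - 1] is the order statistic X_{j,n}
    threshold = xs[n - k - 1]    # X_{n-k,n}
    return [xs[n - i] - threshold for i in range(1, k + 1)]

data = [5.0, 1.0, 9.0, 3.0, 7.0]                  # hypothetical sample
print(port_excesses(data, 2))
# A change of location and scale, 2*x + 10, only rescales the excesses
print(port_excesses([2.0 * x + 10.0 for x in data], 2))
```

Because the threshold is itself an order statistic, a shift of the data cancels out of every excess, which is why ratios of functions of the $Z_i$ can be free of location and scale.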
Testing Problem: Max-Domains of Attraction
The shape parameter $\gamma$ determines the weight of the tail.
Choice between Domains of Attraction:
$H_0: F \in \mathcal{D}(G_0)$ vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma \neq 0}$
or vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma < 0}$
or vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma > 0}$
• Galambos (1982)
• Castillo, Galambos and Sarabia (1989)
• Hasofer and Wang (1992)
• Falk (1995)
• Fraga Alves and Gomes (1996)
• Fraga Alves (1999)
• Marohn (1998a,b)
• Segers and Teugels (2000)
PORT approach
• Neves, Picek and Fraga Alves (2006)
• Neves and Fraga Alves (2006)
Testing EV conditions
$F \in \mathcal{D}(G_\gamma)$, for any real $\gamma$
Based on $X_{n,n} \ge X_{n-1,n} \ge \cdots \ge X_{n-k,n}$, with $X_{n-k,n}$ an upper intermediate o.s.
Adapted Goodness-of-fit tests (Kolmogorov-Smirnov & Cramér-von Mises type)
• Dietrich, de Haan and Hüsler (2002)
• Drees, de Haan and Li (2006)
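PORT approach: Three Tests for $H_0: F \in \mathcal{D}(G_0)$ vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma \neq 0}$
Largest observations: $X_{n,n} \ge X_{n-1,n} \ge \cdots \ge X_{n-k,n}$, with $k = k(n) \to \infty$, $k/n \to 0$, as $n \to \infty$.
Excesses over the random threshold $X_{n-k,n}$: $Z_i := X_{n-i+1,n} - X_{n-k,n}$, $i = 1, \ldots, k$.
Define the r-Moment of Excesses:
\[ M_n^{(r)} := \frac{1}{k} \sum_{i=1}^{k} \left( X_{n-i+1,n} - X_{n-k,n} \right)^r = \frac{1}{k} \sum_{i=1}^{k} Z_i^r, \quad r = 1, 2. \]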
NPFA test statistic: Ratio between the Maximum and the Mean of Excesses
Motivation: different behaviour of the ratio between the maximum and the mean for light and heavy tails.
\[ T_n(k) := \frac{Z_1}{M_n^{(1)}} = \frac{X_{n,n} - X_{n-k,n}}{\frac{1}{k} \sum_{i=1}^{k} \left( X_{n-i+1,n} - X_{n-k,n} \right)} \]
The distribution of $T_n(k)$ does NOT depend on the location and scale.
(Neves, Picek & Fraga Alves '06)
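A minimal sketch of the ratio of the maximum to the mean of the excesses, taking a list of excesses over the random threshold as input. `npfa_statistic` is a hypothetical name and the excesses are made-up numbers.

```python
def npfa_statistic(excesses):
    """T_n(k): largest excess divided by the mean excess."""
    k = len(excesses)
    mean_excess = sum(excesses) / k
    return max(excesses) / mean_excess

z = [4.0, 2.5, 1.0, 0.5]                            # hypothetical excesses Z_1..Z_k
t = npfa_statistic(z)
t_scaled = npfa_statistic([10.0 * v for v in z])    # same data on another scale
print(t, t_scaled)
```

Rescaling the excesses leaves the ratio unchanged, and a shift of the original data never reaches the excesses at all, which illustrates the location and scale invariance stated on the slide.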
Gt test statistic: Greenwood-type Statistic
Motivation: based on the statistic of Greenwood '46.
\[ R_n(k) := \frac{M_n^{(2)}}{\left[ M_n^{(1)} \right]^2} \]
The distribution of $R_n(k)$ does NOT depend on the location and scale.
(Neves & Fraga Alves '06)
HW - test statistic: Hasofer and Wang Statistic
Motivation: based on the goodness-of-fit statistic of Shapiro-Wilk '65.
\[ W_n(k) := \frac{1}{k} \, \frac{\left[ M_n^{(1)} \right]^2}{M_n^{(2)} - \left[ M_n^{(1)} \right]^2} = \frac{1}{k} \, \frac{1}{R_n(k) - 1} \]
The distribution of $W_n(k)$ does NOT depend on the location and scale.
(Hasofer & Wang '92; Neves & Fraga Alves '06)
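The Greenwood-type ratio and the Hasofer-Wang statistic both depend on the excesses only through their first two empirical moments. The sketch below assumes the moment-based forms $R_n(k) = M^{(2)}/[M^{(1)}]^2$ and $W_n(k) = [M^{(1)}]^2 / (k\,(M^{(2)} - [M^{(1)}]^2))$; the function names are hypothetical and the excesses are made up.

```python
def excess_moment(excesses, r):
    """Empirical r-th moment of the excesses, (1/k) * sum Z_i^r."""
    return sum(v ** r for v in excesses) / len(excesses)

def greenwood_statistic(excesses):
    """R_n(k): second moment over the squared first moment."""
    return excess_moment(excesses, 2) / excess_moment(excesses, 1) ** 2

def hasofer_wang_statistic(excesses):
    """W_n(k); undefined when all excesses are equal (zero empirical variance)."""
    k = len(excesses)
    m1 = excess_moment(excesses, 1)
    m2 = excess_moment(excesses, 2)
    return m1 ** 2 / (k * (m2 - m1 ** 2))

z = [4.0, 2.5, 1.0, 0.5]                 # hypothetical excesses
r = greenwood_statistic(z)
w = hasofer_wang_statistic(z)
print(r, w, 1.0 / (len(z) * (r - 1.0)))  # the last two values should coincide
```

The printed check reflects the identity $W_n(k) = 1 / (k\,(R_n(k) - 1))$, so the two statistics carry the same information and differ only in how it is scaled.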
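NPFA - Test at asymptotic level $\alpha$
Under $H_0$, plus extra second order conditions on the upper tail of $F$ and extra conditions on the convergence rate of $k$ to infinity:
\[ T^*_{k,n} := T_n(k) - \log k \ \overset{d}{\to} \ G_0, \quad n \to \infty. \]
Gumbel quantile: $g_\varepsilon := -\ln(-\ln \varepsilon)$.
• $H_0: F \in \mathcal{D}(G_0)$ vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma \neq 0}$
Reject $H_0$ (light tails) in favour of $H_1$ (bilateral) if: $T^*_{k,n} < g_{\alpha/2}$ or $T^*_{k,n} > g_{1-\alpha/2}$
• $H_0: F \in \mathcal{D}(G_0)$ vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma < 0}$
Reject $H_0$ (light tails) in favour of $H_1$ (short tails) if: $T^*_{k,n} < g_{\alpha}$
• $H_0: F \in \mathcal{D}(G_0)$ vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma > 0}$
Reject $H_0$ (light tails) in favour of $H_1$ (heavy tails) if: $T^*_{k,n} > g_{1-\alpha}$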
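The decision rule can be sketched as below, assuming the centred statistic $T^* = T_n(k) - \log k$ is compared with Gumbel quantiles $g_\varepsilon = -\ln(-\ln \varepsilon)$. The names `gumbel_quantile` and `npfa_test`, the labels for the alternatives and the input data are all hypothetical; the sketch ignores the second order and rate conditions under which the asymptotic level holds.

```python
import math

def gumbel_quantile(eps):
    """g_eps = -ln(-ln(eps)), the eps-quantile of the standard Gumbel d.f."""
    return -math.log(-math.log(eps))

def npfa_test(excesses, alpha=0.05, alternative="two-sided"):
    """Return (T*, reject) for the Gumbel-domain null hypothesis."""
    k = len(excesses)
    t_star = max(excesses) / (sum(excesses) / k) - math.log(k)
    if alternative == "heavy":        # H1: gamma > 0
        reject = t_star > gumbel_quantile(1.0 - alpha)
    elif alternative == "short":      # H1: gamma < 0
        reject = t_star < gumbel_quantile(alpha)
    elif alternative == "two-sided":  # H1: gamma != 0
        reject = (t_star < gumbel_quantile(alpha / 2.0)
                  or t_star > gumbel_quantile(1.0 - alpha / 2.0))
    else:
        raise ValueError("alternative must be 'heavy', 'short' or 'two-sided'")
    return t_star, reject

z = [100.0] + [1.0] * 9   # hypothetical excesses dominated by a single value
print(npfa_test(z, alpha=0.05, alternative="heavy"))
```

A single dominant excess pushes the ratio of maximum to mean upwards, which is the pattern the one-sided heavy-tail alternative is designed to pick up.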
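Gt & HW - Tests at asymptotic level $\alpha$
Under $H_0$, plus extra second order conditions on the upper tail of $F$ and extra conditions on the convergence rate of $k$ to infinity:
\[ R^*_n(k) := \sqrt{k/4} \, \left( R_n(k) - 2 \right) \ \overset{d}{\to} \ N(0,1), \qquad W^*_n(k) := \sqrt{k/4} \, \left( k \, W_n(k) - 1 \right) \ \overset{d}{\to} \ N(0,1), \quad n \to \infty. \]
Normal quantile: $z_\varepsilon := \Phi^{-1}(\varepsilon)$, with $\Phi$ the standard normal d.f.
• $H_0: F \in \mathcal{D}(G_0)$ vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma < 0}$
Reject $H_0$ (light tails) in favour of $H_1$ (short tails) if:
Gt-test: $R^*_n(k) < -z_{1-\alpha}$
HW-test: $W^*_n(k) > z_{1-\alpha}$
• $H_0: F \in \mathcal{D}(G_0)$ vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma > 0}$
Reject $H_0$ (light tails) in favour of $H_1$ (heavy tails) if:
Gt-test: $R^*_n(k) > z_{1-\alpha}$
HW-test: $W^*_n(k) < -z_{1-\alpha}$
• $H_0: F \in \mathcal{D}(G_0)$ vs. $H_1: F \in \mathcal{D}(G_\gamma)_{\gamma \neq 0}$
Reject $H_0$ (light tails) in favour of $H_1$ (bilateral) if:
Gt-test: $|R^*_n(k)| > z_{1-\alpha/2}$
HW-test: $|W^*_n(k)| > z_{1-\alpha/2}$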
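A minimal sketch of the two normalised statistics and their rejection rules, assuming $R^* = \sqrt{k/4}\,(R_n(k) - 2)$ and $W^* = \sqrt{k/4}\,(k\,W_n(k) - 1)$ compared with standard normal quantiles. The function names and alternative labels are hypothetical, the data are made up, and the quantile comes from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist

def gt_hw_statistics(excesses):
    """Return (R*, W*) computed from the excesses over the random threshold."""
    k = len(excesses)
    m1 = sum(excesses) / k
    m2 = sum(v * v for v in excesses) / k
    r = m2 / m1 ** 2
    w = 1.0 / (k * (r - 1.0))   # undefined if all excesses are equal
    scale = math.sqrt(k / 4.0)
    return scale * (r - 2.0), scale * (k * w - 1.0)

def reject_gumbel(r_star, w_star, alpha=0.05, alternative="two-sided"):
    """Return (reject by Gt-test, reject by HW-test)."""
    nd = NormalDist()
    if alternative == "heavy":       # H1: gamma > 0
        z = nd.inv_cdf(1.0 - alpha)
        return r_star > z, w_star < -z
    if alternative == "short":       # H1: gamma < 0
        z = nd.inv_cdf(1.0 - alpha)
        return r_star < -z, w_star > z
    if alternative == "two-sided":   # H1: gamma != 0
        z = nd.inv_cdf(1.0 - alpha / 2.0)
        return abs(r_star) > z, abs(w_star) > z
    raise ValueError("alternative must be 'heavy', 'short' or 'two-sided'")

z = [1.0, 1.1, 0.9, 1.0] * 10   # hypothetical, nearly constant excesses
r_star, w_star = gt_hw_statistics(z)
print(r_star, w_star, reject_gumbel(r_star, w_star, 0.05, "short"))
```

The two statistics move in opposite directions: a small dispersion of the excesses makes $R^*$ negative and $W^*$ positive, which is why the one-sided rules use opposite signs.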
Exact Properties of NPFA, Gt & HW - Tests
An extensive simulation study of the proposed procedures allows us to conclude that:
• The Gt-test compares favourably when the aim is to detect heavy-tailed distributions.
• While the Gt-test barely detects small negative values of $\gamma$, the HW-test is the most powerful of the tests under study against alternatives in the Weibull domain of attraction.
• The NPFA-test, based on the very simple $T_n$-statistic, tends to be conservative yet retains reasonable power, so it is a valuable complement to the other procedures.
Financial data: stock index log-returns
EVT offers a powerful framework to characterize financial market crashes and booms.
The exact distribution of financial returns remains an open question.
Heavy tails are consistent with a variety of financial theories.
In financial studies, the following question is relevant:
are return distributions symmetric in the tails?
Differences in the behavior of extreme positive and negative tail movements within the same market constitute a point of investigation.
The aforementioned tests can be seen as a first test for symmetry between the positive and negative tails of the log-returns of some stock index.
S&P500: left and right tails of stock index log-returns
S&P500 data: n = 6985 observations.
Series of closing prices, $\{S_i, i = 1, \ldots, n\}$, of the S&P500 stock index taken from 4 January 1960 up to Friday, 16 October 1987 (the last trading day before the crash of Black Monday, October 19, 1987), from which we use the daily log-returns (assumed to be stationary and weakly dependent).
Study of the left tail of the distribution of the returns: negative log-returns, i.e.,
$L_i := -\log(S_{i+1}/S_i)$, $i = 1, \ldots, n-1$.
Study of the right tail of the distribution of the returns: positive log-returns, defined as
$X_i := \log(S_{i+1}/S_i) = -L_i$, $i = 1, \ldots, n-1$.
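As a sketch of how the two series are obtained from closing prices, assuming the sign convention that the left-tail series is the negated log-return. The function name `log_returns` and the price values are hypothetical and not taken from the S&P500 data.

```python
import math

def log_returns(prices):
    """Daily log-returns log(S_{i+1} / S_i) from a series of closing prices."""
    return [math.log(prices[i + 1] / prices[i]) for i in range(len(prices) - 1)]

prices = [100.0, 102.0, 99.0, 101.0]   # hypothetical closing prices
x = log_returns(prices)                # right tail: positive log-returns X_i
losses = [-v for v in x]               # left tail: negative log-returns L_i = -X_i
print(x)
print(losses)
```

With n prices there are n - 1 returns; the same upper-tail machinery can then be applied to `x` for large gains and to `losses` for crashes.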
S&P500: percentage log-returns $X_i := \log(S_{i+1}/S_i)$
[Figure: time series of the S&P500 log-returns $X_i$ (vertical axis, ticks from -8 to 6) against date (horizontal axis), 5 Jan 1960 to 16 Oct 1987.]
S&P500 (Left tail)
[Figure: sample paths of the statistics T*, R* and W* (NPFA-test, Gt-test and HW-test), plotted against k = 5, …, 1200, applied to the S&P500 negative log-returns $L_i := -\log(S_{i+1}/S_i)$; reference lines at the critical values $g_{0.95}$, $z_{0.95}$ and $z_{0.05}$.]
$F_L \in \mathcal{D}(G_\gamma)$, $\gamma > 0$: Fréchet Domain, Heavy Tail!
S&P500 (Right tail)
[Figure: sample paths of the statistics T*, R* and W* (NPFA-test, Gt-test and HW-test), plotted against k = 5, …, 1200, applied to the S&P500 positive log-returns $X_i := \log(S_{i+1}/S_i)$; reference lines at the critical values $g_{0.975}$, $g_{0.025}$, $z_{0.975}$ and $z_{0.025}$.]
$F_X \in \mathcal{D}(G_0)$: Gumbel Domain, light/exponential Tail!
S&P500: left and right tails of stock index log-returns
The NPFA, HW and Gt testing procedures under the PORT approach yielded the sample path plots presented.
This analysis suggests the Fréchet and Gumbel domains of attraction for the left and right tails of the returns distribution, respectively.
This may be interpreted as follows: in this stock index, crashes are much more likely than large gains.
Main References
Neves, C., Picek, J. and Fraga Alves, M.I. (2006). Contribution of the maximum to the sum of excesses for testing max-domains of attraction. Journal of Statistical Planning and Inference (JSPI), 136(4), 1281-1301.
Neves, C. and Fraga Alves, M.I. (2006). Semi-parametric approach to Hasofer-Wang and Greenwood statistics in extremes. To appear in TEST.