Overview of Extreme Value Analysis (EVA)

Overview of Extreme Value Analysis (EVA) Brian Reich North Carolina State University July 26, 2016 Rossbypalooza Chicago, IL Brian Reich Overview of Extreme Value Analysis (EVA) 1 / 24

Transcript of Overview of Extreme Value Analysis (EVA)

Page 1: Overview of Extreme Value Analysis (EVA)

Overview of Extreme Value Analysis (EVA)

Brian Reich

North Carolina State University

July 26, 2016

Rossbypalooza, Chicago, IL

Page 2: Overview of Extreme Value Analysis (EVA)

Importance of extremes in climate research

- From heat waves to hurricanes, the environmental processes that are most critical to understand probabilistically are often extreme events
- There is a large literature on EVA
- There are some beautiful mathematical results
- The statistical methodology is unique
- There are many open statistical problems

Page 3: Overview of Extreme Value Analysis (EVA)

Common objectives in EVA

- Estimate the 1,000-year return level, i.e., the value that is exceeded on average once every 1,000 years
- Identify environmental covariates that drive extremes
- Test the hypothesis that the likelihood of an extreme event is changing over time
- Determine if two locations are asymptotically dependent
- Project the change in the 99.9th percentile in 2050

Page 4: Overview of Extreme Value Analysis (EVA)

Unique statistical challenges

- Most of our intuition and methodology are built around the mean and deviation from the mean
- In EVA, concepts like mean and variance are irrelevant because they don’t speak to the tail of the distribution
- Similarly, correlation isn’t the best measure of dependence because it is based on deviations around the means
- We need new ways to describe distributions and dependence between random variables

Page 5: Overview of Extreme Value Analysis (EVA)

Isolating the extremes

- The first step in classic EVA is to separate extreme observations from the bulk of the distribution
- For example, in a daily time series of precipitation in FL, the bulk and the tails correspond to very different weather regimes
  - Bulk: Thunderstorms
  - Tails: Hurricanes
- Mean regression focuses on thunderstorms and treats hurricanes as outliers
- If you want to estimate the 100-year storm, you should focus only on hurricanes
- Two common ways to isolate the extremes: block maxima and points above a threshold
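
The two ways of isolating extremes can be sketched in base R as follows. This is only an illustration: the daily series is simulated from a gamma distribution, and the names `wind`, `years`, and `thresh` are hypothetical stand-ins, not the data used in the talk.

```r
# Simulated stand-in for 10 years of daily wind speeds (hypothetical data)
set.seed(1)
years <- rep(2000:2009, each = 365)
wind  <- rgamma(length(years), shape = 2, scale = 8)

# Block maxima: one value per block (here, one per year)
annual_max <- tapply(wind, years, max)

# Points above a threshold: every observation that exceeds thresh
thresh <- 50
exceed <- wind[wind > thresh]

length(annual_max)  # number of blocks
length(exceed)      # number of exceedances retained
```

Block maxima always give exactly one value per block, while the number of exceedances depends entirely on where the threshold is set.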


Page 6: Overview of Extreme Value Analysis (EVA)

Block maxima (BM) in Cheeseboro
Block is a year and the block maximum is the annual maximum

[Figure: time series of hourly wind speed (0 to 80) against year (2000 to 2010)]

Page 7: Overview of Extreme Value Analysis (EVA)

Points above a threshold (POT) for Cheeseboro
The threshold is 50 and we analyze the points in red

[Figure: time series of hourly wind speed (0 to 80) against year (2000 to 2010), with the points above the threshold in red]

Page 8: Overview of Extreme Value Analysis (EVA)

Pros/cons of analyzing BM

Pros:
- Can invoke EVA theory and use a simple model
- It removes dependence within a block

Cons:
- Excludes some large values (e.g., the second highest each year)
- Must pick the block size: too big and you lose data; too small and you can’t use EVA theory

Page 9: Overview of Extreme Value Analysis (EVA)

Pros/cons of analyzing POT

Pros:
- Can invoke EVA theory and use a simple model
- Retains all large values in the analysis

Cons:
- Must deal with dependence within a block
- Setting the threshold is really difficult

Page 10: Overview of Extreme Value Analysis (EVA)

BM: normal distribution/central limit theorem

- Let $Y_1, ..., Y_n$ be the n independent and identically distributed values in a block (say n = 365 days in a year)
- Under certain conditions, for large n the sample (annual) mean
  $\bar{Y}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$
  is approximately normally distributed
- Holds with some forms of dependence and nonstationarity
- The underlying data $Y_i$ do not have to be Gaussian
- For example, the mean of 10 uniforms is ≈ normal
- So if you are analyzing data that are constructed as means, then a normal distribution is a good start
- You should still check the assumption of normality
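
The "mean of 10 uniforms" example can be checked with a short simulation. This is a minimal sketch in base R; the seed and the number of replicates (5000) are arbitrary choices.

```r
set.seed(2)
# 5000 sample means, each computed from 10 independent Uniform(0, 1) draws
means <- replicate(5000, mean(runif(10)))

# Theory: mean 1/2 and standard deviation sqrt((1/12) / 10)
c(sim_mean = mean(means), sim_sd = sd(means), theory_sd = sqrt(1 / 120))

# Points close to the reference line suggest approximate normality
qqnorm(means)
qqline(means)
```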


Page 11: Overview of Extreme Value Analysis (EVA)

BM: GEV distribution

- A similar result holds for the block maximum
  $\tilde{Y}_n = \max\{Y_1, ..., Y_n\}$
- Under certain conditions, for large n, $\tilde{Y}_n$ approximately follows the Generalized Extreme Value (GEV) distribution
- Holds with some forms of dependence and nonstationarity
- So if you are analyzing data that are constructed as, say, annual maxima, then the GEV is a good start
- You should still, of course, check the GEV fit
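
One textbook instance of this limit can be simulated directly. For iid Exponential(1) data, the block maximum minus log(n) converges to the standard Gumbel distribution, which is the ξ = 0 member of the GEV family. The sketch below uses base R only; the block size, seed, and helper name `pgumbel` are illustrative assumptions.

```r
set.seed(3)
n <- 365
# 2000 simulated "annual" maxima of n iid Exponential(1) values, centered by log(n)
maxima <- replicate(2000, max(rexp(n))) - log(n)

# Standard Gumbel distribution function (the xi = 0 case of the GEV)
pgumbel <- function(x) exp(-exp(-x))

# Compare empirical and limiting probabilities at a few points
x <- c(-1, 0, 1, 2, 3)
empirical <- sapply(x, function(v) mean(maxima <= v))
rbind(empirical = empirical, gumbel = pgumbel(x))
```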


Page 12: Overview of Extreme Value Analysis (EVA)

BM: GEV distribution

- The GEV has three parameters:
  - Location: µ
  - Scale: σ > 0
  - Shape: ξ
- The shape defines three special cases:
  - Weibull: ξ < 0 and the distribution is bounded above
  - Gumbel: ξ = 0 and the distribution is unbounded
  - Fréchet: ξ > 0 and the distribution is bounded below, with a heavy upper tail

Explore the shape of the distribution: http://teaching.stat.ncsu.edu/shiny/bjreich/GEV/
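
For reference, the three cases follow from the GEV distribution function. The form below is the standard parameterization (as in Coles, 2001), written in the slide's notation; it is assumed, not stated in the slides, that the talk uses this same parameterization.

```latex
G(y) = \exp\left\{ -\left[ 1 + \xi \, \frac{y - \mu}{\sigma} \right]_{+}^{-1/\xi} \right\}, \qquad \xi \neq 0,
\qquad\qquad
G(y) = \exp\left\{ -\exp\left( -\frac{y - \mu}{\sigma} \right) \right\}, \qquad \xi = 0,
```

where $[a]_{+} = \max(a, 0)$. The support has an endpoint at $\mu - \sigma/\xi$, which is an upper bound when ξ < 0 and a lower bound when ξ > 0.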


Page 13: Overview of Extreme Value Analysis (EVA)

BM: Fitting in R

- Say $Y_1, ..., Y_m \overset{iid}{\sim} \mathrm{GEV}(\mu, \sigma, \xi)$
- Estimates of the three GEV parameters and their standard errors can be obtained with the usual MLE
- The fgev function in the R package evd does this
- CLIMDEX example: http://www4.stat.ncsu.edu/~reich/Rossby/CLIMDEX_GEV.html
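
A minimal sketch of such a fit, assuming the evd package is installed. The data are simulated with `rgev` under made-up parameter values, so this is not the CLIMDEX analysis linked above.

```r
library(evd)  # assumes the evd package is installed

set.seed(4)
# Hypothetical annual maxima: 60 draws from a GEV with known parameters
y <- rgev(60, loc = 30, scale = 5, shape = 0.1)

fit <- fgev(y)   # maximum likelihood fit of (loc, scale, shape)
fit$estimate     # MLEs of mu, sigma and xi
fit$std.err      # their standard errors
```

With only 60 maxima the shape estimate is typically quite uncertain, which is worth keeping in mind when reading the standard errors.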


Page 14: Overview of Extreme Value Analysis (EVA)

BM: Model checking

- Just because data are block maxima doesn’t necessarily mean they fit the GEV perfectly
- Why?
- QQ-plots are a good diagnostic
- Kolmogorov-Smirnov (KS) goodness-of-fit tests can be constructed
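
Both diagnostics can be sketched as follows, assuming the evd package is installed and using simulated data with made-up parameters. Note that the KS p-value is only a rough guide here, because the GEV parameters were estimated from the same data being tested.

```r
library(evd)  # assumes the evd package is installed

set.seed(5)
y   <- rgev(60, loc = 30, scale = 5, shape = 0.1)  # hypothetical annual maxima
est <- fgev(y)$estimate
mu  <- est[["loc"]]
sig <- est[["scale"]]
xi  <- est[["shape"]]

# QQ-plot: fitted GEV quantiles against the ordered data
theo <- qgev(ppoints(length(y)), loc = mu, scale = sig, shape = xi)
plot(theo, sort(y), xlab = "Fitted GEV quantile", ylab = "Observed maximum")
abline(0, 1)  # points near this line indicate a good fit

# KS test of the data against the fitted GEV distribution function
ks <- ks.test(y, "pgev", loc = mu, scale = sig, shape = xi)
ks$p.value
```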


Page 15: Overview of Extreme Value Analysis (EVA)

BM: Return levels

- Say the data are annual maxima
- The n-year return level is the value exceeded on average once every n years, i.e., with probability 1/n in any given year
- This is the 1 − 1/n quantile of the GEV distribution
- In R, the n-year return level is
  RLn = qgev(1-1/n, mu.est, sigma.est, xi.est)
- Standard errors account for uncertainty in the GEV parameters, and can be found using the delta method
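
The 1 − 1/n quantile has a closed form under the standard GEV parameterization, which is sketched below in base R. The function name `gev_return_level` and the parameter values are made up for illustration; in practice the estimates would come from a fitted model.

```r
# n-year return level from GEV parameters: the 1 - 1/n quantile
gev_return_level <- function(n, mu, sigma, xi) {
  yp <- -log(1 - 1 / n)               # -log of the non-exceedance probability
  if (abs(xi) < 1e-8) {
    mu - sigma * log(yp)              # Gumbel limit (xi = 0)
  } else {
    mu + sigma * (yp^(-xi) - 1) / xi
  }
}

# Illustrative (made-up) parameter values
gev_return_level(c(10, 100, 1000), mu = 30, sigma = 5, xi = 0.1)
```

Return levels grow with n, and they grow faster the larger ξ is, which is why the shape estimate dominates the uncertainty at long return periods.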


Page 16: Overview of Extreme Value Analysis (EVA)

BM: non-stationarity

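- Until now we have assumed the distribution is stationary, i.e., constant over time
- However, the GEV parameters (usually µ, but sometimes σ and ξ) can vary with time
- Linear GEV location: $Y_t \sim \mathrm{GEV}(\beta_0 + t\beta_1, \sigma, \xi)$
- Add a covariate: $Y_t \sim \mathrm{GEV}(\beta_0 + t\beta_1 + X_t\beta_2, \sigma, \xi)$
- Linear GEV location and log-linear scale: $Y_t \sim \mathrm{GEV}[\beta_0 + t\beta_1, \exp(\alpha_0 + t\alpha_1), \xi]$
- These models can be fit using MLE: fgev in evd handles the linear location models through its nsloc argument, while a time-varying scale needs a more general routine such as fevd in extRemes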
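
A sketch of the linear-location model using the `nsloc` argument of `fgev`, assuming the evd package is installed. The simulated trend of 1.5 units per decade and the centering of time are illustrative choices, not values from the talk.

```r
library(evd)  # assumes the evd package is installed

set.seed(6)
t_dec <- (1:60 - 30.5) / 10   # time in decades, centered (hypothetical scaling)
# Hypothetical annual maxima whose location rises by 1.5 units per decade
y <- rgev(60, loc = 30 + 1.5 * t_dec, scale = 4, shape = 0.05)

# Linear location model: loc_t = beta0 + beta1 * t
fit <- fgev(y, nsloc = data.frame(trend = t_dec))
fit$estimate   # "loc" is beta0 and "loctrend" is beta1
fit$std.err
```

Centering and rescaling the time covariate, as done here, tends to make the optimization better behaved than using raw calendar years.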


Page 17: Overview of Extreme Value Analysis (EVA)

POT: GPD distribution

- A POT analysis begins by selecting a threshold T that separates the bulk from the extremes
- The dataset for analysis then becomes only the values that exceed T
- For most distributions, for large enough T the tail of the distribution above T is well approximated by the Generalized Pareto Distribution (GPD)
- Picking the threshold too low leads to bias because the GPD doesn’t fit well
- Picking the threshold too high leads to high variance because the number of observations is small

Page 18: Overview of Extreme Value Analysis (EVA)

POT: GPD distribution

- The GPD has three parameters:
  - Location/lower bound/threshold: T
  - Scale: σ > 0
  - Shape: ξ
- The shape defines three special cases:
  - ξ < 0 and the distribution is bounded above
  - ξ = 0 and the distribution is exponential (unbounded)
  - ξ > 0 and the distribution is unbounded, with a heavy tail

Explore the shape of the distribution: http://teaching.stat.ncsu.edu/shiny/bjreich/GPD/

Page 19: Overview of Extreme Value Analysis (EVA)

POT: Fitting in R

- Say we set the threshold at T and $Y_1, ..., Y_{m_T}$ are the $m_T$ observations above T
- The model is $Y_1, ..., Y_{m_T} \overset{iid}{\sim} \mathrm{GPD}(T, \sigma_T, \xi_T)$
- Estimates of the two GPD parameters $\sigma_T$ and $\xi_T$ and their standard errors can be obtained with the usual MLE
- The fpot function in the R package evd does this
- CLIMDEX example: http://www4.stat.ncsu.edu/~reich/Rossby/CLIMDEX_GPD.html
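
A minimal sketch of a GPD fit with `fpot`, assuming the evd package is installed. The daily series is simulated from a gamma distribution, and the 97th-percentile threshold is an arbitrary illustrative choice, not a recommendation.

```r
library(evd)  # assumes the evd package is installed

set.seed(7)
# Hypothetical daily series: 30 years of gamma-distributed values
daily  <- rgamma(365 * 30, shape = 2, scale = 10)
thresh <- unname(quantile(daily, 0.97))   # illustrative threshold choice

fit <- fpot(daily, threshold = thresh)    # GPD fit to the exceedances
fit$estimate   # MLEs of sigma_T ("scale") and xi_T ("shape")
fit$std.err

# Empirical probability of being at or below the threshold
p.T <- mean(daily <= thresh)
```

The quantity `p.T` is not a GPD parameter, but it is needed later to convert the fitted tail into return levels for the daily data.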


Page 20: Overview of Extreme Value Analysis (EVA)

POT: Model checking

- The biggest challenge is picking the threshold T
- For the GPD with ξ < 1, for any u > T the mean residual life is
  $E(Y - u \mid Y > u) = \frac{\sigma}{1-\xi} + \frac{\xi}{1-\xi}(u - T)$,
  which is linear in u
- The mean residual life (MRL) plot plots the sample mean estimate of E(Y − u | Y > u) versus u
- The smallest u above which the MRL plot is linear is a reasonable threshold choice
- In R/evd this is the mrlplot function
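
The quantity behind the MRL plot is easy to compute by hand. The base R sketch below is a stand-in for `mrlplot` and omits its confidence bands; the helper name `mean_excess` and the simulated exponential data are illustrative assumptions.

```r
# Sample mean excess E(Y - u | Y > u) over a grid of candidate thresholds
mean_excess <- function(y, u) {
  sapply(u, function(ui) mean(y[y > ui] - ui))
}

set.seed(8)
y   <- rexp(20000, rate = 0.5)   # hypothetical data; exponential means xi = 0
u   <- seq(0, 4, by = 0.5)
mrl <- mean_excess(y, u)

# For exponential data the curve should be roughly flat at 1 / rate = 2
plot(u, mrl, type = "b", xlab = "Threshold u", ylab = "Mean excess")
```

In real data the curve becomes noisy at high thresholds because few observations remain, so linearity is usually judged over the middle of the range.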


Page 21: Overview of Extreme Value Analysis (EVA)

POT: Return levels

- Now the data are daily data
- The n-year return level is the value exceeded on average once every n years, i.e., once every 365n days, so the daily exceedance probability is 1/(365n)
- Let $p_T$ be the probability of being below the threshold
- On a given day the probability of being below u > T is $p_T + (1 - p_T) F_{GPD}(u)$
- Setting this equal to 1 − 1/(365n) gives $F_{GPD}(u) = 1 - \frac{1}{365 n (1 - p_T)}$
- In R, the n-year return level is
  q = 1 - 1/(365*n*(1-p.T))
  RLn = qgpd(q, thresh, sigma.est, xi.est)
- Standard errors account for uncertainty in the GPD parameters, and can be found using the delta method
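
Solving $p_T + (1 - p_T) F_{GPD}(u) = 1 - 1/(365n)$ for u gives a closed form under the standard GPD parameterization, sketched here in base R. The function names, the `days` argument, and all parameter values are made up for illustration.

```r
# n-year return level for daily data under a GPD tail above thresh.
# p.T is the probability that a day falls at or below the threshold.
pot_return_level <- function(n, thresh, sigma, xi, p.T, days = 365) {
  m <- days * n * (1 - p.T)   # expected number of exceedances in n years
  if (abs(xi) < 1e-8) {
    thresh + sigma * log(m)              # exponential limit (xi = 0)
  } else {
    thresh + sigma * (m^xi - 1) / xi
  }
}

# GPD distribution function (xi != 0), used only to check the result
gpd_cdf <- function(u, thresh, sigma, xi) {
  1 - (1 + xi * (u - thresh) / sigma)^(-1 / xi)
}

# Illustrative (made-up) values
rl <- pot_return_level(100, thresh = 50, sigma = 8, xi = 0.1, p.T = 0.98)
rl
# Daily probability of exceeding rl; should match 1 / (365 * 100)
1 - (0.98 + 0.02 * gpd_cdf(rl, 50, 8, 0.1))
```

The formula only makes sense when at least one exceedance is expected in n years, i.e., when 365 n (1 − p.T) is greater than 1.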


Page 22: Overview of Extreme Value Analysis (EVA)

POT: non-stationarity

- As with the GEV, the GPD parameters can vary over space and time following covariates
- Allowing the threshold to vary with covariates is probably a good idea, but really tricky
- Another departure from the simple model assumptions is serial dependence in the daily data
- This can be handled by declustering
- For example, if 5 consecutive days exceed the threshold then only the largest is retained
- Declustering is implemented in fpot (argument cmax = TRUE)
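
The simplest form of declustering can be written in a few lines of base R. In this sketch any single day at or below the threshold ends a cluster, which is an assumption of the example; the helper name `decluster_max` is hypothetical, and this is not the implementation used by `fpot`.

```r
# Runs declustering: consecutive exceedances form one cluster,
# and only the cluster maximum is kept
decluster_max <- function(y, thresh) {
  above <- y > thresh
  runs  <- rle(above)
  # Label every observation with the index of the run it belongs to
  run_id <- rep(seq_along(runs$lengths), runs$lengths)
  as.vector(tapply(y[above], run_id[above], max))
}

# Toy series: days 2 to 6 form one cluster, day 9 forms another
y <- c(1, 6, 7, 9, 8, 6, 2, 3, 7, 1)
decluster_max(y, thresh = 5)   # keeps 9 and 7
```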


Page 23: Overview of Extreme Value Analysis (EVA)

Other extensions

- Multivariate extremes
- Time series of extremes and heat waves
- Spatial extremes
- Detection and attribution
- Methods to handle large n
- Many more!

Page 24: Overview of Extreme Value Analysis (EVA)

Resources

- Book on applied EVA: Coles (2001)
- Book on theory: de Haan and Ferreira (2006)
- Book on recent methods: Dey and Yan (2016)
- More computing in R: evd; extRemes; SpatialExtremes
- My info: http://www4.stat.ncsu.edu/~reich/