
Bayesian model mergings for multivariate extremes

Application to regional predetermination of floods with incomplete data

Anne Sabourin

PhD defense, University of Lyon 1

Committee: Anthony Davison, Johan Segers (Rapporteurs),
Anne-Laure Fougères (Director), Philippe Naveau (Co-director),
Clémentine Prieur, Stéphane Robin, Éric Sauquet.

September 24th, 2013

Multivariate extreme values

I Risk management: largest events, largest losses

I Hydrology: `flood predetermination'.
I Return levels (extreme quantiles)
I Return periods (1 / probability of occurrence)

→ dikes, dams, land-use plans.

I Simultaneous occurrence of rare events can be catastrophic

I Multivariate extremes:

Probability of jointly extreme events?

Censored multivariate extremes: floods in the `Gardons'
(joint work with Benjamin Renard)

I Daily streamflow data at 4 neighbouring sites:
St Jean du Gard, Mialet, Anduze, Alès.

I Joint distributions of extremes?
→ probability of simultaneous floods.
I Recent, `clean' series are very short
I Historical data from archives, depending on `perception thresholds' for floods (earliest: 1604). → censored data

Gard river, Neppel et al. (2010)

How to use all the different kinds of data?

Multivariate extremes for regional analysis in hydrology

I Many sites, many parameters for the marginal distributions, short observation period.

I `Regional analysis': replace time with space. Assume some parameters are constant over the region and use extreme data from all sites.

I Independence between extremes at neighbouring sites? Dependence structure?

I Idea: use multivariate extreme value models

Outline

Multivariate extremes and model uncertainty

Bayesian model averaging (`Mélange de modèles')

Dirichlet mixture model (`Modèle de mélange'):a re-parametrization

Historical, censored data in the Dirichlet model


Multivariate extremes

I Random vectors Y = (Y1, . . . , Yd), Yj ≥ 0
I Margins: Yj ∼ Fj, 1 ≤ j ≤ d (Generalized Pareto above large thresholds)

I Standardization (→ unit Fréchet margins):

Xj = −1 / log[Fj(Yj)] ;  P(Xj ≤ x) = e^(−1/x), 1 ≤ j ≤ d

I Joint behaviour of extremes: distribution of X above large thresholds?

P(X ∈ A | X ∈ A0)?  (A ⊂ A0, 0 ∉ A0), A0 `far from the origin'.

[Figure: extremal region A0 beyond thresholds (u1, u2) in the (X1, X2) plane, with A ⊂ A0 and a point X.]
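The standardization to unit Fréchet margins is a one-line transform once the margins are known; a minimal sketch (the exponential toy margin and the helper name are ours, not from the talk):

```python
import numpy as np

def to_unit_frechet(y, cdf):
    """Standardize a margin to unit Frechet: X = -1 / log F(Y)."""
    f = cdf(y)
    # F(Y) must lie strictly in (0, 1) for the transform to be finite.
    f = np.clip(f, 1e-12, 1 - 1e-12)
    return -1.0 / np.log(f)

# Toy margin: exponential with rate 1, so F(y) = 1 - exp(-y).
rng = np.random.default_rng(0)
y = rng.exponential(size=100_000)
x = to_unit_frechet(y, lambda v: 1.0 - np.exp(-v))

# Check the target law P(X <= x) = exp(-1/x) at x = 2.
print(np.mean(x <= 2.0))  # close to exp(-1/2) ~ 0.607
```

In practice the slides fit Generalized Pareto tails above a high threshold instead of using a known parametric CDF; the transform itself is unchanged.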

Polar decomposition and angular measure

I Polar coordinates: R = Σ_{j=1}^d Xj (L1 norm) ;  W = X / R.

I W ∈ simplex Sd = {w : wj ≥ 0, Σj wj = 1}.
I Angular probability measure:

H(B) = P(W ∈ B)  (B ⊂ Sd).

[Figure: a point X decomposed into its radius R and its angle W on the simplex Sd; B is a subset of the simplex.]
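The polar decomposition is a two-line computation once the data are standardized; a sketch on toy Fréchet data (the 95% radial threshold is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy standardized sample on the Frechet scale (d = 3): X = -1/log(U) per margin.
u = rng.uniform(size=(50_000, 3))
x = -1.0 / np.log(u)

r = x.sum(axis=1)        # radius R: L1 norm (coordinates are positive)
w = x / r[:, None]       # angle W: a point on the simplex S_d

# Every angular point sums to 1, i.e. lies on the simplex.
print(np.allclose(w.sum(axis=1), 1.0))  # True

# Empirical angular sample: keep the angles of the largest radii.
keep = r > np.quantile(r, 0.95)
w_extreme = w[keep]
print(w_extreme.shape[0])  # about 5% of the sample
```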

Fundamental result (de Haan, Resnick, 1970s–80s)

I Radial homogeneity (regular variation):

P(R > r, W ∈ B | R ≥ r0) ∼_{r0→∞} (r0 / r) H(B)   (r = c r0, c > 1)

I Above large radial thresholds, R is independent of W
I H (+ margins) entirely determines the joint distribution

[Figure: two scatter plots of standardized data in the (X1, X2) plane, illustrating the decoupling of radius and angle above a large radial threshold.]

I One condition only for a genuine H: the moments constraint

∫ w dH(w) = (1/d, . . . , 1/d).

Center of mass at the center of the simplex.
I Few constraints: non-parametric family!
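The moments constraint can be checked empirically: for independent unit-Fréchet margins the limiting angular measure puts mass 1/d on each vertex of the simplex, so the angular center of mass of large-radius points should sit near (1/d, . . . , 1/d). A sketch (the sample size and threshold are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 200_000
# Independent unit-Frechet margins: the limiting angular measure puts
# mass 1/d on each vertex of the simplex, so its mean is (1/d, ..., 1/d).
x = -1.0 / np.log(rng.uniform(size=(n, d)))

r = x.sum(axis=1)
w = x / r[:, None]
w_extreme = w[r > np.quantile(r, 0.99)]

# The moments constraint: the angular center of mass sits at the
# center of the simplex.
print(w_extreme.mean(axis=0))  # each coordinate close to 1/3
```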

Estimating the angular measure: a non-parametric problem

I Non-parametric estimation (empirical likelihood: Einmahl et al., 2001; Einmahl & Segers, 2009; Guillotte et al., 2011). No explicit expression for the asymptotic variance; Bayesian inference with d = 2 only.

I Restriction to a parametric family: Gumbel, logistic, pairwise Beta, . . . (Coles & Tawn, 1991; Cooley et al., 2010; Ballani & Schlather, 2011):

How to take model uncertainty into account?

Outline

Multivariate extremes and model uncertainty

Bayesian model averaging (`Mélange de modèles')

Dirichlet mixture model (`Modèle de mélange'):a re-parametrization

Historical, censored data in the Dirichlet model


BMA: averaging estimates from different models

Disjoint union of several parametric models

I Sabourin, Naveau, Fougères, 2013 (Extremes)

I R package `BMAmevt', available on CRAN (http://cran.r-project.org/).

Outline

Multivariate extremes and model uncertainty

Bayesian model averaging (`Mélange de modèles')

Dirichlet mixture model (`Modèle de mélange'):a re-parametrization

Historical, censored data in the Dirichlet model


Dirichlet distribution

For all w in the interior of Sd,

diri(w | µ, ν) = Γ(ν) / ∏_{i=1}^d Γ(ν µi) · ∏_{i=1}^d wi^{ν µi − 1}.

I µ in the interior of Sd: location parameter (point on the simplex): `center';
I ν > 0: concentration parameter.

[Figure: Dirichlet density on the three-dimensional simplex (w1, w2, w3); example: µ = (0.15, 0.35, 0.5), ν = 9.]
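In this (µ, ν) parametrization the usual Dirichlet parameter vector is α = ν µ; a minimal sketch of the density (the function name is ours):

```python
import numpy as np
from math import lgamma

def diri_density(w, mu, nu):
    """Dirichlet density on the simplex, parametrized by center mu and
    concentration nu (so the usual Dirichlet parameters are nu * mu)."""
    w, mu = np.asarray(w, float), np.asarray(mu, float)
    log_norm = lgamma(nu) - sum(lgamma(nu * m) for m in mu)
    return float(np.exp(log_norm + np.sum((nu * mu - 1.0) * np.log(w))))

mu, nu = np.array([0.15, 0.35, 0.5]), 9.0
print(diri_density([1/3, 1/3, 1/3], mu, nu))

# Sanity check: the (mu, nu) parametrization matches numpy's Dirichlet
# sampler with alpha = nu * mu, whose mean is mu.
rng = np.random.default_rng(3)
sample = rng.dirichlet(nu * mu, size=100_000)
print(sample.mean(axis=0))  # close to mu
```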

[Figure: the same Dirichlet density shown with µ = (1/3, 1/3, 1/3): the moments constraint is satisfied, so the corresponding angular measure H is valid.]

Dirichlet mixture model (Boldi, Davison, 2007)

I µ = µ·,1:k, ν = ν1:k, p = p1:k, ψ = (µ, p, ν),

h_ψ(w) = Σ_{m=1}^k pm diri(w | µ·,m, νm)

I Moments constraint → on (µ, p):

Σ_{m=1}^k pm µ·,m = (1/d, . . . , 1/d).

Weakly dense family (k ∈ N) in the space of admissible angular measures.
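A sketch of the mixture density and of the moments constraint on (µ, p), with hand-picked parameters chosen so the constraint holds (k = 2, d = 3; all names and values are illustrative):

```python
import numpy as np
from math import lgamma

def diri(w, mu, nu):
    # Dirichlet density with center mu (on the simplex) and concentration nu.
    return np.exp(lgamma(nu) - np.sum([lgamma(nu * m) for m in mu])
                  + np.sum((nu * mu - 1.0) * np.log(w)))

def dm_density(w, p, mu, nu):
    """Dirichlet mixture angular density h_psi(w) = sum_m p_m diri(w | mu_m, nu_m)."""
    return float(sum(pm * diri(w, mum, num) for pm, mum, num in zip(p, mu, nu)))

p = np.array([0.5, 0.5])
# Second center chosen as 2*(1/3) - mu_1 so the weighted centers average
# to the simplex center.
mu = np.array([[0.5, 0.3, 0.2],
               [1/6, 11/30, 7/15]])
nu = np.array([9.0, 15.0])

# Moments constraint: the mixture of centers is the simplex center.
print((p[:, None] * mu).sum(axis=0))  # (1/3, 1/3, 1/3)
print(dm_density(np.array([0.4, 0.3, 0.3]), p, mu, nu))
```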

Bayesian inference and censored data

I Two issues: (i) parameter constraints, (ii) censoring

(i) Bayesian framework: MCMC methods to sample the posterior distribution. Constraints ⇒ sampling issues for d > 2.

I Re-parametrization: no more constraint, fitting is manageable for d = 5: Sabourin, Naveau, 2013

(ii) Censoring: data ≠ points but segments or boxes in R^d.
I Intervals overlapping the threshold: extreme or not?

I Likelihood: density (d / r²) dH(w) integrated over boxes.

I Sabourin, under review; Sabourin, Renard, in preparation


Re-parametrization: intermediate variables (γ1, . . . , γ_{k−1}), partial barycenters

(example: k = 4, starting from the simplex center γ0)

γm: barycenter of the kernels `following' µ·,m, i.e. µ·,m+1, . . . , µ·,k:

γm = (Σ_{j>m} pj)^{−1} Σ_{j>m} pj µ·,j

γ1 on a line segment: eccentricity parameter e1 ∈ (0, 1).

(example: k = 4)

Draw (µ·,1 ∈ Sd, e1 ∈ (0, 1)) → γ1 defined by |γ0 γ1| / |γ0 I1| = e1 ;  → p1 = |γ0 γ1| / |µ·,1 γ1|.

[Figure: γ1 lies on the segment from γ0 to I1, the exit point of the ray from µ·,1 through γ0.]

γ2 on a line segment: eccentricity parameter e2 ∈ (0, 1).

(example: k = 4)

Draw (µ·,2, e2) → γ2 defined by |γ1 γ2| / |γ1 I2| = e2  → p2

[Figure: the same construction one step further, involving µ·,1, µ·,2, γ1 and γ2.]

Last density kernel = last center µ·,k.

(example: k = 4)

Draw (µ·,3, e3) → γ3 → p3, then µ·,4 = γ3 → p4.

[Figure: full construction with centers µ·,1, . . . , µ·,4 and barycenters γ0, . . . , γ3 on the simplex.]

Summary

[Figure: the full construction on the simplex.]

I Given (µ·,1:k−1, e1:k−1), one obtains (µ·,1:k, p1:k).

I The density h may thus be parametrized by

θ = (µ·,1:k−1, e1:k−1, ν1:k).
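The construction above can be sketched as code: starting from γ0 = (1/d, . . . , 1/d), each step places γm on the segment [γ_{m−1}, Im] at relative distance em and reads off pm from the distance ratio, so the moments constraint holds by construction. This is our reading of the slides, not the authors' implementation; function name and inputs are ours:

```python
import numpy as np

def build_mixture(mu_part, ecc):
    """Map (mu_{.,1:k-1}, e_{1:k-1}) to a constrained pair (mu_{.,1:k}, p_{1:k})."""
    d = mu_part.shape[1]
    gamma = np.full(d, 1.0 / d)     # gamma_0: center of the simplex
    mass = 1.0                      # mass not yet allocated
    centers, weights = [], []
    for mu_m, e_m in zip(mu_part, ecc):
        direction = gamma - mu_m
        # Largest step t keeping gamma + t * direction inside the simplex:
        # this locates the exit point I_m of the ray from mu_m through gamma.
        neg = direction < 0
        t_max = np.min(gamma[neg] / -direction[neg])
        gamma_next = gamma + e_m * t_max * direction
        # gamma = (p_m / mass) mu_m + (1 - p_m / mass) gamma_next, so p_m
        # follows from the distance ratio |gamma gamma_next| / |mu_m gamma_next|.
        r = np.linalg.norm(gamma - gamma_next) / np.linalg.norm(mu_m - gamma_next)
        p_m = mass * r
        centers.append(mu_m); weights.append(p_m)
        mass -= p_m
        gamma = gamma_next
    centers.append(gamma); weights.append(mass)   # last center = gamma_{k-1}
    return np.array(centers), np.array(weights)

mu_part = np.array([[0.6, 0.2, 0.2],
                    [0.1, 0.7, 0.2],
                    [0.3, 0.1, 0.6]])
ecc = np.array([0.4, 0.5, 0.3])
centers, p = build_mixture(mu_part, ecc)

# The construction enforces the moments constraint automatically:
print((p[:, None] * centers).sum(axis=0))  # (1/3, 1/3, 1/3)
print(p.sum())                             # 1.0
```

The point of the exercise: (µ·,1:k−1, e1:k−1) ranges freely over (Sd)^{k−1} × (0, 1)^{k−1}, while the output always satisfies the constraint.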

Bayesian model

I Unconstrained parameter space: union of product spaces (`rectangles')

Θ = ⊔_{k=1}^∞ Θk ;  Θk = (Sd)^{k−1} × [0, 1)^{k−1} × (0, ∞)^k

I Inference: Gibbs + reversible jumps.

I Restrictions (numerical convenience): k ≤ 15, ν < νmax, etc.

I `Reasonable' prior ≈ `flat' and rotation invariant: balanced weights and uniformly scattered centers.

MCMC sampling: Metropolis-within-Gibbs, reversible jumps.

Three transition types for the Markov chain:

I Classical (Gibbs): one µ·,m, em or νm is modified.

I Trans-dimensional (Green, 1995): one component (µ·,k, ek, νk+1) is added or deleted.

I `Shuffle': index permutation of the original mixture: re-allocating mass from old components to new ones.

Results: consistency of the model and the algorithm
I Ergodicity: the generated Markov chain is φ-irreducible, aperiodic and admits πn (= posterior | W1:n) as invariant distribution.
I Consequence:

∀g ∈ Cb(Θ),  (1/T) Σ_{t=1}^T g(θt) → E_{πn}(g).

I Key point: πn is invariant under the `shuffle' moves.

I Posterior consistency for πn under `weak conditions'²: π-a.s., for every weakly open U containing θ0,

πn(U) → 1 as n → ∞.

I Consequence: E_{πn}(g) → g(θ0) as n → ∞.

I Key: the Euclidean topology is finer than the Kullback topology in this model.

² If the prior grants some mass to every Euclidean neighbourhood of θ0 and if θ0 is in the Kullback-Leibler closure of Θ.

Convergence checking (simulated data, d = 5, k = 4)
I θ summarized by a scalar quantity (integrating the DM density against a test function)

[Figure: trace plots for the original algorithm vs. the re-parametrized version.]

I Standard tests:
I Stationarity (Heidelberger & Welch, 1983)
I Variance ratio (inter/intra chains, Gelman, 1992)

Estimating H in dimension 3, simulated data
MCMC: T2 = 50 × 10³; T1 = 25 × 10³.

[Figure: true vs. estimated angular measure density on the simplex (w1, w2, w3).]

Dimension 5, simulated data

Angular measure density for one pair (T2 = 200 × 10³, T1 = 80 × 10³).

[Figure: posterior angular density of X2/(X2 + X5), original vs. re-parametrized version.]

Gelman ratio: original version: 2.18; re-parametrized: 1.07.

Credibility sets (posterior quantiles): wider.


Outline

Multivariate extremes and model uncertainty

Bayesian model averaging (`Mélange de modèles')

Dirichlet mixture model (`Modèle de mélange'):a re-parametrization

Historical, censored data in the Dirichlet model


Censored data: univariate and pairwise plots

Univariate time series:

[Figure: daily streamflow series (discharge, m³/s) at St Jean, Mialet, Anduze and Alès over 1604–1989, with perception thresholds u; historical floods appear as censored points.]

Bivariate plots:

[Figure: pairwise streamflow plots (m³/s): St Jean vs. Mialet, St Jean vs. Anduze, St Jean vs. Alès.]

Using censored data: wishes and reality

I Take into account as many data as possible
→ censored likelihood, integration problems

I Information transfer from well-gauged to poorly gauged sites using the dependence structure
→ estimate the marginal parameters and the dependence together

Data overlapping the threshold and Poisson model

How to include the rectangles overlapping the threshold in the likelihood?

{(t/n, Xt/n), 1 ≤ t ≤ n} ∼ Poisson process (Leb × λ) on [0, 1] × A_{u,n}

λ: `exponent measure', with Dirichlet mixture angular component:

dλ / (dr × dw)(r, w) = (d / r²) h(w).

Overlapping events appear in the Poisson likelihood as

P[ N{ ((t1/n, t2/n)) × (1/n) Ai } = 0 ] = exp[−(t2 − t1) λ(Ai)]

`Censored' likelihood: model density integrated over boxes
I Ledford & Tawn, 1996: partially extreme data censored at the threshold,
I GEV models
I Explicit expression for the censored likelihood.

I Here: idem + natural censoring
I Poisson model
I No closed-form expression for the integrated likelihood.

I Two terms without closed form:
I Censored regions Ai overlapping the threshold:
exp{−(t2 − t1) λ(Ai)}
I Classical censoring above the threshold:
∫_{censored region} . . . dx.

Data augmentation

One more Gibbs step, no more numerical integration.

I Objective: sample [θ | Obs] ∝ likelihood(censored obs)

I Additional variables (replacing the missing data components): Z

I Full conditionals [Zi | Z_{j≠i}, θ, Obs], [θ | Z, Obs], . . . are explicit (thanks to Dirichlet properties) → Gibbs sampling.

I Sample the augmented distribution [z, θ | Obs] on Θ × Z.
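The data-augmentation idea (Tanner & Wong, 1987) is easiest to see on a much simpler toy problem than the Dirichlet model: a normal mean with right-censored observations, where imputing the censored values in an extra Gibbs step keeps every full conditional explicit. A sketch under these assumptions (all names and settings are ours):

```python
import numpy as np

# Estimate the mean of a N(theta, 1) sample when values above a threshold
# u are right-censored. Step (1) imputes the censored values given theta;
# step (2) is then a standard conjugate update on the completed data.
rng = np.random.default_rng(4)
theta_true, u, n = 1.0, 1.5, 500
y = rng.normal(theta_true, 1.0, size=n)
censored = y > u
obs = y[~censored]

def sample_truncated_normal(mean, lower, size, rng):
    # Rejection sampling from N(mean, 1) restricted to (lower, inf);
    # fine here because the truncation is mild.
    out = np.empty(0)
    while out.size < size:
        draw = rng.normal(mean, 1.0, size=4 * size)
        out = np.concatenate([out, draw[draw > lower]])
    return out[:size]

theta, n_cens = 0.0, int(censored.sum())
draws = []
for it in range(3000):
    # (1) augmentation: impute the censored values given theta
    z = sample_truncated_normal(theta, u, n_cens, rng)
    # (2) conjugate update: flat prior => theta | z, obs ~ N(mean, 1/n)
    full_mean = (obs.sum() + z.sum()) / n
    theta = rng.normal(full_mean, 1.0 / np.sqrt(n))
    if it >= 500:
        draws.append(theta)

print(np.mean(draws))  # posterior mean close to theta_true
```

In the talk the same pattern is used with Dirichlet-mixture conditionals instead of truncated normals, and with Poisson-process augmentation for the regions overlapping the threshold.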

Censored regions above the threshold

∫_{censored region} dx_{j1:jr}:

Generate the missing components under the univariate conditional distributions:

Z_{j1:r} ∼ [X_missing | X_obs, θ]

[Figure: a censored interval inside the extremal region, beyond thresholds (u1/n, u2/n); augmentation data Zj drawn from [X_censored | X_observed, θ].]

Dirichlet ⇒ explicit univariate conditionals:
exact sampling of the censored data on the censored interval.

Censored regions overlapping the threshold

e^{−(t2,i − t1,i) λ(Ai)} ⇔ augmentation Poisson process Ni on Ei ⊃ Ai, plus a functional ϕ(Ni).

[Figure: censored region Ai inside a larger box Ei beyond thresholds (u1/n, u2/n); augmentation Zi = PP(τ · λ) on Ei, with ϕ(#{points in Ai}).]

[z, θ | Obs] ∝ (density terms, prior, augmented missing components) × [Ni] ϕ(Ni)

Simulated data (Dirichlet, d = 4, k = 3 components), same censoring as the real data

Pairwise plot and angular measure density (true / posterior predictive):

[Figure: pairwise plot of (S3, S4) and angular density of X3/(X3 + X4), true vs. posterior predictive.]

Simulated data (Dirichlet, d = 4, k = 3 components), same censoring as the real data

Marginal quantile curves: better in the joint model.

[Figure: return level curves at S3 (discharge vs. return period, log scale): dependent fit vs. independent fit vs. truth.]

Angular predictive density for the Gardons data

[Figure: pairwise angular predictive densities h for the six pairs: St Jean/Mialet, St Jean/Anduze, St Jean/Alès, Mialet/Anduze, Mialet/Alès, Anduze/Alès.]

Conditional exceedance probability

[Figure: conditional exceedance probability curves P(Mialet > y | St Jean > 300), P(Anduze > y | St Jean > 300), P(Ales > y | St Jean > 300), P(Anduze > y | Mialet > 320), P(Ales > y | Mialet > 320) and P(Ales > y | Anduze > 520), each showing the empirical probability of an excess with empirical error bars, the posterior predictive and posterior quantiles.]

Conclusion

I Building Bayesian multivariate models for excesses:
I Dirichlet mixture family: `non'-parametric, Bayesian inference possible up to a re-parametrization
I Censoring → data augmentation (Dirichlet conditioning properties)

I Two R packages:
I DiriXtremes, MCMC algorithm for Dirichlet mixtures,
I DiriCens, implementation with censored data.

I High-dimensional sample space (GCM grid, spatial fields)?
I Impose a reasonable (sparse) structure on the Dirichlet parameters
I Dirichlet process? Challenges:
discrete random measure ≠ continuous framework

Bibliography I

M.-O. Boldi and A. C. Davison. A mixture model for multivariate extremes. JRSS: Series B (Statistical Methodology), 69(2):217–229, 2007.

S. G. Coles and J. A. Tawn. Modeling extreme multivariate events. JR Statist. Soc. B, 53:377–392, 1991.

G. Gómez, M. L. Calle, and R. Oller. Frequentist and Bayesian approaches for interval-censored data. Statistical Papers, 45(2):139–173, 2004.

J. R. M. Hosking and J. R. Wallis. Regional frequency analysis: an approach based on L-moments. Cambridge University Press, 2005.

A. Ledford and J. Tawn. Statistics for near independence in multivariate extreme values. Biometrika, 83(1):169–187, 1996.

L. Neppel, B. Renard, M. Lang, P. Ayral, D. Coeur, E. Gaume, N. Jacob, O. Payrastre, K. Pobanz, and F. Vinet. Flood frequency analysis using historical data: accounting for random and systematic errors. Hydrological Sciences Journal, 55(2):192–208, 2010.

S. Resnick. Extreme values, regular variation, and point processes, volume 4 of Applied Probability. Springer-Verlag, New York, 1987.

Bibliography II

A. Sabourin, P. Naveau, and A.-L. Fougères. Bayesian model averaging for multivariate extremes. Extremes, 16(3):325–350, 2013.

A. Sabourin and P. Naveau. Bayesian Dirichlet mixture model for multivariate extremes: a re-parametrization. Computational Statistics and Data Analysis, 2013.

W. Schnedler. Likelihood estimation for censored random vectors. Econometric Reviews, 24(2):195–217, 2005.

M. Tanner and W. Wong. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398):528–540, 1987.

D. Van Dyk and X. Meng. The art of data augmentation. Journal of Computational and Graphical Statistics, 10(1):1–50, 2001.

Outline

Multivariate extremes and model uncertainty

Bayesian model averaging (`Mélange de modèles')

Dirichlet mixture model (`Modèle de mélange'):a re-parametrization

Historical, censored data in the Dirichlet model


Bayesian Model Averaging: reducing model uncertainty.

I Parametric framework: arbitrary restriction; different models can yield different estimates.

I First option: fight!
I Choose one model (information criteria: BIC / AIC / AICc)

I BMA = averaging predictions based on posterior model weights

[Figure: two models M1, M2 within H's family, and the models' average.]

I Already widely studied and used in several contexts (weather forecasting, . . . ): Hoeting et al. (1999), Madigan & Raftery (1994), Raftery et al. (2005)
I Applicability to extreme value theory?

BMA: principle

I J statistical models M(1), . . . , M(J), with parameter spaces Θj, 1 ≤ j ≤ J, and priors πj defined on Θj

I BMA model = disjoint union: Θ̃ = ⊔_{j=1}^J Θj, with prior p on the index set {1, . . . , J}:
p(Mj) = `prior marginal model weight' for Mj

I Prior on Θ̃: π̃(⊔_{j=1}^J Bj) = Σ_{j=1}^J p(Mj) πj(Bj)  (Bj ⊂ Θj)

I Posterior (conditioning on data X) = weighted average:

π̃(⊔_{j=1}^J Bj | X) = Σ_{j=1}^J p(Mj | X) πj(Bj | X),

where p(Mj | X) is the posterior marginal model weight and πj(Bj | X) the posterior within Mj.

Key: the posterior weights (Laplace approximation or standard Monte Carlo?):

p(Mj | X) ∝ p(Mj) ∫_{Θj} Likelihood(X | θ) dπj(θ)
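The key quantity, the posterior model weight, can be approximated by plain Monte Carlo over the prior; a toy sketch on two normal models rather than the angular-measure models of the talk (all settings are ours):

```python
import numpy as np

# Two models for i.i.d. N(theta, 1) data:
# M1: theta = 0 fixed;  M2: theta ~ N(0, 1) prior.
rng = np.random.default_rng(5)
x = rng.normal(1.0, 1.0, size=40)   # data actually drawn with theta = 1

def log_lik(x, theta):
    return -0.5 * np.sum((x - theta) ** 2) - 0.5 * x.size * np.log(2 * np.pi)

# Marginal likelihood of each model: p(X | Mj) = int L(X | theta) dpi_j(theta).
m1 = np.exp(log_lik(x, 0.0))                      # point prior: no integral needed
theta_draws = rng.normal(0.0, 1.0, size=200_000)  # Monte Carlo over pi_2
ll = (-0.5 * ((x[None, :] - theta_draws[:, None]) ** 2).sum(axis=1)
      - 0.5 * x.size * np.log(2 * np.pi))
m2 = np.mean(np.exp(ll))

# Equal prior weights p(M1) = p(M2) = 1/2:
post_w2 = m2 / (m1 + m2)
print(post_w2)  # most of the posterior mass on M2
```

For high-dimensional Θj this plain average becomes unstable, which is why the slide mentions Laplace approximations as the alternative.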

BMA for multivariate extremes (Sabourin, Naveau, Fougères, 2013)

I Background: univariate EVDs of different types (Stephenson & Tawn, 2004) or multivariate, asymptotically dependent/independent EVDs (Apputhurai & Stephenson, 2010)

I Our approach: averaging angular measure models, with angular data W.

I F1, . . . , FJ max-stable distributions → Σj pj Fj is not max-stable.

I H1, . . . , HJ angular measures (moments constraint) → Σj pj Hj is a valid angular measure! (linearity)

Implementing and scoring BMA for multivariate extremes

Does the BMA framework perform significantly better than selecting models based on AIC?

I Yes, in terms of logarithmic score for the predictive density (Kullback-Leibler divergence to the truth), Madigan & Raftery (1994) — on average over the union model, w.r.t. the prior!

I Simulation study: evaluation via proper scoring rules (logarithmic score + probability of failure regions)

I 2 models of the same dimension: pairwise Beta / nested asymmetric logistic
I 100 data sets simulated from another model

I Results: the BMA framework performs
I consistently (for all scores),
I slightly (1/20 to 1/100),
better than model selection.

Discussion

I BMA vs. selection: moderate gain for large sample sizes: the posterior concentrates on `asymptotic carrier regions' = points (parameters) of minimal KL divergence from the truth

I BMA: simple if several models have already been fitted (`only' compute the posterior weights)

I Way out: mixture models, at the price of an increased dimension of the parameter space (product)