Bayesian Empirical Likelihood - Stanford University

Transcript of "Bayesian Empirical Likelihood: a survey", slides presented by Art B. Owen (Stanford University) at BNP 11, June 2017.

Page 1

Bayesian Empirical Likelihood: a survey

Art B. Owen

Stanford University

BNP 11, June 2017

Page 2

These are the slides I presented at BNP 11 in Paris, a very fine meeting. I have added a few interstitial slides like this one to include comments I made while presenting and to give some background thoughts on this area.

The session included a delightful presentation by Nils Lid Hjort on some not-yet-published theorems connecting Bayesian empirical likelihood to Dirichlet process Bayesian nonparametrics.

I was pleased to see our session got a positive review on Xian's Og (written by our session chair, Christian Robert).

Also, the conference included two very strong posters by students working on Bayesian empirical likelihood. Frank Liu (Monash) has a nice way to robustify BETEL. Laura Turbatu (Geneva) took a close look at higher-order asymptotics to compare the frequentist coverage of Bayes methods under various kinds of nonparametric priors.

Page 3

The goal

Science-driven prior × Data-driven likelihood

Page 4

This goal is not everybody's goal. But it means you don't have to make up a likelihood, which could bring unwanted and consequential restrictions.

If the data deliver the likelihood and the prior is chosen flat or by previously known science, then what is there left for the statistician to do?

Page 5

The goal

Science-driven prior × Data-driven likelihood

The ask

IID data and an estimand defined by estimating equations:
$$X_1, X_2, \ldots, X_n \overset{iid}{\sim} F, \qquad \int m(x, \theta) \, dF(x) = 0, \quad \theta \in \mathbb{R}^p$$

Page 6

We have IID X and an estimating equation. If you picture the plate diagram, it is one of the simplest possible ones. It covers an important set of problems, but Bayes includes much else as well.

Page 7

Outline

1) Empirical Likelihood
2) Bayes EL: Lazar (2003)
3) Exponential tilting, and ETEL: Schennach (2005, 2007)
4) ABC and EL: Mengersen, Pudlo, Robert (2013)
5) BEL with Hamiltonian MCMC: Chaudhuri et al. (2017)
6) Some other recent works

Notes

• Apologies for necessarily omitted papers.
• Motivation: bioinformatics.

Page 8

In the bioinformatics motivation, the data are counts. They are assuredly not Poisson. So a common response is to model them as negative binomial. Of course they are not that either. This is work in progress.

SAMSI is having a year on QMC. The original slides touted the opening workshop. However, the application deadline will almost surely have passed by the time you're reading this.

Page 9

Empirical likelihood

For $X_i \overset{iid}{\sim} F$ with observed $X_i = x_i \in \mathbb{R}^d$, define
$$L(F) = \prod_{i=1}^n F(\{x_i\}).$$

The nonparametric MLE (NPMLE) is
$$\hat F \equiv \arg\max_F L(F) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}.$$

There are many other NPMLEs. E.g., Kaplan-Meier for right-censored survival; log-concave densities: Walther, Dümbgen, Rufibach, Cule, Samworth, Stewart, ...

Page 10

Basic notions

Model: $x_i \overset{iid}{\sim} F$.  True value: $F = F_0$.  NPMLE: $\hat F$.

Estimand: $T(F)$, e.g., population mean, median, regression coefficient, etc.

True value: $\theta_0 = T(F_0)$.  NPMLE: $\hat\theta = T(\hat F)$.

Page 11

Empirical likelihood ratios

                 Point est.                                 Interval / Test
Parametric       $\hat\theta = \arg\max_\theta R(\theta)$   $-2 \log R(\theta_0) \to \chi^2$
Nonparametric    $T(\hat F)$                                * * *

The purpose of empirical likelihood ratios is to fill in the * * *, getting likelihood ratio confidence intervals and tests without parametric assumptions. I.e., a nonparametric Wilks' theorem.

Owen (2000), Chapman & Hall monograph.

Page 12

Empirical likelihood ratio

Let $w_i = w_i(F) = F(\{x_i\})$. Then $w_i \ge 0$ and $\sum_{i=1}^n w_i \le 1$.
$$R(F) = \frac{L(F)}{L(\hat F)} = \prod_{i=1}^n \frac{w_i}{1/n} = \prod_{i=1}^n n w_i$$

Literal EL confidence regions: $\{T(F) \mid R(F) \ge c\}$

Profile likelihood: $R(\theta) = \max\{R(F) \mid T(F) = \theta\}$, with confidence set $\{\theta \mid R(\theta) \ge c\}$.

We still have to choose $c$.

Page 13

Usual EL

It is usual to force $\sum_i w_i = \sum_i F(\{x_i\}) = 1$.

$R(F) \ge c$ already implies $1 - F(\{x_1, \ldots, x_n\}) = O(1/n)$.

It is a multinomial likelihood, supported on $x_1, \ldots, x_n$. There are other choices.

Page 14

Empirical likelihood for the mean

$$T(F) = \int x \, dF(x)$$

Profile likelihood:
$$R(\mu) = \max\Big\{ \prod_{i=1}^n n w_i \;\Big|\; \sum_{i=1}^n w_i x_i = \mu,\ w_i \ge 0,\ \sum_{i=1}^n w_i = 1 \Big\}$$

Confidence set: $\{\mu \mid R(\mu) \ge c\}$.

To pick $c$:
$$-2 \log R(\mu_0) \overset{d}{\to} \chi^2_{(k)}, \quad k = \mathrm{rank}(\mathrm{Var}(x)), \qquad c = \exp\Big({-\frac12 \chi^{2,1-\alpha}_{(k)}}\Big),$$
where $\chi^{2,1-\alpha}_{(k)}$ is the $1-\alpha$ quantile of $\chi^2_{(k)}$.

Page 15

Dipper, Cinclus cinclus

Eats larvae of mayflies, stoneflies, caddis flies, and others.

Photo by Mark Medcalf, from Wikipedia. Licensed under CC BY 2.0.
https://commons.wikimedia.org/w/index.php?curid=15739681

Page 16

Dipper diet means

[Figure: four panels of pairwise confidence regions for diet means: stonefly larvae vs. caddis fly larvae, and other invertebrates vs. mayfly larvae. Top row shows EL regions; bottom row shows Hotelling's T² ellipses.]

Data from Iles (1993), 22 sites in Wales. Hall (1990) quantifies correctness of shape.

Page 17

Computing EL for the mean

Easy by convex duality. The dual is self-concordant ⟹ truncated Newton is assured of convergence.

Owen (2013), Canadian Journal of Statistics. R code online.
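
To make the duality concrete, here is a minimal Python sketch (not the author's posted R code; all names are illustrative). For the mean it sets $z_i = x_i - \mu$ and minimizes the convex dual $f(\lambda) = -\sum_i \log(1 + \lambda^T z_i)$ by damped Newton steps, backtracking to keep every weight positive; the minimum value is $\log R(\mu)$. Owen (2013) additionally swaps in a self-concordant pseudo-logarithm so that Newton's method provably converges.

```python
import numpy as np

def el_logratio(z, iters=50, tol=1e-10):
    """Profile empirical log likelihood ratio, log R, for estimating
    equations with sample values z_i (the rows of z).  For the mean,
    use z_i = x_i - mu.  Minimizes the convex dual
        f(lambda) = -sum_i log(1 + lambda' z_i),
    whose minimizer lambda* gives log R = f(lambda*)."""
    z = np.atleast_2d(z)
    n, q = z.shape
    lam = np.zeros(q)
    for _ in range(iters):
        r = 1.0 + z @ lam                        # r_i = 1 + lambda' z_i
        grad = -(z / r[:, None]).sum(axis=0)     # gradient of f
        hess = (z / r[:, None] ** 2).T @ z       # Hessian of f (PSD)
        step = np.linalg.solve(hess, -grad)      # Newton direction
        t = 1.0                                  # backtrack: keep all r_i > 0
        while np.any(1.0 + z @ (lam + t * step) <= 0.0):
            t /= 2.0
        lam = lam + t * step
        if np.linalg.norm(grad) < tol:
            break
    return -np.log(1.0 + z @ lam).sum()          # log R at the optimum

# E.g., for the mean of exponential data:
# x = np.random.default_rng(1).exponential(size=(40, 1))
# print(-2 * el_logratio(x - 1.0))   # compare to chi^2_(1) quantile 3.84
```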

Page 18

Estimating equations

$\theta$ solves $E(m(x, \theta)) = 0$.

Examples

  $m(x, \theta)$                                         Estimand $\theta$
  $x - \theta$                                           $E(x)$
  $1_{x < \theta} - 0.5$                                 $\mathrm{median}(x)$
  $(x - \theta) \times 1_{z \in A}$                      $E(x \mid z \in A)$
  $x(y - x^T\beta)$                                      regression
  $\frac{\partial}{\partial\theta} \log f(x, \theta)$    MLE
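
Each table row drops into the dual solver sketched above by setting $z_i = m(x_i, \theta)$. A couple of the rows as illustrative Python callables:

```python
import numpy as np

# Estimating functions m(x, theta) from the table, written as callables
# whose outputs z_i = m(x_i, theta) feed the el_logratio sketch above.
mean_m   = lambda x, theta: x - theta
median_m = lambda x, theta: float(x < theta) - 0.5

# e.g., profile log EL ratio for the median of a 1-d sample x:
# z = np.array([[median_m(xi, theta)] for xi in x]); el_logratio(z)
```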

Page 19

Note that some of these estimands can be well defined without making any assumptions about the likelihood. For instance, we could well want a median without wanting to assume a double exponential distribution or any other whose MLE is the median.

Similarly, least squares estimates are interpretable as sample-based counterparts to population least squares quantities.

Other cases, like logistic regression, make it harder to interpret what you get if the motivating parametric model does not hold. That said, you might still prefer reliable to unreliable uncertainty quantification for such an estimand.

Page 20

EL properties

• High power: Kitamura (2001), Lazar & Mykland (1998). Comparable to a true parametric model (when there is one).
• Bartlett correctable: DiCiccio, Hall & Romano (1991). Coverage errors $O(1/n) \to O(1/n^2)$.

Challenges

• Convex hull issue, fixable by pseudo-data: Chen & Variyath (2008), Tsao & Wu (2013/14), Emerson & Owen (2009).
• Computing $\max\{R(\theta_1, \ldots, \theta_p) \mid \theta_j = \theta_{j0}\}$ can be hard. Owen (2000) used the expensive NPSOL solver.

Possibility

Sampling might work more easily than optimization.

Page 21

Overdetermined estimating equations

$\theta \in \mathbb{R}^p$, $m(x, \theta) \in \mathbb{R}^q$, $q > p$: that is, $q$ equations in $p < q$ unknowns,
$$\frac{1}{n} \sum_{i=1}^n m(x_i, \theta) = 0$$

Popular in econometrics. E.g., generalized method of moments (GMM), Hansen (1982). E.g., regression through $(0, 0)$:
$$E(y - x\beta) = E(x(y - x\beta)) = 0$$

Misspecification: sometimes no $\theta \in \mathbb{R}^p$ gives $E(m(x, \theta)) = 0 \in \mathbb{R}^q$.

Page 22

EL for overdetermined

Get $n$ more multinomial parameters $w_1, \ldots, w_n$.

NPMLE:
$$\max_{\theta, w} \prod_{i=1}^n w_i \quad \text{subject to} \quad \sum_{i=1}^n w_i m(x_i, \theta) = 0, \quad \sum_{i=1}^n w_i = 1, \quad w_i \ge 0$$

Then solve $\sum_i w_i m(x_i, \theta) = 0$ for $\theta$.

Newey & Smith (2004): same variance and less bias than GMM.
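
A sketch of how this might be profiled in code, reusing el_logratio from earlier (names illustrative): the inner convex dual handles $w$ for each $\theta$, and a derivative-free outer search maximizes $\log R(\theta)$. When $0$ lies outside the convex hull of the $m(x_i, \theta)$, the inner Newton iteration stalls near the boundary and returns a very small log R, which the outer search then avoids.

```python
import numpy as np
from scipy.optimize import minimize

def el_estimate(m, x, theta0):
    """Maximum EL estimate for (possibly overdetermined) estimating
    equations: the inner dual profiles out w, the outer search moves theta."""
    def neg_log_R(theta):
        z = np.array([m(xi, theta) for xi in x])   # n x q moment matrix
        return -el_logratio(z)
    return minimize(neg_log_R, theta0, method="Nelder-Mead").x
```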

Page 23

Bayesian EL

$$p(\theta \mid x) \propto p(\theta) R(\theta \mid x)$$

Lazar (2003) justifies this by a Bayesian CLT, like Boos & Monahan (1992). Also Hjort (today!).

Is that Bayesian enough?
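
In code, this posterior can be explored with a plain random-walk Metropolis sampler; a minimal sketch with illustrative names, reusing el_logratio. (Page 32 below notes that this naive sampler can struggle with the irregular support of $R(\theta \mid x)$.)

```python
import numpy as np

def bel_sampler(log_prior, m, x, theta0, B=5000, scale=0.1, seed=0):
    """Random-walk Metropolis for p(theta | x) ∝ p(theta) R(theta | x)."""
    rng = np.random.default_rng(seed)
    def log_post(theta):
        z = np.array([m(xi, theta) for xi in x])
        return log_prior(theta) + el_logratio(z)   # log prior + log EL ratio
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    draws = []
    for _ in range(B):
        prop = theta + scale * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
            theta, lp = prop, lp_prop
        draws.append(theta.copy())
    return np.array(draws)
```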

Page 24

Cressie-Read likelihoods

There are numerous alternative nonparametric likelihoods. Chang & Mukerjee (2008) give conditions for them to have a very accurate BCLT for very general priors. EL is uniquely able to do so. Some others will work with a flat prior.

Also: poster by Turbatu (yesterday).

Page 25

Least favorable families

$$\Pr{}_{\mathrm{EL}}(x_i) = w_i = \frac{1}{n} \cdot \frac{1}{1 + \lambda(\theta)^T m(x_i, \theta)}$$

A $p$-dimensional $\theta$ ⟹ a $p$-dimensional $\lambda(\theta)$. There is a least favorable $p$-dimensional family for $\theta$. Any other family makes the inference artificially easy. The EL family is asymptotically the LFF: DiCiccio & Romano (1990).

The connection

$p(\theta)$ induces a prior on the LFF, and $R(\theta \mid x) \doteq$ the likelihood on the LFF
⟹ $p(\theta) \times R(\theta \mid x) \doteq$ the posterior on the LFF $=$ the posterior on $\theta$.

Page 26

To my knowledge, nobody has tracked down exactly how the approximations work out for the least favorable family motivation of Bayesian EL.

Page 27

Exponential tilting (ET): Schennach (2005)

To get an empirical probability model for $\theta \in \Theta \subset \mathbb{R}^p$, partition $\Theta = \bigcup_{\ell=1}^N \Theta_\ell$, so that $p(\theta)$ is multinomial on $N$ disjoint cells. Now let $N \to \infty$ for universal approximation.

Ultimately,
$$p(\theta \mid x) \propto p(\theta) \times \prod_{i=1}^n w_i^*(\theta), \quad \text{where the } w_i^* \text{ maximize } -\sum_{i=1}^n w_i \log(w_i) \ \text{ s.t. } \sum_{i=1}^n w_i x_i = \theta, \ \sum_i w_i = 1.$$

Page 28

Exponential tilting EL: Schennach (2007)

Form a parametric family via exponential tilting:
$$w_i^*(\theta) = \frac{e^{\lambda(\theta)^T m(x_i, \theta)}}{\sum_{j=1}^n e^{\lambda(\theta)^T m(x_j, \theta)}}, \quad \text{where} \quad 0 = \sum_{i=1}^n e^{\lambda(\theta)^T m(x_i, \theta)} m(x_i, \theta).$$

Likelihood:
$$L(\theta) = \prod_{i=1}^n n w_i^*(\theta), \qquad \hat\theta_{\mathrm{ETEL}} = \arg\max_\theta L(\theta).$$

Works better on misspecified overdetermined models, i.e., when no $\theta \in \mathbb{R}^p$ makes $E(m(x, \theta)) = 0 \in \mathbb{R}^q$. Reason: at least there is an estimand.
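
A minimal sketch of computing the ETEL log likelihood (illustrative names): $\lambda(\theta)$ is the minimizer of the convex criterion $\sum_i \exp(\lambda^T m(x_i, \theta))$, whose stationarity condition is exactly the display above, so a few Newton steps usually suffice (no line search shown).

```python
import numpy as np

def etel_loglik(z, iters=100, tol=1e-12):
    """log L(theta) = sum_i log(n w*_i(theta)) for exponentially tilted
    weights, where z has rows m(x_i, theta).  The tilt lambda minimizes
    the convex criterion sum_i exp(lambda' z_i)."""
    n, q = z.shape
    lam = np.zeros(q)
    for _ in range(iters):
        e = np.exp(z @ lam)                   # e_i = exp(lambda' z_i)
        grad = (e[:, None] * z).sum(axis=0)   # sum_i e_i z_i
        if np.linalg.norm(grad) < tol * e.sum():
            break                             # tilting condition holds
        hess = (e[:, None] * z).T @ z         # sum_i e_i z_i z_i'
        lam -= np.linalg.solve(hess, grad)    # Newton step
    w = np.exp(z @ lam)
    w /= w.sum()                              # tilted weights w*_i(theta)
    return np.log(n * w).sum()
```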

Page 29

If we know that no $\theta$ can make $E(m(x, \theta)) = 0$, then I'm not sure why we still want an estimand. It seems like we should instead change the choice of $m$.

Page 30

BETEL: Chib, Shin, Simoni (2016, 2017)

Use a slack parameter $V_k$: $V_k \neq 0$ allows for non-truthfulness of moment condition $k$. At most $m - p$ of the $V_k$ are nonzero.

Metropolized independence sampling from a $t$ distribution.

Asymptotic model selection: use the log marginal likelihood. If there is a true model then it will be selected over any false one. Among true models, the one with the most equations holding wins.

Robust BETEL: poster by Liu (yesterday).

Page 31

EL with ABC: Mengersen, Pudlo & Robert (2013)

Basic EL-ABC:
  Sample $\theta_i \sim p(\theta)$
  Compute $w_i = R(\theta_i \mid x)$
  Normalize the weights

Advanced EL-ABC: use adaptive multiple importance sampling (AMIS). The algorithm has five loops, with a maximum nesting depth of three.
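
The basic version is a few lines of importance sampling; a sketch assuming a prior sampler prior_rvs and the el_logratio dual from earlier (names illustrative):

```python
import numpy as np

def el_abc(prior_rvs, m, x, B=2000):
    """Basic EL-ABC: draw theta from the prior, weight by R(theta | x)."""
    thetas = [prior_rvs() for _ in range(B)]
    logw = np.array([el_logratio(np.array([m(xi, th) for xi in x]))
                     for th in thetas])
    w = np.exp(logw - logw.max())            # stabilize before normalizing
    return np.array(thetas), w / w.sum()     # weighted posterior sample
```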

Page 32

Hamiltonian MCMC: Chaudhuri, Mondal & Yin (2017)

Using $p(\theta) R(\theta \mid x)$, they report that:

• Gibbs is not applicable
• Random-walk Metropolis has problems with the irregular support of $R(\theta \mid x)$

Their solution: use gradients, i.e., Hamiltonian MCMC. The gradient gets steep just where you need it.

Page 33

Posterior moments: Vexler, Tao & Hutson (2014)

For $x_i \in \mathbb{R}$:
$$\frac{\int_{x_{(1)}}^{x_{(n)}} \theta^k R(\theta \mid x)\, p(\theta)\, d\theta}{\int_{x_{(1)}}^{x_{(n)}} R(\theta \mid x)\, p(\theta)\, d\theta}$$

Nonparametric James-Stein

For $x_i = (x_{i1}, \ldots, x_{iK}) \in \mathbb{R}^K$, get $R_j(\theta_j) = R(\theta_j \mid x_{1j}, \ldots, x_{nj})$ and
$$\hat\theta_j = \frac{\int \theta_j R_j(\theta_j)\, \varphi\big(\tfrac{\theta_j - \theta_0}{\sigma_*}\big)\, d\theta_j}{\int R_j(\theta_j)\, \varphi\big(\tfrac{\theta_j - \theta_0}{\sigma_*}\big)\, d\theta_j}, \qquad \sigma_*^2 = \arg\max_{\sigma^2} \sum_{j=1}^K \log\Big( \frac{1}{\sqrt{2\pi}\,\sigma} \int R_j(\theta_j)\, e^{-\theta_j^2/(2\sigma^2)}\, d\theta_j \Big)$$
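
Since $\theta$ ranges over $[x_{(1)}, x_{(n)}]$, the ratio of integrals can be approximated with a simple quadrature rule; a minimal univariate sketch for the mean (illustrative names, reusing el_logratio; the grid stays strictly inside the convex hull):

```python
import numpy as np

def el_posterior_moment(x, log_prior, k=1, grid=400):
    """k-th posterior moment of a univariate mean: quadrature over
    (x_(1), x_(n)) with integrand theta^k R(theta | x) p(theta)."""
    ts = np.linspace(x.min(), x.max(), grid + 2)[1:-1]   # interior grid
    logf = np.array([el_logratio((x - t)[:, None]) + log_prior(t)
                     for t in ts])
    f = np.exp(logf - logf.max())           # unnormalized posterior on grid
    return (ts ** k * f).sum() / f.sum()    # grid spacing cancels in ratio
```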

Page 34

Quantiles: Vexler, Yu & Lazar (2017)

EL Bayes factors for a quantile: $\theta \equiv Q_\alpha = Q_0$ vs. $Q_\alpha \neq Q_0$.

They overcome a technical challenge in handling $\int_{H_A} R(\theta \mid x)\, p(\theta)\, d\theta$, because $R$ has jump discontinuities. Also two-sample Bayes factors.

If scientific or regulatory interest is in a specific quantile, then the user does not have to pick a whole parametric family.

Page 35

Quantile regression

$$\Pr(Y \le \alpha + \beta x \mid x) = \tau, \quad \text{e.g., } \tau = 0.9$$

Estimating equations:
$$0 = E\big(1_{y \le \alpha + \beta x} - \tau\big), \qquad 0 = E\big(x(1_{y \le \alpha + \beta x} - \tau)\big)$$

Lancaster & Jun (2010): BETEL for quantiles, similar to Jeffreys' prior.

Yang & He (2012): BCLT accounting for non-smoothness of the estimating equations; shrinking priors pool $\beta$ across multiple $\tau$.

Both use MCMC.
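
These moment functions plug directly into the earlier sketches; for instance (illustrative names, one scalar covariate):

```python
import numpy as np

def qreg_m(xy, theta, tau=0.9):
    """Quantile regression moments m((x, y), (alpha, beta)) at level tau.
    Rows built from this feed el_logratio / bel_sampler sketched earlier."""
    x, y = xy
    alpha, beta = theta
    u = float(y <= alpha + beta * x) - tau   # 1{y <= alpha + beta x} - tau
    return np.array([u, x * u])
```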

Page 36

Summary

We can do Bayes without requiring a parametric family for the observations. It is the "other nonparametric Bayes".

Page 37

Thanks

• Luke Bornn, for the invitation
• Sanjay Chaudhuri, for many references
• The scientific and organizing committees
• Especially Judith Rousseau
• NSF DMS-1521145, DMS-1407397