Bayesian Estimation & Model Evaluation
Frank Schorfheide, University of Pennsylvania
MFM Summer Camp
June 12, 2016
Frank Schorfheide Bayesian Estimation & Model Evaluation
Why Bayesian Inference?
• Why not?
p(θ|Y) = p(Y|θ)p(θ) / ∫ p(Y|θ)p(θ) dθ
• Treat uncertainty with respect to shocks, latent states, parameters, and model specifications symmetrically.
• Condition inference on what you know (the data Y) instead of what you don't know (the parameter θ).
• Make optimal decision conditional on observed data.
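The normalization in Bayes' theorem above can be carried out on a grid whenever θ is low-dimensional. A minimal sketch; the Bernoulli likelihood, Beta(2, 2) prior, and data are illustrative choices, not from the slides:

```python
import numpy as np

# Grid approximation of p(theta|Y) ∝ p(Y|theta) p(theta).
# Illustrative setup: Y = 7 successes in 10 Bernoulli trials,
# Beta(2, 2) prior on the success probability theta.
theta = np.linspace(0.001, 0.999, 999)
d = theta[1] - theta[0]
prior = theta * (1 - theta)                  # Beta(2, 2) kernel
likelihood = theta**7 * (1 - theta)**3       # Bernoulli likelihood p(Y|theta)
kernel = likelihood * prior
posterior = kernel / (kernel.sum() * d)      # divide by ∫ p(Y|theta) p(theta) dtheta

post_mean = (theta * posterior).sum() * d
```

By conjugacy the exact posterior is Beta(9, 5) with mean 9/14, which the grid answer reproduces.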
Excuses and Overview
• Too little time to provide a detailed survey of state-of-the-art Bayesian methods.
• Instead: an eclectic collection of ideas and insights related to:
1 Model Development
2 Identification
3 Priors
4 Computations
5 Working with Multiple Models
1. Model Development
• Bayesian estimation can take a lot of time... so don't waste it on bad models!
• Suppose you have an elaborate macro-finance DSGE model...
• Applied theorists get credit for plugging parameter values into the model and solving/simulating it.
• You can easily get extra credit by:
• specifying a prior distribution p(θ);
• generating draws θi, i = 1, …, N, from the prior;
• simulating trajectories Yi (conditional on θi), i = 1, …, N;
• computing sample statistics S(Yi);
• comparing the distribution of simulated sample statistics to the observed sample statistic S(Y);
• calling it a prior predictive check.
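The recipe above can be sketched in a few lines. The AR(1) "model," the prior ranges, the autocorrelation statistic, and the observed value s_obs are all placeholders for your own model, prior, and data:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, T=200):
    """Simulate a trajectory Y^i from a toy AR(1) model given a parameter draw."""
    rho, sigma = theta
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + sigma * rng.standard_normal()
    return y

def sample_stat(y):
    """S(Y): here the first-order autocorrelation."""
    return np.corrcoef(y[:-1], y[1:])[0, 1]

# Draws theta^i from the prior: rho ~ U(0, 0.95), sigma ~ U(0.5, 2)
N = 500
stats = []
for _ in range(N):
    theta_i = (rng.uniform(0, 0.95), rng.uniform(0.5, 2.0))
    stats.append(sample_stat(simulate(theta_i)))
stats = np.array(stats)

# Compare the prior predictive distribution of S(Y^i) with the observed S(Y):
s_obs = 0.8  # placeholder for the statistic computed from actual data
in_tails = (stats < s_obs).mean()  # fraction of simulated statistics below it
```

If the observed statistic sits far in the tails of the simulated distribution, the prior/model combination is suspect before any estimation is run.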
1. Predictive Checks – An Example
Reference: Chang, Doh, and Schorfheide (2007, JMCB)
2. Identification
• We are trying to learn the parameters θ from the data.
• Formal definitions... e.g., the model is identified at θ0 if p(Y|θ) = p(Y|θ0) implies that θ = θ0.
• Without identification or with weak identification:
• use more/different data to achieve identification;
• use identification-robust inference procedures.
• Lack of identification does not raise conceptual issues for Bayesian inference (as long as priors are proper), but it can pose computational challenges.
Reference: Fernandez-Villaverde, Rubio-Ramirez, Schorfheide (2016, HB of Macro Chapter)
2. (Lack of) Identification – An Analytical Example
• Let φ be an identifiable reduced-form parameter.
• Let θ be a structural parameter of interest:
φ ≤ θ and θ ≤ φ + 1.
• Parameter θ is set-identified.
• The interval Θ(φ) = [φ, φ + 1] is called the identified set.
• This problem shows up prominently in VARs identified with sign restrictions.
References: Moon and Schorfheide (2012, Econometrica); Schorfheide (2016, Discussion of World Congress Lectures by Muller and Uhlig)
2. (Lack of) Identification – An Analytical Example
• Joint posterior of θ and φ:
p(θ, φ|Y ) = p(φ|Y )p(θ|φ,Y ) ∝ p(Y |φ)p(θ|φ)p(φ).
• Because θ does not enter the likelihood function, we deduce that
p(φ|Y) = p(Y|φ)p(φ) / ∫ p(Y|φ)p(φ) dφ,
p(θ|φ,Y ) = p(θ|φ).
No updating of beliefs about θ conditional on φ!
• Marginal posterior distribution of θ:
p(θ|Y) = ∫_{θ−1}^{θ} p(φ|Y) p(θ|φ) dφ
Updating of marginal posterior of θ!
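For the distributional assumptions used in the figure that follows (φ|Y ∼ N(−0.5, V), θ|φ ∼ U[φ, φ + 1]), the marginal posterior of θ is available in closed form, since p(θ|φ) = 1 on [φ, φ + 1]; a small sketch:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def posterior_theta(theta, V, mu=-0.5):
    """p(theta|Y) = integral of p(phi|Y) over [theta - 1, theta],
    with phi|Y ~ N(mu, V) and p(theta|phi) = 1 on [phi, phi + 1]."""
    s = sqrt(V)
    return Phi((theta - mu) / s) - Phi((theta - 1 - mu) / s)

grid = np.linspace(-2.0, 1.5, 701)
d = grid[1] - grid[0]
dens = np.array([posterior_theta(t, 1 / 100) for t in grid])
# As V shrinks, p(theta|Y) flattens toward U[-0.5, 0.5]: the data pin
# down phi but never discriminate between values of theta inside the
# identified set.
```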
2. An Analytical Example: Posterior p(θ|Y )
Assume φ|Y ∼ N(−0.5, V); θ|φ ∼ U[φ, φ + 1].
V is equal to 1/4 (solid red), 1/20 (dashed blue), and 1/100 (dotted green).
[Figure: posterior density p(θ|Y) plotted against θ ∈ [−2, 1.5] for the three values of V.]
3. Prior Distributions
• Ideally: probabilistic representation of our knowledge/beliefs before observing the sample Y.
• More realistically: the choice of prior as well as the model are influenced by some observations. Try to keep this influence small or adjust measures of uncertainty.
• Views about the role of priors:
1 keep them “uninformative” (???) so that the posterior inherits the shape of the likelihood function;
2 use them to regularize the likelihood function;
3 incorporate information from sources other than Y.
3. Role of Priors – Example 1
• “Uninformative” priors?
• Consider structural VAR
yt = Φ yt−1 + Σtr Ω εt,  ut = Σtr Ω εt,  E[ut ut′] = Σ
• A uniform distribution on the orthonormal matrix Ω does not induce a uniform prior over the identified set for the IRF:
IRF(i, h) = Φh Σtr [Ω]·i = Φh Σtr q, where ‖q‖ = 1
[Figure: mapping of the uniform prior on q = (q1, q2)′ into a non-uniform prior on θ; the line Σtr,21 q1 + Σtr,22 q2 = 0 is shown.]
Reference: Schorfheide (2016, World Congress Discussion)
3. Role of Priors – Example 2a
• Consider model
yt = θ1x1,t + θ1θ2x2,t + ut .
• No identification of θ2 if θ1 = 0.
• Models with multiplicative parameters generate likelihood functions that look like this...
[Figure: likelihood contours in the (θ1, θ2) plane.]
3. Role of Priors – Example 2a
• Identification problem also distorts
p(θ1 = 0|Y) ∝ ∫ p(Y|θ1 = 0, θ2) p(θ2) p(θ1 = 0) dθ2.
• Reparameterize: α1 = θ1, α2 = θ1θ2.
• Prior p(α1, α2) ∝ c can regularize the problem.
• Jacobian:
|∂α/∂θ′| = | 1  0 ; θ2  θ1 | = |θ1|.
• The prior density p(θ1, θ2) ∝ |θ1| vanishes as θ1 approaches the point of non-identification.
• More generally: try to add information when the data are not particularly informative.
References (cointegration model): Kleibergen and van Dijk (1994); Kleibergen and Paap (2002)
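The Jacobian calculation can be double-checked numerically; this sketch (function names are mine) approximates ∂α/∂θ′ by central finite differences:

```python
import numpy as np

def alpha(theta):
    """Reparameterization: alpha1 = theta1, alpha2 = theta1 * theta2."""
    t1, t2 = theta
    return np.array([t1, t1 * t2])

def jacobian_det(theta, eps=1e-6):
    """|det d(alpha)/d(theta')| via central finite differences."""
    J = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2)
        e[j] = eps
        J[:, j] = (alpha(theta + e) - alpha(theta - e)) / (2 * eps)
    return abs(np.linalg.det(J))

# The determinant equals |theta1| and vanishes at the point of
# non-identification theta1 = 0:
vals = [jacobian_det(np.array([t1, 0.7])) for t1 in (0.5, 0.1, 0.0)]
```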
3. Role of Priors – Example 2b
• For instance, high-dimensional VARs:
Y = XΦ + U,  ut ∼ N(0, Σ),
with a low observation-to-parameter ratio.
• A hierarchical (conjugate) MNIW prior p(Φ, Σ|λ) adds information. Frequentist perspective: add some bias and reduce variance to improve MSE.
• How much? Data-driven choice of λ (empirical Bayes):
λ̂ = argmax_λ ∫ p(Y|Φ, Σ) p(Φ, Σ|λ) d(Φ, Σ)
• Or specify prior p(λ) and integrate out hyperparameters.
• Alternative priors: LASSO, spike-and-slab,...
Reference: Giannone, Lenza, and Primiceri (2014, REStat)
3. Role of Priors – Example 3
• Prior elicitation based on: pre-sample information; information from excluded data series; or micro (macro) level information when estimating a model on macro (micro) data.
• A cute example...
• Production function:
Yt = (At Ht)^α Kt^{1−α} (1 − ϕ (Ht/Ht−1 − 1)²).
• Prior for adjustment costs ϕ?
• Firms can either search for workers, incurring adjustment costs ϕ (ΔH/H)² Y, or pay head hunters for finding workers.
• The head hunters' service fee is ζWΔH.
• Head hunters tend to charge about ζ = 1/3 to 2/3 of the quarterly earnings of a worker.
• Recruiting costs should be approximately the same: ϕ (ΔH/H)² Y = ζWΔH.
• With a labor share WH/Y of 2/3 and a one-percent increase in employment, ΔH/H = 1%, we obtain a range of 22 to 44 for ϕ.
Reference: Chang, Doh, and Schorfheide (2007, JMCB)
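The back-of-the-envelope range can be replicated directly: solving ϕ(ΔH/H)²Y = ζWΔH for ϕ gives ϕ = ζ · (WH/Y)/(ΔH/H), assuming a labor share of 2/3:

```python
# Solve phi * (dH/H)^2 * Y = zeta * W * dH for phi:
#   phi = zeta * (W*H/Y) / (dH/H)
labor_share = 2 / 3                            # W*H/Y (assumed calibration)
dH_over_H = 0.01                               # one percent employment increase
phi_lo = (1 / 3) * labor_share / dH_over_H     # zeta = 1/3 -> phi ~ 22
phi_hi = (2 / 3) * labor_share / dH_over_H     # zeta = 2/3 -> phi ~ 44
```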
4. Computations
• Practical work utilizes algorithms to generate draws θi, i = 1, …, N, from the posterior p(θ|Y).
• Post-process the draws by converting them into the object of interest hi = h(θi) to characterize p(h(θ)|Y) =⇒ inference and decision making under uncertainty.
• Important algorithms:
• importance sampling
• Markov chain Monte Carlo (MCMC) algorithms, e.g., Metropolis-Hastings samplers or Gibbs samplers
• More recently: widespread access to parallel computation environments.
• Sequential Monte Carlo (SMC) techniques provide an interesting alternative.
Reference: Herbst and Schorfheide (2015, Princeton University Press)
4. Importance Sampling
• Target posterior π(θ) ∝ f(θ).
• Use the identity
∫ h(θ) f(θ) dθ = ∫ h(θ) [f(θ)/g(θ)] g(θ) dθ.
• The θi's are draws from g(·).
• Approximation:
Eπ[h] ≈ [(1/N) Σ_{i=1}^N h(θi) w(θi)] / [(1/N) Σ_{i=1}^N w(θi)],  w(θ) = f(θ)/g(θ).
[Figure: target density f and two proposal densities g1, g2 (left); importance weights f/g1 and f/g2 (right).]
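A self-normalized importance sampler in a few lines; the target and proposal densities are toy choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy target kernel f (unnormalized N(1, 0.5^2)) and proposal g = N(0, 2^2).
def log_f(th):
    return -0.5 * ((th - 1.0) / 0.5) ** 2

N = 100_000
theta = rng.normal(0.0, 2.0, size=N)      # draws from g
log_g = -0.5 * (theta / 2.0) ** 2         # log g up to a constant (cancels below)
log_w = log_f(theta) - log_g              # log importance weights log(f/g)
w = np.exp(log_w - log_w.max())           # subtract max for numerical stability

# Self-normalized estimate of E_pi[h] for h(theta) = theta:
h_hat = (theta * w).sum() / w.sum()
```

Multiplicative constants in f and g cancel in the ratio of sums, so unnormalized kernels suffice; here the estimate should be close to the target mean of 1.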
4. A Challenging Posterior
• Consider the state-space model:
yt = [1 1] st,
st = [ θ1²  0 ; (1 − θ1²) − θ1θ2  1 − θ1² ] st−1 + [1 0]′ εt.
• Shocks: εt ∼ iid N(0, 1); uniform prior.
• Simulate T = 200 observations given θ = [0.45, 0.45]′, which is observationally equivalent to θ = [0.89, 0.22]′.
[Figure: posterior contours in the (θ1, θ2) unit square, with mass concentrating around the two observationally equivalent parameter values.]
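The claimed observational equivalence can be verified numerically: both parameter vectors imply identical autocovariances of yt. A sketch (0.89 and 0.22 are rounded on the slide, so the exact counterparts are recomputed from θ = [0.45, 0.45]′):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def system(theta):
    """State-space matrices implied by theta = (theta1, theta2)."""
    t1, t2 = theta
    Phi = np.array([[t1**2,                 0.0      ],
                    [(1 - t1**2) - t1 * t2, 1 - t1**2]])
    R = np.array([[1.0], [0.0]])
    c = np.array([[1.0, 1.0]])
    return Phi, R, c

def autocovariances(theta, K=5):
    """Gamma_y(0), ..., Gamma_y(K-1) from the stationary state covariance."""
    Phi, R, c = system(theta)
    P = solve_discrete_lyapunov(Phi, R @ R.T)   # P = Phi P Phi' + R R'
    return np.array([(c @ np.linalg.matrix_power(Phi, k) @ P @ c.T).item()
                     for k in range(K)])

theta_a = np.array([0.45, 0.45])
t1b = np.sqrt(1 - 0.45**2)                      # ~ 0.89 (the slide rounds)
theta_b = np.array([t1b, 0.45 * 0.45 / t1b])    # second element ~ 0.22
# autocovariances(theta_a) and autocovariances(theta_b) coincide, so the
# Gaussian likelihood cannot distinguish the two parameter vectors.
```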
4. From Importance to Sequential Importance Sampling
[Figure: sequence of tempered posteriors πn(θ1) as n increases from 1 to Nφ.]
πn(θ) = [p(Y|θ)]^{φn} p(θ) / ∫ [p(Y|θ)]^{φn} p(θ) dθ = fn(θ)/Zn,  φn = (n/Nφ)^λ
4. SMC Algorithm: A Graphical Illustration
[Figure: particle swarm evolving through Correction (C), Selection (S), and Mutation (M) steps across tempering stages φ0, φ1, φ2, φ3.]
• πn(θ) is represented by a swarm of particles {θin, Win}Ni=1:
h̄n,N = (1/N) Σ_{i=1}^N Win h(θin) →a.s. Eπn[h(θn)].
• C is Correction; S is Selection; and M is Mutation.
4. SMC Algorithm
1 Initialization. (φ0 = 0). Draw the initial particles from the prior: θi1 ∼ iid p(θ) and Wi1 = 1, i = 1, …, N.
2 Recursion. For n = 1, …, Nφ,
1 Correction. Reweight the particles from stage n − 1 by defining the incremental weights
w̃in = [p(Y|θin−1)]^{φn−φn−1}   (1)
and the normalized weights
W̃in = w̃in Win−1 / [(1/N) Σ_{i=1}^N w̃in Win−1],  i = 1, …, N.   (2)
An approximation of Eπn[h(θ)] is given by
h̃n,N = (1/N) Σ_{i=1}^N W̃in h(θin−1).   (3)
2 Selection.
4. SMC Algorithm
1 Initialization.
2 Recursion. For n = 1, …, Nφ,
1 Correction.
2 Selection. (Optional Resampling) Let {θ̂in}Ni=1 denote N iid draws from a multinomial distribution characterized by support points and weights {θin−1, W̃in}Ni=1, and set Win = 1. An approximation of Eπn[h(θ)] is given by
ĥn,N = (1/N) Σ_{i=1}^N Win h(θ̂in).   (4)
3 Mutation. Propagate the particles {θ̂in, Win} via NMH steps of an MH algorithm with transition density θin ∼ Kn(θn|θ̂in; ζn) and stationary distribution πn(θ). An approximation of Eπn[h(θ)] is given by
h̄n,N = (1/N) Σ_{i=1}^N h(θin) Win.   (5)
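The Correction–Selection–Mutation recursion above can be assembled into a compact likelihood-tempering SMC sampler for a toy scalar model, yt ∼ N(θ, 1) with prior θ ∼ N(0, 1), whose posterior is known in closed form. All tuning constants are illustrative; this sketch resamples at every stage and uses a single random-walk MH step for the mutation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy model: y_t ~ N(theta, 1), prior theta ~ N(0, 1).
T = 50
y = rng.normal(1.0, 1.0, size=T)

def loglik(theta):
    """Log likelihood for each particle in the array theta."""
    return -0.5 * ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)

N, Nphi, lam = 2000, 40, 2.0
phi = (np.arange(Nphi + 1) / Nphi) ** lam   # tempering schedule (n/Nphi)^lam

theta = rng.normal(0.0, 1.0, size=N)        # initialization: draws from prior
W = np.ones(N)
ll = loglik(theta)

for n in range(1, Nphi + 1):
    # Correction: incremental weights [p(Y|theta)]^(phi_n - phi_{n-1})
    logW = np.log(W) + (phi[n] - phi[n - 1]) * ll
    W = np.exp(logW - logW.max())
    W = W / W.mean()                        # normalize to average one

    # Selection: multinomial resampling, then reset weights to 1
    idx = rng.choice(N, size=N, p=W / W.sum())
    theta, ll, W = theta[idx], ll[idx], np.ones(N)

    # Mutation: one random-walk MH step targeting pi_n
    prop = theta + 0.3 * rng.standard_normal(N)
    ll_prop = loglik(prop)
    logpost = phi[n] * ll - 0.5 * theta**2       # tempered likelihood x prior
    logpost_prop = phi[n] * ll_prop - 0.5 * prop**2
    accept = np.log(rng.uniform(size=N)) < (logpost_prop - logpost)
    theta = np.where(accept, prop, theta)
    ll = np.where(accept, ll_prop, ll)

# Exact posterior: N(sum(y)/(T+1), 1/(T+1)); compare with the particle mean.
post_mean = theta.mean()
```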
4. Remarks
• Correction Step:
• reweight the particles from iteration n − 1 to create an importance sampling approximation of Eπn[h(θ)].
• Selection Step: the resampling of the particles
• (good) equalizes the particle weights and thereby increases the accuracy of subsequent importance sampling approximations;
• (not good) adds a bit of noise to the MC approximation.
• Mutation Step:
• adapts the particles to the posterior πn(θ);
• imagine we don't do it: then we would be using draws from the prior p(θ) to approximate the posterior π(θ), which can't be good!
5. Working with Multiple Models
• Assign prior probabilities γj,0 to models Mj, j = 1, …, J.
• Posterior model probabilities are given by
γj,T = γj,0 p(Y|Mj) / Σ_{j=1}^J γj,0 p(Y|Mj),
where
p(Y|Mj) = ∫ p(Y|θ(j), Mj) p(θ(j)|Mj) dθ(j).
• Log marginal data densities are sums of one-step-ahead predictive scores:
ln p(Y|Mj) = Σ_{t=1}^T ln ∫ p(yt|θ(j), Y1:t−1, Mj) p(θ(j)|Y1:t−1, Mj) dθ(j).
• Bayesian model averaging:
p(h|Y) = Σ_{j=1}^J γj,T p(hj(θ(j))|Y, Mj).
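The predictive-score identity can be checked in a conjugate example where both sides have closed forms (yt ∼ N(θ, 1), θ ∼ N(0, 1); this model is an illustration, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(3)

T = 20
y = rng.normal(0.5, 1.0, size=T)

# Direct marginal data density: under the prior, Y ~ N(0, I + 11').
Sigma = np.eye(T) + np.ones((T, T))
log_mdd = multivariate_normal(mean=np.zeros(T), cov=Sigma).logpdf(y)

# Sum of one-step-ahead log predictive scores, updating theta|Y_{1:t} ~ N(m, v):
log_scores, m, v = 0.0, 0.0, 1.0
for t in range(T):
    log_scores += norm(loc=m, scale=np.sqrt(1.0 + v)).logpdf(y[t])
    v_new = 1.0 / (1.0 / v + 1.0)     # posterior variance after observing y_t
    m = v_new * (m / v + y[t])        # posterior mean
    v = v_new
# log_mdd and log_scores agree up to floating-point error.
```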
5. Working with Multiple Models
• Application: DSGE model with and without financial frictions.
• Food for thought:
• Bayesian model averaging essentially assumes that the model space is complete. Is it?
• Time-varying model weights can be a stand-in for nonlinear macroeconomic dynamics.
Reference: Del Negro, Hasegawa, and Schorfheide (2016, JoE)
5. A Stylized Framework
• Consider a principal-agent setting to separate the task of estimating models from the task of combining them.
• Agents Mm = econometric modelers:
• provide the principal with predictive densities p(yt+1|Imt, Mm);
• are rewarded based on the realized value of ln p(yt+1|Imt, Mm) (induces truth-telling).
• Imt is the model-specific information set.
• Principal P = policy maker who aggregates the information obtained from the modelers:
p(yt+1|λ, IPt, P) = λ p(yt+1|I1t, M1) + (1 − λ) p(yt+1|I2t, M2),
where IPt = {y1:t, {p(yτ|Imτ−1, Mm)}tτ=1, m = 1, 2}.
5. Bayesian Model Averaging (BMA): λ ∈ {1, 0}
• At any time T the policy maker can use the predictive densities to form marginal likelihoods:
p(Y1:T|Mi) = Π_{t=1}^T p(yt|Y1:t−1, Mi)
• ... and use them to update model probabilities:
λBMAT = P[λ = 1|Y1:T] = P[M1 is correct]
= λBMA0 p(Y1:T|M1) / [λBMA0 p(Y1:T|M1) + (1 − λBMA0) p(Y1:T|M2)]
• Predictive density:
pBMA(yt+1|IPt, P) = λBMAt p(yt+1|Y1:t, M1) + (1 − λBMAt) p(yt+1|Y1:t, M2)
5. BMA and Model Misspecification
• BMA is based on the assumption that the model space contains the ‘true’ model (“complete model space”):
p(y1:T|λ, P) = { p(y1:T|M1) = Π_{t=1}^T p(yt|Y1:t−1, M1)  if λ = 1
              { p(y1:T|M2) = Π_{t=1}^T p(yt|Y1:t−1, M2)  if λ = 0
[Figure: the DGP p(Y1:T) and its KL discrepancy to p(Y1:T|M1) and p(Y1:T|M2).]
• λBMAT →a.s. 1 or 0 as T → ∞ (Dawid 1984, others): Asymptotically, no model averaging! All the weight is on the model closest in KL discrepancy.
5. Optimal (Static) Pools: λ ∈ [0, 1]
• A policy maker concerned about misspecification of Mi could create convex combinations of predictive densities:
[Figure: the DGP p(Y1:T) lying between p(Y1:T|M1) and p(Y1:T|M2).]
p(Y1:T|λ, P) = Π_{t=1}^T [λ p(yt|Y1:t−1, M1) + (1 − λ) p(yt|Y1:t−1, M2)]
• λSPT = argmax_{λ∈[0,1]} p(y1:T|λ, P) generally does not converge to 1 or 0 (unless one of the models is correct): Exploits gains from diversification.
References: Hall and Mitchell (2007), Geweke and Amisano (2011)
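Given sequences of one-step-ahead predictive densities from two models, the static-pool weight solves a one-dimensional optimization. A sketch with simulated predictive densities standing in for actual model output:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(7)

# Illustrative DGP and two misspecified "models" (predictive densities):
T = 500
y = rng.normal(0.0, 1.5, size=T)       # DGP: N(0, 1.5^2)
p1 = norm(0.0, 1.0).pdf(y)             # M1 predictive density: variance too small
p2 = norm(0.0, 2.0).pdf(y)             # M2 predictive density: variance too large

def neg_log_score(lam):
    """Minus the pooled log predictive score (convex in lam)."""
    return -np.log(lam * p1 + (1.0 - lam) * p2).sum()

res = minimize_scalar(neg_log_score, bounds=(0.0, 1.0), method="bounded")
lam_sp = res.x   # interior weight: the pool diversifies rather than picking one model
```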
5. Dynamic Pools - Prior for Weights λ1:T
• Dynamic pool: replace λ by a sequence λt.
• Likelihood function:
p(y1:T|λ1:T, P) = Π_{t=1}^T [λt p(yt|y1:t−1, M1) + (1 − λt) p(yt|y1:t−1, M2)].
• Prior p(λ1:T|ρ) for the sequence λ1:T:
xt = ρ xt−1 + √(1 − ρ²) εt,  εt ∼ iid N(0, 1),  x0 ∼ N(0, 1),
λt = Φ(xt),
where Φ(·) is the Gaussian CDF.
• Unconditionally, λt ∼ U[0, 1] for all t.
• Hyperparameter ρ controls the amount of “smoothing.”
• As ρ → 1: dynamic pool → static pool.
• Specify a prior distribution for ρ (and other hyperparameters) and base results on the (real-time) posterior distribution.
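The uniformity claim follows from the probability integral transform: the AR(1) with innovation variance 1 − ρ² keeps xt ∼ N(0, 1) at every t, and Φ(xt) of a standard normal is U[0, 1]. A quick simulation check:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(11)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

rho, T, M = 0.9, 200, 5000             # M independent sample paths
x = rng.standard_normal(M)             # x_0 ~ N(0, 1)
for _ in range(T):
    # innovation variance 1 - rho^2 keeps Var(x_t) = 1 at every t
    x = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(M)

lam = np.vectorize(Phi)(x)             # lambda_T = Phi(x_T) across paths
# lam should look U[0, 1]: mean 1/2, variance 1/12
```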
5. Dynamic Pools - Nonlinear State Space System
• Measurement equation:
p(yt|λt, P) = λt p(yt|y1:t−1, M1) + (1 − λt) p(yt|y1:t−1, M2)
• Transition equation:
λt = Φ(xt),  xt = ρ xt−1 + √(1 − ρ²) εt,  εt ∼ iid N(0, 1)
• Use a particle filter to construct the sequence p(λt|ρ, IPt, P).
5. Application
• Two models: Smets-Wouters and Smets-Wouters with financial frictions
• Track relative performance over time and construct real-time weights
5. Log Scores Comparison: SWFF vs SWπ
[Figure: log predictive scores p(yt+h,h|Imt, Mm) over time for the SWFF and SWπ models.]
5. Dynamic Pools – Posterior p(h)DP(λt|IPt, P)
[Figure: real-time posterior of λt; ρ ∼ U[0, 1], µ = 0, σ = 1.]
To Recap...
• Too little time to provide a detailed survey of state-of-the-art Bayesian methods.
• Instead: an eclectic collection of ideas and insights related to:
1 Model Development
2 Identification
3 Priors
4 Computations
5 Working with Multiple Models