Bayesian Model Selection in Factorial Designs
Seminal work is by Box and Meyer: an intuitive formulation and analytical approach, but the devil is in the details!
We will look at the simplifying assumptions as we step through Box and Meyer's approach.
Bayesian model selection has been one of the hottest areas in statistics for several years.
There are up to $2^{2^{k-p}-1}$ possible (fractional) factorial models (one for each subset of the $2^{k-p}-1$ candidate effects), denoted as a set $\{M_l\}$.
To simplify later calculations, we usually assume that the only active effects are main effects, two-way interactions, or three-way interactions.
– This assumption is already in place for low-resolution fractional factorials.
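As a rough illustration of why the low-order restriction helps, the sketch below (with k = 4 and no fractionation, chosen purely for illustration) counts the candidate effects of a full 2^4 design and compares the unrestricted model space to the space restricted to effects of order three or less:

```python
from itertools import combinations

# full 2^4 design: 2^4 - 1 = 15 candidate effects (illustrative choice of k)
factors = "ABCD"
effects = [c for r in range(1, len(factors) + 1)
           for c in combinations(factors, r)]
assert len(effects) == 2 ** len(factors) - 1  # 15 effects

# unrestricted model space: every subset of the candidate effects
n_all = 2 ** len(effects)

# restricted space: only main effects, two-way and three-way interactions
low_order = [e for e in effects if len(e) <= 3]
n_low = 2 ** len(low_order)

print(n_all, n_low)  # 32768 vs 16384 candidate models
```

Hierarchy constraints (interactions only entering with their parent main effects) would shrink the space much further; this sketch counts only the order restriction.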
Each Ml denotes a set of active effects (both main effects and interactions) in a hierarchical model.
We will use $X_{ik} = +1$ for the high level of effect k and $X_{ik} = -1$ for the low level of effect k.
We will assume that, given model M, the response variables follow a linear model with normal errors.
X and β are model-specific, but we will use a saturated model in what follows:

$$Y \sim N(X\beta,\ \sigma^2 I)$$
The likelihood for the data given the parameters has the following form:

$$L(\beta,\sigma,Y) = \prod_{i=1}^{2^{k-p}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}\left(Y_i - X_i'\beta\right)^2\right) = \left(\frac{1}{2\pi\sigma^2}\right)^{m/2} \exp\!\left(-\frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)\right)$$

where $m = 2^{k-p}$ is the number of runs.
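To make the product and matrix forms concrete, this sketch (all sizes and parameter values are invented for illustration) evaluates both expressions on simulated data and checks that they agree:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 8, 3                               # 8 runs, 3 coefficients (illustrative)
X = rng.choice([-1.0, 1.0], size=(m, p))  # +/-1 coded design matrix
beta = np.array([1.0, -0.5, 0.25])
sigma = 0.7
Y = X @ beta + rng.normal(0.0, sigma, m)

# product form: one normal density per run
dens = (2 * np.pi * sigma**2) ** -0.5 * np.exp(-(Y - X @ beta) ** 2 / (2 * sigma**2))
L_prod = dens.prod()

# matrix form: (2 pi sigma^2)^(-m/2) exp(-(Y - Xb)'(Y - Xb) / (2 sigma^2))
r = Y - X @ beta
L_mat = (2 * np.pi * sigma**2) ** (-m / 2) * np.exp(-(r @ r) / (2 * sigma**2))

assert np.isclose(L_prod, L_mat)  # the two forms are identical
```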
Bayesian Paradigm
Unlike in classical inference, we assume the parameters, Θ, are random variables that have a prior distribution, f_Θ(θ), rather than being fixed unknown constants.
In classical inference, we estimate θ by maximizing the likelihood L(θ | y).
Estimation using the Bayesian approach relies on updating our prior distribution for Q after collecting our data y. The posterior density, by an application of Bayes rule, is proportional to the familiar data density and the prior density:
$$f_{\Theta\mid Y}(\theta \mid y) \propto f_{Y\mid\Theta}(y \mid \theta)\, f_{\Theta}(\theta)$$
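A minimal discrete illustration of this proportionality (the two models and all probabilities below are invented for the example):

```python
# posterior ∝ likelihood × prior, then normalize
prior = {"M1": 0.5, "M2": 0.5}   # prior model probabilities (illustrative)
lik = {"M1": 0.2, "M2": 0.8}     # f(y | model) at the observed y (illustrative)

unnorm = {m: lik[m] * prior[m] for m in prior}
Z = sum(unnorm.values())         # normalizing constant f(y)
post = {m: unnorm[m] / Z for m in prior}

assert abs(post["M2"] - 0.8) < 1e-12  # equal priors: posterior follows the likelihood
```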
The Bayes estimate of Θ minimizes the Bayes risk, the expected value (with respect to the prior) of the loss function.
Under squared error loss, the Bayes estimate is the mean of the posterior distribution:
$$\hat{\theta}(y) = E_{\Theta\mid Y}\!\left[\Theta \mid Y = y\right]$$
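For a conjugate normal-normal example (prior and data values below are illustrative), the posterior mean, and hence the Bayes estimate under squared error loss, is a precision-weighted average of the prior mean and the sample mean:

```python
import numpy as np

mu0, tau0 = 0.0, 2.0            # theta ~ N(mu0, tau0^2) prior (illustrative)
sigma = 1.0                     # y_i | theta ~ N(theta, sigma^2)
y = np.array([1.2, 0.8, 1.5, 1.0])
n = len(y)

# posterior precision and mean of the (normal) posterior
prec = 1 / tau0**2 + n / sigma**2
theta_hat = (mu0 / tau0**2 + y.sum() / sigma**2) / prec

# the Bayes estimate sits between the prior mean (0) and the sample mean
assert 0.0 < theta_hat < y.mean()
```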
Bayesian Model Selection in Factorial Designs
The Bayesian prior for models is quite straightforward. If r of the n candidate effects are in the model, each active independently with prior probability π, then:

$$f(M) = C\,\pi^{r}(1-\pi)^{n-r} = C\,(1-\pi)^{n}\left(\frac{\pi}{1-\pi}\right)^{r}$$
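The prior odds interpretation can be checked directly: adding one more active effect multiplies the model prior by π/(1−π). A small sketch (the values of n and π are illustrative):

```python
import math

def model_prior(r, n, pi, C=1.0):
    """Prior mass of a model with r active effects out of n candidates."""
    return C * pi**r * (1 - pi) ** (n - r)

n, pi = 15, 0.25
ratio = model_prior(3, n, pi) / model_prior(2, n, pi)
# each extra active effect costs a factor pi/(1-pi) = 1/3 in prior probability
assert math.isclose(ratio, pi / (1 - pi))
```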
Since we're using a Bayesian approach, we need priors for β and σ as well:

$$\beta_0 \sim N\!\left(0,\ \sigma^2/\varepsilon\right), \quad \varepsilon = 10^{-6}$$
$$\beta_j \sim N\!\left(0,\ \gamma^2\sigma^2\right)$$
$$\sigma \sim g(\sigma), \quad g(\sigma) \propto \sigma^{-a}$$
For non-orthogonal designs, it's common to use Zellner's g-prior for β:

$$\beta \sim N\!\left(0,\ \gamma^2\sigma^2\,(X'X)^{-1}\right)$$

Note that we did not assign priors to γ or π.
We can combine f(β, σ, M) and f(Y | β, σ, M) to obtain the full likelihood L(β, σ, M, Y):

$$L(\beta,\sigma,M,Y) = C\left(\frac{\pi}{1-\pi}\right)^{r}\left(\frac{1}{\gamma}\right)^{n-1}\left(\frac{1}{2\pi\sigma^2}\right)^{(n+m+a)/2} \times \exp\!\left(-\frac{1}{2\sigma^2}\,Q(\beta)\right)$$
$$Q(\beta) = (Y - X\beta)'(Y - X\beta) + \beta'\Gamma\beta, \qquad \Gamma = \begin{pmatrix} \varepsilon & 0 \\ 0 & \frac{1}{\gamma^2} I_{n-1} \end{pmatrix}$$
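A short sketch of assembling Γ and evaluating Q(β); the dimensions and hyperparameter values (ε, γ) are illustrative:

```python
import numpy as np

n = 5                         # intercept + 4 effect coefficients (illustrative)
eps, gamma = 1e-6, 2.0
# Gamma: near-zero penalty on the intercept, 1/gamma^2 on each effect
Gamma = np.diag([eps] + [1 / gamma**2] * (n - 1))

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(8), rng.choice([-1.0, 1.0], size=(8, n - 1))])
Y = rng.normal(size=8)
beta = rng.normal(size=n)

# Q(beta) = (Y - X beta)'(Y - X beta) + beta' Gamma beta
r = Y - X @ beta
Q = r @ r + beta @ Gamma @ beta
assert Q >= r @ r  # the prior term only adds a ridge-like penalty
```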
Our goal is to derive the posterior distribution of M given Y, which first requires integrating out b and s.
$$L(M \mid Y) \propto L(M,Y) = \int_0^{\infty}\!\!\int_{\mathbb{R}^n} L(\beta,\sigma,M,Y)\,d\beta\,d\sigma = \frac{C\left(\frac{\pi}{1-\pi}\right)^{r}\left(\frac{1}{\gamma}\right)^{n-1}\left|X'X+\Gamma\right|^{-1/2}}{Q\!\left((X'X+\Gamma)^{-1}X'Y\right)^{(n-1+a)/2}}$$
The first term is a penalty for model complexity: with π < 1/2, each additional active effect shrinks the prior odds factor.
The second term is a measure of model fit: a smaller Q yields a larger L(M | Y).
π and γ are still present. We will fix π; the method is robust to the choice of π.
γ is selected to minimize the probability of no active factors.
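In code, the integrated expression above is best evaluated on the log scale for numerical stability. The sketch below follows the displayed formula term by term; the hyperparameter defaults (π, γ, a, ε) are illustrative choices, not Box and Meyer's recommendations:

```python
import numpy as np

def log_model_score(X, Y, r, pi=0.25, gamma=2.0, a=1.0, eps=1e-6):
    """log L(M, Y) up to an additive constant, per the displayed formula."""
    n = X.shape[1]                       # number of coefficients in this model
    Gamma = np.diag([eps] + [1 / gamma**2] * (n - 1))
    A = X.T @ X + Gamma
    beta_hat = np.linalg.solve(A, X.T @ Y)
    resid = Y - X @ beta_hat
    Q = resid @ resid + beta_hat @ Gamma @ beta_hat
    _, logdet = np.linalg.slogdet(A)
    return (r * np.log(pi / (1 - pi))    # prior odds term (pi/(1-pi))^r
            - (n - 1) * np.log(gamma)    # (1/gamma)^(n-1)
            - 0.5 * logdet               # |X'X + Gamma|^(-1/2)
            - 0.5 * (n - 1 + a) * np.log(Q))

# toy check on a 2^2 design with one truly active effect (simulated data)
rng = np.random.default_rng(3)
base = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
Y = 1.5 * base[:, 0] + rng.normal(0, 0.3, 4)
XA = np.column_stack([np.ones(4), base[:, 0]])   # model with effect A
XB = np.column_stack([np.ones(4), base[:, 1]])   # model with effect B
assert log_model_score(XA, Y, r=1) > log_model_score(XB, Y, r=1)
```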
With L(M|Y) in hand, we can evaluate P(M_i | Y) for every M_i, for any prior choice of π, provided the number of models is not burdensome.
This is in part why we assume eligible M_i include only lower-order effects.
Greedy search or MCMC algorithms are used to select models when they cannot be itemized.
Selection criteria include the Bayes factor and the Schwarz criterion, also known as the Bayesian Information Criterion (BIC).
Refer to the R package BMA and its bic.glm function for fitting more general models.
For each effect, we sum the probabilities of all M_i that contain that effect to obtain a marginal posterior probability for that effect.
These marginal probabilities are relatively robust to the choice of π.
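To illustrate the summation (this is not Box and Meyer's exact computation: the per-model score below is a crude BIC-style surrogate, and the data are simulated), the sketch enumerates all models on a 2^3 design and sums posterior probabilities over the models containing each effect:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(42)
runs = np.array([[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)], float)
Y = 2.0 * runs[:, 0] + rng.normal(0, 0.5, 8)   # only effect A is truly active
pi, labels = 0.25, ["A", "B", "C"]

def score(cols):
    """Model prior times a BIC-style stand-in for the marginal likelihood."""
    X = np.column_stack([np.ones(8)] + [runs[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = ((Y - X @ beta) ** 2).sum()
    bic = 8 * np.log(rss / 8) + X.shape[1] * np.log(8)
    return pi ** len(cols) * (1 - pi) ** (3 - len(cols)) * np.exp(-0.5 * bic)

models = [c for r in range(4) for c in combinations(range(3), r)]
w = np.array([score(c) for c in models])
post = w / w.sum()                              # P(M | Y) over all 8 models

# marginal probability of each effect: sum P(M|Y) over models containing it
marg = {labels[j]: sum(p for p, c in zip(post, models) if j in c) for j in range(3)}
assert marg["A"] > max(marg["B"], marg["C"])    # effect A should dominate
```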
Case Study
Violin data* (2^4 factorial design with n = 11 replications)
Response: Decibels
Factors:
– A: Pressure (Low/High)
– B: Placement (Near/Far)
– C: Angle (Low/High)
– D: Speed (Low/High)
*Carla Padgett, STAT 706 taught by Don Edwards