Bayesian Model Selection in Factorial Designs
Seminal work is by Box and Meyer: an intuitive formulation and analytical approach, but the devil is in the details!
We will look at the simplifying assumptions as we step through Box and Meyer's approach.
Bayesian model selection has been one of the hottest areas in statistics for several years.
There are up to $2^{2^{k-p}-1}$ possible (fractional) factorial models (one for each subset of the $2^{k-p}-1$ candidate effects), denoted as a set $\{M_l\}$.
To simplify later calculations, we usually assume that the only active effects are main effects, two-way interactions, or three-way interactions.
– This assumption is already in place for low-resolution fractional factorials.
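As a rough illustration of why the low-order restriction helps, the sketch below (with k = 4 and no fractionation, chosen purely for illustration) counts the candidate effects of a full 2^4 design and compares the unrestricted model space to the space restricted to effects of order three or less:

```python
from itertools import combinations

# full 2^4 design: 2^4 - 1 = 15 candidate effects (illustrative choice of k)
factors = "ABCD"
effects = [c for r in range(1, len(factors) + 1)
           for c in combinations(factors, r)]
assert len(effects) == 2 ** len(factors) - 1  # 15 effects

# unrestricted model space: every subset of the candidate effects
n_all = 2 ** len(effects)

# restricted space: only main effects, two-way and three-way interactions
low_order = [e for e in effects if len(e) <= 3]
n_low = 2 ** len(low_order)

print(n_all, n_low)  # 32768 vs 16384 candidate models
```

Hierarchy constraints (interactions only entering with their parent main effects) would shrink the space much further; this sketch counts only the order restriction.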
Each Ml denotes a set of active effects (both main effects and interactions) in a hierarchical model.
We will use $X_{ik} = +1$ for the high level of effect k and $X_{ik} = -1$ for the low level of effect k.
We will assume that, given model M, the response variables follow a linear model with normal errors.
X and β are model-specific, but we will use a saturated model in what follows:

$$Y \sim N(X\beta,\ \sigma^2 I)$$
The likelihood for the data given the parameters has the following form:

$$L(\beta,\sigma,Y) = \prod_{i=1}^{2^{k-p}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}\left(Y_i - X_i'\beta\right)^2\right) = \left(\frac{1}{2\pi\sigma^2}\right)^{m/2} \exp\!\left(-\frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)\right)$$

where $m = 2^{k-p}$ is the number of runs.
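To make the product and matrix forms concrete, this sketch (all sizes and parameter values are invented for illustration) evaluates both expressions on simulated data and checks that they agree:

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 8, 3                               # 8 runs, 3 coefficients (illustrative)
X = rng.choice([-1.0, 1.0], size=(m, p))  # +/-1 coded design matrix
beta = np.array([1.0, -0.5, 0.25])
sigma = 0.7
Y = X @ beta + rng.normal(0.0, sigma, m)

# product form: one normal density per run
dens = (2 * np.pi * sigma**2) ** -0.5 * np.exp(-(Y - X @ beta) ** 2 / (2 * sigma**2))
L_prod = dens.prod()

# matrix form: (2 pi sigma^2)^(-m/2) exp(-(Y - Xb)'(Y - Xb) / (2 sigma^2))
r = Y - X @ beta
L_mat = (2 * np.pi * sigma**2) ** (-m / 2) * np.exp(-(r @ r) / (2 * sigma**2))

assert np.isclose(L_prod, L_mat)  # the two forms are identical
```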
Bayesian Paradigm
Unlike in classical inference, we assume the parameters, Θ, are random variables that have a prior distribution, f_Θ(θ), rather than being fixed unknown constants.
In classical inference, we estimate θ by maximizing the likelihood L(θ | y).
Estimation using the Bayesian approach relies on updating our prior distribution for Q after collecting our data y. The posterior density, by an application of Bayes rule, is proportional to the familiar data density and the prior density:
$$f_{\Theta\mid Y}(\theta \mid y) \propto f_{Y\mid\Theta}(y \mid \theta)\, f_{\Theta}(\theta)$$
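A minimal discrete illustration of this proportionality (the two models and all probabilities below are invented for the example):

```python
# posterior ∝ likelihood × prior, then normalize
prior = {"M1": 0.5, "M2": 0.5}   # prior model probabilities (illustrative)
lik = {"M1": 0.2, "M2": 0.8}     # f(y | model) at the observed y (illustrative)

unnorm = {m: lik[m] * prior[m] for m in prior}
Z = sum(unnorm.values())         # normalizing constant f(y)
post = {m: unnorm[m] / Z for m in prior}

assert abs(post["M2"] - 0.8) < 1e-12  # equal priors: posterior follows the likelihood
```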
The Bayes estimate of Θ minimizes the Bayes risk, the expected value (with respect to the prior) of the loss function.
Under squared error loss, the Bayes estimate is the mean of the posterior distribution:
$$\hat{\theta}(y) = E_{\Theta\mid Y}\!\left[\Theta \mid Y = y\right]$$
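For a conjugate normal-normal example (prior and data values below are illustrative), the posterior mean, and hence the Bayes estimate under squared error loss, is a precision-weighted average of the prior mean and the sample mean:

```python
import numpy as np

mu0, tau0 = 0.0, 2.0            # theta ~ N(mu0, tau0^2) prior (illustrative)
sigma = 1.0                     # y_i | theta ~ N(theta, sigma^2)
y = np.array([1.2, 0.8, 1.5, 1.0])
n = len(y)

# posterior precision and mean of the (normal) posterior
prec = 1 / tau0**2 + n / sigma**2
theta_hat = (mu0 / tau0**2 + y.sum() / sigma**2) / prec

# the Bayes estimate sits between the prior mean (0) and the sample mean
assert 0.0 < theta_hat < y.mean()
```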
Bayesian Model Selection in Factorial Designs
The Bayesian prior for models is quite straightforward. If r of the n candidate effects are in the model, each active independently with prior probability π, then:

$$f(M) = C\,\pi^{r}(1-\pi)^{n-r} = C\,(1-\pi)^{n}\left(\frac{\pi}{1-\pi}\right)^{r}$$
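The prior odds interpretation can be checked directly: adding one more active effect multiplies the model prior by π/(1−π). A small sketch (the values of n and π are illustrative):

```python
import math

def model_prior(r, n, pi, C=1.0):
    """Prior mass of a model with r active effects out of n candidates."""
    return C * pi**r * (1 - pi) ** (n - r)

n, pi = 15, 0.25
ratio = model_prior(3, n, pi) / model_prior(2, n, pi)
# each extra active effect costs a factor pi/(1-pi) = 1/3 in prior probability
assert math.isclose(ratio, pi / (1 - pi))
```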
Since we're using a Bayesian approach, we need priors for β and σ as well:

$$\beta_0 \sim N\!\left(0,\ \sigma^2/\varepsilon\right), \quad \varepsilon = 10^{-6}$$
$$\beta_j \sim N\!\left(0,\ \gamma^2\sigma^2\right)$$
$$\sigma \sim g(\sigma), \quad g(\sigma) \propto \sigma^{-a}$$
For non-orthogonal designs, it's common to use Zellner's g-prior for β:

$$\beta \sim N\!\left(0,\ \gamma^2\sigma^2\,(X'X)^{-1}\right)$$

Note that we did not assign priors to γ or π.
We can combine f(β, σ, M) and f(Y | β, σ, M) to obtain the full likelihood L(β, σ, M, Y):

$$L(\beta,\sigma,M,Y) = C\left(\frac{\pi}{1-\pi}\right)^{r}\left(\frac{1}{\gamma}\right)^{n-1}\left(\frac{1}{2\pi\sigma^2}\right)^{(n+m+a)/2} \times \exp\!\left(-\frac{1}{2\sigma^2}\,Q(\beta)\right)$$
$$Q(\beta) = (Y - X\beta)'(Y - X\beta) + \beta'\Gamma\beta, \qquad \Gamma = \begin{pmatrix} \varepsilon & 0 \\ 0 & \frac{1}{\gamma^2} I_{n-1} \end{pmatrix}$$
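A short sketch of assembling Γ and evaluating Q(β); the dimensions and hyperparameter values (ε, γ) are illustrative:

```python
import numpy as np

n = 5                         # intercept + 4 effect coefficients (illustrative)
eps, gamma = 1e-6, 2.0
# Gamma: near-zero penalty on the intercept, 1/gamma^2 on each effect
Gamma = np.diag([eps] + [1 / gamma**2] * (n - 1))

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(8), rng.choice([-1.0, 1.0], size=(8, n - 1))])
Y = rng.normal(size=8)
beta = rng.normal(size=n)

# Q(beta) = (Y - X beta)'(Y - X beta) + beta' Gamma beta
r = Y - X @ beta
Q = r @ r + beta @ Gamma @ beta
assert Q >= r @ r  # the prior term only adds a ridge-like penalty
```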
Our goal is to derive the posterior distribution of M given Y, which first requires integrating out b and s.
$$L(M \mid Y) \propto L(M,Y) = \int_0^{\infty}\!\!\int_{\mathbb{R}^n} L(\beta,\sigma,M,Y)\,d\beta\,d\sigma = \frac{C\left(\frac{\pi}{1-\pi}\right)^{r}\left(\frac{1}{\gamma}\right)^{n-1}\left|X'X+\Gamma\right|^{-1/2}}{Q\!\left((X'X+\Gamma)^{-1}X'Y\right)^{(n-1+a)/2}}$$
The first term is a penalty for model complexity: with π < 1/2, each additional active effect shrinks the prior odds factor.
The second term is a measure of model fit: a smaller Q yields a larger L(M | Y).
π and γ are still present. We will fix π; the method is robust to the choice of π.
γ is selected to minimize the probability of no active factors.
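In code, the integrated expression above is best evaluated on the log scale for numerical stability. The sketch below follows the displayed formula term by term; the hyperparameter defaults (π, γ, a, ε) are illustrative choices, not Box and Meyer's recommendations:

```python
import numpy as np

def log_model_score(X, Y, r, pi=0.25, gamma=2.0, a=1.0, eps=1e-6):
    """log L(M, Y) up to an additive constant, per the displayed formula."""
    n = X.shape[1]                       # number of coefficients in this model
    Gamma = np.diag([eps] + [1 / gamma**2] * (n - 1))
    A = X.T @ X + Gamma
    beta_hat = np.linalg.solve(A, X.T @ Y)
    resid = Y - X @ beta_hat
    Q = resid @ resid + beta_hat @ Gamma @ beta_hat
    _, logdet = np.linalg.slogdet(A)
    return (r * np.log(pi / (1 - pi))    # prior odds term (pi/(1-pi))^r
            - (n - 1) * np.log(gamma)    # (1/gamma)^(n-1)
            - 0.5 * logdet               # |X'X + Gamma|^(-1/2)
            - 0.5 * (n - 1 + a) * np.log(Q))

# toy check on a 2^2 design with one truly active effect (simulated data)
rng = np.random.default_rng(3)
base = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
Y = 1.5 * base[:, 0] + rng.normal(0, 0.3, 4)
XA = np.column_stack([np.ones(4), base[:, 0]])   # model with effect A
XB = np.column_stack([np.ones(4), base[:, 1]])   # model with effect B
assert log_model_score(XA, Y, r=1) > log_model_score(XB, Y, r=1)
```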
With L(M|Y) in hand, we can evaluate P(M_i | Y) for every M_i, for any prior choice of π, provided the number of models is not burdensome.
This is in part why we assume eligible M_i include only lower-order effects.
Greedy search or MCMC algorithms are used to select models when they cannot be itemized.
Selection criteria include the Bayes factor and the Schwarz criterion, also known as the Bayesian Information Criterion (BIC).
Refer to the R package BMA and its bic.glm function for fitting more general models.
For each effect, we sum the probabilities of all M_i that contain that effect to obtain a marginal posterior probability for that effect.
These marginal probabilities are relatively robust to the choice of π.
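To illustrate the summation (this is not Box and Meyer's exact computation: the per-model score below is a crude BIC-style surrogate, and the data are simulated), the sketch enumerates all models on a 2^3 design and sums posterior probabilities over the models containing each effect:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(42)
runs = np.array([[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)], float)
Y = 2.0 * runs[:, 0] + rng.normal(0, 0.5, 8)   # only effect A is truly active
pi, labels = 0.25, ["A", "B", "C"]

def score(cols):
    """Model prior times a BIC-style stand-in for the marginal likelihood."""
    X = np.column_stack([np.ones(8)] + [runs[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = ((Y - X @ beta) ** 2).sum()
    bic = 8 * np.log(rss / 8) + X.shape[1] * np.log(8)
    return pi ** len(cols) * (1 - pi) ** (3 - len(cols)) * np.exp(-0.5 * bic)

models = [c for r in range(4) for c in combinations(range(3), r)]
w = np.array([score(c) for c in models])
post = w / w.sum()                              # P(M | Y) over all 8 models

# marginal probability of each effect: sum P(M|Y) over models containing it
marg = {labels[j]: sum(p for p, c in zip(post, models) if j in c) for j in range(3)}
assert marg["A"] > max(marg["B"], marg["C"])    # effect A should dominate
```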
Case Study
Violin data* (2^4 factorial design with n = 11 replications)
Response: Decibels
Factors:
– A: Pressure (Low/High)
– B: Placement (Near/Far)
– C: Angle (Low/High)
– D: Speed (Low/High)
*Carla Padgett, STAT 706 taught by Don Edwards