What is Empirical Bayes? A brief history, well-known examples & connections.
Berkeley Stanford Joint Colloquium (BSJC) in Statistics
Leonid Pekelis
Apr 24, 2012
A brief (incomplete, biased) history.
- von Mises (1940s)
- Robbins (1955) - "An Empirical Bayes Approach to Statistics"
- Efron & Morris (1977) - "Stein's Paradox in Statistics"
  - Stein (1956), James & Stein (1961)
- Morris (1983) - "Parametric Empirical Bayes Inference: Theory & Applications"
- Casella (1985) - "An Introduction to Empirical Bayes Data Analysis"
- Efron, Tibshirani, Storey & Tusher (2001) - "Empirical Bayes Analysis of a Microarray Experiment"
- Benjamini & Hochberg (1995)
- Efron (2010) - "Large Scale Inference"
A brief history.
Robbins - "Our reason for doing this ... is that we hope for large N to be able to extract information about the [prior] from the values which have been observed, hopefully in such a way that [EBayes] will be close to the optimal but unknown [Bayes] ..."
Efron - "Empirical Bayes blurs the line between testing and estimation as well as between frequentism and Bayesianism."
Morris - "Should statisticians use empirical Bayes modeling in most multiparameter inference problems? Probably not."
A brief history.
- Non-Parametric Empirical Bayes
  - leaves the prior completely unspecified (Robbins - 1955, Tweedie's formula)
- Parametric Empirical Bayes
  - specifies a family of priors and estimates its hyperparameters (James-Stein)
Robbins Original Formulation
X ∈ 𝒳, with a σ-finite measure ν(x)
µ ∈ Q, µ ∼ G(µ)
when the parameter is µ, X|µ ∼ f_µ
for any loss L(t, µ), with t a decision function:
R(t, G) = E_G E_{f_µ}[L(t(X), µ)]
R(G) = min_t R(t, G)
Robbins Original Formulation
R(G) = min_t R(t, G)
What if G is not known? But you have samples (x_i, µ_i), i = 1, ..., N, iid with µ_i ∼ G, x_i ∼ f_{µ_i}. Then you can estimate t_N(x) = t_N(x; x_1, ..., x_N) and get
R_N({t_N}, G) = E_x R(t_N, G) ≥ R(G)
Definition: T = {t_N} is asymptotically optimal relative to G if
lim_{N→∞} R_N(T, G) = R(G)
We want the above to hold on a class 𝒢 containing the true G.
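The canonical example from Robbins (1955) makes this concrete: for x|µ ∼ Poisson(µ), the Bayes rule E(µ|x) = (x+1) f(x+1)/f(x) depends on G only through the marginal f, so the observed values can stand in for the unknown prior. A minimal sketch, with a Gamma prior chosen purely for illustration (it gives a closed-form Bayes rule to compare against):

```python
import numpy as np

rng = np.random.default_rng(0)

# Robbins' setting: mu_i ~ G (here Gamma, for illustration), x_i | mu_i ~ Poisson(mu_i).
N = 100_000
mu = rng.gamma(shape=2.0, scale=1.5, size=N)
x = rng.poisson(mu)

# Bayes rule: E(mu | x) = (x + 1) * f(x + 1) / f(x), f the marginal pmf.
# The nonparametric EB estimate replaces f by empirical frequencies.
counts = np.bincount(x, minlength=x.max() + 2)
freq = counts / N

def robbins(k):
    """Nonparametric EB estimate of E(mu | x = k)."""
    return (k + 1) * freq[k + 1] / freq[k]

# For a Gamma(a, scale=s) prior the true Bayes rule is (k + a) * s / (s + 1).
for k in [0, 1, 3, 5]:
    print(k, robbins(k), (k + 2.0) * 1.5 / 2.5)
```

For large N the empirical frequencies are close to f, so the EB rule tracks the unknown Bayes rule, which is exactly the asymptotic-optimality claim above.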
Example: James-Stein Estimator
Suppose G = N(0, A) and f_µ = N(µ, 1).
Then E(µ|x) = (1 − 1/(A+1)) x
And E((N−2)/S) = 1/(A+1), where S = ‖x‖².
So µ̂_JS(x) = (1 − (N−2)/S) x
Theorem: For N ≥ 3, µ̂_JS everywhere dominates µ̂_MLE for L(t, µ) = ‖t − µ‖².
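The theorem is easy to see by simulation. A minimal sketch under squared-error loss, with a fixed µ drawn once from N(0, A) (dominance holds for every fixed µ, so any choice works; the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, A, reps = 10, 1.0, 2000

# A fixed mean vector; JS dominance holds for *every* mu when N >= 3.
mu = rng.normal(0.0, np.sqrt(A), size=N)

risk_mle = risk_js = 0.0
for _ in range(reps):
    x = mu + rng.normal(size=N)          # x | mu ~ N(mu, I)
    S = x @ x                            # S = ||x||^2
    js = (1 - (N - 2) / S) * x           # James-Stein shrinkage
    risk_mle += np.sum((x - mu) ** 2)    # MLE is x itself
    risk_js += np.sum((js - mu) ** 2)

print(risk_mle / reps, risk_js / reps)   # JS risk should come out smaller
```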
Example: Benjamini-Hochberg / FDR
BH procedure: Given p-values p_1, ..., p_N:
Fix some q ∈ (0, 1), and take i_max as the largest index such that p_(i) ≤ (i/N) q.
Reject H_0(i) for i ≤ i_max.
Theorem: If p_i ∼ U(0, 1) under H_0i, and the p_i are independent for all i, then
E(#FR / #R) ≤ q
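The step-up rule above can be sketched as follows (the helper name `bh_reject` and the example p-values are hypothetical):

```python
import numpy as np

def bh_reject(p, q):
    """Benjamini-Hochberg step-up: reject the hypotheses whose sorted
    p-values satisfy p_(i) <= (i / N) * q up to the largest such i."""
    p = np.asarray(p)
    N = len(p)
    order = np.argsort(p)
    thresh = (np.arange(1, N + 1) / N) * q
    below = np.nonzero(p[order] <= thresh)[0]
    reject = np.zeros(N, dtype=bool)
    if below.size:
        i_max = below.max()                  # largest index on/below the line
        reject[order[: i_max + 1]] = True    # reject all ranked <= i_max
    return reject

# Tiny worked example (hypothetical p-values):
p = [0.001, 0.008, 0.039, 0.041, 0.30, 0.90]
print(bh_reject(p, q=0.05))   # rejects the two smallest p-values
```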
Example: BH / FDR - Empirical Bayes
Suppose the p_i correspond to test statistics z_i.
Specifically µ ∼ π_0 1(µ = 0) + π_1 g(µ), and x|µ ∼ N(µ, 1). And test H_0i : µ_i = 0.
Define F̄ as the empirical distribution of the z_i, F_0 as the common null distribution, and F̂dr(z) = π_0 F_0(z) / F̄(z).
Corollary: The procedure rejecting for i ≤ i_max, with i_max the largest index such that F̂dr(z_(i)) < q, controls the false discovery proportion at level q as before.
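A minimal sketch of the corollary's rule, taking F_0 = Φ (left tail) and a conservative π_0 = 1; the function name and the simulated mixture are illustrative only:

```python
import numpy as np
from math import erf

def phi(z):
    """Standard normal cdf, applied elementwise."""
    return np.array([0.5 * (1 + erf(v / 2 ** 0.5)) for v in np.atleast_1d(z)])

def eb_fdr_reject(z, q, pi0=1.0):
    """Fdr_hat(z_(i)) = pi0 * F0(z_(i)) / Fbar(z_(i)), with Fbar(z_(i)) = i/N;
    reject the i_max smallest z, i_max the largest i with Fdr_hat < q."""
    z = np.asarray(z, dtype=float)
    N = len(z)
    order = np.argsort(z)                      # ascending: smallest z first
    fdr_hat = pi0 * phi(z[order]) / (np.arange(1, N + 1) / N)
    reject = np.zeros(N, dtype=bool)
    below = np.nonzero(fdr_hat < q)[0]
    if below.size:
        reject[order[: below.max() + 1]] = True
    return reject

# Hypothetical two-groups sample: mostly null N(0,1), a few shifted left.
rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(0, 1, 95), rng.normal(-5, 1, 5)])
print(eb_fdr_reject(z, q=0.1).sum(), "rejections")
```

Note that F̂dr(z_(i)) < q is algebraically the BH condition p_(i) ≤ (i/N) q/π_0, which is why the corollary inherits the BH guarantee.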
Example: Tweedie's formula
µ ∼ g(µ), z|µ ∼ f_µ(z) = e^{µz − ψ(µ)} f_0(z)
Bayes rule: g(µ|z) = f_µ(z) g(µ) / f(z), where f(z) = ∫ f_µ(z) g(µ) dµ
⇒ g(µ|z) = e^{zµ − λ(z)} (g(µ) e^{−ψ(µ)}), λ(z) = log(f(z) / f_0(z))
⇒ E(µ|z) = λ′(z), Var(µ|z) = λ″(z)
⇒ µ|z ∼ (l′(z) − l′_0(z), l″(z) − l″_0(z)), where l(z) = log f(z), l_0(z) = log f_0(z)
For z|µ ∼ N(µ, 1): µ|z ∼ (z + l′(z), 1 + l″(z)).
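In the conjugate normal case the formula can be checked in closed form: with µ ∼ N(0, A) the marginal f is N(0, A+1), so l′(z) = −z/(A+1) and z + l′(z) recovers the usual posterior mean Az/(A+1). A small numeric sketch of that check:

```python
import numpy as np

A = 2.0                     # prior variance: mu ~ N(0, A), z | mu ~ N(mu, 1)
z = np.linspace(-3, 3, 7)

# Marginal is N(0, A + 1), so l(z) = -z^2 / (2 (A + 1)) + const
# and l'(z) = -z / (A + 1). Tweedie: E(mu | z) = z + l'(z).
tweedie = z + (-z / (A + 1))

# Conjugate-normal posterior mean, for comparison: A z / (A + 1).
conjugate = A * z / (A + 1)

print(np.allclose(tweedie, conjugate))   # the two formulas agree exactly
```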
Example: Tweedie's formula
The Empirical Bayes part comes from assuming
f(z) = e^{∑_{j=0}^J β_j z^j}
Then β̂ can be estimated by a Poisson GLM (Lindsey's Method).
And finally l̂′(z) = ∑_{j=1}^J j β̂_j z^{j−1}.
Theorem: Under the above assumptions, Lindsey's Method produces nearly unbiased estimates of f.
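A minimal sketch of Lindsey's method under these assumptions (degree J = 2, a hand-rolled Newton/IRLS loop in place of a GLM library; all names illustrative). Bin the z values, treat the bin counts as Poisson, fit log f as a polynomial, then plug l̂′ into Tweedie's formula:

```python
import numpy as np

rng = np.random.default_rng(3)

# mu ~ N(0, A), z | mu ~ N(mu, 1); marginally z ~ N(0, A + 1).
A, N, J = 1.0, 5000, 2
z = rng.normal(0.0, np.sqrt(A), N) + rng.normal(0.0, 1.0, N)

# Lindsey's method: bin z, treat bin counts as independent Poisson, and fit
# log f(z) = sum_j beta_j z^j by a Poisson GLM (plain Newton/IRLS here).
K = 60
counts, edges = np.histogram(z, bins=K)
mid = 0.5 * (edges[:-1] + edges[1:])
X = np.vander(mid, J + 1, increasing=True)     # basis columns 1, z, z^2

beta = np.zeros(J + 1)
beta[0] = np.log(counts.mean() + 1.0)          # sensible starting intercept
for _ in range(50):
    w = np.exp(np.clip(X @ beta, -30, 30))     # fitted Poisson means
    beta = beta + np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (counts - w))

def mu_hat(zz):
    """Tweedie estimate z + l_hat'(z), l_hat'(z) = sum_j j beta_j z^(j-1)."""
    return zz + sum(j * beta[j] * zz ** (j - 1) for j in range(1, J + 1))

# With J = 2 this should be close to the exact linear shrinkage A z / (A + 1).
print(mu_hat(2.0), A / (A + 1) * 2.0)
```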
Tweedie & James-Stein
Suppose µ ∼ N(0, A), and take J = 2.
µ̂_Tweedie(z) = (1 − N/S) z
µ̂_JS(z) = (1 − (N−2)/S) z
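The two estimators differ only in N versus N − 2 in the shrinkage factor, which is easy to see on a fixed (hypothetical) data vector:

```python
import numpy as np

# A hypothetical data vector with N = 10 and S = ||z||^2 = 20.
z = np.array([2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, -1.0, 0.0, 0.0])
N, S = len(z), z @ z

mu_tweedie = (1 - N / S) * z        # (1 - 10/20) z = 0.5 z
mu_js = (1 - (N - 2) / S) * z       # (1 - 8/20) z  = 0.6 z
print(mu_tweedie[0], mu_js[0])      # 1.0 and 1.2: Tweedie shrinks harder here
```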
Other Connections
- under the 2-groups model, fdr(z) = π_0 f_0(z) / f(z), and
  −(∂/∂z) log(fdr(z)) = l′(z) − l′_0(z) = E(µ|z)
- F̂dr is similar to, and more general than, Tweedie's formula, as it only requires specification of the null density f_0(z).
- g_BH(µ) = π_0 1(µ = 0) + π_1 g(µ), while g_JS(µ) = N(0, A)
- JS can have N = 10, but Fdr or Tweedie with J > 2 need N ≫ 10
- Suggests a bias/variance tradeoff in empirical Bayes estimates.
The Last Slide
Robbins - "... it is obvious that any theory of statistical inference will find itself in and out of fashion as the winds of doctrine blow. Here, then, are some remarks and references for further reading which I hope will interest my audience in thinking the matter through for themselves."
Me - "Thanks!"