Frequentist vs bayesian statistics resources to help you choose (updated)

Oikos Blog

Posted by: Jeremy Fox | October 11, 2011

Frequentist vs. Bayesian statistics: resources to help you choose (UPDATED)

There are two dominant approaches to statistics. Here, I explain why you need to choose one or the other, and link to resources to help you make your choice.

Most ecologists use the frequentist approach. This approach focuses on P(D|H), the probability of the data, given the hypothesis. That is, this approach treats data as random (if you repeated the study, the data might come out differently), and hypotheses as fixed (the hypothesis is either true or false, and so has a probability of either 1 or 0; you just don’t know for sure which it is). This approach is called frequentist because it’s concerned with the frequency with which one expects to observe the data, given some hypothesis about the world. The P values you see in the “Results” sections of most empirical ecology papers are values of P(D|H) (strictly, the probability of data at least as extreme as those observed, given H), where H is usually some “null” hypothesis.
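Here is a minimal Python sketch that makes P(D|H) concrete. The numbers are made up: suppose a study records 15 “successes” in 20 independent trials, and H is the null hypothesis that the success probability is 0.5. The function names are hypothetical and do not come from any library. The sketch first computes the one-sided P value exactly from the binomial distribution. It then recovers roughly the same number the frequentist way, as the long-run frequency of data at least that extreme across many simulated repeats of the study, with H held fixed and true.

```python
import math
import random

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail_p_value(k_obs: int, n: int, p_null: float) -> float:
    """One-sided P value: the probability, assuming the null hypothesis H
    is true, of observing k_obs or more successes in n trials."""
    return sum(binomial_pmf(k, n, p_null) for k in range(k_obs, n + 1))

def simulated_tail_frequency(k_obs: int, n: int, p_null: float,
                             n_repeats: int = 100_000, seed: int = 1) -> float:
    """Frequentist reading of the same quantity: repeat the hypothetical
    study many times with H true, and return the fraction of repeats in
    which the data come out at least as extreme as those observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        k = sum(rng.random() < p_null for _ in range(n))  # one simulated study
        if k >= k_obs:
            hits += 1
    return hits / n_repeats

# Hypothetical data: 15 successes in 20 trials; H: success probability = 0.5
p_exact = upper_tail_p_value(15, 20, 0.5)
p_sim = simulated_tail_frequency(15, 20, 0.5)
print(f"exact P value: {p_exact:.4f}")  # exact P value: 0.0207
print(f"simulated long-run frequency: {p_sim:.4f}")
```

Notice that the hypothesis never gets a probability here. Only the data vary, from one imagined repeat of the study to the next.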

Bayesian statistical approaches are increasingly common in ecology. Bayesian statistics focuses on P(H|D), the probability of the hypothesis, given the data. That is, this approach treats the data as fixed (these are the only data you have) and hypotheses as random (the hypothesis might be true or false, with some probability between 0 and 1). This approach is called Bayesian because you need to use Bayes’ Theorem to calculate P(H|D).
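For contrast, here is a minimal Bayesian sketch in Python, again with hypothetical numbers. It uses the same 15 successes in 20 trials and two competing hypotheses about the success probability (H0: p = 0.5 and H1: p = 0.75). Each hypothesis gets an equal prior probability of 0.5. Both the priors and the function names are assumptions made for illustration. Bayes’ Theorem, P(H|D) = P(D|H) P(H) / P(D), turns the likelihoods P(D|H) into posterior probabilities P(H|D). Here P(D) is the sum of P(D|H) P(H) over the hypotheses being compared.

```python
import math

def binomial_likelihood(k: int, n: int, p: float) -> float:
    """P(D | H): probability of exactly k successes in n trials
    if the hypothesis says the success probability is p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior(priors: list[float], likelihoods: list[float]) -> list[float]:
    """Bayes' Theorem: P(H_i | D) = P(D | H_i) P(H_i) / sum_j P(D | H_j) P(H_j)."""
    joint = [pr * lik for pr, lik in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(D), the normalizing constant
    return [j / evidence for j in joint]

# Hypothetical data and hypotheses (illustrative only)
k, n = 15, 20
hypotheses = {"H0: p = 0.5": 0.5, "H1: p = 0.75": 0.75}
priors = [0.5, 0.5]  # assumed prior degrees of belief in H0 and H1
likelihoods = [binomial_likelihood(k, n, p) for p in hypotheses.values()]
post = posterior(priors, likelihoods)

for name, prob in zip(hypotheses, post):
    print(f"P({name} | D) = {prob:.3f}")
# P(H0: p = 0.5 | D) = 0.068
# P(H1: p = 0.75 | D) = 0.932
```

The posteriors depend on the priors. Rerun the sketch with priors = [0.9, 0.1] and P(H0|D) rises to about 0.40. The posteriors are also defined only relative to the set of hypotheses you chose to compare.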

At a broad-brush verbal level, both these approaches sound eminently reasonable, so much so that the differences between them sound subtle to the point of unimportance. A frequentist basically says, “The world is a certain way, but I don’t know how it is. Further, I can’t necessarily tell how the world is just by collecting data, because data are always finite and noisy. So I’ll use statistics to line up the alternative possibilities, and see which ones the data more or less rule out.” A Bayesian basically says, “I don’t know how the world is. All I have to go on is finite data. So I’ll use statistics to infer something from those data about how probable different possible states of the world are.” And indeed, there are contexts in which Bayesian and frequentist statistics easily coexist.

But there are many contexts in which they don’t; frequentist and Bayesian approaches represent fundamentally different philosophies, with deeply conflicting goals. Perhaps the deepest and most important conflict has to do with alternative interpretations of what “probability” means. These alternative interpretations arise because it often doesn’t make sense to talk about possible states of the world in terms of frequencies. For instance, there’s either life on Mars, or there’s not. We don’t know for sure which it is, but we do know for sure that it’s one or the other. So if you insist on trying to put a number on the probability of life on Mars (i.e. the probability that the hypothesis “There is life on Mars” is true), you are forced to drop the frequentist interpretation of probability. A frequentist interprets the word “probability” as meaning “the frequency with which something would happen, in a lengthy series of trials”.

The most common alternative interpretation of “probability” (though not the only one) is as “subjective degree of belief”: the probability that you (personally) attach to a hypothesis is a measure of how strongly you (personally) believe that hypothesis. So a frequentist would never say “There’s probably not life on Mars”, unless she was just speaking loosely and using that phrase as shorthand for “The data are inconsistent with the hypothesis of life on Mars”. But the most common sort of Bayesian would say “There’s probably not life on Mars”, not as a loose way of speaking about Mars, but as a literal and precise way of speaking about his beliefs about Mars. A lot of the choice between frequentist and Bayesian statistics comes down to whether you think science should comprise statements about the world, or statements about our beliefs.

I’m a frequentist. But lots of very smart people aren’t. This post isn’t an argument for or against either philosophy. It’s just to alert you that this philosophical conflict exists, that it is very deep, and that you, as a working scientist, need to be familiar with it in order to make an informed choice of statistical approach.

One thing frequentists and Bayesians agree on is that it’s a bad idea to do “cookbook statistics”, where you just mindlessly choose and follow some statistical “recipe” without worrying about why the recipe works, or even about what it’s trying to cook! I agree with Ellison and Dennis (2010) that ecologists should be “statistically fluent”, although I disagree with them that taking calculus-based technical courses in statistics is the only way to achieve fluency. Note that “fluency” is not at all the same thing as “technical proficiency”. If anything, I think one unfortunate side effect of the increasing popularity of technically sophisticated, computationally intensive statistical approaches in ecology has been to make ecologists even more reluctant to engage with philosophical issues, i.e. less fluent, or else less likely to care about fluency. It seems like there’s a “shut up and calculate the numbers” ethos developing, as if technical proficiency with programming could substitute for thinking about what the numbers mean. Lee Smolin noted a similar trend in fundamental physics.

Unfortunately, even advanced stats textbooks aimed at ecologists mostly don’t bother with more than the most cursory philosophical remarks. For instance, Clark (2007) spends only two pages on philosophy of statistics. And he uses those two pages to argue for the irrelevance of statistical philosophy to the real-world scientist, because longstanding philosophical debates show no sign of definitive resolution! As I’ve noted elsewhere, this is a terrible argument for “pragmatism”, analogous to arguing that debates between liberal and conservative political philosophies are longstanding, and therefore irrelevant to the real-world voter. Bolker (2008) is an admirable exception to this general reluctance of ecological statistics textbooks to grapple with conceptual issues.

So below is some food for thought: a compilation of some interesting and provocative writings I’ve found really helpful in developing my own philosophy of statistics. I encourage you to dip into them.

Note that most of the items I’ve listed assume some basic familiarity with different statistical philosophies, beyond the very brief sketch I gave above. Unfortunately, I have yet to find a really good, freely available, non-technical introduction to alternative philosophies of statistics, pitched at a level suitable for any professional ecologist or grad student. The discussion in Bolker (2008) is the sort of thing I’m thinking of, but it’s part of a book that costs money. Anyone know of anything good?