Oikos Blog
Posted by: Jeremy Fox | October 11, 2011
Frequentist vs. Bayesian statistics: resources to help you choose (UPDATED)
There are two dominant approaches to statistics. Here, I explain why you
need to choose one or the other, and link to resources to help you make
your choice.
Most ecologists use the frequentist approach. This approach focuses on
P(D|H), the probability of the data, given the hypothesis. That is, this
approach treats data as random (if you repeated the study, the data might
come out differently), and hypotheses as fixed (the hypothesis is either true
or false, and so has a probability of either 1 or 0; you just don’t know for
sure which it is). This approach is called frequentist because it’s concerned
with the frequency with which one expects to observe the data, given some
hypothesis about the world. The P values you see in the “Results” sections
of most empirical ecology papers are derived from P(D|H), where H is usually
some “null” hypothesis.
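As a minimal sketch of the frequentist quantity, the code below uses a hypothetical study with 9 “successes” in 10 binomial trials and a null hypothesis that the success probability is 0.5. It computes P(D|H) for the exact data, and also a one-sided P value, which is strictly the probability under H of data at least as extreme as those observed (here, 9 or more successes). The data, the null value, and the function names are illustrative assumptions, not taken from the post.

```python
from math import comb


def binomial_likelihood(k: int, n: int, p: float) -> float:
    """P(D|H): probability of exactly k successes in n trials if the success probability is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """Probability, under the null hypothesis p = p0, of observing k or more successes."""
    return sum(binomial_likelihood(j, n, p0) for j in range(k, n + 1))


# Hypothetical data: 9 successes in 10 trials; null hypothesis p = 0.5
print(binomial_likelihood(9, 10, 0.5))  # 10/1024 = 0.009765625
print(one_sided_p_value(9, 10))         # 11/1024 = 0.0107421875
```

Note that the hypothesis is held fixed throughout; only the possible data vary.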
Bayesian statistical approaches are increasingly common in ecology.
Bayesian statistics focuses on P(H|D), the probability of the hypothesis,
given the data. That is, this approach treats the data as fixed (these are the
only data you have) and hypotheses as random (the hypothesis might be
true or false, with some probability between 0 and 1). This approach is
called Bayesian because you need to use Bayes’ Theorem to calculate
P(H|D).
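As a minimal sketch of how Bayes’ Theorem turns P(D|H) into P(H|D), the code below compares a discrete set of hypotheses using P(H|D) = P(D|H) P(H) / Σ P(D|Hᵢ) P(Hᵢ). The two hypotheses (success probability 0.5 vs 0.8), the equal prior probabilities, and the data (9 successes in 10 trials) are all hypothetical choices for illustration.

```python
from math import comb


def likelihood(k: int, n: int, p: float) -> float:
    """P(D|H): binomial probability of k successes in n trials given success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior(k: int, n: int, hypotheses: dict[str, float],
              priors: dict[str, float]) -> dict[str, float]:
    """P(H|D) for each hypothesis via Bayes' Theorem over a discrete set of hypotheses."""
    unnormalized = {h: likelihood(k, n, p) * priors[h] for h, p in hypotheses.items()}
    evidence = sum(unnormalized.values())  # P(D), summed over all hypotheses
    return {h: v / evidence for h, v in unnormalized.items()}


# Hypothetical question: is the success probability 0.5 or 0.8, given 9 successes in 10 trials?
hypotheses = {"p=0.5": 0.5, "p=0.8": 0.8}
priors = {"p=0.5": 0.5, "p=0.8": 0.5}  # assumed prior probabilities
post = posterior(9, 10, hypotheses, priors)
print(post)  # p=0.5 gets roughly 0.035, p=0.8 roughly 0.965
```

Changing the assumed priors changes P(H|D), while P(D|H) for each hypothesis stays the same.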
At a broad-brush verbal level, both these approaches sound eminently
reasonable, so much so that the differences between them seem subtle to the
point of unimportance. A frequentist basically says, “The world is a certain
way, but I don’t know how it is. Further, I can’t necessarily tell how the
world is just by collecting data, because data are always finite and noisy. So
I’ll use statistics to line up the alternative possibilities, and see which ones
the data more or less rule out.” A Bayesian basically says, “I don’t know
how the world is. All I have to go on is finite data. So I’ll use statistics to
infer something from those data about how probable different possible
states of the world are.” And indeed, there are contexts in which Bayesian
and frequentist statistics easily coexist.
But there are many contexts in which they don’t; frequentist and Bayesian
statistics represent deeply conflicting approaches with deeply conflicting
goals. Perhaps the deepest and most important conflict has to do with
alternative interpretations of what “probability” means. These alternative
interpretations arise because it often doesn’t make sense to talk about the
frequency of possible states of the world. For instance, there’s either life on Mars, or
there’s not. We don’t know for sure which it is, but we do know for sure that
it’s one or the other. So if you insist on trying to put a number on the
probability of life on Mars (i.e. the probability that the hypothesis “There is
life on Mars” is true), you are forced to drop the frequentist interpretation of
probability. A frequentist interprets the word “probability” as meaning “the
frequency with which something would happen, in a lengthy series of trials”.
The most common alternative interpretation of “probability” (though not the
only one) is as “subjective degree of belief”: the probability that you
(personally) attach to a hypothesis is a measure of how strongly you
(personally) believe that hypothesis. So a frequentist would never say
“There’s probably not life on Mars”, unless she was just speaking loosely
and using that phrase as shorthand for “The data are inconsistent with the
hypothesis of life on Mars”. But the most common sort of Bayesian would
say “There’s probably not life on Mars”, not as a loose way of speaking
about Mars, but as a literal and precise way of speaking about his beliefs
about Mars. A lot of the choice between frequentist and Bayesian statistics
comes down to whether you think science should comprise statements
about the world, or statements about our beliefs.
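The frequentist reading of “probability” as a long-run frequency can be illustrated with a short simulation: repeat a trial many times and track the relative frequency of an outcome. The true success probability of 0.5, the trial counts, the seed, and the function name are arbitrary illustrative choices. By contrast, a single fixed fact such as whether there is life on Mars offers no series of trials to simulate.

```python
import random


def long_run_frequency(p_true: float, n_trials: int, seed: int = 1) -> float:
    """Relative frequency of 'success' in n_trials independent simulated trials."""
    rng = random.Random(seed)  # seeded generator so results are reproducible
    successes = sum(rng.random() < p_true for _ in range(n_trials))
    return successes / n_trials


# The relative frequency settles near p_true as the number of trials grows
for n in (10, 1_000, 100_000):
    print(n, long_run_frequency(0.5, n))
```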
I’m a frequentist. But lots of very smart people aren’t. This post isn’t an
argument for or against either philosophy. It’s just to alert you that this
philosophical conflict exists, that it is very deep, and that you, as a working
scientist, need to be familiar with it in order to make an informed choice of
statistical approach. One thing frequentists and Bayesians agree on is that
it’s a bad idea to do “cookbook statistics”, where you just mindlessly choose
and follow some statistical “recipe” without worrying about why the recipe
works, or even about what it’s trying to cook! I agree with Ellison and
Dennis (2010) that ecologists should be “statistically fluent”, although I
disagree with them that taking calculus-based technical courses in statistics
is the only way to achieve fluency. Note that “fluency” is not at all the same
thing as “technical proficiency”. If anything, I think one unfortunate side
effect of the increasing popularity of technically-sophisticated,
computationally-intensive statistical approaches in ecology has been to
make ecologists even more reluctant to engage with philosophical issues
(i.e. less fluent, or else less likely to care about fluency). It seems like there’s
a “shut up and calculate the numbers” ethos developing, as if technical
proficiency with programming could substitute for thinking about what the
numbers mean. Lee Smolin noted a similar trend in fundamental physics.
Unfortunately, even advanced stats textbooks aimed at ecologists mostly
don’t bother with more than the most cursory philosophical remarks. For
instance, Clark (2007) spends only two pages on philosophy of statistics.
And he uses those two pages to argue for the irrelevance of statistical
philosophy to the real-world scientist, because longstanding philosophical
debates show no sign of definitive resolution! As I’ve noted elsewhere, this is
a terrible argument for “pragmatism”, analogous to arguing that debates
between liberal and conservative political philosophies are longstanding,
and therefore irrelevant to the real-world voter. Bolker (2008) is an
admirable exception to this general reluctance of ecological statistics
textbooks to grapple with conceptual issues.
So below is some food for thought, a compilation of some interesting and
provocative writings I’ve found really helpful in developing my own
philosophy of statistics. I encourage you to dip into them.
Note that most of the items I’ve listed assume some basic familiarity with
different statistical philosophies, beyond the very brief sketch I gave above.
Unfortunately, I have yet to find a really good, freely available, non-
technical introduction to alternative philosophies of statistics, pitched at a
level suitable for any professional ecologist or grad student. The discussion
in Bolker (2008) is the sort of thing I’m thinking of, but it’s part of a book
that costs money. Anyone know of anything good?