AMATYC 39th Annual Conference, Anaheim 2013

A Brief Introduction to Some Philosophical Principles of Statistics

Brian E. Smith, McGill University

Philosophical Principles of Statistics

The purpose of my talk today is to introduce some philosophical ideas that pervade statistical thinking.

As mathematics educators we often focus on the computational aspects without pausing to think about the deeper meanings.

Students may become adept at calculating statistical measures such as a mean, a standard deviation, or a correlation coefficient without ever developing a deep understanding of the underlying concepts.

Deep Understandings!

Try a little experiment in your statistics classes. After you have taught a particular concept such as a confidence interval or a p-value, and you are satisfied that your students can do the appropriate computation and draw the correct conclusions, wait a few days and then ask the following questions:

Who believes they have:

- A perfect understanding of this concept
- A good understanding
- A vague understanding
- No understanding at all

The results may surprise you!

My Experience

I teach a second-level course, Advanced Business Statistics, to students who have a strong mathematical background. They have all taken Calculus I and II, Linear Algebra, and a first course in Statistics.

When I first started teaching the course I took it for granted that my students understood basic principles of statistics that they had learned in the prerequisite course.

At one point I realized that a good percentage of my class was lost and unable to answer the questions that I would routinely ask in class. At first I thought they might just be feeling shy or intimidated, but when I started to ask how well they grasped some basic concepts I was shocked at their lack of deep understanding.

Terminology

Unlike many scientific disciplines (medicine, psychology, chemistry, genetics, …) statistics did not primarily develop its own terminology, but rather co-opted common everyday words, assigning them a specific statistical meaning.

Examples:

- Significance (has a small probability of occurrence)
- Confidence (has a high probability of being true)
- Regression (relation or association between variables)
- Correlation (measures the strength of a statistical, not to be confused with causal, relation)
- Bias (the difference between a parameter and the mean of its estimator)

We need some real Statistical Terminology

Thank goodness for Heteroscedasticity

A statistician's way of saying unequal variances!

Linguistic Confusion

This lack of a statistics-specific vocabulary may lead us to think we know what something means when in fact we may be confusing the everyday meaning with the statistical meaning.

For example, if I tell a physician that my research on the correlation between medication X and medical condition Y is significant, the doctor may be forgiven for thinking that I have discovered a result that is clinically important! However, I may in fact have simply shown that with a very large sample of patients with condition Y, one in a million will have a minor improvement in symptoms with prolonged use of medication X.

But is it significant?

The drug company that manufactures medication X will have a vested interest, such as billions of dollars in sales annually, so that their statisticians will want to show that there is a statistically significant correlation.

The physician and the patient who are trying to lessen the symptoms of medical condition Y will want to know if spending large amounts of money and experiencing significant side effects (watch some TV ads!) will result in a clinically significant outcome!

Statistical significance does not necessarily imply clinical significance!
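The gap between the two kinds of significance can be made concrete with a small sketch. It assumes a two-sided z-test on a mean improvement with a known standard deviation; the effect size (1% of a standard deviation) and the two sample sizes are hypothetical numbers chosen only for illustration, not figures from the talk.

```python
import math


def two_sided_p_value(effect, sigma, n):
    """Two-sided p-value of a z-test for a mean shift of `effect`,
    assuming a known standard deviation `sigma` and sample size `n`."""
    z = effect / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical, clinically negligible effect: 1% of a standard deviation.
effect, sigma = 0.01, 1.0

p_small_sample = two_sided_p_value(effect, sigma, n=100)
p_huge_sample = two_sided_p_value(effect, sigma, n=1_000_000)

print(f"n = 100:       p = {p_small_sample:.3f}")
print(f"n = 1,000,000: p = {p_huge_sample:.3g}")
```

The effect is identical in both runs; only the sample size changes, yet the second p-value falls far below any conventional significance level.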

Francis Galton (1822-1911)

An explorer of Africa and a geographer

Wrote the books Hereditary Genius (1869) and Natural Inheritance (1889)

He advised his cousin, Charles Darwin, on statistical matters.

Galton’s major contributions to statistics were the methods of correlation and regression.

Galton contributed a large number of terms to statistics, including many of those used in elementary statistics, e.g. ogive, percentile, and inter-quartile range.

Studied fingerprints and discovered their unique properties.

Galton also provided the first workable fingerprint classification system

Regression

When I teach Simple Linear Regression I tell my students that the name is misleading: simple is not to be confused with easy (it actually means that there is only one independent variable in the model), and regression is not what we understand it to mean if we consult a dictionary! In this context regression is understood to imply a functional or relational association between variables.

So why is it called regression?

Regression Unmasked

In Galton’s early experiments on heredity he observed that the heights of sons tended to regress towards the average height of males in the population. After all, if tall fathers tended to have still taller sons and short fathers tended to have still shorter sons, then by now the world should be populated by very tall men and very short men!

He developed a mathematical model to measure this “regression to the mean” effect and from that day on any statistical model that attempts to establish relationships between variables has been called a regression model!
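A small simulation can illustrate the effect Galton observed. This is only a sketch under stated assumptions: father and son heights are drawn from a bivariate normal model with a hypothetical mean of 175 cm, standard deviation of 7 cm, and correlation of 0.5. None of these numbers come from Galton's data.

```python
import math
import random

random.seed(1889)  # arbitrary seed so the run is repeatable

MEAN, SD, RHO = 175.0, 7.0, 0.5  # hypothetical height model (cm)
N = 20_000

fathers, sons = [], []
for _ in range(N):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    fathers.append(MEAN + SD * z1)
    # The son's height shares only part (RHO) of the father's deviation
    sons.append(MEAN + SD * (RHO * z1 + math.sqrt(1 - RHO**2) * z2))

# Fathers at least one standard deviation above the mean, and their sons
tall = [(f, s) for f, s in zip(fathers, sons) if f > MEAN + SD]
mean_tall_fathers = sum(f for f, _ in tall) / len(tall)
mean_their_sons = sum(s for _, s in tall) / len(tall)

# Least-squares slope of son's height on father's height
f_bar = sum(fathers) / N
s_bar = sum(sons) / N
slope = (sum((f - f_bar) * (s - s_bar) for f, s in zip(fathers, sons))
         / sum((f - f_bar) ** 2 for f in fathers))

print(f"tall fathers average {mean_tall_fathers:.1f} cm")
print(f"their sons average   {mean_their_sons:.1f} cm")
print(f"regression slope     {slope:.2f}")
```

Under this model the sons of tall fathers are still taller than average, but less extreme than their fathers, and the fitted slope is below 1. That is the "regression" in the name.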

Biometrics

Galton established a biometrics laboratory, later taken over by Pearson, in which they worked with Darwin to try to find evidence of evolutionary changes in species. To this end they took measurements on body parts, including those taken in Africa and other parts of the world, and tried to measure evolutionary changes.

To organize and analyze the data they hired large numbers of young women, who were called “calculators”.

Pearson pioneered the “goodness-of-fit” test and the use of the chi-square distribution as part of his biometrical investigations.

Established the journal Biometrika which started out as a biometrical journal but eventually became the major journal of mathematical statistics when Pearson’s son Egon took over as editor.

Karl Pearson (1857-1936)

In 1901 founded the journal Biometrika along with Galton

Pioneered the method of moments and the chi-square test

Pearson had a great influence on the language and notation of statistics e.g. population, histogram and standard deviation.

Pearson’s philosophical view

Prior to Pearson, scientists assumed that “errors” were due to imprecise measurement.

In 1820 Laplace discovered the error distribution – aka the “bell shaped distribution” or the “normal curve”.

Pearson’s breakthrough idea was to realize that uncertainty is not due to measurement errors – it is inherent in nature itself. In other words, what we see and touch is actually a manifestation of a probability distribution – the real thing is the probability distribution, not the object!

The Pearsonian View of Reality

For Pearson, experimental results are not “numbers”, they are “distributions”.

Statistical models of distributions enable us to describe the mathematical nature of the randomness that is an essential part of the world we live in.

This philosophy is a break with the deterministic mindset of scientific thinking that characterized the 19th century and earlier.

Keep in mind that the quantum physics revolution had not yet appeared!

Pearson’s “skew distributions”

Pearson discovered a family of distributions he called the “skew distributions” which he believed would describe every type of distribution found in nature.

Skew distributions are characterized by four parameters:

- The mean
- The standard deviation
- Symmetry (or its opposite – skewness)
- Kurtosis

Mathematically, these correspond to the first, second, third, and fourth moments of the distribution, respectively.
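As a sketch of how the four quantities relate to moments, the function below computes them from a data set. It uses one common convention, which is an assumption here rather than necessarily Pearson's own: the mean is the first raw moment, the standard deviation is the square root of the second central moment, and skewness and kurtosis are the standardized third and fourth central moments. The function name and the sample data are made up for illustration.

```python
import math


def four_moments(data):
    """Return (mean, standard deviation, skewness, kurtosis) of `data`,
    using population (divide-by-n) central moments."""
    n = len(data)
    mean = sum(data) / n
    # k-th central moment: average of (x - mean)^k
    m2, m3, m4 = (sum((x - mean) ** k for x in data) / n for k in (2, 3, 4))
    sd = math.sqrt(m2)
    skewness = m3 / sd**3      # standardized third moment
    kurtosis = m4 / m2**2      # standardized fourth moment (3 for a normal)
    return mean, sd, skewness, kurtosis


mean, sd, skewness, kurtosis = four_moments([1, 2, 3, 4, 5])
print(mean, sd, skewness, kurtosis)
```

For the symmetric data used here the skewness is zero, as expected.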

Pearson’s views challenged

Pearson mistakenly believed that with enough data he could find the “true” values of the 4 parameters.

Fisher showed that many of Pearson’s methods of estimation were not optimal. Fisher developed the principle of maximum likelihood estimation. Also, Pearson ignored degrees of freedom, resulting in biased estimates.

Question: What is the philosophy behind degrees of freedom? What exactly is a degree of freedom? See the bibliography for a reference that describes 6 different ways of viewing (and explaining) degrees of freedom.
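One way to see what ignoring degrees of freedom costs is a short simulation. The sketch below assumes small samples (n = 5) from a normal distribution with true variance 1; the sample size, repetition count, and seed are arbitrary choices for illustration. It compares dividing the sum of squared deviations by n with dividing by n - 1.

```python
import random

random.seed(2013)  # arbitrary seed so the run is repeatable

N, REPS = 5, 20_000  # small samples from a normal with true variance 1

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(REPS):
    sample = [random.gauss(0, 1) for _ in range(N)]
    x_bar = sum(sample) / N
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
    sum_div_n += ss / N
    sum_div_n_minus_1 += ss / (N - 1)

avg_div_n = sum_div_n / REPS
avg_div_n_minus_1 = sum_div_n_minus_1 / REPS

# The deviations from the sample mean always sum to zero, so once N - 1
# of them are known the last one is determined: N - 1 degrees of freedom.
sample = [random.gauss(0, 1) for _ in range(N)]
x_bar = sum(sample) / N
deviation_sum = sum(x - x_bar for x in sample)

print(f"average of SS/n:     {avg_div_n:.3f}")
print(f"average of SS/(n-1): {avg_div_n_minus_1:.3f}")
print(f"sum of deviations:   {deviation_sum:.1e}")
```

Dividing by n underestimates the variance on average (the expected value is (n - 1)/n times the true variance), while dividing by n - 1 removes that bias.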

Later, Jerzy Neyman showed that Pearson’s system of skew distributions did not cover the universe of possible distributions. For example, the Poisson distribution is not a member of the skew distribution family (it only has one parameter!)

R. A. Fisher (1890-1962)

Derived the exact distribution of the correlation coefficient

Developed the theory of maximum likelihood estimation.

Developed tests of significance: use of the p-value

Experimental design (including concepts of randomization, replication and blocking) and ANOVA.

Fisher created many terms in everyday use, e.g. statistic and sampling distribution

Fisher’s Theory of Estimation

The four desirable properties of an estimator are:

- Unbiased
- Consistent
- Efficient
- Sufficient

The unsuspecting student will form an immediate opinion as to what these properties mean, but their everyday interpretations of these words will not help them to grasp the specifically statistical meanings.

Bias

The very word, in a self-referential way, biases us to assume that this is perhaps the most important property of an estimator. After all, who wants to admit to bias? It is almost a dirty word in modern morality!

Yet, in statistical estimation theory, least squares estimates (LSEs) are unbiased whereas maximum likelihood estimates (MLEs) are often biased. Nevertheless, for estimation of model parameters from “ugly” distributions (not normal, not homoscedastic) MLEs are vastly superior to LSEs in spite of a small bias. While the downside of MLEs is the bias, they have the very desirable property of sufficiency. This more than compensates for their biased nature!
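A small simulation can show both the bias of an MLE and how it fades with sample size. As an assumed example (not one from the talk), take the rate of an exponential distribution, whose maximum likelihood estimate is 1 divided by the sample mean. The true rate, sample sizes, and seed below are arbitrary illustrative choices.

```python
import random

random.seed(1922)  # arbitrary seed so the run is repeatable

TRUE_RATE = 2.0
REPS = 10_000


def average_mle(n):
    """Average, over REPS simulated samples of size n, of the
    maximum likelihood estimate of an exponential rate (1 / sample mean)."""
    total = 0.0
    for _ in range(REPS):
        sample = [random.expovariate(TRUE_RATE) for _ in range(n)]
        total += n / sum(sample)
    return total / REPS


bias_small = average_mle(5) - TRUE_RATE
bias_large = average_mle(100) - TRUE_RATE

print(f"bias with n = 5:   {bias_small:+.3f}")
print(f"bias with n = 100: {bias_large:+.3f}")
```

For this model the expected value of the MLE is n/(n - 1) times the true rate, so the estimator overshoots noticeably for n = 5 but the bias nearly vanishes for n = 100. This is bias coexisting with consistency.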

Fisher’s Maximum Likelihood Estimators

Fisher proved that MLE estimates are consistent and relatively efficient, and in addition they are sufficient (roughly meaning that no other estimator can provide more information).

Consequently, maximum likelihood estimators are highly desirable and are used in many statistical applications.

Unfortunately, as anyone who has taken a mathematical statistics course knows, deriving exact forms of MLEs can be mathematically challenging!

The Pearson-Fisher (acrimonious) debate

Background

Pearson dabbled in Marxism and Socialism. He changed the spelling of his first name from Carl to Karl in admiration of Marx!

Fisher, by contrast, believed that any form of social welfare should be discouraged. In a work on eugenics he recommended that the government not provide assistance to the poor on the grounds that they should be discouraged from procreating, and that by contrast the upper classes should have large families to improve the hereditary quality of the population.

Origins of the Dispute

Pearson introduced Fisher to the difficult problem of finding the statistical distribution of Galton’s correlation coefficient.

Fisher solved the problem in a week.

Neither Pearson nor Student (Gosset) could understand the mathematics of Fisher’s solution!

Pearson refused to publish Fisher’s solution in Biometrika. After a year he finally published it as a footnote to one of his own papers.

Fisher never forgave Pearson!

Philosophy: Pearson vs Fisher

Pearson

- The statistical distribution describes the actual collection of data to analyze.
- The distribution of measurements is the real thing!
- Data produces models

Fisher

- The true distribution is abstract – a mathematical model
- Data is used to estimate parameters of the model
- Model produces data

In recent years Pearson’s view has been making a comeback in view of new computer techniques such as bootstrap and resampling models.
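To give a flavour of the resampling idea, here is a minimal sketch of a bootstrap percentile interval for a mean. The data set is hypothetical, and the percentile method is just one simple variant of the bootstrap, chosen here for brevity. Note that the observed data themselves play the role of the distribution.

```python
import random

random.seed(1979)  # arbitrary seed so the run is repeatable

# Hypothetical sample of 12 measurements
data = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 7.2, 5.1, 4.4, 6.3, 3.6]
n = len(data)
sample_mean = sum(data) / n

B = 5_000  # number of bootstrap resamples
boot_means = []
for _ in range(B):
    resample = random.choices(data, k=n)  # draw n values with replacement
    boot_means.append(sum(resample) / n)

boot_means.sort()
# Percentile interval: roughly the middle 95% of the bootstrap means
lower = boot_means[int(0.025 * B)]
upper = boot_means[int(0.975 * B) - 1]

print(f"sample mean: {sample_mean:.2f}")
print(f"95% bootstrap percentile interval: ({lower:.2f}, {upper:.2f})")
```

No parametric model is fitted anywhere in this sketch: the interval comes entirely from resampling the observed measurements.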

Neyman/Pearson

Jerzy Neyman and Egon Pearson

Philosophical debate: Hypothesis Tests vs Confidence Intervals

Hypothesis (Tests of Significance)

Fisher used “tests of significance” based on p-values.

Egon Pearson and Neyman used “tests of hypothesis” and developed the famous Neyman-Pearson Lemma, which introduced the concept of Type I and Type II errors and the power of a statistical test.

Problem: what does the null hypothesis mean? For example, if we cannot reject the hypothesis that a population is normal, can we conclude that it is normal? Not necessarily.

In 1980 Deming attacked the whole idea of hypothesis testing as nonsensical.
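The relationship between Type I error, Type II error, and power can be sketched for the simplest case. The example assumes a one-sided z-test with known standard deviation; the effect size of 0.5 and the sample sizes are hypothetical, and the function name is made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 against mu > mu0
    when the true mean is mu0 + delta and sigma is known."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)   # reject H0 when Z > z_crit
    shift = delta * n ** 0.5 / sigma         # mean of Z under the alternative
    return 1 - STD_NORMAL.cdf(z_crit - shift)


alpha = 0.05
type_i_rate = power_one_sided_z(delta=0.0, sigma=1.0, n=25, alpha=alpha)
power_n25 = power_one_sided_z(delta=0.5, sigma=1.0, n=25, alpha=alpha)
power_n100 = power_one_sided_z(delta=0.5, sigma=1.0, n=100, alpha=alpha)
type_ii_rate_n25 = 1 - power_n25

print(f"Type I error rate:       {type_i_rate:.3f}")
print(f"power (n = 25):          {power_n25:.3f}")
print(f"Type II error (n = 25):  {type_ii_rate_n25:.3f}")
print(f"power (n = 100):         {power_n100:.3f}")
```

When the null hypothesis is true (delta = 0) the rejection probability equals alpha, the Type I error rate. When it is false, the rejection probability is the power, and one minus the power is the Type II error rate.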

Confidence Intervals

Developed by Jerzy Neyman, 1934

Met with great skepticism

- Problem – what is confidence?
- What are we confident about? The parameter? The interval?

In common usage today but still has its critics who doubt the validity of the interpretation of the probability associated with the intervals.

- Problem: Leads to sloppy thinking. Not uncommon for someone to say that he/she is 95% sure that a parameter lies within the interval.
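A simulation can make the intended interpretation more tangible: the 95% describes the long-run behaviour of the procedure, not any single interval. The sketch below assumes normal data with a known standard deviation; the true mean, sample size, and number of repetitions are arbitrary illustrative values.

```python
import random
from statistics import NormalDist

random.seed(1934)  # arbitrary seed so the run is repeatable

TRUE_MEAN, SIGMA, N, REPS = 10.0, 2.0, 30, 5_000
z = NormalDist().inv_cdf(0.975)          # about 1.96 for a 95% interval
half_width = z * SIGMA / N ** 0.5        # known-sigma interval half-width

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # The parameter is fixed; the interval is what varies between samples.
    if x_bar - half_width <= TRUE_MEAN <= x_bar + half_width:
        covered += 1

coverage = covered / REPS
print(f"fraction of intervals covering the true mean: {coverage:.3f}")
```

About 95% of the simulated intervals contain the true mean, yet any individual interval either contains it or does not.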

Underlying Problem: What is the meaning of probability in Real Life?

William Sealy Gosset (‘Student’) (1876-1937)

In 1908 he published in Biometrika two papers on small sample distributions, one on the normal mean (Student's t distribution) and the second one on normal correlation (see Fisher’s z-transformation).

In his work for Guinness’s brewery in Dublin, Ireland, he modelled the number of live yeast cells in small samples using the Poisson distribution.

Student’s work on inferences from small samples has had a major impact on modern statistical analysis and led to the theory of hypothesis testing.

Thank you, and I hope you enjoyed my brief introduction to some philosophical ideas in statistics.

References

- The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, by David Salsburg, Holt, New York, 2001.
- Illustrating degrees of freedom in terms of sample size and dimensionality, by Dr. Alex Yu, 2009. http://www.creative-wisdom.com/computer/sas/df.html
- Karl Pearson and the Origins of Modern Statistics, by M. Eileen Magnello, The Rutherford Journal. www.rutherfordjournal.org/article010107.html
- Statistical Concepts in Their Relation to Reality, by E.S. Pearson, Journal of the Royal Statistical Society, Series B (Methodological), Vol 17, No. 2, 1955, 204-207. Online at http://www.phil.vt.edu/dmayo/personal_website/Pearson%201955.pdf
- Don’t Believe in the Null Hypothesis, by Dr. Alex Yu, 2013. http://www.creative-wisdom.com/computer/sas/hypothesis.html
