Descriptive or Inferential Statistics

Robert Anderson takes on

Census and Sample: Descriptive or Inferential Statistics

All of my statistical training has been about dealing with samples, but I am presently involved in a project where we have a census dataset. We are to compare the profitability of automobile companies over a period of twenty years. After collecting the data, I found that only 19 companies could form my population, because companies entered and exited the industry during the study period. Is this a sample or a population? Management students often make mistakes in defining sample and population. Never speak of a 'random sample of experts': you choose experts for their special expertise, not for their typicality. Expert panels must be chosen expertly, not randomly. Their responses may be subject to descriptive statistics, but never to inferential statistics. When we define a criterion, say 20 years of operation, the companies with fewer than twenty years of operation naturally fall outside the purview of our study. Even if the number of companies is small, they are the population and not a sample at all. The only way to include more companies in the population of my study is to shorten the study period.

Oh, how students forget that statistical inference is reasoning from a sample to a specified population. As a late replacement for his dissertation examiner, I listened to a candidate's oral defence. He showed a sophisticated analysis and a big n, and claimed support for his hypothesis at a high level of statistical significance. I asked, 'What is the population?' He replied, 'Huh?' Initially I was surprised to see the reaction of the management students. But a sample must be drawn from a population. If you can't say what the population is, your sample is crap. Then the dissertation is crap.

Don't confuse a census with a sample. If you measure every member of your pre-defined population, you've done a census, not a sample. No statistical inference is needed, or appropriate. You're not making an inference from sample to population, because you've 'sampled' the entire population!

One student measured every company in his well-specified population of interest. He then performed statistical tests. 'Why?' I asked. 'Well,' he said, 'it wouldn't look like much of a dissertation if I didn't have statistical tests.' Buzzer! Wrong answer!

The usual methods of comparing groups, such as an ANOVA or a Kruskal-Wallis test, don't apply to census data. For example, we want to know whether the returns on assets of all the companies differ from each other or not. Like any management student, I initially went searching for references on the statistical analysis of census data. I have one very good textbook on SPSS, but it's all about samples, which is a pity. I've been searching all day for about six month...

With census data, I suspect I should not use ANOVA or its non-parametric equivalent to test the significance of the difference between group means. Do I just report the group means and comment on the difference, or lack of difference, between them?

Over a period of six months of going through the literature, what I could gather is this: when you have a sample, you use inferential stats to generalise to the population. When you have a census, you already have data for the whole population, so there is no need to generalise. The 19 companies are not a representative sample of all the companies in the industry: no other company had any probability of being selected, because no other company has been running for twenty years or more.

You must define a population. It's not hard. You're doing a study; the population is what you're studying, or aiming to study. No population, no study. If you say that you pulled a sample, your examiners or referees will ask, "A sample of what?" It is an obvious and reasonable question that you, the researcher, must answer.

For example, if you used sampling, and there is a 3% difference between groups, then you have to use inferential stats to decide whether that 3% difference is real, or just due to random chance when you did the sampling.

But if you did a census, and there is a 3% difference between groups, well, then there's definitely a 3% difference. That 3% difference is not due to random chance in sampling, because you have data for the whole population. However, even with a census you will still need to use your own judgement to think about why there is a 3% difference (for reasons other than random chance in sampling), and whether the 3% difference is large enough to have any practical significance for the work you are doing. The question of statistical significance simply does not arise.
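The contrast between a sampled difference and a census difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two groups and their return-on-assets figures are invented, and the helper `mean_difference` is a hypothetical name, not something from the project described here.

```python
import random
import statistics

# Hypothetical population: return on assets (%) for EVERY firm in two groups.
# These figures are invented for illustration only.
group_a = [8.0, 9.5, 11.0, 7.5, 10.0, 12.5, 9.0, 10.5, 8.5, 11.5]
group_b = [6.0, 7.5, 5.5, 8.0, 6.5, 9.0, 7.0, 4.5, 7.5, 8.5]


def mean_difference(a, b):
    """Difference between the mean of a and the mean of b."""
    return statistics.mean(a) - statistics.mean(b)


# Census: every firm is measured, so the difference is a fixed fact.
census_diff = mean_difference(group_a, group_b)

# Sampling: draw 4 firms per group, many times over. Each draw gives a
# different difference, and that variation is what inferential tests address.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample_diffs = [
    mean_difference(rng.sample(group_a, 4), rng.sample(group_b, 4))
    for _ in range(1000)
]

print(f"census difference: {census_diff:.2f}")
print(f"sampled differences range from {min(sample_diffs):.2f} "
      f"to {max(sample_diffs):.2f}")
```

The census figure is the same every time it is computed, while the sampled figures scatter around it. That scatter is the only thing a p-value speaks to, which is why it has no role once the whole population has been measured.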

So basically, just use descriptive stats. Correlations are fine, but you only need the r value to show the strength of the correlation, not the p value, which relates to random chance in sampling.
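As a rough sketch of reporting r on its own, the Pearson coefficient can be computed directly from its definition, with no p-value produced anywhere. The function name `pearson_r` and the firm figures below are assumptions made up for this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


# Hypothetical census of six firms: total assets and return on assets (%).
total_assets = [120.0, 340.0, 560.0, 210.0, 780.0, 450.0]
return_on_assets = [4.5, 6.0, 7.5, 5.0, 9.0, 6.5]

r = pearson_r(total_assets, return_on_assets)
print(f"r = {r:.2f}")  # strength and direction only, no significance test
```

With census data, r is itself the population value, so it is reported and interpreted as it stands.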

A lot of people don't get the difference between sample stats and census stats, and will complain that you didn't do the stats properly. I've had cases where I ended up having to do inferential stats on census data just because people from a social science background, who did not have a clear understanding of the subject, complained so much that there were no p values on anything!

Inferential statistics are for the purpose of generalizing from a sample to a population, and if you already know the results for the population, it is of no use (and makes no sense) to conduct an inferential test such as an ANOVA, a Wald test in logistic regression, etc. Anything that generates a p-value is going to be out of place here. What I'd recommend is to explore your possibilities for effective data visualization, in a purely descriptive, not inferential, spirit.
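A purely descriptive summary might look something like the sketch below. It assumes invented return-on-assets figures for three made-up firms, and `describe` is a hypothetical helper; the text bar is only a crude stand-in for a proper chart.

```python
import statistics

# Hypothetical yearly return on assets (%) for three firms; invented data.
roa_by_company = {
    "Company A": [5.0, 6.0, 7.0, 6.0],
    "Company B": [2.0, 3.0, 1.0, 2.0],
    "Company C": [9.0, 8.0, 10.0, 9.0],
}


def describe(groups):
    """Mean, median, min and max per group. No tests, no p-values."""
    return {
        name: {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
        for name, values in groups.items()
    }


summary = describe(roa_by_company)

# List firms from highest to lowest mean, with a text bar per firm.
ranked = sorted(summary.items(), key=lambda item: item[1]["mean"], reverse=True)
for name, stats in ranked:
    bar = "#" * round(stats["mean"])
    print(f"{name:10s} mean={stats['mean']:5.2f} "
          f"range={stats['min']:.1f} to {stats['max']:.1f} {bar}")
```

The differences between firms are read straight off the table or chart, and the judgement about whether they matter in practice is left to the researcher.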

If a lot of data are missing from a census, you sometimes need some fancy inferential stats to fill the gaps. But that is not desirable either, as it violates the basic principles of the subject.