Applied Quantitative Methods MBA course Montenegro

Peter Balogh, PhD, baloghp@agr.unideb.hu



Page 1: Applied Quantitative Methods MBA course Montenegro

Applied Quantitative Methods MBA course Montenegro

Peter Balogh, PhD

baloghp@agr.unideb.hu

Page 2: Applied Quantitative Methods MBA course Montenegro

6. Measures of dispersion

• In the previous part of the presentation we considered several measures of the typical, or average, value.

• The mean is widely regarded as the most important descriptive statistic.

• When references are made to the average time or the average weight or the average cost it is generally the mean that has been calculated.

• Knowledge of the mean, the median and the mode will increase our understanding of the data but will not provide a sufficient understanding of the differences in the data.

Page 3: Applied Quantitative Methods MBA course Montenegro

6. Measures of dispersion

• In many applications it is the differences that are of particular interest to us.

• In market research, for example, we are interested not only in the typical values but also in whether opinions or behaviours are fairly consistent or vary considerably.

• A niche market is defined by difference.

• Quality control, whether in the manufacturing or the service sector, is concerned with difference from the expected.

Page 4: Applied Quantitative Methods MBA course Montenegro

6. Measures of dispersion

• In this part I introduce ways of measuring this variability, or dispersion, and then consider ways of comparing different distributions.

• Measures of dispersion can be absolute (considering only one set of data at a time and giving an answer in the original units e.g. £'s, minutes, years), or relative (giving the answer as a percentage or proportion and allowing direct comparison between distributions).

Page 5: Applied Quantitative Methods MBA course Montenegro
Page 6: Applied Quantitative Methods MBA course Montenegro

6.1 The standard deviation

• The standard deviation is the most widely used measure of dispersion, since it is directly related to the mean.

• If you choose the mean as the most appropriate measure of central location, then the standard deviation would be the natural choice for a measure of dispersion.

• Unlike the mean, the standard deviation is not so well known and does not have the same intuitive meaning.

• The standard deviation measures differences from the mean - a larger value indicating a larger measure of overall variation.

• The standard deviation will also be in the same units as the mean (£s, minutes, years) and a change of units (e.g. from £s to dollars, or metres to centimetres) will change the value.

Page 7: Applied Quantitative Methods MBA course Montenegro

6.1 The standard deviation

• The application of computer packages will generally make the determination of the standard deviation a relatively straightforward procedure, but it is worth checking what version of the formula is being used (the divisor can be n or n - 1).

• I will continue to follow the practice of showing the calculations by hand, as you may still need to do them.

• Such calculations do have the additional advantage of showing how the standard deviation is related to the mean.

• The standard deviation is particularly important in the development of statistical theory, since most statistical theory is based on distributions described by their mean and standard deviation.
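When checking which version of the formula a package uses, a quick comparison on a small data set makes the divisor visible. The following is a minimal sketch using Python's standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n - 1; the five values are invented purely for illustration.

```python
import math
import statistics

# Invented illustrative values (not from the course data).
data = [4, 8, 6, 5, 7]

n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)

sd_divisor_n = math.sqrt(sum_sq / n)                # divisor n
sd_divisor_n_minus_1 = math.sqrt(sum_sq / (n - 1))  # divisor n - 1

# statistics.pstdev uses divisor n; statistics.stdev uses divisor n - 1.
print(sd_divisor_n, statistics.pstdev(data))
print(sd_divisor_n_minus_1, statistics.stdev(data))
```

The n - 1 version is always slightly larger, and the gap narrows as the number of observations grows.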

Page 8: Applied Quantitative Methods MBA course Montenegro

6.1.1 Untabulated data

• We have already seen how to calculate the mean from simple data.

• We will need this calculation of the mean before we calculate the standard deviation.

• We can again use the first 10 observations on the number of cars entering a car park in 10-minute intervals: 10 22 31 9 24 27 29 9 23 12

• The mean of this data is 19.6 cars.

• The differences about the mean are shown diagrammatically in Figure 6.2.

• To the left of the mean the differences are negative and to the right of the mean the differences are positive.

• It can be seen, for example, that the observation 9 is 10.6 units below the mean, a deviation of -10.6.

• The sum of these differences is zero - check this by adding all the deviations.

• This summing of deviations to zero illustrates the physical interpretation of the mean as being the centre of gravity with the observations as a number of 'weights in balance'.

• A

Page 9: Applied Quantitative Methods MBA course Montenegro
Page 10: Applied Quantitative Methods MBA course Montenegro

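6.1.1 Untabulated data

• To calculate the standard deviation we follow six steps:

– Compute the mean: $\bar{x} = \frac{\sum x}{n}$

– Calculate the differences from the mean: $(x - \bar{x})$

– Square these differences: $(x - \bar{x})^2$

– Sum the squared differences: $\sum (x - \bar{x})^2$

– Average the squared differences to find the variance: $\frac{\sum (x - \bar{x})^2}{n}$

– Take the square root of the variance to find the standard deviation: $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$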
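The six steps can be traced on the ten car park observations. This is a minimal sketch that assumes the divisor n; the variable names are illustrative only.

```python
import math

# First 10 observations of cars entering the car park (from the slides).
cars = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]

n = len(cars)
mean = sum(cars) / n                      # step 1: the mean
deviations = [x - mean for x in cars]     # step 2: differences from the mean
squared = [d ** 2 for d in deviations]    # step 3: squared differences
sum_squared = sum(squared)                # step 4: sum of squared differences
variance = sum_squared / n                # step 5: variance (divisor n)
std_dev = math.sqrt(variance)             # step 6: standard deviation

print(f"mean = {mean:.1f}")
print(f"sum of deviations is zero: {abs(sum(deviations)) < 1e-9}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

With these data the squared differences sum to 684.4, so the variance is 68.44 and the standard deviation is about 8.27 cars.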

Page 11: Applied Quantitative Methods MBA course Montenegro
Page 12: Applied Quantitative Methods MBA course Montenegro

6.1.2 Tabulated discrete data

• Table 6.2, showing the number of working days lost by employees in the last quarter, typifies the tabulation of discrete data.

Page 13: Applied Quantitative Methods MBA course Montenegro

6.1.2 Tabulated discrete data

• We need to allow for the fact that 410 employees lost no days, 430 lost one day and so on by including frequency in our calculations.

• In this example there are 1440 employees in total and we need to include 1440 squared differences.

• The formula for the standard deviation becomes

$\sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$, where $f$ is the frequency of each value $x$ and $\sum f = n$ is the total number of observations.
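Including frequency in the calculation can be sketched as follows. The function name `weighted_std_dev` and the small table are invented for illustration (they are not Table 6.2), and the divisor is assumed to be the total frequency n.

```python
import math

def weighted_std_dev(freq_table):
    """Standard deviation (divisor n = total frequency) from {value: frequency}."""
    n = sum(freq_table.values())
    mean = sum(x * f for x, f in freq_table.items()) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in freq_table.items())
    return math.sqrt(sum_sq / n)

# Invented toy table: days lost -> number of employees (NOT Table 6.2).
toy_table = {0: 4, 1: 3, 2: 2, 3: 1}
toy_sd = weighted_std_dev(toy_table)
print(toy_sd)
```

Each squared difference is counted once per employee, which is what multiplying by the frequency achieves.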

Page 14: Applied Quantitative Methods MBA course Montenegro

6.1.2 Tabulated discrete data

Page 15: Applied Quantitative Methods MBA course Montenegro

6.1.3 Tabulated (grouped) data

• When data is presented as a grouped frequency distribution we must determine whether it is discrete or continuous (as this will affect the way we view the range of values) and determine the mid-points.

• Once the mid-points have been determined we proceed as before using mid-point values for x and frequencies, as shown in Table 6.4.

• The approach shown clearly illustrates how the standard deviation summarizes differences, but would be extremely tedious to perform by hand.

Page 16: Applied Quantitative Methods MBA course Montenegro

6.1.3 Tabulated (grouped) data

• Some algebraic manipulation of the formula given in Section 6.1.2 will provide a simplified formula that is easier to work with for both calculations by hand and the construction of spreadsheets.

• The simplified formula is usually presented as follows:

$\sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$

• The formula does lose its intuitive appeal but is easier to use.

• Formulae of this kind can be presented in a variety of ways. Using a formula presented in different ways should not be a problem. What you do need to be sure about are the stages required in the calculations (e.g. what columns to add) and the assumptions being made (e.g. is n or (n - 1) being used as the divisor?). The use of this simplified formula is illustrated in Table 6.5.
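The column-based calculation can be sketched on a small grouped table to check that the simplified form agrees with the definitional one. The table below is invented (it is not Table 6.4 or 6.5), the mid-point is assumed to be halfway between the group boundaries, and the divisor is assumed to be n.

```python
import math

# Invented grouped table: (lower bound, upper bound, frequency).
groups = [(0, 10, 5), (10, 20, 15), (20, 30, 20), (30, 40, 10)]

mid_points = [(lo + hi) / 2 for lo, hi, _ in groups]
freqs = [f for _, _, f in groups]

n = sum(freqs)                                                # sum of f
sum_fx = sum(f * x for f, x in zip(freqs, mid_points))        # column fx
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mid_points))   # column fx^2

mean = sum_fx / n

# Definitional form: average squared difference from the mean.
sd_definition = math.sqrt(
    sum(f * (x - mean) ** 2 for f, x in zip(freqs, mid_points)) / n
)
# Simplified form: mean of the squares minus the square of the mean.
sd_simplified = math.sqrt(sum_fx2 / n - mean ** 2)

print(sd_definition, sd_simplified)
```

Only three column totals are needed for the simplified form (f, fx and fx squared), which is why it suits spreadsheets and hand calculation.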

Page 17: Applied Quantitative Methods MBA course Montenegro
Page 18: Applied Quantitative Methods MBA course Montenegro

6.1.3 Tabulated (grouped) data

Page 19: Applied Quantitative Methods MBA course Montenegro

6.1.4 The variance

• The variance is the squared value of the standard deviation, and therefore is calculated easily once the standard deviation is known.

• It is sometimes used as a descriptive measure of dispersion or variability rather than the standard deviation, but its importance lies in more advanced statistical theory.

• As we will see, you can add variances but you cannot add standard deviations.

• Variance is mentioned here for completeness.

Page 20: Applied Quantitative Methods MBA course Montenegro

6.2 Other measures of dispersion

• While the standard deviation is the most widely used measure of dispersion, it is not the only one.

• As we saw when looking at measures of location, different measures (mean, median and mode) are appropriate for different situations and the same is true for measures of dispersion.

• Furthermore, some of the measures of dispersion are specifically linked to certain measures of location and it would not make sense to mix and match the statistics.

Page 21: Applied Quantitative Methods MBA course Montenegro

6.2.1 The range

• The range is the most easily understood measure of dispersion as it is the difference between the highest and lowest values.

• If we were again concerned with the 10 observations: 10 22 31 9 24 27 29 9 23 12, the range would be 22 cars (31 - 9).

• It is, however, a rather crude measure of spread, being dependent on the two most extreme observations.

• It is also highly unstable as new data is added.

• If this measure is to be used, it may well be better to quote the highest and lowest figure, rather than the difference.
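The dependence of the range on the extreme observations can be seen in a short sketch. The first list is the car park data from the slides; the extra observation of 48 is invented to show the instability.

```python
# Car park observations from the slides.
cars = [10, 22, 31, 9, 24, 27, 29, 9, 23, 12]

lowest, highest = min(cars), max(cars)
data_range = highest - lowest
print(lowest, highest, data_range)  # 9 31 22

# One hypothetical extra observation (invented) changes the range markedly.
extended = cars + [48]
extended_range = max(extended) - min(extended)
print(extended_range)  # 39
```

Quoting `lowest` and `highest` separately, as suggested above, keeps the information that the single difference hides.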

Page 22: Applied Quantitative Methods MBA course Montenegro

6.2.1 The range

• The range has, however, found a number of specialist applications, particularly in quality control (range charts).

• When dealing with data presented as a frequency distribution we will not always know exactly the highest and lowest values, only the group they lie in.

• If the groups are open-ended (e.g. 60 and more), then any values used will merely be based on assumptions that we have made about the widths of the groups.

• In such cases there seems little point in quoting either the range or the extreme values.

Page 23: Applied Quantitative Methods MBA course Montenegro

6.2.2 The quartile deviation

• If we are able to quote a half-way value, the median, then we can also quote quarter-way values, the quartiles.

• These are order statistics like the median and can be determined in the same way.

• With untabulated data or tabulated discrete data it will merely be a case of counting through the ordered data set until we are a quarter of the way through and three quarters of the way through and noting the values; this will give the first quartile and third quartile, respectively.

• When working with tabulated continuous data, further calculations are necessary.

• Consider for example the data given in Table 6.6 (see Table 5.6 for the determination of the median).

Page 24: Applied Quantitative Methods MBA course Montenegro
Page 25: Applied Quantitative Methods MBA course Montenegro

6.2.2 The quartile deviation

• The lower quartile (referred to as Q1), will correspond to the value one-quarter of the way through the data, the 11th ordered value:

• and the upper quartile (referred to as Q3) to the value three-quarters of the way through the data, the 33rd ordered value:

Page 26: Applied Quantitative Methods MBA course Montenegro

The graphical method

• To estimate any of the order statistics graphically, we plot cumulative frequency against the value to which it refers, as shown in Figure 6.4.

• The value of the lower quartile is £12 and the value of the upper quartile is £25 (to an accuracy of the nearest £1 which the scale of this graph allows).

Page 27: Applied Quantitative Methods MBA course Montenegro
Page 28: Applied Quantitative Methods MBA course Montenegro

Calculation of the quartiles

• We can adapt the median formula (see Section 5.1.3) as follows:

$\text{order statistic} = l + i \left( \frac{O - F}{f} \right)$

• where O is the order value of interest, l is the lower boundary of the corresponding group, i is the width of this group, F is the cumulative frequency up to this group, and f is the frequency in this group.

Page 29: Applied Quantitative Methods MBA course Montenegro

Calculation of the quartiles

• The lower quartile will lie in the group '£10 but under £15' and can be calculated thus:

• The upper quartile will lie in the group '£20 but under £30' and can be calculated thus:

Page 30: Applied Quantitative Methods MBA course Montenegro

Calculation of the quartiles

• The quartile range is the difference between the quartiles:

$\text{quartile range} = Q_3 - Q_1$

• and the quartile deviation (or semi-interquartile range) is the average difference:

$\text{quartile deviation} = \frac{Q_3 - Q_1}{2}$
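The quartile calculations for grouped continuous data can be sketched as below. The sketch assumes the adapted median formula takes the usual linear-interpolation form l + i(O - F)/f, with O = n/4 and 3n/4 for the quartiles; the function name `order_value` and the toy distribution are invented and are not Table 6.6.

```python
def order_value(o, groups):
    """Interpolate the value at order position `o` in grouped continuous data.

    groups: list of (lower_boundary, width, frequency) in ascending order.
    Uses l + i * (O - F) / f, the linear interpolation assumed here.
    """
    cumulative = 0  # F: cumulative frequency up to the current group
    for lower, width, freq in groups:
        if freq > 0 and o <= cumulative + freq:
            return lower + width * (o - cumulative) / freq
        cumulative += freq
    raise ValueError("order position exceeds total frequency")

# Invented toy distribution: 50 observations in four groups of width 10.
toy_groups = [(0, 10, 5), (10, 10, 15), (20, 10, 20), (30, 10, 10)]
n = sum(freq for _, _, freq in toy_groups)

q1 = order_value(n / 4, toy_groups)       # lower quartile
q3 = order_value(3 * n / 4, toy_groups)   # upper quartile
quartile_range = q3 - q1
quartile_deviation = quartile_range / 2

print(q1, q3, quartile_range, quartile_deviation)
```

The same function gives the median with `order_value(n / 2, toy_groups)`, since all order statistics share one interpolation rule.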

Page 31: Applied Quantitative Methods MBA course Montenegro

Calculation of the quartiles

• As with the range, the quartile deviation may be misleading.

• If the majority of the data is towards the lower end of the range, for example, then the third quartile will be considerably further above the median than the first quartile is below it, and when we average the difference of the two numbers we will disguise this difference.

• This is likely to be the case with a country's personal income distribution.

• In such circumstances, it would be preferable to quote the actual values of the two quartiles, rather than the quartile deviation.

Page 32: Applied Quantitative Methods MBA course Montenegro

6.2.3 Percentiles

• The formula given in Section 6.2.2 for an order value, O, can be used to find the value at any position in a grouped frequency distribution of continuous data.

• For data sets that are not skewed to one side or the other, the statistics we have calculated so far will usually be sufficient, but heavily skewed data sets will need further statistics to fully describe them.

• Examples would include some income distributions, wealth distributions and times taken to complete a complex task.

• In such cases, we may want to use the 95th percentile, i.e. the value below which 95% of the data lies.

• Any other value between 1 and 99 could also be calculated.

• An example of such a calculation is shown in Table 6.7.

Page 33: Applied Quantitative Methods MBA course Montenegro
Page 34: Applied Quantitative Methods MBA course Montenegro

6.2.3 Percentiles

• For this wealth distribution, the first quartile and the median are both zero.

• The third quartile is £4347.83.

• None of these statistics adequately describes the distribution.

• To calculate the 95th percentile, we find 95% of the total frequency, here 0.95 x 26 700 = 25 365, and this is the item whose value we require.

Page 35: Applied Quantitative Methods MBA course Montenegro

6.2.3 Percentiles

• It will be in the group labelled 'under £100 000' which has a frequency of 800 and a width of 50 000 (i.e. 100 000 - 50 000).

• Using the formula, we have:
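Substituting the quantities stated above into the order-value formula of Section 6.2.2 (assumed here to take the form $l + i(O - F)/f$) gives the expression below. The lower boundary of 50 000 follows from the stated group width, and $F$, the cumulative frequency below £50 000, is left symbolic because its value comes from Table 6.7.

```latex
P_{95} = 50\,000 + 50\,000 \times \frac{25\,365 - F}{800}
```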

Page 36: Applied Quantitative Methods MBA course Montenegro

6.2.4 Back to raw data

• So far this chapter has taken us from individual numbers (raw data) through ordered data to grouped data, looking at the methods used to find the measures of dispersion.

• The previous chapter did the same for measures of location.

• However, the idea of grouping the data developed when calculation had to be done by hand, or at least using slide-rules and calculators.

• It was the only practical method when large amounts of data were being analysed.

• Now we have computers and suitable software, which can deal with huge amounts of data very quickly and easily, without having to make assumptions about an even spread of data within each group, or guessing what the highest or lowest value was.

Page 37: Applied Quantitative Methods MBA course Montenegro

6.2.4 Back to raw data

• Add to this that most data starts life as individual bits of raw data, and you can see that most of the descriptive statistics we have been discussing can be found very easily, provided someone has recorded them electronically.

• An example using Excel is shown as Figure 6.5.

• An example of the output from SPSS is shown as Figure 6.6.

• If you are trying to describe secondary data for which you only have tabulated data, then, of course, you have to go back to the methods we have been discussing.

Page 38: Applied Quantitative Methods MBA course Montenegro
Page 39: Applied Quantitative Methods MBA course Montenegro
Page 40: Applied Quantitative Methods MBA course Montenegro

6.3 Relative measures of dispersion

• All of the measures of dispersion described earlier in this chapter have dealt with a single set of data.

• In practice, it is often important to compare two or more sets of data, maybe from different areas, or data collected at different times.

• In Part 4 we look at formal methods of comparing the difference between sample observations, but the measures described in this section will enable some initial comparisons to be made.

• The advantage of using relative measures is that they do not depend on the units of measurement of the data.

Page 41: Applied Quantitative Methods MBA course Montenegro

6.3.1 Coefficient of variation

• This measure calculates the standard deviation from a set of observations as a percentage of the arithmetic mean:

$\text{coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$

• Thus the higher the result, the more variability there is in the set of observations.

Page 42: Applied Quantitative Methods MBA course Montenegro

6.3.1 Coefficient of variation

• If, for example, we collected data on personal incomes for two different years, and the results showed a coefficient of variation of 89.4% for the first year, and 94.2% for the second year, then we could say that the amount of dispersion in personal income data had increased between the two years.

• Even if there has been a high level of inflation between the two years, this will not affect the coefficient of variation, although it will have meant that the average and standard deviation for the second year are much higher, in absolute terms, than the first year.
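The point about inflation can be checked numerically: if every income rises by the same percentage, the mean and standard deviation both scale up and their ratio is unchanged. This is a minimal sketch with invented incomes, a uniform 10% rise and the divisor n assumed; the function name is illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (divisor n assumed)."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

# Invented incomes for illustration only.
incomes_year_1 = [12000, 15000, 18000, 25000, 60000]
# The same incomes after a uniform 10% rise in every value (pure inflation).
incomes_year_2 = [x * 1.10 for x in incomes_year_1]

cv_1 = coefficient_of_variation(incomes_year_1)
cv_2 = coefficient_of_variation(incomes_year_2)

print(round(cv_1, 1), round(cv_2, 1))
```

A change in the coefficient between two years therefore reflects a change in relative dispersion, not simply a change in the price level.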

Page 43: Applied Quantitative Methods MBA course Montenegro

6.3.2 Coefficient of skewness

• Skewness of a set of data relates to the shape of the histogram which could be drawn from the data.

• The type of skewness present in the data can be described by just looking at the histogram, but it is also possible to calculate a measure of skewness so that different sets of data can be compared.

• Three basic histogram shapes are shown in Figure 6.7, and a formula for calculating skewness is shown below.
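One widely used measure is Pearson's coefficient of skewness, 3 x (mean - median) / standard deviation; treating it as the measure intended here is an assumption. A minimal sketch with invented data and the divisor n assumed:

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # divisor n assumed
    return 3 * (mean - median) / sd

# Invented data: many low values and a few high ones (positive skew).
right_skewed = [10, 11, 12, 12, 13, 14, 15, 40, 60]
# Invented symmetric data: mean and median coincide.
symmetric = [1, 2, 3, 4, 5]

skew_right = pearson_skewness(right_skewed)
skew_sym = pearson_skewness(symmetric)

print(round(skew_right, 2), skew_sym)
```

A positive result indicates a long tail of high values, a negative result a long tail of low values, and zero a symmetric distribution.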

Page 44: Applied Quantitative Methods MBA course Montenegro

6.3.2 Coefficient of skewness

• A typical example of the use of the coefficient of skewness is in the analysis of income data.

• If the coefficient is calculated for gross income before tax, then the coefficient gives a large positive result since the majority of income earners receive relatively low incomes, while a small proportion of income earners receive high incomes.

• When the coefficient is calculated for the same group of earners using their after tax income, then, although a positive result is still obtained, its size has decreased.

Page 45: Applied Quantitative Methods MBA course Montenegro

6.3.2 Coefficient of skewness

• These results are typical of a progressive tax system, such as that in the UK.

• Using such calculations it is possible to show that the distribution of personal incomes in the UK has changed over time.

• A discussion of whether or not this change in the distribution of personal incomes is good or bad will depend on your economic and political views; the statistics highlight that the change has occurred.

Page 46: Applied Quantitative Methods MBA course Montenegro
Page 47: Applied Quantitative Methods MBA course Montenegro

6.4 Variability in sample data

• We would expect the results of a survey to identify differences in opinions, income and a range of other factors.

• The extent of these differences can be summarized by an appropriate measure of dispersion (standard deviation, quartile deviation, range).

• Market researchers, in particular, seek to explain differences in attitudes and actions of distinct groups within a population.

• It is known, for example, that the propensity to buy frozen foods varies between different groups of people.

Page 48: Applied Quantitative Methods MBA course Montenegro

6.4 Variability in sample data

• As a producer of frozen foods you might be particularly interested in those most likely to buy your products.

• Supermarkets of the same size can have very different turnover figures and a manager of a supermarket may wish to identify those factors most likely to explain the differences in turnover.

• A number of clustering algorithms have been developed in recent years that seek to explain differences in sample data.

Page 49: Applied Quantitative Methods MBA course Montenegro

6.4 Variability in sample data

• As an example, consider the following algorithm or procedure that seeks to explain the differences in the selling prices of houses:

• 1 Calculate the mean and a measure of dispersion for all the observations in your sample. In this example we could calculate the average price and the range of prices (Figure 6.8).

Page 50: Applied Quantitative Methods MBA course Montenegro
Page 51: Applied Quantitative Methods MBA course Montenegro
Page 52: Applied Quantitative Methods MBA course Montenegro

6.4 Variability in sample data

• It can be seen from the range that there is considerable variability in price relative to the average price.

• Usually the standard deviation would be preferred to the range as a measure of dispersion for this type of data.

• 2 Decide which factors explain most of the difference (range) in price, for example, location, house-type, number of bedrooms.

• If location is considered particularly important, we can divide the sample on that basis and calculate the chosen descriptive statistics (Figure 6.9).

Page 53: Applied Quantitative Methods MBA course Montenegro

6.4 Variability in sample data

• In this case we have chosen to segment the sample by location, areas X and Y.

• The smaller range within the two new groups indicates that there is less variability of house prices within areas.

• We could have divided the sample by some other factor and compared the reduction in the range.

• 3 Divide the new groups and again calculate the descriptive statistics. We could divide the sample a second time on the basis of house-type (Figure 6.10).

• 4 The procedure can be continued in many ways with many splitting criteria.

• A more sophisticated version of this procedure is known as the automatic interaction detection (AID) technique.
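The splitting procedure can be sketched in a few lines. The house records, field names and helper functions below are all invented for illustration (they are not the data behind Figures 6.8 to 6.10), and the range is used as the measure of dispersion, as in the example.

```python
from collections import defaultdict

def describe(prices):
    """Return (mean, range) for a list of prices."""
    return sum(prices) / len(prices), max(prices) - min(prices)

def split_by(records, key):
    """Group records by the value of `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)

# Invented sample of house sales (prices in £000s).
houses = [
    {"area": "X", "type": "flat", "price": 90},
    {"area": "X", "type": "flat", "price": 100},
    {"area": "X", "type": "detached", "price": 140},
    {"area": "X", "type": "detached", "price": 150},
    {"area": "Y", "type": "flat", "price": 200},
    {"area": "Y", "type": "flat", "price": 210},
    {"area": "Y", "type": "detached", "price": 290},
    {"area": "Y", "type": "detached", "price": 300},
]

# Step 1: describe the whole sample.
overall_mean, overall_range = describe([h["price"] for h in houses])
print("all", overall_mean, overall_range)

# Steps 2-3: split by area, then by house type within each area.
area_ranges = {}
for area, in_area in split_by(houses, "area").items():
    area_ranges[area] = describe([h["price"] for h in in_area])[1]
    print(area, describe([h["price"] for h in in_area]))
    for kind, subset in split_by(in_area, "type").items():
        print(area, kind, describe([h["price"] for h in subset]))
```

Comparing the within-group ranges against the overall range shows how much of the variability each splitting factor accounts for; trying a different first split (for example `"type"`) allows the reductions to be compared.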

Page 54: Applied Quantitative Methods MBA course Montenegro

Case 2: using measures of difference and performance

• Managers are likely to meet a number of measures of difference and increasingly also various measures of performance (benchmarking, for instance, has become an important management tool, where targets are determined using the performance of the 'best' organizations on certain measures).

• Managers need to be able to respond to this type of information with insight and confidence.

Page 55: Applied Quantitative Methods MBA course Montenegro

Case 2: using measures of difference and performance

• It is important for managers to clarify what these measures mean in business terms and what the underlying assumptions are.

• In the same way that you don't need to be an accountant to use accounting information, you don't need to be a statistician to use statistical information.

• Managers should look for a business understanding in the information they are given and develop responses that allow their organization to interpret and apply such information.

• Knowing the assumptions will reveal some of the thinking of those that devised them.

• Management is a process that involves a judgement as to what is appropriate and when.

Page 56: Applied Quantitative Methods MBA course Montenegro