Business statistics

BUSINESS STATISTICS FOR BBA (HONS) 4TH SEMESTER

Presentation by Sajjad Hussain (Sajjad Chitrali)

INTRODUCTION

There are different definitions of “Statistics”, and each researcher has defined it in their own terms. For example:

Statistics is a science of sampling and estimation.

Statistics is a science of probability.

Statistics is a science of collecting information/data.

Statistics is a science of presentation of data either in qualitative or quantitative form.

Statistics is a science of analyzing the data.

Statistics is a science of collection, presentation, analysis and interpretation of numerical data.

IMPORTANCE OF STATISTICS

Statistical methods are used for summarization of a large set of data.

Statistical methods are used for analyzing data from field and lab experiments.

Statistical methods are used for conducting sample surveys; the data coming from surveys can also be analyzed using statistical methods to find solutions to the problems under study.

Statistical methods are helpful in effective planning in any field of inquiry.

Statistical methods are used in every field of scientific inquiry, such as agriculture, business, medicine, biology, genetics, and the physical and social sciences.

Banks, insurance companies, and government and semi-government organizations use statistical techniques as a tool for data analysis.

Statistics helps in drawing general conclusions about the characteristics of a population/aggregate on the basis of sample data.

Statistical methods are also helpful in making predictions (forecasting).

What Is Business Statistics?

Briefly defined, business statistics can be described as the collection, summarization,

analysis, and reporting of numerical findings relevant to a business decision or situation.

Naturally, given the great diversity of business itself, it’s not surprising that statistics can be

applied to many kinds of business settings. We will be examining a wide spectrum of such

applications and settings. Regardless of your eventual career destination, whether it be

accounting or marketing, finance or politics, information science or human resource

management, you’ll find the statistical techniques explained here are supported by examples

and problems relevant to your own field.

Types of Statistics

As we have seen, statistics can refer to a set of individual numbers or numerical facts, or to

general or specific statistical techniques. A further breakdown of the subject is possible,

depending on whether the emphasis is on (1) simply describing the characteristics of a set of

data or (2) proceeding from data characteristics to making generalizations, estimates,

forecasts, or other judgments based on the data. The former is referred to as descriptive

statistics, while the latter is called inferential statistics. As you might expect, both

approaches are vital in today’s business world.

Descriptive Statistics

Descriptive statistics deals with concepts and methods related to the summarization and description of the important aspects of numerical data. It consists of the condensation of data, their graphical display, and the computation of numerical quantities that provide information about the centre and spread of the observations in a data set.

Classification of statistics:

Descriptive Statistics: Presentation of Data (Graphs and Diagrams); Tabulation and Classification; Measures of Central Tendency and Dispersion

Inferential Statistics: Estimation of Parameters; Testing of Hypothesis

In descriptive statistics, we simply summarize and describe the data we’ve collected. For example, upon looking around your class, you may find that 35% of your fellow students are wearing Casio watches. If so, the figure “35%” is a descriptive statistic. You are not attempting to suggest that 35% of all college students in the United States, or even at your school, wear Casio watches. You’re merely describing the data that you’ve recorded. For now, however, just remember that descriptive statistics are used only to summarize or describe.


Inferential Statistics

Inferential statistics deals with methods and procedures used for drawing inferences about the true but unknown characteristics of a population based on sample data drawn from the same population. Inferential statistics can be further classified into estimation of parameters and testing of hypotheses.

In inferential statistics, sometimes referred to as inductive statistics, we go beyond mere description of the data and arrive at inferences regarding the phenomenon or phenomena for which sample data were obtained. For example, based partially on an examination of the viewing behaviour of several thousand television households, the ABC television network may decide to cancel a prime-time television program. In so doing, the network is assuming that millions of other viewers across the nation are also watching competing programs.

Key Terms for Inferential Statistics

In surveying the political choices of a small number of eligible voters, political pollsters are

using a sample of voters selected from the population of all eligible voters. Based on the

results observed in the sample, the researchers then proceed to make inferences on the

political choices likely to exist in this larger population of eligible voters. A sample result

(e.g., 46% of the sample favour Charles Grady for president) is referred to as a sample

statistic and is used in an attempt to estimate the corresponding population parameter (e.g.,

the actual, but unknown, national percentage of voters who favour Mr. Grady). These and

other important terms from inferential statistics may be defined as follows:

Population: Sometimes referred to as the universe, this is the entire set of people or objects of interest. It could be all adult citizens in the United States, all commercial pilots employed by domestic airlines, or every roller bearing ever produced by the Timken Company.

An aggregate or totality having some common characteristic of interest is called a population. It is also called the universe. For example, the total number of students enrolled in IBMS-Peshawar, the total number of markets in Peshawar city, the total number of banks in Hayatabad, the number of industries in the province, the monthly/yearly sales of the stores in Peshawar district, etc.

There are different types of population, e.g. finite and infinite populations, and homogeneous and heterogeneous populations.

Sample: This is a smaller number (a subset) of the people or objects that exist within the larger population. For example, a retailer may decide to select her sample by choosing every 10th person entering the store between 9 a.m. and 5 p.m. next Wednesday.

A small representative part of a population is called a sample. For example, a small portion of the students of IBMS-Peshawar will constitute a sample of students. Similarly, a randomly or purposively selected number of markets from a larger set of markets is called a sample of markets.

A sample is said to be representative if its members tend to have the same characteristics (e.g., voting preference, shopping behaviour, age, income, educational level) as the population from which they were selected. For example, if 45% of the population consists of female shoppers, we would like our sample to also include 45% females. When a sample is so large as to include all members of the population, it is referred to as a complete census.

Parameter: Any numerical quantity, such as the mean or standard deviation, computed from population data is known as a parameter. For example, the average monthly/yearly sales of all the stores located in district Peshawar. Parameters are generally used to specify the distribution of data.

A parameter is a numerical characteristic of the population. If we were to take a complete census of the population, the parameter could actually be measured. As discussed earlier, however, this is grossly impractical for most business research. The purpose of the sample statistic is to estimate the value of the corresponding population parameter (e.g., the sample mean is used to estimate the population mean).

Statistic: Any numerical quantity, such as the mean or standard deviation, computed from sample data is called a statistic. For example, the average GPA of 50 students selected from a population of 300 students, or the average sales of 100 stores selected from 1000 stores, are examples of statistics.
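The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the 300 GPA values are made up (generated with a fixed random seed), and the names `population` and `sample` are hypothetical.

```python
import random
from statistics import mean

# Hypothetical population: GPAs of 300 students (made-up values for illustration).
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [round(rng.uniform(2.0, 4.0), 2) for _ in range(300)]

# Parameter: a numerical quantity computed from the whole population.
population_mean = mean(population)

# Statistic: the same quantity computed from a random sample of 50 students.
sample = rng.sample(population, 50)
sample_mean = mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; it is used as an estimate of it.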

VARIABLE AND ITS TYPES

Variable

Any characteristic of interest which takes on different values is called a variable. For example, the price of a commodity at different places in Peshawar city, the profit of a business firm in different months of a year, production, cost, temperature, sales of a market, consumption, etc. Variables are broadly divided into qualitative and quantitative variables.

Quantitative variables express how much of an attribute is possessed. Discrete quantitative variables can

take on only certain values along an interval, with the possible values having gaps between

them, while continuous quantitative variables can take on a value at any point along an

interval. When a variable is measured, a numerical value is assigned to it, and the result will

be in one of four levels, or scales, of measurement—nominal, ordinal, interval, or ratio. The

scale to which the measurements belong will be important in determining appropriate

methods for data description and analysis. By helping to reduce the uncertainty posed by

largely uncontrollable factors, such as competitors, government, technology, the social and

economic environment, and often unpredictable consumers and voters, statistics plays a vital

role in business decision making. Although statistics is a valuable tool in business, its

techniques can be abused or misused for personal or corporate gain. This makes it especially

important for businesspersons to be informed consumers of statistical claims and findings.

Qualitative and Quantitative Variables

A variable is said to be qualitative if it is not capable of numerical measurement, although the presence or absence of the characteristic can be observed. For example, honesty, beauty, race, like and dislike, pass or fail, gender, etc.

A variable is said to be quantitative if it is capable of numerical measurement. For example, cost of production, price of a commodity, monthly consumption of households, etc.

Discrete and Continuous Variables

A variable is said to be discrete if it takes isolated (typically whole-number) values, i.e. its values change in jumps. For example, the number of rooms in a house, the number of students in a class, the number of banks in different cities, the size of a household, the number of shops in a market, etc.

A variable is said to be continuous if it can take any value within a given interval/range (a, b). For example, consumption, production, temperature, monthly sales of a market, height, weight, age, etc.

Dependent and Independent Variables

A type of variable which is influenced by other variable/variables is called dependent

variable. It is also called random or stochastic variable. OR

A variable which depends on one or more other variables is called dependent variable. OR

A variable of primary interest that lends itself for investigation as a function of other cause

variables is known as dependent variable.

For example, in economics, the consumption of a commodity (say apples) depends upon household income, household size, the price of the commodity, etc. In this example, consumption of apples is the dependent variable, which will vary from one family to another, while the other variables (income, household size and price) are independent variables.

A variable which influences a dependent variable in either direction (positive or negative) is called an independent variable.

Meaning and Purpose of Data

Data means observations or evidence. Alternatively, raw facts and figures, or a collection of meaningful information, are called data. Data can be both qualitative and quantitative in nature.

The data are needed in a research work to serve the following purposes:

1. Quality of data determines the quality of research.

2. Data provide direction and answers to a research inquiry, and are essential for conducting research.

3. The main purpose of data collection is to verify the hypotheses.

4. Data are necessary to provide the solution of the problem.

5. Data are also employed to ascertain the effectiveness of a new device for its practical utility.

6. Statistical data are used in two basic problems of any investigation:

(a) Estimation of population parameters, which helps in drawing generalizations about the population characteristics.

(b) Testing the hypotheses of the investigation.

Types of Data

Primary Data: Data collected for the first time from their source are called primary data.

Secondary Data: When primary data have been passed through any sort of statistical or mathematical treatment, the data are known as secondary data.

OR, data that are collected and compiled by an outside source, or by someone in the organization who may later provide access to the data to other users.

Collection of Data

Collection of Primary Data

Primary data can be collected through:

1. Direct personal investigation
2. Indirect investigation or personal interviews
3. Collection through questionnaire
4. Collection through enumerators
5. Collection through local sources

Collection of Secondary Data

Secondary data can be collected from:

1. Official records
2. Semi-official records

Collection of Primary Data

i. Direct Personal Investigation

In this method, the researcher/investigator collects information in person from the selected respondents. The investigator has a degree of freedom and can ask a variety of questions (open ended, closed ended or mixed). The data collected through this method are complete; however, personal bias can be present due to the personal involvement of the investigator. This method is also very costly and time consuming.

ii. Indirect Investigation

In some cases, it is not possible to obtain information directly from the respondents due to certain limitations. In such circumstances, indirect investigation is carried out by involving a third party to collect the required information. This method is useful when the information required is complex.

iii. Collection through questionnaire

In this method, a list of questions (called a questionnaire) is prepared by the researcher/investigator, covering all aspects of the study. The questionnaire is sent to the respondents through mail or email with a request to send it back after answering all the listed questions. In this method, respondents may leave some questions blank because they do not understand them or do not want to give that information. In addition, some respondents are not willing to give any of the information asked for in the questionnaire.

iv. Collection through enumerators

In this method, trained people are sent to the area under study to collect information on a pre-specified proforma. Information collected through this method will be more useful than that collected by the questionnaire method. The enumerators can take information from the respondents directly or, if a respondent is not available on the spot, from his/her close relatives.

v. Collection through local sources

As the name indicates, in this method data are collected through local sources. This means that the information is not collected directly from the respondents; instead, the desired information is collected from people belonging to the area about which information is required.

Time Series Data

Data collected at different points in time regarding a commodity or group of commodities (or an organization/firm) are called time series data. For example, the following time series data of a company show profit, production and sale.

Year   Profit   Production   Sale
1990   12       120          110
1991   13       140          132
1992   14       150          145
1993   13.5     140          123
1994   10       103          90
1995   11       115          100
1996   12.5     123          122
1997   13.8     140          135
1998   15       160          145
2000   11.6     120          115
2001   15       162          150
2002   16       165          145

Cross Sectional Data

Data relating to a single period of time but spread across different units, such as households, stores or companies, are called cross sectional data. Examples are data collected from a field survey, the monthly profit of selected stores, or the monthly profit of different companies for one period.

Cross sectional data of 12 different households showing profit, production and sale:

Household   Profit   Production   Sale
1           12       120          110
2           13       140          132
3           14       150          145
4           13.5     140          123
5           10       103          90
6           11       115          100
7           12.5     123          122
8           13.8     140          135
9           15       160          145
10          11.6     120          115
11          15       162          150
12          16       165          145

Frequency

The number of times an observation is repeated in a data set is called the frequency of that particular observation (data point or individual). OR

The total number of observations in a class is called the frequency of that class. For example, consider the following data showing the monthly salaries of 50 employees of a certain university. In this example, 20 is the frequency of the class (employees) having a salary of Rs. 40,000 per month, and 3 is the frequency of the employees drawing a salary of Rs. 90,000 per month.

Salary (000)          40   50   60   70   80   90
Number of employees   20   10   8    5    4    3
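As a minimal sketch of how such frequencies are obtained from raw observations, the snippet below rebuilds the 50 individual salaries implied by the table and counts them. The variable names are illustrative only.

```python
from collections import Counter

# Rebuild the 50 individual salaries (in thousands of Rs.) implied by the table.
salaries = [40] * 20 + [50] * 10 + [60] * 8 + [70] * 5 + [80] * 4 + [90] * 3

# Counter tallies how many times each salary occurs, i.e. its frequency.
frequency = Counter(salaries)

for salary in sorted(frequency):
    print(f"Salary {salary} (000): {frequency[salary]} employees")
```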

Class Boundaries: In a grouped frequency distribution, if the upper limit of each class is repeated as the lower limit of the next class, the classes are said to be expressed in class boundaries. For example, consider the following classes:

Salary (000)   5-10   10-15   15-20   20-25   25-30   30-35


Class Limits: In a grouped frequency distribution, if the upper limit of each class is not repeated as the lower limit of the next class, the classes are said to be expressed in class limits. For example, consider the following classes:

Salary (000)   5-9   10-14   15-19   20-24   25-29   30-34


How to convert class limits into class boundaries:

In the given data, the classes show class limits. To convert class limits into class boundaries, calculate half of the gap between the upper limit of one class and the lower limit of the next class: (10 – 9)/2 = 0.5

Now subtract 0.5 from each lower class limit, and add 0.5 to each upper class limit. See the example for further understanding.

Classes   Class Boundaries
5-9       4.5-9.5
10-14     9.5-14.5
15-19     14.5-19.5
20-24     19.5-24.5
25-29     24.5-29.5
30-34     29.5-34.5
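The conversion can be sketched in Python as follows. The function name `to_class_boundaries` is hypothetical, and the sketch assumes at least two classes with the same gap between every pair of consecutive classes.

```python
def to_class_boundaries(class_limits):
    """Convert a list of (lower, upper) class limits into class boundaries."""
    # Half the gap between the first class's upper limit and the second class's lower limit.
    half_gap = (class_limits[1][0] - class_limits[0][1]) / 2
    # Subtract the half gap from each lower limit and add it to each upper limit.
    return [(lower - half_gap, upper + half_gap) for lower, upper in class_limits]


limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
boundaries = to_class_boundaries(limits)

for (low, high), (b_low, b_high) in zip(limits, boundaries):
    print(f"{low}-{high}  ->  {b_low}-{b_high}")
```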

Frequency Distribution

The arrangement of data into different classes or groups, in such a way that each class/group has its own frequency, is called a frequency distribution. For example, the following data show the frequency distribution of the salary of 50 employees of a firm. This frequency distribution is called a discrete frequency distribution.

Salary (000)          40   50   60   70   80   90
Number of employees   20   10   8    5    4    3

Whereas the data below show the grouped/continuous frequency distribution of the salary of 50 employees:

Salary (000)          5-9   10-14   15-19   20-24   25-29   30-34
Number of employees   20    10      8       5       4       3


PRESENTATION OF DATA: DIAGRAMS

City Name    No. of Industries   No. of Banks
Peshawar     50                  35
Islamabad    40                  45
Karachi      120                 90
Lahore       70                  55
Faisalabad   90                  30
Quetta       15                  8

In this section, we will examine several other methods for the graphical representation of

data, then discuss some of the ways in which graphs and charts can be used (by either the

unwary or the unscrupulous) to deceive the reader or viewer. We will also provide several

Computer Solutions to guide you in using Excel and Minitab to generate some of the more

common graphical presentations. These are just some of the more popular approaches. There

are many other possibilities.

The Bar Chart

Like the histogram, the bar chart represents frequencies according to the relative lengths of a set of rectangles, but it differs in two respects from the histogram: (1) the histogram is used in representing quantitative data, while the bar chart represents qualitative data; and (2) adjacent rectangles in the histogram share a common side, while those in the bar chart have a gap between them.

[Bar chart: cities Peshawar, Islamabad, Karachi, Lahore, Faisalabad and Quetta on the horizontal axis; vertical axis from 0 to 140]

Figure: Summary of the number of industries in different cities of Pakistan

Multiple Bar Diagram:

Figure: Summary of the number of industries and Banks in different cities of Pakistan

Component bar Diagram:

[Charts: City Name on the horizontal axis; series No. of Industries and No. of Banks]

Figure: Summary of the number of industries and Banks in different cities of Pakistan

The Line Graph

The line graph is capable of simultaneously showing values of two quantitative variables (y, or vertical axis, and x, or horizontal axis); it consists of linear segments connecting points observed or measured for each variable. When x represents time, the result is a time series view of the y variable.

[Figure: Number of Students (vertical axis) plotted against GPA (horizontal axis)]

The Pie Chart

The pie chart is a circular display divided into sections based on either the number of observations within or the relative values of the segments. If the pie chart is not computer generated, it can be constructed by using the principle that a circle contains 360 degrees. The angle used for each piece of the pie can be calculated as follows:

Number of degrees for the category = Relative value of the category × 360

For example, if 25% of the observations fall into a group, they would be represented by a section of the circle that includes (0.25 × 360), or 90 degrees. Computer Solutions 2.6 shows the Excel and Minitab procedures for generating a pie chart to show the relative importance of four major business segments in contributing to Home Depot Corporation’s overall profit.

Figure: Summary of the number of industries in different cities of Pakistan
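The angle rule for pie charts can be sketched in Python using the number of industries per city from the table earlier in this section. The function name `pie_angles` is illustrative only.

```python
# Number of industries per city, taken from the table in this section.
industries = {
    "Peshawar": 50,
    "Islamabad": 40,
    "Karachi": 120,
    "Lahore": 70,
    "Faisalabad": 90,
    "Quetta": 15,
}


def pie_angles(counts):
    """Return the angle in degrees for each category: relative value x 360."""
    total = sum(counts.values())
    return {name: value / total * 360 for name, value in counts.items()}


angles = pie_angles(industries)
for city, angle in angles.items():
    share = industries[city] / sum(industries.values())
    print(f"{city}: {share:.1%} of industries, {angle:.1f} degrees")
```

The angles always add up to 360 degrees, because the relative values add up to 1.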

Statistical Tools and Methods

Measures of Central Tendency

Measures of Dispersion

Reliability Analysis


Mean Comparison (statistical tests)

Analysis of Variance (ANOVA)

Tests of Association

Regression and Correlation Analysis

Non-parametric Tests

[Pie chart segments: Peshawar 13%, Islamabad 10%, Karachi 32%, Lahore 18%, Faisalabad 23%, Quetta 4%]

MEASURES OF CENTRAL TENDENCY

We saw how raw data are converted into frequency distributions, histograms, and visual

displays. We will now examine statistical methods for describing typical values in the data as

well as the extent to which the data are spread out. Introduced in Chapter 1, these descriptors

are known as measures of central tendency and measures of dispersion:

A data set can be summarized by a single value that usually lies somewhere in the centre and represents the whole data set. Such a value is called a central value. The tendency of observations to cluster around the central part of the data set is called central tendency. The most commonly used measures of central tendency are the arithmetic mean, median, mode, geometric mean (GM) and harmonic mean (HM).

Arithmetic Mean

Defined as the sum of the data values divided by the number of observations, the arithmetic mean is one of the most common measures of central tendency. Also referred to as the arithmetic average or simply the mean, it can be expressed as µ (the population mean, pronounced “myew”) or x̄ (the sample mean, “x bar”). The population mean µ applies when our data represent all of the items within the population. The sample mean x̄ is applicable whenever data represent a sample taken from the population.

It is the most widely used measure of central tendency in every field of research. “The arithmetic mean is a value obtained by dividing the sum of all observations in a data set by the number of observations.”

Mathematical Description of Arithmetic Mean

Mathematically, the arithmetic mean of n observations is expressed as

$$\bar{x} = \frac{\sum x}{n}$$

Example: The following data show the consumption (in thousands of Rs.) of 9 MBA students per semester in a certain university. Compute the arithmetic mean and interpret the result. The data are: 39, 36, 48, 36, 41, 37, 32, 46 and 45.

$$\bar{x} = \frac{39 + 36 + 48 + 36 + 41 + 37 + 32 + 46 + 45}{9} = \frac{360}{9} = 40$$

This indicates that, on average, each MBA student consumes Rs. 40,000 per semester.

Arithmetic Mean for Grouped Data

Example: Using the following data showing the profits (in thousands of Rs.) of 60 different industries, calculate the mean profit (average profit) of the industries.

Profit (000)   Number of industries
65-84          9
85-104         10
105-124        17
125-144        10
145-164        5
165-184        4
185-204        5
Total          60

To compute the mean, we first convert the class intervals into mid points (X):

Profit (000)   Number of industries (f)   Mid point (X)   fX
65-84          9                          74.5            670.5
85-104         10                         94.5            945
105-124        17                         114.5           1946.5
125-144        10                         134.5           1345
145-164        5                          154.5           772.5
165-184        4                          174.5           698
185-204        5                          194.5           972.5
Total          60                                         7350
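The grouped-data calculation can be checked with a short Python sketch. It uses the usual grouped-data mean, the sum of fX divided by the sum of f, which is the natural final step of the table even though that step is not printed in these notes.

```python
# Class limits and frequencies from the profit table (profits in thousands of Rs.).
class_limits = [(65, 84), (85, 104), (105, 124), (125, 144),
                (145, 164), (165, 184), (185, 204)]
frequencies = [9, 10, 17, 10, 5, 4, 5]

# The mid point of each class is the average of its lower and upper limits.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))

# Grouped mean = sum of fX divided by sum of f.
grouped_mean = sum_fx / sum_f

print(f"Sum of f  = {sum_f}")           # 60
print(f"Sum of fX = {sum_fx}")          # 7350.0
print(f"Mean      = {grouped_mean}")    # 122.5
```

Under this formula the average profit is 122.5 thousand rupees per industry.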

Frequency distributions and histograms

To more concisely communicate the information contained, raw data can be visually

represented and expressed in terms of statistical summary measures. When data are

quantitative, they can be transformed to a frequency distribution or a histogram describing the

number of observations occurring in each category.

Visual Description of Data

The set of classes in the frequency distribution must include all possible values and should be

selected so that any given value falls into just one category. Selecting the number of classes

to use is a subjective process. In general, between 5 and 15 classes are employed. A

frequency distribution may be converted to show either relative or cumulative frequencies for

the data.

• Stem-and-leaf displays, dot plots, and other graphical methods

The stem-and-leaf display, a variant of the frequency distribution, uses a subset of the

original digits in the raw data as class descriptors (stems) and class members (leaves). In the

dot plot, values for a variable are shown as dots appearing along a single dimension.

Frequency polygons, ogives, bar charts, line graphs, pie charts, pictograms, and sketches are

among the more popular methods of visually summarizing data. As with many statistical

methods, the possibility exists for the purposeful distortion of graphical information.

• The scatter diagram

The scatter diagram, or scatterplot, is a diagram in which each point represents a pair of

known or observed values of two variables. These are typically referred to as the dependent

variable (y) and the independent variable (x). This type of analysis is carried out to fit an

equation to the data to estimate or predict the value of y for a given value of x.

• Tabulation and contingency tables

When some of the variables represent categories, simple tabulation and cross tabulation

are used to count how many people or items are in each category or combination of

categories, respectively. These tabular methods can be extended to include the mean or other

measures of a selected quantitative variable for persons or items within a category or

combination of categories.

MODE

Mode is a value which has maximum frequency as compared to other items of a data set. OR,

the most frequent value of a data set is called mode.

A distribution/data set having only one mode is called a uni-modal distribution. Similarly, a distribution is said to be bi-modal if it has two modes. Generally, a distribution having more than one mode is called a multi-modal distribution. For example:

a). 2, 4, 6, 4, 8, 10 (mode = 4)

b). 2, 4, 6, 4, 8, 10, 8 (mode = 4 and 8)

c). 2, 4, 6, 4, 8, 10, 8, 10 (mode = 4, 8 and 10)

If all the observations of a data set have the same frequency (are repeated the same number of times), the data set has no mode. For example: 2, 4, 6, 4, 8, 10, 8, 10, 6, 2. This data set has no mode because every observation is repeated the same number of times (twice).

Mode is the appropriate average for qualitative/nominal data.
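The rules for finding the mode can be sketched as a small Python function. The name `modes` is illustrative; the function assumes a non-empty data set and returns an empty list when every distinct value occurs equally often (no mode).

```python
from collections import Counter


def modes(values):
    """Return the list of modes of a non-empty data set (empty list if there is no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    # If there are several distinct values and all are equally frequent, there is no mode.
    if len(counts) > 1 and all(count == highest for count in counts.values()):
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([2, 4, 6, 4, 8, 10]))          # uni-modal
print(modes([2, 4, 6, 4, 8, 10, 8]))       # bi-modal
print(modes([2, 4, 6, 4, 8, 10, 8, 10]))   # multi-modal
print(modes([2, 4, 6, 8]))                 # no mode
```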

MODE FOR CONTINUOUS SERIES

Consumption   f
4.5-9.5       10
9.5-14.5      8
14.5-19.5     20
19.5-24.5     5
24.5-29.5     4
29.5-34.5     3
Total         50
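The formula for the mode of a continuous series is not printed in these notes. As a hedged sketch, the code below assumes the standard textbook interpolation formula, Mode = L + (fm – f1) / ((fm – f1) + (fm – f2)) × h, where L is the lower boundary of the modal class, fm its frequency, f1 and f2 the frequencies of the preceding and following classes, and h the class width. The function name `grouped_mode` is hypothetical.

```python
def grouped_mode(boundaries, frequencies):
    """Mode of a grouped distribution using the standard interpolation formula (assumed)."""
    m = frequencies.index(max(frequencies))   # position of the modal class
    fm = frequencies[m]
    f1 = frequencies[m - 1] if m > 0 else 0                      # preceding class
    f2 = frequencies[m + 1] if m < len(frequencies) - 1 else 0   # following class
    lower, upper = boundaries[m]
    h = upper - lower                          # class width
    return lower + (fm - f1) / ((fm - f1) + (fm - f2)) * h


# Consumption data from the table above.
boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5),
              (19.5, 24.5), (24.5, 29.5), (29.5, 34.5)]
frequencies = [10, 8, 20, 5, 4, 3]

mode_value = grouped_mode(boundaries, frequencies)
print(f"Modal class: 14.5-19.5, mode = {mode_value:.2f}")  # about 16.72
```

The modal class is 14.5-19.5 because it has the maximum frequency (20). Under the assumed formula the mode is 14.5 + (12/27) × 5, or about 16.72.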

MEDIAN

Median is a value which divides an arranged data set into two equal parts, i.e. half (50%) of the observations lie below and half (50%) lie above that value.

For example, find the median of the following data showing the weekly profit (000) of seven stores: 10, 20, 15, 13, 14, 9 and 12.

Arranged data (increasing order): 9, 10, 12, 13, 14, 15, 20

Median = 13

Similarly, for a data set with an even number of observations, the median is the average of the two middle values. For example:

9, 10, 12, 13, 14, 15, 16, 20 (here n = 8), so Median = (13 + 14)/2 = 13.5
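The two cases (odd and even number of observations) can be sketched in Python. The function name `median` is chosen for illustration; Python's standard library also provides `statistics.median`.

```python
def median(values):
    """Median of a non-empty data set: middle value, or average of the two middle values."""
    ordered = sorted(values)          # arrange the data in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of the two middle values


print(median([10, 20, 15, 13, 14, 9, 12]))       # seven stores -> 13
print(median([9, 10, 12, 13, 14, 15, 16, 20]))   # n = 8 -> 13.5
```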

Numerical Examples-Continuous Frequency Distribution

The following data show the frequency distribution of the salary of 50 employees of a firm. Calculate the following:

1. Arithmetic mean
2. Median

Salary (000)          5-9   10-14   15-19   20-24   25-29   30-34
Number of employees   20    10      8       5       4       3

For the consumption data in the section on the mode for a continuous series, the mode lies in the class 14.5-19.5, as it has the maximum frequency (20).

To calculate the required quantities, we take the following steps:

Salary (000)   Class boundary   f    cf   X    fX
5-9            4.5-9.5          20   20   7    140
10-14          9.5-14.5         10   30   12   120
15-19          14.5-19.5        8    38   17   136
20-24          19.5-24.5        5    43   22   110
25-29          24.5-29.5        4    47   27   108
30-34          29.5-34.5        3    50   32   96
Total                           50             710
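The final results of this example are not printed in these notes. The sketch below computes them, assuming the usual grouped-data formulas: mean = sum of fX divided by sum of f, and median = L + (n/2 – c)/f × h, where L is the lower boundary of the median class, c the cumulative frequency before it, f its frequency and h its width. The function name `grouped_median` is hypothetical.

```python
boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5),
              (19.5, 24.5), (24.5, 29.5), (29.5, 34.5)]
frequencies = [20, 10, 8, 5, 4, 3]

n = sum(frequencies)
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]

# Arithmetic mean = sum of fX / sum of f.
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n


def grouped_median(boundaries, frequencies):
    """Median of a grouped distribution by interpolation within the median class (assumed formula)."""
    total = sum(frequencies)
    cumulative = 0                     # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= total / 2:    # first class whose cf reaches n/2
            return lower + (total / 2 - cumulative) / f * (upper - lower)
        cumulative += f


median_value = grouped_median(boundaries, frequencies)
print(f"Mean   = {grouped_mean:.1f}")    # 710 / 50 = 14.2
print(f"Median = {median_value:.1f}")    # 9.5 + (25 - 20)/10 * 5 = 12.0
```

Here n/2 = 25, so the median class is 9.5-14.5, the first class whose cumulative frequency (30) reaches 25.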

QUANTILES: Quartiles, deciles and percentiles are collectively called quantiles. Quantiles are also called measures of position.

We have already seen how the median divides data into two equal-size groups: one with values above the median, the other with values below the median. Quantiles also separate the data into equal-size groups in order of numerical value. There are several kinds of quantiles, of which quartiles will be the primary topic for our discussion.


Quartiles: After values are arranged from smallest to largest, quartiles are calculated similarly to the median. It may be necessary to interpolate (calculate a position between) two values to identify the data position corresponding to the quartile.

The three points which divide an arranged (ascending order) data set into four equal parts are called quartiles. Quartiles are denoted by Q1, Q2 and Q3.


Q1 = lower quartile or first quartile

Q2 = second quartile = median

Q3 = upper quartile or third quartile

Deciles: The nine points which divide an arranged (ascending order) data set into 10 equal parts are called deciles. Deciles are denoted by D1, D2, ..., D9.

D1 = first decile, D2 = second decile, ..., D9 = ninth decile

D1 is a value below which 10% of the observations lie and above which 90% lie;

D2 is a value below which 20% of the observations lie and above which 80% lie;

...

D9 is a value below which 90% of the observations lie and above which 10% lie.

Percentiles: Percentiles divide an arranged data set into 100 equal parts. There are 99 such points. Percentiles are denoted by Pi (i = 1, 2, 3, ..., 99).

P1 is a value below which 1% of the observations lie and above which 99% lie;

P2 is a value below which 2% of the observations lie and above which 98% lie;

...

P99 is a value below which 99% of the observations lie and above which 1% lie.

Note that P50 = D5 = Q2 = Median, and P75 = Q3.

COMPUTATIONAL FORMULAE

COMPUTATIONAL FORMULA FOR PERCENTILES
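The computational formulae themselves are not printed in these notes. As a hedged sketch for ungrouped data, the code below assumes one common textbook convention: the quantile for a proportion p sits at position p(n + 1) in the ordered data, and a fractional position is handled by interpolating between the two neighbouring values. Other conventions exist and give slightly different answers. The function name `quantile` is hypothetical.

```python
def quantile(values, p):
    """Quantile at position p*(n+1) of the ordered data (one common convention, assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1)                   # 1-based position in the ordered data
    position = min(max(position, 1), n)      # keep the position inside the data
    lower_index = int(position) - 1          # 0-based index of the value at or below the position
    fraction = position - int(position)
    if fraction == 0:
        return ordered[lower_index]
    # Interpolate between the two neighbouring ordered values.
    return ordered[lower_index] + fraction * (ordered[lower_index + 1] - ordered[lower_index])


profits = [10, 20, 15, 13, 14, 9, 12]        # weekly profit (000) of seven stores
q1 = quantile(profits, 0.25)                 # Q1 = P25
q2 = quantile(profits, 0.50)                 # Q2 = D5 = P50 = median
q3 = quantile(profits, 0.75)                 # Q3 = P75
print(q1, q2, q3)                            # 10 13 15
```

With n = 7 the positions for Q1, Q2 and Q3 are 2, 4 and 6, so no interpolation is needed. With the eight values 9, 10, 12, 13, 14, 15, 16, 20 the median position is 4.5 and the function returns 13.5.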

MEASURES OF DISPERSION

Although the mean and other measures are useful in identifying the central tendency of

values in a population or a set of sample data, it’s also valuable to describe their dispersion,

or scatter.

Measures of central tendency (mean, median, mode, GM and HM) do not tell us how the individual observations in a data set are scattered around the central value. This information is provided by measures of dispersion.

A single value which measures how the individual observations of a data set are scattered/dispersed around the central value is called a measure of dispersion. Measures of dispersion are classified as “absolute measures” and “relative measures” of dispersion.

TYPES OF MEASURES OF DISPERSION

A measure of dispersion which is expressed in the same unit of measurement as the original series/data set/distribution is called an “absolute measure” of dispersion. Examples are the range, quartile deviation, mean deviation and standard deviation. Similarly, a measure of dispersion which is independent of the unit of measurement is called a “relative measure” of dispersion. Examples are the coefficient of range, coefficient of quartile deviation, coefficient of mean deviation and coefficient of variation.

Absolute Measures of Dispersions

1. Range
2. Inter quartile range
3. Semi inter quartile range or Quartile Deviation (QD)
4. Mean deviation
5. Variance
6. Standard deviation

Relative Measures of Dispersions

1. Coefficient of Range
2. Coefficient of Inter quartile range
3. Coefficient of Semi inter quartile range
4. Coefficient of Mean deviation
5. Coefficient of Variation

RANGE AND ITS COEFFICIENT

The range is the simplest measure of dispersion. It is an absolute measure, defined as the difference between the maximum (Xm) and minimum (X0) values of a data set. Mathematically:

Range = Xm – X0

Its relative measure is called the coefficient of range and is defined as:

Coefficient of Range = (Xm – X0)/(Xm + X0)

For example, the following data indicate the amount of fill (in ml) of 5 different bottles filled by a soft drink company: 12.5, 12.3, 12, 13, 12.8.

So, Range = 13 – 12 = 1 ml.

Coefficient of Range = (13 – 12)/(13 + 12) = 1/25 = 0.04.
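The same calculation can be written as a short Python sketch, using the fill data from the example (variable names are illustrative):

```python
# Amount of fill (in ml) of 5 bottles, from the example above.
fills = [12.5, 12.3, 12, 13, 12.8]

x_max = max(fills)   # Xm, the maximum value
x_min = min(fills)   # X0, the minimum value

data_range = x_max - x_min                                # absolute measure, in ml
coefficient_of_range = (x_max - x_min) / (x_max + x_min)  # relative measure, unit free

print(f"Range = {data_range} ml")
print(f"Coefficient of range = {coefficient_of_range}")
```

The range keeps the unit of the data (ml). The coefficient is a pure number, so it can be compared across data sets measured in different units.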

Inter Quartile Range and Quartile Deviation

Inter Quartile Range (IQR) is an absolute measure of dispersion. “It is the difference between the upper quartile (Q3) and the lower quartile (Q1) of a data set”. Mathematically, IQR is defined as:

IQR = Q3 – Q1

Semi inter quartile range (SIQR): It is half of the inter quartile range. It is also called the quartile deviation (QD) and is expressed as:

SIQR = QD = (Q3 – Q1)/2

The relative measures of IQR and SIQR are called the coefficient of IQR and the coefficient of SIQR (coefficient of QD), respectively. They share the same formula, because the factor of one half cancels in the ratio:

Coefficient of IQR = (Q3 – Q1)/(Q3 + Q1)

Coefficient of SIQR = Coefficient of QD = (Q3 – Q1)/(Q3 + Q1)
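A minimal Python sketch of these measures follows. The quartile values Q1 = 10 and Q3 = 15 are hypothetical inputs chosen for illustration, and the function name `quartile_measures` is not from the notes.

```python
def quartile_measures(q1, q3):
    """Return (IQR, quartile deviation, coefficient of QD) for given quartiles."""
    iqr = q3 - q1                        # inter quartile range
    qd = iqr / 2                         # semi inter quartile range (quartile deviation)
    coefficient = (q3 - q1) / (q3 + q1)  # same for IQR and QD: the 1/2 cancels
    return iqr, qd, coefficient


# Hypothetical quartiles, for illustration only.
iqr, qd, coefficient = quartile_measures(10, 15)
print(f"IQR = {iqr}, QD = {qd}, coefficient = {coefficient}")
```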

MEAN DEVIATION AND ITS COEFFICIENT

Mean deviation is the arithmetic mean of the absolute deviations taken from a central value (mean, median or mode) of a data set. It is an absolute measure of dispersion.

With this descriptor, sometimes called the average deviation or the average absolute deviation, we consider the extent to which the data values tend to differ from the mean. In particular, the mean absolute deviation (MAD) is the average of the absolute values of differences from the mean and may be expressed as follows:

$$\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$$

In the preceding formula, we find the sum of the absolute values of the differences between the individual values and the mean, and then divide by the number of individual values. The two vertical lines (“| |”) tell us that we are concerned with absolute values, for which the algebraic sign is ignored.

COEFFICIENT OF MEAN DEVIATION

Its relative measure is called the coefficient of mean deviation. It is obtained by dividing the mean deviation by the central value from which the deviations were taken:

Coefficient of Mean Deviation = Mean Deviation / Central value (mean, median or mode)
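As a hedged sketch, the code below computes the mean deviation about the mean and its coefficient for the consumption data of the 9 MBA students used in the arithmetic mean example. It assumes the coefficient is the mean deviation divided by the mean; the function name `mean_deviation` is illustrative.

```python
def mean_deviation(values):
    """Mean of the absolute deviations of the values from their arithmetic mean."""
    centre = sum(values) / len(values)
    return sum(abs(x - centre) for x in values) / len(values)


# Consumption (in thousands of Rs.) of 9 MBA students, from the arithmetic mean example.
consumption = [39, 36, 48, 36, 41, 37, 32, 46, 45]

mean_value = sum(consumption) / len(consumption)   # 360 / 9 = 40
md = mean_deviation(consumption)                   # 40 / 9, about 4.44
coefficient_md = md / mean_value                   # assumed definition: MD / mean

print(f"Mean = {mean_value}")
print(f"Mean deviation = {md:.2f}")
print(f"Coefficient of mean deviation = {coefficient_md:.3f}")
```

The absolute deviations from the mean of 40 are 1, 4, 8, 4, 1, 3, 8, 6 and 5. They sum to 40, so the mean deviation is 40/9, or about 4.44 thousand rupees.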
