
7/28/2019 Business+Statistics

1/123

Business Statistics


2/123

Contents

1. Meaning and Scope

2. Collection of Data

3. Classification and tabulation

4. Diagrammatic and Graphic Representation

5. Averages

6. Dispersion

7. Skewness and Kurtosis

8. Correlation

9. Linear Regression Analysis

10. Index Numbers

11. Time series Analysis

12. Theory of Probability

13. Random Variable, Probability Distribution and Mathematical expectation

14. Theoretical Distributions

15. Sampling Theory and Design of sample Surveys

16. Interpolation and Extrapolation


3/123

Quantitative Decision Making


4/123

Learning Objectives

Basic statistics and its application in the day-to-day life of a manager

Various aspects of quantitative techniques and their application in decision making

Also frequently used models of Statistical analysis

Understand:

Complexity of Managerial decisions

Quantitative Techniques

Need of using Quantitative approach in decisions

Role of statistical methods in data analysis

Brief idea of various statistical methods

Know the areas of application of the quantitative approach in business and management.


5/123

Introduction

Individual business prior to the Industrial Revolution and the need for info: decisions based on past

experience and intuition.

Marketing of products

Test marketing of products

The manager (also the owner)

Progress of work

Any other fact the owner needed to know


6/123

Intuition alone has no place in

decision making

Becomes highly questionable when decisions

involve the choice among several courses of action,

each of which can achieve several management objectives simultaneously.


7/123

Statistical methods used in

Marketing, Finance, Production and

personnel

Also in:

Regional planning

Transportation

Public health

Communication

Military

agriculture


8/123

QT: A group of statistical and OR

(programming) techniques

QT approach in decision making:

Problems should be defined, analyzed and solved in a conscious,

rational, systematic, scientific manner based on:

Data, facts, information, and logic (and not whims and guesses)

QT provides the decision maker with a scientific method based on quantitative

data for identifying a course of action to achieve the optimal value of the

predetermined objective or goal.

Numbers, symbols or mathematical formulae are used to

represent the models of reality.


9/123

Statistics and different senses

Statistical Data

Numerical or quantitative aspects

Statistical Methods

Collect, organize /classify, present, analyze and

interpret


10/123

Functions of Statistical Methods

Data Collection

Organize: segregate/condense

Presentation: orderly manner: graphs/charts

Analysis

Interpretation

examples


11/123

Statistics:

Characteristics of Data: Common to refer

data in quantitative form as Data.

Not all numerical data is statistical.

For a numerical description to be statistics, it must be:

An aggregate of facts

Affected to a marked extent by a multiplicity of causes

(controllable/uncontrollable)

Enumerated or estimated according to a reasonable standard of

accuracy.

Collected in a systematic manner for a pre-determined purpose.

Placed in relation to each other

Numerically expressed


12/123

Types of Statistical data

Secondary

Primary


13/123

OR : a mathematical model to represent the

situation under study.

Helps either to:

Predict the performance of a system

Or determine the action or control needed to optimize the

performance.


14/123

Classification of Statistical Methods

into three categories

Descriptive Statistics

Data Collection

Presentation

Inductive statistics

Statistical inference

Estimation

Statistical decision Theory

Analysis of business Decision


15/123

Descriptive Statistics

Used for re-arranging, grouping, and summarizing

sets of data

Examples: changes in a price index,

yield of wheat, shown using different charts and graphs

so that large quantities of numerical data are easy to

understand

Includes various types of averages (central tendency), dispersion, trends, index numbers.


16/123

Inductive Statistics

The development of criteria which can be used to derive information about the nature of the entire population

or universe from the nature of a small sample.

Includes: probability, probability distributions, sampling and sampling

distributions,

various methods of testing hypotheses: correlation, regression,

factor analysis, time series analysis.


17/123

Statistical Decision Theory; 4 different

states of decision environment

State of decision and Consequence

Certainty: Deterministic

Risk: Probabilistic

Uncertainty: Unknown

Conflict: Influenced by an opponent

Subjective approach (uses probabilities)

Also known as Bayesian approach,


18/123

Models in OR

Based on Purpose:

Descriptive: describe the behavior of a system (behavior of demand for an inventory item)

Explanatory: explain behavior through relationships (wages, promotion policy)

Predictive: predict stock prices for any given level of earnings per share

Prescriptive (normative): norms for comparison of alternative solutions (allocation)

Based on Degree of Abstraction:

Physical, Graphic, Schematic, Analog, Mathematical

Based on Degree of Certainty and Risk:

Deterministic: linear programming, transportation and assignment models

Probabilistic: simulation models, decision theory

Based on Specified Behavior Characteristics:

Static, Dynamic, Linear, Non-linear

Based on Procedure (Method) of Solution:

Analytical, Simulation


19/123

Classification of models helps in

understanding the nature and role of

models

Abstract or physical

Static: linear programming

Dynamic model

Linear or non-linear

Stable,

unstable

unstable( Constrained)

Unstable (explosive)

Transient steady state,

Transient (non existent)

Ref:


20/123

Various Statistical Techniques

Measure of Central tendency

Measure of Dispersion:

Correlation

Regression analysis:

Time Series Analysis

Index Numbers

Sampling and Statistical Inference


21/123

Measure of Central tendency

Mean: common arithmetic average

Divide the sum of the values of the observations by the number of items observed.

Median:

The item that lies exactly in the middle of the observations

when they are arranged in ascending/descending order. Not

affected by extreme values of observations

Divides the number of households into two equal parts.

(50% of all households have income below the median income)

Mode:

Category that has the maximum number of observations (the one that occurs most

frequently)
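The three measures above can be sketched with Python's standard `statistics` module. The household incomes are hypothetical figures chosen only for illustration; note how the single extreme value pulls the mean up while the median stays put.

```python
from statistics import mean, median, multimode

# Hypothetical monthly incomes (in thousands) for seven households.
incomes = [12, 15, 15, 18, 22, 30, 98]

income_mean = mean(incomes)      # sum of the values divided by their count
income_median = median(incomes)  # middle value once the data are sorted
income_modes = multimode(incomes)  # most frequently occurring value(s)

print(income_mean, income_median, income_modes)
```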


22/123

Measure of Dispersion:

spread away from central tendency

(mean/mode/median) :

Range, mean deviation, Standard deviation.

Whether the data are spread in a symmetrical or asymmetrical

pattern: skewness

Peakedness of the frequency distribution:

measure called kurtosis
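A minimal sketch of the three dispersion measures named above (range, mean deviation, standard deviation), assuming illustrative data and treating the values as a complete population; the variable names are not from the source.

```python
from statistics import mean, pstdev

values = [4, 8, 6, 2, 10]  # illustrative observations

centre = mean(values)

# Range: difference between the largest and smallest observation.
data_range = max(values) - min(values)

# Mean deviation: average absolute distance from the mean.
mean_deviation = sum(abs(x - centre) for x in values) / len(values)

# Standard deviation (population form): root of the mean squared deviation.
std_dev = pstdev(values)

print(data_range, mean_deviation, std_dev)
```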


23/123

Correlation

Measures how a dependent variable is associated with changes

in an independent variable.

Example: sales as the dependent variable and advertising budget as the independent variable.

The relationship could be casual (coincidental) or causal


24/123

Regression analysis:

determining the causal relationship between

two variables

Use of multivariate statistical techniques for

determining causal relationships involving two or

more variables:

Multi-regression analysis, Discriminant analysis, factor

analysis


25/123

Time Series Analysis

A set of data (arranged in some desired manner) recorded either at successive points in time or over

successive periods of time.

The changes are considered the resultant of the combined effect of a force

The force components:

Editing time series data

Secular trend

Periodic changes (cyclical/seasonal variations)

Irregular or random variations.

Examples: cost of living, growth of agricultural/food production, seasonal requirements of items, impact of war, strikes


26/123

Index Numbers: a relative number

representing net result of change in a group

of variables

Stated in percentages

Compares a given or current year with a base year

Examples: production, sales price, volume of employment
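As a rough illustration of "current year relative to base year, stated as a percentage", the sketch below computes a simple (unweighted) index. The function name `simple_index` and the production figures are hypothetical.

```python
def simple_index(current_value, base_value):
    """Express the current-year value as a percentage of the base-year value."""
    return 100 * current_value / base_value

# Hypothetical production: 120 units in the base year, 150 in the current year.
production_index = simple_index(150, 120)
print(production_index)  # a value above 100 indicates a rise over the base year
```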


27/123

Sampling and Statistical Inference

Sampling for reasons

Schemes for drawing samples are classified as:

Random Sampling Schemes

Every element has an equal chance (probability) of being selected

Non-random sampling schemes

Drawing samples based on the choice or purpose of the selectors

Sampling analysis using various tests:

Z (normal distribution)

Student's t distribution

F distribution

Chi-square (χ²) distribution


28/123

Advantages to Management

Definiteness

Condensation

Comparison

Formulation of policies

Formulating and testing hypothesis

Prediction


29/123

Application of techniques in Business

and Management

Management

Marketing

Production

Finance, accounting and Investment

Personnel

Economics

Research and Development

Natural science


30/123

Marketing

Marketing research info

Building and maintaining an extensive

market

Sales forecasting


31/123

Production

PPC and analysis

Machine performance evaluation

QC

Inventory control


32/123

Finance, accounting and Investments

Financial forecast, budget preparation

Fin Investment decision

Selection of securities

Auditing function

Credit policies, credit risk, delinquent

account


33/123

Personnel

Labour turnover rate

Employment trends

Performance appraisal

Wage rates and incentive plans


34/123

Economics

Measurement of Gross National Product and input-output analysis

Determination of business cycles, seasonal

fluctuations

Comparison of market price, cost and profit of

individual firm

Analysis of population, Operational studies of Public utilities

Formulation of appropriate economic policies and

evaluation of their effects


35/123

Research and Development

Development of new product lines

Optimal use of resources

Evaluation of existing products


36/123

Natural science

Diagnosing based on inputs

Efficacy of certain drugs

Study of plant life


37/123

Exercise/ Assignments

1. Comment on the statement: Statistics are numerical statements of facts, but all facts numerically stated are not statistics

2. Explain the distinction between: Descriptive and Prescriptive models

Presentation topic:

1. Formulate a business problem and analyze it by

applying the major phases of statistics


38/123

Functions and Progressions


39/123

Learning Objectives:

Insight into different aspects of the types of functional

relationships among business variables

Their applications in various fields of management

Need to Identify/define relationships among business

variables

Define functional relationships

Various types of functional relationships

Use of graph to depict functional relationships

Managerial applicability

Progression and application..


40/123

Introduction

For decision problems which use mathematical tools, the first requirement is to identify or formally

define all significant interactions or relationships

among primary factors (also called variables). The

relationships usually are stated in the form of an

equation or inequation.

Study mathematical problems in the context of

managerial problems



41/123

Definitions

Variables: A variable is something whose magnitude can

vary or which can assume various values. Represented by

symbols (first letter of the name)

Discrete variable: subject to counting (houses, machines)

Continuous variable: subject to measurement (temperature, height)

Constants and Parameters:

A constant remains fixed in the context of a given problem or situation

An absolute (or numerical) constant retains the same value in all problems

The absolute (or numerical) value of b is denoted by |b| regardless of its algebraic sign: |b| = |-b|

An arbitrary (or parametric) constant, or parameter, retains the same value throughout any particular problem, but may assume different values in different problems

P21 (ex1)


42/123

Types of Function

Linear Functions:

The power of the independent variable is 1

A function with only one independent variable is called a single variable function. (P21(1))

A single variable function can be linear or non-linear. (p 22)

A linear function with one variable can always be graphed in a two-dimensional plane (or space). The graph of such a function is always a straight line.

(P22 ex2)

Polynomial functions:

A polynomial function of degree 1 is called a linear function

A polynomial function of degree 2 is called a quadratic function (p23-ab)

Absolute Value Functions: (p23(3))

Inverse Function: (P 23)

Step function: For different values of an independent variable x in an interval, the

dependent variable y=f(x) takes a constant value, but takes different values in different intervals. (p24-5)

Algebraic and Transcendental functions


43/123

Activity

P 25 activity B -1a&b assignment


44/123

Business Application

Linear Function (P27-ex3) assignment

Quadratic function (P27-ex4) assignment

Activity D (Page 28-b) assignment


45/123

Sequence and Series

If for every positive integer n, -------- related to some number ----- sequence

Installment buying,

simple and compound interest problems

Annuities and present values

Mortgage payments


46/123

Arithmetic progression (AP)

Arithmetic progression: A sequence whose

terms increase or decrease by a constant

number, called the common difference of the AP and denoted by d

P29 ex6 assignment
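A short sketch of the standard AP formulas, with first term a, common difference d and n terms: the n-th term is a + (n-1)d and the sum of n terms is n[2a + (n-1)d]/2. The function names and the installment figures are hypothetical, not taken from the referenced exercise.

```python
def ap_term(a, d, n):
    """n-th term of an AP with first term a and common difference d."""
    return a + (n - 1) * d

def ap_sum(a, d, n):
    """Sum of the first n terms of the AP."""
    return n * (2 * a + (n - 1) * d) / 2

# Hypothetical installments: 100 in month 1, rising by 10 each month for a year.
last_installment = ap_term(100, 10, 12)
total_paid = ap_sum(100, 10, 12)
print(last_installment, total_paid)
```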


47/123

Geometric progression (GP)

A geometric progression: A sequence

whose terms increase or decrease by a

constant ratio, called the common ratio of the GP and denoted by r

P29 ex7 assignment

P31 ex 8
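The corresponding GP formulas can be sketched the same way: with first term a and common ratio r, the n-th term is a·r^(n-1) and the sum of n terms is a(r^n - 1)/(r - 1) for r ≠ 1. Function names and numbers are illustrative assumptions.

```python
def gp_term(a, r, n):
    """n-th term of a GP with first term a and common ratio r."""
    return a * r ** (n - 1)

def gp_sum(a, r, n):
    """Sum of the first n terms of the GP."""
    if r == 1:
        return a * n  # every term equals a
    return a * (r ** n - 1) / (r - 1)

fourth_term = gp_term(2, 3, 4)   # terms are 2, 6, 18, 54
first_four_sum = gp_sum(2, 3, 4)
print(fourth_term, first_four_sum)
```

Compound interest follows the same pattern: a principal growing at rate i per period is a GP with common ratio 1 + i.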


48/123

Concept of Maxima and Minima

with managerial applications

Page 55 ex18 assignment


49/123

Descriptive Statistics

Data Collection and analysis


50/123

Contents

Collection of data:

Need and significance of data collection

Primary and secondary data

Different methods of collecting primary data

Edit primary data and know sources of secondary data and its use

Census versus sample

Classification and presentation of collected data

Treatment of data through central tendency measurements,

Deviations and different measures of variation.


51/123

Introduction

The need for data collection

Statistical data is a set of facts expressed in

quantitative form.

The use of facts expressed as measurable

quantities can help a decision maker to arrive at

better decisions.


52/123

Primary and Secondary Data

Distinguish between Primary and------


53/123

Methods of collecting Primary Data

Observation

Questionnaire

Personal interview

Mail

Telephone

Designing/Preparing questionnaire

Pre-testing a questionnaire

Editing the primary data.


54/123

Important points in Designing a

questionnaire

Covering letter

Number of questions to be kept to a minimum (15-40)

Simple, short, and unambiguous

Questions of a sensitive and personal nature should be avoided

Answers to the questionnaire should not require

calculations

Logical arrangement

Crosscheck and footnotes


55/123

Editing Primary Data to ensure:

completeness

Consistency

Accuracy

Homogeneity


56/123

Sources of secondary data

Published Sources

Unpublished Sources


57/123

Precautions in use of secondary Data

Because of bias, inadequate sample size,

errors of definitions, computational errors

Hence consider:

Suitability

Reliability

Adequacy


58/123

Census (complete enumeration) and

Sample

Advantages and disadvantages of census

(Physical destruction)


59/123

Exercises/Assignments

1. Distinguish between Primary and

Secondary data. Indicate the situations in

which each of these----?

2. Distinguish between census and sampling

methods of data collection. Compare their

merits/demerits. Why is sampling unavoidable in certain situations?


60/123

Presentation of Data



61/123


Learning objectives

Understand the need and significance of presentation of data

Necessity of classifying data and various types of classification

Construct frequency distributions of discrete and continuous data

Frequency distribution in the form of: bar diagrams, histograms,

frequency polygon, and ogives

Classification

Discrete frequency distribution

Continuous frequency distribution

Choosing the classes

Cumulative and Relative frequencies

Charting data



62/123

Introduction

After understanding the various ways of data

collection:

The successful use of the data collected depends on:

The manner in which it is arranged, displayed and summarized.

Data can be presented either in tabular form or through charts

In tabular form, it is necessary to classify the data before the data is

tabulated. Hence the need to understand:

classification ,

tabulation and

charting of data.



63/123

Classification of data

After the data has been systematically collected and edited,

the first step in presentation of data is classification

Classification is the process of arranging the data according to points of similarity and dissimilarity


64/123

Principal objectives of classification

To condense the mass of data in such a way that

salient features can be easily noticed

To facilitate comparisons between attributes of

variables

To prepare data to be presented in tabular form

To highlight significant features of data at a glance


65/123

Some Common Types of Classification

Geographical Classification: production of wheat state-wise

Chronological Classification: sales figures of a company for the last six years

Qualitative Classification

Dichotomous Classification: an attribute divided into two classes, one possessing and the other not possessing it (basis of employment)

Manifold Classification: divided into several classes (educational level)

Quantitative Classification: according to characteristics that can be measured (employees as per monthly salaries)

Discrete: limited to certain numerical values of a variable

Continuous: takes all values of the variable


66/123

Examples

Chronological classification

Discrete frequency distribution

Continuous frequency distribution

P14,15



67/123

Construction of a Discrete Frequency

distribution

Place all possible values of the variable in ascending order in one column

Then prepare another column of tally marks to count the

number of times a particular value of the variable is repeated

To facilitate counting, use blocks of 5 tally marks with a space left in between blocks

The frequency column gives the number of tally marks that a particular class contains

p15
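The tally procedure above can be sketched with `collections.Counter`. The data (number of children per household) are hypothetical and the tally column is printed as plain strokes rather than grouped in blocks of five.

```python
from collections import Counter

# Hypothetical raw data: number of children in 12 households.
children = [2, 0, 1, 3, 2, 2, 1, 0, 4, 2, 1, 3]

# Count how many times each value of the variable occurs.
frequency = Counter(children)

# Print value, tally marks and frequency, with values in ascending order.
for value in sorted(frequency):
    print(value, "|" * frequency[value], frequency[value])

total_frequency = sum(frequency.values())
```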



68/123

Construction of a Continuous

Frequency distribution

Class limits: 60-69: lower and upper limits, the lowest and highest values of the class

Class interval: width, span or size, e.g. 20-10=10

Class frequency: The number of observations falling within a particular class is called the class frequency or

frequency. Total frequency (sum of all frequencies)

indicates the total number of observations considered in a given frequency distribution.

Class mid-point: sum of two successive lower limits

divided by 2.



69/123

Assignments

1. What do you understand by classification of data?

2. Why is classification of data required?

3. Illustrate the difference between qualitative and quantitative data.



70/123

Types of class interval: Methods

Exclusive and Inclusive (on whether upper limit is

included or excluded) ----(p16)

Open-end (p17)

Generally opt for the exclusive method

But if the inclusive method is used, minor adjustments are required

to determine the class interval

Correction factor: (lower limit of second class - upper limit of

first class) divided by 2

Deduct the correction factor from each lower limit and add it to each upper

limit
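A minimal sketch of the correction-factor adjustment described above, assuming inclusive classes of equal spacing; the function name `to_class_boundaries` and the class limits are illustrative.

```python
def to_class_boundaries(classes):
    """Convert inclusive (lower, upper) class limits to continuous boundaries."""
    # Correction factor: half the gap between the first upper and second lower limit.
    correction = (classes[1][0] - classes[0][1]) / 2
    # Deduct it from every lower limit and add it to every upper limit.
    return [(lower - correction, upper + correction) for lower, upper in classes]

inclusive_classes = [(60, 69), (70, 79), (80, 89)]
boundaries = to_class_boundaries(inclusive_classes)
print(boundaries)
```

After the adjustment each class has a width of 10 and adjacent classes share a boundary, which is what the exclusive method expects.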



71/123

Guidelines for choosing the class

The number of classes should not be too small or too large

(5 to 15)

If possible, the widths of the intervals should be numerically simple, like 5, 10, 25 (values like 3, 7, 9 should be avoided)

It is desirable to have classes of equal width (classes with unequal class intervals can be formed, as in an income distribution)

The starting point of a class should be 0, 5, 10, or a multiple thereof (e.g. 3-13 not allowed)

The class interval should be determined considering the minimum and maximum values and the number of classes to be formed

(p18)


72/123

Activity

Distinguish between:

1. Discrete and continuous frequency

distribution

2. Class limits and class intervals

3. Inclusive and exclusive methods



73/123


Rather than listing the actual frequency opposite

each class, it may be appropriate to list either cumulative frequencies or relative frequencies or both.

Cumulative frequencies: cumulate the frequencies, starting from either the lowest or the highest value. (p18-19)

Relative frequencies: Very often, the frequencies in a

frequency distribution are converted to relative frequencies to show the percentage for each class. The frequency of each class is divided by the total number of observations (total frequency). To get the percentage for

each class, multiply the relative frequency by 100. (p19)
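The calculations above can be sketched as follows, assuming five hypothetical class frequencies listed from the lowest class to the highest.

```python
from itertools import accumulate

frequencies = [5, 12, 18, 10, 5]  # hypothetical class frequencies
total = sum(frequencies)

# Cumulating from the lowest class upward ("less than" type).
less_than_cumulative = list(accumulate(frequencies))

# Cumulating from the highest class downward ("more than" type).
more_than_cumulative = list(accumulate(reversed(frequencies)))[::-1]

# Relative frequency = class frequency / total frequency; times 100 for percent.
relative = [f / total for f in frequencies]
percentages = [100 * r for r in relative]

print(less_than_cumulative, more_than_cumulative, percentages)
```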



74/123

Important advantages in looking at

Relative frequencies (percentages)

1. Facilitates a comparison of two or more

sets of data.

2. Constitutes the basis for understanding the concept of probability.


75/123

Activity

Explain the concept of relative frequency


76/123

Charting of Data


77/123

Bar diagram


78/123

Bar diagram

Most popular

Examples: population, per capita income, sales and profits

A bar is a thick line whose width is shown merely to attract the

viewer.

A bar diagram may be either vertical or horizontal.

DRAWING A BAR DIAGRAM:

Take the characteristic (or attribute) under consideration on the X-axis and the corresponding value on the Y-axis. It is desirable to mention the value depicted by the bar on top of the bar.

The gap between one bar and the next is kept equal.

The widths of the bars are also the same.

The only difference is in the length of the bars.

That is why this type of diagram is known as one-dimensional.

(P20)

79/123

Histograms

One of the most commonly used and easily understood

methods of graphic representation of frequency distribution.

A histogram is a series of rectangles having areas that are in

the same proportion as the frequencies of a frequency

distribution

CONSTRUCTING HISTOGRAM:

On the horizontal axis (X-axis) we take the class limits of the variable, and on

the vertical axis (Y-axis) we take the frequencies of the class intervals shown on

the horizontal axis

If the class intervals are of equal width, then the vertical bars are of equal

width. (P20-21)

On the other hand, if the class intervals are unequal, the frequencies have to

be adjusted according to the width of each class interval (P 21-22)
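One common way to make the adjustment for unequal class intervals is to rescale each frequency to the narrowest class width, so that rectangle areas stay proportional to the original frequencies. The sketch below assumes that approach and uses hypothetical classes; the referenced pages may present the adjustment differently.

```python
# Hypothetical classes as (lower limit, upper limit, frequency).
classes = [(0, 10, 8), (10, 20, 12), (20, 40, 20), (40, 70, 15)]

# Use the narrowest class width as the standard width.
base_width = min(upper - lower for lower, upper, _ in classes)

# Adjusted frequency (bar height) = frequency * base width / class width.
adjusted = [freq * base_width / (upper - lower) for lower, upper, freq in classes]

print(adjusted)
```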


80/123

Activity

Draw a sketch of a histogram and a bar

diagram and explain the difference between

the two.



81/123

Frequency Polygon

A graphical presentation of frequency distribution

A polygon is a many-sided closed figure.

A frequency polygon is constructed by:

taking the mid-points of the upper horizontal sides of each rectangle on the

histogram and

connecting these mid-points by straight lines.

In order to close the polygon, an additional class is assumed at each end,

having zero frequency.

(p22-23)

The histogram is usually associated with discrete data and a frequency polygon

is appropriate for continuous data. (But the distinction is not always followed)

The frequency polygon and frequency curve have a special advantage over the

histogram, particularly when comparing two or more frequency distributions


82/123

Activity

What is the procedure for making a

frequency polygon? Illustrate.



83/123

Ogives or Cumulative frequency Curve

A graphical presentation of a cumulative frequency distribution.

There are two methods:

Less than ogive:

The upper limits of the various classes are taken on the X-axis, and the frequencies

obtained by cumulating the preceding frequencies on the Y-axis.

By joining these points we get the less than ogive

More than ogive:

The lower limits are taken on the X-axis and the cumulative frequencies on the Y-axis. By joining these points we get the more than ogive.

The less than ogive is a rising curve,

whereas the more than ogive would be a falling one



84/123

Activity

With the help of an example , explain the

concept of less than ogive and more than ogive.

85/123

Types of Data

Data refers to known facts or things used as basis for

inference or reckoning.

Types of Data:

Qualitative: concerned with qualities and non-numerical

characteristics.

Quantitative: concerned with numerical characteristics.

Discrete: takes only one of a range of distinct values (number of

employees).

Continuous: takes any value within a given range (time, length)

(P160-161BR)



86/123

The Concept of Level of Measurements

Scales of Measurement

Nominal level (Classificatory/ named) Data:

Ordinal level (Ranking/ordered) data:

Interval level (Numerical) data

Ratio level (Numerical) data: represents the highest level of precision.



87/123

Nominal level (Classificatory/ named)

Data:

And Implications for Data handling Methodologies

Classification of data: Statements of equality or differences

(according to variable occupation)

Although mode could be used, very few statistics can be

applied to data collected in this form

88/123

Ordinal level (Ranking/ordered) data:

And Implications for Data handling

Methodologies

Can be classified in terms of equality or differences

Permits you to order individual data and make decisions such as

this score is greater or lesser than another. (employee grades or

choices ranked)

Since the arithmetic mean cannot be calculated, the use of many

other statistics is also excluded.

89/123

Interval level (Numerical) data

And Implications for Data handling

Methodologies

Has the characteristics of both nominal and ordinal scales, but

also provides additional information regarding the degree of difference between individual data items within a set or group.

Most measures of human characteristics have interval

properties. (Interval between IQ scores/assignment marks)

However, precision in an interval scale is limited. Also, some statistics such as the geometric mean are excluded from use with

data collected in this form.

90/123

Ratio level (Numerical) data: represent

highest level of precision.

And Implications for Data handling Methodologies

A Mathematical number system (height, weight, time)

A ratio scale allows ratio as well as interval decisions (allowing us

to say something is so many times as big/bright/heavy)

Any statistics can be used on data collected in this form.

(Some scales, such as temperature, may appear to have ratio properties,

but in fact are only interval scales) (Centigrade)

91/123

Parametric and non-parametric methods

(assumptions about parameters of the data)

Associated with every data analytic method, there is a set of assumptions that underlie the use of that method.

The t-test (to compare the means of two samples of data) is one of the most popular parametric methods (p133-RM)

Non-parametric methods:

Developed with research in the social sciences in mind

Valid for use with nominal or ordinal level data.

Suitable for very small samples (fewer than n = 10), though the power of any test weakens with very small samples.


92/123

Measures of central Tendency

93/123

Measures of central Tendency

Learning objectives:

Concept and significance of measures of central

tendency.

Computing: arithmetic mean, weighted arithmetic mean,

median, mode, geometric mean, and harmonic mean.

Computing several quantiles: quartiles, deciles, and

percentiles

Relationships among various averages.



94/123

Significance of measure of central

tendency

The objective is to find one representative value

which can be used to locate and summarize the

entire set of varying values.

To find some central value around which the data

tend to cluster

Average income

Average sales figure may be compared with that of

another

95/123

Properties of a Good measure of central

tendency

Easy to understand

Simple to compute

Based on all observations

Uniquely defined

Capable of further algebraic treatment

It should not be unduly affected by extreme

values.



96/123

Important measures of central tendency

commonly used by Business and Industry.

arithmetic mean,

weighted arithmetic mean,

median,

quantiles

mode,

geometric mean,

harmonic mean.



97/123

Arithmetic Mean

(or Mean or Average)

In statistics, the term average refers to any of the measures of central tendency

The arithmetic mean is defined as being equal to the sum of the numerical values of each and every observation divided by the total number of observations.

E.g. average monthly salary (ungrouped data)

When observations are classified into a frequency distribution, the midpoint of a class interval is treated as the representative average value of that class.

(P-31 .)
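For grouped data, the midpoint rule above amounts to a frequency-weighted average of the class midpoints. A minimal sketch, assuming hypothetical classes with exclusive limits:

```python
# Hypothetical frequency distribution as (lower limit, upper limit, frequency).
classes = [(0, 10, 5), (10, 20, 15), (20, 30, 20), (30, 40, 10)]

total_frequency = sum(freq for _, _, freq in classes)

# Each class is represented by its midpoint, weighted by the class frequency.
grouped_mean = sum(((lower + upper) / 2) * freq
                   for lower, upper, freq in classes) / total_frequency

print(grouped_mean)
```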



98/123

Mathematical properties of

Arithmetic mean

The sum of deviations of observations from

AM is always zero

The sum of squared deviations of observations from the mean is minimum

Arithmetic means of several sets of data

may be combined into a single AM for the combined sets of data.
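The three properties can be checked numerically. The sketch below uses illustrative data; `sum_of_squares_about` and `combined_mean` are hypothetical helper names.

```python
from statistics import mean

data = [2, 4, 6, 8, 10]
am = mean(data)

# Property 1: deviations from the arithmetic mean sum to zero.
sum_of_deviations = sum(x - am for x in data)

# Property 2: squared deviations are smallest when taken about the mean.
def sum_of_squares_about(point):
    return sum((x - point) ** 2 for x in data)

# Property 3: means of two groups combine, weighted by group sizes.
def combined_mean(n1, mean1, n2, mean2):
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(sum_of_deviations, sum_of_squares_about(am), combined_mean(10, 50, 30, 70))
```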



99/123

AM

Advantages:

Easily computed

Readily understood

Has almost all the properties of a good measure of central tendency.

Disadvantages:

Distorted by extreme values

Cannot be computed for an open-end distribution without assuming a midpoint value for the open class.



100/123

Weighted Arithmetic mean

Arithmetic mean gives equal importance (or weight) to each observation. In some cases all observations

do not have same importance

Useful in problems relating to construction of index

numbers.

P33,34
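A minimal sketch of the weighted arithmetic mean, sum(w·x)/sum(w), applied to an index-number style calculation. The price relatives and weights are hypothetical, as is the function name.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum of weight * value over sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical price relatives for three commodities and their weights.
price_relatives = [110, 120, 105]
weights = [5, 3, 2]

weighted_index = weighted_mean(price_relatives, weights)
print(weighted_index)
```

With equal weights the result reduces to the ordinary arithmetic mean.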

101/123

Median

Divides the distribution into two equal parts.

50% of the observations in distribution are above the

value of median -------

The median is the value of the middle observation

when the series is arranged in

P34,,35

102/123

Mathematical Property of Median

Sum of absolute deviations about the median is minimum

Advantages:

Easy to determine and easy to explain

Affected by the number of observations and not by the values of the

observations, hence less distorted as a representative value

than AM

It may be computed for an open-end distribution

Disadvantages:

Less familiar than AM

As a positional average, its value is not determined by each and every

observation.

Not capable of algebraic treatment

103/123

Quantiles

Related positional measures of central tendency

The most familiar quantiles are

Quartiles:

Values which divide the total data into 4 equal parts

Since 3 points divide the distribution into 4 equal parts, we have 3 quartiles: Q1 (25% of observations are smaller and ----), Q2, Q3

Deciles:

Values which divide the total data into ten equal parts. Since 9 points divide

the distribution into 10 equal parts, we have 9 deciles denoted as D1, D2----D9

Percentiles:

Values which divide the total data into 100 equal parts. Since 99 points divide the distribution into 100 equal parts, we have 99 percentiles denoted as P1, P2----P99

P36,37
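The cut points can be sketched with `statistics.quantiles` (Python 3.8+). Its default interpolation rule is one of several conventions and may differ slightly from the textbook formula on the referenced pages; the data are illustrative.

```python
from statistics import quantiles

data = list(range(1, 100))  # illustrative observations 1, 2, ..., 99

quartiles = quantiles(data, n=4)      # 3 cut points: Q1, Q2, Q3
deciles = quantiles(data, n=10)       # 9 cut points: D1 ... D9
percentiles = quantiles(data, n=100)  # 99 cut points: P1 ... P99

print(quartiles)
print(len(deciles), len(percentiles))
```

Q2, D5 and P50 all coincide with the median.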


104/123

Locating Quantiles graphically:

To locate the median graphically, draw the less than ogive (cumulative frequency curve),

Take the variable on the X axis and cumulative frequency on the Y axis

Determine the median value by locating the N/2th observation on the Y axis,

Draw a horizontal line to the cumulative frequency curve

From where it meets the curve, draw a perpendicular to the X axis

The point where it meets the X axis is the median value.

In the same way, values of Q1---, D1---, P1---, etc. can be found

p38



105/123

MODE

Most commonly observed value in a set of data-----

P39

Locating the mode graphically

Construct a histogram

p40



106/123

Relationship among Mean, Median

and Mode

A distribution in which mean, median and mode coincide is

known as Symmetrical (bell shaped) distribution

If a distribution is skewed, ( not symmetrical), then mean,

median and mode are not equal.

In a moderately skewed distribution, the distance between mean

and median is approximately one third of the distance between mean

and mode:

Mode = 3 Median - 2 Mean

p41
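A small check of the empirical relation, with hypothetical summary values. The function name `empirical_mode` is illustrative, and the relation itself is only an approximation for moderately skewed data.

```python
def empirical_mode(mean, median):
    """Approximate mode of a moderately skewed distribution."""
    return 3 * median - 2 * mean

mean, median = 52.0, 50.0  # hypothetical summary values
mode = empirical_mode(mean, median)

# "one third" statement: (mean - median) should equal (mean - mode) / 3
print(mode, mean - median, (mean - mode) / 3)
```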

Geometric Mean


107/123

Geometric mean, like arithmetic mean, is a calculated average.

Very useful in averaging ratios and percentages.

Also in determining the rate of increase or decrease

Also capable of further algebraic treatment

GM is more difficult to compute and interpret

Cannot be computed if any observation is zero or negative
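As an illustration of averaging rates of increase, the sketch below computes the geometric mean as the n-th root of the product of n values (via logarithms). The growth factors are hypothetical.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean is undefined for zero or negative values")
    # averaging logarithms avoids overflow from a long product
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical yearly growth factors: +50%, +20%, -4%
factors = [1.5, 1.2, 0.96]
gm = geometric_mean(factors)
am = sum(factors) / len(factors)
print(f"GM = {gm:.4f}, AM = {am:.4f}")
```

Three years at a constant 20% compound to the same total growth as the three actual years (1.2 x 1.2 x 1.2 = 1.5 x 1.2 x 0.96 = 1.728), which the arithmetic mean of 22% does not.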

Harmonic Mean


108/123

A measure of central tendency for data expressed as rates (km/hr, tonnes/day, km/litre)

Defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations.

Harmonic mean, like arithmetic mean and geometric mean, is computed from each and every observation

It is specially used for averaging rates

Cannot be computed when one or more observations have zero value or when there are both positive and negative observations

Rarely used in dealing with business problems.
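A short sketch of the definition, using a hypothetical round trip covered at two different speeds. For simplicity the helper accepts only positive values, which is stricter than the restriction stated above.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(v <= 0 for v in values):
        # simplification for this sketch: only positive rates are accepted
        raise ValueError("this sketch needs strictly positive values")
    return len(values) / sum(1 / v for v in values)

# Hypothetical trip: same distance at 60 km/hr out and 40 km/hr back
speeds = [60, 40]
hm = harmonic_mean(speeds)
am = sum(speeds) / len(speeds)
print(f"HM = {hm:.1f} km/hr, AM = {am:.1f} km/hr")
```

The harmonic mean (48 km/hr) is the true average speed for the whole trip, because more time is spent at the slower speed; the arithmetic mean (50 km/hr) overstates it.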


109/123

Measure of Variation (Dispersion)


110/123


A measure of variation (dispersion) describes the

spread or scattering of the individual values around

the central value.

Illustration (p47)

Significance of Measuring variation


111/123

1. Determines the reliability of an average by showing how far the average is representative of the entire data.

2. Determines the nature and cause of variation in order to control the variation itself.

3. Enables comparison of two or more distributions with regard to their variability.

4. Measuring variability is of great importance to advanced statistical analysis (as in sampling or statistical inference).

Properties of a Good measure of variation


112/123


Should possess, as far as possible, the same properties as those of a good measure of central tendency.

Some of the well-known measures of variation which provide a numerical index of the variability of the given data are:

Range

Average or mean deviation

Quartile Deviation or Semi-Interquartile range

Standard deviation

Absolute and Relative Measures of Variation


113/123

Measures of absolute variation are expressed in terms of the original data.

When two sets of data are expressed in different units of measurement, the absolute measures of variation are not comparable. In such cases measures of relative variation are used.

Relative measures are also used to compare two sets of data having the same unit of measurement but different means.

Range


114/123

Difference between the highest (numerically largest) value and the lowest value in a set of data.

R = H - L

Range is very easy to calculate and gives us some idea about the variability of data.

However, the range is a crude measure of variation, as it uses only two extreme values.

The concept of range is utilized in SQC and in studying variations in prices of shares, debentures and other commodities that are very sensitive to price changes from one period to another. It is also a good indicator in weather forecasts.

For grouped data, the range may be approximated as the difference between the upper limit of the largest class and the lower limit of the lowest class.

The relative measure corresponding to range, called the coefficient of range, is obtained by applying the formula:

Coefficient of range = (H - L) / (H + L)

P48,49
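A minimal illustration of R = H - L and of the coefficient of range in its standard form (H - L) / (H + L). The prices are hypothetical.

```python
# Hypothetical daily share prices
prices = [40, 55, 60, 35, 65]

highest, lowest = max(prices), min(prices)
data_range = highest - lowest                                    # absolute measure
coefficient_of_range = (highest - lowest) / (highest + lowest)   # relative measure

print(data_range, coefficient_of_range)
```

Only the two extreme values enter the calculation, which is why the range is described as a crude measure.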

Quartile Deviation or Semi-Interquartile Range


115/123

Computed by taking half the difference between the third quartile and the first quartile: QD = (Q3 - Q1) / 2.

The relative measure corresponding to quartile deviation is called the coefficient of quartile deviation: (Q3 - Q1) / (Q3 + Q1).

QD is superior to range as it is not based on two extreme values, but rather on the middle 50% of observations.

Another advantage of QD is that it is the only measure of variability which can be used for open-end distributions.

The disadvantage is that it ignores the first and last 25% of observations.

P49,50
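A short sketch of the quartile deviation and its coefficient, using the standard forms (Q3 - Q1) / 2 and (Q3 - Q1) / (Q3 + Q1). The data are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from a textbook rule on other data.

```python
import statistics

# Hypothetical sample (illustrative values only)
data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 45]

q1, _, q3 = statistics.quantiles(data, n=4)

quartile_deviation = (q3 - q1) / 2          # semi-interquartile range
coefficient_of_qd = (q3 - q1) / (q3 + q1)   # relative measure

print(q1, q3, quartile_deviation, round(coefficient_of_qd, 4))
```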

Average Deviation or Mean Deviation


116/123

Is an improvement over the previous two measures in that it considers all observations in the given set of data.

This measure is computed as the mean of the absolute deviations from the mean or the median.

All deviations are treated as positive regardless of sign.

Theoretically, there is an advantage in taking the deviations from the median, because the sum of absolute deviations from the median is a minimum. However, in actual practice, the arithmetic mean is more popular.

The relative measure corresponding to the average deviation, called the coefficient of average deviation, is obtained by dividing the average deviation by the particular average used in computing it (mean or median).

p51
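The sketch below computes the average deviation about both the mean and the median for a small hypothetical data set, together with the corresponding coefficients. The helper name `mean_deviation` is illustrative.

```python
import statistics

def mean_deviation(data, center):
    """Mean of the absolute deviations of `data` from `center`."""
    return sum(abs(x - center) for x in data) / len(data)

data = [1, 2, 3, 4, 10]            # hypothetical observations
mean = statistics.mean(data)       # 4
median = statistics.median(data)   # 3

md_mean = mean_deviation(data, mean)
md_median = mean_deviation(data, median)

# coefficient = average deviation / the average it was measured from
coeff_mean = md_mean / mean
coeff_median = md_median / median

print(md_mean, md_median, coeff_mean, round(coeff_median, 4))
```

The deviation about the median (2.2) is smaller than the deviation about the mean (2.4), in line with the minimum property stated above.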

Advantages and disadvantages (of Average Deviation)


117/123

Though a good measure of variability, its use is limited.

If the aim is only to measure and compare variability among several sets of data, the AD may be used.

Its major disadvantage is its lack of mathematical properties. This is more so because the non-use of signs in its calculation makes it algebraically inconsistent.

Standard Deviation


118/123

Most widely used and important measure of variation.

In computing the average deviation, the signs are ignored. The standard deviation overcomes this problem by squaring the deviations, which makes them all positive.

The standard deviation is also known as the root mean square deviation.

The square of the standard deviation is called the variance.

The standard deviation and variance become larger as the variability or spread within the data becomes greater.

It is readily comparable with other standard deviations: the greater the standard deviation, the greater the variability.

The standard deviation is commonly used to measure variability. While other measures have special uses, it is the only measure possessing the necessary mathematical properties to make it useful for advanced statistical work.

p53
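A step-by-step sketch of the "root mean square deviation" reading of the standard deviation, dividing by N (the population form). The data are hypothetical; `statistics.pstdev` is shown only as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
n = len(data)

mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]   # squaring removes the signs
variance = sum(squared_deviations) / n                 # mean of squared deviations
std_deviation = math.sqrt(variance)                    # root of that mean

print(mean, variance, std_deviation)
print("matches statistics.pstdev:", math.isclose(std_deviation, statistics.pstdev(data)))
```

When the data are a sample and the aim is to estimate the population variance, the divisor N - 1 is commonly used instead (`statistics.stdev`).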

Coefficient of Variation (C.V)


119/123

Frequently used relative measure of variation.

This measure is simply the ratio of the standard deviation to the mean, expressed as a percentage: C.V = (Std deviation / Mean) x 100.

p54
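The sketch below compares two hypothetical series that have the same standard deviation but different means, which is the situation where a relative measure is needed. The population standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation as a percentage of the mean."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

series_a = [2, 4, 4, 4, 5, 5, 7, 9]          # mean 5, std deviation 2
series_b = [x + 100 for x in series_a]       # mean 105, std deviation 2

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(f"C.V of A = {cv_a:.2f}%, C.V of B = {cv_b:.2f}%")
```

Both series have the same absolute spread, but relative to its mean series A is far more variable (40% against about 1.9%).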

Skewness


120/123

The measures of central tendency and variation do not reveal all characteristics of a given set of data

Two distributions having the same mean and standard deviation may differ widely in the shape of their distribution.

The distribution of data may be symmetrical or not (asymmetrical or skewed)

Thus skewness refers to lack of symmetry in a distribution

Method of detection of skewness is to consider the tail of distribution


121/123

Symmetrical distribution:

No extreme values in a particular direction, so that low and high values balance each other.

Mean = Median = Mode

Negatively skewed distribution:

Longer tail towards lower values, or the left-hand side; the skewness is negative. The mean is decreased by some extremely low values.

Positively skewed distribution:

Longer tail towards higher values, or the right-hand side; the skewness is positive. The mean is increased by some unusually high values.

p55

Relative skewness


122/123

In order to make comparisons between the skewness in two or more distributions, the coefficient of skewness is used (Karl Pearson's method, Bowley's method)

In practice, the value of the coefficient of skewness, SK, usually lies between -1 and +1
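A sketch of the two coefficients in their commonly quoted forms: Karl Pearson's median-based form 3(Mean - Median) / Std deviation and Bowley's quartile form (Q3 + Q1 - 2 Median) / (Q3 - Q1). The data sets are hypothetical, the population standard deviation is assumed, and quartiles use the default method of `statistics.quantiles`.

```python
import statistics

def pearson_skewness(data):
    """Karl Pearson's coefficient, median form: 3(mean - median) / sd."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    return 3 * (mean - median) / statistics.pstdev(data)

def bowley_skewness(data):
    """Bowley's coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

right_tail = [1, 2, 3, 4, 10]   # a few unusually high values
left_tail = [1, 7, 8, 9, 10]    # a few unusually low values
balanced = [1, 2, 3, 4, 5]      # symmetrical

for name, data in [("right", right_tail), ("left", left_tail), ("balanced", balanced)]:
    print(name, round(pearson_skewness(data), 3), round(bowley_skewness(data), 3))
```

The sign follows the direction of the longer tail: positive for the right-tailed set, negative for the left-tailed set and zero for the symmetrical one.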


123/123
