Managerial Statistics

COURSE DOCKET*

SUBJECT: MANAGERIAL STATISTICS

PGDM TERM – I

FACULTY: ROHIT R. MUTKEKAR

*Disclaimer: The Course Docket is meant for quick reference and provides fundamental inputs; hence it cannot be a substitute for text books/reference books.


Goa Institute of Management



Correlation Analysis

Introduction

Classification of Correlation

o Positive/Negative/Zero Correlation

o Simple/Multiple/Partial Correlation

o Linear/Non-Linear Correlation

Degree of Correlation

Methods to determine Simple Correlation

o Scatter Diagram Method

o Karl Pearson Method

o Spearman’s Rank Correlation Method

Properties of Correlation

o Examples on Simple Correlation Analysis

1. The following data provides details regarding sales and net profit for some of the top auto

makers during the quarter July-September 2006. Find the co-efficient of correlation using

an appropriate method and interpret the result.

Company          Average Sales (X) (Rs. Crores)   Average Profit (Y) (Rs. Crores)
Tata Motors      6484.8                           466
Hero Honda       2196.5                           224.2
Bajaj Auto       2444.7                           345.4
TVS Motor        1032.9                           35.1
Bharat Forge     461.6                            63.4
Ashok Leyland    1635.8                           94.7
M & M            2365.5                           200.6
Maruti Udyog     3426.5                           315.7

Source: Economic Times, 11th October, 2006
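As a rough illustration of the Karl Pearson method listed above, the sketch below computes r as the sum of products of deviations from the means divided by the square root of the product of the sums of squared deviations, and then squares it to get the coefficient of determination. The function name `pearson_r` and the small data set are hypothetical and are not taken from the docket.

```python
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical paired data (not taken from the docket)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
# r and the coefficient of determination r^2
print(round(r, 4), round(r ** 2, 4))
```

The same function can be applied to the sales and profit columns of the example once they are typed in as two lists.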


Coefficient of Determination (r²)

2. The following are the monthly figures of advertising expenditure and sales of a firm. It is generally found that advertising expenditure has an impact on sales. Determine the co-efficient of correlation for the data provided using the rank correlation method.

Month Adv Expenditure (Rs’ 000) Sales (Rs’000)

Jan 50 1200

Feb 60 1500

March 90 1600

April 70 2000

May 120 2200

June 150 2400

July 140 2500

Aug 160 2600

Sept 190 2800

Oct 170 2900

Nov 200 3100

Dec 250 3900

3. Find the co-efficient of correlation between the two kinds of assessment of postgraduate

students’ performance in a college using the rank correlation method.

Name   Assessment Marks (Internal)   Assessment Marks (External)

A 51 50

B 63 72

C 73 74

D 46 50

E 50 58

F 60 66

G 47 50

H 36 30

I 60 35


Statistical Significance of Correlation

o For Karl Pearson’s Method

o For Spearman’s Rank Method

Assignment 1

1. Calculate the co-efficient of correlation for the original data given in Example 1 using Karl Pearson’s method. Determine the amount of variation in the result as compared to the rounded-off values. What can you conclude from this exercise?

2. A group of students of management programme at a certain institute were selected at

random. Their IQ and the marks obtained by them in the paper on decision science were

recorded. The details are as follows –

IQ   Marks Scored

120 85

110 80

130 90

115 88

125 92

120 87

Calculate the co-efficient of correlation using Karl Pearson’s and Spearman’s Rank Method and test for statistical significance in both cases at the 5% level of significance (l.o.s.).

3. Given below are the data on revenues and profit after tax for the quarter July-September 2007 of some cement companies. Compute the co-efficient of correlation using an appropriate method and interpret the result. Also test the statistical significance at 1% l.o.s.

Company   Revenues (Rs Crores)   Profit after Tax (Rs Crores)

ACC 13 2.5

Ambuja 21 3.2

Ultratech 10 2.6

Shree 9 1.4

India 5 1.1

Bagalkot 3 0.8


Regression Analysis

Introduction

Types of Regression

o Simple Regression

o Multiple Regression

Simple Linear Regression Analysis using Least Squares Method

o Examples on Simple Regression Analysis using Least Squares Method

1. A group of students of management programme at a certain institute were selected at

random. Their IQ and the marks obtained by them in the paper on decision science were

recorded. The details are as follows –

IQ   Marks Scored

120 85

110 80

130 90

115 88

125 92

120 87

Fit a regression equation and interpret the result so obtained.
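A minimal sketch of a least squares fit of the line Y = a + bX is given below, assuming the usual formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The function name `fit_simple_regression` and the data are hypothetical; the docket's own examples can be substituted for the two lists.

```python
def fit_simple_regression(x, y):
    """Least squares estimates (a, b) for the line y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope (regression co-efficient of y on x)
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data (not taken from the docket)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_simple_regression(x, y)
y_hat = a + b * 6  # estimated y for a new value x = 6
print(round(a, 3), round(b, 3), round(y_hat, 3))
```

The slope b is read as the estimated change in Y for a unit change in X, which is the interpretation the examples ask for.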

2. A national-level organization wishes to prepare a manpower plan based on its ever-growing number of sales offices in the country. Data pertaining to manpower and the number of sales offices for previous years are given below-

Fit a regression model for the given data and estimate the manpower required if the

organization targets to have 43 sales offices at the end of 2015.

Manpower Sales Offices

370 22

386 25

443 28

499 31

528 33

616 38

3. The following data relate to the training and performance of salesmen employed in a company.

Training (hrs.)   Performance (Avg weekly sales)
20                44
5                 22
10                25
13                32
12                27

Fit a regression model and determine the weekly sales that is likely to be attained by a salesman who is given 16 hours of training.

Regression Co-efficients

Standard Error of Estimate

Co-efficient of Determination

How good is the Regression?

Multiple Regression Analysis

Introduction

General form of Multiple Regression Model

Standard Error for Multiple Regression Model

o Examples on Multiple Regression Analysis

1. Fit a regression model for the following data and interpret the result

Sales (Rs.Lakh)   Adv Expenses (Rs’000)   Selling Offices
100               40                      10
80                30                      10
60                20                      7
120               50                      15
150               60                      20
90                40                      12
70                20                      8
130               60                      14

2. The owner of a chain of stores wishes to forecast net profit with the help of next year’s projected sales of food and non-food items. The data on the current year’s sales of food items, sales of non-food items and net profit for all the ten stores are as follows-

Net Profit (Rs.Cr)   Sales of Food Items (Rs.Cr)   Sales of Non-Food Items (Rs.Cr)
5.6                  20                            5
4.7                  15                            5
5.4                  18                            6
5.5                  20                            5
5.1                  16                            6
6.8                  25                            6
5.8                  22                            4
8.2                  30                            7
5.8                  24                            3
6.2                  25                            4

Fit a regression model and interpret the result.

Adjusted Coefficient of Determination

Multicollinearity in Multiple Regression

Selection of Independent Variables in a Regression Model

Assignment 2

1. For the following data, fit regression equations of-

i. Net Profit on Net Sales

ii. P/E ratio on Net Sales

for the group of these companies

Name of the Company   Net Sales (Rs. Cr) (Sept, 2005)   Net Profit (Rs. Cr) (Sept, 2005)   P/E Ratio (31st Oct, 2005)
Infosys               7836                              2170.9                             32
Wipro                 8051                              1831.4                             30
Bharti                8211                              1655.8                             31
Hero Honda            9771                              1753.5                             128
ITC                   8086                              868.4                              16
Satyam                8422                              2351.3                             20
HDFC                  3996                              844.8                              23
Tata Motors           18368                             1314.9                             14
Siemens               2753                              254.7                              38

Interpret the result in terms of standard error, R² and significance of regression model.

2. A company wants to assess the impact of R&D expenditure (Rs. Cr) on annual profits (Rs. Cr). The following table gives information for the past 8 years.

Year R&D expenditure Annual profits

2006 9 45

2007 7 42

2008 5 41

2009 10 60

2010 4 30

2011 5 34

2012 3 25

2013 2 20

Fit a regression model. Interpret the result in terms of standard error, R2 and significance

of regression model.

3. Ashwin, owner of a business unit, is concerned about the sales pattern of his product. He

realizes that there are many factors that might help explain sales, but believes that

advertising and prices are major determinants. He has collected data from the past

records which are as follows-

Sales (unit sold) 37 65 75 87 22 29

Advertising (No of ads) 07 10 14 17 13 10

Price (Rs’000) 129 115 140 130 145 140

Fit a regression model. Interpret the result in terms of standard error, R2 and significance

of regression model.


Introduction to Statistical Inference

Introduction

Parameter

Statistic

Sample Space

Sampling Distribution

Standard Error

Testing of Hypothesis

Hypothesis

Statistical Hypothesis

o Simple Hypothesis

o Composite Hypothesis

Null Hypothesis

Alternative Hypothesis

Test Statistic

Null Distribution

Critical (Rejection) Region

Acceptance Region

Errors in Hypothesis Testing

Actual Fact    Decision based on sample observation    Inference             Error
H0 is true     Accept H0                               Correct Decision      --
H0 is true     Reject H0                               Incorrect Decision    Type I
H0 is false    Accept H0                               Incorrect Decision    Type II
H0 is false    Reject H0                               Correct Decision      --

Type I Error

Type II Error

Size of the test (Level of Significance)

Power of the test

One Tail Test

Two Tail Test

Procedure in Hypothesis Testing

o Formulation of Hypothesis

o Set up a suitable significance level

o Select the test criterion

o Computation

o Decision making


Large Sample Tests (Z Tests)

Z-Test for Single Mean (Theory)

Here the null hypothesis is given by,

H0: 𝜇 = 𝜇0

For the above null hypothesis, we may have any one of the following alternatives,

a) H1: 𝜇 ≠ 𝜇0 (Two tail test)

b) H1: 𝜇 > 𝜇0 (One tail test – Upper)

c) H1: 𝜇 < 𝜇0 (One tail test – Lower)

Now the test statistic under H0 is given by,

$$Z_{obs} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim N(0,1)$$

$\bar{x}$ denotes sample mean

𝜇0 denotes standard value at which the population mean 𝜇 is tested

𝜎 denotes population standard deviation

𝑛 denotes sample size

(Note: If 𝝈 is not specified then we need to use the sample standard deviation ‘s’)

Decision Making

a) If we are testing H0: 𝜇 = 𝜇0 vs H1: 𝜇 ≠ 𝜇0 at α level of significance, then we can reject H0 if $Z_{obs}$ lies outside the interval $(-Z_{\alpha/2}, +Z_{\alpha/2})$

b) If we are testing H0: 𝜇 = 𝜇0 vs H1: 𝜇 > 𝜇0 at α level of significance, then we can reject H0 if $Z_{obs} > Z_{\alpha}$

c) If we are testing H0: 𝜇 = 𝜇0 vs H1: 𝜇 < 𝜇0 at α level of significance, then we can reject H0 if $Z_{obs} < -Z_{\alpha}$

Here $Z_{\alpha}$ and $Z_{\alpha/2}$ are Normal Table (Z) values at α level of significance
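The single-mean Z test and its three decision rules can be sketched in a few lines of Python. This is only an illustration: the function name `z_test_single_mean`, the `tail` argument and the numbers used are hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` instead of a printed Normal table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_single_mean(x_bar, mu0, sigma, n, alpha, tail="two"):
    """Return (z_obs, z_crit, reject_h0) for H0: mu = mu0."""
    z_obs = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # N(0, 1)
    if tail == "two":      # H1: mu != mu0
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z_obs) > z_crit
    elif tail == "upper":  # H1: mu > mu0
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z_obs > z_crit
    elif tail == "lower":  # H1: mu < mu0
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z_obs < -z_crit
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z_obs, z_crit, reject


# Hypothetical figures: n = 64, sample mean 52, mu0 = 50, sigma = 8, alpha = 5%
z_obs, z_crit, reject = z_test_single_mean(52, 50, 8, 64, 0.05, tail="two")
print(round(z_obs, 3), round(z_crit, 3), reject)
```

If 𝜎 is not specified, the sample standard deviation can be passed in its place, as the note above indicates.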

Z-Test for Two Means (Theory)

Here the null hypothesis is given by,

H0: 𝜇1 = 𝜇2

For the above null hypothesis, we may have any one of the following alternatives,

a) H1: 𝜇1 ≠ 𝜇2 (Two tail test)

b) H1: 𝜇1 > 𝜇2 (One tail test – Upper)

c) H1: 𝜇1 < 𝜇2 (One tail test – Lower)

Now the test statistic under H0 is given by,

$$Z_{obs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$$

$\bar{x}_1$ denotes sample mean for a sample of size $n_1$ from population 1

$\bar{x}_2$ denotes sample mean for a sample of size $n_2$ from population 2

𝜎1 denotes population standard deviation from population 1

𝜎2 denotes population standard deviation from population 2

(Note: If 𝝈𝟏, 𝝈𝟐 are not specified then we need to use the sample standard deviations ‘𝒔𝟏, 𝒔𝟐’)

Decision Making

a) If we are testing H0: 𝜇1 = 𝜇2 vs H1: 𝜇1 ≠ 𝜇2 at α level of significance, then we can reject H0 if $Z_{obs}$ lies outside the interval $(-Z_{\alpha/2}, +Z_{\alpha/2})$

b) If we are testing H0: 𝜇1 = 𝜇2 vs H1: 𝜇1 > 𝜇2 at α level of significance, then we can reject H0 if $Z_{obs} > Z_{\alpha}$

c) If we are testing H0: 𝜇1 = 𝜇2 vs H1: 𝜇1 < 𝜇2 at α level of significance, then we can reject H0 if $Z_{obs} < -Z_{\alpha}$
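The two-mean Z statistic differs from the single-mean case only in its standard error, which combines the two variances. A minimal sketch follows; the function name `z_test_two_means` and the figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_means(x1_bar, x2_bar, sigma1, sigma2, n1, n2, alpha, tail="two"):
    """Return (z_obs, z_crit, reject_h0) for H0: mu1 = mu2."""
    std_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z_obs = (x1_bar - x2_bar) / std_error
    if tail == "two":      # H1: mu1 != mu2
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z_obs) > z_crit
    elif tail == "upper":  # H1: mu1 > mu2
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z_obs > z_crit
    elif tail == "lower":  # H1: mu1 < mu2
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z_obs < -z_crit
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z_obs, z_crit, reject


# Hypothetical figures: means 105 and 100, sigmas 12 and 9, n1 = 36, n2 = 81
z_obs, z_crit, reject = z_test_two_means(105, 100, 12, 9, 36, 81, 0.05, tail="upper")
print(round(z_obs, 3), round(z_crit, 3), reject)
```

When a common standard deviation is known for both populations, the same value is simply passed as both `sigma1` and `sigma2`.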

o Examples on Z-Test (Mean)

1. An aircraft manufacturer needs to buy aluminium sheets of 0.05 inch in thickness. Thinner

sheets would not be appropriate and thicker sheets would be too heavy. The aircraft

manufacturer takes a random sample of 100 sheets and finds that their average thickness is

0.048 inch and their standard deviation is 0.01 inch. Should the aircraft manufacturer buy the

aluminium sheets from the supplier?

2. A company manufacturing automobile tyres finds that the tyre life is normally distributed with

mean 40000 kms and standard deviation of 3000 kms. It is believed that a change in the

production process will result in a better product and the company goes ahead in adopting the

new process. As a pilot study 100 new tyres are randomly selected from the lot and tested.

From the test result it is found that the average life of these new tyres is 40900 kms. Can it be

concluded that the new tyres are significantly better than the old ones? Test at 1% l.o.s.

3. It has been found from experience that the average tensile strength of an alloy is 500 pounds

with standard deviation of 40 pounds. From the supplies, received during the current month,

a sample of 50 units were tested which showed an average tensile strength of 450 pounds. Can

we conclude that the alloy supplied is inferior?

4. A potential buyer wants to decide which of the two brands of electric bulbs he should buy as

he wants to buy them in bulk. As a specimen, he buys 100 bulbs of each of the two brands –

A and B. On testing these bulbs, he finds that brand A has mean life of 1200 hours with

standard deviation 50 hours and brand B has mean life of 1150 hours with standard deviation

40 hours. Do the two brands differ significantly in terms of average life span? Test at 2% l.o.s.


5. An automobile company is interested in testing the average mileage given by one of the car brands in two different cities, i.e. Delhi and Mumbai. The company surveyed 100 car owners in Delhi and found the average mileage to be 12 kms, and it surveyed 150 owners in Mumbai and found the average mileage to be 12.5 kms. The standard deviation for mileage of this brand of car is known to be 0.9 kms. Can we conclude that the cars give better average mileage in Mumbai as compared to Delhi? Test at 5% l.o.s.

Examples on Z test using MS Excel

1. A business school in its advertisement claims that the average salary of its graduates in a

particular lean year is at par with the average salaries offered at the top five business schools.

A sample of 35 graduates, from the business school whose claim was to be verified, was taken

at random. The average salary offered at the top five business schools in that year was given

as Rs.750000. Test the validity of the claim.

Student Salary(000's)

1 750

2 600

3 600

4 650

5 700

6 780

7 860

8 810

9 780

10 670

11 690

12 550

13 610

14 715

15 755

16 770

17 680

18 670

19 740

20 760

21 775

22 845

23 870

24 640

25 690

26 715

27 630

28 685

29 780

30 635


31 770

32 665

33 780

34 550

35 620

2. A large organization produces electric light bulbs in each of its two factories (A and B). It is

suspected that the quality of production from factory A is better than factory B. To test this

assertion the organization collects samples from factory A and B, and measures how long each

light bulb works (in hours) before it fails (relevant data is given below). Both population

variances are known i.e. Var(A)=52783 and Var(B)=61560. Test the assertion at 5% l.o.s.

Factory A Factory B

900 1052

1276 947

1421 886

1014 788

1246 1188

1507 928

975 983

1177 970

1246 766

875 1369

816 737

983 1114

1119 354

988 1347

1137 1062

1227 756

858 1052

941 754

1299 990

1110 950

929 783

843 816

1156 658

867 504

1454 1076

1403 500

1165 1025

1653 649

1288 1166

1187 498

945

1002


Z-Test for Single Proportion (Theory)

Here the null hypothesis is given by,

H0: 𝑃 = 𝑃0

For the above null hypothesis, we may have any one of the following alternatives,

a) H1: 𝑃 ≠ 𝑃0 (Two tail test)

b) H1: 𝑃 > 𝑃0 (One tail test – Upper)

c) H1: 𝑃 < 𝑃0 (One tail test – Lower)

Now the test statistic under H0 is given by,

$$Z_{obs} = \frac{p - P_0}{\sqrt{\frac{P_0 Q_0}{n}}} \sim N(0,1)$$

𝑝 denotes sample proportion

𝑃0 denotes standard value at which the population proportion P is tested

𝑄0 = 1 − 𝑃0

𝑛 denotes sample size

Decision Making

a) If we are testing H0: 𝑃 = 𝑃0 vs H1: 𝑃 ≠ 𝑃0 at α level of significance, then we can reject H0 if $Z_{obs}$ lies outside the interval $(-Z_{\alpha/2}, +Z_{\alpha/2})$

b) If we are testing H0: 𝑃 = 𝑃0 vs H1: 𝑃 > 𝑃0 at α level of significance, then we can reject H0 if $Z_{obs} > Z_{\alpha}$

c) If we are testing H0: 𝑃 = 𝑃0 vs H1: 𝑃 < 𝑃0 at α level of significance, then we can reject H0 if $Z_{obs} < -Z_{\alpha}$
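For a single proportion the standard error is computed from the hypothesised value P0, not from the sample proportion. The sketch below illustrates this under hypothetical inputs; `z_test_single_proportion` and the survey counts are not from the docket.

```python
from math import sqrt
from statistics import NormalDist


def z_test_single_proportion(successes, n, p0, alpha, tail="two"):
    """Return (z_obs, z_crit, reject_h0) for H0: P = P0."""
    p = successes / n                   # sample proportion
    std_error = sqrt(p0 * (1 - p0) / n)  # uses P0 and Q0 = 1 - P0
    z_obs = (p - p0) / std_error
    if tail == "two":      # H1: P != P0
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z_obs) > z_crit
    elif tail == "upper":  # H1: P > P0
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z_obs > z_crit
    elif tail == "lower":  # H1: P < P0
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z_obs < -z_crit
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z_obs, z_crit, reject


# Hypothetical survey: 220 of 400 respondents in favour, H0: P = 0.5, alpha = 5%
z_obs, z_crit, reject = z_test_single_proportion(220, 400, 0.5, 0.05, tail="upper")
print(round(z_obs, 3), round(z_crit, 3), reject)
```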

Z-Test for Two Proportions (Theory)

Here the null hypothesis is given by,

H0: 𝑃1 = 𝑃2

For the above null hypothesis, we may have any one of the following alternatives,

a) H1: 𝑃1 ≠ 𝑃2 (Two tail test)

b) H1: 𝑃1 > 𝑃2 (One tail test – Upper)

c) H1: 𝑃1 < 𝑃2 (One tail test – Lower)

Now the test statistic under H0 is given by,

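$$Z_{obs} = \frac{p_1 - p_2}{\sqrt{\hat{P}\,\hat{Q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0,1)$$

Where $\hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $\hat{Q} = 1 - \hat{P}$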

𝑝1 denotes sample proportion for a sample of size 𝑛1 from population 1

𝑝2 denotes sample proportion for a sample of size 𝑛2 from population 2

𝑃1 denotes population proportion for population 1

𝑃2 denotes population proportion for population 2


Decision Making

a) If we are testing H0: 𝑃1 = 𝑃2 vs H1: 𝑃1 ≠ 𝑃2 at α level of significance, then we can reject H0 if $Z_{obs}$ lies outside the interval $(-Z_{\alpha/2}, +Z_{\alpha/2})$

b) If we are testing H0: 𝑃1 = 𝑃2 vs H1: 𝑃1 > 𝑃2 at α level of significance, then we can reject H0 if $Z_{obs} > Z_{\alpha}$

c) If we are testing H0: 𝑃1 = 𝑃2 vs H1: 𝑃1 < 𝑃2 at α level of significance, then we can reject H0 if $Z_{obs} < -Z_{\alpha}$
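The two-proportion test pools the two samples to estimate the common proportion under H0. A small sketch of that calculation is given below; the function name `z_test_two_proportions` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_proportions(x1, n1, x2, n2, alpha, tail="two"):
    """Return (z_obs, z_crit, reject_h0) for H0: P1 = P2."""
    p1 = x1 / n1
    p2 = x2 / n2
    p_hat = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion
    q_hat = 1 - p_hat
    z_obs = (p1 - p2) / sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
    if tail == "two":      # H1: P1 != P2
        z_crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z_obs) > z_crit
    elif tail == "upper":  # H1: P1 > P2
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z_obs > z_crit
    elif tail == "lower":  # H1: P1 < P2
        z_crit = NormalDist().inv_cdf(1 - alpha)
        reject = z_obs < -z_crit
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z_obs, z_crit, reject


# Hypothetical counts: 60 of 200 in sample 1 and 60 of 300 in sample 2
z_obs, z_crit, reject = z_test_two_proportions(60, 200, 60, 300, 0.05, tail="two")
print(round(z_obs, 3), round(z_crit, 3), reject)
```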

o Examples on Z-Test (Proportion)

1. It is known from the past data that 10% of the families in a certain locality subscribe to a

periodical called Outlook. Of late, there has been some apprehension that the subscription rate

has declined. In order to test whether there has been a decline, a random sample of 100 families

were surveyed from the locality and it was found that 7 families did subscribe for Outlook.

Can it be concluded that the subscription rate has really declined? Test at 5% l.o.s.

2. The owner of a departmental stores claims that majority of his customers use credit/debit card

as their payment option. To verify the claim made, 800 customers were randomly observed

during the given time and it was found that 420 made payment using credit/debit card. Discuss

whether the information supports the view and test the same at 1% l.o.s.

3. A cable TV operator claims that 50% of the homes in a city have opted for his services. Before

sponsoring advertisements on the local cable channel, a firm conducted a survey and found

that 280 homes out of 600 have cable TV service provided by the operator. On the basis of

the data, can we accept the claim made by the cable operator? Test at 1% l.o.s.

4. A company is considering two different ads for promotion of a new product. After watching

both the ads, the management believes that advertisement A is more effective than

advertisement B. Two test market areas with virtually identical consumer characteristics are

selected. Advertisement A is used in one area and B in another. In a random sample of 60

customers who saw advertisement A, 18 tried the product and similarly a random sample of

100 customers who saw advertisement B, 22 tried the product. Does this indicate that

advertisement A is more effective than advertisement B, test at 5% l.o.s.

5. You obtain a large number of components to an identical specification from 2 sources. You

notice that some of the components are from the supplier’s own plant at Pune and some are

from the plant at Bangalore. You would like to know whether the proportion of defective

components are the same or there is a difference between them. For this, you take a random

sample of 600 components from each plant and find the sample proportion of defective

components as 0.015 and 0.017 respectively. Test at 1% l.o.s., whether the proportion of

defectives differ significantly with respect to these two plants.


Small Sample Tests (t Tests)

t-Test for Single Mean (Theory)

Here the null hypothesis is given by,

H0: 𝜇 = 𝜇0

For the above null hypothesis, we may have any one of the following alternatives,

a) H1: 𝜇 ≠ 𝜇0 (Two tail test)

b) H1: 𝜇 > 𝜇0 (One tail test – Upper)

c) H1: 𝜇 < 𝜇0 (One tail test – Lower)

Now the test statistic under H0 is given by,

$$t_{obs} = \frac{\bar{x} - \mu_0}{s / \sqrt{n - 1}} \sim t \text{ with } (n - 1) \text{ degrees of freedom}$$

$\bar{x}$ denotes sample mean

𝜇0 denotes standard value at which the population mean 𝜇 is tested

𝑠 denotes sample standard deviation (computed with divisor 𝑛, which is why 𝑛 − 1 appears under the root in the test statistic)

𝑛 denotes sample size

Decision Making

a) If we are testing H0: 𝜇 = 𝜇0 vs H1: 𝜇 ≠ 𝜇0 at α level of significance, then we can reject H0 if $t_{obs}$ lies outside the interval $(-t_{\alpha/2}, +t_{\alpha/2})$

b) If we are testing H0: 𝜇 = 𝜇0 vs H1: 𝜇 > 𝜇0 at α level of significance, then we can reject H0 if $t_{obs} > t_{\alpha}$

c) If we are testing H0: 𝜇 = 𝜇0 vs H1: 𝜇 < 𝜇0 at α level of significance, then we can reject H0 if $t_{obs} < -t_{\alpha}$

Here $t_{\alpha}$ and $t_{\alpha/2}$ are t distribution table values at α level of significance
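A sketch of the single-mean t statistic is shown below. It assumes that s is the standard deviation computed with divisor n (which is what makes the n − 1 under the root appropriate) and takes the critical value from a t table as an input, since Python's standard library has no t distribution. The function name `t_single_mean` and the sample are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def t_single_mean(sample, mu0):
    """Return (t_obs, degrees_of_freedom) for H0: mu = mu0."""
    n = len(sample)
    s = pstdev(sample)  # standard deviation with divisor n (assumed convention)
    t_obs = (mean(sample) - mu0) / (s / sqrt(n - 1))
    return t_obs, n - 1


# Hypothetical sample of 5 observations, testing H0: mu = 12 against H1: mu != 12
sample = [12, 14, 15, 11, 13]
t_obs, df = t_single_mean(sample, 12)
t_table = 2.776  # t table value for 4 degrees of freedom, two tail, 5% level
reject_h0 = abs(t_obs) > t_table
print(round(t_obs, 3), df, reject_h0)
```

For a one tail test, the one-tail table value is used and `t_obs` is compared with it directly, following rules b) and c) above.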

t-Test for Two Means (Theory)

Here the null hypothesis is given by,

H0: 𝜇1 = 𝜇2

For the above null hypothesis, we may have any one of the following alternatives,

a) H1: 𝜇1 ≠ 𝜇2 (Two tail test)

b) H1: 𝜇1 > 𝜇2 (One tail test – Upper)

c) H1: 𝜇1 < 𝜇2 (One tail test – Lower)

Now the test statistic under H0 is given by,

$$t_{obs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} \cdot \frac{n_1 + n_2}{n_1 n_2}}} \sim t \text{ with } (n_1 + n_2 - 2) \text{ degrees of freedom}$$

$\bar{x}_1$ denotes sample mean for a sample of size $n_1$ from population 1

$\bar{x}_2$ denotes sample mean for a sample of size $n_2$ from population 2

𝑠1 denotes sample standard deviation for a sample of size 𝑛1 from population 1

𝑠2 denotes sample standard deviation for a sample of size 𝑛2 from population 2

Decision Making

a) If we are testing H0: 𝜇1 = 𝜇2 vs H1: 𝜇1 ≠ 𝜇2 at α level of significance, then we can reject H0 if $t_{obs}$ lies outside the interval $(-t_{\alpha/2}, +t_{\alpha/2})$

b) If we are testing H0: 𝜇1 = 𝜇2 vs H1: 𝜇1 > 𝜇2 at α level of significance, then we can reject H0 if $t_{obs} > t_{\alpha}$

c) If we are testing H0: 𝜇1 = 𝜇2 vs H1: 𝜇1 < 𝜇2 at α level of significance, then we can reject H0 if $t_{obs} < -t_{\alpha}$
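The pooled two-sample t statistic can be sketched as follows. As in the formula above, the sample variances are assumed to use divisor n, so that n·s² equals the sum of squared deviations. The function name `t_two_means` and both samples are hypothetical, and the critical value is read from a t table.

```python
from math import sqrt
from statistics import mean, pvariance


def t_two_means(sample1, sample2):
    """Return (t_obs, degrees_of_freedom) for H0: mu1 = mu2."""
    n1, n2 = len(sample1), len(sample2)
    # pvariance uses divisor n, so n * pvariance is the sum of squared deviations
    pooled_variance = (n1 * pvariance(sample1) + n2 * pvariance(sample2)) / (n1 + n2 - 2)
    std_error = sqrt(pooled_variance * (n1 + n2) / (n1 * n2))
    t_obs = (mean(sample1) - mean(sample2)) / std_error
    return t_obs, n1 + n2 - 2


# Hypothetical samples of sizes 3 and 4
sample1 = [10, 12, 14]
sample2 = [7, 9, 11, 13]
t_obs, df = t_two_means(sample1, sample2)
t_table = 2.571  # t table value for 5 degrees of freedom, two tail, 5% level
reject_h0 = abs(t_obs) > t_table
print(round(t_obs, 3), df, reject_h0)
```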

o Examples on t-Test (Mean)

1. The mean nicotine content of a brand of cigarette is 20.0 mgs. A new process is proposed to

lower the nicotine content without affecting the quality. To test the new process, 16 cigarettes

are selected at random from the output obtained from the test plant. The sample mean nicotine

content is found to be 18.5 mg with standard deviation of 2 mg. Is the claim for the new

process justified? Test at 5% l.o.s.

2. A car manufacturer claims that its new car gives a mileage of at least 15 kms/litre of petrol. A

sample of 10 cars is taken at random, and their mileage recorded are as follows:

16.2, 15.7, 16.3, 16.0, 15.8, 15.7, 15.6, 15.6, 15.7, 15.4

Is there any statistical evidence to support the claim of the manufacturer about the mileage?

3. A local car dealer wants to know if the purchasing habits of a buyer buying extras have

changed. He is particularly interested in male buyers. Based upon the previous experience he

finds that the average of extras purchased is $2000. As a test he collects details of extras

purchased by the last 7 male customers i.e. ($) 2300, 2386, 1920, 1578, 3065, 2312 and 1790.

Test whether the extras purchased on average has changed.


4. Two types of drugs viz. A and B were used on 5 and 7 patients respectively for reducing their

weight. Drug A was imported and drug B was indigenous. The decrease in the weight after

using the drug for six months was as follows:

Drug A   Drug B
10       8
12       9
13       12
11       14
14       15
         10
         9

Test whether there is any significant difference in the efficacy of the two drugs with respect

to average weight lost.

5. A physical instructor has an opinion that students who are associated with athletics are taller

in height as compared to those who do not. Among 16 students who were selected at random

it was found that 6 students were associated with athletics and the remaining were not

associated with athletics. Their heights recorded are as below-

Height (cms)
Athletes   Non-Athletes
176        172
173        167
171        175
172        169
177        169
169        172
           174
           170
           167
           170

Test at 1 % l.o.s whether the opinion of the physical instructor is valid?


Paired t-Test (Theory)

Here the null hypothesis is given by,

H0: There is no significant difference after as compared to before (𝜇1 = 𝜇2)

For the above null hypothesis, we have the following alternative,

H1: There is a significant difference after as compared to before

(𝜇1 > 𝜇2 𝑜𝑟 𝜇1 < 𝜇2 )

Now the test statistic under H0 is given by,

$$t_{obs} = \frac{\bar{d} - D}{s_d / \sqrt{n - 1}} \sim t \text{ with } (n - 1) \text{ degrees of freedom}$$

Where, $\bar{d} = \frac{\sum d}{n}$ and $s_d = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$

Here d = Before Score − After Score (or Score 1 − Score 2)

D denotes the standard value at which the hypothesis is tested

(Note: If D is not specified then take it as zero)

Decision Making

a) If we are testing H0:𝜇1 = 𝜇2 vs H1: 𝜇1 > 𝜇2 at α level of significance then we can reject

H0 if 𝑡𝑜𝑏𝑠 > 𝑡α

b) If we are testing H0: 𝜇1 = 𝜇2 vs H1: 𝜇1 < 𝜇2 at α level of significance then we can reject

H0 if 𝑡𝑜𝑏𝑠 < - 𝑡α
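The paired t statistic works on the differences d alone, as the sketch below illustrates. The function name `paired_t` and the before/after scores are hypothetical, and the critical value is read from a t table.

```python
from math import sqrt


def paired_t(before, after, D=0.0):
    """Return (t_obs, degrees_of_freedom) using d = before - after."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    # s_d = sqrt(sum(d^2)/n - (sum(d)/n)^2), i.e. divisor n as in the formula above
    s_d = sqrt(sum(di ** 2 for di in d) / n - d_bar ** 2)
    t_obs = (d_bar - D) / (s_d / sqrt(n - 1))
    return t_obs, n - 1


# Hypothetical scores for 4 subjects; H1: before > after (upper tail)
before = [80, 75, 90, 85]
after = [78, 70, 86, 84]
t_obs, df = paired_t(before, after)
t_table = 2.353  # t table value for 3 degrees of freedom, one tail, 5% level
reject_h0 = t_obs > t_table
print(round(t_obs, 3), df, reject_h0)
```

A claim such as "more than 10 units of reduction" would be handled by passing `D=10` instead of the default zero.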

o Examples on Paired t-test

6. Super Slim is advertising a weight reduction programme which claims that more than 10 lbs

weight loss is possible in first 30 days. Twenty six subjects were independently and randomly

selected for study, and their weights before and after the weight loss programme were

recorded. The data are as follows-

Weight (lbs)

Before After

170 170

159 153

162 129

153 143

177 137

167 134

158 133

178 128

141 152

163 142


154 140

159 154

159 143

138 147

161 142

156 149

165 136

158 154

151 140

165 145

155 125

154 140

147 125

156 141

155 146

169 135

Test the claim at 5% l.o.s.

7. A company has reorganized its sales department. The following data shows its weekly sales

(in Rs lakh) before and after reorganization. The period for comparison is taken from Jan to

March in two successive years-

Weekly Sales

Before After

12 16

15 17

13 14

11 13

17 15

15 14

10 12

11 11

18 17

19 22

Comment and draw valid conclusion.


8. A local pizza restaurant and a local branch of a national chain are located across the street

from a college campus. The local pizza restaurant advertises that it delivers to the dormitories

faster than the national chain. In order to determine whether this advertisement is valid, you

and some of your friends have decided to order pizzas from both the outlets at different times. The

delivery times in minutes are as given below-

Local Chain

16.8 22

11.7 15.2

15.6 18.7

16.7 15.6

17.5 20.8

18.1 19.5

14.1 17

21.8 19.5

13.9 16.5

20.8 24

Test the claim at 1% l.o.s.

Assignment 3

1. The cinema-goers were 800 people out of a sample of 1000 persons during the period of a

fortnight in a town where no TV programme was aired. Similarly cinema-goers were 700

people out of a sample of 2800 persons during a fortnight where a TV programme was aired.

Do you think that there has been a significant decrease in proportion of cinema-goers due to

the introduction of TV programmes?

2. An insurance agent has claimed that the average age of policy holders who insured through

him is less than the average for all agents which he estimates as 30 years. A random sample

of 100 policy holders who have insured through him gave the following age distribution-

Age in years No of persons insured

16 – 20 12

21 – 25 22

26 – 30 20

31 – 35 30

36 – 40 16

Test the claim at 1% l.o.s.


3. Two salesmen A and B are working in a certain district. From a sample survey conducted by

the head office, the following results were obtained. State whether there is any significant

difference in the average sales between the two salesmen?

A B

Number of Sales 20 15

Average Sales (in Rs’000) 170 200

Standard Deviation (in Rs’000) 20 25

4. Ten persons were appointed in officer cadre in an office. Their performance was evaluated by

giving a test and the marks were recorded out of 100. They were given two months’ training

and another test was held and the marks were recorded out of 100. The details are as below-

Employees Marks Before Training Marks After Training

A 80 84

B 76 70

C 92 96

D 60 80

E 70 70

F 56 52

G 74 84

H 56 72

I 70 72

J 56 50

Can it be concluded that the employees have benefited from the training?

5. As per the ET-TNS consumer confidence survey, published in Economic Times dt. 10th

November, 2006, the consumer confidence indices for some of the cities changed from

December 2005 to September 2006, as follows. Is the difference significant?

City December 2005 September 2006

Delhi 106 83

Jaipur 117 142

Mumbai 112 126

Ahmedabad 123 108

Kolkota 83 84

Bhubaneshwar 137 144

Bangalore 137 138

Kochi 113 134


Chi-Square Test for Independence of Attributes (Theory)

Here the null hypothesis is given by,

H0: The two attributes are independent

For the above null hypothesis, we have the following alternative,

H1: The two attributes are dependent

Here the observed frequencies are given in tabular form called Contingency table.

We need to calculate the expected frequencies using the formula,

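$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

Now the test statistic under H0 is given by,

$$\chi^2_{obs} = \sum \frac{(O - E)^2}{E} \sim \chi^2 \text{ with } (r - 1) \times (c - 1) \text{ degrees of freedom}$$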

Where,

O denotes Observed Frequencies

E denotes Expected Frequencies

r denotes number of rows

c denotes number of columns

Decision Making

We are testing H0 vs H1 at α level of significance, where we can reject H0 if $\chi^2_{obs} > \chi^2_{\alpha}$

$\chi^2_{\alpha}$ denotes chi-square table value at α level of significance
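The expected frequencies and the chi-square statistic for a contingency table can be computed as in the sketch below. The function name `chi_square_independence` and the 2 × 2 table are hypothetical, and the critical value is read from a chi-square table.

```python
def chi_square_independence(observed):
    """Return (chi2_obs, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2_obs = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # E = Row Total x Column Total / Grand Total
            e = row_totals[i] * col_totals[j] / grand_total
            chi2_obs += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2_obs, df


# Hypothetical 2 x 2 contingency table (totals are NOT included in the input)
observed = [[30, 20],
            [20, 30]]
chi2_obs, df = chi_square_independence(observed)
chi2_table = 3.841  # chi-square table value for 1 degree of freedom at the 5% level
reject_h0 = chi2_obs > chi2_table
print(round(chi2_obs, 3), df, reject_h0)
```

Only the body of the table is passed in; the row, column and grand totals are recomputed inside the function.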

o Examples on Chi-Square Test for Independence of Attributes

1. The marketing agency gives the following information about the age group of the sample

informants and their liking for a particular model of scooter which a company plans to

introduce:

            Age group of the informants
            Below 20   20-39   40-59   Total
Liked       125        420     60      605
Disliked    75         220     100     395
Total       200        640     160     1000

On the basis of the above data can it be concluded that the model appeal is independent of the

age group of the informants?


2. 1000 employees at a company are graded according to their performance and economic

conditions. Test at 1 % level of significance whether there is any association between the

performance and economic condition of the employees.

Economic Condition   Performance
                     High   Medium   Low   Total
Rich                 160    300      140   600
Poor                 140    100      160   400
Total                300    400      300   1000

3. In order to test whether attributes ‘smoking’ and ‘literacy’ are independent, a survey of 210

literates and 250 illiterates was conducted. The result of the survey is given below-

Education Background   Smoker   Non Smoker   Total
Literate               13       197          210
Illiterate             46       204          250
Total                  59       401          460

Test at 1 % level of significance whether there is any association between the attributes at

5% l.o.s?

4. Suppose a university sampled 485 of its students to determine whether males and females

differed in preference for the five courses offered. The data obtained is tabulated as below-

Courses offered   Gender
                  Male   Female
Science           45     86
Engineering       52     67
Medicine          50     19
Management        50     32
Arts              69     15

Test whether there exists any association between the choice of the course and the gender of

the respondent.


Chi-Square Test for Goodness of Fit

Here the null hypothesis is given by,

H0: The theoretical distribution is a good fit to the observed frequency distribution.

For the above null hypothesis, we have the following alternative,

H1: The theoretical distribution is not a good fit to the observed frequency distribution.

Now the test statistic under H0 is given by,

$$\chi^2_{obs} = \sum \frac{(O - E)^2}{E} \sim \chi^2 \text{ with } (n - k - 1) \text{ degrees of freedom}$$

Where,

O denotes Observed Frequencies

E denotes Expected Frequencies

n denotes number of classes

k denotes Additional Constraints

Decision Making

We are testing H0 vs H1 at α level of significance, where we can reject H0 if $\chi^2_{obs} > \chi^2_{\alpha}$

$\chi^2_{\alpha}$ denotes chi-square table value at α level of significance

(Note: If the expected frequencies are found to be less than 5 then it should be pooled with either

the preceding or succeeding frequency term)
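A minimal goodness-of-fit sketch is given below. It takes the observed frequencies and the theoretical class probabilities, and treats n as the number of classes and k as the number of additional constraints. The function name `chi_square_goodness_of_fit` and the frequencies are hypothetical, and the pooling of classes with expected frequency below 5 is left to the user.

```python
def chi_square_goodness_of_fit(observed, expected_probs, k=0):
    """Return (chi2_obs, degrees_of_freedom, expected_frequencies)."""
    if len(observed) != len(expected_probs):
        raise ValueError("observed and expected_probs must have the same length")
    total = sum(observed)
    expected = [total * p for p in expected_probs]
    # Classes with expected frequency below 5 should be pooled before this step
    chi2_obs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - k - 1
    return chi2_obs, df, expected


# Hypothetical counts in 4 classes, tested against equal probabilities
observed = [18, 22, 20, 40]
chi2_obs, df, expected = chi_square_goodness_of_fit(observed, [0.25] * 4)
chi2_table = 7.815  # chi-square table value for 3 degrees of freedom at the 5% level
reject_h0 = chi2_obs > chi2_table
print(round(chi2_obs, 2), df, reject_h0)
```

When a parameter of the theoretical distribution is estimated from the data (for example the mean of a Poisson distribution), `k` would be set to the number of such estimates.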

o Examples on Chi-Square Test for Goodness of Fit

1. A survey of 64 families with 3 children each is conducted and the number of male children

in each family is noted. The results are tabulated as follows-

No of Male Children 0 1 2 3 Total

No of Families 6 19 29 10 64

Test whether male and female children are equi-probable?

2. The following data relates to the number of mistakes on each page of a book containing 180

pages.

No of Mistakes / page    0      1     2     3    4    ≥5    Total
No of Pages              130    32    15    2    1    0     180

Test whether Poisson distribution is a good fit to the observed distribution.

3. A sample analysis of examination results of 200 MBAs was made. It was found that 46

students had failed, 68 students secured pass class, 62 secured second class and the remaining

secured first class. Are these figures commensurate with the general examination result that is

in the ratio of 2:3:3:2 for various categories respectively? Test at 1% l.o.s.

4. The divisional manager of a retail chain believes that average number of customers entering

each of the five stores in his division weekly is the same. In a given week, the manager reports

the following number of customers in the stores as: 3000, 2960, 3100, 2780, 3160. Test the

divisional manager’s belief at 5% l.o.s.
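For the Poisson fit in example 2, two extra steps are needed: the mean is estimated from the data (so k = 1), and classes with small expected frequencies are pooled as described in the note above. The sketch below is one possible way to organise this, not a definitive procedure. The helper names `poisson_expected` and `pool_tail` are ours, the open-ended "≥5" class is counted as 5 when estimating the mean (it has zero frequency here, so this assumption has no effect), and pooling is done only from the upper tail into the preceding class.

```python
import math


def poisson_expected(observed, lam):
    """Expected counts for classes 0, 1, ..., m-2 and an open-ended last class."""
    total = sum(observed)
    probs = [math.exp(-lam) * lam ** x / math.factorial(x)
             for x in range(len(observed) - 1)]
    probs.append(1.0 - sum(probs))  # last class takes the remaining probability
    return [total * p for p in probs]


def pool_tail(observed, expected, minimum=5.0):
    """Merge the last class into the preceding one while its expected count < minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


mistakes = [130, 32, 15, 2, 1, 0]  # pages with 0, 1, 2, 3, 4, >=5 mistakes
lam = sum(x * f for x, f in enumerate(mistakes)) / sum(mistakes)  # sample mean

expected = poisson_expected(mistakes, lam)
pooled_obs, pooled_exp = pool_tail(mistakes, expected)

stat = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
dof = len(pooled_obs) - 1 - 1  # n - k - 1 with k = 1 (mean estimated from data)

CRITICAL_5PCT_DF1 = 3.841  # chi-square table value, 1 d.f., 5% l.o.s.
print(f"lambda = {lam:.2f}, pooled observed = {pooled_obs}")
print(f"chi-square = {stat:.3f} with {dof} d.f.")
print("Reject H0" if stat > CRITICAL_5PCT_DF1 else "Do not reject H0")
```

Note that the degrees of freedom are counted after pooling: three classes remain, so n − k − 1 = 3 − 1 − 1 = 1.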

Assignment 4

1. A trainee risk manager for an investment bank has been told that the level of risk is related to

the industry type. For the sample data presented in the contingency table analyze whether

perceived risk is dependent upon the type of industry identified?

                 Industry Class
Level of Risk    Manufacturing    Retail    Financial
Low              81               38        16
Moderate         46               42        33
High             22               26        29

2. An employment agency has recently implemented a new training programme to develop the

interview skills of potential job applicants. Based upon the collected data can we say

confidently that the data can be modelled using binomial distribution? (Test at 1% l.o.s).

No of Interview Successes    0     1      2     3
Frequency                    78    143    43    13

3. A motorway safety officer believes that the number of accidents per week occurring on a stretch of motorway can be modelled using Poisson distribution. The sample data collected for the study is given below-

No of accidents/week    0     1     2     3    4    5    6    ≥7
Frequency               10    12    12    9    5    3    1    0

Test whether Poisson distribution is a good fit to the observed distribution.

4. A university has recently set up a satellite department within a local college of higher

education. The university claims that 35%, 26%, 25% and 14% of the undergraduate students

are in department A, B, C and D respectively. A random sample of 320 students finds the

following number of students in department A-D: 132, 89, 64 and 35 respectively. Test the

claim at 1% l.o.s.

Analysis of Variance (ANOVA)

Introduction

One Way Analysis of Variance (One Way ANOVA) – Theory

Here the null hypothesis is given by

H0: There is no significant difference between the population means

And the corresponding alternative hypothesis is given by

H1: There is a significant difference between at least one pair of population means

Here the calculation is done using ANOVA table called One Way ANOVA Table

Sources of Variation    Sum of Squares (SS)    Degrees of Freedom    Mean Sum of Squares (MSS)    F-Ratio
Between the Samples     SSB                    k-1                   MSB = SSB/(k-1)              MSB/MSE
Within the Samples      SSE                    n-k                   MSE = SSE/(n-k)
Total Variation         SST                    n-1

Now the test statistic is given by-

$$F_{obs} = \frac{MSB}{MSE} \sim F(k - 1,\, n - k) \text{ degrees of freedom}$$

Here,

n denotes total number of observations

k denotes number of entities (samples or groups) under study

Decision Making

We are testing H0 vs H1 at the α level of significance, and we reject H0 if $F_{obs} > F_{\alpha}$, where $F_{\alpha}$ denotes the F table value at the α level of significance.

o Examples on One Way ANOVA

1. A study was conducted to assess the significance of possible variation in performance across a company's four plants, which are located in different cities. The results are given below.

Plant A    Plant B    Plant C    Plant D
08         12         18         13
10         11         12         09
12         09         16         12
08         14         06         16
07         04         08         15

Carry out analysis of variance and interpret the result.

2. The following table gives the yields on 15 sample plots under three varieties of seeds namely

A, B and C.

A: 20, 21, 23, 16, 20

B: 18, 20, 17, 15, 25

C: 25, 28, 22, 28, 32

Find out if the average yields of the land with different varieties of seed show significant

differences.
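The One Way ANOVA table described above can be computed directly from its definitions. The following is a minimal sketch, assuming the usual decomposition SST = SSB + SSE; the function name `one_way_anova` and the dictionary keys are our own choices, and 3.89 is the standard F table value for (2, 12) degrees of freedom at 5% l.o.s. (the example itself does not state a level, so 5% is assumed). The data are the seed yields of example 2.

```python
def one_way_anova(groups):
    """Build the one-way ANOVA quantities for a list of samples (one list per group)."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-sample and within-sample sums of squares
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)
    mse = sse / (n - k)
    return {"SSB": ssb, "SSE": sse, "SST": ssb + sse,
            "df": (k - 1, n - k), "MSB": msb, "MSE": mse, "F": msb / mse}


yields = [
    [20, 21, 23, 16, 20],  # variety A
    [18, 20, 17, 15, 25],  # variety B
    [25, 28, 22, 28, 32],  # variety C
]
result = one_way_anova(yields)

CRITICAL_5PCT_2_12 = 3.89  # F table value, (2, 12) d.f., 5% l.o.s. (assumed level)
print(result)
print("Reject H0" if result["F"] > CRITICAL_5PCT_2_12 else "Do not reject H0")
```

The same function can be applied to the four-plant data of example 1 by passing one list per plant.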

Two Way Analysis of Variance (Two Way ANOVA) without Replication – Theory

Here the null hypotheses are given by

H0R: There is no significant difference between the factors along the rows

H0C: There is no significant difference between the factors along the columns

And the corresponding alternative hypotheses are given by

H1R: There is a significant difference between the factors along the rows

H1C: There is a significant difference between the factors along the columns

Here the calculation is done using ANOVA table called Two Way ANOVA Table

Sources of Variation    Sum of Squares (SS)    Degrees of Freedom    Mean Sum of Squares (MSS)    F-Ratio
Between the Rows        SSR                    r-1                   MSR = SSR/(r-1)              MSR/MSE
Between the Columns     SSC                    c-1                   MSC = SSC/(c-1)              MSC/MSE
Residual                SSE                    (r-1)(c-1)            MSE = SSE/[(r-1)(c-1)]
Total Variation         SST                    n-1

Now the test statistics are given by-

$$F_{1,obs} = \frac{MSR}{MSE} \sim F(r - 1,\, (r - 1)(c - 1)) \text{ degrees of freedom}$$

$$F_{2,obs} = \frac{MSC}{MSE} \sim F(c - 1,\, (r - 1)(c - 1)) \text{ degrees of freedom}$$

Here,

n denotes total number of observations (n=cr)

r denotes number of rows

c denotes number of columns

Decision Making

1. We are testing H0R vs H1R at the α level of significance, and we reject H0R if $F_{1,obs} > F_{1,\alpha}$, where $F_{1,\alpha}$ denotes the F table value at the α level of significance for (r-1, (r-1)(c-1)) degrees of freedom.

2. We are testing H0C vs H1C at the α level of significance, and we reject H0C if $F_{2,obs} > F_{2,\alpha}$, where $F_{2,\alpha}$ denotes the F table value at the α level of significance for (c-1, (r-1)(c-1)) degrees of freedom.

o Examples on Two Way Analysis of Variance (Two Way ANOVA) without Replication

3. A company has appointed four salesmen A, B, C and D and observed their sales in three

seasons – summer, winter and monsoon. The figures (in Rs. Lakh) are given in the

following table-

           Salesmen
Seasons    A     B     C     D
Summer     36    36    21    35
Winter     28    29    31    32
Monsoon    26    28    29    29

Using 5% l.o.s, perform analysis of variance on the above data and interpret the results.

4. The following data represents the number of units of production per day turned out by

four different workers using five different machines:

           Machine Type
Workers    A    B    C    D    E
I          4    5    3    7    6
II         5    7    7    4    5
III        7    6    7    8    8
IV         3    5    4    8    2

On the basis of the data given in the table, can it be concluded that-

i. The mean productivity is the same for different machines

ii. The workers don’t differ with regard to productivity.
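The Two Way ANOVA table (without replication) can be sketched in the same spirit. This is an illustrative sketch only: the function name `two_way_anova` is ours, the sums of squares use the usual correction-factor shortcut (CF = T²/n, with T the grand total), and 5.14 and 4.76 are the standard F table values at 5% l.o.s. for (2, 6) and (3, 6) degrees of freedom. The data are the seasonal sales figures of example 3, with seasons as rows and salesmen as columns.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication; table is a list of rows of equal length."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n  # correction factor T^2 / n
    sst = sum(x * x for row in table for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - cf
    sse = sst - ssr - ssc  # residual sum of squares
    msr = ssr / (r - 1)
    msc = ssc / (c - 1)
    mse = sse / ((r - 1) * (c - 1))
    return {"SSR": ssr, "SSC": ssc, "SSE": sse, "SST": sst,
            "F_rows": msr / mse, "F_cols": msc / mse}


sales = [
    [36, 36, 21, 35],  # Summer
    [28, 29, 31, 32],  # Winter
    [26, 28, 29, 29],  # Monsoon
]
result = two_way_anova(sales)

CRITICAL_5PCT_ROWS = 5.14  # F table value, (2, 6) d.f., 5% l.o.s.
CRITICAL_5PCT_COLS = 4.76  # F table value, (3, 6) d.f., 5% l.o.s.
print(result)
print("Seasons:", "reject H0R" if result["F_rows"] > CRITICAL_5PCT_ROWS else "do not reject H0R")
print("Salesmen:", "reject H0C" if result["F_cols"] > CRITICAL_5PCT_COLS else "do not reject H0C")
```

For example 4, the same function applies with workers as rows and machine types as columns; the table values must then be read for (3, 12) and (4, 12) degrees of freedom.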

o Examples on Two Way Analysis of Variance (Two Way ANOVA) with Replication

5. The following data refers to the yields of rice on two plots for each combination of the variety of rice and the type of fertilizer:

             Fertilizer A    Fertilizer B    Fertilizer C    Fertilizer D
Variety 1    6               4               8               6
             5               5               6               4
Variety 2    7               6               6               9
             6               7               7               8
Variety 3    8               5               10              9
             7               5               9               10

Test the above case at 1% l.o.s.

6. Reliable tyre dealer wishes to assess the quality of lives of three different brands of tyres sold by it. It also wants to assess whether the lives of these tyres are the same for the four brands of cars on which they are used. Thus, each brand of tyres was tested on each of the four brands of cars. Further, the dealer wishes to ascertain the equality of lives for each combination of brands of tyre and car. The mileages obtained are given as follows-

          Car A    Car B    Car C    Car D
Tyre 1    32       30       34       36
          31       29       33       38
          33       28       36       39
          31       30       35       40
Tyre 2    38       39       40       41
          37       40       41       39
          38       41       42       40
          39       39       43       42
Tyre 3    32       33       40       45
          30       32       42       43
          31       30       41       42
          33       31       40       46

Test the above case at 5% l.o.s.
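The docket gives no separate theory for the replicated case, so the sketch below rests on an assumption that should be checked against the textbook: the standard decomposition of the total variation into rows, columns, interaction and error, with each F-ratio formed against the error mean square. The function name `two_way_anova_replicated` and the dictionary keys are ours. The data are the rice yields of example 5 (three varieties, four fertilizers, two plots per combination).

```python
def two_way_anova_replicated(cells):
    """cells[i][j] is the list of replicate observations for row i and column j."""
    r, c, m = len(cells), len(cells[0]), len(cells[0][0])
    n = r * c * m
    values = [x for row in cells for cell in row for x in cell]
    cf = sum(values) ** 2 / n  # correction factor T^2 / n
    sst = sum(x * x for x in values) - cf
    ssr = sum(sum(sum(cell) for cell in row) ** 2 for row in cells) / (c * m) - cf
    ssc = sum(sum(sum(cell) for cell in col) ** 2 for col in zip(*cells)) / (r * m) - cf
    ss_cells = sum(sum(cell) ** 2 for row in cells for cell in row) / m - cf
    ssi = ss_cells - ssr - ssc  # interaction sum of squares
    sse = sst - ss_cells        # within-cell (error) sum of squares
    mse = sse / (r * c * (m - 1))
    return {
        "SSR": ssr, "SSC": ssc, "SSI": ssi, "SSE": sse, "SST": sst,
        "df": (r - 1, c - 1, (r - 1) * (c - 1), r * c * (m - 1)),
        "F_rows": (ssr / (r - 1)) / mse,
        "F_cols": (ssc / (c - 1)) / mse,
        "F_interaction": (ssi / ((r - 1) * (c - 1))) / mse,
    }


rice = [
    [[6, 5], [4, 5], [8, 6], [6, 4]],     # Variety 1, Fertilizers A-D
    [[7, 6], [6, 7], [6, 7], [9, 8]],     # Variety 2
    [[8, 7], [5, 5], [10, 9], [9, 10]],   # Variety 3
]
result = two_way_anova_replicated(rice)
for name, value in result.items():
    print(name, value)
```

Each F-ratio is then compared with the F table value at the stated l.o.s. for its own numerator degrees of freedom and the error degrees of freedom (here 12).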

Assignment 5

1. Three training methods were compared to see if they led to greater productivity after training.

The productivity measures for individuals trained by different methods are as below-

Method 1 36 26 31 20 34 25

Method 2 40 29 38 32 39 34

Method 3 32 18 23 21 33 27

At 1% l.o.s., test whether the three training methods lead to different levels of productivity.

2. The following table gives the data on the performance of three different detergents at three water temperatures. The performance was obtained on the basis of ‘whiteness’ readings taken with specially designed equipment for nine loads of washing-

               Detergent
Water Temp.    A     B     C
Cold Water     45    43    55
Warm Water     37    40    56
Hot Water      42    44    46

Analyse the above case using ANOVA at 5% l.o.s.

3. The manager of a bank in Mumbai is responsible for ATM operations in three areas in the

city, viz. Andheri, Vile Parle and Santa Cruz. When he took over the operations, he faced the

problem of cash running out at the ATMs. To study the problem, he collected data

from all the three areas to check whether ATMs in all three areas need equal amount of cash.

He also wanted to know whether ATMs at different locations needed the same amount of cash

or not. So he collected the following data about cash withdrawals (in Rs. Lakhs) during the

last four months which is tabulated as below-

              Locations
Areas         Station    Market    Bank
Andheri       40         37        35
              39         39        34
              41         37        38
              39         36        32
Santa Cruz    38         36        34
              42         37        35
              40         34        33
              39         35        34
Vile Parle    38         39        35
              39         34        35
              39         37        34
              41         36        33

Analyze the above case at 1% l.o.s.