MB0040


MB 0040 – STATISTICS FOR MANAGEMENT

Assignment Set - 1

Q.1. (a) What is the difference between a qualitative and quantitative

variable?

Ans.: Qualitative variables are based on a qualitative aspect or descriptive characteristic of a phenomenon, such as sex, beauty, literacy, honesty, intelligence, religion, or eyesight.

Such variables are usually dichotomous: the data are divided into two groups, one in which the attribute is present and one in which it is absent, such as blind and not blind, or deaf and not deaf.

In certain cases, however, the classification can be manifold, with the data grouped under more than two classes. This happens when the qualitative aspect is defined by some grade or level of performance. In education, for instance, the classes may be primary, secondary, higher secondary, and higher education. Similarly, eyesight may be graded A, B, C, and so on. A manifold classification also arises when more than one attribute is taken into consideration at a time.

Quantitative variables are numerical in nature; they can be measured in quantitative terms. Examples are marks, income, expenditure, profit, loss, height, weight, age, price, and production, all of which are capable of quantitative expression and measurement. A quantitative variable may be defined as a characteristic that varies in amount or magnitude across time and place, e.g. marks, age, and height. These variables are of two types: a) discrete variables and b) continuous variables. A variable that assumes only some specified values in a given range (for example, the number of children in a family) is known as a discrete variable. A variable that can assume any value within a given range (for example, height) is known as a continuous variable.

Q.1. (b) A town has 15 neighborhoods. If you interviewed everyone living in one particular neighborhood, would you be interviewing a population or a sample from the town? Would this be a random sample? If you had a list of everyone living in the town, called a frame, and you randomly selected 100 people from all neighborhoods, would this be a random sample?

Ans.: Before answering this question we need to know what a population is and what a sample is. The totality of all individuals covered by a survey is called the population or universe. If the number of objects in a population is finite, it is called a finite population; otherwise it is known as an infinite population.

A sample is a part or subset of the population. By studying the sample, we can estimate the characteristics of the entire population from which the sample is taken. A measure that describes a characteristic of a sample is known as a statistic.

If we interview everyone in one particular neighborhood, we are interviewing a sample from the town, not its population, because we have covered every individual of one group rather than of the whole town. It is not a random sample of the town either, because residents of the other 14 neighborhoods had no chance of being selected. Selecting 100 people at random from a frame that lists everyone in the town, on the other hand, does give a random sample.
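The contrast between the two selection schemes can be sketched in a few lines of Python. The frame below is hypothetical (3,000 made-up resident IDs spread evenly over 15 neighborhoods), and the seed is fixed only so that the sketch is reproducible.

```python
import random

# Hypothetical frame: (resident_id, neighborhood) pairs, 200 residents per neighborhood.
frame = [(resident_id, resident_id % 15 + 1) for resident_id in range(3000)]

rng = random.Random(42)  # fixed seed, an assumption made for reproducibility

# Simple random sample: every resident listed in the frame is equally likely to be drawn.
random_sample = rng.sample(frame, 100)

# Interviewing everyone in one neighborhood: a sample from the town, but not a random one.
one_neighborhood = [person for person in frame if person[1] == 7]

neighborhoods_in_random_sample = {n for _, n in random_sample}
neighborhoods_in_census = {n for _, n in one_neighborhood}
print(len(neighborhoods_in_random_sample), len(neighborhoods_in_census))
```

The second selection can only ever contain one neighborhood, which is why it cannot represent the town the way a draw from the full frame can.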

Q.2. (a) Explain the steps involved in planning of a statistical survey?

Ans.: The stages in planning a statistical survey are:

a). The nature of the problem to be investigated should be clearly defined in an unambiguous manner.

b). The objectives of the investigation should be stated at the outset. The objectives could be to:
Obtain certain estimates.
Establish a theory.
Verify an existing statement.
Find relationships between characteristics.

c). The scope of the investigation has to be made clear. The scope refers to the area to be covered, the identification of the units to be studied, the nature of the characteristics to be observed, the accuracy of measurement, the analytical method, and the time, cost, and other resources required.

d). Whether to use data collected from primary sources or secondary sources should be determined in advance.

e). The organization of the investigation is the final step in the process. It covers the number of investigators required, their training, the supervision needed, and the funds required.

Q.2. (b) What are the merits & Demerits of Direct personal observation

and Indirect Oral Interview?

Ans.: Direct personal observation: In the direct personal observation method, the investigator collects data through direct contact with the units of investigation. The accuracy of the data depends on the ability, training, and attitude of the investigator.

Merits -
The data are original, and therefore more accurate and reliable.
The investigator can extract satisfactory information through indirect questions.
The data are homogeneous and comparable.
Additional information can be gathered.
Misinterpretation of questions can be avoided.

Demerits -
The method is costly.
The method is time-consuming.
The method cannot be used when the scope of the investigation is wide.

Indirect oral interview: The indirect oral interview is used when the area to be covered is large. The investigator collects the data from a third party, a witness, or the head of an institution. This method is generally used by police departments in enquiries into the causes of fires, thefts, or murders.

Merits -
Economical in terms of time, cost, and manpower.
Confidential information can be collected.
Information is likely to be unbiased and reliable.

Demerits -
The degree of accuracy of the information is lower.

Q.3. (a)

Draw ogives from the following data and measure the median value. Verify it by actual calculation.

Central size   5    15   25   35   45
Frequency      5    11   21   16   10

Ans.:

Central   Limits   Frequency   Less than        Greater than
value                          Limit   c.f.     Limit   c.f.
5         0-10     5           10      5        0       63
15        10-20    11          20      16       10      58
25        20-30    21          30      37       20      47
35        30-40    16          40      53       30      26
45        40-50    10          50      63       40      10
Total              63


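If we draw a perpendicular to the X axis from the point where the two ogives meet, the point where it meets the X axis gives the median of the series. The ogives cross at a cumulative frequency of N/2 = 63/2 = 31.5, which falls in the 20-30 class, at about 27.4 on the X axis. So the median is approximately 27.4.

By actual calculation

Here N = 63, so the median is the (N/2)th = 31.5th item. It lies in the 20-30 class, which has frequency 21 and is preceded by a cumulative frequency of 16.

Median = 20 + (30 - 20)/21 × (31.5 - 16) = 20 + 7.38 = 27.38

So the ogive median and the calculated median are the same.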
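As a check on the arithmetic, the sketch below rebuilds both cumulative columns from the class frequencies and interpolates the median inside the class that holds the N/2-th item, which is the value at which the two ogives cross. The function name `grouped_median` is illustrative and not taken from the source.

```python
# Class limits and frequencies from the table above.
lower_limits = [0, 10, 20, 30, 40]
upper_limits = [10, 20, 30, 40, 50]
frequencies = [5, 11, 21, 16, 10]

total = sum(frequencies)

# "Less than" ogive: cumulative frequency up to each upper limit.
less_than = []
running = 0
for f in frequencies:
    running += f
    less_than.append(running)

# "Greater than" ogive: frequency at or above each lower limit.
greater_than = [total - (c - f) for c, f in zip(less_than, frequencies)]


def grouped_median(lower_limits, upper_limits, frequencies):
    """Interpolate the median inside the class containing the N/2-th item."""
    half = sum(frequencies) / 2
    cumulative_before = 0
    for lo, hi, f in zip(lower_limits, upper_limits, frequencies):
        if cumulative_before + f >= half:
            return lo + (hi - lo) / f * (half - cumulative_before)
        cumulative_before += f
    raise ValueError("frequencies must sum to a positive total")


median = grouped_median(lower_limits, upper_limits, frequencies)
print(less_than, greater_than, round(median, 2))
```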

Q.3. (b)

Complete the following distribution, if its Median is 2,600 and compute the value of Arithmetic Mean.

Size        1000-1500   1500-2000   2000-2500   2500-3000   3000-4000   4000-5000   5000-6000   Total
Frequency   120         ?           400         500         ?           50          20          1500

Ans.:

Size        f            c.f.
1000-1500   120          120
1500-2000   f1           120 + f1
2000-2500   400          520 + f1
2500-3000   500          1020 + f1
3000-4000   410 - f1*    1430
4000-5000   50           1480
5000-6000   20           1500

N = 1500 (given)

* f2 = 1500 - (120 + 400 + 500 + 50 + 20) - f1 = 410 - f1

The median is the (N/2)th item, i.e. the 1500/2 = 750th item. The median is given as 2600, which lies in the 2500-3000 class.

$$M = L_1 + \frac{L_2 - L_1}{f}\,(m - c)$$

where L_1 and L_2 are the limits of the median class, f is its frequency, m = N/2, and c is the cumulative frequency of the preceding class.

$$2600 = 2500 + \frac{3000 - 2500}{500}\,\bigl(750 - (520 + f_1)\bigr)$$

$$2600 = 2500 + (230 - f_1)$$

$$100 = 230 - f_1$$

$$f_1 = 130$$

Then f2 = 410 - 130 = 280.

Class interval   f      m      fm
1000-1500        120    1250   150000
1500-2000        130    1750   227500
2000-2500        400    2250   900000
2500-3000        500    2750   1375000
3000-4000        280    3500   980000
4000-5000        50     4500   225000
5000-6000        20     5500   110000
Total            1500          3967500

$$\bar{X} = \frac{\sum fm}{\sum f} = \frac{3967500}{1500} = 2645 \text{ (ans)}$$
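The same steps can be checked with a short Python sketch. It assumes, as in this problem, that exactly one missing frequency lies below the median class and one above it; the variable names are illustrative.

```python
class_limits = [(1000, 1500), (1500, 2000), (2000, 2500), (2500, 3000),
                (3000, 4000), (4000, 5000), (5000, 6000)]
frequencies = [120, None, 400, 500, None, 50, 20]  # None marks a missing frequency
total = 1500
median = 2600

# Locate the median class and invert M = L1 + (L2 - L1)/f * (N/2 - c) to get c.
median_index = next(i for i, (lo, hi) in enumerate(class_limits) if lo <= median < hi)
lower, upper = class_limits[median_index]
f_median = frequencies[median_index]
c = total / 2 - (median - lower) * f_median / (upper - lower)

# c is the cumulative frequency below the median class, so the first gap follows from it.
known_below = sum(f for f in frequencies[:median_index] if f is not None)
f1 = c - known_below

# The second gap is whatever remains of the total.
known_all = sum(f for f in frequencies if f is not None)
f2 = total - known_all - f1

# Fill the gaps and compute the arithmetic mean from the class midpoints.
missing = [i for i, f in enumerate(frequencies) if f is None]
completed = list(frequencies)
completed[missing[0]] = f1
completed[missing[1]] = f2
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]
mean = sum(f * m for f, m in zip(completed, midpoints)) / total
print(f1, f2, mean)
```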

Q.4. (a) What is the main difference between correlation analysis and

regression analysis?

Ans.: Correlation analysis: When two or more variables move in sympathy with each other, they are said to be correlated. If both variables move in the same direction, they are said to be positively correlated. If they move in opposite directions, they are said to be negatively correlated. If they move haphazardly, there is no correlation between them.

Regression analysis: Regression analysis is used to estimate the values of the dependent variable from the values of the independent variables. It also gives a measure of the error involved in using the regression line as a basis for estimation. The regression coefficients can be used to calculate the correlation coefficient.

The main difference between the two is that correlation analysis studies the strength and direction of the relationship between the variables ‘X’ and ‘Y’, whereas regression analysis attempts to predict the average value of one variable for a given value of the other, and so quantifies the dependence of one variable on the other.

Difference between the correlation coefficient and the regression coefficient

Correlation coefficient                    Regression coefficient
It is symmetric: rxy = ryx.                It is not symmetric: in general, byx ≠ bxy.
It indirectly helps in estimation.         It is meant for estimation.
It has no units attached to it.            It has units attached to it.
Nonsense correlation can exist.            There is no such thing as nonsense regression.
It is not based on a cause-and-effect      It is based on a cause-and-effect
relationship.                              relationship.
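A small numerical sketch may make the contrast concrete. The paired observations below are hypothetical and chosen only for illustration; the code computes the correlation coefficient and both regression coefficients from the same sums of squares and products.

```python
from math import sqrt

# Hypothetical paired observations, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / sqrt(sxx * syy)  # correlation coefficient: the same for (x, y) and (y, x)
b_yx = sxy / sxx           # regression coefficient of y on x
b_xy = sxy / syy           # regression coefficient of x on y

print(round(r, 4), b_yx, b_xy)
```

With these numbers the two regression coefficients differ, while their product equals the square of r, which is how the correlation coefficient can be recovered from the regression coefficients.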

Q.4. (b) Multiple regression analysis is an extension of two-variable regression analysis. In this analysis, two or more independent variables are used to estimate the values of a dependent variable, instead of one independent variable.

Ans.: Multiple regression analysis is an extension of two-variable regression analysis. In this analysis, two or more independent variables are used to estimate the values of a dependent variable, instead of one independent variable.

The objectives of multiple regression analysis are –

To derive an equation that provides estimates of the dependent variable from the values of the two or more independent variables.

To obtain a measure of the error involved in using the regression equation as a basis for estimation.

To obtain a measure of the proportion of variance in the dependent variable accounted for, or explained by, the independent variables.

In the given question N = 12. The degrees of freedom are v = n - 1, where n is the sample size, so v = 12 - 1 = 11.

Q.5. (a) Discuss what is meant by Quality control and quality

improvement.

Ans.: Quality Control – is defined as the part of quality management focused on fulfilling quality requirements. Ideally, prevention-based controls should prevent problems from occurring, but in reality no system is foolproof and problems do occur. Accordingly, controls to detect quality problems must be established so that customers receive only products that meet their requirements. Detection-based controls are reactive: the problem and its cost have already occurred, and the company is resorting to damage control. The intent of detection is to evaluate the output of processes and activities by implementing controls that catch problems when they do occur, for example a final inspection to catch defective product before it is shipped.

Quality Improvement – is defined as the part of quality management focused on increasing the ability to fulfill requirements. Continual improvement results from ongoing actions taken to enhance product characteristics or increase process effectiveness and efficiency. This is one of the key characteristics that differentiate a quality management system from a quality assurance system, i.e., being able to improve the effectiveness and efficiency of a process or activity by setting measurable objectives and using performance data to manage the achievement of these objectives.

Effectiveness is defined as the extent to which planned activities are realized

and planned results are achieved. In determining the effectiveness of quality

assurance and quality improvement activities, the following questions should be

asked:

–To what extent have problems in product or processes been prevented?

–To what extent have planned objectives for quality been met?

Efficiency is defined as the relationship between result achieved and resources

used.

The measure of efficiency is determined by asking the following:

–Can we get the same output using fewer resources?

–Can we get more output without adding resources?

These questions may be applied to the output of any activity within the quality

management system of an organization.

It should be noted that ISO 9001 requires organizations to achieve QMS

effectiveness through quality assurance and continual improvement activities.

QMS efficiency is desirable, but not currently required by ISO 9001. ISO 9004

provides guidelines that consider both the effectiveness and efficiency of the

QMS.

Quality improvement actions may include -

Measuring and analyzing situations

Establishing improvement objectives

Searching for possible solutions

Evaluating these solutions

Implementing the selected solution

Measuring, verifying, and analyzing results

Formalizing the changes


Q.5. (b) What are the limitations of quality control charts?

Ans.: The Pareto chart, widely used in quality control, is based on the research of Vilfredo Pareto. He found that approximately 80 percent of all wealth in the Italian cities he researched was held by only 20 percent of the families. The Pareto principle has been found to apply in other areas, from economics to quality control. Pareto charts have several disadvantages, however.

Easy to Make but Difficult to Troubleshoot

Based on the Pareto principle, any process improvement should focus on the 20 percent of issues that cause the majority of problems in order to have the greatest impact. However, one of the disadvantages of Pareto charts is that they provide no insight into root causes. For example, a Pareto chart may show that half of all problems occur in shipping and receiving, but not why. Failure Mode and Effects Analysis, Statistical Process Control charts, run charts, and cause-and-effect charts are needed to determine the most basic reasons why the major issues identified by the Pareto chart are occurring.

Multiple Pareto Charts May Be Needed

Pareto charts can show where the major problems are occurring. However, one chart may not be enough. To trace the errors to their source, lower-level Pareto charts may be needed. If mistakes are occurring in shipping and receiving, further analysis and more charts are needed to show that the biggest contributor is, say, order-taking or label-printing. Another disadvantage of Pareto charts is that as more are created in finer detail, it is possible to lose sight of how these causes compare with each other. The top 20 percent of root causes in a Pareto analysis two to three layers down from the original chart must also be compared with each other so that the targeted fix has the greatest impact.

Qualitative Data versus Quantitative Data

Pareto charts can only show qualitative data that can be observed. They merely show the frequency of an attribute or measurement. One disadvantage of Pareto charts is that they cannot be used to calculate the average of the data, its variability, or changes in the measured attribute over time. They cannot be used to calculate the mean, the standard deviation, or the other statistics needed to translate data collected from a sample into an estimate of the state of the real-world population. Without quantitative data and the statistics calculated from them, it is not possible to test the values mathematically. Quantitative statistics are needed to determine whether or not a process can stay within a specification limit. While a Pareto chart may show which problem is the greatest, it cannot be used to calculate how bad the problem is or how far changes would bring a process back into specification.

Q.6. (a) Suggest a more suitable average in each of the following

cases:

(i) Average size of ready-made garments.

(ii) Average marks of a student.

Ans.: Average size of ready-made garments: The mode is the suitable average, because garment sizes are discrete and what matters is the size that occurs most frequently.

Average marks of a student: The arithmetic mean is the suitable average, because the data are on the interval scale and the distribution is symmetrical.

Q.6. (b) State the nature of symmetry in the following cases:

(i) When median is greater than mean, and

(ii) When Mean is greater than median.

Ans.: When the median is greater than the mean, the series is said to have negative skewness. The following characteristics can be seen:
Mode > Median > Mean.
The left tail of the curve is longer than the right tail when the data are plotted as a histogram or a frequency polygon.
The formula of skewness and its coefficients give negative figures.

When the mean is greater than the median, the series is said to have positive skewness. The following characteristics can be seen:
Mean > Median > Mode.
The right tail of the curve is longer than the left tail when the data are plotted as a histogram or a frequency polygon.
The formula of skewness and its coefficients give positive figures.


The following example shows the above distributions and their respective characteristics:

Value (X)   Positively skewed       Negatively skewed
            F     FX     CF         F     FX     CF
10          5     50     5          5     50     5
20          15    300    20         7     140    12
30          13    390    33         9     270    21
40          11    440    44         11    440    32
50          9     450    53         13    650    45
60          7     420    60         15    900    60
70          5     350    65         5     350    65
Total       65    2400   -          65    2800   -

Positively skewed: Mean = 2400/65 = 37; Median = (65 + 1)/2 = 33rd item = 30
Negatively skewed: Mean = 2800/65 = 43; Median = (65 + 1)/2 = 33rd item = 50
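The ordering of the three averages in each distribution can be confirmed with Python's standard `statistics` module. The sketch expands each frequency column back into a raw series; the helper names `expand` and `summarize` are illustrative.

```python
from statistics import mean, median, mode

values = [10, 20, 30, 40, 50, 60, 70]
positive_freq = [5, 15, 13, 11, 9, 7, 5]   # F column, positively skewed
negative_freq = [5, 7, 9, 11, 13, 15, 5]   # F column, negatively skewed


def expand(values, freqs):
    """Repeat each value by its frequency to recover the raw series."""
    return [v for v, f in zip(values, freqs) for _ in range(f)]


def summarize(values, freqs):
    """Return (mean, median, mode) of the expanded series."""
    data = expand(values, freqs)
    return mean(data), median(data), mode(data)


pos_mean, pos_median, pos_mode = summarize(values, positive_freq)
neg_mean, neg_median, neg_mode = summarize(values, negative_freq)

print(round(pos_mean), pos_median, pos_mode)
print(round(neg_mean), neg_median, neg_mode)
```

The first series gives mean > median > mode and the second gives mode > median > mean, matching the two cases described in the answer.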


Assignment Set - 2

Q.1. (a) What are the characteristics of a good measure of central

tendency?

Ans.: The statistics mean, median, and mode are the most common measures of central tendency. A measure of central tendency is a sort of average or typical value of the items in a series, or of some characteristic of the members of a group. Each of these measures provides a single value to represent the characteristic of the whole group in its own way.

According to Tete, a measure of central tendency is: "A sort of average or typical value of the items in the series and its function is to summarize the series in terms of this average value."

Mean represents the average for ungrouped data; the sum of the scores divided by the total number of scores gives the value of the mean.

Median is the score or value of the central item, which divides the series into two equal halves.

Mode is defined as the size of the variable that occurs most frequently in the series.

1). Characteristics of the mean include –

Every score influences the mean. Changing a score changes the mean.
Adding or subtracting a score changes the mean (unless the score equals the mean).
If a constant value is added to every score, the same constant will be added to the mean. If a constant value is subtracted from every score, the same constant will be subtracted from the mean.
If every score is multiplied or divided by a constant, the mean will change in the same way.
It is inappropriate to use the mean to summarize nominal and ordinal data; it is appropriate to use the mean to summarize interval and ratio data.
If the distribution is skewed or has some outliers, the mean will be distorted.

2). Characteristics of the median include –

It is inappropriate to use the median to summarize nominal data; it is appropriate to use the median to summarize ordinal, interval, and ratio data.
The median depends on the rank order of the scores, not on their actual values.
The median is not distorted by outliers or extreme scores.
The median is the preferred measure of central tendency when the distribution is skewed or distorted by outliers.

3). Characteristics of the mode include –

The mode may be used to summarize nominal, ordinal, interval, and ratio data.
There may be more than one mode.
The mode may not exist.
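Several of the properties listed above can be verified directly. The sketch below uses a hypothetical set of five scores and checks how the mean and median respond to adding a constant, multiplying by a constant, and introducing an outlier.

```python
from statistics import mean, median

scores = [40, 45, 50, 55, 60]        # hypothetical scores, for illustration only

shifted = [s + 10 for s in scores]   # add a constant to every score
scaled = [s * 2 for s in scores]     # multiply every score by a constant
with_outlier = scores[:-1] + [600]   # replace the top score with an extreme value

print(mean(scores), mean(shifted), mean(scaled))   # the mean shifts and scales with the data
print(mean(with_outlier), median(with_outlier))    # the mean is distorted, the median is not
```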

Q.1. (b) What are the uses of averages?

Ans.: The uses of the various averages are listed below –

1). Arithmetic mean is used when –
An in-depth study of the variable is needed
The variable is continuous and additive in nature
The data are on the interval or ratio scale
The distribution is symmetrical

2). Median is used when –
The variable is discrete
There exist abnormal values
The distribution is skewed
The extreme values are missing
The characteristics studied are qualitative
The data are on the ordinal scale

3). Mode is used when –
The variable is discrete
There exist abnormal values
The distribution is skewed
The extreme values are missing
The characteristics studied are qualitative

4). Geometric mean is used when –
The rate of growth, ratios, and percentages are to be studied
The variable is of a multiplicative nature

5). Harmonic mean is used when –
The study is related to speed or time
The average of rates which produce equal effects has to be found
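The last two cases are the least intuitive, so a brief sketch may help. It uses `geometric_mean` and `harmonic_mean` from Python's standard `statistics` module (available from Python 3.8) on hypothetical figures: three yearly growth factors, and two speeds travelled over equal distances.

```python
from statistics import geometric_mean, harmonic_mean

# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]
average_growth_factor = geometric_mean(growth_factors)

# Hypothetical speeds (km/h) over two equal distances.
speeds = [40, 60]
average_speed = harmonic_mean(speeds)

print(round(average_growth_factor, 4), average_speed)
```

Compounding the geometric mean over three years reproduces the overall growth of 1.10 × 1.20 × 0.90, and the harmonic mean of 40 and 60 is 48 km/h, the true average speed for the round trip, rather than the arithmetic mean of 50.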

Q.2. For each one of the following null hypotheses, determine whether it is a left-tailed, a right-tailed, or a two-tailed test.

a. μ ≥ 10
b. P ≤ 0.5
c. μ is at least 100.
d. μ ≤ -20
e. p is exactly 0.22

Ans.: In each case the null hypothesis is the statement that contains the equality; the alternative hypothesis determines the tail.
a) H0: μ ≥ 10, Ha: μ < 10, so left-tailed.
b) H0: P ≤ 0.5, Ha: P > 0.5, so right-tailed.
c) "At least 100" means H0: μ ≥ 100, Ha: μ < 100, so left-tailed.
d) H0: μ ≤ -20, Ha: μ > -20, so right-tailed.
e) H0: p = 0.22, Ha: p ≠ 0.22, so two-tailed.
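The rule behind these answers can be written as a small lookup. This is only a sketch, and it assumes the alternative hypothesis is simply the complement of the stated null; the function name `tail_of_test` is illustrative.

```python
def tail_of_test(null_operator: str) -> str:
    """Map the relation in the null hypothesis to the tail of the test.

    Assumes the alternative hypothesis is the complement of the null:
    H0 '>=' gives Ha '<' (left tail), H0 '<=' gives Ha '>' (right tail),
    and H0 '=' gives Ha '!=' (both tails).
    """
    tails = {">=": "left-tailed", "<=": "right-tailed", "=": "two-tailed"}
    return tails[null_operator]


for label, operator in [("a", ">="), ("b", "<="), ("c", ">="), ("d", "<="), ("e", "=")]:
    print(label, tail_of_test(operator))
```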

Q.3. What is test statistic? Why do we have to know the distribution of a test statistic?

Ans.: In statistical hypothesis testing, a test statistic is a numerical summary of

a set of data that reduces the data to one or a small number of values that can

be used to perform a hypothesis test. Given a null hypothesis and a test

statistic T, we can specify a "null value" T0 such that values of T close to T0

present the strongest evidence in favor of the null hypothesis, whereas values

of T far from T0 present the strongest evidence against the null hypothesis. An

important property of a test statistic is that we must be able to determine

its sampling distribution under the null hypothesis, which allows us to calculate p-values.

For example, suppose we wish to test whether a coin is fair (i.e. has equal

probabilities of producing a head or a tail). If we flip the coin 100 times and

record the results, the raw data can be represented as a sequence of 100

heads and tails. If our interest is in the marginal probability of obtaining a head,

we only need to record the number T out of the 100 flips that produced a head,

and use T0 = 50 as our null value. The exact sampling distribution of T is

the binomial distribution, but for larger sample sizes the normal approximation

can be used. Using one of these sampling distributions, it is possible to compute

either a one-tailed or two-tailed p-value for the null hypothesis that the coin is

fair. Note that the test statistic in this case reduces a set of 100 numbers to a

single numerical summary that can be used for testing.
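The coin example can be made concrete with an exact binomial calculation. The sketch below assumes an observed count of 60 heads, a figure that is not in the text and is used only for illustration, and it takes the two-tailed p-value to be the total null probability of outcomes at least as far from T0 = 50 as the observed count, which is a natural choice when the null distribution is symmetric.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_tailed_p_value(heads: int, flips: int = 100, p_null: float = 0.5) -> float:
    """Total null probability of counts at least as far from T0 as the observed count."""
    null_value = flips * p_null
    distance = abs(heads - null_value)
    return sum(
        binomial_pmf(k, flips, p_null)
        for k in range(flips + 1)
        if abs(k - null_value) >= distance
    )


p_value = two_tailed_p_value(60)  # 60 heads is a hypothetical observation
print(round(p_value, 4))
```

For 60 heads the p-value comes out a little above 0.05, so a fair coin would not be rejected at the 5 percent level on this evidence.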

A test statistic shares some of the same qualities of a descriptive statistic, and

many statistics can be used as both test statistics and descriptive statistics.

However a test statistic is specifically intended for use in statistical testing,

whereas the main quality of a descriptive statistic is that it is easily interpretable.


Some informative descriptive statistics, such as the sample range, do not make

good test statistics since it is difficult to determine their sampling distribution.

After deciding what level of significance to use, our next task in hypothesis

testing is to determine the appropriate probability distribution. We have a choice

between the normal distribution and the t distribution.

Q.4. Suppose you are sampling from a population with mean μ = 1,065 and standard deviation σ = 500. The sample size is n = 100. What are the expected value and the variance of the sample mean X̄?

Ans.: If the sample mean is X̄, then

$$E(\bar{X}) = \mu = 1065$$

$$\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{500^2}{100} = 2500$$
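The same figures follow from a direct calculation. The sketch also reports the standard error, which is the square root of the variance of the sample mean and is not asked for in the question.

```python
from math import sqrt

mu = 1065     # population mean
sigma = 500   # population standard deviation
n = 100       # sample size

expected_value = mu                 # E(X-bar) = mu
variance = sigma**2 / n             # Var(X-bar) = sigma^2 / n
standard_error = sqrt(variance)     # standard deviation of X-bar

print(expected_value, variance, standard_error)
```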

Q.5. The time it takes an international telephone operator to place an overseas phone call is normally distributed with mean 45 seconds and standard deviation 10 seconds.
a) What is the probability that my call will go through in less than 1 minute?
b) What is the probability that I will get through in less than 40 seconds?

Ans.: a. Let X be the time it takes the telephone operator to place an overseas phone call. The probability we are looking for is

$$P(X < 60)$$

(Since the mean is given in seconds, we have to write 1 minute as 60 seconds.)

In order to compute this probability, we need to standardize X. We know that Z = (X - 45)/10 has the standard normal distribution, so we compute

$$P(X < 60) = P\left(Z < \frac{60 - 45}{10}\right) = P(Z < 1.5) \approx 0.9332$$

b. As before, let X be the time it takes the telephone operator to place an overseas phone call. The probability we are looking for is

$$P(X < 40)$$

In order to compute this probability, we standardize X in the same way as before. We know that Z = (X - 45)/10 has the standard normal distribution, so we compute

$$P(X < 40) = P\left(Z < \frac{40 - 45}{10}\right) = P(Z < -0.5) \approx 0.3085$$
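Both probabilities can be checked with `NormalDist` from Python's standard `statistics` module, which evaluates the normal cumulative distribution function directly, so no table lookup or manual standardization is needed.

```python
from statistics import NormalDist

# Call placement time in seconds: mean 45, standard deviation 10.
call_time = NormalDist(mu=45, sigma=10)

p_under_60 = call_time.cdf(60)   # P(X < 60), i.e. less than 1 minute
p_under_40 = call_time.cdf(40)   # P(X < 40)

print(round(p_under_60, 4), round(p_under_40, 4))
```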

Q.6. The following data are the number of tons shipped weekly across the Pacific by a shipping company.

398, 412, 560, 476, 544, 690, 587, 600, 613, 457, 504, 477, 530, 641, 359, 566, 452, 633, 474, 499, 580, 606, 344, 455, 505, 396, 347, 441, 390, 632, 400, 582

Assume these data represent an entire population. Find the population mean and the population standard deviation.

Ans.: Population Mean = 504.7, Population Standard Deviation = 94.5
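These figures can be reproduced with Python's standard `statistics` module. Because the data are treated as an entire population, the sketch uses `pstdev`, which divides by N, rather than `stdev`, which divides by N - 1.

```python
from statistics import fmean, pstdev

tons = [398, 412, 560, 476, 544, 690, 587, 600, 613, 457, 504, 477, 530, 641,
        359, 566, 452, 633, 474, 499, 580, 606, 344, 455, 505, 396, 347, 441,
        390, 632, 400, 582]

population_mean = fmean(tons)   # sum of the values divided by N
population_sd = pstdev(tons)    # square root of the mean squared deviation

print(round(population_mean, 1), round(population_sd, 1))
```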
