40 SAT-2
Transcript of 40 SAT-2
Name :
Roll No :
Learning Centre :
Subject : STATISTICS MANAGEMENT
Assignment No : TWO
Date of Submission
at the learning centre:
Q1. What are the characteristics of a good measure of central tendency?
Ans. In statistics, the term central tendency relates to the way in which quantitative data
tend to cluster around some value. A measure of central tendency is any of a number of
ways of specifying this "central value". In practical statistical analyses, the terms are often
used before one has chosen even a preliminary form of analysis: thus an initial objective
might be to "choose an appropriate measure of central tendency".
In the simplest cases, the measure of central tendency is an average of a set of
measurements, the word average being variously construed as mean, median, or other
measure of location, depending on the context. However, the term is applied to
multidimensional data as well as to univariate data and in situations where a transformation
of the data values for some or all dimensions would usually be considered necessary: in the
latter cases, the notion of a "central location" is retained in converting an "average"
computed for the transformed data back to the original units. In addition, there are several
different kinds of calculations for central tendency, where the kind of calculation depends on
the type of data (level of measurement). Both "central tendency" and "measure of central
tendency" apply either to statistical populations or to samples from a population.
Three measures of central tendency are: mean, median, and mode.
The mean for a distribution is the sum of the scores divided by the number of scores.
Sample Mean = Sum of the Scores / Number of Scores:  M = Σx / n
Population Mean = Sum of the Scores / Number of Scores:  μ = ΣX / N
Some characteristics of the mean include:
• Every score influences the mean.
• Changing a score changes the mean.
• Adding or subtracting a score changes the mean (unless the score equals the mean).
• If a constant value is added to every score, the same constant will be added to the
mean. If a constant value is subtracted from every score, the same constant will be
subtracted from the mean.
• If every score is multiplied or divided by a constant, the mean will change in the
same way.
• It is inappropriate to use the mean to summarize nominal and ordinal data; it is
appropriate to use the mean to summarize interval and ratio data.
• If the distribution is skewed or has some outliers, the mean will be distorted.
Median: If the scores in a distribution are listed in order, the median is the midpoint of the
list. Half of the scores are below the median; half of the scores are above the median.
1. Place the data in descending order. (Ascending would have worked too.)
2. Find the score that cuts the sample into two halves.
Characteristics of the Median include:
1. It is inappropriate to use the median to summarize nominal data; it is appropriate
to use the median to summarize ordinal, interval, and ratio data.
2. The median depends on the frequency of the scores, not on the actual values.
3. The median is not distorted by outliers or extreme scores.
4. The median is the preferred measure of central tendency when the distribution is
skewed or distorted by outliers.
Mode: In a frequency distribution, the mode is the score or category that has the greatest
frequency.
Characteristics of the Mode include:
• The mode may be used to summarize nominal, ordinal, interval, and ratio data.
• There may be more than one mode.
• The mode may not exist.
Relationships among the Mean, Median, and Mode
• The mean and median are equal if the distribution is symmetric.
• The mean, median, and mode are equal if the distribution is unimodal and
symmetric.
• Otherwise, they do not give you the same answer.
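The three measures above can be sketched with Python's standard statistics module (the scores are hypothetical sample data, not from the assignment):

```python
# A minimal sketch of the three measures of central tendency.
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical sample data

mean = statistics.mean(scores)      # sum of the scores / number of scores
median = statistics.median(scores)  # midpoint of the ordered list
mode = statistics.mode(scores)      # most frequent score

# Adding an outlier distorts the mean but barely moves the median,
# echoing the "skewed or has some outliers" point above.
mean_out = statistics.mean(scores + [100])
median_out = statistics.median(scores + [100])

print(mean, median, mode)
print(mean_out, median_out)
```

With these numbers the mean, median and mode are 5, 4 and 3; adding the outlier 100 pulls the mean above 18 while the median only moves to 5.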
Q 2. Your company has launched a new product. Your company is a reputed
company with 50% market share of similar range of products. Your competitors also
enter with their new products equivalent to your new product. Based on your earlier
experience, you initially estimated that, your market share of the new product would
be 50%. You carry out random sampling of 25 customers who have purchased the
new product and realize that only eight of them have actually purchased your product.
Plan a hypothesis test to check whether you are likely to have a half of market share.
Ans: The company has launched a new product and, based on earlier experience, initially
estimated that its market share would be 50%. A hypothesis which specifies the population
distribution completely is a simple hypothesis; here the null hypothesis is H0: p = 0.5, the
claim of a half market share, and statistical hypothesis testing plays a fundamental role in
checking such a claim.
The usual line of reasoning is as follows:
1. We start with a research hypothesis of which the truth is unknown.
2. The first step is to state the relevant null and alternative hypotheses. This is
important as mis-stating the hypotheses will muddy the rest of the process.
Specifically, the null hypothesis should be chosen in such a way that it allows us to
conclude whether the alternative hypothesis can either be accepted or stays undecided
as it was before the test.
3. The second step is to consider the statistical assumptions being made about the
sample in doing the test; for example, assumptions about the statistical
independence or about the form of the distributions of the observations. This is
equally important as invalid assumptions will mean that the results of the test are
invalid.
4. Decide which test is appropriate, and state the relevant test statistic T.
5. Derive the distribution of the test statistic under the null hypothesis from the
assumptions. In standard cases this will be a well-known result. For example, the test
statistic may follow a Student's t distribution or a normal distribution.
6. The distribution of the test statistic partitions the possible values of T into those for
which the null hypothesis is rejected, the so-called critical region, and those for which
it is not.
7. Compute from the observations the observed value t_obs of the test statistic T.
8. Decide to either fail to reject the null hypothesis or reject it in favor of the alternative.
The decision rule is to reject the null hypothesis H0 if the observed value t_obs is in the
critical region, and to accept or "fail to reject" the hypothesis otherwise.
It is important to note the philosophical difference between accepting the null hypothesis
and simply failing to reject it. The "fail to reject" terminology highlights the fact that the null
hypothesis is assumed to be true from the start of the test; if there is a lack of evidence
against it, it simply continues to be assumed true. The phrase "accept the null hypothesis"
may suggest it has been proved simply because it has not been disproved, a logical fallacy
known as the argument from ignorance. Unless a test with particularly high power is used,
the idea of "accepting" the null hypothesis may be dangerous. Nonetheless the terminology
is prevalent throughout statistics, where its meaning is well understood. Alternatively, if the
testing procedure forces us to reject the null hypothesis (H0), we can accept the
alternative hypothesis (H1) and conclude that the research hypothesis is supported by
the data. This expresses the fact that our procedure is based on probabilistic
considerations, in the sense that we accept that using another data set could lead us to
a different conclusion.
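For the problem at hand, the plan above can be sketched as a two-tailed test of the proportion at the 5% level, using the normal approximation to the binomial (reasonable here since n·p0 = 12.5 is at least 5):

```python
# Sketch of the test: H0: p = 0.5 vs H1: p != 0.5, two-tailed, alpha = 0.05.
import math

n, x, p0 = 25, 8, 0.5           # sample size, purchasers of our product, hypothesized share
p_hat = x / n                   # observed proportion = 0.32

se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
z = (p_hat - p0) / se               # test statistic

# At the 5% level the critical region is |z| > 1.96.
reject_h0 = abs(z) > 1.96

print(round(z, 2), reject_h0)   # -1.8 False
```

Since |z| = 1.8 < 1.96, we fail to reject H0 at the 5% level: the sample of 25 does not give sufficient evidence that the market share differs from one half.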
3. The upper and the lower quartile incomes of a group of workers are Rs 8 and Rs 3
per day respectively. Calculate the quartile deviation and its coefficient.
Ans. Quartile Deviation: It is based on the lower quartile Q1 and the upper quartile Q3.
The difference Q3 − Q1 is called the inter-quartile range; this difference divided by 2 is
called the semi-inter-quartile range or the quartile deviation. Thus
Quartile Deviation (Q.D) = (Q3 − Q1) / 2
In this question Q1 = 3 and Q3 = 8, so
Q.D = (8 − 3) / 2 = 2.5
Here the quartile deviation is Rs 2.5 per day.
Coefficient of Quartile Deviation: A relative measure of dispersion based on the quartile
deviation is called the coefficient of quartile deviation. It is defined as
Coefficient of Quartile Deviation = (Q3 − Q1) / (Q3 + Q1)
Here Q1 = 3 and Q3 = 8, so
Coefficient of Quartile Deviation = (8 − 3) / (8 + 3) = 5/11 ≈ 0.455
Being a ratio of like quantities, the coefficient is a pure number: it is 0.455, with no units.
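The arithmetic can be checked with a few lines:

```python
# Quartile deviation and its coefficient for the given quartiles.
q1, q3 = 3, 8   # lower and upper quartile incomes (Rs per day)

qd = (q3 - q1) / 2             # semi-inter-quartile range = 2.5
coeff = (q3 - q1) / (q3 + q1)  # relative measure = 5/11

print(qd, round(coeff, 3))  # 2.5 0.455
```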
4. The cost of living index number on a certain date was 200. From the base
period, the percentage increases in prices were: Rent 60, Clothing 250, Fuel
and Light 150 and Miscellaneous 120. The weights for the different groups were:
Food 60, Rent 16, Clothing 12, Fuel and Light 8 and Miscellaneous 4. Calculate
the percentage increase in food prices.
Ans. Let x be the percentage increase in food prices, and arrange the data in tabular form.
Each price relative P is 100 plus the percentage increase:

ITEM             % INCREASE   P (= 100 + increase)   W (Wt)   WP
FOOD             x            100 + x                60       60(100 + x)
RENT             60           160                    16       2560
CLOTHING         250          350                    12       4200
FUEL AND LIGHT   150          250                    8        2000
MISCELLANEOUS    120          220                    4        880

∑W = 100,  ∑WP = 60(100 + x) + 9640

The cost of living index is P01 = ∑WP / ∑W = 200, so
60(100 + x) + 9640 = 200 × 100 = 20000
60(100 + x) = 10360
100 + x = 172.67, hence x = 72.67
Hence food prices rose by about 72.67 per cent from the base period.
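The index arithmetic can be cross-checked in a short sketch (dictionary keys are just labels for the groups given in the question):

```python
# Cross-check: overall index = sum(W * P) / sum(W), with P = 100 + % increase.
weights = {"Food": 60, "Rent": 16, "Clothing": 12, "Fuel and Light": 8, "Misc": 4}
increase = {"Rent": 60, "Clothing": 250, "Fuel and Light": 150, "Misc": 120}

index = 200
known = sum(weights[k] * (100 + increase[k]) for k in increase)  # WP for all but food
total_w = sum(weights.values())

# Solve 60 * (100 + x) + known = index * total_w for the food increase x.
x = (index * total_w - known) / weights["Food"] - 100

print(round(x, 2))  # 72.67
```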
5. Education seems to be a difficult field in which to use quality techniques. One
possible outcome measures for colleges is the graduation rate (the percentage of the
students matriculating who graduate on time). Would you recommend using P or R
charts to examine graduation rates at a school? Would this be a good measure of
Quality?
Ans. In statistical quality control, the p-chart is a type of control chart used to monitor the
proportion of nonconforming units in a sample, where the sample proportion nonconforming
is defined as the ratio of the number of nonconforming units to the sample size, n. The p-
chart only accommodates "pass"/"fail"-type inspection as determined by one or more go-
no go gauges or tests, effectively applying the specifications to the data before they're
plotted on the chart. Other types of control charts display the magnitude of the quality
characteristic under study, making troubleshooting possible directly from those charts.
Some practitioners have pointed out that the p-chart is sensitive to the underlying
assumptions, using control limits derived from the binomial distribution rather than from the
observed sample variance. Due to this sensitivity to the underlying assumptions, p-charts
are often implemented incorrectly, with control limits that are either too wide or too narrow,
leading to incorrect decisions regarding process stability. A p-chart is a form of the
Individuals chart (also referred to as "XmR" or "ImR"), and these practitioners recommend
the individuals chart as a more robust alternative for count-based data.
R Chart: Range charts are used when you can rationally collect measurements in
groups (subgroups) of between two and ten observations. Each subgroup represents a
"snapshot" of the process at a given point in time. The charts' x-axes are time based, so
that the charts show a history of the process. For this reason, you must have data that is
time-ordered; that is, entered in the sequence from which it was generated. If this is not the
case, then trends or shifts in the process may not be detected, but instead attributed to
random (common cause) variation.
For subgroup sizes greater than ten, use X-bar / Sigma charts, since the range statistic is a
poor estimator of process sigma for large subgroups. In fact, the subgroup sigma is
ALWAYS a better estimate of subgroup variation than subgroup range. The popularity of the
Range chart is only due to its ease of calculation, dating to its use before the advent of
computers. For subgroup sizes equal to one, an Individual-X / Moving Range chart can be
used, as well as EWMA or CuSum charts.
X-bar Charts are efficient at detecting relatively large shifts in the process average, typically
shifts of +-1.5 sigma or larger. The larger the subgroup, the more sensitive the chart will be
to shifts, providing a Rational Subgroup can be formed.
Hence, a p-chart, not an R chart, is the right tool for graduation rates: each student
either graduates on time or does not, so the data are pass/fail proportions rather than
subgrouped measurements (though, given the caveats above, an individuals chart is a
robust alternative). Graduation rate is only a partial measure of quality, since it can be
raised by lowering standards as well as by genuine improvement.
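For illustration, p-chart control limits are conventionally set at p̄ ± 3·√(p̄(1 − p̄)/n). A sketch with hypothetical graduation figures (a 70% average on-time rate and cohorts of 100 students; these numbers are not from the question):

```python
# p-chart control limits from the binomial standard error.
import math

p_bar, n = 0.70, 100   # hypothetical average graduation rate, cohort size

sigma = math.sqrt(p_bar * (1 - p_bar) / n)  # standard error of the proportion
ucl = p_bar + 3 * sigma   # upper control limit
lcl = p_bar - 3 * sigma   # lower control limit

print(round(lcl, 3), round(ucl, 3))  # 0.563 0.837
```

A year's graduation rate outside (0.563, 0.837) would then signal a special cause rather than common-cause variation.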
6. (a) Why do we use a chi-square test?
Ans. Chi-Square test is a non-parametric test. It is used to test the independence of
attributes, goodness of fit and specified variance. The Chi-Square test does not require any
assumptions regarding the shape of the population distribution from which the sample was
drawn. Chi-Square test assumes that samples are drawn at random and external forces, if
any, act on them in equal magnitude. Chi-Square distribution is a family of distributions. For
every degree of freedom, there will be one chi-square distribution. An important criterion for
applying the Chi-Square test is that the sample size should be very large. None of the
theoretical expected values calculated should be less than five. The important applications
of Chi-Square test are the tests for independence of attributes, the test of goodness of fit
and the test for specified variance.
The chi-square (χ²) test measures the alignment between two sets of frequency measures.
These must be categorical counts and not percentages or ratio measures (for these, use
another correlation test). Note that the frequency numbers should be significant and be at
least above 5 (although an occasional lower figure may be possible, as long as they are not
a part of a pattern of low figures).
Goodness of fit: A common use is to assess whether a measured/observed set of
measures follows an expected pattern. The expected frequency may be determined from
prior knowledge (such as a previous year's exam results) or by calculation of an average
from the given data. The null hypothesis, H0 is that the two sets of measures are not
significantly different.
Independence: The chi-square test can be used in the reverse manner to goodness of fit. If
the two sets of measures are compared, then just as you can show they align, you can also
determine if they do not align. The null hypothesis here is that the two sets of measures are
similar.
The main difference in goodness-of-fit vs. independence assessments is in the use
of the chi-square table. For goodness of fit, attention is on the 0.05, 0.01 or 0.001 figures.
For independence, it is on the 0.95 or 0.99 figures (this is why the table has two ends to it).
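The goodness-of-fit statistic is simply χ² = Σ(O − E)²/E over the categories. A sketch with hypothetical counts (say, exam results in four grade bands against a uniform expectation):

```python
# Chi-square goodness-of-fit statistic, computed directly from its definition.
observed = [8, 12, 10, 10]   # hypothetical observed counts per category
expected = [10, 10, 10, 10]  # expected counts under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 4 categories there are 3 degrees of freedom; the 0.05 critical
# value is 7.815, so chi2 = 0.8 gives no reason to reject H0.
print(chi2)  # 0.8
```

Note that every expected count here is at least 5, satisfying the sample-size criterion mentioned above.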
(b) Why do we use analysis of variance?
Ans. Let's start with the basic concept of a variance. It is simply the difference between
what you expected and what you really received. If you expected something to cost $1 and
it, in fact, cost $1.25, then you have a variance of $0.25 more than expected. This, of
course, means that you spent $0.25 more than what you planned. When you are calculating
your variances, take materiality into consideration. If you have a variance of $0.25, that isn't
a big deal if the quantity produced is very small. However, as the production run increases,
then that variance can add up quickly. Most projects generate tons of variances every day.
To avoid a tidal wave of numbers that are inconsequential, instead focus on the large
variances. For example, it is far more important to find out why there is a $10,000 cost
variance than to spend two days determining why an expense report was $75 over budget.
We want to do variance analysis in order to learn. One of the easiest and most objective
ways to see that things need to change is to watch the financials and ask questions. Don't
get me wrong: You cannot and should not base important decisions solely on financial data.
You must use the data as a basis to understand areas for further analysis. For example, if a
bandsaw is a bottleneck, then go to the department and ask why. The reasons for the
variance may range from the normal operator being out sick, to a worn blade, to there not
being enough crewing and a great deal of overtime being incurred. Use the numbers to
highlight areas to investigate, but do not make decisions without first investigating further.
Point-in-time variances, meaning singular occurrences, can help some. To make real
gains, look at trends over time. If our earlier variance of $0.25 is judged as a one-time
event, is that good or bad? We cannot tell with just one value, so let's look at the trend over
time. If we see that the negative variance over time was $0.01, $0.05, $0.10, $0.12 and
$0.25, then we can see that there apparently is a steady trend of increasing costs and, if
large enough to be material, should be investigated. Yes, this can take a lot of time if done
manually. However, spreadsheets and computer systems can be used to generate real-time
variance reports that are incredibly useful with little to no work to actually run the report.
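The trend-watching idea can be sketched in a few lines (the dollar figures are the document's own example; the materiality threshold is a hypothetical choice):

```python
# Flag material variances and check whether the trend is steadily worsening.
trend = [0.01, 0.05, 0.10, 0.12, 0.25]   # negative variances over time ($)
materiality = 0.20                        # hypothetical threshold

material = [v for v in trend if v > materiality]          # items worth investigating
worsening = all(a < b for a, b in zip(trend, trend[1:]))  # strictly increasing?

print(material, worsening)  # [0.25] True
```

Only the latest variance exceeds the threshold on its own, but the strictly increasing trend is the real signal to investigate further.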
Variance analysis and cost accounting in general are very interesting fields with a
great deal of specialized knowledge. By using variance analysis to identify areas of
concern, management has another tool to monitor project and organizational health. People
reviewing the variances should focus on the important exceptions so management can
become aware of changes in the organization, the environment and so on. Without this
information, management risks blindly proceeding down a path that cannot be judged as
good or bad.