40 SAT-2
Transcript of 40 SAT-2
Name :
Roll No :
Learning Centre :
Subject : STATISTICS MANAGEMENT
Assignment No : TWO
Date of Submission
at the learning centre:
Q1. What are the characteristics of a good measure of central tendency?
Ans. In statistics, the term central tendency relates to the way in which quantitative data
tend to cluster around some value. A measure of central tendency is any of a number of
ways of specifying this "central value". In practical statistical analyses, the terms are often
used before one has chosen even a preliminary form of analysis: thus an initial objective
might be to "choose an appropriate measure of central tendency".
In the simplest cases, the measure of central tendency is an average of a set of
measurements, the word average being variously construed as mean, median, or other
measure of location, depending on the context. However, the term is applied to
multidimensional data as well as to univariate data and in situations where a transformation
of the data values for some or all dimensions would usually be considered necessary: in the
latter cases, the notion of a "central location" is retained in converting an "average"
computed for the transformed data back to the original units. In addition, there are several
different kinds of calculations for central tendency, where the kind of calculation depends on
the type of data (level of measurement). Both "central tendency" and "measure of central
tendency" apply either to statistical populations or to samples from a population.
Three measures of central tendency are: mean, median, and mode.
The mean for a distribution is the sum of the scores divided by the number of scores.
Sample Mean = Sum of the Scores / Number of Scores:  M = Σx / n
Population Mean = Sum of the Scores / Number of Scores:  μ = ΣX / N
Some characteristics of the mean include:
• Every score influences the mean.
• Changing a score changes the mean.
• Adding or subtracting a score changes the mean (unless the score equals the mean).
• If a constant value is added to every score, the same constant will be added to the
mean. If a constant value is subtracted from every score, the same constant will be
subtracted from the mean.
• If every score is multiplied or divided by a constant, the mean will change in the
same way.
• It is inappropriate to use the mean to summarize nominal and ordinal data; it is
appropriate to use the mean to summarize interval and ratio data.
• If the distribution is skewed or has some outliers, the mean will be distorted.
Median: If the scores in a distribution are listed in order, the median is the midpoint of the
list. Half of the scores are below the median; half of the scores are above the median.
1. Place the data in descending order. (Ascending would have worked too.)
2. Find the score that cuts the sample into two halves.
Characteristics of the Median include:
1. It is inappropriate to use the median to summarize nominal data; it is appropriate
to use the median to summarize ordinal, interval, and ratio data.
2. The median depends on the frequency of the scores, not on the actual values.
3. The median is not distorted by outliers or extreme scores.
4. The median is the preferred measure of central tendency when the distribution is
skewed or distorted by outliers.
Mode: In a frequency distribution, the mode is the score or category that has the greatest
frequency.
Characteristics of the Mode include:
• The mode may be used to summarize nominal, ordinal, interval, and ratio data.
• There may be more than one mode.
• The mode may not exist.
Relationships among the Mean, Median, and Mode
• The mean and median are equal if the distribution is symmetric.
• The mean, median, and mode are equal if the distribution is unimodal and
symmetric.
• Otherwise, they do not give you the same answer.
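The three measures above can be sketched with Python's standard statistics module (the scores are hypothetical sample data, not from the assignment):

```python
# A minimal sketch of the three measures of central tendency.
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical sample data

mean = statistics.mean(scores)      # sum of the scores / number of scores
median = statistics.median(scores)  # midpoint of the ordered list
mode = statistics.mode(scores)      # most frequent score

# Adding an outlier distorts the mean but barely moves the median,
# echoing the "skewed or has some outliers" point above.
mean_out = statistics.mean(scores + [100])
median_out = statistics.median(scores + [100])

print(mean, median, mode)
print(mean_out, median_out)
```

With these numbers the mean, median and mode are 5, 4 and 3; adding the outlier 100 pulls the mean above 18 while the median only moves to 5.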
Q 2. Your company has launched a new product. Your company is a reputed
company with 50% market share of similar range of products. Your competitors also
enter with their new products equivalent to your new product. Based on your earlier
experience, you initially estimated that, your market share of the new product would
be 50%. You carry out random sampling of 25 customers who have purchased the
new product and realize that only eight of them have actually purchased your product.
Plan a hypothesis test to check whether you are likely to have a half of market share.
Ans: The company has launched a new product and, based on earlier experience, initially
estimated that its market share would be 50%. A hypothesis which specifies the population
distribution completely is a simple hypothesis; here the null hypothesis is H0: p = 0.5, the
claim of a half market share, and statistical hypothesis testing plays a fundamental role in
checking such a claim.
The usual line of reasoning is as follows:
1. We start with a research hypothesis of which the truth is unknown.
2. The first step is to state the relevant null and alternative hypotheses. This is
important as mis-stating the hypotheses will muddy the rest of the process.
Specifically, the null hypothesis should be chosen in such a way that it allows us to
conclude whether the alternative hypothesis can either be accepted or stays undecided
as it was before the test.
3. The second step is to consider the statistical assumptions being made about the
sample in doing the test; for example, assumptions about the statistical
independence or about the form of the distributions of the observations. This is
equally important as invalid assumptions will mean that the results of the test are
invalid.
4. Decide which test is appropriate, and state the relevant test statistic T.
5. Derive the distribution of the test statistic under the null hypothesis from the
assumptions. In standard cases this will be a well-known result. For example, the test
statistic may follow a Student's t distribution or a normal distribution.
6. The distribution of the test statistic partitions the possible values of T into those for
which the null hypothesis is rejected, the so-called critical region, and those for which
it is not.
7. Compute from the observations the observed value t_obs of the test statistic T.
8. Decide to either fail to reject the null hypothesis or reject it in favor of the alternative.
The decision rule is to reject the null hypothesis H0 if the observed value t_obs is in the
critical region, and to accept or "fail to reject" the hypothesis otherwise.
It is important to note the philosophical difference between accepting the null hypothesis
and simply failing to reject it. The "fail to reject" terminology highlights the fact that the null
hypothesis is assumed to be true from the start of the test; if there is a lack of evidence
against it, it simply continues to be assumed true. The phrase "accept the null hypothesis"
may suggest it has been proved simply because it has not been disproved, a logical fallacy
known as the argument from ignorance. Unless a test with particularly high power is used,
the idea of "accepting" the null hypothesis may be dangerous. Nonetheless the terminology
is prevalent throughout statistics, where its meaning is well understood. Alternatively, if the
testing procedure forces us to reject the null hypothesis (H0), we can accept the
alternative hypothesis (H1) and conclude that the research hypothesis is supported by
the data. This expresses the fact that our procedure is based on probabilistic
considerations, in the sense that we accept that using another data set could lead us to
a different conclusion.
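For the problem at hand, the plan above can be sketched as a two-tailed test of the proportion at the 5% level, using the normal approximation to the binomial (reasonable here since n·p0 = 12.5 is at least 5):

```python
# Sketch of the test: H0: p = 0.5 vs H1: p != 0.5, two-tailed, alpha = 0.05.
import math

n, x, p0 = 25, 8, 0.5           # sample size, purchasers of our product, hypothesized share
p_hat = x / n                   # observed proportion = 0.32

se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
z = (p_hat - p0) / se               # test statistic

# At the 5% level the critical region is |z| > 1.96.
reject_h0 = abs(z) > 1.96

print(round(z, 2), reject_h0)   # -1.8 False
```

Since |z| = 1.8 < 1.96, we fail to reject H0 at the 5% level: the sample of 25 does not give sufficient evidence that the market share differs from one half.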
3. The upper and the lower quartile incomes of a group of workers are Rs 8 and Rs 3
per day respectively. Calculate the quartile deviation and its coefficient.
Ans. Quartile Deviation: It is based on the lower quartile Q1 and the upper quartile Q3.
The difference Q3 − Q1 is called the inter-quartile range; this difference divided by 2 is
called the semi-inter-quartile range or the quartile deviation. Thus
Quartile Deviation (Q.D) = (Q3 − Q1) / 2
In this question Q1 = 3 and Q3 = 8, so
Q.D = (8 − 3) / 2 = 2.5
Here the quartile deviation is Rs 2.5 per day.
Coefficient of Quartile Deviation: A relative measure of dispersion based on the quartile
deviation is called the coefficient of quartile deviation. It is defined as
Coefficient of Quartile Deviation = (Q3 − Q1) / (Q3 + Q1)
Here Q1 = 3 and Q3 = 8, so
Coefficient of Quartile Deviation = (8 − 3) / (8 + 3) = 5/11 ≈ 0.455
Being a ratio of like quantities, the coefficient is a pure number: it is 0.455, with no units.
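The arithmetic can be checked with a few lines:

```python
# Quartile deviation and its coefficient for the given quartiles.
q1, q3 = 3, 8   # lower and upper quartile incomes (Rs per day)

qd = (q3 - q1) / 2             # semi-inter-quartile range = 2.5
coeff = (q3 - q1) / (q3 + q1)  # relative measure = 5/11

print(qd, round(coeff, 3))  # 2.5 0.455
```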
4. The cost of living index number on a certain date was 200. From the base
period, the percentage increases in prices were: Rent 60, Clothing 250, Fuel
and Light 150 and Miscellaneous 120. The weights for the different groups were:
Food 60, Rent 16, Clothing 12, Fuel and Light 8 and Miscellaneous 4. Calculate
the percentage increase in food prices.
Ans. Let x be the percentage increase in food prices, and arrange the data in tabular form.
Each price relative P is 100 plus the percentage increase:

ITEM             % INCREASE   P (= 100 + increase)   W (Wt)   WP
FOOD             x            100 + x                60       60(100 + x)
RENT             60           160                    16       2560
CLOTHING         250          350                    12       4200
FUEL AND LIGHT   150          250                    8        2000
MISCELLANEOUS    120          220                    4        880

∑W = 100,  ∑WP = 60(100 + x) + 9640

The cost of living index is P01 = ∑WP / ∑W = 200, so
60(100 + x) + 9640 = 200 × 100 = 20000
60(100 + x) = 10360
100 + x = 172.67, hence x = 72.67
Hence food prices rose by about 72.67 per cent from the base period.
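The index arithmetic can be cross-checked in a short sketch (dictionary keys are just labels for the groups given in the question):

```python
# Cross-check: overall index = sum(W * P) / sum(W), with P = 100 + % increase.
weights = {"Food": 60, "Rent": 16, "Clothing": 12, "Fuel and Light": 8, "Misc": 4}
increase = {"Rent": 60, "Clothing": 250, "Fuel and Light": 150, "Misc": 120}

index = 200
known = sum(weights[k] * (100 + increase[k]) for k in increase)  # WP for all but food
total_w = sum(weights.values())

# Solve 60 * (100 + x) + known = index * total_w for the food increase x.
x = (index * total_w - known) / weights["Food"] - 100

print(round(x, 2))  # 72.67
```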
5. Education seems to be a difficult field in which to use quality techniques. One
possible outcome measures for colleges is the graduation rate (the percentage of the
students matriculating who graduate on time). Would you recommend using P or R
charts to examine graduation rates at a school? Would this be a good measure of
Quality?
Ans. In statistical quality control, the p-chart is a type of control chart used to monitor the
proportion of nonconforming units in a sample, where the sample proportion nonconforming
is defined as the ratio of the number of nonconforming units to the sample size, n. The p-
chart only accommodates "pass"/"fail"-type inspection as determined by one or more go-
no go gauges or tests, effectively applying the specifications to the data before they're
plotted on the chart. Other types of control charts display the magnitude of the quality
characteristic under study, making troubleshooting possible directly from those charts.
Some practitioners have pointed out that the p-chart is sensitive to the underlying
assumptions, using control limits derived from the binomial distribution rather than from the
observed sample variance. Due to this sensitivity to the underlying assumptions, p-charts
are often implemented incorrectly, with control limits that are either too wide or too narrow,
leading to incorrect decisions regarding process stability. A p-chart is a form of the
Individuals chart (also referred to as "XmR" or "ImR"), and these practitioners recommend
the individuals chart as a more robust alternative for count-based data.
R Chart: Range charts are used when you can rationally collect measurements in
groups (subgroups) of between two and ten observations. Each subgroup represents a
"snapshot" of the process at a given point in time. The charts' x-axes are time based, so
that the charts show a history of the process. For this reason, you must have data that is
time-ordered; that is, entered in the sequence from which it was generated. If this is not the
case, then trends or shifts in the process may not be detected, but instead attributed to
random (common cause) variation.
For subgroup sizes greater than ten, use X-bar / Sigma charts, since the range statistic is a
poor estimator of process sigma for large subgroups. In fact, the subgroup sigma is
ALWAYS a better estimate of subgroup variation than subgroup range. The popularity of the
Range chart is only due to its ease of calculation, dating to its use before the advent of
computers. For subgroup sizes equal to one, an Individual-X / Moving Range chart can be
used, as well as EWMA or CuSum charts.
X-bar Charts are efficient at detecting relatively large shifts in the process average, typically
shifts of +-1.5 sigma or larger. The larger the subgroup, the more sensitive the chart will be
to shifts, providing a Rational Subgroup can be formed.
Hence, a p-chart, not an R chart, is the right tool for graduation rates: each student
either graduates on time or does not, so the data are pass/fail proportions rather than
subgrouped measurements (though, given the caveats above, an individuals chart is a
robust alternative). Graduation rate is only a partial measure of quality, since it can be
raised by lowering standards as well as by genuine improvement.
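For illustration, p-chart control limits are conventionally set at p̄ ± 3·√(p̄(1 − p̄)/n). A sketch with hypothetical graduation figures (a 70% average on-time rate and cohorts of 100 students; these numbers are not from the question):

```python
# p-chart control limits from the binomial standard error.
import math

p_bar, n = 0.70, 100   # hypothetical average graduation rate, cohort size

sigma = math.sqrt(p_bar * (1 - p_bar) / n)  # standard error of the proportion
ucl = p_bar + 3 * sigma   # upper control limit
lcl = p_bar - 3 * sigma   # lower control limit

print(round(lcl, 3), round(ucl, 3))  # 0.563 0.837
```

A year's graduation rate outside (0.563, 0.837) would then signal a special cause rather than common-cause variation.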
6. (a) Why do we use a chi-square test?
Ans. Chi-Square test is a non-parametric test. It is used to test the independence of
attributes, goodness of fit and specified variance. The Chi-Square test does not require any
assumptions regarding the shape of the population distribution from which the sample was
drawn. Chi-Square test assumes that samples are drawn at random and external forces, if
any, act on them in equal magnitude. Chi-Square distribution is a family of distributions. For
every degree of freedom, there will be one chi-square distribution. An important criterion for
applying the Chi-Square test is that the sample size should be very large. None of the
theoretical expected values calculated should be less than five. The important applications
of Chi-Square test are the tests for independence of attributes, the test of goodness of fit
and the test for specified variance.
The chi-square (χ²) test measures the alignment between two sets of frequency measures.
These must be categorical counts and not percentages or ratio measures (for these, use
another correlation test). Note that the frequency numbers should be significant and be at
least above 5 (although an occasional lower figure may be possible, as long as they are not
a part of a pattern of low figures).
Goodness of fit: A common use is to assess whether a measured/observed set of
measures follows an expected pattern. The expected frequency may be determined from
prior knowledge (such as a previous year's exam results) or by calculation of an average
from the given data. The null hypothesis, H0 is that the two sets of measures are not
significantly different.
Independence: The chi-square test can be used in the reverse manner to goodness of fit. If
the two sets of measures are compared, then just as you can show they align, you can also
determine if they do not align. The null hypothesis here is that the two sets of measures are
similar.
The main difference in goodness-of-fit vs. independence assessments is in the use
of the chi-square table. For goodness of fit, attention is on the 0.05, 0.01 or 0.001 figures.
For independence, it is on the 0.95 or 0.99 figures (this is why the table has two ends to it).
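The goodness-of-fit statistic is simply χ² = Σ(O − E)²/E over the categories. A sketch with hypothetical counts (say, exam results in four grade bands against a uniform expectation):

```python
# Chi-square goodness-of-fit statistic, computed directly from its definition.
observed = [8, 12, 10, 10]   # hypothetical observed counts per category
expected = [10, 10, 10, 10]  # expected counts under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 4 categories there are 3 degrees of freedom; the 0.05 critical
# value is 7.815, so chi2 = 0.8 gives no reason to reject H0.
print(chi2)  # 0.8
```

Note that every expected count here is at least 5, satisfying the sample-size criterion mentioned above.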
(b) Why do we use analysis of variance?
Ans. Let's start with the basic concept of a variance. It is simply the difference between
what you expected and what you really received. If you expected something to cost $1 and
it, in fact, cost $1.25, then you have a variance of $0.25 more than expected. This, of
course, means that you spent $0.25 more than what you planned. When you are calculating
your variances, take materiality into consideration. If you have a variance of $0.25, that isn't
a big deal if the quantity produced is very small. However, as the production run increases,
then that variance can add up quickly. Most projects generate tons of variances every day.
To avoid a tidal wave of numbers that are inconsequential, instead focus on the large
variances. For example, it is far more important to find out why there is a $10,000 cost
variance than to spend two days determining why an expense report was $75 over budget.
We want to do variance analysis in order to learn. One of the easiest and most objective
ways to see that things need to change is to watch the financials and ask questions. Don't
get me wrong: You cannot and should not base important decisions solely on financial data.
You must use the data as a basis to understand areas for further analysis. For example, if a
bandsaw is a bottleneck, then go to the department and ask why. The reasons for the
variance may range from the normal operator being out sick, to a worn blade, to there not
being enough crewing and a great deal of overtime being incurred. Use the numbers to
highlight areas to investigate, but do not make decisions without first investigating further.
Point-in-time variances, meaning singular occurrences, can help some. To make real
gains, look at trends over time. If our earlier variance of $0.25 is judged as a one-time
event, is that good or bad? We cannot tell with just one value, so let's look at the trend over
time. If we see that the negative variance over time was $0.01, $0.05, $0.10, $0.12 and
$0.25, then we can see that there apparently is a steady trend of increasing costs and, if
large enough to be material, should be investigated. Yes, this can take a lot of time if done
manually. However, spreadsheets and computer systems can be used to generate real-time
variance reports that are incredibly useful with little to no work to actually run the report.
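The trend-watching idea can be sketched in a few lines (the dollar figures are the document's own example; the materiality threshold is a hypothetical choice):

```python
# Flag material variances and check whether the trend is steadily worsening.
trend = [0.01, 0.05, 0.10, 0.12, 0.25]   # negative variances over time ($)
materiality = 0.20                        # hypothetical threshold

material = [v for v in trend if v > materiality]          # items worth investigating
worsening = all(a < b for a, b in zip(trend, trend[1:]))  # strictly increasing?

print(material, worsening)  # [0.25] True
```

Only the latest variance exceeds the threshold on its own, but the strictly increasing trend is the real signal to investigate further.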
Variance analysis and cost accounting in general are very interesting fields with a
great deal of specialized knowledge. By using variance analysis to identify areas of
concern, management has another tool to monitor project and organizational health. People
reviewing the variances should focus on the important exceptions so management can
become aware of changes in the organization, the environment and so on. Without this
information, management risks blindly proceeding down a path that cannot be judged as
good or bad.