
MNS 601 COURSE NOTES 1

MNS 601: STATISTICS FOR BUSINESS

COURSE NOTES – WEEK 1 - ONLINE

This is a summary of all the important concepts in the course. The information is

arranged in a specific sequence and divided into class sessions. There are two types of

content:

Presentation of information/concepts – you should study this information

thoroughly in addition to studying the relevant sections in the textbook (which

are identified in the Course Outline)

Examples – these sample problems illustrate the key concepts

* * * * * * * * * * * * * * * * * * Session #1 * * * * * * * * * * * * * * * * * *

INTRODUCTION

Definitions

Population - all the members of a defined group.

Sample - some of the members of a population.

Parameter - any descriptive measure of a characteristic of a population. Expressed using

Greek letters.

Statistic - any descriptive measure of a characteristic of a sample. Expressed using

Roman (Latin) letters.

Variable - a characteristic of a group.

Levels of Measurement

Nominal - grouping objects into classes. No "scale" is implied, e.g., nationality, sex, etc.

Ordinal - measures that indicate an order (i.e., the relationships can be expressed using

>). The scale does not have equal intervals between the points, e.g., the final standings in

a beauty contest, socio-economic status, grades in the military, etc.

Interval - measures that indicate an order and have equal intervals between successive

points on the scale, e.g., temperature in Fahrenheit degrees. The majority of statistical

procedures you will learn in this course apply to interval data.

Ratio - comparable to interval measures but have an absolute “0” point, e.g., Kelvin

temperature scale.

NOTE: Nominal and ordinal measures are often called categorical while interval and

ratio measures are called quantitative or scale.


DESCRIPTIVE STATISTICS

The primary purpose of descriptive statistics is to describe or summarize data so they are

easier to interpret.

Measures of Central Tendency

Mode - the most frequently occurring score, found by determining the frequency of each score. Note:

The mode is the only "average" which can be used appropriately to describe nominal

data. It is possible that a set of scores may have no mode, or several.

Example #1: To calculate the mode of the following set of scores:

13, 21, 19, 16, 14, 13, 36, 13, 19, 6, 15, 13, 4, 15, 8

You could construct this table:

SCORE    FREQUENCY
13       4
21       1
19       2
16       1
14       1
36       1
6        1
15       2
4        1
8        1

Thus, the mode of these scores is 13.
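The frequency table in Example #1 can also be built programmatically. The following is a minimal Python sketch, not part of the original notes; the variable names are chosen here for illustration. It collects every score that reaches the highest frequency, since a data set may have several modes.

```python
from collections import Counter

scores = [13, 21, 19, 16, 14, 13, 36, 13, 19, 6, 15, 13, 4, 15, 8]

# Count how often each score occurs (the frequency table).
frequency = Counter(scores)

# A data set can have several modes, so collect every score
# that reaches the highest frequency.
highest = max(frequency.values())
modes = sorted(score for score, count in frequency.items() if count == highest)

print(frequency[13])  # frequency of the score 13
print(modes)          # list of modes
```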

Median - the middle score, found by listing scores in order and counting to the middle.

For an odd number of observations, it is the middle value. For an even number of

observations, it is the average of the two middle values.

Example #2: To find the median of the same scores as in Example #1 …

Arrange them in order: 4, 6, 8, 13, 13, 13, 13, 14, 15, 15, 16, 19, 19, 21, 36

Then, find the position of the middle score using the formula:

$\frac{N + 1}{2} = \frac{15 + 1}{2} = 8$

Thus, the 8th score, which is 14, is the median.
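As a quick check of Example #2, here is a short Python sketch (an illustration added to these notes, with hypothetical variable names). It finds the middle position by hand for an odd number of scores and compares the result with the standard library's `statistics.median`, which also handles an even number of scores.

```python
import statistics

scores = [13, 21, 19, 16, 14, 13, 36, 13, 19, 6, 15, 13, 4, 15, 8]

ordered = sorted(scores)
n = len(ordered)

# Position of the middle score for an odd number of observations: (N + 1) / 2.
position = (n + 1) // 2
median_by_position = ordered[position - 1]  # Python lists are 0-indexed

# The standard library handles both odd and even counts.
median_builtin = statistics.median(scores)

print(position, median_by_position, median_builtin)
```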

Arithmetic Mean - the "average". The mean is affected by the value of every score in

the group.


Sample Mean = $\bar{x}$ (x-bar) = sum of scores divided by number of scores:

$\bar{x} = \frac{\sum x}{n}$

Where:

∑ = the sum of

x = individual scores

n = number of scores in the sample

Population Mean = µ (mu) = sum of scores divided by number of scores:

$\mu = \frac{\sum x}{N}$

Where:

∑ = the sum of

x = individual scores

N = number of scores in the population

Example #3: To calculate the arithmetic mean of the same scores as in Example #1, add the scores (i.e., calculate ∑x) = 225, then count the number of scores = 15, and divide:

$\bar{x} = \frac{225}{15} = 15$

Weighted Arithmetic Mean - the mean of a set of scores in which the scores have different weights.

$\bar{x}_w = \frac{\sum wx}{\sum w}$

Where "w" = the weight assigned to each value of x.

NOTE: The variable you are averaging is denoted by “x.”

Example #4: Suppose your test grades for a course were 85%, 75%, 69% and 83%, the first 3 tests were each worth 20% of your course grade, and the final test was worth 40%. Your course grade would be a weighted mean:

Test #    Grade (x)    Weight (w)    wx
1         85%          .20           17.0
2         75%          .20           15.0
3         69%          .20           13.8
4         83%          .40           33.2
Total =                1.00          79.0

$\bar{x}_w = \frac{79.0}{1.00} = 79\%$
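The weighted mean in Example #4 can be reproduced with a few lines of Python. This is a sketch added for illustration; the list names are assumptions, not part of the original notes.

```python
grades = [85, 75, 69, 83]           # test grades in percent (x)
weights = [0.20, 0.20, 0.20, 0.40]  # share of the course grade (w)

# Weighted mean = sum of (w * x) divided by the sum of the weights.
weighted_sum = sum(w * x for w, x in zip(weights, grades))
weighted_mean = weighted_sum / sum(weights)

print(round(weighted_mean, 1))
```

Dividing by the sum of the weights means the same code works when the weights do not add up to exactly 1.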

Measures of Dispersion

Range - the difference between the largest and smallest scores, or, the largest and

smallest scores themselves. Note: the range tells you nothing about the scores between

the largest and smallest scores.

Example #5: The range of the scores (same scores as in Example #1)

13, 21, 19, 16, 14, 13, 36, 13, 19, 6, 15, 13, 4, 15, 8

can be expressed as 32 (i.e., 36 - 4) or as 4 to 36.

Standard Deviation - a statistic that shows how much the scores are spread out around

the mean. The larger the standard deviation, the more spread out are the scores.

s = standard deviation of a sample

sigma (σ) = standard deviation of a population

(Deviation - the distance of a score from the mean for the group. For example, if

the mean of a set of scores is 10, the deviation of a value of 15 is +5, and the

deviation of a score of 6 is -4.)

Every score in a group of scores has a deviation score. Conceptually, the standard deviation is a kind of average of the deviation scores for a group: the deviations are squared, averaged, and then the square root is taken.

For a population:

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

For a sample:

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$


Example #6: A consumer group studying the cost of various food items in San Diego

grocery stores recorded the following prices for a food item in seven different stores: 79

cents, 80 cents, 67 cents, 99 cents, 76 cents, 83 cents, and 84 cents. The standard

deviation of these prices can be calculated using the formula:

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

(The sample mean is 568 / 7 ≈ 81 cents.)

x     x̄     x - x̄    (x - x̄)²
79    81    -2       4
80    81    -1       1
67    81    -14      196
99    81    18       324
76    81    -5       25
83    81    2        4
84    81    3        9
sum (∑) =            563

$s = \sqrt{\frac{563}{7 - 1}} = \sqrt{93.83} = 9.7$ cents
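The calculation in Example #6 can be checked with the short Python sketch below (added for illustration; variable names are assumptions). Note that it uses the unrounded mean (about 81.14 rather than 81), so the sum of squared deviations comes out slightly below 563, but the standard deviation still rounds to 9.7 cents.

```python
import math
import statistics

prices = [79, 80, 67, 99, 76, 83, 84]  # cents

n = len(prices)
mean = sum(prices) / n  # 568 / 7, about 81.14
squared_deviations = [(x - mean) ** 2 for x in prices]

# Sample variance divides by n - 1; the standard deviation is its square root.
sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

# statistics.stdev uses the same n - 1 formula.
assert math.isclose(sample_sd, statistics.stdev(prices))
print(round(sample_sd, 1))
```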

Variance - the arithmetic mean of the squared deviations of the individual scores about

their mean. The variance = the standard deviation squared.

s² = sample variance

sigma² (σ²) = population variance


* * * * * * * * * * * * * * * * * * Session #2 * * * * * * * * * * * * * * * * * *

Frequency Distributions and Graphs

Frequency Distribution – tabular summary of data showing the number (frequency) of

items in each of several non-overlapping classes.

Frequency distributions can be ungrouped or grouped.

For categorical data you may simply determine the frequency for each category

(ungrouped) (pp. 34–35 35-36, Table 2.2, Figure 2.1)

For quantitative/scale data, you should group/bin the scores then find the

frequency of scores falling within each group/class.

The three steps required to define the classes for a frequency distribution with

quantitative data are, to determine the:

1. Number of classes

2. Width of each class

3. Class limits

In setting the class limits you should be sure that:

1. The class intervals are of equal width. To determine the width of the class

interval use the formula below, then round up:

$\text{Class Width} = \frac{(\text{Largest data value} + \text{one unit}) - \text{Smallest data value}}{\text{Total number of classes}}$

2. The classes do not overlap (i.e., they are mutually exclusive).

3. You identify a reasonable number of classes.

For example, given the following scores, if we construct a frequency distribution with 3 class intervals, the class width could be: (16 – 1) ÷ 3 = 5. NOTE: Start the first class using the lowest number in the data set.

12 4 1 8 15 6 2 2 3 5 9 2

Classes (of X)    Frequency (f)
1 - 5             7
6 - 10            3
11 - 15           2
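The grouping steps above (class width, class limits, frequencies) can be sketched in Python as follows. This is an illustration under the assumption that the scores are whole numbers, so "one unit" is 1; the variable names are not from the original notes.

```python
import math

scores = [12, 4, 1, 8, 15, 6, 2, 2, 3, 5, 9, 2]
number_of_classes = 3
unit = 1  # assumption: the scores are whole numbers

# Class width = ((largest + one unit) - smallest) / number of classes, rounded up.
width = math.ceil((max(scores) + unit - min(scores)) / number_of_classes)

# Build non-overlapping classes starting at the lowest score.
lowest = min(scores)
classes = [(lowest + i * width, lowest + (i + 1) * width - unit)
           for i in range(number_of_classes)]

# Count the scores that fall inside each class (limits are inclusive).
frequencies = {limits: sum(limits[0] <= x <= limits[1] for x in scores)
               for limits in classes}

for (low, high), f in frequencies.items():
    print(f"{low} - {high}: {f}")
```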


Histogram - The base of each bar/column equals one class interval and the height of

each one equals the frequency. The mid-point of each bar is the mid-point of a class

interval. A histogram is one of the easiest graphs to interpret when one variable is being

presented.

NOTE: Bar charts vs. histograms: A bar chart is used when the independent (x) variable

is categorical, which means the bars are separated. A histogram is used when the

independent (x) variable is scale, which means the bars are not separated.

The histogram for the frequency table above follows:

Example #7: Draw a histogram of the following frequency distribution.

Range of Scores on a Final Exam as a %    Number of Students
(Midpoint shown on graph)
52.5 – 57.4 (55)                          1
57.5 – 62.4 (60)                          3
62.5 – 67.4 (65)                          2
67.5 – 72.4 (70)                          2
72.5 – 77.4 (75)                          5
77.5 – 82.4 (80)                          6
82.5 – 87.4 (85)                          5
87.5 – 92.4 (90)                          3
92.5 – 97.4 (95)                          2


Answer:

Frequency polygon - a "line" graph that connects the midpoints of each class interval (at

the top). They are particularly useful when more than one distribution is being

represented.

Pie Chart - a circular graph. The steps involved in constructing one are:

1. Add

2. Calculate the percentages

3. Divide the circle (360 degrees) into segments

For example, the U.S. Federal Budget Expenditures ($2,931B) in 2008 are shown in the

following pie chart (where “Count” is amounts in $B):


PROBABILITY & PROBABILITY DISTRIBUTIONS

Probability - A number between 0 and 1 that represents the likelihood that a particular event will occur. It is the ratio of successful outcomes to the total number of outcomes possible, i.e., the relative frequency of successful outcomes. [NOTE: The term “success” in this context refers to the outcome of interest, not necessarily one that is positive or good.]

$P(A) = \frac{\text{number of examples of A}}{\text{total number of possible outcomes}}$

Frequency Distribution - shows the number of values that fall within arbitrary classes or

limits.

Probability Distribution - a systematic arrangement of the probabilities corresponding

to the values of a random variable.

[Figure: two side-by-side graphs. Frequency Distribution: frequency of X (vertical axis) against values/classes of X (horizontal axis). Probability Distribution: probability of X (vertical axis) against values of X (horizontal axis).]

A frequency distribution is a distribution of actual values. A probability distribution is a

theoretical distribution: It represents what should happen according to the laws of

probability.


Example #8: Construct a frequency distribution and a histogram for the following

experiment. Toss two coins a total of 12 times. For each toss record the number of

heads that appear (i.e., 0, 1 or 2).

The probability distribution for this experiment is:

Outcome (Number of Heads on 2 Coins) Probability

0 0.25

1 0.50

2 0.25

The actual frequency distribution for your experiment will, of course, vary but as

predicted by the probabilities above, it is most likely that the result, 1 head, will occur

more frequently than either 0 or 2 heads. The more trials in the experiment, the more the

frequency distribution will approximate the theoretical probability distribution.
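The experiment in Example #8 can be simulated. The Python sketch below (an illustration, not part of the original notes) first derives the theoretical distribution by listing the four equally likely outcomes, then simulates 12 tosses and 12,000 tosses. The seed value and function name are arbitrary choices made here so that a run is repeatable.

```python
import random
from collections import Counter
from itertools import product

# Theoretical distribution: list the four equally likely outcomes of two coins.
outcomes = list(product("HT", repeat=2))
head_counts = Counter(pair.count("H") for pair in outcomes)
theoretical = {heads: count / len(outcomes) for heads, count in head_counts.items()}

def toss_two_coins(trials, rng):
    """Return the observed frequency of 0, 1 and 2 heads over `trials` tosses."""
    observed = Counter({0: 0, 1: 0, 2: 0})
    for _ in range(trials):
        # Each coin contributes 1 (head) or 0 (tail).
        observed[rng.randint(0, 1) + rng.randint(0, 1)] += 1
    return observed

rng = random.Random(601)  # fixed seed (arbitrary) so the run is repeatable
few = toss_two_coins(12, rng)
many = toss_two_coins(12_000, rng)

print(theoretical)
print({heads: few[heads] / 12 for heads in (0, 1, 2)})
print({heads: many[heads] / 12_000 for heads in (0, 1, 2)})
```

With only 12 tosses the observed proportions can differ noticeably from 0.25, 0.50 and 0.25; with many more tosses they should settle close to those values.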

Comparison of Frequency Distribution (Actual)

With Its Probability Distribution (Theoretical)

Recap of 72 Hours at the Crap Table     Shooters: 1829     Total Rolls: 14,967

Point           2     3     4      5      6      7      8      9      10     11    12
Actual Rolls    401   853   1236   1668   2008   2574   2081   1601   1253   875   417
Theory Rolls    416   832   1248   1664   2080   2496   2080   1664   1248   832   416

From: Mickelson, B 72 Hours at the Crap Table Las Vegas: GBC Press, 1978

In the above table, “Actual Rolls” indicates the actual frequency of occurrence for each

point value in 14,967 rolls of the dice. “Theory Rolls” indicates the frequency of

occurrence predicted by probability formulas. Comparing “Actual” with “Theory” you

can see that the values are very close, i.e., that the Actual results very closely

approximate the Theoretical prediction based on the statistical probability formulas.

Discrete Variable - the possible values differ by clearly defined steps (e.g., the number of automobiles sold per day). The probability distribution for such a variable has a “stair-step” appearance since only certain values of X are possible.

Continuous Variable - all values within a range are possible. The probability distribution for such a variable has the appearance of a smooth curve.


The Binomial Probability Function

It is a discrete probability distribution. It is used to determine the probability of obtaining

exactly x successes in n trials of an experiment. In order to use this formula, four

assumptions (conditions) must be met. These are found at the top of p. 208 242 in the

textbook under “Properties of a Binomial Experiment.”

The formula is found on p. 212 244 in the textbook.

$P(x \text{ out of } n) = f(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$

where: n = the number of trials in the experiment

x = the number of successes

p = the probability of a success on one trial

1 – p = the probability of failure on one trial

f(x) = the probability of x successes in n trials

NOTE: ! = factorial, which indicates you are to multiply the number by all the

preceding integers counting backwards to 1. Thus, 5! = 5 x 4 x 3 x 2 x 1 = 120

Example #12: The probability of tossing a coin 10 times and getting exactly 7 heads is:

$P(7 \text{ out of } 10) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$
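To put numbers into the binomial formula for this example, here is a minimal Python sketch. It is an addition to the notes and assumes a fair coin (p = 0.5), which the example does not state explicitly; the function name is hypothetical. `math.comb(n, x)` computes n! / (x! (n - x)!).

```python
from math import comb

def binomial_probability(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly 7 heads in 10 tosses, assuming a fair coin (p = 0.5).
p_seven_heads = binomial_probability(7, 10, 0.5)
print(round(p_seven_heads, 4))

# The probabilities for x = 0..10 form one binomial distribution.
total = sum(binomial_probability(x, 10, 0.5) for x in range(11))
```

Under that assumption the calculation is 120 × (0.5)^10 = 120 / 1024, or about 0.117.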

The binomial probability function defines a series of binomial probability distributions,

one for every possible combination of values of n and probability.

NOTE: By definition: $x^0 = 1$ and $0! = 1$

Binomial Probability Table - A table of binomial probabilities for various values of n,

x, and probability. See Appendix B, Table 5, pp. 989-997 985-993.


Areas Under Any Probability Distribution

[Figure: a probability distribution curve with a shaded region. The shaded area is equal to the probability of finding a value of X that falls within the corresponding range.]

The total area under a probability distribution = 1


COURSE NOTES – WEEK 2 - ONLINE

* * * * * * * * * * * * * * * * * * Session #3 * * * * * * * * * * * * * * * * * *

The Normal Curve – Continuous Probability Distributions

The Normal Curve – [pp. 239-245 276-278] A continuous, symmetrical probability curve that is defined by the "population mean" (µ) and the "population standard deviation" (sigma, σ). The normal curve is important because it is a fairly accurate representation of

the frequency distribution of scores on a large number of variables, i.e., lots of variables

are normally distributed.

The probability that a normally distributed random variable X will fall in a given range is

equal to the area under the normal curve for that range.

Standard Normal Distribution - has µ = 0 and population standard deviation (σ) = 1.

Use the table on the inside front cover of the textbook to find areas (i.e., probabilities)

under this distribution.

Nonstandard Normal Distribution - convert each value of x to a standard score (Z) using the formula:

[p. 245 282] $Z = \frac{x - \mu}{\sigma}$ (where σ, sigma, is the population standard deviation)
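The conversion to a standard score, and the table lookup that follows it, can be sketched in Python. This is an illustration only: the values µ = 100, σ = 15 and x = 115 are hypothetical, and `statistics.NormalDist` (Python 3.8+) stands in for the printed table of areas.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical values for illustration: mu = 100 and sigma = 15.
z = z_score(115, mu=100, sigma=15)

# Area under the standard normal curve to the left of z,
# i.e. the probability of a value below x.
area_below = NormalDist().cdf(z)

print(z, round(area_below, 4))
```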


The Central Limit Theorem

If you want to estimate the mean (µ) of a population you can make a good estimate by

examining the means of samples taken from that population. [p. 281 319-320]

The Distribution of Sample Means

If a population has a mean (µ) and you take many samples from the population, calculate

the mean for each sample, and calculate the mean of those sample means, then, that mean

will approach µ as the sample size increases.

or,

The mean of a distribution of sample means is equal to the population mean, µ .

The standard deviation of a distribution of sample means approaches

$\frac{\sigma}{\sqrt{n}}$ (the population standard deviation divided by the square root of n)

as the sample size increases. (Where n = sample size.)

The standard deviation of sample means = the standard error of the mean (SEM):

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Sampling Error - the difference between the sample value and the parameter.

Sample Mean + Sampling Error = µ

Since the standard error of the mean is a measure of how much, on the average, the

sample mean varies from the population mean, it is a measure of sampling error.

The smaller the sampling error, the more accurate the estimate of the parameter.

The size of the standard error of the mean can be reduced by increasing the sample size.
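The effect of sample size on the standard error of the mean can be seen in a small Python sketch. It is an addition to the notes; the population standard deviation of 15 and the sample sizes are hypothetical values chosen for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

sigma = 15  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))
```

Because n sits under a square root, the sample size must be quadrupled to cut the standard error in half.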

If you don't know the standard deviation of the population, you can use an alternate formula to calculate an unbiased estimate of the standard error of the mean:

$s_{\bar{x}} = \frac{s}{\sqrt{n}}$ (the sample standard deviation divided by the square root of n)


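Thus:

$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$

(where $\bar{x}$ = sample mean, $\sigma_{\bar{x}}$ = standard error of the mean, $\mu_{\bar{x}}$ = mean of the means, and σ = population std dev)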



STATISTICS FOR BUSINESS

COURSE NOTES – WEEK 3

ESTIMATES

Point Estimate - a single value from a sample used to estimate the value in the

population (i.e., a parameter).

Interval Estimate - An estimate of a parameter in terms of the probability that it will

occur within an interval.

Interval Estimates for Means - The distribution of sample means can be considered

normal, therefore, we can use the table on the back inside cover of the textbook to

determine the probability that the population mean (µ) will fall in a given interval. And,

by stating the probability that µ will fall in a specific interval, we are identifying a

confidence interval.

Alpha (α) is the area under the probability distribution that is outside (above and below)

the confidence limits, i.e., it is the total “shaded” area. It is equal to 1.00 minus the size

of the confidence interval, thus a 95% confidence interval has an alpha value of 5% (or,

0.05).

Confidence Interval    α      Z
90%                    .10    ± 1.645
95%                    .05    ± 1.96
99%                    .01    ± 2.575

If the population standard deviation (σ) is known (i.e., it has been calculated directly,

or a reliable estimate is available), use this formula:

$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

(where $\bar{x}$ = sample mean, σ = population std. dev., and $\sqrt{n}$ = square root of “n”)

If the population standard deviation (σ) is unknown (and, therefore, must be estimated

using s), use this formula:

$\bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}}$

(where $\bar{x}$ = sample mean, s = sample std dev, and $\sqrt{n}$ = square root of “n”)

The “t” distribution is actually a collection of distributions [p. 316 354], the

shape of each one depending on the “degrees of freedom (df),” which is

dependent on “n.” In confidence interval problems, df = n – 1.


Critical values for the “t” distribution are found in the t table in your

textbook [pp. 980-982 976-978].
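As an illustration of the known-σ interval, the Python sketch below computes a confidence interval from a sample mean. The sample values (mean 101.8, σ = 15, n = 100) and the function name are hypothetical, and `NormalDist().inv_cdf` is used in place of the printed Z table. For the unknown-σ case the same structure applies, with s in place of σ and a critical t value (df = n – 1) taken from the t table in place of Z.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for mu when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 101.8 from n = 100 with sigma = 15.
low, high = z_confidence_interval(101.8, sigma=15, n=100, confidence=0.95)
print(round(low, 2), round(high, 2))
```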

SAMPLING

Simple Random Sampling - The selection of items from a universe in such a way that

each item in the universe has an equal probability of being selected as each sample item

is drawn. A table of random numbers can be used to conduct random sampling.

Sample statistics can only be generalized to the population that the sample represents.

HYPOTHESIS TESTING

Purpose - To use sample data in testing assumptions about the population from which the

sample came.

Key Question: How large must the difference be between the sample mean and the

population mean to be statistically significant? When that difference is significantly

large, it can no longer be attributed to sampling error, rather, there is another reason for

the difference.

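You can use the Z formula to find the probability of obtaining a sample mean that is as far (or farther) from the hypothesized population mean.

$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$ (sample mean minus mean of the means, divided by the standard error of the mean)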



Step 1: Formulate the Null and Alternative Hypotheses

Ho = Null Hypothesis = There is no "significant difference" between the value of the

sample statistic and the value of the population parameter,

or,

Any difference that does exist is due to sampling error.

The null hypothesis is the hypothesis that includes the equal sign (=).

[Remember: Sampling error is not the same as sampling bias.]


H1 = Alternative Hypothesis = There is "significant difference" between the value of the

sample statistic and the value of the population

parameter,

or,

The difference cannot be accounted for (explained by)

sampling error alone.

Decisions are made on the basis of whether or not the null hypothesis is accepted.

The null hypothesis and the alternative hypothesis are mutually exclusive and collectively

exhaustive.

Example #13: The null hypothesis should include the condition of equality, while the

alternative hypothesis should involve one of the three signs >, <, or ≠. For each of the

following claims, identify Ho and H1 as in the following example:

"The mean I.Q. of physicians is greater than 110."

Ho: µ ≤ 110

H1: µ > 110

a. The mean age of professors is more than 30 years.

Ho: µ ≤ 30 H1: µ > 30

b. The mean I.Q. of criminals is below 100.

Ho: µ ≥ 100 H1: µ < 100

c. The mean I.Q. of college students is at least 100.

Ho: µ ≥ 100 H1: µ < 100

d. The mean monthly maintenance cost of an aircraft is $3,200.

Ho: µ = $3,200 H1: µ ≠ $3,200

e. The mean annual salary of police officers is less than $45,000.

Ho: µ ≥ $45,000 H1: µ < $45,000

Step 2: Determine the Criterion (Significance Level) for Deciding Whether to Accept or

Reject the Null Hypothesis.

alpha = α = the significance level

Since sample statistics are not always reliable measures of population parameters

they may lead us to an incorrect decision, either a Type I or Type II Error.

NOTE: For each problem you are given to solve, the significance level will be

provided.

Step 3: Select the Appropriate Probability Distribution (Normal or t) and Determine the

Critical Values.


One-Tailed Tests - The case where you are interested in whether the sample

statistic is significantly different from the population parameter in one direction

only.

Two-Tailed Tests - The case where you want to know if the sample statistic is

significantly greater than or less than the population parameter.


Step 4: Using the Sample Data, Compute the Test Statistic.

For example:

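$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$ (sample mean minus mean of the means, divided by the standard error of the mean)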



Step 5: Compare the Test Statistic With the Critical Value and Either Reject or Accept

the Null Hypothesis.

In addition to “accepting” or “rejecting” the null hypothesis, you must also write

a brief interpretation of the results (a few words, phrase or sentence) that answers

the original question stated in the problem.


HYPOTHESIS TESTS COMPARING THE SAMPLE MEAN

WITH THE POPULATION MEAN,

WHEN THE POPULATION STANDARD DEVIATION IS KNOWN

Example: It has been hypothesized that blondes have above average intelligence. You

know that according to one I.Q. test the population mean (µ) is 100 and the

population standard deviation (σ) is 15. You take a random sample of 100

blondes and administer the test. The sample mean is 101.8. Do the sample

data support the hypothesis that blondes are above average in intelligence?

(Use α = .05)


Start

1. Identify the specific claim or hypothesis to be tested and put it in symbolic form.
   Blondes have above average (100 I.Q.) intelligence: µ > 100

2. Give the symbolic form that must be true when the original claim is false.
   µ ≤ 100

3. Of the two symbolic expressions obtained so far, let the null hypothesis Ho be the one that contains the condition of equality; H1 is the other statement.
   Ho: µ ≤ 100
   H1: µ > 100

4. Select the significance level α based on the seriousness of a Type I error. Make α small if the consequences of rejecting a true Ho are severe. The values of 0.05 and 0.01 are very common.
   α = 0.05

5. What statistic is relevant to this test and what is its sampling distribution?
   The sample mean is the relevant statistic. Sample means can be approximated by a normal distribution.

6. Determine the test statistic, the critical region, and the critical value(s). (It helps to draw a picture.)
   The sample mean of 101.8 is equivalent to Zcalc = (101.8 - 100) ÷ (15 ÷ √100) = 1.20. The critical region consists of all values > Zcrit = 1.645.

7. Reject Ho if the test statistic is in the critical region. Accept Ho if the test statistic is not in the critical region.
   Accept Ho

   [Figure: normal curve centered at µ = 100 with Zcrit = 1.645 marked; for the sample mean of 101.8, Zcalc = 1.20.]

8. Restate this previous decision in simple, non-technical terms.
   There is insufficient evidence to support the claim that blondes have above average intelligence.

Stop
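The arithmetic behind this example can be sketched in Python as follows. This is an illustration added to the notes; `NormalDist` replaces the printed Z table, and the p-value line is an extra that the worked example itself does not report.

```python
import math
from statistics import NormalDist

mu_0, sigma = 100, 15        # hypothesized mean and known population std dev
sample_mean, n = 101.8, 100
alpha = 0.05

# Test statistic: distance of the sample mean from mu_0 in standard errors.
z_calc = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Right-tailed test (H1: mu > 100), so all of alpha is in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z_calc)  # area to the right of z_calc

reject_h0 = z_calc > z_crit
print(round(z_calc, 2), round(z_crit, 3), round(p_value, 4), reject_h0)
```

Both decision rules agree here: the test statistic is below the critical value, and the p-value is larger than α.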


* * * * * * * * * * * * * * * * * * Session #5 * * * * * * * * * * * * * * * * * *

HYPOTHESIS TESTS COMPARING THE SAMPLE MEAN

WITH THE POPULATION MEAN, WHEN THE POPULATION STANDARD

DEVIATION IS UNKNOWN

Use the t distribution and the formula:

$t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$ (sample mean minus mean of the means, divided by the sample standard deviation over the square root of n)


Example: A pilot training program usually takes an average of 57.2 hours (µ), but

new teaching methods were used on the last class of 25 students.

Computations reveal that for this experimental class, the completion times

had a mean (x̄) of 54.8 hours and a standard deviation (s) of 4.3 hours. At the α = 0.05 significance level, test the claim that the new teaching

techniques reduce the instruction time.

Solution: The claim that the new teaching method reduces instruction time is

equivalent to the claim that µ < 57.2 hours. We compute the test statistic as follows:

$t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}} = \frac{54.8 - 57.2}{4.3 / \sqrt{25}} = -2.791$

We find the critical t value from the table where we locate 25 - 1 or

24 degrees of freedom in the left column and α = 0.05 (one-tail) across the top. The critical t value of 1.711 is obtained, but since small values of x̄ will cause the rejection of Ho, we recognize that t = -1.711 is the actual t value that is the boundary for the critical region.

It is easy to lose sight of the underlying rationale as we go through

this hypothesis testing procedure, so let's review the essence of the test.

We set out to determine whether the sample mean of 54.8 hours is

significantly below the value of 57.2 hours. Knowing the distribution of

sample means (of which 54.8 is one) and choosing a level of significance

(5% or α = 0.05), we are able to determine the cutoff for what is a

significant difference and what is not. Any sample mean equivalent to a t

score below -1.711 represents a significant difference. The mean of 54.8


hours is significantly below 57.2 hours, so it appears as though the

new teaching method does reduce instruction time.


Start

1. There is a claim that the new teaching method requires a mean time less than 57.2 hours. That is, µ < 57.2 hours.

2. The alternative to the original claim is µ ≥ 57.2 hours.

3. Ho must contain the condition of equality, so we get:
   Ho: µ ≥ 57.2 hours
   H1: µ < 57.2 hours

4. The level of significance has been specified in the statement of the problem.
   α = 0.05

5. The sample mean should be used in testing a claim about a population mean. Since σ is unknown, we assume that such sample means follow a t distribution. It is reasonable to assume here that the population of all completion times is essentially normal.

6. The test statistic (tcalc = -2.791), critical value (tcrit = -1.711), and critical region are shown in the figure.

   [Figure: t distribution centered at µ = 57.2 (t = 0), with the critical region of area α = 0.05 to the left of t = -1.711. The sample mean x̄ = 54.8 hours (mean of 25 students) falls at tcalc = -2.791.]

7. Since the test statistic is in the critical region, we reject Ho.

8. The new teaching method does appear to reduce the training completion time.

Stop
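The test statistic for the pilot training example can be recomputed with the Python sketch below (an illustration added to the notes). The critical value is not computed; it is the table value quoted in the example, entered by hand.

```python
import math

mu_0 = 57.2                      # usual mean completion time (hours)
sample_mean, s, n = 54.8, 4.3, 25

# Test statistic: uses the sample standard deviation because sigma is unknown.
t_calc = (sample_mean - mu_0) / (s / math.sqrt(n))

# Critical value from the t table: 24 df, one tail, alpha = 0.05.
# The test is left-tailed, so the boundary is negative.
t_crit = -1.711

reject_h0 = t_calc < t_crit
print(round(t_calc, 3), reject_h0)
```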


COMPARING TWO POPULATION VARIANCES

Properties of the F Distribution:

The F distribution is defined by the ratio of two variances.

1. [p. 462 496] All values of F are non-negative (F ≥ 0)

2. Instead of being symmetric, the F distribution is skewed to the right

3. There is a different F distribution for each different pair of degrees of freedom

for numerator and denominator

One use of the F distribution is to test whether two samples are from populations

having equal variances.

$F = \frac{s_1^2}{s_2^2} = \frac{\text{variance for sample \#1 (larger variance)}}{\text{variance for sample \#2 (smaller variance)}}$

degrees of freedom = n1 – 1 and n2 – 1

(Where n1 is the size of sample #1 and n2 is the size of sample #2.)

[Example pp. 461-464 497-499]

Values for the F distribution are in the Appendix Table 4 [pp. 985-988 981-984]. Need

to know:

1. Significance level (α)

2. Degrees of freedom for both the numerator and denominator

This test can be either one-tailed or two-tailed. However, always placing the larger variance in the numerator of the F ratio produces an Fcalculated ≥ 1, thus making the right tail the relevant critical area.

NOTE: Review the information in the textbook [pp. 460-464 495–499].


ANALYSIS OF VARIANCE (ANOVA)

Purpose of ANOVA - To test for significant differences among more than two sample

means. This permits you to make inferences about whether the samples are drawn from

populations having the same mean (i.e., from the same population, or from different

populations).

Assumptions -

1. Independent variable is nominal or ordinal (with small number of categories)

2. Dependent variable is interval or ratio

3. Random sampling

4. Dependent variable is normally distributed in the populations (robust to violation

of normality assumption if n per group > 20)

5. The populations have equal variances - homogeneity of variances (robust to

homogeneity assumption if ns are similar)

[p. 511 550 Fig 13.2, p. 512 550 Fig. 13.3]

$F = \frac{\text{between groups variance}}{\text{within groups variance}} = \frac{\text{variance of the sample means}}{\text{weighted mean of the sample variances}}$

$F = \frac{\sum n_j (\bar{x}_j - \bar{\bar{x}})^2 / (k - 1)}{\sum (n_j - 1) s_j^2 / (n_T - k)} = \frac{MSTR}{MSE}$

(where $\bar{x}_j$ is the mean for each sample and $\bar{\bar{x}}$ is the grand mean)

Where:

$n_j$ = size of each sample

k = number of samples/groups

$n_T = \sum n_j$ = total size of all samples

$s_j^2$ = variance of each sample

$\bar{\bar{x}}$ (x double-bar) = grand mean = mean of all the sample values

[Example pp. 508-510, 517-519 546-548, 555-557]

NOTE: Review the information in the textbook [pp. 508-519 546–557] (From Section

13.1, up to but not including the Section, “Computer Results for Analysis of Variance”)

IMPORTANT NOTE: Two ways to interpret results.

REJECT Ho IF:

Fcalculated > Fcritical

OR,

p-value < α (alpha, significance level)

The p-value is the probability of obtaining a result [i.e., sample statistic(s)] at least as extreme as the given result, if the null hypothesis is true.
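To make the MSTR / MSE calculation concrete, here is a minimal Python sketch. The three groups of scores are hypothetical data invented for illustration; they do not come from the textbook or these notes. The resulting F would then be compared with the critical value from the F table for k - 1 and nT - k degrees of freedom.

```python
from statistics import mean, variance

# Hypothetical scores for k = 3 groups (illustration only).
groups = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = mean(x for g in groups for x in g)  # mean of all sample values

# MSTR: between-groups variance.
mstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)

# MSE: within-groups variance (weighted mean of the sample variances).
mse = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)

f_calc = mstr / mse
print(mstr, mse, f_calc)
print("df:", k - 1, "and", n_total - k)
```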



COURSE NOTES – WEEK 4 - ONLINE

* * * * * * * * * * * * * * * * * * Session #6 * * * * * * * * * * * * * * * * * *

SIMPLE LINEAR REGRESSION

Simple regression analysis allows us to predict one variable from another, where the two

variables are quantitative (interval or ratio level)

Independent Variable(s) - The one that is known and assumed to be predictive or

causal. Denoted by X.

Dependent Variable - The one you are trying to predict. That is, it varies with the

independent variable. Denoted by Y.

Scatter Diagrams - This example shows a “perfect” relationship because all the points

lie on the line.


NOTE: Review the information in the textbook [pp. 57-58 64–68]

Purpose of Regression Analysis - To develop a mathematical equation that can be used

to predict values of some dependent variable (Y) from values of an independent

variable (X). That is, to explain or account for the variation in a variable.

The line fitted to the scatter diagram may be called any one of the following:

regression line

line of average relationship

least squares regression line

best-fit line

$\hat{y} = b_0 + b_1 x$

where: b0 = y-intercept, b1 = slope

NOTE: Review the information in the textbook [pp. 562-569 600–607].

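[Figure: the regression line plotted with Y (dependent) on the vertical axis and X (independent) on the horizontal axis. The slope is b1 = ΔY / ΔX, and b0 is the value of Y when X = 0.]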


The following formulas are used to find the constants, b1 and b0.

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

$b_0 = \bar{y} - b_1 \bar{x}$

(where $\bar{x}$ = mean of x and $\bar{y}$ = mean of y)

Note: Since it is usually true that the dependent variable (y) cannot be determined

exactly from a set of specified values of the independent variables (x), the best

relationship that can be derived is an average value of y associated with a specified value

of the independent variable. And, this average value of y will have some error (e)

associated with it since it is an estimate. Thus, the more precise equation is:

Y = b0 + b1x + e
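The formulas for b1 and b0 can be applied directly in code. The Python sketch below is an illustration added to the notes; the five (x, y) pairs are hypothetical and the function name is an assumption. It also computes the residuals, which play the role of the error term e.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)

# Residuals: observed y minus predicted y (the error terms e).
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(round(b0, 2), round(b1, 2))
```

For a least squares line the residuals sum to zero (up to rounding), which is a handy check on the arithmetic.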


CHI SQUARE (χ²)

Test of Independence - to test for the independence of two categorical variables.

$\chi^2 = \sum \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$

Where: $f_{ij}$ = observed frequency

$e_{ij}$ = expected or theoretical frequency

Assumptions:

1. Both the independent and dependent variables are categorical data (i.e., measured

at a nominal/ordinal level)

2. n > 50

3. The expected frequency for each cell must be > 5

This is a way to test whether membership in one category has any bearing on

membership in another. For example:

Is a person's level of responsibility in a company related to their sex?

OR,

Is the quality of an executive's work related to whether they have an MBA?

H0 can be written as: (a) they are not related, (b) they are independent, or (c) there

is no significant difference between the groups in the independent variable.

Example: To illustrate the second case, suppose a researcher collected data on the quality

of work performed by 240 business executives using a rating scale from excellent to

poor. Then, he divided them into two groups, those with MBAs and those without. (Use a

0.10 level of significance.)

Using the χ² template:

Step 1: Enter the observed frequencies.

Observed Frequencies

             No MBA     MBA    Total
Excellent        40      60      100
Very Good        10      10       20
Good              5      15       20
Fair             10      30       40
Poor             15      45       60
Total            80     160      240

Step 2: Using the marginals, calculate the expected frequencies, and enter them in the

template.

eij = (row total x column total) / T

Where T = grand total (the sum of the row totals or, equivalently, of the column totals)


Expected Frequencies

             No MBA     MBA    Total
Excellent     33.33   66.67      100
Very Good      6.67   13.33       20
Good           6.67   13.33       20
Fair          13.33   26.67       40
Poor          20.00   40.00       60
Total            80     160      240
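The expected frequencies in Step 2 can be reproduced from the marginals with a short Python sketch. The observed counts are those of the example; the variable names are only illustrative.

```python
# Observed frequencies from the example: (No MBA, MBA) for each rating.
observed = {
    "Excellent": (40, 60),
    "Very Good": (10, 10),
    "Good": (5, 15),
    "Fair": (10, 30),
    "Poor": (15, 45),
}

# Marginals: row totals, column totals, and the grand total T.
row_totals = {label: sum(counts) for label, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
T = sum(col_totals)

# eij = (row total x column total) / T for every cell.
expected = {
    label: tuple(row_totals[label] * c / T for c in col_totals)
    for label in observed
}

for label, (no_mba, mba) in expected.items():
    print(f"{label:<10} {no_mba:6.2f} {mba:6.2f}")
```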

Step 3: Interpret the results of the χ² analysis by either (1) comparing the p-value to the

significance level (α), or (2) comparing the calculated χ² value to the table (critical)

value of χ².

Test Results

Correction      0
χ²              8.260
Rows            5
Columns         2
df              4
p(χ²)           0.083
V (or φ)        0.186

See the χ² table in Appendix B [pp. 983-984 979-980]. To use this table you need to

know α and df.

NOTE: Review the Chi-square distribution [p. 451 486]. Like the Student's t

distribution, the χ² distribution is different for different degrees of freedom.

For contingency tables: df = (r - 1) (c - 1)

where: r = # rows in the contingency table

c = # columns in the contingency table

Thus, df = (5 - 1) (2 - 1) = 4

Critical value of χ² at df = 4 and α = 0.10 is 7.779

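CONCLUSION: Because the calculated χ² (8.260) exceeds the critical value (7.779), or

equivalently because p = 0.083 is below the 0.10 significance level, reject H0. There is a

significant difference between the ratings of executives who have MBAs and those who do not.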
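For readers without the template, the whole test can be sketched in plain Python. Two assumptions should be noted. First, the p-value helper uses a closed-form expression that holds only when df is even, as it is here; a statistics library would be the usual choice in general. Second, V is taken to be Cramér's V, computed as the square root of χ² / (n x min(r - 1, c - 1)). With unrounded expected frequencies the statistic comes out at about 8.25, marginally below the 8.260 shown in the Test Results, and it leads to the same decision.

```python
import math

# Observed frequencies from the example: rows are ratings, columns are (No MBA, MBA).
observed = [[40, 60], [10, 10], [5, 15], [10, 30], [15, 45]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum over all cells of (f - e)^2 / e.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, f in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (f - e) ** 2 / e

r, c = len(observed), len(observed[0])
df = (r - 1) * (c - 1)

def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of chi-square; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi2_upper_tail_even_df(chi2, df)
cramers_v = math.sqrt(chi2 / (n * min(r - 1, c - 1)))

alpha = 0.10
critical_value = 7.779  # table value for df = 4 and alpha = 0.10, as given in the notes
reject_h0 = chi2 > critical_value

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}, V = {cramers_v:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```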


* * * * * * * * * * * * * * * * * * Session #7 * * * * * * * * * * * * * * * * * *

TIME SERIES & FORECASTING

Purpose - Time series decomposition is used to detect patterns of change in statistical

information over regular intervals of time and to project these patterns in making

predictions (i.e., forecasting).

The three kinds of change involved in time series decomposition are listed below. A time

series may contain more than one of these components.

1. Seasonal variation

2. Trend

3. Irregular variation

[pp. 786-792 807-813]

Seasonal Variation

Seasonal variation is repetitive and predictable movement around the trend line in one

year or less. To detect seasonal variation, the time intervals should be days, weeks,

months, or quarters.

We study seasonal variation to:

1. Establish a pattern of past change

2. Make projections (for short-run decisions)

3. Eliminate its effects from the time series

Example: Over a six-year period, the gold market presented an uncanny buying

opportunity in June of each year. This remarkable chart graphically illustrates that June

is buying time. The chart shows the seasonal pattern in the dollar price of gold for the years

1977 to 1982. Here's how to interpret these patterns: at a level of 104 the

price of gold is 4% above its long-term trendline, i.e., 4% above its seasonally adjusted

average. Similarly, at 97 gold is 3% below its long-term trendline.


[Chart: seasonal index of the gold price ($ U.S.) by month, J through D, with labels for the years 1977 to 1982. The vertical axis runs from 90 to 112, where 100 is the long-term trendline.]

[pp. 829-834 848-856]

Ratio-to-Moving Average Method - This method uses an index (based on 1.00) to

describe the degree of seasonal variation.
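A minimal sketch of the ratio-to-moving-average calculation for quarterly data follows. The sales figures and the function name seasonal_indices are hypothetical, and the textbook's procedure may differ in details such as how the ratios for each quarter are averaged.

```python
from statistics import mean

def seasonal_indices(y, period=4):
    """Ratio-to-moving-average seasonal indices for an even period (e.g. quarterly data)."""
    # Moving averages over one full year; each falls midway between two periods.
    ma = [mean(y[i:i + period]) for i in range(len(y) - period + 1)]
    # Center them by averaging adjacent pairs; cma[k] lines up with y[k + period // 2].
    cma = [(a + b) / 2 for a, b in zip(ma, ma[1:])]
    # Ratio of each actual value to its centered moving average, grouped by season.
    ratios = {s: [] for s in range(period)}
    for k, centered in enumerate(cma):
        t = k + period // 2
        ratios[t % period].append(y[t] / centered)
    raw = [mean(ratios[s]) for s in range(period)]
    # Rescale so that the indices average exactly 1.00.
    scale = period / sum(raw)
    return [value * scale for value in raw]

# Hypothetical quarterly sales for three years (Q1 of year 1 first).
sales = [48, 41, 60, 65, 58, 52, 68, 74, 60, 56, 75, 78]

indices = seasonal_indices(sales)
for quarter, index in enumerate(indices, start=1):
    print(f"Q{quarter}: {index:.3f}")
```

An index above 1.00 marks a quarter that typically runs above the moving average, and an index below 1.00 marks one that runs below it.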

Trend Analysis

Trends are described using the least squares method. They can be linear or curvilinear.

We study trends to:

1. Describe an historical pattern

2. Project past trends into the future

Linear Trends - Use the equation for a straight line (regression equation).

[p. 834 830] Tt = b0 + b1t

Where: Tt = linear trend forecast in period t; b0 = intercept of the linear trend

line, b1 = slope of the trend line, t = time period

b1 = [∑ (t - mean of t)(Yt - mean of Y)] / [∑ (t - mean of t)²]

b0 = mean of Y - b1 (mean of t)


The Process of Decomposing a Time Series

Following are the steps in the process:

1. Calculate the seasonal indices

2. Deseasonalize all of the original data

3. Conduct a trend analysis

4. Use the trend equation to make a forecast

5. Adjust the forecast for the seasonal effect (i.e., re-seasonalize the forecasted

value)
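Steps 2 through 5 can be sketched as follows. Both the quarterly sales and the seasonal indices are hypothetical, and the indices are simply assumed to have been obtained in step 1.

```python
from statistics import mean

# Hypothetical quarterly sales for three years (not from the course notes).
sales = [48, 41, 60, 65, 58, 52, 68, 74, 60, 56, 75, 78]

# Step 1 (assumed already done): hypothetical seasonal indices for Q1..Q4.
seasonal_index = [0.95, 0.84, 1.08, 1.13]

# Step 2: deseasonalize by dividing each value by its quarter's index.
deseasonalized = [y / seasonal_index[i % 4] for i, y in enumerate(sales)]

# Step 3: fit the linear trend Tt = b0 + b1*t to the deseasonalized data.
t = list(range(1, len(sales) + 1))
t_bar, y_bar = mean(t), mean(deseasonalized)
b1 = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, deseasonalized))
      / sum((ti - t_bar) ** 2 for ti in t))
b0 = y_bar - b1 * t_bar

# Step 4: trend forecast for the next period (t = 13, a first quarter).
next_t = len(sales) + 1
trend_forecast = b0 + b1 * next_t

# Step 5: re-seasonalize by multiplying by that quarter's index.
forecast = trend_forecast * seasonal_index[(next_t - 1) % 4]

print(f"Trend: Tt = {b0:.2f} + {b1:.2f}t")
print(f"Trend forecast for t = {next_t}: {trend_forecast:.2f}")
print(f"Seasonally adjusted forecast: {forecast:.2f}")
```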