Chapter 15 Analysis of Variance ( ANOVA ). Analysis of Variance… Analysis of variance is a...


Page 1: Chapter 15 Analysis of Variance ( ANOVA ). Analysis of Variance… Analysis of variance is a technique that allows us to compare two or more populations.

Chapter 15

Analysis of Variance (ANOVA)

Page 2

Analysis of Variance…Analysis of variance is a technique that allows us to compare two or more populations of interval data.

Analysis of variance is:

• an extremely powerful and widely used procedure.

• a procedure which determines whether differences exist between population means.

• a procedure which works by analyzing sample variance.

Page 3

One-Way Analysis of Variance…Independent samples are drawn from k populations:

Note: These populations are referred to as treatments. It is not a requirement that n1 = n2 = … = nk.

Page 4

Table 15.01 Notation for the One-Way Analysis of Variance

Page 5

Independent samples are drawn from k populations (treatments).

Treatment        1         2        …       k

                 x11       x12      …       x1k
                 x21       x22      …       x2k
                 .         .                .
                 .         .                .
                 .         .                .
                 xn1,1     xn2,2    …       xnk,k

Sample size      n1        n2       …       nk
Sample mean      x̄1        x̄2       …       x̄k

x11 is the first observation of the first sample; x22 is the second observation of the second sample.

X is the “response variable”. The variable’s values are called “responses”.

Notation

Page 6

One Way Analysis of Variance…New Terminology:

x is the response variable, and its values are responses.

xij refers to the ith observation in the jth sample.

E.g. x35 is the third observation of the fifth sample.

$\bar{x}_j$ = mean of the jth sample:

$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{ij}}{n_j}$$

nj = number of observations in the sample taken from the jth population

Page 7

One Way Analysis of Variance…

The grand mean, $\bar{\bar{x}}$, is the mean of all the observations, i.e.:

$$\bar{\bar{x}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{n}$$

(n = n1 + n2 + … + nk, and k is the number of populations)

Page 8

One Way Analysis of Variance…

More New Terminology:

Population classification criterion is called a factor.

Each population is a factor level.

Page 9

Example 15.1…An apple juice company has a new product featuring…

more convenience,

similar or better quality, and

lower price

when compared with existing juice products.

Which factor should an advertising campaign focus on?

Before going national, test markets are set up in three cities, each with its own campaign, and weekly sales data are recorded…

Do differences in sales exist between the test markets?

Page 10

Data (Xm15-01)

Week    City 1           City 2        City 3
        (Convenience)    (Quality)     (Price)
1       529.00           804.00        672.00
2       658.00           630.00        531.00
3       793.00           774.00        443.00
4       514.00           717.00        596.00
5       663.00           679.00        602.00
6       719.00           604.00        502.00
7       711.00           620.00        659.00
8       606.00           697.00        689.00
9       461.00           706.00        675.00
10      529.00           615.00        512.00
11      498.00           492.00        691.00
12      663.00           719.00        733.00
13      604.00           787.00        698.00
14      495.00           699.00        776.00
15      485.00           572.00        561.00
16      557.00           523.00        572.00
17      353.00           584.00        469.00
18      557.00           634.00        581.00
19      542.00           580.00        679.00
20      614.00           624.00        532.00
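The sample sizes, sample means, and grand mean defined earlier can be computed directly from this data. The sketch below is one way to do it in Python using only the standard library; the dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
from statistics import mean

# Weekly sales from the Xm15-01 data, one list per city (treatment).
sales = {
    "City 1 (Convenience)": [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                             498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    "City 2 (Quality)": [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                         492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    "City 3 (Price)": [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
                       691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
}

# n_j and the sample mean of each treatment.
sample_sizes = {city: len(xs) for city, xs in sales.items()}
sample_means = {city: mean(xs) for city, xs in sales.items()}

# Grand mean: the mean of all n = n1 + n2 + n3 observations pooled together.
all_observations = [x for xs in sales.values() for x in xs]
grand_mean = mean(all_observations)

for city in sales:
    print(f"{city}: n = {sample_sizes[city]}, mean = {sample_means[city]:.2f}")
print(f"grand mean = {grand_mean:.2f}")
# City 1 (Convenience): n = 20, mean = 577.55
# City 2 (Quality): n = 20, mean = 653.00
# City 3 (Price): n = 20, mean = 608.65
# grand mean = 613.07
```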

Page 11

Example 15.1…x is the response variable, and its values are responses.

weekly sales is the response variable;

the actual sales figures are the responses in this example.

xij refers to the ith observation in the jth sample.

E.g. x42 is the fourth week’s sales in city #2: 717 pkgs.

x20,3 is the last week of sales for city #3: 532 pkgs (comma added for clarity).

Terminology
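The xij notation is 1-based, while Python lists are 0-based, so a small accessor helps avoid off-by-one mistakes. This is only a sketch; the helper name `x` and the list layout are assumptions made for illustration.

```python
# Weekly sales per city, in the order City 1, City 2, City 3.
sales = [
    [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
     498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
     492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
     691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
]

def x(i, j):
    """Return x_ij: the i-th observation of the j-th sample (both 1-based)."""
    return sales[j - 1][i - 1]

print(x(4, 2))   # 717, the fourth week's sales in city 2
print(x(20, 3))  # 532, the last week's sales in city 3
```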

Page 12

Example 15.1…

The response variable is weekly sales.

Population classification criterion is called a factor.

The advertising strategy is the factor we’re interested in. This is the only factor under consideration (hence the term “one way” analysis of variance).

Each population is a factor level.

In this example, there are three factor levels: convenience, quality, and price.

Terminology

Page 13

Terminology

In the context of this problem…

Response variable – weekly sales

Responses – actual sale values

Experimental unit – weeks in the three cities when we record sales figures.

Factor – the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy.

Factor levels – the population (treatment) names. In this problem factor levels are the marketing strategies.

Page 14

Example 15.1…

The null hypothesis in this case is:

H0: μ1 = μ2 = μ3

i.e. there are no differences between population means.

Our alternative hypothesis becomes:

H1: at least two means differ

OK. Now we need some test statistics…

IDENTIFY

Page 15

Two types of variability are employed when testing for the equality of the population means.

The rationale of the test statistic

Page 16

Graphical demonstration: Employing two types of variability

Page 17

[Figure: two plots of Treatment 1, Treatment 2, and Treatment 3. In both plots the sample means are x̄1 = 10, x̄2 = 15, and x̄3 = 20; only the spread of the observations around each mean differs.]

A small variability within the samples makes it easier to draw a conclusion about the population means.

The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
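The effect shown in the figure can be reproduced numerically. The sketch below uses hypothetical values (they are not read off the figure) chosen so that both data sets have sample means 10, 15, and 20, and compares them with the F statistic that the following pages develop.

```python
from statistics import mean

def f_statistic(samples):
    """One-way ANOVA F = MST / MSE for a list of samples (lists of numbers)."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    # Between-treatments variation: weighted squared distances of means from the grand mean.
    sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-treatments variation: squared distances of observations from their own mean.
    sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (sst / (k - 1)) / (sse / (n - k))

# Hypothetical data: same sample means (10, 15, 20), different spreads.
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
spread = [[1, 10, 19], [6, 15, 24], [11, 20, 29]]

print(f_statistic(tight))   # 75.0
print(f_statistic(spread))  # about 0.93
```

With identical sample means, the tightly clustered data gives a very large F, while the widely scattered data gives an F below 1, which is why the differences between means must be judged relative to the variability within the samples.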

Page 18

The rationale behind the test statistic – I

• If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean).

• If the alternative hypothesis is true, at least some of the sample means would differ.

• Thus, we measure variability between sample means.

Page 19

• The variability between the sample means is measured as the sum of squared distances between each sample mean and the grand mean, with each distance weighted by its sample size.

This sum is called the Sum of Squares for Treatments (SST).

In our example, treatments are represented by the different advertising strategies.

Variability between sample means

Page 20

$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$

where k is the number of treatments, $n_j$ is the size of sample j, and $\bar{x}_j$ is the mean of sample j.

Sum of squares for treatments (SST)

Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST. Thus, a large SST indicates large variation between sample means, which supports H1.

Page 21

Test Statistics…Since μ1 = μ2 = μ3 is of interest to us, a statistic that measures the proximity of the sample means to each other would also be of interest.

Such a statistic exists, and is called the between-treatments variation. It is denoted SST, short for “sum of squares for treatments”. It is calculated as:

$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$

where the sum runs across the k treatments and $\bar{\bar{x}}$ is the grand mean.

A large SST indicates large variation between sample means, which supports H1.

Page 22

Example 15.1…

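Since:

$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$$

If it were the case that:

$$\bar{x}_1 = \bar{x}_2 = \bar{x}_3$$

then SST = 0 and our null hypothesis, H0: μ1 = μ2 = μ3, would be supported.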

More generally, a “small value” of SST supports the null hypothesis. The question is, how small is “small enough”?

COMPUTE

Page 23

Example 15.1…

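The following sample statistics and grand mean were computed…

$$\bar{x}_1 = 577.55, \quad \bar{x}_2 = 653.00, \quad \bar{x}_3 = 608.65, \quad \bar{\bar{x}} = 613.07$$

Hence, the between-treatments variation, sum of squares for treatments, is:

$$SST = 20(577.55 - 613.07)^2 + 20(653.00 - 613.07)^2 + 20(608.65 - 613.07)^2 = 57{,}512.23$$

Is SST = 57,512.23 “large enough” to indicate the population means differ?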

COMPUTE
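The SST value quoted on this slide can be reproduced from the Xm15-01 data. The following is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from statistics import mean

city1 = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
         498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
city2 = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
         492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
city3 = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]
samples = [city1, city2, city3]

n = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / n

# SST = sum over treatments of n_j * (sample mean j - grand mean)^2
sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
print(f"SST = {sst:,.2f}")  # SST = 57,512.23
```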

Page 24

• Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.

• Therefore, even though sample means may markedly differ from one another, SST must be judged relative to the “within samples variability”.

The rationale behind test statistic – II

Page 25

• The variability within samples is measured by adding all the squared distances between observations and their sample means.

This sum is called the Sum of Squares for Error (SSE).

In our example this is the sum of all squared differences between sales in city j and the sample mean of city j (over all three cities).

Within samples variability

Page 26

Test Statistics…SST gave us the between-treatments variation. A second statistic, SSE (Sum of Squares for Error), measures the within-treatments variation.

SSE is given by:

$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$

or:

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$

In the second formulation, it is easier to see that SSE provides a measure of the amount of variation we can expect from the random variable we’ve observed.

Page 27

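Example 15.1…We calculate the sample variances as:

$$s_1^2 = 10{,}775.00, \quad s_2^2 = 7{,}238.11, \quad s_3^2 = 8{,}670.24$$

COMPUTE

and from these, calculate the within-treatments variation (sum of squares for error) as:

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 = 19(10{,}775.00) + 19(7{,}238.11) + 19(8{,}670.24) = 506{,}983.50$$

(The total is computed from the unrounded variances.)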
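Both formulations of SSE can be checked against each other on the Xm15-01 data. The sketch below uses `statistics.variance`, which computes the sample variance with an n − 1 denominator; the variable names are illustrative.

```python
from statistics import mean, variance

city1 = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
         498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
city2 = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
         492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
city3 = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]
samples = [city1, city2, city3]

# Sample variances s_j^2 (n_j - 1 in the denominator).
sample_variances = [variance(s) for s in samples]

# Formulation 1: squared distance of every observation from its own sample mean.
sse_direct = sum((x - mean(s)) ** 2 for s in samples for x in s)

# Formulation 2: (n_j - 1) * s_j^2, summed over the treatments.
sse_from_variances = sum((len(s) - 1) * v
                         for s, v in zip(samples, sample_variances))

print([round(v, 2) for v in sample_variances])  # [10775.0, 7238.11, 8670.24]
print(f"SSE = {sse_direct:,.2f}")               # SSE = 506,983.50
```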

Page 28

Is SST = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that specifies that all the means are equal?

We still need a couple more quantities in order to relate SST and SSE together in a meaningful way…

Sum of squares for errors (SSE)

Page 29

Mean Squares…The mean square for treatments (MST) is given by:

$$MST = \frac{SST}{k - 1}$$

The mean square for errors (MSE) is given by:

$$MSE = \frac{SSE}{n - k}$$

And the test statistic:

$$F = \frac{MST}{MSE}$$

is F-distributed with k–1 and n–k degrees of freedom. In Example 15.1:

ν1 = 3 – 1 = 2 ; ν2 = 60 – 3 = 57

Page 30

Example 15.1…

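We can calculate the mean square for treatments and the mean square for error as:

$$MST = \frac{SST}{k - 1} = \frac{57{,}512.23}{2} = 28{,}756.12 \qquad MSE = \frac{SSE}{n - k} = \frac{506{,}983.50}{57} = 8{,}894.45$$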

COMPUTE

Page 31

Giving us our F-statistic of:

$$F = \frac{MST}{MSE} = \frac{28{,}756.12}{8{,}894.45} = 3.23$$

Does F = 3.23 fall into a rejection region or not?

How does it compare to a critical value of F?

Note these required conditions:

1. The populations tested are normally distributed.

2. The variances of all the populations tested are equal.

Example 15.1… COMPUTE
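The mean squares and the F statistic follow mechanically from SST, SSE, k, and n. A minimal sketch on the Xm15-01 data, with illustrative variable names:

```python
from statistics import mean

city1 = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
         498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
city2 = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
         492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
city3 = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]
samples = [city1, city2, city3]

k = len(samples)                    # number of treatments
n = sum(len(s) for s in samples)    # total number of observations
grand_mean = sum(sum(s) for s in samples) / n

sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
sse = sum((x - mean(s)) ** 2 for s in samples for x in s)

mst = sst / (k - 1)   # mean square for treatments, k - 1 degrees of freedom
mse = sse / (n - k)   # mean square for error, n - k degrees of freedom
f_stat = mst / mse

print(f"MST = {mst:,.2f}")   # MST = 28,756.12
print(f"MSE = {mse:,.2f}")   # MSE = 8,894.45
print(f"F = {f_stat:.2f}")   # F = 3.23
```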

Page 32

Example 15.1…The purpose of calculating the F-statistic is to determine whether the value of SST is large enough to reject the null hypothesis. If SST is large, F will be large. Hence our rejection region is:

$$F > F_{\alpha,\,k-1,\,n-k}$$

Our value for FCritical (at the 5% significance level) is:

$$F_{.05,\,2,\,57} \approx 3.15$$

INTERPRET

Page 33

Example 15.1…

Since F = 3.23 is greater than FCritical = 3.15, we reject the null hypothesis (H0: μ1 = μ2 = μ3) in favor of the alternative hypothesis (H1: at least two population means differ).

That is: there is enough evidence to infer that the mean weekly sales differ between the three cities.

Stated another way: we are quite confident that the strategy used to advertise the product will produce different sales figures.

INTERPRET
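The decision can also be checked without a statistics package, because the F distribution has a simple closed form when the numerator degrees of freedom equal 2, as they do here. The sketch below relies on that special case only (it is not a general F-distribution routine), and the function names are illustrative.

```python
def f_upper_tail_df1_is_2(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom.

    Valid only for 2 numerator degrees of freedom, where the upper-tail
    probability reduces to (1 + 2f/df2) ** (-df2/2).
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_is_2(alpha, df2):
    """Critical value with upper-tail area alpha, again for 2 numerator df."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

# MST / MSE, using the mean squares reported in the SPSS output for Example 15.1.
f_stat = 28756.117 / 8894.447

p_value = f_upper_tail_df1_is_2(f_stat, 57)
critical = f_critical_df1_is_2(0.05, 57)

print(f"p-value = {p_value:.3f}")          # p-value = 0.047
print(f"F critical = {critical:.2f}")      # F critical = 3.16
print("reject H0:", f_stat > critical)     # reject H0: True
```

The exact critical value for 57 denominator degrees of freedom is about 3.16; the 3.15 quoted above matches the value for 60 denominator degrees of freedom, which is likely the nearest row in the printed F table. The conclusion is the same either way, and the p-value agrees with the 0.047 reported by SPSS.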

Page 34

Page 35

Summary of Techniques (so far)…

Page 36

ANOVA Table…

The results of analysis of variance are usually reported in an ANOVA table…

Source of Variation    Degrees of Freedom    Sum of Squares    Mean Square          F-stat
Treatments             k–1                   SST               MST = SST/(k–1)      F = MST/MSE
Error                  n–k                   SSE               MSE = SSE/(n–k)
Total                  n–1                   SS(Total)

Page 37

Table 15.2 ANOVA Table for the One-Way Analysis of Variance

Page 38

Table 15.3 ANOVA Table for Example 15.1

Page 39

SPSS Output

ANOVA

Sales

                  Sum of Squares    df    Mean Square    F        Sig.
Between Groups    57512.233         2     28756.117      3.233    .047
Within Groups     506983.5          57    8894.447
Total             564495.7          59
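The rows of this table can be rebuilt from the raw data, which also makes the decomposition SS(Total) = SST + SSE visible. The following is a sketch with a hypothetical helper, `anova_table`, that returns the rows as tuples; it is not the routine SPSS uses.

```python
from statistics import mean

city1 = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
         498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
city2 = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
         492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
city3 = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

def anova_table(samples):
    """Return one-way ANOVA rows as (source, df, sum of squares, mean square, F)."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    means = [mean(s) for s in samples]
    sst = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    # Total variation: squared distance of every observation from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)
    mst, mse = sst / (k - 1), sse / (n - k)
    return [
        ("Treatments", k - 1, sst, mst, mst / mse),
        ("Error", n - k, sse, mse, None),
        ("Total", n - 1, ss_total, None, None),
    ]

rows = anova_table([city1, city2, city3])
for source, df, ss, ms, f in rows:
    ms_text = f"{ms:14.3f}" if ms is not None else " " * 14
    f_text = f"{f:8.3f}" if f is not None else ""
    print(f"{source:<12}{df:>4}{ss:14.3f}{ms_text}{f_text}")
```

The "Treatments" and "Error" rows correspond to the "Between Groups" and "Within Groups" rows in the SPSS output.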

Page 40

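Can We Use the t-Test Instead of ANOVA?

We should not, for two reasons:

1. We need to perform more calculations. If we have six populations, there are C(6, 2) = (6 × 5) / 2 = 15 pairs of means, so we would have to run 15 t-tests.

2. It increases the probability of making a Type I error. With 15 tests, each at the 5% significance level, the probability of at least one Type I error rises from 5% to about 54% (1 − 0.95^15 ≈ 0.54, treating the tests as independent).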
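The two numbers on this slide can be reproduced in a few lines. The sketch assumes the 15 tests are independent, which is the simplification behind the 54% figure.

```python
from math import comb

populations = 6
alpha = 0.05  # significance level of each individual t-test

# Number of distinct pairs of means to compare.
pairs = comb(populations, 2)

# Probability of at least one Type I error across all pairwise tests,
# assuming the tests are independent.
overall_type_i = 1 - (1 - alpha) ** pairs

print(pairs)                      # 15
print(f"{overall_type_i:.2f}")    # 0.54
```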

Page 41

Relationship Between t and F Statistics

F = t²

When only two populations are compared, the F statistic equals the square of the equal-variances t statistic (any small discrepancy is due to rounding).

Hence we will draw exactly the same conclusion using analysis of variance as we did when we applied the t-test of μ1 – μ2.
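The relationship is easy to confirm numerically. The sketch below uses two small hypothetical samples (not data from the chapter) and compares the pooled-variance t statistic with the one-way ANOVA F statistic for k = 2.

```python
from math import isclose, sqrt
from statistics import mean, variance

# Hypothetical samples, used only to illustrate F = t^2 when k = 2.
a = [12, 15, 11, 18, 14]
b = [20, 17, 22, 19, 25, 21]
n1, n2 = len(a), len(b)

# Equal-variances (pooled) t statistic for the difference of two means.
pooled_variance = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
t_stat = (mean(a) - mean(b)) / sqrt(pooled_variance * (1 / n1 + 1 / n2))

# One-way ANOVA F statistic for the same two samples (k = 2).
grand_mean = (sum(a) + sum(b)) / (n1 + n2)
sst = n1 * (mean(a) - grand_mean) ** 2 + n2 * (mean(b) - grand_mean) ** 2
sse = (n1 - 1) * variance(a) + (n2 - 1) * variance(b)
f_stat = (sst / (2 - 1)) / (sse / (n1 + n2 - 2))

print(isclose(f_stat, t_stat ** 2))  # True
```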

Page 42

Identifying Factors…Factors that Identify the One-Way Analysis of Variance:

Page 43

Analysis of Variance Experimental Designs

Experimental design is one of the factors that determine which technique we use.

In the previous example we compared three populations on the basis of one factor – advertising strategy.

One-way analysis of variance is only one of many different experimental designs of the analysis of variance.

Page 44

Analysis of Variance Experimental Designs

A multifactor experiment is one where there are two or more factors that define the treatments.

For example, if instead of varying only the advertising strategy for our new apple juice product we also vary the advertising medium (e.g. television or newspaper), then we have a two-factor analysis of variance.

The first factor, advertising strategy, still has three levels (convenience, quality, and price) while the second factor, advertising medium, has two levels (TV or print).
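In a two-factor design, each treatment is one combination of a level of the first factor with a level of the second. A short sketch that enumerates the combinations for this example (the variable names are illustrative):

```python
from itertools import product

# Factor levels as described in the example.
advertising_strategy = ["convenience", "quality", "price"]
advertising_medium = ["TV", "print"]

# Each treatment is one (strategy, medium) combination.
treatments = list(product(advertising_strategy, advertising_medium))

for strategy, medium in treatments:
    print(f"{strategy} / {medium}")
print(len(treatments))  # 6, i.e. 3 levels x 2 levels
```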

Page 45

[Figure: two diagrams. One-way ANOVA (single factor): Response plotted for Treatment 1 (level 3), Treatment 2 (level 2), and Treatment 3 (level 1). Two-way ANOVA (two factors): Response plotted against Factor A and Factor B, with axis labels Level 1, Level 2, and Level 3.]