Statistics


Page 1: Statistics

Sultan Kudarat State University, Access Campus, EJC Montilla, Tacurong City

In Partial Fulfillment of the Requirements

in

Analytical Statistics (Educ. 600)

Cashmira Balabagan-Ibrahim
MAT-English

October 2010


McNemar's Test Of Change

McNemar's Test of Change is a two-sample dependent test for proportions. The test involves two evaluations

of a single set of items, where each item falls into one of two classifications on each evaluation. If we use a

Pass/Fail classification scheme, McNemar's Test evaluates the differences between the number that Passed

on the first evaluation and Failed on the second, versus the number that Passed on the second evaluation and

Failed on the first.

The frequencies corresponding to the two classifications and two evaluations may be placed in a 2 x 2 table as seen below.

                         Second evaluation
                         Pass        Fail
First        Pass         a           b
evaluation   Fail         c           d

The cells of interest in this table are the b and c cells. In these cells, b and c, differences were

found in the classifications from the first evaluation to the second. McNemar's Test evaluates the change in the

number of misclassifications in one direction versus the number of misclassifications in the other. The

comparison is then the number of observations falling in the b cell versus the number falling in the c cell.

Hypotheses

The following hypotheses may be tested:

H0: p1 = p2
Ha: p1 ≠ p2 (or a one-sided alternative)

where p1 is the population proportion that would Pass on the first evaluation and Fail on the second, and p2 is the population proportion that would Pass on the second evaluation and Fail on the first.

Assumptions

1. The samples have been randomly drawn from two dependent populations either through matching or

repeated measures (Critical)

2. Each item evaluation yields one of two classifications (Critical)

3. Each observation is independent of every other observation, other than the given paired dependency

(Critical)

Test Statistics


McNemar's Test of Change may be reduced to a One-Sample Binomial Test, with the following:

Let p = c/(b+c), and p0 = 0.5

The exact or approximate One-Sample Binomial Test is then performed using these values.
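This reduction is easy to carry out directly. The sketch below uses hypothetical counts (b = 5 items went Pass-then-Fail, c = 15 went Fail-then-Pass, not taken from the text) and computes the exact two-sided binomial p-value with p0 = 0.5:

```python
from math import comb

def mcnemar_exact_p(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value via the one-sample
    binomial test on the b and c cells with p0 = 0.5."""
    n = b + c
    k = min(b, c)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: 5 items Pass->Fail, 15 items Fail->Pass
print(round(mcnemar_exact_p(5, 15), 4))   # → 0.0414
```

With these counts the change is significant at alpha = 0.05; with b = c the p-value is 1, as expected under the null.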

Output

Note

The p-value is flagged with an asterisk (*) when p <= alpha.

Kappa

Kappa is a measure of agreement. While McNemar may reject the null hypothesis, the level of agreement may

also be of interest.

The following statistics are output.

Proportion Agreement = 0.520

Proportion Chance Agreement = 0.392

Kappa (Max) = 0.211

Kappa = 0.211

Here are the methods of calculation:

N=A+B+C+D [Total Sample Size]

Po=(A+D)/N [Proportion Agreement]

Pc=((A+B)*(A+C)+(C+D)*(B+D))/N/N [Proportion Chance Agreement]

Pom=(Minimum(A+C,A+B)+Minimum(B+D,C+D))/N

Kappa (Max)=(Pom-Pc)/(1-Pc) [Maximum value of Kappa, given marginal values]

Kappa=(Po-Pc)/(1-Pc)
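The calculations can be traced with a short script. The 2 x 2 counts below are hypothetical, chosen only to exercise the formulas (they do not reproduce the output above):

```python
def kappa_stats(a, b, c, d):
    """Agreement statistics for a 2 x 2 table with cells A, B, C, D."""
    n = a + b + c + d                                     # total sample size
    po = (a + d) / n                                      # proportion agreement
    pc = ((a + b) * (a + c) + (c + d) * (b + d)) / n / n  # proportion chance agreement
    pom = (min(a + c, a + b) + min(b + d, c + d)) / n     # best Po the margins allow
    return po, pc, (pom - pc) / (1 - pc), (po - pc) / (1 - pc)

po, pc, kappa_max, kappa = kappa_stats(30, 20, 10, 40)
print(round(po, 3), round(pc, 3), round(kappa_max, 3), round(kappa, 3))
# → 0.7 0.5 0.8 0.4
```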

Mann–Whitney U


In statistics, the Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test, or Wilcoxon–Mann–Whitney test) is a non-parametric test for assessing whether two independent

samples of observations have equally large values. It is one of the best-known non-parametric significance

tests. It was proposed initially by the Irish-born US statistician Frank Wilcoxon in 1945, for equal sample sizes,

and extended to arbitrary sample sizes and in other ways by the Austrian-born US mathematician Henry

Berthold Mann and the US statistician Donald Ransom Whitney. MWW is virtually identical to performing an

ordinary parametric two-sample t test on the data after ranking over the combined samples.

Assumptions and formal statement of hypotheses

Although Mann and Whitney (1947) developed the MWW test under the assumption of continuous responses

with the alternative hypothesis being that one distribution is stochastically greater than the other, there are

many other ways to formulate the null and alternative hypotheses such that the MWW test will give a valid test.[1]

A very general formulation is to assume that:

1. All the observations from both groups are independent of each other,

2. The responses are ordinal or continuous measurements (i.e. one can at least say, of any two

observations, which is the greater),

3. Under the null hypothesis the distributions of both groups are equal, so that the probability of an

observation from one population (X) exceeding an observation from the second population (Y) equals

the probability of an observation from Y exceeding an observation from X, that is, there is a symmetry

between populations with respect to probability of random drawing of a larger observation.

4. Under the alternative hypothesis the probability of an observation from one population (X) exceeding an

observation from the second population (Y) (after correcting for ties) is not equal to 0.5. The alternative

may also be stated in terms of a one-sided test, for example: P(X > Y) + 0.5 P(X = Y)  > 0.5.

If we add more strict assumptions than those above such that the responses are assumed continuous and the

alternative is a location shift (i.e. F1(x) = F2(x + δ)), then we can interpret a significant MWW test as showing a

significant difference in medians. Under this location shift assumption, we can also interpret the MWW as

assessing whether the Hodges–Lehmann estimate of the difference in central tendency between the two

populations differs significantly from zero. The Hodges–Lehmann estimate for this two-sample problem is the

median of all possible differences between an observation in the first sample and an observation in the second

sample.
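The Hodges–Lehmann estimate is simple to compute; a minimal sketch with hypothetical samples:

```python
from statistics import median

def hodges_lehmann(xs, ys):
    """Median of all pairwise differences x - y between the two samples."""
    return median(x - y for x in xs for y in ys)

# Hypothetical samples, for illustration only
print(hodges_lehmann([1, 5, 7], [2, 3]))   # → 2.5
```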

Calculations

The test involves the calculation of a statistic, usually called U, whose distribution under the null hypothesis is

known. In the case of small samples, the distribution is tabulated, but for sample sizes above ~20 there is a


good approximation using the normal distribution. Some books tabulate statistics equivalent to U, such as the

sum of ranks in one of the samples, rather than U itself.

The U test is included in most modern statistical packages. It is also easily calculated by hand, especially for

small samples. There are two ways of doing this.

For small samples a direct method is recommended. It is very quick, and gives an insight into the meaning of

the U statistic.

1. Choose the sample for which the ranks seem to be smaller (The only reason to do this is to make

computation easier). Call this "sample 1," and call the other sample "sample 2."

2. Taking each observation in sample 1, count the number of observations in sample 2 that are smaller

than it (count a half for any that are equal to it).

3. The total of these counts is U.

For larger samples, a formula can be used:

1. Arrange all the observations into a single ranked series. That is, rank all the observations without

regard to which sample they are in.

2. Add up the ranks for the observations which came from sample 1. The sum of ranks in sample 2 follows

by calculation, since the sum of all the ranks equals N(N + 1)/2 where N is the total number of

observations.

3. U is then given by:

U1 = R1 - n1*(n1 + 1)/2

where n1 is the sample size for sample 1, and R1 is the sum of the ranks in sample 1.

Note that there is no specification as to which sample is considered sample 1. An equally valid formula for U is

U2 = R2 - n2*(n2 + 1)/2

The smaller value of U1 and U2 is the one used when consulting significance tables. The sum of the two values is given by

U1 + U2 = R1 - n1*(n1 + 1)/2 + R2 - n2*(n2 + 1)/2

Knowing that R1 + R2 = N(N + 1)/2 and N = n1 + n2, and doing some algebra, we find that the sum is

U1 + U2 = n1*n2


The maximum value of U is the product of the sample sizes for the two samples. In such a case, the "other" U

would be 0. The Mann–Whitney U is equivalent to the area under the receiver operating characteristic curve

that can be readily calculated.

Examples

Illustration of calculation methods

Suppose that Aesop is dissatisfied with his classic experiment in which one tortoise was found to beat one

hare in a race, and decides to carry out a significance test to discover whether the results could be extended to

tortoises and hares in general. He collects a sample of 6 tortoises and 6 hares, and makes them all run his

race. The order in which they reach the finishing post (their rank order, from first to last) is as follows, writing T

for a tortoise and H for a hare:

T H H H H H T T T T T H

What is the value of U?

Using the direct method, we take each tortoise in turn, and count the number of hares it is beaten by

(lower rank), getting 0, 5, 5, 5, 5, 5, which means U = 25. Alternatively, we could take each hare in turn,

and count the number of tortoises it is beaten by. In this case, we get 1, 1, 1, 1, 1, 6. So U = 6 + 1 + 1 +

1 + 1 + 1 = 11. Note that the sum of these two values for U is 36, which is 6 × 6.

Using the indirect method:

the sum of the ranks achieved by the tortoises is 1 + 7 + 8 + 9 + 10 + 11 = 46.

Therefore U = 46 − (6×7)/2 = 46 − 21 = 25.

the sum of the ranks achieved by the hares is 2 + 3 + 4 + 5 + 6 + 12 = 32, leading to U = 32 − 21 = 11.
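Both methods can be checked mechanically. The sketch below replays the race above and reproduces U = 25 and U = 11:

```python
def direct_u(order: str, kind: str) -> int:
    """Direct method: for each finisher of `kind`, count how many of the
    other kind finished ahead of it (ties would count a half; none here)."""
    u = ahead = 0
    for c in order:
        if c == kind:
            u += ahead        # rivals that beat this finisher
        else:
            ahead += 1
    return u

race = "THHHHHTTTTTH"
u_t, u_h = direct_u(race, "T"), direct_u(race, "H")
print(u_t, u_h, u_t + u_h)                 # → 25 11 36  (36 = 6 x 6)

# Indirect method: U1 = R1 - n1*(n1 + 1)/2, using the tortoises' ranks
r1 = sum(i + 1 for i, c in enumerate(race) if c == "T")
print(r1, r1 - 6 * 7 // 2)                 # → 46 25
```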

Illustration of object of test

A second example illustrates the point that the Mann–Whitney does not test for equality of medians. Consider

another hare and tortoise race, with 19 participants of each species, in which the outcomes are as follows:

H H H H H H H H H T T T T T T T T T T H H H H H H H H H H T T T T T T T T T

The median tortoise here comes in at position 19, and thus actually beats the median hare, which comes in at

position 20.


However, the value of U (for hares) is 100:

(9 hares beaten by 0 tortoises) + (10 hares beaten by 10 tortoises) = 0 + 100 = 100

The value of U (for tortoises) is 261:

(10 tortoises beaten by 9 hares) + (9 tortoises beaten by 19 hares) = 90 + 171 = 261

Consulting tables, or using the approximation below, shows that this U value gives significant evidence that

hares tend to do better than tortoises (p < 0.05, two-tailed). Obviously this is an extreme distribution that would

be spotted easily, but in a larger sample something similar could happen without it being so apparent. Notice

that the problem here is not that the two distributions of ranks have different variances; they are mirror images

of each other, so their variances are the same, but they have very different skewness.

Normal approximation

For large samples, U is approximately normally distributed. In that case, the standardized value

z = (U - mU) / σU

where mU and σU are the mean and standard deviation of U, is approximately a standard normal deviate whose significance can be checked in tables of the normal distribution. mU and σU are given by

mU = n1*n2/2    and    σU = SQRT( n1*n2*(n1 + n2 + 1)/12 )

The formula for the standard deviation is more complicated in the presence of tied ranks; the full formula is

given in the text books referenced below. However, if the number of ties is small (and especially if there are no

large tie bands) ties can be ignored when doing calculations by hand. Computer statistical packages will use the correctly adjusted formula as a matter of routine.

Note that since U1 + U2 = n1 n2, the mean n1 n2/2 used in the normal approximation is the mean of the two

values of U. Therefore, the absolute value of the z statistic calculated will be same whichever value of U is

used.
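Applying these formulas to the second race above (U = 100 for the hares, n1 = n2 = 19, no ties) reproduces the significance claimed there:

```python
from math import sqrt

def u_to_z(u: float, n1: int, n2: int) -> float:
    """Standardize U using the no-ties mean and standard deviation."""
    m_u = n1 * n2 / 2                          # mU
    s_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # sigmaU, no tie correction
    return (u - m_u) / s_u

print(round(u_to_z(100, 19, 19), 2))   # → -2.35  (|z| > 1.96, so p < 0.05)
```

Using the other U value (261) gives z = +2.35, the same absolute value, as noted above.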

Relation to other tests

Comparison to Student's t-test


The U test is useful in the same situations as the independent samples Student's t-test, and the question

arises of which should be preferred.

Ordinal data

U remains the logical choice when the data are ordinal but not interval scaled, so that the spacing

between adjacent values cannot be assumed to be constant.

Robustness

As it compares the sums of ranks,[2] the Mann–Whitney test is less likely than the t-test to spuriously indicate significance because of the presence of outliers; i.e., Mann–Whitney is more robust.

Efficiency

When normality holds, MWW has an (asymptotic) efficiency of 3/π, or about 0.95, when compared to the t test.[3] For distributions sufficiently far from normal and for sufficiently large sample sizes, the MWW can be considerably more efficient than the t test.[4]

Overall, the robustness makes the MWW more widely applicable than the t test, and for large samples from the

normal distribution, the efficiency loss compared to the t test is only 5%, so one can recommend MWW as the

default test for comparing interval or ordinal measurements with similar distributions.

The relation between efficiency and power in concrete situations isn't trivial though. For small sample sizes one

should investigate the power of the MWW vs t.

Different distributions

If one is only interested in stochastic ordering of the two populations (i.e., the concordance probability

P(Y > X)), the Wilcoxon–Mann–Whitney test can be used even if the shapes of the distributions are different.

The concordance probability is exactly equal to the area under the receiver operating characteristic curve

(AUC) that is often used in this context. If one desires a simple shift interpretation, the U test should

not be used when the distributions of the two samples are very different, as it can give erroneously significant

results.

Alternatives

In that situation, the unequal variances version of the t test is likely to give more reliable results, but only if

normality holds.

Alternatively, some authors (e.g. Conover) suggest transforming the data to ranks (if they are not already

ranks) and then performing the t test on the transformed data, the version of the t test used depending on

whether or not the population variances are suspected to be different. Rank transformations do not preserve

variances so it is difficult to see how this would help.


The Brown–Forsythe test has been suggested as an appropriate non-parametric equivalent to the F test for

equal variances.

Kendall's τ

The U test is related to a number of other non-parametric statistical procedures. For example, it is equivalent to

Kendall's τ correlation coefficient if one of the variables is binary (that is, it can only take two values).

ρ statistic

A statistic called ρ that is linearly related to U and widely used in studies of categorization (discrimination

learning involving concepts) is calculated by dividing U by its maximum value for the given sample sizes, which

is simply n1 × n2. ρ is thus a non-parametric measure of the overlap between two distributions; it can take

values between 0 and 1, and it is an estimate of P(Y > X) + 0.5 P(Y = X), where X and Y are randomly chosen

observations from the two distributions. Both extreme values represent complete separation of the

distributions, while a ρ of 0.5 represents complete overlap. This statistic was first proposed by Richard

Herrnstein (see Herrnstein et al., 1976). The usefulness of the ρ statistic can be seen in the case of the odd

example used above, where two distributions that were significantly different on a U-test nonetheless had

nearly identical medians: the ρ value in this case is approximately 0.723 in favour of the hares, correctly

reflecting the fact that even though the median tortoise beat the median hare, the hares collectively did better

than the tortoises collectively.
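For the race example, ρ follows directly from U in a one-line check:

```python
# The hares' U was 100, so the hares won 19*19 - 100 = 261 of the
# 361 hare-tortoise pairs; rho is that count divided by n1*n2
rho = (19 * 19 - 100) / (19 * 19)
print(round(rho, 3))   # → 0.723
```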

Example statement of results

In reporting the results of a Mann–Whitney test, it is important to state:

A measure of the central tendencies of the two groups (means or medians; since the Mann–Whitney is

an ordinal test, medians are usually recommended)

The value of U

The sample sizes

The significance level.

In practice some of this information may already have been supplied and common sense should be used in

deciding whether to repeat it. A typical report might run,

"Median latencies in groups E and C were 153 and 247 ms; the distributions in the two groups differed

significantly (Mann–Whitney U = 10.5, n1 = n2 = 8, P < 0.05 two-tailed)."

A statement that does full justice to the statistical status of the test might run,


"Outcomes of the two treatments were compared using the Wilcoxon–Mann–Whitney two-sample rank-

sum test. The treatment effect (difference between treatments) was quantified using the Hodges–

Lehmann (HL) estimator, which is consistent with the Wilcoxon test (ref. 5 below). This estimator (HLΔ)

is the median of all possible differences in outcomes between a subject in group B and a subject in

group A. A non-parametric 0.95 confidence interval for HLΔ accompanies these estimates as does ρ,

an estimate of the probability that a randomly chosen subject from population B has a higher weight

than a randomly chosen subject from population A. The median [quartiles] weight for subjects on

treatment A and B respectively are 147 [121, 177] and 151 [130, 180] Kg. Treatment A decreased

weight by HLΔ = 5 Kg. (0.95 CL [2, 9] Kg., 2P = 0.02, ρ = 0.58)."

However it would be rare to find so extended a report in a document whose major topic was not statistical

inference.

CHI-SQUARE INDEPENDENCE TEST

If we have N observations with two variables where each observation can be classified into one of R mutually

exclusive categories for variable one and one of C mutually exclusive categories for variable two, then a cross-

tabulation of the data results in a two-way contingency table (also referred to as an RxC contingency table).

The resulting contingency table has R rows and C columns.

A common question with regards to a two-way contingency table is whether we have independence. By

independence, we mean that the row and column variables are unassociated (i.e., knowing the value of the

row variable will not help us predict the value of column variable and likewise knowing the value of the column

variable will not help us predict the value of the row variable).

A more technical definition for independence is that

P(row i, column j) = P(row i)*P(column j)       for all i,j

One such test is the chi-square test for independence.

H0: The two-way table is independent

Ha: The two-way table is not independent

Test Statistic:

T = SUM(i=1 to r) SUM(j=1 to c) (Oij - Eij)^2 / Eij

where

r = the number of rows in the contingency table


c = the number of columns in the contingency table

Oij = the observed frequency of the ith row and jth column

Eij = the expected frequency of the ith row and jth column

     = Ri*Cj/N

Ri = the sum of the observed frequencies for row i

Cj = the sum of the observed frequencies for column j

N = the total sample size

Significance Level: alpha

Critical Region: T > CHSPPF(alpha,(r-1)*(c-1))

where CHSPPF is the percent point function of the chi-square distribution and

(r-1)*(c-1) is the degrees of freedom

Conclusion: Reject the independence hypothesis if the value of the test statistic is greater

than the chi-square value.

This test statistic can also be formulated as

T = SUM(i=1 to r) SUM(j=1 to c) dij^2

where

dij = (Oij - Eij) / SQRT(Eij)

The dij are referred to as the standardized residuals and they show the contribution to the chi-square test statistic of each cell.
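Both formulations can be sketched in a few lines; the 2 x 2 counts below are hypothetical:

```python
def chi_square_independence(table):
    """Chi-square statistic and standardized residuals for an
    r x c table of observed counts."""
    r, c = len(table), len(table[0])
    row = [sum(table[i]) for i in range(r)]                       # Ri
    col = [sum(table[i][j] for i in range(r)) for j in range(c)]  # Cj
    n = sum(row)                                                  # N
    t, resid = 0.0, []
    for i in range(r):
        resid.append([])
        for j in range(c):
            e = row[i] * col[j] / n               # Eij = Ri*Cj/N
            d = (table[i][j] - e) / e ** 0.5      # dij, standardized residual
            resid[-1].append(d)
            t += d * d                            # T = SUM dij^2
    return t, resid

t, _ = chi_square_independence([[10, 20], [30, 40]])
print(round(t, 4))   # → 0.7937, on (r-1)*(c-1) = 1 degree of freedom
```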

Syntax 1:

CHI-SQUARE INDEPENDENCE TEST <y1> <y2>

                        <SUBSET/EXCEPT/FOR qualification>

where <y1> is the first response variable;

            <y2> is the second response variable;

and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used for the case where you have raw data (i.e., the data have not yet been cross-tabulated into a two-way table).


Syntax 2:

CHI-SQUARE INDEPENDENCE TEST <m>

                        <SUBSET/EXCEPT/FOR qualification>

where <m> is a matrix containing the two-way table;

and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax is used for the case where the data have already been cross-tabulated into a two-way

contingency table.

Syntax 3:

CHI-SQUARE INDEPENDENCE TEST <n11> <n12> <n21> <n22>

where <n11> is a parameter containing the value for row 1, column 1 of a 2x2 table;

            <n12> is a parameter containing the value for row 1, column 2 of a 2x2 table;

            <n21> is a parameter containing the value for row 2, column 1 of a 2x2 table;

            and <n22> is a parameter containing the value for row 2, column 2 of a 2x2 table.

This syntax is used for the special case where you have a 2x2 table. In this case, you can enter the 4

values directly, although you do need to be careful that the parameters are entered in the order

expected above.

Examples:

CHI-SQUARE INDEPENDENCE TEST Y1 Y2

CHI-SQUARE INDEPENDENCE TEST M

CHI-SQUARE INDEPENDENCE TEST N11 N12 N21 N22

Note:

The chi-square approximation is asymptotic. This means that the critical values may not be valid if the

expected frequencies are too small.

Cochran suggests that if the minimum expected frequency is less than 1 or if more than 20% of the expected frequencies are less than 5, the approximation may be poor. However, Conover suggests that this is probably too

conservative, particularly if r and c are not too small. He suggests that the minimum expected frequency should

be 0.5 and at least half the expected frequencies should be greater than 1.

In any event, if there are too many low expected frequencies, you can do one of the following:

1. If rows or columns with small expected frequencies can be intelligently combined, then this may

result in expected frequencies that are sufficiently large.

2. Use Fisher's exact test.

Note:


Conover points out that there are really 3 distinct tests:

1. Only N is fixed. The row and column totals are not fixed (i.e., they are random).

2. Either the row totals or the column totals are fixed beforehand.

3. Both the row totals and the column totals are fixed beforehand.

Note that in all three cases, the test statistic and the chi-square approximation are the same. What differs is the

exact distribution of the test statistic. When either the row or column totals (or both) are fixed, the possible

number of contingency tables is reduced.

As long as the expected frequencies are sufficiently large, the chi-square approximation should be adequate

for practical purposes.

Note:

Some authors recommend using a continuity correction for this test. In this case, 0.5 is added to the observed

frequency in each cell. Dataplot performs this test both with the continuity correction and without the continuity

correction.

Note:

The following information is written to the file dpst1f.dat (in the current directory):

Column 1 - row id

Column 2 - column id

Column 3 - row total

Column 4 - column total

Column 5 - expected frequency (Eij)

Column 6 - observed frequency (Oij)

To read this information into Dataplot, enter

SKIP 1

READ DPST1F.DAT ROWID COLID ROWTOT COLTOT ...

            EXPFREQ OBSFREQ

Note:

The ASSOCIATION PLOT command can be used to plot the standardized residuals of the chi-square analysis.

The ODDS RATIO INDEPENDENCE TEST is an alternative test for independence based on the LOG(odds

ratio).

Related Commands:

ODDS RATIO INDEPENDENCE TEST = Perform a log(odds ratio) test for independence.

FISHER EXACT TEST = Perform Fisher's exact test.


ASSOCIATION PLOT = Generate an association plot.

SIEVE PLOT = Generate a sieve plot.

ROSE PLOT = Generate a Rose plot.

BINARY TABULATION PLOT = Generate a binary tabulation plot.

ROC CURVE = Generate a ROC curve.

ODDS RATIO = Compute the bias corrected odds ratio.

LOG ODDS RATIO = Compute the bias corrected log(odds ratio).

FRIEDMAN TEST

The Friedman test is a non-parametric test for analyzing randomized complete block designs. It is an extension

of the sign test when there may be more than two treatments.

The Friedman test assumes that there are k experimental treatments (k ≥ 2). The observations are

arranged in b blocks, that is

  Treatment

Block 1 2 ... k

1 X11 X12 ... X1k

2 X21 X22 ... X2k

3 X31 X32 ... X3k

... ... ... ... ...

b Xb1 Xb2 ... Xbk

Let R(Xij) be the rank assigned to Xij within block i (i.e., ranks within a given row). Average ranks are

used in the case of ties. The ranks are summed to obtain

Rj = SUM(i=1 to b) R(Xij)    for j = 1, 2, ..., k

Then the Friedman test is

H0: The treatments have identical effects

Ha: At least one treatment is different from at least one other treatment

Test Statistic:

T1 = [12 / (b*k*(k+1))] * SUM(j=1 to k) Rj^2 - 3*b*(k+1)

If there are ties, then

T1 = (k-1) * SUM(j=1 to k) (Rj - b*(k+1)/2)^2 / (A1 - C1)

where

A1 = SUM(i,j) R(Xij)^2    and    C1 = b*k*(k+1)^2 / 4

Note that Conover recommends the statistic

T2 = (b-1)*T1 / (b*(k-1) - T1)

since it has a more accurate approximate distribution. The T2 statistic is the two-way analysis of variance statistic computed on the ranks R(Xij).

Significance Level: alpha

Critical Region:

T1 > CHSPPF(alpha, k-1), where CHSPPF is the percent point function of the chi-square distribution

T2 > FPPF(alpha, k-1, (b-1)*(k-1)), where FPPF is the percent point function of the F distribution

The T1 approximation is sometimes poor, so the T2 approximation is typically preferred.

Conclusion: Reject the null hypothesis if the test statistic is in the critical region.
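The T1 and T2 statistics can be traced on a small hypothetical design (b = 4 blocks, k = 3 treatments, no ties within any block):

```python
def friedman_t1_t2(data):
    """T1 (tie-corrected form) and Conover's T2; rows are blocks,
    columns are treatments. Assumes no ties within a block."""
    b, k = len(data), len(data[0])
    ranks = [[sorted(row).index(x) + 1 for x in row] for row in data]
    rj = [sum(ranks[i][j] for i in range(b)) for j in range(k)]   # Rj
    a1 = sum(x * x for row in ranks for x in row)                 # A1
    c1 = b * k * (k + 1) ** 2 / 4                                 # C1
    t1 = (k - 1) * sum((r - b * (k + 1) / 2) ** 2 for r in rj) / (a1 - c1)
    t2 = (b - 1) * t1 / (b * (k - 1) - t1)
    return t1, t2

t1, t2 = friedman_t1_t2([[7, 9, 8], [6, 5, 9], [9, 7, 8], [6, 7, 9]])
print(round(t1, 3), round(t2, 4))   # → 1.5 0.6923
```

With no ties, the tie-corrected form of T1 reduces algebraically to the simple form above, which is a useful consistency check.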

If the hypothesis of identical treatment effects is rejected, it is often desirable to determine which treatments are different (i.e., multiple comparisons). Treatments i and j are considered different if

|Rj - Ri| > t(1-alpha/2, (b-1)*(k-1)) * SQRT( 2*(b*A1 - SUM(j=1 to k) Rj^2) / ((b-1)*(k-1)) )

where t is the percent point function of the Student t distribution with (b-1)*(k-1) degrees of freedom.

Syntax:


FRIEDMAN TEST <y> <block> <treat>

                        <SUBSET/EXCEPT/FOR qualification>

where <y> is the response variable;

            <block> is a variable that identifies the block;

            <treat> is a variable that identifies the treatment;

and where the <SUBSET/EXCEPT/FOR qualification> is optional.

Examples:

FRIEDMAN TEST Y BLOCK TREATMENT

FRIEDMAN TEST Y X1 X2

FRIEDMAN TEST Y BLOCK TREATMENT SUBSET BLOCK > 2

Note:

In Dataplot, the variables should be given as:

Y BLOCK TREAT

X11 1 1

X12 1 2

... 1 ...

X1k 1 k

X21 2 1

X22 2 2

... 2 ...

X2k 2 k

... ... ...

Xb1 b 1

Xb2 b 2

... b ...

Xbk b k

If your data are in a format similar to that given in the DESCRIPTION section (i.e., you have columns Y1

to Yk, each with b rows), you can convert it to the format required by Dataplot with the commands:

LET K = 5

LET NBLOCK = SIZE Y1

LET NTOTAL = K*NBLOCK

LET BLOCK = SEQUENCE 1 K 1 NBLOCK

LET TREAT = SEQUENCE 1 1 K FOR I = 1 1 NTOTAL


LET Y2 = STACK Y1 Y2 ... YK

FRIEDMAN TEST Y2 BLOCK TREAT

Note:

The response, ranked response, block, and treatment are written to the file dpst1f.dat in the current

directory.

The treatment ranks and multiple comparisons are written to the file dpst2f.dat in the current directory.

Comparisons that are statistically significant at the 95% level are flagged with a single asterisk while

comparisons that are statistically significant at the 99% level are flagged with two asterisks.

Note:

The Friedman test is based on the following assumptions:

1. The b rows are mutually independent. That is, the results within one block (row) do not affect

the results within other blocks.

2. The data can be meaningfully ranked.

Default:

None

Synonyms:

None

Related Commands:

ANOVA = Perform an analysis of variance.

SIGN TEST = Perform a sign test.

MEDIAN POLISH = Carries out a robust ANOVA.

T TEST = Carries out a t test.

RANK SUM TEST = Perform a rank sum test.

SIGNED RANK TEST = Perform a signed rank test.

BLOCK PLOT = Generate a block plot.

DEX SCATTER PLOT = Generates a dex scatter plot.

DEX ... PLOT = Generates a dex plot for a statistic.

DEX ... EFFECTS PLOT = Generates a dex effects plot for a statistic.

Reference:

Conover, W. J., "Practical Nonparametric Statistics", Third Edition, Wiley, 1999, pp. 367-373.

Applications:

Analysis of Variance

Implementation Date:

2004/1

Program:


SKIP 1

READ CONOVER.DAT Y BLOCK TREAT

FRIEDMAN Y BLOCK TREAT

The following output is generated.

FRIEDMAN TEST FOR TWO-WAY ANOVA

1. STATISTICS

NUMBER OF OBSERVATIONS = 48

NUMBER OF BLOCKS = 12

NUMBER OF TREATMENTS = 4

FRIEDMAN TEST STATISTIC (ORIGINAL) = 8.097345

A1 (SUM OF SQUARES OF RANKS) = 356.5000

C1 (CORRECTION FACTOR) = 300.0000

FRIEDMAN TEST STATISTIC (CONOVER) = 3.192198

2. PERCENT POINTS OF THE F REFERENCE DISTRIBUTION

FOR FRIEDMAN TEST STATISTIC

0 % POINT = 0.000000

50 % POINT = 0.8052071

75 % POINT = 1.435732

90 % POINT = 2.257744

95 % POINT = 2.891563

99 % POINT = 4.436786

99.9 % POINT = 6.882786

96.37845 % Point: 3.192198

3. CONCLUSION (AT THE 5% LEVEL):

THE 4 TREATMENTS DO NOT HAVE IDENTICAL EFFECTS

Kruskal-Wallis test

 This is a method for comparing several independent random samples and can be used as a non-parametric

alternative to the one way ANOVA.


The Kruskal-Wallis test statistic for k samples, each of size ni is:

T = (1/S^2) * [ SUM(i=1 to k) Ri^2/ni - N*(N+1)^2/4 ]

- where N is the total number (all ni) and Ri is the sum of the ranks (from all samples pooled) for the ith sample and:

S^2 = [1/(N-1)] * [ SUM over all observations of R^2 - N*(N+1)^2/4 ]

 The null hypothesis of the test is that all k distribution functions are equal. The alternative hypothesis is that at

least one of the populations tends to yield larger values than at least one of the other populations.

 Assumptions:

random samples from populations

independence within each sample

mutual independence among samples

measurement scale is at least ordinal

either k population distribution functions are identical, or else some of the populations tend to yield

larger values than other populations

 If the test is significant, you can make multiple comparisons between the samples. You may choose the level

of significance for these comparisons (default is alpha = 0.05). All pairwise comparisons are made and the

probability of each presumed "non-difference" is indicated (Conover, 1999; Critchlow and Fligner, 1991;

Hollander and Wolfe, 1999). Two alternative methods are used to make all possible pairwise comparisons

between groups; these are Dwass-Steel-Critchlow-Fligner and Conover-Inman. In most situations, you should

use the Dwass-Steel-Critchlow-Fligner result.

 By the Dwass-Steel-Critchlow-Fligner procedure, a contrast is considered significant if the following inequality

is satisfied:

|Wij - ni*(ni+nj+1)/2| > q(alpha, k) * SQRT( (ni*nj/24) * [ ni+nj+1 - SUM(b) tb*(tb^2 - 1) / ((ni+nj)*(ni+nj-1)) ] )

- where q is a quantile from the normal range distribution for k groups, ni is size of the ith group, nj is the size of

the jth group, tb is the number of ties at rank b and Wij is the sum of the ranks for the ith group where

observations for both groups have been ranked together. The values either side of the greater than sign are

displayed in parentheses in StatsDirect results.

 The Conover-Inman procedure is simply Fisher's least significant difference method performed on ranks. A

contrast is considered significant if the following inequality is satisfied:

|Ri/ni - Rj/nj| > t(1-alpha/2, N-k) * SQRT( S^2 * (N-1-T)/(N-k) * (1/ni + 1/nj) )

- where t is a quantile from the Student t distribution on N-k degrees of freedom. The values either side of the

greater than sign are displayed in parentheses in StatsDirect results.

 An alternative to Kruskal-Wallis is to perform a one way ANOVA on the ranks of the observations.

StatsDirect also gives you a homogeneity of variance test option with Kruskal-Wallis; this is marked as

"Equality of variance (squared ranks)". Please refer to homogeneity of variance for more details.

 Technical Validation

The test statistic is an extension of the Mann-Whitney test and is calculated as above. In the presence of tied

ranks the test statistic is given in adjusted and unadjusted forms (opinion varies concerning the handling of

ties). The test statistic follows approximately a chi-square distribution with k-1 degrees of freedom; P values

are derived from this. For small samples you may wish to refer to tables of the Kruskal-Wallis test statistic but

the chi-square approximation is highly satisfactory in most cases (Conover, 1999).

 Example

From Conover (1999, p. 291).

Test workbook (ANOVA worksheet: Method 1, Method 2, Method 3, Method 4).

 The following data represent corn yields per acre from four different fields where different farming methods

were used.

Method 1   Method 2   Method 3   Method 4

83         91         101        78
91         90         100        82
94         81         91         81
89         83         93         77
89         84         96         79
96         83         95         81
91         88         94         80
92         91                    81
90         89
           84

 To analyse these data in StatsDirect you must first prepare them in four workbook columns appropriately

labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select

Kruskal-Wallis from the Non-parametric section of the analysis menu. Then select the columns marked

"Method 1", "Method 2", "Method 3" and "Method 4" in one selection action.

Results:

 Adjusted for ties: T = 25.62883, P < 0.0001

 All pairwise comparisons (Dwass-Steel-Critchlow-Fligner)

Method 1 and Method 2 , P = 0.1529

Method 1 and Method 3 , P = 0.0782

Method 1 and Method 4 , P = 0.0029

Method 2 and Method 3 , P = 0.0048

Method 2 and Method 4 , P = 0.0044

Method 3 and Method 4 , P = 0.0063


 All pairwise comparisons (Conover-Inman)

Method 1 and Method 2, P = 0.0078

Method 1 and Method 3, P = 0.0044

Method 1 and Method 4, P < 0.0001

Method 2 and Method 3, P < 0.0001

Method 2 and Method 4, P = 0.0001

Method 3 and Method 4, P < 0.0001

 From the overall T we see a statistically highly significant tendency for at least one group to give higher values than at least one of the others. Subsequent contrasts show a significant separation of all groups with the Conover-Inman method, and of all but Method 1 vs. Methods 2 and 3 with the Dwass-Steel-Critchlow-Fligner method. In most situations, it is best to use only the Dwass-Steel-Critchlow-Fligner result.
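The overall test above can be reproduced with SciPy's `kruskal`, which applies the same tie correction, so the statistic should match the adjusted T reported by StatsDirect. A sketch, not StatsDirect itself:

```python
from scipy.stats import kruskal

# Corn yields per acre (Conover, 1999, p. 291)
method1 = [83, 91, 94, 89, 89, 96, 91, 92, 90]
method2 = [91, 90, 81, 83, 84, 83, 88, 91, 89, 84]
method3 = [101, 100, 91, 93, 96, 95, 94]
method4 = [78, 82, 81, 77, 79, 81, 80, 81]

# Tie-corrected Kruskal-Wallis statistic, referred to chi-square on k-1 = 3 df
T, p = kruskal(method1, method2, method3, method4)
print(T, p)  # T is approximately 25.629, p < 0.0001
```

The pairwise contrasts (Dwass-Steel-Critchlow-Fligner, Conover-Inman) are not part of `kruskal` itself and would need a separate procedure.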


CRAMER CONTINGENCY COEFFICIENT

If we have N observations with two variables where each observation can be classified into one of R mutually

exclusive categories for variable one and one of C mutually exclusive categories for variable two, then a cross-

tabulation of the data results in a two-way contingency table (also referred to as an RxC contingency table).

The resulting contingency table has R rows and C columns.

A common question with regard to a two-way contingency table is whether we have independence. By independence, we mean that the row and column variables are unassociated (i.e., knowing the value of the row variable will not help us predict the value of the column variable, and likewise knowing the value of the column variable will not help us predict the value of the row variable).

A more technical definition for independence is that

P(row i, column j) = P(row i)*P(column j)       for all i,j

The standard test statistic for determining independence is the chi-square statistic, summed over all cells of the table: T = sum over i,j of (Oij - Eij)^2 / Eij, where Oij is the observed count in cell (i, j) and Eij = (row i total x column j total) / N is the expected count under independence.


One criticism of this statistic is that it does not give a meaningful description of the degree of dependence (or strength of association). That is, it is useful for determining whether there is dependence, but because its value depends on the degrees of freedom as well as on the strength of association, it is not easy to interpret as a measure of that strength.

Cramer's contingency coefficient provides an easier-to-interpret measure of strength of association. Specifically, it is:

V = sqrt( T / (N (q - 1)) )

where

T = the chi-square test statistic given above

N = the total sample size

q = minimum(number of rows, number of columns)

This statistic is based on the fact that the maximum value of T is:

N (q - 1)

So this statistic scales the chi-square statistic to a value between 0 (no association) and 1 (maximum association). It has the desirable property of scale invariance: if the sample size increases, the value of Cramer's contingency coefficient does not change as long as the cell counts keep the same proportions relative to each other.
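A minimal sketch of the computation in plain NumPy, independent of Dataplot; the 2x2 table is made up for illustration, and the doubled table demonstrates the scale invariance described above:

```python
import numpy as np

def cramers_v(table):
    """Cramer's contingency coefficient for an R x C table of counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: row total * column total / N
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()  # the statistic T above
    q = min(table.shape)
    return np.sqrt(chi2 / (n * (q - 1)))

table = [[10, 20], [20, 10]]
print(cramers_v(table))                # 1/3 for this table
print(cramers_v(np.array(table) * 2))  # same value: scale invariant
```

Doubling every cell doubles both T and N, so V is unchanged, which is exactly the invariance property claimed.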

The data for the contingency table can be specified in either of the following two ways:

1. raw data

In this case, you will have two variables. The first will contain R distinct values and the second
will contain C distinct values. Dataplot will automatically perform the cross-tabulation to obtain
the counts for each cell. Although the distinct values will typically be integers, this is not strictly
required.

2. table data


If you only have the resulting contingency table (i.e., the counts for each cell), then you can use

the READ MATRIX (or CREATE MATRIX) command to create a matrix with the data. This is

demonstrated in the example program below.

In this case, your data should contain non-negative integers since they represent the counts for

each cell.
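The cross-tabulation step for the raw-data case can be sketched in a few lines of plain Python; the category labels here are hypothetical:

```python
from collections import Counter

# Hypothetical raw data: two variables, one observation per position
rows = ["low", "low", "high", "high", "low", "high"]
cols = ["yes", "no", "yes", "yes", "no", "no"]

# Count each (row category, column category) pair
counts = Counter(zip(rows, cols))

# Arrange the counts as an R x C contingency table
row_levels = sorted(set(rows))
col_levels = sorted(set(cols))
table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
print(row_levels, col_levels, table)
```

The resulting table of counts is what the matrix form of the command (Syntax 2 below) expects as input.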

Syntax 1:

LET <par> = CRAMER CONTINGENCY COEFFICIENT <y1> <y2>

                        <SUBSET/EXCEPT/FOR qualification>

where <y1> is the first response variable;

            <y2> is the second response variable;

            <par> is a parameter where the computed Cramer contingency coefficient is stored;

and where the <SUBSET/EXCEPT/FOR qualification> is optional.

Use this syntax for raw data.

Syntax 2:

LET <par> = MATRIX GRAND CRAMER CONTINGENCY COEFFICIENT <m>

                        <SUBSET/EXCEPT/FOR qualification>

where <m> is a matrix containing the contingency table;

            <par> is a parameter where the computed Cramer contingency coefficient is stored;

and where the <SUBSET/EXCEPT/FOR qualification> is optional.

Use this syntax if your data is a contingency table.

Examples:

LET A = CRAMER CONTINGENCY COEFFICIENT Y1 Y2

LET A = MATRIX GRAND CRAMER CONTINGENCY COEFFICIENT M

Note:

For the raw data case, the two variables should have the same number of elements.

Note:

The following additional commands are supported:

TABULATE CRAMER CONTINGENCY COEFFICIENT Y1 Y2 X

CROSS TABULATE CRAMER CONTINGENCY COEFFICIENT ...

            Y1 Y2 X1 X2

CRAMER CONTINGENCY COEFFICIENT PLOT Y1 Y2 X

CROSS TABULATE CRAMER CONTINGENCY COEFFICIENT PLOT ...

            Y1 Y2 X1 X2


BOOTSTRAP CRAMER CONTINGENCY COEFFICIENT PLOT Y1 Y2

JACKKNIFE CRAMER CONTINGENCY COEFFICIENT PLOT Y1 Y2

The above commands expect the variables to have the same number of observations.

Note that the above commands are only available if you have raw data.

Default:

None

Synonyms:

None

Related Commands:

PEARSON CONTINGENCY COEFFICIENT = Compute Pearson's contingency coefficient.

CHI-SQUARE INDEPENDENCE TEST = Perform a chi-square test for independence.

ODDS RATIO INDEPENDENCE TEST = Perform a log(odds ratio) test for independence.

FISHER EXACT TEST = Perform Fisher's exact test.

ASSOCIATION PLOT = Generate an association plot.

SIEVE PLOT = Generate a sieve plot.

ROSE PLOT = Generate a Rose plot.

BINARY TABULATION PLOT = Generate a binary tabulation plot.

ROC CURVE = Generate a ROC curve.

ODDS RATIO = Compute the bias corrected odds ratio.

LOG ODDS RATIO = Compute the bias corrected log(odds ratio).