CHAPTER 26 Discriminant Analysis

From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon http://www.pcord.com

Tables, Figures, and Equations

Purposes:

1. Summarizing the differences between groups (often used as a follow-up to clustering, to help describe the groups); "descriptive discriminant analysis." With community data, you could use indicator species analysis as a nonparametric alternative.

2. Multivariate testing of whether or not two or more groups differ significantly from each other. For ecological community data this is better done with MRPP, thus avoiding the assumptions listed below.

3. Determining the dimensionality of group differences.

4. Checking for misclassified items.

5. Predicting group membership or classifying new cases ("predictive discriminant analysis").

6. Comparing occupied vs. unoccupied habitat to determine the habitat characteristics that allow or prevent a species' existence. DA has been widely used for this purpose in wildlife studies and rare plant studies.

Assumptions

1. Homogeneous within-group variances.

2. Multivariate normality within groups.

3. Linearity among all pairs of variables.

4. Prior probabilities of group membership are appropriately specified.

How it works

The "direct" procedure is described below.

1. Calculate variance/covariance matrix for each group.

2. Calculate pooled variance/covariance matrix (Sp) from the above matrices.

3. Calculate between-group variance (Sg) for each variable.

4. Maximize the F-ratio:

$$F = \frac{y' S_g y}{y' S_p y}$$

where y is the eigenvector associated with a particular discriminant function.

We seek y to maximize F.

5. Maximize this ratio by finding the partial derivatives with a characteristic equation:

$$|S_p^{-1} S_g - \lambda I| = 0$$

The number of roots (eigenvalues λ) is g-1, where g is the number of groups. In other words, the number of functions (axes) derived is one less than the number of groups.

Each eigenvalue, expressed as a proportion of the sum of the eigenvalues, gives the percent of variance among groups explained by its axis.
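The step from the F-ratio to the characteristic equation can be sketched as follows. This is a reconstruction under the assumption that the within-group matrix $S_p$ and the between-group matrix $S_g$ are symmetric and that $S_p$ is invertible; λ is simply the value of F at a stationary point.

```latex
\begin{align*}
\frac{\partial F}{\partial y}
  &= \frac{2\,S_g y\,(y' S_p y) - 2\,S_p y\,(y' S_g y)}{(y' S_p y)^2} = 0 \\
\Rightarrow\quad S_g y &= \lambda\, S_p y,
  \qquad \lambda = \frac{y' S_g y}{y' S_p y} \\
\Rightarrow\quad \left(S_p^{-1} S_g - \lambda I\right) y &= 0
\end{align*}
```

A nonzero y satisfies the last line only when the determinant of $S_p^{-1} S_g - \lambda I$ is zero, which is the characteristic equation.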

6. Solve for each eigenvector y (also known as the "canonical variates" or "discriminant functions").

$$[S_p^{-1} S_g - \lambda I]\, y = 0$$

7. Locate points (sample units) on each axis.

$$X = AY$$

X = scores (coordinates) for n rows (sample units) on m dimensions, where m = g-1.

A = original data matrix of n rows by p columns.

Y = matrix of m eigenvectors with loadings for p variables.

Each eigenvector is known as a discriminant function.

These unstandardized discriminant functions Y can be used as (linear) prediction equations, assigning scores to unclassified items.

Standardized discriminant function coefficients are scaled for variables standardized to unit variance. The absolute values of these coefficients indicate the relative importance of the individual variables in contributing to the discriminant function.
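Steps 1 through 7 can be sketched in NumPy as below. This is a minimal illustration, not the procedure used by any particular package: the toy data, the function name `discriminant_axes`, and the choice to build a full p × p between-group matrix (rather than a per-variable variance) are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy data: 3 groups, 2 variables, 4 sample units per group.
A = np.array([
    [1.0, 2.0], [2.0, 1.0], [1.5, 2.5], [2.5, 1.5],
    [5.0, 6.0], [6.0, 5.0], [5.5, 6.5], [6.5, 5.5],
    [9.0, 2.0], [10.0, 1.0], [9.5, 2.5], [10.5, 1.5],
])
groups = np.repeat([0, 1, 2], 4)


def discriminant_axes(A, groups):
    """Return eigenvalues, eigenvectors Y, scores X = AY, and percent variance."""
    labels = np.unique(groups)
    n, p = A.shape
    g = len(labels)
    grand_mean = A.mean(axis=0)
    S_p = np.zeros((p, p))  # pooled within-group variance/covariance
    S_g = np.zeros((p, p))  # between-group variance/covariance
    for k in labels:
        A_k = A[groups == k]
        n_k = len(A_k)
        # Steps 1-2: each group's covariance matrix, weighted by its df.
        S_p += (n_k - 1) * np.cov(A_k, rowvar=False)
        # Step 3: spread of the group mean around the grand mean.
        d = (A_k.mean(axis=0) - grand_mean)[:, None]
        S_g += n_k * (d @ d.T)
    S_p /= n - g
    S_g /= g - 1
    # Steps 4-6: eigenanalysis of S_p^-1 S_g.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_p, S_g))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    # Keep the largest min(p, g-1) roots, in descending order.
    order = np.argsort(eigvals)[::-1][: min(p, g - 1)]
    eigvals, Y = eigvals[order], eigvecs[:, order]
    # Step 7: locate the sample units on each axis.
    X = A @ Y
    pct = 100 * eigvals / eigvals.sum()
    return eigvals, Y, X, pct


eigvals, Y, X, pct = discriminant_axes(A, groups)
print("percent of among-group variance per axis:", np.round(pct, 1))
```

With the matrices scaled by their degrees of freedom as here, each eigenvalue equals the one-way F-ratio of the scores on its axis, which is one way to check the result.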

8. Classification phase.

a. Derive a classification equation for each group, one term in the equation for each variable, plus a constant.

b. Insert data values for a given SU to calculate a classification score for each group for that SU.

c. The SU is assigned to the group in which it had the highest score.

The coefficients in the equation are derived from:

the p × p within-group variance-covariance matrix (Sp), and

the p × 1 vector of the means for each variable in group k, Mk.

First, calculate W by dividing each term of Sp by the within-group degrees of freedom. Then:

$$C_k = W^{-1} M_k$$

The constant is derived as:

$$\text{constant}_k = -\frac{1}{2}\, C_k' M_k$$

The constant and the coefficients in Ck define a linear equation of the usual form, one equation for each group k.
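The classification phase can be sketched as follows. This is an illustration under stated assumptions: Sp is taken to be the within-group sums of squares and cross-products (so that dividing by the within-group degrees of freedom gives W), priors are equal, and the toy data and the function names `classification_functions` and `classify` are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 2 groups, 2 variables, 4 sample units per group.
A = np.array([
    [1.0, 2.0], [2.0, 1.0], [1.5, 2.5], [2.5, 1.5],
    [5.0, 6.0], [6.0, 5.0], [5.5, 6.5], [6.5, 5.5],
])
groups = np.repeat([0, 1], 4)


def classification_functions(A, groups):
    """Return {group: (constant_k, C_k)} for the linear classification equations."""
    labels = np.unique(groups)
    n, p = A.shape
    sscp = np.zeros((p, p))  # within-group sums of squares and cross-products
    means = {}
    for k in labels:
        A_k = A[groups == k]
        means[k] = A_k.mean(axis=0)          # M_k
        centered = A_k - means[k]
        sscp += centered.T @ centered
    W = sscp / (n - len(labels))             # divide by within-group df
    funcs = {}
    for k in labels:
        C_k = np.linalg.solve(W, means[k])   # C_k = W^-1 M_k
        constant_k = -0.5 * C_k @ means[k]   # constant_k = -1/2 C_k' M_k
        funcs[k] = (constant_k, C_k)
    return funcs


def classify(x, funcs):
    """Score a sample unit for every group and return the highest-scoring group."""
    x = np.asarray(x, dtype=float)
    scores = {k: c0 + C @ x for k, (c0, C) in funcs.items()}
    return max(scores, key=scores.get), scores


funcs = classification_functions(A, groups)
group_a, scores_a = classify([1.2, 1.8], funcs)
group_b, scores_b = classify([6.0, 6.0], funcs)
print(group_a, group_b)
```

Each entry of `funcs` holds one equation per group: a constant plus one coefficient per variable, matching steps a through c.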

Summary statistics

Wilks' lambda (Λ). Wilks' Λ is the error sum of squares divided by the sum of the effect sum of squares and the error sum of squares. Thus, it is the proportion of the variance among the objects that is not explained by the discriminant functions. It ranges from zero (perfect separation of groups) to one (no separation of groups).

Statistical significance of lambda is tested with a chi-square approximation.

Chi-square (derived from Wilks' lambda).

Variance explained.
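Wilks' lambda and a chi-square statistic derived from it can be sketched as below. The multivariate form used here, the determinant of the error matrix over the determinant of the effect-plus-error matrix, and Bartlett's approximation for the chi-square are assumptions of this example; the text above specifies only "a chi-square approximation". The toy data and the function name `wilks_lambda` are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 3 groups, 2 variables, 4 sample units per group.
A = np.array([
    [1.0, 2.0], [2.0, 1.0], [1.5, 2.5], [2.5, 1.5],
    [5.0, 6.0], [6.0, 5.0], [5.5, 6.5], [6.5, 5.5],
    [9.0, 2.0], [10.0, 1.0], [9.5, 2.5], [10.5, 1.5],
])
groups = np.repeat([0, 1, 2], 4)


def wilks_lambda(A, groups):
    """Return (lambda, chi-square, df) for a grouping of the rows of A."""
    labels = np.unique(groups)
    n, p = A.shape
    g = len(labels)
    grand_mean = A.mean(axis=0)
    E = np.zeros((p, p))  # error (within-group) sums of squares and cross-products
    H = np.zeros((p, p))  # effect (between-group) sums of squares and cross-products
    for k in labels:
        A_k = A[groups == k]
        m_k = A_k.mean(axis=0)
        E += (A_k - m_k).T @ (A_k - m_k)
        d = (m_k - grand_mean)[:, None]
        H += len(A_k) * (d @ d.T)
    lam = np.linalg.det(E) / np.linalg.det(E + H)
    # Bartlett's chi-square approximation (an assumption of this sketch).
    chi2 = -(n - 1 - (p + g) / 2) * np.log(lam)
    df = p * (g - 1)
    return lam, chi2, df


lam, chi2, df = wilks_lambda(A, groups)
print(f"lambda = {lam:.4f}, chi-square = {chi2:.2f}, df = {df}")
```

Well-separated groups give a lambda near zero; groups with identical means give a lambda of one.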

Figure 26.1. Comparison of DA and PCA. Groups are tighter in DA than in PCA because DA maximizes group separation while PCA maximizes the representation of variance among individual points. Groups were superimposed on an ordination of pine species in ecological trait space (after McCune 1988). Pinus resinosa was not assigned to a group, so it does not appear in the DA ordination.

                 Predicted with EQUAL PRIORS    Predicted with UNEQUAL PRIORS
Actual           Nest      Not nest             Nest      Not nest
Nest             0.83      0.17                 0.48      0.52
Not nest         0.17      0.83                 0.02      0.98

Priors: 0.5 for both groups under equal priors; 0.07 (nest) and 0.93 (not nest) under unequal priors.

Table 26.1. Predictions of goshawk nesting sites from DA compared to actual results, in one case using equal prior probabilities, in the other case using prior probabilities based on the occupancy rate of landscape cells. Rows give actual status and columns give predicted status, so each pair of values in a row sums to one. The first value of 0.83 means that 83% of the actual nesting sites were predicted by DA to be nesting sites.

EQUAL priors:

No. non-nests predicted nests (false positives) = p(predicted nest but not nest) × number of non-nests = 0.17 × 93 = 15.8

No. nests predicted non-nests (false negatives) = p(predicted not nest but nest) × number of nests = 0.17 × 7 = 1.2

Total number of errors = 15.8 + 1.2 = 17

UNEQUAL priors:

No. non-nests predicted nests (false positives) = p(predicted nest but not nest) × number of non-nests = 0.02 × 93 = 1.9

No. nests predicted non-nests (false negatives) = p(predicted not nest but nest) × number of nests = 0.52 × 7 = 3.6

Total number of errors = 1.9 + 3.6 = 5.5
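The error counts for the two sets of priors can be reproduced with a few lines of arithmetic. The sketch assumes 100 landscape cells split into 93 non-nests and 7 nests, as the multipliers above imply; the function name `expected_errors` is hypothetical.

```python
def expected_errors(p_false_pos, p_false_neg, n_non_nests, n_nests):
    """Return (false positives, false negatives, total errors).

    p_false_pos: proportion of actual non-nests predicted to be nests.
    p_false_neg: proportion of actual nests predicted to be non-nests.
    """
    false_pos = p_false_pos * n_non_nests
    false_neg = p_false_neg * n_nests
    return false_pos, false_neg, false_pos + false_neg


# Error rates are taken from Table 26.1.
equal = expected_errors(0.17, 0.17, n_non_nests=93, n_nests=7)
unequal = expected_errors(0.02, 0.52, n_non_nests=93, n_nests=7)

for name, (fp, fn, total) in [("equal", equal), ("unequal", unequal)]:
    print(f"{name} priors: false positives {fp:.1f}, "
          f"false negatives {fn:.1f}, total {total:.1f}")
```

Unequal priors miss more of the rare nests but produce far fewer false positives among the many non-nests, so the total number of errors is lower.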