Linear Discriminant Analysis and Logistic Regression.


Background

Linear Discriminant Analysis predicts a categorical variable based on one or more metric independent variables.


Example

Age   Purchase
18    0
19    0
20    0
21    0
22    1
24    0
25    0
26    0
27    1
29    1
30    0
...   ...
65    1
68    1
70    1

[Figure: Data, a scatter plot of Purchase (0 or 1) against Age.]

Consider purchase data plotted against a person’s age. A Purchase value of 0 represents someone who didn’t buy, while a value of 1 represents someone who did.


Graph Interpretation

[Figure: Purchase plotted against Age. Points at Purchase = 1 are potential customers who did purchase; points at Purchase = 0 are potential customers who did not purchase.]


Graphical Representation

[Figure: Purchase plotted against Age with a fitted regression line.]

A discriminant analysis fits a linear regression to this data as though the categorical variable were numerical, with Purchase coded as 0 or 1.
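The slides don't show how the line is computed. The sketch below assumes ordinary least squares on the 0/1-coded Purchase column. It uses only the fourteen rows recoverable from the example table (the middle rows are missing), so its coefficients will not match the slides. The names `fit_line`, `AGES`, and `PURCHASE` are illustrative, not from StatTools.

```python
# Rows recoverable from the example table (the middle of the original
# dataset is missing, so results will not reproduce the slides).
AGES = [18, 19, 20, 21, 22, 24, 25, 26, 27, 29, 30, 65, 68, 70]
PURCHASE = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Treat the 0/1 purchase category as if it were an ordinary numeric response.
b0, b1 = fit_line(AGES, PURCHASE)
print(f"Purchase = {b0:.3f} + {b1:.4f} * Age")
```

Because the response is only 0 or 1, the fitted line predicts values that are not themselves categories. That is why a cutoff is needed next.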


Graphical Representation ctd.

[Figure: the fitted regression line on the Purchase-versus-Age data, with the cutoff score marked.]

Discriminant analysis then determines a cutoff score.

For a single predictor variable, the cutoff is the point where the regression line equals 0.5. Data points to the left of the cutoff are predicted to be 0, while those to the right are predicted to be 1.

For this data, any potential customer younger than 41 is predicted not to buy, while anyone older is predicted to buy.
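The sketch below illustrates the cutoff rule, again assuming an ordinary least squares line. Solving b0 + b1 * x = 0.5 for x gives x = (0.5 - b0) / b1. The classification step assumes a positive slope, so "to the right" means "older". It uses only the recoverable rows, so the cutoff it finds differs from the slides' 41. All names are illustrative.

```python
# Rows recoverable from the example table (incomplete dataset).
AGES = [18, 19, 20, 21, 22, 24, 25, 26, 27, 29, 30, 65, 68, 70]
PURCHASE = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def cutoff_score(b0, b1, threshold=0.5):
    """Return the x at which the fitted line b0 + b1 * x equals threshold."""
    return (threshold - b0) / b1


def classify(x, cutoff):
    """Predict 1 to the right of the cutoff and 0 to the left (assumes b1 > 0)."""
    return [1 if xi > cutoff else 0 for xi in x]


b0, b1 = fit_line(AGES, PURCHASE)
cut = cutoff_score(b0, b1)
predicted = classify(AGES, cut)
print(f"Cutoff age: {cut:.1f}")
print("Predictions:", predicted)
```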


A 100% Accurate Discriminant Analysis

Even a discriminant analysis that perfectly separates purchasers from non-purchasers does not achieve a perfect R² of 1.


Classification Accuracy

The standard error measures the distance of the predicted values (the regression line) from the observed values. Even data points that are correctly predicted contribute to this error. Their distance from the line lowers the total R², even though each one is a correct classification.

Classification accuracy is a better measure.
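The point can be illustrated on a small hypothetical dataset, made up for this sketch and not taken from the slides. In it, everyone aged 40 or older buys and no one younger does. The least squares line classifies every point correctly at the 0.5 cutoff, yet its R² is well below 1. The helper names are illustrative.

```python
# Hypothetical, perfectly separated data (not from the slides).
ages = [20, 25, 30, 35, 40, 45, 50, 55]
purchase = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def r_squared(x, y, b0, b1):
    """1 - SS_residual / SS_total for the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def accuracy(x, y, b0, b1, threshold=0.5):
    """Share of points whose 0/1 class is predicted correctly by the cutoff."""
    predicted = [1 if b0 + b1 * xi >= threshold else 0 for xi in x]
    return sum(p == yi for p, yi in zip(predicted, y)) / len(y)


b0, b1 = fit_line(ages, purchase)
r2 = r_squared(ages, purchase, b0, b1)
acc = accuracy(ages, purchase, b0, b1)
print(f"R^2 = {r2:.3f}, classification accuracy = {acc:.0%}")
# prints: R^2 = 0.762, classification accuracy = 100%
```

Every residual here comes from correctly classified points sitting above or below the line. That residual is exactly the error the R² penalizes.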


Discriminant Analysis in StatTools



StatTools – Interpreting Output

[Figure: StatTools classification table comparing actual values with predicted values, with the correct predictions highlighted.]


StatTools – Interpreting Output ctd.

[Figure: the same table of actual versus predicted values, highlighting the false negatives, the false positives, and the overall accuracy.]
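The quantities highlighted in the StatTools table can be computed from a list of actual and predicted classes. The sketch below uses made-up labels, not the slides' output. It follows the usual convention: a false positive is a predicted purchase by someone who did not buy, and a false negative is a purchaser the model predicted would not buy.

```python
# Hypothetical labels for ten potential customers (1 = purchased).
actual = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
predicted = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1]


def classification_table(actual, predicted):
    """Count true negatives, false positives, false negatives, and true positives."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        if a == 0 and p == 0:
            counts["TN"] += 1  # correctly predicted non-purchaser
        elif a == 0 and p == 1:
            counts["FP"] += 1  # predicted to buy, but did not
        elif a == 1 and p == 0:
            counts["FN"] += 1  # predicted not to buy, but did
        else:
            counts["TP"] += 1  # correctly predicted purchaser
    return counts


counts = classification_table(actual, predicted)
overall_accuracy = (counts["TN"] + counts["TP"]) / len(actual)
print(counts)  # {'TN': 4, 'FP': 1, 'FN': 1, 'TP': 4}
print(f"Overall accuracy: {overall_accuracy:.0%}")  # Overall accuracy: 80%
```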


Logistic Regression

A logistic regression fits a sigmoid, or S-shaped, curve instead of a straight line. On some datasets, this provides greater classification accuracy.

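$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p}}$$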
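With a single predictor, the logistic model reduces to p(x) = e^(b0 + b1 * x) / (1 + e^(b0 + b1 * x)). The transcript doesn't say how StatTools estimates it. The sketch below assumes maximum likelihood fitted by Newton-Raphson, a standard method for this model. It uses only the recoverable rows of the example table, and its function names are illustrative.

```python
import math

# Rows recoverable from the example table (incomplete dataset).
AGES = [18, 19, 20, 21, 22, 24, 25, 26, 27, 29, 30, 65, 68, 70]
PURCHASE = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]


def sigmoid(z):
    """Logistic function; 1 / (1 + e^-z) equals e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of p(x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p  # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] * d = g for d.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


b0, b1 = fit_logistic(AGES, PURCHASE)
for age in (25, 40, 60):
    print(f"P(purchase | age {age}) = {sigmoid(b0 + b1 * age):.2f}")
```

The fitted probabilities can then be classified with the same 0.5 cutoff used for the discriminant analysis.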


Logistic Regression in StatTools



StatTools – Interpreting Output

[Figure: StatTools logistic regression output. Age is highly statistically significant, and the output also reports the overall accuracy.]
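The transcript doesn't say which test StatTools uses to flag Age as significant. One common approach for logistic regression is a Wald z test, z = b1 / SE(b1). Here the standard error comes from the inverse of the information matrix at the fitted coefficients. The sketch below assumes that test and refits the model on the recoverable rows only, so its numbers are not the slides' output. All names are illustrative.

```python
import math

# Rows recoverable from the example table (incomplete dataset).
AGES = [18, 19, 20, 21, 22, 24, 25, 26, 27, 29, 30, 65, 68, 70]
PURCHASE = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def information(x, b0, b1):
    """Entries (h00, h01, h11) of the information matrix for (b0, b1)."""
    h00 = h01 = h11 = 0.0
    for xi in x:
        p = sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return h00, h01, h11


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson maximum-likelihood fit of p(x) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = sum(yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y))
        g1 = sum((yi - sigmoid(b0 + b1 * xi)) * xi for xi, yi in zip(x, y))
        h00, h01, h11 = information(x, b0, b1)
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


b0, b1 = fit_logistic(AGES, PURCHASE)
h00, h01, h11 = information(AGES, b0, b1)
# Var(b1) is the lower-right entry of the inverse information matrix.
se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(f"Age: b1 = {b1:.4f}, SE = {se_b1:.4f}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value means the Age coefficient is unlikely to be zero, which is what "highly statistically significant" expresses.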


Comparison

Discriminant Analysis can be used for dependent variables with more than two possible values.

Logistic Regression is less reliant on basic assumptions about the data, such as normality and constant variance, and is more accurate on borderline points for some datasets.