Stat 586 Discriminant Analysis


  • 8/7/2019 Stat 586 Discriminant Analysis


Discriminant Analysis

An Introduction


Problem description

We wish to predict group membership for a number of subjects from a set of predictor variables.

The criterion variable (also called the grouping variable) is the object of classification. It is ALWAYS a categorical variable!

Simple case: two groups and p predictor variables.


Example

We want to know whether somebody has lung cancer. Hence, we wish to predict a yes or no outcome.

Possible predictor variables: number of cigarettes smoked a day, coughing frequency and intensity, etc.


Approach (1)

Linear discriminant analysis constructs one or more discriminant equations $D_i$ (linear combinations of the predictor variables $X_k$) such that the different groups differ as much as possible on $D$.

Discriminant function:

$$D_i = b_0 + \sum_{k=1}^{p} b_k X_k$$
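As a small illustration, the discriminant function is just a weighted sum of the predictors, $D = b_0 + \sum_k b_k X_k$. The NumPy sketch below evaluates it for two subjects; the weights and the predictor values are made up for illustration and are not estimated from any data.

```python
import numpy as np

def discriminant_scores(X, b0, b):
    """Evaluate D = b0 + sum_k b_k * X_k for every subject (one row of X per subject)."""
    X = np.asarray(X, dtype=float)
    b = np.asarray(b, dtype=float)
    return b0 + X @ b

# Hypothetical weights for p = 2 predictors
# (cigarettes per day, coughing episodes per day); illustration only.
b0 = -3.0
b = [0.1, 0.5]
X = np.array([[20.0, 4.0],   # subject 1
              [0.0, 1.0]])   # subject 2

D = discriminant_scores(X, b0, b)
print(D)  # subject 1 scores 1.0, subject 2 scores -2.5
```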


Approach (2)

More precisely, the weights of the discriminant function are calculated in such a way that the ratio (between-groups SS)/(within-groups SS) is as large as possible.

Number of discriminant functions = min(number of groups - 1, p).
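To see this criterion at work, the sketch below builds the within-groups and between-groups sums-of-squares-and-cross-products matrices (called W and B in the code) and takes the eigenvectors of $W^{-1}B$ as discriminant weights. For an eigenvector $a$ with eigenvalue $\lambda$, $Ba = \lambda W a$, so the ratio $a^\top B a / a^\top W a$ equals $\lambda$; the eigenvector with the largest eigenvalue therefore maximizes the ratio. Only min(g - 1, p) functions are kept. This is a plain NumPy sketch on a small made-up data set, not the exact computation inside R's lda (which rescales the weights).

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-groups (W) and between-groups (B) sums of squares and cross-products."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(y):
        Xg = X[y == g]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
    return W, B

def fisher_directions(X, y):
    """Eigenvalues/eigenvectors of W^-1 B, largest first, keeping min(g - 1, p) of them."""
    W, B = scatter_matrices(X, y)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    eigvals, eigvecs = eigvals.real, eigvecs.real  # real in theory; drop rounding noise
    order = np.argsort(eigvals)[::-1]
    n_funcs = min(len(np.unique(y)) - 1, np.asarray(X).shape[1])
    return eigvals[order][:n_funcs], eigvecs[:, order][:, :n_funcs]

# Small made-up data set: 3 groups, p = 2 predictors
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.8, 2.1],
              [3.0, 3.5], [3.4, 3.1], [2.9, 3.9], [3.3, 3.6],
              [5.1, 1.0], [4.8, 1.4], [5.4, 0.7], [5.0, 1.2]])
y = np.array(["a"] * 4 + ["b"] * 4 + ["c"] * 4)

lam, A = fisher_directions(X, y)
W, B = scatter_matrices(X, y)
a1 = A[:, 0]
print(lam)                            # one eigenvalue per discriminant function
print((a1 @ B @ a1) / (a1 @ W @ a1))  # equals lam[0]
```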


Definitions

Suppose we have a set of g classes.

Let W denote the within-class covariance matrix, that is, the covariance matrix of the variables centered on their class mean.

Let B denote the between-classes covariance matrix, that is, the covariance matrix of the class means (each observation replaced by the mean of its class).

The sample covariances are:
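One common way to write them is given below; textbooks differ in the divisors ($n-g$ and $g-1$ here), so treat the normalization as an assumption. Here $x_i$ is the predictor vector of subject $i$, $G_j$ is class $j$ with $n_j$ subjects and mean $\bar{x}_j$, $\bar{x}$ is the grand mean, and $n$ is the total number of subjects.

```latex
W = \frac{1}{n-g}\sum_{j=1}^{g}\sum_{i \in G_j}(x_i-\bar{x}_j)(x_i-\bar{x}_j)^{\top},
\qquad
B = \frac{1}{g-1}\sum_{j=1}^{g} n_j\,(\bar{x}_j-\bar{x})(\bar{x}_j-\bar{x})^{\top}
```

Without the divisors these are the within-groups and between-groups sums-of-squares-and-cross-products matrices from which the (between-groups SS)/(within-groups SS) ratio is built.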

4/13/2011


Interpretation

The first discriminant function $D_1$ distinguishes the first group from groups 2, 3, ..., N.

The second discriminant function $D_2$ distinguishes the second group from groups 3, 4, ..., N.

Etc.


Visualization (two outcomes)


Visualization (3 outcomes)


Approach (3)

To calculate the optimal weights, a training set is used that contains the correct classification for a group of subjects.

EXAMPLE (lung cancer): we need data about persons for whom we know for sure whether they had lung cancer (e.g. established by means of an operation, scan, or X-rays)!


Approach (4)

For a new group of subjects for whom we do not yet know the group they belong to, we can use the previously calculated discriminant weights to obtain their discriminant scores.

We call this classification.


Technical details

The calculation of optimal discriminant weights involves some mathematics.


Example (1)

The famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris.

The species are Iris setosa, versicolor, and virginica.


Obs  S.Length  S.Width  P.Length  P.Width  Species
1    5.1       3.5      1.4       0.2      setosa
2    4.9       3.0      1.4       0.2      setosa
3    4.7       3.2      1.3       0.2      setosa
4    4.6       3.1      1.5       0.2      setosa
5    5.0       3.6      1.4       0.2      setosa
6    5.4       3.9      1.7       0.4      setosa
7    4.6       3.4      1.4       0.3      setosa
8    5.0       3.4      1.5       0.2      setosa
9    4.4       2.9      1.4       0.2      setosa

Fragment of the data set


Example (2)

Dependent variable?

Predictor variables?

Number of discriminant functions?


Step 1: Analyze data

The idea is to start by analyzing the data. We start with linear discriminant analysis.

Do the predictors vary sufficiently over the different groups? If not, they will be bad predictors.

Formal test for this: Wilks' lambda test.

This test assesses whether the predictors vary enough to distinguish the different groups.
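A minimal NumPy sketch of Wilks' lambda, $\Lambda = \det(W) / \det(W + B)$, with W and B the within- and between-groups sums-of-squares-and-cross-products matrices. Values near 0 mean the groups are well separated; values near 1 mean the predictors hardly differ between groups. The chi-square value uses Bartlett's approximation with p(g - 1) degrees of freedom, and passing a single column gives the univariate lambda for that predictor (SS_within / SS_total). The data are made up; in R, a comparable multivariate test is available through summary(manova(...), test = "Wilks").

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda with Bartlett's chi-square approximation.

    Returns (lambda, chi2, df). A 1-D X is treated as a single predictor.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y)
    n, p = X.shape
    groups = np.unique(y)
    g = len(groups)
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for grp in groups:
        Xg = X[y == grp]
        dev = Xg - Xg.mean(axis=0)
        W += dev.T @ dev
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
    lam = np.linalg.det(W) / np.linalg.det(W + B)  # W + B is the total SSCP matrix
    chi2 = -(n - 1 - (p + g) / 2) * np.log(lam)
    df = p * (g - 1)
    return lam, chi2, df

X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.8, 2.1],
              [3.0, 3.5], [3.4, 3.1], [2.9, 3.9], [3.3, 3.6],
              [5.1, 1.0], [4.8, 1.4], [5.4, 0.7], [5.0, 1.2]])
y = np.array(["a"] * 4 + ["b"] * 4 + ["c"] * 4)

lam, chi2, df = wilks_lambda(X, y)       # all predictors together
lam_1, _, _ = wilks_lambda(X[:, 0], y)   # first predictor on its own
print(lam, chi2, df, lam_1)
```

The chi-square value can then be compared with a chi-square distribution with df degrees of freedom (for instance with scipy.stats.chi2.sf, if SciPy is available).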


Step 1a: Sample statistics

Call:

iris.lda


Visualization: plot(iris.lda)


Step 1b: Formal test


Step 2: Discriminant function (1)

Look at the coefficients of the standardized (!) discriminant functions to see which predictors play an important role.

The larger the (absolute) coefficient of a predictor in the standardized discriminant function, the more important its role in the discriminant function.
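A common way to standardize raw discriminant coefficients (for example the columns of the scaling component of an R lda fit) is to multiply each predictor's coefficient by that predictor's pooled within-group standard deviation, which makes coefficients comparable across predictors measured on different scales. A minimal NumPy sketch; the data and the raw coefficients are made up:

```python
import numpy as np

def pooled_within_sd(X, y):
    """Pooled within-group standard deviation of each predictor (divisor n - g)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    ss = np.zeros(X.shape[1])
    for g in groups:
        Xg = X[y == g]
        ss += ((Xg - Xg.mean(axis=0)) ** 2).sum(axis=0)
    return np.sqrt(ss / (len(X) - len(groups)))

def standardized_coefficients(raw_coef, X, y):
    """Scale raw coefficients (p x m: predictors x functions) by the pooled within-group SDs."""
    return np.asarray(raw_coef, dtype=float) * pooled_within_sd(X, y)[:, None]

X = np.array([[1.0, 10.0], [3.0, 14.0],   # group a
              [5.0, 0.0], [7.0, 4.0]])    # group b
y = np.array(["a", "a", "b", "b"])
raw = np.array([[1.0], [0.5]])            # hypothetical raw coefficients, one function

print(standardized_coefficients(raw, X, y))
```

Here both predictors end up equally important even though their raw coefficients differ by a factor of two, because the second predictor varies twice as much within groups.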


Step 2: Discriminant function (2)

The coefficients are partial: they represent the contribution of a variable to the discriminant function in the context of the other predictor variables in the model.

Limitation: with more than two outcomes they are more difficult to interpret.


Step 2: Getting the discriminant functions

Call:

iris.lda


Step 3: Comparing discriminant functions

Which discriminant function has the most discriminating power?

Look at the eigenvalues, also called characteristic roots (R's lda reports the closely related singular values, whose squares are proportional to the eigenvalues). Each discriminant function has such a value. They reflect the amount of variance in the grouping variable explained by the predictors in a discriminant function.

Always look at the ratio of the eigenvalues to assess the relative importance of a discriminant function.


Step 3: Getting eigenvalues

> iris.lda$svd
[1] 48.642644  4.579983

svd: the singular values, which give the ratio of the between- and within-group standard deviations on the linear discriminant variables.

(48.642644 belongs to $D_1$; 4.579983 belongs to $D_2$.)
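The singular values can be turned into the relative importance of each discriminant function: their squares are proportional to the eigenvalues, so the share of function j is $svd_j^2 / \sum svd^2$. This should match the "Proportion of trace" line that R prints for an lda fit. Using the two values shown above:

```python
import numpy as np

svd = np.array([48.642644, 4.579983])  # iris.lda$svd, copied from the output above

eig_ratio = svd[0] ** 2 / svd[1] ** 2      # relative size of the two eigenvalues
prop_trace = svd ** 2 / np.sum(svd ** 2)   # share of between-group variance per function

print(round(eig_ratio, 1))      # about 112.8
print(np.round(prop_trace, 4))  # about 0.9912 for D1 and 0.0088 for D2
```

So $D_1$ carries about 99% of the between-group variance, which is what the trace interpretation in Step 4a expresses.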


Step 4: More interpretation

Trace

Useful plots

Group centroids


Step 4a: Trace

Call:

iris.lda


Step 4a: Trace interpretation

The first trace number indicates the percentage of the total between-group variance that the first discriminant function is able to explain.

High trace number = the discriminant function plays an important role!


Step 4b: Useful plots

Take, e.g., the first and second discriminant functions. Plot the discriminant function values of the objects in a scatter plot, labeled by predicted group. Does the discriminant function discriminate well between the different groups?

Combine the plot with the group centroids (average values of the discriminant functions for each group).
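Group centroids are simply the mean discriminant scores per group. A small NumPy sketch, assuming the scores are already available as an n × m array (in R, for example, predict(iris.lda)$x); the numbers below are made up:

```python
import numpy as np

def group_centroids(scores, groups):
    """Mean discriminant score of each group (one row per group, one column per function)."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    centroids = np.array([scores[groups == g].mean(axis=0) for g in labels])
    return labels, centroids

scores = np.array([[1.0, 0.0], [3.0, 2.0],     # group a: (LD1, LD2) scores
                   [-1.0, 1.0], [-3.0, 3.0]])  # group b
groups = ["a", "a", "b", "b"]
labels, centroids = group_centroids(scores, groups)
print(labels, centroids)  # a -> [2, 1], b -> [-2, 2]
```

For the plot itself, the score rows can be scattered by group (e.g. with matplotlib's scatter) and the centroids overlaid as larger markers.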


Step 4c: R code for plot

# Plot

LD1


Step 5: Prediction (1)

Using the estimated discriminant model, classify new subjects.

There are various ways to do this. We consider the following approach:

Calculate the probability that a subject belongs to a certain group using the estimated discriminant model.

Do this for all groups.

Classification rule: the subject is assigned to the group it has the highest probability of falling into.
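The classification rule itself is an argmax over the group probabilities. A minimal sketch, assuming a matrix of posterior probabilities with one row per subject and one column per group (in R, predict(iris.lda)$posterior has this shape); the probabilities below are made up:

```python
import numpy as np

def assign_groups(posteriors, labels):
    """Assign each subject (row) to the group (column) with the highest probability."""
    posteriors = np.asarray(posteriors, dtype=float)
    return [labels[j] for j in np.argmax(posteriors, axis=1)]

labels = ["setosa", "versicolor", "virginica"]
posteriors = [[0.90, 0.08, 0.02],
              [0.10, 0.30, 0.60],
              [0.05, 0.55, 0.40]]
print(assign_groups(posteriors, labels))  # ['setosa', 'virginica', 'versicolor']
```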


Step 5: Bayes rule

Formula used to calculate the probability that a subject belongs to a group:

$$P(G_i \mid D) = \frac{P(G_i)\,P(D \mid G_i)}{\sum_{k=1}^{N} P(G_k)\,P(D \mid G_k)}$$

where the $P(G_k)$ are the priors.
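Under the LDA model, the group-conditional density is multivariate normal with group mean $\mu_k$ and a covariance matrix shared by all groups. The sketch below works with the predictor vector x directly and applies Bayes' rule with priors taken as group fractions; the means, covariance, and group sizes are made-up inputs, not estimates from the iris data.

```python
import numpy as np

def mvn_density(x, mean, cov):
    """Multivariate normal density evaluated at x."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    p = len(mean)
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * np.linalg.det(cov))

def posterior_probs(x, priors, means, cov):
    """Bayes' rule: P(G_i | x) = P(G_i) p(x | G_i) / sum_k P(G_k) p(x | G_k)."""
    numer = np.array([prior * mvn_density(x, m, cov) for prior, m in zip(priors, means)])
    return numer / numer.sum()

means = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]  # hypothetical group means
cov = np.eye(2)                                        # shared covariance matrix

sizes = np.array([30, 10])          # hypothetical group sizes in the training set
priors = sizes / sizes.sum()        # priors = fraction of subjects per group
print(posterior_probs([1.0, 0.0], priors, means, cov))
```

The point x = (1, 0) lies exactly halfway between the two means, so both densities are equal and the posteriors reduce to the priors (0.75 and 0.25).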


Step 5: Prediction (2)

To determine these probabilities, a prior probability is required. The priors represent the probability that a subject belongs to a particular group.

Usually, we set them equal to the fraction of subjects in each group.


Step 5: Prediction (3)

Prediction on the training set: to assess how well the discriminant model predicts.

Prediction on a new data set: to predict the group a new object belongs to.


Step 5: Prediction in R

iris.predict


Step 5: Quality of prediction (1)

To assess the quality of a prediction, make a prediction table: rows with the observed categories of the dependent variable, columns with the forecasted categories.

Ideally, the off-diagonal elements should be zero.
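A prediction table is a cross-tabulation of observed versus predicted groups (in R, table() builds it). A small NumPy sketch with made-up labels:

```python
import numpy as np

def prediction_table(observed, predicted, labels):
    """Counts with observed groups as rows and predicted groups as columns."""
    index = {label: i for i, label in enumerate(labels)}
    table = np.zeros((len(labels), len(labels)), dtype=int)
    for obs, pred in zip(observed, predicted):
        table[index[obs], index[pred]] += 1
    return table

observed = ["yes", "yes", "no", "no", "no"]
predicted = ["yes", "no", "no", "no", "yes"]
print(prediction_table(observed, predicted, ["yes", "no"]))
```

The diagonal holds the correctly classified subjects; everything off the diagonal is a misclassification.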


Step 5: Quality of prediction (2)

The percentage of correctly classified objects is usually compared to:

the random classification: each object is assigned to group i = 1, ..., N with probability (100/N)%;

the probability matching classification: the probability of assigning group i = 1, ..., N to an object is equal to the fraction of objects in class i.


Step 5: Quality of prediction (3)

the probability maximizing method: put all subjects in the most likely category (i.e., the category with the highest fraction of objects in it).
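The three benchmark accuracies follow directly from the group fractions $p_i$: random classification is correct with probability $1/N$, probability matching with probability $\sum_i p_i^2$ (an object of group i is assigned to group i with probability $p_i$), and probability maximizing with probability $\max_i p_i$. A small sketch with made-up group sizes:

```python
import numpy as np

def baseline_accuracies(group_sizes):
    """Expected proportion correctly classified under the three benchmark rules."""
    sizes = np.asarray(group_sizes, dtype=float)
    p = sizes / sizes.sum()
    return {
        "random": 1.0 / len(sizes),
        "probability matching": float(np.sum(p ** 2)),
        "probability maximizing": float(p.max()),
    }

print(baseline_accuracies([60, 30, 10]))
# random 1/3, probability matching 0.46, probability maximizing 0.6
```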


Step 5: Get table in R

table(Original = iris$Species, Predicted = predict(iris.lda)$class)

            Predicted
Original     setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          1        49

Rows: the grouping variable (original classes); columns: the predicted classes.
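The overall and per-group hit rates can be read off this table; a short NumPy sketch using the iris counts shown:

```python
import numpy as np

# Rows: original species; columns: predicted species (setosa, versicolor, virginica)
table = np.array([[50, 0, 0],
                  [0, 48, 2],
                  [0, 1, 49]])

overall = np.trace(table) / table.sum()         # proportion correctly classified
per_group = np.diag(table) / table.sum(axis=1)  # hit rate within each original species
print(overall, per_group)  # 0.98 and [1.0, 0.96, 0.98]
```

With three equally large species (50 each), all three benchmark rules give 1/3, so 98% is far above chance. Because these rates are computed on the training data itself, they are likely somewhat optimistic for new subjects.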


Step 6: Structure coefficients

Correlations between the predictors and the discriminant values indicate which predictor is most related to a discriminant function (not corrected for the other variables).

Example: cor(iris[,1], LD1), where LD1 holds the scores on the first discriminant function (for instance predict(iris.lda)$x[,1]).

(Note the difference with the discriminant coefficients!!!)
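Structure coefficients are ordinary correlations between each predictor and each set of discriminant scores. The sketch below computes total-sample correlations, which is what cor(iris[,1], LD1) gives; some software (SPSS's structure matrix, for instance) reports pooled within-group correlations instead, which can differ. The data and scores are made up.

```python
import numpy as np

def structure_coefficients(X, scores):
    """Correlation of each predictor (rows) with each discriminant function (columns)."""
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if scores.ndim == 1:
        scores = scores[:, None]
    p, m = X.shape[1], scores.shape[1]
    out = np.empty((p, m))
    for k in range(p):
        for j in range(m):
            out[k, j] = np.corrcoef(X[:, k], scores[:, j])[0, 1]
    return out

X = np.array([[1.0, 4.0, 2.0],
              [2.0, 3.0, 1.0],
              [3.0, 2.0, 4.0],
              [4.0, 1.0, 3.0]])
ld1 = 0.8 * X[:, 0] + 0.1 * X[:, 2]  # hypothetical scores on one discriminant function
print(structure_coefficients(X, ld1))
```

Unlike the discriminant coefficients, each correlation here ignores the other predictors: the second predictor gets a large (negative) structure coefficient even though it has no weight in ld1, simply because it is perfectly correlated with the first.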


Assumptions underlying LDA

Independent subjects.

Normality: the predictors are (multivariate) normally distributed within each group.

Equal covariances: the variance-covariance matrix of the predictors is the same in all groups.

If the latter assumption is violated, use quadratic discriminant analysis in the same manner as linear discriminant analysis. ALWAYS CHECK YOUR ASSUMPTIONS.


Quadratic discriminant analysis

Call qda:

result
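Quadratic discriminant analysis drops the equal-covariance assumption: each group gets its own covariance matrix, so the classification score becomes quadratic in x. In R, the counterpart of lda is qda from the MASS package. As a rough NumPy sketch (made-up data, Gaussian groups, priors equal to the group fractions), a subject is assigned to the group with the largest score $\ln \pi_j - \tfrac12 \ln|\Sigma_j| - \tfrac12 (x-\mu_j)^\top \Sigma_j^{-1} (x-\mu_j)$:

```python
import numpy as np

def qda_fit(X, y):
    """Estimate prior, mean, and group-specific covariance for every group."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    params = []
    for c in classes:
        Xc = X[y == c]
        params.append((len(Xc) / len(X), Xc.mean(axis=0), np.cov(Xc, rowvar=False)))
    return classes, params

def qda_predict(x, classes, params):
    """Return the group with the largest quadratic discriminant score."""
    x = np.asarray(x, dtype=float)
    scores = []
    for prior, mean, cov in params:
        diff = x - mean
        _, logdet = np.linalg.slogdet(cov)
        scores.append(np.log(prior) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(cov, diff))
    return classes[int(np.argmax(scores))]

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, -0.5], [0.5, -1.0],  # group a: tight
              [4.0, 4.0], [7.0, 5.0], [5.0, 8.0], [3.0, 6.0], [6.0, 3.0]])    # group b: spread out
y = np.array(["a"] * 5 + ["b"] * 5)
classes, params = qda_fit(X, y)
print(qda_predict([0.2, 0.1], classes, params), qda_predict([5.0, 5.0], classes, params))
```

Each group needs more subjects than predictors for its covariance matrix to be invertible, which is one reason QDA needs larger samples than LDA.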