Chapter 16 - Logistic Regression Model


1.1 Introduction

Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response variable and one or more explanatory variables. It is often the case that the outcome variable is discrete, taking on two or more possible values. Over the last decade the logistic regression model has become, in many fields, the standard method of analysis in this situation.1

Before beginning a study of logistic regression it is important to understand that the goal of an analysis using this method is the same as that of any model-building technique used in statistics: to find the best-fitting and most parsimonious, yet biologically reasonable, model to describe the relationship between an outcome (dependent or response) variable and a set of independent (predictor or explanatory) variables. These independent variables are often called covariates. The most common example of modeling, and one assumed to be familiar to the readers of this text, is the usual linear regression model, where the outcome variable is assumed to be continuous.1

What distinguishes the logistic regression model from the linear regression model is that the outcome variable in logistic regression is binary or dichotomous. The difference between logistic and linear regression is reflected both in the choice of a parametric model and in the assumptions.1

In this chapter, we focus on logit analysis (a.k.a. logistic regression analysis) as an optimal method for the regression analysis of dichotomous (binary) dependent variables. Before considering the full model, let's examine one of its components: the odds of an event.2

1.2 Odds and Odds Ratios

To appreciate the logit model, it's helpful to have an understanding of odds and odds ratios. Most people regard probability as the natural way to quantify the chances that an event will occur. We automatically think in terms of numbers ranging from 0 to 1, with 0 meaning that the event will certainly not occur and 1 meaning that the event certainly will occur.2

Probability can be computed as follows:

\[ P(\text{event}) = \frac{\text{number of cases in which the event occurs}}{\text{total number of cases}} \]

For example, using the penalty-trial data in Table 1 below, the probability of a death sentence is 50/147 ≈ 0.34.


However, there are other ways of representing the chances of an event, one of which, the odds, has a nearly equal claim to being natural. Consider Table 1, which shows the cross-tabulation of race of defendant by death sentence for the 147 penalty-trial cases. The numbers in the table are the actual numbers of cases that have the stated characteristics.

Table 1: Death Sentences by Race of Defendant for 147 Penalty Trials

             Black   Nonblack   Total
    Death       28         22      50
    Life        45         52      97
    Total       73         74     147

The odds of an event are defined as

\[ \text{Odds} = \frac{p}{1 - p} = \frac{\text{number of cases in which the event occurs}}{\text{number of cases in which it does not}} \]

The odds ratio is the ratio of two odds,

\[ \text{Odds ratio} = \frac{\text{Odds}_1}{\text{Odds}_0}, \]

or \exp(\beta) when expressed in terms of a logit coefficient. The odds ratio represents the change in the odds of being in one of the categories of the outcome when the value of a predictor increases by one unit.

For example (additional exercise, April 2010), using the counts in Table 1:

I.   Odds of a death sentence = 50/97 ≈ 0.515
II.  Odds of a death sentence for blacks = 28/45 ≈ 0.622
III. Odds of a death sentence for nonblacks = 22/52 ≈ 0.423
IV.  Odds ratio of blacks to nonblacks = 0.622/0.423 ≈ 1.47

Interpretation: we may say that the odds of a death sentence for blacks are 47% greater than for nonblacks. We can also say that the odds of a death sentence for nonblacks are 1/1.47 ≈ 0.68 times the odds of a death sentence for blacks. So, depending on which categories we're comparing, we either get an odds ratio greater than 1 or its reciprocal, which is less than 1.
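To check the arithmetic above, here is a minimal Python sketch (an illustration, not part of the original text) that computes the probability, odds, and odds ratio directly from the counts in Table 1.

```python
# Counts from Table 1: death sentences by race of defendant (147 penalty trials)
death_black, life_black = 28, 45
death_nonblack, life_nonblack = 22, 52

total = death_black + life_black + death_nonblack + life_nonblack  # 147

# Probability and odds of a death sentence overall
p_death = (death_black + death_nonblack) / total    # 50/147 ~ 0.34
odds_death = p_death / (1 - p_death)                # 50/97  ~ 0.515

# Odds by race and the odds ratio (black vs. nonblack)
odds_black = death_black / life_black               # 28/45 ~ 0.622
odds_nonblack = death_nonblack / life_nonblack      # 22/52 ~ 0.423
odds_ratio = odds_black / odds_nonblack             # ~ 1.47

print(f"P(death) = {p_death:.2f}, odds = {odds_death:.3f}")
print(f"Odds ratio (black vs. nonblack) = {odds_ratio:.2f}")
```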

1.3 The Logit Model

Now we're ready to introduce the logit model, otherwise known as the logistic regression model. For k explanatory variables and i = 1, \ldots, n individuals, the model is

\[ \log\!\left[\frac{p_i}{1 - p_i}\right] = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} \]

where p_i is the probability that the event occurs for individual i (that is, y_i = 1). The expression on the left-hand side is usually referred to as the logit or log-odds. The logit, being the log of the odds, is not only linear in the X's but also linear in the parameters.


Positive logit values indicate that the odds are in favour of an event happening, while negative logit values indicate that the odds are against the occurrence of an event. We can solve the logit equation for p_i to obtain

\[ p_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})} \]

We can simplify further by dividing both numerator and denominator by the numerator itself:

\[ p_i = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})]} \]

In mathematical terms, this formula is called the logistic function and can be written as

\[ F(z) = \frac{1}{1 + e^{-z}} \]

where z ranges from -\infty to +\infty, F(z) ranges between 0 and 1, and F(z) is nonlinearly related to z.

Simple logit model

Let p and z be defined as follows:

\[ p = P(Y = 1 \mid X), \qquad z = \beta_0 + \beta_1 X, \qquad \text{so that } \operatorname{logit}(p) = \log\!\left[\frac{p}{1 - p}\right] = z. \]

Hence,

\[ p = P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}. \]
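To make the link between the logit and the probability concrete, the following short Python sketch (an illustration added here, with illustrative z values) implements the logistic function and its inverse, the logit.

```python
import math

def logistic(z: float) -> float:
    """Logistic function F(z) = 1 / (1 + e^(-z)); maps (-inf, +inf) onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1); the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

# A positive logit means the odds favour the event; a negative logit means they do not.
for z in (-2.0, 0.0, 2.0):
    p = logistic(z)
    print(f"z = {z:+.1f}  ->  p = {p:.3f}  ->  logit(p) = {logit(p):+.1f}")
```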


1.4 Interpretation of the Odds Ratio

If, for example, the fitted coefficient for smoking status (1 = smoker, 0 = nonsmoker) in a model for lung cancer gives

\[ \text{Odds ratio} = \exp(\hat{\beta}) = 3, \]

this odds ratio indicates that the odds of developing lung cancer for a smoker are 3 times those of a nonsmoker.
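In practice, the odds ratio for a predictor is obtained by exponentiating its estimated coefficient (reported as Exp(B) in SPSS). A brief Python illustration, using a hypothetical coefficient chosen so that the odds ratio equals 3:

```python
import math

beta_smoking = math.log(3)           # hypothetical coefficient for smoking status
odds_ratio = math.exp(beta_smoking)  # corresponds to Exp(B) in SPSS output
print(f"Odds ratio = {odds_ratio:.1f}")  # 3.0: a smoker's odds are 3 times a nonsmoker's
```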

1.5 Applying Logistic Regression

This statistical method was applied to a data set to assess its ability to classify a baby as low birth weight or normal based on several predictor variables.

Description of Variables

    Variable   Description        Type          Coding / Units
    Y          Birth weight       Categorical   1 = low birth weight, 0 = normal
    X1         Race               Categorical   2 = Malay, 1 = Chinese, 0 = Indian
    X2         Gender             Categorical   1 = male, 0 = female
    X3         Mother's age       Continuous    years
    X4         Father's income    Continuous    RM
    X5         Parity             Integer       number of children
    X6         Abortion           Categorical   1 = yes, 0 = no
    X7         Mother's height    Continuous    cm
    X8         Vitamin            Continuous    mg
    X9         Weight gain        Continuous    kg
    X10        Antenatal visits   Integer       number of visits

    Table 2: SPSS Results for Multiple Logistic Regression.

The estimated logistic regression model obtained has the form

\[ \operatorname{logit}(\hat{p}) = \log\!\left[\frac{\hat{p}}{1 - \hat{p}}\right] = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_{10} X_{10} \]


where \hat{p} is the estimated probability that a baby is of low birth weight and b_0, b_1, \ldots, b_{10} are the coefficients reported in Table 2.
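The original analysis was carried out in SPSS. As a rough equivalent, the following Python sketch shows how such a model could be fitted with statsmodels; the file name and the use of the codes Y and X1-X10 as column names are assumptions made for illustration, not details from the original study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative file and column names; the original analysis was done in SPSS.
df = pd.read_csv("birthweight.csv")   # hypothetical data file with Y, X1, ..., X10

# C(...) treats the coded variables (race, gender, abortion) as categorical.
formula = ("Y ~ C(X1) + C(X2) + X3 + X4 + X5 + "
           "C(X6) + X7 + X8 + X9 + X10")
model = smf.logit(formula, data=df).fit()

print(model.summary())        # coefficients (B), standard errors, Wald statistics, p-values
print(np.exp(model.params))   # Exp(B): the odds ratios reported by SPSS
```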

Interpreting B

The B values provided in the SPSS output are equivalent to the b values obtained in a multiple regression analysis. These are the values that you would use in the equation to calculate the probability of a case falling into a specific category. You should check whether your B values are positive or negative; this tells you about the direction of the relationship (an increase or decrease in the odds of the outcome).

Test concerning β

The crucial statistic is the Wald statistic, which has a chi-square distribution and tells us whether the coefficient for that predictor is significantly different from zero. If the coefficient is significantly different from zero, then we assume that the predictor is making a significant contribution to the prediction of the outcome (Y). In this sense it is analogous to the t-tests found in multiple regression.3

Decision rule: reject H0: β = 0 if the Wald p-value < 0.05 (the predictor contributes significantly); do not reject H0 if the Wald p-value ≥ 0.05.
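For reference, the Wald statistic for a single coefficient is the squared ratio of the estimate to its standard error, referred to a chi-square distribution with 1 degree of freedom. A small Python illustration (the coefficient and standard error below are made-up numbers):

```python
from scipy.stats import chi2

b, se = 0.75, 0.30            # hypothetical coefficient estimate and its standard error
wald = (b / se) ** 2          # Wald chi-square statistic with 1 df
p_value = chi2.sf(wald, df=1)

print(f"Wald = {wald:.2f}, p-value = {p_value:.3f}")
# Reject H0: beta = 0 when the p-value is below 0.05.
```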


The following parts of the SPSS output are used to evaluate the model:

I. The Omnibus Tests of Model Coefficients give an overall indication of how well the model performs compared with a model in which none of the predictors is entered. For this result we want a highly significant value (the p-value must be less than 0.05).

II. The Hosmer-Lemeshow test is used to test the goodness-of-fit of the model (a computational sketch is given after the worked conclusion below).

III. The Cox & Snell R-square and Nagelkerke R-square values provide an indication of the amount of variation in the dependent variable explained by the model.

IV. The classification table was also used in this study to see how well the model is able to predict the correct category. This table also provides the sensitivity and specificity of the model. Sensitivity measures the proportion of actual positives that are correctly identified, whereas specificity measures the proportion of actual negatives that are correctly identified. A model with a high percentage of both sensitivity and specificity is good and can be used for prediction.

H0: The logistic regression model is a good fit for the data.
H1: The logistic regression model is not a good fit for the data.

Test statistic: the Hosmer-Lemeshow chi-square statistic (or its p-value). Decision rule: accept H0 if the p-value > 0.05. Since the p-value (0.511) > 0.05, we accept H0 and conclude that the logistic regression model is a good fit for the data.
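SPSS reports the Hosmer-Lemeshow result directly. As a sketch of what the test does, the following Python function (an illustration; the grouping into deciles and other details can differ between packages) computes the statistic from the observed outcomes and the model's fitted probabilities.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, g=10):
    """Hosmer-Lemeshow goodness-of-fit test (sketch).

    Sorts cases by fitted probability, splits them into g groups, and compares
    observed with expected counts of events and non-events in each group using
    a chi-square statistic with g - 2 degrees of freedom.
    """
    order = np.argsort(p_hat)
    y_sorted = np.asarray(y)[order]
    p_sorted = np.asarray(p_hat)[order]

    statistic = 0.0
    for idx in np.array_split(np.arange(len(y_sorted)), g):
        observed_events = y_sorted[idx].sum()
        expected_events = p_sorted[idx].sum()
        observed_nonevents = len(idx) - observed_events
        expected_nonevents = len(idx) - expected_events
        statistic += ((observed_events - expected_events) ** 2 / expected_events
                      + (observed_nonevents - expected_nonevents) ** 2 / expected_nonevents)

    p_value = chi2.sf(statistic, df=g - 2)
    return statistic, p_value

# Usage (illustrative): hl_stat, hl_p = hosmer_lemeshow(y, model.predict())
# A p-value above 0.05, as with the 0.511 reported above, indicates an adequate fit.
```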

The R-square values suggested that this model can explain about 15.7 to 24.6 percent of the total variation in the dependent variable.

    Example of Sensitivity and Specificity Analysis


Overall predictive efficiency = (141 + 54)/263 = 74.1%
Sensitivity (actual positives that are correctly identified) = 141/170 = 0.8294 or 82.9%
Specificity (actual negatives that are correctly identified) = 54/93 = 0.5806 or 58.1%
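These figures can be reproduced from the counts implied by the classification table; a short Python sketch:

```python
# Counts implied by the classification results above
true_positives, false_negatives = 141, 170 - 141   # 170 actual positives
true_negatives, false_positives = 54, 93 - 54      # 93 actual negatives

total = true_positives + false_negatives + true_negatives + false_positives

sensitivity = true_positives / (true_positives + false_negatives)   # 141/170
specificity = true_negatives / (true_negatives + false_positives)   # 54/93
accuracy = (true_positives + true_negatives) / total                # overall efficiency

print(f"Sensitivity = {sensitivity:.1%}")   # 82.9%
print(f"Specificity = {specificity:.1%}")   # 58.1%
print(f"Overall     = {accuracy:.1%}")      # 74.1%
```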

Based on these results, we can conclude that the logistic regression model for Bahasa Inggeris can be used to predict Form Four students' achievement.