Chapter 16 - Logistic Regression Model


1.1 Introduction

Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response variable and one or more explanatory variables. It is often the case that the outcome variable is discrete, taking on two or more possible values. Over the last decade the logistic regression model has become, in many fields, the standard method of analysis in this situation.1

Before beginning a study of logistic regression it is important to understand that the goal of an analysis using this method is the same as that of any model-building technique used in statistics: to find the best-fitting and most parsimonious, yet biologically reasonable, model to describe the relationship between an outcome (dependent or response) variable and a set of independent (predictor or explanatory) variables. These independent variables are often called covariates. The most common example of modeling, and one assumed to be familiar to the readers of this text, is the usual linear regression model, where the outcome variable is assumed to be continuous.1

What distinguishes the logistic regression model from the linear regression model is that the outcome variable in logistic regression is binary or dichotomous. The difference between logistic and linear regression is reflected both in the choice of a parametric model and in the assumptions.1

In this chapter, we focus on logit analysis (a.k.a. logistic regression analysis) as an optimal method for the regression analysis of dichotomous (binary) dependent variables. Before considering the full model, let's examine one of its components: the odds of an event.2

1.2 Odds and Odds Ratios

To appreciate the logit model, it's helpful to have an understanding of odds and odds ratios. Most people regard probability as the natural way to quantify the chances that an event will occur. We automatically think in terms of numbers ranging from 0 to 1, with 0 meaning that the event will certainly not occur and 1 meaning that the event certainly will occur.2

Probability can be computed as follows:

\[ P(\text{event}) = \frac{\text{number of cases in which the event occurs}}{\text{total number of cases}} \]

For example, using the penalty-trial data in Table 1 below, the probability of a death sentence is 50/147 ≈ 0.34.


However, there are other ways of representing the chances of an event, one of which, the odds, has a nearly equal claim to being natural. Consider Table 1, which shows the cross-tabulation of race of defendant by death sentence for the 147 penalty-trial cases. The numbers in the table are the actual numbers of cases that have the stated characteristics.

Table 1: Death Sentences by Race of Defendant for 147 Penalty Trials

             Black   Nonblack   Total
    Death       28         22      50
    Life        45         52      97
    Total       73         74     147

The odds of an event are defined as

\[ \text{Odds} = \frac{p}{1 - p} = \frac{\text{number of cases in which the event occurs}}{\text{number of cases in which it does not}} \]

The odds ratio is the ratio of two odds,

\[ \text{Odds ratio} = \frac{\text{Odds}_1}{\text{Odds}_0}, \]

or \exp(\beta) when expressed in terms of a logit coefficient. The odds ratio represents the change in the odds of being in one of the categories of the outcome when the value of a predictor increases by one unit.

For example (additional exercise, April 2010), using the counts in Table 1:

I.   Odds of a death sentence = 50/97 ≈ 0.515
II.  Odds of a death sentence for blacks = 28/45 ≈ 0.622
III. Odds of a death sentence for nonblacks = 22/52 ≈ 0.423
IV.  Odds ratio of blacks to nonblacks = 0.622/0.423 ≈ 1.47

Interpretation: we may say that the odds of a death sentence for blacks are 47% greater than for nonblacks. We can also say that the odds of a death sentence for nonblacks are 1/1.47 ≈ 0.68 times the odds of a death sentence for blacks. So, depending on which categories we're comparing, we either get an odds ratio greater than 1 or its reciprocal, which is less than 1.
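To check the arithmetic above, here is a minimal Python sketch (an illustration, not part of the original text) that computes the probability, odds, and odds ratio directly from the counts in Table 1.

```python
# Counts from Table 1: death sentences by race of defendant (147 penalty trials)
death_black, life_black = 28, 45
death_nonblack, life_nonblack = 22, 52

total = death_black + life_black + death_nonblack + life_nonblack  # 147

# Probability and odds of a death sentence overall
p_death = (death_black + death_nonblack) / total    # 50/147 ~ 0.34
odds_death = p_death / (1 - p_death)                # 50/97  ~ 0.515

# Odds by race and the odds ratio (black vs. nonblack)
odds_black = death_black / life_black               # 28/45 ~ 0.622
odds_nonblack = death_nonblack / life_nonblack      # 22/52 ~ 0.423
odds_ratio = odds_black / odds_nonblack             # ~ 1.47

print(f"P(death) = {p_death:.2f}, odds = {odds_death:.3f}")
print(f"Odds ratio (black vs. nonblack) = {odds_ratio:.2f}")
```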

1.3 The Logit Model

Now we're ready to introduce the logit model, otherwise known as the logistic regression model. For k explanatory variables and i = 1, \ldots, n individuals, the model is

\[ \log\!\left[\frac{p_i}{1 - p_i}\right] = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} \]

where p_i is the probability that the event occurs for individual i (that is, y_i = 1). The expression on the left-hand side is usually referred to as the logit or log-odds. The logit, being the log of the odds, is not only linear in the X's but also linear in the parameters.


Positive logit values indicate that the odds are in favour of an event happening, while negative logit values indicate that the odds are against the occurrence of an event. We can solve the logit equation for p_i to obtain

\[ p_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})} \]

We can simplify further by dividing both numerator and denominator by the numerator itself:

\[ p_i = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})]} \]

In mathematical terms, this formula is called the logistic function and can be written as

\[ F(z) = \frac{1}{1 + e^{-z}} \]

where z ranges from -\infty to +\infty, F(z) ranges between 0 and 1, and F(z) is nonlinearly related to z.

Simple logit model

Let p and z be defined as follows:

\[ p = P(Y = 1 \mid X), \qquad z = \beta_0 + \beta_1 X, \qquad \text{so that } \operatorname{logit}(p) = \log\!\left[\frac{p}{1 - p}\right] = z. \]

Hence,

\[ p = P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}. \]
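To make the link between the logit and the probability concrete, the following short Python sketch (an illustration added here, with illustrative z values) implements the logistic function and its inverse, the logit.

```python
import math

def logistic(z: float) -> float:
    """Logistic function F(z) = 1 / (1 + e^(-z)); maps (-inf, +inf) onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1); the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

# A positive logit means the odds favour the event; a negative logit means they do not.
for z in (-2.0, 0.0, 2.0):
    p = logistic(z)
    print(f"z = {z:+.1f}  ->  p = {p:.3f}  ->  logit(p) = {logit(p):+.1f}")
```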


1.4 Interpretation of the Odds Ratio

If, for example, the fitted coefficient for smoking status (1 = smoker, 0 = nonsmoker) in a model for lung cancer gives

\[ \text{Odds ratio} = \exp(\hat{\beta}) = 3, \]

this odds ratio indicates that the odds of developing lung cancer for a smoker are 3 times those of a nonsmoker.
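In practice, the odds ratio for a predictor is obtained by exponentiating its estimated coefficient (reported as Exp(B) in SPSS). A brief Python illustration, using a hypothetical coefficient chosen so that the odds ratio equals 3:

```python
import math

beta_smoking = math.log(3)           # hypothetical coefficient for smoking status
odds_ratio = math.exp(beta_smoking)  # corresponds to Exp(B) in SPSS output
print(f"Odds ratio = {odds_ratio:.1f}")  # 3.0: a smoker's odds are 3 times a nonsmoker's
```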

1.5 Applying Logistic Regression

This statistical method was applied to a data set to assess its ability to classify a baby as low birth weight or normal based on several predictor variables.

Description of Variables

    Variable   Description        Type          Coding / Units
    Y          Birth weight       Categorical   1 = low birth weight, 0 = normal
    X1         Race               Categorical   2 = Malay, 1 = Chinese, 0 = Indian
    X2         Gender             Categorical   1 = male, 0 = female
    X3         Mother's age       Continuous    years
    X4         Father's income    Continuous    RM
    X5         Parity             Integer       number of children
    X6         Abortion           Categorical   1 = yes, 0 = no
    X7         Mother's height    Continuous    cm
    X8         Vitamin            Continuous    mg
    X9         Weight gain        Continuous    kg
    X10        Antenatal visits   Integer       number of visits

    Table 2: SPSS Results for Multiple Logistic Regression.

The estimated logistic regression model obtained has the form

\[ \operatorname{logit}(\hat{p}) = \log\!\left[\frac{\hat{p}}{1 - \hat{p}}\right] = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_{10} X_{10} \]


where \hat{p} is the estimated probability that a baby is of low birth weight and b_0, b_1, \ldots, b_{10} are the coefficients reported in Table 2.
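The original analysis was carried out in SPSS. As a rough equivalent, the following Python sketch shows how such a model could be fitted with statsmodels; the file name and the use of the codes Y and X1-X10 as column names are assumptions made for illustration, not details from the original study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative file and column names; the original analysis was done in SPSS.
df = pd.read_csv("birthweight.csv")   # hypothetical data file with Y, X1, ..., X10

# C(...) treats the coded variables (race, gender, abortion) as categorical.
formula = ("Y ~ C(X1) + C(X2) + X3 + X4 + X5 + "
           "C(X6) + X7 + X8 + X9 + X10")
model = smf.logit(formula, data=df).fit()

print(model.summary())        # coefficients (B), standard errors, Wald statistics, p-values
print(np.exp(model.params))   # Exp(B): the odds ratios reported by SPSS
```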

Interpreting B

The B values provided in the SPSS output are equivalent to the b values obtained in a multiple regression analysis. These are the values that you would use in the equation to calculate the probability of a case falling into a specific category. You should check whether your B values are positive or negative; this tells you about the direction of the relationship (an increase or decrease in the odds of the outcome).

Test concerning β

The crucial statistic is the Wald statistic, which has a chi-square distribution and tells us whether the coefficient for that predictor is significantly different from zero. If the coefficient is significantly different from zero, then we assume that the predictor is making a significant contribution to the prediction of the outcome (Y). In this sense it is analogous to the t-tests found in multiple regression.3

Decision rule: reject H0: β = 0 if the Wald p-value < 0.05 (the predictor contributes significantly); do not reject H0 if the Wald p-value ≥ 0.05.
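For reference, the Wald statistic for a single coefficient is the squared ratio of the estimate to its standard error, referred to a chi-square distribution with 1 degree of freedom. A small Python illustration (the coefficient and standard error below are made-up numbers):

```python
from scipy.stats import chi2

b, se = 0.75, 0.30            # hypothetical coefficient estimate and its standard error
wald = (b / se) ** 2          # Wald chi-square statistic with 1 df
p_value = chi2.sf(wald, df=1)

print(f"Wald = {wald:.2f}, p-value = {p_value:.3f}")
# Reject H0: beta = 0 when the p-value is below 0.05.
```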


The following parts of the SPSS output are used to evaluate the model:

I. The Omnibus Tests of Model Coefficients give an overall indication of how well the model performs compared with a model in which none of the predictors is entered. For this result we want a highly significant value (the p-value must be less than 0.05).

II. The Hosmer-Lemeshow test is used to test the goodness-of-fit of the model (a computational sketch is given after the worked conclusion below).

III. The Cox & Snell R-square and Nagelkerke R-square values provide an indication of the amount of variation in the dependent variable explained by the model.

IV. The classification table was also used in this study to see how well the model is able to predict the correct category. This table also provides the sensitivity and specificity of the model. Sensitivity measures the proportion of actual positives that are correctly identified, whereas specificity measures the proportion of actual negatives that are correctly identified. A model with a high percentage of both sensitivity and specificity is good and can be used for prediction.

H0: The logistic regression model is a good fit for the data.
H1: The logistic regression model is not a good fit for the data.

Test statistic: the Hosmer-Lemeshow chi-square statistic (or its p-value). Decision rule: accept H0 if the p-value > 0.05. Since the p-value (0.511) > 0.05, we accept H0 and conclude that the logistic regression model is a good fit for the data.
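SPSS reports the Hosmer-Lemeshow result directly. As a sketch of what the test does, the following Python function (an illustration; the grouping into deciles and other details can differ between packages) computes the statistic from the observed outcomes and the model's fitted probabilities.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, g=10):
    """Hosmer-Lemeshow goodness-of-fit test (sketch).

    Sorts cases by fitted probability, splits them into g groups, and compares
    observed with expected counts of events and non-events in each group using
    a chi-square statistic with g - 2 degrees of freedom.
    """
    order = np.argsort(p_hat)
    y_sorted = np.asarray(y)[order]
    p_sorted = np.asarray(p_hat)[order]

    statistic = 0.0
    for idx in np.array_split(np.arange(len(y_sorted)), g):
        observed_events = y_sorted[idx].sum()
        expected_events = p_sorted[idx].sum()
        observed_nonevents = len(idx) - observed_events
        expected_nonevents = len(idx) - expected_events
        statistic += ((observed_events - expected_events) ** 2 / expected_events
                      + (observed_nonevents - expected_nonevents) ** 2 / expected_nonevents)

    p_value = chi2.sf(statistic, df=g - 2)
    return statistic, p_value

# Usage (illustrative): hl_stat, hl_p = hosmer_lemeshow(y, model.predict())
# A p-value above 0.05, as with the 0.511 reported above, indicates an adequate fit.
```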

The R-square values suggested that this model can explain about 15.7 to 24.6 percent of the total variation in the dependent variable.

    Example of Sensitivity and Specificity Analysis


Overall predictive efficiency = (141 + 54)/263 = 74.1%
Sensitivity (actual positives that are correctly identified) = 141/170 = 0.8294 or 82.9%
Specificity (actual negatives that are correctly identified) = 54/93 = 0.5806 or 58.1%
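These figures can be reproduced from the counts implied by the classification table; a short Python sketch:

```python
# Counts implied by the classification results above
true_positives, false_negatives = 141, 170 - 141   # 170 actual positives
true_negatives, false_positives = 54, 93 - 54      # 93 actual negatives

total = true_positives + false_negatives + true_negatives + false_positives

sensitivity = true_positives / (true_positives + false_negatives)   # 141/170
specificity = true_negatives / (true_negatives + false_positives)   # 54/93
accuracy = (true_positives + true_negatives) / total                # overall efficiency

print(f"Sensitivity = {sensitivity:.1%}")   # 82.9%
print(f"Specificity = {specificity:.1%}")   # 58.1%
print(f"Overall     = {accuracy:.1%}")      # 74.1%
```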

Based on these results, we can conclude that the logistic regression model for Bahasa Inggeris can be used to predict Form Four students' achievement.