Generalized Linear Models


Transcript of Generalized Linear Models

Page 1: Generalized Linear Models

Generalized Linear Models

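• All the regression models treated so far share a common structure, which can be split into two parts:

The random part: the probability distribution assumed for the response.

The systematic part: the way the covariates enter the model through a linear predictor.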

• These two elements are the basic building blocks of generalized linear models.

Page 2: Generalized Linear Models

The systematic part

• Generalized linear model, systematic part: The covariates influence the distribution of the response through the linear predictor:

$\eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$

• A link function $g$ links the expectation $\mu_i = E(Y_i)$ to the linear predictor:

$g(\mu_i) = \eta_i$
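As a minimal sketch of these two ingredients, the following R snippet builds a linear predictor from a design matrix and maps it to a mean through an inverse link. The covariate values, the coefficients and the choice of the logit link are illustrative assumptions, not taken from the slides.

```r
# Hypothetical covariate and coefficients (illustration only)
x <- c(-2, -1, 0, 1, 2)
beta <- c(0.5, 1.2)                 # (intercept, slope)

X <- cbind(1, x)                    # design matrix with an intercept column
eta <- drop(X %*% beta)             # linear predictor: eta_i = beta0 + beta1 * x_i
mu <- plogis(eta)                   # inverse logit link: mu_i = g^{-1}(eta_i)

# Applying the link g (here qlogis, the logit) to mu recovers eta
all.equal(qlogis(mu), eta)
```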

Page 3: Generalized Linear Models

The generalization from linear models to GLM

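• GLMs generalize linear normal models in two directions: the response may follow any distribution in the exponential family rather than only the normal distribution, and the mean may be related to the linear predictor through a link function other than the identity.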

Page 4: Generalized Linear Models

Example: binomial distribution

• Definition: the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
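The definition can be checked numerically. The sketch below writes out the standard binomial probability mass function and compares it with R's built-in dbinom(); the values of n and p are arbitrary choices for illustration.

```r
n <- 10; p <- 0.3                   # hypothetical number of trials and success probability
k <- 0:n                            # possible numbers of successes

# Binomial probability mass function written out explicitly ...
pmf_manual <- choose(n, k) * p^k * (1 - p)^(n - k)

# ... agrees with R's built-in dbinom()
all.equal(pmf_manual, dbinom(k, size = n, prob = p))

sum(pmf_manual)                     # the probabilities sum to 1
```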

Page 5: Generalized Linear Models

Example

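• For the binomial distribution, $Y \sim \mathrm{Bin}(n, p)$:

$E(Y) = \mu = np, \qquad \mathrm{Var}(Y) = np(1 - p)$

• The variance is a function of the mean:

$\mathrm{Var}(Y) = \mu \left(1 - \frac{\mu}{n}\right)$

• The linear model for the logit, $\log\frac{p}{1 - p} = \beta_0 + \beta_1 x$, is a non-linear model for the probability, $p = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$.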
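Both points on this slide can be illustrated in a few lines of R. The number of trials and the coefficients below are hypothetical; the sketch only shows that the binomial variance can be written through the mean, and that equal steps on the logit scale give unequal steps on the probability scale.

```r
n <- 20                                   # hypothetical number of trials
p <- seq(0.1, 0.9, by = 0.2)
mu <- n * p                               # mean of Bin(n, p)
v  <- n * p * (1 - p)                     # variance of Bin(n, p)
all.equal(v, mu * (1 - mu / n))           # variance expressed through the mean

# A linear model on the logit scale is non-linear on the probability scale
beta0 <- -1; beta1 <- 0.8                 # hypothetical coefficients
x <- 0:4
prob <- plogis(beta0 + beta1 * x)
diff(beta0 + beta1 * x)                   # constant steps in the logit
diff(prob)                                # unequal steps in the probability
```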

Page 6: Generalized Linear Models

The exponential family

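• Many distributions encountered in practice (e.g. the normal, binomial, Poisson and Gamma distributions) share a common structure:

$f(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$

where $\theta$ is the natural parameter and $\phi$ is the dispersion parameter.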

Page 7: Generalized Linear Models

Example of the exponential family: Normal distribution

Page 8: Generalized Linear Models

Example of the exponential family: Binomial

Page 9: Generalized Linear Models

Example of the exponential family

• The Poisson distribution: a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed period of time, if these events occur with a known average rate and independently of the time since the last event.

• Examples: the number of phone calls received by a telephone operator in a 10-minute period; the number of typos per page made by a secretary.

Page 10: Generalized Linear Models

Poisson distribution

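• The Poisson distribution belongs to the exponential family:

$f(y; \lambda) = \frac{\lambda^y e^{-\lambda}}{y!} = \exp\{ y \log\lambda - \lambda - \log y! \}$

with natural parameter $\theta = \log\lambda$ and $b(\theta) = e^{\theta}$.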
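A quick numerical check of this claim: writing the Poisson probability as exp{y θ − b(θ) − log y!} with θ = log λ and b(θ) = exp(θ) should reproduce dpois(). The rate used below is an arbitrary value for illustration.

```r
lambda <- 3.5                       # hypothetical rate
y <- 0:15

theta <- log(lambda)                # natural parameter
b <- function(theta) exp(theta)     # b(theta) for the Poisson distribution

# exp{ y * theta - b(theta) - log(y!) }, using lgamma(y + 1) = log(y!)
dens_expfam <- exp(y * theta - b(theta) - lgamma(y + 1))
all.equal(dens_expfam, dpois(y, lambda))
```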

Page 11: Generalized Linear Models

Mean and variance in the exponential family

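• It can be shown that the mean and variance in the exponential family are:

$E(Y) = b'(\theta), \qquad \mathrm{Var}(Y) = b''(\theta)\, a(\phi)$

where $a(\phi)$ is the dispersion (scale) term of the family.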

Page 12: Generalized Linear Models

Mean and variance example: Poisson

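• For the Poisson model, with $b(\theta) = e^{\theta}$ and $\theta = \log\lambda$, the mean and variance are:

$E(Y) = b'(\theta) = e^{\theta} = \lambda, \qquad \mathrm{Var}(Y) = b''(\theta) = e^{\theta} = \lambda$

• To summarize, for any given distribution we obtain a specific form of b, which in turn determines the variance function.

• The converse is also true: a given variance function determines b, and hence the distribution within the exponential family.

• Hence specifying a distribution and specifying a variance function are two sides of the same coin, as long as we work with exponential families.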
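The link between b and the moments can also be checked numerically. The sketch below assumes the standard results E(Y) = b'(θ) and Var(Y) = b''(θ) for a family with unit dispersion, and uses the Bernoulli distribution, for which b(θ) = log(1 + exp(θ)), as an example that is not worked on the slides. The derivatives are approximated by central differences.

```r
b <- function(theta) log(1 + exp(theta))   # b(theta) for the Bernoulli distribution
p <- 0.3                                   # hypothetical success probability
theta <- qlogis(p)                         # natural parameter: logit of p
h <- 1e-4                                  # step size for finite differences

b1 <- (b(theta + h) - b(theta - h)) / (2 * h)                 # approximates b'(theta)
b2 <- (b(theta + h) - 2 * b(theta) + b(theta - h)) / h^2      # approximates b''(theta)

c(mean = b1, variance = b2)                # numerical derivatives of b
c(mean = p, variance = p * (1 - p))        # known Bernoulli mean and variance
```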

Page 13: Generalized Linear Models

Various variance functions
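One way to inspect variance functions is through R's family objects, whose variance component returns V(μ) for a vector of means. A short sketch, with arbitrary mean values chosen for illustration:

```r
mu <- c(0.2, 0.5, 0.8)      # hypothetical mean values

gaussian()$variance(mu)     # constant variance function: V(mu) = 1
poisson()$variance(mu)      # V(mu) = mu
binomial()$variance(mu)     # V(mu) = mu * (1 - mu)
Gamma()$variance(mu)        # V(mu) = mu^2
```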

Page 14: Generalized Linear Models

The link function

• The link function is a function $g$ which relates the mean $\mu$ to the linear predictor $\eta$:

$g(\mu) = \eta$

• Various link functions have been illustrated so far:
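In R, link functions can be inspected with make.link(), which returns the link and its inverse. The sketch below evaluates the logit and log links that appear elsewhere in this presentation, plus the identity link of the ordinary linear model; the mean value 0.25 is an arbitrary choice.

```r
mu <- 0.25                               # hypothetical mean value

for (name in c("identity", "log", "logit")) {
  lk <- make.link(name)
  eta <- lk$linkfun(mu)                  # g(mu): mean mapped to the linear predictor scale
  back <- lk$linkinv(eta)                # g^{-1}(eta): mapped back to the mean scale
  cat(sprintf("%-8s g(mu) = %8.4f  g^-1(g(mu)) = %.2f\n", name, eta, back))
}
```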

Page 15: Generalized Linear Models

Canonical link

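• For each distribution there is a specific link function which yields “nice” mathematical and numerical properties in connection with the estimation process. This link function is called the canonical link. It is the link for which the linear predictor equals the natural parameter of the exponential family, $\eta = \theta$: the identity for the normal, the logit for the binomial, the log for the Poisson and the inverse for the Gamma distribution.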

Page 16: Generalized Linear Models

Specification of GLM

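• In practice, a GLM is specified in three steps: choose the distribution of the response, choose the link function, and specify the linear predictor.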

• In this connection it is important to be aware of the following: Most statistical packages will by default use the canonical link function unless another one is explicitly provided.
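For R specifically, the default link of each family can be read off the family object. A small sketch:

```r
# Default link used by each family when no link is given explicitly
links <- sapply(list(gaussian = gaussian(), binomial = binomial(),
                     poisson = poisson(), Gamma = Gamma()),
                function(fam) fam$link)
links
```

In all four cases the default is the canonical link of the family.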

Page 17: Generalized Linear Models

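R code

• The glm function in R is used for fitting generalized linear models.

• Specification of the linear predictor: through a model formula, e.g. (with placeholder variable names)

y ~ x1 + x2

• Specification of the distribution and the link function: through the family argument, e.g.

family=Gamma(link=log)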
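Putting the two pieces together, a complete call might look as follows. This is a sketch on simulated data: the covariate x, the true coefficients (1, 2) and the Gamma shape parameter are assumptions made up for the illustration.

```r
set.seed(1)                                   # for reproducibility
n <- 200
x <- runif(n)                                 # hypothetical covariate
mu <- exp(1 + 2 * x)                          # log link: log(mu) = 1 + 2 * x
y <- rgamma(n, shape = 5, rate = 5 / mu)      # Gamma response with mean mu

# Linear predictor via the formula, distribution and link via the family argument
fit <- glm(y ~ x, family = Gamma(link = log))
coef(fit)                                     # estimates should be close to (1, 2)
```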

Page 18: Generalized Linear Models

• Remember that the specification of a distribution yields a specific variance function. Not all possible combinations of a distribution and a link function are allowed in R.

Page 19: Generalized Linear Models

Special aspects for binomial data

• Simulate artificial Bernoulli observations with different event probabilities for two groups (the number of trials N is equal to 1):

R code

group <- rep(c("A", "B"), c(30, 45))
logit.pi <- ifelse(group == "B", 0.7, 0.7 + 0.5)  # logit of the event probability in each group
group <- factor(group)
pi <- plogis(logit.pi)  # inverse logit; note that this masks R's built-in constant pi
N <- rep(1, length(group))
events <- rbinom(length(group), size = N, prob = pi)
dat <- data.frame(group, N, events)

Page 20: Generalized Linear Models

Analysis of simulated data

• Model: $\mathrm{events}_i \sim \mathrm{Bin}(N_i, \pi_i)$ with $\mathrm{logit}(\pi_i) = \beta_0 + \beta_1 \cdot 1\{\mathrm{group}_i = \mathrm{B}\}$

• The response is a two-column matrix containing events and non-events:

f1 <- glm(cbind(events, N - events) ~ group, family = binomial, data = dat)

• Define proportions:

dat$prop <- with(dat, events / N)

and use these as the response and the number of trials N as weights in the fit:

f2 <- glm(prop ~ group, family = binomial, weights = N, data = dat)

• Use the number of events directly as the response (valid here only because N = 1, so that events is a 0/1 variable):

f3 <- glm(events ~ group, family = binomial, data = dat)
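The three specifications describe the same model, which can be confirmed by comparing their coefficients. The sketch below repeats the simulation in a self-contained form; the seed and the variable name prob (used instead of pi to avoid masking the built-in constant) are additions for this illustration.

```r
set.seed(42)                                  # seed added for reproducibility
group <- factor(rep(c("A", "B"), c(30, 45)))
prob <- plogis(ifelse(group == "B", 0.7, 0.7 + 0.5))
N <- rep(1, length(group))
events <- rbinom(length(group), size = N, prob = prob)
dat <- data.frame(group, N, events)
dat$prop <- with(dat, events / N)

f1 <- glm(cbind(events, N - events) ~ group, family = binomial, data = dat)
f2 <- glm(prop ~ group, family = binomial, weights = N, data = dat)
f3 <- glm(events ~ group, family = binomial, data = dat)

# The three ways of supplying the response give the same coefficient estimates
rbind(f1 = coef(f1), f2 = coef(f2), f3 = coef(f3))
```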

Page 21: Generalized Linear Models

Fitting GLMs: logistic regression

• Consider a data set where the response variable takes only the values 0 or 1 and the single covariate is continuous.

• If we apply a simple linear regression model, $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, to fit the data, there are some problems: the fitted values can fall outside the interval [0, 1], and the variance of a binary response depends on its mean, so it cannot be constant.

• Conclusion: it is not appropriate to use simple linear regression to model data with binary responses.
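The problem with fitted values is easy to reproduce. In the following sketch, the data are simulated with assumed coefficients, and a straight-line fit is compared with a logistic fit on the same 0/1 response.

```r
set.seed(123)                                 # for reproducibility
x <- seq(-4, 4, length.out = 100)             # hypothetical continuous covariate
y <- rbinom(length(x), size = 1, prob = plogis(1.5 * x))

lin  <- lm(y ~ x)                             # simple linear regression on a 0/1 response
logi <- glm(y ~ x, family = binomial)         # logistic regression

range(fitted(lin))    # may extend below 0 or above 1
range(fitted(logi))   # stays strictly between 0 and 1
```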

Page 22: Generalized Linear Models

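Logistic regression

• Solution is to use the logistic function:

$\pi(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$

• The formal definition of the logistic model for a binary response with p covariates:

$Y_i \sim \mathrm{Bernoulli}(\pi_i), \qquad \log\frac{\pi_i}{1 - \pi_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$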

Page 23: Generalized Linear Models

Logistic regression

• How to interpret the model?

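• In the logistic model, the odds of “success” are:

$\frac{\pi_i}{1 - \pi_i} = \exp(\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip})$

so a one-unit increase in $x_j$, with the other covariates held fixed, multiplies the odds by $\exp(\beta_j)$.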

• The logistic model for binary data can be slightly modified
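The odds interpretation can be verified on a fitted model: the ratio of the fitted odds at x = 1 and x = 0 should equal the exponentiated slope. The data below are simulated with assumed coefficients purely for illustration.

```r
set.seed(7)                                   # for reproducibility
x <- rnorm(500)                               # hypothetical covariate
y <- rbinom(500, size = 1, prob = plogis(-0.5 + 0.9 * x))
fit <- glm(y ~ x, family = binomial)

# Fitted probabilities and odds at x = 0 and x = 1
p <- predict(fit, newdata = data.frame(x = c(0, 1)), type = "response")
odds <- p / (1 - p)

odds[2] / odds[1]          # odds ratio for a one-unit increase in x ...
exp(coef(fit)["x"])        # ... equals exp(beta1)
```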

Page 24: Generalized Linear Models

Modified to cover binomial data

Page 25: Generalized Linear Models

Bernoulli and Poisson distribution

• Likelihood:

• MLE estimates:
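As a hedged illustration, assume an independent, identically distributed sample. The standard result is then that the maximum likelihood estimate of both the Bernoulli probability and the Poisson rate is the sample mean. The sketch below maximizes the two log-likelihoods numerically on simulated samples (the sample sizes and true parameters are made up) and compares the result with the sample means.

```r
set.seed(11)                                  # for reproducibility
yb <- rbinom(200, size = 1, prob = 0.3)       # hypothetical Bernoulli sample
yp <- rpois(200, lambda = 4)                  # hypothetical Poisson sample

# Log-likelihoods as functions of the parameter
ll_bern <- function(p) sum(dbinom(yb, size = 1, prob = p, log = TRUE))
ll_pois <- function(lambda) sum(dpois(yp, lambda, log = TRUE))

# Numerical maximization over a bounded interval
p_hat <- optimize(ll_bern, interval = c(0.001, 0.999), maximum = TRUE)$maximum
lambda_hat <- optimize(ll_pois, interval = c(0.01, 50), maximum = TRUE)$maximum

c(p_hat = p_hat, mean_yb = mean(yb))          # agree up to the optimizer's tolerance
c(lambda_hat = lambda_hat, mean_yp = mean(yp))
```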

Page 26: Generalized Linear Models

Parameter estimation in GLMs

Page 27: Generalized Linear Models

IWLS Algorithm

• Iteratively weighted least squares (IWLS) algorithm:
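A minimal sketch of the idea for logistic regression, assuming the usual formulation: at each step a working response z and working weights w are computed from the current fit, and the coefficients are updated by a weighted least squares regression of z on the covariates. The simulated data, starting values and stopping rule are choices made for this illustration; glm() is used only as a reference.

```r
set.seed(2024)                                  # for reproducibility
n <- 300
x <- rnorm(n)                                   # hypothetical covariate
y <- rbinom(n, size = 1, prob = plogis(-0.3 + 1.1 * x))
X <- cbind(1, x)                                # design matrix

beta <- c(0, 0)                                 # starting values
for (iter in 1:25) {
  eta <- drop(X %*% beta)                       # linear predictor
  mu <- plogis(eta)                             # fitted means
  w <- mu * (1 - mu)                            # working weights (logit link)
  z <- eta + (y - mu) / w                       # working response
  # Weighted least squares step: solve (X' W X) beta = X' W z
  beta_new <- drop(solve(t(X) %*% (w * X), t(X) %*% (w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

beta                                            # IWLS estimates
coef(glm(y ~ x, family = binomial))             # reference fit for comparison
```

For the logit link the weights reduce to μ(1 − μ); other families and links lead to different expressions for w and z, while the weighted least squares step keeps the same form.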