
Estimating Marginal and Incremental Effects on Health Outcomes Using Flexible Link and Variance Function Models

Anirban Basu, Harris School of Public Policy Studies


and

Paul J. Rathouz*

Department of Health Studies

University of Chicago, Chicago, IL

September 18, 2003

Summary. We propose an extension to the estimating equations in generalized linear models to estimate parameters in the link function and variance structure simultaneously with the regression coefficients. Rather than focusing on the regression coefficients, the purpose of these models is inference about the mean of the outcome as a function of a set of covariates, and about various functionals of the mean function used to measure the effects of the covariates. A commonly used functional in econometrics, referred to as the marginal effect, is the partial derivative of the mean function with respect to any covariate, averaged over the empirical distribution of covariates in the model. We define an analogous parameter for discrete covariates. The proposed estimation algorithm not only helps to identify an appropriate link function and to suggest an underlying distribution for a specific application, but also serves as a robust estimator when no specific distribution for the outcome measure can be identified. Monte Carlo simulations indicate that the resulting parameter estimators are consistent. The method is illustrated with an analysis of inpatient expenditure data from a study of hospitalists.

Keywords: Econometric models, Estimating equations, Generalized linear models, Link function, Variance function

* Corresponding author: Paul Rathouz, Dept. of Health Studies, MC 2007, The University of Chicago, 5841 South Maryland Ave., Chicago, IL 60637. Tel.: +1-773-834-1970. Fax: +1-773-702-1979.


1. Introduction

Most analysis problems in health economics and biostatistics involve modeling a response variable $Y$ as a function of a vector $X = (X_1, X_2, \ldots, X_p)^T$ of covariates in a regression model. The goal is to estimate the mean function $\mu(x) \equiv E(Y \mid X = x)$ as a function of $x$. Usually, interest lies in estimating the effect of one or more of the covariates $X_j$ on $Y$. In many cases, this effect is measured by regression coefficients $\beta_j$, but in others it is quantified via more general functionals of $\mu(x) = E(Y \mid X = x)$. One such functional, commonly used in health economics, is the partial derivative of $\mu(x)$ with respect to one or more covariates $x_j$ in the vector $x = (x_1, x_2, \ldots, x_p)^T$. Denoted by $\partial\mu(x)/\partial x_j$, this parameter is the expected change in $Y$ due to an infinitesimal change in $x_j$ in the population of subjects with characteristics $X = x$. When $X_j$ is an indicator variable, an analogous parameter is

$$\Delta_j\mu(x_{-j}) = \mu(x_j = 1, x_{-j}) - \mu(x_j = 0, x_{-j}),$$

where $x_{-j}$ is the vector $x$ without $x_j$ (and similarly for $X_{-j}$). This parameter is the difference in $\mu(x)$ between the two groups defined by $X_j$ in the population of subjects with characteristics $X_{-j} = x_{-j}$.

In many applications, any particular combination of values, such as $X = x$, represents only a negligible fraction of the population, and researchers would like to assess the effect of $X_j$ in the whole population. To this end, econometricians commonly focus on the expected value of $\partial\mu(X)/\partial X_j$ over the population distribution of $X$. This parameter is termed the marginal effect of the covariate (Greene, 2000, p. 824) and is given by $\xi_j \equiv E_X[\partial\mu(X)/\partial X_j]$; it is the average change in $\mu(X)$ due to an infinitesimal change in $X_j$ for each member of the whole population, controlling for other factors. A related but conceptually different definition of the marginal effect is $E_{X_{-j}}\left[\left.\partial\mu(X)/\partial X_j\right|_{X_j = x_j}\right]$. Here $\partial\mu(X)/\partial X_j$ is evaluated at a specific value $X_j = x_j$, and the expected value is then taken over the distribution of $X_{-j}$, marginally with respect to $X_j$. An example is the effect of price ($X_j$) on the demand for a good ($Y$), averaged over all characteristics of consumers ($X_{-j}$) and evaluated at the current price ($x_j$).

Other functionals of $\mu(x)$ of interest include $E_X[\{\partial\mu(X)/\partial X_j\}/\mu(X)]$ and $E_X[\{\partial\mu(X)/\partial X_j\} \cdot X_j/\mu(X)]$. The former represents the average proportionate change in $\mu(X)$ due to an infinitesimal change in $X_j$ in the whole population. The latter is the well-known elasticity measure in economics, representing the population-average percentage change in the mean of $Y$ due to a one percent change in $X_j$.
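As an illustration of how these functionals differ, the following Python sketch approximates each of them by central finite differences, averaging over a sample of covariate vectors in place of the population distribution of $X$. The log-link mean function, its coefficients `B0`, `B1`, `B2` and the covariate sample are hypothetical. Under a log link, $\partial\mu/\partial x_j = \beta_j\mu(x)$, so the average proportionate change reduces to $\beta_j$ and the elasticity to $\beta_j$ times the sample mean of $X_j$, which provides a simple check.

```python
import math
from statistics import fmean

def partial(mu, x, j, h=1e-6):
    """Central finite-difference approximation of d mu(x) / d x_j."""
    up, dn = list(x), list(x)
    up[j] += h
    dn[j] -= h
    return (mu(up) - mu(dn)) / (2 * h)

def marginal_effect(mu, xs, j):
    """xi_j = E_X[d mu(X) / d X_j], with the sample average replacing E_X."""
    return fmean(partial(mu, x, j) for x in xs)

def marginal_effect_at(mu, xs, j, xj):
    """E_{X_-j}[d mu(X) / d X_j evaluated at X_j = xj]; X_-j keeps its sample values."""
    def fix(x):
        z = list(x)
        z[j] = xj
        return z
    return fmean(partial(mu, fix(x), j) for x in xs)

def proportionate_change(mu, xs, j):
    """E_X[{d mu(X) / d X_j} / mu(X)]."""
    return fmean(partial(mu, x, j) / mu(x) for x in xs)

def elasticity(mu, xs, j):
    """E_X[{d mu(X) / d X_j} * X_j / mu(X)]."""
    return fmean(partial(mu, x, j) * x[j] / mu(x) for x in xs)

# Hypothetical log-link mean function and covariate sample (illustration only).
B0, B1, B2 = -1.0, 0.5, 0.2

def mu(x):
    return math.exp(B0 + B1 * x[0] + B2 * x[1])

xs = [(0.1 * k, float(k % 3)) for k in range(20)]

print(marginal_effect(mu, xs, 0), proportionate_change(mu, xs, 0), elasticity(mu, xs, 0))
```

Because the derivative is taken numerically, the same functions apply unchanged to any other mean function, including the flexible-link models introduced in Section 2.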


When $X_j$ is a binary treatment variable, the parameter of interest is the incremental effect, given by $\pi_j \equiv E_{X_{-j}}[\Delta_j\mu(X_{-j})]$, where the expected value is over $X_{-j}$, marginally with respect to $X_j$. This parameter is the average difference in $\mu(X)$ between treating the whole population with $X_j = 1$ and treating it with $X_j = 0$. Another measure of incremental effect (Oaxaca, 1973) is given by $E_{X_{-j} \mid X_j = 1}[\Delta_j\mu(X_{-j})]$, where the expected value is over $X_{-j}$, conditional on $X_j = 1$. It represents the average effect of $X_j$ among members of the population who have $X_j = 1$; its negative is the average change in $\mu(X)$ when that subpopulation is moved to $X_j = 0$. Examples include the prevention of heart attacks in the population of patients who have had heart attacks, and the provision of insurance to people who do not have health insurance.
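The following sketch contrasts $\pi_j$ with the Oaxaca-type measure under a hypothetical log-link mean function in which treated units ($X_j = 1$) tend to have higher values of a second covariate. Both measures average $\Delta_j\mu$, but over different sets of $X_{-j}$ values, so they differ whenever $\mu$ is non-linear and the covariate distribution differs between the two groups. All names and numbers are illustrative.

```python
import math
from statistics import fmean

def delta_j(mu, x, j):
    """Delta_j mu(x_-j) = mu(x_j = 1, x_-j) - mu(x_j = 0, x_-j)."""
    one, zero = list(x), list(x)
    one[j], zero[j] = 1, 0
    return mu(one) - mu(zero)

def incremental_effect(mu, xs, j):
    """pi_j: Delta_j averaged over everyone (marginal over X_-j)."""
    return fmean(delta_j(mu, x, j) for x in xs)

def incremental_effect_treated(mu, xs, j):
    """Oaxaca-type measure: Delta_j averaged only over units with X_j = 1."""
    return fmean(delta_j(mu, x, j) for x in xs if x[j] == 1)

# Hypothetical model: treated units (x[0] = 1) tend to have higher severity x[1].
def mu(x):
    return math.exp(0.2 + 0.7 * x[0] + 0.5 * x[1])

xs = [(0, 0.0), (0, 0.5), (0, 1.0), (1, 1.0), (1, 1.5), (1, 2.0)]
print(incremental_effect(mu, xs, 0), incremental_effect_treated(mu, xs, 0))
```

Under a linear mean function both quantities would equal the treatment coefficient; here the treated subgroup has larger $\mu$, so its average effect is larger on the raw scale.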

The choice of which, if any, of these functionals is of interest depends on the specific research question being addressed. A recent example from health economics arises in a two-year study of hospitalists at the University of Chicago (Meltzer et al., 2002). Hospitalists are physicians who spend three months a year attending on inpatient wards, rather than the one month typical of most physicians in academic medical centers. The policy issues are whether hospitalists provide less expensive care than the traditional arrangement and, if so, what the magnitudes of these effects are. Preliminary evidence shows that, although at the beginning of the study there were no differences in utilization (inpatient expenditures and length of stay) between patients treated by hospitalists and those treated by non-hospitalists, at the end of the two-year period and after adjusting for patient demographics and clinical conditions, hospitalist patients had significantly lower utilization rates than non-hospitalist patients (Meltzer et al., 2002). The behavioral question is whether this difference is due to the greater cumulative inpatient experience (i.e., the number of prior disease-specific cases treated) of attending hospitalists over time. That is, do expenditures fall as the number of cases treated increases? And does the introduction of a covariate for disease-specific cumulative experience eliminate the explanatory power of the hospitalist indicator?

Letting $X_1$ be the hospitalist indicator variable and $X_2$ the disease-specific experience of the attending physician, the analysis involves modeling total inpatient expenditure as a function of $X_1$ and $X_2$, adjusting for patient demographics, clinical conditions and type of care received. The modeling goal is to estimate the incremental effect $\pi_1$ of being a hospitalist and the marginal effect $\xi_2$ of disease-specific experience on average inpatient expenditures across the patient and attending populations.

In some applications, the mean function $\mu(x)$ is assumed to be linear, $\mu(x) = x^T\beta$, and when this model is correct, the ordinary least squares (OLS) regression coefficients $\hat\beta_j$ are consistent estimators of all the variants of marginal and incremental effects discussed earlier, including $\xi_j$ and $\pi_j$. However, many outcome variables in health economics and biostatistics are characterized by non-negative values, heteroscedasticity, heavy right-tail skewness and high kurtosis, which make OLS on the raw scale of $Y$ inappropriate. Examples include inpatient length of stay and expenditure, income and earnings, and counts of psychiatric or other symptoms. In the hospitalist data, for instance, total inpatient expenditure (in dollars) is strictly positive and, on the raw scale, has a coefficient of skewness of about 6 and a coefficient of kurtosis of about 60. Problems with OLS in this setting include, at a minimum, instability of the resulting estimators due to skewness and inefficiency due to heteroscedasticity. In addition, experience shows that linear models for positive, skewed outcome variables tend to represent the true data-generating mechanisms less well than other models do. Such alternative models include those based on transformation of the response $Y$, as well as the class of generalized linear models (GLM; McCullagh and Nelder, 1989; Blough et al., 1999), which rely on transformation of the mean function $\mu(x)$. The methodology proposed in this paper is based on an extension of traditional GLMs, and on the use of this extended class to estimate marginal and incremental effects.

Econometricians have historically relied on logarithmic or other transformations of $Y$, followed by OLS regression of the transformed $Y$ on $X$, to overcome problems of heteroscedasticity, severe skewness and kurtosis (Box and Cox, 1964). In the hospitalist data, for example, the log-scale residuals for total inpatient expenditure are better behaved than those on the raw scale, with a coefficient of skewness of about 0.09 and a coefficient of kurtosis of about 4. The main drawback of transforming $Y$ is that the analysis does not yield a model for $\mu(x)$ on the original scale, which in many applications is the scale of interest. For example, in the hospitalist study, the scale of interest when modeling inpatient expenditure is dollars, while the scale of estimation in a log-OLS model is log-dollars. To draw inferences about the mean $\mu(x)$ on the natural scale of $Y$, one can assume that the error terms in the log-OLS model are normally distributed. In this case, $\mu(x)$ is given by the retransformation $\exp(x^T\beta + 0.5\sigma^2)$ (Duan, 1983; Manning, 1998), where $\sigma^2 = \mathrm{Var}[\log(Y) \mid X = x]$. When the log-scale errors are iid (and hence homoscedastic) but not necessarily normal, the regression coefficients $\beta$ are estimated consistently, but this retransformation is no longer a valid estimator of $\mu(x)$, although Duan's (1983) smearing estimator provides a consistent alternative. Even when the iid assumption holds, efficiency of the regression coefficient estimators is sacrificed unless normality also holds. Retransformation is further complicated by heteroscedasticity on the log scale, i.e., when $\sigma^2 = \sigma^2(x)$ (Manning, 1998; Mullahy, 1998). In practice, because we seldom know the true form of the heteroscedasticity, any retransformation can yield biased estimators of $\mu(x)$ unless considerable effort is devoted to studying its specific form.
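The two retransformations discussed above can be sketched as follows for a log-OLS fit. This is a minimal illustration assuming the log-scale residuals are already available; the residual values and the linear predictor value passed as `xb` are hypothetical, and the variance estimate uses a divisor of $n$ for simplicity.

```python
import math
from statistics import fmean

def normal_theory_factor(residuals):
    """exp(0.5 * sigma^2); valid if the log-scale errors are iid normal."""
    # Assumption: sigma^2 is estimated as the mean squared residual (divisor n);
    # an OLS fit would usually divide by n - p instead.
    sigma2 = fmean(r * r for r in residuals)
    return math.exp(0.5 * sigma2)

def smearing_factor(residuals):
    """Duan's (1983) smearing factor: the mean of exp(residual); needs iid errors only."""
    return fmean(math.exp(r) for r in residuals)

def retransformed_mean(xb, factor):
    """Estimate of mu(x) = exp(x'beta_hat) * factor for a log-OLS fit."""
    return math.exp(xb) * factor

# Hypothetical log-scale residuals from a log-OLS fit (right-skewed on purpose).
resid = [-0.6, -0.4, -0.3, -0.1, 0.0, 0.1, 0.2, 1.1]
print(retransformed_mean(7.5, normal_theory_factor(resid)),
      retransformed_mean(7.5, smearing_factor(resid)))
```

With heteroscedastic log-scale errors neither single multiplicative factor is appropriate, because the correct factor would itself vary with $x$.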


To avoid such retransformation problems, biostatisticians and some economists have focused on the use of GLMs with quasi-likelihood estimation (Wedderburn, 1974). In the GLM approach, a link function relates the expectation $\mu(x)$ of the outcome $Y$ to a linear specification $x^T\beta$ of covariates. The retransformation problem is simplified to a considerable extent by transforming $E(Y \mid X = x)$ instead of $Y$. Moreover, GLMs allow for heteroscedasticity through a variance structure relating $\mathrm{Var}(Y \mid X = x)$ to the mean. Correct specification of the variance structure yields efficient estimators (Crowder, 1987) and may correspond to an underlying distribution for the outcome measure. Although log-link models with gamma error distribution are the most common GLM application in health economics (Blough et al., 1999; Manning and Mullahy, 2001; Basu, Manning and Mullahy, 2002), this specification is not universally correct, and it is often difficult to identify the appropriate link function and variance structure a priori (Blough et al., 1999; Manning and Mullahy, 2001). Economic theory has difficulty enough predicting the signs of the partial derivatives of $\mu(x)$ with respect to some $x_j$, and it provides almost no guidance about the functional form of $\mu(x)$ or about the distributional characteristics of $Y$ given $X$.

One approach to this problem is to employ a series of diagnostic tests for candidate link and variance function models; examples include the Pregibon (1980) link test, the Hosmer-Lemeshow (1995) test and the modified Park test (Manning and Mullahy, 2001). However, in many cases, even when these tests detect problems, they provide no guidance on how to fix them. An alternative approach, which we pursue in this paper, is to estimate the link function and variance structure along with the other components of the model.

We propose a semi-parametric method to estimate the mean model $\mu(x)$ and the variance structure for $Y$ given $X$, concentrating on the case where $Y$ is a positive random variable. We extend the traditional GLM framework via a mean model that contains an additional parameter governing the link function through the Box-Cox transformation (Box and Cox, 1964), and we propose parametric models for the variance as a function of $\mu(x)$. We estimate the regression and link parameters via an extension of quasi-likelihood (Wedderburn, 1974), and the variance parameters via additional estimating equations. Finally, we show how to use the fitted model to make inferences about marginal ($\xi_j$) and incremental ($\pi_j$) effects. While we focus on $\xi_j$ and $\pi_j$, our methodology applies equally well to the variants of these parameters discussed above. The proposed flexible algorithm has two primary advantages: first, it helps to identify an appropriate link function and suggests an underlying model for the error distribution in a specific application; second, the proposed method itself is a robust estimator when no specific distribution for the outcome measure can be identified. That is, our approach is semi-parametric in that, while we employ parametric models for the mean and variance of $(Y \mid X)$, we make no further distributional assumptions and do not use full-likelihood estimation methods.

Other researchers have proposed methods for estimating link and/or variance functions along with regression coefficients in GLMs. In perhaps the closest work to our approach, Nelder and Pregibon (1987) suggest a profile extended quasi-likelihood function that may be used to estimate ancillary parameters in the link and variance functions. Their method requires repeated estimation, holding one or more of the ancillary parameters fixed while estimating $\beta$, and then varying these parameters over some interval of interest. Scallan et al. (1984), Mallick and Gelfand (1994), and Kaiser (1997) propose likelihood-based estimation of parametric link function models (Mallick and Gelfand within a Bayesian framework), but their approaches require full distributional assumptions, and neither Scallan et al. nor Kaiser includes any simulation studies to assess the performance of their estimators. Other approaches include quasi-likelihood methods in which the variance function is estimated but the link function is assumed known (Chiou and Müller, 1997), in which the link function is estimated non-parametrically but the variance function is assumed known (Li and Duan, 1989; Li, 1991; Weisberg and Welsh, 1994; Carroll, Fan, Gijbels and Wand, 1997), and in which non-parametric link and variance functions are estimated simultaneously (Chiou and Müller, 1998).

The rest of this paper is structured as follows. The model definition, basic assumptions and estimation algorithm are presented in Section 2. Section 3 presents a simulation study comparing the performance of the proposed estimator with several other GLM estimators, both in terms of consistency in estimating functionals of $\mu(x)$, specifically the marginal effects $\xi_j$, and in terms of the efficiency lost by estimating additional parameters, relative to the case in which the appropriate link and variance functions are known a priori. In Section 4, we illustrate the proposed method with an analysis of inpatient expenditure data from the hospitalist study.

2. Extended Estimating Equations (EEE) in Generalized Linear Models

2.1. Model

Consider $N$ iid observations $(Y_i, X_i)$, where $Y_i$ is a positive response variable and $X_i = (X_{i1}, \ldots, X_{ip})^T$ is a vector of covariates that may include an intercept. Interest lies in modeling the mean function $\mu(x) \equiv E(Y_i \mid X_i = x)$ and functionals thereof. Letting $\mu_i = \mu(X_i)$, we posit a generalized linear model (GLM; McCullagh and Nelder, 1989) wherein $g(\mu_i) = \eta_i$, $\eta_i = X_i^T\beta$, and $\beta$ is a $p \times 1$ vector of regression parameters. Here, $g(\cdot)$ is a strictly monotone, differentiable link function that relates $\mu_i$ to the linear predictor $\eta_i$. In addition, the variance of the outcome is given by $V_i = \mathrm{Var}(Y_i \mid X_i) = \phi h(\mu_i)$, where $\phi$ is the dispersion parameter and the variance function $h(\mu_i)$ is positive for all $\mu_i$. In traditional GLMs, the link function and variance structure are fixed by the investigator.

Our goal is to extend the GLM for the mean and the variance of $Y_i$ to include families of link functions $g(\cdot; \lambda)$ and of variance functions $h(\cdot; \theta_1, \theta_2)$. To this end, define a parametric family of link functions indexed by $\lambda$:

$$\eta_i = g(\mu_i; \lambda) = \begin{cases} (\mu_i^{\lambda} - 1)/\lambda, & \text{if } \lambda \neq 0, \\ \log(\mu_i), & \text{if } \lambda = 0 \end{cases}$$

(McCullagh and Nelder, 1989, Chap. 2; Box and Cox, 1964). Notice that $g(\mu_i; \lambda)$ is continuous in $\lambda$ and has continuous first derivatives in $\lambda$ for all $\mu_i > 0$ and all $\lambda$, including $\lambda = 0$. As $\lambda$ varies, the scale of the linear predictor $\eta_i$ relative to $\mu_i$ also varies. However, at $\eta_i = 0$ we have $\mu_i = 1$ and $\partial\mu_i/\partial\eta_i = 1$ for all $\lambda$. Therefore, the family of link functions $g(\mu_i; \lambda)$ is standardized such that $\mu_i = 1$ and $\partial\mu_i/\partial X_{ij} = \beta_j$ ($j = 1, \ldots, p$) when $X_i^T\beta = 0$, across all values of $\lambda$. For $g(\mu_i; \lambda)$ to be a valid link for a given $\lambda$, we restrict the linear predictor so that $\lambda\eta_i + 1 > 0$, which implies $\eta_i > -1/\lambda$ if $\lambda > 0$ and $\eta_i < -1/\lambda$ if $\lambda < 0$; no restriction is needed when $\lambda = 0$. This ensures that $\mu_i > 0$ for all $i$.
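A minimal Python sketch of the Box-Cox link family and its inverse, including the validity restriction $\lambda\eta_i + 1 > 0$ and the standardization $\mu_i = 1$, $\partial\mu_i/\partial\eta_i = \mu_i^{1-\lambda} = 1$ at $\eta_i = 0$. The function names are our own, and raising an error outside the valid region is one design choice among several (an estimation routine might instead constrain its step size).

```python
import math

def boxcox_link(mu, lam):
    """eta = g(mu; lambda) = (mu**lam - 1) / lam, or log(mu) when lambda == 0."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    if lam == 0:
        return math.log(mu)
    return (mu ** lam - 1.0) / lam

def boxcox_inverse(eta, lam):
    """mu = (lam * eta + 1)**(1 / lam), or exp(eta) when lambda == 0."""
    if lam == 0:
        return math.exp(eta)
    base = lam * eta + 1.0
    if base <= 0:
        # Outside the restriction lambda * eta + 1 > 0 the link is not valid.
        raise ValueError("need lambda * eta + 1 > 0")
    return base ** (1.0 / lam)

def dmu_deta(eta, lam):
    """d mu / d eta = mu**(1 - lambda); equals 1 at eta = 0 for every lambda."""
    return boxcox_inverse(eta, lam) ** (1.0 - lam)

for lam in (-1.0, 0.0, 0.5, 1.0):
    print(lam, boxcox_inverse(0.3, lam), dmu_deta(0.3, lam))
```

The special cases $\lambda = 1$, $0.5$, $0$ and $-1$ correspond, up to an affine rescaling of $\eta$, to the identity, square-root, log and inverse links.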

Similar to the link function, define a family $h(\mu_i; \theta_1, \theta_2)$ of variance functions indexed by $(\theta_1, \theta_2)$. We consider two such families. The first, which we refer to as the Power Variance (PV) family, sets $h(\mu_i; \theta_1, \theta_2) = \theta_1\mu_i^{\theta_2}$. It includes as special cases the variances of several standard distributions used for modeling health outcomes; these distributions are listed in Table 1. An alternative is the Quadratic Variance (QV) family, given by $h(\mu_i; \theta_1, \theta_2) = \theta_1\mu_i + \theta_2\mu_i^2$; standard distributions corresponding to this form of variance are also listed in Table 1. Note that we have subsumed the dispersion parameter $\phi$ into the variance functions.
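Table 1 is not reproduced in this section; as a sketch, the following Python code writes several standard variance functions as members of the PV and QV families, with the dispersion absorbed into $\theta_1$. The parameter values `sigma2`, `k` and `nu` are hypothetical, and the pairings are textbook variance functions rather than a copy of Table 1.

```python
def pv_variance(mu, theta1, theta2):
    """Power Variance (PV) family: h(mu) = theta1 * mu**theta2."""
    return theta1 * mu ** theta2

def qv_variance(mu, theta1, theta2):
    """Quadratic Variance (QV) family: h(mu) = theta1 * mu + theta2 * mu**2."""
    return theta1 * mu + theta2 * mu ** 2

# Standard variance functions expressed within each family
# (hypothetical values: normal variance sigma2, gamma/NB shape k, inverse Gaussian shape nu).
mu, sigma2, k, nu = 3.0, 2.0, 0.5, 4.0
examples = {
    "normal (constant variance)": pv_variance(mu, sigma2, 0),   # sigma2
    "Poisson": pv_variance(mu, 1.0, 1),                          # mu
    "gamma, shape k": pv_variance(mu, 1.0 / k, 2),               # mu^2 / k
    "inverse Gaussian, shape nu": pv_variance(mu, 1.0 / nu, 3),  # mu^3 / nu
    "negative binomial, size k": qv_variance(mu, 1.0, 1.0 / k),  # mu + mu^2 / k
}
for name, v in examples.items():
    print(f"{name}: {v}")
```

The Poisson and gamma forms belong to both families (QV with $\theta_2 = 0$ and with $\theta_1 = 0$, respectively), whereas the inverse Gaussian form is available only in PV and the negative binomial form only in QV.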

2.2. Estimation

In traditional GLMs (McCullagh and Nelder, 1989), the regression parameters $\beta$ are estimated using the well-known quasi-score equations (Wedderburn, 1974),

$$\sum_{i=1}^{N} G_{\beta_j}^{i} = \sum_{i=1}^{N} (Y_i - \mu_i)\, V_i^{-1}\, \frac{\partial \mu_i}{\partial \beta_j} = 0, \quad j = 1, \ldots, p. \tag{2.1}$$

McCullagh (1983) showed that solving equations (2.1) is equivalent to maximizing a quasi-likelihood function that behaves in many ways like a likelihood for the regression parameters.


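Building on (2.1), we define an extended set of estimating functions for the parameter vector $\gamma = (\beta^T, \lambda, \theta_1, \theta_2)^T$. For the $i$th individual, define

$$\begin{aligned} G_{\beta_j}^{i} &= (Y_i - \mu_i)\, V_i^{-1}\, \partial\mu_i/\partial\beta_j, \quad j = 1, \ldots, p, \\ G_{\lambda}^{i} &= (Y_i - \mu_i)\, V_i^{-1}\, \partial\mu_i/\partial\lambda, \\ G_{\theta_1}^{i} &= \{(Y_i - \mu_i)^2 - V_i\}\, V_i^{-2}\, \partial V_i/\partial\theta_1, \\ G_{\theta_2}^{i} &= \{(Y_i - \mu_i)^2 - V_i\}\, V_i^{-2}\, \partial V_i/\partial\theta_2. \end{aligned} \tag{2.2}$$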

The quasi-score function $G_{\lambda}^{i}$ is similar to (2.1) because, like $\beta$, $\lambda$ is a mean-model parameter. The estimating functions $G_{\theta_1}^{i}$ and $G_{\theta_2}^{i}$ for the variance parameters $(\theta_1, \theta_2)$ can be motivated via the likelihood under the assumption that $Y_i \mid X_i$ is normal (Hall and Severini, 1998). The log-likelihood contribution for the normal model is $L_i = -0.5(Y_i - \mu_i)^2 V_i^{-1} - 0.5\log V_i$, and the maximum likelihood equations from $L_i$ with respect to $\theta_1$ and $\theta_2$ are, up to a constant factor of 0.5, $G_{\theta_1}^{i}$ and $G_{\theta_2}^{i}$ in (2.2). However, these equations remain unbiased even if we relax the assumption of normality: they provide consistent estimators of $\theta_1$ and $\theta_2$ provided that the mean model $\mu_i$ and the variance model $h(\mu_i; \theta_1, \theta_2)$ are correct. Define $G_{\gamma}^{i} = (G_{\beta_1}^{i}, \ldots, G_{\beta_p}^{i}, G_{\lambda}^{i}, G_{\theta_1}^{i}, G_{\theta_2}^{i})^T$ and the extended estimating function for $\gamma$ as $G_{\gamma} = \sum_{i=1}^{N} G_{\gamma}^{i}$.
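The per-observation estimating functions in (2.2) can be sketched in Python for the PV family. The expression used for $\partial\mu_i/\partial\lambda$ is our own derivation from $\mu_i = (\lambda\eta_i + 1)^{1/\lambda}$ and is not given in the text: $\partial\mu_i/\partial\lambda = \mu_i\{\eta_i/(\lambda(\lambda\eta_i + 1)) - \log(\lambda\eta_i + 1)/\lambda^2\}$, with limit $-\mu_i\eta_i^2/2$ at $\lambda = 0$. Function names are hypothetical, and the root-finding step (the Fisher scoring algorithm of Appendix B) is not shown.

```python
import math

def mu_and_derivs(x, beta, lam):
    """Box-Cox mean mu = g^{-1}(x'beta; lambda) with d mu/d beta_j and d mu/d lambda."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    if lam == 0:
        mu = math.exp(eta)
        dmu_dlam = -0.5 * mu * eta ** 2              # limit of the general formula at lambda = 0
    else:
        base = lam * eta + 1.0
        if base <= 0:
            raise ValueError("link invalid: need lambda * eta + 1 > 0")
        mu = base ** (1.0 / lam)
        dmu_dlam = mu * (eta / (lam * base) - math.log(base) / lam ** 2)
    dmu_dbeta = [xj * mu ** (1.0 - lam) for xj in x]  # d mu/d eta = mu**(1 - lambda)
    return mu, dmu_dbeta, dmu_dlam

def eee_scores_pv(y, x, beta, lam, theta1, theta2):
    """G_gamma^i = (G_beta_1, ..., G_beta_p, G_lambda, G_theta1, G_theta2) under PV."""
    mu, dmu_dbeta, dmu_dlam = mu_and_derivs(x, beta, lam)
    v = theta1 * mu ** theta2                        # V_i = theta1 * mu_i**theta2
    dv_dtheta1 = mu ** theta2
    dv_dtheta2 = v * math.log(mu)
    r = y - mu
    g_beta = [r / v * d for d in dmu_dbeta]          # quasi-score for beta
    g_lam = r / v * dmu_dlam                         # quasi-score for lambda
    w = (r * r - v) / v ** 2                         # {(Y - mu)^2 - V} V^{-2}
    return g_beta + [g_lam, w * dv_dtheta1, w * dv_dtheta2]

def total_score(data, beta, lam, theta1, theta2):
    """G_gamma = sum_i G_gamma^i over (y, x) pairs; the EEE estimator solves G_gamma = 0."""
    total = None
    for y, x in data:
        g = eee_scores_pv(y, x, beta, lam, theta1, theta2)
        total = g if total is None else [a + b for a, b in zip(total, g)]
    return total

print(total_score([(1.2, [1.0, 0.4]), (0.7, [1.0, 0.9])], [0.2, 0.5], 0.0, 1.0, 2.0))
```

The QV family would change only the three lines that define `v` and its derivatives with respect to $\theta_1$ and $\theta_2$.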

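We estimate $\gamma$ by solving $G_{\gamma} = 0$, yielding the estimator $\hat\gamma_N$. Under mild regularity conditions, $\hat\gamma_N \to_p \gamma_0$ as $N \to \infty$, and $(\hat\gamma_N - \gamma_0)$ is asymptotically normal with mean 0 and covariance matrix $A_N$ given by

$$A_N = \left\{E\left(-\frac{\partial G_{\gamma}}{\partial \gamma^T}\right)\right\}^{-1} \frac{N}{N-1} \sum_{i=1}^{N} E\left(G_{\gamma}^{i} G_{\gamma}^{iT}\right) \left\{E\left(-\frac{\partial G_{\gamma}^{T}}{\partial \gamma}\right)\right\}^{-1}. \tag{2.3}$$

A sketch of the proof is given in Appendix A. Replacing $\gamma$ by $\hat\gamma_N$ and $E(G_{\gamma}^{i} G_{\gamma}^{iT})$ by $G_{\gamma}^{i} G_{\gamma}^{iT}$ in (2.3) yields a sandwich estimator of the variance-covariance matrix of $\hat\gamma_N$ (Huber, 1972; Liang and Zeger, 1986).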

Some studies yield clustered observations. For example, in the hospitalist study, individual patients are clustered within physicians. Because $G_{\gamma}$ is unbiased, the estimator $\hat\gamma_N$ is still consistent for $\gamma$, albeit inefficient, but the variance-covariance estimator given by (2.3) is inconsistent. However, (2.3) can be modified to yield a sandwich estimator that accounts for the clustering. Specifically, let $M$ denote the total number of clusters in the sample, where the $m$th cluster ($m = 1, 2, \ldots, M$) contains $N_m$ observations $(Y_{mi}, X_{mi})$ with estimating functions $G_{\gamma}^{mi}$, $i = 1, 2, \ldots, N_m$. The asymptotic variance-covariance matrix of $\hat\gamma_N$ is given by

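$$A_M = \left\{E\left(-\frac{\partial G_{\gamma}}{\partial \gamma^T}\right)\right\}^{-1} \frac{M}{M-1} \sum_{m=1}^{M} E\left(Q_{\gamma}^{m} Q_{\gamma}^{mT}\right) \left\{E\left(-\frac{\partial G_{\gamma}^{T}}{\partial \gamma}\right)\right\}^{-1}, \tag{2.4}$$

where $Q_{\gamma}^{m} = \sum_{i=1}^{N_m} G_{\gamma}^{mi}$. Replacing $\gamma$ by $\hat\gamma_N$ and $E(Q_{\gamma}^{m} Q_{\gamma}^{mT})$ by $Q_{\gamma}^{m} Q_{\gamma}^{mT}$ in (2.4) yields the sandwich estimator of the variance-covariance matrix of $(\hat\gamma_N - \gamma_0)$ that accounts for clustering in the data.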
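The sandwich forms (2.3) and (2.4) share one structure: an inverse "bread" matrix on each side of a "meat" matrix built from score vectors, with $N/(N-1)$ or $M/(M-1)$ as the small-sample factor. The pure-Python sketch below assumes that the bread is the observed $-\partial G_{\gamma}/\partial\gamma^T$ at $\hat\gamma_N$ (standing in for its expectation) and that the per-cluster sums $Q_{\gamma}^{m}$ have already been computed; passing one score vector per observation gives the unclustered version. The helper names are hypothetical.

```python
def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[piv][col] == 0.0:
            raise ValueError("singular matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def sandwich(bread, cluster_scores):
    """
    A = B^{-1} [K/(K-1) sum_k q_k q_k^T] B^{-T}, the common form of (2.3) and (2.4).
    bread: B = -dG_gamma/dgamma^T at the estimate (p x p list of lists).
    cluster_scores: one score vector per cluster (Q^m); one per observation gives (2.3).
    """
    k = len(cluster_scores)
    p = len(bread)
    meat = [[0.0] * p for _ in range(p)]
    for q in cluster_scores:
        for i in range(p):
            for j in range(p):
                meat[i][j] += q[i] * q[j]
    meat = [[v * k / (k - 1) for v in row] for row in meat]
    b_inv = mat_inv(bread)
    return mat_mul(mat_mul(b_inv, meat), transpose(b_inv))
```

As a check, for the one-parameter estimating function $G^i = Y_i - \mu$ the bread is $N$, and the unclustered result reduces to the sample variance divided by $N$.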

A Fisher scoring algorithm used to solve 0Gγ = is described in Appendix B. The algorithm is

implemented in Stata SE (StataCorp, 2001) and is available on request from the corresponding author.

Further details are posted at http:\\home.uchicago.edu\~abasu\EEE\EEEWeb.pdf.

2.3. Estimation of marginal and incremental effects

For the model proposed in Section 2.1, the partial derivative of $\mu(x)$ with respect to a covariate $x_j$ is given by $\beta_j\,\mu(x^T\beta; \lambda)^{1-\lambda}$. Thus, an estimator $\hat\xi_j$ of the marginal effect $\xi_j$ of any continuous covariate $X_j$ on $Y$ is given by

$$\hat\xi_j = \hat{E}_X\left[\frac{\partial \hat\mu(X)}{\partial X_j}\right] = N^{-1} \sum_{i=1}^{N} \hat\beta_j\, \hat\mu(X_i^T\hat\beta; \hat\lambda)^{1-\hat\lambda}. \tag{2.5}$$

Here the hat on $\hat\mu$ indicates that $\beta$ and $\lambda$ have been estimated, and the hat on $\hat{E}_X$ indicates that the sample average has replaced the population expected value. To estimate the incremental effect $\pi_j$ of an indicator variable $X_j$, we use the method of recycled predictions (StataCorp, 2001). In this method, after estimating the parameters $\gamma$, we assign the value $k = 1$ or $0$ to $X_j$ for every observation in the dataset, keeping the other covariates unchanged, and then average the predictions $\hat\mu(X_{i,-j}, X_{ij} = k)$ across $i$. Here, $X_{ij}$ and $X_{i,-j}$ are the values of $X_j$ and $X_{-j}$ for the $i$th observation. The estimated incremental effect $\hat\pi_j$ for covariate $X_j$ is

$$\hat\pi_j = \hat{E}_{X_{-j}}\left[\Delta_j\hat\mu(X_{-j})\right] = N^{-1} \sum_{i=1}^{N} \left\{\hat\mu(X_{i,-j}, X_{ij} = 1) - \hat\mu(X_{i,-j}, X_{ij} = 0)\right\}. \tag{2.6}$$
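Equations (2.5) and (2.6) translate directly into code once $\hat\beta$ and $\hat\lambda$ are available. In the sketch below, the design matrix and the "fitted" values `beta_hat` and `lam_hat` are hypothetical placeholders rather than estimates from any data; the marginal effect uses the analytic derivative $\hat\beta_j\hat\mu^{1-\hat\lambda}$ and the incremental effect uses recycled predictions.

```python
import math
from statistics import fmean

def boxcox_mean(eta, lam):
    """mu = (lam * eta + 1)**(1 / lam), or exp(eta) when lambda = 0."""
    if lam == 0:
        return math.exp(eta)
    return (lam * eta + 1.0) ** (1.0 / lam)

def linpred(x, beta):
    return sum(xj * bj for xj, bj in zip(x, beta))

def marginal_effect_hat(X, beta, lam, j):
    """(2.5): N^{-1} sum_i beta_j * mu_hat(X_i' beta; lambda)**(1 - lambda)."""
    return fmean(beta[j] * boxcox_mean(linpred(x, beta), lam) ** (1.0 - lam) for x in X)

def incremental_effect_hat(X, beta, lam, j):
    """(2.6): recycled predictions, setting X_j = 1 and then X_j = 0 for every row."""
    def predict(x, k):
        z = list(x)
        z[j] = k
        return boxcox_mean(linpred(z, beta), lam)
    return fmean(predict(x, 1.0) - predict(x, 0.0) for x in X)

# Hypothetical fitted values: columns are (intercept, continuous X_1, binary X_2).
X = [(1.0, 0.1, 0.0), (1.0, 0.5, 1.0), (1.0, 0.9, 0.0), (1.0, 0.3, 1.0)]
beta_hat, lam_hat = [0.5, 0.8, -0.2], 0.5
print(marginal_effect_hat(X, beta_hat, lam_hat, 1), incremental_effect_hat(X, beta_hat, lam_hat, 2))
```

With $\lambda = 1$ the mean is linear in $\eta$ ($\mu = \eta + 1$), so the recycled-prediction estimate collapses to the coefficient $\hat\beta_j$, which is a convenient sanity check.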

Variance estimators for the marginal and incremental effect estimators $\hat\xi_j$ and $\hat\pi_j$ are obtained using Taylor series approximations. These depend both on the variance of $(\hat\beta, \hat\lambda)$ and on the variance of the covariates $X$ in the population of interest. We show in Appendix C that the variance of (2.6) is given by

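$$\mathrm{Var}(\hat\pi_j) = \mathrm{Var}\left\{\hat{E}_{X_{-j}}\left[\Delta_j\mu(X_{-j})\right]\right\} + \left(\frac{\partial \pi_j}{\partial \gamma}\right)^{T} A_N \left(\frac{\partial \pi_j}{\partial \gamma}\right). \tag{2.7}$$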

In (2.7), the first term is the sampling variance of $\hat\pi_j$ due to using the empirical expected value $\hat{E}_{X_{-j}}[\Delta_j\mu(X_{-j})]$ rather than the population expected value, treating $\gamma$ as known. The second term arises because $\gamma$ is estimated. An estimator of the variance (2.7) is obtained by replacing $\gamma$ with $\hat\gamma_N$ and replacing the first term in (2.7) by

$$N^{-1}(N-1)^{-1} \sum_{i=1}^{N} \left\{\Delta_j\hat\mu(X_{i,-j}) - \hat\pi_j\right\}^2. \tag{2.8}$$

Variance estimators for $\hat\pi_j$ that follow from (2.7) may be modified to account for clustered observations. First, $A_N$ in (2.7) may be replaced with $A_M$ from (2.4). Second, in the presence of clustering, (2.8) is replaced by

$$\frac{M}{N^2(M-1)} \sum_{m=1}^{M} \left(\hat\pi_m - N_m\hat\pi\right)^2, \tag{2.9}$$

where $\hat\pi_m = \sum_{i=1}^{N_m} \hat\pi_{mi}$, $\hat\pi = N^{-1} \sum_{m=1}^{M} \sum_{i=1}^{N_m} \hat\pi_{mi}$, $N = \sum_{m} N_m$, $M$ is the total number of clusters, and $\hat\pi_{mi} = \Delta_j\hat\mu(X_{mi,-j})$ is the contribution to $\hat\pi_j$ from the $i$th observation of the $m$th cluster.
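The first-term estimators (2.8) and (2.9) need only the per-observation contributions $\hat\pi_{mi} = \Delta_j\hat\mu(X_{mi,-j})$. The sketch below assumes, as in (2.9), that $\hat\pi_m$ is the within-cluster sum, so each cluster total is compared with $N_m\hat\pi$; with every observation in its own cluster, (2.9) reduces to (2.8). The contribution values are hypothetical.

```python
from statistics import fmean

def var_first_term(contribs):
    """(2.8): N^{-1} (N-1)^{-1} sum_i (pi_i - pi_hat)^2, with pi_i = Delta_j mu_hat(X_{i,-j})."""
    n = len(contribs)
    pi_hat = fmean(contribs)
    return sum((c - pi_hat) ** 2 for c in contribs) / (n * (n - 1))

def var_first_term_clustered(clusters):
    """(2.9): M / (N^2 (M-1)) * sum_m (pi_m - N_m * pi_hat)^2, pi_m = sum within cluster m."""
    m = len(clusters)
    n = sum(len(c) for c in clusters)
    pi_hat = sum(sum(c) for c in clusters) / n
    ss = sum((sum(c) - len(c) * pi_hat) ** 2 for c in clusters)
    return m * ss / (n ** 2 * (m - 1))

# Hypothetical contributions: positively correlated within clusters.
clusters = [[0.1, 0.1], [0.7, 0.7]]
flat = [c for cl in clusters for c in cl]
print(var_first_term(flat), var_first_term_clustered(clusters))
```

When contributions are positively correlated within clusters, as in this toy example, the clustered version is larger than the naive one.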

An estimator of $\mathrm{Var}(\hat\xi_j)$ analogous to (2.7) may be obtained through a similar approach.

3. Simulations

3.1. Design

To evaluate the performance of the extended estimating equations (EEE) approach for estimating $\mu(x)$ and associated effects, we performed a simulation study comparing EEE with alternative estimators under a variety of data-generating processes. We consider processes yielding strictly positive outcomes $Y$ that are skewed to the right and kurtotic. They differ in their degrees of skewness and kurtosis and, through different link functions, in their dependence on a single covariate $X$. For all data-generating processes, the values of $X$ are evenly spaced over the interval $[0, 1]$, the linear predictor is $\eta = \beta_0 + \beta_1 X$ with $\beta_1 = 1$, and $\beta_0$ is selected such that $E(Y) = 1$ marginally over $X$.


The first data-generating process for $(Y \mid X)$ is the gamma distribution with shape parameter 0.5, which yields a monotonically declining density. We selected this shape parameter because such densities are prevalent in the health economics literature. The scale parameter $\mu$ is related to $\eta$ through three different link functions: log ($\log(\mu) = \eta$), inverse ($\mu^{-1} = \eta$) and square root ($\mu^{0.5} = \eta$). The second generating mechanism for $(Y \mid X)$ is the inverse Gaussian distribution with shape parameter 1 and scale parameter $\mu$ related to $\eta$ via the log link and the identity link ($\mu = \eta$). The final process we consider generates $(Y \mid X)$ as log-normal with log-scale mean equal to $\eta$ and heteroscedastic log-scale variance equal to either $v(X) = (1 + X)$ or $v(X) = (1 + X)^2$. In this case, $\mu(x) = \exp\{\beta_0 + \beta_1 x + 0.5\,v(x)\}$. We also studied Poisson and negative binomial data-generating mechanisms for $(Y \mid X)$; results were very similar to those for the gamma distribution and are available on request. For each data-generating mechanism, we generated 500 replicates at each of the sample sizes $N = 2{,}000$ and $N = 10{,}000$.
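A sketch of one replicate of the gamma data-generating processes follows. It assumes that $\mu$ is the conditional mean of $Y$ (so that the gamma scale is $\mu/0.5$) and uses Python's `random.gammavariate`, whose mean is `alpha * beta`. $\beta_0$ is calibrated by bisection so that the average of $\mu(x)$ over the evenly spaced grid equals 1, which is how we read the requirement $E(Y) = 1$; the bracketing intervals passed to the bisection are our own choices, and the inverse Gaussian and log-normal processes are not shown.

```python
import math
import random
from statistics import fmean

# Inverse links used in the design (eta = beta0 + beta1 * x, with beta1 = 1).
INV_LINKS = {
    "log": math.exp,                   # log(mu) = eta
    "inverse": lambda eta: 1.0 / eta,  # 1 / mu = eta, requires eta > 0
    "sqrt": lambda eta: eta ** 2,      # mu**0.5 = eta, requires eta > 0
}

def calibrate_beta0(xs, inv_link, lo, hi, beta1=1.0, tol=1e-12):
    """Bisection for beta0 such that the average of mu(x) over xs equals 1."""
    def excess(b0):
        return fmean(inv_link(b0 + beta1 * x) for x in xs) - 1.0
    f_lo = excess(lo)
    if f_lo * excess(hi) > 0:
        raise ValueError("bracket [lo, hi] does not contain a root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = excess(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def simulate_gamma(n, link, lo, hi, shape=0.5, seed=1):
    """One replicate: X evenly spaced on [0, 1], Y | X ~ gamma(shape) with mean mu(x)."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    inv = INV_LINKS[link]
    b0 = calibrate_beta0(xs, inv, lo, hi)
    # gammavariate(alpha, beta) has mean alpha * beta, so beta = mu / shape.
    ys = [rng.gammavariate(shape, inv(b0 + x) / shape) for x in xs]
    return xs, ys, b0

xs, ys, b0 = simulate_gamma(2000, "sqrt", 0.0, 2.0)
print(b0, fmean(ys))
```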

For each replicate data set, we estimated the mean function $\mu(x)$ and the variance $\mathrm{Var}(Y \mid X = x)$ as functions of $x$ using five different estimators. The first three were gamma, Poisson and inverse Gaussian regression models of $Y$ on $X$, each with a log link function; these regression models determine the variance as a function of the mean. The last two were the EEE estimators with the PV and QV variance models. Each of the five estimators set $\eta = \beta_0 + \beta_1 X$. Note that all of these specifications are incorrect for the heteroscedastic log-normal data with quadratic variance; nevertheless, we wanted to see how the proposed estimators perform under such misspecification. To evaluate the PV and QV model estimators, we examined the mean and standard deviation of $\hat\lambda$, $\hat\theta_1$ and $\hat\theta_2$ across the 500 replicate data sets. We also studied the negative binomial regression model estimator, but omit the results because they were similar to those of Poisson regression.

For each fitted mean and variance model, we computed a variety of parameter estimates: (1) the fitted mean $\hat\mu(x)$ at the midpoint of each decile of the distribution of $X$; (2) the fitted partial derivative $\partial\hat\mu(x)/\partial x$ at the midpoint of each decile of $X$ and also at $x = 0.2, 0.5, 0.8$; (3) the marginal effect $\hat\xi$ computed using (2.5); and (4) the fitted variance $\widehat{\mathrm{Var}}(Y \mid X = x)$ at the midpoint of each decile of the distribution of $X$. By computing the mean of these statistics across all 500 replicates, we obtained the percent bias of each estimator, which is presented graphically. In addition, to evaluate the efficiency of the QV and PV estimators relative to the other estimators, we computed the percent coefficient of variation of $\partial\hat\mu(x)/\partial x$, that is, $100 \times \mathrm{sd}\{\partial\hat\mu(x)/\partial x\}/\{\partial\mu(x)/\partial x\}$, over the 500 replicates at $x = 0.2, 0.5, 0.8$, and likewise of $\hat\xi$.
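For completeness, the two summary measures can be computed from the replicate estimates as follows. The sketch assumes the true value (for example $\partial\mu(x)/\partial x$ at a given $x$, or $\xi$) is known from the data-generating process, and it uses the $n-1$ divisor of `statistics.stdev`; the text does not state which divisor was used. The replicate values shown are hypothetical.

```python
from statistics import fmean, stdev

def percent_bias(estimates, truth):
    """100 * (mean of the replicate estimates - true value) / true value."""
    return 100.0 * (fmean(estimates) - truth) / truth

def percent_cv(estimates, truth):
    """100 * sd of the replicate estimates / true value."""
    return 100.0 * stdev(estimates) / truth

# Hypothetical replicate estimates of d mu(x)/dx at one x, with true value 1.0.
reps = [0.9, 1.0, 1.1, 1.2]
print(percent_bias(reps, 1.0), percent_cv(reps, 1.0))
```

Dividing by the true value rather than by the mean of the estimates keeps the coefficient of variation comparable across estimators that differ in bias.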


3.2. Results

Table 2 provides descriptive statistics for $Y$ across the various data-generating mechanisms. In each case, $Y$ is skewed to the right and heavy-tailed, with the heteroscedastic log-normal distribution exhibiting the greatest skewness and kurtosis. Table 3 shows that the EEE method yields consistent estimators of the link ($\hat\lambda$) and variance function ($\hat\theta_1, \hat\theta_2$) parameters for all data processes except the heteroscedastic log-normal case, in which these parameters are not defined.

Figure 1 displays the percent bias in $\hat\mu(x)$ for the different estimators across the midpoints of the deciles of $X$ for the data-generating mechanisms ($N = 2{,}000$) with link functions other than log. Similar results for the three log-link processes (not shown) indicated that all estimators were consistent in those cases. EEE corrects the considerable bias evident in the GLM estimators with incorrect link functions, and, with the exception of the heteroscedastic log-normal case, the EEE estimators are consistent. Similarly, Figure 2 illustrates the percent bias in $\partial\hat\mu(x)/\partial x$. The bias of the GLM estimators increases at values of $X$ away from $x = 0.5$, whereas the EEE estimators are generally consistent across all values of $X$. For gamma data with inverse link, EEE shows a 20% bias in $\partial\hat\mu(x)/\partial x$ at $x = 0.05$, and for the heteroscedastic log-normal data, EEE shows a 40% bias at $x = 0.95$. These biases appear to be small-sample effects, as the bias falls to less than 4% with $N = 10{,}000$ for the gamma data with inverse link. Even for the log-normal data, for which all models are misspecified, the bias is reduced considerably using EEE with $N = 10{,}000$.

Figure 3 shows results similar to those in Figure 1, but for variance estimation. Note that, for gamma data with square-root or inverse link, even the gamma estimator with log link shows bias in estimating the variance. This arises from the bias in estimating the mean function with an incorrect link, since the variance model is a function of the mean. For heteroscedastic log-normal data, the flexible variance model of EEE results in considerably less bias in estimating the variance than standard GLM approaches with log link. Results in Figures 1-3 were similar for $N = 10{,}000$.

Table 4 reports the bias and coefficient of variation in $\partial\hat\mu(x)/\partial x$ at $x = 0.2, 0.5, 0.8$, and in $\hat\xi$, for each combination of data generating process and estimator. For gamma and inverse Gaussian data with log link, the corresponding GLM estimators with log link are maximum likelihood and hence are consistent and efficient for $\partial\mu(x)/\partial x$ and for $\xi$. As expected, the Poisson and inverse Gaussian estimators have higher coefficients of variation than the gamma estimator when the data are gamma, especially at the tails of the distribution of X. Similar results obtain with the gamma and Poisson estimators for inverse Gaussian data, and all three of the GLM estimators were consistent for the log-normal data with $v(x) = 1 + x$. For the three log link processes, the EEE PV and QV estimators are consistent for $\partial\mu(x)/\partial x$, though at reduced efficiency, which is the cost of estimating the link parameter $\lambda$. For example, for gamma data with log link and N = 2000, the coefficient of variation in $\partial\hat\mu(x)/\partial x$ at $x = 0.8$ is 14.6% for the gamma GLM estimator with log link and about 29% for EEE with either the PV or QV variance structure. However, the discrepancies in efficiency between GLM and EEE are considerably smaller for estimation of $\xi$, which averages $\partial\mu(x)/\partial x$ over X, and for the larger sample size.
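The %bias and coefficient-of-variation summaries compared above follow the definitions in the note to Table 4. A minimal Python sketch of the two summaries for a single parameter (for instance $\partial\hat\mu(x)/\partial x$ at a fixed x) is given below. The function name is hypothetical, and two choices are our assumptions rather than statements from the note: the standard deviation is the usual sample (n - 1) version, and the cv is scaled by the absolute true value so that it stays positive when the true effect is negative.

```python
import statistics


def pct_bias_and_cv(estimates, true_value):
    """Mean %bias and cv (both in %) of replicate estimates of one parameter.

    Mean %bias: average over replicates of 100 * (estimate - true) / true.
    cv: 100 * (sample standard deviation of the estimates) / |true|
    (dividing by |true| is an assumption, keeping cv positive when true < 0).
    """
    rel_bias = [100.0 * (est - true_value) / true_value for est in estimates]
    mean_bias = statistics.fmean(rel_bias)
    cv = 100.0 * statistics.stdev(estimates) / abs(true_value)
    return mean_bias, cv


# Hypothetical replicate estimates of d mu / dx at x = 0.8 (true value 1.310)
reps = [1.25, 1.38, 1.29, 1.41, 1.22]
print(pct_bias_and_cv(reps, 1.310))
```

With 500 replicates per design, `reps` would simply hold the 500 fitted derivatives at the chosen x.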

Turning to the three data generating mechanisms without log link (gamma with square root and inverse link, and inverse Gaussian with identity link; Table 4), we see that misspecification of the link parameter generally results in biased estimators of $\partial\mu(x)/\partial x$ (especially at more extreme values of x) and of $\xi$ for all GLM estimators. The EEE estimators with either the PV or QV structure largely correct for these biases, and in the case of $\xi$ do so with a high degree of efficiency. For example, when a gamma estimator with log link is applied to gamma data with square root link (N = 10,000), the bias in estimating $\partial\mu(x)/\partial x$ is -28%, -4% and 43% at x = 0.2, 0.5 and 0.8, respectively, and 17% for $\xi$. The corresponding biases for the proposed estimator with either the PV or QV structure are at most 0.5% in absolute value, and the coefficient of variation of $\hat\xi$ is lower for EEE than for the GLM estimators.

For heteroscedastic log-normal data with log-scale variance $v(x) = (1 + x)^2$, the true functional form of $\log\mu(x)$ is quadratic in x. However, we use only a linear specification in x in our estimation models. Consequently, all estimators are expected to be biased for $\partial\mu(x)/\partial x$ and $\xi$. The interesting result for this data generating mechanism is that the EEE PV estimator largely compensates for the covariate misspecification by estimating a suitable link parameter, producing an estimator with considerably less bias than the GLM estimators, although a large sample size is required.
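The quadratic form of $\log\mu(x)$ follows from the log-normal mean. Assuming, for illustration, that the log-scale mean is linear, $\log Y = \beta_0 + \beta_1 x + \varepsilon$ with $\varepsilon \mid x \sim N\{0, v(x)\}$ (here $\beta_0$ and $\beta_1$ are generic symbols, not the simulation's actual values), the two log-normal designs give:

```latex
\mu(x) = E(Y \mid x) = \exp\{\beta_0 + \beta_1 x + v(x)/2\}
\quad\Longrightarrow\quad
\log\mu(x) =
\begin{cases}
(\beta_0 + \tfrac{1}{2}) + (\beta_1 + \tfrac{1}{2})\,x, & v(x) = 1 + x,\\[4pt]
(\beta_0 + \tfrac{1}{2}) + (\beta_1 + 1)\,x + \tfrac{1}{2}x^2, & v(x) = (1 + x)^2.
\end{cases}
```

With $v(x) = 1 + x$ the log mean stays linear in x, in line with the GLM estimators with log link being consistent for that design, whereas $v(x) = (1 + x)^2$ adds the $x^2/2$ term that a linear specification omits.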

Finally, we note that the estimating equations with the QV variance structure fail to converge for most replicates (80%) of the heteroscedastic log-normal data. Hence we do not report results from this estimator for this data generating mechanism. We believe that the variance structure imposed by QV may not be appropriate for modeling heteroscedastic log-normal data, resulting in unstable weights for the mean model. However, the EEE with PV structure converged in all the heteroscedastic log-normal replicates, and we see a reasonably good fit in both the mean and variance models.

4. Empirical Example – Hospitalist Study

We now return to the hospitalist study described in the introduction to illustrate the proposed methodology. Subjects are all adult patients (N = 6500) admitted to the medical wards at the University of Chicago over a two-year period. Hospitalist and non-hospitalist attending teams rotated days through the calendar in a fixed order. Thus, patients are assigned to attending physicians in a quasi-random manner based on date of admission, ensuring a balance of days of the week and months across the two sets of attending physicians. Patients are clustered within physician, and we account for this clustering in our variance estimation. There are no appreciable differences between the two groups of patients in terms of demographics, diagnoses or other baseline characteristics. Inpatient (facility) expenditure is the outcome of interest in our analysis, with a sample mean of $8530 (sd = $12500; 25th percentile = $2857; median = $4910; 75th percentile = $9235). As discussed in the introduction, the two parameters of primary interest are the incremental effect $\pi_1$ of the hospitalist indicator variable ($X_1$) and the marginal effect of disease-specific physician experience on total inpatient expenditure ($Y$). Physician experience is measured via the prior number of patients with the same disease treated by that physician ($X_2$). Adjustor covariates include patient co-morbidities, relative utilization weight of diagnosis, admission month indicator variables, and an indicator for transfer from another institution. A scatter plot of unadjusted inpatient expenditure against disease-specific cases is shown in Figure 4. Because of the skewed distribution of the experience variable and the non-linear relationship of expenditure to experience, and also to conform to the specification used by the original investigators (Meltzer et al., 2002), we use the logged count of disease-specific experience as a covariate in our model. However, since we are interested in the marginal effect with respect to raw counts, our parameter of interest becomes

$$\xi_2^{*} = E_X\left\{\frac{\partial \mu(X)}{\partial \log X_2}\cdot\frac{1}{X_2}\right\}, \qquad (4.1)$$

where the derivative is taken with respect to the covariate as it enters the model, $\log X_2$, and the factor $1/X_2$ converts it to the raw-count scale by the chain rule. The star (*) indicates that this effect parameter is slightly different from that defined earlier.
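To make (4.1) concrete, the sketch below computes the empirical analogue of $\xi_2^{*}$ for a model with the power link $\eta = (\mu^\lambda - 1)/\lambda$ used for EEE (see the note to Table 3), in which the count enters the linear predictor as $\log X_2$. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, all other covariates are folded into a per-observation term `other_eta`, and counts are assumed strictly positive (the text does not say how zero counts were handled).

```python
import math


def mu_from_eta(eta, lam):
    """Inverse of the power link eta = (mu**lam - 1) / lam; log link when lam == 0."""
    if lam == 0:
        return math.exp(eta)
    return (1.0 + lam * eta) ** (1.0 / lam)


def dmu_deta(eta, lam):
    """Derivative of mu with respect to eta for the same link."""
    if lam == 0:
        return math.exp(eta)
    return (1.0 + lam * eta) ** (1.0 / lam - 1.0)


def xi_star(counts, other_eta, beta_logcount, lam):
    """Sample average of d mu / d(count) when the count enters the model as log(count).

    counts        : strictly positive raw experience counts X_2 (assumption)
    other_eta     : contribution of all other covariates to eta, per observation
    beta_logcount : coefficient on log(count)
    """
    total = 0.0
    for count, offset in zip(counts, other_eta):
        eta = offset + beta_logcount * math.log(count)
        # d mu / d log(count) = dmu_deta * beta; chain rule: d log(count) / d count = 1 / count
        total += dmu_deta(eta, lam) * beta_logcount / count
    return total / len(counts)


# Hypothetical values: lam near the hospitalist estimate, negative experience coefficient
print(xi_star([1, 3, 10, 40], [2.0, 2.5, 1.8, 3.0], -0.3, 0.4))
```

A finite-difference derivative of $\mu$ with respect to the raw count gives the same answer, which is a convenient check on the chain-rule factor.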

Preliminary analyses revealed no differences in cost per stay between the two groups of attending physicians at the beginning of the study, suggesting that there were no significant or appreciable differences in baseline skills or experience between the hospitalist and traditional attending teams. Instead, it appears that the differences evolve over time and are directly related to the physician experience accumulated by the patient's date of admission.

We examine four models: (1) a gamma regression model with log link, as in the original study; (2) an EEE model with PV variance structure; (3) a gamma regression model with square root link; and (4) an EEE model with the variance model fixed as $\mathrm{Var}(Y \mid X = x) = \theta_1\mu(x)^{0.5}$. For each model, we estimate the incremental effect $\pi_1$ of the hospitalist variable as in (2.6), and the marginal effect $\xi_2^{*}$ of disease-specific experience via a natural adaptation of (2.5) to (4.1). To study the overall goodness of fit of each of these models, we examine plots of the mean of the residuals $(Y_i - \hat\mu_i)$ across the demi-deciles (twentiles) of the fitted linear predictor $X_i^T\hat\beta$. Systematic departures of these means from zero reveal bias in the fitted mean function $\hat\mu(x)$. We also present robust estimates of standard errors based on (2.7), with $M_A$ computed as in (2.4), to account for the clustering of patients within attending physician. Replacing the second model with the EEE QV model gave very similar results.
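The residual diagnostic described above can be sketched as follows: sort observations by the fitted linear predictor, split them into 20 equal-count groups, and average the raw-scale residuals $Y_i - \hat\mu_i$ within each group. The function name is hypothetical; ties in the linear predictor keep their input order (Python's sort is stable), and group sizes differ by at most one observation when N is not a multiple of 20.

```python
def mean_residual_by_bin(y, mu_hat, lin_pred, n_bins=20):
    """Mean raw-scale residual y - mu_hat within n_bins equal-count groups
    of the fitted linear predictor (demi-deciles when n_bins = 20)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: lin_pred[i])  # stable: ties keep input order
    means = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:  # only possible when n < n_bins
            means.append(float("nan"))
            continue
        means.append(sum(y[i] - mu_hat[i] for i in idx) / len(idx))
    return means


# Hypothetical fitted values that over-predict at both ends of the linear predictor
lp = [0.1 * k for k in range(40)]
y = [1.0] * 40
mu_hat = [1.5 if k < 4 or k >= 36 else 1.0 for k in range(40)]
print(mean_residual_by_bin(y, mu_hat, lp))
```

A well-specified mean model should give bin means scattered around zero with no trend across bins; negative means in the outer bins, as in the toy data, indicate over-prediction at the tails.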

In Table 5, we present estimates of the incremental effect of the hospitalist variable and the marginal effect of disease-specific experience on inpatient expenditure. Using gamma regression with log link, the incremental effect of the hospitalist variable appears to be incorrectly estimated as positive. The EEE with PV structure produces the sign expected from theory, together with the expected lack of statistical significance: hospitalists are expected to save costs, but once physician experience is accounted for, the cost savings are not significantly different from zero. The marginal effect of experience, interpreted as the cost savings from an increase of one case in disease-specific experience averaged over all physicians, appears to be over-estimated (indicating greater cost savings with additional experience) by the gamma log link estimator, although the qualitative result is similar. Relative to the EEE PV estimate, the percentage bias in this case is about 22% (= (318 - 261)/261).

The lack of fit of the gamma model with log link is illustrated in Figure 5, where the raw-scale residuals are averaged within demi-deciles of the linear predictor. The gamma regression with log link tends to over-predict at the tails of the linear predictor, while the EEE with PV structure overcomes this problem in terms of the predicted mean. Importantly, the effects of poor fit of the mean function, especially in the top deciles of a right-skewed distribution such as inpatient expenditures, are amplified when the target is a marginal or incremental effect. Because of the magnitude of expenditures in this tail of the distribution, even modestly poor fits can lead to substantial biases in estimated marginal and incremental effects. The advantage of the EEE approach lies to a considerable extent in overcoming this problem without changing the linear specification of covariates.

The estimated link parameter under the PV variance structure is $\hat\lambda = 0.398$ (95% CI: 0.191, 0.605), indicating that the optimal link for these data is not the log link ($\lambda = 0$) but more likely a square root link ($\lambda = 0.5$). The PV model fit also suggests that the variance is a quadratic function of the mean ($\hat\theta_1 = 0.81$, 95% CI: 0.71, 0.91; $\hat\theta_2 = 2.14$, 95% CI: 1.97, 2.31), which suggests that $Y \mid X$ is close to gamma distributed, since $\theta_2 = 2$ corresponds to the gamma variance (Table 1). Using this information, we ran the standard gamma GLM, this time with a square root link. This estimator (Table 5) fits the data better than the log link estimator and produces marginal effects that are more in line with the EEE model. Note the mild gain in precision from using a known link function.
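For reference, the link and variance families whose parameters are estimated here can be written down directly. The sketch below uses the link $\eta = (\mu^\lambda - 1)/\lambda$ from the note to Table 3, with the log link as the $\lambda \to 0$ limit, together with the PV and QV variance functions. The function names are hypothetical. Note that $\lambda = 0.5$ gives $\eta = 2(\sqrt{\mu} - 1)$, a linear transformation of the square-root link, so with an intercept in the model it spans the same mean functions; this is why $\hat\lambda = 0.398$ points toward a square-root link.

```python
import math


def power_link(mu, lam):
    """eta = (mu**lam - 1) / lam, with the log link as the lam -> 0 limit."""
    if mu <= 0:
        raise ValueError("the link requires mu > 0")
    if abs(lam) < 1e-12:
        return math.log(mu)
    return (mu ** lam - 1.0) / lam


def power_link_inverse(eta, lam):
    """mu = (1 + lam * eta)**(1 / lam); exp(eta) when lam is (numerically) 0."""
    if abs(lam) < 1e-12:
        return math.exp(eta)
    base = 1.0 + lam * eta
    if base <= 0:
        raise ValueError("eta is outside the range of this link")
    return base ** (1.0 / lam)


def pv_variance(mu, theta1, theta2):
    """Power variance: Var(Y | x) = theta1 * mu**theta2."""
    return theta1 * mu ** theta2


def qv_variance(mu, theta1, theta2):
    """Quadratic variance: Var(Y | x) = theta1 * mu + theta2 * mu**2."""
    return theta1 * mu + theta2 * mu ** 2


# lam = 0.5: eta = 2 * (sqrt(mu) - 1), a rescaled square-root link
print(power_link(4.0, 0.5), power_link_inverse(2.0, 0.5))  # 2.0 4.0
```

With $\theta_2 = 2$ the PV function reproduces the gamma variance, and with $\theta_1 = 1, \theta_2 = 0$ the QV function reproduces the Poisson variance, matching the special cases listed in Table 1.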


We compare the robustness of estimating the link parameter using the EEE approach with that of the profile extended quasi-likelihood approach suggested by Nelder and Pregibon (1987), assuming the underlying distribution to be gamma. Here, the optimal value of λ is found to be 0.39, with a 95% likelihood-type interval of (0.22, 0.56), which is in general agreement with what we found with the EEE approach. Presumably, the EEE approach is somewhat more conservative because it also estimates the parameters in the variance function, though the loss in precision is negligible.

To assess the efficiency benefit of flexibly modeling the variance using EEE, we compared our results with a modified EEE estimator in which we incorrectly fixed the variance to be proportional to $\mu(x)^{0.5}$, but allowed the link parameter λ to be estimated. Although this estimator is consistent, the standard errors for the incremental and marginal effects are much larger than when a flexible variance model is used (Table 5). Had we used this estimator, we would have concluded that the marginal effect of disease-specific experience is not statistically significant at the 5% level.
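The significance statements above can be checked from Table 5 with a normal-approximation Wald test, $z = \text{estimate}/\text{se}$. The sketch assumes a two-sided test at the 5% level and treats the cluster-robust standard errors in Table 5 as known; it is illustrative arithmetic on the reported numbers, not part of the original analysis, and the function name is hypothetical.

```python
import math


def wald_test(estimate, se):
    """Two-sided Wald test of H0: effect = 0 using a normal approximation."""
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Marginal effect of disease-specific experience, Table 5: (estimate, robust se)
table5_marginal = {
    "Gamma (log link)": (-318, 107),
    "EEE PV": (-261, 87),
    "Gamma (sqr root link)": (-242, 85),
    "EEE with Var proportional to mu^0.5": (-218, 128),
}

for name, (est, se) in table5_marginal.items():
    z, p = wald_test(est, se)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name:36s} z = {z:6.2f}  p = {p:.3f}  ({verdict} at 5%)")
```

For the fixed-variance EEE estimator, $z = -218/128 \approx -1.70$, below 1.96 in absolute value, whereas the flexible EEE PV estimate gives $z = -261/87 = -3.0$.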

5. Discussion

In this paper, we have proposed estimating equations for the parameters of the link and variance functions along with those of the linear predictor in a generalized linear model, and have developed methodology for using this fitted model to estimate marginal and incremental parameters. The work is important because, in many health applications, researchers are primarily interested in estimating the mean of the outcome variable, and functionals of that mean, on the original rather than a transformed scale. We use a generalized linear models approach to overcome the problems that may arise when the outcome variable is transformed. However, generalized linear models pose their own difficulty, namely choosing the correct link function and variance structure, and our method addresses this difficulty.

A critique that may be leveled against methods that involve estimation of a link function is that, as the link function varies, so does the interpretation of the regression coefficients β. This is indeed a problem when the primary focus is on β. However, the primary parameters of interest here are the mean function µ(x) and the marginal and incremental effects. The flexible link function allows for less biased estimation of µ(x) across a range of values of x. Moreover, the advantage of the estimating equation approach lies in its semi-parametric nature, in which no distributional assumptions are made beyond the first and second moments, and in the generality of the link and variance structures used. An added advantage is that all parameters are estimated simultaneously, which makes the method easier to use than earlier methods.

Evidence from simulations shows that these estimators, especially those with PV variance structure, perform well in terms of bias and efficiency when the distribution of the outcome variable is not known and/or there is ambiguity about the appropriate link function. Estimation of the link parameter λ does incur a cost in terms of efficiency, but this is partially recovered through simultaneous estimation of the variance structure. One surprising result was that, while relative efficiency losses due to link function estimation were sometimes substantial for effects $\partial\mu(x)/\partial x_j$ at a given x, the corresponding losses were much more modest for the integrated effects $\xi_j$. In applications, we recommend the PV variance structure for continuous outcome variables such as costs and expenditures, since the gamma and inverse Gaussian distributions are special cases of this variance structure. Similarly, we recommend the QV variance structure for discrete outcomes such as length of stay and counts of physician visits, since the Poisson and negative binomial distributions are its special cases. In practice, we also found that the new estimators work best in analyses with larger sample sizes, say, N over 5000.

Finally, we mention generalized additive models (Hastie and Tibshirani, 1990) as an alternative to the model we propose with a flexible link function. While useful in some contexts, in many applications there are several covariates, and fitting such models with multiple smooth terms becomes difficult. The flexible yet parametric link function approach that we propose offers an added degree of flexibility over the standard generalized linear model, while retaining enough model structure that the model is still relatively easy to estimate. We hope that this methodology will be increasingly used in health economics and other areas of research whose data characteristics make a priori choices of link functions, and of estimators with distributional assumptions, difficult.

ACKNOWLEDGEMENT

We are grateful to Willard G. Manning, Ronald A. Thisted, Vanja Dukic and Daniel L. Gillen for

extremely helpful comments and suggestions. The authors also thank David Meltzer for providing the

Hospitalist Study data. The opinions expressed are those of the authors, and not those of the University of

Chicago. Anirban Basu’s time was supported in part by the National Institute on Alcohol Abuse and

Alcoholism (NIAAA) grant 1 RO1 AA12664-01 A2.


Appendix A

Sketch of proof: Proof follows standard asymptotic arguments for solutions to estimating equations.

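Let $\gamma_0$ be the true value of $\gamma = (\beta_1, \beta_2, \ldots, \beta_p, \lambda, \theta_1, \theta_2)^T$, and let $\hat\gamma = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_p, \hat\lambda, \hat\theta_1, \hat\theta_2)^T$ solve $G_\gamma = 0$. Then, under regularity conditions, one can approximate $\sqrt{N}(\hat\gamma - \gamma_0)$ via a Taylor series approximation by

$$\sqrt{N}(\hat\gamma - \gamma_0) = \left(-N^{-1}\sum_{i=1}^{N} \frac{\partial G_\gamma^i}{\partial\gamma}\right)^{-1} N^{-0.5}\sum_{i=1}^{N} G_\gamma^i + o_p(1).$$

By the multivariate central limit theorem, $N^{-0.5}\sum_{i=1}^{N} G_\gamma^i \xrightarrow{L} MVN(0, B_\infty)$, where $B_\infty = \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} E(G_\gamma^i G_\gamma^{iT})$. By the law of large numbers, $N^{-1}\sum_{i=1}^{N} \partial G_\gamma^i/\partial\gamma \xrightarrow{p} \lim_{N\to\infty} N^{-1}\sum_{i=1}^{N} E(\partial G_\gamma^i/\partial\gamma) = C_\infty$. Hence $C_\infty = N^{-1}\sum_{i=1}^{N} E(\partial G_\gamma^i/\partial\gamma) + o(1)$ and $N^{-1}\sum_{i=1}^{N} \partial G_\gamma^i/\partial\gamma = C_\infty + o_p(1)$.

Therefore, by Slutsky's theorem, $\sqrt{N}(\hat\gamma - \gamma_0)$ has the limiting distribution $MVN(0, A_\infty)$, where $A_\infty = C_\infty^{-1} B_\infty C_\infty^{-T}$. $A_N$ in (2.3), computed by replacing $\gamma$ with $\hat\gamma$, yields a consistent estimator of $A_\infty$.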
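The sandwich form $A_\infty = C_\infty^{-1}B_\infty C_\infty^{-T}$ can be computed directly from per-observation estimating functions evaluated at $\hat\gamma$. The pure-Python sketch below is generic and its helper names are hypothetical; it does not reproduce the clustering adjustment referred to as (2.4) (a cluster-robust variant would typically sum $G_\gamma^i$ within each cluster before forming B). The small check uses an intercept-only quasi-Poisson model with $\beta_0 = \log\mu$, where $\hat\mu = \bar y$ and $A_N$ reduces to $N^{-1}\sum_i (y_i - \bar y)^2/\bar y^2$.

```python
def mat_inv(m):
    """Inverse of a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sandwich(G, dG):
    """A_N = C^{-1} B C^{-T}, with B = mean of G_i G_i^T and C = mean of dG_i/dgamma.

    G  : list of per-observation estimating-function vectors at gamma-hat
    dG : list of per-observation q x q derivative matrices at gamma-hat
    Var(gamma-hat) is then approximately A_N / N.
    """
    n, q = len(G), len(G[0])
    B = [[sum(g[r] * g[c] for g in G) / n for c in range(q)] for r in range(q)]
    C = [[sum(d[r][c] for d in dG) / n for c in range(q)] for r in range(q)]
    C_inv = mat_inv(C)
    return matmul(matmul(C_inv, B), transpose(C_inv))


# Intercept-only quasi-Poisson check (parameter beta0 = log mu):
# G_i = y_i - exp(beta0) and dG_i/dbeta0 = -exp(beta0), evaluated at mu-hat = ybar
y = [0.0, 2.0, 1.0, 5.0, 3.0]
ybar = sum(y) / len(y)
A = sandwich([[yi - ybar] for yi in y], [[[-ybar]] for _ in y])
print(A[0][0], sum((yi - ybar) ** 2 for yi in y) / len(y) / ybar ** 2)  # the two agree
```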

Appendix B:

Initial values of the regression coefficients come from the estimates of a gamma GLM with log link. The initial value of the link parameter λ is set to 0.1. For the PV structure, the initial value of θ1 is the dispersion (scale) parameter (φ) estimated by the gamma GLM. The initial value of θ2 comes from the modified Park test (Manning and Mullahy, 2001), in which the logarithm of the squared residuals from the GLM is regressed on the logarithm of the predicted values ($\hat\mu$) from the GLM; the coefficient of $\log\hat\mu$ gives the initial estimate of θ2. For the QV structure, the squared residuals from the GLM are regressed on the predicted values ($\hat\mu$) and the squared predicted values ($\hat\mu^2$) without an intercept. The coefficient of $\hat\mu$ gives an estimate of θ1, and the coefficient of $\hat\mu^2$ gives an estimate of θ2.
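The starting-value regressions described above can be sketched in a few lines of Python. We assume the modified Park regression includes an intercept and that no residual is exactly zero (its logarithm would be undefined); the function names and these choices are ours, not the authors'.

```python
import math


def ols_no_intercept_2(x1, x2, y):
    """Least-squares coefficients (b1, b2) for y ~ b1*x1 + b2*x2 with no intercept."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


def pv_theta2_start(y, mu_hat):
    """Modified Park test: slope of log(residual^2) on log(mu_hat), with an intercept."""
    log_r2 = [math.log((yi - mi) ** 2) for yi, mi in zip(y, mu_hat)]  # needs y != mu_hat
    log_mu = [math.log(mi) for mi in mu_hat]
    # a column of ones plays the role of the intercept
    _, slope = ols_no_intercept_2([1.0] * len(log_mu), log_mu, log_r2)
    return slope


def qv_theta_start(y, mu_hat):
    """Regress residual^2 on mu_hat and mu_hat^2 without intercept; returns (theta1, theta2)."""
    r2 = [(yi - mi) ** 2 for yi, mi in zip(y, mu_hat)]
    return ols_no_intercept_2(mu_hat, [mi * mi for mi in mu_hat], r2)


# Hypothetical GLM output in which the squared residuals equal 2 * mu_hat^2
mu_hat = [1.0, 2.0, 3.0, 4.0]
y = [m + math.sqrt(2.0) * m for m in mu_hat]
print(pv_theta2_start(y, mu_hat), qv_theta_start(y, mu_hat))
```

In the toy data the squared residuals are exactly $2\hat\mu^2$, so both regressions recover a gamma-type variance: slope 2 from the Park regression, and (0, 2) from the QV regression.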

Parameter estimates are updated using $\hat\gamma^{(k+1)} = \hat\gamma^{(k)} + \{I^{(k)}\}^{-1} G_\gamma^{(k)}$, where $I^{(k)} = E(-\partial G_\gamma^{(k)}/\partial\gamma)$. $I^{(k)}$ and $G_\gamma^{(k)}$ are computed using the current value $\hat\gamma^{(k)}$. This procedure is iterated until the maximum relative difference in parameter estimates between two successive iterations, $\hat\gamma^{(k)}$ and $\hat\gamma^{(k+1)}$, is less than 0.0001.
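The update rule and stopping criterion can be illustrated with a deliberately small problem. The sketch below is a generic two-parameter Fisher-scoring loop using the stopping rule above (maximum relative change below 0.0001). The relative change is computed as $|\gamma^{(k+1)} - \gamma^{(k)}|/\max(|\gamma^{(k)}|, 10^{-6})$; the $10^{-6}$ guard against division by zero is our assumption. The loop is applied to a hypothetical quasi-Poisson log-link model with one binary covariate, not to the full EEE system, whose $G_\gamma$ would also contain the link and variance equations.

```python
import math


def fisher_scoring_2(G, info, gamma0, tol=1e-4, max_iter=100):
    """Iterate gamma <- gamma + info(gamma)^{-1} G(gamma) for two parameters.

    Stops when max_k |new_k - old_k| / max(|old_k|, 1e-6) < tol (the 1e-6 guard is an assumption).
    """
    gamma = list(gamma0)
    for _ in range(max_iter):
        score = G(gamma)
        (a, b), (c, d) = info(gamma)  # expected information E(-dG/dgamma)
        det = a * d - b * c
        step = [(d * score[0] - b * score[1]) / det,
                (-c * score[0] + a * score[1]) / det]
        new = [g + s for g, s in zip(gamma, step)]
        rel_change = max(abs(n - o) / max(abs(o), 1e-6) for n, o in zip(new, gamma))
        gamma = new
        if rel_change < tol:
            return gamma
    raise RuntimeError("Fisher scoring did not converge")


# Hypothetical quasi-Poisson log-link model with one binary covariate
x = [0, 0, 0, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]


def G(beta):
    mu = [math.exp(beta[0] + beta[1] * xi) for xi in x]
    return [sum(yi - m for yi, m in zip(y, mu)),
            sum(xi * (yi - m) for xi, yi, m in zip(x, y, mu))]


def info(beta):
    mu = [math.exp(beta[0] + beta[1] * xi) for xi in x]
    s0 = sum(mu)
    s1 = sum(xi * m for xi, m in zip(x, mu))
    s2 = sum(xi * xi * m for xi, m in zip(x, mu))
    return [[s0, s1], [s1, s2]]


beta_hat = fisher_scoring_2(G, info, [0.0, 0.0])
print(beta_hat)  # close to [log 2, log 3]
```

For these data the fitted values are the group means 2 and 6, so the solution is $(\log 2, \log 3)$. For this canonical link the observed and expected information coincide, so Fisher scoring and Newton-Raphson take identical steps.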

We ensure that the required condition $\mu_i > 0\ \forall i$ is met by setting $\hat\mu_i$ to missing for all observations for which this condition is violated at any given iteration. After the estimator had converged, we searched for any observation with missing $\hat\mu_i$. We found no such observations in any of our simulated datasets or in the empirical example with the hospitalist data.

Appendix C:

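Let

$$\tilde\pi_j = \hat{E}_{X_{-j}}\{\Delta_{X_j}\mu(X)\} = N^{-1}\sum_{i=1}^{N}\left\{\mu(X_{-j}^i, X_{ij} = 1) - \mu(X_{-j}^i, X_{ij} = 0)\right\}$$

be an estimator of $\pi_j$ when $\gamma$ is a vector of known constants. When $\gamma$ is estimated from the data, the estimator of $\pi_j$ is

$$\hat\pi_j = N^{-1}\sum_{i=1}^{N}\left\{\hat\mu(X_{-j}^i, X_{ij} = 1) - \hat\mu(X_{-j}^i, X_{ij} = 0)\right\}.$$

Following a first-order Taylor series approximation, one can write

$$\hat\pi_j = \tilde\pi_j + \frac{\partial\pi_j}{\partial\gamma}(\hat\gamma - \gamma),$$

where $\hat\gamma - \gamma = -\Omega_\gamma^{-1}S_\gamma$, with $\Omega_\gamma = N^{-1}\sum_{i=1}^{N} \partial G_\gamma^i/\partial\gamma$ and $S_\gamma = N^{-1}\sum_{i=1}^{N} G_\gamma^i$ (consistent with the expansion in Appendix A). Thus,

$$\begin{aligned}
\mathrm{Var}\,\hat\pi_j &= \mathrm{Var}\,\tilde\pi_j - 2E\left\{(\tilde\pi_j - \pi_j)\frac{\partial\pi_j}{\partial\gamma}\Omega_\gamma^{-1}S_\gamma\right\} + \frac{\partial\pi_j}{\partial\gamma}\Omega_\gamma^{-1}\left(N^{-2}\sum_{i=1}^{N} G_\gamma^i G_\gamma^{iT}\right)\Omega_\gamma^{-T}\left(\frac{\partial\pi_j}{\partial\gamma}\right)^T \\
&= \mathrm{Var}\,\tilde\pi_j - 2E\left\{(\tilde\pi_j - \pi_j)\frac{\partial\pi_j}{\partial\gamma}\Omega_\gamma^{-1}S_\gamma\right\} + \frac{\partial\pi_j}{\partial\gamma}\,\frac{A_N}{N}\left(\frac{\partial\pi_j}{\partial\gamma}\right)^T,
\end{aligned}$$

where $\mathrm{Var}\,\tilde\pi_j$ is estimated by $N^{-1}(N-1)^{-1}\sum_{i=1}^{N}(\tilde\pi_{ji} - \bar\pi_j)^2$, with $\tilde\pi_{ji} = \mu(X_{-j}^i, X_{ij} = 1) - \mu(X_{-j}^i, X_{ij} = 0)$ and $\bar\pi_j = N^{-1}\sum_{i=1}^{N}\tilde\pi_{ji}$, and $A_N$ is the analytical variance of the parameter estimates as in (2.3). Since $E S_\gamma = 0$, the second term in the above expression converges to zero. Note that the third term is identical to the delta method used to calculate the asymptotic variance of a non-linear function of parameters.

The variance of the marginal effect can also be obtained through an analogous method.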
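The estimator $\hat\pi_j$ and its approximate variance (dropping the cross term, which the argument above shows converges to zero) can be sketched as below. The mean function `mu`, the covariance `cov_gamma` (standing in for $A_N/N$), and the central-difference gradient are illustrative assumptions; an analytic gradient of $\pi_j$ would normally be preferred. The per-observation differences are evaluated at $\hat\gamma$, as a plug-in for the unknown $\tilde\pi_{ji}$.

```python
import math


def incremental_effect(mu, rows, j, gamma):
    """pi_j-hat: sample average of mu(x with x_j = 1) - mu(x with x_j = 0).

    Returns the average and the per-observation differences pi_ji.
    """
    diffs = []
    for x in rows:
        x1, x0 = list(x), list(x)
        x1[j], x0[j] = 1.0, 0.0
        diffs.append(mu(x1, gamma) - mu(x0, gamma))
    return sum(diffs) / len(diffs), diffs


def incremental_effect_var(mu, rows, j, gamma, cov_gamma, h=1e-6):
    """Var(pi_j-hat) ~= Var(pi_j-tilde) + grad' cov_gamma grad (cross term dropped).

    cov_gamma stands in for A_N / N; grad = d pi_j / d gamma by central differences.
    """
    est, diffs = incremental_effect(mu, rows, j, gamma)
    n = len(diffs)
    var_tilde = sum((d - est) ** 2 for d in diffs) / (n * (n - 1))
    grad = []
    for k in range(len(gamma)):
        up, dn = list(gamma), list(gamma)
        up[k] += h
        dn[k] -= h
        grad.append((incremental_effect(mu, rows, j, up)[0]
                     - incremental_effect(mu, rows, j, dn)[0]) / (2 * h))
    q = len(grad)
    delta = sum(grad[a] * cov_gamma[a][b] * grad[b] for a in range(q) for b in range(q))
    return est, var_tilde + delta


# Hypothetical log-link mean: mu = exp(g0 + g1 * x[0] + g2 * x[1]), x[0] binary
def mu_log(x, g):
    return math.exp(g[0] + g[1] * x[0] + g[2] * x[1])


gamma_hat = [0.0, math.log(2.0), 0.0]
rows = [[0.0, 1.0], [1.0, 2.0], [0.0, 0.5]]
cov = [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.01]]  # hypothetical A_N / N
print(incremental_effect_var(mu_log, rows, 0, gamma_hat, cov))
```

The same structure carries over to the marginal effect by replacing the finite contrast with the derivative of $\mu$ with respect to $x_j$.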


REFERENCES

BASU, A., MANNING, W.G. AND MULLAHY, J. (2002). Comparing alternative models: log vs proportional hazard? Forthcoming in Health Economics.

BLOUGH, D.K., MADDEN, C.W. AND HORNBROOK, M.C. (1999). Modeling risk using generalized linear models. Journal of Health Economics, 18, 153-171.

BOX, G. E. P. AND COX, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B 26, 211-252.

EFRON, B. AND TIBSHIRANI, R.J. (1993). An Introduction to the Bootstrap. London; New York: Chapman & Hall.

CARROLL, R.J. AND RUPPERT, D. (1988). Transformation and Weighting in Regression. New York: Chapman and Hall.

CARROLL, R.J., FAN, J., GIJBELS, I. AND WAND, M.P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92, 477-489.

CHIOU, J.M. AND MÜLLER, H.G. (1998). Quasi-likelihood regression with unknown link and variance functions. Journal of the American Statistical Association, 93(444), 1376-1386.

CHIOU, J.M. AND MÜLLER, H.G. (1999). Nonparametric quasi-likelihood. Annals of Statistics, 27, 36-64.

CROWDER, M. (1987). On linear and quadratic estimating functions. Biometrika, 74(3), 591-597.

DUAN, N. (1983). Smearing estimate: a nonparametric retransformation method. Journal of the American Statistical Association, 78, 605-610.

GREENE, W.H. (2000). Econometric Analysis, 4th edn. New Jersey: Prentice Hall.

HALL, D.B. AND SEVERINI, T.A. (1998). Extended generalized estimating equations for clustered data. Journal of the American Statistical Association, 93(444), 1365-1375.

HASTIE, T.J. AND TIBSHIRANI, R.J. (1990). Generalized Additive Models. London: Chapman & Hall.

HOSMER, D.W. AND LEMESHOW, S. (1995). Applied Logistic Regression, 2nd edn. New York, John Wiley & Sons.

HUBER, P.J. (1972). Robust statistics: A review. The Annals of Mathematical Statistics, 43, 1041-1067.

KAISER, M.S. (1998). Maximum likelihood estimation of link function parameters. Computational Statistics and Data Analysis, 24, 79-87.


LI, K. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414), 316-327.


LI, K. AND DUAN, N. (1989). Regression analysis under link violation. Annals of Statistics, 17(3), 1009-1052.

LIANG, K.-Y. AND ZEGER, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.

MANNING, W.G., BASU, A. AND MULLAHY, J. (2002). Modeling costs with generalized gamma regressions. Draft.

MANNING, W.G. AND MULLAHY, J. (2001). Estimating log models: To transform or not to transform? Journal of Health Economics, 20(4), 461-494.

MANNING, W.G. (1998). The logged dependent variable, heteroscedasticity, and the retransformation problem. Journal of Health Economics, 17, 283-295.

MCCULLAGH, P. (1983). Quasi-likelihood functions. Annals of Statistics, 11, 59-67.

MCCULLAGH, P. AND NELDER, J.A. (1989). Generalized linear models. 2nd edn. London: Chapman and Hall.

MELTZER, D.O., MANNING, W.G., MORRISON, J., GUTH, T., HERNANDEZ, A., DHAR, A., JIN, L. AND LEVINSON, W. (2002). Effects of physician experience on an academic general medicine service: results of a trial of hospitalists. Annals of Internal Medicine, 137, 866-874.

MULLAHY, J. (1998). Much ado about two: reconsidering retransformation and the two-part model in health econometrics. Journal of Health Economics, 17, 247-281.

NELDER, J.A. AND PREGIBON, D. (1987). An extended quasi-likelihood function. Biometrika, 74(2), 221-232.

OAXACA, R.L. (1973). Male-Female Wage Differentials in Urban Labor Markets. International Economic Review 14(October), 693-709.

PREGIBON, D. (1980). Goodness of link tests for generalized linear models. Applied Statistics, 29, 15-24.

PREGIBON, D. (1981). Logistic regression diagnostics. Annals of Statistics, 9, 705-724.

SCALLAN, A., GILCHRIST, R. AND GREEN, M. (1984). Fitting parametric link functions in generalised linear models. Computational Statistics and Data Analysis, 2, 37-49.

STATACORP. (2001). Stata Statistical Software: Release 7.0. College Station, TX. Vol. 2, p.406.

WEDDERBURN, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika, 61, 439-447.

WEISBERG, S. AND WELSH, A.H. (1994). Adapting for the missing link. Annals of Statistics, 22, 1674-1700.

[Figure 1 panels, each with N = 2000: Gamma (sqr. root link); Gamma (inverse link); Inverse Gaussian (identity link); Hetero log normal v = (1+x)^2. Horizontal axis: x (0.05 to 0.95); vertical axis: % bias in µ-hat.]

Figure 1: %Bias in estimating µ at different values of x from different estimators for selected gamma, inverse Gaussian and heteroscedastic log-normal data with different link functions. (EEE QV is not shown for the heteroscedastic log-normal data because it failed to converge.)

[Figure 2 panels: % bias in ∂µ̂(x)/∂x (vertical axis) against x from 0.05 to 0.95 (horizontal axis).]

Figure 2: %Bias in estimating ∂µ(x)/∂x at different values of x from different estimators for selected gamma, inverse Gaussian and heteroscedastic log-normal data with different link functions. (EEE QV is not shown for the heteroscedastic log-normal data because it failed to converge.)

[Figure 3 panels: % bias in the estimated variance (vertical axis) against x from 0.05 to 0.95 (horizontal axis).]

Figure 3: %Bias in estimating Var(Y | X = x) at different values of x from different estimators for selected gamma, inverse Gaussian and heteroscedastic log-normal data with different link functions. (EEE QV is not shown for the heteroscedastic log-normal data because it failed to converge.)

[Figure 4: inpatient expenditures in $ (vertical axis) against counts of disease-specific experience, 0 to 80 (horizontal axis).]

Figure 4: Scatter plot of inpatient expenditures against counts of disease specific experience of physicians in the Hospitalist Study.

[Figure 5: mean residuals (vertical axis) against 20 categories of the linear predictor from the respective estimators (horizontal axis); series: Gamma Log Link, EEE PV, Gamma Sq. Root Link, EEE with Var ~ mu^0.5.]

Figure 5: Average of raw-scale residuals from different estimators across 20 categories of the linear predictor from respective estimators of inpatient expenditures in the hospitalist Study.

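Table 1: Special cases of distributions under PV and QV Variance formulations.

| Distribution | PV θ1 | PV θ2 | QV θ1 | QV θ2 |
|---|---|---|---|---|
| Poisson | 1 | 1 | 1 | 0 |
| Gamma | > 0 | 2 | 0 | > 0 |
| Inverse Gaussian | > 0 | 3 | + | + |
| Negative Binomial | + | + | 1 | > 0 |

PDFs:
Poisson: $f(y) = e^{-\mu}\mu^{y}/y!$
Gamma: $f(y) = \frac{1}{\Gamma(1/\theta_1)}\left(\frac{1}{\theta_1\mu}\right)^{1/\theta_1} y^{1/\theta_1 - 1}\exp\left(-\frac{y}{\theta_1\mu}\right)$
Inverse Gaussian: $f(y) = \frac{1}{\sqrt{2\pi\theta_1 y^{3}}}\exp\left\{-\frac{(y - \mu)^2}{2\theta_1\mu^2 y}\right\}$
Negative Binomial: $f(y) = \frac{\Gamma(y + 1/\theta_2)}{\Gamma(y + 1)\Gamma(1/\theta_2)}\left(\frac{1}{1 + \theta_2\mu}\right)^{1/\theta_2}\left(\frac{\theta_2\mu}{1 + \theta_2\mu}\right)^{y}$

PV = Power Variance, where $V(y_i) = \theta_1\mu_i^{\theta_2}$; QV = Quadratic Variance, where $V(y_i) = \theta_1\mu_i + \theta_2\mu_i^2$.
'+' = True value unknown since distribution does not conform to the particular variance structure assumed.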

Table 2: Descriptive Statistics for Y (averages over all 500 replicates of N=2,000 each)*

| Data Generating Mechanism | Mean¹ | Std. Dev. | Coeff. of skewness | Coeff. of kurtosis |
|---|---|---|---|---|
| Gamma, log link | 1.012 | 1.52 | 3.25 | 19.71 |
| Inv. Gaussian, log link | 0.993 | 1.14 | 4.26 | 35.86 |
| Gamma, sqr. root link | 1.008 | 1.73 | 3.95 | 27.97 |
| Gamma, inverse link | 1.028 | 1.55 | 3.36 | 21.27 |
| Inv. Gaussian, identity link | 1.000 | 1.15 | 4.07 | 32.41 |
| Log-normal, $v(x) = 1 + x$ | 0.995 | 2.30 | 10.99 | 228.43 |
| Log-normal, $v(x) = (1 + x)^2$ | 0.958 | 5.00 | 19.06 | 544.76 |

* Descriptive statistics for datasets with N=10,000 are similar to those presented above. ¹ The intercepts have been selected so that the sample mean is around 1. v = log-scale variance.

Table 3: Mean and standard deviations (over all 500 replicates of N=2,000 & N=10,000 each) of parameter estimates in the link and variance functions of EEE estimators. Entries are mean (sd).

| Data (link) | N | EEE PV: λ | EEE PV: θ1 | EEE PV: θ2 | EEE QV: λ | EEE QV: θ1 | EEE QV: θ2 |
|---|---|---|---|---|---|---|---|
| Gamma (Log) | 2000 | -0.062 (0.93) | 1.991 (0.11) | 1.999 (0.20) | -0.058 (0.92) | 0.006 (0.37) | 1.989 (0.40) |
| | 10000 | -0.009 (0.38) | 2.000 (0.05) | 1.998 (0.08) | -0.009 (0.38) | 0.003 (0.16) | 1.997 (0.18) |
| | True | 0 | 2 | 2 | 0 | 0 | 2 |
| Inv. Gaussian (Log) | 2000 | 0.023 (0.64) | 0.992 (0.06) | 3.004 (0.21) | 0.090 (0.62) | -0.766 (0.17) | 1.820 (0.23) |
| | 10000 | 0.002 (0.27) | 0.995 (0.03) | 2.990 (0.10) | 0.027 (0.26) | -0.760 (0.08) | 1.813 (0.10) |
| | True | 0 | 1 | 3 | 0 | + | + |
| Gamma (Sqr Root) | 2000 | 0.484 (0.17) | 1.992 (0.11) | 2.002 (0.09) | 0.498 (0.17) | 0.000 (0.11) | 1.995 (0.19) |
| | 10000 | 0.498 (0.08) | 2.000 (0.05) | 2.000 (0.04) | 0.501 (0.07) | 0.001 (0.05) | 2.000 (0.09) |
| | True | 0.5 | 2 | 2 | 0.5 | 0 | 2 |
| Gamma (Inverse) | 2000 | -1.060 (0.83) | 1.989 (0.11) | 1.993 (0.19) | -1.060 (0.83) | 0.008 (0.39) | 1.985 (0.43) |
| | 10000 | -1.011 (0.38) | 2.000 (0.05) | 2.000 (0.09) | -1.011 (0.38) | -0.002 (0.17) | 2.003 (0.19) |
| | True | -1 | 2 | 2 | -1 | 0 | 2 |
| Inv. Gaussian (Identity) | 2000 | 0.998 (0.57) | 0.993 (0.06) | 3.010 (0.20) | 0.995 (0.46) | -0.669 (0.13) | 1.723 (0.20) |
| | 10000 | 0.996 (0.19) | 0.996 (0.03) | 2.999 (0.09) | 0.997 (0.19) | -0.661 (0.06) | -0.717 (0.09) |
| | True | 1 | 1 | 3 | 1 | + | + |

Link function is given by $\eta_i = (\mu_i^{\lambda} - 1)/\lambda$, where $\eta_i = X_i\beta$. PV = Power Variance, where $V(y_i) = \theta_1\mu_i^{\theta_2}$; QV = Quadratic Variance, where $V(y_i) = \theta_1\mu_i + \theta_2\mu_i^2$. '+' = True value unknown.

Table 4: Simulation results on estimation of dµ/dx for alternative estimators on data generated with log link (over all 500 replicates of N=2,000 & N=10,000 each).

Each cell is Mn (cv)++ for $\partial\hat\mu(x)/\partial x$ at the stated x, or for its average $E_x[\partial\hat\mu(x)/\partial x]$; "True" rows give the true values.

| Data (link) | Estimator* | N=2,000: x=0.2 | N=2,000: x=0.5 | N=2,000: x=0.8 | N=2,000: E_x | N=10,000: x=0.2 | N=10,000: x=0.5 | N=10,000: x=0.8 | N=10,000: E_x |
|---|---|---|---|---|---|---|---|---|---|
| Gamma (Log) | Gamma | -0.3 (8.3) | 0.0 (11.3) | 0.4 (14.6) | -0.1 (12.2) | 0.0 (3.6) | 0.1 (4.9) | 0.2 (6.3) | 0.1 (5.3) |
| | Poisson | -0.1 (8.6) | 0.3 (11.8) | 0.9 (15.1) | 0.3 (12.7) | 0.0 (3.8) | 0.1 (5.2) | 0.2 (6.6) | 0.2 (5.6) |
| | Inv. Gauss. | 0.3 (9.1) | 1.7 (13.8) | 3.3 (19.1) | 2.0 (15.3) | 2.0 (6.1) | 5.1 (11.7) | 8.5 (18.1) | 6.1 (13.6) |
| | EEE (PV) | 0.7 (26.7) | -3.8 (12.8) | 4.1 (29.0) | 4.8 (18.1) | 0.3 (11.5) | -0.7 (5.2) | 1.0 (13.6) | 0.8 (6.6) |
| | EEE (QV) | 0.8 (26.6) | -3.9 (12.8) | 3.8 (28.7) | 4.6 (17.8) | 0.3 (11.5) | -0.7 (5.2) | 1.0 (13.6) | 0.8 (6.6) |
| | True | 0.719 | 0.970 | 1.310 | 1.013 | 0.719 | 0.970 | 1.310 | 1.013 |
| Inv. Gaussian (Log) | Gamma | -0.1 (6.6) | 0.1 (8.9) | 0.2 (11.2) | -0.1 (9.5) | 0.0 (2.8) | 0.0 (4.0) | 0.1 (5.0) | 0.0 (4.2) |
| | Poisson | -0.1 (7.1) | 0.1 (9.6) | 0.3 (12.2) | 0.0 (10.3) | 0.0 (3.2) | 0.0 (4.3) | 0.1 (5.5) | 0.1 (4.6) |
| | Inv. Gauss. | 0.0 (6.4) | 0.1 (8.7) | 0.3 (11.0) | 0.0 (9.3) | 0.0 (2.7) | 0.0 (3.8) | 0.0 (4.8) | 0.0 (4.1) |
| | EEE (PV) | 1.5 (17.8) | -1.8 (9.3) | 1.5 (23.5) | 2.0 (12.9) | 0.2 (7.8) | -0.4 (3.9) | 0.4 (10.7) | 0.4 (5.4) |
| | EEE (QV) | 2.9 (17.2) | -2.1 (9.1) | -1.6 (21.9) | 0.4 (11.9) | 0.8 (7.6) | -0.5 (3.8) | -0.6 (10.3) | -0.1 (5.2) |
| | True | 0.705 | 0.951 | 1.284 | 0.993 | 0.705 | 0.951 | 1.284 | 0.993 |
| Log Normal, V=(1+x) | Gamma | -0.3 (7.9) | 0.0 (12.1) | 0.5 (16.8) | 0.0 (14) | 0.0 (3.6) | 0.1 (5.4) | 0.2 (7.3) | 0.2 (6.2) |
| | Poisson | -0.2 (8.5) | 0.5 (13.8) | 1.6 (19.8) | 0.8 (16.2) | 0.0 (3.9) | 0.1 (6.1) | 0.3 (8.5) | 0.2 (7.1) |
| | Inv. Gauss. | -0.5 (7.6) | -0.3 (11.5) | 0.0 (15.9) | -0.5 (13.3) | 0.0 (3.4) | 0.1 (5.1) | 0.2 (6.9) | 0.2 (5.8) |
| | EEE (PV) | -0.6 (21.4) | -3.6 (12.2) | 4.8 (31.7) | 5.7 (24.7) | 0.3 (9.0) | -0.5 (5.2) | 0.9 (14.4) | 0.9 (8.8) |
| | EEE (QV) | - | - | - | - | - | - | - | - |
| | True | 0.865 | 1.357 | 2.129 | 1.493 | 0.865 | 1.357 | 2.129 | 1.493 |
| Gamma (Sqr. Root) | Gamma | -27.7 (2.5) | -3.7 (5.6) | 42.4 (12.4) | 16.9 (8.7) | -27.7 (1.2) | -3.6 (2.4) | 42.5 (5.5) | 17.2 (3.9) |
| | Poisson | -28.8 (2.6) | -9.5 (5.9) | 27.7 (12.4) | 7.1 (8.7) | -28.8 (1.2) | -9.7 (2.5) | 27.0 (5.5) | 6.9 (3.9) |
| | Inv. Gauss. | -23.9 (3.2) | 8.2 (9.1) | 71.0 (22.8) | 36.7 (15.6) | -24.0 (1.4) | 8.0 (5.2) | 70.3 (13.6) | 36.5 (9.1) |
| | EEE (PV) | -0.8 (10.9) | -0.6 (6.4) | 2.3 (16.0) | 1.1 (8.4) | -0.1 (4.8) | -0.1 (2.8) | 0.5 (7) | 0.2 (3.7) |
| | EEE (QV) | -0.1 (10.7) | -0.9 (6.3) | 1.0 (15.8) | 0.5 (8.3) | 0.1 (4.7) | -0.1 (2.7) | 0.1 (6.8) | 0.1 (3.6) |
| | True | 1.320 | 1.920 | 2.520 | 1.924 | 1.320 | 1.920 | 2.520 | 1.924 |
| Gamma (Inverse) | Gamma | -24.7 (11.2) | 8.8 (12.6) | 32.8 (11.4) | -11.6 (11.0) | -24.4 (4.8) | 9.5 (5.4) | 33.9 (4.8) | -11.6 (4.7) |
| | Poisson | -21.0 (12.1) | 12.7 (13.5) | 36 (11.8) | -8.1 (11.9) | -21.1 (5.1) | 13.1 (5.6) | 36.9 (4.9) | -8.4 (4.9) |
| | Inv. Gauss. | -26.1 (13.6) | 6.7 (14.4) | 30.3 (12) | -13.3 (12.9) | -23.4 (11.6) | 9.5 (11.2) | 32.6 (7.6) | -11.2 (10.3) |
| | EEE (PV) | -1.1 (20.6) | -2.9 (15.4) | 1.1 (25) | 5.6 (22.4) | -0.2 (9.7) | -0.5 (7.4) | 0.5 (11.6) | 1.1 (8.1) |
| | EEE (QV) | -1.0 (20.7) | -2.9 (15.4) | 1.1 (25) | 5.7 (22.4) | -0.2 (9.7) | -0.5 (7.4) | 0.5 (11.6) | 1.1 (8.1) |
| | True | -1.759 | -0.900 | -0.545 | -1.156 | -1.759 | -0.900 | -0.545 | -1.156 |
| Inv. Gaussian (Identity) | Gamma | -26.4 (4.6) | 1.1 (8.6) | 39.2 (15.1) | 6.0 (9.7) | -26.5 (1.9) | 1.0 (3.6) | 38.7 (6.4) | 5.8 (4.1) |
| | Poisson | -28.2 (4.9) | -2.3 (9.0) | 32.9 (15.5) | 2.0 (10.1) | -28.2 (2.0) | -2.6 (3.7) | 32.3 (6.5) | 1.7 (4.2) |
| | Inv. Gauss. | -24.2 (4.6) | 5.6 (8.8) | 47.2 (15.8) | 11.1 (10.1) | -24.3 (2.0) | 5.4 (3.8) | 46.7 (6.8) | 10.8 (4.3) |
| | EEE (PV) | -0.5 (12.0) | -0.5 (10.1) | 2.9 (21.7) | 1.3 (9.0) | -0.2 (5.0) | 0.0 (4.2) | 0.6 (8.4) | 0.2 (3.7) |
| | EEE (QV) | -0.3 (11.8) | -0.3 (9.6) | 2.8 (20.5) | 1.3 (8.7) | -0.2 (5.0) | -0.1 (4.2) | 0.5 (8.5) | 0.2 (3.7) |
| | True | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Log Normal, V=(1+x)^2 | Gamma | 12.5 (10.6) | 3.9 (18.4) | -10.3 (24.8) | -8 (22.3) | 13.2 (5.2) | 4.3 (8.8) | -10.7 (11.4) | -7.8 (10.3) |
| | Poisson | 10.9 (9.4) | 6.6 (21.8) | -2.7 (38.1) | -1.8 (33.3) | 12.8 (4.8) | 6.7 (10.8) | -5.9 (16.8) | -3.7 (14.8) |
| | Inv. Gauss. | 10.8 (9.9) | -0.4 (14.8) | -16.7 (18.2) | -13.6 (16.6) | 11.5 (4.7) | 0.3 (6.9) | -16.3 (8.2) | -12.7 (7.6) |
| | EEE (PV) | 0.7 (19.8) | -4.8 (14.6) | 3.6 (41.2) | 14.5 (64.1) | 1.1 (8.1) | -1.4 (6.7) | 1.6 (19.8) | 3.6 (20.4) |
| | EEE (QV) | - | - | - | - | - | - | - | - |
| | True | 0.766 | 1.762 | 4.369 | 2.574 | 0.766 | 1.762 | 4.369 | 2.574 |

NOTE: Based on 500 replicates. * Gamma, Poisson, and Inverse Gaussian estimators are implemented with log link.
++ Mn = mean %bias across 500 replicates, $500^{-1}\sum_{j=1}^{500}\left\{\left(\partial\hat\mu_j(x)/\partial x - \partial\mu(x)/\partial x\right)/\left(\partial\mu(x)/\partial x\right)\right\}\cdot 100$; cv = coefficient of variation across 500 replicates, $100\cdot\mathrm{sd}\left(\partial\hat\mu(x)/\partial x\right)/\left(\partial\mu(x)/\partial x\right)$, where sd = standard deviation across 500 replicates; True = the true value of the corresponding parameter.

Table 5: Estimated incremental and marginal effects on inpatient expenditures from hospitalist data.

| Estimator | Incremental effect of hospitalist ($\pi_1$): mean in US dollars (se+) | Marginal effect of disease-specific experience ($\xi_2^{*}$): mean in US dollars (se+) |
|---|---|---|
| Gamma (log link) | 61 (279) | -318 (107) |
| EEE PV | -30 (231) | -261 (87) |
| Gamma (sqr root link) | -18 (215) | -242 (85) |
| EEE with Var ∝ µ^0.5 | 122 (327) | -218 (128) |

Note: Models are adjusted for patient co-morbidities, relative utilization weight of diagnosis, admission month indicator variables, and an indicator for transfer from another institution. + Robust standard error based on sandwich estimator, accounting for clustering of observations within physicians.
