Computational Statistics & Data Analysis 51 (2007) 6459–6475. www.elsevier.com/locate/csda

Computation of estimates in segmented regression and a liquidity effect model

Ryan Gill a,*, Kiseop Lee a, Seongjoo Song b

a Department of Mathematics, University of Louisville, USA
b Department of Statistics, University of Seoul, Korea

Received 25 April 2006; received in revised form 28 January 2007; accepted 24 February 2007. Available online 7 March 2007.

Abstract

Weighted least squares (WLS) estimation in segmented regression with multiple change points is considered. A computationally efficient algorithm for calculating the WLS estimate of a single change point is derived. Then, iterative methods of approximating the global solution of the multiple change-point problem based on estimating change points one at a time are discussed. It is shown that these results can also be applied to a liquidity effect model in finance with multiple change points. The liquidity effect model we consider is a generalization of one proposed by Çetin et al. [2006. Pricing options in an extended Black Scholes economy with illiquidity: theory and empirical evidence. Rev. Financial Stud. 19, 493–529], allowing the magnitude of the liquidity effect to depend on the size of a trade. Two data sets are used to illustrate these methods.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Segmented regression; Liquidity effect estimation; Multiple change points

1. Introduction

In many instances, it is not reasonable to assume a linear relationship between a set of covariates and a response variable. In these circumstances, there are several nonlinear modeling techniques available in the literature, such as spline models. Splines produce flexible nonlinear fits which are easy to compute provided the number of knots is fixed (see de Boor, 1978; Dierckx, 1993). However, the selection of the number of knots is important: enough knots are needed to provide a good fit, but using too many knots results in overfitting. Several methods for choosing the number and location of knots have been proposed, including smoothing splines (see Eubank, 1988; Wahba, 1990; Green and Silverman, 1994; Gu, 2002), multivariate adaptive regression splines (see Hastie et al., 2001), and P-splines (see Eilers and Marx, 1996; Ruppert, 2002). These spline methods emphasize approximation and interpolation of the regression function.

On the other hand, segmented regression models focus on estimating and making inference about the locations at which there are intrinsic structural changes in the model (see Feder, 1975; Seber and Wild, 1989; Kim et al., 2004). In this setting, the knots are commonly referred to as change points. Although the estimation of change-point models is inherently more computationally intensive than fitting spline models, the estimates of the change points as well as the

* Corresponding author. Tel.: +1 502 852 6826. E-mail address: [email protected] (R. Gill).

0167-9473/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2007.02.026


other model parameters can more easily be interpreted, and the model can be formally tested for the existence of change points. In addition, segmented regression can be used as a diagnostic tool in the linear regression setting to assess the adequacy of the linear model. If no significant change point is detected, then this provides support for the linear model.

In this paper, we consider the segmented regression setting where we model the mean of a real-valued response variable y based on a vector of covariates z = [z_1, …, z_p]⊤ as well as on a real-valued covariate x in which we suspect there are ℓ unknown change points τ_1 < ··· < τ_ℓ, using the function

f(x, z) = γ⊤z + Σ_{j=1}^ℓ β_j (x − τ_j)_+,   (1)

where (·)_+ = max{·, 0}. Given observations (z_i, x_i, y_i), i = 1, …, n, the weighted least squares (WLS) estimates of the unknown parameters β_1, …, β_ℓ, τ_1, …, τ_ℓ, and γ are the values which minimize the weighted sum of squares

WSS = Σ_{i=1}^n w_i (y_i − f(x_i, z_i))²,

with known weights w_i, i = 1, …, n. The functional form (1) of the segmented regression model with multiple change points resembles that given by Küchenhoff (1997).
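Because (1) is linear in (γ, β) once the change points are held fixed, the inner minimization is ordinary weighted least squares. A minimal R sketch under that assumption (the function name fit_fixed_tau is ours, not from the paper):

# WLS fit of model (1) when the change points tau are held fixed
fit_fixed_tau <- function(y, x, Z, tau, w) {
  B <- sapply(tau, function(t) pmax(x - t, 0))   # basis columns (x - tau_j)_+
  X <- cbind(Z, B)
  fit <- lm(y ~ X - 1, weights = w)              # WLS in (gamma, beta)
  list(coef = coef(fit), WSS = sum(w * resid(fit)^2))
}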

First, we consider the model with one change point. If the locations of the change points are known, then the problem of estimating the other unknown parameters reduces to standard techniques. Therefore, a simple method for attempting to fit our model is a grid search. This method begins by restricting the change points to a finite set of selected values (the grid). For each possible ν in the grid, the WSS is optimized by estimating the regression coefficients at that ν. Then the ν which corresponds to the smallest WSS is the estimate of the change point based on the grid search. Unfortunately, the grid search is very slow for a fine grid, and it can be inaccurate since the exact minimizer may not be in the grid of possible values of the change-point parameter.
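A direct transcription of the grid search just described, reusing fit_fixed_tau from the sketch above; by default the grid is the set of observed x values, which illustrates the inaccuracy noted in the text (the exact minimizer typically falls between grid points):

grid_search <- function(y, x, Z, w, grid = sort(unique(x))) {
  wss <- sapply(grid, function(t) fit_fixed_tau(y, x, Z, t, w)$WSS)
  list(nu = grid[which.min(wss)], WSS = min(wss))  # best grid point
}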

Hence we would prefer an exact algorithm. One approach is to differentiate with respect to all unknown parameters, including the change-point parameter, and search for local extrema. However, WSS is not differentiable where ν equals one of the observed x's. Thus, this method searches for local minimizers in each region of differentiability by letting ν vary within the region, and treats each point of nondifferentiability by fixing ν at that point. Finally, the change-point estimate is determined by evaluating WSS at each local minimizer in the regions of differentiability as well as at all of the (unique) x's. Similar algorithms have been discussed in the literature in various settings (see Hudson, 1966; Seber and Wild, 1989; Küchenhoff, 1997; and the references therein).

The standard exact algorithm presented in the previous paragraph is much more efficient than the grid search, but its computational efficiency can be improved for the model based on (1) by using part of the grid-search idea. Specifically, we would like to avoid evaluating our function at points of nondifferentiability when those points are not local minimizers. As shown in Section 3, it turns out that, if we begin by fixing ν and estimating all of the other parameters as functions of ν, we can find explicit formulas for the minimizer (and maximizer) when ν is restricted to the interval between a pair of consecutive ordered x's. These formulas are related to the derivative of WSS in such a way that we can simultaneously determine whether there is a minimizer and/or maximizer in the interval and whether the WSS, optimized with respect to all regression coefficients, is increasing or decreasing as we approach the endpoints from within that interval. Thus, we may be able to avoid optimizing and evaluating WSS at some (often most) of the x's.

These results apply to a more general model, given by (2) and discussed in Section 2, which includes a liquidity effect model in finance. The main idea of the liquidity effect model is that the underlying asset price process may depend on the activities of traders in the market. Although the idea that the evolution of the stock price depends on the trading volume has existed for several decades in the finance literature (for instance, see Clark, 1973; Morgan, 1976; Admati and Pfleiderer, 1988), the problem of liquidity risk was not widely studied until about a decade ago. Readers may consult Jarrow (1992, 1994, 2001), Frey (1998), Frey and Stremme (1997), Subramanian and Jarrow (2001), Duffie and Ziegler (2003), Pastor and Stambaugh (2003), Bank and Baum (2004), and Çetin et al. (2004) for recent works.

When we apply models with liquidity risk, for example the one by Çetin et al. (2006), to a real market data set, we encounter the problem of estimating the liquidity risk. Çetin et al. (2006) consider a model in which the liquidity effect is linear in the size of a trade, estimate the appropriate parameters by simple linear regression, and provide evidence that the supply curve is linear for the most liquid stocks. Our method, on the other hand, allows different slopes of


regression lines depending on the size of a trade. Thus, we provide a method to estimate the liquidity effect in the pricing model, using a multiple change-point detection algorithm. Under our model, the term representing the liquidity effect is piecewise linear in the size of a trade, which may be more appropriate for less liquid stocks.

The mathematical details and algorithm for finding the WLS estimator of a single change point in the general setting (2), which includes both the basic segmented regression model and this liquidity effect model, are presented in Section 3. Next, we consider fitting the model with multiple change points. Full exact searches for the global minimizer(s) are available, but they are very computationally intensive (see Küchenhoff, 1997). The most widely used method of approximating the global solution by estimating the change points one at a time is known as binary segmentation (see, for instance, Vostrikova, 1981; Yang, 2004, and the references therein). Beginning with a null model of no change points, the classical method estimates and tests a model with one change point. If the model with one change point is significant, then the data is split into two segments and the process is repeated for each segment. The splitting process continues until the null hypothesis of no change points is not rejected for any of the segments.

Since each segment shares the covariates z in our model, the binary segmentation method must be modified: γ must be estimated based on all segments. Also, we will not immediately be concerned with the testing aspect of binary segmentation as discussed above. Instead, we use it as a fast method of approximating the global solution when we suppose that we know the number of change points a priori. In Section 4, we describe our version of the segmentation method and use simulations to compare it with iterative methods which guarantee a local minimizer, as well as with the lengthy method of computing the global minimizer.

Finally, we illustrate the methods on two data sets in Section 5. The first example applies basic segmented regression to gasoline mileage data (see Koul et al., 2003). The second example uses our liquidity effect model to examine Federal Express (FDX) stock data. In each example, we use the generalized likelihood ratio test based on Monte Carlo simulations under the assumption of normality to estimate various p-values and choose ℓ before computing our final estimates.

2. The model

In this section we define the basic segmented regression model and derive a liquidity effect model. We show how these models can be expressed in the form

f(x_i, z_i) = γ⊤z_i + Σ_{j=1}^ℓ β_j φ(τ_j; x_i),   (2)

where there are known constants c_ik and d_ik such that

φ(ν; x_i) = φ_k(ν; x_i) ≡ c_ik − ν d_ik,  ν ∈ I_k,   (3)

where U is the set of unique observed x's, u_k is the kth smallest element of U, q is the cardinality of U, and I_k = [u_k, u_{k+1}] for k = 1, …, q − 1. Note that φ is a continuous, piecewise-linear function of ν; the continuity restriction will be relaxed at ν = 0 for the liquidity effect model.

2.1. Basic segmented regression model

First, we verify that the basic segmented regression model defined by (1) can be expressed as (2). Let U = {x | x = x_i for some i} and

φ(ν; x_i) = (x_i − ν)_+ = { 0 if x_i ≤ ν;  x_i − ν if x_i > ν }   (4)

for ν ∈ I_k as in (1). Now suppose x_i = u_m for some m. We have x_i = u_m ≥ u_{k+1} ≥ ν when k < m and x_i = u_m ≤ u_k ≤ ν when k ≥ m. Thus, Eq. (4) is a special case of (3) with

c_ik = { x_i for k = 1, …, m − 1;  0 for k = m, …, q − 1 }

and

d_ik = { 1 for k = 1, …, m − 1;  0 for k = m, …, q − 1 }.
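A small numeric check of this representation on hypothetical data (ν is taken in the interior of each I_k, and m is the index with x_i = u_m as above):

x <- c(1, 2, 4, 7)                        # hypothetical observed x values
u <- sort(unique(x)); q <- length(u)
m <- match(x, u)                          # index m with x_i = u_m
for (k in 1:(q - 1)) {
  nu <- (u[k] + u[k + 1]) / 2             # a point in the interior of I_k
  c_ik <- ifelse(k < m, x, 0)
  d_ik <- ifelse(k < m, 1, 0)
  stopifnot(isTRUE(all.equal(pmax(x - nu, 0), c_ik - nu * d_ik)))
}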

2.2. Liquidity effect model

Consider a filtered probability space (Ω, F, (F_t)_{0≤t≤T}, P) satisfying the usual conditions, where T is a fixed time and P represents the statistical or empirical probability measure. We consider a market with a risky asset and a money market account. The risky asset, a stock, pays no dividend, and we assume that the spot rate of interest is zero. We also consider an arbitrary trader who acts as a price taker with respect to an exogenously given supply curve for shares of the stock bought or sold within the trading interval.

S(t, x, ω) represents the stock price per share at time t ∈ [0, T] that the trader pays/receives for an order of size x ∈ ℝ given the state ω ∈ Ω. A positive order (x > 0) represents a buy, a negative order (x < 0) represents a sale, and the zeroth order (x = 0) corresponds to the marginal trade. In a perfectly liquid market, all orders can be considered marginal. For the detailed structure of the supply curve S(t, x), see Çetin et al. (2004, Section 2).

Let S(t, 0) be the Black–Scholes stock price:

S(t, 0) = S_0 exp{(μ − σ²/2)t + σB_t},   (5)

where B is a Brownian motion under P. Suppose our supply curve is given by

S(t, x) = e^{αx} S(t, 0),   (6)

where α is a constant. This is the case in which the liquidity effect grows linearly with the trading size on the log-price scale. This model can be expressed in the form of a linear regression as described in Çetin et al. (2006). Suppose that we observe settlement prices S(t_1, x_1) and S(t_2, x_2). Combining (5) and (6), we get the following:

log(S(t_2, x_2)/S(t_1, x_1)) = (μ − σ²/2)(t_2 − t_1) + α(x_2 − x_1) + ε,   (7)

where ε follows N(0, σ²(t_2 − t_1)). An advantage of the model (6) is that it is easy to calculate and to interpret, since we can use standard linear regression theory. But there is a drawback too. Recent studies such as Frino et al. (2003) and Tse and Xiang (2005) have found that liquidity effects are in general asymmetric and that the price effect is steeper in a block sale than in a block buy. Also, the liquidity cost does not behave like a linear transaction cost: a reasonably liquid stock is traded without price change up to some quantity, but begins to be affected by the trading volume once the volume exceeds that quantity. Moreover, the pattern of increase of the liquidity cost varies as the volume changes. This is supported by the typical strategy of block traders, who usually divide a block trade into several smaller orders to hide it. In addition, many block orders are traded not on the floor but by special agreement. Therefore, we expect the liquidity cost to be fairly flat beyond some point. These observations suggest that the liquidity cost changes its pattern as the volume changes. In other words, we expect a relatively flat pattern around zero, a steeper pattern in the middle, and again a relatively flat pattern at both extremes, with a bigger effect on the sale side.
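For concreteness, model (7) can be fit with lm; the following R sketch simulates data from (5)–(6) and recovers (μ − σ²/2, α) by weighted regression (all parameter values are hypothetical):

set.seed(1)
n  <- 200
dt <- runif(n, 0.0002, 0.001)             # elapsed times t2 - t1
dx <- sample(-10:10, n, replace = TRUE)   # changes in order size x2 - x1
mu <- 0.05; sigma <- 0.3; alpha <- 2e-4
y  <- (mu - sigma^2 / 2) * dt + alpha * dx + rnorm(n, 0, sigma * sqrt(dt))
# Var(eps_i) = sigma^2 * dt_i, so weights 1/dt_i make the errors homoscedastic
fit <- lm(y ~ dt + dx - 1, weights = 1 / dt)
coef(fit)                                 # estimates of mu - sigma^2/2 and alpha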

One possible extension is the multiple change-point model. This model allows us to fix the problems with model (6) noted above while maintaining piecewise linearity, so that the parameter estimates are still easy to interpret. Also, it gives us the ability to estimate the volume at which the size of a trade begins to affect the price of a stock, as well as any other changes in pattern for larger volumes. Therefore, we consider an extension of this regression model to the following multiple change-point model¹:

log(S(t_2, x_2)/S(t_1, x_1)) = (μ − σ²/2)(t_2 − t_1) + g(x_2) − g(x_1) + ε,   (8)

¹ This equation corresponds to the model S(t, x) = e^{g(x)} S(t, 0).


Table 1. Definitions of c_ik and d_ik for the liquidity effect model when ν ∈ int(I_k)

For ν > 0:
  ν < x_i1, x_i2:   c_ik = x_i2 − x_i1,  d_ik = 0
  x_i1 < ν < x_i2:  c_ik = x_i2,         d_ik = 1
  x_i2 < ν < x_i1:  c_ik = −x_i1,        d_ik = −1
  x_i1, x_i2 < ν:   c_ik = 0,            d_ik = 0

For ν < 0:
  ν < x_i1, x_i2:   c_ik = 0,            d_ik = 0
  x_i1 < ν < x_i2:  c_ik = −x_i1,        d_ik = −1
  x_i2 < ν < x_i1:  c_ik = x_i2,         d_ik = 1
  x_i1, x_i2 < ν:   c_ik = x_i2 − x_i1,  d_ik = 0

where

g(x) = Σ_{j=1}^{ℓ₊} β_{j+ℓ₋} (x − τ_{j+ℓ₋})_+ − Σ_{j=1}^{ℓ₋} β_j (τ_j − x)_+,

ℓ₋ and ℓ₊ are parameters representing the numbers of change points before or at, and at or after, x = 0, respectively, β_1, …, β_ℓ are unknown coefficients, and τ_1 < ··· < τ_{ℓ₋} ≤ 0 ≤ τ_{ℓ₋+1} < ··· < τ_ℓ are unknown change points with ℓ = ℓ₋ + ℓ₊. Here we replace the linear function αx in (7) by the more flexible g, a piecewise-linear continuous function whose graph contains the origin.

Given the observed settlement prices S(t_i1, x_i1) and S(t_i2, x_i2) for i = 1, …, n, where t_11 < t_12 ≤ t_21 < t_22 ≤ t_31 < ··· < t_n2, let

y_i = log(S(t_i2, x_i2)/S(t_i1, x_i1))

for i = 1, …, n, so that y_1, …, y_n are independent since they are based on independent increments. Then we have

y_i = f(x_i1, x_i2; w_i) + ε_i,

where

f(x_i1, x_i2; w_i) = γ_0/w_i + g(x_i2) − g(x_i1) = γ_0/w_i + Σ_{j=1}^ℓ β_j φ(τ_j; x_i1, x_i2),

φ(ν; x_i1, x_i2) = { (x_i2 − ν)_+ − (x_i1 − ν)_+ if ν > 0;
                     (ν − x_i1)_+ − (ν − x_i2)_+ if ν < 0;   (9)
                     0 if ν = 0 },

γ_0 = μ − σ²/2, w_i = (t_i2 − t_i1)^{−1}, and ε_1, …, ε_n are independent N(0, σ²(t_i2 − t_i1)), i = 1, …, n, respectively. Since the function φ is not necessarily continuous at ν = 0, we admit the left and right limits 0− and 0+ as possible values at which we may wish to evaluate φ; that is, φ(0+; x_i1, x_i2) = (x_i2)_+ − (x_i1)_+ and φ(0−; x_i1, x_i2) = (−x_i1)_+ − (−x_i2)_+.
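A direct R transcription of (9) and of the one-sided values at zero (a sketch; ν is scalar, and the functions are vectorized in the trade sizes):

phi_liq <- function(nu, x1, x2) {
  if (nu > 0) return(pmax(x2 - nu, 0) - pmax(x1 - nu, 0))
  if (nu < 0) return(pmax(nu - x1, 0) - pmax(nu - x2, 0))
  rep(0, length(x1))                       # nu == 0: the marginal trade
}
phi_0plus  <- function(x1, x2) pmax(x2, 0) - pmax(x1, 0)     # limit from the right
phi_0minus <- function(x1, x2) pmax(-x1, 0) - pmax(-x2, 0)   # limit from the left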

Since (9) is not continuous at 0, we need to modify U and I_k before (9) can be expressed in the form (3). Let U = {x | x = x_ij for some i, j, or x = 0}, I_k = [u_k, u_{k+1}] − {0}, and let q* be the cardinality of U ∩ (−∞, 0). Then (3) is valid if we define c_ik and d_ik by the respective values in Table 1 when ν is in the interior of I_k.

3. Single change-point estimation

In this section, we describe our algorithm for finding the WLS estimate under the model (2, 3) with ℓ = 1. As mentioned in Section 1, it is more computationally efficient than the standard exact algorithm. However, it should be noted that our setting is different from that of other authors, and our results do not hold in general in their settings. Hudson (1966) considered more general functional changes in the model at the change points, and Küchenhoff (1997) considered maximum likelihood estimation of change points in a generalized linear model (GLM) setting. Nevertheless, the most widely used cases covered by these authors' models (the case where f_1, …, f_r are linear for Hudson, 1966, and the normal case for Küchenhoff, 1997) are covered by the model given by (2, 3).


3.1. Mathematical preliminaries

When ℓ = 1, the WLS estimator corresponds to the value of ν which minimizes the optimized weighted sum of squares

V(ν; β(ν), γ(ν)) = Σ_{i=1}^n w_i (y_i − f(x_i, z_i))² = ‖W^{1/2}(y − Zγ(ν) − β(ν)φ(ν))‖²,   (10)

where y = [y_1, …, y_n]⊤, Z = [z_1 ··· z_n]⊤, W is the diagonal matrix with diagonal elements w_i for i = 1, …, n, φ(ν) = [φ(ν; x_1), …, φ(ν; x_n)]⊤, and (β, γ) is a solution to the score equations

∂V(ν; β, γ)/∂γ = −2Z⊤W(y − Zγ − φ(ν)β) = 0   (11)

and

∂V(ν; β, γ)/∂β = −2φ⊤(ν)W(y − Zγ − φ(ν)β) = 0.   (12)

When ν is fixed, (β, γ) corresponds to a minimizer, since the matrix of second partials of V with respect to (β, γ),

2[Z ⋮ φ(ν)]⊤W[Z ⋮ φ(ν)],

is positive semidefinite. Throughout this section, we denote the range of Z by R(Z) and assume that Z is of full rank. We also assume that y ∉ R(Z), since y ∈ R(Z) would imply that V(ν) = 0 for every ν; there would be no reason to search for a change point if the model based on Z provided a perfect fit.

First, we want to examine the solutions to the score equations (11) and (12). Let K = W − WZ(Z⊤WZ)^{−1}Z⊤W and J = I − Z(Z⊤WZ)^{−1}Z⊤W. Note that K is symmetric, J is idempotent, and K = J⊤WJ = J⊤W = WJ. Then, Theorem 1 explicitly solves the score equations when the solution is unique; when the solution is not unique, it is shown that the minimum of V is the same as under the reduced model without an additional change point.
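These identities are easy to verify numerically; a minimal R sketch with an arbitrary full-rank Z and positive weights:

set.seed(2)
n <- 6
Z <- cbind(1, rnorm(n))                    # full-rank design
W <- diag(runif(n, 0.5, 2))                # positive weights
G <- solve(t(Z) %*% W %*% Z)
J <- diag(n) - Z %*% G %*% t(Z) %*% W
K <- W - W %*% Z %*% G %*% t(Z) %*% W
stopifnot(isTRUE(all.equal(J %*% J, J)),             # J is idempotent
          isTRUE(all.equal(K, t(J) %*% W %*% J)),    # K = J'WJ
          isTRUE(all.equal(K, W %*% J)))             # K = WJ (= J'W)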

Theorem 1 (Solutions to score equations).

(a) For a ∈ ℝⁿ, a⊤Ka = 0 if and only if a ∈ R(Z).
(b) For any ν such that φ(ν) ∉ R(Z),

γ(ν) = (Z⊤WZ)^{−1} Z⊤W (I − φ(ν)φ⊤(ν)K / (φ⊤(ν)Kφ(ν))) y   (13)

and

β(ν) = φ⊤(ν)Ky / (φ⊤(ν)Kφ(ν)).   (14)

(c) For any ν such that φ(ν) ∈ R(Z),

V̄ ≡ min_{β,γ} V(ν; β, γ) = min_γ V(ν; 0, γ) = y⊤Ky.   (15)

Proof. Consider statement (a). Suppose a⊤Ka = 0, which implies that W^{1/2}Ja = 0. Left multiplication by W^{−1/2} produces

a = Z(Z⊤WZ)^{−1}Z⊤Wa,

which shows that a ∈ R(Z). Conversely, suppose that a ∈ R(Z). Then there is some b such that a = Zb, so that

Ja = (I − Z(Z⊤WZ)^{−1}Z⊤W)Zb = 0,

implying that a⊤Ka = (Ja)⊤W(Ja) = 0.

Now consider (b). The score equations can be reduced to

Z⊤WZγ + Z⊤Wφ(ν)β = Z⊤Wy,
φ⊤(ν)WZγ + φ⊤(ν)Wφ(ν)β = φ⊤(ν)Wy.

Thus, we can solve for [γ⊤, β]⊤ by observing that

[ Z⊤WZ       Z⊤Wφ(ν)    ]^{−1}   [ G(I + Z⊤Wφ(ν)φ⊤(ν)WZG/(φ⊤(ν)Kφ(ν)))   −GZ⊤Wφ(ν)/(φ⊤(ν)Kφ(ν)) ]
[ φ⊤(ν)WZ    φ⊤(ν)Wφ(ν) ]      = [ −φ⊤(ν)WZG/(φ⊤(ν)Kφ(ν))                1/(φ⊤(ν)Kφ(ν))          ],

where G = (Z⊤WZ)^{−1}. Now G exists since Z is of full rank, and the overall inverse exists since φ(ν) ∉ R(Z), so that φ⊤(ν)Kφ(ν) ≠ 0.

Finally, consider (c). If φ(ν) ∈ R(Z), then the range of the matrix [Z ⋮ φ(ν)] is the same as that of Z. Thus, the projection of y onto either space is the same, which proves the first equality in (15). The second equality follows easily from standard techniques. □
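As a sanity check, the closed forms (13) and (14) can be compared with a weighted regression that simply includes φ(ν) as an extra column; an R sketch with simulated inputs (the data-generating model here is arbitrary):

set.seed(3)
n <- 40
x <- sort(runif(n, 0, 10)); Z <- cbind(1, x)
w <- runif(n, 0.5, 2); W <- diag(w)
y <- 1 + 0.5 * x + 0.8 * pmax(x - 4, 0) + rnorm(n, 0, 0.3)
nu  <- 3.7
phi <- pmax(x - nu, 0)                      # phi(nu) for the basic model
G <- solve(t(Z) %*% W %*% Z)
K <- W - W %*% Z %*% G %*% t(Z) %*% W
beta_nu  <- drop(t(phi) %*% K %*% y / (t(phi) %*% K %*% phi))   # formula (14)
gamma_nu <- drop(G %*% t(Z) %*% W %*% (y - phi * beta_nu))      # formula (13)
fit <- lm(y ~ x + phi, weights = w)         # same fit via weighted regression
stopifnot(isTRUE(all.equal(unname(coef(fit)), c(gamma_nu, beta_nu))))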

Now that we have solved the score equations, we view V as a univariate function of ν and minimize it. It is useful to express (10) as

V(ν) = V_k(ν),  ν ∈ I_k,  k = 1, …, q − 1,

where

V_k(ν) = ‖W^{1/2}(y − Zγ_k(ν) − β_k(ν)φ_k(ν))‖²,   (16)

φ_k(ν) = c_k − νd_k, c_k = [c_1k, …, c_nk]⊤, d_k = [d_1k, …, d_nk]⊤, and γ_k and β_k are defined as in (11) and (12) with φ replaced by φ_k. Of course, γ_k and β_k are only uniquely defined when φ_k(ν) ∉ R(Z), but V_k = V̄ when φ_k(ν) ∈ R(Z). The function V_k agrees with V for all ν ∈ I_k. Thus, we minimize each V_k restricted to I_k. We will consider the cases S_k ≠ O and S_k = O separately, where

S_k = K(c_k d_k⊤ − d_k c_k⊤)K

and O denotes a matrix of zeros.

Statement (a) in Theorem 2 gives a necessary and sufficient condition for S_k = O. Since the condition depends on whether φ_k(ν) is in the range of Z, Theorem 1 implies that this determines the values of ν for which we can solve for γ_k(ν) and β_k(ν) uniquely. Statement (b) shows that these score equations cannot be solved uniquely for any value of ν ∈ ℝ if they cannot be solved uniquely at more than one ν.

Theorem 2 (Characterization of φ_k(ν) ∈ R(Z)).

(a) There exists some ν* ∈ ℝ such that φ_k(ν*) ∈ R(Z) if and only if S_k = O.
(b) If φ_k(ν*_1), φ_k(ν*_2) ∈ R(Z) for ν*_1 ≠ ν*_2, then φ_k(ν) ∈ R(Z) for all ν ∈ ℝ.

Proof. Consider statement (a). First suppose φ_k(ν*) ∈ R(Z). Then Kφ_k(ν*) = 0, so that Kc_k = ν*Kd_k. Thus, we have

S_k = K(ν*d_k d_k⊤ − d_k c_k⊤)K = −Kd_k(c_k − ν*d_k)⊤K = −Kd_k 0⊤ = O.


Fig. 1. Plots of V_k for Theorem 3(d1)–(d4) (four panels showing V_k against ν, with the bound V̄, the asymptote V_k*, and the maximizer ν_k marked).

Conversely, suppose S_k = O. Then ab⊤ = ba⊤, where a = Kc_k ≡ [a_1, …, a_n]⊤ and b = Kd_k ≡ [b_1, …, b_n]⊤. This implies that a_i b_j = b_i a_j for all i and j. Consequently, a = ν*b for some ν*, so that K(c_k − ν*d_k) = 0. Thus, we have φ_k(ν*) ∈ R(Z).

Next we have

φ_k(ν) = ((ν*_2 − ν)/(ν*_2 − ν*_1)) φ_k(ν*_1) + ((ν − ν*_1)/(ν*_2 − ν*_1)) φ_k(ν*_2)

for any ν, which proves (b). □

Consequently, S_k ≠ O if there are no values of ν such that φ_k(ν) ∈ R(Z). Theorem 3 discusses properties of V_k in this case. Fig. 1 illustrates scenarios 1–4 described in part (d).

Theorem 3 (Properties of V_k when S_k ≠ O). If S_k ≠ O, then the following statements hold:

(a) V_k is differentiable with respect to ν on ℝ.
(b) If d_k ∉ R(Z), then V_k has exactly one horizontal asymptote at

V_k* ≡ V_k(−∞) = V_k(∞) = y⊤(K − K d_k d_k⊤ K / (d_k⊤Kd_k)) y.

If d_k ∈ R(Z), then V_k is constant.
(c) V_k (and V) are bounded above by V̄.
(d) Let μ_k = c_k⊤S_k y / d_k⊤S_k y if d_k⊤S_k y ≠ 0 and ν_k = c_k⊤Ky / d_k⊤Ky if d_k⊤Ky ≠ 0. Then V_k satisfies one of the following five situations:
1. If (d_k⊤Ky)(d_k⊤S_k y) < 0, then V_k decreases on (−∞, μ_k), increases on (μ_k, ν_k), and decreases on (ν_k, ∞). Thus, V_k is minimized at μ_k and maximized at ν_k.
2. If (d_k⊤Ky)(d_k⊤S_k y) > 0, then V_k increases on (−∞, ν_k), decreases on (ν_k, μ_k), and increases on (μ_k, ∞). Thus, V_k is minimized at μ_k and maximized at ν_k.
3. If d_k⊤Ky = 0 but d_k⊤S_k y ≠ 0, then V_k decreases on (−∞, μ_k) and increases on (μ_k, ∞). Thus, V_k is minimized at μ_k but has no maximum.
4. If d_k⊤Ky ≠ 0 but d_k⊤S_k y = 0, then V_k increases on (−∞, ν_k) and decreases on (ν_k, ∞). Thus, V_k is maximized at ν_k but has no minimum.
5. If d_k⊤Ky = d_k⊤S_k y = 0, then V_k is constant.

Proof. Since S_k ≠ O, Theorem 2(a) and Theorem 1(b) imply that γ_k and β_k are differentiable, so that statement (a) follows from the differentiability of φ_k.

Statement (b) follows by substituting γ_k(ν) and β_k(ν) into (16) to obtain

V_k(ν) = ‖K^{1/2}(I − φ_k(ν)φ_k⊤(ν)K/(φ_k⊤(ν)Kφ_k(ν)))y‖² = y⊤(K − K(c_k − νd_k)(c_k − νd_k)⊤K / ((c_k − νd_k)⊤K(c_k − νd_k)))y

and taking the limits as ν → −∞ and ν → ∞.

Statement (c) follows from

V_k(ν) = (y − Zγ_k)⊤W(y − Zγ_k) − 2β_k φ_k⊤W(y − Zγ_k − φ_kβ_k) − β_k² φ_k⊤Wφ_k = (y − Zγ_k(ν))⊤W(y − Zγ_k(ν)) − β_k²(ν)φ_k⊤(ν)Wφ_k(ν) ≤ V̄,   (17)

since (12) implies that the middle term in (17) is zero, and the remaining expression equals y⊤Ky − β_k²(ν)φ_k⊤(ν)Kφ_k(ν) ≤ y⊤Ky = V̄. The result holds for V(ν) since the bound holds for all k.

To show (d), we first differentiate (16) with respect to ν to obtain

dV_k(ν)/dν = 2β_k(ν)f_k(ν),

where f_k(ν) = d_k⊤W(y − Zγ_k(ν) − φ_k(ν)β_k(ν)); note that all terms involving the derivatives of γ_k and β_k cancel. Since φ_k⊤(ν)Kφ_k(ν) > 0, we have that

β_k(ν) = (c_k⊤Ky − ν d_k⊤Ky) / (φ_k⊤(ν)Kφ_k(ν))

is decreasing (constant/increasing) if d_k⊤Ky > 0 (= 0 / < 0), with a unique root at ν_k if d_k⊤Ky ≠ 0. Clearly, ν_k maximizes V_k since β_k(ν) = 0 corresponds to the upper bound V̄. Also, we have that

f_k(ν) = (c_k⊤S_k y − ν d_k⊤S_k y) / (φ_k⊤(ν)Kφ_k(ν))

is decreasing (constant/increasing) if d_k⊤S_k y > 0 (= 0 / < 0), with a unique root at μ_k if d_k⊤S_k y ≠ 0. When both exist, the ordering of μ_k and ν_k must occur as stated in (d) since ν_k must be a maximizer. This proves (d1) and (d2). If d_k⊤Ky = 0, then d_k⊤S_k y = −(d_k⊤Kd_k)(c_k⊤Ky), so that β_k and f_k have opposite signs before μ_k and the same sign after μ_k, which proves (d3); note that d_k⊤Kd_k > 0 since d_k⊤S_k ≠ 0. If d_k⊤S_k y = 0, then

(d_k⊤Kc_k)(d_k⊤Ky) = (d_k⊤Kd_k)(c_k⊤Ky)

so that

c_k⊤S_k y = [((c_k⊤Kc_k)(d_k⊤Kd_k) − (c_k⊤Kd_k)²) / (d_k⊤Kd_k)] d_k⊤Ky.   (18)

Note that c_k is not a multiple of d_k since this would contradict S_k ≠ O; that is, if c_k = ν*d_k, then K(c_k − ν*d_k) = 0 and φ_k(ν*) = 0 ∈ R(Z), which implies S_k = O by Theorem 2(a). Thus, the numerator of the first factor on the right of (18) is positive by the Cauchy–Schwarz inequality, so that β_k and f_k have the same sign before ν_k and opposite signs after ν_k. This proves (d4). If d_k⊤Ky = d_k⊤S_k y = 0, then c_k⊤S_k y = 0, so that f_k(ν) = 0, which proves (d5). □

On the other hand, S_k = O if there is at least one value ν* such that φ_k(ν*) ∈ R(Z). Theorem 4 discusses properties of V_k for this case.

Theorem 4 (Properties of V_k and V when S_k = O). If S_k = O, then the following statements hold.

(a) If there is exactly one ν* such that φ_k(ν*) ∈ R(Z), then V_k is constant over ℝ except possibly for a removable discontinuity at ν*.
(b) If the assumption of Theorem 2(b) holds, then V_k(ν) = V̄ for all ν.

Proof. The proof for (a) is similar to the proof of Theorem 3(d5) when ν ≠ ν*. Statement (b) follows immediately from Theorem 2(b) and Theorem 1(c). □

3.2. Algorithm

Consider the artificial data set y = [1, 2, 3, 4, 3, 0, 3]⊤, x_i = i, and w_i = 1 for i = 1, …, 7. For the basic segmented regression model with z_i = [1, x_i]⊤, the function V is plotted in Fig. 2.

Fig. 2. Plots of V and V_k, k = 1, …, 6, for the artificial data set in Section 3.2 (V against ν on the interval (1, 7)).

To use the standard exact algorithm, we need to search for critical points in the interior of I_k for k = 2, …, 5. Then we evaluate V at each of these points as well as at x_k = k for k = 2, …, 6, since V is not differentiable there. However, the only local minimizer among the points of nondifferentiability (from 2 to 6) is 6. Thus, we need only evaluate V at the critical point between 3 and 4 and at 6.
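The profile in Fig. 2 can be reproduced (approximately) by brute force, optimizing the remaining coefficients on a fine grid of ν values; a short R sketch:

y <- c(1, 2, 3, 4, 3, 0, 3); x <- 1:7
V <- function(nu) sum(resid(lm(y ~ x + pmax(x - nu, 0)))^2)  # coefficients optimized by lm
nus <- seq(1.05, 6.95, by = 0.01)
plot(nus, sapply(nus, V), type = "l", xlab = "nu", ylab = "V")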

In this section, we describe an algorithm which does not evaluate V at an endpoint of I_k unless it is a possible minimizer. (An endpoint is a possible minimizer if it is a local minimizer and there is no other local minimizer in the adjacent intervals.) The algorithm determines the sign of the directional derivatives at each of the points in U using the same calculations used to determine whether there are critical points in the interiors of the respective intervals. In most practical problems, very few of the points in U will be possible minimizers, so the algorithm avoids evaluating V at most points of nondifferentiability. The details given for the case S_k = O are important since we will consider an application of the single change-point algorithm in Section 4 where this occurs frequently.

Now we are ready to give the details of our algorithm based on Theorems 3 and 4. Although we describe the algorithm moving from right to left, obvious modifications can be made if we wish to move from left to right instead. In moving from right to left, we cannot determine whether the left endpoint u_k of I_k is a possible minimizer until after we examine I_{k−1}. Thus, we need the boolean-valued variable END, where TRUE indicates that u_k might be a possible minimizer based on the behavior of V in I_k, pending its behavior in I_{k−1}.

1. Compute K and Ky.
2. Set END = TRUE and k = q − 1.
3. If k = 0, then go to 4. Otherwise, search for possible minimizers in (u_k, u_{k+1}]: compute d_k, c_k, and S_k. If S_k ≠ O, then go to 3.1. If S_k = O, then go to 3.2.
3.1. Compute S_k y and d_k⊤S_k y. If d_k⊤S_k y ≠ 0, then go to 3.1.1. If d_k⊤S_k y = 0, then go to 3.1.2.
3.1.1. Compute c_k⊤S_k y and μ_k. If μ_k ∈ (u_k, u_{k+1}), then store μ_k as a possible minimizer based on Theorem 3(d1)/3(d2), set END = FALSE, decrement k by 1, and go back to 3. If μ_k ∉ (u_k, u_{k+1}), then go to 3.1.1.1.
3.1.1.1. Compute d_k⊤Ky. If (d_k⊤Ky)(d_k⊤S_k y) > 0, then go to 3.1.1.1.1. If (d_k⊤Ky)(d_k⊤S_k y) < 0, then go to 3.1.1.1.2. If d_k⊤Ky = 0, then go to 3.1.1.1.3.
3.1.1.1.1. If μ_k ≤ u_k, then go to 3.3.3. If μ_k ≥ u_{k+1}, then go to 3.3.
3.1.1.1.2. If μ_k ≤ u_k, then go to 3.3. If μ_k ≥ u_{k+1}, then go to 3.3.1.
3.1.1.1.3. If μ_k ≤ u_k, then go to 3.3.3. If μ_k ≥ u_{k+1}, then go to 3.3.1.
3.1.2. Compute d_k⊤Ky. If d_k⊤Ky ≠ 0, then go to 3.3. If d_k⊤Ky = 0, then V is constant over I_k by Theorem 3(d5), so keep the previous value of END, decrement k by 1, and go back to 3.
3.2. Theorem 4 implies that V is constant a.e. over I_k. Compute Kφ(u_{k+1}) to determine whether V is continuous at u_{k+1}. If Kφ(u_{k+1}) = 0, then go to 3.2.1. If Kφ(u_{k+1}) ≠ 0, then go to 3.2.2.
3.2.1. Theorem 4 implies that V may not be continuous at u_{k+1}. If END = TRUE, then store u_{k+2} as a possible minimizer. (It is not unique in this case.) Next, set END = TRUE, decrement k by 1, and go back to 3.
3.2.2. Theorem 4(a) implies that V is continuous at u_{k+1}. Hence, keep the previous value of END, decrement k by 1, and go back to 3.
3.3. Compute c_k⊤Ky and ν_k. If ν_k < u_k, then go to 3.3.1. If ν_k ∈ I_k, then go to 3.3.2. If ν_k > u_{k+1}, then go to 3.3.3.
3.3.1. Theorem 3(d) implies that V is decreasing on I_k. If END = TRUE, then store u_{k+1} as a possible minimizer. Next, set END = FALSE, decrement k by 1, and go back to 3.
3.3.2. Theorem 3(d) implies that V is increasing on I_k for ν < ν_k and decreasing for ν > ν_k. If END = TRUE, then store u_{k+1} as a possible minimizer. Next, set END = TRUE, decrement k by 1, and go back to 3.
3.3.3. Theorem 3(d) implies that V is increasing on I_k. Thus, set END = TRUE, decrement k by 1, and go back to 3.
4. If END = TRUE, then store u_1 as a possible minimizer.
5. Evaluate V at all possible minimizers. The value of ν which corresponds to the smallest V is the minimizer of V.

3.3. Special cases

We consider four special cases in which minor simplifications and/or modifications should be made to the algorithm. Here let 1 be a vector of ones and x = [x_1, …, x_n]⊤.

Basic segmented regression with z_1 = 1 and z_2 = x: In the algorithm described in Section 3.2, we must search for possible minimizers in [u_1, u_q]. However, in this most widely used version of segmented regression, it can be shown that the minimizer must be in [u_2, u_{q−1}]: Theorems 2 and 4 imply that V_1 is constant on (u_1, u_2) and V_{q−1} is constant on (u_{q−1}, u_q), since φ_1(u_1) = x − u_1 1 ∈ R(Z) and φ_{q−1}(u_q) = 0 ∈ R(Z). For the artificial data set in Section 3.2, the computations required to determine that the WLS estimate of the change-point parameter is ν̂ = 3.625 are given in Table 2. The points in the column labeled ν* are the points at which V must be evaluated.

Table 2. Computation of the minimizer of V for the artificial data set in Section 3.2

Interval     d_k⊤S_ky   c_k⊤S_ky   μ_k     d_k⊤Ky   c_k⊤Ky    ν_k     END     ν*      V(ν*)
ν > 6        —          —          —       —        —         —       TRUE    —       —
ν ∈ (5, 6]   −0.612     −4.796     7.833   −1.929   −11.071   5.741   TRUE    6       10.819
ν ∈ (4, 5]   0.612      1.224      2.000   −1.286   −7.857    6.111   TRUE    —       —
ν ∈ (3, 4]   0.980      3.551      3.625   —        —         —       FALSE   3.625   7.200
ν ∈ (2, 3]   0.357      1.173      3.286   1.214    1.357     1.118   FALSE   —       —
ν ≤ 2        —          —          —       —        —         —       —       —       —

Slippage model with Z = 1 and W = I: This is a basic segmented regression model in which p = 1, so that the mean of y given x is the constant γ before the change point ν but gradually drifts away from γ at rate β_1 when x > ν. Here the minimizer must be in the interval [u_1, u_{q−1}] since φ_{q−1}(u_q) = 0 ∈ R(Z).

An interesting computational simplification can be made in this model. Provided that Σ_{i=1}^n d_ik y_i (x_i − x̄_k*) ≠ 0 and ȳ_k* ≠ ȳ_k°, it can be shown that

μ_k = x̄_k* − (ȳ_k* − ȳ_k°) Σ_{i=1}^n d_ik (x_i − x̄_k*)² / Σ_{i=1}^n d_ik y_i (x_i − x̄_k*)

and

ν_k = x̄_k* + Σ_{i=1}^n d_ik y_i (x_i − x̄_k*) / [n d̄_k (1 − d̄_k)(ȳ_k* − ȳ_k°)],

where x̄_k* = Σ_{i=1}^n d_ik x_i / Σ_{i=1}^n d_ik, ȳ_k* = Σ_{i=1}^n d_ik y_i / Σ_{i=1}^n d_ik, ȳ_k° = Σ_{i=1}^n (1 − d_ik) y_i / (n − Σ_{i=1}^n d_ik), and d̄_k = Σ_{i=1}^n d_ik / n. Either μ_k or ν_k must be greater than x̄_k*, and x̄_k* ≥ u_{k+1}, so we need not compute ν_k if we know μ_k < x̄_k*, since this guarantees that ν_k > u_{k+1}.

�k > uk+1.Liquidity effect model: Consider again the model proposed in Section 2.2. The minimizer must be in the interval

[u2, uq−1] since �1(u1)=�q−1(uq)=0 ∈ R(Z). The main modification that must be made is due to discontinuity of (9)and thus of V at 0. Therefore, the most natural way of modifying the algorithm is performing step 3 for k=q−2, . . . , q∗and then repeating this step again for k = 2, . . . , q∗ − 1 (where we instead move from left to right and modify thealgorithm accordingly).

Model with covariates representing fixed change points: Suppose that one of the columns of Z equals φ_k(ν*) for some ν* ∈ I_k. For instance, this is applicable to the methods discussed in Section 4. Then Theorem 2 implies that S_k = O, so we need not compute it to find that V_k is constant everywhere except possibly at ν*. If ν* is, say, the right endpoint u_{k+1} of I_k, then our objective function V is constant a.e. over both intervals I_k and I_{k+1} but may have a removable discontinuity at ν*.

4. Multiple change-point estimation

Now, we consider the problem of finding the WLS solution under the model (2, 3) when the number of change points ℓ is greater than 1. In this section, we propose various methods for approximating the computationally expensive exact global solution by estimating the change points one at a time, and we compare these methods via simulation.

We begin by presenting each method. Each method is used to approximate the global minimizer based on the basic segmented regression model with ℓ = 3 change points and z_i = [1, x_i]⊤ for the artificial data set y = [5, 3, 9, 1, 7, 8, 2, 4, 5, 10, 1, 10, 2, 2, 6]⊤, x_i = i, and w_i = 1 for i = 1, …, 15.

Segmentation method: Recall from Section 1 that the method described here differs from the classic version in the literature. Beginning with the covariates in z_i, use the single change-point algorithm described in Section 3.2 to estimate the first change point ν̂_1. Next, add the extra covariate φ(ν̂_1; x_i) to z_i and estimate the next change point ν̂_2, again with the single change-point algorithm. Continue this process until we have obtained estimates ν̂_1, …, ν̂_ℓ. Finally, order the estimates to obtain the final segmentation estimates τ̂_1, …, τ̂_ℓ.
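In outline, this is ℓ greedy single-change-point fits. A sketch in R for the basic model, assuming a solver one_cp(y, x, Z, w) (hypothetical name) that returns the single change-point WLS estimate, e.g., the algorithm of Section 3.2 or the grid search of Section 1:

segmentation <- function(y, x, Z, w, ell, one_cp) {
  nu_hat <- numeric(0)
  for (j in 1:ell) {
    nu_j <- one_cp(y, x, Z, w)            # estimate the next change point
    Z <- cbind(Z, pmax(x - nu_j, 0))      # then fix it as an extra covariate
    nu_hat <- c(nu_hat, nu_j)
  }
  sort(nu_hat)                            # ordered estimates tau_hat
}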

We now illustrate the segmentation method on the artificial data set in this section. Beginning with z_i = [1, x_i]⊤, we use the single change-point algorithm to obtain the estimate ν̂_1 = 10 (WSS = 139.990). Then, fixing the change point at 10 and using z_i = [1, x_i, (x_i − 10)_+]⊤, we estimate ν̂_2 = 14 (WSS = 130.642). Next, fixing another change point at 14 and using z_i = [1, x_i, (x_i − 10)_+, (x_i − 14)_+]⊤, we estimate ν̂_3 = 8.805 (WSS = 119.893). Finally, we order the estimates to obtain τ̂_1 = 8.805, τ̂_2 = 10, and τ̂_3 = 14 with WSS = 119.893.

Quick local method: Although the segmentation method is very fast, it is not guaranteed to find even a local minimizer. With this in mind, we now propose two methods which are guaranteed to converge to a local minimizer. In the first method, we begin by performing the segmentation method to obtain initial estimates ν̂_1, …, ν̂_ℓ, with the estimates labeled prior to ordering. Then we re-estimate the change points one at a time. That is, we begin by considering the covariate vector with the original components of z_i plus the additional terms φ(ν̂_j), j = 2, …, ℓ, and re-estimating ν̂_1. Then we re-estimate ν̂_2 based on all other terms involving fixed change points. This continues until we cycle through all of the change points. If there is a sufficiently large change in the minimum value of V compared with its value before the cycle, then we cycle through and update the estimates ν̂_1, …, ν̂_ℓ again. This iterative process continues until convergence at a desired level.

To illustrate the quick local method on the artificial data set, we begin by obtaining initial estimates based on the segmentation method. Then, we fix change points at ν̂_2 = 14 and ν̂_3 = 8.805 and re-estimate ν̂_1 using the single change-point algorithm; it stays at 10. With ν̂_1 = 10 and ν̂_3 = 8.805 fixed, we re-estimate ν̂_2 to be 11, and the WSS decreases to 114.868. With ν̂_1 = 10 and ν̂_2 = 11 fixed, we re-estimate ν̂_3 to be 8.881, so that the WSS decreases to 114.793. Thus, we must cycle through the change points again since the WSS decreased by 5.100. With ν̂_2 = 11 and ν̂_3 = 8.881 fixed, ν̂_1 stays at 10. With ν̂_1 = 10 and ν̂_3 = 8.881 fixed, ν̂_2 stays at 11. This is sufficient to show that the change-point estimates have stabilized, so the final estimates based on this method are τ̂_1 = 8.881, τ̂_2 = 10, and τ̂_3 = 11 with WSS = 114.793.
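The cycling step can be written as coordinate descent over the change points; a sketch reusing fit_fixed_tau from Section 1 and the hypothetical single change-point solver one_cp (nu_hat holds the unordered segmentation estimates):

quick_local <- function(y, x, Z0, w, nu_hat, one_cp, tol = 1e-8) {
  repeat {
    wss_old <- fit_fixed_tau(y, x, Z0, nu_hat, w)$WSS
    for (j in seq_along(nu_hat)) {
      Zfix <- Z0                                   # fix all other change points
      for (t in nu_hat[-j]) Zfix <- cbind(Zfix, pmax(x - t, 0))
      nu_hat[j] <- one_cp(y, x, Zfix, w)           # re-estimate the j-th one
    }
    wss_new <- fit_fixed_tau(y, x, Z0, nu_hat, w)$WSS
    if (wss_old - wss_new < tol) break             # no sufficient improvement
  }
  sort(nu_hat)
}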

Repeated local method: Now we describe a slower but most likely more accurate method in which we can guarantee a local minimizer for the model with ℓ change points. Begin by using the quick local method to obtain a local minimizer of the model with two change points. Then we use these estimates for the model with two change points as initial estimates to obtain estimates for the model with three change points, which in turn we use as initial estimates for the model with four change points, and so on until we reach the model with ℓ change points. Each of these repeated local searches begins by using the single change-point algorithm to estimate an additional change point based on a model with j fixed change points. Then, we iteratively cycle through all change points in the list ν̂_1, …, ν̂_{j+1} until the values converge as described in the quick local method.

To illustrate the repeated local method on the artificial data set, we begin by obtaining initial estimates ν̂_1 = 10 and ν̂_2 = 14 based on the second step of the segmentation method, and first find a local minimizer for the model with ℓ = 2. With ν̂_2 = 14 fixed, we re-estimate ν̂_1 to be 12, and WSS decreases from 130.642 to 122.308. With ν̂_1 = 12 fixed, we re-estimate ν̂_2 to be 13.479, so that WSS decreases to 118.719. With ν̂_2 = 13.479 fixed, ν̂_1 stays at 12, and so the two change points have stabilized. Now we update the local minimizer for ℓ = 2 to obtain a local minimizer for ℓ = 3. With ν̂_1 = 12 and ν̂_2 = 13.479 fixed, we estimate ν̂_3 = 11, so that WSS decreases to 100.525. With ν̂_2 = 13.479 and ν̂_3 = 11 fixed, ν̂_1 stays at 12. With ν̂_1 = 12 and ν̂_3 = 11 fixed, we re-estimate ν̂_2 to be 13.333, so that WSS decreases to 99.918. There are no further changes when we re-estimate ν̂_3 and ν̂_1, so this method produces the estimates τ̂_1 = 11, τ̂_2 = 12, and τ̂_3 = 13.333 with WSS = 99.918.

Analysis of the artificial data set illustrates the trade-off between speed and performance for the approximation methods. The repeated local method is the most accurate, but it is the slowest. The computing times for the three algorithms, using the free statistical package R on a Dell 2.4 GHz workstation, are .01, .04, and .08 s, respectively. All are very fast compared with the simple standard exact global algorithm, with a naively written portion that uses the built-in lm function to extract all needed coefficient estimates, which took 15.55 s. In this case, it turns out that the repeated local method finds the global minimizer.

Table 3 summarizes the results of a small simulation study to examine the performance of the methods. Here we let n = 80 and simulate y_i according to the model y_i = η_i + ε_i, where each ε_i follows an independent Normal(0, σ = 0.5)



distribution. We consider two different models for η_i: η_i = 1 − δx_i for model A (a misspecified model) and η_i = 1 − δx_i − δ Σ_{j=1}^ℓ (−1)^j (x_i − τ_j)_+ for model B (a model with ℓ true change points). Also, we consider three different models for x_i: x = [0, …, 7, 0, …, 7, …, 0, …, 7]⊤ for model 1 (m = 10 repetitions), x = [0, …, 15, 0, …, 15, …, 0, …, 15]⊤ for model 2 (m = 5), and x = [0, …, 79]⊤ for model 3 (m = 1). In each case, we assume the true value is δ = m/100, and the true values of τ_j are given in Table 3, although we are only concerned with the ability of the approximate methods to find the global solutions and not necessarily the true values. The results based on 1000 simulations are given in the last three columns of Table 3. For each model, the percentage of times that the method agrees with the global solution and the duration of the method are given. The quick local and repeated local methods are the same when ℓ = 2. Model 3 was not run for ℓ = 3 because of the lengthy computation time needed for the global algorithm. In any case, the trade-off between speed and accuracy is apparent. For this simulation, the accuracy and speed do not seem to depend on whether the model is specified correctly.

Table 3. Results of the simulation study in Section 4

ℓ   Model   τ_1, …, τ_ℓ   Segmentation        Quick local          Repeated local
2   1A      —             39.8%, .03980 s     92.9%, .09915 s      92.9%, .09915 s
2   1B      2, 5          36.1%, .03117 s     91.8%, .10075 s      91.8%, .10075 s
2   2A      —             21.1%, .06747 s     73.2%, .22677 s      73.2%, .22677 s
2   2B      5, 10         24.3%, .06751 s     74.9%, .22530 s      74.9%, .22530 s
2   3A      —             14.2%, .35460 s     46.9%, 1.25416 s     46.9%, 1.25416 s
2   3B      26, 52        12.9%, .35530 s     48.3%, 1.23802 s     48.3%, 1.23802 s
3   1A      —             23.9%, .04715 s     82.6%, .12151 s      89.9%, .19279 s
3   1B      1, 3, 5       21.7%, .04685 s     83.7%, .12322 s      91.4%, .19657 s
3   2A      —             9.0%, .09917 s      59.0%, .28387 s      65.1%, .45046 s
3   2B      4, 9, 14      9.0%, .09874 s      60.3%, .28594 s      63.4%, .45287 s

5. Examples

In this section, the methods of Section 4 are applied to two real data sets. Although estimation of ℓ is not a focus of this paper, we perform several generalized likelihood ratio tests via simulation under the assumption of normality (with errors having covariance matrix σ²W^{−1}), which suggest a particular value of ℓ for each example. Each of these tests compares a null model having ℓ_0 change points with an alternative model having ℓ_1 > ℓ_0 change points. The ratio of the optimal weighted sums of squares (which is equivalent to the ratio of the likelihood functions under normality) is compared for each model, and the null hypothesis of ℓ_0 change points is rejected if the ratio WSS_{ℓ_0}/WSS_{ℓ_1} is large enough. For each test, we estimate the p-value based on 10,000 simulated data sets under the fitted null model. All models for real and simulated data are fit using the repeated local method of Section 4. The complicated nature of the likelihood ratio test statistic under the null model is discussed by Feder (1975). Other interesting approaches for approximating the p-values of tests in segmented regression are discussed in Knowles and Siegmund (1989) and Davies (2002).

5.1. MPG data

Here we apply basic segmented regression to model gas mileage as a function of weight for 38 automobiles from model years 1978–1979. This data set has been considered by several authors (see Koul et al., 2003, and the references therein). Initially, we used weights w_i = 1, and simulation-based generalized likelihood ratio tests suggested the choice of ℓ = 1 among models with ℓ = 0, 1, …, 4. Using the single change-point estimation algorithm in Section 3.2, τ̂_1 = 3163.371 is obtained.

However, a plot of the residuals versus weight indicated that the variance was significantly larger before the change point. The ratio of the variance of the residuals before τ̂_1 to the variance after is 4.866, so we let w_i = 4.866 when x_i > τ̂_1 and w_i = 1 otherwise. Then we re-estimate all parameters for each ℓ and obtain the estimated p-values given in Table 4 based on the generalized likelihood ratio test. The tests clearly indicate that ℓ = 1 is the best choice, since we reject ℓ = 0 but fail to reject ℓ = 1 against all alternatives tested.
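In R, this reweighting step amounts to the following (hypothetical variable names: res holds the residuals from the initial unweighted fit and tau1 the fitted change point):

ratio <- var(res[x <= tau1]) / var(res[x > tau1])  # approx. 4.866 here
w <- ifelse(x > tau1, ratio, 1)                    # weight proportional to 1/variance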

Table 4. Estimated p-values for various generalized likelihood ratio tests with the MPG data

Null model    Alternative: ℓ = 1   ℓ = 2   ℓ = 3   ℓ = 4
ℓ = 0                      .0004   .0016   .0076   .0187
ℓ = 1                      —       .3009   .4960   .5246
ℓ = 2                      —       —       .6645   .6241
ℓ = 3                      —       —       —       .3978

Under the model with ℓ = 1, τ̂_1 is still 3163.371. The estimates of the coefficients are γ̂ = [58.224, −0.0123]⊤ and β̂_1 = 0.00994, and an ad hoc estimate of the variance is σ̂² = WSS/(n − p − ℓ) = 1.554. The fitted mean of y is shown in Fig. 3. Thus, the estimated MPG decreases by about 1.23 for each 100 pound increase in the weight of the automobile if the car weighs less than τ̂_1, but decreases by only 1.23 − 0.99 = 0.34 if the car weighs more than τ̂_1.

Fig. 3. Estimated f for the MPG data with ℓ = 1 (MPG plotted against weight, roughly 2000–4500 pounds).

5.2. FDX data

Here we apply the liquidity effect model described in Section 2.2 to stock data from FDX for April 11, 2001. This illustrates our extension of the liquidity effect model considered in Çetin et al. (2006) to a model with multiple change points. The example also demonstrates a useful application of our methods beyond the basic segmented regression model.

The data set consists of stock prices S(t_i1, x_i1) and S(t_i2, x_i2) for orders of size x_i1 and x_i2 (in units of $100) that a trader pays/receives at adjacent trading times. We use 189 observed pairs in this period for which (x_i1, x_i2) ∈ U × U, where U = {±1, …, ±10}, and t_i2 − t_i1 ≤ 30 s. As discussed in Section 2.2, we are interested in the sizes at which the price begins to be affected by the volume of the trade.

Table 5 gives simulation-based estimates of the p-values for different null and alternative models and suggests that ℓ = 3 is a good choice.

Table 5. Estimated p-values for various generalized likelihood ratio tests with the FDX data

Null model    Alternative: ℓ = 1   ℓ = 2    ℓ = 3    ℓ = 4    ℓ = 5
ℓ = 0                      0.0000  0.0000   0.0000   0.0000   0.0000
ℓ = 1                      —       0.0000   0.0000   0.0000   0.0000
ℓ = 2                      —       —        0.0101   0.0068   0.0214
ℓ = 3                      —       —        —        0.0919   0.2277
ℓ = 4                      —       —        —        —        0.8083

When ℓ = 3, the repeated local method gives change-point estimates τ̂_1 = −1.1941, τ̂_2 = −0.10756, and τ̂_3 = 3.1468, and coefficient estimates γ̂_0 = 2.8971E-07, β̂_1 = −4.4579E-04, β̂_2 = 4.6285E-04, and β̂_3 = 3.7225E-05. Using a daily time unit and the estimator σ̂² = WSS/(n − p − ℓ) = 2.8506E-04, we obtain μ̂ = σ̂²/2 + γ̂_0 = 0.025174.

As explained in Section 2.2, the function g describes how liquidity affects the asset price. Fig. 4 plots the estimated supply curve ĝ, which is upward-sloping as expected. We see that there is a cost associated with all seller-initiated trades, since there is essentially a change point at $0 on the negative side and another at the next possible level, $100. Buyer-initiated trades are more liquid near $0, and the volume does not affect the price until the size of the trade exceeds $300. No other changes are apparent up to a size of $1000. As expected, sellers encounter a more severe loss.

Fig. 4. Estimated supply curve ĝ for the FDX data with ℓ = 3 (ĝ plotted against trade size ν from −10 to 10, in units of $100).

6. Conclusion

We have demonstrated that the computational effort required by the standard exact algorithm for finding the exact minimizer of the WLS criterion under model (2, 3) with a single change point can be decreased by the algorithm described in Section 3. The model (2, 3) is quite general and includes both basic segmented regression and our liquidity effect model. We have also shown how the single change-point estimation algorithm can be used to approximate the global minimizer when the number of change points is greater than 1. Simulations showed that the accuracy of the segmentation algorithm can be improved greatly by other methods which produce local estimates without too much computational cost. Finally, we have applied the methods to two real data sets in which simulation-based generalized likelihood ratio tests suggest a clear choice of ℓ in each example. Although this paper does not focus on issues related to estimating the number of change points, future research on this topic will be very interesting.

Acknowledgments

The authors wish to thank two anonymous referees for their valuable comments, which greatly improved the quality of this paper.

References

Admati, A.R., Pfleiderer, P., 1988. A theory of intraday patterns: volume and price variability. Rev. Financial Stud. 1 (1), 3–40.
Bank, P., Baum, D., 2004. Hedging and portfolio optimization in financial markets with a large trader. Math. Finance 14 (1), 1–18.
Çetin, U., Jarrow, R., Protter, P., 2004. Liquidity risk and arbitrage pricing theory. Finance Stoch. 8 (3), 311–341.
Çetin, U., Jarrow, R., Protter, P., Warachka, M., 2006. Pricing options in an extended Black Scholes economy with illiquidity: theory and empirical evidence. Rev. Financial Stud. 19, 493–529.
Clark, P.K., 1973. A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41 (1), 135–155.
Davies, R.B., 2002. Hypothesis testing when a nuisance parameter is present only under the alternative: linear model case. Biometrika 89 (2), 484–489.
de Boor, C., 1978. A Practical Guide to Splines. Springer, Berlin.
Dierckx, P., 1993. Curve and Surface Fitting with Splines. Clarendon, Oxford.
Duffie, D., Ziegler, A., 2003. Liquidation risk. Financial Analysts J. 59 (3), 42–51.
Eilers, P.H.C., Marx, B.D., 1996. Flexible smoothing with B-splines and penalties. Statist. Sci. 11 (2), 89–102.
Eubank, R.L., 1988. Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York.
Feder, P.I., 1975. The log likelihood ratio in segmented regression. Ann. Statist. 3 (1), 84–97.
Frey, R., 1998. Perfect option hedging for a large trader. Finance Stoch. 2 (2), 115–141.
Frey, R., Stremme, A., 1997. Market volatility and feedback effects from dynamic hedging. Math. Finance 7 (4), 351–374.
Frino, A., Mollica, V., Walter, T., 2003. Asymmetric price behaviour surrounding block trades: a market microstructure explanation. Retrieved from 〈http://www.cls.dk/caf/wp/wp-154.pdf〉.
Green, P.J., Silverman, B.W., 1994. Nonparametric Regression and Generalized Linear Models. Chapman & Hall, London.
Gu, C., 2002. Smoothing Spline ANOVA Models. Springer, New York.
Hastie, T., Tibshirani, R., Friedman, J., 2001. Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.
Hudson, D.J., 1966. Fitting segmented curves whose join points have to be estimated. J. Amer. Statist. Assoc. 61 (316), 1097–1129.
Jarrow, R., 1992. Market manipulation, bubbles, corners and short squeezes. J. Financial Quant. Anal. 27 (3), 311–336.
Jarrow, R., 1994. Derivative security markets, market manipulation and option pricing. J. Financial Quant. Anal. 29 (2), 241–261.
Jarrow, R., 2001. Default parameter estimation using market prices. Financial Analysts J. 57 (5), 75–92.
Kim, H.-J., Fay, M.P., Yu, B., Barrett, M.J., Feuer, E.J., 2004. Comparability of segmented regression models. Biometrics 60, 1005–1014.
Knowles, M., Siegmund, D., 1989. On Hotelling's approach to testing for a nonlinear parameter in regression. Internat. Statist. Rev. 57 (3), 205–220.
Koul, H.L., Qian, L., Surgailis, D., 2003. Asymptotics of M-estimators in two-phase linear regression models. Stochastic Process. Appl. 103 (1), 123–154.
Küchenhoff, H., 1997. An exact algorithm for estimating breakpoints in segmented generalized linear models. Comput. Statist. 12 (2), 235–247.
Morgan, I.G., 1976. Stock prices and heteroscedasticity. J. Business 49 (4), 496–508.
Pastor, L., Stambaugh, R.F., 2003. Liquidity risk and expected stock returns. J. Political Economy 111 (3), 642–685.
Ruppert, D., 2002. Selecting the number of knots for penalized splines. J. Comput. Graph. Statist. 11 (4), 735–757.
Seber, G.A.F., Wild, C.J., 1989. Nonlinear Regression. Wiley, New York.
Subramanian, A., Jarrow, R., 2001. The liquidity discount. Math. Finance 11 (4), 447–474.
Tse, Y., Xiang, J., 2005. Asymmetric liquidity and asymmetric volatility. Retrieved from 〈www.fma.org/Chicago/Papers/Asymmetric_Liquidity_and_Asymmetric_Volatility.pdf〉.
Vostrikova, L.J., 1981. Detecting disorder in multidimensional random processes. Soviet Math. Dokl. 24, 55–59.
Wahba, G., 1990. Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia.
Yang, T.Y., 2004. Bayesian binary segmentation procedure for detecting streakiness in sports. J. Roy. Statist. Soc. Ser. A 167 (4), 627–637.