Influence Diagnostics for the Cross-Validated Smoothing Parameter in Spline Smoothing

William Thomas (1991), "Influence Diagnostics for the Cross-Validated Smoothing Parameter in Spline Smoothing," Journal of the American Statistical Association, Vol. 86, No. 415 (Sep., 1991), pp. 693-698. Published by: American Statistical Association. Stable URL: http://www.jstor.org/stable/2290399


Influence Diagnostics for the Cross-Validated Smoothing Parameter in Spline Smoothing

WILLIAM THOMAS*

This article addresses the problem of influence in estimating the smoothing parameter when fitting a univariate smoothing spline. Diagnostics are presented that can identify observations that locally influence the choice of smoothing parameter by generalized cross-validation, either through their case weights or observed responses. Generalized cross-validation and the diagnostic methods are given a frequency interpretation and illustrated with examples.

KEY WORDS: Data perturbation; Generalized cross-validation; Local influence.

1. INTRODUCTION

Diagnostic methods for smoothing splines have received increasing attention in recent years; see Wendelberger (1981), Eubank (1984, 1985), Silverman (1985), and Eubank and Gunst (1986). This article presents new diagnostics for influence on an important aspect of a fitted smoothing spline: the generalized cross-validation estimate of the smoothing parameter. These new diagnostics represent an extension of the local-influence methods of Cook (1986) to the smoothing spline setting. Our diagnostics are based on simultaneous local perturbations of all the observations, through their case weights or their values of the response variable.

Suppose that scalar responses $y_j$ follow the model

$$y_j = \mu(t_j) + \epsilon_j, \qquad (1.1)$$

where $\mu$ is a "smooth" regression function ($a \le t_1 < \cdots < t_n \le b$) and the errors $\epsilon_j$ are uncorrelated, with zero mean and constant variance. By smooth, we mean that $\mu$ belongs to the set $W^m[a, b]$ of functions $g$ that, for some fixed $m$, have $m - 1$ continuous derivatives and square-integrable $m$th derivative $g^{(m)}$ on $[a, b]$. The regression curve is to be estimated assuming only that $\mu$ is an element of $W^m[a, b]$.

There are many possible estimators of $\mu$ in (1.1); see Eubank (1988) or Muller (1988). A popular estimator, based on the assumptions above, is the minimizer over $g \in W^m[a, b]$ of

$$\frac{1}{n}\sum_{j=1}^{n} \{y_j - g(t_j)\}^2 + \lambda \int_a^b \{g^{(m)}(t)\}^2\,dt, \qquad \lambda > 0. \qquad (1.2)$$

If $n \ge m$, the minimizer $\hat{\mu}$ is a natural polynomial spline of order $2m$ with knots at the $t_j$ that is known as a smoothing spline. Discussions of smoothing splines and their statistical application may be found in Wegman and Wright (1983), Silverman (1985), Eubank (1988), and Wahba (1990).

The parameter $\lambda$ in (1.2) acts as a tuning constant to balance the competing aims of fidelity to the data and smoothness. Small values of $\lambda$ produce wiggly estimates and, in the extreme case $\lambda = 0$, a spline that interpolates the data. Large values of $\lambda$ yield smoother estimates, with $\lambda = \infty$ corresponding to polynomial regression.
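The two extremes can be seen in a small numerical sketch. The code below is illustrative only: it replaces the integral roughness penalty in (1.2) with a discrete $m$th-difference penalty (a Whittaker-type smoother), so `difference_smoother`, the $\lambda$ values, and the test signal are stand-ins rather than anything from the paper.

```python
import numpy as np

def difference_smoother(y, lam, m=2):
    """Minimize ||y - g||^2 + lam * ||D g||^2, with D the m-th order
    difference matrix -- a discrete stand-in for the penalty in (1.2)."""
    n = len(y)
    D = np.diff(np.eye(n), n=m, axis=0)          # (n - m) x n difference operator
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)

t = np.linspace(0.0, 1.0, 25)
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(25)

rough = difference_smoother(y, 1e-8)    # lam -> 0: essentially interpolates
smooth = difference_smoother(y, 1e8)    # lam large: collapses toward a line
```

As $\lambda$ grows, the fit is pushed into the null space of the penalty (straight lines for $m = 2$), mirroring the $\lambda = \infty$ polynomial limit described above.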

* William Thomas is Assistant Professor, Division of Biostatistics, Box 197 Mayo, University of Minnesota, Minneapolis, MN 55455. This research was supported in part by the National Institutes of Health (GM39015-01A1). The author thanks Randy Eubank for many helpful discussions and comments on the manuscript and John Adams, Steve Marron, and Gary Oehlert for their suggestions.

Selecting a value for the smoothing parameter $\lambda$ is a crucial part of the fitting process, and automatic procedures which allow the data to select such tuning constants are often preferred. For smoothing splines, generalized cross-validation (GCV) appears to have become the selection procedure of choice since its introduction by Craven and Wahba (1979); for another view, see Gasser, Kneip, and Kohler (1991). The GCV choice $\hat{\lambda}$ minimizes

$$G(\lambda) = \frac{\|(I - H_\lambda)y\|^2}{(n - \operatorname{tr} H_\lambda)^2} = \frac{e_\lambda^T e_\lambda}{(n - \operatorname{tr} H_\lambda)^2}, \qquad (1.3)$$

where $H_\lambda$ is the "hat" matrix that transforms the data vector $y$ into the vector of smoothing spline fitted values, and $e_\lambda = (I - H_\lambda)y$ is the vector of residuals. Generalized cross-validation can be used to select the smoothing parameter in various smoothing problems; for discussion, see Li (1985), Hall and Titterington (1987), and Härdle, Hall, and Marron (1988).
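A grid-search sketch of the criterion (1.3) follows, again with a difference-penalty smoother standing in for the spline hat matrix $H_\lambda$; `gcv_score`, the grid, and the simulated data are all hypothetical choices for illustration.

```python
import numpy as np

def gcv_score(y, lam, m=2):
    """G(lam) = e'e / (n - tr H)^2, as in (1.3), for a difference-penalty
    smoother whose hat matrix H = (I + lam D'D)^{-1} stands in for H_lam."""
    n = len(y)
    D = np.diff(np.eye(n), n=m, axis=0)
    H = np.linalg.inv(np.eye(n) + lam * (D.T @ D))
    e = y - H @ y
    return float(e @ e) / (n - np.trace(H)) ** 2

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(40)

grid = 10.0 ** np.arange(-8.0, 9.0)      # one G(lam) evaluation per grid point
lam_hat = min(grid, key=lambda lam: gcv_score(y, lam))
```

Each grid point costs one evaluation of $G(\lambda)$; the case-deletion alternative discussed below would repeat this entire search $n$ times, which is exactly the cost the text objects to.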

Despite the nonparametric nature of spline smoothing, much of the sensitivity of least squares is inherited through the squared-error term in (1.2), as noted by Huber (1979). Current diagnostic methods for smoothing splines are mostly of the case-deletion variety, including parallels of studentized residuals and Cook's (1977) distance measures; see Eubank (1984, 1985, 1988) and Eubank and Gunst (1986).

No diagnostics for influence on GCV have appeared in the literature. Case-deletion diagnostics for the GCV choice $\hat{\lambda}$ are an obvious approach, but are computationally infeasible for large datasets. The minimum of the GCV criterion is found by optimization methods or by a grid search, where each evaluation of $G(\lambda)$ requires $O(n)$ operations; determining $\hat{\lambda}_{(i)}$, with the $i$th case omitted, then requires repeating this process for each of $n$ cases. Further, as will be discussed in Section 3, the estimate $\hat{\lambda}$ is apparently sensitive to groups of observations acting together rather than single outlying points. Hence case-deletion diagnostics may not be very informative, even when they are computable.

To deal with the possibility of influential groups of cases, it seems natural to base diagnostics on simultaneous perturbation of all observations rather than on deleting single cases. We examine the effect of data perturbations on the GCV estimate $\hat{\lambda}$ and derive diagnostics by applying the local-influence method of Cook (1986). Such diagnostics allow the identification of sets of cases exerting influence jointly and, by varying the type of perturbation, permit the analyst to assess different kinds of influence on $\hat{\lambda}$. Two perturbation schemes are considered: modifying the weights of all observations and additive perturbations to the responses $y_j$. Perturbing the design points $\{t_j\}$ is not considered because the complicated dependence of the hat matrix $H_\lambda$ on the $\{t_j\}$ leads to an impracticably difficult form.

© 1991 American Statistical Association, Journal of the American Statistical Association, September 1991, Vol. 86, No. 415, Theory and Methods

Section 2 presents details of the perturbation schemes and the diagnostics. Section 3 presents a frequency interpretation of the influence diagnostics and GCV in the context of periodic smoothing splines, and Section 4 contains two examples illustrating the diagnostics' use.

2. THE INFLUENCE DIAGNOSTICS

To identify observations that have a disproportionately large impact on the determination of the GCV estimator $\hat{\lambda}$, we derive diagnostics using the local-influence method of Cook (1986). Thus, suppose that we modify all observations simultaneously in some fashion with a vector of small perturbations $\omega$; details are given later regarding modification of case weights and responses. We can apply GCV to the perturbed data to get an estimate $\hat{\lambda}(\omega)$ for each $\omega$ in some open set $\Omega$ of allowable perturbations. We assume there is a point $\omega_0 \in \Omega$ that represents no modification of the data, so that $\hat{\lambda}(\omega_0) = \hat{\lambda}$.

Now we view the function $\hat{\lambda}(\omega)$ as a surface over the region $\Omega$. To study influence, we want to find directions of large local change in the surface at $\hat{\lambda} = \hat{\lambda}(\omega_0)$, the unmodified data. The essential idea is that a direction of large local change at $\hat{\lambda}$ corresponds to perturbation of influential elements of the data, and therefore large components of this direction vector identify locally influential observations.

To find directions of large local change, our first step is to approximate the actual $\hat{\lambda}(\omega)$ surface with its tangent plane at $\hat{\lambda}(\omega_0)$ and find the direction of maximum slope $t_{\max}$ on this tangent plane. It is easy to show that the direction of maximum slope is $t_{\max} \propto d\hat{\lambda}(\omega)/d\omega^T$, evaluated at $\omega_0$; Cook (1986) provides a detailed account of the geometry involved. The direction vector $t_{\max}$ tells us how to perturb the data to produce the greatest local change. Thus $t_{\max}$ itself is the influence diagnostic, and the largest absolute components of $t_{\max}$ identify locally influential cases.

2.1 Perturbing Case Weights

We first perturb the data by modifying the weight given to the contribution of each case $(t_j, y_j)$ in the penalized least squares criterion (1.2) using a vector of weights $\omega = (\omega_1, \ldots, \omega_n)^T$. Pregibon (1981) argues that case-weight perturbation generalizes the notion of case deletion, in which $\omega_j$ is limited to the values 0 or 1; also see Cook (1986). Thus, for fixed $\omega$, the defining criterion (1.2) now becomes

$$\min_{g \in W^m} \left[ \frac{1}{n}\sum_{j=1}^{n} \omega_j\{y_j - g(t_j)\}^2 + \lambda \int_a^b \{g^{(m)}(t)\}^2\,dt \right], \qquad \lambda > 0. \qquad (2.1)$$

We now have a weighted version of the GCV criterion (1.3): $\hat{\lambda}(\omega)$ minimizes

$$G(\lambda, \omega) = \frac{e_{\lambda,\omega}^T \Delta\, e_{\lambda,\omega}}{(n - \operatorname{tr} H_{\lambda,\omega})^2},$$

where $\Delta = \operatorname{diag}(\omega_1, \ldots, \omega_n)$, $H_{\lambda,\omega}$ is the hat matrix resulting from (2.1), and $e_{\lambda,\omega} = (I - H_{\lambda,\omega})y$. The point $\omega_0 = (1, \ldots, 1)^T$ represents no perturbation.

To find the direction of maximum slope at $\hat{\lambda}(\omega_0)$ we use the implicit definition of $\hat{\lambda}(\omega)$ as the solution to

$$\left.\frac{\partial G(\lambda, \omega)}{\partial \lambda}\right|_{\lambda = \hat{\lambda}(\omega)} = 0. \qquad (2.2)$$

By the implicit function theorem, we may differentiate both sides of (2.2) with respect to $\omega$ to obtain

$$\frac{d\hat{\lambda}(\omega)}{d\omega^T} = -\left(\frac{\partial^2 G}{\partial \lambda^2}\right)^{-1} \frac{\partial^2 G}{\partial \omega^T \partial \lambda},$$

where all derivatives are evaluated at $\omega_0$ and $\hat{\lambda}$. The first quantity on the right side is a scalar which may be ignored, so

$$t_{\max}(wt) \propto \frac{\partial^2 G(\lambda, \omega)}{\partial \omega\, \partial \lambda},$$

evaluated at $\hat{\lambda}$ and $\omega_0$. Straightforward calculations yield

$$\begin{aligned}
t_{\max}(wt) \propto\ & (I - 2H)e \odot He + (y + 4e) \odot H(I - H)e \\
&+ 2d^{-1} \operatorname{tr}\{H(I - H)\}\{(3I - 2H)y \odot He - (2I - H)y \odot e\} \\
&- 2d^{-1}\|e\|^2[\text{diagonal of } \{H(I - H)(I - 2H)\}] \\
&+ 4d^{-1}(e^T He)[\text{diagonal of } \{H(I - H)\}] \\
&- 6d^{-2} \operatorname{tr}\{H(I - H)\}\|e\|^2[\text{diagonal of } \{H(I - H)\}], \qquad (2.3)
\end{aligned}$$

where subscripts have been dropped for simplicity, so that $H = H_{\hat{\lambda}}$ and $e = e_{\hat{\lambda}}$, while $I$ is the $n \times n$ identity matrix, $d = (n - \operatorname{tr} H_{\hat{\lambda}})$, and $\odot$ is the Hadamard, or componentwise, product.
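For concreteness, (2.3) can be transcribed directly into code. This is a sketch under stated assumptions: `H` is any symmetric smoother ("hat") matrix at $\hat{\lambda}$, built here from a hypothetical difference-penalty smoother rather than a spline, and only the direction of the result matters since $t_{\max}(wt)$ is defined up to proportionality.

```python
import numpy as np

def tmax_caseweights(H, y):
    """Literal transcription of (2.3); returns a vector proportional to
    t_max(wt).  '*' between vectors is the Hadamard product."""
    n = len(y)
    I = np.eye(n)
    e = (I - H) @ y                      # e = (I - H) y, the residuals
    d = n - np.trace(H)                  # d = n - tr H
    A = H @ (I - H)                      # H(I - H) appears throughout
    v = ((I - 2 * H) @ e) * (H @ e) + (y + 4 * e) * (A @ e)
    v = v + (2 / d) * np.trace(A) * (((3 * I - 2 * H) @ y) * (H @ e)
                                     - ((2 * I - H) @ y) * e)
    v = v - (2 / d) * (e @ e) * np.diag(A @ (I - 2 * H))
    v = v + (4 / d) * (e @ H @ e) * np.diag(A)
    v = v - (6 / d ** 2) * np.trace(A) * (e @ e) * np.diag(A)
    return v

# Hypothetical smoother matrix standing in for the spline hat matrix:
n = 30
D = np.diff(np.eye(n), n=2, axis=0)
H = np.linalg.inv(np.eye(n) + 0.01 * (D.T @ D))
y = np.sin(2 * np.pi * np.arange(n) / n) \
    + 0.3 * np.random.default_rng(3).standard_normal(n)
v = tmax_caseweights(H, y)
```

In practice one would plot `v` against case number, as in the index plots of Section 4.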

2.2 Perturbing the Response

The second perturbation scheme consists of adding small perturbations to the responses, so the vector of modified responses is $y_\omega = y + \omega$. Here $\omega_0 = 0$ represents no modification of the data. Additive perturbations of the responses have been used by Emerson, Hoaglin, and Kempthorne (1984) and Thomas and Cook (1989).

Under this perturbation scheme, the penalized least squares criterion (1.2) becomes

$$\min_{g \in W^m} \left[ \frac{1}{n}\sum_{j=1}^{n} \{y_j + \omega_j - g(t_j)\}^2 + \lambda \int_a^b \{g^{(m)}(t)\}^2\,dt \right], \qquad \lambda > 0, \qquad (2.4)$$

and GCV chooses $\hat{\lambda}(\omega)$ to minimize

$$G(\lambda, \omega) = \frac{\|(I - H_\lambda)(y + \omega)\|^2}{(n - \operatorname{tr} H_\lambda)^2}.$$


The diagnostics for this situation are simpler than the diagnostics for case weights. By similar calculations, the direction of maximum slope when $y$ is perturbed is

$$t_{\max}(y) \propto (cI - H)(I - H)^2 y, \qquad (2.5)$$

where $c = \operatorname{tr}\{H(I - H)\}/\operatorname{tr}(I - H)$ and $H = H_{\hat{\lambda}}$.
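Equation (2.5) is cheap to compute once $H_{\hat{\lambda}}$ is in hand. The sketch below (with a difference-penalty smoother again standing in for the spline hat matrix) normalizes the result, since only the direction carries information; note that the diagnostic is linear in $y$.

```python
import numpy as np

def tmax_response(H, y):
    """t_max(y) prop. to (cI - H)(I - H)^2 y, with
    c = tr{H(I - H)} / tr(I - H), returned as a unit vector; see (2.5)."""
    n = len(y)
    I = np.eye(n)
    c = np.trace(H @ (I - H)) / np.trace(I - H)
    v = (c * I - H) @ (I - H) @ (I - H) @ y
    return v / np.linalg.norm(v)

n = 30
D = np.diff(np.eye(n), n=2, axis=0)
H = np.linalg.inv(np.eye(n) + 0.01 * (D.T @ D))   # hypothetical hat matrix
y = np.sin(2 * np.pi * np.arange(n) / n) \
    + 0.3 * np.random.default_rng(4).standard_normal(n)
direction = tmax_response(H, y)
```

Because the map is linear, negating the data exactly negates the direction, a quick sanity check on any implementation.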

3. FREQUENCY INTERPRETATION OF GCV AND THE DIAGNOSTICS

Results from applying the influence diagnostics suggest that when $\hat{\lambda}$ is large, it is not particularly sensitive to small subsets of observations. In contrast, when $\hat{\lambda}$ is very small, it seems to be sensitive to groups of observations that make the regression function appear to have important high-frequency components. In this section we discuss the connection between the frequency components of the data and the diagnostics for $\hat{\lambda}$. The analysis is most tractable in the special case of periodic smoothing splines; see Eubank (1988, sec. 6.3.1) and the references therein. For periodic splines the mapping from the "time domain" ($y$) to the frequency domain is particularly transparent, so attention is restricted to that case. However, the ideas extend in principle to the general case. The effect of high-frequency components in the related problem of selecting a bandwidth for nonparametric regression has been discussed by Chiu (1990).

To discuss the periodic case for cubic splines ($m = 2$), we assume the model (1.1) with the additional assumptions: (a) $t_1, \ldots, t_n$ are equally spaced in $[0, 1]$, (b) $\mu$ is smoothly periodic in the sense that $\mu(0) = \mu(1)$ and $\mu^{(l)}(0) = \mu^{(l)}(1)$, and, for simplicity, (c) $n$ is odd. Write the Fourier transform of $y$ as $f(y) = Xy/n$, where $X$ is the $n \times n$ matrix with rows

$$x_r = (1, \exp(2\pi i r/n), \ldots, \exp\{2\pi i (n - 1)r/n\}),$$

in the order $r = -(n - 1)/2, \ldots, (n - 1)/2$, and where $i^2 = -1$. The Fourier coefficient of $y$ for the frequency $r/n$ is the $r$th component of $f(y)$,

$$f_r(y) = \frac{1}{n}\sum_{k=1}^{n} y_k \exp\{2\pi i (k - 1)r/n\},$$

so that the $f_r$ for large $|r|$ correspond to high frequencies. Then $|f_r(y)|^2$ is the power of the signal $y$ at frequency $r/n$.

The cubic periodic spline estimate of $\mu$ for fixed $\lambda > 0$ is, to a high order of approximation, $\hat{\mu}_\lambda = X^H W X y/n = X^H W f(y)$, where $X^H$ is the Hermitian, or conjugate transpose, of $X$, and $W = W(\lambda)$ is a diagonal matrix with diagonal elements $w_r(\lambda) = (1 + \lambda r^4)^{-1}$, for $r = -(n - 1)/2, \ldots, (n - 1)/2$. Since the weights $w_r(\lambda)$ decrease with increasing frequency $|r/n|$, $W$ is essentially a low-pass filter in the frequency domain that smooths the data by damping high-frequency components of $y$. The amount of damping depends on $\lambda$: small values produce less damping, large values more.
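In code, the periodic-spline filter amounts to damping FFT coefficients by $w_r(\lambda) = (1 + \lambda r^4)^{-1}$. A sketch follows; the signal, the value of $\lambda$, and the reliance on numpy's FFT ordering and sign convention (which leave the magnitudes of the coefficients unchanged) are assumptions for illustration.

```python
import numpy as np

n = 51                                    # odd, matching the text
r = np.fft.fftfreq(n, d=1.0 / n)          # integer frequencies 0, 1, ..., -1
lam = 1e-3
w = 1.0 / (1.0 + lam * r ** 4)            # w_r(lam) = (1 + lam r^4)^{-1}

t = np.arange(n) / n
y = np.cos(2 * np.pi * t) + np.cos(2 * np.pi * 20 * t)   # low + high frequency
mu_hat = np.fft.ifft(w * np.fft.fft(y)).real             # X^H W f(y)
f_hat = np.fft.fft(mu_hat) / n            # coefficients of the smoothed signal
```

The $r = 1$ component passes almost unchanged ($w_1 \approx 0.999$) while the $r = 20$ component is cut to $w_{20} \approx 0.006$: a low-pass filter whose cutoff moves with $\lambda$.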

The GCV criterion (1.3) can be expressed in terms of the power $|f_r(y)|^2$ at various frequencies as

$$G(\lambda) = \sum_{|r| \le (n-1)/2} \theta_r(\lambda)\, |f_r(y)|^2,$$

where

$$\theta_r(\lambda) = n\left(\frac{\lambda r^4}{1 + \lambda r^4}\right)^2 \bigg/ \left\{\sum_{|j| \le (n-1)/2} \frac{\lambda j^4}{1 + \lambda j^4}\right\}^2.$$

Note that, in contrast to the weights $w_r(\lambda)$ for the periodic spline, the GCV weights $\theta_r(\lambda)$ are strictly increasing with $|r/n|$, so that high-frequency components of the data may have a larger role in determining $\hat{\lambda}$. The amount by which high frequencies outweigh low frequencies depends critically on the value of $\lambda$. Figure 1 shows several sequences of GCV weights $\theta_r(\lambda)$ with $n = 51$, for $\lambda$ equal to $\lambda_0 \equiv \{(n - 1)/2\}^{-4}$, $10\lambda_0$, $10^2\lambda_0$, and $10^4\lambda_0$. When $\lambda$ is near $\{(n - 1)/2\}^{-4}$, high frequencies receive substantially greater weight. Thus when GCV is minimized by a very small $\hat{\lambda}$, it may be driven by small groups of cases that contribute to the power of $y$ at high frequencies. When GCV is minimized at a large $\hat{\lambda}$, higher and lower frequencies have nearly equal weight.
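The contrast between the spline weights and the GCV weights is easy to verify numerically. The sketch below uses the form $\theta_r(\lambda) \propto \{1 - w_r(\lambda)\}^2 / \{\sum_j (1 - w_j(\lambda))\}^2$, noting $1 - w_r = \lambda r^4/(1 + \lambda r^4)$; the restriction to positive $r$ is an assumption for simplicity, since the $r = 0$ term vanishes and $\pm r$ contribute equally.

```python
import numpy as np

n = 51
r = np.arange(1, (n - 1) // 2 + 1)        # positive frequencies 1, ..., 25
lam0 = ((n - 1) / 2.0) ** -4              # lam_0 = {(n - 1)/2}^{-4}

def theta(lam):
    """GCV weights theta_r(lam) over positive r, up to a constant factor."""
    one_minus_w = lam * r ** 4 / (1.0 + lam * r ** 4)   # 1 - w_r(lam)
    denom = 2.0 * one_minus_w.sum()       # +/- r contribute equally; r = 0 gives 0
    return n * one_minus_w ** 2 / denom ** 2

th = theta(lam0)
```

At $\lambda = \lambda_0$ the highest frequency receives several orders of magnitude more weight than the lowest, which is exactly the sensitivity to high-frequency structure that the diagnostics are designed to flag.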

To see how the diagnostics track this sensitivity in GCV, we rewrite $t_{\max}$ in (2.3) and (2.5) as a function of the Fourier coefficients of the data $f(y)$. In the following, we regard the number of observations $n$ as fixed and examine the behavior of the diagnostics for various values of $\lambda$. In the periodic case, the diagnostic (2.5) for influence through values of $y$ is

$$t_{\max}(y) \propto X^H (cI - W)(I - W)^2 f(y),$$

where $c$ is defined as after (2.5). The filter $(cI - W)(I - W)^2$ is increasing in $|r/n|$ for all values of $\lambda$ and so acts as a high-pass filter that increases the output power at high frequencies. Thus $t_{\max}(y)$ has large absolute components corresponding to groups of responses that make large contributions to the high-frequency components of $y$.
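A numerical check of the high-pass behavior follows, using the diagonal of $(cI - W)(I - W)^2$ with $c = \operatorname{tr}\{H(I - H)\}/\operatorname{tr}(I - H) = \sum_r w_r(1 - w_r)/\sum_r (1 - w_r)$ computed in the frequency domain; the choice $\lambda = 10^{-4}$ is an arbitrary illustration.

```python
import numpy as np

n = 51
r = np.fft.fftfreq(n, d=1.0 / n)          # integer frequencies 0, 1, ..., -1
lam = 1e-4
w = 1.0 / (1.0 + lam * r ** 4)            # spline filter weights w_r(lam)

c = (w * (1.0 - w)).sum() / (1.0 - w).sum()   # c = tr{H(I - H)} / tr(I - H)
gain = (c - w) * (1.0 - w) ** 2               # diagonal of (cI - W)(I - W)^2
```

The gain at the top frequency ($|r| = 25$) dwarfs the gain near $r = 1$, so responses feeding high-frequency power dominate $t_{\max}(y)$.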

The diagnostic (2.3) for influence through case weights can be written as

$$\begin{aligned}
t_{\max}(wt) \propto\ & c_1 \mathbf{1} + X^H(5I - 4W)f \odot X^H W(I - W)^2 f \\
&+ X^H(I - 2W)(I - W)f \odot X^H W(I - W)f \\
&+ c_2\{X^H(3I - 2W)f \odot X^H W(I - W)f \\
&\qquad - X^H(2I - W)f \odot X^H(I - W)f\},
\end{aligned}$$

where $f = f(y)$, $\mathbf{1}$ is a vector of ones, $c_1$ is the sum of the last three lines of (2.3), which are constant vectors for this case, $c_2 = 2d^{-1}\operatorname{tr}\{H(I - H)\}$, and the other notation is the same as in (2.3). The filters $(5I - 4W)$, $(I - 2W)(I - W)$, $(3I - 2W)$, $(2I - W)$, and $(I - W)$ are diagonal matrices whose diagonal elements strictly increase with $|r/n|$. The remaining filters, $W(I - W)$ and $W(I - W)^2$, have diagonal elements which increase with frequency only when $\lambda \le \{(n - 1)/2\}^{-4}$. Thus, the latter filters act as high-pass filters only when $\lambda$ is very small. In this case, groups of observations that contribute mainly to the power at high frequencies get larger components of $t_{\max}(wt)$ and are marked as locally influential.

4. EXAMPLES

To illustrate the diagnostics, we consider the simulated data shown in Figure 2. The 32 equally spaced observations


Figure 1. GCV Coefficients $\theta_r(\lambda)$ versus $r$, for Several Values of $\lambda$, Given as Multiples of $\lambda_0 = \{(n - 1)/2\}^{-4}$.

were generated from model (1.1) using the periodic regression function

$$\mu(t) = 1 + 2\sum_{k=1}^{4} \{a_k \cos 2\pi k(t - .3) + b_k \sin 2\pi k(t - .3)\}, \qquad 0 \le t \le 1,$$

with $a^T = (-.5, .5, 2.5, 1.0)$ and $b^T = (2.5, 1.0, .5, .5)$, plus independent errors uniformly distributed on $[-3, 3]$. This regression function is a version of one used by Hall and Titterington (1987). In Figure 2 the data are plotted against case number rather than $t$ to facilitate comparison with the diagnostics.

Figure 2 also shows the periodic cubic smoothing spline fitted to the simulated data using GCV to select $\hat{\lambda} = 7.6 \times 10^{-5} \approx 5\{(n - 1)/2\}^{-4}$. An index plot of the diagnostic for influence through case weights, $t_{\max}(wt)$ in (2.3), is given in Figure 3. Conditional on the value of $\hat{\lambda}$, the components of $t_{\max}(wt)$ in this plot indicate, up to sign, how to modify case weights to produce the greatest local change in the GCV estimate. The cases with the largest absolute components (5, 6, 19) have greatest local influence with respect to case weights.

Figure 2. Simulated Data Based on a Periodic Regression Function (dotted curve) With a Cubic Spline Fit (solid curve). Data are plotted against case number rather than $t$.

Figure 3. Index Plot of $t_{\max}(wt)$ for the Simulated Data.

Figure 4 displays an index plot of the diagnostic for influential responses, $t_{\max}(y)$ in (2.5). This plot indicates that moving the responses for cases 17, 19, and 31 in the same direction while moving the response for case 18 in the opposite direction will produce a large local change in $\hat{\lambda}$.

For comparison with standard diagnostic methods for case influence, Figure 5 gives an index plot of a generalized Cook's distance (Eubank 1984), which estimates the overall change in the fitted spline when a single case is deleted. As might be expected from their different aims, the group of cases with largest Cook's distance (6, 15, 17) is not the same as either of the groups highlighted by the $t_{\max}$ diagnostics. Implicit in any notion of influence is a particular kind of perturbation of the problem, and different types of perturbation may lead to different sets of influential observations.

Figure 4. Index Plot of $t_{\max}(y)$ for the Simulated Data.

Figure 5. Index Plot of Generalized Cook's Distance for the Simulated Data.

To assess the actual effect of locally influential groups of observations, one may modify these observations as indicated by $t_{\max}$, find the new GCV choice, and plot the resulting spline for comparison. For example, to assess the effect of the responses in cases 17, 18, 19, and 31, we subtract 1.0 from the response in cases 17, 19, and 31, while adding 1.0 to the response in case 18. Figure 6 shows the original data and the four perturbed responses, with the smoothing splines fit to both the original data ($\hat{\mu}$) and the modified data ($\tilde{\mu}$). Altering these four responses produces a larger GCV estimate, about 3 times $\hat{\lambda}$ from the original data, and hence a less wiggly fit. On the other hand, moving these cases in the opposite direction decreases the GCV estimate, giving a fitted spline that virtually interpolates the data.

The effect of modifying these four cases on the frequency content may be seen by comparing the sample spectral density functions of the spline fits $\hat{\mu}$ and $\tilde{\mu}$ of Figure 6. For a real $n \times 1$ vector $u$, the sample spectral density function is

$$S(h, u) = \frac{1}{n}\left|\sum_{k=1}^{n} u_k \exp\{2\pi i (k - 1)h\}\right|^2, \qquad h \in [0, .5].$$

Figure 7 displays $\log S(h, \hat{\mu})$ and $\log S(h, \tilde{\mu})$ calculated from the vectors of fitted values $\hat{\mu}(t_j)$ and $\tilde{\mu}(t_j)$ ($j = 1, \ldots, n$). There is a clear drop in power at the higher frequencies as a result of perturbing the response at four influential cases.
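The sample spectral density is a one-line computation; at the Fourier frequencies $h = r/n$ it reduces to $n|f_r(u)|^2$, and for real $u$ its values can be cross-checked against numpy's FFT, whose opposite sign convention leaves the magnitudes unchanged. The vector `u` below is an arbitrary stand-in for a vector of fitted values.

```python
import numpy as np

def sample_spectral_density(h, u):
    """S(h, u) = (1/n) | sum_k u_k exp{2 pi i (k - 1) h} |^2, h in [0, .5]."""
    n = len(u)
    k = np.arange(n)                     # k - 1 in the 1-based notation above
    return abs(np.sum(u * np.exp(2j * np.pi * k * h))) ** 2 / n

rng = np.random.default_rng(2)
u = rng.standard_normal(32)              # stand-in for fitted values mu(t_j)
```

Evaluating $S$ on a fine grid of $h$ values and plotting the log, as in Figure 7, shows where a fit carries its power.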

Figure 6. Simulated Data With Four Perturbed Responses Marked by $\times$. The solid curve is the spline fit $\hat{\mu}$ to the original data, while the dashed curve is the spline fit $\tilde{\mu}$ to the perturbed data.

Figure 7. Simulated Data: log Sample Spectral Density Functions. The solid curve is $\log S(h, \hat{\mu})$ and the dashed curve is $\log S(h, \tilde{\mu})$.

The second example uses the income ($t$) per capita in U.S. dollars and life expectancy ($y$) in 93 nations for 1979. The data are given by Leinhardt and Wasserman (1979) and are graphed in Figure 8. The solid curve is the fitted cubic spline $\hat{\mu}$, with smoothing parameter chosen by GCV. The three cases (66, 72, 74) marked by filled circles have the largest absolute components of $t_{\max}(wt)$. To see the practical effect of these cases on GCV, their case weights were lowered to .5 and the smoothing parameter was reestimated. The resulting estimate $\hat{\lambda}(\omega)$ based on the perturbed weights is about 36 times larger than $\hat{\lambda}$ from the original data, yielding a smoother spline fit $\tilde{\mu}$, shown as a dashed curve in Figure 8. Of the three cases, only the one with smallest response possesses a large Cook's distance. It appears that these three cases jointly influence GCV to assign higher frequencies to the regression function rather than to error. Through the smoothing-parameter estimate, these three cases exert substantial influence on the form of the fitted spline.

[Received April 1990. Revised January 1991.]

Figure 8. Life Expectancy Data (per capita income on the horizontal axis). The solid curve is the cubic spline fit $\hat{\mu}$ to the original data; the dashed curve is the spline fit $\tilde{\mu}$ to the perturbed data, with reduced case weights for the observations marked by filled circles.


REFERENCES

Chiu, S.-T. (1990), "Why Bandwidth Selectors Tend to Choose Smaller Bandwidths, and a Remedy," Biometrika, 77, 222-226.

Cook, R. D. (1977), "Detection of Influential Observations in Linear Regression," Technometrics, 19, 15-18.

(1986), "Assessment of Local Influence" (with discussion), Journal of the Royal Statistical Society, Ser. B, 48, 133-169.

Craven, P., and Wahba, G. (1979), "Smoothing Noisy Data With Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation," Numerische Mathematik, 31, 377-403.

Emerson, J. D., Hoaglin, D. C., and Kempthorne, P. J. (1984), "Leverage in Least Squares Additive-Plus-Multiplicative Fits for Two-Way Tables," Journal of the American Statistical Association, 79, 329-335.

Eubank, R. L. (1984), "The Hat Matrix for Smoothing Splines," Statistics and Probability Letters, 2, 9-14.

(1985), "Diagnostics for Smoothing Splines," Journal of the Royal Statistical Society, Ser. B, 47, 332-341.

(1988), Spline Smoothing and Nonparametric Regression, New York: Marcel Dekker.

Eubank, R. L., and Gunst, R. F. (1986), "Diagnostics for Penalized Least-Squares Estimators," Statistics and Probability Letters, 4, 265-272.

Gasser, T., Kneip, A., and Kohler, W. (1991), "A Flexible and Fast Method for Automatic Smoothing," Journal of the American Statistical Association, 86, 643-652.

Hall, P., and Titterington, D. M. (1987), "Common Structure of Techniques for Choosing Smoothing Parameters in Regression Problems," Journal of the Royal Statistical Society, Ser. B, 49, 184-198.

Härdle, W., Hall, P., and Marron, J. S. (1988), "How Far Are Automatically Chosen Regression Smoothing Parameters from Their Optimum?" (with discussion), Journal of the American Statistical Association, 83, 86-101.

Huber, P. (1979), "Robust Smoothing," in Robustness in Statistics, eds. R. L. Launer and G. N. Wilkinson, New York: Academic Press, pp. 33-48.

Leinhardt, S., and Wasserman, S. S. (1979), "Teaching Regression: An Exploratory Approach," The American Statistician, 33, 196-203.

Li, K. C. (1985), "From Stein's Unbiased Risk Estimates to the Method of Generalized Cross-Validation," The Annals of Statistics, 13, 1352-1377.

Muller, H.-G. (1988), Nonparametric Regression Analysis of Longitudinal Data, New York: Springer-Verlag.

Pregibon, D. (1981), "Logistic Regression Diagnostics," The Annals of Statistics, 9, 705-724.

Silverman, B. W. (1985), "Some Aspects of the Spline Smoothing Approach to Non-Parametric Regression Curve Fitting" (with discussion), Journal of the Royal Statistical Society, Ser. B, 47, 1-52.

Thomas, W., and Cook, R. D. (1989), "Assessing Influence on Regression Coefficients in Generalized Linear Models," Biometrika, 76, 741-749.

Wahba, G. (1990), Spline Models for Observational Data, Philadelphia: SIAM.

Wegman, E. J., and Wright, I. W. (1983), "Splines in Statistics," Journal of the American Statistical Association, 78, 351-365.

Wendelberger, J. G. (1981), "The Computation of Laplacian Smoothing Splines With Examples," Technical Report 648, Dept. of Statistics, University of Wisconsin, Madison.