A Study on Prediction of Spatial Binomial Probabilities with...

14
1 Correspondence to [email protected] 1 A Study on Prediction of Spatial Binomial Probabilities with an Application to Spatial Design Hao Zhang 1 Program in Statistics Washin gton State U niversity Pullman, WA 99164 H. Holly Wang Department of Agricultural Economics Washin gton State U niversity Pullman, WA 99164 Abstract This work studies some issues that are related to interpolation of binomial probabilities. In some situations, binomial counts are observed at some spatial locations and binomial probabilities are interpolated at un-sampled locations based on the sample data. An example in precision agriculture is considered in this work. A natural practical question is how the number of sampling locations and the sampling sizes at the sampling locations affect the interpolation. This que stion is studied in th is work throu gh simulations . The mo del-based geostatistics ap proach (D iggl et al., 1998) is employed in which the binomial counts are modeled through a spatial generalized linear mixed model. The minimum m ean-square d error pr ediction is car ried out for the binomial p robability. 1. Introduction This work is motivated by an experiment on Cunningham farm that is located 8 miles north of Pullman, Washington. The ov erall objec tive of the expe riment is to iden tify factors that affect bo th yield and q uality of wheat an d barley, the two major crops in the dryland farming area in the Pacific Northwest, so that measures of precision farming may be applied to improve yield and quality so as to maximize profitability of the farm. Some variables that likely affect both yield and quality are incidence rates of plant root diseases, soil properties, elevation and aspect ratio, all of which are location-specific and measured locally. Once the dominating variables are identified and their effects on yield and quality quantified, optimization of both yield and quality can be realized through the practice of precision agriculture. 100 locations were randomly selected by the research team by dividing the farm into equally spaced rows and columns, approximately 20 meters between rows and columns, then random drawing from the lattice. At each

Transcript of A Study on Prediction of Spatial Binomial Probabilities with...

  • 1Correspondence to [email protected]

    1

    A Study on Prediction of Spatial Binomial Probabilities with an Application

    to Spatial Design

    Hao Zhang 1

    Program in Statistics

    Washin gton State U niversity

    Pullman, WA 99164

    H. Holly Wang

    Department of Agricultural Economics

    Washin gton State U niversity

    Pullman, WA 99164

    Abstract

    This work studies some issues that are related to interpolation of binomial probabilities. In some situations, binomial

    counts are observed at some spatial locations and binomial probabilities are interpolated at un-sampled locations

    based on the sample data. An example in precision agriculture is considered in this work. A natural practical question

    is how the number of sampling locations and the sampling sizes at the sampling locations affect the interpolation.

    This que stion is studied in th is work throu gh simulations . The mo del-based geostatistics ap proach (D iggl et al.,

    1998) is employed in which the binomial counts are modeled through a spatial generalized linear mixed model. The

    minimum m ean-square d error pr ediction is car ried out for the binomial p robability.

    1. Introduction

    This work is motivated by an experiment on Cunningham farm that is located 8 miles north of Pullman, Washington.

    The ov erall objec tive of the expe riment is to iden tify factors that affect bo th yield and q uality of wheat an d barley,

    the two major crops in the dryland farming area in the Pacific Northwest, so that measures of precision farming may

    be applied to improve yield and quality so as to maximize profitability of the farm. Some variables that likely affect

    both yield and quality are incidence rates of plant root diseases, soil properties, elevation and aspect ratio, all of

    which are location-specific and measured locally. Once the dominating variables are identified and their effects on

    yield and quality quantified, optimization of both yield and quality can be realized through the practice of precision

    agriculture. 100 locations were randomly selected by the research team by dividing the farm into equally spaced rows

    and columns, approximately 20 meters between rows and columns, then random drawing from the lattice. At each

  • 2

    sampling lo cation, the incid ence rate o f some roo t disease and other variab les were mea sured, alon g with the yield

    obtained by hand-harvesting a squared area of 2 meters long and 2 meters wide. Variables such as protein content

    that reflect quality of wheat were also measured at the sites. These sample data are currently being studied to see how

    they affect yield and quality. Once the domina ting variables a re identified an d their effects on yield and qu ality

    quantified, o ptimum p rofits can be a chieved b y locally adjusting or contro lling the variable s to the appr opriate leve ls

    through pr ecision farmin g that utilizes local info rmation. Ap parently, data of these variab les must be av ailable at all

    locations wh ere quality and yield are to be optimized and ideally we would like to optimize the m across the whole

    farm. This w ould requ ire interpolatio n or pred iction of the var iables to all um sampled locations.

    In this work, we will study some issues related to interpolation of a particular variable, namely, the incidence

    rate of Rhizo ctonia roo t rot caused b y Rhizoctonia solani and Rhizoctonia oryzae. These fungi attach to the root

    system and re duce the ab ility of plants to take u p adequ ate water and nutrients, and c onseque ntly affect both yield

    and quality of the crops. At each of the sampling sites on Cunningham farm, 15 plants of barley were

    sampled and pulled out of groun d in the summ er of 200 0, and the nu mber of to tal crown ro ots, , and that of

    infected cro wn roots, , were obtained for each plant. The incidence rate of root rot at the site was obtained as

    the total numb er of infected c rown roo ts divided b y the total numb er of crown roots, i.e., . These incidence

    rates at the sam pled sites are used to interp olate the incid ence rate to a s many sites acro ss the farm as ne cessary. A

    natural practical problem is ho w the variables and affect the interpo lation. For ex ample, is it nece ssary to

    sample more crops to increase , which currently ranges from 89 to 197? W hat is the gain by sampling more

    locations? When the total sampling cost is controlled, do we prefer sampling more locations with each location

    having a smaller sample size or otherwise? These will be the problems to be studied in the paper.

    It is not possible to study these problems without first specifying how the prediction is to be carried out. Some

    well-known geostatistical kriging (predicting) methods such as ordinary kriging, trans-Gaussian kriging and

    disjunctive kr iging might now emerge as p ossible interp olation meth ods for ou r problem s. We refe r to Cressie

    (1993) for introduction to each of the methods. However, if any of the methods was to be applied, it would be

    applied to the ratio and not to , because itself without means nothing, and varies from

    site to site in the experiment. Once the ratios are used for prediction, the sample sizes no longer affect

    prediction. Conseq uently, we would not be able to study how the sampling sizes affect the prediction.

    Therefore, we will adapt the model-based geostatistics (Diggle, et al.,1998), which incorporates the sample sizes

    into the model and allows for the minimum mean-squared error (MMSE) prediction of the incidence rate.

    This approac h assumes that the counts of infected crow n roots follow the following spatial generalized

    linear mixed model (GLMM ):

    (a). is a Gaussian stationary process with mean 0.

    (b). Conditionally on , consists of independent variables. Moreover, the conditional

    distribution o f an individua l is binomial with a binomial index and the bino mial prob ability

  • 3

    .

    It is reasonable to assume that the counts of infected crown roots are binomial with varying binomial

    probabilities . Spatial variation is represented by the random term which might be accounted for by

    unknown or unobs ervable fac tors. In the abs ence of any p hysically based model for the spatial trend , the parame tric

    form may be rea sonable. U nder the sp atial GLM M, the M MSE estimation or prediction of is the

    conditiona l expectation of given the observed b inomial responses . Although it can not be eva luated in

    closed form , the MM SE pred iction can be compute d through the Marko v chain M onte Carlo methods.

    Two ap proache s to the MM SE pred iction of have been considere d under the spatial GLM M. Digg le et al.

    (1998) con sidered Bayesian p rediction of a function of random effects, such as , implemented using MCMC

    methods. Zhang (2002a, b) considered MM SE prediction under the assumption that parameters are known or have

    been estimated, and also relied on MCMC m ethods for the calculation. Zhang (2002b) showed that some analytical

    results can be used in either the Bayesian or non-Bayesian approach to make the prediction computationally more

    efficient.

    Obviously model parameters have to be known or estimated before prediction can be evaluated. For spatial

    GLMM s, different inferential methods exist for model parameter estimation. Diggle et al. (1998) used a Bayesian

    approach for parameter estimation. Zhang (2002a) considered the Monte Carlo EM algorithm for maximum

    likelihood e stimation. Pe nalized qu asi-likelihood (Breslow and Clayto n, 1993 ) can also b e applied to this mode l.

    Being aw are of that differe nt parame ter estimation m ethods exit fo r the mode l, and that these m ethods ma y lead to

    different estimate s, we will not study e xplicitly how the d esigns, espec ially the choice o f and , affect

    paramete r estimates. Ra ther, we will focus o n their effects on p rediction.

    Although this w ork is related to spatial samp ling design and indeed d eals with some aspect of the d esign, it

    differs from most works on spatial sampling design in several ways. Firstly, in most works on spatial designs, the

    response variable can be decom posed into add itive parts:

    where the error term has mean 0 and is either intrinsically stationary or second-order stationary, whose correlation

    structure is independent of the mean surface (see, for example, Warrick and Myers, 1987, Pesti et al., 1994,

    Bened etti and Palm a ,1995 , and Mu ller and Zimm erman,19 99, amo ng many oth ers). Such a d ecomp osition is

    important in some of the works on spatial desgin. For example, when the means are a constant, the variogram of the

    can be estimated without estimating the mean. Hence the design criterion can be expressed in the variogram

    parameters alone (see, for example, Muller and Zimmerman, 1999). In our model, such a decomposition does not

    exist because both the mea n and the variance of the respo nse variable depend on some com mon parame ters.

    Consequently, approaches to and results of a classical spatial design need to be extended or modified to become

    applicab le to our mo del.

    Secondly, the objective of a spatial sampling design is often the optimum allocation of spatial sampling

  • 4

    locations, where the optimum may be done with regard to trend estimation, variogram estimation or prediction.

    Although it is an interesting and important problem, deciding optimum sampling locations is very complicated in the

    Cunningham farm experiment because not only the response variable is binomial but also other response variables

    also measured at the sampling sites. Optimum sampling design for one variable alone has little practical value to the

    experime nt. In addition , the respons e variable w e consider in this paper is b inomial. He nce the sam ple size at a

    sampling lo cation is a new variable that c lassical spatial d esigns do no t entertain.

    Lastly, unconditional variances of estimator or predictors are used in the criteria of spatial designs such as

    minimizing the weighted average variance of prediction in Fed orov (1996 ) and Muller (19 98). Howev er, most

    applications of spatial prediction use the conditional mean and variance of the predicted variable given observed

    data. For e xample, the kriging (or pr edicting) surfac e of consists of the co nditional me an of given the

    responses at sampling sites (Diggle et al., 1998). In our particular experiment, we may ask how this kriging

    surface would change if less roots were sampled at each site (i.e., a smaller ). Hence, the conditiona l inference is

    more dire ctly related to o ur proble ms.

    Therefore, this work is not on spatial design in the classical sense. The primary focus of this paper is to study

    how and affect the prediction. There are no theoretical results that provide immediate answers to the

    questions. We hence resort to simulation studies to see these effects on prediction. The rest of the paper is organized

    as follows. In S ection 2, we review som e results of Zha ng (2002 b) that will be use d in the subse quent sectio n to

    calculate the M MSE prediction of . Sections 3 contains several simulation studies to see how the prediction of

    is affected by changes in and . When we change sampling sizes, we keep the incidence rates unchanged

    or appropriately the same ( has to be an integer) hence the conditional inferences, i.e., predicted values and

    prediction variance give n the same inc idence rate s, can be co mparab le. Conclusio ns and discu ssion are pre sented in

    Section 4.

    2. MMSE Prediction in a Spatial GLMM

    In this section, we review some results of Zhang (2002b) that can be used to efficiently calculate MMSE prediction

    for . Let satisfy the spatial GLMM defined in the introduction and be the random effects. Let

    be the sampling sites and write for and , respectively, , and

    . Zhang (2002b) showed that for any function ,

    (1)

    Monte Carlo samples from the conditional distribution can be generated through a MCM C method and the

    right-hand side of the equation can be approximated by the appropriate sample average. For example, the

    Metropolis-Hastings algorithm is quite easy to be implemented for spatial GLMM, as seen in Diggle et al. (1998)

    and Zhang (20 02a, b). Once M onte Carlo samp les are generated from , the following approximation

  • 5

    takes place

    (2)

    We now need to calculate for a given vector of rando m effects . Fortunately, this conditional

    expectation is an integral of the form , which can be fairly easily approximated to any given

    precision (Crouch and Spiegelman, 1990) if it can not be computed in closed form. Indeed, conditional on ,

    has a normal distribution with mean and variance , and

    (3)

    It is well known that the conditional mean and conditional variance can b e calculated from the cova riances:

    where is the covarian ce matrix of and is the (1,1)th elem ent of .

    Equation s (1)-(3) can be applie d to calculate the MM SE pred iction of and the

    corresponding prediction variance given . The first two c onditional m oments of given involve logistic-

    normal integrals of the form

    Indeed, for ,

    from which we obtain the prediction variance

    The logistic-normal integrals cannot be calculated in closed form but can be evaluated through numerical

    methods. For exam ple, the method of Ga ussian quadrature app roximates the integral as follows:

    where , , are available from standard tables for (Abramowitz and Stegun, 1967, p 924).

    Although this m ethod is be lieved to ap proxima te well, an analytica l error bou nd is not kno wn. If it is desirable to

    control the error bound, then the method of Crouch and Speigelman (1990) can be used. Given an error bound , this

    method c alls for choo sing a prop er constant h>0 such that

    (4)

    where . The infinite sum is then truncated to satisfy any error-bound.

  • 6

    Figure 1. Sampling locations (circle) and predictedlocations (+); the black dot shows the samplinglocation where sample size is changed.

    We have now outlined a method for approximate the MM SE prediction and the prediction variance

    . This method uses some analytical results and calls for generation of Monte Carlo samples of random

    effects at the sampled sites only, hence differs from the pure Monte Carlo method that generates samples of the

    random effect at the interpolated site as in Diggle et al. (1998). The simulation results of Zhang (2002b) show that

    using the analytical results provide faster convergence and is less computationally costing.

    3. Simulation Studies

    In this section, we carry out several simulation studies to deal with the practical problems arising in the

    Cunningham farm experiment. Each of the following three subsections deal with some specific problems, in which

    data from th e following m odel are ge nerated. Le t be a seco nd-order stationary Ga ussian spatial p rocess with

    mean 0 an d a spheric al variogram : for and equals for .

    Conditional on the pro cess , consists of independent binomial variables with binomial index and

    probab ility . We use the method outlined in the previous section to calculate the MMSE

    prediction and prediction variance for . The Metropolis-Hastings algorithm is employed to generate a Markov

    chain of length 2000 in th e impleme ntation of the m ethod. Zha ng (2002 a, b) showe d that is sufficiently

    large for predicting . Equation (4) is used in all simulations for predicting and the error bound is .

    3.1. Effects of sample sizes on prediction

    There is a general belief that a larger sample size at a location should affect prediction favorably. The simulation

    study in subsec tion is intended to shed light on the exact effects a nd provid e guidance as to what sam ple size shou ld

    be used in the experiment as far as prediction is concerned. The model parameters are ,

    and . We choose these values because they are the estimates obtained in Zhang

  • 7

    (c)

    (b)(a)

    Figure 2. C ompariso ns of predic tion at the 40 site s using unmo dified data (c ircle) and m odified da ta

    (black) with o ne sample size reduce d to 69 (a ), 14 (b), an d 0 (c).

    (2002 a) using the rea l experimen tal data. Th e sample siz e is fixed at 138 for all sampling sites, the average of sample

    sizes in the real ex periment.

    We divide the stud y into two steps: (a) How do es a single sample size affect prediction of its neighbo ring sites?

    (b) How do the sample sizes at all sampling sites affect prediction surface across the farm so that explicit guidance

    may be given on cho osing the sample sizes?

  • 8

    3.1.1 The effect of the sample size at a single site

    We choose a sampling site near the center of the farm and change and predict 40 sites located on a line crossing

    the sampling site. This sampling site and the 40 predicted sites are shown on Figure 1.

    We redu ce this sample size 0.5 and 0 .1 times to 69 and 24 , respectively. The binomial resp onse is also reduced so

    that the ratio remains approximately the same. Using the modified data from the site and data from other sampling

    sites, we calculate d the pred icted value a nd pred iction varianc e for each o f the 40 sites. W e also elimina ted this samp le

    site and use the remaining 99 sites to make prediction for the 40 sites. Figure 2 shows the plots predicted values and

    prediction variances using modified data in comparison with that obtained using original simulated data.

    We can see from Figure 2 that reducing sample size at the site from 138 to 69 has very little efects on prediction. It

    probably means that a sample size like 138 is large enough that it can be reduced. By reducing the sample size of the

    single site to 14, prediction variances for other sites, especially for the sites near the sampling site, are increased.

    However, these increases are not as great as those when the site is eliminated. This suggests that we prefer using more

    sites with each site having a smaller sample size over another way around.

    3.1.2 Effects of sample sizes on prediction over the farm

    Having se en how a sing le sample size affects predic tion, we now carry out ano ther simulation study in which all

    sampling size s are change d so that we m ight see how the se changes a ffect predictio n across the fa rm. The fa rm is

    divided into 1904 g rid points, at wh ich the inciden ce prob ability will be interpolated. We first use the original

    simulated data to make these predictions.

    We now double the sampling size of each site to 276 and also double the binomial variable proportionally so that

    the ratio remains the same. We then use the modified data to predict the same 1904 sites. Predicted values

    and prediction variances are compared with those from the original data in Figure 3 (a). The horizontal axis is for the

    original data and vertical one is for the modified data.

    We then reduce the sample size of each site from 139 to 69, 28 and 14, and again change the binomial responses

    to keep approximately the same. The 1904 sites are interpolated using the modified data. Prediction

    results are shown in Figure 3 (b), (c) and (d).

    We cle arly see that a sam ple size of 13 8 at every sam pling site is sufficient in the se nse that increas ing the samp le

    size brings about little changes on prediction. Indeed, the sample sizes can be reduced without greatly affecting the

    prediction. For example, a sample size of 69 provides very close results. We also see that when sample sizes become

    smaller, predicted values are more smoothed out in the sense that the predicted value for a lower become larger

    and that for a higher becomes smaller. Therefore, when sample sizes are smaller, the predicted values vary in a

    smaller range . This can b e explained as follows. If a sam pling site has an in cidence ra te lower than m ost other sites, it

    will affect the predicted incidence rates at nearby sites and those predicted incidence rates will be likely lower too at

    nearby sites. If this p articular site has a large samp le size, its impact o n predictio n of nearby site s will be greater than it

  • 9

    (a) (b)

    (c) (d)

    would with a smaller size. In the latter case, othe r sampling size s would hav e more im pact on the prediction resulting in

    elevated prediction for the b inomial probability at the nearby sites.

    From the simulation results, we conclude that the sample size of 138 at each of 100 sites is large enough that

    increasing the sample size would no t affect much the prediction surface. It may b e reduced to 50% without greatly

    affecting pred iction.

    Figure 3. Comparison of prediction using different sample sizes. The horizontal axises in (a)-(d) are

    predicted value or prediction variance corresponding to ; the vertical axises in (a)-(d) are

    those corresponding to =276, 6 9, 28 and 14, respe ctively.

    3.2. Effects of the number of sampling locations

    Due to the spatial correlation, the predicted value of a site is more influenced by observations near the site. Therefore,

    more sampling sites are added near an interpolation site, the MMSE pred iction of the site will change. Since the

    prediction variance is a conditional variance and depends on the observations at the sampling sites, it may become

  • 10

    Figure 4. Sampling sites randomly chosen(circle) and seven added sampling sites (black dot).

    Figure 5. Comparisons of 40 predicted values andprediction standard deviations with 100 sites (circle)and with 10 7 sites (black d ot)

    smaller or larg er. This is evid ent in the followin g simulation stud y.

    We randomly draw 100 sites as the sampling sites from the farm, and then add 7 more sites are located on the line

    crossing the center of the farm. These locations are shown in Figure 4. We now simulate data on the 107 sites from the

    binomial mixed effects model with the linear parameter , and the spherical variogram with parameters

    , and ; the binomial index or sample size is 138 for all sites. Prediction of 40 sites

    are made separately using the 107 sites and using 100 sites with seven sites removed. The predicted sites and the 7 sites

    are marked by “+” and black dots in Figure 4, respectively. Figure 5 compares the predicted values and predicted

    variances, in w hich black d ots refer to pr ediction using 107 sites an d circles refer to prediction using the 100 sites. We

    see as more sits are sampled, the predicted values vary more resulting in the prediction surfacing being less smooth.

    The advantage of more sampling sites will be seen more clearly from the next simulation study in which we

    investigate the combined effects of the number of sampling sites and sample sizes at the location. In this simulation, we

    randomly draw 500 sites from the farm and simulate data from the same model with the same parameters we just gave

    but with two sets o f sample sizes: and for all sites. We then randomly draw 200 sites from the 500

    sites, and 100 sites from the 200 sites. The three sets of sampling sites are shown in Figure 6. Once the sites are chosen,

    data on the sites are obtained accordingly from the simulated values. For each set of sampling sites (100, 200 and 500

    sites), we make prediction for 1904 sites using =138 and 14 and plot the predicted values and prediction variance

    together in Figures 7-9.

    From Fig ures 7-9, we see that when th e number of sampling site s is large, reduc ing the samp ling size at each site

    has less effect on prediction than it would when the number of sampling sites is small. Therefore, when more sites are

  • 11

    Figure 6. Change the number of sampling sites from 100 (blackdot) to 200 (+) to 500 (circle).

    Figure 7. Comparison of prediction using =138 (horizontal) and 14

    (vertical) at sites.

    sampled, we can afford sampling less in each location without as greatly affecting the pred iction as we would when less

    locations are sampled. S ince pred iction near a sa mpling site is mo re accurate , we prefer to sa mple mo re locations w ith

    each loca tion having a sm all sample size , if the total sampling effort is controlle d.

  • 12

    Figure 8. Comparison of prediction of 1904 sites using =138 and 14at N=200 sampling sites.

    Figure 9. Comparison of prediction of 1904 sites using =138 and 14at N=500 sampling sites.

    4. Conclusion and Discussion

    We studied the effects of the sample sizes at the sampling locations and the number of sampling locations on prediction

    of binomial probability, employing model-based geostatistics and minimum mean-squared prediction. Results of the

  • 13

    simulation studies support the following conclusion:

    (1). The sample size at each location needs not to be too large. For the Cunningham farm experiment, a sample size

    like 69 seems sufficient for prediction. Too large a sample size hardly improves prediction.

    (2). A smalle r sample size makes the p rediction sur face smoo ther, and the p redicted va lues vary in a sma ller range.

    (3). When m ore locations are samp led, the effects of sample size on the pred iction surface become less.

    (4). When the total sampling effort is controlled, we prefer sampling more locations with each location having a

    moderate sample size over sampling fewer locations with each one having a larger sample size.

    Theoretical justifications of the conclusions are not available. Nevertheless, the simulation results provide useful

    guidance to the design of the experiment on Cunningham farm. In all the simulations, the parameters are fixed at some

    values. We exp ect the conclusions to hold in gene ral cases regardless of the param eter values.

    Acknowledgment

    The auth ors ackno wledge the a ssistance of X iaoping Jin o n compu ting.

    Reference:

    Abramowitz, M. and Stegun, J. (1967) Handbook of Mathematical Functions. U.S. Government Printing Office,

    Washin gton, D. C . (eds.)

    Bened etti, R. and P alma, D. (1 995) O ptimal samp ling designs for depend ent spatial units. Environmetrics, 6, 101-114.

    Breslow , N.E. and Clayton, D .G. (199 3) App roximate infe rence in gen eralized linea r mixed mo dels. Journal of the

    American Statistical Association, 88, 9-25.

    Crouch, E.A.C. and Spiegelman, D. (1990) The evaluation of integrals of the form : Applicatio n to

    logistic-norma l models. Journal of the American Statistical Association, 85, 464-469.

    Diggle, P., Tawn, J. A., and Moyeed, R.A. (1998) Mo del-based geostatistics (with discussion). Journal of Royal

    Statistical So ciety, Ser. C, Applied Statistics, 47, 299-350.

    Fedoro v, V.V. (1 996) D esign of spatial e xperimen ts: modeling fitting a nd pred iction. In Rao , C.R. and G osh, S.,

    editors, Handbook of Statistics, Vol. 13, North-Holland.

    Muller, W. G. (1998) Collecting Spatial D ata: Op timum D esign of E xperime nts for Ran dom F ields. Physica-V erlag, ,

    Heidelberg.

    Muller, W . G. and Zim merman, D . L. (1999 ) Optimum design for sem ivariogram estimation. E nvironme trics, 10, 23-

    37.

    Pesti, G. K elly, W. E. a nd Bog ardi, I. (199 4) Obse rvation netwo rk design for se lecting locatio ns for water sup ply

    wells. Environmetrics, 5, 91-110.

    Warric k, A.W. a nd Mye rs, D. E. (19 87) Op timization of sa mpling loca tions for vario gram calcu lations. Water

    Resources Research, 23, 496-500.

  • 14

    Zhang, H . (2002a ) On estima tion and pr ediction for sp atial generalize d linear mixe d mode ls. Biome trics, Vol. 58,

    No.1, 1 29-136 .

    Zhang, H. (2002b) Optimal interpolation and the appropriateness of cross-validating variogram in spatial generalized

    linear mixed models. Journal of Computational and Graphical Statistics (To appear)

    .